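Microsoft's Playwright is a very capable automation tool that drives Chromium, Firefox, and WebKit through a single API. You may well think of using it for web scraping: compared with puppeteer or pyppeteer, it supports more browsers and more languages. However, the project does not officially provide a way to call it remotely. This post walks through a simple way to call Playwright remotely, with some simple code. It uses playwright-python as the example; the same approach works for the Playwright bindings in other languages.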
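Installation and startup
The Python version of Playwright starts the browser process through the Node.js version.

At install time, setup.py downloads the matching build of the Node.js driver.

The startup code is: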
https://github.com/microsoft/playwright-python/blob/03e5cd01fdda2125cea47ab443d34564f767af13/playwright/_impl/_transport.py#L57
```python
class Transport:
    async def run(self) -> None:
        self._loop = asyncio.get_running_loop()
        self._stopped_future: asyncio.Future = asyncio.Future()

        self._proc = proc = await asyncio.create_subprocess_exec(
            str(self._driver_executable),
            "run-driver",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=_get_stderr_fileno(),
            limit=32768,
        )
        assert proc.stdout
        assert proc.stdin
        self._output = proc.stdin

        while not self._stopped:
            try:
                buffer = await proc.stdout.readexactly(4)
                length = int.from_bytes(buffer, byteorder="little", signed=False)
                buffer = bytes(0)
                while length:
                    to_read = min(length, 32768)
                    data = await proc.stdout.readexactly(to_read)
                    length -= to_read
                    if len(buffer):
                        buffer = buffer + data
                    else:
                        buffer = data
                obj = json.loads(buffer)

                if "DEBUGP" in os.environ:
                    print("\x1b[33mRECV>\x1b[0m", json.dumps(obj, indent=2))
                self.on_message(obj)
            except asyncio.IncompleteReadError:
                break
            await asyncio.sleep(0)
        self._stopped_future.set_result(None)
```
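The read loop in Transport.run suggests the wire format between the Python client and the driver: a 4-byte little-endian length followed by a JSON payload. The sketch below illustrates that framing under the assumption that messages sent to the driver use the same layout. The helper names `encode_frame` and `decode_frames` and the sample message are hypothetical and are not part of playwright-python.

```python
import json
from typing import Any, List, Tuple

HEADER_SIZE = 4  # the read loop reads exactly 4 bytes before each payload


def encode_frame(message: Any) -> bytes:
    """Serialize one message as <4-byte little-endian length><JSON bytes>."""
    payload = json.dumps(message).encode("utf-8")
    header = len(payload).to_bytes(HEADER_SIZE, byteorder="little", signed=False)
    return header + payload


def decode_frames(data: bytes) -> Tuple[List[Any], bytes]:
    """Split a buffer into complete messages plus the unconsumed tail."""
    messages: List[Any] = []
    offset = 0
    while len(data) - offset >= HEADER_SIZE:
        header = data[offset:offset + HEADER_SIZE]
        length = int.from_bytes(header, byteorder="little", signed=False)
        start = offset + HEADER_SIZE
        if len(data) - start < length:
            break  # payload not fully received yet
        messages.append(json.loads(data[start:start + length]))
        offset = start + length
    return messages, data[offset:]
```

Because each frame carries its own length, a relay in the middle never needs to parse anything: it can forward raw bytes in both directions and the two ends still agree on message boundaries.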
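So to make Playwright remotely accessible, we only need to change how the Python version starts the driver, have the remote Node.js side handle the forwarded requests, and expose it over the network (the original idea was a ws endpoint; the code in this post uses a plain TCP socket).

The Python side and the Node.js side talk to each other through inter-process communication (the driver's stdin and stdout). So all we need is a thin wrapper around each side, with the two wrappers talking to each other over a socket. That is enough for remote calls.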

Implementation
Changes on the Python side
Modify the code at https://github.com/microsoft/playwright-python/blob/29cddbd5174ab262e5cb57b2d8c8fbcf8df3e171/playwright/_impl/_driver.py#L24
```python
def compute_driver_executable() -> Path:
    return Path("/Users/lozzo/.virtualenvs/py37/lib/python3.7/site-packages/playwright/driver/playwright.sh")
```
The content of /Users/lozzo/.virtualenvs/py37/lib/python3.7/site-packages/playwright/driver/playwright.sh is:
```sh
#!/bin/sh

cd /Users/lozzo/workdir/sovietironfist/test
ts-node processPipe.ts
```
The content of processPipe.ts is:
```ts
import net from "net"

;(async () => {
  const socket = new net.Socket()
  socket.connect({ host: "127.0.0.1", port: 12345 })
  process.stdin.pipe(socket)
  socket.pipe(process.stdout)
})()
```
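Note: do not use console.* in this script for anything that writes to standard output. The local playwright-python reads this script's stdout, so any stray output there raises an exception.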
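The same bridge can be sketched in Python, which also shows one way to keep diagnostics away from stdout: send them to stderr. This is a minimal illustration, not part of the original setup. The names `bridge` and `log` are hypothetical, and it assumes the driver protocol needs nothing beyond a raw byte relay.

```python
import io
import socket
import sys
import threading

CHUNK = 32768


def log(*args: object) -> None:
    """Write diagnostics to stderr so they never mix with protocol bytes on stdout."""
    print(*args, file=sys.stderr)


def bridge(sock: socket.socket, stdin: io.BufferedIOBase, stdout: io.BufferedIOBase) -> None:
    """Copy stdin -> socket in a background thread and socket -> stdout here."""

    def upstream() -> None:
        while True:
            chunk = stdin.read1(CHUNK)  # returns b"" at end of input
            if not chunk:
                break
            sock.sendall(chunk)
        try:
            sock.shutdown(socket.SHUT_WR)  # tell the remote side we are done sending
        except OSError:
            pass  # the socket was already closed

    threading.Thread(target=upstream, daemon=True).start()

    while True:
        chunk = sock.recv(CHUNK)
        if not chunk:
            break
        stdout.write(chunk)
        stdout.flush()
    log("remote side closed the connection")
```

A script launched by playwright.sh could call `bridge(socket.create_connection(("127.0.0.1", 12345)), sys.stdin.buffer, sys.stdout.buffer)` to play the role of processPipe.ts.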
Of course, you can also modify the Transport class in playwright/_impl/_transport.py so that it connects directly to the remote socket, which removes one layer:
```python
class Transport:
    async def run(self) -> None:
        self._loop = asyncio.get_running_loop()
        self._stopped_future: asyncio.Future = asyncio.Future()
        reader, writer = await asyncio.open_connection(host='127.0.0.1', port=12345)
        self._output = writer

        while not self._stopped:
            try:
                buffer = await reader.readexactly(4)
                length = int.from_bytes(buffer, byteorder="little", signed=False)
                buffer = bytes(0)
                while length:
                    to_read = min(length, 32768)
                    data = await reader.readexactly(to_read)
                    length -= to_read
                    if len(buffer):
                        buffer = buffer + data
                    else:
                        buffer = data
                obj = json.loads(buffer)

                if "DEBUGP" in os.environ:
                    print("\x1b[33mRECV>\x1b[0m", json.dumps(obj, indent=2))
                self.on_message(obj)
            except asyncio.IncompleteReadError:
                break
            await asyncio.sleep(0)
        self._stopped_future.set_result(None)
```
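The patched Transport hard-codes 127.0.0.1:12345, which is only useful when the driver runs on the same machine. One possible refinement is to read the endpoint from the environment. The variable names `PW_REMOTE_DRIVER_HOST` and `PW_REMOTE_DRIVER_PORT` and the helper `driver_endpoint` below are hypothetical; Playwright itself does not read them.

```python
import os
from typing import Mapping, Tuple

# Hypothetical variable names chosen for this sketch; Playwright does not define them.
HOST_VAR = "PW_REMOTE_DRIVER_HOST"
PORT_VAR = "PW_REMOTE_DRIVER_PORT"


def driver_endpoint(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port), falling back to the values hard-coded in the patch."""
    host = env.get(HOST_VAR, "127.0.0.1")
    port = int(env.get(PORT_VAR, "12345"))
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return host, port
```

Inside Transport.run, the connection line would then become `host, port = driver_endpoint()` followed by `await asyncio.open_connection(host=host, port=port)`.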
Changes on the Node side
The long-running process is socketPipe.ts:
```ts
import net from 'net'
import { spawn } from 'child_process'

// Wrapper script that starts the Playwright driver.
const DRIVER = '/Users/lozzo/.virtualenvs/py37/lib/python3.7/site-packages/playwright/driver/b.sh'

const server = net.createServer((socket: net.Socket) => {
  console.log('connection')
  // One driver process per client, so connections never share driver state.
  const child = spawn(DRIVER)
  child.on('exit', (code, signal) => {
    console.log('exit', code, signal)
    socket.destroy()
  })
  // Driver output goes to the client; client input goes to the driver.
  child.stdout.pipe(socket)
  socket.pipe(child.stdin)
  // kill() with no argument sends SIGTERM; signal 0 only checks that the process exists.
  const close = () => {
    child.kill()
  }
  socket.on('close', close)
  socket.on('error', close)
})

server.on('error', (err) => console.error(err))
server.listen(12345)
```
The content of /Users/lozzo/.virtualenvs/py37/lib/python3.7/site-packages/playwright/driver/b.sh is:
```sh
#!/bin/sh
SCRIPT_PATH="$(cd "$(dirname "$0")" ; pwd -P)"
$SCRIPT_PATH/node $SCRIPT_PATH/package/lib/cli/cli.js 'run-driver'
```
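Start the server with ts-node socketPipe.ts.
After that you can call Playwright remotely, and local code does not notice the difference. The same approach works for the bindings in other languages.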
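For readers who prefer to keep the server side in Python as well, the role of socketPipe.ts can be sketched with asyncio. This is an illustration under assumptions, not the original implementation: `pump`, `handle_client`, and `serve` are hypothetical names, and `command` is whatever launches the driver (for example the b.sh wrapper above).

```python
import asyncio
from typing import Sequence


async def pump(reader, writer) -> None:
    """Copy bytes from reader to writer until the reader reaches EOF."""
    while True:
        chunk = await reader.read(32768)
        if not chunk:
            break
        writer.write(chunk)
        await writer.drain()


async def handle_client(reader, writer, command: Sequence[str]) -> None:
    """Start one driver process for this connection and relay bytes both ways."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    tasks = [
        asyncio.ensure_future(pump(reader, proc.stdin)),   # client -> driver
        asyncio.ensure_future(pump(proc.stdout, writer)),  # driver -> client
    ]
    try:
        # Stop as soon as either direction ends or fails.
        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        if proc.returncode is None:
            proc.kill()  # the client went away, so stop its driver
        await proc.wait()
        writer.close()


async def serve(command: Sequence[str], host: str = "127.0.0.1", port: int = 12345):
    """Listen on host:port and run `command` once per incoming connection."""

    async def on_connect(reader, writer) -> None:
        await handle_client(reader, writer, command)

    return await asyncio.start_server(on_connect, host, port)
```

To run it as a long-lived process, one would await `serve([...])` with the driver launcher as `command` and then await `serve_forever()` on the returned server. Binding to 127.0.0.1 by default is a deliberate choice in this sketch, because the relay gives anyone who can reach the port full control of a browser.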