
URL scraper ends with error #173

@bibi-b

This worked before, but after a completely fresh installation the dockerized repo shows the following:

? What kind of data would you like to add to convert into long-term memory? Article or Blog Link(s)
? Do you want to scrape a single article/blog/url or many at once? Single URL
[NOTICE]: The first time running this process it will download supporting libraries.

Paste in the URL of an online article or blog: https://www.voigtdental.de
[INFO] Starting Chromium download.
100%|██████████| 109M/109M [00:03<00:00, 34.1Mb/s]
[INFO] Beginning extraction
[INFO] Chromium extracted to: /app/.local/share/pyppeteer/local-chromium/588429
Traceback (most recent call last):
  File "/app/collector/main.py", line 84, in <module>
    main()
  File "/app/collector/main.py", line 52, in main
    link()
  File "/app/collector/scripts/link.py", line 24, in link
    req.html.render()
  File "/app/collector/v-env/lib/python3.10/site-packages/requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/app/collector/v-env/lib/python3.10/site-packages/requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "/app/collector/v-env/lib/python3.10/site-packages/pyppeteer/page.py", line 837, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.
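
The 8000 ms in the error matches the default `timeout=8.0` (seconds) of `render()` in `requests_html`, and the traceback shows `req.html.render()` being called without arguments. One possible workaround, which I have not verified against this repo, is to retry the render with progressively longer timeouts and fall back to the unrendered HTML if it never completes. The helper names `render_with_backoff` and `fetch_rendered` and the timeout values below are my own assumptions, not part of the collector scripts.

```python
from typing import Callable, Iterable, Tuple, Type


def render_with_backoff(
    render: Callable[..., object],
    timeouts: Iterable[float] = (8.0, 20.0, 60.0),  # seconds; assumed values
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> bool:
    """Call render(timeout=t) for each t until one call succeeds.

    Returns True on the first success, False if every attempt raised
    one of the exception types listed in retry_on.
    """
    for t in timeouts:
        try:
            render(timeout=t)
            return True
        except retry_on:
            print(f"[WARN] render did not finish within {t:g} s")
    return False


def fetch_rendered(url: str) -> str:
    """Hypothetical replacement for the bare req.html.render() call.

    Imports are local so the helper above can be used without
    requests_html or pyppeteer installed.
    """
    from pyppeteer.errors import TimeoutError as NavigationTimeout
    from requests_html import HTMLSession

    session = HTMLSession()
    req = session.get(url)
    if not render_with_backoff(req.html.render, retry_on=(NavigationTimeout,)):
        # Fall back to the HTML as fetched, without JavaScript rendering.
        print("[ERROR] page never finished loading; using unrendered HTML")
    return req.html.html
```

Calling `fetch_rendered("https://www.voigtdental.de")` would then try 8, 20 and 60 seconds in turn. If the page still times out at 60 seconds, the cause is more likely Chromium failing to start or reach the network inside the container than a slow site.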
