PDFDocument() from pdfminer requires a parser argument #43

@kehayakawa

Using the fix from #24 and Python 3.5.2, I call slate.PDF(file), but PDFDocument requires a parser argument. What should be passed here? I tried self.parser, but that didn't work.

Traceback (most recent call last):
  File "pdftotext.py", line 7, in <module>
    doc = slate.PDF(f)
  File "//anaconda/lib/python3.5/site-packages/slate/classes.py", line 56, in __init__
    self.doc = PDFDocument()
TypeError: __init__() missing 1 required positional argument: 'parser'

Changing that line to self.doc = PDFDocument(self.parser) leads to the following error, which I cannot fix either.

Traceback (most recent call last):
  File "pdftotext.py", line 7, in <module>
    doc = slate.PDF(f)
  File "//anaconda/lib/python3.5/site-packages/slate/classes.py", line 57, in __init__
    self.doc = PDFDocument(self.parser)
  File "//anaconda/lib/python3.5/site-packages/pdfminer/pdfdocument.py", line 559, in __init__
    pos = self.find_xref(parser)
  File "//anaconda/lib/python3.5/site-packages/pdfminer/pdfdocument.py", line 773, in find_xref
    for line in parser.revreadlines():
  File "//anaconda/lib/python3.5/site-packages/pdfminer/psparser.py", line 285, in revreadlines
    s = self.fp.read(prevpos-pos)
  File "//anaconda/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 2: invalid start byte
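
A possible reading of this second traceback, offered as an assumption rather than a confirmed diagnosis: the codecs.py frame suggests that the file object reaching self.fp.read() was opened in text mode, so raw PDF bytes are being decoded as UTF-8. The stdlib-only sketch below makes no pdfminer or slate calls; the sample bytes and the read_file helper are hypothetical and exist only to reproduce the same kind of error and to show that binary mode avoids it.

```python
import os
import tempfile

# Hypothetical stand-in for a PDF: a header line followed by raw bytes.
# 0x88 is a UTF-8 continuation byte, so it is invalid as a start byte.
sample = b"%PDF-1.4\n%\x88\x88\x88\n"


def read_file(path, mode):
    """Read the whole file in the given mode (hypothetical helper)."""
    if "b" in mode:
        with open(path, mode) as fp:
            return fp.read()
    # Text mode: force UTF-8, the codec named in the traceback.
    with open(path, mode, encoding="utf-8") as fp:
        return fp.read()


# Write the sample bytes to a temporary file.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Text mode tries to decode the binary bytes and fails.
    try:
        read_file(path, "r")
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode returns the bytes untouched.
    binary_data = read_file(path, "rb")
finally:
    os.remove(path)

print(type(text_mode_error).__name__)
print(binary_data == sample)
```

If that is the cause here, opening the PDF with open(path, "rb") in pdftotext.py before calling slate.PDF(f) may be worth trying; whether slate needs further changes beyond that is not established by these tracebacks.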
