@franzbischoff
Contributor

Better parsing for e-mails.
Deals with several encodings and multipart issues.
It can still miss some unusual encodings used in fields other than "Subject" and the e-mail content, but those seem best handled case by case in later versions.
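
For readers unfamiliar with the two problems mentioned above, here is a minimal sketch of how encoded headers and multipart bodies can be handled with Python's standard `email` package. It is not the code from this PR: the helper names `decode_subject` and `get_text_body` are hypothetical, and the UTF-8 fallback for parts that declare no charset is an assumption.

```python
import email
from email.header import decode_header


def decode_subject(raw_subject):
    """Decode an RFC 2047 encoded header value into a plain string."""
    parts = []
    for value, charset in decode_header(raw_subject):
        if isinstance(value, bytes):
            # Assumption: fall back to UTF-8 when no charset is declared.
            parts.append(value.decode(charset or "utf-8", errors="replace"))
        else:
            # Headers without encoded words come back as str already.
            parts.append(value)
    return "".join(parts)


def get_text_body(message):
    """Return the first text/plain part of a possibly multipart message."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""
```

A message parsed with `email.message_from_string(...)` or taken from `mailbox.mbox(...)` can be passed straight to `get_text_body`. Headers other than "Subject" may need the same treatment as `decode_subject`, which matches the limitation noted above.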

@review-agent-prime

The changes in this PR are generally good, improving the readability and functionality of the code. However, there are a few areas that could be improved:

  1. Error Handling: In the as_mbox function, when no filename is provided, an error message is printed and the function returns. It would be better to raise an exception instead, so that the calling function can handle the error appropriately.
if filename is not None:
    filename = str(filename)
else:
    raise ValueError("[ERROR]: No filename provided.")
  2. Code Duplication: The get_content function and the parse_subject function both contain code that decodes content based on the encoding type. This code could be extracted into a separate function to avoid duplication.
import base64
import quopri

def decode_content(encoded_text, encoding, charset):
    # "Q" marks quoted-printable and "B" marks base64
    encoding = encoding.upper()
    if encoding == "Q":
        return quopri.decodestring(encoded_text).decode(charset)
    if encoding == "B":
        return base64.b64decode(encoded_text).decode(charset)
    # Fail loudly rather than silently returning None for other values
    raise ValueError(f"Unsupported encoding: {encoding}")
  3. Use of print statements: Using print statements for logging is not recommended. It would be better to use a logging library, which provides more flexibility and control over the log messages.
import logging
logger = logging.getLogger(__name__)

# Replace print statements with logger calls
logger.error("[ERROR]: No filename provided.")
  4. String formatting: F-strings are used in most places, but the older formatting style is still used in one place. It would be better to use f-strings consistently.
"url": f"file://{os.path.abspath(f'{parent_dir}/processed/{slugify(filename)}-{guid()}{ext}')}",
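
To illustrate how the error-handling and logging suggestions could fit together, here is a minimal sketch. It assumes a simplified stand-in for `as_mbox` that only validates its argument, and `process_file` is a hypothetical caller; neither reflects the actual code in this repository.

```python
import logging

logger = logging.getLogger(__name__)


def as_mbox(filename=None):
    """Hypothetical stand-in for as_mbox; only the validation is shown."""
    if filename is None:
        raise ValueError("No filename provided.")
    return str(filename)


def process_file(filename=None):
    """Hypothetical caller that decides what a missing filename means."""
    try:
        return as_mbox(filename)
    except ValueError as exc:
        # Report through the module logger instead of print().
        logger.error("Skipping mbox import: %s", exc)
        return None
```

Raising in `as_mbox` keeps the decision about recovery in the caller, which here chooses to log the problem and continue.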

@franzbischoff
Contributor Author

franzbischoff commented Oct 29, 2023

Wow, a PR autobot :-D

1- Nice!
2- Very nice :-D
3- Agree! But I just repeated what was already done :-D
4- just did :)

@timothycarambat timothycarambat merged commit 26dba59 into Mintplex-Labs:master Oct 30, 2023
@franzbischoff franzbischoff deleted the feature/mbox branch October 31, 2023 08:40
franzbischoff referenced this pull request in franzbischoff/anything-llm Nov 4, 2023
* mbox parsing improvements v1

* autobots roll out!
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
* mbox parsing improvements v1

* autobots roll out!