seanthegeek

Useful when creating indicators out of email headers.

@mgoffin
Contributor

mgoffin commented Oct 1, 2015

I agree this will extract the email address, but it also encourages people to provide invalid email addresses in the To/CC/etc. fields.

@apolkosnik-old
Contributor

How about uploaded emails, we'll have to correct them by hand? ;-) http://www.phpclasses.org/browse/file/14672.html

@apolkosnik-old
Contributor

I think this spells it out... https://tools.ietf.org/html/rfc2822#section-3.4

@seanthegeek
Author

Uploaded emails are my use case too. Eyeballing those, the vast majority of headers are in the format of John Doe <john.joe@example.com>. I pushed a change so that it only carves for the + button. Thoughts?

@apolkosnik-old
Contributor

>>> import re
>>> email_regex = re.compile(r'<(.+>)')
>>> maile = "Test3 <test3@example.com>"
>>> email_regex.findall(maile)
['test3@example.com>']

The closing bracket is inside the capture group, so it ends up in the result. You need to change it to re.compile(r'<(.+)>')
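To make the difference between the two patterns concrete, here is a minimal sketch comparing them. The third pattern, `strict`, is not from the pull request; it is a hypothetical variant that excludes angle brackets from the capture so that a header holding several addresses is not swallowed by one greedy match.

```python
import re

header = "Test3 <test3@example.com>"

# Pattern from the pull request: the closing bracket is inside the group.
buggy = re.compile(r"<(.+>)")
# Suggested fix: the closing bracket is outside the group.
fixed = re.compile(r"<(.+)>")
# Hypothetical stricter variant (an assumption, not part of the PR):
# the capture may not contain angle brackets at all.
strict = re.compile(r"<([^<>]+)>")

buggy_result = buggy.findall(header)   # trailing '>' is captured
fixed_result = fixed.findall(header)   # only the address is captured

# With two addresses in one header, the greedy '.+' spans both of them.
two = "A <a@example.com>, B <b@example.com>"
greedy_two = fixed.findall(two)
strict_two = strict.findall(two)

print(buggy_result, fixed_result, greedy_two, strict_two)
```

Even the stricter pattern only covers the simple `Name <address>` shape, which is one reason a real address parser may be the safer choice for uploaded emails.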

@mgoffin
Contributor

mgoffin commented Oct 1, 2015

It doesn't spell it out. Open up GMail, paste in "<Michael Goffin> mgoffin@gmail.com" for the To address, and see what happens. It fails because of the display name in the brackets. People expect proper email addresses for this stuff.

That being said, these values are also used for generating Targets, and that will continue to break for you if you abuse the field. It's designed for email addresses (mgoffin@gmail.com), not complex addresses with display names and other nonsense.

@apolkosnik-old
Contributor

Reading through the RFC brings up some more interesting cases...

@apolkosnik-old
Contributor

So clearly, some processing needs to happen on uploaded emails to convert each address to its addr-spec. Otherwise, anyone uploading .msg or .eml files en masse will not be a happy camper.

@mgoffin
Contributor

mgoffin commented Oct 1, 2015

Yep. I think options would be:

  1. Drop everything but the addr-spec when an email is uploaded, and prevent people from adding anything but that.
  2. Allow people to upload whatever they want in fields where email addresses are concerned, and DTRT (do the right thing) when it comes to generating indicators and building out a Target profile (which could include filling out first and last name instead of just email address).
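Option two could be sketched roughly as below, assuming Python's standard `email.utils.parseaddr` does the splitting. The helper `split_address`, its dictionary keys, and the name-splitting rules are all hypothetical and only illustrate the idea; they are not how CRITs builds indicators or Targets.

```python
from email.utils import parseaddr


def split_address(raw):
    """Split a raw header value into parts (hypothetical helper)."""
    display_name, addr_spec = parseaddr(raw)
    # Naive name handling, an assumption for illustration only:
    # "Last, First" when there is a comma, otherwise "First ... Last".
    if "," in display_name:
        last, first = [part.strip() for part in display_name.split(",", 1)]
    else:
        parts = display_name.split()
        first = parts[0] if parts else ""
        last = parts[-1] if len(parts) > 1 else ""
    return {
        "indicator": addr_spec,        # only the addr-spec becomes an indicator
        "display_name": display_name,  # kept as uploaded for the Target profile
        "first_name": first,
        "last_name": last,
    }


plain = split_address("mgoffin@gmail.com")
named = split_address("John Doe <john.doe@example.com>")
quoted = split_address('"O\'Test, User" <user.test@example.com>')
print(plain, named, quoted, sep="\n")
```

The stored field keeps whatever was uploaded, while the indicator is always the bare address, so a bare address and a full `Name <address>` value go through the same path.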

@seanthegeek
Author

I'd prefer option two. It's more complex, but it covers building relationships in the possible case of an attacker that uses the same display name with different addresses.
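The relationship case mentioned here, one display name reused with different addresses, might look something like the sketch below. The sample headers are made up, and grouping on a lowercased display name is an assumption about what counts as "the same" name.

```python
from collections import defaultdict
from email.utils import getaddresses

# Made-up From headers from several uploaded emails.
from_headers = [
    "Accounts Payable <ap@example.com>",
    "Accounts Payable <ap@example.net>",
    "John Doe <john.doe@example.com>",
]

# Map each display name to the set of addresses seen with it.
addresses_by_name = defaultdict(set)
for display_name, addr_spec in getaddresses(from_headers):
    if display_name and addr_spec:
        addresses_by_name[display_name.lower()].add(addr_spec)

# Display names that appear with more than one address.
reused = {name: addrs for name, addrs in addresses_by_name.items() if len(addrs) > 1}
print(reused)
```

This only works if the display name is preserved on upload, which is the main argument for option two over option one.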

@apolkosnik-old
Contributor

+1 on option two

@brlogan
Contributor

brlogan commented Oct 6, 2015

We've struggled with this same issue. Uploading email files has led to messy Email fields.
I think option two is probably the best answer here.

@apolkosnik-old
Contributor

I was looking at some of the Python email libraries, so that we don't have to reinvent the wheel and keep up with the corner cases ourselves.

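import sys
import email
from email.utils import parseaddr

# Parse the raw message read from stdin
m = email.message_from_string(sys.stdin.read())
print("orig from: %s" % m['from'])
# parseaddr returns a (display name, addr-spec) tuple
fff = parseaddr(m['from'])
print("parsed from: %s" % repr(fff))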

And the output is:

orig from: "O'Test, User" <user.test@example.com>
parsed from: ("O'Test, User", 'user.test@example.com')

There's some more example code at http://blog.magiksys.net/parsing-email-using-python-header
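Since To and CC headers usually hold more than one address, a sketch along the same lines could use `email.utils.getaddresses`, which accepts a list of header values and returns one (display name, addr-spec) pair per address. The sample message is made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses

# Made-up message with several recipients, including a quoted name with a comma.
raw = (
    "From: \"O'Test, User\" <user.test@example.com>\n"
    "To: Alice <alice@example.com>, bob@example.com\n"
    "Cc: \"Smith, Carol\" <carol@example.com>\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw)

# get_all returns every occurrence of a header; default to an empty list.
recipient_headers = msg.get_all("to", []) + msg.get_all("cc", [])
recipients = getaddresses(recipient_headers)

for display_name, addr_spec in recipients:
    print("%r -> %s" % (display_name, addr_spec))
```

Splitting the header on commas by hand would break on the quoted `"Smith, Carol"` name, which is the sort of corner case the library already handles.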

@apolkosnik-old
Contributor

While looking at an unrelated issue, it looks like we might run into a case where the message headers are encoded (https://www.ietf.org/rfc/rfc1342.txt). I think that should be handled before the addresses are parsed for indicators.
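A minimal sketch of handling an encoded display name, assuming the standard `email.header` module is acceptable here. The sample header is made up and uses the Q encoding of "Jürgen Müller". Splitting with `parseaddr` first and decoding the name afterwards is a design choice in this sketch: a decoded name could itself contain commas or angle brackets that would confuse the address parser.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# Made-up From header whose display name is an encoded word (Q encoding, UTF-8).
raw_from = "=?utf-8?q?J=C3=BCrgen_M=C3=BCller?= <juergen@example.com>"

# Split into display name and addr-spec while the name is still encoded.
encoded_name, addr_spec = parseaddr(raw_from)

# Decode the encoded word(s) in the display name into a plain string.
decoded_name = str(make_header(decode_header(encoded_name)))

print("%s <%s>" % (decoded_name, addr_spec))
```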
