WO2006026921A2 - System and method for phishing detection and electronic advertising verification

Info

Publication number: WO2006026921A2
Authority: WO; WIPO (PCT)
Prior art keywords: message; list; aggregator; isp; banko
Prior art date: 2004-09-07

Application number

PCT/CN2005/001423

Other languages

English (en)

Inventor

Marvin Shannon

Wesley Boudeville

Original Assignee

Metaswarm (Hongkong) Ltd.

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2004-09-07

Filing date

2005-09-07

Publication date

2006-03-16

2005-09-07 Application filed by Metaswarm (Hongkong) Ltd.

2006-03-16 Publication of WO2006026921A2


Classifications

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Definitions

This invention relates generally to information delivery and management in a computer network. More particularly, the invention relates to techniques for automatically classifying electronic communications and web pages as phishing or non-phishing. It also relates to techniques for detecting viruses or worms.
Phishing is characterized by the phisher sending many electronic messages, typically purporting to be from a financial institution or large company, at which the recipient presumably has an account. Often, the message will claim that the recipient needs to enter certain crucial personal pieces of information, like her account number or username, and her password. The message usually looks like it came from the purported source. It may have images downloaded from the actual source, to reinforce this impression.
The message might have, and this is the point of the message, a link or button that actually leads to a different location. So if the user fills out her personal data in the message and presses the button, the information is sent to the phisher. Or if she follows the link, she gets to a page at some website where she is encouraged to fill out information, and this information is then sent to the phisher.
The message might have a button that opens a popup window. Within this popup might be a form to be filled out and a button to be pushed that sends the data to the phisher.
Phishing is a type of spam, and it is well known that many spam messages have forged senders. But phishing tends to be fairly distinct from other spam, in that its forged sender is meant to serve a specific deceptive purpose, usually reinforced by wording and visuals in the body of the message.
Phishing messages tend to be very carefully crafted. Unlike a typical spam message, phishing is considered highly damaging, because it involves actual fraud. There is an incentive for an ISP to prevent it, as well as for the company that is impersonated by the phisher.
Phishers may also use viruses that take over computers, often unbeknownst to their owners. These viruses then run web servers that are the destinations of those crucial links in mass phishing messages. The viruses forward any such information received from tricked users to other locations on the Internet. In this way, the phisher can further obscure her origin. Plus, for a given mass phishing mailing, she can divide up those messages so that different messages might point to different hijacked computers. This means that tracking down and disabling the virus on one such computer is not enough to stop all her phishing.
Another method looks for heuristics in a suspected message, like key words or phrases. This has the drawback of language dependence. Plus, it is probabilistic (subjective). It does not decisively (deterministically) say whether a message is phishing or not. This method comes from its use against general spam.
Another method looks at registrations of new domains. It flags a domain if that name seems to be a variant of a bank's name, say. But a phisher might register a completely innocuous domain and then use a bank's name as a subdomain, which she does not have to register with any DNS group. Hence it is not possible to detect this subdomain until after she sends out her phishing messages. Plus, the method fails against her subverting an existing domain and using that as a destination in her messages.
Another method involves spiders crawling the Web, looking for a phisher setting up a domain with web pages that mimic a bank's. But the Web is so vast that it is unclear where this method should start looking. Also, a smart phisher can isolate her domain from the Web while she is developing it. And, if she needs to test it connected to the Web, she can just make it connected for a brief time.
Bayesian filters are popular for antispam (though they have limitations there). However, there is a fundamental problem with using Bayesian filters against phishing. A Bayesian filter needs two sets of training data. One set is good data. The other is bad; in this case, phishing messages. But these use very respectable and mainstream financial language. They adhere closely to legitimate financial messages, whereas generic spam often has sexual words, or phrases about "free" or health supplements, etc.
Another method involves user education. But there will always be gullible users. And the phishers can produce messages of extremely high quality, with very convincing text and visuals. Along these lines, one variant method tells the user to move her mouse over the links in a suspected message, looking for a link that does not go to the purported sender. There are several problems with this. Firstly, it takes some manual effort to move the mouse over several links and look at what the browser says is the destination. There are unsophisticated users who do not understand what the browser is saying for the destination. They know enough to identify and click on a link, but the destination information is beyond their ken. Secondly, that information can be deliberately obfuscated by the phisher.
A variation of the user education involves suggesting that the user examine the HTML source code of a suspected phishing message, so that she might find the actual network address that a link goes to, and see if this is suspect. It is difficult to take this seriously.
Most HTML email that is read in a browser has a structure that someone not used to HTML will find hard to follow. In part because the message providers that offer a web interface to read email often add many ads into the web page, outside the region displaying the message. Plus, for the user to examine the HTML, she has to decide in the first place that the message appears suspicious.
Another method uses two-factor gadgets, a hardware cryptographic approach. These generate transient passwords that the user can use when logging into a bank's website, so that if the website is fake ("pharm"), the phisher's capture of the password is of little value.
The gadgets were developed for use by employees and consultants of a company, when remotely accessing the company's computers. Usually, an employee only has one full time job, and so only needs to carry one gadget. But people often have several bank accounts, plus perhaps accounts at other companies targeted by phishers. It is very cumbersome for a person to carry around several gadgets.
Another method involves a software cryptographic approach, where a bank might distribute passwords to its customers and thence possibly encrypt future messages to them. An unsophisticated customer might find this confusing. And there are problems with key distribution and revocation.
Our method involves a company producing, optionally but preferably as a Web Service, a list of partner domains or addresses that will be in links in outgoing messages or in its web pages. ISPs and other message providers can access these lists and apply them against domains inside bodies of incoming messages purporting to be from that company. If a domain in such a message is not on the Partner List, then the message can be dealt with as a phishing message or invalid electronic advertisement. An ISP can also use the method to permit advertising access to its customers. The method defines a heterogeneous p2p network whose main purpose is to prevent phishing fraud. The method also involves a third party (Aggregator) that collects Partner Lists and acts as a trusted source of these for network queries.
Our methods offer a unified detection of phishing, pharming and viruses.
Fig. 1 shows a systems diagram of the general Anti-phishing technology.
The first party is the companies that have been impersonated or think that they might be impersonated in the future by phishers.
The second party is the ISP or other electronic message providers. Later, we show how other parties can be involved.
Our methods apply across all electronic communication modalities, including email, instant messaging, WAP (Wireless Application Protocol) and SMS (Short Message Service).
Our method also applies when the message is heard by the user. For example, the user might be blind, and the "browser" is a program that reads out the message to her. Also for brevity, we imagine that the first party might be a bank.
Call it BankO, with a website, bankO.com. On its website's IP address, at some port, it offers a programmatic interface. An external party, like an ISP, can query this. Call this party ISPO. BankO replies with a string that is a list of base domains in links in messages that it will be sending or is sending to its customers or anyone else, like potential customers. The query and the reply can be in any format that BankO and a potential questioner, who might be a message provider, agree to support.
This format could be XML.
The query could simply be "<askLinks/>".
The reply by BankO could be a list of base domains.
By base domains, we mean that you take a domain, like a.b.partnerO.com, and reduce it to the minimal rightmost name that can be owned, which gives us partnerO.com in this example. Notice that this means that the number of fields in the resultant base domain can be a function of the country, if the rightmost field is a country. For example, we might have the base domain partner2.com.au, because in Australia (represented by "au"), dot com domains end in com.au. Whereas in France (represented by "fr"), you can own a domain like "partner3.fr".
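The reduction to a base domain can be sketched as follows. This is only an illustration under stated assumptions: the function name `base_domain` and the tiny `TWO_FIELD_SUFFIXES` table are hypothetical, and a real deployment would need a complete, maintained list of the suffixes under which names can be registered. Host names are lowercased because domain names are case-insensitive.

```python
# Hypothetical, deliberately tiny table of two-field suffixes under which
# names are registered. A real system would need a complete, maintained list.
TWO_FIELD_SUFFIXES = {"com.au", "net.au", "org.au", "co.uk"}


def base_domain(host: str) -> str:
    """Reduce a host name to the minimal rightmost name that can be owned."""
    labels = host.lower().rstrip(".").split(".")
    # Country-style suffixes such as com.au need three fields, others two.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_FIELD_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


print(base_domain("a.b.partnerO.com"))     # partnero.com
print(base_domain("www.partner2.com.au"))  # partner2.com.au
print(base_domain("partner3.fr"))          # partner3.fr
```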
A raw network address might be used instead of a base domain.
The syntax of the query and reply might be more intricate.
An entry for a partner might include the start and end dates for which messages were sent that had links to that partner. This would let ISPO assess a message for validity at the time when it was received.
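A date-bounded partner entry could be checked roughly as below. The table `PARTNER_WINDOWS`, its dates, and the function `partner_valid_on` are hypothetical names and values chosen for illustration; the text does not specify how the dates are encoded.

```python
from datetime import date

# Hypothetical Partner List entries: base domain -> (start, end) dates
# during which messages linking to that partner were sent.
PARTNER_WINDOWS = {
    "partnero.com": (date(2005, 1, 1), date(2005, 6, 30)),
    "partner3.fr": (date(2005, 3, 1), date(2005, 12, 31)),
}


def partner_valid_on(domain, received, windows=PARTNER_WINDOWS):
    """Return True if the domain was an approved partner on the given date."""
    window = windows.get(domain.lower())
    if window is None:
        return False  # not on the Partner List at all
    start, end = window
    return start <= received <= end
```

With such a check, a message received after a partner's end date would be assessed against the list as it stood when the message arrived.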
Another example might be a notation for a raw network address that gave a range of addresses. For instance, using the Internet Protocol Version 4, a reply might include "10.20.30.*" where the asterisk denotes any valid value in the last field.
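Matching an address against such a wildcard notation might look like the sketch below. The function name `matches_ipv4_pattern` is hypothetical, and the sketch assumes the asterisk may stand in for any single field, not only the last one.

```python
def matches_ipv4_pattern(address: str, pattern: str) -> bool:
    """Check a dotted IPv4 address against a pattern such as 10.20.30.*"""
    addr_fields = address.split(".")
    pat_fields = pattern.split(".")
    if len(addr_fields) != 4 or len(pat_fields) != 4:
        return False
    for a, p in zip(addr_fields, pat_fields):
        # Each address field must be a decimal number in 0..255.
        if not (a.isascii() and a.isdigit()) or int(a) > 255:
            return False
        if p == "*":
            continue  # the asterisk accepts any valid value
        if not (p.isascii() and p.isdigit()) or int(p) != int(a):
            return False
    return True
```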
The bank might also return information about the various stylistic properties of its outgoing messages.
We described styles in previous U.S. Provisional filings: Number 60320046, "System and Method for the Classification of Electronic Communications", filed March 24, 2003; Number 60481745, "System and Method for the Algorithmic Categorization and Grouping of Electronic Communications", filed December 5, 2003; Number 60481789, "System and Method for the Algorithmic Disposition of Electronic Communications", filed December 14, 2003; Number 60481899, "Systems and Method for Advanced Statistical Categorization of Electronic Communications", filed January 15, 2004; Number 60521014, "Systems and Method for the Correlations of Electronic Communications", filed February 5, 2004; Number 60521174, "System and Method for Finding and Using Styles in Electronic Communications", filed March 3, 2004; Number 60521622, "System and Method for Using a Domain Cloaking to Correlate the Various Domains Related to Electronic Messages", filed June 7, 2004; Number 6052
The bank might make these styles available to other organizations. So an ISP which gets a message can reduce it to a canonical form, as in Provisional 60320046, and then find its styles and compare these with the approved styles for that bank. If some styles of the message are not in the approved styles, then the ISP might regard the message as not being from the bank. Along these lines, the bank might also return full or partial Bulk Message Envelope [BME] information, where the BME is defined in that Provisional. (The styles are partial BME information.)
Alternatively, the query has a domain, and is asking whether that domain is on the Partner List.
The query might be "<askLinks>partner20.com</askLinks>".
The reply might simply be "<yes/>" or "<no/>".
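On BankO's side, answering a single-domain query of this form could be sketched as follows. The function name `answer_domain_query` and the sample list are assumptions made for illustration; only the `askLinks`, `yes` and `no` element names come from the text, and the transport (port, authentication) is left out.

```python
import xml.etree.ElementTree as ET


def answer_domain_query(query_xml: str, partner_list: set) -> str:
    """Answer <askLinks>domain</askLinks> with <yes/> or <no/>."""
    root = ET.fromstring(query_xml)
    domain = (root.text or "").strip().lower()
    if root.tag != "askLinks" or not domain:
        raise ValueError("expected <askLinks>domain</askLinks>")
    return "<yes/>" if domain in partner_list else "<no/>"


# Hypothetical Partner List held by BankO (lowercased base domains).
partners = {"partnero.com", "partner2.com.au", "partner3.fr"}
print(answer_domain_query("<askLinks>partner20.com</askLinks>", partners))  # <no/>
```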
BankO might want to do this if it wants to hide its Partner List. So that someone who wants to know has to make repeated queries. This can be detected by BankO and used as a possible heuristic indicating a non-message provider, if the questioner offers a set of domains that tend to not be in the Partner List. That is, the questioner seems to be guessing, as opposed to finding these in actual messages.
When we said "partner", this can also include subsidiaries and parent companies. For example, suppose BankO has a home mortgage subsidiary, HomeO. Perhaps it bought this company at some time in the past, and runs it as a wholly-owned subsidiary that has its own website. Then the Partner List can include homeO.com. Likewise, suppose BankO is a subsidiary of an automaker, AutoO. It might be the financing arm for AutoO. Then BankO's Partner List might also have an entry for autoO.com.
ISPO has received the Partner List from BankO by whatever means, and entered it into its database.
ISPO then receives a message for one of its users, with the sender supposedly being from BankO. That is, the sender purports to be, for example, someone@*.bankO.com, where the asterisk indicates possible subdomains of bankO.com.
Spammers, in general, forge the sender lines.
Phishers, in particular, forge these in most known cases of phishing, because it is crucial that the recipient believe the message is actually from a purported source, starting with the sender line.
ISPO then extracts the base domains from links in the body of the message, taking care to avoid links that are in comments. During this extraction, if it encounters dynamically generated hyperlinks, it should apply our method in Provisional 60521698 in order to properly extract any valid base domain.
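The extraction and comparison step might be sketched like this, using Python's standard HTML parser. The class and function names are hypothetical. Two simplifying assumptions are made: the base domain is taken to be the last two fields of the host name, and dynamically generated hyperlinks are not evaluated at all.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkHostCollector(HTMLParser):
    """Collects host names from link-like attributes, ignoring comments."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser routes comment text to handle_comment, so tags that
        # appear inside <!-- ... --> never reach this method.
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                host = urlsplit(value).hostname
                if host:
                    self.hosts.add(host)


def naive_base_domain(host):
    # Simplifying assumption: the base domain is the last two fields.
    return ".".join(host.split(".")[-2:])


def unlisted_base_domains(body, partner_list):
    """Return base domains linked in the body that are not on the list."""
    collector = LinkHostCollector()
    collector.feed(body)
    collector.close()
    found = {naive_base_domain(h) for h in collector.hosts}
    return found - set(partner_list)
```

A non-empty result for a message purporting to be from BankO would mark it as a candidate phishing message.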
ISPO can maintain a list of companies that are likely to be impersonated. Periodically, it could query those companies for their lists of partners, and then store and use these lists in the above manner. Or there might be some central organization with a list of companies that offer such data about their partners. ISPO can get its list of companies from this organization, and thence query those companies, or get that data from the organization. In the next section, we call this an Aggregator, and will describe its roles.
ISPO can reduce the message to a canonical form and put it into a BME as described in Provisional 60320046. Then it can compare information in this BME with any full or partial BME information obtained from BankO. If there is disagreement, the message might be considered as not coming from BankO.
Alternatively, BankO might not make its Partner List available to a query by just any party. Instead, BankO might require an explicit prior agreement with specific ISPs. Then, it might furnish those ISPs with usernames and passwords. So an ISP would give its username and password, and, if BankO says these are valid, it replies with its Partner List.
The username of an ISP might be one or more of its network addresses.
BankO can still send messages that describe companies not on its Partner List. It just should not make these hyperlinks. Intra-company, users inside BankO can still send any messages to other internal users, and have hyperlinks to any outside destination, independent of the Partner List. BankO can run our method as an ISP, on its messages that
The IP address space has around 2^32 entries for IPv4, which is the current common version of the Internet Protocol. Plus, when IPv6 becomes common, its numbering space will have 2^128 entries. So spammers have a lot of name and address space to occupy.
A blacklist can have hundreds of thousands or millions of entries. Whereas the number of companies using our method to furnish Partner Lists is likely to be few. Only the largest companies might need our method, because they are the likeliest targets for phishers. Suppose these are the 500 largest companies in a country, and the average size of a Partner List is 10. If we use hashing to speed searching, then we have two levels of searching. The first level is a hash search of a table of 500 entries, so the search time goes as log 500.
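The two levels of searching can be illustrated with nested hash-based containers. This is a sketch, not the authors' implementation: the names `PARTNER_LISTS` and `link_is_approved` and the sample entries are hypothetical. In Python, both the dictionary lookup by company and the set membership test by partner are hash lookups.

```python
# Hypothetical table: purported sender's base domain -> its Partner List.
PARTNER_LISTS = {
    "banko.com": frozenset({"banko.com", "homeo.com", "partnero.com"}),
}


def link_is_approved(sender_base, link_base, partner_lists=PARTNER_LISTS):
    """Two-level lookup: first the company, then the partner domain."""
    partners = partner_lists.get(sender_base.lower())  # level 1
    if partners is None:
        return None  # company does not furnish a Partner List
    return link_base.lower() in partners               # level 2
```

Returning `None` for an unknown sender keeps "no list available" distinct from "link not approved", so other antispam methods can handle those messages.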
A spammer, especially a phisher, can set up beforehand a set of new domains and addresses that have never been used for spamming, and hence might not be on any ISP's blacklist. This is especially true if the domains consist of hijacked computers. Then, she sends a barrage of many messages. The messages may or may not emanate from those domains. But the key characteristic is that the messages have links to these domains [or addresses]. She is taking advantage of the vulnerability of a blacklist to a "zero-day" or "zero-hour" attack. This refers to the fact that it takes time for someone at an ISP or antispam company that maintains a blacklist to identify and put a new entry into the blacklist. During this time, anyone receiving such a message is vulnerable to being tricked.
An ISP might perhaps delay the delivery of received messages to its users' mailboxes, until it has had time to try to compute new entries for the blacklist, based on the new mail. Many ISPs will not be willing or able to do this, especially because it introduces a delay in users getting their mail. For competitive reasons, this is not attractive. Most users expect to get messages as quickly as possible. In contrast, the Partner Lists can be applied immediately.
An ISP can decide whether or not to charge for using our method. It might charge BankO. If so, it might also decide not to charge other companies. It can do this on a case by case basis. Regardless of whether it charges any companies or not, it can have separate policies regarding charging its users. For example, it might not charge them at all. Or, it might offer this as part of an extra, for-fee, premium service, as protection against phishing. Also, different ISPs can pursue different policies. In the above example of a query and reply, elaborations are possible. For example, an ISP, or any other entity on the net, for that matter, can ask BankO to be put on an "alert list". So that when BankO changes its list of partners, it informs everyone on the alert list.
When BankO informs others of changes, it might include the new list in its message, or it might require that the contactees use the above query and reply method to get the new list.
An ISP can easily apply our method. But it is not restricted to an ISP or message provider.
Consider a browser or any other program that can display an electronic message that contains hyperlinks, or buttons linking to somewhere on an electronic network, and that can let the user pick such a link or button and thence go to that link, or to the destination in the button.
This program can use our method. It can have a list of companies, for which it periodically goes to, in the fashion described for ISPs earlier, and downloads their Partner Lists. Then, when the user views a message that claims to be from a sender at one of those companies, it checks the base domains in the body against the company's partners, and alerts the user if there is a discrepancy.
We defined base domains as those which could be purchased. If governments or educational institutions were to use our method, there would be obvious generalizations of base domains to accommodate this usage. For example, in the United States, the federal government agencies use the domain .gov. We can choose the base domains here to be 2 fields, like ssa.gov or irs.gov. Then each agency might make its Partner List available. Similarly for other governments.
Consider the ISPs. In most developed countries, something like 5-10 ISPs have over 50% of the market. By necessity, phishers have to preferentially target the users at these ISPs. Perhaps also by choice. The largest ISPs serve a mass market. They are more likely to have technologically unsavvy users, who are in turn more likely to be fooled by a phisher. Therefore, a relatively few but crucial set of ISPs have to adopt our method, in order for benefits to happen. But this set can start with just one ISP. Now consider the targets. As discussed earlier, these will tend to be large financial firms, and other large companies that, for example, have many online customers. In total, there are not that many targets. And, like ISPs, they do not have to join en masse for our method to yield benefits.
An ISP can use the Partner Lists it gets from the companies as input to other antispam methods, to better filter out spam. Plus it might also use these lists to better understand its users' interests and in turn use this to offer better marketing of users and their interests, as discussed in our Earlier Provisionals. (There may be issues here of the ISP having to get permission from the companies whose Partner Lists it is using.) For example, aside from rejecting messages purporting to be from BankO, ISPO might offer BankO guaranteed advertising access to some subset of ISPO's customers. This can be a revenue source for ISPO.
A major advantage of our method is that the ISP does NOT attempt to authenticate a received message, in general.
Some other anti-phishing and antispam methods have proposed various means of attempted authentication. But, for example, "Sender ID" (as suggested by Microsoft) is vulnerable to IP spoofing. Plus, as proposed, Sender ID requires extensive changes to DNS servers throughout the Internet and will take several years to implement, even according to its proponents.
Let J be the set of major ISPs.
Let T be the set of large companies that phishers use for targets.
Suppose a substantial fraction of J and T end up using our method, thus reducing a phisher's revenue.
She might then look at smaller ISPs, aiming at these customers who are also customers of T. Or she might try to impersonate smaller companies, aiming at customers of T or of small ISPs.
The ISPs already using our method for large companies can sign up smaller companies.
Smaller ISPs not yet using our method might decide to do so, especially if there is competitive pressure for them to offer the level of protection to their customers that their larger competitors already do.
Our method can greatly reduce losses due to phishing. It would help many people and companies. It is language independent and can be deployed globally. When applied, the users do not have to see the phishing messages. So no matter how well worded a message, or how persuasively a phishing web site might mimic a target site, the users do not get fooled. Users do not have to learn new passwords or perform new actions.
A similar idea to the Partner List is a Behavior List, written preferably in XML. It can indicate the various types of behavior exhibited by the message. For example, does the message have popups? And if so, how many? Does it have pop-unders? Does it use scripting?
Our methods here are not restricted to a browser. They can be used in any user application that can display electronic messages or web pages or which gets or manipulates data that comes over a network, and which permits the addition of third party plug-ins. Plus, if such a program did not, our methods below still can apply, if the program is reimplemented to permit the addition of our plug-in, or, equivalently, of its functionality.
The meaning of this variable is that it designates the value as some kind of network address, similar to, or the same as, a URL or URI (Uniform Resource Identifier) or phone number, for the bank.
The name of the variable is arbitrary, but let us call it "a". So we might have a tag like:
1. The tag name and variable name are always in lowercase. HTML is deliberately permissive, so that, for example, these are the same to a browser: <body>, <BODY>, <Body>. But our requirement for our tag means that parsing does not have to first push an input string to lower case, for example, before looking for "<notphish ".
2. The tag is terminated with these characters, and in this order: {space, forward slash, greater than}. The inclusion of the forward slash is characteristic of XML tags, which are more rigorously formatted than HTML tags.
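Because the tag format is strict, it can be found with a simple case-sensitive pattern. The sketch below assumes the tag looks like `<notphish a="bankO.com" />`, with the value in double quotes; the quoting style and the function name `find_declared_address` are assumptions, since the text only fixes the lowercase names and the closing sequence of space, forward slash, greater than.

```python
import re

# Case-sensitive on purpose: the tag and variable names must be lowercase,
# and the tag must end with exactly: space, forward slash, greater than.
NOTPHISH_TAG = re.compile(r'<notphish a="([^"<>]*)" />')


def find_declared_address(page: str):
    """Return the address declared in the tag, or None if there is no tag."""
    match = NOTPHISH_TAG.search(page)
    return match.group(1) if match else None


print(find_declared_address('<html><notphish a="bankO.com" /><body>'))  # bankO.com
```

No lowercasing pass over the page is needed before the search, which is the point of the formatting rules.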
The browser has a plug-in program whose job is to handle this tag.
The plug-in runs whenever a page is loaded, but before it is displayed.
The plug-in has an optional but preferred button, that is visible.
Sarah may be able to suppress the display of this plug-in's button, just like that of any other plug-in's visible component, if she desires, and if the browser permits such an action.
The plug-in tells the browser that no scripting routine executed from the page should be able to launch a window over the plug-in's visible component.
The plug-in parses the current page. It looks for such a tag, as described above. If the tag exists, then it extracts the address's value. It also extracts any links from the page. Usually these might be outgoing links. But they might also be incoming links. (The latter might be when an HTML message loads an image from some URL.)
The link extraction may possibly involve evaluation or attempted evaluation of scripting routines, if the page contains dynamic hyperlinks. (See Section 2.4.)
The plug-in reduces these links to their base domains. While doing so, it optionally, but preferably, makes a list of the different protocols used by the links. Typically, this list might be {"http", "https", "ftps", ...}. The point here is that a bank might require that authentic links only use a secure protocol like https.
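The link extraction and reduction steps can be sketched in Python as follows. This is a simplified illustration, not the method's required implementation: it only reads static `href` and `src` attributes (dynamic hyperlinks built by scripts are not evaluated), and the small `TWO_LABEL_SUFFIXES` set is a hypothetical stand-in for a complete table of country-code suffixes. Host names are case-insensitive, so they are lowercased.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative and incomplete: suffixes under which a base domain has three labels.
TWO_LABEL_SUFFIXES = {"com.au", "com.cn", "co.uk"}

class LinkCollector(HTMLParser):
    """Collects outgoing (href) and incoming (src) link targets from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def base_domain(host: str) -> str:
    """Reduce a host name such as www.store23.com to store23.com."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def domains_and_protocols(page: str):
    """Return the set of base domains and the set of protocols used by the links."""
    collector = LinkCollector()
    collector.feed(page)
    domains, protocols = set(), set()
    for link in collector.links:
        parts = urlsplit(link)
        if parts.hostname:  # relative links carry no host and are skipped
            domains.add(base_domain(parts.hostname))
            protocols.add(parts.scheme)
    return domains, protocols
```

Returning sets gives the unique base domains and protocols directly, which is the form the plug-in later compares against the Partner List.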
The plug-in can then go out to an Aggregator Web Service, as mentioned in the previous section.
This is a network site with a list of keys.
Each key is the base domain of an Aggregator customer, like a bank.
The Aggregator has its Partner List.
The plug-in sends the address value of "bankO.com".
The format of the message is arbitrary. But let us write it in XML format as -
The plug-in may also send a list of the unique base domains and the list of protocols for these domains. If so, then the format of this is arbitrary. But we could choose to write it as -
The Aggregator checks if it has bankO.com as one of its keys. If not, then it returns a negative result.
The format of this is arbitrary, but let us write it as "<no/>".
The plug-in compares every base domain found in the web page with those in the official Partner List. If all the domains are official partners, and the protocols are the approved protocols, then the plug-in can show this in some fashion. Visually, it can show a color or image that indicates a validated page. Otherwise, it shows a different color or image for a non-validated page. Optionally there might only be two different colors, or two different images, essentially meaning Yes or No, to keep this as simple as possible.
Suppose we have determined that a page is non-valid, as in the previous paragraph. We define it as being "invalid". This is quite distinct from a page being non-valid simply because it lacks our tag, which will be true of most pages.
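The distinction between an untagged page, a validated page and an invalid page can be made concrete with a small Python sketch. The names `Status` and `classify` are hypothetical, and the sketch assumes that the bank's own base domain is included in the partner set passed in.

```python
from enum import Enum

class Status(Enum):
    UNTAGGED = "untagged"  # no tag: non-valid, but no claim was made
    VALID = "valid"        # every domain and protocol is approved
    INVALID = "invalid"    # tagged, yet something is not approved

def classify(tag_address, page_domains, page_protocols, partners, approved_protocols):
    """Compare a page's base domains and protocols with the official lists."""
    if tag_address is None:
        return Status.UNTAGGED
    domains_ok = set(page_domains) <= set(partners)
    protocols_ok = set(page_protocols) <= set(approved_protocols)
    return Status.VALID if domains_ok and protocols_ok else Status.INVALID
```

The plug-in would then map `VALID` and `INVALID` to its two colors or images, and show nothing special for `UNTAGGED`.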
The plug-in can turn off all links in the page, or only the invalid links. Plus, the plug-in might highlight or in some way change the representation of the invalid links. It could pop up a window warning the user about the page. This is very useful in alerting Sarah about a possible phishing web page or message.
The plug-in might send the page, or some subset thereof, to one or more of the bank, Aggregator or appropriate regulatory authorities.
Alternatively, the Aggregator could perform the previous test. If all the page's domains and protocols are in the official lists, then it might reply with "<yes/>", for example. While if any domain or protocol is not in the official lists, then it might reply with "<no/>", for example. The plug-in can take this reply and respond accordingly to Sarah.
The plug-in might be configured to automatically perform this on every page viewed by the browser.
The computational and bandwidth requirements are minimal. Most pages will not have the tag, so the plug-in will not need to consume bandwidth to ask the Aggregator. Plus, when the user uses the browser to go from one page to another, that change of page occurs on the user's timescale. Given current computers, the time for the plug-in to look for the tag and, if it exists, do the other tasks, will usually be negligible on that timescale.
The network communication to and from the plug-in is optionally but preferably unencrypted plain text.
The plug-in can also cache data it gets from the Aggregator, so that in future it would look first in its cache and, if the desired data is not present, then ask the Aggregator.
Our method is a significant generalization of the previous section. We can now apply antiphishing methods not just to electronic messages, but to any web document. And do both in essentially the same fashion.
A variable could be stored that indicates to the plug-in whether it should upload the base domain of the page, or the full address of the page, or omit it. For example, we might call such a variable "u", with possible values of "1", "2" and "3", where these correspond respectively to those different choices. Then we might see a tag like this:
By sending address information to the bank or Aggregator, the plug-in gives more information to the latter party or parties. For example, it can be used to detect if a page has been copied from some authorized web site to an unauthorized web site. This is important because some phishers set up pharms, which are web sites that pretend to be those of banks, say, in order to trick a visitor into revealing her account information. These pharms might have several pages copied more or less verbatim from the real bank pages.
This id could be some string, as in this example:
The id enables a useful new capability.
The bank can then restrict what partner links are on this page.
BankO has the partners given earlier - autol5.com.au, store23.com, store.com.cn. It gets an id sent to it of "jld". This maps to a page that should only link to store23.com. So it sends only that partner back, in its reply.
The plug-in gets this reply and compares it to the base domains extracted from its page. If these are autol5.com.au and store23.com, then the plug-in marks the page as invalid. Without the id, BankO has to send back its entire Partner List. Whereas now, it has more precise control. Possibly for targeted marketing.
The id does not give the phisher any advantage.
Suppose she copies a valid page, with such a tag, to a web site she controls. She edits the page. Whether she changes it to an id valid for another page or message, or to a non-existent id, or leaves the id unchanged, any Partner List at BankO or the Aggregator will still not include her domains.
An instance of the tag might also include variables for both uploading and id.
A very common usage of browsers is to read electronic messages.
Each major ISP has its own format for displaying a message in a web page.
The ancillary regions of the web page are often used to navigate between messages, and for related tasks. Plus, the ISP might put clickable ads in these regions. If the message has our tag, we don't want the domains or protocols in these ads, or those found from any other links outside the body, to be compared with the bank's Partner List.
The plug-in can have logic that is aware of each major ISP's message display format. In Section 3, we explain in greater detail how this is performed.
So the first time that the plug-in sees a Notphish tag pointing to bankO.com, it might need to seek out the Aggregator, as described earlier. But now that it knows that bankO.com is a valid domain, as attested by the Aggregator, it may optionally store bankO.com in a list of valid domains. This might be only in volatile memory, for that instance of the browser. Or it might store the list on the disk, for future invocations of the browser.
The plug-in might have logic to prevent it. This includes, but is not limited to, the following:
An Aggregator can invalidate a domain.
Another optimization is that when the plug-in gets the data for BankO, it might store these in memory or disk. It might choose to consult this, when it encounters future pages with tags pointing to BankO, instead of asking bankO.com or the Aggregator. If so, it should take into account any starting or expiration dates for a partner. So that if the current time for a partner is after an expiration date, for example, then the plug-in needs to go out on the net and get an updated Partner List.
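A minimal Python sketch of this caching optimization follows. The `Partner` record with optional start and expiration dates, the `PartnerCache` class and the `fetch` callback (standing in for the network query to bankO.com or the Aggregator) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional

@dataclass
class Partner:
    domain: str
    starts: Optional[datetime] = None   # partner is not valid before this time
    expires: Optional[datetime] = None  # partner is not valid after this time

    def active(self, now: datetime) -> bool:
        if self.starts is not None and now < self.starts:
            return False
        if self.expires is not None and now > self.expires:
            return False
        return True

class PartnerCache:
    """Keeps Partner Lists in memory and refetches when an entry has expired."""

    def __init__(self, fetch: Callable[[str], List[Partner]]):
        self._fetch = fetch  # stand-in for asking the bank or the Aggregator
        self._lists: Dict[str, List[Partner]] = {}

    def partners(self, bank: str, now: datetime) -> List[Partner]:
        cached = self._lists.get(bank)
        stale = cached is not None and any(
            p.expires is not None and now > p.expires for p in cached
        )
        if cached is None or stale:
            cached = self._fetch(bank)  # go out on the net for an updated list
            self._lists[bank] = cached
        # Only partners whose date window covers the current time are usable.
        return [p for p in cached if p.active(now)]
```

Writing `self._lists` to disk between browser invocations would give the persistent variant mentioned above.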
Another possible role of the Aggregator is to get some measure of how heavily the banks are using our methods. Clearly, whenever it gets a query about bankO, it can increment a counter for that bank. But suppose the plug-in were only to ask about bankO once, and then later query bankO directly? The Aggregator can record the plug-in's address in a list associated with bankO. The address might be the computer's network address or hardware address.
The plug-in need never contact the Aggregator. This could arise if when the plug-in is installed, it comes with a hardwired list of known banks. Then, so long as the browser stays within those banks, there is no need to ask an Aggregator about a different domain. While this may seem overly restrictive on the usage of the browser, imagine a plug-in that expressly restricts its validating steps to that list. That is, if the plug-in comes across a page with a Notphish tag pointing to a domain not on the list, the plug-in does not attempt any validation. Nor does it indicate that the page is invalid. The point here is that some combination of the author of the plug-in and whoever paid for the plug-in's distribution, can restrict its validating utility to that initial list. (Who presumably might have paid the former party or parties.)
A simple extension is to have the plug-in be able to accept updates to the list, or also deletions from the list, across the network. This would probably necessitate some type of authentication mechanism, in order for the plug-in to be assured of receiving appropriate data. Regardless of any particular mechanism, this remote updating would let the plug-in owner charge others, to be put into the updates, or not to be removed from the plug-in's list.
An alternative method is for the plug-in to be physically distributed on media, like a CD.
This might be financed by one or more of the banks that are registered with the Aggregator.
The CDs might go to customers of the banks.
The cost of this, and the knowledge of the customers' addresses, is deliberately used to stymie the phishers, who are unlikely to be able to afford the former, or know the latter.
A related issue is what happens when a phisher writes a fake plug-in and somehow gets it disseminated to a browser.
We have a countermeasure where the user manually types in the address of the Aggregator into the browser's address panel, and directs the browser there.
The Aggregator can have a web page where the user can upload the plug-in. (This may not be possible in some browsers.) Then, the uploaded binary can be compared with the real one.
A phisher, Jane, can construct more elaborate attacks.
One possibility is that she might write a scripting routine. This can be run by the browser if the user [recipient] is viewing the message in a browser, and if she has enabled it to run scripting code present in messages.
The de facto scripting language for use in browsers is JavaScript, but our remarks apply to any scripting language supported by a browser.
The combination of a scripting language and browser may enable the following:
The routine might be able to display an image over the portion of the browser that shows the sender information. The image might say, for example, info@bankO.com, and the body of the message might have text claiming to be from BankO, but with a link to Jane's domain.
The catch here is that the sender field in the message header might have a base domain unrelated to bankO.com. Jane can essentially write any address she wishes here, except one from bankO.com.
Since the script is showing widgets, it is possible that the image of a sender address is generated dynamically. That is, suppose the image is seen by a human as "info@bankO.com". This text might not actually exist as a single string inside the script. Instead, there might be arbitrarily complicated code that builds this string. This is to avoid us being able to simply parse out an address.
This resembles what was described for spammers in Provisional 60521698. There, the spammer was making a dynamic hyperlink in the message body. Here, it is not a hyperlink that is being made, but the dynamic aspect is the same. Likewise, we can evaluate the script in an isolated "sandbox", to try to discern the address it constructs.
The methods of that Provisional can be applied, especially if Jane decides to write infinite loops into her script, to stymie our evaluations. (If in fact we detect an infinite loop in the script, then we cannot evaluate the message further. So we prevent the message from going to its recipient and we exit these steps.)
We can use a heuristic flag, which we term a style, to indicate that the message has an image over the display of the sender field.
We can set one or more style flags if we detect scripting that makes these images.
The more of these flags that are set, the more suspect a message is.
Some of these flags might depend on whether an image blocks a particular part of the displayed message. An important example is where an image blocks the subject line, replacing it with another subject line, unbeknownst to the user. Why would Jane do this?
A common way of viewing messages is to read a message, then go directly to the next message in the list, in ascending or descending order. In doing so, the user is unlikely to notice any discrepancy between what is presented before her in a message, compared to what she might have seen for that message in a far earlier perusal of the list of messages, especially if she has many messages.
Every target in Gamma has an optional list of misspellings. We can then apply these to the sender base domain, to detect possible phishing. If we get a match, then apply the steps in Section 1.
The misspelling list for a given bank obviously can have some subjectivity in its construction. But listing the obvious misspellings forces Jane to choose far more blatant misspellings, that the recipient is more likely to detect.
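The misspelling test on the sender's base domain amounts to a lookup, as in the Python sketch below. The variants listed for banko.com are invented purely for illustration, and the function name `impersonated_target` is hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical misspelling lists, keyed by the real (lowercased) target domain.
MISSPELLINGS: Dict[str, Set[str]] = {
    "banko.com": {"bamko.com", "bannko.com", "banko.net"},
}

def impersonated_target(sender_base_domain: str,
                        misspellings: Dict[str, Set[str]] = MISSPELLINGS) -> Optional[str]:
    """Return the target whose misspelling list contains the sender domain, if any."""
    domain = sender_base_domain.lower()
    for target, variants in misspellings.items():
        if domain in variants:
            return target
    return None
```

A non-None result means the sender domain matches a listed misspelling, so the message would then be handled by the steps in Section 1.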
Jane can learn that markO is a partner in the first place, because BankO probably sent out many messages that pointed to it. Under these circumstances, there is no effective way to hide markO's status from Jane. Nor should there be. But not only does this tell her about markO; as an element of psychology, it implicitly validates markO to BankO's recipients. That is, a message purported to be from markO, and which mentions BankO, might have credibility to an unsuspecting user.
Jane sends a message, claiming to be from info@markO.com. It has one or more outgoing links to bankO.com. Or one or more incoming links from bankO.com, where it is loading images from bankO's website, say. Optionally, it might also have incoming or outgoing links to markO.com. All these are fine. But the message also has an outgoing link to somewhere.com, for example. This might be pointing to Jane's website. How do we test for this?
BankO might choose not to have any other companies in its Partner List, except possibly subsidiaries or parent companies. And if so, those entities all have Partner Lists that are consistent with what messages will be sent. This would prevent any links to outside companies in its messages, but it may regard that as acceptable.
BankO might require that its partners also have Partner Lists at the major ISPs or Aggregators. If our method gets widely used, then this may not be an undue restriction, though it may still limit BankO's partners to major companies. BankO may or may not see this as a handicap to its marketing.
Rho(p, bankO.com) means that for any messages coming from p which have links to or from BankO, the only valid outgoing links are those in Rho.
Rho is acceptable
BankO sends these to the ISPs or the Aggregator. Later, when we (the ISP) get a message claiming to be from markO.com, and which has links to bankO.com and other addresses, we check that these are all in Rho(markO.com, bankO.com). If not, then we designate the message as phishing.
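The ISP's test against Rho can be sketched in Python as below. The dictionary representation of Rho and the function name `violates_rho` are assumptions made for illustration. The sketch also assumes that Rho itself lists the sender's and the bank's own base domains, and that a sender with no Rho entry is left to the other tests rather than flagged here.

```python
from typing import Dict, Iterable, Set, Tuple

# Rho maps (sender base domain, bank base domain) to the allowed link domains.
Rho = Dict[Tuple[str, str], Set[str]]

def violates_rho(sender: str, link_domains: Iterable[str], bank: str, rho: Rho) -> bool:
    """True if a message from `sender` that links to `bank` has a link outside Rho."""
    links = {d.lower() for d in link_domains}
    bank = bank.lower()
    if bank not in links:
        return False  # the rule only covers messages with links to or from the bank
    allowed = rho.get((sender.lower(), bank))
    if allowed is None:
        return False  # assumption: no Rho entry, so this particular test is skipped
    return not links <= allowed
```

With Rho(markO.com, bankO.com) containing only markO.com and bankO.com, a message from markO.com that also links to somewhere.com is designated as phishing.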
A bank may choose to have a redirector on its website. Probably for use by its own web pages or by its partners. The danger is if the redirector will redirect a query to an arbitrary URL given in the query.
A phisher can send messages, claiming to be from the bank. All the links in the message are to the bank, so everything appears correct. But one link is to the redirector, telling it to redirect to the pharm website.
The bank should check the base domain in the destination of any redirector query against a list of valid destinations. This could be a separate list, analogous to the Partner List, which hitherto has been for valid destinations in outgoing messages from the bank. But the bank may find it convenient to also use the Partner List for redirections. Hitherto, we have discussed only an ISP using the bank's Partner List.
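The destination check on a redirector query can be sketched in Python as follows. The redirector path `/redirect` and the query parameter name `url` are hypothetical, since the description does not fix a redirector format, and `base_domain` is a naive two-label reduction used only to keep the example short.

```python
from typing import Optional, Set
from urllib.parse import parse_qs, urlsplit

def base_domain(host: str) -> str:
    """Naive reduction to the last two labels (ignores suffixes like com.au)."""
    return ".".join(host.lower().split(".")[-2:])

def redirect_destination(link: str, path: str = "/redirect", param: str = "url") -> Optional[str]:
    """Return the destination URL carried by a redirector link, else None."""
    parts = urlsplit(link)
    if parts.path != path:
        return None
    values = parse_qs(parts.query).get(param)  # parse_qs also percent-decodes
    return values[0] if values else None

def redirect_allowed(link: str, valid_destinations: Set[str]) -> bool:
    """True if the link is not a redirector, or redirects to an approved domain."""
    destination = redirect_destination(link)
    if destination is None:
        return True
    host = urlsplit(destination).hostname
    return host is not None and base_domain(host) in valid_destinations
```

The same function serves the bank, which applies it before honoring a redirect, and an ISP holding the bank's Partner List, which applies it to links found in a message.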
Consider ISPO, which has the bank's Partner List. When it gets a message claiming to be from the bank, it also checks any links to the bank. If a link has a redirector, then ISPO checks the destination's base domain against the Partner List.
2.10 Extending to Persons and Smaller Companies
The recipient's computer or ISP may have to contact the sender's computer or ISP, or a key handling authority, to obtain the sender's public key, if the method involves public/private key techniques. So this will consume some outgoing and incoming bandwidth. Perhaps just as importantly, the latency can be mostly the time spent waiting for the other computer to process the query.
Using a Partner List should usually be faster than any secure authentication method. Plus, there is no ancillary bandwidth and latency cost, because all this can be done on the ISP's computers, or by the plug-in on the user's computer.
The Notphish tag could have an argument that was the hash (or some other type of signature) of the message or web page.
The hash was done with the Notphish tag absent.
The tag is inserted into the web page or outgoing message, with the hash included.
When the plug-in sees the Notphish tag, it might remove the tag, compute the hash of the remaining message or page, and compare with the tag's hash value. If the two disagree, then the message or page is invalid. If the two agree, then the plug-in should still consult the Aggregator, which has presumably been furnished with valid hashes of messages or pages, by the banks that wrote them.
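The hash argument can be illustrated with the Python sketch below. Several details are assumptions not fixed by the description: the hash variable is called "h", the algorithm is SHA-256, the tag is placed at the front of the document, and the function names are hypothetical.

```python
import hashlib
import re

TAG_RE = re.compile(r'<notphish ([^<>]*) />')
ATTR_RE = re.compile(r'([a-z]+)="([^"]*)"')

def add_tag_with_hash(document: str, address: str) -> str:
    """Author side: hash the document with the tag absent, then insert the tag."""
    digest = hashlib.sha256(document.encode("utf-8")).hexdigest()
    return f'<notphish a="{address}" h="{digest}" />' + document

def hash_matches(tagged: str):
    """Plug-in side: remove the tag, rehash the remainder, compare with the tag.

    Returns None if there is no tag, otherwise True or False.
    """
    match = TAG_RE.search(tagged)
    if match is None:
        return None
    attrs = dict(ATTR_RE.findall(match.group(1)))
    remainder = tagged[:match.start()] + tagged[match.end():]
    digest = hashlib.sha256(remainder.encode("utf-8")).hexdigest()
    return digest == attrs.get("h")
```

A True result only shows that the page is internally consistent. Since a phisher could recompute the hash after editing a copied page, the plug-in should still compare against the hashes held by the Aggregator.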
A simple extension of this method is for the ability to do multiple hashes per message or page.
The author could embed begin and end tags for each part that is to be hashed. Preferably, these parts do not overlap, for simplicity.
The names of these tags could be some agreed upon standard.
The plug-in could find the tags and do the hashing. Then it would contact the Aggregator and compare with the hashes that the author's company had uploaded to the Aggregator.
Another variant deals with the case when a message has attachments, where these attachments might not be in plain text or HTML.
The author might hash the attachments, using one hash per attachment, say. Then, these could be uploaded to the Aggregator. Plus, the Notphish tag that is put into the plain text or HTML attachment could have an argument that indicates to a message provider or plug-in that gets the message, that hashes should be made of the other attachments.
The Notphish tag might have an argument that indicated that certain types of scripting should not be in the message or page.
An ISP might offer a premium service to its users, whereby it checks a user's messages. For those with the Notphish tag, it performs the validation steps described above, that the plug-in did. If the ISP finds that a message is invalid, it might delete the message. Or make it available to the user, but in a separate folder (called "Invalid" perhaps), and possibly with the links turned off. This is useful if some of its users do not have the plug-in, and so cannot do the steps of this Invention.
Any method might be used on either side. Though any given method might perhaps have greatest efficacy on a particular side.
The validation includes, but is not limited to, the detection of phishing messages.
In Section 2, a salient point was how a plug-in that runs in the browser can extract the body of a message from a web page. The problem is that, when the web page is displayed in HTML (HyperText Markup Language), we cannot simply find the <body> and </body> tags, and then extract everything in between. These tags refer to the body of the entire page. But when an ISP displays a message, the actual body of the message itself is embedded in some subset of those tags. In general, outside that message body, but still within the overall <body> and </body> tags, the ISP puts other information, including hyperlinks. Some of these links are to itself, so that the user can go to the next and previous messages, for example. But some links may be to third parties. The ISP may be showing ads for those parties.
The plug-in can find from the browser the current URL that the browser is showing. From this, it can reduce the address to a base domain and then compare it to a list of known ISPs. For each ISP on this list, the plug-in can have logic that knows of the specific delimiters used by that ISP. Most often, these delimiters might be implemented as tags. The tags may or may not be comment tags.
The end tag need not necessarily have the same name as the start tag, preceded by a forward slash. That is, suppose the start tag is <messageBody>. Then, following a common XML convention, the end tag would be written as </messageBody>. But this is not necessary.
The end tag might be <endMessage>, for example.
An ISP might inform the company that wrote the plug-in about the exact delimiters that the ISP is using. Or the company might periodically query it. Then, the plug-in might periodically query the company for such information. Or, the company might update the plug-in, for such plug-ins that it knows their network addresses, and that are currently running at that address and can receive such updated information.
Another optional method is for the plug-in to have a list of ISPs, and occasionally, it queries one or more of these, asking for the current delimiters.
The plug-in can cache these, and write these to non-volatile storage, so as to use them the next time that it runs, as well as during the current run.
If the ISP can be queried, by the plug-in or the company, this can be done in the fashion of a Web Service.
The query might simply be "<askDelimiters/>", where this goes to some port on the ISP's web site.
The ISP's reply might be
</delimiters>
Consider an ISP with a given user interface. It wants to change this, and also change to new delimiters. It should convey this information to the company prior to the change, and also to any plug-ins that query it directly. In the information that it conveys, it can include a time, after which the new user interface and delimiters take effect. But there are difficulties in standardizing on a uniform time across a network. Different computers can be set to different times, under the control of their owners. The plug-in has no control of the time on its computer. Hence, to handle the transition to new delimiters, the plug-in should have logic to accommodate both for some period of time.
The Aggregator might also hold such delimiter information. And the ISP might send new information to it, if the ISP will be changing its delimiters.
The ISP can also make the information about a future set of delimiters available as a Web Service. Essentially, for a pair of delimiters, there might be an extra set of tags, like <time>8Aug2007</time>. Or it could be written as an attribute, like <delimiter time="8Aug2007" />. When queried, the ISP's Web Service might return the currently active delimiters, and possibly a future pair.
An ISP might have different user interfaces at any given time. So that its users could choose between these, for example. In general, each interface would have different delimiters. This information can also be communicated to the company or plug-in, in the above fashion. If the ISP decides to make these delimiters available via a Web Service, it could be via the obvious generalization of our example above for a Web Service that offers up one set of delimiters.
ISPs might standardize on a common user interface. At least to the extent that they use the same delimiters for the message body. They could still customize the visual appearance.
The support of an ISP by the plug-in may involve the ISP furnishing a fee to the company that wrote the plug-in.
The plug-in can straightforwardly extract it. Note that the plug-in must start from the first (earliest) start tag, and go to the last end tag, and take everything in between as the message body. This is to prevent a phisher inserting extra start and end tags, to throw off where we extract our links from.
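The rule of taking everything from the first start tag to the last end tag can be sketched in a few lines of Python. The delimiter strings are passed in as parameters, since each ISP uses its own, and the function name `extract_message_body` is hypothetical.

```python
from typing import Optional

def extract_message_body(page: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between the FIRST start tag and the LAST end tag."""
    first = page.find(start_tag)   # earliest start delimiter
    last = page.rfind(end_tag)     # latest end delimiter
    if first == -1 or last == -1:
        return None
    begin = first + len(start_tag)
    if last < begin:
        return None
    return page[begin:last]
```

If a phisher inserts an early end tag followed by a link and a fresh start tag, the link still falls inside the extracted body, so its domain is still compared with the Partner List.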
One utility is to let the plug-in attempt to extract the purported sender line. Which could then be used in further validating the message.
The ISP can make this list available via a Web Service.
An ISP might also modify existing links in a message body. For example, it might change an existing link in the body to point to an address run by the ISP. The rest of the contents of this altered link contain the original link information, modified in some fashion. If the user picks the link, the browser goes to the ISP's address, passing it the arguments in this altered link. The ISP's server at that address uses these arguments to contact the original link.
An example is Hotmail, which is owned by Microsoft Corp.
The plug-in should be able to undo the ISP's changes, in order to extract the original base domain.
This treatment is similar to how the plug-in finds the delimiters for the message body.
When the plug-in finds the base domain of the page, and sees that it is a known ISP, it can look up in its internal data if that ISP does this link changing. If so, the plug-in might have a prerecorded list of the ISP's domains and it can check the domain in the link against this list. If there is a match then the plug-in knows that the original domain is in the arguments of the link. Again, for a given ISP, the plug-in may have logic that can unravel these arguments and find the original domain. The computational effort of this to the plug-in is not expected to be high, because the ISP itself has to perform this unraveling on its machine, as quickly as possible, in order to scale to large workloads.
The ISP could implement the query answering as a Web Service.
The information from the ISP is of two types. Firstly, a list of the addresses of its servers, that it writes as the addresses in the changed links. Secondly, it needs to convey information about its encoding scheme for moving the original link into the arguments of the changed link. There may need to be an a priori agreement between the ISP and the plug-in's company as to a simple, standardized language or convention that expresses this mapping. It would then also be of use to other ISPs that wished to change links. Because by describing their mappings in this language, they could still carry out their changes, and their customers could still use our methods to validate their messages. Note that this does not constrain an ISP in what rewriting changes it wishes to make to a link, provided that those changes can be expressed in that language.
ISPs have an incentive to conform to our methods described here. And to inform an Aggregator or the plug-in's company in advance of any changes to their user interfaces. This does not constrain their freedom of action in designing unique user interfaces.
The plug-in can then dynamically accommodate their changes, and offer their end users protection against fraudulent electronic messaging that is received by the ISPs.
One method of verification could be via insertion of a tag into Gamma. That is, if Gamma really came from Chi, then Chi would have placed this tag into it. Preferably at or near the front of Gamma, so that it can be quickly detected, and the contents of this tag read by some kind of loading program ("Loader"). While the format of the tag is broadly arbitrary, we can take a preferred implementation to be the Notphish tag, perhaps in the form of this example -
The Loader is a program in the operating system. Here, part of its duties can be similar to those of the browser plug-in discussed earlier, where the Loader's relationship to the operating system is like that of the plug-in to the browser.
The Loader can read the tag and ask the Aggregator for data about chi.com. First, is it a customer of the Aggregator? If not, then the tag is false, and the Loader might mark the program as invalid (a virus perhaps?) and deal with it accordingly. (Deletion?)
The Aggregator information associated with the id 1258d might include a hash of the program and a code designating which hash algorithm was used. So the Loader can use this algorithm against Gamma and compare the resultant hash with the downloaded hash. If different, then Gamma can be marked as suspect.
Chi might at some earlier time have uploaded to the Aggregator a Partner List and Behavioral List for Gamma.
the Partner List is a
Prior to Runner being activated on Gamma, Loader can also attempt to extract network addresses and protocols from Gamma. If any are not on the Partner List, then Loader can mark Gamma as suspect.
Runner can scrutinize any network communications made by Gamma, prior to these connections actually being made. If it finds an address or protocol combination not in the Partner List, then it can terminate Gamma and inform Sarah.
This runtime assessment differs from the HTML/plain text message or web page, because those usually have static addresses, which permit an extraction by the plug-in, before the browser displays the data. But for an arbitrary program Gamma, which is not malware, it may have valid reason to generate a dynamic address.
The Behavior List shows the types of operations that Gamma will or might perform. For example, will Gamma change any existing files? Will it delete any existing files? Will it use any encryption routines already present on the computer? Does it contain encryption routines? Can it run with the privileges of a typical user, or does it need system privileges? Etc.
Our method lets a reputable company release a program with several generically suspect behaviors, without it being "hit" by conventional analysis. Each company defines the correct behaviors of its programs, independently of other companies.
An attachment might be some arbitrary file, or an encoding of that file. The latter might arise if the file has binary data, whereupon a common encoding method like Base64 or uuencode could be used to convert the file to an ASCII representation for transmission via email. Typically, many email handling programs require or prefer that messages be restricted to ASCII symbols.
The Notphish tag might be inside the attachment body.
The information that would be inside the Notphish tag might be written in the attachment header.
The information might, optionally but preferably, be written in the form "X-Notphish " where the "X-" is used to start a line in the attachment header. There might be several such Notphish lines. This "X-" prefixing is the standard method in email for third parties to attach custom lines to a header.
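Reading such header lines is straightforward with a standard mail parser, as in this Python sketch. It assumes that each line has the usual "X-Notphish: value" header form and that the value repeats the tag's name="value" pairs. Both points are assumptions for illustration, and `notphish_from_part` is a hypothetical name.

```python
import re
from email import message_from_string

ATTR_RE = re.compile(r'([a-z]+)="([^"]*)"')

def notphish_from_part(part_text: str) -> dict:
    """Collect the variables from every X-Notphish line in an attachment header."""
    part = message_from_string(part_text)
    values = {}
    # get_all returns every header line with this name; lookup is case-insensitive.
    for line in part.get_all("X-Notphish", []):
        values.update(ATTR_RE.findall(line))
    return values
```

Because the information sits in the header, a binary attachment encoded with Base64 does not have to be decoded before the Notphish data can be read.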
Each item in the Updater List might be a network address (like a URL), such that the act of selecting it initiates a download.
our method is not concerned about the format of the updates. But about a simple and safe means for finding these on the web. Then, the Loader (or perhaps the Runner) might download the Updater List from the Aggregator, and contact one of those sites for an update.
the tag information might be encoded in the name of the Notphish file itself, e.g. "Notphish-bank0.com.txt", with the file being empty.
The Loader can then list the actual files in d and in its subdirectories and, at the simplest level, compare these names with those from the Aggregator. If there are extra files or missing files in d, then the Loader might indicate this to Sarah.
The Loader could also compute hashes of some or all of the files, and compare them against the correct hash values given by the Aggregator.
The Aggregator reply can indicate this if needed. Or, sometimes a file takes on one of a (small) range of values, so the file might have a set of hashes associated with it. Again, the Aggregator reply can indicate this if needed.
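The Loader's checks described above can be sketched roughly as below. This is an illustration under several assumptions: the Aggregator reply is modelled as a dictionary from relative file path to a set of acceptable hashes, an empty set stands for a file whose hash is not checked, and SHA-256 is chosen only because the text does not name a hash function. All names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_directory(root: Path, expected: dict[str, set[str]]) -> dict[str, list[str]]:
    """Compare the files under root with the list from the Aggregator.

    expected maps a relative path to the set of acceptable hashes;
    an empty set means the file's hash is not checked.
    """
    actual = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    expected_names = set(expected)
    report = {
        "extra": sorted(actual - expected_names),
        "missing": sorted(expected_names - actual),
        "modified": [],
    }
    for rel in sorted(actual & expected_names):
        allowed = expected[rel]
        if allowed and sha256_of(root / rel) not in allowed:
            report["modified"].append(rel)
    return report
```

A non-empty "extra", "missing" or "modified" list is what the Loader might indicate to Sarah. Allowing a set of hashes per file covers the case of a file that takes one of a small range of values.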
The recipient can have a program runtime security policy.
Each policy might also have specific cases pertaining to different companies that author these tagged programs, even though all these companies are customers of the Aggregator and are thus presumed to be reputable. For example, suppose Sarah's computer runs a Microsoft Corp. operating system. She might have a policy that is willing to grant more privileges to a program that has been verified to be from Microsoft, while programs from any other computer company run under more restricted privileges.
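One way to picture such a per-company policy is a default privilege level plus a table of exceptions. The privilege names ("elevated", "restricted", "deny"), the domain keys and the class name are all invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RuntimePolicy:
    """Recipient-side policy: a default level plus per-company overrides."""
    default_privilege: str = "restricted"
    # Keys are lower-case company identifiers as verified via the Aggregator.
    per_company: dict[str, str] = field(default_factory=dict)

    def privilege_for(self, verified_company: Optional[str]) -> str:
        # A program whose author could not be verified is not run at all.
        if verified_company is None:
            return "deny"
        return self.per_company.get(verified_company.lower(), self.default_privilege)


sarah_policy = RuntimePolicy(per_company={"microsoft.com": "elevated"})
```

With this policy, a program verified as coming from `microsoft.com` runs with the "elevated" level, any other verified author gets "restricted", and an unverified program is denied.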
A Partner List of a file, a message or a web page can also have a policy applied to it, so that if an entry is considered undesirable under that policy, then the file is not run, or the message or web page is marked as "bad" by the plug-in. Recall also that if the network addresses are dynamically generated during the running of the program, then an analysis of the static code may be insufficient to determine these addresses. Hence it is useful for the author to send a Partner List to the Aggregator.
Consider a user with a computer. She obtains some data, where the data type could be, but is not limited to, email, web page or file. The obtaining could be via an electronic network (wired or wireless) or by other means, like physical distribution using a DVD or tape. And the obtaining could have been initiated by her (e.g. she goes to a website using a browser), or by an external event (e.g. she gets an electronic message or someone at a trade show hands her a DVD).
The data usually has assertions that it was originally from some company or author. Or these assertions might arise out of the context in which the data was obtained. For the latter case, imagine, for example, that she got the data from a DVD that had a label with that company's name on it, as the author or originator of the data. In general, this purported company is different from whoever or whatever directly passed that data to her.
The data has a tag (maybe several tags). This is some information, written in a standard format, that can identify the company and possibly the data. Her computer extracts the tag and presents the information to an Aggregator. If no such company is supported by the Aggregator, or an id of the data is not recognized, then the Aggregator tells the user that the data is invalid.
The Aggregator downloads to the user's computer various metrics about that data. These metrics were earlier sent to the Aggregator by the company.
The metrics include, but are not limited to, one or more of a Partner List, a Behavior List and one or more hashes.
The user's computer can independently compute one or more of these metrics from its copy of the data. It compares the computed metrics with the downloaded metrics. If any differ, then the data is invalid (or highly suspect).
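The comparison step can be sketched as below. The layout of the Aggregator's reply (a dictionary with optional "hashes" and "partner_list" keys), the use of SHA-256 and the function name are assumptions for illustration. The Partner List check is written as a subset test, meaning every address found locally must appear on the downloaded list, which is one possible reading of "differ".

```python
import hashlib


def validate_data(data: bytes, local_partners: set[str], downloaded: dict) -> bool:
    """Return True only if every locally computed metric matches the reply."""
    # Hash metric: the local hash must be one of the hashes the company registered.
    if "hashes" in downloaded:
        if hashlib.sha256(data).hexdigest() not in downloaded["hashes"]:
            return False
    # Partner List metric: no address in the data may fall outside the list.
    if "partner_list" in downloaded:
        if not local_partners <= set(downloaded["partner_list"]):
            return False
    return True


message = b"Dear customer, please visit bank0.com"
reply = {
    "hashes": [hashlib.sha256(message).hexdigest()],
    "partner_list": ["bank0.com", "cdn.bank0.com"],
}
```

Here `validate_data(message, {"bank0.com"}, reply)` is `True`, while changing a single byte of the message or adding an address that is not on the Partner List makes it `False`.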
The Aggregator acts as a trusted source of information for her computer.
One of its crucial tasks is to only have customers that it has validated in some fashion as reputable.
The network that the user's computer uses to talk to the Aggregator need not be the same as the network that was used to get the data (assuming that the data was obtained via a network). For example, imagine Sarah with a cellphone that has Internet capability and Bluetooth capability. She walks by a billboard with a Bluetooth transceiver. It detects that her phone can receive Bluetooth, and it sends her a message via Bluetooth, offering a discount on some products made by a company, BigStore.com. If the message has a tag, her phone can programmatically try to verify whether it really came from BigStore.com. The phone can contact the Aggregator on the Internet and use our method. So here, our method spans two different types of networks.
The data does not have to be digital. Our examples of electronic message, website and file were inherently digital. But it is also possible that the data exists in an analog form. For example, consider a sheet of paper with written text. Sarah might have a device (like a scanner) that can read this using optical character recognition. The text could include a tag. Once the text is scanned, it can be hashed, and this hash can be compared with that from an Aggregator, using the tag.
The tag might be in a watermark or in glyphs embedded in the page. Here we assume that Sarah's device can also read this encoding and extract the tag.
The data might be a barcode on an item, and within this barcode is a tag, so that a barcode reader can read the barcode, extract the tag and contact an Aggregator to validate the barcode.
Another variant involves the use of the location of the tag or of the user.
The tag might have some position information about itself (perhaps GPS coordinates), which Sarah's phone could verify. And this information might be uploaded to the Aggregator. Or the Aggregator might download position information, which could then be compared with that in the tag. Or, possibly, the tag has no position information. Instead, when Sarah contacts the Aggregator, it downloads approved position information to her. She might then compare this to the position of the tagged item (in this case, the billboard) or to her position. If she is within sufficient proximity (which can also be downloaded from the Aggregator) then the data containing the tag is valid, and she can take a course of action. But if she is out of proximity, then the data is invalid, and that action cannot be done.
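A proximity test of this kind could be sketched with a great-circle distance between two latitude/longitude pairs. The haversine formula, the spherical-Earth radius and the function names are choices made for this example; the text only says that positions (perhaps GPS coordinates) are compared against a downloaded proximity limit.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_proximity(user_pos: tuple[float, float],
                     approved_pos: tuple[float, float],
                     max_distance_m: float) -> bool:
    """True if the user is close enough to the approved position."""
    return distance_m(*user_pos, *approved_pos) <= max_distance_m
```

Both `approved_pos` and `max_distance_m` would come from the Aggregator's reply, so a copied tag presented far from the approved location fails the test.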
Another variant goes back to what can be expressed in a tag. It might have a field that indicates to the user that she should upload to the Aggregator, not just the tag data, but extra information that she supplies. Hitherto, the main focus of our invention has been letting the user validate data. But our method can also be used to perhaps validate the user.
The Aggregator takes the uploaded data from the tag and from the user, and applies logic to determine what it will download. One possible use is to charge the user for asking the Aggregator. Or to limit the number of queries per unit time.
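Limiting the number of queries per unit time could, for instance, be done with a sliding window per user on the Aggregator side. The class name, the user identifier and the choice of a sliding window are assumptions for this sketch; the text does not say how the limit would be enforced.

```python
from collections import defaultdict, deque


class QueryLimiter:
    """Allow at most max_queries per user within any window_seconds span."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window = window_seconds
        self._history: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        stamps = self._history[user_id]
        # Forget queries that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_queries:
            return False
        stamps.append(now)
        return True
```

The caller passes the current time explicitly (for example from `time.monotonic()`), which keeps the class easy to test. Rejected queries are not recorded, so a user who keeps asking while over the limit does not extend her own lockout.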
Junk faxes are a problem.
One possible answer involves the fax getting a message. It might have a policy that it will only print a message that can be validated. That is, the message has a tag, and the fax machine computes a hash of the message, say. Then, it uses the tag to ask an Aggregator for the appropriate hash. By comparing the computed hash with that from the Aggregator, the machine can tell if the message came from the purported sender.
The item in question has the physical form of an id of some kind, like a passport or driver's license, or a smart card or credit card. All of these often have the name of the user, a serial number and sundry data. These may be represented on the card in both analog and digital form.
The analog form is the printing of the data on the item's surface or pages.
The digital form might be held in a magnetic strip on the item, for example.
Sarah and her computer ask for this id from a person, Ralph. He hands over the id and her computer scans in its digital data.
The computer might also have other hardware that can analyze the physical form of the id, to try to see if it is a forgery. Sarah also wants to verify the digital data.
Her computer finds a hash of the digital data. Then, it contacts the Aggregator and uploads the entity's name and the computed hash. The Aggregator replies whether such a hash exists for that entity.
An extension of this method is for the item to have some data represented only in analog form, e.g. written on the surface of the item. Then the scanner might do an analog scan to read and convert this data to digital form, and combine it with the digital data stored on the item, before hashing.
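The hashing of an id, including the extension that mixes in data read by an analog scan, might look like the sketch below. The length prefix, the whitespace trimming of the scanned text, the use of SHA-256 and the in-memory stand-in for the Aggregator's records are all assumptions; the text only says the two kinds of data are combined before hashing.

```python
import hashlib


def id_hash(digital: bytes, analog_text: str = "") -> str:
    """Hash the stored digital data together with optionally scanned text."""
    h = hashlib.sha256()
    # Prefix the digital part with its length so that the boundary between
    # the two parts is unambiguous.
    h.update(len(digital).to_bytes(8, "big"))
    h.update(digital)
    h.update(analog_text.strip().encode("utf-8"))
    return h.hexdigest()


def aggregator_has_hash(known: dict[str, set[str]], entity: str, value: str) -> bool:
    """Stand-in for the Aggregator's reply: does this hash exist for the entity?"""
    return value in known.get(entity, set())


# Hypothetical issuing entity and id contents.
records = {"dmv.example": {id_hash(b"\x01\x02", "RALPH DOE")}}
```

Sarah's computer would compute `id_hash` from its own scan and upload it with the entity's name. A forged id whose printed name or stored data was altered produces a different hash, and the lookup fails.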
Sarah might be superfluous here, unlike her computer. That is, an implementation of our method does not need the manual intervention of Sarah. Instead, Ralph might place his item into the scanner. Or, the scanner might be able to read the information from the item when it comes within a certain proximity of the scanner.
The lower cost of our system may enable more frequent id verification, and thus reduce the incidence of fraud.

Landscapes

Engineering & Computer Science (AREA)
Computer Security & Cryptography (AREA)
Computer Hardware Design (AREA)
Computing Systems (AREA)
General Engineering & Computer Science (AREA)
Computer Networks & Wireless Communication (AREA)
Signal Processing (AREA)
Information Transfer Between Computers (AREA)

PCT/CN2005/001423 2004-09-07 2005-09-07 System and method for phishing detection and electronic advertisement verification WO2006026921A2 (fr)

Applications Claiming Priority (8)

Application Number	Priority Date	Filing Date
US52224504P	2004-09-07	2004-09-07
US60/522,245		2004-09-07
US52245804P	2004-10-04	2004-10-04
US60/522,458		2004-10-04
US52252804P	2004-10-11	2004-10-11
US60/522,528		2004-10-11
US16226605A	2005-09-04	2005-09-04
US11/162,266		2005-09-04

Publications (1)

Publication Number	Publication Date
WO2006026921A2 (fr)	2006-03-16

Family

ID=36036707

Family Applications (1)

Application Number	Title	Priority Date	Filing Date
PCT/CN2005/001423 WO2006026921A2 (fr)	System and method for phishing detection and electronic advertisement verification	2004-09-07	2005-09-07

Country Status (1)

Country	Link
WO (1)	WO2006026921A2 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
WO2007076715A1 (fr) *	2005-12-30	2007-07-12	Metaswarm (Hongkong) Ltd.	System and method for approving web pages and electronic messages
WO2008086924A1 (fr) *	2007-01-16	2008-07-24	International Business Machines Corporation	Method and device for detecting computer fraud
US8856937B1 (en) *	2008-06-27	2014-10-07	Symantec Corporation	Methods and systems for identifying fraudulent websites
US9118704B2 (en)	2012-10-24	2015-08-25	Hewlett-Packard Development Company, L.P.	Homoglyph monitoring

2005
- 2005-09-07 WO PCT/CN2005/001423 patent/WO2006026921A2/fr active Application Filing

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
WO2007076715A1 (fr) *	2005-12-30	2007-07-12	Metaswarm (Hongkong) Ltd.	System and method for approving web pages and electronic messages
WO2008086924A1 (fr) *	2007-01-16	2008-07-24	International Business Machines Corporation	Method and device for detecting computer fraud
US9083735B2 (en)	2007-01-16	2015-07-14	International Business Machines Corporation	Method and apparatus for detecting computer fraud
US9521161B2 (en)	2007-01-16	2016-12-13	International Business Machines Corporation	Method and apparatus for detecting computer fraud
US8856937B1 (en) *	2008-06-27	2014-10-07	Symantec Corporation	Methods and systems for identifying fraudulent websites
US9118704B2 (en)	2012-10-24	2015-08-25	Hewlett-Packard Development Company, L.P.	Homoglyph monitoring

Legal Events

Date	Code	Title	Description
2006-03-16	AK	Designated states	Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
2006-03-16	AL	Designated countries for regional patents	Kind code of ref document: A2 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
2006-05-24	121	Ep: the epo has been informed by wipo that ep was designated in this application
2007-03-08	NENP	Non-entry into the national phase in:	Ref country code: DE
2007-10-31	122	Ep: pct application non-entry in european phase
