WO2002037328A2 - Integrating search, classification, scoring and ranking - Google Patents
Integrating search, classification, scoring and ranking
- Publication number
- WO2002037328A2 WO2002037328A2 PCT/IL2001/000942 IL0100942W WO0237328A2 WO 2002037328 A2 WO2002037328 A2 WO 2002037328A2 IL 0100942 W IL0100942 W IL 0100942W WO 0237328 A2 WO0237328 A2 WO 0237328A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- score
- composite
- query
- component
- document
- Prior art date
Links
- 239000002131 composite material Substances 0.000 claims abstract description 91
- 238000000034 method Methods 0.000 claims description 55
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 238000004891 communication Methods 0.000 claims description 6
- 238000004590 computer program Methods 0.000 claims description 5
- 238000010606 normalization Methods 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 claims 2
- 238000013507 mapping Methods 0.000 claims 1
- 230000008569 process Effects 0.000 description 8
- 230000000699 topical effect Effects 0.000 description 6
- 238000013459 approach Methods 0.000 description 5
- 230000007246 mechanism Effects 0.000 description 4
- 238000001914 filtration Methods 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 238000007635 classification algorithm Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 230000001609 comparable effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Definitions
- a search mechanism typically attaches to each document a set of indexing concepts.
- An indexing concept is a symbol or value that characterizes the document, and is typically used within search queries or within routing queries ("queries" that specify which documents will be routed to an addressee).
- Typical types of indexing concepts include topical categories (also known as controlled keywords, topics, descriptors etc.). These are symbols denoting topical issues, which are usually general or abstract concepts that do not necessarily appear literally in the text. For example, a topical category may be "Company Acquisition". This term, serving as the name of the category, may not appear literally in a document that describes such an event.
- indexing concepts may also be used to determine routine routing of incoming documents to addressees.
- The process of associating indexing concepts with documents (the indexing process) is performed manually, automatically, or by some combination of the two modes.
- indexing concepts that consist of terms and names from the document text
- the indexing process usually involves scanning the text of the document, identifying words, terms and names, and possibly bringing these terms to some canonical form (e.g. the grammatical base form (lemma) of the word).
- the first approach is based on manual definition of the rules, or some other type of logic, by which a document is being classified to a category based on the terms in the text.
- some systems allow users (or administrators) to define complex queries, which may include Boolean and other types of conditions (such as weights and proximity) that the terms in the document should satisfy.
- a document that satisfies these conditions is classified to the category.
- An example of such a system is the Topics™ system developed by Verity Inc., USA.
- the characterization of a category is referred to as the "profile" of the category. Basically, the profile is a weighted vector of terms, but it can include more sophisticated conditions as described above.
- Every document is scored according to the correlation between the profile and the terms that appear in it.
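As a rough illustration of profile-based scoring, the sketch below treats a profile as a plain mapping from terms to weights and scores a document by the weighted relative frequency of the profile terms it contains. The function name `profile_score`, the sample profile and the particular correlation measure are assumptions made for illustration; the text does not fix a formula.

```python
from collections import Counter


def profile_score(profile: dict, document_terms: list) -> float:
    """Score a document against a category profile (a weighted term vector).

    Assumed measure: sum over profile terms of weight * relative frequency
    of the term in the document. Other correlation measures are possible.
    """
    counts = Counter(document_terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(weight * counts[term] / total for term, weight in profile.items())


# Hypothetical profile for the "Company Acquisition" category.
acquisition_profile = {"acquire": 0.9, "merger": 0.8, "shares": 0.4}
document = ["company", "acquire", "shares", "rival", "acquire"]

score = profile_score(acquisition_profile, document)
print(round(score, 2))
```

Note that the category name itself never has to appear in the document: the score depends only on the weighted terms of the profile.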
- the second approach is based on automatic learning of the "logic" which entails the classification of the document to a category.
- Methods belonging to this approach utilize a set of training documents, for which the correct categories are known in advance (usually as the result of manual classification of these documents).
- a learning method may then include a learning phase, in which some model of the category is constructed. For example, such a model may include terms that are highly associated with the category, and possibly some weights that quantify the degree of correlation (entailment) between each term and the category.
- a learning method may be memory based, in which case the learning method simply stores the training data in some useful format.
- the method classifies it automatically by consulting or applying the category model (or by simply comparing the document to the training data, in case of a memory based approach).
- trainable (learning) classification systems are described in: 1. C. Apte and F. Damerau and S. Weiss, 1994. Towards language independent automated learning of text categorization models, in Proceedings of ACM-SIGIR Conference on Information Retrieval.
- Any form of display that takes document scores into account is often sorted by some relevance ranking, which is intended to approximate the degree of relevance of the document to the query.
- the query includes some free-text terms and the scoring is dependent on various known per se criteria such as the number, frequency and positioning of the free text terms in the document.
- the query "laser" is searched within the category "science".
- the scoring of the so retrieved documents takes into account score of categories in addition to other scores.
- the latter include the basic query score (such as the free-text words that were introduced to the query), but as will be explained in greater detail below possibly also other known per se scoring criteria.
- a query is not bound to any free text form and accordingly any form of query that produces a set of results with scores is applicable (hereinafter basic query).
- basic query is a free-text query.
- Other options such as browsing a directory or asking for similar pages are also applicable.
- document should be construed in a broad manner including, but not limited to, text documents represented in various formats, multimedia documents that include audio and/or video.
- after the scoring phase, in which documents are assigned composite scores, there follows a display step in which the documents (or data associated with them, such as titles) are displayed, preferably according to some ranking criterion.
- the ranking is realized by sorting the documents by their composite scores and displaying all of them (or some of them, according to a pre-defined criterion) in, say, descending composite score order.
- the invention is not bound by any particular interface for placing the query(s) or obtaining the query results, and accordingly the appropriate interface may be determined, depending upon the particular application.
- the term category encompasses both pre-determined categories and ad-hoc categories. Accordingly, the score a document is given in relation to a category may be the result of a supervised classification (into pre-determined categories, using some automatic classification method) or an unsupervised classification (into ad-hoc categories, using some clustering algorithm).
- a composite query is composed of at least a basic query component (e.g. a free-text query component) and an indexing concept component, more specifically a category component.
- each of the said components may comprise several sub-components: the basic query component may be a free-text phrase that includes several words; similarly, the indexing concept may include several categories.
- Each document has a composite score for the query as a whole. This score is determined by scores for each of the query components, both the free-text component and the categories (which by themselves may be the result of combining the scores for their sub-components) which are then composed so as to obtain a composite score of the document.
- a method for obtaining a composite score of documents comprising: i) providing a composite query that includes at least basic query component and indexing concept component and obtain at least one document that meet said composite query; ii) calculating a non-Boolean score of said at least one document according to each one of said components; iii) combining said scores so as to obtain a composite score; and iv) displaying at least one of said documents associated with said score.
- the invention further provides a system for obtaining a composite score of documents, comprising: i) means that include user interface for providing a composite query that includes at least basic query component and indexing concept component and obtain at least one document that meet said composite query; ii) means that include processor for calculating a non-Boolean score of said at least one document according to each one of said components; iii) means that include processor combining said scores so as to obtain a composite score; and iv) means that include user interface for displaying at least one of said documents associated with said score.
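The four steps recited above (composite query, per-component non-Boolean scores, combination, display) can be sketched as follows. This is a minimal sketch under stated assumptions: the `ScoredDocument` class, the function name, the sample titles and scores, and the use of an arithmetic mean as the default combiner are all hypothetical, and the component scores are assumed to have been computed already and to lie in [0, 1].

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ScoredDocument:
    title: str
    basic_score: float                                    # non-Boolean score for the basic (free-text) component
    category_scores: dict = field(default_factory=dict)   # non-Boolean score per category


def rank_by_composite_score(documents, categories, combine=mean):
    """Return (title, composite score) pairs in descending composite score order."""
    ranked = []
    for doc in documents:
        # Step ii: one non-Boolean score per query component.
        component_scores = [doc.basic_score]
        component_scores += [doc.category_scores.get(c, 0.0) for c in categories]
        # Step i: a document meets the composite query only if every component matches.
        if min(component_scores) <= 0.0:
            continue
        # Step iii: combine the component scores into a composite score.
        ranked.append((doc.title, combine(component_scores)))
    # Step iv: order the documents for display.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


documents = [
    ScoredDocument("Laser cutting in industry", 0.9, {"science": 0.4}),
    ScoredDocument("Physics of laser cooling", 0.6, {"science": 0.8}),
    ScoredDocument("Laser tag birthday parties", 0.95, {}),
]

results = rank_by_composite_score(documents, ["science"])
for title, score in results:
    print(f"{score:.2f}  {title}")
```

The third document has the highest free-text score but no score in the requested category, so it does not meet the composite query and is not ranked.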
- combining the scores is accomplished by taking into account relationships between the components within the document, such as adjacency.
- a filtering condition is applied to the score of the query so as to consider only documents that match the query at a score that meets the specified filtering criterion.
- with this filtering criterion being a threshold, only those documents whose score exceeds the specified threshold are considered for the subsequent category scoring and score combination steps (which bring about the composite score of the document; the composite score is occasionally referred to, in short, as the score).
- category score of the document is not only combined with query score of the specified document but possibly also with other scores of the documents, e.g. the date of the document. In other words, other factors which are not necessarily related to the specified query/category components may be weighted and combined to the composite score.
- the invention further provides a method for obtaining a composite score of documents, comprising: i) providing a composite query that includes at least indexing concept component that is constituted by at least two sub-components and obtain at least one document that meet said composite query; ii) calculating a non-Boolean score of said at least one document according to each one of said components; iii) combining said scores so as to obtain a composite score; and iv) displaying at least one of said documents associated with said score.
- the invention provides a system for obtaining a composite score of documents, comprising: i) means that include user interface for providing a composite query that includes at least indexing concept component that is constituted by at least two subcomponents and obtain at least one document that meets said composite query; ii) means that include processor for calculating a non-Boolean score of said at least one document according to each one of said components; iii) means that include processor combining said scores so as to obtain a composite score; and means that include user interface for displaying at least one of said documents associated with said score.
- the use of basic query component is obviated.
- a composite query is composed only of an indexing concept component (e.g., that includes several categories), in which case the composite score is determined by combining the scores of the distinct category sub-components.
- FIG. 1 is a generalized schematic illustration of a system in accordance with an embodiment of the invention
- Fig. 2 is a flow chart illustrating a generalized sequence of operation in accordance with a preferred embodiment of the invention.
- Figs. 3A-B illustrate screen results according to hitherto known database search system which will assist in clarifying a category scoring step that is utilized in the system and method of the invention.
- the free-text query component is only one of many possible variants of the basic query component.
- FIG. 1 illustrates a generalized schematic system (10) in accordance with an embodiment of the invention.
- a plurality of user nodes communicate with a server (15) through a communication medium (14), e.g. the Internet.
- the user nodes run, e.g., a browser application and place a query that consists, e.g., of a plurality of free-text key words and possibly some categories.
- the query is processed wholly at the server (15) (or divided between the user node and the server node) and the resulting documents and their associated composite scores are displayed on the user node screen.
- the server holds a database of documents and/or another document repository.
- any user node may be one of the following: a personal computer, a Personal Digital Assistant (PDA), or a cellular telephone. Other variants are applicable, all as required and appropriate. Attention is now directed to Figs. 3A-B, which will assist in understanding the sequence of operation in accordance with a preferred embodiment of the invention.
- U.S. patent 5,924,090 "Method and Apparatus for Searching a Database of Records" discloses a system for searching a database and presenting to the user a small number of categories along with a list of the most relevant documents that satisfy a query.
- the methodology of the Krellenstein patent has a sophisticated clustering algorithm that includes three primary steps: identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories.
- A typical result of the system according to the Krellenstein patent is illustrated in Figs. 3A-B, as extracted from the www.northernlight.com site.
- the free-text component of the query "text categorization" (31) results in 19,215 documents (records) (32) (of which 6 are shown in the first page).
- the documents are assigned to 15 categories (33).
- the set of categories are determined after applying the specified sophisticated clustering including identifying candidate categories, weighting candidate categories (so as to obtain categories score) and displaying a set of search result categories selected from the candidate categories. As specified above the selection depends, of course, upon the so calculated score. It goes without saying that due to the coarse "Boolean" criterion that is used in the technique according to the Krellenstein patent, some categories (such as sport) are displayed notwithstanding the fact that they have low or no relevance.
- the user can repeat this process further narrowing the search with each iteration.
- double clicking the category "Special collection documents" (34) will result in applying the specified steps again giving rise to the search results illustrated in Fig. 3A.
- the category "Special collection documents" stands for the category component of the query and accordingly the composite query includes, in this example, a free-text component "text categorization" and a category component "Special collection documents". As shown, there are 2057 documents (35) in the sought category (36) that, in turn, are assigned to 12 categories (37).
- the score of the free-text component is non-Boolean (e.g. a score that ranges over a fine-tuned scale, as known per se) and the score of the category component is Boolean.
- a document is displayed in the specified category if it belongs thereto and is not displayed if it does not belong thereto.
- Before turning to Fig. 2, it should be noted that the various elements described in Fig. 2 may be implemented in the user and the server nodes, depending upon the particular application. Thus, in accordance with a non-limiting example, the calculating and combining steps are realized at the remote server site.
- Fig. 2 illustrating a flow chart of a generalized sequence of operation in accordance with a preferred embodiment of the invention.
- a composite query is applied to the database (and/or any other document repository) (22), similar to the composite query with a free-text component "text categorization" and category component "Special collection documents" discussed above.
- the composite query is not necessarily applied in one step and, if desired, may be constructed in several stages.
- the free text component is applied as a first step and thereafter the category component is designated.
- the process may be continued iteratively by designating additional free-text components and category components. Having obtained the resulting documents that meet the query, the documents are scored in respect of each component (23).
- the free-text score aims at determining how relevant the key words are to the document, and numerous scoring techniques may be employed to this end, e.g. those of conventional search engines such as the Alta Vista™ search engine, where each document is associated with a non-Boolean score signifying how relevant the document is to the free-text query words. The higher the score, the more relevant the document.
- a non-Boolean score is calculated in respect of the category component.
- the score for the category component may be the one obtained by applying some non-supervised classification algorithm such as e.g. in accordance with the specified Krellenstein Patent.
- the fine tuned score is maintained and utilized in the next step.
- a supervised algorithm such as using profiles for classifying to categories may be utilized.
- a composite score is determined by some mechanism that combines the scores of the distinct scored components.
- the composite score takes into account relationships between the matches of the components in the document, say any one or combination of the following operators: sum, product, average, weighted average, geometric mean, or minimum of the component scores. Insofar as the latter example is concerned, there may be various considerations what operator or operators to employ. By way of non limiting example geometric mean is preferable over average if the composite score should emphasize a significant contribution of every component and not only one of them.
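The operators listed above can be compared side by side with a small sketch. The function name `combine_scores`, its string-valued `operator` parameter and the sample score lists are assumptions made for illustration; the component scores are assumed to be non-negative.

```python
from math import prod


def combine_scores(scores, operator="geometric_mean", weights=None):
    """Combine per-component scores with one of the operators named in the text."""
    if not scores:
        raise ValueError("at least one component score is required")
    if operator == "sum":
        return sum(scores)
    if operator == "product":
        return prod(scores)
    if operator == "average":
        return sum(scores) / len(scores)
    if operator == "weighted_average":
        if weights is None or len(weights) != len(scores):
            raise ValueError("one weight per component score is required")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    if operator == "geometric_mean":
        return prod(scores) ** (1.0 / len(scores))
    if operator == "minimum":
        return min(scores)
    raise ValueError(f"unknown operator: {operator}")


balanced = [0.5, 0.5]     # both components contribute moderately
lopsided = [0.95, 0.05]   # one strong component, one nearly absent

for name in ("average", "geometric_mean", "minimum"):
    print(name, combine_scores(balanced, name), combine_scores(lopsided, name))
```

Both documents have the same average, but the geometric mean of the lopsided document is much lower, which is the behavior the text describes when every component should contribute significantly.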
- the combination step may employ not only "mathematical" (mathematical encompasses also "logical") operators, e.g. of the kind specified.
- other operators are employed in addition to or in lieu of the specified mathematical operators. For example, the order of components in the query may be taken into account, where e.g. the later the component the more weight it receives.
- certain components a priori receive more weight, say the free-text component benefits from higher weight than the category component etc.
- the combination step utilizes, in addition to or in lieu of the specified operators, proximity/distance operators, one example being the adjacency operator.
- each paragraph is scored by the number of different matches in it.
- a "bonus" is conferred to the overall score as a function of the number of paragraphs with much intersection between the query components.
- the adjacency operator also takes into account the "weight" of the matching profile or free-text query term. That is, terms in the profile and in the query may have strength (profile weight, general term weight in the query - like the known per se Inverted Document Frequency - IDF). The boost entailed by adjacent query and profile terms should be larger if these are terms with high weight.
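One possible reading of the adjacency operator is sketched below: a paragraph contributes to the bonus only when it matches both the free-text query and the category profile, and its contribution grows with the weights of the matching terms. The function name, the multiplicative form of the contribution and the `scale` parameter are assumptions, not a formula given in the text.

```python
def adjacency_bonus(paragraphs, query_weights, profile_weights, scale=0.1):
    """Bonus for paragraphs in which query terms and profile terms co-occur.

    paragraphs: list of token lists, one per paragraph.
    query_weights / profile_weights: term -> weight (e.g. an IDF-like weight
    for query terms and the profile weight for category terms).
    """
    bonus = 0.0
    for tokens in paragraphs:
        present = set(tokens)
        query_strength = sum(w for term, w in query_weights.items() if term in present)
        profile_strength = sum(w for term, w in profile_weights.items() if term in present)
        # Only paragraphs where both components match contribute,
        # and heavier matching terms yield a larger boost.
        if query_strength > 0 and profile_strength > 0:
            bonus += scale * query_strength * profile_strength
    return bonus


paragraphs = [
    ["laser", "physics"],    # query term and profile term together
    ["laser", "football"],   # query term only
    ["optics", "physics"],   # profile terms only
]
bonus = adjacency_bonus(
    paragraphs,
    query_weights={"laser": 1.0},
    profile_weights={"physics": 0.8, "optics": 0.5},
)
print(round(bonus, 2))
```

Only the first paragraph contains matches for both components, so it alone contributes to the bonus.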
- the invention is not bound by the specified mathematical and non-mathematical operators in the score combination step.
- additional components may be utilized.
- the Google™ search engine incorporates factors related to the number and quality of links pointing at a document.
- the specified component may be combined in the composite score e.g. by adding "bonus score" in the case of qualitative links.
- the document's date may also be a factor, where, say, new documents receive a bonus score as compared to older documents.
- Other modified components may be utilized in addition or in lieu of the above, all as required and appropriate.
- the documents are displayed along with their associated score.
- the documents are sorted, ranked and displayed (e.g. as a whole or title or abstract, all as known per se) according to the composite score, say in a descending order.
- the documents are displayed in a hierarchy of categories, according to their classification by some classification algorithm.
- Standard search engines present all matches of a free-text query in a list ordered by match score. Thus there's no need to set a threshold of a minimal score, since the user sees only the first part of the list and can see the rest upon request.
- some documents may have low scores for the composite query, but they are the only documents in some category (note that the query isn't necessarily a free-text query, it might be any combination of free-text queries and category selection operations).
- the category appears in the hierarchy but when the user "drills down" into the category, the documents found there are actually of very low relevance for the query.
- a threshold is set for the minimal score (in the free-text component) a document should have in order to be displayed in the hierarchy. Thus, this category will not be displayed at all. For other categories the threshold may imply a lower number of documents within the category.
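The effect of the threshold on the displayed hierarchy can be sketched as follows. The function name, the tuple layout of the input and the sample data are hypothetical; the sketch assumes each document carries a single category and an already computed free-text score.

```python
from collections import defaultdict


def visible_hierarchy(scored_documents, threshold):
    """Group documents by category, keeping only those at or above the threshold.

    scored_documents: iterable of (title, category, free_text_score).
    Categories left without any qualifying document are not displayed at all.
    """
    hierarchy = defaultdict(list)
    for title, category, score in scored_documents:
        if score >= threshold:
            hierarchy[category].append((title, score))
    # Within each displayed category, list the documents by descending score.
    return {
        category: sorted(items, key=lambda item: item[1], reverse=True)
        for category, items in hierarchy.items()
    }


scored_documents = [
    ("Laser optics primer", "science", 0.82),
    ("Laser tag league results", "sport", 0.07),
    ("Laser pointer recall notice", "science", 0.35),
]

hierarchy = visible_hierarchy(scored_documents, threshold=0.3)
for category, items in hierarchy.items():
    print(category, [title for title, _ in items])
```

With the threshold applied, the "sport" category, whose only document is of very low relevance, does not appear, so the user is never led to drill down into it.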
- the query might be 'the category "science" and the category "news"', resulting in documents that are classified to both these categories
- composite (non-Boolean) score is based only on scores of indexing concepts. If desired other factors may be utilized in order to give rise to composite score, such as date and/or order, all as explained in detail above.
- system may be a suitably programmed computer.
- the invention contemplates a computer program being readable by a computer for executing the method of the invention.
- the invention further contemplates a machine-readable memory tangibly embodying a program of instructions executable by the machine for executing the method of the invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a system for obtaining a composite score of documents. The system comprises a user interface for submitting a composite query made up of a free-form query component and a category component, and for obtaining documents that meet the composite query. The system also comprises a processor for calculating a non-Boolean score according to each of the components. In addition, the processor is configured to combine the scores so as to obtain a composite score and to display, via the user interface, the documents associated with said score and optionally ranked by these scores.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002210882A AU2002210882A1 (en) | 2000-10-17 | 2001-10-11 | Integrating search, classification, scoring and ranking |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US69030700A | 2000-10-17 | 2000-10-17 | |
US09/690,307 | 2000-10-17 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002037328A2 (fr) | 2002-05-10 |
WO2002037328A3 (fr) | 2003-09-04 |
Family
ID=24771952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2001/000942 WO2002037328A2 (fr) | Integrating search, classification, scoring and ranking | 2001-10-11 | 2001-10-11 |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002210882A1 (fr) |
WO (1) | WO2002037328A2 (fr) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1626356A3 (fr) * | 2004-08-13 | 2006-08-23 | Microsoft Corporation | Method and system for summarizing a document |
US7890539B2 (en) | 2007-10-10 | 2011-02-15 | Raytheon Bbn Technologies Corp. | Semantic matching using predicate-argument structure |
US8280719B2 (en) | 2005-05-05 | 2012-10-02 | Ramp, Inc. | Methods and systems relating to information extraction |
WO2015168397A1 (fr) * | 2014-05-01 | 2015-11-05 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
CN110390094A (zh) * | 2018-04-20 | 2019-10-29 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for classifying documents |
US10635679B2 (en) | 2018-04-13 | 2020-04-28 | RELX Inc. | Systems and methods for providing feedback for natural language queries |
US11620342B2 (en) * | 2019-03-28 | 2023-04-04 | Verizon Patent And Licensing Inc. | Relevance-based search and discovery for media content delivery |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5940821A (en) * | 1997-05-21 | 1999-08-17 | Oracle Corporation | Information presentation in a knowledge base search and retrieval system |
EP1155377A1 (fr) * | 1999-02-25 | 2001-11-21 | Focusengine Software Ltd. | Method and apparatus for dynamically displaying a set of documents organized according to a hierarchy of indexing concepts |
-
2001
- 2001-10-11 WO PCT/IL2001/000942 patent/WO2002037328A2/fr active Application Filing
- 2001-10-11 AU AU2002210882A patent/AU2002210882A1/en not_active Abandoned
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1626356A3 (fr) * | 2004-08-13 | 2006-08-23 | Microsoft Corporation | Method and system for summarizing a document |
US7698339B2 (en) | 2004-08-13 | 2010-04-13 | Microsoft Corporation | Method and system for summarizing a document |
US8280719B2 (en) | 2005-05-05 | 2012-10-02 | Ramp, Inc. | Methods and systems relating to information extraction |
US7890539B2 (en) | 2007-10-10 | 2011-02-15 | Raytheon Bbn Technologies Corp. | Semantic matching using predicate-argument structure |
US11372874B2 (en) | 2014-05-01 | 2022-06-28 | RELX Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
US9626455B2 (en) | 2014-05-01 | 2017-04-18 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
JP2017515249A (ja) * | 2014-05-01 | 2017-06-08 | レクシスネクシス ア ディヴィジョン オブ リード エルザヴィア インコーポレイテッド | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
US10268738B2 (en) | 2014-05-01 | 2019-04-23 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
AU2015253062B2 (en) * | 2014-05-01 | 2020-07-23 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
WO2015168397A1 (fr) * | 2014-05-01 | 2015-11-05 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
US11989195B2 (en) | 2014-05-01 | 2024-05-21 | RELX Inc. | Systems and methods for displaying estimated relevance indicators for result sets of documents and for displaying query visualizations |
US10635679B2 (en) | 2018-04-13 | 2020-04-28 | RELX Inc. | Systems and methods for providing feedback for natural language queries |
US11144561B2 (en) | 2018-04-13 | 2021-10-12 | RELX Inc. | Systems and methods for providing feedback for natural language queries |
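CN110390094A (zh) * | 2018-04-20 | 2019-10-29 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for classifying documents |
CN110390094B (zh) * | 2018-04-20 | 2023-05-23 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for classifying documents |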
US11620342B2 (en) * | 2019-03-28 | 2023-04-04 | Verizon Patent And Licensing Inc. | Relevance-based search and discovery for media content delivery |
Also Published As
Publication number | Publication date |
---|---|
AU2002210882A1 (en) | 2002-05-15 |
WO2002037328A3 (fr) | 2003-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6826576B2 (en) | Very-large-scale automatic categorizer for web content | |
US7496567B1 (en) | System and method for document categorization | |
US5960422A (en) | System and method for optimized source selection in an information retrieval system | |
US7707201B2 (en) | Systems and methods for managing and using multiple concept networks for assisted search processing | |
AU2005209586B2 (en) | Systems, methods, and interfaces for providing personalized search and information access | |
CA2281645C (fr) | System and method for text processing and retrieval | |
JP4726528B2 (ja) | Related term suggestion for multi-sense queries | |
US7096214B1 (en) | System and method for supporting editorial opinion in the ranking of search results | |
US10445359B2 (en) | Method and system for classifying media content | |
US7676452B2 (en) | Method and apparatus for search optimization based on generation of context focused queries | |
EP1565846B1 (fr) | Information storage and retrieval | |
US5625767A (en) | Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents | |
US6385602B1 (en) | Presentation of search results using dynamic categorization | |
JP3270783B2 (ja) | Method for retrieving multiple documents | |
US6286000B1 (en) | Light weight document matcher | |
US20050060290A1 (en) | Automatic query routing and rank configuration for search queries in an information retrieval system | |
US20020194161A1 (en) | Directed web crawler with machine learning | |
US20040049499A1 (en) | Document retrieval system and question answering system | |
US7340460B1 (en) | Vector analysis of histograms for units of a concept network in search query processing | |
EP1426882A2 (fr) | Information storage and retrieval | |
US7024405B2 (en) | Method and apparatus for improved internet searching | |
WO2004097671A2 (fr) | System and method for generating refined categories for a set of search results | |
US20070112839A1 (en) | Method and system for expansion of structured keyword vocabulary | |
WO2002037328A2 (fr) | Integrating search, classification, scoring and ranking | |
Chung et al. | Developing a specialized directory system by automatically classifying Web documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase | Ref country code: JP |