WO1999034307A1 - Extraction server - Google Patents
- Publication number
- WO1999034307A1 (PCT/US1998/027664)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic document
- section
- document
- concepts
- semantic network
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
Definitions
- Figure 7 is a block diagram of a preferred embodiment of the section processors for a resume document type.
- the document pre-processor 210 identifies the file by the Microsoft Word signature and uses the Microsoft Object Linking and Embedding Software Development Kit (Microsoft OLE 2.0 SDK) to extract text from the Microsoft Word File.
- Phone numbers are strings of contiguous digits following a pattern.
- City: after identifying the zip code, state, e-mail and phone numbers, the city can be recognized by searching the area where the state and zip code were found.
- the experience section starts with a keyword indicating the beginning of the experience section.
- Job titles can be listed in any case, not necessarily in the case listed in the database.
- the education section processor first identifies the education section within the document, identifies the individual record listings, extracts the start and end dates from each of the listings, extracts the degree name from each of the listings, extracts major names from each of the listings, extracts the university/institute name from each of the listings, extracts the GPA from each of the listings, extracts the status of the degree from each of the listings, and stores the extracted information in an Education Record in the target database 110.
- the heuristics applied by the experience section processor include the following:
- the end of the education section is a different position, after the beginning of the education section, that the document pre-processing module has already identified as a section beginning based on the formatting information. Identifying Individual Records:
- GPA listing follows a pattern like N NN or N NN/NN 0, where N is a digit. The GPA value usually follows the keyword 'GPA'.
- the awards and honors section typically lists the awards and honors received by the candidate. This is usually a short section listing items in one of the standard formats. In this section, each item is typically either preceded by a bullet character such as '*', listed in one paragraph, listed on one line, or arranged in a multi-column format (several items on one line, arranged in different columns). Each listing in this section may contain the date an award or an honor was obtained and a highlight of the award.
- the patents section processor first identifies the patents section within the document. Then the patents section processor recognizes the pattern or format of the listing, identifies the individual record listings, extracts the date when a patent was granted/filed, extracts the title of the patent, extracts the status of the patent, extracts the patent number, and finally, stores the extracted information in a Patents Record in the target database 110.
- the resume may contain a publications section which lists the books, technical articles, journal articles and any other publications by the candidate.
- Each listing in this section may contain the date of publication, publication name, publisher name, publication type, ISBN number if any, page range if any.
- the publications section processor extracts the following information from the listing of each publication record: the date of publication, the ISBN, the page range, the publication type, the publication name, and the publisher name
- the publications section processor first identifies the publications section within the document. The publications section processor then recognizes the pattern or format of the listing, identifies the individual record listings, extracts the date of the publication, extracts the ISBN, extracts the page range, extracts the publication type, extracts the publication name, extracts the publisher name, and finally, stores the extracted information in a Publications Record in the target database 110.
- the heuristics used by the publications section processor include:
- Extracting ISBN: ISBN patterns are stored in the knowledge base of the extraction server, which is user configurable.
- Page range is usually followed by keywords such as 'pp.' or 'pages'.
- Page range listings follow a pattern such as N-N, where N is a number.
- Publisher name usually contains keywords such as 'Publishing Co.' or 'Publishers'. These keywords are also user configurable.
- Table 10 illustrates the preferred column headings and descriptions for an Experience Detail Record for storing information pertaining to a candidate's experience in the target database 110.
- the database record illustrated in Table 10 stores the information extracted from the resume pertaining to one project or a job done at a particular company.
- a candidate may have more than one Experience Detail Record.
- An Experience Detail Record is created for each of the projects that were mentioned in the experience section of the resume.
- Table 12 illustrates the preferred column headings and descriptions for an Awards-Honors Record.
- Table 13 illustrates the preferred column headings and descriptions for a Course Record.
- Table 14 illustrates the preferred column headings and descriptions for a Patent Record.
- a single record is created for each patent mentioned in the resume.
- Table 15 illustrates the preferred column headings and descriptions for a Publication Record.
- a single record is created for each publication mentioned by the candidate.
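
Several of the heuristics listed in the definitions above (the N-N page range near a page keyword, publisher keywords, the 'GPA' keyword, and phone numbers as digit patterns) lend themselves to regular expressions. The following is a minimal sketch only, not the patent's implementation. The function names, the keyword tuples, the North American phone layout, and the decimal GPA form (for example 3.75 or 3.75/4.0) are all assumptions made for illustration; the source states only that such patterns and keywords exist and are user configurable.

```python
import re

# Assumed keyword lists: the source says these are user configurable
# and gives only these examples.
PAGE_KEYWORDS = ("pp.", "pages")
PUBLISHER_KEYWORDS = ("Publishing Co.", "Publishers")

# Page range of the form N-N, where N is a number.
PAGE_RANGE_RE = re.compile(r"\b(\d+)\s*-\s*(\d+)\b")
# Assumption: a GPA is a decimal such as 3.75, optionally followed by "/4.0",
# appearing shortly after the keyword 'GPA'.
GPA_RE = re.compile(r"GPA\D{0,10}?(\d\.\d{1,2})(?:\s*/\s*(\d\.\d{1,2}))?", re.I)
# Assumption: North American phone layout; the source does not give the pattern.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")
# One or more capitalized words ending in a publisher keyword.
PUBLISHER_RE = re.compile(
    r"\b((?:[A-Z][\w&]*\s+)+(?:%s))" % "|".join(map(re.escape, PUBLISHER_KEYWORDS))
)


def extract_page_range(text, window=8):
    """Return (first, last) for an N-N range that has a page keyword nearby."""
    for m in PAGE_RANGE_RE.finditer(text):
        nearby = text[max(0, m.start() - window): m.end() + window].lower()
        if any(k in nearby for k in PAGE_KEYWORDS):
            return int(m.group(1)), int(m.group(2))
    return None


def extract_gpa(text):
    """Return (value, scale) as strings; scale is None when absent."""
    m = GPA_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


def extract_publisher(text):
    """Return the capitalized phrase that ends in a publisher keyword."""
    m = PUBLISHER_RE.search(text)
    return m.group(1) if m else None


def extract_phone(text):
    """Return the first phone-like digit string, or None."""
    m = PHONE_RE.search(text)
    return m.group(0) if m else None
```

Requiring a page keyword within a few characters of the N-N match keeps other hyphenated digit strings, such as an ISBN, from being reported as a page range. The window size is an arbitrary choice here and would need tuning against real listings.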
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9923074A GB2338807A (en) | 1997-12-29 | 1998-12-28 | Extraction server for unstructured documents |
AU19482/99A AU1948299A (en) | 1997-12-29 | 1998-12-28 | Extraction server for unstructured documents |
PCT/US1999/026083 WO2000026839A1 (fr) | 1998-11-04 | 1999-11-03 | Advanced model for automatic extraction of skill and knowledge information from an electronic document |
GB0113250A GB2359168A (en) | 1998-11-04 | 2001-05-31 | Advanced model for automatic extraction of skill and knowledge information from an electronic document |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US6892097P | 1997-12-29 | 1997-12-29 | |
US60/068,920 | 1997-12-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999034307A1 (fr) | 1999-07-08 |
Family
ID=22085559
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1998/027664 WO1999034307A1 (fr) | 1997-12-29 | 1998-12-28 | Extraction server |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU1948299A (fr) |
GB (1) | GB2338807A (fr) |
WO (1) | WO1999034307A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2428114A (en) | 2005-07-08 | 2007-01-17 | William Alan Hollingsworth | Data Format Conversion System |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5297039A (en) * | 1991-01-30 | 1994-03-22 | Mitsubishi Denki Kabushiki Kaisha | Text search system for locating on the basis of keyword matching and keyword relationship matching |
1998
- 1998-12-28 WO PCT/US1998/027664 patent/WO1999034307A1/fr active Application Filing
- 1998-12-28 GB GB9923074A patent/GB2338807A/en not_active Withdrawn
- 1998-12-28 AU AU19482/99A patent/AU1948299A/en not_active Abandoned
Non-Patent Citations (4)
Title |
---|
ASHISH N ET AL: "Semi-automatic wrapper generation for Internet information sources", PROCEEDINGS OF THE SECOND IFCIS INTERNATIONAL CONFERENCE ON COOPERATIVE INFORMATION SYSTEMS, COOPIS'97 (CAT. NO.97TB100143), PROCEEDINGS OF COOPIS 97: 2ND IFCIS CONFERENCE ON COOPERATIVE INFORMATION SYSTEMS, KIAWAH ISLAND, SC, USA, 24-27 JUNE 1997, ISBN 0-8186-7946-8, 1997, Los Alamitos, CA, USA, IEEE Comput. Soc, USA, pages 160 - 169, XP002099173 * |
HAMMER J ET AL: "Extracting semistructured information from the Web", PROCEEDINGS OF THE WORKSHOP ON MANAGEMENT OF SEMI-STRUCTURED DATA, PROCEEDINGS OF WORKSHOP ON MANAGEMENT OF SEMI-STRUCTURED DATA, TUCSON, AZ, USA, 16 MAY 1997, 1997, Murray Hill, NJ, USA, AT & T Labs - Research, USA, pages 18 - 25, XP002099172 * |
NESTOROV S ET AL: "Inferring structure in semistructured data", SEMI-STRUCTURED DATA WORKSHOP HELD IN CONJUNCTION WITH SIGMOD '97, TUCSON, AZ, USA, MAY 1997, vol. 26, no. 4, ISSN 0163-5808, SIGMOD Record, Dec. 1997, ACM, USA, pages 39 - 43, XP002099175 * |
SMITH D ET AL: "Information extraction for semi-structured documents", PROCEEDINGS OF THE WORKSHOP ON MANAGEMENT OF SEMI-STRUCTURED DATA, PROCEEDINGS OF WORKSHOP ON MANAGEMENT OF SEMI-STRUCTURED DATA, TUCSON, AZ, USA, 16 MAY 1997, 1997, Murray Hill, NJ, USA, AT & T Labs - Research, USA, pages 60 - 66, XP002099174 * |
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6510433B1 (en) * | 1997-06-04 | 2003-01-21 | Gary L. Sharp | Database structure having tangible and intangible elements and management system therefor |
US6665680B2 (en) * | 1997-06-04 | 2003-12-16 | Gary L. Sharp | Database structure having tangible and intangible elements and management system therefore |
US7249046B1 (en) * | 1998-10-09 | 2007-07-24 | Fuji Xerox Co., Ltd. | Optimum operator selection support system |
NL1015151C2 (nl) * | 2000-05-10 | 2001-12-10 | Collexis B V | Device and method for cataloguing textual information. |
US7734556B2 (en) | 2002-10-24 | 2010-06-08 | Agency For Science, Technology And Research | Method and system for discovering knowledge from text documents using associating between concepts and sub-concepts |
WO2004042493A3 (fr) * | 2002-10-24 | 2006-03-02 | Agency Science Tech & Res | Method and system for discovering knowledge from text documents |
CN1310175C (zh) * | 2002-11-22 | 2007-04-11 | 国际商业机器公司 | Search engine management system and method |
EP1609081A2 (fr) * | 2003-03-10 | 2005-12-28 | Unisys Corporation | System and method for storing and accessing data in an interlocking trees datastore |
EP1606723A4 (fr) * | 2003-03-10 | 2006-12-20 | Unisys Corp | System and method for storing and accessing data in an interlocking trees datastore |
EP1609081A4 (fr) * | 2003-03-10 | 2006-12-20 | Unisys Corp | System and method for storing and accessing data in an interlocking trees datastore |
EP1703422A3 (fr) * | 2003-03-10 | 2006-12-20 | Unisys Corporation | System and method for storing and accessing data in an interlocking trees datastore |
EP1801717A1 (fr) * | 2003-03-10 | 2007-06-27 | Unisys Corporation | System for storing and accessing data in an interlocking trees datastore |
EP1703422A2 (fr) * | 2003-03-10 | 2006-09-20 | Unisys Corporation | System and method for storing and accessing data in an interlocking trees datastore |
EP1606723A2 (fr) * | 2003-03-10 | 2005-12-21 | Unisys Corporation | System and method for storing and accessing data in an interlocking trees datastore |
US7424480B2 (en) | 2003-03-10 | 2008-09-09 | Unisys Corporation | System and method for storing and accessing data in an interlocking trees datastore |
US7593909B2 (en) | 2003-03-27 | 2009-09-22 | Hewlett-Packard Development Company, L.P. | Knowledge representation using reflective links for link analysis applications |
WO2004088546A3 (fr) * | 2003-03-27 | 2007-12-27 | Electronic Data Syst Corp | Data representation for improved link analysis |
US7580947B2 (en) | 2003-03-27 | 2009-08-25 | Hewlett-Packard Development Company, L.P. | Data representation for improved link analysis |
US7340471B2 (en) | 2004-01-16 | 2008-03-04 | Unisys Corporation | Saving and restoring an interlocking trees datastore |
US7593923B1 (en) | 2004-06-29 | 2009-09-22 | Unisys Corporation | Functional operations for accessing and/or building interlocking trees datastores to enable their use with applications software |
US7716241B1 (en) | 2004-10-27 | 2010-05-11 | Unisys Corporation | Storing the repository origin of data inputs within a knowledge store |
US7908240B1 (en) | 2004-10-28 | 2011-03-15 | Unisys Corporation | Facilitated use of column and field data for field record universe in a knowledge store |
US7499932B2 (en) | 2004-11-08 | 2009-03-03 | Unisys Corporation | Accessing data in an interlocking trees data structure using an application programming interface |
US7418445B1 (en) | 2004-11-08 | 2008-08-26 | Unisys Corporation | Method for reducing the scope of the K node construction lock |
US7348980B2 (en) | 2004-11-08 | 2008-03-25 | Unisys Corporation | Method and apparatus for interface for graphic display of data from a Kstore |
US7409380B1 (en) | 2005-04-07 | 2008-08-05 | Unisys Corporation | Facilitated reuse of K locations in a knowledge store |
US7389301B1 (en) | 2005-06-10 | 2008-06-17 | Unisys Corporation | Data aggregation user interface and analytic adapted for a KStore |
US7676477B1 (en) | 2005-10-24 | 2010-03-09 | Unisys Corporation | Utilities for deriving values and information from within an interlocking trees data store |
US7734571B2 (en) | 2006-03-20 | 2010-06-08 | Unisys Corporation | Method for processing sensor data within a particle stream by a KStore |
US7689571B1 (en) | 2006-03-24 | 2010-03-30 | Unisys Corporation | Optimizing the size of an interlocking tree datastore structure for KStore |
CN103207872A (zh) * | 2012-01-17 | 2013-07-17 | 深圳市快播科技有限公司 | Real-time indexing method and server |
US20150169676A1 (en) * | 2013-12-18 | 2015-06-18 | International Business Machines Corporation | Generating a Table of Contents for Unformatted Text |
US20160188569A1 (en) * | 2013-12-18 | 2016-06-30 | International Business Machines Corporation | Generating a Table of Contents for Unformatted Text |
WO2017017678A1 (fr) * | 2015-07-27 | 2017-02-02 | Opisoft Care Ltd. | System and method for phrase search within a document section |
CN107844497A (zh) * | 2016-09-20 | 2018-03-27 | 天脉聚源(北京)科技有限公司 | Method and system for database retrieval |
WO2021026428A1 (fr) * | 2019-08-07 | 2021-02-11 | Zinatt Technologies, Inc. | Data entry feature for information tracking system |
CN115210708A (zh) * | 2019-08-07 | 2022-10-18 | 齐纳特科技公司 | Data entry feature for information tracking system |
CN115210708B (zh) * | 2019-08-07 | 2023-09-01 | 齐纳特科技公司 | Method and system for processing text data, and non-transitory computer-readable medium |
US11783127B2 (en) | 2019-08-07 | 2023-10-10 | Zinatt Technologies, Inc. | Data entry feature for information tracking system |
US12254270B2 (en) | 2019-08-07 | 2025-03-18 | Zinatt Technologies, Inc. | Data entry feature for information tracking system |
US11829701B1 (en) * | 2022-06-30 | 2023-11-28 | Accenture Global Solutions Limited | Heuristics-based processing of electronic document contents |
Also Published As
Publication number | Publication date |
---|---|
GB2338807A (en) | 1999-12-29 |
AU1948299A (en) | 1999-07-19 |
GB9923074D0 (en) | 1999-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO1999034307A1 (fr) | Extraction server | |
Kowalski | Information retrieval systems: theory and implementation | |
US7257530B2 (en) | Method and system of knowledge based search engine using text mining | |
US8977953B1 (en) | Customizing information by combining pair of annotations from at least two different documents | |
Witten | Text Mining. | |
CN102254014B (zh) | Information extraction method adaptive to web page features | |
Ahmed et al. | Language identification from text using n-gram based cumulative frequency addition | |
US20090259670A1 (en) | Apparatus and Method for Conditioning Semi-Structured Text for use as a Structured Data Source | |
JPH06110948A (ja) | Method for identifying, retrieving and classifying documents | |
Bohne et al. | Efficient keyword extraction for meaningful document perception | |
KR20010108845A (ko) | Word cluster management apparatus and method for query processing in information retrieval | |
Rathod | Extractive text summarization of Marathi news articles | |
WO2007113585A1 (fr) | Methods and systems for indexing and retrieving documents | |
Lardera et al. | Keyword | |
US20070162447A1 (en) | System and method for extraction of factoids from textual repositories | |
US20070179932A1 (en) | Method for finding data, research engine and microprocessor therefor | |
WO2000026839A1 (fr) | Advanced model for automatic extraction of skill and knowledge information from an electronic document | |
Xu et al. | Using SVM to extract acronyms from text | |
Hassel | Evaluation of automatic text summarization | |
Yurtsever et al. | Figure search by text in large scale digital document collections | |
Patel et al. | Influence of Gujarati STEmmeR in supervised learning of web page categorization | |
Tsuboi | Authorship identification for heterogeneous documents | |
JP2002183175A (ja) | Text mining method | |
JP2002278982A (ja) | Information extraction method and information retrieval method | |
Mahdi et al. | A citation-based approach to automatic topical indexing of scientific literature |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
ENP | Entry into the national phase | Ref document number: 9923074 Country of ref document: GB Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: IN/PCT/1999/41/KOL Country of ref document: IN |
WWE | WIPO information: entry into national phase | Ref document number: 09380219 Country of ref document: US |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
122 | EP: PCT application non-entry in European phase | |