WO2018157330A1 - Method and system for partitioning big data
- Publication number
- WO2018157330A1 (PCT/CN2017/075330)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- big data
- word segmentation
- category
- categories
- server
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Definitions
- The present invention relates to the field of data processing, and in particular to a method and system for dividing big data.
- The strategic significance of big data technology lies not in holding huge volumes of data, but in processing that data professionally so that it becomes meaningful.
- For big data, the key to profitability in this industry is to increase the "processing capability" applied to the data and to "add value" to the data through "processing".
- Big data cannot be processed by a single computer; a distributed architecture must be used. Its hallmark is distributed data mining over massive data, which relies on the distributed processing, distributed databases, cloud storage, and virtualization technologies of cloud computing. With the advent of the cloud era, big data has attracted more and more attention.
- The term big data is often used to describe the large amount of unstructured and semi-structured data that a company creates, data that would take too much time and money to load into a relational database for analysis. Big data analytics is often associated with cloud computing because real-time analysis of large data sets requires a framework like MapReduce to distribute work across dozens, hundreds, or even thousands of computers.
- The application provides a method for dividing big data that addresses the inconvenient retrieval of prior-art technical solutions.
- A method for dividing big data, the method comprising the following steps:
- the server receives the big data that needs to be divided;
- the server performs word segmentation on the big data to obtain keywords of the big data;
- the server divides the big data into categories according to the keywords, and marks the resulting categories in the big data.
- The method further includes:
- the server performs word segmentation on the big data by using Baidu word segmentation or natural language word segmentation.
- The method further includes:
- the server divides the categories into primary categories and secondary categories, carries the primary category in the head of the big data, and carries the secondary category in the tail of the big data.
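The three method steps above (receive, segment into keywords, categorize and mark) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: a regular-expression tokenizer stands in for Baidu or natural language word segmentation, and the names `STOP_WORDS`, `CATEGORY_KEYWORDS`, `segment`, `extract_keywords`, `classify`, and `divide` are hypothetical, as are the example categories.

```python
import re
from collections import Counter

# Hypothetical stop words and keyword-to-category table; the source specifies neither.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}
CATEGORY_KEYWORDS = {
    "finance": {"stock", "bank", "loan", "market"},
    "sports": {"match", "goal", "team", "league"},
}


def segment(text):
    """Stand-in word segmentation: split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(tokens, top_n=5):
    """Treat the most frequent non-stop-word tokens as the keywords."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def classify(keywords):
    """Pick the category whose keyword set overlaps the record's keywords most."""
    scores = {
        category: len(words & set(keywords))
        for category, words in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def divide(record):
    """Segment a received record, derive keywords, and mark it with a category."""
    keywords = extract_keywords(segment(record))
    return {"category": classify(keywords), "keywords": keywords, "data": record}


if __name__ == "__main__":
    print(divide("The bank raised the loan rate as the stock market fell"))
```

Storing the category next to the data is what makes later retrieval convenient: a lookup can filter on the `category` field without segmenting the text again.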
- A system for dividing big data, comprising:
- a transceiver unit configured to receive big data that needs to be divided;
- a processing unit configured to perform word segmentation on the big data to obtain keywords of the big data, divide the big data into categories according to the keywords, and mark the resulting categories in the big data.
- The system further includes:
- the processing unit is configured to perform word segmentation on the big data by using a Baidu word segmentation method or a natural language word segmentation method.
- The system further includes:
- the processing unit is configured to divide the categories into primary categories and secondary categories, carry the primary category in the head of the big data, and carry the secondary category in the tail of the big data.
- A third aspect provides a server, including: a processor, a wireless transceiver, a memory, and a bus, wherein the processor, the wireless transceiver, and the memory are connected by the bus, and the wireless transceiver is configured to receive big data that needs to be divided;
- the processor is configured to perform word segmentation on the big data to obtain keywords of the big data, divide the big data into categories according to the keywords, and mark the resulting categories in the big data.
- The processor is configured to perform word segmentation on the big data by using a Baidu word segmentation method or a natural language word segmentation method.
- The processor is configured to divide the categories into primary categories and secondary categories, carry the primary category in the head of the big data, and carry the secondary category in the tail of the big data.
- Because the technical solution provided by the invention classifies big data according to its keywords, it has the advantage of making big data convenient to retrieve.
- FIG. 1 is a flowchart of a method for dividing big data according to a first preferred embodiment of the present invention.
- FIG. 2 is a structural diagram of a big data partitioning system according to a second preferred embodiment of the present invention.
- FIG. 3 is a hardware structural diagram of a server according to a second preferred embodiment of the present invention.
- FIG. 1 shows a method for dividing big data according to a first preferred embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
- Step S101: The server receives big data that needs to be divided.
- Step S102: The server performs word segmentation on the big data to obtain keywords of the big data.
- Step S103: The server divides the big data into categories according to the keywords, and marks the resulting categories in the big data.
- Because the technical solution provided by the invention classifies big data according to its keywords, it has the advantage of making big data convenient to retrieve.
- The server performs word segmentation on the big data by using Baidu word segmentation or natural language word segmentation.
- The server divides the categories into primary categories and secondary categories, carries the primary category in the head of the big data, and carries the secondary category in the tail of the big data.
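One way to picture carrying the primary category in the head and the secondary category in the tail is a simple byte framing. The layout below is an assumption made purely for illustration (a 2-byte length prefix before the primary category and a 2-byte length suffix after the secondary category); the source does not define a wire format, and `mark`, `unmark`, and `read_primary` are hypothetical names.

```python
import struct


def mark(payload: bytes, primary: str, secondary: str) -> bytes:
    """Frame the payload as: [len][primary] payload [secondary][len]."""
    head = primary.encode("utf-8")
    tail = secondary.encode("utf-8")
    return (
        struct.pack(">H", len(head)) + head   # head: 2-byte length, then primary category
        + payload
        + tail + struct.pack(">H", len(tail))  # tail: secondary category, then 2-byte length
    )


def read_primary(blob: bytes) -> str:
    """Read only the head, without touching the payload or the tail."""
    (head_len,) = struct.unpack(">H", blob[:2])
    return blob[2:2 + head_len].decode("utf-8")


def unmark(blob: bytes):
    """Recover (primary, secondary, payload) from a framed record."""
    (head_len,) = struct.unpack(">H", blob[:2])
    (tail_len,) = struct.unpack(">H", blob[-2:])
    tail_start = len(blob) - 2 - tail_len
    primary = blob[2:2 + head_len].decode("utf-8")
    secondary = blob[tail_start:len(blob) - 2].decode("utf-8")
    payload = blob[2 + head_len:tail_start]
    return primary, secondary, payload


if __name__ == "__main__":
    framed = mark(b"quarterly report", "finance", "banking")
    print(read_primary(framed))
    print(unmark(framed))
```

With this kind of layout, a coarse search can filter on the primary category by reading only the first few bytes of each record, and consult the tail only when the secondary category matters.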
- FIG. 2 is a schematic diagram of a big data partitioning system according to a second preferred embodiment of the present invention. As shown in the figure, the system includes:
- The transceiver unit 201 is configured to receive big data that needs to be divided;
- The processing unit 202 is configured to perform word segmentation on the big data to obtain keywords of the big data, divide the big data into categories according to the keywords, and mark the resulting categories in the big data.
- Because the technical solution provided by the invention classifies big data according to its keywords, it has the advantage of making big data convenient to retrieve.
- The processing unit 202 is configured to perform word segmentation on the big data by using a Baidu word segmentation method or a natural language word segmentation method.
- The processing unit 202 is configured to divide the categories into primary categories and secondary categories, carry the primary category in the head of the big data, and carry the secondary category in the tail of the big data.
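The split between a receiving unit and a processing unit can be sketched as two small classes. This is only an illustrative structure under assumptions: the class names, the in-memory queue, the whitespace tokenizer, and the keyword-to-category `rules` table are all hypothetical and are not described in the source.

```python
from collections import deque


class TransceiverUnit:
    """Stands in for unit 201: buffers records that need to be divided."""

    def __init__(self):
        self._inbox = deque()

    def receive(self, record: str) -> None:
        self._inbox.append(record)

    def next_record(self):
        # Return the oldest buffered record, or None when nothing is waiting.
        return self._inbox.popleft() if self._inbox else None


class ProcessingUnit:
    """Stands in for unit 202: segments, finds keywords, categorizes, marks."""

    def __init__(self, rules):
        self._rules = rules  # hypothetical mapping: keyword -> category

    def process(self, record: str) -> dict:
        tokens = record.lower().split()  # simplistic stand-in segmentation
        keywords = [t for t in tokens if t in self._rules]
        category = self._rules[keywords[0]] if keywords else "uncategorized"
        return {"category": category, "keywords": keywords, "data": record}


class PartitioningSystem:
    """Wires the two units together."""

    def __init__(self, rules):
        self.transceiver = TransceiverUnit()
        self.processor = ProcessingUnit(rules)

    def run_once(self):
        record = self.transceiver.next_record()
        return None if record is None else self.processor.process(record)


if __name__ == "__main__":
    system = PartitioningSystem({"goal": "sports", "loan": "finance"})
    system.transceiver.receive("Late goal wins the match")
    print(system.run_once())
```

Keeping reception and processing in separate units means the segmentation strategy can be swapped without changing how records arrive.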
- FIG. 3 shows a server 30, including: a processor 301, a wireless transceiver 302, a memory 303, and a bus 304.
- The wireless transceiver 302 is configured to exchange data with an external device.
- The number of processors 301 may be one or more.
- The processor 301, the memory 303, and the wireless transceiver 302 may be connected by the bus 304 or by other means.
- The server 30 can be used to perform the method steps described above. For the meaning and examples of the terms involved in this embodiment, refer to the embodiment corresponding to FIG. 1; they are not repeated here.
- The wireless transceiver 302 is configured to receive big data that needs to be divided.
- The program code is stored in the memory 303.
- The processor 301 is configured to call the program code stored in the memory 303 to perform the following operations:
- The processor 301 is configured to perform word segmentation on the big data to obtain keywords of the big data, divide the big data into categories according to the keywords, and mark the resulting categories in the big data.
- The processor 301 here may be a single processing element or a general term for multiple processing elements.
- For example, the processing element may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, such as one or more digital signal processors (DSPs) or one or more field-programmable gate arrays (FPGAs).
- The memory 303 may be a single storage device or a collective term for multiple storage elements, and is used to store executable program code, or the parameters and data required for the application running device to operate. The memory 303 may include random access memory (RAM) and may also include non-volatile memory, such as disk storage or flash memory.
- The bus 304 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in FIG. 3, but this does not mean that there is only one bus or one type of bus.
- The server may further include input and output devices connected to the bus 304, so that they are connected to other parts, such as the processor 301, via the bus.
- The input/output device can provide an input interface through which an operator selects control items, or it can be another interface through which other devices are connected externally.
- The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Abstract
The invention relates to a method for partitioning big data, comprising the following steps: a server receives big data to be partitioned (101); the server performs segmentation on the big data to obtain keywords of the big data (102); and the server partitions the big data according to the categories of the keywords and marks the partitioned categories in the big data (103). The technical solution provided by the method has the advantage of allowing a user to carry out document retrieval easily.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/075330 WO2018157330A1 (fr) | 2017-03-01 | 2017-03-01 | Method and system for partitioning big data |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018157330A1 (fr) | 2018-09-07 |
Family
ID=63369609
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/075330 WO2018157330A1 (fr) | 2017-03-01 | 2017-03-01 | Method and system for partitioning big data |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018157330A1 (fr) |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17898706; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.01.2020) |
122 | EP: PCT application non-entry in European phase | Ref document number: 17898706; Country of ref document: EP; Kind code of ref document: A1 |