CN115953214A - Product category labeling method and system across online shopping platforms

Info

Publication number
CN115953214A
Authority
CN
China
Prior art keywords
category
commodity
word set
words
interference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211474346.6A
Other languages
Chinese (zh)
Inventor
张瑞
单震
谢传家
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202211474346.6A
Publication of CN115953214A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for labeling product categories across online shopping platforms, belonging to the technical field of big data processing. The technical problem to be solved is how to screen and label daily necessities across platforms simply and effectively. The method comprises the following steps: collecting product information for products on each online shopping platform, the product information including at least the product name; for each category, collecting feature words of the category and constructing the category's feature word set; for each category, collecting noise words of the category and constructing the category's noise word set; and, for each product collected from the online shopping platforms and for each category, labeling the product with the category label corresponding to the category if and only if the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.

Figure 202211474346

Description

Product category labeling method and system across online shopping platforms

Technical field

The invention relates to the technical field of big data processing, and in particular to a method and system for labeling product categories across online shopping platforms.

Background

Daily necessities such as grain, cooking oil and vegetables are the focus of efforts to ensure market supply and stabilize prices. More and more people buy such daily necessities online, so the daily necessities sold on online shopping platforms are also key monitoring targets.

However, online shopping platforms generally have a large number of categories, and platform categories differ considerably from one platform to another. Some platforms carry products numbering in the millions or tens of millions; on some platforms more than a thousand categories relate to a single target daily necessity; some platforms organize categories into three levels and others into five; on some platforms sellers design the categories themselves. Even when two platforms both use a three-level category structure, the platform categories corresponding to the same kind of daily necessity are not the same.

How to screen and label daily necessities across platforms simply and effectively is the technical problem to be solved.

Summary of the invention

The technical task of the invention is to address the above deficiencies by providing a method and system for labeling product categories across online shopping platforms, so as to solve the technical problem of how to screen and label daily necessities across platforms simply and effectively.

In a first aspect, the invention provides a method for labeling product categories across online shopping platforms, which labels products that are daily necessities across online platforms on the basis of Chinese keywords. The method includes the following steps:

collecting product information for products on each online shopping platform, the product information including at least the product name;

for each category, collecting feature words of the category and constructing the category's feature word set;

for each category, collecting noise words of the category and constructing the category's noise word set;

for each product collected from the online shopping platforms, and for each category, labeling the product with the category label corresponding to the category if and only if the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.
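The labeling rule in the last step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout and the second category ("食用油-花生油") with its word sets are hypothetical and do not come from the patent text.

```python
from typing import Dict, List, Set


def matches_category(name: str, feature_words: Set[str], noise_words: Set[str]) -> bool:
    """Return True iff the name contains at least one feature word and no noise word."""
    has_feature = any(word in name for word in feature_words)
    has_noise = any(word in name for word in noise_words)
    return has_feature and not has_noise


def label_product(name: str, categories: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """Check the product name against every category and collect the matching labels."""
    return [
        label
        for label, sets in categories.items()
        if matches_category(name, sets["feature"], sets["noise"])
    ]


# Hypothetical word sets for two categories (illustrative only).
CATEGORIES = {
    "粮食-大米": {"feature": {"大米", "粳米"}, "noise": {"米粉", "米醋"}},
    "食用油-花生油": {"feature": {"花生油"}, "noise": {"护肤"}},
}

print(label_product("五常大米 5kg", CATEGORIES))   # contains a feature word, no noise word
print(label_product("大米米粉 500g", CATEGORIES))  # contains a noise word, so no label
```

Because each category is tested independently, a product name may in principle receive several labels or none.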

Preferably, for each category, the feature words of the category are collected based on the category's own name, its synonyms and the names of its subcategories.

Preferably, for each category, collecting the noise words of the category and constructing the category's noise word set includes the following steps:

from the products whose names contain any element of the feature word set, randomly selecting a predetermined number of products as sample products;

among the sample products, identifying the products that do not belong to the category as target products;

for the target products, extracting noise words from their product names, and constructing the category's noise word set from the noise words of all target products.

Preferably, for each product collected from the online shopping platforms, and for each category, containment is tested by means of regular expressions and REGEXP_SUBSTR() to determine whether the product satisfies the following condition: the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.
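As a rough Python counterpart to the regular-expression containment test, each word set can be compiled into a single alternation pattern. The helper names below are assumptions made for illustration; the word sets are the rice example given in the description.

```python
import re
from typing import Iterable, Pattern


def compile_word_set(words: Iterable[str]) -> Pattern[str]:
    """Build one pattern that matches any non-empty word in the set, literally."""
    unique = sorted({w for w in words if w}, key=lambda w: (-len(w), w))
    if not unique:
        # An empty set must match nothing; "(?!)" is a lookahead that always fails.
        return re.compile(r"(?!)")
    return re.compile("|".join(re.escape(w) for w in unique))


def satisfies_condition(name: str, feature_re: Pattern[str], noise_re: Pattern[str]) -> bool:
    """True iff the name matches some feature word and no noise word."""
    return bool(feature_re.search(name)) and not noise_re.search(name)


feature_re = compile_word_set(["大米", "粳米", "籼米", "东北大米", "五常大米", "长粒香"])
noise_re = compile_word_set(["米粉", "米线", "米醋", "米饼", "米酥", "芝麻"])

for name in ["五常大米 5kg", "大米米线 400g", "小麦粉 1kg"]:
    print(name, satisfies_condition(name, feature_re, noise_re))
```

Escaping each word with `re.escape` keeps words that happen to contain regex metacharacters from being interpreted as pattern syntax.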

Preferably, the method further includes the following steps:

for each category, periodically collecting feature words of the category and updating the category's feature word set;

for each category, periodically collecting noise words of the category and updating the category's noise word set;

for products that have already been labeled, and for each category, updating the product's labels based on the category's updated feature word set and noise word set.

In a second aspect, the invention provides a system for labeling product categories across online shopping platforms, which labels products by the method for labeling product categories across online shopping platforms according to any implementation of the first aspect. The system includes:

a product collection module, configured to collect product information for products on each online shopping platform, the product information including at least the product name;

a feature word set building module, configured to, for each category, collect feature words of the category and construct the category's feature word set;

a noise word set building module, configured to, for each category, collect noise words of the category and construct the category's noise word set;

a product labeling module, configured to perform the following for each product collected from the online shopping platforms: for each category, labeling the product with the category label corresponding to the category if and only if the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.

Preferably, for each category, the feature word set building module is configured to collect the feature words of the category based on the category's own name, its synonyms and the names of its subcategories.

Preferably, for each category, the noise word set building module is configured to collect the noise words of the category and construct the category's noise word set by performing the following:

from the products whose names contain any element of the feature word set, randomly selecting a predetermined number of products as sample products;

among the sample products, identifying the products that do not belong to the category as target products;

for the target products, extracting noise words from their product names, and constructing the category's noise word set from the noise words of all target products.

Preferably, for each product collected from the online shopping platforms, and for each category, the product labeling module is configured to test containment by means of regular expressions and REGEXP_SUBSTR() to determine whether the product satisfies the following condition: the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.

Preferably, the system further includes an update module, the update module being configured to perform:

for each category, periodically collecting feature words of the category and updating the category's feature word set;

for each category, periodically collecting noise words of the category and updating the category's noise word set;

for products that have already been labeled, and for each category, updating the product's labels based on the category's updated feature word set and noise word set.

The method and system of the invention for labeling product categories across online shopping platforms have the following advantages: a feature word set and a noise word set are constructed for each category; for each product, only the product name needs to be obtained; and for each category, the feature word set and the noise word set are used to determine whether the product belongs to the category and to label it accordingly. Product labeling is thus achieved even though platform categories differ considerably between online shopping platforms.

Description of drawings

To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

The invention is further described below with reference to the drawings.

Fig. 1 is a flow diagram of the method of Embodiment 1 for labeling product categories across online shopping platforms.

Detailed description

The invention is further described below with reference to the drawings and specific embodiments so that those skilled in the art can better understand and practice it. The embodiments given do not limit the invention, and, provided there is no conflict, the embodiments and the technical features within them may be combined with one another.

The embodiments of the invention provide a method and system for labeling product categories across online shopping platforms, which solve the technical problem of how to screen and label daily necessities across platforms simply and effectively.

Embodiment 1:

The invention provides a method for labeling product categories across online shopping platforms, which labels products that are daily necessities across online platforms on the basis of Chinese keywords. The method includes the following steps:

S100. Collect product information for products on each online shopping platform, the product information including at least the product name.

S200. For each category, collect feature words of the category and construct the category's feature word set.

S300. For each category, collect noise words of the category and construct the category's noise word set.

S400. For each product collected from the online shopping platforms, and for each category, label the product with the category label corresponding to the category if and only if the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.

In this embodiment, step S200 determines the feature word set of each category. As a specific implementation, for each category, the feature words are collected based on the category's own name, its synonyms and the names of its subcategories. For example, for "grain - rice" (粮食-大米), the category's feature word set may be {大米 (rice), 粳米 (japonica rice), 籼米 (indica rice), 东北大米 (Northeast rice), 五常大米 (Wuchang rice), 长粒香 (long-grain fragrant rice)}.
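A minimal sketch of assembling a feature word set from the three sources named above follows. The function name is hypothetical, and the way the six rice words are split between synonyms and subcategory names is a guess made for illustration; the patent lists only the resulting set.

```python
from typing import Iterable, Set


def build_feature_words(
    category_name: str,
    synonyms: Iterable[str],
    subcategory_names: Iterable[str],
) -> Set[str]:
    """Union of the category name, its synonyms and its subcategory names."""
    words = {category_name, *synonyms, *subcategory_names}
    # Drop empty or whitespace-only entries, which would match every product name.
    return {w.strip() for w in words if w and w.strip()}


rice_features = build_feature_words(
    "大米",
    synonyms=["粳米", "籼米"],                        # assumed grouping
    subcategory_names=["东北大米", "五常大米", "长粒香"],  # assumed grouping
)
print(sorted(rice_features))
```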

Step S300 determines the noise word set of each category. As a specific implementation, for each category, collecting the noise words of the category and constructing the category's noise word set includes the following steps:

(1) from the products whose names contain any element of the feature word set, randomly select a predetermined number of products as sample products;

(2) among the sample products, identify the products that do not belong to the category as target products;

(3) for the target products, extract noise words from their product names, and construct the category's noise word set from the noise words of all target products.

That is, in this step, a small number of products (for example, 10% of the total, or 100 products) are randomly selected from the products whose names contain any element of the category's feature word set; the selected products that do not actually belong to the category are examined, and the noise word set is distilled from their product names. For "grain - rice" (粮食-大米), the category's noise word set may be {米粉 (rice vermicelli), 米线 (rice noodles), 米醋 (rice vinegar), 米饼 (rice cakes), 米酥 (rice crisps), 芝麻 (sesame)}.
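The sampling part of this procedure might look like the sketch below. Several points are assumptions: the text offers "10% of the total or 100" as alternative examples, while this sketch combines them as 10% capped at 100; the review itself is a manual step, represented here by hypothetical (name, noise word) pairs supplied by a reviewer.

```python
import random
from typing import Iterable, List, Optional, Set, Tuple


def sample_for_review(
    names: Iterable[str],
    feature_words: Set[str],
    fraction: float = 0.10,
    max_items: int = 100,
    seed: Optional[int] = None,
) -> List[str]:
    """Randomly pick a small sample of names that contain at least one feature word."""
    candidates = [n for n in names if any(w in n for w in feature_words)]
    if not candidates:
        return []
    k = round(len(candidates) * fraction)
    k = max(1, min(k, max_items, len(candidates)))
    return random.Random(seed).sample(candidates, k)


def build_noise_words(reviewed: Iterable[Tuple[str, Optional[str]]]) -> Set[str]:
    """Collect the noise words a reviewer extracted from out-of-category names."""
    return {noise for _name, noise in reviewed if noise}


names = ["东北大米 5kg", "大米米线 400g", "桂林米粉", "花生油 1L"]
sample = sample_for_review(names, {"大米"}, seed=0)

# Hypothetical reviewer output: None means the product does belong to the category.
reviewed = [("大米米线 400g", "米线"), ("东北大米 5kg", None)]
print(build_noise_words(reviewed))
```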

Step S400 labels the products. As a specific implementation, for each product collected from the online shopping platforms, and for each category, containment is tested, taking SQL as an example, by means of regular expressions and REGEXP_SUBSTR() to determine whether the product satisfies the following condition: the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set. Other mainstream languages such as Python offer similar functionality.
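To make the SQL variant concrete, the sketch below runs such a query from Python against an in-memory SQLite database. SQLite has no built-in REGEXP_SUBSTR, so a two-argument stand-in is registered here purely for illustration; databases such as Oracle and MySQL 8.0 provide the function natively. The table name, column names and platform identifiers are hypothetical.

```python
import re
import sqlite3


def regexp_substr(text, pattern):
    """Return the first substring of text matching pattern, or None (SQL NULL)."""
    if text is None or pattern is None:
        return None
    match = re.search(pattern, text)
    return match.group(0) if match else None


conn = sqlite3.connect(":memory:")
# Register the stand-in so the SQL below can call REGEXP_SUBSTR(name, pattern).
conn.create_function("REGEXP_SUBSTR", 2, regexp_substr)
conn.execute("CREATE TABLE products (platform TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("A", "五常大米 5kg"), ("B", "大米米线 400g"), ("B", "花生油 1L"), ("C", "东北大米 10kg")],
)

FEATURE_PATTERN = "大米|粳米|籼米|东北大米|五常大米|长粒香"
NOISE_PATTERN = "米粉|米线|米醋|米饼|米酥|芝麻"

# A row is labeled when a feature word is found and no noise word is found.
rows = conn.execute(
    """
    SELECT platform, name, '粮食-大米' AS category_label
    FROM products
    WHERE REGEXP_SUBSTR(name, ?) IS NOT NULL
      AND REGEXP_SUBSTR(name, ?) IS NULL
    """,
    (FEATURE_PATTERN, NOISE_PATTERN),
).fetchall()

for row in rows:
    print(row)
```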

As an improvement of this embodiment, the method further includes the following steps:

for each category, periodically collecting feature words of the category and updating the category's feature word set;

for each category, periodically collecting noise words of the category and updating the category's noise word set;

for products that have already been labeled, and for each category, updating the product's labels based on the category's updated feature word set and noise word set.

In this additional step (S500), the feature word set and noise word set of each category are adjusted dynamically. The sets are adjusted periodically according to actual conditions and are then used to screen and label newly appearing products or to update products that have already been labeled.
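One way to picture the update step is a function that re-applies a category's rule after its word sets change, adding or withdrawing the label as needed. The function name and data layout are assumptions, and the newly added words (丝苗米 as a feature word, 锅巴 as a noise word) are hypothetical examples that do not appear in the patent.

```python
from typing import Dict, Set


def refresh_labels(
    labels: Dict[str, Set[str]],
    category: str,
    feature_words: Set[str],
    noise_words: Set[str],
) -> Dict[str, Set[str]]:
    """Re-evaluate one category for every product name using updated word sets."""
    refreshed: Dict[str, Set[str]] = {}
    for name, current in labels.items():
        updated = set(current)
        hit = any(w in name for w in feature_words) and not any(
            w in name for w in noise_words
        )
        if hit:
            updated.add(category)
        else:
            updated.discard(category)
        refreshed[name] = updated
    return refreshed


# Labels produced earlier with feature words {大米} and noise words {米粉}.
labels = {"大米锅巴 200g": {"粮食-大米"}, "丝苗米 5kg": set()}

# Hypothetical periodic update: one new feature word and one new noise word.
labels = refresh_labels(labels, "粮食-大米", {"大米", "丝苗米"}, {"米粉", "锅巴"})
print(labels)
```

The input mapping is left unmodified and a new mapping is returned, so the labels before and after an update can be compared.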

Online shopping platforms have a large number of categories, and platform categories differ considerably between platforms. The method of this application is simple and generally applicable: only the product name is needed to screen out and label the target daily necessities among the many products on each platform, laying the groundwork for the relevant departments' subsequent work on ensuring supply and stabilizing prices.

Embodiment 2:

The invention provides a system for labeling product categories across online shopping platforms, comprising a product collection module, a feature word set building module, a noise word set building module and a product labeling module. The system labels products by the method disclosed in Embodiment 1.

The product collection module is configured to collect product information for products on each online shopping platform, the product information including at least the product name.

For each category, the feature word set building module is configured to collect feature words of the category and construct the category's feature word set.

As a specific implementation, for each category, the feature word set building module is configured to collect the feature words of the category based on the category's own name, its synonyms and the names of its subcategories. For "grain - rice" (粮食-大米), the category's feature word set may be {大米 (rice), 粳米 (japonica rice), 籼米 (indica rice), 东北大米 (Northeast rice), 五常大米 (Wuchang rice), 长粒香 (long-grain fragrant rice)}.

For each category, the noise word set building module is configured to collect noise words of the category and construct the category's noise word set.

As a specific implementation, the noise word set building module is configured to collect the noise words of the category and construct the category's noise word set by performing the following:

(1) from the products whose names contain any element of the feature word set, randomly select a predetermined number of products as sample products;

(2) among the sample products, identify the products that do not belong to the category as target products;

(3) for the target products, extract noise words from their product names, and construct the category's noise word set from the noise words of all target products.

In this embodiment, the noise word set building module randomly selects a small number of products (for example, 10% of the total, or 100 products) from the products whose names contain any element of the category's feature word set; the selected products that do not actually belong to the category are examined, and the noise word set is distilled from their product names. For "grain - rice" (粮食-大米), the category's noise word set may be {米粉 (rice vermicelli), 米线 (rice noodles), 米醋 (rice vinegar), 米饼 (rice cakes), 米酥 (rice crisps), 芝麻 (sesame)}.

For each product collected from the online shopping platforms, the product labeling module is configured to perform the following: for each category, labeling the product with the category label corresponding to the category if and only if the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.

As a specific implementation, for each product collected from the online shopping platforms, and for each category, the product labeling module is configured to test containment by means of regular expressions and REGEXP_SUBSTR() to determine whether the product satisfies the following condition: the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set. Specifically, in SQL the "contains" test can be implemented with regular expressions and REGEXP_SUBSTR(); other mainstream languages such as Python offer similar functionality.

As an improvement, the system of this embodiment further includes an update module, the update module being configured to perform:

(1) for each category, periodically collecting feature words of the category and updating the category's feature word set;

(2) for each category, periodically collecting noise words of the category and updating the category's noise word set;

(3) for products that have already been labeled, and for each category, updating the product's labels based on the category's updated feature word set and noise word set.

In this embodiment, the update module periodically adjusts the feature word set and noise word set of each category according to actual conditions; the adjusted sets are used to screen and label newly appearing products or to update products that have already been labeled.
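The module structure of this embodiment could be arranged roughly as follows. This is a sketch under assumptions, not the patented implementation: the class and method names are hypothetical, the collection module is reduced to accepting product names already in memory (actual crawling is out of scope), and the noise word 锅巴 is an invented example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class WordSets:
    feature: Set[str] = field(default_factory=set)
    noise: Set[str] = field(default_factory=set)


@dataclass
class CategoryLabelingSystem:
    word_sets: Dict[str, WordSets] = field(default_factory=dict)
    products: List[str] = field(default_factory=list)
    labels: Dict[str, Set[str]] = field(default_factory=dict)

    def collect(self, names: Iterable[str]) -> None:
        """Product collection module: store collected product names."""
        self.products.extend(names)

    def set_feature_words(self, category: str, words: Iterable[str]) -> None:
        """Feature word set building module."""
        self.word_sets.setdefault(category, WordSets()).feature = set(words)

    def set_noise_words(self, category: str, words: Iterable[str]) -> None:
        """Noise word set building module."""
        self.word_sets.setdefault(category, WordSets()).noise = set(words)

    def label_all(self) -> None:
        """Product labeling module: apply every category's rule to every product."""
        for name in self.products:
            self.labels[name] = {
                category
                for category, ws in self.word_sets.items()
                if any(w in name for w in ws.feature)
                and not any(w in name for w in ws.noise)
            }

    def update(self, category: str, feature: Iterable[str], noise: Iterable[str]) -> None:
        """Update module: replace a category's word sets, then relabel everything."""
        self.set_feature_words(category, feature)
        self.set_noise_words(category, noise)
        self.label_all()


system = CategoryLabelingSystem()
system.collect(["五常大米 5kg", "大米锅巴 200g"])
system.set_feature_words("粮食-大米", {"大米"})
system.set_noise_words("粮食-大米", {"米粉"})
system.label_all()
print(system.labels)
```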

The invention has been shown and described in detail above by way of the drawings and preferred embodiments. However, the invention is not limited to the disclosed embodiments; based on the embodiments above, those skilled in the art will appreciate that the technical means of the different embodiments may be combined to obtain further embodiments of the invention, which also fall within its scope of protection.

Claims (10)

1. A method for labeling product categories across online shopping platforms, characterized in that products that are daily necessities are labeled by category across online platforms on the basis of Chinese keywords, the method comprising the following steps:
collecting product information for products on each online shopping platform, the product information including at least the product name;
for each category, collecting feature words of the category and constructing the category's feature word set;
for each category, collecting noise words of the category and constructing the category's noise word set;
for each product collected from the online shopping platforms, and for each category, labeling the product with the category label corresponding to the category if and only if the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.
2. The method for labeling product categories across online shopping platforms according to claim 1, characterized in that, for each category, the feature words of the category are collected based on the category's own name, its synonyms and the names of its subcategories.
3. The method for labeling product categories across online shopping platforms according to claim 1, characterized in that, for each category, collecting the noise words of the category and constructing the category's noise word set comprises the following steps:
from the products whose names contain any element of the feature word set, randomly selecting a predetermined number of products as sample products;
among the sample products, identifying the products that do not belong to the category as target products;
for the target products, extracting noise words from their product names, and constructing the category's noise word set from the noise words of all target products.
4. The method for labeling product categories across online shopping platforms according to claim 1, characterized in that, for each product collected from the online shopping platforms, and for each category, containment is tested by means of regular expressions and REGEXP_SUBSTR() to determine whether the product satisfies the following condition: the product name contains at least one element of the category's feature word set and contains no element of the category's noise word set.
5. The method for labeling product categories across online shopping platforms according to any one of claims 1 to 4, characterized in that the method further comprises the following steps:
for each category, periodically collecting feature words of the category and updating the category's feature word set;
for each category, periodically collecting noise words of the category and updating the category's noise word set;
for products that have already been labeled, and for each category, updating the product's labels based on the category's updated feature word set and noise word set.
6. A system for labeling commodities based on an online shopping platform, characterized in that the system is used for labeling commodities by the method for labeling commodities based on the online shopping platform according to any one of claims 1 to 5, and the system comprises:
a commodity acquisition module, used for acquiring commodity information of commodities in each online shopping platform, wherein the commodity information at least comprises commodity names;
a feature word set building module, used for collecting, for each category, the feature words of the category and building a feature word set of the category;
an interference word set building module, used for collecting, for each category, the interference words of the category and building an interference word set of the category;
a commodity labeling module, used for executing the following for each commodity collected from each online shopping platform: under each category, if and only if the commodity name of the commodity contains any element in the feature word set of the category and does not contain any element in the interference word set of the category, labeling the commodity with a category label corresponding to the category.
7. The system of claim 6, wherein for each category, the feature word set building module is configured to collect feature words of the category based on the category name, category synonyms, and category subclass names.
8. The system of claim 6, wherein for each category, the interference word set building module is configured to perform the following operations to collect interference words of the category and build the interference word set of the category:
among the commodities whose names contain any element in the feature word set, randomly selecting commodities of a preset type as sample commodities;
among the sample commodities, identifying the commodities that do not belong to the category as target commodities;
and for each target commodity, extracting interference words from the commodity name of the target commodity, and building the interference word set of the category based on the interference words of all the target commodities.
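The three operations above can be sketched as follows. Several parts are assumptions, not taken from the claims: the `belongs_to_category` callable stands in for whatever review decides category membership, `sample_size` is a hypothetical sampling parameter (the "preset type" criterion of the claim is not modeled), names are split on whitespace, and a token counts as an interference word when it appears in a target name but in no in-category sample name. Commodity names in Chinese would need a word segmenter instead of `str.split`.

```python
import random
from typing import Callable, Iterable, Set


def collect_interference_words(
    names: Iterable[str],
    feature_words: Set[str],
    belongs_to_category: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> Set[str]:
    """Collect candidate interference words for one category."""
    # Step 1: keep names that contain a feature word, then sample randomly.
    candidates = [n for n in names if any(w in n for w in feature_words)]
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))

    # Step 2: target commodities are sampled names judged to be off-category.
    targets = [n for n in sample if not belongs_to_category(n)]
    in_category = [n for n in sample if belongs_to_category(n)]

    # Step 3 (assumed heuristic): tokens seen only in target names.
    in_category_tokens = {tok for n in in_category for tok in n.split()}
    interference: Set[str] = set()
    for name in targets:
        for tok in name.split():
            if tok not in feature_words and tok not in in_category_tokens:
                interference.add(tok)
    return interference


# Hypothetical data; membership is hard-coded in place of a real review step.
names = ["red apple 1kg", "apple juice 1l", "apple phone case", "banana"]
words = collect_interference_words(
    names,
    feature_words={"apple"},
    belongs_to_category=lambda n: n == "red apple 1kg",
    sample_size=10,
)
print(sorted(words))
```

The heuristic also picks up noisy tokens such as quantities, so the resulting candidates would normally be reviewed before they are added to the interference word set.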
9. The system for labeling commodities based on an online shopping platform according to claim 6, wherein for each commodity collected from each online shopping platform, under each category, the commodity labeling module is configured to perform the containment judgment through a regular expression and REGEXP_SUBSTR() to determine whether the commodity satisfies the following condition: the commodity name of the commodity contains any element in the feature word set of the category and does not contain any element in the interference word set of the category.
10. The system for labeling commodities based on an online shopping platform according to any one of claims 6 to 9, further comprising an updating module for performing:
for each category, periodically collecting the feature words of the category and updating the feature word set of the category;
for each category, periodically collecting the interference words of the category and updating the interference word set of the category;
and for the labeled commodities, updating the labels of the commodities based on the updated feature word set and the updated interference word set of each category.
CN202211474346.6A 2022-11-23 2022-11-23 Product category labeling method and system across online shopping platforms Pending CN115953214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211474346.6A CN115953214A (en) 2022-11-23 2022-11-23 Product category labeling method and system across online shopping platforms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211474346.6A CN115953214A (en) 2022-11-23 2022-11-23 Product category labeling method and system across online shopping platforms

Publications (1)

Publication Number Publication Date
CN115953214A true CN115953214A (en) 2023-04-11

Family

ID=87281438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211474346.6A Pending CN115953214A (en) 2022-11-23 2022-11-23 Product category labeling method and system across online shopping platforms

Country Status (1)

Country Link
CN (1) CN115953214A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778205A (en) * 2014-01-13 2014-05-07 北京奇虎科技有限公司 Commodity classifying method and system based on mutual information
CN105045909A (en) * 2015-08-11 2015-11-11 北京京东尚科信息技术有限公司 Method and device for recognizing commodity name from text
CN105573968A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Text indexing method based on rules
CN112463971A (en) * 2020-09-15 2021-03-09 杭州商情智能有限公司 E-commerce commodity classification method and system based on hierarchical combination model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination