
WO2022156589A1 - Method and device for determining live broadcast click rate - Google Patents

Method and device for determining live broadcast click rate

Info

Publication number
WO2022156589A1
Authority
WO
WIPO (PCT)
Prior art keywords
live broadcast
user
data
click
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/071797
Other languages
French (fr)
Chinese (zh)
Inventor
王艺斐
王晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Publication of WO2022156589A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a method and device for determining the click-through rate of a live broadcast.
  • the embodiments of the present disclosure provide a method and apparatus for determining the click-through rate of live broadcast, which can improve the prediction accuracy of the click-through rate of live broadcast, thereby improving the accuracy of push, and reducing the possibility of insufficient stock or slow sales.
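  • a method for determining a live broadcast click-through rate, comprising:
  • acquiring multiple historical user data and multiple historical live broadcast data;
  • determining, according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data, the user behavior sequence corresponding to the multiple historical user data;
  • determining user attribute features according to the multiple historical user data, and determining live broadcast features according to the multiple historical live broadcast data;
  • training the click-through rate prediction model according to the user behavior sequence, the user attribute features and the live broadcast features;
  • determining, according to the trained click-through rate prediction model, the click-through rate of the target user on the target live broadcast data.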
  • determining the user behavior sequence corresponding to the multiple historical user data according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data including:
  • the user behavior feature and the generation time corresponding to the user behavior feature are used as the input of the sequence generation model, and the weight value corresponding to each of the user behavior features is determined according to the output of the sequence generation model;
  • the user behavior sequence is generated according to the user behavior feature and the weight value.
  • determining the weight corresponding to each of the user behavior features according to the output of the sequence generation model including:
  • the output of the sequence generation model is normalized to obtain a weight value corresponding to each of the user behavior features.
  • the training of the click-through rate prediction model according to the user behavior sequence, the user attribute characteristics and the live broadcast characteristics includes:
  • the user behavior sequence is input into the ARMA model, and the user dynamic feature is determined according to the output of the ARMA model;
  • the user dynamic feature, the user attribute feature and the live broadcast feature are used as the input of the click-through rate prediction model to train the click-through rate prediction model.
  • the method further includes:
  • the live broadcast data is pushed to the target user.
  • the method further includes:
  • the inventory corresponding to the target live broadcast data is determined, and inventory management is performed according to the inventory.
  • the sequence generation model is a random forest model
  • the click-through rate prediction model is an XGBOOST model.
  • a device for determining a click-through rate of a live broadcast including:
  • the acquisition module is used to acquire multiple historical user data and multiple historical live broadcast data
  • a sequence generation module configured to determine the user behavior sequence corresponding to the multiple historical user data according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data;
  • a feature generation module configured to determine user attribute features according to the plurality of historical user data, and determine live broadcast features according to the plurality of historical live broadcast data;
  • a model training module used for training the click-through rate prediction model according to the user behavior sequence, the user attribute characteristics and the live broadcast characteristics
  • the data processing module is used to determine the click rate of the target user on the target live broadcast data according to the trained click rate prediction model.
  • an electronic device for determining the click-through rate of a live broadcast including:
  • one or more processors;
  • when one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the click-through rate of a live broadcast provided by the present disclosure.
  • a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the method for determining the click-through rate of a live broadcast provided by the present disclosure.
  • the above-mentioned embodiments have the following advantages or beneficial effects: because a model trained on time-based user data is used to determine the live broadcast click-through rate, the technical problems of inaccurate push based on subjective prediction, insufficient stock, and slow sales are overcome, thereby achieving the technical effect of improving the prediction accuracy of the live broadcast click-through rate, improving the accuracy of push, and reducing the possibility of insufficient stock or slow sales.
  • FIG. 1 is an exemplary system architecture diagram of a method for determining a live click rate or an apparatus for determining a live click rate that is suitable for application in an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a main process of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a detailed flow of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of main modules of a device for determining a live click rate according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
  • ARMA model is an important method to study time series, including: autoregressive model (Autoregressive model, referred to as AR model), moving average model (Moving average model, referred to as MA model) and an autoregressive moving average model (ARMA).
  • AR model autoregressive model
  • MA model moving average model
  • ARMA autoregressive moving average model
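  • For reference, the three model families named above can be written in the standard textbook form below. The symbols (constant c, autoregressive coefficients φ_i, moving-average coefficients θ_j, white-noise term ε_t) are conventional notation and are assumptions here; the disclosure itself does not give a formula.

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

  • Setting q = 0 leaves only the autoregressive terms, giving an AR(p) model; setting p = 0 leaves only the moving-average terms, giving an MA(q) model.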
  • Truncation (cut-off) refers to the property that the autocorrelation function (ACF) or partial autocorrelation function (PACF) of the time series is 0 after a certain order.
  • ACF autocorrelation function
  • PACF partial autocorrelation function
  • Tailing refers to the property that the autocorrelation function (ACF) or partial autocorrelation function (PACF) of the time series is not all zero after a certain order.
  • AIC Akaike Information Criterion
  • Akaike Information Criterion is a standard to measure the goodness of statistical model fitting. Usually, the smaller the AIC value, the better the model.
  • BIC Bayesian Information Criterion
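  • As a reminder of what the two criteria compute, their usual definitions are given below. Here k is the number of estimated model parameters, n is the number of observations, and \hat{L} is the maximized likelihood of the model; these symbols are standard notation rather than notation from the disclosure.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

  • Both criteria penalize additional parameters, and for n larger than about 7 the BIC penalty per parameter (ln n) exceeds the AIC penalty (2), so BIC tends to favor lower orders.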
  • FIG. 1 shows an exemplary system architecture diagram of a method for determining a click-through rate of live broadcast or an apparatus for determining a click-through rate of a live broadcast that is suitable for use in an embodiment of the present disclosure.
  • the exemplary system architecture of the method or the apparatus for determining the live click rate includes:
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, for example, a background management server that provides support for shopping websites browsed by the terminal devices 101 , 102 , and 103 .
  • the background management server may analyze and process the received user feature query request and other data, and feed back the processing results (e.g., user features) to the terminal devices 101 , 102 , and 103 .
  • the method for determining the click-through rate of live broadcast is generally performed by the server 105 , and accordingly, the device for determining the click-through rate of the live broadcast is generally set in the server 105 .
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 is a schematic diagram of the main process of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 2 , the method for determining a live broadcast click-through rate of the present disclosure includes:
  • Step S201 acquiring multiple historical user data and multiple historical live broadcast data.
  • a plurality of historical user data and a plurality of historical live broadcast data are obtained based on the historical data of the platform.
  • the historical user data may include data such as the user's age, gender, purchasing ability, occupation, and preference; may also include the user's operation data such as browsing, comments, favorites, add-on purchases, ordering, and sharing; and may also include the time when the user operation data is generated. The historical live broadcast data may include the live broadcast brand, lottery, time, interaction, anchor, and commodity information.
  • Step S202 according to the sequence generation model, the multiple historical user data and the generation time of the multiple historical user data, determine the user behavior sequence corresponding to the multiple historical user data.
  • from the multiple historical user data, the user behavior features are determined based on the user operation data, and the generation time corresponding to each user behavior feature is determined based on the time when the user operation data was generated. The user behavior features and their corresponding generation times are input to the sequence generation model, which outputs the feature score of each user behavior feature.
  • the feature scores of the user behavior features are normalized to obtain the weight value corresponding to each user behavior feature; based on the data of the user behavior feature and the weight value corresponding to the user behavior feature, the user behavior sequence is generated after weighted summation.
  • the user behavior sequence contains 12 elements, that is, the 12-month user behavior score, which characterizes the user's behavior in the past 1 year.
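  • The normalization step in S202 can be sketched as follows. The disclosure does not say which normalization is used, so dividing each score by the sum of all scores is an assumption, and the feature names and raw scores are hypothetical.

```python
def normalize_scores(scores):
    """Scale non-negative feature scores so that the resulting weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("feature scores must have a positive sum")
    return {name: value / total for name, value in scores.items()}


# Hypothetical raw feature scores, as a sequence generation model might output them.
raw_scores = {
    "browse_live": 6.0,
    "browse_product": 4.0,
    "add_on_purchase": 4.0,
    "order": 2.0,
    "share": 4.0,
}
weights = normalize_scores(raw_scores)
```

  • With these hypothetical scores the weights come out in the same proportions as the scores, and they can then be used directly in the weighted summation that produces each monthly behavior score.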
  • Step S203 Determine user attribute characteristics according to the plurality of historical user data, and determine live broadcast characteristics according to the plurality of historical live broadcast data.
  • the user attribute feature is determined based on the user information data therein, and the live broadcast feature is determined based on the plurality of historical live broadcast data therein.
  • Step S204 train a click-through rate prediction model according to the user behavior sequence, the user attribute feature, and the live broadcast feature.
  • the user behavior sequence obtained in step S202 is input into the ARMA model, the ARMA model is trained, and the ARMA model parameters are output as user dynamic features.
  • the user dynamic characteristics, the user attribute characteristics obtained in step S203, and the live broadcast characteristics are input into the click-through rate prediction model, the click-through rate prediction model is trained, and the trained click-through rate prediction model is output.
  • the CTR prediction model is the XGBOOST model.
  • Step S205 according to the trained click-through rate prediction model, determine the click-through rate of the target user on the target live broadcast data.
  • the target user data and a plurality of live broadcast data to be pushed are acquired, and the click rate of the target user on the target live broadcast data is determined according to the target user data and the trained click rate prediction model.
  • the click-through rate may include data of regular clicks, browsing, favorites, add-ons, and ordering.
  • according to the click-through rate, live broadcast data is pushed to the target user; or, according to the click-through rate, the inventory corresponding to the target live broadcast data is determined, and inventory management is carried out according to the inventory amount, so as to replenish stock, allocate supply warehouses, or support demand warehouses in a timely manner.
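  • One simple way to turn predicted click-through rates into a push decision is to rank the candidate live broadcasts and keep the best few. The disclosure does not specify a selection rule, so the top-k rule, the `min_ctr` threshold, and the broadcast identifiers below are all hypothetical.

```python
def select_broadcasts_to_push(predicted_ctr, top_k=2, min_ctr=0.0):
    """Return up to top_k broadcast ids, highest predicted CTR first."""
    ranked = sorted(predicted_ctr.items(), key=lambda item: item[1], reverse=True)
    return [broadcast_id for broadcast_id, ctr in ranked if ctr >= min_ctr][:top_k]


# Hypothetical predictions for one target user.
predicted = {"live_a": 0.12, "live_b": 0.31, "live_c": 0.05}
to_push = select_broadcasts_to_push(predicted, top_k=2, min_ctr=0.1)
```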
  • through the steps of determining the user behavior sequence corresponding to the historical user data according to the multiple historical user data, determining the live broadcast features according to the multiple historical live broadcast data, training the click-through rate prediction model, and determining the click-through rate of the target user on the target live broadcast data according to the trained click-through rate prediction model, the method can adapt to periodic changes in user behavior, optimize the performance of the live broadcast prediction model, make full use of live broadcast resources, accurately predict the live broadcast click-through rate, accurately push live broadcasts to users, and manage inventory reasonably.
  • FIG. 3 is a schematic diagram of a detailed process of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 3 , the method for determining a live broadcast click-through rate of the present disclosure includes:
  • step S301 a database of live broadcast is constructed.
  • a live broadcast database is constructed based on the existing historical data of the platform, and historical user data and historical live broadcast data are obtained from the historical data of the platform.
  • Historical user data can include multiple pieces of data. Taking the historical user data of an e-commerce platform as an example:
  • historical user data can include information data such as the user's age, gender, purchasing ability, occupation, and preferences.
  • historical user data can include operation data of users, such as browsing live broadcasts, browsing products, adding purchases, placing orders, sharing, and commenting.
  • historical user data can include the operation time data corresponding to the operation data.
  • the historical live broadcast data may include multiple pieces of data. Taking the historical live broadcast data of an e-commerce platform as an example, it may include data such as the live broadcast brand, lottery, time, interaction, anchor, and merchandise.
  • the platform can obtain relevant data on a regular basis, and update and save the obtained relevant data to the database.
  • Step S302 constructing a user attribute feature of the live broadcast.
  • from the live broadcast database constructed in step S301, the information data of the historical user data is extracted, and based on this information data, the user attribute feature can be determined.
  • the historical user data may include data such as the user's age, gender, occupation, preference category, purchasing ability, geographic location, and consumption time.
  • the attribute characteristics may include user age characteristics, user gender characteristics, user occupation characteristics, user preference category characteristics, user purchasing ability characteristics, user geographic location characteristics, user consumption time characteristics, and the like.
  • step S303 a live broadcast feature of the live broadcast is constructed.
  • from the live broadcast database constructed in step S301, the information data of the historical live broadcast data is extracted, and based on this information data, the live broadcast feature can be determined.
  • the historical live broadcast data may include data such as the brand, lottery, time, interaction, anchor and product of the live broadcast, and the live broadcast characteristics determined based on the historical live broadcast data may include the live broadcast brand characteristics.
  • the characteristics of live broadcast brands include the characteristics of the number of live broadcast brands, the characteristics of the number of fans of the live broadcast brand, etc.; the characteristics of live broadcast lottery draws include the characteristics of whether the live broadcast draws lottery, the characteristics of the number of live broadcast lottery draws, etc.; the characteristics of live broadcast time include whether the live broadcast time is a weekend, the live broadcast time period, etc.
  • the characteristics of live broadcast anchors include the characteristics of the number of live broadcast anchors, the characteristics of the number of fans of the live broadcast anchor, the characteristics of the type of live broadcast anchors, the characteristics of the type of goods brought by the live broadcast anchor, etc.
  • the characteristics of live broadcast products include the number of live broadcast products, the average price of live broadcast products, the type of live broadcast products, etc.; the additional features of the live broadcast include whether a celebrity appears in the live broadcast room, whether the live broadcast uses linked microphones (co-hosting), etc.
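  • As a small illustration of how the live broadcast time features mentioned above could be derived from a start timestamp, consider the sketch below. The six-hour bucketing of the time period is an assumption made for illustration; the disclosure only names the features, not how they are encoded.

```python
from datetime import datetime


def live_time_features(start):
    """Derive simple time features from a live broadcast start time."""
    return {
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": int(start.weekday() >= 5),
        # Hypothetical encoding: four six-hour periods numbered 0 to 3.
        "time_period": start.hour // 6,
    }


# 15 January 2022 was a Saturday; 20:00 falls in the last six-hour period.
features = live_time_features(datetime(2022, 1, 15, 20, 0))
```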
  • the user attribute features of the live broadcast obtained in step S302 and the live broadcast features of the live broadcast obtained in step S303 include discrete features and continuous features.
  • Discrete features, for example the user's age feature, gender feature, and geographic location feature, are processed by embedding.
  • the embedding process can extract features from the original data and perform dimension reduction through matrix multiplication. Continuous features are inherently continuous and therefore do not require processing.
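  • The matrix-multiplication view of embedding can be shown with a tiny pure-Python sketch: multiplying a one-hot vector by an embedding table selects one row of the table, which is the low-dimensional representation of that category. The table values and the category count are hypothetical; in practice the table would be learned during training.

```python
def one_hot(index, size):
    """Encode a category index as a one-hot row vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def row_vector_times_matrix(vector, matrix):
    """Multiply a row vector (length R) by an R x C matrix."""
    columns = len(matrix[0])
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(columns)
    ]


# Hypothetical table: 4 categories of a discrete feature embedded into 2 dimensions.
embedding_table = [
    [0.1, 0.9],
    [0.4, 0.6],
    [0.7, 0.3],
    [0.2, 0.8],
]
category_index = 2
embedded = row_vector_times_matrix(
    one_hot(category_index, len(embedding_table)), embedding_table
)
```

  • The result equals row 2 of the table, so a 4-dimensional one-hot input has been reduced to a 2-dimensional dense vector.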
  • the processed user attribute features and live broadcast features are input into the XGBOOST model for training, and the feature score of each feature is output.
  • the features finally retained for the live broadcast are used as the user attribute features and live broadcast features.
  • XGBOOST can alleviate errors caused by feature sparseness and correlation, effectively remove redundant features, and improve feature quality.
  • in Python, the feature importance function of the XGBOOST model, together with the feature-crossing capability of the model itself, can be used for feature screening. Because the product browsing rate in the live broadcast room and the product order rate in the live broadcast room have different focuses, the feature importance output by model training also differs between the two indicators.
  • Step S304 constructing the user behavior characteristics of the live broadcast.
  • the operation data of the historical user data is extracted therefrom, and based on the operation data of the historical user data, the user behavior characteristic can be determined.
  • the historical user data may include the user's operation data, such as browsing live broadcasts, browsing products, adding purchases, placing orders, and sharing; the user behavior characteristics determined based on the historical user data may include user browsing live broadcast features, user browsing product features, user add-on purchase features, user ordering features, user sharing features, etc.
  • Step S305 constructing a user behavior sequence of the live broadcast.
  • from the live broadcast database constructed in step S301, the operation time data corresponding to the operation data of the historical user data is extracted, and based on this operation time data, the generation time of each user behavior feature can be determined.
  • the user behavior features of the live broadcast constructed in step S304 and the generation time corresponding to each user behavior feature are input into the sequence generation model for training; after the information gain is calculated, the feature score of each user behavior feature is output. The feature scores are normalized to obtain the weight of each user behavior feature, and the user behavior sequence is obtained by a weighted summation of the data of the user behavior features with the weights of the user behavior features.
  • the weight can represent the importance/degree of importance of different features
  • the sequence refers to a vector composed of the values of multiple features within a predetermined time period.
  • the sequence generation model may be a random forest model.
  • the user behavior features of the live broadcast constructed in step S304 include the user browsing live broadcast feature, the user browsing product feature, the user add-on purchase feature, the user ordering feature, and the user sharing feature. The operation time data corresponding to the operation data of the historical user data is extracted from the live broadcast database constructed in step S301 to determine the generation time of each user behavior feature, that is, the generation times of the user browsing live broadcast feature, the user browsing product feature, the user add-on purchase feature, the user ordering feature, and the user sharing feature.
  • the feature of user browsing live broadcast is represented by A
  • the feature of user browsing products is represented by B
  • the feature of user add-on purchase is represented by C
  • the feature of user placing an order is represented by D
  • the feature of user sharing is represented by E.
  • a live broadcast database is constructed based on the historical data of the platform within the past year, and historical user data and historical live broadcast data are obtained from the live broadcast database. Based on the acquired historical user data and historical live broadcast data, the user behavior features A, B, C, D, and E for the past year and their corresponding generation times are constructed.
  • the user behavior features A, B, C, D, and E and their corresponding generation times are input into the random forest model for training; the information gain is calculated and the feature scores of A, B, C, D, and E are output. The feature scores are normalized to obtain the weights W_A, W_B, W_C, W_D, and W_E of A, B, C, D, and E, and the data of A, B, C, D, and E is weighted and summed with these weights to obtain the user behavior sequence.
  • the data of A, B, C, D, and E may be the number of times users browse live broadcasts, the times users browse products, the times users add purchases, the times users place orders, and the times users share.
  • performing the weighted summation of the data of A, B, C, D, and E with the weights W_A, W_B, W_C, W_D, and W_E to obtain the user behavior sequence includes: weighting and summing the January data of A, B, C, D, and E with the weights W_A, W_B, W_C, W_D, and W_E to obtain the user's behavior score for January; weighting and summing the data of A, B, C, D, and E for each month from February to December with the same weights to obtain the user's behavior scores for February to December; and combining the user behavior scores of the 12 months to obtain the user behavior sequence.
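  • the user behavior sequence contains 12 elements, that is, the user behavior scores of the 12 months, which characterize the user's behavior over the past year.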
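  • The month-by-month weighted summation can be sketched as below. The weights are the example values the text gives for the product viewing rate indicator; the monthly counts are hypothetical and only serve to make the arithmetic concrete.

```python
FEATURES = ["A", "B", "C", "D", "E"]


def behavior_sequence(monthly_counts, weights):
    """Return one weighted behavior score per month."""
    return [
        sum(weights[name] * month[name] for name in FEATURES)
        for month in monthly_counts
    ]


# Example weights W_A..W_E for the product viewing rate indicator.
weights = {"A": 0.3, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.2}

# Hypothetical counts for months 1..12 (activity grows linearly with the month).
monthly_counts = [
    {"A": 10 * m, "B": 5 * m, "C": 2 * m, "D": m, "E": 0}
    for m in range(1, 13)
]
sequence = behavior_sequence(monthly_counts, weights)
```

  • For January this gives 0.3*10 + 0.2*5 + 0.2*2 + 0.1*1 + 0.2*0 = 4.5, and the full result is a 12-element vector with one score per month.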
  • the product viewing rate in the live broadcast room and the product order rate in the live broadcast room have different focuses: the product viewing rate focuses on pageviews, while the product order rate focuses more on order volume. Therefore, when models are trained for the two indicators, the output feature importance is also different.
  • for the product viewing rate indicator in the live broadcast room, the data of A, B, C, D, and E is input into the random forest model, and the output weights W_A, W_B, W_C, W_D, W_E are [0.3, 0.2, 0.2, 0.1, 0.2]; for the product order rate indicator in the live broadcast room, the data of A, B, C, D, and E is input into the random forest model, and the output weights W_A, W_B, W_C, W_D, W_E are [0.1, 0.2, 0.2, 0.3, 0.2].
  • for the product viewing rate indicator, the weights W_A and W_B of A and B are greater than the weight W_D of D, indicating that the user browsing live broadcast feature and the user browsing product feature are more important in the product viewing rate indicator of the live broadcast room.
  • for the product order rate indicator, the weight W_D of D accounts for a higher proportion, indicating that the user ordering feature is more important in the product order rate indicator of the live broadcast room.
  • Step S306 constructing a user dynamic feature of the live broadcast.
  • the user behavior sequence is input into the ARMA model, and the stationarity of the user behavior sequence is checked with the ADF test. Based on the results of the ADF test, it is judged whether the user behavior sequence is stationary; if not, differencing (the difference operation) is applied until the user behavior sequence is stationary. After confirming that the user behavior sequence is stationary, the autocorrelation coefficient a and the partial autocorrelation coefficient b of the user behavior sequence are calculated, and the ARMA model is identified according to the autocorrelation coefficient a (ACF) and the partial autocorrelation coefficient b (PACF).
  • ACF autocorrelation coefficient a
  • PACF partial autocorrelation coefficient b
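  • The differencing operation and the sample autocorrelation coefficient used in this step can be sketched in plain Python as follows. The ADF test itself is not implemented here; a statistics library would normally be used for it (for example `adfuller` in statsmodels). The example series is hypothetical.

```python
def difference(series):
    """First-order differencing: y[t] = x[t + 1] - x[t]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical 12-month sequence: a linear trend plus an alternating component.
trend_series = [2.0 * t + (1.0 if t % 2 else -1.0) for t in range(12)]
differenced = difference(trend_series)
lag1 = autocorrelation(differenced, 1)
```

  • Differencing removes the linear trend, leaving a series that alternates between 4 and 0, so its lag-1 autocorrelation is negative.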
  • identifying the ARMA model includes: if the autocorrelation coefficient a is tailing and the partial autocorrelation coefficient b is truncated at order p, the ARMA model is an AR(p) model; if the autocorrelation coefficient a is truncated at order q and the partial autocorrelation coefficient b is tailing, the ARMA model is an MA(q) model; if the autocorrelation coefficient a is tailing and the partial autocorrelation coefficient b is also tailing, the ARMA model is an ARMA(p, q) model. Based on the identified ARMA model, the orders p and q are determined in combination with the AIC and BIC criteria. The orders p and q represent the autocorrelation characteristics of the sequence itself, especially its periodic behavior. After the orders p and q are determined, the model parameters of the ARMA model can be obtained and combined into a feature vector, which is the user dynamic feature.
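  • The identification rule and the order selection can be summarized in a short sketch. The function name, the "undetermined" fallback for the case the text does not cover (both functions truncated), and the candidate AIC values are hypothetical; real AIC values would come from fitting each candidate model.

```python
def identify_model(acf_truncated, pacf_truncated):
    """Map the ACF/PACF behavior to a model family.

    acf_truncated / pacf_truncated: True if the function cuts off after some
    order, False if it tails off.
    """
    if pacf_truncated and not acf_truncated:
        return "AR(p)"
    if acf_truncated and not pacf_truncated:
        return "MA(q)"
    if not acf_truncated and not pacf_truncated:
        return "ARMA(p, q)"
    return "undetermined"  # case not described in the text


# Hypothetical AIC values for candidate (p, q) orders; the smallest AIC wins.
candidate_aic = {(1, 0): 42.7, (1, 1): 40.1, (2, 1): 41.3}
best_order = min(candidate_aic, key=candidate_aic.get)
```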
  • User dynamic features are sequence features, which themselves are continuous.
  • deep learning models such as RNN, LSTM, and temporal convolutional network can be used to construct user dynamic features.
  • the ARMA model is used to abstract the user behavior sequence, and its model parameters are used to construct the user dynamic characteristics of the live broadcast.
  • Step S307 the click rate prediction model is trained.
  • a characteristic sample of the live broadcast click rate is constructed.
  • the feature samples of the live broadcast CTR are divided into a training set and a test set: 80% of the feature samples are used as the training set to train the model and obtain the trained CTR prediction model, and the remaining 20% are used as the test set to test the trained CTR prediction model.
  • the training set feature samples of the live click rate are input into the XGBOOST model, the weak classifiers are trained in an iterative loop, the multiple weak classifiers are iteratively integrated into a combined classifier, and the trained click rate prediction model is obtained.
  • the click-through rate prediction model is used to predict the click-through rate of the live broadcast, wherein the click-through rate may include data such as regular clicks, browsing, favorites, add-ons, ordering, and sharing.
  • the XGBOOST model (eXtreme Gradient Boosting) is a boosted tree model. Iterative training is performed on the input training set feature samples: the weak classifier of each iteration is learned step by step, and the weights of the samples in the training set are updated according to the coefficients of the weak classifiers; each new weak classifier fits the residuals between the results of the previous weak classifiers and the training set samples, and the multiple weak classifiers are iteratively integrated into a strong classifier to obtain the prediction model.
  • XGBOOST has good learning performance.
  • the XGBOOST model can obtain a prediction model with excellent performance in a short time while consuming few computing resources. It combines the advantages of sparsity awareness and cross-validation.
  • models such as LR, random forest, GBDT, BP neural network and other algorithms can be used for model training.
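The 80/20 split and the residual-fitting loop of step S307 can be sketched as follows. This is a deliberately simplified stand-in, not XGBOOST itself: it uses squared-error gradient boosting with one-split decision stumps and omits XGBOOST's regularized objective and second-order updates. All function names, the learning rate, and the number of rounds are assumptions made for illustration.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into a training and a test set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    f, threshold, lm, rm = stump
    return lm if x[f] <= threshold else rm


def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a weak learner to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)
```

Here each feature sample is a list of numeric features and each label is a click outcome (0 or 1); a production system would instead pass the same training set to an actual gradient boosting library.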
  • Step S308 verifying the click-through rate prediction model.
  • the test set feature samples of the live broadcast click rate are input into the combined classifier obtained by training in step S307, that is, the trained click rate prediction model, and the user's live broadcast click rate is output. The error of the click rate prediction model is calculated from the test result.
  • whether the model error is higher than the error standard value is judged, and the click-through rate prediction model is modified according to the model error so that it meets the requirements.
  • the prediction accuracy of the prediction model is greatly improved, and the click-through rate of the live broadcast can be accurately predicted.
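The verification in step S308 amounts to comparing a test-set error against an error standard value. The sketch below assumes mean squared error as the error measure, which the disclosure does not specify; `model_error` and `meets_requirement` are hypothetical names.

```python
def model_error(y_true, y_pred):
    """Mean squared error between observed and predicted click rates."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def meets_requirement(y_true, y_pred, error_standard):
    """True if the model error is not higher than the error standard value."""
    return model_error(y_true, y_pred) <= error_standard
```

If `meets_requirement` returns False, the model would be revised (for example, retrained with adjusted parameters) and verified again.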
  • Step S309 the model is used.
  • the obtained target user data and a plurality of live broadcast data to be pushed are input into the click-through rate prediction model, which outputs the click-through rate of the target live broadcast data.
  • according to the click rate, live broadcast data can be pushed to the target user; or, according to the click rate, the inventory amount corresponding to the target live broadcast data can be determined and inventory management carried out accordingly, for example, allocating inventory directly from the place of origin, or transferring inventory from a supply warehouse to a demand warehouse.
  • the click-through rate may include data of regular clicks, browsing, favorites, add-ons, and ordering. If the click-through rate of ordering a certain product is high, the warehouse increases the inventory of that product; if a user has a high click-through rate for browsing, favoriting or adding a certain product, the live broadcast of that product, or of similar products, is recommended to the user.
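Pushing by predicted click rate can be sketched as a simple ranking. The function name `rank_live_broadcasts`, the dictionary layout, and the `top_k` cut-off are assumptions for illustration; the disclosure does not fix how many live broadcasts are pushed.

```python
def rank_live_broadcasts(predicted_ctr, top_k=3):
    """Return the ids of the top_k live broadcasts by predicted click rate."""
    ranked = sorted(predicted_ctr.items(), key=lambda kv: kv[1], reverse=True)
    return [broadcast_id for broadcast_id, _ in ranked[:top_k]]
```

For example, with predicted click rates of 0.1, 0.5 and 0.3 for live broadcasts "a", "b" and "c", a `top_k` of 2 selects "b" and then "c".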
  • the steps of constructing a live broadcast database; constructing live broadcast user attribute features; constructing live broadcast features; constructing live broadcast user behavior features; constructing the live broadcast user behavior sequence; constructing live broadcast user dynamic features; training the click-through rate prediction model; verifying the click-through rate prediction model; and using the model can adapt to cyclical changes in user behavior, optimize the performance of the live broadcast data prediction model, make full use of live broadcast resources, accurately predict the live broadcast click rate, accurately push live broadcasts to users, and manage inventory reasonably.
  • FIG. 4 is a schematic diagram of the main modules of the apparatus for determining the click-through rate of live broadcast according to an embodiment of the present disclosure.
  • the apparatus 400 for determining the click-through rate of live broadcast of the present disclosure includes:
  • the obtaining module 401 is configured to obtain multiple historical user data and multiple historical live broadcast data.
  • the obtaining module 401 obtains a plurality of historical user data and a plurality of historical live broadcast data based on the historical data of the platform. Historical user data can include data on users' browsing, comments, favorites, add-ons, ordering, sharing, etc., as well as the time at which the user operation data was generated; historical live broadcast data can include information such as the live broadcast's brand, lottery, time, interaction, anchor, and commodities.
  • the sequence generation module 402 is configured to determine the user behavior sequence corresponding to the multiple historical user data according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data.
  • the user behavior features are determined based on the user operation data in the historical user data, and the generation time corresponding to the user behavior features is determined based on the time when the user operation data was generated.
  • the sequence generation module 402 inputs the user behavior features and their corresponding generation times into the sequence generation model, which outputs a feature score for each user behavior feature.
  • the sequence generation module 402 normalizes the feature scores of the user behavior features to obtain a weight value corresponding to each user behavior feature; a weighted sum of the user behavior feature data and the corresponding weight values is then computed to generate the user behavior sequence.
  • the user behavior sequence contains 12 elements, that is, the user behavior scores of 12 months, which characterize the user's behavior over the past year.
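The normalization and weighted sum performed by the sequence generation module can be sketched as follows. Dividing each feature score by the sum of all scores is one possible normalization and is an assumption here, as are the function names and the per-month dictionary layout.

```python
def normalize_scores(scores):
    """Turn feature scores into weights that sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("feature scores must have a positive sum")
    return {name: score / total for name, score in scores.items()}


def behavior_sequence(monthly_features, weights):
    """One weighted behavior score per month (12 entries for one year)."""
    return [sum(weights[name] * value for name, value in month.items())
            for month in monthly_features]
```

Each entry of `monthly_features` maps a behavior feature (for example browsing or ordering counts) to its value for that month, so twelve entries yield the 12-element user behavior sequence.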
  • the feature generation module 403 is configured to determine user attribute features according to the multiple historical user data, and determine the live broadcast feature according to the multiple historical live broadcast data.
  • the feature generating module 403 determines the user attribute feature based on the user information data therein, and determines the live broadcast feature based on the plurality of historical live broadcast data therein.
  • the model training module 404 is configured to train the click rate prediction model according to the user behavior sequence, the user attribute feature and the live broadcast feature.
  • the model training module 404 inputs the user behavior sequence into the ARMA model, trains the ARMA model, and outputs the ARMA model parameters as user dynamic features.
  • the model training module 404 inputs the user dynamic features, the user attribute features and the live broadcast features obtained by the feature generation module 403 into the click-through rate prediction model, trains the click-through rate prediction model, and outputs the trained click-through rate prediction model.
  • the CTR prediction model is the XGBOOST model.
  • the data processing module 405 is configured to determine the click rate of the target user on the target live broadcast data according to the trained click rate prediction model.
  • the target user data and a plurality of live broadcast data to be pushed are obtained, and the data processing module 405 determines the click rate of the target user on the target live broadcast data according to the target user data and the trained click rate prediction model.
  • the click-through rate may include data of regular clicks, browsing, favorites, add-ons, and ordering.
  • according to the click rate, live broadcast data is pushed to the target user; or, according to the click rate, the inventory amount corresponding to the target live broadcast data is determined, and inventory management is carried out according to the inventory amount, so as to increase stock, allocate from supply warehouses, or support demand warehouses in a timely manner.
  • with modules such as an acquisition module, a sequence generation module, a feature generation module, a model training module, and a data processing module, the apparatus can adapt to periodic changes in user behavior, optimize the performance of the live broadcast data prediction model, make full use of live broadcast resources, accurately predict the click-through rate of live broadcasts, accurately push live broadcasts to users, and manage inventory reasonably.
  • FIG. 5 is a schematic structural diagram of a computer system suitable for implementing the terminal device according to the embodiment of the present disclosure.
  • the computer system 500 of the terminal device according to the embodiment of the present disclosure includes:
  • a central processing unit (CPU) 501 can execute various appropriate actions and processes according to a program stored in a read only memory (ROM) 502 or a program loaded into a random access memory (RAM) 503 from a storage section 508 .
  • in the RAM 503, various programs and data necessary for the operation of the system 500 are also stored.
  • the CPU 501 , the ROM 502 , and the RAM 503 are connected to each other through a bus 504 .
  • An input/output (I/O) interface 505 is also connected to bus 504 .
  • the following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, etc.; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 508 including a hard disk, etc. ; and a communication section 509 including a network interface card such as a LAN card, a modem, and the like. The communication section 509 performs communication processing via a network such as the Internet.
  • a drive 510 is also connected to the I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage section 508 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication portion 509 and/or installed from the removable medium 511 .
  • the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • the modules involved in the embodiments of the present disclosure may be implemented in software or hardware.
  • the described modules can also be set in the processor, for example, it can be described as: a processor includes an acquisition module, a sequence generation module, a feature generation module, a model training module, and a data processing module. Wherein, the names of these modules do not constitute a limitation on the module itself under certain circumstances.
  • the acquisition module can also be described as "a module for acquiring live broadcast data from a live broadcast platform".
  • the present disclosure also provides a computer-readable medium.
  • the computer-readable medium may be included in the device described in the above-mentioned embodiments, or it may exist alone without being assembled into the device.
  • the above-mentioned computer-readable medium carries one or more programs which, when executed by a device, cause the device to: acquire a plurality of historical user data and a plurality of historical live broadcast data; determine the user behavior sequence corresponding to the plurality of historical user data according to a sequence generation model, the plurality of historical user data and the generation time of the plurality of historical user data; determine user attribute features according to the plurality of historical user data, and determine live broadcast features according to the plurality of historical live broadcast data; train the click-through rate prediction model according to the user behavior sequence, the user attribute features and the live broadcast features; and determine the target user's click-through rate for the target live broadcast data according to the trained click-through rate prediction model.
  • the prediction accuracy of the click-through rate of the live broadcast can be improved, so that the push accuracy can be improved, and the possibility of insufficient stock or slow sales can be reduced.

Abstract

The present disclosure relates to the technical field of computers, and provides a method and device for determining a live broadcast click rate. A specific embodiment of the method comprises: acquiring a plurality of pieces of historical user data and a plurality of pieces of historical live broadcast data; according to a sequence generation model, the plurality of pieces of historical user data and the generation time of the plurality of pieces of historical user data, determining a user behavior sequence corresponding to the plurality of pieces of historical user data; determining user attribute features according to the plurality of pieces of historical user data, and determining live broadcast features according to the plurality of pieces of historical live broadcast data; according to the user behavior sequence, the user attribute features and the live broadcast features, training a click rate prediction model; and according to the trained click rate prediction model, determining a click rate of a target user about target live broadcast data. The embodiment can improve the prediction accuracy of a live broadcast click rate, thereby improving the pushing accuracy, and reducing the possibility of phenomena such as insufficient stock or unsalable stock.

Description

Method and device for determining live broadcast click-through rate

Cross-Reference to Related Applications

This application claims priority to Chinese invention patent application No. 202110081300.7, filed on January 21, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a method and device for determining the click-through rate of a live broadcast.

Background

As an emerging industry, live broadcasting and its guided consumption have gradually become a major marketing method in the e-commerce industry.

In the prior art, operators mainly predict users' click-through rate for a live broadcast from their own experience or market experience, based on live broadcast data (such as the time slot of the live broadcast, the anchor's number of fans, whether celebrities appear, whether there is a lottery, and the number of prizes).

Because operators' experience is limited and the data corresponding to different live broadcasts varies widely, a live broadcast click-through rate predicted subjectively from an operator's individual experience has low accuracy. As a result, when content is pushed or goods are stocked based on live broadcast data, pushes may be inaccurate and stock may be insufficient or unsalable.

Summary

In view of this, the embodiments of the present disclosure provide a method and device for determining the click-through rate of a live broadcast, which can improve the prediction accuracy of the live broadcast click-through rate, thereby improving push accuracy and reducing the likelihood of insufficient or unsalable stock.

To achieve the above purpose, according to one aspect of the embodiments of the present disclosure, a method for determining a live broadcast click-through rate is provided, comprising:

acquiring a plurality of historical user data and a plurality of historical live broadcast data;

determining the user behavior sequence corresponding to the plurality of historical user data according to a sequence generation model, the plurality of historical user data, and the generation time of the plurality of historical user data;

determining user attribute features according to the plurality of historical user data, and determining live broadcast features according to the plurality of historical live broadcast data;

training a click-through rate prediction model according to the user behavior sequence, the user attribute features and the live broadcast features;

determining the click-through rate of a target user for target live broadcast data according to the trained click-through rate prediction model.

Optionally, determining the user behavior sequence corresponding to the plurality of historical user data according to the sequence generation model, the plurality of historical user data, and the generation time of the plurality of historical user data includes:

determining user behavior features according to the plurality of historical user data;

using the user behavior features and the generation time corresponding to the user behavior features as the input of the sequence generation model, and determining the weight value corresponding to each of the user behavior features according to the output of the sequence generation model;

generating the user behavior sequence according to the user behavior features and the weight values.

Optionally, determining the weight value corresponding to each of the user behavior features according to the output of the sequence generation model includes:

normalizing the output of the sequence generation model to obtain the weight value corresponding to each of the user behavior features.

Optionally, training the click-through rate prediction model according to the user behavior sequence, the user attribute features and the live broadcast features includes:

inputting the user behavior sequence into an ARMA model, and determining user dynamic features according to the output of the ARMA model;

using the user dynamic features, the user attribute features and the live broadcast features as the input of the click-through rate prediction model, so as to train the click-through rate prediction model.

Optionally, after the click-through rate of the target user for the target live broadcast data is determined, the method further includes:

pushing live broadcast data to the target user according to the click-through rate.

Optionally, after the click-through rate of the target user for the target live broadcast data is determined, the method further includes:

determining the inventory amount corresponding to the target live broadcast data according to the click-through rate, and performing inventory management according to the inventory amount.

Optionally, the sequence generation model is a random forest model;

and/or,

the click-through rate prediction model is an XGBOOST model.

According to yet another aspect of the embodiments of the present disclosure, a device for determining the click-through rate of a live broadcast is provided, including:

an acquisition module, configured to acquire a plurality of historical user data and a plurality of historical live broadcast data;

a sequence generation module, configured to determine the user behavior sequence corresponding to the plurality of historical user data according to the sequence generation model, the plurality of historical user data, and the generation time of the plurality of historical user data;

a feature generation module, configured to determine user attribute features according to the plurality of historical user data, and determine live broadcast features according to the plurality of historical live broadcast data;

a model training module, configured to train the click-through rate prediction model according to the user behavior sequence, the user attribute features and the live broadcast features;

a data processing module, configured to determine the click-through rate of the target user for the target live broadcast data according to the trained click-through rate prediction model.

According to another aspect of the embodiments of the present disclosure, an electronic device for determining the click-through rate of a live broadcast is provided, including:

one or more processors;

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the click-through rate of a live broadcast provided by the present disclosure.

According to a further aspect of the embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the method for determining the click-through rate of a live broadcast provided by the present disclosure.

The above embodiments have the following advantages or beneficial effects: because a model is trained on time-based user data to determine the live broadcast click-through rate, the technical problem of inaccurate pushes and insufficient or unsalable stock caused by subjective prediction is overcome. This improves the prediction accuracy of the live broadcast click-through rate, which in turn improves push accuracy and reduces the likelihood of insufficient or unsalable stock.

Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:

FIG. 1 is an exemplary system architecture diagram to which the method or device for determining the live broadcast click-through rate according to an embodiment of the present disclosure can be applied;

FIG. 2 is a schematic diagram of the main flow of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the detailed flow of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of the main modules of a device for determining a live broadcast click-through rate according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

ARMA model: the autoregressive moving average model is an important method for studying time series. The family includes the autoregressive model (AR model), the moving average model (MA model), and the autoregressive moving average model (ARMA model).
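For reference, the three model types can be written in their usual textbook form. The symbols below (constant c, autoregressive coefficients φ, moving average coefficients θ, white noise ε) are standard notation assumed here; the disclosure itself does not give the equations.

```latex
\begin{aligned}
\text{AR}(p):\quad & X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\text{ARMA}(p,q):\quad & X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
\end{aligned}
```

Under this notation, the coefficients φ and θ of the fitted model are the parameters that would be collected into a feature vector.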

Truncation: the property that the autocorrelation function (ACF) or partial autocorrelation function (PACF) of a time series is zero at all lags beyond a certain order.

Tailing: the property that the autocorrelation function (ACF) or partial autocorrelation function (PACF) of a time series does not become zero beyond a certain order.

AIC, the Akaike information criterion, is a measure of the goodness of fit of a statistical model. In general, the smaller the AIC value, the better the model.

BIC, the Bayesian information criterion, is a measure of the goodness of fit of a statistical model. In general, the smaller the BIC value, the better the model. AIC selects a good model for prediction from a predictive perspective, whereas BIC selects the model that best fits the data from a fitting perspective.
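The two criteria have the following standard definitions, where k is the number of estimated parameters, n is the number of observations, and L̂ is the maximized likelihood of the model. These symbols are conventional notation and are not taken from the disclosure.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Because ln n exceeds 2 once n is at least 8, BIC penalizes additional parameters more heavily than AIC for all but very short sequences, so it tends to select smaller orders p and q.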

图1示出了适于应用于本公开实施例的直播点击率的确定方法或直播点击率的确定装置的示例性系统架构图,如图1所示,本公开实施例的直播点击率的确定方法或直播点击率的确定装置的示例性系统架构包括:FIG. 1 shows an exemplary system architecture diagram of a method for determining a click-through rate of live broadcast or an apparatus for determining a click-through rate of a live broadcast that is suitable for use in an embodiment of the present disclosure. As shown in FIG. The exemplary system architecture of the method or the apparatus for determining the live click rate includes:

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 105 may be a server that provides various services, for example a background management server that supports the shopping websites users browse with the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data, such as user feature query requests, and feed the processing results (for example, user features) back to the terminal devices 101, 102, 103.

It should be noted that the method for determining a live broadcast click-through rate provided by the embodiments of the present disclosure is generally executed by the server 105; accordingly, the apparatus for determining a live broadcast click-through rate is generally provided in the server 105.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.

FIG. 2 is a schematic diagram of the main flow of the method for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes:

Step S201: acquire a plurality of historical user data and a plurality of historical live broadcast data.

Exemplarily, the plurality of historical user data and the plurality of historical live broadcast data are acquired from the platform's historical data. The historical user data may include information such as the user's age, gender, purchasing power, occupation, and preferences; it may also include data on the user's operations such as browsing, commenting, adding to favorites, adding to cart, placing orders, and sharing; and it may further include the time at which each piece of user operation data was generated. The historical live broadcast data may include information about a live broadcast such as its brands, lottery draws, time, interactions, anchors, and products.

Step S202: determine the user behavior sequence corresponding to the plurality of historical user data according to a sequence generation model, the plurality of historical user data, and the generation times of the plurality of historical user data.

Exemplarily, from the plurality of historical user data acquired in step S201, user behavior features are determined based on the user operation data, and the generation time of each user behavior feature is determined based on the time at which the user operation data was generated. The user behavior features and their generation times are input into the sequence generation model, which outputs a feature score for each user behavior feature. The feature scores are normalized to obtain a weight for each user behavior feature, and the user behavior sequence is generated as the weighted sum of the user behavior feature data using those weights. The user behavior sequence contains 12 elements, namely the user behavior scores for 12 months, and characterizes the user's behavior over the past year.

Step S203: determine user attribute features according to the plurality of historical user data, and determine live broadcast features according to the plurality of historical live broadcast data.

Exemplarily, the user attribute features are determined based on the user information data in the plurality of historical user data acquired in step S201, and the live broadcast features are determined based on the plurality of historical live broadcast data acquired in step S201.

Step S204: train a click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features.

Exemplarily, the user behavior sequence obtained in step S202 is input into an ARMA model, the ARMA model is trained, and the ARMA model parameters are output as the user dynamic features. The user dynamic features, together with the user attribute features and live broadcast features obtained in step S203, are input into the click-through rate prediction model, which is trained to yield the trained click-through rate prediction model. Here, the click-through rate prediction model is an XGBOOST model.

Step S205: determine the click-through rate of a target user for target live broadcast data according to the trained click-through rate prediction model.

Exemplarily, target user data and a plurality of live broadcast data to be pushed are acquired, and the click-through rate of the target user for the target live broadcast data is determined according to the target user data and the trained click-through rate prediction model. The click-through rate may cover data on regular clicks, browsing, adding to favorites, adding to cart, and placing orders. Based on the click-through rate, live broadcast data is pushed to the target user; alternatively, based on the click-through rate, the inventory quantity corresponding to the target live broadcast data is determined and inventory is managed accordingly, for example by replenishing stock in time, transferring stock from supply warehouses, or supporting demand warehouses.

In the embodiments of the present disclosure, the steps are: acquiring a plurality of historical user data and a plurality of historical live broadcast data; determining the user behavior sequence corresponding to the plurality of historical user data according to the sequence generation model, the plurality of historical user data, and their generation times; determining user attribute features according to the plurality of historical user data and live broadcast features according to the plurality of historical live broadcast data; training the click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features; and determining the click-through rate of the target user for the target live broadcast data according to the trained model. Together, these steps make it possible to adapt to periodic changes in user behavior, improve the performance of the live broadcast data prediction model, make full use of live broadcast resources, and predict the live broadcast click-through rate accurately, so that live broadcasts can be pushed to users precisely and inventory can be managed reasonably.

FIG. 3 is a schematic diagram of the detailed flow of the method for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 3, the method includes:

Step S301: construct the live broadcast database.

Exemplarily, the live broadcast database is constructed from the platform's existing historical data, and historical user data and historical live broadcast data are obtained from that historical data. There may be multiple kinds of historical user data. Taking an e-commerce platform as an example, the historical user data may include information such as the user's age, gender, purchasing power, occupation, and preferences; or data on the user's operations such as browsing live broadcasts, browsing products, adding to cart, placing orders, sharing, and commenting; or the operation time data corresponding to that operation data. There may likewise be multiple kinds of historical live broadcast data. Taking an e-commerce platform as an example, the historical live broadcast data may include information about a live broadcast such as its brands, lottery draws, time, interactions, anchors, and products.

Further, the platform may acquire the relevant data periodically and save the updates to the database.

Step S302: construct the user attribute features for the live broadcast.

Exemplarily, the information data in the historical user data is extracted from the live broadcast database constructed in step S301, and the user attribute features are determined from it. Taking the historical user data of an e-commerce platform as an example, the historical user data may include information such as the user's age, gender, occupation, preferred categories, purchasing power, geographic location, and consumption time. The user attribute features determined from it may include a user age feature, a user gender feature, a user occupation feature, a user preferred-category feature, a user purchasing power feature, a user geographic location feature, a user consumption time feature, and so on.

Step S303: construct the live broadcast features.

Exemplarily, the information data in the historical live broadcast data is extracted from the live broadcast database constructed in step S301, and the live broadcast features are determined from it. Taking an e-commerce platform as an example, the historical live broadcast data may include information about a live broadcast such as its brands, lottery draws, time, interactions, anchors, and products. The live broadcast features determined from it may include live broadcast brand features, lottery features, time features, anchor features, product features, and additional features.

Further, the live broadcast brand features include the number of brands in the broadcast and the number of followers of those brands; the lottery features include whether the broadcast holds a lottery and the number of lottery draws; the time features include whether the broadcast falls on a weekend and the broadcast time slot; the anchor features include the number of anchors, the number of followers of the anchors, the anchor type, and the type of goods the anchor sells; the product features include the number of products, the average product price, and the product type; and the additional features include whether a celebrity appears in the live broadcast room and whether the broadcast uses co-streaming (mic-linking).

Further, the user attribute features obtained in step S302 and the live broadcast features obtained in step S303 include both discrete features and continuous features. Discrete features (for example, the user's age, gender, and geographic location features) are sparse to some degree, which can sharply degrade model performance. The discrete features therefore undergo embedding processing, which outputs continuous vectors that represent the features more expressively in feature space. Embedding extracts features from the raw data and reduces their dimensionality by means of matrix multiplication. Continuous features are already continuous and need no such processing.
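The matrix-multiplication view of embedding mentioned above can be illustrated with a minimal sketch. The function names, the embedding dimension, and the random initialization are assumptions made for illustration; in practice the embedding vectors would be learned, and the source does not specify how.

```python
import random

def build_embedding_table(categories, dim, seed=0):
    """Assign each category a dense vector of length `dim`.

    Random initialisation is an assumption; real embeddings are trained.
    """
    rng = random.Random(seed)
    return {c: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for c in categories}

def embed(value, table):
    """Look up the dense vector for one discrete value."""
    return table[value]

def one_hot_matmul(value, categories, table):
    """Same result as `embed`, written as one-hot vector times embedding matrix."""
    one_hot = [1.0 if c == value else 0.0 for c in categories]
    dim = len(table[categories[0]])
    return [
        sum(one_hot[i] * table[c][j] for i, c in enumerate(categories))
        for j in range(dim)
    ]

# Hypothetical discrete feature: user geographic region.
regions = ["north", "south", "east", "west"]
table = build_embedding_table(regions, dim=3)
dense = embed("east", table)
```

The lookup and the one-hot product give the same vector, which is why a sparse one-hot input of length 4 is reduced here to a dense vector of length 3.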

Still further, the resulting continuous user attribute features and live broadcast features are input into an XGBOOST model for training, which outputs a feature score for each feature; the higher the score, the more important the feature. Features whose scores are at or above a predetermined value are selected as the final user attribute features and live broadcast features. As an ensemble learning model, XGBOOST can mitigate errors caused by feature sparsity and correlation, effectively remove redundant features, and improve feature quality. Given the XGBOOST model's inherent feature-crossing capability, the model's feature importance function can be used in Python for feature screening. Because the two indicators, the live-room product browsing rate and the live-room product order rate, emphasize different things, the feature importances output by model training differ between them.
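The threshold-based screening described above can be sketched as follows. The feature names, scores, and threshold are hypothetical; in a real pipeline the scores might come from a trained XGBoost model (for example the `feature_importances_` attribute of its scikit-learn wrapper), which is not reproduced here.

```python
def select_features(scores, threshold):
    """Keep features whose importance score is at or above `threshold`,
    ordered from most to least important."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

# Hypothetical importance scores for illustration only.
importance = {
    "anchor_follower_count": 0.41,
    "has_lottery": 0.07,
    "user_purchasing_power": 0.33,
    "is_weekend": 0.19,
}
kept = select_features(importance, threshold=0.15)
```

Because the browsing-rate and order-rate indicators produce different importances, this selection would be run once per indicator.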

Step S304: construct the user behavior features for the live broadcast.

Exemplarily, the operation data in the historical user data is extracted from the database constructed in step S301, and the user behavior features are determined from it. Taking the historical user data of an e-commerce platform as an example, the historical user data may include data on the user's operations such as browsing live broadcasts, browsing products, adding to cart, placing orders, and sharing. The user behavior features determined from it may include a live-broadcast browsing feature, a product browsing feature, an add-to-cart feature, an order feature, a sharing feature, and so on.

Step S305: construct the user behavior sequence for the live broadcast.

Exemplarily, the operation time data corresponding to the operation data in the historical user data is extracted from the live broadcast database constructed in step S301, and the generation time of each user behavior feature is determined from it.

Exemplarily, the user behavior features constructed in step S304 and the generation time of each user behavior feature are input into the sequence generation model for training. After the information gain is computed, the model outputs a feature score for each user behavior feature. The feature scores are normalized to obtain a weight for each user behavior feature, and the weighted sum of the user behavior feature data and these weights yields the user behavior sequence. Here, a weight characterizes the importance of a feature, and a sequence is a vector composed of the values of multiple features within a predetermined time period. The sequence generation model may be a random forest model.
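The normalization of feature scores into weights can be sketched as below. The source does not say which normalization is used; dividing each score by the total, so that the weights sum to 1, is an assumption, and the raw scores are hypothetical.

```python
def normalize_scores(scores):
    """Turn raw feature scores into weights that sum to 1.

    Sum-to-one scaling is an assumed choice of normalisation.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("feature scores must have a positive sum")
    return {name: score / total for name, score in scores.items()}

# Hypothetical raw scores for five user behavior features.
raw_scores = {"A": 3.0, "B": 2.0, "C": 2.0, "D": 1.0, "E": 2.0}
weights = normalize_scores(raw_scores)
```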

Exemplarily, the user behavior features constructed in step S304 include the live-broadcast browsing feature, the product browsing feature, the add-to-cart feature, the order feature, and the sharing feature. The operation time data corresponding to the operation data in the historical user data is extracted from the live broadcast database constructed in step S301 to determine the generation time of each of these five features. The live-broadcast browsing feature is denoted A, the product browsing feature B, the add-to-cart feature C, the order feature D, and the sharing feature E.

Exemplarily, the live broadcast database is constructed from the platform's historical data for the past year, and historical user data and historical live broadcast data are obtained from the database. From this data, the user behavior features A, B, C, D, and E for the past year are constructed. The features A, B, C, D, E and their generation times are input into the random forest model for training; after the information gain is computed, the model outputs feature scores for A, B, C, D, and E. These scores are normalized to obtain the weights W_A, W_B, W_C, W_D, and W_E, and the weighted sum of the data of A, B, C, D, E with the weights W_A, W_B, W_C, W_D, W_E yields the user behavior sequence.

Exemplarily, the data of A, B, C, D, and E may be the number of times the user browsed live broadcasts, browsed products, added to cart, placed orders, and shared, respectively. Obtaining the user behavior sequence by weighted summation proceeds as follows: the January data of A, B, C, D, E is summed with the weights W_A, W_B, W_C, W_D, W_E to give the user's behavior score for January; the data for each of February through December is summed with the same weights to give the behavior scores for those months; and the 12 monthly behavior scores are combined into the user behavior sequence. The user behavior sequence contains 12 elements, namely the user behavior scores for 12 months, and characterizes the user's behavior over the past year.

Further, for example, the data of A, B, C, D, and E is input into the random forest model, and the output weights W_A, W_B, W_C, W_D, W_E are [0.3, 0.2, 0.2, 0.1, 0.2]. Summing the January data of A, B, C, D, E (the user browsed live broadcasts 5 times, browsed products 10 times, added to cart 3 times, placed 2 orders, and shared once) with these weights gives the January behavior score s1 = 0.3*5 + 0.2*10 + 0.2*3 + 0.1*2 + 0.2*1 = 4.5. The data for each of February through December is summed with the same weights to give the behavior scores s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12. Combining the 12 monthly scores gives the user behavior sequence [s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12].
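The worked example above can be reproduced with a short sketch. The weights and the January counts are taken from the text; the function names and the counts for the other months are placeholders introduced for illustration.

```python
# Weights W_A..W_E from the example in the text.
WEIGHTS = {"A": 0.3, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.2}

def behavior_score(counts, weights=WEIGHTS):
    """Weighted sum of one month's behavior counts."""
    return sum(weights[name] * counts[name] for name in weights)

def behavior_sequence(monthly_counts, weights=WEIGHTS):
    """One behavior score per month, in chronological order."""
    return [behavior_score(counts, weights) for counts in monthly_counts]

# January counts from the text: browse live 5, browse products 10,
# add to cart 3, orders 2, shares 1.
january = {"A": 5, "B": 10, "C": 3, "D": 2, "E": 1}
s1 = behavior_score(january)

# Placeholder: reuse January's counts for the remaining 11 months.
sequence = behavior_sequence([january] * 12)
```

Here `s1` evaluates to 4.5 up to floating-point rounding, matching the value computed in the text, and `sequence` has the 12 elements the method expects.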

Further, the two indicators, the live-room product browsing rate and the live-room product order rate, emphasize different things: the browsing rate focuses on browsing volume, while the order rate focuses on order volume. The feature importances output by model training therefore differ between them. For example, for the browsing rate indicator, the data of A, B, C, D, and E is input into the random forest model and the output weights W_A, W_B, W_C, W_D, W_E are [0.3, 0.2, 0.2, 0.1, 0.2]; for the order rate indicator, the same inputs yield the weights [0.1, 0.2, 0.2, 0.3, 0.2]. For the browsing rate indicator, the weights W_A and W_B of A and B are greater than the weight W_D of D, which shows that the live-broadcast browsing feature and the product browsing feature matter more than the order feature for that indicator. Comparing the two indicators, the weight W_D of D is higher for the order rate, which shows that the order feature matters more for the live-room product order rate.

Step S306: construct the user dynamic features for the live broadcast.

Exemplarily, the user behavior sequence constructed in step S305 is input into the ARMA model, and the stationarity of the sequence is checked with the ADF test. If the ADF test shows that the sequence is not stationary, it is differenced until it becomes stationary. Once the sequence is confirmed to be stationary, its autocorrelation coefficient a (ACF) and partial autocorrelation coefficient b (PACF) are computed, and the ARMA model is identified from them as follows: if a tails off and b cuts off at order p, the model is an AR(p) model; if a cuts off at order q and b tails off, the model is an MA(q) model; if both a and b tail off, the model is an ARMA(p, q) model. For the identified model, the orders p and q are determined using the AIC and BIC criteria. The orders p and q represent the autocorrelation characteristics of the sequence itself, in particular its periodic behavior. Once p and q are determined, the model parameters of the ARMA model can be estimated, and these parameters are combined into a feature vector, which is the user dynamic feature.
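The differencing, autocorrelation, and identification rules above can be sketched in plain Python. This is an illustration only: the ADF test, the PACF, and the parameter estimation are not implemented here and would normally come from a statistics library, and the function names are hypothetical.

```python
def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

def identify_model(acf_cuts_off, pacf_cuts_off):
    """Map the cut-off / tail-off pattern to a model family."""
    if pacf_cuts_off and not acf_cuts_off:
        return "AR(p)"
    if acf_cuts_off and not pacf_cuts_off:
        return "MA(q)"
    if not acf_cuts_off and not pacf_cuts_off:
        return "ARMA(p, q)"
    return "undetermined"  # both cut off: a case the text does not cover

# A trending toy series becomes constant after two rounds of differencing.
trend = [1, 3, 6, 10, 15]
once = difference(trend)
twice = difference(trend, order=2)
```

Deciding whether a coefficient sequence "cuts off" is a judgment usually made against a confidence band; the boolean flags passed to `identify_model` stand in for that decision.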

Further, users' periodic behavior, such as weekly or monthly habits, gives user behavior a periodic regularity. The user dynamic features are sequence features and are themselves continuous.

Further, deep learning models such as RNNs, LSTMs, and temporal convolutional networks may also be used to construct the user dynamic features.

Exemplarily, using the user behavior sequence directly as input to the live broadcast click-through rate prediction model raises problems: the dimensionality is too large, which may waste resources, and the dimensions and sequence lengths are not uniform. The ARMA model is therefore used to abstract the user behavior sequence, and its model parameters are used to construct the user dynamic features for the live broadcast.

Step S307: train the click-through rate prediction model.

Exemplarily, feature samples for the live broadcast click-through rate are constructed from the user dynamic features constructed in step S306, the user attribute features constructed in step S302, and the live broadcast features constructed in step S303. The feature samples are divided into a training set and a test set: 80% of the samples are selected as the training set and used to train the model, yielding the trained click-through rate prediction model; the remaining 20% serve as the test set and are used to test the trained click-through rate prediction model.
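The 80/20 division can be sketched as follows. Shuffling before the split, the fixed seed, and the function name are assumptions; the source only states the proportions.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=42):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder feature samples, identified by index.
train_set, test_set = split_samples(range(10))
```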

Exemplarily, the training-set feature samples are input into the XGBOOST model, weak classifiers are trained iteratively, and the weak classifiers are combined iteratively into a single ensemble classifier, which is the trained click-through rate prediction model.

Further, the click-through rate prediction model is used to predict the click-through rate of a live broadcast, where the click-through rate may cover data such as regular clicks, browsing, adding to favorites, adding to cart, placing orders, and sharing.

Further, the XGBOOST model (eXtreme Gradient Boosting) is a boosted tree model. It trains iteratively on the input training-set feature samples, learning a weak classifier at each iteration and updating the weights of the training samples according to the coefficients of the weak classifiers. Each new weak classifier fits the residual between the results of the previous weak classifiers and the training-set samples, and the weak classifiers are combined iteratively into a strong classifier, which is the prediction model. As an ensemble learning model, XGBOOST has good learning performance. As an optimized distributed gradient boosting model, it can also produce a high-performing prediction model in a short time with relatively few computing resources, and it combines the advantages of regularization, sparsity awareness, and cross-validation.
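The residual-fitting idea described above can be illustrated with a toy booster built from one-split regression stumps on a single feature. This is plain squared-error gradient boosting written for illustration, not XGBoost itself, which additionally uses second-order gradients and regularization; the data, learning rate, and number of rounds are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimises squared error."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def boost(xs, ys, rounds=20, lr=0.5):
    """Each round fits a stump to the current residuals and adds it to the ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Toy data: one feature, click label is 1 when the feature exceeds 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
```

With each round the ensemble prediction moves halfway toward the labels, so the combined model is far more accurate than any single stump scaled by the learning rate.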

Further, algorithms such as LR (logistic regression), random forest, GBDT, and BP neural networks may also be used for model training.

Step S308: verify the click-through rate prediction model.

Exemplarily, the test-set feature samples of the live broadcast click-through rate are input into the combined classifier obtained by training in step S307, that is, the trained click-through rate prediction model, which outputs the users' live broadcast click-through rates. The error of the click-through rate prediction model is calculated from the test results, and it is judged whether the model error is higher than a standard error value; the click-through rate prediction model is then corrected according to the model error so that it meets the requirements.
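The verification check described above might look like the following sketch. The disclosure does not name the error metric or the standard error value, so the use of log loss, the function names, and the threshold values here are all assumptions.

```python
import math

def log_loss(labels, probabilities, eps=1e-12):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def needs_correction(labels, probabilities, error_threshold):
    """True when the test-set error exceeds the (hypothetical) standard error value."""
    return log_loss(labels, probabilities) > error_threshold

# Hypothetical test-set labels and model outputs.
test_labels = [1, 0, 1, 0]
test_predictions = [0.9, 0.1, 0.8, 0.2]
test_error = log_loss(test_labels, test_predictions)
```

If `needs_correction` returns true, the model would be retrained or tuned before use; how that correction is carried out is left open by the text.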

Further, by exploiting the association relationships based on sequence features, the prediction accuracy of the prediction model is greatly improved, and the live broadcast click-through rate can be predicted accurately.

Step S309: use the model.

Exemplarily, based on the click-through rate prediction model obtained in step S308, the acquired target user data and the multiple pieces of live broadcast data to be pushed are input into the click-through rate prediction model, which outputs the click-through rate for the target live broadcast data. Live broadcast data can be pushed to the target user according to the click-through rate; alternatively, the inventory quantity corresponding to the target live broadcast data can be determined according to the click-through rate, and inventory management can be performed according to that quantity, for example by allocating stock directly from the place of origin or transferring stock from a supply warehouse to a demand warehouse.

Further, the click-through rate may include data on regular clicks, browsing, favoriting, adding to cart, ordering, and the like. If the click-through rate for ordering a certain product is high, the warehouse increases its stock of that product; if a user's click-through rate for browsing, favoriting, or adding a certain product to the cart is high, live broadcasts of that product or of similar products are recommended to that user.
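The two uses of the predicted click-through rate described above, pushing broadcasts and planning stock, can be sketched as below. The function names, the top-k selection rule, and the way expected orders are derived from an audience size are hypothetical; the disclosure only states that pushing and inventory decisions follow from the predicted rate.

```python
def select_broadcasts_to_push(predicted_ctr, top_k=2):
    """Return the identifiers of the broadcasts with the highest predicted click-through rate."""
    return sorted(predicted_ctr, key=predicted_ctr.get, reverse=True)[:top_k]

def units_to_replenish(order_ctr, expected_viewers, current_stock):
    """Estimate how many extra units to stock from the predicted order rate (illustrative rule)."""
    expected_orders = round(order_ctr * expected_viewers)
    return max(0, expected_orders - current_stock)

# Hypothetical model outputs for one target user and three candidate broadcasts.
predicted_ctr = {"broadcast_a": 0.12, "broadcast_b": 0.40, "broadcast_c": 0.25}
to_push = select_broadcasts_to_push(predicted_ctr)
shortfall = units_to_replenish(order_ctr=0.05, expected_viewers=1000, current_stock=30)
```

A positive shortfall would correspond to allocating stock from the place of origin or transferring it from a supply warehouse to a demand warehouse.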

In the embodiments of the present disclosure, the steps of constructing the live broadcast database; constructing the user attribute features; constructing the live broadcast features; constructing the user behavior features; constructing the user behavior sequence; constructing the user dynamic features; training the click-through rate prediction model; verifying the click-through rate prediction model; and using the model make it possible to adapt to periodic changes in user behavior, optimize the performance of the live broadcast data prediction model, make full use of live broadcast resources, accurately predict the live broadcast click-through rate, push live broadcasts to users precisely, and manage inventory reasonably.

FIG. 4 is a schematic diagram of the main modules of an apparatus for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 4, the apparatus 400 for determining a live broadcast click-through rate of the present disclosure includes:

An acquisition module 401, configured to acquire multiple pieces of historical user data and multiple pieces of historical live broadcast data.

Exemplarily, the acquisition module 401 acquires multiple pieces of historical user data and multiple pieces of historical live broadcast data from the platform's historical data. The historical user data may include data on user information such as age, gender, purchasing power, occupation, and preferences; data on user operations such as browsing, commenting, favoriting, adding to cart, ordering, and sharing; and the time at which the user operation data was generated. The historical live broadcast data may include data on information such as the brand, lottery draws, time, interaction, anchor, and products of a live broadcast.

A sequence generation module 402, configured to determine the user behavior sequence corresponding to the multiple pieces of historical user data according to a sequence generation model, the multiple pieces of historical user data, and the generation times of the multiple pieces of historical user data.

Exemplarily, from the multiple pieces of historical user data acquired by the acquisition module 401, user behavior features are determined based on the user operation data, and the generation time corresponding to each user behavior feature is determined based on the time at which the user operation data was generated. The sequence generation module 402 inputs the user behavior features and their corresponding generation times into the sequence generation model, which outputs a feature score for each user behavior feature. The sequence generation module 402 normalizes the feature scores to obtain a weight value for each user behavior feature, and then generates the user behavior sequence as a weighted sum of the user behavior feature data using those weight values. The user behavior sequence contains 12 elements, namely the user behavior scores for 12 months, and characterizes the user's behavior over the past year.
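The normalization and weighted-sum steps described above can be sketched as follows. The feature scores are taken as given (in the disclosure they come from the sequence generation model); dividing by the total as the normalization, the monthly-count data layout, and all names are assumptions.

```python
def normalize(scores):
    """Scale the feature scores so that the resulting weights sum to 1 (assumed normalization)."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

def behavior_sequence(monthly_values, weights):
    """Weighted sum of the behavior features for each month, giving one score per month."""
    months = len(next(iter(monthly_values.values())))
    return [sum(weights[name] * monthly_values[name][m] for name in monthly_values)
            for m in range(months)]

# Hypothetical feature scores output by the sequence generation model.
feature_scores = {"browse": 1.0, "favorite": 2.0, "add_to_cart": 3.0, "order": 4.0}
weights = normalize(feature_scores)

# Hypothetical monthly counts of each behavior over the past 12 months.
monthly_values = {
    "browse": [10] * 12,
    "favorite": [5] * 12,
    "add_to_cart": [0] * 12,
    "order": [1] * 12,
}
sequence = behavior_sequence(monthly_values, weights)
```

The result is a 12-element list, one user behavior score per month, matching the sequence length stated in the text.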

A feature generation module 403, configured to determine user attribute features according to the multiple pieces of historical user data, and to determine live broadcast features according to the multiple pieces of historical live broadcast data.

Exemplarily, from the multiple pieces of historical user data acquired by the acquisition module 401, the feature generation module 403 determines the user attribute features based on the user information data, and determines the live broadcast features based on the multiple pieces of historical live broadcast data.

A model training module 404, configured to train the click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features.

Exemplarily, the model training module 404 inputs the user behavior sequence obtained by the sequence generation module 402 into an ARMA model, trains the ARMA model, and outputs the ARMA model parameters as the user dynamic features. The model training module 404 then inputs the user dynamic features, together with the user attribute features and live broadcast features obtained by the feature generation module 403, into the click-through rate prediction model, trains it, and outputs the trained click-through rate prediction model. The click-through rate prediction model is an XGBOOST model.
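The idea of using fitted time-series parameters as user dynamic features can be illustrated with a simplified stand-in. The sketch below fits only an AR(2) model by solving the Yule-Walker equations, so the moving-average part of a full ARMA model is omitted; the model order, function names, and example series are assumptions, and the series is assumed to be non-constant.

```python
def autocovariance(series, lag):
    """Biased sample autocovariance of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / n

def ar2_features(series):
    """Estimate AR(2) coefficients via the Yule-Walker equations and return them as features."""
    g0 = autocovariance(series, 0)  # assumed non-zero (series is not constant)
    r1 = autocovariance(series, 1) / g0
    r2 = autocovariance(series, 2) / g0
    phi1 = r1 * (1 - r2) / (1 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
    return [phi1, phi2]

# Hypothetical 12-month user behavior sequence.
behavior = [3, 5, 4, 6, 5, 7, 6, 8, 7, 9, 8, 10]
dynamic_features = ar2_features(behavior)
```

The two coefficients would then be concatenated with the user attribute features and live broadcast features to form one training sample for the click-through rate prediction model. A production system would more likely fit a full ARMA model with a statistics library.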

A data processing module 405, configured to determine the click-through rate of a target user with respect to target live broadcast data according to the trained click-through rate prediction model.

Exemplarily, target user data and multiple pieces of live broadcast data to be pushed are acquired, and the data processing module 405 determines the click-through rate of the target user with respect to the target live broadcast data according to the target user data and the trained click-through rate prediction model. The click-through rate may include data on regular clicks, browsing, favoriting, adding to cart, ordering, and the like. Live broadcast data is pushed to the target user according to the click-through rate; alternatively, the inventory quantity corresponding to the target live broadcast data is determined according to the click-through rate, and inventory management is performed according to that quantity, for example by increasing stock, transferring stock from a supply warehouse, or supporting a demand warehouse in a timely manner.

In the embodiments of the present disclosure, modules such as the acquisition module, the sequence generation module, the feature generation module, the model training module, and the data processing module make it possible to adapt to periodic changes in user behavior, optimize the performance of the live broadcast data prediction model, make full use of live broadcast resources, accurately predict the live broadcast click-through rate, push live broadcasts to users precisely, and manage inventory reasonably.

FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present disclosure. As shown in FIG. 5, the computer system 500 of the terminal device includes:

A central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the system of the present disclosure are performed.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including an acquisition module, a sequence generation module, a feature generation module, a model training module, and a data processing module. The names of these modules do not, in certain cases, limit the modules themselves; for example, the acquisition module may also be described as "a module for acquiring live broadcast data from a live broadcast platform".

As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: acquire multiple pieces of historical user data and multiple pieces of historical live broadcast data; determine the user behavior sequence corresponding to the multiple pieces of historical user data according to a sequence generation model, the multiple pieces of historical user data, and the generation times of the multiple pieces of historical user data; determine user attribute features according to the multiple pieces of historical user data, and determine live broadcast features according to the multiple pieces of historical live broadcast data; train a click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features; and determine the click-through rate of a target user with respect to target live broadcast data according to the trained click-through rate prediction model.

The technical solutions of the embodiments of the present disclosure can improve the prediction accuracy of the live broadcast click-through rate, thereby improving push accuracy and reducing the likelihood of insufficient stock or unsold goods.

The specific embodiments above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (10)

1. A method for determining a live broadcast click-through rate, comprising:
acquiring multiple pieces of historical user data and multiple pieces of historical live broadcast data;
determining a user behavior sequence corresponding to the multiple pieces of historical user data according to a sequence generation model, the multiple pieces of historical user data, and the generation times of the multiple pieces of historical user data;
determining user attribute features according to the multiple pieces of historical user data, and determining live broadcast features according to the multiple pieces of historical live broadcast data;
training a click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features; and
determining the click-through rate of a target user with respect to target live broadcast data according to the trained click-through rate prediction model.

2. The method according to claim 1, wherein determining the user behavior sequence corresponding to the multiple pieces of historical user data according to the sequence generation model, the multiple pieces of historical user data, and the generation times of the multiple pieces of historical user data comprises:
determining user behavior features according to the multiple pieces of historical user data;
using the user behavior features and the generation times corresponding to the user behavior features as input to the sequence generation model, and determining a weight value corresponding to each of the user behavior features according to the output of the sequence generation model; and
generating the user behavior sequence according to the user behavior features and the weight values.

3. The method according to claim 2, wherein determining the weight value corresponding to each of the user behavior features according to the output of the sequence generation model comprises:
normalizing the output of the sequence generation model to obtain the weight value corresponding to each of the user behavior features.

4. The method according to claim 1, wherein training the click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features comprises:
inputting the user behavior sequence into an ARMA model, and determining user dynamic features according to the output of the ARMA model; and
using the user dynamic features, the user attribute features, and the live broadcast features as input to the click-through rate prediction model, so as to train the click-through rate prediction model.

5. The method according to claim 1, further comprising, after determining the click-through rate of the target user with respect to the target live broadcast data:
pushing live broadcast data to the target user according to the click-through rate.

6. The method according to claim 1, further comprising, after determining the click-through rate of the target user with respect to the target live broadcast data:
determining the inventory quantity corresponding to the target live broadcast data according to the click-through rate, and performing inventory management according to the inventory quantity.

7. The method according to claim 1, wherein
the sequence generation model is a random forest model;
and/or
the click-through rate prediction model is an XGBOOST model.

8. An apparatus for determining a live broadcast click-through rate, comprising:
an acquisition module, configured to acquire multiple pieces of historical user data and multiple pieces of historical live broadcast data;
a sequence generation module, configured to determine a user behavior sequence corresponding to the multiple pieces of historical user data according to a sequence generation model, the multiple pieces of historical user data, and the generation times of the multiple pieces of historical user data;
a feature generation module, configured to determine user attribute features according to the multiple pieces of historical user data, and to determine live broadcast features according to the multiple pieces of historical live broadcast data;
a model training module, configured to train a click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features; and
a data processing module, configured to determine the click-through rate of a target user with respect to target live broadcast data according to the trained click-through rate prediction model.

9. An electronic device for determining a live broadcast click-through rate, comprising:
one or more processors; and
a storage device, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.

10. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2022/071797 2021-01-21 2022-01-13 Method and device for determining live broadcast click rate Ceased WO2022156589A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110081300.7 2021-01-21
CN202110081300.7A CN113778979B (en) 2021-01-21 2021-01-21 A method and device for determining click rate of live broadcast

Publications (1)

Publication Number Publication Date
WO2022156589A1 true WO2022156589A1 (en) 2022-07-28

Family

ID=78835536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071797 Ceased WO2022156589A1 (en) 2021-01-21 2022-01-13 Method and device for determining live broadcast click rate

Country Status (2)

Country Link
CN (1) CN113778979B (en)
WO (1) WO2022156589A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579872A (en) * 2024-01-15 2024-02-20 北京永泰万德信息工程技术有限公司 Live broadcast pushing method and system for live broadcast display screen
CN118784904A (en) * 2024-09-06 2024-10-15 江苏省新闻出版学校 A network live teaching management system and method based on artificial intelligence

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778979B (en) * 2021-01-21 2025-03-21 北京沃东天骏信息技术有限公司 A method and device for determining click rate of live broadcast

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992710A (en) * 2019-02-13 2019-07-09 网易传媒科技(北京)有限公司 Clicking rate predictor method, system, medium and calculating equipment
CN110929206A (en) * 2019-11-20 2020-03-27 腾讯科技(深圳)有限公司 Click rate estimation method and device, computer readable storage medium and equipment
CN111046294A (en) * 2019-12-27 2020-04-21 支付宝(杭州)信息技术有限公司 Click rate prediction method, recommendation method, model, device and equipment
US20200285937A1 (en) * 2017-10-11 2020-09-10 Beijing Sankuai Online Technology Co., Ltd Consumption capacity prediction
CN113778979A (en) * 2021-01-21 2021-12-10 北京沃东天骏信息技术有限公司 Method and device for determining live broadcast click rate

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664540B2 (en) * 2017-12-15 2020-05-26 Intuit Inc. Domain specific natural language understanding of customer intent in self-help
US11196964B2 (en) * 2019-06-18 2021-12-07 The Calany Holding S. À R.L. Merged reality live event management system and method
CN111445280A (en) * 2020-03-10 2020-07-24 携程计算机技术(上海)有限公司 Model generation method, restaurant ranking method, system, device and medium
CN111711828B (en) * 2020-05-18 2022-04-05 北京字节跳动网络技术有限公司 Information processing method and device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579872A (en) * 2024-01-15 2024-02-20 北京永泰万德信息工程技术有限公司 Live broadcast pushing method and system for live broadcast display screen
CN117579872B (en) * 2024-01-15 2024-04-30 北京永泰万德信息工程技术有限公司 Live broadcast pushing method and system for live broadcast display screen
CN118784904A (en) * 2024-09-06 2024-10-15 江苏省新闻出版学校 A network live teaching management system and method based on artificial intelligence

Also Published As

Publication number Publication date
CN113778979A (en) 2021-12-10
CN113778979B (en) 2025-03-21

Similar Documents

Publication Publication Date Title
CN109460513B (en) Method and apparatus for generating click rate prediction model
CN105208113A (en) Information pushing method and device
WO2022156589A1 (en) Method and device for determining live broadcast click rate
CN111095330B (en) Machine learning method and system for predicting online user interactions
CN112598472A (en) Product recommendation method, device, system, medium and program product
US20210192549A1 (en) Generating analytics tools using a personalized market share
CN110020876B (en) A method and device for generating information
CN110298716A (en) Information-pushing method and device
CN109961299A (en) The method and apparatus of data analysis
CN114549125B (en) Item recommendation method and device, electronic device and computer-readable storage medium
CN112749323B (en) Method and device for constructing user portrait
CN113495991A (en) Recommendation method and device
CN110866040A (en) User portrait generation method, device and system
CN113763112A (en) A kind of information push method and device
CN109978594B (en) Order processing method, device and medium
CN110796505A (en) Service object recommendation method and device
CN109299351B (en) Content recommendation method and device, electronic equipment and computer readable medium
CN113822734B (en) Method and device for generating information
CN113159877B (en) Data processing method, device, system and computer readable storage medium
CN113269600B (en) Information sending method and device
CN113792952A (en) Method and apparatus for generating a model
CN111768218B (en) Method and device for processing user interaction information
CN110490682B (en) Method and device for analyzing commodity attributes
CN113327147A (en) Method and device for displaying article information
CN117349546A (en) Method, device and system for generating information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22742068

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22742068

Country of ref document: EP

Kind code of ref document: A1
