CN113297151B

CN113297151B - Data processing method and device

Info

Publication number: CN113297151B
Application number: CN202110182725.7A
Authority: CN
Inventors: 刘志鹏; 张友东; 杨成虎; 吴兴博
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2021-02-10
Filing date: 2021-02-10
Publication date: 2025-01-03
Anticipated expiration: 2041-02-10
Also published as: CN113297151A

Abstract

Embodiments of the present specification provide a data processing method and device, wherein the data processing method comprises: receiving a data storage request, wherein the data storage request carries data to be stored, attribute information of the data to be stored, and a data type of the data to be stored; determining a data tag of the data to be stored based on the attribute information of the data to be stored; creating a data partition based on the data tag, and determining the data type of the data to be stored corresponding to the data tag; based on the data type of the data to be stored corresponding to the data tag, dividing the data partition corresponding to the data tag into sub-partitions corresponding to the data type; based on the data type of the data to be stored corresponding to the data tag, storing the data to be stored corresponding to the data tag in the sub-partition corresponding to the data type.

Description

Data processing method and device

Technical Field

The embodiments of this specification relate to the field of computer technology, and in particular to a data processing method. One or more embodiments of this specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium.

Background

As the Internet and the Internet of Things are applied ever more widely, monitoring systems and Internet of Things devices generate more and more time-based data, which is called time series data. Time series data comes from different sources, and because of version upgrades of acquisition equipment, system upgrades and similar causes, its data types vary. As the number of data types grows, a time series database whose stored data type is fixed to a single type cannot store such data. This leads to data loss and poor data security, makes subsequent data queries inconvenient, and reduces query efficiency.

Disclosure of Invention

In view of this, the present embodiments provide a data processing method. One or more embodiments of the present specification are also directed to a data processing apparatus, a computing device, and a computer-readable storage medium, which address the technical deficiencies of the prior art.

According to a first aspect of embodiments of the present specification, there is provided a data processing method, including:

receiving a data storage request, wherein the data storage request carries data to be stored, attribute information of the data to be stored and a data type of the data to be stored;

determining a data tag of the data to be stored based on attribute information of the data to be stored;

creating a data partition based on the data tag, and determining a data type of data to be stored corresponding to the data tag;

dividing a data partition corresponding to the data tag into sub-partitions corresponding to the data type based on the data type of the data to be stored corresponding to the data tag;

and storing the data to be stored corresponding to the data tag to the sub-partition corresponding to the data type based on the data type of the data to be stored corresponding to the data tag.

According to a second aspect of embodiments of the present specification, there is provided a data processing apparatus comprising:

a receiving module configured to receive a data storage request, wherein the data storage request carries data to be stored, attribute information of the data to be stored and a data type of the data to be stored;

a determining module configured to determine a data tag of the data to be stored based on attribute information of the data to be stored;

a creation module configured to create a data partition based on the data tag and determine the data type of the data to be stored corresponding to the data tag;

a dividing module configured to divide the data partition corresponding to the data tag into sub-partitions corresponding to the data type based on the data type of the data to be stored corresponding to the data tag;

and a storage module configured to store the data to be stored corresponding to the data tag to the sub-partition corresponding to the data type based on the data type of the data to be stored corresponding to the data tag.

According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions that, when executed by the processor, implement the steps of the data processing method.

According to a fourth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of any one of the data processing methods.

In the embodiments of this specification, the data tag of the data to be stored is determined from its attribute information, a data partition is created according to that data tag, and the data type of the data to be stored corresponding to the data tag is determined. The data partition is then divided into sub-partitions according to the data type, and the data to be stored is stored in the sub-partition corresponding to its data type. Because data is stored by data type, data loss caused by the database being unable to store data of a given type is avoided, and the security of data storage is improved. Fast storage of massive time series data is also supported, which improves storage efficiency and, in turn, the efficiency of subsequent data queries.

Drawings

FIG. 1 is a schematic diagram of a time-series database model of a data processing method according to one embodiment of the present disclosure;

FIG. 2 is a flow chart of a method of data processing provided in one embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a data processing method applied to recording temperature values according to an embodiment of the present disclosure;

FIG. 4 is a process flow diagram of a data processing method according to one embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a data processing apparatus according to one embodiment of the present disclosure;

FIG. 6 is a block diagram of a computing device provided in one embodiment of the present description.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. This specification is therefore not limited by the specific implementations disclosed below.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of this specification. Depending on the context, the term "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".

First, terms related to one or more embodiments of the present specification will be explained.

Metric: a collection of time series data of the same kind.

Tags: describe the characteristics of the data source and generally do not change over time.

Field: describes a measured indicator of the data source and generally changes over time.

Time line (Time Series): the change of one indicator of a data source over time forms a time line. For example, a time line is identified by Metric + Tags + Field.

Data Point: the value of a measured indicator (Field Value) that a data source produces at a given time.

Timestamp: the point in time at which the data was generated.

Dynamic data type: the data type of the indicator value of a written data point is allowed to be dynamic.
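
The terms above can be pictured with a small data model. The following is only a minimal sketch: the class names `TimeLine` and `DataPoint`, the helper `make_timeline`, and the use of an integer epoch timestamp are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Dynamic data type: a field value may be an integer, a float or a string.
FieldValue = Union[int, float, str]


@dataclass(frozen=True)
class TimeLine:
    """Identifies one time line as Metric + Tags + Field."""
    metric: str
    tags: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs, hashable
    field: str


@dataclass(frozen=True)
class DataPoint:
    """One field value produced by a data source at one timestamp."""
    timeline: TimeLine
    timestamp: int  # integer epoch seconds (an assumption of this sketch)
    value: FieldValue


def make_timeline(metric: str, tags: dict, field: str) -> TimeLine:
    # Sorting makes the key independent of the order in which tags are given.
    return TimeLine(metric, tuple(sorted(tags.items())), field)


a = make_timeline("weather", {"region": "beijing", "device": "d1"}, "temperature")
b = make_timeline("weather", {"device": "d1", "region": "beijing"}, "temperature")
point = DataPoint(a, 1609459200, 15)
```

Here `a` and `b` compare equal because they share the same metric, tags and field, which is what lets data points with values of different types still belong to one time line.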

A time series database describes how things change over time and is widely used in industry scenarios such as application performance monitoring, Internet of Things device monitoring systems, and the industrial Internet. Because its data model differs from that of a general-purpose database, a time series database must support second-level writing of data points while also providing a high compression ratio, low-cost storage, pre-downsampling, interpolation, multidimensional aggregation and similar functions in order to meet the storage and processing needs of massive time series data. This makes managing time series data efficiently very difficult.

Referring to fig. 1, fig. 1 is a schematic diagram of a time-series database model of a data processing method according to an embodiment of the present disclosure.

Fig. 1 is a view of the time series model commonly used in the industry, that is, the model used to store time series data in a time series database. Part A of Fig. 1 is the timestamp sequence, which records the time at which each piece of time series data is stored, for example 2020-10-24:01, 2020-10-24:02, 2020-10-24:03, and so on; consecutive timestamps at a given interval can be set according to the application. Part B of Fig. 1 is the tag sequence, which may include information such as a device number and a region. A tag can indicate the device that recorded the piece of time series data and the region where that device is located, and different tag information can be set according to the application; the embodiments of this specification place no limit on this. Part C of Fig. 1 is the field sequence, which may include, for example, a temperature field and a description field. Note that the field sequences are where the actual data is stored.

In practice, time series data can be stored according to the storage model of Fig. 1. There may be several field sequences, and the type of the values written to each field sequence is fixed. For example, in Fig. 1 the data written to the temperature field is floating point data and the data written to the description field is string data. Consequently, because the data type of the temperature field in the time series database is floating point, a temperature value that arrives as integer data cannot be written: the time series database does not support that write.
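
The limitation described above can be illustrated with a small sketch. The `FixedTypeField` class is hypothetical and only models a field sequence whose value type is fixed when it is created; it is not taken from any particular time series database.

```python
class FixedTypeField:
    """A field sequence that accepts values of exactly one type."""

    def __init__(self, name, value_type):
        self.name = name
        self.value_type = value_type
        self.values = {}  # timestamp -> value

    def write(self, timestamp, value):
        # An exact type check: an int is not silently accepted as a float.
        if type(value) is not self.value_type:
            raise TypeError(
                f"field {self.name!r} stores {self.value_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self.values[timestamp] = value


temperature = FixedTypeField("temperature", float)
temperature.write(1, 12.1)  # accepted: floating point data

try:
    temperature.write(2, 15)  # integer data after a device upgrade
    rejected = False
except TypeError:
    rejected = True
```

The second write is refused, so the data point for timestamp 2 is lost unless the caller converts it, which is the situation the method described below is meant to avoid.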

To solve the problem that a time series database cannot accept writes after the data type has changed, the data processing method provided by the embodiments of this specification sets up partitions for different data types within the same field sequence. This supports dynamic changes in the type of field data and makes the database flexible enough to fit more project scenarios.

It should be noted that time series data has flexible data types: not only do different time lines have different data types, but the data type within a single time line often changes dynamically. Accordingly, writing time series data of multiple types to the same time line can be applied in any scenario that stores time series data of several types, and this specification places no limit on this.

In the present specification, a data processing method is provided, and the present specification relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.

Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.

Step 202, receiving a data storage request, wherein the data storage request carries data to be stored, attribute information of the data to be stored and a data type of the data to be stored.

The attribute information of the data to be stored may be understood as a storage time stamp of the data to be stored, metric information of the data to be stored, and the like.

The data type of the data to be stored may be understood as a data form or type of the data to be stored, such as integer type data, floating point type data, numeric type string data, and the like.

Specifically, the server receives a data storage request from a user, and the request carries the data to be stored, attribute information of the data to be stored, and the data type of the data to be stored. For example, the server receives a data storage request for temperature data at time t. The data to be stored carried in the request may be the actual temperature reading of 15 degrees, the attribute information may be the time t and a temperature tag, and the data type may be integer.

The embodiments of this specification take weather data recorded in Beijing as an example, where the weather data includes temperature data, humidity data, air pressure data, and the like.

Step 204, determining a data tag of the data to be stored based on the attribute information of the data to be stored.

The data tag of the data to be stored may be understood as a measurement tag of the data to be stored, for example, the data tag may be any tag information that may represent a data attribute, such as temperature, humidity, air quality, and the like.

The data tag provided in the embodiment of the present disclosure is described in detail with reference to temperature or humidity, but is not limited thereto.

Specifically, the data tag of the data to be stored is determined from its attribute information. For example, from the time t and the temperature tag in the attribute information, the data tag can be determined to be temperature, and the field under which the data is stored in the time series database can then be expressed as temperature.
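
Steps 202 and 204 can be sketched as follows. The `StorageRequest` class, the `determine_data_tag` function and the attribute key names `"timestamp"` and `"metric"` are assumptions made for this illustration; the specification does not prescribe a request format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StorageRequest:
    data: Union[int, float, str]  # the data to be stored, e.g. 15
    attributes: dict              # attribute information, e.g. time and metric
    data_type: str                # e.g. "integer", "float" or "string"


def determine_data_tag(request: StorageRequest) -> str:
    # Assumption of this sketch: the tag is carried as the "metric" attribute.
    tag = request.attributes.get("metric")
    if not tag:
        raise ValueError("attribute information carries no metric")
    return tag


request = StorageRequest(
    data=15,
    attributes={"timestamp": 1609459200, "metric": "temperature"},
    data_type="integer",
)
tag = determine_data_tag(request)
```

For the temperature example above, `tag` is `"temperature"`, which names the field under which the value is stored.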

Step 206, creating a data partition based on the data tag, and determining the data type of the data to be stored corresponding to the data tag.

The data partition may be understood as a partition that stores time series data. Different data partitions can be divided according to different fields; for example, the storage location of the temperature field is one data partition.

Specifically, the database creates a data partition for the data to be stored based on the determined data tag, with one data tag corresponding to one data partition, and the database determines the data type of the data to be stored.

For example, if the time series database needs to store 7 days of meteorological data for a city, the fields stored in the database can be divided into temperature, humidity, air pressure, and so on. The data tags of the data to be stored are then temperature, humidity, air pressure, and so on, and a corresponding data partition is created in the time series database for each data tag. Suppose the stored temperature data is determined to be integer data, the stored humidity data floating point data, and the stored air pressure data string data.

And step 208, dividing the data partition corresponding to the data tag into sub-partitions corresponding to the data type based on the data type of the data to be stored corresponding to the data tag.

A sub-partition may be understood as a partition divided within a data partition to store data to be stored of a particular data type.

Specifically, according to the data type of the data to be stored corresponding to the data tag, the data partition corresponding to the data tag may be divided into sub-partitions corresponding to the data type. Continuing the example above, when the temperature data to be stored is determined to be integer data, the data partition that stores temperature can be divided to contain a sub-partition for integer data, and the temperature time series data stored there is integer data.

In practice, upgrades to the data acquisition equipment or to the system may change the data type of the data to be stored, so the time series database can divide a data partition into different sub-partitions for the different data types. Specifically, dividing the data partition corresponding to the data tag into sub-partitions corresponding to the data type, based on the data type of the data to be stored corresponding to the data tag, includes:

dividing the data partition corresponding to the data tag into at least two sub-partitions corresponding to the at least two data types based on the at least two data types of the data to be stored corresponding to the data tag.

Specifically, the data to be stored may include at least two data types, and the data partition corresponding to the data tag of the data to be stored is divided into at least two sub-partitions, where each sub-partition corresponds to a data type of the data to be stored.

For example, when time series data of humidity is recorded, in the data partition corresponding to the humidity data, the integer type data may be divided into one sub-partition, the floating point type data may be divided into one sub-partition, and the string type data may be divided into one sub-partition.

In practice, when a large amount of time series data is stored, the data to be stored under each data tag may have at least two data types. The data types present can be determined by classifying that time series data, and at least two sub-partitions are then divided in the data partition based on those data types.

By dividing the data partition of each data tag into at least two sub-partitions, the embodiments of this specification store the data under each data tag in the sub-partition that matches its data type. Data is thus stored in separate partitions by type, which avoids the situation in which data cannot be written to the database because its type differs, and improves the security of data storage.
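
Steps 206 and 208 can be sketched as one data partition per data tag, with one sub-partition per data type inside it. The `DataPartition` class, the `type_name` helper and the type names `"integer"`, `"float"` and `"string"` are hypothetical names chosen for this illustration.

```python
def type_name(value):
    """Map a Python value to the data type names used in this sketch."""
    if isinstance(value, bool):
        raise TypeError("bool is not a supported data type in this sketch")
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported data type: {type(value).__name__}")


class DataPartition:
    """One data partition per data tag, divided into per-type sub-partitions."""

    def __init__(self, tag):
        self.tag = tag
        self.sub_partitions = {}  # data type name -> {timestamp: value}

    def sub_partition_for(self, data_type):
        # Divide the partition lazily: a sub-partition is created the first
        # time its data type is seen.
        return self.sub_partitions.setdefault(data_type, {})


humidity = DataPartition("humidity")
for timestamp, value in [(1, 40), (2, 40.5), (3, "n/a")]:
    humidity.sub_partition_for(type_name(value))[timestamp] = value
```

After the three writes, the humidity partition holds an integer, a floating point and a string sub-partition, matching the humidity example above.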

Step 210, storing the data to be stored corresponding to the data tag to a sub-partition corresponding to the data type based on the data type of the data to be stored corresponding to the data tag.

Specifically, after determining the sub-partition corresponding to the data type of the data to be stored, storing the data to be stored corresponding to the received data tag into the sub-partition corresponding to the data type.

So that the data to be stored for a given timestamp can be stored quickly at the corresponding position in the time series database, the position to be stored can be determined within the sub-partition corresponding to the data type of the data to be stored. Specifically, storing the data to be stored corresponding to the data tag to the sub-partition corresponding to the data type, based on the data type of the data to be stored corresponding to the data tag, includes:

determining a target storage file of the data to be stored based on the data to be stored corresponding to the data tag;

determining a storage time stamp of the data to be stored in the target storage file, and determining a position to be stored of the data to be stored in a sub-partition corresponding to the data type based on the storage time stamp and the sub-partition corresponding to the data type of the data to be stored;

and storing the data to be stored in the position to be stored of the sub-partition corresponding to the data type, wherein the data to be stored comprises a storage time stamp and a storage data value.

The target storage file may be understood as a storage file storing time series data to be stored.

Specifically, after the data tag of the data to be stored is determined, the target storage file can be determined according to the data to be stored corresponding to the data tag, and the storage timestamp corresponding to the data to be stored is determined in the target storage file. Based on the storage timestamp and the sub-partition corresponding to the data type, the position to be stored within that sub-partition is determined. Finally, the data to be stored, which comprises the storage timestamp and the storage data value, is stored at the position to be stored in the sub-partition corresponding to its data type.

In practice, the target storage file can be determined based on the index data of the data to be stored corresponding to the data tag. The target storage file is where the data to be stored is written, and it contains the time line of that data. The data tag is determined from the time line, the data partition determined by the data tag is divided into sub-partitions according to the data type, and the position to be stored is determined from the sub-partition and the storage timestamp, so that the data can be stored.

For example, suppose the time series database receives, as data to be stored, an outdoor temperature reading of 0 degrees for Beijing at 2021-1-1:00. The index data of the data to be stored is determined to be the region, so the target storage file is the file that stores meteorological data for the Beijing area. From the time carried by the data to be stored, the time line in the target storage file is determined, and the storage timestamp is 2021-1-1:00. From the temperature value of 0 degrees, the data type is determined to be integer. From the storage timestamp and the data type, the position to be stored is determined to be position a in the data partition, and the value of 0 degrees is stored at position a.

According to the data processing method provided by the embodiment of the specification, the data to be stored is quickly stored into the data storage file by determining the position to be stored of the data to be stored in the corresponding sub-partition of the data partition.
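
The storage path just described (target storage file, data partition, sub-partition, then position by timestamp) might look as follows. This is a sketch under stated assumptions: the target file is keyed by region, the position is found by binary search over sorted timestamps, and the names `SubPartition`, `files` and `store` are hypothetical.

```python
import bisect


class SubPartition:
    """Values of one data type, kept in timestamp order."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def store(self, timestamp, value):
        # The position to be stored is where the timestamp belongs in order.
        pos = bisect.bisect_left(self.timestamps, timestamp)
        if pos < len(self.timestamps) and self.timestamps[pos] == timestamp:
            self.values[pos] = value  # same timestamp: overwrite
        else:
            self.timestamps.insert(pos, timestamp)
            self.values.insert(pos, value)
        return pos


# target storage file (keyed by region) -> data tag -> data type -> SubPartition
files = {}


def store(region, tag, timestamp, value):
    data_type = type(value).__name__  # "int", "float" or "str"
    partition = files.setdefault(region, {}).setdefault(tag, {})
    sub_partition = partition.setdefault(data_type, SubPartition())
    return sub_partition.store(timestamp, value)


store("beijing", "temperature", 200, 1)
store("beijing", "temperature", 100, 0)
store("beijing", "temperature", 150, 0.5)
```

The integer readings end up ordered by timestamp in the integer sub-partition even though they arrived out of order, and the floating point reading goes to its own sub-partition of the same temperature partition.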

Referring to fig. 3, fig. 3 is a schematic diagram showing a structure of a data processing method applied to recording temperature values according to an embodiment of the present disclosure.

The time series data model includes timestamps, tags and fields. Part A of Fig. 3 is the timestamp sequence, part B is the tag sequence, and part C is the field sequence. The field sequence in Fig. 3 clearly differs from that in Fig. 1. Taking the data partition of the temperature field as an example, that partition can be divided by data type: in Fig. 3 the temperature field is divided into a floating point sub-partition, an integer sub-partition and a string sub-partition, and the data for each timestamp corresponds to exactly one data type. For example, if the data to be stored for timestamp 2020-10-24:01 is floating point data, namely 12.1 degrees, then 12.1 is stored in the floating point sub-partition of the data partition whose data tag is temperature. For that timestamp, the sub-partitions of the other data types may be left unwritten or may be set to empty according to a preset condition; this embodiment places no limit on this.

Further, after the storing the data to be stored corresponding to the data tag to the sub-partition corresponding to the data type, the method further includes:

determining the data type of the data to be stored corresponding to the data tag as a target data type;

determining a data type to be filled in a data partition corresponding to the data tag based on the target data type, and determining a position to be filled for the data to be stored based on the data type to be filled;

and filling the data into the to-be-filled position based on a preset filling rule.

The preset filling rule may be understood as a rule of the data content filled in the position to be filled, for example, "0" or "null" may be filled in the position to be filled.

Specifically, the data type of the data to be stored corresponding to the data tag is determined as the target data type, in the data partition corresponding to the data tag, other data types except the target data type can be determined as the data type to be filled, further the position to be filled of the time line data of the data to be stored is determined, and the position to be filled is filled with data according to a preset filling rule.

In practice, in the table that stores the time series data, if the data type of the data to be stored on a given time line is determined to be type A and the data partition also contains type B and type C, then types B and C are the data types to be filled, and the corresponding fill data can be written into the sub-partitions for those types.

It should be noted that, the data filled in the to-be-filled positions in some timelines does not occupy the storage space.

According to the data processing method provided by the embodiment of the specification, the to-be-filled positions in each time line are determined, and the to-be-filled positions are filled with the empty data, so that the data can be orderly stored in the storage file of the time sequence data, and the storage efficiency is improved.
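
The filling step can be sketched as building one row per timestamp, with the real value in the column of the target data type and the fill value in every other column. The column set `DATA_TYPES` and the function `fill_row` are hypothetical; `None` stands in for the "null" fill mentioned above.

```python
# Sub-partitions assumed to exist in the data partition (Python type names).
DATA_TYPES = ("float", "int", "str")


def fill_row(value, fill=None):
    """Return one row: the value under its own type, `fill` elsewhere."""
    target = type(value).__name__  # the target data type
    if target not in DATA_TYPES:
        raise TypeError(f"unsupported data type: {target}")
    # Every data type other than the target is a data type to be filled.
    return {t: (value if t == target else fill) for t in DATA_TYPES}


row = fill_row(12.1)
zero_filled = fill_row(15, fill=0)
```

With the default rule the row for 12.1 carries the value in the floating point column and `None` in the integer and string columns; passing `fill=0` applies the "0" rule instead.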

Once the data to be stored has been stored, data of the same type sits together and can be compressed together, which yields a higher compression ratio and reduces storage cost. Specifically, after the data to be stored corresponding to the data tag is stored to the sub-partition corresponding to the data type, the method further includes:

and compressing and storing the data to be stored in the sub-partition corresponding to each data type based on the data type of the data to be stored corresponding to the data tag.

Specifically, after data to be stored of different types has been stored in the corresponding sub-partitions, the sub-partition for each data type may hold a large amount of time series data, and the data in each sub-partition is compressed and stored.

In practice, when the data under each data tag is stored at the underlying storage layer, the stored data of each data type is compressed and stored together. Because data of the same type is stored and compressed together, a higher compression ratio is achieved and the storage cost is reduced.

According to the data processing method provided by the embodiment of the specification, the data of the same type are compressed and stored together, so that the compression ratio is improved, the data storage cost is reduced, massive time sequence data can be stored rapidly, the storage efficiency is improved, and convenience is provided for subsequent data inquiry.
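
Per-type compression can be sketched by packing each sub-partition into a homogeneous byte string before compressing it. The specification does not name a compression algorithm or an encoding; `zlib`, the fixed-width `struct` layout and the JSON encoding of strings are stand-ins chosen for this illustration, and the function names are hypothetical.

```python
import json
import struct
import zlib


def compress_sub_partition(data_type, values):
    """Encode values of one data type together, then compress them."""
    if data_type == "integer":
        raw = struct.pack(f"<{len(values)}q", *values)  # 64-bit integers
    elif data_type == "float":
        raw = struct.pack(f"<{len(values)}d", *values)  # 64-bit doubles
    elif data_type == "string":
        raw = json.dumps(values).encode("utf-8")
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return zlib.compress(raw)


def decompress_sub_partition(data_type, blob):
    raw = zlib.decompress(blob)
    if data_type == "integer":
        return list(struct.unpack(f"<{len(raw) // 8}q", raw))
    if data_type == "float":
        return list(struct.unpack(f"<{len(raw) // 8}d", raw))
    if data_type == "string":
        return json.loads(raw.decode("utf-8"))
    raise ValueError(f"unknown data type: {data_type}")


partition = {"integer": [15, 15, 16, 15], "float": [12.1, 12.3], "string": ["n/a"]}
compressed = {t: compress_sub_partition(t, v) for t, v in partition.items()}
restored = {t: decompress_sub_partition(t, b) for t, b in compressed.items()}
```

Each sub-partition is compressed on its own, so the compressor only ever sees values with one layout. How much this helps depends on the data and on the codec actually used.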

In addition, because the data processing method provided in the embodiments of this specification divides the data partition of a single data tag into sub-partitions for multiple data types, users can write data of different data types under the same data tag. For subsequent data queries, the data processing method further includes:

receiving a data query request, wherein the data query request carries time attribute information of data to be queried and a data tag of the data to be queried;

determining a target storage file corresponding to the data to be queried based on the time attribute information of the data to be queried;

determining the data type of the data to be queried in the target storage file based on the data tag of the data to be queried;

determining a sub-partition corresponding to the data type of the data to be queried, and querying the data to be queried from the sub-partition corresponding to the data type of the data to be queried.

The time attribute information of the data to be queried can be understood as time information of the data to be queried required by the user.

Specifically, the time sequence database receives a data query request, wherein the data query request carries time attribute information of data to be queried and a data tag of the data to be queried, a target storage file corresponding to the data to be queried is determined according to the time attribute information of the data to be queried, then a data type of the data to be queried is determined in the target storage file according to the data tag of the data to be queried, and then the data to be queried is queried from the sub-partition by determining the sub-partition corresponding to the data type of the data to be queried.

In practical application, the data query request of the user may be a query request of a single data, and the storage location of the data to be queried may be further determined according to time attribute information and a data tag carried in the single data query request, and the data to be queried is queried from the sub-partition corresponding to the data type of the data to be queried.

For example, when the user issues a data query request for the temperature data at 2021-1-1:00, the time attribute information of the data to be queried is determined to be 2021-1-1:00:01 and the data tag is determined to be temperature data. The target storage file of the data to be queried is then determined according to the timestamp; because storage files are organized in timeline order, the target storage file can be located through the timeline index file. Within the target storage file, the data type of the temperature data stored at 2021-1-1:00:01 is determined to be the integer type, so the specific value is looked up in the integer-type sub-partition, and the temperature finally returned for 2021-1-1:01 is 15 degrees.
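The single-query flow described above can be sketched as follows. This is only an illustrative in-memory model, not the patented implementation: the `STORAGE_FILES` layout, the function names, and the ISO-style timestamps are all assumptions introduced for the example.

```python
from bisect import bisect_right

# Hypothetical model: storage files sorted by start time; each file maps
# data tag -> data type (sub-partition) -> {timestamp: value}.
STORAGE_FILES = [
    {"start": "2021-01-01T00:00:00", "tags": {
        "temperature": {"int": {"2021-01-01T00:00:01": 15}, "float": {}, "str": {}},
    }},
    {"start": "2021-01-02T00:00:00", "tags": {}},
]

def find_target_file(timestamp):
    """Step 1: use the time attribute to locate the target storage file."""
    starts = [f["start"] for f in STORAGE_FILES]
    pos = bisect_right(starts, timestamp) - 1
    if pos < 0:
        raise KeyError(f"no storage file covers {timestamp}")
    return STORAGE_FILES[pos]

def query(timestamp, tag):
    """Steps 2 and 3: resolve the data type by tag, then read its sub-partition."""
    target = find_target_file(timestamp)
    for data_type, points in target["tags"].get(tag, {}).items():
        if timestamp in points:
            return data_type, points[timestamp]
    return None

result = query("2021-01-01T00:00:01", "temperature")
```

In this sketch the lookup returns both the resolved data type and the value, so a caller can tell an integer reading from a string reading for the same tag.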

In practical applications, if the queried data turns out to be numeric string data, the string data may indicate that the data at that time was abnormal; the embodiments of this specification do not limit how abnormal data is handled.

Further, after the time series database receives at least two data query requests, it queries each piece of data to be queried in the sub-partition corresponding to that data's type. Specifically, the data processing method further includes:

Receiving at least two data query requests, wherein the data query requests carry time attribute information of at least two data to be queried and data tags of the at least two data to be queried;

Determining a target storage file corresponding to each piece of data to be queried based on the time attribute information of the at least two pieces of data to be queried;

Determining the data type of each piece of data to be queried based on the data tag of each piece of data to be queried in the target storage file corresponding to each piece of data to be queried;

Determining the sub-partition corresponding to the data type of each piece of data to be queried, and querying the data to be queried from the sub-partition corresponding to the data type of each piece of data to be queried.

Specifically, the time series database receives at least two data query requests, each carrying the time attribute information and the data tag of one piece of data to be queried. For each request, it determines the corresponding target storage file according to the time attribute information, then determines the data type of the data to be queried within that target storage file according to the data tag. It then determines the sub-partition corresponding to each data type and queries the data from each such sub-partition.

For example, if the user requests the temperature data for the 5 days from 2021-1-1 to 2021-1-5, the target storage files of the data to be queried are determined according to the timestamps of the query, and the temperature data on each timeline is determined in the target storage files: the temperature is 5 degrees on 2021-1-1, 4 degrees on 2021-1-2, 2.1 degrees on 2021-1-3, 1 degree on 2021-1-4, and 3.2 degrees on 2021-1-5.

According to the data processing method provided by the embodiments of this specification, each piece of data to be queried is queried in the sub-partition corresponding to its own data type, so that the user can query data effectively by data type and data query efficiency is improved.
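As a rough illustration of the multi-request case, the sketch below resolves each request independently and reports which sub-partition supplied the value. The one-file-per-day layout in `FILES` and the names `query_one` and `query_many` are assumptions made for the example; the values are the ones from the five-day temperature example.

```python
# Hypothetical layout: one storage file per day, each mapping
# data tag -> data type (sub-partition) -> {timestamp: value}.
FILES = {
    "2021-01-01": {"temperature": {"int": {"2021-01-01": 5}}},
    "2021-01-02": {"temperature": {"int": {"2021-01-02": 4}}},
    "2021-01-03": {"temperature": {"float": {"2021-01-03": 2.1}}},
    "2021-01-04": {"temperature": {"int": {"2021-01-04": 1}}},
    "2021-01-05": {"temperature": {"float": {"2021-01-05": 3.2}}},
}

def query_one(day, tag):
    """Locate the file by time, the data type by tag, then read the sub-partition."""
    target = FILES.get(day, {})
    for data_type, points in target.get(tag, {}).items():
        if day in points:
            return day, data_type, points[day]
    return day, None, None

def query_many(requests):
    """Handle at least two query requests, one lookup per request."""
    return [query_one(day, tag) for day, tag in requests]

days = ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05"]
results = query_many([(day, "temperature") for day in days])
```

Returning the data type alongside each value makes the mixed integer and floating point results explicit, which matters for the aggregation step that follows.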

After multiple pieces of data have been queried, data aggregation or downsampling may be performed on them. Specifically, after the data to be queried is queried from the sub-partition corresponding to the data type of each piece of data to be queried, the method further includes:

Receiving a data aggregation request aiming at the data to be queried, and determining the data type of each data to be queried based on the data aggregation request;

Determining data to be processed based on the data type of each data to be queried, and processing the data to be processed to obtain at least two processed data to be queried, wherein the data types of the at least two data to be queried are the same;

And calculating aggregate data based on the at least two data to be queried after processing.

The data aggregation request may be understood as a processing request for performing an aggregation operation on data, for example summing the data to be queried or averaging the data to be queried to obtain the result of the operation.

Specifically, after receiving a data aggregation request for the data to be queried, the time series database determines the data type of each piece of data to be queried and then determines the data to be processed according to the differing data types. It processes the data to be processed to obtain at least two pieces of processed data to be queried that all have the same data type, and calculates the aggregate data from the processed data.

In practical applications, the received data may have different data types because the devices recording the data differ or the system versions differ. When at least two pieces of data to be queried are subsequently processed, the processing may therefore fail because the data types differ. Format conversion is then needed for the data to be queried of different types so that all data to be queried has the same data type, after which the aggregate data is calculated from data of that single type.

Continuing the above example, among the data to be queried only the temperature data of 2021-1-3 (2.1 degrees) and of 2021-1-5 (3.2 degrees) is floating point data. These two floating point values can be converted to integer data: after format conversion, the temperature of 2021-1-3 is 2 degrees and the temperature of 2021-1-5 is 3 degrees. Once the temperature data for all 5 days from 2021-1-1 to 2021-1-5 is integer data, the summation or averaging of the 5 days of integer data can be carried out.

It should be noted that, for the string-type data, format conversion may also be performed, where a specific format conversion manner may be set according to practical applications, and the embodiment of the present disclosure is not limited in any way.
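The conversion-then-aggregation step can be illustrated with the five temperature values from the example. This is a minimal sketch: the helper names are hypothetical, and truncation toward zero is an assumed conversion rule chosen because it reproduces the 2.1 to 2 and 3.2 to 3 conversions in the example; the specification leaves the actual rule, including the rule for strings, to the application.

```python
def to_integer(value):
    """Convert an int, float, or numeric string to int.

    Assumption: floats and numeric strings are truncated toward zero.
    """
    if isinstance(value, int):
        return value
    return int(float(value))

def aggregate(values):
    """Unify the data types first, then compute the sum and the average."""
    converted = [to_integer(v) for v in values]
    total = sum(converted)
    return total, total / len(converted)

daily_temperatures = [5, 4, 2.1, 1, 3.2]   # values from the five-day example
total, average = aggregate(daily_temperatures)
```

Converting to integers discards the fractional part, so a deployment that needs precision might instead convert every value to floating point before aggregating.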

In summary, in the data processing method provided by the embodiments of this specification, the data tag of the data to be stored is determined according to its attribute information, a data partition is created according to the data tag, and the data type of the data to be stored corresponding to the data tag is determined. The data partition is then divided into sub-partitions according to the data type, and the data to be stored is stored in the sub-partition corresponding to its data type. Storing data according to its data type avoids the data loss that occurs when a database cannot store data because of its data type, which improves the security of data storage. It also supports rapid storage of massive time series data, improves storage efficiency, and in turn improves data query efficiency.

Referring to fig. 4, fig. 4 is a schematic diagram illustrating a temperature storage process of a data processing method according to an embodiment of the present disclosure.

In Fig. 4, part A shows the data storage file format. The bottom of part A holds index data, through which the storage files above it in part A can be looked up. After the index data determines that the data to be stored belongs in storage file 3, storage file 3 is seen to contain the timeline index data and timeline data shown in part B of Fig. 4. Each piece of timeline data contains data tags; for example, timeline data 2 in part B contains the multiple data tags shown in part C. Data tag 1 in part C contains three data types: floating point, integer, and character string. The data to be stored is stored under the partition of each data type, as shown in part D of Fig. 4, and each stored record has the format shown in part E of Fig. 4, consisting of a timestamp and a data value.

In practical applications, the storage location of the data to be stored is determined by the offset in the index data, so that a large amount of time series data can be stored by data classification. On this basis, data types are created dynamically and data of different data types is stored separately, which greatly improves the convenience of the time series database for users.
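The nesting described for Fig. 4 can be sketched as a set of plain data structures. The class and field names below are hypothetical, and the on-disk details (offsets, file format, compression) are deliberately omitted; the sketch only mirrors the file, timeline, tag, type sub-partition, and record levels.

```python
from dataclasses import dataclass, field

@dataclass
class Point:                      # part E: a timestamp and a data value
    timestamp: int
    value: object

@dataclass
class TagPartition:               # parts C and D: one sub-partition per data type
    sub_partitions: dict = field(
        default_factory=lambda: {"float": [], "int": [], "str": []})

@dataclass
class Timeline:                   # part B: timeline data holding data tags
    tags: dict = field(default_factory=dict)

@dataclass
class StorageFile:                # part A: timeline index plus timeline data
    timeline_index: dict = field(default_factory=dict)
    timelines: list = field(default_factory=list)

def write(storage_file, timeline_name, tag, point):
    """Place a point under file -> timeline -> tag -> type sub-partition."""
    if timeline_name not in storage_file.timeline_index:
        storage_file.timeline_index[timeline_name] = len(storage_file.timelines)
        storage_file.timelines.append(Timeline())
    timeline = storage_file.timelines[storage_file.timeline_index[timeline_name]]
    partition = timeline.tags.setdefault(tag, TagPartition())
    type_name = type(point.value).__name__      # "int", "float", or "str"
    partition.sub_partitions.setdefault(type_name, []).append(point)

file3 = StorageFile()
write(file3, "timeline-2", "tag-1", Point(1, 15))
write(file3, "timeline-2", "tag-1", Point(2, 2.1))
write(file3, "timeline-2", "tag-1", Point(3, "15"))
```

Using `setdefault` on the sub-partition map means a previously unseen data type gets its own sub-partition on first write, which loosely corresponds to the dynamic creation of data types mentioned above.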

In summary, the data tag of the data to be stored is determined according to its attribute information, a data partition is created according to the data tag, and the data type of the data to be stored corresponding to the data tag is determined. The data partition is then divided into sub-partitions according to the data type, and the data to be stored is stored in the sub-partition corresponding to its data type. Storing data according to its data type avoids the data loss that occurs when a database cannot store data because of its data type, which improves the security of data storage, supports rapid storage of massive time series data, improves storage efficiency, and further improves subsequent data query efficiency.

Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a data processing apparatus, and fig. 5 shows a schematic structural diagram of a data processing apparatus according to one embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:

The receiving module 502 is configured to receive a data storage request, where the data storage request carries data to be stored, attribute information of the data to be stored, and a data type of the data to be stored;

A determining module 504 configured to determine a data tag of the data to be stored based on attribute information of the data to be stored;

A creating module 506, configured to create a data partition based on the data tag, and determine a data type of the data to be stored corresponding to the data tag;

A dividing module 508, configured to divide, based on a data type of the data to be stored corresponding to the data tag, a data partition corresponding to the data tag into sub-partitions corresponding to the data type;

The storage module 510 is configured to store the data to be stored corresponding to the data tag to the sub-partition corresponding to the data type based on the data type of the data to be stored corresponding to the data tag.

Optionally, the partitioning module 508 is further configured to:

Optionally, the storage module 510 is further configured to:

Optionally, the data processing apparatus further includes:

Storing the data to be stored corresponding to the data tag to the data type to determine the data type as a target data type;

Optionally, the data processing apparatus further includes:

According to the data processing apparatus provided by the embodiments of this specification, the data tag of the data to be stored is determined from its attribute information, a data partition is created according to the data tag, and the data type of the data to be stored corresponding to the data tag is determined. The data partition is then divided into sub-partitions according to the data type, and the data to be stored is stored in the sub-partition corresponding to its data type. Storing data according to its data type avoids the data loss that occurs when a database cannot store data because of its data type, which improves the security of data storage, supports rapid storage of massive time series data, improves storage efficiency, and also improves subsequent data query efficiency.
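The five modules of the apparatus (receiving, determining, creating, dividing, storing) can be read as one write path. The sketch below is an illustrative in-memory model under stated assumptions: the class name, the request dictionary shape, and the rule that the tag is taken from a `metric` attribute are all hypothetical and not taken from the specification.

```python
class TimeSeriesStore:
    """Illustrative write path: tag -> data partition -> type sub-partition."""

    def __init__(self):
        # data tag -> {data type -> [(timestamp, value), ...]}
        self.partitions = {}

    def determine_tag(self, attributes):
        # Determining module. Assumption: the tag is the "metric" attribute.
        return attributes["metric"]

    def store(self, request):
        # Receiving module: the request carries data, attributes, and data type.
        tag = self.determine_tag(request["attributes"])
        # Creating module: create the data partition for this tag if needed.
        partition = self.partitions.setdefault(tag, {})
        # Dividing module: create the sub-partition for this data type if needed.
        sub_partition = partition.setdefault(request["data_type"], [])
        # Storing module: append the record to the matching sub-partition.
        sub_partition.append((request["timestamp"], request["value"]))
        return tag

store = TimeSeriesStore()
for ts, value, data_type in [(1, 15, "int"), (2, 2.1, "float"), (3, "15", "str")]:
    store.store({"attributes": {"metric": "temperature"},
                 "data_type": data_type, "timestamp": ts, "value": value})
```

After these three writes, the single `temperature` partition holds three sub-partitions, one per data type, so none of the writes is rejected because its type differs from an earlier one.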

The above is a schematic solution of a data processing apparatus of the present embodiment. It should be noted that, the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same conception, and details of the technical solution of the data processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the data processing method.

Fig. 6 illustrates a block diagram of a computing device 600 provided in accordance with one embodiment of the present description. The components of computing device 600 include, but are not limited to, memory 610 and processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to hold data.

Computing device 600 also includes access device 640, access device 640 enabling computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 640 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 6 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.

The processor 620 is configured to execute computer-executable instructions and, when executing them, implements the steps of the data processing method.

The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, are configured to implement the steps of a data processing method as described above.

The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the data processing method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the data processing method.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims

1. A data processing method, comprising:

Receiving a data storage request, wherein the data storage request carries data to be stored, attribute information of the data to be stored, and a data type of the data to be stored; the data type is a data form or type of the data to be stored, including integer data, floating point data, and numerical string data;

Determining a data tag of the data to be stored based on the attribute information of the data to be stored, wherein the data tag represents tag information of the data attribute of the data to be stored;

Creating a data partition based on the data tag, and determining the data type of the data to be stored corresponding to the data tag;

Dividing, based on the data type of the data to be stored corresponding to the data tag, the data partition corresponding to the data tag into sub-partitions corresponding to the data type, wherein the dividing includes: dividing, based on at least two data types of the data to be stored corresponding to the data tag, the data partition corresponding to the data tag into at least two sub-partitions corresponding to the at least two data types;

Based on the data type of the to-be-stored data corresponding to the data tag, the to-be-stored data corresponding to the data tag is stored in the sub-partition corresponding to the data type.

2. The data processing method according to claim 1, wherein storing the data to be stored corresponding to the data tag in the sub-partition corresponding to the data type, based on the data type of the data to be stored corresponding to the data tag, comprises:

Determining a target storage file for the data to be stored based on the data to be stored corresponding to the data tag;

Determine a storage timestamp of the data to be stored in the target storage file, and determine a storage location of the data to be stored in the subpartition corresponding to the data type based on the storage timestamp and the subpartition corresponding to the data type of the data to be stored;

Storing the data to be stored at the storage location in the sub-partition corresponding to the data type, wherein the data to be stored includes a storage timestamp and a storage data value.

3. The data processing method according to claim 2, after storing the to-be-stored data corresponding to the data tag in the sub-partition corresponding to the data type, further comprising:

Based on the data type of the to-be-stored data corresponding to the data tag, the to-be-stored data stored in the sub-partition corresponding to each data type is compressed and stored.

4. The data processing method according to claim 1 or 3, after storing the to-be-stored data corresponding to the data tag in the sub-partition corresponding to the data type, further comprising:

Determine the data type of the data to be stored corresponding to the data tag as the target data type;

Determine the data type to be filled in the data partition corresponding to the data tag based on the target data type, and determine the position to be filled for the data to be stored based on the data type to be filled;

The position to be filled is filled with data based on a preset filling rule.

5. The data processing method according to claim 1, further comprising:

Receive a data query request, wherein the data query request carries time attribute information of the data to be queried and a data tag of the data to be queried;

Determine the target storage file corresponding to the data to be queried based on the time attribute information of the data to be queried;

The sub-partition corresponding to the data type of the data to be queried is determined, and the data to be queried is queried from the sub-partition corresponding to the data type of the data to be queried.

6. The data processing method according to claim 5, further comprising:

Receive at least two data query requests, wherein the data query requests carry time attribute information of at least two data to be queried and data tags of at least two data to be queried;

Determine a target storage file corresponding to each data to be queried based on the time attribute information of the at least two data to be queried;

Determine the data type of each to-be-queried data in the target storage file corresponding to each to-be-queried data based on the data label of each to-be-queried data;

The sub-partition corresponding to the data type of each data to be queried is determined, and the data to be queried is queried from the sub-partition corresponding to the data type of each data to be queried.

7. The data processing method according to claim 6, after querying the data to be queried based on the sub-partition corresponding to the data type of each data to be queried, further comprising:

receiving a data aggregation request for the data to be queried, and determining a data type of each data to be queried based on the data aggregation request;

Aggregate data is calculated based on the processed at least two pieces of data to be queried.

8. A data processing device, comprising:

The receiving module is configured to receive a data storage request, wherein the data storage request carries data to be stored, attribute information of the data to be stored, and a data type of the data to be stored; the data type is a data form or type of the data to be stored, including integer data, floating point data, and numerical string data;

a determination module, configured to determine a data tag of the data to be stored based on the attribute information of the data to be stored, wherein the data tag represents tag information of the data attribute of the data to be stored;

A creation module, configured to create a data partition based on the data tag and determine the data type of the data to be stored corresponding to the data tag;

A partitioning module, configured to partition the data partition corresponding to the data tag into sub-partitions corresponding to the data type based on the data type of the to-be-stored data corresponding to the data tag;

The partitioning module is further configured to, based on at least two data types of the to-be-stored data corresponding to the data tag, partition the data partition corresponding to the data tag into at least two sub-partitions corresponding to the at least two data types;

The storage module is configured to store the data to be stored corresponding to the data tag in the sub-partition corresponding to the data type based on the data type of the data to be stored corresponding to the data tag.

9. A computing device comprising:

Memory and processor;

The memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions, wherein the processor implements the steps of the data processing method according to any one of claims 1 to 7 when executing the computer-executable instructions.

10. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the steps of the data processing method according to any one of claims 1 to 7.