CN117880767B

CN117880767B - Short message data transmission method with low delay

Info

Publication number: CN117880767B
Application number: CN202410281618.3A
Authority: CN
Inventors: 王金龙; 尹意萍; 曾永明; 蓝丹丹
Original assignee: Shenzhen Chengliye Technology Development Co ltd
Current assignee: Shenzhen Chengliye Technology Development Co ltd
Priority date: 2024-03-13
Filing date: 2024-03-13
Publication date: 2024-05-28
Anticipated expiration: 2044-03-13
Also published as: CN117880767A

Abstract

The invention relates to the technical field of short message transmission, and in particular to a low-delay short message data transmission method comprising the following steps: collecting the Unicode-encoded short message text to be transmitted by a user; obtaining the character spacing of each character in the Unicode-encoded text; setting breakpoints and constructing breakpoint character segments; constructing screening character segments, obtaining a breakpoint screening feature value for each screening character segment from the proportions of the various characters in it and the variation of their character spacings, and preliminarily screening the breakpoints; obtaining a Huffman compression estimate from the number of occurrences of each character type and the number of characters needed to express the most basic Huffman compression tree for those types; calculating the fitness of each individual in a genetic algorithm; obtaining the final compressed character segments; and compressing those segments to complete the transmission of the short message data. The invention improves the compression effect and speeds up the transmission of short message data.

Description

Short message data transmission method with low delay

Technical Field

The application relates to the technical field of short message transmission, in particular to a low-delay short message data transmission method.

Background

Sending short messages is a basic function of smartphones, and short message services are widely used for administrative management, ticketing information, financial communication, security verification, and other purposes. Early short message services could only send text; with continuing technological progress, current services can also carry multimedia content such as images and video. Current research directions include increasing the transmission speed of short messages and strengthening the security of the transmitted data.

In short message data transmission, text is the most frequently transmitted data type. To speed up its transmission, the data to be sent is usually compressed first and the compressed data is then transmitted, which reduces the amount of data sent.

A commonly used compression algorithm is Huffman coding. The conventional Huffman approach divides the text to be transmitted into segments of fixed length and applies Huffman compression to each segment separately. In short message text, however, characters are unevenly distributed across the text because of writing habits, so fixed-length segmentation can produce segments that contain many low-frequency characters. This makes the Huffman tree overly large, reduces the compression effect, and delays the transmission of the short message data.

Disclosure of Invention

To solve the above technical problems, the invention provides a low-delay short message data transmission method.

The invention relates to a low-delay short message data transmission method which adopts the following technical scheme:

the embodiment of the invention provides a low-delay short message data transmission method, which comprises the following steps:

collecting Unicode coded text information to be transmitted by a user;

obtaining the character spacing of each character in the Unicode-encoded short message text from the intervals between occurrences of that character; setting breakpoints according to the character spacings and constructing breakpoint character segments; combining the breakpoint character segments of the Unicode-encoded text into screening character segments, and obtaining a breakpoint screening feature value for each screening character segment from the proportions of the various characters in it and the variation of their character spacings; preliminarily screening the breakpoints according to the breakpoint screening feature values and a Bayesian jump point detection algorithm; obtaining a Huffman compression estimate from the number of occurrences of each character type and the number of characters needed to express the most basic Huffman compression tree for those types; forming the preliminarily screened breakpoints into individuals of a genetic algorithm, and obtaining the fitness of each individual from the Huffman compression estimate and the number of characters of each character segment produced by segmentation during optimization;

and dividing the Unicode-encoded text with the genetic algorithm according to the fitness of each individual to obtain the final compressed character segments, and compressing each of them with the Huffman compression algorithm to complete the transmission of the short message data.

Preferably, obtaining the character spacing of each character in the Unicode-encoded short message text according to the intervals between characters of the text includes:

counting the positions of the various characters in the Unicode-encoded short message text, the character spacing being expressed as:

$$d_n^{s} = x_n^{s+1} - x_n^{s}$$

where $d_n^{s}$ is the character spacing between the position of the s-th occurrence of the n-th character in the Unicode-encoded short message text and the position of its next occurrence, and $x_n^{s}$ and $x_n^{s+1}$ are the positions of the s-th and (s+1)-th occurrences of the n-th character in the text.
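As a minimal sketch of the spacing definition, the Python function below records every position at which each character occurs and takes differences between successive occurrences. The function name `character_spacings` is hypothetical, and the sketch assumes that a "character" is one Unicode code point (one element of a Python `str`) and that positions are 0-based indices.

```python
from collections import defaultdict


def character_spacings(text: str) -> dict[str, list[int]]:
    """Map each distinct character to the spacings between its successive occurrences.

    The s-th spacing of a character is x[s+1] - x[s], where x[s] is the 0-based
    index of its s-th occurrence in the text.
    """
    positions: dict[str, list[int]] = defaultdict(list)
    for index, ch in enumerate(text):
        positions[ch].append(index)
    return {ch: [b - a for a, b in zip(pos, pos[1:])] for ch, pos in positions.items()}


print(character_spacings("abcabca"))  # {'a': [3, 3], 'b': [3], 'c': [3]}
```

A character that occurs only once has no spacing, so it maps to an empty list.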

Preferably, setting breakpoints according to the character spacing of each character in the Unicode-encoded short message text and constructing breakpoint character segments includes:

sorting all character spacings calculated for the Unicode-encoded short message text in descending order, selecting the top A spacings, and setting two breakpoints within each selected spacing, between the position of the character at its left end and the position of the character at its right end; and taking the characters between any two adjacent breakpoints in the Unicode-encoded text as a breakpoint character segment.
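The sketch below shows one reading of this breakpoint rule. It assumes that a breakpoint is a cut position k splitting the text between `text[k-1]` and `text[k]`, and that "two breakpoints between the left-end and right-end character" means one cut just after the left occurrence and one just before the right occurrence. The names `place_breakpoints` and `breakpoint_segments` are hypothetical; `top_a` plays the role of A.

```python
def place_breakpoints(text: str, top_a: int = 5) -> list[int]:
    """Return sorted cut positions; cut k splits the text between text[k-1] and text[k]."""
    last_seen: dict[str, int] = {}
    gaps: list[tuple[int, int, int]] = []  # (spacing, left index, right index)
    for index, ch in enumerate(text):
        if ch in last_seen:
            left = last_seen[ch]
            gaps.append((index - left, left, index))
        last_seen[ch] = index
    # Largest spacings first; sort stays stable, so ties keep their text order.
    gaps.sort(key=lambda g: g[0], reverse=True)
    cuts: set[int] = set()
    for _, left, right in gaps[:top_a]:
        cuts.add(left + 1)  # just after the left occurrence
        cuts.add(right)     # just before the right occurrence
    return sorted(cuts)


def breakpoint_segments(text: str, cuts: list[int]) -> list[str]:
    """Split the text at the cut positions into breakpoint character segments."""
    bounds = [0, *cuts, len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


text = "aXXXXaYbYb"
print(breakpoint_segments(text, place_breakpoints(text, top_a=1)))  # ['a', 'XXXX', 'aYbYb']
```

Using a set means that when the two occurrences are adjacent, the two cuts coincide and only one is kept.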

Preferably, constructing the screening character segments by combining the breakpoint character segments of the Unicode-encoded short message text includes:

taking the first breakpoint character segment as the first screening character segment, adding the second breakpoint character segment to it to form the second screening character segment, and so on, so that the first t consecutive breakpoint character segments form the t-th screening character segment.
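A small sketch of this cumulative construction, assuming the breakpoint character segments are already available as a list of strings (the function name `screening_segments` is hypothetical): the t-th screening character segment is the running concatenation of the first t breakpoint segments.

```python
from itertools import accumulate


def screening_segments(breakpoint_segments: list[str]) -> list[str]:
    """The t-th screening segment is the concatenation of the first t breakpoint segments."""
    # accumulate() applies + by default, which concatenates strings.
    return list(accumulate(breakpoint_segments))


print(screening_segments(["a", "XXXX", "aYbYb"]))  # ['a', 'aXXXX', 'aXXXXaYbYb']
```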

Preferably, the breakpoint screening feature value of each screening character segment is obtained from the proportions of the various characters in the screening character segment and the variation of their character spacings, according to the following expression:

[formula not preserved in this text]

where $H_t$ is the breakpoint screening feature value of screening character segment t; $q_{t,m}$ is the ratio of the number of occurrences of the m-th character in screening character segment t to the total number of characters in that segment; $Q_m$ is the ratio of the number of occurrences of the m-th character in the whole Unicode-encoded short message text to the total number of characters in the text; $\sigma_{t,m}$ is the standard deviation of the character spacings of the m-th character in screening character segment t; and $M$ is the number of character types in screening character segment t.

Preferably, preliminarily screening the breakpoints according to the breakpoint screening feature values of the screening character segments and a Bayesian jump point detection algorithm includes:

forming the breakpoint screening feature values of all screening character segments into a breakpoint screening feature value sequence, and performing Bayesian jump point detection on the sequence;

when the value at a jump point is smaller than the preceding breakpoint screening feature value and no further jump point appears among the B breakpoint screening feature values after it, taking the screening character segment at that jump point as a selected screening character segment;

for every selected screening character segment, keeping the breakpoints at its leftmost and rightmost ends and deleting the other breakpoints inside it, which completes the preliminary screening of the breakpoints.

Preferably, obtaining the Huffman compression estimate from the number of occurrences of each character type and the number of characters needed to express the most basic Huffman compression tree for those types includes:

treating identical characters as one type and counting the number of character types $P$ in the Unicode-encoded short message text, the Huffman compression estimate $E_t$ being expressed as:

[formula not preserved in this text]

where $L$ is the binary length of a character in the original encoding; $p$ is the rank of a character type when the types in the Unicode-encoded short message text are sorted in descending order of occurrence count; $N_p$ is the number of occurrences of the p-th ranked character type; and $G_P$ is the number of characters needed to express, in binary, the most basic Huffman compression tree containing $P$ character types.

Preferably, obtaining the fitness of each individual from the Huffman compression estimate and the number of characters of each character segment produced by segmentation during optimization includes:

during genetic algorithm optimization, each individual divides the Unicode-encoded short message text into a number of compressed character segments; for individual u, the ratio of the Huffman compression estimate of each compressed character segment to the total number of characters in that segment is calculated, and the sum of these ratios over all compressed character segments produced by individual u is taken as the fitness of individual u.
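The fitness computation can be sketched as follows, assuming an individual is a list of 0/1 genes, one per candidate cut position, with the cuts sorted in ascending order. The estimator is passed in as a function because the source's Huffman compression estimate is computed separately; `split_by_genes`, `fitness`, and the `toy_estimate` used in the example are hypothetical names, and `toy_estimate` is a placeholder, not the source's estimate.

```python
from typing import Callable, Sequence


def split_by_genes(text: str, cuts: Sequence[int], genes: Sequence[int]) -> list[str]:
    """Split text at every cut whose gene is 1 (gene r controls cut r; cuts ascending)."""
    active = [c for c, g in zip(cuts, genes) if g == 1]
    bounds = [0, *active, len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def fitness(
    text: str,
    cuts: Sequence[int],
    genes: Sequence[int],
    estimate: Callable[[str], float],
) -> float:
    """Sum over compressed segments of estimate(segment) / len(segment); smaller is better."""
    return sum(estimate(seg) / len(seg) for seg in split_by_genes(text, cuts, genes))


def toy_estimate(seg: str) -> float:
    """Hypothetical placeholder: 16 bits per distinct character plus 1 bit per character."""
    return 16 * len(set(seg)) + len(seg)


print(fitness("aaaabbbb", [4], [0], toy_estimate))  # 5.0
print(fitness("aaaabbbb", [4], [1], toy_estimate))  # 10.0
```

Because the fitness is a sum of per-segment ratios, every extra segment adds a term; a split only pays off when it lowers the per-character estimate of the resulting segments enough to outweigh the added term.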

Preferably, dividing the Unicode-encoded text with the genetic algorithm according to the fitness of each individual to obtain the final compressed character segments includes:

taking the individual with the smallest fitness as the optimal individual and dividing the Unicode-encoded short message text with it: every breakpoint whose value in the optimal individual is 1 is used as a division point of the text, and the text is divided at each division point.

Preferably, compressing the final compressed character segments includes: taking the final compressed character segments as the input of the Huffman compression algorithm and compressing each of them with it.
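For reference, per-segment Huffman compression can be sketched with the standard library's `heapq`: each segment gets its own code table, built by repeatedly merging the two least frequent subtrees. The function names are hypothetical, and how the per-segment code tables would be serialized and sent alongside the bits is not specified in the source, so the sketch only returns them.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(segment: str) -> dict[str, str]:
    """Build a Huffman code table (character -> bit string) for one segment."""
    freq = Counter(segment)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a single symbol still needs one bit per occurrence
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {ch: ""}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)  # the two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def compress_segments(segments: list[str]) -> list[tuple[dict[str, str], str]]:
    """Compress each segment independently; return (code table, encoded bits) per segment."""
    result = []
    for seg in segments:
        table = huffman_code(seg)
        result.append((table, "".join(table[ch] for ch in seg)))
    return result


print(compress_segments(["aaaabbc"])[0][1])  # 1111010100
```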

The invention has at least the following beneficial effects:

Based on the property that the more often characters repeat, the better Huffman compression performs, the invention calculates character spacings and sets breakpoints between characters with long spacings as the basis for dividing the original encoded short message data. Screening character segments are then built from the breakpoints, and for each screening character segment the difference between each character's local and overall frequency, the standard deviation of each character's spacings, and the number of character types are calculated to characterize how well the screening segment divides the original encoded data. This preliminarily filters out a large number of breakpoints, greatly reducing the search space when an optimization algorithm finally determines the compressed character segments, and avoids excessive consumption of computing resources and overly long run times.

Furthermore, the invention uses the compression effect of the most basic Huffman compression tree as the fitness function, takes the screened breakpoints as individuals, and divides the original encoded short message data with a genetic algorithm to find the best division into compressed character segments. This replaces conventional fixed-length segments, gives a better compression effect by comparison, and speeds up the transmission of short message data.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of the steps of the low-delay short message data transmission method provided by the invention;

Fig. 2 is a schematic diagram of the character distribution feature map;

Fig. 3 is a schematic diagram of character breakpoint setting.

Detailed Description

To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the following detailed description, with reference to the accompanying drawings and preferred embodiments, sets out the specific implementation, structure, features, and effects of the low-delay short message data transmission method according to the invention. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the low-delay short message data transmission method provided by the invention with reference to the accompanying drawings.

The invention provides a low-delay short message data transmission method which, referring to Fig. 1, comprises the following steps:

Step S001, obtaining the short message text to be transmitted by the user.

The embodiment of the invention processes short message data during transmission to improve the compression effect and ensure transmission efficiency. The short message text to be transmitted by the user is therefore collected first and used as the basic data for the subsequent analysis. The text is Unicode-encoded.

Step S002, constructing a character distribution feature map and setting break points.

First, this embodiment calculates the character spacing from the intervals between characters, expressed as:

$$d_n^{s} = x_n^{s+1} - x_n^{s}$$

where $d_n^{s}$ is the character spacing between the position of the s-th occurrence of the n-th character in the Unicode-encoded short message text and the position of its next occurrence, and $x_n^{s}$ and $x_n^{s+1}$ are the positions of the s-th and (s+1)-th occurrences of the n-th character in the text.

The formula calculates the spacings between successive occurrences of the n-th character in the Unicode-encoded text and thus characterizes the distribution of that character: the smaller the spacing, the more densely the n-th character is distributed locally. Using regions where characters are densely distributed as the character segments for Huffman compression yields a better compression effect.

A character distribution feature map is then constructed; Fig. 2 shows such a map of the character spacings of three characters. As Fig. 2 shows, the characters are not uniformly distributed over the whole text, so breakpoints can be set where the n-th character has long spacings and used to divide the text into character segments, achieving a better compression effect.

All calculated character spacings are therefore sorted in descending order of length, and the top A spacings are selected. Two breakpoints are then set between the position of the left-end character and the position of the right-end character of each selected spacing; that is, each selected spacing receives two breakpoints. In this embodiment A = 5; practitioners may choose another value. Fig. 3 illustrates breakpoints set within the selected character spacings.

The characters between any two adjacent breakpoints form a breakpoint character segment, so the Unicode-encoded text is finally divided into several breakpoint character segments. This replaces the conventional fixed-length segmentation, so that the text obtains a better compression effect than with conventional Huffman compression.

Step S003, preliminarily screening the breakpoints.

In this embodiment, the breakpoints are combined with an optimization algorithm to obtain the final character segment division. However, the breakpoints set in step S002 take only character spacing into account, so there are too many of them. They must therefore be preliminarily screened to reduce the number of possible divisions and the search space of the final optimization, which makes the algorithm run faster and consume fewer computing resources.

For the breakpoint character segments, this embodiment takes the first t consecutive breakpoint character segments as the t-th screening character segment and constructs its breakpoint screening feature value as follows:

[formula not preserved in this text]

where $H_t$ is the breakpoint screening feature value of screening character segment t; $M$ is the number of character types in screening character segment t; $q_{t,m}$ is the ratio of the number of occurrences of the m-th character in the screening character segment formed by the first t breakpoint character segments to the total number of characters in that segment; $Q_m$ is the ratio of the number of occurrences of the m-th character in the whole Unicode-encoded short message text to the total number of characters in the text; and $\sigma_{t,m}$ is the standard deviation of the character spacings of the m-th character in the screening character segment formed by the first t breakpoint character segments.

The breakpoint screening feature value is calculated for the screening character segment formed by combining the first t breakpoint character segments. The larger the value, the better the Huffman compression effect of a compressed character segment made of those t breakpoint character segments; the breakpoints are therefore screened on the basis of this value.

In the formula, the larger the difference between the proportion of the m-th character in the screening character segment and its proportion in the whole text, the more often characters repeat within the screening character segment, the better Huffman compression performs on it, and the larger the breakpoint screening feature value. The larger the standard deviation of the character spacings in the screening character segment, the more uneven the spacing distribution, the more likely the segment still contains breakpoints, the worse Huffman compression performs on it, and the smaller the breakpoint screening feature value. The more character types, or the more breakpoints in total, a screening character segment contains, the more complex the corresponding Huffman tree, the worse the Huffman compression effect, and the smaller the breakpoint screening feature value.
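The exact expression that combines these quantities into the breakpoint screening feature value is not given in this text, so the sketch below stops at the ingredients the description names. For each character type in a screening segment it returns the local proportion, the proportion in the whole text, and the standard deviation of the character's spacings inside the segment; the number of character types is the number of entries returned. Using the population (not sample) standard deviation and setting it to 0 for a character that occurs only once are assumptions, and the name `screening_ingredients` is hypothetical.

```python
from collections import Counter
from statistics import pstdev


def screening_ingredients(segment: str, text: str) -> dict[str, tuple[float, float, float]]:
    """Per character type in segment: (local proportion, whole-text proportion, spacing std)."""
    seg_counts = Counter(segment)
    text_counts = Counter(text)
    positions: dict[str, list[int]] = {}
    for index, ch in enumerate(segment):
        positions.setdefault(ch, []).append(index)
    result = {}
    for ch, n in seg_counts.items():
        pos = positions[ch]
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        sigma = pstdev(gaps) if gaps else 0.0  # assumption: single occurrence -> 0
        result[ch] = (n / len(segment), text_counts[ch] / len(text), sigma)
    return result


print(screening_ingredients("aXXXX", "aXXXXaYbYb"))  # {'a': (0.2, 0.2, 0.0), 'X': (0.8, 0.4, 0.0)}
```

In the example, "X" makes up 80% of the screening segment but only 40% of the whole text, the kind of local concentration the description says favours Huffman compression.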

Following the method of this embodiment, for the Unicode-encoded short message text, the 1st breakpoint character segment is taken as the first screening character segment and its breakpoint screening feature value is calculated; the 2nd breakpoint character segment is then added to form the second screening character segment, whose breakpoint screening feature value is calculated; and so on. The resulting values form a breakpoint screening feature value sequence on which Bayesian jump point detection is performed. When the value at a jump point is smaller than the preceding breakpoint screening feature value and no further jump point appears among the B values after it, the screening character segment at that jump point is taken as a selected screening character segment; its leftmost and rightmost breakpoints are kept and the breakpoints inside it are deleted. The selected screening character segment is then removed from the Unicode-encoded text and the steps are repeated until all breakpoint character segments have been traversed. This completes the preliminary screening of the breakpoints and greatly reduces the search space of the genetic algorithm and unnecessary searching. Bayesian jump point detection is a well-known technique in the art and is not described in detail here.
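The iterative screening loop can be sketched as below. This is not the Bayesian detector itself: the jump detector is passed in as a function, and the `threshold_jumps` stand-in used for the demonstration simply flags any step whose absolute change exceeds a threshold, which is a plain substitute for, not an implementation of, Bayesian jump point (change point) detection. The feature function is likewise a placeholder for the breakpoint screening feature value. Further assumptions: the screening segment "at" jump index j consists of the first j+1 remaining breakpoint segments, and when no qualifying jump point is found the remaining breakpoints are kept unchanged. All names are hypothetical.

```python
from typing import Callable, Sequence


def threshold_jumps(values: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Stand-in jump detector (NOT Bayesian): flag j where |v[j] - v[j-1]| > threshold."""
    return [j for j in range(1, len(values)) if abs(values[j] - values[j - 1]) > threshold]


def prune_breakpoints(
    segments: list[str],
    feature: Callable[[str], float],
    detect_jumps: Callable[[Sequence[float]], list[int]],
    b: int,
) -> list[int]:
    """Return the cut positions (character offsets) that survive preliminary screening."""
    # Cut positions implied by the breakpoint segments: the offset where each one ends.
    keep: set[int] = set()
    offset = 0
    for seg in segments[:-1]:
        offset += len(seg)
        keep.add(offset)

    start = 0  # index of the first not-yet-traversed breakpoint segment
    base = 0   # character offset at which segments[start] begins
    while start < len(segments):
        # Feature values of the cumulative screening segments built from segments[start:].
        values: list[float] = []
        screening = ""
        for seg in segments[start:]:
            screening += seg
            values.append(feature(screening))
        jumps = sorted(detect_jumps(values))
        # First jump that drops below its predecessor with no other jump in the next b values.
        chosen = next(
            (j for j in jumps
             if j > 0 and values[j] < values[j - 1]
             and not any(j < k <= j + b for k in jumps)),
            None,
        )
        if chosen is None:
            break  # assumption: remaining breakpoints are left unchanged
        # The selected screening segment covers segments[start .. start + chosen].
        end = base + sum(len(s) for s in segments[start:start + chosen + 1])
        keep -= {c for c in keep if base < c < end}  # delete interior breakpoints only
        base = end
        start += chosen + 1
    return sorted(keep)


def distinct_penalty(s: str) -> float:
    """Hypothetical placeholder feature: fewer distinct characters scores higher."""
    return -float(len(set(s)))


print(prune_breakpoints(["aa", "aa", "bc", "dd"], distinct_penalty, threshold_jumps, b=0))  # [6]
```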

Step S004, calculating the Huffman compression estimate, constructing the fitness function, obtaining the optimal character segment division, and completing data compression.

In this embodiment, a genetic algorithm performs the final screening of the preliminarily selected breakpoints, and the optimal solution is obtained through iterative optimization, giving the optimal division positions of the Unicode-encoded text. For ease of calculation, this embodiment uses a conventional Huffman compression tree as the estimated compression result and calculates the Huffman compression estimate:

[formula not preserved in this text]

where $E_t$ is the Huffman compression estimate; $P$ is the number of character types in the Unicode-encoded short message text; $L$ is the binary length of a character in the original encoding; $p$ is the rank of a character type when the types in the Unicode-encoded short message text are sorted in descending order of occurrence count; $N_p$ is the number of occurrences of the p-th ranked character type; and $G_P$ is the number of characters needed to express, in binary, the most basic Huffman compression tree containing $P$ character types, which is obtained statistically from the prior art.
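Because the exact form of the estimate and the statistically obtained tree cost are not given in this text, the following is a hypothetical estimator built only for illustration; it is not the source's formula. It assumes the "most basic" Huffman tree is a skewed chain in which the p-th most frequent character type gets a p-bit code (the last type shares length P-1), that the symbol table costs P times the original code length (assumed 16 bits), and it substitutes 2P-1 bits (one bit per tree node) for the tree-description cost. None of these constants come from the source.

```python
from collections import Counter


def huffman_estimate(segment: str, char_bits: int = 16) -> int:
    """Hypothetical size estimate (bits) of a segment under a skewed 'basic' Huffman tree."""
    counts = sorted(Counter(segment).values(), reverse=True)  # N_1 >= N_2 >= ... >= N_P
    p_types = len(counts)
    if p_types == 0:
        return 0
    if p_types == 1:
        payload = counts[0]  # assumption: one bit per character for a single type
    else:
        # Chain tree: rank p gets a p-bit code, except the last rank, which shares P-1 bits.
        payload = sum(n * min(rank, p_types - 1) for rank, n in enumerate(counts, start=1))
    header = p_types * char_bits  # symbol table stored in the original encoding
    tree_cost = 2 * p_types - 1   # hypothetical tree-description cost: one bit per node
    return payload + header + tree_cost


print(huffman_estimate("aaab"))  # 39
```

For "aaab" the payload is 3 x 1 + 1 x 1 = 4 bits, the symbol table 2 x 16 = 32 bits, and the assumed tree cost 3 bits.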

The genetic algorithm then makes the final selection from the preliminarily selected breakpoints. Compared with using the optimization algorithm to select the optimal individual (that is, the optimal breakpoint setting) directly from the Unicode-encoded text, working from preliminarily selected breakpoints reduces the amount of computation, helps avoid local optima, and improves the accuracy of breakpoint placement. The final breakpoint setting process is as follows:

Assume that R breakpoints remain after step S003. This process divides the Unicode-encoded text by selecting an optimal combination of these R breakpoints, which form the individuals of the genetic algorithm. A value of 0 at the r-th breakpoint position means the Unicode-encoded text is not divided at the r-th breakpoint; a value of 1 means the r-th breakpoint is a division point and the text is divided there. Each individual can therefore divide the original Unicode-encoded text into multiple segments.

To obtain the optimal individual and divide the Unicode-encoded text effectively, this embodiment calculates the fitness of each individual as follows. Taking individual u as an example: u divides the Unicode-encoded text into several compressed character segments; for each segment, the ratio of its Huffman compression estimate to its total number of characters is calculated, and the sum of these ratios over all compressed character segments produced by u is taken as the fitness of u. The smaller the fitness, the better the individual.

The fitness of every individual is obtained with the method described above.

The genetic algorithm parameters in this embodiment are: crossover rate 0.2, mutation rate 0.05, population size 40, 5 elite individuals, and 100 iterations. Practitioners may set these parameters themselves; this embodiment places no particular limitation on them.
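A compact genetic-algorithm sketch using the parameter values stated above is shown below; it minimizes a fitness function over 0/1 gene vectors, since a smaller fitness means a better individual. Binary tournament selection, single-point crossover, reading the mutation rate as a per-gene bit-flip probability, and random initialization are assumptions not specified in the source, and `genetic_search` is a hypothetical name.

```python
import random
from typing import Callable


def genetic_search(
    n_genes: int,
    fitness: Callable[[list[int]], float],
    crossover_rate: float = 0.2,
    mutation_rate: float = 0.05,
    population_size: int = 40,
    elite_count: int = 5,
    iterations: int = 100,
    seed: int = 0,
) -> tuple[list[int], float]:
    """Minimize fitness over 0/1 gene vectors; return (best individual, its fitness)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(population_size)]

    def tournament(scored: list[tuple[float, list[int]]]) -> list[int]:
        # Binary tournament (assumption): the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(iterations):
        scored = sorted(((fitness(ind), ind) for ind in population), key=lambda s: s[0])
        next_gen = [ind[:] for _, ind in scored[:elite_count]]  # elites survive unchanged
        while len(next_gen) < population_size:
            child = tournament(scored)[:]
            if n_genes > 1 and rng.random() < crossover_rate:
                mate = tournament(scored)
                point = rng.randrange(1, n_genes)  # single-point crossover (assumption)
                child = child[:point] + mate[point:]
            # Per-gene bit flip with probability mutation_rate (assumed reading of the rate).
            child = [g ^ 1 if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return best, fitness(best)


best, score = genetic_search(8, lambda genes: float(sum(genes)))
print(best, score)
```

In the method, the fitness callable would divide the text at the breakpoints whose genes are 1 and sum each segment's Huffman compression estimate divided by its length; the toy objective here (number of 1 genes) only demonstrates the search loop.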

Finally, the genetic algorithm takes the individual with the smallest fitness as the optimal individual. In the optimal individual, each breakpoint whose position value is 1 is a division point of the Unicode-encoded text, and the text is divided at all of these division points. This yields several compressed character segments, which replace the fixed-length segments of conventional Huffman compression and complete the compression of the short message text data. Compared with the traditional Huffman compression algorithm, this embodiment adjusts the length of the compressed character segments dynamically according to the distribution of the characters, achieving a better compression effect and faster short message communication.

In summary, based on the property that the more often characters repeat, the better Huffman compression performs, the embodiment of the invention calculates character spacings and sets breakpoints between characters with long spacings as the basis for dividing the original encoded short message data. Screening character segments are then built from the breakpoints, and for each screening character segment the difference between each character's local and overall frequency, the standard deviation of each character's spacings, and the number of character types are calculated to characterize how well the screening segment divides the original encoded data. This preliminarily filters out a large number of breakpoints, greatly reducing the search space when an optimization algorithm finally determines the compressed character segments, and avoids excessive consumption of computing resources and overly long run times.

Meanwhile, the embodiment of the invention uses the compression effect of the most basic Huffman compression tree as the fitness function, takes the screened breakpoints as individuals, and divides the original encoded short message data with a genetic algorithm to find the best division into compressed character segments. This replaces conventional fixed-length segments, gives a better compression effect by comparison, and speeds up the transmission of short message data.

It should be noted that the order of the above embodiments is for description only and does not indicate that any embodiment is better than another. The foregoing describes specific embodiments of this specification. The processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results; in some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, each embodiment is described in a progressive manner, and the same or similar parts of each embodiment are referred to each other, and each embodiment mainly describes differences from other embodiments.

The above embodiments only illustrate the technical solution of the present application and do not limit it. Modifications of the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application are all included in the protection scope of the present application.

Claims

1. A low-delay short message data transmission method is characterized by comprising the following steps:

collecting Unicode coded text information to be transmitted by a user;

2. The low-delay short message data transmission method of claim 1, wherein obtaining the character spacing of each character in the Unicode-encoded short message text according to the intervals between characters of the text comprises:

3. The low-delay short message data transmission method of claim 1, wherein setting breakpoints according to the character spacing of each character in the Unicode-encoded short message text and constructing breakpoint character segments comprises:

4. The low-delay short message data transmission method of claim 3, wherein constructing the screening character segments by combining the breakpoint character segments of the Unicode-encoded short message text comprises:

5. The low-delay short message data transmission method of claim 4, wherein the breakpoint screening feature value of each screening character segment is obtained from the proportions of the various characters in the screening character segment and the variation of their character spacings, the expression being:

6. The low-delay short message data transmission method of claim 5, wherein preliminarily screening the breakpoints according to the breakpoint screening feature value of each screening character segment in combination with a Bayesian jump point detection algorithm comprises:

7. The low-delay short message data transmission method of claim 1, wherein obtaining the Huffman compression estimate from the number of occurrences of each character type and the number of characters needed to express the most basic Huffman compression tree for those types comprises:

8. The low-delay short message data transmission method of claim 7, wherein obtaining the fitness of each individual from the Huffman compression estimate and the number of characters of each character segment obtained by segmentation during optimization comprises:

9. The low-delay short message data transmission method of claim 1, wherein dividing the Unicode-encoded text with the genetic algorithm according to the fitness of each individual to obtain the final compressed character segments comprises:

10. The low-delay short message data transmission method of claim 1, wherein compressing each of the final compressed character segments comprises: taking the final compressed character segments as the input of the Huffman compression algorithm and compressing them with the Huffman compression algorithm.