+

CN111552716A - Privacy-protecting public substring determining method and device - Google Patents

Privacy-protecting public substring determining method and device Download PDF

Info

Publication number
CN111552716A
CN111552716A CN202010665112.4A CN202010665112A CN111552716A CN 111552716 A CN111552716 A CN 111552716A CN 202010665112 A CN202010665112 A CN 202010665112A CN 111552716 A CN111552716 A CN 111552716A
Authority
CN
China
Prior art keywords
character
string
bit
character string
bit value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010665112.4A
Other languages
Chinese (zh)
Inventor
尹红军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010665112.4A priority Critical patent/CN111552716A/en
Publication of CN111552716A publication Critical patent/CN111552716A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Storage Device Security (AREA)

Abstract

The specification relates to the field of computer information processing, in particular to a public substring determining method and device capable of protecting privacy. The method comprises the following steps: converting the first character into a first bit string; sequentially comparing whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by utilizing a bit equality judging circuit constructed based on a garbled circuit mechanism to obtain a bit string comparison result; determining whether the first character and the second character are equal or not according to the bit string comparison result; when the first character and the second character are equal, determining that the first character is a same character between the first character string and the second character string; and determining the longest common sub-string between the first character string and the second character string based on the positions of the same characters between the first character string and the second character string in the first character string and the second character string respectively.

Description

Privacy-protecting public substring determining method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of computer information processing, and in particular, to a method and an apparatus for determining a public substring for protecting privacy.
Background
With the rapid development of big data, artificial intelligence, distributed computing and the like, personal information of users is increasingly used for computing and analyzing by various organizations and departments. The sharing or spreading of the personal information of the user among different organizations and departments seriously affects the stability of the society and the safety of the individual to some extent. Therefore, various machine learning and data statistics need to be established under the condition of protecting the privacy of the user to be developed continuously.
The longest common substring algorithm is a basic operation for text and sequence processing, and extraction and analysis of common parts of multi-party information are involved in the operation process, so how to protect the multi-party information in the process of performing the longest common substring algorithm is an urgent problem to be solved.
Disclosure of Invention
One or more embodiments of the present specification describe a public string determining method for protecting privacy, which may calculate a longest public string of data owned by both parties while guaranteeing privacy of both parties.
According to a first aspect, a public character string determination method for protecting privacy is provided, and is applied to a first device with a first character string, wherein the first character string at least comprises a first character; the method comprises the following steps: converting the first character into a first bit string; sequentially comparing whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by utilizing a bit equality judging circuit constructed based on a garbled circuit mechanism to obtain a bit string comparison result; the second bit string is obtained by converting a second character, wherein the second character is a character at any position in the second character string owned by the second device; determining whether the first character and the second character are equal or not according to the bit string comparison result; wherein when the first character and the second character are equal, the first character is determined to be an identical character between the first character string and the second character string; and determining the longest common substring between the first character string and the second character string based on the positions of the same characters between the first character string and the second character string in the first character string and the second character string respectively.
In some embodiments, said converting said first character into a first bit string comprises: converting the first character into a first integer based on UTF encoding or ASCII encoding; converting the first integer to the first bit string.
In some embodiments, the sequentially comparing, one by one, whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device by using the bit equality determination circuit constructed based on the garbled circuit mechanism to obtain the bit string comparison result includes: sending an aliasing truth table of the bit equality judgment circuit to the second device based on an aliasing circuit mechanism; sending an encryption key of a first bit value to the second device, the first bit value being a bit value of a first position in the first bit string, so that the second device decrypts the confusion truth table according to the encryption key of the first bit value and an encryption key of a second bit value, the second bit value being a bit value of the first position in the second bit string, the encryption key of the second bit value being received by the second device from the first device through an oblivious transmission protocol; receiving the decrypted decryption result from the second device, and determining whether the first bit value and the second bit value are equal according to the decryption result.
In some embodiments, the determining whether the first character and the second character are equal according to the bit string comparison result includes: and when the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string, determining that the first character and the second character are equal.
In some embodiments, the determining the longest common substring between the first character string and the second character based on the positions of the same character between the first character string and the second character string in the first character string and the second character string, respectively, includes: constructing a matrix by taking each position in the first character string as a column and each position in the second character string as a row; when a character at a first position in the first character string is the same as a character at a second position in the second character string, setting matrix elements corresponding to the first position and the second position as a first value; in the matrix, connecting first values which are sequentially adjacent along the diagonal direction to obtain at least one diagonal line; determining a longest first diagonal line of the at least one diagonal line; and determining the character at the position corresponding to the first diagonal line in the first characters as the longest common substring.
In some embodiments, the first string and the second string belong to text or biometric information.
According to a first aspect, a public character string determining apparatus for protecting privacy is provided, configured to a first device having a first character string, where the first character string includes at least a first character; the device comprises: a conversion unit configured to convert the first character into a first bit string; the comparison unit is configured to compare whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by one in sequence by using a bit equality judgment circuit constructed based on a garbled circuit mechanism to obtain a bit string comparison result; the second bit string is obtained by converting a second character, wherein the second character is a character at any position in the second character string owned by the second device; a first determining unit configured to determine whether the first character and the second character are equal according to the bit string comparison result; wherein when the first character and the second character are equal, the first character is determined to be an identical character between the first character string and the second character string; a second determining unit configured to determine a longest common sub-string between the first character string and the second character string based on positions of identical characters between the first character string and the second character string in the first character string and the second character string, respectively.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
According to the method and the device provided by the embodiment of the specification, the longest common character string of the data owned by the two parties can be calculated under the condition that the two parties do not leak the data owned by each other.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a system architecture to which the public string determination method for privacy protection disclosed in this specification is applicable;
FIG. 2 is a schematic diagram of a bit equality determination circuit disclosed in the present specification;
FIG. 3 illustrates a flow chart of a public string determination method for privacy protection disclosed herein;
fig. 4 shows a schematic block diagram of a public string determination apparatus for privacy protection disclosed in the present specification.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
The longest common substring algorithm is a relatively classical computer algorithm that can query two strings of consecutive identical substring lengths. The longest common substring algorithm can be used for text similarity calculation, biological genetic similarity analysis, and the like.
Usually two strings belong to different respective owners, and therefore computing the longest common substring of the two strings involves a multi-party computation. In view of the current requirements of secure multi-party computation or privacy protection, it is desirable to complete the computation of the longest common substring without the parties revealing each other's data (strings). In the embodiments of the present specification, the longest common substring may also be referred to as a longest common string.
The embodiment of the specification provides a public character string determining method for protecting privacy, and the longest public substring calculation can be completed under the condition that a plurality of parties do not reveal respective character strings.
Next, referring to fig. 1, fig. 2, and fig. 3, a public character string determination method for protecting privacy provided in an embodiment of the present specification is specifically described.
The public string determination method for protecting privacy provided by the embodiment of the present specification can be applied to the system 100 shown in fig. 1. System 100 may include device 110 and device 120. Device 110 and/or device 120 may be any computing, processing capable apparatus, device, platform, cluster of devices.
Referring to fig. 1, the device 110 may include a processor 111, a communication interface 112, and a memory 113.
The memory 113 may store software programs, instructions, and related information. For example, the implementation program 1131 on the device 110 side, which may be used to store the public string determination method for protecting privacy provided in the embodiments of the present specification, a first string, a bit equality determination circuit, a truth table (which may also be referred to as a true truth table), an obfuscation truth table, a key with a bit value of "0", a key with a bit value of "1", and the like.
Illustratively, the first string may be text, e.g., a sentence. For example, the first string may be a "white dog".
Illustratively, the first string may be biogenetic information, such as a DeoxyriboNucleic Acid (DNA) sequence. Specifically, for example, the first string may be "ATCAATG".
For example, the bit equality determination circuit may be a preset logic circuit that may be used to determine whether two bit values are equal. In the embodiments of the present specification, that two bit values are equal means that the two bit values are both "0" or both "1".
In one example, referring to fig. 2, the bit equality determination circuit may be an exclusive or gate logic circuit. The circuit includes an input line x, an input line y, and an output line z. If the bit values input by the input line x and the input line y are the same (both "0" or both "1"), the output line outputs "0". If the bit values input by input line x and input line y are different (e.g., input line x inputs "0" and input line y inputs "1"), then output line y outputs "1". Thus, the truth table shown in table 1 can be constructed. In the embodiments of the present specification, the truth table herein may also be referred to as a true truth table.
TABLE 1
Figure 233547DEST_PATH_IMAGE001
Random numbers can be generatedK 0xCorresponding to the input bit value '0' of the input line x, a random number is generatedK 1xSetting generation of random number corresponding to input bit value "1" of input line xK 0yCorresponding to the input bit value '0' of the input line y, a random number is generatedK 1yCorresponding to input bit value '1' of input line y, generating random numberK 0zCorresponding to the output bit value '0' of the output line z, a random number is generatedK 1zThe output bit value "1" corresponding to the output line z. Thus, a randomized truth table as shown in Table 2 can be obtained.
TABLE 2
Figure DEST_PATH_IMAGE002
The random number corresponding to the input line x and the random number corresponding to the input line y can be used as a key (that is, the random number corresponding to the input line x and the random number corresponding to the input line y can be referred to as a key), and the random number corresponding to the output line z is encrypted, so that the encryption truth table shown in table 3 can be obtained.
TABLE 3
Figure 782340DEST_PATH_IMAGE003
The order of rows in the column where "z" is located in table 3 can be scrambled to obtain a confusion truth table. In one example, the confusion truth table may be as shown in Table 4.
TABLE 4
Figure DEST_PATH_IMAGE005AAAA
In the embodiments shown in fig. 2 and tables 1 to 4, the bit equality determination circuit is described by way of example, and is not limited thereto. In other embodiments of this specification, other types of logic circuits that can be used to determine whether two bits are equal to each other, and a truth table, a randomized truth table, an obfuscated truth table, etc. corresponding to the logic circuits may also be sampled, and are not described herein again.
The processor 111 may invoke software programs and information in the memory 113 to implement the associated functions. For example, the program 1131 may be invoked to execute the implementation steps, and the like, of the public string determination method for protecting privacy provided by the embodiment of the present specification in the device 110.
Communication interface 112 may communicate with device 120 to enable information interaction between device 110 and device 120.
With continued reference to fig. 1, the device 120 may include a processor 121, a communication interface 122, and a memory 123.
The memory 123 may store software programs, instructions, and related information. For example, the implementation program 1231, the second character string, and the like on the device 120 side of the public character string determination method for protecting privacy provided in the embodiments of the present specification may be stored. Illustratively, the second string may be text, e.g., a sentence. For example, the second string may be a "yellow-white dog". Illustratively, the second string may be genetic information, such as a DeoxyriboNucleic Acid (DNA) sequence. For example, the second string may be "TACAATG".
Processor 121 may invoke software programs and information in memory 123 to implement the associated functions. For example, the program 1231 may be called to perform the implementation steps, and the like, of the public string determination method for protecting privacy provided by the embodiment of the present specification on the device 120.
Communication interface 122 may communicate with device 110 to enable information interaction between device 110 and device 120.
Next, with reference to fig. 3, an example of a public character string determination method for protecting privacy provided in an embodiment of the present specification is described.
The steps shown in fig. 3 may be performed by the device 110 shown in fig. 1, wherein the first string of characters includes at least a first character. As shown in fig. 3, device 110 may perform: step 301, converting the first character into a first bit string; step 302, utilizing a bit equality judging circuit constructed based on a garbled circuit mechanism to sequentially compare whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by one to obtain a bit string comparison result; the second bit string is obtained by converting a second character, wherein the second character is a character at any position in the second character string owned by the second device; step 303, determining whether the first character and the second character are equal according to the bit string comparison result; wherein when the first character and the second character are equal, the first character is determined to be an identical character between the first character string and the second character string; step 304, determining the longest common substring between the first character string and the second character string based on the position of each identical character between the first character string and the second character string in the first character string and the second character string respectively.
Next, each step described above will be specifically described with reference to specific examples.
First, in step 301, the first character is converted into a first bit string.
In some embodiments, device 110 may convert the first character to an integer using utf (unicode transformation format) encoding. The integer is then converted into a string of bits.
In some embodiments, device 110 may convert the first character to an integer using ASCII (American Standard Code for information exchange). The integer is then converted into a string of bits.
It will be appreciated that if the first character is a chinese character (e.g., "white"), then a character may be converted to a 16-bit string, i.e., the first bit string is a 16-bit string. If the first character is a letter (e.g., "a"), then a character may be converted to an 8-bit string, i.e., the first bit string is an 8-bit string.
Secondly, in step 302, using a bit equality judging circuit constructed based on a garbled circuit mechanism to sequentially compare whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by one to obtain a bit string comparison result; the second bit string is obtained by converting a second character, and the second character is a character at any position in the second character string owned by the second device.
The related structure of the second device can be referred to as the device 120 in fig. 1, and is not described herein. Device 120 may convert any of the second characters into a bit string, e.g., may convert a second character in the second character string (e.g., "yellow" in "yellow-white dog") into a second bit string. For a specific conversion manner, reference may be made to the above description of step 301, and details are not described herein again. Illustratively, the device 120 may also associate the position of the second character in the second character string with each bit in the second bit string. Taking the second character as the ith character in the second character string as an example, each bit in the second bit string may be associated with i.
In some embodiments, device 110 may send an obfuscation truth table of the bit-equal decision circuit to device 120 based on an obfuscation circuit mechanism; and sending an encryption key of a first bit value to the second device, the first bit value being a bit value of a first position in the first bit string.
For example, device 110 may send the obfuscation truth table shown in Table 4 to device 120 and the key to the bit value of the first position in the first bit string (the position may be referred to herein as a bit)120. For convenience of description, a bit value of a first position in the first bit string may be referred to as a first bit value. For example, if the first bit value is "0", the key for the first bit value may be "in table 2"K 0x". As described above, the key is a random number. That is, the device 120 acquires a random number and cannot know the bit value of the first position in the first bit string, so that privacy protection of the first bit string or the first character can be achieved.
For example, device 110 may associate a key that is a first bit value with an identification of a first location to indicate to the device: the key of the first bit value of this transmission is the key of the bit value of the first position in the first bit string. Thus, device 120 may obtain the key for the bit value of the first position in the second bit string from device 110. For convenience of description, the bit value of the first position in the second bit string may be referred to as a second bit value.
Device 120 may receive the key of the second bit value from device 110 according to an Oblivious Transfer (OT) protocol. For example, the second bit value may be set to "0", and the key of the second bit value may be "in table 2"K 0y". The oblivious transmission can also be called as a lost transmission, and is a two-party communication protocol capable of protecting privacy, so that two communication parties can transmit data in a selective fuzzification mode. In particular, the sender, i.e. the device 110, may have a plurality of data, e.g.K 0yK 1y. The receiving party, i.e. the device 120, may receive the data of the plurality of data that it needs via an inadvertent transmission. In this process, the sender does not know which data the receiver receives, nor does the receiver know other data than the data it receives. The inadvertent transmission protocol is the underlying protocol of the garbled circuit and more specific details can be described with reference to the prior art and will not be described further herein.
In the manner described above, the device 120 may obtain a key for the input of the bit-relative decision circuit input line a (i.e., the first bit value) and a key for the input line b (i.e., the second bit value). Then, the device 120 can use the secondThe key of the one bit value and the key of the second bit value decrypt each row in the obfuscated truth table to obtain a unique correct result. For example, a key of a first bit value may be set to "K 0x", the key of the second bit value is"K 0y", then the only correct result is utilization"K 0x"and"K 0y'Pair'
Figure 542486DEST_PATH_IMAGE006
Obtained by decryption "K 0z”。
Because "K 0z"is a random number, and therefore, the device 120 cannot know the actual judgment result of the bit relative judgment circuit. That is, the device 120 does not know whether the first bit value and the second bit value are equal. Device 120 may send the unique correct result to device 110, and device 110 may obtain a true value corresponding to the unique correct result according to the true truth table shown in table 1 and the randomized truth table shown in table 2, thereby determining whether the first bit value and the second bit value are equal or identical. For example, the only correct result can be set to "K 0z", it may be determined that the first bit value and the second bit value are equal.
Illustratively, as described above, the device 120 may associate bits in the second bit string with the position of the second character in the second character string. Therefore, when the device 120 sends the unique correct result to the device 110, the position of the second character in the second character and the unique correct result may be sent to the device 110 together, so that the device 110 may know the position of the character corresponding to the unique correct result in the second character string.
With reference to the above manner, it can be determined one by one whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string.
For example, when determining whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string, the random number of the input bit value of the input line x, the random number of the input bit value of the input line y, and the random number of the output bit value of the input line z may be set to obtain different randomization truth tables, different encryption randomization truth tables, and different confusion truth tables, respectively. Then, whether the characteristic values of the positions in the first bit string and the second bit string are equal can be judged according to the random number of the input bit value of the input line x, the random number of the input bit value of the input line y, the random number of the output bit value of the input line z, the randomization truth table, the encryption randomization truth table and the confusion truth table corresponding to each position.
In other words, in this example, the random number of input bit values of input line x, the random number of input bit values of input line y, the random number of output bit values of input line z, the randomized truth table, the encrypted randomized truth table, and the obfuscated truth table are different when determining whether the characteristic values in the first bit string and the second bit string are equal at different positions. This prevents the device 110 from inferring the actual bit value corresponding to the random number of input bit values for the input line x after multiple rounds of decryption.
By the scheme of judging whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string one by one, the bit string comparison result of the first bit string and the second bit string can be determined.
Thus, the device 110 may execute step 303 to determine whether the first character and the second character are equal according to the bit string comparison result; and when the first character and the second character are equal, determining that the first character is an identical character between the first character string and the second character string.
If the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string, it can be determined that the first bit string is equal to the second bit string. That is, the first character and the second character may be determined to be equal or identical.
If the bit value of any position in the first bit string is not equal to the bit value of the corresponding position in the second bit string, it may be determined that the first bit string is not equal to the second bit string. That is, it may be determined that the first character and the second character are not equal.
For convenience of description, the character at a certain position in the second character string is directly referred to as a second character. In determining whether the first character and the second character are equal, the device 110 does not know which character the second character is, and only knows where in the second character string the second character is. Taking "yellow" in the case where the second character is "yellow-white dog", the device 110 does not know that the second character is "yellow", and only knows that the second character is the first character in the second string. Of course, when device 110 determines that the first character and the second character are equal, device 110 may know which character the second character is.
Referring to the scheme of determining whether the first character and the second character are equal, the device 110 may determine whether each character in the first character string is equal to a character in a respective position in the second character string.
Device 110 may then perform step 304 to determine a longest common substring between the first character string and the second character string based on the position of each identical character between the first character string and the second character string in the first character string and the second character string, respectively.
In some embodiments, a matrix may be constructed by taking each bit in the first character string as a column and each bit in the second character string as a row; when a character at a first position in the first character string is the same as a character at a second position in the second character string, setting matrix elements corresponding to the first position and the second position as a first value; in the matrix, connecting first values which are sequentially adjacent along the diagonal direction to obtain at least one diagonal line; determining that a first diagonal of the at least one diagonal is longest; and determining the character at the position corresponding to the first diagonal line in the first characters as the longest common substring.
The first string is a "white dog", the second string is a "yellow-white dog", and the first value is "1", which will be described in detail. Through steps 301 and 303, the device 110 may determine that the first character in the first character string is equal to the second character in the second character string, the second character in the first character string is equal to the third character in the second character string, the third character in the first character string is equal to the fourth character in the second character string, and the fifth character in the first character string is equal to the fifth character in the second character string. The device 110 uses each bit of the first character string as a column and each bit of the second character string as a row, and sets a matrix element corresponding to a position where the same character in the first character string and the second character string is located as "1", so that a matrix as shown in fig. 5 can be obtained.
TABLE 5
Figure 809519DEST_PATH_IMAGE007
As can be seen from table 5, the matrix element corresponding to the first character in the first character string and the second character in the second character string has a value of "1", the matrix element corresponding to the second character in the first character string and the third character in the second character string has a value of "1", and the matrix element corresponding to the third character in the first character string and the fourth character in the second character string has a value of "1", and are adjacent to each other in the order along the diagonal line. By connecting the three matrix elements whose values are "1", the longest diagonal can be obtained. Thus, the character in the first character string corresponding to the longest diagonal line can be taken as the longest common sub-string between the first character string and the second character string, i.e., "white".
Therefore, by the public character string determining method for protecting privacy provided by the embodiment of the specification, the longest public character string of data owned by two parties can be calculated and obtained under the condition that the two parties do not reveal the data owned by each other.
In a second aspect, embodiments of the present specification provide a public string determination apparatus 400 that protects privacy. Apparatus 400 may be configured with device 110. Referring to fig. 4, the apparatus 400 includes:
a conversion unit 410 configured to convert the first character into a first bit string;
a comparing unit 420 configured to compare, in sequence, one by one, whether a bit value of each position in the first bit string is equal to a bit value of a corresponding position in the second bit string in the second device, using a bit equality determining circuit constructed based on a garbled circuit mechanism, so as to obtain a bit string comparison result; the second bit string is obtained by converting a second character, wherein the second character is a character at any position in the second character string owned by the second device;
a first determining unit 430 configured to determine whether the first character and the second character are equal according to the bit string comparison result; wherein when the first character and the second character are equal, the first character is determined to be an identical character between the first character string and the second character string;
a second determining unit 440 configured to determine a longest common sub-string between the first character string and the second character string based on positions of respective identical characters between the first character string and the second character string in the first character string and the second character string, respectively.
In some embodiments, the conversion unit 410 is further configured to: converting the first character into a first integer based on UTF encoding or ASCII encoding; and converting the first integer into the first bit string.
In some embodiments, the comparing unit 420 includes a sending subunit, a determining subunit; the transmitting subunit is configured to transmit an aliasing truth table of the bit equality determination circuit to the second device based on an aliasing circuit mechanism; the transmitting subunit is further configured to transmit to the second device an encryption key of a first bit value, the first bit value being a bit value of a first position in the first bit string, for the second device to decrypt the confusion truth table according to the encryption key of the first bit value and an encryption key of a second bit value, the second bit value being a bit value of the first position in the second bit string, the encryption key of the second bit value being received by the second device from the first device through an oblivious transmission protocol; the determining subunit is further configured to receive the decrypted decryption result from the second device and determine whether the first bit value and the second bit value are equal according to the decryption result.
In some embodiments, the first determining unit 430 is further configured to determine that the first character and the second character are equal when the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string.
In some embodiments, the second determining unit 440 includes: the method comprises the following steps of constructing a subunit, a connecting unit, a first determining subunit and a second determining subunit; the constructing subunit is configured to construct a matrix by taking each bit in the first character string as a column and each bit in the second character string as a row; when a character at a first position in the first character string is the same as a character at a second position in the second character string, setting matrix elements corresponding to the first position and the second position as a first value; the line connecting unit is configured to connect first values which are sequentially adjacent along an oblique diagonal direction in the matrix to obtain at least one diagonal line; the first determining subunit is configured to determine a longest first diagonal line of the at least one diagonal line; the second determining subunit is configured to determine a character of the first characters at a position corresponding to the first diagonal as the longest common substring.
In some embodiments, the first string and the second string belong to text or biometric information.
The public character string determining apparatus for protecting privacy provided by the embodiments of the present specification may calculate the longest public character string of data owned by both parties without revealing the data owned by each other.
Embodiments of the present specification also provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method shown in fig. 3.
Embodiments of the present specification also provide a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method illustrated in fig. 3.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (14)

1. A public character string determination method for protecting privacy is applied to a first device with a first character string, wherein the first character string at least comprises a first character; the method comprises the following steps:
converting the first character into a first bit string;
sequentially comparing whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by utilizing a bit equality judging circuit constructed based on a garbled circuit mechanism to obtain a bit string comparison result; the second bit string is obtained by converting a second character, wherein the second character is a character at any position in the second character string owned by the second device;
determining whether the first character and the second character are equal or not according to the bit string comparison result; wherein when the first character and the second character are equal, the first character is determined to be an identical character between the first character string and the second character string;
and determining the longest common substring between the first character string and the second character string based on the positions of the same characters between the first character string and the second character string in the first character string and the second character string respectively.
2. The method of claim 1, wherein the converting the first character into a first bit string comprises:
converting the first character into a first integer based on UTF encoding or ASCII encoding;
converting the first integer to the first bit string.
3. The method of claim 1, wherein the comparing the bit value of each position in the first bit string with the bit value of the corresponding position in the second bit string in the second device one by one in sequence by using the bit equality determination circuit constructed based on the garbled circuit mechanism to obtain the bit string comparison result comprises:
sending an aliasing truth table of the bit equality judgment circuit to the second device based on an aliasing circuit mechanism;
sending an encryption key of a first bit value to the second device, the first bit value being a bit value of a first position in the first bit string, so that the second device decrypts the confusion truth table according to the encryption key of the first bit value and an encryption key of a second bit value, the second bit value being a bit value of the first position in the second bit string, the encryption key of the second bit value being received by the second device from the first device through an oblivious transmission protocol;
receiving the decrypted decryption result from the second device, and determining whether the first bit value and the second bit value are equal according to the decryption result.
4. The method of claim 1, wherein the determining whether the first character and the second character are equal according to the bit string comparison result comprises:
and when the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string, determining that the first character and the second character are equal.
5. The method of claim 1, wherein the determining the longest common substring between the first character string and the second character string based on the location of the same character between the first character string and the second character string in the first character string and the second character string, respectively, comprises:
constructing a matrix by taking each position in the first character string as a column and each position in the second character string as a row; when a character at a first position in the first character string is the same as a character at a second position in the second character string, setting matrix elements corresponding to the first position and the second position as a first value;
in the matrix, connecting first values which are sequentially adjacent along the diagonal direction to obtain at least one diagonal line;
determining a longest first diagonal line of the at least one diagonal line;
and determining the character at the position corresponding to the first diagonal line in the first characters as the longest common substring.
6. The method of claim 1, wherein the first and second strings belong to text or biometric information.
7. A public character string determining device for protecting privacy is configured to a first device with a first character string, wherein the first character string at least comprises a first character; the device comprises:
a conversion unit configured to convert the first character into a first bit string;
the comparison unit is configured to compare whether the bit value of each position in the first bit string is equal to the bit value of the corresponding position in the second bit string in the second device one by one in sequence by using a bit equality judgment circuit constructed based on a garbled circuit mechanism to obtain a bit string comparison result; the second bit string is obtained by converting a second character, wherein the second character is a character at any position in the second character string owned by the second device;
a first determining unit configured to determine whether the first character and the second character are equal according to the bit string comparison result; wherein when the first character and the second character are equal, the first character is determined to be an identical character between the first character string and the second character string;
a second determining unit configured to determine a longest common sub-string between the first character string and the second character string based on positions of identical characters between the first character string and the second character string in the first character string and the second character string, respectively.
8. The apparatus of claim 7, wherein the conversion unit is further configured to:
converting the first character into a first integer based on UTF encoding or ASCII encoding;
converting the first integer to the first bit string.
9. The apparatus of claim 7, wherein the comparing unit comprises a transmitting subunit, a determining subunit;
the transmitting subunit is configured to transmit an aliasing truth table of the bit equality determination circuit to the second device based on an aliasing circuit mechanism;
the transmitting subunit is further configured to transmit to the second device an encryption key of a first bit value, the first bit value being a bit value of a first position in the first bit string, for the second device to decrypt the confusion truth table according to the encryption key of the first bit value and an encryption key of a second bit value, the second bit value being a bit value of the first position in the second bit string, the encryption key of the second bit value being received by the second device from the first device through an oblivious transmission protocol;
the determining subunit is further configured to receive the decrypted decryption result from the second device and determine whether the first bit value and the second bit value are equal according to the decryption result.
10. The apparatus of claim 7, wherein the first determination unit is further configured to determine that the first character and the second character are equal when a bit value of each position in the first bit string is equal to a bit value of a corresponding position in the second bit string.
11. The apparatus of claim 7, wherein the second determining unit comprises: the method comprises the following steps of constructing a subunit, a connecting unit, a first determining subunit and a second determining subunit;
the constructing subunit is configured to construct a matrix by taking each bit in the first character string as a column and each bit in the second character string as a row; when a character at a first position in the first character string is the same as a character at a second position in the second character string, setting matrix elements corresponding to the first position and the second position as a first value;
the line connecting unit is configured to connect first values which are sequentially adjacent along an oblique diagonal direction in the matrix to obtain at least one diagonal line;
the first determining subunit is configured to determine a longest first diagonal line of the at least one diagonal line;
the second determining subunit is configured to determine a character of the first characters at a position corresponding to the first diagonal as the longest common substring.
12. The apparatus of claim 7, wherein the first and second strings belong to text or biometric information.
13. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-6.
14. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-6.
CN202010665112.4A 2020-07-10 2020-07-10 Privacy-protecting public substring determining method and device Pending CN111552716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010665112.4A CN111552716A (en) 2020-07-10 2020-07-10 Privacy-protecting public substring determining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010665112.4A CN111552716A (en) 2020-07-10 2020-07-10 Privacy-protecting public substring determining method and device

Publications (1)

Publication Number Publication Date
CN111552716A true CN111552716A (en) 2020-08-18

Family

ID=72007111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010665112.4A Pending CN111552716A (en) 2020-07-10 2020-07-10 Privacy-protecting public substring determining method and device

Country Status (1)

Country Link
CN (1) CN111552716A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401080A (en) * 2021-12-20 2022-04-26 武汉天喻信息产业股份有限公司 A method and system for intersection of privacy sets

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241016A (en) * 2018-08-14 2019-01-18 阿里巴巴集团控股有限公司 Secure calculation method and device, electronic equipment
CN109857740A (en) * 2019-01-25 2019-06-07 上海赜睿信息科技有限公司 Storage method, matching process, electronic equipment and the readable storage medium storing program for executing of character string
CN110299989A (en) * 2019-06-10 2019-10-01 南通大学 A kind of encryption and decryption method of Chinese and English character string

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241016A (en) * 2018-08-14 2019-01-18 阿里巴巴集团控股有限公司 Secure calculation method and device, electronic equipment
CN109857740A (en) * 2019-01-25 2019-06-07 上海赜睿信息科技有限公司 Storage method, matching process, electronic equipment and the readable storage medium storing program for executing of character string
CN110299989A (en) * 2019-06-10 2019-10-01 南通大学 A kind of encryption and decryption method of Chinese and English character string

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李功丽等: "面向用户隐私保护的高效基因比对方案", 《计算机应用》 *
樊重俊等: "《大数据分析与应用》", 31 January 2016, 立信会计出版社 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401080A (en) * 2021-12-20 2022-04-26 武汉天喻信息产业股份有限公司 A method and system for intersection of privacy sets

Similar Documents

Publication Publication Date Title
US10652010B2 (en) Fully homomorphic encrypted ciphertext query method and system
CN107819569B (en) The encryption method and terminal device of log-on message
CN111510281B (en) Homomorphic encryption method and device
TWI489847B (en) Data encryption method, data verification method and electronic apparatus
US8180048B2 (en) Method and system for computational transformation
CN114244507B (en) Quantum direct communication method, device, equipment and system based on single-path transmission
CN112287377A (en) Model training method based on federal learning, computer equipment and storage medium
CN113098675B (en) Binary data encryption system and method based on polynomial complete homomorphism
CN106610995A (en) Ciphertext index creating method, device and system
CN114500006B (en) Query request processing method and device
CN114499845B (en) Multi-party secure computing method, system, device, storage medium and equipment
WO2024051864A1 (en) Method for optimizing constant round secure multi-party computation protocol
CN116488810B (en) Identity authentication method, identity authentication system, and readable storage medium
Joshy et al. Text to image encryption technique using RGB substitution and AES
CN111552716A (en) Privacy-protecting public substring determining method and device
CN119011122A (en) Cloud platform data security processing method and system based on cloud computing
CN115277064B (en) Data encryption and data decryption methods and devices, electronic equipment and medium
CN113518244B (en) Digital television signal data transmission method and device based on substitute text combination
CN113792322B (en) Safe two-party comparison method and system
JPH08204701A (en) E-mail encryption communication system and encryption communication method
Padmapriya et al. A Technique of Data Security using DNA Cryptography with Optimized Data Storage
CN114024674A (en) Method and system for comparing two parties safely
Al-Attab et al. Lightweight effective encryption algorithm for securing data in cloud computing
CN118400096B (en) Quantum attack resistant PSI method, quantum attack resistant PSI device, quantum attack resistant communication equipment and storage medium
CN117440372B (en) Zero trust authentication method and device for wireless network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200818

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载