CN119067092A

CN119067092A - A file content difference identification method, device, equipment and storage medium

Info

Publication number: CN119067092A
Application number: CN202411045180.5A
Authority: CN
Inventors: 李兴伟
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2024-07-31
Filing date: 2024-07-31
Publication date: 2024-12-03

Abstract

The application discloses a method, a device, equipment and a storage medium for identifying file content differences, and belongs to the technical field of big data. A baseline file and a current file are read as comparison objects, the data fields to be compared are extracted from each file through file parsing, and the position of each field in its file is recorded. The data fields are then hashed to generate hash values. Next, a hash mapping table is constructed for each file, storing the hash values and position information as key-value pairs to facilitate fast retrieval. Finally, hash difference items are identified by comparing the keys of the two hash mapping tables, and the differing positions in the file content are located from the position information corresponding to those items. The application also relates to the field of blockchain technology, wherein the baseline file and the current file can be stored on a blockchain network. The scheme achieves accurate identification and efficient location of file content changes.

Description

File content difference identification method, device, equipment and storage medium

Technical Field

The application belongs to the technical field of big data, and particularly relates to a method, a device, equipment and a storage medium for identifying file content differences.

Background

With the rapid advance of Internet technology, the iteration cycles of software applications and data files keep shortening and their content is updated frequently, which places unprecedented demands on software testing. Faced with rapidly changing files in massive numbers, the traditional file content comparison method often relies on comparing file contents manually one by one. This is inefficient, consumes substantial human resources, and is prone to errors caused by negligence or fatigue, which seriously affects the accuracy and reliability of test results.

Disclosure of Invention

The embodiment of the application aims to provide a method, a device, computer equipment and a storage medium for identifying file content differences, which aim to improve the file content comparison efficiency and accuracy through automation and reduce the workload of testers.

In order to solve the above technical problems, the embodiment of the present application provides a method for identifying file content differences, which adopts the following technical scheme:

A method for identifying differences in file content, comprising:

receiving a difference identification instruction, and reading a baseline file and a current file to be compared;

Analyzing the baseline file to obtain a first data field, and analyzing the current file to obtain a second data field;

performing hash calculation on the first data field to obtain a first hash value, and performing hash calculation on the second data field to obtain a second hash value;

acquiring position information of a first data field to obtain first position information, and acquiring position information of a second data field to obtain second position information;

Storing the first hash value and the first position information in the form of key-value pairs to obtain a first hash mapping table, and storing the second hash value and the second position information in the form of key-value pairs to obtain a second hash mapping table, wherein the key of each hash mapping table stores the hash value, and the value of each hash mapping table stores the position information;

and sequentially comparing the keys of the first hash mapping table and the second hash mapping table to determine a difference hash value, and determining the difference position of the file content based on the position information corresponding to the difference hash value.

Further, before receiving the difference identification instruction and reading the baseline file and the current file to be compared, the method further comprises the following steps:

The method comprises the steps of configuring a file basic parameter table, wherein the file basic parameter table is used for configuring basic parameters of files to be compared, and the basic parameters of the files to be compared comprise stored file names, baseline file paths, current file paths, file dates, file numbers and file owner information;

Reading a baseline file and a current file to be compared after receiving a difference identification instruction, wherein the method specifically comprises the following steps:

acquiring a basic parameter table of a file to be compared carried in a difference identification instruction after receiving the difference identification instruction;

And reading the baseline file and the current file based on the basic parameter table of the file to be compared.

Further, after configuring the file base parameter table, the method further comprises:

Configuring an analysis rule table, wherein the analysis rule table is used for storing file analysis rules, and a field name, a field start bit and a field end bit corresponding to each file analysis rule are preset in the analysis rule table;

analyzing the baseline file to obtain a first data field, which specifically comprises the following steps:

Determining a first file parsing rule matched with the baseline file;

Analyzing the baseline file by using a first file analysis rule to determine a field name, a field start bit and a field end bit of a data field in the baseline file to obtain a first data field;

analyzing the current file to obtain a second data field, which specifically comprises the following steps:

determining a second file parsing rule matched with the current file;

And analyzing the current file by using a second file analysis rule to determine the field name, the field start bit and the field end bit of the data field in the current file, so as to obtain a second data field.

Further, a white list corresponding to each file parsing rule is preset in the parsing rule table, and after the first file parsing rule is used to parse the baseline file to determine a field name, a field start bit and a field end bit of a data field in the baseline file, the method further includes:

Screening the data fields in the baseline file based on the white list of the first file analysis rule to obtain a first data field screening result;

based on the screening result of the first data field, eliminating the data field which is matched with the white list of the first file analysis rule in the baseline file;

Analyzing the current file by using a second file analysis rule to determine a field name, a field start bit and a field end bit of a data field in the current file, and after obtaining the second data field, further comprising:

Screening the data fields in the current file based on the white list of the second file analysis rule to obtain a second data field screening result;

and removing the data field which is matched with the white list of the second file analysis rule in the current file based on the screening result of the second data field.

Further, storing the first hash value and the first position information in the form of key-value pairs to obtain a first hash mapping table, and storing the second hash value and the second position information in the form of key-value pairs to obtain a second hash mapping table, specifically includes:

Storing the first hash value and the first position information into a first temporary bucket file which is established in advance;

storing the second hash value and the second position information into a pre-established second temporary bucket file;

Parsing the first temporary bucket file to obtain the first hash mapping table, and parsing the second temporary bucket file to obtain the second hash mapping table;

The first hash mapping table and the second hash mapping table are Key-Value mapping tables with the same structure: the Key of the first hash mapping table stores the first hash value and its Value stores the first position information, and the Key of the second hash mapping table stores the second hash value and its Value stores the second position information.

Further, sequentially comparing the keys of the first hash mapping table and the second hash mapping table to determine a difference hash value, and determining a file content difference position based on the position information corresponding to the difference hash value, specifically includes:

Traversing each first hash value stored in the Keys of the first hash mapping table, and comparing each of them in turn, one by one, with the second hash values stored in the Keys of the second hash mapping table to obtain a first hash value comparison result;

determining a second hash value inconsistent with the first hash value based on the first hash value comparison result to obtain a first difference hash value;

Searching the second hash mapping table, by key-value lookup, for a first Value corresponding to the Key of the first difference hash value, and acquiring the file content difference position stored in the first Value to obtain a first file content difference position.

Further, after searching the second hash mapping table, by key-value lookup, for the first Value corresponding to the Key of the first difference hash value and acquiring the file content difference position stored in the first Value to obtain the first file content difference position, the method further comprises:

Traversing each second hash value stored in the Keys of the second hash mapping table, and comparing each of them in turn, one by one, with the first hash values stored in the Keys of the first hash mapping table to obtain a second hash value comparison result;

Determining a first hash value inconsistent with the second hash value based on the second hash value comparison result to obtain a second difference hash value;

Searching the first hash mapping table, by key-value lookup, for a second Value corresponding to the Key of the second difference hash value, and acquiring the file content difference position stored in the second Value to obtain a second file content difference position.

In order to solve the above technical problems, the embodiment of the present application further provides a device for identifying file content differences, which adopts the following technical scheme:

A file content difference identification device, comprising:

The file reading module is used for receiving the difference identification instruction and reading the baseline file and the current file to be compared;

The file analysis module is used for analyzing the baseline file to obtain a first data field, and analyzing the current file to obtain a second data field;

The hash calculation module is used for carrying out hash calculation on the first data field to obtain a first hash value, and carrying out hash calculation on the second data field to obtain a second hash value;

The position acquisition module is used for acquiring the position information of the first data field to obtain first position information and acquiring the position information of the second data field to obtain second position information;

The hash mapping module is used for storing the first hash value and the first position information in the form of key-value pairs to obtain a first hash mapping table, and storing the second hash value and the second position information in the form of key-value pairs to obtain a second hash mapping table, wherein the key of each hash mapping table stores the hash value, and the value of each hash mapping table stores the position information;

The difference identification module is used for sequentially comparing the keys of the first hash mapping table and the second hash mapping table to determine a difference hash value, and determining the difference position of the file content based on the position information corresponding to the difference hash value.

In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:

a computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the file content variance identification method of any one of the preceding claims.

In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:

A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the file content variance identification method as claimed in any one of the preceding claims.

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

The application discloses a method, a device, equipment and a storage medium for identifying file content differences, and belongs to the technical field of big data. A difference identification instruction is first received, and the baseline file and the current file to be compared are read. The two files are then parsed to extract a first data field and a second data field, respectively. The position information of the first data field and the second data field is acquired and recorded as first position information and second position information, respectively. Hash calculation is then performed on the two data fields to generate a first hash value and a second hash value. Next, the first hash value and its position information are stored as a first hash mapping table, and the second hash value and its position information are stored as a second hash mapping table, where each mapping table takes the form of key-value pairs whose keys are hash values and whose values are position information. Finally, the keys of the first hash mapping table and the second hash mapping table are compared in sequence to identify difference hash values, and the differing positions in the file content are determined from the position information corresponding to the difference hash values. By parsing the data fields of the two files to be compared, generating hash values of the data fields through hash calculation, and determining the content differences between the two files by comparing the hash values, the application achieves accurate identification and efficient location of file content changes, improves comparison efficiency and accuracy, and reduces test cost.

Drawings

In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.

FIG. 1 illustrates an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 illustrates a flow chart of one embodiment of a file content difference identification method according to the present application;

FIG. 3 is a schematic diagram showing the structure of an embodiment of a file content difference identification device according to the present application;

Fig. 4 shows a schematic structural diagram of an embodiment of a computer device according to the application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs, the terms used in the description herein are used for the purpose of describing particular embodiments only and are not intended to limit the application, and the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the above description of the drawings are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop and desktop computers, and the like.

The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101, 102, 103, and may be a stand-alone server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.

It should be noted that, the method for identifying file content differences provided in the embodiments of the present application is generally executed by a server, and accordingly, the device for identifying file content differences is generally disposed in the server.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of a file content difference identification method according to the present application is shown. The file content difference identification method comprises the following steps:

s201, receiving a difference identification instruction, and reading a baseline file and a current file to be compared.

The server receives an instruction from a user or a system and determines the files to be compared. The baseline file is typically a reference file, and the current file is the file to be compared against the baseline file.

The file base parameter table (file-comparison-template) is used to configure the basic parameters of the files to be compared, that is, the basic information the comparison requires, which must be specified before the comparison operation is performed. Specifically, the file base parameter table stores the following:

File name (file_name): records the name of the file to be compared, so that the system can accurately identify and process the target file.

Baseline file path (env_pat1): stores the storage path of the baseline file (i.e., the reference file or old-version file) used for comparison. It is the reference point against which the current file is compared.

Current file path (env_path2): stores the storage path of the file currently to be compared (i.e., the new-version file or test file).

File date (path_time): records the date information of the file, which helps track the file's version and modification time and is useful for version control and historical comparison.

File number (template_id): the unique identifier of the file, used to look up the corresponding file parsing rule in table B (i.e., flat_file_parameter). The number ensures an accurate association between a file and its parsing rule.

File owner information (team): records the team or department to which the file belongs, which helps manage access rights and responsibility for the file.

In the embodiment of the application, after receiving the difference identification instruction, the server acquires the base parameter table of the files to be compared carried in the instruction. This table records the file name, baseline file path, current file path, file date, file number, file owner and other information. The server then reads the baseline file and the current file based on this table.

In the above embodiment, the present application records the information such as the file name, the baseline file path, the current file path, the file date, the file number, the file owner and the like of the files to be compared through configuring the file base parameter table, so that the server can quickly find the files to be compared.
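As an illustration only, one row of the file base parameter table could be modelled as the record below. This is a minimal sketch: the attribute names `baseline_path` and `current_path`, the helper `resolve_files`, and the assumption that each path is a directory to which the file name is appended are all hypothetical and are not specified by the application.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class FileComparisonTemplate:
    """One row of the file base parameter table (illustrative layout)."""
    file_name: str
    baseline_path: str  # hypothetical name for the baseline file path column
    current_path: str   # hypothetical name for the current file path column
    path_time: str
    template_id: str
    team: str


def resolve_files(row: FileComparisonTemplate) -> tuple[str, str]:
    """Return the (baseline, current) file locations for one comparison job.

    Assumption: both path columns hold directories, and the same file name
    is looked up under each of them.
    """
    return (
        posixpath.join(row.baseline_path, row.file_name),
        posixpath.join(row.current_path, row.file_name),
    )
```

With such a record, the server only needs the row carried in the difference identification instruction to locate both files.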

It should be noted that, the file to be compared may be stored in the FTP file storage server, and the system obtains the specified file from the FTP file storage server in SFTP mode and downloads the specified file to the local server.

S202, analyzing the baseline file to obtain a first data field, and analyzing the current file to obtain a second data field.

The process of parsing the file requires converting the file content into a data structure (such as a data field) that can be compared, and can be implemented by a preset file parsing rule, and the accuracy of this process directly affects the results of subsequent hash computation and difference recognition.

Determining a first file parsing rule matched with the baseline file;

determining a second file parsing rule matched with the current file;

The parsing rule table (flat_file_parameter) is mainly used to store the parsing rules of files. These rules are the basis for parsing file contents, extracting fields and making comparisons. Specifically, the parsing rule table defines the following:

Field name (field_name): assigns a name to each field, so that the meaning and purpose of each field are clear when a file is parsed.

Field start bit (field_start_index) and field end bit (field_end_index): define the start and end positions of each field in the file.

Primary key flag (is_business_index): identifies which fields are business primary keys; business primary key fields can be handled specially during comparison.

White list flag (is exclude): specifies which fields should be excluded or ignored during comparison, which reduces unnecessary comparison work and improves comparison efficiency.

In the embodiment of the application, the file base parameter table and the parsing rule table play complementary roles in the file content difference comparison flow. The file base parameter table provides the basic file information and parameters required for comparison, and the parsing rule table provides the parsing rules and comparison logic for the file contents. Together they support efficient and accurate file content difference comparison.
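The way a parsing rule turns one line of a fixed-width file into named fields can be sketched as follows. This is a hedged illustration: the class `FieldRule`, the function `parse_line`, the spelling `is_exclude`, and the choice of a 0-based inclusive start index with an exclusive end index are assumptions made for the example, since the application does not state its indexing convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """One row of the parsing rule table (illustrative layout)."""
    field_name: str
    field_start_index: int  # assumed 0-based, inclusive
    field_end_index: int    # assumed exclusive
    is_business_index: bool = False
    is_exclude: bool = False  # hypothetical spelling of the white list flag


def parse_line(line: str, rules: list[FieldRule]) -> dict[str, str]:
    """Slice one fixed-width line into a {field_name: raw_text} mapping."""
    return {
        rule.field_name: line[rule.field_start_index:rule.field_end_index]
        for rule in rules
    }
```

For example, with rules covering columns 0 to 4 and 4 to 9, the line `A001 12.5` is split into an account field `A001` and an amount field ` 12.5` (leading space preserved, since no trimming is assumed).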

On the basis of file content difference identification, a screening mechanism of data fields is added, and data fields which do not need to be compared are filtered through a preset white list. When the baseline file and the current file are analyzed, screening of data fields is carried out according to respective file analysis rules and corresponding whitelists, and therefore it is ensured that only fields which accord with the rules and are not in the whitelists are reserved for subsequent hash calculation and difference comparison, and comparison of the data fields in the whitelists is not needed. The mechanism effectively improves the accuracy of difference identification, avoids unnecessary fields from participating in comparison, reduces the calculated amount, and simultaneously avoids false alarm caused by irrelevant field differences.

Through the screening step, the system can identify the data fields which are subjected to key modification in the file, and the user can concentrate on the difference analysis of the key content, so that the efficiency and the value of the whole difference identification process are improved.
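The screening step can be pictured as a simple filter over the parsed fields. The sketch below assumes the parsed row is a dictionary keyed by field name and that the white list is a set of field names to ignore; both representations, and the function name `drop_whitelisted`, are illustrative choices rather than details taken from the application.

```python
def drop_whitelisted(fields: dict[str, str], whitelist: set[str]) -> dict[str, str]:
    """Return only the fields that are NOT on the white list.

    Fields on the white list (for example timestamps or serial numbers that
    are expected to change between versions) are excluded from later hashing.
    """
    return {name: value for name, value in fields.items() if name not in whitelist}
```

Because excluded fields never reach the hash calculation, a change that affects only a white-listed field does not produce a difference item.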

S203, performing hash calculation on the first data field to obtain a first hash value, and performing hash calculation on the second data field to obtain a second hash value.

A hash algorithm is used to perform hash calculation on the parsed data fields and generate hash values, which provide an efficient means of identifying file content differences. A hash value reflects the content of its data field: data fields with identical content always produce the same hash value, and, with a collision-resistant hash algorithm, data fields whose content differs in any way produce different hash values except with negligible probability. Identifying file content differences by comparing hash values greatly simplifies the subsequent comparison and improves the efficiency of difference identification.
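A minimal sketch of the hash calculation is given below. The application does not name a hash algorithm; SHA-256 from Python's standard `hashlib` module is used here purely as an assumption, and the function name `hash_fields` and the unit-separator delimiter are likewise illustrative.

```python
import hashlib


def hash_fields(values: list[str]) -> str:
    """Hash the retained field values of one row into a hex digest.

    Assumption: SHA-256 stands in for the unspecified hash algorithm.
    The values are joined with the ASCII unit separator so that
    ["ab", "c"] and ["a", "bc"] do not collapse to the same input.
    """
    joined = "\x1f".join(values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Joining with an explicit delimiter is a small design choice that keeps field boundaries part of the hashed content; plain concatenation would let differently split rows hash identically.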

S204, acquiring the position information of the first data field to obtain first position information, and acquiring the position information of the second data field to obtain second position information.

After obtaining the hash values of the data fields, the system also records the location information of these fields in the file. The acquisition of the position information is crucial to the final positioning of the difference of the file contents, and by correlating the hash value with the position information, the system can quickly point out the specific position when the difference is found, and a more detailed difference report is provided for the user.

S205, storing the first hash value and the first position information in the form of key-value pairs to obtain a first hash mapping table, and storing the second hash value and the second position information in the form of key-value pairs to obtain a second hash mapping table, wherein the key of each hash mapping table stores the hash value, and the value of each hash mapping table stores the position information.

By constructing the hash mapping table, the hash value and the position information are stored in the form of key value pairs. The data structure of the key value pair is convenient for quickly searching the position information corresponding to the hash value, simplifies the subsequent comparison flow, and fully prepares the system for the next difference comparison by respectively constructing the hash mapping tables of the baseline file and the current file.

The first temporary bucket file is the master temporary bucket file, also referred to as the "master version temporary data file" or "reference temporary data file". Here "master" refers to the reference, primary or original version, and "temporary bucket file" refers to a file used to temporarily store processing results.

The second temporary bucket file is the slave temporary bucket file, also referred to as the "slave temporary data file" or "current temporary data file". Here "slave", as opposed to "master", refers to another version or state, typically a copy or derived version that needs to be compared or synchronized with the master version.

Both the master temporary bucket file and the slave temporary bucket file are temporary bucket files created for the file comparison: the master temporary bucket file stores the first hash values and first position information, and the slave temporary bucket file stores the second hash values and second position information.

In a specific embodiment of the application, the system reads the baseline file and the current file according to the configuration in the file base parameter table. It then determines the file parsing rules corresponding to the baseline file and the current file from the parsing rule table, and parses each file according to its rule. During parsing, the file content is read in a loop through a file stream; each row of data is processed according to the parsing rule, fields that do not need to be compared are removed, a hash operation is performed on the remaining fields, and the result is stored in a temporary bucket file together with the row number. The first hash values and first position information are stored in the master temporary bucket file, and the second hash values and second position information are stored in the slave temporary bucket file. This avoids the cost of comparing file contents directly and improves comparison efficiency.

Then, the master temporary bucket file and the slave temporary bucket file are each parsed into a hash mapping table (Map): parsing the master temporary bucket file yields the first hash mapping table masterMap, and parsing the slave temporary bucket file yields the second hash mapping table slaveMap.

Parsing these temporary files into two Maps means storing the data as key-value pairs (Key-Value pairs) in two Map data structures, where:

The Key of the Map stores the Hash value of the data field. The Hash value is computed by a hash function and, with a suitable hash function, is for practical purposes unique to the input data field. Using the Hash value as the Key allows quick lookup and comparison of whether data rows are identical, because identical data rows produce the same Hash value.

The Value of Map stores the line number of the data line in the original file, which is used to identify the location of the data line in the file, so that a specific line can be quickly located when a discrepancy is found.

In this way, the contents of the two files are converted into two Maps, which can then be compared efficiently to find the differences. Specifically, one of the Maps (e.g., masterMap) is traversed, and for each Key (Hash value) it is checked whether the same Key exists in the other Map (slaveMap). If not, there is no row in the slave file with the same Hash value as the corresponding row in the master file, that is, the data fields of that row differ. If so, the two data fields are considered identical, and the Key is removed from both Maps so that only the difference items remain for subsequent processing.
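As a rough sketch of the Map structure described above, a temporary bucket file can be loaded into a dictionary whose key is the hash value and whose value is the row number. The "hash, tab, row number" record format and the short placeholder hash strings are assumptions made for the example.

```python
import io

def load_hash_map(bucket_lines):
    """Build {hash value: row number} from 'hash<TAB>row number' records."""
    hash_map = {}
    for record in bucket_lines:
        record = record.strip()
        if not record:
            continue  # skip blank lines
        digest, line_no = record.split("\t")
        hash_map[digest] = int(line_no)
    return hash_map

# In-memory stand-ins for the master and slave temporary bucket files.
master_map = load_hash_map(io.StringIO("a1f3\t1\nb7c9\t2\n"))
slave_map = load_hash_map(io.StringIO("a1f3\t1\ne402\t2\n"))
```

One consequence of keying on the hash value: if two rows of the same file have identical compared fields, they share one Key and the later row number overwrites the earlier one. An implementation that needs to report such duplicates could store a list of row numbers as the value instead.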

S206, sequentially comparing the keys of the first hash mapping table and the second hash mapping table to determine a difference hash value, and determining the difference position of the file content based on the position information corresponding to the difference hash value.

The system finds the mismatched hash values (i.e., the difference hash values) by comparing the keys (i.e., the hash values) of the two hash mapping tables. Then, based on the position information corresponding to the difference hash values, the system locates the difference positions in the file content, which improves the accuracy of difference identification and allows the differences to be displayed to the user.

The hash mapping tables allow the file contents to be compared efficiently. First, the hash values stored in the two hash mapping tables, each of which represents a specific fragment of the file content, are compared one by one, so that inconsistent hash values (that is, difference hash values) are identified quickly. Then, using the key-value relationship stored in the hash mapping table, the position information corresponding to each difference hash value is looked up; it points directly to the specific difference position in the file content. This hash-based comparison improves the efficiency of difference identification, keeps the positioning accurate, and supports subsequent modification or synchronization of the file content.

In a particular embodiment of the application, masterMap is traversed and, for each Key in masterMap, the system checks in turn whether the same Key exists in slaveMap. If the same Key exists, the two rows of data fields are identical, and the Key and its corresponding line number are removed from masterMap. If the same Key does not exist, the data fields differ, that is, the data field may have been updated or may contain an error, and the Key and its corresponding line number are retained as a file content difference and its position. In the same way, for each Key in slaveMap, the system checks in turn whether the same Key exists in masterMap. Finally, the Keys remaining in masterMap and their line numbers are merged with the Keys remaining in slaveMap and their line numbers to obtain the final file content difference result for the two files.
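The two-direction comparison described in this embodiment can be sketched as follows. The function name, the result layout, and the placeholder hash strings are hypothetical; only the traverse, remove-matches, and merge logic is taken from the text.

```python
def diff_hash_maps(master_map, slave_map):
    """Return the line numbers that remain unmatched on each side."""
    master_left = dict(master_map)
    slave_left = dict(slave_map)
    # First pass: drop from the master side every key the slave map also has.
    for key in list(master_left):
        if key in slave_map:
            del master_left[key]
    # Second pass: the same check in the other direction, against the
    # original (unmodified) master map.
    for key in list(slave_left):
        if key in master_map:
            del slave_left[key]
    # Merge what is left on both sides into one difference result.
    return {
        "baseline_only": sorted(master_left.values()),
        "current_only": sorted(slave_left.values()),
    }

result = diff_hash_maps(
    {"h1": 1, "h2": 2, "h3": 3},
    {"h1": 1, "h9": 2, "h3": 3},
)
print(result)  # {'baseline_only': [2], 'current_only': [2]}
```

The sketch works on copies and checks the second pass against the unmodified master map. If matched keys were removed from the master map alone before the second pass, every slave key would appear to be missing; removing matched keys from both maps in a single pass is an equivalent alternative.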

In this embodiment, the electronic device on which the file content difference identification method runs (e.g., the server shown in fig. 1) may communicate through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or later developed.

It should be emphasized that, to further ensure the privacy and security of the files to be compared, the files to be compared may also be stored in a node of a blockchain.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, each of which contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

Those skilled in the art will appreciate that all or part of the processes of the methods of the embodiments described above may be implemented by computer readable instructions stored on a computer readable storage medium, which, when executed, may comprise the processes of the method embodiments described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a file content difference identifying apparatus. This apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.

As shown in fig. 3, the file content difference identifying apparatus 300 according to the present embodiment includes:

The file reading module 301 is configured to receive a difference identification instruction, and read a baseline file and a current file to be compared;

The file parsing module 302 is configured to parse the baseline file to obtain a first data field, and parse the current file to obtain a second data field;

The hash calculation module 303 is configured to perform hash calculation on the first data field to obtain a first hash value, and perform hash calculation on the second data field to obtain a second hash value;

The position obtaining module 304 is configured to acquire position information of the first data field to obtain first position information, and acquire position information of the second data field to obtain second position information;

The hash mapping module 305 is configured to store the first hash value and the first position information in key-value pair form to obtain a first hash mapping table, and store the second hash value and the second position information in key-value pair form to obtain a second hash mapping table, where the key position of each hash mapping table stores the hash value and the value position stores the position information;

The difference identifying module 306 is configured to sequentially compare the keys of the first hash mapping table and the second hash mapping table to determine a difference hash value, and determine a difference position of the file content based on the position information corresponding to the difference hash value.

Further, the file content difference identifying apparatus 300 further includes:

The first table configuration module is used for configuring a file basic parameter table, wherein the file basic parameter table is used for configuring basic parameters of files to be compared, and the basic parameters of the files to be compared comprise stored file names, baseline file paths, current file paths, file dates, file numbers and file owner information;

The file reading module 301 specifically includes:

The parameter table acquisition unit is used for acquiring a basic parameter table of the file to be compared carried in the difference identification instruction when the difference identification instruction is received;

And the file reading unit is used for reading the baseline file and the current file based on the basic parameter table of the file to be compared.
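For illustration, one entry of the file basic parameter table and the file reading unit might look like the sketch below. The class name, attribute names, and value formats are assumptions; the text only lists which parameters the table holds.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class FileBaseParams:
    """One hypothetical entry of the file basic parameter table."""
    file_name: str
    baseline_path: str
    current_path: str
    file_date: str
    file_number: str
    owner: str

def read_files(params):
    """Return (baseline rows, current rows) for one parameter-table entry."""
    baseline = Path(params.baseline_path).read_text(encoding="utf-8").splitlines()
    current = Path(params.current_path).read_text(encoding="utf-8").splitlines()
    return baseline, current
```

Reading whole files with `read_text` keeps the example short. For large files, a streaming read of the kind described in the method embodiment would be the more likely choice.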

The second table configuration module is used for configuring an analysis rule table, wherein the analysis rule table is used for storing file analysis rules, and a field name, a field start bit and a field end bit corresponding to each file analysis rule are preset in the analysis rule table;

The file parsing module 302 specifically includes:

A first rule matching unit for determining a first file parsing rule matched with the baseline file;

The base line file analysis unit is used for analyzing the base line file by using a first file analysis rule so as to determine the field name, the field start bit and the field end bit of the data field in the base line file and obtain a first data field;

The second rule matching unit is used for determining a second file analysis rule matched with the current file;

And the current file analysis unit is used for analyzing the current file by using the second file analysis rule so as to determine the field name, the field start bit and the field end bit of the data field in the current file and obtain a second data field.

The first field screening module is used for screening the data fields in the baseline file based on the whitelist of the first file analysis rule to obtain a first data field screening result;

The first field eliminating module is used for removing, based on the first data field screening result, the data fields in the baseline file that match the whitelist of the first file analysis rule;

The second field screening module is used for screening the data fields in the current file based on the whitelist of the second file analysis rule to obtain a second data field screening result;

And the second field eliminating module is used for removing, based on the second data field screening result, the data fields in the current file that match the whitelist of the second file analysis rule.
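The analysis rule table and the whitelist screening can be sketched as below. The class names, the 0-based and end-exclusive offsets, and the sample fixed-width row are assumptions made for the example; the text only states that each rule carries a field name, a field start bit, a field end bit, and a whitelist of fields to remove.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldRule:
    name: str
    start: int  # field start position (assumed 0-based, inclusive)
    end: int    # field end position (assumed exclusive)

@dataclass(frozen=True)
class ParsingRule:
    fields: tuple         # FieldRule entries for one file layout
    whitelist: frozenset  # names of fields excluded from the comparison

def extract_fields(line, rule):
    """Slice one row into {field name: text} using the start/end positions."""
    return {f.name: line[f.start:f.end] for f in rule.fields}

def drop_whitelisted(fields, rule):
    """Remove the fields whose names appear on the rule's whitelist."""
    return {k: v for k, v in fields.items() if k not in rule.whitelist}

rule = ParsingRule(
    fields=(FieldRule("account", 0, 6), FieldRule("amount", 6, 12),
            FieldRule("timestamp", 12, 20)),
    whitelist=frozenset({"timestamp"}),
)
row = extract_fields("ACC001000100T0000001", rule)
print(drop_whitelisted(row, rule))  # {'account': 'ACC001', 'amount': '000100'}
```

Note that the whitelist here lists fields to be removed before hashing, as the modules above describe, rather than fields to be kept.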

Further, the hash mapping module 305 specifically includes:

A first storage unit for storing the first hash value and the first location information into a first temporary bucket file established in advance;

a second storage unit for storing a second hash value and second position information into a pre-established second temporary bucket file;

The temporary bucket file analyzing unit is used for analyzing the first temporary bucket file to obtain a first hash mapping table, and analyzing the second temporary bucket file to obtain a second hash mapping table.

Further, the difference identifying module 306 specifically includes:

The first hash value comparison unit is used for traversing each first hash value stored in the Key of the first hash mapping table, and sequentially comparing each first hash value stored in the Key of the first hash mapping table with the second hash values stored in the Key of the second hash mapping table in a one-to-one correspondence manner to obtain a first hash value comparison result;

The first hash value screening unit is used for determining a second hash value inconsistent with the first hash value based on the comparison result of the first hash value to obtain a first difference hash value;

The first file content difference identification unit is used for searching a first Value corresponding to the Key of the first difference hash value in the second hash mapping table in a key-value pair searching mode, and acquiring the file content difference position stored in the first Value to obtain a first file content difference position.

Further, the difference identifying module 306 further includes:

The second hash value comparison unit is used for traversing each second hash value stored in the Key of the second hash mapping table, and sequentially comparing each second hash value stored in the Key of the second hash mapping table with the first hash values stored in the Key of the first hash mapping table in a one-to-one correspondence manner to obtain a second hash value comparison result;

The second hash value screening unit is used for determining a first hash value inconsistent with the second hash value based on the second hash value comparison result to obtain a second difference hash value;

The second file content difference identification unit is used for searching a second Value corresponding to the Key of the second difference hash value in the first hash mapping table in a key-value pair searching mode, and acquiring the file content difference position stored in the second Value to obtain a second file content difference position.

In the embodiment, the method analyzes the corresponding data fields of the two files to be compared, generates the hash value of the data fields through hash calculation, determines the content difference of the two files through comparison of the hash values, realizes accurate identification and efficient positioning of file content change, improves comparison efficiency and accuracy, and reduces test cost.

The application discloses a file content difference identification device, belonging to the technical field of big data. According to the application, a difference identification instruction is first received, and a baseline file and a current file to be compared are read. The two files are then parsed respectively to extract a first data field and a second data field. The position information of the first data field and the second data field is acquired and recorded as first position information and second position information respectively. Hash calculation is then performed on the two data fields to generate a first hash value and a second hash value. Next, the first hash value and its corresponding position information are stored as a first hash mapping table, and the second hash value and its corresponding position information are stored as a second hash mapping table, where each mapping table uses key-value pairs in which the key is the hash value and the value is the position information. Finally, the keys in the first hash mapping table and the second hash mapping table are compared in sequence to identify a difference hash value, and the content difference position in the file is determined from the position information corresponding to the difference hash value. By parsing the corresponding data fields of the two files to be compared, generating hash values of the data fields through hash calculation, and determining the content difference between the two files by comparing the hash values, the application achieves accurate identification and efficient positioning of file content changes, improves comparison efficiency and accuracy, and reduces test cost.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card) or the like provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of the file content difference identification method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the file content difference identifying method.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

The application discloses computer equipment, belonging to the technical field of big data. According to the application, a difference identification instruction is first received, and a baseline file and a current file to be compared are read. The two files are then parsed respectively to extract a first data field and a second data field. The position information of the first data field and the second data field is acquired and recorded as first position information and second position information respectively. Hash calculation is then performed on the two data fields to generate a first hash value and a second hash value. Next, the first hash value and its corresponding position information are stored as a first hash mapping table, and the second hash value and its corresponding position information are stored as a second hash mapping table, where each mapping table uses key-value pairs in which the key is the hash value and the value is the position information. Finally, the keys in the first hash mapping table and the second hash mapping table are compared in sequence to identify a difference hash value, and the content difference position in the file is determined from the position information corresponding to the difference hash value. By parsing the corresponding data fields of the two files to be compared, generating hash values of the data fields through hash calculation, and determining the content difference between the two files by comparing the hash values, the application achieves accurate identification and efficient positioning of file content changes, improves comparison efficiency and accuracy, and reduces test cost.

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the file content difference identification method as described above.

The application discloses a computer readable storage medium, belonging to the technical field of big data. According to the application, a difference identification instruction is first received, and a baseline file and a current file to be compared are read. The two files are then parsed respectively to extract a first data field and a second data field. The position information of the first data field and the second data field is acquired and recorded as first position information and second position information respectively. Hash calculation is then performed on the two data fields to generate a first hash value and a second hash value. Next, the first hash value and its corresponding position information are stored as a first hash mapping table, and the second hash value and its corresponding position information are stored as a second hash mapping table, where each mapping table uses key-value pairs in which the key is the hash value and the value is the position information. Finally, the keys in the first hash mapping table and the second hash mapping table are compared in sequence to identify a difference hash value, and the content difference position in the file is determined from the position information corresponding to the difference hash value. By parsing the corresponding data fields of the two files to be compared, generating hash values of the data fields through hash calculation, and determining the content difference between the two files by comparing the hash values, the application achieves accurate identification and efficient positioning of file content changes, improves comparison efficiency and accuracy, and reduces test cost.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.

The application is operational with numerous general purpose or special purpose computer system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The preferred embodiments of the application are shown in the drawings, which do not limit the scope of the patent claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for some of their elements. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims

1. A method for identifying differences in file content, comprising:

Receiving a difference identification instruction, and reading a baseline file and a current file to be compared;

Parsing the baseline file to obtain a first data field, and parsing the current file to obtain a second data field;

Performing a hash calculation on the first data field to obtain a first hash value, and performing a hash calculation on the second data field to obtain a second hash value;

Acquiring the position information of the first data field to obtain first position information, and acquiring the position information of the second data field to obtain second position information;

Storing the first hash value and the first position information in the form of a key-value pair to obtain a first hash mapping table, and storing the second hash value and the second position information in the form of a key-value pair to obtain a second hash mapping table, wherein the key position of each hash mapping table stores the hash value and the value position of each hash mapping table stores the position information;

Sequentially comparing the keys of the first hash mapping table and the second hash mapping table to determine a difference hash value, and determining a file content difference position based on the position information corresponding to the difference hash value.

2. The file content difference identification method according to claim 1, characterized in that before receiving the difference identification instruction and reading the baseline file and the current file to be compared, the method further comprises:

Configuring a file basic parameter table, wherein the file basic parameter table is used to configure basic parameters of the files to be compared, and the basic parameters of the files to be compared include storage file name, baseline file path, current file path, file date, file number and file owner information;

The receiving of the difference identification instruction and reading of the baseline file and the current file to be compared specifically comprises:

After receiving the difference identification instruction, obtaining the basic parameter table of the files to be compared carried in the difference identification instruction;

Reading the baseline file and the current file based on the basic parameter table of the files to be compared.

3. The file content difference identification method according to claim 1, characterized in that after configuring the file basic parameter table, the method further comprises:

Configuring a parsing rule table, wherein the parsing rule table is used to store file parsing rules, and a field name, a field start bit, and a field end bit corresponding to each file parsing rule are preset in the parsing rule table;

The parsing of the baseline file to obtain the first data field specifically comprises:

Determining a first file parsing rule that matches the baseline file;

Parsing the baseline file using the first file parsing rule to determine a field name, a field start bit, and a field end bit of a data field in the baseline file, and obtaining the first data field;

The parsing of the current file to obtain the second data field specifically comprises:

Determining a second file parsing rule that matches the current file;

Parsing the current file using the second file parsing rule to determine a field name, a field start bit, and a field end bit of a data field in the current file, and obtaining the second data field.

4. The file content difference identification method according to claim 3, characterized in that a whitelist corresponding to each file parsing rule is also preset in the parsing rule table, and after parsing the baseline file using the first file parsing rule to determine the field name, field start bit and field end bit of the data field in the baseline file and obtaining the first data field, the method further comprises:

Screening the data fields in the baseline file based on the whitelist of the first file parsing rule to obtain a first data field screening result;

Based on the first data field screening result, removing the data fields in the baseline file that match the whitelist of the first file parsing rule;

After parsing the current file using the second file parsing rule to determine the field name, field start bit, and field end bit of the data field in the current file and obtaining the second data field, the method further comprises:

Screening the data fields in the current file based on the whitelist of the second file parsing rule to obtain a second data field screening result;

Based on the second data field screening result, removing the data fields in the current file that match the whitelist of the second file parsing rule.

5. The file content difference identification method according to claim 1, characterized in that storing the first hash value and the first position information in the form of a key-value pair to obtain a first hash mapping table, and storing the second hash value and the second position information in the form of a key-value pair to obtain a second hash mapping table, specifically comprises:

storing the first hash value and the first location information in a pre-established first temporary bucket file;

storing the second hash value and the second location information in a pre-established second temporary bucket file;

Parsing the first temporary bucket file to obtain a first hash mapping table, and parsing the second temporary bucket file to obtain a second hash mapping table;

Wherein the first hash mapping table and the second hash mapping table are Key-Value mapping tables with the same structure, the first hash value is stored in the Key of the first hash mapping table, the first position information is stored in the Value of the first hash mapping table, the second hash value is stored in the Key of the second hash mapping table, and the second position information is stored in the Value of the second hash mapping table.

6. The file content difference identification method according to claim 5, characterized in that sequentially comparing the key positions of the first hash mapping table and the second hash mapping table to determine the difference hash value, and determining the file content difference position based on the position information corresponding to the difference hash value, specifically comprises:

Traversing each first hash value stored in the Key of the first hash mapping table, and comparing each first hash value stored in the Key of the first hash mapping table with the second hash value stored in the Key of the second hash mapping table in a one-to-one correspondence, to obtain a first hash value comparison result;

Based on the first hash value comparison result, determining a second hash value that is inconsistent with the first hash values, to obtain a first difference hash value;

Looking up, by key-value pair search in the second hash mapping table, the first Value corresponding to the Key that holds the first difference hash value, and reading the file content difference position stored in the first Value, to obtain a first file content difference position.

7. The file content difference identification method according to claim 6, characterized in that after searching the second hash mapping table for the first Value corresponding to the Key of the first difference hash value by searching for a key-value pair and obtaining the file content difference position stored in the first Value, the method further comprises:

Traversing each second hash value stored in the Key of the second hash mapping table, and comparing each second hash value stored in the Key of the second hash mapping table with the first hash value stored in the Key of the first hash mapping table one by one, to obtain a second hash value comparison result;

Based on the second hash value comparison result, determining a first hash value that is inconsistent with the second hash values, to obtain a second difference hash value;

Looking up, by key-value pair search in the first hash mapping table, the second Value corresponding to the Key that holds the second difference hash value, and reading the file content difference position stored in the second Value, to obtain a second file content difference position.
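
The two comparison passes of claims 6 and 7 can be sketched in Python as below. This is only an illustrative sketch: it assumes each hash mapping table is an in-memory `dict` whose keys are hash values and whose values are positions, and the function name `find_difference_positions` is hypothetical.

```python
def find_difference_positions(first_map, second_map):
    """Compare the Keys of two hash mapping tables in both directions.

    Returns a pair of dicts:
      - hash values present only in the second table, with their positions
        looked up in the second table (first file content difference positions);
      - hash values present only in the first table, with their positions
        looked up in the first table (second file content difference positions).
    """
    first_differences = {
        h: pos for h, pos in second_map.items() if h not in first_map
    }
    second_differences = {
        h: pos for h, pos in first_map.items() if h not in second_map
    }
    return first_differences, second_differences
```

Because membership tests on a `dict` are constant time on average, each pass is roughly linear in the number of Keys, which is the practical benefit of keying the tables by hash value.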

8. A device for identifying differences in file content, comprising:

A file reading module, used to receive a difference identification instruction and read the baseline file and the current file to be compared;

A file parsing module, used to parse the baseline file to obtain a first data field, and to parse the current file to obtain a second data field;

A hash calculation module, configured to perform a hash calculation on the first data field to obtain a first hash value, and perform a hash calculation on the second data field to obtain a second hash value;

A position acquisition module, used to acquire the position information of the first data field to obtain first position information, and to acquire the position information of the second data field to obtain second position information;

A hash mapping module, configured to store the first hash value and the first position information in the form of a key-value pair to obtain a first hash mapping table, and to store the second hash value and the second position information in the form of a key-value pair to obtain a second hash mapping table, wherein the key position of each hash mapping table stores the hash value, and the value position of each hash mapping table stores the position information;

A difference identification module, used to sequentially compare the key positions of the first hash mapping table and the second hash mapping table to determine the difference hash value, and to determine the file content difference position based on the position information corresponding to the difference hash value.
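
To show how the modules listed above could fit together, the following is a compact end-to-end Python sketch. It is an assumption-laden illustration rather than the claimed device: the fixed-width rule `RULE`, the SHA-256 hash, the use of character offsets for the start and end bits, and all function names are hypothetical, and the files are represented as lists of text lines already read into memory.

```python
import hashlib

# Hypothetical fixed-width parsing rule: (field name, start offset, end offset).
RULE = [("account", 0, 6), ("amount", 6, 12)]


def parse_fields(lines, rule):
    """File parsing and position acquisition: yield (field text, position)."""
    for line_no, line in enumerate(lines, start=1):
        for name, start, end in rule:
            yield line[start:end], (line_no, name, start, end)


def build_hash_map(lines, rule):
    """Hash calculation and hash mapping: Key = hash value, Value = position."""
    table = {}
    for text, position in parse_fields(lines, rule):
        # The hash covers the field text only, so equal texts share one Key.
        table[hashlib.sha256(text.encode("utf-8")).hexdigest()] = position
    return table


def identify_differences(baseline_lines, current_lines, rule=RULE):
    """Difference identification: positions whose hash appears in one table only."""
    first_map = build_hash_map(baseline_lines, rule)
    second_map = build_hash_map(current_lines, rule)
    positions = [pos for h, pos in second_map.items() if h not in first_map]
    positions += [pos for h, pos in first_map.items() if h not in second_map]
    return sorted(set(positions))
```

For example, with a baseline of `["A00001000100", "A00002000200"]` and a current file of `["A00001000100", "A00002000250"]`, only the `amount` field on line 2 differs, so the sketch reports that single position.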

9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the steps of the file content difference identification method according to any one of claims 1 to 7 when executing the computer-readable instructions.

10. A computer-readable storage medium, characterized in that computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the file content difference identification method according to any one of claims 1 to 7 are implemented.