CN119621074A - A code conversion method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN119621074A, CN202411692096.2A
- Authority
- CN
- China
- Prior art keywords
- code
- conversion
- target language
- preliminary
- semantic information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/51—Source to source
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/425—Lexical analysis
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
The invention belongs to the field of artificial intelligence and relates to a code conversion method, a code conversion device, computer equipment and a storage medium. The method comprises: obtaining source code and a target language; parsing the source code to generate an abstract syntax tree (AST); analyzing the abstract syntax tree to extract semantic information; converting the code to the target language according to the semantic information and the grammar rules of the target language to obtain preliminary converted code; and further optimizing the preliminary converted code to generate the final translated code. The method is highly automated and greatly reduces the workload of manual conversion. The conversion process is standardized and uniform, which improves the consistency and accuracy of code conversion. Parsing the source code and analyzing the AST gives a deeper understanding of the source code. The method is easy to extend and maintain, and support for new languages or features can be added as required. The optimization step ensures that the converted code is correct, efficient and readable, which promotes cross-language programming and code reuse.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a code conversion method, a code conversion device, a computer device and a storage medium.
Background
With the rapid development of technology, migrating and updating software systems has become a common requirement. However, manually converting a large code base from one programming language to another is time-consuming and error-prone. Existing code conversion tools often lack flexibility and extensibility, and have difficulty handling complex language features and meeting custom requirements. Furthermore, these tools often fail to take full advantage of the latest artificial intelligence techniques to optimize the conversion process and its results. Thus, there is a need for an intelligent, customizable code translation system that improves conversion efficiency and accuracy.
Disclosure of Invention
In order to solve the above technical problems, in one aspect, the present invention provides a code conversion method, which adopts the following technical scheme, including:
acquiring source code and a target language;
parsing the source code to generate an abstract syntax tree;
analyzing the abstract syntax tree and extracting semantic information;
converting the code to the target language according to the semantic information and the grammar rules of the target language to obtain preliminary converted code;
and further optimizing the preliminary converted code to generate final translated code.
Preferably, the step of obtaining the source code and the target language specifically includes:
Acquiring the source code from a code library;
Acquiring the target language from an input interface;
storing the source code and the target language.
Preferably, the step of parsing the source code to generate an abstract syntax tree specifically includes:
performing lexical analysis on the source code, dividing the character sequence of the source code into lexical units, and assigning a corresponding label to each lexical unit;
on the basis of the lexical analysis, using a parser to identify syntactic structures and construct the abstract syntax tree.
Preferably, the step of analyzing the abstract syntax tree and extracting semantic information specifically includes:
performing static check on the abstract syntax tree to verify the semantic correctness of the code;
Optimizing the abstract syntax tree to improve the execution efficiency of the program;
Traversing each node of the abstract syntax tree, processing the syntax structure, and extracting the semantic information.
Preferably, the step of converting the code into the target language according to the semantic information and the grammar rule of the target language to obtain the preliminary converted code specifically includes:
Selecting a conversion template according to the grammar rule of the target language and the semantic information;
And filling the semantic information into the conversion template, and generating a source code of a target language to obtain a preliminary conversion code.
Preferably, the step of further optimizing the preliminary translation code to generate a final translation code specifically includes:
transferring the preliminary conversion codes to a GPT-4 model;
analyzing the preliminary conversion codes by a GPT-4 model to generate optimization suggestions;
And applying the optimization suggestion to the preliminary translation code to generate a final translation code.
Preferably, after the step of further optimizing the preliminary translation code to generate a final translation code, the method further comprises:
and executing the translation code under the target language environment to obtain an execution result of the translation code.
In order to solve the above technical problem, the invention also provides a code conversion device, which adopts the following technical scheme, including:
an acquisition module, used for acquiring the source code and the target language;
a parsing module, used for parsing the source code and generating an abstract syntax tree;
an extraction module, used for analyzing the abstract syntax tree and extracting semantic information;
a conversion module, used for converting the code into the target language according to the semantic information and the grammar conversion rules to obtain preliminary converted code;
and an optimization module, used for further optimizing the preliminary converted code and generating the final translated code.
In order to solve the technical problem, the invention also provides a computer device, which adopts the technical scheme that the computer device comprises a memory and a processor, wherein the memory stores computer readable instructions, and the processor realizes the steps of the code conversion method when executing the computer readable instructions.
In order to solve the above technical problem, the present invention further provides a computer readable storage medium, which adopts the technical scheme that the computer readable storage medium stores computer readable instructions, and the computer readable instructions implement the steps of the code conversion method when being executed by a processor.
Compared with the prior art, the method has the following advantages. The source code and the target language are first obtained; the source code is parsed to generate an abstract syntax tree; the abstract syntax tree is analyzed to extract semantic information; the code is converted into the target language according to the semantic information and the grammar rules of the target language to obtain preliminary converted code; and the preliminary converted code is further optimized to generate the final translated code. The method is highly automated and greatly reduces the workload of manual conversion. The conversion process is standardized and uniform, which improves the consistency and accuracy of code conversion. Parsing the source code and analyzing the AST gives a deeper understanding of the source code, so that higher-quality translated code is generated. The method is easy to extend and maintain, and support for new languages or features can be added as required. The optimization step ensures that the converted code is correct, efficient and readable. The method promotes cross-language programming and code reuse, provides strong tool support for cross-language migration and upgrading of software systems, greatly reduces development cost, and improves the efficiency of software maintenance and evolution.
Drawings
In order to more clearly illustrate the solution of the present invention, a brief description will be given below of the drawings required for the description of the embodiments of the present invention, it being apparent that the drawings in the following description are some embodiments of the present invention, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present invention may be applied;
FIG. 2 is a flow chart of one embodiment of a transcoding method of the present invention;
FIG. 3 is a schematic diagram of one embodiment of a transcoding device of the present invention;
FIG. 4 is a timing diagram of the constituent modules of another transcoding device of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a computer device of the present invention.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terms used in the description are for the purpose of describing particular embodiments only and are not intended to limit the invention. The terms "comprising" and "having", and any variations thereof, in the description, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description, the claims or the above drawings are used to distinguish between different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present invention, the technical solution of the embodiment of the present invention will be clearly and completely described below with reference to the accompanying drawings.
As shown in FIG. 1, the system architecture includes a user layer, a LangChain integration layer, a code processing and analysis layer, a core translation function layer, a code generation and verification layer, and the like.
The user layer involves user interfaces/APIs, translation workflow orchestration, code conversion engines, etc.
The code processing and analysis layer involves a specialized code parsing tool and a LangChain code loader. The specialized code parsing tool is used for abstract syntax tree (AST) generation and semantic analysis. The LangChain code loader includes a source code loader and a code fragment splitter.
The core translation function layer comprises a language feature mapping chain, a grammar conversion engine, a code refactoring engine, an identifier renaming tool, a code optimization chain and the like.
The code generation and verification layer includes a target language code generator, a syntax checker, a unit test generation chain, a code formatting tool, a knowledge base and storage, and an infrastructure layer. The knowledge base and storage comprise a language feature mapping database, a conversion pattern library and a code example library. The infrastructure layer includes a large language model (LLM), a vector database, a version control system, and the like.
Example 1
With continued reference to FIG. 2, a flow chart of one embodiment of the transcoding method of the present invention is shown. A method of transcoding comprising the steps of:
Step S1, acquiring source codes and target languages.
In this embodiment, the electronic device (e.g., a server or terminal device) on which the code conversion method runs may receive the code conversion request through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other now known or later developed wireless connection means.
In this embodiment, step S1, obtaining the source code and the target language may specifically further include the steps of:
S11, acquiring the source code from a code library.
The source code may be obtained, for example, by accessing the URL of the source code repository through a Web interface. If the code library is stored locally, the source code is acquired from the local code library. For example, a version control system (e.g., Git) may be used to clone or pull the code library.
S12, acquiring the target language from an input interface.
The target language is obtained from an input interface such as a text box.
S13, storing the source code and the target language.
The acquired source code and target language information are stored.
Step S2, parsing the source code to generate an abstract syntax tree.
An abstract syntax tree (AST) is a tree-shaped data structure in which each node represents a syntactic structure in the source code. The relationships between nodes, such as parent, child and sibling relationships, are represented by the edges of the tree, which makes it easier for a compiler to analyze and process the code.
In particular implementations, the source code may be parsed using Tree-sitter.
In this embodiment, step S2, parsing the source code to generate an abstract syntax tree, may specifically further include the following steps:
S21, performing lexical analysis on the source code, dividing the character sequence of the source code into lexical units, and assigning a corresponding label to each lexical unit.
Lexical analysis is the first step in parsing the source code. It divides the character sequence of the source code into a series of lexical units (tokens). These lexical units include identifiers, keywords, operators, and the like. The lexical analyzer identifies the lexical units by scanning the source code and assigns a corresponding label to each lexical unit.
For example, in C++ code, the lexical analyzer identifies lexical units such as 'int', 'main' and '{', and labels them as a keyword, an identifier and a symbol, respectively.
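The lexical analysis of step S21 can be illustrated with a minimal sketch. It is not the lexer used by the invention: the token categories, the regular expressions and the names TOKEN_SPEC and tokenize are assumptions chosen for illustration, and only a small C-like subset is covered.

```python
import re

# Hypothetical token categories for a tiny C-like subset.
# Order matters: keywords must be tried before identifiers.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|return|if|else|while|for)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=<>!]=?"),
    ("SYMBOL",     r"[{}()\[\];,]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split a source string into (label, text) lexical units."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace carries no label
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("int main() { int c = a + b; }")
for label, text in tokens:
    print(f"{label:<10} {text}")
```

Each token is reported together with its label, which is the information the parser in step S22 consumes.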
S22, on the basis of the lexical analysis, using a parser to identify syntactic structures and construct the abstract syntax tree.
Taking C++ code as an example, the parser recognizes syntactic structures such as function declarations, variable declarations and assignment expressions, and converts them into nodes in the AST. For example, for a C++ main function that declares the integers a and b, assigns their sum to c and prints c with printf, the corresponding AST (simplified representation) is as follows:
Program
└── FunctionDeclaration(main)
    ├── Declaration(a)
    │   ├── TypeSpecifier(int)
    │   └── AssignmentExpression
    │       ├── Identifier(a)
    │       └── Constant(
    ├── Declaration(b)
    │   ├── TypeSpecifier(int)
    │   └── AssignmentExpression
    │       ├── Identifier(b)
    │       └── Constant(10)
    ├── Declaration(c)
    │   ├── TypeSpecifier(int)
    │   └── AssignmentExpression
    │       ├── Identifier(c)
    │       └── BinaryExpression(+)
    │           ├── Identifier(a)
    │           └── Identifier(b)
    └── FunctionCall(printf)
        ├── Constant("The sum of a and b is %d\n")
        └── Identifier(c)
In this AST, each node represents a syntactic construct in the source code, such as a variable declaration, an assignment expression, a binary expression, a function declaration, and so on.
Code parsing, for example with Tree-sitter, can handle the grammatical features of different programming languages.
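As an illustration of AST construction, the following sketch parses a short Python snippet and prints an outline of the resulting tree. It deliberately uses Python's built-in ast module as a stand-in for a multi-language parser such as Tree-sitter, so it only handles Python source; the snippet and the helper name outline are assumptions made for this example.

```python
import ast

# Illustrative input: two assignments and a sum.
source = "x = 3\ny = 10\ntotal = x + y\n"
tree = ast.parse(source)


def outline(node, depth=0):
    """Return indented lines naming each node of the AST."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += f"({node.id})"
    elif isinstance(node, ast.Constant):
        label += f"({node.value!r})"
    lines = ["  " * depth + label]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(tree)))
```

The outline also lists helper nodes such as Store, Load and Add, which Python's AST uses to record the context of a name and the operator of a binary expression.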
Step S3, analyzing the abstract syntax tree and extracting semantic information.
In this embodiment, step S3, analyzing the abstract syntax tree and extracting semantic information, may specifically further include the following steps:
S31, performing a static check on the abstract syntax tree to verify the semantic correctness of the code.
A static check, including type checking, scope analysis and the like, is performed on the abstract syntax tree to ensure the semantic correctness of the code.
S32, optimizing the abstract syntax tree to improve the execution efficiency of the program.
The AST is optimized using techniques such as constant folding and dead code elimination.
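Constant folding and dead code elimination on an AST can be sketched as follows. This is a simplified illustration on Python source using the standard ast module, not the optimizer of the invention; the class name ConstantFolder and the restriction to +, - and * on numeric constants are assumptions.

```python
import ast
import operator

# Only a few arithmetic operators are folded in this sketch.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (op and isinstance(left, ast.Constant) and isinstance(right, ast.Constant)
                and isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))):
            folded = ast.Constant(value=op(left.value, right.value))
            return ast.copy_location(folded, node)
        return node

    def visit_If(self, node):
        self.generic_visit(node)
        # Dead code elimination: a constant test keeps only the branch that runs.
        if isinstance(node.test, ast.Constant):
            return node.body if node.test.value else node.orelse
        return node


tree = ast.parse("x = 2 * 3 + 4\nif 0:\n    y = 1\nelse:\n    y = 2\n")
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
optimized = ast.unparse(tree)  # requires Python 3.9 or later
print(optimized)
```

The expression 2 * 3 + 4 is replaced by its value and the branch guarded by the constant 0 disappears, so the program's behavior is unchanged while the tree becomes smaller.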
S33, traversing each node of the abstract syntax tree, processing the syntax structure, and extracting semantic information.
The AST is traversed using the visitor pattern or the iterator pattern. The semantic analyzer traverses the abstract syntax tree, checking whether the definitions and uses of variables, functions and types are consistent, and whether there are semantic errors such as type mismatches or undeclared variables.
During semantic analysis, a symbol table needs to be constructed and maintained. It stores information about variables, functions, types and the like, and supports checks such as duplicate definition detection and type matching.
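A visitor-based traversal that maintains a symbol table might look like the sketch below. It is a simplified illustration on Python source: the class name SymbolChecker is hypothetical, a single flat scope is assumed, and only the use of undefined names is reported.

```python
import ast
import builtins


class SymbolChecker(ast.NodeVisitor):
    """Collect defined names and report names used before any definition."""

    def __init__(self):
        self.symbols = {}  # name -> line number of the first definition
        self.errors = []

    def visit_FunctionDef(self, node):
        self.symbols.setdefault(node.name, node.lineno)
        for arg in node.args.args:
            self.symbols.setdefault(arg.arg, node.lineno)
        self.generic_visit(node)

    def visit_Assign(self, node):
        # Check the right-hand side before the targets become defined.
        self.visit(node.value)
        for target in node.targets:
            self.visit(target)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.symbols.setdefault(node.id, node.lineno)
        elif node.id not in self.symbols and not hasattr(builtins, node.id):
            self.errors.append(f"line {node.lineno}: '{node.id}' used before definition")


checker = SymbolChecker()
checker.visit(ast.parse("a = 1\nc = a + b\nprint(c)\n"))
print(checker.symbols)
print(checker.errors)
```

A fuller semantic analyzer would keep one symbol table per scope and record types alongside names, so that duplicate definitions and type mismatches can be checked as well.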
Step S4, converting the code into the target language according to the semantic information and the grammar rules of the target language to obtain preliminary converted code.
In this embodiment, step S4, converting the code to the target language according to the semantic information and the grammar rules of the target language to obtain the preliminary converted code, may specifically further include the following steps:
S41, selecting a conversion template according to the grammar rules of the target language and the semantic information.
A conversion template library containing various common code patterns is constructed. These conversion templates should be designed according to the grammar rules of the target language. The most suitable conversion template is selected from the template library according to the grammar rules of the target language and the semantic information, for example by pattern matching or rule-based reasoning.
Precisely analyzing the grammar rules of the target language and the semantic information helps keep the selected conversion template logically consistent with the source code. Templates greatly reduce the workload of writing code manually and speed up conversion, which improves code conversion efficiency. The template library also makes the code conversion process more modular and maintainable.
S42, filling the semantic information into the conversion template, and generating a source code of the target language to obtain a preliminary conversion code.
Information such as variable names and data types in the source code is mapped to corresponding variables and data types of the target language. Control flow statements (e.g., loops and conditional statements) in the source code are mapped to corresponding statements in the target language.
The mapped semantic information is then filled into the selected template, for example by string replacement or code segment concatenation.
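Template selection and filling can be sketched with the standard string.Template class. The type mapping, the template texts and the names TYPE_MAP, TEMPLATES and render below are illustrative assumptions for a C++ to Python conversion, not the template library of the invention.

```python
from string import Template

# Hypothetical mapping from source (C++) types to target (Python) types.
TYPE_MAP = {"int": "int", "double": "float", "std::string": "str"}

# Hypothetical conversion templates written in the target language.
TEMPLATES = {
    "declaration": Template("$name: $type = $value"),
    "for_range": Template("for $var in range($start, $stop):\n    $body"),
    "print": Template("print($args)"),
}


def render(kind, **info):
    """Select the template for a code pattern and fill in the semantic information."""
    if "type" in info:
        info["type"] = TYPE_MAP[info["type"]]  # map the data type first
    return TEMPLATES[kind].substitute(**info)


declaration = render("declaration", name="total", type="double", value="0.0")
loop = render("for_range", var="i", start="0", stop="n", body="total += i")
print(declaration)
print(loop)
```

Here the template is chosen by a simple key lookup; pattern matching on AST node types would replace that lookup in a fuller implementation.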
It should be noted that the generated code needs to undergo a syntax check to ensure that it conforms to the grammar rules of the target language.
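When the target language is Python, such a syntax check can be sketched with the built-in parser. The function name check_syntax is an assumption; other target languages would need their own parser or compiler front end.

```python
import ast


def check_syntax(code):
    """Return (True, None) if the code parses as Python, else (False, message)."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, None


ok, message = check_syntax("total = 0\nfor i in range(3):\n    total += i\n")
bad, bad_message = check_syntax("total = (1 +\n")
print(ok, message)
print(bad, bad_message)
```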
Step S5, further optimizing the preliminary converted code to generate the final translated code.
The generated preliminary converted code is optimized to improve its performance, readability and maintainability.
In this embodiment, step S5, further optimizing the preliminary converted code to generate the final translated code, may specifically further include the following steps:
S51, passing the preliminary converted code to the GPT-4 model.
The preliminary converted code is input into the GPT-4 model through the model's interactive interface, an API or an online platform.
S52, analyzing the preliminary converted code with the GPT-4 model to generate optimization suggestions.
The GPT-4 model further analyzes and optimizes the preliminary converted code based on its natural language processing and code generation capabilities.
After receiving the preliminary converted code, GPT-4 uses its deep learning and pattern recognition capabilities to analyze the code in detail. It can identify potential errors, inconsistencies and possible optimization points in the code. GPT-4 then generates optimized code suggestions, which may include more compact syntax, more efficient algorithm implementations and the like.
Step S53, applying the optimization suggestions to the preliminary converted code to generate the final translated code.
After the code suggestions generated by GPT-4 are received, they may be evaluated and adjusted. If the suggestions are reasonable and valid, they are merged into the preliminary converted code to generate the final translated code. This process improves the accuracy of the code conversion.
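Steps S51 to S53 can be sketched without committing to a particular model API. In the sketch below, the model call is abstracted as a plain callable named suggest, and fake_model is a stand-in that returns a fixed answer; both names, the prompt text and the acceptance rule (the suggestion must be non-empty and parse as Python) are assumptions made for illustration.

```python
import ast
from typing import Callable


def parses(code: str) -> bool:
    """Return True if the code is syntactically valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def optimize(preliminary: str, suggest: Callable[[str], str]) -> str:
    """Ask the model for a rewrite and accept it only if it passes evaluation."""
    prompt = (
        "Improve the readability and efficiency of this Python code "
        "without changing its behavior:\n\n" + preliminary
    )
    suggestion = suggest(prompt)
    if suggestion.strip() and parses(suggestion):
        return suggestion
    return preliminary  # fall back to the preliminary converted code


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; always proposes a list comprehension.
    return "squares = [i * i for i in range(10)]\n"


preliminary_code = "squares = []\nfor i in range(10):\n    squares.append(i * i)\n"
final_code = optimize(preliminary_code, fake_model)
print(final_code)
```

A syntax check is only the weakest form of evaluation. Running the unit tests of the project against the suggestion before accepting it would be a stronger safeguard.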
The translated code is generated based on the grammar of the target language, preserving the code format and comments.
During code optimization, a configurable conversion workflow manager may also be defined and executed to handle exceptions and error recovery. The types of variables and expressions are inferred from context information. Identifier naming styles are converted using a rules engine and a machine learning model. Language-specific optimization patterns, such as removing redundant code, are applied.
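The rule-based part of identifier naming style conversion can be illustrated with a small sketch that turns camelCase identifiers into snake_case. The function name camel_to_snake and the two regular-expression rules are assumptions; the machine learning component mentioned above is not modeled here.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase or PascalCase identifier to snake_case."""
    # Rule 1: split between a lowercase letter or digit and an uppercase letter.
    step = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Rule 2: split between an acronym and the capitalized word that follows it.
    step = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", step)
    return step.lower()


for identifier in ("userName", "parseHTTPResponse", "already_snake"):
    print(identifier, "->", camel_to_snake(identifier))
```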
Code optimization is performed by the integrated GPT-4 model, and steps S1 to S5 above together form the conversion flow.
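A configurable conversion workflow with simple error recovery could be organized as in the sketch below. This is not the listing from the original filing: the class name Workflow, the required flag and the two toy steps are assumptions chosen to show how optional steps can fail without aborting the conversion.

```python
class Workflow:
    """Run named steps in order; optional steps are skipped when they fail."""

    def __init__(self):
        self.steps = []
        self.log = []

    def add(self, name, func, required=True):
        self.steps.append((name, func, required))
        return self  # allow chained configuration

    def run(self, data):
        for name, func, required in self.steps:
            try:
                data = func(data)
                self.log.append(f"{name}: ok")
            except Exception as exc:
                if required:
                    raise  # a failed mandatory step aborts the conversion
                # Error recovery: keep the last good result and continue.
                self.log.append(f"{name}: skipped ({exc})")
        return data


def normalize(code):
    return code.strip() + "\n"


def failing_optimizer(code):
    raise RuntimeError("model unavailable")


flow = Workflow().add("normalize", normalize).add("optimize", failing_optimizer, required=False)
result = flow.run("  print('hi')  ")
print(repr(result))
print(flow.log)
```

In a fuller version, the steps would be the parsing, semantic analysis, template conversion and model-based optimization stages described in steps S2 to S5.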
In some optional implementations of this embodiment, after step S5, the electronic device may further perform the following steps:
Step S6, displaying a comparison view of the original code and the translated code on the Web interface.
User feedback may be collected through a Web interface to improve transcoding rules.
In some optional implementations of this embodiment, after step S5, the electronic device may further perform the following steps:
Step S7, executing the translation code under the target language environment to obtain an execution result of the translation code.
The target environment is configured so that the runtime environment required by the target language is correctly installed and set up. For example, for Python, a Python interpreter needs to be installed.
The translated code is then executed by running the generated code in the target environment. For example, a translated Python file named hello.py can be executed from the command line:
python hello.py
The execution result is then obtained by observing the output of the code. In this example, the console will output "Hello, world!".
The execution result is then analyzed. Functional verification checks whether the converted code implements the same function as the source code; for the above example, the verification passes as long as the output is correct. Performance analysis may also be performed on the converted code, using performance analysis tools (such as cProfile in Python) to find potential bottlenecks. Memory usage can be monitored to avoid memory leaks or other wasted resources; in Python, memory analysis tools (e.g., memory_profiler) can be used to check memory usage. Error handling can be reviewed by checking whether the converted code properly handles all possible error conditions, such as input validation and exception handling. Code readability can be evaluated to judge whether the converted code is easy to read and maintain, since good code structure and clear logic are critical for long-term maintenance. Unit tests with high code coverage can be run on the converted code, which helps find potential bugs and unhandled exceptions. The converted code can also be compared with the source code to find optimization points; for example, in Python, a list comprehension may be used to optimize a loop.
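Executing the translated code and verifying its output can be sketched as follows for a Python target. The helper names run_python and verify, the temporary file name translated.py and the 10 second timeout are assumptions; functional verification is reduced here to comparing standard output with an expected string.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(code, timeout=10):
    """Run translated Python code in a separate interpreter and capture its output."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "translated.py"
        script.write_text(code, encoding="utf-8")
        done = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=timeout,
        )
    return done.returncode, done.stdout, done.stderr


def verify(code, expected_stdout):
    """Functional verification: the code must exit cleanly and print the expected text."""
    returncode, stdout, _ = run_python(code)
    return returncode == 0 and stdout == expected_stdout


passed = verify('print("Hello, world!")\n', "Hello, world!\n")
print("verification passed:", passed)
```

Running the code in a separate process keeps a crash or an infinite loop in the translated code from affecting the conversion tool itself.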
In implementation, for feature differences between programming languages, such as memory management and concurrency models, a detailed feature mapping table can be established, and alternative implementations can be provided for features that have no direct counterpart.
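A feature mapping table of this kind can be sketched as a nested dictionary keyed by language pair. All entries below are illustrative assumptions for a C++ to Python conversion, and the names FEATURE_MAP, register_mapping and lookup are hypothetical.

```python
# (source language, target language) -> {source feature: (replacement, note)}
FEATURE_MAP = {
    ("cpp", "python"): {
        "std::vector": ("list", "direct counterpart"),
        "std::thread": ("threading.Thread", "similar API, different concurrency model"),
        "delete": (None, "no counterpart; memory is reclaimed automatically"),
    },
}


def register_mapping(source, target, feature, replacement, note=""):
    """Add or override a mapping, for example one supplied by the user."""
    FEATURE_MAP.setdefault((source, target), {})[feature] = (replacement, note)


def lookup(source, target, feature):
    """Return (replacement, note), or None if the feature is not in the table."""
    return FEATURE_MAP.get((source, target), {}).get(feature)


register_mapping("cpp", "python", "std::string", "str", "direct counterpart")
print(lookup("cpp", "python", "std::thread"))
print(lookup("cpp", "python", "delete"))
```

A replacement of None marks a feature without a direct counterpart, and the note records the alternative handling to apply.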
Some conversions need to take a wider range of code context into account. In such cases, the context information can be captured and used by means of data flow analysis and symbol tables.
Where automatically generated code may lack readability, the comments and structure of the original code can be preserved by using intelligent formatting algorithms.
Where the source code uses a language-specific library that has no direct counterpart in the target language, a mapping of commonly used libraries can be established and a custom mapping interface can be provided.
Where the performance of the translated code may be reduced, a machine learning model integrated with the target language can be used to propose performance optimizations.
In particular, to convert multi-file projects, the project can be imported by recursive directory scanning and converted in batches. Git integration can be added to support importing code directly from a repository and exporting code to it, which integrates the tool with version control. A RESTful API may be provided so that other services can call the translation functions. A custom rule engine may be provided so that users can define and share custom conversion rules. Docker can be used to containerize the application and keep environments consistent. A CI/CD pipeline can be adopted to automate testing and deployment of updates. Kubernetes can be used for container orchestration to achieve high availability and automatic scaling.
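Recursive directory scanning with batch conversion can be sketched with pathlib. The function name convert_project, the default file extensions and the convert callable (which stands for the whole single-file conversion pipeline) are assumptions made for this example.

```python
from pathlib import Path


def convert_project(src_root, dst_root, convert, src_ext=".cpp", dst_ext=".py"):
    """Convert every matching file under src_root, mirroring the directory layout."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    written = []
    for src in sorted(src_root.rglob(f"*{src_ext}")):
        # Keep the relative path and swap only the file extension.
        dst = (dst_root / src.relative_to(src_root)).with_suffix(dst_ext)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(convert(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(dst)
    return written


# Example call with a placeholder converter that returns its input unchanged:
# convert_project("legacy_src", "converted_src", lambda code: code)
```

Files that do not match the source extension are left alone, so build scripts and documentation would have to be migrated separately.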
The method has the following advantages. The source code and the target language are first obtained; the source code is parsed to generate an abstract syntax tree; the abstract syntax tree is analyzed to extract semantic information; the code is converted into the target language according to the semantic information and the grammar rules of the target language to obtain preliminary converted code; and the preliminary converted code is further optimized to generate the final translated code. The method is highly automated and greatly reduces the workload of manual conversion. The conversion process is standardized and uniform, which improves the consistency and accuracy of code conversion. Parsing the source code and analyzing the AST gives a deeper understanding of the source code, so that higher-quality translated code is generated. The method is easy to extend and maintain, and support for new languages or features can be added as required. The optimization step ensures that the converted code is correct, efficient and readable. The method promotes cross-language programming and code reuse, reduces software development cost, provides strong tool support for cross-language migration and upgrading of software systems, and improves the efficiency of software maintenance and evolution.
The invention is operational with numerous general purpose or special purpose computer system environments or configurations. Examples include personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by computer readable instructions stored in a computer readable storage medium which, when executed, may carry out the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or it may be a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
Example two
With further reference to Fig. 3, as an implementation of the method shown in Fig. 2, the present invention provides an embodiment of a transcoding device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied to various electronic devices.
As shown in Fig. 3, the transcoding device 60 of the present embodiment includes an acquisition module 61, a parsing module 62, an extraction module 63, a conversion module 64, and an optimization module 65, wherein:
the acquisition module 61 is configured to acquire a source code and a target language;
the parsing module 62 is configured to parse the source code to generate an abstract syntax tree;
the extraction module 63 is configured to analyze the abstract syntax tree and extract semantic information;
the conversion module 64 is configured to convert the code into the target language according to the semantic information and the grammar conversion rules, so as to obtain a preliminary conversion code;
the optimization module 65 is configured to further optimize the preliminary conversion code to generate a final translation code.
The beneficial effects of this embodiment are as follows. The source code and the target language are first acquired; the source code is parsed to generate an abstract syntax tree; the abstract syntax tree is analyzed to extract semantic information; the code is converted into the target language according to the semantic information and the grammar rules of the target language to obtain a preliminary conversion code; and the preliminary conversion code is further optimized to generate the final translation code. The process is highly automated, which greatly reduces the workload of manual conversion. The conversion process is standardized and uniform, which improves the consistency and accuracy of code conversion. Parsing the source code and analyzing the AST give a deeper understanding of the source code, so that higher-quality translation code is generated. The design is easy to extend and maintain, and support for new languages or features can be added as required. The optimization step helps ensure that the converted code is correct, efficient, and readable. The method promotes cross-language programming and code reuse, provides tool support for cross-language migration and upgrading of software systems, reduces development cost, and improves the efficiency of software maintenance and evolution.
Example three
The invention also provides an embodiment of a transcoding device, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in particular to various electronic devices.
The code conversion device of the embodiment adopts a modularized design and comprises a code analysis module, a semantic analysis module, a conversion rule engine, a code generation module and a workflow manager.
The front end is provided with a Web interface, which can be realized by using React or Vue.
The backend uses Python services and can be implemented using FastAPI or Flask.
The code parsing module may parse source code using Tree-sitter to generate an Abstract Syntax Tree (AST).
The semantic analysis module is used for analyzing AST and extracting semantic information of codes.
The conversion rule engine is used to define and apply conversion rules from a source language to a target language.
The AI enhancement module optimizes the conversion results using GPT-4 and other models, based on the LangChain framework.
The code generation module generates target language code based on the converted AST.
The workflow manager coordinates the work of the modules and manages the whole code conversion flow: it defines and executes configurable conversion workflows, handles exceptions, and performs error recovery.
The data store uses PostgreSQL to store the conversion history and user feedback information.
The transcoding device of this embodiment integrates an AI enhancement module based on the LangChain framework and optimizes code conversion using a large language model such as GPT-4. Tree-sitter is used for efficient code parsing, which improves support for multiple programming languages. A customizable conversion rule engine is implemented, which supports user-defined conversion logic.
The code conversion device of the embodiment is provided with a code optimization module based on machine learning, and can learn and apply programming modes and best practices.
With this configuration, the device can accurately understand the structure and semantics of the source code and generate code that follows the idioms of the target language.
Fig. 4 is a sequence diagram of the constituent modules of another transcoding device of the present invention. As shown in Fig. 4, the modules interact in the following sequence:
The user inputs the source code and the target language through the Web interface.
The code parsing module parses the source code with Tree-sitter, which parses efficiently and handles the syntax features of different programming languages, and generates an abstract syntax tree (AST).
Example code:
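As a minimal sketch of this parsing step, the following uses Python's standard `ast` module as a stand-in for Tree-sitter. This substitution is an assumption made so that the example runs without third-party packages; the embodiment itself names Tree-sitter. The helper names `parse_source` and `node_types` are hypothetical.

```python
import ast


def parse_source(source: str) -> ast.AST:
    """Parse Python source text into an abstract syntax tree."""
    return ast.parse(source)


def node_types(tree: ast.AST) -> list[str]:
    """Return the type name of every node reachable from the root."""
    return [type(node).__name__ for node in ast.walk(tree)]


# Hypothetical input: a one-function source file.
source = "def add(a, b):\n    return a + b\n"
tree = parse_source(source)
kinds = node_types(tree)
```

The resulting list of node kinds (for example `Module`, `FunctionDef`, `Return`, `BinOp`) is the structure that the later analysis and conversion steps would traverse.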
The semantic analysis module analyzes the abstract syntax tree (AST) and extracts semantic information: it performs type inference and symbol analysis and analyzes variable scope and data flow.
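The symbol and scope analysis described here might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it again relies on Python's standard `ast` module rather than Tree-sitter, covers only function scopes and simple assignments, and the class name `SymbolCollector` is hypothetical.

```python
import ast
from collections import defaultdict


class SymbolCollector(ast.NodeVisitor):
    """Record which names are defined in each function scope."""

    def __init__(self) -> None:
        self.scope_stack = ["<module>"]
        self.symbols: dict[str, set[str]] = defaultdict(set)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # The function name itself belongs to the enclosing scope.
        self.symbols[self.scope_stack[-1]].add(node.name)
        self.scope_stack.append(node.name)
        for arg in node.args.args:
            self.symbols[node.name].add(arg.arg)
        self.generic_visit(node)  # descend into the function body
        self.scope_stack.pop()

    def visit_Name(self, node: ast.Name) -> None:
        # Only names in Store context (assignment targets) define a symbol.
        if isinstance(node.ctx, ast.Store):
            self.symbols[self.scope_stack[-1]].add(node.id)


source = "x = 1\ndef area(w, h):\n    result = w * h\n    return result\n"
collector = SymbolCollector()
collector.visit(ast.parse(source))
symbols = {scope: sorted(names) for scope, names in collector.symbols.items()}
```

For the sample input, `symbols` maps the module scope to `area` and `x`, and the `area` scope to `h`, `result`, and `w`. A fuller analysis would also track types and data flow, which this sketch omits.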
The conversion rule engine defines mapping rules among languages, supports user-defined rules, applies predefined grammar conversion rules, and processes basic language conversion.
The AI enhancement module handles complex logic conversions using custom chains built with LangChain. The preliminary conversion result is passed to the GPT-4 model for further optimization.
Example code:
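The following sketch illustrates only the shape of this step: a prompt is built from the preliminary conversion result, handed to a model, and the reply replaces the preliminary code. It deliberately does not call LangChain or GPT-4. The model is an injected callable, and `fake_model`, `PROMPT_TEMPLATE`, `build_prompt`, and `optimize` are hypothetical names introduced for illustration.

```python
from typing import Callable

# Hypothetical prompt wording; the patent does not disclose its prompts.
PROMPT_TEMPLATE = (
    "You are reviewing machine-translated {language} code.\n"
    "Improve readability and idiom without changing behavior.\n"
    "Return only the code.\n\n{code}"
)


def build_prompt(preliminary_code: str, language: str) -> str:
    """Fill the prompt template with the target language and the code."""
    return PROMPT_TEMPLATE.format(language=language, code=preliminary_code)


def optimize(preliminary_code: str, language: str,
             model: Callable[[str], str]) -> str:
    """Send the preliminary code to `model`; keep the input if the reply is empty."""
    reply = model(build_prompt(preliminary_code, language)).strip()
    return reply or preliminary_code


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call: returns the code part of the prompt
    # with trailing whitespace removed from each line.
    code = prompt.split("\n\n", 1)[1]
    return "\n".join(line.rstrip() for line in code.splitlines())


final_code = optimize("let x = 1;   \nconsole.log(x);  ", "JavaScript", fake_model)
```

Passing the model in as a callable keeps the step testable offline; in a deployment along the lines of this embodiment, the callable would presumably wrap a LangChain chain backed by GPT-4.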
The code generation module generates code based on the grammar of the target language, preserves code formatting and comments, and combines the outputs of Tree-sitter and the AI enhancement module to generate the final translation code.
A comparison view of the original code and the translation code is shown on the Web interface.
User feedback is collected to improve the conversion rules and AI hints.
An AST conversion algorithm recursively traverses the source AST, applying conversion rules to generate a target AST.
A type inference algorithm infers the types of variables and expressions based on the context information.
A naming convention conversion transforms identifier naming styles using the rule engine and a machine learning model.
A code optimization algorithm applies language-specific optimization techniques, such as removing redundant code.
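The recursive, rule-driven traversal and the naming convention conversion described above can be illustrated with a small sketch. Several assumptions apply: the source language is Python (parsed with the standard `ast` module), the target is JavaScript, only a tiny subset of node types is covered, and for brevity the sketch emits target text directly instead of building a target AST. The names `to_camel`, `BIN_OPS`, and `convert` are hypothetical.

```python
import ast


def to_camel(name: str) -> str:
    """Naming rule: convert snake_case to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


# Operator mapping rules from Python AST nodes to JavaScript tokens.
BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def convert(node: ast.AST, indent: int = 0) -> str:
    """Recursively translate a small Python subset into JavaScript text."""
    pad = "  " * indent
    if isinstance(node, ast.Module):
        return "\n".join(convert(stmt, indent) for stmt in node.body)
    if isinstance(node, ast.FunctionDef):
        params = ", ".join(to_camel(a.arg) for a in node.args.args)
        body = "\n".join(convert(stmt, indent + 1) for stmt in node.body)
        return f"{pad}function {to_camel(node.name)}({params}) {{\n{body}\n{pad}}}"
    if isinstance(node, ast.Return):
        return f"{pad}return {convert(node.value)};"
    if isinstance(node, ast.BinOp):
        op = BIN_OPS[type(node.op)]
        return f"({convert(node.left)} {op} {convert(node.right)})"
    if isinstance(node, ast.Name):
        return to_camel(node.id)
    if isinstance(node, ast.Constant):
        return repr(node.value)  # adequate for numeric literals only
    raise NotImplementedError(f"no rule for {type(node).__name__}")


source = "def rect_area(side_a, side_b):\n    return side_a * side_b + 1\n"
js = convert(ast.parse(source))
```

Each `isinstance` branch plays the role of one conversion rule; unsupported constructs raise an error rather than being silently dropped, which is where a fallback such as the AI enhancement step could take over.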
Example conversion flow:
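A hedged sketch of such a flow is given below. It models the workflow manager as an ordered list of named stages with simple error recovery: a failing stage is recorded and skipped so that later stages still run on the last good state. The stage functions are toy placeholders, and `Workflow` and all stage names are hypothetical rather than taken from the original disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable

Stage = Callable[[dict], dict]


@dataclass
class Workflow:
    """Run named stages in order, recording failures instead of aborting."""
    stages: list[tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Workflow":
        self.stages.append((name, stage))
        return self

    def run(self, context: dict) -> dict:
        context = {**context, "errors": []}
        for name, stage in self.stages:
            try:
                context = stage(context)
            except Exception as exc:  # error recovery: keep the last good state
                context["errors"].append(f"{name}: {exc}")
        return context


def parse(ctx: dict) -> dict:
    return {**ctx, "lines": ctx["source"].splitlines()}


def convert(ctx: dict) -> dict:
    # Toy rule: Python print(...) becomes JavaScript console.log(...);
    out = [line.replace("print(", "console.log(") + ";" for line in ctx["lines"]]
    return {**ctx, "preliminary": out}


def optimize(ctx: dict) -> dict:
    raise RuntimeError("model unavailable")  # simulated failure of the AI step


def generate(ctx: dict) -> dict:
    code = ctx.get("optimized", ctx["preliminary"])
    return {**ctx, "final": "\n".join(code)}


flow = (Workflow().add("parse", parse).add("convert", convert)
        .add("optimize", optimize).add("generate", generate))
result = flow.run({"source": 'print("hi")', "target": "javascript"})
```

In this run the simulated optimization failure is logged in `result["errors"]`, and the generation stage falls back to the preliminary conversion code.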
The advantages of this embodiment are as follows. Efficiency: the automated workflow and AI assistance significantly improve code conversion efficiency. Accuracy: static analysis, machine learning, and a large language model are combined. Extensibility: the modular design and configurable workflow make it easy to add new languages and features. Intelligence: the AI enhancement module continuously learns, improves conversion quality, and adapts to different programming styles and patterns. Customization: user-defined conversion rules and optimization strategies are supported to meet specific requirements. Consistency: the converted code is kept in line with the best practices and coding conventions of the target language. Cross-platform support: a Web interface and API services are provided, so the system is easy to integrate into various development environments. Version control friendliness: the system integrates with version control systems such as Git, which facilitates managing code conversion for large projects. Continuous optimization: system performance and conversion quality are continuously improved through the user feedback mechanism. Together these provide tool support for cross-language migration and upgrading of software systems, reduce development cost, and improve the efficiency of software maintenance and evolution.
Example four
In order to solve the technical problems, the embodiment of the invention also provides computer equipment. Referring specifically to fig. 5, fig. 5 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 7 includes a memory 71, a processor 72, and a network interface 73 communicatively coupled to each other via a system bus. It is noted that the figure shows only the computer device 7 with the memory 71, the processor 72, and the network interface 73, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 71 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 71 may be an internal storage unit of the computer device 7, such as a hard disk or internal memory of the computer device 7. In other embodiments, the memory 71 may also be an external storage device of the computer device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 7. Of course, the memory 71 may also include both the internal storage unit of the computer device 7 and its external storage device. In this embodiment, the memory 71 is typically used to store an operating system and various types of application software installed on the computer device 7, such as the computer-readable instructions of the transcoding method. In addition, the memory 71 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 72 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 72 is typically used to control the overall operation of the computer device 7 described above. In this embodiment, the processor 72 is configured to execute computer readable instructions stored in the memory 71 or process data, such as computer readable instructions for executing the transcoding method.
The network interface 73 may comprise a wireless network interface or a wired network interface, which network interface 73 is typically used for establishing a communication connection between the computer device 7 and other electronic devices.
The advantages of this embodiment are as follows. The process is highly automated, which greatly reduces the workload of manual conversion. The conversion process is standardized and uniform, which improves the consistency and accuracy of code conversion. Parsing the source code and analyzing the AST give a deeper understanding of the source code, so that higher-quality translation code is generated. The design is easy to extend and maintain, and support for new languages or features can be added as required. The optimization step helps ensure that the converted code is correct, efficient, and readable. The embodiment promotes cross-language programming and code reuse, provides tool support for cross-language migration and upgrading of software systems, reduces development cost, and improves the efficiency of software maintenance and evolution.
Example five
The present invention also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of a transcoding method as described above.
The advantages of this embodiment are as follows. The process is highly automated, which greatly reduces the workload of manual conversion. The conversion process is standardized and uniform, which improves the consistency and accuracy of code conversion. Parsing the source code and analyzing the AST give a deeper understanding of the source code, so that higher-quality translation code is generated. The design is easy to extend and maintain, and support for new languages or features can be added as required. The optimization step helps ensure that the converted code is correct, efficient, and readable. The embodiment promotes cross-language programming and code reuse, provides tool support for cross-language migration and upgrading of software systems, reduces development cost, and improves the efficiency of software maintenance and evolution.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods of the embodiments of the present invention.
It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present invention. The drawings show preferred embodiments of the invention and do not limit the scope of the patent claims. The invention may be embodied in many different forms; these embodiments are provided so that the present disclosure will be understood thoroughly and completely. Although the invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in those embodiments, or equivalents may be substituted for some of their features. All equivalent structures made using the content of the specification and drawings of the invention, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the invention.
Claims (10)
1. A transcoding method, comprising the steps of:
acquiring a source code and a target language;
parsing the source code to generate an abstract syntax tree;
analyzing the abstract syntax tree and extracting semantic information;
converting the code into the target language according to the semantic information and the grammar rules of the target language to obtain a preliminary conversion code;
and further optimizing the preliminary conversion code to generate a final translation code.
2. The transcoding method of claim 1, wherein the step of obtaining the source code and the target language specifically comprises:
Acquiring the source code from a code library;
Acquiring the target language from an input interface;
storing the source code and the target language.
3. The transcoding method of claim 1, wherein said parsing said source code to generate an abstract syntax tree comprises:
performing lexical analysis on the source code, dividing the character sequence of the source code into lexical units, and assigning a corresponding label to each lexical unit;
based on the lexical analysis, using a syntax analyzer to identify syntactic structures and construct the abstract syntax tree.
4. The transcoding method according to claim 1, wherein said step of analyzing said abstract syntax tree and extracting semantic information comprises:
performing static check on the abstract syntax tree to verify the semantic correctness of the code;
Optimizing the abstract syntax tree to improve the execution efficiency of the program;
Traversing each node of the abstract syntax tree, processing the syntax structure, and extracting the semantic information.
5. The transcoding method of claim 1, wherein the step of converting the code into the target language according to the semantic information and the grammar rule of the target language, to obtain the preliminary converted code comprises:
selecting a conversion template according to the grammar rules of the target language and the semantic information;
and filling the semantic information into the conversion template to generate code in the target language, thereby obtaining the preliminary conversion code.
6. The transcoding method of claim 1, wherein said step of further optimizing said preliminary conversion code to generate a final translation code comprises:
passing the preliminary conversion code to a GPT-4 model;
analyzing the preliminary conversion code by the GPT-4 model to generate optimization suggestions;
and applying the optimization suggestions to the preliminary conversion code to generate the final translation code.
7. The transcoding method of any one of claims 1 to 6, further comprising, after said step of further optimizing said preliminary conversion code to generate a final translation code:
executing the final translation code in the target language environment to obtain an execution result of the final translation code.
8. A transcoding apparatus, comprising:
an acquisition module, configured to acquire a source code and a target language;
a parsing module, configured to parse the source code and generate an abstract syntax tree;
an extraction module, configured to analyze the abstract syntax tree and extract semantic information;
a conversion module, configured to convert the code into the target language according to the semantic information and the grammar conversion rules to obtain a preliminary conversion code;
and an optimization module, configured to further optimize the preliminary conversion code and generate a final translation code.
9. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the transcoding method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the transcoding method of any of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411692096.2A CN119621074A (en) | 2024-11-25 | 2024-11-25 | A code conversion method, device, computer equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119621074A | 2025-03-14 |
Family
ID=94904048
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411692096.2A Pending CN119621074A (en) | 2024-11-25 | 2024-11-25 | A code conversion method, device, computer equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119621074A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120010825A (en) * | 2025-04-22 | 2025-05-16 | 长春建筑学院 | Programming assistance method, computer device and storage medium based on generative AI |
| CN120315690A (en) * | 2025-06-17 | 2025-07-15 | 麒麟软件有限公司 | Method, device, apparatus and product for introducing signals into code |
| CN120315690B (en) * | 2025-06-17 | 2025-08-26 | 麒麟软件有限公司 | Method, apparatus, device and product for introducing signals in a code |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |