WO1997032261A1 - Method and system for genetic programming - Google Patents
Method and system for genetic programming Download PDFInfo
- Publication number
- WO1997032261A1 WO1997032261A1 PCT/US1996/002758 US9602758W WO9732261A1 WO 1997032261 A1 WO1997032261 A1 WO 1997032261A1 US 9602758 W US9602758 W US 9602758W WO 9732261 A1 WO9732261 A1 WO 9732261A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- program
- gene
- solution
- string
- strings
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
Definitions
- the present invention relates to computing systems for discovering efficient programs through genetic programming techniques. More specifically, the invention combines genetic algorithms with graph reduction techniques to produce a computing system which permits the efficient evolution of programs for solving problems whose input is known and which have a function for determining whether one program is better than another.
- Lambda calculus represents another way to manipulate information to obtain desired results.
- special "lambda expressions” are created to describe a system which closely links data to functions.
- a universal programming machine can be constructed which produces results equivalent to those produced by von Neumann machines.
- Combinators are an example of a graph reduction system. That is, a system where programs are represented as lists of elements which are transformed by a set of graph reduction operators. While this invention focuses on combinators because of their compactness and proven universality, the invention covers the use of any and all graph reduction systems, including those which work on strings which represent graphs.
- Combinators K, S, I, B, C, and W are defined by the following functions and illustrated graphically in Figure 1.
- f, g, x and y may be functions, operators, constants, or combinator expressions
- Combinator computing systems usually use bracket abstraction to remove references to variables; i.e., input data which changes each time the program is executed.
- the bracket abstraction method produces a pure combinator expression having no references to input data. Instead the combinators are applied directly to input data arranged as a graph or tree.
- Genetic algorithms are usually demonstrated in programs which look for the maximal or optimal value from a known measurement function such as finding function maxima or minima. For example, the "Traveling Salesperson" problem of searching for the most efficient route of travel among a very large number of possible routes is often used to demonstrate genetic algorithms.
- the present invention describes a computer system capable of being constructed with custom hardware or implemented as a software state machine, which uses genetic programming and graph reduction techniques to create programs which can solve computational problems. This is accomplished by using a genetic program to successively produce more efficient programs represented in the form of graph reduction operator strings.
- a graph reduction system is one where the program is represented by a series of operators which transform or reduce a graph (or list) representation of data or program components. This includes graphs which are represented as strings and graph reduction systems based on string manipulations.
- the present invention describes a computer system capable of being constructed with custom hardware or implemented as a software state machine, which uses genetic programming and graph reduction techniques to create programs which can solve computational problems. This is accomplished by using a genetic program to successively produce more efficient programs represented in the form of graph reduction operator strings.
- a graph reduction system is one where the program is represented by a series of operators which transform or reduce a graph (or list) representation of data or program components. This includes graphs which are represented as strings and graph reduction systems based on string manipulations.
- the present invention differs from previous attempts at building genetic programming machines in that the program gene strings themselves are programs. Previous attempts mapped the gene strings to program fragments. This mapping of gene strings is time consuming because it requires continuous data transfers to and from various memory locations throughout the evolution process in order to assemble candidate programs for testing. Through the use of combinators, the present invention eliminates the need for mapping gene strings and repeated data transfers associated with such mapping. Thus, the present genetic programming system is faster than other genetic programming systems which employs the mapping of data and variables.
- FIGURE 1 is a graphical illustration of the functions performed by various combinators
- FIGURE 2 is a block diagram of a combinator machine used to load and evaluate combinator strings according to the present invention
- FIGURE 3 is a block diagram of a genetic programming system having multiple combinator machines arranged in parallel; evaluated is represented in a single gene string, no mapping is required with the present invention. Evaluation of each program string is performed by applying the program gene string to input data and measuring the relative "quality” or “fitness” of the resulting output.
- the present invention differs from previous attempts at building genetic programming machines in that the program gene strings themselves are programs. Previous attempts mapped the gene strings to program fragments. This mapping of gene strings is time consuming because it requires continuous data transfers to and from various memory locations throughout the evolution process in order to assemble candidate programs for testing. Through the use of combinators, the present invention eliminates the need for mapping gene strings and repeated data transfers associated with such mapping. Thus, the present genetic programming system is faster than other genetic programming systems which employs the mapping of data and variables.
- FIGURE 1 is a graphical illustration of the functions performed by various combinators
- FIGURE 2 is a block diagram of a combinator machine used to load and evaluate combinator strings according to the present invention
- FIGURE 3 is a block diagram of a genetic programming system having multiple combinator machines arranged in parallel; components.
- a control bus 32 interconnects the various combinator machine components and provides for the flow of control instructions between components.
- a Combinator Evaluation Unit/Controller (CEU/Controller) 34 is electrically coupled to both buses 30 and 32. CEU/Controller 34 evaluates combinator program gene strings.
- a microcode memory 36 is a Read Only Memory (ROM) which contains descriptions for evaluating individual combinators. Microcode memory 36 provides the necessary rules and instructions for evaluating various portions of the combinator program gene strings.
- ROM Read Only Memory
- CEU/Controller 34 is capable of fetching various elements of the gene string being evaluated.
- CEU/Controller 34 also controls I/O Processor (IOP) 38 and arithmetic logic unit (ALU) 40, both of which are electrically coupled to both main bus 30 and control bus 32.
- Control bus 32 is used by CEU/Controller 34 to control the operation of IOP 38 and ALU 40.
- IOP 38 provides an interface between the combinator machine and other portions of the genetic programming system.
- a main memory 42 and a stack memory 44 are both connected to main bus 30.
- Main memory 42 contains the program gene string being evaluated by CEU/Controller 34 as well as the input data which the program is applied to.
- Stack memory 44 stores intermediate states and values generated as a result of evaluating the program gene string. For example, stack memory 44 may store a pointer which recalls where an evaluation was stopped while a sub- evaluation was completed.
- Figure 2 illustrates an embodiment of an individual combinator machine
- this same architecture can also be implemented as a software state machine running on hardware in a traditional computer system.
- the system's central processing unit CPU
- CPU central processing unit
- ALU 40 shown in Figure 2.
- CEU/Controller 34 and ALU 40 are replaced by separate programs in a software library which are invoked as necessary.
- Main memory 42 and stack memory 44 are implemented as blocks of memory within the system's RAM and the system's I/O processing system replaces IOP 38 for the input and output of combinator expressions and data.
- An evaluator program is used instead of microcode 36 which successively retrieves elements of the particular gene string being evaluated and calls the appropriate combinator library subroutine to implement the evaluation.
- Figure 3 shows a block diagram of a genetic programming system utilizing several combinator machines arranged in parallel. Multiple combinator machines 50 are arranged in a parallel manner. Each combinator machine 50 may be implemented in either hardware (as shown in Figure 2) or software, as described above.
- any number of combinator machines 50 may be arranged in parallel; the number of combinator machines depends on a variety of factors.
- cost limitations may restrict the number of combinator machines used. The greater the number of combinator machines used, the greater the overall cost of the system.
- Speed requirements may also dictate the number of combinator machines used in a particular system.
- a system which requires fast evaluation requires a greater number of combinator machines than a system with lower speed requirements.
- the complexity of the problem to be solved affects the number of combinator machines required in a particular system. Relatively simple problems will generally require fewer combinator machines than complex problems.
- the number of combinator machines used in a genetic programming system varies depending on the number of gene strings being evaluated in a particular generation.
- a system which needs to evaluate many gene strings in each generation will benefit from a larger number of combinator machines.
- a genetic programming system which evaluates 100 gene strings in each generation may have 100 combinator machines; i.e., one combinator machine to evaluate each gene string. In such a system, all 100 gene strings are evaluated simultaneously, resulting in faster overall operation of the system.
- each combinator machine 50 is connected to a data/control bus 46 and a program gene string bus 48.
- a genetic program controller (GP controller) 52 is connected to data/control bus 46.
- GP controller 52 controls the overall operation of the genetic programming system by distributing evaluation work among the combinator machines 50, retrieving results from the combinator machines, and evolving the various gene strings.
- FIG. 8 illustrates a block diagram of a general purpose computer capable of being used with the present invention.
- a bus 10 interconnects the various system components and provides a common pathway for the flow of data, instructions, and the like.
- a central processing unit (CPU) 12 is connected to bus 10 and performs the actual computing operations.
- a random access memory 14 is also connected to bus 10 and provides a location for storing data and other information.
- a data storage device 16 is connected to bus 10 and provides non-volatile storage of information. Data storage device 16 may be a disk drive, tape drive, or similar storage devices.
- An input device 18 is connected to bus 10 and allows a computer user to input data, commands, and other information into the computer system.
- Input device 18 may be a keyboard, optical scanner, microphone, or other device capable of generating a machine-readable signal for the computer system.
- the input device used varies depending on the particular application. If a particular system is used for handwriting recognition, the input device must be capable of reading or digitizing a person's handwriting. In that situation, the input device may be an optical scanner, a pressure-sensitive writing tablet, a light pen with a light reader, or other similar device. In certain situations, multiple input devices are required. For example, a genetic programming system used in a voice recognition system requires a microphone or similar device to input a particular voice pattern as well as a keyboard to input various user-defined parameters.
- a display device 20 is connected via bus 10 to the other system components.
- Display device 20 is a video monitor used to display user- defined parameters, status of program operation, and program results.
- An output device 22 is also connected to bus 10 and provides a permanent copy of the program results.
- output device 22 is a printer.
- step 58 The overall operation of the evolution process utilized by the genetic programming system is illustrated in Figure 4. As shown in step 58, before the genetic programming system can begin evolving gene strings, various parameters and data must be provided to the system.
- a user enters the number of gene strings to be included in the gene pool.
- the user enters a preferred length or range of lengths for the initial gene strings.
- the length of a gene string refers to the number of genes contained in the string.
- the user enters a frequency of mutation of the gene strings.
- the user enters the frequency of permutation of the gene strings. Information regarding the mating rate of the gene strings is entered at step 92.
- All of the data entry steps 86-92 may be entered by the user of the genetic programming system as described above. Once entered, the data is stored in a memory device such as GP memory 54 described above with respect to Figure 3. Alternatively, any of the data entered in steps 86-92 may be permanently stored in the memory device and, therefore, need not be entered by the user. Another alternative provides for default parameter values. If the user does not enter data for a particular step, the default value is used. Finally, the data entered in steps 86-92 can be determined randomly by the genetic programming system within a specified range of values for a given parameter. For example, the probability of mutation may be randomly chosen between the values of .001 and .01, the length of initial gene strings may be selected between 40 characters and 100 in length, etc.
- step 94 the user enters the input data which is to be applied to each program gene string.
- the input data will vary depending on the problem to be solved.
- This input data is stored in memory for use by the genetic programming system.
- step 95 the user enters all values which are to be used as constants.
- constants may be single character values, strings of characters or numbers. This list of constants input in this step will be used in constructing program gene strings.
- the user enters a fitness function which is also stored in memory.
- the fitness function is used to determine whether one result is better than another. Additional details regarding selecting and applying fitness functions are discussed below.
- a set of termination criteria is entered by the user. This termination criteria is used to determine when evaluation of a particular gene string should be terminated. Additional details regarding determining the termination criteria are provided below.
- the gene pool is comprised of program gene strings constructed using combinators. This step requires the random creation of strings of combinators, operators, and constants.
- the first step in creating an initial gene pool involves retrieving the parameters regarding the size of the initial gene pool, step 100.
- the size of the initial gene pool is usually determined by the user based on the estimate of the program complexity and need for variety within the pool of candidates but may also be determined by using a default value or a randomly generated value.
- the routine determines whether the gene pool is full; i.e., whether the number of gene strings in the pool equals the size of the pool determined at step 100. Initially, the gene pool is empty and the routine branches to step 104 where the length of the next gene string is determined.
- the gene string length may be randomly determined or selected by the user, as described above. Preferably, a range of lengths is used in order to create a wide variety of candidates for the initial pool.
- the first string may be 30 elements long, the next 43, the next 22, etc.
- the routine determines whether the gene string is long enough; i.e., whether the gene string length equals the length determined in step 104. Initially, creation of the gene string has not started and the routine branches to step 108 where a gene is randomly selected. The gene selected is either a combinator, a constant or an operator. The selection is made by weighted random choice with preference given to combinators.
- the combinator is added to the end of the gene string at step 118. If the combinator is the first gene in the string, it forms the starting point for building the gene string.
- the constant is added to the end of the gene string at step 118. As with the combinator above, if the constant is the first gene in the string, it forms the starting point for building the gene string.
- a character constant is defined as a character with the single quote character around it (e.g., 'C') while the combinator is a "naked" C character (e.g., C).
- strings of characters which form constants are surrounded by the double-quote character (e.g, "Fred") while numbers are naked characters in the numeric range (0- 9) with an optional decimal point and optional sign character (e.g., -3.14159).
- strings they are actually included into the program gene strings as subtrees of character constants (e.g., ( 'F' 'r' 'e' 'd') ) but may be entered into the lists of possible constants in the double-quote notation ("Fred").
- the user may simply input such values as seem useful as part of the data the program gene strings are being applied to.
- a program which is meant to calculate mathematical values might usefully use such mathematical constants as pi, e, etc.
- a program meant to analyze chemical substances might have chemical constants, etc.
- step 118 the routine branches to step 106 to determine whether the string is long enough. Genes are repeatedly added to the gene string until the desired length is obtained. When the gene string is completed, the program branches to step 107 where the parentheses in the generated expressions are balanced.
- the gene string ( (2 3 + K) (4 7 *) 2 1) has two sub-trees off of it consisting of the strings 2 3 + K and 4 7 * respectively in separate branches of the tree along with the two branches holding only the single number 2 and 1.
- the open parenthesis character ( ' ( ' ) and the close parenthesis character ( ' ) ' ) are treated as separate choices in the list of possible entries in a gene string and so at the end of the step 106, the generated string must be examined and if there are extra open parentheses, enough close parentheses are added at the end of the string to balance the number of open and closes in the string. Likewise if there are more close parentheses characters, open parentheses characters are added to the beginning. While there are other ways to ensure this balancing such as adding both an open and close pair at random locations to the string at the same time, this is considered a superior method of adding the necessary structure to the expressions.
- step 107 the system returns to step 102 to determine whether the gene pool is full. If it is, then creation of the initial gene pool is complete. Otherwise, another gene string is created following the steps described above.
- the initial generation is evaluated by applying each member to the input data (step 60) and evaluating the fitness (step 62) by applying the fitness function to the results produced in step 60.
- step 60 application of the input data is done by applying the input data to each gene string in the pool to produce a result.
- the first generation of gene strings will be composed entirely of randomly generated genes.
- the pool is comprised of the gene strings which have evolved from previous generations.
- the user-supplied input data is applied to each string to allow combinator reduction on the data by one or more combinator machines. Combinator reduction creates a value or set of values.
- an expression such as 'A+3' may be treated as an illegal combination and thus rejected when the program gene string is evaluated but the previously suggested extension to operators is considered to be a superior solution.
- a program error may occur at any time in the application of a gene string to input data. At such a time the application is aborted and the program is given a very low fitness rating. Errors may occur for a variety of reasons but the commonest is that an impossible operation is being attempted. Since gene strings are changing constantly due to the combination of strings it is possible for illegal expressions such as '+ * 3' to occur. These expressions will cause errors when the program is run and such expressions are eliminated by assigning low fitness ratings in step 62 where program string fitness is assessed.
- each gene string is assigned a fitness value.
- the fitness value represents the similarity between the output obtained from the gene string and the desired result.
- the fitness value is represented as a numerical value.
- the fitness function provides an objective measure of "how good" the program is when applied to the input data. This fitness comparison is applied to all program gene strings of a generation and allows the ranking of all gene strings relative to one another. Program gene strings which have a higher fitness value when evaluated by the fitness function are ranked higher than with poorer values. Initially, most randomly generated gene strings will have a poor fitness value, as a result of the random nature of the initial selection process. However, as successive generations are created, the mechanism of evolution develops gene strings with increasingly higher fitness values.
- input data for a handwriting recognition system may be a digitized handwritten sample to which gene strings are applied.
- the desired output is the actual string of characters contained in the handwriting sample.
- Each combinator expression is applied to the input data to create and output which is compared to the expected string of characters.
- the fitness value may be determined from the number of correctly identified characters positioned in the correct location. Depending on the requirements and preferences of the user, proper order of the letters may be more important than identifying every character. On the other hand, a different user of the same system might place greater emphasis on properly identifying each character rather than properly ordering each identified character.
- results will be poor and may include some outputs with no correct characters whatsoever, only rearrangements of points.
- the gene strings whose evaluations produce characters or strings of characters will have higher fitness values than those whose evaluations identified fewer characters. The closer a program's output comes to matching the expected string of characters, the better its fitness value.
- the system determines whether to terminate evolution. In other words, there must be some criteria for determining when one or more of the gene strings evolved is "good enough" to satisfy the goals of the user.
- This termination criteria is problem-dependent and must be supplied by the user.
- a fitness function is used as a measure for determining the relative superiority of one solution as compared to a second solution. Typical termination criteria may require that some threshold value is arrived at when the fitness function is applied.
- Another possible termination criteria can be the fact that the fitness of the program gene string has not noticeably improved during the last 'n' generations. Alternatively, the termination criteria may be both of these measurements, such that evolution is terminated when either criteria is satisfied. If the termination criteria is satisfied, then the program is terminated and the best program gene strings created are provided to the user of the system as the best programming solutions for the particular problem.
- the termination function is input into the system in step 96 of Figure 5 as discussed above.
- step 66 an operation is selected to be used to build the next generation of program gene strings.
- Either mating (step 68) or replication (step 70) is chosen based in part by the mating rate entered as a system parameter as described in step 92 in Figure 5.
- the mating operation involves combining two program gene strings to create two new and different gene strings in the next generation.
- the replication operation involves choosing a program gene string from the current generation and copying it into the next generation.
- candidates For both these operations candidates must be chosen from the current generation. In the case of mating, two candidates must be selected (step 72). In the case of replication, a single candidate must be selected (step 74).
- the preferred method of choosing candidates is to use the fitness rankings to weight the selection of candidates from the current generation. To do this the total of all the fitness ratings of the current generation are summed. The probability of any candidate being selected is then the ratio of its fitness when compared to the total fitness of all candidates.
- the gene string is split at one or more locations, called the crossover points, and combined with pieces of other successful gene strings which are similarly split at one or more locations.
- the simplest and most powerful way to mate two gene strings is to pick an arbitrary point in each gene string, break each string into two parts and then combine each portion of the first gene string with a corresponding portion of the second gene string.
- step 72 a crossover point is determined for each gene string at step 76.
- step 78 the pieces of two or more gene strings are applied to one another, thereby creating new program gene strings.
- step 79 each newly created string is checked to ensure that the number of open parentheses in the strings are matched by an equal number of close parentheses. This is similar to the parentheses balancing performed in step 107 of Figure 6 where the initial gene strings are created.
- the gene string S(S(B+)1C)KI is mated with B(C(KS) I*)7K1, by random choice it is decide to break the first gene string after the 'B' combinator and to break the second gene string after the 'I' combinator, and then mate the pieces to create the strings: S(S(B*)7K1 and +) 1C)KIB(C (KS) I respectively.
- the first gene string created from this mating: S(S(B*)7K1 has more open parentheses than close parentheses. This expresses an unfinished tree and so a close parentheses is added to create a finished expression of S(S(B*)7K1).
- the result of mating is to produce two gene strings which have different lengths than their parents. This is both typical and necessary because it permits the length of the program gene strings created by mating to change in successive generations thus allowing for additional complexity of gene strings or the simplification of gene strings.
- a gene string in the current generation is replicated into the next generation.
- Gene string replication is simply the copying of a gene string into the next generation. This is similar to an individual surviving into the next generation in the natural world.
- the gene string is not altered, but merely copied into the gene pool for consideration in the next generation.
- Gene strings which are replicated into the next generation are generally those strings from the previous generation having the highest fitness levels.
- Step 74 is where the actual choice of individual for replication is made. This choice is done by weighted random choice as described in step 72 above when choosing mating partners.
- the permutation operation, step 80, is performed by taking the permutation frequency as entered in step 91 of Figure 5 and randomly checking whether any string has been permuted. If it has, the order of the candidates genes are "scrambled.” For example, if the string S (SB (CK) SI) *7K, having been replicated, is randomly checked for permutation. If a random check shows that the string is permuted in the next generation, the resulting string may be S (BC7 (CK) *I) SK or any other possible reordering of the gene string.
- Permutation provides an efficient way to transform a gene string which is close to being a correct solution without having to rely on other genes to produce a mate which will create such a rearrangement. Thus, program gene strings which have a relatively high fitness value may achieve an even higher fitness value of the genes are merely rearranged in a different sequence.
- the preferred method of implementation checks each gene in the string for possible permutation based on the permutation frequency. If it is randomly decided to permute the gene, then another gene is randomly chosen and the location of the genes within the program gene string as swapped. For example, if the program gene string is S(K*+3)C(4 -)5 and by sequential check it is found that while the first two genes ('S' and '(' respectively) are not permuted (based on the random check against the permutation frequency) but that the 3rd gene ('K') is to be permuted, then by random selection, another gene (say, 'C) is chosen and the two are swapped producing the new program gene string of S(C*+3)K(4 -)5. The rest genes would then also be checked before completing the permutation operation.
- permutation may cause the parentheses in the expression to become scrambled, with unbalanced sequences of opens and closes (e.g., it may cause the string to begin with a close parenthesis)
- the parenthesis balancing process described in step 79 must be repeated as part of the permutation operation.
- the last operator is the mutation operator, step 82.
- a mutation is when a gene spontaneously transforms itself to another value.
- the possibility for mutation is based on the mutation frequency as shown in step 90 of Figure 5. This frequency is used to determine the possibility of any given gene changing its value. For example, if the gene string S(S(B+)1C)KI has a mutation in the C combinator, a new gene is chosen in a manner similar to that used in creating the initial population. The new gene replaces the C combinator and may result in a new string of S (S (B+) 1K)KI. Mutations result in gene strings which are noticeably different and may create a radically new gene string with the potential to produce an entirely different approach to solving a problem. Thus, mutation is more likely to produce an innovative result.
- the parentheses balancing operation described in step 79 for mating must be applied to the new gene string.
- step 84 the new generation is checked for completeness and if it is filled then the system returns to step 60 and begins the cycle again. If it is not filled, the system returns to step 66 and begins the process of mating or replicating in order to continue filling the generation.
- a solution to this endless evaluation problem involves timing the evaluation and terminating any evaluation which does not produce a result within a predetermined period of time.
- a gene string which cannot be evaluated within the specified time period is either not included in the candidate gene pool for the next generation or is given a very low fitness rating.
- Figure 2 illustrates a combinator machine for evaluating program gene strings containing combinators.
- the combinator machine includes main memory 42 which is designed to store elements in a tree format, and combinators being implemented as machine primitives. Such dedicated machines are efficient and can be implemented as a single, custom integrated circuit, such as an ASIC, and can be run in parallel with other combinator machines.
- a combinator machine can be implemented in software by implementing combinators as stack manipulations.
- the tree on which the combinators work is built on a push-down stack mechanism available on most computers currently on the market. Evaluation trees are built as a series of pointers on the machine's stack and the combinators alter the tree by rearranging the order of the elements in the stack as appropriate. This works well because most current processors have efficient stack manipulation operators but have less efficient operators for manipulating tree structures.
- the genetic algorithm operators can also be expressed in combinator form since combinators provide a universal machine. Therefore, if a specialized combinator machine is created, there is no need to have a machine which recognizes anything but combinator expressions.
- Figure 3 illustrates a genetic programming system with multiple combinator machines arranged in parallel. Due to the need to have several gene strings in the gene pool which must be continually evaluated each generation, there is a natural and effective way to create a parallel processing system. As shown in Figure 3, each combinator machine is capable of evaluating a combinator gene string given a set of input data. These combinator machines are used to evaluate each of the gene strings in the gene pool. Thus, rather than a sequential evaluation of each program gene string in the gene pool. all gene strings are evaluated simultaneously, thereby enhancing efficiency and speed.
- GP controller 52 controls the entire genetic programming system and uses Data/control bus 46 to direct combinator machines 50. GP controller 52 uses the input criteria, data and parameters stored in GP memory 54 which is loaded as part of programming the system. GP memory 54 stores information such as string lengths, mutation and mating rates, as well as input data, fitness function and termination criteria used to evaluate the effectiveness of the program gene strings which are being evolved.
- Combinator machines 50 can evaluate arbitrary gene strings when presented with the input data. Additionally, the combinator machines can store instructions to implement the necessary operations used in the genetic algorithm.
- GP controller 52 directs each combinator machine 50 to generate a random gene string from a set of combinators, constants, and operators. GP controller 52 controls mating, permutation and other genetic functions by analyzing the results of a generation as communicated along Data/control bus 46 and issuing appropriate instructions to each combinator machine as to what function it should perform next with respect to each program gene string.
- part or all of a gene string is passed between combinator machines depending on whether they are being reproduced or mated with a gene string in another combinator machine.
- These gene strings or gene string fragments travel along program gene string bus 48 .
- the evaluation of a single gene string can be broken into pieces and several combinator machines can be used to evaluate each gene string portion. This is possible because all combinators can be evaluated in an order-independent fashion and maintain the same result when all of the combinator pieces are reunited.
- a single gene string may be broken into sub-strings A, B and C, each of which are evaluated independently of the others. When the resulting strings are recombined, the final result will be the same as if the string had been evaluated in a single process.
- An example of the present invention is illustrated by its application to the well-known traveling salesperson problem. This is an optimization problem where the goal is to find the shortest route for a salesperson to travel to all of his or her sales stops. A table of distances between stops is supplied as input data and the desired route is the one which provides the shortest distance while traveling to each location at least once.
- the present invention can find a program solution which will produce one of the better routes given a table of distances between the cities.
- the user creates a tree structure containing the input data; i.e., the distances between all the cities which must be visited.
- the user inputs the fitness function and the termination function for the problem.
- the first approach is what might be called the "weak” form of genetic programming which uses known methods or known solutions written as program gene strings. These known programs are used to populate the initial gene string pool. This is a case of beginning with good initial gene strings in an attempt to find an even better solution. This approach is referred to as the "weak” form of genetic programming because it is less likely to produce an innovative result since all of the initial gene strings represent known solutions. Since the known solutions already produce relatively satisfactory results, the genetic programming system is less likely to produce an innovative result which differs significantly from "conventional wisdom.”
- a tree structure (or, mathematicily, a graph) is created containing the input of the distance table. Since combinator programs operate on graphs, the input data must be represented as a graph.
- Figure 7. This figure illustrates a tree, each of whose branches is a sub-tree describing the distances between the city represented by the branch and the other cities in the tree.
- the text string shown at the bottom of Figure 7 is a string representation of the same graph. It is simply a more compact way to describe the same structure. For example, the first branch in the tree represents the distances from city 'A' to all the other cities.
- Each of these branches have "twigs” consisting of distances and a "label twig” identifying the city these distances are associated with.
- the distance from 'A' to 'B' is 12
- the distance from 'A' to 'C is 17,
- the distance from 'A' to 'D' is 45.
- the next branch up the tree represents the distances from city 'B' to the other cities; i.e., B to A is 12, B to C is 27, and B to D is 32.
- the tree continues up to the final sub-tree which gives the distances from city D to the other cities.
- the program gene strings generated by the invention are applied to this tree of input to produce a result.
- the desired result is a list of cities in the order they should be visited.
- the fitness function must give the highest ratings to those program gene strings producing the shortest total distance. However, the fitness function must also evaluate candidates which do not produce a complete route. Thus, the first measure of fitness must be whether a particular program gene string selects a complete list of cities.
- a tentative solution is compared to a likely worst-case solution.
- This worst-case solution is at least as large as any proposed solution since no route which visits all the cities can make the longest trip in all cases unless all distances are the same; i.e., all cities are equidistant from one another, which is not the case in the problem illustrated in Figure 7.
- this fitness function can be used to produce an objective comparison of proposed routes. This is done by producing a ratio of efficiency of travel by dividing the distance traveled in a proposed route by the worst-case route.
- This fitness function sets a premium on completing the route since any program which does not finish will have a large penalty in the form of the worst-case route distance multiplied by the number of cities missed in the proposed route.
- the first half of the function reduces to 0 (since n-v will be zero) and the second part, d/w, will become the deciding factor. Since w is a constant, the smaller the value of d, the smaller the ratio and the better the fitness. Since the distance is expressed as a ratio, the relative fitness of routes will not be influenced by the scale used to measure distance.
- the fitness function works equally well with kilometers, miles or feet. However, to produce an accurate result, all distance values must be represented using the same unit of measure; i.e., all measurements in miles or all measurements in kilometers.
- the user must provide a termination criteria which is used to determine when a particular program gene string is "good enough.” In this case we will simply look for a lack of progress in the results. If, for 100 generations (where a generation is the execution of steps 60 through 84 in Figure 4), the best candidate has not improved by more than 5%, evolution is terminated and the program gene string having the best fitness value is used as the solution to the problem.
- the above traveling salesperson example is limited to four cities for simplicity and is for illustration purposes only. However, the same tree structure can be used to represent the distances between any number of cities. A larger number of cities merely results in a larger tree to represent the input data.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Physiology (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Stored Programmes (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
Description
Claims
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1019980705136A KR19990077006A (en) | 1996-03-01 | 1996-03-01 | Genetic Programming Method and System |
US09/142,185 US6327582B1 (en) | 1996-03-01 | 1996-03-01 | Method and system for genetic programming |
DE1996631694 DE69631694T2 (en) | 1996-03-01 | 1996-03-01 | Genetic programming system and method using genetic programming techniques |
PCT/US1996/002758 WO1997032261A1 (en) | 1996-03-01 | 1996-03-01 | Method and system for genetic programming |
JP09530897A JP2000505580A (en) | 1996-03-01 | 1996-03-01 | Genetic programming method and system |
ES96908579T ES2217308T3 (en) | 1996-03-01 | 1996-03-01 | METHOD AND SYSTEM OF GENETIC PROGRAMMING. |
EP96908579A EP0898750B9 (en) | 1996-03-01 | 1996-03-01 | Method and system for genetic programming |
CA002239228A CA2239228C (en) | 1996-03-01 | 1996-03-01 | Method and system for genetic programming |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1996/002758 WO1997032261A1 (en) | 1996-03-01 | 1996-03-01 | Method and system for genetic programming |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997032261A1 true WO1997032261A1 (en) | 1997-09-04 |
Family
ID=25680264
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1996/002758 WO1997032261A1 (en) | 1996-03-01 | 1996-03-01 | Method and system for genetic programming |
Country Status (8)
Country | Link |
---|---|
US (1) | US6327582B1 (en) |
EP (1) | EP0898750B9 (en) |
JP (1) | JP2000505580A (en) |
KR (1) | KR19990077006A (en) |
CA (1) | CA2239228C (en) |
DE (1) | DE69631694T2 (en) |
ES (1) | ES2217308T3 (en) |
WO (1) | WO1997032261A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001075791A2 (en) * | 2000-04-04 | 2001-10-11 | Aber Genomic Computing Limited | Apparatus and a method for solving problems |
US7127436B2 (en) | 2002-03-18 | 2006-10-24 | Motorola, Inc. | Gene expression programming algorithm |
US7725409B2 (en) | 2007-06-05 | 2010-05-25 | Motorola, Inc. | Gene expression programming based on Hidden Markov Models |
WO2015044655A1 (en) * | 2013-09-27 | 2015-04-02 | Robert Cory | Computer program generation |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6532453B1 (en) * | 1999-04-12 | 2003-03-11 | John R. Koza | Genetic programming problem solver with automatically defined stores loops and recursions |
PT102508A (en) * | 2000-08-10 | 2002-02-28 | Maria Candida De Carvalho Ferr | GENETICAL ALGORITHMS MIXED - LINEAR AND NON-LINEAR - TO SOLVE PROBLEMS SUCH AS OPTIMIZATION, FUNCTION DISCOVERY, LOGIC PLANNING AND SYNTHESIS |
KR20030027542A (en) * | 2001-09-29 | 2003-04-07 | 주식회사 케이티 | Evolution Method of Enhancing Evolution Speed of Genetic Algorithm |
WO2003038749A1 (en) * | 2001-10-31 | 2003-05-08 | Icosystem Corporation | Method and system for implementing evolutionary algorithms |
WO2004090692A2 (en) | 2003-04-04 | 2004-10-21 | Icosystem Corporation | Methods and systems for interactive evolutionary computing (iec) |
WO2005013081A2 (en) * | 2003-08-01 | 2005-02-10 | Icosystem Corporation | Methods and systems for applying genetic operators to determine system conditions |
US7356518B2 (en) * | 2003-08-27 | 2008-04-08 | Icosystem Corporation | Methods and systems for multi-participant interactive evolutionary computing |
US7243086B2 (en) * | 2003-12-19 | 2007-07-10 | Fuji Xerox Co., Ltd. | Methods and systems for automatically generating provably correct computer program code |
US7707220B2 (en) | 2004-07-06 | 2010-04-27 | Icosystem Corporation | Methods and apparatus for interactive searching techniques |
WO2007035848A2 (en) | 2005-09-21 | 2007-03-29 | Icosystem Corporation | System and method for aiding product design and quantifying acceptance |
US7505947B2 (en) * | 2005-10-20 | 2009-03-17 | International Business Machines Corporation | Computer controlled method using genetic algorithms to provide non-deterministic solutions to problems involving physical restraints |
WO2008079097A1 (en) * | 2006-12-22 | 2008-07-03 | Singapore Technologies Dynamics Pte Ltd | Method and apparatus for automatic configuration of meta-heuristic algorithms in a problem solving environment |
US7792816B2 (en) | 2007-02-01 | 2010-09-07 | Icosystem Corporation | Method and system for fast, generic, online and offline, multi-source text analysis and visualization |
US20090037352A1 (en) * | 2007-08-01 | 2009-02-05 | Electronic Data Systems Corporation | System and method for automated determination of solutions to known equations |
US8984259B2 (en) * | 2008-11-04 | 2015-03-17 | International Business Machines Corporation | Method, system, and computer program product for optimizing runtime branch selection in a flow process |
US9147206B2 (en) * | 2009-08-31 | 2015-09-29 | Accenture Global Services Limited | Model optimization system using variable scoring |
US20110060895A1 (en) * | 2009-09-09 | 2011-03-10 | Neal Solomon | System and methods for generating and organizing modular program code components |
US8838510B2 (en) | 2011-09-16 | 2014-09-16 | International Business Machines Corporation | Choosing pattern recognition algorithms and data features using a genetic algorithm |
US9753696B2 (en) * | 2014-03-14 | 2017-09-05 | Microsoft Technology Licensing, Llc | Program boosting including using crowdsourcing for correctness |
KR101725629B1 (en) * | 2015-04-27 | 2017-04-12 | 성균관대학교산학협력단 | System and method for predicting vehicular traffic based on genetic programming using fitness function considering error magnitude |
JP6325762B1 (en) * | 2017-03-15 | 2018-05-16 | 楽天株式会社 | Information processing apparatus, information processing method, and information processing program |
US11038528B1 (en) | 2020-06-04 | 2021-06-15 | International Business Machines Corporation | Genetic programming based compression determination |
JP2024050317A (en) * | 2022-09-29 | 2024-04-10 | 富士通株式会社 | FLOW GENERATION PROGRAM, FLOW GENERATION METHOD, AND INFORMATION PROCESSING APPARATUS |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4734848A (en) * | 1984-07-17 | 1988-03-29 | Hitachi, Ltd. | Combination reduction processing method and apparatus |
US4935877A (en) * | 1988-05-20 | 1990-06-19 | Koza John R | Non-linear genetic algorithms for solving problems |
US5343554A (en) * | 1988-05-20 | 1994-08-30 | John R. Koza | Non-linear genetic process for data encoding and for solving problems using automatically defined functions |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4697242A (en) | 1984-06-11 | 1987-09-29 | Holland John H | Adaptive computing system capable of learning and discovery |
US4821333A (en) | 1986-08-22 | 1989-04-11 | Environmental Research Inst. Of Michigan | Machine learning procedures for generating image domain feature detector structuring elements |
US5222192A (en) | 1988-02-17 | 1993-06-22 | The Rowland Institute For Science, Inc. | Optimization techniques using genetic algorithms |
US5255345A (en) | 1988-02-17 | 1993-10-19 | The Rowland Institute For Science, Inc. | Genetic algorithm |
US5148513A (en) | 1988-05-20 | 1992-09-15 | John R. Koza | Non-linear genetic process for use with plural co-evolving populations |
US5140530A (en) | 1989-03-28 | 1992-08-18 | Honeywell Inc. | Genetic algorithm synthesis of neural networks |
US5249259A (en) | 1990-01-23 | 1993-09-28 | Massachusetts Institute Of Technology | Genetic algorithm technique for designing neural networks |
AU7563191A (en) | 1990-03-28 | 1991-10-21 | John R. Koza | Non-linear genetic algorithms for solving problems by finding a fit composition of functions |
US5048095A (en) | 1990-03-30 | 1991-09-10 | Honeywell Inc. | Adaptive image segmentation system |
-
1996
- 1996-03-01 CA CA002239228A patent/CA2239228C/en not_active Expired - Lifetime
- 1996-03-01 DE DE1996631694 patent/DE69631694T2/en not_active Expired - Lifetime
- 1996-03-01 KR KR1019980705136A patent/KR19990077006A/en not_active Application Discontinuation
- 1996-03-01 WO PCT/US1996/002758 patent/WO1997032261A1/en active IP Right Grant
- 1996-03-01 EP EP96908579A patent/EP0898750B9/en not_active Expired - Lifetime
- 1996-03-01 ES ES96908579T patent/ES2217308T3/en not_active Expired - Lifetime
- 1996-03-01 JP JP09530897A patent/JP2000505580A/en active Pending
- 1996-03-01 US US09/142,185 patent/US6327582B1/en not_active Expired - Lifetime
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4734848A (en) * | 1984-07-17 | 1988-03-29 | Hitachi, Ltd. | Combination reduction processing method and apparatus |
US4935877A (en) * | 1988-05-20 | 1990-06-19 | Koza John R | Non-linear genetic algorithms for solving problems |
US5343554A (en) * | 1988-05-20 | 1994-08-30 | John R. Koza | Non-linear genetic process for data encoding and for solving problems using automatically defined functions |
Non-Patent Citations (12)
Title |
---|
1990 IEEE INT'L. CONF. ON NEURAL NETWORKS, 17-21 June 1990, J. BARNDEN and K. SRINIVAS, "Dissolving Variables in Connectionist Combinatory Logic", pages III-709 to III-714. * |
1990 INT'L. CONF. ON COMPUTER LANGUAGES, 12-15 March 1990, P.J. KOOPMAN Jr. et al., "Cache Performance of Combinator Graph Reduction", pages 39-48. * |
1990 INT'L. WORKSHOP ON TOOLS FOR ARTIFICIAL INTELLIGENCE, 6-9 November 1990, J.R. KOZA, "Genetically Breeding Populations of Computer Programs to Solve Problems in Artificial Intelligence", pages 819-827. * |
FIRST IEEE CONF. ON EVOLUTIONARY COMPUTATION, 27-29 June 1994, S. HANDLEY, "On the Use of a Directed Acyclic Graph to Represent a Population of Computer Programs", pages 154-159. * |
PROC. INT'L. CONF. ON GENETIC ALGORITHMS AND THEIR APPLICATIONS, 24-26 July 1985, N.L. CRAMER, "A Representation for the Adaptive Generation of Simple Sequential Programs", pages 183-187. * |
PROC. OF THE SECOND INT'L. CONF. ON GENETIC ALGORITHMS, 1987, C. FUJIKI and J. DICKINSON, "Using the Genetic Algorithm to Generate LISP Source Code to Solve the Prisoner's Dilemma", pages 236-240. * |
PROC. OF THE SECOND INT'L. CONF. ON GENETIC ALGORITHMS, 1987, K. DE JONG, "On Using Genetic Algorithms to Search Program Spaces", pages 210-216. * |
PROCEEDINGS OF THE FOURTEENTH ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, 29 July to 1 August 1992, P.J. ANGELINE and J.B. POLLACK, "The Evolutionary Induction of Subroutines", pages 236-241. * |
PROCEEDINGS OF THE SIXTH INT'L. CONF. ON GENETIC ALGORITHMS, 15-19 July 1995, T. HAYNES et al., "Strongly Typed Genetic Programming in Evolving Cooperation Strategies", pages 271-278. * |
PROCEEDINGS OF THE SIXTH INT'L. CONF. ON GENETIC ALGORITHMS, 15-19 July 1995, W.B. LANGDON, "Evolving Data Structures With Genetic Programming", pages 295-302. * |
PROCEEDINGS OF THE SIXTH INT'L. CONF. ON GENTIC ALGORITHMS, 15-19 July 1995, J.R. KOZA, "Two Ways of Discovering the Size and Shape of a Computer Program to Solve a Problem", pages 287-294. * |
See also references of EP0898750A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001075791A2 (en) * | 2000-04-04 | 2001-10-11 | Aber Genomic Computing Limited | Apparatus and a method for solving problems |
WO2001075791A3 (en) * | 2000-04-04 | 2002-06-20 | Aber Genomic Computing Ltd | Apparatus and a method for solving problems |
US7127436B2 (en) | 2002-03-18 | 2006-10-24 | Motorola, Inc. | Gene expression programming algorithm |
US7725409B2 (en) | 2007-06-05 | 2010-05-25 | Motorola, Inc. | Gene expression programming based on Hidden Markov Models |
WO2015044655A1 (en) * | 2013-09-27 | 2015-04-02 | Robert Cory | Computer program generation |
Also Published As
Publication number | Publication date |
---|---|
EP0898750A1 (en) | 1999-03-03 |
CA2239228C (en) | 2002-12-03 |
EP0898750A4 (en) | 1999-04-14 |
CA2239228A1 (en) | 1997-09-04 |
EP0898750B9 (en) | 2004-12-01 |
ES2217308T3 (en) | 2004-11-01 |
JP2000505580A (en) | 2000-05-09 |
US6327582B1 (en) | 2001-12-04 |
EP0898750B1 (en) | 2004-02-25 |
KR19990077006A (en) | 1999-10-25 |
DE69631694T2 (en) | 2005-01-13 |
DE69631694D1 (en) | 2004-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0898750B1 (en) | Method and system for genetic programming | |
Fikes | REF-ARF: A system for solving problems stated as procedures | |
JP2881711B2 (en) | Genetic synthesis of neural networks | |
CN108345937B (en) | Circulation is merged with library | |
Langdon | Genetic programming and data structures: genetic programming+ data structures= automatic programming! | |
González Muñoz et al. | Multi-stage genetic fuzzy systems based on the iterative rule learning approach | |
US5136686A (en) | Non-linear genetic algorithms for solving problems by finding a fit composition of functions | |
Coello et al. | A parallel implementation of an artificial immune system to handle constraints in genetic algorithms: Preliminary results | |
Bäck | A user’s guide to genesys 1.0 | |
Pappa et al. | Evolving rule induction algorithms with multi-objective grammar-based genetic programming | |
WO2021009618A1 (en) | Group specific decision tree | |
Dracopoulos et al. | Genetic programming for prediction and control | |
Fister et al. | uarmsolver: A framework for association rule mining | |
Xavier-Júnior et al. | A novel evolutionary algorithm for automated machine learning focusing on classifier ensembles | |
Lucasius et al. | Gates towards evolutionary large-scale optimization: a software-oriented approach to genetic algorithms—II. Toolbox description | |
Jurczuk et al. | Fitness evaluation reuse for accelerating GPU-based evolutionary induction of decision trees | |
Biggar et al. | The attractor of the replicator dynamic in zero-sum games | |
Banerjee et al. | Feature selection using rough sets | |
JP2010033214A (en) | Rule learning method, program, and device | |
CN115878094A (en) | Code searching method, device, equipment and storage medium | |
Dumitrescu et al. | Generalized decision trees built with evolutionary techniques | |
Quirino et al. | Instinct-based mating in genetic algorithms applied to the tuning of 1-nn classifiers | |
Kowalski et al. | A new evolutionary algorithm for synchronization | |
Holbrook | Optimizing Future Perfect: A Model for Composition with Genetic Algorithms | |
Land Jr et al. | Background on genetic algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CA FI JP KR MX NO US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
ENP | Entry into the national phase |
Ref document number: 2239228 Country of ref document: CA Ref country code: CA Ref document number: 2239228 Kind code of ref document: A Format of ref document f/p: F |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1996908579 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1019980705136 Country of ref document: KR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 09142185 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 1996908579 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1019980705136 Country of ref document: KR |
|
WWR | Wipo information: refused in national office |
Ref document number: 1019980705136 Country of ref document: KR |
|
WWG | Wipo information: grant in national office |
Ref document number: 1996908579 Country of ref document: EP |