+

WO2005076923A2 - Manipulations de bases de donnees selon une theorie de groupe - Google Patents

Manipulations de bases de donnees selon une theorie de groupe Download PDF

Info

Publication number
WO2005076923A2
WO2005076923A2 PCT/US2005/003562 US2005003562W WO2005076923A2 WO 2005076923 A2 WO2005076923 A2 WO 2005076923A2 US 2005003562 W US2005003562 W US 2005003562W WO 2005076923 A2 WO2005076923 A2 WO 2005076923A2
Authority
WO
WIPO (PCT)
Prior art keywords
group
query
database
elements
data
Prior art date
Application number
PCT/US2005/003562
Other languages
English (en)
Other versions
WO2005076923A3 (fr
WO2005076923A8 (fr
Inventor
Heidi Dixon
Matthew L. Ginsberg
David Hofer
Eugene M. Luks
Original Assignee
State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/773,351 external-priority patent/US20050187905A1/en
Application filed by State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon filed Critical State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon
Publication of WO2005076923A2 publication Critical patent/WO2005076923A2/fr
Publication of WO2005076923A3 publication Critical patent/WO2005076923A3/fr
Publication of WO2005076923A8 publication Critical patent/WO2005076923A8/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2452Query translation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • Y10S707/99934Query formulation, input preparation, or translation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • Y10S707/99935Query augmenting and refining, e.g. inexact access

Definitions

  • This invention pertains in general to searching and manipulating database information in order to identify database elements satisfying specified properties.
  • the invention more particularly pertains to manipulating database information to represent and solve satisfiability and other types of logic problems such as those involved in microprocessor verification and testing.
  • the above need is met by using group theory to represent the data describing the application domain.
  • This representation expresses the structure inherent in the data.
  • group theory techniques are used to solve problems based on the data in a manner that uses computational resources efficiently.
  • the data describing the application domain are represented as sets of database entries, where each set (c,G) includes a database element c and a group G of elements g that can act on the database element c to produce a new database element (which is typically denoted as g(c)).
  • the present invention uses computational techniques to perform database manipulations (such as a query for a database entry satisfying certain syntactic properties) on the data in the group theory representation, rather than on the data in the non-group theory representation (referred to herein as the data's "native representation").
  • a database query asks whether the data describing the application domain have one or more specified properties, seeks to identify any input data having the properties and, in some cases, constructs new database elements.
  • a query specifies a Boolean satisfiability problem and seeks to determine whether any solutions or partial solutions to the problem exist (and to identify the solutions or partial solutions), whether any element ofthe problem is unsatisfiable given known or hypothesized problem features, or whether any single element ofthe problem can be used to derive new features from existing ones.
  • the input query is typically directed toward the native representation ofthe data and is therefore converted into an equivalent query on the data in the group theory representation.
  • the converted query is executed on the data in the group theory representation.
  • query execution produces a collection of zero or more group elements g that can act on database elements c associated to the group elements and satisfy the properties specified by the input query. If necessary or desired, the results ofthe query are converted from the group theory representation into the native representation ofthe data for the application domain.
  • FIG. 1 is a high-level block diagram of a computer system for storing and manipulating data using group theory according to an embodiment ofthe present invention
  • FIG. 2 is a high-level block diagram of program modules for storing and manipulating data using group theory according to one embodiment ofthe present invention
  • FIG. 3 is a flowchart illustrating steps for performing database manipulations using group theory according to an embodiment of present invention
  • FIG. 4 is a flowchart illustrating the "formulate query” and "execute query” steps of FIG. 3 according to an embodiment ofthe present invention wherein multiple low-level queries are generated from an initial query;
  • FIG. 5 is a flowchart illustrating a more detailed view ofthe "formulate query” and “execute query” steps of FIG. 3; and FIGS . 6-16 illustrate diagrams representing examples of search spaces utilized by an embodiment ofthe present invention.
  • FIG. 1 is a high-level block diagram of a computer system 100 for storing and manipulating database information using group theory according to an embodiment ofthe present invention. Illustrated are at least one processor 102 coupled to a bus 104. Also coupled to the bus 104 are a memory 106, a storage device 108, a keyboard 110, a graphics adapter 112, a pointing device 114, and a network adapter 116. A display 118 is coupled to the graphics adapter 112. Some embodiments ofthe computer system 100 have different and/or additional components than the ones described herein.
  • the processor 102 may be any general-purpose processor such as an INTEL x86 compatible-, POWERPC compatible-, or SUN MICROSYSTEMS SPARC compatible-central processing unit (CPU).
  • the storage device 108 may be any device capable of holding large amounts of data, like a hard drive, compact disk read-only memory (CD-ROM), DVD, etc.
  • the memory 106 holds instructions and data used by the processor 102.
  • the pointing device 114 may be a mouse, track ball, light pen, touch- sensitive display, or other type of pointing device and is used in combination with the keyboard 110 to input data into the computer system 100.
  • the network adapter 116 optionally couples the computer system 100 to a local or wide area network.
  • Program modules 120 for storing and manipulating database infom ation using group theory are stored on the storage device 108, from where they are loaded into the memory 106 and executed by the processor 102.
  • hardware or software modules may be stored elsewhere within the computer system 100 or on one or more other computer systems connected to the computer system 100 via a network or other means.
  • module refers to computer program instructions, embodied in software and/or hardware, for performing the function attributed to the module.
  • the operation ofthe computer system 100 is controlled by the LINUX operating system, although other operating systems can be used as well.
  • An embodiment ofthe present invention receives data describing objects or activities external to the computer system 100 and formulates the data in a representation that expresses structure inherent in the data.
  • the representation utilizes a branch of mathematics referred to as "Group Theory" and the data are represented as sets of database entries, where each set (c, G) includes a database element c and a group G of elements g that can act on the database element.
  • the group elements g are permutations acting on the associated database element c.
  • the present invention uses computational techniques to perform database manipulations (such as a query for a database entry satisfying certain syntactic properties) on the data in the group theory representation, rather than on the data in the non-group theory representation (referred to herein as the data's "native representation"). These database manipulations are more efficient than equivalent manipulations on native or other representations ofthe data.
  • FIG. 2 is a high-level block diagram of program modules 120 for storing and manipulating database information using group theory according to one embodiment of the present invention.
  • modules can provide other functionalities in addition to, or instead of, the ones described herein.
  • the functionalities can be distributed among the modules in a manner different than described herein.
  • a database module 210 stores data utilized by the present invention.
  • the term “database” refers to a collection of data and does not imply any particular arrangement ofthe data beyond that described herein.
  • the data are within one or more application domains.
  • the data within each application domain represent and/or describe one or more objects or activities external to the computer system, including prospective and/or theoretical instances ofthe objects and activities, relevant to the domain.
  • data within a given application domain can represent a design for a microprocessor or other digital logic device.
  • data within a given application domain can represent a series of actions that must or can be performed on a construction project for building a complex object such as a ship or bridge.
  • the data can represent an employee directory containing names, telephone numbers, and reporting paths.
  • the data within an application domain have inherent structure due to the nature ofthe domain. This structure can be explicit and/or implicit.
  • the data within each application domain are preferably represented using group theory in order to express this structure.
  • the group theory representation is contrasted with the data's non-group theory, or native, representation where the structure is not necessarily expressed. For example, if the application domain describes an employee directory, the data in the native representation include a listing of names and reporting paths or perhaps a machine- readable (but not group theory-based) equivalent ofthe listing and paths.
  • the data in the group theory representation are represented as sets of database elements c and groups G of group elements g, i.e. ⁇ c,G).
  • Each element g ofthe group G "acts" on its associated database element c (the action is typically denoted as g(c)) to produce a new database element.
  • An input/output (I/O) module 212 controls the flow of data into, and out of, the computer system 100.
  • the I/O module 212 receives "input data,” which are data within an application domain and describing one or more objects or activities external to the computer system, including prospective and/or theoretical instances ofthe objects and/or activities, and "input queries,” which are queries asking whether the input data have certain properties.
  • the input data received by the I/O module 212 can be encoded in either a group theory representation or the data's native representation.
  • the input data and/or input queries are provided to the computer system 100 by, for example, transmission over a computer network, loading from a computer- readable medium, and/or via the keyboard 110, pointing device 114, or other interface device.
  • the input data are preferably stored in the database 210.
  • the input queries are stored in the database 210 or elsewhere within the computer system 100.
  • the input data are generated by creating electronic representations of objects and/or activities in the pertinent application domains, such as representations of the aforementioned digital logic device, construction project, or employee directory, and providing the electronic representations to the computer system 100.
  • the input data are generated by taking measurements of physical or prospective physical objects.
  • the input data can be provided to the computer system 100 in their native representation or after the data are converted from their native representation to the group theory representation.
  • An input query asks whether the input data have given properties and, in some cases, seeks to identify the input data having the properties. For example, a query can ask whether a digital logic device behaves in a specified manner, whether an optimal set of steps for a construction project exist, etc.
  • a query specifies a Boolean satisfiability problem and seeks to determine whether any solutions or partial solutions to the problem exist (and to identify the solutions or partial solutions), whether any element ofthe problem is unsatisfiable given known or hypothesized problem features, or whether any single element ofthe problem can be used to derive new features from existing ones.
  • a query can also cause new database elements to be added in response to results of a query. For example, a query can save its results in the database 210 in a group theory representation for use by subsequent queries.
  • the data flowing out ofthe computer system 100 are referred to herein as "output data" and represent a result ofthe operation of an input query on the input data as achieved through one or more database manipulations using group theory.
  • the output data represent a concrete, tangible, and useful result ofthe database manipulations and can include data resulting from physical transformations ofthe input data.
  • the output data can represent a result of a verification and test procedure indicating whether a digital logic device contains any logic errors.
  • the output data can represent a result of a process that determines an optimal ordered sequence of steps for a construction project or other multi-step task.
  • a computer system or human being can use the output data to redesign the digital logic device to correct any errors, to perform the steps ofthe construction project, etc.
  • the computer system 100 outputs the output data by, for example, transmitting the data over a computer network, loading the data onto a computer-readable medium, displaying the data on a monitor, etc.
  • a database construction module 214 receives input data within an application domain from the I/O module 212, converts the data from their native representation into the group theory representation (if necessary), and stores the data in the group theory representation in the database 210.
  • the database construction module 214 converts the data from their native representation to the group theory representation by encoding the input data as one or more augmented clauses, where each clause has a pair ⁇ C,G) including a database element c and a group G of elements g acting on it, so that the augmented clause is equivalent to the conjunction ofthe results of operating on c with the elements of G. This encoding expresses the structure inherent in the input data.
  • the input data can be represented as a set of constraints, and each constraint can be written as a logically equivalent set of Boolean clauses.
  • the union of these sets provides an alternative (and often substantially larger) description ofthe input data.
  • the database construction module 214 converts a constraint into the group theory representation, a representative clause c is combined with a group G that captures the known generation method for that constraint type.
  • the group is applied to the representative clause, the entire set of Boolean clauses can be generated.
  • the group theory representation captures these different generation methods in a single representation.
  • a query formulation module 216 receives the input query from the I/O module 212.
  • the input query is typically directed toward the native representation ofthe input data.
  • the query formulation module 216 therefore converts the input query into one or more equivalent queries on the input data in the group theory representation.
  • a query specifies a search for database elements satisfying a property P.
  • the query formulation module 216 converts this query into a search for a pair (g,c) where g is a group element and c is a database element, such that g(c) satisfies property P.
  • the input query is a "high-level” query.
  • a high-level query is a general question about the input data. Examples of high-level queries are “are the data consistent?" and “does the logic device described by the data have errors?"
  • the answer to a high-level query is not generally a database element, but rather is information derived from the database elements. For example, the answer can be a description of an error in a digital logic device or a report that the device does not have any errors.
  • the query formulation module 216 converts the high-level query directly into a group theory based representation. In another embodiment, the query formulation module 216 converts the high-level query into multiple "low-level" queries. A low-level query corresponds to a search for a database element satisfying a particular property, such as "is there a single element ofthe data in the application domain that is unsatisfiable given the assignments made so far?" The query formulation module 216 converts each of these low-level queries into a group theory representation.
  • the input query received by the query formulation module 216 is a low-level query.
  • the low-level query can be derived from a high-level query by an entity external to the module 216.
  • the low-level query can be the entire query desired to be executed on the input data.
  • the query formulation module 216 converts the low-level query into one or more queries in the group theory representation.
  • a query execution module 218 receives the one or more converted queries from the query formulation module 216 and executes the queries on the data in the group theory representation ofthe input data in the database 210.
  • the query execution module 218 preferably uses techniques drawn from computational group theory in order to execute the queries efficiently.
  • the result of a query is a set of zero or more pairs (g,c) where g is a group element and c is a database element, where each g(c) satisfies the property P specified by the query.
  • the set of answers to the query form a subgroup.
  • the result ofthe query can be represented using a compact representation of the subgroup where a small number of group elements h ⁇ ...hNT are described explicitly and the other elements ofthe subgroup can then be computed by combining hj...h discourse.
  • a result construction module 220 receives the results ofthe one or more query executions from the query execution module 218 and converts the results from the group theory representation to the native representation ofthe data.
  • the result produced by the result construction module 220 can be a result of a test on a digital logic device, a sequence of steps for a construction project, etc.
  • the native representation ofthe query results is received by the I/O module 212, which outputs it as the output data.
  • the result construction module 220 In the simple case where a query executed by the query execution module 218 produces a set of pairs (g,c) where g is a group element and c is a database element, such that each g(c) satisfies property P, the result construction module 220 generates g(c) to reconstruct the result ofthe original query in terms familiar to the native representation of the input data. In a more complex case where the query execution module 218 executes multiple queries and generates multiple results in response to an input query, one embodiment ofthe result construction module 220 combines the results into a collective result answering the input query. For example, a high-level query can lead to multiple low-level queries and corresponding results.
  • an embodiment of the result construction module 220 analyzes the results ofthe low-level queries to generate an answer to the high-level query in terms familiar to the native representation of the input data.
  • the result construction module 220 constructs answers to both high and low-level queries in terms familiar to the native representation ofthe input data. This latter embodiment can be used, for example, when the results ofthe low- level queries allow one to understand the answer to the high-level query.
  • FIG. 3 is a flowchart illustrating steps for performing database manipulations using group theory according to an embodiment of present invention.
  • steps can be performed in different orders.
  • the functionalities described herein as being within certain steps can be performed within other steps.
  • the steps described herein are performed by the modules illustrated in FIG. 2. However, in alternative embodiments, some or all of the steps are performed by other entities.
  • the input data and input query are received 310.
  • the input data can describe a digital logic device and the input query can ask whether the device behaves in a specified manner.
  • the input data need not be received contemporaneously with the input query, and the input data and input query are not necessarily received in a particular order.
  • the input data either are already in the group theory representation, or are converted 312 from their native representation into the group theory representation.
  • the input query is formulated 314 in terms ofthe group theory representation.
  • the group theory representation ofthe query is executed 316 on the group theory representation ofthe input data to produce a result.
  • the result is then output 318.
  • FIG. 4 is a flowchart illustrating a view ofthe "formulate query” 314 and "execute query” 316 steps of FIG. 3 according to an embodiment where a high-level query is reduced into multiple low-level queries. These steps are identified by a broken line 320 in FIG. 3.
  • the low-level queries in the group theory representation are generated 410 from the high-level query (or from another low-level query as described below).
  • a low-level query is executed 412 on the group theory representation ofthe input data to produce a result. This result is returned 414 to the entity performing the method and can be used, for example, to create new database elements and/or modify existing elements. If 416 there are more low-level queries, the method returns to step 412 and continues to execute.
  • the results of one or more ofthe low-level queries are used 418 to generate 410 additional low-level queries and the method therefore returns to step 410.
  • the method uses the results ofthe low-level queries to generate 420 a result ofthe high-level query. This result is then output 318.
  • FIG. 5 is a flowchart illustrating a detailed view ofthe "formulate query” 314 and "execute query” 316 steps of FIG. 3, which can correspond to the "execute low-level query” 412 step of FIG. 4 in certain embodiments.
  • the native representation ofthe input query (or low-level queries generated from a high-level query) are generally ofthe type "find element .
  • This query is converted 510 into the equivalent group theory query, which is "find database element c and element g of a group G, such that g(c) satisfies property P.”
  • the formulated query is executed 512 on the input data in the group theory representation to identify any database elements c and group elements g where g(c) satisfies P.
  • the zero or more pairs (g,c) that are identified in response to the query are converted from the group theory representation into the native representation ofthe input data. This conversion is performed by using the group element g, and the database element c, to construct 514 g(c).
  • the resulting value 516 is returned as the result ofthe query.
  • the following description includes three sections describing an embodiment ofthe present invention.
  • the first section describes the theory ofthe present invention.
  • the second section describes an implementation of one embodiment.
  • the third section describes examples using an embodiment ofthe present invention for circuit verification and test and for solving planning problems. Those of skill in the art will recognize that embodiments ofthe present inventiqn can differ from the ones described herein.
  • SAT refers to conventional Boolean satisfiability work, representing information as conjunctions of disjunctions of literals (CNF).
  • cardinality refers to the use of "counting" constraints; if we think of a conventional disjunction of literals VJ,; as ⁇ U ⁇ i i then a cardinality constraint is one of the form ⁇ k ⁇ k i for a positive integer k.
  • pseudo-Boolean constraints extend cardinality constraints by allowing the literals in question to be weighted: Each u>i is a positive integer giving the weight to be assigned to the associated literal.
  • p-simulation hierarchy gives the minimum proof length for the representation on three classes of problems: the pigeonhole problem, parity problems and clique coloring problems.
  • An E indicates exponential proof length; P indicates polynomial length. While symmetry-exploitation techniques can provide polynomial-length proofs in cer- tain instances, the method is so brittle against changes in the axiomatization that we do not regard this as a polynomial approach in general. 3.
  • Inference indicates the extent to which resolution can be lifted to a broader setting. This is straightforward in the pseudo-Boolean case; cardinality constraints have the problem that the most natural resolvent of two cardinality constraints may not be one.
  • ZAP can p-simulate extended resolution, so that if P is a proof in extended resolution, there is a proof _?' of size polynomial in P in ZAP (Theorem 5.12).
  • the fundamental inference step in ZAP is in NP with respect to the ZAP representation, and therefore has worst case complexity exponential in the representation size (i.e., polynomial in the number of Boolean axioms being resolved).
  • the average case complexity appears to be low-order polynomial in the size of the ZAP representation (i.e., polynomial in the logarithm of the number of Boolean axioms being resolved).
  • ZAP obtains the savings attributable to subsearch in the QPROP case while casting them in a general setting that is equivalent to watched literals in the Boolean case. This particular observation is dependent on a variety of results from computational group theory.
  • ZAP In addition to learning the Boolean consequences of resolution, ZAP continues to sup- port relevance-based learning schemes while also allowing the derivation of first-order consequences, conclusions based on parity arguments, and combinations thereof.
  • the "inference” section describes resolution in this broader setting, and the p-simulation hierarchy section presents a variety of examples of these ideas at work, showing that the pigeonhole problem, clique-coloring problems, and Tseitin's parity examples all admit short proofs in the new framework. We also show that our methods p-simulate extended resolution.
  • the learning section recasts the DPLL algorithm in the new terms and discusses the continued applicability of relevance in our setting.
  • Variables are assigned values via branching and unit propagation.
  • unit propagation (described below), the existing partial assignment is propagated to new variables where possible. If unit propagation terminates without reaching a contradiction or finding a solution, then a branch variable is selected and assigned a value, and the procedure recurs.
  • unit propagation is a formal description of unit propagation:
  • Vj. be a clause, which we will denote by c.
  • P is a partial assignment of values to variables.
  • poss(c, P)
  • poss(c) is the number of literals that are either already satisfied or not valued by P, reduced by one (since the clause requires one literal true be true).
  • S is a set of clauses
  • poss n (S, P) for the subset of c € S for which poss(c, P) ⁇ n and ⁇ >oss >n (S, P) for the subset of learned clauses c € S for IU UC/.
  • a partial assignment is an ordered sequence (.i, . . . , l n ) of literals.
  • the above procedure returns a pair of values. The first indicates whether a contradiction has been found. If so, the second value is the reason for the failure, an unsatisfiable consequence of the clausal database C. If no contradiction is found, the second value is a suitably modified partial assignment. Procedure 2.7 has also been modified to work with annotated partial assignments, and to annotate the new choices that are made when P is extended.
  • Theorem 2.11 RBL is sound and complete, and uses an amount of memory polynomial in the size of C (although exponential in the relevance bound k).
  • the set of axioms that need to be investigated in the DPLL inner loop often has structure that can be exploited to speed the examination process. If a ground axiomatization is replaced with a lifted one, the search for axioms with specific syntactic properties is NP-complete in the number of variables in the lifted axiom, and is called subsearch for that reason. In many cases, search techniques can be applied to the subsearch problem. As an example, suppose that we are looking for unit instances of the lifted axiom a(x, y) V b(y, z) V , z) (1) where each variable is taken from a domain of size d, so that (1) corresponds to of ground axioms.
  • the pigeonhole problem can be solved in polynomial time if we extend our representation to include cardinality axioms such as x ⁇ - ⁇ r Xm ⁇ k ( 3 )
  • the single axiom (3) is equivalent to TM. conventional disjunctions.
  • each of the axiom sets consists of axioms of equal length; it follows that the axioms can all be obtained from a single one simply by permuting the literals in the theory.
  • literals are permuted with other literals of the same sign; in
  • composition is irrelevant, as is the choice of first element within a particular cycle. If ⁇ i and ⁇ 2 are two permutations, it is obviously possible to compose them; we will write the composition as ⁇ ⁇ where the order means that we operate on a particular element of ⁇ 1, . . . , n ⁇ first with ⁇ ⁇ and then with ⁇ 2 . It is easy to see that while composition is associative, it is not necessarily commutative.
  • a subset S of a group G will be called a subgroup of G if S is closed under the group operations of inversion and multiplication.
  • S ⁇ G to denote the fact that S is a subgroup of G, and will write S ⁇ G if the inclusion is proper.
  • closure under multiplication suffices. But let us return to our examples.
  • the set ⁇ in (9) is a subgroup of the full permutation group S 2n on 2n literals in n variables, since ⁇ is easily seen to be closed under inversion and composition.
  • the set of permutations needed to generate the four axioms from the first is given by: XI, ⁇ XI) X , -> ⁇ ) (11) x 2 , - ⁇ 2 )(xz, -x 3 ) (12)
  • a and B for x corresponding to the permutation (a(A, A), a(B, A))(a(A, B), a(B, B))(c(A, A), c(B, A))(c(A, B), c(B, B)) where we have included in a single permutation the induced changes to all of the relevant ground literals.
  • the set of ground instances corresponding to a single non-Boolean axiom can be generated from any single ground instance by the elements of a subgroup of the group S 2n of permutations of the literals in the problem.
  • the fact that only a tiny fraction of the subsets of S 2n is closed under the group operations and therefore a subgroup suggests that this subgroup property in some sense captures and generalizes the general idea of structure that underlies our motivating examples. Note, incidentally, that some structure is surely needed here; a problem in random 3- SAT for example, can always be encoded using a single 3-literal clause c and then that set of permutations needed to recover the entire problem from c in isolation.
  • Proposition 3.4 W n is the wreath product of S 2 and S n , typically denoted S 2 . S n .
  • S 2 typically denoted S 2 .
  • S n typically denoted S 2 .
  • An augmented clause in an n-variable Boolean satisfiability problem is a pair (c, G) where c is a Boolean clause and G ⁇ W n .
  • the sections below demonstrate that augmented clauses have the following properties: 1. They can be represented compactly, 2. They can be combined efficiently using a generalization of resolution, 3. They generalize existing concepts such as quantification over finite domains, cardinality, and parity constraints, together with providing natural generalizations for proof techniques involving such constraints and for extended resolution, 4. RBL can be extended with little or no computational overhead to manipulate augmented clauses instead of ground ones, and 5. Propagation can be computed efficiently in this generalized setting.
  • Proposition 3.6 is the first of the results promised: In any case where n Boolean axioms can be captured as instances of an augmented clause, that augmented clause can be represented using 0(log 2 ) generators. The proof of Proposition 3.6 requires the following result from group theory: Theorem 3.7 (Lagrange) If G is a finite group and S ⁇ G, then ⁇ S ⁇ divides ⁇ G ⁇ .
  • Corollary 3.8 Any augmented clause in a theory containing n literals can be expressed in 0(n 2 log 2 ) space. In fact, Corollary 3.8 can be strengthened using:
  • Proposition 3.9 Any subgroup of S n can be described in polynomial time using at most 0(n) generators.
  • the extensions need to simultaneously extend elements of all of the individual groups Gj, acting on the various subsets K.
  • Proposition 4.14 resolve((c ⁇ , G) , (c 2 , G)) ⁇ (resolve(c ⁇ , c 2 ), G).
  • Proposition 4.15 resolve((c ⁇ , Gi), (c 2 , G 2 ))
  • (resolve(c ⁇ , c 2 ), G x D G 2 ).
  • the resolvent of two augmented clauses can depend on the choice of the representative elements in addition to the choice of subgroup of extn(c,, Gi).
  • This set is easily seen to be a subgroup and is called the set stabilizer of C. It is often denoted G ⁇ c ⁇ . 2.
  • This set is also a subgroup and is called the pointwise stabilizer of C. It is typically denoted Gc- 3.
  • An augmented clause (c', G) is a weakening of an augmented clause (c, G) if d is a superset of c. It is known that proof lengths under resolution or extended resolution do not change if weakening is included as an allowable inference. (Roughly speaking, the literals intro- **d during a weakening step just have to be resolved away later anyway.) For augmented resolution, this is not the case.
  • introduction of new groups is equivalent to the introduction of new variables in extended resolution. Unlike extended resolution, however, where it is unclear when new variables should be introduced and what those variables should be, the situation in ZAP is clearer because the new groups are used to collapse the partitioning of a single augmented clause.
  • One embodiment of ZAP itself does not include introduction as an inference step. 5 Examples and proof complexity
  • x makes a "copy" of Sym(_9) inside of Sym(V) corresponding to permuting the elements of a. 's domain.
  • pigeonhole problems are in many ways the simplest.
  • ⁇ j the fact that pigeon . (of n + 1) is in hole j of n, so that there are n 2 + n variables in the problem.
  • G the subgroup of W n 2 +n that allows arbitrary exchanges of the n + 1 pigeons or the n holes, so that G is isomorphic to S n+ ⁇ x S n . This is the reason that this particular example will be so straightforward: there is a single global group that we will be able to use throughout the entire analysis.
  • the pigeonhole problem is difficult for resolution but easy for many other proof systems; clique coloring problems are difficult for both resolution and other approaches such as pseudo- Boolean axiomatizations.
  • the clique coloring problems are derivatives of pigeonhole problems where the exact nature of the pigeonhole problem is obscured. Somewhat more specifically, they say that a graph includes a clique of n + 1 nodes (where every node in the clique is connected to every other), and that the graph must be colored in n colors. If the graph itself is known to be a clique, the problem is equivalent to the pigeonhole problem. But if we know only that the clique can be embedded into the graph, the problem is more difficult.
  • ⁇ j j to describe the graph, to describe the coloring of the graph, and ⁇ f ⁇ to describe the embedding of the cliQue into the graph.
  • the graph has m nodes, the clique is of size n + 1, and there are n colors available.
  • ⁇ y- means that there is an edge between graph nodes . and j
  • d j means that graph node i is colored with the th color
  • ⁇ f ⁇ means that the _th element of the clique is mapped to graph node j.
  • the first axiom (33) says that two of the m nodes in the graph cannot be the same color (of the n colors available) if they are connected by an edge.
  • (34) says that every graph node has a color.
  • (35) says that every element of the clique appears in the graph, and (36) says that no two elements of the clique map to the same node in the graph.
  • (37) says that the clique is indeed a clique - no two clique elements can map to disconnected nodes in the graph.
  • any two nodes, clique elements or colors can be swapped.
  • parity constraint ⁇ Zi ⁇ O is equivalent to the augmented constraint (-uc 1 V-c 2 V---Va ; J fc,Fs)
  • a(A, B) as we search for a proof involving a single intermediate location, and then a(A, x) A a(x, y) A a(y, B) - ⁇ (_4, B) as we search for a proof involve two such locations, and so on, eventually concluding a(A, x x ) A - - - A a(x n , B) ⁇ a(A, B) (41) for some suitably large n.
  • the problem is that if d is the size of our domain, (41) will have (p. n - g r01in( i instances and is in danger of overwhelming our unit propagation algorithm even in the presence of reasonably sophisticated subsearch techniques. We argued previously that this problem requires that we learn only a version of (41) for which every instance is relevant.
  • a routine that finds instances of (c, G) for which g(c) n S 0 and ⁇ g(c) D U ⁇ ⁇ 1 for disjoint S and U, needed by line 1 in the unit propagation procedure 6.2.
  • Section 2 is a review of material from ZAP2. We begin in Section 2.1 by presenting both the Boolean satisfiability algorithms that we hope to generalize and the basic algebraic ideas underlying ZAP.
  • Section 2.2 describes the group-theoretic computations utilized by the ZAP implementation. 2.
  • Section 3 gives a brief introduction to some of the ideas in computational group theory that we use. 3.
  • Sections 4 and 5 describe the implementations of the computations discussed in Section 2. For each basic construction, we describe the algorithm used and give an example of the computation in action. 4.
  • Section 6 extends the basic algorithms of Section 5 to deal with unit propagation, where we want to compute not a single unit clause instance, but a list of all of the unit consequences of an augmented clause.
  • Section 7 discusses the implementation of Zhang and Stickel's watched literal idea in our setting. 6.
  • Section 8 describes a technique that can be used to select among the possible resolvents of two augmented clauses. This is functionality with no analog in a conventional prover, where there is only a single ground reason for the truth or falsity of any given variable. If the reasons are augmented clauses, there may be a variety of ways in which ground instances of those clauses can be combined.
  • S(P) the literals that are satisfied by P
  • U(P) the set of literals that are unvalued by P.
  • the result returned depends on whether or not a contradiction was encountered during the propagation, with the first result returned being true if a contradiction was found and false if none was found.
  • the first result returned is true if a contradiction was found and false if none was found.
  • c is the clause and c z is the reason that the last variable was set in a way that caused c to be unsatisfiable
  • Procedure 2.3 (Relevance-bounded reasoning, RBL) Given a SAT problem C, a set of learned nogoods D and an annotated partial assignment P, to compute RBL(G, D, P):
  • the procedure is recursive. If at any point unit propagation produces a contradiction c, we use the learn procedure to incorporate c into the solver's current state, and then recurse. (If c is empty, it means that we have derived a contradiction and the procedure fails.) In the backtracking step (line 6), we backtrack not just until c is satisfiable, but until it enables a unit propagation. This leads to increased flexibility in the choice of variables to be assigned after the backtrack is complete, and generally improves performance. If unit propagation does not indicate the presence of a contradiction or produce a solution to the problem in question, we pick an unvalued literal, set it to true, and recurse again.
  • VjZj be a clause, which we will denote by c, and let P be a partial assignment.
  • poss(c, P)
  • poss(c) is the number of literals that are either already satisfied or not valued by P, reduced by one (since the clause requires at least one true literal).
  • poss(c, P) ⁇ c n [U(P) U S(P)]
  • the possible value of a clause is essentially a measure of what other authors have called its irrelevance.
  • An augmented clause in an n-variable Boolean satisfiability problem is a pair (c, G) where c is a Boolean clause and G is a group such that G ⁇ W n .
  • the clause c itself will be called the base instance of (c, G).
  • an augmented clause consists of a conventional clause and a group G of permutations acting on it; the intent is that we can act on the clause with any element of the group and still get a clause that is "part" of the original theory.
  • suitably chosen groups correspond to cardinality constraints, parity constraints (the group flips the signs of any even number of variables) , and universal quantification over finite domains.
  • Procedure 2.8 (Relevance-bounded reasoning, RBL) Given a SAT problem G, a set of learned clauses D, and an annotated partial assignment P, to compute RBh(C, D, P): 1 (x, y) ⁇ — UNIT-PROPAGATE(G U D, P) 2 if x — true 3 then (c, G) ⁇ — y 4 if c is empty 5 then return FAILURE 6 else remove successive elements from P so that c is unit 7 D ⁇ - learn , P, (c, G)) 8 return RBL(G, D, P) 9 else P ⁇ - y
  • Procedure 2.5 can now be used unchanged, with d being an augmented clause instead of a simple one.
  • the effect of Definition 2.9 is to cause us to remove only augmented clauses for which every instance is irrelevant. Presumably, it will be useful to retain the clause as long as it has some relevant instance.
  • a proof engine built around the above three procedures would have the following properties: • Since the number of generators of a group is logarithmic in the group size, it would achieve exponential improvements in basic representational efficiency. • Since only .-relevant nogoods are retained as the search proceeds, the memory requirements remain polynomial in the size of the problem being solved. • It can produce polynomially sized proofs ofthe pigeonhole and clique coloring problems, and any parity problem. • It generalizes first-order inference provided that all quantifiers are universal and all domains of quantification are finite.
  • Definition 2.10 For a permutation p and set S with S p — S, will mean the restriction of p to the given set, and we will say that p is a lifting to the original set on which p acts.
  • the set of stable extensions stab( K , G ⁇ ) is closed under composition, and is therefore a subgroup of Sym( ⁇ ) .
  • G' 2 l is a subgroup of G. In fact, we have:
  • the point stabilizer of L is the subgroup G_ ⁇ G of all g £ G such that I 9 — I for every I £ L.
  • (gi) denotes the group generated by the #,. It is easy to see that a generating set is strong just in case it has the property discussed above, in that each ⁇ can be generated incrementally from ' l+1 l and the generators that are in fact elements of ⁇ — ⁇ 11 .
  • n-cycle is odd if and only if n is even (consider (2) above) , so given the strong generating set ⁇ (n - 1, n), (n - 2, n - 1, n), . . . , (2, 3, . . . , n), (1, 2, . . . , n) ⁇ for S n , a strong generating set for A n if n is odd is ⁇ (n — 2, n — 1, 7.), (n — 1, n)(n — 3, n — 2, n — 1, n), . . . , (n — 1, n)(2, 3, . . .
  • n (1, 2, . . . , n) ⁇ and if n is even is ⁇ (n — 2, n — l, n), (n — l, n)(n — 3, n — 2, n — l, n), . . . , (2, 3, . . . , n), (n — l, n)(l, 2, . . . , n) ⁇
  • Eg Given groups II ⁇ , G and g £' G, we define Eg to be the set of all hg for h £ H. For any such g, we will say that Hg is a (right) coset of H in G.
  • G be a group, and ⁇ a stabilizer chain for it.
  • coset deco - position tree for G we will mean a tree whose vertices at the ith level are the cosets of G ⁇ and for which the parent of a particular G g is that coset of ' ⁇ _I ⁇ that contains it.
  • the cosets correspond to the points to which the sequence (h, . . . , li) can be mapped, with the points in the image of /j identifying the children of any particular node at level i — 1.
  • V 6 a e a V d b e b y d c y d Suppose also that we are working with an assignment for which a and b are true and c and d are false, and are trying to determine if any instance of (4) is unsatisfied.
  • the coset decomposition tree associated with 6 4 is shown in Figure 6.
  • the nodes on the lefthand edge are labeled by the associated groups; for example, the node at level 2 is labeled with Sym(6, c, d) because this is the point at which we have fixed a but b, c and d are still allowed to vary.
  • the first entry (ab) means that we are taking the coset of the basic group Sym(6, c, d) that is obtained by multiplying each element by (ab) on the right.
  • the point a is uniformly mapped to b, and b is uniformly mapped to a.
  • c and d can be swapped or not.
  • a is still uniformly mapped to b, and b is now uniformly mapped to c.
  • c can be mapped either to a or to d.
  • the basic group is trivial and the single member of the coset can be obtained by multiplying the coset representatives on the path to the root.
  • the ninth and tenth nodes correspond to the permutations (06c) and (abed) respectively, and do indeed partition the coset of (5).
  • the original augmented clause (4) may indeed have unsatisfiable instances. But when we move to the first child, we know that the image of a is a, so that the instance of the clause in question is a V x for some x. Since a is true for the assignment in question, it follows that the clause must be satisfied. In a similar way, mapping to 6 also must produce a satisfied
  • Proposition 4.2 The result returned by Procedure 4.1 is stab(cj, Gj).
  • Procedure 4.1 The result returned by Procedure 4.1 is stab(cj, Gj).
  • x The example we will use is that with which we began this section, but we modify x to be ((ad), (be), (bf), (xy)) instead of he earlier ((ad), (be), (bf)).
  • the second swap (fee) is sanctioned in both cases.
  • stabilizer is not straightforward, and is not known to generators of the group being considered.
  • the most effective implementations work with a coset decomposition as described in Section 3.2; in computing G ⁇ sy for some set S, a node can be pruned when it maps a point inside of S out of S or vice versa. GAP implements this, although more efficient implementations are possible in some cases.
  • c n Group intersection is also not known to be polynomial in the number of generators; once again, a coset decomposition is used. Coset decompositions are constructed for each of the groups being combined, and the search spaces are pruned accordingly. 6.
  • H ⁇ G be groups.
  • a transversal of H in G we will mean any subset of G that contains one element of each coset of H.
  • G/H the transversal
  • Step 6 now selects the point to which __ ⁇ should be mapped; if we select x ⁇ or x 2 , then
  • Proposition 5.6 Provided that ⁇ c h D V ⁇ > overlap (H, c, V) > ⁇ c H n V ⁇ for all h £ H, tr U, k) as computed by Procedure 5.5 returns an element g £ G for which c 9 D U ⁇ ⁇ k, if such an element exists, and returns FAILURE otherwise.
  • the procedure is simplified significantly by the fact that we only need to return a single g with the desired properties, as opposed to all such g. In the examples arising in (ir)relevance calculations, a single answer suffices. But if we want to compute the unit consequences of a given literal, we need all of the unit instances of the clause in question. There are other considerations at work in this case, however, and we defer discussion of this topic until Section 6.
  • Our initial version of overlap is:
  • this expression will be negative; the number of points that will be mapped into U is therefore at least max( ⁇ W n c
  • T be a set, and G a group acting on it.
  • G acts transitively on T if T is an orbit of G.
  • a group G acts transitively on a set T.
  • a block system for G is a partitioning of T into sets B ⁇ , . . . , B n such that G permutes the B ⁇ .
  • Every group acting transitively and nontrivially on a set T has at least two block systems:
  • Proposition 5.16 Let G be a group acting transitively on a set T, and let c, V T. Suppose also that ⁇ Bi ⁇ is a block system for G and that cf) B l ⁇ 0 for n of the blocks in ⁇ Bi ⁇ . Then if b is the size of an individual block _3 2 and g E G, ⁇ c 9 n V ⁇ > ⁇ c ⁇ + m +n ⁇ B t n V ⁇ - nb (17)
  • Proposition 5.21 Let H be a group and c, V sets acted on by H. Then for any h E H,
  • GAP includes algorithms for finding minimal block systems for which a given set of elements (called a "seed" in GAP) is contained within a single block.
  • the basic idea is to form an initial block "system" where the points in the seed are in one block and each point outside of the seed is in a block of its own.
  • the algorithm then repeatedly runs through the generators of the group, seeing if any generator g maps elements x, y in one block to x g and y 9 that are in different blocks. If this happens, the blocks containing x g and y 3 are merged. This continues until every generator respects the candidate block system, at which point the procedure is complete.
  • Block pruning will not help us with the example at the end of Section 5.1.
  • the final space being searched is illustrated by Figure 12.
  • the first fringe node (where a is mapped to c and b to d) is essentially identical to the second (where a is mapped to d and b to c). It is important not to expand both since more complicated examples may involve a substantial amount of search below the nodes that are fringe nodes in the above figure. This is the sort of situation in which lexicographic pruning can generally be applied. We want to identify the two fringe nodes as equivalent in some way, and then expand only the lexicographically least member of each equivalence class.
  • n corresponds to a coset Ht of G, where H stabilizes each x 3 for j ⁇ i.
  • H stabilizes each x 3 for j ⁇ i.
  • z 3 the image of x 3 under t.
  • the node can be pruned, and we finally get the reduced search space as shown in Figure 11 as desired.
  • This node can be pruned by Lemma 5.26 as well.
  • the orbit of a under G is ⁇ a, b, c, d ⁇ , we can once again prune this node. (The previous node, which maps a to c, cannot be pruned, of course.) This particular example is simple.
  • the nodes being examined are at depth one, and there is significant overlap in the groups in question. While the same node is pruned by either lemma here, the lemmas prune different nodes in more complex cases.
  • the groups J G ⁇ c ⁇ and K — G ⁇ s,uy can be computed at the root of the tree, and the group J is independent of the sets S and U and can therefore be cached with the augmented clause
  • Unit propagation Procedure 5.28 was designed around the need to find a single permutation g £ G satisfying the conditions of the /c-transporter problem, and this technically suffices for ZAP'S needs.
  • unit propagation it is useful to collect all of the unit consequences of an augmented clause (c, G) at once, as opposed to collecting them via repeated transitions of G's coset decomposition tree.
  • ⁇ V the augmented clause
  • (c, G) be an augmented clause, and P a partial assignment.
  • the unit ⁇ .-consequences involve an additional requirement that the literal w appear in the clause instance in question. This will be useful when we discuss watched literals in the next section.
  • the unit consequences of (20) are e and /.
  • the unit c-consequences are the same (although we can no longer use the identity permutation ( ) , since the needed c is not in the base instance of (20)).
  • the pruning lemmas may prune a node because it includes a permutation g that is not minimal in its double coset JgK. This is a problem because g and the minimal element of JgK may correspond to distinct unit consequences. In our running example, it may well be that none of the minimal elements of JgK supports / as a conclusion; if we accumulate all of the minimal elements, we will not get a full set of unit consequences as a result. Given a successful g that is minimal in its double coset, reconstructing the relevant orbits under J and K is easy, so we begin by introducing some definitions that cater to this.
  • W contains at least two unvalued literals in C if C. What about if C is satisfied or unit? What should the watching set be in this case? In some sense, it doesn't matter. Assuming that we notice when a clause changes from unsettled to unit (so that we can either unit propagate or detect a potential contradiction) , the clause becomes uninteresting from this perspective, since it can never generate a second unit propagation. So we can watch it or not as we see fit. In another sense, however, it does matter.
  • One of the properties that we would like the watching sets to have is that they remain valid during a backtrack. That means that if a settled clause C becomes unsettled during a backtrack, there must be two watched and unvalued variables after that backtrack. In order to discuss backtracking in a formal way, we introduce:
  • P be a partial assignment for a set T of (possibly augmented) clauses.
  • P is T-closed if no clause C £ T has a unit consequence given P.
  • a T-closure of P is any minimal, sound and T-closed extension of P, and will be denoted by either P_ or by simply P if T is clear from context.
  • the definition of closure makes sense because the intersection of two closed partial assignments is closed as well. To compute the closure, we simply add unit consequences one at a time until no more are available. Note that there is still some ambiguity; if there is more than one unit consequence that can be added at some point, we can add the unit consequences in any order.
  • this maintenance would involve modifying the watching sets as infrequently as possible, so that we could adjust them only as required when variables take new values, and not during backtracking at all.
  • a and fe are unvalued and watch the clause a V 6 V c
  • Proposition 7.8 Suppose that W is a watching set for C under P and I is a literal. Then: 1. W is a watching set for G under any backtrack point for P. 2. If C is settled by (P, I), then W is a watching set for C under (P, I). 3. If C is settled by (P, l), and ⁇ (W - ⁇ -./ ⁇ ) n C n U(P-, G ) ⁇ > 1, then W - ⁇ -.Z ⁇ is a watching set for C under (P, I) . 4- If ⁇ >l . W ( ⁇ ⁇ C , then W is a watching set for C under (P, I). The proposition tells us how to modify the watching sets as the search proceeds.
  • a watching set for (c, G) under P is any set of literals W that is a watching set for every instance c 9 of (c, G) under P.
  • W is a watching set for (c, G) under any backtrack point for P. 2. If - ⁇ l $. W ( ⁇ c G , then W is a watching set for (c, G) under (P, I) . 3. If ⁇ (W U V) n c 9 n U((P, l)) ⁇ > l for every g E G such that c 9 is unsettled by (P, l), then W U V is a watching set for (c, G) under (P, I) . 4.
  • Procedure 7.11 returns a nonempty set W if and only if Procedure 5.20 returns a value in excess of k.
  • ⁇ c h ⁇ ⁇ W ⁇ > k for every h £ H.
  • Proposition 7.14 Suppose that overlap (H, c, V, k) is computed using Procedure 7.11, or otherwise satisfies the conclusion of Proposition 7.12. Then if there is a g E G such that w E c 9 and c 9 ( S ⁇ c 9 f ⁇ U — 0, Transport (G, c, S, U, w) as computed by Procedure 7.13 returns (true, g) for such a g. If there is no such g, Procedure 7.13 returns (false, Z, W), where Z is a skeletal set of unit w- consequences of (c, G) given P, and W is such that
  • n is a node that can be pruned either by a counting argument (with the new contribution W n to the set of watched literals) or by a lexicographic argument using another node n', then since the node n' will eventually fail, it will contribute its own watching set W n ⁇ to the eventually returned value. While it is possible that W n ⁇ W n ⁇ (different elements can be selected by the overlap function in line 6, for example), we expect that in the vast majority of cases we will have W n — W n > and the non-lexicographic prune will have no impact on the eventual watching set computed.
  • Proposition 7.14 implies that the watching set returned by Procedure 7.13 can be used to update the watching set as in the third claim of Proposition 7.10.
  • a watched augmented clause is a pair ((c, G), W) where (c, G) is an augmented clause and W is a watching set for (c, G).
  • Proposition 7.17 Let P be an annotated partial assignment, and C a set of watched augmented clauses, where for every ((c, G), W) £ C, W is a watching set for (c, G) under P. Let L be the set of unit consequences of clauses in C. // UNIT-PROPAGATE(G, P, L) returns (true, c) for an augmented clause c, then c is a nogood for P, and any modified watching sets in C are still watching sets under P. Otherwise, the value returned is (false, P) and the watching sets in C will all have been replaced with watching sets under P. Procedure 7.16 can be modified and incorporated into Procedure 2.8, where the literal most recently added to the partial assignment is added to L and thereby passed into the unit propagation procedure.
  • Proposition 8.5 Suppose that we are given two augmented clauses (a, G) and ( ⁇ , H) such that a and ⁇ are unit for a partial assignment P, with I £ a and X £ ⁇ . Then the value returned by Procedure 8.4 is a ⁇ p-minimal candidate l-resolvent of ( , G) and ( ⁇ , H) .
  • the procedure can be implemented somewhat more efficiently than described above; if af, for example, already satisfies the condition implicit in line 6, there is no need to reinvoke the transport function for g. More important than this relatively slender improvement, however, is the fact that resolution now involves repeated calls to the transport function.
  • Boolean satisfiability engines need not worry about the time used by the resolution function, since unit propagation dominates the running time.
  • a naive implementation of Procedure 8.4 involves more calls to transport than does the unit propagation procedure, so that resolution comes to dominate ZAP'S overall runtime. To correct this, remember the point of Procedure 8.4. The procedure is not needed for correctness; it is only needed to find improved resolution instances. The amount of time spent looking for such instances should be less than the computational savings achieved by having them.
  • Boolean satisfiability problems are typically in a format where variables correspond to positive integers, literals are nonzero integers (negative integers are negated literals) , and clauses are terminated with zeroes.
  • DIMACS format precedes the actual clauses in the problem with a single line such as p cnf 220 1122 indicating that there are 220 variables appearing in 1,122 clauses in this problem.
  • This numerical format makes it impossible to exploit any existing understanding that the user might have of the problem in question; this may not be a problem for a conventional Boolean tool (since the problem structure will have been obscured by the Boolean encoding in any event), but was felt to be inappropriate when building an augmented solver.
  • predicate instances directly (e.g., in [l 3] indicating that the first pigeon is in the third hole) or in a nonground fashion (e.g., in [x y] ).
  • Group definition When a group is specified directly, it is assigned a symbolic designator that can then be used in an augmented clause.
  • the group syntax is the conventional one, with a group being described in terms of generators, each of which is a permutation. Each permutation is a list of cycles, and each cycle is a space-separated list of literals.
  • An augmented clause that uses a previously defined group is of the form symbols GROUP group 2 • • • group re where the symbols are essentially a sequence of literals and each group ⁇ is the designator for a group to be used.
  • the group in the augmented clause is then the group collectively generated by the group s.
  • here is the group-based encoding of the pigeonhole instance involving four pigeons and three holes:
  • GROUP G ( (in[l 1] in [2 1] ) (in[l 2] in[2 2] ) (in[l 3] in [2 3] ) ) ( (in[l 1] i [3 1] in [4 1] ) (in[l 2] in [3 2] in [4 2] ) (in [l 3] in [3 3] in [4 3] ) ) ( (in[l 1] in[l 2] ) (in [2 1] in[2 2] ) (in[3 1] in [3 2] ) (in[4 1] in[4 2] ) ) . ( (in [l 1] in[l 3] ) (in[2 1] in[2 3] ) (in [3 1] in[3 3] ) (in [4 1] in [4 3] ) ) > ;
  • the general form of a ZAP axiom is quantifiers symbols result where the quantifiers are described in the next section.
  • the quantifiers are of the form V(x x , . . . , x fc ) or 3(x ⁇ , . . . , x fc ) where each of the j are variables that can then appear in future predicate instances.
  • V(x ⁇ , . . . , x / 0 where the V quantifier means that the variables can take any values that do not cause any of the quantified predicate's instances to become identical.
  • the axiom saying that only one pigeon can be in each hole now becomes simply V(p ⁇ , p 2 )V/ ⁇ .
  • SORT pigeon 9 SORT hole 8; PREDICATE i (pigeon hole) ;
  • FORALL(z) EXISTS (h) in[z h] This is the nine pigeon instance.
  • the two axioms say directly that no hole z can contain two distinct pigeons x and y (note the use of the NOTEQ or V) , and every pigeon z has to be in some hole h. This encoding is presumably more intuitive than the previous one.
  • a full adder is a digital logic device accepting three inputs, two bits and a carry-in, and producing two outputs, a sum and a carry-out.
  • the full adder implements a truth table as follows:
  • circuit elements or gates, (of which there are five), gate types (gtypes, of which there are three - AND, OR and XOR), and ports (each gate has three ports, two inputs and one output). There are also the three inputs to the adder, and the two outputs.
  • 6 PREDICATE type(gate gtype); 7 PREDICATE val(port gate); 8 PREDICATE input(addin); 9 PREDICATE output(addout); Lines 6-9 give the predicates that are used to describe the circuit. The description says that a gate is of a particular type, or that the value of a port on a gate is true or false (corresponding to high and low circuit values, respectively).
  • 10 GROUP xor val[l 1] val[2 1] val[3 1] %2 0 ;
  • Lines 10-12 describe the behavior of XOR gates. Lines 10-11 specify two groups that will be used. The first group (line 10) says that the sum ofthe two inputs and the output is equivalent to zero mod 2.
  • the second group (line 11) says that if a gate is an XOR gate (type 1 ) and the first two inputs are false, the output must be false. Note that these two lines do not actually say anything about the circuit in question; they simply define groups that are used in line 12, where the axiom describing XOR gates is presented.
  • line 22 says that the first input to the adder is connected to port 1 of gate 2.
  • Line 28 says that port 3 of gate 2 is connected to port 1 of gate 3.
  • Line 32 says that port 3 of gate 3 (the output ofthe second XOR gate) is connected to output 1 of the adder itself.
  • lines 1-11 are used to define the groups and lines 12-41 are used to produce actual database elements in the group theory representation. These database elements are described below.
  • This group is defined as the group associated with the constraint
  • the variable g can take any value from one to five and each value corresponds to a gate in the circuit.
  • the group should have the effect of allowing the value of g to vary freely over all five of its possible values.
  • To construct generators for this group we first construct a permutation that has the effect of swapping the value of g from one to two when applied to the representative clause. This permutation is described in lines 1-4. We then construct a permutation that has the effect of cycling the value of g from two up to five. This permutation is defined in lines 5-8. By repeatedly applying these permutations to the representative clause, we can generate an instance of the clause for every value of variable g. The group generated by these two permutations is the desired group gates.
  • the permutation described in lines 1 and 2 simultaneously flips variables val ⁇ 1] and val ⁇ 1].
  • the permutation described in lines 3 and 4 simultaneously flip variables val [2 1] and valfl 1].
  • the composition of these two permutations flips the remaining even sized subset of variables containing variables val ⁇ 1] and valfl 1].
  • the group generated by these two permutations is the group XOR.
  • the notation seen in lines 1-4 is then abbreviated with the shorthand notation F2:fval[3 1] val ⁇ 1] valfl 1]]]
  • Database elements 22-33 (the circuit diagram) include the group corresponding to the mod 2 constraint.
  • Database elements 34-41 once again include the trivial group. 34 ( [-output ⁇ ] -outputfl] -input ⁇ ] -input ⁇ ] -inputfl]] , ⁇ > )
  • axiom 18 would have been type ⁇ 3] instead of type ⁇ 1]. (In the integer version that is eventually analyzed, axiom 18 is ( [27] , ⁇ > ) instead of ( [»], ⁇ >)•
  • Lines 5-9 simply repeat the requirements ofthe input theory regarding gate type. Lines 1-2 say that the failing case is that where the two input bits are true (and the carry-in is false, since it is not included). Both the carry-out and sum bits are true (lines 3-4); the sum bit should be off.
  • the values ofthe intermediate points in the circuit are shown in lines 10-20 and can be used to assist in identifying the source ofthe problem if desired.
  • a planning problem is a general class of problems that concern how to construct a sequence of actions to achieve some goal.
  • a planning problem is a general class of problems that concern how to construct a sequence of actions to achieve some goal.
  • a logistics problem Given a set of packages located at given initial locations, a set of planes at given initial locations, and a destination location for each package; can the packages be delivered to their destinations within a given amount of time?
  • the description ofthe input data representing the planning problem in the native domain follows.
  • Lines 8 and 9 describe frame axioms. Frame axioms require packages to maintain their current position unless they are moved. Line 8 asserts that packages stay at the same location or they are loaded onto planes, and line 9 asserts that packages stay in the same plane or are unloaded.
  • Lines 10-13 describe the rules of how packages are allowed to change positions, namely how packages are loaded onto and unloaded from planes. If a package is loaded onto a plane, then the plane is required to be at the same location as the package at the time when the package is loaded (line 11), as well as the preceding time step (line 10). Axioms for unloading a package (lines 12 and 13) are simply the reverse of loading. A loading or unloading action requires a total of two time-steps for completion.
  • Lines 14-18 FORALL(p t) EXISTS(a 1) packageAtfp 1 1] inPlanefp a t] ; Lines 14-18 insure the consistency of any given state.
  • Lines 14 and 15 require that planes and packages be in only one location at a time. Note that because planes cannot be at more than one location during any given time, it follows that a plane must take at least one time step to fly between cities.
  • Line 16 requires that packages be in only one plane at a time, and line 17 keeps packages from being in both a plane and a location at the same time.
  • line 18 requires that each package always exists somewhere, either in a plane or at a location. 19 package Atfl 1 1];
  • each constraint is encoded as a group-based database element consisting of a clause together with a group on the set of literals.
  • the representative clauses associated with the constraints described in lines 8-18 are as follows: 8 [-packageAt ⁇ 1 1] packageAt ⁇ 1 2] inPlane ⁇ 1 2] inPlane ⁇ 2 2]]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Des données d'une base de données décrivent un domaine d'application tel qu'un problème de satisfaisabilité. Les données sont representées d'une manière qui exprime la structure inhérente aux données et une telle représentation utilise une théorie de groupe et représente les données sous forme d'une ou plusieurs clauses augmentées , chaque clause possédant une paire (c,G) incluant un élément de base de données c et un groupe G des éléments du groupe g agissant sur celui-ci. Une consultation est codée dans une représentation de théorie de groupe et est exécutée sur la représentation de théorie de groupe des données afin d'identifier des éléments de base de données et des éléments de groupe associés satisfaisant à la consultation. Si souhaité, les éléments de base de données satisfaisants sont convertis de la représentation de théorie de groupe en représentation native des données.
PCT/US2005/003562 2004-02-05 2005-02-04 Manipulations de bases de donnees selon une theorie de groupe WO2005076923A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10/773,351 2004-02-05
US10/773,351 US20050187905A1 (en) 2004-02-05 2004-02-05 Database manipulations using group theory
US10/989,982 US7440942B2 (en) 2004-02-05 2004-11-15 Database manipulations using group theory
US10/989,982 2004-11-15

Publications (3)

Publication Number Publication Date
WO2005076923A2 true WO2005076923A2 (fr) 2005-08-25
WO2005076923A3 WO2005076923A3 (fr) 2007-11-22
WO2005076923A8 WO2005076923A8 (fr) 2008-01-17

Family

ID=34864643

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/003562 WO2005076923A2 (fr) 2004-02-05 2005-02-04 Manipulations de bases de donnees selon une theorie de groupe

Country Status (2)

Country Link
US (1) US7440942B2 (fr)
WO (1) WO2005076923A2 (fr)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7363298B2 (en) * 2005-04-01 2008-04-22 Microsoft Corporation Optimized cache efficiency behavior
US7606776B1 (en) 2005-09-28 2009-10-20 Actenum Corporation Flexible constraint propagation engine for combinatorial optimization applications
US7562055B2 (en) * 2006-09-18 2009-07-14 International Business Machines Corporation Resolve trace minimization
US8126255B2 (en) * 2007-09-20 2012-02-28 Kla-Tencor Corp. Systems and methods for creating persistent data for a wafer and for using persistent data for inspection-related functions
US8103674B2 (en) * 2007-12-21 2012-01-24 Microsoft Corporation E-matching for SMT solvers
US8463743B2 (en) * 2009-02-17 2013-06-11 Microsoft Corporation Shared composite data representations and interfaces
US8738584B2 (en) * 2009-02-17 2014-05-27 Microsoft Corporation Context-aware management of shared composite data
US8266181B2 (en) 2010-05-27 2012-09-11 International Business Machines Corporation Key-break and record-loop processing in parallel data transformation
US8635206B2 (en) 2011-06-30 2014-01-21 International Business Machines Corporation Database query optimization
US8438513B1 (en) * 2011-12-30 2013-05-07 Northeastern University Quantifier elimination by dependency sequents
US9734262B2 (en) * 2012-09-05 2017-08-15 Patrick DeLeFevre Method and system for understanding text
US9222895B2 (en) 2013-02-25 2015-12-29 Kla-Tencor Corp. Generalized virtual inspector
US9816939B2 (en) 2014-07-22 2017-11-14 Kla-Tencor Corp. Virtual inspection systems with multiple modes
WO2016043609A1 (fr) * 2014-09-18 2016-03-24 Empire Technology Development Llc Analyse sémantique latente en trois dimensions
US20230350968A1 (en) * 2022-05-02 2023-11-02 Adobe Inc. Utilizing machine learning models to process low-results web queries and generate web item deficiency predictions and corresponding user interfaces

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6601058B2 (en) * 1998-10-05 2003-07-29 Michael Forster Data exploration system and method
US6556978B1 (en) * 1999-03-09 2003-04-29 The State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon Satisfiability algorithms and finite quantification
EP1122640A1 (fr) * 2000-01-31 2001-08-08 BRITISH TELECOMMUNICATIONS public limited company Dispositif pour la génération automatique de code source
US6912700B1 (en) * 2001-06-06 2005-06-28 The United States Of America As Represented By The National Security Agency Method and system for non-linear state based satisfiability
US6886153B1 (en) * 2001-12-21 2005-04-26 Kla-Tencor Corporation Design driven inspection or measurement for semiconductor using recipe
US7653520B2 (en) * 2002-07-19 2010-01-26 Sri International Method for combining decision procedures with satisfiability solvers

Also Published As

Publication number Publication date
US20050177558A1 (en) 2005-08-11
WO2005076923A3 (fr) 2007-11-22
US7440942B2 (en) 2008-10-21
WO2005076923A8 (fr) 2008-01-17

Similar Documents

Publication Publication Date Title
Chandrasekaran et al. Deterministic algorithms for the Lovász local lemma
Andersen An introduction to binary decision diagrams
Turi et al. On the foundations of final coalgebra semantics: non-well-founded sets, partial orders, metric spaces
Luisa Algorithms and complexity
WO2005076923A2 (fr) Manipulations de bases de donnees selon une theorie de groupe
Huang et al. The language of search
Forbes Polynomial identity testing of read-once oblivious algebraic branching programs
Fagan et al. Optimising clifford circuits with quantomatic
Mateescu et al. AND/OR multi-valued decision diagrams (AOMDDs) for graphical models
Buhrman et al. Limits of quantum speed-ups for computational geometry and other problems: Fine-grained complexity via quantum walks
US20050187905A1 (en) Database manipulations using group theory
Neele et al. Solving parameterised boolean equation systems with infinite data through quotienting
Beame et al. Exact model counting of query expressions: Limitations of propositional methods
Dixon et al. Generalizing boolean satisfiability II: Theory
Gervet Constraints over structured domains
Hawkins et al. A hybrid BDD and SAT finite domain constraint solver
Luks et al. The complexity of symmetry-breaking formulas
Heijltjes et al. Complexity bounds for sum-product logic via additive proof nets and petri nets
Dixon et al. Generalizing Boolean satisfiability III: implementation
Gyssens et al. Tagging as an Alternative to Object Creation.
Sinha Efficient reconstruction of depth three circuits with top fan-in two
Masina et al. On CNF Conversion for SAT and SMT enumeration
Poremba et al. New Techniques for Static Symmetry Breaking in Many-Sorted Finite Model Finding
Ferrari et al. Structured transition systems with parametric observations: observational congruences and minimal realizations
Murray et al. Efficient query processing with reduced implicate tries

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载