HK1018328B - Data retrieval method and apparatus with multiple source capability
- Publication number
- HK1018328B (HK99103011.1A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- data
- information
- database
- data source
- driver
Description
This application is a continuation-in-part of U.S. patent application No. 08/593,118 (filed February 1, 1996), which is a continuation-in-part of U.S. patent application No. 08/582,062 (attorney docket No. 23134/90100, filed December 20, 1995, entitled "Data Retrieval Method and Apparatus with Multiple Source Capability"), incorporated herein by reference.
Technical Field
The present invention relates to computer-implemented systems that are capable of retrieving information stored in one or more different sources and in any of a number of different formats, and of providing reports and analysis based on that information, and more particularly, to computer methods and apparatus that are capable of automatically retrieving database information (including structural and/or related information) stored in any of a number of formats without relying on human analysis of the source data.
Background
Computer-accessible information has been stored in a variety of ways, such as in relational or hierarchical database management systems, flat-file data systems, spreadsheet systems, and the like. These systems are used to store, manipulate and display a myriad of information, including accounting or other financial information, scientific or technical data, corporate or business data, names, addresses and telephone numbers, and statistical data. A variety of formats and data structures have been developed, and this has had both advantages and disadvantages. On the positive side, having many different types of systems means that different systems can offer different strengths (e.g., optimizing the balance between speed of data entry or storage and speed or flexibility of data analysis and reporting, or being optimized for accounting data rather than corporate data), or can provide a user interface or other features that appeal to individual or corporate preferences. However, the growing number of such information systems is a substantial barrier in situations where it is useful to access information in two or more such systems at once (e.g., to coordinate or combine such information). Examples of such cases include: (1) an accountant who wishes to produce standardized statements for a plurality of clients, each of which keeps its accounting data in a different type of data source; (2) a company with several divisions that wishes to produce a unified report, where different divisions use different accounting or financial software; (3) a company that wishes to produce a unified report, where its accounting information is in a first type or brand of database (or other data source) and its corporate information is in a second type of database; (4) a group of scientists investigating the same question, each storing or using data kept in a different type or brand of database or other data source.
Other similar examples will be apparent to the reader after understanding this specification. Furthermore, in some cases, it may be desirable to provide a method of accessing data (e.g., to provide consistent and/or easier reporting and analysis of data) even when all of the required information is stored in a single type of data source or in a single data file.
Such access is difficult for a variety of reasons, including differences in how the information is organized and differences between types of data sources. In some cases, similar types of information may be organized in different ways even with the same database software. For example, in a first instance, using a first database package, a user may organize the personnel records of a company by storing the names of the company's employees in a first table or list, the employees' addresses in a second table or list, and the employees' phone numbers in a third table or list, and storing pointers or links to indicate which address and which phone number is associated with each name. In another instance using the same software, a different person organizing the personnel information may provide a single table in which each line or "record" includes a name, address, and phone number, without any links or pointers from records in one table to records in another table.
Further, different types of data sources may have different structures and/or different data storage formats or schemes. For example, some software packages organize data in a hierarchical manner (e.g., in a tree structure), while others organize it in relational database form (modeled on two-dimensional tables of rows and columns). Further, information may be stored in forms that are not, strictly speaking, database forms, such as "flat file" form, a spreadsheet, or the like. In addition, different types of data sources may store data in various formats. For example, some database products store each table, each report format, and each query as a separate file on a storage device (such as a hard disk), while other software may store all tables, relationships, queries, table formats, etc. in a single file. Some products may store each record and/or field as fixed-length data and/or at fixed locations in a file, while other products may employ delimiters to distinguish one record from the next or one field from the next within a record. Even where different software products store a particular type of information at a predetermined location, that location may differ from one product to another. In addition, data is encoded differently in different software products, such as ASCII encoding in one product and multi-byte characters in another. In some cases, the data may be compressed and/or encoded.
In the past, because the types of data sources were so different, when access to the stored information was required (e.g., to standardize reports and analysis and/or to combine or reconcile information from two or more databases), a consultant or other expert analyzed each "source" data file or database individually or "manually" to understand its structure, relationships, data storage format, data organization within the database, etc. The expert then created some import method or query against the data in the source data file or database to obtain the desired access, coordination, or combination. Although this approach is workable, it is laborious and time consuming because it requires human analysis; it typically takes days or weeks for an expert or consultant to complete the analysis needed to access, reconcile or combine the data.
Thus, it would be useful to provide a system in which information organized in various formats or forms, or in various ways, may be accessed, coordinated and/or combined while reducing or eliminating the need for human analysis, thereby providing a system that is at least partially automated and less laborious and time consuming than some prior approaches.
Disclosure of Invention
The present invention relates to systems for accessing stored information, for example, to access information in, or to coordinate and/or combine information from, two different information storage systems. Preferably, in one embodiment, some or all of the analysis involved is performed automatically (i.e., without manual analysis) using a suitably programmed computer. Preferably, the system is flexible in that it is not limited in the formats it can access, but is constructed to obtain data from virtually any computer-readable information source. Preferably, the system is extensible, and more preferably modularly extensible, so that elements may be added to allow access to additional types, formats, or organizations of data. In one embodiment, accessing, reconciling, or combining data is accompanied by enhanced analysis of the data (i.e., providing types of data analysis and/or reports that are not found or used in the original data source). Preferably, the system can be used to standardize data analysis and reporting across several data sources. In one embodiment, to obtain the desired results (such as by employing text recognition, artificial intelligence, and/or an expert system), the system employs the contents of a source data file or database, as well as information about the results. In one embodiment, the system uses this information to at least partially control the manner in which data is made available for analysis and reporting. In one embodiment, the system uses this information to provide such analysis and reporting.
Outputs or reports are generated that present, in a standardized or unified manner, information about the data contained in a data source or in two or more data sources. Drivers are provided for specific different types of data sources, each including programming to identify the structure or other characteristics of its data source (e.g., for use with a first new database). Preferably, the new database is structured to allow for high activity and/or rapid output or reporting, or is optimized for reporting purposes. In one embodiment, the present invention includes converting one or more data sources into one or more unified databases, preferably generating one or more key directory tables, selectively generating class groups or accumulations and additional data or optional arguments.
In one embodiment, the present invention creates or provides a database based on accounting or other data converted from stored data files, such as data files created by previous accounting or other software.
In one embodiment, the system is configured to facilitate updating some or all of the new database portions (such as by storing one or more data profiles that should be updated and/or creating or defining a schedule that automatically performs the update process within a pre-defined time interval).
In one embodiment, improvements are provided that can automatically identify desired data (such as by looking when multiple values are found in a given data segment or data parameter).
Preferably, one or more validation or auditing tools are provided to detect potential errors or problems.
In accordance with one aspect of the present invention, a computer-implemented method is provided. The method comprises the following steps: providing a first driver that issues instructions for accessing data that may be stored in a first or second different data source, the first driver containing program instructions for use in conjunction with the first data source; and automatically obtaining information about the data structure of the first data source using the first driver without manual analysis of the first data source, the information about the data structure of the first data source resulting in optimization of a new database when the new database is formed in which information about the first database is to be stored.
In accordance with another aspect of the present invention, a computer-implemented method is provided. The method may be used in conjunction with accessing data that may be stored in first and second different data sources. The method comprises the following steps: a first step of providing a first driver comprising program instructions for use in conjunction with the first data source; a second step of automatically obtaining information about the data structure of the first data source using the first driver without manual analysis of the first data source; storing at least some information from the first data source in a first database using the first driver; wherein the first database is augmented relative to the first data source.
In accordance with yet another aspect of the present invention, a computer-implemented method is provided for use in connection with data potentially stored in first and second different data sources, at least one of the first and second data sources being for generating at least a first output. The method comprises the following steps: providing a first driver comprising program instructions for use in conjunction with the first data source; providing a second driver, different from said first driver, containing programming code for use in conjunction with said second data source; applying said first and second drivers to obtain first and second information about data structures of said first and second data sources, respectively; using the first and second information to define a structure of a first database; storing at least some information from the first and second data sources in the first database using the first and second drivers; and at least generating a first report according to the information in the first database.
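The two-driver method described in this aspect can be illustrated with a minimal sketch. It is not the patented implementation: the `Driver` interface, the class names, the `unified` table, and the sample data are all hypothetical, and SQLite and a comma-delimited file merely stand in for "first and second different data sources". Each driver reports the structure of its own source, the union of those structures defines the new database, and a report is then generated from that database.

```python
import csv
import io
import sqlite3
from typing import Iterable, Protocol


class Driver(Protocol):
    """Hypothetical interface: one driver per type of data source."""

    def describe(self) -> list[str]: ...

    def rows(self) -> Iterable[dict]: ...


class DelimitedFileDriver:
    """Driver for a comma-delimited flat file whose first line names the fields."""

    def __init__(self, text: str) -> None:
        self._text = text

    def describe(self) -> list[str]:
        # The structure is read from the source itself, not supplied by a person.
        return next(csv.reader(io.StringIO(self._text)))

    def rows(self) -> Iterable[dict]:
        return csv.DictReader(io.StringIO(self._text))


class SqliteTableDriver:
    """Driver for one table of a SQLite database."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self._conn, self._table = conn, table

    def describe(self) -> list[str]:
        info = self._conn.execute(f'PRAGMA table_info("{self._table}")')
        return [row[1] for row in info]  # row[1] is the column name

    def rows(self) -> Iterable[dict]:
        names = self.describe()
        cursor = self._conn.execute(f'SELECT * FROM "{self._table}"')
        return [dict(zip(names, values)) for values in cursor]


def build_unified(drivers: dict[str, Driver]) -> sqlite3.Connection:
    """Define the new database from the discovered structures, then load it."""
    columns: list[str] = []
    for driver in drivers.values():
        for name in driver.describe():
            if name not in columns:
                columns.append(name)
    db = sqlite3.connect(":memory:")
    column_defs = ", ".join(f'"{c}"' for c in columns)
    db.execute(f"CREATE TABLE unified (source TEXT, {column_defs})")
    marks = ", ".join("?" for _ in range(len(columns) + 1))
    for source, driver in drivers.items():
        for row in driver.rows():
            values = [source] + [row.get(c) for c in columns]
            db.execute(f"INSERT INTO unified VALUES ({marks})", values)
    return db


def totals_by_source(db: sqlite3.Connection) -> list[tuple]:
    """A first report: total amount per original data source."""
    query = ('SELECT source, SUM(CAST("amount" AS REAL)) FROM unified '
             "GROUP BY source ORDER BY source")
    return db.execute(query).fetchall()


# Hypothetical sample sources: a flat file and a relational table.
flat_file = ("date,description,amount\n"
             "1996-01-02,Deposit,100.50\n"
             "1996-01-03,Check,-20.00\n")
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE tx (date TEXT, amount REAL, employee TEXT)")
relational.execute("INSERT INTO tx VALUES ('1996-01-05', 40.0, 'Salesman #1')")

unified = build_unified({
    "company1": DelimitedFileDriver(flat_file),
    "company2": SqliteTableDriver(relational, "tx"),
})
print(totals_by_source(unified))
```

Supporting a further type of source in this sketch would only require another class with the same two methods, which is one way to read the modular extensibility described earlier.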
In accordance with yet another aspect of the present invention, an apparatus is provided that may be used in connection with accessing data that may be stored in first and second different data sources. The apparatus comprises: first driver means comprising program instructions for use in conjunction with said first data source; second driver means, different from said first driver means, containing programming code for use in conjunction with said second data source; means for employing said first and second driver means to obtain first and second information about the data structure of said first and second data sources, respectively; means for using said first and second information to define a structure of a first database; means for storing at least some information from said first and second data sources in said first database using said first and second driver means; wherein the first information results in an optimization of the first database.
In accordance with another aspect of the present invention, a computer-implemented method is provided. The method comprises the following steps: providing a first driver that issues instructions for accessing data stored in a first data source; using the first driver to obtain first information about a data structure of the first data source; applying the first information to define a structure of a first database; employing the first driver to store at least some information from the first data source in the first database; storing second information defining at least a portion of the data stored in the first data source to be used to update the first database; and updating the first database using the second information; wherein the first information results in an optimization of the first database.
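The stored "second information" in this aspect can be pictured as an update profile that names the portion of the source to be re-read. The following is only a sketch under that assumption: the profile layout, the table names, and the `refresh` helper are hypothetical, and SQLite stands in for both the data source and the new database.

```python
import sqlite3


def copy_table(source: sqlite3.Connection, target: sqlite3.Connection,
               table: str) -> None:
    """Replace the target's copy of one table with the source's current rows."""
    rows = source.execute(f'SELECT * FROM "{table}"').fetchall()
    target.execute(f'DELETE FROM "{table}"')
    if rows:
        marks = ", ".join("?" * len(rows[0]))
        target.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)


def refresh(source: sqlite3.Connection, target: sqlite3.Connection,
            profile: dict) -> None:
    """Update only the tables named in the stored profile."""
    for table in profile["tables"]:
        copy_table(source, target, table)
    target.commit()


SCHEMA = """
CREATE TABLE tx (idx INTEGER, amount REAL);
CREATE TABLE accounts (idx INTEGER, name TEXT);
"""
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.executescript(SCHEMA)
source.execute("INSERT INTO tx VALUES (1, 100.0)")
source.execute("INSERT INTO accounts VALUES (10, 'Cash')")

# Initial load copies everything.
refresh(source, target, {"tables": ["tx", "accounts"]})

# Later the source changes; the stored profile says to re-read only "tx".
source.execute("INSERT INTO tx VALUES (2, -20.0)")
source.execute("INSERT INTO accounts VALUES (11, 'Accounts payable')")
update_profile = {"tables": ["tx"]}
refresh(source, target, update_profile)

print(target.execute("SELECT COUNT(*) FROM tx").fetchone()[0])
print(target.execute("SELECT COUNT(*) FROM accounts").fetchone()[0])
```

Because the profile lists only `tx`, the new account row is deliberately left out of the refreshed database, which mirrors an update that uses less than all of the information in the source.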
In accordance with another aspect of the present invention, a computer-implemented method is provided that may be used in connection with accessing data that may be stored in a first data source, wherein the first data source is used to generate at least a first output. The method comprises the following steps: providing a first driver comprising program instructions for use in conjunction with the first data source; obtaining first information about a data structure of the first data source by automatically accessing information content stored in the first data source using the first driver; defining a structure of a first database distinct from the first data source using the first information, wherein the first database does not exist prior to the step of obtaining the first information with the first driver; storing at least some information from the first data source in the first database with the first driver; updating said first database with less than all of the information in said first data source, said updating step being performed after said step of storing at least some of the information from said first data source in said first database with said first driver; wherein the first information results in an optimization of the first database.
In accordance with another aspect of the present invention, a computer-implemented method is provided. The method comprises the following steps: providing a first driver that issues instructions for accessing data stored in a first data source, the first driver containing program instructions for use in conjunction with the first data source; automatically obtaining first information related to the first data source with the first driver without manual analysis of the first data source; creating at least a first database for storing at least some data from the first data source, the first database being based on at least some of the first information; creating at least a second and a third database containing information from said first database, wherein said second and third databases are different from each other; wherein the first information results in an optimization of the first database.
In accordance with another aspect of the present invention, an apparatus is provided that may be used in connection with computer-implemented access to data that may be stored in either of a first or second different data source. The apparatus comprises: first driver means including program instructions for use in conjunction with said first data source, said first driver means not being used in conjunction with said second data source; means for automatically obtaining first information about a data structure of the first data source with the first driver means, without manual analysis of the first data source, by automatically accessing information content stored in the first data source; means for defining a structure of a first database distinct from said data source with said first information, wherein said first database does not exist prior to automatically obtaining said first information with said first driver means; wherein the first information results in an optimization of the first database.
Drawings
FIG. 1 is a schematic diagram of a flat file data store;
FIG. 2 illustrates a directory structure of the type used in conjunction with the data store shown in FIG. 1;
FIGS. 3A-3C illustrate examples of data storage formats used in conjunction with the data storage shown in FIG. 1;
FIGS. 4A-4F are diagrams of examples of data stored in tables of a relational database;
FIG. 5 illustrates a directory structure of the type used in conjunction with FIGS. 4A-4F;
FIG. 6 is a schematic diagram of an example of a flat file data store;
FIGS. 7A-7D are schematic diagrams of data stored in tables of a relational database;
FIG. 8 is a block diagram of a system for information retrieval in accordance with an embodiment of the present invention;
FIG. 9 is a schematic diagram of the contents of functional blocks according to an embodiment of the present invention;
FIG. 10 is a flow diagram of a process for information retrieval in accordance with an embodiment of the present invention;
FIGS. 11A and 11B illustrate a pseudocode process for selecting or searching a directory according to an embodiment of the present invention; and
FIG. 12 is a schematic diagram of data stored in a table of database 808 provided in accordance with an embodiment of the present invention.
Detailed Description
Before describing certain aspects of the present invention, various forms of storing information are described to aid in understanding the present invention. This is done by way of several examples, including examples of accounting information and examples of scientific or technical information. Table 1 compares the types of data that may be stored by two different companies. Table 1 is intended to show the conceptual organization of accounting and other information for two companies, and not necessarily information stored in a database (although it could be if desired).
TABLE 1: Example of accounting organization
| Company #1 | Company #2 |
| Accounts | Accounts |
| Cash | Cash |
| Accounts payable | Bank #1 |
| Accounts receivable | Bank #2 |
| | Bank #3 |
| | Savings |
| | Checking |
| | Accounts payable |
| | Parts |
| | Consultants |
| | Accounts receivable |
| | Sales |
| | Profit |
| Employees | Employees |
| Salesman #1 | Sales |
| Salesman #2 | Salesman #1 |
| | Salesman #2 |
| | Research |
| | Researcher #1 |
| | Researcher #2 |
| Plans | Plans |
| Research | Research |
| Sales | Chemical |
| | Biomedical |
| | Sales |
| | Old products |
| | New lines |
| Products | Products |
| Product #1 | Old production lines |
| Product #2 | Production line #1 |
| | Product 1 |
| | Product 2 |
| | Production line #2 |
| | Product 3 |
| | Product 4 |
| | New production lines |
| | Production line #3 |
| | Product 5 |
| | Product 6 |
| | Production line #4 |
| | Product 7 |
| | Product 8 |
| | Subsidiaries |
| | Subsidiary #1 |
| | Subsidiary #2 |
In the first example of Table 1, company 1 keeps account information, employee information, plan information, and product information, and thus has four sections. The account information for company 1 includes three components: cash, accounts payable, and accounts receivable. The company keeps a list of its salesmen, tracks information for two plans (research and sales), and keeps a list of its products. The second example of Table 1 is longer (but still simplified). In this example, the accounts have components on more than one level. Although the accounts of company 2 have the same categories (cash, accounts payable, and accounts receivable), each category has subclasses, and some subclasses are divided further. Similarly, employee, plan, and product information is divided into several categories and sub-categories, and company 2 also tracks additional items (such as subsidiaries).
Table 1 shows that, even before taking into account differences between data sources and/or data storage formats, the structure of a company and/or the way it chooses to organize its information leads to differences from one system to another. For example, if company 1 and company 2 were both clients of an accountant who wanted to produce reports and analyses in a consistent or standard form for both companies, this would be difficult even if company 1 and company 2 used the same database software, and even if the accounting information was organized within that software in a similar form. Thus, with the prior approach, a human is typically required to analyze and understand the information in Table 1 in order to provide consistent or standardized reports and analysis for both companies based on the databases of those companies.
Still in view of accounting information, several types of information stores may be used to store accounting information for company 1 and/or company 2. For example, the information may be stored as one or more flat files. Note that, at least for some purposes, a "flat file" information store is not a true database system. However, in at least some embodiments, the present invention is capable of accommodating flat file data as well as other database and non-database storage methods.
FIG. 1 schematically shows how the information of company 2 of Table 1 might be stored in a plurality of flat files. Although FIG. 1 depicts the information as it might appear written on multiple sheets of paper, in practice the data may be stored on a computer-readable medium (such as a hard disk, as described below). FIG. 1 is intended to show the logical structure of the data: a plurality of files 101a to 101f, each file including file identification information 104 (shown in FIG. 1 as heading or header information 104a, 104b) and a plurality of records (shown in FIG. 1 as rows of information 106a, 106b, 106c), each record having a plurality of fields (shown in FIG. 1 as organized into columns 108a, 108b, 108c, 108d). Methods and apparatus for storing and accessing data so as to have or reflect a logical row-and-column structure as shown in FIG. 1 are known to those familiar with programming techniques. A number of programs that store information in flat files can be used in conjunction with the present invention. Examples of such information storage programs include those sold under the trade names Simply Accounting™ and MAS-90.
Data organized in the logical structure shown in FIG. 1 can be stored in a number of different formats. For example, in one embodiment, the data in each of the flat files 101a through 101f is stored in a separate file on the hard disk of a personal computer. FIG. 2 shows a directory/file structure that may be used to store such files, where the files shown in FIG. 1 are stored in multiple subdirectories. As is known to those skilled in the art, even though the various files are logically organized into a directory hierarchy as shown in FIG. 2, they may be physically stored on the hard disk in a number of separate locations. A variety of formats may be used to store the data within a file. Examples are shown in FIGS. 3A to 3C. In the example of FIG. 3A, the file includes header information, followed by the first record 106a, the second record 106b, and so on. In the example of FIG. 3A, fixed-length data is employed, where each record 106a, 106b has the same length 304 (i.e., occupies a fixed number of bits). In the example of FIG. 3A, each field within each record also has a fixed length 308a to 308d.
FIG. 3B illustrates another fixed-length data storage approach, in which data is stored in column order rather than row order (all dates are stored contiguously, then all descriptions, etc.). In the example of FIG. 3B, it may be useful to store (e.g., as part of a header) an indication 322 of the number of records in order to facilitate finding the desired data. In a fixed-length system (e.g., as shown in FIG. 3A or 3B), particular data can be found at a given distance (i.e., a given number of bits) from the beginning of the data. For example, in FIG. 3A, if the header 302 is known to have a length of 4 bytes and the record length 304 is known to be 8 bytes, then the date information of the first record 106a will be found beginning at byte number 5, that of the second record 106b beginning at byte number 13, and so on.
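The offset arithmetic in the FIG. 3A example can be written out as a short sketch. The 4-byte header and 8-byte record length come from the text; the function names, the field widths used for the column-order case, and the sample bytes are assumptions made only for illustration.

```python
HEADER_LEN = 4   # bytes, as in the FIG. 3A example
RECORD_LEN = 8   # bytes, as in the FIG. 3A example


def record_start_byte(n: int, header_len: int = HEADER_LEN,
                      record_len: int = RECORD_LEN) -> int:
    """Return the 1-based byte number at which record n (1-based) begins."""
    return header_len + (n - 1) * record_len + 1


def read_record(blob: bytes, n: int) -> bytes:
    """Slice record n out of a row-ordered, fixed-length file image."""
    start = record_start_byte(n) - 1  # convert to a 0-based offset
    return blob[start:start + RECORD_LEN]


def column_order_offset(field: int, record: int, widths: list[int],
                        num_records: int, header_len: int = HEADER_LEN) -> int:
    """0-based offset of one field value when the file is stored column by column.

    All values of field 0 come first, then all values of field 1, and so on,
    which is why the number of records has to be known.
    """
    preceding_columns = num_records * sum(widths[:field])
    return header_len + preceding_columns + record * widths[field]


image = b"HDR0" + b"AAAAAAAA" + b"BBBBBBBB"   # hypothetical file contents
print(record_start_byte(1), record_start_byte(2))  # 5 13
print(read_record(image, 2))                       # b'BBBBBBBB'
```

The column-order helper shows why a stored record count is useful in a layout like FIG. 3B: without it, the start of the second and later columns cannot be computed.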
FIG. 3C shows the storage of data in delimited fields rather than in fields of a given length. In the delimited format, a specific symbol (i.e., a bit pattern different from any pattern used to store data) is used to mark the end or beginning of a record and/or field. In the embodiment shown in FIG. 3C, two different special symbols are used, one marking the beginning of a record and the other marking the beginning of a field. These symbols are shown in FIG. 3C as a colon and a semicolon, respectively, which stand for any suitable bit pattern or symbol. In the data format of FIG. 3C, the date information of the first record 106a can be identified as the information after the first new-record symbol 324a but before the first new-field symbol 324b. The date information of the second record 106b is the information after the second new-record symbol 326a but before the next new-field symbol 326b, and so on. Many other formats for storing information are possible. As can be seen from the description of FIGS. 3A through 3C, the diversity of data storage formats creates a further obstacle to accessing, coordinating, and combining data from different types of information storage systems. Previously, accessing the information directly (e.g., without using a database management system or other software that reads the stored information) required knowledge of the data storage format (obtained, in some cases, by analyzing examples of stored information). Thus, an accountant who wants to access the stored information of companies 1 and 2 of Table 1 needs information not only about the logical organization of the data and its logical directory structure (FIG. 2), but also about the data storage format (FIGS. 3A to 3C).
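A delimited layout of the kind described for FIG. 3C can be parsed with a few lines of code. The sketch below assumes, as in the figure, that a colon marks the beginning of each record and a semicolon marks the beginning of each later field, and that neither symbol occurs inside the data; the function name and the sample string are hypothetical.

```python
def parse_delimited(text: str, record_mark: str = ":",
                    field_mark: str = ";") -> tuple[str, list[list[str]]]:
    """Split a delimited file image into its header and a list of records.

    Everything before the first record mark is treated as header information.
    Within a record, the first field runs from the record mark to the first
    field mark, matching the description of FIG. 3C.
    """
    header, *raw_records = text.split(record_mark)
    records = [raw.split(field_mark) for raw in raw_records]
    return header, records


sample = "HDR:1996-01-02;Deposit;100.50:1996-01-03;Check;-20.00"
header, records = parse_delimited(sample)
print(header)
for record in records:
    print(record)
```

Unlike the fixed-length case, nothing here depends on byte counts, so the same code handles fields of any length, but it does depend on knowing which symbols serve as the marks.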
There are many other possibilities besides storing information in flat files. FIGS. 4A to 4F show one possible organization of the information in a relational database. In the example of FIGS. 4A to 4F, a first transaction table 402 is stored, which has a plurality of records 406a to 406d. Note that the records shown in FIG. 4A are similar in some respects to those shown in FIG. 1 (i.e., each includes a date field, a description field, and an amount field). In the example of FIG. 4A, each record additionally has an index field. The transaction table in the example of FIG. 4A does not include header information as shown in FIG. 1, and there is only a single transaction table (FIG. 4A) rather than multiple files 101a to 101f as in the embodiment of FIG. 1. In the relational database shown in FIGS. 4A to 4F, additional tables are provided that reflect the organization shown in FIG. 1. For example, the accounts table 412 includes all the account categories shown in Table 1, with an index 414 associated with each account. Similarly, employee table 416 includes the names of the employees of Table 1, each name having an index 418 associated with it. Further, FIG. 4C indicates, for each name, whether the person belongs to the sales force or to research (reflecting the hierarchy shown in Table 1). In addition, a field is included to indicate the location of the employee. Additional tables (not shown) may be provided to list the various plans, products, and subsidiaries of company 2, reflecting the organization of Table 1.
FIG. 4D shows a link table 422, which indicates, for each record of the transaction table 402, any required links to records in other tables. For example, if the first transaction 406a relates to the bank #1 component of the cash account, then a record 428 is provided indicating that, for the transaction record having an index value of 1, the appropriate account reference is the one having index 424. Similarly, links may also be made to employee table 416 or to other tables (not shown). Thus, whereas the approach of FIG. 1 needs a separate transaction file for each possible combination of accounts, subsidiaries, products, plans, etc. (which may result in a large number of files for a relatively complex accounting structure), the embodiment of FIGS. 4A through 4F needs only a single transaction table, with the link table 422 providing the information that, in the embodiment of FIG. 1, was conveyed by knowing which flat file the transaction was stored in.
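The role of the link table can be illustrated with a small relational sketch. The table and column names, the index values, and the sample rows below are hypothetical and are not taken from FIGS. 4A to 4F; SQLite is used only because it ships with Python.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE transactions (idx INTEGER PRIMARY KEY, date TEXT,
                           description TEXT, amount REAL);
CREATE TABLE accounts (idx INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE links (transaction_idx INTEGER, account_idx INTEGER);

INSERT INTO transactions VALUES (1, '1996-01-02', 'Deposit', 100.0);
INSERT INTO transactions VALUES (2, '1996-01-03', 'Parts order', -20.0);
INSERT INTO accounts VALUES (10, 'Cash / Bank #1');
INSERT INTO accounts VALUES (11, 'Accounts payable / Parts');
INSERT INTO links VALUES (1, 10);
INSERT INTO links VALUES (2, 11);
""")


def transactions_for_account(conn: sqlite3.Connection, account: str) -> list[tuple]:
    """Follow the link table from an account name to its transactions."""
    return conn.execute(
        """
        SELECT t.date, t.description, t.amount
        FROM transactions AS t
        JOIN links AS l ON l.transaction_idx = t.idx
        JOIN accounts AS a ON a.idx = l.account_idx
        WHERE a.name = ?
        ORDER BY t.idx
        """,
        (account,),
    ).fetchall()


print(transactions_for_account(db, "Cash / Bank #1"))
```

In the flat-file arrangement the same question would be answered by opening the file dedicated to that account; here the join through `links` plays that part.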
In a typical relational database, it is possible to identify or display only the information that meets certain criteria (e.g., the transactions of particular employees on particular accounts). In some database software, these criteria or "queries" may be stored for later use, for example, whenever that selection of information is needed. FIG. 4E illustrates a table that stores a plurality of such queries (e.g., using Structured Query Language (SQL)). The queries used in a particular database system may reflect the way in which a company analyzes or organizes its data. Thus, an accountant who is interested in standardizing reports and analysis based on information in such a database may wish to know and/or be able to replicate the types of data analysis represented by the various stored queries (FIG. 4E).
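A table of stored queries, in the spirit of FIG. 4E, might look like the following sketch. The `stored_queries` table, its columns, the query name, and the sample transactions are hypothetical; the point is only that a program reading the source can retrieve a saved query and replay it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE transactions (idx INTEGER PRIMARY KEY, employee TEXT,
                           account TEXT, amount REAL);
INSERT INTO transactions VALUES (1, 'Salesman #1', 'Sales', 250.0);
INSERT INTO transactions VALUES (2, 'Salesman #2', 'Sales', 100.0);
INSERT INTO transactions VALUES (3, 'Salesman #1', 'Cash', -30.0);
CREATE TABLE stored_queries (name TEXT PRIMARY KEY, sql TEXT);
""")
db.execute(
    "INSERT INTO stored_queries VALUES (?, ?)",
    ("sales_by_employee",
     "SELECT employee, SUM(amount) FROM transactions "
     "WHERE account = 'Sales' GROUP BY employee ORDER BY employee"),
)


def run_stored_query(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Look up a saved query by name and execute its SQL text."""
    (sql,) = conn.execute(
        "SELECT sql FROM stored_queries WHERE name = ?", (name,)
    ).fetchone()
    return conn.execute(sql).fetchall()


print(run_stored_query(db, "sales_by_employee"))
```

Reading the saved SQL text, rather than only the data tables, is what would let an outside tool see how the company itself slices its data.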
In addition, many types of databases allow users to design reports (either for display or for printing) and, in some cases, to store information defining such reports (e.g., so that the reports can be reused). Another table or set of tables (not shown) may then be stored as part of or in conjunction with the relational database to maintain information about the reports.
The information and structures shown in fig. 4A through 4F may be stored in a number of different ways. FIG. 5 illustrates a directory/file hierarchy that may be used to store multiple design tables, link tables, query tables, and/or report companies. These data may be stored in a number of different data formats, such as any of those shown in fig. 3A, or other formats known to those skilled in the art.
Another example of information that may be stored in various formats is scientific or technical information. In FIG. 6, a flat file system is provided to store surface temperature information, e.g., for meteorological surveys. In the example of FIG. 6, each file 602a, 602b, 602c stores information for a particular location and unit of measure (e.g., degrees Fahrenheit or Celsius), as indicated by its header 604. For each record 606a, 606b, 606c, the date and the reading for each hour of the day are stored in separate fields. FIGS. 7A to 7D show a relational database system for storing this type of data. The data table 702 includes all observations, as well as a unit of measure 704a and an index 704b for each observation. Tables 706 (FIG. 7B) and 708 (FIG. 7C) may be used to relate index values to the location and the time of day associated with each data point stored in FIG. 7A (as indicated by its index value 704b). While the examples of FIGS. 6 and 7A-7D are simplified, it will be apparent to those skilled in the art how to build an information system for storing more complex sets of data (such as meteorological data including wind speed and direction, etc.). Thus, if a researcher wants to correlate information from two meteorological data sources, one of which stores information in the system shown in FIG. 6 and the other of which stores information in the system shown in FIGS. 7A through 7D, then under previous methods the information storage structures and organizations shown in FIGS. 6 and 7A through 7D would need to be analyzed by a consultant or other expert, and the data accessed in such a way as to allow the data to be combined or coordinated.
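The coordination problem described above can be made concrete with a short sketch that brings both layouts into one common list of observations. All function names, dictionary keys, and sample values are assumptions made for illustration; the figures do not prescribe them. The sketch also assumes that Celsius is chosen as the common unit.

```python
def fahrenheit_to_celsius(value):
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (value - 32.0) * 5.0 / 9.0

def to_celsius(value, unit):
    return fahrenheit_to_celsius(value) if unit == "F" else float(value)

def flat_file_to_observations(location, unit, records):
    """FIG. 6 style: one file per location and unit; each record holds a
    date followed by one reading per hour of the day."""
    observations = []
    for day, readings in records:
        for hour, reading in enumerate(readings):
            observations.append({"location": location, "date": day,
                                 "hour": hour, "celsius": to_celsius(reading, unit)})
    return observations

def relational_to_observations(data_rows, locations, times):
    """FIGS. 7A-7D style: a data table whose rows carry a unit and an index
    that is resolved through separate location and time tables."""
    observations = []
    for row in data_rows:
        day, hour = times[row["index"]]
        observations.append({"location": locations[row["index"]], "date": day,
                             "hour": hour,
                             "celsius": to_celsius(row["value"], row["unit"])})
    return observations

flat = flat_file_to_observations("Site A", "F", [("1996-01-05", [32.0, 212.0])])
relational = relational_to_observations(
    [{"value": 20, "unit": "C", "index": 1}],
    {1: "Site B"},
    {1: ("1996-01-05", 0)},
)
combined = flat + relational
```

Once both sources are expressed in the same record shape and unit, `combined` can be sorted, grouped, or correlated without regard to where each observation came from.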
FIG. 8 is a block diagram of a system according to one embodiment of the present invention for use in overcoming the difficulties described above with respect to accessing, coordinating, or otherwise manipulating data across different information systems. In the FIG. 8 embodiment, the main process 802 selectively invokes various drivers 804a, 804b, 804c, 804d as needed in connection with analyzing and/or accessing information in the various data sources 806a, 806b, 806c, 806d. The source data 806a-806d of FIG. 8 may be data stored or generated by any of a variety of programs or systems for organizing or storing data (such as flat file systems, databases, spreadsheets, etc., as described above). The processes and data of FIG. 8 may reside in one or more computers. In one embodiment, the process is implemented in the context of a Local Area Network (LAN), which includes a network server computer with an associated hard disk or other storage device and one or more client computers. In one embodiment, the main process 802 is executed on a client computer, while the information sources 806 and the data files for the new database 808 are stored on a hard disk (or other data store) associated with a network server. The invention can be practiced in many other configurations, such as on a single computer, on multiple non-networked computers, on computers linked by communication links (e.g., wide area networks, modem communications), through the Internet, and the like. Various types of computers can be used to implement the invention, including mainframe computers or personal computers, such as those manufactured by International Business Machines (e.g., 386-, 486- or Pentium II-based computers) and Apple Inc. (such as the Macintosh computer), and "clone" versions of these computers. In one embodiment, the process is implemented using the DOS operating system and/or the Microsoft Windows or Windows 95 client interface.
The items referred to herein as drivers 804a through 804d should not be confused with the type of data filter provided in some database programs. In general, a data filter is a type of query or logical test used to select certain records and/or fields based on user-determinable criteria. Drivers 804a to 804d, on the other hand, which will be described in detail below, provide a variety of processing functions for analyzing and accessing different types of source data. In one embodiment, each driver 804 is provided as a Dynamic Link Library (DLL), in a manner that will be apparent to those skilled in the art after understanding the present specification. Drivers 804a through 804d are structured to operate on one or more types of data sources (e.g., to operate on data files generated by a particular database program). Depending on the nature of the database program, it may sometimes be desirable to have two separate drivers, for example, for data files generated by two separate versions of a database software package. In some cases, it may be possible to provide a single driver that may be used in conjunction with data files generated using two (or more) different types or brands of software (or different versions of a given brand of database or other software).
In general, the source data 806a-806d as embodied in FIG. 8 may be any computer-readable source of information. Examples include flat file source data, hierarchical databases, relational databases, spreadsheets, and the like. Although FIG. 8 illustrates an embodiment with four data sources, the present invention can also be employed with only a single data source or with five or more data sources. While the present invention may be employed where each data source 806a-806d uses a different type or brand of software, it may also be employed where two or more data sources are generated by the same brand or type of database or other software. As an example, a first driver may be configured to retrieve information from data files generated using dBase II, a second driver may be configured to retrieve information from data files generated using dBase III, a third driver may be configured to retrieve information from data files generated by a flat file system (such as Simple Accounting™), and a fourth driver may be configured to retrieve information from data files generated using Microsoft Access.
Once the data source has been analyzed (as will be described in greater detail below), the results of such analysis may be employed in a variety of ways, including providing a user with access to information in the data source for viewing or editing; copying some or all of the data, preferably while enhancing it (as described below), to create a new database; creating a data report (for viewing, printing, storing, transmitting, etc.); performing queries; and the like.
In the FIG. 8 embodiment, after the main process 802 utilizes the drivers 804 to analyze the source data 806a-806d, one or more new databases 808 may be created that include data from one or more of the various data sources 806a-806d. In one embodiment, a new database is created for each data source. It may be desirable to combine two or more such databases, for example, using standard database techniques, such as when the databases have similar structures. In another embodiment, one database 808 may include information from two or more data sources (e.g., if a company employs one database or other data source to store sales information and another database or other data source to store employee information). If desired, the new database or databases 808 may be used to generate reports, e.g., using a report writer 810, and may be used to enter, view, or analyze data, e.g., using a database management system 812 or other data 814. In one embodiment, database 808 is a Microsoft Access database that includes base code with one or more wizards, templates, filters, and/or toolbox software (such as will be understood by those skilled in Microsoft Access programming), for example, for providing database reports and analysis (such as outputting standard financial reports). In one embodiment, the financial and management reporting software is provided as an extension to and variation of the software available under the MV™ brand name from Timeline, Inc. of Bellevue, WA.
Preferably, the analysis system also includes a module configured to generate or provide reports or screen displays for a particular purpose or group. For example, in the embodiment of FIG. 8, an Executive Information System (EIS) 815 is preferably provided with a user interface that is easy to use, employing guided tools or option selectors (e.g., selecting between bar charts and pie charts, selecting report periods, selecting quarterly or weekly reports, etc.), and is configured to output data analysis in various spreadsheet, presentation, or printed forms. In one embodiment, the user is provided with a menu of various views of the data, including views that have been automatically generated or reviewed using the automatic roll-up generation process described above.
In one embodiment, the processing functionality available for information in the new database or databases 808 is enhanced as a result of using processes such as 802. It is enhanced in the sense that it becomes possible to generate, display, or output analyses or relationships of the data that could not be displayed or output using the source data 806a-806d.
FIG. 9 is a diagram of the various drivers 804a-804d. Each driver includes a number of defined processes or functions 901 to 910. Each function may include computer program instructions 912, for example, to implement and perform one or more of the steps described below and shown in FIG. 10. In one embodiment, each of the functions 901-910 is a callable subroutine or procedure. The functions 901 to 910 defined in a given driver 804b are those functions that must be performed or carried out differently depending on the source data 806a, 806b. Thus, for example, with respect to a function 901 designed to select certain directories on a hard disk or other information storage device that stores the desired information, the process of selecting a directory will vary depending on the type of source data 806, as can be seen, for example, by comparing the example of FIG. 2 with the example of FIG. 5. Accordingly, the programming 912 implementing function 901 in the first driver 804a is different from the programming code implementing the corresponding function in the second driver 804b. In this way, each driver defines one or more processes, and those processes are designed to accommodate the different characteristics of two or more different types of source data. For example, FIG. 11A illustrates a portion of a process, represented in pseudocode, of the type used in connection with selecting a directory structure and/or selecting a directory as shown in FIG. 2, while FIG. 11B illustrates a corresponding portion of pseudocode for a process used in connection with selecting a directory structure or selecting a directory as shown in FIG. 5. Those skilled in the art will appreciate from the examples of FIGS. 11A and 11B how to construct drivers to perform the same function for two different types of source data. Although FIG. 9 shows drivers having ten functions, the present invention can be utilized with drivers having a different number of functions. A system may be constructed in which different drivers define different functions and/or in which one or more functions are constructed to provide or return null or constant values or information.
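The idea that every driver exposes the same callable functions while implementing them differently can be sketched as follows. This is a minimal illustration under stated assumptions, not the pseudocode of FIGS. 11A and 11B: the class names, the `.dat` extension, and the `catalog.idx` file name are hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath

class Driver(ABC):
    """Common set of callable functions that every driver exposes."""

    @abstractmethod
    def select_directories(self, paths):
        """Counterpart of function 901: choose the directories to search."""

    def load_general_info(self, directories):
        """Counterpart of function 903. A driver that has nothing to report
        may simply return an empty (null) value."""
        return {}

class FlatFileDriver(Driver):
    # Assumption: flat files end in ".dat" and may sit in many directories.
    def select_directories(self, paths):
        return sorted({str(PurePosixPath(p).parent)
                       for p in paths if p.endswith(".dat")})

class RelationalDriver(Driver):
    # Assumption: a relational source is rooted where "catalog.idx" is found.
    def select_directories(self, paths):
        return sorted({str(PurePosixPath(p).parent)
                       for p in paths if PurePosixPath(p).name == "catalog.idx"})

paths = [
    "books/acme/cash.dat",
    "books/acme/sales.dat",
    "books/zenith/cash.dat",
    "db/catalog.idx",
    "db/tables/accounts.tbl",
]
flat_dirs = FlatFileDriver().select_directories(paths)
relational_dirs = RelationalDriver().select_directories(paths)
```

The calling process only ever invokes `select_directories`; which directories come back depends entirely on the driver chosen for the source data.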
The process may be initiated 1002 in a variety of ways, one of which is shown in FIG. 10. In one embodiment, the method of FIG. 10 is implemented using a computer program stored on a medium such as a diskette, CD-ROM, or other non-volatile medium, and the program is launched (i.e., loaded into memory and executed) by issuing instructions to the computer (e.g., via a keyboard, mouse, or the like). Alternatively, the program may be initiated by another program. For example, in one embodiment, the new database 808 is a Microsoft Access database, which may include routines (such as a so-called "wizard") to initiate the process, while the process (FIG. 10) in turn accesses data in the information sources 806 to provide or update the database 808. In this embodiment, it may be useful to use the wizard to display prompts or "dialogs" for soliciting user input as needed (e.g., step 1020), so that the user interface has an appearance consistent with the user interface for database 808.
In the process shown in FIG. 10, the first step after the process starts 1002 is to identify and initialize the dynamic drivers 1004. In this regard, the drivers 804 are considered dynamic in the sense that drivers can be added or deleted in modular form (e.g., to accommodate new or different types of data sources). For example, the system initially provided to the user may be one with four drivers as shown in FIG. 8, but additional drivers may be added in the future, e.g., by purchasing them from software retailers, by downloading them from information services, the web, or Internet-connected sources, or by writing custom drivers. Because of the modular, dynamic nature of the drivers, it is not known in advance which drivers are available, so when the program is launched 1002, the program identifies the drivers that are available to it. In one embodiment, this is done by searching a disk or directory for files having a preset (partial) file name or file extension. In one embodiment, the program may further analyze selected portions of each file (e.g., header information) to verify that a file identified by its file name and/or extension is in fact a driver. Initializing the drivers includes identifying and connecting the driver functions and initializing data within each driver.
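Driver identification by file extension followed by a header check might look like the sketch below. The `.drv` extension and the four-byte `DRV1` header are invented for illustration; the specification does not say what file name pattern or header content is used.

```python
from pathlib import Path

DRIVER_EXTENSION = ".drv"   # hypothetical preset file extension
DRIVER_MAGIC = b"DRV1"      # hypothetical header bytes marking a real driver

def discover_drivers(directory):
    """Return the names of files that look like drivers.

    A candidate must (1) carry the preset extension and (2) start with the
    expected header, so that unrelated files sharing the extension are skipped.
    """
    found = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() != DRIVER_EXTENSION:
            continue
        with path.open("rb") as handle:
            header = handle.read(len(DRIVER_MAGIC))
        if header == DRIVER_MAGIC:
            found.append(path.name)
    return found
```

Calling `discover_drivers` at start-up, rather than hard-coding a driver list, is what allows drivers to be added or removed without changing the main program.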
Then, a determination is made 1006 whether the process is to perform an input or an update. In an input, a process is performed in which all or most of the data and structures in the data source are accessed and stored in the new database. In an update, a process is performed in which only selected portions of the data and/or structure are accessed (e.g., to ensure that information in the new database 808 reflects the most recent changes or additions that may have been made to the data source 806). Typically, an input is performed when the system of FIG. 8 first accesses or utilizes information from a given data source, or if major changes or additions have been made to the data source. Generally, updates are performed periodically (e.g., daily, weekly, etc.) to synchronize the data in the new database 808 with the data in the source data 806. In one embodiment, the selection between input and update 1006 is performed automatically (i.e., by performing an update unless a particular data source is being accessed by the process for the first time). In another embodiment, the user is allowed to select between input and update by providing input (e.g., by keyboard selection, utilizing a pointing device, etc.).
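The automatic and user-driven selection between input and update described above can be sketched in a few lines. The function name and the representation of "sources already accessed" as a set of identifiers are assumptions made for illustration.

```python
def choose_mode(source_id, previously_accessed, user_choice=None):
    """Decide between a full input and an update for one data source.

    If the user made an explicit choice it wins; otherwise an update is
    performed unless this data source is being accessed for the first time.
    """
    if user_choice is not None:
        if user_choice not in ("input", "update"):
            raise ValueError(f"unknown mode: {user_choice!r}")
        return user_choice
    return "update" if source_id in previously_accessed else "input"

seen = {"general_ledger"}
first_time = choose_mode("payroll", seen)
routine = choose_mode("general_ledger", seen)
forced = choose_mode("general_ledger", seen, user_choice="input")
```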
If input is selected 1008, the main process 802 begins by performing a function of one or more drivers to select the directories to search 1010. The driver 804 loaded into or called by the main process 802 will depend on which source data is being accessed. In particular, for a given data source 806a, the main process 802 employs the driver 804a that is structured to accommodate the type of source data 806a. If more than one source of data 806 is accessed, the main process 802 will use whichever drivers 804 are configured for each source data 806. Preferably, the process automatically determines the type of data source based on characteristics such as the names (or "extensions") of the files and/or directories, the number, size, and structure of the files, or other information in the files. In another embodiment, the user is allowed or asked to indicate the type of data source (e.g., by identifying the brand or version number of the software used to create the data source files) or to indicate whether the user wishes the process to search only local disk files or to perform a search that includes network files.
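One simple way to determine the type of data source automatically is to look at file extensions, with a user-supplied indication taking precedence. The sketch below is an assumption-laden illustration: the extension-to-type mapping is invented, and a real implementation would likely also examine file structure and content as the text suggests.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to data-source type.
EXTENSION_TO_TYPE = {".dbf": "dbase", ".mdb": "access", ".dat": "flat_file"}

def detect_source_type(filenames, user_hint=None):
    """Guess the data-source type from the most common recognised extension.

    Returns the user's hint if one is given, or None if nothing is recognised.
    """
    if user_hint is not None:
        return user_hint
    extensions = (PurePosixPath(name).suffix.lower() for name in filenames)
    counts = Counter(EXTENSION_TO_TYPE[ext] for ext in extensions
                     if ext in EXTENSION_TO_TYPE)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

guess = detect_source_type(["LEDGER.DBF", "accounts.dbf", "old.mdb", "readme.txt"])
```

The returned type name can then be used to pick the matching driver from those identified at start-up.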
At the end of step 1010, the main process 802 has access to a stored list of the directories to be searched for all data sources 806 handled by the drivers 804 identified in step 1004. After step 1010, the main process 802 loads, retrieves, or calls another function 902 of the drivers 804a to 804d to search the directories selected in step 1010 for data to be input 1012. The directories are searched 1012 in a manner that depends on how the information is stored in the different data sources 806. For example, for some types of source data, it may be sufficient to identify only files having a certain file name and/or a certain file extension. For other types of source data 806, the data in the various files needs to be scanned to identify files having a certain structure or content (e.g., in a header portion of the file or elsewhere). Thus, the different drivers 804 are structured to provide the "search directory" function 902 in different ways to accommodate the different data sources 806.
If an update 1014 is performed instead of an input 1008, the selection and searching of directories is not necessarily required, since the results of the functions 901, 902 of selecting and searching directories were stored, when the input was performed, in a way that the main process 802 can access later. Thus, using this stored information, the main process 802 can identify previously input or updated data. In one embodiment, it is useful to prevent the loading of redundant data (i.e., data that is already present in the new database 808). Generally, except in the case of a full update, new or changed data (since the last input or update) is identified in step 1016, so that at least some of the data already in database 808 is not reloaded. In one embodiment, to prevent redundant data loads, the system identifies data that has not changed since the last input or update. Generally, if this process is followed, then at the end of the process the data in the new database 808 is synchronized with the information in the source data 806 (i.e., it includes structures and data that accurately represent the data sources 806 in their current state).
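The specification does not say how changed data is recognised in step 1016. One possible technique, swapped in here purely for illustration, is to keep a fingerprint (hash) of each record at the time of the last input or update and to reload only records whose fingerprint is new or different. The record layout and function names are hypothetical.

```python
import hashlib
import json

def fingerprint(record):
    """Stable SHA-256 digest of a record's content."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def changed_records(source, last_seen):
    """Return only the records that are new or differ from the last run.

    source    -- mapping of record key to current record content
    last_seen -- mapping of record key to the fingerprint stored last time
    """
    return {key: record for key, record in source.items()
            if last_seen.get(key) != fingerprint(record)}

source = {1: {"amount": 10.0}, 2: {"amount": 20.0}, 3: {"amount": 30.0}}
last_seen = {
    1: fingerprint({"amount": 10.0}),   # unchanged since last update
    2: fingerprint({"amount": 25.0}),   # edited since last update
}                                       # record 3 is new
to_reload = changed_records(source, last_seen)
```

After the reload, storing `fingerprint(record)` for each processed key prepares the next update.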
In the embodiment of FIG. 10, in order to give the user an opportunity to input or update all of the data, or to choose to prevent the input or update of certain data, an identification of the data to be input or updated is displayed 1018. The way in which the representation of the data to be updated or input is organized for display will depend on the type of data source being accessed, and is provided in response to a call to or initiation of a function 902a in one of the drivers 804. For example, the function 902a of a driver configured for use with the data source of FIGS. 4A-4F may display a list of the subsidiaries in the table 430 (FIG. 4F) to give the user the opportunity to input or update data for some companies but not others. Preferably, the user may select one or more companies from the displayed list. The display and selection steps 1018, 1020 may be repeated for data in other types of organizations or data sources (e.g., to display data that allows selection specific to a certain user 416 or certain credentials 412, depending on how the display function 902a of the driver is written or configured).
As will be described in detail below, in some cases, after the new database 808 is generated, the user may continue to use the original information sources 806a-806d to generate, store, edit, and otherwise view the data. Thus, the present invention is compatible with the databases 806, user interfaces, etc. with which users have long been familiar and with which they would like to continue to work. In this case, however, it is preferable to update the new database 808 regularly to reflect data in the information sources 806a-806d that is new or has been edited since the new database 808 was first created (or last updated). One way to do this is to repeat the entire process of generating database 808 (described below). However, in many cases this is inefficient, because most of the originally created new database 808 has not changed and is still valid. Thus, according to one embodiment of the invention, a process is provided to allow the new database 808 to be updated with only the data from the information sources 806a-806d that has changed since the database 808 was last updated.
The frequency with which database 808 must be updated depends on how frequently database 808 is put to its various uses. Thus, if database 808 is accessed only once a week, updating database 808 every day is pointless. Further, the particular information included in an update may vary depending on the use that is made of the new database 808. Thus, if plant production data is reviewed by management quarterly, that data need not be included in an update performed weekly (e.g., for regional sales). Preferably, one or more profiles defining various types of updates are created and stored, and preferably a schedule is also created and stored. Preferably, different update profiles are scheduled to be executed at different times or intervals, such as by providing a schedule that automatically executes a regional sales update process weekly, for example, prior to a weekly sales meeting, and schedules a quarterly update of production data prior to a quarterly production review.
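Update profiles with their own intervals can be sketched with a small data class. The field names, the interval expressed in days, and the sample dates are assumptions made for illustration; the specification does not define how a profile or schedule is represented.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class UpdateProfile:
    name: str            # e.g. "regional_sales"
    fields: tuple        # which data the profile refreshes (hypothetical names)
    interval_days: int   # how often the profile should run
    last_run: date

    def is_due(self, today):
        return today - self.last_run >= timedelta(days=self.interval_days)

def due_profiles(profiles, today):
    """Names of the profiles whose scheduled interval has elapsed."""
    return [profile.name for profile in profiles if profile.is_due(today)]

profiles = [
    UpdateProfile("regional_sales", ("region", "sales"), 7, date(1996, 1, 1)),
    UpdateProfile("plant_production", ("plant", "units"), 91, date(1996, 1, 1)),
]
due_after_one_week = due_profiles(profiles, date(1996, 1, 8))
due_after_one_quarter = due_profiles(profiles, date(1996, 4, 1))
```

A scheduler would call `due_profiles` periodically, run each listed profile, and then set its `last_run` to the current date.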
Since different people utilizing the system of FIG. 8 may use different portions of the system (e.g., the information sources 806a-806d or the new database 808) to perform daily operations, the profiles may be stored with the new database 822, with the old databases 824a-824d (if there is appropriate write permission for storing information in the information sources 806a-806d), or both.
There are a number of schemes that can be used to create profiles. The system may include commands (such as menu selections or buttons) that allow the user to request the creation of an update profile (e.g., through a predetermined sequence of screen forms or dialog boxes, such as those provided by Microsoft Access Wizards). The user may also be automatically prompted to decide whether to create or edit a profile, for example, whenever the user requests a screen or printed report that uses data for which no update profile is currently defined (or that uses data more frequently than the current profile update frequency for that data). In one embodiment, profile 822 is created by creating a database table that lists each field of database 808 and, for each such field, indicates whether there is an existing profile that updates that field, the name or names of any such profile, and the current update frequency or interval.
In addition to automatically scheduled updates, the user of the system of FIG. 8 is preferably also provided with the option of requesting an update at any time, and preferably with the option of selecting a full update or an update according to any predefined update profile. If desired, the system may prompt the user to determine whether the data set to be updated should be stored as a profile (as described above). The system may also provide a list of profiles that already exist or are defined, so that the user can select among predefined profiles rather than among displayed data (such as a list of subsidiaries).
In some cases, it may be predetermined that all available information from the database is always to be input or updated (a "full update"), in which case the function 902a of the applicable driver may simply return control to the main process 802. For example, with respect to the data source shown in FIGS. 7A to 7D, it may be determined that the available surface temperatures measured at the various locations are always included in each input and update. In one embodiment, a display is provided to the user that indicates the location of the data selected by the user. For example, the directories, sub-directories, and files containing the information to be accessed may be displayed. The user may also be provided with an opportunity to select, as an option, which directories to access.
Once the data to be input or updated has been decided, for example, at steps 1018 and 1020, general information is loaded into the system 1022. If access to information from two or more data sources is desired, then steps 1022 through 1046 may be performed serially (i.e., steps 1022 through 1046 are performed on a first data source using a first driver, followed by steps 1022 through 1046 on a second data source using the appropriate driver, etc.) or in parallel (i.e., each step is performed on every desired data source, using the appropriate driver, before the subsequent step is performed on any data source).
The general information includes information about the structure of the data in the data source. The type of general information loaded in this step 1022 varies according to the type of source data. For example, for a function 903 written or configured for use in conjunction with a database such as that shown in FIGS. 4A through 4F, the general information may include, for example, the account identifications or other categories applied in the data source 806. On the other hand, if the function 903 of the driver 804 is constructed or written for use in conjunction with a data source as shown in FIG. 1, it is likewise necessary to determine how many such categories are used in the data source, but in this case the information would be determined from the multiple flat files 101a through 101f found in the data source 806. The general information may also include information such as how many plans 112, how many products 114 and/or product lines 116, or how many subsidiaries 118 are defined in the database 806. The general information may also include the company name, the first month of the fiscal year, and generally any other information that may be loaded at once (as opposed to, for example, the information loaded in steps 1024, 1033, and 1036, which is generally loaded in a loop). If the "load general information" function 903 is provided in a driver that is configured for use with a data source as shown in FIG. 6, then general information (such as the number of locations 612 in the database) may be loaded in step 1022.
The main process 802 also calls or launches the function 904 of the appropriate driver or module 804 to load the data definitions 1024. The data definitions may include such things as the name of a field, the size of the field, the type of data (string, integer, or decimal), and similar characteristics for the various data stored in the data source 806, as well as an identification number for a particular data category or class. Preferably, loading the data definitions includes the data queries required to obtain the architecture or construction of the information stored in the data source and the representation of the data elements in the data source, as required in generating one or more new databases 808 that include all the structure and data required for the type of report or analysis to be performed on the new database. The query is intelligent in the sense that the data query in the "load data definitions" step can be made against virtually any data source and identifies what is required to store the data source in standard form (e.g., for reporting and analysis). In the example of FIGS. 4A through 4F, the information required to represent the architecture of the source data includes, for example, the names of the four account parts (account, company, employee, and location), as well as the data type (e.g., numeric or string) and the length required to store any string account part. In the example of FIG. 6, the information required to represent the architecture of the data source includes the names of the stored categories (location and date) and the parameter name for these data (units). The query may also identify other optional data that may be loaded (e.g., a number of invoices). The particular type of query being made is determined by the characteristics of the particular data source being analyzed and is thus different for each driver 804.
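A "load data definitions" function can be sketched as a routine that reports, for each field, its name, data type, and required size. The sketch below infers these from sample rows, which is only one possible approach and an assumption of this illustration; a real driver would more likely read them from the data source's own catalog or file headers. The type is taken from the first value seen for each field, a deliberate simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldDefinition:
    name: str
    data_type: str   # "string", "integer" or "decimal"
    size: int        # longest string seen; 0 for numeric fields

def classify(value):
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "decimal"
    return "string"

def load_data_definitions(rows):
    """Derive one FieldDefinition per field from a list of row dictionaries."""
    types, sizes = {}, {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, classify(value))
            if types[name] == "string":
                sizes[name] = max(sizes.get(name, 0), len(str(value)))
    return [FieldDefinition(name, kind, sizes.get(name, 0))
            for name, kind in types.items()]

rows = [
    {"account": "1000", "company": "Acme", "amount": 12.5},
    {"account": "20000", "company": "Zenith Ltd", "amount": 3.0},
]
definitions = load_data_definitions(rows)
```

The resulting list tells the main process how wide each string column must be and which columns are numeric before any table is created.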
Generally, functions implemented in driver 804 perform the steps of loading data or information 1022, 1024, 1033, 1036, while main process 802 performs the steps of saving information 1026, 1028, 1030, 1032, 1034, 1038.
The main process 802 then identifies or creates a database 1026, i.e., identifies or creates a file or other data storage structure in the new database 808, which serves as the location to hold the information loaded from the data source or sources 806. Updates typically do not require the creation of new databases or database tables, as updated data is typically simply added to tables that already exist in the database.
Main process 802 then calls the appropriate function 905 of one or more drivers to create the database tables 1028 that will be used to store data from the source data in the new database 808. The way in which the database tables are created preferably takes into account the data and the structure of the data source, as well as the manner in which the new database 808 will be used (e.g., for analysis, report generation, etc.). Because the particular tables created vary depending on the nature of the information in the data source 806 (e.g., as determined in steps 1022 and 1024), creating the database tables 1028 is a function provided by the driver 804 for the particular data source being accessed. For example, when the "create database tables" function 905 is written or configured for use in conjunction with a data source such as that shown in FIGS. 4A through 4F, the database tables created include, for example, an account table, an employee table, a subsidiary table (which may be a roll-up of accounts), and a detail table (described in detail below), whereas the "create database tables" function written or provided in a driver configured for use in conjunction with FIGS. 7A through 7D may create a location table, a data table, a time table, a unit table, and a detail table. Preferably, the tables created in the new database 808 have a structure that is dynamic in the sense that it can accept any data definition or structure found in the various data sources 806. In one embodiment, the new database 808 is intended primarily for outputting information, such as generating reports and analysis, and is preferably structured to provide optimal output performance (such as high flexibility in the output and in the types of data analysis available, and relatively rapid execution of such analysis and/or output).
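Creating tables whose shape follows whatever categories were found in the data source can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for the new database; the column names (`idx`, `label`, `date`, `description`, `amount`) and the function names are assumptions of this illustration.

```python
import sqlite3

def create_database_tables(conn, categories):
    """Create one category table per category plus a single detail table.

    The detail table receives one index column per category, so its shape
    follows the categories discovered in the data source.
    """
    for name in categories:
        if not name.isidentifier():   # guard: names are spliced into SQL text
            raise ValueError(f"unsafe table name: {name!r}")
        conn.execute(
            f"CREATE TABLE {name} (idx INTEGER PRIMARY KEY, label TEXT NOT NULL)")
    columns = ["idx INTEGER PRIMARY KEY", "date TEXT",
               "description TEXT", "amount REAL"]
    columns += [f"{name}_idx INTEGER REFERENCES {name}(idx)" for name in categories]
    conn.execute(f"CREATE TABLE detail ({', '.join(columns)})")

def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
create_database_tables(conn, ["account", "employee", "subsidiary"])
created = table_names(conn)
detail_columns = [row[1] for row in conn.execute("PRAGMA table_info(detail)")]
```

Passing a different category list, such as location, time, and unit names for a meteorological source, would yield a correspondingly different set of tables from the same function.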
Because of this, the database is optimized in terms of output speed and or flexibility if it provides output speed and flexibility that is superior to the speed and flexibility of some other various possible configurations. For this reason, "optimization" does not necessarily require mathematically exact optimization. In one embodiment, three general types of tables are provided in step 1028: a plurality of tables (including accumulation tables when appropriate), at least one detail table, and at least one entry table. Preferably, a table of one type is provided for each method, wherein specific data points or records can be sorted. For example, if the "create data table" function 905 is provided in driver 804 (which is constructed for use with FIGS. 4A through 4F), then the new database as shown in FIG. 12 will include a plurality of category tables 1202 including, for example, a list of all possible categories of accounts 1203, a list of all possible subsidiaries found in data source 806 1230, a list of all products recorded in data source 806 1232, a list of all employees recorded in data source 806 1216, and a list of various locations, sales areas, etc. recorded in data source 806. In the illustrated embodiment, each record or entry in each category table 1202 is associated with an index for a detail table described below.
In the embodiment shown in FIG. 12, a detail table 1240 is provided which, once populated, has a record for each transaction entered or stored in the data source or sources 806. In the embodiment of FIG. 12, fields are provided for the transaction information 1244 (i.e., data field 1242b, description field 1242c, volume field 1242d, and number field 1242e). An index field 1242a is provided to store an identification number or index number of each record. Further, for each record, a separate field is provided to store an indication of the appropriate entry for each category defined in the load general information step 1022, including, in the example of FIG. 12, an account category 1242f, a subsidiary category 1242g, a product category 1242h, and an employee category 1242i. In general, it is desirable to provide as many different fields (i.e., categories) as are needed to analyze or output the data that appear in the data source 806. Thus, an account category is useful because it is desired to output reports in which transactions are classified by account. Likewise, where accounting purposes require a separate report for each subsidiary, or a report in which transactions are classified by subsidiary, it is useful to have a subsidiary category 1242g. Generally, a separate field may be provided in the detail table 1240 for each desired way of selecting, grouping, reporting, printing, or analyzing data. The structure of the database shown in FIG. 12 contrasts sharply with the structure of the data source shown in FIGS. 4A through 4F and the structure of the data source shown in FIG. 1. For example, in the structure shown in FIGS. 4A through 4F, the manner in which a particular transaction (FIG. 4A) is associated with a particular account (FIG. 4B) is recorded in a separate link table (FIG. 4D), while in the embodiment of FIG. 12, the index of the appropriate account 1242f is stored in its own field in the same record that includes the transaction information 1244.
Thus, while database 808 with tables as shown in FIG. 12 may store information found in a database as shown in FIG. 1 or databases as shown in FIGS. 4A through 4F, the structure and architecture of the database in the example of FIG. 12 is different from that of the database as shown in FIG. 1 or the data sources as shown in FIGS. 4A through 4F. Similarly, the relational database structure of FIG. 12 is different from the flat file structure shown in FIG. 1, although the kinds of information stored in the two organizations are similar.
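The contrast between a separate link table and an index stored in the detail record can be illustrated with a small sketch. The following Python example uses the standard sqlite3 module and is only an approximation of the FIG. 12 layout: the table names, column names, and sample values are assumptions made for illustration and do not come from the figures.

```python
import sqlite3

# Hypothetical, simplified version of the FIG. 12 layout: two category
# tables and one detail table. All table and column names are assumptions.
SCHEMA = """
CREATE TABLE account  (idx INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (idx INTEGER PRIMARY KEY, name TEXT, location_code TEXT);
CREATE TABLE detail (
    idx          INTEGER PRIMARY KEY,
    description  TEXT,
    amount       REAL,
    account_idx  INTEGER REFERENCES account(idx),
    employee_idx INTEGER REFERENCES employee(idx)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO account VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'A. Smith', 'NW')")
# The account and employee indexes live in the detail record itself,
# so no separate link table is needed to resolve them.
conn.execute("INSERT INTO detail VALUES (1, 'Widget order', 250.0, 1, 1)")

# Resolve the category indexes back to names for a report row.
rows = conn.execute(
    """
    SELECT a.name, e.name, d.amount
    FROM detail AS d
    JOIN account  AS a ON a.idx = d.account_idx
    JOIN employee AS e ON e.idx = d.employee_idx
    """
).fetchall()
print(rows)
```

Because each detail record carries its own category indexes, a report grouped by account or by employee needs only a direct join on the detail table.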
The main process 802 may be configured to save the general information 1030 (loaded in step 1022) and save the data definitions 1032 (loaded in step 1024), for example, in additional tables provided in the database 808 (e.g., for use in later steps of FIG. 10 and/or in later updates). The main process 802 then loads data definition codes (e.g., field lengths, data types) for the various tables created in step 1028 into the new database 808 (step 1033), using the appropriate function 908 of one or more drivers 804. In one embodiment, the accumulation information is also loaded at this time. In general, accumulation information refers to information that defines a sub-category of data (i.e., a grouping of items in a category table). As an example, as shown in FIG. 12, the employee category table may be associated with a location or area code (e.g., identifying the location or area where each company employee is located or for which the employee is responsible). For example, a company may have a number of salespeople, each associated with a sales area. As another example, the various products of a company may be accumulated or aggregated into product lines. By defining a field 1238 for the location accumulation code, the structure of FIG. 12 makes reports sorted by sales region feasible. Accumulations can also be used to provide statistical analysis (e.g., mean, standard deviation, etc.) of the grouped data. Although in the example of FIG. 12 the location accumulation code field 1238 is shown as a field of the employee category table 1216, the location field 1238 may instead be provided in the detail table if desired.
The way in which the load accumulation code function 908 operates varies depending on the type of source data 806 for which the function is written, and the function is provided as part of the various drivers 804 so that different programming instructions can be provided for use with different types of source data. As an example, a "define accumulation code" function 908 may be provided in a driver 804 configured for use in conjunction with a database as shown in FIGS. 4A through 4F. In this example, a location field is already defined in the employee table 416 and may be used directly for location codes, whereas in the embodiment shown in FIG. 1 there is no indication of the location of the employee 120 associated with a particular flat file 101a. Thus, in one embodiment, the location accumulation cannot be obtained from the data source of FIG. 1. However, if there is another file that provides, for example, the home address of each employee in the company, then the home address of each salesperson can be used to determine the sales area for which the salesperson is responsible, and thus, in theory, location accumulation codes can be defined. Further, the "define accumulation code" function may include accessing other information that can be used to define the accumulation code. For example, in connection with the data source shown in FIG. 6, the "define accumulation code" function 908 may include determining, for each temperature station 612, whether the station is in the northern hemisphere or the southern hemisphere, and may create a hemisphere code on that basis. In some cases, it may be desirable to provide word recognition and/or keyword searching in the database to define additional accumulations and/or structures.
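As a rough illustration of deriving an accumulation code that the source itself does not store, the following Python sketch assigns a hemisphere code to each station. It assumes, purely for illustration, that each station record carries a latitude value; the field names, station names, and the function name hemisphere_code are hypothetical and are not taken from FIG. 6.

```python
def hemisphere_code(latitude_deg):
    """Return 'N' for stations on or north of the equator, otherwise 'S'."""
    return "N" if latitude_deg >= 0 else "S"


# Hypothetical station records; the 'latitude' field is an assumption.
stations = [
    {"station": "Oslo", "latitude": 59.9},
    {"station": "Sydney", "latitude": -33.9},
]

# Attach the derived accumulation code to every station record.
for record in stations:
    record["hemisphere"] = hemisphere_code(record["latitude"])

codes = [record["hemisphere"] for record in stations]
print(codes)
```

A driver written for a different source could derive its codes from other available information, such as the employee addresses mentioned above.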
In some cases, the accumulation code is associated with information that is not used in the data source as a basis for analyzing or classifying the data (e.g., the hemisphere accumulation code described for the data source of FIG. 6). In these cases, providing the accumulation code amounts to providing enhanced data: the system automatically supplies additional elements for displaying or analyzing the information that are unavailable (or at least not utilized) in the data source 806. Preferably, the categories into which the data is classified are identified by in-depth analysis of the data source.
The system may be configured to identify and add certain reporting relationships that can be determined, with some degree of certainty, from the data or the structure of the data in the information sources 806a-806f. For example, sales information may have sales organized by sales region (either represented explicitly in information sources 806a-806d or inferred, for example, from salesperson addresses as described above). According to embodiments of the present invention, these regions may also be viewed or analyzed by other groupings (e.g., two or more vice presidents, each responsible for two or more sales regions), where the relationship of the vice presidents to the sales regions can be found explicitly or inferred from the information sources 806a-806d.
Preferably, such additional accumulations or enhancements are defined only if the data indicates that it makes sense to view the data according to the accumulation criteria. For example, in one embodiment, no enhancement or additional relationship is automatically added unless there is more than one value for a particular field or parameter (but preferably fewer values than the total number of entries for that field or parameter). For example, if at least two different vice presidents are responsible for different regions, it makes sense to view sales according to the sales volume corresponding to each vice president. However, in the example of FIGS. 6-7, if all reporting locations are in tropical regions, the system does not automatically create an accumulation or enhancement for viewing weather data by type or amount of snowfall, since in this example all station reports include zero for snowfall. Thus, for a business database, the system preferably examines the data in the source databases 806a-806d to determine, for example, whether there are multiple salespeople, multiple regions, and/or multiple products. If a field holds multiple values across the various records of the database, the system may be configured to provide an option to view the average or sum of the data (e.g., the average or sum of sales, overhead, etc.) grouped by that field.
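The test for whether an accumulation is worth defining can be sketched in a few lines of Python. This is a minimal sketch under a stated assumption: the upper bound is read here as the number of records, so a field qualifies only when it has more than one distinct value but fewer distinct values than records. The function name and sample fields are hypothetical.

```python
def worth_accumulating(records, field):
    """True when `field` has more than one distinct value, but fewer
    distinct values than there are records (assumed interpretation)."""
    distinct = {record[field] for record in records}
    return 1 < len(distinct) < len(records)


# Hypothetical sales records.
sales = [
    {"id": 1, "vice_president": "Lee", "snowfall": 0},
    {"id": 2, "vice_president": "Lee", "snowfall": 0},
    {"id": 3, "vice_president": "Kim", "snowfall": 0},
]

by_vp = worth_accumulating(sales, "vice_president")  # two values, three records
by_snow = worth_accumulating(sales, "snowfall")      # a single value everywhere
by_id = worth_accumulating(sales, "id")              # unique per record
print(by_vp, by_snow, by_id)
```

Under this reading, a field that is constant (snowfall in a tropical data set) or unique to every record (an identifier) yields no useful grouping and is skipped.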
In the example of FIGS. 4A-4E, the table of FIG. 4D may be used to determine which employees are associated with any given transaction. The table of FIG. 4C is used to determine, for a given employee, whether the employee is associated with the northwest region, the southwest region, or the central region. If all of the transactions are associated with employees in the central region, there is no need to display transactions by region. However, if the transactions in the table of FIG. 4D are associated with employees in at least two different regions, then the system may be configured to automatically generate an accumulation that displays the average or sum of the transactions (and/or of the transactions of a particular account type as determined from FIG. 4B), broken down by the region of the employee associated with each transaction.
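The region breakdown described above might look roughly like the following Python sketch. The data structures are assumptions: a list of transactions carrying an employee name and an amount, and a mapping from employee to region standing in for the tables of FIGS. 4C and 4D. The function returns an empty result when every transaction falls in a single region, since no breakdown is warranted in that case.

```python
from collections import defaultdict


def totals_by_region(transactions, employee_region):
    """Sum transaction amounts per employee region; return {} when all
    transactions belong to one region (no accumulation is generated)."""
    totals = defaultdict(float)
    for transaction in transactions:
        region = employee_region[transaction["employee"]]
        totals[region] += transaction["amount"]
    return dict(totals) if len(totals) >= 2 else {}


# Hypothetical stand-ins for the employee/region and transaction tables.
employee_region = {"Smith": "northwest", "Jones": "central"}
transactions = [
    {"employee": "Smith", "amount": 100.0},
    {"employee": "Jones", "amount": 40.0},
    {"employee": "Smith", "amount": 60.0},
]

mixed = totals_by_region(transactions, employee_region)
single = totals_by_region(transactions[1:2], employee_region)
print(mixed, single)
```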
In addition to defining accumulations, the process may also store optional reference fields. In general, an optional reference field is a field that is not generally used for grouping data, such as a free-text field (notes, memo fields, number of votes, etc.), but that may be wanted for inclusion in a report or the like.
After the data definition codes and accumulations are loaded, they are stored in the new database 808 (e.g., providing the listing of categories in the various category tables 1202). The loading and storing of data definition and accumulation codes 1033, 1034 are executed in a loop, one pass per category (in the example of FIGS. 4A-4F, for accounts, companies, employees, and locations).
In review, according to the illustrated embodiment, step 1024 defines the categories (e.g., account, company, employee, and location) and their data types (e.g., string or number). Step 1028 creates the category tables defined in step 1024 (and other tables defined in step 1022). Step 1032 stores the data definitions in a criteria table. Steps 1033 and 1034 load and save the data definition and accumulation codes.
At this point in the process, although the data structure has been established in the new database 808, the data that is the subject of the source data 806 (e.g., accounting entries or transactions in the case of accounting source data, temperature data in the case of meteorological source data) has not yet been loaded into the new database 808. Accordingly, the main process 802 invokes a function 909 in the appropriate driver 804 to load data 1036 and save data 1038, and repeats the process 1039 until all required data has been loaded and saved 1040. Thus, at the end of steps 1036, 1038, and 1040, data from the one or more data sources 806 has been stored in the new database 808.
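The load, save, and repeat cycle can be sketched as a simple loop in which the main process repeatedly asks a driver for the next batch of data. The class ListDriver, its method names, and the batch size are hypothetical stand-ins for a real driver; the step numbers in the comments refer to the steps named above.

```python
class ListDriver:
    """Hypothetical driver that serves rows from an in-memory list in batches."""

    def __init__(self, source_rows, batch_size=2):
        self._rows = list(source_rows)
        self._position = 0
        self._batch_size = batch_size

    def load_data(self):
        """Return the next batch of source rows, or an empty list when done."""
        batch = self._rows[self._position:self._position + self._batch_size]
        self._position += len(batch)
        return batch

    def save_data(self, batch, database):
        """Append the batch to the target store (a plain list in this sketch)."""
        database.extend(batch)


def run_load(driver, database):
    """Main-process loop: load, save, and repeat until no data remains."""
    while True:
        batch = driver.load_data()          # load data (step 1036)
        if not batch:                       # all data loaded (step 1040)
            return database
        driver.save_data(batch, database)   # save data (step 1038)


loaded = run_load(ListDriver([10, 20, 30, 40, 50]), [])
print(loaded)
```

Keeping the loop in the main process and the load and save logic in the driver means a new source type only requires a new driver, not a change to the loop.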
When two or more data sources having different structures and/or generated using different brands or types of software are combined using the process of FIG. 10, data from the different types of sources can be given a common database structure (e.g., as shown in FIG. 12). This facilitates unified or standardized analysis and reporting of the data, preferably optimized for flexibility and speed of output.
In the example of FIG. 10, the main process 802 may now establish and, if desired, execute data queries (such as summary query 1042). Generally, three types of queries can be established. A first type of query may be provided that is common to all new databases 808 created using the process of FIG. 10, such as a query that returns the number of entries in the detail table or the number of entries in a given date range (e.g., a quarter). A second type of query may be established that depends, at least in part, on the general information for one or more data sources 806 (including any accumulations that have been provided); such a query may, if desired, be provided as part of the driver 804 for a particular data source. A third type of query may be provided to replicate or include a query or report used in the original data source (e.g., as shown in FIG. 4E).
After the new database 808 has been populated, according to one embodiment, the system automatically performs an audit or verification check to confirm that the system is functioning properly. Various checks may be included. Data samples (or, if desired, all data) in the new database 808 can be compared to the corresponding data in the data sources 806 to verify that the data has not been corrupted. The new database 808 may be checked to verify that the desired structure exists. For example, in the case of an accounting database, the new database 808 may be automatically checked to determine whether the required accounting portion is present. The new database 808 may also be checked for empty groups (such as by determining that each accumulation or other defined enhancement is non-empty).
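Two of the checks described above, comparing sampled records against the source and detecting empty accumulations, can be sketched as follows. This is a simplified illustration, not the patented procedure: records are modeled as dictionaries keyed by record number, and the function name, parameters, and message wording are all assumptions.

```python
import random


def audit(source, new_db, accumulations, sample_size=3, seed=0):
    """Return a list of problems found by two simple checks:
    sampled records must match the source, and no accumulation may be empty."""
    problems = []
    keys = sorted(source)
    sample = random.Random(seed).sample(keys, min(sample_size, len(keys)))
    for key in sorted(sample):
        if new_db.get(key) != source[key]:
            problems.append(f"record {key} differs from source")
    for name, members in sorted(accumulations.items()):
        if not members:
            problems.append(f"accumulation {name!r} is empty")
    return problems


# Hypothetical source data and two candidate copies of it.
source = {1: "a", 2: "b", 3: "c"}
clean = audit(source, dict(source), {"NW": [1, 2]})
bad = audit(source, {1: "a", 2: "X", 3: "c"}, {"NW": [1], "SW": []})
print(clean)
print(bad)
```

With a sample size equal to the number of records, every record is compared; a smaller sample trades completeness for speed.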
Once the new database 808 is populated and verified, and the appropriate queries are established, the main process 802 may close the tables and database 1044 and unload the dynamic drivers 1046 (e.g., freeing memory).
While the primary use of the present invention is in connection with providing standardized and/or enhanced reporting and analysis of data in one or more data sources, the present invention may also be employed in connection with data entry and data storage using database management systems (e.g., Microsoft Access, Excel, FoxPro, Btrieve, etc.). While it is contemplated that the primary use of the present invention includes continuing to use the original source data 806 for input and storage while maintaining a copy of the same information in the new database 808 for reporting and analysis purposes, the present invention may also be employed to transfer data from one data source 806a, 806b to another (e.g., 806c) by first storing it in the new database 808 as described above, and thereafter downloading or exporting the information from the new database 808 to a different type of data source 806c.
From the above description, a number of advantages of the present invention can be seen. The present invention facilitates standardized reports and analysis regardless of the brand and variety of data sources used. The present invention provides a system that can be optimized for, or provide improved performance of, output or reporting of data. The present invention provides data reporting and analysis capabilities that are enhanced compared to the data reporting and analysis of the data source. By thoroughly interrogating the source data, the present invention can reflect the certification tables established in the data source by the accounting system. In one embodiment, the process extracts some or all of the defined accumulations, optional reference fields, and accounting period information. By doing this automatically and eliminating (or reducing) the need for manual analysis, the present invention is less labor- and time-intensive than previous methods, and in some cases a new database 808 that would take days or weeks to complete using previous methods may be provided within minutes or hours. In one embodiment, the driver 804 may be configured to detect, analyze, and maintain in the new database 808 any security settings, passwords, permissions, etc. used in the data source 806. Thus, the system administrator does not have to maintain a new or separate set of accounts, passwords, permissions, etc. for the new database 808 apart from those of the original data source 806. Preferably, the system can be configured to perform substantial updates at predetermined intervals (such as daily, weekly, etc.).
Many variations and modifications of the invention may be used. Some aspects of the invention may be employed without using other aspects. For example, a new database 808 may be provided without defining new or additional accumulations. While in the above description the various drivers 804 are provided as separate DLL files, and the drivers 804 are dynamic in the sense that as many as needed can be added simply by storing additional DLL files in the appropriate directory, the invention may also be made operable where the functions performed by the drivers are provided as part of, or as subroutines called by, the main process 802 (rather than as separately stored modules).
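The idea of adding drivers by storing additional files, without changing the main process, has a loose analogue in Python's standard importlib machinery. The sketch below is an analogy only, not the DLL mechanism of the description: the file naming pattern, the SOURCE_TYPE attribute, and the load_drivers function are all hypothetical.

```python
import importlib.util
import pathlib
import tempfile


def load_drivers(directory):
    """Import every *_driver.py file in `directory`, keyed by its SOURCE_TYPE."""
    drivers = {}
    for path in sorted(pathlib.Path(directory).glob("*_driver.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        drivers[module.SOURCE_TYPE] = module
    return drivers


# Demonstration: drop one driver file into a scratch directory and load it.
with tempfile.TemporaryDirectory() as scratch:
    pathlib.Path(scratch, "flatfile_driver.py").write_text(
        "SOURCE_TYPE = 'flat_file'\n"
        "def load_data(source):\n"
        "    return list(source)\n"
    )
    drivers = load_drivers(scratch)

found = sorted(drivers)
sample = drivers["flat_file"].load_data(("a", "b"))
print(found, sample)
```

Supporting another source type then amounts to placing one more driver file in the directory; the loader itself is unchanged.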
While the present invention has been described in terms of the preferred embodiment(s) and certain variations and modifications therein, other variations and modifications may be made thereto, and the invention is limited only by the following claims.
Claims (36)
1. A computer-implemented method, comprising:
providing a first driver that issues instructions for accessing data that may be stored in a first or second different data source, the first driver containing program instructions for use in conjunction with the first data source; and
using said first driver to automatically obtain information about the data structure of said first data source without manual analysis of said first data source, said information about the data structure of said first data source resulting in optimization of a new database when said new database is formed in which information about said first database is to be stored.
2. A computer-implemented method usable in connection with accessing data potentially stored in first and second different data sources, the method comprising the steps of:
a first step of providing a first driver comprising program instructions for use in conjunction with the first data source;
a second step of automatically obtaining information about the data structure of the first data source using the first driver without manual analysis of the first data source;
storing at least some information from the first data source in a first database using the first driver;
wherein the first database is augmented relative to the first data source.
3. The method of claim 2, further comprising:
defining the structure of the first database using the first information.
4. The method of claim 2, wherein the first driver includes programming code that is callable by a primary process, and wherein the method is extensible without substantial changes to the primary process to accommodate the second data source by providing and invoking a second driver that contains program instructions for use in conjunction with the second data source.
5. The method of claim 2, wherein the method flexibly allows access to any readable data source by providing a driver for use in conjunction with such readable data source.
6. A computer-implemented method for use in conjunction with data potentially stored in first and second distinct data sources, at least one of the first and second data sources being for generating at least a first output, the method comprising the steps of:
providing a first driver comprising program instructions for use in conjunction with the first data source;
providing a second driver, different from said first driver, containing programming code for use in conjunction with said second data source;
applying said first and second drivers to obtain first and second information about data structures of said first and second data sources, respectively;
using the first and second information to define a structure of a first database;
storing at least some information from the first and second data sources in the first database using the first and second drivers; and
generating at least a first report according to the information in the first database.
7. The method of claim 6, wherein the first report is enhanced relative to the first output.
8. The method of claim 6, wherein each driver includes program instructions for performing a plurality of functions.
9. The method of claim 8, wherein the plurality of functions includes at least one function selected from the group consisting of:
selecting a directory where the data source is located;
searching a directory of data files;
displaying data to be input and updated;
loading general information from the data source;
loading a data definition from the data source;
creating a data table for storing at least some information of the data source;
storing general information obtained from the data source;
storing a data definition from the data source;
loading a data definition code into said first database;
loading accumulated information into said first database; and
loading data from the data source into the first database.
10. The method of claim 6, wherein the first database is optimized for speed of data output.
11. The method of claim 6, wherein the first database is optimized for flexibility in data output.
12. The method of claim 6, wherein the first database comprises a plurality of category tables, at least one detail table, and at least one entry table.
13. An apparatus for use in connection with accessing data potentially stored in first and second different data sources, the apparatus comprising:
first drive means comprising program instructions for use in conjunction with said first data source;
a second driver, different from said first driver, containing programming code for use in conjunction with said second data source;
means for employing said first and second driving means to obtain first and second information about the data structure of said first and second data sources, respectively;
means for using said first and second information to define a structure of a first database;
means for storing at least some information from said first and second data sources in said first database using said first and second drive means;
wherein the first information results in an optimization of the first database.
14. A computer-implemented method, comprising the steps of:
providing a first driver that issues instructions for accessing data stored in a first data source;
using the first driver to obtain first information about a data structure of the first data source;
applying the first information to define a structure of a first database;
employing the first driver to store at least some information from the first data source in the first database;
storing second information defining at least a portion of the data stored in the first data source to be used to update the first database; and
updating the first database using the second information;
wherein the first information results in an optimization of the first database.
15. The method of claim 6, wherein at least one of the first and second data sources comprises a plurality of records, each record having a plurality of fields for storing data values, and the method further comprises:
identifying at least one field, wherein at least first and second different data values are stored in the one field of at least two records; and
displaying first and second data sets corresponding to the first and second different data values.
16. The method of claim 6, further comprising the steps of: analyzing information stored in the first database to identify errors.
17. The method of claim 2, further comprising the steps of: storing data from the data source in the first database.
18. The method of claim 2, further comprising a step for validating the information.
19. The method of claim 17, further comprising the steps of: changing at least some of the data stored in the first database to provide changed data, and writing at least some of the changed data back to the data source.
20. The method of claim 17, further comprising the steps of: creating a second database and storing at least some of the data stored in the first database in the second database.
21. The method of claim 20, wherein said steps for creating said first and second databases comprise the step of: creating database tables in view of the manner in which the first and second databases will be used.
22. The method of claim 20, wherein at least one of said first and said second databases is enhanced with respect to said first data source.
23. A computer-implemented method usable in connection with accessing data potentially stored in a first data source, wherein the first data source is operable to generate at least a first output, the method comprising:
providing a first driver comprising program instructions for use in conjunction with the first data source;
obtaining first information about a data structure of the first data source by automatically accessing information content stored in the first data source using the first driver;
defining a structure of a first database distinct from the first data source using the first information, wherein the first database does not exist prior to the step of obtaining the first information with the first driver;
storing at least some information from the first data source in the first database with the first driver;
updating said first database with less than all of the information in said first data source, said updating step being performed after said step of storing at least some of the information from said first data source in said first database with said first driver;
wherein the first information results in an optimization of the first database.
24. The method of claim 23, wherein the updating step uses at least some of the first information.
25. The method of claim 23, further comprising the steps of: periodically repeating the updating step, wherein data in the first data source is synchronized with data in the first database.
26. The method of claim 23, further comprising the steps of: creating a second database and storing at least some of the data stored in the first database in the second database.
27. The method of claim 26, further comprising:
generating at least a first report according to the information in at least one of the first database and the second database.
28. The method of claim 27, wherein the first report is enhanced relative to the first output.
29. A computer-implemented method, characterized in that the method comprises the steps of:
providing a first driver that issues instructions for accessing data stored in a first data source, the first driver containing program instructions for use in conjunction with the first data source;
automatically obtaining first information related to the first data source with the first driver without manual analysis of the first data source;
creating at least a first database for storing at least some data from the first data source, the first database being based on at least some of the first information;
creating at least a second and a third database containing information from said first database, wherein said second and third databases are different from each other;
wherein the first information results in an optimization of the first database.
30. An apparatus usable in connection with computer-implemented access to data potentially stored in either of a first or second disparate data source, the apparatus comprising:
a first driver including program instructions for use in conjunction with said first data source, said first driver not being used in conjunction with said second data source;
means for automatically obtaining first information about a data structure of the first data source with the first driving means without manual analysis of the first data source by automatically accessing information content stored in the first data source;
means for defining a structure of a first database distinct from said data source with said first information, wherein said first database does not exist prior to automatically obtaining said first information with said first drive means;
wherein the first information results in an optimization of the first database.
31. The apparatus of claim 30, wherein the first database is enhanced with respect to the first data source.
32. The apparatus of claim 30, wherein the first driver includes programming code that is callable by a main process, and the apparatus is extensible to accommodate the second data source through a second driver that contains program instructions for use with the second data source, but the second driver is not used in conjunction with the first data source, and the main process is not substantially altered.
33. The apparatus of claim 32, wherein each drive means comprises program instructions for performing a plurality of functions.
34. The apparatus of claim 33, wherein the plurality of functions includes at least one function selected from the following:
selecting a directory where the data source is located;
searching a directory of data files;
displaying data to be input or updated;
loading general information from the data source;
loading a data definition from the data source;
creating a database table for storing at least some information from the data source;
storing general information obtained from the data source;
storing a data definition from the data source;
loading data definition code into said first database;
loading accumulated information into said first database; and
loading data from the data source into the first database.
35. The apparatus of claim 30, wherein the first information about the data structure includes information from a query stored in the first data source.
36. The apparatus of claim 30, wherein said first data source stores accounting information, and said information about a data structure of said first data source includes an identification of an accounting portion.
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US58206295A | 1995-12-30 | 1995-12-30 | |
| US08/582,062 | 1996-01-02 | | |
| US08/593,118 | 1996-02-01 | | |
| US08/593,118 US5802511A (en) | 1996-01-02 | 1996-02-01 | Data retrieval method and apparatus with multiple source capability |
| PCT/US1996/020366 WO1997024658A1 (en) | 1995-12-30 | 1996-12-20 | Data retrieval method and apparatus with multiple source capability |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1018328A1 (en) | 1999-12-17 |
| HK1018328B (en) | 2005-01-07 |