US20090070601A1 - Method and apparatus for recursively analyzing log file data in a network - Google Patents
- Publication number
- US20090070601A1 US12/268,437 US26843708A
- Authority
- US
- United States
- Prior art keywords
- entries
- addresses
- stream
- results
- streams
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/065—Generation of reports related to network devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A method and apparatus for processing log data produced by a network are described. In one example, entries in the log data are filtered using a plurality of filters to select first entries from the entries. The first entries are filtered using a plurality of false positive filters associated with the plurality of filters to select second entries from the first entries. Unique IP addresses are identified in the second entries. The entries in the log data are then filtered using the unique IP addresses to select third entries. The third entries are analyzed to detect one or more patterns.
Description
- This application is a continuation of co-pending U.S. Ser. No. 11/301,389 filed on Dec. 13, 2005, which is currently allowed, the contents of which are incorporated herein by reference.
- 1. Field of the Invention
- Embodiments of the present invention generally relate to data mining and, more particularly, to a method and apparatus for recursively analyzing log file data in a network.
- 2. Description of the Related Art
- Presently, elements in a network employ various data logging processes to automatically record events within a certain scope in order to provide an audit trail. The network operator may use the audit trail for various purposes, such as diagnosing problems, tracking network access among users, and the like. In particular, a server in a network typically creates and maintains one or more server log files that contain a record of activity performed by the server for client devices. A typical example is a web server that maintains a history of requests received from client devices for web content. The data in a log file may be analyzed to obtain various types of statistics related to the activity of the particular network element.
- In one type of log file analysis, network operators track security-related statistics, such as monitoring Internet access by client devices to detect requests for inappropriate or illicit content. Such Internet access monitoring is typically employed in an enterprise setting. Conventional analysis tools for detecting inappropriate Internet use rely on detecting particular words or phrases in log file entries indicative of content that has been deemed inappropriate or illicit for the particular environment. Entries containing such words or phrases are copied and stored in a result file. Such analysis tools, however, generate a substantial number of false matches. In addition, the result file includes an arbitrary sequence of entries without any useful organization of data. Accordingly, there exists a need in the art for an improved method and apparatus for analyzing log file data in a network.
- A method and apparatus for processing log data produced by a network are described. In one embodiment, entries in the log data are filtered using a plurality of filters to select first entries from the entries. The first entries are filtered using a plurality of false positive filters associated with the plurality of filters to select second entries from the first entries. Unique IP addresses are identified in the second entries. The entries in the log data are then filtered using the unique IP addresses to select third entries. The third entries are analyzed to detect one or more patterns.
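- The sequence of filtering passes summarized above may be illustrated by the following minimal Python sketch. It is not taken from the application and makes several assumptions: each entry is a text line that begins with the client IP address (as in the common log format), matching is case-insensitive, and the names `FILTERS`, `first_pass`, `drop_false_positives`, `unique_ips`, and `entries_for_ips` are hypothetical. The keyword "virgin" and its false positive "Virginia" follow the example given later in the description.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical filter table: each first-pass keyword maps to the
# false-positive strings associated with it (all lower case).
FILTERS: Dict[str, List[str]] = {"virgin": ["virginia"]}

# Assumption: every entry starts with a dotted-quad client IP address.
IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\b")


def first_pass(lines: Iterable[str], filters: Dict[str, List[str]]) -> List[str]:
    """Select the first entries: lines containing any filter keyword."""
    return [ln for ln in lines if any(k in ln.lower() for k in filters)]


def drop_false_positives(lines: Iterable[str], filters: Dict[str, List[str]]) -> List[str]:
    """Select the second entries: remove lines containing a known false positive."""
    false_positives = [fp for group in filters.values() for fp in group]
    return [ln for ln in lines if not any(fp in ln.lower() for fp in false_positives)]


def unique_ips(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct leading IP addresses of the given entries."""
    found = set()
    for ln in lines:
        match = IP_RE.match(ln)
        if match:
            found.add(match.group(1))
    return found


def entries_for_ips(lines: Iterable[str], ips: Set[str]) -> List[str]:
    """Select the third entries: every line whose leading IP is of interest."""
    selected = []
    for ln in lines:
        match = IP_RE.match(ln)
        if match and match.group(1) in ips:
            selected.append(ln)
    return selected


# Made-up sample log data for illustration only.
LOG = [
    '10.0.0.5 - - [6/Dec/2005:14:32:15 -0600] "GET /virgin.html HTTP/1.0" 200 2326',
    '10.0.0.7 - - [6/Dec/2005:14:33:02 -0600] "GET /travel/virginia.html HTTP/1.0" 200 512',
    '10.0.0.5 - - [6/Dec/2005:14:35:40 -0600] "GET /news.html HTTP/1.0" 200 1024',
    '10.0.0.9 - - [6/Dec/2005:14:36:11 -0600] "GET /weather.html HTTP/1.0" 200 2048',
]

first = first_pass(LOG, FILTERS)
second = drop_false_positives(first, FILTERS)
ips = unique_ips(second)
third = entries_for_ips(LOG, ips)
```

- In this sketch the third entries are drawn from the complete log rather than from the second entries, so that all recorded activity of a flagged address becomes available for pattern analysis.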
- So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
- FIG. 1 is a block diagram depicting an exemplary network architecture in accordance with one or more aspects of the invention;
- FIG. 2 is a flow diagram depicting an exemplary embodiment of a method for processing log data produced by a network device in accordance with one or more aspects of the invention;
- FIG. 3 is a flow diagram depicting an exemplary embodiment of a method for processing log data produced by a network device in accordance with one or more aspects of the invention; and
- FIG. 4 is a block diagram depicting an exemplary embodiment of a computer suitable for implementing the processes and methods described herein.
- FIG. 1 is a block diagram depicting an exemplary network architecture 100 in accordance with one or more aspects of the invention. The network architecture 100 includes a local network 102, client devices 104-1 through 104-N (collectively client devices 104), a network device 106, and a computer 114, where N is an integer greater than zero. The client devices 104, the network device 106, and the computer 114 are coupled to the local network 102. The network device 106 is configured for communication with a wide area network 112. In the present embodiment, the wide area network 112 is the Internet.
- The local network 102 comprises any type of packet transport network known in the art (e.g., an Ethernet network). Data may be communicated over the local network 102 using any type of protocol, such as transmission control protocol/internet protocol (TCP/IP). The client devices 104 may include desktop computers, workstations, and the like. The client devices 104 are configured to access the wide area network 112 via the network device 106. The network device 106 may be a server configured to provide a gateway to the wide area network 112 for the client devices 104. For example, the network device 106 may comprise a web server, proxy server, or the like. The network device 106 is configured to maintain one or more log files as log data 110. A log file includes entries that list actions that have occurred with respect to the network device 106. For example, a web server maintains log files listing every request made to the server for access to the Internet.
- Each log file in the log data 110 includes a list of entries, where each entry includes various fields having particular values associated with particular requests handled by the network device 106. In particular, an entry includes a field indicative of a request made by a client device and an IP address associated with the request. The request field includes a character string indicative of a particular file or object requested. Each log file in the log data 110 may be formatted using any type of format known in the art. One exemplary type of format is known as the common log format (CLF). A CLF entry includes fields for a remote host, a remote login name of a user, the username under which the user has authenticated, the date and time of a request, the request itself, the status of the request, and the number of bytes transferred during the request.
- For example, an entry in a log file having a CLF format may be:
- 127.0.0.1 - frank [6/Dec/2005:14:32:15 -0600] "GET /picture.gif HTTP/1.0" 200 2326
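A CLF entry of this kind may be split into its fields with a regular expression, as in the following minimal Python sketch. The sketch is illustrative only and is not part of the application: it assumes the fields are separated by single spaces as in standard CLF, and the names `CLF_PATTERN`, `ClfEntry`, and `parse_clf` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for one CLF entry:
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


class ClfEntry(NamedTuple):
    host: str             # remote host (client IP address)
    ident: Optional[str]  # remote login name, None when logged as "-"
    user: Optional[str]   # authenticated username, None when logged as "-"
    time: str             # date and time, kept as the raw string
    request: str          # request line, e.g. 'GET /picture.gif HTTP/1.0'
    status: int           # status code returned to the client
    size: int             # bytes transferred, 0 when logged as "-"


def parse_clf(line: str) -> Optional[ClfEntry]:
    """Return the parsed entry, or None if the line is not in CLF."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None

    def optional(value: str) -> Optional[str]:
        return None if value == "-" else value

    size = match.group("size")
    return ClfEntry(
        host=match.group("host"),
        ident=optional(match.group("ident")),
        user=optional(match.group("user")),
        time=match.group("time"),
        request=match.group("request"),
        status=int(match.group("status")),
        size=0 if size == "-" else int(size),
    )


entry = parse_clf(
    '127.0.0.1 - frank [6/Dec/2005:14:32:15 -0600] '
    '"GET /picture.gif HTTP/1.0" 200 2326'
)
```

The individual fields of the exemplary entry are as follows.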
The IP address 127.0.0.1 is the IP address of the client device that made the request to the server. The hyphen indicates that the remote login name of the user is not available for this request. The value [6/Dec/2005:14:32:15 -0600] is the date and time the server finished processing the request. The value "GET /picture.gif HTTP/1.0" is the request line from the client device. In this exemplary request, the method used by the client device is GET, the client requested the resource picture.gif, and the client used the hypertext transfer protocol (HTTP) version 1.0. The value 200 is the status code the server sends back to the client device (a full list of status codes is contained in the HTTP specification, RFC 2616). The value 2326 is the size, in bytes, of the object returned to the client device. It is to be understood that the log files generated by the network device 106 may have various other formats known in the art.
- The computer 114 includes a log analysis module 108 for performing analysis of the log file data 110 generated by the network device 106. The computer 114 obtains the log file data 110 from the network device 106 via the local network 102. The computer 114 is configured to store result data 118 produced by the log analysis module 108 in a database 116. The database 116 may comprise a storage device associated with the computer 114. Alternatively, the database 116 may be more sophisticated and comprise, for example, a server executing any type of well-known database software. For purposes of clarity by example, only a single network device is shown. It is to be understood that the local network 102 may include multiple network devices, each of which maintains one or more log files. The computer 114 may interface with these multiple network devices via the local network 102 to obtain log file data to process.
- In accordance with one or more aspects of the invention, the log analysis module 108 analyzes input log file data to detect specific patterns. The rate of false positive matching for the patterns is reduced by using recursive data mining. In one embodiment, the log analysis module 108 processes input log file data in two phases. In a first phase, entries in the log file data that include predefined character strings are identified. Unique IP addresses for the identified entries are also identified. In one embodiment, machine names associated with the IP addresses are obtained. The identified entries are organized based on the unique IP addresses/machine names and stored in a database. In a second phase, entries in the log file data that include the unique IP addresses obtained in the first phase are identified. In this manner, all activity associated with each of the unique IP addresses may be identified. The entries identified in the second phase may be filtered based on particular event types. The identified entries may be organized based on IP address/event type and stored in a database.
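- The organization of results in the two phases may be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the log analysis module 108: entries are assumed to be already parsed into an IP address, a request string, and a status code; the mapping from status codes to event types in `event_type` is hypothetical; and in-memory dictionaries stand in for the database. The names `Entry`, `phase_one`, and `phase_two` are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Entry(NamedTuple):
    ip: str
    request: str
    status: int


def event_type(entry: Entry) -> str:
    """Hypothetical event classification derived from the status code."""
    if entry.status in (401, 403):
        return "denied"
    if entry.status >= 400:
        return "error"
    return "connected"


def phase_one(entries: Iterable[Entry], keywords: Iterable[str]) -> Dict[str, List[Entry]]:
    """First phase: group entries containing a predefined string by IP address."""
    keywords = [k.lower() for k in keywords]
    by_ip = defaultdict(list)
    for e in entries:
        if any(k in e.request.lower() for k in keywords):
            by_ip[e.ip].append(e)
    return dict(by_ip)


def phase_two(entries: Iterable[Entry], ips: Set[str]) -> Dict[str, Dict[str, List[Entry]]]:
    """Second phase: group all entries of the flagged IPs by IP and event type."""
    organized = defaultdict(lambda: defaultdict(list))
    for e in entries:
        if e.ip in ips:
            organized[e.ip][event_type(e)].append(e)
    return {ip: dict(events) for ip, events in organized.items()}


# Made-up sample data for illustration only.
LOG = [
    Entry("10.0.0.5", "GET /casino/index.html HTTP/1.0", 200),
    Entry("10.0.0.5", "GET /news.html HTTP/1.0", 200),
    Entry("10.0.0.5", "GET /private.html HTTP/1.0", 403),
    Entry("10.0.0.9", "GET /weather.html HTTP/1.0", 200),
]

flagged = phase_one(LOG, ["casino"])
activity = phase_two(LOG, set(flagged))
```

- Here the second phase reads the same log data again, so that requests from a flagged address that did not match any predefined string are still included in the results for that address.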
FIG. 2 is a flow diagram depicting an exemplary embodiment of a method 200 for processing log data produced by a network device in accordance with one or more aspects of the invention. The method 200 may be performed during the first phase of analysis performed by the log analysis module 108. The method 200 begins at step 202, where one or more raw log files are obtained from one or more network devices. The log files are “raw” in that they have not been processed or re-formatted. At step 204, the log files are filtered using a plurality of filters to select first entries. Each of the plurality of filters comprises a specific string of ASCII characters. For example, the filters may include words or phrases indicative of material that has been flagged by the network operator as being inappropriate for the setting in which the network is deployed. Each entry in the log files is processed to determine if the entry includes any of the character strings specified by the filters. For example, the character strings may be located in the request field. Entries containing such character strings are extracted to form a first group of entries (referred to as the first entries). Step 204 is referred to as the first pass filter.
At step 206, the first entries are filtered using false positive filters associated with the plurality of filters to select second entries. Each of the false positive filters comprises a specific character string that is not indicative of inappropriate material, but its inclusion in an entry is known to cause the entry to be extracted by the first pass filter step 204. For example, assume the word “virgin” comprises one of the filters employed in the first pass. Such a filter captures entries that include the word “virgin”, as well as other words with that character string, such as “Virginia”. If the word “Virginia” is not one of the filters employed in the first pass, the capture of an entry with the word “Virginia” is a false positive. Thus, the character string “Virginia” may be employed as a known false positive filter. Those entries in the first entries that contain character strings indicative of false positives are removed from the first entries to form a second group of entries (referred to as the second entries).
At optional step 208, the second entries are filtered using an event filter to produce a plurality of streams of entries (“event streams”). The event filter is configured to categorize the second entries based on specific types of events. For example, one type of event may be a particular time associated with a request recorded in an entry. Entries having a time after 12:00 PM are output as one stream, and entries having a time before 12:00 PM are output as another stream. Other types of events include whether a request resulted in a connection, whether a request was denied, and whether the request resulted in an error. Thus, entries having requests resulting in a connection are output in one stream, entries having requests that were denied are output in another stream, and entries having requests that resulted in errors are output as yet another stream. Those skilled in the art will appreciate that various other types of events may be employed by the event filter. The event filter may filter for any number of event types to produce any number of event streams. In another embodiment, step 208 is omitted and the second entries are processed as a single group.
The method 200 proceeds to execute the method 201. If multiple event streams are produced at step 208, then the method 201 may be executed concurrently for each event stream. Alternatively, the event streams may be processed serially. If step 208 is omitted, then the method 201 operates on the second entries as a single group. The method 201 begins at step 210, where the field delimiters for the input entries are standardized. The entries may be formatted into a standard delimited format to assist in further processing. At step 212, unique IP addresses are identified from the input entries. As described above, each entry includes an IP address of a particular client device making the request recorded in the entry. There may be several entries in the stream that have the same IP address. The input entries are processed to identify the unique IP addresses.
At step 214, a machine name associated with each unique IP address is obtained. For example, each unique IP address may be scanned to obtain a corresponding machine name that identifies a client device using that IP address for a given session. The machine names attached to the unique IP addresses may be used to assist in collection of metrics per client device for historical monitoring. At step 216, results of the method 201 are stored in a database. The results comprise one or more result files produced by steps of the method 201. For example, a result file may be produced at step 210 that includes the formatted entries. A result file may be produced at step 212 that includes the unique IP addresses identified. A result file may be produced at step 214 that includes the unique IP addresses and corresponding machine names. A result file may also be produced that includes the input entries organized in terms of IP address/machine name. At optional step 218, each result file is time-stamped and digitally signed to ensure that integrity is maintained. Processes for digitally signing files are well-known in the art.
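Steps 204 through 214 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the log layout (`<ip> <status> <request>`), the choice of tab as the standard delimiter, the use of the status field as the event type, and every function name are hypothetical, and the machine-name lookup is passed in by the caller because the source does not say how names are obtained.

```python
from collections import defaultdict


def first_pass(entries, filters):
    """Step 204: keep entries containing any filter string (case-insensitive)."""
    return [e for e in entries if any(f.lower() in e.lower() for f in filters)]


def remove_false_positives(entries, false_positive_filters):
    """Step 206: drop entries containing a known false-positive string."""
    return [
        e for e in entries
        if not any(fp.lower() in e.lower() for fp in false_positive_filters)
    ]


def event_streams(entries, classify):
    """Step 208: split entries into one stream per event type."""
    streams = defaultdict(list)
    for e in entries:
        streams[classify(e)].append(e)
    return dict(streams)


def standardize(entry):
    """Step 210: rewrite the three assumed fields as one tab-delimited record."""
    return "\t".join(entry.split(None, 2))


def unique_ips(records):
    """Step 212: unique IP addresses in order of first appearance."""
    return list(dict.fromkeys(r.split("\t")[0] for r in records))


def machine_names(ips, resolve):
    """Step 214: map each IP to a machine name via a caller-supplied lookup."""
    return {ip: resolve(ip) for ip in ips}


# Hypothetical log lines in the assumed layout.
raw = [
    "10.0.0.5 CONNECT GET /search?q=virgin+islands",
    "10.0.0.6 CONNECT GET /travel/Virginia",
    "10.0.0.5 DENIED GET /virgin/page2",
    "10.0.0.7 CONNECT GET /news",
]
first = first_pass(raw, ["virgin"])
second = remove_false_positives(first, ["Virginia"])
streams = event_streams(second, lambda e: e.split()[1])

results = {}
for event, stream in streams.items():
    records = [standardize(e) for e in stream]
    ips = unique_ips(records)
    # Stand-in resolver; a real deployment might use a reverse DNS lookup instead.
    results[event] = machine_names(ips, lambda ip: "host-" + ip.rsplit(".", 1)[1])
```

The first pass selects three entries, and the false positive filter then removes the "Virginia" entry, leaving two. Python's standard `socket.gethostbyaddr` is one possible basis for the resolver, though whether it matches the "scan" mentioned in the text is an assumption.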
FIG. 3 is a flow diagram depicting an exemplary embodiment of a method 300 for processing log data produced by a network device in accordance with one or more aspects of the invention. The method 300 may be performed during the second phase of analysis performed by the log analysis module 108. The method 300 begins at step 302, where one or more raw log files are obtained from one or more network devices. At step 304, the log file(s) are filtered using an IP filter configured with IP addresses of interest to select entries. The IP addresses of interest are obtained at step 312. The IP addresses of interest may comprise the unique IP addresses identified during the method 200 (e.g., the first analysis phase). At step 306, the selected entries are filtered using an event count filter. That is, each event type is counted for a given IP address (e.g., a particular IP address had 100 requests that resulted in a connection). The event count filter may be configured to count occurrences of any number of event types for the entries.
At step 308, the results are stored in a database. The results comprise one or more result files produced by steps of the method 300. For example, a result file may be produced at step 304 that includes the selected entries organized based on IP address. A result file may be produced at step 306 that includes the counts of event types for the IP addresses. At optional step 310, each result file is time-stamped and digitally signed to ensure that integrity is maintained.
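The IP filter, the event count filter, and the time-stamping and signing of a result can be sketched as follows. The log layout (`<ip> <status> <request>`) and all function names are assumptions. Note in particular that the sketch uses a keyed HMAC from the Python standard library as a stand-in for the digital signature; the source does not name a signing scheme, and an HMAC is a shared-secret integrity tag rather than a public-key signature.

```python
import hashlib
import hmac
import json
from collections import Counter
from datetime import datetime, timezone


def ip_filter(entries, ips_of_interest):
    """Step 304: select entries whose first field is an IP address of interest."""
    return [e for e in entries if e.split()[0] in ips_of_interest]


def event_counts(entries):
    """Step 306: count occurrences of each event type per IP address."""
    counts = {}
    for e in entries:
        ip, event = e.split()[:2]
        counts.setdefault(ip, Counter())[event] += 1
    return {ip: dict(c) for ip, c in counts.items()}


def stamp_and_sign(result, key, now=None):
    """Step 310 stand-in: attach a UTC timestamp and compute an HMAC-SHA256 tag."""
    now = now or datetime.now(timezone.utc)
    payload = json.dumps(
        {"timestamp": now.isoformat(), "result": result}, sort_keys=True
    )
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return payload, tag


def verify(payload, tag, key):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Hypothetical log lines and key.
log = [
    "10.0.0.5 CONNECT GET /a",
    "10.0.0.5 CONNECT GET /b",
    "10.0.0.5 DENIED GET /c",
    "10.0.0.7 CONNECT GET /news",
]
selected = ip_filter(log, {"10.0.0.5"})
counts = event_counts(selected)
payload, tag = stamp_and_sign(counts, b"demo-key")
```

Any change to the stored payload after signing causes `verify` to fail, which is the integrity property the time-stamping and signing step is meant to provide.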
FIG. 4 is a block diagram depicting an exemplary embodiment of the computer 114 configured to implement the processes and methods described herein. The computer 114 includes a processor 401, a memory 403, various support circuits 404, and an I/O interface 402. The processor 401 may be any type of processing element known in the art, such as a microprocessor. The support circuits 404 for the processor 401 include conventional cache, power supplies, clock circuits, data registers, I/O interfaces, and the like. The I/O interface 402 may be directly coupled to the memory 403 or coupled through the processor 401. The I/O interface 402 may be coupled to various input devices 412 and output devices 411, such as a conventional keyboard, mouse, printer, and the like.
The memory 403 may store all or portions of one or more programs and/or data to implement the processes and methods described herein. Notably, the memory 403 may store program code to be executed by the processor 401 for performing the method 200 of FIG. 2 and the method 300 of FIG. 3. Although one or more aspects of the invention are disclosed as being implemented as a computer executing a software program, those skilled in the art will appreciate that the invention may be implemented in hardware, software, or a combination of hardware and software. Such implementations may include a number of processors independently executing various programs and dedicated hardware, such as ASICs.
The computer 114 may be programmed with an operating system, which may be OS/2, Java Virtual Machine, Linux, Solaris, Unix, Windows, Windows95, Windows98, Windows NT, Windows2000, WindowsME, and WindowsXP, among other known platforms. At least a portion of an operating system may be disposed in the memory 403. The memory 403 may include one or more of the following: random access memory, read only memory, magneto-resistive read/write memory, optical read/write memory, cache memory, magnetic read/write memory, and the like, as well as signal-bearing media as described below.
An aspect of the invention is implemented as a program product for use with a computer system. Program(s) of the program product define functions of embodiments and can be contained on a variety of signal-bearing media, which include, but are not limited to: (i) information permanently stored on non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by a CD-ROM drive or a DVD drive); (ii) alterable information stored on writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or read/writable CD or read/writable DVD); or (iii) information conveyed to a computer by a communications medium, such as through a computer or telephone network, including wireless communications. The latter embodiment specifically includes information downloaded from the Internet and other networks. Such signal-bearing media, when carrying computer-readable instructions that direct functions of the invention, represent embodiments of the invention.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (20)
1. A method of processing log data produced by a network, comprising:
filtering entries in the log data using a plurality of filters to select first entries from the entries;
filtering the first entries using a plurality of false positive filters associated with the plurality of filters to select second entries from the first entries;
identifying internet protocol (IP) addresses in the second entries;
filtering the entries in the log data using the IP addresses to select third entries from the entries; and
analyzing the third entries to detect one or more patterns.
2. The method of claim 1, wherein the identifying comprises:
separating the second entries into a plurality of streams based on a respective plurality of event types;
for each stream of the plurality of streams:
identifying a set of IP addresses in the each stream of the plurality of streams; and
obtaining corresponding machine names for each IP address in the set of IP addresses.
3. The method of claim 2, further comprising:
formatting the each stream of the plurality of streams into a delimited format prior to the identifying of the set of IP addresses in the each stream of the plurality of streams.
4. The method of claim 3, further comprising at least one of:
storing results of the formatting for the stream in a first results file; and
storing results of the obtaining for the stream in a second results file.
5. The method of claim 1, wherein the analyzing comprises:
determining a count of the third entries corresponding to each of a plurality of event types; and
storing results of the determining in a results file.
6. The method of claim 1, further comprising:
storing results of the filtering using the sets of IP addresses in a results file.
7. The method of claim 1, further comprising:
storing results of the analyzing in a results file;
time stamping the results file; and
digitally signing the results file using a cryptographic key.
8. The method of claim 1, wherein the log data comprises a plurality of logs produced by one or more internet protocol (IP) devices in the network.
9. Apparatus for processing log data produced by a network, comprising:
means for filtering entries in the log data using a plurality of filters to select first entries from the entries;
means for filtering the first entries using a plurality of false positive filters associated with the plurality of filters to select second entries from the first entries;
means for identifying internet protocol (IP) addresses in the second entries;
means for filtering the entries in the log data using the IP addresses to select third entries from the entries; and
means for analyzing the third entries to detect one or more patterns.
10. The apparatus of claim 9, wherein the means for identifying comprises:
means for separating the second entries into a plurality of streams based on a respective plurality of event types;
means for identifying a set of IP addresses for each stream of the plurality of streams; and
means for obtaining corresponding machine names for each IP address in the set of IP addresses for each stream of the plurality of streams.
11. The apparatus of claim 10, further comprising:
means for formatting the each stream of the plurality of streams into a delimited format prior to the identifying the set of IP addresses in the each stream of the plurality of streams.
12. The apparatus of claim 11, further comprising at least one of:
means for storing results of the formatting for the stream in a first results file; and
means for storing results of the obtaining for the stream in a second results file.
13. The apparatus of claim 9, wherein the means for analyzing comprises:
means for determining a count of the third entries corresponding to each of a plurality of event types; and
means for storing results of the determining in a results file.
14. The apparatus of claim 9, further comprising:
means for storing results of the filtering using the sets of IP addresses in a results file.
15. A computer readable medium having stored thereon instructions that, when executed by a processor, cause the processor to perform a method of processing log data produced by a network, comprising:
filtering entries in the log data using a plurality of filters to select first entries from the entries;
filtering the first entries using a plurality of false positive filters associated with the plurality of filters to select second entries from the first entries;
identifying internet protocol (IP) addresses in the second entries;
filtering the entries in the log data using the IP addresses to select third entries from the entries; and
analyzing the third entries to detect one or more patterns.
16. The computer readable medium of claim 15, wherein the identifying comprises:
separating the second entries into a plurality of streams based on a respective plurality of event types;
for each stream of the plurality of streams:
identifying a set of IP addresses in the each stream of the plurality of streams; and
obtaining corresponding machine names for each IP address in the set of IP addresses.
17. The computer readable medium of claim 16, further comprising:
formatting the each stream of the plurality of streams into a delimited format prior to the identifying the set of IP addresses in the each stream of the plurality of streams.
18. The computer readable medium of claim 17, further comprising at least one of:
storing results of the formatting for the stream in a first results file; and
storing results of the obtaining for the stream in a second results file.
19. The computer readable medium of claim 15, wherein the analyzing comprises:
determining a count of the third entries corresponding to each of a plurality of event types; and
storing results of the determining in a results file.
20. The computer readable medium of claim 15, further comprising:
storing results of the analyzing in a results file;
time stamping the results file; and
digitally signing the results file using a cryptographic key.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/268,437 US20090070601A1 (en) | 2005-12-13 | 2008-11-10 | Method and apparatus for recursively analyzing log file data in a network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/301,389 US7451145B1 (en) | 2005-12-13 | 2005-12-13 | Method and apparatus for recursively analyzing log file data in a network |
US12/268,437 US20090070601A1 (en) | 2005-12-13 | 2008-11-10 | Method and apparatus for recursively analyzing log file data in a network |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/301,389 Continuation US7451145B1 (en) | 2005-12-13 | 2005-12-13 | Method and apparatus for recursively analyzing log file data in a network |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090070601A1 (en) | 2009-03-12 |
Family ID: 39940894
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/301,389 Active 2026-09-05 US7451145B1 (en) | 2005-12-13 | 2005-12-13 | Method and apparatus for recursively analyzing log file data in a network |
US12/268,437 Abandoned US20090070601A1 (en) | 2005-12-13 | 2008-11-10 | Method and apparatus for recursively analyzing log file data in a network |
Country Status (1)
Country | Link |
---|---|
US (2) | US7451145B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110307502A1 (en) * | 2010-06-14 | 2011-12-15 | Microsoft Corporation | Extensible event-driven log analysis framework |
US20130246349A1 (en) * | 2011-01-06 | 2013-09-19 | International Business Machines Corporation | Records declaration filesystem monitoring |
US10977150B2 (en) * | 2015-10-15 | 2021-04-13 | King.Com Ltd. | Data analysis |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10331693B1 (en) * | 2016-09-12 | 2019-06-25 | Amazon Technologies, Inc. | Filters and event schema for categorizing and processing streaming event data |
US10769152B2 (en) * | 2016-12-02 | 2020-09-08 | Cisco Technology, Inc. | Automated log analysis |
US10496467B1 (en) | 2017-01-18 | 2019-12-03 | Amazon Technologies, Inc. | Monitoring software computations of arbitrary length and duration |
CN107592233A (en) * | 2017-10-30 | 2018-01-16 | 郑州云海信息技术有限公司 | A kind of method and system for screening network log |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6128298A (en) * | 1996-04-24 | 2000-10-03 | Nortel Networks Corporation | Internet protocol filter |
US20020178169A1 (en) * | 2001-05-23 | 2002-11-28 | Nair Sandeep R. | System and method for efficient and adaptive web accesses filtering |
US20030145225A1 (en) * | 2002-01-28 | 2003-07-31 | International Business Machines Corporation | Intrusion event filtering and generic attack signatures |
US20040093521A1 (en) * | 2002-07-12 | 2004-05-13 | Ihab Hamadeh | Real-time packet traceback and associated packet marking strategies |
US20050193430A1 (en) * | 2002-10-01 | 2005-09-01 | Gideon Cohen | System and method for risk detection and analysis in a computer network |
US20050273593A1 (en) * | 2002-06-03 | 2005-12-08 | Seminaro Michael D | Method and system for filtering and suppression of telemetry data |
US20070084915A1 (en) * | 2005-10-18 | 2007-04-19 | Weipeng Yan | Identifying spurious requests for information |
US7487546B1 (en) * | 2004-09-03 | 2009-02-03 | Symantec Corporation | Hosts file protection system and method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6055572A (en) * | 1998-01-20 | 2000-04-25 | Netscape Communications Corporation | System and method for creating pathfiles for use to predict patterns of web surfaces |
US20030204741A1 (en) * | 2002-04-26 | 2003-10-30 | Isadore Schoen | Secure PKI proxy and method for instant messaging clients |
US7219131B2 (en) * | 2003-01-16 | 2007-05-15 | Ironport Systems, Inc. | Electronic message delivery using an alternate source approach |
US7249162B2 (en) * | 2003-02-25 | 2007-07-24 | Microsoft Corporation | Adaptive junk message filtering system |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6128298A (en) * | 1996-04-24 | 2000-10-03 | Nortel Networks Corporation | Internet protocol filter |
US20020178169A1 (en) * | 2001-05-23 | 2002-11-28 | Nair Sandeep R. | System and method for efficient and adaptive web accesses filtering |
US6741990B2 (en) * | 2001-05-23 | 2004-05-25 | Intel Corporation | System and method for efficient and adaptive web accesses filtering |
US20030145225A1 (en) * | 2002-01-28 | 2003-07-31 | International Business Machines Corporation | Intrusion event filtering and generic attack signatures |
US7222366B2 (en) * | 2002-01-28 | 2007-05-22 | International Business Machines Corporation | Intrusion event filtering |
US20050273593A1 (en) * | 2002-06-03 | 2005-12-08 | Seminaro Michael D | Method and system for filtering and suppression of telemetry data |
US20040093521A1 (en) * | 2002-07-12 | 2004-05-13 | Ihab Hamadeh | Real-time packet traceback and associated packet marking strategies |
US20050193430A1 (en) * | 2002-10-01 | 2005-09-01 | Gideon Cohen | System and method for risk detection and analysis in a computer network |
US7487546B1 (en) * | 2004-09-03 | 2009-02-03 | Symantec Corporation | Hosts file protection system and method |
US20070084915A1 (en) * | 2005-10-18 | 2007-04-19 | Weipeng Yan | Identifying spurious requests for information |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110307502A1 (en) * | 2010-06-14 | 2011-12-15 | Microsoft Corporation | Extensible event-driven log analysis framework |
US8832125B2 (en) * | 2010-06-14 | 2014-09-09 | Microsoft Corporation | Extensible event-driven log analysis framework |
US20130246349A1 (en) * | 2011-01-06 | 2013-09-19 | International Business Machines Corporation | Records declaration filesystem monitoring |
US9075815B2 (en) | 2011-01-06 | 2015-07-07 | International Business Machines Corporation | Records declaration filesystem monitoring |
US9959283B2 (en) * | 2011-01-06 | 2018-05-01 | International Business Machines Corporation | Records declaration filesystem monitoring |
US10977150B2 (en) * | 2015-10-15 | 2021-04-13 | King.Com Ltd. | Data analysis |
Also Published As
Publication number | Publication date |
---|---|
US7451145B1 (en) | 2008-11-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |