
WO2006017689A2 - Data context switching in a semantic processor - Google Patents

Data context switching in a semantic processor Download PDF

Info

Publication number
WO2006017689A2
WO2006017689A2 (PCT/US2005/027803)
Authority
WO
WIPO (PCT)
Prior art keywords
parser
data
sts
packet
parsing
Prior art date
Application number
PCT/US2005/027803
Other languages
French (fr)
Other versions
WO2006017689A9 (en)
WO2006017689A3 (en)
Inventor
Somsubhra Sikdar
Kevin J. Rowett
Caveh Jalali
Prasad Rajendra Rallapalli
Jonathan Sweedler
Rajesh Nair
Komal Rathi
Joel Leon Lach
Original Assignee
Mistletoe Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/181,528 external-priority patent/US20070027991A1/en
Priority claimed from US11/185,223 external-priority patent/US20070043871A1/en
Priority claimed from US11/186,144 external-priority patent/US20070019661A1/en
Application filed by Mistletoe Technologies, Inc. filed Critical Mistletoe Technologies, Inc.
Priority to JP2007525009A priority Critical patent/JP2008509484A/en
Publication of WO2006017689A2 publication Critical patent/WO2006017689A2/en
Publication of WO2006017689A9 publication Critical patent/WO2006017689A9/en
Publication of WO2006017689A3 publication Critical patent/WO2006017689A3/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04J MULTIPLEX COMMUNICATION
    • H04J3/00 Time-division multiplex systems
    • H04J3/16 Time-division multiplex systems in which the time allocation to individual channels within a transmission cycle is variable, e.g. to accommodate varying complexity of signals, to vary number of channels transmitted
    • H04J3/1605 Fixed allocated frame structures
    • H04J3/1611 Synchronous digital hierarchy [SDH] or SONET

Definitions

  • This invention relates generally to digital semantic processors, and more particularly to methods and apparatus for switching parsing contexts while parsing a data stream.
  • a packet is a finite-length (generally several tens to several thousands of octets) digital transmission unit comprising one or more header fields and a data field.
  • the data field may contain virtually any type of digital data.
  • the header fields convey information (in different formats depending on the type of header and options) related to delivery and interpretation of the packet contents. This information may, e.g., identify the packet's source or destination, identify the protocol to be used to interpret the packet, identify the packet's place in a sequence of packets, provide an error correction checksum, or aid packet flow control.
  • the finite length of a packet can vary based on the type of network that the packet is to be transmitted through and the type of application used to present the data.
  • IP Internet Protocol
  • a network layer typically implemented with the well-known Internet Protocol (IP)
  • a higher-level transport layer can provide mechanisms for end-to-end delivery of packets.
  • each layer can prepend its own header to a packet, and regard all higher-layer headers as merely part of the data to be transmitted.
  • Transmission Control Protocol is a transport layer used to provide mechanisms for highly-reliable end-to-end delivery of packet streams during an established TCP session.
  • TCP Transmission Control Protocol
  • the establishment of a TCP session requires a three-way handshake between communicating endpoints. This three-way handshaking allows TCP endpoints to exchange socket information uniquely identifying the TCP session to be established, and to exchange initial sequence numbers and window sizes used in the packet sequencing, error recovery, and flow control.
  • An example of a typical three-way handshake may include a first TCP endpoint sending a synchronize SYN packet to a second TCP endpoint, the second TCP endpoint responding with a synchronize and acknowledgment SYN-ACK packet, and the first TCP endpoint sending an acknowledgement ACK packet in response to the SYN-ACK packet.
  • TCP further requires a similar exchange of termination FIN packets and acknowledgments to the FIN packets when closing an existing TCP session.
  • TCP endpoints must be able to maintain information regarding the state of each of their TCP sessions, e.g., opening a TCP session, waiting for acknowledgment, exchanging data, or closing a TCP session.
  • In a TCP SYN flood denial-of-service attack, multiple SYN packets are received by a TCP endpoint, each requesting the establishment of a different TCP session.
  • the initiator of the attack does not have any intention of completing the corresponding three-way handshakes, oftentimes providing a fictitious source port to ensure their failure.
  • Responding to this flood of SYN packets misallocates the TCP endpoint's limited processing resources by requiring it to maintain state information for each session opening while waiting for acknowledgments that will never arrive.
  • Another attack that misallocates processing resources involves receiving packets for a session that conflict with the maintained state information, e.g., sending a SYN packet in an already established session or a FIN packet for a session that has not been established.
  • TCP endpoints may exchange data in a TCP packet stream. Since packets may be lost, or arrive out-of-order during transmission, TCP provides mechanisms to retransmit lost or late packets and reorder the packet stream upon arrival including discarding duplicate packets. TCP endpoints may also be required to perform other exception processing prior to the TCP reordering, such as reassembling lower- layer fragmented packets, e.g., IP fragments, and/or performing cryptography operations, e.g., according to an Internet Protocol Security (IPSec) header(s).
  • IPSec Internet Protocol Security
  • SONET Synchronous Optical NETwork
  • STS-N Synchronous Transport Signal
  • N is commonly one of 1, 3, 12, 24, 48, 192, and 768.
  • the line rate of an STS-N signal is N x 51.84 Mbps (million bits per second), transmitted as 8000 frames/second, the frame size growing proportionally with N.
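As a hedged aside, this arithmetic can be checked directly; the sketch below (Python, an illustration rather than part of the patent) derives the line rate from the frame geometry described later: 9 rows x 90N columns of bytes, sent 8000 times per second.

    def sts_line_rate_mbps(n: int) -> float:
        # An STS-N frame is 9 rows x 90N columns of bytes (see Figure 2),
        # transmitted 8000 times per second.
        bytes_per_frame = 9 * 90 * n
        return bytes_per_frame * 8 * 8000 / 1e6

    for n in (1, 3, 12, 48):
        print(f"STS-{n}: {sts_line_rate_mbps(n):.2f} Mbps")  # STS-1 -> 51.84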
  • An STS terminal multiplexer 110 receives data on one or more non-SONET tributaries, such as Ethernet links, circuit-switched voice network data links, Asynchronous Transfer Mode (ATM) data links, etc.
  • STS terminal MUX 110 places data from its tributaries into SONET STS-N frames, which are then directed across a SONET path to Path-Terminating Equipment (PTE) 120, which could be another STS terminal MUX or some other device that extracts the tributary data from the STS-N signal.
  • PTE Path-Terminating Equipment
  • SONET devices can reside along the path between STS terminal MUX 110 and PTE 120.
  • Two add/drop multiplexers (ADMs) 130 and 140 and two repeaters 150 and 160 are shown in the path between STS terminal MUX 110 and PTE 120.
  • ADMs 130 and 140 multiplex lower-rate STS-N signals to a higher-rate STS-N signal and vice-versa.
  • ADM 130 could multiplex four STS-3 lines to an STS-12 line, and/or could extract (drop) some STS-3 lines from an incoming STS-12 and replace (add) other STS-3 lines to produce an outgoing STS-12 line.
  • Repeaters such as 150 and 160 can be inserted in lines too long for a single long fiber to reliably carry the SONET signal between endpoints.
  • the repeaters cannot modify the SONET payload, but merely retime and retransmit it.
  • an STS path layer exists from where the data is first placed in a SONET frame to where that data is removed from a SONET frame.
  • a line layer exists over any SONET path segment where the payload is unchanged.
  • a section layer exists between any SONET receiver/transmitter pair that share the same fiber.
  • a SONET link carries overhead bits for the path, line, and section layers. These overhead bits are referred to respectively as Path OverHead (POH), Line OverHead (LOH), and Section OverHead (SOH).
  • POH Path OverHead
  • LOH Line OverHead
  • SOH Section OverHead
  • SONET endpoints that are only section endpoints can generate and/or modify SOH, but cannot modify LOH or POH.
  • Endpoints that are also line endpoints can additionally generate and/or modify LOH, but cannot modify POH.
  • Path endpoints are the only endpoints allowed to create POH.
  • Overhead and payload occupy specific locations in a SONET frame.
  • the general structure of a SONET frame 200 is illustrated in Figure 2. Every STS-N frame contains 90N columns and nine rows of byte data, which are transmitted in raster fashion, left-to-right and top-to-bottom.
  • the first 3N columns contain overhead data, and the last 87N columns contain what is referred to as a Synchronous Payload Envelope (SPE) 230.
  • SPE Synchronous Payload Envelope
  • Within the first 3N columns the first three rows contain SOH 210, and the last six rows contain LOH 220.
  • the POH lies within the synchronous payload envelope, as will be described shortly.
  • Figure 3 shows additional SONET frame detail 300 for two consecutive STS-1 frames.
  • the POH in an STS-1 frame occupies one column of the SPE, leaving the remaining 86 columns available to transmit input data. Rather than occupying a fixed location within the frame like the SOH and LOH, however, the POH can be defined with a first byte starting at any (row, column) location within the SPE. This capability exists primarily to allow circuit-switched data, which also has an 8000 frames/second format but is not necessarily in perfect synchronization with the SONET frame timing, to be more easily carried as a SONET payload.
  • The beginning location of the POH is actually specified by an offset stored in the first two bytes of the LOH, referred to herein as the "H1H2 pointer."
  • Path-Terminating Equipment interprets the H1H2 pointer values to locate the beginning of the next POH first byte that follows the H1H2 pointer. For instance, an H1H2 pointer offset of 0 would indicate that the POH begins at row 4, column 4, just to the right of the H1H2 pointer.
  • the H1H2 pointer of frame K has an offset to row 5, about seven or eight columns into the SPE, where the first byte of the POH for frame K will be found.
  • the data payload for SPE K begins immediately after the first byte of the POH for frame K, and in this instance continues down to the first part of row 5 of frame K+1.
  • the POH first byte for frame K+1 follows immediately after the last byte of SPE K. This is not necessary, however, as frame K shows a slight shift between the frame K-1 POH and the frame K POH, as represented by the H1H2 pointer.
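To make the pointer arithmetic concrete, the following sketch maps an offset to a (row, column) byte position in an STS-1 frame, with offset 0 landing at row 4, column 4 as described above. This is an illustration only; the wrap-around rule for offsets past the last row is an assumption here.

    def poh_start(offset: int) -> tuple[int, int]:
        # Each row of an STS-1 SPE holds 87 bytes, in columns 4 through 90.
        row = 4 + offset // 87
        col = 4 + offset % 87
        if row > 9:               # assumed: offsets past row 9 wrap into
            row -= 9              # the following frame
        return (row, col)

    print(poh_start(0))           # (4, 4), just right of the H1H2 bytes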
  • FIG. 4 shows an illustrative SPE K, with a data payload that begins with the terminal end of a Packet_1 payload, followed by a Packet_2 IP (Internet Protocol) header, TCP (Transmission Control Protocol) header, and payload; a Packet_3 IP header, ARP (Address Resolution Protocol) header, and payload; and a Packet_4 IP header, UDP (User Datagram Protocol) header, and the first portion of its payload.
  • TCP Transmission Control Protocol
  • ARP Address Resolution Protocol
  • UDP User Datagram Protocol
  • In each row, a POH byte interrupts the packet data stream, as do SOH and LOH bytes (not shown in Figure 4).
  • the PTE receiving this SONET stream is expected to remove the SONET overhead bytes and output just the header and payloads from the SPEs.
  • a further complication in the SONET hierarchy lies in the ability within the SONET framework to build a higher-level SONET frame stream from different combinations of lower-level SONET frame streams.
  • a segment of a SONET network 500, illustrated in Figure 5, shows one way in which an STS-12 stream can be constructed from six STS-1 streams and two STS-3c concatenated streams (a concatenated STS-Nc frame, designated by the "c" suffix, has a similar SPE structure to an STS-1 frame, but with more columns, and also contains a few LOH differences, as is well-known by those skilled in SONET networking).
  • a first STS-3 multiplexer (MUX) 510 combines three STS-1 streams A, B, and C to create an STS-3 stream AA.
  • a second STS-3 MUX 520 combines three other STS-1 streams D, E, and F to create another STS-3 stream DD.
  • an STS-12 MUX 530 creates an STS-12 stream AAA by combining STS-3 streams AA and DD with two STS-3c concatenated streams BB and CC.
  • the overall frame structure 600 of a frame from STS-3 stream AA is illustrated in Figure 6.
  • the first 9 columns contain SOH and LOH bytes, including H1H2 bytes for each of the individual lower-rate SPEs multiplexed in the frame.
  • the H1H2 pointers from STS-1s A, B, and C are copied into three consecutive byte-interleaved H1H2 fields in the LOH section of the frame structure.
  • Each H1H2 pointer indicates a respective offset to the first byte of its POH. Any arrangement of starting points for the three STS-1 POH columns is allowable, with each lying somewhere in the 87 columns of the STS-3 SPE that correspond to the correct STS-1 stream.
  • the next 261 columns in frame structure 600 contain STS-3 payload, consisting of byte-interleaved content from the three received STS-1 SPEs.
  • column 10 of frame structure 600 contains column 4 from an STS-1 stream A frame,
  • column 11 contains column 4 from an STS-1 stream B frame, and
  • column 12 contains column 4 from an STS-1 stream C frame.
  • This byte-interleaved pattern then repeats for column 5 of each of the three STS-1 streams, and so on up to column 87 of the three STS-1 streams, which appear respectively in columns 268, 269, and 270 of the STS-3 frame.
  • STS-12 multiplexer 530 takes four input STS-3 and/or STS-3c streams and byte-interleaves them in the same manner as just described.
  • the output pattern for the STS-12 SPE in this example would repeatedly byte-interleave the four STS-3 input streams with the pattern AA, BB, CC, DD, AA, BB, ..., CC, DD. Expanding this pattern to include the embedded STS-1s per Figure 6, the STS-12 SPE byte-interleave pattern looks like A, BB, CC, D, B, BB, CC, E, C, BB, CC, F, repeated along each row (a sketch of this pattern follows below).
  • An STS-48 frame is created similarly to an STS-12 frame, except in place of the 4:1 multiplexer 530 of Figure 5, a 16:1 multiplexer is used to byte-interleave 16 STS-3 and/or STS-3c frames. Other combinations of lower-order STS streams are possible.
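As a hedged illustration of the byte-interleave pattern just described (stream names follow Figure 5; this is not code from the patent), one SPE row group of the STS-12 example can be generated as:

    sts3_aa = ["A", "B", "C"]      # STS-1s carried inside STS-3 AA
    sts3_dd = ["D", "E", "F"]      # STS-1s carried inside STS-3 DD

    pattern = []
    for i in range(3):             # expand AA and DD into their STS-1 columns
        pattern += [sts3_aa[i], "BB", "CC", sts3_dd[i]]

    print(pattern)  # ['A', 'BB', 'CC', 'D', 'B', 'BB', 'CC', 'E',
                    #  'C', 'BB', 'CC', 'F'] -- repeated along each row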
  • Figure 1 contains a network diagram for an exemplary path of a SONET network;
  • Figure 2 illustrates a generalized SONET frame structure;
  • Figure 3 shows particulars of SONET STS-1 framing;
  • Figure 4 illustrates an STS-1 SPE, showing one way in which packet data may be transmitted over a SONET STS-1 path;
  • Figure 5 shows one possible arrangement of SONET multiplexers useful in generating an STS-12 frame;
  • FIG. 6 illustrates the general payload division for STS-3 frames generated by the multiplexers of Figure 5;
  • Figure 7 illustrates the general layout for one embodiment of a semantic processor;
  • Figure 8 shows one possible implementation of a reconfigurable input buffer useful with the semantic processor of Figure 7;
  • Figure 9 contains a block diagram for the parser and related components of the Figure 7 semantic processor;
  • Figure 10 shows the format of STS-3 frames after removing byte-interleaving with the input buffer of Figure 8;
  • Figure 11 contains one row of de-interleaved STS-3 frame data extracted from Figure 10;
  • Figure 12 illustrates, in block form, a network communications system useful with embodiments of the invention;
  • FIG. 13A illustrates, in block form, embodiments of the proxy shown in Figure 12;
  • Figure 13B shows, in block form, an example packet flow through proxy 2200 shown in Figures 12 and 13A;
  • Figure 14 shows an example flow chart illustrating embodiments for operating the proxy shown in Figures 12, 13A, and 13B;
  • Figure 15 illustrates, in block form, a semantic processor useful with embodiments of the invention;
  • Figure 16 shows an example flow chart illustrating embodiments for operating the semantic processor shown in Figure 15 as a TCP state machine.
  • FIG. 17 illustrates, in block form, a semantic processor useful with embodiments of the invention.
  • FIG. 18 contains a flow chart for the processing of received packets in the semantic processor with the recirculation buffer in FIG. 17.
  • FIG. 19 illustrates a more detailed semantic processor implementation useful with embodiments of the invention.
  • FIG. 20 contains a flow chart of received IP-fragmented packets in the semantic processor in FIG. 19.
  • FIG. 21 contains a flow chart of received encrypted and/or unauthenticated packets in the semantic processor in FIG. 19.
  • FIG. 22 illustrates yet another semantic processor implementation useful with embodiments of the invention.
  • FIG. 23 illustrates an embodiment of the packet output buffer in the semantic processor in FIG. 22.
  • FIG. 24 illustrates the information contained in the buffer in FIG. 23.
  • Figure 25 shows an embodiment of a semantic processor in block form.
  • Figure 26 shows an embodiment of a parser table.
  • Figure 27 shows an embodiment of a production rule table organization.
  • Figure 28 shows an embodiment of a parser in block form.
  • Figure 29 shows a flow chart of an embodiment of processing data.
  • Figure 30 shows a flow chart of an embodiment of processing a debug production rule in a semantic processor.
  • It may be desirable for a single device to parse STS-N streams of various rates, e.g., STS-3, STS-12, STS-48, etc., and/or handle both STS demultiplexing and payload processing for an STS data stream.
  • a semantic processor having a direct execution parser can be configured to handle such tasks.
  • One difficulty in directly parsing a received SONET frame structure is that the byte-interleaving intermixes data belonging to different parsing contexts.
  • At least one context could be divided into section, line, path, and framing contexts.
  • each atomic STS-1 or STS-Nc payload stream also uses a different context.
  • the arrangement of the contexts also depends on how the STS-N stream is created.
  • Other networking streams exist that can be parsed more easily when parsing is not constrained to a single, linear context. Because SONET is believed to represent a difficult example of such a stream, the described embodiments focus on a SONET implementation. Those skilled in the art will recognize how multiple-context parsing can be applied to other parsing problems. Those skilled in the art will also recognize that not every multiple-context parser will need some of the functionality that is specific to SONET, such as a byte de-interleaver and POH counters.
  • the semantic processor embodiment illustrated in Figures 7-9 is capable of configuration to properly parse SONET data, such as that contained in Figure 6, using multiple simultaneous parsing contexts.
  • In FIG. 7, a semantic processor embodiment 700 is illustrated.
  • an OC-N PHY 710 connects semantic processor 700 to a pair of SONET fiber-optic carriers (not shown).
  • the inbound STS-N stream is optically detected by PHY 710 and electrically transmitted to a Packet Input Buffer (PIB) 800.
  • PIB Packet Input Buffer
  • the outbound STS-N stream is electrically transmitted from a Packet Output Buffer (POB) 730 to PHY 710 for laser modulation on an outbound SONET fiber.
  • a parser 900 is connected to receive data input from PIB 800.
  • a Parser Table (PT) 740 stores parsing data particular to the input that the semantic processor is configured to parse.
  • a Production Rule Table (PRT) 750 stores grammatical production rules particular to the input that the semantic processor is configured to parse.
  • a Semantic Code Table (SCT) 770 stores microcode segments for a SPU cluster 760, which preferably contains multiple semantic processing units capable of executing the microcode segments when so instructed, e.g., by parser 900 or another SPU.
  • a SPU memory 780 stores data needed by the SPU, such as session contexts, packet data currently being processed, etc.
  • PT 740, PRT 750, and SCT 770 are implemented with programmable on-chip memory
  • SPU memory 780 is implemented with a cached DRAM external memory.
  • PIB 800 receives input data, such as a sequence of Ethernet frames, SONET frames, Fibre Channel frames, etc., from an appropriate PHY, such as the OC-N PHY 710 of Figure 7.
  • the frame data is retrieved from PIB 800 by parser 900 and/or SPU cluster 760, as directed by the grammatical production rules and the SCT microcode.
  • When parser 900 receives input data from PIB 800, it can do one of two things with the input data.
  • parser 900 can "look up" additional production rules for the data input symbols, based on the current data input symbol(s) and a current production ("non-terminal" or "NT") symbol. It is noted that terminal symbol matching can generally be performed using either production rule terminal symbols or parser table terminal match values.
  • parser 900 When parser 900 is directed by the grammar to look up an additional production rule, parser 900 supplies one or more current data input (DI) symbols and non-terminal (NT) grammar symbols to parser table 740. Based on the (NT, DI) pairing supplied by the parser, parser table 740 issues a corresponding PR code to production rule table (PRT) 750.
  • parser table 740 is implemented with a Ternary Content-Addressable Memory (TCAM) consisting of entries of the form (NT, DI_match). Multiple (NT, DI_match) entries can exist for the same NT symbol and different DI_match values.
  • TCAM Ternary Content-Addressable Memory
  • the TCAM will return, as the PR code, the address of the entry with the correct NT value and DI_match value that matches the supplied data input DI. Because the TCAM allows individual bits or bytes in each TCAM entry to be set to a "don't care" value, each TCAM entry can be tailored to respond to the bits or bytes in DI that matter for the current production rule.
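A software model of such a TCAM lookup might look like the sketch below. The entry contents, widths, and PR codes are illustrative assumptions; real TCAM hardware matches all entries in parallel rather than iterating.

    entries = [
        # (NT, DI match pattern with None as "don't care", PR code)
        (0x01, (0x45, None, None, None), 0x0100),
        (0x01, (None, None, None, None), 0x0101),   # fallback for NT 0x01
    ]

    def tcam_lookup(nt, di):
        for entry_nt, pattern, pr_code in entries:  # first match wins
            if entry_nt == nt and all(p is None or p == d
                                      for p, d in zip(pattern, di)):
                return pr_code                      # "address" of the entry
        return None

    print(hex(tcam_lookup(0x01, (0x45, 0x00, 0x00, 0x28))))  # 0x100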
  • When parser table 740 locates the correct PR code, the code is passed to production rule table 750.
  • PRT 750 locates a production rule corresponding to the PR code, and outputs the production rule.
  • Although the production rule can contain whatever is desired, in Figure 7 the production rule contains an array NT[] of up to M non-terminal (and/or terminal) symbols, an array SEP[] of up to R Semantic Code Entry Points, and a SkipBytes value.
  • the symbol array NT[] directs the parser as to what input will now be expected, based on the latest parsing cycle.
  • the SkipBytes value directs the parser as to how many, if any, of the DI bytes were consumed by the latest parsing cycle, and can therefore be discarded.
  • the semantic code entry point array SEP[] directs the SPU cluster 760 as to any processor tasks that should be performed based on the latest parsing cycle.
  • SPU cluster 760 loads and executes a code segment to perform a task. For instance, a SPU could be directed to move the payload portion of an input packet from PIB 800 to SPU memory 780 and manipulate that payload portion in some fashion. SPU cluster 760 can also create output packet/frames and supply them to POB 720 for outbound transmission at PHY 710.
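The production rule fields named above can be modeled roughly as follows. The field names come from the text; the concrete representation is an assumption for illustration.

    from dataclasses import dataclass

    @dataclass
    class ProductionRule:
        nt: list          # NT[]: up to M non-terminal and/or terminal symbols
        sep: list         # SEP[]: up to R semantic code entry points for SPUs
        skip_bytes: int   # SkipBytes: DI bytes consumed by this parsing cycle

    # e.g. consume two bytes, push two new symbols, start one SPU task
    rule = ProductionRule(nt=["NT_HEADER", "NT_PAYLOAD"],
                          sep=["SEP_SAVE_HEADER"],
                          skip_bytes=2)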
  • Although parser 900 could conceivably operate directly on a byte-interleaved SONET stream, context-switching on every byte boundary during high-speed operation could waste a prohibitive number of parser cycles. Accordingly, Figure 8 illustrates one embodiment of PIB 800 that is capable of partially de-interleaving received SONET frames.
  • the goal for this input buffer embodiment is to, on a row-by-row basis, de-interleave the byte-interleaved SPEs present in an STS-N stream.
  • Figure 10 illustrates the desired output of PIB 800 for STS-3 frames consisting of three byte-interleaved STS-1s A, B, and C (see Figure 6).
  • the SOH and LOH columns can be "de-interleaved" as well with a parser grammar written so as to expect such a structure, or specific overhead sections such as the H1H2 pointers can be de-interleaved.
  • Frame structure 1000 of Figure 10 would apply to the line STS-3 AA in Figure 5.
  • In frame structure 1000, nine columns of SOH and LOH and 261 columns of STS-3 SPE exist, just as in frame structure 600 of Figure 6.
  • the 261 columns of STS-3 SPE are divided into three consecutive 87-column sections, corresponding respectively to STS-1s A, B, and C of Figure 5.
  • frame structure 1000 can be created with about a 2/3-row de-interleave latency, as the STS-1 A payload section for each row cannot be completed until three bytes from the end of that row, when the last STS-1 A byte is received.
  • PIB 800 is preferably reconfigurable, however, to perform no de-interleaving or de- interleaving for a different interleaved structure.
  • An Input FIFO 810 receives an STS-N data stream from an OC-N PHY. Input FIFO 810 detects framing bytes within the STS-N data stream in order to frame-sync. Input FIFO 810 then coordinates with a standard SONET descrambler 820 to descramble the portions of each SONET frame that were scrambled prior to transmission.
  • the DQI output of input FIFO 810 outputs a byte DQI of descrambled SONET frame data each time it receives a data clock signal D_CLK from a VIB (Virtual Input Buffer) stage 830.
  • VIB stage 830 has the task of matching each byte of DQI with a corresponding VIBAddress entry from a programmable Byte De-Interleaver Pattern (BDIP) register 834.
  • VIB stage 830 obtains one VIBAddress entry from BDIP register 834, and the BDIP register read pointer then increments, each time VIB stage 830 asserts an address clock signal A_CLK.
  • BDIP register 834 contains a mapping for one row of SONET frame data to a group of VIB input buffers in an input buffer 850.
  • For example, the pattern for frame structure 600 of Figure 6 can be 275 entries, in which:
  • VIB 0 is not assigned to hold any bytes
  • VIB 1 is assigned to hold STS-I A bytes
  • VIB 2 is assigned to hold STS-I B bytes
  • VIB 3 is assigned to hold STS-I C bytes.
  • the control character "&" is not matched with a DQI by VIB stage 830, but is written to the same VIB as the last DQI, signifying "end of row" for that particular VIB.
  • the control character "@” tells BDIP register that it has reached the end of a frame row, causing BDIP register 834 to reset its read pointer to the head of the register and assert the signal S_Wrap to alert an output controller 860 that a complete frame row has been (or will be in a few more clock cycles) de- interleaved and is ready for output.
  • An address decode stage 832 receives DQI/VIBAddress pairs and &/VIBAddress pairs from VIB stage 830. Address decode stage 832 supplies the VIBAddress to VIB pointer registers block 840, which returns a physical buffer address corresponding to the VIBAddress. Address decode stage 832 writes the DQI or "&" to the physical buffer address in input buffer 850.
  • the VIB pointer registers 840 are configured to operate as a plurality of ring buffer pointer registers, each storing a physical StartAddress, a physical EndAddress, a ReadPointer, and a WritePointer.
  • an output controller 860 interfaces with a parser through a FrameRowReady output signal, a parser output FIFO 862, and a DataRequest input.
  • Output controller 860 also interfaces with a SPU or SPU cluster through a SPU output FIFO 864 and a SPU_IN FIFO 866. The operation of both interfaces will be described in turn.
  • output controller 860 When output controller 860 receives an S_Wrap signal from BDIP register 834, controller 860 asserts FrameRowReady to the parser. Output controller 860 can then respond to DataRequest inputs from the parser by transmitting DQO (DQ Output) values to the parser through parser FIFO 862.
  • DQO DQ Output
  • Output controller 860 loads parser FIFO 862 by issuing D_REQ (data request) signals to an address decode stage 870.
  • Address decode stage 870 is initially set internally to read out of VIB 0. Each time it receives a D_REQ signal, address decode stage 870 requests the current ReadPointer for VIB 0 from VIB pointer registers 840. Address decode stage 870 supplies this ReadPointer as a buffer read address to input buffer 850 and receives a DQO. Meanwhile, VIB Pointer Registers 840 increment ReadPointer, and reset it to StartAddress when ReadPointer reaches EndAddress.
  • address decode stage 870 When address decode stage 870 receives a DQO from input buffer 850 that contains an "&" control character, it knows that it has reaches the end of that virtual buffer's input for the current SONET row. Accordingly, address decode stage 870 increments its internal VIBAddress from VIB 0 to VIB 1 and begins reading from VIB 1 on the next D_REQ. This same behavior is repeated as an "&" is reached in each VIB, until the "&" in the last allocated VIB is reached. At that point, the internal VIBAddress is reset to VIB 0 in preparation for reading the next frame row.
  • DQO values can be read out to a SPU in similar fashion through SPU FIFO 864, based on commands received from a SPU at SPU_IN FIFO 866. It is also noted that "burst" read or skip-forward commands can be efficiently implemented by storing pointers to the next few "&" characters in each VIB in the VIB pointer registers. Multiple consecutive DQO values can then be read at once as long as they do not read past the next "&" in the current VIB.
  • SPU_IN FIFO 866 can also be used to program PIB 800 at runtime.
  • a SPU can instruct output controller 860 to receive a pattern and load that pattern to BDIP register 834 through the RegisterProgram signal interface.
  • a SPU can also instruct output controller 860 to configure VIB StartAddress and EndAddress values for each active VIB in VIB Pointer Registers 840 and then initialize the ReadPointer and WritePointer values.
  • a SPU can also instruct output controller 860 to configure input FIFO 810 for the appropriate frame size and alignment character sequence, and load a seed to descrambler 820.
  • VIB stage 830 and/or Address Decode stage 832 can be pipelined as necessary to achieve an appropriate throughput for designed STS-N data rates.
  • BDIP register 834 is designed with a size sufficient to hold a row mapping for the largest STS-N of interest.
  • Input buffer 850 is designed with a size sufficient to hold at least two rows of the largest STS-N of interest.
  • VIB pointer registers 840 are designed with enough pointer register sets to de-interleave the largest STS-N of interest were that STS-N composed entirely of STS-1s (for instance, 49 pointer register sets could be included to handle de-interleaving for any STS-48).
  • the described hardware can be configured with a continuous VIB 0 pattern in BDIP register 834 and a VIB 0 register set with a StartAddress set to the left end of input buffer 850 and an EndAddress set to the right end of input buffer 850.
  • PIB 800 can produce the modified SONET frame structure 1000 of Figure 10 to parser 900 using three virtual input buffers and the exemplary pattern stored in BDIP register 834.
  • frame structure 1000 is rearranged to place the 90 columns corresponding to STS-1 A TOH and SPE columns in the first 90 columns of the frame, followed by all STS-1 B columns starting in column 91, followed by all STS-1 C columns starting in column 181.
  • Other column rearrangements are possible, as long as the parser grammar is configured to expect what is received.
  • parser 900 will still use at least four different contexts to process frame structure 1000. As illustrated in Figure 11, seven contexts are used. Context 0 is the root context for the input stream. Contexts 1, 3, and 5 are used to parse the TOH bytes for STS-1 A, B, and C, respectively. Contexts 2, 4, and 6 are used to parse the payloads for STS-1 A, B, and C, respectively.
  • Figure 11 repeats only frame K+1, row 1 from the rearranged STS-3 example of Figure 10.
  • the first three bytes can be parsed in context 1, an STS-1 parsing context with grammar designed to recognize the alignment and J0 characters appearing in SOH row 1.
  • the next 87 bytes are part of the STS-1 A payload context, and contain a POH byte and 86 payload bytes, which in this example are considered to be packet data (headers and/or packet payload).
  • the actual location of the POH byte within these 87 bytes is part of context 1, and must be remembered from an H1H2 LOH field that was previously parsed.
  • the remaining 86 bytes in all likelihood do not start with the first byte of a packet, but could be a continuation of a previously unfinished packet parsing context that must be revisited.
  • Parser 900 comprises an input data interface 910, a bank of context symbol registers 920, a parser state machine 930, a parser stack manager 940, a bank of context head/tail registers 950, and a parser stack memory 960.
  • Input data interface 910 communicates with PIB 800.
  • When PIB 800 asserts D_RDY to interface 910, interface 910 is allowed to send load or skip requests to PIB 800 until D_RDY is deasserted, signaling the end of a SONET frame row.
  • PIB 800 supplies input data to interface 910 over bus DIN.
  • Input data interface 910 requests data from PIB 800 in response to requests from parser state machine 930 (although interface 910 may cache not-yet-requested data as long as it responds correctly to external requests).
  • Parser state machine 930 maintains the ID for the current context on the CTS bus. Whenever parser state machine 930 instructs input data interface 910 to load input data to context symbol registers 920, row byte counters in a POH registers/counters block 912, internal to input data interface 910, track the number of bytes read to that context.
  • In register/counter block 912, two count registers are allocated for each context. The first, the previously mentioned row_byte_counter, counts the number of bytes read to the current context on the current frame row. The row byte counters for all contexts are reset each new frame row. The second, the SPE_byte_counter, counts the number of bytes read to the current context in the current SPE. The SPE byte counter counts are reset after each SPE is read in.
  • a row_count_max register defines the maximum number of symbols that should be read to a context on each frame row.
  • a SPE_count_max register defines the maximum number of symbols that should be read to a context in each SPE.
  • a current_POH_pointer maintains a value for the column location of the POH byte in the current SPE.
  • a next_POH_pointer maintains a value for the column location of the POH byte in the next SPE.
  • the maximum-count registers are loaded when parser table 740 and production rule table 750 are configured for a particular input format.
  • the current_POH_pointer and next_POH_pointer are dynamic, and are updated as H1H2 pointer values are parsed from the input stream.
  • an enable flag register also exists for each context; the value of the enable flag register indicates whether that context is currently in use.
  • When a row byte counter reaches its row_count_max, interface 910 signals a level-decrement grammar context switch to parser state machine 930.
  • Likewise, when a SPE byte counter reaches its SPE_count_max, input data interface 910 signals a level-decrement grammar context switch to parser state machine 930 and stops any pending transfer to the context symbol registers.
  • registers/counters 912 determine when the transmitter has signaled a positive or a negative byte stuff.
  • a positive byte stuff is signaled by the transmitter inverting bits 7, 9, 11, 13, and 15 of the POH pointer, and a negative byte stuff is signaled by the transmitter inverting bits 8, 10, 12, 14, and 16 of the POH pointer.
  • registers/counters 912 compare the next_POH_pointer value to the current_POH_pointer value and signal a positive or negative byte stuff condition, as described, to parser state machine 930. Also, for a positive byte stuff, the row byte counter is incremented by one; for a negative byte stuff, the row byte counter is decremented by one.
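The inversion test can be sketched as below. This is an illustration only: here the ten pointer offset bits are taken to be bits 7-16 of the 16-bit H1H2 word, with the odd-numbered bits treated as the "increment" set and the even-numbered bits as the "decrement" set, per the bullet above.

    ODD_BITS  = 0b1010101010   # offset bits from H1H2 bits 7, 9, 11, 13, 15
    EVEN_BITS = 0b0101010101   # offset bits from H1H2 bits 8, 10, 12, 14, 16

    def classify_pointer(current: int, received: int) -> str:
        flipped = current ^ received
        if flipped == ODD_BITS:
            return "positive stuff"    # row byte counter incremented by one
        if flipped == EVEN_BITS:
            return "negative stuff"    # row byte counter decremented by one
        return "no stuff" if flipped == 0 else "new pointer value"

    print(classify_pointer(0b0000011111, 0b1010110101))  # positive stuff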
  • Context symbol registers 920 store context symbols for each of N potential contexts. For instance, when context 2 is an IP packet context, context 2 may be exited and re-entered several times per packet before parsing is completed on that packet. Context symbol registers 920 maintain the current symbol state in the interim for the suspended contexts. When registers 920 receive input bytes from input data interface 910, they store them to the context register indicated on the CTS bus. Further, the value output on the DI bus to parser state machine 930 and parser table 740 is the value contained in the context register indicated on the CTS bus. Context symbol registers 920 also preferably maintain two values for each context symbol register, indicating how many of its input bytes are valid, and how many input bytes have been requested but not filled.
  • Parser state machine 930 coordinates operation of the other elements of parser 900, and signals all context switches. As previously mentioned, byte counter comparisons may in some circumstances signal the parser state machine to switch contexts. The example below will also demonstrate how context switches can be initiated by the grammar itself, when parser state machine receives special non-terminals that instruct it to switch contexts. Except for the context-switching function described herein, parser state machine 930 can in some embodiments function similarly to the parser state machine described in copending U.S. Patent Application 10/351,030.
  • Parser stack manager 940 performs pushes and pops of parser stack symbols to parser stack memory 960.
  • the CTS bus value is used by parser stack manager 940 to access a set of head/tail registers 950 for each context.
  • the head/tail registers 950 contain two pointers for each context: a head pointer, which contains the address in parser stack memory 960 for the top-of-stack symbol for that context, and a tail pointer, which marks the other end of that context's stack. As symbols are pushed onto and popped from a context's stack, the head pointer is incremented or decremented accordingly.
  • The operation of and cooperation between the elements of parser 900 will now be described.
  • Before parser 900 begins processing input from an STS-N stream, it can be configured for the expected frame structure.
  • The root grammar for the illustrated embodiment can be an STS-3 root frame grammar, for instance:
  • $STS-3_DI_Frame: @@L+1 CTS @@L+3 CTS @@L+5 CTS
  • the STS-3 stream is defined as a top of frame (TOF) symbol followed by the de-interleaved frame contents.
  • the TOF grammar includes an instruction to locate the framing bytes at the start of each frame.
  • the STS-3 de-interleaved frame definition includes nine repetitions of a common row grammar, one per frame row.
  • @@L+n is a parser directive to shift contexts, with relation to the present context, upwards n contexts.
  • the STS-1 grammar processes STS-1 de-interleaved input on nine rows.
  • STS-1_SPE_row processing increments to the next context, which could be an IP context, for example.
  • the input data interface 910 will signal a context switch back to the STS-1_SPE_row grammar when the POH column is reached, at which time the POH byte is parsed.
  • input data interface 910 signals a context switch back to the STS-1_SPE_row grammar when the end of that SPE row is reached.
  • the STS-1_SPE_row grammar then instructs the parser state machine to return to the root grammar with the @@Lroot command.
  • Each set of defined transport overhead bytes is parsed within the appropriate row.
  • several of the transport overhead bytes may be processed with their own sub-grammars, if desired.
  • the D1 through D12 bytes can be used as another data channel, which could be parsed with an appropriate grammar.
  • TOH row 4 contains the POH pointer for the next SPE.
  • the special directive @@XferPointer transfers the two pointer bytes to the appropriate next_POH_pointer register, causing input data interface 910 to assert Pos_Shift, Neg_Shift, or No_Shift back to parser state machine 930.
  • depending on which shift is signaled, either two, three, or four input bytes are then consumed.
  • The same multiple-context approach extends to STS-12, STS-48, etc. streams formed from any combination of STS-1, STS-3, STS-3c, STS-12, STS-12c, and STS-48c streams.
  • Direct network communication using Transmission Control Protocol may increase a networking device's vulnerability to TCP-based attacks and require additional processing of packets upon arrival.
  • TCP Transmission Control Protocol
  • FIG. 12 illustrates, in block form, a network communications system 2100 useful with embodiments of the present invention.
  • the network communications system 2100 includes a networking device 2140 that communicates over a network 2120 via a proxy 2200.
  • the network 2120 may be any Wide Area Network (WAN) that provides packet switching.
  • the networking device 2140 may be a server or any other device capable of network communications.
  • the proxy 2200 maintains at least one TCP session over the network 2120 and a corresponding local session with the networking device 2140.
  • the local session may be a TCP session established with the networking device 2140 through a private network, e.g., a company enterprise network, Internet Service Provider (ISP) network, home network, etc.
  • ISP Internet Service Provider
  • the proxy 2200 functions as a network communications intermediary for networking device 2140 by translating data between the local and TCP sessions. For instance, when receiving packetized data from the network 2120 in a TCP session, the proxy 2200 may sequence and depacketize the data prior to providing it to the networking device 2140 in the local session. The depacketization may include reassembling Internet Protocol (IP) fragments, and/or performing cryptography operations, e.g., according to the Internet Protocol Security (IPSec) header(s).
  • IP Internet Protocol
  • IPSec Internet Protocol Security
  • the proxy 2200 may perform Network Address Translation (NAT) of destination and source IP addresses to help hide the identity of the networking device 2140.
  • NAT Network Address Translation
  • the proxy 2200 may be implemented at any network interface, such as a firewall.
  • proxy 2200 may provide network communication and processing for multiple networking devices 2140.
  • the management of network communication at a single network interface point may allow proxy 2200 to provide additional functionality for increasing the efficiency of the network management and packet processing.
  • the proxy 2200 when the proxy 2200 discovers network changes, e.g., next hop change, Internet Control Message Protocol (ICMP) fragments, packet loss, etc., in one of the TCP sessions, the changes may be applied to all of the TCP sessions. This becomes especially powerful when combined with the full neighbor implementation of Border Gateway Protocol (BGP) or other link state routing protocol that is aware of the entire topology of network 2120. Additionally, since the proxy 2200 maintains multiple sessions, the status and statistics of these sessions can be accessed at a single network interface point.
  • BGP Border Gateway Protocol
  • the proxy 2200 includes a network-interface proxy 2210 to manage one or more TCP sessions over the network 2120 and a device-interface proxy 2220 to manage one or more local sessions with networking device 2140.
  • the network-interface proxy 2210 and device-interface proxy 2220 exchange data to be transmitted over their respective sessions. For instance, when network- interface proxy 2210 provides payload data from the TCP session to device-interface proxy 2220, the device-interface proxy 2220 transmits the data to the networking device 2140 in the local session. Alternatively, when device-interface proxy 2220 provides payload data from networking device 2140 to network-interface proxy 2210, the network-interface proxy 2210 transmits the data over the network 2120 in the TCP session.
  • the network-interface proxy 2210 includes a TCP state machine 2212 to establish and manage the TCP sessions over the network 2120, including maintaining state information for each TCP session and implementing packet sequencing, error recovery and flow control mechanisms.
  • the TCP state machine 2212 sequences and processes packet streams received over the TCP sessions and provides the sequenced payload data to the device-interface proxy 2220. Because TCP state machine 2212 previously sequenced and processed the payload data, the device-interface proxy 2220 is then capable of providing a uniform data stream to networking device 2140 in the local session.
  • the TCP state machine 2212 further packetizes payload data received from device-interface proxy 2220 and transmits it over the corresponding TCP session.
  • the device-interface proxy 2220 may include a TCP state machine 2222 to establish and manage local TCP sessions with the networking device 2140.
  • TCP state machine 2222 operates similarly to TCP state machine 2212 with respect to packet streams over the local TCP sessions.
  • FIG. 13B shows, in block form, an example packet flow through proxy 2200 shown in Figures 12 and 13A.
  • the network-interface proxy 2210 receives a packet stream in TCP session 2122.
  • the packet stream includes three TCP data payloads 1, 2, and 3, which may arrive at network-interface proxy 2210 at varying rates, out-of-order, IP-fragmented, e.g., payload 2 fragmented into 2A, 2B, and 2C, and duplicated.
  • the network-interface proxy 2210 reassembles the fragmented packets (fragments 2A, 2B, and 2C into TCP payload 2), reorders the TCP payloads, and discards the duplicated packets upon their arrival.
  • the in-order and reassembled TCP payload data is then provided to the device-interface proxy 2220, where it is transmitted in the local TCP session 2124 at a uniform rate.
  • the network-interface proxy 2210 may also perform cryptography operations upon the TCP packets prior to the reassembly and reordering, when they are received in need of decryption and/or authentication. This processing and uniform transmission by the proxy 2200 allows a networking device 2140 to receive a uniform in-order packet stream, thus reducing its processing burden. A sketch of the reorder/reassembly behavior follows below.
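A minimal sketch of this sequencing behavior (illustrative Python, not the proxy's implementation; IPSec processing and fragment reassembly are omitted) buffers out-of-order segments by sequence number, drops duplicates, and releases only the in-order prefix:

    def release_in_order(segments, next_seq, pending):
        # segments: (sequence_number, payload_bytes) pairs in arrival order
        out = []
        for seq, data in segments:
            if seq < next_seq or seq in pending:      # duplicate: discard
                continue
            pending[seq] = data
            while next_seq in pending:                # emit the contiguous run
                chunk = pending.pop(next_seq)
                out.append(chunk)
                next_seq += len(chunk)
        return out, next_seq

    out, nxt = release_in_order([(100, b"22"), (98, b"11"), (98, b"11")], 98, {})
    print(b"".join(out), nxt)                         # b'1122' 102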
  • FIG. 14 shows an example flow chart 2300 illustrating embodiments for operating the proxy 2200 shown in Figures 12, 13A, and 13B.
  • the proxy 2200 establishes a TCP session over the network 2120 and a local session with a networking device 2140.
  • the proxy 2200 may establish the TCP session 2122 through a three-way handshake with a remote TCP endpoint.
  • the proxy 2200 may then establish a local session 2124 with the networking device 2140 responsive to the remote TCP session 2122 establishment.
  • the local session 2124 may be established concurrently with the establishment of the TCP session 2122 to decrease data exchange latency, or it may be established after the TCP session 2122 to avoid problems with SYN floods and other TCP-based attacks.
  • the local session 2124 is also a TCP session established with a three-way handshake between the proxy 2200 and the networking device 2140.
  • the proxy 2200 receives a packet stream in the TCP session 2122 over the network 2120.
  • the proxy 2200 manages the TCP session 2122 by providing error recovery for lost or late packets and flow rate control by adjusting the size of the TCP window.
  • the proxy 2200 translates data from the packet stream to the local session 2124.
  • the translation includes sequencing and depacketizing the data, e.g., with the network-interface proxy 2210, and providing the data to the networking device 2140 in the local session 2124.
  • the sequencing may include reordering of those packets received out-of-order and discarding duplicated packets, while the depacketization may include any additional processing that may be required, such as reassembly of IP fragmented packets and/or performance of cryptography operations according to IPSec headers.
  • Although flowchart 2300 shows data transfers from the network 2120 to the networking device 2140, proxy 2200 may also provide data in the opposite direction.
  • The proxy 2200 thus provides operations that are not typically provided in firewalls.
  • the proxy 2200 can also include, in addition to the TCP proxy operations, other conventional firewall operations.
  • Figure 15 illustrates, in block form, a semantic processor 2400 useful with embodiments of the network-interface proxy 2210 and device-interface proxy 2220 shown in Figures 13A and 13B.
  • a semantic processor 2400 contains an input buffer 2430 for buffering data streams received through the input port 2410, and an output buffer 2440 for buffering data streams to be transmitted through output port 2420.
  • Input port 2410 and output port 2420 may comprise a physical interface to network 2120 (Figures 12, 13A, and 13B), e.g., an optical, electrical, or radio frequency driver/receiver pair for an Ethernet, Fibre Channel, 802.11x, Universal Serial Bus, Firewire, SONET, or other physical layer interface.
  • a PCI-X interface 2480 is coupled to the input buffer 2430, the output buffer 2440, and an external PCI bus 2482.
  • the PCI bus 2482 can connect to other PCI-capable components, such as disk drives, interfaces for additional network ports, other semantic processors, etc.
  • the PCI-X interface 2480 provides data streams or packets to input buffer 2430 from PCI bus 2482 and transmits data streams or packets over PCI bus 2482 from output buffer 2440.
  • Semantic processor 2400 includes a direct execution parser (DXP) 2450 that controls the processing of packets in the input buffer 2430 and a semantic processing unit (SPU) 2460 for processing segments of the packets or for performing other operations.
  • the DXP 2450 maintains an internal parser stack (not shown) of non-terminal (and possibly also terminal) symbols, based on parsing of the current input frame or packet up to the current input symbol. When the symbol (or symbols) at the top of the parser stack is a terminal symbol, DXP 2450 compares data DI at the head of the input stream to the terminal symbol and expects a match in order to continue.
  • DXP 2450 uses the non-terminal symbol NT and current input data DI to expand the grammar production on the stack. As parsing continues, DXP 2450 instructs a SPU 2460 to process segments of the input, or perform other operations.
  • Semantic processor 2400 uses at least three tables. Code segments for SPU 2460 are stored in semantic code table 2456. Complex grammatical production rules are stored in a production rule table (PRT) 2454. Production rule (PR) codes 2453 for retrieving those production rules are stored in a parser table (PT) 2452. The PR codes 2453 in parser table 2452 also allow DXP 2450 to detect whether, for a given production rule, a code segment from semantic code table 2456 should be loaded and executed by SPU 2460. The production rule (PR) codes 2453 in parser table 2452 point to production rules in production rule table 2454. PR codes are stored, e.g., in a row-column format or a content-addressable format.
  • a concatenation of the non-terminal symbol NT and the input data value (or values) DI can provide the input to the parser table 2452.
  • semantic processor 2400 implements a content-addressable format, where DXP 2450 concatenates the non-terminal symbol NT with 8 bytes of current input data DI to provide the input to the parser table 2452.
  • Alternatively, parser table 2452 concatenates the non-terminal symbol NT and 8 bytes of current input data DI received from DXP 2450. A sketch of this key formation follows below.
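The key formation described above can be sketched as follows; the 2-byte NT width is an assumption for illustration.

    def parser_table_key(nt: int, di: bytes) -> bytes:
        # Concatenate the non-terminal code with 8 bytes of current input data.
        return nt.to_bytes(2, "big") + di[:8]

    key = parser_table_key(0x0104, b"\x45\x00\x00\x28ABCDEF")
    print(key.hex())   # 0104 followed by the first 8 input bytes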
  • Input buffer 2430 includes a recirculation buffer 2432 to buffer data streams requiring additional passes through the DXP 2450.
  • DXP 2450 parses data streams from recirculation buffer 2432 similarly to those received through input port 2410 or PCI bus 2482.
  • the semantic processor 2400 includes a memory subsystem 2470 for storing or augmenting segments of the packets.
  • the SPU 2460 may sequence TCP packets and/or collect and assemble IP fragmented packets within memory subsystem 2470.
  • the memory subsystem 2470 may also perform cryptography operations on data streams, including encryption, decryption, and authentication, when directed by SPU 2460. Once reassembled and/or processed in the memory subsystem 2470, the packets or their headers with a specialized NT symbol may be sent to the recirculation buffer 2432 for additional parsing by DXP 2450.
  • the reception order of packets gives rise to semantics that may be exploited by this semantic processing architecture. For instance, the reception of a TCP SYN packet indicates to the DXP 2450 an attempt to establish a TCP session; however, if the session has already been established, there is no further need to allocate resources to complete the processing of the packet, acknowledge its arrival, or maintain corresponding state information. Thus any TCP packet may be correct syntactically, but out-of-sequence with regard to the state of the TCP session.
  • the semantic processor 2400 recognizes these packet-ordering semantics and implements a TCP state machine, such as 2212 or 2222 in Figure 13A, for managing the required TCP interactions and maintaining the state information for TCP sessions.
  • Figure 16 shows an example flow chart 2500 illustrating embodiments for operating the semantic processor 2400 shown in Figure 15 as a TCP state machine.
  • the semantic processor 2400 receives a packet at input buffer 2430 (at block 2510) and determines the packet contains a TCP header (at block 2520).
  • the semantic processor 2400 determines the presence of the TCP header by parsing through the received packet's lower level headers with DXP 2450.
  • the semantic processor 2400 determines whether the received TCP packet corresponds to a TCP session maintained by semantic processor 2400.
  • the memory subsystem 2470 maintains information for each active TCP session with semantic processor 2400, including the current state of the session, packet sequencing, and window sizing.
  • the SPU 2460 when directed by the DXP 2450, performs a lookup within memory subsystem 2470 for a maintained TCP session that corresponds to the received TCP packet.
  • the semantic processor 2400 determines whether the TCP packet coincides with the current state of the TCP session.
  • the SPU 2460 may retrieve the state of the maintained TCP session, e.g., one or more non-terminal (NT) symbols, for the DXP 2450. These NT symbols point to specialized grammatical production rules that correspond to each of the TCP states and control how the DXP 2450 parses the TCP packet.
  • a TCP SYN packet received for an already established session does not coincide with the state of the TCP session and thus is discarded (at block 2580) without further processing.
  • When the TCP packet is a TCP data packet or a TCP FIN packet in an already established TCP session, the DXP 2450 parses the packet according to the state of the TCP session in a next block 2550.
  • the SPU 2460 may forward the TCP packet to the destination address for a networking device 2140, or send the payload to another semantic processor 2400 where it is provided to the networking device 2140 in a local session 2124.
  • the SPU 2460 performs any reassembly or cryptography operations, including decryption and/or authentication, before forwarding the packets in the TCP session to the networking device 2140.
  • the processed packets are provided to output buffer 2440, or to PCI bus 2482 via PCI-X interface 2480, after the processing operations have been completed by SPU 2460.
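The decision structure of blocks 2530-2580 can be summarized in the hedged sketch below. This is not the patent's grammar; the state names and the socket key are illustrative, and the ACK-driven transition to an established state is omitted for brevity.

    sessions = {}   # socket 4-tuple -> session state, e.g. "ESTABLISHED"

    def handle_tcp(socket_key, flags):
        state = sessions.get(socket_key)
        if state is None:                          # no maintained session
            if "SYN" in flags:                     # block 2570: open a session
                sessions[socket_key] = "SYN_RCVD"
                return "send SYN-ACK"
            return "discard"                       # block 2580
        if "SYN" in flags:                         # SYN for an existing session
            return "discard"                       # out-of-sequence: block 2580
        return "parse per session state"           # block 2550

    print(handle_tcp(("10.0.0.1", 80, "10.0.0.2", 1234), {"SYN"}))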
  • When the received TCP packet does not correspond to a TCP session maintained within semantic processor 2400, in a next decision block 2560, the semantic processor 2400 determines whether the TCP packet is a SYN packet attempting to establish a new TCP session.
  • the DXP 2450 may determine if the TCP packet is a SYN packet while parsing the TCP header.
  • When the TCP packet is not a SYN packet, the semantic processor 2400 discards the packet from the input buffer 2430. The SPU 2460 may discard the packet when directed to do so by the DXP 2450.
  • When the TCP packet is a SYN packet, in a next block 2570, the semantic processor 2400 establishes the new TCP session. The SPU 2460, when directed by the DXP 2450, executes microinstructions from semantic code table 2456 that cause it to store state information for the new session; for instance, the SPU 2460 may open the TCP session by sending a TCP SYN-ACK packet in response.
  • Packets have headers that provide information about the contents of the packet.
  • Semantic processing, where the semantics of the header drive the processing of the payload as necessary, fits especially well in packet-based networking.
  • FIG. 17 shows a block diagram of a semantic processor 3010.
  • The semantic processor 3010 may contain an input buffer 3014 to buffer an input data stream received through the input port 3012; a parser 3018, which may also be referred to as a direct execution parser, to control the processing of packets in the input buffer 3014; at least one semantic processing unit 3016 to process segments of the packets or to perform other operations; and a memory subsystem 3026 to store or augment segments of the packets.
  • the parser 3018 maintains an internal parser stack 3032, shown in Figure 20, of symbols, based on parsing of the current input frame or packet up to the current input symbol. For instance, each symbol on the parser stack 3032 is capable of indicating to the parser 3018 a parsing state for the current input frame or packet.
  • the symbols are generally non-terminal symbols, although terminal symbols may be in the parser stack as well.
  • the parser 3018 compares data at the head of the input stream to the terminal symbol and expects a match in order to continue.
  • the data is identified as Data In and is generally taken in some portion, such as bytes. Terminal symbols, for example, may be compared against one byte of data, DI.
  • when the symbol at the top of the parser stack 3032 is a non-terminal (NT) symbol, parser 3018 uses the non-terminal symbol NT and current input data DI to detect a match in the production rule code memory 3200 and subsequently the production rule table (PRT) 3022, which may yield more non-terminal (NT) symbols that expand the grammar production on the stack 3032.
  • PRT production rule table
  • the parser 3018 may instruct SPU 3016 to process segments of the input stream, or perform other operations.
  • a segment of the input stream may be the next 'n' bytes of data, identified as DI[n].
  • the parser 3018 may parse the data in the input stream prior to receiving all of the data to be processed by the semantic processor 3010. For instance, when the data is packetized the semantic processor 3010 may begin to parse through the headers of the packet before the entire packet is received at input port 3012.
  • Semantic processor 3010 generally uses at least three tables. Code segments for SPU 3016 are stored in semantic code table (SCT) 3024. Complex grammatical production rules are stored in a production rule table (PRT) 3022. Production rule (PR) codes for retrieving those production rules are stored in a parser table (PT) 3020. The PR codes in parser table 3020 also allow parser 3018 to detect whether a code segment from semantic code table 3024 should be loaded and executed by SPU 3016 for a given production rule.
  • SCT semantic code table
  • PRT production rule table
  • PR Production rule
  • PT parser table
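The relationship between the three tables can be sketched in software as follows. The record layouts, names, and values here are illustrative assumptions, not the disclosed formats; a single parsing cycle looks up a PR code in the parser table from the current NT and input bytes, fetches the corresponding production rule from the PRT, and uses its SEP to select SPU code in the SCT.

```c
#include <stdio.h>

/* Illustrative record layout; field widths and contents are assumptions. */
typedef struct {            /* one production rule in the PRT */
    const char *symbols;    /* T/NT symbols to push on the parser stack */
    int         sep;        /* SPU entry point into the SCT (-1 = none) */
    int         skip_bytes; /* input bytes consumed this cycle */
} production_rule;

/* PT: maps (NT, input) to a PR code; here a trivial stand-in function. */
static int parser_table_lookup(int nt, const unsigned char *di) {
    if (nt == 1 && di[0] == 0x45) return 7;  /* e.g., NT_IP + IPv4 byte */
    return -1;                               /* no match: default rule */
}

/* PRT: the PR code indexes a production rule. */
static production_rule prt[8] = {
    [7] = { "NT_IP_BODY NT_TCP", /*sep=*/3, /*skip=*/1 },
};

int main(void) {
    unsigned char di[8] = { 0x45, 0 };       /* head of the input stream */
    int pr_code = parser_table_lookup(/*nt=*/1, di);
    if (pr_code >= 0) {
        production_rule r = prt[pr_code];
        printf("push: %s\n", r.symbols);     /* expand grammar on stack */
        if (r.sep >= 0)
            printf("dispatch SPU code at SCT entry %d\n", r.sep);
        printf("advance input by %d byte(s)\n", r.skip_bytes);
    } else {
        printf("no PR code: fall back to default rule for this NT\n");
    }
    return 0;
}
```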
  • the production rule (PR) codes in parser table 3020 point to production rules in production rule memory 3220.
  • PR codes are stored in some fashion, such as in a row-column format or a content-addressable format.
  • in a row-column format, the rows of the table 3020 are indexed by a non-terminal symbol NT on the top of the internal parser stack 3032 of Figure 20, and the columns of the table are indexed by an input data value or values DI at the head of the data input stream in input buffer 3014.
  • a concatenation of the non-terminal symbol NT and the input data value or values DI can provide the input to the table 3020.
  • Semantic processor 3010 will typically implement a content-addressable format, in which parser 3018 concatenates the non-terminal symbol NT with 8 bytes of current input data DI to provide the input to the parser table 3020.
  • parser table 3020 concatenates the non-terminal symbol NT and 8 bytes of prior input data DI stored in the parser 3018. It must be noted that some embodiments may include more components than those shown in Figure 17. However, for discussion purposes and application of the embodiments, those components are peripheral.
  • Parser table 3020 is comprised of a production rule (PR) code memory 3200.
  • PR code memory 3200 contains a plurality of PR codes that are used to access a corresponding production rule stored in the production rule table (PRT) 3022.
  • PRT production rule table
  • the input values discussed above, such as a non-terminal (NT) symbol concatenated with current input values DI[n] (where n is a selected match width in bytes), need not be assigned in any particular order in PR code memory 3200.
  • parser table 3020 also includes an addressor 3202 that receives an NT symbol and data values DI[n] from parser 3018 of Figure 17. Addressor 3202 concatenates an NT symbol with the data values DI[n], and applies the concatenated value to PR code memory 3200.
  • parser 3018 concatenates the NT symbol and data values DI[n] prior to transmitting them to parser table 3020.
  • Although conceptually it is often useful to view the structure of production rule code memory 3200 as a matrix with one PR code for each unique combination of NT code and data values, the embodiments of the present invention are not limited to this structure. Different types of memory and memory organization may be appropriate for different applications.
  • the parser table 3020 is implemented as a Content Addressable Memory (CAM), where addressor 3202 uses an NT code and input data values DI[n] as a key for the CAM to look up the PR code corresponding to a production rule in the PRT 3022.
  • the CAM is a Ternary CAM (TCAM) populated with TCAM entries.
  • TCAM entry comprises an NT code and a DI[n] match value.
  • Each NT code can have multiple TCAM entries.
  • Each bit of the DI[n] match value can be set to "0", "1", or "X" (representing "Don't Care").
  • one row of the TCAM can contain an NT code NT_IP for an IP destination address field, followed by four bytes representing an IP destination address corresponding to a device incorporating the semantic processor 3010.
  • the remaining four bytes of the TCAM row are set to "don't care.”
  • when NT_IP and eight bytes DI[8] are submitted to parser table 3020, and the first four bytes of DI[8] contain the correct IP address, a match will occur no matter what the last four bytes of DI[8] contain.
  • because the TCAM employs the "Don't Care" capability and there can be multiple TCAM entries for a single NT, the TCAM can find multiple matching TCAM entries for a given NT code and DI[n] match value. The TCAM prioritizes these matches through its hardware and only outputs the match of the highest priority. Further, when an NT code and a DI[n] match value are submitted to the TCAM, the TCAM attempts to match every TCAM entry with the received NT code and DI[n] match value in parallel. Thus, the TCAM has the ability to determine whether a match was found in parser table 3020 in a single clock cycle of semantic processor 3010.
  • TCAM coding allows a next production rule to be based on any portion of the current eight bytes of input. If only one bit, or byte, anywhere within the current eight bytes at the head of the input stream, is of interest for the current rule, the TCAM entry can be coded such that the rest are ignored during the match. Essentially, the current "symbol" can be defined for a given production rule as any combination of the 64 bits at the head of the input stream.
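A software model of the TCAM matching behavior just described, under the assumption that priority corresponds to row order: each entry carries a value and a mask (cleared mask bits are "Don't Care"), and the row index of the first match serves as the PR code. Real TCAM hardware performs all comparisons in parallel in one clock cycle; the loop below merely emulates that behavior.

```c
#include <stdio.h>
#include <stdint.h>

/* Software model of one TCAM row: NT code plus a 64-bit DI match value.
 * Bits cleared in `mask` are "Don't Care". Row order encodes priority. */
typedef struct { int nt; uint64_t value, mask; } tcam_entry;

static tcam_entry tcam[] = {
    /* NT_IP: match a 4-byte destination address, ignore last 4 bytes */
    { 1, 0xC0A80001ULL << 32, 0xFFFFFFFFULL << 32 },
    /* NT_IP: lower-priority catch-all row */
    { 1, 0, 0 },
};

/* Returns the row index (used as the PR code), or -1 if nothing matched. */
static int tcam_lookup(int nt, uint64_t di) {
    for (size_t row = 0; row < sizeof tcam / sizeof tcam[0]; row++)
        if (tcam[row].nt == nt &&
            ((di ^ tcam[row].value) & tcam[row].mask) == 0)
            return (int)row;
    return -1;
}

int main(void) {
    uint64_t di = (0xC0A80001ULL << 32) | 0xDEADBEEFULL; /* addr + ignored */
    printf("PR code (row) = %d\n", tcam_lookup(1, di));  /* -> row 0 */
    printf("PR code (row) = %d\n", tcam_lookup(1, 0));   /* -> row 1 */
    return 0;
}
```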
  • the TCAM in parser table 3020 produces a PR code corresponding to the TCAM entry 3204 matching NT and DI[n], as explained above.
  • the PR code can be sent back to parser 3018, directly to PR table 3022, or both.
  • the PR code is the row index of the TCAM entry producing a match.
  • the PR code is accompanied by a "valid" bit, which remains unset if no TCAM entry matched the current input.
  • parser table 3020 constructs a default PR code corresponding to the NT supplied to the parser table. The use of a valid bit or default PR code will next be explained in conjunction with Figure 19.
  • Parser table 3020 can be located on or off-chip or both, when parser 3018 and SPU 3016 are integrated together in a circuit.
  • static RAM (SRAM) or TCAM located on-chip can serve as parser table 3020.
  • off-chip DRAM or TCAM storage can store parser table 3020, with addressor 3202 serving as or communicating with a memory controller for the off-chip memory.
  • the parser table 3020 can be located in off-chip memory, with an on-chip cache capable of holding a section of the parser table 3020.
  • PR table 3022 comprises a production rule memory 3220, a Match All Parser entries Table (MAPT) memory 3228, and an addressor 3222.
  • MAPT Match All Parser entries Table
  • addressor 3222 receives PR codes from either parser 3018 or parser table 3020, and receives NT symbols from parser 3018.
  • the received NT symbol is the same NT symbol that is sent to parser table 3020, where it was used to locate the received PR code.
  • Addressor 3222 uses these received PR codes and NT symbols to access corresponding production rules and default production rules, respectively.
  • the received PR codes address production rules in production rule memory 3220 and the received NT codes address default production rules in MAPT 3228.
  • Addressor 3222 may not be necessary in some implementations, but when used, can be part of parser 3018, part of PRT 3022, or an intermediate functional block.
  • An addressor may not be needed, for instance, if parser table 3020 or parser 3018 constructs addresses directly.
  • Production rule memory 3220 stores the production rules 3224 containing three data segments. These data segments include: a symbol segment, a SPU entry point (SEP) segment, and a skip bytes segment. These segments can either be fixed-length segments or variable-length segments that are, preferably, null-terminated.
  • the symbol segment contains terminal and/or non-terminal symbols to be pushed onto the parser stack 3032 of Figure 20.
  • the SEP segment contains SPU entry points (SEPs) used by the SPU 3016 in processing segments of data.
  • the skip bytes segment contains skip bytes data used by the input buffer 3014 to increment its buffer pointer and advance the processing of the input stream. Other information useful in processing production rules can also be stored as part of production rule 3224.
  • MAPT 3228 stores default production rules 3226, which in this embodiment have the same structure as the PRs in production rule memory 3220, and are accessed when a PR code cannot be located during the parser table lookup.
  • Although production rule memory 3220 and MAPT 3228 are shown as two separate memory blocks, there is no requirement or limitation to this implementation. In one embodiment, production rule memory 3220 and MAPT 3228 are implemented as on-chip SRAM, where each production rule and default production rule contains multiple null-terminated segments. As production rules and default production rules can have various lengths, it is preferable to take an approach that allows easy indexing into their respective memories 3220 and 3228.
  • each PR has a fixed length that can accommodate a fixed maximum number of symbols, SEPs, and auxiliary data such as the skip bytes field.
  • the sequence can be terminated with a NULL symbol or SEP.
  • when a given PR would require more than the maximum number, it can be split into two PRs. These are then accessed, such as by having the first issue a skip bytes value of zero and push an NT onto the stack that causes the second to be accessed on the following parsing cycle.
  • a one-to-one correspondence between TCAM entries and PR table entries can be maintained, such that the row address obtained from the TCAM is also the row address of the corresponding production rule in PR table 3022.
  • the MAPT 3228 section of PRT 3022 can be similarly indexed, but using NT codes instead of PR codes. For instance, when a valid bit on the PR code is unset, addressor 3222 can select as a PR table address the row corresponding to the current NT. For instance, if 256 NTs are allowed, MAPT 3228 could contain 256 entries, each indexed to one of the NTs. When parser table 3020 has no entry corresponding to a current NT and data input DI[n], the corresponding default production rule from MAPT 3228 is accessed.
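One way to picture the fixed-length production rule records and the MAPT default-rule fallback is the sketch below; the field widths, counts, and values are assumptions for illustration only.

```c
#include <stdio.h>

#define MAX_SYMS 4            /* assumed fixed maximums per rule */
#define MAX_SEPS 2
#define NT_COUNT 256          /* one default rule per possible NT */

typedef struct {
    int syms[MAX_SYMS];       /* symbols to push; 0 acts as NULL terminator */
    int seps[MAX_SEPS];       /* SPU entry points; 0 terminates */
    int skip_bytes;
} production_rule;

static production_rule prt[64];        /* indexed by PR code (TCAM row) */
static production_rule mapt[NT_COUNT]; /* indexed by NT code: defaults */

/* Select a rule: a valid PR code indexes the PRT directly; otherwise the
 * current NT indexes the MAPT for its default (e.g., flush-packet) rule. */
static const production_rule *fetch_rule(int pr_code, int valid, int nt) {
    return valid ? &prt[pr_code] : &mapt[nt & (NT_COUNT - 1)];
}

int main(void) {
    prt[7]  = (production_rule){ { 11, 12, 0 }, { 3, 0 }, 1 };
    mapt[1] = (production_rule){ { 0 },         { 9, 0 }, 0 }; /* flush */
    const production_rule *r = fetch_rule(7, 1, 1);
    printf("skip %d, first symbol %d\n", r->skip_bytes, r->syms[0]);
    r = fetch_rule(-1, 0, 1);          /* no match: default rule for NT 1 */
    printf("default SEP %d\n", r->seps[0]);
    return 0;
}
```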
  • the parser table 3020 can be configured to respond to one of two expected destination addresses during the appropriate parsing cycle. For all other destination addresses, no parser table entry would be found. Addressor 3222 would then look up the default rule for the current NT, which would direct the parser 3018 and/or SPU 3016 to flush the current packet as a packet of no interest.
  • the PR code could be arithmetically manipulated to determine a production rule's physical memory starting address (this would be possible, for instance, if the production rules were sorted by expanded length, and then PR codes were assigned according to a rule's sorted position).
  • an intermediate pointer table can be used to determine the address of the production rule in PRT 3022 from the PR code or the default production rule in MAPT 3228 from the NT symbol.
  • Figure 20 shows one possible block implementation for parser 3018.
  • Parser control finite state machine (FSM) 3030 controls and sequences overall parser 3018 operations, based on inputs from the other logical blocks in Figure 20.
  • Parser stack 3032 stores the symbols to be executed by parser 3018.
  • Input stream sequence control 3028 retrieves input data values from input buffer 3014, to be processed by parser 3018.
  • SPU interface 3034 dispatches tasks to SPU 3016 on behalf of parser 3018. The particular functions of these blocks will be further described below.
  • semantic processor 3010 waits for a packet to be received at input buffer 3014 through input port 3012.
  • input buffer 3014 sends a Port ID signal to parser 3018 to be pushed onto parser stack 3032 as an NT symbol at 3042.
  • the Port ID signal alerts parser 3018 that a packet has arrived at input buffer 3014.
  • the Port ID signal is received by the input stream sequence control 3028 and transferred to FSM 3030, where it is pushed onto parser stack 3032.
  • a 1-bit status flag, preceding or sent in parallel with the Port ID, may denote the Port ID as an NT symbol.
  • parser 3018 receives N bytes of input stream data from input buffer 3014. This is done after determining that the symbol on the top of parser stack 3032 is not the bottom-of-stack symbol and that the DXP is not waiting for further input. Parser 3018 requests and receives the data through a DATA/CONTROL signal coupled between the input stream sequence control 3028 and input buffer 3014.
  • the process determines whether the symbol on the parser stack 3032 is a terminal symbol or an NT symbol. This determination may be performed by FSM 3030 reading the status flag of the symbol on parser stack 3032.
  • parser 3018 checks for a match between the T symbol and the next byte of data from the received N bytes at 3048.
  • FSM 3030 may check for a match by comparing the next byte of data received by input stream sequence control 3028 to the T symbol on parser stack 3032. After the check is completed, FSM 3030 pops the T symbol off of the parser stack 3032, possibly by decrementing the stack pointer.
  • When a match is not made at 3046 or at 3048, the remainder of the current data segment may be assumed in some circumstances to be unparseable, as there was neither an NT symbol match nor a terminal symbol match.
  • parser 3018 resets parser stack 3032 and launches a SEP to remove the remainder of the current packet from the input buffer 3014.
  • FSM 3030 resets parser stack 3032 by popping off the remaining symbols, or preferably by setting the top-of-stack pointer to point to the bottom-of-stack symbol.
  • Parser 3018 launches a SEP by sending a command to SPU 3016 through SPU interface 3034. This command may require SPU 3016 to load microinstructions from SCT 3024 that, when executed, enable SPU 3016 to remove the remainder of the unparseable data segment from the input buffer 3014. Execution then returns to block 3040.
  • the parser may be configured to handle ordinary header options directly with grammar.
  • Other, less common or difficult header options could be dealt with using a default grammar rule that passes the header options to a SPU for parsing.
  • parser 3018 requests and receives additional input stream data from input buffer 3014.
  • parser 3018 would only request and receive one byte of input stream data after a T symbol match was made, to refill the DI buffer since one input symbol was consumed.
  • parser 3018 sends the NT symbol from parser stack 3032 and the received N bytes DI[N] in input stream sequence control 3028 to parser table 3020, where parser table 3020 checks for a match as previously described. In the illustrated embodiment, parser table 3020 concatenates the NT symbol and the received N bytes.
  • the NT symbol and the received N bytes can be concatenated prior to being sent to parser table 3020.
  • the received N bytes are concurrently sent to both SPU interface 3034 and parser table 3020, and the NT symbol is concurrently sent to both the parser table 3020 and the PRT 3022.
  • FSM 3030 pops the NT symbol off of the parser stack 3032, possibly by decrementing the stack pointer.
  • when a match is made at 3050, it is determined whether the symbol is a debug symbol at 3052. If it is a debug symbol, the process moves to a debug process as set out in Figure 22. If it is not a debug symbol, a production rule code match is determined at 3056. This provides a matching production rule from the production rule table 3022. Optionally, the PR code is sent from parser table 3020 to PRT 3022, through parser 3018.
  • parser 3018 uses the received NT symbol to look up a default production rule in the PRT 3022 at 3058.
  • the default production rule is looked up in the MAPT 3228 memory located within PRT 3022.
  • MAPT 3228 memory can be located in a memory block other than PRT 3022.
  • the default production rule may be a debug rule that places the parser in debug mode in recognition of encountering a symbol that has no rule.
  • when PRT 3022 receives a PR code, it returns only one PR to parser 3018 at 3060, corresponding either to a found production rule or a default production rule.
  • a PR and a default PR can both be returned to parser 3018 at 3060, with parser 3018 determining which will be used.
  • parser 3018 processes the rule received from PRT 3022.
  • the rule received by parser 3018 can either be a production rule or a default production rule.
  • FSM 3030 divides the rule into three segments: a symbol segment, a SEP segment, and a skip bytes segment. Each segment of the rule may be fixed length or null-terminated to enable easy and accurate division.
  • FSM 3030 pushes T and/or NT symbols, contained in the symbol segment of the production rule, onto parser stack 3032.
  • FSM 3030 sends the SEPs contained in the SEP segment of the production rule to SPU interface 3034.
  • Each SEP contains an address to microinstructions located in SCT 3024.
  • SPU interface 3034 allocates SPU 3016 to fetch and execute the microinstructions pointed to by the SEP.
  • SPU interface 3034 also sends the current DI[N] value to SPU 3016, as in many situations the task to be completed by the SPU will need no further input data.
  • SPU interface 3034 fetches the microinstructions to be executed by SPU 3016, and sends them to SPU 3016 concurrent with its allocation.
  • FSM 3030 sends the skip bytes segment of the production rule to input buffer 3014 through input stream sequence control 3028.
  • Input buffer 3014 uses the skip bytes data to increment its buffer pointer, pointing to a location in the input stream.
  • Each parsing cycle can accordingly consume any number of input symbols between 0 and 8.
  • at 3064, it is determined whether the next symbol on the parser stack 3032 is a bottom-of-stack symbol, or whether the input data needs further parsing.
  • parser 3018 determines whether the input data in the selected buffer is in need of further parsing.
  • the input data in input buffer 3014 is in need of further parsing when the stack pointer for parser stack 3032 is pointing to a symbol other than the bottom-of-stack symbol.
  • FSM 3030 receives a stack empty signal SE when the stack pointer for parser stack 3032 is pointing to the bottom-of- stack symbol.
  • parser 3018 determines whether it can continue parsing the input data in the selected buffer at 3066. In one embodiment, parsing can halt on input data from a given buffer, while still in need of parsing, for a number of reasons, such as dependency on a pending or executing SPU operation, a lack of input data, other input buffers having priority over parsing, etc. Parser 3018 is alerted to SPU processing delays by SEP dispatcher 3036 through a Status signal, and is alerted to priority parsing tasks by status values stored in FSM 3030.
  • When parser 3018 can continue parsing in the current parsing context, execution returns to block 3044, where parser 3018 requests and receives up to N bytes of data from the input data within the selected buffer.
  • When parser 3018 cannot continue parsing at 3066, parser 3018 saves the selected parser stack and subsequently de-selects the selected parser stack and the selected input buffer at 3068.
  • Input stream sequence control 3028, after receiving a switch signal from FSM 3030, de-selects one input port within 3012 by selecting another port within 3012 that has received input data. The selected port within 3012 and the selected stack within the parser stack 3032 can remain active when there is not another port with new data waiting to be parsed.
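A minimal sketch of the context switch just described, with assumed structure sizes: each input port keeps its own saved parser stack, and when parsing stalls the FSM selects another port that has data waiting, or stays on the current context if none does.

```c
#include <stdio.h>

#define NUM_PORTS 2
#define STACK_DEPTH 32

/* One saved parsing context per input port: a parser stack and an input
 * position. Structure and sizes are illustrative assumptions. */
typedef struct {
    int stack[STACK_DEPTH];
    int top;                 /* 0 means bottom-of-stack: parse complete */
    int input_pos;
    int has_data;
} parse_context;

static parse_context ctx[NUM_PORTS];
static int current = 0;

/* Called when parsing cannot continue (e.g., waiting on an SPU or on
 * more input): the current stack stays saved in place; select another
 * port that has data waiting, per the FSM switch signal. */
static void context_switch(void) {
    for (int i = 1; i < NUM_PORTS; i++) {
        int candidate = (current + i) % NUM_PORTS;
        if (ctx[candidate].has_data) {
            printf("switch: port %d -> port %d\n", current, candidate);
            current = candidate;
            return;
        }
    }
    /* No other port has new data: remain in the current context. */
    printf("no switch: stay on port %d\n", current);
}

int main(void) {
    ctx[0] = (parse_context){ .top = 3, .has_data = 1 };
    ctx[1] = (parse_context){ .top = 1, .has_data = 1 };
    context_switch();   /* stall on port 0: resume port 1's saved stack */
    ctx[0].has_data = 0;
    context_switch();   /* port 0 idle now: stays on port 1 */
    return 0;
}
```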
  • a NT symbol designating a debug operation may be useful.
  • when the parser 3018 encounters a debug NT symbol, as shown at 3054 in Figure 21, the parser is placed in a debug state.
  • a 'debug' symbol may be an explicit debug symbol or a previously unknown symbol in the data being parsed. Both of these will be referred to as a debug symbol.
  • any NT symbol for which there is not a match may place the parser in a debug state.
  • the default production rule of Figure 21 is a debug rule. In either case, the parser is placed in a debug state upon encountering a symbol that is unanticipated or for which there is no rule. The default production rule for the unknown symbol becomes a debug production rule.
  • the parser assumes a debug state at 3070.
  • the debug state will trigger an error message, either after the parser assumes the debug state, or simultaneously.
  • the error message may be an interrupt transmitted to the SPU dispatcher indicating that an error condition or interrupt has occurred and a SPU is needed to handle the situation. The dispatcher then launches an SPU to handle the error.
  • Handling the error may comprise gathering information related to the situation that caused the parser to assume the debug state. This information may include the last key used in looking up the symbol in the CAM, where the key may be the last NT symbol concatenated with the next N bytes of data, as discussed above. The information may also include the last production rule code retrieved prior to this symbol, the packet identifier of the current packet being processed, the status of the FSM, and the status of any error and interrupt registers used in the system. Further, the debug state may cause the parser to save the contents of the parser stack for inspection or observation by an SPU. Once this information is gathered, it is stored, presented to a user, or transmitted back to a manufacturer.
  • the present parser may save an error log for a user to view later, or create an error message on a user display.
  • the log could be generated by a device operating at a customer site, and the log accessed by a service person during maintenance.
  • the log or error message may be transmitted from the customer site back to the manufacturer to allow the manufacturer to remedy the problem. In this manner, the ability of a manufacturer to identify and expand a grammar used in parsing packets is enhanced.
  • the debug state allows the system to gather data related to a situation that the parser encountered and could not parse. This data can be used to determine if there is a new or previously unknown header that requires a new production rule code to be added to the grammar.
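The debug information gathered above might be pictured as a record like the following sketch; the field names and types are assumptions mirroring the items listed in the text (last CAM key, last PR code, packet identifier, FSM status, error registers, saved stack), not a disclosed layout.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative snapshot of parser state captured on entering debug mode. */
typedef struct {
    uint64_t last_cam_key;     /* last NT concatenated with DI[n] bytes */
    int      last_pr_code;     /* last production rule code retrieved */
    int      packet_id;        /* identifier of the packet being parsed */
    int      fsm_status;       /* parser FSM status */
    int      error_flags;      /* error/interrupt register contents */
    int      stack_copy[8];    /* saved parser stack for inspection */
} debug_record;

static void log_debug(const debug_record *d) {
    /* In the device this would be stored, shown to a user, or sent back
     * to the manufacturer; here we simply print an error log line. */
    printf("DEBUG: key=%#llx pr=%d pkt=%d fsm=%d err=%#x\n",
           (unsigned long long)d->last_cam_key, d->last_pr_code,
           d->packet_id, d->fsm_status, (unsigned)d->error_flags);
}

int main(void) {
    debug_record d = { 0x01C0A80001ULL, -1, 42, 3, 0x2, { 7, 1 } };
    log_debug(&d);  /* unknown symbol: no PR code was found (-1) */
    return 0;
}
```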
  • FIG. 23 shows a block diagram of a semantic processor 4100 according to an embodiment of the invention.
  • the semantic processor 4100 contains an input buffer 4140 for buffering a packet data stream (e.g., the input stream) received through the input port 4120, a direct execution parser (DXP) 4180 that controls the processing of packet data received at the input buffer 4140, a recirculation buffer 4160, a semantic processing unit (SPU) 4200 for processing segments of the packets or performing other operations,
  • DXP direct execution parser
  • SPU semantic processing unit
  • a memory subsystem 4240 for storing and/or augmenting segments of the packets
  • an output buffer 4750 for buffering a data stream (e.g., the output stream) received from the SPU 4200.
  • the DXP 4180 maintains an internal parser stack (not shown) of terminal and non-terminal symbols, based on parsing of the current frame up to the current symbol. For instance, each symbol on the internal parser stack is capable of indicating to the DXP 4180 a parsing state for the current input frame or packet.
  • DXP 4180 compares data at the head of the input stream to the terminal symbol and expects a match in order to continue.
  • when the symbol at the top of the parser stack is a non-terminal symbol, DXP 4180 uses the non-terminal symbol and current input data to detect a match in the parser table 4170.
  • DXP 4180 instructs SPU 4200 to process segments of the input stream or perform other operations.
  • the DXP 4180 may parse the data in the input stream prior to receiving all of the data to be processed by the semantic processor 4100. For instance, when the data is packetized, the semantic processor 4100 may begin to parse through the headers of the packet before the entire packet is received at input port 4120.
  • Semantic processor 4100 uses at least three tables. Code segments for SPU 4200 are stored in semantic code table (SCT) 4150. Complex grammatical production rules are stored in a production rule table (PRT) 4190. Production rule codes for retrieving those production rules are stored in a parser table (PT) 4170. The production rule codes in parser table 4170 allow DXP 4180 to detect whether, for a given production rule, a code segment from SCT 4150 should be loaded and executed by SPU 4200.
  • SCT semantic code table
  • PRT production rule table
  • PT parser table
  • Some embodiments of the invention contain many more elements than those shown in FIG. 23, but the elements shown appear in every system or software embodiment. Thus, a description of the packet flow within the semantic processor 4100 shown in FIG. 23 will be given before more complex embodiments are addressed.
  • FIG. 24 contains a flow chart 4300 for the processing of received packets through the semantic processor 4100 of FIG. 23.
  • the flowchart 4300 is used for illustrating a method of the invention.
  • a packet is received at the input buffer 4140 through the input port 4120.
  • the DXP 4180 begins to parse through the header of the packet within the input buffer 4140.
  • If the DXP 4180 was able to completely parse through the header, then according to a next block 4370, the DXP 4180 calls a routine within the SPU 4200 to process the packet payload. The semantic processor 4100 then waits for a next packet to be received at the input buffer 4140 through the input port 4120.
  • If the DXP 4180 had to cease parsing through the header, then according to a next block 4340, the DXP 4180 calls a routine within the SPU 4200 to manipulate the packet or wait for additional packets. Upon completion of the manipulation or the arrival of additional packets, the SPU 4200 creates an adjusted packet. According to a next block 4350, the SPU 4200 writes the adjusted packet (or a portion thereof) to the recirculation buffer 4160. This can be accomplished by either enabling the recirculation buffer 4160 with direct memory access to the memory subsystem 4240 or by having the SPU 4200 read the adjusted packet from the memory subsystem 4240 and then write the adjusted packet to the recirculation buffer 4160.
  • a specialized header can be written to the recirculation buffer 4160.
  • This specialized header directs the SPU 4200 to process the adjusted packet without having to transfer the entire packet out of memory subsystem 4240.
  • the DXP 4180 begins to parse through the header of the data within the recirculation buffer 4160. Execution is then returned to block 4330, where it is determined whether the DXP 4180 was able to completely parse through the header. If the DXP 4180 was able to completely parse through the header, then according to a next block 4370, the DXP 4180 calls a routine within the SPU 4200 to process the packet payload and the semantic processor 4100 waits for a next packet to be received at the input buffer 4140 through the input port 4120. If the DXP 4180 had to cease parsing the header, execution returns to block 4340.
  • FIG. 25 shows another semantic processor embodiment 4400.
  • Semantic processor 4400 includes memory subsystem 4240, which comprises an array machine-context data memory (AMCD) 4430 for accessing data in dynamic random access memory (DRAM) 4480, a cryptography block 4440, a context control block (CCB) cache 4450 for caching context control blocks to and from DRAM 4480, a general cache 4460 for caching data used in basic operations, and a streaming cache 4470 for streaming data to and from DRAM 4480.
  • the context control block cache 4450 is preferably a software-controlled cache, i.e., the SPU 4410 determines when a cache line is used and freed.
  • the SPU 4410 is coupled with AMCD 4430, cryptography block 4440, CCB cache 4450, general cache 4460, and streaming cache 4470.
  • the SPU 4410 loads microinstructions from semantic code table (SCT) 4150.
  • SCT semantic code table
  • microinstructions are then executed in the SPU 4410 and the segment of the packet is processed accordingly.
  • FIG. 26 contains a flow chart 4500 for the processing of received Internet Protocol (IP)-fragmented packets through the semantic processor 4400 of FIG. 25.
  • the flowchart 4500 is used for illustrating one method according to an embodiment of the invention.
  • the DXP 4180 ceases parsing through the headers of the received packet because the packet is determined to be an IP-fragmented packet.
  • the DXP 4180 completely parses through the IP header, but ceases to parse through any headers belonging to subsequent layers, such as TCP, UDP, iSCSI, etc.
  • the DXP 4180 signals to the SPU 4410 to load the appropriate microinstructions from the SCT 4150 and read the received packet from the input buffer 4140.
  • the SPU 4410 writes the received packet to DRAM 4480 through the streaming cache 4470.
  • blocks 4520 and 4530 are shown as two separate steps, optionally, they can be performed as one step—with the SPU 4410 reading and writing the packet concurrently.
  • This concurrent operation of reading and writing by the SPU 4410 is known as SPU pipelining, where the SPU 4410 acts as a conduit or pipeline for streaming data to be transferred between two blocks within the semantic processor 4400.
  • the SPU 4410 determines if a Context Control Block (CCB) has been allocated for the collection and sequencing of the correct IP packet fragments.
  • the CCB for collecting and sequencing the fragments corresponding to an IP-fragmented packet is stored in DRAM 4480.
  • the CCB contains pointers to the IP fragments in DRAM 4480, a bit mask for the IP-fragmented packets that have not arrived, and a timer value to force the semantic processor 4400 to cease waiting for additional IP-fragmented packets after an allotted period of time and to release the data stored in the CCB within DRAM 4480.
  • the SPU 4410 preferably determines if a CCB has been allocated by accessing the AMCD's 4430 content-addressable memory (CAM) lookup function using the IP source address of the received IP-fragmented packet combined with the identification and protocol from the header of the received IP packet fragment as a key.
  • the IP fragment keys are stored in a separate CCB table within DRAM 4480 and are accessed with the CAM by using the IP source address of the received IP-fragmented packet combined with the identification and protocol from the header of the received IP packet fragment. This optional addressing of the IP fragment keys avoids key overlap and sizing problems.
  • when the SPU 4410 determines that a CCB has not been allocated for the collection and sequencing of fragments for a particular IP-fragmented packet, execution then proceeds to a block 4550 where the SPU 4410 allocates a CCB.
  • the SPU 4410 preferably enters a key corresponding to the allocated CCB, the key comprising the IP source address of the received IP fragment and the identification and protocol from the header of the received IP-fragmented packet, into an IP fragment CCB table within the AMCD 4430, and starts the timer located in the CCB.
  • the IP header is also saved to the CCB for later recirculation. For further fragments, the IP header need not be saved.
  • the SPU 4410 stores a pointer to the IP-fragmented packet (minus its IP header) in DRAM 4480 within the CCB, according to a next block 4560.
  • the pointers for the fragments can be arranged in the CCB as, e.g., a linked list.
  • the SPU 4410 also updates the bit mask in the newly allocated CCB by marking the portion of the mask corresponding to the received fragment as received.
  • the SPU 4410 determines if all of the IP fragments from the packet have been received. Preferably, this determination is accomplished by using the bit mask in the CCB. A person of ordinary skill in the art can appreciate that there are multiple techniques readily available to implement the bit mask, or an equivalent tracking mechanism, for use with the invention. If all of the fragments have not been received for the IP-fragmented packet, then the semantic processor 4400 defers further processing on that fragmented packet until another fragment is received.
  • If all of the IP fragments have been received, according to a next block 4580, the SPU 4410 resets the timer, reads the IP fragments from DRAM 4480 in the correct order, and writes them to the recirculation buffer 4160 for additional parsing and processing. Preferably, the SPU 4410 writes only a specialized header and the first part of the reassembled IP packet (with the fragmentation bit unset) to the recirculation buffer 4160. The specialized header enables the DXP 4180 to direct the processing of the reassembled IP-fragmented packet stored in DRAM 4480 without having to transfer all of the IP-fragmented packets to the recirculation buffer 4160.
  • the specialized header can consist of a designated non-terminal symbol that loads parser grammar for IP and a pointer to the CCB.
  • the parser can then parse the IP header normally and proceed to parse higher-layer (e.g., TCP) headers.
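A simplified model of the CCB bookkeeping described above, with an assumed fixed fragment count and field layout: the real CCB resides in DRAM and stores pointers to the fragments, a bit mask of arrivals, and a timer, and its key (source address, identification, protocol) is looked up through the AMCD's CAM.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative Context Control Block for IP reassembly. */
typedef struct {
    uint32_t src_addr;       /* with id+proto, forms the AMCD CAM key */
    uint16_t ip_id;
    uint8_t  protocol;
    uint32_t frag_mask;      /* bit i set once fragment i has arrived */
    uint32_t expected_mask;  /* all-fragments-present value */
    int      timer;          /* forces release after an allotted time */
} ccb;

static void fragment_arrived(ccb *c, int index) {
    c->frag_mask |= 1u << index;   /* mark this fragment received */
}

static int all_fragments_received(const ccb *c) {
    return c->frag_mask == c->expected_mask;
}

int main(void) {
    ccb c = { 0xC0A80001, 0x1F2E, 6, 0, 0x7 /* 3 fragments */, 1000 };
    fragment_arrived(&c, 0);
    fragment_arrived(&c, 2);
    printf("complete? %d\n", all_fragments_received(&c));  /* 0: defer */
    fragment_arrived(&c, 1);
    if (all_fragments_received(&c))
        printf("reset timer, recirculate reassembled packet\n");
    return 0;
}
```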
  • DXP 4180 decides to parse the data received at either the recirculation buffer 4160 or the input buffer 4140 through round robin arbitration. A high level description of round robin arbitration will now be discussed with reference to a first and a second buffer for receiving packet data streams. After completing the parsing of a packet within the first buffer, DXP 4180 looks to the second buffer to determine if data is available to be parsed.
  • DXP 4180 looks back to the first buffer to determine if data is available to be parsed. DXP 4180 continues this round robin arbitration until data is available to be parsed in either the first buffer or second buffer.
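The round robin arbitration between the two buffers can be sketched as below; the buffer indices and pending-packet counters are illustrative stand-ins for the hardware's buffer-ready signals.

```c
#include <stdio.h>

/* Two buffers feed the parser: index 0 = input buffer, 1 = recirculation
 * buffer. After finishing a packet from one, the DXP checks the other. */
static int pending[2] = { 0, 0 };
static int last = 1;

static int next_buffer(void) {
    for (int i = 0; i < 2; i++) {           /* keep alternating until */
        int candidate = (last + 1 + i) % 2; /* one buffer has data    */
        if (pending[candidate]) { last = candidate; return candidate; }
    }
    return -1;                              /* nothing to parse yet */
}

int main(void) {
    pending[0] = 2; pending[1] = 1;
    for (int b; (b = next_buffer()) >= 0; pending[b]--)
        printf("parse one packet from buffer %d\n", b);
    /* prints: buffer 0, buffer 1, buffer 0 -- strict alternation while
     * both buffers have data, then draining whichever remains. */
    return 0;
}
```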
  • FIG. 27 contains a flow chart 4600 for the processing of received packets in need of decryption and/or authentication through the semantic processor 4400 of FIG. 25.
  • the flowchart 4600 is used for illustrating another method according to an embodiment of the invention.
  • the DXP 4180 ceases parsing through the headers of the received packet because it is determined that the packet needs decryption and/or authentication. If DXP 4180 begins to parse through the packet headers from the recirculation buffer 4160, preferably, the recirculation buffer 4160 will only contain the aforementioned specialized header and the first part of the reassembled IP packet.
  • the DXP 4180 signals to the SPU 4410 to load the appropriate microinstructions from the SCT 4150 and read the received packet from input buffer 4140 or recirculation buffer 4160.
  • SPU 4410 will read the packet fragments from DRAM 4480 instead of the recirculation buffer 4160 for data that has not already been placed in the recirculation buffer 4160.
  • the SPU 4410 writes the received packet to cryptography block 4440, where the packet is authenticated, decrypted, or both.
  • decryption and authentication are performed in parallel within cryptography block 4440.
  • the cryptography block 4440 enables the authentication, encryption, or decryption of a packet through the use of Triple Data Encryption Standard (T-DES), Advanced Encryption Standard (AES), Message Digest 5 (MD-5), Secure Hash Algorithm 1 (SHA-1), Rivest Cipher 4 (RC-4) algorithms, etc.
  • T-DES Triple Data Encryption Standard
  • AES Advanced Encryption Standard
  • MD-5 Message Digest 5
  • SHA-1 Secure Hash Algorithm 1
  • RC-4 Rivest Cipher 4
  • blocks 4620 and 4630 are shown as two separate steps; optionally, they can be performed as one step, with the SPU 4410 reading and writing the packet concurrently.
  • the decrypted and/or authenticated packet is then written to SPU 4410 and, according to a next block 4640, the SPU 4410 writes the packet to the recirculation buffer 4160 for further processing.
  • the cryptography block 4440 contains a direct memory access engine that can read data from and write data to DRAM 4480.
  • the SPU 4410 can then readjust the packet for further processing without streaming the packet data through itself, and thus the semantic processor 4400 saves processing time.
  • Like with IP fragmentation, a specialized header can be used when both fragmentation and encryption/authentication are contained in a single packet received by the semantic processor 4400.
  • FIG. 28 shows yet another semantic processor embodiment.
  • SPU semantic processing unit
  • the semantic processor 4700 contains a semantic processing unit (SPU) cluster 4710 that includes a plurality of SPU processing units 4410-1, 4410-2, ..., 4410-n; preferably, each of the SPUs 4410-1 to 4410-n is identical.
  • the SPU cluster 4710 is coupled to the memory subsystem 4240.
  • SEP SPU entry point
  • PIC packet output buffer
  • MCPU machine central processing unit
  • when DXP 4180 determines that a SPU task is to be launched at a specific point in parsing, DXP 4180 signals SEP dispatcher 4720 to load microinstructions from SCT 4150 and allocate an SPU from the SPU cluster 4710 to perform the task.
  • the allocated SPU then executes the microinstructions and the data packet is processed accordingly.
  • the SPU can optionally load microinstructions from the SCT 4150 itself when instructed to do so.
  • the MCPU 4771 is coupled with the SPU cluster 4710 and memory subsystem 4240.
  • the MCPU 4771 may perform any desired function for semantic processor 4700 that can reasonably be accomplished with traditional software.
  • the MCPU 4771 also has the capability to communicate with the SEP dispatcher 4720, e.g., to request that an SPU perform tasks on the MCPU's behalf.
  • the memory subsystem 4240 further comprises a DRAM interface 4790 that couples the cryptography block 4440, context control block cache 4450, general cache 4460, and streaming cache 4470 to DRAM 4480.
  • the AMCD 4430 connects directly to an external TCAM 4793, which, in turn, can connect to an external Static Random Access Memory (SRAM).
  • SRAM Static Random Access Memory
  • the PIB 4730 contains at least one network interface input buffer, a recirculation buffer, and a Peripheral Component Interconnect (PCI-X) input buffer.
  • the POB 4750 contains at least one network interface output buffer and a PCI-X output buffer.
  • PCI-X Peripheral Component Interconnect
  • the port block 4740 contains one or more ports, each comprising a physical interface, e.g., an optical, electrical, or radio frequency driver/receiver.
  • the number of ports within port block 4740 corresponds to the number of network interface input buffers within the PIB 4730 and the number of output buffers within the POB 4750.
  • the PCI-X interface 4760 is coupled to a PCI-X input buffer within the PIB 4730, a PCI-X output buffer within the POB 4750, and an external PCI bus 4780.
  • the PCI bus 4780 can connect to other PCI-capable components, such as disk drives, interfaces for additional network ports, etc.
  • FIG. 29 shows one embodiment of the POB 4750 in more detail.
  • the POB 4750 comprises two FIFO controllers and two buffers implemented in RAM.
  • the POB 4750 includes a packer which comprises an address decoder.
  • the output of the POB 4750 is coupled to an egress state machine which then connects to an interface.
  • each buffer is 69 bits wide.
  • the lower 64 bits of the buffer hold data, followed by three bits of encoded information to indicate how many bytes in that location are valid. Then two bits on the end are used to provide additional information, such as: a 0 indicates data; a 1 indicates end of packet (EOP); a 2 indicates Cyclic Redundancy Code (CRC); and a 3 is reserved.
  • EOP end of packet
  • CRC Cyclic Redundancy Code
  • the buffer holds 8 bytes of data.
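The 69-bit buffer word described above can be modeled as a struct: 64 data bits, a 3-bit valid-byte count, and a 2-bit tag. How a full 8-byte word is encoded in the 3-bit count is not stated in the text; the comment below marks one plausible reading as an assumption.

```c
#include <stdio.h>
#include <stdint.h>

/* One 69-bit POB buffer location, modeled as a struct. */
enum { TAG_DATA = 0, TAG_EOP = 1, TAG_CRC = 2, TAG_RESERVED = 3 };

typedef struct {
    uint64_t data;          /* lower 64 bits: up to 8 bytes of payload */
    unsigned valid : 3;     /* how many bytes in `data` are valid;
                               a plausible (assumed) reading: 0 means
                               all 8 bytes valid */
    unsigned tag   : 2;     /* 0=data, 1=end of packet, 2=CRC, 3=reserved */
} pob_word;

int main(void) {
    pob_word w = { 0x1122334455667788ULL, 0, TAG_DATA }; /* full 8 bytes */
    pob_word e = { 0xAABB, 2, TAG_EOP };                 /* 2-byte tail  */
    printf("word: data=%#llx valid=%u tag=%u\n",
           (unsigned long long)w.data, w.valid, w.tag);
    printf("eop : data=%#llx valid=%u tag=%u\n",
           (unsigned long long)e.data, e.valid, e.tag);
    return 0;
}
```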
  • the packets of data sent to the buffer may be formed in "scatter-gather" format. That is, the header of the packet can be in one location in memory while the rest of the packet can be in another location.
  • the SPU may, for example, first write 3 bytes of data and then write another 3 bytes of data.
  • the POB 4750 includes a packer for holding bytes of data in a holding register until enough bytes are accumulated to send to the buffer.
  • the SPUs in the SPU cluster 4710 access the POB 4750 via the address bus and the data bus.
  • the packer in the POB 4750 decodes the lower 3 bits of the address, i.e. bits [2:0] of the address.
  • the address decoding scheme implemented may be as shown in Table 1 below.
  • When the packer has decoded the address, the packer then determines whether it has enough data to commit to the RAM. If the packer determines there are not enough data, the packer sends the data into the holding register. When enough bytes have been accumulated in the holding register, the data is pushed into the FIFO controller and sent to the RAM. In some cases, the SPU in the SPU cluster 4710 may write an EOP into the packer. Here, the packer sends all of the data to the RAM. In one embodiment, the packer may be implemented using flip-flop registers.
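A sketch of the packer's accumulate-and-commit behavior follows. Since Table 1 is not reproduced here, the byte count is passed directly instead of being decoded from address bits [2:0]; the function names and buffer handling are assumptions.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the packer: SPU writes arrive in arbitrary byte counts; the
 * packer holds them in a register until 8 bytes accumulate (or an EOP
 * arrives), then commits a word to the FIFO/RAM. */
static uint8_t holding[8];
static int held = 0;

static void commit(int nbytes, int eop) {
    printf("push %d byte(s) to FIFO%s\n", nbytes, eop ? " [EOP]" : "");
    held = 0;
}

/* `nbytes` stands in for the count the packer would decode from the low
 * address bits per Table 1; the exact decode scheme is not reproduced. */
static void packer_write(const uint8_t *bytes, int nbytes, int eop) {
    while (nbytes > 0) {
        int room = 8 - held;
        int take = room < nbytes ? room : nbytes;
        memcpy(holding + held, bytes, (size_t)take);
        held += take; bytes += take; nbytes -= take;
        if (held == 8) commit(8, 0);      /* enough data: commit to RAM */
    }
    if (eop && held > 0) commit(held, 1); /* EOP flushes a partial word */
}

int main(void) {
    uint8_t a[3] = {1,2,3}, b[3] = {4,5,6}, c[4] = {7,8,9,10};
    packer_write(a, 3, 0);   /* held: 3 */
    packer_write(b, 3, 0);   /* held: 6 */
    packer_write(c, 4, 1);   /* fills to 8, commits, then EOP flush of 2 */
    return 0;
}
```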
  • the POB 4750 further comprises an egress state machine.
  • the egress state machine tracks the states of each FIFO; the state machine senses that a FIFO has data and unloads the FIFO to the interface. The state machine then alternates to the other FIFO and unloads that FIFO to the interface. If both FIFOs are empty, the state machine will assume that the first FIFO has data and then alternate between the FIFOs, unloading them to the interface. Thus, data in the packer is sent out in the order it was written into the packer.
  • the POB 4750 includes a CRC engine to detect error conditions in the buffered data. Error conditions which may be encountered include underruns and invalid EOP. In an underrun condition, the SPU cannot feed data quickly enough into the POB 4750 and there are not enough packets to process. With an invalid EOP error, an EOP is written into the packer while there is no packet in flight. These two conditions will flag an error which shuts off the POB 4750, thereby preventing the SPUs from accessing the buffers. In one embodiment, underruns may be avoided by setting a programmable threshold.
  • the threshold can be used to indicate when to start sending out the packets from the buffer.
  • In one embodiment, the threshold is set to be the end of packet. In this case, packets will not begin to be sent out until they have been completely buffered.
  • Each SPU in the SPU cluster can access the POB 4750. However, to prevent multiple SPUs from accessing the POB 4750 at the same time, a token mechanism, such as flags maintained in external memory, may be used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of a multiple-parsing-context parser and semantic processor are shown and described. The described embodiments allow an input data stream to be parsed in multiple contexts, with the parser switching between contexts as the input data stream dictates. For instance, one embodiment allows a SONET input data stream, including multiple interleaved payloads and SONET transport overhead, to be parsed using multiple grammars, with control passing between the grammars and contexts in a single pass. This approach allows a reconfigurable semantic processor to serve different payload arrangements for a complex multiplexed SONET stream.

Description

DATA CONTEXT SWITCHING IN A SEMANTIC PROCESSOR
This application claims priority of U.S. provisional patent application No. 60/599,830, filed August 5, 2004, and U.S. utility patent application No. 11/181,528, filed July 14, 2005, entitled "TCP ISOLATION WITH SEMANTIC PROCESSOR TCP STATE MACHINE", and U.S. utility patent application No. 11/185,223, filed July 19, 2005, entitled "DEBUG NON-TERMINAL SYMBOL FOR PARSER ERROR HANDLING", and U.S. utility patent application No. 11/186,144, filed July 20, 2005, entitled "PACKET OUTPUT BUFFER FOR SEMANTIC PROCESSOR", and copending U.S. Patent Application 10/351,030, titled "Reconfigurable Semantic Processor," filed by Somsubhra Sikdar on January 24, 2003, which is incorporated herein by reference.
FIELD OF THE INVENTION
This invention relates generally to digital semantic processors, and more particularly to methods and apparatus for switching parsing contexts while parsing a data stream.
BACKGROUND OF THE INVENTION
In the data communications field, networking devices such as servers typically use packets when communicating over a network. A packet is a finite-length (generally several tens to several thousands of octets) digital transmission unit comprising one or more header fields and a data field. The data field may contain virtually any type of digital data. The header fields convey information (in different formats depending on the type of header and options) related to delivery and interpretation of the packet contents. This information may, e.g., identify the packet's source or destination, identify the protocol to be used to interpret the packet, identify the packet's place in a sequence of packets, provide an error correction checksum, or aid packet flow control. The finite length of a packet can vary based on the type of network that the packet is to be transmitted through and the type of application used to present the data.
Typically, packet headers and their functions are arranged in an orderly fashion according to the open-systems interconnection (OSI) reference model. This model partitions packet communications functions into layers, each layer performing specific functions in a manner that can be largely independent of the functions of the other layers. For instance, a network layer, typically implemented with the well-known Internet Protocol (IP), provides network- wide packet delivery and switching functionality, while a higher-level transport layer can provide mechanisms for end-to-end delivery of packets. As such, each layer can prepend its own header to a packet, and regard all higher-layer headers as merely part of the data to be transmitted.
Transmission Control Protocol (TCP) is a transport layer used to provide mechanisms for highly-reliable end-to-end delivery of packet streams during an established TCP session. Traditionally, the establishment of a TCP session requires a three-way handshake between communicating endpoints. This three-way handshaking allows TCP endpoints to exchange socket information uniquely identifying the TCP session to be established, and to exchange initial sequence numbers and window sizes used in the packet sequencing, error recovery, and flow control. An example of a typical three-way handshake may include a first TCP endpoint sending a synchronize SYN packet to a second TCP endpoint, the second TCP endpoint responding with a synchronize and acknowledgment SYN-ACK packet, and the first TCP endpoint sending an acknowledgement ACK packet in response to the SYN-ACK packet. TCP further requires a similar exchange of termination FIN packets and acknowledgments to the FIN packets when closing an existing TCP session. Thus, to use TCP in data exchanges, TCP endpoints must be able to maintain information regarding the state of each of its TCP sessions, e.g., opening a TCP session, waiting for acknowledgment, exchanging data, or closing a TCP session. A commonly exploited weakness of TCP stems from this maintenance of state information. For instance, in a SYN flood denial-of-service attack, multiple SYN packets are received by a TCP endpoint, each requesting the establishment of a different TCP session. The initiator of the attack, however, does not have any intention of completing the corresponding three-way handshakes, often times providing a fictitious source port to ensure their failure. Responding to this flood of SYN packets allocates the TCP endpoint's limited processing resources by requiring it to maintain state information for each session opening while waiting for acknowledgments that will never arrive. Another attack that misallocates processing resources involves receiving packets for a session that conflict with the maintained state information, e.g., sending a SYN packet in an already established session or a FIN packet for a session that has not been established.
Once a TCP session is properly established, TCP endpoints may exchange data in a TCP packet stream. Since packets may be lost, or arrive out-of-order during transmission, TCP provides mechanisms to retransmit lost or late packets and reorder the packet stream upon arrival including discarding duplicate packets. TCP endpoints may also be required to perform other exception processing prior to the TCP reordering, such as reassembling lower- layer fragmented packets, e.g., IP fragments, and/or performing cryptography operations, e.g., according to an Internet Protocol Security (IPSec) header(s). Thus use of TCP to reliably exchange packet streams comes at a cost of efficiency in TCP endpoint processing and increased vulnerability to TCP -based attacks.
SONET (Synchronous Optical NETwork) refers to a widely used standard for digital communication across fiber-optic cables, using a hierarchy of several different line rates. The SONET standards (ANSI T1.105.x) define line rates as STS-N, where "STS" is an acronym for Synchronous Transport Signal and "N" is commonly one of 1, 3, 12, 24, 48, 192, and 768. The line rate of an STS-N signal is N x 51.84 Mbps (million bits per second), transmitted as 8000 frames/second, the frame size growing proportionally with N. The following background description contains a brief introduction to SONET concepts and terminology that will be helpful in understanding the embodiments described later in the application. Referring to Figure 1, a portion of an exemplary SONET network 100 is illustrated.
An STS terminal multiplexer 110 receives data on one or more non-SONET tributaries, such as Ethernet links, circuit-switched voice network data links, Asynchronous Transfer Mode (ATM) data links, etc. STS terminal MUX 110 places data from its tributaries into SONET STS-N frames, which are then directed across a SONET path to Path-Terminating Equipment (PTE) 120, which could be another STS terminal MUX or some other device that extracts the tributary data from the STS-N signal.
Other SONET devices can reside along the path between STS terminal MUX 110 and PTE 120. For instance, two add/drop multiplexers (ADMs) 130 and 140, and two repeaters 150 and 160 are shown in the path between STS terminal MUX 110 and PTE 120. ADMs 130 and 140 multiplex lower-rate STS-N signals to a higher-rate STS-N signal and vice-versa. For instance, ADM 130 could multiplex four STS-3 lines to an STS-12 line, and/or could extract (drop) some STS-3 lines from an incoming STS-12 and replace (add) other STS-3 lines to produce an outgoing STS-12 line.
Repeaters such as 150 and 160 can be inserted in lines too long for a single long fiber to reliably carry the SONET signal between endpoints. The repeaters cannot modify the SONET payload, but merely retime and retransmit it.
Three different layer terminologies are also illustrated in Figure 1. For a given data stream, an STS path layer exists from where the data is first placed in a SONET frame to where that data is removed from a SONET frame. A line layer exists over any SONET path segment where the payload is unchanged. A section layer exists between any SONET receiver/transmitter pair that share the same fiber.
A SONET link carries overhead bits for the path, line, and section layers. These overhead bits are referred to respectively as Path OverHead (POH), Line OverHead (LOH), and Section OverHead (SOH). SONET endpoints that are only section endpoints can generate and/or modify SOH, but cannot modify LOH or POH. Endpoints that are also line endpoints can additionally generate and/or modify LOH, but cannot modify POH. Path endpoints are the only endpoints allowed to create POH.
Overhead and payload occupy specific locations in a SONET frame. The general structure of a SONET frame 200 is illustrated in Figure 2. Every STS-N frame contains 90N columns and nine rows of byte data, which are transmitted in raster fashion, left-to-right and top-to-bottom. The first 3N columns contain overhead data, and the last 87N columns contain what is referred to as a Synchronous Payload Envelope (SPE) 230. Within the first 3N columns, the first three rows contain SOH 210, and the last six rows contain LOH 220. The POH lies within the synchronous payload envelope, as will be described shortly.
Figure 3 shows additional SONET frame detail 300 for two consecutive STS-1 frames K and K+1. The POH in an STS-1 frame occupies one column of the SPE, leaving the remaining 86 columns available to transmit input data. Rather than occupying a fixed location within the frame like the SOH and LOH, however, the POH can be defined with a first byte starting at any row, column location within the SPE. This capability exists primarily to allow circuit-switched data, which also has an 8000 frames/second format but is not necessarily in perfect synchronization with the SONET frame timing, to be more easily carried as a SONET payload.
The beginning location of the POH is actually specified by an offset stored in the first two bytes of the LOH, referred to herein as the "H1H2 pointer." Path-Terminating Equipment (PTE) interprets the H1H2 pointer values to locate the beginning of the next POH first byte that follows the H1H2 pointer. For instance, an H1H2 pointer offset of 0 would indicate that the POH begins at row 4, column 4, just to the right of the H1H2 pointer.
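A simplified model of this offset interpretation is sketched below, ignoring pointer-adjustment details such as the H3 byte and positive/negative stuffing; row and column numbering follows the text (offset 0 lands at row 4, column 4).

```c
#include <stdio.h>

/* Simplified STS-1 model: 9 rows x 90 columns, columns 1-3 are overhead.
 * The H1H2 offset counts SPE bytes starting at row 4, column 4 (the byte
 * just to the right of the H1H2 pointer). */
static void poh_position(int offset, int *row, int *col) {
    int spe_cols = 87;                 /* payload columns per row */
    int r = 4 + offset / spe_cols;     /* first SPE byte is in row 4 */
    int c = 4 + offset % spe_cols;     /* SPE starts at column 4 */
    if (r > 9) r -= 9;                 /* wrap into the next frame */
    *row = r; *col = c;
}

int main(void) {
    int row, col;
    poh_position(0, &row, &col);       /* offset 0 -> row 4, column 4 */
    printf("POH first byte at row %d, column %d\n", row, col);
    poh_position(87, &row, &col);      /* one full row later -> row 5 */
    printf("POH first byte at row %d, column %d\n", row, col);
    return 0;
}
```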
As illustrated in Figure 3, the H1H2 pointer of frame K has an offset to row 5, about seven or eight columns into the SPE, where the first byte of the POH for frame K will be found. The data payload for SPE K begins immediately after the first byte of the POH for frame K, and in this instance continues down to the first part of row 5 of frame K+1. As illustrated, the POH first byte for frame K+1 follows immediately after the last byte of SPE K. This is not necessary, however, as frame K shows a slight shift between the frame K-1 POH and the frame K POH, as represented by the H1H2 pointer.
One of the types of data that can be carried in a SONET SPE is packet data, such as is commonly carried on well-known computer networks. The packets carried in a SONET frame sequence need not be arranged in any synchronous fashion with respect to the SONET frames. For instance, Figure 4 shows an illustrative SPE K, with a data payload that begins with the terminal end of a Packet_1 payload, followed by a Packet_2 IP (Internet Protocol) header, TCP (Transmission Control Protocol) header, and payload, a Packet_3 IP header, ARP (Address Resolution Protocol) header, and payload, and a Packet_4 IP header, UDP (User Datagram Protocol) header, and the first portion of payload. Within the frame or consecutive frames carrying SPE K, normally once a row a POH byte interrupts the packet data stream, as do SOH and LOH bytes (not shown in Figure 4). The PTE receiving this SONET stream is expected to remove the SONET overhead bytes and output just the headers and payloads from the SPEs.
A further complication in the SONET hierarchy lies in the ability within the SONET framework to build a higher-level SONET frame stream from different combinations of lower-level SONET frame streams. For instance, a segment of a SONET network 500, illustrated in Figure 5, shows one way in which an STS-12 stream can be constructed from six STS-1 streams and two STS-3c concatenated streams (a concatenated STS-Nc frame, designated by the "c" suffix, has a similar SPE structure as an STS-1 frame, but with more columns, and also contains a few LOH differences, as is well-known by those skilled in SONET networking).
In SONET network segment 500, a first STS-3 multiplexer (MUX) 510 combines three STS-1 streams A, B, and C to create an STS-3 stream AA. A second STS-3 MUX 520 combines three other STS-1 streams D, E, and F to create another STS-3 stream DD. STS-12 MUX 530 creates an STS-12 stream AAA by combining STS-3 streams AA and DD with two STS-3c concatenated streams BB and CC.
The overall frame structure 600 of a frame from STS-3 stream AA is illustrated in Figure 6. The first 9 columns contain SOH and LOH bytes, including H1H2 bytes for each of the individual lower-rate SPEs multiplexed in the frame. The H1H2 pointers from STS-1s A, B, and C are copied into three consecutive byte-interleaved H1H2 fields in the LOH section of the frame structure. Each H1H2 pointer indicates a respective offset to the first byte of its POH. Any arrangement of starting points for the three STS-1 POH columns is allowable, with each lying somewhere in the 87 columns of the STS-3 SPE that correspond to the correct STS-1 stream.
The next 261 columns in frame structure 600 contain STS-3 payload, consisting of byte-interleaved content from the three received STS-1 SPEs. For instance, column 10 of frame structure 600 contains column 4 from an STS-1 stream A frame, column 11 contains column 4 from an STS-1 stream B frame, and column 12 contains column 4 from an STS-1 stream C frame. This byte-interleaved pattern then repeats for column 5 of each of the three STS-1 streams, and so on up to column 87 of the three STS-1 streams, which appear respectively in columns 268, 269, and 270 of the STS-3 frame. Although the frame structure for STS-12 stream AAA is not illustrated, STS-12 multiplexer 530 takes four input STS-3 and/or STS-3c streams and byte-interleaves them in the same manner as just described. The output pattern for the STS-12 SPE in this example would repeatedly byte-interleave the four STS-3 input streams with the pattern AA, BB, CC, DD, AA, BB, ... , CC, DD. Expanding this pattern to include the embedded STS-1s per Figure 6, the STS-12 SPE byte-interleave pattern looks like A, BB, CC, D, B, BB, CC, E, C, BB, CC, F, repeated along each row.
Conventionally, a set of demultiplexers exactly mirroring the multiplexers shown in Figure 5 is required to terminate a path with the STS-12 frame structure. From Figures 5 and 6 and the preceding description, it can be appreciated that other combinations of STS-1, STS-3, and/or STS-3c frames can form an STS-12 frame, or the STS-12 frame can itself be an STS-12c frame.
An STS-48 frame is created similarly to an STS-12 frame, except in place of the 4:1 multiplexer 530 of Figure 5, a 16:1 multiplexer is used to byte-interleave 16 STS-3 and/or STS-3c frames. Other combinations of lower-order STS streams are possible.
BRIEF DESCRIPTION OF THE DRAWING
The invention may be best understood by reading the disclosure with reference to the drawing, wherein:

Figure 1 contains a network diagram for an exemplary path of a SONET network;
Figure 2 illustrates a generalized SONET frame structure;
Figure 3 shows particulars of SONET STS-1 framing;
Figure 4 illustrates an STS-1 SPE, showing one way in which packet data may be transmitted over a SONET STS-1 path;
Figure 5 shows one possible arrangement of SONET multiplexers useful in generating an STS-12 frame;
Figure 6 illustrates the general payload division for STS-3 frames generated by the STS-3 multiplexer elements shown in Figure 5;
Figure 7 illustrates the general layout for one embodiment of a semantic processor according to an embodiment of the present invention;
Figure 8 shows one possible implementation of a reconfigurable input buffer useful with SONET embodiments of the present invention;
Figure 9 contains a block diagram for the parser and related components of the semantic processor shown in Figure 7;
Figure 10 shows the format of STS-3 frames after removing byte-interleaving with the input buffer of Figure 8;
Figure 11 contains one row of de-interleaved STS-3 frame data extracted from Figure 10, showing contextual switches in the byte stream for that row;
Figure 12 illustrates, in block form, a network communications system useful with embodiments of the present invention;
Figure 13A illustrates, in block form, embodiments of the proxy shown in Figure 12;
Figure 13B shows, in block form, an example packet flow through proxy 2200 shown in Figures 12 and 13A;
Figure 14 shows an example flow chart illustrating embodiments for operating the proxy shown in Figures 12, 13A, and 13B;
Figure 15 illustrates, in block form, a semantic processor useful with embodiments of the network-interface proxy and device-interface proxy shown in Figures 13A and 13B;
Figure 16 shows an example flow chart illustrating embodiments for operating the semantic processor shown in Figure 15 as a TCP state machine;
FIG. 17 illustrates, in block form, a semantic processor useful with embodiments of the invention;
FIG. 18 contains a flow chart for the processing of received packets in the semantic processor with the recirculation buffer in FIG. 17;
FIG. 19 illustrates a more detailed semantic processor implementation useful with embodiments of the invention;
FIG. 20 contains a flow chart of received IP-fragmented packets in the semantic processor in FIG. 19;
FIG. 21 contains a flow chart of received encrypted and/or unauthenticated packets in the semantic processor in FIG. 19;
FIG. 22 illustrates yet another semantic processor implementation useful with embodiments of the invention;
FIG. 23 illustrates an embodiment of the packet output buffer in the semantic processor in FIG. 22;
FIG. 24 illustrates the information contained in the buffer in FIG. 23;
Figure 25 shows an embodiment of a semantic processor in block form;
Figure 26 shows an embodiment of a parser table;
Figure 27 shows an embodiment of a production rule table organization;
Figure 28 shows an embodiment of a parser in block form;
Figure 29 shows a flow chart of an embodiment of processing data; and
Figure 30 shows a flow chart of an embodiment of processing a debug production rule in a semantic processor.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the example of Figures 4-6, eight separate packet payloads are SONET-framed and multiplexed onto a single STS-12 line. Traditionally, an STS-12 demultiplexer, two STS-3 demultiplexers, and eight packet processors would be required to terminate the paths created by the networking elements shown in Figure 5. A different arrangement of termination hardware would be required for a different formulation of STS-12 framing. Other arrangements of termination hardware would be required, e.g., for formulations of STS-48 framing.
It would be desirable for a single device to be able to handle different STS-3, STS-12, STS-48, etc., frame structures, and/or to handle both STS demultiplexing and payload processing for an STS data stream. It has now been discovered that, with certain enhancements, a semantic processor having a direct execution parser can be configured to handle such tasks. For instance, the following description contains semantic processor embodiments and methods with the potential to receive a byte-interleaved STS frame structure such as that previously described, and completely interpret and process that structure, including the packet data carried in the SPE if desired. As an added benefit, at least some described embodiments can be reconfigured, by loading different parser grammar, to handle other STS frame structures and/or payload content.

One difficulty in directly parsing a received SONET frame structure is that the byte stream represents data from many different parsing contexts, intermixed in a fashion that typically changes parsing contexts at each received byte. SONET processing occurs in at least one context, and could be divided into section, line, path, and framing contexts. Each atomic STS-1 or STS-Nc payload stream also uses a different context.
The arrangement of the contexts also depends on how the STS-N stream is created, and also changes over time as the POH columns are allowed to shift. Accordingly, for general SONET processing the order of received contextual information cannot readily be predicted more than a frame in advance.
This problem of multiple parsing contexts in a single data stream is not unique to SONET parsing. Other networking streams exist that can be parsed more easily when parsing is not constrained to a single, linear context. Because SONET is believed to represent a difficult example of such a stream, the described embodiments focus on a SONET implementation. Those skilled in the art will recognize how multiple-context parsing can be applied to other parsing problems. Those skilled in the art will also recognize that not every multiple-context parser will need some of the functionality that is specific to SONET, such as a byte de-interleaver and POH counters.
The semantic processor embodiment illustrated in Figures 7-9 is capable of configuration to properly parse SONET data, such as that contained in Figure 6, using multiple simultaneous parsing contexts.

Turning first to Figure 7, a semantic processor embodiment 700 is illustrated. An OC-N PHY (Optical Carrier STS-N physical interface) 710 connects semantic processor 700 to a pair of SONET fiber-optic carriers (not shown). The inbound STS-N stream is optically detected by PHY 710 and electrically transmitted to a Packet Input Buffer (PIB) 800. The outbound STS-N stream is electrically transmitted from a Packet Output Buffer (POB) 730 to PHY 710 for laser modulation on an outbound SONET fiber.
A parser 900 is connected to receive data input from PIB 800. A Parser Table (PT) 740 stores parsing data particular to the input that semantic processor 700 is configured to parse. A Production Rule Table (PRT) 750 stores grammatical production rules particular to that input. A Semantic Code Table (SCT) 770 stores microcode segments for a SPU cluster 760, which preferably contains multiple semantic processing units capable of executing the microcode segments when so instructed, e.g., by parser 900 or another SPU. A SPU memory 780 stores data needed by the SPUs, such as session contexts, packet data currently being processed, etc. Preferably, PT 740, PRT 750, and SCT 770 are implemented with programmable on-chip memory, and SPU memory 780 is implemented with a cached DRAM external memory.

In operation, PIB 800 receives input data, such as a sequence of Ethernet frames, SONET frames, Fibre Channel frames, etc., from an appropriate PHY, such as the OC-N PHY 710 of Figure 7. The frame data is retrieved from PIB 800 by parser 900 and/or SPU cluster 760, as directed by the grammatical production rules and the SCT microcode. When parser 900 receives input data from PIB 800, it can do one of two things with that data. First, it can literally match input data symbols with literal ("terminal") symbols stored in the production rules. Second, and generally more common, parser 900 can "look up" additional production rules for the data input symbols, based on the current data input symbol(s) and a current production ("non-terminal" or "NT") symbol. It is noted that terminal symbol matching can generally be performed using either production rule terminal symbols or parser table terminal match values.
When parser 900 is directed by the grammar to look up an additional production rule, parser 900 supplies one or more current data input (DI) symbols and non-terminal (NT) grammar symbols to parser table 740. Based on the (NT, DI) pairing supplied by the parser, parser table 740 issues a corresponding PR code to production rule table (PRT) 750. In some embodiments, parser table 740 is implemented with a Ternary Content-Addressable Memory (TCAM) consisting of entries of the form (NT, DI_match). Multiple (NT, DI_match) entries can exist for the same NT symbol and different DI_match values. The TCAM will return, as the PR code, the address of the entry with the correct NT value and the DI_match value that matches the supplied data input DI. Because the TCAM allows individual bits or bytes in each TCAM entry to be set to a "don't care" value, each TCAM entry can be tailored to respond to the bits or bytes in DI that matter for the current production rule.
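The TCAM behavior described above can be modeled in software for clarity. Below is a minimal Python sketch; the entry layout, the example NT and PR code values, and the priority-ordered list are illustrative assumptions, not the hardware's storage format:

    # Each entry: (NT symbol, DI match value, DI care-mask, PR code).
    # A zero byte in the mask marks that DI byte position as "don't care".
    TCAM = [
        (0x41, b"\x45\x00\x00\x00\x00\x00\x00\x00",
               b"\xf0\x00\x00\x00\x00\x00\x00\x00", 0x0102),  # e.g. match on a version nibble
        (0x41, b"\x60\x00\x00\x00\x00\x00\x00\x00",
               b"\xf0\x00\x00\x00\x00\x00\x00\x00", 0x0103),  # e.g. a second rule, same NT
    ]

    def tcam_lookup(nt, di):
        """Return the PR code for the first entry whose NT matches and whose
        masked DI bytes equal the masked input bytes (TCAM priority order)."""
        for entry_nt, match, mask, pr_code in TCAM:
            if entry_nt != nt:
                continue
            if all((d & m) == (v & m) for d, v, m in zip(di, match, mask)):
                return pr_code
        return None  # no rule: the parser would take its default/error action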
When parser table 740 locates the correct PR code, the code is passed to production rule table 750. PRT 750 locates a production rule corresponding to the PR code, and outputs the production rule. Although the production rule can contain whatever is desired, in Figure 7 the production rule contains an array of up to M non-terminal (and/or terminal) symbols NT[], an array of up to R Semantic Code entry Points SEP[], and a SkipBytes value. The symbol array NT[] directs the parser as to what input will now be expected, based on the latest parsing cycle. The SkipBytes value directs the parser as to how many, if any, of the DI bytes were consumed by the latest parsing cycle, and can therefore be discarded. The semantic code entry point array SEP[] directs the SPU cluster 760 as to any processor tasks that should be performed based on the latest parsing cycle.
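A production rule entry of the kind just described might be modeled as follows. This is a sketch only; the field names follow the text, while the container type and defaults are assumptions:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ProductionRule:
        nt: List[int] = field(default_factory=list)    # up to M symbols to push, NT[]
        sep: List[int] = field(default_factory=list)   # up to R SPU entry points, SEP[]
        skip_bytes: int = 0                            # DI bytes consumed this cycle

    # One parsing cycle, conceptually: push rule.nt onto the parser stack,
    # hand rule.sep to the SPU cluster, and advance the input by rule.skip_bytes.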
Based on a SEP supplied from PRT 750, SPU cluster 760 loads and executes a code segment to perform a task. For instance, a SPU could be directed to move the payload portion of an input packet from PIB 800 to SPU memory 780 and manipulate that payload portion in some fashion. SPU cluster 760 can also create output packets/frames and supply them to POB 730 for outbound transmission at PHY 710.
The preceding introduction to the operation of one semantic processor embodiment is sufficient to allow an understanding of how PIB 800 and parser 900, which will be discussed in detail with regard to multiple-context SONET processing, interoperate with the other blocks of an exemplary semantic processor. For further explanation of the other blocks of Figure 7, the reader is respectfully referred to co-pending U.S. Patent Application 10/351,030, which was incorporated herein by reference.
Although parser 900 could conceivably operate directly on a byte-interleaved SONET stream, context-switching on every byte boundary during high-speed operation could waste a prohibitive number of parser cycles. Accordingly, Figure 8 illustrates one embodiment of PIB 800 that is capable of partially de-interleaving received SONET frames.
The goal for this input buffer embodiment is to de-interleave, on a row-by-row basis, the byte-interleaved SPEs present in an STS-N stream. Reference will be made to Figure 10, which illustrates the desired output of PIB 800 for STS-3 frames consisting of three byte-interleaved STS-1s A, B, and C (see Figure 6). Optionally, the SOH and LOH columns can be "de-interleaved" as well, with a parser grammar written so as to expect such a structure, or specific overhead sections such as the H1H2 pointers can be de-interleaved.
Frame structure 1000 of Figure 10 would apply to the STS-3 line AA in Figure 5. In frame structure 1000, nine columns of SOH and LOH and 261 columns of STS-3 SPE exist, just as in frame structure 600 of Figure 6. After byte de-interleaving, however, the 261 columns of STS-3 SPE are divided into three consecutive 87-column sections, corresponding respectively to STS-1s A, B, and C of Figure 5. It is noted that frame structure 1000 can be created with about a 2/3-row de-interleave latency, as the STS-1 A payload section for each row cannot be completed until three bytes from the end of that row, when the last STS-1 A byte is received.
Referring back to Figure 8, the blocks of PIB 800 will now be explained as to their function in transforming the frame structure of Figure 6 to the frame structure of Figure 10. PIB 800 is preferably reconfigurable, however, to perform no de-interleaving or de-interleaving for a different interleaved structure.
An Input FIFO 810 receives an STS-N data stream from an OC-N PHY. Input FIFO 810 detects framing bytes within the STS-N data stream in order to achieve frame synchronization. Input FIFO 810 then coordinates with a standard SONET descrambler 820 to descramble the portions of each SONET frame that were scrambled prior to transmission. The DQI output of input FIFO 810 outputs a byte DQI of descrambled SONET frame data each time it receives a data clock signal D_CLK from a VIB (Virtual Input Buffer) stage 830.
VIB stage 830 has the task of matching each byte of DQI with a corresponding VIBAddress entry from a programmable Byte De-Interleaver Pattern (BDIP) register 834. Each time VIB stage 830 asserts an address clock signal A_CLK, it obtains one VIBAddress entry from BDIP register 834, and the BDIP register read pointer increments. BDIP register 834 contains a mapping for one row of SONET frame data to a group of VIB input buffers in an input buffer 850. For instance, the pattern for frame structure 600 of Figure 6 can be 275 entries with the pattern:
&,1,2,3,1,2,3,1,2,3,...,1,2,3,1,&,2,&,3,&,@, where "&" and "@" signify special control characters. In this example, VIB 0 is not assigned to hold any bytes, VIB 1 is assigned to hold STS-1 A bytes, VIB 2 is assigned to hold STS-1 B bytes, and VIB 3 is assigned to hold STS-1 C bytes. The control character "&" is not matched with a DQI by VIB stage 830, but is written to the same VIB as the last DQI, signifying "end of row" for that particular VIB. The control character "@" tells BDIP register 834 that it has reached the end of a frame row, causing BDIP register 834 to reset its read pointer to the head of the register and assert the signal S_Wrap to alert an output controller 860 that a complete frame row has been (or will be in a few more clock cycles) de-interleaved and is ready for output.
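The de-interleave pattern mechanism can be illustrated with a short behavioral model. The Python sketch below is a software analogy only, assuming list-based virtual buffers and string control characters; it is not the hardware implementation:

    END_OF_ROW, END_OF_PATTERN = "&", "@"

    def sts3_row_pattern():
        """275-entry BDIP pattern for one STS-3 row: '&' to VIB 0, then
        VIBs 1,2,3 interleaved for 270 bytes, '&' after each VIB's last
        byte of the row, then '@'."""
        pattern = [(END_OF_ROW, 0)]
        for i in range(90):
            for vib in (1, 2, 3):
                pattern.append(("DATA", vib))
                if i == 89:                      # last byte for this VIB this row
                    pattern.append((END_OF_ROW, vib))
        pattern.append((END_OF_PATTERN, None))
        return pattern

    def deinterleave_row(row_bytes, vibs):
        """Write one interleaved 270-byte row into per-VIB buffers."""
        data = iter(row_bytes)
        for kind, vib in sts3_row_pattern():
            if kind == "DATA":
                vibs[vib].append(next(data))
            elif kind == END_OF_ROW:
                vibs[vib].append(END_OF_ROW)     # marks end of this VIB's row
            # END_OF_PATTERN would assert S_Wrap to the output controller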
An address decode stage 832 receives DQI/VIBAddress pairs and &/VIBAddress pairs from VIB stage 830. Address decode stage 832 supplies the VIBAddress to VIB pointer registers block 840, which returns a physical buffer address corresponding to the VIBAddress. Address decode stage 832 writes the DQI or "&" to the physical buffer address in input buffer 850.
The VIB pointer registers 840 are configured to operate as a plurality of ring buffer pointer registers, each storing a physical StartAddress, a physical EndAddress, a WritePointer, and a ReadPointer. When address decode stage 832 supplies a VIBAddress to VIB pointer registers 840, the correct WritePointer is retrieved and returned to address decode stage 832 as a physical buffer address. The WritePointer is then incremented, and reset to StartAddress when WritePointer reaches EndAddress.

On the output side of PIB 800, an output controller 860 interfaces with a parser through a FrameRowReady output signal, a parser output FIFO 862, and a DataRequest input. Output controller 860 also interfaces with a SPU or SPU cluster through a SPU output FIFO 864 and a SPU_IN FIFO 866. The operation of both interfaces will be described in turn. When output controller 860 receives an S_Wrap signal from BDIP register 834, controller 860 asserts FrameRowReady to the parser. Output controller 860 can then respond to DataRequest inputs from the parser by transmitting DQO (DQ Output) values to the parser through parser FIFO 862.
Output controller 860 loads parser FIFO 862 by issuing D_REQ (data request) signals to an address decode stage 870. Address decode stage 870 is initially set internally to read out of VIB 0. Each time it receives a D_REQ signal, address decode stage 870 requests the current ReadPointer for VIB 0 from VIB pointer registers 840. Address decode stage 870 supplies this ReadPointer as a buffer read address to input buffer 850 and receives a DQO. Meanwhile, VIB pointer registers 840 increment ReadPointer, and reset it to StartAddress when ReadPointer reaches EndAddress.
When address decode stage 870 receives a DQO from input buffer 850 that contains an "&" control character, it knows that it has reached the end of that virtual buffer's input for the current SONET row. Accordingly, address decode stage 870 increments its internal VIBAddress from VIB 0 to VIB 1 and begins reading from VIB 1 on the next D_REQ. This same behavior is repeated as an "&" is reached in each VIB, until the "&" in the last allocated VIB is reached. At that point, the internal VIBAddress is reset to VIB 0 in preparation for reading the next frame row.
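On the read side, the ring-buffer pointer arithmetic and the "&"-driven advance between virtual buffers might be sketched as follows (Python; the class and field names mirror the text but are otherwise modeling assumptions):

    class VIBPointerSet:
        """Ring-buffer pointer registers for one virtual input buffer."""
        def __init__(self, start, end):
            self.start, self.end = start, end      # physical address range
            self.read = self.write = start

        def pop(self, memory):
            value = memory[self.read]
            self.read += 1
            if self.read == self.end:              # wrap, as the hardware does
                self.read = self.start
            return value

    def read_frame_row(memory, vib_sets):
        """Drain VIB 0, 1, 2, ... in turn; an '&' in a VIB ends that VIB's
        contribution to the current SONET frame row."""
        out = []
        for vib in vib_sets:
            while True:
                value = vib.pop(memory)
                if value == "&":
                    break
                out.append(value)
        return out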
DQO values can be read out to a SPU in similar fashion through SPU FIFO 864, based on commands received from a SPU at SPU_IN FIFO 866. It is also noted that "burst" read or skip-forward commands can be efficiently implemented by storing pointers to the next few "&" characters in each VIB in the VIB pointer registers. Multiple consecutive DQO values can then be read at once, as long as they do not read past the next "&" in the current VIB.
SPU_IN FIFO 866 can also be used to program PIB 800 at runtime. By issuing an appropriate CMD_IN, a SPU can instruct output controller 860 to receive a pattern and load that pattern to BDIP register 834 through the RegisterProgram signal interface. A SPU can also instruct output controller 860 to configure VIB StartAddress and EndAddress values for each active VIB in VIB pointer registers 840 and then initialize the ReadPointer and WritePointer values. A SPU can also instruct output controller 860 to configure input FIFO 810 for the appropriate frame size and alignment character sequence, and to load a seed to descrambler 820.
VIB stage 830 and/or address decode stage 832 can be pipelined as necessary to achieve an appropriate throughput for designed STS-N data rates. BDIP register 834 is designed with a size sufficient to hold a row mapping for the largest STS-N of interest. Input buffer 850 is designed with a size sufficient to hold at least two rows of the largest STS-N of interest. VIB pointer registers 840 are designed with enough pointer register sets to de-interleave the largest STS-N of interest were that STS-N composed entirely of STS-1s (for instance, 49 pointer register sets could be included to handle de-interleaving for any STS-48). For non-byte-interleaved input, much of the functionality just described can be bypassed with a simple single-ring-buffer interface to input buffer 850. Optionally, the described hardware can be configured with a continuous VIB 0 pattern in BDIP register 834 and a VIB 0 register set with a StartAddress set to the left end of input buffer 850 and an EndAddress set to the right end of input buffer 850.
Given the STS-3 example of Figure 6 and the description of PIB 800 operation, PIB 800 can produce the modified SONET frame structure 1000 of Figure 10 to parser 900 using three virtual input buffers and the exemplary pattern stored in BDIP register 834. Rather than the byte-interleaved STS-3 frame structure, frame structure 1000 is rearranged to place the 90 columns corresponding to STS-1 A TOH and SPE columns in the first 90 columns of the frame, followed by all STS-1 B columns starting in column 91, followed by all STS-1 C columns starting in column 181. Other column rearrangements are possible, as long as the parser grammar is configured to expect what is received.
Given the rearranged STS-3 frame structure of Figure 10, parser 900 will still use at least four different contexts to process frame structure 1000. As illustrated in Figure 11, seven contexts are used. Context 0 is the root context for the input stream. Contexts 1, 3, and 5 are used to parse the TOH bytes for STS-1 A, B, and C, respectively. Contexts 2, 4, and 6 are used to parse the payloads for STS-1 A, B, and C, respectively.
Figure 11 repeats only frame K+1, row 1 from the rearranged STS-3 example of Figure 10. The first three bytes can be parsed in context 1, an STS-1 parsing context with grammar designed to recognize the alignment and J0 characters appearing in SOH row 1. The next 87 bytes are part of the STS-1 A payload context, and contain a POH byte and 86 payload bytes, which in this example are considered to be packet data (headers and/or packet payload). The actual location of the POH byte within these 87 bytes is part of context 1, and must be remembered from an H1H2 LOH field that was previously parsed. The remaining 86 bytes in all likelihood do not start with the first byte of a packet, but could be a continuation of a previously unfinished packet parsing context that must be revisited.
The same observations hold for the middle 90 bytes and the last 90 bytes, respectively representing STS-1 B (contexts 3 and 4) and STS-1 C (contexts 5 and 6). The packet data in these last two byte segments will, for this example, represent different parsing contexts than the packet data in the first 87 bytes of SPE data. The H1H2 values for these last two byte segments will indicate POH locations, within the 90-byte segments, unique to those segments. At the end of the row shown in Figure 11, the context shifts back to context 0, the root STS-3 context. Accordingly, during the processing of the exemplary modified STS-3 row in Figure 11, a direct execution parser according to an embodiment of the invention could switch parsing contexts thirteen times. The switching itself is controlled by the modified STS-3 context, even when that context is not "active."
A block diagram of a parser embodiment 900 is shown in Figure 9, along with PIB 800, parser table 740, and production rule table 750. Parser 900 comprises an input data interface 910, a bank of context symbol registers 920, a parser state machine 930, a parser stack manager 940, a bank of context head/tail registers 950, and a parser stack memory 960. Input data interface 910 communicates with PIB 800. When PIB 800 asserts D_RDY to interface 910, interface 910 is allowed to send load or skip requests to PIB 800 until D_RDY is deasserted, signaling the end of a SONET frame row. In response to load requests, PIB 800 supplies input data to interface 910 over bus DIN.
Input data interface 910 requests data from PIB 800 in response to requests from parser state machine 930 (although interface 910 may cache not-yet-requested data, as long as it responds correctly to external requests). Parser state machine 930 maintains the ID for the current context on the CTS bus. Whenever parser state machine 930 instructs input data interface 910 to load input data to context symbol registers 920, row byte counters in a POH registers/counters block 912, internal to input data interface 910, track the number of bytes read to that context.
Within register/counter block 912, two count registers are allocated for each context. The first, the previously mentioned row_byte_counter, counts the number of bytes read to the current context on the current frame row. The row byte counters for all contexts are reset at each new frame row. The second, the SPE byte counter, counts the number of bytes read to the current context in the current SPE. The SPE byte counters are reset after each SPE is read in.
Four comparison registers are also allocated for each context in register/counter block 912. A row_count_max register defines the maximum number of symbols that should be read to that context on the current row, for instance 86 symbols for an STS-1 row, excluding overhead bytes. A SPE_count_max register defines the maximum number of symbols that should be read to that context for the current SPE, for instance 774 for an STS-1 SPE, excluding overhead bytes. A current_POH_pointer maintains a value for the column location of the POH in the current SPE, and a next_POH_pointer maintains a value for the column location of the POH in the next SPE. The row_count_max and SPE_count_max registers are loaded when parser table 740 and production rule table 750 are configured for the current input stream. The current_POH_pointer and next_POH_pointer are dynamic, and are set by the load_POH_register signal from parser state machine 930.
An enable flag register also exists for each context; the value of the enable flag register determines whether the registers and counters for a particular context are enabled. When the registers and counters for a particular context are enabled and that context is active, the row byte and SPE byte counters increment as data is transferred to the context symbol registers 920. When the row byte counter value reaches the row_count_max register value, input data interface 910 signals a level-decrement grammar context switch to parser state machine 930 and stops any pending transfer to the current context. When the row byte counter value reaches the current_POH_pointer value, input data interface 910 likewise signals a level-decrement grammar context switch to parser state machine 930 and stops any pending transfer to the current context. Also, when the SPE byte counter value reaches the SPE_count_max value, the next_POH_pointer is loaded to the current_POH_pointer, and input data interface 910 skips bytes if necessary in the current row until it reaches the next_POH_pointer.
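The counter comparisons just described amount to a per-byte check like the following (a behavioral Python sketch; the dataclass and its defaults track the register names in the text, not a hardware register file):

    from dataclasses import dataclass

    @dataclass
    class ContextCounters:
        row_byte_counter: int = 0
        spe_byte_counter: int = 0
        row_count_max: int = 86         # e.g. an STS-1 row, excluding overhead
        spe_count_max: int = 774        # e.g. an STS-1 SPE, excluding overhead
        current_poh_pointer: int = 0
        next_poh_pointer: int = 0

    def byte_transferred(ctx):
        """Called as each byte is loaded to the active context's symbol
        register; returns True when the interface must signal a
        level-decrement grammar context switch."""
        ctx.row_byte_counter += 1
        ctx.spe_byte_counter += 1
        if ctx.row_byte_counter == ctx.row_count_max:
            return True                                    # end of this context's row
        if ctx.row_byte_counter == ctx.current_poh_pointer:
            return True                                    # POH column reached
        if ctx.spe_byte_counter == ctx.spe_count_max:
            ctx.current_poh_pointer = ctx.next_poh_pointer # adopt next SPE's pointer
            ctx.spe_byte_counter = 0
        return False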
One final task of registers/counters 912 is to determine when the transmitter has signaled a positive or a negative byte stuff. A positive byte stuff is signaled by the transmitter inverting bits 7, 9, 11, 13, and 15 of the POH pointer, and a negative byte stuff is signaled by the transmitter inverting bits 8, 10, 12, 14, and 16 of the POH pointer. When parser state machine 930 loads a next_POH_pointer for a context, registers/counters 912 compare the next_POH_pointer value to the current_POH_pointer value and signal a positive or negative byte stuff condition, as described, to parser state machine 930. Also, for a positive byte stuff, the row byte counter is incremented by one; for a negative byte stuff, the row byte counter is decremented by one.
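The inverted-bit signaling can be checked in software as in the following sketch. Bit numbering follows the text, counting from 1 at the most significant bit of the 16-bit H1H2 word; the majority-vote threshold is an assumption about a robust implementation rather than something stated above:

    I_BITS = (7, 9, 11, 13, 15)    # inverted for a positive byte stuff
    D_BITS = (8, 10, 12, 14, 16)   # inverted for a negative byte stuff

    def bit(word, n):
        """Bit n of a 16-bit word, numbered 1 (MSB) through 16 (LSB)."""
        return (word >> (16 - n)) & 1

    def classify_stuff(current_ptr, next_ptr):
        """Return 'positive', 'negative', or None by majority vote over the
        designated bit positions of the two H1H2 pointer words."""
        i_flips = sum(bit(current_ptr, n) != bit(next_ptr, n) for n in I_BITS)
        d_flips = sum(bit(current_ptr, n) != bit(next_ptr, n) for n in D_BITS)
        if i_flips >= 3 and i_flips > d_flips:
            return "positive"     # row byte counter would be incremented by one
        if d_flips >= 3 and d_flips > i_flips:
            return "negative"     # row byte counter would be decremented by one
        return None               # no stuff signaled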
Context symbol registers 920 store context symbols for each of N potential contexts. For instance, when context 2 is an IP packet context, context 2 may be exited and re-entered several times per packet before parsing is completed on that packet. Context symbol register 920 maintains the current symbol state in the interim for the suspended contexts. When register 920 receives input bytes from input data interface 910, it stores them to the context register indicated on the CTS bus. Further, the value output on the DI bus to parser state machine 930 and parser table 740 is the value contained in the context register indicated on the CTS bus. Context symbol register 920 also preferably maintains two values for each context symbol register, indicating how many of its input bytes are valid, and how many input bytes have been requested but not yet filled.
Parser state machine 930 coordinates the operation of the other elements of parser 900, and signals all context switches. As previously mentioned, byte counter comparisons may in some circumstances signal the parser state machine to switch contexts. The example below will also demonstrate how context switches can be initiated by the grammar itself, when the parser state machine receives special non-terminals that instruct it to switch contexts. Except for the context-switching function described herein, parser state machine 930 can in some embodiments function similarly to the parser state machine described in copending U.S. Patent Application 10/351,030.
Parser stack manager 940 performs pushes and pops of parser stack symbols to parser stack memory 960. In the illustrated embodiment, the CTS bus value is used by parser stack manager 940 to access a set of head/tail registers 950 for each context. The head/tail registers 950 contain two pointers for each context: a head pointer, which contains the address in parser stack memory 960 for the top-of-stack symbol for that context; and a tail pointer, which contains the address in parser stack memory 960 for the bottom-of-stack symbol for that context. When symbols are pushed to a context, the head pointer is incremented. When symbols are popped from a context, the head pointer is decremented.
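The head/tail register arrangement effectively gives each context its own stack region within one shared memory. A compact model follows (Python; the flat-list memory, fixed region size, and push/pop API are modeling assumptions):

    class ContextStacks:
        """Per-context parser stacks in one shared stack memory, addressed
        through head/tail pointer pairs as in head/tail registers 950."""
        def __init__(self, num_contexts, region_size):
            self.memory = [None] * (num_contexts * region_size)
            # tail marks the bottom of each context's stack; head tracks the top
            self.head = [c * region_size for c in range(num_contexts)]
            self.tail = [c * region_size for c in range(num_contexts)]

        def push(self, ctx, symbol):
            self.memory[self.head[ctx]] = symbol
            self.head[ctx] += 1                  # head increments on push

        def pop(self, ctx):
            self.head[ctx] -= 1                  # head decrements on pop
            return self.memory[self.head[ctx]]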
The operation of and cooperation between the elements of parser 900 will be described further by example, with reference to the de-interleaved STS-3 frames of Figure 10. Before parser 900 begins processing input from an STS-N stream, it can be specifically configured for the appropriate number of parsing contexts for that stream. Preferably, however, contexts are already available and ready for use, with movement between them directed by the grammar, the STS-3 grammar being a "root" grammar and the required number of "branch" grammars existing for the input port.
For instance, the STS-3 root frame grammar for the illustrated embodiment can be defined as:

    $STS-3_Stream   := TOF STS-3_DI_Frame
    $STS-3_DI_Frame := @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*1
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*2
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*3
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*4
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*5
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*6
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*7
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS \  \*8
                       @@L+1 CTS @@L+3 CTS @@L+5 CTS    \*9
    $TOF := CTL_NewFrame SKIP_1_BYTE
    $CTS := CTL_ContextShift SKIP_1_BYTE
    $A1  := 0xF6
    $A2  := 0x28
In the above grammar, the STS-3 stream is defined as a top-of-frame (TOF) symbol, followed by a de-interleaved STS-3 frame. The TOF grammar includes an instruction to "SKIP_1_BYTE," in other words, to consume the CTL_NewFrame symbol from the symbol register 920 for context 0.
The STS-3 de-interleaved frame definition includes nine repetitions of a common pattern, "@@L+1 CTS @@L+3 CTS @@L+5 CTS." The CTS grammar looks for and consumes a CTL_ContextShift character, which was inserted in the incoming data stream between virtual input buffer segments by the PIB. The @@L+n grammar signifies a special parser directive to shift contexts, relative to the present context, upwards n contexts. Thus on each row of a de-interleaved frame, the root grammar switches three times, to an STS-1 grammar, as defined below. The STS-1 grammar processes STS-1 de-interleaved input on nine rows, for instance as
follows:

    $STS-1_Frame := STS-1_Fr_row1 STS-1_Fr_row2 STS-1_Fr_row3 \
                    STS-1_Fr_row4 STS-1_Fr_row5 STS-1_Fr_row6 \
                    STS-1_Fr_row7 STS-1_Fr_row8 STS-1_Fr_row9
    $STS-1_Fr_row1 := STS-1_TOH_row1 STS-1_SPE_row
    $STS-1_Fr_row2 := STS-1_TOH_row2 STS-1_SPE_row
    $STS-1_Fr_row3 := STS-1_TOH_row3 STS-1_SPE_row
    $STS-1_Fr_row4 := STS-1_TOH_row4 STS-1_SPE_row
    $STS-1_Fr_row5 := STS-1_TOH_row5 STS-1_SPE_row
    $STS-1_Fr_row6 := STS-1_TOH_row6 STS-1_SPE_row
    $STS-1_Fr_row7 := STS-1_TOH_row7 STS-1_SPE_row
    $STS-1_Fr_row8 := STS-1_TOH_row8 STS-1_SPE_row
    $STS-1_Fr_row9 := STS-1_TOH_row9 STS-1_SPE_row
    $STS-1_SPE_row := @@L++ POH @@L++ @@Lroot
    $POH := octet
    $STS-1_TOH_row1 := A1 A2 J0 SKIP_3_Bytes
    $J0 := octet
    $STS-1_TOH_row2 := B1 E1 F1 SKIP_3_Bytes
    $B1 := octet
    $E1 := octet
    $F1 := octet
    $STS-1_TOH_row3 := D1_D2_D3 SKIP_3_Bytes
    $D1_D2_D3 := octet octet octet
    $STS-1_TOH_row4 := J1_Pointer \
                       Neg_Stuff | Pos_Stuff | No_Stuff
    $Pos_Stuff := Pos_Shift SKIP_4_Bytes
    $Neg_Stuff := Neg_Shift SKIP_2_Bytes
    $No_Stuff := No_Shift SKIP_3_Bytes
    $J1_Pointer := @@XferPointer
    $STS-1_TOH_row5 := B2 K1_K2 SKIP_3_Bytes
    $B2 := octet
    $K1_K2 := octet octet
    $STS-1_TOH_row6 := D4_D5_D6 SKIP_3_Bytes
    $D4_D5_D6 := octet octet octet
    $STS-1_TOH_row7 := D7_D8_D9 SKIP_3_Bytes
    $D7_D8_D9 := octet octet octet
    $STS-1_TOH_row8 := D10_D11_D12 SKIP_3_Bytes
    $D10_D11_D12 := octet octet octet
    $STS-1_TOH_row9 := S1_Z1 M0_M1_Z2 E2 SKIP_3_Bytes
    $S1_Z1 := octet
    $M0_M1_Z2 := octet
    $E2 := octet

Each row of the STS-1 frame is separately defined, such that each time an STS-1
context is called, it will be re-entered at the proper location. Each row performs TOH processing, as will be described below, and then performs STS-1_SPE_row processing. The STS-1_SPE_row processing increments to the next context, which could be an IP context, ATM context, etc. Input data interface 910 will signal a context switch back to the STS-1_SPE_row grammar when the POH column is reached, at which time the POH byte is consumed. The STS-1_SPE_row grammar then increments back to the next context until input data interface 910 signals a context switch back to the STS-1_SPE_row grammar when the end of that SPE row is reached. The STS-1_SPE_row grammar then instructs the parser state machine to return to the root grammar with the @@Lroot command.
Each set of defined transport overhead bytes is parsed within the appropriate row. Although not illustrated, several of the transport overhead bytes may be processed with their own sub-grammars, if desired. For instance, the D1 through D12 bytes can be used as another data channel, which could be parsed with an appropriate grammar.
One possible processing approach for TOH row 4 is illustrated. That row contains the POH pointer for the next SPE. The special directive @@XferPointer transfers the two pointer bytes to the appropriate next_POH_pointer register, causing input data interface 910 to assert Pos_Shift, Neg_Shift, or No_Shift back to parser state machine 930. Depending on which variable is asserted, either two, three, or four input bytes are then consumed.
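To summarize how the grammar drives context movement, the following toy Python loop sketches a multi-context direct-execution parse. The rule encoding, the ("SHIFT", n) representation of the @@L directives, and the dict-based tables are illustrative assumptions, not the internal format of parser 900:

    def parse(streams, rules, num_ctx, root_ctx=0):
        """Toy multi-context direct-execution loop. Each context keeps its
        own stack and input; ("SHIFT", n) models @@L+n and ("SHIFT", None)
        models @@Lroot."""
        stacks = [["START"] for _ in range(num_ctx)]
        ctx = root_ctx
        while stacks[ctx]:
            top = stacks[ctx].pop()
            if isinstance(top, tuple):                 # ("SHIFT", n) directive
                n = top[1]
                ctx = root_ctx if n is None else ctx + n
                continue
            di = streams[ctx][:1]                      # current input symbol
            if top == di:                              # terminal: match and consume
                streams[ctx] = streams[ctx][1:]
            else:                                      # non-terminal: expand rule
                for symbol in reversed(rules[(top, di)]):
                    stacks[ctx].append(symbol)

A real implementation would also suspend and resume per-context input positions and honor counter-driven switches, as PIB 800 and input data interface 910 do above; the sketch shows only the grammar-directed movement.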
Although special-purpose execution engines have been shown and described, alternative implementations can use the described multi-symbol parser as a front-end datagram processor for a general-purpose processor. The STS-3 example presented above was used for simplicity; the concepts illustrated therein can be extended, e.g., to parsing STS-12, STS-48, etc., streams formed from any combination of STS-1, STS-3, STS-3c, STS-12, STS-12c, and STS-48c streams.
Direct network communication using Transmission Control Protocol (TCP) may increase a networking device's vulnerability to TCP-based attacks and require additional processing of packets upon arrival. The addition of a proxy TCP endpoint designed specifically to perform the direct TCP-based network communications shields networking devices from potential attacks and increases their processing efficiency. Embodiments of the present invention will now be described in more detail.
Figure 12 illustrates, in block form, a network communications system 2100 useful with embodiments of the present invention. Referring to Figure 12, the network communications system 2100 includes a networking device 2140 that communicates over a network 2120 via a proxy 2200. The network 2120 may be any Wide Area Network (WAN) that provides packet switching. The networking device 2140 may be a server or any other device capable of network communications. The proxy 2200 maintains at least one TCP session over the network 2120 and a corresponding local session with the networking device 2140. In some embodiments, the local session may be a TCP session established with the networking device 2140 through a private network, e.g., a company enterprise network, Internet Service Provider (ISP) network, home network, etc. The proxy 2200 functions as a network communications intermediary for networking device 2140 by translating data between the local and TCP sessions. For instance, when receiving packetized data from the network 2120 in a TCP session, the proxy 2200 may sequence and depacketize the data prior to providing it to the networking device 2140 in the local session. The depacketization may include reassembling Internet Protocol (IP) fragments, and/or performing cryptography operations, e.g., according to the Internet Protocol Security (IPSec) header(s). This sequencing and processing by proxy 2200 allows the networking device 2140 to receive a uniform data stream in the local session, ensuring quality-of-service (QOS) for the networking device 2140 and control over network bandwidth usage.
Since the proxy 2200 is the endpoint for the network communications, not networking device 2140, the TCP session has a TCP signature of the proxy 2200, thus concealing the identity of the networking device 2140 from the network 2120. This concealment of the networking device 2140 limits its exposure to network-based attacks. The proxy 2200 may perform Network Address Translation (NAT) of destination and source IP addresses to help hide the identity of the networking device 2140. The proxy 2200 may be implemented at any network interface, such as a firewall. In some embodiments, proxy 2200 may provide network communication and processing for multiple networking devices 2140. In these embodiments, the management of network communication at a single network interface point may allow proxy 2200 to provide additional functionality for increasing the efficiency of the network management and packet processing. For instance, when the proxy 2200 discovers network changes, e.g., next hop change, Internet Control Message Protocol (ICMP) fragments, packet loss, etc., in one of the TCP sessions, the changes may be applied to all of the TCP sessions. This becomes especially powerful when combined with the full neighbor implementation of Border Gateway Protocol (BGP) or other link state routing protocol that is aware of the entire topology of network 2120. Additionally, since the proxy 2200 maintains multiple sessions, the status and statistics of these sessions can be accessed at a single network interface point.
The structure and operation of proxy 2200 for some embodiments of the invention will be explained with reference to Figures 13A-15. Figure 13A illustrates, in block form, embodiments of the proxy 2200 shown in Figure 12. Referring to Figure 13A, the proxy 2200 includes a network-interface proxy 2210 to manage one or more TCP sessions over the network 2120 and a device-interface proxy 2220 to manage one or more local sessions with networking device 2140. The network-interface proxy 2210 and device-interface proxy 2220 exchange data to be transmitted over their respective sessions. For instance, when network-interface proxy 2210 provides payload data from the TCP session to device-interface proxy 2220, the device-interface proxy 2220 transmits the data to the networking device 2140 in the local session. Alternatively, when device-interface proxy 2220 provides payload data from networking device 2140 to network-interface proxy 2210, the network-interface proxy 2210 transmits the data over the network 2120 in the TCP session.
The network-interface proxy 2210 includes a TCP state machine 2212 to establish and manage the TCP sessions over the network 2120, including maintaining state information for each TCP session and implementing packet sequencing, error recovery and flow control mechanisms. The TCP state machine 2212 sequences and processes packet streams received over the TCP sessions and provides the sequenced payload data to the device-interface proxy 2220. Because TCP state machine 2212 previously sequenced and processed the payload data, the device-interface proxy 2220 is then capable of providing a uniform data stream to networking device 2140 in the local session. The TCP state machine 2212 further packetizes payload data received from device-interface proxy 2220 and transmits it over the corresponding TCP session.
The device-interface proxy 2220 may include a TCP state machine 2222 to establish and manage local TCP sessions with the networking device 2140. TCP state machine 2222 operates similarly to TCP state machine 2212 with respect to packet streams over the local TCP sessions.
Figure 13B shows, in block form, an example packet flow through proxy 2200 shown in Figures 12 and 13A. Referring to Figure 13B, the network-interface proxy 2210 receives a packet stream in TCP session 2122. In this example embodiment, the packet stream includes three TCP data payloads 1, 2, and 3, with payload 2 IP-fragmented into fragments 2A, 2B, and 2C; the packets may arrive at network-interface proxy 2210 at varying rates, out of order, and duplicated. The network-interface proxy 2210 reassembles the fragmented packets (fragments 2A, 2B, and 2C into TCP payload 2), reorders the TCP payloads, and discards the duplicated packets upon their arrival. The in-order and reassembled TCP payload data is then provided to the device-interface proxy 2220, where it is transmitted in the local TCP session 2124 at a uniform rate. The network-interface proxy 2210 may also perform cryptography operations upon the TCP packets prior to the reassembly and reordering, when they are received in need of decryption and/or authentication. This processing and uniform transmission by the proxy 2200 allows a networking device 2140 to receive a uniform in-order packet stream, thus reducing its processing burden.
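The reordering step can be expressed compactly in software. A minimal Python sketch follows; real TCP reassembly would also handle sequence-number wraparound, partial segment overlap, and retransmission timers, none of which are modeled here:

    def resequence(segments, initial_seq):
        """Emit payload bytes in order from (seq, payload) segments that may
        arrive out of order or duplicated; buffer anything early."""
        pending, out, expected = {}, bytearray(), initial_seq
        for seq, payload in segments:
            if seq < expected:
                continue                      # duplicate or already-delivered data
            pending.setdefault(seq, payload)  # keep the first copy of each segment
            while expected in pending:        # drain every now-contiguous segment
                data = pending.pop(expected)
                out += data
                expected += len(data)
        return bytes(out), pending            # pending holds any still-missing gap

    stream, gaps = resequence([(103, b"C"), (100, b"AB"), (102, b"!"), (100, b"AB")], 100)
    # stream == b"AB!C"; gaps == {}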
Figure 14 shows an example flow chart 2300 illustrating embodiments for operating the proxy 2200 shown in Figures 12, 13A, and 13B. According to a block 2310, the proxy 2200 establishes a TCP session over the network 2120 and a local session with a networking device 2140. The proxy 2200 may establish the TCP session 2122 through a three-way handshake with a remote TCP endpoint. The proxy 2200 may then establish a local session 2124 with the networking device 2140 responsive to the remote TCP session 2122 establishment. The local session 2124 may be established concurrently with the establishment of the TCP session 2122 to decrease data exchange latency, or it may be established after the TCP session 2122 to avoid problems with SYN floods and other TCP-based attacks. In some embodiments, the local session 2124 is also a TCP session established with a three-way handshake between the proxy 2200 and the networking device 2140.
According to a next block 2320, the proxy 2200 receives a packet stream in the TCP session 2122 over the network 2120. The proxy 2200 manages the TCP session 2122 by providing error recovery for lost or late packets and flow rate control by adjusting the size of the TCP window.
According to a next block 2330, the proxy 2200 translates data from the packet stream to the local session 2124. The translation includes sequencing and depacketizing the data, e.g., with the network-interface proxy 2210, and providing the data to the networking device 2140 in the local session 2124. The sequencing may include reordering of those packets received out-of-order and discarding duplicated packets, while the depacketization may include any additional processing that may be required, such as reassembly of IP-fragmented packets and/or performance of cryptography operations according to IPSec headers. Although the flowchart 2300 shows data transfers from the network 2120 to the networking device 2140, proxy 2200 may also provide data in the opposite direction. The proxy 2200 provides operations that are not typically provided in firewalls. However, the proxy 2200 can also include, in addition to the TCP proxy operations, other conventional firewall operations.
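In purely software terms, the basic translate-between-two-sessions behavior resembles a classic TCP relay. The sketch below uses the Python standard library; the addresses and the single-connection structure are assumptions for illustration, not the hardware proxy described above:

    import socket, threading

    def pipe(src, dst):
        """Copy bytes one way until the source side closes."""
        while (data := src.recv(4096)):
            dst.sendall(data)
        dst.close()

    def run_proxy(local_addr, remote_addr):
        with socket.create_server(local_addr) as server:
            device, _ = server.accept()                     # local session with the device
            remote = socket.create_connection(remote_addr)  # TCP session over the network
            threading.Thread(target=pipe, args=(device, remote), daemon=True).start()
            pipe(remote, device)                            # remote-to-device direction

    # run_proxy(("0.0.0.0", 8080), ("example.com", 80))  # hypothetical addresses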
Figure 15 illustrates, in block form, a semantic processor 2400 useful with embodiments of the network-interface proxy 2210 and device-interface proxy 2220 shown in Figures 13A and 13B. Referring to Figure 15, a semantic processor 2400 contains an input buffer 2430 for buffering data streams received through the input port 2410, and an output buffer 2440 for buffering data streams to be transmitted through output port 2420. Input port 2410 and output port 2420 may comprise a physical interface to network 2120 (Figures 12, 13A, and 13B), e.g., an optical, electrical, or radio frequency driver/receiver pair for an Ethernet, Fibre Channel, 802.11x, Universal Serial Bus, Firewire, SONET, or other physical layer interface.
A PCI-X interface 2480 is coupled to the input buffer 2430, the output buffer 2440, and an external PCI bus 2482. The PCI bus 2482 can connect to other PCI-capable components, such as disk drives, interfaces for additional network ports, other semantic processors, etc. The PCI-X interface 2480 provides data streams or packets to input buffer 2430 from PCI bus 2482 and transmits data streams or packets over PCI bus 2482 from output buffer 2440.
Semantic processor 2400 includes a direct execution parser (DXP) 2450 that controls the processing of packets in the input buffer 2430 and a semantic processing unit (SPU) 2460 for processing segments of the packets or for performing other operations. The DXP 2450 maintains an internal parser stack (not shown) of non-terminal (and possibly also terminal) symbols, based on parsing of the current input frame or packet up to the current input symbol. When the symbol (or symbols) at the top of the parser stack is a terminal symbol, DXP 2450 compares data DI at the head of the input stream to the terminal symbol and expects a match in order to continue. When the symbol at the top of the parser stack is a non-terminal (NT) symbol, DXP 2450 uses the non-terminal symbol NT and current input data DI to expand the grammar production on the stack. As parsing continues, DXP 2450 instructs SPU 2460 to process segments of the input, or perform other operations.
Semantic processor 2400 uses at least three tables. Code segments for SPU 2460 are stored in semantic code table 2456. Complex grammatical production rules are stored in a production rule table (PRT) 2454. Production rule (PR) codes 2453 for retrieving those production rules are stored in a parser table (PT) 2452. The PR codes 2453 in parser table 2452 also allow DXP 2450 to detect whether, for a given production rule, a code segment from semantic code table 2456 should be loaded and executed by SPU 2460. The production rule (PR) codes 2453 in parser table 2452 point to production rules in production rule table 2454. PR codes are stored, e.g., in a row-column format or a content-addressable format. In a row-column format, the rows of the table are indexed by a non-terminal symbol NT on the top of the internal parser stack, and the columns of the table are indexed by an input data value (or values) DI at the head of the input. In a content-addressable format, a concatenation of the non-terminal symbol NT and the input data value (or values) DI can provide the input to the parser table 2452. Preferably, semantic processor 2400 implements a content-addressable format, where DXP 2450 concatenates the non-terminal symbol NT with 8 bytes of current input data DI to provide the input to the parser table 2452. Optionally, parser table 2452 concatenates the non-terminal symbol NT and 8 bytes of current input data DI received from DXP 2450.
Input buffer 2430 includes a recirculation buffer 2432 to buffer data streams requiring additional passes through the DXP 2450. DXP 2450 parses data streams from recirculation buffer 2432 similarly to those received through input port 2410 or PCI bus 2482.
The semantic processor 2400 includes a memory subsystem 2470 for storing or augmenting segments of the packets. When prompted by the DXP 2450 in response to the parsing of packet headers, the SPU 2460 may sequence TCP packets and/or collect and assemble IP-fragmented packets within memory subsystem 2470. The memory subsystem 2470 may also perform cryptography operations on data streams, including encryption, decryption, and authentication, when directed by SPU 2460. Once reassembled and/or processed in the memory subsystem 2470, the packets or their headers with a specialized NT symbol may be sent to the recirculation buffer 2432 for additional parsing by DXP 2450.
In certain state-dependent protocols, such as TCP, the reception order of packets gives rise to semantics that may be exploited by this semantic processing architecture. For instance, the reception of a TCP SYN packet indicates to the DXP 2450 an attempt to establish a TCP session; however, if the session has already been established there is no further need to allocate resources to complete the processing of the packet, acknowledge its arrival, or maintain corresponding state information. Thus any TCP packet may be correct syntactically, but out-of-sequence with regard to the state of the TCP session. The semantic processor 2400 recognizes these packet-ordering semantics and implements a TCP state machine, such as 2212 or 2222 in Figure 13A, for managing the required TCP interactions and maintaining the state information for TCP sessions.
Figure 16 shows an example flow chart 2500 illustrating embodiments for operating the semantic processor 2400 shown in Figure 15 as a TCP state machine. Referring to Figure 16, the semantic processor 2400 receives a packet at input buffer 2430 (at block 2510) and determines the packet contains a TCP header (at block 2520). The semantic processor 2400 determines the presence of the TCP header by parsing through the received packet's lower level headers with DXP 2450.
In a next decision block 2530, the semantic processor 2400 determines whether the received TCP packet corresponds to a TCP session maintained by semantic processor 2400. The memory subsystem 2470 maintains information for each active TCP session with semantic processor 2400, including the current state of the session, packet sequencing, and window sizing. The SPU 2460, when directed by the DXP 2450, performs a lookup within memory subsystem 2470 for a maintained TCP session that corresponds to the received TCP packet. When a TCP session corresponding to the TCP packet is maintained within semantic processor 2400, in a next decision block 2540, the semantic processor 2400 determines whether the TCP packet coincides with the current state of the TCP session. The SPU 2460 may retrieve the state of the maintained TCP session, e.g., one or more non-terminal (NT) symbols, for the DXP 2450. These NT symbols point to specialized grammatical production rules that correspond to each of the TCP states and control how the DXP 2450 parses the TCP packet.
For instance, when the TCP packet is a SYN packet and its corresponding TCP session is already established, the TCP SYN packet does not coincide with the state of the TCP session and thus is discarded (at block 2580) without further processing. Alternatively, when the TCP packet is a TCP data packet or a TCP FIN packet in an already established TCP session, the DXP 2450 parses the packet according to the state of the TCP session in a next block 2550.
Upon completion of parsing by the DXP 2450, the SPU 2460 may forward the TCP packet to the destination address for a networking device 2140, or send the payload to another semantic processor 2400 where it is provided to the networking device 2140 in a local session 2124. The SPU 2460 performs any reassembly or cryptography operations, including decryption and/or authentication, before forwarding the packets in the TCP session to the networking device 2140. The processed packets are provided to output buffer 2440, or to PCI bus 2482 via PCI-X interface 2480, after the processing operations have been completed by SPU 2460.
When, at decision block 2530, a TCP session corresponding to the TCP packet is not maintained within semantic processor 2400, in a next decision block 2560, the semantic processor 2400 determines whether the TCP packet is a SYN packet attempting to establish a TCP session with semantic processor 2400. The DXP 2450 may determine if the TCP packet is a SYN packet by parsing the SYN flag in the TCP header.

When the TCP packet is not a SYN packet, in the next block 2580, the semantic processor 2400 discards the packet from the input buffer 2430. The SPU 2460 may discard the packet from the input buffer 2430 when directed by DXP 2450.

When the TCP packet is a SYN packet, in a next block 2570, the semantic processor 2400 opens a TCP session according to the TCP SYN packet. The SPU 2460, when directed by DXP 2450, executes microinstructions from semantic code table 2456 that cause the SPU 2460 to open a TCP session. The SPU 2460 may open the TCP session by sending a TCP ACK message back to the source address identified by the TCP SYN packet and by allocating a context control block within memory subsystem 2470 for maintaining information, including the state of the session, and packet sequencing and window sizing information. Execution then returns to block 2510, where semantic processor 2400 receives subsequent packets at input buffer 2430, and the DXP 2450 parses the subsequent packets corresponding to the established TCP session.
Many devices communicate, either over networks or back planes, by broadcast or point-to-point, using bundles of data called packets. Packets have headers that provide information about the nature of the data inside the packet, as well as the data itself, usually in a segment of the packet referred to as the payload. Semantic processing, where the semantics of the header drive the processing of the payload as necessary, fits especially well in packet processing.

Figure 17 shows a block diagram of a semantic processor 3010. The semantic processor 3010 may contain an input buffer 3014 to buffer an input data stream received through the input port 3012; a parser 3018, which may also be referred to as a direct execution parser, to control the processing of packets in the input buffer 3014; at least one semantic processing unit 3016 to process segments of the packets or to perform other operations; and a memory subsystem 3026 to store or augment segments of the packets.
The parser 3018 maintains an internal parser stack 3032, shown in Figure 20, of symbols, based on parsing of the current input frame or packet up to the current input symbol. For instance, each symbol on the parser stack 3032 is capable of indicating to the parser 3018 a parsing state for the current input frame or packet. The symbols are generally non-terminal symbols, although terminal symbols may be in the parser stack as well.
When the symbol or symbols at the top of the parser stack 3032 is a terminal symbol, the parser 3018 compares data at the head of the input stream to the terminal symbol and expects a match in order to continue. The data is identified as Data In and is generally taken in some portion, such as bytes. Terminal symbols, for example, may be compared against one byte of data, DI. When the symbol at the top of the parser stack 3032 is a non-terminal (NT) symbol, parser 3018 uses the non-terminal symbol NT and current input data DI to detect a match in the production rule code memory 3200 and subsequently the production rule table (PRT) 3022, which may yield more non-terminal (NT) symbols that expand the grammar production on the stack 3032. In addition, with a non-terminal symbol, as parsing continues, the parser 3018 may instruct SPU 3016 to process segments of the input stream, or perform other operations. A segment of the input stream may be the next 'n' bytes of data, identified as DI[n]. The parser 3018 may parse the data in the input stream prior to receiving all of the data to be processed by the semantic processor 3010. For instance, when the data is packetized the semantic processor 3010 may begin to parse through the headers of the packet before the entire packet is received at input port 3012.
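The parsing loop just described can be modeled in C. This is a minimal sketch, not the actual hardware behavior; the stack and input primitives (stack_top, input_peek, lookup_and_expand, and so on) are hypothetical names for the mechanisms described above.

```c
#include <stdbool.h>

enum sym_kind { SYM_TERMINAL, SYM_NONTERMINAL };

struct symbol { enum sym_kind kind; unsigned code; };

/* Hypothetical stack and input primitives modeling parser 3018. */
extern struct symbol stack_top(void);
extern bool stack_empty(void);
extern void stack_pop(void);
extern unsigned char input_peek(void);        /* DI: next input byte */
extern void input_advance(unsigned n);        /* skip n bytes */
extern bool lookup_and_expand(unsigned nt);   /* PT/PRT lookup; pushes symbols */
extern void flush_segment(void);              /* SEP that discards bad data */

void parse_loop(void)
{
    while (!stack_empty()) {
        struct symbol top = stack_top();
        if (top.kind == SYM_TERMINAL) {
            /* Terminal: the input byte must match to continue. */
            if (input_peek() != (unsigned char)top.code) {
                flush_segment();   /* unparseable: reset and discard */
                return;
            }
            stack_pop();
            input_advance(1);
        } else {
            /* Non-terminal: NT + DI index the parser table, which expands
             * the production on the stack (and may dispatch SPU work). */
            stack_pop();
            if (!lookup_and_expand(top.code)) {
                flush_segment();
                return;
            }
        }
    }
}
```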
Semantic processor 3010 generally uses at least three tables. Code segments for SPU 3016 are stored in semantic code table (SCT) 3024. Complex grammatical production rules are stored in a production rule table (PRT) 3022. Production rule (PR) codes for retrieving those production rules are stored in a parser table (PT) 3020. The PR codes in parser table 3020 also allow parser 3018 to detect whether a code segment from semantic code table 3024 should be loaded and executed by SPU 3016 for a given production rule.
The production rule (PR) codes in parser table 3020 point to production rules in production rule table 3022. PR codes are stored in some fashion, such as in a row-column format or a content-addressable format. In a row-column format, the rows of the table 3020 are indexed by a non-terminal symbol NT on the top of the internal parser stack 3032 of Figure 20, and the columns of the table are indexed by an input data value or values DI at the head of the data input stream in input buffer 3014. In a content-addressable format, a concatenation of the non-terminal symbol NT and the input data value or values DI can provide the input to the table 3020. Semantic processor 3010 will typically implement a content-addressable format, in which parser 3018 concatenates the non-terminal symbol NT with 8 bytes of current input data DI to provide the input to the parser table 3020. Optionally, parser table 3020 concatenates the non-terminal symbol NT and 8 bytes of prior input data DI stored in the parser 3018. It must be noted that some embodiments may include more components than those shown in Figure 17. However, for discussion purposes and application of the embodiments, those components are peripheral.
General parser operation for some embodiments will first be explained with reference to Figures 17-20. Figure 18 illustrates one possible implementation of a parser table 3020. Parser table 3020 comprises a production rule (PR) code memory 3200. PR code memory 3200 contains a plurality of PR codes that are used to access a corresponding production rule stored in the production rule table (PRT) 3022. Practically, codes for many different grammars can exist at the same time in production rule code memory 3200. Unless required by a particular lookup implementation, the input values discussed above, such as a non-terminal (NT) symbol concatenated with current input values DI[n], where n is a selected match width in bytes, need not be assigned in any particular order in PR code memory 3200.
In one embodiment, parser table 3020 also includes an addressor 3202 that receives an NT symbol and data values DI[n] from parser 3018 of Figure 17. Addressor 3202 concatenates an NT symbol with the data values DI[n], and applies the concatenated value to PR code memory 3200. Optionally, parser 3018 concatenates the NT symbol and data values DI[n] prior to transmitting them to parser table 3020.
Although conceptually it is often useful to view the structure of production rule code memory 3200 as a matrix with one PR code for each unique combination of NT code and data values, there is no limitation implied as to the embodiments of the present invention. Different types of memory and memory organization may be appropriate for different
applications.
For example, in one embodiment, the parser table 3020 is implemented as a Content Addressable Memory (CAM), where addressor 3202 uses an NT code and input data values DI[n] as a key for the CAM to look up the PR code corresponding to a production rule in the PRT 3022. Preferably, the CAM is a Ternary CAM (TCAM) populated with TCAM entries. Each TCAM entry comprises an NT code and a DI[n] match value. Each NT code can have multiple TCAM entries. Each bit of the DI[n] match value can be set to "0", "1", or "X" (representing "Don't Care"). This capability allows PR codes to require that only certain bits/bytes of DI[n] match a coded pattern in order for parser table 3020 to find a match. For instance, one row of the TCAM can contain an NT code NT_IP for an IP destination address field, followed by four bytes representing an IP destination address corresponding to a device incorporating the semantic processor 3010. The remaining four bytes of the TCAM row are set to "don't care." Thus when NT_IP and eight bytes DI[8] are submitted to parser table 3020, where the first four bytes of DI[8] contain the correct IP address, a match will occur no matter what the last four bytes of DI[8] contain.
Since the TCAM employs the "Don't Care" capability and there can be multiple TCAM entries for a single NT, the TCAM can find multiple matching TCAM entries for a given NT code and DI[n] match value. The TCAM prioritizes these matches through its hardware and only outputs the match of the highest priority. Further, when an NT code and a DI[n] match value are submitted to the TCAM, the TCAM attempts to match every TCAM entry with the received NT code and DI[n] match value in parallel. Thus, the TCAM has the ability to determine whether a match was found in parser table 3020 in a single clock cycle of semantic processor 3010.
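A software model of the TCAM match may clarify the "Don't Care" mechanism. The sketch below is an assumption-laden illustration: the hardware matches all entries in parallel in one clock cycle, while this model scans entries in priority (index) order, and the structure fields are hypothetical names.

```c
#include <stdint.h>

/* Software model of one TCAM row; field names are hypothetical. */
struct tcam_entry {
    uint8_t  nt;        /* non-terminal code */
    uint8_t  value[8];  /* DI[8] match value */
    uint8_t  care[8];   /* 0xFF bit = must match, 0x00 bit = "don't care" */
    uint16_t pr_code;   /* production rule code output on a hit */
};

/* The hardware matches every entry in parallel and reports the
 * highest-priority hit; this model scans in priority (index) order. */
int tcam_lookup(const struct tcam_entry *tab, int n,
                uint8_t nt, const uint8_t di[8], uint16_t *pr_code)
{
    for (int i = 0; i < n; i++) {
        if (tab[i].nt != nt)
            continue;
        int hit = 1;
        for (int b = 0; b < 8; b++) {
            if ((di[b] ^ tab[i].value[b]) & tab[i].care[b]) {
                hit = 0;
                break;
            }
        }
        if (hit) {
            *pr_code = tab[i].pr_code;
            return i;      /* row index, usable as the PR code itself */
        }
    }
    return -1;             /* no match: fall back to the default rule */
}
```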
Another way of viewing this architecture is as a "variable look-ahead" parser. Although a fixed data input segment, such as eight bytes, is applied to the TCAM, the TCAM coding allows a next production rule to be based on any portion of the current eight bytes of input. If only one bit, or byte, anywhere within the current eight bytes at the head of the input stream, is of interest for the current rule, the TCAM entry can be coded such that the rest are ignored during the match. Essentially, the current "symbol" can be defined for a given production rule as any combination of the 64 bits at the head of the input stream. By intelligent coding, the number of parsing cycles, NT codes, and table entries can generally be reduced for a given parsing task.
The TCAM in parser table 3020 produces a PR code corresponding to the TCAM entry 3204 matching NT and DI[n], as explained above. The PR code can be sent back to parser 3018, directly to PR table 3022, or both. In one embodiment, the PR code is the row index of the TCAM entry producing a match.
When no TCAM entry 3204 matches NT and DI[n], several options exist. In one embodiment, the PR code is accompanied by a "valid" bit, which remains unset if no TCAM entry matched the current input. In another embodiment, parser table 3020 constructs a default PR code corresponding to the NT supplied to the parser table. The use of a valid bit or default PR code will next be explained in conjunction with Figure 19.
Parser table 3020 can be located on-chip, off-chip, or both, when parser 3018 and SPU 3016 are integrated together in a circuit. For instance, static RAM (SRAM) or TCAM located on-chip can serve as parser table 3020. Alternately, off-chip DRAM or TCAM storage can store parser table 3020, with addressor 3202 serving as or communicating with a memory controller for the off-chip memory. In other embodiments, the parser table 3020 can be located in off-chip memory, with an on-chip cache capable of holding a section of the parser table 3020.
Figure 19 illustrates one possible implementation for production rule table 3022. PR table 3022 comprises a production rule memory 3220, a Match All Parser entries Table (MAPT) memory 3228, and an addressor 3222.
In one embodiment, addressor 3222 receives PR codes from either parser 3018 or parser table 3020, and receives NT symbols from parser 3018. Preferably, the received NT symbol is the same NT symbol that is sent to parser table 3020, where it was used to locate the received PR code. Addressor 3222 uses these received PR codes and NT symbols to access corresponding production rules and default production rules, respectively. In one embodiment, the received PR codes address production rules in production rule memory 3220 and the received NT codes address default production rules in MAPT 3228. Addressor 3222 may not be necessary in some implementations, but when used, can be part of parser 3018, part of PRT 3022, or an intermediate functional block. An addressor may not be needed, for instance, if parser table 3020 or parser 3018 constructs addresses directly.
Production rule memory 3220 stores the production rules 3224, each containing three data segments. These data segments include: a symbol segment, a SPU entry point (SEP) segment, and a skip bytes segment. These segments can either be fixed length segments or variable length segments that are, preferably, null-terminated. The symbol segment contains terminal and/or non-terminal symbols to be pushed onto the parser stack 3032 of Figure 20. The SEP segment contains SPU entry points (SEP) used by the SPU 3016 in processing segments of data. The skip bytes segment contains skip bytes data used by the input buffer 3014 to increment its buffer pointer and advance the processing of the input stream. Other information useful in processing production rules can also be stored as part of production rule 3224.
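The three-segment production rule layout can be pictured as a C structure. The field widths and maxima below are illustrative assumptions; the specification does not fix them.

```c
#include <stdint.h>

/* Illustrative maxima; the specification does not fix these widths. */
#define MAX_SYMS 8
#define MAX_SEPS 4
#define SYM_NULL 0   /* NULL terminator for unused symbol slots */
#define SEP_NULL 0   /* NULL terminator for unused SEP slots */

/* One production rule 3224: symbols to push onto parser stack 3032,
 * SPU entry points into SCT 3024, and a skip-bytes count for the
 * input buffer pointer. */
struct production_rule {
    uint16_t symbols[MAX_SYMS];  /* T/NT symbols, NULL-terminated */
    uint16_t seps[MAX_SEPS];     /* SEPs, NULL-terminated */
    uint8_t  skip_bytes;         /* input bytes consumed this cycle, 0..8 */
};
```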
MAPT 3228 stores default production rules 3226, which in this embodiment have the same structure as the PRs in production rule memory 3220, and are accessed when a PR code cannot be located during the parser table lookup. Although production rule memory 3220 and MAPT 3228 are shown as two separate memory blocks, there is no requirement or limitation to this implementation. In one embodiment, production rule memory 3220 and MAPT 3228 are implemented as on-chip SRAM, where each production rule and default production rule contains multiple null-terminated segments. As production rules and default production rules can have various lengths, it is preferable to take an approach that allows easy indexing into their respective memories 3220 and 3228. In one approach, each PR has a fixed length that can accommodate a fixed maximum number of symbols, SEPs, and auxiliary data such as the skip bytes field. When a given PR does not need the maximum number of symbols or SEPs allowed for, the sequence can be terminated with a NULL symbol or SEP. When a given PR would require more than the maximum number, it can be split into two PRs. These are then accessed such as by having the first issue a skip bytes value of zero and pushing an NT onto the stack that causes the second to be accessed on the following parsing cycle. In this approach, a one-to-one correspondence between TCAM entries and PR table entries can be maintained, such that the row address obtained from the TCAM is also the row address of the corresponding production rule in PR table 3022.
The MAPT 3228 section of PRT 3022 can be similarly indexed, but using NT codes instead of PR codes. For instance, when a valid bit on the PR code is unset, addressor 3222 can select as a PR table address the row corresponding to the current NT. For instance, if 256 NTs are allowed, MAPT 3228 could contain 256 entries, each indexed to one of the NTs. When parser table 3020 has no entry corresponding to a current NT and data input DI[n], the corresponding default production rule from MAPT 3228 is accessed.
Taking the IP destination address again as an example, the parser table 3020 can be configured to respond to one of two expected destination addresses during the appropriate parsing cycle. For all other destination addresses, no parser table entry would be found. Addressor 3222 would then look up the default rule for the current NT, which would direct the parser 3018 and/or SPU 3016 to flush the current packet as a packet of no interest.
Although the above production rule table indexing approach provides relatively straightforward and rapid rule access, other indexing schemes are possible. For variable-length PR table entries, the PR code could be arithmetically manipulated to determine a production rule's physical memory starting address (this would be possible, for instance, if the production rules were sorted by expanded length, and then PR codes were assigned according to a rule's sorted position). In another approach, an intermediate pointer table can be used to determine the address of the production rule in PRT 3022 from the PR code or the default production rule in MAPT 3228 from the NT symbol.
Figure 20 shows one possible block implementation for parser 3018. Parser control finite state machine (FSM) 3030 controls and sequences overall parser 3018 operations, based on inputs from the other logical blocks in Figure 20. Parser stack 3032 stores the symbols to be executed by parser 3018. Input stream sequence control 3028 retrieves input data values from input buffer 3014, to be processed by parser 3018. SPU interface 3034 dispatches tasks to SPU 3016 on behalf of parser 3018. The particular functions of these blocks will be further described below.
The basic operation of the blocks in Figures 17-20 will now be described with reference to the flowchart of an embodiment of data stream parsing in Figure 21. According to a block 3040, semantic processor 3010 waits for a packet to be received at input buffer 3014 through input port 3012.
If a packet has been received at input buffer 3014, input buffer 3014 sends a Port ID signal to parser 3018 to be pushed onto parser stack 3032 as an NT symbol at 3042. The Port ID signal alerts parser 3018 that a packet has arrived at input buffer 3014. In one embodiment, the Port ID signal is received by the input stream sequence control 3028 and transferred to FSM 3030, where it is pushed onto parser stack 3032. A 1-bit status flag, preceding or sent in parallel with the Port ID, may denote the Port ID as an NT symbol.
According to a next block 3044, parser 3018 receives N bytes of input stream data from input buffer 3014. This is done after determining that the symbol on the top of parser stack 3032 is not the bottom-of-stack symbol and that the DXP is not waiting for further input. Parser 3018 requests and receives the data through a DATA/CONTROL signal coupled between the input stream sequence control 3028 and input buffer 3014.
At 3046, the process determines whether the symbol on the parser stack 3032 is a terminal symbol or an NT symbol. This determination may be performed by FSM 3030 reading the status flag of the symbol on parser stack 3032. When the symbol is determined to be a terminal symbol at 3046, parser 3018 checks for a match between the T symbol and the next byte of data from the received N bytes at 3048. FSM 3030 may check for a match by comparing the next byte of data received by input stream sequence control 3028 to the T symbol on parser stack 3032. After the check is completed, FSM 3030 pops the T symbol off of the parser stack 3032, possibly by decrementing the stack pointer.
When a match is not made at 3046 or at 3048, the remainder of the current data segment may be assumed in some circumstances to be unparseable, as there was neither an NT symbol match nor a terminal symbol match. At 3050, parser 3018 resets parser stack 3032 and launches a SEP to remove the remainder of the current packet from the input buffer 3014. In one embodiment, FSM 3030 resets parser stack 3032 by popping off the remaining symbols, or preferably by setting the top-of-stack pointer to point to the bottom-of-stack symbol. Parser 3018 launches a SEP by sending a command to SPU 3016 through SPU interface 3034. This command may require SPU 3016 to load microinstructions from SCT 3024 that, when executed, enable SPU 3016 to remove the remainder of the unparseable data segment from the input buffer 3014. Execution then returns to block 3040.
It is noted that not every instance of unparseable input in the data stream may result in abandoning parsing of the current data segment. For instance, the parser may be configured to handle ordinary header options directly with grammar. Other, less common or difficult header options could be dealt with using a default grammar rule that passes the header options to a SPU for parsing.
Returning to 3046, if a match is made, execution returns to block 3044, where parser 3018 requests and receives additional input stream data from input buffer 3014. In one embodiment, parser 3018 would only request and receive one byte of input stream data after a T symbol match was made, to refill the DI buffer since one input symbol was consumed. At 3050, when the symbol is determined to be an NT symbol, parser 3018 sends the NT symbol from parser stack 3032 and the received N bytes DI[N] in input stream sequence control 3028 to parser table 3020, where parser table 3020 checks for a match as previously described. In the illustrated embodiment, parser table 3020 concatenates the NT symbol and the received N bytes. Optionally, the NT symbol and the received N bytes can be concatenated prior to being sent to parser table 3020. The received N bytes are concurrently sent to both SPU interface 3034 and parser table 3020, and the NT symbol is concurrently sent to both the parser table 3020 and the PRT 3022. After the check is completed, FSM 3030 pops the NT symbol off of the parser stack 3032, possibly by decrementing the stack pointer.
If a match is made at 3050, it is determined whether the symbol is a debug symbol at 3052. If it is a debug symbol at 3052, the process moves to a debug process as set out in Figure 22. If it is not a debug symbol at 3052, a production rule code match is determined at 3056. This provides a matching production rule from the production rule table 3022. Optionally, the PR code is sent from parser table 3020 to PRT 3022, through parser 3018.
If the NT symbol does not have a production rule code match at 3056, parser 3018 uses the received NT symbol to look up a default production rule in the PRT 3022 at 3058. In one embodiment, the default production rule is looked up in the MAPT 3228 memory located within PRT 3022. Optionally, MAPT 3228 memory can be located in a memory block other than PRT 3022. In one embodiment, the default production rule may be a debug rule that places the parser in debug mode in recognition of encountering a symbol that has no rule.
In one embodiment, when PRT 3022 receives a PR code, it only returns a PR to parser 3018 at 3060, corresponding either to a found production rule or a default production rule. Optionally, a PR and a default PR can both be returned to parser 3018 at 3060, with parser 3018 determining which will be used.
At 3062, parser 3018 processes the rule received from PRT 3022. The rule received by parser 3018 can either be a production rule or a default production rule. In one embodiment, FSM 3030 divides the rule into three segments: a symbol segment, a SEP segment, and a skip bytes segment. Each segment of the rule may be fixed length or null-terminated to enable easy and accurate division.
In the illustrated embodiment, FSM 3030 pushes T and/or NT symbols, contained in the symbol segment of the production rule, onto parser stack 3032. FSM 3030 sends the SEPs contained in the SEP segment of the production rule to SPU interface 3034. Each SEP contains an address to microinstructions located in SCT 3024. Upon receipt of the SEPs, SPU interface 3034 allocates SPU 3016 to fetch and execute the microinstructions pointed to by the SEP. SPU interface 3034 also sends the current DI[N] value to SPU 3016, as in many situations the task to be completed by the SPU will need no further input data. Optionally, SPU interface 3034 fetches the microinstructions to be executed by SPU 3016, and sends them to SPU 3016 concurrent with its allocation.
FSM 3030 sends the skip bytes segment of the production rule to input buffer 3014 through input stream sequence control 3028. Input buffer 3014 uses the skip bytes data to increment its buffer pointer, pointing to a location in the input stream. Each parsing cycle can accordingly consume any number of input symbols between 0 and 8. After parser 3018 processes the rule received from PRT 3022, it is determined at 3064 whether the next symbol on the parser stack 3032 is a bottom-of-stack symbol or whether the stack needs further parsing. At 3064, parser 3018 determines whether the input data in the selected buffer is in need of further parsing. In one embodiment, the input data in input buffer 3014 is in need of further parsing when the stack pointer for parser stack 3032 is pointing to a symbol other than the bottom-of-stack symbol. In some embodiments, FSM 3030 receives a stack empty signal SE when the stack pointer for parser stack 3032 is pointing to the bottom-of-stack symbol.
When the input data in the selected buffer does not need to be parsed further at 3064, typically determined by a particular NT symbol at the top of the parser stack, execution returns to block 3040. When the input data in the selected buffer needs to be parsed further, parser 3018 determines whether it can continue parsing the input data in the selected buffer at 3066. In one embodiment, parsing can halt on input data from a given buffer, while still in need of parsing, for a number of reasons, such as dependency on a pending or executing SPU operation, a lack of input data, other input buffers having priority over parsing, etc. Parser 3018 is alerted to SPU processing delays by SEP dispatcher 3036 through a Status signal, and is alerted to priority parsing tasks by status values stored in FSM 3030.
When parser 3018 can continue parsing in the current parsing context, execution returns to block 3044, where parser 3018 requests and receives up to N bytes of data from the input data within the selected buffer. When parser 3018 cannot continue parsing at 3066, parser 3018 saves the selected parser stack and subsequently de-selects the selected parser stack and the selected input buffer at 3068. Input stream sequence control 3028, after receiving a switch signal from FSM 3030, de-selects one input port within 3012 by selecting another port within 3012 that has received input data. The selected port within 3012 and the selected stack within the parser stack 3032 can remain active when there is not another port with new data waiting to be parsed.
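The context switch of block 3068 can be sketched in C. The port count, the per-port context record, and port_has_data are hypothetical; the point is that each input port keeps its own saved parser stack, so parsing can resume where it left off.

```c
#include <stdbool.h>

#define NUM_PORTS   4    /* illustrative port count */
#define STACK_DEPTH 64   /* illustrative stack depth */

/* Per-port parsing context: each port keeps its own parser stack, so a
 * suspended stream resumes exactly where parsing stopped. */
struct parser_context {
    unsigned stack[STACK_DEPTH];  /* saved parser stack image */
    unsigned top;                 /* saved top-of-stack index */
    bool     need_parsing;        /* buffered data still needs parsing */
};

static struct parser_context ctx[NUM_PORTS];
static int selected = 0;

extern bool port_has_data(int port);  /* hypothetical status probe */

/* Model of block 3068: suspend the current context, then select another
 * port that has data waiting. With no other ready port, stay put. */
void switch_parsing_context(void)
{
    ctx[selected].need_parsing = true;   /* current stack already saved */

    for (int i = 1; i < NUM_PORTS; i++) {
        int candidate = (selected + i) % NUM_PORTS;
        if (port_has_data(candidate)) {
            selected = candidate;        /* resume with its saved stack */
            break;
        }
    }
}
```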
Having seen the typical parsing operation, it is now possible to see how a NT symbol designating a debug operation may be useful. When parser 3018 encounters a debug NT symbol as shown at 3054 in Figure 21, the parser is placed in a debug state. It must be noted that a 'debug' symbol may be an explicit debug symbol or a previously unknown symbol in the data being parsed. Both of these will be referred to as a debug symbol. For example, any NT symbol for which there is not a match may place the parser in a debug state. In this last embodiment, the default production rule of Figure 21 is a debug rule. In either case, the parser is placed in a debug state upon encountering a symbol that is unanticipated or for which there is no rule. The default production rule for the unknown symbol becomes a debug production rule.
In Figure 22, the parser assumes a debug state at 3070. The debug state will trigger an error message, either after the parser assumes the debug state, or simultaneously. The error message may be an interrupt transmitted to the SPU dispatcher indicating that an error condition or interrupt has occurred and a SPU is needed to handle the situation. The dispatcher then launches an SPU to handle the error.
Handling the error may comprise gathering information related to the situation that caused the parser to assume the debug state. This information may include the last key used in looking up the symbol in the CAM, where the key may be the last NT symbol concatenated with the next N bytes of data, as discussed above. The information may also include the last production rule code retrieved prior to this symbol, the packet identifier of the current packet being processed, the status of the FSM, and the status of any error and interrupt registers used in the system. Further, the debug may cause the parser to save the contents of the parser stack for inspection or observation by an SPU. Once this information is gathered, it is stored, presented to a user, or transmitted back to a manufacturer. For example, if the present parser is operating in a laboratory, it may save an error log for a user to view later, or create an error message on a user display. This would allow programmers at the laboratory to determine what the parser encountered that caused it to enter the debug state, and to provide a rule for that situation in the PRT 3022, accessible via a test station to which the parser is attached, such as a computer workstation. Alternatively, the log could be generated by a device operating at a customer site, and the log accessed by a service person during maintenance. In yet another alternative, the log or error message may be transmitted from the customer site back to the manufacturer to allow the manufacturer to remedy the problem. In this manner, the ability of a manufacturer to identify and expand a grammar used in
parsing packets is enhanced. The debug state allows the system to gather data related to a situation that the parser encountered and could not parse. This data can be used to determine if there is a new or previously unknown header that requires a new production rule code to be added to the grammar.
FIG. 23 shows a block diagram of a semantic processor 4100 according to an embodiment of the invention. The semantic processor 4100 contains an input buffer 4140 for buffering a packet data stream (e.g., the input stream) received through the input port 4120, a direct execution parser (DXP) 4180 that controls the processing of packet data received at the input buffer 4140, a recirculation buffer 4160, a semantic processing unit (SPU) 4200 for
processing segments of the packets or for performing other operations, a memory subsystem 4240 for storing and/or augmenting segments of the packets, and an output buffer 4750 for buffering a data stream (e.g., the output stream) received from the SPU 4200.
The DXP 4180 maintains an internal parser stack (not shown) of terminal and non-terminal symbols, based on parsing of the current frame up to the current symbol. For instance, each symbol on the internal parser stack is capable of indicating to the DXP 4180 a parsing state for the current input frame or packet. When the symbol (or symbols) at the top of the parser stack is a terminal symbol, DXP 4180 compares data at the head of the input stream to the terminal symbol and expects a match in order to continue. When the symbol at the top of the parser stack is a non-terminal symbol, DXP 4180 uses the non-terminal symbol
and current input data to expand the grammar production on the stack. As parsing continues, DXP 4180 instructs SPU 4200 to process segments of the input stream or perform other operations. The DXP 4180 may parse the data in the input stream prior to receiving all of the data to be processed by the semantic processor 4100. For instance, when the data is packetized, the semantic processor 4100 may begin to parse through the headers of the packet before the entire packet is received at input port 4120.
Semantic processor 4100 uses at least three tables. Code segments for SPU 4200 are stored in semantic code table (SCT) 4150. Complex grammatical production rules are stored in a production rule table (PRT) 4190. Production rule codes for retrieving those production rules are stored in a parser table (PT) 4170. The production rule codes in parser table 4170 allow DXP 4180 to detect whether, for a given production rule, a code segment from SCT 4150 should be loaded and executed by SPU 4200.
Some embodiments of the invention contain many more elements than those shown in FIG. 23, but the elements shown are common to every system or software embodiment. Thus, a description of the packet flow within the semantic processor 4100 shown in FIG. 23 will be given before more complex embodiments are addressed.
FIG. 24 contains a flow chart 4300 for the processing of received packets through the semantic processor 4100 of FIG. 23. The flowchart 4300 is used for illustrating a method of the invention.
According to a block 4310, a packet is received at the input buffer 4140 through the input port 4120. According to a next block 4320, the DXP 4180 begins to parse through the header of the packet within the input buffer 4140. According to a decision block 4330, it is determined whether the DXP 4180 was able to completely parse through the header. In the case where the packet needs no additional manipulation or additional packets to enable the processing of the packet payload, the DXP 4180 will completely parse through the header. In the case where the packet needs additional manipulation or additional packets to enable the processing of the packet payload, the DXP 4180 will cease to parse the header.
If the DXP 4180 was able to completely parse through the header, then according to a next block 4370, the DXP 4180 calls a routine within the SPU 4200 to process the packet payload. The semantic processor 4100 then waits for a next packet to be received at the input buffer 4140 through the input port 4120.
If the DXP 4180 had to cease parsing the header, then according to a next block 4340, the DXP 4180 calls a routine within the SPU 4200 to manipulate the packet or wait for additional packets. Upon completion of the manipulation or the arrival of additional packets, the SPU 4200 creates an adjusted packet. According to a next block 4350, the SPU 4200 writes the adjusted packet (or a portion thereof) to the recirculation buffer 4160. This can be accomplished by either enabling the recirculation buffer 4160 with direct memory access to the memory subsystem 4240 or by having the SPU 4200 read the adjusted packet from the memory subsystem 4240 and then write the adjusted packet to the recirculation buffer 4160. Optionally, to save processing time within the SPU 4200, instead of the entire adjusted packet, a specialized header can be written to the recirculation buffer 4160. This specialized header directs the SPU 4200 to process the adjusted packet without having to transfer the entire packet out of memory subsystem 4240.
According to a next block 4360, the DXP 4180 begins to parse through the header of the data within the recirculation buffer 4160. Execution is then returned to block 4330, where it is determined whether the DXP 4180 was able to completely parse through the header. If the DXP 4180 was able to completely parse through the header, then according to a next block 4370, the DXP 4180 calls a routine within the SPU 4200 to process the packet payload and the semantic processor 4100 waits for a next packet to be received at the input buffer 4140 through the input port 4120. If the DXP 4180 had to cease parsing the header, execution returns to block 4340
where the DXP 4180 calls a routine within the SPU 4200 to manipulate the packet or wait for
additional packets, thus creating an adjusted packet. The SPU 4200 then writes the adjusted
packet to the recirculation buffer 4160, and the DXP 4180 begins to parse through the header of the packet within the recirculation buffer 4160.
FIG. 25 shows another semantic processor embodiment 4400. Semantic processor
4400 includes memory subsystem 4240, which comprises an array machine-context data
memory (AMCD) 4430 for accessing data in dynamic random access memory (DRAM) 4480
through a hashing function or content-addressable memory (CAM) lookup, a cryptography
block 4440 for encryption or decryption, and/or authentication of data, a context control
block (CCB) cache 4450 for caching context control blocks to and from DRAM 4480, a
general cache 4460 for caching data used in basic operations, and a streaming cache 4470 for
caching data streams as they are being written to and read from DRAM 4480. The context
control block cache 4450 is preferably a software-controlled cache, i.e., the SPU 4410
determines when a cache line is used and freed.
The SPU 4410 is coupled with AMCD 4430, cryptography block 4440, CCB cache
4450, general cache 4460, and streaming cache 4470. When signaled by the DXP 4180 to
process a segment of data in memory subsystem 4240 or received at input buffer 4140 (FIG.
23), the SPU 4410 loads microinstructions from semantic code table (SCT) 4150. The loaded
microinstructions are then executed in the SPU 4410 and the segment of the packet is
processed accordingly.
FIG. 26 contains a flow chart 4500 for the processing of received Internet Protocol
(IP)-fragmented packets through the semantic processor 4400 of FIG. 25. The flowchart 4500
is used for illustrating one method according to an embodiment of the invention.
Once a packet is received at the input buffer 4140 through the input port 4120 and the DXP 4180 begins to parse through the headers of the packet within the input buffer 4140, according to a block 4510, the DXP 4180 ceases parsing through the headers of the received packet because the packet is determined to be an IP-fragmented packet. Preferably, the DXP 4180 completely parses through the IP header, but ceases to parse through any headers belonging to subsequent layers, such as TCP, UDP, iSCSI, etc.
According to a next block 4520, the DXP 4180 signals to the SPU 4410 to load the appropriate microinstructions from the SCT 4150 and read the received packet from the input buffer 4140. According to a next block 4530, the SPU 4410 writes the received packet to DRAM 4480 through the streaming cache 4470. Although blocks 4520 and 4530 are shown as two separate steps, optionally, they can be performed as one step, with the SPU 4410 reading and writing the packet concurrently. This concurrent operation of reading and writing by the SPU 4410 is known as SPU pipelining, where the SPU 4410 acts as a conduit or pipeline for streaming data to be transferred between two blocks within the semantic processor 4400. According to a next decision block 4540, the SPU 4410 determines if a Context
Control Block (CCB) has been allocated for the collection and sequencing of the correct IP packet fragments. Preferably, the CCB for collecting and sequencing the fragments corresponding to an IP-fragmented packet is stored in DRAM 4480. The CCB contains pointers to the IP fragments in DRAM 4480, a bit mask for the IP-fragmented packets that have not arrived, and a timer value to force the semantic processor 4400 to cease waiting for additional IP-fragmented packets after an allotted period of time and to release the data stored in the CCB within DRAM 4480.
The SPU 4410 preferably determines if a CCB has been allocated by accessing the AMCD's 4430 content-addressable memory (CAM) lookup function using the IP source address of the received IP-fragmented packet combined with the identification and protocol from the header of the received IP packet fragment as a key. Optionally, the IP fragment keys are stored in a separate CCB table within DRAM 4480 and are accessed with the CAM by using the IP source address of the received IP-fragmented packet combined with the identification and protocol from the header of the received IP packet fragment. This optional addressing of the IP fragment keys avoids key overlap and sizing problems.
If the SPU 4410 determines that a CCB has not been allocated for the collection and sequencing of fragments for a particular IP-fragmented packet, execution then proceeds to a block 4550 where the SPU 4410 allocates a CCB. The SPU 4410 preferably enters a key corresponding to the allocated CCB, the key comprising the IP source address of the received IP fragment and the identification and protocol from the header of the received IP-fragmented packet, into an IP fragment CCB table within the AMCD 4430, and starts the timer located in the CCB. When the first fragment for a given fragmented packet is received, the IP header is also saved to the CCB for later recirculation. For further fragments, the IP header need not be saved. Once a CCB has been allocated for the collection and sequencing of the IP-fragmented packet, the SPU 4410 stores a pointer to the IP-fragmented packet (minus its IP header) in DRAM 4480 within the CCB, according to a next block 4560. The pointers for the fragments can be arranged in the CCB as, e.g., a linked list. Preferably, the SPU 4410 also updates the bit mask in the newly allocated CCB by marking the portion of the mask corresponding to the received fragment as received.
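One plausible shape for the fragment-reassembly CCB, in C. The key fields follow the CAM key described above (IP source address, identification, protocol); the bit-mask encoding and the fragment bound are illustrative assumptions, not taken from the specification.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_FRAGS 64   /* illustrative bound, not from the specification */

/* CAM key: source address + identification + protocol, per the text. */
struct frag_key {
    uint32_t ip_src;
    uint16_t ip_id;
    uint8_t  protocol;
};

/* Context control block for reassembling one fragmented IP packet. */
struct frag_ccb {
    struct frag_key key;
    uint64_t pending_mask;         /* one bit per fragment not yet seen */
    uint32_t timer;                /* releases the CCB after a timeout */
    uint32_t frag_ptr[MAX_FRAGS];  /* DRAM pointers to stored fragments */
};

/* Mark fragment idx as received by clearing its "not yet arrived" bit. */
static void mark_fragment_received(struct frag_ccb *ccb, unsigned idx)
{
    ccb->pending_mask &= ~(1ULL << idx);
}

/* Decision block 4570 reduces to testing whether the mask is empty. */
static bool all_fragments_received(const struct frag_ccb *ccb)
{
    return ccb->pending_mask == 0;
}
```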
According to a next decision block 4570, the SPU 4410 determines if all of the IP fragments from the packet have been received. Preferably, this determination is accomplished by using the bit mask in the CCB. A person of ordinary skill in the art can appreciate that there are multiple techniques readily available to implement the bit mask, or an equivalent tracking mechanism, for use with the invention. If all of the fragments have not been received for the IP-fragmented packet, then the semantic processor 4400 defers further processing on that fragmented packet until another fragment is received.
If all of the IP fragments have been received, according to a next block 4580, the SPU 4410 resets the timer, reads the IP fragments from DRAM 4480 in the correct order, and writes them to the recirculation buffer 4160 for additional parsing and processing. Preferably, the SPU 4410 writes only a specialized header and the first part of the reassembled IP packet (with the fragmentation bit unset) to the recirculation buffer 4160. The specialized header enables the DXP 4180 to direct the processing of the reassembled IP-fragmented packet stored in DRAM 4480 without having to transfer all of the IP-fragmented packets to the recirculation buffer 4160. The specialized header can consist of a designated non-terminal symbol that loads parser grammar for IP and a pointer to the CCB. The parser can then parse the IP header normally and proceed to parse higher-layer (e.g., TCP) headers. In an embodiment of the invention, DXP 4180 decides to parse the data received at either the recirculation buffer 4160 or the input buffer 4140 through round robin arbitration. A high-level description of round robin arbitration will now be discussed with reference to a first and a second buffer for receiving packet data streams. After completing the parsing of a packet within the first buffer, DXP 4180 looks to the second buffer to determine if data is available to be parsed. If so, the data from the second buffer is parsed. If not, then DXP 4180 looks back to the first buffer to determine if data is available to be parsed. DXP 4180 continues this round robin arbitration until data is available to be parsed in either the first buffer or second buffer.
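The two-buffer round robin arbitration can be modeled with a few lines of C. This busy-wait loop is only a software stand-in for the hardware arbiter; buffer_has_data and parse_packet_from are hypothetical helpers.

```c
#include <stdbool.h>

extern bool buffer_has_data(int buf);      /* 0 = first, 1 = second buffer */
extern void parse_packet_from(int buf);

/* Busy-wait model of the arbiter: parse from a buffer when it has data,
 * otherwise alternate to the other buffer and check again. */
void round_robin_arbitration(void)
{
    int current = 0;
    for (;;) {
        if (buffer_has_data(current))
            parse_packet_from(current);
        current ^= 1;   /* look to the other buffer next */
    }
}
```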
FIG. 27 contains a flow chart 4600 for the processing of received packets in need of decryption and/or authentication through the semantic processor 4400 of FIG. 25. The flowchart 4600 is used for illustrating another method according to an embodiment of the invention.
Once a packet is received at the input buffer 4140 or the recirculation buffer 4160 and the DXP 4180 begins to parse through the headers of the received packet, according to a block 4610, the DXP 4180 ceases parsing through the headers of the received packet because it is determined that the packet needs decryption and/or authentication. If DXP 4180 begins to parse through the packet headers from the recirculation buffer 4160, preferably, the recirculation buffer 4160 will only contain the aforementioned specialized header and the first part of the reassembled IP packet.
According to a next block 4620, the DXP 4180 signals to the SPU 4410 to load the appropriate microinstructions from the SCT 4150 and read the received packet from input buffer 4140 or recirculation buffer 4160. Preferably, SPU 4410 will read the packet fragments from DRAM 4480 instead of the recirculation buffer 4160 for data that has not already been placed in the recirculation buffer 4160.
According to a next block 4630, the SPU 4410 writes the received packet to cryptography block 4440, where the packet is authenticated, decrypted, or both. In a preferred embodiment, decryption and authentication are performed in parallel within cryptography block 4440. The cryptography block 4440 enables the authentication, encryption, or decryption of a packet through the use of Triple Data Encryption Standard (T-DES), Advanced Encryption Standard (AES), Message Digest 5 (MD-5), Secure Hash Algorithm 1 (SHA-1), Rivest Cipher 4 (RC-4) algorithms, etc. Although blocks 4620 and 4630 are shown as two separate steps, optionally, they can be performed as one step with the SPU 4410 reading and writing the packet concurrently.
The decrypted and/or authenticated packet is then written to SPU 4410 and, according to a next block 4640, the SPU 4410 writes the packet to the recirculation buffer 4160 for further processing. In a preferred embodiment, the cryptography block 4440 contains a direct memory access engine that can read data from and write data to DRAM 4480. By writing the
decrypted and/or authenticated packet back to DRAM 4480, SPU 4410 can then read just the
headers of the decrypted and/or authenticated packet from DRAM 4480 and subsequently
write them to the recirculation buffer 4160. Since the payload of the packet remains in
DRAM 4480, semantic processor 4400 saves processing time. Like with IP fragmentation, a
specialized header can be written to the recirculation buffer to orient the parser and pass CCB
information back to SPU 4410.
Multiple passes through the recirculation buffer 4160 may be necessary when IP
fragmentation and encryption/authentication are contained in a single packet received by the
semantic processor 4400.
FIG. 28 shows yet another semantic processor embodiment. Semantic processor 4700
contains a semantic processing unit (SPU) cluster 4710 containing a plurality of semantic
processing units 4410-1, 4410-2, ..., 4410-n. Preferably, each of the SPUs 4410-1 to 4410-n is
identical and has the same functionality. The SPU cluster 4710 is coupled to the memory
subsystem 4240, a SPU entry point (SEP) dispatcher 4720, the SCT 4150, port input buffer
(PIB) 4730, packet output buffer (POB) 4750, and a machine central processing unit (MCPU)
4771.
When DXP 4180 determines that a SPU task is to be launched at a specific point in
parsing, DXP 4180 signals SEP dispatcher 4720 to load microinstructions from SCT 4150
and allocate a SPU from the plurality of SPUs 4410-1 to 4410-n within the SPU cluster 4710
to perform the task. The loaded microinstructions and task to be performed are then sent to
the allocated SPU. The allocated SPU then executes the microinstructions and the data packet is processed accordingly. The SPU can optionally load microinstructions from the
SCT 4150 directly when instructed by the SEP dispatcher 4720.
The MCPU 4771 is coupled with the SPU cluster 4710 and memory subsystem 4240. The MCPU 4771 may perform any desired function for semantic processor 4700 that can be
reasonably accomplished with traditional software running on standard hardware. These
functions are usually infrequent, non-time-critical functions that do not warrant inclusion in
SCT 4150 due to complexity. Preferably, the MCPU 4771 also has the capability to
communicate with the dispatcher in SPU cluster 4710 in order to request that a SPU perform
tasks on the MCPU's behalf.
In an embodiment of the invention, the memory subsystem 4240 further comprises a
DRAM interface 4790 that couples the cryptography block 4440, context control block cache
4450, general cache 4460, and streaming cache 4470 to DRAM 4480 and external DRAM
4791. In this embodiment, the AMCD 4430 connects directly to an external TCAM 4793,
which, in turn, is coupled to an external Static Random Access Memory (SRAM) 4795.
The PIB 4730 contains at least one network interface input buffer, a recirculation
buffer, and a Peripheral Component Interconnect (PCI-X) input buffer. The POB 4750
contains at least one network interface output buffer and a Peripheral Component
Interconnect (PCI-X) output buffer. The port block 4740 contains one or more ports, each
comprising a physical interface, e.g., an optical, electrical, or radio frequency driver/receiver
pair for an Ethernet, Fibre Channel, 802.11x, Universal Serial Bus, Firewire, or other
physical layer interface. Preferably, the number of ports within port block 4740 corresponds
to the number of network interface input buffers within the PIB 4730 and the number of
output buffers within the POB 4750.
The PCI-X interface 4760 is coupled to a PCI-X input buffer within the PIB 4730, a
PCI-X output buffer within the POB 4750, and an external PCI bus 4780. The PCI bus 4780 can connect to other PCI-capable components, such as disk drives, interfaces for additional
network ports, etc.
FIG. 29 shows one embodiment of the POB 4750 in more detail. The POB 4750 comprises two FIFO controllers and two buffers implemented in RAM. For each FIFO controller, the POB 4750 includes a packer which comprises an address decoder. The output of the POB 4750 is coupled to an egress state machine which then connects to an interface.
As shown in FIG. 30, each buffer is 69 bits wide. The lower 64 bits of the buffer hold data, followed by three bits of encoded information to indicate how many bytes in that location are valid. Then two bits on the end are used to provide additional information, such as: a 0 indicates data; a 1 indicates end of packet (EOP); a 2 indicates Cyclic Redundancy Check (CRC); and 3 is reserved.
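The 69-bit buffer word can be modeled as a C structure. Note one stated assumption: three bits cannot count 0 through 8 valid bytes, so this sketch encodes a full 8-byte word as 0; the actual encoding is not given in the text.

```c
#include <stdint.h>

/* Tags carried in the top two bits of each 69-bit buffer word. */
enum word_tag { TAG_DATA = 0, TAG_EOP = 1, TAG_CRC = 2 /* 3 reserved */ };

/* A 69-bit word has no native C type, so the model splits the fields:
 * 64 data bits, a 3-bit valid-byte count, and a 2-bit tag. */
struct pob_word {
    uint64_t data;        /* lower 64 bits */
    unsigned valid : 3;   /* how many of the 8 bytes are valid */
    unsigned tag   : 2;   /* data / EOP / CRC / reserved */
};

struct pob_word make_word(uint64_t data, unsigned valid_bytes,
                          enum word_tag tag)
{
    struct pob_word w;
    w.data  = data;
    /* Assumption: a full 8-byte word is encoded as 0, since three bits
     * cannot represent all of 0..8; the text does not give the encoding. */
    w.valid = valid_bytes & 0x7;
    w.tag   = (unsigned)tag;
    return w;
}
```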
The buffer holds 8 bytes of data. However, the packets of data sent to the buffer may be formed in "scatter-gather" format. That is, the header of the packet can be in one location in memory while the rest of the packet can be in another location. Thus, when the SPU writes to the POB 4750, the SPU may, for example, first write 3 bytes of data and then write another 3 bytes of data. To avoid having to write partial bytes into the RAM, the POB 4750 includes a packer for holding bytes of data in a holding register until enough bytes are accumulated to send to the buffer.
Referring back to FIG. 29, the SPUs in the SPU cluster 4710 access the POB 4750 via the address bus and the data bus. To determine how many of the bytes of data sent from the SPU are valid, the packer in the POB 4750 decodes the lower 3 bits of the address, i.e., bits [2:0] of the address. In one embodiment, the address decoding scheme implemented may be as shown in Table 1 below.
Table 1. Address decoding scheme (the table body appears only as images in the source).
When the packer has decoded the address, the packer then determines whether it has enough data to commit to the RAM. If the packer determines that there is not enough data, the packer sends the data into the holding register. When enough bytes have been accumulated in the holding register, the data is pushed into the FIFO controller and sent to the RAM. In some cases, the SPU in the SPU cluster 4710 may write an EOP into the packer. Here, the packer sends all of the data to the RAM. In one embodiment, the packer may be implemented using flip-flop registers.
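The packer's accumulate-and-commit behavior can be sketched in C. This model is illustrative only: fifo_push stands in for the FIFO controller, and the EOP flush mirrors the "packer sends all of the data to the RAM" case above.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the FIFO controller feeding the RAM. */
extern void fifo_push(const uint8_t bytes[8], unsigned nvalid, int eop);

static uint8_t  hold[8];  /* the holding register */
static unsigned held;     /* bytes currently accumulated */

/* Accumulate a partial write; commit full 8-byte words, flush on EOP. */
void packer_write(const uint8_t *bytes, unsigned n, int eop)
{
    while (n > 0) {
        unsigned take = 8 - held;
        if (take > n)
            take = n;
        memcpy(&hold[held], bytes, take);
        held  += take;
        bytes += take;
        n     -= take;
        if (held == 8) {          /* enough data: commit to the FIFO/RAM */
            fifo_push(hold, 8, 0);
            held = 0;
        }
    }
    if (eop) {                    /* EOP sends all remaining data */
        fifo_push(hold, held, 1);
        held = 0;
    }
}
```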
The POB 4750 further comprises an egress state machine. The egress state machine tracks the states of each FIFO; the state machine senses that a FIFO has data and unloads the FIFO to the interface. The state machine then alternates to the other FIFO and unloads that FIFO to the interface. If both FIFOs are empty, the state machine will assume that the first FIFO has data and then alternate between the FIFOs, unloading them to the interface. Thus, data in the packer is sent out in the order it was written into the packer.
The POB 4750 includes a CRC engine to detect error conditions in the buffered data. Error conditions which may be encountered include underruns and invalid EOP. In an underrun condition, the SPU cannot feed data quickly enough into the POB 4750 and there are not enough packets to process. With an invalid EOP error, an EOP is written into the packer while there is no packet in flight. These two conditions will flag an error which shuts off the POB 4750, thereby preventing the SPUs from accessing the buffers.
to indicate when to start sending out the packets to the buffer. For example, underruns can be
avoided altogether if the threshold is set to be the end of packet. In this case, packets will not
be sent until the end of packet is sent and underruns will not occur. However, performance
will not be optimal at this threshold.
Each SPU in the SPU cluster can access the POB 4750. However, to prevent
corruption of packets sent to the POB 4750, only one SPU can write into the FIFO. In one
embodiment, a token mechanism, such as flags maintained in external memory, may be used
to indicate which SPU can access the POB 4750. Another SPU cannot access the buffer until
released by the first SPU.
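The token mechanism can be modeled with a single atomic flag in C. The actual design keeps such flags in external memory; this sketch only illustrates the acquire/release discipline that serializes SPU access to the POB.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag guards the POB; a SPU must hold the token before writing. */
static atomic_flag pob_token = ATOMIC_FLAG_INIT;

/* Returns true if the calling SPU now owns the POB. */
bool pob_try_acquire(void)
{
    return !atomic_flag_test_and_set(&pob_token);
}

/* The owning SPU releases the POB so another SPU may access it. */
void pob_release(void)
{
    atomic_flag_clear(&pob_token);
}
```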
One of ordinary skill in the art will recognize that the concepts taught herein can be
tailored to a particular application in many other advantageous ways. Although one set of
cooperating parser/input buffer hardware and grammar functionality has been described,
others are possible. Also, not every multiple-context parsing approach will need hardware
counters, as some parsing problems may contain only context-switching points that can be
derived from the input data itself.
Those skilled in the art recognize that other functional partitions are possible within
the scope of the invention. Further, what functions are and are not implemented on a
common integrated circuit can vary depending on application.
Finally, although the specification may refer to "an", "one", "another", or "some"
embodiment(s) in several locations, this does not necessarily mean that each such reference is
to the same embodiment(s), or that the feature only applies to a single embodiment.

Claims

What is claimed is:
1. An apparatus comprising: a parser configured to parse a data stream according to a plurality of parsing-contexts, wherein the parser switches parsing-contexts responsive to the semantics of the data stream.
2. The apparatus according to claim 1 wherein the parser parses a first segment of the data stream according to a first parsing-context and a second segment of the data stream according to a second parsing-context.
3. The apparatus according to claim 2 wherein the parser includes a data interface
configured to receive segments of the data stream and to count a number of received
segments, wherein the parser switches parsing-contexts responsive to the counted number.
4. The apparatus according to claim 1 wherein the parser switches parsing contexts
according to switching symbols identified while parsing the data stream, each switching
symbol indicates to the parser a next parsing-context for parsing the data stream.
5. A system comprising:
a proxy configured to manage a plurality of sessions including at least one
transmission control protocol (TCP) session, wherein the proxy translates data between the
transmission control protocol session and a local session.
6. The system of claim 5 wherein the proxy operates:
a device-interface proxy configured to manage the local session with a networking device; and a network-interface proxy configured to manage the transmission control protocol session over a public network, wherein the network-interface proxy depacketizes and sequences the data received over the transmission control protocol session prior to providing it to the device-interface proxy.
7. The system of claim 5 wherein the proxy includes at least one semantic processor having a direct execution parser to identify the transmission control protocol sessions according to a stored grammar.
8. The system of claim 7 wherein the semantic processor implements a transmission control protocol state machine to manage the transmission control protocol sessions by switching contexts within the stored grammar.
9. The system of claim 5 including a plurality of networking devices each configured to process data from the network, wherein the proxy is coupled between the network and the network devices and configured to manage at least one local session with each networking device and to translate data between each local session and the corresponding transmission control protocol session.
10. A device, comprising: at least one input port to allow the device to receive data;
a parser to:
parse the data in response to symbols in a parser stack;
determine when a symbol is a debug non-terminal symbol; and notify the device of an interrupt.
11. The device of claim 10, the device further comprising an array of semantic processing units.
12. The device of claim 11, the device further comprising a dispatcher to receive the notification of the interrupt and to dispatch the interrupt to one of the array of semantic processing units to handle the interrupt.
13. The device of claim 12, the semantic processing unit further to gather data related to the interrupt and store it.
14. The device of claim 13, the semantic processing unit to gather data further comprising a semantic processing unit to determine at least one of the group consisting of: the last data, a packet identifier from which the data was accessed, a last state in the parser finite state machine, and last contents of the parser stack.
15. A processor, comprising: a direct execution parser configured to control the processing of digital data by semantically parsing data; a plurality of semantic processing units configured to perform data operations when prompted by the direct execution parser; and a plurality of output buffers for buffering data received from the plurality of semantic processing units.
16. The processor of claim 15, wherein each of the plurality of output buffers is configured for access by only one of the plurality of semantic processing units at any given time.
17. The processor of claim 15, further comprising a token mechanism for indicating which semantic processing unit can access the plurality of output buffers.
18. The processor of claim 15, wherein the plurality of output buffers send data received from the plurality of semantic processing units to a network interface port.
19. The processor of claim 15, wherein the plurality of output buffers send data received from the plurality of semantic processing units to a peripheral component.
PCT/US2005/027803 2004-08-05 2005-08-05 Data context switching in a semantic processor WO2006017689A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007525009A JP2008509484A (en) 2004-08-05 2005-08-05 Data context switching in the semantic processor

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US59983004P 2004-08-05 2004-08-05
US60/599,830 2004-08-05
US11/181,528 2005-07-14
US11/181,528 US20070027991A1 (en) 2005-07-14 2005-07-14 TCP isolation with semantic processor TCP state machine
US11/185,223 2005-07-19
US11/185,223 US20070043871A1 (en) 2005-07-19 2005-07-19 Debug non-terminal symbol for parser error handling
US11/186,144 US20070019661A1 (en) 2005-07-20 2005-07-20 Packet output buffer for semantic processor
US11/186,144 2005-07-20

Publications (3)

Publication Number Publication Date
WO2006017689A2 true WO2006017689A2 (en) 2006-02-16
WO2006017689A9 WO2006017689A9 (en) 2006-03-30
WO2006017689A3 WO2006017689A3 (en) 2006-06-22

Family

ID=35839921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/027803 WO2006017689A2 (en) 2004-08-05 2005-08-05 Data context switching in a semantic processor

Country Status (2)

Country Link
JP (1) JP2008509484A (en)
WO (1) WO2006017689A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104874A (en) * 2020-08-26 2020-12-18 西安万像电子科技有限公司 Data transmission method and system
WO2021044191A1 (en) * 2019-09-04 2021-03-11 Telefonaktiebolaget Lm Ericsson (Publ) Method for debugging the parser in programmable routers

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10285206A (en) * 1997-04-03 1998-10-23 Matsushita Electric Ind Co Ltd Reverse packet converting device
JP2000196672A (en) * 1998-12-28 2000-07-14 Toshiba Corp Inter-network repeater
US6763499B1 (en) * 1999-07-26 2004-07-13 Microsoft Corporation Methods and apparatus for parsing extensible markup language (XML) data streams
US6549916B1 (en) * 1999-08-05 2003-04-15 Oracle Corporation Event notification system tied to a file system
US20020083331A1 (en) * 2000-12-21 2002-06-27 802 Systems, Inc. Methods and systems using PLD-based network communication protocols
US7130987B2 (en) * 2003-01-24 2006-10-31 Mistletoe Technologies, Inc. Reconfigurable semantic processor


Also Published As

Publication number Publication date
JP2008509484A (en) 2008-03-27
WO2006017689A9 (en) 2006-03-30
WO2006017689A3 (en) 2006-06-22

Similar Documents

Publication Publication Date Title
US7290134B2 (en) Encapsulation mechanism for packet processing
US9077560B2 (en) Adaptation scheme for communications traffic
EP1427133B1 (en) System, method and device for security processing of data packets
US7647472B2 (en) High speed and high throughput digital communications processor with efficient cooperation between programmable processing components
US6571291B1 (en) Apparatus and method for validating and updating an IP checksum in a network switching system
JP3682082B2 (en) Apparatus and method for packet processing in packet switching network and frame processing system for frame relay network
CN100594698C (en) Packet Aggregation Protocol for Advanced Switching
JP4338728B2 (en) Method and apparatus for exchanging ATM, TDM and packet data via a single communication switch
US20060262808A1 (en) Methods and Systems for Fragmentation and Reassembly for IP Tunnels in Hardware Pipelines
KR20090031059A (en) Packet processing apparatus and method
US20060174058A1 (en) Recirculation buffer for semantic processor
KR20010043460A (en) Digital communications processor
US8009673B2 (en) Method and device for processing frames
WO2009081128A1 (en) Adaptation scheme for communications traffic
EP1267543A2 (en) Programmable protocol processing engine for network packet devices
JP2002280991A (en) Data mapper and data mapping method
EP1339183B1 (en) Method and device for transporting ethernet frames over a transport SDH/SONET network
US20020101865A1 (en) Data transmission method and transmission apparatus using the same
US20040170166A1 (en) Compression methods for packetized sonet/sdh payloads
WO2006017689A2 (en) Data context switching in a semantic processor
US20060031555A1 (en) Data context switching in a semantic processor
US20070027991A1 (en) TCP isolation with semantic processor TCP state machine
JP2021533656A (en) How to receive a code blockstream, how to send a codeblock stream, and communication equipment
WO2024134971A1 (en) Method for transmitting ethernet mac frame
Tzeng et al. A Flexible Cross Connect LCAS for Bandwidth Maximization in 2.5 G EoS

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/26-26/26, DRAWINGS, REPLACED BY NEW PAGES 1/25-25/25; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2007525009

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 280607)

122 Ep: pct application non-entry in european phase

Ref document number: 05778468

Country of ref document: EP

Kind code of ref document: A2
