US20120057640A1 - Video Analytics for Security Systems and Methods

Info

Publication number: US20120057640A1
Application number: US13/225,238
Authority: US
Inventors: Fang Shi; Changsong Qi; Jin Ming; Keqiang Dai
Original assignee: Intersil Americas LLC
Current assignee: Intersil Americas LLC
Priority date: 2010-09-02
Filing date: 2011-09-02
Publication date: 2012-03-08
Also published as: US20120057633A1; US20120057629A1; US20140369417A1; US8824554B2; US20120057634A1; US9609348B2

Abstract

Video processing, encoding and decoding systems are described. A processor receives video frames representative of a sequence of images captured by a video sensor and the video frames are encoded according to a desired video encoding standard. A video analytics processor receives video analytics metadata generated by the video encoder from the sequence of images and produces video analytics messages for transmission to a client device which performs client side video analytics processing. The video analytics metadata may comprise pixel domain video analytics information directly from an analog-to-digital front end or directly from an encoding engine as the engine is performing compression.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from PCT/CN2010/076555 (title: “Video Analytics for Security Systems and Methods”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, from PCT/CN2010/076569 (title: “Video Classification Systems and Methods”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, from PCT/CN2010/076564 (title: “Rho-Domain Metrics”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, and from PCT/CN2010/076567 (title: “Systems And Methods for Video Content Analysis”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, each of these applications being hereby incorporated herein by reference. The present Application is also related to concurrently filed U.S. Patent non-provisional applications entitled “Video Classification Systems and Methods” (attorney docket no. 043497-0393274), “Rho-Domain Metrics” (attorney docket no. 043497-0393276) and “Systems And Methods for Video Content Analysis” (attorney docket no. 043497-0393278), which are expressly incorporated by reference herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic illustrating a simplified example of a video security surveillance analytics architecture according to certain aspects of the invention.
FIG. 2 is a block schematic depicting an example of a video analytics engine according to certain aspects of the invention.
FIG. 3 depicts an example of H.264 standards-defined bitstream syntax.
FIG. 4A is an image that includes both foreground and background objects.
FIG. 4B is the image of 4A from which foreground objects have been extracted using techniques according to certain aspects of the invention.
FIGS. 5A and 5B are images illustrating virtual line counting according to certain aspects of the invention.
FIG. 6 is a simplified block schematic illustrating a processing system employed in certain embodiments of the invention.

DETAILED DESCRIPTION

Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the disclosed embodiments will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosed embodiments. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, certain embodiments of the present invention encompass present and future known equivalents to the components referred to herein by way of illustration.
Certain embodiments of the invention comprise systems having an architecture that is operable to perform video analytics for security applications. Video analytics may also be referred to as video content analysis. In a video security surveillance analytics architecture where the server encodes captured video images, certain embodiments provide greatly improved video analytics efficiency for client side processing applications and systems. By improving and/or optimizing client side video analytics efficiency, client-side performance can be greatly improved, consequently enabling processing of an increased number of video channels. Moreover, video analytics metadata (“VAMD”) created on the server side according to certain aspects of the invention can enable high accuracy video analytics on the server side and for the video security surveillance system as a whole. According to certain aspects of the invention, the advantages of a layered video analytics system architecture can include facilitating and/or enabling a balanced partition of video analytics at multiple layers. These layers may include server and client layers, pixel domain layers and motion domain layers. For example, global analytics defined to include information related to background frame, segmented object descriptors and camera parameters can enable cost efficient yet complex video analytics on the receiver side for many advanced video intelligence applications and can enable an otherwise difficult or impossible level of video analytics efficiency in terms of computational complexity and analytic accuracy.
A simplified example of a video security surveillance analytics architecture is shown in FIG. 1. In the example, the system is partitioned into server side 10 and client side 12 elements. The terms server and client are used here to include hardware and software systems, apparatus and other components that perform types of functions that can be attributed to server side 10 and client side 12 operations. It will be appreciated that certain elements may be provided on either or both server side 10 and client side 12, and that at least some client and server functionality may be committed to hardware components such as application specific integrated circuits, sequencers, custom logic devices as needed, typically to improve one or more of efficiency, reliability, processing speed and security. Server side 10 components may be embodied in a security surveillance or other camera.
On server side 10, a video sensor 100 can be configured to capture information representative of a sequence of images, including video data, and to pass the information to a video encoder module 102 adapted for use in embodiments of the invention. One example of such video encoder module 102 is the TW5864 from Intersil Techwell Inc., which can be adapted and/or configured to generate VAMD 103 related to video bitstream 105. In certain embodiments, video encoder 102 can be configured to generate one or more compressed video bitstreams 105 that comply with industry standards and/or that are generated according to a proprietary specification. The video encoder 102 is typically configurable to produce VAMD 103 that can comprise pixel domain video analytics information, such as information obtained directly from an analog-to-digital (“A/D”) front end (e.g. at the video sensor 100) and/or from an encoding engine 102 as the encoding engine 102 is performing video compression to obtain video bitstream 105. VAMD 103 may comprise block-based video analytics information including, for example, macroblock (“MB”) level information such as motion vector, MB-type and/or number of non-zero coefficients, etc. A MB typically comprises a 16×16 pixel block.
In certain embodiments, VAMD 123 can comprise any video encoding intermediate data such as MB-type, motion vectors, non-zero coefficient count (as per the H.264 standard), quantization parameter, DC or AC information, the motion estimation metric sum of absolute differences (“SAD”), etc. VAMD 123 can also comprise useful information such as motionFlag information generated in an analog to digital front end module, such module being found, for example, in the TW5864 device referenced above. VAMD is typically processed in video analytics engine (“VAE”) 104 to generate more advanced video intelligence information that may include, for example, motion indexing, background extraction, object segmentation, motion detection, virtual line detection, object counting, motion tracking and speed estimation.
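The per-macroblock metadata listed above can be pictured as a small record. The following is a minimal sketch only: the class name MacroblockVAMD, the field names, the MB-type labels and the threshold are illustrative assumptions and do not reflect the TW5864 interface or any format defined in this specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MacroblockVAMD:
    """Hypothetical per-macroblock metadata record (names are assumptions)."""
    mb_x: int                        # macroblock column index
    mb_y: int                        # macroblock row index
    mb_type: str                     # illustrative labels such as "I", "P", "SKIP"
    motion_vector: Tuple[int, int]   # (dx, dy) chosen by motion estimation
    nonzero_coeffs: int              # number of non-zero coefficients
    qp: int                          # quantization parameter
    sad: int                         # motion estimation matching cost
    motion_flag: bool = False        # flag from the analog-to-digital front end


def moving_macroblocks(frame_vamd: List[MacroblockVAMD],
                       mv_threshold: int = 1) -> List[MacroblockVAMD]:
    """Select macroblocks that look like motion (assumed rule, for illustration)."""
    moving = []
    for mb in frame_vamd:
        dx, dy = mb.motion_vector
        if mb.motion_flag or abs(dx) + abs(dy) >= mv_threshold:
            moving.append(mb)
    return moving
```

A record of this kind is what a video analytics engine would consume in place of raw pixels; the selection rule shown is only one simple possibility.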
Video analytics engine 104 can be configured to receive the VAMD 103 from the encoder 102 and to process the VAMD 103 using one or more video analytics algorithms based on application requirements. Video analytics engine 104 can generate useful video analytics results, such as background model, motion alarm, virtual line detections, electronic image stabilization parameters, etc. A more detailed example of a video analytics engine 104 is shown in FIG. 2. Video analytics results can comprise video analytics messages (“VAM”) that may be categorized into a global VAM class and a local VAM class. Global VAM includes video analytics messages applicable to a group of pictures, such as background frames, foreground object segmentation descriptors, camera parameters, coordinates and indices of predefined motion alarm regions, virtual lines, etc. Local VAM can be defined as localized VAM applied to a specific individual video frame, and can include global motion vectors of a current frame, motion alarm region alarm status of the current frame, virtual line counting results, object tracking parameters, camera moving parameters, and so on.
In certain embodiments, an encoder generated video bitstream 105, VAMD 103 and VAM generated by video analytics engine 104 are packed together as a layered structure into a network bitstream 106 following a predefined packaging format. The network bitstream 106 can be sent through a network to the client side of the system. The network bitstream 106 may be stored locally, on a server and/or on a remote storage device for future playback and/or dissemination.
FIG. 3 depicts an example of an H.264 standards-defined bitstream syntax, in which VAM and VAMD 103 can be packed into a supplemental enhancement information (“SEI”) network abstraction layer package unit. Following SPS, PPS and IDR network abstraction layer units, a global video analytics (“GVA”) SEI network abstraction layer unit can be inserted into network bitstream 106. The GVA network abstraction layer unit may include the global video analytics messages for a corresponding group of pictures, a pointer to the first local video analytics SEI network abstraction layer location within the group of pictures, and a pointer to the next GVA network abstraction layer unit, and may include an indication of the duration of frames over which the GVA is applicable. Following each individual frame which is associated with VAM or VAMD elements, a local video analytics (“LVA”) SEI network abstraction layer unit is inserted right after the frame's payload network abstraction layer unit. The LVA can comprise local VAM, VAMD information and a pointer to a location of the next frame which has an LVA SEI network abstraction layer unit. The amount of VAMD packed into an LVA network abstraction layer unit depends on the network bandwidth condition and the complexity of the user's video analytics requirements. For example, if sufficient network bandwidth is available, additional VAMD can be packed. The VAMD can be used by client side video analytics systems and may simplify and/or optimize performance of certain functions. When network bandwidth is limited, less VAMD may be sent to meet the network bandwidth constraints. While FIG. 3 illustrates a bitstream format for H.264 standards, the principles involved may be applied in other video standards and formats.
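The ordering of units described above can be illustrated with a short sketch. It is not an H.264 serializer: unit types are plain string labels, payloads are arbitrary Python objects, the pointers between SEI units are omitted, and the function name pack_group_of_pictures, the dictionary keys and the vamd_budget parameter (standing in for the bandwidth-dependent amount of VAMD) are all assumptions made for illustration.

```python
def pack_group_of_pictures(frames, global_vam, vamd_budget):
    """Return an ordered list of (unit_type, content) tuples for one group of pictures.

    frames: list of dicts with key 'payload' and optional keys
            'local_vam' and 'vamd' (a list of metadata items).
    vamd_budget: maximum number of VAMD items packed per LVA unit,
                 an assumed stand-in for the bandwidth condition.
    """
    if not frames:
        return []
    # SPS, PPS and IDR units come first, then the global analytics SEI unit.
    units = [("SPS", None), ("PPS", None),
             ("IDR", frames[0]["payload"]), ("GVA_SEI", global_vam)]
    for frame in frames[1:]:
        units.append(("SLICE", frame["payload"]))
        local_vam = frame.get("local_vam")
        vamd = frame.get("vamd", [])[:vamd_budget]  # send less VAMD when constrained
        # An LVA unit follows only frames that carry VAM or VAMD.
        if local_vam is not None or vamd:
            units.append(("LVA_SEI", {"vam": local_vam, "vamd": vamd}))
    return units
```

With a larger vamd_budget more metadata accompanies each frame; with a budget of zero only frames that have local VAM receive an LVA unit.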
In certain embodiments of the invention, a client side system 12 receives and decodes the network bitstream 106 sent from a server side system 10. The advantages of a layered video analytics system architecture, which can include facilitating and/or enabling a balanced partition of video analytics at multiple layers, become apparent at the client side 12. Layers can include server and client layers, pixel domain layers and motion domain layers. Global video analytics messages such as background frame, segmented object descriptors and camera parameters can enable cost efficient yet complex video analytics on the receiver side for many advanced video intelligence applications. The VAM enables an otherwise difficult or impossible level of video analytics efficiency in terms of computational complexity and analytic accuracy.
In certain embodiments of the invention, the client side system 12 separates the compressed video bitstream 125, the VAMD 123 and the VAM from the network bitstream 106. The video bitstream can be decoded using decoder 124 and provided with VAMD 123 and associated VAM to client application 122. Client application 122 typically employs video analytics techniques appropriate for the application at hand. For example, analytics may include background extraction, motion tracking, object detection, and other functions. Known analytics can be selected and adapted to use the VAMD 103 and VAM that were derived from the encoder 102 and video analytics engine 104 at the server side 10 to obtain richer and more accurate results 120. Adaptations of the analytics may be based on speed requirements, efficiency, and the enhanced information available through the VAM and VAMD 123.
Certain advantages may be accrued from video analytics system architecture and layered video analytics information embedded in network bitstreams according to certain aspects of the invention. For example, greatly improved video analytics efficiency can be obtained on the client side 12. In one example, video analytics engine 104 receives and processes encoder feedback VAMD to produce the video analytics information that may be embedded in the network bitstream 106. The use of embedded layered VAM provides users direct access to a video analytics message of interest, and permits use of VAM with limited or no additional processing. In one example, additional processing would be unnecessary to access the motion frame, number of objects passing a virtual line, object moving speed and classification, etc. In certain embodiments, information related to object tracking may be generated using additional, albeit limited, processing related to the motion of the identified object. Information related to electronic image stabilization may be obtained by additional processing based on the global motion information provided in VAM. Accordingly, in certain embodiments, client side 12 video analytics efficiency can be optimized and performance can be greatly improved, consequently enabling processing of an increased number of channels.
Certain embodiments enable operation of high-accuracy video analytics applications on the client side 12. According to certain aspects of the invention, client side 12 video analytics may be performed using information generated on the server side 10. Without VAM embedded in the network bitstream 106, client side video analytics processing would have to rely on video reconstructed from the decoded video bitstream 125. Decoded bitstream 125 typically lacks some of the detailed information of the original video content (e.g. content provided by video sensor 100), which may be discarded or lost in the video compression process. Consequently, video analytics performed solely on the client side 12 cannot generally preserve the accuracy that can be obtained if the processing was performed at the server side 10, or at the client side 12 using VAMD 123 derived from original video content on the server side 10. Loss of accuracy due to analytics processing that is limited to client side 12 can exhibit problems with geometric center of an object, object segmentation, etc. Therefore, embedded VAM can enable improved system-level accuracy.
Certain embodiments of the invention enable fast video indexing, searching and other applications. In particular, embedded, layered VAM in the network bitstream enables fast video indexing, video searching, video classification applications and other applications in the client side. For instance, motion detection information, object indexing, foreground and background partition, human detection, human behavior classification information of the VAM can simplify client-side and/or downstream tasks that include, for example, video indexing, classification and fast searching in the client. Without VAM, a client generally needs vast computational power to process the video data and to rebuild the required video analytics information for a variety of applications including the above-listed applications. It will be appreciated that not all VAM can be accurately reconstructed at the client side 12 using video bitstream 125 and it is possible that certain applications, such as human behavioral analysis applications, cannot even be performed if VAM created at server side 10 is not available.
Certain embodiments of the invention permit the use of more complex server/client algorithms, partitioning of computational capability and balancing of network bandwidth. In certain embodiments, the video analytics system architecture allows video analytics to be partitioned between server and client sides based on network bandwidth availability, server and client computational capability and the complexity of the video analytics. In one example, in response to low network bandwidth conditions, the system can embed more condensed VAM in the network bitstream 106 after processing by the VAE 104. The VAM can include motion frame index, object index, and so on. After extracting the VAM from the bitstream, the client side 12 system can utilize the VAM to assist further video analytics processing. More VAMD 103 can be directly embedded into the network bitstream 106 and processing by the VAE 104 can be limited or halted when computational power is limited on the server side 10. Computational power on the server side 10 may be limited when, for example, the server side 10 system is embodied in a camera, a digital video recorder (“DVR”) or network video recorder (“NVR”). Certain embodiments may use client side 12 systems to process embedded VAMD 123 in order to accomplish the desired video analytics function system. In some embodiments, more video analytics functions can be partitioned and/or assigned to server side 10 when, for example, the client side is required to monitor and/or process multiple channels simultaneously. It will be appreciated, therefore, that a balanced video analytics system can be achieved for a variety of system configurations.

EXAMPLES

With reference to FIG. 2, certain embodiments provide electronic image stabilization (“EIS”) capabilities 220. EIS 220 is widely applicable in video security applications. A current captured video frame is processed with reference to the previous reconstructed reference frame or frames to generate a global motion vector 202 for the current frame. The global motion vector is then used to compensate the reconstructed image on the client side to reduce or eliminate image instability or shaking.
In a conventional pixel domain EIS algorithm, the current and previous reference frames are fetched, a block based or grey-level histogram based matching algorithm is applied to obtain local motion vectors, and the local motion vectors are processed to generate a pixel domain global motion vector. The drawbacks of the conventional approach include the high computational cost associated with the matching algorithm used to generate local motion vectors and the very high memory bandwidth required to fetch both the current reconstructed frame and the previous reference frames.
In certain embodiments of the invention, the video encoding engine 102 can generate VAMD 103 including block-based motion vectors, MB-type, etc., as a byproduct of video compression processing. VAMD 103 is fed into VAE 104, which can be configured to process the VAMD 103 information in order to generate global motion vector 202 as a VAM. The VAM is then embedded into the network bitstream 106 to transmit to the client side 12, typically over a network. A client side 12 processor can parse the network bitstream 106, extract the global motion information for each frame and apply global motion compensation to accomplish EIS 220.
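The specification does not state how the VAE derives the global motion vector from block-based motion vectors. As one plausible illustration, the sketch below takes the component-wise median of the block motion vectors, which tends to ignore the minority of blocks that belong to moving objects. The function names, the "I" label for intra-coded blocks and the sign convention of the compensating shift are all assumptions.

```python
from statistics import median


def estimate_global_motion(block_mvs, mb_types=None):
    """Estimate a global motion vector as the component-wise median of block MVs.

    block_mvs: list of (dx, dy) tuples, one per macroblock.
    mb_types:  optional parallel list of type labels; blocks labelled "I"
               (assumed to mean intra-coded) carry no motion and are skipped.
    """
    usable = [mv for i, mv in enumerate(block_mvs)
              if mb_types is None or mb_types[i] != "I"]
    if not usable:
        return (0, 0)
    return (median(dx for dx, _ in usable),
            median(dy for _, dy in usable))


def stabilizing_offset(global_mv):
    """Shift to apply on the client to cancel the estimated global motion."""
    return (-global_mv[0], -global_mv[1])
```

In this arrangement only the two numbers of the global motion vector need to travel in the VAM, and the client applies the opposite shift to each decoded frame.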

Video Background Modeling

Certain embodiments of the invention comprise a video background modeling feature that can construct or reconstruct a background image 222, which can provide highly desired information for use in a wide variety of video surveillance applications, including motion detection, object segmentation, abandoned object detection, etc. Conventional pixel domain background extraction algorithms operate on a statistical model of co-located pixel values across multiple frames. For example, a Gaussian model is used to model the co-located pixels of N consecutive frames and to select the mathematically most likely pixel value as the background pixel. If a video frame's height is denoted as H, its width as W, and N consecutive frames are needed to satisfy the statistical model requirement, then a total of W*H*N pixels must be processed to generate a background frame.
In certain embodiments, MB-based VAMD 103 is used to generate the background information rather than pixel-based background information. According to certain aspects of the invention, the volume of information generated from VAMD 103 is typically only 1/256 of the volume of pixel-based information, since each 16×16 MB covers 256 pixels. In one example, MB based motion vector and non-zero-count information can be used to distinguish background from moving foreground objects. FIG. 4A shows an original image with background and foreground objects, and FIG. 4B shows a typical background extracted by processing VAMD.
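The 1/256 figure follows from the 16×16 macroblock size, and the arithmetic can be checked with a short sketch. The frame dimensions and frame count in the usage note are arbitrary examples, and the background rule (zero motion vector and few non-zero coefficients) with its threshold is an assumed simplification, not a rule stated in this specification.

```python
MB_SIZE = 16  # a macroblock typically covers 16x16 pixels


def pixel_samples(width, height, n_frames):
    """Number of pixel values a pixel-domain background model must process."""
    return width * height * n_frames


def macroblock_samples(width, height, n_frames):
    """Number of macroblock records covering the same frames."""
    mbs_wide = -(-width // MB_SIZE)   # ceiling division
    mbs_high = -(-height // MB_SIZE)
    return mbs_wide * mbs_high * n_frames


def is_background_mb(motion_vector, nonzero_count, nz_threshold=0):
    """Assumed rule: a block with no motion and little residual is background."""
    dx, dy = motion_vector
    return dx == 0 and dy == 0 and nonzero_count <= nz_threshold
```

For example, for 30 frames of 704×480 video, pixel_samples gives 10,137,600 values while macroblock_samples gives 39,600 records, a ratio of exactly 256.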
Certain embodiments of the invention provide systems and methods for motion detection 200 and virtual line counting 201. A motion detector 200 can be used to automatically detect motion of objects including humans, animals and/or vehicles entering predefined regions of interest. Virtual line detection and counting module 201 can detect a moving object that crosses an invisible line defined by user configuration and can count the number of objects crossing the line, as illustrated in FIGS. 5A and 5B. The virtual line can be based on actual lines in the image and can be a delineation of an area defined by a polygon, circle, ellipse or irregular area. In some embodiments, the number of objects crossing one or more lines can be recorded as an absolute number and/or as a statistical frequency and an alarm may be generated to indicate any line crossing, a threshold frequency or absolute number of crossings and/or an absence of crossings within a predetermined time. In certain embodiments, motion detection 200 and virtual line detection and counting 201 can be achieved by processing one or more MB-based VAMDs. Information such as motion alarm and object count across a virtual line can be packed as VAM for transmission to the client side 12. Motion indexing, object counting or similar customized applications can be easily achieved by extracting the VAM with simple processing. It will be appreciated that configuration information may be provided from client side to server side as a form of feedback, using packed information as a basis for resetting lines, areas of interest and so on.
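Virtual line counting can be illustrated with a standard segment intersection test applied to successive positions of a tracked object. The sketch below assumes a straight virtual line given by two end points and a track of object positions (for instance, centroids of moving macroblocks), neither of which is a format defined in this specification; the function names are likewise hypothetical.

```python
def _side(a, b, p):
    """Cross product (b - a) x (p - a): the sign tells which side of line a-b holds p."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crosses(line_a, line_b, prev_pos, cur_pos):
    """True if the move prev_pos -> cur_pos strictly crosses segment line_a-line_b."""
    d1 = _side(line_a, line_b, prev_pos)
    d2 = _side(line_a, line_b, cur_pos)
    d3 = _side(prev_pos, cur_pos, line_a)
    d4 = _side(prev_pos, cur_pos, line_b)
    # The end points of each segment must lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(line_a, line_b, track):
    """Count how many consecutive steps of a track cross the virtual line."""
    return sum(crosses(line_a, line_b, p, q) for p, q in zip(track, track[1:]))
```

Positions that merely touch the line are not counted in this sketch. A count produced this way is the kind of result that could be carried as local VAM, so that the client reads the number rather than recomputing it.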
Certain embodiments of the invention provide improved object tracking within a sequence of video frames using VAMD 103. Certain embodiments can facilitate client side measurement of speed of motion of objects and can assist in identifying directions of movement. Furthermore, VAMD 103 can provide useful information related to video mosaics 221, including motion indexing and object counting.

System Description

Turning now to FIG. 6, certain embodiments of the invention employ a processing system that includes at least one computing system 60 deployed to perform certain of the steps described above. Computing system 60 may be a commercially available system that executes commercially available operating systems such as Microsoft Windows®, UNIX or a variant thereof, Linux, a real time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, for embedding in one or more of an image capture system, communications device and/or graphics processing systems. In one example, computing system 60 comprises a bus 602 and/or other mechanisms for communicating between processors, whether those processors are integral to the computing system 60 (e.g. 604, 605) or located in different, perhaps physically separated computing systems 60. Typically, processor 604 and/or 605 comprises a CISC or RISC computing processor and/or one or more digital signal processors. In some embodiments, processor 604 and/or 605 may be embodied in a custom device and/or may perform as a configurable sequencer. Device drivers 603 may provide output signals used to control internal and external components and to communicate between processors 604 and 605.
Computing system 60 also typically comprises memory 606 that may include one or more of random access memory (“RAM”), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 602. Memory 606 can be used for storing instructions and data that can cause one or more of processors 604 and 605 to perform a desired process. Main memory 606 may be used for storing transient and/or temporary data such as variables and intermediate information generated and/or used during execution of the instructions by processor 604 or 605. Computing system 60 also typically comprises non-volatile storage such as read only memory (“ROM”) 608, flash memory, memory cards or the like; non-volatile storage may be connected to the bus 602, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 602. Non-volatile storage can be used for storing configuration, and other information, including instructions executed by processors 604 and/or 605. Non-volatile storage may also include mass storage device 610, such as a magnetic disk, optical disk, flash disk that may be directly or indirectly coupled to bus 602 and used for storing instructions to be executed by processors 604 and/or 605, as well as other information.
In some embodiments, computing system 60 may be communicatively coupled to a display system 612, such as an LCD flat panel display, including touch panel displays, electroluminescent display, plasma display, cathode ray tube or other display device that can be configured and adapted to receive and display information to a user of computing system 60. Typically, device drivers 603 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving a display system 612. Display system 612 may also include logic and software to generate a display from a signal provided by computing system 60. In that regard, display 612 may be provided as a remote terminal or in a session on a different computing system 60. An input device 614 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 616 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or other system suitably equipped to display the images and provide user input.
In certain embodiments, computing system 60 may be embedded in a system that captures and/or processes images, including video images. In one example, computing system 60 may include a video processor or accelerator 617, which may have its own processor, non-transitory storage and input/output interfaces. In another example, video processor or accelerator 617 may be implemented as a combination of hardware and software operated by the one or more processors 604, 605. In another example, computing system 60 functions as a video encoder, although other functions may be performed by computing system 60. In particular, a video encoder that comprises computing system 60 may be embedded in another device such as a camera, a communications device, a mixing panel, a monitor, a computer peripheral, and so on.
According to one embodiment of the invention, portions of the described invention may be performed by computing system 60. Processor 604 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 606, having been received from a computer-readable medium such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform specific functions wherein the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term “computer-readable medium” is used to define any medium that can store and provide instructions and other data to processor 604 and/or 605, particularly where the instructions are to be executed by processor 604 and/or 605 and/or other peripheral of the processing system. Such medium can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and BluRay. Storage may be provided locally and in physical proximity to processors 604 and 605 or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 60, as in the example of BluRay, DVD or CD storage or memory cards or sticks that can be easily connected or disconnected from a computer using a standard interface, including USB, etc. Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Transmission media can be used to connect elements of the processing system and/or components of computing system 60. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In particular radio frequency (RF), fiber optic and infrared (IR) data communications may be used.
Various forms of computer readable media may participate in providing instructions and data for execution by processor 604 and/or 605. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 60. The instructions may optionally be stored in a different storage or a different part of storage prior to or during execution.
Computing system 60 may include a communication interface 618 that provides two-way data communication over a network 620 that can include a local network 622, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to a wide area network such as the Internet 628. Local network 622 and Internet 628 may both use electrical, electromagnetic or optical signals that carry digital data streams.
Computing system 60 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 630 might transmit a requested code for an application program through Internet 628 and may receive in response a downloaded application that provides or augments functional modules such as those described in the examples above. The received code may be executed by processor 604 and/or 605.
Additional Descriptions of Certain Aspects of the Invention
The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.
Certain embodiments of the invention provide video processing systems and methods. Some of these embodiments comprise a processor configured to receive video frames representative of a sequence of images captured by a video sensor. Some of these embodiments comprise a video encoder operative to encode the video frames according to a desired video encoding standard. Some of these embodiments comprise a video analytics processor that receives video analytics metadata generated by the video encoder from the sequence of images. In some of these embodiments, the video analytics processor is configurable to produce video analytics messages for transmission to a client device. In some of these embodiments, the video analytics messages are used for client side video analytics processing.
In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information. In some of these embodiments, the pixel domain video analytics information includes information received directly from an analog-to-digital front end. In some of these embodiments, the pixel domain video analytics information includes information received directly from an encoding engine as the engine is performing compression. In some of these embodiments, the video analytics messages include information related to one or more of a background model, a motion alarm, a virtual line detection and electronic image stabilization parameters. In some of these embodiments, the video analytics messages comprise video analytics messages related to a group of images, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.
In some of these embodiments, the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter. In some of these embodiments, the video analytics messages are transmitted to the client device in a layered structure network bitstream comprising an encoder-generated video bitstream and a portion of the video analytics metadata. In some of these embodiments, the video analytics messages and the portion of the video analytics metadata are transmitted in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.
Certain embodiments of the invention provide video decoding systems and methods. Some of these embodiments comprise a decoder configured to extract a video frame and one or more video analytics messages from a network bitstream. In some of these embodiments, the video analytics messages provide information related to characteristics of the video frame. Some of these embodiments comprise one or more video processors configured to produce video analytics metadata related to the video frame based on content of the video frame and the video analytics messages.
In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information received directly from an analog-to-digital front end. In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information received directly from an encoding engine as the engine was performing compression. In some of these embodiments, the video analytics messages comprise video analytics messages related to a plurality of video frames, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region. In some of these embodiments, the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.
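As a non-limiting sketch, the two levels of video analytics messages enumerated above might be represented on a client as a pair of records, one for messages related to a plurality of video frames and one for messages related to an individual video frame. The record names, the field types, the frame_index field and the helper function are assumptions made for illustration only; the disclosure names the categories of information but does not prescribe a layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # assumed layout: x, y, width, height


@dataclass
class GroupLevelMessage:
    """Information related to a group of images (field types are assumptions)."""
    background_frame: Optional[bytes] = None
    foreground_segmentation: Optional[bytes] = None
    camera_parameters: Dict[str, float] = field(default_factory=dict)
    virtual_lines: List[Tuple[Point, Point]] = field(default_factory=list)
    motion_alarm_regions: List[Rect] = field(default_factory=list)


@dataclass
class FrameLevelMessage:
    """Information related to an individual video frame."""
    frame_index: int  # hypothetical field tying the message to a frame
    global_motion_vector: Tuple[int, int] = (0, 0)
    alarm_status: List[bool] = field(default_factory=list)  # one flag per alarm region
    virtual_line_counts: List[int] = field(default_factory=list)  # one count per line
    tracked_objects: List[Point] = field(default_factory=list)
    camera_motion: Optional[Tuple[float, float]] = None


def alarmed_regions(group: GroupLevelMessage, frame: FrameLevelMessage) -> List[Rect]:
    """Return the predefined regions whose alarm flag is set for this frame."""
    return [
        region
        for region, alarmed in zip(group.motion_alarm_regions, frame.alarm_status)
        if alarmed
    ]
```

Separating the two records reflects the idea that slowly changing information, such as a background frame or region definitions, need not be repeated for every frame, whereas per-frame status refers back to it by position.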
In some of these embodiments, the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream. In some of these embodiments, the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream, together with a portion of the pixel domain video analytics information. In some of these embodiments, the one or more video processors are configured to produce a global motion vector. In some of these embodiments, the one or more video processors provide electronic image stabilization based on the video analytics messages. In some of these embodiments, the one or more video processors extract a background image for a plurality of video frames based on the video analytics messages. In some of these embodiments, the one or more video processors use the video analytics messages to monitor objects crossing a virtual line in a plurality of video frames.
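Monitoring of objects crossing a virtual line might, for example, be sketched as a segment intersection test applied to successive object positions. The sketch assumes that object positions are available to the client as one (x, y) point per frame, for instance derived from object tracking parameters; the function names and the choice of test are illustrative assumptions and do not describe a method prescribed by the disclosure.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]


def _side(a: Point, b: Point, p: Point) -> float:
    """Cross product (b - a) x (p - a); its sign tells which side of line ab p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crosses(line: Line, prev: Point, curr: Point) -> bool:
    """True if the move prev -> curr strictly intersects the virtual line segment."""
    a, b = line
    # Endpoints of the move must lie on opposite sides of the virtual line ...
    moved_across = _side(a, b, prev) * _side(a, b, curr) < 0
    # ... and the ends of the virtual line on opposite sides of the move.
    within_span = _side(prev, curr, a) * _side(prev, curr, b) < 0
    return moved_across and within_span


def count_crossings(line: Line, track: List[Point]) -> int:
    """Count crossings of the virtual line along a per-frame object track."""
    return sum(crosses(line, p, q) for p, q in zip(track, track[1:]))
```

The strict inequalities mean that a position lying exactly on the line, or a path that only touches an end point of the segment, is not counted; an implementation that needs those cases would have to handle them explicitly.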
Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

What is claimed is:

1. A video processing system comprising:

a video encoder operative to encode a sequence of images captured by a video sensor into video frames according to a desired video encoding standard and to generate video analytics metadata based on information in the sequence of images; and

a video analytics processor configured to receive and process the video analytics metadata to produce video analytics messages suitable for transmission to a client device and that are useable for client-side video analytics processing.

2. The video processing system of claim 1, wherein the video analytics metadata comprise pixel domain video analytics information received directly from an analog-to-digital front end.

3. The video processing system of claim 1, wherein the video encoder comprises an encoding engine, and wherein the video analytics metadata comprise pixel domain video analytics information received directly from the encoding engine and generated as the encoding engine is performing compression on the sequence of images.

4. The video processing system of claim 3, wherein the video analytics messages include information related to one or more of a background model, a motion alarm, a virtual line detection and electronic image stabilization parameters.

5. The video processing system of claim 2, wherein the video analytics messages comprise video analytics messages related to a group of images and include messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.

6. The video processing system of claim 1, wherein the video analytics messages comprise video analytics messages related to an individual video frame and include messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.

7. The video processing system of claim 1, wherein the video processing system is configured to transmit video analytics messages to the client device in a layered structured network bitstream comprising an encoder-generated video bitstream and at least a portion of the video analytics metadata.

8. The video processing system of claim 7, wherein the video analytics messages and the portion of the video analytics metadata are transmitted in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.

9. A video decoding system comprising:

a decoder configured to extract video frames and one or more video analytics messages from a network bitstream, wherein the video analytics messages comprise information derived from pixel domain video analytics information which identifies characteristics of a sequence of images represented in the video frames; and

one or more video processors configured to produce video analytics metadata related to the video frame based on the extracted video frames and the information in the video analytics messages.

10. The video decoding system of claim 9, wherein the video analytics metadata comprise pixel domain video analytics information generated directly by an analog-to-digital front end.

11. The video decoding system of claim 9, wherein the video analytics metadata comprise pixel domain video analytics information generated directly by an encoding engine as the engine performed compression on the sequence of images.

12. The video decoding system of claim 11, wherein the video analytics messages are received with a portion of the pixel domain video analytics information in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.

13. The video decoding system of claim 9, wherein one or more video processors extract a background image for a plurality of the video frames based on the information in the video analytics messages.

14. The video decoding system of claim 9, wherein one or more video processors use the information in the video analytics messages to monitor objects crossing a virtual line observed in a plurality of the video frames.

15. The video decoding system of claim 9, wherein the one or more video processors are configured to produce a global motion vector using the information in the video analytics messages.

16. The video decoding system of claim 9, wherein one or more video processors provide electronic image stabilization based on the information in the video analytics messages.

17. The video decoding system of claim 9, wherein the video analytics messages include information concerning one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.

18. The video decoding system of claim 9, wherein the video analytics messages comprise video analytics messages concerning an individual video frame and including information related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.

19. A non-transitory computer-readable medium encoded with data and instructions wherein the data and instructions, when executed by a processor of a video processing system, cause the video processing system to perform a method comprising:

encoding a sequence of images captured by a video sensor into video frames according to a desired video encoding standard;

generating pixel domain video analytics information from the sequence of images while encoding the sequence of images;

producing video analytics messages using the pixel domain video analytics information; and

transmitting the video analytics messages concurrently with the video frames, wherein the video analytics messages are configured to facilitate client-side video analytics processing of the video frames.

20. The non-transitory computer-readable medium of claim 19, wherein certain video analytics messages correspond to an individual video frame and relate to one or more of a global motion vector, a motion alarm region, a virtual line, object tracking and camera motion.