CN108595469A - Semantics-based bandwidth-saving transmission system for agricultural machinery surveillance video images - Google Patents
Semantics-based bandwidth-saving transmission system for agricultural machinery surveillance video images
- Publication number
- CN108595469A CN108595469A CN201810181230.0A CN201810181230A CN108595469A CN 108595469 A CN108595469 A CN 108595469A CN 201810181230 A CN201810181230 A CN 201810181230A CN 108595469 A CN108595469 A CN 108595469A
- Authority
- CN
- China
- Prior art keywords
- video
- semantic
- frame
- label
- nouns
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/47—Detecting features for summarising video content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Abstract
The present invention relates to the technical field of intelligent connected agricultural machinery systems, and discloses a semantics-based bandwidth-saving transmission system for agricultural machinery surveillance video images. Information collection and video monitoring are performed by vehicle-mounted cameras and fixed roadside and field cameras; the key frames and motion trajectories of the video are then extracted, and the extracted raw trajectories are post-processed to obtain the direction of motion and the overall slope of each trajectory. Object recognition is then performed on the key frames, and the recognition results are added to the video's noun semantic label set. The noun semantic labels are then matched by name against the trajectory files in the training set; the matching trajectory files are found, the trajectory semantic similarity is computed, and the k most similar trajectory files are selected. The present invention uses the computing and storage capability of suitable network elements to trade computation and storage for the bandwidth consumed by network transmission, that is, it improves communication capacity, and it also facilitates the querying and retrieval of video data.
Description
Technical field
The present invention relates to the technical field of intelligent connected agricultural machinery systems, and in particular to a semantics-based bandwidth-saving transmission system for agricultural machinery surveillance video images.
Background technology
In an intelligent connected agricultural machinery system, video data is transmitted to the data center and the core network over a mobile network. However, because of the geographical location and large area of farmland, the mobile communication network of such a system suffers from an inherent disadvantage: the core network and the access network, which carry the mobile video traffic, are limited in both wired and wireless bandwidth.
For a given amount of video information, the specific research goal of bandwidth-saving transmission is to reduce the transmission resources, that is, the bandwidth consumed during transmission, as much as possible while ensuring that the useful information is not lost and the video quality is not degraded.
For much of the video content generated by intelligent connected agricultural machinery during operation, only the content information needs to be transmitted: the user does not care about the detailed video pictures, but about the information they convey, for example encountering an obstacle while travelling or reaching the field boundary. For such regular events, transmitting only the semantic information suffices to convey the video content. Since semantic information is far smaller in volume than the video itself, semantics-based bandwidth saving is extremely important in intelligent connected agricultural machinery systems.
Moreover, by extracting the semantic information of a video, the video data can be converted from a frame-based representation to a semantic-object-based representation, so that a computer can understand the video data more accurately; this enables a better expression of the video content and also facilitates querying and retrieving video data. Therefore, how to store and transmit massive video data with existing hardware technology has become one of the problems that urgently need to be solved.
Summary of the invention
The present invention proposes a semantics-based bandwidth-saving transmission system for agricultural machinery surveillance video images.
To achieve the above object, the present invention adopts the following technical solution:
A semantics-based bandwidth-saving transmission system for agricultural machinery surveillance video images, applied while the agricultural machine is travelling, characterized in that: information collection and video monitoring are performed by vehicle-mounted cameras and fixed roadside and field cameras; the key frames and motion trajectories of the video are then extracted, and the extracted raw trajectories are post-processed to obtain the direction of motion and the overall slope of each trajectory; object recognition is then performed on the key frames, and the recognition results are added to the video's noun semantic label set; the noun semantic labels are then matched by name against the trajectory files in the training set, the matching trajectory files are found, the trajectory semantic similarity is computed, and the k most similar trajectory files are selected; finally, the training-set annotation files with the same names as the k trajectory files are read, and the results are added to the video's verb semantic label set; the video content is then fed back by transmitting the semantic information;
wherein the video data is fed back by expressing its semantic information as a combination of noun semantic labels and verb semantic labels: a noun semantic label corresponds to a moving object in the video, and the corresponding verb semantic label is the specific action of that moving object; together the two form the semantic label of the video;
1. Noun semantic label extraction:
For noun semantic labels, the test video first undergoes operations such as shot segmentation and key-frame extraction to obtain its key frames; object recognition is then performed on the key frames to obtain the noun semantic labels.
1) Shot segmentation
In the histogram-based shot segmentation algorithm, the gray level and brightness of each pixel in adjacent frames are divided into a number of levels, a histogram is built by counting the pixels at each level, and the histograms are compared. Given the histograms of two frames, the histogram difference is computed as:
D(m, n) = (1/N) Σ_i |h_m(i) - h_n(i)|
where N is the total number of pixels in a frame, and h_m(i) - h_n(i) is the distance between the two video frames at histogram bin i. The histogram-based method ignores the positions of pixels and uses only the statistics of their brightness and color;
2) Key-frame extraction
The cluster-based key-frame extraction algorithm is implemented in the following steps:
Step 1: suppose a shot Mi contains n image frames, written Mi = {N1, ..., Nn}, where N1 is the first frame and Nn is the last frame; define the similarity between two adjacent frames as the similarity of their color histograms, that is, the histogram feature difference, and predefine a threshold δ that controls the density of the clusters;
Step 2: compute the similarity between the current frame Ni and the centroid of each existing cluster; if the value is less than δ, the frame is too far from that cluster, so Ni cannot be added to it; if the similarities between Ni and all existing cluster centroids are less than δ, then Ni forms a new cluster with Ni as its centroid; otherwise the frame is added to the most similar cluster, the one whose centroid is closest to the frame;
Step 3: after the n image frames of shot Mi have been assigned to different clusters by the first two steps, the key frames are selected: from each cluster, the frame closest to the cluster centroid is extracted as the representative frame of that cluster, and the representative frames of all clusters constitute the key frames of shot Mi;
2. Verb semantic label extraction:
For verb semantic labels, the motion trajectory data of the test video is first obtained by moving-object detection and tracking algorithms; the semantic similarity between the trajectory data of the test video and that of the training-set videos is then examined, while the noun semantic labels of the test video are related to the annotations of the training-set videos; the verb semantic labels are obtained after this combined analysis;
1) Motion trajectory extraction and analysis: when extracting motion trajectories, moving-object detection is usually performed first, the detected moving objects are then tracked, and the tracking results are expressed as coordinates, which constitute the motion trajectories;
2) Video training set construction: when video semantics are extracted with machine-learning methods, a video training set is constructed; the procedure is: first select videos and extract the relevant data, namely key frames and motion trajectories; choose the longest trajectory and extract its information, including the direction of motion and the overall slope of the trajectory curve, by trajectory post-processing; then annotate these videos by hand, the annotation being the motion behavior of the object; the key frames are used for noun semantic label extraction, while the trajectory information and the manual annotations are prepared for verb semantic label extraction; the relevant data is extracted, and manual annotation performed, for every video in the training set, and once all these operations are complete the video training set is constructed;
3) Verb semantic label extraction algorithm: the verb semantic label extraction algorithm is based on machine-learning ideas and matches the training-set data against the data of the video to be analyzed; the process is as follows: first extract the key frames and motion trajectories of the video, and post-process the extracted raw trajectories to obtain the direction of motion and the overall slope of each trajectory; then perform object recognition on the key frames and add the recognition results to the video's noun semantic label set; then match the noun semantic labels by name against the trajectory files in the training set, find the matching trajectory files, compute the trajectory semantic similarity, and select the k most similar trajectory files; finally, read the training-set annotation files with the same names as the k trajectory files and add the results to the video's verb semantic label set.
Owing to the technical solution described above, the present invention has the following advantages:
For a given amount of video information, the specific research goal of bandwidth-saving transmission in connected agricultural machinery driving is to reduce the transmission resources, that is, the bandwidth consumed during transmission, as much as possible while ensuring that the useful information is not lost and the video quality is not degraded. Meanwhile, given limited communication bandwidth, improving the computing and storage capability of the network equipment is a comparatively easy and feasible approach. The core idea of the bandwidth-saving image transmission method is therefore to use the computing and storage capability of suitable network elements to trade for, or save, the bandwidth consumed by network transmission, that is, communication capacity.
For much of the video content generated by intelligent connected agricultural machinery during operation, only the content information needs to be transmitted: the user does not care about the detailed video pictures, but about the information they convey, for example encountering an obstacle while travelling or reaching the field boundary. For such regular events, transmitting only the semantic information suffices to convey the video content. Since semantic information is far smaller in volume than the video itself, semantics-based bandwidth saving is extremely important in intelligent connected agricultural machinery systems. Moreover, by extracting the semantic information of a video, the video data can be converted from a frame-based representation to a semantic-object-based representation, so that a computer can understand the video data more accurately; this enables a better expression of the video content and also facilitates querying and retrieving video data. Video semantic extraction therefore has significant research value and practical application value.
Description of the drawings
Fig. 1 is the flow chart of video training set construction;
Fig. 2 is the flow chart of the verb semantic label extraction algorithm.
Detailed description of embodiments
As shown in Figs. 1 and 2, in an intelligent connected agricultural machinery system, the volume of video information is far larger than that of other kinds of information, such as control information, load information, and text messages; the transmission pressure of video information is therefore a critical problem that this system must solve. The present invention addresses how to store and transmit massive video data with existing hardware technology under the inherent disadvantage of limited bandwidth, and proposes a bandwidth-saving image transmission method. Labels are an effective means of expressing semantics; given the characteristics of video data, its semantic information can be expressed as a combination of noun semantic labels and verb semantic labels, where a noun semantic label corresponds to a moving object in the video and the corresponding verb semantic label is the specific action of that moving object; together the two form the semantic label of the video.
Noun semantic label extraction:
For noun semantic labels, the test video first undergoes operations such as shot segmentation and key-frame extraction to obtain its key frames; object recognition is then performed on the key frames to obtain the noun semantic labels.
Shot segmentation
Histogram-based shot segmentation
This method is simple to implement and achieves good results on agricultural machinery surveillance video; it is the most widely used segmentation method. Histogram-based algorithms typically divide the gray level and brightness of each pixel in adjacent frames into a number of levels, build a histogram by counting the pixels at each level, and compare the histograms. Given the histograms of two frames, the histogram difference is computed as:
D(m, n) = (1/N) Σ_i |h_m(i) - h_n(i)|
where N is the total number of pixels in a frame, and h_m(i) - h_n(i) is the distance between the two video frames at histogram bin i. The histogram-based method ignores the positions of pixels and uses only the statistics of their brightness and color. Its drawbacks are that two frames with different structure but very similar histograms cause missed detections, and that under sharp lighting changes the frame difference suffers strong interference.
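As an illustration of the histogram-difference method described above, here is a minimal Python sketch (not from the patent; the function names, bin count, and cut threshold are illustrative assumptions):

```python
import numpy as np

def hist_diff(frame_m, frame_n, levels=64):
    """Histogram difference D(m, n) = (1/N) * sum_i |h_m(i) - h_n(i)|,
    where N is the number of pixels per frame."""
    h_m, _ = np.histogram(frame_m, bins=levels, range=(0, 256))
    h_n, _ = np.histogram(frame_n, bins=levels, range=(0, 256))
    return np.abs(h_m - h_n).sum() / frame_m.size

def shot_boundaries(frames, threshold=0.5):
    """Indices i where the difference between frame i-1 and frame i exceeds threshold."""
    return [i for i in range(1, len(frames))
            if hist_diff(frames[i - 1], frames[i]) > threshold]

# Two identical frames give zero difference; a black-to-white cut gives a large one.
black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
print(hist_diff(black, black))               # 0.0
print(hist_diff(black, white))               # 2.0 (all 16 pixels change bins)
print(shot_boundaries([black, black, white]))  # [2]
```

As the drawbacks noted above suggest, a sketch like this would miss a cut between two structurally different frames that happen to share a histogram.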
Key-frame extraction
A key frame is an image frame that describes the key content of a video; it reflects the most intuitive and most valuable information in a shot. To let users grasp the content of a video at a glance, key-frame extraction generally errs on the side of caution, preferring too many frames to too few. Such conservatism, however, may lead to an excessive number of extracted key frames, causing duplication and redundancy. A suitable key-frame extraction algorithm should therefore extract the most representative image frames without producing too much redundancy; this is the main research direction of key-frame extraction. The cluster-based key-frame extraction algorithm, which lays the foundation for the subsequent work, is introduced below.
Cluster-based key-frame extraction
Implementation steps:
Step 1: suppose a shot Mi contains n image frames, which can be written Mi = {N1, ..., Nn}, where N1 is the first frame and Nn is the last frame. Define the similarity between two adjacent frames as the similarity of their color histograms (that is, the histogram feature difference), and predefine a threshold δ that controls the density of the clusters.
Step 2: compute the similarity between the current frame Ni and the centroid of each existing cluster. If the value is less than δ, the frame is too far from that cluster, so Ni cannot be added to it. If the similarities between Ni and all existing cluster centroids are less than δ, then Ni forms a new cluster with Ni as its centroid; otherwise the frame is added to the most similar cluster, the one whose centroid is closest to the frame.
Step 3: after the n image frames of shot Mi have been assigned to different clusters by the first two steps, the key frames can be selected: from each cluster, the frame closest to the cluster centroid is extracted as the representative frame of that cluster, and the representative frames of all clusters constitute the key frames of shot Mi.
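The three clustering steps above can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the histogram-intersection similarity and the value of δ are assumptions.

```python
import numpy as np

def histogram(frame, levels=64):
    h, _ = np.histogram(frame, bins=levels, range=(0, 256))
    return h / frame.size  # normalized so similarity is comparable across frames

def similarity(h1, h2):
    # Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones.
    return np.minimum(h1, h2).sum()

def key_frames(frames, delta=0.8):
    """Cluster frames by similarity to cluster centroids (threshold delta), then
    return, for each cluster, the index of the member closest to its centroid."""
    hists = [histogram(f) for f in frames]
    clusters = []  # each cluster: {"members": [frame indices], "centroid": histogram}
    for i, h in enumerate(hists):
        sims = [similarity(h, c["centroid"]) for c in clusters]
        if not sims or max(sims) < delta:
            # Step 2, first case: far from every existing cluster, start a new one.
            clusters.append({"members": [i], "centroid": h.copy()})
        else:
            # Step 2, second case: join the most similar cluster, update its centroid.
            c = clusters[int(np.argmax(sims))]
            c["members"].append(i)
            c["centroid"] = np.mean([hists[j] for j in c["members"]], axis=0)
    # Step 3: the representative frame is the member most similar to the centroid.
    return [max(c["members"], key=lambda j: similarity(hists[j], c["centroid"]))
            for c in clusters]

black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
print(key_frames([black, black, white]))  # [0, 2]: one key frame per cluster
```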
Verb semantic label extraction:
For verb semantic labels, the motion trajectory data of the test video is first obtained by moving-object detection and tracking algorithms; the semantic similarity between the trajectory data of the test video and that of the training-set videos is then examined, while the noun semantic labels of the test video are related to the annotations of the training-set videos; the verb semantic labels are obtained after this combined analysis.
◆ Motion trajectory extraction and analysis
When extracting motion trajectories, moving-object detection is usually performed first, the detected moving objects are then tracked, and the tracking results are expressed as coordinates, which constitute the motion trajectories.
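The trajectory post-processing used throughout the document (direction of motion and overall slope) can be illustrated with a small Python sketch. The least-squares line fit for the overall slope is an assumption, since the patent does not specify how the slope is computed:

```python
import math

def trajectory_features(points):
    """Post-process a raw trajectory given as (x, y) coordinate pairs:
    return the direction of motion (degrees, first point to last point)
    and the overall slope of a least-squares line fitted to the points."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    direction = math.degrees(math.atan2(yn - y0, xn - x0))
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    slope = num / den if den else float("inf")  # vertical trajectory
    return direction, slope

# A straight 45-degree track: direction 45 degrees, overall slope 1.0.
d, s = trajectory_features([(0, 0), (1, 1), (2, 2), (3, 3)])
print(round(d), round(s, 3))  # 45 1.0
```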
◆ Video training set construction
When video semantics are extracted with machine-learning methods, constructing the video training set is a crucial step. The procedure here is: first choose a certain number of videos and extract the relevant data, such as key frames and motion trajectories; choose the longest trajectory and extract its information, including the direction of motion and the overall slope of the trajectory curve, by trajectory post-processing; then annotate these videos by hand, the annotation being mainly the motion behavior of the object. The key frames are used for noun semantic label extraction, while the trajectory information and the manual annotations are prepared for verb semantic label extraction. The relevant data is extracted, and manual annotation performed, for every video in the training set; once all these operations are complete, the video training set is constructed.
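One plausible on-disk layout for such a training set pairs each trajectory file with a same-named annotation file, so that noun labels can later be matched against file names. The Python sketch below is purely illustrative; the file naming scheme and the JSON format are assumptions, not specified by the patent:

```python
import json
from pathlib import Path

def build_training_entry(root, video_id, noun_label, trajectory, verb_annotation):
    """Write one training-set entry: a trajectory file named after the noun label
    (so noun labels of a test video can be matched against it by name) and a
    same-named annotation file holding the hand-marked motion behavior."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    traj_file = root / f"{noun_label}_{video_id}.traj.json"
    anno_file = root / f"{noun_label}_{video_id}.anno.json"
    traj_file.write_text(json.dumps({"points": trajectory}))
    anno_file.write_text(json.dumps({"behavior": verb_annotation}))
    return traj_file, anno_file

t, a = build_training_entry("trainset", "v001", "tractor",
                            [(0, 0), (1, 1)], "moving forward")
print(t.name, a.name)  # tractor_v001.traj.json tractor_v001.anno.json
```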
◆ Verb semantic label extraction algorithm
The verb semantic label extraction algorithm is based mainly on machine-learning ideas and matches the training-set data against the data of the video to be analyzed. The main process is as follows: first extract the key frames and motion trajectories of the video, and post-process the extracted raw trajectories to obtain information such as the direction of motion and the overall slope of each trajectory; then perform object recognition on the key frames and add the recognition results to the video's noun semantic label set; then match the noun semantic labels by name against the trajectory files in the training set, find the matching trajectory files, compute the trajectory semantic similarity, and select the k most similar trajectory files; finally, read the training-set annotation files with the same names as the k trajectory files and add the results to the video's verb semantic label set.
While the agricultural machine is travelling, information collection and video monitoring are performed by vehicle-mounted cameras and fixed roadside and field cameras; the key frames and motion trajectories of the video are then extracted, and the extracted raw trajectories are post-processed to obtain information such as the direction of motion and the overall slope of each trajectory; object recognition is then performed on the key frames, and the recognition results are added to the video's noun semantic label set; the noun semantic labels are then matched by name against the trajectory files in the training set, the matching trajectory files are found, the trajectory semantic similarity is computed, and the k most similar trajectory files are selected; finally, the training-set annotation files with the same names as the k trajectory files are read, and the results are added to the video's verb semantic label set. The video content can thus be fed back by transmitting the semantic information.
Claims (1)
1. a kind of semantic-based agricultural machinery monitor video image section band Transmission system, for the process that agricultural machinery is advanced, feature
It is:Information collection and video monitoring are carried out by vehicle-mounted camera and roadside field fixing camera, then extracts video
Key frame and movement locus, and the original motion trajectory to extracting carries out post-processing operation, obtains its direction of motion and rail
Mark G-bar information;Then Object identifying operation is carried out to key frame, recognition result is added to video semantic nouns label
In set;Then name-matches are carried out with the trail file in semantic nouns label and training set, finds out corresponding trail file
And track semantic similarity is calculated, select the highest trail file of k similarity;Finally, it reads of the same name with k trail file
Training set marks file, and result is added in video semantic verbs tag set;Regarded by transmitting semantic information to feed back
Frequency content;
Video data is wherein fed back, its semanteme letter is expressed in such a way that semantic nouns label and semantic verbs label are combined
It is Moving Objects in video that breath, wherein semantic nouns label be corresponding, and it is the tool of Moving Objects that semantic verbs label is corresponding
Body action behavior, the two combine the semantic label as video;
(1), semantic nouns tag extraction:
In terms of semantic nouns label, the operations such as shot segmentation and key-frame extraction are carried out to test video first, obtain video
Then key frame carries out Object identifying, you can obtain semantic nouns label to key frame.
1) shot segmentation
Shot segmentation algorithm based on histogram, be the gray scale of each pixel between consecutive frame, brightness are divided into it is N number of etc.
Grade, then make histogram for each grade statistical pixel number and compare, the histogram of two images is provided, then difference of histograms meter
It is as follows to calculate formula:
Wherein, N is the sum of image frame pixel;What hm (i)-hn (i) was indicated is two video frame in this histogram map unit of i
The distance in face;Do not consider the location information of pixel based on histogram method, and uses the statistical value of its brightness and color;
2) key-frame extraction
Key-frame Extraction Algorithm based on cluster, the key-frame extraction based on cluster, implementation step:
The first step:If some camera lens Mi includes n picture frame, it is expressed as Mi={ N1... ..., Nn, wherein N1Headed by frame, NnFor
Tail frame;If the similarity between adjacent two frame define be this adjacent two frames color histogram similarity, that is to say histogram
Characteristic difference predefines the density of a threshold value δ control cluster;
Second step:Calculate present frame NiWith the similarity between some existing cluster barycenter, if the value is less than δ, the frame is poly- with this
Distance is larger between class, therefore NiIt cannot be added in the cluster;If NiIt is respectively less than δ with all existing cluster barycenter similarities, then Ni
Form new a cluster and NiFor its barycenter;Otherwise the frame is added to and is similarly spent in maximum cluster, make the frame with
The distance between the barycenter of this cluster minimum;
Third walks:After the n picture frame that camera lens Mi is included is referred to different clusters respectively by first two steps, key is selected
Frame:The representative frame that the frame nearest from cluster barycenter is clustered as this, the representative frame of all clusters are extracted from each cluster
Just constitute the key frame of camera lens Mi;
(2), semantic verbs tag extraction:
In terms of semantic verbs label, the movement rail of test video is obtained by the related algorithm of Moving Objects detect and track first
Then mark data investigate the semantic similarity between the track data and the track data of training set video of test video, simultaneously
The semantic nouns label of test video and the marked content of training set video are contacted, can be obtained semantic verbs after comprehensive analysis
Label;
1) movement locus extraction and analysis when extracting movement locus, usually first carry out Moving Objects detection operation, then to detection
Moving Objects out are into line trace, by the result of tracking with coordinate representation out to get movement locus;Extract movement locus
When, Moving Objects detection operation is usually first carried out, then the Moving Objects that detected are used the result of tracking into line trace
Coordinate representation is out to get movement locus;
2) video trains set construction method, and when extracting video semanteme, video training set is constructed using the method for machine learning;It does
Method is first selecting video, extracts the related data of key frame and movement locus, chooses longest one in movement locus, and
Its information is extracted by track post-processing operation, includes the G-bar of the direction of motion, path curves;Then these are regarded
Frequency carries out manual mark, and marked content is the motor behavior of object;Wherein, key frame is used for video semantic nouns tag extraction,
Motion track information and the content marked by hand are prepared for video semantic verbs tag extraction;To each of training set
Video all extracts its related data and carries out manual mark, and after the completion of all operations, video training set is constructed and finished;
3) Semantic verb label extraction algorithm: the semantic verb label extraction algorithm is based on machine learning, matching the training-set data against the data of the video to be analyzed. The process is as follows: first, extract the key frames and motion trajectory of the video, and apply the post-processing operation to the extracted raw trajectory to obtain its direction of motion and average-slope information. Next, perform object recognition on the key frames and add the recognition results to the video's semantic noun label set. Then use the semantic noun labels to perform name matching against the trajectory files in the training set, find the corresponding trajectory files, compute the trajectory semantic similarity, and select the k trajectory files with the highest similarity. Finally, read the annotation files in the training set with the same names as those k trajectory files and add the results to the video's semantic verb label set.
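The matching loop of step 3) can be sketched as follows, assuming trajectories are lists of (x, y) points and the training set is represented by two in-memory dicts (`trajectory_db`, `annotation_db` are hypothetical names); the negative mean point-wise distance is one plausible similarity measure, as the patent does not specify how trajectory semantic similarity is computed:

```python
import math

def trajectory_similarity(a, b):
    """One plausible similarity: negative mean point-wise distance over
    the overlapping prefix of the two trajectories."""
    n = min(len(a), len(b))
    return -sum(math.dist(a[i], b[i]) for i in range(n)) / n

def extract_verb_labels(noun_labels, trajectory_db, annotation_db, query_traj, k=3):
    """For each semantic noun label, name-match the training-set trajectory
    files, keep the k most similar trajectories, and collect the behaviors
    from the same-named annotation files as semantic verb labels."""
    verbs = set()
    for noun in noun_labels:
        # name matching: trajectory files whose names start with the noun
        named = [name for name in trajectory_db if name.startswith(noun)]
        named.sort(key=lambda n: trajectory_similarity(query_traj, trajectory_db[n]),
                   reverse=True)
        for name in named[:k]:
            verbs.add(annotation_db[name])  # read the same-named annotation file
    return verbs
```

With k = 1 and a query trajectory moving horizontally from the origin, a "tractor" noun label would pick out the most similar tractor trajectory and return its annotated behavior.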
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810181230.0A CN108595469A (en) | 2018-03-06 | 2018-03-06 | A kind of semantic-based agricultural machinery monitor video image section band Transmission system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN108595469A true CN108595469A (en) | 2018-09-28 |
Family
ID=63625725
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810181230.0A Pending CN108595469A (en) | 2018-03-06 | 2018-03-06 | A kind of semantic-based agricultural machinery monitor video image section band Transmission system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108595469A (en) |
2018-03-06: CN application CN201810181230.0A filed; published as CN108595469A; status: Pending
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104239501A (en) * | 2014-09-10 | 2014-12-24 | 中国电子科技集团公司第二十八研究所 | Mass video semantic annotation method based on Spark |
Non-Patent Citations (3)
| Title |
|---|
| Ren Xi: "Research on Video Semantic Extraction Methods", Wanfang Data * |
| Li Quandong: "Research on Shot Segmentation and Key Frame Extraction in Content-Based Video Retrieval", Wanfang Data * |
| Li Quandong et al.: "An Improved Key-Frame Extraction Algorithm Based on Unsupervised Clustering", Journal of Applied Optics * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110097094A (en) * | 2019-04-15 | 2019-08-06 | 天津大学 | A few-shot classification method with multiple semantic fusion for person interaction |
| CN110097094B (en) * | 2019-04-15 | 2023-06-13 | 天津大学 | A Few-shot Classification Method Based on Multiple Semantic Fusion Oriented to Character Interaction |
| CN111246176A (en) * | 2020-01-20 | 2020-06-05 | 北京中科晶上科技股份有限公司 | Video transmission method for realizing banding |
Similar Documents
| Publication | Title |
|---|---|
| CN103593464B | Video fingerprint detecting and video sequence matching method and system based on visual features |
| CN107230267B | Intelligence In Baogang Kindergarten based on face recognition algorithms is registered method |
| CN101719144A | Method for segmenting and indexing scenes by combining captions and video image information |
| CN104717468B | Cluster scene intelligent monitoring method and system based on the classification of cluster track |
| CN111738218B | Human body abnormal behavior recognition system and method |
| Mahmoodi et al. | Violence detection in videos using interest frame extraction and 3D convolutional neural network |
| CN113378675A | Face recognition method for simultaneous detection and feature extraction |
| CN107688830A | It is a kind of for case string and show survey visual information association figure layer generation method |
| CN118968044B | Multi-target detection tracking statistical method and device based on edge calculation |
| Kuang et al. | A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies |
| Yang et al. | A method of pedestrians counting based on deep learning |
| CN104504733A | Video abstraction method and system based on moving target detection |
| CN108595469A | A kind of semantic-based agricultural machinery monitor video image section band Transmission system |
| Qin et al. | Application of video scene semantic recognition technology in smart video |
| Wang et al. | Research on pedestrian detection based on multi-level fine-grained YOLOX algorithm |
| KR102526263B1 | Method and System for Auto Multiple Image Captioning |
| CN118172713B | Video tag identification method, device, computer equipment and storage medium |
| He et al. | Chronological video synopsis via events rearrangement optimization |
| Feng et al. | Similarity- and quality-guided relation learning for joint detection and tracking |
| Wang et al. | Aerobics Action Recognition Algorithm Based on Three-Dimensional Convolutional Neural Network and Multilabel Classification |
| Zhu et al. | [Retracted] Basketball Object Extraction Method Based on Image Segmentation Algorithm |
| Ma et al. | A lightweight neural learning algorithm for real-time facial feature tracking system via split-attention and heterogeneous convolution |
| Keyvanpour et al. | Detection of individual activities in video sequences based on fast interference discovery and semi-supervised method |
| CN116758364A | Video classification method, device, electronic equipment and computer readable storage medium |
| CN113709559B | Video dividing method, device, computer equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180928 |