WO2018176303A1

WO2018176303A1 - Video transmitting and receiving method, system, and device, and unmanned aerial vehicle

Info

Publication number: WO2018176303A1
Application number: PCT/CN2017/078728
Authority: WO
Inventors: 朱磊; 崔浩; 龚明
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2018-10-04
Also published as: CN108496369A

Abstract

Embodiments of the present invention provide a video transmitting and receiving method, system, and device, and an unmanned aerial vehicle. The method comprises: decomposing video data into a plurality of sub-video data units; separately encoding the plurality of sub-video data units; and selecting one or more encoded sub-video data units on the basis of one or more characteristics of a channel and one or more characteristics of the sub-video data units, and transmitting the one or more encoded sub-video data units. In the embodiments of the present invention, video data is decomposed into a plurality of sub-video data units, and the plurality of sub-video data units are separately encoded, and one or more encoded sub-video data units are selected according to a channel characteristic and a characteristic of the sub-video data units, such that the selected one or more encoded sub-video data units match the channel characteristic. In this way, the problem of mismatch between a signal source and a channel can be effectively resolved, and transmission delay and jitter of video data caused by signal source-channel mismatch can be effectively reduced.

Description

Video transmission, receiving method, system, device and unmanned aerial vehicle

Technical field

Embodiments of the present invention relate to the field of image processing, and in particular, to a video transmission and reception method, system, device, and unmanned aerial vehicle.

Background technique

In the prior art, the unmanned aerial vehicle is equipped with a photographing device, and the photographing device can perform aerial photography, and the aerial video is transmitted to the ground receiving device such as a user terminal, a remote controller, and the like through a communication system of the unmanned aerial vehicle.

The scenes or objects captured by the UAV's shooting equipment may be different at different times. Therefore, the code stream data size corresponding to each frame of image data will change in real time (ie, the source will change in real time); in addition, it will be subject to unmanned aerial vehicles and receiving equipment. The distance between the distance, the relative position, the presence of occlusion, and the presence of electromagnetic interference, etc., the channel bandwidth between the UAV and the receiving device will also change in real time (ie, the channel will change in real time), and the source channel changes to each other. Independent and unpredictable. At present, the transmission and reception modes of frame-level image data are relatively fixed, which is difficult to adapt to real-time changing source channels, and lacks effective image transmission and image receiving methods, which may result in image transmission delay due to real-time source and channel mismatch problems. shake.

Summary of the invention

Embodiments of the present invention provide a video transmission and reception method, system, device, and an unmanned aerial vehicle to effectively reduce transmission delay jitter of video data.

An aspect of the embodiments of the present invention provides a video transmission method, including:

Decomposing video data into a plurality of sub-video data units, wherein each sub-video data unit includes one or more sub-images;

Encoding the plurality of sub-video data units separately;

One or more encoded sub-video data units are selected and transmitted based on one or more characteristics of the channel, and one or more characteristics of the sub-video data unit.

Another aspect of the embodiments of the present invention provides a video receiving method, including:

Receiving a plurality of encoded sub-video data units;

Decoding the plurality of encoded sub-video data units;

Reconstructing the video data according to the decoded sub-video data unit, wherein the video data includes one or more image frames, and the sub-video data unit includes a plurality of sub-decompositions obtained by decomposing each of the image frames At least one sub-image in the image.

Another aspect of the embodiments of the present invention provides a video transmission system, including:

One or more imaging devices configured to acquire video data;

One or more processors on the movable object, working alone or in concert, the processor being configured to:

Encoding the plurality of sub-video data units separately;

Another aspect of the embodiments of the present invention provides a receiving device, including: a communication interface, one or more processors, working alone or in cooperation, and the communication interface is in communication with the processor;

The communication interface is configured to receive a plurality of encoded sub-video data units;

The one or more processors are configured to: control a decoder to decode the plurality of encoded sub-video data units; reconstruct the video data according to the decoded sub-video data unit, wherein the video data includes one Or a plurality of image frames, the sub-video data unit including at least one of the plurality of sub-images obtained by decomposing each of the image frames.

Another aspect of an embodiment of the present invention provides an unmanned aerial vehicle comprising:

body;

a power system mounted to the fuselage for providing flight power;

And the above video transmission system.

The video transmission and receiving method, the system, the device and the unmanned aerial vehicle provided by the embodiment are respectively decomposed into a plurality of sub-video data units by the video data, and respectively, and the plurality of sub-video data units are respectively Encoding, selecting one or more encoded sub-video data units and said selected sub-video data units according to channel characteristics and characteristics of sub-video data units, such that one or more encoded sub-elements are selected The video data unit conforms to the channel characteristics, and when the selected one or more encoded sub-video data units are transmitted on the channel matched with the same, the problem of mismatch between the source and the channel can be effectively solved, and the video data can be effectively reduced. Transmission delay jitter caused by source channel mismatch problem.

DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are some embodiments of the present invention. Other drawings may also be obtained from those of ordinary skill in the art in view of the drawings.

1 is a schematic diagram of transmission delay jitter according to an embodiment of the present invention;

2 is a flowchart of a video transmission method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a sub video data unit according to an embodiment of the present invention; FIG.

FIG. 4 is a schematic diagram of a sub video data unit according to another embodiment of the present invention; FIG.

FIG. 5 is a schematic diagram of a sub video data unit according to another embodiment of the present invention; FIG.

FIG. 6 is a flowchart of a video transmission method according to another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a frame image according to an embodiment of the present disclosure;

FIG. 8 is a coefficient image of a frame image after Hadamard transform according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of spatial transformation decomposition according to an embodiment of the present invention; FIG.

FIG. 10 is a schematic diagram of spatial downsampling decomposition according to an embodiment of the present invention; FIG.

FIG. 11 is a flowchart of a video receiving method according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of a decoded sub-image according to an embodiment of the present invention;

FIG. 13 is a schematic diagram of a decoded sub-image according to another embodiment of the present invention; FIG.

FIG. 14 is a schematic diagram of reconstructing an original image according to an embodiment of the present invention;

FIG. 15 is a schematic diagram of reconstructing an original image according to another embodiment of the present invention; FIG.

FIG. 16 is a structural diagram of a video transmission system according to an embodiment of the present invention;

FIG. 17 is a structural diagram of a receiving device according to an embodiment of the present invention;

FIG. 18 is a structural diagram of an unmanned aerial vehicle according to an embodiment of the present invention.

Reference mark:

11-Sub Image 12-Sub Image 13-Sub Image 14-Sub Image

21-Sub Image 22-Sub Image 23-Sub Image 24-Sub Image

31-Sub Image 32-Sub Image 33-Sub Image 34-Sub Image

41-Sub Image 42-Sub Image 43-Sub Image 44-Sub Image

51-sub image 52-sub image 53-sub image 54-sub image

61-sub image 62-sub image 63-sub image 64-sub image

310-Sub Video Data Unit 320-Sub Video Data Unit 330-Sub Video Data Unit

340-sub video data unit 410-sub video data unit 420-sub video data unit

430-Sub Video Data Unit 50-Image Frame 510-Sub Video Data Unit

520-sub video data unit 530-sub video data unit 540-sub video data unit

550-sub video data unit 560-sub video data unit 570-sub video data unit

580-sub video data unit 1600-video transmission system 1601-imaging device

1602-processor 1700-receiving device 1701-communication interface 1702-processor

1800-Unmanned aerial vehicle 1801-motor 1802-propeller

1803-Electronic governor 1804-Flight controller

1806-processor 1807-support device 1805-imaging device

detailed description

The technical solutions in the embodiments of the present invention will be clearly described with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

It should be noted that when a component is referred to as being "fixed" to another component, it can be directly on the other component or the component can be present. When a component is considered to "connect" another component, it can be directly connected to another component or possibly a central component.

All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. The terminology used in the description of the present invention is for the purpose of describing particular embodiments, and is not intended to invention. The term "and/or" used herein includes any and all combinations of one or more of the associated listed items.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below can be combined with each other without conflict.

The stability of the transmission delay of image data is an important indicator to measure the performance of the image transmission system. The image data transmission delay is the basic condition for ensuring the smooth display of the video image at the receiving end. However, in the process of image transmission, the source and channel are used. The real-time change will cause the jitter of the transmission delay between frames and frames, and reduce the performance of the image transmission system. First, with reference to Figure 1, the source data changes and channel changes are taken as examples to describe the frame-level image data transmission delay in detail. Jitter problem.

Figure 1 includes scenario 1 and scenario 2. In scenario 1, the bandwidth of the channel between the sender and the receiver remains stable. In the process of transmitting image data using the channel, it is assumed that the camera at the transmitting end suddenly moves, or the object within the camera shooting range suddenly moves rapidly, for example, at a certain moment, the camera's subject is a blue sky at the next moment. The camera suddenly turns to shoot a colorful hot air balloon flying in the sky, causing the corresponding stream data size after frame 4 encoding to increase to twice the size of the corresponding stream data after the frame 3 encoding, that is, the sudden change of the source At this time, the transmission delay of frame 4 will become twice the transmission delay of frame 3.

In scenario 2, the code stream data corresponding to each frame image is basically stable, that is, the source remains stable. In the process of transmitting image data, it is assumed that the channel bandwidth corresponding to frame 4 suddenly drops to half of the channel bandwidth corresponding to frame 3. For example, when an unmanned aerial vehicle carrying a photographing device shoots a subject, the subject is substantially unchanged. During the flight, the UAV suddenly approaches the nearby wireless communication base station. At this time, the wireless communication base station will affect the transmission channel of the UAV, that is, the channel changes, and the bandwidth of the channel decreases to half of the original bandwidth. Similarly, the transmission delay of frame 4 will also become twice the transmission delay of frame 3.

It can be seen from the description of FIG. 1 that whether the source change or the channel change causes frame-level image data transmission delay jitter, and the source change and the channel change are independent of each other, and it is difficult to predict, currently for each frame. Image data is encoded in a relatively fixed manner and cannot be adapted to real-time changing source channels.

The method for video transmission provided by the embodiment of the present invention is described in detail below with reference to FIG. 1 to effectively reduce the transmission delay jitter of the video data.

In order to solve the above problem, an embodiment of the present invention provides a video transmission method. Figure 2 is the basis A flowchart of a video transmission method provided by an embodiment of the invention. As shown in FIG. 2, the method in this embodiment may include:

Step S201, the video data is decomposed into a plurality of sub-video data units, wherein each sub-video data unit includes one or more sub-images.

The execution body of the embodiment may be a processor, a controller, or a general-purpose processor having an image processing function, and is not specifically limited herein. In this embodiment, an image processor is taken as an example to introduce the principle of a video transmission method. The image processor acquires video data captured by a shooting device mounted on an unmanned aerial vehicle in real time, and the video data may include one frame image or continuous multi-frame. The image processor may decompose the video data into a plurality of sub-video data units. The embodiment does not limit the number of sub-video data units obtained by decomposing the video data, and each sub-video data unit includes one or more sub-images. Wherein, a feasible implementation manner of decomposing the video data into a plurality of sub-video data units is: decomposing each image frame included in the video data, that is, each frame image into a plurality of sub-images, and selecting at least one sub-image of each image frame. The selected sub-image constitutes a sub-video data unit, that is, the sub-video data unit includes at least one of a plurality of sub-images obtained by decomposing each image frame in the video data. This embodiment does not limit the number of image frames included in one video data. To schematically illustrate the decomposition process of the video data, it is assumed that the video data includes 6 image frames, that is, 6 frames, and further, in other embodiments, the video The number of image frames included in the data may also be other values.

As shown in FIG. 3, the video data includes six image frames, which are frame 1, frame 2, frame 3, frame 4, frame 5, frame 6, and frame 1, frame 2, frame 3, frame 4, and frame 5. The frame 6 is separately decomposed. In this embodiment, the number of sub-images into which each image frame is decomposed is not limited. As shown in FIG. 3, each image frame is decomposed into four sub-images, which is only schematically illustrated herein. The number of sub-images obtained after each image frame is decomposed may also be other values. Each of the sub-video data units includes at least one of the four sub-images corresponding to each of the image frames after each of the six image frames is decomposed. Optionally, each sub-video data unit includes one of the four sub-images corresponding to each of the six image frames. As shown in FIG. 3, the sub-video data unit 310 includes one sub-image 11 of the frame 1. One sub-image 21 of frame 2, one sub-image 31 of frame 3, one sub-image 41 of frame 4, one sub-image 51 of frame 5, and one sub-image 61 of frame 6; similarly, sub-video data unit 320, sub- The video data unit 330 and the sub video data unit 340 respectively include one sub image of each of the six image frames.

In addition, the number of sub-images included in different sub-video data units may also be different. As shown in FIG. 4, the sub-video data unit 410 includes two

sub-images

11, 12 of the frame 1, and two

sub-images

21, 22 of the frame 2. Two

sub-images

31, 32 of frame 3, one sub-image 41 of frame 4, one sub-image 51 of frame 5, one sub-image 61 of frame 6; sub-video data unit 420 includes one sub-image 13 of frame 1, frame 2 One sub-image 23, one sub-image 33 of frame 3, two sub-images 42, 43 of frame 4, two sub-images 52, 53 of frame 5, two sub-images 62, 63 of frame 6, sub-video data unit 430 includes One sub-image of each of the six image frames.

Optionally, the sub-images included in each sub-video data unit do not overlap. The manner in which at least one of the plurality of sub-images corresponding to each of the plurality of image frames is combined to form a sub-video data unit may also have other combinations, which are not enumerated here.

In addition, the video data may also include only one image frame, that is, one frame image. As shown in FIG. 5, 50 indicates an image frame included in the video data, and the image frame 50 is decomposed. This embodiment does not limit one image frame. The number of sub-images obtained after the decomposition is selected, and the image frame 50 is optionally decomposed into four sub-images, such as the sub-image 11, the sub-image 12, the sub-image 13, and the sub-image 14 as shown in FIG. Sub-picture 11, sub-image 12, sub-image 13, and sub-image 14 can be divided into the following achievable ways:

One achievable manner is that each sub-video data unit includes one sub-image, such as sub-video data unit 510, sub-video data unit 520, sub-video data unit 530, and sub-video data unit 540 as shown in FIG.

Another achievable manner is that each sub-video data unit includes two sub-images. This embodiment does not limit the combination of two sub-images included in one sub-video data unit. Alternatively, the sub-image shown in FIG. Video data unit 550 and sub-video data unit 560, wherein sub-video data unit 550 includes sub-image 11 and sub-image 12, sub-video data unit 560 includes sub-image 13 and sub-image 14.

Yet another achievable manner is that each sub-video data unit includes a different number of sub-images, such as sub-video data unit 570 and sub-video data unit 580 as shown in FIG. 5, wherein sub-video data unit 570 includes three sub-pictures. The image, the sub-video data unit 580 includes one sub-image, or the sub-video data unit 570 includes one sub-image, and the sub-video data unit 580 includes three sub-images, which are not limited in this embodiment, and constitute three sub-images of one sub-video data unit. Combination, optional, sub-video data unit 570 includes sub-image 11, sub-image 12, sub-picture Like 13, the sub-video data unit 580 includes a sub-image 14.

Step S202: encoding the plurality of sub-video data units separately.

The image processor separately encodes each of the plurality of sub-video data units by using each sub-video data unit as a coding unit, and obtains a plurality of code stream data after encoding, and optionally encodes one sub-video data unit. Then, a code stream data is obtained, where the coding includes source coding and/or channel coding, and the manner of source coding may include H.263, H.264, H.265, MPEG4, etc., and the channel coding method may include correcting The error coding type may include an RS code, that is, a Reed-Solomon code, a convolutional code, a Turbo code, a Polar code, an interleaving code, a pseudo random sequence scrambling code, and the like.

Step S203: Select one or more encoded sub-video data units based on one or more characteristics of the channel and one or more characteristics of the sub-video data unit.

In this embodiment, one or more characteristics of the channel include at least a bandwidth. Alternatively, one or more characteristics of the channel include at least one of: noise, interference, signal to noise ratio, bit error rate, fading rate, bandwidth.

The one or more characteristics of the sub-video data unit include: a code stream data size encoded by the sub-video data unit, or an energy concentration of the sub-video data unit.

The image processor selects one or more of the plurality of encoded sub-video data units for transmission over the wireless channel based on one or more characteristics of the current wireless channel and one or more characteristics of the sub-video data unit, eg, transmitting To the receiving device, the receiving device can be a remote control smartphone, a tablet, a ground control station, a laptop, a watch, a wristband, and the like, and combinations thereof. Wherein, selecting one or more encoded sub-video data units can be implemented in the following feasible ways:

A first feasible way is to select one or more encoded sub-video data units such that the total code stream data size of the one or more encoded sub-video data units matches the channel bandwidth.

For example, the image processor decomposes the video data to obtain four sub-video data units 310-340 as shown in FIG. 3, and encodes the four sub-video data units separately to obtain the size of the code stream data as S0, S1, S2, and S3. The bandwidth of the current wireless channel is T, and the image processor may select one or more code stream data from the four code stream data according to the bandwidth T of the wireless channel, and the selection may be based on: one or more selected codes. The total size of the stream is as close as possible to the bandwidth T of the wireless channel. That is, one or more of the plurality of sub-video data units are selected for combination, so that the total code stream data size of the combined sub-video data unit is as close as possible to the bandwidth of the wireless channel, and the wireless channel can transmit the code stream with the current bandwidth. The child video data unit is as large as possible. For example, the sum of S0, S1, S2, and S3 is less than or equal to the current bandwidth of the channel, and the encoded sub-video data unit 310, the sub-video data unit 320, the sub-video data unit 330, and the sub-video data unit 340 may be selected for transmission. When the sum of S0, S1, S2, and S3 is greater than T, the three largest code stream data may be selected from S0, S1, S2, and S3, and the three largest code stream data are S0, S1, and S2, if S0, If the sum of S1 and S2 is less than T, the image processor may select the encoded sub-video data unit 310, the sub-video data unit 320, and the sub-video data unit 330 for transmission, and so on, when the sum of S0, S1, S2 is greater than T, other combinations of the code stream data that can be transmitted with the bandwidth of the current wireless channel and the largest sub-video data unit can be selected.

The second feasible way is that a plurality of sub-video data units are prioritized according to the energy concentration. The one or more encoded sub-video data units are selected based on a priority of the sub-video data unit and a channel bandwidth.

Since the energy concentration of each of the plurality of sub-video data units may be the same or similar, it may be different. Therefore, in other embodiments, if the energy concentration of each sub-video data unit is different, it may be based on each The energy concentration of the sub-video data units prioritizes the plurality of sub-video data units. Alternatively, the greater the energy concentration, the higher the priority. For example, the image processor decomposes the video data to obtain four sub-video data units as shown in FIG. 3, 4 or 5, and the four sub-video data units are respectively recorded as sub-video data unit A, sub-video data unit B, and sub-video. The data unit C, the sub-video data unit D, and the priority of the four sub-video data units are successively decremented, the code stream data size encoded by the sub-video data unit A is S0, and the code stream data encoded by the sub-video data unit B is sequentially obtained. The size of the code stream data encoded by the sub video data unit C is S2, and the size of the code stream data encoded by the sub video data unit D is S3.

In some cases, the image processor needs to select and transmit the sub-video data unit according to the code stream data size and channel bandwidth of the encoded sub-video data unit, and in some cases, according to the priority and channel of the sub-video data unit. The bandwidth selects the sub-video data unit and transmits it. For example, one or more encoded sub-video data units selected from the above four encoded sub-video data units may be selected according to one or more selected sub-views selected. The total code stream data size of the frequency data unit is smaller than the channel bandwidth, and the higher priority sub video data unit is preferentially selected, thereby ensuring that the high priority sub video data unit is preferentially transmitted. For example, when the sum of S0, S1, S2, and S3 is less than or equal to the current bandwidth of the channel, the encoded sub-video data unit A, sub-video data unit B, sub-video data unit C, and sub-video data unit D may be selected for transmission. When the sum of S0, S1, S2, and S3 is greater than T, determining the first three higher priority code stream data, that is, the encoded sub video data unit A, the encoded sub video data unit B, and the encoded sub- The video data unit C, if the sum of S0, S1, S2 is less than T, the image processor transmits the encoded sub-video data unit A, the encoded sub-video data unit B, and the encoded sub-video data unit C; If the sum of S0, S1, and S2 is greater than T, the first two sub-video data units A and sub-video data units B having higher priority are determined, and so on, to ensure that at least the sub-video data unit A having the highest priority is transmitted. .

In this embodiment, the video data is decomposed into a plurality of sub-video data units, and the plurality of sub-video data units are separately encoded, and one or more encoded sub-video data units are selected according to channel characteristics and characteristics of the sub-video data units. The selected one or more encoded sub-video data units conform to the channel characteristics, which can effectively solve the problem of mismatch between the source and the channel, and can effectively reduce the transmission delay of the video data due to the source channel mismatch problem. shake.

Embodiments of the present invention provide a video transmission method. FIG. 6 is a flowchart of a video transmission method according to another embodiment of the present invention. On the basis of the embodiment shown in FIG. 2, the video data includes one or more image frames. As shown in FIG. 6, the method in this embodiment may include:

Step S601: Decompose each of the one or more image frames in the video data into a plurality of sub-images.

On the basis of the above embodiments, the video data may include one frame image, and may also include consecutive multi-frame images. This embodiment does not limit the number of pixels included in one frame image, and does not limit the pixels of each pixel. value. When the image processor decomposes the video data, specifically, each frame image of one or more frames in the video data may be decomposed into a plurality of sub-images, and the sub-video data unit includes each of the image frames. At least one of the plurality of sub-images obtained after the decomposition is performed.

Specifically, each of the one or more image frames in the video data is decomposed into a plurality of sub-images. In this embodiment, a frame image included in the video data is taken as an example to introduce the frame image. The process of spatial decomposition can be achieved in several possible ways:

A first possible way to decompose each of the one or more image frames in the video data into a plurality of sub-images using a Fourier correlation transform or an orthogonal transform.

The Fourier correlation transform or orthogonal transform is selected from a Hadamard transform, a discrete cosine transform, a discrete Fourier transform, a Walsh-Hadamard transform, a Haar transform, or an oblique transform. This embodiment specifically takes the Hadamard transform as an example to introduce a process of spatially decomposing the frame image. In other embodiments, other spatial transforms may also be used to spatially decompose the frame image.

As shown in FIG. 7 , a schematic diagram of a frame image is used. This embodiment does not limit the number of pixels included in a frame image, and a frame image includes 16 pixels. For example, P1-P16 represents 16 pixels. The pixel value is spatially transformed and decomposed into pixel values of every four adjacent pixel points of the 16 pixel points, and is decomposed into four sub-images. The following is schematically illustrated by Hadamard transform, and the specific spatial transformation decomposition process includes The following steps:

Step 1. Perform a Hadamard transform by using four adjacent pixels of the 16 pixels as a unit. For example, the conversion coefficients obtained by the Hadamard transform of P1, P2, P3, and P4 are H1, H2, and H3. H4, wherein the relationship between P1, P2, P3, P4 and H1, H2, H3, H4 satisfies the formulas (1), (2), (3), (4):

H1=(P1+P2+P3+P4+1)>>1 (1)

H2=(P1+P2–P3–P4+1)>>1 (2)

H3=(P1+P3–P2–P4+1)>>1 (3)

H4=(P1+P4–P2–P3+1)>>1 (4)

According to formulas (1), (2), (3), (4), H1 contains the average energy of 4 pixels, H2 contains the average gradient of 4 pixels in the vertical direction, and H3 contains 4 pixels. In the horizontal gradient, H4 contains a cross gradient of 4 pixels, ie texture information. It can be seen that the energy concentration of H1 is the highest, the energy concentration of H2 and H3 is second, and the energy concentration of H4 is the smallest. When using P1, P2, P3, and P4 with H1, H2, H3, and H4, H1 is essential. H1 is the most important, followed by H2 and H3, and finally H4 is important. Therefore, H1 is the most important, H2 and H3 are the second most important, and H4 is the least important, that is, H1, H2, H3, and H4 are important. Sex is decremented in turn. Similarly, the same Hadamard transform is performed on P5-P8 to obtain H5-H8, and the energy concentration of H5, H6, H7 and H8 is successively decreased, and the importance of H5, H6, H7 and H8 is successively decreased, and P9-P12 is successively performed. The same Hadamard transform gets H9-H12, H9, The energy concentration of H10, H11 and H12 decreases in turn, and the importance of H9, H10, H11 and H12 decreases successively. The same Hadamard transformation is performed on P13-P16 to obtain the energy concentration of H13-H16, H13, H14, H15 and H16. The degrees are successively decremented, and the importance of H13, H14, H15, and H16 is successively decreased, and a coefficient image as shown in FIG. 8 is obtained.

Step 2: Decompose the conversion coefficients obtained by the Hadamard transform into different sub-images. This embodiment does not limit the number of sub-images obtained by spatially transforming each frame of the image, for example, the sub-images obtained after the decomposition. The number is four, which is only a schematic description. In other embodiments, the number of sub-images obtained by spatially transforming and decomposing each frame of image may be other values. Optionally, H1 is assigned to the first sub-image, H2 is assigned to the second sub-image, H3 is assigned to the third sub-image, and H4 is assigned to the fourth sub-image. Similarly, H5-H8 is decomposed into 4 sub-images, H9-H12 is decomposed into 4 sub-images, and H13-H16 is decomposed into 4 sub-images, and the decomposition result shown in FIG. 9 is obtained, wherein the sub-image 1 Concentrates the conversion coefficient with the highest energy concentration, sub-image 2 and sub-image 3 concentrate the conversion coefficient of energy concentration second, and sub-image 4 concentrates the conversion coefficient with the smallest energy concentration, so sub-image 1 is the most important, sub-image 2. The importance of sub-image 3 is second, sub-image 4 is of the least importance.

As can be seen from FIG. 9, the resolution of each of the four sub-images after the spatial transformation decomposition is one quarter of the original image before the decomposition.

A second possible way to use spatial downsampling to decompose each of the one or more image frames in the video data into a plurality of sub-images.

This embodiment does not limit the number of sub-images obtained by down-sampling the image space of each frame. For example, the number of sub-images obtained after decomposition is four, which is only schematically illustrated herein. In other embodiments, The number of sub-images obtained by down-sampling the image space of each frame may also be other values. On the basis of FIG. 7, spatially down-sampling the pixel values of every four adjacent pixel points of the 16 pixel points is performed. Decomposed into four sub-images, the specific spatial down-sampling decomposition process is: decomposing four pixel points in one unit into different sub-images by using four adjacent pixel points of 16 pixels as one unit. For example, P1 is decomposed into the first sub-image, P2 is decomposed into the second sub-image, P3 is decomposed into the third sub-image, and P4 is decomposed into the fourth sub-image. Similarly, P5-P8 is decomposed into 4 sub-images. For the image, P9-P12 is decomposed into 4 sub-images, and P13-P16 is decomposed into 4 sub-images, and the decomposition result as shown in FIG. 10 is obtained.

As can be seen from FIG. 10, the resolution of each of the four sub-images after spatial downsampling is one quarter of the original image before the decomposition. Without loss of generality, it is assumed that the size of the original image before decomposition is W*H. If the original image is decomposed into four sub-images, the row number or column number of the pixel matrix corresponding to the original image or sub-image is counted from 0, then the first The sub-images may include pixels in the original image with coordinates (2i, 2j), where

The second sub-image may include a pixel with coordinates (2i+1, 2j) in the original image, and the third sub-image may include a pixel with coordinates (2i, 2j+1) in the original image, and the fourth sub-image. The pixel in the original image with coordinates (2i+1, 2j+1) may be included.

Each of the one or more image frames in the video data can be decomposed into a plurality of sub-images according to any of the above-described decomposition modes, that is, spatial transformation or spatial downsampling. In this embodiment, one image frame is one frame image, and a plurality of image frames are multi-frame images. The video data includes one or more image frames, the sub-video data unit including at least one of a plurality of sub-images obtained by decomposing each of the image frames.

If there is an image frame in the video data, the image frame is decomposed in the manner shown in FIG. 9 or FIG. 10, and the sub-video data unit may include at least one of the plurality of sub-images obtained by decomposing the image frame, for example, one. The sub-video data unit includes a sub-image, and each sub-image obtained after the decomposition is compressed and encoded to obtain encoded code stream data.

If there are consecutive image frames in the video data, each image frame, that is, each frame image is decomposed as shown in FIG. 9 or FIG. 10, for example, the video data includes 24 image frames, and if each image frame is decomposed For 4 sub-images, 24 consecutive image frames are decomposed to obtain 24*4 sub-images, and each sub-video data unit may include multiple sub-images of 24*4 sub-images, specifically, the sub-video data unit At least one of the plurality of sub-images corresponding to each image frame obtained by decomposing each of the image frames, the sub-video data unit may be four, wherein the sub-video data unit Each of them includes a sub-image of each of the 24 image frames. The encoded code stream data is obtained by compressing and encoding each sub-video data unit.

As can be seen from FIG. 9 or FIG. 10, each sub-image includes a portion of an image frame. As shown in FIG. 10, each sub-image includes one or more pixels of an image frame. As shown in Figure 9, each sub-image includes one or more conversion coefficients of the image frame. When the receiving device reconstructs a frame of image as described in FIG. 7, in FIG. 10, the importance of the sub-image 1, the sub-image 2, the sub-image 3, and the sub-image 4 are the same. In FIG. 9, the sub-image 1 is the most important, and the sub-image 2 and the sub-image 3 are the second most important, and the sub-image 4 has the lowest importance. In this embodiment, the plurality of sub-video data units are prioritized according to the energy concentration. Sort by level. Since sub-image 1 concentrates the conversion coefficient with the highest energy concentration, the sub-picture The image 2 and the sub-image 3 concentrate the conversion coefficient of the energy concentration second, and the sub-image 4 concentrates the conversion coefficient with the smallest energy concentration. If a sub-video data unit includes a sub-image, the sub-video data unit including the sub-image 1 The priority of the sub-video data unit including sub-image 2 or sub-image 3 is the highest, and the sub-video data unit including sub-image 4 has the lowest priority.

Step S602, encoding the plurality of sub-video data units separately.

The encoding of the plurality of sub-video data units separately can be implemented in the following feasible ways:

A first possible way: multiple sub-video data units are encoded by a plurality of separate encoders.

Specifically, the plurality of sub-video data units are encoded in parallel by using the plurality of separate encoders; or the plurality of sub-video data units are encoded by using different video encoding rules; or, the same video encoding rule is adopted. The plurality of sub-video data units are encoded.

A second possible way: two or more of the plurality of sub-video data units are encoded by the same encoder.

A third possible way: encoding at least one of the plurality of sub-video data units based on a motion compensated video compression standard.

A fourth feasible way: compressing the plurality of sub-video data units according to different compression ratios. The compression ratio is determined based on one or more characteristics of the sub-video data unit.

Step S603: Select one or more encoded sub-video data units based on one or more characteristics of the channel and one or more characteristics of the sub-video data unit.

In this embodiment, if one sub-video data unit includes one sub-image, as shown in FIG. 9, the sub-image 1 is the most important, the sub-image 2, the sub-image 3 are of the second most important, and the sub-image 4 has the lowest importance. Then, the sub-video data unit including the sub-image 1 has the highest priority, and the sub-video data unit including the sub-image 2 or the sub-image 3 has the lower priority, and the sub-video data unit including the sub-image 4 has the lowest priority. The code stream data size encoded by the sub video data unit A including the sub image 1 is S0, the code stream data size encoded by the sub video data unit B including the sub image 2 is S1, and the sub video including the sub image 3 is used. The code stream data size encoded by the data unit C is S2, and the code stream data size encoded by the sub video data unit D including the sub image 4 is S3. The image processor can receive and transmit error status according to historical data, priority of the current sub-video data unit, The size of the code stream data encoded by the current sub-video data unit, and the estimated value of the real-time channel, such as the channel bandwidth, are combined and transmitted to the four sub-video data units to achieve real-time matching between the source and the channel. For example, when the sum of S0, S1, S2, and S3 is less than or equal to the current bandwidth of the channel, the encoded sub-video data unit A, sub-video data unit B, sub-video data unit C, and sub-video data unit D may be selected for transmission. When the sum of S0, S1, S2, and S3 is greater than T, determining the first three higher priority code stream data, that is, the encoded sub video data unit A, the encoded sub video data unit B, and the encoded sub- The video data unit C, if the sum of S0, S1, S2 is less than T, the image processor transmits the encoded sub-video data unit A, the encoded sub-video data unit B, and the encoded sub-video data unit C; If the sum of S0, S1, S2 is greater than T, then the first two higher priority sub-video data units A, sub-video data units B are determined, and so on, ensuring that at least the highest priority sub-video data unit A is transmitted.

In some embodiments, if one sub-video data unit includes one sub-image, as shown in FIG. 10, sub-image 1, sub-image 2, sub-image 3, sub-image 4 have the same importance, or in some cases The importance of the image 1, the sub-image 2, the sub-image 3, and the sub-image 4 cannot be determined, that is, the sub-video data unit A including the sub-image 1, the sub-video data unit B including the sub-image 2, and the sub-video including the sub-image 3. The data unit C, the sub-video data unit D including the sub-image 4 have the same priority or the priority cannot be determined, and the code stream data size encoded by the sub-video data unit A is S0, and the code encoded by the sub-video data unit B The stream data size is S1, the code stream data size encoded by the sub video data unit C is S2, and the code stream data size encoded by the sub video data unit D is S3. The image processor can jointly transmit the four sub-video data units according to the transmission and error status of the historical data, the current stream data size of the sub-video data unit, and the estimated value of the real-time channel, such as the channel bandwidth, to achieve real-time source and channel. match. For example, the sum of S0, S1, S2, and S3 is less than or equal to the current bandwidth of the channel, and the encoded sub-video data unit A, the sub-video data unit B, the sub-video data unit C, and the sub-video data unit D may be selected for transmission. When the sum of S0, S1, S2, and S3 is greater than T, the three largest code stream data can be selected from S0, S1, S2, and S3, and the three largest code stream data are S0, S1, S2, if S0, S1 If the sum of S2 is less than T, the image processor may select the encoded sub-video data unit A, the sub-video data unit B, and the sub-video data unit C to transmit, and so on, when the sum of S0, S1, S2 is greater than T. It is possible to select other combinations of the code stream data that can be transmitted with the bandwidth of the current wireless channel and the largest sub-picture data unit.

In addition, if the amount of transmitted data corresponding to the delay jitter allowed by the wireless channel is D, the combined code stream data size may fall within the range of [T-D, T+D].

In this embodiment, by decomposing each of the one or more image frames in the video data into a plurality of sub-images, the specific spatial decomposition includes spatial transformation decomposition and spatial down sampling decomposition, so that the sub-image obtained after the decomposition includes the image frame. One or more pixels, or one or more conversion coefficients including an image frame, the sub-images obtained by the decomposition are combined into a sub-video data unit, and when the encoded sub-video data unit is transmitted, the sub-video data unit is encoded The code stream data size matches the characteristics of the channel (such as bandwidth), or the priority of the sub-video data unit is matched to the characteristics of the channel (such as bandwidth), which enables source and channel matching and reduces the frame level. The transmission delay jitter caused by the video channel mismatch problem.

Embodiments of the present invention provide a video receiving method. FIG. 11 is a flowchart of a video receiving method according to an embodiment of the present invention. As shown in FIG. 11, the method in this embodiment may include:

Step S1101, receiving a plurality of encoded sub-video data units.

And wherein the video data includes one or more image frames, and the sub-video data unit includes at least one of a plurality of sub-images obtained by decomposing each of the image frames.

Specifically, when the video data includes one or more frames of images, the decomposition of the video data may be as shown in FIG. 3, FIG. 4 or FIG. 5, and the sub-video data unit is obtained; wherein each frame of the video data is performed. The decomposition may be as shown in FIG. 9 or FIG. 10, and the specific decomposition process is consistent with the foregoing embodiment, and details are not described herein again.

In this embodiment, the receiving device receives a plurality of encoded sub-video data units transmitted by the communication system of the unmanned aerial vehicle, wherein the video data may include one or more image frames, and the unmanned aerial vehicle transmits the video data before transmitting the video data. Each image frame included in the data is decomposed into a plurality of sub-images. For the specific decomposition method, refer to the foregoing section, where the sub-video data unit includes at least one of a plurality of sub-images corresponding to each image frame decomposed by each image frame. , that is, each sub-video data unit includes at least one sub-image of each image frame to reduce the correlation between the sub-images in each sub-video data unit, to avoid loss or distortion due to the sub-video data unit during transmission. The resulting sub-images with high correlation are lost or distorted. Because the correlation between the sub-images obtained after the same frame image is decomposed is higher, and the sub-images with higher correlation are lost. Or when it is distorted, it will be difficult to recover an image frame composed of a highly correlated sub-image.

For example, the receiving device receives 4 sub-video data units, and each sub-video data unit includes a sub-image obtained as shown in FIG. 9 or FIG.

Step S1102: Decode the plurality of encoded sub-video data units.

Specifically, the plurality of encoded sub-video data units are separately decoded. For example, the receiving device separately decodes the four encoded sub-video data units, that is, separately performs code stream data corresponding to the four sub-images. Decode to obtain the decoded sub-video data unit. When the sub-video data unit is transmitted in the wireless channel, the sub-video data unit obtained by the receiving device and the encoded system actually transmitted by the communication system of the UAV may be caused due to noise interference, multipath effect, fading, and the like. The video data units are different, causing the receiving device to receive an error.

If the communication system transmits the four encoded sub-images as shown in FIG. 9, the four sub-images obtained by the receiving device after decoding the code stream data are as shown in FIG. 12, wherein if the sub-image 1 is transmitted correctly, H1 Same as h1, H2 and h2 are the same, H3 and h3 are the same, H4 and h4 are the same, and if sub-image 1 is transmitted incorrectly, at least one of H1 and h1, H2 and h2, H3 and h3, H4 and h4 are different. Similarly, other sub-pictures are transmitted correctly or transmitted incorrectly, and the conversion coefficients before transmission and the conversion coefficients after transmission also have the same relationship.

If the communication system transmits the four encoded sub-images as shown in FIG. 10, the four sub-images obtained by the receiving device after decoding the code stream data are as shown in FIG. 13, wherein if the sub-image 1 is transmitted correctly, P1 The same as p1, P2 and p2 are the same, P3 and p3 are the same, and P4 and p4 are the same. If sub-image 1 is transmitted incorrectly, at least one of P1 and p1, P2 and p2, P3 and p3, P4 and p4 are different. Similarly, other sub-images are transmitted correctly or transmitted incorrectly, and the pixels before transmission and the pixels after transmission also have the same relationship.

Step S1103: Reconstruct the video data according to the decoded sub video data unit.

Specifically, a transmission error of one or more sub-images of the sub-video data unit is detected, and the video data is reconstructed according to receiving the correct sub-image. For example, after the receiving device decodes the code stream data to obtain 4 sub-images, it detects that each sub-image is transmitted correctly or transmits an error, and reconstructs the original image according to receiving the correct sub-image.

In the present embodiment, the sub-image transmitted by the communication system is the sub-image shown in FIG. 9, and the sub-image received by the receiving device is as shown in FIG. 12. In FIG. 12, it is assumed that the sub-image 2 receives an error, and the sub-image 1, The sub-image 3 and the sub-image 4 are all received correctly, and the receiving device reconstructs the original image according to the sub-image 1, the sub-image 3, and the sub-image 4 shown in FIG. 12, and reconstructs the original image, reconstructs the video data by using an inverse transform, and gives The sub-picture in which the error is transmitted in the sub-picture data unit is assigned a value. A possible implementation is to assign a value of 0 to the sub-picture that transmits the error in the sub-video data unit. For example, in FIG. 12, h2, h6, h10, and h14 of the sub-image 2 are both set to 0. As can be seen from the above embodiment, H1, H2, H3, and H4 are obtained based on P1, P2, P3, and P4, When reconstructing the original image, it is necessary to perform inverse Hadamard transform on h1, h2, h3, and h4. If spatial transformation is used to decompose the image frame, other spatial transformations are used. When the receiving device reconstructs the original image, the corresponding spatial transformation is adopted. In the inverse transformation of this embodiment, after hdama inverse transformation of h1, h2, h3, h4, p1, p2, p3, p4 are obtained, wherein h1, h2, h3, h4 and p1, p2, p3, p4 The relationship satisfies the formulas (5), (6), (7), (8):

P1=(h1+h2+h3+h4+1)>>1 (5)

P2=(h1+h2-h3-h4+1)>>1 (6)

P3=(h1+h3-h2-h4+1)>>1 (7)

P4=(h1+h4-h2-h3+1)>>1 (8)

Where h2 is 0, H1 and h1 are the same, H3 and h3 are the same, and H4 and h4 are the same. Therefore, p1, p2, p3, p4 obtained by inverse Hadamard transform and pixel values P1, P2, P3 in the original image, P4 may be different, but reconstructing the original image based on receiving the correct sub-image ensures that the reconstructed image is close to the original image. Similarly, h5, h6, h7, h8 are inversely transformed by Hadamard to obtain p5, p6, p7, p8, and h9, h10, h11, h12 are inversely transformed by Hadamama to obtain p9, p10, p11, p12, h13, H14, h15, h16 perform inverse Hadamard transform to obtain p13, p14, p15, p16, wherein h6, h10, h14 are all 0, and reconstruct the original image according to p1-p16 obtained by inverse Hadamard transform, as shown in Fig. 14. .

In other embodiments, the sub-image transmitted by the communication system is the sub-image shown in FIG. 10, and the sub-image received by the receiving device is as shown in FIG. 13, and in FIG. 13, it is assumed that the sub-image 3 receives the error, and the sub-image 1, The sub-image 2 and the sub-image 4 are all received correctly, and the receiving device reconstructs the original image according to the sub-image 1, the sub-image 2, the sub-image 3, and the sub-image 4 shown in FIG. 13, and when the original image is reconstructed, the sub-video data unit is given. The sub-image of the transmission error is assigned a value. Another feasible implementation is that the value assigned to the sub-picture transmitted in the sub-video data unit is determined by interpolation, specifically, for transmission in the sub-video data unit. The value of the wrong sub-image assignment is based on the transmission The correct sub-image is determined, and the sub-image transmitted incorrectly and the sub-image transmitted correctly are from the same image frame. For example, in FIG. 13, the sub-image 3 receives an error, and the sub-image 1, the sub-image 2, and the sub-image 4 are all received correctly, and the sub-image 3 does not participate in the reconstruction process, that is, the receiving device only according to the sub-image 1, the sub-image 2, The sub-image 4 reconstructs the original image. The specific process is as follows: since the original image includes 16 pixels, the sub-image 1, the sub-image 2, and the sub-image 4 have a total of 12 pixels. According to FIG. 10, each of the 16 pixels in the original image is known. Four adjacent pixels are decomposed into four different sub-images. Therefore, when the original image is reconstructed from the sub-image 1, sub-image 2, and sub-image 4, the first pixel p1 of the sub-image 1 and the sub-image 2 The first pixel p2 and the first pixel p4 of the sub-image 4 are respectively three pixels P1, P2, and P4 of the first four adjacent pixels of the original image. Similarly, p5, p6, and p8 are original. The three pixels P5, P6, P8, p9, p10, and p12 in the image P5-P8 are the three pixels P9, P10, and P12 in the original image P9-P12, respectively, and the p13, p14, and p16 are the original images P13-P16, respectively. The three pixels P 13 , P 14 , P 16 in accordance with p1, p2, p4, p5, p6, p8, p9, p10, p12, p13, p14 p16 obtained image A shown in FIG. 15, the pixel image A spare reception error that is included in the sub-image pixel 3 due to reception error sub-image 3, and therefore, the sub-image 3 can not participate in the process of reconstruction of the original image. Since p1, p2, p4, p5, p6, p8, p9, p10, p12, p13, p14, and p16 are all correctly received, ie, p1, p2, p4, p5, p6, p8, p9, p10, p12, p13 , p14, p16 are the same as the pixel points at the same position of the original image. In this embodiment, the pixel value of the image A can be determined according to the interpolation method. A feasible interpolation method is: p3 is equal to the arithmetic average of p1, p2, and p4 The value, p7 is equal to the arithmetic mean of p5, p6, p8, p11 is equal to the arithmetic mean of p9, p10, p12, and p15 is equal to the arithmetic mean of p13, p14, p16, resulting in the reconstructed original image B. Here, only one method of interpolating the pixel value of the image A is provided, and those skilled in the art may use other interpolation methods to determine the residual pixel value of the image A, which is not specifically limited herein.

In this embodiment, the receiving device separately decodes the plurality of sub-video data units to obtain the decoded sub-video data unit, and reconstructs the original image according to the decoded sub-video data unit, and specifically, receives the correct sub-image reconstruction according to the decoding. The original image can make the reconstructed image be close to the original image to the greatest extent when the partial sub-image is received incorrectly, which improves the fault tolerance of the receiving device in reconstructing the image and enhances the robustness of the system.

Embodiments of the present invention provide a video transmission system. FIG. 16 is a structural diagram of a video transmission system according to an embodiment of the present invention. As shown in FIG. 16, the video transmission system 1600 includes one or more imaging devices 1601, and one or more processors 1602 on a movable object. Or the plurality of imaging devices 1601 are configured to acquire video data; the one or more processors 1602 work alone or in concert, configured to: decompose the video data into a plurality of sub-video data units, wherein each sub-video data unit includes one or Multiple sub-images; encoding the plurality of sub-video data units separately; and selecting one or more encoded sub-videos based on one or more characteristics of the channel and one or more characteristics of the sub-video data unit Data unit and transfer.

Wherein one or more characteristics of the channel include at least a bandwidth. Alternatively, the one or more characteristics of the channel include at least one of: noise, interference, signal to noise ratio, bit error rate, fading rate, bandwidth.

The one or more characteristics of the sub video data unit include: a code stream data size encoded by the sub video data unit, or an energy concentration of the sub video data unit.

Optionally, the total code stream data size of the one or more encoded sub-video data units matches the channel bandwidth. Alternatively, the plurality of sub-video data units are prioritized according to the energy concentration. The processor 1602, when selecting one or more encoded sub-video data units, is configured to select the one or more encoded sub-video data units based on a priority and a channel bandwidth of the sub-video data units.

The specific principles and implementation manners of the video transmission system provided by the embodiment of the present invention are similar to the embodiment shown in FIG. 2, and details are not described herein again.

In this embodiment, the video data is decomposed into a plurality of sub-video data units, and the plurality of sub-video data units are separately encoded, and one or more encoded sub-video data units are selected according to channel characteristics and characteristics of the sub-video data units. The selected one or more encoded sub-video data units are made to conform to channel characteristics, and when the selected one or more encoded sub-video data units are transmitted on the channel matched thereto, the source and the source can be effectively solved. The channel mismatch problem can effectively reduce the transmission delay jitter of video data due to the source channel mismatch problem.

Embodiments of the present invention provide a video transmission system. Based on the technical solution provided by the embodiment shown in FIG. 16, the video data includes one or more image frames. Processor 1602 will When the video data is decomposed into a plurality of sub-video data units, configured to: decompose each of the one or more image frames in the video data into a plurality of sub-images, wherein each of the sub-video data units At least one of the plurality of sub-images obtained by each of the image frames is included. Each sub-image includes a portion of the image frame. Specifically, each sub-image includes one or more pixels of the image frame. Alternatively, each sub-image includes one or more conversion coefficients of the image frame.

Specifically, when the processor 1602 decomposes each of the one or more image frames in the video data into a plurality of sub-images, the processor 1602 is configured to: each of the one or more image frames in the video data A space is decomposed into multiple sub-images.

When the processor 1602 decomposes each of the one or more image frames in the video data into a plurality of sub-images, the processor 1602 is configured to: use one of the video data by Fourier correlation transform or orthogonal transform Or each of the plurality of image frames is decomposed into a plurality of sub-images. Wherein, the Fourier correlation transform or orthogonal transform is selected from a Hadamard transform, a discrete cosine transform, a discrete Fourier transform, a Walsh-Hadamard transform, a Haar transform or a skew transform.

Alternatively, when the processor 1602 decomposes each of the one or more image frames in the video data into a plurality of sub-images, the processor 1602 is configured to: spatially downsample one or more images in the video data Each space in the frame is decomposed into multiple sub-images.

In addition, when the processor 1602 separately encodes the plurality of sub video data units, the following may be implemented in several feasible manners:

The first possible way:

The processor 1602 controls a plurality of encoders to encode the plurality of sub-video data units. Specifically, the processor 1602 controls the plurality of encoders to perform parallel encoding on the plurality of sub video data units; or the processor 1602 controls the plurality of encoders to respectively use the different video encoding rules to the plurality of sub video data. The unit performs encoding; or alternatively, the processor 1602 controls the plurality of encoders to encode the plurality of sub-video data units using the same video encoding rule.

A second possible way: the processor 1602 controls the encoder to encode two or more of the plurality of sub-video data units.

A third possible way: the processor 1602 controls the encoder to encode at least one of the plurality of sub-video data units based on a motion compensated video compression standard.

A fourth possible way: the processor 1602 is configured to compress the plurality of sub-video data units according to different compression ratios. Wherein the compression ratio is determined according to one or more characteristics of the sub-video data unit.

The encoder may be a hardware entity that is independent of the processor 1602 and is electrically connected to the processor 1602, or may be software for implementing the encoding function in the processor 1602.

In this embodiment, the movable object is an unmanned aerial vehicle. One or more imaging devices 1601 are coupled to the movable object by a carrier, which may be a multi-axis universal joint.

The specific principles and implementations of the video transmission system provided by the embodiment of the present invention are similar to the embodiment shown in FIG. 2-10, and details are not described herein again.

In this embodiment, by decomposing each of the one or more image frames in the video data into a plurality of sub-images, the specific spatial decomposition includes spatial transformation decomposition and spatial down sampling decomposition, so that the sub-image obtained after the decomposition includes the image frame. One or more pixels, or one or more conversion coefficients including an image frame, for a frame image, a plurality of sub-images obtained by the decomposition thereof may be combined and transmitted, so that the combined code stream data size and channel The bandwidth matching, or the combined code stream data further includes code stream data corresponding to the sub-images with higher importance, so that each frame image can be matched on the wireless channel to achieve source and channel matching, and the frame is reduced. The transmission delay jitter caused by the level channel video data due to the source channel mismatch problem.

The embodiment of the invention provides a receiving device. FIG. 17 is a structural diagram of a receiving device according to an embodiment of the present invention. As shown in FIG. 17, the receiving device 1700 includes a communication interface 1701, one or more processors 1702, and one or more processors 1702 work alone or in cooperation. The interface 1701 is communicatively coupled to the processor 1702; the communication interface 1701 is configured to receive a plurality of encoded sub-video data units, wherein the video data includes one or more image frames, and the sub-video data unit includes the At least one of the plurality of sub-images obtained after each of the image frames is decomposed; the one or more processors 1702 are configured to: control the decoder to decode the plurality of encoded sub-video data units; The subsequent sub-video data unit reconstructs the video data.

Optionally, when the processor 1702 controls the decoder to decode the plurality of encoded sub-video data units, the processor 1702 is specifically configured to: control the decoder to separately decode the plurality of encoded sub-video data units. Wherein, the decoder may be independent of the processor 1702 and with the processor A hardware entity that is electrically connected to 1702 may also be software for implementing a decoding function in the processor 1702.

In addition, the processor 1702 is further configured to: detect a transmission error of one or more sub-images of the sub-video data unit; when the processor 1702 reconstructs the video data according to the decoded sub-video data unit, specifically, according to the receiving The correct sub-image reconstructs the video data.

In addition, the processor 1702 is further configured to assign a value to the sub-image in which the error is transmitted in the sub-video data unit.

Wherein, the value assigned to the sub-image in which the error is transmitted in the sub-video data unit is 0.

Alternatively, the value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is determined by interpolation. The value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is determined based on the transmission of the correct sub-image, and the sub-image of the transmission error and the sub-image of the transmission are from the same image frame.

When the processor 1702 reconstructs the video data according to the decoded sub video data unit, the processor 1702 is specifically configured to: reconstruct the video data by using an inverse transform.

The receiving device may be a remote controller, a smart phone, a tablet computer, a ground control station, a laptop computer, a watch, a wristband, and the like, and combinations thereof.

The specific principles and implementations of the receiving device provided by the embodiment of the present invention are similar to the embodiment shown in FIG. 11-15, and details are not described herein again.

An embodiment of the present invention provides a control terminal, where the control terminal includes the receiving device described in the foregoing embodiment. For example, the control terminal can be a remote control, a smart phone, a tablet, a ground control station, a laptop, a watch, a wristband, etc., and combinations thereof, and can also control the unmanned aerial vehicle on the ground.

Embodiments of the present invention provide an unmanned aerial vehicle. FIG. 18 is a schematic diagram of an embodiment of the present invention A structural diagram of a human aircraft, as shown in FIG. 18, the unmanned aerial vehicle 1800 includes: a fuselage, a power system, and a video transmission system, the power system including at least one of the following: a motor 1801, a propeller 1802, and an electronic governor 1803. A power system is mounted to the airframe for providing flight power; a flight controller 1804 is in communication with the power system for controlling the unmanned flight 1806, wherein the flight controller 1804 includes an inertial measurement unit And gyroscopes. The inertial measurement unit and the gyroscope are configured to detect an acceleration, a pitch angle, a roll angle, a yaw angle, and the like of the unmanned aerial vehicle.

The video transmission system includes one or more imaging devices 1805, and one or more processors 1806 disposed on the movable object. The imaging device 1805 is coupled to the body through the support device 1807, and the processor 1806 and the imaging device 1805 are communicatively coupled. The supporting device 1807 may specifically be a pan/tilt. The principle and implementation of the video transmission system are similar to the above embodiments, and are not described herein again.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.

The above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium. The above software functional unit is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, or A network device or the like) or a processor performs part of the steps of the method described in various embodiments of the present invention. The foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program codes. .

A person skilled in the art can clearly understand that for the convenience and brevity of the description, only the division of each functional module described above is exemplified. In practical applications, the above function assignment can be completed by different functional modules as needed, that is, the device is installed. The internal structure is divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention, and are not intended to be limiting; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that The technical solutions described in the foregoing embodiments may be modified, or some or all of the technical features may be equivalently replaced; and the modifications or substitutions do not deviate from the technical solutions of the embodiments of the present invention. range.

Claims

A video transmission method, comprising:

Decomposing video data into a plurality of sub-video data units, wherein each sub-video data unit includes one or more sub-images;

Encoding the plurality of sub-video data units separately;

One or more encoded sub-video data units are selected and transmitted based on one or more characteristics of the channel, and one or more characteristics of the sub-video data unit.
The method of claim 1 wherein said video data comprises one or more image frames;

Decomposing the video data into a plurality of sub-video data units, including:

Decomposing each of the one or more image frames in the video data into a plurality of sub-images, wherein each of the sub-video data units includes a plurality of sub-images obtained by each of the image frames At least one sub-image.
The method of claim 2 wherein each sub-image comprises a portion of said image frame.
The method of claim 3 wherein each of said sub-images comprises one or more pixels of said image frame.
The method of claim 3 wherein each of said sub-images comprises one or more conversion coefficients of said image frame.
The method according to any one of claims 2 to 5, wherein the decomposing each of the one or more image frames in the video data into a plurality of sub-images comprises:

Decomposing each of the one or more image frames in the video data into a plurality of sub-images.
The method according to claim 6, wherein the decomposing each of the one or more image frames in the video data into a plurality of sub-images comprises:

Each of the one or more image frames in the video data is decomposed into a plurality of sub-images using a Fourier correlation transform or an orthogonal transform.
The method according to claim 7, wherein said Fourier correlation transform or orthogonal transform is from Hadamard transform, discrete cosine transform, discrete Fourier transform, Walsh-Hadamard transform, Hal Selected in transform or skew transform.
The method according to claim 7, wherein the decomposing each of the one or more image frames in the video data into a plurality of sub-images comprises:

Each of the one or more image frames in the video data is decomposed into a plurality of sub-images using spatial downsampling.
The method of any of claims 1-9, wherein the one or more characteristics of the channel comprise at least a bandwidth.
The method of any of claims 1-9, wherein the one or more characteristics of the channel comprise at least one of the following:

Noise, interference, signal to noise ratio, bit error rate, fading rate, bandwidth.
The method of any of claims 1-11, wherein one or more characteristics of the sub-video data unit comprises:

The size of the code stream data encoded by the sub video data unit or the energy concentration of the sub video data unit.
The method according to claim 12, wherein said selecting one or more encoded sub-video data units comprises:

One or more encoded sub-video data units are selected such that the total code stream data size of the one or more encoded sub-video data units matches the channel bandwidth.
The method according to claim 12, wherein said plurality of sub-video data units are prioritized according to said energy concentration.
The method according to claim 14, wherein said selecting one or more encoded sub-video data units comprises:

The one or more encoded sub-video data units are selected based on a priority of the sub-video data unit and a channel bandwidth.
The method according to any one of claims 1 to 15, wherein the encoding the plurality of sub-video data units separately comprises:

The plurality of sub-video data units are encoded by a plurality of separate encoders.
The method of claim 16 wherein said plurality of sub-video data units are encoded by a plurality of separate encoders, comprising:

The plurality of sub-video data units are encoded in parallel using the plurality of separate encoders.
The method of claim 16 wherein said plurality of sub-videos The unit is encoded by a number of separate encoders, including:

The plurality of sub-video data units are encoded using different video coding rules.
The method of claim 16 wherein said plurality of sub-video data units are encoded by a plurality of separate encoders, comprising:

The plurality of sub-video data units are encoded using the same video encoding rules.
The method according to any one of claims 1 to 15, wherein the encoding the plurality of sub-video data units separately comprises:

Two or more of the plurality of sub-video data units are encoded by the same encoder.
The method according to any one of claims 1 to 15, wherein the encoding the plurality of sub-video data units separately comprises:

At least one of the plurality of sub-video data units is encoded based on a motion compensated video compression standard.
The method according to any one of claims 1 to 15, wherein the encoding the plurality of sub-video data units separately comprises:

The plurality of sub-video data units are compressed according to different compression ratios.
The method of claim 22 wherein said compression ratio is determined based on one or more characteristics of said sub-video data unit.
A video receiving method, comprising:

Receiving a plurality of encoded sub-video data units;

Decoding the plurality of encoded sub-video data units;

Reconstructing the video data according to the decoded sub-video data unit, wherein the video data includes one or more image frames, and the sub-video data unit includes a plurality of sub-decompositions obtained by decomposing each of the image frames At least one sub-image in the image.
The method according to claim 24, wherein the decoding the plurality of encoded sub-video data units comprises:

Decoding the plurality of encoded sub-video data units separately.
The method of claim 24 or 25, further comprising:

Detecting a transmission error of one or more sub-images of the sub-video data unit;

Reconstructing the video data according to the decoded sub video data unit includes:

The video data is reconstructed based on receiving the correct sub-image.
The method of claim 26, further comprising:

A value is assigned to the sub-image in which the error is transmitted in the sub-video data unit.
The method according to claim 27, wherein a value of 0 is assigned to the sub-picture in which the error is transmitted in the sub-video data unit.
The method according to claim 27, wherein the value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is determined by interpolation.
The method according to claim 29, wherein the value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is determined based on the transmission of the correct sub-image, and the sub-image of the transmission error and the The correct sub-image is transmitted from the same image frame.
The method according to claim 28 or 29, wherein the reconstructing the video data according to the decoded sub-video data unit comprises:

The video data is reconstructed using an inverse transform.
A video transmission system, comprising:

One or more imaging devices configured to acquire video data;

One or more processors on the movable object, working alone or in concert, the processor being configured to:

Decomposing video data into a plurality of sub-video data units, wherein each sub-video data unit includes one or more sub-images;

Encoding the plurality of sub-video data units separately;

One or more encoded sub-video data units are selected and transmitted based on one or more characteristics of the channel, and one or more characteristics of the sub-video data unit.
The system of claim 32 wherein said video data comprises one or more image frames;

When the processor decomposes the video data into a plurality of sub-video data units, it is configured to:

Decomposing each of the one or more image frames in the video data into a plurality of sub-images, wherein each of the sub-video data units includes a plurality of sub-images obtained by each of the image frames At least one sub-image.
The system of claim 33 wherein each sub-image comprises a portion of said image frame.
The system of claim 34 wherein each of said sub-images comprises one or more pixels of said image frame.
The system of claim 34 wherein each of said sub-images comprises one or more conversion coefficients of said image frame.
A system according to any of claims 33-36, wherein the processor, when decomposing each of the one or more image frames in the video data into a plurality of sub-images, is configured to:

Decomposing each of the one or more image frames in the video data into a plurality of sub-images.
The system of claim 37, wherein the processor, when decomposing each of the one or more image frames in the video data into a plurality of sub-images, is configured to:

Each of the one or more image frames in the video data is decomposed into a plurality of sub-images using a Fourier correlation transform or an orthogonal transform.
The system according to claim 38, wherein said Fourier correlation transform or orthogonal transform is from Hadamard transform, discrete cosine transform, discrete Fourier transform, Walsh-Hadamard transform, Hal Selected in transform or skew transform.
The system of claim 37, wherein the processor, when decomposing each of the one or more image frames in the video data into a plurality of sub-images, is configured to:

Each of the one or more image frames in the video data is decomposed into a plurality of sub-images using spatial downsampling.
A system according to any of claims 32-40, wherein one or more characteristics of the channel comprise at least a bandwidth.
A system according to any one of claims 32 to 40, wherein one or more characteristics of the channel comprise at least one of the following:

Noise, interference, signal to noise ratio, bit error rate, fading rate, bandwidth.
A system according to any of claims 32-42, wherein one or more characteristics of said sub-video data unit comprise:

The size of the code stream data encoded by the sub video data unit, or the sub video data The energy concentration of the unit.
The system of claim 43 wherein said processor, when selecting one or more encoded sub-video data units, is configured to:

One or more encoded sub-video data units are selected such that the total code stream data size of the one or more encoded sub-video data units matches the channel bandwidth.
The system of claim 35 wherein said plurality of sub-video data units are prioritized in accordance with said energy concentration.
The system of claim 45, wherein the processor, when selecting one or more encoded sub-video data units, is configured to:

The one or more encoded sub-video data units are selected based on a priority of the sub-video data unit and a channel bandwidth.
A system according to any of claims 32-46, wherein

The processor is further configured to control a plurality of encoders to encode the plurality of sub-video data units.
The system according to claim 47, wherein said processor is specifically configured to control said plurality of encoders to encode said plurality of sub-video data units in parallel.
The system according to claim 47, wherein the processor is specifically configured to control the plurality of encoders to respectively encode the plurality of sub-video data units by using different video encoding rules.
The system according to claim 47, wherein said processor is specifically configured to control said plurality of encoders to encode said plurality of sub-video data units using the same video encoding rule.
The system of any of claims 32-46, wherein the processor is further configured to control an encoder to encode two or more of the plurality of sub-video data units.
A system according to any of claims 32-46, wherein the processor is further for controlling the encoder to encode at least one of the plurality of sub-video data units based on a motion compensated video compression standard.
The system according to any one of claims 32-46, wherein when the processor separately encodes the plurality of sub-video data units, the processor is configured to:

The plurality of sub-video data units are compressed according to different compression ratios.
The system of claim 53 wherein said compression ratio is determined based on one or more characteristics of said sub-video data unit.
A system according to any of claims 32-54, wherein the movable object is an unmanned aerial vehicle.
A system according to any of claims 32-54, wherein said one or more imaging devices are coupled to said movable object by a carrier.
The system of claim 56 wherein said carrier is a multi-axis joint.
A receiving device, comprising: a communication interface, one or more processors, working alone or in cooperation, wherein the communication interface is in communication with the processor;

The communication interface is configured to receive a plurality of encoded sub-video data units;

The one or more processors are configured to: control a decoder to decode the plurality of encoded sub-video data units; reconstruct the video data according to the decoded sub-video data unit, wherein the video data includes one Or a plurality of image frames, the sub-video data unit including at least one of the plurality of sub-images obtained by decomposing each of the image frames.
The receiving device according to claim 58, wherein the processor controls the decoder to decode the plurality of encoded sub-video data units, specifically for:

The control decoder separately decodes the plurality of encoded sub-video data units.
The receiving device according to claim 58 or 59, wherein the processor is further configured to: detect a transmission error of one or more sub-images of the sub-video data unit;

When the processor reconstructs the video data according to the decoded sub video data unit, the processor is specifically configured to: reconstruct the video data according to receiving the correct sub image.
The receiving device according to claim 60, wherein the processor is further configured to: assign a value to the sub-image in which the error is transmitted in the sub-video data unit.
The receiving device according to claim 61, wherein a value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is 0.
The receiving device according to claim 61, wherein the value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is determined by interpolation.
The receiving apparatus according to claim 63, wherein the value assigned to the sub-picture in which the error is transmitted in the sub-video data unit is determined based on the transmission of the correct sub-image, and the sub-image and the The correct sub-images are transmitted from the same image frame.
The receiving device according to claim 62 or 63, wherein when the processor reconstructs the video data according to the decoded sub-video data unit, the processor is specifically configured to:

The video data is reconstructed using an inverse transform.
A control terminal, comprising: the receiving device according to any one of claims 58-65.
An unmanned aerial vehicle, comprising:

body;

a power system mounted to the fuselage for providing flight power;

And a video transmission system according to any of claims 32-57.