RU2576490C1

RU2576490C1 - Background hybrid retouch method for 2d to 3d conversion

Info

Publication number: RU2576490C1
Application number: RU2014134709/08A
Authority: RU
Inventors: Петр ПОГЛ; Александр Александрович Молчанов; Артем Геннадьевич ШАМСУАРОВ; Алексей Дмитриевич Зайцев
Original assignee: Самсунг Электроникс Ко., Лтд.
Priority date: 2014-08-25
Filing date: 2014-08-25
Publication date: 2016-03-10

Abstract

FIELD: physics.

SUBSTANCE: method of retouching a video image background comprises the steps of: obtaining background movement information from a series of frames of a video sequence; for frames with a missing region: transferring pixel data from one or more preceding frames using the background movement information, transferring pixel data from one or more subsequent frames using the background movement information, and merging the said data with partially recovered pixel data in one or more of the said regions on the current frame; selecting a frame with the remaining missing region; performing a spatial retouching procedure on the said frame to recover pixel data of the said missing region; and transferring the recovered pixel data from the said selected frame to all the frames of the sequence where it is possible using the background movement information.

EFFECT: enabling video image background retouching.

9 cl, 5 dwg

Description

FIELD OF THE INVENTION

The invention relates generally to video processing and, more particularly, to video retouching, i.e. estimating values for background that was previously hidden by foreground content.

BACKGROUND

Video retouching is commonly used in tasks such as video stabilization, frame-rate up-conversion, conversion of two-dimensional (2D) images to three-dimensional (3D) images, and view synthesis. The method according to the invention aims at a complete estimation of background pixel values for arbitrary video sequences that are accompanied by information about the foreground objects in each video frame.

In general, most prior-art video retouching methods use either frame-by-frame retouching or camera motion estimation followed by alignment and overlaying of frames.

US 2006/0257042 A1 describes general video enhancement methods (such as stabilization and deblurring) based on motion retouching. The method proposes propagating motion vectors into the missing regions and then using the motion vector map completed in this way to transfer pixel information into the missing regions. The number of remaining pixels is assumed to be generally small, and they are filled by blurring. That document deals mainly with video stabilization artifacts, so the missing regions are usually not foreground objects but regions adjacent to the frame edges. Spatial retouching is therefore not important in that case and is not considered in the document. Consequently, the described method is of limited use for removing foreground objects from video sequences that have background regions not visible in any frame.

In US 8243805 B2, the retouching process is based on completing the motion vector map. First, local pixel motion is estimated and a spatio-temporal hole in the sequence is determined from the supplied masks (from user input or automatically). The missing motion vector values in the hole are then transferred as spatio-temporal patches from known data using a specially designed similarity measure. As a result, complete motion vector information becomes available for the whole sequence and can then be used to propagate pixel values into the spatio-temporal hole. However, this approach clearly cannot always complete the whole sequence. For example, a scene with an almost static object in front of a static background, or a scene with zooming, may not provide enough data to fill the object region completely.

US 2013/0182184 A1 presents a different approach to video retouching. It is based on representing the video of a single scene as mosaic images (mosaics are built from frames aligned in a common coordinate system with a fixed reference frame). The described system fills the missing parts by aligning frames to the coordinates of the same reference frame (the mosaic representation) and then retouches the remaining parts of the mosaic using a suitable spatial retouching method. The mosaic representation is then converted back into a sequence of frames in which all missing parts are completely filled in a consistent way. The described system is expected to complete the sequence fully in cases of simple motion. For scenes with complex local background motion, frame alignment by feature-point tracking may be insufficient to describe the motion, which can lead to visible artifacts in the resulting sequence.

US 2013/0128121 A1 describes a device for video retouching based on feature-point tracking. Two-dimensional (2D) tracking is used to obtain trajectories of sparse scene points, and a subset of them is selected to describe the smooth motion of the scene. This motion is used to predict scene points that are invisible in the target frames and to transfer content from source frames by aligning feature points and then applying content-preserving transfer and overlay techniques to fill the missing regions. The proposed method works well only for sequences with certain types of motion that make it possible to find a source for every target pixel. The document does not address cases where additional spatial retouching techniques are needed to retouch the video with the fewest visible artifacts.

US 2012/0162395 A1 describes view synthesis for three-dimensional (3D) video systems. The proposed method determines depth values in the missing region (for example, by choosing the smallest value within a certain area around the missing pixel) and then computes color values from the obtained depth values by applying a special weighted filtering to all missing pixels and neighboring pixels. The method was developed to fill disocclusion regions for 3D view synthesis and may generally give poor visual results for large retouched regions. In addition, it does not use the temporal information in the video sequence, since it relies only on depth and color information for a single frame.

US 2013/0128121 A1 and US 2006/0257042 A1 can be regarded as the closest prior art to the proposed invention.

SUMMARY OF THE INVENTION

In view of the problems described above, the present invention provides a method for retouching a video image background and a corresponding video processing system configured to carry out the method, which overcome at least some of the drawbacks of prior-art video processing systems.

The object of the invention is to provide a video processing method for retouching a video image background that operates on pixel data, per-pixel motion vector fields and an indication of the missing regions (i.e. the regions to be recovered) in the pixel data for each frame of the video sequence, for example the positions of foreground objects, where all of these data relate to a single video sequence. In this document, a video sequence is a sequence of frames that could have been captured in one continuous camera shot, without cuts or instantaneous transitions.

In one aspect, the invention relates to a method for retouching a video image background that operates on pixel data, the method comprising the steps of:

a. obtaining background motion information from a series of frames of a video sequence;

b. for all frames with at least one missing region:

- propagating pixel data from one or more preceding frames using the background motion information,

- propagating pixel data from one or more subsequent frames using the background motion information, and

- merging said data with the partially recovered pixel data in one or more of said regions in the current frame;

c. repeating, until all missing regions in all frames have been recovered, the steps of:

- selecting a frame with at least one remaining missing region;

- performing a spatial retouching procedure on said frame to recover pixel data of said missing region; and

- propagating the recovered pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be propagated, using the background motion information.
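
Steps a to c can be illustrated on a toy one-dimensional sequence. The sketch below is not the claimed implementation; it rests on several simplifying assumptions: each frame is a list in which None marks a missing pixel, the background motion is reduced to one integer offset per frame (pixel x of frame t shows scene point x + offsets[t]), a nearest-known-neighbor copy stands in for the spatial retouching procedure, and the frame with the most missing pixels is selected first. The names temporal_fill, spatial_inpaint and retouch are hypothetical.

```python
def temporal_fill(frames, offsets):
    """Step b (simplified): fill each missing pixel from the temporally
    nearest frame in which the same background point is visible."""
    result = [list(f) for f in frames]
    for t, frame in enumerate(frames):
        for x, value in enumerate(frame):
            if value is not None:
                continue
            scene = x + offsets[t]  # scene point hidden at (t, x)
            # Visit the other frames in order of temporal distance.
            for k in sorted(range(len(frames)), key=lambda k: abs(k - t)):
                xs = scene - offsets[k]
                if k != t and 0 <= xs < len(frames[k]) and frames[k][xs] is not None:
                    result[t][x] = frames[k][xs]
                    break
    return result


def spatial_inpaint(frame):
    """Stand-in for spatial retouching (assumption): copy the nearest
    known pixel. Requires at least one known pixel in the frame."""
    known = [i for i, v in enumerate(frame) if v is not None]
    return [v if v is not None else frame[min(known, key=lambda i: abs(i - x))]
            for x, v in enumerate(frame)]


def retouch(frames, offsets):
    """Steps b and c: temporal fill, then repeat select / inpaint / propagate."""
    frames = temporal_fill(frames, offsets)
    while any(v is None for f in frames for v in f):
        # Assumed selection criterion: the frame with the most missing pixels.
        t = max(range(len(frames)),
                key=lambda i: sum(v is None for v in frames[i]))
        filled = spatial_inpaint(frames[t])
        for x, old in enumerate(frames[t]):
            if old is not None:
                continue
            scene = x + offsets[t]
            # Propagate the new value to every frame (including frame t)
            # where the same scene point is still missing.
            for k, other in enumerate(frames):
                xs = scene - offsets[k]
                if 0 <= xs < len(other) and other[xs] is None:
                    other[xs] = filled[x]
    return frames


# Camera pans one pixel per frame; scene point 3 is hidden in every frame.
frames = [
    [0, 10, None, None, 40, 50],
    [10, 20, None, None, 50, 60],
    [20, None, None, 50, 60, 70],
]
print(retouch(frames, offsets=[0, 1, 2]))
```

In this example scene points 2 and 4 are recovered temporally, while scene point 3 is never visible, so it is synthesized once in the selected frame and then copied to the other frames, which keeps the filled value consistent across the sequence.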

In an embodiment, obtaining the background motion information may comprise estimating full-frame per-pixel motion for a given frame and applying a spatial motion-filling procedure to estimate the background motion in the regions to be recovered.

The spatial motion-filling procedure may use a global motion estimate inside the missing regions and provide a smooth transition from per-pixel motion to global motion near the boundaries of the missing region.
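
One possible reading of this smooth transition is sketched below for a one-dimensional motion field. The text does not specify the blending rule, so everything here is an assumption: the hypothetical function fill_motion blends the motion of the nearest known pixel with the global motion value, using a weight that grows linearly with the distance from the region boundary and saturates after band pixels.

```python
def fill_motion(motion, missing, global_motion, band=3):
    """Fill motion inside missing regions (1-D sketch).

    motion        -- per-pixel motion values (entries at missing pixels are ignored)
    missing       -- booleans, True where the pixel is to be recovered
    global_motion -- single global motion estimate for the frame
    band          -- assumed width (in pixels) of the transition zone
    """
    known = [i for i, m in enumerate(missing) if not m]
    if not known:
        # Nothing to anchor to: fall back to global motion everywhere.
        return [global_motion] * len(motion)
    filled = list(motion)
    for x, is_missing in enumerate(missing):
        if not is_missing:
            continue
        nearest = min(known, key=lambda i: abs(i - x))
        dist = abs(nearest - x)
        # Weight 0 at the boundary (pure per-pixel motion),
        # weight 1 deep inside the region (pure global motion).
        w = min(1.0, dist / band)
        filled[x] = (1.0 - w) * motion[nearest] + w * global_motion
    return filled


motion = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
missing = [False, True, True, True, True, True, True, False]
print(fill_motion(motion, missing, global_motion=2.0, band=2))
# -> [1.0, 1.5, 2.0, 2.0, 2.0, 2.0, 1.5, 1.0]
```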

Propagating pixel data using the background motion information may comprise temporally integrating the full-frame motion fields between source frames and target frames and propagating the available pixel data from the source frames into said missing regions in the target frame.
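
A minimal sketch of temporal integration followed by propagation is given below, under simplifying assumptions not taken from the text: frames are one-dimensional lists with None for missing pixels, flows[t] holds integer displacements from frame t to frame t + 1, and only the forward direction is shown. The function names integrate_motion and propagate are hypothetical.

```python
def integrate_motion(flows, start, end):
    """Compose the per-frame motion fields flows[start..end-1] into one
    correspondence map: entry x is the position in frame `end` of the
    background point seen at pixel x of frame `start`, or None if the
    point leaves the frame on the way."""
    width = len(flows[start])
    positions = list(range(width))
    for t in range(start, end):
        stepped = []
        for p in positions:
            if p is None:
                stepped.append(None)
                continue
            q = p + flows[t][p]  # follow the motion vector one frame ahead
            stepped.append(q if 0 <= q < width else None)
        positions = stepped
    return positions


def propagate(target, source, mapping):
    """Fill missing (None) target pixels with available source pixels."""
    out = list(target)
    for x, value in enumerate(target):
        if value is None and mapping[x] is not None:
            candidate = source[mapping[x]]
            if candidate is not None:
                out[x] = candidate
    return out


# Background shifts right by one pixel per frame.
flows = [[1] * 5, [1] * 5]
mapping = integrate_motion(flows, 0, 2)      # [2, 3, 4, None, None]
target = [10, None, None, 13, 14]            # frame 0 with a hole
source = [8, 9, 10, 11, 12]                  # frame 2, fully visible
print(propagate(target, source, mapping))    # [10, 11, 12, 13, 14]
```

With real data the displacements are fractional, so a practical version would interpolate both the motion fields and the pixel values; the integer case is used here only to keep the composition step visible.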

In another aspect, a video processing system is provided that is configured to retouch a video image background by operating on pixel data, the system comprising:

at least one processor; and

a memory for storing data related to the video processing, as well as computer program instructions;

wherein, when executed by said at least one processor, the computer program instructions cause the processor to:

- obtain background motion information from a series of frames of a video sequence;

- for all frames with at least one missing region:

- propagate pixel data from one or more preceding frames using the background motion information,

- propagate pixel data from one or more subsequent frames using the background motion information, and

- merge said data with the partially recovered pixel data in one or more of said regions in the current frame;

- select a frame with at least one remaining missing region;

- perform a spatial retouching procedure on said frame to recover pixel data of said missing region; and

- propagate the pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be propagated, using the background motion information.

When executed by said at least one processor, the computer program instructions may further cause the at least one processor to estimate full-frame per-pixel motion for a given frame and to apply a spatial motion-filling procedure to estimate the background motion in the regions to be recovered. The computer program instructions may also cause the at least one processor to use a global motion estimate in the missing regions and to provide a smooth transition from per-pixel motion to global motion near the boundaries of the missing region. In an embodiment, the computer program instructions may further cause the at least one processor to temporally integrate the full-frame motion fields between source frames and target frames and to propagate the available pixel data from the source frames into said missing regions in the target frame.

Yet another aspect of the invention relates to a computer-readable medium storing a computer program which, when executed by at least one processor, causes the at least one processor to perform a method for retouching a video image background that operates on pixel data, the computer program comprising:

- code for obtaining background motion information from a series of frames of a video sequence,

- code for propagating pixel data from one or more preceding frames using the background motion information,

- code for propagating pixel data from one or more subsequent frames using the background motion information, and

- code for merging said data with the partially recovered pixel data in one or more of said regions in the current frame;

- code for selecting a frame with at least one remaining missing region;

- code for performing a spatial retouching procedure on said frame to recover pixel data of said missing region; and

- code for propagating the pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be propagated, using the background motion information.

Upon reading and understanding the description below, those skilled in the art will appreciate that the claimed invention may also take other forms. The various method steps and system components may be implemented in hardware, software, firmware, or any suitable combination thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

Following the above summary, the inventive concept is described in detail below by way of example and with reference to the accompanying drawings, which are provided for illustration only and are not intended to limit the scope of the claimed invention or to define its essential features. In the drawings:

FIG. 1 illustrates the main components of a video processing system according to the invention.

FIG. 2 shows the main steps of video background retouching in accordance with the video background retouching method according to the invention.

FIG. 3 shows an example of temporal integration of motion in accordance with the method according to the invention.

FIG. 4A-B illustrates the principles of propagating pixel data to and from the current frame using integrated motion in accordance with the method according to the invention.

FIG. 5 illustrates the principles of obtaining background motion according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

This detailed description is provided to aid understanding of the invention. The description relates to exemplary embodiments of the invention; on careful reading of the description with reference to the accompanying drawings, a person skilled in the art may envisage other modifications, changes and equivalent substitutions in the described subject matter. All such obvious modifications, changes and equivalents are considered to fall within the scope of the claimed invention. The reference numerals and designations used in this detailed description and in the appended claims are not intended to limit or define the scope of the claimed invention in any way.

The claimed invention relates to a method for retouching a video image background that operates on pixel data, per-pixel motion vector fields and an indication of the missing regions (i.e. the regions to be recovered) in the pixel data for each frame of the video sequence, for example the positions of foreground objects, where all of these data relate to a single video sequence. In this document, a video sequence is a sequence of frames that could have been captured in one continuous camera shot, without cuts or instantaneous transitions. The method according to the invention comprises the following steps:

- propagating the pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be propagated, using the background motion information.

FIG. 1 shows the main components of a generalized system capable of executing the algorithm. The program that implements the algorithm is stored in memory (101) together with the data required by the process. It is executed and controlled by one or more processors (100) and outputs the result either to memory or via a suitable display device (103). All data transfer takes place over the data bus (104).

FIG. 2 shows the main steps of the video background retouching algorithm on which the method according to the invention is based. The algorithm operates on video sequences that contain a single shot. A video with several shots must be split into subsequences, each of which is processed separately. At step (201), the system implementing the method receives all necessary input data: color or grayscale images for all frames in the sequence, and information about the missing regions for all frames in the sequence (indicated by any suitable means, for example masks). Per-pixel background motion fields (or estimates of them) are also required for all frames in the sequence. The proposed background motion estimation scheme is explained below in the detailed description of FIG. 5. Then, for each frame in the sequence, starting with the first, the system performs steps (203a) and (203b), which may be carried out in parallel. In steps (203a) and (203b), the forward and backward motion fields originating in the current frame are temporally integrated toward each of the preceding and subsequent frames (the process is described in more detail below with reference to FIG. 3). This yields correspondence maps of background pixels between the current (target) frame and any other (source) frame in the sequence. If a pixel is marked as missing in the target frame, but there are one or more source frames in which the corresponding pixel is not missing, the target pixel can be filled ("transferred" or "propagated") with data taken from that source frame (see the upper part of FIG. 4 for an illustration).

Step (204) then blends the available transferred pixels as follows. If there are two or more candidate pixels for a target pixel that come from frames in the same search direction (for example, all from preceding frames or all from subsequent frames), the source pixel closest in time to the target frame is selected. If there are two candidate pixels for a target pixel that come from different search directions (one from a preceding frame and one from a subsequent frame), they are blended with weights that depend on the temporal distance of each source pixel. Once a target pixel has been filled, it is marked as not missing, and the process continues with the next missing pixel. Step (204) produces a frame in which at least some of the originally missing pixels are filled.

After the above steps have been performed for each frame in the sequence, the algorithm proceeds to step (205). If some frames still have missing pixels, one of these frames is selected for spatial retouching at step (207). Several selection criteria may be used, for example one based on the total area of missing pixel data in the frame. Next, at step (208), the missing pixels are retouched by any suitable spatial retouching method, such as the one described in US 7,551,181 B2 (Criminisi et al.), "Image Region Filling by Exemplar-Based Inpainting". The only requirement is that the chosen method synthesizes visually plausible structure; exemplar-based methods are therefore generally preferred. At step (209), the new information introduced by spatial retouching in the selected (source) frame is propagated to all of its neighboring (target) frames using the same process as described for steps (203a) and (203b) (see also FIG. 4B). Steps (205) and (207)-(209) are repeated until no missing pixels remain in the data, after which the algorithm proceeds to step (206) and terminates.
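The blending rule of step (204) can be illustrated with a minimal Python sketch for a single target pixel. The description only states that the weights depend on temporal distance; the normalized inverse-distance weighting used here, the function name `blend_candidates` and the `(frame_index, value)` representation of a candidate are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# A candidate is (source_frame_index, pixel_value); this layout is hypothetical.
Candidate = Tuple[int, float]


def blend_candidates(target_idx: int,
                     candidates: Sequence[Candidate]) -> Optional[float]:
    """Combine transferred candidates for one missing pixel of frame target_idx."""
    past = [c for c in candidates if c[0] < target_idx]
    future = [c for c in candidates if c[0] > target_idx]

    # Same search direction: keep only the source closest in time to the target.
    best_past = max(past, key=lambda c: c[0]) if past else None
    best_future = min(future, key=lambda c: c[0]) if future else None

    if best_past is None and best_future is None:
        return None  # pixel stays missing and is left for spatial retouching
    if best_past is None:
        return best_future[1]
    if best_future is None:
        return best_past[1]

    # Different search directions: weighted blend.
    # Assumption: the temporally nearer source gets the larger weight.
    d_past = target_idx - best_past[0]
    d_future = best_future[0] - target_idx
    w_past = d_future / (d_past + d_future)
    return w_past * best_past[1] + (1.0 - w_past) * best_future[1]
```

For a target frame 5 with candidates from frames 3, 4 and 8, the sketch keeps frame 4 as the past source and frame 8 as the future source, and weights them 0.75 and 0.25 respectively.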

FIG. 3 illustrates temporal integration of motion fields for three consecutive frames with indices N to N+2 (labeled 300, 301 and 302, respectively). Temporal integration means sampling and summing consecutive vector fields defined over the same image domain, as follows. Let element (303) be a motion vector with coordinates (u₁, v₁) originating from motion field (300) at some position (x, y). Motion field (301) is then sampled at position (x+u₁, y+v₁). For subpixel-accurate motion, bilinear interpolation or any other kind of interpolation may be used to obtain the motion vector (304) with coordinates (u₂, v₂). Adding these two vectors gives element (305), a vector that can be regarded as the integrated displacement of the pixel at position (x, y) in frame (300) relative to frame (302). Exactly the same process is applied when integrating backward in time from frame N to frames N-1 and so on.
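The sample-and-add procedure of FIG. 3 can be sketched in plain Python as follows. The field layout (`field[y][x] = (u, v)`), the function names and the clamping of sample positions to the image border are assumptions of this sketch; the description itself only prescribes sampling the next field at the displaced position, optionally with bilinear interpolation, and adding the vectors.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]
Field = List[List[Vec]]  # field[y][x] = (u, v); hypothetical layout


def _lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def sample_bilinear(field: Field, x: float, y: float) -> Vec:
    """Bilinearly sample a vector field; positions are clamped to the grid."""
    h, w = len(field), len(field[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for k in range(2):  # interpolate the u and v components separately
        top = _lerp(field[y0][x0][k], field[y0][x1][k], fx)
        bottom = _lerp(field[y1][x0][k], field[y1][x1][k], fx)
        out.append(_lerp(top, bottom, fy))
    return (out[0], out[1])


def integrate_fields(fields: List[Field]) -> Field:
    """Accumulate fields N->N+1, N+1->N+2, ... into one displacement field.

    For each pixel (x, y) of frame N, the next field is sampled at the
    position reached so far and the sampled vector is added to the total.
    """
    h, w = len(fields[0]), len(fields[0][0])
    total: Field = [[(0.0, 0.0) for _ in range(w)] for _ in range(h)]
    for field in fields:
        for y in range(h):
            for x in range(w):
                u, v = total[y][x]
                du, dv = sample_bilinear(field, x + u, y + v)
                total[y][x] = (u + du, v + dv)
    return total
```

Passing the backward fields in reverse temporal order (N to N-1, N-1 to N-2, and so on) gives the backward integration in the same way.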

FIGS. 4A-B show the main differences and the data flow for the propagation processes described in steps (203) and (209) of FIG. 2. FIG. 4A shows the data flow for step (203), and FIG. 4B shows the data flow for step (209). In both figures, elements (400) are the source frame or frames, elements (401) (shaded black) are the missing regions in the target frames, elements (402) are the source regions, i.e. the regions that will be copied into the missing regions (401), and elements (403) are target regions that were previously missing and have been updated with data taken from the regions marked (402). Element (404) is the target frame (or frames) to be updated by the algorithm. In FIG. 4A, data is propagated from all neighboring frames into the current frame, partially filling different parts of the missing region (see the description of step 203 of FIG. 2 for details). In FIG. 4B, data is propagated from the complete current frame to all neighboring frames, partially filling their missing regions (see the description of step 209 of FIG. 2 for details). Propagation is performed as described above, using the integrated background motion fields.
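A single source-to-target propagation, as used in both FIG. 4A and FIG. 4B, might look like the following sketch. Representing a missing pixel as `None`, the nearest-neighbor lookup in the source frame and the name `propagate` are assumptions made for illustration; the description does not specify how the source frame is sampled.

```python
from typing import List, Optional, Tuple

Image = List[List[Optional[float]]]      # None marks a missing pixel (assumption)
Field = List[List[Tuple[float, float]]]  # integrated displacement per pixel


def propagate(source: Image, target: Image, disp: Field) -> int:
    """Fill missing target pixels from one source frame, in place.

    disp[y][x] is the integrated displacement that maps target pixel (x, y)
    to its position in the source frame. Returns the number of pixels filled.
    """
    h, w = len(target), len(target[0])
    filled = 0
    for y in range(h):
        for x in range(w):
            if target[y][x] is not None:
                continue  # only missing pixels are updated
            u, v = disp[y][x]
            # Nearest-neighbor lookup; an interpolating lookup is equally possible.
            sx, sy = int(round(x + u)), int(round(y + v))
            if 0 <= sx < w and 0 <= sy < h and source[sy][sx] is not None:
                target[y][x] = source[sy][sx]
                filled += 1
    return filled
```

In the FIG. 4A case the function would be called once per neighboring source frame with the current frame as target; in the FIG. 4B case the retouched frame is the source and each neighboring frame is a target in turn.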

FIG. 5 illustrates the main concepts of the proposed method of obtaining background motion in accordance with the present invention. Reference numeral (500) denotes the full-frame motion vector field M₀(x, y), defined over the image domain, which is used as input. It may be estimated by any method that provides a per-pixel motion vector field with the required accuracy; see, for example, Russian patent application No. RU 2012129183. Reference numeral (501) denotes the boundary of the missing region, which consists of regions (502) and (503). The background motion algorithm synthesizes its output from three different motion vector fields. Outside the region enclosed by contour (501), the motion vectors are set equal to M₀(x, y). Inside regions (502) and (503), the motion vectors are filled with an estimate M₁(x, y) obtained by fitting a particular motion model. For example, a homography between planes may be considered, defined by the following expressions:
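The expressions themselves are not reproduced in this text. For orientation only, and not as the patent's own notation, a planar homography with a 3-by-3 parameter matrix H is conventionally written as below; the entry names h_ij are assumptions introduced for illustration.

```latex
\[
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},
\qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}},
\qquad
H = \begin{pmatrix}
h_{11} & h_{12} & h_{13}\\
h_{21} & h_{22} & h_{23}\\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\]
```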

Let (x', y') = (x, y) + M₀(x, y). By fitting H to the points formed with the vectors M₀(x, y) taken from region (504), a global motion vector field M₁(x, y) can be computed over the image domain as the difference between the original points (x, y) and the points obtained from them by the homography transform with the fitted parameters H (a 3-by-3 matrix). Region (504) is a band of width w (an algorithm parameter, denoted by reference numeral (507)) located at a distance of d₂ pixels from contour (501) (d₂ is another algorithm parameter, denoted by (506) in FIG. 5). The purpose of this band is to keep artifacts that can arise in per-pixel motion estimation around object edges out of the global motion estimate. In addition, in region (503), which is another band extending d₁ pixels inward from contour (501) (d₁ is the parameter denoted by reference numeral (505)), a different estimate M₂(x, y) of the motion vector field is used. The inventors propose using diffusion-based methods to estimate (inpaint) motion in region (503), with M₀(x, y) as the inpainting source, as described in Telea, A. (2004). An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1), 23-34. This yields two motion vector fields, M₁(x, y) and M₂(x, y), which contain estimates of the background motion inside the region enclosed by contour (501). The next step is to blend them pixel by pixel inside region (503) to obtain the resulting motion vector field. This can be done by choosing a suitable weight function W(x, y) defined over the same image domain. Let 0 ≤ W(x, y) ≤ 1; the resulting vector field is then computed as M(x, y) = (1 − W(x, y))·M₁(x, y) + W(x, y)·M₂(x, y). The aim is to capture local motion features and to provide a smooth transition from the original estimates to the global motion estimates in the frame. In the preferred embodiment, W(x, y) decreases exponentially (with a suitable decay-rate parameter in pixels) with distance from the boundary (501) of the missing region inside the missing region, and equals 1 outside the missing region. The result is a per-pixel estimate of the background motion vectors for the full frame, which generally has the properties of global motion inside the previously missing regions but does not suffer from discontinuities around the boundaries of the missing region.
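The blending of M₁ and M₂ with an exponentially decaying weight can be sketched per pixel as follows. The exact form exp(-depth / decay), the default decay of 4 pixels and the names `weight` and `blend_motion` are assumptions of this sketch; the description only requires W to decrease exponentially with distance from boundary (501) inside the missing region and to equal 1 outside it.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def weight(depth_px: float, decay_px: float = 4.0) -> float:
    """W for a pixel lying depth_px inside the missing region.

    depth_px <= 0 means the pixel is on or outside boundary (501), where W = 1.
    decay_px is a hypothetical decay-rate parameter in pixels.
    """
    if depth_px <= 0.0:
        return 1.0
    return math.exp(-depth_px / decay_px)


def blend_motion(m1: Vec, m2: Vec, depth_px: float,
                 decay_px: float = 4.0) -> Vec:
    """M = (1 - W) * M1 + W * M2 for a single pixel."""
    w = weight(depth_px, decay_px)
    return ((1.0 - w) * m1[0] + w * m2[0],
            (1.0 - w) * m1[1] + w * m2[1])
```

With this choice the result equals the diffusion-based estimate M₂ at the boundary and approaches the global estimate M₁ deeper inside the missing region, which matches the smooth transition described above.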

The method described above can be implemented in systems that perform user-assisted 2D-to-3D video conversion, as well as in any other video enhancement processing that requires retouching of video data. A typical basic system with the necessary components, which are common to a general-purpose computing system, is described in more detail above with reference to FIG. 1.

On reading the above description with reference to the drawings, those skilled in the art may envisage other aspects of the invention. A person skilled in the art will understand that other embodiments of the invention are possible and that details of the invention may be modified in various respects without departing from the inventive concept. Accordingly, the drawings and description are to be regarded as illustrative and not restrictive. The scope of the claimed invention is defined solely by the appended claims.

Claims

1. A method of retouching the background of a video image using pixel data, the method comprising the steps of:
a. obtaining background motion information from a series of frames of a video sequence;
b. for all frames with at least one missing region:
- transferring pixel data from one or more preceding frames using the background motion information,
- transferring pixel data from one or more subsequent frames using the background motion information, and
- merging said data with partially recovered pixel data in one or more of said regions in the current frame;
c. repeating, until all missing regions in all frames have been recovered, the steps of:
- selecting a frame with at least one remaining missing region;
- performing a spatial retouching procedure on said frame to recover pixel data of said missing region; and
- transferring the recovered pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be transferred using the background motion information.

2. The method of claim 1, wherein obtaining the background motion information comprises the steps of:
- estimating per-pixel motion over the full frame for a given frame, and
- applying a spatial motion filling procedure to estimate background motion in the regions to be recovered.

3. The method of claim 2, wherein the spatial motion filling procedure uses an estimate of global motion within missing regions and provides a smooth transition from pixel-by-pixel motion to global motion near the boundaries of the missing region.

4. The method of claim 1, wherein transferring pixel data using the background motion information comprises temporally integrating the full-frame motion fields between the source frames and the target frames and transferring the available pixel data from the source frames to said missing regions in the target frame.

5. A video processing system configured to retouch the background of a video image using pixel data, the system comprising:
at least one processor; and
a memory for storing data related to the processing of video data, as well as computer program instructions;
wherein the computer program instructions, when executed by the at least one processor, cause the at least one processor to:
- obtain background motion information from a series of frames of a video sequence;
- for all frames with at least one missing region:
- transfer pixel data from one or more preceding frames using the background motion information,
- transfer pixel data from one or more subsequent frames using the background motion information, and
- merge said data with partially recovered pixel data in one or more of said regions in the current frame;
- select a frame with at least one remaining missing region;
- perform a spatial retouching procedure on said frame to recover pixel data of said missing region; and
- transfer the recovered pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be transferred using the background motion information.

6. The system of claim 5, wherein the computer program instructions, when executed by the at least one processor, further cause the at least one processor to:
- estimate per-pixel motion over the full frame for a given frame, and
- apply a spatial motion filling procedure to estimate background motion in the regions to be recovered.

7. The system of claim 5, wherein the computer program instructions, when executed by the at least one processor, further cause the at least one processor to use a global motion estimate within the missing regions and to provide a smooth transition from per-pixel motion to global motion near the boundaries of the missing region.

8. The system of claim 5, wherein the computer program instructions, when executed by the at least one processor, further cause the at least one processor to temporally integrate the full-frame motion vector fields between the source frames and the target frames and to transfer the available pixel data from the source frames to said missing regions in the target frame.

9. A computer-readable medium storing a computer program which, when executed by at least one processor, causes the at least one processor to perform a method of retouching the background of a video image using pixel data, the computer program comprising:
- code for obtaining background motion information from a series of frames of a video sequence,
- code for transferring pixel data from one or more preceding frames using the background motion information,
- code for transferring pixel data from one or more subsequent frames using the background motion information, and
- code for merging said data with partially recovered pixel data in one or more of said regions in the current frame;
- code for selecting a frame with at least one remaining missing region;
- code for performing a spatial retouching procedure on said frame to recover pixel data of said missing region; and
- code for transferring the recovered pixel data from said selected frame to all frames of the sequence to which said recovered pixel data can be transferred using the background motion information.