RU2768797C1

RU2768797C1 - Method and system for determining synthetically modified face images on video

Info

Publication number: RU2768797C1
Application number: RU2021130421A
Authority: RU
Inventors: Кирилл Евгеньевич Вышегородцев; Александр Викторович Балашов; Григорий Алексеевич Вельможин; Валентин Валерьевич Сысоев
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2021-10-19
Filing date: 2021-10-19
Publication date: 2022-03-24
Also published as: WO2023068956A1

Abstract

FIELD: computer engineering.

SUBSTANCE: the invention relates to computer engineering, in particular to detecting synthetically altered face images in video. The technical result is achieved by a method comprising: obtaining an image from a video; detecting face images in said image; calculating a vector representation of the geometric characteristics of the detected face images; using frame-by-frame video analysis, calculating the spatio-temporal significance of each face image of each person in said image, including a vector representation of the temporal characteristic of the face image; calculating a vector of probability estimates of synthetic changes for the face images of a person; calculating an overall estimate of the probability of synthetic changes; generating a final assessment of the presence of a synthetic face image change in the video; forming an integral estimate of the presence of a synthetically changed face image in the video from at least one final estimate of the model; and generating a notification of the presence of the synthetically changed face in the video.

EFFECT: high accuracy and efficiency of detecting synthetic change in images of human faces in video.

20 cl, 10 dwg, 2 tbl

Description

FIELD OF TECHNOLOGY

[0001] The present technical solution relates to the field of computer technology used in data processing, in particular to a method and system for detecting synthetically modified face images in video.

BACKGROUND OF THE INVENTION

[0002] At present, technologies for generating synthetic images superimposed on the faces of real people are, as a rule, based on machine learning algorithms, for example artificial neural networks (ANNs). Such approaches are aimed at applying digital masks that imitate human faces. An example of such a technology is the DeepFake technique, which is based on artificial intelligence and is used for image synthesis (see https://ru.wikipedia.org/wiki/Deepfake).

[0003] There is a known method for recognizing synthetically modified images of people's faces, in particular DeepFake images (Tolosana et al. DeepFakes Evolution: Analysis of Facial Regions and Fake Detection Performance // Biometrics and Data Pattern Analytics - BiDA Lab, Universidad Autonoma de Madrid. 2020), which is based on analyzing the segments that make up a face image. The analysis is carried out using an ANN trained on real and synthetic images of people's faces, in particular those of celebrities, and can be used to detect forged (fake) videos. The method analyzes facial segments and, on that basis, classifies the corresponding image as containing synthetic changes or not.

[0004] The disadvantage of this approach is its low efficiency, which stems from the absence of an integral estimate formed both from the geometric parameters of the face image and from the spatio-temporal characteristics of the person's face in the video. Another drawback is that some solutions do not handle multiple people when several people are present in the video. In other known open solutions (https://www.kaggle.com/c/deepfake-detection-challenge, https://ai.facebook.com/datasets/dfdc/), such processing is carried out by independently evaluating each face image of each person in each analyzed video frame and then averaging all of these estimates. All such solutions show low efficiency when processing videos with several people.

SUMMARY OF THE INVENTION

[0005] The claimed method and system are aimed at solving the technical problem of efficiently and accurately detecting synthetic changes to face images in video.

[0006] The technical result is an increase in the accuracy and efficiency of detecting synthetic changes to images of people's faces in video.

[0007] In a first preferred embodiment of the invention, a computer-implemented method for detecting synthetically modified face images in video is provided, the method being performed by a processor and comprising:

a) obtaining at least one image from the video;

b) detecting face images in said image;

c) calculating a vector representation of the geometric characteristics of the detected face images, using at least an algorithm for comparing facial reference points, in order to identify the face images of at least one person;

d) using frame-by-frame video analysis, calculating the spatio-temporal significance of each face image of each person in said image, which is defined as a vector representation of the spatial characteristic of the face, characterizing the size of the face region relative to the frame, and a vector representation of the temporal characteristic of the face image, characterizing the time during which the analyzed face image is displayed in the video frames;

e) calculating a vector of probability estimates of synthetic changes for the face images of a person, characterizing the presence of synthetic changes in the face images of that person in each frame;

f) calculating an overall estimate of the probability of synthetic changes based on the vector representations of the spatial and temporal distributions and the vector of synthetic change estimates for the face images of each person in the video;

g) forming a final assessment of the presence in the video of a synthetic change to the image of at least one face;

h) forming an integral estimate of the presence of a synthetically modified face image in the video from at least one final assessment of the model, and generating a notification about the presence of a synthetically modified face in the video.
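Steps d)-f) can be illustrated with a minimal sketch. It assumes a hypothetical per-frame list of bounding boxes for one person (None where the face is absent), takes the spatial characteristic as the ratio of the face area to the frame area, and takes the temporal characteristic as a per-frame presence indicator. The function names and the significance-weighted mean used for aggregation are illustrative assumptions; in the claimed method the overall estimate is produced by a trained model.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def spatial_vector(boxes: Sequence[Optional[Box]], frame_w: int, frame_h: int) -> List[float]:
    """Per-frame ratio of the face area to the frame area (0.0 if the face is absent)."""
    frame_area = float(frame_w * frame_h)
    out: List[float] = []
    for box in boxes:
        if box is None:
            out.append(0.0)
        else:
            x0, y0, x1, y1 = box
            out.append(max(0.0, x1 - x0) * max(0.0, y1 - y0) / frame_area)
    return out


def temporal_vector(boxes: Sequence[Optional[Box]]) -> List[float]:
    """Per-frame presence indicator: 1.0 if the face is shown, otherwise 0.0."""
    return [0.0 if box is None else 1.0 for box in boxes]


def overall_score(spatial: Sequence[float], temporal: Sequence[float],
                  fake_probs: Sequence[float]) -> float:
    """Significance-weighted mean of per-frame estimates (assumed aggregation rule)."""
    weights = [s * t for s, t in zip(spatial, temporal)]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # the face never appears, so there is nothing to score
    return sum(w * p for w, p in zip(weights, fake_probs)) / total
```

Under this weighting, a face that is large and present in many frames contributes more to the overall estimate than a small face that appears briefly, in contrast to the plain averaging criticized in paragraph [0004].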

[0008] In one particular implementation of the method, steps c)-h) are performed by a machine learning model or an ensemble of models, the machine learning model or ensemble of models being trained on a dataset containing synthesized images of people's faces.

[0009] In another particular implementation of the method, the machine learning model uses an automatic labeling correction function, which corrects incorrect labeling of each face in the frames by comparing the face images in the synthesized video with their images in the original video.

[0010] In another particular implementation of the method, faces are compared based on the vector proximity of the reference points that form the geometric characteristics of the original face image and of the synthesized image derived from it.
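A comparison by vector proximity of reference points might look like the following sketch. The source does not name a proximity measure, so cosine similarity is used here as an assumption, and the function names and the 0.9 threshold are hypothetical. Landmarks are assumed to be given as (x, y) pairs in the same order for both faces.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def landmark_vector(points: Sequence[Point]) -> List[float]:
    """Flatten landmarks into one vector centred on their centroid.

    Centring removes the dependence on where the face sits in the frame.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    vec: List[float] = []
    for x, y in points:
        vec.extend((x - cx, y - cy))
    return vec


def cosine_proximity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_face(original: Sequence[Point], synthesized: Sequence[Point],
              threshold: float = 0.9) -> bool:
    """Treat two landmark sets as the same face if their proximity reaches the threshold."""
    score = cosine_proximity(landmark_vector(original), landmark_vector(synthesized))
    return score >= threshold
```

Because cosine similarity ignores vector length, this comparison is also insensitive to uniform scaling of the face, which may or may not be desirable in a given setup.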

[0011] In another particular implementation of the method, faces are compared by analyzing the coordinates of the regions of the original face image and the synthesized face image.
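One common way to compare face regions by their coordinates is intersection over union (IoU) of bounding boxes. The sketch below is an assumption about how such a comparison could be done, not the method fixed by the source; the box format, the function names, and the 0.5 threshold are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_face(synth_box: Box, original_boxes: Sequence[Box],
               min_iou: float = 0.5) -> Optional[int]:
    """Index of the original-frame box that best overlaps the synthesized-frame box.

    Returns None when no box reaches min_iou, i.e. the face has no counterpart.
    """
    best_idx: Optional[int] = None
    best_score = min_iou
    for idx, box in enumerate(original_boxes):
        score = iou(synth_box, box)
        if score >= best_score:
            best_idx, best_score = idx, score
    return best_idx
```

A face in the synthesized frame that matches a face at the same position in the original frame can then be compared pixel-wise or by landmarks to decide whether its label needs correcting.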

[0012] In another particular implementation of the method, the spatio-temporal significance is calculated as a combined matrix based on the values of the vector representations, and the estimate of the presence of synthetic changes in the face images of an individual person is produced by the machine learning model from the resulting combined matrix.

[0013] In another particular implementation of the method, the ensemble of machine learning models consists of a group of models, each of which is trained to detect a specific algorithm for generating synthetic images.

[0014] In another particular implementation, the method uses an integral classifier that receives as input the estimates produced by the models in the ensemble.

[0015] In another particular implementation of the method, the overall estimate is calculated using the integral classifier.
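The role of an integral classifier over ensemble outputs can be sketched as a simple logistic combination of the per-model estimates. The source does not specify the classifier type, so the logistic form, the function name, and the weights below are assumptions; in practice the weights would be learned from labeled data.

```python
import math
from typing import Sequence


def integral_score(model_scores: Sequence[float], weights: Sequence[float],
                   bias: float = 0.0) -> float:
    """Combine per-model estimates into one probability with a logistic function.

    The weights and bias are hypothetical placeholders for learned parameters.
    """
    if len(model_scores) != len(weights):
        raise ValueError("one weight is required per model score")
    z = bias + sum(w * s for w, s in zip(weights, model_scores))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `integral_score([0.9, 0.8], [2.0, 2.0], bias=-2.0)` yields a higher value than the same call with scores `[0.1, 0.2]`, so models that agree on a synthetic change push the combined estimate up.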

[0016] In another particular implementation of the method, the algorithm used to generate the synthetic face image in the analyzed video stream is additionally determined.

[0017] In another particular implementation of the method, the video is an online video conference.

[0018] In another particular implementation of the method, when a synthetically modified face image is detected, a notification is generated in the area where it is displayed.

[0019] In another particular implementation of the method, when a synthetically modified face image is detected, the connection with the corresponding user is blocked.

[0020] In another particular implementation of the method, the analyzed image is obtained from a biometric identification or biometric authentication system.

[0021] In another particular implementation of the method, when a synthetically modified face image is detected, access or the action requested by the user is blocked.

[0022] In another particular implementation of the method, when a synthetically modified face image is detected, user authentication data is additionally requested, selected from the group: login, code, password, two-factor authentication, or combinations thereof.

[0023] In another particular implementation of the method, a signal is generated in the form of a quantitative estimate of the probability that a synthetically modified face image is present.

[0024] In another particular implementation of the method, images are obtained from video supplied by a system for media space monitoring and analysis of social media and mass media, which checks content in social media and mass media.

[0025] In another particular implementation of the method, when a synthetically modified face image is detected, a notification is generated to inform the person who was the subject of the modified face image.

[0026] In a second preferred embodiment of the invention, a system for detecting synthetically modified face images in video is provided, comprising at least one processor and at least one memory storing machine-readable instructions that, when executed by the processor, implement the above method.

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] FIG. 1 illustrates a flowchart of an implementation of the claimed method.

[0028] FIG. 2 illustrates an example of generating a vector representation of face images in a video.

[0029] FIGS. 3A-3B illustrate an example of generating vector representations of spatio-temporal characteristics.

[0030] FIG. 4 illustrates a flowchart for generating the vector of synthetic face image estimates, the vector of the spatial characteristic of the face images, and the vector of the temporal characteristic of the face images for the face images of each person in the video.

[0031] FIG. 5 illustrates a flowchart of the independent formation of the final spatial and temporal characteristics and of the overall estimate of synthetic face images.

[0032] FIG. 6 illustrates a flowchart for processing the final spatial and temporal characteristics, when they are formed independently of the overall estimate of synthetic face images, in order to exclude people's faces from the calculation of the estimate of synthetic changes in the video.

[0033] FIG. 7 illustrates a flowchart for generating a notification with an integral estimate of the presence of synthetic images of people's faces in a video, together with a notification of the probable algorithm used to generate these synthetic changes, when a set of ensembles of trained machine learning models is used, where the models of each ensemble are trained on a dataset with one specific algorithm for generating synthetic face changes, and the models of at least one ensemble are trained on a dataset with several algorithms for generating synthetic face changes.

[0034] FIG. 8 illustrates a flowchart for the case in which the notification is generated by an integral classifier based on the estimates of several trained machine learning models or their ensembles, and several people are present in the video.

[0035] FIG. 9 illustrates a general view of the computing device.

IMPLEMENTATION OF THE INVENTION

[0036] In the present solution, the term "synthetically modified face image" is understood, here and below, to mean any type of digital image generation that imitates the face or part of the face of another person, including by applying digital masks, distorting or changing parts of the face, and the like. A synthetically modified face image covers both fully generated images, for example masks produced with DeepFake technology and superimposed on the face of a real person in the frame while preserving the facial expressions of the image, and partial changes to individual parts of the face (eyes, nose, lips, ears, etc.).

[0037] As shown in FIG. 1, the claimed method (100) for detecting synthetically modified face images in video is implemented by a computing device, in particular by one or more processors executing, in an automated mode, a software algorithm presented as a sequence of steps (101)-(107). These steps perform material actions in the form of processing the electronic signals generated when the processor of the computing device performs its functions in order to carry out the data processing within the method (100).

[0038] At the first step (101), one or more images obtained from the video are received and stored in the memory of the computing device that performs the method (100). In the present application, the term "video" is understood to mean a video image, a video stream (for example, from an IP camera, the camera of an electronic device, a virtual camera, or an Internet application), an ordered sequence of frames (images), or a subsample of frames, down to and including a single image.
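The notion of a subsample of frames, down to a single image, can be sketched as a choice of frame indices. The even spacing and the function name below are assumptions for illustration; the source does not prescribe how the frames are selected.

```python
from typing import List


def evenly_spaced_indices(total_frames: int, count: int) -> List[int]:
    """Pick up to `count` frame indices spread evenly over a video of `total_frames` frames."""
    if total_frames < 1 or count < 1:
        return []
    count = min(count, total_frames)  # cannot pick more frames than exist
    if count == 1:
        return [0]  # degenerate case: the "video" is a single image
    step = (total_frames - 1) / (count - 1)
    return [round(i * step) for i in range(count)]
```

The returned indices can then be passed to whatever decoder is in use to extract the corresponding frames for step (102).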

[0039] At step (102), the obtained images are analyzed for the presence of face images in order to determine whether they have been synthetically changed. Subsequent analysis of the obtained images can be performed using one machine learning model or several (an ensemble) that are trained to detect and classify face images.

[0040] Various machine learning models can be used to detect synthetic changes to face images in video, for example neural network architectures such as fully connected neural networks, CNNs (convolutional networks), RNNs (recurrent networks), Transformers, CapsNet (capsule networks), and combinations thereof.

[0041] During training, the networks can learn to detect one or more features of synthetically modified face images, in particular:

- anatomical proportions of the face and head;

- anatomical features of the arrangement of facial parts;

- proportions of facial parts;

- plasticity and relief of the variety of facial expressions;

- plasticity features of facial details: eyebrows, eyes, nose, ears, lips, skin;

- general characteristics of the muscles of the face and neck;

- structure and division of muscles into groups (facial, masticatory, suboccipital, and others) and their location;

- unnatural shadows, light, highlights, penumbrae, and reflections of the lighting and surroundings on facial details and the surrounding space;

- temperature distribution across facial elements;

- blurring and smoothing in the rendering of elements of the face, head, and other image elements;

- increased sharpness and artificial enhancement of features in the rendering of elements of the face, head, and other image elements;

- graphic artifacts left by generation algorithms and/or their specific software implementations when creating synthetic images.

[0042] It is also possible to use pre-trained neural networks, with or without further training. When architectures with convolutional networks are used, pre-trained models such as the following can be used: AlexNet, VGG, NASNet-A, DenseNet, DenseNet-B, DenseNet-BC, Inception, Xception, GoogleNet, PReLU-net, BN-inception, AmoebaNet, SENet, ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, XResNet, Squeeze-and-Excitation ResNet (SE-ResNet), EfficientNet-B0, EfficientNet-B1, EfficientNet-B2, EfficientNet-B3, EfficientNet-B4, EfficientNet-B5, EfficientNet-B6, EfficientNet-B7, YOLO, and models derived from them.

[0043] Обучение модели машинного обучения производилось как минимум с одним этапом из следующих:[0043] The machine learning model was trained with at least one of the following steps:

- Получение классифицированных (размеченных, с проставленными классами) данных в одном или нескольких форматов: видеопоток, видеофайл, кадры (кадр) видео;- Obtaining classified (labeled, with affixed classes) data in one or more formats: video stream, video file, video frames (frame);

- Выделение кадров в случае получения видеопотока или видеофайла;- Selection of frames in case of receiving a video stream or video file;

- Обнаружение лица (лиц) на кадрах. Их вырезка из кадра с некоторой окрестностью вокруг лица и получение массивов данных лиц;- Detection of face(s) on frames. Cutting them out of the frame with some neighborhood around the face and getting arrays of face data;

- Для данных класса «Синтетически измененное изображение» в случае наличия исходного кадра изображения - кадр из которого формировалось такое измененное изображение, проверка правильности проставленного класса;- For the data of the class "Synthetically modified image" in the case of the presence of the original image frame - the frame from which such a modified image was formed, checking the correctness of the affixed class;

- Для каждого лица производится трансформация его массива данных (значений пикселей, bmp-карты) по алгоритму предобработки (стандартизация данных, масштабирование изображения и другие);- For each face, its data array (pixel values, bmp-map) is transformed according to the preprocessing algorithm (data standardization, image scaling, etc.);

- Аугментация данных;- Data augmentation;

- Формирование пакета данных и подача его на обучение нейронной сети;- Formation of a data package and its submission for training a neural network;

- Computing the value of the objective function and backpropagating the error for the data batch to train the network. The following can be used as quality indicators: LogLoss, accuracy, precision, recall, F-measure, AUC-ROC, AUC-PR, Gini coefficient (Gini index), confusion matrix.
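As an illustration of some of the quality indicators listed above, the following minimal sketch derives accuracy, precision, recall, and F-measure from a confusion matrix for a binary classifier. The function names, the 0.5 decision threshold, and the label convention (1 = synthetically modified) are assumptions made for illustration; the source does not prescribe them.

```python
def confusion_matrix(y_true, y_score, threshold=0.5):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = synthetically modified.

    The threshold of 0.5 is an illustrative assumption.
    """
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def quality_metrics(y_true, y_score, threshold=0.5):
    """Compute accuracy, precision, recall and F-measure from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_score, threshold)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

In practice these indicators would typically be computed on a held-out validation set after each training epoch.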

[0044] One or more of the following algorithms can be used to train the machine learning model: Adagrad (Adaptive gradient algorithm), RMS (Root mean square), RMSProp (Root mean square propagation), Rprop (Resilient backpropagation algorithm), SGD (Stochastic Gradient Descent), BGD (Batch Gradient Descent), MBGD (Mini-batch Gradient Descent), Momentum, Nesterov Momentum, NAG (Nesterov Accelerated Gradient), FussySGD, SGDNesterov (SGD + Nesterov Momentum), AdaDelta, Adam (Adaptive Moment Estimation), AMSGrad, AdamW, ASGD (Averaged Stochastic Gradient Descent), LBFGS (L-BFGS algorithm, the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm), as well as second-order optimizers such as Newton's method, quasi-Newton methods, the Gauss-Newton algorithm, the conjugate gradient method, and the Levenberg-Marquardt algorithm.

[0045] At least one of the following functions is used as the objective function when training the machine learning model: L1Loss, MSELoss, CrossEntropyLoss, CTCLoss, NLLLoss, PoissonNLLLoss, GaussianNLLLoss, KLDivLoss, BCELoss, BCEWithLogitsLoss, MarginRankingLoss, HingeEmbeddingLoss, MultiLabelMarginLoss, HuberLoss, SmoothL1Loss, SoftMarginLoss, MultiLabelSoftMarginLoss, CosineEmbeddingLoss, MultiMarginLoss, TripletMarginLoss, TripletMarginWithDistanceLoss.

[0046] When training the machine learning model, a labeling self-check (automatic label correction) step can be applied, in which each face in an image (frame) labeled as containing a synthetic change is checked to confirm that it actually contains signs of such a change.

[0047] This check is performed when the source video (frames, images) is available. The source video (frames, images) is the real video (not altered by the introduction of a synthetic change) from which the synthetically modified video of the training set (and, additionally, the test set) was produced. The check is implemented as follows and may include the following steps:

- A face detection algorithm detects a face in a frame of the synthetically modified video. The part of the image containing the face and some neighborhood around it is cropped. The size of the neighborhood may vary.

- All faces are detected in the corresponding frame of the source video. The face whose characteristics are closest to those of the face from the previous step is selected. Depending on the face detection algorithm used, proximity is measured at one or more points (sets of points) of the face:

- nose;

- nostrils;

- hairline;

- facial hair line (beard, mustache);

- mouth;

- lips (upper and lower);

- forehead;

- eyes;

- pupils;

- ears;

- eyebrows;

- eyelids;

- head;

- cheekbones;

- chin;

- nasolabial triangle;

- coordinates of the face rectangle.

[0048] Approaches such as the following can be used as the face detection algorithm: adaptive boosting and the Viola-Jones method based on it, MTCNN, elastic graph matching, Facebook's DeepFace, hidden Markov models (HMM), principal component analysis and algorithms based on data matrix decomposition (PCA, SVD, LDA), Active Appearance Models (AAM), Active Shape Models (ASM), FERET (face recognition technology), SURF, NeoFace, SHORE, ROI, template matching methods, DPM (deformable part model), artificial neural networks (multilayer perceptrons), factor analysis (FA), linear discriminant analysis, support vector machines (SVM), the naive Bayes classifier, hidden Markov models, distribution-based methods, combinations of FA and principal component analysis (mixture of PCA, mixture of factor analyzers), and the sparse network of winnows (SNoW).

[0049] Proximity is understood as the minimum distance, for numerical data, under the Bray-Curtis, Canberra, Ruzicka, Kulczynski, or Jaccard metric, the Euclidean distance, the Manhattan metric, the Penrose size distance, the Penrose shape distance, the Lorentzian distance, the Hellinger distance, the Minkowski distance of order p, the Mahalanobis distance, a statistical distance, correlation similarities and distances (Pearson correlation, Orchini similarity, normalized dot product), or another measure. To compute proximity, the coordinates of the points in the frame of the synthetically modified video and the coordinates of the corresponding points in the frame of the source video are taken; the closest face image is then selected as the face with the minimum distances between the points used.
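The selection of the closest face can be sketched as follows, assuming the Euclidean distance (one of the measures listed above) averaged over the landmarks that both faces share. The dictionary layout (landmark name mapped to an (x, y) pair) and the function names are hypothetical and chosen only for illustration.

```python
import math


def mean_landmark_distance(face_a, face_b):
    """Mean Euclidean distance over landmarks present in both faces.

    Each face is a dict: landmark name -> (x, y) pixel coordinates.
    This representation is an assumption, not taken from the source.
    """
    shared = sorted(face_a.keys() & face_b.keys())
    if not shared:
        return math.inf
    total = sum(math.dist(face_a[name], face_b[name]) for name in shared)
    return total / len(shared)


def closest_face(target, candidates):
    """Return the index of the candidate closest to the target, or None."""
    best_index, best_distance = None, math.inf
    for index, candidate in enumerate(candidates):
        distance = mean_landmark_distance(target, candidate)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index
```

Here `target` would be the face found in the synthetically modified frame and `candidates` the faces detected in the corresponding source frame.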

[0050] In one particular embodiment, the face area can instead be located (its coordinates obtained) in a frame of the synthetically modified video, after which the area with the same coordinates is cropped from the frame of the source video. In another particular embodiment, the reverse processing can be performed: a face is detected in the frame of the source video, and the area with the same coordinates is cropped from the frames of the synthetically modified video. These operations yield two images: the face area from the frame of the source video and the face area from the frame of the synthetically modified video.

[0051] The resulting pair of images is compared using a specified metric to assess the level of image distortion. The following can be used as such a metric:

- Peak signal-to-noise ratio (PSNR). https://ru.wikipedia.org/wiki/Пиковое_отношение_сигнала_к_шуму;

- Mean squared error (MSE). https://ru.wikipedia.org/wiki/Среднеквадратическое_отклонение;

- Root-mean-square error (RMSE). https://ru.wikipedia.org/wiki/Пиковое_отношение_сигнала_к_шуму;

- Relative mean deviation (RMD - Root mean squared deviation);

- Root-mean-square deviation (RMS - Root Mean Squared);

- Structural similarity index (SSIM). https://ru.wikipedia.org/wiki/SSIM;

- Structural dissimilarity (DSSIM). https://ru.wikipedia.org/wiki/SSIM;

- Signal-to-noise ratio (SNR). https://ru.wikipedia.org/wiki/Отношение_сигнал/шум/;

- Absolute difference between pixels and measures derived from it (mean, relative, and others).

[0052] If color images (with several components per pixel) are analyzed, the same metrics are applied, followed by weighted averaging over the components. For example, for an RGB image, PSNR or MSE is computed over all three components (and divided by three times the image size). For a good-quality synthetic image and good-quality video (free of interference and noise), PSNR is preferable. If the synthetic image is overlaid on only part of the face, PSNR is also preferable. If the video contains interference or is highly grainy, DSSIM or SSIM is preferable. If there is heavy interference, SNR is preferable. If the video quality is extremely low, for example because of heavy compression, MSE or RMD is preferable. If the face is small relative to the frame, the absolute difference between pixels is used.
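The RGB case described above (summing over all three components and dividing by three times the image size) can be sketched as follows. The image representation (a list of rows of (R, G, B) tuples), the function names, and the peak value of 255 for 8-bit channels are assumptions made for illustration.

```python
import math


def mse_rgb(image_a, image_b):
    """Mean squared error over all pixels and all three color components.

    Each image is a list of rows; each row is a list of (R, G, B) tuples.
    The sum of squared differences is divided by 3 * height * width.
    """
    height, width = len(image_a), len(image_a[0])
    if len(image_b) != height or len(image_b[0]) != width:
        raise ValueError("images must have the same size")
    squared_error = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            squared_error += sum((a - b) ** 2 for a, b in zip(pixel_a, pixel_b))
    return squared_error / (3 * height * width)


def psnr_rgb(image_a, image_b, max_value=255):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = mse_rgb(image_a, image_b)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)
```

Note that PSNR grows as the two crops become more alike, whereas MSE grows as they diverge, so any boundary value has to be chosen with the direction of the selected metric in mind.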

[0053] A boundary (threshold) value is chosen for the metric in use. If the metric value between the two obtained images exceeds this boundary value, the face in the frame is accepted as synthetically modified. If the value is less than or equal to the boundary value, the face image is treated as real, even though it was labeled as synthetically modified.
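A minimal sketch of this label self-check is given below. It assumes a distortion-type metric for which larger values mean a larger difference between the two crops (here, the mean absolute pixel difference on grayscale crops); the function names, the grayscale representation, and the returned label strings are hypothetical.

```python
def mean_absolute_difference(crop_a, crop_b):
    """Mean absolute pixel difference between two equally sized grayscale crops.

    Each crop is a list of rows of integer pixel values.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(crop_a, crop_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def self_check_label(source_crop, modified_crop, boundary):
    """Confirm or correct a 'synthetically modified' label.

    Returns "synthetic" if the distortion exceeds the boundary value,
    otherwise "real" (the label is corrected despite the original markup).
    """
    distortion = mean_absolute_difference(source_crop, modified_crop)
    return "synthetic" if distortion > boundary else "real"
```

A face whose crop is nearly identical to the source crop is thus relabeled as real, which keeps unmodified faces in a modified video from being used as positive training examples.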

[0054] The transformation of face data arrays may use elements such as data normalization, data standardization, resizing to a specified size, and image scaling algorithms.

[0055] Data augmentation for training one or more machine learning models may be performed using at least one of the following approaches: image scaling (enlargement, reduction); image cropping; darkening of the entire image or of individual image channels; lightening of the entire image or of individual image channels; contrast enhancement; color transformations: swapping (shuffling) of color channels, amplification or attenuation of one or more color channels, conversion to grayscale, conversion to a monochrome image, removal of a color channel; shifting and decentering of the image; rotation of the image by various angles in various directions, rotation of the image or a part of it; tilting and skewing of the image; mirroring about an arbitrary axis or line; additional lines or geometric objects in the image: with transparency, without transparency, colored objects, gray objects (from white to black), including removal of part of the image (placing a black object on the image) at geometric or semantic positions of the image; adding any background to the image; glare and darkening of parts of the image; defocusing (blurring) of the image or parts of it; increasing the graininess or sharpness of the image; compression and stretching along axes or lines; adding noise to the whole image or a part of it, insertion of white or other noise; adding one or more elements of Gaussian noise (Blur) or speckle noise; blending (overlaying) two or more images (or parts of images) from the training set with different weights; elastic transformation of the image (Elastic Transform); grid distortion of the image (GridDistortion); compression of the image data by various image processing algorithms at some quality level (for example, compressing the original bmp image to JPEG at some quality and then converting it back to a bmp image); isotropic, affine, and other transformations (https://github.com/albumentations-team/albumentations).

[0056] All of the above are applicable in any graphic representation (color model) or its channels: RGB, sRGB, RGBA, ProPhoto, CMYK, XYZ, LMS, HKS, HSV, HSB, HSL, AHSL, RYB, LAB, NCS, RAL, YUV, YCbCr, YPbPr, YDbDr, YIQ, PMS (Pantone), Munsell. The augmentation methods listed can also be applied to a single image, in any order, with or without a probability of application.
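Applying several augmentations to one image in a chosen order, each with its own probability of application, can be sketched as below. The two transforms (horizontal mirroring and darkening), the grayscale list-of-rows image representation, and all names are illustrative assumptions; a real pipeline would more likely rely on a dedicated augmentation library.

```python
import random


def flip_horizontal(image):
    """Mirror a grayscale image (list of rows) about its vertical axis."""
    return [row[::-1] for row in image]


def darken(image, factor=0.5):
    """Darken every pixel by multiplying it by a factor between 0 and 1."""
    return [[int(pixel * factor) for pixel in row] for row in image]


def augment(image, steps, rng):
    """Apply transforms in order; each step is (transform, probability).

    A transform is applied when a uniform random draw falls below its
    probability, so 1.0 means "always" and 0.0 means "never".
    """
    for transform, probability in steps:
        if rng.random() < probability:
            image = transform(image)
    return image
```

For example, `augment(image, [(flip_horizontal, 0.5), (darken, 0.2)], random.Random())` would mirror roughly half of the images and darken roughly one in five.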

[0057] Using a trained model or a face detection algorithm, faces are extracted at step (102). At step (103), the face images extracted at step (102) are processed to determine which face images belong to the same person. For this purpose, a vector representation of the geometric characteristics of the face images is calculated at step (103). In general, this is done with an algorithm that compares facial reference points. The geometric characteristics are used to determine which face images belong to one and the same person. Forming this vector makes it possible to estimate the probability that the face of a real person is present. The algorithm may operate as follows. The j-th face is extracted in the i-th frame. This j-th face is then searched for in subsequent frames.

[0058] In one particular embodiment of the invention, the search is performed by selecting, among all faces detected in the (i+1)-th frame, the face image that is spatially closest. Depending on the face detection algorithm used, the proximity measure is the proximity (for numerical data, under the Bray-Curtis, Canberra, Ruzicka, Kulczynski, or Jaccard metric, the Euclidean distance, the Manhattan metric, the Penrose size distance, the Penrose shape distance, the Lorentzian distance, the Hellinger distance, the Minkowski distance of order p, the Mahalanobis distance, a statistical distance, correlation similarities and distances such as Pearson correlation, Orchini similarity, and normalized dot product, or another measure) at one or more points of the face (facial reference points): nose, nostrils, hairline, facial hair line (beard, mustache), mouth, lips (upper and lower), forehead, eyes, pupils, ears, eyebrows, eyelids, head, cheekbones, chin, nasolabial triangle, coordinates of the face rectangle.

[0059] The distance between the corresponding reference points of the j-th face in the i-th frame and the points of each face in the (i+1)-th frame is calculated. The face in the (i+1)-th frame with the smallest distances over the reference points is then selected. In another particular embodiment of the invention, the (i+1)-th frame is searched for the face whose reference points have the most similar relative arrangement. In this case, the geometric characteristics (dimensions) of the arrangement of the reference points of the j-th face image are computed, and the (i+1)-th frame is searched for the face image with the most similar geometric characteristics. In yet another embodiment, a spatial neighborhood (location area) in the frame is allocated for each face, and it is checked whether any face image is present in the (i+1)-th frame. The approaches used in implementing the claimed method (100) do not exclude other possible ways of searching for a face image across frames.
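The second embodiment above (matching by the relative arrangement of reference points rather than by their absolute positions) can be sketched as follows. The signature used here, pairwise landmark distances normalized by the largest one, is only one possible choice of "geometric characteristics" and is an assumption, as are the function names and the dictionary layout.

```python
import math
from itertools import combinations


def geometry_signature(landmarks):
    """Pairwise landmark distances, normalized by the largest distance.

    landmarks: dict mapping landmark name -> (x, y). Normalizing makes the
    signature independent of the face's position and scale in the frame.
    """
    names = sorted(landmarks)
    distances = [
        math.dist(landmarks[a], landmarks[b]) for a, b in combinations(names, 2)
    ]
    scale = max(distances, default=0.0) or 1.0
    return [d / scale for d in distances]


def most_similar_face(face, next_frame_faces):
    """Index of the face in frame i+1 with the most similar geometry, or None."""
    reference = geometry_signature(face)
    best_index, best_gap = None, math.inf
    for index, candidate in enumerate(next_frame_faces):
        signature = geometry_signature(candidate)
        if len(signature) != len(reference):
            continue  # different landmark sets cannot be compared directly
        gap = sum(abs(a - b) for a, b in zip(reference, signature))
        if gap < best_gap:
            best_index, best_gap = index, gap
    return best_index
```

Because the signature ignores position and scale, this variant tolerates a person moving toward or away from the camera between frames, at the cost of possibly confusing two people with very similar facial proportions.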

[0060] Next, at step (104), for each face image detected in the frames, an estimate of the probability of its synthetic modification is calculated using the trained machine learning model for detecting and classifying synthetic changes. This estimate is appended to the vector of estimates for the face images of the j-th person. If the image of the j-th face is not detected in the next frame (or series of frames) of the ordered frame sequence, formation of the vector of estimates may end. An example of forming the vector of estimates for a person's face image across video frames is shown in Fig. 2. In another embodiment, the vector of estimates for the face images of the j-th person is formed over the entire video and does not end if the face image is not detected in a subsequent frame.
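The two ways of forming the vector of estimates can be sketched as below. The sketch assumes that the per-frame probability estimates have already been produced by the classifier and are stored as one dictionary per frame (person identifier mapped to estimate); this data layout, the function name, and the `stop_when_missing` flag are hypothetical.

```python
def build_score_vector(frames, person_id, stop_when_missing=True):
    """Collect per-frame estimates for one person over an ordered frame list.

    frames: list of dicts, each mapping person id -> probability estimate
    that the face in that frame is synthetically modified.
    With stop_when_missing=True, collection ends at the first frame where
    the person is no longer found; otherwise the whole video is scanned.
    """
    scores = []
    started = False
    for frame in frames:
        if person_id in frame:
            scores.append(frame[person_id])
            started = True
        elif started and stop_when_missing:
            break
    return scores
```

With `stop_when_missing=False` the function corresponds to the embodiment in which the vector is formed over the entire video.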

[0061] Next, at step (104), the spatial and temporal significance of each detected face image is determined. It is defined as a vector representation of the spatial characteristic of the person's face, which characterizes the size of the face area relative to the frame, and a vector representation of the temporal characteristic of the face image, which characterizes how long the analyzed face image is displayed in the video frames. Fig. 4 shows a diagram of step (104). The calculation of the vector of estimates of synthetic changes in the face images of the j-th person in the video, which consists of the estimates of changes in the face image in each analyzed frame, and the calculation of the vectors of spatial and temporal characteristics (the spatial vector and the temporal vector) may be performed sequentially, as shown in Fig. 4, or in parallel, independently of one another. The description of the invention does not limit the order or method of calculating these vectors; it describes their use to improve the quality of detecting synthetic changes in face images in video.

[0062] FIGS. 3A-3B show an example of calculating the spatial and temporal significance vectors and the vector of synthetic change scores. In this example, three quantities are calculated for each frame (K1)-(K6) of the video obtained at step (101): the vector of scores indicating a synthetically modified face, the vector of the spatial distribution of faces across the frames, and the temporal characteristic of the face across the frames. The spatial characteristic can be calculated as the fraction of the frame area occupied by the face. For example, the rectangle bounding the face in a frame has the coordinates X1=100, Y1=50 (upper left corner) and X2=300, Y2=150 (lower right corner). The area of this rectangle is 200*100=20000. The video has a resolution of 1280×1920 pixels, so the frame area is 2457600. The face therefore occupies 20000/2457600≈0.8% of the frame. The temporal characteristic of each face can be calculated as a scalar value, for example the time for which the face is displayed in the video. In another implementation, a vector may be formed in which 1 is assigned if the person is present in the frame and 0 if the person is absent. The spatial and temporal significance can be represented, in particular, as a common matrix built from the values of the generated vector representations.
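The arithmetic of paragraph [0062] can be reproduced with a short Python sketch. The function names and the frame indices passed to `presence_vector` are illustrative assumptions; only the bounding-box coordinates and the resolution come from the description.

```python
def face_area_fraction(x1, y1, x2, y2, frame_width, frame_height):
    """Fraction of the frame area covered by the face bounding box."""
    box_area = (x2 - x1) * (y2 - y1)
    return box_area / (frame_width * frame_height)


def presence_vector(frames_with_face, total_frames):
    """1 where the person is present in the frame, 0 otherwise."""
    present = set(frames_with_face)
    return [1 if i in present else 0 for i in range(total_frames)]


# Values from the description: box (100, 50)-(300, 150), frame 1280x1920.
fraction = face_area_fraction(100, 50, 300, 150, 1280, 1920)
print(f"{fraction:.2%}")  # 0.81%

# Hypothetical example: the person appears in frames 0, 1, 2 and 5 of 6.
presence = presence_vector([0, 1, 2, 5], total_frames=6)
print(presence)  # [1, 1, 1, 0, 0, 1]
```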

[0063] At step (105), an overall score of synthetic changes in the face images of each person in the video is generated from the vectors obtained at steps (103)-(104). That is, the probability of a synthetic change is estimated for each person's face in the video on the basis of the temporal distribution vector, the spatial distribution vector, and the vector of per-frame probability estimates that the face image was subjected to synthetic changes.

[0064] A separate machine learning model can be used to generate the overall score of synthetic changes in the face images of the j-th person. To form this score, the spatial and temporal distribution vectors and the vector of synthetic change scores for the face image are combined into a common two-dimensional matrix, shown in Table 1 for the example of FIG. 3A. The resulting matrix is fed to the machine learning model, which outputs the overall score of synthetic change for the face of the j-th person in the video. The model can be a recurrent neural network, a convolutional neural network, or a fully connected neural network. This way of combining the vectors is recommended when a person appears in several separate time intervals of the video rather than in a single consecutive series of frames.

[0065] In another particular embodiment of the invention, the vector of the spatial distribution of the person's face image and the vector of synthetic change scores are combined into a common two-dimensional matrix, but only over the frames in which that person's face is present. An example is shown in Table 2 for FIG. 3A. This way of combining the vectors is recommended when a person appears in a single consecutive series of frames.
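The two ways of combining the vectors described in paragraphs [0064] and [0065] might be sketched as follows. This is a minimal illustration, not the layout of Tables 1 and 2: the row order, the function names, the sample values, and the use of 0.0 as a filler for frames without the face are all assumptions.

```python
def combine_all_frames(presence, area, scores):
    """Variant of [0064]: one column per frame of the video."""
    if not (len(presence) == len(area) == len(scores)):
        raise ValueError("all vectors must cover the same frames")
    return [list(presence), list(area), list(scores)]


def combine_present_frames(presence, area, scores):
    """Variant of [0065]: keep only the frames where the face is present."""
    if not (len(presence) == len(area) == len(scores)):
        raise ValueError("all vectors must cover the same frames")
    keep = [i for i, p in enumerate(presence) if p == 1]
    return [[area[i] for i in keep], [scores[i] for i in keep]]


# Hypothetical six-frame video; the person is absent in frames 2 and 3.
presence = [1, 1, 0, 0, 1, 1]
area = [0.010, 0.012, 0.0, 0.0, 0.008, 0.009]   # 0.0 is an assumed filler
scores = [0.9, 0.8, 0.0, 0.0, 0.7, 0.6]         # 0.0 is an assumed filler

full_matrix = combine_all_frames(presence, area, scores)         # 3 x 6
compact_matrix = combine_present_frames(presence, area, scores)  # 2 x 4
```

The full matrix keeps the gaps in time visible to the model, which matches the recommendation for people who appear in several separate intervals; the compact matrix drops them.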

FIG. 8 shows one variant of generating a notification that the video contains changes, for the case where the overall score of synthetic changes in the face images of the j-th person is calculated at step (105) by a trained model that takes as input the matrix combining the spatio-temporal representation vectors and the score vector.

[0066] In another particular embodiment of the invention, at step (105) the score vector indicating that the face image in a frame was subjected to synthetic changes is analyzed separately from the temporal and spatial distribution vectors. An example of this scheme is shown in FIG. 5. In this case, the overall score of synthetic change for the face of the j-th person is based only on the vector of synthetic change scores. A separate machine learning model or a separate algorithm can be used to generate the overall score.

[0067] In one particular embodiment of the invention, shown in FIG. 5, the score vector is fed to a separately trained model. The model can be a recurrent neural network, a convolutional neural network, or a fully connected neural network. In such cases a vector of fixed length can be used. If the score vector obtained for a face is shorter than the specified length, it is padded at a chosen end with a value such as 0.5. If the vector is longer than the specified length, it is truncated at a chosen end.
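The padding and truncation described in paragraph [0067] can be sketched as below. The function name, the `side` parameter, and the sample vectors are assumptions made for illustration; the padding value 0.5 is the example given in the description.

```python
def fit_length(scores, target_len, pad_value=0.5, side="right"):
    """Pad or truncate a score vector to exactly target_len entries.

    side="right" pads or cuts at the end of the vector,
    side="left" pads or cuts at the beginning.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    scores = list(scores)
    if len(scores) >= target_len:
        if side == "right":
            return scores[:target_len]
        return scores[len(scores) - target_len:]
    padding = [pad_value] * (target_len - len(scores))
    return scores + padding if side == "right" else padding + scores


print(fit_length([0.9, 0.8], 4))                    # [0.9, 0.8, 0.5, 0.5]
print(fit_length([0.1, 0.2, 0.3], 2))               # [0.1, 0.2]
print(fit_length([0.1, 0.2, 0.3], 2, side="left"))  # [0.2, 0.3]
```

A neutral padding value such as 0.5 presumably avoids biasing the model toward either class for frames that do not exist.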

[0068] In another particular embodiment of the invention, the number of scores falling into given intervals, or the frequency of scores per interval, is counted. For example, intervals with a step of 0.1 are taken: [0-0.1; 0.1-0.2; 0.2-0.3; 0.3-0.4; 0.4-0.5; 0.5-0.6; 0.6-0.7; 0.7-0.8; 0.8-0.9; 0.9-1], and the frequency of scores from the vector in each interval is calculated. The resulting values are fed to a machine learning model, for example a support vector machine (SVM), K-nearest neighbors, linear (or non-linear) regression, or a classification tree model. The description of the invention does not limit the type of machine learning model; it describes the application of such a model to the obtained score vector.
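The interval counting of paragraph [0068] amounts to a ten-bin histogram over [0, 1]. The sketch below is one possible reading: the description does not say which interval a boundary value belongs to, so the half-open bins with 1.0 placed in the last bin are an assumption, as are the function name and the sample scores.

```python
def score_histogram(scores, n_bins=10, normalize=True):
    """Count scores per interval of width 1/n_bins over [0, 1].

    Bins are half-open [a, b); the value 1.0 goes into the last bin.
    With normalize=True the counts are divided by the number of scores.
    """
    counts = [0] * n_bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores are expected in [0, 1]")
        idx = min(int(s * n_bins), n_bins - 1)
        counts[idx] += 1
    if normalize and scores:
        return [c / len(scores) for c in counts]
    return counts


# Hypothetical per-frame scores for one face.
frame_scores = [0.05, 0.12, 0.18, 0.95, 1.0, 0.55]
counts = score_histogram(frame_scores, normalize=False)
print(counts)  # [1, 2, 0, 0, 0, 1, 0, 0, 0, 2]
frequencies = score_histogram(frame_scores)
```

The resulting fixed-length vector can be passed to any of the classifiers listed above regardless of how many frames the face appears in.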

[0069] In yet another particular embodiment of the invention, the overall score of synthetic changes in a person's face images is obtained by averaging the score vector or by taking its maximum value, over either the entire vector or a part of it.
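A minimal sketch of the aggregation in paragraph [0069] follows. The function name, the `part` argument (a Python `slice` selecting part of the vector), and the sample values are illustrative assumptions.

```python
from statistics import mean


def overall_score(scores, method="mean", part=None):
    """Aggregate per-frame scores by mean or max, optionally over a slice."""
    selected = scores[part] if part is not None else scores
    if not selected:
        raise ValueError("no scores to aggregate")
    if method == "mean":
        return mean(selected)
    if method == "max":
        return max(selected)
    raise ValueError(f"unknown method: {method}")


frame_scores = [0.2, 0.4, 0.9]
mean_all = overall_score(frame_scores)                        # about 0.5
max_all = overall_score(frame_scores, "max")                  # 0.9
max_part = overall_score(frame_scores, "max", slice(0, 2))    # 0.4
```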

[0070] In one particular embodiment of the invention, shown in FIG. 5, overall spatial and temporal characteristics of a person's face images are constructed for further analysis. The overall spatial characteristic is calculated as the mean of the spatial vector of the given face. The overall temporal characteristic is obtained as the ratio of the length of the temporal characteristic vector to the length of the video, that is, the fraction of the total video time during which the person is present. In another particular variant, the maximum or the minimum value is taken as the overall spatial characteristic.
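The overall characteristics of paragraph [0070] can be sketched as follows. The sketch assumes that the temporal vector holds one entry per frame in which the person appears and that the video length is measured in frames; the function names and sample values are hypothetical.

```python
def overall_spatial(area_fractions, method="mean"):
    """Overall spatial characteristic: mean, max or min of the spatial vector."""
    if not area_fractions:
        raise ValueError("spatial vector is empty")
    if method == "mean":
        return sum(area_fractions) / len(area_fractions)
    if method == "max":
        return max(area_fractions)
    if method == "min":
        return min(area_fractions)
    raise ValueError(f"unknown method: {method}")


def overall_temporal(temporal_vector, total_frames):
    """Share of the video during which the person is present."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return len(temporal_vector) / total_frames


# Hypothetical face seen in 4 frames of a 6-frame video.
areas = [0.010, 0.012, 0.008, 0.010]
spatial = overall_spatial(areas)                           # about 0.01
temporal = overall_temporal([1, 1, 1, 1], total_frames=6)  # about 0.667
```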

[0071] At step (106), a final score of the presence of synthetic face changes is calculated for the entire video. This score is built from the overall synthetic change scores of all the people's faces. In other words, steps (104)-(105) yield a synthetic change score for each person in the video (individual people are distinguished at step (103)), and step (106) derives a score for the video from these per-person scores. This joint analysis of the scores of all people in the video improves detection quality compared with existing solutions. For example, if the video contains many people and all of them receive a high synthetic change score, the video is most likely heavily compressed and the face image models have produced false positives. In that case, the joint analysis at step (106) makes it possible to give the video the final rating "video without synthetic changes".

[0072] In one particular embodiment of the invention, shown in FIG. 1, the overall scores of the faces of all people can be used to form the final score for the video in the following ways:

- The weighted average of the scores of the people's faces in use is determined. In one particular embodiment of the invention, the weight for each score can be the product of the average size of that person's face images and the fraction of the video time during which the person is present.

- The simple average of the synthetic change scores of the people's faces in use is calculated.

- The maximum among the scores of the people's faces in use is taken.
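The three aggregation options listed above can be sketched in a few lines of Python. The tuple layout `(score, mean_face_size, presence_share)`, the function name, and the sample values are assumptions made for illustration.

```python
def video_score(people, method="weighted"):
    """Aggregate per-person scores into one score for the video.

    people: list of (score, mean_face_size, presence_share) tuples.
    """
    if not people:
        raise ValueError("at least one person is required")
    scores = [score for score, _, _ in people]
    if method == "max":
        return max(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "weighted":
        # Weight = average face size * share of time present in the video.
        weights = [size * share for _, size, share in people]
        total = sum(weights)
        if total == 0:
            raise ValueError("weights sum to zero")
        return sum(w * s for w, s in zip(weights, scores)) / total
    raise ValueError(f"unknown method: {method}")


# Hypothetical video with two people.
people = [(0.9, 0.04, 0.5), (0.1, 0.01, 1.0)]
weighted = video_score(people)          # about 0.633
simple = video_score(people, "mean")    # 0.5
highest = video_score(people, "max")    # 0.9
```

With these weights, a large face that is on screen for a long time contributes more to the video score than a small or briefly visible one.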

[0073] In another particular embodiment of the invention, a trained model may be used for the example above. The model can be a support vector machine (SVM), K-nearest neighbors, linear (or non-linear) regression, a classification tree model, or one or more neural networks. Such a model can take as input a vector (a vector representation of the data) that characterizes how many per-face synthetic change scores fall into each score interval.

[0074] In another particular embodiment of the invention, the steps of which are illustrated in FIG. 5, overall spatial and temporal characteristics of a person's face images are formed. At step (106), these characteristics are compared with the corresponding threshold values. If the overall characteristic shows that the size of the face image or the time of its presence is below the threshold, the score of that person's face is not taken into account when calculating the probability of synthetic change for the video (the score is set to None). The scheme of this example is shown in FIG. 6. The remaining synthetic change scores of people's faces are then analyzed by the methods described above.
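The threshold filtering of paragraph [0074] might look like the sketch below. The threshold values, the tuple layout, and the function name are hypothetical; the description does not give concrete thresholds.

```python
def apply_thresholds(people, min_size, min_share):
    """Replace the score with None for faces that are too small or too brief.

    people: list of (score, overall_size, presence_share) tuples.
    """
    return [
        score if size >= min_size and share >= min_share else None
        for score, size, share in people
    ]


# Hypothetical people: the second face is too small, the third too brief.
people = [(0.9, 0.04, 0.5), (0.8, 0.001, 0.6), (0.7, 0.03, 0.02)]
filtered = apply_thresholds(people, min_size=0.005, min_share=0.1)
print(filtered)  # [0.9, None, None]

# Only the remaining scores take part in the video-level aggregation.
kept = [s for s in filtered if s is not None]
```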

[0075] In another particular embodiment of the invention, the two-dimensional matrices of vector representations of the spatio-temporal distribution and the synthetic change scores for the faces of different people, formed as described above, are fed to step (106), where the final score of the presence of synthetic changes in people's faces in the video is formed. This step can also be performed by a separate machine learning model or by an ensemble of trained models.

[0076] At step (107), an integral score of the presence of a synthetically modified face image in the video is formed from the final scores of the presence of synthetic changes in people's faces in the video. At least one such final score is used, each produced by a separate model for classifying and detecting synthetic face changes. Upon completion of step (107), a notification about the presence of a synthetically modified face in the video is generated.

[0077] The notification may be displayed directly in the graphical user interface, for example during an online conference (Zoom, Skype, MS Teams). The notification may also be displayed directly in the area where the synthetic face change was detected, for example in the area showing the person's face. An additional benefit of the invention is its applicability in biometric control systems, for example when obtaining services (such as banking services) or access (an access control system, a turnstile with a biometric sensor). When a synthetically modified face image is detected, access or the action requested by the user is blocked. In this case, additional user authentication data may be requested, selected from the group: login, code, password, two-factor authentication, or combinations thereof.

[0078] The claimed solution can be used in systems for monitoring the media space and analyzing social and mass media, in order to identify attempts to compromise publicly known people (heads of state, media personalities, celebrities, and the like). Such systems serve as the source of the video for subsequent analysis, and if synthetic changes in the face images of such people are detected, a notification about the falsified information may be sent to them or to the relevant service. For this type of notification, information about the time and the source of the detected event can also be stored.

[0079] In one particular embodiment of the invention, several models for detecting synthetic changes in face images are used, with at least one model trained for each algorithm for generating synthetic changes.

[0080] In another particular embodiment, an ensemble of models is trained for each algorithm for generating synthetic changes. The scores of the models in such an ensemble are averaged.
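The per-algorithm averaging of paragraph [0080] can be illustrated as follows. The algorithm labels and the scores are placeholders, not names or results from the description.

```python
def ensemble_scores(scores_by_algorithm):
    """Average the scores of the models in each per-algorithm ensemble.

    scores_by_algorithm: dict mapping an algorithm label to the list of
    scores produced by the models trained for that algorithm.
    """
    averaged = {}
    for algorithm, model_scores in scores_by_algorithm.items():
        if not model_scores:
            raise ValueError(f"no scores for {algorithm}")
        averaged[algorithm] = sum(model_scores) / len(model_scores)
    return averaged


# Placeholder labels and scores for two generation algorithms.
raw = {"algorithm_a": [0.9, 0.7, 0.8], "algorithm_b": [0.2, 0.4]}
averaged = ensemble_scores(raw)
# averaged["algorithm_a"] is about 0.8, averaged["algorithm_b"] about 0.3
```

The resulting one-score-per-algorithm mapping is the kind of input a downstream classifier over the ensembles could consume.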

[0081] For the final classification, the obtained scores are processed by an integral classifier, which makes it possible to reveal hidden relationships between the predictions of the models for different algorithms for generating synthetic changes. This makes it possible to achieve a super-additive (synergistic) effect and to improve the quality of detecting videos that contain synthetic changes in face images. The general scheme is shown in FIG. 7. A more detailed scheme is shown in FIG. 8.

[0082] In another particular embodiment of the invention, the integral classifier produces not only the integral score of the presence of synthetic changes in people's faces in the video but also the most probable algorithm with which these synthetic changes were created. This example is shown in FIG. 7.

[0083] FIG. 9 shows a general view of a computing device (600) suitable for implementing the claimed solution. The device (600) may be, for example, a computer, a server, or another type of computing device that can be used to implement the claimed technical solution, including as part of a cloud computing platform.

[0084] In general, the computing device (600) comprises one or more processors (601), memory such as RAM (602) and ROM (603), input/output interfaces (604), input/output devices (605), and a networking device (606), all connected by a common information exchange bus.

[0085] The processor (601) (or several processors, or a multi-core processor) may be selected from the range of devices widely used at present, for example from Intel™, AMD™, Apple™, Samsung Exynos™, MediaTEK™, Qualcomm Snapdragon™, and the like. A graphics processor, for example from Nvidia, AMD, Graphcore, and others, can also be used as the processor (601).

[0086] The RAM (602) is random access memory intended for storing machine-readable instructions executed by the processor (601) to perform the necessary logical data processing operations. The RAM (602) typically contains executable instructions of the operating system and of the relevant software components (applications, program modules, and the like).

[0087] The ROM (603) is one or more persistent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), and others.

[0088] Various types of I/O interfaces (604) are used to organize the operation of the components of the device (600) and of externally connected devices. The choice of interfaces depends on the particular design of the computing device and may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/Audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0089] To enable user interaction with the computing device (600), various information I/O means (605) are used, for example a keyboard, a display (monitor), a touch display, a touchpad, a joystick, a mouse, a light pen, a stylus, a touch panel, a trackball, speakers, a microphone, augmented reality devices, optical sensors, a tablet, indicator lights, a projector, a camera, biometric identification devices (retinal scanner, fingerprint scanner, voice recognition module), and the like.

[0090] The networking means (606) enables the device (600) to transmit data over an internal or external computer network, for example an intranet, the Internet, a LAN, and the like. One or more means (606) may be, without limitation: an Ethernet card, a GSM modem, a GPRS modem, an LTE modem, a 5G modem, a satellite communication module, an NFC module, a Bluetooth and/or BLE module, a Wi-Fi module, and others.

[0091] Satellite navigation means, for example GPS, GLONASS, BeiDou, or Galileo, can additionally be used as part of the device (600).

[0092] The application materials presented disclose preferred examples of implementing the technical solution and should not be construed as limiting other, particular examples of its implementation that do not go beyond the scope of the requested legal protection and that are obvious to those skilled in the relevant art.

Claims

1. A computer-implemented method for determining synthetically modified images of faces in a video, which is performed using a processor and contains the following steps:

a) obtaining at least one image from the video;

b) detecting images of faces in said image;

c) calculating a vector representation of the geometric characteristics of the detected face images using at least a face reference point comparison algorithm to determine images of at least one person's face;

d) calculating, using frame-by-frame video analysis, the spatio-temporal significance of each face image of each person in said image, defined as a vector representation of the spatial characteristic of the face, characterizing the size of the face area relative to the frame, and a vector representation of the temporal characteristic of the face image, characterizing the display time of the analyzed face image across the video frames;

e) calculating a vector of estimates of the probability of synthetic changes for images of human faces, characterizing the presence of synthetic changes in the images of faces of this person in each frame;

f) calculating an overall synthetic change probability score based on vector representations of the spatial, temporal, and synthetic change score vectors for each person's face images in the video;

g) forming a final assessment of the presence in the video of a synthetic image change of at least one face;

h) forming an integral estimate of the presence of a synthetically modified face image in the video based on at least one final assessment produced by a model, and generating a notification about the presence of a synthetically modified face in the video.

2. The method according to claim 1, characterized in that steps c)-h) are performed by a machine learning model or ensemble of models, while the machine learning model or ensemble of models is trained on a data set containing synthesized images of people's faces.

3. The method according to claim 2, characterized in that the machine learning model uses the automatic markup correction function, which corrects the incorrect markup of each face in the frames by comparing the images of faces on the synthesized video with their images on the original video.

4. The method according to claim 3, characterized in that the faces are compared based on the value of the vector proximity of the reference points that form the geometric characteristics of the original face image and of the synthesized image derived from it.

5. The method according to claim 3, characterized in that the face comparison is carried out by analyzing the coordinates of the areas of the original face image and the synthesized face image.
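The two comparison options in claims 4 and 5 can be sketched as follows. The claims do not name a specific metric, so this sketch assumes cosine similarity of centroid-centered reference points for the vector proximity of claim 4 and intersection over union (IoU) of bounding boxes for the area coordinate analysis of claim 5. The function names and the default threshold are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def landmark_similarity(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Cosine similarity of two reference-point sets after centering (assumed metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("landmark sets must be non-empty and of equal length")

    def centered(points: Sequence[Point]) -> List[float]:
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        return [c for x, y in points for c in (x - cx, y - cy)]

    va, vb = centered(a), centered(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two face areas (assumed metric)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def best_match(synth_box: Box, original_boxes: Sequence[Box],
               min_iou: float = 0.5) -> Optional[int]:
    """Index of the original face that best overlaps the synthesized one, or None."""
    best_index, best_value = None, min_iou
    for index, box in enumerate(original_boxes):
        value = box_iou(synth_box, box)
        if value >= best_value:
            best_index, best_value = index, value
    return best_index
```

Centering the reference points makes the similarity insensitive to where the face sits in the frame, although it remains sensitive to scale and rotation. A face in the synthesized video with no match in the original video would be a candidate for label correction.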

6. The method according to claim 1, characterized in that the spatio-temporal significance is calculated as a general matrix based on the values of the vector representations, and the assessment of the presence of synthetic changes in the face images of an individual is formed by a machine learning model based on the obtained general matrix.

7. The method according to claim 2, characterized in that the ensemble of machine learning models consists of a group of models, each of which is trained to identify a specific algorithm for generating synthetic images.

8. The method according to claim 7, characterized in that it contains an integral classifier that receives as input estimates generated using models included in the ensemble.

9. The method according to claim 8, characterized in that the final assessment is calculated using the integral classifier.

10. The method according to claim 9, characterized in that the algorithm for generating a synthetic face image in the analyzed video stream is additionally determined.
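The ensemble arrangement of claims 7 through 10 can be sketched as below. This is an illustration under stated assumptions, not the claimed classifier: the integral classifier is modeled as a logistic combination of per-model scores, the weights and bias are fixed by hand rather than learned, and the algorithm labels `face_swap` and `reenactment` are hypothetical placeholders.

```python
import math
from typing import Dict


def integral_classifier(scores: Dict[str, float],
                        weights: Dict[str, float],
                        bias: float = 0.0) -> float:
    """Combine per-model scores into one probability (assumed logistic form)."""
    z = bias + sum(weights[name] * score for name, score in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def likely_algorithm(scores: Dict[str, float]) -> str:
    """Name of the ensemble member with the highest score (cf. claim 10)."""
    return max(scores, key=scores.get)


# Hypothetical ensemble output: one score per generation algorithm.
scores = {"face_swap": 0.9, "reenactment": 0.1}
weights = {"face_swap": 2.0, "reenactment": 2.0}  # would normally be learned
final_assessment = integral_classifier(scores, weights, bias=-1.0)
detected_algorithm = likely_algorithm(scores)
```

Because each ensemble member is trained on one generation algorithm, the member with the highest score also suggests which algorithm produced the synthetic face.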

11. The method according to claim 1, characterized in that the video is an online video conference.

12. The method according to claim 11, characterized in that when a synthetically modified face image is detected, a notification is generated in its display area.

13. The method according to claim 11, characterized in that when a synthetically modified face image is detected, the connection with this user is blocked.

14. The method according to claim 1, characterized in that the analyzed image is obtained from a biometric identification or biometric authentication system.

15. The method according to claim 14, characterized in that when a synthetically modified face image is detected, the user's access or the requested action is blocked.

16. The method according to claim 14, characterized in that, when a synthetically modified face image is detected, user authentication data is additionally requested, selected from the group: login, code, password, two-factor authentication, or combinations thereof.

17. The method according to claim 14, characterized in that a signal is generated in the form of a quantitative estimate of the probability of the presence of a synthetically modified face image.
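One way the responses of claims 15 through 17 could be tied together in a biometric system is a threshold policy over the quantitative probability estimate. The claims do not specify thresholds or how the responses are combined, so the values and names below (`Action`, `decide`, `step_up_threshold`, `block_threshold`) are assumptions for illustration.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "request_additional_authentication"  # cf. claim 16
    BLOCK = "block"                                # cf. claim 15


def decide(probability: float,
           step_up_threshold: float = 0.5,
           block_threshold: float = 0.9) -> Action:
    """Map the probability signal of claim 17 to a response (hypothetical policy)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= block_threshold:
        return Action.BLOCK
    if probability >= step_up_threshold:
        return Action.STEP_UP
    return Action.ALLOW
```

For example, `decide(0.7)` returns `Action.STEP_UP` under the default thresholds. A deployment would tune both thresholds against its own rates of false acceptance and false rejection.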

18. The method according to claim 1, characterized in that the images are obtained from a video system for monitoring the media space and analyzing social media and mass media, which performs content verification in social media and mass media.

19. The method according to claim 18, characterized in that when a synthetically modified face image is detected, a notification is generated to inform the person whose face image was modified.

20. A system for determining synthetically modified images of faces in a video, containing at least one processor and at least one memory storing machine-readable instructions that, when executed by the processor, implement the method according to any one of claims 1-19.