
CN103347196A - Method for evaluating stereo image vision comfort level based on machine learning - Google Patents

Method for evaluating stereo image vision comfort level based on machine learning

Info

Publication number
CN103347196A
Authority
CN
China
Prior art keywords
value
image
pixel
coordinate position
denoted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102649568A
Other languages
Chinese (zh)
Other versions
CN103347196B (en)
Inventor
邵枫
姜求平
蒋刚毅
郁梅
李福翠
彭宗举
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ape Point Technology Beijing Co ltd
Zheng Juan
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN201310264956.8A priority Critical patent/CN103347196B/en
Publication of CN103347196A publication Critical patent/CN103347196A/en
Application granted granted Critical
Publication of CN103347196B publication Critical patent/CN103347196B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a machine-learning-based method for evaluating the visual comfort of stereoscopic images. A visually important region mask of the stereoscopic image is first extracted from the saliency map of the right-viewpoint image and the right disparity image. Using this mask, feature vectors reflecting the disparity magnitude features, the disparity gradient features, and the spatial frequency features are extracted and combined into the feature vector of the stereoscopic image. Support vector regression is then used to train on the feature vectors of all stereoscopic images in a stereoscopic image set, and the trained support vector regression model is finally used to test each stereoscopic image in the set, yielding a predicted visual comfort evaluation value for each image. The advantage is that the extracted feature vector information is highly stable and reflects changes in the visual comfort of stereoscopic images well, thereby effectively improving the correlation between objective evaluation results and subjective perception.

Description

A method for evaluating the visual comfort of stereoscopic images based on machine learning

Technical Field

The invention relates to an image quality evaluation method, and in particular to a machine-learning-based method for evaluating the visual comfort of stereoscopic images.

Background Art

With the rapid development of stereoscopic display technology and high-quality stereoscopic content acquisition technology, the quality of experience (QoE) of stereoscopic video has become an important issue in the design of stereoscopic video systems, and visual comfort (VC) is an important factor affecting it. At present, research on the quality evaluation of stereoscopic video/images mainly considers the influence of content distortion on image quality and seldom considers factors such as visual comfort. Therefore, in order to improve the viewer's quality of visual experience, studying objective visual comfort evaluation models for stereoscopic video/images plays a very important role in guiding the production and post-processing of 3D content.

Traditional methods for evaluating the visual comfort of stereoscopic images mainly use global disparity statistics to predict visual comfort. However, according to the attention characteristics of human stereoscopic vision, the human eye is only sensitive to the visual comfort or discomfort of certain visually important regions; predicting the visual comfort of these regions from global disparity statistics therefore fails to yield accurate objective evaluation values. How to effectively extract visual comfort features from visually important regions during evaluation, so that the objective results better match the human visual system, is thus a problem that needs to be studied and solved in the objective visual comfort evaluation of stereoscopic images.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a machine-learning-based method for evaluating the visual comfort of stereoscopic images that can effectively improve the correlation between objective evaluation results and subjective perception.

The technical solution adopted by the present invention to solve the above technical problem is a machine-learning-based method for evaluating the visual comfort of stereoscopic images, characterized in that it comprises the following steps:

① Denote the left-viewpoint image of the stereoscopic image to be evaluated as {I_L(x,y)}, its right-viewpoint image as {I_R(x,y)}, and its right disparity image as {d_R(x,y)}, where (x,y) denotes the coordinate position of a pixel in {I_L(x,y)}, {I_R(x,y)} and {d_R(x,y)}, 1≤x≤W, 1≤y≤H, W denotes the width of {I_L(x,y)}, {I_R(x,y)} and {d_R(x,y)}, H denotes their height, I_L(x,y) denotes the pixel value of the pixel at coordinate position (x,y) in {I_L(x,y)}, I_R(x,y) denotes the pixel value of the pixel at (x,y) in {I_R(x,y)}, and d_R(x,y) denotes the pixel value of the pixel at (x,y) in {d_R(x,y)};

② Extract the saliency map of {I_R(x,y)}; then, from the saliency map of {I_R(x,y)} and {d_R(x,y)}, obtain the visual saliency map of {I_R(x,y)}; divide the visual saliency map of {I_R(x,y)} into a visually important region and a non-visually important region; finally, from these two regions, obtain the visually important region mask of the stereoscopic image to be evaluated, denoted {M(x,y)}, where M(x,y) denotes the pixel value of the pixel at coordinate position (x,y) in {M(x,y)};

③ From {d_R(x,y)} and {M(x,y)}, obtain the disparity mean μ, disparity variance δ, maximum negative disparity θ and disparity range χ of the pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}; then arrange μ, δ, θ and χ in order to form the feature vector reflecting the disparity magnitude features of {d_R(x,y)}, denoted F_1, F_1 = (μ, δ, θ, χ);

④ Compute the disparity gradient edge image of {d_R(x,y)} from its disparity gradient magnitude image and disparity gradient direction image; then, from the disparity gradient edge image and {M(x,y)}, compute the gradient mean ψ of all pixels of the disparity gradient edge image in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}; finally take ψ as the feature vector reflecting the disparity gradient features of {d_R(x,y)}, denoted F_2;

⑤ Obtain the spatial frequency image of {I_R(x,y)}; then, from the spatial frequency image and {M(x,y)}, obtain the spatial frequency mean ν, spatial frequency variance ρ, spatial frequency range ζ and spatial frequency sensitivity factor τ of the pixels of the spatial frequency image in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}; then arrange ν, ρ, ζ and τ in order to form the feature vector reflecting the spatial frequency features of {I_R(x,y)}, denoted F_3, F_3 = (ν, ρ, ζ, τ);

⑥ Combine F_1, F_2 and F_3 into a new feature vector, denoted X, X = [F_1, F_2, F_3], and take X as the feature vector of the stereoscopic image to be evaluated, where the symbol "[]" denotes a vector and [F_1, F_2, F_3] means concatenating F_1, F_2 and F_3 into a new feature vector;

⑦ Establish a stereoscopic image set from n different stereoscopic images and their corresponding right disparity images, and use a subjective quality evaluation method to obtain the mean opinion score of the visual comfort of each stereoscopic image in the set, denoted MOS, where n≥1 and MOS ∈ [1,5]; then, following the operations of steps ① to ⑥ for computing the feature vector X of the stereoscopic image to be evaluated, compute in the same way the feature vector of each stereoscopic image in the set, denoting the feature vector of the i-th stereoscopic image as X_i, where 1≤i≤n and n denotes the number of stereoscopic images in the set;

⑧ Divide all the stereoscopic images in the set into a training set and a test set; form the training sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the training set, and the test sample data set from those of the test set; then use support vector regression as the machine learning method to train on the feature vectors of all stereoscopic images in the training sample data set, such that the error between the regression function values obtained through training and the mean opinion scores is minimized, fitting the optimal weight vector w_opt and the optimal bias term b_opt; construct the support vector regression training model from w_opt and b_opt; then, with this model, test the feature vector of each stereoscopic image in the test sample data set and predict the objective visual comfort evaluation value of each such image, denoting the predicted value of the k'-th stereoscopic image in the test sample data set as Q_{k'}, Q_{k'} = f(X_{k'}),

$$f(X_{k'}) = (w_{opt})^T \varphi(X_{k'}) + b_{opt},$$

where f() is the function notation, X_{k'} denotes the feature vector of the k'-th stereoscopic image in the test sample data set, (w_opt)^T is the transpose of w_opt, φ(X_{k'}) denotes the linear function of the k'-th stereoscopic image in the test sample data set, 1≤k'≤n−t, and t denotes the number of stereoscopic images in the training set; then, by reassigning the training set and the test set, re-predict the objective visual comfort evaluation value of each stereoscopic image in the test sample data set; after N iterations, compute the average of the objective visual comfort evaluation values of each stereoscopic image in the set, and take this average as the final objective visual comfort evaluation value of that image, where N is taken greater than 100.

The specific process of step ② is:

②-1. Extract the saliency map of {I_R(x,y)} using the graph-theory-based visual saliency model, denoted {SM_R(x,y)}, where SM_R(x,y) denotes the pixel value of the pixel at coordinate position (x,y) in {SM_R(x,y)};

②-2. From {SM_R(x,y)} and {d_R(x,y)}, obtain the visual saliency map of {I_R(x,y)}, denoted {D_R(x,y)}, with the pixel value of the pixel at coordinate position (x,y) denoted D_R(x,y) and computed as the weighted combination

$$D_R(x,y) = \omega_{SM} \times SM_R(x,y) + \omega_d \times d_R(x,y),$$

where ω_SM denotes the weight of SM_R(x,y) and ω_d denotes the weight of d_R(x,y);

②-3. According to the pixel value of each pixel in {D_R(x,y)}, divide {D_R(x,y)} into a visually important region and a non-visually important region: the pixel value of every pixel in the visually important region is greater than an adaptive threshold T_1, and the pixel value of every pixel in the non-visually important region is less than or equal to T_1, where T_1 is the threshold obtained by applying Otsu's method to {D_R(x,y)};

②-4. From the visually important and non-visually important regions of {D_R(x,y)}, obtain the visually important region mask of the stereoscopic image to be evaluated, denoted {M(x,y)}, with the pixel value of the pixel at coordinate position (x,y) denoted M(x,y):

$$M(x,y) = \begin{cases} 1, & D_R(x,y) > T_1 \\ 0, & D_R(x,y) \le T_1 \end{cases}$$

In step ②-2, fixed values are taken for the weights ω_SM and ω_d (the specific values are given as an image in the original).

The specific process of step ③ is:

③-1. From {d_R(x,y)} and {M(x,y)}, compute the disparity mean of all pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted μ:

$$\mu = \frac{\sum_{(x,y)\in\Omega} d_R(x,y) \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)},$$

where Ω denotes the image domain;

③-2. From {d_R(x,y)}, {M(x,y)} and μ, compute the disparity variance of all pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted δ:

$$\delta = \frac{\sum_{(x,y)\in\Omega} (d_R(x,y)-\mu)^2 \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)};$$

③-3. Compute the maximum negative disparity of the pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted θ, where θ is the mean disparity of the 1% of pixels with the smallest disparity values in that region;

③-4. Compute the disparity range of the pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted χ, χ = d_max − d_min, where d_max denotes the mean disparity of the 1% of pixels with the largest disparity values in that region, and d_min denotes the mean disparity of the 1% of pixels with the smallest disparity values in that region;

③-5. Arrange μ, δ, θ and χ in order to form the feature vector reflecting the disparity magnitude features of {d_R(x,y)}, denoted F_1, F_1 = (μ, δ, θ, χ); the dimension of F_1 is 4.
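A minimal sketch of the F_1 computation in step ③, assuming `disparity` and `mask` are NumPy arrays of equal shape (with `mask` produced as in step ②):

```python
import numpy as np

def disparity_magnitude_features(disparity, mask):
    """F1 = (mu, delta, theta, chi) over the visually important region."""
    d = disparity[mask > 0]                    # disparities inside the mask
    mu = d.mean()                              # disparity mean
    delta = ((d - mu) ** 2).mean()             # disparity variance
    d_sorted = np.sort(d)
    k = max(1, int(0.01 * d_sorted.size))      # 1% of the masked pixels
    theta = d_sorted[:k].mean()                # max negative disparity = mean of
                                               # the 1% smallest disparities (d_min)
    d_max = d_sorted[-k:].mean()               # mean of the 1% largest disparities
    chi = d_max - theta                        # disparity range
    return np.array([mu, delta, theta, chi])   # F1
```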

The specific process of step ④ is:

④-1. Compute the disparity gradient magnitude image of {d_R(x,y)}, denoted {m(x,y)}, with the gradient magnitude of the pixel at coordinate position (x,y) denoted m(x,y):

$$m(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2},$$

where G_x(x,y) denotes the horizontal gradient value of the pixel at coordinate position (x,y) and G_y(x,y) denotes its vertical gradient value;

④-2. Compute the disparity gradient direction image of {d_R(x,y)}, denoted {θ(x,y)}, with the gradient direction value of the pixel at coordinate position (x,y) denoted θ(x,y), θ(x,y) = arctan(G_y(x,y)/G_x(x,y)), where arctan() is the arctangent function;

④-3. From {m(x,y)} and {θ(x,y)}, compute the disparity gradient edge image of {d_R(x,y)}, denoted {E(x,y)}, with the gradient edge value of the pixel at coordinate position p denoted E(p); E(p) accumulates, over the neighborhood window N(p) centered at p, the product of a spatial Gaussian G_s(||p−q||), an orientation Gaussian G_o(||θ⃗(p)−θ⃗(q)||), and the gradient magnitude m(q) normalized over the neighborhood window N(q) by means of m(q') and the control parameter ε_g (the full expression is given as an image in the original). Here G_s(||p−q||) denotes a Gaussian function with standard deviation σ_s,

$$G_s(\|p-q\|) = \exp\left(-\frac{\|p-q\|^2}{2\sigma_s^2}\right),$$

||p−q|| denotes the Euclidean distance between coordinate positions p and q, the symbol "|| ||" denoting the Euclidean distance, and G_o(||θ⃗(p)−θ⃗(q)||) denotes a Gaussian function with standard deviation σ_o,

$$G_o(\|\vec{\theta}(p)-\vec{\theta}(q)\|) = \exp\left(-\frac{\|\vec{\theta}(p)-\vec{\theta}(q)\|^2}{2\sigma_o^2}\right),$$

where ||θ⃗(p)−θ⃗(q)|| denotes the Euclidean distance between θ⃗(p) and θ⃗(q), θ⃗(p) = [sin(θ(p)), cos(θ(p))], θ⃗(q) = [sin(θ(q)), cos(θ(q))], θ(p) denotes the gradient direction value of the pixel at coordinate position p in {θ(x,y)}, θ(q) denotes the gradient direction value of the pixel at coordinate position q in {θ(x,y)}, m(q) denotes the gradient magnitude of the pixel at coordinate position q in {m(x,y)}, m(q') denotes the gradient magnitude of the pixel at coordinate position q' in {m(x,y)}, ε_g is a control parameter, the symbol "[]" denotes a vector, exp() denotes the exponential function with base e, e = 2.71828183, N(p) denotes the neighborhood window centered on the pixel at coordinate position p, and N(q) denotes the neighborhood window centered on the pixel at coordinate position q;

④-4. From {E(x,y)} and {M(x,y)}, compute the gradient mean of all pixels of {E(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted ψ:

$$\psi = \frac{\sum_{(x,y)\in\Omega} E(x,y) \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)},$$

where Ω denotes the image domain and E(x,y) denotes the gradient edge value of the pixel at coordinate position (x,y) in {E(x,y)};

④-5. Take ψ as the feature vector reflecting the disparity gradient features of {d_R(x,y)}, denoted F_2; the dimension of F_2 is 1.

In step ④-3, σ_s = 0.4, σ_o = 0.4 and ε_g = 0.5 are taken.

In step ④-3, the sizes of the neighborhood windows N(p) and N(q) are both 3×3.
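A simplified sketch of step ④; since the exact E(p) expression is only partially recoverable here, the neighborhood accumulation below, and in particular the normalization of m(q) by the window maximum plus ε_g, is an assumption, while G_s, G_o, the 3×3 windows and the parameter values follow the text:

```python
import numpy as np

def disparity_gradient_feature(disparity, mask,
                               sigma_s=0.4, sigma_o=0.4, eps_g=0.5):
    # Steps 4-1 and 4-2: gradient magnitude and direction of the disparity map.
    gy, gx = np.gradient(disparity.astype(float))
    m = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)
    h, w = m.shape
    e = np.zeros_like(m)
    # Step 4-3 (approximate): accumulate spatially- and orientation-weighted,
    # normalized gradient magnitudes over the 3x3 window N(p).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            tp = np.array([np.sin(theta[y, x]), np.cos(theta[y, x])])
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    qy, qx = y + dy, x + dx
                    gs = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    tq = np.array([np.sin(theta[qy, qx]), np.cos(theta[qy, qx])])
                    go = np.exp(-np.sum((tp - tq) ** 2) / (2 * sigma_o ** 2))
                    win = m[max(qy - 1, 0):qy + 2, max(qx - 1, 0):qx + 2]  # N(q)
                    acc += gs * go * m[qy, qx] / (win.max() + eps_g)  # assumed norm.
            e[y, x] = acc
    # Step 4-4: gradient mean over the visually important region -> F2.
    return (e * mask).sum() / mask.sum()
```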

The specific process of step ⑤ is:

⑤-1. Compute the spatial frequency image of {I_R(x,y)}, denoted {SF(x,y)}, with the spatial frequency value of the pixel at coordinate position (x,y) denoted SF(x,y):

$$SF(x,y) = \sqrt{(HF(x,y))^2 + (VF(x,y))^2 + (DF(x,y))^2},$$

where HF(x,y) denotes the horizontal frequency value of the pixel at coordinate position (x,y) in {I_R(x,y)},

$$HF(x,y) = \sqrt{\frac{1}{3\times 2}\sum_{m=-1}^{1}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m,y+n-1)\right)^2},$$

VF(x,y) denotes the vertical frequency value of the pixel at coordinate position (x,y) in {I_R(x,y)},

$$VF(x,y) = \sqrt{\frac{1}{2\times 3}\sum_{m=0}^{1}\sum_{n=-1}^{1}\left(I_R(x+m,y+n)-I_R(x+m-1,y+n)\right)^2},$$

and DF(x,y) denotes the diagonal frequency value of the pixel at coordinate position (x,y) in {I_R(x,y)},

$$DF(x,y) = \sqrt{\frac{1}{2\times 2}\sum_{m=0}^{1}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m-1,y+n-1)\right)^2 + \frac{1}{2\times 2}\sum_{m=-1}^{0}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m+1,y+n-1)\right)^2},$$

where I_R(x+m,y+n) denotes the pixel value of the pixel at coordinate position (x+m,y+n) in {I_R(x,y)}, and likewise for the other shifted coordinates. Coordinates falling outside the image are clamped to the boundary: a horizontal index below 1 (e.g. x+m<1 or x+m−1<1) is replaced by 1 and one above W (e.g. x+m>W or x+m+1>W) by W; a vertical index below 1 (e.g. y+n<1 or y+n−1<1) is replaced by 1 and one above H by H. For example, if x+m<1, the value of I_R(x+m,y+n) is replaced by the value of I_R(1,y+n);

⑤-2. From {SF(x,y)} and {M(x,y)}, compute the spatial frequency mean of all pixels of {SF(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted ν:

$$\nu = \frac{\sum_{(x,y)\in\Omega} SF(x,y) \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)},$$

where Ω denotes the image domain;

⑤-3. From {SF(x,y)}, {M(x,y)} and ν, compute the spatial frequency variance of all pixels of {SF(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted ρ:

$$\rho = \frac{\sum_{(x,y)\in\Omega} (SF(x,y)-\nu)^2 \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)};$$

⑤-4. Compute the spatial frequency range of the pixels of {SF(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted ζ, ζ = SF_max − SF_min, where SF_max denotes the mean spatial frequency of the 1% of pixels with the largest spatial frequency values in that region, and SF_min denotes the mean spatial frequency of the 1% of pixels with the smallest spatial frequency values in that region;

⑤-5. Compute the spatial frequency sensitivity factor of the pixels of {SF(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted τ, τ = ν/μ;

⑤-6. Arrange ν, ρ, ζ and τ in order to form the feature vector reflecting the spatial frequency features of {I_R(x,y)}, denoted F_3, F_3 = (ν, ρ, ζ, τ); the dimension of F_3 is 4.
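A sketch of the F_3 computation in step ⑤, under a simplifying assumption: the patent's HF/VF/DF average squared neighbor differences over a small window, whereas the sketch uses a single first difference per direction on an edge-padded (boundary-clamped) image; `mu` is the disparity mean from F_1:

```python
import numpy as np

def spatial_frequency_features(image, mask, mu):
    """F3 = (nu, rho, zeta, tau) over the visually important region."""
    i = np.pad(image.astype(float), 1, mode='edge')   # clamp at the borders
    hf = i[1:-1, 1:-1] - i[1:-1, :-2]                 # horizontal difference
    vf = i[1:-1, 1:-1] - i[:-2, 1:-1]                 # vertical difference
    df1 = i[1:-1, 1:-1] - i[:-2, :-2]                 # diagonal differences
    df2 = i[1:-1, 1:-1] - i[:-2, 2:]
    sf = np.sqrt(hf**2 + vf**2 + df1**2 + df2**2)     # spatial frequency map SF
    s = sf[mask > 0]
    nu = s.mean()                                     # spatial frequency mean
    rho = ((s - nu) ** 2).mean()                      # spatial frequency variance
    s_sorted = np.sort(s)
    k = max(1, int(0.01 * s_sorted.size))
    zeta = s_sorted[-k:].mean() - s_sorted[:k].mean() # range: top 1% - bottom 1%
    tau = nu / mu                                     # sensitivity factor (mu from F1)
    return np.array([nu, rho, zeta, tau])             # F3
```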

The specific process of step ⑧ is:

⑧-1. Randomly select t stereoscopic images from the stereoscopic image set to form the training set (t is obtained with the ceiling operator ⌈ ⌉, which rounds up; the exact expression is given as an image in the original), and let the remaining n−t stereoscopic images form the test set;

⑧-2. Form the training sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the training set, denoted Ω_t, {X_k, MOS_k} ∈ Ω_t, where X_k denotes the feature vector of the k-th stereoscopic image in Ω_t, MOS_k denotes the mean opinion score of the k-th stereoscopic image in Ω_t, and 1≤k≤t;

⑧-3. Construct the regression function of the feature vector of each stereoscopic image in the training sample data set Ω_t, denoting the regression function of X_k as f(X_k):

$$f(X_k) = w^T \varphi(X_k) + b,$$

where f() is the function notation, w is the weight vector, w^T is the transpose of w, b is the bias term, φ(X_k) denotes the linear function of X_k, and D(X_k, X_l) is the kernel function in support vector regression,

$$D(X_k, X_l) = \exp\left(-\gamma \|X_k - X_l\|^2\right),$$

where X_l is the feature vector of the l-th stereoscopic image in Ω_t, 1≤l≤t, γ is the kernel parameter, exp() denotes the exponential function with base e, e = 2.71828183, and the symbol "|| ||" denotes the Euclidean distance;

⑧-4. Use support vector regression to train on the feature vectors of all stereoscopic images in the training sample data set Ω_t, such that the error between the regression function values obtained through training and the mean opinion scores is minimized, fitting the optimal weight vector w_opt and the optimal bias term b_opt; denoting the combination of the two as (w_opt, b_opt),

$$(w_{opt}, b_{opt}) = \arg\min_{(w,b)\in\Psi} \sum_{k=1}^{t} \left(f(X_k) - MOS_k\right)^2;$$

then use w_opt and b_opt to construct the support vector regression training model, denoted

$$f(X_{inp}) = (w_{opt})^T \varphi(X_{inp}) + b_{opt},$$

where Ψ denotes the set of all combinations of weight vectors and bias terms used in training on the feature vectors of all stereoscopic images in Ω_t, $\arg\min_{(w,b)\in\Psi} \sum_{k=1}^{t}(f(X_k)-MOS_k)^2$ denotes the values of w and b that minimize $\sum_{k=1}^{t}(f(X_k)-MOS_k)^2$, X_inp denotes the input vector of the support vector regression training model, (w_opt)^T is the transpose of w_opt, and φ(X_inp) denotes the linear function of the input vector X_inp;

⑧-5. Form the test sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the test set; then, according to the support vector regression training model, test the feature vector of each stereoscopic image in the test sample data set and predict the objective visual comfort evaluation value of each such image, denoting the predicted value of the k'-th stereoscopic image in the test sample data set as Q_{k'}, Q_{k'} = f(X_{k'}) = (w_opt)^T φ(X_{k'}) + b_opt, where X_{k'} denotes the feature vector of the k'-th stereoscopic image in the test sample data set, φ(X_{k'}) denotes its linear function, and 1≤k'≤n−t;

⑧-6. Randomly select t stereoscopic images from the stereoscopic image set again to form a new training set, let the remaining n−t stereoscopic images form the test set, and return to step ⑧-2; after N iterations, compute the average of the objective visual comfort evaluation values of each stereoscopic image in the set, and take this average as the final objective visual comfort evaluation value of that image, where N is taken greater than 100.

In step ⑧-3, γ = 54 is taken.
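A sketch of the training/testing loop of step ⑧ using scikit-learn's RBF-kernel SVR (the patent describes its own SVR fit; sklearn is swapped in here as an off-the-shelf substitute). The split size t is assumed to be 80% of the set, since the exact ceiling expression is given only as an image; `gamma` follows step ⑧-3:

```python
import numpy as np
from sklearn.svm import SVR

def predict_visual_comfort(features, mos, n_iters=100, gamma=54):
    """Repeat random train/test splits, fit an RBF SVR on each training set,
    and average each image's predictions over the iterations (the patent
    takes N > 100 iterations).

    `features`: (n, d) array of per-image feature vectors X_i;
    `mos`: (n,) array of mean opinion scores."""
    n = len(features)
    t = int(np.ceil(0.8 * n))                       # assumed split ratio
    preds = np.zeros(n)
    counts = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        idx = rng.permutation(n)
        train, test = idx[:t], idx[t:]
        model = SVR(kernel='rbf', gamma=gamma)
        model.fit(features[train], mos[train])      # fit w_opt, b_opt
        preds[test] += model.predict(features[test])
        counts[test] += 1
    return preds / np.maximum(counts, 1)            # final objective scores
```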

Compared with the prior art, the present invention has the following advantages:

1) The method of the present invention takes into account the influence of visually important regions on visual comfort; it therefore extracts the visually important region mask of the stereoscopic image from the saliency map of the right-viewpoint image and the right disparity image, and then evaluates only the visually important region according to the mask, thereby effectively improving the correlation between objective evaluation results and subjective perception.

2) The method of the present invention obtains the feature vector of the stereoscopic image from the feature vector reflecting the disparity magnitude features of the right disparity image, the feature vector reflecting its disparity gradient features, and the feature vector reflecting the spatial frequency features of the right-viewpoint image; it then trains on the feature vectors of all stereoscopic images in the set with support vector regression and computes the objective visual comfort evaluation value of each stereoscopic image in the set. Because the extracted feature vector information is highly stable and reflects changes in the visual comfort of stereoscopic images well, the correlation between objective evaluation results and subjective perception is effectively improved.

Description of the Drawings

Fig. 1 is the overall implementation block diagram of the method of the present invention;

Fig. 2a is the right-viewpoint image of "camera";

Fig. 2b is the right disparity image of "camera";

Fig. 2c is the saliency map of the right-viewpoint image of "camera";

Fig. 2d is the visual saliency map of the right-viewpoint image of "camera";

Fig. 2e is the visually important region mask of "camera";

Fig. 3a is the right-viewpoint image of "cup";

Fig. 3b is the right disparity image of "cup";

Fig. 3c is the saliency map of the right-viewpoint image of "cup";

Fig. 3d is the visual saliency map of the right-viewpoint image of "cup";

Fig. 3e is the visually important region mask of "cup";

Fig. 4a is the right-viewpoint image of "infant";

Fig. 4b is the right disparity image of "infant";

Fig. 4c is the saliency map of the right-viewpoint image of "infant";

Fig. 4d is the visual saliency map of the right-viewpoint image of "infant";

Fig. 4e is the visually important region mask of "infant";

Fig. 5 is the scatter plot of the objective visual comfort evaluation values, obtained using the two feature vectors F_1 and F_2, versus the mean opinion scores;

Fig. 6 is the scatter plot of the objective visual comfort evaluation values, obtained using the two feature vectors F_1 and F_3, versus the mean opinion scores;

Fig. 7 is the scatter plot of the objective visual comfort evaluation values, obtained using the two feature vectors F_2 and F_3, versus the mean opinion scores;

Fig. 8 is the scatter plot of the objective visual comfort evaluation values, obtained using the three feature vectors F_1, F_2 and F_3, versus the mean opinion scores.

Detailed Description of Embodiments

The present invention is described in further detail below with reference to the embodiments in the accompanying drawings.

The overall implementation block diagram of the machine-learning-based method for evaluating the visual comfort of stereoscopic images proposed by the present invention is shown in Fig. 1; the method comprises the following steps:

① Denote the left-viewpoint image of the stereoscopic image to be evaluated as {I_L(x,y)}, its right-viewpoint image as {I_R(x,y)}, and its right disparity image as {d_R(x,y)}, where (x,y) denotes the coordinate position of a pixel in {I_L(x,y)}, {I_R(x,y)} and {d_R(x,y)}, 1≤x≤W, 1≤y≤H, W denotes the width of {I_L(x,y)}, {I_R(x,y)} and {d_R(x,y)}, H denotes their height, I_L(x,y) denotes the pixel value of the pixel at coordinate position (x,y) in {I_L(x,y)}, I_R(x,y) denotes the pixel value of the pixel at (x,y) in {I_R(x,y)}, and d_R(x,y) denotes the pixel value of the pixel at (x,y) in {d_R(x,y)}.

② Extract the saliency map of {I_R(x,y)}; then, from the saliency map of {I_R(x,y)} and {d_R(x,y)}, obtain the visual saliency map of {I_R(x,y)}; divide the visual saliency map of {I_R(x,y)} into a visually important region and a non-visually important region; finally, from these two regions, obtain the visually important region mask of the stereoscopic image to be evaluated, denoted {M(x,y)}, where M(x,y) denotes the pixel value of the pixel at coordinate position (x,y) in {M(x,y)}.

In this embodiment, the specific process of step ② is:

②-1. Extract the saliency map of {I_R(x,y)} using the graph-based visual saliency (GBVS) model, denoted {SM_R(x,y)}, where SM_R(x,y) denotes the pixel value of the pixel at coordinate position (x,y) in {SM_R(x,y)}.

②-2. From {SM_R(x,y)} and {d_R(x,y)}, obtain the visual saliency map of {I_R(x,y)}, denoted {D_R(x,y)}, with the pixel value of the pixel at coordinate position (x,y) denoted D_R(x,y) and computed as the weighted combination

$$D_R(x,y) = \omega_{SM} \times SM_R(x,y) + \omega_d \times d_R(x,y),$$

where ω_SM denotes the weight of SM_R(x,y) and ω_d denotes the weight of d_R(x,y); here fixed values are taken for ω_SM and ω_d (the specific values are given as an image in the original).

②-3. According to the pixel value of each pixel in {D_R(x,y)}, divide {D_R(x,y)} into a visually important region and a non-visually important region: the pixel value of every pixel in the visually important region is greater than an adaptive threshold T_1, and the pixel value of every pixel in the non-visually important region is less than or equal to T_1, where T_1 is the threshold obtained by applying Otsu's method to {D_R(x,y)}.

②-4. From the visually important and non-visually important regions of {D_R(x,y)}, obtain the visually important region mask of the stereoscopic image to be evaluated, denoted {M(x,y)}, with the pixel value of the pixel at coordinate position (x,y) denoted M(x,y):

$$M(x,y) = \begin{cases} 1, & D_R(x,y) > T_1 \\ 0, & D_R(x,y) \le T_1 \end{cases}$$

Here, three groups of typical stereoscopic images are used to illustrate the performance of the visually important region mask obtained by the method of the present invention for the stereoscopic image to be evaluated. Fig. 2a and Fig. 2b show the right-viewpoint image and the right disparity image of "camera" respectively, Fig. 2c shows the saliency map of the right-viewpoint image of "camera", Fig. 2d shows the visual saliency map of the right-viewpoint image of "camera", and Fig. 2e shows the visually important region mask of "camera"; Fig. 3a and Fig. 3b show the right-viewpoint image and the right disparity image of "cup" respectively, Fig. 3c shows the saliency map of the right-viewpoint image of "cup", Fig. 3d shows the visual saliency map of the right-viewpoint image of "cup", and Fig. 3e shows the visually important region mask of "cup"; Fig. 4a and Fig. 4b show the right-viewpoint image and the right disparity image of "infant" respectively, Fig. 4c shows the saliency map of the right-viewpoint image of "infant", Fig. 4d shows the visual saliency map of the right-viewpoint image of "infant", and Fig. 4e shows the visually important region mask of "infant". As can be seen from Fig. 2e, Fig. 3e and Fig. 4e, the visually important regions obtained by the method of the present invention reflect the visual comfort of the human eye well.

③ From {d_R(x,y)} and {M(x,y)}, obtain the disparity mean μ, disparity variance δ, maximum negative disparity θ and disparity range χ of the pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}; then arrange μ, δ, θ and χ in order to form the feature vector reflecting the disparity magnitude features of {d_R(x,y)}, denoted F_1, F_1 = (μ, δ, θ, χ).

In this embodiment, the specific process of step ③ is:

③-1. From {d_R(x,y)} and {M(x,y)}, compute the disparity mean of all pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted μ:

$$\mu = \frac{\sum_{(x,y)\in\Omega} d_R(x,y) \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)},$$

where Ω denotes the image domain.

③-2. From {d_R(x,y)}, {M(x,y)} and μ, compute the disparity variance of all pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted δ:

$$\delta = \frac{\sum_{(x,y)\in\Omega} (d_R(x,y)-\mu)^2 \times M(x,y)}{\sum_{(x,y)\in\Omega} M(x,y)}.$$

③-3. Compute the maximum negative disparity of the pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted θ, where θ is the mean disparity of the 1% of pixels with the smallest disparity values in that region.

③-4. Compute the disparity range of the pixels of {d_R(x,y)} in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}, denoted χ, χ = d_max − d_min, where d_max denotes the mean disparity of the 1% of pixels with the largest disparity values in that region, and d_min denotes the mean disparity of the 1% of pixels with the smallest disparity values in that region.

③-5. Arrange μ, δ, θ and χ in order to form the feature vector reflecting the disparity magnitude features of {d_R(x,y)}, denoted F_1, F_1 = (μ, δ, θ, χ); the dimension of F_1 is 4.

④ Compute the disparity gradient edge image of {d_R(x,y)} from its disparity gradient magnitude image and disparity gradient direction image; then, from the disparity gradient edge image and {M(x,y)}, compute the gradient mean ψ of all pixels of the disparity gradient edge image in the region corresponding to the visually important region of the visual saliency map of {I_R(x,y)}; finally take ψ as the feature vector reflecting the disparity gradient features of {d_R(x,y)}, denoted F_2.

In this embodiment, the specific process of step ④ is:

④-1. Compute the disparity gradient magnitude image of {d_R(x,y)}, denoted {m(x,y)}, with the gradient magnitude of the pixel at coordinate position (x,y) denoted m(x,y):

$$m(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2},$$

where G_x(x,y) denotes the horizontal gradient value of the pixel at coordinate position (x,y) and G_y(x,y) denotes its vertical gradient value.

④-2. Compute the disparity gradient direction image of {d_R(x,y)}, denoted {θ(x,y)}, with the gradient direction value of the pixel at coordinate position (x,y) denoted θ(x,y), θ(x,y) = arctan(G_y(x,y)/G_x(x,y)), where arctan() is the arctangent function.

④-3. From {m(x,y)} and {θ(x,y)}, compute the disparity gradient edge image of {d_R(x,y)}, denoted {E(x,y)}, with the gradient edge value of the pixel at coordinate position p denoted E(p); E(p) accumulates, over the neighborhood window N(p) centered at p, the product of a spatial Gaussian G_s(||p−q||), an orientation Gaussian G_o(||θ⃗(p)−θ⃗(q)||), and the gradient magnitude m(q) normalized over the neighborhood window N(q) by means of m(q') and the control parameter ε_g (the full expression is given as an image in the original). Here G_s(||p−q||) denotes a Gaussian function with standard deviation σ_s, taken here as σ_s = 0.4,

$$G_s(\|p-q\|) = \exp\left(-\frac{\|p-q\|^2}{2\sigma_s^2}\right),$$

||p−q|| denotes the Euclidean distance between coordinate positions p and q, the symbol "|| ||" denoting the Euclidean distance, and G_o(||θ⃗(p)−θ⃗(q)||) denotes a Gaussian function with standard deviation σ_o, taken here as σ_o = 0.4,

$$G_o(\|\vec{\theta}(p)-\vec{\theta}(q)\|) = \exp\left(-\frac{\|\vec{\theta}(p)-\vec{\theta}(q)\|^2}{2\sigma_o^2}\right),$$

where ||θ⃗(p)−θ⃗(q)|| denotes the Euclidean distance between θ⃗(p) and θ⃗(q), θ⃗(p) = [sin(θ(p)), cos(θ(p))], θ⃗(q) = [sin(θ(q)), cos(θ(q))], θ(p) denotes the gradient direction value of the pixel at coordinate position p in {θ(x,y)}, θ(q) denotes the gradient direction value of the pixel at coordinate position q in {θ(x,y)}, m(q) denotes the gradient magnitude of the pixel at coordinate position q in {m(x,y)}, m(q') denotes the gradient magnitude of the pixel at coordinate position q' in {m(x,y)}, ε_g is a control parameter, taken here as ε_g = 0.5, the symbol "[]" denotes a vector, exp() denotes the exponential function with base e, e = 2.71828183, N(p) denotes the neighborhood window centered on the pixel at coordinate position p, and N(q) denotes the neighborhood window centered on the pixel at coordinate position q; here the sizes of N(p) and N(q) are both 3×3.

④-4. From $\{E(x,y)\}$ and $\{M(x,y)\}$, compute the mean gradient value of all pixels in the region of $\{E(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$, denoted $\psi$:
$$\psi=\frac{\sum_{(x,y)\in\Omega}E(x,y)\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)},$$
where $\Omega$ denotes the image domain and $E(x,y)$ is the gradient edge value of the pixel at coordinate position $(x,y)$ in $\{E(x,y)\}$.

④-5. Take $\psi$ as the feature vector reflecting the disparity gradient feature of $\{d_R(x,y)\}$, denoted $F_2$; the dimension of $F_2$ is 1.
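A minimal sketch of steps ④-3 and ④-4 follows. The exact composite rule for $E(p)$ is given only as an image in the original, so the combination used here (a spatially and orientation-weighted sum of neighbouring gradient magnitudes, normalised by the local maximum magnitude plus $\varepsilon_g$) is an assumption; the kernels $G_s$ and $G_o$ and the parameter values do match the text. The $[\sin\theta,\cos\theta]$ embedding makes the orientation distance insensitive to angle wrap-around.

```python
import numpy as np

def gradient_edge_image(m, theta, sigma_s=0.4, sigma_o=0.4, eps_g=0.5, r=1):
    """Assumed form of step 4-3: weighted neighbourhood sum of gradient magnitudes."""
    H, W = m.shape
    vec = np.stack([np.sin(theta), np.cos(theta)], axis=-1)  # theta_vec(p) per pixel
    E = np.zeros_like(m, dtype=np.float64)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - r), min(H, y + r + 1)        # 3x3 window N(p)
            x0, x1 = max(0, x - r), min(W, x + r + 1)
            acc = 0.0
            for qy in range(y0, y1):
                for qx in range(x0, x1):
                    d_sp = np.hypot(qy - y, qx - x)                    # ||p - q||
                    d_or = np.linalg.norm(vec[y, x] - vec[qy, qx])     # ||theta_vec(p) - theta_vec(q)||
                    acc += (np.exp(-d_sp ** 2 / (2 * sigma_s ** 2)) *  # G_s
                            np.exp(-d_or ** 2 / (2 * sigma_o ** 2)) *  # G_o
                            m[qy, qx])
            # Assumed normalisation by the local maximum magnitude plus eps_g
            E[y, x] = acc / (m[y0:y1, x0:x1].max() + eps_g)
    return E

def masked_mean(E, M):
    """Step 4-4: psi, the mean edge value over the visually important region."""
    return (E * M).sum() / M.sum()
```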

⑤ Obtain the spatial frequency image of $\{I_R(x,y)\}$; then, from this spatial frequency image and $\{M(x,y)\}$, obtain the spatial frequency mean $\nu$, spatial frequency variance $\rho$, spatial frequency range $\zeta$ and spatial frequency sensitivity factor $\tau$ of the pixels in the region corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$; then arrange $\nu$, $\rho$, $\zeta$ and $\tau$ in order to form the feature vector reflecting the spatial frequency features of $\{I_R(x,y)\}$, denoted $F_3$, $F_3=(\nu,\rho,\zeta,\tau)$.

In this specific embodiment, the concrete process of step ⑤ is as follows:

⑤-1. Compute the spatial frequency image of $\{I_R(x,y)\}$, denoted $\{SF(x,y)\}$; the spatial frequency value of the pixel at coordinate position $(x,y)$ is denoted $SF(x,y)$,
$$SF(x,y)=\sqrt{(HF(x,y))^2+(VF(x,y))^2+(DF(x,y))^2},$$
where $HF(x,y)$ is the horizontal-direction frequency value of the pixel at $(x,y)$ in $\{I_R(x,y)\}$,
$$HF(x,y)=\sqrt{\frac{\sum_{m=-1}^{1}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m,y+n-1)\right)^2}{3\times 2}},$$
$VF(x,y)$ is the vertical-direction frequency value,
$$VF(x,y)=\sqrt{\frac{\sum_{m=0}^{1}\sum_{n=-1}^{1}\left(I_R(x+m,y+n)-I_R(x+m-1,y+n)\right)^2}{2\times 3}},$$
and $DF(x,y)$ is the diagonal-direction frequency value,
$$DF(x,y)=\sqrt{\frac{\sum_{m=0}^{1}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m-1,y+n-1)\right)^2}{2\times 2}+\frac{\sum_{m=-1}^{0}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m+1,y+n-1)\right)^2}{2\times 2}}.$$
Here $I_R(x+m,y+n)$ denotes the pixel value of the pixel at coordinate position $(x+m,y+n)$ in $\{I_R(x,y)\}$, and likewise for the other shifted coordinates. Out-of-range coordinates are clamped to the image border: if a horizontal coordinate falls below 1 it is replaced by 1, and if it exceeds $W$ it is replaced by $W$; if a vertical coordinate falls below 1 it is replaced by 1, and if it exceeds $H$ it is replaced by $H$.
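A minimal sketch of step ⑤-1, assuming each directional term is the root mean square of the corresponding pixel differences (the radical signs are lost in the extracted formulas) and implementing the border clamping with edge padding:

```python
import numpy as np

def spatial_frequency_image(I):
    """Step 5-1: SF image. I is indexed as I[x, y] to mirror I_R(x, y)."""
    P = np.pad(I.astype(np.float64), 2, mode='edge')  # clamp out-of-range coordinates
    W, H = I.shape
    SF = np.zeros((W, H))
    for x in range(W):
        for y in range(H):
            px, py = x + 2, y + 2                     # position of (x, y) inside P
            # hf, vf, df are the squared directional terms HF^2, VF^2, DF^2
            hf = sum((P[px + m, py + n] - P[px + m, py + n - 1]) ** 2
                     for m in (-1, 0, 1) for n in (0, 1)) / 6.0
            vf = sum((P[px + m, py + n] - P[px + m - 1, py + n]) ** 2
                     for m in (0, 1) for n in (-1, 0, 1)) / 6.0
            df = (sum((P[px + m, py + n] - P[px + m - 1, py + n - 1]) ** 2
                      for m in (0, 1) for n in (0, 1)) / 4.0 +
                  sum((P[px + m, py + n] - P[px + m + 1, py + n - 1]) ** 2
                      for m in (-1, 0) for n in (0, 1)) / 4.0)
            SF[x, y] = np.sqrt(hf + vf + df)          # sqrt(HF^2 + VF^2 + DF^2)
    return SF
```

The double loop is written for clarity; the same differences can be vectorised with array slicing when speed matters.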

⑤-2. From $\{SF(x,y)\}$ and $\{M(x,y)\}$, compute the spatial frequency mean of all pixels in the region of $\{SF(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$, denoted $\nu$: $\nu=\frac{\sum_{(x,y)\in\Omega}SF(x,y)\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$, where $\Omega$ denotes the image domain.

⑤-3. From $\{SF(x,y)\}$, $\{M(x,y)\}$ and $\nu$, compute the spatial frequency variance of all pixels in the same region, denoted $\rho$: $\rho=\frac{\sum_{(x,y)\in\Omega}(SF(x,y)-\nu)^2\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$.

⑤-4. Compute the spatial frequency range of the pixels in the region of $\{SF(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$, denoted $\zeta$: $\zeta=SF_{max}-SF_{min}$, where $SF_{max}$ is the mean spatial frequency of the 1% of pixels with the largest spatial frequency values in that region, and $SF_{min}$ is the mean spatial frequency of the 1% of pixels with the smallest spatial frequency values in that region.

⑤-5. Compute the spatial frequency sensitivity factor of the pixels in the same region, denoted $\tau$: $\tau=\nu/\mu$, where $\mu$ is the disparity mean obtained in step ③.

⑤-6. Arrange $\nu$, $\rho$, $\zeta$ and $\tau$ in order to form the feature vector reflecting the spatial frequency features of $\{I_R(x,y)\}$, denoted $F_3$, $F_3=(\nu,\rho,\zeta,\tau)$; the dimension of $F_3$ is 4.
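A sketch of steps ⑤-2 to ⑤-6 under the same conventions, where `M` is the binary visually-important-region mask and `mu` is the disparity mean from step ③:

```python
import numpy as np

def spatial_frequency_features(SF, M, mu):
    """Steps 5-2 to 5-6: masked statistics of the spatial frequency image."""
    vals = SF[M > 0]                                # SF values inside the mask
    nu = vals.mean()                                # spatial frequency mean
    rho = ((vals - nu) ** 2).mean()                 # spatial frequency variance
    k = max(1, int(round(0.01 * vals.size)))        # 1% of the masked pixels
    s = np.sort(vals)
    zeta = s[-k:].mean() - s[:k].mean()             # range: top-1% mean minus bottom-1% mean
    tau = nu / mu                                   # sensitivity factor
    return np.array([nu, rho, zeta, tau])           # feature vector F_3
```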

⑥ Combine $F_1$, $F_2$ and $F_3$ into a new feature vector, denoted $X$, $X=[F_1,F_2,F_3]$, and take $X$ as the feature vector of the stereoscopic image to be evaluated, where the symbol "[]" denotes a vector and $[F_1,F_2,F_3]$ means concatenating $F_1$, $F_2$ and $F_3$ to form a new feature vector.

⑦ Establish a stereoscopic image set from $n$ different stereoscopic images and their corresponding right disparity images; use an existing subjective quality evaluation method to compute the mean opinion score of the visual comfort of each stereoscopic image in the set, denoted MOS, where $n\ge 1$ and $MOS\in[1,5]$; then, following the operations of steps ① to ⑥ for computing the feature vector $X$ of the stereoscopic image to be evaluated, compute the feature vector of each stereoscopic image in the set in the same way, the feature vector of the $i$-th stereoscopic image being denoted $X_i$, where $1\le i\le n$ and $n$ is the number of stereoscopic images in the set.

In this embodiment, the stereoscopic image database provided by the Image and Video Systems Laboratory of the Korea Advanced Institute of Science and Technology is used as the stereoscopic image set. It contains 120 stereoscopic images and their corresponding right disparity images, covering indoor and outdoor scenes of various depths, and it provides the mean opinion score of the visual comfort of each stereoscopic image.

⑧ Divide all stereoscopic images in the set into a training set and a test set; form the training sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the training set, and the test sample data set from those of the test set. Then, using support vector regression as the machine learning method, train on the feature vectors of all stereoscopic images in the training sample data set so that the error between the regression function values obtained through training and the mean opinion scores is minimized, fitting the optimal weight vector $w_{opt}$ and the optimal bias term $b_{opt}$; construct the support vector regression training model from $w_{opt}$ and $b_{opt}$. Then, according to this model, test the feature vector of each stereoscopic image in the test sample data set and predict its objective visual comfort evaluation value; the predicted value for the $k'$-th stereoscopic image in the test sample data set is denoted $Q_{k'}$, $Q_{k'}=f(X_{k'})$, $f(X_{k'})=(w_{opt})^T\varphi(X_{k'})+b_{opt}$, where $f(\cdot)$ is the function representation, $X_{k'}$ is the feature vector of the $k'$-th stereoscopic image in the test sample data set, $(w_{opt})^T$ is the transpose of $w_{opt}$, $\varphi(X_{k'})$ denotes the linear function of the $k'$-th stereoscopic image in the test sample data set, $1\le k'\le n-t$, and $t$ is the number of stereoscopic images in the training set. Afterwards, by reassigning the training and test sets, the objective visual comfort evaluation value of each stereoscopic image in the test sample data set is predicted anew; after $N$ iterations, the average of the objective visual comfort evaluation values of each stereoscopic image is computed and taken as the final objective visual comfort value of that image, where $N$ is taken greater than 100 so that every stereoscopic image in the set receives a predicted value; in this embodiment $N=200$.

In this specific embodiment, the concrete process of step ⑧ is as follows:

⑧-1. Randomly select $t$ stereoscopic images from the stereoscopic image set to form the training set [the expression for $t$, which uses the round-up operation, appears only as an image in the original], and let the remaining $n-t$ stereoscopic images form the test set, where the symbol $\lceil\,\rceil$ denotes rounding up.

⑧-2. Form the training sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the training set, denoted $\Omega_t$, $\{X_k,MOS_k\}\in\Omega_t$, where $X_k$ is the feature vector of the $k$-th stereoscopic image in $\Omega_t$, $MOS_k$ is the mean opinion score of the $k$-th stereoscopic image in $\Omega_t$, and $1\le k\le t$.

⑧-3. Construct the regression function of the feature vector of each stereoscopic image in the training sample data set $\Omega_t$; the regression function of $X_k$ is denoted $f(X_k)$, $f(X_k)=w^T\varphi(X_k)+b$, where $f(\cdot)$ is the function representation, $w$ is the weight vector, $w^T$ is the transpose of $w$, $b$ is the bias term, and the values of $w$ and $b$ are obtained through training. $\varphi(X_k)$ denotes the linear function of $X_k$, defined through $D(X_k,X_l)$, the kernel function in support vector regression [the expressions for $\varphi(X_k)$ and $D(X_k,X_l)$ appear only as images in the original; $D$ is a Gaussian function of the Euclidean distance $\|X_k-X_l\|$]. $X_l$ is the feature vector of the $l$-th stereoscopic image in $\Omega_t$, $1\le l\le t$; $\gamma$ is the kernel parameter, which reflects the range of the input sample values (the larger the range, the larger $\gamma$), and in this embodiment $\gamma=54$; $\exp(\cdot)$ is the exponential function with base $e$, $e=2.71828183$; the symbol "$\|\ \|$" denotes the Euclidean distance.

⑧-4. Use support vector regression to train on the feature vectors of all stereoscopic images in $\Omega_t$ so that the error between the regression function values obtained through training and the mean opinion scores is minimized, fitting the optimal weight vector $w_{opt}$ and the optimal bias term $b_{opt}$; their combination is denoted $(w_{opt},b_{opt})$,
$$(w_{opt},b_{opt})=\arg\min_{(w,b)\in\Psi}\sum_{k=1}^{t}\left(f(X_k)-MOS_k\right)^2,$$
and the support vector regression training model constructed from $w_{opt}$ and $b_{opt}$ is $f(X_{inp})=(w_{opt})^T\varphi(X_{inp})+b_{opt}$, where $\Psi$ denotes the set of all combinations of weight vectors and bias terms considered when training on the feature vectors of all stereoscopic images in $\Omega_t$, $\arg\min_{(w,b)\in\Psi}\sum_{k=1}^{t}(f(X_k)-MOS_k)^2$ denotes the values of $w$ and $b$ that minimize $\sum_{k=1}^{t}(f(X_k)-MOS_k)^2$, $X_{inp}$ is the input vector of the model, $(w_{opt})^T$ is the transpose of $w_{opt}$, and $\varphi(X_{inp})$ denotes the linear function of the input vector $X_{inp}$.

⑧-5. Form the test sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the test set; then, according to the support vector regression training model, test the feature vector of each stereoscopic image in the test sample data set and predict its objective visual comfort evaluation value; the predicted value for the $k'$-th stereoscopic image is denoted $Q_{k'}$, $Q_{k'}=f(X_{k'})=(w_{opt})^T\varphi(X_{k'})+b_{opt}$, where $X_{k'}$ is the feature vector of the $k'$-th stereoscopic image in the test sample data set, $\varphi(X_{k'})$ denotes its linear function, and $1\le k'\le n-t$.

⑧-6. Randomly re-select the same number of stereoscopic images from the set to form a new training set [the count expression appears only as an image in the original], let the remaining $n-t$ stereoscopic images form the test set, and return to step ⑧-2. After $N$ iterations, compute the average of the objective visual comfort evaluation values of each stereoscopic image in the set and take this average as the final objective visual comfort evaluation value of that image, where $N$ is taken greater than 100.
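A sketch of this step-⑧ protocol using scikit-learn's RBF-kernel SVR in place of the explicit $(w_{opt},b_{opt})$ formulation. The training-set fraction is an assumption, since the exact count expression appears only as an image in the original; `X` is an $(n,d)$ feature array and `mos` the matching score array.

```python
import numpy as np
from sklearn.svm import SVR

def predict_comfort(X, mos, n_iter=200, train_frac=0.8, seed=0):
    """Average SVR predictions over n_iter random train/test splits (step 8)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    t = int(np.ceil(train_frac * n))          # round-up training-set size (assumed fraction)
    pred_sum = np.zeros(n)
    pred_cnt = np.zeros(n)
    for _ in range(n_iter):
        idx = rng.permutation(n)
        tr, te = idx[:t], idx[t:]
        model = SVR(kernel='rbf', gamma='scale')  # kernel parameter left to sklearn
        model.fit(X[tr], mos[tr])
        pred_sum[te] += model.predict(X[te])      # accumulate test-set predictions
        pred_cnt[te] += 1
    # Final objective comfort score: mean prediction over the iterations
    return pred_sum / np.maximum(pred_cnt, 1)
```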

In this embodiment, four objective criteria commonly used for assessing image quality evaluation methods are adopted as evaluation indicators: the Pearson linear correlation coefficient (PLCC) under nonlinear regression, the Spearman rank-order correlation coefficient (SROCC), the Kendall rank-order correlation coefficient (KROCC), and the root mean squared error (RMSE). PLCC and RMSE reflect the accuracy of the objective predicted values, while SROCC and KROCC reflect their monotonicity. The objective visual comfort predictions computed for the 120 stereoscopic images are fitted with a five-parameter logistic function; higher PLCC, SROCC and KROCC values and a lower RMSE value indicate better correlation between the results of the proposed objective evaluation method and the mean opinion scores. Table 1 lists the correlations between the visual comfort predictions obtained with different feature vectors and the mean opinion scores. As Table 1 shows, the correlation obtained with only two of the feature vectors is never optimal, and the feature vector composed of disparity magnitude features influences the evaluation performance more than the other two. This indicates that the three feature vectors extracted by the method of the invention are effective, and that combining the disparity magnitude, disparity gradient and spatial frequency features yields predictions that correlate more strongly with the mean opinion scores, which demonstrates the effectiveness of the method.
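A sketch of this evaluation protocol with SciPy; the five-parameter logistic form below is the standard VQEG mapping, assumed here since the original does not spell it out:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr, kendalltau

def logistic5(q, b1, b2, b3, b4, b5):
    """Standard five-parameter logistic mapping of objective scores."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def evaluate(pred, mos):
    """PLCC / SROCC / KROCC / RMSE of predictions against MOS values."""
    p0 = [np.ptp(mos), 1.0, np.mean(pred), 0.0, np.mean(mos)]   # initial guess
    beta, _ = curve_fit(logistic5, pred, mos, p0=p0, maxfev=20000)
    fitted = logistic5(pred, *beta)
    return {
        'PLCC': pearsonr(fitted, mos)[0],
        'SROCC': spearmanr(pred, mos)[0],   # rank metrics need no logistic fitting
        'KROCC': kendalltau(pred, mos)[0],
        'RMSE': float(np.sqrt(np.mean((fitted - mos) ** 2))),
    }
```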

Fig. 5 shows the scatter plot of the objective visual comfort predictions obtained with feature vectors $F_1$ and $F_2$ against the mean opinion scores; Fig. 6 shows the same for $F_1$ and $F_3$, Fig. 7 for $F_2$ and $F_3$, and Fig. 8 for all three feature vectors $F_1$, $F_2$ and $F_3$. The more concentrated the points in a scatter plot, the better the consistency between the objective results and subjective perception. As Figs. 5 to 8 show, the scatter points obtained with the method of the invention are relatively concentrated and agree well with the subjective evaluation data.

Table 1. Correlation between the visual comfort predictions obtained with different feature vectors and the mean opinion scores

| | F1+F2 | F1+F3 | F2+F3 | F1+F2+F3 |
|---|---|---|---|---|
| PLCC | 0.8348 | 0.8548 | 0.7543 | 0.8716 |
| SROCC | 0.7966 | 0.8045 | 0.7093 | 0.8329 |
| KROCC | 0.6084 | 0.6120 | 0.5231 | 0.6454 |
| RMSE | 0.4430 | 0.4176 | 0.5283 | 0.3944 |

Claims (10)

1. A method for evaluating the visual comfort of stereoscopic images based on machine learning, characterized by comprising the following steps:

① Denote the left-viewpoint image of the stereoscopic image to be evaluated as $\{I_L(x,y)\}$, its right-viewpoint image as $\{I_R(x,y)\}$, and its right disparity image as $\{d_R(x,y)\}$, where $(x,y)$ denotes the coordinate position of a pixel in $\{I_L(x,y)\}$, $\{I_R(x,y)\}$ and $\{d_R(x,y)\}$, $1\le x\le W$, $1\le y\le H$, $W$ is the width and $H$ the height of $\{I_L(x,y)\}$, $\{I_R(x,y)\}$ and $\{d_R(x,y)\}$, and $I_L(x,y)$, $I_R(x,y)$ and $d_R(x,y)$ denote the pixel values of the pixel at coordinate position $(x,y)$ in the respective images;

② Extract the saliency map of $\{I_R(x,y)\}$; then, from this saliency map and $\{d_R(x,y)\}$, obtain the visual saliency map of $\{I_R(x,y)\}$; divide the visual saliency map into a visually important region and a non-visually-important region; finally, from these two regions, obtain the visually important region mask of the stereoscopic image to be evaluated, denoted $\{M(x,y)\}$, where $M(x,y)$ denotes the pixel value of the pixel at coordinate position $(x,y)$ in $\{M(x,y)\}$;

③ From $\{d_R(x,y)\}$ and $\{M(x,y)\}$, obtain the disparity mean $\mu$, disparity variance $\delta$, maximum negative disparity $\theta$ and disparity range $\chi$ of the pixels in the region of $\{d_R(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$; then arrange $\mu$, $\delta$, $\theta$ and $\chi$ in order to form the feature vector reflecting the disparity magnitude features of $\{d_R(x,y)\}$, denoted $F_1$, $F_1=(\mu,\delta,\theta,\chi)$;

④ By computing the disparity gradient magnitude image and the disparity gradient direction image of $\{d_R(x,y)\}$, compute the disparity gradient edge image of $\{d_R(x,y)\}$; then, from this edge image and $\{M(x,y)\}$, compute the mean gradient value $\psi$ of all pixels in the region of the edge image corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$; finally take $\psi$ as the feature vector reflecting the disparity gradient feature of $\{d_R(x,y)\}$, denoted $F_2$;

⑤ Obtain the spatial frequency image of $\{I_R(x,y)\}$; then, from this spatial frequency image and $\{M(x,y)\}$, obtain the spatial frequency mean $\nu$, spatial frequency variance $\rho$, spatial frequency range $\zeta$ and spatial frequency sensitivity factor $\tau$ of the pixels in the region corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$; then arrange $\nu$, $\rho$, $\zeta$ and $\tau$ in order to form the feature vector reflecting the spatial frequency features of $\{I_R(x,y)\}$, denoted $F_3$, $F_3=(\nu,\rho,\zeta,\tau)$;

⑥ Combine $F_1$, $F_2$ and $F_3$ into a new feature vector, denoted $X$, $X=[F_1,F_2,F_3]$, and take $X$ as the feature vector of the stereoscopic image to be evaluated, where the symbol "[]" denotes a vector and $[F_1,F_2,F_3]$ means concatenating $F_1$, $F_2$ and $F_3$ to form a new feature vector;

⑦ Establish a stereoscopic image set from $n$ different stereoscopic images and their corresponding right disparity images; use a subjective quality evaluation method to compute the mean opinion score of the visual comfort of each stereoscopic image in the set, denoted MOS, where $n\ge 1$ and $MOS\in[1,5]$; then, following the operations of steps ① to ⑥, compute the feature vector of each stereoscopic image in the set in the same way, the feature vector of the $i$-th stereoscopic image being denoted $X_i$, where $1\le i\le n$ and $n$ is the number of stereoscopic images in the set;

⑧ Divide all stereoscopic images in the set into a training set and a test set; form the training sample data set from the feature vectors and mean opinion scores of the training set, and the test sample data set from those of the test set; then use support vector regression as the machine learning method to train on the feature vectors of all stereoscopic images in the training sample data set so that the error between the regression function values obtained through training and the mean opinion scores is minimized, fitting the optimal weight vector $w_{opt}$ and the optimal bias term $b_{opt}$; construct the support vector regression training model from $w_{opt}$ and $b_{opt}$; then, according to this model, test the feature vector of each stereoscopic image in the test sample data set and predict its objective visual comfort evaluation value, the predicted value for the $k'$-th stereoscopic image being denoted $Q_{k'}$, $Q_{k'}=f(X_{k'})$, $f(X_{k'})=(w_{opt})^T\varphi(X_{k'})+b_{opt}$, where $f(\cdot)$ is the function representation, $X_{k'}$ is the feature vector of the $k'$-th stereoscopic image in the test sample data set, $(w_{opt})^T$ is the transpose of $w_{opt}$, $\varphi(X_{k'})$ denotes the linear function of the $k'$-th stereoscopic image, $1\le k'\le n-t$, and $t$ is the number of stereoscopic images in the training set; afterwards, by reassigning the training and test sets, predict anew the objective visual comfort evaluation value of each stereoscopic image in the test sample data set; after $N$ iterations, compute the average of the objective visual comfort evaluation values of each stereoscopic image and take it as the final objective visual comfort value of that image, where $N$ is taken greater than 100.
2. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 1, characterized in that the concrete process of step ② is:

②-1. Use the visual saliency model based on graph theory to extract the saliency map of $\{I_R(x,y)\}$, denoted $\{SM_R(x,y)\}$, where $SM_R(x,y)$ denotes the pixel value of the pixel at coordinate position $(x,y)$ in $\{SM_R(x,y)\}$;

②-2. From $\{SM_R(x,y)\}$ and $\{d_R(x,y)\}$, obtain the visual saliency map of $\{I_R(x,y)\}$, denoted $\{D_R(x,y)\}$; the pixel value of the pixel at coordinate position $(x,y)$ is denoted $D_R(x,y)$ and is a weighted combination of $SM_R(x,y)$ and $d_R(x,y)$ [the expression and the two weights appear only as images in the original];

②-3. According to the pixel value of each pixel in $\{D_R(x,y)\}$, divide $\{D_R(x,y)\}$ into a visually important region and a non-visually-important region: the pixel value of every pixel in the visually important region is greater than the adaptive threshold $T_1$, and the pixel value of every pixel in the non-visually-important region is less than or equal to $T_1$, where $T_1$ is the threshold obtained by applying the Otsu method to $\{D_R(x,y)\}$;

②-4. From the visually important and non-visually-important regions of $\{D_R(x,y)\}$, obtain the visually important region mask of the stereoscopic image to be evaluated, denoted $\{M(x,y)\}$; the pixel value of the pixel at coordinate position $(x,y)$ is
$$M(x,y)=\begin{cases}1, & D_R(x,y)>T_1\\ 0, & D_R(x,y)\le T_1.\end{cases}$$
3. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 2, characterized in that in step ②-2 the weights of $SM_R(x,y)$ and $d_R(x,y)$ take fixed values [given only as an image in the original].
4. The machine-learning-based stereoscopic image visual comfort evaluation method according to any one of claims 1 to 3, characterized in that the concrete process of step ③ is:

③-1. From $\{d_R(x,y)\}$ and $\{M(x,y)\}$, compute the disparity mean of all pixels in the region of $\{d_R(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$, denoted $\mu$: $\mu=\frac{\sum_{(x,y)\in\Omega}d_R(x,y)\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$, where $\Omega$ denotes the image domain;

③-2. From $\{d_R(x,y)\}$, $\{M(x,y)\}$ and $\mu$, compute the disparity variance of all pixels in the same region, denoted $\delta$: $\delta=\frac{\sum_{(x,y)\in\Omega}(d_R(x,y)-\mu)^2\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$;

③-3. Compute the maximum negative disparity of the pixels in the same region, denoted $\theta$, where $\theta$ is the mean disparity of the 1% of pixels in that region with the smallest disparity values;

③-4. Compute the disparity range of the pixels in the same region, denoted $\chi$: $\chi=d_{max}-d_{min}$, where $d_{max}$ is the mean disparity of the 1% of pixels in that region with the largest disparity values and $d_{min}$ the mean disparity of the 1% of pixels with the smallest disparity values;

③-5. Arrange $\mu$, $\delta$, $\theta$ and $\chi$ in order to form the feature vector reflecting the disparity magnitude features of $\{d_R(x,y)\}$, denoted $F_1$, $F_1=(\mu,\delta,\theta,\chi)$; the dimension of $F_1$ is 4.

5. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 4, characterized in that the concrete process of step ④ is:

④-1. Compute the disparity gradient magnitude image of $\{d_R(x,y)\}$, denoted $\{m(x,y)\}$; the gradient magnitude of the pixel at coordinate position $(x,y)$ is denoted $m(x,y)$, $m(x,y)=\sqrt{G_x(x,y)^2+G_y(x,y)^2}$, where $G_x(x,y)$ and $G_y(x,y)$ denote the horizontal and vertical gradient values of the pixel at coordinate position $(x,y)$ in $\{d_R(x,y)\}$;

④-2. Compute the disparity gradient direction image of $\{d_R(x,y)\}$, denoted $\{\theta(x,y)\}$; the gradient direction value of the pixel at coordinate position $(x,y)$ is denoted $\theta(x,y)$, $\theta(x,y)=\arctan(G_y(x,y)/G_x(x,y))$, where $\arctan(\cdot)$ is the arctangent function;

④-3. From $\{m(x,y)\}$ and $\{\theta(x,y)\}$, compute the disparity gradient edge image of $\{d_R(x,y)\}$, denoted $\{E(x,y)\}$, and denote the gradient edge value of the pixel at coordinate position $p$ in $\{E(x,y)\}$ as $E(p)$ [the composite expression for $E(p)$ appears only as an image in the original]. In that expression, $G_s(\|p-q\|)$ is a Gaussian function with standard deviation $\sigma_s$, $G_s(\|p-q\|)=\exp\left(-\frac{\|p-q\|^2}{2\sigma_s^2}\right)$, where $\|p-q\|$ is the Euclidean distance between coordinate positions $p$ and $q$ and the symbol "$\|\ \|$" denotes the Euclidean distance; $G_o(\|\vec{\theta}(p)-\vec{\theta}(q)\|)$ is a Gaussian function with standard deviation $\sigma_o$, $G_o(\|\vec{\theta}(p)-\vec{\theta}(q)\|)=\exp\left(-\frac{\|\vec{\theta}(p)-\vec{\theta}(q)\|^2}{2\sigma_o^2}\right)$, where $\|\vec{\theta}(p)-\vec{\theta}(q)\|$ is the Euclidean distance between $\vec{\theta}(p)=[\sin(\theta(p)),\cos(\theta(p))]$ and $\vec{\theta}(q)=[\sin(\theta(q)),\cos(\theta(q))]$, and $\theta(p)$ and $\theta(q)$ are the gradient direction values of the pixels at coordinate positions $p$ and $q$ in $\{\theta(x,y)\}$; $m(q)$ is the gradient magnitude of the pixel at coordinate position $q$ in $\{m(x,y)\}$ and $m(q')$ that of the pixel at coordinate position $q'$; $\varepsilon_g$ is a control parameter; the symbol "[]" denotes a vector; $\exp(\cdot)$ is the exponential function with base $e$, $e=2.71828183$; $\mathcal{N}(p)$ and $\mathcal{N}(q)$ denote the neighborhood windows centered on the pixels at coordinate positions $p$ and $q$;

④-4. From $\{E(x,y)\}$ and $\{M(x,y)\}$, compute the mean gradient value of all pixels in the region of $\{E(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$, denoted $\psi$: $\psi=\frac{\sum_{(x,y)\in\Omega}E(x,y)\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$, where $\Omega$ denotes the image domain and $E(x,y)$ is the gradient edge value of the pixel at coordinate position $(x,y)$ in $\{E(x,y)\}$;

④-5. Take $\psi$ as the feature vector reflecting the disparity gradient feature of $\{d_R(x,y)\}$, denoted $F_2$; the dimension of $F_2$ is 1.
6. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 5, characterized in that in step ④-3, $\sigma_s=0.4$, $\sigma_o=0.4$ and $\varepsilon_g=0.5$.

7. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 6, characterized in that in step ④-3 the neighborhood windows $\mathcal{N}(p)$ and $\mathcal{N}(q)$ are both of size $3\times 3$.
8. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 7, characterized in that the concrete process of step ⑤ is:

⑤-1. Compute the spatial frequency image of $\{I_R(x,y)\}$, denoted $\{SF(x,y)\}$; the spatial frequency value of the pixel at coordinate position $(x,y)$ is denoted $SF(x,y)$,
$$SF(x,y)=\sqrt{(HF(x,y))^2+(VF(x,y))^2+(DF(x,y))^2},$$
where $HF(x,y)$ is the horizontal-direction frequency value of the pixel at $(x,y)$ in $\{I_R(x,y)\}$,
$$HF(x,y)=\sqrt{\frac{\sum_{m=-1}^{1}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m,y+n-1)\right)^2}{3\times 2}},$$
$VF(x,y)$ is the vertical-direction frequency value,
$$VF(x,y)=\sqrt{\frac{\sum_{m=0}^{1}\sum_{n=-1}^{1}\left(I_R(x+m,y+n)-I_R(x+m-1,y+n)\right)^2}{2\times 3}},$$
and $DF(x,y)$ is the diagonal-direction frequency value,
$$DF(x,y)=\sqrt{\frac{\sum_{m=0}^{1}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m-1,y+n-1)\right)^2}{2\times 2}+\frac{\sum_{m=-1}^{0}\sum_{n=0}^{1}\left(I_R(x+m,y+n)-I_R(x+m+1,y+n-1)\right)^2}{2\times 2}},$$
where $I_R(x+m,y+n)$ denotes the pixel value of the pixel at coordinate position $(x+m,y+n)$ in $\{I_R(x,y)\}$, and likewise for the other shifted coordinates; out-of-range coordinates are clamped to the image border, a horizontal coordinate below 1 being replaced by 1 and one above $W$ by $W$, and a vertical coordinate below 1 being replaced by 1 and one above $H$ by $H$;

⑤-2. From $\{SF(x,y)\}$ and $\{M(x,y)\}$, compute the spatial frequency mean of all pixels in the region of $\{SF(x,y)\}$ corresponding to the visually important region of the visual saliency map of $\{I_R(x,y)\}$, denoted $\nu$: $\nu=\frac{\sum_{(x,y)\in\Omega}SF(x,y)\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$, where $\Omega$ denotes the image domain;

⑤-3. From $\{SF(x,y)\}$, $\{M(x,y)\}$ and $\nu$, compute the spatial frequency variance of all pixels in the same region, denoted $\rho$: $\rho=\frac{\sum_{(x,y)\in\Omega}(SF(x,y)-\nu)^2\times M(x,y)}{\sum_{(x,y)\in\Omega}M(x,y)}$;

⑤-4. Compute the spatial frequency range of the pixels in the same region, denoted $\zeta$: $\zeta=SF_{max}-SF_{min}$, where $SF_{max}$ is the mean spatial frequency of the 1% of pixels with the largest spatial frequency values in that region and $SF_{min}$ the mean spatial frequency of the 1% with the smallest;

⑤-5. Compute the spatial frequency sensitivity factor of the pixels in the same region, denoted $\tau$: $\tau=\nu/\mu$, where $\mu$ is the disparity mean obtained in step ③;

⑤-6. Arrange $\nu$, $\rho$, $\zeta$ and $\tau$ in order to form the feature vector reflecting the spatial frequency features of $\{I_R(x,y)\}$, denoted $F_3$, $F_3=(\nu,\rho,\zeta,\tau)$; the dimension of $F_3$ is 4.

9. The machine-learning-based stereoscopic image visual comfort evaluation method according to claim 8, characterized in that the concrete process of step ⑧ is:

⑧-1. Randomly select $t$ stereoscopic images from the stereoscopic image set to form the training set [the expression for $t$, which uses the round-up operation, appears only as an image in the original], and let the remaining $n-t$ stereoscopic images form the test set, where the symbol $\lceil\,\rceil$ denotes rounding up;
⑧-2. Form the training sample data set from the feature vectors and mean opinion scores of all stereoscopic images in the training set, denoted $\Omega_t$, $\{X_k,MOS_k\}\in\Omega_t$, where $X_k$ is the feature vector of the $k$-th stereoscopic image in $\Omega_t$, $MOS_k$ is the mean opinion score of the $k$-th stereoscopic image in $\Omega_t$, and $1\le k\le t$;

⑧-3. Construct the regression function of the feature vector of each stereoscopic image in $\Omega_t$; the regression function of $X_k$ is denoted $f(X_k)$, $f(X_k)=w^T\varphi(X_k)+b$, where $f(\cdot)$ is the function representation, $w$ is the weight vector, $w^T$ is the transpose of $w$, and $b$ is the bias term; $\varphi(X_k)$ denotes the linear function of $X_k$, defined through $D(X_k,X_l)$, the kernel function in support vector regression [the expressions for $\varphi(X_k)$ and $D(X_k,X_l)$ appear only as images in the original]; $X_l$ is the feature vector of the $l$-th stereoscopic image in $\Omega_t$, $1\le l\le t$; $\gamma$ is the kernel parameter; $\exp(\cdot)$ is the exponential function with base $e$, $e=2.71828183$; the symbol "$\|\ \|$" denotes the Euclidean distance;
⑧-4. Use support vector regression to train on the feature vectors of all stereoscopic images in $\Omega_t$ so that the error between the regression function values obtained through training and the mean opinion scores is minimized, fitting the optimal weight vector $w_{opt}$ and the optimal bias term $b_{opt}$, with
$$(w_{opt},b_{opt})=\arg\min_{(w,b)\in\Psi}\sum_{k=1}^{t}\left(f(X_k)-MOS_k\right)^2;$$
construct the support vector regression training model $f(X_{inp})=(w_{opt})^T\varphi(X_{inp})+b_{opt}$ from $w_{opt}$ and $b_{opt}$, where $\Psi$ denotes the set of all combinations of weight vectors and bias terms considered during training, $\arg\min_{(w,b)\in\Psi}\sum_{k=1}^{t}(f(X_k)-MOS_k)^2$ denotes the values of $w$ and $b$ that minimize $\sum_{k=1}^{t}(f(X_k)-MOS_k)^2$, $X_{inp}$ is the input vector of the model, $(w_{opt})^T$ is the transpose of $w_{opt}$, and $\varphi(X_{inp})$ denotes the linear function of the input vector $X_{inp}$;
⑧-5. The feature vectors and mean opinion scores of all the stereoscopic images in the test set form the test sample data set; then, according to the support vector regression training model, the feature vector of each stereoscopic image in the test sample data set is tested, and the objective visual comfort evaluation prediction value of each stereoscopic image in the test sample data set is obtained. The objective visual comfort evaluation prediction value of the k'-th stereoscopic image in the test sample data set is denoted as Q_k', Q_k' = f(X_k') = (w_opt)^T·φ(X_k') + b_opt, where X_k' represents the feature vector of the k'-th stereoscopic image in the test sample data set, φ(X_k') represents a linear function of X_k', and 1 ≤ k' ≤ n−t;
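Taken together, steps ⑧-4 and ⑧-5 amount to fitting a kernel support vector regressor to the training feature vectors and their mean opinion scores, then scoring the held-out images. A minimal sketch using scikit-learn's SVR follows; the claim names no library, and the default epsilon-insensitive loss and the gamma re-parameterization are assumptions:

```python
import numpy as np
from sklearn.svm import SVR

def train_and_predict(train_features, train_mos, test_features, gamma=54.0):
    # Step 8-4: fit an RBF-kernel support vector regressor so that its
    # outputs track the mean opinion scores MOS_k of the training images.
    # Step 8-5: predict the objective visual comfort score Q_k' for each
    # test image. scikit-learn writes the RBF kernel as
    # exp(-g * ||x - x'||^2), so the claim's exp(-||x - x'||^2 / gamma)
    # corresponds to g = 1 / gamma.
    model = SVR(kernel="rbf", gamma=1.0 / gamma)
    model.fit(np.asarray(train_features), np.asarray(train_mos))
    return model.predict(np.asarray(test_features))
```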
⑧-6. Randomly re-select t stereoscopic images from the stereoscopic image set to form the training set, let the remaining n−t stereoscopic images in the set form the test set, and then return to step ⑧-2 and continue; after N iterations, compute the average of the objective visual comfort evaluation prediction values obtained for each stereoscopic image in the stereoscopic image set, and take this average as the final objective visual comfort evaluation prediction value of the corresponding stereoscopic image, where the value of N is greater than 100.
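The iteration loop of step ⑧-6 can be sketched as below, reusing the hypothetical train_and_predict from the previous sketch; treating the feature vectors as an (n, d) NumPy array and averaging each image's predictions only over the iterations in which it fell into the test set are implementation assumptions:

```python
import numpy as np

def final_comfort_scores(features, mos, t, n_iters=200, seed=0):
    # Step 8-6: repeatedly split the n images at random into a training
    # set of t images and a test set of n - t images, re-train and re-test
    # (steps 8-2 to 8-5), and average each image's test-set predictions
    # across iterations; the claim requires N > 100.
    features, mos = np.asarray(features, float), np.asarray(mos, float)
    n = features.shape[0]
    sums, counts = np.zeros(n), np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        order = rng.permutation(n)
        train_idx, test_idx = order[:t], order[t:]
        preds = train_and_predict(features[train_idx], mos[train_idx],
                                  features[test_idx])
        sums[test_idx] += preds
        counts[test_idx] += 1
    return sums / np.maximum(counts, 1)  # final objective comfort scores
```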
10. The machine-learning-based method for evaluating the visual comfort of stereoscopic images according to claim 9, characterized in that γ = 54 is taken in step ⑧-3.
CN201310264956.8A 2013-06-27 2013-06-27 Method for evaluating stereo image vision comfort level based on machine learning Active CN103347196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310264956.8A CN103347196B (en) 2013-06-27 2013-06-27 Method for evaluating stereo image vision comfort level based on machine learning

Publications (2)

Publication Number Publication Date
CN103347196A true CN103347196A (en) 2013-10-09
CN103347196B CN103347196B (en) 2015-04-29

Family

ID=49281967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310264956.8A Active CN103347196B (en) 2013-06-27 2013-06-27 Method for evaluating stereo image vision comfort level based on machine learning

Country Status (1)

Country Link
CN (1) CN103347196B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742353B (en) * 2008-11-04 2012-01-04 工业和信息化部电信传输研究所 No-reference video quality evaluating method
CN102137271A (en) * 2010-11-04 2011-07-27 华为软件技术有限公司 Method and device for evaluating image quality
CN102945552A (en) * 2012-10-22 2013-02-27 西安电子科技大学 No-reference image quality evaluation method based on sparse representation in natural scene statistics
CN103096125A (en) * 2013-02-22 2013-05-08 吉林大学 Stereoscopic video visual comfort evaluation method based on region segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ye Bi and Jun Zhou: "Visual comfort assessment metric based on motion features in salient motion regions for stereoscopic 3D video", Communications in Computer and Information Science *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103581661B (en) * 2013-10-28 2015-06-03 宁波大学 Method for evaluating visual comfort degree of three-dimensional image
CN103581661A (en) * 2013-10-28 2014-02-12 宁波大学 Method for evaluating visual comfort degree of three-dimensional image
WO2015062149A1 (en) * 2013-10-30 2015-05-07 清华大学 Method for acquiring degree of comfort of motion-sensing binocular stereoscopic video
CN103595990B (en) * 2013-10-30 2015-05-20 清华大学 Method for acquiring comfort degree of motion-sensing binocular stereoscopic video
US10091484B2 (en) 2013-10-30 2018-10-02 Tsinghua University Method for acquiring comfort degree of motion-sensing binocular stereoscopic video
CN104036502B (en) * 2014-06-03 2016-08-24 宁波大学 No-reference quality evaluation method for blur-distorted stereoscopic images
CN104469355A (en) * 2014-12-11 2015-03-25 西安电子科技大学 Visual comfort prediction based on saliency adaptation and visual comfort enhancement method based on nonlinear mapping
CN104469355B (en) * 2014-12-11 2016-09-28 西安电子科技大学 Visual comfort prediction based on saliency adaptation and visual comfort enhancement based on nonlinear mapping
CN104581141B (en) * 2015-01-09 2016-06-22 宁波大学 Stereoscopic image visual comfort evaluation method
CN104581141A (en) * 2015-01-09 2015-04-29 宁波大学 Three-dimensional picture visual comfort evaluation method
CN104811693A (en) * 2015-04-14 2015-07-29 宁波大学 A Method for Objective Evaluation of Visual Comfort of Stereo Image
CN105208374A (en) * 2015-08-24 2015-12-30 宁波大学 Non-reference image quality objective evaluation method based on deep learning
CN105243385B (en) * 2015-09-23 2018-11-09 宁波大学 Image quality evaluation method based on unsupervised learning
CN105243385A (en) * 2015-09-23 2016-01-13 宁波大学 Unsupervised learning based image quality evaluation method
CN106683072A (en) * 2015-11-09 2017-05-17 上海交通大学 PUP (Percentage of Un-linked pixels) diagram based 3D image comfort quality evaluation method and system
CN105407349B (en) * 2015-11-30 2017-05-03 宁波大学 No-reference objective three-dimensional image quality evaluation method based on binocular visual perception
CN105407349A (en) * 2015-11-30 2016-03-16 宁波大学 No-reference objective three-dimensional image quality evaluation method based on binocular visual perception
CN106210710A (en) * 2016-07-25 2016-12-07 宁波大学 Stereoscopic image visual comfort evaluation method based on multi-scale dictionary
CN106604012A (en) * 2016-10-20 2017-04-26 吉林大学 Method for evaluating comfort level of 3D video according to vertical parallax
CN106686377A (en) * 2016-12-30 2017-05-17 佳都新太科技股份有限公司 Algorithm for determining video key area based on deep neural network
CN106993183A (en) * 2017-03-28 2017-07-28 天津大学 A Quantitative Method of Comfortable Brightness Based on Salient Regions of Stereo Image
CN107909565A (en) * 2017-10-29 2018-04-13 天津大学 Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN109754391A (en) * 2018-12-18 2019-05-14 北京爱奇艺科技有限公司 Image quality evaluation method and device, and electronic equipment
CN109754391B (en) * 2018-12-18 2021-10-22 北京爱奇艺科技有限公司 Image quality evaluation method and device and electronic equipment
CN111669563A (en) * 2020-06-19 2020-09-15 福州大学 A method for enhancing the visual comfort of stereo images based on reinforcement learning
CN111669563B (en) * 2020-06-19 2021-06-25 福州大学 Stereo image visual comfort enhancement method based on reinforcement learning
CN119785267A (en) * 2024-12-25 2025-04-08 杭州电子科技大学 A comfort prediction method for stereoscopic panoramic video based on deep learning

Also Published As

Publication number Publication date
CN103347196B (en) 2015-04-29

Similar Documents

Publication Publication Date Title
CN103347196B (en) Method for evaluating stereo image vision comfort level based on machine learning
CN103581661B (en) Method for evaluating visual comfort degree of three-dimensional image
CN104811693B (en) Objective evaluation method for visual comfort of stereoscopic images
CN105407349B (en) No-reference objective three-dimensional image quality evaluation method based on binocular visual perception
CN104023230B (en) No-reference image quality assessment method based on gradient correlation
CN104581143A (en) Reference-free three-dimensional picture quality objective evaluation method based on machine learning
CN104036501A (en) Three-dimensional image quality objective evaluation method based on sparse representation
CN105389554A (en) Live body discrimination method and device based on face recognition
CN103942525A (en) Real-time face optimal selection method based on video sequence
CN105357519B (en) Non-reference stereo image quality objective evaluation method based on self-similarity characteristics
CN104581141B (en) Stereoscopic image visual comfort evaluation method
CN104036502B (en) No-reference quality evaluation method for blur-distorted stereoscopic images
CN106162162B (en) Objective quality evaluation method for retargeted images based on sparse representation
CN104394403A (en) Objective quality evaluation method for compression-distorted stereoscopic video
CN106791822A (en) No-reference stereoscopic image quality evaluation method based on monocular and binocular feature learning
CN108805825A (en) Retargeted image quality evaluation method
CN103338379A (en) Stereoscopic video objective quality evaluation method based on machine learning
CN105809182A (en) Image classification method and device
CN104574363A (en) Full reference image quality assessment method in consideration of gradient direction difference
CN104361583A (en) A Method for Objective Quality Evaluation of Asymmetric Distorted Stereo Images
CN104243956B (en) Stereoscopic image visual saliency map extraction method
CN106210710B (en) Stereoscopic image visual comfort evaluation method based on multi-scale dictionary
CN108848365B (en) Retargeted stereoscopic image quality evaluation method
CN103065302B (en) Image significance detection method based on stray data mining
CN102708568A (en) A Stereoscopic Image Objective Quality Evaluation Method Based on Structural Distortion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191213

Address after: Room 1020, Nanxun Science and Technology Pioneering Park, No. 666 Chaoyang Road, Nanxun District, Huzhou City, Zhejiang Province, 313000

Patentee after: Huzhou You Yan Intellectual Property Service Co.,Ltd.

Address before: No. 818 Fenghua Road, Jiangbei District, Ningbo, Zhejiang Province, 315211

Patentee before: Ningbo University

TR01 Transfer of patent right

Effective date of registration: 20230824

Address after: Room 111, 1st Floor, Building 4, Yard 5, Shangdi East Road, Haidian District, Beijing, 100000

Patentee after: Ape Point Technology (Beijing) Co.,Ltd.

Patentee after: Zheng Juan

Address before: Room 1020, Science and Technology Pioneer Park, No. 666 Chaoyang Road, Nanxun Town, Nanxun District, Huzhou, Zhejiang, 313000

Patentee before: Huzhou You Yan Intellectual Property Service Co.,Ltd.
