CN101740029B - Three-particle cooperative optimization method applied to vector quantization-based speaker recognition - Google Patents
Publication number: CN101740029B
Authority: CN (China)
Legal status: Expired - Fee Related
Abstract
The present invention relates to a three-particle cooperative optimization method applied to speaker recognition, an optimization design method for vector quantization speaker codebook models. An initial population of 6 particles, each representing a codebook, is divided into 2 subgroups; each subgroup of three particles is called a three-particle, and the 2 three-particles use different particle update parameters so as to realize global exploration and local search respectively. In each iteration, every particle performs the PSO velocity and position updates and an LBG operation of 3 iterations. Whenever the mixing-update count is reached, the particles are shuffled and repartitioned into new three-particles, achieving global information exchange and co-evolution. When the maximum iteration count of the initial population is reached, 2 particles are selected from the 2 three-particles to continue searching until the maximum iteration count of the elite particles is reached, and the better one is taken as the speaker codebook model. The invention largely resolves the problem of the initial codebook affecting the optimization result, and clearly improves speaker recognition performance on short utterances.
Description
Technical Field
The present invention relates to the technical field of speech recognition and, more specifically, to a three-particle cooperative optimization method applied to speaker recognition based on vector quantization.
Background Art
Speaker recognition is a biometric technology urgently needed by information-technology applications in existing communication networks. It can be widely applied in finance, the military (e.g., battlefield monitoring and commander identification), medicine (e.g., prosthesis control), public security and justice (e.g., criminal monitoring and identification), security (e.g., airport access control), and information services (e.g., automatic information retrieval and e-commerce).
The speaker model is the core of high-performance speaker recognition. Since the late 1980s, with the establishment of speaker models based on vector quantization, probabilistic statistical models, and artificial neural networks, speaker recognition has entered a new and flourishing period of development. Model training, i.e., optimization, is a typical optimization problem and poses a severe challenge to traditional optimization techniques. In the past, researchers considering the optimization of speaker models focused on deterministic optimization methods such as the LBG algorithm and the expectation-maximization algorithm. However, deterministic methods have serious defects when solving speaker models with multi-peak characteristics, and it is difficult for them to obtain an accurate global optimum; a breakthrough must rely on the research and application of new algorithms. Training data has always been an important factor affecting the performance of a speaker recognition system: building a model for each target speaker necessarily demands as much training data from that speaker as possible.
In practical systems, however, most speaker recognition applications must verify the target speaker without the speaker's active cooperation, so large amounts of the target speaker's speech are often unavailable. In the practical setting of short utterances, where the training and recognition speech is under 10 seconds, system recognition performance deteriorates severely; how to build an optimal short-utterance speaker model remains an unsolved and difficult problem.
1. Speaker Recognition Based on Vector Quantization
The practical deployment of speaker recognition systems makes short-utterance speaker recognition a key issue. With little training speech, probabilistic statistical models cannot obtain accurate parameter estimates, whereas vector quantization (VQ) can achieve good recognition results. Vector quantization is a non-parametric (template) model. It rests on the assumption that the feature vectors of each speaker's speech data follow a certain distribution, and that this distribution is the information distinguishing the speaker from others. The VQ method attempts to describe this distribution by building a codebook model (CBM) for each speaker from that speaker's training data under the principle of distortion minimization. During recognition, a set of feature vectors is extracted from the speech to be recognized, vector quantization is performed against each codebook model, and the average quantization error is computed; the speaker whose codebook yields the minimum average quantization error is the recognition result. This approach has small storage requirements and low computational overhead. A block diagram of the VQ-based speaker recognition system is shown in Figure 1.
In VQ-based speaker recognition, codebook design is the key problem. The main goal of codebook design is to find an optimal classification of the training vectors, that is, the best scheme for dividing the speaker's T L-dimensional training vectors into M classes. A good codebook maximizes the effect of vector quantization and describes the distribution of the original training samples more accurately. A codebook is generally a subspace of a finite space, obtained by clustering a large amount of training data according to prior knowledge and some distortion measure. Codebook design is also an iterative process and can be regarded as a multi-peak, multi-extremum problem; like function optimization, the search for the optimal codebook requires an optimization-algorithm strategy. Researchers therefore continually seek new codebook design strategies from newly proposed optimization algorithms, such as the particle swarm optimization algorithm, the shuffled frog-leaping algorithm, and the LBG algorithm described below.
2. Particle Swarm Optimization and the Shuffled Frog-Leaping Algorithm
Particle swarm optimization (PSO) is a bio-inspired algorithm derived from social organisms in nature. Based on artificial-life theory and the flocking behavior of birds and fish, its simple structure and excellent problem-solving ability have attracted many researchers and produced remarkable results. PSO treats each individual in the population as a particle (a point) without mass or volume in a multidimensional search space; each particle is a feasible solution of the optimization problem, and the objective function assigns it a fitness value. These particles fly through the solution space at certain velocities and dynamically adjust those velocities according to their own flight experience and that of their companions: each particle uses its personal best p_i and the population best p_g found during the iterations to continually correct its direction and speed, gradually moving toward better regions and finally locating the optimal solution of the problem.
In a D-dimensional target search space, P particles are generated at random. The position of the i-th particle is z_i = (z_i1, z_i2, …, z_iD) and its velocity is v_i = (v_i1, v_i2, …, v_iD); the current fitness value of z_i, computed from the fitness function, measures the quality of the particle's position. The best position found so far by particle i is p_i = (p_i1, p_i2, …, p_iD), and the best position found so far by the whole swarm is p_g = (p_g1, p_g2, …, p_gD). In each iteration, particle i updates its velocity and position according to the following formulas:

v_id(k+1) = w·v_id(k) + c1·r1·(p_id − z_id(k)) + c2·r2·(p_gd − z_id(k))    (Formula 1)
z_id(k+1) = z_id(k) + v_id(k+1)    (Formula 2)
Here d = 1, 2, …, D; k is the iteration number; r1 and r2 are random numbers uniformly distributed in [0, 1]; w is the inertia weight; and c1, c2 are learning factors, also called acceleration factors, which give a particle the ability to summarize its own experience and to learn from outstanding individuals in the swarm, thus moving toward its own historical best point and the swarm's historical best point. Particles keep tracking p_i and p_g in the target search space until the predetermined number of iterations is reached.
In PSO, all particles in the swarm use the same parameter values throughout the search. Many studies have shown that different values of the inertia weight w and the learning factors c1, c2 have a great impact on the algorithm's performance. During the search, the balance between global and local search ability plays a crucial role in the algorithm's success. Shi and Eberhart studied the influence of the inertia weight w on optimization performance and found that a larger w gives particles good global exploration ability, helping them escape local minima, while a smaller w improves local search ability. They therefore proposed decreasing the inertia weight linearly with the iteration number: the algorithm uses a larger inertia weight in the early stage and a smaller one later. They further proposed adapting the inertia weight with a fuzzy control system. Although these improved algorithms raise convergence speed and perform better on unimodal problems, they are only mediocre on multimodal functions.
The learning factors c1 and c2 determine how much a particle's own experience and the experience of other particles influence its trajectory; they reflect the information exchange between particles and can effectively control global exploration and local search. A larger c1 with a smaller c2 helps particles fly through the whole search space and avoid converging too quickly on the swarm's best particle. Conversely, a smaller c1 with a larger c2 helps the swarm converge to the optimum but may cause premature convergence to a local optimum. Ratnaweera suggested adjusting c1 and c2 linearly over the whole iteration process; decreasing c1 from 2.5 to 0.5 while increasing c2 from 0.5 to 2.5 can improve the optimization of some multimodal functions.
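The velocity and position updates of (Formula 1) and (Formula 2) can be sketched as follows. This is a minimal illustration only; the default parameter values (w, c1, c2, v_max) are assumptions chosen for the example, not values prescribed by the text.

```python
import random

def pso_step(z, v, p_i, p_g, w=0.7, c1=2.0, c2=2.0, v_max=1.0):
    """One PSO velocity/position update for a single particle.

    z, v, p_i, p_g are lists of equal length D (position, velocity,
    personal best, swarm best). Parameter defaults are illustrative.
    """
    for d in range(len(z)):
        r1, r2 = random.random(), random.random()
        # Formula 1: inertia + cognitive term + social term
        v[d] = w * v[d] + c1 * r1 * (p_i[d] - z[d]) + c2 * r2 * (p_g[d] - z[d])
        v[d] = max(-v_max, min(v_max, v[d]))  # clamp velocity to [-v_max, v_max]
        # Formula 2: move the particle
        z[d] += v[d]
    return z, v
```

In a full PSO run this step would be repeated for every particle each iteration, with p_i and p_g refreshed from the fitness values beforehand.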
The shuffled frog-leaping algorithm (SFLA) simulates the foraging behavior of frog populations. In a D-dimensional target search space, P frogs (solutions of the problem) are generated at random to form the initial population; the i-th frog represents the solution X_i = (x_i1, x_i2, …, x_iD). The frogs are ranked from best to worst by fitness value and the whole population is divided into m subgroups: the frog ranked 1st goes into subgroup 1, the 2nd into subgroup 2, the m-th into subgroup m, the (m+1)-th back into subgroup 1, and so on until all frogs are assigned. Each subgroup performs a local deep search: in each iteration, the current worst individual X_w in the subgroup is updated by following the subgroup's current best individual X_b or the global best individual X_g; if the fitness of X_w does not improve, a new X_w is generated at random. After all subgroups finish a fixed number of local-search iterations, the shuffling strategy mixes all individuals, re-ranks them, and repartitions the subgroups, after which local deep search resumes; this repeats until the termination condition is met.
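The round-robin subgroup partition described above can be sketched as follows (a minimal illustration; the population and fitness function here are hypothetical, with lower fitness treated as better):

```python
def partition_frogs(population, fitness, m):
    """Rank the population by fitness (lower = better here) and deal it
    into m subgroups: rank 1 -> subgroup 1, rank 2 -> subgroup 2, ...,
    rank m+1 -> subgroup 1, and so on.
    """
    ranked = sorted(population, key=fitness)
    return [ranked[i::m] for i in range(m)]

# e.g., 6 toy "frogs" scored by an identity fitness, split into 2 subgroups
pop = [5, 1, 4, 2, 6, 3]
subs = partition_frogs(pop, fitness=lambda x: x, m=2)
```

After this call, `subs` holds the ranks 1, 3, 5 in the first subgroup and ranks 2, 4, 6 in the second, matching the interleaved assignment rule.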
3. Performance Evaluation Criterion
The design quality of the speaker codebook model is expressed by a distortion measure, usually the mean squared error (MSE) between the training vectors and their nearest codewords. Let the L-dimensional feature vector set of the speaker's training speech be X = {x_1, x_2, …, x_T}, with x_i = {x_i1, x_i2, …, x_iL}, where T is the number of feature vectors in the training sample set, and let Y be a codebook of M L-dimensional codewords representing the speaker's model, i.e., Y = {y_1, y_2, …, y_M}. The distortion is

D = (1/T) · Σ_{i=1…T} min_{1≤j≤M} d²(x_i, y_j)

where d(x_i, y_j) is the Euclidean distance.
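The distortion measure above can be computed directly; the sketch below uses plain Python lists, where in practice X would be the MFCC feature set and Y the codebook:

```python
def avg_distortion(X, Y):
    """Average squared Euclidean distortion of feature set X under codebook Y.

    X: list of T feature vectors; Y: list of M codewords (same dimension L).
    Each vector is scored against its nearest codeword, then averaged over T.
    """
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sum(min(sqdist(x, y) for y in Y) for x in X) / len(X)
```

A perfectly matching codebook gives zero distortion; any mismatch between the training vectors and their nearest codewords raises the average.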
4. The LBG Algorithm
The LBG algorithm is the most widely used method for designing VQ speaker codebook models. Based on the nearest-neighbor and centroid conditions of the optimal vector quantizer, LBG applies the Euclidean distance (distortion measure) to the nearest-neighbor partition of the vectors. Given an initial codebook, the Voronoi partition R_0 of the training vectors is determined; from the cells computed from R_0 under the nearest-neighbor condition of the distortion measure, a new codebook Y_1 is obtained, which in turn yields a new partition R_1, and so on, iterating until the distortion-error condition is satisfied or the iteration limit is reached, at which point the final codebook is generated. Although this iterative process cannot guarantee an optimal codebook, every iteration reduces (or at worst preserves) the average distortion, so codebook performance gradually improves. A flowchart of the LBG algorithm is shown in Figure 2.
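One LBG iteration, nearest-neighbor partition followed by centroid update, can be sketched as follows. This is a minimal illustration; the handling of empty cells (keeping the old codeword) is one simple choice among several, not the one the text prescribes.

```python
def lbg_step(X, Y):
    """One LBG iteration: nearest-neighbour partition, then centroid update.

    X: list of training vectors; Y: current codebook. Returns the new
    codebook; empty cells keep their old codeword as a simple fallback.
    """
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    cells = [[] for _ in Y]
    for x in X:                        # nearest-neighbour condition
        j = min(range(len(Y)), key=lambda j: sqdist(x, Y[j]))
        cells[j].append(x)
    new_Y = []
    for j, cell in enumerate(cells):   # centroid condition
        if cell:
            L = len(cell[0])
            new_Y.append([sum(v[d] for v in cell) / len(cell) for d in range(L)])
        else:
            new_Y.append(Y[j])
    return new_Y
```

Repeating `lbg_step` until the distortion stops improving (or an iteration limit is hit) reproduces the loop shown in Figure 2.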
Although the codebook generated by LBG is fairly good, the algorithm has an obvious shortcoming: each iteration can change the codebook only locally, i.e., after each iteration the new codebook cannot differ greatly from the old one. LBG therefore places very high demands on the initial codebook, and different initial codebooks yield different clustering results. A poorly chosen initial codebook greatly degrades the performance of the final codebook and increases the running time. This instability of the LBG algorithm seriously affects codebook design quality, and it is the reason that codebook design, and in particular its optimization, has received so much research attention.
5. The Particle-Pair Cooperative Optimization Algorithm
Xue Liping et al. (Xue Liping, Yin Junxun, Zhou Jiarui, Ji Zhen. Speaker identification based on particle-pair cooperative optimization. Acta Electronica Sinica, 2009, 37(1): 207-211) proposed a new optimization design method for speaker codebook models: the particle-pair cooperative optimizer (PPCO). Building on the traditional particle swarm optimization (PSO) algorithm, it borrows the idea of co-evolution based on inter-species competition: information is exchanged through the migration of particles between the initial particle pairs, realizing the co-evolution of the pairs. The algorithm groups two particles into a small-population particle pair that works cooperatively, as shown in Figure 3. In each iteration, besides performing the basic PSO operations (velocity update and position update), each particle also makes full use of LBG's local search ability by executing the LBG algorithm for 3 iterations.
The initial population of PPCO consists of 4 particles divided into two particle pairs, {P_1, P_2} and {P_3, P_4}, which perform velocity and position updates as two independent groups, searching the training-speech sample vector space in parallel for the optimal VQ codebook. The two particles of an initial pair can summarize their own experience and learn from each other, moving toward their own historical best point and their partner's. After every fixed number of iterations, information is exchanged between the pairs: a particle randomly chosen from one pair is swapped with a particle of the other pair, realizing the co-evolution of the pairs. Search and evolution repeat until the maximum iteration count of the initial pairs is reached, and the better particle of each pair is selected as an elite particle. The two elite particles EP_1 and EP_2, selected from the two initial pairs, are recombined into a new elite pair {EP_1, EP_2}, which continues to search and evolve; the better one, EP_3, is selected as the final solution. A block diagram of the VQ speaker recognition system based on PPCO is shown in Figure 1.
6. Preprocessing and Feature Extraction of the Speaker Speech Signal
After the speaker's speech signal is sampled, it first passes through the pre-emphasis filter H(z) = 1 − 0.95z⁻¹ and is then divided into frames and windowed, with a frame length of 20 ms, a frame shift of 10 ms, and a Hamming window as the window function. 15-dimensional Mel-frequency cepstral coefficients (MFCC) are extracted as the speaker's speech feature vectors.
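The pre-emphasis and framing steps can be sketched as follows; the sampling rate of 8000 Hz is an assumption for illustration (the text does not state one), and the Hamming window and MFCC extraction are omitted here:

```python
def preemphasize(signal, alpha=0.95):
    """Pre-emphasis filter H(z) = 1 - 0.95 z^-1 applied to a sample list."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame(signal, fs=8000, frame_ms=20, shift_ms=10):
    """Split samples into 20 ms frames with a 10 ms shift (fs is assumed)."""
    flen = fs * frame_ms // 1000    # samples per frame
    fshift = fs * shift_ms // 1000  # samples per shift
    return [signal[i:i + flen]
            for i in range(0, len(signal) - flen + 1, fshift)]
```

Each frame produced here would then be multiplied by a Hamming window before MFCC computation.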
7. Building the Speaker Codebook Model
Let the L-dimensional feature vector set of the speaker's training speech be X = {x_1, x_2, …, x_T}, with x_i = {x_i1, x_i2, …, x_iL}, where T is the number of feature vectors in the training sample set, and let Y be a codebook of M L-dimensional codewords representing the speaker's model, i.e., Y = {y_1, y_2, …, y_M}. PPCO assigns the T feature vectors of the set X to M clusters according to the distortion-measure criterion, each cluster represented by one codeword; finally, the training vectors in each cluster are replaced by their corresponding codeword. Compared with the number T of feature vectors in the training sample set, the size of the codebook (the number of codewords) is much smaller. Clearly, the statistical distribution of the codewords matches the distribution of the original training-speech feature vectors, so while preserving the basic information of the original distribution, the codebook greatly reduces the amount of data to be processed.
PPCO trains the speaker codebook model using several strategies to optimize codebook design: a small population, multiple particle pairs, cooperative optimization, mixed PSO and LBG operations, and an elite particle pair. Its main steps are as follows:
(1) Generation of the speaker's initial codebooks
Randomly initialize the population by selecting M vectors at random from the training feature vector set as the initial codewords of each particle, and compute each particle's fitness value. Set the maximum iteration count of the initial particle pairs, the interval (in iterations) between particle exchanges, and the maximum iteration count of the elite particle pair.
(2) Operations on the two initial particle pairs
① Update and determine p_i and p_g according to the fitness values;
② Update each particle's velocity and position according to (Formula 1) and (Formula 2);
③ Have each particle execute an LBG operation of 3 iterations and handle empty codewords;
④ If the exchange interval has been reached, randomly select a particle from one pair and swap it with a particle of the other pair, exchanging information between the pairs.
Repeat steps ①-④ until the maximum iteration count of the initial particle pairs is reached.
(3) Elite particle pair update and LBG operation
After the two initial particle pairs reach their stopping condition, the best particle of each is selected to form the elite particle pair. Following the same procedure used for the initial pairs, the elite pair's velocities and positions are updated with (Formula 1) and (Formula 2) together with the LBG operation, until the maximum iteration count of the elite pair is reached. The best particle of the elite pair is taken as the training result, giving the speaker codebook model.
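The overall PPCO training flow in steps (1)-(3) can be outlined schematically. This is a sketch, not the authors' implementation: `pso_step` and `lbg3` are hypothetical callables supplied by the caller (one PSO update and a 3-iteration LBG pass, respectively), and fitness is assumed lower-is-better.

```python
import random

def ppco_train(init_codebooks, fitness, pso_step, lbg3,
               max_iter=50, swap_every=10, elite_iter=20):
    """Schematic PPCO flow: two particle pairs, periodic particle swaps,
    then an elite-pair phase. init_codebooks holds 4 particles split into
    pairs {P1, P2} and {P3, P4}; iteration limits are illustrative.
    """
    pairs = [init_codebooks[:2], init_codebooks[2:]]
    for k in range(max_iter):
        for pair in pairs:
            for i, p in enumerate(pair):           # PSO update + 3 LBG iterations
                pair[i] = lbg3(pso_step(p, pair))
        if (k + 1) % swap_every == 0:              # migrate one particle per pair
            i, j = random.randrange(2), random.randrange(2)
            pairs[0][i], pairs[1][j] = pairs[1][j], pairs[0][i]
    elites = [min(pair, key=fitness) for pair in pairs]
    for k in range(elite_iter):                    # elite-pair phase
        for i, p in enumerate(elites):
            elites[i] = lbg3(pso_step(p, elites))
    return min(elites, key=fitness)                # EP3, the final codebook
```

With identity update operators this reduces to selecting the best initial particle, which makes the control flow easy to check in isolation.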
8. Speaker Recognition
In the speaker recognition stage, the recognition process is as follows:
(1) Extract the feature vector set X = {x_1, x_2, …, x_T} from the test speech;
(2) Use the N codebooks established in the system to vector-quantize the test feature set X = {x_1, x_2, …, x_T} in turn and compute each codebook's average quantization error:

D_n = (1/T) · Σ_{i=1…T} min_{1≤l≤M} d²(x_i, y_l^n),  n = 1, 2, …, N

where y_l^n is the l-th codeword vector of the n-th codebook (corresponding to the n-th speaker).
(3) Select the speaker corresponding to the codebook with the smallest average quantization error as the system's recognition result.
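The three recognition steps above can be sketched as a single function (a minimal illustration with plain lists; real inputs would be MFCC vectors and trained codebooks):

```python
def recognize(X, codebooks):
    """Return the index of the codebook (i.e., the speaker) with the
    minimal average quantization error on the test feature set X.

    X: list of test feature vectors; codebooks: list of N codebooks,
    each a list of codeword vectors.
    """
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    def avg_err(Y):  # average distortion of X under one codebook
        return sum(min(sqdist(x, y) for y in Y) for x in X) / len(X)
    return min(range(len(codebooks)), key=lambda n: avg_err(codebooks[n]))
```

The returned index identifies the speaker whose codebook best explains the test utterance.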
9. Main Shortcomings of the Prior Art
The LBG algorithm places very high demands on the initial codebook: different initial codebooks yield different clustering results. A poorly chosen initial codebook degrades codebook performance and running time, and thereby speaker recognition performance.
Although the codebook design quality of the particle-pair cooperative optimization algorithm is much higher than that of the traditional LBG algorithm, it remains somewhat sensitive to the choice of the initial codebook, may fall into a local optimum, and cannot guarantee finding the globally optimal codebook.
Summary of the Invention
The technical problem to be solved by the present invention is to propose a three-particle cooperative optimization method applied to speaker recognition based on vector quantization: an improved optimization design method for VQ speaker codebook models, the triple-particle cooperative optimizer (TPCO).
The technical solution adopted by the present invention is to construct a three-particle cooperative optimization method applied to VQ-based speaker recognition, comprising preprocessing and feature extraction of the speaker's speech signal, building the speaker codebook model, and recognizing the speaker. The step of building the speaker codebook model comprises:
A. Set the maximum iteration count of the initial population, the three-particle mixing-update count, and the maximum iteration count of the elite particles. The initial population comprises 6 particles, each representing a codebook, divided into 2 subgroups by the partition rule of the shuffled frog-leaping algorithm; each subgroup of three particles is called a three-particle. The 2 three-particles use different particle update parameters, so that one three-particle has good global exploration ability while the other can find the best solution in each local region;
B. In each iteration, perform the velocity-update and position-update operations of the PSO algorithm on every particle of each three-particle, together with the mixed-update operation of an LBG algorithm of 3 iterations;
C. The 2 three-particles of the initial population simultaneously perform wide-range global exploration and small-range local fine search in the solution space. Whenever the three-particle mixing-update count is reached, apply the subgroup partition rule and shuffling strategy of the shuffled frog-leaping algorithm: mix the 2 three-particles, rank them, and repartition them into 2 new three-particles, realizing global information exchange and co-evolution between the three-particles;
D. When the maximum iteration count of the initial population is reached, the 2 elite particles selected from the 2 three-particles continue the velocity-update and position-update operations until the maximum iteration count of the elite particles is reached; the better one is taken as the speaker codebook model.
In the present invention, step A further comprises:
A1. establishing the particle structure and the fitness function;
A2. the three-particle partition principle.
本发明中,所述步骤B具体还包括:In the present invention, the step B specifically also includes:
B1,粒子速度和位置更新策略。B1, particle velocity and position update strategy.
本发明中,粒子的最优位置由适应度函数值决定,所述步骤A1具体包括:In the present invention, the optimal position of the particle is determined by the fitness function value, and the step A1 specifically includes:
A11,所述粒子结构的建立是基于码字的,每个粒子代表一个码本Y,Y是由M个L维的码字组成,表示该说话人码本模型,即Y={y1,y2,…,yM},将说话人训练特征矢量集聚类成M簇,每个码字(j=1,2,…,M)代表一簇;A11, the establishment of the particle structure is based on codewords, each particle represents a codebook Y, and Y is composed of M codewords of L dimension, representing the speaker codebook model, that is, Y={y 1 , y 2 ,...,y M }, cluster the speaker training feature vector set into M clusters, each codeword (j=1, 2, ..., M) represents a cluster;
A12,粒子的维数为D维,其中D=M×L,第i个粒子(i=1,2,3)的位置为zi=(zi1,zi2,…,ziD),粒子位置zi的取值范围为zmin~zmax,在语音信号的矢量量化过程中,zmin和zmax一般分别取语音特征矢量集中每一维的最小值和最大值,速度vi的最大值vmax=zmax。A12, the dimension of the particle is D dimension, where D=M×L, the position of the i-th particle (i=1, 2, 3) is z i =(z i1 , z i2 ,..., z iD ), the particle The value range of position z i is z min ~ z max . In the process of vector quantization of speech signals, z min and z max generally take the minimum value and maximum value of each dimension in the speech feature vector set, and the maximum value of velocity v i Value v max =z max .
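A minimal sketch of this codeword-based particle encoding (the sizes M and L, and the random stand-in feature set, are illustrative assumptions, not values from the patent): each codebook of M L-dimensional codewords is flattened into a D = M × L position vector, with per-dimension bounds taken from the feature set.

```python
import numpy as np

M, L = 4, 2                      # illustrative codebook size: M codewords of dim L
D = M * L                        # particle dimension

rng = np.random.default_rng(0)
features = rng.random((50, L))   # stand-in for the speaker's feature vector set
z_min = features.min(axis=0)     # per-dimension position bounds from the data
z_max = features.max(axis=0)
v_max = z_max                    # maximum velocity, as in the text

# A particle position is the codebook flattened row-wise into a D-vector.
codebook = rng.uniform(z_min, z_max, size=(M, L))
position = codebook.reshape(D)

# The codebook view is recovered by reshaping; out-of-range values are clipped.
recovered = position.reshape(M, L)
clipped = np.clip(recovered, z_min, z_max)
assert np.allclose(recovered, codebook)
```
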
In the present invention, step A2 specifically comprises:
Following the subgroup partition rule of the shuffled frog leaping algorithm, the 6 particles in the population are ranked by fitness from best to worst and divided into two subgroups SWARM_S and SWARM_L: the 1st-ranked particle goes to SWARM_S, the 2nd to SWARM_L, the 3rd to SWARM_S, the 4th to SWARM_L, the 5th to SWARM_S, and the 6th to SWARM_L.
In the present invention, in step B1, the velocity and position updates of the 2 triple-particles are given by (Eq. 6) to (Eq. 9):

v_id = w^L · v_id + c1^L · r1 · (p_id − z_id) + c2^L · r2 · (p_gd − z_id)   (Eq. 6)
z_id = z_id + v_id   (Eq. 7)
v_id = w^S · v_id + c1^S · r1 · (p_id − z_id) + c2^S · r2 · (p_gd − z_id)   (Eq. 8)
z_id = z_id + v_id   (Eq. 9)

where d = 1, 2, …, D; r1 and r2 are random numbers uniformly distributed in [0, 1]; p_id is the best position found so far by particle i and p_gd is the best position found by its subgroup; w^L, c1^L, c2^L are the parameters of SWARM_L and w^S, c1^S, c2^S are the parameters of SWARM_S, w being the inertia weight and c1, c2 the learning factors.
The elite particles use (Eq. 8) and (Eq. 9) for their velocity and position updates.
In the present invention, the mean squared error (MSE) between the feature vectors and their nearest codewords is chosen as the fitness function; the MSE is computed as

MSE = (1/T) · Σ_{i=1}^{T} min_{1≤j≤M} d²(x_i, y_j)

where the L-dimensional feature vector set of the speaker's training speech is X = {x_1, x_2, …, x_T} with x_i = {x_i1, x_i2, …, x_iL}, T is the number of feature vectors in the training sample set, Y = {y_1, y_2, …, y_M} is the codebook of M L-dimensional codewords representing the speaker model, and d(x_i, y_j) is the Euclidean distance.
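A sketch of this fitness computation (the function name and the toy data are mine; the patent only fixes the formula): the average squared Euclidean distance from each feature vector to its nearest codeword.

```python
import numpy as np

def mse_fitness(X, Y):
    """MSE fitness: mean squared Euclidean distance from each feature
    vector in X (T x L) to its nearest codeword in the codebook Y (M x L)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # T x M squared distances
    return d2.min(axis=1).mean()                             # nearest codeword, then average

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])  # T = 3 toy feature vectors
Y = np.array([[0.0, 0.0], [1.0, 1.0]])              # M = 2 codewords
print(mse_fitness(X, Y))   # nearest-codeword squared errors 0, 0, 0.01 -> 0.01/3
```
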
In the present invention, the triple-particle SWARM_L uses larger w and c1 and a smaller c2, while the triple-particle SWARM_S uses smaller w and c1 and a larger c2.
In the present invention, after the particle velocity and position updates are completed and the LBG algorithm has been run for 3 iterations, empty codewords are handled: a codeword with an out-of-range problem is replaced by a training vector with a large quantization error.
The beneficial effects of the present invention are as follows. The invention combines the advantages of the particle-pair cooperative optimization algorithm and the shuffled frog leaping algorithm. By controlling information propagation within the population and coordinating global exploration with local deep search, it increases population diversity, overcomes premature convergence, and improves convergence speed and solution accuracy. Speaker models are generated quickly and consistently good recognition results are obtained. Particles are less likely to become trapped in a locally optimal codebook, so the overall codebook moves closer to the global optimum; at the same time the influence of the initial codebook on the optimization result in vector quantization is better suppressed, and speaker recognition performance is improved.
Description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments, in which:
Figure 1 is a block diagram of a speaker recognition system based on vector quantization;
Figure 2 is a flowchart of the prior-art LBG algorithm;
Figure 3 is a schematic diagram of the particle-pair cooperative optimization algorithm;
Figure 4 is the basic flowchart of the triple-particle cooperative optimization method of the present invention.
Detailed description of the embodiments
For a further understanding of the structural features and effects of the present invention, preferred embodiments are described in detail below with reference to the accompanying drawings:
The technical problem to be solved by the present invention is to propose an improved optimization method for designing vector quantization speaker codebook models, the Triple-Particle Cooperative Optimizer (TPCO). The invention combines the advantages of the particle-pair cooperative optimization algorithm and the shuffled frog leaping algorithm. Starting from controlling information propagation within the population and coordinating global exploration with local deep search, it increases population diversity, overcomes premature convergence, and improves convergence speed and solution accuracy. The method more effectively prevents particles from becoming trapped in a locally optimal codebook and moves the overall codebook closer to the global optimum; at the same time it better suppresses the influence of the initial codebook on the optimization result in vector quantization and improves speaker recognition performance.
1. Basic principles
The triple-particle cooperative optimization method of the present invention follows the basic idea of the initial particle pair and the elite particle pair in the PPCO algorithm, and still optimizes the codebook design with a combination of strategies: a small population, multiple swarms, cooperative optimization, hybrid PSO and LBG operations, and elite particles. The initial TPCO population consists of 6 particles, divided into 2 subgroups SWARM_L and SWARM_S according to the SFLA partition rule; each subgroup consists of three particles and is called a triple-particle. SWARM_L uses larger w and c1 and a smaller c2, so that its particles disperse as widely as possible across the solution space; this gives the triple-particle good global exploration ability and lets it escape from the neighborhoods of local extrema. Correspondingly, SWARM_S uses smaller w and c1 and a larger c2 to ensure that its particles can find the best solution within each local region. In each iteration a particle performs the PSO velocity and position updates together with the hybrid update of the LBG algorithm run for 3 iterations. The 2 triple-particles simultaneously carry out wide-range global exploration and small-range local fine search in the solution space. Whenever a set number of iterations has been completed, all particles are pooled and ranked according to the SFLA shuffling strategy and re-divided into 2 new triple-particles, achieving global information exchange and co-evolution between them. The search and evolution repeat until the maximum number of iterations of the initial population is reached; then 2 elite particles, one selected from each triple-particle, continue the fine search and evolution until the maximum number of elite iterations is reached, and the best one is selected as the final codebook model.
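The flow just described can be sketched as follows. This is a simplified illustration on a toy quadratic objective rather than a real codebook: the objective, the parameter values, the mixing interval of 10, and the iteration counts are all assumptions for the demo, and the per-iteration LBG step is omitted. Two triples with different (w, c1, c2) run PSO updates, are pooled and re-divided at each mixing point, and one elite per triple is compared at the end.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(z):                        # toy stand-in for the codebook MSE
    return float(((z - 3.0) ** 2).sum())

D = 2                                  # toy particle dimension
PARAMS = {"S": (0.4, 1.0, 2.0),        # small w, c1, large c2: local fine search
          "L": (0.9, 2.0, 1.0)}        # large w, c1, small c2: global exploration

def pso_step(pos, vel, pbest, gbest, w, c1, c2):
    r1, r2 = rng.random(D), rng.random(D)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel

def divide(pbest):                     # SFLA-style alternate assignment by rank
    order = np.argsort([fitness(p) for p in pbest])
    return {"S": order[0::2], "L": order[1::2]}   # ranks 1,3,5 / ranks 2,4,6

pos = rng.uniform(-10.0, 10.0, (6, D))
vel = np.zeros((6, D))
pbest = pos.copy()
initial_best = min(fitness(p) for p in pbest)

for it in range(60):
    if it % 10 == 0:                   # mixing interval reached: pool and re-divide
        swarms = divide(pbest)
    for name, idx in swarms.items():
        w, c1, c2 = PARAMS[name]
        gbest = min((pbest[i] for i in idx), key=fitness)   # best of this triple
        for i in idx:
            pos[i], vel[i] = pso_step(pos[i], vel[i], pbest[i], gbest, w, c1, c2)
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i].copy()

# Elite phase (abridged): take the best particle of each triple; the better of
# the two elites stands in for the final speaker codebook model.
elites = [min((pbest[i] for i in swarms[k]), key=fitness) for k in ("S", "L")]
best = min(elites, key=fitness)
assert fitness(best) <= initial_best   # personal bests only ever improve
```

In the patent the elite particles continue updating with the fine-search parameters of (Eq. 8)/(Eq. 9); here that phase is abridged to the final selection.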
2. Particle structure and fitness function
The triple-particle cooperative optimization method adopts a codebook-based optimization scheme, and the particle structure is designed around codewords. Each particle represents a codebook Y consisting of M codewords of dimension L, which is the speaker codebook model, i.e. Y = {y_1, y_2, …, y_M}. The speaker's training feature vector set is clustered into M clusters, each codeword y_j (j = 1, 2, …, M) representing one cluster; the particle structure is shown in Table 1.
Table 1. Particle structure
Each particle has dimension D = M × L. The position z_i = (z_i1, z_i2, …, z_iD) of the i-th particle (i = 1, 2, 3) is given in Table 1. The position z_i ranges over z_min to z_max. In the vector quantization of speech signals, z_min and z_max are usually taken as the per-dimension minimum and maximum of the speech feature vector set, respectively. The maximum of the velocity v_i is v_max = z_max.
The best position of a particle is determined by its fitness value, and the choice of fitness function must reflect codebook design quality. The triple-particle cooperative optimization method therefore uses the mean squared error (MSE) between the training vectors and their nearest codewords as the fitness function, computed with (Eq. 3) and (Eq. 4):

MSE = (1/T) · Σ_{i=1}^{T} min_{1≤j≤M} d²(x_i, y_j)   (Eq. 3)
d(x_i, y_j) = ( Σ_{l=1}^{L} (x_il − y_jl)² )^(1/2)   (Eq. 4)
3. Triple-particle partition rule
Following the subgroup partition rule of the SFLA algorithm, the triple-particle cooperative optimization method ranks the 6 particles in the population by fitness from best to worst and divides them into two subgroups: the 1st-ranked particle goes to SWARM_S, the 2nd to SWARM_L, the 3rd to SWARM_S, the 4th to SWARM_L, the 5th to SWARM_S, and the 6th to SWARM_L. This partition rule is widely used in the shuffled frog leaping algorithm and its improved variants; it has clear advantages over random partitioning and quickly accomplishes long-distance information transfer and global information exchange between the two subgroups.
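A small sketch of this partition rule (the six fitness values are made up for illustration): particles are ranked best-first by MSE, then assigned alternately, ranks 1, 3, 5 to SWARM_S and ranks 2, 4, 6 to SWARM_L.

```python
import numpy as np

mse = np.array([0.42, 0.17, 0.58, 0.23, 0.31, 0.49])  # fitness of the 6 particles
order = np.argsort(mse)          # particle indices sorted best (smallest MSE) first

swarm_S = order[0::2]            # ranks 1, 3, 5
swarm_L = order[1::2]            # ranks 2, 4, 6
print(swarm_S.tolist(), swarm_L.tolist())   # prints [1, 4, 5] [3, 0, 2]
```
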
4. Particle update strategy
① Particle velocity and position updates
The velocity and position updates of the two triple-particle swarms are given by (Eq. 6) to (Eq. 9):

v_id = w^L · v_id + c1^L · r1 · (p_id − z_id) + c2^L · r2 · (p_gd − z_id)   (Eq. 6)
z_id = z_id + v_id   (Eq. 7)
v_id = w^S · v_id + c1^S · r1 · (p_id − z_id) + c2^S · r2 · (p_gd − z_id)   (Eq. 8)
z_id = z_id + v_id   (Eq. 9)

where d = 1, 2, …, D; r1 and r2 are random numbers uniformly distributed in [0, 1]; p_id is the best position found so far by particle i and p_gd is the best position found by its subgroup; w^L, c1^L, c2^L are the parameters of SWARM_L, and w^S, c1^S, c2^S are the parameters of SWARM_S.
To preserve the fine search ability of the elite particles, the elite particle updates use (Eq. 8) and (Eq. 9).
② LBG operation on particles
After the particle velocity and position updates are completed, the LBG algorithm is run for 3 iterations to improve the particle's local search ability. During codebook design, a particle whose codeword position values exceed the allowed maximum can produce empty codewords; after a particle completes the LBG operation, the empty codewords are handled by replacing each codeword with an out-of-range problem by a training vector that has a large quantization error.
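A sketch of this 3-iteration LBG refinement (the function name and toy data are mine; the patent handles empty codewords after the LBG run, while this sketch refills them inside the loop for brevity): vectors are assigned to their nearest codeword, each codeword moves to its cluster centroid, and an empty codeword is replaced by a training vector with large error.

```python
import numpy as np

def lbg_step(X, Y, iters=3):
    """A few Lloyd/LBG iterations: assign each vector in X (T x L) to its
    nearest codeword in Y (M x L), then move each codeword to the centroid
    of its cluster; an empty cluster is refilled with a high-error vector."""
    Y = Y.copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        err = d2[np.arange(len(X)), nearest]       # per-vector quantization error
        for j in range(len(Y)):
            members = X[nearest == j]
            if len(members):
                Y[j] = members.mean(axis=0)
            else:                                  # empty codeword: replace it with
                k = err.argmax()                   # the worst-quantized training vector
                Y[j] = X[k]
                err[k] = -1.0                      # do not reuse the same vector
    return Y

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
Y0 = np.array([[0.0, 0.5], [100.0, 100.0]])        # second codeword starts unused
Y = lbg_step(X, Y0)
print(Y)    # converges to the two cluster centroids [[0, 0.5], [10, 10.5]]
```
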
5. Basic flow
Table 2 gives the pseudocode of the triple-particle cooperative optimization method; its flowchart is shown in Figure 4.
Table 2. Pseudocode of the triple-particle cooperative optimization method
6. Differences from PPCO
Codebook optimization is a typical multimodal optimization problem with many extrema. Compared with the PPCO algorithm, the triple-particle cooperative optimization method further improves codebook design performance in two respects.
(1) The subgroup partition rule and shuffling strategy of the shuffled frog leaping algorithm are introduced, increasing the number of particles per subgroup and improving the search ability of each subgroup.
The TPCO subgroup grows from a particle pair to a triple-particle, further strengthening the search ability. Observing the codebook optimization process of PPCO shows that information exchange between the particle pairs helps improve the optimization result. Since PPCO randomly exchanges one particle between the two particle pairs, there are three possible exchange patterns: ① exchanging the best particles; ② exchanging the worst particles; ③ exchanging a best particle with a worst particle. The first two patterns essentially preserve the distribution of better and worse particles within each pair; the third groups the two better particles into one pair and the two worse particles into the other, destroying the original distribution. The existence of these three exchange patterns helps increase the diversity of the particle pairs and the probability of converging to the global optimum. However, when the particles are already close to the global optimum and need a fine exploitation search, the random particle exchange can disrupt the original pair distribution and interfere with a fine search that a pair may be conducting, reducing convergence speed and accuracy. The subgroup partition rule and shuffling strategy of the shuffled frog leaping algorithm have a distinctive feature: individuals ranked by fitness from best to worst are assigned to the subgroups in turn, so that every subgroup contains both better and worse individuals. This improves subgroup diversity while maintaining the influence of the better individuals within each subgroup, without destroying the distribution of better and worse individuals across the subgroups, which benefits convergence speed and search accuracy. The present method therefore adopts the SFLA subgroup partition rule and shuffling strategy to divide and recombine the 2 triple-particles.
(2) The 2 triple-particles SWARM_L and SWARM_S use different particle update parameters, further strengthening global exploration and local fine search.
The two initial particle pairs of PPCO use identical particle update parameters. So that the population retains both global exploration and local search ability throughout the iterations, able both to jump from the neighborhood of a local extremum into the neighborhood of the global optimum and to search finely within that neighborhood, the 2 triple-particles SWARM_S and SWARM_L of TPCO use different particle update parameters: SWARM_L has good global exploration ability and escapes well from the neighborhoods of local extrema, while correspondingly SWARM_S can find the best solution within each local region. The method works like a microscope with a coarse lens and a fine lens: the coarse lens locates targets of interest over a wide field, while the fine lens examines the located targets closely. Information exchange between the 2 triple-particles is then achieved through the shuffling and recombination of the particles.
The triple-particle cooperative optimization method of the present invention builds on the PPCO algorithm. Since codebook optimization is a typical multimodal optimization problem with many extrema, the method uses 2 triple-particles with different update parameters to realize local deep search and global exploration, and achieves rapid information exchange between the triple-particles through the SFLA shuffling strategy, thereby better balancing global optimization and local search and enabling the algorithm to escape local optima. Applied to a speaker recognition system, the method converges quickly, generates the speaker model in a short time, does not depend on the choice of the speaker's initial codebook, and stably achieves good recognition results. It largely solves the problem of the initial codebook affecting the optimization result for short-utterance speakers, and can be widely applied in short-utterance speaker recognition products.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications fall within the scope of the claims of the present invention.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2009101889638A CN101740029B (en) | 2009-12-16 | 2009-12-16 | Three-particle cooperative optimization method applied to vector quantization-based speaker recognition |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101740029A CN101740029A (en) | 2010-06-16 |
| CN101740029B true CN101740029B (en) | 2011-12-21 |
Family
ID=42463405
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2009101889638A Expired - Fee Related CN101740029B (en) | 2009-12-16 | 2009-12-16 | Three-particle cooperative optimization method applied to vector quantization-based speaker recognition |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101740029B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102496368B (en) * | 2011-12-20 | 2014-03-12 | 重庆金美通信有限责任公司 | Improved vector quantization method |
| CN103150609B (en) * | 2013-02-18 | 2017-05-03 | 健雄职业技术学院 | Modeling method for short time traffic flow predicting model |
| CN104064181B (en) * | 2014-06-20 | 2017-04-19 | 哈尔滨工业大学深圳研究生院 | Quick convergence method for feature vector quantization of speech recognition |
| CN105788593B (en) * | 2016-02-29 | 2019-12-10 | 中国科学院声学研究所 | Method and system for generating conversation strategy |
| CN106803837A (en) * | 2016-12-30 | 2017-06-06 | 南京理工大学 | The method of the fault-tolerant resource allocation of cloud system and cost minimization |
| CN113033754B (en) * | 2020-12-08 | 2022-09-13 | 中国海洋大学 | Evaluation method for high-frequency ground wave radar target tracking algorithm based on collaborative scene evolution |
| CN113657589B (en) * | 2021-07-08 | 2024-05-14 | 南方科技大学 | Method, system, device and storage medium for solving optimization problem |
| CN115293020A (en) * | 2021-12-24 | 2022-11-04 | 青岛大学 | A System Parameter Identification Method Integrating Adaptive Collaboration and Elite Guidance |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103794219A (en) * | 2014-01-24 | 2014-05-14 | 华南理工大学 | Vector quantization codebook generating method based on M codon splitting |
| CN103794219B (en) * | 2014-01-24 | 2016-10-05 | 华南理工大学 | A kind of Codebook of Vector Quantization based on the division of M code word generates method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101740029A (en) | 2010-06-16 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 2011-12-21
Termination date: 2019-12-16