
CN113823293A - A method and system for speaker recognition based on speech enhancement - Google Patents

A method and system for speaker recognition based on speech enhancement

Info

Publication number
CN113823293A
CN113823293A
Authority
CN
China
Prior art keywords
speaker
voice
speech
data
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111140239.5A
Other languages
Chinese (zh)
Other versions
CN113823293B (en)
Inventor
熊盛武 (Xiong Shengwu)
张欣冉 (Zhang Xinran)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202111140239.5A priority Critical patent/CN113823293B/en
Publication of CN113823293A publication Critical patent/CN113823293A/en
Application granted granted Critical
Publication of CN113823293B publication Critical patent/CN113823293B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L17/00 Speaker identification or verification techniques
                    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
                    • G10L17/04 Training, enrolment or model building
                    • G10L17/18 Artificial neural networks; Connectionist approaches
                • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L21/0208 Noise filtering
                            • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                                • G10L21/0232 Processing in the frequency domain
                • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
                    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
                        • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract


The present invention provides a speaker recognition method and system based on speech enhancement. The method includes the following steps: S1: collect a large amount of original speech data; S2: remove the interference noise and irrelevant speaker voices contained in the original speech data; S3: extract MFCC features and GFCC features, and fuse them to obtain the acoustic features of the speech; S4: build a speaker recognition model based on a convolutional neural network, and train it using the acoustic features extracted from the large amount of original speech data; S5: collect enrollment speech samples for registration, then obtain the speech data of the speaker to be recognized, apply the speech enhancement and feature extraction of S2 and S3, and input the result into the trained model to obtain the features of the speaker to be recognized; the identity of the speaker is determined from the similarity between these features and the registered speaker features. The invention can improve the recognition accuracy of a voiceprint recognition system.


Description

Speaker recognition method and system based on voice enhancement
Technical Field
The invention relates to the field of pattern recognition, in particular to a speaker recognition method and system based on voice enhancement.
Background
Voiceprint recognition is a technology that extracts a speaker's voice characteristics and speech content information and automatically verifies the speaker's identity. With the wide application of artificial intelligence in daily life, voiceprint recognition has gradually come to prominence, for example in voice-based authentication of personal intelligent devices (e.g., mobile phones, vehicles, and notebook computers), in securing bank transactions and remote payments, and in automatic identity tagging.
However, because background noise in real life is complex, the voice to be recognized always contains various noises, which degrade voiceprint recognition. How to overcome the noise in the voice to be recognized is therefore an urgent problem for applying voiceprint recognition technology in real life.
Disclosure of Invention
The invention provides a speaker recognition method and system based on voice enhancement, which are used for solving or at least partially solving the technical problem of poor voiceprint recognition effect in the prior art.
In order to solve the above technical problem, a first aspect of the present invention provides a speaker recognition method based on speech enhancement, including:
S1: collecting a large amount of original voice data;
S2: removing interference noise and irrelevant speaker voice contained in the original voice data to obtain enhanced voice data;
S3: extracting MFCC features and Gammatone filter-based cepstral coefficient (GFCC) features from the enhanced voice data, and fusing the MFCC features and the GFCC features to obtain the acoustic features of the voice;
S4: constructing a speaker recognition model based on a convolutional neural network, taking acoustic features extracted from a large amount of original voice data as training data, and training the speaker recognition model to obtain a trained model;
S5: collecting registered voice samples, performing voice enhancement and feature extraction by the methods of S2 and S3, inputting the trained model to obtain the depth feature of each registered voice sample, taking the depth feature as the speaker feature of each speaker, and storing the speaker feature; obtaining the voice data of the speaker to be recognized, performing voice enhancement and feature extraction by the methods of S2 and S3, inputting the trained model to obtain the features of the speaker to be recognized, and recognizing the identity of the speaker to be recognized according to the similarity between the features of the speaker to be recognized and the stored speaker features.
In one embodiment, step S1 is performed by recording raw voice data.
In one embodiment, step S2 uses a generative adversarial network (GAN) to remove the interference noise and irrelevant speaker voices contained in the original voice data, achieving end-to-end voice enhancement.
In one embodiment, step S3 includes:
S3.1: performing voice activity detection on the enhanced voice data to remove long silence segments;
S3.2: preprocessing the voice obtained in step S3.1;
S3.3: performing a fast Fourier transform on the preprocessed voice to obtain the spectrum of each frame, and taking the squared magnitude of the spectrum to obtain the power spectrum of the voice signal;
S3.4: passing the power spectrum obtained by the fast Fourier transform through a group of Mel-scale triangular filters to obtain the energy of each frame in the frequency band of each triangular filter;
S3.5: taking the logarithm of the energy of each frame in the frequency band of each triangular filter, and computing the logarithmic energy output by each filter bank;
S3.6: substituting the logarithmic energy into a discrete cosine transform to obtain the L-order Mel cepstrum coefficients;
S3.7: passing the power spectrum obtained by the fast Fourier transform through a Gammatone filter bank, then applying exponential compression and a discrete cosine transform to obtain the GFCC features of the voice signal;
S3.8: concatenating the MFCC features and the GFCC features of the voice signal to obtain the acoustic features of the voice signal.
In one embodiment, step S4 includes:
performing voice enhancement on a large amount of collected original voice data, extracting acoustic features from the voice data to be used as training data, and inputting the training data into a speaker recognition model for training to obtain a trained model;
in one embodiment, the step S5 of registering data including h voice samples of each speaker, and identifying the identity of the speaker to be identified according to the similarity between the characteristics of the speaker to be identified and the characteristics of the registered speaker comprises:
after voice enhancement and feature extraction are carried out on each voice sample in the registration data, the depth feature of each voice sample is extracted from the obtained acoustic feature through a convolutional neural network of a speaker recognition model;
averaging the h depth features of each speaker to serve as the speaker feature of each speaker, and storing the speaker feature in a database;
after voice data of a speaker to be recognized is subjected to voice enhancement and feature extraction, inputting a trained model to obtain the feature of the speaker to be recognized;
and calculating the cosine similarity cos between the features of the speaker to be identified and all speaker features stored in the database; if the maximum cosine similarity is greater than a set threshold, the corresponding speaker in the database is the recognized identity; otherwise, the speaker is rejected.
Based on the same inventive concept, the second aspect of the present invention provides a speaker recognition system based on speech enhancement, comprising:
the voice acquisition module is used for acquiring a large amount of original voice data;
the voice enhancement module is used for removing interference noise and irrelevant speaker voice contained in the original voice data to obtain enhanced voice data;
the voice feature extraction module is used for extracting MFCC features and Gammatone filter-based cepstral coefficient (GFCC) features from the enhanced voice data, and fusing the MFCC features and the GFCC features to obtain the acoustic features of the voice;
the model training module is used for constructing a speaker recognition model based on a convolutional neural network, taking acoustic features extracted from a large amount of original voice data as training data, and training the speaker recognition model to obtain a trained model;
the speaker recognition module is used for collecting the registered voice samples, inputting the trained model to obtain the depth characteristic of each registered voice sample after voice enhancement and characteristic extraction are carried out by adopting the methods of the voice enhancement module and the voice characteristic extraction module, taking the depth characteristic as the speaker characteristic of each speaker, and storing the speaker characteristic of each speaker; obtaining the voice data of the speaker to be recognized, inputting the trained model to obtain the characteristics of the speaker to be recognized after performing voice enhancement and characteristic extraction by adopting the methods of a voice enhancement module and a voice characteristic extraction module, and recognizing the identity of the speaker to be recognized according to the similarity between the characteristics of the speaker to be recognized and the stored characteristics of the speaker.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a speaker recognition method based on voice enhancement. An end-to-end voice enhancement method removes noise and irrelevant speaker voices from the voice; the more noise-robust GFCC features are used in the voiceprint recognition process and fused with MFCC features to obtain the acoustic features of the voice, which improves noise robustness; a speaker recognition model is constructed based on a convolutional neural network and trained with the training data; registered voice samples are collected, and the speaker feature of each registered speaker is extracted and stored; finally, the identity of the speaker to be recognized is determined from the similarity between its features and the stored speaker features. This solves the prior-art problem that noise contained in the voice degrades voiceprint recognition, and improves the recognition accuracy of voiceprint recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method for speaker recognition based on speech enhancement according to an embodiment of the present invention;
FIG. 2 is a flow chart of the voice feature MFCC extraction in the practice of the present invention;
FIG. 3 is a flow chart of the extraction of the GFCC speech feature in the practice of the present invention;
FIG. 4 is a block diagram of a speaker recognition system based on speech enhancement in accordance with an embodiment of the present invention.
Detailed Description
The invention aims to provide a speaker recognition method based on voice enhancement, solving the prior-art problem of poor recognition caused by noise in the voice to be recognized, which prevents accurate feature extraction.
The main concept of the invention is as follows:
firstly, a large amount of original voice data is collected, and the interference noise and irrelevant speaker voices contained in it are removed to obtain enhanced voice data; MFCC features and Gammatone filter-based cepstral coefficient (GFCC) features are extracted from the enhanced voice data and fused to obtain the acoustic features of the voice; a speaker recognition model is then constructed based on a convolutional neural network and trained with acoustic features extracted from the large amount of original voice data to obtain a trained model; registered voice samples are collected and, after voice enhancement and feature extraction by the methods of S2 and S3, input into the trained model to obtain the depth feature of each registered voice sample, which is taken as the speaker feature of each speaker and saved; finally, the voice data of the speaker to be recognized is obtained and, after voice enhancement and feature extraction by the methods of S2 and S3, input into the trained model to obtain the features of the speaker to be recognized, and the identity of the speaker is recognized according to the similarity between those features and the saved speaker features.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment of the invention provides a speaker recognition method based on voice enhancement, which comprises the following steps:
S1: collecting a large amount of original voice data;
S2: removing interference noise and irrelevant speaker voice contained in the original voice data to obtain enhanced voice data;
S3: extracting MFCC features and Gammatone filter-based cepstral coefficient (GFCC) features from the enhanced voice data, and fusing the MFCC features and the GFCC features to obtain the acoustic features of the voice;
S4: constructing a speaker recognition model based on a convolutional neural network, taking acoustic features extracted from a large amount of original voice data as training data, and training the speaker recognition model to obtain a trained model;
S5: collecting registered voice samples, performing voice enhancement and feature extraction by the methods of S2 and S3, inputting the trained model to obtain the depth feature of each registered voice sample as the speaker feature of each speaker, and storing the speaker feature of each speaker; obtaining the voice data of the speaker to be recognized, performing voice enhancement and feature extraction by the methods of S2 and S3, inputting the trained model to obtain the features of the speaker to be recognized, and recognizing the identity of the speaker to be recognized according to the similarity between the features of the speaker to be recognized and the stored speaker features.
Specifically, in the speaker recognition model training module, the network model uses a convolutional neural network, the classifier uses softmax, and the trained model is an offline model. The registered voice data includes a plurality of speakers, each speaker including h voice samples.
Please refer to fig. 1, which is a flowchart of a speaker recognition method based on speech enhancement according to an embodiment of the present invention.
In one embodiment, step S1 is performed by recording raw voice data.
In one embodiment, step S2 uses a generative adversarial network (GAN) to remove the interference noise and irrelevant speaker voices contained in the original voice data, achieving end-to-end voice enhancement.
The generator of the GAN is a fully convolutional encoder-decoder structure used to remove noise from the voice and generate a clean speech waveform; the discriminator, trained on clean and noisy speech waveforms, sets a threshold for judging whether a generated waveform is clean, and when the score of a generated waveform reaches that threshold, the waveform is considered sufficiently clean.
The invention implements an end-to-end voice enhancement method within a generative adversarial framework to remove interference noise and irrelevant speaker voices from the voice.
In a specific implementation, clean speech is mixed with common everyday noise at a random signal-to-noise ratio to obtain the noisy speech corresponding to each clean utterance; the clean speech dataset and the corresponding noisy speech dataset are then used to train a GAN that performs end-to-end voice enhancement.
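As a concrete illustration of this mixing step, the NumPy sketch below pairs a clean utterance with a noise clip at a randomly drawn SNR. The function name is illustrative, and the default -10 dB to 10 dB range is taken from the training example below.

```python
import numpy as np

def mix_at_random_snr(clean, noise, snr_db_range=(-10.0, 10.0)):
    """Mix a clean utterance with noise at a random SNR drawn from snr_db_range."""
    snr_db = np.random.uniform(*snr_db_range)
    # Tile or trim the noise so it matches the utterance length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```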
The voice enhancement training process is described in detail below, taking a dataset of 1000 clean utterances as an example.
The clean speech set and the everyday-noise set are mixed at a random signal-to-noise ratio (typically between -10 dB and 10 dB) to obtain a noisy speech set corresponding to the clean set. The noisy speech is passed through the generator network to obtain generated clean speech, and the discriminator network judges whether a given waveform is real clean speech: for generated clean speech the discriminator should output 0, and for real clean speech it should output 1. Parameters are then updated by back-propagating the error gradient obtained from the loss function, until the discriminator can no longer reliably distinguish generated clean speech from real clean speech; at that point, the generator is the trained voice enhancement network. Intuitively, the discriminator tells the generator how to adjust so that the clean speech it generates becomes more realistic.
In one embodiment, step S3 includes:
S3.1: performing voice activity detection on the enhanced voice data to remove long silence segments;
S3.2: preprocessing the voice obtained in step S3.1;
S3.3: performing a fast Fourier transform on the preprocessed voice to obtain the spectrum of each frame, and taking the squared magnitude of the spectrum to obtain the power spectrum of the voice signal;
S3.4: passing the power spectrum obtained by the fast Fourier transform through a group of Mel-scale triangular filters to obtain the energy of each frame in the frequency band of each triangular filter;
S3.5: taking the logarithm of the energy of each frame in the frequency band of each triangular filter, and computing the logarithmic energy output by each filter bank;
S3.6: substituting the logarithmic energy into a discrete cosine transform to obtain the L-order Mel cepstrum coefficients;
S3.7: passing the power spectrum obtained by the fast Fourier transform through a Gammatone filter bank, then applying exponential compression and a discrete cosine transform to obtain the GFCC features of the voice signal;
S3.8: concatenating the MFCC features and the GFCC features of the voice signal to obtain the acoustic features of the voice signal.
In a specific implementation process, the preprocessing includes pre-emphasis, framing, and windowing. The specific steps of feature extraction are as follows:
s301: performing voice activity endpoint detection (VAD) on the enhanced voice to eliminate a long mute period;
s302: the speech signal is pre-emphasized by passing it through a high-pass filter: h (z) ═ 1-. mu.z-1H (z) is a high-pass filter; μ pre-emphasis factor, typically taken as 0.97; z is a speech signal.
S303: the sampling frequency of the voice signal is 16KHz, 512 sampling points are firstly grouped into a frame, and the corresponding time length is 512/16000 × 1000 ═ 32 ms. An overlap region is formed between two adjacent frames, and the overlap region includes 256 sampling points, 1/2 of sampling point 512.
S304: let the framed signal be s(n), n = 0, 1, ..., N-1, where N is the frame length in samples; each frame is multiplied by a Hamming window:
x(n) = s(n) × W(n),
W(n) = 0.54 - 0.46 cos(2πn / (N-1)), 0 ≤ n ≤ N-1
where W(n) is the Hamming window and N is the frame length.
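The sketch below illustrates steps S302-S304 (pre-emphasis, 512-point framing with 256-point overlap, and Hamming windowing) in NumPy; the function names are illustrative, and the signal is assumed to be at least one frame long.

```python
import numpy as np

def preemphasize(signal, mu=0.97):
    """S302: high-pass pre-emphasis y(t) = x(t) - mu*x(t-1), i.e. H(z) = 1 - mu*z^-1."""
    return np.append(signal[0], signal[1:] - mu * signal[:-1])

def frame_and_window(signal, frame_len=512, hop=256):
    """S303-S304: split into 512-sample frames (32 ms at 16 kHz) with 256-sample
    overlap, then apply a Hamming window to each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)  # W(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
```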
S305: perform a fast Fourier transform on each framed and windowed signal x(n) to obtain the spectrum of each frame, and take the squared magnitude of the spectrum to obtain the power spectrum of the voice signal. The discrete Fourier transform of the speech signal (which is stored in discrete form) is:
X(k) = Σ_{n=0}^{T-1} x(n) e^(-j2πkn/T), 0 ≤ k ≤ T-1
where x(n) is the input speech signal and T is the number of Fourier transform points.
S306: pass the power spectrum |X(k)|² obtained from the fast Fourier transform through a set of Mel-scale triangular filters H_m(k), 0 ≤ m ≤ M, where M is the number of filters: multiply the power spectrum by each filter and accumulate, obtaining the energy of the frame in the frequency band of that filter:
E(m) = Σ_{k=0}^{T-1} |X(k)|² H_m(k), 0 ≤ m ≤ M
S307: taking the logarithm of the energy values, the logarithmic energy output by each filter bank is:
S(m) = ln( Σ_{k=0}^{T-1} |X(k)|² H_m(k) ), 0 ≤ m ≤ M
where T is the number of Fourier transform points, M is the number of filters, |X(k)|² is the power spectrum obtained in S305, and H_m(k), 0 ≤ m ≤ M, is the set of Mel-scale triangular filters.
S308: substituting the logarithmic energy of S307 into a discrete cosine transform gives the L-order Mel cepstrum coefficients (MFCC):
C(l) = Σ_{m=1}^{M} S(m) cos( πl(m - 0.5) / M ), l = 1, 2, ..., L
where L is the order of the MFCC coefficients, usually 12-16, and M is the number of triangular filters.
S309: pass the power spectrum obtained by the fast Fourier transform through a Gammatone filter bank, then apply exponential compression and a discrete cosine transform (DCT) to obtain the GFCC features of the voice signal.
S310: concatenate the MFCC features and the GFCC features of the voice signal to obtain the fused GMCC features of the voice signal.
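For the GFCC branch (S309) and the concatenation (S310), the sketch below builds an approximate 4th-order gammatone filterbank with ERB-spaced centre frequencies and applies cubic-root compression before the DCT. The frequency-domain filter formula is a common approximation, and reading the patent's "exponential compression" as a cubic-root nonlinearity is an assumption; both are illustrative rather than the patent's exact construction.

```python
import numpy as np
from scipy.fftpack import dct

def erb(f):
    """Equivalent rectangular bandwidth of an auditory filter centred at f Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_filterbank(n_filters=64, n_fft=512, sr=16000, fmin=50.0):
    """Sampled magnitude responses of 4th-order gammatone filters with
    ERB-spaced centre frequencies (an approximation, not the patent's exact bank)."""
    erb_rate = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    centers = inv_erb_rate(np.linspace(erb_rate(fmin), erb_rate(sr / 2.0), n_filters))
    freqs = np.linspace(0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, len(freqs)))
    for i, fc in enumerate(centers):
        b = 1.019 * erb(fc)
        fb[i] = (1.0 + ((freqs - fc) / b) ** 2) ** -2.0  # 4th-order magnitude approx.
    return fb

def gfcc(power_spectrum, order=13):
    """S309: gammatone filterbank energies -> cubic-root compression -> DCT.
    power_spectrum is the per-frame |X(k)|^2 array from step S305."""
    energies = power_spectrum @ gammatone_filterbank().T
    compressed = np.cbrt(energies)  # one common reading of 'exponential compression'
    return dct(compressed, type=2, axis=1, norm='ortho')[:, :order]

# S310: concatenate per-frame MFCC and GFCC vectors into the fused GMCC feature:
# gmcc = np.concatenate([mfcc_feats, gfcc_feats], axis=1)
```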
Fig. 2 and fig. 3 are a flow chart of voice feature MFCC extraction and a flow chart of voice feature GFCC extraction, respectively, in the implementation of the present invention.
In one embodiment, step S4 includes:
and performing voice enhancement on a large amount of collected original voice data, extracting acoustic features from the voice data to be used as training data, and inputting the training data into a speaker recognition model for training to obtain a trained model.
Specifically, model training is an offline process; the speaker recognition model is trained as follows:
collecting training samples in a recording mode; the collected voice samples pass through a voice preprocessing module (a voice enhancement module and a voice feature extraction module) to obtain the GMCC features of the voice; and taking the GMCC characteristics as the input of the model, and training the speaker recognition model by adopting a convolutional neural network structure and softmax classification.
The following describes the speaker recognition model training process by taking training a model containing 1000 speakers as an example.
Collect samples of each speaker, 100 samples per speaker; pass all voice samples through the voice preprocessing module (the voice enhancement module and the voice feature extraction module) to obtain the GMCC features of the voice, which serve as training data for the convolutional neural network (the speaker recognition model); randomly split all training data 5:1 into a training set and a validation set; train the convolutional network with the training set, and stop training once its recognition accuracy on the validation set remains essentially unchanged; otherwise, continue training. The trained convolutional network is the offline speaker recognition model.
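A minimal PyTorch sketch of this training loop is given below. The network depth, embedding size, optimizer, and batch size are illustrative assumptions; the patent fixes only the CNN-plus-softmax structure and the 5:1 random train/validation split.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

class SpeakerCNN(nn.Module):
    """Small CNN over (1, frames, coeffs) GMCC feature maps; layer sizes are illustrative."""
    def __init__(self, n_speakers=1000, emb_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Linear(64, emb_dim)             # deep feature used at enrollment
        self.classify = nn.Linear(emb_dim, n_speakers)  # softmax head, training only

    def forward(self, x, return_embedding=False):
        e = self.embed(self.conv(x))
        return e if return_embedding else self.classify(e)

def train(features, labels, n_speakers=1000, epochs=50):
    """5:1 random train/validation split; CrossEntropyLoss = softmax + NLL."""
    ds = TensorDataset(features, labels)
    n_val = len(ds) // 6
    train_ds, val_ds = random_split(ds, [len(ds) - n_val, n_val])
    model = SpeakerCNN(n_speakers)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for x, y in DataLoader(train_ds, batch_size=64, shuffle=True):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # In practice: stop once validation accuracy stabilizes, as the patent describes.
    return model
```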
In one embodiment, the enrollment data includes h voice samples for each speaker, and identifying the identity of the speaker to be recognized according to the similarity between its features and the registered speaker features in step S5 comprises:
after voice enhancement and feature extraction are carried out on each voice sample in the registration data, the depth feature of each voice sample is extracted from the obtained acoustic feature through a convolutional neural network of a speaker recognition model;
averaging the h depth features of each speaker to serve as the speaker feature of each speaker, and storing the speaker feature in a database;
after voice data of a speaker to be recognized is subjected to voice enhancement and feature extraction, inputting a trained model to obtain the feature of the speaker to be recognized;
and calculating the cosine similarity cos between the features of the speaker to be identified and all speaker features stored in the database; if the maximum cosine similarity is greater than a set threshold, the corresponding speaker in the database is the recognized identity; otherwise, the speaker is rejected.
Registration mode:
Collect registration samples by recording; pass the collected registration samples through the voice preprocessing module to obtain the GMCC features of the voice; extract the deep feature of each voice sample from the GMCC features through the offline speaker recognition model; generate the enrollment data (i.e., the speaker feature of each speaker) and store it in a database.
For example, samples of 10 speakers (20 voice samples per person) are taken; the voice preprocessing module processes all voice samples to obtain the GMCC features of the voice; the GMCC features of the 200 voice samples are passed through the offline speaker recognition model to obtain their deep features; the 20 deep features of each speaker are then averaged as that speaker's feature; and the 10 speaker features are saved in the database as speaker_0, speaker_1, ..., speaker_9.
Recognition mode:
Collect the sample to be identified by recording; obtain its GMCC features through the voice preprocessing module; pass the GMCC features through the offline speaker recognition model to obtain the deep feature of the sample, which serves as the features of the speaker to be identified; compute the cosine similarity cos between the features of the speaker to be identified and the features of all speakers in the database; if the maximum cosine similarity is greater than a set threshold, the corresponding speaker in the database is the identified speaker; otherwise, reject.
For example, a piece of voice data of a speaker is collected; its GMCC features are obtained through the voice preprocessing module; the GMCC features are passed through the offline speaker recognition model to obtain the deep feature of the voice data, which is taken as the speaker feature; the cosine similarities between this speaker feature and the 10 speaker features stored in the database are computed, giving cos_0, cos_1, ..., cos_9; the maximum cos_max of the 10 cosine similarities and the index speaker_x of the corresponding speaker are found; if the maximum exceeds the set threshold, the speaker is accepted as speaker_x; otherwise, the speaker is identified as unregistered.
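The enrollment and recognition logic just described can be sketched as follows, reusing the SpeakerCNN embedding from the training sketch above; the 0.7 threshold is an assumed value, since the patent leaves the threshold unspecified.

```python
import torch
import torch.nn.functional as F

def enroll(model, samples_per_speaker):
    """Average the deep features of each speaker's h enrollment samples."""
    db = {}
    model.eval()
    with torch.no_grad():
        for name, feats in samples_per_speaker.items():   # feats: (h, 1, frames, coeffs)
            emb = model(feats, return_embedding=True)     # (h, emb_dim) deep features
            db[name] = emb.mean(dim=0)                    # stored speaker feature
    return db

def recognize(model, feat, db, threshold=0.7):
    """Return the enrolled speaker with the highest cosine similarity, or reject.
    The threshold value is an assumption; the patent does not fix it."""
    model.eval()
    with torch.no_grad():
        query = model(feat.unsqueeze(0), return_embedding=True).squeeze(0)
    best_name, best_cos = None, -1.0
    for name, emb in db.items():
        c = F.cosine_similarity(query, emb, dim=0).item()
        if c > best_cos:
            best_name, best_cos = name, c
    return best_name if best_cos > threshold else "unregistered"
```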
In summary, the invention realizes a speaker recognition method based on speech enhancement through speech acquisition, speech enhancement, speech feature extraction, speaker model training, speaker registration and speaker recognition.
Compared with the prior art, the invention has the beneficial effects that:
the end-to-end voice enhancement method is used to remove noise in voice and irrelevant speaker voice, GFCC features with noise robustness are used in the voiceprint recognition process, the noise robustness of the whole system is improved, the problem of poor voiceprint recognition effect caused by noise contained in voice can be solved, and the recognition accuracy of the voiceprint recognition system is improved.
Example two
Based on the same inventive concept, the present embodiment provides a speaker recognition system based on speech enhancement, please refer to fig. 4, the system includes:
a voice collecting module 201, configured to collect a large amount of original voice data;
the voice enhancement module 202 is configured to remove interference noise and irrelevant speaker voice included in the original voice data to obtain enhanced voice data;
a speech feature extraction module 203, configured to extract MFCC features and cepstrum coefficient GFCC features based on a Gammatone filter from the enhanced speech data, and fuse the MFCC features and the GFCC features to obtain acoustic features of the speech;
the model training module 204 is used for constructing a speaker recognition model based on a convolutional neural network, taking acoustic features extracted from a large amount of original voice data as training data, and training the speaker recognition model to obtain a trained model;
the speaker recognition module 205 is used for registering and recognizing speakers, collecting registered voice samples, performing voice enhancement and feature extraction by adopting the methods of the voice enhancement module and the voice feature extraction module, inputting the trained model to obtain the depth feature of each registered voice sample, and storing the depth feature as the speaker feature of each speaker; obtaining the voice data of the speaker to be recognized, inputting the trained model to obtain the characteristics of the speaker to be recognized after performing voice enhancement and characteristic extraction by adopting the methods of a voice enhancement module and a voice characteristic extraction module, and recognizing the identity of the speaker to be recognized according to the similarity between the characteristics of the speaker to be recognized and the stored characteristics of the speaker.
Since the system described in the second embodiment of the present invention is a system adopted for implementing the speaker recognition method based on speech enhancement according to the first embodiment of the present invention, a person skilled in the art can understand the specific structure and deformation of the system based on the method described in the first embodiment of the present invention, and thus the details are not described herein again. All systems adopted by the method of the first embodiment of the present invention are within the intended protection scope of the present invention.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A speaker recognition method based on speech enhancement, characterized by comprising: S1: collecting a large amount of original speech data; S2: removing the interference noise and irrelevant speaker voices contained in the original speech data to obtain enhanced speech data; S3: extracting MFCC features and Gammatone filter-based cepstral coefficient (GFCC) features from the enhanced speech data, and fusing the MFCC features and the GFCC features to obtain the acoustic features of the speech; S4: constructing a speaker recognition model based on a convolutional neural network, and training it with acoustic features extracted from the large amount of original speech data as training data to obtain a trained model; S5: collecting enrollment speech samples, applying the speech enhancement and feature extraction of S2 and S3, inputting the results into the trained model to obtain the deep feature of each enrollment sample as the speaker feature of each speaker, and saving it; obtaining the speech data of the speaker to be recognized, applying the methods of S2 and S3, inputting the result into the trained model to obtain the features of the speaker to be recognized, and identifying the identity of the speaker to be recognized according to the similarity between those features and the saved speaker features.

2. The speaker recognition method of claim 1, characterized in that step S1 collects the original speech data by recording.

3. The speaker recognition method of claim 1, characterized in that step S2 uses a generative adversarial network to remove the interference noise and irrelevant speaker voices contained in the original speech data, realizing end-to-end speech enhancement.

4. The speaker recognition method of claim 1, characterized in that step S3 comprises: S3.1: performing voice activity detection on the enhanced speech data to remove long silence segments; S3.2: preprocessing the speech obtained in step S3.1; S3.3: performing a fast Fourier transform on the preprocessed speech to obtain the spectrum of each frame, and taking the squared magnitude of the spectrum to obtain the power spectrum of the speech signal; S3.4: passing the power spectrum obtained by the fast Fourier transform through a group of Mel-scale triangular filters to obtain the energy of each frame in the frequency band of each triangular filter; S3.5: taking the logarithm of those energies and computing the logarithmic energy output by each filter bank; S3.6: substituting the logarithmic energy into a discrete cosine transform to obtain the L-order Mel cepstrum coefficients; S3.7: passing the power spectrum obtained by the fast Fourier transform through a Gammatone filter bank, then applying exponential compression and a discrete cosine transform to obtain the GFCC features of the speech signal; S3.8: concatenating the MFCC features and the GFCC features of the speech signal to obtain the acoustic features of the speech signal.

5. The speaker recognition method of claim 1, characterized in that step S4 comprises: performing speech enhancement on the large amount of original speech data, extracting acoustic features from the enhanced data as training data, and inputting the training data into the speaker recognition model for training to obtain a trained model.

6. The speaker recognition method of claim 1, characterized in that the enrollment data comprises h speech samples for each speaker, and identifying the identity of the speaker to be recognized according to the similarity between its features and the registered speaker features in step S5 comprises: after speech enhancement and feature extraction of each speech sample in the enrollment data, extracting the deep feature of each sample from its acoustic features through the convolutional neural network of the speaker recognition model; averaging the h deep features of each speaker as that speaker's feature and storing it in a database; after speech enhancement and feature extraction of the speech data of the speaker to be recognized, inputting it into the trained model to obtain the features of the speaker to be recognized; computing the cosine similarity cos between the features of the speaker to be recognized and all speaker features stored in the database; if the maximum cosine similarity is greater than a set threshold, the corresponding speaker in the database is the recognized identity; otherwise, the speaker is rejected.

7. A speaker recognition system based on speech enhancement, characterized by comprising: a speech acquisition module for collecting a large amount of original speech data; a speech enhancement module for removing the interference noise and irrelevant speaker voices contained in the original speech data to obtain enhanced speech data; a speech feature extraction module for extracting MFCC features and Gammatone filter-based GFCC features from the enhanced speech data and fusing them to obtain the acoustic features of the speech; a model training module for constructing a speaker recognition model based on a convolutional neural network and training it with acoustic features extracted from the large amount of original speech data to obtain a trained model; and a speaker recognition module for collecting enrollment speech samples, applying the methods of the speech enhancement and feature extraction modules, inputting the results into the trained model to obtain the deep feature of each enrollment sample as each speaker's feature and saving it; and for obtaining the speech data of the speaker to be recognized, applying the methods of the speech enhancement and feature extraction modules, inputting the result into the trained model to obtain the features of the speaker to be recognized, and identifying the identity of the speaker to be recognized according to the similarity between those features and the saved speaker features.
CN202111140239.5A 2021-09-28 2021-09-28 Speaker recognition method and system based on voice enhancement Active CN113823293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111140239.5A CN113823293B (en) 2021-09-28 2021-09-28 Speaker recognition method and system based on voice enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111140239.5A CN113823293B (en) 2021-09-28 2021-09-28 Speaker recognition method and system based on voice enhancement

Publications (2)

Publication Number Publication Date
CN113823293A true CN113823293A (en) 2021-12-21
CN113823293B CN113823293B (en) 2024-04-26

Family

ID=78921390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111140239.5A Active CN113823293B (en) 2021-09-28 2021-09-28 Speaker recognition method and system based on voice enhancement

Country Status (1)

Country Link
CN (1) CN113823293B (en)


Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568472A (en) * 2010-12-15 2012-07-11 盛乐信息技术(上海)有限公司 Voice synthesis system with speaker selection and realization method thereof
CN104157290A (en) * 2014-08-19 2014-11-19 大连理工大学 A speaker recognition method based on deep learning
CN104835498A (en) * 2015-05-25 2015-08-12 重庆大学 Voiceprint identification method based on multi-type combination characteristic parameters
CA3179080A1 (en) * 2016-09-19 2018-03-22 Pindrop Security, Inc. Channel-compensated low-level features for speaker recognition
CN107464568A (en) * 2017-09-25 2017-12-12 四川长虹电器股份有限公司 Based on the unrelated method for distinguishing speek person of Three dimensional convolution neutral net text and system
CN110299142A (en) * 2018-05-14 2019-10-01 桂林远望智能通信科技有限公司 A kind of method for recognizing sound-groove and device based on the network integration
US20190043529A1 (en) * 2018-06-06 2019-02-07 Intel Corporation Speech classification of audio for wake on voice
CN109147810A (en) * 2018-09-30 2019-01-04 百度在线网络技术(北京)有限公司 Establish the method, apparatus, equipment and computer storage medium of speech enhan-cement network
CN109410974A (en) * 2018-10-23 2019-03-01 百度在线网络技术(北京)有限公司 Sound enhancement method, device, equipment and storage medium
CN109326302A (en) * 2018-11-14 2019-02-12 桂林电子科技大学 A speech enhancement method based on voiceprint comparison and generative adversarial network
CN109524020A (en) * 2018-11-20 2019-03-26 上海海事大学 A kind of speech enhan-cement processing method
CN109712628A (en) * 2019-03-15 2019-05-03 哈尔滨理工大学 A kind of voice de-noising method and audio recognition method based on RNN
CN110428849A (en) * 2019-07-30 2019-11-08 珠海亿智电子科技有限公司 A kind of sound enhancement method based on generation confrontation network
KR20210036692A (en) * 2019-09-26 2021-04-05 국방과학연구소 Method and apparatus for robust speech enhancement training using adversarial training
CN111785285A (en) * 2020-05-22 2020-10-16 南京邮电大学 Voiceprint recognition method for home multi-feature parameter fusion
CN112820301A (en) * 2021-03-15 2021-05-18 中国科学院声学研究所 Unsupervised cross-domain voiceprint recognition method fusing distribution alignment and counterstudy

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
杨瑶; 陈晓: "基于神经网络的说话人识别实验设计" (Experimental design of speaker recognition based on neural networks), 实验室研究与探索 (Research and Exploration in Laboratory), no. 09.
毛维; 曾庆宁; 龙超: "双微阵列语音增强算法在说话人识别中的应用" (Application of a dual-microphone-array speech enhancement algorithm in speaker recognition), 声学技术 (Technical Acoustics), no. 03, 15 June 2018.
王萌; 王福龙: "基于端点检测和高斯滤波器组的MFCC说话人识别" (MFCC speaker recognition based on endpoint detection and Gaussian filter banks), 计算机系统应用 (Computer Systems & Applications), no. 10, 15 October 2016.
蓝天; 彭川; 李森; 叶文政; 李萌; 惠国强; 吕忆蓝; 钱宇欣; 刘峤: "单声道语音降噪与去混响研究综述" (A survey of monaural speech denoising and dereverberation), 计算机研究与发展 (Journal of Computer Research and Development), no. 05.
蔡倩 et al.: "一种基于卷积神经网络的快速说话人识别方法" (A fast speaker recognition method based on convolutional neural networks), 无线电工程 (Radio Engineering), vol. 50, no. 6, p. 447.

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114360551A (en) * 2022-01-07 2022-04-15 浙江大学 Gender and language-based speaker identification method and system
CN114822559A (en) * 2022-04-29 2022-07-29 上海大学 A system and method for short-term speech speaker recognition based on deep learning
CN114974261A (en) * 2022-05-12 2022-08-30 厦门快商通科技股份有限公司 Voice verification method, terminal device and storage medium
CN115410581A (en) * 2022-09-01 2022-11-29 山东深博建筑工程有限公司 A voiceprint recognition method for intelligent access control
CN115602176A (en) * 2022-10-11 2023-01-13 武汉烽火普天信息技术有限公司(Cn) Method, system and storage medium for voiceprint recognition
WO2024082928A1 (en) * 2022-10-21 2024-04-25 腾讯科技(深圳)有限公司 Voice processing method and apparatus, and device and medium
CN115631743B (en) * 2022-12-07 2023-03-21 中诚华隆计算机技术有限公司 High-precision voice recognition method and system based on voice chip
CN115631743A (en) * 2022-12-07 2023-01-20 中诚华隆计算机技术有限公司 High-precision voice recognition method and system based on voice chip
CN116312570A (en) * 2023-03-15 2023-06-23 山东新一代信息产业技术研究院有限公司 Voice noise reduction method, device, equipment and medium based on voiceprint recognition
CN116434759A (en) * 2023-04-11 2023-07-14 兰州交通大学 Speaker identification method based on SRS-CL network
CN116434759B (en) * 2023-04-11 2024-03-01 兰州交通大学 A speaker recognition method based on SRS-CL network
CN116612765A (en) * 2023-05-24 2023-08-18 华东理工大学 Speaker Recognition System Based on Star Generative Adversarial Network
CN119400200A (en) * 2025-01-03 2025-02-07 中国空气动力研究与发展中心低速空气动力研究所 A method for determining the type of drone based on sound recognition

Also Published As

Publication number Publication date
CN113823293B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN113823293B (en) Speaker recognition method and system based on voice enhancement
Chauhan et al. Speaker recognition using LPC, MFCC, ZCR features with ANN and SVM classifier for large input database
CN108597496B (en) Voice generation method and device based on generation type countermeasure network
WO2021139425A1 (en) Voice activity detection method, apparatus and device, and storage medium
CN104835498B (en) Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
CN111554302A (en) Strategy adjusting method, device, terminal and storage medium based on voiceprint recognition
CN111524520A (en) Voiceprint recognition method based on error reverse propagation neural network
CN109473102A (en) A kind of robot secretary intelligent meeting recording method and system
CN111524524A (en) Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium
CN110782902A (en) Audio data determination method, apparatus, device and medium
Murugaiya et al. Probability enhanced entropy (PEE) novel feature for improved bird sound classification
CN110570871A (en) A voiceprint recognition method, device and equipment based on TristouNet
KR100779242B1 (en) Speaker Recognition Method in Integrated Speech Recognition / Speaker Recognition System
Joshi et al. Noise robust automatic speaker verification systems: review and analysis
CN113724692A (en) Voice print feature-based phone scene audio acquisition and anti-interference processing method
Palivela et al. Voice Authentication System
Nagakrishnan et al. Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
Sukor et al. Speaker identification system using MFCC procedure and noise reduction method
Islam et al. A Novel Approach for Text-Independent Speaker Identification Using Artificial Neural Network
Khetri et al. Automatic speech recognition for marathi isolated words
Imam et al. Speaker recognition using automated systems
Shofiyah et al. Voice recognition system for home security keys with Mel-frequency cepstral coefficient method and backpropagation artificial neural network
CN114360551A (en) Gender and language-based speaker identification method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载