
WO2018166112A1 - Identity verification method based on voiceprint recognition, electronic device and storage medium - Google Patents

Identity verification method based on voiceprint recognition, electronic device and storage medium

Info

Publication number
WO2018166112A1
WO2018166112A1 · PCT/CN2017/091361 · CN2017091361W
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
voice data
vector
feature vector
Gaussian mixture
Prior art date
Application number
PCT/CN2017/091361
Other languages
English (en)
Chinese (zh)
Inventor
王健宗
丁涵宇
郭卉
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2018166112A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/08 - Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 - Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 - Training, enrolment or model building
    • G10L17/06 - Decision making techniques; Pattern matching strategies
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band
    • G10L25/24 - Speech or voice analysis techniques in which the extracted parameters are the cepstrum

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method, an electronic device, and a storage medium for identity verification based on voiceprint recognition.
  • a first aspect of the present invention provides a voiceprint recognition based authentication method, and the voiceprint recognition based identity verification method includes:
  • a second aspect of the present invention provides an electronic device, including a processing device, a storage device, and a voiceprint recognition-based identity verification system, wherein the voiceprint recognition-based identity verification system is stored in the storage device and comprises at least one computer readable instruction, the at least one computer readable instruction being executable by the processing device to:
  • a third aspect of the invention provides a computer readable storage medium having stored thereon at least one computer readable instruction executable by a processing device to:
  • the invention has the beneficial effects that the background channel model generated by pre-training is obtained by mining and comparing a large amount of voice data; the model can accurately describe the background voiceprint features of the user's speech while retaining the user's own voiceprint features to the greatest extent.
  • because the background voiceprint features can be removed at recognition time and the intrinsic features of the user's voice extracted, the accuracy of user identity verification can be greatly improved, as can the efficiency of identity verification.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a voiceprint recognition based authentication method according to the present invention
  • FIG. 2 is a schematic flow chart of a preferred embodiment of a voiceprint recognition based identity verification method according to the present invention
  • FIG. 3 is a schematic diagram showing the refinement process of step S1 shown in FIG. 2;
  • FIG. 4 is a schematic diagram showing the refinement process of step S3 shown in FIG. 2;
  • FIG. 5 is a schematic structural diagram of a system for authenticating a voiceprint recognition based authentication method according to the present invention.
  • FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the voiceprint recognition based identity verification method according to the present invention.
  • the application environment diagram includes an electronic device 1 and a terminal device 2.
  • the electronic device 1 can perform data interaction with the terminal device 2 through a suitable technology such as a network or a near field communication technology.
  • the terminal device 2 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), a smart wearable device, and the like.
  • the electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance.
  • the electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 1 includes, but is not limited to, a storage device 11, a processing device 12, and a network interface 13 that are communicably connected to each other through a system bus. It should be noted that FIG. 1 only shows the electronic device 1 having the components 11-13, but it should be understood that not all illustrated components are required to be implemented, and more or fewer components may be implemented instead.
  • the storage device 11 includes a memory and at least one type of readable storage medium.
  • the memory provides a cache for the operation of the electronic device 1;
  • the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory, or the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be a storage device external to the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the storage device 11 is generally used to store the operating system installed on the electronic device 1 and various types of application software, such as the program code of the voiceprint recognition-based identity verification system 10 in an embodiment of the present application. Further, the storage device 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • Processing device 12 may, in some embodiments, include one or more microprocessors, microcontrollers, digital processors, and the like.
  • the processing device 12 is generally used to control the operation of the electronic device 1, for example, to perform control and processing related to data interaction or communication with the terminal device 2.
  • the processing device 12 is operative to run program code or process data stored in the storage device 11, such as a system 10 that runs voiceprint recognition based authentication.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the network interface 13 is mainly used to connect the electronic device 1 with one or more terminal devices 2, and establish a data transmission channel and a communication connection between the electronic device 1 and one or more terminal devices 2.
  • the voiceprint recognition based authentication system 10 includes at least one computer readable instruction stored in the storage device 11, the at least one computer readable instruction being executable by the processing device 12 to implement the voiceprint recognition based identity verification method of the embodiments of the present application. As described later, the at least one computer readable instruction can be classified into different logic modules depending on the functions implemented by its various parts.
  • when the voiceprint recognition based authentication system 10 is executed by the processing device 12, the following operations are performed: first, after receiving the voice data of the user to be authenticated, acquiring the voiceprint feature of the voice data and constructing a corresponding voiceprint feature vector based on the voiceprint feature; then inputting the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data; and finally calculating a spatial distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user, authenticating the user based on the distance, and generating a verification result.
  • FIG. 2 is a schematic flowchart of a preferred embodiment of the voiceprint recognition based identity verification method according to the present invention.
  • the voiceprint recognition based identity verification method of this embodiment is not limited to the steps shown in the flowchart; moreover, among the steps shown in the flowchart, some steps may be omitted, and the order between the steps may be changed.
  • the method for voiceprint recognition based authentication includes the following steps:
  • Step S1 after receiving the voice data of the user who performs the authentication, acquiring the voiceprint feature of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint feature;
  • in this embodiment, the voice data is collected by a voice collection device (for example, a microphone), and the voice collection device sends the collected voice data to the voiceprint recognition-based identity verification system.
  • when collecting voice data, environmental noise and interference from the voice collection equipment should be prevented as much as possible. The voice collection device should maintain an appropriate distance from the user, and a large voice collection device should be avoided as far as possible. The power supply preferably uses mains power and keeps the current stable; a sensor should be used when recording telephone voice.
  • the voice data may be denoised prior to extracting the voiceprint features in the voice data to further reduce interference.
  • the collected voice data is voice data of a preset data length, or voice data greater than a preset data length.
  • the voiceprint features may be of various types, such as a wide-band voiceprint, a narrow-band voiceprint, an amplitude voiceprint, etc.; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficient (MFCC) of the voice data.
  • the voiceprint feature of the voice data is composed into a feature data matrix, which is a voiceprint feature vector of the voice data.
  • Step S2 input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • in this embodiment, the background channel model is a Gaussian mixture model, and the background channel model is used to calculate the voiceprint feature vector to obtain a corresponding current voiceprint discrimination vector (i-vector).
  • the calculation process includes:
  • Loglike is a likelihood logarithmic matrix
  • E(X) is a mean matrix trained by a general background channel model
  • D(X) is a covariance matrix
  • X is a data matrix
  • X.^2 is the element-wise square of each value of the matrix.
  • To extract the current voiceprint discrimination vector, the first-order and second-order coefficients are calculated first. The first-order coefficients can be obtained by summing the probability matrix over its rows: Gamma_i = Σ_j loglikes_ji, where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in the j-th row and i-th column of the probability matrix.
  • The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
  • In this embodiment, the first-order and second-order coefficients are calculated in parallel, and the current voiceprint discrimination vector is then calculated from them.
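The two statistics described above can be sketched in a few lines of numpy. This is a minimal illustration under the assumption that the "probability matrix" is the per-frame posterior matrix of shape (n_frames, n_components) derived from the log-likelihoods; the function and argument names are illustrative, not taken from the patent.

```python
import numpy as np

def sufficient_stats(prob_matrix, feats):
    """First- and second-order coefficients as described above.
    prob_matrix: per-frame probability (posterior) matrix, shape
    (n_frames, n_components); feats: feature data matrix, shape
    (n_frames, feat_dim)."""
    # First-order coefficients: Gamma_i = sum over frames j of prob_matrix[j, i]
    gamma = prob_matrix.sum(axis=0)
    # Second-order coefficients: X = prob_matrix^T * feats
    X = prob_matrix.T @ feats
    return gamma, X
```

Note that the two statistics are independent of each other, which is what allows them to be computed in parallel as the text states.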
  • in an embodiment, the background channel model is a Gaussian mixture model, and before the step S2, the method further includes:
  • acquiring a preset number of voice data samples and the voiceprint feature corresponding to each sample, constructing a voiceprint feature vector for each voice data sample, and dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first ratio and a verification set of a second ratio, wherein the sum of the first ratio and the second ratio is less than or equal to 1;
  • the Gaussian mixture model is trained by using the voiceprint feature vector in the training set, and after the training is completed, the accuracy of the trained Gaussian mixture model is verified by using the verification set;
  • if the accuracy is greater than a preset threshold, the model training ends and the trained Gaussian mixture model is used as the background channel model of the step S2; if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and the model is re-trained based on the increased voice data samples.
  • in this embodiment, the likelihood corresponding to an extracted D-dimensional voiceprint feature can be expressed by K Gaussian components: P(x) = Σ_{k=1..K} w_k p(x|k), where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability density of the k-th Gaussian component, and K is the number of Gaussian components.
  • the parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i is its mean, and Σ_i is its covariance matrix.
  • Training the Gaussian mixture model can use the unsupervised EM (Expectation-Maximization) algorithm. After training is completed, the weight vector, constant vector, N covariance matrices, and the means multiplied by the covariance matrices of the Gaussian mixture model are obtained; together these constitute a trained Gaussian mixture model.
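The EM training described above can be sketched with a minimal diagonal-covariance mixture in numpy. This is an illustrative sketch only: the component count K, iteration count, and initialisation scheme are assumptions, and the train/verification split and accuracy-threshold check described in the text would wrap around a call to this function.

```python
import numpy as np

def train_gmm_em(X, K=4, n_iter=50, seed=0):
    """Minimal diagonal-covariance Gaussian mixture model trained with
    the unsupervised EM algorithm, sketching the background channel
    model training described above."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialise: means from random samples, shared data variance, uniform weights.
    mu = X[rng.choice(N, K, replace=False)].copy()
    var = np.tile(X.var(axis=0), (K, 1))
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from per-component Gaussian log densities.
        log_p = (np.log(w)[None, :]
                 - 0.5 * np.log(2 * np.pi * var).sum(axis=1)[None, :]
                 - 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2
                          / var[None, :, :]).sum(axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters {w_i, mu_i, Sigma_i}.
        Nk = resp.sum(axis=0) + 1e-10
        w = Nk / Nk.sum()
        mu = (resp.T @ X) / Nk[:, None]
        var = (resp.T @ X ** 2) / Nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```

In the flow described above, the model would be fitted on the training-set feature vectors and its accuracy evaluated on the verification set before being accepted as the background channel model.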
  • Step S3 Calculate a spatial distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  • the spatial distance of this embodiment is a cosine distance, which measures the difference between two individuals by the cosine of the angle between their two vectors in vector space.
  • the standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance, and the standard voiceprint discriminant vector carries the identifier information of the corresponding user when stored, which can accurately represent the identity of the corresponding user.
  • the stored voiceprint discrimination vector is obtained according to the identification information provided by the user before calculating the spatial distance.
  • if the distance is within the preset threshold, the verification passes; otherwise, the verification fails.
  • the background channel model generated by pre-training in this embodiment is obtained by mining and comparing a large amount of voice data. The model can accurately depict the background voiceprint features of the user's speech while maximally retaining the user's own voiceprint features, and can remove the background features at recognition time to extract the intrinsic features of the user's voice; this greatly improves the accuracy of user identity verification and the efficiency of identity verification. The scheme makes full use of the voiceprint features related to the vocal organs in the human voice; since this voiceprint feature does not need to be restricted to a fixed text, it offers greater flexibility in the recognition and verification process.
  • the foregoing step S1 includes:
  • Step S11: performing pre-emphasis, framing, and windowing on the voice data. The pre-emphasis makes the high-frequency characteristics of the voice data more prominent; after framing, each frame signal is regarded as a stationary signal. Because the start and end of each frame of the speech data are discontinuous, framing introduces a deviation from the original speech; therefore, the voice data also needs to be windowed.
  • Step S12: performing a Fourier transform on each windowed frame to obtain the corresponding spectrum;
  • Step S13: inputting the spectrum into a Mel filter bank to output a Mel spectrum;
  • Step S14 performing cepstrum analysis on the Mel spectrum to obtain a Mel frequency cepstral coefficient MFCC, and composing a corresponding voiceprint feature vector based on the Mel frequency cepstral coefficient MFCC.
  • the cepstrum analysis consists, for example, of taking the logarithm and applying an inverse transform. The inverse transform is generally implemented via the discrete cosine transform (DCT), and the second to thirteenth coefficients after the DCT are taken as the MFCC coefficients.
  • the Mel frequency cepstrum coefficient MFCC is the voiceprint feature of the speech data of this frame, and the Mel frequency cepstral coefficient MFCC of each frame is composed into a feature data matrix, which is the voiceprint feature vector of the speech data.
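Steps S11 to S14 can be sketched end to end in numpy. This is a minimal illustration, not the patent's implementation: the frame length, hop (half-frame repeating area), filter count, and FFT size are illustrative assumptions.

```python
import numpy as np

def mfcc(signal, sr=16000, frame_len=400, hop=200, n_filters=26, n_ceps=12):
    """Sketch of steps S11-S14: pre-emphasis, framing, windowing,
    FFT, Mel filter bank, log, and DCT; returns one MFCC row per frame."""
    # Step S11: pre-emphasis (high-pass filtering to boost high frequencies),
    # framing with a half-frame repeating area, then Hamming windowing to
    # smooth the discontinuous frame edges.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Step S12: Fourier transform of each windowed frame -> power spectrum.
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Step S13: triangular Mel filter bank, then log -> Mel spectrum.
    high_mel = 2595 * np.log10(1 + (sr / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, high_mel, n_filters + 2) / 2595) - 1)
    bins = np.floor((nfft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    mel_spec = np.log(power @ fbank.T + 1e-10)
    # Step S14: cepstral analysis via a DCT-II basis; keep the 2nd-13th
    # coefficients as the MFCC (feature data matrix).
    n_mel = mel_spec.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_mel),
                                    2 * np.arange(n_mel) + 1) / (2 * n_mel))
    return (mel_spec @ basis.T)[:, 1:1 + n_ceps]
```

The returned matrix, one row of MFCC coefficients per frame, corresponds to the "feature data matrix" that the text calls the voiceprint feature vector.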
  • Step S3 includes:
  • Step S31: calculating the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, based on cos θ = (A · B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector;
  • Step S32 if the cosine distance is less than or equal to a preset distance threshold, generating verification pass information
  • Step S33 If the cosine distance is greater than a preset distance threshold, generate information that the verification fails.
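Steps S31 to S33 can be sketched as follows. Treating the "cosine distance" as 1 − cos θ and the threshold value 0.3 are assumptions for illustration; the patent only specifies that a distance at or below the preset threshold passes.

```python
import numpy as np

def verify(current_vec, standard_vec, threshold=0.3):
    """Sketch of steps S31-S33: cosine-based comparison of the current
    voiceprint discrimination vector against the stored standard one."""
    # Step S31: cosine of the angle between the two vectors.
    cos_sim = current_vec @ standard_vec / (
        np.linalg.norm(current_vec) * np.linalg.norm(standard_vec))
    distance = 1.0 - cos_sim
    # Steps S32 / S33: compare against the preset distance threshold.
    return "pass" if distance <= threshold else "fail"
```

A vector compared against itself yields distance 0 and passes, while an unrelated (orthogonal) vector yields distance 1 and fails.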
  • the foregoing step S3 is replaced by: calculating a spatial distance between the current voiceprint discrimination vector and each of the pre-stored standard voiceprint discrimination vectors, and obtaining The smallest spatial distance, the user is authenticated based on the minimum spatial distance, and a verification result is generated.
  • the difference between this embodiment and the foregoing embodiment is that the identification information of the user is not carried when the standard voiceprint discrimination vectors are stored. When verifying the identity of the user, the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector is calculated and the minimum spatial distance is obtained; if the minimum spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the above embodiment), the verification passes, otherwise the verification fails.
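The 1-to-N variant described above can be sketched as follows. The cosine-distance form and the threshold value are assumptions carried over from the earlier embodiment; the function name and the returned best-match index are illustrative.

```python
import numpy as np

def verify_against_all(current_vec, stored_vecs, threshold=0.3):
    """Sketch of the alternative step S3: compare the current voiceprint
    discrimination vector against every stored standard vector and keep
    the minimum spatial distance."""
    cur = current_vec / np.linalg.norm(current_vec)
    stored = stored_vecs / np.linalg.norm(stored_vecs, axis=1, keepdims=True)
    distances = 1.0 - stored @ cur      # cosine distance to each stored vector
    best = int(np.argmin(distances))
    # Verification passes only if the minimum distance beats the threshold.
    return bool(distances[best] < threshold), best
```

Because no user identifier is stored, the index of the minimum-distance vector also tells which enrolled voiceprint the speaker matched.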
  • FIG. 5 is a functional block diagram of a preferred embodiment of the voiceprint recognition based authentication system 10 of the present invention.
  • the voiceprint recognition based authentication system 10 can be partitioned into one or more modules, one or more modules being stored in a memory and executed by one or more processors to complete this invention.
  • in this embodiment, the voiceprint recognition based authentication system 10 can be divided into a first acquiring module 101, a constructing module 102, and a first verification module 103, described below.
  • a "module" as used herein refers to a series of computer program instruction segments capable of performing a specific function, which is more suitable than a program for describing the execution process of the voiceprint recognition based authentication system 10 in the electronic device, wherein:
  • the first obtaining module 101 is configured to acquire a voiceprint feature of the voice data after receiving the voice data of the user who performs the identity verification, and construct a corresponding voiceprint feature vector based on the voiceprint feature;
  • in this embodiment, the voice data is collected by a voice collection device (for example, a microphone), and the voice collection device sends the collected voice data to the voiceprint recognition-based identity verification system.
  • when collecting voice data, environmental noise and interference from the voice collection equipment should be prevented as much as possible. The voice collection device should maintain an appropriate distance from the user, and a large voice collection device should be avoided as far as possible. The power supply preferably uses mains power and keeps the current stable; a sensor should be used when recording telephone voice.
  • the voice data may be denoised prior to extracting the voiceprint features in the voice data to further reduce interference.
  • in this embodiment, the collected voice data is voice data of a preset data length, or voice data greater than a preset data length.
  • the voiceprint features may be of various types, such as a wide-band voiceprint, a narrow-band voiceprint, an amplitude voiceprint, etc.; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficient (MFCC) of the voice data.
  • the voiceprint feature of the voice data is composed into a feature data matrix, which is a voiceprint feature vector of the voice data.
  • the constructing module 102 is configured to input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • in this embodiment, the background channel model is a Gaussian mixture model, and the background channel model is used to calculate the voiceprint feature vector to obtain a corresponding current voiceprint discrimination vector (i-vector).
  • the calculation process includes:
  • Loglike is a likelihood logarithmic matrix
  • E(X) is a mean matrix trained by a general background channel model
  • D(X) is a covariance matrix
  • X is a data matrix
  • X.^2 is the element-wise square of each value of the matrix.
  • To extract the current voiceprint discrimination vector, the first-order and second-order coefficients are calculated first. The first-order coefficients can be obtained by summing the probability matrix over its rows: Gamma_i = Σ_j loglikes_ji, where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in the j-th row and i-th column of the probability matrix.
  • The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
  • In this embodiment, the first-order and second-order coefficients are calculated in parallel, and the current voiceprint discrimination vector is then calculated from them.
  • in an embodiment, the background channel model is a Gaussian mixture model, and the voiceprint recognition based authentication system further comprises:
  • a second acquiring module, configured to acquire a preset number of voice data samples, acquire the voiceprint feature corresponding to each voice data sample, and construct a voiceprint feature vector corresponding to each voice data sample based on that voiceprint feature;
  • a dividing module configured to divide the voiceprint feature vector corresponding to each voice data sample into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
  • a training module is configured to train the Gaussian mixture model by using the voiceprint feature vector in the training set, and after the training is completed, verify the accuracy of the trained Gaussian mixture model by using the verification set;
  • a processing module, configured to end the model training and use the trained Gaussian mixture model as the background channel model if the accuracy is greater than a preset threshold, or, if the accuracy is less than or equal to the preset threshold, to increase the number of voice data samples and re-train based on the increased voice data samples.
  • in this embodiment, the likelihood corresponding to an extracted D-dimensional voiceprint feature can be expressed by K Gaussian components: P(x) = Σ_{k=1..K} w_k p(x|k), where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability density of the k-th Gaussian component, and K is the number of Gaussian components.
  • the parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i is its mean, and Σ_i is its covariance matrix.
  • Training the Gaussian mixture model can use the unsupervised EM (Expectation-Maximization) algorithm. After training is completed, the weight vector, constant vector, N covariance matrices, and the means multiplied by the covariance matrices of the Gaussian mixture model are obtained; together these constitute a trained Gaussian mixture model.
  • the first verification module 103 is configured to calculate a spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  • the spatial distance of the present embodiment is a cosine distance
  • the cosine distance measures the difference between two individuals using the cosine of the angle between two vectors in vector space.
  • the standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance, and the standard voiceprint discriminant vector carries the identifier information of the corresponding user when stored, which can accurately represent the identity of the corresponding user.
  • the stored voiceprint discrimination vector is obtained according to the identification information provided by the user before calculating the spatial distance.
  • if the distance is within the preset threshold, the verification passes; otherwise, the verification fails.
  • the first acquiring module 101 is specifically configured to: perform pre-emphasis, framing, and windowing processing on the voice data; perform a Fourier transform on each window to obtain the corresponding spectrum; input the spectrum into a Mel filter bank to output a Mel spectrum; and perform cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC), composing a corresponding voiceprint feature vector based on the MFCC.
  • the pre-emphasis processing is actually a high-pass filtering process, filtering out the low-frequency data, so that the high-frequency characteristics in the speech data are more prominent.
  • the framing divides the voice data into N frames of short-time signals; there is a repeating area between adjacent frames, generally 1/2 of the length of each frame, and after framing each frame signal is regarded as a stationary signal. Because the start and end of each frame are discontinuous, the voice data also needs to be windowed after framing.
  • the cepstrum analysis is, for example, taking logarithm and inverse transform.
  • the inverse transform is generally implemented by DCT discrete cosine transform, and the second to thirteenth coefficients after DCT are taken as MFCC coefficients.
  • the Mel frequency cepstrum coefficient MFCC is the voiceprint feature of the speech data of this frame, and the Mel frequency cepstral coefficient MFCC of each frame is composed into a feature data matrix, which is the voiceprint feature vector of the speech data.
  • the first verification module 103 is specifically configured to calculate the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, based on cos θ = (A · B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; if the cosine distance is less than or equal to a preset distance threshold, information that the verification passes is generated; if the cosine distance is greater than the preset distance threshold, information that the verification fails is generated.
  • in another embodiment, the first verification module is replaced by a second verification module, configured to calculate the spatial distances between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vectors, obtain the minimum spatial distance, authenticate the user based on the minimum spatial distance, and generate a verification result.
  • in this embodiment, the standard voiceprint discrimination vectors are stored without the identification information of the user; when verifying the identity of the user, the spatial distances between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vectors are calculated and the minimum spatial distance is obtained. If the minimum spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the above embodiment), the verification passes; otherwise, the verification fails.
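The MFCC extraction pipeline described for module 101 above (pre-emphasis, framing with half-frame overlap, windowing, Fourier transform, Mel filter bank, logarithm, DCT, coefficients 2–13) can be sketched in Python. This is a minimal illustration, not the patent's implementation; the pre-emphasis coefficient 0.97, the frame length of 256 samples, the 26 Mel filters, and the 8 kHz sample rate are illustrative assumptions not specified by the source.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_features(signal, sample_rate=8000, frame_len=256, n_filters=26, n_ceps=12):
    # 1. Pre-emphasis: high-pass filter y[n] = x[n] - 0.97 * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Framing: adjacent frames overlap by 1/2 of the frame length
    step = frame_len // 2
    n_frames = 1 + (len(emphasized) - frame_len) // step
    frames = np.stack([emphasized[i * step:i * step + frame_len]
                       for i in range(n_frames)])

    # 3. Windowing: a Hamming window reduces leakage at frame boundaries
    frames = frames * np.hamming(frame_len)

    # 4. Fourier transform -> power spectrum of each frame
    n_fft = frame_len
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 5. Mel filter bank: triangular filters evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    mel_spectrum = np.dot(power, fbank.T)
    mel_spectrum = np.where(mel_spectrum == 0, np.finfo(float).eps, mel_spectrum)

    # 6. Cepstral analysis: logarithm, then inverse transform via DCT-II;
    #    keep the 2nd..13th coefficients as the MFCCs of each frame
    log_mel = np.log(mel_spectrum)
    n = log_mel.shape[1]
    dct_basis = np.cos(np.pi * np.outer(np.arange(n), 2 * np.arange(n) + 1) / (2 * n))
    cepstra = np.dot(log_mel, dct_basis.T)
    return cepstra[:, 1:n_ceps + 1]  # frames x 12 feature data matrix
```

The returned matrix (one row of 12 coefficients per frame) corresponds to the "feature data matrix" that the patent treats as the voiceprint feature vector of the speech data.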
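The two verification strategies above (cosine distance against a known user's stored vector, and minimum distance against all stored vectors when no user ID accompanies them) can be sketched as follows. The threshold value 0.4 is an arbitrary placeholder, and reusing the cosine distance as the "spatial distance" of the second embodiment is an assumption; the patent does not fix either choice.

```python
import numpy as np

def cosine_distance(a, b):
    # cosine distance = 1 - cos(theta); 0 means the vectors point the same way
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def verify_identity(current_vec, standard_vec, threshold=0.4):
    # First embodiment: pass when the distance to the user's stored
    # standard vector does not exceed the preset threshold
    return cosine_distance(current_vec, standard_vec) <= threshold

def verify_by_minimum_distance(current_vec, stored_vecs, threshold=0.4):
    # Second embodiment: stored vectors carry no user ID, so compare
    # against every one and accept if the closest is within the threshold
    distances = [cosine_distance(current_vec, v) for v in stored_vecs]
    return min(distances) <= threshold
```

In both variants the decision reduces to a single scalar comparison, which is why the patent can swap the per-user check for the minimum-distance search without changing the pass/fail logic.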

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Collating Specific Patterns (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to an identity verification method based on voiceprint recognition, an electronic device, and a storage medium. The identity verification method based on voiceprint recognition comprises: after receiving voice data from a user undergoing identity verification, obtaining a voiceprint feature of the voice data and constructing a corresponding voiceprint feature vector based on the voiceprint feature; inputting the voiceprint feature vector into a background channel model generated by training in advance, to construct a current voiceprint discrimination vector corresponding to the voice data; and calculating the distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user, performing identity verification of the user based on that distance, and generating a verification result. The present invention can improve the accuracy and efficiency of user identity verification.
PCT/CN2017/091361 2017-03-13 2017-06-30 Procédé de vérification d'identité basé sur la reconnaissance d'empreinte vocale, dispositif électronique et support de stockage WO2018166112A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710147695.XA CN107068154A (zh) 2017-03-13 2017-03-13 基于声纹识别的身份验证的方法及系统
CN201710147695.X 2017-03-13

Publications (1)

Publication Number Publication Date
WO2018166112A1 true WO2018166112A1 (fr) 2018-09-20

Family

ID=59622093

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/091361 WO2018166112A1 (fr) 2017-03-13 2017-06-30 Procédé de vérification d'identité basé sur la reconnaissance d'empreinte vocale, dispositif électronique et support de stockage
PCT/CN2017/105031 WO2018166187A1 (fr) 2017-03-13 2017-09-30 Serveur, procédé et système de vérification d'identité, et support d'informations lisible par ordinateur

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/105031 WO2018166187A1 (fr) 2017-03-13 2017-09-30 Serveur, procédé et système de vérification d'identité, et support d'informations lisible par ordinateur

Country Status (3)

Country Link
CN (2) CN107068154A (fr)
TW (1) TWI641965B (fr)
WO (2) WO2018166112A1 (fr)

Families Citing this family (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107068154A (zh) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 基于声纹识别的身份验证的方法及系统
CN107527620B (zh) * 2017-07-25 2019-03-26 平安科技(深圳)有限公司 电子装置、身份验证的方法及计算机可读存储介质
CN107993071A (zh) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 电子装置、基于声纹的身份验证方法及存储介质
CN108172230A (zh) * 2018-01-03 2018-06-15 平安科技(深圳)有限公司 基于声纹识别模型的声纹注册方法、终端装置及存储介质
CN108269575B (zh) * 2018-01-12 2021-11-02 平安科技(深圳)有限公司 更新声纹数据的语音识别方法、终端装置及存储介质
CN108154371A (zh) * 2018-01-12 2018-06-12 平安科技(深圳)有限公司 电子装置、身份验证的方法及存储介质
CN108091326B (zh) * 2018-02-11 2021-08-06 张晓雷 一种基于线性回归的声纹识别方法及系统
CN108694952B (zh) * 2018-04-09 2020-04-28 平安科技(深圳)有限公司 电子装置、身份验证的方法及存储介质
CN108768654B (zh) * 2018-04-09 2020-04-21 平安科技(深圳)有限公司 基于声纹识别的身份验证方法、服务器及存储介质
CN108766444B (zh) * 2018-04-09 2020-11-03 平安科技(深圳)有限公司 用户身份验证方法、服务器及存储介质
CN108447489B (zh) * 2018-04-17 2020-05-22 清华大学 一种带反馈的连续声纹认证方法及系统
CN108806695A (zh) * 2018-04-17 2018-11-13 平安科技(深圳)有限公司 自更新的反欺诈方法、装置、计算机设备和存储介质
CN108630208B (zh) * 2018-05-14 2020-10-27 平安科技(深圳)有限公司 服务器、基于声纹的身份验证方法及存储介质
CN108650266B (zh) * 2018-05-14 2020-02-18 平安科技(深圳)有限公司 服务器、声纹验证的方法及存储介质
CN108834138B (zh) * 2018-05-25 2022-05-24 北京国联视讯信息技术股份有限公司 一种基于声纹数据的配网方法及系统
CN109101801B (zh) * 2018-07-12 2021-04-27 北京百度网讯科技有限公司 用于身份认证的方法、装置、设备和计算机可读存储介质
CN109087647B (zh) * 2018-08-03 2023-06-13 平安科技(深圳)有限公司 声纹识别处理方法、装置、电子设备及存储介质
CN109256138B (zh) * 2018-08-13 2023-07-07 平安科技(深圳)有限公司 身份验证方法、终端设备及计算机可读存储介质
CN110867189A (zh) * 2018-08-28 2020-03-06 北京京东尚科信息技术有限公司 一种登陆方法和装置
CN110880325B (zh) * 2018-09-05 2022-06-28 华为技术有限公司 身份识别方法及设备
CN109450850B (zh) * 2018-09-26 2022-10-11 深圳壹账通智能科技有限公司 身份验证方法、装置、计算机设备和存储介质
CN109377662A (zh) * 2018-09-29 2019-02-22 途客易达(天津)网络科技有限公司 充电桩控制方法、装置以及电子设备
CN109257362A (zh) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 声纹验证的方法、装置、计算机设备以及存储介质
CN109378002B (zh) * 2018-10-11 2024-05-07 平安科技(深圳)有限公司 声纹验证的方法、装置、计算机设备和存储介质
CN109147797B (zh) * 2018-10-18 2024-05-07 平安科技(深圳)有限公司 基于声纹识别的客服方法、装置、计算机设备及存储介质
CN109524026B (zh) * 2018-10-26 2022-04-26 北京网众共创科技有限公司 提示音的确定方法及装置、存储介质、电子装置
CN109473105A (zh) * 2018-10-26 2019-03-15 平安科技(深圳)有限公司 与文本无关的声纹验证方法、装置和计算机设备
CN109360573A (zh) * 2018-11-13 2019-02-19 平安科技(深圳)有限公司 牲畜声纹识别方法、装置、终端设备及计算机存储介质
CN109493873A (zh) * 2018-11-13 2019-03-19 平安科技(深圳)有限公司 牲畜声纹识别方法、装置、终端设备及计算机存储介质
CN109636630A (zh) * 2018-12-07 2019-04-16 泰康保险集团股份有限公司 检测代投保行为的方法、装置、介质及电子设备
CN110046910B (zh) * 2018-12-13 2023-04-14 蚂蚁金服(杭州)网络技术有限公司 判断客户通过电子支付平台所进行交易合法性的方法和设备
CN109816508A (zh) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 基于大数据的用户身份认证方法、装置、计算机设备
CN109473108A (zh) * 2018-12-15 2019-03-15 深圳壹账通智能科技有限公司 基于声纹识别的身份验证方法、装置、设备及存储介质
CN109545226B (zh) * 2019-01-04 2022-11-22 平安科技(深圳)有限公司 一种语音识别方法、设备及计算机可读存储介质
CN110322888B (zh) * 2019-05-21 2023-05-30 平安科技(深圳)有限公司 信用卡解锁方法、装置、设备及计算机可读存储介质
CN110298150B (zh) * 2019-05-29 2021-11-26 上海拍拍贷金融信息服务有限公司 一种基于语音识别的身份验证方法及系统
CN110334603A (zh) * 2019-06-06 2019-10-15 视联动力信息技术股份有限公司 身份验证系统
CN110473569A (zh) * 2019-09-11 2019-11-19 苏州思必驰信息科技有限公司 检测说话人欺骗攻击的优化方法及系统
CN110738998A (zh) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 基于语音的个人信用评估方法、装置、终端及存储介质
CN110971755B (zh) * 2019-11-18 2021-04-20 武汉大学 一种基于pin码和压力码的双因素身份认证方法
CN111402899B (zh) * 2020-03-25 2023-10-13 中国工商银行股份有限公司 跨信道声纹识别方法及装置
CN111597531A (zh) * 2020-04-07 2020-08-28 北京捷通华声科技股份有限公司 一种身份认证方法、装置、电子设备及可读存储介质
CN111625704A (zh) * 2020-05-11 2020-09-04 镇江纵陌阡横信息科技有限公司 一种用户意图与数据协同的非个性化推荐算法模型
CN111710340A (zh) * 2020-06-05 2020-09-25 深圳市卡牛科技有限公司 基于语音识别用户身份的方法、装置、服务器及存储介质
CN111613230A (zh) * 2020-06-24 2020-09-01 泰康保险集团股份有限公司 声纹验证方法、装置、设备及存储介质
CN111899566A (zh) * 2020-08-11 2020-11-06 南京畅淼科技有限责任公司 一种基于ais的船舶交通管理系统
CN112289324B (zh) * 2020-10-27 2024-05-10 湖南华威金安企业管理有限公司 声纹身份识别的方法、装置和电子设备
CN112669841B (zh) * 2020-12-18 2024-07-02 平安科技(深圳)有限公司 多语种语音的生成模型的训练方法、装置及计算机设备
CN112835737A (zh) * 2021-03-30 2021-05-25 中国工商银行股份有限公司 系统异常处理方法及装置
CN112802481A (zh) * 2021-04-06 2021-05-14 北京远鉴信息技术有限公司 声纹验证方法、声纹识别模型训练方法、装置及设备
CN113421575B (zh) * 2021-06-30 2024-02-06 平安科技(深圳)有限公司 声纹识别方法、装置、设备及存储介质
CN113889120A (zh) * 2021-09-28 2022-01-04 北京百度网讯科技有限公司 声纹特征提取方法、装置、电子设备及存储介质
CN114780787A (zh) * 2022-04-01 2022-07-22 杭州半云科技有限公司 声纹检索方法、身份验证方法、身份注册方法和装置
CN114826709B (zh) * 2022-04-15 2024-07-09 马上消费金融股份有限公司 身份认证和声学环境检测方法、系统、电子设备及介质
CN114782141A (zh) * 2022-05-07 2022-07-22 中国工商银行股份有限公司 基于5g消息的产品交互方法、装置、电子设备及介质
CN119132307A (zh) * 2024-09-02 2024-12-13 蔚泓智能信息科技(上海)有限公司 一种基于语音识别和nlp的实验室研发智能自动数据记录系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1403953A (zh) * 2002-09-06 2003-03-19 浙江大学 掌上声纹验证系统
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 一种声纹识别方法
US20120330663A1 (en) * 2011-06-27 2012-12-27 Hon Hai Precision Industry Co., Ltd. Identity authentication system and method
US20130225128A1 (en) * 2012-02-24 2013-08-29 Agnitio Sl System and method for speaker recognition on mobile devices

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8645137B2 (en) * 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
TWI234762B (en) * 2003-12-22 2005-06-21 Top Dihital Co Ltd Voiceprint identification system for e-commerce
US7447633B2 (en) * 2004-11-22 2008-11-04 International Business Machines Corporation Method and apparatus for training a text independent speaker recognition system using speech data with text labels
US7536304B2 (en) * 2005-05-27 2009-05-19 Porticus, Inc. Method and system for bio-metric voice print authentication
CN101064043A (zh) * 2006-04-29 2007-10-31 上海优浪信息科技有限公司 一种声纹门禁系统及其应用
CN102479511A (zh) * 2010-11-23 2012-05-30 盛乐信息技术(上海)有限公司 一种大规模声纹认证方法及其系统
CN102238190B (zh) * 2011-08-01 2013-12-11 安徽科大讯飞信息科技股份有限公司 身份认证方法及系统
CN102509547B (zh) * 2011-12-29 2013-06-19 辽宁工业大学 基于矢量量化的声纹识别方法及系统
CN102695112A (zh) * 2012-06-09 2012-09-26 九江妙士酷实业有限公司 汽车播放器及其音量控制方法
CN102916815A (zh) * 2012-11-07 2013-02-06 华为终端有限公司 用户身份验证的方法和装置
CN103220286B (zh) * 2013-04-10 2015-02-25 郑方 基于动态密码语音的身份确认系统及方法
CN104427076A (zh) * 2013-08-30 2015-03-18 中兴通讯股份有限公司 呼叫系统自动应答的识别方法及装置
CN103632504A (zh) * 2013-12-17 2014-03-12 上海电机学院 图书馆安静提醒器
CN104765996B (zh) * 2014-01-06 2018-04-27 讯飞智元信息科技有限公司 声纹密码认证方法及系统
CN104978507B (zh) * 2014-04-14 2019-02-01 中国石油化工集团公司 一种基于声纹识别的智能测井评价专家系统身份认证方法
CN105100911A (zh) * 2014-05-06 2015-11-25 夏普株式会社 智能多媒体系统和方法
CN103986725A (zh) * 2014-05-29 2014-08-13 中国农业银行股份有限公司 一种客户端、服务器端以及身份认证系统和方法
CN104157301A (zh) * 2014-07-25 2014-11-19 广州三星通信技术研究有限公司 删除语音信息空白片段的方法、装置和终端
CN105321293A (zh) * 2014-09-18 2016-02-10 广东小天才科技有限公司 一种危险检测提醒方法及智能设备
CN104485102A (zh) * 2014-12-23 2015-04-01 智慧眼(湖南)科技发展有限公司 声纹识别方法和装置
CN104751845A (zh) * 2015-03-31 2015-07-01 江苏久祥汽车电器集团有限公司 一种用于智能机器人的声音识别方法及系统
CN104992708B (zh) * 2015-05-11 2018-07-24 国家计算机网络与信息安全管理中心 短时特定音频检测模型生成与检测方法
CN105096955B (zh) * 2015-09-06 2019-02-01 广东外语外贸大学 一种基于模型生长聚类的说话人快速识别方法及系统
CN105575394A (zh) * 2016-01-04 2016-05-11 北京时代瑞朗科技有限公司 基于全局变化空间及深度学习混合建模的声纹识别方法
CN105611461B (zh) * 2016-01-04 2019-12-17 浙江宇视科技有限公司 前端设备语音应用系统的噪声抑制方法、装置及系统
CN106971717A (zh) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 机器人与网络服务器协作处理的语音识别方法、装置
CN105869645B (zh) * 2016-03-25 2019-04-12 腾讯科技(深圳)有限公司 语音数据处理方法和装置
CN106210323B (zh) * 2016-07-13 2019-09-24 Oppo广东移动通信有限公司 一种语音播放方法及终端设备
CN106169295B (zh) * 2016-07-15 2019-03-01 腾讯科技(深圳)有限公司 身份向量生成方法和装置
CN106373576B (zh) * 2016-09-07 2020-07-21 Tcl科技集团股份有限公司 一种基于vq和svm算法的说话人确认方法及其系统
CN107068154A (zh) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 基于声纹识别的身份验证的方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1403953A (zh) * 2002-09-06 2003-03-19 浙江大学 掌上声纹验证系统
US20120330663A1 (en) * 2011-06-27 2012-12-27 Hon Hai Precision Industry Co., Ltd. Identity authentication system and method
US20130225128A1 (en) * 2012-02-24 2013-08-29 Agnitio Sl System and method for speaker recognition on mobile devices
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 一种声纹识别方法

Also Published As

Publication number Publication date
WO2018166187A1 (fr) 2018-09-20
CN107517207A (zh) 2017-12-26
TWI641965B (zh) 2018-11-21
CN107068154A (zh) 2017-08-18
TW201833810A (zh) 2018-09-16

Similar Documents

Publication Publication Date Title
WO2018166112A1 (fr) Procédé de vérification d'identité basé sur la reconnaissance d'empreinte vocale, dispositif électronique et support de stockage
WO2019100606A1 (fr) Dispositif électronique, procédé et système de vérification d'identité à base d'empreinte vocale, et support de stockage
CN107527620B (zh) 电子装置、身份验证的方法及计算机可读存储介质
WO2020181824A1 (fr) Procédé, appareil et dispositif de reconnaissance d'empreinte vocale et support de stockage lisible par ordinateur
TWI527023B (zh) A voiceprint recognition method and apparatus
WO2019136912A1 (fr) Dispositif électronique, procédé et système d'authentification d'identité, et support de stockage
CN107886943A (zh) 一种声纹识别方法及装置
CN109243487B (zh) 一种归一化常q倒谱特征的回放语音检测方法
CN102324232A (zh) 基于高斯混合模型的声纹识别方法及系统
CN114722812B (zh) 一种多模态深度学习模型脆弱性的分析方法和系统
US9947323B2 (en) Synthetic oversampling to enhance speaker identification or verification
Wang et al. ASVspoof 5: Crowdsourced speech data, deepfakes, and adversarial attacks at scale
Duraibi Voice biometric identity authentication model for IoT devices
CN113223536A (zh) 声纹识别方法、装置及终端设备
WO2019218515A1 (fr) Serveur, procédé d'authentification d'identité par empreinte vocale, et support de stockage
Biagetti et al. Speaker identification with short sequences of speech frames
WO2019196305A1 (fr) Dispositif électronique, procédé de vérification d'identité, et support de stockage
WO2019218512A1 (fr) Serveur, procédé de vérification d'empreinte vocale et support d'informations
CN113436633B (zh) 说话人识别方法、装置、计算机设备及存储介质
Guo et al. Voice-based user-device physical unclonable functions for mobile device authentication
CN113035230B (zh) 认证模型的训练方法、装置及电子设备
Lin et al. A multiscale chaotic feature extraction method for speaker recognition
Wang et al. Recording source identification using device universal background model
Nagakrishnan et al. Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models
CN111310836A (zh) 一种基于声谱图的声纹识别集成模型的防御方法及防御装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17900320

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09/12/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17900320

Country of ref document: EP

Kind code of ref document: A1
