WO2018157789A1 - Speech recognition method, computer, storage medium and electronic apparatus - Google Patents
Speech recognition method, computer, storage medium and electronic apparatus
- Publication number: WO2018157789A1
- Application: PCT/CN2018/077413
- Authority: WIPO (PCT)
- Prior art keywords
- keyword
- related information
- word
- target
- text
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Definitions
- the embodiments of the present invention relate to the field of computers, and in particular, to a voice recognition method, a computer, a storage medium, and an electronic device.
- A general speech recognition system includes at least two parts: an acoustic model and a language model.
- The acoustic model mainly converts the input speech signal into top-N candidate word sequences, and the language model determines the probability that each candidate sequence forms a well-formed sentence.
- A common language model is typically built from statistics over a massive amount (hundreds of billions of tokens or more) of natural text, estimating the probability of occurrence of fragments of different lengths (n-grams).
- A disadvantage of the related art is that a common language model often suffers from recognition bias toward the data it was trained on.
- For example, in a voice transcription scenario, specifically a professional academic lecture, a user needs the speech recognition system to automatically produce a conference record.
- If the speech mentions niche, professional vocabulary, such as the name of a certain protein,
- the general speech recognition system may fail to recognize it correctly, because the language model may not cover a corpus in that domain.
- An embodiment of the present invention provides a speech recognition method, a computer, a storage medium, and an electronic device, so that a keyword, or words related to the keyword, appearing in the recognition text obtained from the next received voice signal
- can be recognized accurately, improving the accuracy of speech recognition.
- A first aspect of the embodiments of the present invention provides a speech recognition method, which may include: acquiring a keyword in the preliminary identification text,
- where the keyword is a word of the key information in the preliminary identification text
- and the preliminary identification text is text recognized according to the voice signal;
- acquiring target related information according to the keyword, where the target related information is context information corresponding to the keyword;
- and establishing a target language library based on the target related information.
- a second aspect of the embodiments of the present invention provides a computer, which may include:
- a first acquiring module configured to acquire a keyword in the preliminary identification text, where the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is a text identified according to the voice signal;
- a second obtaining module configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
- an establishing module configured to establish a target language library based on the target related information.
- a third aspect of the embodiments of the present invention provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the above method at runtime.
- a fourth aspect of the embodiments of the present invention provides an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above method by the computer program.
- In the embodiments of the present invention, the keyword in the preliminary identification text is acquired, where the keyword is a word of the key information in the preliminary identification text and the preliminary identification text is text recognized according to the voice signal; target related information is acquired according to the keyword, where the target related information is context information corresponding to the keyword; and the target language library is established according to the target related information.
- Thus the computer can receive a voice signal, obtain the corresponding preliminary identification text from the voice signal, obtain the keyword from that text, then obtain the target related information according to the keyword, and establish the target language
- library from that information; the target language library is then used to recognize the keyword or related topic words in the recognition text obtained from the next received voice signal, so that they are recognized accurately, improving the accuracy of speech recognition.
- FIG. 1 is a schematic diagram of a general speech recognition system according to an embodiment of the present invention.
- FIG. 2 is a schematic diagram of a frame of a voice recognition system applied in an embodiment of the present invention
- FIG. 3 is a schematic diagram of an embodiment of a method for voice recognition according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of voice recognition in an embodiment of the present invention.
- FIG. 5 is a schematic diagram of an embodiment of a computer according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of another embodiment of a computer according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of another embodiment of a computer in an embodiment of the present invention.
- An embodiment of the present invention provides a speech recognition method and a computer, so that in the recognition text obtained from the next received voice signal, the keyword or topic words related to the keyword are recognized accurately, improving the accuracy of speech recognition.
- Natural language is simply human language.
- Natural language processing (NLP) is the processing of human language, mainly by computer. It is an interdisciplinary subject between computer science and linguistics. Common research tasks include: word segmentation / word breaking (WB); information extraction (IE); relation extraction (RE); named entity recognition (NER); part-of-speech tagging (POS); coreference resolution; parsing; word sense disambiguation (WSD); speech recognition; text to speech (TTS); machine translation (MT); automatic summarization; question answering; natural language understanding; optical character recognition (OCR); and information retrieval (IR).
- The language model is the model used to compute the probability of a sentence, namely P(W1, W2, …, Wk).
- For example, the input pinyin string nixianzaiganshenme may correspond to outputs of various forms, such as "what are you doing now" or "what are you going to do in Xi'an". To decide which is the correct conversion result, we use
- the language model: it tells us that the probability of the former is greater than that of the latter, so converting to the former is more reasonable in most cases.
- FIG. 1 is a schematic diagram of a general speech recognition system, which includes at least two parts: an acoustic model and a language model.
- the acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent, and the like.
- the language model is a knowledge representation of a sequence of words.
- Common language models often suffer from recognition bias. For example, in a voice transcription scenario, specifically a professional academic lecture, the speech recognition system is required to automatically produce a conference record. If the speech mentions niche, professional vocabulary (such as the name of a certain protein), the general speech recognition system often cannot recognize it correctly, because the language model may not cover a corpus in that domain.
- Such niche, professional vocabulary forms a long-tail corpus that cannot be enumerated exhaustively (or for which exhaustive enumeration is not worth the cost).
- FIG. 2 is a schematic diagram of the framework of a voice recognition system according to an embodiment of the present invention, including voice input, the speech recognition system, the preliminary identification text, extracted topic words, summaries of the top whole-network search results, context training, and the domain language model.
- The problem solved by the embodiment of the present invention is how to add domain-related long-tail corpus to a general language model system in real time. In voice transcription, professional vocabulary in a given field may go unrecognized by the general speech recognition system the first few times it appears; but as the transcription proceeds, the system automatically and accurately extracts language model corpus for the corresponding field in real time, so that when the speaker mentions the vocabulary again, that vocabulary, and even vocabulary related to it, can be recognized effectively.
- FIG. 3 it is a schematic diagram of an embodiment of a voice recognition method according to an embodiment of the present invention, including:
- the computer receives the voice signal.
- The voice signal here may be speech from a participant in a conference scene, or a speech signal received by the computer in a series of scenes such as an academic report, a research topic report, or a lecture on professional knowledge.
- The acoustic model can be trained with LSTM+CTC to obtain the mapping from acoustic features to phonemes;
- the language model (LM) can be trained with the SRILM toolkit to obtain 3-gram and 4-gram models, which provide
- the mapping from words to word sequences and sentences; the dictionary is the set of phoneme indices corresponding to the words, i.e., the mapping between words and phonemes.
- The so-called acoustic model classifies (decodes) the acoustic features of speech into phoneme or word units; the language model then decodes the words into a complete sentence.
- the language model represents the probability of a sequence of words.
- The chain rule is used to decompose the probability of a sentence into the product of the conditional probabilities of each word in it.
- Let W be composed of w1, w2, …, wn; then P(W) can be decomposed (by the conditional probability and multiplication formulas) as:
- P(W) = P(w1)P(w2|w1)P(w3|w1,w2)…P(wn|w1,…,wn-1)
- The ternary grammar (trigram) conditions each word only on the preceding two words:
- P(W) = P(w1)P(w2|w1)P(w3|w1,w2)…P(wn|wn-2,wn-1)
- The probabilities can then be estimated by counting, over the whole corpus, the occurrences of adjacent word sequences and of single words, and substituting the counts.
- The n-gram here is based on word sequences, so an n-gram is equivalent to a phrase; some phrases will never have appeared in the corpus yet still have some probability of occurring, so the algorithm needs to assign probabilities to these uncommon phrases (smoothing).
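As a minimal illustration (not from the patent itself), a bigram model with add-one (Laplace) smoothing shows how unseen phrases still receive a nonzero probability; all names here are made up for the sketch:

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in corpus:
        toks = ["<s>"] + sent          # sentence-start marker
        vocab.update(toks)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams, vocab

def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """Add-one smoothed P(w | w_prev): unseen word pairs still get probability mass."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))
```

Seen bigrams score higher than unseen ones, but no phrase is assigned probability zero.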
- The task of the acoustic model is to compute P(X|W), the probability that the speech signal X is produced given the text W
- (Bayes' rule is ultimately used, which is where P(X|W) comes in).
- This requires another module, called the dictionary. In the eesen source code, for example, the dictionary of corresponding phonemes is built first during the data preparation stage. Its function is to convert a word string into a phoneme string; the language model is then obtained and the acoustic model trained (using LSTM+CTC, where LSTM is long short-term memory). With the help of the dictionary, the acoustic model knows which sounds correspond to a given string of text.
- The computer may further acquire the corresponding preliminary identification text according to the voice signal; that is, the corresponding preliminary identification text can be obtained from the speech signal through the acoustic model and the general language model in the speech recognition system.
- That is, a speech signal is input, and the system finds the sequence of words (or characters) that has the highest degree of matching with the speech signal.
- the degree of matching is generally expressed by probability.
- In Bayes' rule, P(W|X) = P(X|W)P(W)/P(X); since P(X) does not depend on W, it can be regarded as a constant and the denominator can be omitted.
- input pinyin nixianzaiganshenme may correspond to many conversion results.
- The possible conversion results are shown in Figure 4 (only some of the word nodes are drawn), where the nodes form a lattice.
- any path from the beginning to the end is a possible conversion result, and the process of selecting the most appropriate result from many conversion results requires a decoding algorithm.
- A commonly used decoding algorithm is the Viterbi algorithm, which uses dynamic programming to quickly determine the most appropriate path.
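A minimal sketch of Viterbi decoding over such a lattice follows; the toy lattice, the `trans` scores, and all names are illustrative assumptions, not the patent's implementation:

```python
import math

def viterbi(lattice, trans):
    """Find the highest-probability path through a word lattice.

    lattice: list of candidate-word lists, one per position.
    trans(prev, word): transition probability, a toy stand-in for the
    language model score; "<s>" marks the sentence start.
    """
    # best[w] = (log-prob of best path ending in w, that path)
    best = {"<s>": (0.0, [])}
    for candidates in lattice:
        nxt = {}
        for w in candidates:
            score, path = max(
                (lp + math.log(trans(prev, w)), path)
                for prev, (lp, path) in best.items()
            )
            nxt[w] = (score, path + [w])
        best = nxt
    return max(best.values())[1]

# Toy lattice echoing the "doing now" vs. "Xi'an" ambiguity:
lattice = [["you"], ["now", "Xi'an"], ["do"]]
def trans(prev, w):
    return {("you", "now"): 0.8, ("you", "Xi'an"): 0.2}.get((prev, w), 0.5)
```

Dynamic programming keeps only the best path into each node, so the search is linear in the lattice length rather than exponential.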
- the keyword is a word of the key information in the preliminary identification text
- the preliminary identification text is a text that is identified according to the voice signal
- The keyword in the preliminary identification text may be obtained, where the keyword is a word of the key information in the preliminary identification text.
- the subject words can be understood as the core theme of the meeting discussion, or the focus of the meeting report.
- Obtaining the keyword in the preliminary identification text may include: obtaining the keyword from the preliminary identification text according to Formula 1, where Formula 1 is:
- Score(i) = tf(i) * idf(i), where
- i refers to the i-th word in the preliminary identification text,
- tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and
- idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
- idf(i) is obtained by offline statistics over a large amount of text data; Formula 2 for calculating idf(i) is:
- idf(i) = log(total number of documents in the corpus / (number of documents containing word i + 1))
- the extraction of the topic words can also be based on the TextRank algorithm, that is, the task of keyword extraction is to automatically extract a number of meaningful words or phrases from a given text.
- The TextRank algorithm uses relationships between local words (a co-occurrence window) to rank candidate keywords, extracting them directly from the text itself. The main steps are as follows:
- Construct a candidate keyword graph G = (V, E), where V is the node set consisting of the candidate keywords generated in step (2); edges are then constructed between nodes by co-occurrence: there is an edge between two nodes only when their corresponding words co-occur within a window of length K, where K is the window size, i.e., at most K words;
- The top T words are obtained from step (5) and marked in the original text; if they form adjacent phrases, they are combined into multi-word keywords. For example, if the text contains the sentence "Matlab code for plotting ambiguity function" and both "Matlab" and "code" are candidate keywords, the combination "Matlab code" is added to the keyword sequence.
- Parsing the TextRank source code: read the text, segment it into words, compute co-occurrence statistics over the segmentation result (the window defaults to 5), and save them into the co-occurrence map cm.
- the extraction of the keyword includes, but is not limited to, the several implementations mentioned above, and the number of keywords obtained by the computer is not limited.
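The TextRank steps above can be sketched as follows; this is a simplified illustration, where the window size, damping factor d = 0.85, and iteration count are conventional defaults, not values fixed by the patent:

```python
from collections import defaultdict

def textrank_keywords(words, window=5, d=0.85, iters=50, top=3):
    """Score words by PageRank over a co-occurrence graph with window size K."""
    # Build edges: two distinct words are linked if they co-occur in the window.
    graph = defaultdict(set)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            if words[i] != words[j]:
                graph[words[i]].add(words[j])
                graph[words[j]].add(words[i])
    # Iterate the PageRank update until the scores settle.
    score = {w: 1.0 for w in graph}
    for _ in range(iters):
        score = {
            w: (1 - d) + d * sum(score[u] / len(graph[u]) for u in graph[w])
            for w in graph
        }
    return sorted(score, key=score.get, reverse=True)[:top]
```

Words that co-occur with many other well-connected words accumulate the highest scores and are returned as keywords.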
- target related information is context information corresponding to the keyword
- the target related information may be acquired according to the keyword, and the target related information is context information corresponding to the keyword.
- Obtaining target related information based on the keyword can include:
- obtaining the target related information through the whole network search according to the keyword may include: searching for the corresponding search result according to the keyword through the whole network; matching the search result to determine the target related information.
- target related information herein can be simply understood as the title of each article on the page displayed by the search keyword, or the abstract of each article, or all the contents of each article. However, it should be understood that if the target-related information is all the content of each article, the resources consumed are relatively large.
- The "filter" obtained here may or may not be the correct phrase; the computer can automatically obtain the content of pages related to "filter" through some search software.
- the target language library is established according to the target related information.
- the method may include: training according to the target related information, and establishing a target language library.
- The target language library here is the domain language model built around the core topic of this conference or this report. That is, a series of operations such as filtering, cleaning, and domain matching can be performed on the target related information, and the domain language model is then obtained by training.
- For example, training can be performed on the summary information in the hyperlinked content about high-pass filters, low-pass filters, band-pass filters, and band-stop filters to obtain a language model for the filter field, and this filter-field
- language model is added to the general language model shown in FIG. 2 above.
- When filter-related information next appears, the speech recognition system, having previously added the filter-field language model, can recognize it accurately, specifically distinguishing whether it is a high-pass filter, a low-pass filter, a band-pass filter, or a band-stop filter.
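The patent does not specify how the domain model is combined with the general model; linear interpolation of the two probability estimates is a common choice and is shown here purely as an assumption (the weight `lam` is a tuning choice, not from the patent):

```python
def interpolated_prob(ngram, general_lm, domain_lm, lam=0.3):
    """Mix a domain language model into a general one by linear interpolation.

    general_lm and domain_lm map an n-gram tuple to its probability;
    lam is the weight given to the domain model (an assumed tuning value).
    """
    return (1 - lam) * general_lm.get(ngram, 0.0) + lam * domain_lm.get(ngram, 0.0)

# A rare phrase like "band stop" is near-invisible to the general model
# but prominent in the freshly trained filter-domain model:
general = {("band", "stop"): 0.0001}
domain = {("band", "stop"): 0.05}
```

The interpolated probability of the domain phrase rises well above its general-model estimate, which is exactly the effect the embodiment aims for.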
- the Ngram statistical language model can be used in the embodiment of the present invention.
- The n-gram model is also called the (n-1)-order Markov model. It makes a finite-history assumption: the probability of the current word depends only on the previous n-1 words. So P(S) can be approximated as:
- P(S) ≈ ∏ P(wi | wi-n+1, …, wi-1)
- For n = 1, 2, 3, the n-gram models are called unigram, bigram, and trigram language models, respectively.
- The parameters of the n-gram model are the conditional probabilities P(wi | wi-n+1, …, wi-1).
- Here n = 3 is selected as an example, that is, the trigram language model.
- The n-gram language model, that is, the above P(S) model, generally uses maximum likelihood estimation for parameter estimation.
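A minimal sketch of maximum likelihood estimation for the trigram parameters (illustrative only): P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2).

```python
from collections import Counter

def trigram_mle(corpus):
    """Maximum likelihood trigram estimates from tokenized sentences:
    P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    tri, bi = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent      # pad so the first word has a history
        bi.update(zip(toks, toks[1:]))
        tri.update(zip(toks, toks[1:], toks[2:]))
    def p(w1, w2, w3):
        return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return p
```

Unsmoothed MLE assigns zero to unseen trigrams, which is why the smoothing mentioned earlier is needed in practice.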
- In the embodiments of the present invention, the keyword in the preliminary identification text is acquired, where the keyword is a word of the key information in the preliminary identification text and the preliminary identification text is text recognized according to the voice signal; target related information is acquired according to the keyword, where the target related information is context information corresponding to the keyword; and the target language library is established according to the target related information.
- Thus the computer can receive a voice signal, obtain the corresponding preliminary identification text from the voice signal, obtain the keyword from that text, then obtain the target related information according to the keyword, and establish the target language
- library from that information; the target language library is then used to recognize the keyword or related topic words in the recognition text obtained from the next received voice signal, so that they are recognized accurately, improving the accuracy of speech recognition.
- Each word is assigned an "importance" weight.
- The most common words ("of", "is", "at") are given the least weight, fairly common words ("China") are given less weight, and less common words ("bees", "farming") are given greater weight.
- This weight is called “Inverse Document Frequency” (IDF) and its size is inversely proportional to the common degree of a word.
- The first step is to calculate the word frequency:
- Word Frequency (TF) = number of occurrences of a word in an article
- Considering that articles differ in length, the word frequency is normalized:
- Word Frequency (TF) = number of occurrences of a word in the article / total number of words in the article
- or: Word Frequency (TF) = number of occurrences of a word in the article / number of occurrences of the most frequent word in the article
- The second step is to calculate the inverse document frequency:
- Inverse Document Frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the word + 1))
- The third step is to calculate TF-IDF:
- TF-IDF = Word Frequency (TF) × Inverse Document Frequency (IDF)
- TF-IDF is proportional to the number of occurrences of a word in the document and inversely proportional to the number of occurrences of the word in the entire language. Therefore, the algorithm for automatically extracting keywords is very clear, that is, the TF-IDF value of each word of the document is calculated, and then arranged in descending order, taking the top words.
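The three steps above can be sketched directly; this is a minimal illustration, and the tokenized corpus here is made up:

```python
import math
from collections import Counter

def tf_idf(doc_tokens, corpus):
    """Score each word of a document by TF * IDF, following the three steps above."""
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for w, c in tf.items():
        df = sum(1 for d in corpus if w in d)      # documents containing the word
        idf = math.log(n_docs / (df + 1))          # inverse document frequency
        scores[w] = (c / total) * idf              # normalized TF times IDF
    return scores
```

A rare, topical word like "bee" outscores a ubiquitous word like "the", whose IDF is driven down by appearing in every document.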
- The TF-IDF algorithm can be used in many other places. For example, in information retrieval, for each document the TF-IDF of each search word ("China", "bee", "farming") can be calculated separately and summed to obtain the TF-IDF of the whole document for that query. The document with the highest value is the one most relevant to the search terms.
- Here, "bees" and "farming" are searched as the topic words, context information about "bees" and "farming" is obtained, and the retrieved context information is used for training to obtain a language model in the field of bee farming.
- FIG. 5 it is a schematic diagram of an embodiment of a computer in the embodiment of the present invention, including:
- The first obtaining module 501 is configured to acquire a keyword in the preliminary identification text, where the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is text recognized according to the voice signal;
- the second obtaining module 502 is configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
- the establishing module 503 is configured to establish a target language library according to the target related information.
- the first obtaining module 501 is specifically configured to obtain the keyword according to the formula 1 according to the preliminary identification text, where the formula 1 is:
- Score(i) = tf(i)*idf(i), where i refers to the i-th word in the preliminary identification text, tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
- FIG. 6 is a schematic diagram of another embodiment of a computer in the embodiment of the present invention.
- the computer further includes:
- the receiving module 504 is configured to receive a voice signal
- the third obtaining module 505 is configured to obtain a corresponding preliminary identification text according to the voice signal.
- the second obtaining module 502 is specifically configured to obtain target related information by searching through the entire network according to the keyword.
- the second obtaining module 502 is further configured to: obtain a corresponding search result by searching through the entire network according to the keyword, and match the search result to determine the target related information.
- the second obtaining module 502 is further configured to extract target related information corresponding to the keyword in the preset related information set.
- the establishing module 503 is specifically configured to perform training according to the target related information to establish a target language library.
- the embodiment of the invention further provides a storage medium, wherein the storage medium stores a computer program, wherein the computer program is set to execute the above method when it is running.
- Embodiments of the present invention also provide an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above method by the computer program.
- The electronic device may be the computer shown in FIG. 7, and the processor may be the central processor 722 shown in FIG. 7.
- FIG. 7 is a schematic diagram of another embodiment of a computer in the present invention.
- Computer 700 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing an application 742 or data 744.
- the memory 732 and the storage medium 730 may be short-term storage or persistent storage.
- the program stored on storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations on the computer.
- central processor 722 can be configured to communicate with storage medium 730, executing a series of instruction operations in storage medium 730 on computer 700.
- Computer 700 may also include one or more power sources 726, one or more wired or wireless network interfaces 750, one or more input and output interfaces 758, and/or one or more operating systems 741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and more.
- The central processing unit 722 is further configured to perform the following functions: acquiring a keyword in the preliminary identification text, where the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is obtained according to the voice signal;
- acquiring target related information according to the keyword, where the target related information is context information corresponding to the keyword; and establishing the target language library according to the target related information.
- the central processing unit 722 is specifically configured to obtain the keyword according to the formula 1 according to the preliminary identification text, where the formula 1 is:
- Score(i) = tf(i)*idf(i), where i refers to the i-th word in the preliminary identification text, tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
- the central processing unit 722 is further configured to receive a voice signal, and obtain a corresponding preliminary identification text according to the voice signal.
- the central processing unit 722 is specifically configured to obtain target related information through a full network search according to the keyword.
- the central processing unit 722 is specifically configured to obtain a corresponding search result by searching through the entire network according to the keyword, and matching the search result to determine the target related information.
- the central processing unit 722 is further configured to extract the target related information corresponding to the keyword from a preset related information set.
- the central processing unit 722 is specifically configured to perform training according to the target related information to establish a target language library.
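As a minimal sketch of what training a target language library from the target related information could look like, the gathered context sentences can be counted into a bigram table; an actual system would more likely train a full n-gram or neural language model, so this is an assumption for illustration:

```python
from collections import defaultdict

def train_bigram_counts(sentences):
    """Build bigram counts over the target related information (context sentences)."""
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # add sentence boundary markers
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    return counts

# Hypothetical context gathered for the keyword "bee".
related = ["bee farming guide", "bee colony care"]
model = train_bigram_counts(related)
```

Such counts can then bias the recognizer toward keyword-related word sequences in subsequently received voice signals.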
- the disclosed system, apparatus, and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the units is only a logical function division, and in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium.
- the part of the technical solution of the present invention that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
- the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
- the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a speech recognition method, the method comprising: receiving a voice signal (301); acquiring a corresponding preliminary recognition text according to the voice signal (302); acquiring a keyword in the preliminary recognition text, the keyword being a word of key information in the preliminary recognition text (303); acquiring target related information according to the keyword, the target related information being context information corresponding to the keyword (304); and establishing a target language library according to the target related information (305). The method enables the keyword, or a word related to the keyword, to be accurately recognized in recognition text acquired from a subsequently received voice signal, thereby improving the accuracy of speech recognition.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710121180.2 | 2017-03-02 | ||
CN201710121180.2A CN108538286A (zh) | 2017-03-02 | 2017-03-02 | Speech recognition method and computer
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018157789A1 true WO2018157789A1 (fr) | 2018-09-07 |
Family
ID=63370555
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/077413 WO2018157789A1 (fr) | 2018-02-27 | Speech recognition method, computer, storage medium and electronic apparatus
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108538286A (fr) |
WO (1) | WO2018157789A1 (fr) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522392A (zh) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Voice-based retrieval method, server, and computer-readable storage medium |
CN111081226B (zh) * | 2018-10-18 | 2024-02-13 | 北京搜狗科技发展有限公司 | Speech recognition decoding optimization method and apparatus |
CN109376658B (zh) * | 2018-10-26 | 2022-03-08 | 信雅达科技股份有限公司 | OCR method based on deep learning |
CN111125355A (zh) * | 2018-10-31 | 2020-05-08 | 北京国双科技有限公司 | Information processing method and related device |
CN109360554A (zh) * | 2018-12-10 | 2019-02-19 | 广东潮庭集团有限公司 | Language recognition method based on a deep neural network |
CN109299248A (zh) * | 2018-12-12 | 2019-02-01 | 成都航天科工大数据研究院有限公司 | Business intelligence collection method based on natural language processing |
CN109559744B (zh) * | 2018-12-12 | 2022-07-08 | 泰康保险集团股份有限公司 | Voice data processing method, apparatus, and readable storage medium |
CN110136688B (zh) * | 2019-04-15 | 2023-09-29 | 平安科技(深圳)有限公司 | Text-to-speech method based on speech synthesis, and related device |
CN110349568B (zh) * | 2019-06-06 | 2024-05-31 | 平安科技(深圳)有限公司 | Voice retrieval method, apparatus, computer device, and storage medium |
CN111326160A (zh) * | 2020-03-11 | 2020-06-23 | 南京奥拓电子科技有限公司 | Speech recognition method, system, and storage medium for correcting noisy text |
CN111444318A (zh) * | 2020-04-08 | 2020-07-24 | 厦门快商通科技股份有限公司 | Text error correction method |
CN112017645B (zh) * | 2020-08-31 | 2024-04-26 | 广州市百果园信息技术有限公司 | Speech recognition method and apparatus |
CN112468665A (zh) * | 2020-11-05 | 2021-03-09 | 中国建设银行股份有限公司 | Meeting minutes generation method, apparatus, device, and storage medium |
CN112632319B (zh) * | 2020-12-22 | 2023-04-11 | 天津大学 | Method for improving overall classification accuracy of long-tail-distributed speech based on transfer learning |
CN113077792B (zh) * | 2021-03-24 | 2024-03-05 | 平安科技(深圳)有限公司 | Buddhist topic word recognition method, apparatus, device, and storage medium |
CN113129866B (zh) * | 2021-04-13 | 2022-08-02 | 重庆度小满优扬科技有限公司 | Speech processing method, apparatus, storage medium, and computer device |
CN113658585B (zh) * | 2021-08-13 | 2024-04-09 | 北京百度网讯科技有限公司 | Voice interaction model training method, voice interaction method, and apparatus |
CN113961694B (zh) * | 2021-09-22 | 2024-08-06 | 福建亿榕信息技术有限公司 | Meeting-based auxiliary analysis method and system for the operation of company units |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102280106A (zh) * | 2010-06-12 | 2011-12-14 | 三星电子株式会社 | Voice network search method for a mobile communication terminal, and apparatus therefor |
CN103229137A (zh) * | 2010-09-29 | 2013-07-31 | 国际商业机器公司 | Context-based disambiguation of acronyms and abbreviations |
CN103544140A (zh) * | 2012-07-12 | 2014-01-29 | 国际商业机器公司 | Data processing method, display method, and corresponding apparatus |
CN103680498A (zh) * | 2012-09-26 | 2014-03-26 | 华为技术有限公司 | Speech recognition method and device |
US20160379626A1 (en) * | 2015-06-26 | 2016-12-29 | Michael Deisher | Language model modification for local speech recognition systems using remote sources |
CN106328145A (zh) * | 2016-08-19 | 2017-01-11 | 北京云知声信息技术有限公司 | Voice correction method and apparatus |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315624B (zh) * | 2007-05-29 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Text topic recommendation method and apparatus |
CN203456091U (zh) * | 2013-04-03 | 2014-02-26 | 中金数据系统有限公司 | Speech corpus construction system |
CN106297800B (zh) * | 2016-08-10 | 2021-07-23 | 中国科学院计算技术研究所 | Adaptive speech recognition method and device |
CN106328147B (zh) * | 2016-08-31 | 2022-02-01 | 中国科学技术大学 | Speech recognition method and apparatus |
- 2017
  - 2017-03-02 CN CN201710121180.2A patent/CN108538286A/zh active Pending
- 2018
  - 2018-02-27 WO PCT/CN2018/077413 patent/WO2018157789A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN108538286A (zh) | 2018-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018157789A1 (fr) | Speech recognition method, computer, storage medium and electronic apparatus | |
US11775760B2 (en) | Man-machine conversation method, electronic device, and computer-readable medium | |
US10176804B2 (en) | Analyzing textual data | |
CN106537370B (zh) | Method and system for robust tagging of named entities in the presence of source and translation errors | |
WO2021051521A1 (fr) | Answer information acquisition method and apparatus, computer device and storage medium | |
KR101543992B1 (ko) | Intra-language statistical machine translation | |
US9330661B2 (en) | Accuracy improvement of spoken queries transcription using co-occurrence information | |
CN111611807B (zh) | Neural-network-based keyword extraction method and apparatus, and electronic device | |
CN112069298A (zh) | Human-computer interaction method, device and medium based on semantic web and intent recognition | |
US10290299B2 (en) | Speech recognition using a foreign word grammar | |
JP2004005600A (ja) | Method and system for indexing and retrieving documents stored in a database | |
US10592542B2 (en) | Document ranking by contextual vectors from natural language query | |
JP2004133880A (ja) | Method for constructing a dynamic vocabulary for a speech recognizer used with a database of indexed documents | |
US20150178274A1 (en) | Speech translation apparatus and speech translation method | |
CN105096942A (zh) | Semantic analysis method and apparatus | |
CN112347241A (zh) | Abstract extraction method, apparatus, device and storage medium | |
CN113743090A (zh) | Keyword extraction method and apparatus | |
WO2025044865A1 (fr) | Cross-domain question processing methods and apparatuses, electronic device and storage medium | |
CN110705285B (zh) | Government affairs text topic lexicon construction method, apparatus, server and readable storage medium | |
WO2022227166A1 (fr) | Word replacement method and apparatus, electronic device and storage medium | |
CN111126084A (zh) | Data processing method and apparatus, electronic device and storage medium | |
CN118747500A (zh) | Chinese language translation method and system based on a neural network model | |
CN111161730B (zh) | Voice instruction matching method, apparatus, device and storage medium | |
CN113486155B (zh) | Chinese naming method fusing fixed phrase information | |
JP2005202924A (ja) | Bilingual translation determination apparatus, method and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18761662 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 18761662 Country of ref document: EP Kind code of ref document: A1 |