
WO2018157789A1 - Speech recognition method, computer, storage medium, and electronic apparatus - Google Patents


Info

Publication number
WO2018157789A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
related information
word
target
text
Prior art date
Application number
PCT/CN2018/077413
Other languages
French (fr)
Chinese (zh)
Inventor
康战辉
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2018157789A1

Classifications

    • G – PHYSICS
    • G10 – MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L – SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 – Speech recognition
    • G10L 15/08 – Speech classification or search
    • G10L 15/18 – Speech classification or search using natural language modelling
    • G10L 15/1822 – Parsing for meaning understanding
    • G10L 15/183 – Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/26 – Speech to text systems
    • G10L 2015/088 – Word spotting
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 – Handling natural language data
    • G06F 40/20 – Natural language analysis
    • G06F 40/279 – Recognition of textual entities
    • G06F 40/289 – Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the embodiments of the present invention relate to the field of computers, and in particular, to a voice recognition method, a computer, a storage medium, and an electronic device.
  • a general speech recognition system includes at least two parts: an acoustic model and a language model.
  • the acoustic model mainly converts the input speech signal into top-N candidate language sequences; the language model then determines the probability that each candidate sequence forms a normal sentence.
  • a common language model is often built by statistics over massive amounts (hundreds of billions of items or more) of natural text, counting the probability of occurrence of fragments of different lengths (n-grams).
  • a disadvantage of the related art is that a common language model often suffers from recognition bias on certain data.
  • for example, in a voice transcription scenario, specifically a professional academic lecture, a user needs the speech recognition system to automatically produce the conference record.
  • if the speech mentions some niche, professional vocabulary, such as the name of a certain protein,
  • a general speech recognition system may fail to recognize it correctly, because its language model may not cover a corpus in that domain.
  • An embodiment of the present invention provides a speech recognition method, a computer, a storage medium, and an electronic device, so that a keyword, or a word related to the keyword, is accurately recognized in the recognition text obtained from the next received voice signal, improving the accuracy of speech recognition.
  • a first aspect of the embodiments of the present invention provides a speech recognition method, which may include:
  • acquiring a keyword in the preliminary recognition text, where the keyword is a word carrying key information in the preliminary recognition text,
  • and the preliminary recognition text is text recognized from the voice signal;
  • acquiring target related information according to the keyword, where the target related information is context information corresponding to the keyword; and
  • establishing a target language library according to the target related information.
  • a second aspect of the embodiments of the present invention provides a computer, which may include:
  • a first acquiring module configured to acquire a keyword in the preliminary identification text, where the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is a text identified according to the voice signal;
  • a second obtaining module configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
  • an establishing module configured to establish a target language library according to the target related information.
  • a third aspect of the embodiments of the present invention provides a storage medium storing a computer program, wherein the computer program is configured to perform the above method when run.
  • a fourth aspect of the embodiments of the present invention provides an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above method by the computer program.
  • the keyword in the preliminary recognition text is obtained, where the keyword is a word carrying key information in that text and the preliminary recognition text is text recognized from the voice signal; target related information is obtained according to the keyword, where the target related information is context information corresponding to the keyword; and the target language library is established according to the target related information.
  • thus the computer can receive the voice signal, obtain the corresponding preliminary recognition text from it, obtain the keyword from that text, obtain the target related information from the keyword, and establish the target language library from that information.
  • the target language library is then used to recognize the keyword, or topic words related to it, in the recognition text obtained from the next received voice signal, so that recognition becomes accurate and the accuracy of speech recognition improves.
  • FIG. 1 is a schematic diagram of a general speech recognition system according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a framework of a speech recognition system applied in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an embodiment of a method for voice recognition according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of voice recognition in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an embodiment of a computer according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of another embodiment of a computer according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of another embodiment of a computer in an embodiment of the present invention.
  • An embodiment of the present invention provides a speech recognition method and a computer, so that in the recognition text obtained from the next received voice signal, the keyword, or topic words related to it, can be accurately recognized, improving the accuracy of speech recognition.
  • Natural language is simply human language.
  • Natural language processing (NLP) is the processing of human language, mainly by computer; it is an interdisciplinary subject between computer science and linguistics. Common research tasks include: word segmentation / word breaking (WB); information extraction (IE); relation extraction (RE); named entity recognition (NER); part-of-speech tagging (POS); coreference resolution; parsing; word sense disambiguation (WSD); speech recognition; text-to-speech (TTS); machine translation (MT); automatic summarization; question answering; natural language understanding; optical character recognition (OCR); and information retrieval (IR).
  • the language model is the model used to compute the probability of a sentence, namely P(W1, W2, ..., Wk).
  • for example, the input pinyin string nixianzaiganshenme can correspond to outputs of various forms, such as "what are you doing now" or "what are you going to do in Xi'an"; to decide which is the correct conversion result, we use
  • the language model: it tells us that the probability of the former is greater than that of the latter, so converting to the former is more reasonable in most cases.
  • FIG. 1, a schematic diagram of a general speech recognition system, includes at least two parts: an acoustic model and a language model.
  • the acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent, and the like.
  • the language model is a knowledge representation of a sequence of words.
  • Common language models often exhibit recognition bias. For example, in a voice transcription scenario, specifically a professional academic lecture, a speech recognition system is required to automatically produce the conference record. If the speech mentions some niche, professional vocabulary (such as the name of a certain protein), the general speech recognition system often cannot recognize it correctly, because its language model may not cover a corpus in that domain.
  • moreover, such niche, professional, long-tail corpora cannot be exhaustively enumerated (or the cost of exhaustive enumeration is not worthwhile).
  • FIG. 2 is a schematic diagram of a framework of a speech recognition system according to an embodiment of the present invention, including: speech recognition input, the speech recognition system, preliminary recognition text, extracted topic words, summaries of top full-network search results, training corpus, and the domain language model.
  • the problem addressed by the embodiment of the present invention is to add domain-related long-tail corpus to a general language model system in real time. Professional vocabulary in a speech transcription session may go unrecognized the first few times by the general speech recognition system; but as transcription proceeds, the system automatically and accurately extracts language-model corpus for the corresponding domain in real time, so that when the speaker mentions that vocabulary again, it, and even words related to it, can be effectively recognized.
  • FIG. 3 is a schematic diagram of an embodiment of a speech recognition method according to an embodiment of the present invention, including:
  • the computer receives the voice signal.
  • the voice signal here may be the speech of a participant in a conference scene as received by the computer, or a piece of speech received by the computer in scenes such as an academic report, a topic research report, or a lecture on professional knowledge.
  • the acoustic model can be trained with LSTM+CTC to obtain the mapping from speech features to phonemes;
  • the language model (LM) can be trained with the SRILM toolkit to obtain 3-gram and 4-gram models, i.e., the mapping from words to sentences;
  • the dictionary is the set of phoneme indices corresponding to words, i.e., the mapping between words and phonemes.
  • the so-called acoustic model classifies (decodes) the acoustic features of speech into phoneme or word units; the language model then decodes the words into a complete sentence.
  • the language model represents the probability of a sequence of words.
  • the chain rule is used to decompose the probability of a sentence into the product of the conditional probabilities of each word in it.
  • let W be composed of w1, w2, ..., wn; then P(W) can be decomposed (by the conditional probability and multiplication formulas) as:
  • P(W) = P(w1)P(w2|w1)P(w3|w1,w2)...P(wn|w1,w2,...,wn-1)
  • the ternary grammar (trigram) conditions each word only on the preceding two words:
  • P(W) ≈ P(w1)P(w2|w1)P(w3|w1,w2)P(w4|w2,w3)...P(wn|wn-2,wn-1)
  • using the Bayes formula, the probability of adjacent words co-occurring can be computed from counts over the whole corpus; the occurrence probabilities of single words are then counted and substituted in.
  • the n-gram here is over word sequences, so an n-gram is equivalent to a phrase; some phrases never appear in the corpus yet still have some probability of occurring, so the algorithm needs to assign (smooth) probabilities for these uncommon phrases.
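  • the counting and smoothing described above can be sketched as a toy bigram language model in Python (a minimal illustration only; the tiny corpus and the choice of add-one smoothing are assumptions for the example, not the patent's method):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over a toy corpus."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])          # contexts (everything but </s>)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)

def bigram_prob(w_prev, w, unigrams, bigrams, vsize):
    # Add-one (Laplace) smoothing gives unseen bigrams a small nonzero probability.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vsize)

def sentence_prob(sentence, model):
    unigrams, bigrams, vsize = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, cur, unigrams, bigrams, vsize)
    return p

corpus = ["what are you doing now", "what are you doing", "going to xian now"]
model = train_bigram_lm(corpus)
# A word order seen in training scores higher than an unseen shuffle of the same words.
print(sentence_prob("what are you doing now", model) >
      sentence_prob("now doing you are what", model))  # prints True
```

The smoothing step is what lets the model assign a probability to phrases that never occurred in the corpus, as the bullet above requires.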
  • the task of the acoustic model is to compute P(X|W), the probability that the speech is produced given the text (finally combined via the Bayes formula, which is where P(X|W) is used).
  • this requires another module, called the dictionary (lexicon). In the eesen source code, for example, the data-preparation stage first builds the lexicon of the corresponding phonemes. Its function is to convert a word string into a phoneme string; the language model is then obtained and the acoustic model trained (here using LSTM+CTC, long short-term memory with connectionist temporal classification). With the help of the dictionary, the acoustic model knows which sounds a given string of text should produce.
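  • the dictionary's role, converting a word string into a phoneme string, can be illustrated with a toy lexicon (the entries below are invented for illustration; a real system would use an actual pronunciation dictionary):

```python
# Toy lexicon: each word maps to its phoneme sequence (entries invented for illustration).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "recognition": ["r", "eh", "k", "ah", "g", "n", "ih", "sh", "ah", "n"],
}

def words_to_phonemes(words, lexicon):
    """Expand a word string into the phoneme string the acoustic model scores."""
    return [ph for w in words for ph in lexicon[w]]

print(words_to_phonemes(["speech", "recognition"], LEXICON))
```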
  • the computer may further acquire a corresponding preliminary identification text according to the voice signal. That is, the speech signal can obtain the corresponding preliminary identification text through the acoustic model and the general language model in the speech recognition system.
  • a speech signal is input, and a sequence of words (composed of characters or words) is sought such that the sequence matches the speech signal best;
  • the degree of matching is generally expressed as a probability.
  • by the Bayes formula, P(W|X) = P(X|W)P(W)/P(X); since P(X) is constant for a given input, the denominator can be omitted.
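  • dropping the constant P(X) and maximizing P(X|W)·P(W) over the candidate sequences can be sketched as follows (the candidate sentences and all probabilities are made-up numbers for illustration):

```python
def best_hypothesis(candidates, acoustic_p, lm_p):
    # argmax over W of P(X|W) * P(W); the constant denominator P(X) is omitted.
    return max(candidates, key=lambda w: acoustic_p[w] * lm_p[w])

# Two readings of the pinyin nixianzaiganshenme (probabilities invented).
candidates = ["what are you doing now", "what to do in xian"]
acoustic_p = {"what are you doing now": 0.4, "what to do in xian": 0.45}
lm_p = {"what are you doing now": 0.01, "what to do in xian": 0.001}
print(best_hypothesis(candidates, acoustic_p, lm_p))
# the language model outweighs the slightly better acoustic score of the other reading
```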
  • input pinyin nixianzaiganshenme may correspond to many conversion results.
  • the possible conversion results are shown in Figure 4 (only some of the word nodes are drawn), forming a lattice of nodes.
  • any path from the beginning to the end is a possible conversion result, and the process of selecting the most appropriate result from many conversion results requires a decoding algorithm.
  • a commonly used decoding algorithm is the Viterbi algorithm, which uses dynamic programming to quickly determine the most appropriate path.
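  • a compact Viterbi sketch over a toy two-state lattice (the states, transition probabilities, and emission probabilities are invented for the example; a real decoder searches a far larger phoneme/word lattice):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Dynamic programming: keep, per state, the best-probability path so far."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            # Extend the best predecessor path into state s.
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-1][prev][1] + [s])
                for prev in states
            )
            layer[s] = (prob, path)
        V.append(layer)
    return max(V[-1].values())

# Toy lattice: hidden "words", observed "syllables" (probabilities invented).
states = ("now", "xian")
start_p = {"now": 0.6, "xian": 0.4}
trans_p = {"now": {"now": 0.7, "xian": 0.3}, "xian": {"now": 0.4, "xian": 0.6}}
emit_p = {"now": {"ni": 0.5, "xianzai": 0.5}, "xian": {"ni": 0.1, "xianzai": 0.9}}
prob, path = viterbi(["ni", "xianzai"], states, start_p, trans_p, emit_p)
print(path)
```

The table V keeps only the best path into each state per step, which is what makes the search fast compared with enumerating every lattice path.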
  • the keyword is a word carrying key information in the preliminary recognition text,
  • and the preliminary recognition text is text recognized from the voice signal;
  • the keyword in the preliminary recognition text may then be obtained.
  • the subject words can be understood as the core theme of the meeting discussion, or the focus of the meeting report.
  • Obtaining the keyword in the preliminary recognition text may include: obtaining the keyword from the preliminary recognition text according to Formula 1, where Formula 1 is:
  • Score(i) = tf(i) * idf(i), where
  • i refers to the i-th word in the preliminary recognition text,
  • tf(i) refers to the number of times the i-th word appears in the preliminary recognition text, and
  • idf(i) refers to the inverse document frequency of the i-th word in the preliminary recognition text.
  • idf(i) is obtained by offline statistics over a large amount of text data; Formula 2 for computing idf(i) is:
  • idf(i) = log(total number of documents / (number of documents containing word i + 1))
  • the topic words can also be extracted with the TextRank algorithm; that is, keyword extraction is the task of automatically extracting a number of meaningful words or phrases from a given text.
  • the TextRank algorithm uses relations between local words (a co-occurrence window) to rank candidate keywords, extracting them directly from the text itself. The main steps are as follows:
  • construct the candidate keyword graph G = (V, E), where V is the node set consisting of the candidate keywords generated in step (2); edges between two nodes are constructed from co-occurrence: an edge exists between two nodes only when their corresponding words co-occur within a window of length K, where K is the window size, i.e., at most K words;
  • the top T words from step (5) are marked in the original text, and if they form adjacent phrases, they are combined into multi-word keywords. For example, for the sentence "Matlab code for plotting ambiguity function" in the text, if both "Matlab" and "code" are candidate keywords, the combination "Matlab code" is added to the keyword sequence.
  • a TextRank implementation can be parsed as follows: read the text and segment it into words; compute co-occurrence statistics over the segmentation result, with the window defaulting to 5; and save the resulting co-occurrence graph.
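  • the graph construction and ranking steps above can be sketched as follows (a simplified TextRank over toy tokens; stop-word and part-of-speech filtering and phrase merging are omitted, and the damping factor 0.85 follows the usual PageRank convention):

```python
from collections import defaultdict

def textrank_keywords(words, window=5, d=0.85, iters=50):
    """Rank words by iterating PageRank over a co-occurrence graph."""
    # An edge joins two words that co-occur within `window` words of each other.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    # PageRank update: WS(v) = (1 - d) + d * sum over neighbors u of WS(u)/deg(u).
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {w: (1 - d) + d * sum(score[u] / len(neighbors[u]) for u in nbrs)
                 for w, nbrs in neighbors.items()}
    return sorted(score, key=score.get, reverse=True)

tokens = "speech recognition model speech signal model speech recognition".split()
print(textrank_keywords(tokens, window=2)[:2])  # the two best-connected words
```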
  • the extraction of the keyword includes, but is not limited to, the several implementations mentioned above, and the number of keywords obtained by the computer is not limited.
  • target related information is context information corresponding to the keyword
  • the target related information may be acquired according to the keyword, and the target related information is context information corresponding to the keyword.
  • Obtaining target related information based on the keyword can include:
  • obtaining the target related information through a full-network search according to the keyword may include: searching the whole network for the corresponding search results according to the keyword; and matching the search results to determine the target related information.
  • the target related information here can be simply understood as the titles of the articles on the page returned by searching the keyword, or their abstracts, or their full contents. However, it should be understood that if the target related information is the full content of each article, the resources consumed are relatively large.
  • the word "filter" recognized here may or may not be the correct phrase; the computer can automatically obtain page content related to "filter" through some search software.
  • the target language library is established according to the target related information.
  • the method may include: training according to the target related information, and establishing a target language library.
  • the target language library here is the domain language model established at the core of this conference or the core of this report. That is, a series of operations such as filtering and cleaning, domain matching, and the like can be performed on the target related information, and the domain language model can be obtained by training.
  • the training can be performed on the summary information in the hyperlinked content for the high-pass filter, low-pass filter, band-pass filter, and band-stop filter, yielding a language model for the filter domain; this filter-domain
  • language model is added to the general language model shown in Figure 2 above.
  • when relevant information about filters appears again, the speech recognition system, having previously added the filter-domain language model, can recognize it accurately, specifically whether it is a high-pass filter, a low-pass filter, a band-pass filter, or a band-stop filter.
  • the Ngram statistical language model can be used in the embodiment of the present invention.
  • the n-gram model is also called the (n-1)-order Markov model. It makes a finite-history assumption: the probability of the current word depends only on the previous n-1 words, so P(S) can be approximated as:
  • P(S) ≈ ∏i P(wi | wi-n+1, ..., wi-1)
  • when n takes 1, 2, and 3, the n-gram models are called unigram, bigram, and trigram language models, respectively.
  • the parameters of the n-gram model are the conditional probabilities P(wi | wi-n+1, ..., wi-1).
  • here n = 3 is taken as an example, i.e., the trigram language model.
  • the n-gram language model, i.e., the above P(S) model, generally uses maximum likelihood estimation for its parameters: P(wi | wi-2, wi-1) = count(wi-2, wi-1, wi) / count(wi-2, wi-1).
  • the keyword in the preliminary recognition text is obtained, where the keyword is a word carrying key information in that text and the preliminary recognition text is text recognized from the voice signal; target related information is obtained according to the keyword, where the target related information is context information corresponding to the keyword; and the target language library is established according to the target related information.
  • thus the computer can receive the voice signal, obtain the corresponding preliminary recognition text from it, obtain the keyword from that text, obtain the target related information from the keyword, and establish the target language library from that information.
  • the target language library is then used to recognize the keyword, or topic words related to it, in the recognition text obtained from the next received voice signal, so that recognition becomes accurate and the accuracy of speech recognition improves.
  • each word is assigned an "importance" weight.
  • the most common words ("the", "is", "at") are given the least weight; fairly common words ("China") are given a smaller weight; and less common words ("bees", "farming") are given a greater weight.
  • this weight is called the Inverse Document Frequency (IDF); its size is inversely proportional to how common the word is.
  • the first step is to calculate the term frequency:
  • Term Frequency (TF) = number of occurrences of a word in the document
  • considering that documents differ in length, the word frequency is standardized:
  • Term Frequency (TF) = (occurrences of the word in the document) / (total number of words in the document)
  • or: Term Frequency (TF) = (occurrences of the word in the document) / (occurrences of the most frequent word in the document)
  • the second step is to calculate the inverse document frequency:
  • Inverse Document Frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the word + 1))
  • the third step is to calculate TF-IDF:
  • TF-IDF = Term Frequency (TF) * Inverse Document Frequency (IDF)
  • TF-IDF is proportional to the number of occurrences of a word in the document and inversely proportional to the number of occurrences of the word in the entire language. Therefore, the algorithm for automatically extracting keywords is very clear, that is, the TF-IDF value of each word of the document is calculated, and then arranged in descending order, taking the top words.
  • the TF-IDF algorithm can be used in many other places. For example, in information retrieval, for each document the TF-IDF of each search word ("China", "bee", "farming") can be computed separately and summed to obtain the TF-IDF of the whole document for the query; the document with the highest value is the one most relevant to the search terms.
  • here, "bees" and "farming" are searched as the topic words, context information about "bees" and "farming" is obtained, and the retrieved context is used for training to obtain a language model for the bee-farming domain.
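  • the three TF-IDF steps can be sketched end to end (a minimal illustration; the toy documents echo the "bees"/"farming" example above and are not from the patent):

```python
import math
from collections import Counter

def tf_idf_keywords(docs, doc_index, top_k=3):
    """Score each word of docs[doc_index] by TF * IDF and return the top_k keywords."""
    words = docs[doc_index].split()
    tf = Counter(words)
    scores = {}
    for w, count in tf.items():
        tf_w = count / len(words)                      # standardized term frequency
        df = sum(1 for d in docs if w in d.split())    # documents containing the word
        scores[w] = tf_w * math.log(len(docs) / (df + 1))  # IDF with +1 in denominator
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

docs = [
    "bees farming bees honey production",
    "china economy growth china trade",
    "the weather in china is warm",
]
print(tf_idf_keywords(docs, 0))  # "bees" ranks first: frequent here, absent elsewhere
```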
  • FIG. 5 is a schematic diagram of an embodiment of a computer in an embodiment of the present invention, including:
  • the first obtaining module 501 is configured to obtain a keyword in the preliminary recognition text, where the keyword is a word carrying key information in the preliminary recognition text, and the preliminary recognition text is text recognized from the voice signal;
  • the second obtaining module 502 is configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
  • the establishing module 503 is configured to establish a target language library according to the target related information.
  • the first obtaining module 501 is specifically configured to obtain the keyword from the preliminary recognition text according to Formula 1, where Formula 1 is:
  • Score(i) = tf(i) * idf(i), where i refers to the i-th word in the preliminary recognition text, tf(i) refers to the number of times the i-th word appears in the preliminary recognition text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary recognition text.
  • FIG. 6 is a schematic diagram of another embodiment of a computer in the embodiment of the present invention.
  • the computer further includes:
  • the receiving module 504 is configured to receive a voice signal
  • the third obtaining module 505 is configured to obtain a corresponding preliminary identification text according to the voice signal.
  • the second obtaining module 502 is specifically configured to obtain target related information by searching through the entire network according to the keyword.
  • the second obtaining module 502 is further configured to: obtain a corresponding search result by searching through the entire network according to the keyword, and match the search result to determine the target related information.
  • the second obtaining module 502 is further configured to extract target related information corresponding to the keyword in the preset related information set.
  • the establishing module 503 is specifically configured to perform training according to the target related information to establish a target language library.
  • the embodiment of the invention further provides a storage medium, wherein the storage medium stores a computer program, wherein the computer program is set to execute the above method when it is running.
  • Embodiments of the present invention also provide an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above method by the computer program.
  • the electronic device may be the computer shown in FIG. 7, and the processor may be the central processor shown in FIG. 7.
  • FIG. 7 is a schematic diagram of another embodiment of a computer in the present invention.
  • Computer 700 may vary considerably by configuration or performance, and may include one or more central processing units (CPUs) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 storing application programs 742 or data 744 (e.g., one or more mass storage devices).
  • the memory 732 and the storage medium 730 may be short-term storage or persistent storage.
  • the program stored on storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations on the computer.
  • central processor 722 can be configured to communicate with storage medium 730, executing a series of instruction operations in storage medium 730 on computer 700.
  • Computer 700 may also include one or more power sources 726, one or more wired or wireless network interfaces 750, one or more input and output interfaces 758, and/or one or more operating systems 741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and more.
  • the central processing unit 722 is further configured to perform the following functions: acquiring a keyword in the preliminary recognition text, where the keyword is a word carrying key information in the preliminary recognition text and the preliminary recognition text is text recognized from the voice signal; acquiring target related information according to the keyword, where the target related information is context information corresponding to the keyword; and establishing a target language library according to the target related information.
  • the central processing unit 722 is specifically configured to obtain the keyword according to the formula 1 according to the preliminary identification text, where the formula 1 is:
  • Score(i) = tf(i) * idf(i), where i refers to the i-th word in the preliminary recognition text, tf(i) refers to the number of times the i-th word appears in the preliminary recognition text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary recognition text.
  • the central processing unit 722 is further configured to receive a voice signal, and obtain a corresponding preliminary identification text according to the voice signal.
  • the central processing unit 722 is specifically configured to obtain target related information through a full network search according to the keyword.
  • the central processing unit 722 is specifically configured to obtain a corresponding search result by searching through the entire network according to the keyword, and matching the search result to determine the target related information.
  • the central processing unit 722 is further configured to extract target related information corresponding to the keyword in the preset related information set.
  • the central processing unit 722 is specifically configured to perform training according to the target related information to establish a target language library.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the part of the technical solution of the present invention that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A speech recognition method, the method comprising: receiving a speech signal (301); on the basis of the speech signal, acquiring a corresponding preliminary recognition text (302); acquiring a topic word in the preliminary recognition text, the topic word being a key-information word in the preliminary recognition text (303); on the basis of the topic word, acquiring target relevant information, the target relevant information being context information corresponding to the topic word (304); and, on the basis of the target relevant information, establishing a target language library (305). With the method, the topic word, or topic words related to it, can be accurately recognized in the recognition text acquired on the basis of the next received speech signal, thereby improving the accuracy of speech recognition.

Description

一种语音识别的方法、计算机、存储介质以及电子装置Method, computer, storage medium and electronic device for speech recognition
本申请要求于2017年03月02日提交中国专利局、申请号为201710121180.2、发明名称为“一种语音识别的方法以及计算机”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims priority to Chinese Patent Application No. 201710121180.2, entitled "Speech Recognition Method and Computer", filed with the Chinese Patent Office on March 2, 2017, which is incorporated herein by reference in its entirety.
技术领域Technical field
本发明实施例涉及计算机领域,尤其涉及一种语音识别的方法、计算机、存储介质以及电子装置。The embodiments of the present invention relate to the field of computers, and in particular, to a voice recognition method, a computer, a storage medium, and an electronic device.
背景技术Background technique
一个通用的语音识别系统至少包括声学模型和语言模型两大部分。其中声学模型主要是将输入的语音信号转化为topN候选的语言序列；而语言模型则是判别候选语言序列是否符合一个正常语句的概率。一个通用的语言模型往往是通过海量（几亿，乃至几十亿，上百亿）自然文本统计不同长度片段（Ngram）的出现概率而构建。A general speech recognition system includes at least two parts: an acoustic model and a language model. The acoustic model mainly converts the input speech signal into top-N candidate language sequences, while the language model determines the probability that a candidate language sequence conforms to a normal sentence. In practice, a general language model is often constructed by counting the occurrence probabilities of fragments of different lengths (N-grams) over a massive corpus (hundreds of millions, even billions or tens of billions) of natural text.
相关技术的缺点是，通用的语言模型往往存在数据识别有偏的问题。比如在语音转写场景下，具体来说比如某个专业的学术演讲场景下，用户需要通过语音识别系统自动做会议记录。此时如果在会议演讲中提到一些小众、专业的词汇（比如某种蛋白质的名字），通用的语音识别系统，由于其中的语言模型可能没有涉及到这方面的语料，进而往往不能正确识别。A disadvantage of the related art is that a general language model often suffers from biased data recognition. For example, in a speech transcription scenario, such as a professional academic lecture, a user needs the speech recognition system to take meeting minutes automatically. If some niche, professional vocabulary (such as the name of a certain protein) is mentioned in the lecture, a general speech recognition system often cannot recognize it correctly, because its language model may not cover the corresponding corpus.
发明内容Summary of the invention
本发明实施例提供了一种语音识别的方法、计算机、存储介质以及电子装置，用于在根据下次接收的语音信号获取的识别文本中，对于该主题词或者该主题词相关的主题词的识别就会显示的是准确的识别，提高了语音识别的准确率。Embodiments of the present invention provide a speech recognition method, a computer, a storage medium, and an electronic device, so that in the recognition text acquired from the next received speech signal, the keyword, or keywords related to it, can be recognized accurately, improving the accuracy of speech recognition.
本发明实施例第一方面提供一种语音识别的方法,可以包括:A first aspect of the embodiments of the present invention provides a method for voice recognition, which may include:
获取初步识别文本中的主题词，该主题词为该初步识别文本中关键信息的词，该初步识别文本为根据语音信号识别得到的文本；Obtaining a keyword in the preliminary recognition text, where the keyword is a word carrying key information in the preliminary recognition text, and the preliminary recognition text is text recognized from a voice signal;
根据该主题词获取目标相关信息,该目标相关信息为与该主题词对应的上下文信息;Obtaining target related information according to the keyword, the target related information is context information corresponding to the keyword;
根据该目标相关信息建立目标语言库。A target language library is established according to the target related information.
本发明实施例第二方面提供一种计算机,可以包括:A second aspect of the embodiments of the present invention provides a computer, which may include:
第一获取模块,用于获取初步识别文本中的主题词,该主题词为该初步识别文本中关键信息的词,该初步识别文本为根据语音信号识别得到的文本;a first acquiring module, configured to acquire a keyword in the preliminary identification text, where the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is a text identified according to the voice signal;
第二获取模块,用于根据该主题词获取目标相关信息,该目标相关信息为与该主题词对应的上下文信息;a second obtaining module, configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
建立模块，用于根据该目标相关信息建立目标语言库。an establishing module, configured to establish a target language library according to the target related information.
本发明实施例第三方面提供一种存储介质,所述存储介质中存储有计算机程序,其中,所述计算机程序被设置为运行时执行上述的方法。A third aspect of the embodiments of the present invention provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the above method at runtime.
本发明实施例第四方面提供一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为通过所述计算机程序执行上述的方法。A fourth aspect of the embodiments of the present invention provides an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above method by the computer program.
从以上技术方案可以看出,本发明实施例具有以下优点:It can be seen from the above technical solutions that the embodiments of the present invention have the following advantages:
在本发明实施例中，获取初步识别文本中的主题词，该主题词为该初步识别文本中关键信息的词，该初步识别文本为根据语音信号识别得到的文本；根据该主题词获取目标相关信息，该目标相关信息为与该主题词对应的上下文信息；根据该目标相关信息建立目标语言库。用户在使用计算机的过程中，计算机可以接收语音信号，根据语音信号获取对应的初步识别文本，再根据初步识别文本获取主题词，然后根据该主题词获取目标相关信息，可以根据相关信息建立目标语言库。In the embodiment of the present invention, the keyword in the preliminary recognition text is acquired, where the keyword is a word carrying key information in the preliminary recognition text, and the preliminary recognition text is text recognized from a voice signal; target related information is acquired according to the keyword, where the target related information is context information corresponding to the keyword; and a target language library is established according to the target related information. When a user uses the computer, the computer can receive a voice signal, acquire the corresponding preliminary recognition text from the voice signal, acquire the keyword from the preliminary recognition text, acquire target related information according to the keyword, and establish a target language library according to the related information. The target language library is used so that, in the recognition text acquired from the next received voice signal, the keyword, or keywords related to it, are recognized accurately, improving the accuracy of speech recognition.
附图说明DRAWINGS
为了更清楚地说明本发明实施例技术方案，下面将对实施例和相关技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，还可以根据这些附图获得其它的附图。To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments and the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can further be derived from them.
图1为本发明实施例中通用的语音识别系统的一个示意图;1 is a schematic diagram of a general speech recognition system according to an embodiment of the present invention;
图2为本发明实施例中所应用的语音识别系统的框架示意图;2 is a schematic diagram of a frame of a voice recognition system applied in an embodiment of the present invention;
图3为本发明实施例中语音识别的方法的一个实施例示意图;3 is a schematic diagram of an embodiment of a method for voice recognition according to an embodiment of the present invention;
图4为本发明实施例中语音识别的一个示意图;4 is a schematic diagram of voice recognition in an embodiment of the present invention;
图5为本发明实施例中计算机的一个实施例示意图;FIG. 5 is a schematic diagram of an embodiment of a computer according to an embodiment of the present invention; FIG.
图6为本发明实施例中计算机的另一个实施例示意图;FIG. 6 is a schematic diagram of another embodiment of a computer according to an embodiment of the present invention; FIG.
图7为本发明实施例中计算机的另一个实施例示意图。FIG. 7 is a schematic diagram of another embodiment of a computer in an embodiment of the present invention.
具体实施方式detailed description
本发明实施例提供了一种语音识别的方法以及计算机，用于在根据下次接收的语音信号获取的识别文本中，对于该主题词或者该主题词相关的主题词的识别就会显示的是准确的识别，提高了语音识别的准确率。An embodiment of the present invention provides a speech recognition method and a computer, so that in the recognition text acquired from the next received speech signal, the keyword, or keywords related to it, can be recognized accurately, improving the accuracy of speech recognition.
为了使本技术领域的人员更好地理解本发明方案，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行描述，显然，所描述的实施例仅仅是本发明一部分的实施例，而不是全部的实施例。基于本发明中的实施例，都应当属于本发明保护的范围。To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All embodiments based on the embodiments of the present invention shall fall within the protection scope of the present invention.
自然语言（Natural Language）其实就是人类语言，自然语言处理（Natural Language Processing，NLP）就是对人类语言的处理，当然主要是利用计算机。自然语言处理是关于计算机科学和语言学的交叉学科，常见的研究任务包括：分词（Word Segmentation或Word Breaker，WB）；信息抽取（Information Extraction，IE）；关系抽取（Relation Extraction，RE）；命名实体识别（Named Entity Recognition，NER）；词性标注（Part Of Speech Tagging，POS）；指代消解（Coreference Resolution）；句法分析（Parsing）；词义消歧（Word Sense Disambiguation，WSD）；语音识别（Speech Recognition）；语音合成（Text To Speech，TTS）；机器翻译（Machine Translation，MT）；自动文摘（Automatic Summarization）；问答系统（Question Answering）；自然语言理解（Natural Language Understanding）；光学字符识别（Optical Character Recognition，OCR）；信息检索（Information Retrieval，IR）。Natural language is human language, and natural language processing (NLP) is the processing of human language, mainly by computer. Natural language processing is an interdisciplinary field of computer science and linguistics. Common research tasks include: word segmentation (Word Segmentation or Word Breaker, WB); information extraction (Information Extraction, IE); relation extraction (Relation Extraction, RE); named entity recognition (Named Entity Recognition, NER); part-of-speech tagging (Part Of Speech Tagging, POS); coreference resolution (Coreference Resolution); parsing (Parsing); word sense disambiguation (Word Sense Disambiguation, WSD); speech recognition (Speech Recognition); speech synthesis (Text To Speech, TTS); machine translation (Machine Translation, MT); automatic summarization (Automatic Summarization); question answering (Question Answering); natural language understanding (Natural Language Understanding); optical character recognition (Optical Character Recognition, OCR); and information retrieval (Information Retrieval, IR).
简单地说，语言模型就是用来计算一个句子的概率的模型，即P(W1,W2......Wk)。利用语言模型，可以确定哪个词序列的可能性更大，或者给定若干个词，可以预测下一个最可能出现的词语。举个音字转换的例子来说，输入拼音串为nixianzaiganshenme，对应的输出可以有多种形式，如你现在干什么、你西安再赶什么、等等，那么到底哪个才是正确的转换结果呢，利用语言模型，我们知道前者的概率大于后者，因此转换成前者在多数情况下比较合理。再举一个机器翻译的例子，给定一个汉语句子为李明正在家里看电视，可以翻译为Li Ming is watching TV at home、Li Ming at home is watching TV、等等，同样根据语言模型，我们知道前者的概率大于后者，所以翻译成前者比较合理。Simply put, a language model is a model used to compute the probability of a sentence, i.e., P(W1, W2, ..., Wk). With a language model, we can determine which word sequence is more likely, or, given several words, predict the next most likely word. Take pinyin-to-character conversion as an example: for the input pinyin string nixianzaiganshenme, the corresponding output can take various forms, such as 你现在干什么 (what are you doing now), 你西安再赶什么 (what are you rushing to in Xi'an), and so on. Which is the correct conversion result? Using the language model, we know that the probability of the former is greater than that of the latter, so converting to the former is more reasonable in most cases. As another example, in machine translation, the Chinese sentence 李明正在家里看电视 can be translated as Li Ming is watching TV at home, Li Ming at home is watching TV, and so on; again according to the language model, the probability of the former is greater, so translating to the former is more reasonable.
如图1所示，为通用的语音识别系统的示意图，至少包括声学模型和语言模型两大部分，声学模型是对声学、语音学、环境的变量、说话人性别、口音等的差异的知识表示，而语言模型是对一组字序列构成的知识表示。通用的语言模型往往存在数据识别有偏的问题。比如在语音转写场景下，具体来说比如某个专业的学术演讲场景下，需要通过语音识别系统自动做会议记录。此时如果演讲中提到一些小众、专业的词汇（比如某种蛋白质的名字），通用的语音识别系统，由于其中的语言模型可能没有涉及到这方面的语料，进而往往不能正确识别。而以上这种小众、专业的词汇，长尾语料是不能够穷举的（或者说穷举的成本很高也没必要）。As shown in FIG. 1, a schematic diagram of a general speech recognition system includes at least two parts: an acoustic model and a language model. The acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent, and the like, while the language model is a knowledge representation of word sequences. A general language model often suffers from biased data recognition. For example, in a speech transcription scenario, such as a professional academic lecture, the speech recognition system is required to take meeting minutes automatically. If the lecture mentions some niche, professional vocabulary (such as the name of a certain protein), a general speech recognition system often cannot recognize it correctly, because its language model may not cover the corresponding corpus. Moreover, such niche, professional, long-tail corpora cannot be enumerated exhaustively (or the cost of doing so would be too high to be worthwhile).
如图2所示，为本发明实施例所应用的语音识别系统的框架示意图，包括语音识别输入、语音识别系统、初步识别文本、提取主题词、全网搜索top结果摘要、训练上下文和领域语言模型。本发明实施例要解决的就是在通用的语言模型系统中实时加入领域相关的长尾语料，以解决在语音转写场景下，当通用语音识别系统前几次不能识别的领域专业词汇，但随着转写（演讲）的推进，系统可以准确实时的自动挖掘补充相应领域的语言模型语料，后续当演讲者再次提到该词汇，甚至与该词汇相关的词汇时可有效识别。FIG. 2 is a schematic framework diagram of the speech recognition system applied in the embodiment of the present invention, including speech input, the speech recognition system, preliminary recognition text, topic word extraction, a summary of top results from a full-network search, context training, and a domain language model. The embodiment of the present invention adds domain-related long-tail corpora to a general language model system in real time, so that in a speech transcription scenario, domain-specific terms that the general speech recognition system fails to recognize the first few times can be handled: as the transcription (lecture) proceeds, the system automatically and accurately mines supplementary language model corpora for the corresponding domain in real time, so that when the speaker later mentions the term again, or even terms related to it, they can be recognized effectively.
下面以实施例的方式对本发明实施例的技术方案做进一步的描述,如图3所示,为本发明实施例中语音识别的方法的一个实施例示意图,包括:The technical solution of the embodiment of the present invention is further described in the following embodiments. As shown in FIG. 3, it is a schematic diagram of an embodiment of a voice recognition method according to an embodiment of the present invention, including:
301、接收语音信号;301. Receive a voice signal;
在本发明实施例中，计算机接收语音信号，示例性的，这里的语音信号可以是在会议场景中相关工作人员的声音，被计算机接收的语音信号；也可以是在学术报告、主题研究报告、专业知识讲座等一系列场景中，计算机所接收的一段语音信号。其中，声学模型可以用lstm+ctc训练，得到语音特征到音素的映射；语言模型可以用SRILM工具做LM（language model，即语言模型）的训练得到3-gram和4-gram，是词与词、词与句子的映射，字典是字词对应的音素index集合，是字词和音素之间的映射。In the embodiment of the present invention, the computer receives a voice signal. Illustratively, the voice signal here may be the voice of relevant staff in a conference scene, received by the computer; it may also be a segment of speech received by the computer in scenarios such as academic reports, topic research reports, or professional lectures. The acoustic model can be trained with LSTM+CTC to obtain the mapping from speech features to phonemes; the language model can be trained with the SRILM tool as an LM (language model) to obtain 3-gram and 4-gram models, which map words to words and words to sentences; the dictionary is the set of phoneme indices corresponding to words, i.e., the mapping between words and phonemes.
所谓声学模型就是把语音的声学特征分类对应到(解码)音素或字词这样的单元;语言模型接着把字词解码成一个完整的句子。The so-called acoustic model is to classify the acoustic features of speech into units of (decoding) phonemes or words; the language model then decodes the words into a complete sentence.
先说语言模型，语言模型表示某一字序列发生的概率，一般采用链式法则，把一个句子的概率拆解成其中的每个词的概率之积。设W是由w1,w2,...,wn组成的，则P(W)可以拆成（由条件概率公式和乘法公式）：Let's first discuss the language model. The language model represents the probability that a certain word sequence occurs; generally the chain rule is used to decompose the probability of a sentence into the product of the probabilities of each word in it. Let W be composed of w1, w2, ..., wn; then P(W) can be decomposed (by the conditional probability formula and the multiplication formula) as:
P(W)=P(w1)P(w2|w1)P(w3|w1,w2)...P(wn|w1,w2,...,wn-1)，每一项都是在之前所有词的概率条件下，当前词的概率。由马尔可夫模型的思想，最常见的做法就是用N-元文法，即假定某一个字的输出只与前面N-1个字出现的概率有关系，这种语言模型叫做n-gram模型（一般n取3，即trigram），这时候我们就可以这么表示：P(W)=P(w1)P(w2|w1)P(w3|w1,w2)...P(wn|w1,w2,...,wn-1); each factor is the probability of the current word conditioned on all the preceding words. Following the idea of the Markov model, the most common practice is to use an N-gram grammar, i.e., to assume that the output of a word depends only on the previous N-1 words; such a language model is called an n-gram model (usually n is 3, i.e., a trigram), in which case we can write:
P(W)=P(w1)P(w2|w1)P(w3|w1,w2)P(w4|w1,w2,w3)...P(wn|wn-1,wn-2,...,w1)，条件太长的时候，概率就不好估计了，三元文法只取前两个词：P(W)=P(w1)P(w2|w1)P(w3|w1,w2)P(w4|w1,w2,w3)...P(wn|wn-1,wn-2,...,w1); when the conditioning context is too long, the probability is hard to estimate, so the trigram grammar keeps only the two preceding words:
P(W)=P(w1)P(w2|w1)P(w3|w1,w2)P(w4|w2,w3)...P(wn|wn-1,wn-2).
对于其中的每一项条件概率都可以用贝叶斯公式求出,在所有的语料中统计出相邻的字发生的概率,再统计出单个字出现的概率,代入即可。For each of these conditional probabilities, the Bayesian formula can be used to calculate the probability of occurrence of adjacent words in all corpora, and then the probability of occurrence of a single word can be counted and substituted.
需要说明的是,这里的n-gram是根据字符串序列建立的,所以一个n-gram就相当于词组,必然会有一些词组没有出现过,但是也存在发生的概率,所以需要算法生成这些生僻词组的概率。It should be noted that the n-gram here is based on the sequence of strings, so an n-gram is equivalent to a phrase, there must be some phrases that have not appeared, but there is also a probability of occurrence, so the algorithm needs to generate these uncommon The probability of a phrase.
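To make the n-gram estimation above concrete, here is a minimal, hypothetical sketch (not part of the patent) of a trigram language model: conditional probabilities are estimated from trigram and bigram counts, and a sentence probability is the product of per-word conditional probabilities, as in the chain-rule formulas above.

```python
from collections import defaultdict

def train_trigram_lm(corpus):
    """Count trigrams and their bigram contexts, so that
    P(w3 | w1, w2) can be estimated as count(w1,w2,w3) / count(w1,w2)."""
    tri, bi = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        words = ["<s>", "<s>"] + sentence + ["</s>"]
        for i in range(2, len(words)):
            tri[(words[i - 2], words[i - 1], words[i])] += 1
            bi[(words[i - 2], words[i - 1])] += 1
    return tri, bi

def sentence_prob(sentence, tri, bi):
    """P(W) = product of P(w_i | w_{i-2}, w_{i-1}) under the trigram assumption.
    Unseen contexts give probability 0 here; a real system would smooth."""
    words = ["<s>", "<s>"] + sentence + ["</s>"]
    p = 1.0
    for i in range(2, len(words)):
        ctx = (words[i - 2], words[i - 1])
        if bi[ctx] == 0:
            return 0.0
        p *= tri[ctx + (words[i],)] / bi[ctx]
    return p
```

For instance, with a toy two-sentence corpus containing 你现在干什么 and 你现在在哪, the model assigns probability 0.5 to the former, matching the intuition in the text that more frequent continuations score higher.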
再说声学模型，声学模型的任务是计算P(X|W)，即给定文字之后发出这段语音的概率（最后利用贝叶斯，求P(W|X)时使用）。首先第一问题：怎么才能知道每个单词发什么音呢？这就需要另外一个模块，叫做词典，看eesen的源码在数据准备阶段就是先求出词对应音素的词典，它的作用就是把单词串转化成音素串，然后再求得语言模型和训练声学模型（用lstm+ctc训练声学模型，lstm即long short-term memory，长短期记忆网络；ctc即connectionist temporal classification）。有了词典的帮助，声学模型就知道给定的文字串该依次发哪些音了。Turning to the acoustic model: its task is to compute P(X|W), i.e., the probability of producing this speech given the text (used later, via the Bayes formula, to obtain P(W|X)). The first question is: how do we know how each word is pronounced? This requires another module, called the dictionary. In the eesen source code, for example, the data preparation stage first builds the dictionary mapping words to phonemes; its function is to convert a word string into a phoneme string, after which the language model is obtained and the acoustic model is trained (with LSTM+CTC, where LSTM is a long short-term memory network and CTC is connectionist temporal classification). With the help of the dictionary, the acoustic model knows which sounds a given text string should produce in sequence.
302、根据语音信号获取对应的初步识别文本;302. Acquire a corresponding preliminary identification text according to the voice signal.
在本发明实施例中,计算机在接收语音信号之后,还可以根据语音信号获取对应的初步识别文本。即语音信号可以通过语音识别系统中的声学模型和通用语言模型,获取对应的初步识别文本。In the embodiment of the present invention, after receiving the voice signal, the computer may further acquire a corresponding preliminary identification text according to the voice signal. That is, the speech signal can obtain the corresponding preliminary identification text through the acoustic model and the general language model in the speech recognition system.
具体来说就是输入一段语音信号，要找到一个文字序列（由字或者词组成），找到的这个文字序列与语音信号的匹配程度最高。这个匹配程度，一般都是用概率来表示的，用X表示语音信号，用W表示文字序列，则要解的是下面这个问题：Specifically, a speech signal is input, and a text sequence (composed of characters or words) is to be found such that it matches the speech signal best. This degree of matching is generally expressed by probability: with X denoting the speech signal and W the text sequence, the problem to solve is the following:
W* = argmax_W P(W|X)
但是一般语音是由文字产生的,已知文字才能发出语音,所以对于上面的条件概率公式我们想要已知结果求该条件下发生概率,这时候自然而然就想到贝叶斯公式:However, the general speech is generated by words, and the known words can emit speech. Therefore, for the above conditional probability formula, we want to know the probability of occurrence under this condition. At this time, we naturally think of the Bayesian formula:
P(W|X) = P(X|W)P(W) / P(X)
由于我们要优化W，P(X)可以看作常数，可以省略分母。
Since we want to optimize W, P(X) can be regarded as a constant, and the denominator can be omitted.
由上边的步骤来看,求文字串、计算语言模型概率、求音素串、求音素分界点、计算声学模型概率几个步骤似乎是依次进行的。其实不然,在实际编码过程中,因为文字串、音素分界点都有非常多种可能,枚举是不现实的。实际中,这几个步骤同时进行并互相制约,随时砍掉不够优的可能,最终在可接受的时间内求出最优解,如下所示:From the above steps, the steps of finding the character string, calculating the language model probability, finding the phoneme string, finding the phoneme boundary point, and calculating the acoustic model probability seem to be sequential. In fact, in the actual coding process, because the text string and the phoneme boundary point have many kinds of possibilities, the enumeration is unrealistic. In practice, these steps are carried out at the same time and restrict each other, and the possibility of not being good enough is cut off at any time, and finally the optimal solution is obtained within an acceptable time, as follows:
W* = argmax_W P(W|X).
举个例子来说，对于音字转换问题，输入拼音nixianzaiganshenme，可能对应着很多转换结果，对于这个例子，可能的转换结果如下图4所示（只画出部分的词语节点），各节点之间构成了复杂的网络结构，从开始到结束的任意一条路径都是可能的转换结果，从诸多转换结果中选择最合适的结果的过程就需要解码算法。For example, for the pinyin-to-character conversion problem, the input pinyin nixianzaiganshenme may correspond to many conversion results. For this example, the possible conversion results are shown in FIG. 4 (only some of the word nodes are drawn); the nodes form a complex network structure, and any path from start to end is a possible conversion result. Selecting the most appropriate result from the many conversion results requires a decoding algorithm.
常用的解码算法是viterbi算法，它采用动态规划的原理能够很快地确定最合适的路径。A commonly used decoding algorithm is the Viterbi algorithm, which uses dynamic programming to quickly determine the most appropriate path.
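As a rough illustration of the lattice decoding described above, here is a minimal, hypothetical sketch (not the patent's implementation) of Viterbi decoding over a word lattice: for each candidate word at each position, only the best-scoring path ending in that word is kept, which is the dynamic-programming step. The candidate words and bigram probabilities used in the example are toy assumptions.

```python
import math

def viterbi(steps, trans_prob, start_prob):
    """Decode the best path through a word lattice.

    steps: list of candidate-word lists, one list per position.
    start_prob: dict word -> probability at the first position.
    trans_prob: dict (prev, cur) -> bigram probability P(cur | prev).
    Unseen entries fall back to a small floor probability."""
    FLOOR = 1e-8
    # best[w] = (log-probability of the best path ending in w, that path)
    best = {w: (math.log(start_prob.get(w, FLOOR)), [w]) for w in steps[0]}
    for candidates in steps[1:]:
        new_best = {}
        for cur in candidates:
            # keep only the highest-scoring predecessor for each candidate
            new_best[cur] = max(
                (score + math.log(trans_prob.get((prev, cur), FLOOR)), path + [cur])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]  # path ending in the overall best-scoring word
```

With toy probabilities in which 你→现在 and 现在→干什么 dominate 你→西安 and 西安→再赶, the decoder returns the path 你, 现在, 干什么, matching the intended reading of the example lattice.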
303、获取初步识别文本中的主题词,主题词为初步识别文本中关键信息的词,初步识别文本为根据语音信号识别得到的文本;303. Obtain a keyword in the preliminary identification text, where the keyword is a word that initially identifies key information in the text, and the preliminary identification text is a text that is identified according to the voice signal;
在本发明实施例中,计算机根据语音信号获取对应的初步识别文本之后,可以获取初步识别文本中的主题词,主题词为初步识别文本中关键信息的词。其中,主题词可以理解为该会议讨论的核心主题,也可以是会议报告的重心等。In the embodiment of the present invention, after the computer obtains the corresponding preliminary identification text according to the voice signal, the keyword in the preliminary identification text may be obtained, and the keyword is a word that initially identifies the key information in the text. Among them, the subject words can be understood as the core theme of the meeting discussion, or the focus of the meeting report.
获取初步识别文本中的主题词可以包括:根据初步识别文本按照公式1获取主题词,其中,公式1为:Obtaining the keyword in the preliminary identification text may include: obtaining the keyword according to formula 1 according to the preliminary identification text, wherein formula 1 is:
Score(i)=tf(i)*idf(i)   (公式1)Score(i)=tf(i)*idf(i) (Equation 1)
其中，i指初步识别文本中第i个词，tf(i)指第i个词在初步识别文本中出现的次数，idf(i)指第i个词在初步识别文本中的逆文档频率。Here, i refers to the i-th word in the preliminary recognition text, tf(i) refers to the number of times the i-th word appears in the preliminary recognition text, and idf(i) refers to the inverse document frequency of the i-th word.
进一步的,idf(i)为通过大量文本数据离线统计而得,计算idf(i)的公式2为:Further, idf(i) is obtained by offline statistics of a large amount of text data, and formula 2 for calculating idf(i) is:
idf(i) = log( |D| / |{ j : t_i ∈ d_j }| )    (公式2)
其中，|D|为文档集里的文档个数，d_j为第j个文档，t_i为第j个文档中的第i个词。Here, |D| is the number of documents in the document set, d_j is the j-th document, and t_i is the i-th word in the j-th document.
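Formulas 1 and 2 above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the +1 in the idf denominator is an added smoothing assumption to avoid division by zero for words absent from the offline document set, and the document set itself is hypothetical.

```python
import math
from collections import Counter

def idf(word, documents):
    """公式2 / Formula 2: idf = log(|D| / number of documents containing the word).
    The +1 in the denominator is an added smoothing assumption, so a word
    appearing in no document does not cause division by zero."""
    containing = sum(1 for d in documents if word in d)
    return math.log(len(documents) / (1 + containing))

def topic_words(text_words, documents, top_n=2):
    """公式1 / Formula 1: Score(i) = tf(i) * idf(i); return the top-n words."""
    tf = Counter(text_words)
    scores = {w: tf[w] * idf(w, documents) for w in tf}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A word that is frequent in the preliminary recognition text but rare in the offline document collection, such as a domain term, thus scores highest and is selected as the topic word.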
对于主题词的提取还可以基于TextRank算法,即关键词抽取的任务就是从一段给定的文本中自动抽取出若干有意义的词语或词组。TextRank算法是利用局部词汇之间关系(共现窗口)对后续关键词进行排序,直接从文本本身抽取。其主要步骤如下:The extraction of the topic words can also be based on the TextRank algorithm, that is, the task of keyword extraction is to automatically extract a number of meaningful words or phrases from a given text. The TextRank algorithm uses the relationship between local vocabularies (co-occurrence window) to sort subsequent keywords and extract them directly from the text itself. The main steps are as follows:
(1)把给定的文本T按照完整句子进行分割，即T=[S_1,S_2,...,S_m]；(1) Divide the given text T into complete sentences, i.e., T = [S_1, S_2, ..., S_m];
(2)对于每个句子S_i∈T，进行分词和词性标注处理，并过滤掉停用词，只保留指定词性的单词，如名词、动词、形容词，即S_i=[t_{i,1},t_{i,2},...,t_{i,n}]，其中t_{i,j}∈S_i是保留后的候选关键词；(2) For each sentence S_i ∈ T, perform word segmentation and part-of-speech tagging, filter out stop words, and keep only words with the specified parts of speech, such as nouns, verbs, and adjectives, i.e., S_i = [t_{i,1}, t_{i,2}, ..., t_{i,n}], where t_{i,j} ∈ S_i is a retained candidate keyword;
(3)构建候选关键词图G=(V,E)，其中V为节点集，由步骤(2)生成的候选关键词组成，然后采用共现关系（co-occurrence）构造任两点之间的边：两个节点之间存在边仅当它们对应的词汇在长度为K的窗口中共现，K表示窗口大小，即最多共现K个单词；(3) Construct the candidate keyword graph G=(V,E), where the node set V consists of the candidate keywords generated in step (2); then use co-occurrence relations to construct the edges: an edge exists between two nodes only if their corresponding words co-occur within a window of length K, where K is the window size, i.e., at most K words co-occur;
(4)根据上面公式,迭代传播各节点的权重,直至收敛;(4) Iteratively propagate the weights of each node according to the above formula until convergence;
(5)对节点权重进行倒序排序,从而得到最重要的T个单词,作为候选关键词;(5) Sorting the node weights in reverse order to obtain the most important T words as candidate keywords;
(6)由步骤(5)得到最重要的T个单词,在原始文本中进行标记,若形成相邻词组,则组合成多词关键词。例如,文本中有句子“Matlab code for plotting ambiguity function”,如果“Matlab”和“code”均属于候选关键词,则组合成“Matlab code”加入关键词序列。(6) The most important T words are obtained from step (5), marked in the original text, and if adjacent phrases are formed, combined into multi-word keywords. For example, there is a sentence in the text "Matlab code for plotting ambiguity function". If both "Matlab" and "code" belong to candidate keywords, then the combination of "Matlab code" is added to the keyword sequence.
其中，对于TextRank源码解析如下所示：读入文本并切词，对切词结果统计共现关系，窗口默认为5，保存在cm中。The TextRank source code works as follows: read in the text and segment it into words, count co-occurrence relations over the segmentation result with a default window of 5, and save them in cm.
Figure PCTCN2018077413-appb-000004（TextRank源码片段 / TextRank source code snippet）
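Since the referenced source snippet survives only as an image, here is a minimal, hypothetical sketch of steps (3)-(5) of the TextRank procedure above: build a co-occurrence graph over the candidate words with window size K, iterate the PageRank-style weight update with damping factor d = 0.85 until approximately converged, and return the highest-weighted words. The variable names are illustrative, not taken from the original snippet.

```python
from collections import defaultdict

def textrank_keywords(words, window=5, d=0.85, iterations=30, top_n=2):
    """Step (3): co-occurrence graph within a window of length K=window.
    Step (4): PageRank-style weight iteration with damping factor d.
    Step (5): return the top-n highest-weighted words."""
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:                 # no self-loops
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):               # iterate toward convergence
        score = {w: (1 - d) + d * sum(score[v] / len(neighbors[v])
                                      for v in neighbors[w])
                 for w in neighbors}
    return sorted(score, key=score.get, reverse=True)[:top_n]
```

Words that co-occur with many other candidate words accumulate the highest weight, which is why well-connected terms like "Matlab" and "code" in the example sentence would surface as keywords.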
需要说明的是,对于主题词的提取包括但不限于上述提及的几种实现方式,而且,计算机获取的主题词的数量不做限定。It should be noted that the extraction of the keyword includes, but is not limited to, the several implementations mentioned above, and the number of keywords obtained by the computer is not limited.
在语音识别系统中，经常会遇到这样的需求：将大量（比如几十万、甚至上百万）的对象进行排序，然后只需要取出最Top的前N名作为排行榜的数据，这即是一个TopN算法。常见的解决方案有三种：In a speech recognition system, the following requirement often arises: sort a large number (for example, hundreds of thousands or even millions) of objects and then take only the top N as the ranking data; this is a top-N algorithm. There are three common solutions:
(1)直接使用List的Sort方法进行处理。(1) Directly use the Sort method of the List for processing.
(2)使用排序二叉树进行排序,然后取出前N名。(2) Sort using the sort binary tree, and then take the top N names.
(3)使用最大堆排序,然后取出前N名。(3) Use the maximum heap sort, then remove the top N.
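Solution (3), the heap-based approach, can be sketched as follows; this is an illustrative sketch rather than the patent's implementation. It scans the objects once and keeps only the current best N in a min-heap, which is why it scales to hundreds of thousands of objects: the cost is O(M log N) rather than O(M log M) for a full sort.

```python
import heapq

def top_n(items, n, key=lambda x: x):
    """Keep a size-n min-heap while scanning: the heap root is always the
    smallest of the current top n, so any new item beating it replaces it."""
    heap = []
    for item in items:
        entry = (key(item), item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)   # evict the current minimum
    return [item for _, item in sorted(heap, reverse=True)]
```

The same routine can rank candidate topic words by passing their Score(i) values through the key function.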
304、根据主题词获取目标相关信息,目标相关信息为与主题词对应的上下文信息;304. Obtain target related information according to the keyword, and the target related information is context information corresponding to the keyword;
在本发明实施例中,计算机获取初步识别文本中的主题词之后,可以根据主题词获取目标相关信息,目标相关信息为与主题词对应的上下文信息。In the embodiment of the present invention, after the computer obtains the keyword in the preliminary identification text, the target related information may be acquired according to the keyword, and the target related information is context information corresponding to the keyword.
根据主题词获取目标相关信息,可以包括:Obtaining target related information based on the keyword can include:
(1)根据主题词通过全网搜索获取目标相关信息。(1) Obtain target related information through the whole network search according to the keyword.
(2)在预置的相关信息集合中,提取与主题词对应的目标相关信息。(2) Extracting target related information corresponding to the keyword in the preset related information set.
进一步的,根据主题词通过全网搜索获取目标相关信息,可以包括:根据主题词通过全网搜索,获取对应的搜索结果;将搜索结果进行匹配,确定目标相关信息。Further, obtaining the target related information through the whole network search according to the keyword may include: searching for the corresponding search result according to the keyword through the whole network; matching the search result to determine the target related information.
需要说明的是,这里的目标相关信息简单的可以理解为搜索主题词显示的页面上的每篇文章的题目、或者每篇文章的摘要、或者每篇文章的所有内容。但是,应理解,若目标相关信息为每篇文章的所有内容的话,消耗的资源比较大。It should be noted that the target related information herein can be simply understood as the title of each article on the page displayed by the search keyword, or the abstract of each article, or all the contents of each article. However, it should be understood that if the target-related information is all the content of each article, the resources consumed are relatively large.
示例性的,若获取的主题词为滤波器,这里显示的滤波器可以是正确的词组,也可以不是正确的词组,计算机可以自动通过一些搜索软件获取与滤波器相关的页面内容,例如显示的是高通滤波器、低通滤波器、带通滤波器和带阻滤波器的超链接内容,可以把这些超链接的标题或者每个超链接内容中的摘要作为主题词"滤波器"的目标相关信息。Exemplarily, if the obtained keyword is "filter" (which, as displayed, may or may not be a correct phrase), the computer can automatically obtain filter-related page content through some search software, for example hyperlinked content about high-pass filters, low-pass filters, band-pass filters and band-stop filters; the titles of these hyperlinks, or the abstract of each hyperlinked page, can then be used as the target related information for the keyword "filter".
305、根据目标相关信息建立目标语言库。305. Establish a target language library according to the target related information.
在本发明实施例中,计算机根据主题词获取目标相关信息之后,再根据目标相关信息建立目标语言库。具体的,可以包括:根据目标相关信息进行训练,建立目标语言库。应理解,这里的目标语言库是建立的关于本次会议的主题或者本次报告核心的领域语言模型。即可以对目标相关信息进行过滤清洗、领域匹配等一系列操作,进行训练等得到领域语言模型。示例性的,可以根据高通滤波器、低通滤波器、带通滤波器和带阻滤波器超链接内容中的摘要信息,进行训练,得到关于滤波器领域的语言模型,并将关于滤波器领域的语言模型添加在上述图2所示的通用语言模型中。In the embodiment of the present invention, after the computer acquires the target related information according to the keyword, it establishes the target language library according to the target related information. Specifically, this may include: training on the target related information to build the target language library. It should be understood that the target language library here is a domain language model built around the topic of the current conference or the core subject of the current report. That is, a series of operations such as filtering, cleaning and domain matching can be performed on the target related information, which is then used for training to obtain the domain language model. Exemplarily, training can be performed on the summary information in the hyperlinked content about high-pass, low-pass, band-pass and band-stop filters to obtain a language model for the filter domain, and this filter-domain language model is added to the general language model shown in Figure 2 above.
那么,在后续的语音识别中,再出现关于滤波器的相关信息,都会先在语音识别系统中进行识别,因为语音识别系统中之前有添加关于滤波器领域的语言模型,所以,计算机可以准确的识别,具体可以识别出是否是高通滤波器、低通滤波器、带通滤波器或者带阻滤波器。Then, in subsequent speech recognition, whenever information about filters appears again, it is first recognized in the speech recognition system; because a language model for the filter domain has already been added to the speech recognition system, the computer can recognize it accurately, specifically identifying whether it is a high-pass filter, a low-pass filter, a band-pass filter or a band-stop filter.
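One common way to combine the filter-domain model with the general model of Figure 2 is linear interpolation of the two word probabilities; the embodiment does not specify how the models are combined, so the interpolation weight and probabilities below are illustrative assumptions:

```python
def interpolate(p_general, p_domain, lam=0.3):
    """Linearly interpolate a general LM probability with a domain LM probability.

    lam is the weight on the domain model; 0.3 is an arbitrary choice here,
    in practice it would be tuned on held-out data.
    """
    return lam * p_domain + (1 - lam) * p_general

# the domain model boosts a continuation the general model finds unlikely,
# e.g. P("滤波器" | "带阻") after the filter-domain model has been added
print(round(interpolate(p_general=0.001, p_domain=0.200), 4))  # → 0.0607
```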
本发明实施例中可以使用Ngram统计语言模型,n-gram模型也称为n-1阶马尔科夫模型,它有一个有限历史假设:当前词的出现概率仅仅与前面n-1个词相关。因此P(S)可以近似为:The n-gram statistical language model can be used in the embodiment of the present invention. The n-gram model is also called an (n-1)-order Markov model. It makes a limited-history assumption: the probability of the current word depends only on the preceding n-1 words. Therefore P(S) can be approximated as:
P(S) ≈ ∏_{i=1}^{m} P(W_i | W_{i-n+1}, ..., W_{i-1})
当n取1、2、3时,n-gram模型分别称为unigram、bigram和trigram语言模型。n-gram模型的参数就是条件概率P(W_i|W_{i-n+1},...,W_{i-1})。假设词表的大小为100000,那么n-gram模型的参数数量为100000的n次方。n越大,模型越准确,也越复杂,需要的计算量越大。本发明实施例中以选用的n为3为例来进行说明,即trigram语言模型。更详细一点,ngram语言模型也就是上述P(S)模型一般通过最大似然估计进行参数估计,各类模型算法的不同之处往往在于使用何种数据平滑算法来解决当n增大后的数据稀疏问题(即要解决上述概率公式展开后由于某项在语料中统计频次趋近于0,而带来的整个P(S)趋0的问题)。本发明实施例可以使用Katz平滑算法,相应的业界还存在加法平滑、Good-Turing平滑,插值平滑等不同算法。When n is 1, 2 or 3, the n-gram model is called a unigram, bigram or trigram language model, respectively. The parameters of the n-gram model are the conditional probabilities P(W_i | W_{i-n+1}, ..., W_{i-1}). Assuming a vocabulary size of 100,000, the n-gram model has 100,000^n parameters. The larger n is, the more accurate but also the more complex the model, and the greater the amount of computation required. The embodiment of the present invention takes n=3 as an example, i.e. the trigram language model. In more detail, the n-gram language model, i.e. the P(S) model above, generally estimates its parameters by maximum likelihood estimation; different modeling algorithms usually differ in which data smoothing algorithm they use to handle the data sparsity problem as n grows (i.e. the problem that, after the probability formula above is expanded, the whole P(S) tends to 0 because the corpus count of some term approaches 0). The embodiment of the present invention may use the Katz smoothing algorithm; other algorithms such as additive smoothing, Good-Turing smoothing and interpolation smoothing also exist in the industry.
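A minimal trigram model in this spirit can be sketched as follows; for brevity it uses add-one (Laplace) smoothing as a simplified stand-in for the Katz smoothing mentioned above, and the toy corpus is hypothetical:

```python
from collections import defaultdict

def train_trigram(corpus_sentences):
    """Count trigrams and their bigram contexts for a trigram model.

    Each sentence is padded with <s> markers so the first real word has a
    full two-word history, matching P(W_i | W_{i-2}, W_{i-1}).
    """
    tri = defaultdict(int)
    bi = defaultdict(int)
    vocab = set()
    for words in corpus_sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[(padded[i - 2], padded[i - 1], padded[i])] += 1
            bi[(padded[i - 2], padded[i - 1])] += 1
    return tri, bi, vocab

def prob(tri, bi, vocab, w1, w2, w3):
    """Add-one smoothed P(w3 | w1, w2); Katz back-off is more involved,
    this simpler estimator stands in for it."""
    return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + len(vocab))

corpus = [["低通", "滤波器"], ["高通", "滤波器"], ["带通", "滤波器"]]
tri, bi, vocab = train_trigram(corpus)
# a continuation seen in training gets a higher probability than an unseen one
assert prob(tri, bi, vocab, "<s>", "低通", "滤波器") > prob(tri, bi, vocab, "<s>", "低通", "高通")
```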
在本发明实施例中,获取初步识别文本中的主题词,该主题词为该初步识别文本中关键信息的词,该初步识别文本为根据语音信号识别得到的文本;根据该主题词获取目标相关信息,该目标相关信息为与该主题词对应的上下文信息;根据该目标相关信息建立目标语言库。用户在使用计算机的过程中,计算机可以接收语音信号,根据语音信号获取对应的初步识别文本,再根据初步识别文本获取主题词,然后根据该主题词获取目标相关信息,可以根据相关信息建立目标语言库,目标语言库用于在根据下次接收的语音信号获取的识别文本中,对于该主题词或者该主题词相关的主题词的识别就会显示的是准确的识别,提高了语音识别的准确率。In the embodiment of the present invention, the keyword in the preliminary identification text is acquired, where the keyword is a word of key information in the preliminary identification text and the preliminary identification text is text recognized from a speech signal; target related information is acquired according to the keyword, where the target related information is context information corresponding to the keyword; and the target language library is established according to the target related information. While the user is using the computer, the computer can receive a speech signal, obtain the corresponding preliminary identification text from it, extract the keyword from that text, obtain target related information according to the keyword, and build the target language library from that information. The target language library is then used, in the recognized text obtained from the next received speech signal, to recognize the keyword, or keywords related to it, accurately, which improves the accuracy of speech recognition.
下面以实际应用场景对本发明实施例中语音识别的方法进行具体说明,如下所示:The method for voice recognition in the embodiment of the present invention is specifically described below in the actual application scenario, as follows:
假设秋香是一位播音主持,在节目中需要读取一篇文章,这篇长文为《中国的蜜蜂养殖》,我们准备用计算机提取它的关键词。一个容易想到的思路,就是找到出现次数最多的词。如果某个词很重要,它应该在这篇文章中多次出现。于是,我们进行"词频"(Term Frequency,缩写为TF)统计。Suppose Qiu Xiang is a broadcast host who needs to read an article on her program, a long piece titled "Bee Farming in China", and we want a computer to extract its keywords. An easy idea is to find the words that occur most often: if a word is important, it should appear many times in the article. So we compute "term frequency" (TF) statistics.
结果大家肯定猜到了,出现次数最多的词是:"的"、"是"、"在"等这一类最常用的词。它们叫做"停用词"(stop words),表示对找到结果毫无帮助、必须过滤掉的词。As you have surely guessed, the most frequent words turn out to be the most common function words such as "的", "是" and "在". These are called "stop words": words that are of no help in finding the result and must be filtered out.
假设我们把它们都过滤掉了,只考虑剩下的有实际意义的词。这样又会遇到了另一个问题,我们可能发现"中国"、"蜜蜂"、"养殖"这三个词的出现次数一样多。这是不是意味着,作为关键词,它们的重要性是一样的?Suppose we filter them all out and only consider the remaining meaningful words. We then run into another problem: we may find that the three words "China", "bee" and "farming" appear equally often. Does this mean that, as keywords, they are equally important?
显然不是这样。因为"中国"是很常见的词,相对而言,"蜜蜂"和"养殖"不那么常见。如果这三个词在一篇文章的出现次数一样多,有理由认为,"蜜蜂"和"养殖"的重要程度要大于"中国",也就是说,在关键词排序上面,"蜜蜂"和"养殖"应该排在"中国"的前面。Obviously not. "China" is a very common word, while "bee" and "farming" are comparatively rare. If these three words appear the same number of times in an article, there is reason to believe that "bee" and "farming" are more important than "China"; that is, in the keyword ranking, "bee" and "farming" should come before "China".
所以,我们需要一个重要性调整系数,衡量一个词是不是常见词。如果某个词比较少见,但是它在这篇文章中多次出现,那么它很可能就反映了这篇文章的特性,正是我们所需要的关键词。Therefore, we need an importance adjustment factor to measure whether a word is a common word. If a word is rare, but it appears multiple times in this article, then it is likely to reflect the characteristics of this article, which is exactly what we need.
用统计学语言表达,就是在词频的基础上,要对每个词分配一个"重要性"权重。最常见的词("的"、"是"、"在")给予最小的权重,较常见的词("中国")给予较小的权重,较少见的词("蜜蜂"、"养殖")给予较大的权重。这个权重叫做"逆文档频率"(Inverse Document Frequency,缩写为IDF),它的大小与一个词的常见程度成反比。Expressed in statistical terms: on top of term frequency, each word is assigned an "importance" weight. The most common words ("的", "是", "在") get the smallest weight, fairly common words ("China") get a smaller weight, and rarer words ("bee", "farming") get a larger weight. This weight is called the "inverse document frequency" (IDF), and its size is inversely proportional to how common a word is.
知道了“词频”(TF)和“逆文档频率”(IDF)以后,将这两个值相乘,就得到了一个词的TF-IDF值。某个词对文章的重要性越高,它的TF-IDF值就越大。所以,排在最前面的几个词,就是这篇文章的关键词。After knowing the word frequency (TF) and the "inverse document frequency" (IDF), multiplying these two values yields the TF-IDF value of a word. The higher the importance of a word to an article, the greater its TF-IDF value. Therefore, the first few words are the key words of this article.
第一步,计算词频;The first step is to calculate the word frequency;
词频(TF)=某个词在文章中的出现次数Word Frequency (TF) = number of occurrences of a word in an article
考虑到文章有长短之分,为了便于不同文章的比较,进行"词频"标准化。Considering that articles vary in length, the term frequency is normalized to make different articles comparable.
词频(TF)=某个词在文章中的出现次数/文章的总词数Word Frequency (TF) = number of occurrences of a word in an article / total number of words in an article
或者,or,
词频(TF)=某个词在文章中的出现次数/该文出现次数最多的词的出现次数Word Frequency (TF) = number of occurrences of a word in an article / number of occurrences of the word with the most occurrences of the article
第二步,计算逆文档频率;The second step is to calculate the inverse document frequency;
这时,需要一个语料库(corpus),用来模拟语言的使用环境。At this time, a corpus is needed to simulate the language usage environment.
逆文档频率(IDF)=log(语料库的文档总数/(包含该词的文档数+1))Inverse Document Frequency (IDF) = log (total number of documents in the corpus / (number of documents containing the word + 1))
如果一个词越常见,那么分母就越大,逆文档频率就越小越接近0。分母之所以要加1,是为了避免分母为0(即所有文档都不包含该词)。log表示对得到的值取对数。The more common a word is, the larger the denominator and the smaller the inverse document frequency, approaching 0. The denominator has 1 added to it to avoid a zero denominator (i.e. the case where no document contains the word). log denotes taking the logarithm of the resulting value.
第三步,计算TF-IDF。The third step is to calculate the TF-IDF.
TF-IDF=词频(TF)*逆文档频率(IDF)TF-IDF=Word Frequency (TF)* Inverse Document Frequency (IDF)
可以看到,TF-IDF与一个词在文档中的出现次数成正比,与该词在整个语言中的出现次数成反比。所以,自动提取关键词的算法就很清楚了,就是计算出文档的每个词的TF-IDF值,然后按降序排列,取排在最前面的几个词。It can be seen that TF-IDF is proportional to the number of occurrences of a word in the document and inversely proportional to the number of occurrences of the word in the entire language. Therefore, the algorithm for automatically extracting keywords is very clear, that is, the TF-IDF value of each word of the document is calculated, and then arranged in descending order, taking the top words.
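The three steps above can be sketched end to end as follows; the stop-word handling and the toy background corpus are illustrative assumptions, not data from the embodiment:

```python
import math
from collections import Counter

def extract_keywords(doc_words, corpus_docs, stop_words=frozenset(), top_n=3):
    """Rank the words of one document by TF-IDF against a background corpus.

    TF  = count(word in doc) / total words in doc     (first normalization above)
    IDF = log10(N_docs / (docs containing word + 1))  (formula from step two)
    """
    words = [w for w in doc_words if w not in stop_words]
    total = len(words)
    tf = {w: c / total for w, c in Counter(words).items()}
    n_docs = len(corpus_docs)
    scores = {}
    for w in tf:
        df = sum(1 for d in corpus_docs if w in d)  # document frequency
        idf = math.log10(n_docs / (df + 1))
        scores[w] = tf[w] * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# toy corpus: "中国" appears in most documents, "蜜蜂" and "养殖" in few
corpus = [{"中国"}, {"中国"}, {"中国"}, {"中国"},
          {"中国", "蜜蜂"}, {"中国", "养殖"}, {"中国", "养殖"}, {"新闻"}]
print(extract_keywords(["中国", "蜜蜂", "养殖"], corpus, top_n=2))  # → ['蜜蜂', '养殖']
```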
还是以《中国的蜜蜂养殖》为例,假定该文长度为1000个词,"中国"、"蜜蜂"、"养殖"各出现20次,则这三个词的"词频"(TF)都为0.02。然后,搜索Google发现,包含"的"字的网页共有250亿张,假定这就是中文网页总数。包含"中国"的网页共有62.3亿张,包含"蜜蜂"的网页为0.484亿张,包含"养殖"的网页为0.973亿张。则它们的逆文档频率(IDF)和TF-IDF如下表1所示:Take "Bee Farming in China" again as an example. Suppose the article is 1,000 words long and "China", "bee" and "farming" each appear 20 times; the term frequency (TF) of each of the three words is then 0.02. A Google search then finds 25 billion web pages containing the character "的"; assume this is the total number of Chinese web pages. 6.23 billion pages contain "China", 48.4 million pages contain "bee", and 97.3 million pages contain "farming". Their inverse document frequencies (IDF) and TF-IDF values are shown in Table 1 below:
中国 (China): TF 0.02, IDF 0.603, TF-IDF 0.0121
蜜蜂 (bee): TF 0.02, IDF 1.713, TF-IDF 0.0343
养殖 (farming): TF 0.02, IDF 1.410, TF-IDF 0.0282
(表中数值按以10为底的对数计算。The values assume a base-10 logarithm.)
表1Table 1
从上述表1可见,"蜜蜂"的TF-IDF值最高,"养殖"其次,"中国"最低。(如果还计算"的"字的TF-IDF,那将是一个极其接近0的值。)所以,如果只选择一个词,"蜜蜂"就是这篇文章的关键词。As Table 1 above shows, "bee" has the highest TF-IDF value, "farming" comes second, and "China" is lowest. (If the TF-IDF of "的" were also calculated, it would be a value extremely close to 0.) So if only one word is to be chosen, "bee" is the keyword of this article.
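The figures in Table 1 follow directly from the formulas above when the logarithm is taken as base 10; a few lines of Python reproduce them:

```python
import math

total_words = 1000
occurrences = 20
tf = occurrences / total_words  # 0.02 for each of the three words

total_pages = 25e9  # assumed total number of Chinese web pages
pages_containing = {"中国": 6.23e9, "蜜蜂": 0.484e9, "养殖": 0.973e9}

for word, df in pages_containing.items():
    idf = math.log10(total_pages / (df + 1))  # +1 as in the IDF formula above
    print(word, round(idf, 3), round(tf * idf, 4))
# → 中国 0.603 0.0121
# → 蜜蜂 1.713 0.0343
# → 养殖 1.41 0.0282
```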
除了自动提取关键词,TF-IDF算法还可以用于许多别的地方。比如,信息检索时,对于每个文档,都可以分别计算一组搜索词("中国"、"蜜蜂"、"养殖")的TF-IDF,将它们相加,就可以得到整个文档的TF-IDF。这个值最高的文档就是与搜索词最相关的文档。Besides automatic keyword extraction, the TF-IDF algorithm can be used in many other places. For example, in information retrieval, the TF-IDF of each search term ("China", "bee", "farming") can be computed for every document and summed to give the TF-IDF of the whole document for that query. The document with the highest value is the one most relevant to the search terms.
所以,这里的"蜜蜂"和"养殖"作为主题词进行搜索,获取关于"蜜蜂"和"养殖"的上下文信息,将搜索到的这些上下文信息训练得到蜜蜂养殖领域的语言模型。Therefore, "bee" and "farming" here are used as keywords for searching, context information about "bee" and "farming" is obtained, and the retrieved context information is used for training to obtain a language model for the bee-farming domain.
等到后续的文章中再出现关于蜜蜂养殖的过程中,出现关于蜜蜂养殖的相关的语音识别时,就可以通过蜜蜂养殖领域的语言模型准确的进行识别了。Then, when speech about bee farming appears in subsequent material, it can be recognized accurately through the bee-farming domain language model.
上面对本发明实施例中语音识别的方法进行了描述,下面对本发明实施例中的计算机进行说明,如图5所示,为本发明实施例中计算机的一个实施例示意图,包括:The method for the voice recognition in the embodiment of the present invention is described above. The following describes the computer in the embodiment of the present invention. As shown in FIG. 5, it is a schematic diagram of an embodiment of a computer in the embodiment of the present invention, including:
第一获取模块501,用于获取初步识别文本中的主题词,主题词为初步识别文本中关键信息的词,初步识别文本为根据语音信号识别得到的文本;The first obtaining module 501 is configured to acquire a keyword in the preliminary identification text, where the keyword is a word of key information in the preliminary identification text and the preliminary identification text is text recognized from a speech signal;
第二获取模块502,用于根据主题词获取目标相关信息,目标相关信息为与主题词对应的上下文信息;The second obtaining module 502 is configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
建立模块503,用于根据目标相关信息建立目标语言库。The establishing module 503 is configured to establish a target language library according to the target related information.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
第一获取模块501,具体用于根据初步识别文本按照公式1获取主题词,其中,公式1为:The first obtaining module 501 is specifically configured to obtain the keyword according to the formula 1 according to the preliminary identification text, where the formula 1 is:
Score(i)=tf(i)*idf(i),其中,i指初步识别文本中第i个词,tf(i)指第i个词在初步识别文本中出现的次数,idf(i)指第i个词在初步识别文本中的逆文档频率。Score(i)=tf(i)*idf(i), where i refers to the i-th word in the preliminary identification text, tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
可选的,在本发明的一些实施例中,在上述图5所示的基础上,如图6所示,为本发明实施例中计算机的另一个实施例示意图,计算机还包括:Optionally, in some embodiments of the present invention, based on the foregoing FIG. 5, as shown in FIG. 6, which is a schematic diagram of another embodiment of a computer in the embodiment of the present invention, the computer further includes:
接收模块504,用于接收语音信号;The receiving module 504 is configured to receive a voice signal;
第三获取模块505,用于根据语音信号获取对应的初步识别文本。The third obtaining module 505 is configured to obtain a corresponding preliminary identification text according to the voice signal.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
第二获取模块502,具体用于根据主题词通过全网搜索获取目标相关信息。The second obtaining module 502 is specifically configured to obtain target related information by searching through the entire network according to the keyword.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
第二获取模块502,具体还用于根据主题词通过全网搜索,获取对应的搜索结果;将搜索结果进行匹配,确定目标相关信息。The second obtaining module 502 is further configured to: obtain a corresponding search result by searching through the entire network according to the keyword, and match the search result to determine the target related information.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
第二获取模块502,具体还用于在预置的相关信息集合中,提取与主题词对应的目标相关信息。The second obtaining module 502 is further configured to extract target related information corresponding to the keyword in the preset related information set.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
建立模块503,具体用于根据目标相关信息进行训练,建立目标语言库。The establishing module 503 is specifically configured to perform training according to the target related information to establish a target language library.
本发明实施例还提供一种存储介质,所述存储介质中存储有计算机程序,其中,所述计算机程序被设置为运行时执行上述的方法。The embodiment of the invention further provides a storage medium, wherein the storage medium stores a computer program, wherein the computer program is set to execute the above method when it is running.
本发明实施例还提供一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为通过所述计算机程序执行上述的方法。该电子装置可以是图7所示的计算机,处理器可以是图7所示的中央处理器。Embodiments of the present invention also provide an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above method by the computer program. The electronic device may be the computer shown in FIG. 7, and the processor may be the central processor shown in FIG.
如图7所示,为本发明中计算机的另一个实施例示意图。FIG. 7 is a schematic diagram of another embodiment of a computer in the present invention.
计算机700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)722(例如,一个或一个以上处理器)和存储器732,一个或一个以上存储应用程序742或数据744的存储介质730(例如一个或一个以上海量存储设备)。其中,存储器732和存储介质730可以是短暂存储或持久存储。存储在存储介质730的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对计算机中的一系列指令操作。更进一步地,中央处理器722可以设置为与存储介质730通信,在计算机700上执行存储介质730中的一系列指令操作。Computer 700 may vary considerably by configuration or performance, and may include one or more central processing units (CPUs) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 storing application programs 742 or data 744 (e.g., one or more mass storage devices). The memory 732 and the storage medium 730 may be transitory or persistent storage. The program stored on the storage medium 730 may include one or more modules (not shown), and each module may include a series of instruction operations on the computer. Still further, the central processor 722 may be configured to communicate with the storage medium 730 and to execute, on the computer 700, the series of instruction operations in the storage medium 730.
计算机700还可以包括一个或一个以上电源726,一个或一个以上有线或无线网络接口750,一个或一个以上输入输出接口758,和/或,一个或一个以上操作系统741,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。Computer 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, and/or one or more operating systems 741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so on.
在本发明实施例中,中央处理器722还用于执行以下功能:用于获取初步识别文本中的主题词,主题词为初步识别文本中关键信息的词,初步识别文本为根据语音信号识别得到的文本;根据主题词获取目标相关信息,目标相关信息为与主题词对应的上下文信息;根据目标相关信息建立目标语言库。In the embodiment of the present invention, the central processing unit 722 is further configured to perform the following functions: acquiring a keyword in the preliminary identification text, where the keyword is a word of key information in the preliminary identification text and the preliminary identification text is text recognized from a speech signal; acquiring target related information according to the keyword, where the target related information is context information corresponding to the keyword; and establishing a target language library according to the target related information.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
中央处理器722,具体用于根据初步识别文本按照公式1获取主题词,其中,公式1为:The central processing unit 722 is specifically configured to obtain the keyword according to the formula 1 according to the preliminary identification text, where the formula 1 is:
Score(i)=tf(i)*idf(i),其中,i指初步识别文本中第i个词,tf(i)指第i个词在初步识别文本中出现的次数,idf(i)指第i个词在初步识别文本中的逆文档频率。Score(i)=tf(i)*idf(i), where i refers to the i-th word in the preliminary identification text, tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
中央处理器722,还用于接收语音信号;根据语音信号获取对应的初步识别文本。The central processing unit 722 is further configured to receive a voice signal, and obtain a corresponding preliminary identification text according to the voice signal.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
中央处理器722,具体用于根据主题词通过全网搜索获取目标相关信息。The central processing unit 722 is specifically configured to obtain target related information through a full network search according to the keyword.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
中央处理器722,具体还用于根据主题词通过全网搜索,获取对应的搜索结果;将搜索结果进行匹配,确定目标相关信息。The central processing unit 722 is specifically configured to obtain a corresponding search result by searching through the entire network according to the keyword, and matching the search result to determine the target related information.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
中央处理器722,具体还用于在预置的相关信息集合中,提取与主题词对应的目标相关信息。The central processing unit 722 is further configured to extract target related information corresponding to the keyword in the preset related information set.
可选的,在本发明的一些实施例中,Optionally, in some embodiments of the invention,
中央处理器722,具体用于根据目标相关信息进行训练,建立目标语言库。The central processing unit 722 is specifically configured to perform training according to the target related information to establish a target language library.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and in actual implementation there may be other manners of division, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对相关技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention in essence, or the part of it that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
以上所述,以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。The above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (16)

  1. 一种语音识别的方法,包括:A method for speech recognition, comprising:
    获取初步识别文本中的主题词,所述主题词为所述初步识别文本中关键信息的词,所述初步识别文本为根据语音信号识别得到的文本;Obtaining a keyword in the preliminary identification text, the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is a text recognized according to the voice signal;
    根据所述主题词获取目标相关信息,所述目标相关信息为与所述主题词对应的上下文信息;Obtaining target related information according to the keyword, the target related information being context information corresponding to the keyword;
    根据所述目标相关信息建立目标语言库;Establishing a target language library according to the target related information;
    在下次接收的语音信号获取的识别文本中利用所述目标语言库对所述主题词或者所述主题词相关的主题词进行识别。The keyword or the keyword related to the keyword is identified by the target language library in the identification text acquired by the next received voice signal.
  2. 根据权利要求1所述的方法,其中,所述获取初步识别文本中的主题词,包括:The method of claim 1, wherein the obtaining the keyword in the preliminary identification text comprises:
    根据所述初步识别文本按照公式1获取所述主题词,其中,所述公式1为:Obtaining the keyword according to formula 1 according to the preliminary identification text, wherein the formula 1 is:
    Score(i)=tf(i)*idf(i),其中,i指所述初步识别文本中第i个词,tf(i)指第i个词在所述初步识别文本中出现的次数,idf(i)指第i个词在所述初步识别文本中的逆文档频率。Score(i)=tf(i)*idf(i), where i refers to the i-th word in the preliminary identification text, tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
  3. 根据权利要求1所述的方法,其中,所述获取初步识别文本中的主题词之前,所述方法还包括:The method of claim 1, wherein before the obtaining the keyword in the preliminary identification text, the method further comprises:
    接收语音信号;Receiving a voice signal;
    根据所述语音信号获取对应的初步识别文本。Corresponding preliminary identification text is obtained according to the voice signal.
  4. 根据权利要求1-3任一所述的方法,其中,所述根据所述主题词获取目标相关信息,包括:The method according to any one of claims 1-3, wherein the obtaining target related information according to the keyword includes:
    根据所述主题词通过全网搜索获取所述目标相关信息。Obtaining the target related information through a full network search according to the keyword.
  5. 根据权利要求4所述的方法,其中,所述根据所述主题词通过全网搜索获取所述目标相关信息,包括:The method according to claim 4, wherein the obtaining the target related information through a full network search according to the keyword includes:
    根据所述主题词通过全网搜索,获取对应的搜索结果;Searching through the entire network according to the keyword to obtain a corresponding search result;
    将所述搜索结果进行匹配,确定所述目标相关信息。The search results are matched to determine the target related information.
  6. 根据权利要求1-3任一所述的方法,其中,所述根据所述主题词获取目标相关信息,包括:The method according to any one of claims 1-3, wherein the obtaining target related information according to the keyword includes:
    在预置的相关信息集合中,提取与所述主题词对应的目标相关信息。In the preset related information set, target related information corresponding to the keyword is extracted.
  7. 根据权利要求1-3任一所述的方法,其中,所述根据所述目标相关信息建立目标语言库,包括:The method according to any one of claims 1-3, wherein the establishing a target language library according to the target related information comprises:
    根据所述目标相关信息进行训练,建立所述目标语言库。Training is performed according to the target related information to establish the target language library.
  8. 一种计算机,包括:A computer comprising:
    第一获取模块,用于获取初步识别文本中的主题词,所述主题词为所述初步识别文本中关键信息的词,所述初步识别文本为根据语音信号识别得到的文本;a first obtaining module, configured to acquire a keyword in the preliminary identification text, where the keyword is a word of the key information in the preliminary identification text, and the preliminary identification text is a text identified according to the voice signal;
    第二获取模块,用于根据所述主题词获取目标相关信息,所述目标相关信息为与所述主题词对应的上下文信息;a second acquiring module, configured to acquire target related information according to the keyword, where the target related information is context information corresponding to the keyword;
    建立模块,用于根据所述目标相关信息建立目标语言库,在下次接收的语音信号获取的识别文本中利用所述目标语言库对所述主题词或者所述主题词相关的主题词进行识别。And a establishing module, configured to establish a target language library according to the target related information, and use the target language library to identify the keyword or the keyword related to the keyword in the recognized text acquired by the next received voice signal.
  9. 根据权利要求8所述的计算机,其中,The computer according to claim 8, wherein
    所述第一获取模块,具体用于根据所述初步识别文本按照公式1获取所述主题词,其中,所述公式1为:The first obtaining module is configured to obtain the keyword according to the preliminary identification text according to the formula 1, wherein the formula 1 is:
    Score(i)=tf(i)*idf(i),其中,i指所述初步识别文本中第i个词,tf(i)指第i个词在所述初步识别文本中出现的次数,idf(i)指第i个词在所述初步识别文本中的逆文档频率。Score(i)=tf(i)*idf(i), where i refers to the i-th word in the preliminary identification text, tf(i) refers to the number of times the i-th word appears in the preliminary identification text, and idf(i) refers to the inverse document frequency of the i-th word in the preliminary identification text.
  10. 根据权利要求8所述的计算机,其中,所述计算机还包括:The computer of claim 8 wherein said computer further comprises:
    接收模块,用于接收语音信号;a receiving module, configured to receive a voice signal;
    第三获取模块,用于根据所述语音信号获取对应的初步识别文本。And a third acquiring module, configured to acquire a corresponding preliminary identification text according to the voice signal.
  11. The computer according to any one of claims 8 to 10, wherein
    the second obtaining module is specifically configured to obtain the target related information through a web-wide search according to the keyword.
  12. The computer according to claim 11, wherein
    the second obtaining module is further configured to obtain corresponding search results through a web-wide search according to the keyword, and to match the search results to determine the target related information.
  13. The computer according to any one of claims 8 to 10, wherein
    the second obtaining module is further configured to extract, from a preset related-information set, the target related information corresponding to the keyword.
  14. The computer according to any one of claims 8 to 10, wherein
    the establishing module is specifically configured to perform training according to the target related information to establish the target language library.
  15. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform, when run, the method according to any one of claims 1 to 7.
  16. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to perform, by means of the computer program, the method according to any one of claims 1 to 7.
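Claims 8 to 16 describe one pipeline: recognize speech into preliminary text, extract a keyword from it, fetch context for that keyword, and train a per-keyword language library that biases later recognitions. A minimal sketch of that module structure — every class, method, and data value below is an illustrative assumption (the recognizer, retrieval, and training steps are stubs, not the patent's implementation):

```python
class SpeechAdapter:
    """Sketch of claims 8-14: each method stands in for one claimed module."""

    def __init__(self, related_info_set):
        # Preset related-information set (claim 13): keyword -> context docs.
        self.related_info_set = related_info_set
        self.language_library = {}  # target language library (claim 8)

    def third_obtain(self, speech_signal):
        # Third obtaining module (claim 10): speech -> preliminary text.
        # Stub: assume the recognizer already yields text.
        return speech_signal

    def first_obtain(self, text):
        # First obtaining module (claim 8): pick the keyword. Here the
        # longest word stands in for the TF-IDF scoring of claim 9.
        return max(text.split(), key=len)

    def second_obtain(self, keyword):
        # Second obtaining module, claim 13 variant: look up the preset set.
        return self.related_info_set.get(keyword, [])

    def establish(self, related_info):
        # Establishing module (claim 14): "training" stub that records
        # every word seen in the keyword's context documents.
        for doc in related_info:
            for word in doc.split():
                self.language_library[word] = True

    def process(self, speech_signal):
        text = self.third_obtain(speech_signal)
        keyword = self.first_obtain(text)
        self.establish(self.second_obtain(keyword))
        return keyword

adapter = SpeechAdapter({"quarterly": ["quarterly revenue report",
                                       "quarterly forecast"]})
print(adapter.process("read the quarterly summary"))  # -> quarterly
print("forecast" in adapter.language_library)         # -> True
```

After one pass, words from the keyword's context ("forecast", "revenue") are in the library, so a later utterance containing them can be recognized against this specialized vocabulary rather than a generic one, which is the adaptation effect the claims are after.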
PCT/CN2018/077413 2017-03-02 2018-02-27 Speech recognition method, computer, storage medium, and electronic apparatus WO2018157789A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710121180.2 2017-03-02
CN201710121180.2A CN108538286A (en) 2017-03-02 2017-03-02 A kind of method and computer of speech recognition

Publications (1)

Publication Number Publication Date
WO2018157789A1 true WO2018157789A1 (en) 2018-09-07

Family

ID=63370555

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077413 WO2018157789A1 (en) 2017-03-02 2018-02-27 Speech recognition method, computer, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN108538286A (en)
WO (1) WO2018157789A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522392A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Voice-based search method, server and computer readable storage medium
CN111081226B (en) * 2018-10-18 2024-02-13 北京搜狗科技发展有限公司 Speech recognition decoding optimization method and device
CN109376658B (en) * 2018-10-26 2022-03-08 信雅达科技股份有限公司 OCR method based on deep learning
CN111125355A (en) * 2018-10-31 2020-05-08 北京国双科技有限公司 Information processing method and related equipment
CN109360554A (en) * 2018-12-10 2019-02-19 广东潮庭集团有限公司 A kind of language identification method based on language deep neural network
CN109299248A (en) * 2018-12-12 2019-02-01 成都航天科工大数据研究院有限公司 A kind of business intelligence collection method based on natural language processing
CN109559744B (en) * 2018-12-12 2022-07-08 泰康保险集团股份有限公司 Voice data processing method and device and readable storage medium
CN110136688B (en) * 2019-04-15 2023-09-29 平安科技(深圳)有限公司 Text-to-speech method based on speech synthesis and related equipment
CN110349568B (en) * 2019-06-06 2024-05-31 平安科技(深圳)有限公司 Voice retrieval method, device, computer equipment and storage medium
CN111326160A (en) * 2020-03-11 2020-06-23 南京奥拓电子科技有限公司 Speech recognition method, system and storage medium for correcting noise text
CN111444318A (en) * 2020-04-08 2020-07-24 厦门快商通科技股份有限公司 Text error correction method
CN112017645B (en) * 2020-08-31 2024-04-26 广州市百果园信息技术有限公司 Voice recognition method and device
CN112468665A (en) * 2020-11-05 2021-03-09 中国建设银行股份有限公司 Method, device, equipment and storage medium for generating conference summary
CN112632319B (en) * 2020-12-22 2023-04-11 天津大学 Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning
CN113077792B (en) * 2021-03-24 2024-03-05 平安科技(深圳)有限公司 Buddhism subject term identification method, device, equipment and storage medium
CN113129866B (en) * 2021-04-13 2022-08-02 重庆度小满优扬科技有限公司 Voice processing method, device, storage medium and computer equipment
CN113658585B (en) * 2021-08-13 2024-04-09 北京百度网讯科技有限公司 Training method of voice interaction model, voice interaction method and device
CN113961694B (en) * 2021-09-22 2024-08-06 福建亿榕信息技术有限公司 A method and system for auxiliary analysis of the operation status of each unit of a company based on meetings

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102280106A (en) * 2010-06-12 2011-12-14 三星电子株式会社 VWS method and apparatus used for mobile communication terminal
CN103229137A (en) * 2010-09-29 2013-07-31 国际商业机器公司 Context-based disambiguation of acronyms and abbreviations
CN103544140A (en) * 2012-07-12 2014-01-29 国际商业机器公司 Data processing method, display method and corresponding devices
CN103680498A (en) * 2012-09-26 2014-03-26 华为技术有限公司 Speech recognition method and speech recognition equipment
US20160379626A1 (en) * 2015-06-26 2016-12-29 Michael Deisher Language model modification for local speech recognition systems using remote sources
CN106328145A (en) * 2016-08-19 2017-01-11 北京云知声信息技术有限公司 Voice correction method and voice correction device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315624B (en) * 2007-05-29 2015-11-25 阿里巴巴集团控股有限公司 A kind of method and apparatus of text subject recommending
CN203456091U (en) * 2013-04-03 2014-02-26 中金数据系统有限公司 Construction system of speech corpus
CN106297800B (en) * 2016-08-10 2021-07-23 中国科学院计算技术研究所 A method and device for adaptive speech recognition
CN106328147B (en) * 2016-08-31 2022-02-01 中国科学技术大学 Speech recognition method and device

Also Published As

Publication number Publication date
CN108538286A (en) 2018-09-14

Similar Documents

Publication Publication Date Title
WO2018157789A1 (en) Speech recognition method, computer, storage medium, and electronic apparatus
US11775760B2 (en) Man-machine conversation method, electronic device, and computer-readable medium
US10176804B2 (en) Analyzing textual data
CN106537370B (en) Method and system for robust tagging of named entities in the presence of source and translation errors
WO2021051521A1 (en) Response information obtaining method and apparatus, computer device, and storage medium
KR101543992B1 (en) Intra-language statistical machine translation
US9330661B2 (en) Accuracy improvement of spoken queries transcription using co-occurrence information
CN111611807B (en) A neural network-based keyword extraction method, device and electronic equipment
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
US10290299B2 (en) Speech recognition using a foreign word grammar
JP2004005600A (en) Method and system for indexing and retrieving document stored in database
US10592542B2 (en) Document ranking by contextual vectors from natural language query
JP2004133880A (en) Method for constructing dynamic vocabulary for speech recognizer used in database for indexed document
US20150178274A1 (en) Speech translation apparatus and speech translation method
CN105096942A (en) Semantic analysis method and semantic analysis device
CN112347241A (en) Abstract extraction method, device, equipment and storage medium
CN113743090A (en) Keyword extraction method and device
WO2025044865A1 (en) Cross-domain problem processing methods and apparatuses, electronic device and storage medium
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
WO2022227166A1 (en) Word replacement method and apparatus, electronic device, and storage medium
CN111126084A (en) Data processing method and device, electronic equipment and storage medium
CN118747500A (en) Chinese language translation method and system based on neural network model
CN111161730B (en) Voice instruction matching method, device, equipment and storage medium
CN113486155B (en) Chinese naming method fusing fixed phrase information
JP2005202924A (en) Translation determination system, method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18761662

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18761662

Country of ref document: EP

Kind code of ref document: A1
