
CN119889302A - Voice interaction method and device, vehicle and storage medium - Google Patents

Voice interaction method and device, vehicle and storage medium

Info

Publication number
CN119889302A
Authority
CN
China
Prior art keywords
voice
round
interaction
execution result
voice dialogue
Prior art date
Legal status
Pending
Application number
CN202311378512.7A
Other languages
Chinese (zh)
Inventor
缪士阳
Current Assignee
Shanghai Jidu Automobile Co Ltd
Original Assignee
Shanghai Jidu Automobile Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Jidu Automobile Co Ltd
Priority to CN202311378512.7A
Publication of CN119889302A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a voice interaction method, a device, a vehicle, and a storage medium. The method includes: obtaining a first voice instruction for interacting with a user interface, the first voice instruction being used to operate a target function; determining to enter a multi-round voice dialogue mode, and displaying an interaction interface of the target function in the multi-round voice dialogue mode; obtaining a first execution result of a first interaction event on the interaction interface, where the first interaction event is triggered by a touch operation on the interaction interface; and, when it is determined according to the first execution result to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of that mode. By judging from the first execution result whether to continue the voice dialogue flow, the method improves the fusion of voice interaction and manual touch interaction and improves the interaction experience.

Description

Voice interaction method and device, vehicle and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular relates to a voice interaction method, a voice interaction device, a vehicle and a storage medium.
Background
At present, with the development of intelligent vehicles, human-machine interaction has gradually evolved from manual operation of a graphical user interface (Graphical User Interface, GUI) to interaction through a voice user interface (Voice User Interface, VUI), improving driving safety and convenience.
Disclosure of Invention
The embodiment of the disclosure at least provides a voice interaction method, a voice interaction device, a vehicle and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a voice interaction method, including:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used for operating a target function;
Determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface;
and, when it is determined according to the first execution result to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of the multi-round voice dialogue mode.
In the embodiment of the disclosure, it is determined, based on a first voice instruction, to enter a multi-round voice dialogue mode, and an interactive interface of the target function is displayed in that mode. A first execution result of a first interaction event on the interactive interface is obtained, where the first interaction event is triggered by a touch operation on the interactive interface. When it is determined, according to the first execution result, that the voice dialogue flow of the multi-round voice dialogue mode should continue, the next round of dialogue is entered. In this way, a touch operation can be performed on the interactive interface while in the multi-round voice dialogue mode, and whether to continue the multi-round voice dialogue flow is judged from the first execution result corresponding to that touch operation. The logic for realizing voice interaction is simpler, voice interaction can be better fused with GUI interaction, direct interruption of the voice interaction process by a manual touch operation is avoided, interaction flexibility and convenience are improved, and the interaction experience of the user is also improved.
In an alternative embodiment, determining, according to the first execution result, a voice conversation process of continuing the multiple-round voice conversation mode includes:
determining an operation to be executed corresponding to the first execution result according to the first execution result, wherein the operation to be executed is used for triggering a second interaction event positioned after the first interaction event;
Judging whether the second interaction event belongs to a target interaction event associated with the multi-round voice conversation mode or judging whether a scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice conversation mode;
if yes, determining a voice conversation flow which continues the multi-round voice conversation mode.
In the embodiment of the disclosure, in the multi-round voice dialogue mode, the operation to be executed may be determined based on the first execution result of the touch operation. If the second interaction event triggered by that operation still belongs to a target interaction event or a target scene type associated with the voice service, the touch operation does not interrupt the multi-round voice dialogue mode, and the voice dialogue flow may be continued.
In an alternative embodiment, determining, according to the first execution result, a voice conversation process of continuing the multiple-round voice conversation mode includes:
Judging whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice dialogue mode;
if yes, determining a voice conversation flow which continues the multi-round voice conversation mode.
In the embodiment of the disclosure, judging whether the first execution result belongs to a target execution result associated with the voice service allows the multi-round voice dialogue mode to be maintained directly, improving the efficiency of the judgment.
In an alternative embodiment, after displaying the interactive interface of the target function in the multi-round voice conversation mode, the method further includes:
determining a full duplex waiting state for entering the multi-round voice conversation mode;
And if the first execution result is not obtained before the timing duration of the full duplex waiting state is reached, ending the multi-round voice conversation mode.
In the embodiment of the disclosure, whether to end the multi-round voice dialogue mode can be judged by monitoring the timed duration of the full duplex waiting state entered in that mode, avoiding the power and resource consumption caused by keeping the multi-round voice dialogue mode on for a long time while the user conducts no voice dialogue.
In an alternative embodiment, the method further comprises:
when the voice conversation flow of the multi-round voice conversation mode is determined not to be continued according to the first execution result, the full duplex waiting state is maintained;
And ending the multi-round voice conversation mode after the timing duration of the full duplex waiting state is reached, or determining to enter a new multi-round voice conversation mode if a second voice command is received before the timing duration of the full duplex waiting state is reached.
In the embodiment of the disclosure, when it is determined according to the first execution result that the voice dialogue flow is not continued, a full duplex waiting state of a certain duration can still be maintained, so that when voice is triggered again later, the time needed to re-enter the multi-round voice dialogue mode is reduced and the response speed is improved.
In an alternative embodiment, the obtaining the first execution result of the first interaction event on the interaction interface includes:
Monitoring a callback interface of the first interaction event on the interaction interface, wherein the callback interface is an interface registered in advance for the execution result;
And obtaining a return value of the callback interface according to the monitoring result, wherein the return value is used for indicating a first execution result of the first interaction event.
In the embodiment of the disclosure, the first execution result can be monitored through the pre-registered interface, so that the implementation is simple and convenient, and the efficiency is improved.
In an alternative embodiment, before determining to continue the voice conversation process of the multiple voice conversation mode, the method further includes:
And determining that the voice state corresponding to the multi-round voice conversation mode is in an activated state.
In the embodiment of the disclosure, whether to continue the voice conversation process is judged by combining the first execution result and whether the voice state is in the activated state, so that the judgment accuracy is improved, and the voice interaction experience is improved.
In a second aspect, an embodiment of the present disclosure further provides a voice interaction device, including a processor and a memory. The memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed, the processor performs the following steps:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used for operating a target function;
Determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface;
and according to the first execution result, when the voice conversation flow of the multi-round voice conversation mode is determined to be continued, entering the next round of conversation of the multi-round voice conversation mode.
In a third aspect, an optional implementation manner of the disclosure further provides a vehicle, where the vehicle includes the apparatus in the second aspect.
In a fourth aspect, an alternative implementation of the present disclosure further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the first aspect, or any of the possible implementation manners of the first aspect.
For the effects of the voice interaction device, the vehicle, and the computer-readable storage medium, reference may be made to the description of the voice interaction method, which is not repeated here.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the disclosure.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 illustrates a flow chart of a method of voice interaction provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a voice interaction device according to an embodiment of the present disclosure;
fig. 3 shows a schematic view of a vehicle provided by an embodiment of the present disclosure.
Detailed Description
For the convenience of understanding the technical solutions of the present disclosure, technical terms in the embodiments of the present disclosure will be described first:
A graphical user interface (Graphical User Interface, GUI) is a user interface displayed graphically for operating a computer; it is a dialogue interface between the computer and its user, on which the user can operate icons, menu options, and the like by means of, for example, a mouse or finger touches.
A voice user interface (Voice User Interface, VUI) enables voice interaction between a person and a computer. A VUI need not have a specific visual interface and may be entirely auditory or tactile, for example sound-controlled blinking of lights or vibration.
Human-machine interaction in intelligent transportation means has gradually evolved from manual operation of a GUI to interaction through a VUI, making vehicle driving more convenient and safer. In the related art, a user establishes a voice interaction flow through a voice wake-up instruction and can then input voice control instructions by voice. However, if the user performs a manual click on a GUI control during the voice interaction flow, the voice interaction flow is interrupted and the GUI interaction flow is entered. In that scheme the coupling between the VUI and the GUI is high, the two cannot be well fused, interaction flexibility is reduced, and the convenience of voice interaction is also affected.
To solve the above problems, the disclosure provides a voice interaction method. A first voice instruction for interacting with a user interface is obtained, where the first voice instruction is used for operating a target function; it is determined to enter a multi-round voice dialogue mode, and an interactive interface of the target function is displayed in that mode. If a first interaction event is triggered by a touch operation on the interactive interface, a first execution result of the first interaction event is obtained; when it is determined, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode, the next round of dialogue is entered. In this way, after entering the multi-round voice dialogue mode, whether to continue that mode can be judged from the first execution result corresponding to the touch operation, instead of the voice dialogue flow being interrupted as soon as a manual touch operation is received, so interference from manual operation is avoided. Moreover, judging whether to continue the voice dialogue flow only requires monitoring the first execution result; the target function does not need to actively forward the various touch operations to the voice side, so the voice interaction function is realized with greater flexibility.
To facilitate understanding of the present embodiment, the voice interaction method in the embodiment of the present disclosure is first described below. The execution subject of the voice interaction method provided by the embodiment of the disclosure is generally a computer device with certain computing capability, such as a vehicle-mounted terminal, a vehicle head unit, a terminal device, or another processing device. For example, in an intelligent-vehicle application scenario, the intelligent vehicle may be an electric vehicle, a fuel vehicle, a hybrid electric vehicle, or the like. The head unit of the vehicle is the in-vehicle infotainment product installed in the vehicle; it can provide various service functions, human-machine interaction functions, and so on, and the user can interact with the vehicle through a display device such as the vehicle's screen and control vehicle functions through GUI or VUI interaction. In some possible implementations, the voice interaction method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The following describes the voice interaction method provided by the embodiment of the present disclosure, taking the head unit of a vehicle as the execution subject as an example. Referring to fig. 1, a flowchart of a voice interaction method according to an embodiment of the disclosure is shown, where the method includes:
S101, a first voice instruction for interacting with a user interface is obtained, wherein the first voice instruction is used for operating a target function.
In one possible implementation, the target function may be a function in a target application, for example the temperature-adjustment function of the air conditioner. The function may be operated on the air conditioner application interface, or on a function card displayed on the vehicle interface based on an invoked function component (without opening the air conditioner application).
S102, determining to enter a multi-round voice conversation mode and displaying an interactive interface of a target function in the multi-round voice conversation mode.
In the embodiment of the disclosure, the application scenario is mainly an intelligent vehicle that supports a multi-round voice dialogue mode. In this mode, multiple rounds of dialogue can be conducted with only a single wake-up. For example, the user can input a first voice instruction through the vehicle voice assistant, such as "please open target function A", and a new round of dialogue in the multi-round voice dialogue mode is entered. The multi-round voice dialogue mode may be triggered when a voice dialogue flow has been triggered at least once before and the timed duration of the full duplex waiting state has not yet been reached. After performing semantic understanding and analysis on the first voice instruction, the dialogue management module can trigger opening target function A and display the interactive interface of target function A.
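As a concrete illustration of this entry flow, the sketch below shows a first voice instruction being parsed and the multi-round mode being entered with the target function's interface displayed. The intent names, the `TARGET_FUNCTIONS` table, and the `DialogueManager` structure are invented for illustration and are not the patent's actual implementation.

```python
# Illustrative sketch only: entering the multi-round voice dialogue mode
# on a first voice instruction (step S102).

# Hypothetical mapping from parsed intents to target functions.
TARGET_FUNCTIONS = {"open_function_a": "target function A"}

def parse_intent(first_voice_instruction: str) -> str:
    """Stand-in for the semantic understanding/analysis of the first voice command."""
    if "target function a" in first_voice_instruction.lower():
        return "open_function_a"
    return "unknown"

class DialogueManager:
    def __init__(self):
        self.multi_round_mode = False
        self.displayed_interface = None

    def on_first_voice_instruction(self, instruction: str) -> bool:
        intent = parse_intent(instruction)
        if intent not in TARGET_FUNCTIONS:
            return False
        # Enter the multi-round voice dialogue mode and display the
        # interactive interface of the target function.
        self.multi_round_mode = True
        self.displayed_interface = TARGET_FUNCTIONS[intent]
        return True

dm = DialogueManager()
entered = dm.on_first_voice_instruction("Please open target function A")
print(entered, dm.multi_round_mode, dm.displayed_interface)
```

A command that parses to no known target function simply leaves the mode unentered, matching the single-wake-up behavior described above.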
And S103, obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface.
Specifically, for step S103, the disclosure provides a possible implementation: monitoring a callback interface of the first interaction event on the interaction interface, where the callback interface is an interface registered in advance for obtaining the execution result; and obtaining the return value of the callback interface according to the monitoring result, where the return value indicates the first execution result of the first interaction event.
For example, a corresponding callback interface may be preset and registered for a target function in a certain target application, and the first execution result of a first interaction event associated with that target function may be obtained by monitoring the return value of the callback interface. In addition, a given target function usually needs to monitor only a limited set of execution results, so the corresponding callback interfaces may be registered in advance as required, and their return values monitored to obtain the corresponding first execution results.
Because the first interaction event is triggered by a touch operation on the interactive interface of the target function, the first execution result is usually generated by an application service. The application service can be understood as a module that provides background services for the target function: it receives the first interaction event input by the touch operation on the interactive interface and performs the corresponding processing based on it to obtain the first execution result. For example, if the target function is the navigation function in a map and the first interaction event is clicking the navigate-to-place button, the first execution result may be that route planning succeeded, together with the planned navigation route.
On this basis, in the embodiment of the present disclosure, the voice service may actively monitor the application service for the first execution result in real time. The voice service can be understood as a module that provides background services for the voice function; after the application service generates the first execution result for the first interaction event, the voice service obtains that first execution result.
It should be noted that, in the embodiment of the present disclosure, the voice service does not need to care what the first interaction event is, nor whether it was triggered by voice or by a manual click; it only needs to monitor the first execution result and judge from it whether to continue the multi-round voice dialogue mode. This reduces the coupling between the application service and the voice service, and also lowers the access threshold and cost for application developers.
In this way, the responsibility for acquiring the required data shifts from the application service actively notifying the voice service to the voice service actively monitoring for it. The application service does not need to report the various touch operations to the voice service, which reduces the voice service's dependence on the application service and decouples the two. The voice service does not need to monitor touch operations or interaction events; it is concerned only with the first execution result, so the implementation logic is simpler.
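The decoupled listening arrangement described above can be sketched as follows. The class and method names (`ApplicationService`, `register_result_callback`, and so on) are assumptions for illustration only: the voice service registers on a pre-defined callback interface and receives only execution results, regardless of how the triggering event was produced.

```python
# Illustrative sketch only: an application service exposes a callback
# interface registered in advance; the voice service listens on it and
# sees execution results, never the raw touch events themselves.

class ApplicationService:
    def __init__(self):
        self._callbacks = []  # callback interfaces registered for execution results

    def register_result_callback(self, callback):
        self._callbacks.append(callback)

    def handle_interaction_event(self, event: str) -> str:
        # The application processes the first interaction event (e.g. a touch
        # on the interactive interface) and produces an execution result.
        result = f"{event}:success"
        for cb in self._callbacks:  # the return value reaches all listeners
            cb(result)
        return result

class VoiceService:
    """Monitors execution results only; unaware of how events were triggered."""
    def __init__(self, app: ApplicationService):
        self.last_result = None
        app.register_result_callback(self._on_result)

    def _on_result(self, result: str):
        self.last_result = result  # first execution result of the interaction event

app = ApplicationService()
voice = VoiceService(app)
app.handle_interaction_event("route_calculation")
print(voice.last_result)
```

Note that `handle_interaction_event` never consults the voice service before acting, which is the decoupling the passage describes.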
And S104, according to the first execution result, when the voice conversation flow of the multi-round voice conversation mode is determined to be continued, entering the next round of conversation of the multi-round voice conversation mode.
The following describes how to determine, according to the first execution result, whether to continue the voice dialogue flow of the multi-round voice dialogue mode. In the embodiment of the disclosure, after the first execution result of the first interaction event is obtained, whether to continue the voice dialogue flow can be judged, avoiding the situation in which a manual touch operation directly interrupts the voice interaction process.
Specifically, for determining a voice conversation process of continuing a multi-round voice conversation mode according to a first execution result, several possible implementations are provided in the embodiments of the present disclosure, including:
In a possible implementation, determining, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode includes: 1) determining, according to the first execution result, an operation to be executed corresponding to it, where the operation to be executed is used for triggering a second interaction event located after the first interaction event; 2) judging whether the second interaction event belongs to a target interaction event associated with the multi-round voice dialogue mode, or judging whether the scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice dialogue mode; 3) if yes, determining to continue the voice dialogue flow of the multi-round voice dialogue mode.
The target interaction event or the target scene type associated with the multi-round voice dialogue mode may be preset, for example, the target scene type associated with the voice service may be set to have a navigation scene, or a specific refinement scene in the navigation scene, etc., which is not limited.
For example, suppose a user opens the navigation function in a map application by voice, enters the multi-round voice dialogue mode, and the interactive interface of the navigation function is displayed. If the user clicks the route-calculation icon on the interactive interface through a touch operation, the route-calculation processing yields a first execution result of success or failure. If the first execution result is success, several calculated candidate routes can be displayed to the user; if selecting a route among the candidates is preset as an event responded to by the voice service, i.e., a target interaction event associated with the multi-round dialogue mode, it can be judged that the voice dialogue flow continues.
Therefore, when the second interaction event following the first execution result of the touch operation is judged to still be associated with the voice service, the multi-round voice dialogue mode can be maintained, so the manual touch operation does not interrupt it; voice interaction can then proceed directly without restarting the multi-round voice dialogue mode, improving efficiency and the interaction experience.
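A minimal sketch of this continuation check follows. The event names, scene types, and the result-to-operation mapping are invented examples, not values from the patent:

```python
# Illustrative sketch only: decide whether the multi-round voice dialogue
# flow continues, from the second interaction event that the first
# execution result leads to.

TARGET_INTERACTION_EVENTS = {"path_selection"}  # events answered by voice
TARGET_SCENE_TYPES = {"navigation"}             # scene types bound to the mode

# Hypothetical mapping: first execution result -> (operation to be executed,
# second interaction event it triggers, scene type of that event).
PENDING_OPERATIONS = {
    "route_calculation:success": ("select_path", "path_selection", "navigation"),
}

def should_continue_dialogue(first_execution_result: str) -> bool:
    pending = PENDING_OPERATIONS.get(first_execution_result)
    if pending is None:
        return False
    _operation, second_event, scene_type = pending
    # Continue the voice dialogue flow if the second interaction event,
    # or its scene type, is associated with the multi-round mode.
    return (second_event in TARGET_INTERACTION_EVENTS
            or scene_type in TARGET_SCENE_TYPES)

print(should_continue_dialogue("route_calculation:success"))  # True
print(should_continue_dialogue("route_calculation:failure"))  # False
```

A failed route calculation maps to no pending operation here, so the touch operation would not keep the dialogue flow alive.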
In another possible embodiment, determining the voice conversation process of continuing the multi-round voice conversation mode according to the first execution result includes determining whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice conversation mode, and if so, determining the voice conversation process of continuing the multi-round voice conversation mode.
Specifically, in the embodiment of the present disclosure, an association list corresponding to the voice service may be set, containing a number of target interaction events or target execution results. For different applications or functions, which target interaction events' execution results the voice service needs to focus on may be preset according to experience and requirements. For example, for a vehicle entertainment function, the search results of search interaction events related to music or video, playback-control events, and the like may be preset as target interaction events or target execution results to be focused on by the voice service.
In the embodiment of the disclosure, for the target execution results of the target interaction events to be focused on, the corresponding callback interfaces are registered for the voice service by defining their interface functions in advance and setting the caller of each interface function to the voice service. In other words, for whichever functions' first execution results the voice service needs to monitor or respond to, the corresponding callback interfaces can be registered in advance, and the voice service can actively monitor those callback interfaces to obtain the first execution results. If a first execution result is judged to correspond to a callback interface being focused on, it can be judged to be a target execution result of a target interaction event associated with the multi-round voice dialogue mode, and it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode.
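The pre-registration idea can be sketched as below, under the assumption (not stated in the patent) that the registry is a simple mapping from callback-interface name to its registered caller:

```python
# Illustrative sketch only: callback interfaces are registered in advance
# for the target execution results the voice service must monitor; a
# result arriving on a registered interface implies the dialogue continues.

class CallbackRegistry:
    def __init__(self):
        self._interfaces = {}  # interface name -> caller of the interface function

    def register(self, interface: str, caller: str = "voice_service"):
        self._interfaces[interface] = caller

    def is_monitored(self, interface: str) -> bool:
        return self._interfaces.get(interface) == "voice_service"

registry = CallbackRegistry()
# Pre-register callback interfaces for results the voice service cares
# about, e.g. media search results and playback-control results.
registry.register("on_music_search_result")
registry.register("on_playback_control_result")

def should_continue(interface: str) -> bool:
    # A first execution result delivered through a monitored callback
    # interface is a target execution result: continue the dialogue flow.
    return registry.is_monitored(interface)

print(should_continue("on_music_search_result"))  # True
print(should_continue("on_window_close"))         # False
```

Because the check is a single lookup against the pre-registered list, this matches the efficiency claim made for the second implementation above.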
Further, in the embodiment of the present disclosure, to further improve the experience of combined voice and manual touch interaction, a waiting countdown for the multi-round voice dialogue mode may be added, so that the mode is not left on indefinitely with no triggering operation, reducing performance consumption. The disclosure provides possible implementations:
1) In one possible implementation, after displaying the interactive interface of the target function in the multi-round voice conversation mode, the full duplex waiting state of entering the multi-round voice conversation mode can be determined, and if the first execution result is not obtained before the timing duration of the full duplex waiting state is reached, the multi-round voice conversation mode is ended.
For example, suppose the preset timed duration of the full duplex waiting state is 10 seconds. In the multi-round voice dialogue mode, the user initiates a round of dialogue and inputs the first voice instruction "please play songs"; songs are played and an interactive interface associated with them is displayed on the vehicle display screen, and at the same time the full duplex waiting state is entered. If, within 10 seconds after that round of dialogue ends, no new voice instruction is received and no first execution result corresponding to a touch operation on the interactive interface is obtained, the multi-round voice dialogue mode ends.
In the embodiment of the disclosure, after entering a multi-round voice dialogue mode based on a first voice command and displaying an interactive interface of a target function in the multi-round voice dialogue mode, entering a double-full-work waiting state, wherein the double-full-work waiting state represents countdown waiting, if a first execution result is not obtained all the time in the stage of the double-full-work waiting state, the multi-round voice dialogue mode can be ended, at the moment, the multi-round voice dialogue mode can enter an inactive state and no response to voice or touch operation is performed, and if a subsequent user needs to use voice service, the subsequent user needs to wake up again.
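A minimal sketch of this countdown behaviour is shown below, with an injectable clock so the timing can be simulated; the 10-second figure and all names are illustrative only:

```python
import time

class FullDuplexWait:
    """Countdown for the full-duplex waiting state: any voice command or
    execution result restarts the countdown; expiry ends the multi-round
    voice dialogue mode."""

    def __init__(self, timeout_s: float = 10.0, clock=time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._deadline = clock() + timeout_s
        self.active = True  # multi-round mode is active

    def on_event(self) -> None:
        # A new voice command or first execution result restarts the wait.
        self._deadline = self._clock() + self.timeout_s

    def poll(self) -> bool:
        # Returns True while the multi-round mode should stay active.
        if self.active and self._clock() >= self._deadline:
            self.active = False  # timed out: end the multi-round mode
        return self.active
```

Using a monotonic clock (rather than wall-clock time) keeps the countdown immune to system time changes.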
2) In another possible implementation, when it is determined according to the first execution result not to continue the voice dialogue flow of the multi-round voice dialogue mode, the full-duplex waiting state is maintained; the multi-round voice dialogue mode is ended after the timing duration of the full-duplex waiting state is reached, or, if a second voice command is received before the timing duration is reached, a new multi-round voice dialogue mode is entered.
In this embodiment, even when it is determined according to the first execution result that the voice dialogue flow of the multi-round voice dialogue mode is not continued, the full-duplex waiting state is still maintained for a period of time, so that if a voice command, or another target execution result associated with the multi-round voice dialogue mode, arrives before the timing duration expires, the voice service can still be used, which improves efficiency and the user's interaction experience.
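The branch just described reduces to a small decision function (the state labels are illustrative, not part of the disclosed method):

```python
def wait_state_transition(timed_out: bool, second_command_received: bool) -> str:
    """After deciding not to continue the current voice dialogue flow, the
    full-duplex waiting state is held: a second voice command received
    before the timeout opens a new multi-round mode, while expiry of the
    timing duration ends the mode."""
    if second_command_received:
        return "new_multi_round_mode"
    if timed_out:
        return "ended"
    return "waiting"
```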
Furthermore, in the embodiment of the present disclosure, within the timing duration of the full-duplex waiting state, before determining to continue the voice dialogue flow of the multi-round voice dialogue mode, it is further required to determine that the voice state corresponding to the multi-round voice dialogue mode is active.
Therefore, by checking both the voice state and the first execution result, if the voice state is active and the first execution result needs to be responded to or processed, the voice dialogue flow of the multi-round voice dialogue mode can continue and the next round of dialogue starts, so that the voice dialogue flow is not interrupted by the manual touch operation and the user's interaction experience is improved.
Of course, in the embodiment of the present disclosure, if the voice state corresponding to the current multi-round voice dialogue mode is inactive, the voice dialogue flow of the multi-round voice dialogue mode is not continued and the GUI flow is executed instead, so as to avoid suddenly starting voice playback, which would harm the user's interaction experience and not match an actual voice interaction scene.
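The two-condition check above — voice state active and an execution result that warrants a response — can be sketched as follows (flow labels are illustrative):

```python
def choose_flow(voice_state_active: bool, result_needs_response: bool) -> str:
    """Continue the voice dialogue flow only when both conditions hold;
    otherwise fall back to the ordinary GUI flow so that voice playback
    does not start unexpectedly."""
    if voice_state_active and result_needs_response:
        return "voice_dialogue"  # enter the next round of dialogue
    return "gui"                 # keep the manual touch flow only
```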
In addition, in the embodiment of the present disclosure, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, the next round of dialogue of the multi-round voice dialogue mode may be entered. At this point, the present disclosure further provides a possible implementation: playing voice guidance information, where voice guidance information can be understood as information that guides the subsequent voice dialogue, for example question information or result prompt information, without limitation. The present disclosure provides several possible embodiments:
1) In one possible implementation, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, semantic analysis is performed on the first execution result to determine intention information, voice guidance information corresponding to the intention information is determined, and the next round of dialogue of the multi-round voice dialogue mode is entered.
In the embodiment of the disclosure, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, the intention information following the first execution result may be determined from that result. Specifically, based on a trained intention model, the first execution result may be input into the intention model, which performs semantic analysis and outputs the intention information corresponding to the first execution result, that is, what the user is likely to want to do after this result is obtained. The voice guidance information corresponding to the intention information can then be played to guide the user into the next round of voice dialogue.
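A trained intention model is assumed above; the toy rule table below merely stands in for it, to show the data flow from execution result to guidance text (all strings and keys are hypothetical):

```python
from typing import Optional

def infer_intent(execution_result: dict) -> str:
    # Stand-in for semantic analysis by a trained intention model.
    kind = execution_result.get("type")
    if kind == "navigation":
        return "start_navigation"
    if kind == "media":
        return "continue_playback"
    return "unknown"

GUIDANCE_BY_INTENT = {
    "start_navigation": "Route planned for you, starting navigation.",
    "continue_playback": "Now playing your selection.",
}

def guidance_for(execution_result: dict) -> Optional[str]:
    # Map the inferred intention to the voice guidance to be played.
    return GUIDANCE_BY_INTENT.get(infer_intent(execution_result))
```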
For example, a user opens a map application and searches for a place through a voice command, and a list of places is displayed on the interactive interface; the user can manually click a place, or select one through a voice command, to start navigation. The voice service then obtains a first execution result, namely the navigation result, determines that the current voice state is active, and determines to continue the voice dialogue flow of the multi-round voice dialogue mode. Further, through intention analysis of the first execution result, the corresponding voice guidance information is obtained, for example "Route planned for you, starting navigation", so that while the navigation result page is displayed to the user, the voice guidance information can be played at the same time.
2) In another possible implementation, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, the voice guidance information corresponding to the first execution result is determined according to a preset mapping relationship between execution results and voice guidance information, and the next round of dialogue of the multi-round voice dialogue mode is entered.
In other words, in the embodiment of the present disclosure, the mapping relationship between each first execution result and its voice guidance information may be preset, so that different first execution results trigger different voice guidance information. The corresponding voice guidance information can then be obtained directly from the mapping relationship and played, which is simple to implement and relatively efficient.
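Under this second implementation no model is needed: a preset mapping keyed directly on the execution result suffices. A sketch, with keys and phrases that are purely illustrative:

```python
from typing import Optional

# Preset mapping between execution results and voice guidance.
GUIDANCE_BY_RESULT = {
    ("navigation", "ok"): "Route planned for you, starting navigation.",
    ("media", "ok"): "Now playing your selection.",
}

def lookup_guidance(execution_result: dict) -> Optional[str]:
    # Direct table lookup: no semantic analysis involved.
    key = (execution_result.get("type"), execution_result.get("status"))
    return GUIDANCE_BY_RESULT.get(key)
```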
Further, after the voice guidance information corresponding to the first execution result is played, in the embodiment of the present disclosure, a voice instruction fed back in response to the voice guidance information may also be obtained; control information is generated from the voice instruction and sent to the corresponding control device of the vehicle, where the control information is used to control the vehicle through that control device, thereby implementing voice control of the vehicle, for example lowering the playback volume or closing a window.
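The final step — turning a follow-up voice instruction into control information addressed to a vehicle control device — might look like the sketch below; the device names and command table are assumptions for illustration, not the actual in-vehicle protocol:

```python
from typing import Optional

COMMAND_TABLE = {
    "lower the volume": ("media_controller", {"action": "set_volume", "delta": -10}),
    "close the window": ("body_controller", {"action": "window", "state": "closed"}),
}

def build_control_info(voice_instruction: str) -> Optional[dict]:
    # Generate control information from a recognized voice instruction;
    # the caller would then send it to the corresponding control device.
    entry = COMMAND_TABLE.get(voice_instruction.strip().lower())
    if entry is None:
        return None
    device, payload = entry
    return {"device": device, "payload": payload}
```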
It will be appreciated by those skilled in the art that, in the methods of the specific embodiments described above, the written order of the steps does not imply a strict order of execution; the actual execution order should be determined by the function of each step and its possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure also provide a voice interaction device corresponding to the voice interaction method. Since the principle by which the device solves the problem is similar to that of the voice interaction method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 2, a schematic diagram of a voice interaction device according to an embodiment of the disclosure is provided, where the device includes a processor 21 and a memory 22, where the memory 22 stores machine-readable instructions executable by the processor 21, the processor 21 is configured to execute the machine-readable instructions stored in the memory 22, and when the machine-readable instructions are executed by the processor 21, the processor 21 is configured to perform the following steps:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used for operating a target function;
Determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface;
and according to the first execution result, when the voice conversation flow of the multi-round voice conversation mode is determined to be continued, entering the next round of conversation of the multi-round voice conversation mode.
In an alternative embodiment, when determining to continue the voice dialogue flow of the multi-round voice dialogue mode according to the first execution result, the processor 21 is configured to:
determining an operation to be executed corresponding to the first execution result according to the first execution result, wherein the operation to be executed is used for triggering a second interaction event positioned after the first interaction event;
Judging whether the second interaction event belongs to a target interaction event associated with the multi-round voice conversation mode or judging whether a scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice conversation mode;
if yes, determining a voice conversation flow which continues the multi-round voice conversation mode.
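The membership test in these steps amounts to checking the second interaction event, or its scene type, against sets associated with the multi-round mode; in this sketch the set contents are purely illustrative:

```python
TARGET_EVENTS = {"select_and_play", "confirm_navigation"}  # illustrative
TARGET_SCENES = {"media", "navigation"}                    # illustrative

def continues_dialogue(second_event: str, scene_type: str) -> bool:
    # Continue the multi-round voice dialogue flow if either test passes.
    return second_event in TARGET_EVENTS or scene_type in TARGET_SCENES
```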
In an alternative embodiment, when determining to continue the voice conversation process of the multi-round voice conversation mode according to the first execution result, the processor 21 is configured to:
judging whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice conversation mode, and if so, determining to continue the voice conversation flow of the multi-round voice conversation mode.
In an alternative embodiment, after displaying the interactive interface of the target function in the multi-round voice conversation mode, the processor 21 is further configured to:
determining a full duplex waiting state for entering the multi-round voice conversation mode;
And if the first execution result is not obtained before the timing duration of the full duplex waiting state is reached, ending the multi-round voice conversation mode.
In an alternative embodiment, the processor 21 is further configured to:
when the voice conversation flow of the multi-round voice conversation mode is determined not to be continued according to the first execution result, the full duplex waiting state is maintained;
And ending the multi-round voice conversation mode after the timing duration of the full duplex waiting state is reached, or determining to enter a new multi-round voice conversation mode if a second voice command is received before the timing duration of the full duplex waiting state is reached.
In an alternative embodiment, the obtaining the first execution result of the first interaction event on the interaction interface includes:
Monitoring a callback interface of the first interaction event on the interaction interface, wherein the callback interface characterizes an interface registered by an execution result;
And obtaining a return value of the callback interface according to the monitoring result, wherein the return value is used for indicating a first execution result of the first interaction event.
In an alternative embodiment, the processor 21 is further configured to determine that the voice state corresponding to the multiple voice conversation mode is in an active state before determining to continue the voice conversation process of the multiple voice conversation mode.
The memory 22 includes an internal memory 221 and an external memory 222. The internal memory 221, also referred to as main memory, is used for temporarily storing operation data of the processor 21 and data exchanged with the external memory 222, such as a hard disk; the processor 21 exchanges data with the external memory 222 through the internal memory 221.
The specific execution process of the above instruction may refer to the steps of the voice interaction method described in the embodiments of the present disclosure, which is not described herein.
The description of the processing flow of each component in the device, and of the interaction flows between the components, does not limit the implementation; the specific processing and interaction flows should be determined by the functions of the components and their possible internal logic.
Referring to fig. 3, a schematic diagram of a vehicle according to an embodiment of the disclosure is provided, where the vehicle includes the voice interaction device in the foregoing embodiment.
In the embodiment of the disclosure, the interactive interface of the target function may be displayed through a screen of the vehicle, where the screen of the vehicle may be further divided into a plurality of areas, including, for example, a central control screen, a main screen, a sub-screen, and the like, and not limited thereto, a voice assistant may be installed in the vehicle system, and a user may interact with the vehicle through touch or voice. For example, the user may input a voice command "open door", from which the voice assistant generates a control command and sends it to the vehicle's controller, which may control the opening of the door of the vehicle.
In practice, a user interacting with a vehicle may mix touch and voice: for example, voice is used to open multimedia music, and a manual touch then clicks an item in the list to select and play it. In the embodiment of the disclosure, whether to continue the voice dialogue flow of the multi-round voice dialogue mode can be judged according to the first execution result of this touch-triggered selection-and-play event; when it is determined to continue, the next round of dialogue of the multi-round voice dialogue mode is entered and corresponding voice guidance information can be played, guiding the user into the next round of dialogue and improving the interaction experience.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the voice interaction method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, on which a computer program is stored, where the computer program when executed by a processor implements the steps of the voice interaction method described in the foregoing method embodiments, and specifically, reference may be made to the foregoing method embodiments, which are not described herein in detail.
The methods of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device, a core network device, an operation administration and maintenance (OAM) system, or another programmable device.
The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program or instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media. The usable medium may be a magnetic medium such as a floppy disk, a hard disk, a magnetic tape, an optical medium such as a digital video disk, or a semiconductor medium such as a solid state disk. The computer readable storage medium may be volatile or nonvolatile storage medium, or may include both volatile and nonvolatile types of storage medium.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such an understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
It should be noted that the foregoing embodiments are merely specific implementations of the disclosure and are not intended to limit its scope. Although the disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications, variations, or equivalent substitutions of some of the technical features may still be made within the technical scope of the disclosure without departing from the spirit and scope of the technical solutions of the embodiments. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A voice interaction method, comprising:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used to operate a target function;
determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interactive interface, wherein the first interaction event is triggered by a touch operation on the interactive interface;
and according to the first execution result, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of the multi-round voice dialogue mode.

2. The method according to claim 1, wherein determining, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode comprises:
determining, according to the first execution result, an operation to be executed corresponding to the first execution result, wherein the operation to be executed is used to trigger a second interaction event located after the first interaction event;
determining whether the second interaction event belongs to a target interaction event associated with the multi-round voice dialogue mode, or determining whether a scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice dialogue mode;
and if so, determining to continue the voice dialogue flow of the multi-round voice dialogue mode.

3. The method according to claim 1, wherein determining, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode comprises:
determining whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice dialogue mode;
and if so, determining to continue the voice dialogue flow of the multi-round voice dialogue mode.

4. The method according to any one of claims 1 to 3, wherein after displaying the interactive interface of the target function in the multi-round voice dialogue mode, the method further comprises:
determining to enter a full-duplex waiting state of the multi-round voice dialogue mode;
and if the first execution result is not obtained before the timing duration of the full-duplex waiting state is reached, ending the multi-round voice dialogue mode.

5. The method according to claim 4, further comprising:
when it is determined according to the first execution result not to continue the voice dialogue flow of the multi-round voice dialogue mode, maintaining the full-duplex waiting state;
and ending the multi-round voice dialogue mode after the timing duration of the full-duplex waiting state is reached, or, if a second voice instruction is received before the timing duration of the full-duplex waiting state is reached, determining to enter a new multi-round voice dialogue mode.

6. The method according to claim 1, wherein obtaining the first execution result of the first interaction event on the interactive interface comprises:
monitoring a callback interface of the first interaction event on the interactive interface, wherein the callback interface represents an interface registered for obtaining the execution result;
and obtaining, according to the monitoring result, a return value of the callback interface, wherein the return value is used to indicate the first execution result of the first interaction event.

7. The method according to claim 1, wherein before determining to continue the voice dialogue flow of the multi-round voice dialogue mode, the method further comprises:
determining that the voice state corresponding to the multi-round voice dialogue mode is in an activated state.

8. A voice interaction device, comprising a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the processor performs the following steps:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used to operate a target function;
determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interactive interface, wherein the first interaction event is triggered by a touch operation on the interactive interface;
and according to the first execution result, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of the multi-round voice dialogue mode.

9. A vehicle, comprising the voice interaction device according to claim 8.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202311378512.7A 2023-10-23 2023-10-23 Voice interaction method and device, vehicle and storage medium Pending CN119889302A (en)

Publications (1)

Publication Number Publication Date
CN119889302A true CN119889302A (en) 2025-04-25


Similar Documents

Publication Publication Date Title
JP6963700B2 (en) Generating and transmitting call requests to the appropriate third-party agents
US9218812B2 (en) Vehicular device, server, and information processing method
CN106558310B (en) Virtual reality voice control method and device
CN111095399B (en) Voice user interface shortcuts for assistant applications
CN110928409B (en) Vehicle-mounted scene mode control method and device, vehicle and storage medium
KR101809808B1 (en) System and method for emergency calls initiated by voice command
CN107783803B (en) System optimization method and device of intelligent terminal, storage medium and intelligent terminal
CN111813912B (en) Man-machine conversation method, device, equipment and storage medium
CN110050303B (en) Voice-to-text conversion based on third party proxy content
KR20200117070A (en) Initializing a conversation with an automated agent via selectable graphical element
JP7392128B2 (en) Semi-delegated calls with automated assistants on behalf of human participants
CN105912241A (en) Method and device for man-machine interaction, and terminal
CN112751971A (en) Voice playing method and device and electronic equipment
WO2022089483A1 (en) Audio playback control method and apparatus, and electronic device
CN114452644B (en) Game interaction method and device, computer storage medium, and electronic device
CN109144373A (en) An instant messaging method and device
WO2024230509A1 (en) Relationship establishment method and apparatus, device and storage medium
CN114639384A (en) Voice control method, device, equipment and computer storage medium
CN116691563A (en) Vehicle rest control method, device, equipment and storage medium
WO2000004533A1 (en) Automatic speech recognition
CN119889302A (en) Voice interaction method and device, vehicle and storage medium
CN113270096A (en) Voice response method and device, electronic equipment and computer readable storage medium
CN105843672A (en) Control method, device and system for application program
CN108459838B (en) Information processing method and electronic equipment
CN116935831A (en) Audio data processing method, intelligent voice system and intelligent voice device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载