
CN119889302A - Voice interaction method and device, vehicle and storage medium - Google Patents

Voice interaction method and device, vehicle and storage medium

Info

Publication number
CN119889302A
Authority
CN
China
Prior art keywords
voice
round
interaction
execution result
voice dialogue
Prior art date
Legal status
Pending
Application number
CN202311378512.7A
Other languages
Chinese (zh)
Inventor
缪士阳
Current Assignee
Shanghai Jidu Automobile Co Ltd
Original Assignee
Shanghai Jidu Automobile Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Jidu Automobile Co Ltd
Priority to CN202311378512.7A
Publication of CN119889302A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a voice interaction method, a device, a vehicle, and a storage medium. The method includes: obtaining a first voice instruction for interacting with a user interface, the first voice instruction being used to operate a target function; determining to enter a multi-round voice dialogue mode, and displaying an interaction interface of the target function in the multi-round voice dialogue mode; obtaining a first execution result of a first interaction event on the interaction interface, where the first interaction event is triggered by a touch operation on the interaction interface; and, when it is determined according to the first execution result to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of that mode. By judging from the first execution result whether to continue the voice dialogue flow, the method improves the fusion of voice interaction and manual touch interaction and improves the interaction experience.

Description

Voice interaction method and device, vehicle and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular relates to a voice interaction method, a voice interaction device, a vehicle and a storage medium.
Background
At present, with the development of intelligent vehicles, human-machine interaction has gradually evolved from manual operation of a graphical user interface (Graphical User Interface, GUI) to interaction through a voice user interface (Voice User Interface, VUI), improving driving safety and convenience.
Disclosure of Invention
The embodiment of the disclosure at least provides a voice interaction method, a voice interaction device, a vehicle and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a voice interaction method, including:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used for operating a target function;
Determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface;
and, when it is determined according to the first execution result to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of the multi-round voice dialogue mode.
In the embodiment of the disclosure, it is determined, based on a first voice instruction, to enter a multi-round voice dialogue mode, and an interactive interface of the target function is displayed in that mode. A first execution result of a first interaction event on the interactive interface is obtained, where the first interaction event is triggered by a touch operation on the interactive interface. When it is determined, according to the first execution result, that the voice dialogue flow of the multi-round voice dialogue mode should continue, the next round of dialogue is entered. In this way, a touch operation can be performed on the interactive interface while in the multi-round voice dialogue mode, and whether to continue the multi-round voice dialogue flow is judged from the first execution result corresponding to that touch operation. The logic for realizing voice interaction is simpler, voice interaction can be better fused with GUI interaction, direct interruption of the voice interaction process by a manual touch operation is avoided, interaction flexibility and convenience are improved, and the interaction experience of the user is also improved.
In an alternative embodiment, determining, according to the first execution result, a voice conversation process of continuing the multiple-round voice conversation mode includes:
determining an operation to be executed corresponding to the first execution result according to the first execution result, wherein the operation to be executed is used for triggering a second interaction event positioned after the first interaction event;
Judging whether the second interaction event belongs to a target interaction event associated with the multi-round voice conversation mode or judging whether a scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice conversation mode;
if yes, determining a voice conversation flow which continues the multi-round voice conversation mode.
In the embodiment of the disclosure, in the multi-round voice dialogue mode, the operation to be executed may be determined based on the first execution result of the touch operation. If the second interaction event triggered by that operation still belongs to a target interaction event or a target scene type associated with the voice service, the touch operation does not interrupt the multi-round voice dialogue mode, and the voice dialogue flow may be continued.
In an alternative embodiment, determining, according to the first execution result, a voice conversation process of continuing the multiple-round voice conversation mode includes:
Judging whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice dialogue mode;
if yes, determining a voice conversation flow which continues the multi-round voice conversation mode.
In the embodiment of the disclosure, judging whether the first execution result belongs to a target execution result associated with the voice service allows the multi-round voice dialogue mode to be maintained directly, improving the efficiency of the judgment.
In an alternative embodiment, after displaying the interactive interface of the target function in the multi-round voice conversation mode, the method further includes:
determining a full duplex waiting state for entering the multi-round voice conversation mode;
And if the first execution result is not obtained before the timing duration of the full duplex waiting state is reached, ending the multi-round voice conversation mode.
In the embodiment of the disclosure, whether to end the multi-round voice dialogue mode can be judged by monitoring the timed duration of the full duplex waiting state entered in that mode, avoiding the power and resource consumption caused by keeping the multi-round voice dialogue mode on for a long time while the user conducts no voice dialogue.
In an alternative embodiment, the method further comprises:
when the voice conversation flow of the multi-round voice conversation mode is determined not to be continued according to the first execution result, the full duplex waiting state is maintained;
And ending the multi-round voice conversation mode after the timing duration of the full duplex waiting state is reached, or determining to enter a new multi-round voice conversation mode if a second voice command is received before the timing duration of the full duplex waiting state is reached.
In the embodiment of the disclosure, when it is determined according to the first execution result that the voice dialogue flow is not continued, a full duplex waiting state of a certain duration can still be maintained, so that when voice is triggered again later, the time needed to re-enter the multi-round voice dialogue mode is reduced and the response speed is improved.
In an alternative embodiment, the obtaining the first execution result of the first interaction event on the interaction interface includes:
Monitoring a callback interface of the first interaction event on the interaction interface, wherein the callback interface is an interface registered in advance for the execution result;
And obtaining a return value of the callback interface according to the monitoring result, wherein the return value is used for indicating a first execution result of the first interaction event.
In the embodiment of the disclosure, the first execution result can be monitored through the pre-registered interface, so that the implementation is simple and convenient, and the efficiency is improved.
In an alternative embodiment, before determining to continue the voice conversation process of the multiple voice conversation mode, the method further includes:
And determining that the voice state corresponding to the multi-round voice conversation mode is in an activated state.
In the embodiment of the disclosure, whether to continue the voice conversation process is judged by combining the first execution result and whether the voice state is in the activated state, so that the judgment accuracy is improved, and the voice interaction experience is improved.
In a second aspect, an embodiment of the present disclosure further provides a voice interaction device, including a processor and a memory. The memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed, the processor performs the following steps:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used for operating a target function;
Determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface;
and according to the first execution result, when the voice conversation flow of the multi-round voice conversation mode is determined to be continued, entering the next round of conversation of the multi-round voice conversation mode.
In a third aspect, an optional implementation manner of the disclosure further provides a vehicle, where the vehicle includes the apparatus in the second aspect.
In a fourth aspect, an alternative implementation of the present disclosure further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the first aspect, or any of the possible implementation manners of the first aspect.
For the effects of the voice interaction device, the vehicle, and the computer-readable storage medium, reference may be made to the description of the voice interaction method, which is not repeated here.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the disclosure.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 illustrates a flow chart of a method of voice interaction provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a voice interaction device according to an embodiment of the present disclosure;
fig. 3 shows a schematic view of a vehicle provided by an embodiment of the present disclosure.
Detailed Description
For the convenience of understanding the technical solutions of the present disclosure, technical terms in the embodiments of the present disclosure will be described first:
A graphical user interface (Graphical User Interface, GUI) is a user interface displayed graphically for operating a computer; it is a dialogue interface between the computer and its user, on which the user can operate icons, menu options, and the like by means of, for example, a mouse or finger touches.
A voice user interface (Voice User Interface, VUI) enables voice interaction between a person and a computer. A VUI need not have a specific visual interface and may be entirely auditory or tactile, for example sound-controlled blinking of lights or vibration.
Human-machine interaction in intelligent transportation means has gradually evolved from manual operation of a GUI to interaction through a VUI, making vehicle driving more convenient and safer. In the related art, a user establishes a voice interaction flow through a voice wake-up instruction and can then input voice control instructions by voice. However, if the user performs a manual click on a GUI control during the voice interaction flow, the voice interaction flow is interrupted and the GUI interaction flow is entered. In that scheme the coupling between the VUI and the GUI is high, the two cannot be well fused, interaction flexibility is reduced, and the convenience of voice interaction is also affected.
To solve the above problems, the disclosure provides a voice interaction method. A first voice instruction for interacting with a user interface is obtained, where the first voice instruction is used for operating a target function; it is determined to enter a multi-round voice dialogue mode, and an interactive interface of the target function is displayed in that mode. If a first interaction event is triggered by a touch operation on the interactive interface, a first execution result of the first interaction event is obtained; when it is determined, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode, the next round of dialogue is entered. In this way, after entering the multi-round voice dialogue mode, whether to continue that mode can be judged from the first execution result corresponding to the touch operation, instead of the voice dialogue flow being interrupted as soon as a manual touch operation is received, so interference from manual operation is avoided. Moreover, judging whether to continue the voice dialogue flow only requires monitoring the first execution result; the target function does not need to actively forward the various touch operations to the voice side, so the voice interaction function is realized with greater flexibility.
To facilitate understanding of the present embodiment, the voice interaction method in the embodiment of the present disclosure is first described below. The execution subject of the voice interaction method provided by the embodiment of the disclosure is generally a computer device with certain computing capability, such as a vehicle-mounted terminal, a vehicle head unit, a terminal device, or another processing device. For example, in an intelligent-vehicle application scenario, the intelligent vehicle may be an electric vehicle, a fuel vehicle, a hybrid electric vehicle, or the like. The head unit of the vehicle is the in-vehicle infotainment product installed in the vehicle; it can provide various service functions, human-machine interaction functions, and so on, and the user can interact with the vehicle through a display device such as the vehicle's screen and control vehicle functions through GUI or VUI interaction. In some possible implementations, the voice interaction method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The following describes the voice interaction method provided by the embodiment of the present disclosure, taking the head unit of a vehicle as the execution subject as an example. Referring to fig. 1, a flowchart of a voice interaction method according to an embodiment of the disclosure is shown, where the method includes:
S101, a first voice instruction for interacting with a user interface is obtained, wherein the first voice instruction is used for operating a target function.
In one possible implementation, the target function may be a function in a target application, for example the temperature-adjustment function of the air conditioner. The function may be operated on the air conditioner application interface, or on a function card displayed on the vehicle interface based on an invoked function component (without opening the air conditioner application).
S102, determining to enter a multi-round voice conversation mode and displaying an interactive interface of a target function in the multi-round voice conversation mode.
In the embodiment of the disclosure, the application scenario is mainly an intelligent vehicle that supports a multi-round voice dialogue mode. In this mode, multiple rounds of dialogue can be conducted with only a single wake-up. For example, the user can input a first voice instruction through the vehicle voice assistant, such as "please open target function A", and a new round of dialogue in the multi-round voice dialogue mode is entered. The multi-round voice dialogue mode may be triggered when a voice dialogue flow has been triggered at least once before and the timed duration of the full duplex waiting state has not yet been reached. After performing semantic understanding and analysis on the first voice instruction, the dialogue management module can trigger opening target function A and display the interactive interface of target function A.
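As a concrete illustration of this entry flow, the sketch below shows a first voice instruction being parsed and the multi-round mode being entered with the target function's interface displayed. The intent names, the `TARGET_FUNCTIONS` table, and the `DialogueManager` structure are invented for illustration and are not the patent's actual implementation.

```python
# Illustrative sketch only: entering the multi-round voice dialogue mode
# on a first voice instruction (step S102).

# Hypothetical mapping from parsed intents to target functions.
TARGET_FUNCTIONS = {"open_function_a": "target function A"}

def parse_intent(first_voice_instruction: str) -> str:
    """Stand-in for the semantic understanding/analysis of the first voice command."""
    if "target function a" in first_voice_instruction.lower():
        return "open_function_a"
    return "unknown"

class DialogueManager:
    def __init__(self):
        self.multi_round_mode = False
        self.displayed_interface = None

    def on_first_voice_instruction(self, instruction: str) -> bool:
        intent = parse_intent(instruction)
        if intent not in TARGET_FUNCTIONS:
            return False
        # Enter the multi-round voice dialogue mode and display the
        # interactive interface of the target function.
        self.multi_round_mode = True
        self.displayed_interface = TARGET_FUNCTIONS[intent]
        return True

dm = DialogueManager()
entered = dm.on_first_voice_instruction("Please open target function A")
print(entered, dm.multi_round_mode, dm.displayed_interface)
```

A command that parses to no known target function simply leaves the mode unentered, matching the single-wake-up behavior described above.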
And S103, obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface.
Specifically, for step S103, the disclosure provides a possible implementation: monitoring a callback interface of the first interaction event on the interaction interface, where the callback interface is an interface registered in advance for obtaining the execution result; and obtaining the return value of the callback interface according to the monitoring result, where the return value indicates the first execution result of the first interaction event.
For example, a corresponding callback interface may be preset and registered for a target function in a certain target application, and the first execution result of a first interaction event associated with that target function may be obtained by monitoring the return value of the callback interface. In addition, a given target function usually needs to monitor only a limited set of execution results, so the corresponding callback interfaces may be registered in advance as required, and their return values monitored to obtain the corresponding first execution results.
Because the first interaction event is triggered by a touch operation on the interactive interface of the target function, the first execution result is usually generated by an application service. The application service can be understood as a module that provides background services for the target function: it receives the first interaction event input by the touch operation on the interactive interface and performs the corresponding processing based on it to obtain the first execution result. For example, if the target function is the navigation function in a map and the first interaction event is clicking the navigate-to-place button, the first execution result may be that route planning succeeded, together with the planned navigation route.
On this basis, in the embodiment of the present disclosure, the voice service may actively monitor the application service for the first execution result in real time. The voice service can be understood as a module that provides background services for the voice function; after the application service generates the first execution result for the first interaction event, the voice service obtains that first execution result.
It should be noted that, in the embodiment of the present disclosure, the voice service does not need to care what the first interaction event is, nor whether it was triggered by voice or by a manual click; it only needs to monitor the first execution result and judge from it whether to continue the multi-round voice dialogue mode. This reduces the coupling between the application service and the voice service, and also lowers the access threshold and cost for application developers.
In this way, the responsibility for acquiring the required data shifts from the application service actively notifying the voice service to the voice service actively monitoring for it. The application service does not need to report the various touch operations to the voice service, which reduces the voice service's dependence on the application service and decouples the two. The voice service does not need to monitor touch operations or interaction events; it is concerned only with the first execution result, so the implementation logic is simpler.
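The decoupled listening arrangement described above can be sketched as follows. The class and method names (`ApplicationService`, `register_result_callback`, and so on) are assumptions for illustration only: the voice service registers on a pre-defined callback interface and receives only execution results, regardless of how the triggering event was produced.

```python
# Illustrative sketch only: an application service exposes a callback
# interface registered in advance; the voice service listens on it and
# sees execution results, never the raw touch events themselves.

class ApplicationService:
    def __init__(self):
        self._callbacks = []  # callback interfaces registered for execution results

    def register_result_callback(self, callback):
        self._callbacks.append(callback)

    def handle_interaction_event(self, event: str) -> str:
        # The application processes the first interaction event (e.g. a touch
        # on the interactive interface) and produces an execution result.
        result = f"{event}:success"
        for cb in self._callbacks:  # the return value reaches all listeners
            cb(result)
        return result

class VoiceService:
    """Monitors execution results only; unaware of how events were triggered."""
    def __init__(self, app: ApplicationService):
        self.last_result = None
        app.register_result_callback(self._on_result)

    def _on_result(self, result: str):
        self.last_result = result  # first execution result of the interaction event

app = ApplicationService()
voice = VoiceService(app)
app.handle_interaction_event("route_calculation")
print(voice.last_result)
```

Note that `handle_interaction_event` never consults the voice service before acting, which is the decoupling the passage describes.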
And S104, according to the first execution result, when the voice conversation flow of the multi-round voice conversation mode is determined to be continued, entering the next round of conversation of the multi-round voice conversation mode.
The following describes how to determine, according to the first execution result, whether to continue the voice dialogue flow of the multi-round voice dialogue mode. In the embodiment of the disclosure, after the first execution result of the first interaction event is obtained, whether to continue the voice dialogue flow can be judged, avoiding the situation in which a manual touch operation directly interrupts the voice interaction process.
Specifically, for determining a voice conversation process of continuing a multi-round voice conversation mode according to a first execution result, several possible implementations are provided in the embodiments of the present disclosure, including:
In a possible implementation, determining, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode includes: 1) determining, according to the first execution result, an operation to be executed corresponding to it, where the operation to be executed is used for triggering a second interaction event located after the first interaction event; 2) judging whether the second interaction event belongs to a target interaction event associated with the multi-round voice dialogue mode, or judging whether the scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice dialogue mode; 3) if yes, determining to continue the voice dialogue flow of the multi-round voice dialogue mode.
The target interaction event or the target scene type associated with the multi-round voice dialogue mode may be preset, for example, the target scene type associated with the voice service may be set to have a navigation scene, or a specific refinement scene in the navigation scene, etc., which is not limited.
For example, suppose a user opens the navigation function in a map application by voice, enters the multi-round voice dialogue mode, and the interactive interface of the navigation function is displayed. If the user clicks the route-calculation icon on the interactive interface through a touch operation, the route-calculation processing yields a first execution result of success or failure. If the first execution result is success, several calculated candidate routes can be displayed to the user; if selecting a route among the candidates is preset as an event responded to by the voice service, i.e., a target interaction event associated with the multi-round dialogue mode, it can be judged that the voice dialogue flow continues.
Therefore, when the second interaction event following the first execution result of the touch operation is judged to still be associated with the voice service, the multi-round voice dialogue mode can be maintained, so the manual touch operation does not interrupt it; voice interaction can then proceed directly without restarting the multi-round voice dialogue mode, improving efficiency and the interaction experience.
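A minimal sketch of this continuation check follows. The event names, scene types, and the result-to-operation mapping are invented examples, not values from the patent:

```python
# Illustrative sketch only: decide whether the multi-round voice dialogue
# flow continues, from the second interaction event that the first
# execution result leads to.

TARGET_INTERACTION_EVENTS = {"path_selection"}  # events answered by voice
TARGET_SCENE_TYPES = {"navigation"}             # scene types bound to the mode

# Hypothetical mapping: first execution result -> (operation to be executed,
# second interaction event it triggers, scene type of that event).
PENDING_OPERATIONS = {
    "route_calculation:success": ("select_path", "path_selection", "navigation"),
}

def should_continue_dialogue(first_execution_result: str) -> bool:
    pending = PENDING_OPERATIONS.get(first_execution_result)
    if pending is None:
        return False
    _operation, second_event, scene_type = pending
    # Continue the voice dialogue flow if the second interaction event,
    # or its scene type, is associated with the multi-round mode.
    return (second_event in TARGET_INTERACTION_EVENTS
            or scene_type in TARGET_SCENE_TYPES)

print(should_continue_dialogue("route_calculation:success"))  # True
print(should_continue_dialogue("route_calculation:failure"))  # False
```

A failed route calculation maps to no pending operation here, so the touch operation would not keep the dialogue flow alive.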
In another possible embodiment, determining the voice conversation process of continuing the multi-round voice conversation mode according to the first execution result includes determining whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice conversation mode, and if so, determining the voice conversation process of continuing the multi-round voice conversation mode.
Specifically, in the embodiment of the present disclosure, an association list corresponding to the voice service may be set, containing a number of target interaction events or target execution results. For different applications or functions, which target interaction events' execution results the voice service needs to focus on may be preset according to experience and requirements. For example, for a vehicle entertainment function, the search results of search interaction events related to music or video, playback-control events, and the like may be preset as target interaction events or target execution results to be focused on by the voice service.
In the embodiment of the disclosure, for the target execution results of the target interaction events to be focused on, the corresponding callback interfaces are registered for the voice service by defining their interface functions in advance and setting the caller of each interface function to the voice service. In other words, for whichever functions' first execution results the voice service needs to monitor or respond to, the corresponding callback interfaces can be registered in advance, and the voice service can actively monitor those callback interfaces to obtain the first execution results. If a first execution result is judged to correspond to a callback interface being focused on, it can be judged to be a target execution result of a target interaction event associated with the multi-round voice dialogue mode, and it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode.
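The pre-registration idea can be sketched as below, under the assumption (not stated in the patent) that the registry is a simple mapping from callback-interface name to its registered caller:

```python
# Illustrative sketch only: callback interfaces are registered in advance
# for the target execution results the voice service must monitor; a
# result arriving on a registered interface implies the dialogue continues.

class CallbackRegistry:
    def __init__(self):
        self._interfaces = {}  # interface name -> caller of the interface function

    def register(self, interface: str, caller: str = "voice_service"):
        self._interfaces[interface] = caller

    def is_monitored(self, interface: str) -> bool:
        return self._interfaces.get(interface) == "voice_service"

registry = CallbackRegistry()
# Pre-register callback interfaces for results the voice service cares
# about, e.g. media search results and playback-control results.
registry.register("on_music_search_result")
registry.register("on_playback_control_result")

def should_continue(interface: str) -> bool:
    # A first execution result delivered through a monitored callback
    # interface is a target execution result: continue the dialogue flow.
    return registry.is_monitored(interface)

print(should_continue("on_music_search_result"))  # True
print(should_continue("on_window_close"))         # False
```

Because the check is a single lookup against the pre-registered list, this matches the efficiency claim made for the second implementation above.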
Further, in the embodiment of the present disclosure, to further improve the experience of combined voice and manual touch interaction, a waiting countdown for the multi-round voice dialogue mode may be added, so that the mode is not left on indefinitely with no triggering operation, reducing performance consumption. The disclosure provides possible implementations:
1) In one possible implementation, after displaying the interactive interface of the target function in the multi-round voice conversation mode, the full duplex waiting state of entering the multi-round voice conversation mode can be determined, and if the first execution result is not obtained before the timing duration of the full duplex waiting state is reached, the multi-round voice conversation mode is ended.
For example, suppose the preset timed duration of the full duplex waiting state is 10 seconds. In the multi-round voice dialogue mode, the user initiates a round of dialogue and inputs the first voice instruction "please play songs"; songs are played and an interactive interface associated with them is displayed on the vehicle display screen, and at the same time the full duplex waiting state is entered. If, within 10 seconds after that round of dialogue ends, no new voice instruction is received and no first execution result corresponding to a touch operation on the interactive interface is obtained, the multi-round voice dialogue mode ends.
In the embodiment of the disclosure, after entering a multi-round voice dialogue mode based on a first voice command and displaying an interactive interface of a target function in the multi-round voice dialogue mode, entering a double-full-work waiting state, wherein the double-full-work waiting state represents countdown waiting, if a first execution result is not obtained all the time in the stage of the double-full-work waiting state, the multi-round voice dialogue mode can be ended, at the moment, the multi-round voice dialogue mode can enter an inactive state and no response to voice or touch operation is performed, and if a subsequent user needs to use voice service, the subsequent user needs to wake up again.
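A minimal sketch of this countdown behaviour is shown below, with an injectable clock so the timing can be simulated; the 10-second figure and all names are illustrative only:

```python
import time

class FullDuplexWait:
    """Countdown for the full-duplex waiting state: any voice command or
    execution result restarts the countdown; expiry ends the multi-round
    voice dialogue mode."""

    def __init__(self, timeout_s: float = 10.0, clock=time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._deadline = clock() + timeout_s
        self.active = True  # multi-round mode is active

    def on_event(self) -> None:
        # A new voice command or first execution result restarts the wait.
        self._deadline = self._clock() + self.timeout_s

    def poll(self) -> bool:
        # Returns True while the multi-round mode should stay active.
        if self.active and self._clock() >= self._deadline:
            self.active = False  # timed out: end the multi-round mode
        return self.active
```

Using a monotonic clock (rather than wall-clock time) keeps the countdown immune to system time changes.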
2) In another possible implementation, when it is determined according to the first execution result not to continue the voice dialogue flow of the multi-round voice dialogue mode, the full-duplex waiting state is maintained; the multi-round voice dialogue mode is ended after the timing duration of the full-duplex waiting state is reached, or, if a second voice command is received before the timing duration is reached, a new multi-round voice dialogue mode is entered.
In this embodiment, even when it is determined according to the first execution result that the voice dialogue flow of the multi-round voice dialogue mode is not continued, the full-duplex waiting state is still maintained for a period of time, so that if a voice command, or another target execution result associated with the multi-round voice dialogue mode, arrives before the timing duration expires, the voice service can still be used, which improves efficiency and the user's interaction experience.
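The branch just described reduces to a small decision function (the state labels are illustrative, not part of the disclosed method):

```python
def wait_state_transition(timed_out: bool, second_command_received: bool) -> str:
    """After deciding not to continue the current voice dialogue flow, the
    full-duplex waiting state is held: a second voice command received
    before the timeout opens a new multi-round mode, while expiry of the
    timing duration ends the mode."""
    if second_command_received:
        return "new_multi_round_mode"
    if timed_out:
        return "ended"
    return "waiting"
```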
Furthermore, in the embodiment of the present disclosure, within the timing duration of the full-duplex waiting state, before determining to continue the voice dialogue flow of the multi-round voice dialogue mode, it is further required to determine that the voice state corresponding to the multi-round voice dialogue mode is active.
Therefore, by checking both the voice state and the first execution result, if the voice state is active and the first execution result needs to be responded to or processed, the voice dialogue flow of the multi-round voice dialogue mode can continue and the next round of dialogue starts, so that the voice dialogue flow is not interrupted by the manual touch operation and the user's interaction experience is improved.
Of course, in the embodiment of the present disclosure, if the voice state corresponding to the current multi-round voice dialogue mode is inactive, the voice dialogue flow of the multi-round voice dialogue mode is not continued and the GUI flow is executed instead, so as to avoid suddenly starting voice playback, which would harm the user's interaction experience and not match an actual voice interaction scene.
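The two-condition check above — voice state active and an execution result that warrants a response — can be sketched as follows (flow labels are illustrative):

```python
def choose_flow(voice_state_active: bool, result_needs_response: bool) -> str:
    """Continue the voice dialogue flow only when both conditions hold;
    otherwise fall back to the ordinary GUI flow so that voice playback
    does not start unexpectedly."""
    if voice_state_active and result_needs_response:
        return "voice_dialogue"  # enter the next round of dialogue
    return "gui"                 # keep the manual touch flow only
```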
In addition, in the embodiment of the present disclosure, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, the next round of dialogue of the multi-round voice dialogue mode may be entered. At this point, the present disclosure further provides a possible implementation: playing voice guidance information, where voice guidance information can be understood as information that guides the subsequent voice dialogue, for example question information or result prompt information, without limitation. The present disclosure provides several possible embodiments:
1) In one possible implementation, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, semantic analysis is performed on the first execution result to determine intention information, voice guidance information corresponding to the intention information is determined, and the next round of dialogue of the multi-round voice dialogue mode is entered.
In the embodiment of the disclosure, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, the intention information following the first execution result may be determined from that result. Specifically, based on a trained intention model, the first execution result may be input into the intention model, which performs semantic analysis and outputs the intention information corresponding to the first execution result, that is, what the user is likely to want to do after this result is obtained. The voice guidance information corresponding to the intention information can then be played to guide the user into the next round of voice dialogue.
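A trained intention model is assumed above; the toy rule table below merely stands in for it, to show the data flow from execution result to guidance text (all strings and keys are hypothetical):

```python
from typing import Optional

def infer_intent(execution_result: dict) -> str:
    # Stand-in for semantic analysis by a trained intention model.
    kind = execution_result.get("type")
    if kind == "navigation":
        return "start_navigation"
    if kind == "media":
        return "continue_playback"
    return "unknown"

GUIDANCE_BY_INTENT = {
    "start_navigation": "Route planned for you, starting navigation.",
    "continue_playback": "Now playing your selection.",
}

def guidance_for(execution_result: dict) -> Optional[str]:
    # Map the inferred intention to the voice guidance to be played.
    return GUIDANCE_BY_INTENT.get(infer_intent(execution_result))
```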
For example, a user opens a map application and searches for a place through a voice command, and a list of places is displayed on the interactive interface; the user can manually click a place, or select one through a voice command, to start navigation. The voice service then obtains a first execution result, namely the navigation result, determines that the current voice state is active, and determines to continue the voice dialogue flow of the multi-round voice dialogue mode. Further, through intention analysis of the first execution result, the corresponding voice guidance information is obtained, for example "Route planned for you, starting navigation", so that while the navigation result page is displayed to the user, the voice guidance information can be played at the same time.
2) In another possible implementation, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, the voice guidance information corresponding to the first execution result is determined according to a preset mapping relationship between execution results and voice guidance information, and the next round of dialogue of the multi-round voice dialogue mode is entered.
In other words, in the embodiment of the present disclosure, the mapping relationship between each first execution result and its voice guidance information may be preset, so that different first execution results trigger different voice guidance information. The corresponding voice guidance information can then be obtained directly from the mapping relationship and played, which is simple to implement and relatively efficient.
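Under this second implementation no model is needed: a preset mapping keyed directly on the execution result suffices. A sketch, with keys and phrases that are purely illustrative:

```python
from typing import Optional

# Preset mapping between execution results and voice guidance.
GUIDANCE_BY_RESULT = {
    ("navigation", "ok"): "Route planned for you, starting navigation.",
    ("media", "ok"): "Now playing your selection.",
}

def lookup_guidance(execution_result: dict) -> Optional[str]:
    # Direct table lookup: no semantic analysis involved.
    key = (execution_result.get("type"), execution_result.get("status"))
    return GUIDANCE_BY_RESULT.get(key)
```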
Further, after the voice guidance information corresponding to the first execution result is played, in the embodiment of the present disclosure, a voice instruction fed back in response to the voice guidance information may also be obtained; control information is generated from the voice instruction and sent to the corresponding control device of the vehicle, where the control information is used to control the vehicle through that control device, thereby implementing voice control of the vehicle, for example lowering the playback volume or closing a window.
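The final step — turning a follow-up voice instruction into control information addressed to a vehicle control device — might look like the sketch below; the device names and command table are assumptions for illustration, not the actual in-vehicle protocol:

```python
from typing import Optional

COMMAND_TABLE = {
    "lower the volume": ("media_controller", {"action": "set_volume", "delta": -10}),
    "close the window": ("body_controller", {"action": "window", "state": "closed"}),
}

def build_control_info(voice_instruction: str) -> Optional[dict]:
    # Generate control information from a recognized voice instruction;
    # the caller would then send it to the corresponding control device.
    entry = COMMAND_TABLE.get(voice_instruction.strip().lower())
    if entry is None:
        return None
    device, payload = entry
    return {"device": device, "payload": payload}
```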
It will be appreciated by those skilled in the art that, in the methods of the specific embodiments described above, the written order of the steps does not imply a strict order of execution; the actual execution order should be determined by the function of each step and its possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure also provide a voice interaction device corresponding to the voice interaction method. Since the principle by which the device solves the problem is similar to that of the voice interaction method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 2, a schematic diagram of a voice interaction device according to an embodiment of the disclosure is provided, where the device includes a processor 21 and a memory 22, where the memory 22 stores machine-readable instructions executable by the processor 21, the processor 21 is configured to execute the machine-readable instructions stored in the memory 22, and when the machine-readable instructions are executed by the processor 21, the processor 21 is configured to perform the following steps:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used for operating a target function;
Determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interaction interface, wherein the first interaction event is triggered by touch operation on the interaction interface;
and according to the first execution result, when the voice conversation flow of the multi-round voice conversation mode is determined to be continued, entering the next round of conversation of the multi-round voice conversation mode.
In an alternative embodiment, when determining to continue the voice dialogue flow of the multi-round voice dialogue mode according to the first execution result, the processor 21 is configured to:
determining an operation to be executed corresponding to the first execution result according to the first execution result, wherein the operation to be executed is used for triggering a second interaction event positioned after the first interaction event;
Judging whether the second interaction event belongs to a target interaction event associated with the multi-round voice conversation mode or judging whether a scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice conversation mode;
if yes, determining a voice conversation flow which continues the multi-round voice conversation mode.
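The membership test in these steps amounts to checking the second interaction event, or its scene type, against sets associated with the multi-round mode; in this sketch the set contents are purely illustrative:

```python
TARGET_EVENTS = {"select_and_play", "confirm_navigation"}  # illustrative
TARGET_SCENES = {"media", "navigation"}                    # illustrative

def continues_dialogue(second_event: str, scene_type: str) -> bool:
    # Continue the multi-round voice dialogue flow if either test passes.
    return second_event in TARGET_EVENTS or scene_type in TARGET_SCENES
```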
In an alternative embodiment, when determining to continue the voice conversation process of the multi-round voice conversation mode according to the first execution result, the processor 21 is configured to:
judging whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice conversation mode, and if so, determining to continue the voice conversation flow of the multi-round voice conversation mode.
In an alternative embodiment, after displaying the interactive interface of the target function in the multi-round voice conversation mode, the processor 21 is further configured to:
determining a full duplex waiting state for entering the multi-round voice conversation mode;
And if the first execution result is not obtained before the timing duration of the full duplex waiting state is reached, ending the multi-round voice conversation mode.
In an alternative embodiment, the processor 21 is further configured to:
when the voice conversation flow of the multi-round voice conversation mode is determined not to be continued according to the first execution result, the full duplex waiting state is maintained;
And ending the multi-round voice conversation mode after the timing duration of the full duplex waiting state is reached, or determining to enter a new multi-round voice conversation mode if a second voice command is received before the timing duration of the full duplex waiting state is reached.
In an alternative embodiment, the obtaining the first execution result of the first interaction event on the interaction interface includes:
Monitoring a callback interface of the first interaction event on the interaction interface, wherein the callback interface characterizes an interface registered by an execution result;
And obtaining a return value of the callback interface according to the monitoring result, wherein the return value is used for indicating a first execution result of the first interaction event.
In an alternative embodiment, the processor 21 is further configured to determine that the voice state corresponding to the multiple voice conversation mode is in an active state before determining to continue the voice conversation process of the multiple voice conversation mode.
The memory 22 includes an internal memory 221 and an external memory 222. The internal memory 221, also referred to as main memory, is used for temporarily storing operation data of the processor 21 and data exchanged with the external memory 222, such as a hard disk; the processor 21 exchanges data with the external memory 222 through the internal memory 221.
The specific execution process of the above instruction may refer to the steps of the voice interaction method described in the embodiments of the present disclosure, which is not described herein.
The description of the processing flow of each component in the device, and of the interaction flows between the components, does not limit the implementation; the specific processing and interaction flows should be determined by the functions of the components and their possible internal logic.
Referring to fig. 3, a schematic diagram of a vehicle according to an embodiment of the disclosure is provided, where the vehicle includes the voice interaction device in the foregoing embodiment.
In the embodiment of the disclosure, the interactive interface of the target function may be displayed through a screen of the vehicle, where the screen of the vehicle may be further divided into a plurality of areas, including, for example, a central control screen, a main screen, a sub-screen, and the like, and not limited thereto, a voice assistant may be installed in the vehicle system, and a user may interact with the vehicle through touch or voice. For example, the user may input a voice command "open door", from which the voice assistant generates a control command and sends it to the vehicle's controller, which may control the opening of the door of the vehicle.
In practice, a user interacting with a vehicle may mix touch and voice: for example, voice is used to open multimedia music, and a manual touch then clicks an item in the list to select and play it. In the embodiment of the disclosure, whether to continue the voice dialogue flow of the multi-round voice dialogue mode can be judged according to the first execution result of this touch-triggered selection-and-play event; when it is determined to continue, the next round of dialogue of the multi-round voice dialogue mode is entered and corresponding voice guidance information can be played, guiding the user into the next round of dialogue and improving the interaction experience.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the voice interaction method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, on which a computer program is stored, where the computer program when executed by a processor implements the steps of the voice interaction method described in the foregoing method embodiments, and specifically, reference may be made to the foregoing method embodiments, which are not described herein in detail.
The methods of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device, a core network device, an operation administration and maintenance (OAM) system, or another programmable device.
The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program or instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media. The usable medium may be a magnetic medium such as a floppy disk, a hard disk, a magnetic tape, an optical medium such as a digital video disk, or a semiconductor medium such as a solid state disk. The computer readable storage medium may be volatile or nonvolatile storage medium, or may include both volatile and nonvolatile types of storage medium.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such an understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
It should be noted that the foregoing embodiments are merely specific implementations of the disclosure and are not intended to limit its scope. Although the disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications, variations, or equivalent substitutions of some of the technical features may still be made within the technical scope of the disclosure without departing from the spirit and scope of the technical solutions of the embodiments. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A voice interaction method, comprising:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used to operate a target function;
determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interactive interface, wherein the first interaction event is triggered by a touch operation on the interactive interface;
and according to the first execution result, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of the multi-round voice dialogue mode.

2. The method according to claim 1, wherein determining, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode comprises:
determining, according to the first execution result, an operation to be executed corresponding to the first execution result, wherein the operation to be executed is used to trigger a second interaction event located after the first interaction event;
determining whether the second interaction event belongs to a target interaction event associated with the multi-round voice dialogue mode, or determining whether a scene type corresponding to the second interaction event belongs to a target scene type associated with the multi-round voice dialogue mode;
and if so, determining to continue the voice dialogue flow of the multi-round voice dialogue mode.

3. The method according to claim 1, wherein determining, according to the first execution result, to continue the voice dialogue flow of the multi-round voice dialogue mode comprises:
determining whether the first execution result belongs to a target execution result of a target interaction event associated with the multi-round voice dialogue mode;
and if so, determining to continue the voice dialogue flow of the multi-round voice dialogue mode.

4. The method according to any one of claims 1 to 3, wherein after displaying the interactive interface of the target function in the multi-round voice dialogue mode, the method further comprises:
determining to enter a full-duplex waiting state of the multi-round voice dialogue mode;
and if the first execution result is not obtained before the timing duration of the full-duplex waiting state is reached, ending the multi-round voice dialogue mode.

5. The method according to claim 4, further comprising:
when it is determined according to the first execution result not to continue the voice dialogue flow of the multi-round voice dialogue mode, maintaining the full-duplex waiting state;
and ending the multi-round voice dialogue mode after the timing duration of the full-duplex waiting state is reached, or, if a second voice instruction is received before the timing duration of the full-duplex waiting state is reached, determining to enter a new multi-round voice dialogue mode.

6. The method according to claim 1, wherein obtaining the first execution result of the first interaction event on the interactive interface comprises:
monitoring a callback interface of the first interaction event on the interactive interface, wherein the callback interface represents an interface registered for obtaining the execution result;
and obtaining, according to the monitoring result, a return value of the callback interface, wherein the return value is used to indicate the first execution result of the first interaction event.

7. The method according to claim 1, wherein before determining to continue the voice dialogue flow of the multi-round voice dialogue mode, the method further comprises:
determining that the voice state corresponding to the multi-round voice dialogue mode is in an activated state.

8. A voice interaction device, comprising a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the processor performs the following steps:
obtaining a first voice instruction for interacting with a user interface, wherein the first voice instruction is used to operate a target function;
determining to enter a multi-round voice dialogue mode, and displaying an interactive interface of the target function in the multi-round voice dialogue mode;
obtaining a first execution result of a first interaction event on the interactive interface, wherein the first interaction event is triggered by a touch operation on the interactive interface;
and according to the first execution result, when it is determined to continue the voice dialogue flow of the multi-round voice dialogue mode, entering the next round of dialogue of the multi-round voice dialogue mode.

9. A vehicle, comprising the voice interaction device according to claim 8.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202311378512.7A 2023-10-23 2023-10-23 Voice interaction method and device, vehicle and storage medium Pending CN119889302A (en)

Publications (1)

Publication Number Publication Date
CN119889302A true CN119889302A (en) 2025-04-25


Similar Documents

Publication Publication Date Title
JP6963700B2 (en) Generating and transmitting call requests to the appropriate third-party agents
US9218812B2 (en) Vehicular device, server, and information processing method
CN106558310B (en) Virtual reality voice control method and device
CN111095399B (en) Voice user interface shortcuts for assistant applications
CN110928409B (en) Vehicle-mounted scene mode control method and device, vehicle and storage medium
KR101809808B1 (en) System and method for emergency calls initiated by voice command
CN107783803B (en) System optimization method and device of intelligent terminal, storage medium and intelligent terminal
CN111813912B (en) Man-machine conversation method, device, equipment and storage medium
CN110050303B (en) Voice-to-text conversion based on third party proxy content
KR20200117070A (en) Initializing a conversation with an automated agent via selectable graphical element
JP7392128B2 (en) Semi-delegated calls with automated assistants on behalf of human participants
CN105912241A (en) Method and device for man-machine interaction, and terminal
CN112751971A (en) Voice playing method and device and electronic equipment
WO2022089483A1 (en) Audio playback control method and apparatus, and electronic device
CN114452644B (en) Game interaction method and device, computer storage medium, and electronic device
CN109144373A (en) An instant messaging method and device
WO2024230509A1 (en) Relationship establishment method and apparatus, device and storage medium
CN114639384A (en) Voice control method, device, equipment and computer storage medium
CN116691563A (en) Vehicle rest control method, device, equipment and storage medium
WO2000004533A1 (en) Automatic speech recognition
CN119889302A (en) Voice interaction method and device, vehicle and storage medium
CN113270096A (en) Voice response method and device, electronic equipment and computer readable storage medium
CN105843672A (en) Control method, device and system for application program
CN108459838B (en) Information processing method and electronic equipment
CN116935831A (en) Audio data processing method, intelligent voice system and intelligent voice device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载