

Device, method, and user interface for gesture-based interactions

Info

Publication number
CN120469577A
Authority
CN
China
Prior art keywords
gesture
user
hand
input
detecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510565348.3A
Other languages
Chinese (zh)
Inventor
聂毅强
G·M·阿诺里
A·W·德瑞尔
J·K·芬尼斯
C·马鲁夫
C·穆赛特
G·耶基斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apple Inc
Original Assignee
Apple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 18/242,694 (published as US 20240094819 A1)
Application filed by Apple Inc
Publication of CN120469577A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 - Eye tracking input arrangements
    • G06F3/016 - Input arrangements with force or tactile feedback as computer generated output to the user

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract


The present disclosure relates to devices, methods, and user interfaces for gesture-based interactions. In some embodiments, the present disclosure includes techniques and user interfaces for performing actions using in-air gestures. In some embodiments, the present disclosure includes techniques and user interfaces for adjusting audio playback using gestures. In some embodiments, the present disclosure includes techniques and user interfaces for conditionally responding to inputs.

Description

Device, method, and user interface for gesture-based interactions
This application is a divisional application of the Chinese patent application with application number 202380066404.8, filed on September 14, 2023, and entitled "Device, method, and user interface for gesture-based interactions".
Technical Field
The present disclosure relates generally to computer systems in communication with a display generation component, one or more input devices, and optionally an external wearable device, the computer systems providing a computer-generated experience, including but not limited to electronic devices that provide virtual reality and mixed reality experiences via a display. More particularly, the present disclosure relates to techniques for performing operations using gestures (e.g., air gestures).
Background
In recent years, the development of computer systems for augmented reality has increased significantly. An example augmented reality environment includes at least some virtual elements that replace or augment the physical world. Input devices (such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch screen displays) for computer systems and other electronic computing devices are used to interact with the virtual/augmented reality environment. Example virtual elements include virtual objects such as digital images, videos, text, icons, and control elements (such as buttons and other graphics).
Disclosure of Invention
Some methods and interfaces for performing operations using gestures are cumbersome, inefficient, and limited. For example, systems that provide insufficient feedback for actions associated with virtual objects, systems that require input on specific hardware to achieve a desired result in an augmented reality environment, and systems in which manipulating virtual objects is complex, tedious, and error-prone create a significant cognitive burden on the user and detract from the experience of the virtual/augmented reality environment. In addition, these methods take longer than necessary, thereby wasting the energy of the computer system. This latter consideration is particularly important in battery-powered devices.
Accordingly, there is a need for a computing system with improved methods and interfaces to provide a user with a computer-generated experience that makes interactions with the computing system using gestures more efficient and intuitive for the user. Such methods and interfaces optionally supplement or replace conventional methods for providing and interacting with an augmented reality experience. Such methods and interfaces reduce the number, extent, and/or nature of inputs from a user by helping the user understand the association between the inputs provided and the response of the device to those inputs, thereby forming a more efficient human-machine interface.
The above-described drawbacks and other problems associated with user interfaces of computer systems are reduced or eliminated by the disclosed systems. In some embodiments, the computer system is a desktop computer with an associated display. In some embodiments, the computer system is a portable device (e.g., a notebook computer, tablet computer, or handheld device). In some embodiments, the computer system is a personal electronic device (e.g., a wearable electronic device such as a watch or a head-mounted device). In some embodiments, the computer system has a touchpad. In some embodiments, the computer system has one or more cameras. In some embodiments, the computer system has a touch-sensitive display (also referred to as a "touch screen" or "touch screen display"). In some embodiments, the computer system has one or more eye tracking components. In some embodiments, the computer system has one or more hand tracking components. In some embodiments, the computer system has, in addition to the display generation component, one or more output devices including one or more haptic output generators and/or one or more audio output devices. In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing multiple functions. In some embodiments, the user interacts with the GUI through stylus and/or finger contacts and gestures on the touch-sensitive surface, movement of the user's eyes and hands in space relative to the GUI (and/or computer system) or the user's body (as captured by cameras and other motion sensors), and/or voice inputs (as captured by one or more audio input devices). In some embodiments, the functions performed through the interactions optionally include image editing, drawing, presenting, word processing, spreadsheet making, game playing, telephoning, video conferencing, e-mailing, instant messaging, workout support, digital photographing, digital video recording, web browsing, digital music playing, note taking, and/or digital video playing. Executable instructions for performing these functions are optionally included in a transitory and/or non-transitory computer-readable storage medium or other computer program product configured for execution by one or more processors.
There is a need for an electronic device with improved methods and interfaces for interacting with a three-dimensional environment using gestures. Such methods and interfaces may supplement or replace conventional methods for interacting with a three-dimensional environment using gestures. Such methods and interfaces reduce the number, extent, and/or nature of inputs from a user and produce a more efficient human-machine interface. For battery-powered computing devices, such methods and interfaces conserve power and increase the time interval between battery charges. Such methods may also improve the operational life of the device by reducing wear on input mechanisms (e.g., buttons).
According to some embodiments, a method performed at a computer system having one or more input devices is described. The method includes, at a computer system in communication with one or more input devices: detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and in response to detecting the air gesture: in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture is detected while the hand is performing a pinch gesture, performing a first operation based on the air gesture; and in accordance with a determination that the air gesture does not meet the first set of criteria, forgoing performing the first operation based on the air gesture.
According to some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system having one or more input devices, the one or more programs including instructions for: detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and in response to detecting the air gesture: in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture is detected while the hand is performing a pinch gesture, performing a first operation based on the air gesture; and in accordance with a determination that the air gesture does not meet the first set of criteria, forgoing performing the first operation based on the air gesture.
According to some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system having one or more input devices, the one or more programs including instructions for: detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and in response to detecting the air gesture: in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture is detected while the hand is performing a pinch gesture, performing a first operation based on the air gesture; and in accordance with a determination that the air gesture does not meet the first set of criteria, forgoing performing the first operation based on the air gesture.
According to some embodiments, a computer system having one or more input devices is described that includes one or more processors and memory storing one or more programs configured to be executed by the one or more processors. The one or more programs include instructions for: detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and in response to detecting the air gesture: in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture is detected while the hand is performing a pinch gesture, performing a first operation based on the air gesture; and in accordance with a determination that the air gesture does not meet the first set of criteria, forgoing performing the first operation based on the air gesture.
According to some embodiments, a computer system having one or more input devices is described. The computer system includes: means for detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and means, responsive to detecting the air gesture, for: in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture is detected while the hand is performing a pinch gesture, performing a first operation based on the air gesture; and in accordance with a determination that the air gesture does not meet the first set of criteria, forgoing performing the first operation based on the air gesture.
According to some embodiments, a computer program product is described that includes one or more programs configured to be executed by one or more processors of a computer system having one or more input devices. The one or more programs include instructions for: detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and in response to detecting the air gesture: in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture is detected while the hand is performing a pinch gesture, performing a first operation based on the air gesture; and in accordance with a determination that the air gesture does not meet the first set of criteria, forgoing performing the first operation based on the air gesture.
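Purely as an illustrative, non-limiting sketch of the kind of criteria check described in the embodiments above, the following Swift snippet shows one way the pinch-plus-rotation condition could be evaluated; all type names, function names, and the example operation are hypothetical placeholders rather than the disclosed implementation.

// Hypothetical sketch only: names and the example operation are illustrative.
struct AirGesture {
    var handIsPinching: Bool      // whether the hand is performing a pinch gesture
    var rotationDegrees: Double   // rotation of the hand detected as part of the gesture
}

func handle(_ gesture: AirGesture, performFirstOperation: (Double) -> Void) {
    // First set of criteria: the air gesture is detected while the hand is performing a pinch gesture.
    if gesture.handIsPinching {
        performFirstOperation(gesture.rotationDegrees)   // perform the first operation based on the air gesture
    }
    // Otherwise, forgo performing the first operation based on the air gesture.
}

handle(AirGesture(handIsPinching: true, rotationDegrees: 30)) { rotation in
    print("Performing the first operation based on a hand rotation of \(rotation) degrees")
}

In such a sketch, the magnitude and direction of the rotation could parameterize the operation (for example, the amount by which a value is adjusted), although the embodiments above do not require any particular mapping.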
According to some embodiments, a method performed at a computer system having one or more input devices is described. The method includes, at a computer system in communication with one or more input devices: detecting, via the one or more input devices, a first gesture performed by a first hand; and in response to detecting the first gesture: in accordance with a determination that the first gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being at a first position that is at least a threshold distance from a first body part of a user, performing an audio playback adjustment operation; and in accordance with a determination that the first gesture does not meet the first set of criteria, forgoing performing the audio playback adjustment operation.
According to some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system having one or more input devices, the one or more programs including instructions for: detecting, via the one or more input devices, a first gesture performed by a first hand; and in response to detecting the first gesture: in accordance with a determination that the first gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being at a first position that is at least a threshold distance from a first body part of a user, performing an audio playback adjustment operation; and in accordance with a determination that the first gesture does not meet the first set of criteria, forgoing performing the audio playback adjustment operation.
According to some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system having one or more input devices, the one or more programs including instructions for: detecting, via the one or more input devices, a first gesture performed by a first hand; and in response to detecting the first gesture: in accordance with a determination that the first gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being at a first position that is at least a threshold distance from a first body part of a user, performing an audio playback adjustment operation; and in accordance with a determination that the first gesture does not meet the first set of criteria, forgoing performing the audio playback adjustment operation.
According to some embodiments, a computer system having one or more input devices is described that includes one or more processors and memory storing one or more programs configured to be executed by the one or more processors. The one or more programs include instructions for: detecting, via the one or more input devices, a first gesture performed by a first hand; and in response to detecting the first gesture: in accordance with a determination that the first gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being at a first position that is at least a threshold distance from a first body part of a user, performing an audio playback adjustment operation; and in accordance with a determination that the first gesture does not meet the first set of criteria, forgoing performing the audio playback adjustment operation.
According to some embodiments, a computer system having one or more input devices is described. The computer system includes: means for detecting, via the one or more input devices, a first gesture performed by a first hand; and means, responsive to detecting the first gesture, for: in accordance with a determination that the first gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being at a first position that is at least a threshold distance from a first body part of a user, performing an audio playback adjustment operation; and in accordance with a determination that the first gesture does not meet the first set of criteria, forgoing performing the audio playback adjustment operation.
According to some embodiments, a computer program product is described that includes one or more programs configured to be executed by one or more processors of a computer system having one or more input devices. The one or more programs include instructions for: detecting, via the one or more input devices, a first gesture performed by a first hand; and in response to detecting the first gesture: in accordance with a determination that the first gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being at a first position that is at least a threshold distance from a first body part of a user, performing an audio playback adjustment operation; and in accordance with a determination that the first gesture does not meet the first set of criteria, forgoing performing the audio playback adjustment operation.
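As a hedged illustration of the distance criterion described in the embodiments above, the sketch below checks whether a hand position is at least a threshold distance from a body part; the 0.15 m threshold, the coordinate type, and all names are assumptions made for this example only, not values from the disclosure.

// Hypothetical sketch only: threshold, types, and names are illustrative assumptions.
struct Point3D { var x, y, z: Double }

func distance(_ a: Point3D, _ b: Point3D) -> Double {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

let thresholdDistance = 0.15  // meters; illustrative value only

func meetsFirstSetOfCriteria(hand: Point3D, firstBodyPart: Point3D) -> Bool {
    // Criterion: the first hand is at a first position at least a threshold distance from the first body part.
    distance(hand, firstBodyPart) >= thresholdDistance
}

if meetsFirstSetOfCriteria(hand: Point3D(x: 0.3, y: 0.0, z: 0.2),
                           firstBodyPart: Point3D(x: 0.0, y: 0.0, z: 0.0)) {
    print("Perform the audio playback adjustment operation")
} else {
    print("Forgo performing the audio playback adjustment operation")
}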
According to some embodiments, a method performed at a computer system having a display generation component and one or more input devices is described. The method includes, at a computer system in communication with a display generation component and one or more input devices: while a user interface including a virtual object is displayed via the display generation component, detecting an input via the one or more input devices; and in response to detecting the input: in accordance with a determination that attention of a user was directed to a first virtual object when the input was detected and that the input corresponds to a first type of input, performing a first operation with respect to the first virtual object; in accordance with a determination that the attention of the user was directed to the first virtual object when the input was detected and that the input corresponds to a second type of input that is different from the first type of input, performing a second operation with respect to the first virtual object, wherein the second operation is different from the first operation; in accordance with a determination that the attention of the user was directed to a second virtual object when the input was detected, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, performing the first operation with respect to the second virtual object; and in accordance with a determination that the attention of the user was directed to the second virtual object when the input was detected and that the input corresponds to the second type of input, performing the first operation with respect to the second virtual object.
According to some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system having a display generation component and one or more input devices, the one or more programs including instructions for: while a user interface including a virtual object is displayed via the display generation component, detecting an input via the one or more input devices; and in response to detecting the input: in accordance with a determination that attention of a user was directed to a first virtual object when the input was detected and that the input corresponds to a first type of input, performing a first operation with respect to the first virtual object; in accordance with a determination that the attention of the user was directed to the first virtual object when the input was detected and that the input corresponds to a second type of input that is different from the first type of input, performing a second operation with respect to the first virtual object, wherein the second operation is different from the first operation; in accordance with a determination that the attention of the user was directed to a second virtual object when the input was detected, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, performing the first operation with respect to the second virtual object; and in accordance with a determination that the attention of the user was directed to the second virtual object when the input was detected and that the input corresponds to the second type of input, performing the first operation with respect to the second virtual object.
According to some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system having a display generation component and one or more input devices, the one or more programs including instructions for: while a user interface including a virtual object is displayed via the display generation component, detecting an input via the one or more input devices; and in response to detecting the input: in accordance with a determination that attention of a user was directed to a first virtual object when the input was detected and that the input corresponds to a first type of input, performing a first operation with respect to the first virtual object; in accordance with a determination that the attention of the user was directed to the first virtual object when the input was detected and that the input corresponds to a second type of input that is different from the first type of input, performing a second operation with respect to the first virtual object, wherein the second operation is different from the first operation; in accordance with a determination that the attention of the user was directed to a second virtual object when the input was detected, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, performing the first operation with respect to the second virtual object; and in accordance with a determination that the attention of the user was directed to the second virtual object when the input was detected and that the input corresponds to the second type of input, performing the first operation with respect to the second virtual object.
According to some embodiments, a computer system having a display generation component and one or more input devices is described that includes one or more processors and memory storing one or more programs configured to be executed by the one or more processors. The one or more programs include instructions for: while a user interface including a virtual object is displayed via the display generation component, detecting an input via the one or more input devices; and in response to detecting the input: in accordance with a determination that attention of a user was directed to a first virtual object when the input was detected and that the input corresponds to a first type of input, performing a first operation with respect to the first virtual object; in accordance with a determination that the attention of the user was directed to the first virtual object when the input was detected and that the input corresponds to a second type of input that is different from the first type of input, performing a second operation with respect to the first virtual object, wherein the second operation is different from the first operation; in accordance with a determination that the attention of the user was directed to a second virtual object when the input was detected, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, performing the first operation with respect to the second virtual object; and in accordance with a determination that the attention of the user was directed to the second virtual object when the input was detected and that the input corresponds to the second type of input, performing the first operation with respect to the second virtual object.
According to some embodiments, a computer system having a display generation component and one or more input devices is described. The computer system includes: means for detecting an input via the one or more input devices while a user interface including a virtual object is displayed via the display generation component; and means, responsive to detecting the input, for: in accordance with a determination that attention of a user was directed to a first virtual object when the input was detected and that the input corresponds to a first type of input, performing a first operation with respect to the first virtual object; in accordance with a determination that the attention of the user was directed to the first virtual object when the input was detected and that the input corresponds to a second type of input that is different from the first type of input, performing a second operation with respect to the first virtual object, wherein the second operation is different from the first operation; in accordance with a determination that the attention of the user was directed to a second virtual object when the input was detected, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, performing the first operation with respect to the second virtual object; and in accordance with a determination that the attention of the user was directed to the second virtual object when the input was detected and that the input corresponds to the second type of input, performing the first operation with respect to the second virtual object.
According to some embodiments, a computer program product is described that includes one or more programs configured to be executed by one or more processors of a computer system having a display generation component and one or more input devices. The one or more programs include instructions for: while a user interface including a virtual object is displayed via the display generation component, detecting an input via the one or more input devices; and in response to detecting the input: in accordance with a determination that attention of a user was directed to a first virtual object when the input was detected and that the input corresponds to a first type of input, performing a first operation with respect to the first virtual object; in accordance with a determination that the attention of the user was directed to the first virtual object when the input was detected and that the input corresponds to a second type of input that is different from the first type of input, performing a second operation with respect to the first virtual object, wherein the second operation is different from the first operation; in accordance with a determination that the attention of the user was directed to a second virtual object when the input was detected, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, performing the first operation with respect to the second virtual object; and in accordance with a determination that the attention of the user was directed to the second virtual object when the input was detected and that the input corresponds to the second type of input, performing the first operation with respect to the second virtual object.
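The four branches recited in the embodiments above can be pictured, purely as a hypothetical sketch, as a small dispatch over the attention target and the input type; the enum cases and function below are placeholders and not the claimed implementation.

// Hypothetical sketch only: in this reading, the second virtual object responds with
// the first operation regardless of whether the input is of the first or second type.
enum InputType { case first, second }
enum AttentionTarget { case firstVirtualObject, secondVirtualObject }
enum Operation { case first, second }

func operation(for input: InputType, attentionOn target: AttentionTarget) -> Operation {
    switch (target, input) {
    case (.firstVirtualObject, .first):  return .first    // first operation with respect to the first virtual object
    case (.firstVirtualObject, .second): return .second   // second operation with respect to the first virtual object
    case (.secondVirtualObject, _):      return .first    // first operation with respect to the second virtual object
    }
}

print(operation(for: .second, attentionOn: .secondVirtualObject))  // prints "first"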
It is noted that the various embodiments described above may be combined with any of the other embodiments described herein. The features and advantages described in this specification are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
Drawings
For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings, in which like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram illustrating an operating environment for a computer system for providing an XR experience, according to some embodiments.
FIG. 2 is a block diagram illustrating a controller of a computer system configured to manage and coordinate a user's XR experience, according to some embodiments.
FIG. 3 is a block diagram illustrating a display generation component of a computer system configured to provide a visual component of an XR experience to a user, in accordance with some embodiments.
FIG. 4 is a block diagram illustrating a hand tracking unit of a computer system configured to capture gesture inputs of a user, according to some embodiments.
Fig. 5 is a block diagram illustrating an eye tracking unit of a computer system configured to capture gaze input of a user, in accordance with some embodiments.
Fig. 6 is a flow diagram illustrating a glint-assisted gaze tracking pipeline, in accordance with some embodiments.
Fig. 7A-7E illustrate example techniques for performing operations using air gestures according to some embodiments.
FIG. 8 is a flowchart of a method of performing an operation using an air gesture, according to various embodiments.
Fig. 9A-9D illustrate example techniques for audio playback adjustment using gestures according to some embodiments.
Fig. 10 is a flow chart of a method of audio playback adjustment using gestures, according to various embodiments.
FIGS. 11A-11H illustrate example techniques for conditionally responding to an input according to some embodiments.
FIG. 12 is a flow chart of a method of conditionally responding to an input according to various embodiments.
Detailed Description
According to some embodiments, the present disclosure relates to user interfaces for providing an extended reality (XR) experience to a user.
Fig. 1-6 provide a description of an example computer system for providing an XR experience to a user. Fig. 7A-7E illustrate example techniques for performing operations using air gestures according to some embodiments. FIG. 8 is a flowchart of a method of performing an operation using an air gesture, according to various embodiments. The user interfaces in fig. 7A-7E are used to illustrate the process in fig. 8. Fig. 9A-9D illustrate example techniques for audio playback adjustment using gestures according to some embodiments. Fig. 10 is a flow diagram of a method for audio playback adjustment using gestures, according to various embodiments. The user interfaces in fig. 9A-9D are used to illustrate the process in fig. 10. FIGS. 11A-11H illustrate example techniques for conditionally responding to an input according to some embodiments. FIG. 12 is a flow chart of a method for conditionally responding to an input according to various embodiments. The user interfaces in fig. 11A-11H are used to illustrate the process in fig. 12.
The processes described below enhance the operability of a device and make the user-device interface more efficient (e.g., by helping the user provide appropriate inputs and reducing user errors when operating/interacting with the device) through various techniques, including by providing improved visual feedback to the user, reducing the number of inputs needed to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without requiring further user input, improving privacy and/or security, providing a richer, more detailed, and/or more realistic user experience while saving storage space, and/or additional techniques. These techniques also reduce power usage and extend the battery life of the device by enabling the user to use the device more quickly and efficiently. Saving on battery power, and thus on battery weight, improves the ergonomics of the device. These techniques also enable real-time communication, allow fewer and/or less precise sensors to be used (resulting in a more compact, lighter, and cheaper device), and enable the device to be used in a variety of lighting conditions. These techniques reduce energy usage and, thereby, the heat emitted by the device, which is particularly important for wearable devices: a device that is operating well within the operating parameters of its components can still become uncomfortable for the user to wear if it generates too much heat.
Furthermore, in methods described herein in which one or more steps are contingent on one or more conditions having been met, it should be understood that the described method can be repeated in multiple repetitions so that, over the course of the repetitions, all of the conditions upon which steps of the method are contingent have been met in different repetitions of the method. For example, if a method requires performing a first step if a condition is satisfied and a second step if the condition is not satisfied, a person of ordinary skill would appreciate that the stated steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with steps that are contingent on one or more conditions having been met could be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of system or computer-readable-medium claims in which the system or computer-readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions, and thus is capable of determining whether the contingency has or has not been satisfied without explicitly repeating the steps of a method until all of the conditions upon which steps of the method are contingent have been met. A person having ordinary skill in the art would also understand that, similar to a method with contingent steps, a system or computer-readable storage medium can repeat the steps of a method as many times as needed to ensure that all of the contingent steps have been performed.
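As a purely illustrative sketch of the contingent-step structure discussed above (all names below are hypothetical placeholders and not part of the claims), a method whose steps depend on a condition can be pictured as a branch that, over repeated invocations, may exercise both outcomes in no particular order.

// Illustrative sketch only: conditionIsMet, firstStep, and secondStep are placeholders.
func runContingentMethod(conditionIsMet: () -> Bool,
                         firstStep: () -> Void,
                         secondStep: () -> Void) {
    if conditionIsMet() {
        firstStep()    // performed in repetitions in which the condition is satisfied
    } else {
        secondStep()   // performed in repetitions in which the condition is not satisfied
    }
}

// Repeating the method over several iterations can exercise both branches.
for i in 0..<4 {
    runContingentMethod(conditionIsMet: { i % 2 == 0 },
                        firstStep: { print("repetition \(i): condition satisfied") },
                        secondStep: { print("repetition \(i): condition not satisfied") })
}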
In some embodiments, as shown in fig. 1, an XR experience is provided to a user via an operating environment 100 that includes a computer system 101. The computer system 101 includes a controller 110 (e.g., a processor of a portable electronic device or a remote server), a display generation component 120 (e.g., a head-mounted device (HMD), a display, a projector, a touch screen, etc.), one or more input devices 125 (e.g., an eye tracking device 130, a hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, haptic output generators 170, and other output devices 180), one or more sensors 190 (e.g., image sensors, light sensors, depth sensors, tactile sensors, orientation sensors, proximity sensors, temperature sensors, location sensors, motion sensors, velocity sensors, etc.), and optionally one or more peripheral devices 195 (e.g., home appliances, wearable devices, etc.). In some embodiments, one or more of the input devices 125, output devices 155, sensors 190, and peripheral devices 195 are integrated with the display generation component 120 (e.g., in a head-mounted device or a handheld device).
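Purely as a simplified, hypothetical sketch of how the components enumerated above relate to one another (the type and case names below are illustrative placeholders, not any real API or the disclosed implementation):

// Hypothetical, simplified sketch of computer system 101 within operating environment 100.
enum DisplayGenerationComponent { case headMountedDevice, display, projector, touchScreen }
enum InputDevice { case eyeTrackingDevice, handTrackingDevice, other }
enum OutputDevice { case speaker, hapticOutputGenerator, other }
enum Sensor { case image, light, depth, tactile, orientation, proximity, temperature, location, motion, velocity }
enum Peripheral { case homeAppliance, wearableDevice }

struct ComputerSystem {
    var displayGenerationComponent: DisplayGenerationComponent
    var inputDevices: [InputDevice]
    var outputDevices: [OutputDevice]
    var sensors: [Sensor]
    var peripherals: [Peripheral]
}

// Example: a head-mounted configuration with eye and hand tracking.
let system = ComputerSystem(
    displayGenerationComponent: .headMountedDevice,
    inputDevices: [.eyeTrackingDevice, .handTrackingDevice],
    outputDevices: [.speaker, .hapticOutputGenerator],
    sensors: [.image, .depth, .motion],
    peripherals: []
)
print("Input devices: \(system.inputDevices)")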
In describing an XR experience, various terms are used to refer differently to several related but different environments that a user may sense and/or interact with (e.g., interact with inputs detected by computer system 101 that generated the XR experience, such inputs causing the computer system that generated the XR experience to generate audio, visual, and/or tactile feedback corresponding to various inputs provided to computer system 101). The following are a subset of these terms:
Physical environment-a physical environment refers to the physical world that people can sense and/or interact with without the aid of an electronic system. A physical environment, such as a physical park, includes physical objects, such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment, such as through sight, touch, hearing, taste, and smell.
Extended reality-in contrast, an extended reality (XR) environment refers to a completely or partially simulated environment that people sense and/or interact with via an electronic system. In XR, a subset of the physical movements of the person, or a representation thereof, is tracked, and in response, one or more characteristics of one or more virtual objects simulated in the XR environment are adjusted in a manner consistent with at least one physical law. For example, an XR system may detect a person's head rotation and, in response, adjust the graphical content and sound field presented to the person in a manner similar to the manner in which such views and sounds would change in a physical environment. In some cases (e.g., for accessibility reasons), the adjustment of the characteristics of a virtual object in the XR environment may be made in response to a representation of a physical motion (e.g., a voice command). A person may utilize any of their senses to sense and/or interact with XR objects, including vision, hearing, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides the perception of a point audio source in 3D space. As another example, an audio object may enable audio transparency that selectively introduces environmental sounds from the physical environment with or without computer-generated audio. In some XR environments, a person may sense and/or interact with only audio objects.
Examples of XR include virtual reality and mixed reality.
Virtual reality-a virtual reality (VR) environment refers to a simulated environment designed to be based entirely on computer-generated sensory input for one or more senses. A VR environment includes a plurality of virtual objects that a person can sense and/or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the person's presence within the computer-generated environment and/or through a simulation of a subset of the person's physical movements within the computer-generated environment.
Mixed reality-in contrast to a VR environment, which is designed to be based entirely on computer-generated sensory input, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory input from the physical environment, or a representation thereof, in addition to including computer-generated sensory input (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and a virtual reality environment at the other end. In some MR environments, the computer-generated sensory input may respond to changes in sensory input from the physical environment. In addition, some electronic systems for presenting an MR environment may track position and/or orientation relative to the physical environment to enable virtual objects to interact with real objects (i.e., physical objects from the physical environment or representations thereof). For example, the system may account for movements so that a virtual tree appears stationary relative to the physical ground.
Examples of mixed reality include augmented reality and augmented virtuality.
Augmented reality-an augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment or a representation of a physical environment. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display such that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, the system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system combines the images or video with the virtual objects and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment and perceives the virtual objects superimposed over the physical environment. As used herein, video of the physical environment displayed on an opaque display is referred to as "pass-through video," meaning that the system captures images of the physical environment using one or more image sensors and uses those images when presenting the AR environment on the opaque display. Further alternatively, the system may have a projection system that projects virtual objects into the physical environment, for example as a hologram or on a physical surface, such that a person, using the system, perceives the virtual objects superimposed over the physical environment. An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, the system may transform one or more sensor images to impose a selected perspective (e.g., viewpoint) that is different from the perspective captured by the imaging sensors. As another example, a representation of the physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portions may be representative but not photorealistic versions of the originally captured images. As a further example, a representation of the physical environment may be transformed by graphically eliminating or obscuring portions thereof.
Augmented virtuality-an augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people's faces are realistically reproduced from images taken of physical people. As another example, a virtual object may adopt the shape or color of a physical object imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.
Viewpoint-locked virtual object-a virtual object is viewpoint-locked when the computer system displays the virtual object at the same location and/or position in the user's viewpoint, even as the user's viewpoint shifts (e.g., changes). In embodiments in which the computer system is a head-mounted device, the user's viewpoint is locked to the forward-facing direction of the user's head (e.g., the user's viewpoint is at least a portion of the user's field of view when the user is looking straight ahead); thus, the user's viewpoint remains fixed, even as the user's gaze shifts, without moving the user's head. In embodiments in which the computer system has a display generation component (e.g., a display screen) that can be repositioned relative to the user's head, the user's viewpoint is the augmented reality view being presented to the user on the display generation component of the computer system. For example, a viewpoint-locked virtual object that is displayed in the upper left corner of the user's viewpoint when the user's viewpoint is in a first orientation (e.g., the user's head facing north) continues to be displayed in the upper left corner of the user's viewpoint even when the user's viewpoint changes to a second orientation (e.g., the user's head facing west). In other words, the location and/or position at which the viewpoint-locked virtual object is displayed in the user's viewpoint is independent of the user's position and/or orientation in the physical environment. In embodiments in which the computer system is a head-mounted device, the user's viewpoint is locked to the orientation of the user's head, such that the virtual object is also referred to as a "head-locked virtual object."
Environment-locked virtual object-a virtual object is environment-locked (alternatively, "world-locked") when the computer system displays the virtual object at a location and/or position in the user's viewpoint that is based on (e.g., selected in reference to and/or anchored to) a location and/or object in the three-dimensional environment (e.g., a physical environment or a virtual environment). As the user's viewpoint moves, the location and/or object in the environment relative to the user's viewpoint changes, which results in the environment-locked virtual object being displayed at a different location and/or position in the user's viewpoint. For example, an environment-locked virtual object that is locked onto a tree immediately in front of the user is displayed at the center of the user's viewpoint. When the user's viewpoint shifts to the right (e.g., the user's head is turned to the right) so that the tree is now left of center in the user's viewpoint (e.g., the tree's position in the user's viewpoint shifts), the environment-locked virtual object that is locked onto the tree is displayed left of center in the user's viewpoint. In other words, the location and/or position at which the environment-locked virtual object is displayed in the user's viewpoint depends on the location and/or position of the object in the environment to which the virtual object is locked. In some embodiments, the computer system uses a stationary frame of reference (e.g., a coordinate system anchored to a fixed location and/or object in the physical environment) to determine the location at which to display the environment-locked virtual object in the user's viewpoint. An environment-locked virtual object can be locked to a stationary part of the environment (e.g., a floor, wall, table, or other stationary object), or can be locked to a movable part of the environment (e.g., a vehicle, animal, person, or even a representation of a portion of the user's body that moves independently of the user's viewpoint, such as the user's hand, wrist, arm, or foot), so that the virtual object moves as the viewpoint or the portion of the environment moves, in order to maintain a fixed relationship between the virtual object and the portion of the environment.
In some embodiments, an environment-locked or viewpoint-locked virtual object exhibits lazy follow behavior, which reduces or delays motion of the environment-locked or viewpoint-locked virtual object relative to movement of a point of reference that the virtual object is following. In some embodiments, when exhibiting lazy follow behavior, the computer system intentionally delays movement of the virtual object when detecting movement of the point of reference (e.g., a portion of the environment, the viewpoint, or a point that is fixed relative to the viewpoint, such as a point between 5 cm and 300 cm from the viewpoint) that the virtual object is following. For example, when the point of reference (e.g., the portion of the environment or the viewpoint) moves with a first speed, the virtual object is moved by the device to remain locked to the point of reference but moves with a second speed that is slower than the first speed (e.g., until the point of reference stops moving or slows down, at which point the virtual object starts to catch up to the point of reference). In some embodiments, when a virtual object exhibits lazy follow behavior, the device ignores small amounts of movement of the point of reference (e.g., ignoring movement of the point of reference that is below a threshold amount of movement, such as movement by 0 to 5 degrees or movement by 0 to 50 cm). For example, when the point of reference (e.g., the portion of the environment or the viewpoint to which the virtual object is locked) moves by a first amount, the distance between the point of reference and the virtual object increases (e.g., because the virtual object is being displayed so as to maintain a fixed or substantially fixed position relative to a viewpoint or portion of the environment that is different from the point of reference to which the virtual object is locked); and when the point of reference moves by a second amount that is greater than the first amount, the distance between the point of reference and the virtual object initially increases and then decreases as the amount of movement of the point of reference increases above a threshold (e.g., a "lazy follow" threshold), because the virtual object is moved by the computer system so as to maintain a fixed or substantially fixed position relative to the point of reference. In some embodiments, maintaining a substantially fixed position of the virtual object relative to the point of reference includes the virtual object being displayed within a threshold distance (e.g., 1 cm, 2 cm, 3 cm, 5 cm, 15 cm, 20 cm, or 50 cm) of the point of reference in one or more dimensions (e.g., up/down, left/right, and/or forward/backward relative to the position of the point of reference).
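A minimal, hypothetical sketch of the lazy follow behavior described above is given below; the dead-zone and speed values, and all names, are illustrative assumptions rather than values from the disclosure, and the example is reduced to one dimension for clarity.

// Hypothetical sketch only: thresholds and names are illustrative assumptions.
struct LazyFollower {
    var objectPosition: Double           // position of the locked virtual object (1D for simplicity)
    let deadZone = 0.05                  // small reference-point movements below this are ignored (meters)
    let followSpeedFactor = 0.3          // the object follows more slowly than the point of reference moves

    mutating func update(referencePosition: Double) {
        let offset = referencePosition - objectPosition
        // Ignore small amounts of movement of the point of reference.
        guard abs(offset) > deadZone else { return }
        // Move toward the point of reference at a reduced speed, so the distance can
        // grow while the reference moves and then shrink as the object catches up.
        objectPosition += offset * followSpeedFactor
    }
}

var follower = LazyFollower(objectPosition: 0)
follower.update(referencePosition: 0.02)  // ignored: below the dead zone
follower.update(referencePosition: 0.50)  // the object begins catching up to the point of reference
print(follower.objectPosition)            // 0.15 after one update step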
Hardware-there are many different types of electronic systems that enable a person to sense and/or interact with various XR environments. Examples include head-mounted systems, projection-based systems, heads-up displays (HUDs), vehicle windshields with integrated display capability, windows with integrated display capability, displays formed as lenses designed to be placed on a person's eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablet devices, and desktop/laptop computers. A head-mounted system may include speakers and/or other audio output devices integrated into the head-mounted system for providing audio output. A head-mounted system may have one or more speakers and an integrated opaque display. Alternatively, a head-mounted system may be configured to accept an external opaque display (e.g., a smartphone). The head-mounted system may incorporate one or more imaging sensors for capturing images or video of the physical environment, and/or one or more microphones for capturing audio of the physical environment. Rather than an opaque display, a head-mounted system may have a transparent or translucent display. A transparent or translucent display may have a medium through which light representing images is directed to a person's eyes. The display may utilize digital light projection, OLEDs, LEDs, uLEDs, liquid crystal on silicon, laser scanning light sources, or any combination of these technologies. The medium may be an optical waveguide, a holographic medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to selectively become opaque. Projection-based systems may employ retinal projection technology that projects graphical images onto a person's retina. Projection systems may also be configured to project virtual objects into the physical environment, for example as a hologram or on a physical surface.
In some embodiments, the controller 110 is configured to manage and coordinate the XR experience of the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in more detail below with respect to fig. 2. In some embodiments, the controller 110 is a computing device that is local or remote relative to the scene 105 (e.g., the physical environment). For example, the controller 110 is a local server located within the scene 105. As another example, the controller 110 is a remote server (e.g., a cloud server, central server, etc.) located outside of the scene 105. In some embodiments, the controller 110 is communicatively coupled with the display generation component 120 (e.g., an HMD, a display, a projector, a touch screen, etc.) via one or more wired or wireless communication channels 144 (e.g., Bluetooth, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In another example, the controller 110 is included within the enclosure (e.g., the physical housing) of the display generation component 120 (e.g., an HMD, or a portable electronic device that includes a display and one or more processors, etc.), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or shares the same physical enclosure or support structure with one or more of the above.
In some embodiments, display generation component 120 is configured to provide an XR experience (e.g., at least a visual component of the XR experience) to a user. In some embodiments, display generation component 120 includes suitable combinations of software, firmware, and/or hardware. The display generation component 120 is described in more detail below with respect to fig. 3. In some embodiments, the functionality of the controller 110 is provided by and/or combined with the display generation component 120.
According to some embodiments, display generation component 120 provides an XR experience to a user when the user is virtually and/or physically present within scene 105.
In some embodiments, the display generating component is worn on a portion of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, display generation component 120 includes one or more XR displays provided for displaying XR content. For example, in various embodiments, the display generation component 120 encloses a field of view of a user. In some embodiments, display generation component 120 is a handheld device (such as a smart phone or tablet device) configured to present XR content, and the user holds the device with a display facing the user's field of view and a camera facing scene 105. In some embodiments, the handheld device is optionally placed within a housing that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., tripod) in front of the user. In some embodiments, display generation component 120 is an XR room, enclosure, or chamber configured to present XR content, wherein the user does not wear or hold display generation component 120. Many of the user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) may be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interactions with XR content triggered based on interactions occurring in a space in front of a handheld device or a tripod-mounted device may similarly be implemented with an HMD, where the interactions occur in the space in front of the HMD and responses to the XR content are displayed via the HMD. Similarly, a user interface showing interaction with XR content triggered based on movement of a handheld device or tripod-mounted device relative to a physical environment (e.g., the scene 105 or a portion of the user's body (e.g., the user's eyes, head, or hands)) may similarly be implemented with an HMD, where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a portion of the user's body (e.g., the user's eyes, head, or hands)).
While relevant features of the operating environment 100 are shown in fig. 1, those of ordinary skill in the art will recognize from this disclosure that various other features are not illustrated for the sake of brevity and so as not to obscure more relevant aspects of the example embodiments disclosed herein.
Fig. 2 is a block diagram of an example of a controller 110 according to some embodiments. While certain specific features are illustrated, those of ordinary skill in the art will appreciate from the present disclosure that various other features are not illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. To this end, as a non-limiting example, in some embodiments, the controller 110 includes one or more processing units 202 (e.g., microprocessors, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), Central Processing Units (CPUs), processing cores, etc.), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., Universal Serial Bus (USB), FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Global Positioning System (GPS), Infrared (IR), Bluetooth, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 210, memory 220, and one or more communication buses 204 for interconnecting these components and various other components.
In some embodiments, one or more of the communication buses 204 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and the like.
Memory 220 includes high-speed random access memory such as Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Double Data Rate Random Access Memory (DDR RAM), or other random access solid state memory devices. In some embodiments, memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 220 optionally includes one or more storage devices located remotely from the one or more processing units 202. Memory 220 includes a non-transitory computer-readable storage medium. In some embodiments, memory 220 or a non-transitory computer readable storage medium of memory 220 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 230 and XR experience module 240.
Operating system 230 includes instructions for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR experience module 240 is configured to manage and coordinate single or multiple XR experiences of one or more users (e.g., single XR experiences of one or more users, or multiple XR experiences of a respective group of one or more users). To this end, in various embodiments, the XR experience module 240 includes a data acquisition unit 241, a tracking unit 242, a coordination unit 246, and a data transmission unit 248.
In some embodiments, the data acquisition unit 241 is configured to acquire data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the display generation component 120 of fig. 1, and optionally from one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. To this end, in various embodiments, the data acquisition unit 241 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics.
In some embodiments, tracking unit 242 is configured to map scene 105 and track at least the location/position of display generation component 120 relative to scene 105 of fig. 1, and optionally the location/position of one or more of input device 125, output device 155, sensor 190, and/or peripheral device 195. To this end, in various embodiments, the tracking unit 242 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics. In some embodiments, tracking unit 242 includes a hand tracking unit 244 and/or an eye tracking unit 243. In some embodiments, the hand tracking unit 244 is configured to track the location/position of one or more portions of the user's hand, and/or the movement of one or more portions of the user's hand relative to the scene 105 of fig. 1, relative to the display generating component 120, and/or relative to a coordinate system defined relative to the user's hand. The hand tracking unit 244 is described in more detail below with respect to fig. 4. In some embodiments, the eye tracking unit 243 is configured to track the positioning or movement of the user gaze (or more generally, the user's eyes, face, or head) relative to the scene 105 (e.g., relative to the physical environment and/or relative to the user (e.g., the user's hand)) or relative to XR content displayed via the display generating component 120. The eye tracking unit 243 is described in more detail below with respect to fig. 5.
In some embodiments, coordination unit 246 is configured to manage and coordinate XR experiences presented to a user by display generation component 120, and optionally by one or more of output device 155 and/or peripheral device 195. For this purpose, in various embodiments, coordination unit 246 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics.
In some embodiments, the data transmission unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the display generation component 120, and optionally to one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data transmission unit 248 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics.
While the data acquisition unit 241, tracking unit 242 (e.g., including eye tracking unit 243 and hand tracking unit 244), coordination unit 246, and data transmission unit 248 are shown as residing on a single device (e.g., controller 110), it should be understood that in other embodiments, any combination of the data acquisition unit 241, tracking unit 242 (e.g., including eye tracking unit 243 and hand tracking unit 244), coordination unit 246, and data transmission unit 248 may be located in separate computing devices.
Furthermore, FIG. 2 is a functional description of various features that may be present in a particular implementation, as opposed to a schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 2 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.
Fig. 3 is a block diagram of an example of display generation component 120 according to some embodiments. While certain specific features are illustrated, those of ordinary skill in the art will appreciate from the present disclosure that various other features are not illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. For this purpose, as a non-limiting example, in some embodiments, display generation component 120 (e.g., HMD) includes one or more processing units 302 (e.g., microprocessors, ASIC, FPGA, GPU, CPU, processing cores, etc.), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, bluetooth, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 310, one or more XR displays 312, one or more optional inwardly and/or outwardly facing image sensors 314, memory 320, and one or more communication buses 304 for interconnecting these components and various other components.
In some embodiments, one or more communication buses 304 include circuitry for interconnecting and controlling communications between various system components. In some embodiments, the one or more I/O devices and sensors 306 include an Inertial Measurement Unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptic engine, and/or one or more depth sensors (e.g., structured light, time of flight, etc.), and/or the like.
In some embodiments, one or more XR displays 312 are configured to provide an XR experience to a user. In some embodiments, one or more XR displays 312 correspond to holographic, digital Light Processing (DLP), liquid Crystal Displays (LCD), liquid crystal on silicon (LCoS), organic light emitting field effect transistors (OLET), organic Light Emitting Diodes (OLED), surface conduction electron emitting displays (SED), field Emission Displays (FED), quantum dot light emitting diodes (QD-LED), microelectromechanical systems (MEMS), and/or similar display types. In some embodiments, one or more XR displays 312 correspond to diffractive, reflective, polarizing, holographic, etc. waveguide displays. For example, the display generation component 120 (e.g., HMD) includes a single XR display. In another example, display generation component 120 includes an XR display for each eye of the user. In some embodiments, one or more XR displays 312 are capable of presenting MR and VR content. In some implementations, one or more XR displays 312 can present MR or VR content.
In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's face including the user's eyes (and may be referred to as an eye tracking camera). In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of a user's hand and optionally a user's arm (and may be referred to as a hand tracking camera). In some implementations, the one or more image sensors 314 are configured to face forward in order to acquire image data corresponding to a scene that a user would see in the absence of the display generating component 120 (e.g., HMD) (and may be referred to as a scene camera). The one or more optional image sensors 314 may include one or more RGB cameras (e.g., with Complementary Metal Oxide Semiconductor (CMOS) image sensors or Charge Coupled Device (CCD) image sensors), one or more Infrared (IR) cameras, and/or one or more event-based cameras, etc.
Memory 320 includes high-speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 320 optionally includes one or more storage devices located remotely from the one or more processing units 302. Memory 320 includes a non-transitory computer-readable storage medium. In some embodiments, memory 320 or a non-transitory computer readable storage medium of memory 320 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 330 and XR presentation module 340.
Operating system 330 includes processes for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR presentation module 340 is configured to present XR content to a user via one or more XR displays 312. To this end, in various embodiments, the XR presentation module 340 includes a data acquisition unit 342, an XR presentation unit 344, an XR map generation unit 346, and a data transmission unit 348.
In some embodiments, the data acquisition unit 342 is configured to at least acquire data (e.g., presentation data, interaction data, sensor data, location data, etc.) from the controller 110 of fig. 1. For this purpose, in various embodiments, the data acquisition unit 342 includes instructions and/or logic for instructions and heuristics and metadata for heuristics.
In some embodiments, XR presentation unit 344 is configured to present XR content via one or more XR displays 312. For this purpose, in various embodiments, XR presentation unit 344 includes instructions and/or logic for instructions and heuristics and metadata for heuristics.
In some embodiments, XR map generation unit 346 is configured to generate an XR map based on the media content data (e.g., a 3D map of a mixed reality scene or a map of a physical environment in which computer-generated objects may be placed to generate an augmented reality). For this purpose, in various embodiments, XR map generation unit 346 includes instructions and/or logic for the instructions as well as heuristics and metadata for the heuristics.
In some embodiments, the data transmission unit 348 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110, and optionally one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, data transmission unit 348 includes instructions and/or logic for instructions and heuristics and metadata for heuristics.
Although the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 are shown as residing on a single device (e.g., the display generation component 120 of fig. 1), it should be understood that in other embodiments, any combination of the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 may be located in separate computing devices.
Furthermore, fig. 3 is used more as a functional description of various features that may be present in a particular embodiment, as opposed to a schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 3 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.
Fig. 4 is a schematic illustration of an example embodiment of a hand tracking device 140. In some embodiments, the hand tracking device 140 (fig. 1) is controlled by the hand tracking unit 244 (fig. 2) to track the position/location of one or more portions of the user's hand, and/or movement of one or more portions of the user's hand relative to the scene 105 of fig. 1 (e.g., relative to a portion of the physical environment surrounding the user, relative to the display generating component 120, or relative to a portion of the user (e.g., the user's face, eyes, or head), and/or relative to a coordinate system defined relative to the user's hand). In some implementations, the hand tracking device 140 is part of the display generation component 120 (e.g., embedded in or attached to a head-mounted device). In some embodiments, the hand tracking device 140 is separate from the display generation component 120 (e.g., in a separate housing or attached to a separate physical support structure).
In some implementations, the hand tracking device 140 includes an image sensor 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and/or color cameras, etc.) that captures three-dimensional scene information including at least a human user's hand 406. The image sensor 404 captures the hand image with sufficient resolution to enable the finger and its corresponding location to be distinguished. The image sensor 404 typically captures images of other parts of the user's body, and possibly also all parts of the body, and may have a zoom capability or a dedicated sensor with increased magnification to capture images of the hand with a desired resolution. In some implementations, the image sensor 404 also captures 2D color video images of the hand 406 and other elements of the scene. In some implementations, the image sensor 404 is used in conjunction with other image sensors to capture the physical environment of the scene 105, or as an image sensor that captures the physical environment of the scene 105. In some embodiments, the image sensor 404, or a portion thereof, is positioned relative to the user or the user's environment in a manner that uses the field of view of the image sensor to define an interaction space in which hand movements captured by the image sensor are considered input to the controller 110.
In some embodiments, the image sensor 404 outputs a sequence of frames containing 3D map data (and, in addition, possible color image data) to the controller 110, which extracts high-level information from the map data. This high-level information is typically provided via an Application Program Interface (API) to an application running on the controller, which drives the display generating component 120 accordingly. For example, a user may interact with software running on the controller 110 by moving his hand 406 and changing his hand pose.
In some implementations, the image sensor 404 projects a speckle pattern onto a scene containing the hand 406 and captures an image of the projected pattern. In some implementations, the controller 110 calculates 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation based on lateral offsets of the blobs in the pattern. This approach is advantageous because it does not require the user to hold or wear any kind of beacon, sensor or other marker. The method gives the depth coordinates of points in the scene relative to a predetermined reference plane at a specific distance from the image sensor 404. In this disclosure, it is assumed that the image sensor 404 defines an orthogonal set of x-axis, y-axis, z-axis such that the depth coordinates of points in the scene correspond to the z-component measured by the image sensor. Alternatively, the image sensor 404 (e.g., a hand tracking device) may use other 3D mapping methods, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors.
In some implementations, the hand tracking device 140 captures and processes a time series of depth maps containing the user's hand as the user moves his hand (e.g., the entire hand or one or more fingers). Software running on the image sensor 404 and/or a processor in the controller 110 processes the 3D map data to extract image block descriptors of the hand in these depth maps. The software may match these descriptors with image block descriptors stored in database 408 based on previous learning processes in order to estimate the pose of the hand in each frame. The pose typically includes the 3D position of the user's hand joints and finger tips.
The software may also analyze the trajectory of the hand and/or fingers over multiple frames in the sequence to identify gestures. The pose estimation functions described herein may alternate with motion tracking functions, such that image block-based pose estimation is performed only once every two (or more) frames, while motion tracking is used to find changes in the pose that occur over the remaining frames. Pose, motion, and gesture information are provided to an application running on the controller 110 via the API described above. The program may, for example, move and modify images presented on the display generation component 120 in response to the pose and/or gesture information, or perform other functions.
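As a rough, non-authoritative illustration of the frame scheduling described in the preceding paragraph, the sketch below runs the more expensive descriptor-based pose estimation only on every Nth frame and relies on a lighter motion-tracking update in between. The HandPose type and both closures are placeholders standing in for whatever descriptor matching and tracking an implementation actually uses.

```swift
/// Placeholder pose type; a real implementation would carry joint and
/// fingertip locations extracted from the depth map.
struct HandPose { var jointPositions: [SIMD3<Float>] = [] }

struct PoseScheduler {
    let fullEstimationInterval: Int   // e.g., run descriptor matching every 2nd or 3rd frame
    private var frameIndex = 0
    private var lastPose = HandPose()

    init(fullEstimationInterval: Int = 2) {
        self.fullEstimationInterval = fullEstimationInterval
    }

    mutating func process(depthFrame: [Float],
                          estimateFromDescriptors: ([Float]) -> HandPose,
                          trackFromPreviousPose: (HandPose, [Float]) -> HandPose) -> HandPose {
        defer { frameIndex += 1 }
        if frameIndex % fullEstimationInterval == 0 {
            // Expensive patch-descriptor matching against the database.
            lastPose = estimateFromDescriptors(depthFrame)
        } else {
            // Cheaper update: track changes relative to the last estimated pose.
            lastPose = trackFromPreviousPose(lastPose, depthFrame)
        }
        return lastPose
    }
}
```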
In some implementations, the gesture includes an air gesture. An air gesture is a gesture that is detected without the user touching an input element that is part of a device (e.g., computer system 101, one or more input devices 125, and/or hand tracking device 140) (or independently of an input element that is part of a device), and that is based on detected motion of a portion of the user's body through the air, including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), motion relative to another portion of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one of the user's hands relative to the other hand, and/or movement of a finger of the user relative to another finger or portion of the user's hand), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of the hand by a predetermined amount and/or at a predetermined speed while in a predetermined pose, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).
In some embodiments, the input gestures used in the various examples and embodiments described herein include air gestures performed by movement of a user's finger relative to other fingers (or portions of the user's hand) for interacting with an XR environment (e.g., a virtual or mixed reality environment). In some embodiments, an air gesture is a gesture that is detected without the user touching an input element that is part of the device (or independently of an input element that is part of the device) and that is based on detected movement of a portion of the user's body through the air, including movement of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), movement relative to another portion of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one of the user's hands relative to the other hand, and/or movement of a finger of the user relative to another finger or portion of the user's hand), and/or absolute movement of a portion of the user's body (e.g., a tap gesture that includes the hand moving by a predetermined amount and/or at a predetermined speed while in a predetermined pose, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).
In some embodiments where the input gesture is an air gesture (e.g., in the absence of an input device that provides the computer system with information about which user interface element is the target of the user input, such as contact with a user interface element displayed on a touch screen, or a mouse or touchpad that moves a cursor to the user interface element), the gesture takes into account the user's attention (e.g., gaze) to determine the target of the user input (e.g., for direct inputs, as described below). Thus, in implementations involving air gestures, the input gesture is, for example, detected attention (e.g., gaze) toward the user interface element in combination (e.g., concurrently) with movement of the user's finger(s) and/or hand(s) to perform pinch and/or tap inputs, as described below.
In some implementations, an input gesture directed to a user interface object is performed directly or indirectly with reference to the user interface object. For example, a user input is performed directly on the user interface object when the input is performed with the user's hand at a location that corresponds to the location of the user interface object in the three-dimensional environment (e.g., as determined based on the user's current viewpoint). In some implementations, the input gesture is performed indirectly on the user interface object when, while the user's attention (e.g., gaze) on the user interface object is detected, the user's hand is not at a location corresponding to the location of the user interface object in the three-dimensional environment as the user performs the input gesture. For example, for a direct input gesture, the user is able to direct the input to the user interface object by initiating the gesture at or near the displayed location of the user interface object (e.g., within 0.5 cm, 1 cm, 5 cm, or within a distance of 0 to 5 cm measured from an outer edge of the option or a center portion of the option). For an indirect input gesture, the user is able to direct the input to the user interface object by attending to the user interface object (e.g., by gazing at it) and, while attending to the option, initiating the input gesture (e.g., at any location detectable by the computer system, such as a location that does not correspond to the displayed location of the user interface object).
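A hedged sketch of the direct/indirect distinction described above: if the hand is within a small distance of the object's displayed location when the gesture begins, the input is treated as direct; otherwise the object currently receiving the user's attention (e.g., gaze) is used as the target. The 5 cm radius and the names below are assumptions for illustration only.

```swift
import simd

enum InputTargeting { case direct, indirect, none }

/// Decide whether an air gesture is a direct or indirect input to `objectPosition`.
/// `directRadius` loosely corresponds to the 0 to 5 cm range mentioned above.
func classifyTargeting(handPosition: SIMD3<Float>,
                       objectPosition: SIMD3<Float>,
                       gazeIsOnObject: Bool,
                       directRadius: Float = 0.05) -> InputTargeting {
    if simd_distance(handPosition, objectPosition) <= directRadius {
        return .direct          // gesture initiated at or near the object itself
    } else if gazeIsOnObject {
        return .indirect        // gesture performed elsewhere while attending to the object
    } else {
        return .none            // no user interface object is targeted
    }
}
```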
In some embodiments, the input gestures (e.g., air gestures) used in the various examples and embodiments described herein include pinch inputs and tap inputs for interacting with a virtual or mixed reality environment. For example, the pinch and tap inputs described below are performed as air gestures.
In some implementations, the pinch input is part of an air gesture that includes one or more of a pinch gesture, a long pinch gesture, a pinch-and-drag gesture, or a double pinch gesture. For example, a pinch gesture as an air gesture includes movement of two or more fingers of a hand to make contact with one another, optionally followed by an immediate (e.g., within 0 to 1 seconds) break in contact with each other. A long pinch gesture as an air gesture includes movement of two or more fingers of a hand into contact with one another for at least a threshold amount of time (e.g., at least 1 second) before a break in contact with one another is detected. For example, a long pinch gesture includes the user holding a pinch gesture (e.g., with two or more fingers making contact), and the long pinch gesture continues until a break in contact between the two or more fingers is detected. In some implementations, a double pinch gesture as an air gesture includes two (e.g., or more) pinch inputs (e.g., performed by the same hand) detected in immediate succession (e.g., within a predefined period of time) of each other. For example, the user performs a first pinch input (e.g., a pinch input or a long pinch input), releases the first pinch input (e.g., breaks contact between the two or more fingers), and performs a second pinch input within a predefined period of time (e.g., within 1 second or within 2 seconds) after releasing the first pinch input.
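One way to picture these distinctions is as a classification over finger-contact intervals. The sketch below labels a sequence of contact intervals as pinches, long pinches, or double pinches; the 1-second thresholds mirror the example durations given above, and the rest is an illustrative assumption rather than the embodiments' actual detection logic.

```swift
import Foundation

enum PinchKind { case pinch, longPinch, doublePinch }

struct ContactInterval { let start: TimeInterval; let end: TimeInterval }

/// Classify finger-contact intervals into the pinch variants described above.
/// Thresholds follow the example values in the text (1 s hold, 1 s gap).
func classifyPinches(_ intervals: [ContactInterval],
                     longPinchThreshold: TimeInterval = 1.0,
                     doublePinchGap: TimeInterval = 1.0) -> [PinchKind] {
    var result: [PinchKind] = []
    var index = 0
    while index < intervals.count {
        let current = intervals[index]
        let duration = current.end - current.start
        if duration >= longPinchThreshold {
            result.append(.longPinch)
            index += 1
        } else if index + 1 < intervals.count,
                  intervals[index + 1].start - current.end <= doublePinchGap {
            // Two short pinches in quick succession form a double pinch.
            result.append(.doublePinch)
            index += 2
        } else {
            result.append(.pinch)
            index += 1
        }
    }
    return result
}
```

For example, two contact intervals of 0.2 s separated by a 0.5 s gap would be labeled as a single double pinch, while a 1.5 s contact would be labeled as a long pinch.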
In some implementations, a pinch-and-drag gesture as an air gesture includes a pinch gesture (e.g., a pinch gesture or a long pinch gesture) performed in conjunction with (e.g., followed by) a drag input that changes the position of the user's hand from a first position (e.g., a start position of the drag) to a second position (e.g., an end position of the drag). In some implementations, the user holds the pinch gesture while performing the drag input, and releases the pinch gesture (e.g., opens their two or more fingers) to end the drag gesture (e.g., at the second position). In some implementations, the pinch input and the drag input are performed by the same hand (e.g., the user pinches two or more fingers to make contact with one another and moves the same hand to the second position in the air with the drag gesture). In some embodiments, the pinch input is performed by a first hand of the user and the drag input is performed by a second hand of the user (e.g., the user's second hand moves in the air from the first position to the second position while the user continues the pinch input with the user's first hand). In some implementations, an input gesture as an air gesture includes inputs (e.g., pinch and/or tap inputs) performed using both of the user's hands. For example, the input gesture includes two (e.g., or more) pinch inputs performed in conjunction with one another (e.g., concurrently or within a predefined time period). For example, a first pinch gesture (e.g., a pinch input, a long pinch input, or a pinch-and-drag input) is performed using a first hand of the user, and a second pinch input is performed using the other hand (e.g., the second of the user's two hands) in conjunction with the pinch input performed using the first hand. In some embodiments, such a two-handed input gesture includes movement between the user's two hands (e.g., increasing and/or decreasing the distance or relative orientation between the user's two hands).
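The pinch-and-drag behavior can likewise be sketched as a tiny state machine: a drag begins when a pinch is detected, accumulates hand movement while the pinch is held, and ends when the fingers separate. The types and structure below are assumptions for illustration.

```swift
import simd

struct DragState { var origin: SIMD3<Float>; var current: SIMD3<Float> }

struct PinchDragRecognizer {
    private(set) var drag: DragState?

    /// Feed one frame of tracking data: whether the fingers are in contact
    /// and where the pinching hand currently is. Returns the total drag
    /// vector when the pinch is released, and nil otherwise.
    mutating func update(isPinching: Bool, handPosition: SIMD3<Float>) -> SIMD3<Float>? {
        if isPinching {
            if drag == nil {
                drag = DragState(origin: handPosition, current: handPosition)  // drag begins
            } else {
                drag?.current = handPosition                                   // drag continues
            }
            return nil
        } else if let finished = drag {
            drag = nil
            return finished.current - finished.origin   // drag ends at the second position
        }
        return nil
    }
}
```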
In some implementations, a tap input (e.g., directed to a user interface element) performed as an air gesture includes movement of a user's finger toward the user interface element, movement of the user's hand toward the user interface element (optionally with the user's finger extended toward the user interface element), a downward motion of the user's finger (e.g., mimicking a mouse click or a tap on a touch screen), or another predefined movement of the user's hand. In some embodiments, a tap input performed as an air gesture is detected based on movement characteristics of the finger or hand performing the tap gesture (e.g., movement of the finger or hand away from the user's viewpoint and/or toward the object that is the target of the tap input) followed by an end of the movement. In some embodiments, the end of the movement is detected based on a change in the movement characteristics of the finger or hand performing the tap gesture (e.g., an end of movement away from the user's viewpoint and/or toward the object that is the target of the tap input, a reversal of the direction of movement of the finger or hand, and/or a reversal of the direction of acceleration of the finger or hand).
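The tap detection described above hinges on movement toward the target followed by an end of movement (a stop or reversal). The sketch below looks for that pattern in per-frame fingertip velocities projected onto the direction of the target; the speed thresholds and names are illustrative assumptions.

```swift
import simd

/// Detect an air tap from a short window of fingertip velocities.
/// A tap is approximated as motion toward the target above `minSpeed`,
/// followed by an end of movement (speed near zero or reversed).
func detectAirTap(velocities: [SIMD3<Float>],
                  towardTarget: SIMD3<Float>,
                  minSpeed: Float = 0.2) -> Bool {
    let direction = simd_normalize(towardTarget)
    // Signed speed along the direction of the target for each frame.
    let forwardSpeeds = velocities.map { simd_dot($0, direction) }

    guard let peakIndex = forwardSpeeds.indices.max(by: { forwardSpeeds[$0] < forwardSpeeds[$1] }),
          forwardSpeeds[peakIndex] >= minSpeed else { return false }

    // After the peak, the movement must end: speed drops to ~0 or reverses sign.
    return forwardSpeeds[(peakIndex + 1)...].contains { $0 <= 0.01 }
}
```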
In some embodiments, the determination that the user's attention is directed to a portion of the three-dimensional environment is based on detection of gaze directed to that portion (optionally, without other conditions). In some embodiments, the portion of the three-dimensional environment to which the user's attention is directed is determined based on detecting a gaze directed to the portion of the three-dimensional environment with one or more additional conditions, such as requiring the gaze to be directed to the portion of the three-dimensional environment for at least a threshold duration (e.g., dwell duration) and/or requiring the gaze to be directed to the portion of the three-dimensional environment when the point of view of the user is within a distance threshold from the portion of the three-dimensional environment, such that the device determines the portion of the three-dimensional environment to which the user's attention is directed, wherein if one of the additional conditions is not met, the device determines that the attention is not directed to the portion of the three-dimensional environment to which the gaze is directed (e.g., until the one or more additional conditions are met).
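A minimal sketch of the attention test with the optional dwell condition described above: gaze must remain on the same region for at least a dwell duration (and, optionally, the viewpoint must be within a distance threshold) before attention is treated as directed there. The 0.25-second dwell and the names are assumptions.

```swift
import Foundation

struct AttentionDetector {
    let dwellDuration: TimeInterval      // e.g., 0.25 s of continuous gaze (assumed value)
    let maxViewpointDistance: Float?     // optional distance condition, in meters

    private var gazedRegion: Int? = nil
    private var gazeStart: TimeInterval? = nil

    init(dwellDuration: TimeInterval = 0.25, maxViewpointDistance: Float? = nil) {
        self.dwellDuration = dwellDuration
        self.maxViewpointDistance = maxViewpointDistance
    }

    /// Returns the region the user's attention is directed to, if any.
    mutating func update(gazeRegion: Int?, viewpointDistance: Float, time: TimeInterval) -> Int? {
        guard let region = gazeRegion else { gazedRegion = nil; gazeStart = nil; return nil }
        if region != gazedRegion {
            gazedRegion = region          // gaze moved to a new region; restart the dwell timer
            gazeStart = time
        }
        if let maxDistance = maxViewpointDistance, viewpointDistance > maxDistance {
            return nil                    // additional distance condition not met
        }
        if let start = gazeStart, time - start >= dwellDuration {
            return region                 // gaze has dwelled long enough: attention is directed here
        }
        return nil
    }
}
```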
In some embodiments, detection of a ready state configuration of the user or a portion of the user is detected by the computer system. Detection of a ready state configuration of a hand is used by the computer system as an indication that the user may be preparing to interact with the computer system using one or more air gesture inputs (e.g., pinch, tap, pinch and drag, double pinch, long pinch, or other air gestures described herein) performed by the hand. For example, the ready state of the hand is determined based on whether the hand has a predetermined hand shape (e.g., a pre-pinch shape in which the thumb and one or more fingers are extended and spaced apart in preparation for making a pinch or grasp gesture, or a pre-tap shape in which one or more fingers are extended and the palm faces away from the user), based on whether the hand is in a predetermined position relative to the user's viewpoint (e.g., below the user's head and above the user's waist and extended out from the body by at least 15 cm, 20 cm, 25 cm, 30 cm, or 50 cm), and/or based on whether the hand has moved in a particular manner (e.g., toward a region above the user's waist and in front of the user's head, or away from the user's body or legs). In some implementations, the ready state is used to determine whether interactive elements of the user interface respond to attention (e.g., gaze) inputs.
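The ready-state check above combines a hand-shape test with a position test relative to the viewpoint. A hedged Swift sketch follows; the shape enumeration, the height band, and the 20 cm extension are illustrative assumptions, not the embodiments' actual criteria.

```swift
enum HandShape { case prePinch, preTap, other }

/// Rough ready-state test: a recognizable pre-gesture shape, held roughly
/// between waist and head height and extended away from the body.
func isInReadyState(shape: HandShape,
                    handPosition: SIMD3<Float>,     // in an assumed body-centric frame, meters
                    waistHeight: Float,
                    headHeight: Float,
                    minExtension: Float = 0.20) -> Bool {
    let shapeOK = (shape == .prePinch || shape == .preTap)
    let heightOK = handPosition.y > waistHeight && handPosition.y < headHeight
    let extensionOK = handPosition.z >= minExtension   // at least ~20 cm in front of the body
    return shapeOK && heightOK && extensionOK
}
```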
In scenarios where inputs are described with reference to air gestures, it should be appreciated that similar gestures may be detected using a hardware input device that is attached to or held by one or more hands of the user, where the position of the hardware input device in space can be tracked using optical tracking, one or more accelerometers, one or more gyroscopes, one or more magnetometers, and/or one or more inertial measurement units, and the position and/or movement of the hardware input device is used in place of the position and/or movement of the one or more hands in the corresponding air gesture. In such scenarios, user inputs can also be detected using controls contained in the hardware input device, such as one or more touch-sensitive input elements, one or more pressure-sensitive input elements, one or more buttons, one or more knobs, one or more dials, one or more joysticks, one or more hand or finger coverings that can detect changes in the position of portions of a hand and/or fingers relative to each other, relative to the user's body, and/or relative to the user's physical environment, and/or other hardware input device controls, where user inputs made with such controls are used in place of hand and/or finger gestures, such as air taps or air pinches, in the corresponding air gesture. For example, a selection input described as being performed with an air tap or air pinch input could alternatively be detected with a button press, a tap on a touch-sensitive surface, a press on a pressure-sensitive surface, or another hardware input. As another example, a movement input described as being performed with an air pinch and drag could alternatively be detected based on an interaction with a hardware input control, such as a button press-and-hold, a touch on a touch-sensitive surface, or a press on a pressure-sensitive surface, followed by movement of the hardware input device (e.g., along with the hand with which the hardware input device is associated) through space. Similarly, a two-handed input that includes movement of the hands relative to each other can be performed with one air gesture and one input from the hand that is not performing an air gesture, with two hardware input devices held in different hands, or with two air gestures performed by different hands, using various combinations of air gestures and/or inputs detected by one or more hardware input devices.
In some embodiments, the software may be downloaded to the controller 110 in electronic form, over a network, for example, or may alternatively be provided on tangible non-transitory media, such as optical, magnetic, or electronic memory media. In some embodiments, database 408 is also stored in a memory associated with controller 110. Alternatively or in addition, some or all of the described functions of the computer may be implemented in dedicated hardware, such as a custom or semi-custom integrated circuit or a programmable Digital Signal Processor (DSP). Although the controller 110 is shown in fig. 4, for example, as a separate unit from the image sensor 404, some or all of the processing functions of the controller may be performed by a suitable microprocessor and software or by dedicated circuitry within the housing of the image sensor 404 (e.g., a hand tracking device) or other devices associated with the image sensor 404. In some embodiments, at least some of these processing functions may be performed by a suitable processor integrated with display generation component 120 (e.g., in a television receiver, handheld device, or head mounted device) or with any other suitable computerized device (such as a game console or media player). The sensing functionality of the image sensor 404 may likewise be integrated into a computer or other computerized device to be controlled by the sensor output.
Fig. 4 also includes a schematic diagram of a depth map 410 captured by the image sensor 404, according to some embodiments. As described above, the depth map comprises a matrix of pixels having corresponding depth values. The pixels 412 corresponding to the hand 406 have been segmented from the background and wrist in the figure. The brightness of each pixel within the depth map 410 is inversely proportional to its depth value (i.e., the measured z-distance from the image sensor 404), where the gray shade becomes darker with increasing depth. The controller 110 processes these depth values to identify and segment components of the image (i.e., a set of adjacent pixels) that have human hand characteristics. These characteristics may include, for example, overall size, shape, and frame-to-frame motion from a sequence of depth maps.
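The depth-to-brightness relationship and the segmentation step illustrated by depth map 410 can be pictured with a small helper that normalizes depth values for display (closer pixels brighter) and keeps only pixels inside an assumed hand depth band. The depth limits and the band are placeholders; a real segmentation would also use shape and frame-to-frame motion, as noted above.

```swift
/// Convert raw depth values (meters) into display brightness, with brightness
/// decreasing as depth increases, and mask out pixels outside an assumed hand band.
func visualizeAndSegment(depth: [Float],
                         nearLimit: Float = 0.2,
                         farLimit: Float = 1.2,
                         handBand: ClosedRange<Float> = 0.3...0.6) -> (brightness: [Float], handMask: [Bool]) {
    let brightness = depth.map { d -> Float in
        guard d > 0 else { return 0 }                               // invalid measurement
        let clamped = min(max(d, nearLimit), farLimit)
        return 1 - (clamped - nearLimit) / (farLimit - nearLimit)   // closer = brighter
    }
    let handMask = depth.map { handBand.contains($0) }              // crude depth-band segmentation
    return (brightness, handMask)
}
```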
Fig. 4 also schematically illustrates the hand skeleton 414 that the controller 110 ultimately extracts from the depth map 410 of the hand 406, according to some embodiments. In fig. 4, the hand skeleton 414 is superimposed over a hand background 416 that has been segmented from the original depth map. In some embodiments, key feature points of the hand, and optionally of the wrist or arm connected to the hand (e.g., points corresponding to the knuckles, fingertips, center of the palm, and the end of the hand connecting to the wrist, etc.), are identified and located on the hand skeleton 414. In some embodiments, the controller 110 uses the positions and movements of these key feature points over multiple image frames to determine the gesture performed by the hand or the current state of the hand.
Fig. 5 illustrates an example embodiment of the eye tracking device 130 (fig. 1). In some embodiments, eye tracking device 130 is controlled by eye tracking unit 243 (fig. 2) to track the positioning and movement of the user gaze relative to scene 105 or relative to XR content displayed via display generation component 120. In some embodiments, the eye tracking device 130 is integrated with the display generation component 120. For example, in some embodiments, when display generating component 120 is a head-mounted device (such as a headset, helmet, goggles, or glasses) or a handheld device placed in a wearable frame, the head-mounted device includes both components that generate XR content for viewing by a user and components for tracking the user's gaze with respect to the XR content. In some embodiments, the eye tracking device 130 is separate from the display generation component 120. For example, when the display generating component is a handheld device or an XR chamber, the eye tracking device 130 is optionally a device separate from the handheld device or XR chamber. In some embodiments, the eye tracking device 130 is a head mounted device or a portion of a head mounted device. In some embodiments, the head-mounted eye tracking device 130 is optionally used in combination with a display generating component that is also head-mounted or a display generating component that is not head-mounted. In some embodiments, the eye tracking device 130 is not a head mounted device and is optionally used in conjunction with a head mounted display generating component. In some embodiments, the eye tracking device 130 is not a head mounted device and optionally is part of a non-head mounted display generating component.
In some embodiments, the display generation component 120 uses a display mechanism (e.g., a left near-eye display panel and a right near-eye display panel) to display frames including left and right images in front of the user's eyes, thereby providing a 3D virtual view to the user. For example, the head mounted display generating component may include left and right optical lenses (referred to herein as eye lenses) located between the display and the user's eyes. In some embodiments, the display generation component may include or be coupled to one or more external cameras that capture video of the user's environment for display. In some embodiments, the head mounted display generating component may have a transparent or translucent display and the virtual object is displayed on the transparent or translucent display through which the user may directly view the physical environment. In some embodiments, the display generation component projects the virtual object into the physical environment. The virtual object may be projected, for example, on a physical surface or as a hologram, such that an individual uses the system to observe the virtual object superimposed over the physical environment. In this case, separate display panels and image frames for the left and right eyes may not be required.
As shown in fig. 5, in some embodiments, the eye tracking device 130 (e.g., a gaze tracking device) includes at least one eye tracking camera (e.g., an Infrared (IR) or Near Infrared (NIR) camera) and an illumination source (e.g., an IR or NIR light source, such as an array or ring of LEDs) that emits light (e.g., IR or NIR light) toward the user's eye. The eye-tracking camera may be directed toward the user's eye to receive IR or NIR light reflected directly from the eye by the light source, or alternatively may be directed toward "hot" mirrors located between the user's eye and the display panel that reflect IR or NIR light from the eye to the eye-tracking camera while allowing visible light to pass through. The eye tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, both eyes of the user are tracked separately by the respective eye tracking camera and illumination source. In some embodiments, only one eye of the user is tracked by the respective eye tracking camera and illumination source.
In some embodiments, the eye tracking device 130 is calibrated using a device-specific calibration process to determine parameters of the eye tracking device for the particular operating environment 100, such as the 3D geometry and parameters of the LEDs, cameras, hot mirrors (if present), eye lenses, and display screen. The device-specific calibration process may be performed at the factory or another facility prior to delivery of the AR/VR equipment to the end user. The device-specific calibration process may be an automatic calibration process or a manual calibration process. According to some embodiments, the user-specific calibration process may include an estimation of a specific user's eye parameters, such as pupil position, foveal position, optical axis, visual axis, eye spacing, etc. According to some embodiments, once the device-specific and user-specific parameters are determined for the eye tracking device 130, the images captured by the eye tracking cameras may be processed using a glint-assisted method to determine the current visual axis and gaze point of the user relative to the display.
As shown in fig. 5, the eye tracking device 130 (e.g., 130A or 130B) includes an eye lens 520 and a gaze tracking system including at least one eye tracking camera 540 (e.g., an Infrared (IR) or Near Infrared (NIR) camera) positioned on a side of the user's face on which eye tracking is performed, and an illumination source 530 (e.g., an IR or NIR light source such as an array or ring of NIR Light Emitting Diodes (LEDs)) that emits light (e.g., IR or NIR light) toward the user's eyes 592. The eye-tracking camera 540 may be directed toward a mirror 550 (which reflects IR or NIR light from the eye 592 while allowing visible light to pass) located between the user's eye 592 and the display 510 (e.g., left or right display panel of a head-mounted display, or display of a handheld device, projector, etc.) (e.g., as shown in the top portion of fig. 5), or alternatively may be directed toward the user's eye 592 to receive reflected IR or NIR light from the eye 592 (e.g., as shown in the bottom portion of fig. 5).
In some implementations, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for the left and right display panels) and provides the frames 562 to the display 510. The controller 110 uses the gaze tracking input 542 from the eye tracking camera 540 for various purposes, such as for processing the frames 562 for display. The controller 110 optionally estimates the user's gaze point on the display 510 based on the gaze tracking input 542 acquired from the eye tracking camera 540 using a glint-assisted method or another suitable method. The gaze point estimated from the gaze tracking input 542 is optionally used to determine the direction in which the user is currently looking.
Several possible use cases of the current gaze direction of the user are described below and are not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content in a foveal region determined according to a current gaze direction of the user at a higher resolution than in a peripheral region. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in an AR application, the controller 110 may direct an external camera used to capture the physical environment of the XR experience to focus in the determined direction. The autofocus mechanism of the external camera may then focus on an object or surface in the environment that the user is currently looking at on display 510. As another example use case, the eye lens 520 may be a focusable lens, and the controller uses the gaze tracking information to adjust the focus of the eye lens 520 such that the virtual object that the user is currently looking at has the appropriate vergence to match the convergence of the user's eyes 592. The controller 110 may utilize the gaze tracking information to direct the eye lens 520 to adjust the focus such that the approaching object the user is looking at appears at the correct distance.
In some embodiments, the eye tracking device is part of a head mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens 520), an eye tracking camera (e.g., eye tracking camera 540), and a light source (e.g., light source 530 (e.g., IR or NIR LED)) mounted in a wearable housing. The light source emits light (e.g., IR or NIR light) toward the user's eye 592. In some embodiments, the light sources may be arranged in a ring or circle around each of the lenses, as shown in fig. 5. In some embodiments, for example, eight light sources 530 (e.g., LEDs) are arranged around each lens 520. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used.
In some implementations, the display 510 emits light in the visible range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the position and angle of the eye tracking camera 540 is given by way of example and is not intended to be limiting. In some implementations, a single eye tracking camera 540 is located on each side of the user's face. In some implementations, two or more NIR cameras 540 may be used on each side of the user's face. In some implementations, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some implementations, a camera 540 operating at one wavelength (e.g., 850 nm) and a camera 540 operating at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.
The embodiment of the gaze tracking system as illustrated in fig. 5 may be used, for example, in computer-generated reality, virtual reality, and/or mixed reality applications to provide a user with a computer-generated reality, virtual reality, augmented reality, and/or augmented virtual experience.
Fig. 6 illustrates a glint-assisted gaze tracking pipeline in accordance with some embodiments. In some embodiments, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., eye tracking device 130 as illustrated in fig. 1 and 5). The glint-assisted gaze tracking system may maintain a tracking state. Initially, the tracking state is off or "no". When in the tracking state, the glint-assisted gaze tracking system uses previous information from a previous frame when analyzing the current frame to track pupil contours and glints in the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect pupils and glints in the current frame and, if successful, initializes the tracking state to "yes" and continues with the next frame in the tracking state.
As shown in fig. 6, the gaze tracking camera may capture left and right images of the left and right eyes of the user. The captured image is then input to the gaze tracking pipeline for processing beginning at 610. As indicated by the arrow returning to element 600, the gaze tracking system may continue to capture images of the user's eyes, for example, at a rate of 60 frames per second to 120 frames per second. In some embodiments, each set of captured images may be input to a pipeline for processing. However, in some embodiments or under some conditions, not all captured frames are pipelined.
At 610, for the currently captured image, if the tracking state is yes, the method proceeds to element 640. At 610, if the tracking state is no, the image is analyzed to detect a user's pupil and glints in the image, as indicated at 620. At 630, if the pupil and glints are successfully detected, the method proceeds to element 640. Otherwise, the method returns to element 610 to process the next image of the user's eye.
At 640, if proceeding from element 610, the current frame is analyzed to track the pupils and glints based in part on previous information from the previous frame. At 640, if proceeding from element 630, the tracking state is initialized based on the pupils and glints detected in the current frame. The results of the processing at element 640 are checked to verify that the results of the tracking or detection can be trusted. For example, the results may be checked to determine whether the pupil and a sufficient number of glints for performing gaze estimation were successfully tracked or detected in the current frame. At 650, if the results cannot be trusted, the tracking state is set to no at element 660, and the method returns to element 610 to process the next image of the user's eyes. At 650, if the results are trusted, the method proceeds to element 670. At 670, the tracking state is set to yes (if not already yes) and the pupil and glint information is passed to element 680 to estimate the user's gaze point.
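The pipeline of fig. 6 is essentially a two-state loop: detect when not tracking, track when tracking, and fall back to detection whenever the results cannot be trusted. A hedged sketch of that control flow follows; the detection, tracking, and gaze-estimation closures are placeholders for the actual image processing.

```swift
/// Placeholder result of pupil/glint detection or tracking for one frame.
struct EyeFeatures { let trusted: Bool }

struct GlintGazeTracker {
    private var isTracking = false

    /// Process one captured frame, mirroring the control flow of fig. 6.
    mutating func process(frame: [UInt8],
                          detect: ([UInt8]) -> EyeFeatures?,
                          track: ([UInt8]) -> EyeFeatures,
                          estimateGaze: (EyeFeatures) -> Void) {
        let features: EyeFeatures
        if isTracking {
            // Use information from the previous frame to track pupils and glints.
            features = track(frame)
        } else {
            // Try to detect pupils and glints from scratch in this frame.
            guard let detected = detect(frame) else { return }   // detection failed; wait for next frame
            features = detected
        }
        guard features.trusted else {
            isTracking = false        // results cannot be trusted; re-detect on the next frame
            return
        }
        isTracking = true
        estimateGaze(features)        // pass pupil and glint information to gaze estimation
    }
}
```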
Fig. 6 is intended to serve as one example of an eye tracking technique that may be used in a particular implementation. As will be appreciated by one of ordinary skill in the art, other eye tracking techniques, currently existing or developed in the future, may be used in place of or in combination with the glint-assisted eye tracking techniques described herein in computer system 101 for providing an XR experience to a user, according to various embodiments.
In this disclosure, various input methods are described with respect to interactions with a computer system. When one input device or input method is used to provide an example and another input device or input method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the input device or input method described with respect to the other example. Similarly, various output methods are described with respect to interactions with a computer system. When one output device or output method is used to provide an example and another output device or output method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the output device or output method described with respect to the other example. Similarly, the various methods are described with respect to interactions with a virtual environment or mixed reality environment through a computer system. When examples are provided using interactions with a virtual environment, and another example is provided using a mixed reality environment, it should be understood that each example may be compatible with and optionally utilize the methods described with respect to the other example. Thus, the present disclosure discloses embodiments that are combinations of features of multiple examples, without the need to list all features of the embodiments in detail in the description of each example embodiment.
User interface and associated process
Attention is now directed to embodiments of a user interface ("UI") and associated processes that may be implemented on a computer system, such as a portable multifunction device (e.g., a smart phone) or a head-mounted device, in communication with one or more input devices and, optionally, display generating components.
Fig. 7A-7E illustrate examples of techniques for performing operations using air gestures. FIG. 8 is a flow chart of an exemplary method 800 for performing an operation using an air gesture. The user interfaces in fig. 7A-7E are used to illustrate the processes described below, including the process in fig. 8.
In fig. 7A, a user 701 is holding and interacting with a device 700 that includes a display 700a that is displaying a home screen interface 700b in fig. 7A. In some embodiments, device 700 includes one or more features of computer system 101, such as eye tracking device 130.
Fig. 7A also illustrates a user 701 wearing a wearable device 702. Wearable device 702 is on the right hand of user 701. In some embodiments, wearable device 702 includes one or more sensors (e.g., one or more heart rate sensors and/or blood pressure sensors (e.g., one or more light emitting elements and/or optical sensors on a back side of the device oriented toward the wrist of user 701), accelerometers, and/or gyroscopes) that detect movement (e.g., rotation and/or lateral movement), orientation, gestures, and positioning of the right hand of user 701. At fig. 7A, the right hand of user 701 is in a neutral position (e.g., the right hand of user 701 is not rotated) and is relaxed (e.g., not performing a pinch). In the embodiment of fig. 7A-7E, wearable device 702 is a smart watch. However, in some embodiments, wearable device 702 is another device (e.g., a bracelet, ring, brooch, or pin) that can be worn and that can track hand movements (e.g., via a camera or other optical sensor or motion sensor). In some embodiments, device 702 includes one or more components of computer system 101. In some embodiments, the device 700 further includes one or more sensors (e.g., cameras or other optical sensors, or motion sensors) capable of tracking hand movements and user gestures.
Returning to the device 700 in fig. 7A, the home screen user interface 700b includes a plurality of application icons including a weather application icon 710a corresponding to a weather application and a camera application icon 710b corresponding to a camera application. In some implementations, the display 700a is touch sensitive and the device 700 opens and/or launches a corresponding application upon detecting touch contact on an application icon. In fig. 7A, the device 700 detects (e.g., via the eye-tracking device 130) that the gaze of the user 701 is directed to the weather application icon 710a, as indicated by the exemplary gaze line 752a and gaze target indication 754 a. In the embodiments of fig. 7A-7E, gaze line 752a and gaze target indication 754a are provided for illustrative purposes only and are not part of the user interface provided by device 700 and/or 702. In some implementations, the device 700 displays a visual indication (e.g., location point) of the detected gaze location/direction. In some implementations, the user's attention is determined based on parameters other than gaze (e.g., the direction the user points with a finger; the position of a cursor controlled by a mouse and/or joystick).
In some implementations, the home screen interface 700b is part of an augmented reality environment displayed via the display 700a (e.g., the home screen interface 700b is a set of viewpoint-locked virtual objects) that includes a representation of a physical environment (e.g., a pass-through representation based on data captured via one or more cameras of the device 700). In some implementations, the device 702 is represented in the augmented reality environment. In some embodiments, the device 702 is not represented in the augmented reality environment, even when it is within the portion of the physical environment currently being represented. In some implementations, the device 700 is a head-mounted system or display (e.g., an HMD), and the device 700 and/or the device 702 detect various gestures performed by the user 701 while operating the device 700 as an HMD. In such implementations, the user 701 may control certain operations of the devices 700 and/or 702 via gestures without having to contact (e.g., provide touch input to) the devices 700 and/or 702. Control via gestures may be particularly useful for HMDs, as hardware elements of the HMDs (e.g., buttons or touch-sensitive surfaces) may not be visible to a user when wearing the HMDs and/or may be difficult to operate due to their location and/or lack of visibility.
At fig. 7A, device 702 (and/or device 700 in some implementations) detects a first input as a first portion of a gesture, as indicated by exemplary text 750a 1. In some implementations, the first portion of the detected gesture is a pinch-in-air gesture made using the thumb and index finger of the right hand of the user 701 without the fingers touching the device 700 or 702. In some embodiments, the air gesture is a different air gesture, such as a flick gesture made with an index or middle finger (e.g., a mid-air flick that does not involve contacting device 700 or device 702 with a user's finger) or positioning the user's finger in a predetermined pose (e.g., a "C" shape made with a thumb and index finger). In some implementations, the detected input is not an air gesture, such as device 700 and/or device 702 detecting contact with or actuation of a hardware button. While this description refers to the first input as a first portion of a gesture, it should be understood that the first input may also be referred to and considered as a discrete gesture (e.g., rather than a portion of a gesture that may include other portions).
At fig. 7B, in response to detecting the first input corresponding to the first portion of the gesture indicated by text 750a1, device 702 sends an indication of the detected first input to device 700. The device 700 displays a representation 720a corresponding to the weather application and outputs a haptic output 770a in response to receiving the indication of the detected first input. In some embodiments, haptic output 770a is provided by device 702, or both devices 700 and 702 provide haptic outputs. In some implementations, the representation 720a is a preview of the weather application and may be selected (e.g., via touch, via gaze, and/or air gestures) to display a user interface of the weather application. In some embodiments, representation 720a is a user interface of the weather application with which the user can interact directly (e.g., select different locations to obtain corresponding weather information). As described with reference to fig. 7A, the device 700 detects that the gaze of the user 701 is directed to the weather application icon 710a when the first input is detected. In some embodiments, the device 700 displays the representation 720a in response to the first input because the device 700 detects that the user's gaze is directed to the weather application icon 710a, and if the user's gaze were directed to a different application icon (e.g., camera application icon 710b), a different representation of the different application (e.g., a representation of the camera application) would be displayed. In some embodiments, device 700 displays representation 720a regardless of the direction of the detected user gaze (e.g., because the weather application was the last open application and/or because the weather application is a designated and/or favorite application). In some implementations, the device 700 does not display the representation 720a in response to receiving an indication of the first input when, at the time the first input is detected, the user's gaze is positioned on an object (e.g., a graphical indication of time) that is not associated with performing the operation.
At fig. 7B, device 702 detects a second input that is the first portion of the gesture (e.g., holding the first input (e.g., the user continues to hold the pinch air gesture)) in combination with a second portion of the gesture, as indicated by exemplary text 750a2. In some implementations, the second portion of the gesture is an air gesture that includes a rotation of the hand of user 701 performed while maintaining the first portion of the gesture (e.g., the user rotates her hand while continuing to pinch). In some implementations, the device 702 detects that the hand rotation of the second input occurs at a first speed (e.g., 10 degrees per second) and includes a first angular distance of hand rotation (e.g., 20 degrees), which is determined to correspond to a request to transition to display of the next application representation (e.g., the second input is determined to be a request for a transition amplitude of 1). In some implementations, detecting the second input includes detecting an air gesture (e.g., a pinching motion followed by a rotation) similar to the position and movement of a user's hand when turning a physical knob and/or dial. While this description refers to the second input as a second portion of the gesture that also includes the first portion of fig. 7A, it should be understood that the second input may also be referred to and considered as a separate gesture (e.g., rather than a portion of a gesture that also includes other portions (e.g., the first portion)).
At fig. 7C, in response to detecting the second input, device 702 sends an indication of the detected second input to device 700. In response to receiving the indication of the detected second input, device 700 displays an animation of representation 720a moving left and back to be replaced in the middle foreground of display 700a by representation 720b, which corresponds to the calculator application. The device 700 also provides a tactile output 770b in response to receiving the indication of the detected second input. In some implementations, one or more properties (e.g., duration, amplitude, pattern, and/or frequency) of the haptic output 770b are based on one or more characteristics of the second input (e.g., rotational speed, angular distance, and/or resting position of the hand at the end of rotation) and/or the current operation. For example, in some embodiments, a discrete haptic output (e.g., a discrete perceptible vibration) is output as each representation (e.g., representation 720b) transitions onto the display. In some embodiments, haptic output 770b is provided by device 702, or both devices 700 and 702 provide haptic outputs. In some implementations, detecting the second portion of the gesture (e.g., rotation of the hand) without detecting the first portion of the gesture (e.g., the pinch gesture) does not cause display of representation 720a (e.g., the first operation). In some embodiments, doing so does not cause any perceptible operation (e.g., rotation of the hand alone is not a mapped gesture).
At fig. 7C, the device 702 detects a third input that is the first portion of the gesture (e.g., holding the first input (e.g., the user continues to hold the pinch air gesture)) in combination with a third portion of the gesture, as indicated by the exemplary text 750a3. In some implementations, the third portion of the gesture is an air gesture that includes a further rotation of the hand of user 701 in the same direction (e.g., a further clockwise rotation beyond the clockwise rotation discussed with respect to fig. 7B) performed while maintaining the first portion of the gesture (e.g., the user further rotates her hand while continuing to pinch). In some embodiments, device 702 detects that the hand rotation of the third input occurs at a second, higher speed (e.g., 20 degrees per second) and includes the same first angular distance (e.g., 20 degrees) of hand rotation as detected in fig. 7B (e.g., the further rotation is the same amount of rotation as in fig. 7B but occurs at a faster speed), which is determined to correspond to a request to transition to an application representation displayed two representations away from representation 720b in the representation sequence (e.g., the third input is determined to be a request for a transition amplitude of 2). Thus, in such embodiments, the execution amplitude of the operation for a given change in angular distance varies depending on the speed of the hand rotation, with the same amount of rotation resulting in a greater execution amplitude when it is performed at a faster speed. In some implementations, the third portion of the gesture is a hold of the first portion of the gesture (e.g., the pinch air gesture) together with a hold of the second portion of the gesture (e.g., holding the hand at the rotational position reached during the second portion, without further rotation); in such implementations, holding the second portion of the gesture causes the operation (e.g., the transition through representations) to continue for as long as the gesture continues to be held (e.g., similar to operation of a physical scroll-wheel-type dial). In some such implementations, while the operation continues based on detecting the held first and second portions of the gesture, rotation in a direction opposite the rotation of the second portion (e.g., counter-clockwise rotation when the second portion was clockwise) is detected, causing the operation to cease (e.g., no longer continue). In some such implementations, while the operation continues based on detecting the held first and second portions of the gesture, further rotation of the hand in the same direction as the second portion (e.g., further clockwise rotation when the second portion was clockwise) is detected, causing a speed of the operation (e.g., the speed at which the representations (e.g., 720a-720d) transition) to increase.
In some implementations, the third portion of the gesture is a rotation in a direction opposite to the direction of the rotation in fig. 7B; in such implementations, the representation 720a transitions back into the middle foreground of the display 700a (e.g., the operation has a directional component determined based on the direction of rotation). While this description refers to the third input as a third portion of the gesture that also includes the first portion of fig. 7A and the second portion of fig. 7B, it should be understood that the third input may also be referred to and considered as a separate gesture (e.g., rather than a portion of a gesture that also includes other portions (e.g., the first portion)).
At fig. 7D, in response to detecting the third input, device 702 sends an indication of the detected third input to device 700. In response to receiving the indication of the detected third input, device 700 displays an animation of representation 720b moving left and back and an animation of representation 720c, which corresponds to the map application, and representation 720d transitioning onto display 700a. Representation 720c is the representation that appears after representation 720b in a series of representations, and representation 720d is the representation that follows representation 720c (e.g., representation 720d is two positions after representation 720b in the series). Representation 720d is at the middle foreground of display 700a. In some embodiments, the device 700 transitions through two representations in the series (e.g., instead of one, as occurs in fig. 7C) in response to the third input because the third input includes a further hand rotation through the same first angular distance but at the higher second speed (e.g., as compared to the first speed of the hand rotation of the second input). The device 700 also provides a tactile output 770c in response to receiving the indication of the detected third input. In some embodiments, the adjustment made by the operation is based on a characteristic of the input. For example, in some embodiments, when the second input includes a rotation of the user's hand, the direction of adjustment is determined based on the direction of the rotation (e.g., clockwise for an increase in sound intensity (volume) and counterclockwise for a decrease in sound intensity), and the magnitude of adjustment is based on the speed of the rotation and/or the angular distance (e.g., each degree of change in angular distance results in a 1% increase or decrease in sound intensity). In some embodiments, haptic output 770c is provided by device 702, or both devices 700 and 702 provide haptic outputs.
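A minimal way to model the speed-dependent execution amplitude described above is to scale the number of representation transitions by a speed factor. The sketch below is a hypothetical mapping, not the disclosed implementation; the constants and the transitionAmplitude function are assumptions chosen so that the example inputs (20 degrees at 10 degrees per second, and the same 20 degrees at 20 degrees per second) yield one and two transitions, respectively, matching figs. 7C and 7D.

```swift
// Hypothetical mapping from hand-rotation speed and angular distance to the number of
// representation transitions (the "execution amplitude"). Constants are illustrative only.

let degreesPerTransitionAtBaseSpeed = 20.0   // angular distance for one transition at the base speed
let baseSpeedDegreesPerSecond = 10.0         // rotation speed treated as the 1x reference

func transitionAmplitude(angularDistance: Double, speed: Double) -> Int {
    // Faster rotation over the same angular distance yields a larger execution amplitude.
    let speedFactor = max(1.0, speed / baseSpeedDegreesPerSecond)
    let amplitude = (angularDistance / degreesPerTransitionAtBaseSpeed) * speedFactor
    return Int(amplitude.rounded(.down))
}

print(transitionAmplitude(angularDistance: 20, speed: 10))  // 1 (second input, fig. 7C)
print(transitionAmplitude(angularDistance: 20, speed: 20))  // 2 (third input, fig. 7D)
```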
At fig. 7D, device 702 detects a fourth input that includes an end of the first portion of the gesture (e.g., the user moves her thumb and index finger apart (e.g., by relaxing her hand), ceasing to hold the pinch gesture), as indicated by text 750a4. In some embodiments, the fourth input does not include any rotation of the hand of the user 701.
At fig. 7E, in response to detecting the fourth input, device 702 sends an indication of the detected fourth input to device 700. In response to receiving the indication of the detected fourth input, device 700 displays a user interface 730 of the news application (e.g., exits the representation-selection mode). Thus, in the embodiment of figs. 7A-7E, user 701 is able to initiate a selection operation (e.g., via the first input), navigate through a series of application representations (e.g., via the second and third inputs) to an application of interest, and then open the application of interest (e.g., via the fourth input), all via gestures performed by one hand without the hand contacting device 700 or device 702. In some implementations, rather than displaying the user interface 730, the device 700 stops further transitions of the application representations (e.g., exits the representation-transition mode) but continues to display the same content (e.g., as seen in fig. 7D) in response to detecting the fourth input.
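The interaction of figs. 7A-7E can be summarized as a small event-driven model: a pinch begins a representation-selection mode, rotation while pinching moves the focus through the series, and releasing the pinch opens the focused application. The sketch below is a hypothetical illustration only; the event type, the list of application names, the step computation, and the haptic callback are all assumptions.

```swift
// Hypothetical event-driven model of the pinch / rotate / release interaction of figs. 7A-7E.

enum GestureEvent {
    case pinchBegan                                // first input (750a1)
    case rotated(degrees: Double, speed: Double)   // second and third inputs (750a2, 750a3)
    case pinchReleased                             // fourth input (750a4)
}

struct RepresentationSwitcher {
    var representations: [String]   // e.g., ["Weather", "Calculator", "Maps", "News"]
    var focusedIndex = 0
    var isActive = false

    // Returns the name of the application to open, if any.
    mutating func handle(_ event: GestureEvent, playHaptic: (String) -> Void) -> String? {
        switch event {
        case .pinchBegan:
            isActive = true                        // enter the representation-selection mode
            playHaptic("selection-began")          // cf. haptic output 770a
            return nil
        case .rotated(let degrees, let speed):
            guard isActive else { return nil }     // rotation alone is not a mapped gesture
            // Same illustrative speed-scaled amplitude as in the earlier sketch.
            let steps = Int(((abs(degrees) / 20.0) * max(1.0, speed / 10.0)).rounded(.down))
            let direction = degrees >= 0 ? 1 : -1  // rotation direction sets the direction of travel
            focusedIndex = min(max(focusedIndex + direction * steps, 0), representations.count - 1)
            playHaptic("transition")               // cf. haptic outputs 770b and 770c
            return nil
        case .pinchReleased:
            isActive = false                       // exit the representation-selection mode
            return representations[focusedIndex]   // open the focused application (fig. 7E)
        }
    }
}
```

Under these assumptions, the event sequence pinchBegan, rotated(20, 10), rotated(20, 20), pinchReleased walks the focus from the first representation to the fourth and returns its name, mirroring the narrative of figs. 7A-7E.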
Additional descriptions regarding figs. 7A-7E are provided below with reference to method 800, described with respect to fig. 8.
FIG. 8 is a flowchart of an exemplary method 800 for performing operations using air gestures, according to some embodiments. In some embodiments, the method 800 is performed at a computer system (e.g., a smartphone, a desktop computer, a laptop computer, a tablet, a smartwatch, a wrist-worn fitness tracker, a heads-up display unit, a head-worn display unit, an optical head-worn display unit, a head-worn augmented reality and/or extended reality device, and/or a wearable device) (e.g., the computer system 101 in fig. 1, the device 700, the device 702, and/or the device 900) in communication with one or more input devices (e.g., a camera, a gyroscope, an accelerometer, an acoustic sensor, a physiological sensor (e.g., a blood pressure sensor and/or a heart rate sensor), the hand tracking unit 244, and/or the eye tracking unit 243). In some embodiments, the one or more input devices are integrated into an external device (e.g., a smart watch and/or a wearable device; device 702) that communicates with the computer system. In some implementations, the computer system communicates with a display generating component (e.g., display 700a, a display controller, a touch-sensitive display system, and/or a head-mounted display system). In some embodiments, the method 800 is governed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as the one or more processors 202 of the computer system 101) (e.g., the control 110 in fig. 1). Some of the operations in method 800 are optionally combined, and/or the order of some of the operations is optionally changed.
The computer system (e.g., 700, 702, and/or 900) detects (802), via one or more input devices (e.g., 244), an air gesture (and in some embodiments, a gesture made by the hand that does not include direct contact with the computer system and/or one or more sensors of the computer system) performed by the hand (e.g., the hand of the user of the computer system) and that includes a rotation of the hand (e.g., 750a2 or 750a 3) (e.g., rotation in a first direction, rotation at a first speed, rotation at a first acceleration, rotation of the user's wrist, and/or rotation about an axis parallel to the user's wrist). In some embodiments, the air gesture is detected via one or more physiological sensors and/or optical sensors and/or gyroscopes of an external device (e.g., a smart watch and/or a wrist-worn fitness tracker in communication with a computer system and including one or more physiological sensors).
In response to detecting the air gesture (804) and in accordance with a determination that the air gesture meets a first set of criteria, wherein the first set of criteria includes a criterion that is met when the air gesture (and/or, in some embodiments, at least a portion of the air gesture) is detected (e.g., via the one or more input devices) while the hand is performing (and/or has performed) a pinch gesture (e.g., a gesture in which a finger (e.g., a predetermined finger (e.g., an index finger)) and/or thumb of the hand is within a predetermined distance of (e.g., in contact or near contact with) another finger and/or thumb) (e.g., as seen in figs. 7B and/or 7D), the computer system performs (806) a first operation based on the air gesture (e.g., the display seen in fig. 7B) (e.g., adjusting a sound intensity, navigating through a list of items, scrolling through content, and/or adjusting one or more characteristics (e.g., brightness, contrast, hue, and/or warmth) of a display). In some embodiments, the pinch gesture includes the finger and/or thumb being within a predetermined orientation of another finger and/or thumb and/or another portion of the hand (e.g., with the finger pads substantially facing each other).
In response to detecting the air gesture (804) and in accordance with a determination that the air gesture does not meet a first set of criteria (e.g., because hand rotation is not being performed while pinching is being performed), the computer system relinquishes (808) performing a first operation based on the air gesture (e.g., as discussed with respect to fig. 7C) (e.g., relinquishes performing any operation based on the air gesture and/or performing an operation other than the first operation). In response to detecting the air gesture and in accordance with a determination that the air gesture satisfies a first set of criteria, including criteria that are satisfied when the air gesture is detected while the hand is performing a pinch gesture, a selection is made whether to perform the first operation based on the air gesture, which provides the user with more control to cause performance of the first operation by making the particular gesture without requiring additional controls and/or virtual objects that clutter the user interface.
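In code form, blocks 802-808 reduce to a single gate: the rotation-bearing air gesture drives the first operation only when it is detected while the hand is performing a pinch. The following sketch is a minimal, hypothetical rendering of that gate; the sample type and the closure are assumptions.

```swift
// Hypothetical gate corresponding to blocks 802-808 of method 800.

struct AirGestureSample {
    let isPinching: Bool          // whether the hand is performing a pinch gesture
    let rotationDegrees: Double   // hand rotation detected with the sample
}

func handle(_ sample: AirGestureSample, performFirstOperation: (Double) -> Void) {
    // First set of criteria: the rotation is detected while the hand performs the pinch gesture.
    if sample.isPinching && sample.rotationDegrees != 0 {
        performFirstOperation(sample.rotationDegrees)   // block 806: perform based on the air gesture
    }
    // Otherwise, block 808: forgo performing the first operation based on the air gesture.
}
```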
In some implementations, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria and in accordance with a determination that the hand rotates (e.g., while the air gesture is being detected) with a first movement magnitude (e.g., speed, acceleration, and/or angular distance), the first operation (e.g., movement (e.g., lateral movement and/or rotational movement across the display generating component), a rate of display (e.g., displaying and/or ceasing to display one or more objects), and/or a change of one or more characteristics (e.g., sound, brightness, tone, color, and/or contrast) of the display generating component and/or the computer system) is performed at a first execution magnitude (e.g., as described with reference to figs. 7C and 7D). In response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria and in accordance with a determination that the hand rotates (e.g., while the air gesture is being detected) with a second movement magnitude (e.g., speed, acceleration, and/or angular distance) that is different from the first movement magnitude, the first operation is performed at a second execution magnitude that is different from the first execution magnitude (e.g., as described with reference to figs. 7C and 7D). In some embodiments, the first operation is performed at a faster rate when the hand rotates faster and/or at a slower rate when the hand rotates slower. Performing the first operation at an execution magnitude that is based on the movement magnitude of the rotating hand allows the user to control how the first operation is executed, which provides the user with more control over the computer system without cluttering the user interface.
In some implementations, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria and in accordance with a determination that the hand rotates at a first speed (e.g., a current speed, an average speed, a median speed, and/or a minimum and/or maximum speed of the hand as it rotates) through a first angular distance, the first operation is performed at a third execution magnitude (e.g., as described with reference to gesture 750a3 and figs. 7C and 7D). In some implementations, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria and in accordance with a determination that the hand rotates at the first speed through a second angular distance different from the first angular distance, the first operation is performed at a fourth execution magnitude different from the third execution magnitude (e.g., as described with reference to gesture 750a3 and figs. 7C and 7D). In some embodiments, performing the first operation at the third execution magnitude includes moving one or more respective user interface objects by a first amount, and performing the first operation at the fourth execution magnitude includes moving the one or more respective user interface objects by a second amount different from the first amount. In some embodiments, the execution magnitude for a given change in angular distance varies depending on the speed of the hand rotation. In accordance with a determination that the hand rotates through a first angular distance (e.g., 20 degrees) at a first speed (e.g., 10 degrees per second), the first operation is performed at a first execution amplitude (e.g., 1 unit). In accordance with a determination that the hand rotates through the (same) first angular distance (e.g., 20 degrees) at a second speed (e.g., 20 degrees per second) different from the first speed, the first operation is performed at a second amplitude (e.g., 4 units) different from the first amplitude. In some embodiments, the second speed is faster than the first speed and the second amplitude is greater than the first amplitude, such that, for the same angular distance, rotating the hand faster causes the operation to be performed at a greater amplitude. Performing the first operation at an execution magnitude that is based on the angular distance of rotation of the hand (and, in some embodiments, even where the hand rotates at the same speed over different angular distances) allows the user to control how the first operation is executed, which provides the user with more control over the computer system without cluttering the user interface.
In some implementations, the first angular distance is less than the second angular distance (e.g., as described with reference to gesture 750a3 and figs. 7C and 7D), and the third execution magnitude is greater than the fourth execution magnitude (e.g., the first operation at the third execution magnitude is performed faster and/or to a greater extent than the first operation at the fourth execution magnitude because the first angular distance is less (e.g., shorter) than the second angular distance (and, in some implementations, regardless of the rotational speed)). Performing the first operation at an execution magnitude based on the angular distance of rotation of the hand provides the user with a choice to cause the first operation to be performed more slowly by increasing the angular distance of rotation of the hand, which provides the user with more control over the computer system without cluttering the user interface.
In some embodiments, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria and in accordance with a determination that the hand rotates in a first direction (e.g., clockwise or counter-clockwise with respect to yaw, pitch, and/or roll), the first operation is performed based on the first direction (e.g., with a direction corresponding to the first direction, such as adjusting a parameter by increasing or decreasing it in a first adjustment direction) (and, in some embodiments, not based on a second direction different from the first direction) (e.g., as described with reference to fig. 7C and the rotational direction of 750a3). In some embodiments, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria and in accordance with a determination that the hand rotates in a second direction different from (and, in some embodiments, opposite to) the first direction, the first operation is performed based on the second direction (e.g., the first operation is performed with an adjustment in a second adjustment direction, such as decreasing or increasing the parameter) (and, in some embodiments, not based on the first direction). In some embodiments, the first operation performed based on the first direction is a different operation than the first operation performed based on the second direction. Performing the first operation in a direction that is based on the direction of rotation of the hand allows the user to control how the first operation is performed, which provides the user with more control over the computer system without cluttering the user interface.
In some implementations, in response to detecting the air gesture and in accordance with a determination that the air gesture meets a first set of criteria, the computer system initiates a process for generating a set of one or more haptic outputs (e.g., 770 b-770 c) that indicate that the gesture meets the first set of criteria. In some embodiments, in response to detecting the air gesture and in accordance with a determination that the air gesture meets a first set of criteria, the computer system generates a haptic output and/or the computer system transmits instructions that cause a different computer system (e.g., a wearable device, a smart watch, a wrist-worn wearable device, and/or a device that includes one or more input devices in communication with the computer system) to generate the haptic output. Initiating a process for generating a set of one or more haptic outputs in response to detecting an air gesture and in accordance with a determination that the air gesture meets a first set of criteria provides feedback to a user regarding identifying/detecting and performing a first operation, which may also reduce the amount of input required to reverse performance of the operation when the user inadvertently causes the first operation.
In some embodiments, the computer system communicates with a wearable device (e.g., 702) (e.g., a smart watch, a wrist-worn wearable device, and/or a set of headphones), and initiating a process for generating haptic feedback includes transmitting one or more instructions that cause the wearable device to generate the set of one or more haptic outputs that indicate that the gesture satisfies a first set of criteria. In some embodiments, the set of one or more haptic outputs is generated only at the wearable device. In some embodiments, the set of one or more haptic outputs is generated at the computer system and the wearable device. In some embodiments, the set of one or more haptic outputs is generated only at the computer system. In response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria, transmitting one or more instructions that cause the wearable device to generate a set of one or more tactile outputs provides feedback to the user regarding performance of the first operation at the wearable device, which may also reduce a number of inputs required to reverse performance of the operation when the user inadvertently causes the first operation.
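The choice of where the criteria-met haptic is generated (the computer system, the wearable, or both) can be expressed as a small dispatch. The sketch below is illustrative only; HapticTarget and the two closures are assumptions rather than any disclosed interface.

```swift
// Hypothetical dispatch for the criteria-met haptic output.

enum HapticTarget { case computerSystem, wearable, both }

func emitCriteriaMetHaptic(to target: HapticTarget,
                           playLocalHaptic: () -> Void,        // generate at the computer system
                           instructWearable: () -> Void) {     // e.g., transmit to device 702
    switch target {
    case .computerSystem:
        playLocalHaptic()
    case .wearable:
        instructWearable()
    case .both:
        playLocalHaptic()
        instructWearable()
    }
}
```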
In some implementations, performing the first operation based on the air gesture includes transitioning (e.g., transitioning between representations 720a-720d) (e.g., navigating, changing, and/or displaying) through a first sequence of states (e.g., a list of items, a set of user interfaces, one or more system states (e.g., on, sleep, and/or off), an intensity level, a brightness level, and/or a contrast level). In some embodiments, upon transitioning through the first sequence of states and in accordance with a determination that a first state in the first sequence of states has been reached (e.g., a state, event, and/or adjustment event when the first operation adjusts one or more user interface elements and/or one or more settings), the computer system generates (e.g., emits, provides, and/or outputs), at the computer system and/or at a computer system other than the computer system (e.g., a wearable device, a smart watch, a wrist-worn fitness tracker, and/or a wrist-worn device), a set of one or more haptic outputs (e.g., a vibration and/or a kinesthetic output) indicating that the first state in the first sequence of states has been reached. In some embodiments, upon transitioning through the first sequence of states and in accordance with a determination that a second state in the first sequence of states has been reached, a set of one or more haptic outputs indicating that the second state in the first sequence of states has been reached is generated (e.g., at the computer system and/or at a computer system other than the computer system (e.g., a wearable device, a smart watch, a wrist-worn fitness tracker, and/or a wrist-worn device)), wherein the second state is different from the first state. In some embodiments, the set of one or more haptic outputs indicating that the first state in the first sequence of states has been reached is different from (e.g., includes haptic outputs generated at different points in time, longer/shorter haptic outputs, and/or stronger/weaker haptic outputs than) the set of one or more haptic outputs indicating that the second state in the first sequence of states has been reached. In some embodiments, the set of one or more haptic outputs indicating that the first state has been reached and the set of one or more haptic outputs indicating that the second state has been reached are generated simultaneously and/or in combination with (e.g., before, simultaneously with, and/or after) one or more audio outputs and/or visual outputs. Generating different sets of haptic outputs for different states provides feedback to the user regarding the execution of the different states of the first operation, which may also reduce the number of inputs required to reverse the execution of the operation when the user inadvertently causes the first operation.
In some implementations, performing the first operation based on the air gesture includes transitioning (e.g., navigating, changing, and/or displaying) through a second sequence of states (e.g., transitioning from representation 720d to 720a in reverse order) (e.g., a list of items, a set of user interfaces, one or more system states (e.g., on, sleep, and/or off), a sound intensity level, a brightness level, and/or a contrast level)). In some embodiments, upon transitioning through the sequence of states, the computer system detects that a particular state in the sequence of states has been reached (e.g., an end state (e.g., an end of a list, a last user interface in a set of user interfaces, a last system state)) (and/or that the particular state is to be reached and/or that the particular state is the next state in the sequence of states), and in response to detecting that the particular state has been reached, the computer system generates (e.g., at the computer system and/or a computer system other than the computer system (e.g., a wearable device, a smart watch, a wrist-worn fitness tracker, and/or a wrist-worn device) a set of one or more haptic outputs (e.g., a set of list end haptic outputs and/or a set of particular haptic outputs corresponding to the particular state) indicating that the particular state has been reached (e.g., to indicate that the particular state has been reached). Generating a set of one or more haptic outputs that indicate that a particular state in the sequence of states has been reached in response to detecting that the particular state has been reached provides feedback to the user that the particular state of the first operation has been reached, which may also reduce the amount of input required to attempt to continue performing the operation.
In some embodiments, while performing the first operation based on the air gesture, the computer system detects a first change to the air gesture that includes the rotation of the hand (e.g., as described with reference to figs. 7C-7D and operation of the scroll-wheel-type dial), and in response to detecting the first change to the air gesture and in accordance with a determination that the rotation of the hand has stopped (e.g., stopped and/or discontinued rotation about the axis) while the hand continues to perform the pinch gesture, the computer system continues to perform the first operation (e.g., based on the air gesture). In some implementations, in response to detecting the first change to the air gesture and in accordance with a determination that the rotation of the hand has stopped and the hand has not continued to perform the pinch gesture, the computer system ceases performing the first operation based on the air gesture. Continuing to perform the operation, in response to detecting a first change to the air gesture that includes the rotation of the hand and in accordance with a determination that the rotation of the hand has stopped while the hand continues to perform the pinch gesture, provides the user with control to continue performance of the first operation without cluttering the user interface.
In some embodiments, while performing the first operation based on the air gesture, a second change to the air gesture that includes the rotation of the hand is detected, and in response to detecting the second change to the air gesture and in accordance with a determination that the hand has changed from rotating in a first direction to rotating in a second direction different from the first direction, performance of the first operation is stopped (e.g., as described with reference to figs. 7C-7D and the effect of reversing direction after maintaining the second portion of the gesture). In some embodiments, in response to detecting the second change to the air gesture and in accordance with a determination that the hand has not changed from rotating in the first direction, the computer system continues to perform the first operation based on the air gesture. In some embodiments, in response to detecting the second change to the air gesture and in accordance with a determination that the hand has changed from rotating in the first direction to rotating in the second direction, the computer system reverses performance of the operation. Stopping performance of the first operation, in response to detecting the second change to the air gesture and in accordance with a determination that the hand has changed from rotating in the first direction to rotating in the second direction, provides the user with control to stop performance of the first operation without cluttering the user interface.
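The hold-to-continue, reverse-to-stop, and rotate-further-to-speed-up behavior described here and with respect to fig. 7C resembles a ratcheting scroll wheel. The sketch below is a hypothetical model of that behavior; the rate values and the update cadence are assumptions.

```swift
// Hypothetical model of the "scroll-wheel dial" continuation behavior.

struct ContinuousOperation {
    var isRunning = false
    var rate = 1.0               // e.g., representation transitions per second (illustrative)
    var lastDirection = 0        // sign of the most recent non-zero rotation

    mutating func update(isPinching: Bool, rotationDelta: Double) {
        guard isPinching else {          // releasing the pinch ends the operation
            isRunning = false
            return
        }
        guard rotationDelta != 0 else {  // rotation stopped while the pinch is held: keep running as-is
            return
        }
        let direction = rotationDelta > 0 ? 1 : -1
        if lastDirection != 0 && direction != lastDirection {
            isRunning = false            // reversing the rotation direction stops the operation
        } else {
            isRunning = true
            rate += abs(rotationDelta) * 0.1   // further same-direction rotation increases the speed
        }
        lastDirection = direction
    }
}
```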
In some embodiments, while performing the first operation based on the air gesture: in accordance with a determination that a first portion of the first operation is being (or has been) performed, a set of one or more haptic outputs (e.g., a first portion of haptic output 770c while representation 720c is displayed) indicating that the first portion of the first operation is being performed is generated (e.g., at the computer system and/or at a computer system other than the computer system (e.g., a wearable device, a smartwatch, a wrist-worn fitness tracker, and/or a wrist-worn device)); and in accordance with a determination that a second portion of the first operation, different from the first portion, is being (or has been) performed, a set of one or more haptic outputs (e.g., a second portion of haptic output 770c) indicating that the second portion of the first operation is being performed is generated (e.g., at the computer system and/or at a computer system other than the computer system), wherein the set of one or more haptic outputs indicating that the second portion of the first operation is being performed is different from the set of one or more haptic outputs indicating that the first portion of the first operation is being performed. Generating the set of one or more haptic outputs that indicate that the first portion of the first operation is being performed and the set of one or more haptic outputs that indicate that the second portion of the first operation is being performed, when the first operation is performed based on the air gesture, provides feedback to the user regarding the performance of the first operation, which may also reduce the amount of input required to reverse the performance of the operation when the user inadvertently causes the first operation.
In some implementations, performing the first operation based on the air gesture includes: in accordance with a determination that the air gesture is being performed at a first speed (e.g., a speed of hand rotation and/or an angular distance of hand rotation), generating a set of haptic feedback indicating that the air gesture is being performed at the first speed (e.g., where the first operation is being performed based on the first speed) (e.g., as described with reference to fig. 7C and haptic output 770c); and, in accordance with a determination that the air gesture is being performed at a second speed that is different from the first speed, generating a set of haptic feedback indicating that the air gesture is being performed at the second speed (e.g., as described with reference to fig. 7C and haptic output 770c).
In some embodiments, the first operation is performed at a first rate in response to detecting the air gesture, and while the first operation is performed at the first rate (e.g., while continuing to detect the air gesture and, in some embodiments, without detecting that the pinch gesture of the air gesture has been released), the computer system detects additional rotation of the hand while the hand is performing the pinch gesture, and in response to detecting the additional rotation of the hand while the hand is performing the pinch gesture, the computer system performs the first operation at a second rate that is different (e.g., faster or slower) than the first rate (e.g., as described with reference to fig. 7C). Performing the first operation at a second rate different from the first rate, in response to detecting additional rotation of the hand while the hand is performing the pinch gesture, provides the user with control over how fast the first operation is performed without cluttering the user interface.
In some implementations, while performing the first operation based on the air gesture, a third change to the air gesture that includes the rotation of the hand is detected (e.g., 750a4), and in response to detecting the third change to the air gesture and in accordance with a determination that the hand is no longer performing the pinch gesture, performance of the first operation is forgone (e.g., as seen in fig. 7E) (e.g., regardless of whether the rotation continues). In some embodiments, after forgoing performance of the first operation, the computer system remains in the state that the first operation was in at and/or before the time at which it was determined that the hand is no longer performing the pinch gesture. In some embodiments, when the first operation includes scrolling through a list, the computer system stops scrolling through the list, and the list is maintained at the position it was in at and/or after the time at which the determination was made that the hand is no longer performing the pinch gesture. In some embodiments, when the first operation includes setting a value, the computer system selects the value that is displayed and/or selected at and/or after the time at which the determination is made that the hand is no longer performing the pinch gesture. Forgoing performance of the first operation, in response to detecting the third change to the air gesture and in accordance with a determination that the hand is no longer performing the pinch gesture, provides the user with control over whether to continue performing the first operation without cluttering the user interface.
In some embodiments, the first operation is an operation performed with respect to one or more virtual objects in an augmented reality environment (e.g., as described with reference to fig. 7A).
In some embodiments, the computer system is in communication with a display generating component (e.g., 700a), and the first set of criteria includes a criterion that is met when the display generating component is in an active state (e.g., an on state and/or a non-sleep state) (e.g., rather than an inactive state (e.g., an off state and/or a sleep state)). Choosing whether to perform the first operation based on the air gesture, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria, including the criterion that is met when the display generating component is in an active state, allows the computer system to forgo automatically performing the operation in particular instances, regardless of whether the air gesture includes a pinch gesture and rotation of the hand.
In some implementations, the air gesture is detected while a first virtual object (e.g., 710a) and a second virtual object (e.g., the camera icon of fig. 7A) that is different from the first virtual object are displayed, and, in response to detecting the air gesture: in accordance with a determination that the gesture satisfies the first set of criteria while the user's attention is directed to the first virtual object, the first operation is performed with respect to the first virtual object (rather than with respect to the second virtual object); and in accordance with a determination that the gesture satisfies the first set of criteria while the user's attention is directed to the second virtual object (rather than the first virtual object), the first operation is performed with respect to the second virtual object (e.g., as described with reference to the operations in figs. 7B and 11E-11H). Choosing to perform the first operation with respect to a particular object based on the user's attention being directed to that object provides the user with control over how the first operation is performed without cluttering the user interface.
In some embodiments, the air gesture is detected while a third virtual object (e.g., 710a) is displayed, and, in response to detecting the air gesture: in accordance with a determination that the air gesture satisfies the first set of criteria while the user's attention is directed to the third virtual object and the third virtual object is responsive to rotation of the hand, the first operation is performed with respect to the third virtual object (rather than with respect to the second virtual object) and based on movement of the hand; and in accordance with a determination that the air gesture satisfies the first set of criteria while the user's attention is directed to the third virtual object and the third virtual object is not responsive to rotation of the hand, the first operation is not performed with respect to the third virtual object (and is not based on movement of the hand) (e.g., as described with reference to fig. 7B) (e.g., and/or the first operation is not performed at all and/or the first set of criteria is not satisfied). Choosing whether to perform the first operation with respect to the third virtual object and based on movement of the hand, in accordance with a determination that the gesture satisfies the first set of criteria while the user's attention is directed to the third virtual object and the third virtual object is responsive to rotation of the hand, allows the computer system to automatically perform the first operation with respect to an object when that object is responsive to rotation of the hand.
In some implementations, the first set of criteria includes criteria that are met when the user's attention is directed to the first location, and wherein the user's attention is determined based on the user's gaze being directed to the first location (e.g., as indicated by 754 a). In response to detecting the air gesture and in accordance with a determination that the air gesture meets a first set of criteria including the criteria met when the user's attention is directed to the first location and determining the user's attention based on the user's gaze being directed to the first location, a selection is made whether to perform the first operation based on the air gesture, which provides the user with more control to cause performance of the first operation by making the particular gesture without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, the first set of criteria includes a criterion that is met when the user's attention is directed to a second location, and the user's attention is determined based on the user's hand being within a predetermined distance (e.g., 0.1 meters to 3 meters) of the second location (e.g., and/or being directed toward the second location (e.g., pointing in the general direction of the second location)). Choosing whether to perform the first operation based on the air gesture, in response to detecting the air gesture and in accordance with a determination that the air gesture meets the first set of criteria, including the criterion that is met when the user's attention is directed to the second location as determined based on the user's hand being within the predetermined distance of the second location, provides the user with more control to cause performance of the first operation by making the particular gesture without requiring additional controls and/or virtual objects that clutter the user interface.
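The two ways of establishing attention described in the preceding paragraphs, gaze directed to a location or the hand within a predetermined distance of it, can be combined into one predicate. The sketch below is a hypothetical illustration; the geometry type, the gaze tolerance, and the distance threshold are assumptions (the 0.1 to 3 meter figure above is quoted as the example range for the predetermined distance).

```swift
// Hypothetical attention predicate combining gaze direction and hand proximity.

struct Point3D { var x = 0.0, y = 0.0, z = 0.0 }

func distance(_ a: Point3D, _ b: Point3D) -> Double {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

func attentionDirected(to location: Point3D,
                       gazeTarget: Point3D?,          // from the eye tracking device, if available
                       handPosition: Point3D?,        // from the hand tracking device, if available
                       gazeTolerance: Double = 0.05,  // meters (assumed)
                       handThreshold: Double = 0.3) -> Bool {  // meters (assumed, within the 0.1-3 m range)
    if let gaze = gazeTarget, distance(gaze, location) <= gazeTolerance {
        return true   // attention determined from gaze (e.g., 754a)
    }
    if let hand = handPosition, distance(hand, location) <= handThreshold {
        return true   // attention determined from hand proximity
    }
    return false
}
```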
In some embodiments, performing the first operation based on the air gesture includes: in response to detecting a first portion of the air gesture that includes a pinch and hold gesture, displaying a plurality of virtual objects that include a virtual object corresponding to a first application (e.g., a browser application, a note taking application, a stock application, a word processing application, a messaging application, and/or a social media application); while displaying the plurality of virtual objects and while continuing to detect the air gesture, detecting a second portion of the air gesture that includes a rotation of the hand; and, in response to detecting the second portion of the air gesture, updating an appearance of a user interface that includes the plurality of virtual objects, including indicating that an input focus has moved from the virtual object corresponding to the first application to a virtual object corresponding to a second application that is different from the first application. In some embodiments, indicating that the input focus has moved includes one or more of: moving a focus ring between virtual objects; moving the virtual objects around the display such that the focused virtual object is at a particular location on the display (e.g., a center, middle, highlighted, left-side, and/or right-side position); highlighting the focused virtual object and de-emphasizing the one or more virtual objects that are not focused; and/or changing the size of the focused virtual object to be different from the size of the one or more virtual objects that are not focused. In some embodiments, after updating the appearance of the user interface (and, in some embodiments, while concurrently displaying the plurality of virtual objects and/or after causing one or more virtual objects to be displayed and ceasing to display one or more other virtual objects), and while indicating that the virtual object corresponding to the second application has the input focus, a third portion of the air gesture that includes releasing the pinch and hold gesture is detected; and, in response to detecting the third portion of the air gesture while displaying the virtual object corresponding to the second application, a user interface corresponding to the second application is displayed (e.g., the second application is launched and/or opened) (e.g., as described with reference to the sequence of figs. 7A-7E). In some implementations, the second application is not displayed before the air gesture is detected. In some implementations, the first application is displayed when the air gesture is initially detected.
Displaying a user interface corresponding to the second application in response to detecting the third portion of the air gesture, which includes releasing the pinch and hold gesture while the virtual object corresponding to the second application is displayed (e.g., after transitioning through the plurality of virtual objects), provides the user with more control to navigate between applications.
In some embodiments, aspects/operations of methods 800, 1000, and/or 1200 as described herein may be interchanged, substituted, and/or added between the methods. For example, the first gesture of method 1000 and/or the input of method 1200 may be an air gesture of method 800. For the sake of brevity, these details are not repeated here.
Fig. 9A to 9D illustrate examples of techniques for audio playback adjustment using gestures. Fig. 10 is a flow chart of an exemplary method 1000 for audio playback adjustment using gestures. The illustrations in fig. 9A to 9D are for illustrating the processes described below, including the process in fig. 10.
Fig. 9A illustrates a user 901 wearing a wearable device 702 on his right hand. As described above, wearable device 702 includes one or more sensors (e.g., one or more heart rate sensors and/or blood pressure sensors (e.g., one or more light emitting elements and/or optical sensors on a back side of the device oriented toward a wrist of user 901), accelerometers, and/or gyroscopes) that detect movement (e.g., rotation and/or lateral movement), orientation, gestures, and positioning of the right hand of user 901. The user 901 also wears a headphone device 900, which is a wireless headphone in communication with the device 702. In some embodiments, user 901 also wears a second headphone device in the other ear of the user. In such implementations, the operations discussed with respect to the implementations of fig. 9A-9D may occur on two headphone devices based on the described gestures. In some embodiments, the headset device includes one or more features of the computer system 101. In some implementations, both the wearable device 702 and the headset device 900 are connected (e.g., wirelessly) to another external device, such as a smart phone, a tablet computer, and/or a head mounted display device. In some implementations, the headphone device 900 is an audio component (e.g., integrated speaker) of a head mounted display (e.g., HMD) device. In such embodiments, the operations discussed with reference to fig. 9A-9D may allow a user to adjust audio playback using one or more gestures without requiring the user to make contact with the headphone device 900 and/or the wearable device 702 (e.g., provide touch input). Control via gestures may be particularly useful for HMDs, as hardware elements of the HMDs (e.g., buttons or touch-sensitive surfaces) may not be visible to a user when wearing the HMDs and/or may be difficult to operate due to their location and/or lack of visibility.
In fig. 9A, a user 901 is currently listening to song #1 via a headphone apparatus 900, as indicated by exemplary text 920. Playback of song #1 is currently at 50% intensity as indicated by exemplary indication 910 a. In the embodiment of fig. 9A-9D, exemplary elements such as text 920 and indication 910a are provided for illustrative purposes only and are not part of any user interface provided by device 702 and/or device 900.
In fig. 9A, while the right hand of user 901 is at the user's side, device 702 detects a first input that is a first type of gesture, as indicated by exemplary text 950a. In some implementations, the first type of gesture is an air gesture, such as a pinch and rotation as discussed above with reference to fig. 7B and 7C. In some embodiments, the air gesture is a different air gesture, such as a flick gesture made with an index or middle finger (e.g., a mid-air flick that does not involve contacting device 700 or device 702 with a user's finger) or positioning the user's fingers in a predetermined pose (e.g., a "C" shape made with a thumb and index finger).
At fig. 9B, in response to detecting the first input corresponding to the exemplary text 950a, the device 702 sends an indication of the detected first input to the headphone device 900 (e.g., directly or indirectly (e.g., via an external device (e.g., a smart phone) connected to both the device 702 and the device 900)). After receiving the indication of the detected first input, the device 900 continues the same audio playback operation (e.g., playing back song #1 at 50% intensity) without any modification and/or adjustment based on the received indication of the detected input, as indicated by exemplary text 910a and exemplary text 920. The device 900 does not modify and/or adjust the audio playback operation based on the first input because the first type of gesture is detected while the user's hand is at the user's side (in some implementations, because the gesture is not detected while the hand is in a predefined position (e.g., within a predetermined distance of the user's head)). In some implementations, based on detecting the first input, a different operation is performed that does not affect audio playback. For example, the device 702 may send an indication of the detected first input to an external device (e.g., a smart phone or HMD), which in response performs one or more of the operations described with reference to fig. 7A-7E. In another example, the headphone device 900 performs a different non-audio playback adjustment operation in response to the first input of fig. 9A, such as outputting an indication of battery power while continuing to play back song #1 at 50% intensity. In yet another example, the device 702 may send an indication of the detected first input to an external device (e.g., a smart phone or HMD) that is rendering an augmented reality environment and that, in response to the first input, performs one or more operations affecting the augmented reality environment (e.g., moving one or more virtual objects that are part of the augmented reality environment).
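For illustration only (this sketch is not part of the original disclosure), the position-gated routing described above for figs. 9A-9B might look like the following Swift outline. The type names, the two-way routing, and the notion of a discrete "near head" position are assumptions introduced for this example.

```swift
import Foundation

// Hypothetical hand positions reported by the wearable's sensors.
enum HandPosition {
    case atSide
    case nearHead   // e.g., within a predetermined distance of the user's head
}

// Hypothetical destinations for a recognized gesture.
enum GestureRoute {
    case audioPlaybackAdjustment   // handled by the headphone device
    case nonAudioOperation         // e.g., forwarded to a phone/HMD, or a battery readout
}

struct GestureEvent {
    let position: HandPosition
}

/// Routes the same gesture type differently depending on where the hand was detected.
func route(_ event: GestureEvent) -> GestureRoute {
    switch event.position {
    case .nearHead:
        // Hand lifted adjacent to the head: treat as an audio playback adjustment.
        return .audioPlaybackAdjustment
    case .atSide:
        // Hand at the user's side: forgo audio adjustment and perform a different operation.
        return .nonAudioOperation
    }
}
```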
At fig. 9B, while the headphone device 900 continues to playback song #1 at 50% intensity, the device 702 detects a second user input that includes the user 901 performing a first type of gesture (e.g., the same type of gesture as performed in fig. 9A) when the user's right hand is positioned adjacent to the user's head (e.g., within a predetermined distance (e.g., 1 to 15 inches) of the user's head), as indicated by exemplary text 950B.
At fig. 9C, in response to detecting the second input corresponding to the exemplary text 950b, the device 702 sends an indication of the detected second input (e.g., directly or indirectly) to the headphone device 900 and outputs a tactile output 970a. In response to receiving the indication of the detected second input, the headphone device 900 adjusts a property of the ongoing audio playback. Specifically, as shown in fig. 9C and as indicated by the exemplary text 910b, the headphone device 900 adjusts the intensity of playback from 50% to 75% while continuing to play back song #1. In some embodiments, the adjustment is based on one or more characteristics of the input. For example, in some embodiments, when the second input includes a rotation of the user's hand, the direction of adjustment is determined based on the direction of the rotation (e.g., clockwise for an increase in sound intensity and counterclockwise for a decrease in sound intensity), and the magnitude of adjustment is based on the speed of the rotation and/or the angular distance (e.g., each degree of change in angular distance results in an increase or decrease in sound intensity of 1%, 5%, or 10%). In some embodiments, one or more properties (e.g., duration, amplitude, pattern, and/or frequency) of the haptic output 970a are based on one or more characteristics of the second input (e.g., rotational speed, angular distance, and/or resting position of the hand at the end of rotation) and/or the current operation. For example, in some embodiments, a discrete haptic output (e.g., a discrete perceptible vibration) is output for each predetermined unit of increase in sound intensity (e.g., a discrete haptic vibration for each 1%, 5%, or 10% increase in sound intensity).
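A minimal Swift sketch of the fig. 9C mapping described above is shown below; it is illustrative only, and the per-degree scale factor and the 5%-per-pulse haptic step are assumptions chosen from the example values in the description.

```swift
import Foundation

struct RotationGesture {
    let angularDegrees: Double   // signed: positive = clockwise, negative = counterclockwise
}

struct VolumeAdjustment {
    let newIntensity: Double     // 0.0 ... 1.0
    let hapticPulses: Int        // one discrete pulse per 5% step, as one possible mapping
}

/// Direction of rotation selects increase/decrease; angular distance scales the change.
func adjustIntensity(current: Double,
                     gesture: RotationGesture,
                     percentPerDegree: Double = 0.25) -> VolumeAdjustment {
    let deltaPercent = gesture.angularDegrees * percentPerDegree
    let updated = min(1.0, max(0.0, current + deltaPercent / 100.0))
    // Emit one discrete haptic pulse for every 5% of intensity actually changed.
    let pulses = Int((abs(updated - current) * 100.0 / 5.0).rounded(.down))
    return VolumeAdjustment(newIntensity: updated, hapticPulses: pulses)
}

// Example: a 100-degree clockwise rotation from 50% yields 75% and five haptic pulses.
let result = adjustIntensity(current: 0.5, gesture: RotationGesture(angularDegrees: 100))
```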
At fig. 9C, while the headphone device 900 continues to play back song #1 at 75% intensity, the device 702 detects a third user input that includes the user 901 performing a second type of gesture (e.g., a different type of gesture than the gesture performed in fig. 9A and 9B) when the user's right hand is positioned adjacent the user's head (e.g., within a predetermined distance (e.g., 1 to 15 inches) of the user's head), as indicated by the exemplary text 950c. In some implementations, the second type of gesture is an air gesture, such as movement of a user's thumb along the length of the user's index finger (e.g., a finger swipe gesture). In some embodiments, the air gesture is a different air gesture, such as a flick gesture made with an index or middle finger (e.g., a mid-air flick that does not involve contacting device 700 or device 702 with a user's finger) or positioning the user's finger in a predetermined pose (e.g., a "V" shape made with a thumb and index finger).
At fig. 9D, in response to detecting the third input corresponding to the exemplary text 950c, the device 702 sends an indication of the detected third input (e.g., directly or indirectly) to the headphone device 900 and outputs a tactile output 970b. In response to receiving the indication of the detected third input, the headphone device 900 adjusts a property of the ongoing audio playback. Specifically, as shown in fig. 9D and indicated by exemplary text 930, headphone device 900 transitions from playback of song #1 to playback of song #2 (e.g., performs a track skip operation) while continuing to output audio at 75% intensity. In some embodiments, the adjustment is based on one or more characteristics of the input. For example, in some embodiments, when the third input includes sliding the user's thumb across the user's index finger, the direction of adjustment (e.g., forward to the next track or backward to the previous track) is determined based on the direction of the sliding, and the magnitude of the adjustment is based on the speed and/or distance of the sliding (e.g., sliding a greater distance and/or at a higher speed may result in skipping two or more tracks forward or backward). In some implementations, the audio playback adjustment operation is to pause or resume playback (e.g., when the third input includes a pinch and then release air gesture). In some embodiments, one or more properties (e.g., duration, amplitude, pattern, and/or frequency) of the haptic output 970b are based on one or more characteristics of the third input (e.g., speed, distance, and/or resting position of the thumb at the end of the movement) and/or the current operation. For example, in some embodiments, a discrete haptic output (e.g., a discrete perceptible vibration) is output as each track is skipped.
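The track-change mapping of fig. 9D could be sketched as follows; this is an assumption-laden illustration rather than the disclosed implementation, and the distance and speed thresholds are invented for the example.

```swift
import Foundation

struct ThumbSwipe {
    let distance: Double   // signed, in arbitrary units along the index finger
    let speed: Double      // magnitude of the swipe speed
}

/// Swipe direction selects previous/next track; a long, fast swipe may skip an extra track.
func trackSkipCount(for swipe: ThumbSwipe) -> Int {
    let direction = swipe.distance >= 0 ? 1 : -1
    // One possible interpretation: a long and fast swipe skips one additional track.
    let extra = (abs(swipe.distance) > 2.0 && swipe.speed > 1.5) ? 1 : 0
    return direction * (1 + extra)
}

// Example: a short forward swipe advances one track; a long, fast backward swipe goes back two.
let forwardOne = trackSkipCount(for: ThumbSwipe(distance: 1.0, speed: 0.8))   // 1
let backTwo = trackSkipCount(for: ThumbSwipe(distance: -3.0, speed: 2.0))     // -2
```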
For additional description with respect to fig. 9A-9D, refer to method 1000 described below with respect to fig. 10.
Fig. 10 is a flowchart of an exemplary method 1000 of audio playback adjustment using gestures, according to some embodiments. In some implementations, the method 1000 is performed at a computer system (e.g., computer system 101 in fig. 1; device 900, 700, or 702) (e.g., a smartphone, desktop computer, laptop computer, tablet, smart watch, heads-up display unit, optical heads-up display unit, head-mounted augmented reality and/or extended reality device, a set of headphones, and/or a wearable device) with one or more input devices (e.g., a camera, a gyroscope, an accelerometer, an acoustic sensor, and/or a physiological sensor (e.g., a blood pressure sensor and/or a heart rate sensor)). In some embodiments, one or more input devices are integrated into an external device (e.g., a smartwatch, a smartphone, and/or a wearable device) that communicates with the computer system. In some implementations, the computer system communicates with a display generation component (e.g., a display controller, a touch-sensitive display system, and/or a head-mounted display system). In some embodiments, method 1000 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as one or more processors 202 of computer system 101) (e.g., control 110 in fig. 1). Some operations in method 1000 are optionally combined and/or the order of some operations is optionally changed.
The computer system (e.g., 900) detects (1002) (and in some embodiments, while the audio playback operation is occurring) a first gesture (e.g., 950a or 950 b) (e.g., an air gesture) performed by a first hand (e.g., a hand of a user of the computer system) via one or more input devices.
In response to detecting the first gesture (1004) and in accordance with a determination that the first gesture (e.g., 950b) meets a first set of criteria, the computer system performs (1006) an audio playback adjustment operation (e.g., adjusting the sound intensity as in fig. 9C; changing the track as in fig. 9D) (e.g., pausing, playing, rewinding, fast-forwarding, increasing and/or decreasing the sound intensity, jumping forward through content or between content items, and/or jumping back through content or between content items), where performing the audio playback adjustment operation optionally includes transmitting instructions that cause the computer system and/or an external device (e.g., a wearable device (e.g., a set of headphones, an ear bud, and/or a head mounted display system)) to perform the audio playback adjustment operation, and wherein the first set of criteria includes a criterion that is met when the first gesture includes the first hand being in (e.g., lifted up to and/or moved to) a first position that is at least a threshold distance (e.g., 1 cm to 20 cm) above and/or at the location of a first body part of the user (e.g., the user's ear, the user's head, and/or a body part of the user on which the external device is located).
In response to detecting the first gesture (1004) and in accordance with a determination that the first gesture (e.g., 950a) does not meet the first set of criteria, the computer system forgoes (1008) (e.g., as seen in fig. 9B) performing the audio playback adjustment operation (e.g., forgoes performing any operation based on detecting the first gesture and/or performs an operation other than the audio playback adjustment operation). In some embodiments, as part of performing the audio playback adjustment operation, the computer system causes an external device (e.g., a set of headphones and/or a wearable device) to perform the operation. Selecting whether to perform the audio playback adjustment operation in response to detecting the first gesture and in accordance with a determination that the first gesture satisfies the first set of criteria, the first set of criteria including a criterion that is met when the first gesture includes the first hand being in a first position at least a threshold distance from a first body part of the user, provides the user with more control to cause the audio playback adjustment operation to be performed by making the particular gesture, without requiring additional controls and/or virtual objects that clutter the user interface.
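As a condensed, non-authoritative sketch of the method 1000 branch (1004-1008), the following Swift outline follows the figs. 9A-9C reading in which the qualifying position is the hand lifted up near the head. The type names, the boolean/distance representation, and the 0.2 m threshold are assumptions introduced for this example.

```swift
import Foundation

struct AirGesture {
    let wasLifted: Bool                  // hand lifted from a lower position to the first position
    let metersFromFirstBodyPart: Double  // e.g., distance from the user's head
}

enum Method1000Outcome {
    case performAudioPlaybackAdjustment  // block 1006
    case forgoAudioPlaybackAdjustment    // block 1008 (or perform a different operation)
}

func evaluate(_ gesture: AirGesture, positionThreshold: Double = 0.2) -> Method1000Outcome {
    // First set of criteria (one possible interpretation): the hand has been lifted to a
    // position close to the body part on which the headphones or HMD are worn.
    let meetsFirstSetOfCriteria = gesture.wasLifted
        && gesture.metersFromFirstBodyPart <= positionThreshold
    return meetsFirstSetOfCriteria
        ? .performAudioPlaybackAdjustment
        : .forgoAudioPlaybackAdjustment
}

// Example: a gesture made with the hand at the user's side does not meet the criteria.
let atSide = evaluate(AirGesture(wasLifted: false, metersFromFirstBodyPart: 0.6))
// atSide == .forgoAudioPlaybackAdjustment
```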
In some embodiments, the first gesture includes detecting that the first hand is lifted from a second position (e.g., a position different and/or distinct from the first position and/or a position that is not at least a threshold distance from the first body part of the user) to the first position (e.g., as seen in fig. 9A-9B). In some embodiments, as part of detecting the first gesture, the computer system detects that the first hand is lowered from a third position to the first position and/or to a position that is not at least a threshold distance from the first body part of the user. Selecting whether to perform the audio playback adjustment operation in response to detecting the first gesture and in accordance with a determination that the first gesture satisfies the first set of criteria, the first set of criteria including a criterion that is met when the first gesture includes the first hand being lifted to the first position, provides the user with more control over causing the audio playback adjustment operation to be performed.
In some embodiments, in response to detecting the first gesture, in accordance with a determination that the first gesture (e.g., 950 a) does not meet the first set of criteria, an operation (e.g., as described with reference to fig. 9B) is performed that is different from the audio playback adjustment operation (and in some embodiments, the external device is not caused to perform the operation). Performing an operation other than the audio playback adjustment operation in response to detecting the first gesture and in accordance with a determination that the first gesture does not meet the first set of criteria provides more control to the user such that the operation other than the audio playback adjustment operation is performed.
In some embodiments, the operation other than the audio playback adjustment operation is an operation to change one or more aspects of the augmented reality environment (e.g., as described with reference to fig. 9B) (e.g., an augmented reality environment displayed by a computer system and/or by a display generation component in communication with the computer system). Performing an operation of changing one or more aspects of the augmented reality environment in response to detecting the first gesture and in accordance with a determination that the first gesture does not meet the first set of criteria provides more control to the user such that the operation of changing one or more aspects of the augmented reality environment is performed.
In some embodiments, the first gesture is detected while a set of headphones (e.g., 900) in communication with the computer system is playing audio, and wherein performing the audio playback adjustment operation includes causing the set of headphones to perform the audio playback adjustment operation (e.g., adjust the sound intensity as in fig. 9C). In response to detecting the first gesture and in accordance with a determination that the first gesture meets the first set of criteria, selecting whether to cause the set of headphones to perform the audio playback adjustment operation provides the user with control to cause the set of headphones to perform the audio playback adjustment operation without requiring additional controls and/or virtual objects that clutter the user interface.
In some implementations, the first gesture is detected while an augmented reality device (e.g., a head mounted display unit, a smart phone, a laptop computer, and/or a tablet computer) is playing audio (e.g., because the augmented reality device is displaying an augmented reality user interface (e.g., an augmented reality user interface associated with the audio being played back (e.g., a music user interface and/or a user interface including an indication that audio is being played back))), and wherein performing the audio playback adjustment operation includes causing the augmented reality device to perform the audio playback adjustment operation. In response to detecting the first gesture and in accordance with a determination that the first gesture meets the first set of criteria, selecting whether to cause the augmented reality device to perform the audio playback adjustment operation provides the user with control to cause the augmented reality device to perform the audio playback adjustment operation without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, a first media item (e.g., 920) is being played back (e.g., by the computer system, an augmented reality device, a set of headphones, and/or a computer system different from the computer system) before the first gesture performed by the first hand is detected, the first gesture is a pinch gesture (e.g., a single pinch gesture, a multiple pinch gesture, and/or, in some embodiments, a flick gesture and/or a swipe gesture), and performing the audio playback adjustment operation in response to detecting the first gesture includes playing or pausing the first media item (e.g., as described with reference to fig. 9D). In some implementations, the computer system (or another computer system/device) pauses the first media item in response to detecting the first gesture while the computer system is playing back the first media item. In some implementations, the computer system (or another computer system/device) plays the first media item in response to detecting the first gesture while the computer system (or another computer system/device) is not playing back the first media item (and, in some implementations, is configured to play back the first media item). In some implementations, playing or pausing the first media item includes transmitting and/or sending a set of instructions that causes an augmented reality device, a set of headphones, and/or a computer system different from the computer system to play and/or pause the first media item. In response to detecting the first gesture as a pinch gesture and in accordance with a determination that the first gesture meets the first set of criteria, selecting whether to play or pause the first media item provides the user with control to play or pause the first media item without requiring additional controls and/or virtual objects that clutter the user interface.
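A minimal sketch of this play-or-pause toggle, offered only as an illustration under assumed names, could look like the following in Swift.

```swift
import Foundation

enum PlaybackState {
    case playing
    case paused
}

/// A qualifying pinch gesture flips the playback state: pause if playing, play if paused.
func togglePlayback(_ state: PlaybackState) -> PlaybackState {
    switch state {
    case .playing: return .paused
    case .paused:  return .playing
    }
}

// Example: a pinch while a media item is playing pauses it; a second pinch resumes it.
let afterFirstPinch = togglePlayback(.playing)          // .paused
let afterSecondPinch = togglePlayback(afterFirstPinch)  // .playing
```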
In some embodiments, a second media item (e.g., 920 or 930) is being played back (e.g., by the computer system, an augmented reality device, a set of headphones, and/or a computer system different from the computer system) before the first gesture performed by the first hand is detected, the first gesture is a gesture that includes movement of a finger of the first hand relative to a portion of the first hand (e.g., a portion that does not include that finger and/or another finger of the first hand different from that finger) (e.g., a swipe and/or drag along the first hand), and performing the audio playback adjustment operation in response to detecting the first gesture includes changing playback (e.g., fast forwarding or rewinding) of the second media item (e.g., as described with reference to fig. 9D). In some implementations, fast forwarding or rewinding the second media item includes transmitting and/or sending a set of instructions that causes the augmented reality device, a set of headphones, and/or a computer system different from the computer system to fast forward, rewind, and/or pause the second media item. In some embodiments, in accordance with a determination that the first gesture is in a first direction, changing playback of the second media item includes fast forwarding the media item (e.g., the media item currently being played and/or the media item currently configured for playback) (or, in some embodiments, jumping to the next media item). In some embodiments, in accordance with a determination that the first gesture is in a second direction different from the first direction, changing playback of the second media item includes rewinding the media item (e.g., the media item currently being played and/or the media item currently configured for playback) (or, in some embodiments, returning to a previous media item). In response to detecting a gesture that includes movement of a finger of the first hand relative to a portion of the first hand and in accordance with a determination that the first gesture meets the first set of criteria, selecting whether to fast forward or rewind the second media item provides the user with control to fast forward or rewind the second media item without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, before detecting the first gesture performed by the first hand, the computer system displays (and/or, in some embodiments, causes an augmented reality device, a set of headphones, and/or a computer system other than the computer system to display) a user interface (e.g., 700b) at a first transparency metric (e.g., the computer system is configured to operate in a first transparency mode); while displaying the user interface at the first transparency metric, the computer system detects, via the one or more input devices, a second gesture performed by the first hand that includes a tap gesture (e.g., a double tap gesture, a single tap gesture, and/or, in some embodiments, a pinch gesture, a mouse click gesture, and/or a slide gesture); and in response to detecting the second gesture and in accordance with a determination that the second gesture satisfies a second set of criteria, the computer system displays the user interface at a second transparency metric that is different from the first transparency metric (e.g., configures the computer system to operate in a second transparency mode that is different from the first transparency mode). In response to detecting the second gesture and in accordance with a determination that the second gesture meets the second set of criteria, displaying the user interface at a second transparency metric different from the first transparency metric provides the user with control to change the transparency metric of the user interface without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, prior to detecting the first gesture performed by the first hand, a third media item (e.g., 920 or 930) is configured to be played back (e.g., by the computer system, an augmented reality device, a set of headphones, and/or a computer system different from the computer system) at a first sound intensity level (e.g., 910a or 910b), the first gesture is a first pinch gesture that is rotated (e.g., 950b) (e.g., or a double-click gesture, a single-click gesture, and/or, in some embodiments, a pinch gesture that moves (e.g., vertically, inwardly, outwardly, and/or horizontally), a mouse-click gesture, and/or a slide gesture), and performing the audio playback adjustment operation in response to detecting the first gesture includes configuring the third media item to be played back (e.g., at the computer system, the augmented reality device, the set of headphones, and/or a computer system different from the computer system) at a second sound intensity level (e.g., 930) different from the first sound intensity level. In some implementations, configuring the third media item to be played back at a second sound intensity level different from the first sound intensity level includes causing the sound intensity level to increase or causing the sound intensity level to decrease. In some embodiments, in response to detecting the first gesture and in accordance with a determination that the pinch gesture is rotated in a first direction, the second sound intensity level is lower than the first sound intensity level, and in response to detecting the first gesture and in accordance with a determination that the pinch gesture is rotated in a second direction different from the first direction, the second sound intensity level is higher than the first sound intensity level. In response to detecting the first gesture as a rotated first pinch gesture and in accordance with a determination that the first gesture meets the first set of criteria, selecting whether to configure the third media item for playback at a second sound intensity level different from the first sound intensity level provides the user with control to change the sound intensity level at which the third media item is configured for playback without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, prior to detecting the first gesture performed by the first hand, a fourth media item (e.g., 920 or 930) is configured to be played back (e.g., by the computer system, an augmented reality device, a set of headphones, and/or a computer system different from the computer system) at a third sound intensity level (e.g., 910a), the first gesture is a second pinch gesture that is rotated (e.g., 950b) (e.g., or a double-tap gesture, a single-tap gesture, and/or, in some embodiments, a pinch gesture that moves (e.g., vertically, inwardly, outwardly, and/or horizontally), a mouse click gesture, and/or a slide gesture), and performing the audio playback adjustment operation in response to detecting the first gesture includes, in accordance with a determination that the second pinch gesture is rotated at a first rate (e.g., speed, acceleration, rate of movement, and/or velocity), configuring the fourth media item to be played back at a fourth sound intensity level that differs from the third sound intensity level by a first amount, and, in accordance with a determination that the second pinch gesture is rotated at a second rate different from the first rate, configuring the fourth media item to be played back at a fifth sound intensity level that differs from the third sound intensity level by a second amount different from the first amount (e.g., as described with reference to fig. 9C). In some implementations, when the second rate is higher (e.g., faster) than the first rate, the second amount of sound intensity change is greater than the first amount (and/or the amount of sound intensity change increases as the gesture is rotated at a higher rate). Configuring the fourth media item to be played back at different sound intensity amounts based on the rate at which the second pinch gesture is rotated provides the user with control over how much the sound intensity level is changed without requiring additional controls and/or virtual objects that clutter the user interface.
In some implementations, in response to detecting the first gesture and in accordance with a determination that the first gesture meets a first set of criteria, a first set of haptic outputs (e.g., 970 a) corresponding to the audio playback adjustment operation are generated (e.g., in connection with performing the audio playback adjustment operation (e.g., concurrent with, before, and/or after performing the audio playback adjustment operation)). Generating the first set of haptic outputs corresponding to the audio playback adjustment operation provides feedback to the user regarding performance of the audio playback adjustment operation, which may also reduce the amount of input required to reverse performance of the operation when the user inadvertently causes the audio playback adjustment operation.
In some embodiments, a wrist-worn wearable device (e.g., 702) (e.g., a smart watch, a wrist-worn fitness tracker, a heart rate monitor, and/or other wrist-worn device capable of detecting gestures) includes one or more input devices (e.g., 244) for detecting a first gesture performed by a first hand.
In some implementations, in response to detecting the first gesture and in accordance with a determination that the first gesture meets the first set of criteria, the computer system causes the wrist-worn wearable device (e.g., 702) to generate a second set of haptic outputs (e.g., 970a) corresponding to the audio playback adjustment operation (e.g., in connection with performing the audio playback adjustment operation (e.g., concurrent with, before, and/or after performing the audio playback adjustment operation)). Causing the wrist-worn wearable device to generate a second set of haptic outputs corresponding to the audio playback adjustment operation provides feedback to the user regarding performance of the audio playback adjustment operation, which may also reduce the amount of input required to reverse performance of the operation when the user inadvertently causes the audio playback adjustment operation.
In some implementations, in accordance with a determination that the first gesture (e.g., 950b) (e.g., when in the first position and/or when at least a threshold distance from the first body part of the user) has a first direction of rotation, the audio playback adjustment operation is a first operation (e.g., decreasing a sound intensity level, rewinding the media item, jumping to a previous media item in a media item queue, and/or decreasing one or more audio characteristics (e.g., bass, balance, pitch, and/or tone)), and in accordance with a determination that the first gesture has a second direction of rotation, the audio playback adjustment operation is a second operation (e.g., as described with reference to fig. 9C) that is different from the first operation (e.g., increasing a sound intensity level, fast-forwarding the media item, jumping to a next media item in the media item queue, and/or increasing one or more audio characteristics (e.g., bass, balance, pitch, and/or tone)). Performing the audio playback adjustment operation based on the direction of rotation allows the user to control which audio playback adjustment operation is performed without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, in accordance with a determination that the first gesture (e.g., 950b) (e.g., when in the first position and/or when at least a threshold distance from the first body part of the user) has a first rotational amplitude (e.g., velocity, speed, acceleration, distance, and/or angular distance), the audio playback adjustment operation has a first adjustment amplitude (e.g., decreasing or increasing the sound intensity level by a first amount, rewinding or fast-forwarding the media item by a first amount, jumping by a first number of media items in the media item queue, and/or decreasing or increasing one or more audio characteristics (e.g., bass, balance, pitch, and/or tone) by a first amount), and in accordance with a determination that the first gesture has a second rotational amplitude different from the first rotational amplitude, the audio playback adjustment operation has a second adjustment amplitude different from the first adjustment amplitude (e.g., as described with reference to fig. 9C). Performing the audio playback adjustment operation based on the magnitude of rotation allows the user to control the magnitude of the audio playback adjustment operation that is performed without requiring additional controls and/or virtual objects that clutter the user interface.
In some embodiments, aspects/operations of methods 800, 1000, and/or 1200 as described herein may be interchanged, substituted, and/or added between the methods. For example, the air gesture of method 800 may be the first gesture of method 1000 that conditionally performs audio playback adjustment. For the sake of brevity, these details are not repeated here.
Fig. 11A-11H illustrate examples of techniques for conditionally responding to an input. FIG. 12 is a flow chart of an exemplary method 1200 for conditionally responding to an input. The illustrations in fig. 11A to 11H are for illustrating the processes described below, including the process in fig. 12. As discussed in more detail below, those processes include providing a) a user interface object that causes a first operation to be performed when the device receives a first type of gesture and a second operation to be performed when the device receives a second type of gesture, b) a user interface object that causes a third operation to be performed when the device receives the first type of gesture or the second type of gesture, and/or c) a user interface object that causes a fourth operation to be performed when the device receives the first type of gesture but not an operation (e.g., any operation) to be performed when the device receives the second type of gesture. In some embodiments, the device is an augmented reality device (e.g., a device that presents an augmented reality environment to a user), such as a head mounted device. In such embodiments, presenting a user interface with virtual objects that may respond differently to gestures provides various control options, particularly for control via gestures. Control via gestures may be particularly useful for HMDs, as hardware elements of the HMDs (e.g., buttons or touch-sensitive surfaces) may not be visible to a user when wearing the HMDs and/or may be difficult to operate due to their location and/or lack of visibility.
At fig. 11A, a user 701 is holding and interacting with a device 700 that is currently displaying a camera user interface 1102 of a camera application. As discussed above, the device 700 may include one or more features of the computer system 101 (e.g., an eye tracking unit 243 configured to track the position and movement of the user's gaze), and in the embodiment of fig. 11A-11H, the device 700 includes an image sensor 314 that includes one or more cameras (e.g., rear-facing cameras) that may be used to capture still images and video using the camera user interface 1102. The user 701 also wears the wearable device 702 on her right hand. As described above, wearable device 702 includes one or more sensors (e.g., one or more heart rate sensors and/or blood pressure sensors (e.g., one or more light emitting elements and/or optical sensors on a back side of the device oriented toward a wrist of user 701), accelerometers, and/or gyroscopes) that detect movement (e.g., rotation and/or lateral movement), orientation, gestures, and positioning of the right hand of user 701.
At fig. 11A, camera user interface 1102 includes various selectable interface objects including a shutter object 1102a (e.g., for initiating image or video capture), a camera film object 1102b (e.g., for accessing one or more previously captured images or video), a flash control object 1102c (e.g., for selecting and/or modifying a flash mode), and a video mode object 1102d (e.g., for selecting a video capture mode). In fig. 11A, video object 1102d is currently selected (e.g., as indicated in bold), so device 700 is configured to capture video media (e.g., when shutter object 1102a is selected). The camera user interface 1102 also includes a camera preview object 1104, which is a user interface area that provides a preview of image data captured in the field of view of one or more cameras of the device 700. In fig. 11A, the camera preview object 1104 currently shows that the camera of the device 700 is currently pointing toward the person playing the guitar. In some implementations, the camera user interface 1102 is a set of virtual objects displayed in a virtual reality environment.
At fig. 11A, the device 700 detects the gaze of user 701 directed (e.g., alternatively and/or sequentially) toward shutter object 1102a, camera film object 1102b, and flash control object 1102c, as indicated by exemplary gaze line 752a and gaze target indications 1154a, 1154b, and 1154c, respectively. In the embodiment of fig. 11A-11H, exemplary elements such as gaze line 752a and gaze target indications 1154a to 1154c are provided for illustrative purposes only and are not part of the user interface provided by device 700 and/or 702.
Also at fig. 11A, device 702 (and/or, in some embodiments, device 700) detects a first input as a first type of gesture while the user's gaze is directed (e.g., alternatively and/or sequentially) toward user interface objects 1102a to 1102c, as indicated by exemplary text 1150a. In some implementations, the first type of gesture is an air gesture, such as a pinch-then-release gesture (e.g., a pinch with thumb and forefinger held for less than a predetermined period of time) or a pinch and rotation as discussed above with reference to fig. 7B and 7C. In some embodiments, the air gesture is a different air gesture, such as a flick gesture made with an index or middle finger (e.g., a mid-air flick that does not involve contacting device 700 or device 702 with a user's finger) or positioning the user's fingers in a predetermined pose (e.g., a "C" shape made with a thumb and index finger).
At fig. 11B, in response to detecting a first input corresponding to exemplary text 1150a when the gaze of user 701 is directed to shutter object 1102a (e.g., as illustrated by gaze target 1154a in fig. 11A), device 702 sends an indication of the detected first input to device 700. In response to receiving an indication of a first input while the user's gaze is directed to shutter object 1102a, device 700 initiates a process of recording video, as indicated by shutter object 1102a being replaced by stop object 1102e and recording time object 1102f beginning to increment the count. Thus, when a first type of gesture is detected while the user's attention is directed to the shutter object 1102a, the device 700 performs a first operation (e.g., initiates video recording) corresponding to the shutter object 1102 a. In some implementations, in response to detecting the first input corresponding to the exemplary text 1150a, the device 702 and/or the device 700 output a tactile output (e.g., to indicate that a gesture was received/recognized). In some implementations, different haptic outputs are provided for different gesture types (e.g., a single haptic vibration for pinch and then release and two haptic vibrations for a double pinch gesture).
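For illustration only, the gaze-plus-gesture dispatch for the first gesture type described across figs. 11A-11D might be sketched as follows in Swift; the enumeration and function names are assumptions introduced for this example.

```swift
import Foundation

// Hypothetical gaze targets corresponding to the camera user interface objects.
enum GazeTarget {
    case shutterObject       // 1102a
    case cameraRollObject    // 1102b
    case flashControlObject  // 1102c
}

enum CameraOperation {
    case startVideoRecording
    case showCameraRoll
    case enableAutoFlash
}

/// Maps the first gesture type (e.g., pinch then release) to an operation per gaze target.
func firstGestureOperation(gazedAt target: GazeTarget) -> CameraOperation {
    switch target {
    case .shutterObject:      return .startVideoRecording  // as in fig. 11B
    case .cameraRollObject:   return .showCameraRoll       // as in fig. 11C
    case .flashControlObject: return .enableAutoFlash      // as in fig. 11D
    }
}
```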
At fig. 11C, in response to detecting a first input corresponding to exemplary text 1150a when the gaze of user 701 is directed to camera film object 1102b (e.g., as illustrated by gaze target 1154b in fig. 11A), device 702 sends an indication of the detected first input to device 700. In response to receiving an indication of a first input while the user's gaze is directed to the camera film object 1102b, the device 700 displays a camera film interface 1106 that includes a media representation 1106a of a previously captured image (e.g., captured prior to the depiction of fig. 11A). Thus, when a first type of gesture is detected while the user's attention is directed to the camera film object 1102b, the device 700 performs a first operation (e.g., displaying a representation of previously captured media) corresponding to the camera film object 1102 b.
At fig. 11D, in response to detecting a first input corresponding to exemplary text 1150a when the gaze of user 701 is directed to flash control object 1102c (e.g., as illustrated by gaze target 1154c in fig. 11A), device 702 sends an indication of the detected first input to device 700. In response to receiving an indication of the first input while the user's gaze is directed to the flash control object 1102c, the device 700 enables an automatic flash function as indicated by a change in the appearance of the flash control object 1102 c. Thus, when a first type of gesture is detected while the user's attention is directed to the flash control object 1102c, the device 700 performs a first operation (e.g., enables auto-flash) corresponding to the flash control object 1102 c.
At fig. 11D, the device 700 detects the gaze of user 701 directed (e.g., alternatively and/or sequentially) toward shutter object 1102a, camera film object 1102b, and flash control object 1102c, as indicated by the exemplary gaze line 752a and gaze target indications 1154d, 1154e, and 1154f, respectively. In some implementations, the device 700 displays a visual indication of the detected user gaze, such as highlighting the gazed-at object (e.g., highlighting and/or thickening the shutter object 1102a when the device 700 determines that the user's gaze is currently directed to the shutter object 1102a).
Also at fig. 11D, device 702 (and/or device 700 in some embodiments) detects a second input as a gesture of a second type different from the first type, while the user's gaze is directed (e.g., alternatively and/or sequentially) toward user interface objects 1102 a-1102 c, as indicated by exemplary text 1150b and gaze targets 1154D-1154 f, respectively. In some implementations, the second type of gesture is an air gesture, such as a pinch and hold gesture (e.g., pinch with thumb and forefinger, which remains at least a predetermined period of time before being released) or pinch and rotate, as discussed above with reference to fig. 7B and 7C. In some embodiments, the air gesture is a different air gesture, such as a flick gesture made with an index or middle finger (e.g., a mid-air flick that does not involve contacting device 700 or device 702 with a user's finger) or positioning the user's finger in a predetermined pose (e.g., a "V" shape made with a thumb and index finger).
In response to detecting a second input corresponding to the exemplary text 1150b when the gaze of the user 701 is directed to the shutter object 1102a (e.g., as illustrated by gaze target 1154d in fig. 11D), the device 702 sends an indication of the detected second input to the device 700. In response to receiving an indication of the second input while the user's gaze is directed to shutter object 1102a, device 700 initiates a process of recording video, as seen in fig. 11B, in which shutter object 1102a is replaced by stop object 1102e and recording time object 1102f begins to increment a count. Thus, when a second type of gesture is detected while the user's attention is directed to the shutter object 1102a, the device 700 performs the same first operation (e.g., initiates video recording) corresponding to the shutter object 1102a. In other words, the device 700 performs the same operation whether a first type of gesture or a second type of gesture is detected while the user's attention is directed to the shutter object 1102a. In some implementations, mapping multiple gestures to the same operation for a virtual object ensures that the operation is performed without requiring the user to remember a single precise gesture. In some implementations, the virtual object is associated with only a single operation, and thus gesture disambiguation is not required. In some embodiments, this operation is an important or critical operation (e.g., media capture of a potentially transient event), and mapping multiple gestures from a set of recognizable gestures (in some embodiments, mapping all recognizable gestures) to the operation reduces the risk of the operation not being performed due to misrecognition of the gesture and/or user error in providing the expected gesture.
In response to detecting a second input corresponding to the exemplary text 1150b when the gaze of the user 701 is directed to the camera film object 1102b (e.g., as illustrated by gaze target 1154e in fig. 11D), the device 702 sends an indication of the detected second input to the device 700. In response to receiving an indication of the second input while the user's gaze is directed to the camera film object 1102b, the device 700 maintains the display of the user interface 1102, as seen in fig. 11D (e.g., the device 700 does not provide a perceptible response to the second input). Thus, when a second type of gesture is detected while the user's attention is directed to the camera film object 1102b, the device 700 does not perform a perceptible operation corresponding to the camera film object 1102b. In some implementations, the second type of gesture is not mapped to a perceptible operation when a false positive of the operation (e.g., displaying a representation of previously captured media, as seen in fig. 11C) would be disruptive. For example, when the user does not intend to navigate away from the camera capture user interface 1102, navigating away may result in missing a transient media capture opportunity. In such implementations, the first operation corresponding to the camera film object 1102b is mapped to only a particular set of gesture types (in some implementations, a single type of gesture).
In response to detecting a second input corresponding to the exemplary text 1150b when the gaze of the user 701 is directed to the flash control object 1102c (e.g., as illustrated by gaze target 1154f in fig. 11D), the device 702 sends an indication of the detected second input to the device 700. In response to receiving an indication of the second input while the user's gaze is directed to the flash control object 1102c, the device 700 switches to the forced-on flash function. Thus, when a second type of gesture is detected while the user's attention is directed to the flash control object 1102c, the device 700 performs a second operation (e.g., enables forced flash) corresponding to the flash control object 1102c. In some implementations, if a second type of gesture is detected when the flash is off (e.g., as seen in fig. 11A) and when the user's attention is directed to the flash control object 1102c, the flash function transitions to forced-on flash instead of auto flash. In some implementations, the virtual object is associated with three or more operations (e.g., flash off, auto flash, and forced-on flash), and three or more particular gesture types are each mapped to a particular operation. For example, a pinch and then release gesture is mapped to flash off, a pinch and hold gesture is mapped to auto flash, and a double pinch gesture is mapped to forced-on flash. In some implementations, the respective virtual object is associated with a plurality of states (e.g., three or more states, such as off, auto flash, and forced-on flash), and receiving a first type of gesture or a second type of gesture when the user's attention is directed to the respective virtual object causes the same operation: cycling to the next state in a predetermined sequence of states.
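The "cycle to the next state" variant mentioned above could be sketched as follows; this Swift fragment is illustrative only, and the particular state sequence is an assumption.

```swift
import Foundation

// Hypothetical flash states cycled by either qualifying gesture type.
enum FlashState: CaseIterable {
    case off
    case autoFlash
    case forcedOn
}

/// Advances to the next state in the predetermined sequence, wrapping around at the end.
func nextFlashState(after current: FlashState) -> FlashState {
    let all = FlashState.allCases
    let index = all.firstIndex(of: current)!
    return all[(index + 1) % all.count]
}

// Example: off -> autoFlash -> forcedOn, regardless of which qualifying gesture arrived.
let afterFirstGesture = nextFlashState(after: .off)                // .autoFlash
let afterSecondGesture = nextFlashState(after: afterFirstGesture)  // .forcedOn
```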
In fig. 11E, the device 700 detects that the user's gaze is directed to a location in the camera preview object 1104, as indicated by the exemplary gaze line 752a and gaze target indication 1154 g. Also at fig. 11E, the device 702 (and/or the device 700 in some implementations) detects a third input as a first type of gesture while the user's gaze is directed to a location in the camera preview object 1104, as indicated by the exemplary text 1150 c.
At fig. 11F, in response to detecting a third input corresponding to the exemplary text 1150c when the gaze of the user 701 is directed to a location in the camera preview object 1104, the device 702 sends an indication of the detected third input to the device 700. In response to receiving an indication of a third input while the user's gaze is directed to a location in the camera preview object 1104, the device 700 displays an exposure adjustment control object 1108. In some implementations, the first type of gesture is an air gesture having an amplitude characteristic and/or a direction characteristic (e.g., the first type of gesture is a finger swipe gesture as discussed above with respect to fig. 9C), and the one or more characteristics of the operation are based on the amplitude characteristic and/or the direction characteristic of the gesture (e.g., the change in the exposure is based on the direction and distance the thumb is slid over the index finger). Thus, when a first type of gesture is detected while the user's attention is directed to the camera preview object 1104, the device 700 performs a first operation (e.g., displays the exposure adjustment control object 1108) corresponding to the camera preview object 1104.
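A hedged sketch of the directional, amplitude-based exposure adjustment suggested above is shown below; the exposure range, units, and scale factor are assumptions invented for the example.

```swift
import Foundation

struct FingerSwipe {
    let signedDistance: Double   // positive = toward the fingertip, negative = toward the knuckle
}

/// Returns a new exposure bias (in EV), scaled by swipe distance and clamped to +/- 2 EV.
func adjustedExposure(current: Double,
                      swipe: FingerSwipe,
                      evPerUnitDistance: Double = 0.5) -> Double {
    let proposed = current + swipe.signedDistance * evPerUnitDistance
    return min(2.0, max(-2.0, proposed))
}

// Example: from 0 EV, a swipe of +1.5 units raises the bias to +0.75 EV.
let newBias = adjustedExposure(current: 0.0, swipe: FingerSwipe(signedDistance: 1.5))
```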
At fig. 11G, the device 700 detects that the user's gaze is directed to the same location in the camera preview object 1104 as seen in fig. 11E, as indicated by the exemplary gaze line 752a and gaze target indication 1154G. Also at fig. 11G, device 702 (and/or device 700 in some implementations) detects a fourth input as a second type of gesture while the user's gaze is directed to a location in camera preview object 1104, as indicated by exemplary text 1150 d.
At fig. 11H, in response to detecting a fourth input corresponding to the exemplary text 1150d while the gaze of the user 701 is directed to a location in the camera preview object 1104, the device 702 sends an indication of the detected fourth input to the device 700. In response to receiving an indication of the fourth input while the user's gaze is directed to a location in the camera preview object 1104, the device 700 displays a representation 720a corresponding to the weather application, as described with reference to fig. 7B, and outputs a haptic output 1160a (e.g., similar to the haptic output 770a of fig. 7B). In some embodiments, the user may then switch to a different application using additional gestures, as discussed with respect to fig. 7A-7E. Thus, when a second type of gesture is detected while the user's attention is directed to the camera preview object 1104, the device 700 performs a second operation (e.g., displaying representation 720 a) corresponding to the camera preview object 1104. In some implementations, when a third type of gesture (e.g., a double pinch gesture) is detected, a third operation (e.g., a system level operation such as transitioning the device and/or display to a low power state) is performed. In some embodiments, when a corresponding gesture is detected while the user's attention is directed to the virtual object, different operations mapped to different gesture types may be performed. In some embodiments, this allows multiple operations to be associated with a virtual object without cluttering the user interface with an otherwise displayed control object.
For additional description with respect to fig. 11A-11H, refer to method 1200 described below with respect to fig. 12.
Fig. 12 is a flowchart of an exemplary method 1200 for conditionally responding to an input, according to some embodiments. In some embodiments, the method 1200 is performed at a computer system (e.g., computer system 101 in fig. 1; devices 900, 700, or 702) (e.g., a smart phone, desktop computer, laptop computer, tablet, smartwatch, wrist-worn fitness tracker, heads-up display unit, optical heads-up display unit, head-worn augmented reality and/or extended reality device, and/or wearable device) in communication with a display generating component and one or more input devices (e.g., a camera, gyroscope, accelerometer, acoustic sensor, physiological sensor (e.g., blood pressure sensor), keyboard, touch-sensitive surface, smartwatch, and/or mouse). In some embodiments, one or more input devices are integrated into an external device (e.g., a smartwatch and/or a wearable device) that communicates with the computer system. In some embodiments, method 1200 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as one or more processors 202 of computer system 101) (e.g., control 110 in fig. 1). Some operations in method 1200 are optionally combined and/or the order of some operations is optionally changed.
The computer system (e.g., 700 or 702) detects (1202), via the one or more input devices, an input (e.g., 1150a) (e.g., an input having a set of characteristics (e.g., a pinch, a pinch and hold, and/or a multi-pinch gesture/input, and/or, in some embodiments, a non-pinch gesture/input (such as a click, click and hold, and/or multi-click, tap, press and hold, and/or multi-tap gesture/input))) while a user interface (e.g., 1102) including virtual objects (e.g., 1102a through 1102c) (e.g., and/or a first virtual object and a second virtual object) is displayed via a display generation component (e.g., 700a).
In response to detecting the input (1204) and in accordance with a determination that the attention of the user (e.g., the user's gaze being directed to a portion (e.g., a particular portion and/or a predetermined portion) of the user interface, an object, or a body part) is directed to a first virtual object (e.g., 1154c or 1104) (e.g., a virtual object that causes the computer system to perform different operations in response to detecting the first type of input and in response to detecting the second type of input) in conjunction with detecting the input (e.g., while the input is detected, after (e.g., immediately after (e.g., within 0 to 5 seconds)) the input is detected, and/or before (e.g., immediately before (e.g., within 0 to 5 seconds)) the input is detected) and that the input corresponds to a first type of input (e.g., 1150a) (e.g., a pinch and hold, single pinch, and/or multi-pinch (e.g., double pinch and/or triple pinch) input), the computer system performs (1206) a first operation with respect to the first virtual object (e.g., as illustrated in fig. 11D or fig. 11F) (e.g., selects the virtual object, switches between applications, opens and/or closes a setting, and/or captures media) (e.g., without performing the second operation with respect to the first virtual object).
In response to detecting the input (1204) and in accordance with a determination that the user's attention is directed to the first virtual object in combination with detecting the input and the input corresponds to a second type of input (e.g., 1150B) that is different from the first type of input (e.g., pinch and hold, single pinch, and/or multi-pinch (e.g., double pinch and/or triple pinch) input), the computer system performs (1208) a second operation (e.g., as described with reference to fig. 11B, 11C, or 11D) with respect to the first virtual object (e.g., as described with reference to fig. 11D or 11H) (e.g., selects the virtual object, switches between applications, opens and/or closes a setting, and/or captures media), wherein the second operation is different from the first operation (e.g., does not perform the first operation with respect to the first virtual object).
In response to detecting the input (1204) and in accordance with a determination that the user's attention is directed to a second virtual object (e.g., 1102a) (e.g., a virtual object that causes the computer system to perform the same operation in response to the first type of input (and, in some embodiments, not in response to the second type of input) and/or in response to detecting either the first type of input or the second type of input) in conjunction with detecting the input, wherein the second virtual object is different from the first virtual object, and that the input corresponds to the first type of input, the computer system performs (1210) a first operation with respect to the second virtual object (e.g., as described with reference to fig. 11B) (e.g., without performing a second operation with respect to the second virtual object).
In response to detecting the input (1204) and in accordance with a determination that the attention of the user is directed to the second virtual object in connection with detecting the input and that the input corresponds to the second type of input, the computer system performs (1212) a first operation with respect to the second virtual object (e.g., as described with reference to fig. 11D) (e.g., does not perform the second operation with respect to the second virtual object).
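As a condensed, non-authoritative sketch of the method 1200 branches (1206-1212), the following Swift outline shows how the operation can depend on both the virtual object receiving the user's attention and the input type; the category and operation names are assumptions introduced for this example.

```swift
import Foundation

enum InputType {
    case firstType    // e.g., pinch then release
    case secondType   // e.g., pinch and hold
}

enum VirtualObjectKind {
    case firstObject   // responds differently to the two input types
    case secondObject  // responds the same way to both input types
}

enum Operation {
    case firstOperation
    case secondOperation
}

/// Selects the operation from the (attention target, input type) pair.
func method1200Operation(for input: InputType,
                         attentionOn object: VirtualObjectKind) -> Operation {
    switch (object, input) {
    case (.firstObject, .firstType):   return .firstOperation   // block 1206
    case (.firstObject, .secondType):  return .secondOperation  // block 1208
    case (.secondObject, .firstType):  return .firstOperation   // block 1210
    case (.secondObject, .secondType): return .firstOperation   // block 1212
    }
}
```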
In some implementations, detecting the first type of input includes detecting that a pinch and then release gesture has been performed (e.g., as described with reference to fig. 11A). Performing the same operation or a different operation with respect to the respective virtual object, based on the respective virtual object and on whether a pinch and then release gesture or another type of input is detected, allows the computer system to automatically select which operation is to be performed based on the type of the respective virtual object, wherein the same operation is performed in some scenarios (e.g., when the respective virtual object is a first virtual object) and a different operation is performed in other scenarios (e.g., when the respective virtual object is a second virtual object) in response to detecting the pinch and then release gesture or another type of input.
In some embodiments, detecting the second type of input includes detecting a pinch gesture having a duration longer than a predetermined time period (e.g., as described with reference to fig. 11D) (e.g., a non-zero time period (e.g., 1 second to 5 seconds)). Performing the same operation or a different operation with respect to the respective virtual object, based on the respective virtual object, upon detecting the first type of input or the pinch gesture having a duration longer than the predetermined period of time allows the computer system to automatically select which operation is to be performed based on the type of the respective virtual object, wherein the same operation is performed in some scenarios (e.g., when the respective virtual object is a first virtual object) and a different operation is performed in other scenarios (e.g., when the respective virtual object is a second virtual object) in response to detecting that the pinch gesture has been performed for longer than the predetermined period of time or detecting the second type of input.
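As one possible illustration of the duration-based distinction described above, the sketch below classifies a pinch by how long it is held. The PinchEvent type and the 1-second threshold are assumptions chosen from the 1-to-5-second range mentioned in the paragraph, not values taken from the specification.

```swift
import Foundation

// Hypothetical sketch: classify a pinch by how long the fingers stay together.
// The specification only says the hold threshold is a predetermined, non-zero period.

struct PinchEvent {
    let start: TimeInterval  // time at which the fingers made contact
    let end: TimeInterval    // time at which the fingers separated
}

enum PinchKind {
    case pinchThenRelease  // released before the threshold ("first type of input")
    case pinchAndHold      // held past the threshold ("second type of input")
}

let holdThreshold: TimeInterval = 1.0  // assumed value within the described range

func classify(_ pinch: PinchEvent) -> PinchKind {
    (pinch.end - pinch.start) > holdThreshold ? .pinchAndHold : .pinchThenRelease
}
```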
In some implementations, in response to detecting the input and in accordance with a determination that the user's attention is directed to the first virtual object (e.g., 1102 c), in conjunction with detecting the input and the input corresponds to a third type of input that is different from the first type of input and the second type of input, a third operation (e.g., as described with reference to fig. 11D) is performed with respect to the first virtual object, wherein the third operation (e.g., selecting the virtual object, switching between applications, opening and/or closing settings, and/or capturing media) is different from the first operation and the second operation. In accordance with a determination that the user's attention is directed to the first virtual object, in conjunction with detecting an input and the input corresponds to a third type of input that is different from the first type of input and the second type of input, performing a third operation with respect to the first virtual object provides control to the user to perform an operation that is different from the first operation or the second operation (e.g., with respect to an object that is responsive to the first operation, the second operation, and the third operation) without cluttering the user interface with additional controls and/or virtual objects.
In some implementations, detecting the third type of input includes detecting that a multi-pinch gesture (e.g., a double pinch gesture, a triple pinch gesture, and/or a quad pinch gesture) has been performed (e.g., as described with reference to fig. 11D). In some embodiments, the multi-pinch gesture is a gesture in which multiple single-pinch are detected within a predetermined period of time (e.g., within 0.1 to 2 seconds of each other), and in some embodiments, the computer system performs an operation different from the third operation if one of the single-pinch is not detected within a predetermined period of time (e.g., within 0.1 to 2 seconds) of a previous single-pinch. In accordance with a determination that the user's attention is directed to the first virtual object in connection with detecting the input and the input corresponds to the multi-pinch gesture, performing a third operation with respect to the first virtual object provides control to the user to perform an operation other than the first operation or the second operation (e.g., with respect to an object responsive to the first operation, the second operation, and the third operation) without cluttering the user interface with additional controls and/or virtual objects.
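The multi-pinch grouping described above could, under the stated assumptions, be approximated as follows. The 0.5-second window is an assumed value inside the 0.1-to-2-second range given in the paragraph, and trailingPinchCount is a hypothetical helper, not part of any real API.

```swift
import Foundation

// Hypothetical sketch of grouping single pinches into a multi-pinch.

let multiPinchWindow: TimeInterval = 0.5  // assumed window within the described range

/// Returns the size of the most recent run of pinches in which each pinch
/// occurred within `multiPinchWindow` of the previous one.
func trailingPinchCount(pinchTimes: [TimeInterval]) -> Int {
    var count = 0
    var previous: TimeInterval?
    for time in pinchTimes.sorted() {
        if let last = previous, time - last > multiPinchWindow {
            count = 0  // gap too long: the earlier pinches do not join this run
        }
        count += 1
        previous = time
    }
    return count  // 2 suggests a double pinch, 3 a triple pinch, and so on
}

// Example: three pinches 0.3 s apart form one triple pinch.
let pinchCount = trailingPinchCount(pinchTimes: [0.0, 0.3, 0.6])  // 3
```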
In some implementations, performing the third operation relative to the first virtual object includes performing a system operation (e.g., displaying a system user interface or enabling or disabling system-level functionality). In accordance with a determination that the user's attention is directed to the first virtual object in connection with detecting the input and the input corresponds to the third type of input, performing a system operation with respect to the first virtual object provides control to the user to perform an operation other than the first operation or the second operation (e.g., with respect to an object responsive to the first operation, the second operation, and the system operation) without cluttering the user interface with additional controls and/or virtual objects.
In some embodiments, performing the third operation with respect to the first virtual object includes transitioning the computer system from the first system mode to a second system mode different from the first system mode (e.g., as described with reference to fig. 11H). In some embodiments, as part of transitioning the computer system from the first system mode to a second system mode that is different from the first system mode, the computer system enables the low power mode and/or the reduced power mode (e.g., a mode that places one or more components of the computer system (e.g., a display generating component, a processor, and/or a connectivity hardware component such as a bluetooth hardware module, and/or a wireless hardware module) in a low power state and/or a reduced power state) or the sleep mode (e.g., a mode that places one or more components of the computer system (e.g., a display generating component, a processor, and/or a connectivity hardware component such as a bluetooth hardware module, and/or a wireless hardware module) in a sleep state) or disables the low power mode or the sleep mode. In accordance with a determination that the user's attention is directed to the first virtual object, in conjunction with detecting an input and that the input corresponds to a third type of input, transitioning the computer system from the first system mode to a second system mode different from the first system mode provides control to the user to transition the computer system from the first system mode to the second system mode without cluttering the user interface with additional controls and/or virtual objects.
In some embodiments, in response to detecting the input and in accordance with a determination that the user's attention is directed to a portion of the user interface that does not include the second virtual object (e.g., or any virtual object) and that the input corresponds to the second type of input, the computer system performs a fourth operation (e.g., does not perform an operation with respect to the second virtual object and/or does not perform the first operation and/or the second operation). In response to detecting the input and in accordance with a determination that the user's attention is directed to a portion of the user interface that does not include the second virtual object and that the input corresponds to the second type of input, performing the fourth operation allows the computer system to automatically perform the fourth operation because the user interface is responsive to the second type of input (e.g., while the second virtual object is not responsive to the second type of input).
In some embodiments, while the user interface is displayed, in accordance with a determination that the user's attention is directed to a third virtual object (e.g., the first virtual object, the second virtual object, or another virtual object), the third virtual object is emphasized (e.g., the third virtual object is highlighted, the size of the third virtual object is increased, and/or the third virtual object is thickened (e.g., the perimeter and/or interior of the third virtual object is thickened)) (e.g., as described with reference to FIG. 11D), and in accordance with a determination that the user's attention is not directed to the third virtual object, the third virtual object is not emphasized (e.g., the third virtual object is not highlighted, the size of the third virtual object is not increased, and/or the perimeter and/or interior of the third virtual object is not thickened). Emphasizing the third virtual object when a set of prescribed conditions is met allows the computer system to automatically provide feedback to the user that the user's attention is directed to the third virtual object.
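A minimal, hypothetical sketch of the attention-based emphasis behavior described above follows: only the object the user's attention is directed to is marked for emphasis. The RenderableObject type and updateEmphasis function are illustrative names only.

```swift
// Hypothetical sketch: only the attended object is marked for emphasis
// (e.g., highlighting, enlarging, or thickening its outline).

struct RenderableObject {
    let identifier: String
    var isEmphasized: Bool = false
}

func updateEmphasis(objects: [RenderableObject], attentionTargetID: String?) -> [RenderableObject] {
    objects.map { object in
        var updated = object
        updated.isEmphasized = (object.identifier == attentionTargetID)
        return updated
    }
}

// Example: with attention on "buttonB", only that object comes back emphasized.
let updated = updateEmphasis(
    objects: [RenderableObject(identifier: "buttonA"), RenderableObject(identifier: "buttonB")],
    attentionTargetID: "buttonB"
)
```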
In some embodiments, detecting the second type of input causes the computer system to display a system user interface (e.g., as described with reference to fig. 11H) (e.g., not displayed before the second type of input was detected and/or not previously displayed).
In some embodiments, the system user interface is an application switcher user interface (e.g., as described with reference to fig. 11H and 7A-7E) (e.g., a user interface displaying a plurality of user interfaces, wherein each of the plurality of user interfaces corresponds to and/or represents a different application than other ones of the plurality of user interfaces). In some embodiments, the computer system may detect one or more inputs (e.g., a single pinch gesture, a multiple pinch gesture, a pinch and rotate gesture, and/or an air flick gesture) while displaying the application switcher user interface, and in response to detecting the one or more inputs, the computer system performs one or more operations (e.g., operations to switch to one or more different applications and/or operations to switch to one or more applications that are not displayed in the application switcher user interface and/or are not currently focused in the application switcher user interface).
In some embodiments, upon displaying the system user interface, the computer system detects a second input (e.g., 750a 2) corresponding to a fourth type of input (e.g., different from the first type of input and/or the second type of input) (e.g., a portion of the input and/or an input that is different from the detected input detected upon displaying the user interface that includes the virtual object and/or has the set of characteristics). In some embodiments, the fourth type of input is part of the second type of input (e.g., the fourth type of input is a rotating portion of a pinch and rotate input and/or the fourth type of input is a sliding portion of a pinch and slide input). In response to detecting the second input corresponding to the fourth type of input, operations corresponding to the fourth type of input (e.g., as described with reference to fig. 7C-7D) are performed with respect to the system user interface (and in some embodiments, no operations are performed with respect to the first virtual object and/or the second virtual object). In some implementations, in response to detecting a second input corresponding to a fourth type of input and in accordance with a determination that the second input is performed in the first direction, operations corresponding to the fourth type of input are performed with respect to the system user interface based on the first direction (and not based on a second direction different from the first direction). In some implementations, in response to detecting a second input corresponding to the fourth type of input and in accordance with a determination that the second input is performed in a second direction different from the first direction, operations corresponding to the fourth type of input are performed with respect to the system user interface based on the second direction (and not based on the first direction). Performing an operation with respect to the system user interface in response to detecting a second input corresponding to a fourth type of input allows the user to control the system user interface without cluttering the user interface with additional controls and/or virtual objects.
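One way the directional behavior described above could work, sketched with made-up names: the direction of the follow-on portion of the gesture moves input focus forward or backward through the items shown in the system user interface.

```swift
// Speculative sketch: the direction of the follow-on input (e.g., the rotation or
// slide portion of a pinch gesture) moves focus through the application switcher items.

enum InputDirection { case forward, backward }

func nextFocusIndex(current: Int, itemCount: Int, direction: InputDirection) -> Int {
    guard itemCount > 0 else { return 0 }
    switch direction {
    case .forward:  return (current + 1) % itemCount
    case .backward: return (current - 1 + itemCount) % itemCount
    }
}

// Example: moving backward from the first of four items wraps to the last.
let nextIndex = nextFocusIndex(current: 0, itemCount: 4, direction: .backward)  // 3
```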
In some implementations, in response to detecting an input, the computer system generates a first set of haptic outputs (e.g., 1160a) (e.g., to provide feedback that the input was detected and/or to indicate that a particular type of input was detected (e.g., one or more air pinch, air pinch and hold, air pinch and rotate, and/or air double pinch gestures)). In some embodiments, generating the first set of haptic outputs includes, in accordance with a determination that the input is the first type of input, generating a first type of haptic output (e.g., having a first duration, a first amplitude, and/or a first pattern), and, in accordance with a determination that the input is the second type of input, generating a second type of haptic output (e.g., having a second duration different from the first duration, a second amplitude different from the first amplitude, and/or a second pattern different from the first pattern). In some embodiments, the first set of haptic outputs changes over time and/or occurs over time as the input is performed. Generating the first set of haptic outputs in response to detecting the input provides feedback to the user that the input was detected, which may also reduce the number of inputs required to reverse an operation when the user inadvertently causes an operation to be performed.
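The input-type-dependent haptic selection described above might be sketched as follows; HapticPattern is not a real framework type, and the numeric durations and amplitudes are arbitrary assumptions for illustration.

```swift
import Foundation

// Hypothetical sketch: choose a haptic pattern whose duration and amplitude depend
// on which type of input was detected.

struct HapticPattern {
    let duration: TimeInterval
    let amplitude: Double  // 0.0 ... 1.0
}

enum DetectedInputKind { case pinchThenRelease, pinchAndHold, multiPinch }

func hapticPattern(for kind: DetectedInputKind) -> HapticPattern {
    switch kind {
    case .pinchThenRelease: return HapticPattern(duration: 0.05, amplitude: 0.6)
    case .pinchAndHold:     return HapticPattern(duration: 0.20, amplitude: 0.4)
    case .multiPinch:       return HapticPattern(duration: 0.10, amplitude: 0.8)
    }
}
```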
In some embodiments, a wrist-worn wearable device (e.g., 702) (e.g., a smart watch, a wrist-worn fitness tracker, and/or a heart rate monitor) includes one or more input devices for detecting an input (e.g., 244).
In some embodiments, in response to detecting the input, the wrist-worn wearable device (e.g., 702) is caused to generate a second set of haptic outputs (e.g., as described with reference to fig. 11H) (e.g., the first set of haptic outputs described above or a different set of haptic outputs). Causing the wrist-worn wearable device to generate the second set of haptic outputs in response to detecting the input provides feedback to the user that the input was detected, which may also reduce the number of inputs required to reverse an operation when the user inadvertently causes an operation to be performed.
In some embodiments, the user interface (e.g., 1102) is an augmented reality user interface (e.g., as described with reference to fig. 11A). In some embodiments, the computer system detects input while the augmented reality user interface is displayed and/or detects input at a computer system that does not display the augmented reality user interface.
The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention and various described embodiments with various modifications as are suited to the particular use contemplated.
In some embodiments, aspects/operations of methods 800, 1000, and/or 1200 as described herein may be interchanged, substituted, and/or added between the methods. For example, the air gesture of method 800 may be an input of method 1200 detected when the user's attention is directed to the first virtual object. For the sake of brevity, these details are not repeated here.
As described above, one aspect of the present technology is to collect and use data from various sources to improve the XR experience of the user and/or to improve gesture-based input systems. The present disclosure contemplates that in some instances, such collected data may include personal information data that uniquely identifies or may be used to contact or locate a particular person. Such personal information data may include demographic data, location-based data, telephone numbers, email addresses, tweet IDs, home addresses, data or records related to the user's health or fitness level (e.g., vital sign measurements, medication information, exercise information), date of birth, or any other identification or personal information.
The present disclosure recognizes that the use of such personal information data in the present technology may be used to benefit users. For example, personal information data may be used to improve the XR experience of the user and/or to improve gesture-based input systems. In addition, the present disclosure contemplates other uses for personal information data that are beneficial to the user. For example, the health and fitness data may be used to provide insight into the general health of the user, or may be used as positive feedback to individuals who use the technology to pursue health goals.
The present disclosure contemplates that entities responsible for the collection, analysis, disclosure, transmission, storage, or other use of such personal information data will adhere to sophisticated privacy policies and/or privacy measures. In particular, such entities should exercise and adhere to the use of privacy policies and measures that are recognized as meeting or exceeding industry or government requirements for maintaining the privacy and security of personal information data. Such policies should be convenient for the user to access and should be updated as the collection and/or use of data changes. Personal information from users should be collected for legitimate and reasonable uses and must not be shared or sold outside of these legitimate uses. Further, such collection/sharing should be performed after receiving the user's informed consent. Additionally, such entities should consider taking any necessary steps for protecting and securing access to such personal information data and ensuring that other entities having access to the personal information data adhere to their own privacy policies and procedures. Moreover, such entities may subject themselves to third party evaluations to prove compliance with widely accepted privacy policies and privacy practices. In addition, policies and practices should be adapted to the particular type of personal information data collected and/or accessed, and to applicable laws and standards including consideration of particular jurisdictions. For example, in the United States, the collection or acquisition of certain health data may be governed by federal and/or state law, such as the Health Insurance Portability and Accountability Act (HIPAA), while health data in other countries may be subject to other regulations and policies and should be treated accordingly. Thus, different privacy practices should be maintained for different personal data types in each country.
Regardless of the foregoing, the present disclosure also contemplates embodiments in which a user selectively blocks use or access to personal information data. That is, the present disclosure contemplates that hardware elements and/or software elements may be provided to prevent or block access to such personal information data. For example, with respect to an XR experience or gesture-based input experience, the present technology may be configured to allow a user to choose to "opt-in" or "opt-out" to participate in the collection of personal information data at any time during or after registration with a service. As another example, the user may choose not to provide data for service customization. For another example, the user may choose to limit the length of time that data is maintained or to prohibit development of the customized service altogether. In addition to providing the "opt-in" and "opt-out" options, the present disclosure contemplates providing notifications related to accessing or using personal information. For example, the user may be notified that his personal information data will be accessed when the application is downloaded, and then be reminded again just before the personal information data is accessed by the application.
Furthermore, it is intended that personal information data should be managed and processed in a manner that minimizes the risk of inadvertent or unauthorized access or use. Risk can be minimized by limiting the collection of data and by deleting the data once it is no longer needed. Further, and when applicable, including in certain health-related applications, data de-identification may be used to protect the privacy of the user. De-identification may be facilitated by removing specific identifiers (e.g., date of birth), controlling the amount or specificity of stored data (e.g., collecting location data at a city level instead of at an address level), controlling how data is stored (e.g., aggregating data across users), and/or other methods, as appropriate.
Thus, while the present disclosure broadly covers the use of personal information data to implement one or more of the various disclosed embodiments, the present disclosure also contemplates that the various embodiments may be implemented without the need to access such personal information data. That is, various embodiments of the present technology do not fail to function properly due to the lack of all or a portion of such personal information data. For example, an XR experience may be generated by inferring preferences based on non-personal information data or absolute minimum metrics of personal information, such as content requested by a device associated with the user, other non-personal information available to the service, or publicly available information.

Claims (26)

1. A method, comprising:
at a computer system that is in communication with one or more input devices:
detecting, via the one or more input devices, an air gesture that is performed by a hand and that includes a rotation of the hand; and
in response to detecting the air gesture:
in accordance with a determination that the air gesture satisfies a first set of criteria, performing a first operation based on the air gesture, wherein the first set of criteria includes a criterion that is satisfied when the air gesture is detected while the hand is performing a pinch gesture, and wherein performing the first operation includes performing the first operation based on a magnitude of movement of the rotation of the hand; and
in accordance with a determination that the air gesture does not satisfy the first set of criteria, forgoing performing the first operation based on the air gesture.

2. The method of claim 1, wherein:
in response to detecting the air gesture and in accordance with a determination that the air gesture satisfies the first set of criteria:
in accordance with a determination that the hand rotates with a first magnitude of movement, the first operation is performed with a first magnitude of execution; and
in accordance with a determination that the hand rotates with a second magnitude of movement different from the first magnitude of movement, the first operation is performed with a second magnitude of execution different from the first magnitude of execution.

3. The method of any one of claims 1 to 2, wherein:
in response to detecting the air gesture and in accordance with a determination that the air gesture satisfies the first set of criteria:
in accordance with a determination that the hand rotates at a first speed and the hand rotates a first angular distance, the first operation is performed with a third magnitude of execution; and
in accordance with a determination that the hand rotates at the first speed and the hand rotates a second angular distance different from the first angular distance, the first operation is performed with a fourth magnitude of execution different from the third magnitude of execution.

4. The method of any one of claims 1 to 2, wherein the first angular distance is less than the second angular distance, and wherein the third magnitude of execution is greater than the fourth magnitude of execution.

5. The method of any one of claims 1 to 2, wherein:
in response to detecting the air gesture and in accordance with a determination that the air gesture satisfies the first set of criteria:
in accordance with a determination that the hand rotates in a first direction, the first operation is performed based on the first direction; and
in accordance with a determination that the hand rotates in a second direction different from the first direction, the first operation is performed based on the second direction.

6. The method of any one of claims 1 to 2, further comprising:
in response to detecting the air gesture and in accordance with a determination that the air gesture satisfies the first set of criteria, initiating a process for generating a set of one or more haptic outputs indicating that the gesture satisfies the first set of criteria.

7. The method of claim 6, wherein the computer system is in communication with a wearable device, and wherein initiating the process for generating haptic feedback includes sending one or more instructions that cause the wearable device to generate the set of one or more haptic outputs indicating that the gesture satisfies the first set of criteria.

8. The method of any one of claims 1 to 2, wherein performing the first operation based on the air gesture includes transitioning through a first sequence of states, the method further comprising:
while transitioning through the first sequence of states:
in accordance with a determination that a first state in the first sequence of states has been reached, generating a set of one or more haptic outputs indicating that the first state in the first sequence of states has been reached; and
in accordance with a determination that a second state in the first sequence of states has been reached, generating a set of one or more haptic outputs indicating that the second state in the first sequence of states has been reached, wherein the second state is different from the first state.

9. The method of any one of claims 1 to 2, wherein performing the first operation based on the air gesture includes transitioning through a second sequence of states, the method further comprising:
while transitioning through the sequence of states, detecting that a particular state in the sequence of states has been reached; and
in response to detecting that the particular state has been reached, generating a set of one or more haptic outputs indicating that the particular state in the sequence of states has been reached.

10. The method of any one of claims 1 to 2, further comprising:
while performing the first operation based on the air gesture, detecting a first change to the air gesture that includes the rotation of the hand; and
in response to detecting the first change to the air gesture that includes the rotation of the hand and in accordance with a determination that the rotation of the hand has stopped while the hand continues to perform the pinch gesture, continuing to perform the first operation.

11. The method of any one of claims 1 to 2, further comprising:
while performing the first operation based on the air gesture, detecting a second change to the air gesture that includes the rotation of the hand; and
in response to detecting the second change to the air gesture that includes the rotation of the hand and in accordance with a determination that the hand has changed from rotating in a first direction to rotating in a second direction different from the first direction, ceasing to perform the first operation.

12. The method of any one of claims 1 to 2, further comprising:
while performing the first operation based on the air gesture:
in accordance with a determination that a first portion of the first operation is being performed, generating a set of one or more haptic outputs indicating that the first portion of the first operation is being performed; and
in accordance with a determination that a second portion of the first operation is being performed, generating a set of one or more haptic outputs indicating that the second portion of the first operation is being performed, wherein the second portion of the first operation is different from the first portion of the operation.

13. The method of any one of claims 1 to 2, wherein performing the first operation based on the air gesture includes:
in accordance with a determination that the air gesture is being performed at a first speed, generating a set of haptic feedback indicating that the air gesture is being performed at the first speed; and
in accordance with a determination that the air gesture is being performed at a second speed different from the first speed, generating a set of haptic feedback indicating that the air gesture is being performed at the second speed different from the first speed.

14. The method of any one of claims 1 to 2, wherein, in response to detecting the air gesture, the first operation is performed at a first rate, the method further comprising:
while performing the first operation at the first rate, detecting additional rotation of the hand while the hand is performing the pinch gesture; and
in response to detecting the additional rotation of the hand while the hand is performing the pinch gesture, performing the first operation at a second rate different from the first rate.

15. The method of any one of claims 1 to 2, further comprising:
while performing the first operation based on the air gesture, detecting a third change to the air gesture that includes the rotation of the hand; and
in response to detecting the third change to the air gesture that includes the rotation of the hand and in accordance with a determination that the hand is no longer performing the pinch gesture, forgoing performing the first operation.

16. The method of any one of claims 1 to 2, wherein the first operation is an operation performed with respect to one or more virtual objects in an extended reality environment.

17. The method of any one of claims 1 to 2, wherein the computer system is in communication with a display generation component, and wherein the first set of criteria includes a criterion that is satisfied when the display generation component is in an active state.

18. The method of any one of claims 1 to 2, wherein the air gesture is detected while a first virtual object and a second virtual object different from the first virtual object are displayed, and wherein:
in response to detecting the air gesture:
in accordance with a determination that the air gesture satisfies the first set of criteria while attention of a user is directed to the first virtual object, the first operation is performed with respect to the first virtual object; and
in accordance with a determination that the air gesture satisfies the first set of criteria while the attention of the user is directed to the second virtual object, the first operation is performed with respect to the second virtual object.

19. The method of any one of claims 1 to 2, wherein the air gesture is detected while a third virtual object is displayed, and wherein:
in response to detecting the air gesture:
in accordance with a determination that the air gesture satisfies the first set of criteria while attention of a user is directed to the third virtual object and the third virtual object is responsive to the rotation of the hand, the first operation is performed with respect to the third virtual object and based on the movement of the hand; and
in accordance with a determination that the air gesture satisfies the first set of criteria while the attention of the user is directed to the third virtual object and the third virtual object is not responsive to the rotation of the hand, the first operation is not performed with respect to the third virtual object.

20. The method of any one of claims 1 to 2, wherein the first set of criteria includes a criterion that is satisfied when attention of a user is directed to a first location, and wherein the attention of the user is determined based on a gaze of the user being directed to the first location.

21. The method of any one of claims 1 to 2, wherein the first set of criteria includes a criterion that is satisfied when attention of a user is directed to a second location, and wherein the attention of the user is determined based on the hand of the user being within a predetermined distance of the second location.

22. The method of any one of claims 1 to 2, wherein performing the first operation based on the air gesture includes:
in response to detecting a first portion of the air gesture that includes a pinch-and-hold gesture, displaying a plurality of virtual objects including a virtual object corresponding to a first application;
while displaying the plurality of virtual objects and while continuing to detect the air gesture, detecting a second portion of the air gesture that includes the rotation of the hand;
in response to detecting the second portion of the air gesture that includes the rotation of the hand, updating an appearance of a user interface that includes the plurality of virtual objects, including indicating that input focus has moved from the virtual object corresponding to the first application to a virtual object corresponding to a second application different from the first application;
while updating the appearance of the user interface and while indicating that the virtual object corresponding to the second application has input focus, detecting a third portion of the air gesture that includes releasing the pinch-and-hold gesture; and
in response to detecting the third portion of the air gesture that includes releasing the pinch-and-hold gesture while the virtual object corresponding to the second application is displayed, displaying a user interface corresponding to the second application.

23. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more input devices, the one or more programs including instructions for performing the method of any one of claims 1 to 22.

24. A computer system configured to communicate with one or more input devices, the computer system comprising:
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any one of claims 1 to 22.

25. A computer system configured to communicate with one or more input devices, the computer system comprising:
means for performing the method of any one of claims 1 to 22.

26. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with one or more input devices, the one or more programs including instructions for performing the method of any one of claims 1 to 22.
CN202510565348.3A 2022-09-21 2023-09-14 Device, method, and user interface for gesture-based interactions Pending CN120469577A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263408581P 2022-09-21 2022-09-21
US63/408,581 2022-09-21
US18/242,694 US20240094819A1 (en) 2022-09-21 2023-09-06 Devices, methods, and user interfaces for gesture-based interactions
US18/242,694 2023-09-06
CN202380066404.8A CN119923613A (en) 2022-09-21 2023-09-14 Device, method and user interface for gesture-based interaction
PCT/US2023/032794 WO2024064016A1 (en) 2022-09-21 2023-09-14 Devices, methods, and user interfaces for gesture-based interactions

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202380066404.8A Division CN119923613A (en) 2022-09-21 2023-09-14 Device, method and user interface for gesture-based interaction

Publications (1)

Publication Number Publication Date
CN120469577A true CN120469577A (en) 2025-08-12

Family

ID=88315360

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202380066404.8A Pending CN119923613A (en) 2022-09-21 2023-09-14 Device, method and user interface for gesture-based interaction
CN202510565348.3A Pending CN120469577A (en) 2022-09-21 2023-09-14 Device, method, and user interface for gesture-based interactions

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202380066404.8A Pending CN119923613A (en) 2022-09-21 2023-09-14 Device, method and user interface for gesture-based interaction

Country Status (3)

Country Link
EP (1) EP4591138A1 (en)
CN (2) CN119923613A (en)
WO (1) WO2024064016A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936067B1 (en) * 2017-02-13 2021-03-02 Snap, Inc. Generating a response that depicts haptic characteristics
KR20230054733A (en) * 2020-09-25 2023-04-25 애플 인크. Methods for interacting with virtual controls and/or affordance for moving virtual objects in virtual environments

Also Published As

Publication number Publication date
WO2024064016A1 (en) 2024-03-28
CN119923613A (en) 2025-05-02
EP4591138A1 (en) 2025-07-30


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination