WO2006011295A1 - Communication device - Google Patents
Communication device
- Publication number
- WO2006011295A1 (PCT/JP2005/010024)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- agent
- communication
- address book
- character
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
Definitions
- the present invention relates to a communication device that transmits a message using a CG character agent that imitates a specific individual, and more particularly, to a communication device having a videophone function.
- conventionally, information terminals have transmitted various messages to users: for example, messages that report the state of the device, such as “The battery is about to run out”, and messages that convey an event, such as “E-mail from Mr. OO arrived”.
- the content of the sent e-mail itself can be regarded as a message from the mail sender.
- the simplest interface is to display text information on the screen.
- however, a display of characters alone has problems: (1) the number of characters that can be displayed simultaneously is limited, and (2) information that is difficult to express in characters (for example, emotions) cannot be conveyed.
- an easy-to-understand and friendly interface has been proposed by combining various interfaces such as displaying an image of a message sender and reading out a message by voice.
- One such multimodal interface is an interface in which a CG character reads out messages in a human voice.
- Such an interface using CG characters is called a CG character agent.
- This CG character agent can be broadly divided into two types: one using a general CG character and one using a CG character based on personal information.
- the former is an agent that can be created without depending on the personal information of the user, and refers to, for example, an anime character or a character in the form of an animal.
- the latter is generated based on the user's personal information, and refers to, for example, a character pasted with a user's face photo or a character using a portrait that reflects the user's characteristics.
- Patent Document 1 Japanese Unexamined Patent Publication No. 2003-141564
- Patent Document 2 Japanese Patent Laid-Open No. 06-162167
- the present invention aims to solve the above-described problems. Its primary object is to provide a communication device that, by using information transmitted during communication such as a videophone call, can automatically generate and update a personal CG character agent for each communication partner without requiring user effort.
- a second object is to provide a corresponding CG character agent creation apparatus.
- to achieve these objects, the communication device of the present invention is a communication device for conveying a message using a CG character agent, and comprises: communication means for exchanging communication data with another terminal; agent data creation means for creating agent data including personal feature data of the communication partner based on the communication data; address book data creation means for creating address book data in which the agent data is associated with the personal information of the communication partner; address book data storage means for storing the address book data; and agent creation means for creating the CG character agent of the communication partner by referring to the agent data included in that partner's address book data.
- with this configuration, the communication apparatus automatically creates agent data of a communication partner from communication data received during communication with another terminal and stores it in address book data, and the agent creation means can create a CG character agent by referring to the agent data of the communication partner registered in the address book.
- preferably, the agent data creation means includes an image feature extraction unit that automatically extracts image feature data from the communication data, and a voice feature extraction unit that automatically extracts voice feature data, and the agent data includes at least the image feature data and the voice feature data.
- further, the agent data creation means of the communication device includes: a reliability assigning unit that assigns a reliability to the created agent data; a temporary storage data storage unit that temporarily stores the agent data to which the reliability has been assigned; and a reliability determination unit that compares the reliability of agent data newly created during communication with the reliability of the agent data already stored in the temporary storage data storage unit, and automatically updates the stored agent data to whichever has the higher reliability. The agent creation means then creates the CG character agent using the agent data stored in the temporary storage data storage unit.
- because the communication device assigns a reliability to the image feature data and voice feature data, which are personal feature data, and automatically updates the agent data stored in the temporary storage data storage unit to the more reliable data, messages can always be conveyed using a CG character agent with the latest reliable image and voice of the communication partner.
- the present invention can also be realized as a CG character agent creation apparatus having the characteristic means of the communication apparatus, as a communication method having those means as steps, or as a program for causing a computer to execute each of those steps. It goes without saying that such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
- the communication apparatus can automatically generate and update a CG character agent that reflects the characteristics of a communication partner without requiring user effort. Therefore, a familiar, easy-to-understand interface using CG character agents can be realized.
- furthermore, the communication device stores the image feature data and voice feature data of each communication partner. This makes agent data easy to manage, and the agent data can be used for various applications other than videophone and mail.
- FIG. 1 is an external view of a mobile phone terminal for explaining an information terminal according to the present invention.
- FIG. 2 shows a configuration of a communication device according to an embodiment of the present invention.
- FIG. 3 is a block diagram showing a configuration of an agent data creation unit in an embodiment of the present invention.
- FIG. 4 is a conceptual diagram for explaining address book data in an embodiment of the present invention.
- FIG. 5 is a conceptual diagram for explaining feature extraction of an image in an embodiment of the present invention.
- FIG. 6 is a conceptual diagram for explaining a method for generating an animation image using feature points of an image in an embodiment of the present invention.
- FIG. 7 is a flowchart for explaining the agent data generation processing in one embodiment of the present invention.
- FIG. 8 is a flowchart for explaining processing of an application using a personal CG agent in one embodiment of the present invention.
- FIG. 9 is a conceptual diagram for explaining animation control in an agent output unit in one embodiment of the present invention.
- FIG. 1 is an external view of the mobile phone terminal 10.
- the mobile phone terminal 10 is a mobile phone having a videophone function, and includes a key 101, a speaker 102, a display 103, a camera 104, and a microphone 105.
- the key 101 is composed of a number key for making a call and a plurality of keys for a camera function and a mail function.
- the speaker 102 outputs sound when receiving a call, or outputs a ring tone for a telephone or e-mail.
- the display 103 displays images and characters, and specifically may be a liquid crystal display or an organic EL display.
- the camera 104 acquires a still image or a moving image, and acquires a user image when using a videophone. As an example of the camera 104, a CCD camera or a CMOS camera may be used.
- the microphone 105 is for inputting voice, and acquires the voice of the user when using the telephone.
- in the present embodiment, the communication device is described as the mobile phone terminal 10, but it may be any communication device having a videophone function, such as a stationary telephone, a PDA (Personal Digital Assistant), or a personal computer.
- FIG. 2 is a functional block diagram of the communication device 20 in the mobile phone terminal 10.
- the communication device 20 includes a communication processing unit 210, a videophone processing unit 220, an agent data creation unit 230, an address book data management unit 240, an address book data storage unit 250, an agent setting unit 260, an agent output unit 270, an output unit. 280, an input unit 290, and an application processing unit 300.
- the communication device may be provided with the CG character agent creation device constituting each functional block shown in FIG.
- the input unit 290 includes an image input unit 291, an audio input unit 292, and a key input unit 293.
- the image input unit 291 captures image data from the camera 104 and acquires it as bitmap data.
- examples of bitmap data formats include the RGB format and the YUV format.
- the voice input unit 292 acquires voice data from the microphone 105.
- An example of audio data may be PCM (Pulse Code Modulation) data.
- the key input unit 293 acquires the pressed state when the key 101 is pressed.
- the output unit 280 includes an image output unit 281 and an audio output unit 282.
- the image output unit 281 receives bitmap data or a memory address where the bitmap data is stored, and displays it on the display 103.
- the audio output unit 282 receives audio data and outputs sound through the speaker 102.
- the communication processing unit 210 shown in FIG. 2 performs transmission and reception processing for the videophone.
- the communication processing unit 210 passes received packet data, received from the other party's information terminal, to the videophone processing unit 220. It also receives transmission packet data generated by the videophone processing unit 220 and transmits it to the other party's information terminal.
- as an example of the videophone packet data, the MPEG-4 (Moving Picture Experts Group phase 4) data format may be used.
- the videophone packet data may have any data format as long as voice and moving images can be transmitted.
- the present invention can be applied to any data format.
- the videophone processing unit 220 performs videophone transmission processing and reception processing.
- in the reception processing, image data (bitmap data) as image information and audio data as audio information are generated from the received packet data received from the communication processing unit 210.
- the generated bitmap data is passed to the image output unit 281 and the audio data to the audio output unit 282, whereby the transmitted image is displayed on the display 103 and the transmitted audio is output from the speaker 102.
- in the transmission processing, transmission packet data is created from the image data acquired from the image input unit 291 and the audio data acquired from the audio input unit 292, and passed to the communication processing unit 210.
- in addition, the videophone processing unit 220 notifies the address book data management unit 240 of the telephone number of the other party (the videophone partner) and acquires the data update setting of the agent data in the corresponding address book data. When the update setting is ON, the generated image data and audio data are passed to the agent data creation unit 230 during the reception processing.
- as the timing for passing the data, the timing at which the reception processing is executed may be used. However, to reduce the processing load, the data may instead be passed at regular intervals, less frequently than the reception processing, as sketched below.
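- as an illustration only (the patent does not specify an algorithm for this), the interval-based forwarding could look like the following minimal sketch; the class and the interval value are assumptions:

```python
# Minimal sketch, assuming a fixed forwarding ratio: pass only every Nth
# decoded frame to the agent data creation step, so that feature extraction
# runs less often than the per-frame reception processing.

class FrameThrottle:
    def __init__(self, interval: int):
        self.interval = interval  # pass 1 frame out of every `interval`
        self.count = 0

    def should_pass(self) -> bool:
        self.count += 1
        if self.count >= self.interval:
            self.count = 0
            return True
        return False

throttle = FrameThrottle(interval=10)
for frame_no in range(30):
    if throttle.should_pass():
        print(f"frame {frame_no}: passed to agent data creation")
```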
- the agent data creation unit 230 creates agent data for the image information and voice information provided from the videophone processing unit 220 and stores them in the address book data storage unit 250.
- the agent data creation unit 230 includes an image feature extraction unit 231, an audio feature extraction unit 232, a reliability determination unit 233, and a temporarily saved data storage unit 234.
- the image feature extraction unit 231 generates image feature data from the image data received from the videophone processing unit 220.
- the image feature data consists of the position coordinates of feature points indicating facial parts, together with the image data (still image) from which those feature points were extracted.
- FIG. 5 is a conceptual diagram for explaining the facial feature points extracted by the image feature extraction unit 231.
- the face image shown in the upper part of FIG. 5 is the image data delivered from the videophone processing unit 220, with the face of the other party displayed on a 240 × 320 bitmap.
- P1: upper lip, upper end point
- P2: upper lip, lower end point
- P3: lower lip, upper end point
- P4: lower lip, lower end point
- P5: lips, left end point
- P6: lips, right end point
- the result of recognizing these six points is shown in the lower part of FIG. 5. The numerical values in the boxes next to the feature point names indicate the position coordinates of the feature points, using a coordinate system whose origin is the upper left corner of the bitmap and whose lower right corner is (240, 320).
- the voice feature extraction unit 232 generates voice feature data from the voice data received from the videophone processing unit 220.
- as the voice feature data, voice control parameters for controlling pitch, accent, and the like at the time of speech synthesis may be used. The voice feature data may also be voice data obtained by extracting specific phonemes or words for use in speech synthesis, or data such as a difference from the voice dictionary data stored in the basic data storage unit 273.
- the address book data management unit 240 manages the address book data recorded in the address book data storage unit 250.
- This address book data includes personal information of multiple persons.
- the address book data management unit 240 has a function of searching using one element included in the address book data, such as a telephone number, as a key, and acquiring the memory address of the matching address book data as the search result. For example, by passing the numerical value “13” as a data ID or “0901234 5678” as a telephone number, the memory address for accessing address book data such as that shown in FIG. 4 is obtained; a simplified sketch of this lookup follows.
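- as an illustration only, the two-key lookup described above can be modeled as follows; the field names are assumptions, and the sketch returns the record itself rather than a memory address:

```python
# Minimal sketch, assuming address book records indexed both by data ID
# and by telephone number, as described for the address book data
# management unit 240.

address_book: dict = {}   # data_id -> record
phone_index: dict = {}    # phone number -> data_id

def register(record: dict) -> None:
    address_book[record["data_id"]] = record
    phone_index[record["phone"]] = record["data_id"]

def lookup(key):
    """Search by data ID (int) or telephone number (str)."""
    if isinstance(key, int):
        return address_book.get(key)
    data_id = phone_index.get(key)
    return address_book.get(data_id) if data_id is not None else None

register({"data_id": 13, "name": "Example", "phone": "09012345678"})
print(lookup(13))
print(lookup("09012345678"))
```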
- An example of the address book data 400 shown in FIG. 4 is address book data for one person.
- in the address book data, a data ID for uniquely identifying the data and personal information such as a name, telephone number, e-mail address, group number, and icon are recorded. These data are used for various purposes, such as displaying an icon when receiving a phone call or sorting received mail into a directory by group.
- the present invention is characterized in that the address book data 400 includes agent data 401.
- agent data 401 includes image feature data, voice feature data, data update settings, and an agent type.
- the image feature data consists of image data itself, or a certain type of feature data obtained by extracting features from the image data.
- in the present embodiment, the image feature data consists of the per-state position coordinates of the feature points generated by the image feature extraction unit 231, a per-state reliability indicating the certainty of the feature point information, and a file name indicating the bitmap of the image data corresponding to the feature points.
- the position coordinates of the feature points of the image data are data indicating the positions of the human eyes and mouth and the connected parts in the image data, as shown in the lower diagram of FIG. 5 and the conceptual diagram of FIG.
- it can be defined as position coordinates indicating points on the contours such as eyes, nose and mouth.
- the per-state feature points are the position coordinates of the feature points in states where phonemes such as “A”, “I”, “U”, “E”, and “O” are uttered, and in emotional states such as “angry”, “laughing”, and “sad”.
- by having feature points for each state, it is possible to generate a personal CG agent with various facial expressions.
- the bitmap of the image data corresponding to the feature point is image data that becomes a base when the image is divided and displayed as shown in the upper diagram of FIG. 6, and may be image data in a specific state.
- a single image may be used, or image data may be provided for each of a plurality of states; for example, expressive power can be improved by providing a plurality of image data corresponding to the feature points for each emotion. A sketch of this per-state structure follows.
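- as an illustration only (the patent does not give a concrete data format), the agent data 401 described above can be modeled as follows; all class and field names are assumptions:

```python
# Minimal sketch of the agent data 401: per-state image features with a
# reliability and a base bitmap, per-state voice features, the data update
# setting, and the agent type.

from dataclasses import dataclass, field

@dataclass
class StateFeature:
    feature_points: dict   # e.g. {"P1": (x, y), ..., "P6": (x, y)}
    reliability: float     # 0 (low) to 100 (high)
    bitmap_file: str       # bitmap of the image data for this state

@dataclass
class AgentData:
    # one entry per state: phonemes ("A", "I", ...) and emotions ("angry", ...)
    image_features: dict = field(default_factory=dict)
    voice_features: dict = field(default_factory=dict)
    data_update_setting: bool = True   # ON: update during videophone calls
    agent_type: str = "photo"          # e.g. "photo" or "portrait"

agent = AgentData()
agent.image_features["A"] = StateFeature(
    feature_points={"P1": (118, 176)}, reliability=80.0, bitmap_file="a.bmp")
print(agent.agent_type, agent.image_features["A"].reliability)
```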
- the reliability is a numerical value representing the certainty of image recognition or voice recognition when extracting feature data. For example, if the other party is not clearly shown in the image data of a videophone, there is a high possibility that image feature data cannot be extracted accurately, and even if a CG character agent is generated from such image feature data, a clear image cannot be produced. Therefore, in the present invention, by attaching a reliability index to the image feature data and voice feature data and selecting and recording the feature data with high reliability, a realistic CG character agent reflecting the features of the other party can be created.
- the reliability needs to be generated as a numerical value such that high recognition confidence corresponds to 100 and low confidence corresponds to 0.
- as a method of generating the reliability of the image feature data, the value of an evaluation function obtained when recognizing (detecting) a face (or a facial part) from the image data can be used.
- as an example of the evaluation function in face detection, the evaluation function g for face/non-face discrimination disclosed in Japanese Patent Laid-Open No. 2003-44853 may be used.
- with this evaluation function g, face/non-face discrimination can be performed according to the sign of the function value.
- the reliability of the voice feature data can be generated from the value of the recognition evaluation function used during speech recognition. For example, the value of the cost function disclosed in Japanese Patent Laid-Open No. 2004-198656 may be used.
- the reliability of face recognition can also be used to determine the reliability of speech recognition. In this case, when determining the reliability of the audio feature data, the reliability of the corresponding image feature data is used.
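- as an illustration only, the following sketch maps an evaluation-function value to the 0–100 reliability scale and blends the image reliability into the audio reliability; the logistic squashing and the equal-weight blend are assumptions, not taken from the cited patents:

```python
# Hedged sketch: squash an unbounded evaluation-function value into the
# 0..100 reliability range, and optionally let the face reliability
# influence the reliability assigned to the voice feature data.

import math

def image_reliability(eval_score: float) -> float:
    """Map an evaluation-function value to 0 (low) .. 100 (high)."""
    return 100.0 / (1.0 + math.exp(-eval_score))

def audio_reliability(speech_score: float, image_rel: float) -> float:
    """Blend the speech-recognition score with the face reliability."""
    base = 100.0 / (1.0 + math.exp(-speech_score))
    return 0.5 * base + 0.5 * image_rel

print(image_reliability(2.0))        # clearly a face -> high reliability
print(audio_reliability(1.0, 80.0))  # voice score boosted by a clear face
```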
- in the present embodiment, the reliability of the image feature data is created in the image feature extraction unit 231, and the reliability of the voice feature data in the voice feature extraction unit 232. Alternatively, the agent data creation unit 230 may be provided with a separate reliability assigning unit.
- the voice feature data likewise consists of the per-state voice feature data generated by the voice feature extraction unit 232 and a per-state reliability.
- as with the image feature data, the voice feature data may be voice control parameters for states in which phonemes such as “A”, “I”, “U”, “E”, and “O” are uttered and for emotional states such as “angry”, “laughing”, and “sad”.
- the reliability indicates the certainty of recognition and classification, similar to the reliability of the image feature data.
- the data update setting indicates whether agent data is to be generated. When ON, agent data is generated and updated during a videophone call; when OFF, agent data is not updated during a call. This setting can be specified by the user using an address book editing application.
- the agent type sets what type of agent is used when generating a personal CG agent.
- the present invention provides both image feature data and voice feature data as agent data, so these can be used to generate various types of agents. For example, on the image side, it is possible to generate a realistic agent using the person's face photograph, or to generate a portrait from the image feature data. On the voice side, it is possible to generate a voice that resembles the person, or robotic speech with the person's accent, from the voice feature data.
- the agent setting unit 260 receives the agent settings from the application processing unit 300 when the personal CG agent message transmission function is used. The details are described below.
- the agent setting unit 260 includes a personal identifier setting unit 261, a state setting unit 262, and a message setting unit 263.
- the personal identifier setting unit 261 sets a personal identifier that is identification data for specifying a personal CG agent to be used.
- the personal identifier is a key for searching the address book data by the address book data management unit 240. For example, a data ID included in the address book data or a telephone number may be used.
- the state setting unit 262 designates a state when a message is transmitted.
- An example of the state may be emotions such as “angry” and “laughing”.
- the state setting unit 262 sets the animation state and the like. For example, it includes settings for whether the same message is repeatedly transmitted, settings for message transmission speed, and the like.
- the message setting unit 263 sets the character string of the message to be conveyed. For example, the character string “Denwa Dayo” (“You have a phone call”) is set as the message string. A sketch of the combined settings follows.
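- as an illustration only, the three settings handled by the agent setting unit 260 (personal identifier, state, message) can be grouped as follows; the class and field names are assumptions:

```python
# Minimal sketch of the agent settings passed from the application
# processing unit 300 to the agent setting unit 260.

from dataclasses import dataclass

@dataclass
class AgentSettings:
    personal_id: str   # telephone number or address book data ID
    state: str         # e.g. "angry", "laughing"
    repeat: bool       # repeat the same message or speak it once
    message: str       # the character string to be read out

settings = AgentSettings(
    personal_id="09012345678",
    state="laughing",
    repeat=False,
    message="Denwa Dayo",  # "You have a phone call"
)
print(settings)
```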
- the application processing unit 300 represents an arbitrary application that uses a CG character agent, and performs the agent setting on the agent setting unit 260.
- a specific personal identifier may be acquired from various information such as a telephone number and a name by using the data search function of the address book data management unit 240.
- the agent output unit 270 acquires the personal CG agent data stored in the address book data storage unit 250 via the address book data management unit 240, and performs image display / sound output of the personal CG agent based on that data. Details are described below.
- the agent output unit 270 includes a CG character drawing unit 271, a voice synthesis unit 272, and a basic data storage unit 273.
- the CG character drawing unit 271 draws a CG character as a bitmap in memory from the data stored in the basic data storage unit 273 and the image feature data stored in the address book data storage unit 250, and passes the bitmap data (or the memory address where it is stored) to the image output unit 281.
- an example of a method of drawing a CG character from the image feature data will be described with reference to FIG. 6.
- the upper figure in FIG. 6 shows the facial feature points together with the facial image data from which they were extracted, and the lower figure in FIG. 6 shows images generated using the coordinates of those feature points.
- various animation images such as opening and closing eyes, opening and closing mouth, angry state, and laughing state can be generated by animation that moves feature points.
- two-dimensional animation technology has been described here, but variations such as pasting the facial image data onto the face of a 3DCG character and animating it three-dimensionally can also be applied.
- the voice synthesis unit 272 generates speech data reflecting the personal features from the voice dictionary data stored in the basic data storage unit 273 and the voice feature data stored in the address book data storage unit 250, and passes it to the audio output unit 282.
- the basic data storage unit 273 stores basic data necessary for image generation and speech synthesis of the personal CG agent. Examples of basic data include bitmap data and shape data for displaying character images, and speech dictionary data required for speech synthesis.
- the agent data stored in the address book data storage unit 250 is data for reflecting the characteristics of the individual; when used together with the basic data in the basic data storage unit 273, the personal CG agent can be output.
- if the agent data in the address book data storage unit 250 includes all of the personal CG agent data, the basic data storage unit 273 can be eliminated.
- FIG. 7 is a flowchart showing agent data generation processing performed during a videophone call.
- in the agent data generation process, during a videophone call, image feature data and voice feature data are generated from the image data and voice data of the other party, and stored in the address book data in which the personal information of that party is recorded. This is described in detail below.
- the videophone call is started when a call arrives from the other terminal or when this terminal calls the other terminal (step S101).
- next, the videophone processing unit 220 notifies the address book data management unit 240 of the telephone number of the other party and requests the data update setting value of the agent data.
- the address book data management unit 240 searches the address book data stored in the address book data storage unit 250 using the passed telephone number, acquires the “data update setting” value of the address book data (see FIG. 4), and returns it to the videophone processing unit 220 (step S102).
- when the data update setting is OFF (NO in step S103), the videophone processing unit 220 ends the agent data generation process, and only the videophone call processing is executed.
- when the data update setting is ON (YES in step S103), the videophone processing unit 220 passes image data and audio data to the agent data creation unit 230 in parallel with the videophone call processing (step S104).
- the agent data creation unit 230 generates agent data from the image data and the sound data.
- the image feature extraction unit 231 generates image feature data from the image data using image recognition technology, and the voice feature extraction unit 232 generates voice feature data from the voice data using speech recognition technology (step S105). The details are described below.
- the image data is the image data transmitted from the other party of the videophone call, mainly showing the sender's face.
- the image data is subjected to facial feature extraction by the image feature extraction unit 231.
- as a facial image feature extraction method, a pattern matching technique may be used in which the color values of the image data are compared with patterns prepared in advance.
- examples of items to recognize are whether the entire face is shown and the positions of the eyes, nose, and mouth; color information may also be recognized.
- An example of the image feature data acquired here may be position coordinates indicating a specific position such as position coordinates of both ends of the lips as shown in FIG.
- any data such as color information and face contour information can be applied as long as it can be acquired from the image data.
- from the voice data, voice recognition technology identifies the type of “sound” uttered (for example, “O”, “Ha”, “Yo”, “U”) and the emotional state of the speaker, and generates voice control parameters.
- these control parameters hold information such as specific vowels and consonants and voice data of frequently used words as they are, together with information such as voice pitch, loudness, and speed that changes according to the emotional state.
- the types of sound and the types of emotion are defined in advance, a typical speech feature amount is defined for each defined type, and an utterance is classified into one of the types by comparison with those feature amounts, as sketched below.
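- as an illustration only, the nearest-template classification described above can be sketched as follows; the feature dimensions and template values are invented for illustration:

```python
# Minimal sketch: pre-define a typical feature vector per sound/emotion
# type and assign an utterance to the type with the nearest template.

TEMPLATES = {
    "O":        [0.9, 0.2, 0.1],
    "Ha":       [0.3, 0.8, 0.2],
    "angry":    [0.7, 0.6, 0.9],
    "laughing": [0.2, 0.4, 0.8],
}

def classify(features: list) -> str:
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TEMPLATES, key=lambda t: dist(TEMPLATES[t], features))

print(classify([0.85, 0.25, 0.15]))  # -> "O"
```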
- the generated image feature data and voice feature data are passed to the reliability determination unit 233, which compares their reliabilities, state by state, with the reliabilities of the data stored in the temporary storage data storage unit 234 (step S106), and the feature data with the higher reliability is stored in the temporary storage data storage unit 234 (step S107).
- the reliability of the image feature data and the reliability of the voice feature data may be compared and judged separately. However, if the person appears in the image but the transmitted sound is not the person's voice, or if the voice is correct but someone else is shown, the data is unlikely to reflect the other party's characteristics; therefore, the reliability of the audio data may be reflected in the determination of the image data, and the reliability of the image data may be reflected in the determination of the audio data.
- the method of storing the data with the higher reliability has been described here, but the newly created image feature data and voice feature data may instead be merged with the feature data stored in the temporary storage data storage unit 234 using the reliabilities. One example of a merge method is to linearly interpolate the two values using the reliabilities as weights, as in the sketch below.
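- as an illustration only, a reliability-weighted linear interpolation of feature point coordinates could look like this; the exact weighting formula and the choice of merged reliability are assumptions:

```python
# Hedged sketch: merge stored and newly created feature points by linear
# interpolation, weighted by their reliabilities, instead of keeping only
# the higher-reliability data.

def merge_points(stored: dict, new: dict, rel_stored: float, rel_new: float):
    """Reliability-weighted interpolation of feature point coordinates."""
    w = rel_new / (rel_stored + rel_new)  # weight of the new data
    merged = {
        name: ((1 - w) * x0 + w * x1, (1 - w) * y0 + w * y1)
        for (name, (x0, y0)), (_, (x1, y1)) in zip(stored.items(), new.items())
    }
    # assumption: the merged data keeps the higher of the two reliabilities
    return merged, max(rel_stored, rel_new)

stored = {"P1": (100.0, 150.0)}
new = {"P1": (110.0, 148.0)}
print(merge_points(stored, new, rel_stored=60.0, rel_new=90.0))
```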
- next, the agent data creation unit 230 asks the videophone processing unit 220 whether the videophone call has ended (step S108). If it has not ended (NO in step S108), the process returns to step S104 and the agent data generation processing is repeated.
- when the call has ended (YES in step S108), the agent data creation unit 230 performs the processing from step S109 onward to store the generated agent data in the address book data storage unit 250. Details are described below.
- the video phone processing unit 220 notifies the address book data management unit 240 that the video phone call has ended, and passes the phone number of the other party.
- Address book data management unit 240 receives agent data stored in temporary storage data storage unit 234 from agent data creation unit 230.
- the address book data of the other party is searched from the address book data storage unit 250 based on the telephone number that has been passed, and the agent data is obtained.
- the reliability determination unit 233 compares the reliabilities for each state (step S109); when the new data's reliability is higher (YES in step S110), the agent data of the address book data is updated (step S111). On the other hand, if the reliability is lower (NO in step S110), the process ends without updating the agent data of the address book data.
- the method for storing highly reliable data has been described. However, the two data may be merged using the reliability.
- the present invention can be applied to a video conference system in which a plurality of persons participate, and, needless to say, to any system that communicates various images and sounds.
- this section describes the flow of the application processing in which the message “Denwa Dayo” is conveyed to the user of the communication device.
- the application processing unit 300 refers to a processing unit of a telephone application that processes a call when an incoming call is received.
- the application processing unit 300 sets the personal CG agent to be used on the agent setting unit 260 (step S201).
- this setting consists of the personal identifier setting, the state setting, and the message setting, which are described in detail below.
- the application processing unit 300 sets an identifier for identifying an individual in the personal identifier setting unit 261.
- the personal identifier is an identifier for retrieving data in the address book data management unit 240, and may be a telephone number or address book data ID, for example.
- for example, the application processing unit 300 may pass the telephone number of the incoming call to the address book data management unit 240, search the address book data in the address book data storage unit 250, and obtain the corresponding data ID.
- a message character string to be transmitted is set in the message setting unit 263.
- a fixed character string such as “Denwa Dayo” may be used.
- a personal message may be set in the address book and the character string may be read out.
- the name of the other party may be obtained from the address book data management unit 240, and the message may include a character string unique to the individual, such as “From Mr. OO”.
- here the message is only a character string, but parameters used in speech synthesis, such as accent, volume, and interval, may be added to the message.
- a state parameter for controlling the personal CG agent is set in the state setting unit 262.
- the state parameters include a parameter indicating the state of the agent and settings related to animation, such as repeated actions.
- the status parameter may be an emotion parameter for changing the way of reading depending on the emotion such as “angry” or “laughing” when reading the character string “Denwa Dayo”.
- the repeated-action settings control whether the message “Denwa Dayo” is read out just once or repeatedly (“Denwa Dayo, Denwa Dayo, Denwa Dayo, ...”); when repeating, a change in the reading at each repetition (for example, gradually changing it) can also be specified. From these settings, the type of CG agent action and the type of animation are determined.
- the agent setting unit 260 passes the setting value to the agent output unit 270.
- the agent output unit 270 first passes the personal identifier to the address book data management unit 240 and receives the corresponding agent data (step S202). The data may be transferred by copying the data itself into memory, or by passing the address of the memory where the data is stored. The image feature data included in the agent data is then passed to the CG character drawing unit 271, and the voice feature data to the voice synthesis unit 272.
- the CG character drawing unit 271 draws a CG character reflecting the personal features based on the personal feature data (step S203).
- a CG character drawing method will be described.
- depending on the agent type, either the method of directly using the face photograph or the method of displaying a portrait is selected.
- in the portrait case, bitmaps of the facial parts stored in the basic data storage unit 273 (“eyes”, “nose”, “mouth”, and so on) are selected based on the bitmap of the face photograph and the coordinates of the facial feature points, scaled according to the position coordinates, and combined to generate various facial bitmaps.
- the generated portrait bitmap can then be handled in the same way as the face photograph bitmap.
- next, an animation image of the mouth opening and closing and the eyes opening and closing is generated.
- in this animation technology, as shown in FIG. 6, a mesh is defined based on the facial feature points, the bitmap data of the face (the face photograph or caricature data) is texture-mapped onto the mesh, and the mesh is deformed.
- by using these technologies, animated images with lip sync and blinking can be drawn as bitmap data in accordance with the sound.
- only the method of animating the facial image data has been described here, but a full-body model of the character may also be created using 3DCG, with the above facial image pasted onto the facial mesh, to display a full-body character.
- the voice synthesis unit 272 generates speech corresponding to the given character string (step S204).
- the phoneme data of each character to be uttered is acquired from the default phoneme database stored in the basic data storage unit 273, modified using the voice control data included in the personal voice feature data, and the per-character phoneme data are concatenated to generate the voice data of the corresponding character string, as in the sketch below.
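- as an illustration only, the fetch-modify-concatenate flow described above could look like this; the `pitch_scale` parameter and the data shapes are assumptions, not the patent's actual format:

```python
# Hedged sketch: fetch default phoneme data per character unit, adjust it
# with the person's voice control parameters, then concatenate the units.

DEFAULT_PHONEMES = {          # stand-in for the basic data storage unit 273
    "De": [0.1, 0.2], "n": [0.0, 0.1], "wa": [0.2, 0.3],
    "Da": [0.3, 0.2], "yo": [0.1, 0.4],
}

def synthesize(text_units: list, voice_params: dict) -> list:
    samples = []
    scale = voice_params.get("pitch_scale", 1.0)
    for unit in text_units:
        phoneme = DEFAULT_PHONEMES[unit]
        # apply the personal voice control parameter, then concatenate
        samples.extend(s * scale for s in phoneme)
    return samples

print(synthesize(["De", "n", "wa", "Da", "yo"], {"pitch_scale": 1.2}))
```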
- the bitmap data generated by the agent output unit 270 is sent to the image output unit 281 (step S203), and the voice data is sent to the audio output unit 282 (step S204).
- further, the agent output unit 270 controls the animation according to the settings of the agent setting unit 260. For example, if a message with the string “Ohayou” (“Good morning”) is conveyed as voice over 4 seconds, a 4-second animation is drawn for the screen display. As an example of how to create the animation images, as shown in FIG. 9, the mouth shape is controlled once per second through the shapes for “O”, “Ha”, “Yo”, and “U”, and multiple frames per second are drawn so that the mouth shape changes continuously, as sketched below.
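- as an illustration only, one keyframe per second can be interpolated into intermediate frames as follows; the coordinate values and frame rate are invented for illustration:

```python
# Hedged sketch: spread one mouth-shape keyframe per second over the
# message duration and linearly interpolate a feature-point value between
# keyframes for each intermediate frame.

KEYFRAMES = {   # second -> (mouth shape, mouth-corner y-coordinate)
    0: ("O", 160.0), 1: ("Ha", 175.0), 2: ("Yo", 165.0), 3: ("U", 158.0),
}
FPS = 10  # frames drawn per second of speech

def frame_value(t: float) -> float:
    """Interpolate the keyframe values around time t (in seconds)."""
    k0 = min(int(t), max(KEYFRAMES))
    k1 = min(k0 + 1, max(KEYFRAMES))
    _, v0 = KEYFRAMES[k0]
    _, v1 = KEYFRAMES[k1]
    alpha = t - k0
    return (1 - alpha) * v0 + alpha * v1

for i in range(4 * FPS):
    _ = frame_value(i / FPS)  # drives the mesh deformation for this frame
print(frame_value(1.5))       # halfway between "Ha" and "Yo" -> 170.0
```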
- when the message transmission is complete, the personal CG agent output processing ends (step S205).
- otherwise, the position coordinates used in drawing and the control parameters for voice processing are changed according to the elapsed time (step S206), and the personal CG agent output processing (steps S203 to S204) is repeated.
- as described above, the communication device 20 of the present invention can automatically generate a CG character agent reflecting the characteristics of the other party using the image information and audio information sent during a videophone call. Therefore, a communication device that conveys messages using a CG character agent while reducing the user's effort can be realized.
- since the agent data created in the agent data creation unit 230 is stored in correspondence with the address book data of the other party, all kinds of messages from persons registered in the address book can be conveyed by their personal CG character agents.
- further, because the agent data 401 including the image feature data and voice feature data is attached to the address book data 400 and managed per communication partner, agent data management is facilitated, and the address book data 400 can be used in various applications other than message transmission applications such as videophone and mail.
- the application processing using the present invention has been explained for an incoming call message, but it goes without saying that the present invention is equally applicable to various applications that handle personal information, such as an application in which the personal CG agent of an e-mail sender reads out the e-mail arrival notice and the contents of the e-mail.
- the present invention can be applied to entertainment applications such as games.
- CG characters reflecting the characteristics of people the user knows can appear in a virtual world and convey information (messages), allowing users to create their own characters and stories and providing new forms of entertainment.
- the communication apparatus can realize a familiar and easily understandable interface using a CG character agent, and is useful for information terminals having a videophone function, such as a mobile phone terminal, a PDA, and a PC.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
- Telephone Function (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-215234 | 2004-07-23 | ||
JP2004215234A JP2007279776A (ja) | 2004-07-23 | 2004-07-23 | Cgキャラクタエージェント装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006011295A1 (fr) | 2006-02-02 |
Family
ID=35786049
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/010024 WO2006011295A1 (fr) | 2004-07-23 | 2005-06-01 | Dispositif de communication |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2007279776A (fr) |
WO (1) | WO2006011295A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007235581A (ja) * | 2006-03-01 | 2007-09-13 | Funai Electric Co Ltd | テレビジョン受信装置 |
JP2015516625A (ja) * | 2012-03-14 | 2015-06-11 | グーグル・インク | ビデオ会議中の参加者の風貌修正 |
CN106791165A (zh) * | 2017-01-10 | 2017-05-31 | 努比亚技术有限公司 | 一种通讯录分组终端及方法 |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4844614B2 (ja) | 2008-10-07 | 2011-12-28 | ソニー株式会社 | 情報処理装置、情報処理方法およびコンピュータプログラム |
JP4686619B2 (ja) | 2009-03-23 | 2011-05-25 | 株式会社東芝 | 顔認証を利用した情報処理方法および情報表示装置 |
JP2010272077A (ja) | 2009-05-25 | 2010-12-02 | Toshiba Corp | 情報再生方法及び情報再生装置 |
TW201236444A (en) | 2010-12-22 | 2012-09-01 | Seyyer Inc | Video transmission and sharing over ultra-low bitrate wireless communication channel |
CN108090940A (zh) * | 2011-05-06 | 2018-05-29 | 西尔股份有限公司 | 基于文本的视频生成 |
JP5787644B2 (ja) | 2011-07-01 | 2015-09-30 | キヤノン株式会社 | 画像処理装置および画像処理装置の制御方法 |
JP5857531B2 (ja) * | 2011-08-24 | 2016-02-10 | カシオ計算機株式会社 | 画像処理装置、画像処理方法及びプログラム |
JP6828530B2 (ja) * | 2017-03-14 | 2021-02-10 | ヤマハ株式会社 | 発音装置及び発音制御方法 |
US11151979B2 (en) * | 2019-08-23 | 2021-10-19 | Tencent America LLC | Duration informed attention network (DURIAN) for audio-visual synthesis |
JP6796762B1 (ja) * | 2019-11-28 | 2020-12-09 | 有限会社クロマニヨン | 仮想人物対話システム、映像生成方法、映像生成プログラム |
WO2024089868A1 (fr) * | 2022-10-28 | 2024-05-02 | 日本電気株式会社 | Système d'aide à l'entrée, procédé d'aide à l'entrée et support non transitoire lisible par ordinateur |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09138767A (ja) * | 1995-11-14 | 1997-05-27 | Fujitsu Ten Ltd | 感情表現の通信装置 |
JPH1125016A (ja) * | 1997-07-09 | 1999-01-29 | Matsushita Electric Ind Co Ltd | アバターを使用した通信方法およびそのプログラムを記録した記録媒体 |
JPH11306322A (ja) * | 1998-04-23 | 1999-11-05 | Fujitsu Ltd | 画像フィルタリングシステム |
JP2000020683A (ja) * | 1998-06-30 | 2000-01-21 | Victor Co Of Japan Ltd | 通信会議システム |
JP2002016676A (ja) * | 2000-06-28 | 2002-01-18 | Sony Corp | 無線伝送方法および無線伝送装置 |
JP2002077840A (ja) * | 2000-08-30 | 2002-03-15 | Toshiba Corp | 通信端末装置 |
JP2002140697A (ja) * | 2000-10-31 | 2002-05-17 | Toshiba Corp | コミュニケーション画像生成装置およびコミュニケーション装置およびコミュニケーション画像生成方法およびコミュニケーション情報処理方法およびプログラム |
JP2002176632A (ja) * | 2000-12-08 | 2002-06-21 | Mitsubishi Electric Corp | 携帯電話機及び画像伝送方法 |
JP2003016475A (ja) * | 2001-07-04 | 2003-01-17 | Oki Electric Ind Co Ltd | 画像コミュニケーション機能付き情報端末装置および画像配信システム |
JP2003244425A (ja) * | 2001-12-04 | 2003-08-29 | Fuji Photo Film Co Ltd | 伝送画像の修飾パターンの登録方法および装置ならびに再生方法および装置 |
JP2004179997A (ja) * | 2002-11-27 | 2004-06-24 | Sony Corp | 双方向コミュニケーションシステム,映像通信装置,および映像通信装置の映像データ配信方法 |
- 2004-07-23: JP JP2004215234A patent/JP2007279776A/ja active Pending
- 2005-06-01: WO PCT/JP2005/010024 patent/WO2006011295A1/fr active Application Filing
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09138767A (ja) * | 1995-11-14 | 1997-05-27 | Fujitsu Ten Ltd | 感情表現の通信装置 |
JPH1125016A (ja) * | 1997-07-09 | 1999-01-29 | Matsushita Electric Ind Co Ltd | アバターを使用した通信方法およびそのプログラムを記録した記録媒体 |
JPH11306322A (ja) * | 1998-04-23 | 1999-11-05 | Fujitsu Ltd | 画像フィルタリングシステム |
JP2000020683A (ja) * | 1998-06-30 | 2000-01-21 | Victor Co Of Japan Ltd | 通信会議システム |
JP2002016676A (ja) * | 2000-06-28 | 2002-01-18 | Sony Corp | 無線伝送方法および無線伝送装置 |
JP2002077840A (ja) * | 2000-08-30 | 2002-03-15 | Toshiba Corp | 通信端末装置 |
JP2002140697A (ja) * | 2000-10-31 | 2002-05-17 | Toshiba Corp | コミュニケーション画像生成装置およびコミュニケーション装置およびコミュニケーション画像生成方法およびコミュニケーション情報処理方法およびプログラム |
JP2002176632A (ja) * | 2000-12-08 | 2002-06-21 | Mitsubishi Electric Corp | 携帯電話機及び画像伝送方法 |
JP2003016475A (ja) * | 2001-07-04 | 2003-01-17 | Oki Electric Ind Co Ltd | 画像コミュニケーション機能付き情報端末装置および画像配信システム |
JP2003244425A (ja) * | 2001-12-04 | 2003-08-29 | Fuji Photo Film Co Ltd | 伝送画像の修飾パターンの登録方法および装置ならびに再生方法および装置 |
JP2004179997A (ja) * | 2002-11-27 | 2004-06-24 | Sony Corp | 双方向コミュニケーションシステム,映像通信装置,および映像通信装置の映像データ配信方法 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007235581A (ja) * | 2006-03-01 | 2007-09-13 | Funai Electric Co Ltd | テレビジョン受信装置 |
JP2015516625A (ja) * | 2012-03-14 | 2015-06-11 | グーグル・インク | ビデオ会議中の参加者の風貌修正 |
CN106791165A (zh) * | 2017-01-10 | 2017-05-31 | 努比亚技术有限公司 | 一种通讯录分组终端及方法 |
Also Published As
Publication number | Publication date |
---|---|
JP2007279776A (ja) | 2007-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2127341B1 (fr) | Réseau de communication et dispositifs de conversion texte/parole et texte/animation faciale | |
CN110134484B (zh) | 消息图标的显示方法、装置、终端及存储介质 | |
CN1326400C (zh) | 虚拟电视通话装置 | |
CN100359941C (zh) | 可视电话终端 | |
EP1480425B1 (fr) | Terminal portable et programme por générer un avatar en fonction d'une analyse vocale | |
US10360716B1 (en) | Enhanced avatar animation | |
CN100481851C (zh) | 使用通信设备的化身控制 | |
CN112669417B (zh) | 虚拟形象的生成方法、装置、存储介质及电子设备 | |
US20080158334A1 (en) | Visual Effects For Video Calls | |
WO2006011295A1 (fr) | Dispositif de communication | |
US20100060647A1 (en) | Animating Speech Of An Avatar Representing A Participant In A Mobile Communication | |
JP2005115896A (ja) | 通信装置及び通信方法 | |
CN104935860A (zh) | 视频通话实现方法及装置 | |
BRPI0904540B1 (pt) | método para animar rostos/cabeças/personagens virtuais via processamento de voz | |
CN101690071A (zh) | 在视频会议和其他通信期间控制化身的方法和终端 | |
CN113395597A (zh) | 一种视频通讯处理方法、设备及可读存储介质 | |
WO2008087621A1 (fr) | Appareil et procédé d'animation d'objets virtuels à répondant émotionnel | |
JP2005078427A (ja) | 携帯端末及びコンピュータ・ソフトウエア | |
JP2005064939A (ja) | アニメーション機能を有する携帯電話端末およびその制御方法 | |
JP2006350986A (ja) | 顔写真付きメールを送受信できる携帯電話機 | |
KR20200040625A (ko) | 사용자의 발화를 처리하는 사용자 단말 및 그 제어 방법 | |
CN105227765A (zh) | 通话过程中的互动方法及系统 | |
KR100733772B1 (ko) | 이동통신 가입자를 위한 립싱크 서비스 제공 방법 및 이를위한 시스템 | |
CN1650290A (zh) | 通信网络中传输消息和简单图案的方法和装置 | |
WO2002097732A1 (fr) | Procede de production d'avatars utilisant des donnees d'images et systeme d'agent associe a l'avatar |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
122 | Ep: pct application non-entry in european phase |