WO1997003513A1 - Mode switching system for a voice over data modem - Google Patents
Mode switching system for a voice over data modem
- Publication number
- WO1997003513A1 (PCT/US1996/011313)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- voice
- mode
- digital
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M11/00—Telephonic communication systems specially adapted for combination with other electrical systems
- H04M11/06—Simultaneous speech and data transmission, e.g. telegraphic transmission over the same conductors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/0024—Services and arrangements where telephone services are combined with data services
- H04M7/0027—Collaboration services where a computer is used for data transfer and the telephone is used for telephonic communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/12—Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal
- H04M7/1205—Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal where the types of switching equipement comprises PSTN/ISDN equipment and switching equipment of networks other than PSTN/ISDN, e.g. Internet Protocol networks
- H04M7/129—Details of providing call progress tones or announcements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/12—Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal
- H04M7/1205—Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal where the types of switching equipement comprises PSTN/ISDN equipment and switching equipment of networks other than PSTN/ISDN, e.g. Internet Protocol networks
- H04M7/1295—Details of dual tone multiple frequency signalling
Definitions
- the present invention relates to communications systems and in particular to systems for switching between voice communications and computer assisted digital communications having a voice over data communications ability.
- Background of the Invention: A wide variety of communications alternatives are currently available to telecommunications users. For example, facsimile transmission of printed matter is available through what is commonly referred to as a stand-alone fax machine.
- fax-modem communication systems are currently available for personal computer users which combine the operation of a facsimile machine with the word processor of a computer to transmit documents held on computer disk.
- Modem communication over telephone lines in combination with a personal computer is also known in the art where file transfers can be accomplished from one computer to another. Also, simultaneous voice and modem data transmitted over the same telephone line has been accomplished in several ways.
- Modem technology has recently multiplexed the transmission of various nonstandard data with standard digital data, such as voice over data communications, creating a hybrid datastream of voice and digital data.
- One problem associated with voice over data communications occurs when two users initiate an analog voice connection and subsequently wish to initiate digital data or voice over data communications.
- One method to initiate digital data or voice over data communications is to terminate the analog voice connection and re-connect in a digital data or voice over data format; however, this is inconvenient and requires hanging up and redialing between the users.
- a time-division multiplexing voice and data communication system which switches between a "SYSTEM mode" and a "POTS mode" was proposed in U.S. Patent No. 4,740,963 by Eckley, entitled "VOICE AND DATA COMMUNICATION SYSTEM".
- in SYSTEM mode, a multiplexer means time-division multiplexes a compressed, digitized analog voice signal with a digital data signal to produce a composite digital signal having a data rate substantially equal to the uncompressed, digitized voice signal.
- the POTS mode is the analog voice mode.
- a remote user unit switches to the SYSTEM mode upon receipt of a particular dual-tone multifrequency (DTMF) signal from a remote digital loop carrier unit.
- the Eckley system returns to POTS mode upon detection of a failure of a remote user unit or upon detection of a particular code from a central office terminal.
- the Eckley invention requires a special mode tone detector to generate a control signal to enter SYSTEM mode and a code detection circuit to detect the particular code to return to POTS mode.
- the Eckley system is designed to operate in a particular voice and data time-division multiplexing system.
- Packetized voice over data communication systems utilize several communication parameters not found in fixed time-division multiplexing systems and require negotiation of packet transmission parameters, such as speech compression ratio and speech algorithm selection.
- a mode switching control for packetized voice over data communications which provides a plurality of switching means for transferring between an analog voice connection and digital data communications or voice over data communications without having to hang up on the original analog voice connection.
- the mode switching control should provide means for negotiating digital data and voice over data communications parameters.
- the subject of the present invention includes a mode switching system for establishing a digital data communications or a voice over data communications from an existing analog voice connection.
- Alternate embodiments include means for negotiation of communications parameters for digital data communications and for digital voice over data communications.
- Embodiments are also described for returning to analog voice communications after completing digital data communications or voice over data communications.
- the major functions of the present system are a telephone function, a voice mail function, a fax manager function, a multi-media mail function, a show and tell function, a terminal function and an address book function.
- the telephone function is more sophisticated than a standard telephone in that the present system converts the voice into a digital signal which can be processed with echo cancellation, compressed, stored as digital data for later retrieval and transmitted as digital voice data concurrent with the transfer of digital information data.
- the voice over data (show and tell) component of the present system enables the operator to simultaneously transmit voice and data communication to a remote site.
- This voice over data function dynamically allocates data bandwidth over the telephone line depending on the demands of the voice grade digitized signal.
- a modified supervisory packet which can be used to negotiate digital data communication parameters or voice over data communications parameters.
- the modified supervisory packet negotiates nonstandard data transmission parameters, such as the speech compression algorithm and speech compression ratio, in voice over data communications.
- data transmission parameter negotiation occurs without an interruption in the transmission of data.
- data transmission parameters can be renegotiated and changed in real time throughout the data transmission. This method may also be employed for negotiation of standard communications parameters or protocols.
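The negotiation described above can be pictured with a minimal sketch. It assumes each side lists the speech compression settings it supports and the first mutually supported offer wins. The names `VoiceParams` and `negotiate`, the algorithm label, and the ratio values are hypothetical illustrations; the source does not give the supervisory packet layout or the selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceParams:
    """Hypothetical parameter set carried in a supervisory packet."""
    algorithm: str          # label for a speech compression algorithm (assumed)
    compression_ratio: int  # e.g. 8 means 8:1 (assumed encoding)


def negotiate(offered, supported):
    """Return the first offered parameter set the remote side supports, else None."""
    for params in offered:
        if params in supported:
            return params
    return None


# The local side lists its preferences in order; the remote side answers
# from the set it can handle. Values here are purely illustrative.
local_offer = [VoiceParams("codebook", 16), VoiceParams("codebook", 8)]
remote_supported = {VoiceParams("codebook", 8), VoiceParams("codebook", 4)}
agreed = negotiate(local_offer, remote_supported)
```

Because the exchange is just another packet in the stream, the same function could be called again mid-session to model renegotiation in real time.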
- Figure 1 shows the telecommunications environment within which the present system may operate in several of the possible modes of communication
- Figure 2 is the main menu icon for the software components operating on the personal computer
- FIG. 3 is a block diagram of the hardware components of the present system
- Figure 4 is a detailed function flow diagram of the speech compression algorithm
- Figure 5 is a detailed function flow diagram of the speech decompression algorithm
- Figure 6 is a signal flow diagram of the speech compression algorithm
- Figure 7 is a signal flow diagram of the speech compression algorithm showing details of the code book synthesis
- FIG. 8 is a detailed function flow diagram of the voice/data multiplexing function
- Figure 9 is a flow diagram showing the steps for initiating digital data communications and voice over data communications from an established analog voice connection according to one embodiment of the present invention
- Figure 10 is a flow diagram showing the steps for one example of establishing a digital data link with user controlled mode switches according to one embodiment of the present invention
- Figure 11 is a flow diagram showing the steps for one example of establishing a digital data link using an answer tone according to one embodiment of the present invention
- Figure 12 is a flow diagram showing the steps for one example of establishing a digital data link using a calling tone according to one embodiment of the present invention
- Figure 13 is a flow diagram showing the steps for one example of establishing a digital data link requiring only a single user controlled mode switch according to one embodiment of the present invention
- Figure 14 is a flow diagram showing the steps for one example of establishing a digital data link using ATD/ATA commands according to one embodiment of the present invention
- Figure 15 is a flow diagram showing a sequence of steps to exit a digital data communications mode and a voice over data communications mode according to one embodiment of the present invention.
- Figure 1 shows a typical arrangement for the use of the present system.
- Personal computer 10 is running the software components of the present system while the hardware components 20 include the data communication equipment and telephone headset.
- Hardware components 20 communicate over a standard telephone line 30 to one of a variety of remote sites.
- One of the remote sites may be equipped with the present system including hardware components 20a and software components running on personal computer 10a.
- the local hardware components 20 may be communicating over standard telephone line 30 to facsimile machine 60.
- the present system may be communicating over a standard telephone line 30 to another personal computer 80 through a remote modem 70.
- the present system may be communicating over a standard telephone line 30 to a standard telephone 90.
- the present inventions are embodied in a commercial product by the assignee, MultiTech Systems, Inc.
- the software component operating on a personal computer is sold under the commercial trademark of MultiExpressPCS™ personal communications software while the hardware component of the present system is sold under the commercial name of MultiModemPCS™, Intelligent Personal Communications System Modem.
- the software component runs under Microsoft® Windows®; however, those skilled in the art will readily recognize that the present system is easily adaptable to run under any single or multi-user, single or multi-window operating system.
- the present system is a multifunction communication system which includes hardware and software components.
- the system allows the user to connect to remote locations equipped with a similar system or with modems, facsimile machines or standard telephones over a single analog telephone line.
- the software component of the present system includes a number of modules which are described in more detail below.
- Figure 2 is an example of the Windows®-based main menu icon of the present system operating on a personal computer. The functions listed with the icons used to invoke those functions are described in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM".
- the telephone module allows the system to operate as a conventional or sophisticated telephone system.
- the system converts voice into a digital signal so that it can be transmitted or stored with other digital data, like computer information.
- the telephone function supports PBX and Centrex features such as call waiting, call forwarding, caller ID and three-way calling.
- This module also allows the user to mute, hold or record a conversation.
- the telephone module enables the handset, headset or hands-free speaker telephone operation of the hardware component. It includes on-screen push button dialing, speed-dial of stored numbers and digital recording of two-way conversations.
- the voice mail portion of the present system allows this system to operate as a telephone answering machine by storing voice messages as digitized voice files along with a time/date voice stamp.
- the digitized voice files can be saved and sent to one or more destinations immediately or at a later time using a queue scheduler.
- the user can also listen to, forward or edit the voice messages which have been received with a powerful digital voice editing component of the present system.
- This module also creates queues for outgoing messages to be sent at preselected times and allows the users to create outgoing messages with the voice editor.
- the fax manager portion of the present system is a queue for incoming and outgoing facsimile pages.
- this function is tied into the Windows "print" command once the present system has been installed. This feature allows the user to create faxes from any Windows®-based document that uses the "print" command.
- the fax manager function of the present system allows the user to view queued faxes which are to be sent or which have been received. This module creates queues for outgoing faxes to be sent at preselected times and logs incoming faxes with time/date stamps.
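As a rough illustration of the queueing behaviour just described, the sketch below keeps outgoing faxes in a time-ordered heap and logs incoming faxes with a time/date stamp. The class `FaxQueue`, its method names, and the document labels are hypothetical; the source does not describe the actual data structures used by the fax manager.

```python
import heapq
from datetime import datetime


class FaxQueue:
    """Hypothetical fax manager queue: scheduled outgoing faxes plus an incoming log."""

    def __init__(self):
        self._outgoing = []      # heap of (send_time, sequence, document)
        self._seq = 0            # tie-breaker so equal times keep insertion order
        self.incoming_log = []   # list of (received_at, document)

    def schedule(self, document, send_time):
        """Queue an outgoing fax for a preselected time."""
        heapq.heappush(self._outgoing, (send_time, self._seq, document))
        self._seq += 1

    def due(self, now):
        """Remove and return every queued fax whose send time has arrived."""
        ready = []
        while self._outgoing and self._outgoing[0][0] <= now:
            ready.append(heapq.heappop(self._outgoing)[2])
        return ready

    def log_incoming(self, document, received_at):
        """Record a received fax together with its time/date stamp."""
        self.incoming_log.append((received_at, document))


queue = FaxQueue()
queue.schedule("quarterly-report", datetime(1996, 7, 3, 12, 0))
queue.schedule("cover-letter", datetime(1996, 7, 3, 9, 0))
morning = queue.due(datetime(1996, 7, 3, 10, 0))
afternoon = queue.due(datetime(1996, 7, 3, 13, 0))
```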
- the multi-media mail function of the present system is a utility which allows the user to compose documents that include text, graphics and voice messages using the message composer function of the present system, described more fully below.
- the multi-media mail utility of the present system allows the user to schedule messages for transmittal and queues up the messages that have been received so that they can be viewed at a later time.
- the show and tell function of the present system allows the user to establish a data over voice (DOV) communications session.
- This voice over data function is accomplished in the hardware components of the present system. It digitizes the voice and transmits it in a dynamically changing allocation of voice data and digital data multiplexed in the same transmission. The allocation at a given moment is selected depending on the amount of voice digital information required to be transferred. Quiet voice intervals allocate greater space to the digital data transmission.
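The dynamic allocation can be sketched as a per-frame split in which voice takes what it needs and data gets the remainder. This is only an illustration under stated assumptions: the fixed frame size, the byte units, and the rule that voice is served first are not taken from the source, and `allocate_frame` is a hypothetical name.

```python
def allocate_frame(frame_bytes, voice_bytes_needed):
    """Split one transmission frame between voice and data.

    Assumption: voice is served first, and whatever capacity is left
    over in the frame goes to digital data.
    """
    if frame_bytes < 0 or voice_bytes_needed < 0:
        raise ValueError("sizes must be non-negative")
    voice = min(voice_bytes_needed, frame_bytes)
    data = frame_bytes - voice
    return voice, data


# A talk spurt leaves less room for data; a quiet interval leaves the whole frame.
talking = allocate_frame(100, 30)
silent = allocate_frame(100, 0)
```

Under this model a quiet interval (zero voice bytes) hands the entire frame to the data channel, which matches the behaviour the text describes.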
- the terminal function of the present system allows the user to establish a data communications session with another computer which is equipped with a modem but which is not equipped with the present system.
- This feature of the present system is a Windows™-based data communications program that reduces the need for issuing "AT" commands by providing menu driven and "pop-up" window alternatives.
- the address book function of the present system is a database that is accessible from all the other functions of the present system. This database is created by the user inputting destination addresses and telephone numbers for data communication, voice mail, facsimile transmission, modem communication and the like.
- the address book function of the present system may be utilized to broadcast communications to a wide variety of recipients. Multiple linked databases have separate address books for different groups and different destinations may be created by the users.
- the address book function includes a textual search capability which allows fast and efficient location of specific addresses as described more fully below.
- FIG. 3 is a block diagram of the hardware components of the present system corresponding to reference number 20 of Figure 1. These components form the link between the user, the personal computer running the software component of the present system and the telephone line interface. The details of the system shown in Figure 3 and a detailed description of the schematics are found in the above-mentioned U.S. Patent No. 5,452,289 issued on September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM".
- the telephone handset 301, a telephone headset 302, and a hands-free microphone 303 and speaker 304: the three alternative interfaces connect to the digital telephone coder-decoder (CODEC) circuit 305.
- the digital telephone CODEC circuit 305 interfaces with the voice control digital signal processor (DSP) circuit 306 which includes a voice control DSP and CODEC.
- This circuit does digital to analog (D/A) conversion, analog to digital (A/D) conversion, coding/decoding, gain control and is the interface between the voice control DSP circuit 306 and the telephone interface.
- the CODEC of the voice control circuit 306 transfers digitized voice information in a compressed format to multiplexor circuit 310 to analog telephone line interface 309.
- the CODEC of the voice control circuit 306 is an integral component of a voice control digital signal processor integrated circuit.
- the voice control DSP of circuit 306 controls the digital telephone CODEC circuit 305, performs voice compression and echo cancellation.
- the data pump circuit 311 selects between the voice control DSP circuit 306 and the data pump DSP circuit 311 for transmission of information on the telephone line through telephone line interface circuit 309.
- the data pump circuit 311 also includes a digital signal processor (DSP) and a CODEC for communicating over the telephone line interface 309 through MUX circuit 310.
- the data pump DSP and CODEC of circuit 311 performs functions such as modulation, demodulation and echo cancellation to communicate over the telephone line interface 309 using a plurality of telecommunications standards including FAX and modem protocols.
- the main controller circuit 313 controls the DSP data pump circuit 311 and the voice control DSP circuit 306 through serial input/output and clock timer control (SIO/CTC) circuits 312 and dual port RAM circuit 308 respectively.
- the main controller circuit 313 communicates with the voice control DSP 306 through dual port RAM circuit 308. In this fashion digital voice data can be read and written simultaneously to the memory portions of circuit 308 for high speed communication between the user (through interfaces 301, 302 or 303/304) and the personal computer connected to serial interface circuit 315 and the remote telephone connection connected through the telephone line attached to line interface circuit 309.
- the main controller circuit 313 includes a microprocessor which controls the functions and operation of all of the hardware components shown in Figure 3.
- the main controller is connected to RAM circuit 316 and a programmable and electrically erasable read only memory (PEROM) circuit 317.
- the PEROM circuit 317 includes non-volatile memory in which the executable control programs for the voice control DSP circuits 306 and the main controller circuits 313 operate.
- the RS232 serial interface circuit 315 communicates to the serial port of the personal computer which is running the software components of the present system.
- the RS232 serial interface circuit 315 is connected to a serial input/output circuit 314 with main controller circuit 313.
- SIO circuit 314 is, in the preferred embodiment, a part of SIO/CTC circuit 312.
- Main controller 313 causes the data pump DSP circuit 311 to seize the telephone line and transmit the DTMF tones to dial a number.
- DSP 306 receives commands from the personal computer via main controller 313 to configure the digital telephone CODEC circuit 305 to enable either the handset 301 operation, the microphone 303 and speaker 304 operation or the headset 302 operation.
- a telephone connection is established through the telephone line interface circuit 309 and communication is enabled.
- the user's analog voice is transmitted in an analog fashion to the digital telephone CODEC 305 where it is digitized.
- the digitized voice patterns are passed to the voice control circuit 306 where echo cancellation is accomplished; the digital voice signals are then reconstructed into analog signals and passed through multiplexor circuit 310 to the telephone line interface circuit 309 for analog transmission over the telephone line.
- the incoming analog voice from the telephone connection through telephone connection circuit 309 is passed to the integral CODEC of the voice control circuit 306 where it is digitized.
- the digitized incoming voice is then passed to digital telephone CODEC circuit 305 where it is reconverted to an analog signal for transmission to the selected telephone interface (either the handset 301, the microphone/speaker 303/304 or the headset 302).
- Voice Control DSP circuit 306 is programmed to perform echo cancellation to avoid feedback and echoes between transmitted and received signals, as is more fully described below.
- voice messages may be stored for later transmission or the present system may operate as an answering machine receiving incoming messages.
- the telephone interface is used to send the analog speech patterns to the digital telephone CODEC circuit 305.
- Circuit 305 digitizes the voice patterns and passes them to voice control circuit 306 where the digitized voice patterns are digitally compressed.
- the digitized and compressed voice patterns are passed through dual port ram circuit 308 to the main controller circuit 313 where they are transferred through the serial interface to the personal computer using a packet protocol defined below.
- the voice patterns are then stored on the disk of the personal computer for later use in multi-media mail, for voice mail, as a pre-recorded answering machine message or for later predetermined transmission to other sites.
- the hardware components of Figure 3 are placed in answer mode.
- An incoming telephone ring is detected through the telephone line interface circuit 309 and the main controller circuit 313 is alerted which passes the information off to the personal computer through the RS232 serial interface circuit 315.
- the telephone line interface circuit 309 seizes the telephone line to make the telephone connection.
- a pre-recorded message may be sent by the personal computer as compressed and digitized speech through the RS232 interface to the main controller circuit 313.
- the compressed and digitized speech from the personal computer is passed from main controller circuit 313 through dual port ram circuit 308 to the voice control DSP circuit 306 where it is uncompressed and converted to analog voice patterns.
- These analog voice patterns are passed through multiplexor circuit 310 to the telephone line interface 309 for transmission to the caller. Such a message may invite the caller to leave a voice message at the sound of a tone.
- the incoming voice messages are received through telephone line interface 309 and passed to voice control circuit 306.
- the analog voice patterns are digitized by the integral CODEC of voice control circuit 306 and the digitized voice patterns are compressed by the voice control DSP of the voice control circuit 306.
- the digitized and compressed speech patterns are passed through dual port ram circuit 308 to the main controller circuit 313 where they are transferred using packet protocol described below through the RS232 serial interface 315 to the personal computer for storage and later retrieval. In this fashion the hardware components of Figure 3 operate as a transmit and receive voice mail system for implementing the voice mail function 117 of the present system.
- the hardware components of Figure 3 may also operate to facilitate the fax manager function 119 of Figure 2.
- a ring detect circuit of the telephone line interface 309 which will alert the main controller circuit 313 to the incoming call.
- Main controller circuit 313 will cause line interface circuit 309 to seize the telephone line to receive the call.
- Main controller circuit 313 will also concurrently alert the operating programs on the personal computer through the RS232 interface using the packet protocol described below.
- a fax carrier tone is transmitted and a return tone and handshake is received from the telephone line and detected by the data pump circuit 311.
- the reciprocal transmit and receipt of the fax tones indicates the imminent receipt of a facsimile transmission and the main controller circuit 313 configures the hardware components of Figure 3 for the receipt of that information.
- the necessary handshaking with the remote facsimile machine is accomplished through the data pump 311 under control of the main controller circuit 313.
- the incoming data packets of digital facsimile data are received over the telephone line interface and passed through data pump circuit 311 to main controller circuit 313 which forwards the information on a packet basis (using the packet protocol described more fully below) through the serial interface circuit 315 to the personal computer for storage on disk.
- the FAX data could be transferred from the telephone line to the personal computer using the same path as the packet transfer except using the normal AT stream mode.
- a facsimile transmission is also facilitated by the hardware components of Figure 3.
- the transmission of a facsimile may be immediate or queued for later transmission at a predetermined or preselected time.
- Control packet information to configure the hardware components to send a facsimile is sent over the RS232 serial interface between the personal computer and the hardware components of Figure 3 and is received by main controller circuit 313.
- the data pump circuit 311 then dials the recipient's telephone number using DTMF tones or pulse dialing over the telephone line interface circuit 309. Once an appropriate connection is established with the remote facsimile machine, standard facsimile handshaking is accomplished by the data pump circuit 311.
- the digital facsimile picture information is received through the data packet protocol transfer over serial line interface circuit 315, passed through main controller circuit 313 and data pump circuit 311 onto the telephone line through telephone line interface circuit 309 for receipt by the remote facsimile machine.
- a multimedia transmission consists of a combination of picture information, digital data and digitized voice information.
- the type of multimedia information transferred to a remote site using the hardware components of Figure 3 could be the multimedia format of the Microsoft® Multimedia Wave® format with the aid of an Intelligent Serial Interface (ISI) card added to the personal computer.
- the multimedia may also be the type of multimedia information assembled by the software component of the present system which is described more fully below.
- the multimedia package of information including text, graphics and voice messages (collectively called the multimedia document) may be transmitted or received through the hardware components shown in Figure 3.
- the transmission of a multimedia document through the hardware components of Figure 3 is accomplished by transferring the multimedia digital information using the packet protocol described below over the RS232 serial interface between the personal computer and the serial line interface circuit 315.
- the packets are then transferred through main controller circuit 313 through the data pump circuit 311 on to the telephone line for receipt at a remote site through telephone line interface circuit 309.
- the multimedia documents received over the telephone line from the remote site are received at the telephone line interface circuit 309, passed through the data pump circuit 311 for receipt and forwarding by the main controller circuit 313 over the serial line interface circuit 315.
- the show and tell function 123 of the present system allows the user to establish a data over voice communication session. In this mode of operation, full duplex data transmission may be accomplished simultaneously with the voice communication between both sites. This mode of operation assumes a like-configured remote site.
- the hardware components of the present system also include a means for sending voice/data over cellular links.
- the protocol used for transmitting multiplexed voice and data includes a supervisory packet, described more fully below, to keep the link established through the cellular link. This supervisory packet is an acknowledgement that the link is still up.
- the supervisory packet may also contain link information to be used for adjusting various link parameters when needed.
- This supervisory packet is sent every second when data is not being sent, and if the packet is not acknowledged after a specified number of attempts, the protocol gives an indication that the cellular link is down and then allows the modem to take action.
- the action could be, for example, to change speeds, retrain, or hang up.
- the use of supervisory packets is a novel method of maintaining inherently intermittent cellular links when transmitting multiplexed voice and data.
- the voice portion of the voice over data transmission of the show and tell function is accomplished by receiving the user's voice through the telephone interface 301, 302 or 303; the voice information is then digitized by the digital telephone circuit 305.
- the digitized voice information is passed to the voice control circuit 306 where the digitized voice information is compressed using a voice compression algorithm as described in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM".
- the digitized and compressed voice information is passed through dual port RAM circuit 308 to the main controller circuit 313.
- a quiet flag is passed from voice control circuit 306 to the main controller 313 by way of dual port RAM circuit 308, using a packet transfer protocol described below.
- main controller circuit 313 in the show and tell function of the present system must efficiently and effectively combine the digitized voice information with the digital information for transmission over the telephone line via telephone line interface circuit 309.
- main controller circuit 313 dynamically changes the amount of voice information and digital information transmitted at any given period of time depending upon the quiet times during the voice transmissions. For example, during a quiet moment where there is no speech information being transmitted, main controller circuit 313 ensures that a higher volume of digital data information be transmitted over the telephone line interface in lieu of digitized voice information.
- the packets of digital data transmitted over the telephone line interface with the transmission packet protocol described below require 100 percent accuracy in the transmission of the digital data, but a lesser standard of accuracy applies to the transmission and receipt of the digitized voice information. Since digital information must be transmitted with 100 percent accuracy, a corrupted packet of digital information received at the remote site must be re-transmitted. A retransmission signal is communicated back to the local site and the packet of digital information which was corrupted during transmission is retransmitted. If the packet transmitted contained voice data, however, the remote site uses the packets whether they were corrupted or not as long as the packet header was intact. If the header is corrupted, the packet is discarded.
- the voice information may be corrupted without requesting retransmission since it is understood that the voice information must be transmitted on a real time basis and the corruption of any digital information of the voice signal is not critical. In contrast, the transmission of digital data is critical and retransmission of corrupted data packets is requested by the remote site.
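The receive-side rule described above can be sketched as a small decision function. This is only an illustration: the names `Action` and `handle_received_packet` are hypothetical, and treating a data packet with a corrupted header as discarded (leaving recovery to the sender's retransmission logic) is an assumption, since the text states the discard rule only for voice packets.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    REQUEST_RETRANSMIT = "request_retransmit"
    DISCARD = "discard"


def handle_received_packet(is_voice: bool, header_ok: bool, payload_ok: bool) -> Action:
    """Decide what the receiving site does with one packet.

    is_voice is only meaningful when header_ok is True, because the
    packet type is read from the header.
    """
    if not header_ok:
        # The type cannot be trusted, so the packet is dropped (assumption
        # for data packets; stated in the text for voice packets).
        return Action.DISCARD
    if is_voice:
        # Voice is real-time: use the payload even if it is corrupted.
        return Action.DELIVER
    if payload_ok:
        return Action.DELIVER
    # Data must be 100 percent accurate, so ask for the packet again.
    return Action.REQUEST_RETRANSMIT
```

The asymmetry is the point of the design: a late voice packet is worth less than a slightly damaged one, while a damaged data packet is worth nothing.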
- the transmission of the digital data follows the CCITT V.42 standard, as is well known in the industry and as described in the CCITT Blue Book, volume VIII entitled Data Communication over the Telephone Network, 1989.
- the CCITT V.42 standard is hereby incorporated by reference.
- the voice data packet information also follows the CCITT V.42 standard, but uses a different header format so the receiving site recognizes the difference between a data packet and a voice packet.
- the voice packet is distinguished from a data packet by using undefined bits in the header (80 hex) of the V.42 standard.
- the packet protocol for voice over data transmission during the show and tell function of the present system is described more fully below.
- incoming data packets and incoming voice packets are received by the hardware components of Figure 3.
- the incoming data packets and voice packets are received through the telephone line interface circuit 309 and passed to the main controller circuit 313 via data pump DSP circuit 311.
- the incoming data packets are passed by the main controller circuit 313 to the serial interface circuit 315 to be passed to the personal computer.
- the incoming voice packets are passed by the main controller circuit 313 to the dual port RAM circuit 308 for receipt by the voice control DSP circuit 306.
- the voice packets are decoded and the compressed digital information therein is uncompressed by the voice control DSP of circuit 306.
- the uncompressed digital voice information is passed to digital telephone CODEC circuit 305 where it is reconverted to an analog signal and retransmitted through the telephone line interface circuits. In this fashion full-duplex voice and data transmission and reception is accomplished through the hardware components of Figure 3 during the show and tell functional operation of the present system.
- Terminal operation 125 of the present system is also supported by the hardware components of Figure 3.
- Terminal operation means that the local personal computer simply operates as a "dumb" terminal including file transfer capabilities. Thus no local processing takes place other than the handshaking protocol required for the operation of a dumb terminal.
- the remote site is assumed to be a modem connected to a personal computer but the remote site is not necessarily a site which is configured according to the present system.
- the command and data information from the personal computer is transferred over the RS232 serial interface circuit 315, forwarded by main controller circuit 313 to the data pump circuit 311 where the data is placed on the telephone line via telephone line interface circuit 309.
- In a reciprocal fashion, data is received from the telephone line over telephone line interface circuit 309 and simply forwarded by the data pump circuit 311 and the main controller circuit 313 over the serial line interface circuit 315 to the personal computer.
- the address book function of the present system is primarily a support function for providing telephone numbers and addresses for the other various functions of the present system.
- a special packet protocol is used for communication between the hardware components 20 and the personal computer (PC) 10.
- the protocol is used for transferring different types of information between the two devices such as the transfer of DATA, VOICE, and QUALIFIED information.
- a Data Packet is used for normal data transfer between the controller 313 of hardware component 20 and the computer 10 for such things as text, file transfers, binary data and any other type of information presently being sent through modems. All packet transfers begin with a synch character 01 hex (synchronization byte).
- the Data Packet begins with an ID byte which specifies the packet type and packet length.
- the Voice Packet is used to transfer compressed VOICE messages between the controller 313 of hardware component 20 and the computer 10.
- the Voice Packet is similar to the Data Packet except for its length which is, in one embodiment, currently fixed at 23 bytes of data.
- all packets begin with a synchronization character chosen in the preferred embodiment to be 01 hex (01H).
- the ID byte of the Voice Packet is completely a zero byte: all bits are set to zero.
- the Qualified Packet is used to transfer commands and other non-data/voice related information between the controller 313 of hardware component 20 and the computer 10, and starts with a synchronization character chosen in one embodiment to be 01 hex (01H).
- a Qualified Packet starts with two bytes where the first byte is the ID byte and the second byte is the QUALIFIER type identifier.
- a supervisory packet is also used. Both sides of the cellular link will send the cellular supervisory packet every 1 to 3 seconds. Upon receiving the cellular supervisory packet, the receiving side will acknowledge it using the ACK field of the cellular supervisory packet. If the sender does not receive an acknowledgement within one second, it will repeat sending the cellular supervisory packet up to 12 times. After 12 attempts of sending the cellular supervisory packet without an acknowledgement, the sender will disconnect the line. Upon receiving an acknowledgement, the sender will restart its 3 second timer. Those skilled in the art will readily recognize that the timer values and wait times selected here may be varied without departing from the spirit or scope of the present invention.
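The retry rule for the cellular supervisory packet can be sketched as follows. This is a minimal illustration, not the patented implementation: `supervise_link`, `send_packet` and `wait_for_ack` are hypothetical names, the two callables stand in for the real link layer, and the sketch reads "12 attempts" as 12 transmissions in total.

```python
def supervise_link(send_packet, wait_for_ack, max_attempts=12, ack_timeout=1.0):
    """Send a supervisory packet, repeating it until it is acknowledged.

    send_packet():          transmits one supervisory packet (hypothetical).
    wait_for_ack(timeout):  returns True if an ACK arrives within `timeout`
                            seconds (hypothetical).
    Returns True if the link is still up (the caller then restarts its
    3 second timer) and False if the caller should disconnect the line.
    """
    for _ in range(max_attempts):
        send_packet()
        if wait_for_ack(ack_timeout):
            return True   # acknowledged: link is alive
    return False          # 12 unacknowledged attempts: give up
```

A caller would run this each time its supervisory timer expires and hang up, retrain or change speed when it returns False.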
- a modified supervisory packet was described in detail in the above-mentioned US Patent Application Serial Number 08/271,496 filed July 7, 1994 entitled "VOICE OVER DATA MODEM WITH SELECTABLE VOICE COMPRESSION".
- the modified supervisory packet was described as an independent communications channel.
- One example demonstrated the use of a modified supervisory packet in the negotiation of nonstandard communication parameters. For instance, the modified supervisory packet is used to negotiate speech algorithm selection and speech compression ratios. Other examples were given, and those given here are not intended in a limiting or exclusive sense.
Packet Protocol Between the PC and the Hardware Component
- A special packet protocol is used for communication between the hardware components 20 and the personal computer (PC) 10.
- the protocol is used for transferring different types of information between the two devices such as the transfer of DATA, VOICE, and QUALIFIED information.
- the protocol also uses the BREAK as defined in CCITT X.28 as a means to maintain protocol synchronization. This BREAK sequence is also described in the Statutory Invention Registration entitled "ESCAPE METHODS FOR MODEM COMMUNICATIONS", to Timothy D. Gunn filed January 8, 1993, which is hereby incorporated by reference.
- the protocol has two modes of operation. One mode is packet mode and the other is stream mode.
- the protocol allows mixing of different types of information into the data stream without having to physically switch modes of operation.
- the hardware component 20 will identify the packet received from the computer 10 and perform the appropriate action according to the specifications of the protocol. If it is a data packet, then the controller 313 of hardware component 20 would send it to the data pump circuit 311. If the packet is a voice packet, then the controller 313 of hardware component 20 would distribute that information to the Voice DSP 306. This packet transfer mechanism also works in the reverse, where the controller 313 of hardware component 20 would give different information to the computer 10 without having to switch into different modes.
- the packet protocol also allows commands to be sent to either the main controller 313 directly or to the Voice DSP 306 for controlling different options without having to enter a command state.
- Packet mode is made up of 8 bit asynchronous data and is identified by a beginning synchronization character (01 hex) followed by an ID/LI character and then followed by the information to be sent.
- In addition to the ID/LI character codes defined below, those skilled in the art will readily recognize that other ID/LI character codes could be defined to allow for additional types of packets such as video data, or alternate voice compression algorithm packets such as Codebook Excited Linear Predictive Coding (CELP) algorithm, GSM, RPE, VSELP, etc.
- Stream mode is used when large amounts of one type of packet (VOICE, DATA, or QUALIFIED) are being sent.
- the transmitter tells the receiver to enter stream mode by a unique command. Thereafter, the transmitter tells the receiver to terminate stream mode by using the BREAK command followed by an "AT" type command.
- the command used to terminate the stream mode can be a command to enter another type of stream mode or it can be a command to enter back into packet mode.
- Table 1 shows the common packet parameters used for all three packet types.
- Table 2 shows the three basic types of packets with the sub-types listed.
- a Data Packet is shown in Table 1 and is used for normal data transfer between the controller 313 of hardware component 20 and the computer 10 for such things as text, file transfers, binary data and any other type of information presently being sent through modems. All packet transfers begin with a synch character 01 hex (synchronization byte). The Data Packet begins with an ID byte which specifies the packet type and packet length. Table 3 describes the Data Packet byte structure and Table 4 describes the bit structure of the ID byte of the Data Packet. Table 5 is an example of a Data Packet with a byte length of 6. The value of the LI field is the actual length of the data field to follow, not counting the ID byte.
- Bit 7 identifies the type of packet. Bits 6 - 0 contain the LI or length indicator portion of the ID byte.
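The ID byte layout just described (bit 7 for the packet type, bits 6 to 0 for the length indicator) can be illustrated with a short packing sketch. The function names are hypothetical, and because this passage does not say which value of bit 7 corresponds to which packet type, `type_bit` is left as a caller-supplied parameter rather than hard-coded.

```python
SYNC = 0x01  # synchronization character, 01 hex


def pack_id_byte(type_bit: int, length: int) -> int:
    """Combine the type bit (bit 7) and length indicator (bits 6-0)."""
    if type_bit not in (0, 1):
        raise ValueError("type_bit must be 0 or 1")
    if not 0 <= length <= 0x7F:
        raise ValueError("length indicator must fit in 7 bits")
    return (type_bit << 7) | length


def unpack_id_byte(id_byte: int):
    """Return (type_bit, length indicator) from an ID byte."""
    return (id_byte >> 7) & 1, id_byte & 0x7F


def frame_packet(type_bit: int, payload: bytes) -> bytes:
    """Prefix the payload with the sync character and the ID byte.

    The LI value counts only the data field, not the ID byte itself.
    """
    return bytes([SYNC, pack_id_byte(type_bit, len(payload))]) + payload


# A six-byte data field, as in the Table 5 example (type_bit=0 is assumed).
packet = frame_packet(0, b"ABCDEF")
```

Seven length bits limit a single packet to 127 bytes of data, which is one reason a separate stream mode is useful for bulk transfers.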
- the Voice Packet is used to transfer compressed VOICE messages between the controller 313 of hardware component 20 and the computer 10.
- the Voice Packet is similar to the Data Packet except for its length which is, in the preferred embodiment, currently fixed at 23 bytes of data. Once again, all packets begin with a synchronization character chosen in the preferred embodiment to be 01 hex (01H).
- the ID byte of the Voice Packet is completely a zero byte: all bits are set to zero.
- Table 6 shows the ID byte of the Voice Packet and Table 7 shows the Voice Packet byte structure.
- TABLE 6: ID Byte of Voice Packet. LI (length indicator) = 0, which denotes 23 bytes of data.
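A Voice Packet, as described above, is therefore a sync character, an all-zero ID byte and exactly 23 bytes of compressed speech. The sketch below builds and checks such a packet; the function and constant names are hypothetical and the error handling is an assumption.

```python
SYNC = 0x01            # synchronization character, 01 hex
VOICE_ID = 0x00        # ID byte of a Voice Packet: all bits zero
VOICE_DATA_LEN = 23    # fixed data length in the preferred embodiment


def build_voice_packet(compressed: bytes) -> bytes:
    """Wrap 23 bytes of compressed voice in a Voice Packet."""
    if len(compressed) != VOICE_DATA_LEN:
        raise ValueError("a Voice Packet carries exactly 23 bytes of data")
    return bytes([SYNC, VOICE_ID]) + compressed


def parse_voice_packet(packet: bytes) -> bytes:
    """Validate the two header bytes and return the 23-byte data field."""
    if len(packet) != 2 + VOICE_DATA_LEN:
        raise ValueError("unexpected Voice Packet length")
    if packet[0] != SYNC or packet[1] != VOICE_ID:
        raise ValueError("not a Voice Packet")
    return packet[2:]
```

Because the length is implied by the zero ID byte, the receiver needs no length field to find the end of the packet.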
- the Qualified Packet is used to transfer commands and other non-data/voice related information between the controller 313 of hardware component 20 and the computer 10.
- the various species or types of the Qualified Packets are described below and are listed above in Table 2.
- all packets start with a synchronization character chosen in the preferred embodiment to be 01 hex (01H).
- a Qualified Packet starts with two bytes where the first byte is the ID byte and the second byte is the QUALIFIER type identifier.
- Table 8 shows the ID byte for the Qualified Packet
- Table 9 shows the byte structure of the Qualified Packet
- Tables 10-12 list the Qualifier Type byte bit maps for the three types of Qualified Packets.
- TABLE 8: ID Byte of Qualified Packet
- the bit maps of the Qualifier Byte (QUAL BYTE) of the Qualified Packet are shown in Tables 10-12.
- Table 10 describes the Qualifier Byte of Qualified Packet, Group 1, which are immediate commands.
- Table 11 describes the Qualifier Byte of Qualified Packet, Group 2, which are stream mode commands in that the command is to stay in the designated mode until a BREAK + INIT command string is sent.
- Table 12 describes the Qualifier Byte of Qualified Packet, Group 3, which are information or status commands.
- a supervisory packet shown in Table 13 is used. Both sides of the cellular link will send the cellular supervisory packet every 3 seconds. Upon receiving the cellular supervisory packet, the receiving side will acknowledge it using the ACK field of the cellular supervisory packet. If the sender does not receive an acknowledgement within one second, it will repeat sending the cellular supervisory packet up to 12 times. After 12 attempts of sending the cellular supervisory packet without an acknowledgement, the sender will disconnect the line. Upon receiving an acknowledgement, the sender will restart its 3 second timer. Those skilled in the art will readily recognize that the timer values and wait times selected here may be varied without departing from the spirit or scope of the present invention.
- TABLE 13: Cellular Supervisory Packet Byte Structure
- the user is talking either through the handset, the headset or the microphone/speaker telephone interface.
- the analog voice signals are received and digitized by the telephone CODEC circuit 305.
- the digitized voice information is passed from the digital telephone CODEC circuit 305 to the voice control circuits 306.
- the digital signal processor (DSP) of the voice control circuit 306 is programmed to perform the voice compression algorithm.
- the source code programmed into the voice control DSP is in the microfiche appendix of U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM".
- the DSP of the voice control circuit 306 compresses the speech and places the compressed digital representations of the speech into special packets described more fully below.
- the compressed voice information is passed to the dual port RAM circuit 308 for either forwarding and storage on the disk of the personal computer via the RS232 serial interface or for multiplexing with conventional modem data to be transmitted over the telephone line via the telephone line interface circuit 309 in the voice-over-data mode of operation of the Show and Tell function 123.
- the compressed speech bits are multiplexed with data bits using a packet format described below. Three compression rates are described herein which will be called 8Kbit/sec, 9.6Kbit/sec and 16Kbit/sec.
Speech Compression Algorithm
- To multiplex high-fidelity speech with digital data and transmit both over the telephone line, a high available bandwidth would normally be required.
- the analog voice information is digitized into 8-bit PCM data at an 8 kHz sampling rate, producing a serial bit stream with a 64,000 bps data rate. This rate cannot be transmitted over the telephone line.
- the 64 kbps digital voice data is compressed into a 9500 bps encoded bit stream using a fixed-point (non-floating point) DSP such that the compressed speech can be transmitted over the telephone line multiplexed with asynchronous data. This is accomplished in an efficient manner such that enough machine cycles remain during real time speech compression to allow for echo cancellation in the same fixed-point DSP.
- a silence detection function is used to detect quiet intervals in the speech signal which allows the data processor to substitute asynchronous data in lieu of voice data packets over the telephone line to efficiently time multiplex the voice and asynchronous data transmission.
- the allocation of time for asynchronous data transmission is constantly changing depending on how much silence is on the voice channel.
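The silence-driven time multiplexing described above can be sketched as a per-slot choice between a voice queue and a data queue. This is an illustration only: `next_packet` and the queue arguments are hypothetical, and sending data when no voice packet happens to be ready is an assumption beyond what the text states.

```python
from collections import deque


def next_packet(quiet: bool, voice_queue: deque, data_queue: deque):
    """Pick the next packet to place on the telephone line.

    quiet is the silence-detection flag for the current interval.
    Returns a (kind, packet) tuple, or None when nothing is pending.
    """
    if not quiet and voice_queue:
        # Speech is present: voice takes the slot to stay real-time.
        return ("voice", voice_queue.popleft())
    if data_queue:
        # Quiet interval (or no voice ready): substitute asynchronous data.
        return ("data", data_queue.popleft())
    return None
```

The more silence the detector reports, the larger the share of line time that falls through to the data branch, which is how the data throughput varies with the conversation.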
- the voice compression algorithm of the present system relies on a model of human speech which shows that human speech contains redundancy inherent in the voice patterns. Only the incremental innovations (changes) need to be transmitted.
- the algorithm operates on 128 digitized speech samples (20 milliseconds at 6400 Hz), divides the speech samples into time segments of 32 samples (5 milliseconds) each, and uses predictive coding on each segment.
- the input to the algorithm could be PCM data sampled at either 6400 Hz or 8000 Hz. If the sampling is at 8000 Hz, or any other selected sampling rate, the input sample data stream must be decimated to 6400 Hz before processing the speech data. At the output, the 6400 Hz PCM signal is interpolated back to 8000 Hz and passed to the CODEC.
- the current segment is predicted as well as possible based on the past recreated segments and a difference signal is determined.
- the difference values are compared to the stored difference values in a lookup table or code book, and the address of the closest value is sent to the remote site along with the predicted gain and pitch values for each segment. In this fashion, the entire 20 milliseconds of speech can be represented by 190 bits, thus achieving an effective data rate of 9500 bps.
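The frame figures quoted above follow from simple arithmetic, which the short check below reproduces using only numbers stated in the text. The variable names are illustrative.

```python
PCM_BITS_PER_SAMPLE = 8
CODEC_RATE_HZ = 8000      # sampling rate at the CODEC
ALGO_RATE_HZ = 6400       # sampling rate seen by the compression algorithm
FRAME_SAMPLES = 128       # samples per frame
SEGMENT_SAMPLES = 32      # samples per segment
FRAME_BITS = 190          # bits used to represent one frame

raw_bps = PCM_BITS_PER_SAMPLE * CODEC_RATE_HZ                  # uncompressed PCM rate
frame_ms = FRAME_SAMPLES * 1000 // ALGO_RATE_HZ                # frame duration
segment_ms = SEGMENT_SAMPLES * 1000 // ALGO_RATE_HZ            # segment duration
segments_per_frame = FRAME_SAMPLES // SEGMENT_SAMPLES
compressed_bps = FRAME_BITS * ALGO_RATE_HZ // FRAME_SAMPLES    # 190 bits every 20 ms

print(raw_bps, frame_ms, segment_ms, segments_per_frame, compressed_bps)
print(round(raw_bps / compressed_bps, 2))  # compression ratio vs. 64 kbps PCM
```

The compressed stream is thus a little under one seventh of the raw PCM rate.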
- the present system includes a unique Vector Quantization (VQ) speech compression algorithm designed to provide maximum fidelity with minimum compute power and bandwidth.
- the VQ algorithm has two major components. The first section reduces the dynamic range of the input speech signal by removing short term and long term redundancies. This reduction is done in the waveform domain, with the synthesized part used as the reference for determining the incremental "new" content.
- the second section maps the residual signal into a code book optimized for preserving the general spectral shape of the speech signal.
- Figure 4 is a high level signal flow block diagram of the speech compression algorithm used in the present system to compress the digitized voice for transmission over the telephone line in the voice over data mode of operation or for storage and use on the personal computer.
- the transmitter and receiver components are implemented using the programmable voice control DSP/CODEC circuit 306 shown in Figure 3.
- the DC removal stage 1101 receives the digitized speech signal and removes the D.C. bias by calculating the long-term average and subtracting it from each sample. This ensures that the digital samples of the speech are centered about a zero mean value.
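One way to picture the DC removal stage is a running average that is subtracted from each sample. The sketch below is an assumption-laden illustration: the text does not say how the long-term average is computed, so an exponential running mean is swapped in here, and `remove_dc` and its `alpha` parameter are hypothetical.

```python
def remove_dc(samples, alpha=0.01, mean=0.0):
    """Subtract a running long-term average from each sample.

    alpha (hypothetical) sets how quickly the average tracks the input.
    Returns the centered samples and the updated average, so the state
    can be carried from one block of samples to the next.
    """
    centered = []
    for x in samples:
        mean += alpha * (x - mean)   # update the long-term average
        centered.append(x - mean)    # remove the estimated bias
    return centered, mean
```

Feeding a constant offset through the function drives the output toward zero, which is the zero-mean centering the stage is meant to achieve.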
- the pre-emphasis stage 1103 whitens the spectral content of the speech signal by balancing the extra energy in the low band with the reduced energy in the high band.
- the system finds the innovation in the current speech segment by subtracting 1109 the prediction from reconstructed past samples synthesized from synthesis stage 1107. This process requires the synthesis of the past speech samples locally (analysis by synthesis).
- the synthesis block 1107 at the transmitter performs the same function as the synthesis block 1113 at the receiver.
- a difference term is produced in the form of an error signal.
- This residual error is used to find the best match in the code book 1105.
- the code book 1105 quantizes the error signal using a code book generated from a representative set of speakers and environments. A minimum mean squared error match is determined in segments.
- the code book is designed to provide a quantization error with spectral rolloff (higher quantization error for low frequencies and lower quantization error for higher frequencies).
- the quantization noise spectrum in the reconstructed signal will always tend to be smaller than the underlying speech signal.
- each frame of 20ms is divided into 4 sub-blocks or segments of 5ms each.
- Each sub-block of data consists of a plurality of bits for the long term predictor, a plurality of bits for the long term predictor gain, a plurality of bits for the sub-block gain, and a plurality of bits for each code book entry for each 5ms.
- each 1.25ms of speech is looked up in a 512 word code book for the best match.
- the table entry is transmitted rather than the actual samples.
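The code book lookup described above amounts to a nearest-neighbour search under a squared-error measure. The sketch below shows that search on a toy code book; `nearest_codeword` is a hypothetical name, the toy entries are invented for illustration, and the 8-sample vector length is derived from 1.25 ms at the 6400 Hz rate rather than stated directly.

```python
CODEBOOK_SIZE = 512                           # entries in the code book
INDEX_BITS = (CODEBOOK_SIZE - 1).bit_length() # bits needed to send one index
VECTOR_LEN = 8                                # 1.25 ms at 6400 samples/sec


def nearest_codeword(vector, codebook):
    """Return the index of the code book entry with the smallest
    squared error relative to `vector`."""
    best_index, best_error = 0, float("inf")
    for index, word in enumerate(codebook):
        error = sum((v - w) ** 2 for v, w in zip(vector, word))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


# Toy three-entry code book of two-sample vectors, for illustration only.
toy_codebook = [[0, 0], [1, 1], [5, 5]]
index = nearest_codeword([4, 6], toy_codebook)
```

Only `index` would be transmitted; the receiver looks up the same entry in its own copy of the code book.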
- the code book entries are pre-computed from representative speech segments, as described more fully below.
- the synthesis block 1113 at the receiver performs the same function as the synthesis block 1107 at the transmitter.
- the synthesis block 1113 reconstructs the original signal from the voice data packets by using the gain and pitch values and code book address corresponding to the error signal most closely matched in the code book.
- the code book at the receiver is similar to the code book 1105 in the transmitter.
- the de-emphasis stage 1115 inverts the pre-emphasis operation by restoring the balance of the original speech signal.
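The pre-emphasis and de-emphasis stages form an inverse pair. The text does not give the filters used, so the sketch below substitutes a common first-order choice purely for illustration; the function names and the coefficient `a` are assumptions.

```python
def pre_emphasis(samples, a=0.5):
    """First-order pre-emphasis: y[n] = x[n] - a * x[n-1]."""
    out, previous = [], 0.0
    for x in samples:
        out.append(x - a * previous)
        previous = x
    return out


def de_emphasis(samples, a=0.5):
    """Inverse filter: z[n] = y[n] + a * z[n-1]."""
    out, previous = [], 0.0
    for y in samples:
        previous = y + a * previous
        out.append(previous)
    return out


speech = [3.0, 1.0, -2.0, 4.0, 0.0]
restored = de_emphasis(pre_emphasis(speech))
```

Applying the second filter to the output of the first returns the original samples, which is the sense in which the de-emphasis stage "inverts" the pre-emphasis stage.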
- the complete speech compression algorithm is summarized as follows:
- a) Digitally sample the voice to produce a PCM sample bit stream sampled at 8,000 samples per second.
- b) Decimate the 8,000 samples per second sampled data to produce a sampling rate of 6,400 samples per second for the 9.6Kbit/sec compression rate (6,000 samples per second for the 8Kbit/sec algorithm and 8,000 samples per second for the 16Kbit/sec algorithm).
- c) Remove any D.C. bias in the speech signal.
- d) Pre-emphasize the signal.
- e) Find the innovation in the current speech segment by subtracting the prediction from reconstructed past samples. This step requires the synthesis of the past speech samples locally (analysis by synthesis) such that the residual error is fed back into the system.
- f) Quantize the error signal using a code book generated from a representative set of speakers and environments. A minimum mean squared error match is determined in 5ms segments. The code book is designed to provide a quantization error with spectral rolloff (higher quantization error for low frequencies and lower quantization error for higher frequencies). The quantization noise spectrum in the reconstructed signal will always tend to be smaller than the underlying speech signal.
- g) At the transmitter and the receiver, reconstruct the speech from the quantized error signal fed into the inverse of the function in step (e) above. Use this signal for analysis by synthesis and for the output to the reconstruction stage below.
- h) Use a de-emphasis filter to reconstruct the output.
- the major advantages of this approach over other low-bit-rate algorithms are that there is no need for any complicated calculation of reflection coefficients (no matrix inverse or lattice filter computations). Also, the quantization noise in the output speech is hidden under the speech signal and there are no pitch tracking artifacts: the speech sounds "natural", with only minor increases of background hiss at lower bit-rates.
- the computational load is reduced significantly compared to a VSELP algorithm, and variations of the present algorithm thus provide bit rates of 8, 9.6 and 16 Kbit/sec, and can also provide bit rates of 9.2Kbit/sec, 9.5Kbit/sec and many other rates.
- the total delay through the analysis section is less than 20 milliseconds in the 9.6Kbit/sec embodiment.
- the present algorithm is accomplished completely in the waveform domain: no spectral information is computed and no filter computations are needed.
- the speech compression algorithm operates within the programmed control of the voice control DSP circuit 306.
- the speech or analog voice signal is received through the telephone interface 301, 302 or 303 and is digitized by the digital telephone CODEC circuit 305.
- the CODEC for circuit 305 is a companding μ-law CODEC.
- the analog voice signal from the telephone interface is band-limited to about 3,000 Hz and sampled at a selected sampling rate by digital telephone CODEC 305.
- the sample rate in the 9.6Kbit/sec embodiment of the present invention is 8Ksample/sec.
- Each sample is encoded into 8-bit PCM data, producing a serial 64kb/s bit stream.
- the digitized samples are passed to the voice control DSP/CODEC of circuit 306.
- the 8-bit μ-law PCM data is converted to 13-bit linear PCM data.
- the 13-bit representation is necessary to accurately represent the linear version of the logarithmic 8-bit μ-law PCM data. With linear PCM data, simpler mathematics may be performed on the PCM data.
- the voice control DSP/CODEC of circuit 306 corresponds to a single integrated circuit, for example, a WE® DSP16C Digital Signal Processor/CODEC from AT&T Microelectronics which is a combined digital signal processor and a linear CODEC in a single chip as described above.
- the digital telephone CODEC of circuit 305 corresponds to an integrated circuit, such as a T7540 companding μ-law CODEC.
- the sampled and digitized PCM voice signals from the telephone μ-law CODEC 305 shown in Figure 3 are passed to the voice control DSP/CODEC circuit 306 via direct data lines clocked and synchronized to a clocking frequency.
- the sample rate in CODEC 305 in this embodiment of the present invention is 8Ksample/sec.
- the digital samples are loaded into the voice control DSP/CODEC one at a time through the serial input and stored into an internal queue held in RAM, converted to linear PCM data and decimated to a sample rate of 6.4Ksample/sec. As the samples are loaded into the end of the queue in the RAM of the voice control DSP, the samples at the head of the queue are operated upon by the voice compression algorithm.
- the voice compression algorithm then produces a greatly compressed representation of the speech signals in a digital packet form.
- the compressed speech signal packets are then passed to the dual port RAM circuit 308 shown in Figure 3 for use by the main controller circuit 313 for either transferring in the voice-over-data mode of operation or for transfer to the personal computer for storage as compressed voice for functions such as telephone answering machine message data, for use in the multi-media documents and the like.
- voice control DSP/CODEC circuit 306 of Figure 3 will be receiving digital voice PCM data from the digital telephone CODEC circuit 305, compressing it and transferring it to dual port RAM circuit 308 for multiplexing and transfer over the telephone line. This is the transmit mode of operation of the voice control DSP/CODEC circuit 306 corresponding to transmitter block 1100 of Figure 4 and corresponding to the compression algorithm of Figure 4.
- the voice control DSP/CODEC circuit 306 is receiving compressed voice data packets from dual port RAM circuit 308, uncompressing the voice data and transferring the uncompressed and reconstructed digital PCM voice data to the digital telephone CODEC 305 for digital to analog conversion and eventual transfer to the user through the telephone interface 301, 302, 304.
- This is the receive mode of operation of the voice control DSP/CODEC circuit 306 corresponding to receiver block 1200 of Figure 4 and corresponding to the decompression algorithm of Figure 6.
- the voice-control DSP/CODEC circuit 306 is processing the voice data in both directions in a full-duplex fashion.
- the voice control DSP/CODEC circuit 306 operates at a clock frequency of approximately 24.576MHz while processing data at sampling rates of approximately 8KHz in both directions.
- the voice compression/decompression algorithms and packetization of the voice data are accomplished in a quick and efficient fashion to ensure that all processing is done in real-time without loss of voice information. This is accomplished in an efficient manner such that enough machine cycles remain in the voice control DSP circuit 306 during real time speech compression to allow real time acoustic and line echo cancellation in the same fixed-point DSP.
- PCM voice data from the μ-law digital telephone CODEC circuit 305 causes an interrupt in the voice control DSP/CODEC circuit 306 where the sample is loaded into internal registers for processing. Once loaded into an internal register it is transferred to a RAM address which holds a queue of samples. The queued PCM digital voice samples are converted from 8-bit μ-law data to a 13-bit linear data format using table lookup for the conversion. Those skilled in the art will readily recognize that the digital telephone CODEC circuit 305 could also be a linear CODEC.
- Sample Rate Decimation: the sampled and digitized PCM voice signals from the telephone μ-law CODEC 305 shown in Figure 3 are passed to the voice control DSP/CODEC circuit 306 via direct data lines clocked and synchronized to a clocking frequency.
- the sample rate in this embodiment of the present invention is 8Ksample/sec.
- the digital samples for the 9.6Kbit/sec and 8Kbit/sec algorithms are decimated using a digital decimation process to produce a 6.4Ksample/sec and 6Ksample/sec rate, respectively. For the 16Kbit/sec algorithm, no decimation is needed.
- the decimated digital samples are shown as speech entering the transmitter block 1100.
- the transmitter block is the mode of operation of the voice-control DSP/CODEC circuit 306 operating to receive local digitized voice information, compress it and packetize it for transfer to the main controller circuit 313 for transmission on the telephone line.
- the telephone line connected to telephone line interface 309 of Figure 3 corresponds to the channel 1111 of Figure 4.
- a frame rate for the voice compression algorithm is 20 milliseconds of speech for each compression. This correlates to 128 samples to process per frame for the 6.4K decimated sampling rate. When 128 samples are accumulated in the queue of the internal DSP RAM, the compression of that sample frame is begun.
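The relationship between the decimated sample rates and the 20 millisecond frame can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function name `frame_plan` and the rate table are illustrative, and the 120-sample and 160-sample frame sizes are derived here from the stated rates rather than quoted from the description.

```python
from fractions import Fraction

CODEC_RATE_HZ = 8000                 # sample rate delivered by the telephone CODEC
FRAME_SECONDS = Fraction(20, 1000)   # one 20 ms compression frame

# Decimated sample rate used by each compression algorithm (from the text).
DECIMATED_RATE_HZ = {
    "9.6Kbit/sec": 6400,
    "8Kbit/sec": 6000,
    "16Kbit/sec": 8000,  # no decimation is performed
}


def frame_plan(algorithm):
    """Return (decimation ratio, samples per 20 ms frame) for an algorithm."""
    rate = DECIMATED_RATE_HZ[algorithm]
    ratio = Fraction(rate, CODEC_RATE_HZ)
    samples = rate * FRAME_SECONDS
    if samples.denominator != 1:
        raise ValueError("frame does not hold a whole number of samples")
    return ratio, int(samples)


if __name__ == "__main__":
    for name in DECIMATED_RATE_HZ:
        ratio, samples = frame_plan(name)
        print(name, "ratio", ratio, "samples per frame", samples)
```

For the 9.6Kbit/sec algorithm this gives a 4/5 decimation ratio and the 128 samples per frame stated above.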
- the voice-control DSP/CODEC circuit 306 is programmed to first remove the DC component 1101 of the incoming speech.
- the DC removal is an adaptive function to establish a center base line on the voice signal by digitally adjusting the values of the PCM data. This corresponds to the DC removal stage 1203 of the software flow chart of Figure 5.
- the formula for removal of the DC bias or drift is as follows:
- n is the sample number
- s(n) is the current sample
- x(n) is the sample with the DC bias removed.
- a 6Ksample/sec output 1250 is produced for the 8Kbit/sec algorithm, and no decimation is performed on the 8Ksample/sec voice sample rate 1252 for the 16Kbit/sec algorithm.
- the analysis and compression begin at block 1201 where the 13-bit linear PCM speech samples are accumulated until 128 samples (for the 6.4Ksample/sec decimated sampling rate) representing 20 milliseconds of voice or one frame of voice are passed to the DC removal portion of code operating within the programmed voice control DSP/CODEC circuit 306.
- the DC removal portion of the code described above approximates the base line of the frame of voice by using an adaptive DC removal technique.
- a silence detection algorithm 1205 is also included in the programmed code of the DSP/CODEC 306.
- the silence detection function is a summation of the square of each sample of the voice signal over the frame. If the power of the voice frame falls below a preselected threshold, this would indicate a silent frame.
- the detection of a silence frame of speech is important for later multiplexing of the V-data (voice data) and C-data (asynchronous computer data) described below.
- the main controller circuit 313 will transfer conventional digital data (C-data) over the telephone line in lieu of voice data (V-data).
- n is the sample number
- x(n) is the sample value
- the present voice frame is flagged as containing silence.
- the 128-sample (Decimated Samples) silent frame is still processed by the voice compression algorithm; however, the silent frame packets are discarded by the main controller circuit 313 so that asynchronous digital data may be transferred in lieu of voice data.
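The silence detection described above, a sum of squared samples over one frame compared against a preselected threshold, can be sketched as follows. The threshold value is not given in the description, so the one used in the usage lines is purely hypothetical, as are the function names.

```python
FRAME_SIZE = 128  # decimated samples per 20 ms frame (9.6Kbit/sec algorithm)


def frame_energy(frame):
    """Sum of the square of each sample over the frame."""
    return sum(x * x for x in frame)


def is_silent(frame, threshold):
    """Flag a frame as silence when its energy falls below the threshold.

    The threshold is a tuning parameter; its value here is an assumption.
    """
    if len(frame) != FRAME_SIZE:
        raise ValueError("expected one full frame of samples")
    return frame_energy(frame) < threshold


if __name__ == "__main__":
    quiet = [0] * FRAME_SIZE
    loud = [100] * FRAME_SIZE
    print(is_silent(quiet, threshold=1000), is_silent(loud, threshold=1000))
```

A frame flagged this way would still be compressed, with the resulting packet discarded later by the main controller.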
- the rest of the voice compression is operated upon in segments where there are four segments per frame amounting to 32 samples of data per segment (Sub-Block Size). It is only the DC removal and silence detection which is accomplished over an entire 20 millisecond frame.
- the pre-emphasis 1207 of the voice compression algorithm shown in Figure 4 is the next step.
- the sub-blocks are first passed through a pre-emphasis stage which whitens the spectral content of the speech signal by balancing the extra energy in the low band with the reduced energy in the high band.
- the pre-emphasis essentially flattens the signal by reducing the dynamic range of the signal. By using pre-emphasis to flatten the dynamic range of the signal, less of a signal range is required for compression making the compression algorithm operate more efficiently.
- the formula for the pre-emphasis is
- the next step is the long-term prediction (LTP).
- the long-term prediction is a method to detect the innovation in the voice signal. Since the voice signal contains many redundant voice segments, we can detect these redundancies and only send information about the changes in the signal from one segment to the next. This is accomplished by comparing the speech samples of the current segment on a sample by sample basis to the reconstructed speech samples from the previous segments to obtain the innovation information and an indicator of the error in the prediction.
- the long-term predictor gives the pitch and the LTP-Gain of the sub-block which are encoded in the transmitted bit stream.
- This value is coded with 6-bits.
- the pitch for segments 0 and 3 is encoded with 6 bits
- the pitch for segments 1 and 2 is encoded with 5 bits.
- the Pitch j is chosen as that which maximizes ⁇ . Since ⁇ j is positive, only j with positive S ? is considered.
- the prev_pitch parameter in the above equation is the pitch of the previous sub-segment.
- the LTP-Gain is given by
- the value of β is a normalized quantity between zero and unity for this segment, where β is an indicator of the correlation between the segments. For example, a perfect sine wave would produce a β close to unity, since the current segment and the previous reconstructed segments should be an almost perfect match.
- the error signal is normalized with respect to the maximum amplitude in the sub-segment for vector-quantization of the error signal.
- the maximum amplitude in the segment is obtained as follows:
- the maximum amplitude (G) is encoded using the Gain Encode Table. This table is characterized in Table 17.
- the encoded amplitude (gcode) is transmitted to the far end. At the receiver, the maximum amplitude is retrieved from Table 18, as follows:
- the error signal e(n) is then normalized by
- BGCODE = 6 * gcode + bcode.
- the encoded bits for the G and LTP-Gain (β) at the receiver can be obtained as follows:
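The joint code BGCODE = 6 * gcode + bcode can be inverted with an integer division and a remainder. The sketch below assumes that bcode takes one of six values (0 through 5), which is implied by the factor 6 but not stated explicitly here; the function names are illustrative.

```python
BCODE_LEVELS = 6  # assumed from the factor 6 in BGCODE = 6 * gcode + bcode


def pack_bgcode(gcode, bcode):
    """Combine the gain code and the LTP-Gain code into one value."""
    if gcode < 0:
        raise ValueError("gcode must be non-negative")
    if not 0 <= bcode < BCODE_LEVELS:
        raise ValueError("bcode must lie in 0..5 under this assumption")
    return BCODE_LEVELS * gcode + bcode


def unpack_bgcode(bgcode):
    """Recover (gcode, bcode) at the receiver."""
    gcode, bcode = divmod(bgcode, BCODE_LEVELS)
    return gcode, bcode


if __name__ == "__main__":
    code = pack_bgcode(7, 3)
    print(code, unpack_bgcode(code))
```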
- Each segment of 32 samples (Sub-Block Size) is divided into 4 vectors of 8 samples (VSIZE) each.
- Each vector is compared to the vectors stored in the CodeBook and the Index of the Code Vector that is closest to the signal vector is selected.
- the CodeBook consists of 512 entries (512 addresses). The index chosen has the least difference between the input vector and the code book vector, with the minimization formula summing over samples 0 through VSIZE - 1, where
- x_i is the input vector of VSIZE samples (8 for the 9.6Kbit/sec algorithm)
- y_i is the code book vector of VSIZE samples (8 for the 9.6Kbit/sec algorithm).
- after quantization there are (Sub-Block Size / VSIZE) CodeBook indices per segment (4 CodeBook indices, 9 bits each, for the 9.6Kbit/sec algorithm). Therefore, for the 9.6Kbit/sec algorithm, for each Sub-Block Size segment we will transmit 36 bits representing that segment.
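A code book search of this kind can be sketched as an exhaustive nearest-neighbor search. The distance measure used below, a sum of squared differences over the VSIZE samples, is an assumption, since the minimization formula itself is not legible in this text; the three-entry code book in the usage lines is a toy stand-in for the 512-entry table.

```python
VSIZE = 8            # samples per vector
SUB_BLOCK_SIZE = 32  # samples per segment
INDEX_BITS = 9       # bits per CodeBook index for a 512-entry code book


def nearest_index(vector, codebook):
    """Return the index of the code vector closest to the input vector."""
    best_index, best_error = 0, float("inf")
    for index, code in enumerate(codebook):
        # Assumed distance: squared error summed over the VSIZE samples.
        error = sum((x - y) ** 2 for x, y in zip(vector, code))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def quantize_segment(segment, codebook):
    """Split one segment into VSIZE-sample vectors and quantize each."""
    if len(segment) != SUB_BLOCK_SIZE:
        raise ValueError("expected one full sub-block")
    return [
        nearest_index(segment[start:start + VSIZE], codebook)
        for start in range(0, SUB_BLOCK_SIZE, VSIZE)
    ]


if __name__ == "__main__":
    toy_codebook = [[0.0] * VSIZE, [1.0] * VSIZE, [-1.0] * VSIZE]
    segment = [0.9] * 8 + [-0.8] * 8 + [0.1] * 8 + [1.2] * 8
    indices = quantize_segment(segment, toy_codebook)
    print(indices, "bits per segment:", len(indices) * INDEX_BITS)
```

With four indices of 9 bits each, the sketch reproduces the 36 bits per segment stated for the 9.6Kbit/sec algorithm.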
- the input speech samples are replaced by the corresponding vectors in the chosen indexes. These values are then multiplied by G_q to denormalize the synthesized error signal, e'(n). This signal is then passed through the Inverse Pitch Filter to reintroduce the Pitch effects that were taken out by the Pitch filter.
- the Inverse Pitch Filter is performed as follows:
- β is the Quantized LTP-Gain from Table 16, and j is the Lag.
- the Inverse Pitch Filter output is used to update the synthesized speech buffer which is used for the analysis of the next sub-segment.
- the update of the state buffer is as follows:
- the signal is then passed through the deemphasis filter since preemphasis was performed at the beginning of the processing.
- the preemphasis state is updated so that we properly satisfy the Analysis-by-Synthesis method of performing the compression.
- the output of the deemphasis filter, s'(n), is passed on to the D/A to generate analog speech.
- the voice is reconstructed at the receiving end of the voice-over data link according to the reverse of the compression algorithm as shown as the decompression algorithm in Figure 6.
- if a silence frame is received, the decompression algorithm simply discards the received frame and initializes the output with zeros. If a speech frame is received, the pitch, LTP-Gain and GAIN are decoded as explained above.
- the error signal is reconstructed from the codebook indexes, which is then denormalized with respect to the GAIN value. This signal is then passed through the Inverse filter to generate the reconstructed signal.
- the Pitch and the LTP-Gain are the decoded values, same as those used in the Analysis.
- the filtered signal is passed through the Deemphasis filter whose output is passed on to the D/A to put out analog speech.
- the compressed frame contains 23 8-bit words and one 6-bit word.
- Comp_Frame[8]: VQ0 bits 7 through 0 (LS 8 bits of VQ[0])
- Comp_Frame[9]: VQ1 bits 7 through 0 (LS 8 bits of VQ[1])
- Appendix A includes the code book data for the 8Kbit/sec algorithm
- Appendix B includes the code book data for the 9.6Kbit/sec algorithm
- Appendix C includes the code book data for the 16Kbit/sec algorithm. Table 20 describes the format of the code book for the 9.6Kbit/sec algorithm.
- the code book values in the Appendices are stored in a signed floating point format which is converted to a fixed point representation of floating point number when stored in the lookup tables of the present invention.
- the code book comprises a table of nine columns and 512 rows of floating point data.
- the first 8 columns correspond to the 8 samples of speech and the ninth entry is the precomputed constant described above as $-\tfrac{1}{2}\sum_i y_i^2$.
- An example of the code book data is shown in Table 21 with the complete code book for the 9.6Kbit/sec algorithm described in Appendix B.
- the code books are stored in PROM memory accessible by the Voice DSP as a lookup table.
- the table data is loaded into local DSP memory upon the selection of the appropriate algorithm to increase access speed.
- the code books comprise a table of data in which each entry is a sequential address from 000 to 511.
- for the 9.6Kbit/sec algorithm, a 9 X 512 code book is used.
- for the 16Kbit/sec algorithm, a 6 X 256 code book is used and for the 8Kbit/sec algorithm, a 9 X 512 code book is used.
- the corresponding code book is used to encode/decode the speech samples.
- the code books are generated statistically by encoding a wide variety of speech patterns.
- the code books are generated in a learning mode for the above-described algorithm in which each speech segment which the compression algorithm is first exposed to is placed in the code book until 512 entries are recorded. Then the algorithm is continually fed a variety of speech patterns upon which the code book is adjusted. As new speech segments are encountered, the code book is searched to find the best match. If the error between the observed speech segment and the code book values exceeds a predetermined threshold, then the closest speech segment in the code book and the new speech segment are averaged and the new average is placed in the code book in place of the closest match. In this learning mode, the code book is continually adjusted to have the lowest difference ratio between observed speech segment values and code book values. The learning mode of operation may take hours or days of exposure to different speech patterns to adjust the code books to the best fit.
- the code books may be exposed to a single person's speech which will result in a code book being tailored to that particular person's method of speaking.
- the speech patterns of a wide variety of speakers of both genders are exposed to the code book learning algorithm for the average fit for a given language.
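The learning mode described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used to build the Appendices: the error measure (squared error), the threshold value and the function names are all hypothetical, and a tiny capacity is used in the usage lines in place of 512 entries.

```python
def squared_error(a, b):
    """Assumed error measure between a speech segment and a code book entry."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(segments, capacity=512, threshold=1.0):
    """Fill the code book, then adjust it as new segments are observed.

    Until `capacity` entries are recorded, each new segment is stored as is.
    Afterwards the closest entry is found; if its error exceeds `threshold`,
    that entry is replaced by the average of itself and the new segment.
    """
    codebook = []
    for segment in segments:
        segment = list(segment)
        if len(codebook) < capacity:
            codebook.append(segment)
            continue
        closest = min(
            range(len(codebook)),
            key=lambda i: squared_error(segment, codebook[i]),
        )
        if squared_error(segment, codebook[closest]) > threshold:
            codebook[closest] = [
                (old + new) / 2 for old, new in zip(codebook[closest], segment)
            ]
    return codebook


if __name__ == "__main__":
    print(train_codebook([[0, 0], [10, 10], [4, 4]], capacity=2))
```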
- V-data: digitized voice data
- the V-data and C-data multiplex transmission is achieved in two modes (transmit and receive) at two levels (the data service level and the multiplex control level). This operation is shown diagrammatically in Figure 8.
- the main controller circuit 313 of Figure 3 operates in the data service level 1505 to collect and buffer data from both the personal computer 10 (through the RS232 port interface 315) and the voice control DSP 306.
- at the multiplex control level 1515, the main controller circuit 313 multiplexes the data and transmits that data out over the phone line 1523.
- the main controller circuit 313 operates in the multiplex control level 1515 to de-multiplex the V-data packets and the C-data packets and then operates in the data service level 1505 to deliver the appropriate data packets to the correct destination: the personal computer 10 for the C-data packets or the voice control DSP circuit 306 for V-data.
- there are two data buffers, the V-data buffer 1511 and the C-data buffer 1513, implemented in the main controller RAM 316 and maintained by main controller 313.
- when the voice control DSP circuit 306 engages voice operation, it will send a block of V-data every 20 ms to the main controller circuit 313 through dual port RAM circuit 308.
- Each V-data block has one sign byte as a header and 24 bytes of V-data.
- the sign byte header of the voice packet is transferred every frame from the voice control DSP to the controller 313.
- the sign byte header contains the sign byte which identifies the contents of the voice packet.
- the sign byte is defined as follows:
- the main controller circuit 313 operates at the data service level to perform the following tests.
- when the voice control DSP circuit 306 starts to send the 24-byte V-data packet through the dual port RAM to the main controller circuit 313, the main controller will check the V-data buffer to see if the buffer has room for 24 bytes. If there is sufficient room in the V-data buffer, the main controller will check the sign byte in the header preceding the V-data packet. If the sign byte is equal to one (indicating voice information in the packet), the main controller circuit 313 will put the following 24 bytes of V-data into the V-data buffer and clear the silence counter to zero. Then the main controller 313 sets a flag to request that the V-data be sent by the main controller at the multiplex control level.
- if the sign byte instead indicates silence, the main controller circuit 313 will increase the silence counter by 1 and check if the silence counter has reached 5. When the silence counter reaches 5, the main controller circuit 313 will not put the following 24 bytes of V-data into the V-data buffer and will stop increasing the silence counter. By this method, the main controller circuit 313 operating at the service level will only provide non-silence V-data to the multiplex control level, while discarding silence V-data packets and preventing the V-data buffer from being overwritten.
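The data service level tests described above can be sketched as a small class. Several details are assumptions made for illustration: the class and method names are hypothetical, a packet that arrives when the buffer lacks room is simply dropped, and the first four consecutive silence packets are buffered like voice packets (the description only states what happens once the counter reaches 5).

```python
PACKET_BYTES = 24   # V-data payload per packet
SILENCE_LIMIT = 5   # silence counter value at which packets are discarded
SIGN_VOICE = 1      # sign byte value indicating voice information


class VDataService:
    """Sketch of the transmit-side data service level for V-data."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.buffer = bytearray()
        self.silence_count = 0
        self.send_requested = False

    def on_packet(self, sign, payload):
        """Handle one packet; return True if it was placed in the buffer."""
        if len(payload) != PACKET_BYTES:
            raise ValueError("V-data packets carry exactly 24 bytes")
        if self.capacity - len(self.buffer) < PACKET_BYTES:
            return False  # assumed: no room, drop rather than overwrite
        if sign == SIGN_VOICE:
            self.silence_count = 0
        else:
            if self.silence_count < SILENCE_LIMIT:
                self.silence_count += 1
            if self.silence_count >= SILENCE_LIMIT:
                return False  # sustained silence: discard the packet
        self.buffer += payload
        self.send_requested = True  # request service at the multiplex level
        return True


if __name__ == "__main__":
    service = VDataService(capacity_bytes=PACKET_BYTES * 10)
    print([service.on_packet(0, bytes(PACKET_BYTES)) for _ in range(6)])
    print(service.on_packet(SIGN_VOICE, bytes(PACKET_BYTES)))
```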
- the operation of the main controller circuit 313 in the multiplex control level is to multiplex the V-data and C-data packets and transmit them through the same channel.
- both types of data packets are transmitted by the HDLC protocol in which data is transmitted in synchronous mode and checked by CRC error checking. If a V-data packet is received at the remote end with a bad CRC, it is discarded since 100% accuracy of the voice channel is not ensured. If the V-data packets were re-sent in the event of corruption, the real-time quality of the voice transmission would be lost.
- the C-data is transmitted following a modem data communication protocol such as CCITT V.42.
- a V-data block includes a maximum of five V-data packets.
- if the V-data buffer holds more than 24 bytes of V-data, the transmit block counter is set to 1 and transmission of V-data starts. This means that the main controller circuit will only transmit one block of V-data. If the V-data buffer holds less than 24 bytes of V-data, the main controller circuit services the transmission of C-data. During the transmission of a C-data block, the V-data buffer condition is checked before transmitting the first C-data byte. If the V-data buffer contains more than one V-data packet, the current transmission of the C-data block will be terminated in order to handle the V-data.
- the main controller circuit 313 operates at the multiplex control level to de-multiplex received data to V-data and C-data.
- the type of block can be identified by checking the first byte of the incoming data blocks.
- before receiving a block of V-data, the main controller circuit 313 will initialize a receive V-data byte counter, a backup pointer and a temporary V-data buffer pointer.
- the value of the receive V-data byte counter is 24, the value of the receive block counter is 0 and the backup pointer is set to the same value as the V-data receive buffer pointer. If the received byte is not equal to 80 hex (80h indicating a V-data packet), the receive operation will follow the current modem protocol since the data block must contain C-data.
- the main controller circuit 313 operating in receive mode will process the V-data.
- when a byte of V-data is received, the byte of V-data is put into the V-data receive buffer, the temporary buffer pointer is increased by 1 and the receive V-data counter is decreased by 1. If the V-data counter is down to zero, the value of the temporary V-data buffer pointer is copied into the backup pointer buffer. The total V-data counter is increased by 24 and the receive V-data counter is reset to 24. The value of the receive block counter is increased by 1. A flag to request service of V-data is then set.
- the main controller circuit 313 will not put the incoming V-data into the V-data receive buffer but will throw it away. If the total V-data counter has reached its maximum value, the receiver will not put the incoming V-data into the V-data receive buffer but will throw it away.
- the main controller circuit 313 operating in the multiplex control level will not check the result of the CRC but instead will check the value of the receive V-data counter. If the value is zero, the check is finished, otherwise the value of the backup pointer is copied back into the current V-data buffer pointer.
- the receiver is thereby assured of de-multiplexing the V-data from the receiving channel 24 bytes at a time.
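The receive-side bookkeeping (byte counter, backup pointer, roll-back of a partial packet) can be sketched as follows. The class and function names are hypothetical, the temporary buffer pointer is represented by the current buffer length, and the end-of-block check treats "byte counter back at its reset value of 24" as meaning that no partial packet is pending. That last point is an interpretation of the description, not a quotation of it.

```python
PACKET_BYTES = 24
V_DATA_HEADER = 0x80  # first byte identifying a V-data block


class VDataReceiver:
    """Sketch of the receive-side multiplex control level for V-data."""

    def __init__(self):
        self.buffer = bytearray()      # V-data receive buffer
        self.committed = 0             # backup pointer: end of last whole packet
        self.remaining = PACKET_BYTES  # receive V-data byte counter
        self.blocks = 0                # receive block counter
        self.service_requested = False

    def on_byte(self, byte):
        self.buffer.append(byte)
        self.remaining -= 1
        if self.remaining == 0:
            # A whole 24-byte packet has arrived: commit it.
            self.committed = len(self.buffer)
            self.remaining = PACKET_BYTES
            self.blocks += 1
            self.service_requested = True

    def on_block_end(self):
        # Roll back to the backup pointer if a partial packet is pending.
        if self.remaining != PACKET_BYTES:
            del self.buffer[self.committed:]
            self.remaining = PACKET_BYTES


def receive_block(receiver, block):
    """Route one incoming block by its first byte."""
    if not block or block[0] != V_DATA_HEADER:
        return "C-data"  # handled by the regular modem protocol
    for byte in block[1:]:
        receiver.on_byte(byte)
    receiver.on_block_end()
    return "V-data"


if __name__ == "__main__":
    rx = VDataReceiver()
    block = bytes([V_DATA_HEADER]) + bytes(range(24)) + bytes(10)
    print(receive_block(rx, block), len(rx.buffer), rx.blocks)
```

In the usage lines the ten trailing bytes form an incomplete packet, so they are rolled back and only the complete 24-byte packet remains in the buffer.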
- the main controller circuit 313 operating at the service level in the receive mode will monitor the flag of request service of V-data. If the flag is set, the main controller circuit 313 will get the V-data from the V-data buffer and transmit it to the voice control DSP circuit 306 at a rate of 24 bytes at a time. After sending a block of V-data, it subtracts 24 from the value in the total V-data counter.
- the modem hardware component 20 incorporates a modified packet protocol for negotiation of the speech compression rate.
- a modified supervisory packet is formatted using the same open flag, address, CRC, and closing flag formatting bytes which are found in the CCITT V.42 standard data supervisory packet, as is well known in the industry and as is described in the CCITT Blue Book, volume VIII entitled Data Communication over the Telephone Network, 1989, referenced above.
- the set of CCITT standard header bytes (control words) has been extended to include nonstandard control words used to signal transmission of a nonstandard communication command.
- the nonstandard control word does not conflict with other data communication terminals, for example, when communicating with a non-PCS (Personal Communications System) modem system, since the nonstandard packet will be ignored by a non-PCS system.
- Table 22 offers one embodiment of the present invention showing a modified supervisory packet structure. Table 22 omits the CCITT standard formatting bytes: open flag, address, CRC, and closing flag; however, these bytes are described in the CCITT standard.
- the modified supervisory packet is distinguished from a V.42 standard packet by using a nonstandard control word, such as 80 hex, as the header. The nonstandard control word does not conflict with V.42 standard communications.
- the modified supervisory packet is transmitted by the HDLC protocol in which data is transmitted in synchronous mode and checked by CRC error checking.
- the use of a modified supervisory packet eliminates the need for an escape command sent over the telephone line to interrupt data communications, providing an independent channel for negotiation of the compression rate.
- the channel may also be used as an alternative means for programming standard communications parameters.
- the modified supervisory packet is encoded with different function codes to provide an independent communications channel between hardware components. This provides a means for real time negotiation and programming of the voice compression rate during uninterrupted transmission of voice data and conventional data without the need for conventional escape routines.
- the modified supervisory packet is encoded with a function code using several embodiments.
- the function code is embedded in the packet as one of the data words and is located in a predetermined position.
- the supervisory packet header signals a nonstandard supervisory packet and contains the compression rate to be used between the sites. In such an embodiment, for example, a different nonreserved header is assigned to each function code.
- a system consisting of PCS modem 20 and data terminal 10 is connected via phone line 30 to a second PCS system comprised of PCS modem 20A and data terminal 10A. Therefore, calling modem 20 initializes communication with receiving modem 20A.
- a speech compression command is sent via a modified supervisory data packet as the request for speech compression algorithm and ratio negotiation. Encoded in the speech compression command is the particular speech compression algorithm and the speech compression ratio desired by the calling PCS modem 20.
- the first data byte of the modified supervisory packet could be used to identify the speech compression algorithm using a binary coding scheme (e.g., 00h for Vector Quantization, 01h for CELP+, 02h for VCELP, and 03h for TrueSpeech, etc.).
- a second data byte could be used to encode the speech compression ratio (e.g., 00h for 9.6Kbit/sec, 01h for 16Kbit/sec, 02h for 8Kbit/sec, etc.).
- This embodiment of the speech compression command supervisory packet is shown in Table 23.
- the function code could be stored in a predetermined position of one of the packet data bytes.
- Other function code encoding embodiments are possible without deviating from the scope and spirit of the present invention and the embodiments offered are not intended to be exclusive or limiting embodiments.
- the receiving PCS modem 20A will recognize the speech compression command and will respond with an acknowledge packet using, for instance, a header byte such as hex 81.
- the acknowledge packet will alert the calling modem 20 that the speech compression algorithm and speech compression ratio selected are available by use of the ACK field of the supervisory packet shown in Table 23. Receipt of the acknowledge supervisory packet causes the calling modem 20 to transmit subsequent voice over data information according to the selected speech compression algorithm and compression ratio.
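The negotiation exchange can be illustrated with the example codes given above. The sketch below is hypothetical in its packet layout: it models only a header byte followed by the two data bytes, omits the open flag, address, CRC and closing flag, and reduces the acknowledge packet to its header byte because the ACK field layout of Table 23 is not reproduced here. The function names are illustrative.

```python
HEADER_COMMAND = 0x80  # example nonstandard control word from the text
HEADER_ACK = 0x81      # example acknowledge header byte from the text

ALGORITHMS = {
    0x00: "Vector Quantization",
    0x01: "CELP+",
    0x02: "VCELP",
    0x03: "TrueSpeech",
}
RATES = {0x00: "9.6Kbit/sec", 0x01: "16Kbit/sec", 0x02: "8Kbit/sec"}


def build_command(algorithm, rate):
    """Encode a speech compression command (assumed 3-byte layout)."""
    algorithm_code = {name: code for code, name in ALGORITHMS.items()}[algorithm]
    rate_code = {name: code for code, name in RATES.items()}[rate]
    return bytes([HEADER_COMMAND, algorithm_code, rate_code])


def parse_command(packet):
    """Decode a command; return None for anything unrecognized."""
    if len(packet) != 3 or packet[0] != HEADER_COMMAND:
        return None
    algorithm = ALGORITHMS.get(packet[1])
    rate = RATES.get(packet[2])
    if algorithm is None or rate is None:
        return None
    return algorithm, rate


def respond(packet, supported):
    """Answering side: acknowledge only a supported algorithm and rate."""
    request = parse_command(packet)
    if request is not None and request in supported:
        return bytes([HEADER_ACK])
    return None


if __name__ == "__main__":
    command = build_command("Vector Quantization", "9.6Kbit/sec")
    print(command.hex(), respond(command, {("Vector Quantization", "9.6Kbit/sec")}))
```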
- the frequency with which the speech compression command supervisory packet is transmitted will vary with the application. For moderate quality voice over data applications, the speech compression algorithm need only be negotiated at the initialization of the phone call. For applications requiring more fidelity, the speech compression command supervisory packet is renegotiated throughout the call to accommodate new parties to the communication which have different speech compression algorithm limitations or to actively tune the speech compression ratio as the quality of the communications link fluctuates.
- Other embodiments provide varying transmission rates of the speech compression command supervisory packet and different methods of speech compression algorithm and compression ratio negotiation. Additionally, other encoding embodiments to encode the supervisory packet speech compression algorithm and the speech compression ratio may be incorporated without deviating from the scope and spirit of the present invention, and the described embodiments are not exclusive or limiting.
- a new supervisory packet may be allocated for use as a means for negotiating the multiplexing scheme for the various types of information sent over the communications link. For example, if voice over data mode is employed, there exist several methods for multiplexing the voice and digital data.
- the multiplexing scheme may be selected by using a modified supervisory packet, called a multiplex supervisory packet, to negotiate the selection of multiplexing scheme.
- a remote control supervisory packet could be designated for remote control of another hardware device.
- a remote control supervisory packet could be encoded with the necessary selection parameters needed to program the remote device.
- Mode Switching System: Referring again to Figure 1, consider the case where a first user on modem 20 has established analog voice communications with a second user at remote modem 20a. As shown in Figure 9, the first user and second user may wish to establish either digital data communications or voice over data communications without terminating the existing analog voice telephone connection.
- the term "digital data link" will be used to describe the digital link established to commence either a digital data communications mode, a voice over data communications mode, or a combination of the two modes.
- in digital data communications mode the modem transmits digital data, and in voice over data communications mode the modem transmits multiplexed packetized voice and data packets.
- the termination of the digital data link results in an exit by hang up or by return to analog voice mode.
- the users begin in the analog voice mode
- a digital data link is initiated 410 by the method and apparatus described herein.
- the digital data link is established 414.
- the users may enter a digital data communications mode 420, a voice over data communications mode 430, or a sequential combination of the two modes, as shown in Figure 9.
- the users may exit 440 by hanging up the telephone lines 450 or by reentering analog voice mode 400.
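The mode sequence described above lends itself to a small state table. The sketch below uses the reference numerals from the text as enum values; the exact set of allowed transitions is an assumption reconstructed from the prose, since Figure 9 itself is not reproduced here, and the names `Mode`, `TRANSITIONS` and `follow` are illustrative.

```python
from enum import Enum


class Mode(Enum):
    ANALOG_VOICE = 400
    LINK_INITIATED = 410
    LINK_ESTABLISHED = 414
    DIGITAL_DATA = 420
    VOICE_OVER_DATA = 430
    EXIT = 440
    HUNG_UP = 450


# Assumed transition table inferred from the description.
TRANSITIONS = {
    Mode.ANALOG_VOICE: {Mode.LINK_INITIATED},
    Mode.LINK_INITIATED: {Mode.LINK_ESTABLISHED},
    Mode.LINK_ESTABLISHED: {Mode.DIGITAL_DATA, Mode.VOICE_OVER_DATA},
    Mode.DIGITAL_DATA: {Mode.VOICE_OVER_DATA, Mode.EXIT},
    Mode.VOICE_OVER_DATA: {Mode.DIGITAL_DATA, Mode.EXIT},
    Mode.EXIT: {Mode.HUNG_UP, Mode.ANALOG_VOICE},
    Mode.HUNG_UP: set(),
}


def follow(path):
    """Validate a sequence of modes and return the final one."""
    current = path[0]
    for nxt in path[1:]:
        if nxt not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
        current = nxt
    return current


if __name__ == "__main__":
    session = [
        Mode.ANALOG_VOICE, Mode.LINK_INITIATED, Mode.LINK_ESTABLISHED,
        Mode.DIGITAL_DATA, Mode.VOICE_OVER_DATA, Mode.EXIT, Mode.ANALOG_VOICE,
    ]
    print(follow(session).name)
```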
- the numberings shown in Figure 3 shall be used to indicate the components of modem 20, and similar numbering shall be used to indicate modem 20a by attaching an "a" suffix to each component of Figure 3.
- the main controller of modem 20 is controller 313, whereas the main controller of modem 20a is controller 313a (not shown).
- the first user and the second user establish the digital data link by pressing a hardwired switch 330 located on modem 20 and a similar switch 330a located on modem 20a, at approximately the same time.
- Switch 330 is shown in Figure 3 as one means for initiating digital data link.
- Controller 313 determines whether its modem is originating or answering based on whether it receives an originate signal 332 or an answer signal 334, as predetermined by the users.
- Controller 313 of modem 20 detects when the switch 330 is pressed by the first user and controller 313a of modem 20a detects when the switch 330a is pressed by the second user.
- Both modems 20 and 20a execute software to establish a digital link through handshaking protocols specified in CCITT v-series modem protocols (some examples are v.22, v.22bis, v.32 and v.34 protocols).
- the users initiate digital data communications by a software switch which is selected from a menu of options displayed on computers 10 and 10a, respectively.
- the software switch also contains options for each user to select their originating or answer status.
- an analog voice connection is established 500 and when both modems detect the pressed switches 330 and 330a, the modems are placed in handshaking mode.
- the designation of originating modem and answering modem is predefined by the users before entering into handshaking mode 510.
- the digital data link is established 530 and the modems may enter either the digital data communications mode for digital data transfer or the voice over data communications mode for multiplexed voice and data packet transfer 540.
- the exit routines 550 will be discussed in further detail below (see Figure 15).
- the switch between analog voice mode and the digital data link modes is accomplished using a switching signal.
- Both modems are preprogrammed to idle in an origination state 600 prior to the analog voice connection 610 and both modems have a hardware mode switch to force the modem into an answer state 620.
- the analog voice communications are conducted normally and without interruption. If the hardware switch is depressed on one of the modems, that modem (e.g., modem A) will enter an answer mode and transmit an answer tone, which is used as a switching signal 630.
- the answer tone is detected by the modem which is still in origination mode (modem B) 640 and the originating modem and the answering modem handshake with the originator/answer designation forced by the user depressing the hardware mode switch 650.
- the digital link is thereby established 660 and digital data communications and voice over data communications are operable 670.
- a variation of this embodiment occurs when both modems idle in the answer state 700 and the mode switch is used to force one of the modems into an originator mode 720.
- the originating modem thereby transmits a calling tone which is used as a switching signal 730.
- the answering modem detects the calling tone and responds with an answering tone 740, and the modems handshake 750 with the originator/answer designation forced by the hardware mode switch.
- the digital link is thereby established 760 and digital data communications and voice over data communications are operable 770.
- the modems are idling with a software routine designed to poll telephone line interface 309 in order to detect transmission of a predetermined switching tone sent from another modem.
- Both modems include a mode switch that has both an origination and an answer mode selection.
- Figure 13 shows that when a user depresses the mode switch to force one modem into the origination mode 810 and 820, the other modem detects the calling signal generated by that originating modem and the resident software forces the second modem into an answering mode 830. In this case the mode signal is the calling signal. If the user depresses the mode switch to force the first modem into the answering mode, the first modem generates an answer tone which is the switching signal 840. The answer tone is decoded by the second modem and the software on the second modem forces that modem into an origination mode 850.
- the last three embodiments eliminate the need for both operators to predetermine which modem will be originating and which modem will be answering. It also provides the users with the ability to unilaterally establish a digital data link.
- the answer tone is a 2100 Hz tone and the calling tone is a 1300 Hz tone.
- DTMF dual-tone multifrequency
- Another example incorporates the use of a sequence of DTMF tones to be decoded as a mode switching signal, in place of a single switching tone.
- both modems are preprogrammed to monitor telephone line interface 309 in order to detect the switching tone using codec/DSP 311.
- the switching tone is detected using DSP 306.
- Alternate embodiments include signal debouncing means to eliminate accidental triggering of the modems into the handshake mode.
- the mode switch is actually a software switch, which is operated by the user at the terminal attached to the modem.
- One embodiment provides a digital data link between modems 20 and 20a by the use of ATD and ATA modem commands to place the modems in the handshaking mode.
- This method and apparatus does not require hardware switches 330 or 330a, but does require that both users predetermine which will be an originating modem and which will be an answering modem, as shown in Figure 14, step 910.
- ATD dialing
- ATA ATA
- Transmission of these commands may be initiated with either a software command or a hardware switching device which generates the ATD/ATA commands and transfers the appropriate command to their respective modems.
- controller 313 of modem 20 receives the command and places modem 20 in answering mode.
- an ATA command is issued to modem 20a which places modem 20a in handshaking mode and initiates an answer tone 930.
- Modem 20, which is in the origination mode, receives the answer tone generated by modem 20a and initiates digital data communications through handshaking according to CCITT V-series modem protocols 940, 950.
- the modems establish communication parameters during handshaking. Some of the communications parameters negotiated include baud rate and digital data protocols. Those skilled in the art will readily recognize that other protocols may be substituted without departing from the scope and spirit of this embodiment of the present invention.
- parameter negotiation is performed by the modified supervisory packet as described in the above-mentioned US Patent
- voice over data communications are performed by establishing the digital data link as described above and then incorporating a supervisory packet to signal the voice over data communications mode as described in the copending US Patent No. 5,535,204, issued July 9, 1996 entitled “RINGDOWN AND RINGBACK SIGNALLING FOR A COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM", which was incorporated by reference, above.
- establishment of the digital data link may automatically invoke the voice over data communications mode which additionally incorporates advanced priority statistical multiplexing (APSM).
- APSM is described in the copending US Patent Application Serial Number 08/349,505 filed December 2, 1994 entitled “VOICE OVER DATA CONFERENCING FOR A COMPUTER-BASED PERSONAL COMMUNICATIONS SYSTEM", which is hereby incorporated by reference.
- APSM allows the digital link to use as much bandwidth for voice as is necessary, and the remaining bandwidth is dynamically allocated to digital data communications, thus eliminating the need to switch between digital data transmission mode and a voice over data transmission mode.
- Yet another embodiment switches between the digital data communications mode and the voice over data communications mode by using special mode switching codes transmitted by the users of the modems 20 and 20a after the digital link is established.
- the modem 20a may negotiate communications parameters such as speech compression ratio and voice algorithm selection using the modified supervisory packet as detailed in the above-mentioned US Patent Application Serial Number 08/271,496 filed July 7, 1994 entitled "VOICE OVER DATA MODEM WITH SELECTABLE VOICE COMPRESSION".
- One Example of Establishing The Digital Data Link Using Calling Tones: Alternate methods and apparatus may be employed to switch from voice analog mode to digital data communications mode or voice over data communications mode.
- modem 20 is the originating modem and modem 20a is the answering modem, and a 1300 Hz calling tone is used to initiate transfer from analog voice mode to digital communications mode.
- the originating modem (modem 20 in this instance) is programmed to transmit a 1300 Hz calling tone to originate contact with an answering modem and codec/DSP 311 is programmed to detect the 2100 Hz answer tone received from the answering modem (modem 20a in this instance).
- the originating modem (20) transmits the 1300 Hz calling tone and the answering modem (20a) transmits a 2100 Hz answering tone, which is detected by the originating modem (20).
- the originating modem and the answering modem begin handshaking to establish the digital data link.
- audible tones may be substituted without departing from the scope and spirit of the present invention and other hardware may be configured to detect the audible signal.
- the codec/DSP of the originating modem is preprogrammed to detect the 2100 Hz answer tone.
- a dedicated detector is added to the hardware to detect the answering tone and signal the modem electronics that a digital link is being initiated.
- both modems are constantly monitoring their respective telephone line interfaces (309 and 309a) using codec/DSP (311 and 311a) to detect an audio calling signal.
- Specialized DTMF tones may be used to initiate the establishment of the digital data link while in analog voice mode.
- the user manually enters a predetermined DTMF tone from the telephone keypad during the analog voice connection to initiate establishment of the digital link.
- a DTMF tone sequence is detected to switch from analog voice mode to digital data link mode.
- the modem software is preprogrammed to dial the numbers in order to generate the DTMF tone sequence.
- DSP circuit 311 is preprogrammed to recognize a special DTMF tone sequence which initiates the establishment of the digital data link between the originating and answering modems.
- DSP 306 is preprogrammed to learn and recognize verbal commands which are issued by a first operator to enter digital data communications mode or voice over data communications mode.
- the commands may be understood by both the local modem and the remote site modem since both modems are connected to a common analog voice connection.
- the commands are executed automatically upon recognition by DSPs 306.
- codec/DSP 312 monitors telephone line interface 309 to detect predetermined voice commands to establish digital data communications and voice over data communications.
- Facsimile Mode Using Facsimile Tone
- Yet another embodiment incorporates an 1100 Hz facsimile audible tone for switching from analog voice mode to facsimile mode.
- the detection of the 1100 Hz facsimile signal is accomplished by monitoring telephone line interface 309 using codec/DSP 311 and switching to facsimile mode upon signal detection.
- exit from the digital data link 550 is performed digitally, by encoding a special Hangup Command Packet 1010.
- an exit command 1030 is performed using the supervisory packet with a Return to Analog Voice Mode (RAV) command 1040 to signal end of digital communications.
- RAV Analog Voice Mode
- a special RAV audible tone is generated to signal return to analog voice mode and disable the modems 1050, 1060.
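The single-frequency switching tones listed above (1300 Hz calling tone, 2100 Hz answer tone, 1100 Hz facsimile tone) could be detected in software with a Goertzel filter. The following is a minimal sketch only, not the detection method of the codec/DSP described here: the 8000 Hz sampling rate, the 400-sample block length, the 0.5 energy-share threshold, and all function and key names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed sampling rate, not specified in the text

# Tone frequencies named in the text; the dictionary keys are illustrative.
SWITCHING_TONES = {"calling": 1300.0, "answer": 2100.0, "fax": 1100.0}


def make_tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Generate a unit-amplitude sine wave, used here only as test input."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def goertzel_power(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Squared DFT magnitude of `samples` at `freq_hz` via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def classify_tone(samples, min_share=0.5, sample_rate=SAMPLE_RATE):
    """Return the name of the switching tone that dominates the block, or None."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None
    best_name, best_share = None, 0.0
    for name, freq in SWITCHING_TONES.items():
        # For a pure tone centered on a DFT bin, this ratio is close to 1.0.
        power = goertzel_power(samples, freq, sample_rate)
        share = 2.0 * power / (len(samples) * energy)
        if share > best_share:
            best_name, best_share = name, share
    return best_name if best_share >= min_share else None
```

With a 400-sample block at 8000 Hz, all three tones fall on exact 20 Hz analysis bins, so `classify_tone(make_tone(2100.0, 400))` reports the answer tone while speech-band energy at other frequencies is rejected by the share threshold. A debouncing step, as mentioned above, could require the same result over several consecutive blocks.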
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Engineering & Computer Science (AREA)
- Telephonic Communication Services (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A personal communications system enables the operator to simultaneously transmit voice and data communication to a remote site. The personal communications system is equipped with two telephone line interfaces to allow connection between two remote sites. The connection between the first remote site and the second remote site may operate in an analog voice mode, a digital data communications mode, and a voice over data communications mode. A switch between analog voice mode and digital data communications mode and analog voice mode and voice over data communications mode is performed using switching tones, including calling tones, answer tones, and DTMF tones. Hardware and software switches are also used to program the modems in the personal communication systems for originating and answering modes.
Description
MODE SWITCHING SYSTEM FOR A VOICE OVER DATA MODEM
This patent application is a Continuation-In-Part of US Patent Application Serial Number 08/346,421 filed November 29, 1994 entitled "DYNAMIC SELECTION OF COMPRESSION RATE FOR A VOICE COMPRESSION ALGORITHM IN A VOICE OVER DATA MODEM", the complete application of which is incorporated by reference, which application is also a Continuation-In-Part of US Patent Application Serial Number 08/271,496 filed July 7, 1994 entitled "VOICE OVER DATA MODEM WITH SELECTABLE VOICE COMPRESSION", the complete application of which is incorporated by reference, which application is also a Continuation-In-Part of US Patent No. 5,453,986, issued September 26, 1995 entitled "DUAL PORT INTERFACE FOR A COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM", the complete application of which is hereby incorporated by reference, which application is also a Continuation-In-Part of US Patent No. 5,535,204, issued July 9, 1996 entitled "RINGDOWN AND RINGBACK SIGNALLING FOR A COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM", the complete application of which is hereby incorporated by reference, which application is also a Continuation-In-Part of US Patent No. 5,452,289, issued September 19, 1995 entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM", the complete application of which, including the microfiche appendix, is also hereby incorporated by reference.
Field of the Invention
The present invention relates to communications systems and in particular to systems for switching between voice communications and computer assisted digital communications having a voice over data communications ability.
Background of the Invention
A wide variety of communications alternatives are currently available to telecommunications users. For example, facsimile transmission of printed matter is available through what is commonly referred to as a stand-alone fax machine. Alternatively, fax-modem communication systems are currently available for personal computer users which combine the operation of a facsimile machine with the word processor of a computer to transmit documents held on computer disk. Modem communication over telephone lines in combination with a personal computer is also known in the art where file transfers can be accomplished from one computer to another. Also, simultaneous voice and modem data transmitted over the same telephone line has been accomplished in several ways.
Modem technology has recently multiplexed the transmission of various nonstandard data with standard digital data, such as voice over data communications, creating a hybrid datastream of voice and digital data.
One problem associated with voice over data communications occurs when two users initiate an analog voice connection and subsequently wish to initiate digital data or voice over data communications. One method to initiate digital data or voice over data communications is to terminate the analog voice connection and re-connect in a digital data or voice over data format; however, this is inconvenient and requires hanging up and redialing between the users.
A time-division multiplexing voice and data communication system which switches between a "SYSTEM mode" and a "POTS mode" was proposed in U.S. Patent No. 4,740,963 by Eckley, entitled "VOICE AND DATA COMMUNICATION SYSTEM". In SYSTEM mode a multiplexer means time-division multiplexes a compressed, digitized analog voice signal with a digital data signal to produce a composite digital signal having a data rate substantially equal to the uncompressed, digitized voice signal. The POTS mode is the analog voice mode. A remote user unit switches to the SYSTEM mode upon receipt of a particular dual-tone multifrequency (DTMF) signal from a remote digital loop carrier unit. The Eckley system returns to POTS mode upon detection of a failure of a remote user unit or upon detection of a particular code from a central office terminal. The Eckley invention requires a special mode tone detector to generate a control signal to enter SYSTEM mode and a code detection circuit to detect the particular code to return to POTS mode. However, the Eckley system is designed to operate in a particular voice and data time-division multiplexing system.
Packetized voice over data communication systems utilize several communication parameters not found in fixed time-division multiplexing systems and require negotiation of packet transmission parameters, such as speech compression ratio and speech algorithm selection.
Therefore, there is a need in the art for a mode switching control for packetized voice over data communications which provides a plurality of switching means for transferring between an analog voice connection and digital data communications or voice over data communications without having to hang up on the original analog voice connection. The mode switching control should provide means for negotiating digital data and voice over data communications parameters.
Summary of the Invention
The present disclosure describes a complex computer assisted communications system, the details of which are set forth in the above-mentioned U.S. Patent Application Serial Number 08/346,421, entitled "DYNAMIC SELECTION OF COMPRESSION RATE FOR A VOICE COMPRESSION ALGORITHM IN A VOICE OVER DATA MODEM" by Sharma et al., filed November 29, 1994, the complete application of which was incorporated by reference, and in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM", the complete application of which, including the microfiche appendix, was also incorporated by reference.
The subject of the present invention includes a mode switching system for establishing a digital data communications or a voice over data communications from an existing analog voice connection. Alternate embodiments include means for negotiation of communications parameters for digital data communications and for digital voice over data communications. Embodiments are also described for returning to analog voice communications after completing digital data communications or voice over data communications.
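The mode transitions summarized above can be pictured as a small state machine. The sketch below is an illustration under assumptions rather than the claimed implementation: the state names follow the modes named in this document, while the event labels (`switching_signal`, `handshake_complete`, and so on) are hypothetical names for the triggers the embodiments describe, such as a calling or answer tone, a supervisory packet, or a Return to Analog Voice Mode command.

```python
from enum import Enum


class Mode(Enum):
    ANALOG_VOICE = "analog voice"
    HANDSHAKING = "handshaking"
    DIGITAL_DATA = "digital data"
    VOICE_OVER_DATA = "voice over data"


# Event names are hypothetical labels for the triggers the text describes.
TRANSITIONS = {
    (Mode.ANALOG_VOICE, "switching_signal"): Mode.HANDSHAKING,
    (Mode.HANDSHAKING, "handshake_complete"): Mode.DIGITAL_DATA,
    (Mode.DIGITAL_DATA, "voice_over_data_packet"): Mode.VOICE_OVER_DATA,
    (Mode.VOICE_OVER_DATA, "data_only_packet"): Mode.DIGITAL_DATA,
    (Mode.DIGITAL_DATA, "return_to_analog_voice"): Mode.ANALOG_VOICE,
    (Mode.VOICE_OVER_DATA, "return_to_analog_voice"): Mode.ANALOG_VOICE,
}


def next_mode(mode, event):
    """Return the mode after `event`; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start=Mode.ANALOG_VOICE):
    """Apply a sequence of events starting from the analog voice connection."""
    mode = start
    for event in events:
        mode = next_mode(mode, event)
    return mode
```

Ignoring unknown events keeps the analog voice call uninterrupted when a stray signal arrives, which matches the stated goal of not hanging up the original connection.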
The major functions of the present system are a telephone function, a voice mail function, a fax manager function, a multi-media mail function, a show and tell function, a terminal function and an address book function. The telephone function is more sophisticated than a standard telephone in that the present system converts the voice into a digital signal which can be processed with echo cancellation, compressed, stored as digital data for later retrieval and transmitted as digital voice data concurrent with the transfer of digital information data.
The voice over data (show and tell) component of the present system enables the operator to simultaneously transmit voice and data communication to a remote site. This voice over data function dynamically allocates data bandwidth over the telephone line depending on the demands of the voice grade digitized signal.
A modified supervisory packet is described which can be used to negotiate digital data communication parameters or voice over data communications parameters. In one embodiment, the modified supervisory packet negotiates nonstandard data transmission parameters, such as the speech compression algorithm and speech compression ratio, in voice over data communications. By using a supervisory packet the need for escape sequences is obviated and data transmission parameter negotiation occurs without an interruption in the transmission of data. In addition, data transmission parameters can be renegotiated and changed in real time throughout the data transmission. This method may also be employed for negotiation of standard communications parameters or protocols.
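As an illustration of in-band parameter negotiation, the sketch below encodes a proposal for a speech compression algorithm and compression ratio in a small packet and lets the receiving side accept it or answer with a counter-proposal. The three-byte layout, the packet-type codes, and the numeric algorithm and ratio identifiers are all assumptions made for this example; the actual modified supervisory packet format is defined in the applications incorporated by reference, not here.

```python
import struct

# Hypothetical packet-type codes; the real supervisory packet format differs.
NEGOTIATE = 0x01
ACCEPT = 0x02
COUNTER = 0x03


def encode(packet_type, algorithm_id, ratio):
    """Pack a three-byte negotiation packet: type, algorithm id, ratio code."""
    return struct.pack("!BBB", packet_type, algorithm_id, ratio)


def decode(packet):
    """Unpack a negotiation packet into (type, algorithm id, ratio code)."""
    return struct.unpack("!BBB", packet)


def respond(packet, supported):
    """Answer a proposal given a non-empty set of supported (algorithm, ratio) pairs."""
    packet_type, algorithm_id, ratio = decode(packet)
    if packet_type != NEGOTIATE:
        raise ValueError("not a negotiation request")
    if (algorithm_id, ratio) in supported:
        return encode(ACCEPT, algorithm_id, ratio)
    # Counter with the lowest-numbered supported pair (an arbitrary policy).
    return encode(COUNTER, *min(supported))
```

Because the proposal travels as an ordinary packet in the data stream, either side could send a new `NEGOTIATE` packet at any time, which is the property that makes renegotiation during transmission possible without an escape sequence.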
Description of the Drawings
In the drawings, where like numerals describe like components throughout the several views:
Figure 1 shows the telecommunications environment within which the present system may operate in several of the possible modes of communication;
Figure 2 is the main menu icon for the software components operating on the personal computer;
Figure 3 is a block diagram of the hardware components of the present system;
Figure 4 is a detailed function flow diagram of the speech compression algorithm;
Figure 5 is a detailed function flow diagram of the speech decompression algorithm;
Figure 6 is a signal flow diagram of the speech compression algorithm;
Figure 7 is a signal flow diagram of the speech compression algorithm showing details of the code book synthesis;
Figure 8 is a detailed function flow diagram of the voice/data multiplexing function;
Figure 9 is a flow diagram showing the steps for initiating digital data communications and voice over data communications from an established analog voice connection according to one embodiment of the present invention;
Figure 10 is a flow diagram showing the steps for one example of establishing a digital data link with user controlled mode switches according to one embodiment of the present invention;
Figure 11 is a flow diagram showing the steps for one example of establishing a digital data link using an answer tone according to one embodiment of the present invention;
Figure 12 is a flow diagram showing the steps for one example of establishing a digital data link using a calling tone according to one embodiment of the present invention;
Figure 13 is a flow diagram showing the steps for one example of establishing a digital data link requiring only a single user controlled mode switch according to one embodiment of the present invention;
Figure 14 is a flow diagram showing the steps for one example of establishing a digital data link using ATD/ATA commands according to one embodiment of the present invention; and
Figure 15 is a flow diagram showing a sequence of steps to exit a digital data communications mode and a voice over data communications mode according to one embodiment of the present invention.
Detailed Description of the Preferred Embodiments
In the following detailed description of the preferred embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the inventions may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural changes may be made without departing from the spirit and scope of the present inventions. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present inventions is defined by the appended claims.
Figure 1 shows a typical arrangement for the use of the present system. Personal computer 10 is running the software components of the present system while the hardware components 20 include the data communication equipment and telephone headset. Hardware components 20 communicate over a standard telephone line 30 to one of a variety of remote sites. One of the remote sites may be equipped with the present system including hardware components 20a and software components running on personal computer 10a. In one alternative use, the local hardware components 20 may be communicating over standard telephone line 30 to facsimile machine 60. In another alternative use, the present system may be communicating over a standard telephone line 30 to another personal computer 80 through a remote modem 70. In another alternative use, the present system may be communicating over a standard telephone line 30 to a standard telephone 90. Those skilled in the art will readily recognize the wide variety of communication interconnections possible with the present system by reading and understanding the following detailed description.
General Overview
The present inventions are embodied in a commercial product by the assignee, MultiTech Systems, Inc. The software component operating on a personal computer is sold under the commercial trademark of MultiExpressPCS™ personal communications software while the hardware component of the present system is sold under the commercial name of MultiModemPCS™, Intelligent Personal Communications System Modem. In the preferred embodiment, the software component runs under Microsoft® Windows®; however, those skilled in the art will readily recognize that the present system is easily adaptable to run under any single or multi-user, single or multi-window operating system.
The present system is a multifunction communication system which includes hardware and software components. The system allows the user to connect to remote locations equipped with a similar system or with modems, facsimile machines or standard telephones over a single analog telephone line. The software component of the present system includes a number of modules which are described in more detail below. Figure 2 is an example of the Windows®-based main menu icon of the present system operating on a personal computer. The functions listed with the icons used to invoke those functions are described in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM". Those skilled in the art will readily recognize that a wide variety of selection techniques may be used to invoke the various functions of the present system. The icon of Figure 2 is part of Design Patent Application Number 29/001397, filed November 12, 1992 entitled "Icons for a Computer-Based Multifunction Personal Communications System" assigned to the same assignee of the present inventions.
The telephone module allows the system to operate as a conventional or sophisticated telephone system. The system converts voice into a digital signal so that it can be transmitted or stored with other digital data, like computer information. The telephone function supports PBX and Centrex features such as call waiting, call forwarding, caller ID and three-way calling. This module also allows the user to mute, hold or record a conversation. The telephone module enables the handset, headset or hands-free speaker telephone operation of the hardware component. It includes on-screen push button dialing, speed-dial of stored numbers and digital recording of two-way conversations.
The voice mail portion of the present system allows this system to operate as a telephone answering machine by storing voice messages as digitized voice files along with a time/date voice stamp. The digitized voice files can be saved and sent to one or more destinations immediately or at a later time using a queue scheduler. The user can also listen to, forward or edit the voice messages which have been received with a powerful digital voice editing component of the present system. This module also creates queues for outgoing messages to be sent at preselected times and allows the users to create outgoing messages with the voice editor.
The fax manager portion of the present system is a queue for incoming and outgoing facsimile pages. In the preferred embodiment of the present system, this function is tied into the Windows "print" command once the present system has been installed. This feature allows the user to create faxes from any Windows®-based document that uses the "print" command. The fax manager function of the present system allows the user to view queued faxes which are to be sent or which have been received. This module creates queues for outgoing faxes to be sent at preselected times and logs incoming faxes with time/date stamps.
The multi-media mail function of the present system is a utility which allows the user to compose documents that include text, graphics and voice messages using the message composer function of the present system, described more fully below. The multi-media mail utility of the present system allows the user to schedule messages for transmittal and queues up the messages that have been received so that they can be viewed at a later time.
The show and tell function of the present system allows the user to establish a data over voice (DOV) communications session. When the user is transmitting data to a remote location similarly equipped, the user is able to talk to the person over the telephone line while concurrently transferring the data. This voice over data function is accomplished in the hardware components of the present system. It digitizes the voice and transmits it in a dynamically changing allocation of voice data and digital data multiplexed in the same transmission. The allocation at a given moment is selected depending on the amount of voice digital information required to be transferred. Quiet voice intervals allocate greater space to the digital data transmission.
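The dynamic allocation described above can be sketched as a per-frame rule: voice takes whatever share of the frame it needs, and digital data fills the remainder. This is a simplified illustration, not the multiplexing function of Figure 8; the fixed frame capacity in bytes and the function names are assumptions introduced for the example.

```python
def fill_frame(capacity, voice_pending, data_pending):
    """Split one frame of `capacity` bytes, giving voice priority over data."""
    voice_bytes = min(voice_pending, capacity)
    data_bytes = min(data_pending, capacity - voice_bytes)
    return voice_bytes, data_bytes


def schedule(capacity, voice_per_frame, data_total):
    """Return the (voice, data) byte split for each frame in turn.

    `voice_per_frame` lists how many voice bytes are waiting at each frame;
    a value of 0 models a quiet interval.
    """
    plan = []
    data_left = data_total
    for voice_pending in voice_per_frame:
        voice_bytes, data_bytes = fill_frame(capacity, voice_pending, data_left)
        data_left -= data_bytes
        plan.append((voice_bytes, data_bytes))
    return plan
```

For example, `schedule(100, [80, 0, 30], 150)` gives data only 20 bytes while the speaker is active, the full 100 bytes during the quiet frame, and the last 30 bytes alongside light voice traffic.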
The terminal function of the present system allows the user to establish a data communications session with another computer which is equipped with a modem but which is not equipped with the present system. This feature of the present system is a Windows™-based data communications program that reduces the need for issuing "AT" commands by providing menu driven and "pop-up" window alternatives.
The address book function of the present system is a database that is accessible from all the other functions of the present system. This database is created by the user inputting destination addresses and telephone numbers for data communication, voice mail, facsimile transmission, modem communication and the like. The address book function of the present system may be utilized to broadcast communications to a wide variety of recipients. Multiple linked databases have separate address books for different groups and different destinations may be created by the users. The address book function includes a textual search capability which allows fast and efficient location of specific addresses as described more fully below.
Hardware Components
Figure 3 is a block diagram of the hardware components of the present system corresponding to reference number 20 of Figure 1. These components form the link between the user, the personal computer running the software component of the present system and the telephone line interface. The details of the system shown in Figure 3 and a detailed description of the schematics is found in the above-mentioned U.S. Patent No. 5,452,289 issued on September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM".
In the preferred embodiment of the present system three alternate telephone interfaces are available: the telephone handset 301, a telephone headset 302, and a hands-free microphone 303 and speaker 304. Regardless of the telephone interface, the three alternative interfaces connect to the digital telephone coder-decoder (CODEC) circuit 305.
The digital telephone CODEC circuit 305 interfaces with the voice control digital signal processor (DSP) circuit 306 which includes a voice control DSP and CODEC. This circuit does digital to analog (D/A) conversion, analog to digital (A/D) conversion, coding/decoding, gain control and is the interface between the voice control DSP circuit 306 and the telephone interface. The CODEC of the voice control circuit 306 transfers digitized voice information in a compressed format to multiplexor circuit 310 to analog telephone line interface 309. The CODEC of the voice control circuit 306 is an integral component of a voice control digital signal processor integrated circuit. The voice control DSP of circuit 306 controls the digital telephone CODEC circuit 305, performs voice compression and echo cancellation.
Multiplexor (MUX) circuit 310 selects between the voice control DSP circuit 306 and the data pump DSP circuit 311 for transmission of information on the telephone line through telephone line interface circuit 309.
The data pump circuit 311 also includes a digital signal processor (DSP) and a CODEC for communicating over the telephone line interface 309 through MUX circuit 310. The data pump DSP and CODEC of circuit 311 performs functions such as modulation, demodulation and echo cancellation to communicate over the telephone line interface 309 using a plurality of telecommunications standards including FAX and modem protocols.
The main controller circuit 313 controls the DSP data pump circuit 311 and the voice control DSP circuit 306 through serial input/output and clock timer control (SIO/CTC) circuits 312 and dual port RAM circuit 308 respectively. The main controller circuit 313 communicates with the voice control DSP 306 through dual port RAM circuit 308. In this fashion digital voice data can be read and written simultaneously to the memory portions of circuit 308 for high speed communication between the user (through interfaces 301, 302 or 303/304) and the personal computer connected to serial interface circuit 315 and the remote telephone connection connected through the telephone line attached to line interface circuit 309.
In one embodiment, the main controller circuit 313 includes a microprocessor which controls the functions and operation of all of the hardware components shown in Figure 3. The main controller is connected to RAM circuit 316 and a programmable and electrically erasable read only memory (PEROM) circuit 317. The PEROM circuit 317 includes non-volatile memory in which the executable control programs for the voice control DSP circuits 306 and the main controller circuits 313 operate.
The RS232 serial interface circuit 315 communicates to the serial port of the personal computer which is running the software components of the present system. The RS232 serial interface circuit 315 is connected to a serial input/output circuit 314 with main controller circuit 313. SIO circuit 314 is, in the preferred embodiment, a part of SIO/CTC circuit 312.
Functional Operation of the Hardware Components
Referring once again to Figure 3, the multiple and selectable functions described in conjunction with Figure 2 are all implemented in the hardware components of Figure 3. Each of these functions is discussed in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM". The telephone function 115 is implemented by the user either selecting a telephone number to be dialed from the address book 127 or manually selecting the number through the telephone menu on the personal computer. The telephone number to be dialed is downloaded from the personal computer over the serial interface and received by main controller 313. Main controller 313 causes the data pump DSP circuit 311 to seize the telephone line and transmit the DTMF tones to dial a number. DSP 306 receives commands from the personal computer via main controller 313 to configure the digital telephone CODEC circuit 305 to enable either the handset 301 operation, the microphone 303 and speaker 304 operation or the headset 302 operation. A telephone connection is established through the telephone line interface circuit 309 and communication is enabled. The user's analog voice is transmitted in an analog fashion to the digital telephone CODEC 305 where it is digitized. The digitized voice patterns are passed to the voice control circuit 306 where echo cancellation is accomplished, the digital voice signals are reconstructed into analog signals and passed through multiplexor circuit 310 to the telephone line interface circuit 309 for analog transmission over the telephone line. The incoming analog voice from the telephone connection through telephone connection circuit 309 is passed to the integral CODEC of the voice control circuit 306 where it is digitized. The digitized incoming voice is then passed to digital telephone CODEC circuit 305 where it is reconverted to an analog signal for transmission to the selected telephone interface (either the handset 301, the microphone/speaker 303/304 or the headset 302).
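The DTMF dialing step mentioned above can be illustrated by generating the tone pair for each key. The row and column frequencies below are the standard DTMF keypad assignments; the 8000 Hz sampling rate, the 0.1 second tone duration, the 0.5 scaling factor, and the function names are assumptions for this sketch, which does not represent the data pump firmware.

```python
import math

ROW_HZ = (697, 770, 852, 941)        # low-group frequencies, one per keypad row
COL_HZ = (1209, 1336, 1477, 1633)    # high-group frequencies, one per column
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Map each key to its (low, high) frequency pair.
DTMF_FREQS = {
    key: (ROW_HZ[r], COL_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Return the summed two-tone waveform for one key, scaled to stay within [-1, 1]."""
    low, high = DTMF_FREQS[key]
    n = round(duration_s * sample_rate)
    return [
        0.5 * (math.sin(2.0 * math.pi * low * i / sample_rate)
               + math.sin(2.0 * math.pi * high * i / sample_rate))
        for i in range(n)
    ]


def dial(number):
    """Return one tone burst per key in `number` (inter-digit silence omitted)."""
    return [dtmf_samples(key) for key in number]
```

The same table could serve in the reverse direction: a detector that finds one dominant row frequency and one dominant column frequency in a received block can look up the key, which is how a predetermined DTMF sequence could act as a mode switching signal.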
Voice Control DSP circuit 306 is programmed to perform echo cancellation to avoid feedback and echoes between transmitted and received signals, as is more fully described below.
In the voice mail function mode of the present system, voice messages may be stored for later transmission or the present system may operate
as an answering machine receiving incoming messages. For storing digitized voice, the telephone interface is used to send the analog speech patterns to the digital telephone CODEC circuit 305. Circuit 305 digitizes the voice patterns and passes them to voice control circuit 306 where the digitized voice patterns are digitally compressed. The digitized and compressed voice patterns are passed through dual port RAM circuit 308 to the main controller circuit 313 where they are transferred through the serial interface to the personal computer using a packet protocol defined below. The voice patterns are then stored on the disk of the personal computer for later use in multi-media mail, for voice mail, as a pre-recorded answering machine message or for later predetermined transmission to other sites.
For the present system to operate as an answering machine, the hardware components of Figure 3 are placed in answer mode. An incoming telephone ring is detected through the telephone line interface circuit 309 and the main controller circuit 313 is alerted, which passes the information to the personal computer through the RS232 serial interface circuit 315. The telephone line interface circuit 309 seizes the telephone line to make the telephone connection. A pre-recorded message may be sent by the personal computer as compressed and digitized speech through the RS232 interface to the main controller circuit 313. The compressed and digitized speech from the personal computer is passed from main controller circuit 313 through dual port RAM circuit 308 to the voice control DSP circuit 306 where it is uncompressed and converted to analog voice patterns. These analog voice patterns are passed through multiplexor circuit 310 to the telephone line interface 309 for transmission to the caller. Such a message may invite the caller to leave a voice message at the sound of a tone. The incoming voice messages are received through telephone line interface 309 and passed to voice control circuit 306. The analog voice patterns are digitized by the integral CODEC of voice control circuit 306 and the digitized voice patterns are compressed by the voice control DSP of the voice control circuit 306. The digitized and compressed speech patterns are passed through dual port RAM circuit 308 to the main controller circuit 313 where they
are transferred using packet protocol described below through the RS232 serial interface 315 to the personal computer for storage and later retrieval. In this fashion the hardware components of Figure 3 operate as a transmit and receive voice mail system for implementing the voice mail function 117 of the present system.
The hardware components of Figure 3 may also operate to facilitate the fax manager function 119 of Figure 2. In fax receive mode, an incoming telephone call will be detected by a ring detect circuit of the telephone line interface 309 which will alert the main controller circuit 313 to the incoming call. Main controller circuit 313 will cause line interface circuit 309 to seize the telephone line to receive the call. Main controller circuit 313 will also concurrently alert the operating programs on the personal computer through the RS232 interface using the packet protocol described below. Once the telephone line interface seizes the telephone line, a fax carrier tone is transmitted and a return tone and handshake is received from the telephone line and detected by the data pump circuit 311. The reciprocal transmit and receipt of the fax tones indicates the imminent receipt of a facsimile transmission and the main controller circuit 313 configures the hardware components of Figure 3 for the receipt of that information. The necessary handshaking with the remote facsimile machine is accomplished through the data pump 311 under control of the main controller circuit 313. The incoming data packets of digital facsimile data are received over the telephone line interface and passed through data pump circuit 311 to main controller circuit 313 which forwards the information on a packet basis (using the packet protocol described more fully below) through the serial interface circuit 315 to the personal computer for storage on disk. Those skilled in the art will readily recognize that the FAX data could be transferred from the telephone line to the personal computer using the same path as the packet transfer except using the normal AT stream mode. Thus the incoming facsimile is automatically received and stored on the personal computer through the hardware components of Figure 3.
A facsimile transmission is also facilitated by the hardware components of Figure 3. The transmission of a facsimile may be immediate or queued for later transmission at a predetermined or preselected time. Control packet information to configure the hardware components to send a facsimile are sent over the RS232 serial interface between the personal computer and the hardware components of Figure 3 and are received by main controller circuit 313. The data pump circuit 311 then dials the recipient's telephone number using DTMF tones or pulse dialing over the telephone line interface circuit 309. Once an appropriate connection is established with the remote facsimile machine, standard facsimile handshaking is accomplished by the data pump circuit 311. Once the facsimile connection is established, the digital facsimile picture information is received through the data packet protocol transfer over serial line interface circuit 315, passed through main controller circuit 313 and data pump circuit 311 onto the telephone line through telephone line interface circuit 309 for receipt by the remote facsimile machine.
The operation of the multi-media mail function 121 of Figure 2 is also facilitated by the hardware components of Figure 3. A multimedia transmission consists of a combination of picture information, digital data and digitized voice information. For example, the type of multimedia information transferred to a remote site using the hardware components of Figure 3 could be the multimedia format of the MicroSoft® Multimedia Wave® format with the aid of an Intelligent Serial Interface (ISI) card added to the personal computer. The multimedia may also be the type of multimedia information assembled by the software component of the present system which is described more fully below. The multimedia package of information including text, graphics and voice messages (collectively called the multimedia document) may be transmitted or received through the hardware components shown in Figure 3. For example, the transmission of a multimedia document through the hardware components of Figure 3 is accomplished by transferring the multimedia digital information using the packet protocol described below over the RS232 serial interface between the personal computer and the serial line interface circuit 315.
The packets are then transferred through main controller circuit 313 through the data pump circuit 311 on to the telephone line for receipt at a remote site through telephone line interface circuit 309. In a similar fashion, the multimedia documents received over the telephone line from the remote site are received at the telephone line interface circuit 309, passed through the data pump circuit 311 for receipt and forwarding by the main controller circuit 313 over the serial line interface circuit 315.
The show and tell function 123 of the present system allows the user to establish a data over voice communication session. In this mode of operation, full duplex data transmission may be accomplished simultaneously with the voice communication between both sites. This mode of operation assumes a like-configured remote site. The hardware components of the present system also include a means for sending voice/data over cellular links. The protocol used for transmitting multiplexed voice and data includes a supervisory packet, described more fully below, to keep the link established through the cellular link. This supervisory packet is an acknowledgement that the link is still up. The supervisory packet may also contain link information to be used for adjusting various link parameters when needed. This supervisory packet is sent every second when data is not being sent, and if the packet is not acknowledged after a specified number of attempts, the protocol would then give an indication that the cellular link is down and then allow the modem to take action. The action could be, for example, to change speeds, retrain, or hang up. The use of supervisory packets is a novel method of maintaining inherently intermittent cellular links when transmitting multiplexed voice and data. The voice portion of the voice over data transmission of the show and tell function is accomplished by receiving the user's voice through the telephone interface 301, 302 or 303 and the voice information is digitized by the digital telephone circuit 305. The digitized voice information is passed to the voice control circuit 306 where the digitized voice information is compressed using a voice compression algorithm as described in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM". The digitized and compressed voice information is passed through dual port RAM circuit 308 to the main controller circuit 313. During quiet periods of the speech, a quiet flag is passed from voice control circuit 306 to the main controller 313 through a packet transfer protocol described below via dual port RAM circuit 308.
Simultaneous with the digitizing, compression and packetizing of the voice information is the receipt of the packetized digital information from the personal computer over interface line circuit 315 by main controller circuit 313. Main controller circuit 313 in the show and tell function of the present system must efficiently and effectively combine the digitized voice information with the digital information for transmission over the telephone line via telephone line interface circuit 309. As described above and as described more fully below, main controller circuit 313 dynamically changes the amount of voice information and digital information transmitted at any given period of time depending upon the quiet times during the voice transmissions. For example, during a quiet moment where there is no speech information being transmitted, main controller circuit 313 ensures that a higher volume of digital data information be transmitted over the telephone line interface in lieu of digitized voice information.
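The quiet-driven allocation described above can be sketched as a simple scheduler. This is only an illustration, not the controller firmware: the function name `next_packet`, the two queues, and the rule that pending voice always goes first while speech is present are assumptions made for the example.

```python
from collections import deque

def next_packet(quiet, voice_queue, data_queue):
    """Pick the next packet to place on the telephone line.

    quiet       -- True while the quiet flag from the voice DSP is set (assumed input)
    voice_queue -- deque of pending compressed voice packets
    data_queue  -- deque of pending data packets from the personal computer
    """
    # While speech is present, voice goes first because it is real-time.
    if not quiet and voice_queue:
        return ("voice", voice_queue.popleft())
    # During quiet periods, or when no voice is pending, the line carries data.
    if data_queue:
        return ("data", data_queue.popleft())
    return None
```

With this rule the share of line time given to data grows automatically as the quiet flag stays set for longer, which is the behavior the paragraph describes.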
Also, as described more fully below, the packets of digital data transmitted over the telephone line interface with the transmission packet protocol described below require 100 percent accuracy in the transmission of the digital data, but a lesser standard of accuracy for the transmission and receipt of the digitized voice information. Since digital information must be transmitted with 100 percent accuracy, a corrupted packet of digital information received at the remote site must be re-transmitted. A retransmission signal is communicated back to the local site and the packet of digital information which was corrupted during transmission is retransmitted. If the packet transmitted contained voice data, however, the remote site uses the packets whether they were corrupted or not as long as the packet header was intact. If the header is corrupted, the packet
is discarded. Thus, the voice information may be corrupted without requesting retransmission since it is understood that the voice information must be transmitted on a real time basis and the corruption of any digital information of the voice signal is not critical. In contrast to this the transmission of digital data is critical and retransmission of corrupted data packets is requested by the remote site.
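The different treatment of corrupted voice and data packets can be summarized in a short decision function. This is a minimal sketch under stated assumptions: the `ReceivedPacket` fields and the action names are hypothetical, and the choice to discard any packet with a damaged header (the text states this only for voice packets) is an assumption, made because the packet type cannot be trusted once the header is corrupted.

```python
from dataclasses import dataclass

@dataclass
class ReceivedPacket:
    is_voice: bool     # packet type as read from the header
    header_ok: bool    # header passed its integrity check
    payload_ok: bool   # payload passed its integrity check

def handle_packet(pkt: ReceivedPacket) -> str:
    """Return the action the receiving site takes for one packet."""
    if not pkt.header_ok:
        # Assumption: a packet whose header is damaged is dropped outright.
        return "discard"
    if pkt.is_voice:
        # Voice is real-time: use it even if the payload is corrupted.
        return "play"
    # Data must be exact: accept it only when intact, otherwise ask again.
    return "deliver" if pkt.payload_ok else "request_retransmit"
```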
The transmission of the digital data follows the CCITT V.42 standard, as is well known in the industry and as described in the CCITT Blue Book, volume VIII entitled Data Communication over the Telephone Network, 1989. The CCITT V.42 standard is hereby incorporated by reference. The voice data packet information also follows the CCITT V.42 standard, but uses a different header format so the receiving site recognizes the difference between a data packet and a voice packet. The voice packet is distinguished from a data packet by using undefined bits in the header (80 hex) of the V.42 standard. The packet protocol for voice over data transmission during the show and tell function of the present system is described more fully below.
Since the voice over data communication with the remote site is full-duplex, incoming data packets and incoming voice packets are received by the hardware components of Figure 3. The incoming data packets and voice packets are received through the telephone line interface circuit 309 and passed to the main controller circuit 313 via data pump DSP circuit 311. The incoming data packets are passed by the main controller circuit 313 to the serial interface circuit 315 to be passed to the personal computer. The incoming voice packets are passed by the main controller circuit 313 to the dual port RAM circuit 308 for receipt by the voice control DSP circuit 306. The voice packets are decoded and the compressed digital information therein is uncompressed by the voice control DSP of circuit 306. The uncompressed digital voice information is passed to digital telephone CODEC circuit 305 where it is reconverted to an analog signal and retransmitted through the telephone line interface circuits. In this fashion full-duplex voice and data transmission and reception is accomplished through the hardware components of Figure 3 during the show and tell functional operation of the present system.
Terminal operation 125 of the present system is also supported by the hardware components of Figure 3. Terminal operation means that the local personal computer simply operates as a "dumb" terminal including file transfer capabilities. Thus no local processing takes place other than the handshaking protocol required for the operation of a dumb terminal. In terminal mode operation, the remote site is assumed to be a modem connected to a personal computer but the remote site is not necessarily a site which is configured according to the present system. In terminal mode of operation, the command and data information from the personal computer is transferred over the RS232 serial interface circuit 315, forwarded by main controller circuit 313 to the data pump circuit 311 where the data is placed on the telephone line via telephone line interface circuit 309. In a reciprocal fashion, data is received from the telephone line over telephone line interface circuit 309 and simply forwarded by the data pump circuit 311 and the main controller circuit 313 over the serial line interface circuit 315 to the personal computer.
As described above, and more fully below, the address book function of the present system is primarily a support function for providing telephone numbers and addresses for the other various functions of the present system.
Packet Protocol Overview
Specific details on packet protocol are found in the above-mentioned U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM".
A special packet protocol is used for communication between the hardware components 20 and the personal computer (PC) 10. The protocol is used for transferring different types of information between the two devices such as the transfer of DATA, VOICE, and QUALIFIED information.
In one embodiment there are 3 types of packets used: DATA, VOICE, and QUALIFIED. A Data Packet is used for normal data transfer between the controller 313 of hardware component 20 and the computer 10 for such things as text, file transfers, binary data and any other type of information presently being sent through modems. All packet transfers begin with a synch character 01 hex (synchronization byte). The Data Packet begins with an ID byte which specifies the packet type and packet length.
The Voice Packet is used to transfer compressed VOICE messages between the controller 313 of hardware component 20 and the computer 10. The Voice Packet is similar to the Data Packet except for its length which is, in one embodiment, currently fixed at 23 bytes of data. Once again, all packets begin with a synchronization character chosen in the preferred embodiment to be 01 hex (01H). The ID byte of the Voice Packet is completely a zero byte: all bits are set to zero. The Qualified Packet is used to transfer commands and other non-data/voice related information between the controller 313 of hardware component 20 and the computer 10 and starts with a synchronization character chosen in one embodiment to be 01 hex (01H). A Qualified Packet starts with two bytes where the first byte is the ID byte and the second byte is the QUALIFIER type identifier.
In order to determine the status of the cellular link, a supervisory packet is also used. Both sides of the cellular link will send the cellular supervisory packet every 1 to 3 seconds. Upon receiving the cellular supervisory packet, the receiving side will acknowledge it using the ACK field of the cellular supervisory packet. If the sender does not receive an acknowledgement within one second, it will repeat sending the cellular supervisory packet up to 12 times. After 12 attempts of sending the cellular supervisory packet without an acknowledgement, the sender will disconnect the line. Upon receiving an acknowledgement, the sender will restart its 3 second timer. Those skilled in the art will readily recognize that the timer values and wait times selected here may be varied without departing from the spirit or scope of the present invention.
A modified supervisory packet was described in detail in the above-mentioned US Patent Application Serial Number 08/271,496 filed July 7, 1994 entitled "VOICE OVER DATA MODEM WITH SELECTABLE VOICE COMPRESSION". The modified supervisory packet was described as an independent communications channel. One example demonstrated the use of a modified supervisory packet in the negotiation of nonstandard communication parameters. For instance, the modified supervisory packet is used to negotiate speech algorithm selection and speech compression ratios. Other examples were given, and those given here are not intended in a limiting or exclusive sense.
Packet Protocol Between the PC and the Hardware Component
A special packet protocol is used for communication between the hardware components 20 and the personal computer (PC) 10. The protocol is used for transferring different types of information between the two devices such as the transfer of DATA, VOICE, and QUALIFIED information. The protocol also uses the BREAK as defined in CCITT X.28 as a means to maintain protocol synchronization. This BREAK sequence is also described in the Statutory Invention Registration entitled "ESCAPE METHODS FOR MODEM COMMUNICATIONS", to Timothy D. Gunn filed January 8, 1993, which is hereby incorporated by reference.
The protocol has two modes of operation. One mode is packet mode and the other is stream mode. The protocol allows mixing of different types of information into the data stream without having to physically switch modes of operation. The hardware component 20 will identify the packet received from the computer 10 and perform the appropriate action according to the specifications of the protocol. If it is a data packet, then the controller 313 of hardware component 20 would send it to the data pump circuit 311. If the packet is a voice packet, then the controller 313 of hardware component 20 would distribute that information to the Voice DSP 306. This packet transfer mechanism also works in the reverse, where the controller 313 of hardware component 20 would give different information to the computer 10 without having to switch into different modes. The packet protocol also allows commands to be sent to either the main controller 313 directly or to the Voice DSP 306 for controlling different options without having to enter a command state.
Packet mode is made up of 8 bit asynchronous data and is identified by a beginning synchronization character (01 hex) followed by an ID/LI character and then followed by the information to be sent. In addition to the ID/LI character codes defined below, those skilled in the art will readily recognize that other ID/LI character codes could be defined to allow for additional types of packets such as video data, or alternate voice compression algorithm packets such as Codebook Excited Linear Predictive Coding (CELP) algorithm, GSM, RPE, VSELP, etc.
Stream mode is used when large amounts of one type of packet (VOICE, DATA, or QUALIFIED) are being sent. The transmitter tells the receiver to enter stream mode by a unique command. Thereafter, the transmitter tells the receiver to terminate stream mode by using the BREAK command followed by an "AT" type command. The command used to terminate the stream mode can be a command to enter another type of stream mode or it can be a command to enter back into packet mode. Currently there are 3 types of packets used: DATA, VOICE, and
QUALIFIED. Table 1 shows the common packet parameters used for all three packet types. Table 2 shows the three basic types of packets with the sub-types listed.
TABLE 1: Packet Parameters
1. Asynchronous transfer
2. 8 bits, no parity
3. Maximum packet length of 128 bytes
   - IDentifier byte = 1
   - InFormation = 127
4. SPEED
   - variable from 9600 to 57600
   - default to 19200
TABLE 2: Packet Types
1. Data
2. Voice
3. Qualified:
   a. COMMAND
   b. RESPONSE
   c. STATUS
   d. FLOW CONTROL
   e. BREAK
   f. ACK
   g. NAK
   h. STREAM
A Data Packet is shown in Table 1 and is used for normal data transfer between the controller 313 of hardware component 20 and the computer 10 for such things as text, file transfers, binary data and any other type of information presently being sent through modems. All packet transfers begin with a synch character 01 hex (synchronization byte). The Data Packet begins with an ID byte which specifies the packet type and packet length. Table 3 describes the Data Packet byte structure and Table 4 describes the bit structure of the ID byte of the Data Packet. Table 5 is an example of a Data Packet with a byte length of 6. The value of the LI field is the actual length of the data field to follow, not counting the ID byte.
TABLE 3: Data Packet Byte Structure
byte 1 = 01h (sync byte)
byte 2 = ID/LI (ID byte/length indicator)
bytes 3-127 = data (depending on LI)
01    ID    data  data  data  data  data
SYNC  LI
TABLE 4: ID Byte of Data Packet
Bit 7 identifies the type of packet
Bits 6 - 0 contain the LI or length indicator portion of the ID byte
TABLE 5: Data Packet Example
LI (length indicator) = 6
01    06    data  data  data  data  data  data
SYNC  ID
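The Data Packet layout of Tables 3 to 5 can be illustrated with a small builder. This is a sketch rather than the actual firmware: the function name `build_data_packet` is hypothetical, and the choice of bit 7 = 0 for a Data Packet is an assumption inferred from the example ID byte 06 in Table 5.

```python
SYNC = 0x01      # synchronization byte that opens every packet
MAX_DATA = 127   # Table 1: up to 127 information bytes after the ID byte

def build_data_packet(payload: bytes) -> bytes:
    """Frame payload as SYNC, ID/LI byte, then the data bytes."""
    # LI = 0 is excluded here because an all-zero ID byte marks a Voice Packet.
    if not 1 <= len(payload) <= MAX_DATA:
        raise ValueError("data field must hold 1 to 127 bytes")
    id_li = len(payload) & 0x7F   # bit 7 clear (assumed: data), bits 6-0 = LI
    return bytes([SYNC, id_li]) + payload
```

For a six-byte payload the first two bytes come out as 01 06, matching the example in Table 5.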
The Voice Packet is used to transfer compressed VOICE messages between the controller 313 of hardware component 20 and the computer 10. The Voice Packet is similar to the Data Packet except for its length which is, in the preferred embodiment, currently fixed at 23 bytes of data. Once again, all packets begin with a synchronization character chosen in the preferred embodiment to be 01 hex (01H). The ID byte of the Voice Packet is completely a zero byte: all bits are set to zero. Table 6 shows the ID byte of the Voice Packet and Table 7 shows the Voice Packet byte structure.
TABLE 6: ID Byte of Voice Packet
TABLE 7: Voice Packet Byte Structure
LI (length indicator) = 0
23 bytes of data (depending upon compression algorithm used)
01    00    data  data  data  data  data
SYNC  ID
The Qualified Packet is used to transfer commands and other non-data/voice related information between the controller 313 of hardware component 20 and the computer 10. The various species or types of the Qualified Packets are described below and are listed above in Table 2. Once again, all packets start with a synchronization character chosen in the preferred embodiment to be 01 hex (01H). A Qualified Packet starts with two bytes where the first byte is the ID byte and the second byte is the QUALIFIER type identifier. Table 8 shows the ID byte for the Qualified Packet, Table 9 shows the byte structure of the Qualified Packet and Tables 10-12 list the Qualifier Type byte bit maps for the three types of Qualified Packets.
TABLE 8: ID Byte of Qualified Packet
The Length Identifier of the ID byte equals the amount of data which follows including the QUALIFIER byte (QUAL byte + DATA). If LI = 1, then the Qualifier Packet contains the Q byte only.
TABLE 9: Qualifier Packet Byte Structure
01    85    QUAL  data  data  data  data
SYNC  ID    BYTE
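Taken together, the three packet layouts suggest how a receiver might tell the packet types apart from the ID byte alone. The following is a minimal sketch, assuming that bit 7 set marks a Qualified Packet (inferred from the example ID byte 85 in Table 9), that an all-zero ID byte marks a Voice Packet of exactly 23 data bytes, and that anything else is a Data Packet; the function name `classify` is hypothetical.

```python
VOICE_LEN = 23   # fixed voice payload length in the preferred embodiment

def classify(packet: bytes):
    """Return (kind, body) for one complete packet starting with SYNC."""
    if len(packet) < 2 or packet[0] != 0x01:
        raise ValueError("packet must start with the 01 hex sync byte")
    id_byte, body = packet[1], packet[2:]
    li = id_byte & 0x7F                  # bits 6-0: length indicator
    if id_byte == 0x00:
        kind, expected = "voice", VOICE_LEN
    elif id_byte & 0x80:                 # assumed: bit 7 set = Qualified
        if li == 0:
            raise ValueError("qualified packet needs at least the QUAL byte")
        kind, expected = "qualified", li  # LI counts QUAL byte + data
    else:
        kind, expected = "data", li
    if len(body) != expected:
        raise ValueError("body length does not match the ID byte")
    return kind, body
```

For a Qualified Packet the first byte of the returned body is the QUAL byte and the remainder is its data.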
The bit maps of the Qualifier Byte (QUAL BYTE) of the Qualified Packet are shown in Tables 10-12. The bit map follows the pattern whereby if the QUAL byte = 0, then the command is a break. Also, bit 1 of the QUAL byte designates ack/nak, bit 2 designates flow control and bit 6 designates stream mode command. Table 10 describes the Qualifier Byte of Qualified Packet, Group 1 which are immediate commands. Table 11 describes the Qualifier Byte of Qualified Packet, Group 2 which are stream mode commands in that the command is to stay in the designated mode until a BREAK + INIT command string is sent. Table 12 describes the Qualifier Byte of Qualified Packet, Group 3 which are information or status commands.
TABLE 10: Qualifier Byte of Qualified Packet: Group 1
7 6 5 4 3 2 1 0
x x x x x x x x
0 0 0 0 0 0 0 0 = break
0 0 0 0 0 0 1 0 = ACK
0 0 0 0 0 0 1 1 = NAK
0 0 0 0 0 1 0 0 = xoff or stop sending data
0 0 0 0 0 1 0 1 = xon or resume sending data
0 0 0 0 1 0 0 0 = cancel fax
TABLE 11: Qualifier Byte of Qualified Packet: Group 2
7 6 5 4 3 2 1 0
x x x x x x x x
0 1 0 0 0 0 0 1 = stream command mode
0 1 0 0 0 0 1 0 = stream data
0 1 0 0 0 0 1 1 = stream voice
0 1 0 0 0 1 0 0 = stream video
0 1 0 0 0 1 0 1 = stream A
0 1 0 0 0 1 1 0 = stream B
0 1 0 0 0 1 1 1 = stream C
The Qualifier Packet indicating stream mode and BREAK attention is used when a large amount of information is sent (voice, data...) to allow the highest throughput possible. This command is mainly intended for use in DATA mode but can be used in any one of the possible modes. To change from one mode to another, a break-init sequence would be given. A break "AT...<cr>" type command would cause a change in state and set the serial rate from the "AT" command.
TABLE 12: Qualifier Byte of Qualified Packet: Group 3
7 6 5 4 3 2 1 0
x x x x x x x x
1 0 0 0 0 0 0 0 = commands
1 0 0 0 0 0 0 1 = responses
1 0 0 0 0 0 1 0 = status
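The Qualifier Byte values of Tables 10-12 lend themselves to a lookup table. The sketch below is illustrative only: the names `QUALIFIERS`, `decode_qualifier` and `qualifier_group` are hypothetical, and deriving the group from bits 7 and 6 is an assumption read off the bit patterns in the three tables.

```python
QUALIFIERS = {
    0b00000000: "break",
    0b00000010: "ack",
    0b00000011: "nak",
    0b00000100: "xoff",
    0b00000101: "xon",
    0b00001000: "cancel fax",
    0b01000001: "stream command mode",
    0b01000010: "stream data",
    0b01000011: "stream voice",
    0b01000100: "stream video",
    0b01000101: "stream A",
    0b01000110: "stream B",
    0b01000111: "stream C",
    0b10000000: "commands",
    0b10000001: "responses",
    0b10000010: "status",
}

def decode_qualifier(qual: int) -> str:
    """Map a QUAL byte to its meaning, or 'unknown' if it is not listed."""
    return QUALIFIERS.get(qual, "unknown")

def qualifier_group(qual: int) -> int:
    """Group 3 has bit 7 set, Group 2 has bit 6 set, Group 1 has neither."""
    if qual & 0x80:
        return 3
    if qual & 0x40:
        return 2
    return 1
```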
Cellular Supervisory Packet
In order to determine the status of the cellular link, a supervisory packet shown in Table 13 is used. Both sides of the cellular link will send the cellular supervisory packet every 3 seconds. Upon receiving the cellular supervisory packet, the receiving side will acknowledge it using the ACK field of the cellular supervisory packet. If the sender does not receive an acknowledgement within one second, it will repeat sending the cellular supervisory packet up to 12 times. After 12 attempts of sending the cellular supervisory packet without an acknowledgement, the sender will disconnect the line. Upon receiving an acknowledgement, the sender will restart its 3 second timer. Those skilled in the art will readily recognize that the timer values and wait times selected here may be varied without departing from the spirit or scope of the present invention.
TABLE 13: Cellular Supervisory Packet Byte Structure
8F ID LI ACK data data data
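The retry rule for the cellular supervisory packet can be modeled without real timers by feeding the sender one boolean per transmission attempt. This is a sketch under assumptions: the function name `supervise` and the result labels are hypothetical, and the limit of 12 is taken to count every transmission of the packet, including the first, which the text leaves open.

```python
def supervise(acks, max_attempts=12):
    """Walk through supervisory-packet attempts.

    acks -- iterable of booleans, one per transmission attempt; True means an
            acknowledgement arrived within the one-second wait.
    Returns (state, attempts_used).
    """
    attempts = 0
    for acked in acks:
        attempts += 1
        if acked:
            # Link confirmed: the sender would now restart its 3 second timer.
            return ("link_up", attempts)
        if attempts >= max_attempts:
            # No acknowledgement after the allowed attempts: drop the line.
            return ("disconnect", attempts)
    # Ran out of recorded outcomes before reaching a decision.
    return ("waiting", attempts)
```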
Speech Compression
The Speech Compression algorithm described above for use in transmitting voice over data is accomplished via the voice control circuit 306.
Referring once again to Figure 3, the user is talking either through the handset, the headset or the microphone/speaker telephone interface. The analog voice signals are received and digitized by the telephone CODEC circuit 305. The digitized voice information is passed from the digital telephone CODEC circuit
305 to the voice control circuits 306. The digital signal processor (DSP) of the voice control circuit 306 is programmed to do the voice compression algorithm. The source code programmed into the voice control DSP is in the microfiche appendix of U.S. Patent No. 5,452,289, issued September 19, 1995 and entitled "COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM". The DSP of the voice control circuit 306 compresses the speech and places the compressed digital representations of the speech into special packets described more fully below. As a result of the voice compression algorithm, the compressed voice information is passed to the dual port RAM circuit 308 for either forwarding and storage on the disk of the personal computer via the RS232 serial interface or for multiplexing with conventional modem data to be transmitted over the telephone line via the telephone line interface circuit 309 in the voice-over-data mode of operation of the Show and Tell function 123. The compressed speech bits are multiplexed with data bits using a packet format described below. Three compression rates are described herein which will be called 8Kbit/sec, 9.6Kbit/sec and 16Kbit/sec.
Speech Compression Algorithm
To multiplex high-fidelity speech with digital data and transmit both over the telephone line, a high available bandwidth would normally be required. In the present invention, the analog voice information is digitized into 8-bit PCM data at an 8 KHz sampling rate producing a serial bit stream of 64,000 bps serial data rate. This rate cannot be transmitted over the telephone line. With the Speech Compression algorithm described below, the 64 Kbps digital voice data is compressed into a 9500 bps encoding bit stream using a fixed-point (non-floating point) DSP such that the compressed speech can be transmitted over the telephone line multiplexed with asynchronous data. This is accomplished in an efficient manner such that enough machine cycles remain during real time speech compression to allow for echo cancellation in the same fixed-point DSP. A silence detection function is used to detect quiet intervals in the speech signal which allows the data processor to substitute asynchronous data in lieu of voice data packets over the telephone line to efficiently time multiplex the voice and asynchronous data transmission. The allocation of time for asynchronous data transmission is constantly changing depending on how much silence is on the voice channel.
The voice compression algorithm of the present system relies on a model of human speech which shows that human speech contains redundancy inherent in the voice patterns. Only the incremental innovations (changes) need to be transmitted. The algorithm operates on 128 digitized speech samples (20 milliseconds at 6400 Hz), divides the speech samples into time segments of 32 samples (5 milliseconds) each, and uses predictive coding on each segment.
Thus, the input to the algorithm could be either PCM data sampled at 6400 Hz or 8000 Hz. If the sampling is at 8000 Hz, or any other selected sampling rate, the input sample data stream must be decimated from 8000 Hz to 6400 Hz before processing the speech data. At the output, the 6400 Hz PCM signal is interpolated back to 8000 Hz and passed to the CODEC.
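The 8000 Hz to 6400 Hz rate change is a 5:4 ratio, so 160 input samples (20 milliseconds) become the 128 samples the algorithm works on. The sketch below shows only the rate conversion, using plain linear interpolation as a stand-in; it is an assumption made for illustration, not the method in the DSP source code, and a practical decimator would low-pass filter the signal first to avoid aliasing. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_hz=8000, dst_hz=6400):
    """Convert a block of samples from src_hz to dst_hz by linear interpolation."""
    n_out = len(samples) * dst_hz // src_hz
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * src_hz / dst_hz        # position of output k on the input grid
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]  # clamp at the end of the block
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

Calling the same function with the two rates swapped performs the interpolation back to 8000 Hz at the output.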
With this algorithm, the current segment is predicted as best as possible based on the past recreated segments and a difference signal is determined. The difference values are compared to the stored difference values in a lookup table or code book, and the address of the closest value is sent to the remote site along with the predicted gain and pitch values for each segment. In this fashion, the entire 20 milliseconds of speech can be represented by 190 bits, thus achieving an effective data rate of 9500 bps.
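The rates quoted above follow directly from the frame size. The short calculation below reproduces them with integer arithmetic; the constant names are chosen for this example, while the numeric values are the ones stated in the text.

```python
SAMPLES_PER_FRAME = 128   # samples processed per frame
SAMPLE_RATE_HZ = 6400     # internal sampling rate
BITS_PER_FRAME = 190      # coded bits per frame
PCM_BPS = 8 * 8000        # 8-bit PCM at 8 KHz before compression

# Frame duration in milliseconds: 128 samples at 6400 Hz.
frame_ms = 1000 * SAMPLES_PER_FRAME // SAMPLE_RATE_HZ

# Coded bit rate: 190 bits for every frame, 50 frames per second.
coded_bps = BITS_PER_FRAME * SAMPLE_RATE_HZ // SAMPLES_PER_FRAME

# Compression ratio relative to the 64,000 bps PCM stream.
ratio = PCM_BPS / coded_bps
```

This gives a 20 millisecond frame, 9500 bps, and a compression ratio of roughly 6.7 to 1.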
To produce this compression, the present system includes a unique Vector Quantization (VQ) speech compression algorithm designed to provide maximum fidelity with minimum compute power and bandwidth. The VQ algorithm has two major components. The first section reduces the dynamic range of the input speech signal by removing short term and long term redundancies. This reduction is done in the waveform domain, with the synthesized part used as the reference for determining the incremental "new" content. The second section maps the residual signal into a code book optimized for preserving the general spectral shape ofthe speech signal.
Figure 4 is a high level signal flow block diagram ofthe speech compression algorithm used in the present system to compress the digitized voice for transmission over the telephone line in the voice over data mode of operation or for storage and use on the personal computer. The transmitter and receiver components are implemented using the programmable voice control DSP/CODEC circuit 306 shown in Figure 3.
The DC removal stage 1101 receives the digitized speech signal and removes the D.C. bias by calculating the long-term average and subtracting it from each sample. This ensures that the digital samples ofthe speech are centered about a zero mean value. The pre-emphasis stage 1103 whitens the spectral content ofthe speech signal by balancing the extra energy in the low band with the reduced energy in the high band.
The system finds the innovation in the current speech segment by subtracting 1109 the prediction from reconstmcted past samples synthesized from synthesis stage 1107. This process requires the synthesis ofthe past speech samples locally (analysis by synthesis). The synthesis block 1107 at the transmitter performs the same function as the synthesis block 1113 at the receiver. When the reconstmcted previous segment of speech is subtracted from the present segment (before prediction), a difference term is produced in the form of an error signal. This residual error is used to find the best match in the code book 1105. The code book 1105 quantizes the error signal using a code book generated from a representative set of speakers and environments. A minimum mean squared error match is determined in segments. In addition, the code book is designed to provide a quantization error with spectral rolloff (higher quantization error for low frequencies and lower quantization error for higher frequencies). Thus, the quantization noise spectrum in the reconstmcted signal will always tend to be smaller than the underlying speech signal.
The following description will specifically explain the algorithm for the 9.6Kbit/sec compression rate, except where specifically stated otherwise. The discussion is applicable to the other compression rates by substituting the parameter values found in Table 14, below, and by following the special instructions for each calculation provided throughout the discussion.
TABLE 14: Speech Compression Algorithm Parameters For Three Voice Compression Rates
Parameter                              16Kbit/sec      9.6Kbit/sec       8Kbit/sec
Input Samples (msecs @ 8Ksample/sec)   160 (20msec)    160 (20msec)      192 (24msec)
Decimated Samples (msec)               160 (20msec)    128 (20msec)      144 (24msec)
  @ rate                               @ 8Ksample/sec  @ 6.4Ksample/sec  @ 6Ksample/sec
Sub-Block Size                         40              32                36
Min_Pitch                              40              32                36
Max_Pitch                              160             95                99
Codebook Size                          256             512               512
Vector Size (VSIZE)                    5               8                 9
# of compressed bytes                  40              24                24
p                                      0.75            0.5               0.5
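The entries of Table 14 are mutually consistent, which the following minimal sketch checks. The dictionary layout and the function name frame_parameters are illustrative assumptions, and the division of every frame into four sub-blocks is stated in the text only for the 9.6Kbit/sec rate; the table values happen to agree with it for the other two rates as well.

```python
# Values transcribed from Table 14; the input rate is 8000 samples/sec in all cases.
RATES = {
    "16Kbit/sec":  {"frame_ms": 20, "decimated_rate": 8000, "bytes": 40},
    "9.6Kbit/sec": {"frame_ms": 20, "decimated_rate": 6400, "bytes": 24},
    "8Kbit/sec":   {"frame_ms": 24, "decimated_rate": 6000, "bytes": 24},
}

def frame_parameters(name):
    """Return (input samples, decimated samples, sub-block size, bit rate)."""
    r = RATES[name]
    input_samples = 8000 * r["frame_ms"] // 1000
    decimated = r["decimated_rate"] * r["frame_ms"] // 1000
    sub_block = decimated // 4          # assumption: four sub-blocks per frame
    bit_rate = r["bytes"] * 8 * 1000 // r["frame_ms"]
    return input_samples, decimated, sub_block, bit_rate

for name in RATES:
    print(name, frame_parameters(name))
```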
9.6Kbit/sec Compression Rate Algorithm
For the 9.6Kbit/sec speech compression rate, each frame of 20ms is divided into 4 sub-blocks or segments of 5ms each. Each sub-block of data consists of a plurality of bits for the long term predictor, a plurality of bits for the long term predictor gain, a plurality of bits for the sub-block gain, and a plurality of bits for each code book entry for each 5ms. In the code book block, each 1.25ms of speech is looked up in a 512 word code book for the best match. The table entry is transmitted rather than the actual samples. The code book entries are pre-computed from representative speech segments, as described more fully below.
On the receiving end 1200, the synthesis block 1113 at the receiver performs the same function as the synthesis block 1107 at the transmitter. The synthesis block 1113 reconstructs the original signal from the voice data packets by using the gain and pitch values and code book address corresponding to the error signal most closely matched in the code book. The code book at the receiver is similar to the code book 1105 in the transmitter. Thus the synthesis block recreates the original pre-emphasized signal. The de-emphasis stage 1115 inverts the pre-emphasis operation by restoring the balance of the original speech signal.
The complete speech compression algorithm is summarized as follows:
a) Digitally sample the voice to produce a PCM sample bit stream sampled at 8,000 samples per second.
b) Decimate the 8,000 samples per second sampled data to produce a sampling rate of 6,400 samples per second for the 9.6Kbit/sec compression rate (6,000 samples per second for the 8Kbit/sec algorithm and 8,000 samples per second for the 16Kbit/sec algorithm).
c) Remove any D.C. bias in the speech signal.
d) Pre-emphasize the signal.
e) Find the innovation in the current speech segment by subtracting the prediction from reconstructed past samples. This step requires the synthesis of the past speech samples locally (analysis by synthesis) such that the residual error is fed back into the system.
f) Quantize the error signal using a code book generated from a representative set of speakers and environments. A minimum mean squared error match is determined in 5ms segments. In addition, the code book is designed to provide a quantization error with spectral rolloff (higher quantization error for low frequencies and lower quantization error for higher frequencies). Thus, the quantization noise spectrum in the reconstructed signal will always tend to be smaller than the underlying speech signal.
g) At the transmitter and the receiver, reconstruct the speech from the quantized error signal fed into the inverse of the function in step (e) above. Use this signal for analysis by synthesis and for the output to the reconstruction stage below.
h) Use a de-emphasis filter to reconstruct the output.
The major advantages of this approach over other low-bit-rate algorithms are that there is no need for any complicated calculation of reflection coefficients (no matrix inverse or lattice filter computations). Also, the quantization noise in the output speech is hidden under the speech signal and there are no pitch tracking artifacts: the speech sounds "natural", with only minor increases of background hiss at lower bit-rates. The computational load is reduced significantly compared to a VSELP algorithm, and variations of the present algorithm thus provide bit rates of 8, 9.6 and 16 Kbit/sec, and can also provide bit rates of 9.2Kbit/sec, 9.5Kbit/sec and many other rates. The total delay through the analysis section is less than 20 milliseconds in the 9.6Kbit/sec embodiment. The present algorithm is accomplished completely in the waveform domain; no spectral information is computed and no filter computations are needed.
Detailed Description of the Speech Compression Algorithm
The speech compression algorithm is described in greater detail with reference to Figures 5 through 8, and with reference to the block diagram of the hardware components of the present system shown at Figure 3. The voice compression algorithm operates within the programmed control of the voice control DSP circuit 306. In operation, the speech or analog voice signal is received through the telephone interface 301, 302 or 303 and is digitized by the digital telephone CODEC circuit 305. The CODEC for circuit 305 is a companding μ-law CODEC. The analog voice signal from the telephone interface is band-limited to about 3,000 Hz and sampled at a selected sampling rate by digital telephone CODEC 305. The sample rate in the 9.6Kbit/sec embodiment of the present invention is 8Ksample/sec. Each sample is encoded into 8-bit PCM data, producing a serial 64kb/s bit stream. The digitized samples are passed to the voice control DSP/CODEC of circuit 306. There, the 8-bit μ-law PCM data is converted to 13-bit linear PCM data. The 13-bit representation is necessary to accurately represent the linear version of the logarithmic 8-bit μ-law PCM data. With linear PCM data, simpler mathematics may be performed on the PCM data.
In one embodiment, the voice control DSP/CODEC of circuit 306 corresponds to a single integrated circuit, for example, a WE® DSP16C Digital Signal Processor/CODEC from AT&T Microelectronics, which is a combined digital signal processor and a linear CODEC in a single chip as described above. In one embodiment, the digital telephone CODEC of circuit 305 corresponds to an integrated circuit, such as a T7540 companding μ-law CODEC.
The sampled and digitized PCM voice signals from the telephone μ-law CODEC 305 shown in Figure 3 are passed to the voice control DSP/CODEC circuit 306 via direct data lines clocked and synchronized to a clocking frequency. The sample rate in CODEC 305 in this embodiment of the present invention is 8Ksample/sec. The digital samples are loaded into the voice control DSP/CODEC one at a time through the serial input and stored into an internal queue held in RAM, converted to linear PCM data and decimated to a sample rate of 6.4Ksample/sec. As the samples are loaded into the end of the queue in the RAM of the voice control DSP, the samples at the head of the queue are operated upon by the voice compression algorithm. The voice compression algorithm then produces a greatly compressed representation of the speech signals in a digital packet form. The compressed speech signal packets are then passed to the dual port RAM circuit 308 shown in Figure 3 for use by the main controller circuit 313, either for transfer in the voice-over-data mode of operation or for transfer to the personal computer for storage as compressed voice for functions such as telephone answering machine message data, for use in multi-media documents and the like.
In the voice-over-data mode of operation, voice control DSP/CODEC circuit 306 of Figure 3 will be receiving digital voice PCM data from the digital telephone CODEC circuit 305, compressing it and transferring it to dual port RAM circuit 308 for multiplexing and transfer over the telephone line. This is the transmit mode of operation of the voice control DSP/CODEC circuit 306 corresponding to transmitter block 1100 of Figure 4 and corresponding to the compression algorithm of Figure 4.
Concurrent with this transmit operation, the voice control DSP/CODEC circuit 306 is receiving compressed voice data packets from dual port RAM circuit 308, uncompressing the voice data and transferring the uncompressed and reconstructed digital PCM voice data to the digital telephone CODEC 305 for digital to analog conversion and eventual transfer to the user through the telephone interface 301, 302, 304. This is the receive mode of operation of the voice control DSP/CODEC circuit 306 corresponding to receiver block 1200 of Figure 4 and corresponding to the decompression algorithm of Figure 6. Thus, the voice-control DSP/CODEC circuit 306 is processing the voice data in both directions in a full-duplex fashion.
The voice control DSP/CODEC circuit 306 operates at a clock frequency of approximately 24.576MHz while processing data at sampling rates of approximately 8KHz in both directions. The voice compression/decompression algorithms and packetization of the voice data are accomplished in a quick and efficient fashion to ensure that all processing is done in real-time without loss of voice information. This is accomplished in an efficient manner such that enough machine cycles remain in the voice control DSP circuit 306 during real time speech compression to allow real time acoustic and line echo cancellation in the same fixed-point DSP.
In programmed operation, the availability of an eight-bit sample of PCM voice data from the μ-law digital telephone CODEC circuit 305 causes an interrupt in the voice control DSP/CODEC circuit 306 where the sample is loaded into internal registers for processing. Once loaded into an internal register it is transferred to a RAM address which holds a queue of samples. The queued PCM digital voice samples are converted from 8-bit μ-law data to a 13-bit linear data format using table lookup for the conversion. Those skilled in the art will readily recognize that the digital telephone CODEC circuit 305 could also be a linear CODEC.
Sample Rate Decimation
The sampled and digitized PCM voice signals from the telephone μ-law CODEC 305 shown in Figure 3 are passed to the voice control DSP/CODEC circuit 306 via direct data lines clocked and synchronized to a clocking frequency. The sample rate in this embodiment of the present invention is 8Ksample/sec. The digital samples for the 9.6Kbit/sec and 8Kbit/sec algorithms are decimated using a digital decimation process to produce a 6.4Ksample/sec and 6Ksample/sec rate, respectively. For the 16Kbit/sec algorithm, no decimation is needed.
Referring to Figure 4, the decimated digital samples are shown as speech entering the transmitter block 1100. The transmitter block is the mode of operation of the voice-control DSP/CODEC circuit 306 operating to receive local digitized voice information, compress it and packetize it for transfer to the main controller circuit 313 for transmission on the telephone line. The telephone line connected to telephone line interface 309 of Figure 3 corresponds to the channel 1111 of Figure 4.
The frame size for the voice compression algorithm is 20 milliseconds of speech for each compression. This correlates to 128 samples to process per frame for the 6.4K decimated sampling rate. When 128 samples are accumulated in the queue of the internal DSP RAM, the compression of that sample frame is begun.
Data Flow Description
The voice-control DSP/CODEC circuit 306 is programmed to first remove the DC component 1101 of the incoming speech. The DC removal is an adaptive function to establish a center base line on the voice signal by digitally adjusting the values of the PCM data. This corresponds to the DC removal stage 1203 of the software flow chart of Figure 5. The formula for removal of the DC bias or drift is as follows:
$$x(n) = s(n) - s(n-1) + \alpha \, x(n-1), \qquad \text{where } \alpha = \frac{32735}{32768}$$
and where n = sample number, s(n) is the current sample, and x(n) is the sample with the DC bias removed.
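The DC removal recursion above can be sketched in a few lines. This is a floating-point illustration only, not the fixed-point DSP implementation; the function name remove_dc and the prev_s and prev_x parameters (state carried over from the previous frame) are assumptions made for the example.

```python
ALPHA = 32735 / 32768  # coefficient given in the text

def remove_dc(samples, prev_s=0.0, prev_x=0.0):
    """Apply x(n) = s(n) - s(n-1) + ALPHA * x(n-1) to a sequence of samples.

    prev_s and prev_x are the last input and output of the preceding frame
    (hypothetical parameters; zero is assumed at start-up).
    """
    out = []
    for s in samples:
        x = s - prev_s + ALPHA * prev_x
        out.append(x)
        prev_s, prev_x = s, x
    return out

# A constant (pure DC) input decays towards zero at the output.
decayed = remove_dc([1000] * 10000)
print(decayed[0], decayed[-1])
```

Because ALPHA is very close to one, the offset decays slowly, so the state would normally be carried from frame to frame rather than reset.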
The removal of the DC is for the 20 millisecond frame of voice, which amounts to 128 samples at the 6.4Ksample/sec decimated sampling rate which corresponds to the 9.6Kbit/sec algorithm. The selection of α is based on empirical observation to provide the best result.
Referring again to Figure 5, the voice compression algorithm in a control flow diagram is shown which will assist in the understanding of the block diagram of Figure 4. Figure 7 is a simplified data flow description of the flow chart of Figure 5 showing the sample rate decimator 1241 and the sample rate incrementor 1242. Sample rate decimator 1241 produces an output 1251 of 6.4Ksample/sec for an 8Ksample/sec input in the 9.6Kbit/sec system. (Similarly, a 6Ksample/sec output 1250 is produced for the 8Kbit/sec algorithm, and no decimation is performed on the 8Ksample/sec voice sample rate 1252 for the 16Kbit/sec algorithm.) The analysis and compression begin at block 1201 where the 13-bit linear PCM speech samples are accumulated until 128 samples (for the 6.4Ksample/sec decimated sampling rate) representing 20 milliseconds of voice or one frame of voice are passed to the DC removal portion of code operating within the programmed voice control DSP/CODEC circuit 306. The DC removal portion of the code described above approximates the base line of the frame of voice by using an adaptive DC removal technique.
A silence detection algorithm 1205 is also included in the programmed code of the DSP/CODEC 306. The silence detection function is a summation of the square of each sample of the voice signal over the frame. If the power of the voice frame falls below a preselected threshold, this would indicate a silent frame. The detection of a silence frame of speech is important for later multiplexing of the V-data (voice data) and C-data (asynchronous computer data) described below. During silent portions of the speech, the main controller circuit 313 will transfer conventional digital data (C-data) over the telephone line in lieu of voice data (V-data). The formula for computing the power is
$$PWR = \sum_{n=0}^{\text{Sub-Block Size}-1} x(n) \, x(n)$$
where n is the sample number, and x(n) is the sample value.
If the power PWR is lower than a preselected threshold, then the present voice frame is flagged as containing silence. The 128-sample (Decimated Samples) silent frame is still processed by the voice compression algorithm; however, the silent frame packets are discarded by the main controller circuit 313 so that asynchronous digital data may be transferred in lieu of voice data. The rest of the voice compression is operated upon in segments where there are four segments per frame amounting to 32 samples of data per segment (Sub-Block Size). It is only the DC removal and silence detection which are accomplished over an entire 20 millisecond frame.
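The silence test amounts to comparing a sum of squares against a threshold, as in the following minimal sketch. The function names and the threshold values used in the example are assumptions; the text says only that the threshold is preselected.

```python
def frame_power(frame):
    """Sum of the squares of the samples, PWR = sum of x(n) * x(n)."""
    return sum(x * x for x in frame)

def is_silent(frame, threshold):
    """Flag the frame as silence when its power is below the threshold.

    The threshold is a hypothetical parameter; the text gives no value.
    """
    return frame_power(frame) < threshold

print(is_silent([0] * 128, threshold=1))       # an all-zero frame
print(is_silent([100] * 128, threshold=1000))  # a loud frame
```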
The pre-emphasis 1207 of the voice compression algorithm shown in Figure 4 is the next step. The sub-blocks are first passed through a pre-emphasis stage which whitens the spectral content of the speech signal by balancing the extra energy in the low band with the reduced energy in the high band. The pre-emphasis essentially flattens the signal by reducing the dynamic range of the signal. By using pre-emphasis to flatten the dynamic range of the signal, less of a signal range is required for compression, making the compression algorithm operate more efficiently. The formula for the pre-emphasis is
$$x(n) = x(n) - p \, x(n-1), \qquad \text{where } p = 0.5 \text{ for 9.6Kbit/sec}$$
and where n is the sample number and x(n) is the sample.
Each segment thus amounts to five milliseconds of voice which is equal to 32 samples. Pre-emphasis then is done on each segment. The selection of p is based on empirical observation to provide the best result.
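A minimal sketch of the pre-emphasis step follows. It assumes that x(n-1) on the right-hand side refers to the previous input sample (before pre-emphasis), which is the reading under which the de-emphasis filter described later is its exact inverse. The function name and the prev parameter, which carries the last input sample of the preceding segment, are illustrative.

```python
def pre_emphasis(x, p=0.5, prev=0.0):
    """Return x(n) - p * x(n-1) for each sample of a segment.

    prev is the last input sample of the preceding segment
    (hypothetical parameter; zero is assumed at start-up).
    """
    out = []
    for sample in x:
        out.append(sample - p * prev)
        prev = sample
    return out

print(pre_emphasis([4, 4, 4, 4]))  # a flat input is halved after the first sample
```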
The next step is the long-term prediction (LTP). The long-term prediction is a method to detect the innovation in the voice signal. Since the voice signal contains many redundant voice segments, we can detect these redundancies and only send information about the changes in the signal from one segment to the next. This is accomplished by comparing the speech samples of the current segment on a sample by sample basis to the reconstructed speech samples from the previous segments to obtain the innovation information and an indicator of the error in the prediction.
The long-term predictor gives the pitch and the LTP-Gain of the sub-block, which are encoded in the transmitted bit stream. In order to predict the pitch in the current segment, we need at least 3 past sub-blocks of reconstructed speech. This gives a pitch value in the range of MIN_PITCH to MAX_PITCH (32 and 95, respectively, as given in Table 14). This value is coded with 6 bits. But, in order to accommodate the compressed data rate within a 9600 bps link, the pitch for segments 0 and 3 is encoded with 6 bits, while the pitch for segments 1 and 2 is encoded with 5 bits. When performing the prediction of the Pitch for segments 1 and 2, the correlation lag is adjusted around the predicted pitch value of the previous segment. This gives us a good chance of predicting the correct pitch for the current segment even though the entire range for prediction is not used. The computations for the long-term correlation lag PITCH and associated LTP-gain factor β_j (where j = 0, 1, 2, 3 corresponding to each of the four segments of the frame) are done as follows:
For j = min_pitch, ..., max_pitch, first perform the following computations between the current speech samples x(n) and the past reconstructed speech samples x'(n):
$$S_{xx'}(j) = \sum_{i=0}^{\text{Sub-Block Size}-1} x(i) \, x'(i + \text{MAX\_PITCH} - j)$$
$$S_{x'x'}(j) = \sum_{i=0}^{\text{Sub-Block Size}-1} x'(i + \text{MAX\_PITCH} - j) \, x'(i + \text{MAX\_PITCH} - j)$$
The Pitch j is chosen as that which maximizes $S_{xx'}^2(j) / S_{x'x'}(j)$. Since $\beta_j$ is positive, only j with positive $S_{xx'}(j)$ is considered.
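The lag search can be sketched as follows. It assumes that the past reconstructed speech x' is held in a buffer of MAX_PITCH samples with the most recent sample last, so that x'(i + MAX_PITCH - j) is the sample j positions before current sample i. The function name find_pitch and the fallback of returning (min_pitch, 0.0) when no lag has a positive correlation are assumptions, not taken from the text.

```python
def find_pitch(x, past, min_pitch, max_pitch):
    """Return (pitch, beta) for the lag maximising Sxx'^2 / Sx'x'.

    x    : current sub-block (Sub-Block Size samples)
    past : buffer of MAX_PITCH past reconstructed samples, newest last
    """
    buf_len = len(past)                  # plays the role of MAX_PITCH
    best_j, best_score, best_beta = min_pitch, -1.0, 0.0
    for j in range(min_pitch, max_pitch + 1):
        seg = past[buf_len - j: buf_len - j + len(x)]
        s_xxp = sum(a * b for a, b in zip(x, seg))   # Sxx'(j)
        s_xpxp = sum(b * b for b in seg)             # Sx'x'(j)
        if s_xxp <= 0 or s_xpxp == 0:
            continue                     # only positive correlation is considered
        score = s_xxp * s_xxp / s_xpxp
        if score > best_score:
            best_j, best_score, best_beta = j, score, s_xxp / s_xpxp
    return best_j, best_beta

# Example: a signal that repeats every 40 samples is matched at lag 40.
signal = [((7 * n) % 40) - 20 for n in range(95 + 32)]
print(find_pitch(signal[95:], signal[:95], 32, 70))
```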
For the 9.6Kbit/sec and 8Kbit/sec embodiments, the Pitch is encoded with a different number of bits for each sub-segment. The values of min_pitch and max_pitch (the range of the synthesized speech for pitch prediction of the current segment) are computed as follows:
    if (seg_number = 0 or 3)
    {
        min_pitch = MIN_PITCH
        max_pitch = MAX_PITCH
    }
    if (seg_number = 1 or 2)
    {
        min_pitch = prev_pitch - 15
        if (prev_pitch < MIN_PITCH + 15)
            min_pitch = MIN_PITCH
        if (prev_pitch > MAX_PITCH + 15)
            min_pitch = MAX_PITCH - 30
        max_pitch = min_pitch + 30
    }
(This calculation is not necessary for the 16Kbit/sec algorithm.) The prev_pitch parameter in the above equation is the pitch of the previous sub-segment. The pitch j is then encoded in 6 bits or 5 bits as:
    encoded bits = j - min_pitch
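The search-range rule can be sketched as below for the 9.6Kbit/sec values of Table 14. One deliberate deviation is an assumption of this sketch: as printed, the test prev_pitch > MAX_PITCH + 15 can never be true when prev_pitch lies within MIN_PITCH to MAX_PITCH, so the sketch uses MAX_PITCH - 15, which keeps max_pitch from exceeding MAX_PITCH. The function names are illustrative.

```python
MIN_PITCH, MAX_PITCH = 32, 95   # 9.6Kbit/sec values from Table 14

def pitch_range(seg_number, prev_pitch):
    """Return (min_pitch, max_pitch) for the lag search of one segment."""
    if seg_number in (0, 3):
        return MIN_PITCH, MAX_PITCH          # full range, 6-bit code
    min_pitch = prev_pitch - 15
    if prev_pitch < MIN_PITCH + 15:
        min_pitch = MIN_PITCH
    if prev_pitch > MAX_PITCH - 15:          # assumed; the text prints "+ 15"
        min_pitch = MAX_PITCH - 30
    return min_pitch, min_pitch + 30         # 31 candidate lags, 5-bit code

def encode_pitch(j, min_pitch):
    """Encoded bits = j - min_pitch."""
    return j - min_pitch

print(pitch_range(0, 50), pitch_range(1, 50), pitch_range(2, 90))
```

With these ranges the encoded value is at most 63 for segments 0 and 3 and at most 30 for segments 1 and 2, matching the 6-bit and 5-bit fields.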
The LTP-Gain is given by
$$\beta = \frac{S_{xx'}(j)}{S_{x'x'}(j)} \qquad \text{for } S_{x'x'}(j) \neq 0$$
The value of β is a normalized quantity between zero and unity for this segment, where β is an indicator of the correlation between the segments. For example, a perfect sine wave would produce a β which would be close to unity, since the correlation between the current segments and the previous reconstructed segments should be almost a perfect match, so β is one. The LTP gain factor is quantized from a LTP Gain Encode Table. This table is characterized in Table 15. The resulting index (bcode) is transmitted to the far end. At the receiver, the LTP Gain Factor is retrieved from Table 16, as follows:
$$\beta_q = \text{dlb\_tab}[\text{bcode}]$$
TABLE 15: LTP Gain Encode Table
β threshold:       0.1     0.3     0.5     0.7     0.9
bcode:          0   |   1   |   2   |   3   |   4   |   5
TABLE 16: LTP Gain Decode Table
bcode:   0     1     2     3     4     5
β:       0.0   0.2   0.4   0.5   0.8   1.0
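Quantizing the LTP gain against the five thresholds of Table 15 can be sketched with a binary search. The treatment of a value that falls exactly on a threshold (assigned here to the higher index) is an assumption, since the table does not say which side a boundary belongs to; the function name is illustrative.

```python
import bisect

# Thresholds as printed in Table 15.
LTP_GAIN_THRESHOLDS = [0.1, 0.3, 0.5, 0.7, 0.9]

def encode_ltp_gain(beta):
    """Return bcode (0..5): the number of thresholds at or below beta."""
    return bisect.bisect_right(LTP_GAIN_THRESHOLDS, beta)

for beta in (0.05, 0.2, 0.6, 0.95):
    print(beta, encode_ltp_gain(beta))
```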
After the Long-Term Prediction, we pass the signal through a pitch filter to whiten the signal so that all the pitch effects are removed. The pitch filter is given by:
$$e(n) = x(n) - \beta_q \, x'(n - j)$$
where j is the Lag, and $\beta_q$ is the associated Quantized LTP Gain.
Next, the error signal is normalized with respect to the maximum amplitude in the sub-segment for vector-quantization of the error signal. The maximum amplitude in the segment is obtained as follows:
$$G = \max_n |e(n)|$$
The maximum amplitude (G) is encoded using the Gain Encode Table. This table is characterized in Table 17. The encoded amplitude (gcode) is transmitted to the far end. At the receiver, the maximum amplitude is retrieved from Table 18, as follows:
$$G_q = \text{dlg\_tab}[\text{gcode}]$$
The error signal e(n) is then normalized by
$$e(n) = \frac{e(n)}{G_q}$$
TABLE 17: Gain Encode Table
G threshold:   16   32   64   128   256   512   1024   2048   4096   8192
(gcode increases with G along this axis)
TABLE 18: Gain Decode Table
gcode:   0    1    2    3     4     5     6      7      8      9
G:       16   32   64   128   256   512   1024   2048   4096   8192
From the Gain and LTP Gain Encode tables, we can see that we would require 4 bits for gcode and 3 bits for bcode. This results in total of 7 bits for both parameters. In order to reduce the bandwidth of the compressed bit stream, the gcode and bcode parameters are encoded together in 6 bits, as follows:
BGCODE = 6 * gcode + bcode.
The encoded bits for the G and LTP-Gain (β) at the receiver can be obtained as follows:
gcode = BGCODE / 6
bcode = BGCODE - 6 * gcode
However, these calculations are needed only for the 8 Kbit/sec and 9.6Kbit/sec algorithms.
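The joint coding of gcode and bcode works because bcode takes only six values, so the pair fits in a base-6 style product code. The sketch below assumes gcode ranges over 0 to 9 (the ten entries of Table 18) and that the division at the receiver is an integer division; the function names are illustrative.

```python
def pack_bg(gcode, bcode):
    """BGCODE = 6 * gcode + bcode, for gcode in 0..9 and bcode in 0..5."""
    if not (0 <= gcode <= 9 and 0 <= bcode <= 5):
        raise ValueError("gcode or bcode out of range")
    return 6 * gcode + bcode

def unpack_bg(bgcode):
    """Recover (gcode, bcode) at the receiver using integer division."""
    gcode = bgcode // 6
    bcode = bgcode - 6 * gcode
    return gcode, bcode

print(pack_bg(9, 5), unpack_bg(59))
```

The largest code is 6 * 9 + 5 = 59, which is below 64 and therefore fits in 6 bits instead of the 7 bits needed for separate fields.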
Each segment of 32 samples (Sub-Block Size) is divided into 4 vectors of 8 samples (VSIZE) each. Each vector is compared to the vectors stored in the CodeBook and the Index of the Code Vector that is closest to the signal vector is selected. The CodeBook consists of 512 entries (512 addresses). The index chosen has the least difference according to the following minimization formula:
$$\min \sum_{i=0}^{\text{VSIZE}-1} (x_i - y_i)^2$$
where $x_i$ = the input vector of VSIZE samples (8 for the 9.6Kbit/sec algorithm), and $y_i$ = the code book vector of VSIZE samples (8 for the 9.6Kbit/sec algorithm).
The minimization computation to find the best match between the subsegment and the code book entries is computationally intensive. A brute force comparison may exceed the available machine cycles if real time processing is to be accomplished. Thus, some shorthand processing approaches are taken to reduce the computations required to find the best fit. The above formula can be computed in a shorthand fashion as follows.
By expanding out the above formula, some of the unnecessary terms may be removed and some fixed terms may be pre-computed:
$$(x_i - y_i)^2 = (x_i - y_i)(x_i - y_i) = x_i^2 - x_i y_i - x_i y_i + y_i^2 = x_i^2 - 2 x_i y_i + y_i^2$$
where $x_i^2$ is a constant so it may be dropped from the formula, and the value of $-\tfrac{1}{2} \sum y_i^2$ may be precomputed and stored as the (VSIZE + 1)th value (8 + 1 = 9th value for the 9.6Kbit/sec algorithm) in the code book so that the only real-time computation involved is the following formula:
$$\max \left[ \sum_{i=0}^{\text{VSIZE}-1} x_i y_i - \tfrac{1}{2} \sum_{i=0}^{\text{VSIZE}-1} y_i^2 \right]$$
Thus, for a segment of Sub-Block Size samples (32 for the 9.6Kbit/sec algorithm), we will transmit Sub-Block Size / VSIZE CodeBook indices (4 CodeBook Indices, 9 bits each, for the 9.6Kbit/sec algorithm). Therefore, for the 9.6Kbit/sec algorithm, for each Sub-Block Size segment we will transmit 36 bits representing that segment.
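The shorthand search can be sketched as follows: each code book entry carries the precomputed term as an extra last value, and the search maximizes the correlation plus that term instead of minimizing the squared error. The helper names and the tiny two-sample code book in the example are assumptions for illustration; a real code book for the 9.6Kbit/sec algorithm has 512 entries of 8 samples plus the constant.

```python
def add_constant(vector):
    """Append the precomputed -1/2 * sum(y_i^2) term as the (VSIZE + 1)th value."""
    return list(vector) + [-0.5 * sum(y * y for y in vector)]

def best_index(x, codebook):
    """Index of the entry maximising sum(x_i * y_i) - 1/2 * sum(y_i^2)."""
    best_k, best_score = 0, float("-inf")
    for k, entry in enumerate(codebook):
        # zip stops after len(x) values, so the stored constant is not multiplied.
        score = sum(a * b for a, b in zip(x, entry)) + entry[-1]
        if score > best_score:
            best_k, best_score = k, score
    return best_k

vectors = [[1, 0], [0, 1], [1, 1], [-1, -1]]   # toy code book, VSIZE = 2
codebook = [add_constant(v) for v in vectors]
print(best_index([0.9, 0.8], codebook))
```

The chosen index is the same one a direct minimum squared error comparison would select, because the dropped term depends only on the input vector.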
After the appropriate index into the code book is chosen, the input speech samples are replaced by the corresponding vectors in the chosen indexes. These values are then multiplied by $G_q$ to denormalize the synthesized error signal, e'(n). This signal is then passed through the Inverse Pitch Filter to reintroduce the Pitch effects that were taken out by the Pitch filter. The Inverse Pitch Filter is performed as follows:
$$y(n) = e'(n) + \beta_q \, x'(n - j)$$
where $\beta_q$ is the Quantized LTP-Gain from Table 16, and j is the Lag.
The Inverse Pitch Filter output is used to update the synthesized speech buffer which is used for the analysis of the next sub-segment. The update of the state buffer is as follows:
$$x'(k) = x'(k + \text{MIN\_PITCH})$$
where k = 0, ..., (MAX_PITCH - MIN_PITCH) - 1
$$x'(l) = y(n)$$
where l = MAX_PITCH - MIN_PITCH, ..., MAX_PITCH - 1
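The inverse pitch filter and the buffer update can be sketched together. The sketch assumes the same buffer convention as the correlation formulas, namely MAX_PITCH past samples with the newest last, so that x'(n - j) is read at index n + MAX_PITCH - j, and it assumes the new output samples y(n) fill the MIN_PITCH positions vacated at the end of the buffer (Sub-Block Size equals Min_Pitch for all three rates in Table 14). The function names are illustrative.

```python
def inverse_pitch_filter(e_prime, past, j, beta_q):
    """y(n) = e'(n) + beta_q * x'(n - j), reading x' from the past buffer."""
    buf_len = len(past)                      # plays the role of MAX_PITCH
    return [e + beta_q * past[n + buf_len - j] for n, e in enumerate(e_prime)]

def update_state(past, y, min_pitch):
    """Shift the buffer left by min_pitch samples and append the new output."""
    return past[min_pitch:] + list(y)

past = list(range(95))                       # toy buffer, MAX_PITCH = 95
y = inverse_pitch_filter([0] * 32, past, j=32, beta_q=1.0)
new_past = update_state(past, y, min_pitch=32)
print(len(new_past), new_past[62], new_past[63])
```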
The signal is then passed through the deemphasis filter since preemphasis was performed at the beginning of the processing. In the analysis, only the preemphasis state is updated so that we properly satisfy the Analysis-by-Synthesis method of performing the compression. In the Synthesis, the output of the deemphasis filter, s'(n), is passed on to the D/A to generate analog speech. The deemphasis filter is implemented as follows:
$$s'(n) = y(n) + p \, s'(n-1), \qquad \text{where } p = 0.5 \text{ for the 9.6Kbit/sec algorithm}$$
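The deemphasis recursion can be sketched as below. The function name and the prev parameter (the last output sample of the preceding segment) are assumptions. The example input is the sequence 4, 2, 2, 2, which is what pre-emphasis with p = 0.5 produces from a flat input of 4, so the output shows the recursion undoing that step.

```python
def de_emphasis(y, p=0.5, prev=0.0):
    """Return s'(n) = y(n) + p * s'(n-1) for each sample of a segment.

    prev is the last output sample of the preceding segment
    (hypothetical parameter; zero is assumed at start-up).
    """
    out = []
    for sample in y:
        prev = sample + p * prev
        out.append(prev)
    return out

print(de_emphasis([4.0, 2.0, 2.0, 2.0]))
```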
The voice is reconstructed at the receiving end of the voice-over-data link according to the reverse of the compression algorithm, shown as the decompression algorithm in Figure 6.
If a silence frame is received, the decompression algorithm simply discards the received frame and initializes the output with zeros. If a speech frame is received, the pitch, LTP-Gain and GAIN are decoded as explained above. The error signal is reconstructed from the codebook indexes and then denormalized with respect to the GAIN value. This signal is then passed through the Inverse filter to generate the reconstructed signal. The Pitch and the LTP-Gain are the decoded values, the same as those used in the Analysis. The filtered signal is passed through the Deemphasis filter whose output is passed on to the D/A to put out analog speech. The compressed frame contains 23 8-bit words and one 6-bit word, for a total of 24 words. The total number of bits transferred is 190, which corresponds to 9500 bps as shown in Table 19 (for the 9.6Kbit/sec algorithm).
Table 19: Compressed Frame Packet for 9.6Kbit/sec Algorithm
Bit Number        7        6        5        4        3        2        1        0
Comp_Frame[0]     S        S        P0^5     P0^4     P0^3     P0^2     P0^1     P0^0
Comp_Frame[1]     V2^8     V1^8     V0^8     P1^4     P1^3     P1^2     P1^1     P1^0
Comp_Frame[2]     V5^8     V4^8     V3^8     P2^4     P2^3     P2^2     P2^1     P2^0
Comp_Frame[3]     V7^8     V6^8     P3^5     P3^4     P3^3     P3^2     P3^1     P3^0
Comp_Frame[4]     V9^8     V8^8     BG0^5    BG0^4    BG0^3    BG0^2    BG0^1    BG0^0
Comp_Frame[5]     V11^8    V10^8    BG1^5    BG1^4    BG1^3    BG1^2    BG1^1    BG1^0
Comp_Frame[6]     V13^8    V12^8    BG2^5    BG2^4    BG2^3    BG2^2    BG2^1    BG2^0
Comp_Frame[7]     V15^8    V14^8    BG3^5    BG3^4    BG3^3    BG3^2    BG3^1    BG3^0
Comp_Frame[8]     VQ0^7    VQ0^6    VQ0^5    VQ0^4    VQ0^3    VQ0^2    VQ0^1    VQ0^0     = LS 8 bits VQ[0]
Comp_Frame[9]     VQ1^7    VQ1^6    VQ1^5    VQ1^4    VQ1^3    VQ1^2    VQ1^1    VQ1^0     = LS 8 bits VQ[1]
...
Comp_Frame[22]    VQ14^7   VQ14^6   VQ14^5   VQ14^4   VQ14^3   VQ14^2   VQ14^1   VQ14^0    = LS 8 bits VQ[14]
Comp_Frame[23]    VQ15^7   VQ15^6   VQ15^5   VQ15^4   VQ15^3   VQ15^2   VQ15^1   VQ15^0    = LS 8 bits VQ[15]
where BG = Beta/Gain, P = Pitch, VQ = CodeBook Index and S = Spare Bits
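The 190-bit frame size follows directly from the field widths given in the text, as the short tally below illustrates. The constant and function names are assumptions made for the example; the widths themselves (6, 5, 5 and 6 pitch bits, four 6-bit Beta/Gain codes, and sixteen 9-bit code book indices per 20 ms frame) come from the description above.

```python
PITCH_BITS = [6, 5, 5, 6]   # segments 0..3
BG_BITS = [6, 6, 6, 6]      # one combined Beta/Gain code per segment
VQ_BITS = [9] * 16          # four 9-bit code book indices per segment
FRAME_MS = 20

def frame_bits():
    """Total information bits in one compressed frame."""
    return sum(PITCH_BITS) + sum(BG_BITS) + sum(VQ_BITS)

def bit_rate():
    """Bits per second for one frame every FRAME_MS milliseconds."""
    return frame_bits() * 1000 // FRAME_MS

print(frame_bits(), bit_rate(), 24 * 8 - frame_bits())  # bits, bps, spare bits
```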
Code Book Descriptions
The code books used for the VQ algorithm described above are incorporated by reference as Appendices A, B and C in U.S. application Serial No. 08/346,421, filed November 29, 1994 entitled DYNAMIC SELECTION OF COMPRESSION RATE FOR A VOICE COMPRESSION ALGORITHM IN A VOICE OVER DATA MODEM. Appendix A includes the code book data for the 8Kbit/sec algorithm, Appendix B includes the code book data for the 9.6Kbit/sec algorithm and Appendix C includes the code book data for the 16Kbit/sec algorithm. Table 20 describes the format of the code book for the 9.6Kbit/sec algorithm. The code book values in the Appendices are stored in a signed floating point format which is converted to a fixed point representation of the floating point number when stored in the lookup tables of the present invention. There are 512 entries in each code book corresponding to 512 different speech segments which can be used to encode and reconstruct the speech.
Table 20: Code Book Format for the 9.6Kbit/sec Algorithm
Code Book Entries    ½ Sum² Constant
8 entries            1 entry
For the 9.6Kbit/sec algorithm, the code book comprises a table of nine columns and 512 rows of floating point data. The first 8 columns correspond to the 8 samples of speech and the ninth entry is the precomputed constant described above as -½ Σ y_i². An example of the code book data is shown in Table 21 with the complete code book for the 9.6Kbit/sec algorithm described in Appendix B.
Table 21: Code Book Example for the 9.6Kbit/sec Algorithm
0.786438 1.132875 1.208375 1.206750 1.114250 0.937688 0.772062 0.583250 3.93769
0.609667 1.019167 0.909167 0.957750 0.999833 0.854333 1.005667 0.911250 3.36278
0.614750 1.150750 1.477750 1.548750 1.434750 1.304250 1.349750 1.428250 6.95291
0.657000 1.132909 1.279909 1.204727 1.335636 1.280818 1.162000 0.958818 5.24933
0.592429 0.897571 1.101714 1.337286 1.323571 1.349000 1.304857 1.347143 5.6239
0.325909 0.774182 1.035727 1.263636 1.456455 1.356273 1.076273 0.872818 4.628
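The relationship between the ninth column and the first eight can be checked numerically on the first three rows of Table 21. The values below are transcribed from the table with the digits split by extraction rejoined (for example 1.114250), and the function name is an assumption. The printed ninth value matches one half of the sum of squares in magnitude; the sign convention used when the constant is applied in the search is not shown in the table.

```python
# First three rows of Table 21: eight samples followed by the stored constant.
ROWS = [
    [0.786438, 1.132875, 1.208375, 1.206750, 1.114250, 0.937688, 0.772062, 0.583250, 3.93769],
    [0.609667, 1.019167, 0.909167, 0.957750, 0.999833, 0.854333, 1.005667, 0.911250, 3.36278],
    [0.614750, 1.150750, 1.477750, 1.548750, 1.434750, 1.304250, 1.349750, 1.428250, 6.95291],
]

def half_energy(vector):
    """One half of the sum of the squared samples."""
    return 0.5 * sum(v * v for v in vector)

for row in ROWS:
    print(round(half_energy(row[:8]), 5), row[8])
```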
The code books are stored in PROM memory accessible by the Voice DSP as a lookup table. The table data is loaded into local DSP memory upon the selection of the appropriate algorithm to increase access speed. The code books comprise a table of data in which each entry is a sequential address from 000 to 511. For the 9.6Kbit/sec algorithm, a 9 X 512 code book is used. For the 16Kbit/sec algorithm, a 6 X 256 code book is used and for the 8Kbit/sec algorithm, a 9 X 512 code book is used. Depending upon which voice compression quality and compression rate is selected, the corresponding code book is used to encode/decode the speech samples.
Generation of the Code Books
The code books are generated statistically by encoding a wide variety of speech patterns. The code books are generated in a learning mode for the above-described algorithm in which each speech segment which the compression algorithm is first exposed to is placed in the code book until 512 entries are recorded. Then the algorithm is continually fed a variety of speech patterns upon which the code book is adjusted. As new speech segments are encountered, the code book is searched to find the best match. If the error between the observed speech segment and the code book values exceeds a predetermined threshold, then the closest speech segment in the code book and the new speech segment are averaged and the new average is placed in the code book in place of the closest match. In this learning mode, the code book is continually adjusted to have the lowest difference ratio between observed speech segment values and code book values. The learning mode of operation may take hours or days of exposure to different speech patterns to adjust the code books to the best fit.
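The learning mode described above can be sketched as a simple loop. The use of squared error as the error measure, the function names, and the threshold value in the example are assumptions; the text specifies only that the closest entry is replaced by the average when the error exceeds a predetermined threshold.

```python
def squared_error(a, b):
    """Sum of squared differences between two segments (assumed error measure)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_codebook(segments, size, threshold):
    """Fill the code book with the first `size` segments, then adjust it."""
    codebook = []
    for seg in segments:
        if len(codebook) < size:
            codebook.append(list(seg))       # initial fill
            continue
        k = min(range(size), key=lambda i: squared_error(seg, codebook[i]))
        if squared_error(seg, codebook[k]) > threshold:
            # Replace the closest entry by its average with the new segment.
            codebook[k] = [(p + q) / 2 for p, q in zip(seg, codebook[k])]
    return codebook

print(train_codebook([[0, 0], [10, 10], [2, 2]], size=2, threshold=1))
```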
The code books may be exposed to a single person's speech, which will result in a code book tailored to that particular person's method of speaking. For mass market sale of this product, the speech patterns of a wide variety of speakers of both genders are exposed to the code book learning algorithm for the average fit for a given language. For other languages, it is best to expose the algorithm to speech patterns of only one language, such as English or Japanese.
Voice Over Data Packet Protocol
As described above, the present system can transmit voice data and conventional data concurrently by using time multiplex technology. The digitized voice data, called V-data, carries the speech information. The conventional data is referred to as C-data. The V-data and C-data multiplex transmission is achieved in two modes at two levels: the transmit and receive modes, and the data service level and multiplex control level. This operation is shown diagrammatically in Figure 8.
In transmit mode, the main controller circuit 313 of Figure 3 operates in the data service level 1505 to collect and buffer data from both the personal computer 10 (through the RS232 port interface 315) and the voice control DSP 306.
In multiplex control level 1515, the main controller circuit 313 multiplexes the data and transmits that data out over the phone line 1523. In the receive mode, the main controller circuit 313 operates in the multiplex control level 1515 to de-multiplex the V-data packets and the C-data packets and then operates in the data service level 1505 to deliver the appropriate data packets to the correct destination: the personal computer 10 for the C-data packets or the voice control DSP circuit 306 for V-data.
Transmit Mode
In transmit mode, there are two data buffers, the V-data buffer 1511 and the C-data buffer 1513, implemented in the main controller RAM 316 and maintained by main controller 313. When the voice control DSP circuit 306 engages voice operation, it will send a block of V-data every 20 ms to the main controller circuit 313 through dual port RAM circuit 308. Each V-data block has one sign byte as a header and 24 bytes of V-data.
The sign byte header of the voice packet is transferred every frame from the voice control DSP to the controller 313. The sign byte header contains the sign byte, which identifies the contents of the voice packet. The sign byte is defined as follows:
00 hex = the following V-data contains silent sound
01 hex = the following V-data contains speech information
If the main controller 313 is in transmit mode for V-data/C-data multiplexing, the main controller circuit 313 operates at the data service level to perform the following tests. When the voice control DSP circuit 306 starts to send the 24-byte V-data packet through the dual port RAM to the main controller circuit 313, the main controller will check the V-data buffer to see if the buffer has room for 24 bytes. If there is sufficient room in the V-data buffer, the main controller will check the sign byte in the header preceding the V-data packet. If the sign byte is equal to one (indicating voice information in the packet), the main controller circuit 313 will put the following 24 bytes of V-data into the V-data buffer and clear the silence counter to zero. Then the main controller 313 sets a flag to request that the V-data be sent by the main controller at the multiplex control level.
If the sign byte is equal to zero (indicating silence in the V-data packet), the main controller circuit 313 will increase the silence counter by 1 and check if the silence counter has reached 5. When the silence counter reaches 5, the main controller circuit 313 will not put the following 24 bytes of V-data into the V-data buffer and will stop increasing the silence counter. By this method, the main controller circuit 313 operating at the service level will only provide non-silence V-data to the multiplex control level, while discarding silence V-data packets and preventing the V-data buffer from being overwritten.
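The service-level tests above can be sketched as a small class. This is an illustrative reading of the text, not the controller firmware: the buffer capacity, the class and method names, the choice to drop a packet when the buffer lacks room, and the choice to buffer the first four silence packets in the same way as speech packets are all assumptions.

```python
V_PACKET_SIZE = 24   # bytes of V-data per packet, per the text
SILENCE_LIMIT = 5    # value at which the silence counter stops, per the text

class VDataService:
    """Transmit-side data service level sketch (names are hypothetical)."""

    def __init__(self, capacity=240):  # capacity is an assumed value
        self.buffer = bytearray()
        self.capacity = capacity
        self.silence_counter = 0
        self.send_request = False      # flag read by the multiplex control level

    def on_packet(self, sign, payload):
        """Return True if the packet was buffered, False if it was discarded."""
        if len(payload) != V_PACKET_SIZE:
            raise ValueError("V-data packet must be 24 bytes")
        if self.capacity - len(self.buffer) < V_PACKET_SIZE:
            return False               # assumed behavior: no room, drop the packet
        if sign == 0x01:               # speech information
            self.silence_counter = 0
        elif sign == 0x00:             # silent sound
            if self.silence_counter >= SILENCE_LIMIT:
                return False           # counter already stopped at the limit
            self.silence_counter += 1
            if self.silence_counter >= SILENCE_LIMIT:
                return False           # fifth consecutive silence packet is dropped
        else:
            raise ValueError("unknown sign byte")
        self.buffer += payload
        self.send_request = True
        return True
```

Under these assumptions a long pause costs at most four buffered silence packets before the sketch stops forwarding V-data, and the first speech packet afterwards resets the counter.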
The operation of the main controller circuit 313 in the multiplex control level is to multiplex the V-data and C-data packets and transmit them through the same channel. At this control level, both types of data packets are transmitted by the HDLC protocol, in which data is transmitted in synchronous mode and checked by CRC error checking. If a V-data packet is received at the remote end with a bad CRC, it is discarded, since 100% accuracy of the voice channel is not ensured. If the V-data packets were re-sent in the event of corruption, the real-time quality of the voice transmission would be lost. In addition, the C-data is transmitted following a modem data communication protocol such as CCITT V.42. In order to identify the V-data block, to assist the main controller circuit 313 in multiplexing the packets for transmission at this level, and to assist the remote site in recognizing and de-multiplexing the data packets, a V-data block is defined which includes a maximum of five V-data packets. The V-data block size and the maximum number of blocks are defined as follows:
The V-data block header = 80h;
The V-data block size = 24;
The maximum V-data block size = 5;
The V-data block has higher priority to be transmitted than C-data to ensure the integrity of the real-time voice transmission. Therefore, the main controller circuit 313 will check the V-data buffer first to determine whether it will transmit V-data or C-data blocks. If the V-data buffer holds more than 69 bytes of V-data, a transmit block counter is set to 5 and the main controller circuit 313 starts to transmit V-data from the V-data buffer through the data pump circuit 311 onto the telephone line. Since the transmit block counter indicates that 5 blocks of V-data will be transmitted in a continuous stream, the transmission stops either when the 115 bytes of V-data have been sent or when the V-data buffer is empty. If the V-data buffer holds more than 24 bytes of V-data, the transmit block counter is set to 1 and the main controller circuit starts to transmit V-data. This means that the main controller circuit will only transmit one block of V-data. If the V-data buffer holds less than 24 bytes of V-data, the main controller circuit services the transmission of C-data. During the transmission of a C-data block, the V-data buffer condition is checked before transmitting the first C-data byte. If the V-data buffer contains more than one V-data packet, the current transmission of the C-data block will be terminated in order to handle the V-data.
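The priority decision made at the multiplex control level can be sketched as a single function. The thresholds are copied literally from the text; the function name and return convention are hypothetical, and the text does not say what happens when exactly 24 bytes are buffered, so this sketch lets that case fall through to C-data as an assumption.

```python
def choose_transmission(v_bytes_buffered):
    """Return (kind, transmit_block_counter) for the next transmission.

    kind is "V" for V-data or "C" for C-data. Thresholds follow the text
    literally (more than 69 bytes, more than 24 bytes).
    """
    if v_bytes_buffered > 69:
        return ("V", 5)   # send up to five blocks in a continuous stream
    if v_bytes_buffered > 24:
        return ("V", 1)   # send a single block
    return ("C", 0)       # too little V-data pending: service C-data
```

Because V-data is tested first on every pass, C-data only moves when the voice buffer is nearly empty, which is how the scheme keeps voice latency bounded.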
Receive Mode
On the receiving end of the telephone line, the main controller circuit 313 operates at the multiplex control level to de-multiplex received data to V-data and C-data. The type of block can be identified by checking the first byte of the incoming data blocks. Before receiving a block of V-data, the main controller circuit 313 will initialize a receive V-data byte counter, a backup pointer and a temporary V-data buffer pointer. The value of the receive V-data byte counter is 24, the value of the receive block counter is 0 and the backup pointer is set to the same value as the V-data receive buffer pointer. If the received byte is not equal to 80 hex (80h indicating a V-data packet), the receive operation will follow the current modem protocol since the data block must contain C-data. If the received byte is equal to 80h, the main controller circuit 313 operating in receive mode will process the V-data. For a V-data block received, when a byte of V-data is received, the byte of V-data is put into the V-data receive buffer, the temporary buffer pointer is increased by 1 and the receive V-data counter is decreased by 1. If the V-data counter is down to zero, the value of the temporary V-data buffer pointer is copied into the backup pointer buffer. The value of the total V-data counter is increased by 24 and the receive V-data counter is reset to 24. The value of the receive block counter is increased by 1. A flag to request service of V-data is then set. If the receive block counter has reached 5, the main controller circuit 313 will not put the incoming V-data into the V-data receive buffer but will throw it away. If the total V-data counter has reached its maximum value, the receiver will not put the incoming V-data into the V-data receive buffer but will throw it away.
At the end of the block, which is indicated by receipt of the CRC check bytes, the main controller circuit 313 operating in the multiplex control level will not check the result of the CRC but instead will check the value of the receive V-data counter. If the value is zero, the check is finished; otherwise the value of the backup pointer is copied back into the current V-data buffer pointer. By this method, the receiver is ensured to de-multiplex the V-data from the receiving channel 24 bytes at a time. The main controller circuit 313 operating at the service level in the receive mode will monitor the flag requesting service of V-data. If the flag is set, the main controller circuit 313 will get the V-data from the V-data buffer and transmit it to the voice control DSP circuit 306 at a rate of 24 bytes at a time. After sending a block of V-data, it subtracts 24 from the value in the total V-data counter.
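The receive-side bookkeeping can be sketched as follows. This is a simplified model under stated assumptions: the whole block is presented at once with the CRC bytes already stripped, the total V-data counter and its maximum are omitted, and the end-of-block check is modelled by always truncating the buffer back to the backup pointer, which has the same effect of removing any partial packet. The name `demux_block` is hypothetical.

```python
V_BLOCK_HEADER = 0x80
PACKET_SIZE = 24
MAX_PACKETS = 5

def demux_block(block, rx_buffer):
    """Commit whole 24-byte V-data packets from one block into rx_buffer.

    Returns the number of packets committed, or None when the block does
    not start with 80h and must therefore be handled as C-data.
    """
    if not block or block[0] != V_BLOCK_HEADER:
        return None
    backup = len(rx_buffer)      # backup pointer: end of last complete packet
    remaining = PACKET_SIZE      # receive V-data byte counter
    packets = 0                  # receive block counter
    for byte in block[1:]:
        if packets >= MAX_PACKETS:
            break                # anything past five packets is thrown away
        rx_buffer.append(byte)
        remaining -= 1
        if remaining == 0:       # a full packet has arrived: commit it
            backup = len(rx_buffer)
            remaining = PACKET_SIZE
            packets += 1
    del rx_buffer[backup:]       # roll back any partially received packet
    return packets
```

The rollback is what guarantees that the voice DSP is only ever handed multiples of 24 bytes, even when a block is truncated on the line.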
Negotiation of Voice Compression Rate
The modem hardware component 20 incorporates a modified packet protocol for negotiation of the speech compression rate. A modified supervisory packet is formatted using the same open flag, address, CRC, and closing flag formatting bytes which are found in the CCITT V.42 standard data supervisory packet, as is well known in the industry and as is described in the CCITT Blue Book, volume VIII, entitled Data Communication over the Telephone Network, 1989, referenced above. In the modified packet protocol embodiment, the set of CCITT standard header bytes (control words) has been extended to include nonstandard control words used to signal transmission of a nonstandard communication command. The use of a nonstandard control word does not conflict with other data communication terminals, for example, when communicating with a non-PCS (Personal Communications System) modem system, since the nonstandard packet will be ignored by a non-PCS system.
Table 22 offers one embodiment of the present invention showing a modified supervisory packet structure. Table 22 omits the CCITT standard formatting bytes: open flag, address, CRC, and closing flag; however, these bytes are described in the CCITT standard. The modified supervisory packet is distinguished from a V.42 standard packet by using a nonstandard control word, such as 80 hex, as the header. The nonstandard control word does not conflict with V.42 standard communications.
TABLE 22: Modified Supervisory Packet Structure
80h ID LI ACK data data data
The modified supervisory packet is transmitted by the HDLC protocol in which data is transmitted in synchronous mode and checked by CRC error checking. The use of a modified supervisory packet eliminates the need for an escape command sent over the telephone line to interrupt data communications, providing an independent channel for negotiation of the compression rate. The channel may also be used as an alternative means for programming standard communications parameters.
The modified supervisory packet is encoded with different function codes to provide an independent communications channel between hardware components. This provides a means for real time negotiation and programming of the voice compression rate during uninterrupted transmission of voice data and conventional data without the need for conventional escape routines. The modified supervisory packet is encoded with a function code using several embodiments. For example, in one embodiment, the function code is embedded in the packet as one of the data words and is located in a predetermined position. In an alternate embodiment, the supervisory packet header signals a nonstandard supervisory packet and contains the compression rate to be used between the sites. In such an embodiment, for example, a different nonreserved header is assigned to each function code. These embodiments are not limiting and other methods known to those skilled in the art may be employed to encode the function code into the modified supervisory packet.
Referring once again to Figure 1, a system consisting of PCS modem 20 and data terminal 10 is connected via phone line 30 to a second PCS system comprised of PCS modem 20A and data terminal 10A. Calling modem 20 initializes communication with receiving modem 20A. In one embodiment of the present invention, a speech compression command is sent via a modified supervisory data packet as the request for speech compression algorithm and ratio negotiation. Encoded in the speech compression command are the particular speech compression algorithm and the speech compression ratio desired by the calling PCS modem 20. Several methods for encoding the speech compression algorithm and compression ratio exist. For example, in embodiments where the function code is embedded in the header byte, the first data byte of the modified supervisory packet could be used to identify the speech compression algorithm using a binary coding scheme (e.g., 00h for Vector Quantization, 01h for CELP+, 02h for VCELP, and 03h for TrueSpeech, etc.). A second data byte could be used to encode the speech compression ratio (e.g., 00h for 9.6Kbit/sec, 01h for 16Kbit/sec, 02h for 8Kbit/sec, etc.). This embodiment of the speech compression command supervisory packet is shown in Table 23.
TABLE 23: Speech Compression Command Supervisory Packet
80h ID LI ACK Algthm CRatio data
Alternatively, as stated above, the function code could be stored in a predetermined position of one of the packet data bytes. Other function code encoding embodiments are possible without deviating from the scope and spirit of the present invention and the embodiments offered are not intended to be exclusive or limiting embodiments.
In either case, the receiving PCS modem 20A will recognize the speech compression command and will respond with an acknowledge packet using, for instance, a header byte such as hex 81. The acknowledge packet will alert the calling modem 20 that the speech compression algorithm and speech compression ratio selected are available by use of the ACK field of the supervisory packet shown in Table 23. Receipt of the acknowledge supervisory packet causes the calling modem 20 to transmit subsequent voice over data information according to the selected speech compression algorithm and compression ratio.
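The command and acknowledge exchange can be sketched with the Table 23 field order. Several points are assumptions made only for illustration: LI is treated as a count of the data bytes that follow, the ID value is arbitrary, an ACK value of 1 is taken to mean "available", and the function names and dictionary keys are hypothetical. The HDLC flag, address, and CRC bytes are omitted, as they are in the tables.

```python
SUPERVISORY_HEADER = 0x80   # nonstandard control word from the text
ACK_HEADER = 0x81           # example acknowledge header from the text

ALGORITHMS = {"VQ": 0x00, "CELP+": 0x01, "VCELP": 0x02, "TrueSpeech": 0x03}
RATIOS = {"9.6Kbit/sec": 0x00, "16Kbit/sec": 0x01, "8Kbit/sec": 0x02}

def build_command(algorithm, ratio, packet_id=0):
    """Build a speech compression command: 80h ID LI ACK Algthm CRatio."""
    data = bytes([ALGORITHMS[algorithm], RATIOS[ratio]])
    # LI is assumed to hold the number of data bytes; ACK is 0 in a request.
    return bytes([SUPERVISORY_HEADER, packet_id, len(data), 0]) + data

def build_ack(command, supported):
    """Answer a command; ACK=1 (assumed) when the requested pair is supported."""
    algorithm, ratio = command[4], command[5]
    ack = 1 if (algorithm, ratio) in supported else 0
    return bytes([ACK_HEADER, command[1], 2, ack, algorithm, ratio])
```

For example, `build_command("VCELP", "8Kbit/sec", packet_id=7)` yields the six bytes `80 07 02 00 02 02`, and a receiver that supports that pair would answer with `81 07 02 01 02 02` under these assumptions.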
The frequency with which the speech compression command supervisory packet is transmitted will vary with the application. For moderate quality voice over data applications, the speech compression algorithm need only be negotiated at the initialization of the phone call. For applications requiring more fidelity, the speech compression command supervisory packet is renegotiated throughout the call to accommodate new parties to the communication which have different speech compression algorithm limitations, or to actively tune the speech compression ratio as the quality of the communications link fluctuates. Other embodiments provide for varying transmission rates of the speech compression command supervisory packet and for different methods of speech compression algorithm and compression ratio negotiation. Additionally, other encoding embodiments to encode the supervisory packet speech compression algorithm and the speech compression ratio may be incorporated without deviating from the scope and spirit of the present invention, and the described embodiments are not exclusive or limiting.
A new supervisory packet may be allocated for use as a means for negotiating the multiplexing scheme for the various types of information sent over the communications link. For example, if voice over data mode is employed, there exist several methods for multiplexing the voice and digital data. The multiplexing scheme may be selected by using a modified supervisory packet, called a multiplex supervisory packet, to negotiate the selection of the multiplexing scheme.
Similarly, another supervisory packet could be designated for remote control of another hardware device. For example, to control the baud rate or data format of a remote modem, a remote control supervisory packet could be encoded with the necessary selection parameters needed to program the remote device.
Detailed Description of a Mode Switching System
Referring again to Figure 1, consider the case where a first user on modem 20 has established analog voice communications with a second user at remote modem 20a. As shown in Figure 9, the first user and second user may wish to establish either digital data communications or voice over data communications without terminating the existing analog voice telephone connection. The term "digital data link" will be used to describe the digital link established to commence either a digital data communications mode, a voice over data communications mode, or a combination of the two modes. In digital data communications mode the modem transmits digital data, and in voice over data communications mode the modem transmits multiplexed packetized voice and data packets. The termination of the digital data link results in an exit by hang up or by return to analog voice mode.
As illustrated in Figure 9, the users begin in the analog voice mode 400 and a digital data link is initiated 410 by the method and apparatus described herein. After handshaking mode is complete 412 the digital data link is established 414. Depending on the particular application the users may enter a digital data communications mode 420, a voice over data communications mode 430, or a sequential combination of the two modes, as shown in Figure 9. The users may exit 440 by hanging up the telephone lines 450 or by reentering analog voice mode 400. Throughout this description the numberings shown in Figure 3 shall be used to indicate the components of modem 20, and similar numbering shall be used to indicate modem 20a by attaching an "a" suffix to each component of Figure 3. For example, the main controller of modem 20 is controller 313, whereas the main controller of modem 20a is controller 313a (not shown).
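The mode flow of Figure 9 can be summarized as a small transition table. The state and event names below are hypothetical labels chosen for this sketch, and ignoring an event that is not valid in the current state is an assumption rather than behavior stated in the text.

```python
# (current state, event) -> next state; reference numerals from Figure 9 in comments
TRANSITIONS = {
    ("analog_voice", "initiate_link"): "handshaking",            # 400 -> 410
    ("handshaking", "handshake_complete"): "digital_link",       # 412 -> 414
    ("digital_link", "select_data"): "digital_data",             # 420
    ("digital_link", "select_voice_over_data"): "voice_over_data",  # 430
    ("digital_data", "select_voice_over_data"): "voice_over_data",
    ("voice_over_data", "select_data"): "digital_data",
    ("digital_data", "return_to_analog"): "analog_voice",        # exit 440 -> 400
    ("voice_over_data", "return_to_analog"): "analog_voice",
    ("digital_data", "hang_up"): "on_hook",                      # exit 440 -> 450
    ("voice_over_data", "hang_up"): "on_hook",
}

def step(state, event):
    """Return the next mode; events not valid in `state` leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Driving the table with `initiate_link`, `handshake_complete`, `select_voice_over_data`, and `return_to_analog` walks a call from analog voice through a voice over data session and back to analog voice without hanging up.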
Switching Systems for Establishing The Digital Data Link
In one embodiment of the present invention the first user and the second user establish the digital data link by pressing a hardwired switch 330 located on modem 20 and a similar switch 330a located on modem 20a, at approximately the same time. Switch 330 is shown in Figure 3 as one means for initiating the digital data link. To ensure consistent handshaking, the users have predetermined which one will be the originating modem and which one will be the answering modem. Controller 313 determines whether its modem is originating or answering based on whether it receives an originate signal 332 or an answer signal 334, as predetermined by the users. Controller 313 of modem 20 detects when the switch 330 is pressed by the first user and controller 313a of modem 20a detects when the switch 330a is pressed by the second user. Both modems 20 and 20a execute software to establish a digital link through handshaking protocols specified in CCITT V-series modem protocols (some examples are V.22, V.22bis, V.32 and V.34 protocols). In another embodiment the users initiate digital data communications by a software switch which is selected from a menu of options displayed on computers 10 and 10a, respectively. The software switch also contains options for each user to select their originating or answer status.
As shown in Figure 10, an analog voice connection is established 500 and when both modems detect the pressed switches 330 and 330a, the modems are placed in handshaking mode. The designation of originating modem and answering modem is predefined by the users before entering into handshaking mode 510. After handshaking is completed 520 the digital data link is established 530 and the modems may enter either the digital data communications mode for digital data transfer or the voice over data communications mode for multiplexed voice and data packet transfer 540. The exit routines 550 will be discussed in further detail below (see Figure 15).
In yet another embodiment, as shown in Figure 11, the switch between analog voice mode and the digital data link modes is accomplished using a switching signal. Both modems are preprogrammed to idle in an origination state 600 prior to the analog voice connection 610 and both modems have a hardware mode switch to force the modem into an answer state 620. When both modems are in the origination state the analog voice communications are conducted normally and without interruption. If the hardware switch is depressed on one of the modems, that modem (e.g., modem A) will enter an answer mode and transmit an answer tone, which is used as a switching signal 630. The answer tone is detected by the modem which is still in origination mode (modem B) 640 and the originating modem and the answering modem handshake with the originator/answer designation forced by the user depressing the hardware mode switch 650. The digital link is thereby established 660 and digital data communications and voice over data communications are operable 670.
As shown in Figure 12, a variation of this embodiment occurs when both modems idle in the answer state 700 and the mode switch is used to force one of the modems into an originator mode 720. The originating modem thereby transmits a calling tone which is used as a switching signal 730. The answering modem detects the calling tone and responds with an answering tone 740, and the modems handshake 750 with the originator/answer designation forced by the hardware mode switch. The digital link is thereby established 760 and digital data communications and voice over data communications are operable 770.
In yet another embodiment the modems are idling with a software routine designed to poll telephone line interface 309 in order to detect transmission of a predetermined switching tone sent from another modem. Both modems include a mode switch that has both an origination and an answer mode selection. Figure 13 shows that when a user depresses the mode switch to force one modem into the origination mode 810 and 820, the other modem detects the calling signal generated by that originating modem and the resident software forces the second modem into an answering mode 830. In this case the mode signal is the calling signal. If the user depresses the mode switch to force the first modem into the answering mode, the first modem generates an answer tone which is the switching signal 840. The answer tone is decoded by the second modem and the software on the second modem forces that modem into an origination mode 850.
The last three embodiments eliminate the need for both operators to predetermine which modem will be originating and which modem will be answering. They also provide the users with the ability to unilaterally establish a digital data link. In one embodiment the answer tone is a 2100 Hz tone and the calling tone is a 1300 Hz tone. Those skilled in the art will readily recognize that other tone frequencies and audio signals may be used as switching signals without departing from the scope and spirit of the present invention. For example, a dual-tone multifrequency (DTMF) tone may be substituted for the switching tone. Another example incorporates the use of a sequence of DTMF tones to be decoded as a mode switching signal, in place of a single switching tone.
In one embodiment, both modems are preprogrammed to monitor telephone line interface 309 in order to detect the switching tone using codec/DSP 311. In another embodiment the switching tone is detected using DSP 306. Alternate embodiments include signal debouncing means to eliminate accidental triggering of the modems into the handshake mode.
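One common way to detect a single switching tone in sampled line audio is the Goertzel algorithm; the text does not state which detection method the DSP uses, so the following is only a sketch of one possible approach. The 8000 Hz sample rate, the decision threshold, and the function names are assumptions; the 2100 Hz and 1300 Hz frequencies are the answer and calling tones given above.

```python
import math

def make_tone(freq, count, sample_rate=8000):
    """Generate `count` samples of a unit-amplitude sine wave (test helper)."""
    return [math.sin(2.0 * math.pi * freq * n / sample_rate) for n in range(count)]

def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of `samples` at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def classify_tone(samples, sample_rate=8000, threshold=0.25):
    """Return "answer" (2100 Hz), "calling" (1300 Hz), or None."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None
    scores = {
        "answer": goertzel_power(samples, sample_rate, 2100.0),
        "calling": goertzel_power(samples, sample_rate, 1300.0),
    }
    name = max(scores, key=scores.get)
    # A pure tone at the target frequency gives a normalized score near 0.5.
    if scores[name] / (len(samples) * energy) >= threshold:
        return name
    return None
```

Requiring the same classification over several consecutive frames would be one simple form of the debouncing mentioned above, since a brief speech sound rarely concentrates its energy at one frequency for long.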
In an alternate embodiment the mode switch is actually a software switch, which is operated by the user at the terminal attached to the modem.
Those skilled in the art will readily recognize that other methods of initiating the digital data link may be substituted for the methods described herein without departing from the spirit and scope of the present invention, and the methods taught herein are not intended in a limiting or exclusive sense.
Establishing Digital Data Communications Using ATD/ATA Commands
One embodiment provides a digital data link between modems 20 and 20a by the use of ATD and ATA modem commands to place the modems in the handshaking mode. This method and apparatus does not require hardware switches 330 or 330a, but does require that both users predetermine which will be an originating modem and which will be an answering modem, as shown in Figure 14, step 910. When the first user and second user desire a digital data link, one of the users will transmit to its respective modem an ATD (dialing) command 920. The other user will transmit an ATA (answering) command to its respective modem 920. Transmission of these commands may be initiated with either a software command or a hardware switching device which generates the ATD/ATA commands and transfers the appropriate command to their respective modems.
If, for example, the first user transmits the ATD command to modem 20, then controller 313 of modem 20 receives the command and places modem 20 in originating mode. In this example, an ATA command is issued to modem 20a which places modem 20a in handshaking mode and initiates an answer tone 930. Modem 20, which is in the origination mode, receives the answer tone generated by modem 20a and initiates digital data communications through handshaking according to CCITT V-series modem protocols 940, 950. In one embodiment the modems establish communication parameters during handshaking. Some of the communications parameters negotiated include baud rate and digital data protocols. Those skilled in the art will readily recognize that other protocols may be substituted without departing from the scope and spirit of this embodiment of the present invention. In an alternate embodiment parameter negotiation is performed by the modified supervisory packet as described in the above-mentioned US Patent Application Serial Number 08/271,496 filed July 7, 1994 entitled "VOICE OVER DATA MODEM WITH SELECTABLE VOICE COMPRESSION".
Establishing Voice Over Data Communications Using ATD/ATA Commands
In one embodiment voice over data communications are performed by establishing the digital data link as described above and then incorporating a supervisory packet to signal the voice over data communications mode as described in the copending US Patent No. 5,535,204, issued July 9, 1996 entitled "RINGDOWN AND RINGBACK SIGNALLING FOR A COMPUTER-BASED MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM", which was incorporated by reference, above.
In a second embodiment, establishment of the digital data link may automatically invoke the voice over data communications mode which additionally incorporates advanced priority statistical multiplexing (APSM). APSM is described in the copending US Patent Application Serial Number 08/349,505 filed December 2, 1994 entitled "VOICE OVER DATA CONFERENCING FOR A COMPUTER-BASED PERSONAL COMMUNICATIONS SYSTEM", which is hereby incorporated by reference. APSM allows the digital link to use as much bandwidth for voice as is necessary, and the remaining bandwidth is dynamically allocated to digital data communications, thus eliminating the need to switch between a digital data transmission mode and a voice over data transmission mode.
Yet another embodiment switches between the digital data communications mode and the voice over data communications mode by using special mode switching codes transmitted by the users of the modems 20 and 20a after the digital link is established. After the voice over data communications link is established, the modem 20a may negotiate communications parameters such as speech compression ratio and voice algorithm selection using the modified supervisory packet as detailed in the above-mentioned US Patent Application Serial Number 08/271,496 filed July 7, 1994 entitled "VOICE OVER DATA MODEM WITH SELECTABLE VOICE COMPRESSION".
One Example of Establishing The Digital Data Link Using Calling Tones
Alternate methods and apparatus may be employed to switch from analog voice mode to digital data communications mode or voice over data communications mode. For example, in one embodiment, if modem 20 is the originating modem and modem 20a is the answering modem, a 1300 Hz calling tone is used to initiate transfer from analog voice mode to digital communications mode. The originating modem (modem 20 in this instance) is programmed to transmit a 1300 Hz calling tone to originate contact with an answering modem, and codec/DSP 311 is programmed to detect the 2100 Hz answer tone received from the answering modem (modem 20a in this instance). When both users wish to establish the digital data link, the originating modem (20) transmits the 1300 Hz calling tone and the answering modem (20a) transmits a 2100 Hz answering tone, which is detected by the originating modem (20). Upon detection of the answering tone, the originating modem and the answering modem begin handshaking to establish the digital data link.
Other audible tones may be substituted without departing from the scope and spirit of the present invention and other hardware may be configured to detect the audible signal. For example, in an alternate embodiment the codec/DSP of the originating modem is preprogrammed to detect the 2100 Hz calling tone. In yet another embodiment, a dedicated detector is added to the hardware to detect the answering tone and signal the modem electronics that a digital link is being initiated.
In an alternate embodiment, both modems are constantly monitoring their respective telephone line interfaces (309 and 309a) using codec/DSP (311 and 311a) to detect an audio calling signal. This allows one modem to initiate the data link; both users need not instruct their modems to establish the digital data link. Additional codes are used to place the modems in digital data communications mode or voice over data communications mode after the digital link is established. The APSM system described in the previous section entitled "Establishing Voice Over Data Communications Using ATD/ATA Commands" automatically switches between digital data communications mode and voice over data communications mode according to the data being transferred between the modems. The modified supervisory packet also discussed in that section provides an additional communications channel and enables negotiation of communications parameters as described above via the supervisory packet.
One Example of Establishing The Digital Data Link Using DTMF Tones
Specialized DTMF tones may be used to initiate the establishment of the digital data link while in analog voice mode. In one embodiment the user manually enters a predetermined DTMF tone from the telephone keypad during the analog voice connection to initiate establishment of the digital link. In another embodiment, a DTMF tone sequence is detected to switch from analog voice mode to digital data link mode. In yet another embodiment, the modem software is preprogrammed to dial the numbers in order to generate the DTMF tone sequence. For example, in one embodiment to initiate digital data communications, data pump DSP circuit 311 is preprogrammed to recognize a special DTMF tone sequence which initiates the establishment of the digital data link between the originating and answering modems.
For example, if the special DTMF tone sequence is represented by a particular dialing sequence, for instance, 5-5-6-2, then the user initiates the digital data link by pressing touch tone buttons 5-5-6-2 in the proper sequence during the analog voice mode. These numbers represent the DTMF tones decoded by the answering modem to begin modem handshaking.
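Once individual DTMF digits have been decoded, recognizing the 5-5-6-2 sequence reduces to matching the most recent digits against the trigger. The sketch below assumes the digits arrive one at a time as characters; the class name is hypothetical, and no inter-digit timeout is modelled, although a practical detector would likely want one.

```python
class DtmfSequenceDetector:
    """Signals when the last decoded digits equal the trigger sequence."""

    def __init__(self, sequence="5562"):
        self.sequence = sequence
        self.recent = ""

    def feed(self, digit):
        """Add one decoded digit; return True when the sequence has just completed."""
        # Keep only as many trailing digits as the trigger sequence is long.
        self.recent = (self.recent + digit)[-len(self.sequence):]
        return self.recent == self.sequence
```

Keeping a sliding window rather than resetting on the first mismatch means an input such as 5-5-5-6-2 still triggers, because the final four digits match.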
Speech Recognition Mode Switching
One skilled in the art will readily recognize that other signalling techniques may be employed without departing from the scope and spirit of the present invention. For example, in one embodiment nonstandard signalling, such as speech recognition, is incorporated into digital signal processor 306. DSP 306 is preprogrammed to learn and recognize verbal commands which are issued by a first operator to enter digital data communications mode or voice over data communications mode. The commands may be understood by both the local modem and the remote site modem since both modems are connected to a common analog voice connection. The commands are executed automatically upon recognition by DSPs 306. In another embodiment, codec/DSP 312 monitors telephone line interface 309 to detect predetermined voice commands to establish digital data communications and voice over data communications.
Establishing Facsimile Mode Using Facsimile Tone
Yet another embodiment incorporates an 1100 Hz facsimile audible tone for switching from analog voice mode to facsimile mode. The detection of the 1100 Hz facsimile signal is accomplished by monitoring telephone line interface 309 using codec/DSP 311 and switching to facsimile mode upon signal detection.
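The text does not say how the 1100 Hz signal is detected. One common way to test for a single frequency is the Goertzel algorithm, sketched below as an illustration only. The 8000 Hz sampling rate, the 400-sample block length used in the example, the 0.5 energy-ratio threshold, and the function names are all assumptions made for this sketch, not values from the specification.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def is_fax_tone(samples, sample_rate=8000, tone_hz=1100.0, ratio=0.5):
    """Assumed heuristic: the tone is present when most of the block's
    energy falls in the 1100 Hz bin."""
    n = len(samples)
    total = sum(x * x for x in samples)
    if n == 0 or total == 0.0:
        return False
    # For a pure sinusoid centred on the bin this normalised value is about 1.
    normalised = 2.0 * goertzel_power(samples, sample_rate, tone_hz) / (n * total)
    return normalised >= ratio


# 400 samples at 8000 Hz give 20 Hz bins, so 1100 Hz falls exactly on a bin.
tone_1100 = [math.sin(2.0 * math.pi * 1100.0 * i / 8000.0) for i in range(400)]
tone_400 = [math.sin(2.0 * math.pi * 400.0 * i / 8000.0) for i in range(400)]
```

With these inputs, `is_fax_tone(tone_1100)` holds and `is_fax_tone(tone_400)` does not. A practical detector would also need to require the tone to persist over several blocks and to tolerate line noise, which this sketch omits.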
Exiting Digital Data Communications
As shown in Figure 15, in one embodiment of the present invention, exit from the digital data link 550 is performed digitally, by encoding a special Hangup Command Packet 1010. In another embodiment, an exit command 1030 is performed using the supervisory packet with a Return to Analog Voice Mode (RAV) command 1040 to signal the end of digital communications. In an alternate embodiment, a special RAV audible tone is generated to signal return to analog voice mode and disable the modems 1050, 1060.
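The mode changes described in this and the preceding sections can be summarised as a small transition table. The sketch below is an illustrative reading of the text, not the specification's implementation: the event names are hypothetical labels, and the `IDLE` state reached after a Hangup Command Packet is an assumption, since the text says only that the packet exits the digital data link.

```python
from enum import Enum, auto


class Mode(Enum):
    ANALOG_VOICE = auto()
    DIGITAL_DATA = auto()
    VOICE_OVER_DATA = auto()
    FACSIMILE = auto()
    IDLE = auto()  # assumed state after a hangup; not named in the text


# (current mode, event) -> next mode. Event names are hypothetical.
TRANSITIONS = {
    (Mode.ANALOG_VOICE, "calling_tone"): Mode.DIGITAL_DATA,
    (Mode.ANALOG_VOICE, "dtmf_sequence"): Mode.DIGITAL_DATA,
    (Mode.ANALOG_VOICE, "voice_command"): Mode.DIGITAL_DATA,
    (Mode.ANALOG_VOICE, "fax_tone_1100hz"): Mode.FACSIMILE,
    (Mode.DIGITAL_DATA, "voice_packets"): Mode.VOICE_OVER_DATA,
    (Mode.VOICE_OVER_DATA, "data_only"): Mode.DIGITAL_DATA,
    (Mode.DIGITAL_DATA, "rav"): Mode.ANALOG_VOICE,
    (Mode.VOICE_OVER_DATA, "rav"): Mode.ANALOG_VOICE,
    (Mode.DIGITAL_DATA, "hangup_packet"): Mode.IDLE,
    (Mode.VOICE_OVER_DATA, "hangup_packet"): Mode.IDLE,
}


def next_mode(mode, event):
    """Return the mode after an event; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


mode = Mode.ANALOG_VOICE
for event in ["dtmf_sequence", "voice_packets", "rav"]:
    mode = next_mode(mode, event)
```

After the three events in the loop, `mode` is back at `Mode.ANALOG_VOICE`, which mirrors a call that switches to the digital link, carries voice over data, and then returns to analog voice through the RAV command.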
The present inventions are to be limited only in accordance with the scope of the appended claims, since others skilled in the art may devise other embodiments still within the limits of the claims.
Claims
1. A communication module for use with a personal computer, comprising: communications interface means connected for communicating to the personal computer for transferring data between the personal computer and the communications module; communication line interface means for connection to a communication line and for full duplex digital communication over the communication line; voice interface means for receiving local voice signals from a local user and for conveying remote voice signals from a remote user to the local user; full-duplex conversion means connected to the voice interface means for converting the local voice signals into outgoing digital voice data and for converting incoming digital voice data into the remote voice signals; digital signal processor means connected to the full-duplex conversion means for compressing the outgoing digital voice data into compressed outgoing digital voice data packets and for decompressing compressed incoming digital voice data packets into the incoming digital voice data, each of the compressed outgoing digital voice data packets having headers and each of the compressed incoming digital voice data packets having headers; and mode switching means for switching from an analog voice mode to a digital data mode and a voice over data communications mode.
2. The communication module of claim 1, further comprising: main control means connected to the communication line interface means, connected for receiving the compressed outgoing digital voice data packets from the digital signal processor means, connected for receiving outgoing computer digital data packets from the personal computer through the communications interface means, operable for multiplexing the compressed outgoing digital voice data packets and the computer digital data packets to produce multiplexed outgoing data and for sending the multiplexed outgoing data to the communication line interface means for transmission over the communication line.
3. The communication module of claim 2, wherein the main control means is further operable for receiving multiplexed incoming data from the communication line interface means, the multiplexed incoming data containing incoming computer digital data packets multiplexed with the compressed incoming digital voice data packets, the main control means is further operable for demultiplexing the incoming computer digital data packets and the compressed incoming digital voice data packets, and for sending the incoming computer digital data packets to the personal computer through the communications interface means and for sending the compressed incoming digital voice data packets to the digital signal processor means.
4. The communication module of claim 1 wherein the mode switching means uses a calling tone for mode switching.
5. The communication module of claim 1 wherein the mode switching means uses an answer tone for mode switching.
6. The communication module of claim 1 wherein the mode switching means uses a dual-tone multifrequency signal for mode switching.
7. The communication module of claim 1 wherein the mode switching means uses modem dialing and modem answering commands for mode switching.
8. The communication module of claim 1 wherein the mode switching means comprises speech recognition means for mode switching based on verbal commands.
9. A method for mode switching comprising the steps of: establishing analog voice communications between a first communication module and a second communication module capable of packet communications; producing a mode switch signal from the first communication module; and switching from analog communications to packet communications in response to the mode switch signal.
10. The method of claim 9 wherein the step of producing a mode switch signal comprises the step of producing a calling tone signal.
11. The method of claim 9 wherein the step of producing a mode switch signal comprises the step of producing an answer tone signal.
12. The method of claim 9 wherein the step of producing a mode switch signal comprises the step of producing a dual tone multifrequency signal.
13. The method of claim 9 wherein the step of producing a mode switch signal comprises the steps of providing speech recognition means and producing a mode switch signal based on verbal commands received by the speech recognition means.
14. The method of claim 9, wherein the packet communications comprise voice over data communications.
15. The method of claim 9, wherein the packet communications comprise digital data communications.
16. The method of claim 9, wherein the packet communications comprise voice communications.
17. The method of claim 9, wherein the packet communications comprise video communications.
18. A method for mode switching in a multifunction communication system, comprising the steps of: establishing communications between a first modem and a second modem; communicating a mode switch signal from the first modem to the second modem by encoding a packet with a code indicating a desired communications mode.
19. A method for mode switching in a multifunction communication system, comprising the steps of: establishing communications between a first modem and a second modem; communicating a mode switch signal from the first modem to the second modem by transmitting a tone indicating a desired communications mode.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US49967595A | 1995-07-07 | 1995-07-07 | |
US08/499,675 | 1995-07-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997003513A1 (en) | 1997-01-30 |
Family
ID=23986234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1996/011313 WO1997003513A1 (en) | 1995-07-07 | 1996-07-05 | Mode switching system for a voice over data modem |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1997003513A1 (en) |
- 1996-07-05 WO PCT/US1996/011313 patent/WO1997003513A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4740963A (en) * | 1986-01-30 | 1988-04-26 | Lear Siegler, Inc. | Voice and data communication system |
US5463616A (en) * | 1993-01-07 | 1995-10-31 | Advanced Protocol Systems, Inc. | Method and apparatus for establishing a full-duplex, concurrent, voice/non-voice connection between two sites |
US5535204A (en) * | 1993-01-08 | 1996-07-09 | Multi-Tech Systems, Inc. | Ringdown and ringback signalling for a computer-based multifunction personal communications system |
EP0650286A2 (en) * | 1993-10-25 | 1995-04-26 | Multi-Tech Systems Inc | Ringdown and ringback signalling for a computer-based multifunction personal communications system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1168736A1 (en) * | 2000-06-30 | 2002-01-02 | Alcatel | Telecommunication system and method with a speech recognizer |
WO2002003632A1 (en) * | 2000-06-30 | 2002-01-10 | Alcatel | Telecommunication system and method with speech recognizer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): CA JP |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
122 | Ep: pct application non-entry in european phase | |
NENP | Non-entry into the national phase | Ref country code: CA |