US20110234200A1 - Adaptive slip double buffer - Google Patents
- Publication number
- US20110234200A1 (application US 13/065,583)
- Authority
- US
- United States
- Prior art keywords
- buffer
- samples
- read
- adaptive
- clock
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04J—MULTIPLEX COMMUNICATION
- H04J3/00—Time-division multiplex systems
- H04J3/02—Details
- H04J3/06—Synchronising arrangements
- H04J3/062—Synchronisation of signals having the same nominal but fluctuating bit rates, e.g. using buffers
- H04J3/0632—Synchronisation of packets and cells, e.g. transmission of voice via a packet network, circuit emulation service [CES]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L7/00—Arrangements for synchronising receiver with transmitter
- H04L7/0016—Arrangements for synchronising receiver with transmitter correction of synchronization errors
- H04L7/005—Correction by an elastic buffer
Definitions
- Embodiments of the invention relate generally to the field of digital networking communications. More particularly, an embodiment of the invention relates to methods and systems for packet (and/or frame) switched networking that include an adaptive slip double buffer.
- VoIP Voice over IP
- RTP and RTCP see Ref. [1,2]
- QoE Quality of Experience
- PSTN Public Switched Telephone Network
- the overall delay has four principal components.
- the process of packetization involves buffering information to fill the packet payload and thus introduces delay.
- the encoding and decoding algorithms especially in the case of source codecs, require buffering as well. These two delays are often known quantities.
- the third component is the delay through the network. This delay is difficult to predict a priori since it depends on the physical distance, the number of intermediate packet switches involved in the end-to-end transport of a packet, the bandwidth of the links between switches (routers). However, for two given end-points there is, in principle, a minimal network delay corresponding to the transit time of the fastest possible packet transmission.
- the delay experienced by packets will be variable, ranging from the minimal delay to infinity (a packet lost in the network is construed as an instance of infinite delay).
- Some maximum delay threshold must be determined and packets with delay greater than this maximum are discarded.
- Received packets are stored in a buffer whose size corresponds to the difference between minimum and maximum delays and so, practically speaking, fast packets are delayed so that the packets can be decoded and converted back to analog signals in a smooth fashion.
- the notion of play-out, or dejittering whereby some delay is introduced via a jitter buffer constitutes the fourth delay component.
- the play-out buffer also referred to as the jitter buffer, should be as small as possible.
- a process comprises: monitoring a fill in an adaptive slip buffer of a digital to analog convertor; adjusting a number of samples that are read from the adaptive slip buffer per page as a function of the fill; and reading the number of samples from the adaptive slip buffer.
- a machine comprises: a digital to analog convertor including an adaptive slip buffer and a read address generator coupled to the adaptive slip buffer, wherein the read address generator includes an increment control that adjusts a number of samples that are read from the adaptive slip buffer per page as a function of fill of the adaptive slip buffer.
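The process summarized above (monitor the fill, adjust the number of samples read per page, read that number) can be sketched as follows. This is only an illustrative sketch: the function names, the thresholds `low` and `high`, and the one-sample adjustment step are assumptions chosen for the example and are not taken from the disclosure.

```python
def samples_to_read(fill, page_size=80, low=60, high=100):
    """Return how many samples to read for the next page.

    fill: number of samples currently waiting in the slip buffer.
    low, high: hypothetical fill thresholds (assumed for this sketch).
    """
    if fill > high:
        return page_size + 1  # buffer filling up: consume one extra sample
    if fill < low:
        return page_size - 1  # buffer draining: consume one fewer sample
    return page_size          # fill is in the comfortable range


def read_page(buffer, page_size=80, low=60, high=100):
    """Read one page from a list-backed buffer; return (page, remainder)."""
    n = min(samples_to_read(len(buffer), page_size, low, high), len(buffer))
    return buffer[:n], buffer[n:]
```

With these assumed thresholds, a fill of 120 samples yields a read of 81 samples and a fill of 40 yields a read of 79, so the buffer is steered back toward its middle one sample at a time.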
- FIG. 1 is a functional block view of a simplified depiction of a VoIP call (only one direction shown), appropriately labeled “PRIOR ART.”
- FIG. 2 is a functional block view of a circular buffering action separating ADC and DAC clocks, appropriately labeled “PRIOR ART.”
- FIG. 3 is a functional block view of a simplified model of VoIP over an IP network, appropriately labeled “PRIOR ART.”
- FIG. 4 is a functional block view of transmission of voice-band signals over a packet network, appropriately labeled “PRIOR ART.”
- FIG. 5 is a functional block view depicting the functions involved in generating the received speech signal, representing an embodiment of the invention.
- FIG. 6 is a functional block view of an underlying principle of a retiming FIFO buffer (play-out buffer), representing an embodiment of the invention.
- FIG. 7 is a functional block view of a double buffer arrangement for delivering samples to the DAC, representing an embodiment of the invention.
- FIG. 8 is a functional block view of a simplified circular buffer arrangement, representing an embodiment of the invention.
- FIG. 9 is a functional block view in more detail of “Read Add. Gen.” ( 433 in FIG. 8 ), representing an embodiment of the invention.
- the invention described herein describes a novel approach to the play-out buffer, providing a method to maintain optimal performance even in situations where the analog-to-digital converter (ADC) and digital-to-analog converter (DAC) have different underlying time-bases.
- the invention is an extension of controlled slip behavior.
- the slip mechanism is invoked primarily when the speech segment represents a synthetic signal such as during periods of silence or if the characteristics of the speech segment are such that the repetition/deletion of a speech sample will have minimal subjective annoyance.
- an adaptive play-out buffer of the manner described here can form an integral part of an adaptive jitter buffer mechanism. Extensions of the invention include methods to implement adaptive clock control with minimal impact on subjective quality.
- The term synchronization applies to alignment of time and the term syntonization applies to alignment of frequency, but in the telecommunication environment the term synchronization is often used to refer to either time-alignment, or frequency-alignment, or both. It is generally clear from the context which meaning is appropriate. All real-time communication carried over a digital network requires synchronization to some degree. This can be illustrated by considering the example of delivering a real-time voice signal between two geographically disparate points across a network.
- the situation is depicted in FIG. 1 .
- the analog source is converted into digital format by an analog-to-digital converter (ADC or A/D) operating at a sampling clock rate of nominally 8 kHz.
- Each sample is, conventionally, quantized to 8 bits so that the digital stream carrying the voice information is 8 kilo-octets-per-second or 64 kbps (see ITU-T Rec. G.711, Ref. [3], and Ref. [4]).
- This is regarded as a DS0 and represents “uncompressed” voice.
- this DS0 is delivered “as is” to the destination for conversion back to analog format.
- the DS0 is, possibly, compressed and organized into packets. These packets are delivered to the destination where the expansion to DS0 format is performed prior to conversion back to analog.
- While the schemes described here are applicable regardless of the word-length employed for A/D conversion or D/A conversion, we shall henceforth assume that these are done with a word-length of 8 bits (1 octet) (representative of the μ-law and A-law formats provided in ITU-T Recommendation G.711) for specificity.
- In FIG. 1 we show a single direction of transmission solely for convenience in representation and explanation.
- the rate at which packets are generated is determined by the A/D clock, shown as f A in FIG. 1 .
- the nominal word-length associated with each sample is 8 bits, following G.711 (see Ref. [3]) so the “uncompressed” signal represents a bit-rate of 64 kbps (or DS0).
- Compression algorithms are employed to reduce the effective bit-rate.
- ADPCM adaptive differential pulse code modulation
- ITU-T Recommendation G.726 reduces the word-length associated with each sample to 4, effectively reducing the data rate to 32 kbps.
- ITU-T Recommendation G.727 describes methods for reducing the bits/sample from 8 down to 5, 4, 3, or 2, corresponding to bit-rates of 40, 32, 24, and 16 kbps, respectively.
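The bit-rates quoted above follow directly from the word-length and the nominal 8 kHz sampling rate. A minimal sketch of the arithmetic (the function name is chosen for illustration):

```python
SAMPLE_RATE_HZ = 8000  # nominal telephony sampling rate


def bit_rate_kbps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit-rate in kbps for a given word-length per sample."""
    return bits_per_sample * sample_rate_hz / 1000


# 8 bits is G.711 (a DS0); 5, 4, 3 and 2 bits are the ADPCM word-lengths.
rates = {bits: bit_rate_kbps(bits) for bits in (8, 5, 4, 3, 2)}
```

Here `rates` maps 8, 5, 4, 3 and 2 bits per sample to 64, 40, 32, 24 and 16 kbps.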
- More sophisticated schemes such as those described in ITU-T Recommendation G.723 and G.729 (see Ref. [5]) are even more effective in reducing the bit-rate.
- the notion of a “10-ms-packet” is the collection of information produced by the coder that permits the decoder at the far end to synthesize a 10-ms block of speech. Depending on the coding algorithm it is possible that information from previous packets is necessary as well. At the receiving end the decoder recreates the appropriate digital signal (DS0) for conversion back into analog format.
- the D/A clock is shown as f D in FIG. 1 .
- the circuit-switched network is well synchronized and this approach to network synchronization has the derivative benefit that the clock offset between the end points is minimized.
- In an NGN, where asynchronous transport is employed, there is no guarantee that the clock offset between the end points is negligible.
- a slip is the result of the difference in sampling rates and is independent of the word length associated with the quantization and compression.
- the degradation of perceptual quality caused by slips is in addition to any degradation caused by other factors.
- the unit of information inserted or deleted is one sample (or octet).
- a slip occurs when the accumulated phase difference, expressed in time units, caused by the aforementioned frequency difference, crosses 125 μs.
- the unit could be as large as a block of speech, typically of duration 20 ms, and thus slips would have an impact similar to packet loss. Note that 20-ms slips occur much less frequently than 125-μs slips but have a greater impact each time they occur.
- the thrust of the current invention is to get the benefits of single-octet (single-sample) slips in a packet environment. Furthermore, the thrust of the current invention is to get the benefits of a single-octet slip in low-cost implementations such as in customer-premises-equipment (CPE) integrated-access-devices (IADs) and residential gateways.
- XO Crystal Oscillator
- TCXO Temperature-Compensated Crystal Oscillator
- OCXO Oven-Controlled Crystal Oscillator
- the higher quality clock sources are appropriate.
- the quality of the oscillator is likely to be of the XO or, at best, TCXO class.
- the perceptual degradation in quality caused by slips is very subjective.
- the impact of an isolated slip in conventional telephony using uncompressed signals (G.711) is typically a “click” that could well be imperceptible, especially if it occurs during a silent interval.
- the perceived quality degrades rapidly as the slip-rate increases.
- the various digital switches in the PSTN are all provided a PRS (Primary Reference Source) traceable reference and thus have an absolute accuracy of better than 1 × 10⁻¹¹.
- a call traversing two distinct timing domains may experience slips corresponding to a worst-case frequency difference of 2 × 10⁻¹¹. Considering that this equates to one slip every 72 days, we can, for all practical purposes, ignore the phenomenon of slips in the traditional circuit-switched network.
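The 72-day figure can be checked by dividing the slip unit (one 125 μs sample period) by the fractional frequency offset. The sketch below uses illustrative names and also compares a single-sample slip unit with a 20-ms block slip unit:

```python
SAMPLE_PERIOD_S = 125e-6  # one sample at 8 kHz


def slip_interval_s(freq_offset, slip_unit_s=SAMPLE_PERIOD_S):
    """Seconds between slips for a fractional frequency offset (e.g. 2e-11)."""
    if freq_offset <= 0:
        raise ValueError("freq_offset must be positive")
    return slip_unit_s / freq_offset


# Worst-case offset between two PRS-traceable timing domains.
days_between_slips = slip_interval_s(2e-11) / 86400

# A 20-ms block slips 160 times less often than a single sample.
ratio = slip_interval_s(2e-11, 20e-3) / slip_interval_s(2e-11)
```

`days_between_slips` evaluates to a little over 72, matching the figure quoted above.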
- the end points are quite cost sensitive and therefore it is likely that the quality of oscillator deployed will be represented by one of the last three rows of Table 1 and clearly slips may play an important role in determining the quality of experience (or lack thereof).
- If the size of the buffer is large, then the relative frequency of occurrence of buffer overflow/underflow events will be small.
- large buffers imply the introduction of delay and the decrease in quality of experience.
- the analog signal from the source enters the network and is converted into a digital signal by the analog-to-digital converter (ADC).
- the network acts as a pipe for these digital words (samples) that are delivered to the far-end digital-to-analog converter (DAC) for conversion back to analog.
- the conversion points could be in equipment, such as a customer-premise located IAD or PBX or even a Class-5 switch operated by the local telephone company. It is important to recognize that the time-base governing the A/D clock could be different from the time-base governing the D/A clock and thus there could be a difference in the sampling rates associated with these two conversions. That is, in every digital network there is the potential of encountering the pitch modification effect.
- the frequency difference could be small, of the order of 2 parts in 10¹¹, if the conversion clocks are traceable to a Stratum-1 source (or sources); the frequency difference could be significant, of the order of 64 parts in 10⁶ (64 parts per million or 64 ppm), if the only guarantee given is that the conversion clocks are Stratum-4 quality (Stratum-4 implies an accuracy of no worse than ±32 ppm).
- This buffer will be of a FIFO (first-in-first-out) nature where the data is written into the buffer under control of the ADC clock and read out of the buffer under control of the DAC clock.
- ADC clock is higher
- DAC clock is higher.
- double buffering wherein there are two pages, say A and B, and while data is being written into page A, data is being read out of page B. If there is no frequency offset, then the opposite-page nature of read and write will, for the most part, be preserved.
- Such a buffer needs to be just big enough to accommodate any relative wander or jitter between the two clocks. It is convenient to describe the size of the buffer in terms of time. For example, if each page is “10 ms”, then each page has 80 octets, assuming a nominal sampling rate of 8 kHz and one octet per sample (e.g. G.711; see Ref. [3] or [4]). The overall buffer is then 20 ms deep, introduces a nominal delay of 10 ms and can accommodate ±10 ms of wander.
- a good way of visualizing the double-buffer action is to consider a circular buffer as depicted in FIG. 2 .
- the memory is organized in a circular manner with address calculations done Modulo-2N, where 2N is the total number of memory locations. From the viewpoint of the DS0 channel under consideration, each location holds one octet (corresponding to one octet per sample), the buffer has a “length” of (2N/8) ms, introduces a nominal delay of (N/8) ms, and can accommodate ±(N/8) ms of wander.
- the operation is quite simple. With each write operation the write pointer moves one location counter-clockwise and likewise the read pointer moves one location counter-clockwise with each read operation.
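The circular double-buffer described above can be sketched in a few lines. The class and method names are illustrative, and starting the write pointer N locations ahead of the read pointer is an assumption made so that the two pointers begin on opposite pages.

```python
class CircularBuffer:
    """Circular buffer of 2N octets with Modulo-2N address arithmetic."""

    def __init__(self, n):
        self.n = n
        self.size = 2 * n
        self.mem = [0] * self.size
        self.read_ptr = 0
        self.write_ptr = n  # assumed start: half a buffer ahead of the reader

    def write(self, octet):
        self.mem[self.write_ptr] = octet
        self.write_ptr = (self.write_ptr + 1) % self.size

    def read(self):
        octet = self.mem[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.size
        return octet

    def separation(self):
        """Locations by which the write pointer leads the read pointer."""
        return (self.write_ptr - self.read_ptr) % self.size

    def length_ms(self):
        return self.size / 8  # 8 samples per ms at 8 kHz

    def nominal_delay_ms(self):
        return self.n / 8


buf = CircularBuffer(80)  # two 10-ms pages of 80 octets each
out = []
for i in range(200):      # equal read and write rates: no slips
    buf.write(i)
    out.append(buf.read())
```

With equal read and write rates the separation stays at N locations, and each octet emerges N reads after it was written, which is the nominal (N/8) ms delay.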
- a slip occurs when the relative time interval error between read and write clocks exceeds 125 μs. For example, if the relative frequency offset between the two clocks is 64 ppm, then a slip will occur approximately every 2 seconds.
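The slip rate can also be illustrated by accumulating the relative phase error one sample tick at a time and counting each time it reaches a full sample period. This is a simplified model written for illustration (integer arithmetic in millionths of a sample, hypothetical function name), not a description of any particular implementation.

```python
def count_slips(offset_ppm, duration_s, sample_rate_hz=8000):
    """Count single-sample slips for an integer ppm offset and duration."""
    phase = 0  # accumulated phase error, in millionths of a sample period
    slips = 0
    for _ in range(duration_s * sample_rate_hz):
        phase += abs(offset_ppm)   # each tick adds offset_ppm millionths
        if phase >= 1_000_000:     # error has reached one full sample (125 μs)
            slips += 1
            phase -= 1_000_000
    return slips


slips_in_10_s = count_slips(64, 10)
```

For a 64 ppm offset the model counts 5 slips in 10 seconds, consistent with roughly one slip every 2 seconds.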
- the delay through the network is not steady as is the case of circuit-switched networks. Therefore, even if the rates of the ADC and DAC are equal, the write clock may, on a short-term basis, appear to be faster (or slower) than the read clock. This requires the use of a buffer that is called a jitter buffer because the term used in the industry for variable transit delay is “jitter”.
- the buffer will overflow (underflow) when the relative time interval error between the two clocks exceeds 100 ms.
- a 64 ppm offset will thus result in overflows (underflows) approximately every 3000 seconds.
- overflows (underflows) that are a result of a clock offset may be ignored for all practical purposes. This is one of the (incorrect) reasons given by proponents of IP networks that frequency synchronization is not required because free-running clocks can support VoIP considering that buffer overflows and underflows can be made rare by increasing the size of the buffer.
- the thrust of this invention is to use multiple buffers.
- One buffer is similar to a traditional jitter buffer.
- the incoming packets are written into the jitter buffer upon arrival. Note that this write operation is tied, effectively, to the ADC clock (of the far end) with additional jitter introduced by the packet delay variation in the network.
- the packets are extracted (read out) from the jitter buffer by the DSP block (explained later) at a rate that is nominally uniform.
- the rate of packet extraction by the DSP block is determined by the rate of the DAC clock.
- the second buffer is a double buffer whose size is altered occasionally to adjust the rate at which the jitter buffer data is extracted by the DSP block.
- a network based on packet switching and transmission can be quite complex, but the simple model depicted in FIG. 3 is sufficient to illustrate how synchronization and adaptive play-out buffers play a role.
- IAD Integrated Access Device
- the IAD will provide an FXS port to which the Fax machine (telephone) is connected.
- the FXS port appears, for all intents and purposes, as the line circuit of a traditional Class-5 switch.
- the IAD contains the codec where the conversion between analog and digital is accomplished. The information, however, is not transported as a conventional DS0 would in a TDM (time division multiplexed) or circuit-switched scenario.
- the data is packetized and encapsulated in the appropriate “wrappers” for transmission over the packet network.
- In terms of the important processes involved after call set-up, a simple, though accurate, view is depicted in FIG. 4. For convenience only one direction of transmission is shown.
- the analog signal from the source Fax machine or telephone (“srce”) is converted into digital format using an A/D converter. It is quite conventional to use a conventional telephony codec that uses a sampling rate of 8 kHz and encodes the sample value in an octet (G.711 coding) though there are implementations described in the literature where a higher sampling rate and a higher word-length are used for improved fidelity.
- These samples are assembled into packets. For speech applications there may be some signal processing involved for purposes of echo cancellation and data compression; for Fax calls the samples are generally used without modification.
- the packets are delivered to the destination by the packet network.
- Speech implementations also allow for voice activity detection (VAD) whereby intervals of silence are detected and transmission bandwidth conserved by just transmitting an indication of silence rather than (encoded) speech sample information. At the receiving end intervals of silence are synthesized using comfort noise.
- While packet architectures are superior to circuit-switched architectures in terms of efficiency of bandwidth utilization (because of statistical multiplexing), they have some drawbacks, comparatively speaking.
- Packet architectures tend to increase latency (average delay) and introduce time delay variations.
- jitter buffers are required. That is, buffers of an “elastic” nature are used to account for the burstiness of the packet arrival pattern.
- the depth of these buffers must be large enough to span the peak-to-peak time delay variation over the network. Put another way, the size (depth) of the jitter buffer determines the peak-to-peak time delay variation that is allowed for the network and a variation greater than this maximum value will result in packets being lost or used incorrectly.
- With the jitter buffer set at its “optimum” size, and provided that adequate traffic engineering is in place to give the real-time services (such as VoIP) the appropriate priority, it is assumed that time delay variation will not cause packet loss except in situations of high traffic congestion.
- the frequency offset between source and destination has two deleterious effects. One is the pitch modification effect that has been described elsewhere (see Ref. [12], for example) and while important, is not the thrust of this invention. The other is a “buffer shrink” effect. If the DAC clock is faster than the ADC clock, the jitter buffer will empty faster than it is being filled. Suppose for example the buffer size is 200 ms.
- the emptying of the buffer will affect the lower threshold.
- Conversely, if the ADC clock is faster than the DAC clock, the buffer will fill faster than it is being emptied and this will affect the upper threshold. For example, a frequency difference of 50 ppm will cause a threshold reduction (either the upper or the lower) of 50 μs every second or 1 ms every 20 seconds. Therefore, whereas the probability of losing a packet due to time delay variation may have been small to nonexistent at the start of the call, the probability increases with the duration of the call and, for calls of long duration, could become appreciable.
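The “buffer shrink” arithmetic can be sketched as below. The function names are illustrative, and the 100 ms margin used in the second call is an assumption (half of a 200 ms buffer, with the fill starting at the middle).

```python
def threshold_erosion_ms(offset_ppm, elapsed_s):
    """Margin lost, in ms, after elapsed_s seconds at a given ppm offset."""
    return offset_ppm * elapsed_s / 1000


def time_to_exhaust_s(margin_ms, offset_ppm):
    """Seconds until a given margin (in ms) is fully consumed by the offset."""
    return margin_ms * 1000 / offset_ppm


erosion = threshold_erosion_ms(50, 20)   # 50 ppm for 20 seconds
exhaust = time_to_exhaust_s(100, 50)     # assumed 100 ms margin at 50 ppm
```

Under these assumptions the margin shrinks by 1 ms every 20 seconds and is used up entirely after 2000 seconds, which is why the effect matters for long calls.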
- For voice calls there have been several methods described in the literature to handle such problems.
- the notion of an adaptive jitter buffer is to modify the size of the jitter buffer to match the existing time-delay variation condition being experienced.
- Silence-stretching and silence-compressing algorithms have been proposed to delete or expand sections (sub-intervals) of silence.
- Packet loss concealment algorithms have been developed to insert or delete sections of “non-silence” in such a manner as to reduce (subjectively) any annoying effects of packet loss. The interested reader is pointed to Ref [9,10] for further information on these methods.
- silence-manipulation and packet loss concealment will be designated as extreme measures. Such measures are necessary because the general behaviour of IP networks is such that packets will be lost in the network for a variety of reasons, including excessive time-delay variation that could lead to jitter buffer overflow or underflow.
- the block labeled “Depacketization, Jitter Buffer, and Signal Processing” in FIG. 4 will be, logically, split into multiple entities:
- Depacketization The packets received from the IP network are processed and the information content required for synthesis of the speech signal extracted. As part of the depacketization process, the protocol wrappers are examined to detect whether a packet was lost in the network. If a packet is detected as “lost”, then the packet loss concealment algorithm must be invoked.
- the current invention does not relate in particular to depacketization algorithms and implementations, and most methods prevalent in the state-of-the-art can be employed. Packets contain both time-stamps and sequence numbers (also called frame numbers) and between these two it is straightforward to decide whether there was a missing packet or whether the apparent missing packet was actually a “no_transmission” corresponding to a silence packet.
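One way to combine sequence numbers and time-stamps is sketched below. It assumes RTP-style headers (a 16-bit sequence number that advances by one per transmitted packet, and a 32-bit time-stamp counted in samples) and a fixed 80 samples per packet. The `Header` type, the function name and the returned labels are hypothetical, and packet reordering is not handled.

```python
from dataclasses import dataclass


@dataclass
class Header:
    seq: int        # advances by one for every packet actually sent
    timestamp: int  # sampling instant, counted in samples


def classify_gap(prev, cur, samples_per_packet=80):
    """Classify what happened between two consecutively received packets."""
    seq_gap = (cur.seq - prev.seq) % 2**16             # 16-bit wrap-around
    ts_gap = (cur.timestamp - prev.timestamp) % 2**32  # 32-bit wrap-around
    if seq_gap == 1 and ts_gap == samples_per_packet:
        return "contiguous"
    if seq_gap == 1 and ts_gap > samples_per_packet:
        # No sequence number was skipped, so nothing was lost: the sender
        # simply did not transmit during the interval (silence).
        return "no_transmission"
    if seq_gap > 1:
        return "lost"  # at least one sequence number never arrived
    return "unexpected"
```

A time-stamp jump with no skipped sequence number indicates silence suppression, while a skipped sequence number indicates loss and would trigger packet loss concealment.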
- each IP packet may contain more than one speech frame. That is, each IP packet may contain the information for multiple (1 or more) blocks of speech. For example, if the block size used is 10 ms, the IP packet may contain 20 ms or two blocks worth of speech information in encoded form.
- the unit of storage in the jitter buffer (see below) is the speech frame since this is the most convenient and useful unit of storage and can be either in the form of encoded speech or even decoded speech (see the notion of processing, below).
- the jitter buffer in prior art VoIP decoders comprised a first-in first-out (FIFO) buffer that was large enough to accommodate the time delay variation encountered by packets as they traverse the IP network from source (encoder/packetization) to the destination decoder.
- the incoming packets are written in as they arrive and read out by the signal processing entity at the play-out rate. That is, the jitter buffer contains the actual received packets with, possibly, the protocol wrappers removed.
- the incoming packets are treated by the signal processing entity as they arrive and the synthesized speech samples written into the FIFO.
- the FIFO contains actual speech samples destined for the DAC and is emptied based on the clock of the DAC.
- the invention disclosed herein applies to both modes of operation.
- the reason for the first mode of operation is that the jitter buffer module includes the logic required to handle missing packets as well as “silence” when there are really no packets available and the missing packets are synthesized as “silence” based on other information such as time-stamps available in the packets.
- a flag is associated with each sample (octet) of speech signal recreated/synthesized. This flag is asserted (“true”) if the speech sample generated was part of a silence segment or a segment of signal artificially created via the packet loss concealment algorithm or had some particular characteristic as will be described later.
- the intent in this flag is to indicate that the sample is “actionable” and will have a minimal subjective annoyance in the event that the sample was deleted (or repeated) as part of the adaptive slip double buffer that is the crux of the invention disclosed herein. If the signal processing entity is incapable of providing such a flag for any reason, then the play-out buffer will, in essence, ignore the flag and assume that all samples are “actionable”.
- the notion of “actionable” is that the frame of speech is either representative of silence or is representative of a synthetic frame of speech used for packet loss concealment.
- the nominal short-term power of the speech is computed by the encoding function (at the analog-to-digital converter side) and communicated to the decoding side (the digital to analog converter side).
- In the case where there is no compression, the decoding side must compute the short-term power of the signal and invoke suitable algorithms to determine whether the current decoded speech is part of a silence interval. Implementing slips introduces degradation, but the degradation is much less consequential if the slip is invoked during periods of silence.
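A decoder-side silence test based on short-term power might look like the sketch below. It assumes the frame is available as linear PCM sample values; the 20 dB threshold, the function names and the per-frame (rather than per-sample) decision are assumptions made for illustration.

```python
import math


def short_term_power_db(frame):
    """Mean-square power of a frame of linear PCM samples, in dB."""
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def actionable_flags(frame, silence_threshold_db=20.0, concealed=False):
    """Return one 'actionable' flag per sample of the frame.

    A frame is treated as actionable if it was synthesized by packet loss
    concealment or if its short-term power is below the assumed threshold.
    """
    actionable = concealed or short_term_power_db(frame) < silence_threshold_db
    return [actionable] * len(frame)
```

A frame of zeros is flagged as actionable, a loud frame is not, and any frame produced by packet loss concealment is flagged regardless of its power.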
- the invention disclosed here deals with an adaptive play-out buffer that is also called an adaptive slip double buffer. This is described below by considering the fundamentals of prior-art and the extensions that comprise the invention.
- the play-out buffer can be viewed as a retimer as described here.
- the data speech samples or octets
- the “recovered clock” is used to write the incoming packets into a buffer that is operated in a FIFO (“first-in-first-out”) mode.
- the recovered clock in this scenario is a burst mode clock corresponding to packet arrival instants.
- the data is read out of the buffer using, effectively, the DAC clock (the retiming function generally involves inserting the “reference” clock), and then packets read out from the FIFO can be applied to the signal processing function to generate the digital speech samples for the DAC.
- the function of “retiming” is illustrated in FIG. 6.
- a FIFO buffer 412 is coupled to a depacketization block 411.
- a digital signal processor 413 is coupled to the FIFO buffer 412.
- the block labeled “DeP” refers to the circuitry used to implement the depacketization functions.
- the block labeled DSP represents the DSP functions that generate the speech samples for handing off to the digital-to-analog converter (DAC).
- the FIFO buffer represents the jitter buffer.
- the DSP block reads out of the jitter buffer based on the DAC clock.
- the DeP writes into the jitter buffer when packets arrive and this can be viewed as a jittered version of the encoder clock from the far end.
- the FIFO can be viewed as a “pipe” with the receive data that is written into the FIFO viewed as being pushed into the pipe.
- the transmit data that is read out of the FIFO is viewed as being pulled out of the pipe.
- the arrow designated as “fill position” indicates where the next frame/packet that must be read out is located within the pipe.
- the action of “write” moves the fill position to the right and each read operation moves the fill position to the left.
- the fill position, arbitrarily, points to the middle of the FIFO buffer.
- the size of the FIFO buffer is 2N units (typically frames)
- short-term frequency variations referred to as wander
- up to N unit intervals (“UI”) of time-delay variation in the packet network (2N UI, peak-to-peak) can be absorbed (1 UI is equivalent to 1 frame-time, 10 ms for a frame size of 80 samples if the underlying sample rate is 8 kHz).
- N UI unit intervals
- the arrangement adds transmission delay of, on the average, N UI.
- a FIFO of this nature can serve as a jitter buffer accommodating up to ±N UI of time-delay variation.
- N is 10
- ±100 ms of time-delay variation (wander) can be absorbed.
- the buffer will either overflow or underflow.
- the fill position will move all the way to the right if the write clock is high or all the way to the left if the write clock is low. In this situation data will be corrupted; either some data is lost (“overflow”), or some “garbage” data must be inserted (“underflow”).
- the appropriate way to handle such frequency offsets is to force the fill position to the center (the equivalent of “reset”) whenever the fill position rails at either extreme. In such a situation, either N frames are discarded (“lost”) or N frames are repeated (“garbage”).
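By way of a nonlimiting illustration, the fill-position behavior described above can be sketched as follows. This is a minimal model under stated assumptions, not the disclosed implementation: the class name `RetimingFifo`, its counters, and the helper `absorbable_wander_ms` are hypothetical, and the buffer is reduced to a single integer fill position measured in frames.

```python
from dataclasses import dataclass


@dataclass
class RetimingFifo:
    """Fill-position model of a 2N-frame FIFO (hypothetical names)."""
    n: int             # half-size N, in unit intervals (frames)
    fill: int = 0      # fill position; set to the centre in __post_init__
    lost: int = 0      # frames discarded by overflow resets
    repeated: int = 0  # frames repeated by underflow resets

    def __post_init__(self) -> None:
        self.fill = self.n  # start in the middle of the buffer

    def write(self) -> None:
        """A packet arrives: the fill position moves to the right."""
        self.fill += 1
        if self.fill >= 2 * self.n:  # railed at the right extreme (overflow)
            self.lost += self.n
            self.fill = self.n       # forced back to the centre

    def read(self) -> None:
        """A frame is pulled out: the fill position moves to the left."""
        self.fill -= 1
        if self.fill <= 0:           # railed at the left extreme (underflow)
            self.repeated += self.n
            self.fill = self.n       # forced back to the centre


def absorbable_wander_ms(n: int, samples_per_frame: int = 80,
                         sample_rate_hz: int = 8000) -> float:
    """Time-delay variation (one-sided, in ms) that N unit intervals absorb."""
    return n * samples_per_frame * 1000.0 / sample_rate_hz
```

With N set to 10 and 80-sample frames at 8 kHz, `absorbable_wander_ms(10)` evaluates to 100 ms, matching the ±100 ms figure given above. Ten consecutive writes without a read rail the model at the right extreme and cost N frames in one reset, which is the event the adaptive arrangement is intended to avoid.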
- One key element of the disclosed invention is the anticipation of overflow/underflow events.
- Another key element of the disclosed invention is the manner in which the clock used by the DSP to read frames out of the jitter buffer is derived from the DAC and adjusted to minimize the impact of clock offset between the local DAC and the far-end ADC.
- the arrangement for delivering samples to the digital-to-analog converter generally involves a double buffer arrangement.
- the reason for this buffering is that the actual conversion is done on a sample by sample basis using a “continuous” clock.
- the DSP unit will usually generate the samples as a block of samples. Thus while the DSP unit generates the correct number of samples per unit time on the average, it generates the samples in bursts.
- the most common arrangement for implementing the double-buffer function involves the use of two buffers of equal size, say N octets, and referred to as “Page-A” and “Page-B”.
- One of the sides (we shall assume the “write” side for specificity and ease of explanation) accesses the buffer(s) sequentially. That is, the write operation first fills buffer Page-A, moves to buffer Page-B, fills it, and returns to filling buffer Page-A. The read operation empties the buffers. Under “normal” conditions, the read side is accessing buffer Page-B while the write side is accessing buffer Page-A, and vice-versa.
- In FIG. 7, a simplified depiction of a double buffer arrangement for implementing the interface to the DAC is shown.
- a first buffer 421 (Page-A) is coupled to a second buffer 422 (Page-B).
- the actual DAC converts samples that are read out of the appropriate page.
- the two buffers are often referred to as Page-A and Page-B.
- the trajectory of the write pointer (“WP”) (the address to which the next write operation will pertain to) is shown. In particular, after filling Page-A, the pointer moves to the bottom of Page-B and commences filling Page-B.
- the trajectory of the read pointer (“RP”) follows the same principle and is implied. At the beginning (or “reset”), the WP and RP point to different pages.
- the page size is nominally 80 samples.
- the DSP writes into the buffer the entire frame (nominally 80 samples) almost instantaneously. That is, it computes the appropriate sample values and fills the buffer in one “write statement”.
- the pseudo code for this operation will appear as:
- write_pointer identifies the memory address of the first element in the appropriate page (page A or page B).
- the instruction following the loop is important. What this achieves is the replication of sample value X[N1] into page-A/B location N1 as well as (N1+1).
- the same value is placed in the 80th as well as the 81st location of the buffer. Note that this approach is suitable for a linear buffer arrangement; slight modifications are required for circular buffer operation.
- the speed of the write operation is determined by the speed at which the DSP operates and not by the rate of the DAC clock. Generally speaking the machine-cycle time of the DSP will be very small and the entire process of writing 81 samples will be a very small fraction of the 10 ms frame duration.
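The page-write operation described above can be sketched as follows for a linear buffer. This is an illustrative reconstruction under stated assumptions, not the pseudo-code of the original disclosure: the function name `write_page`, the list-based `memory`, and zero-based indexing (so that N1 is 79 for an 80-sample frame) are assumptions.

```python
N1 = 79  # index of the last valid sample of an 80-sample frame (assumed zero-based)


def write_page(memory: list[int], write_pointer: int, x: list[int]) -> None:
    """Copy frame x into the page that starts at write_pointer.

    write_pointer is the memory address of the first element of the
    page being filled (Page-A or Page-B).
    """
    for k in range(N1 + 1):
        memory[write_pointer + k] = x[k]
    # Instruction following the loop: replicate X[N1] into location N1 + 1,
    # so the 80th and 81st locations of the page hold the same value.
    memory[write_pointer + N1 + 1] = x[N1]
```

Each page therefore needs 81 locations, and two pages laid out back to back occupy 162 locations in this sketch.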
- the DAC clock is locally generated and may or may not be locked to a network reference. That is, it may be derived from a free-running oscillator. In either case it is not controlled by the DSP module that is reading out from the jitter buffer because implementing a clock synchronization method based on jitter-buffer fill (also referred to as adaptive clock recovery) requires expensive oscillators to smooth out the jitter introduced by packet network that can be quite large (see Ref. [11] for example).
- a key aspect of the invention is to allow the DAC clock to run asynchronously with respect to the far-end ADC clock but yet account for the frequency offset using a slip mechanism that is based on single-sample slips while simulating clock synchronization as applied to the jitter buffer read/write. That is, the intent of this simulated synchronization is to avoid the “buffer shrink” effect and keeping the data corrupted due to a slip small (one sample) minimizes the deleterious effect on end-user quality of experience.
- the typical manner in which the “read clock” (the “DAC-derived clock” in FIG. 6) for the DSP unit is generated is based on the premise that the DAC unit will provide a marker (generally implemented using the notion of a “software interrupt”) every 80 samples.
- the DAC will empty one page, say Page-A, and provide a software interrupt signal to initiate the DSP unit operation so that the DSP will read one frame's worth of information from the jitter buffer and complete the signal processing required to fill Page-A while the DAC is reading from Page-B.
- the operation of the DAC unit will involve a counter that starts from 0 for each page and is incremented by 1 when a sample is extracted from that page.
- When the counter reaches a “final value” the page is deemed to have been emptied and the DAC unit flags the DSP unit that one frame interval has transpired. If this “final value” is 80 then the frame interval is 10 ms (assuming that the sampling rate is nominally 8 kHz) according to the DAC clock.
- One key aspect of the invention is to allow the controlling entity to adjust the “final value”. Thus if the final value is set at 79, the DAC will interrupt the DSP unit in less than 10 ms (10 ms minus 125 μs) and if the final value is set at 81, the DAC will interrupt the DSP unit in more than 10 ms (10 ms plus 125 μs) where these time intervals are based on the DAC clock.
- the method of changing the “final value” provides the means to either shorten or lengthen the apparent frame interval corresponding to an apparent increase or decrease of the apparent DAC clock frequency from the viewpoint of the read clock.
- the DAC will extract only 79 out of the 80 valid samples from the page. That is, effectively we have deleted one sample.
- the DAC will attempt to extract 81 samples from the page though there are only 80 valid samples. To ensure that this is done in a reasonable manner, the buffer size should be 81 and when the DSP writes 80 samples into the buffer it repeats the last sample to get the 81st sample. That is, samples 80 and 81 are the same. Consequently the DAC is repeating one sample.
- the controlling entity should change the final value occasionally, and only when necessary. At all other times it should be left at the nominal value of 80 (N1 set to 79).
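The effect of adjusting the “final value” can be illustrated with a short sketch. The function names `dac_read_page` and `frame_interval_ms` are hypothetical, and the page is modeled as an 81-element list whose last element repeats the 80th sample, as described above.

```python
SAMPLE_RATE_HZ = 8000  # nominal sampling rate


def dac_read_page(page: list[int], final_value: int = 80) -> list[int]:
    """Samples the DAC extracts from one page before flagging the DSP unit."""
    counter = 0  # starts from 0 for each page
    extracted = []
    while counter < final_value:
        extracted.append(page[counter])
        counter += 1  # incremented by 1 per extracted sample
    return extracted


def frame_interval_ms(final_value: int) -> float:
    """Apparent frame interval, measured by the DAC clock."""
    return 1000.0 * final_value / SAMPLE_RATE_HZ
```

A final value of 79 skips the last valid sample (a deletion) and shortens the apparent frame to 9.875 ms; a final value of 81 plays the duplicated sample (a repetition) and lengthens it to 10.125 ms. In both cases the change is one sample period, 125 μs.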
- the overall adaptive jitter double buffer arrangement can be viewed as a combination of the linear double buffer between the DSP block and the DAC and a “traditional” jitter buffer that stores packets between the depacketization block and the DSP block (as depicted by the FIFO in FIG. 6).
- the FIFO is advantageously implemented as a circular buffer.
- a simplified view of the circular buffer arrangement is depicted in FIG. 8.
- a buffer 432 is coupled to a write address generator 431.
- a read address generator 433 is coupled to the buffer 432.
- a page control block 434 is coupled between the write address generator 431 and the read address generator 433.
- a control signals block 435 is coupled to the read address generator 433.
- the data written into the buffer comprises the packets extracted from the IP stream by the depacketization block.
- the size of the circular buffer is 2N “locations”, each “location” containing the data associated with the packet.
- the data read out of the buffer comprises the packet data that is used by the DSP to extract the speech samples.
- the DSP block gets enough information to synthesize one block/frame worth of samples that will be fed to the DAC.
- the nuances of the method are implemented in the “Read Add. Gen.” block and thus the “Write Add. Gen.” block where the write address (“WR_ADD”) is generated can be quite simple.
- the block labeled “Δ” generates the difference between the read and write addresses [“RD_ADD”-“WR_ADD”] where the B-bit numbers are interpreted as 2's-complement represented integers.
- the block labeled “Control Signals” represents the circuitry implementing the logic associated with the control signals required by the “Read Add. Gen.” block.
- the functions associated with the various blocks are elaborated upon next. These functions have a direct counterpart for a linear buffer arrangement.
- the “Write Add. Gen.” block is quite straightforward.
- the starting address is provided as the initial value of the write_pointer and then for every write operation the write_pointer is incremented. Since a circular buffer operation is used, modulo-2N arithmetic provides the wrap-around feature.
- When the write instruction is asserted (see write instruction W1 in pseudo-code; this applies for the jitter buffer as well), the input data is written into the buffer in the location pointed to by the counter contents, “WR_ADD”, and the write_pointer is incremented by one.
- software instructions are needed to determine the suitable memory address of the start of the page.
- the “page ctrl” block represents a function that monitors whether the read operation as well as the write operation are happening in the same “location”. If so then the buffer has overflowed/underflowed and the correct action is to forcibly move one or the other side to the opposite part of the circular buffer. This is achieved by adding “N” (modulo-2N) to the write_address or to the read_address (depending on which is to be forcibly moved to the other page). Minor modifications are required in the case of a linear buffer arrangement.
- This difference is done modulo-2N; when the memory addresses are at diametrically opposite parts of the circular buffer the difference will be N; when the addresses are close to each other the difference is small in magnitude; when they coincide the difference is zero.
- defining which is “ahead” is somewhat moot. For our purposes, if Δn is positive the write pointer is “catching up” to the read pointer; if Δn is negative the read pointer is catching up to the write pointer.
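The modulo-2N pointer difference and its signed interpretation can be sketched as follows. The function name `pointer_difference` is hypothetical. The 2's-complement reading of a B-bit difference corresponds to the case where 2N equals 2^B; the arithmetic below is an assumption that generalizes this to any N by folding the upper half of the modulo range onto negative values.

```python
def pointer_difference(rd_add: int, wr_add: int, n: int) -> int:
    """Signed difference RD_ADD - WR_ADD for a circular buffer of 2N locations.

    The result lies in [-N, N-1]. A positive value means the write pointer
    is catching up to the read pointer; a negative value means the read
    pointer is catching up to the write pointer.
    """
    size = 2 * n
    diff = (rd_add - wr_add) % size  # modulo-2N difference in [0, 2N-1]
    return diff - size if diff >= n else diff
```

When the addresses coincide the result is zero, and when they are diametrically opposite the result has magnitude N (represented as -N in this signed form).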
- Three “threshold values”, T3 > T2 > T1, are predetermined. Suitable choices for these thresholds and the underlying rationale are provided later. Comparison of Δn with these determines the “state” of the adaptive play-out buffer; the state then determines the appropriate action.
- Making the final_value equal to (N1-1) means the read address reads one less location from the page, essentially deleting a sample. This is done if Δn is positive (write catching up with read). What this accomplishes is artificially decreasing the duration of a “frame” from the viewpoint of accessing the jitter buffer, slowing down the rate at which the write is catching up with the read.
- the flag associated with the current read data should be true.
- the flag will be set true by the signal processing block if the sample is part of an “actionable” signal segment.
- the timer has expired.
- the timer is essentially a counter that is reset (to zero) when a slip event (repetition/deletion) has occurred.
- the timer counter is incremented by the DAC clock and saturates at a (pre-determined) maximum value. Until it reaches this maximum count, slip events are inhibited. The intent is to ensure that slip events are not allowed to occur too close together. c. If T 3 >
- the implication of the “orange” state is that the read and write pointers are very likely coming closer and some action is definitely required. This takes the form of a controlled slip provided some other conditions are met. This is similar to the yellow state with relaxed conditions. In particular, the flag is ignored.
- the timer constraint is the same as for the yellow state. d. If
- Traditional “adaptive” jitter buffers adjust the size of the jitter buffer to mitigate the occurrence of such overflow/underflow events. That is, the size of the jitter buffer is increased if the trend is seen to be towards such overflow/underflow events.
- Traditional adaptive algorithms for jitter buffers malfunction because they make no distinction between overflow/underflow that is the result of packet delay variation and the result of a clock offset.
- the slip function implemented in this algorithm addresses the clock offset issue and therefore if overflow/underflow does occur it is because the jitter buffer is not large enough to accommodate the packet delay variation in the network. Consequently the invention disclosed here will improve and enhance the behavior of conventional adaptive jitter buffer algorithms.
- a suitable value for the timer is the closest power of 2 less than the packet size and in this case is 64. With this choice of timer, the slip events will be constrained to no more than twice per packet duration.
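The saturating slip timer can be sketched as follows. The names `SlipTimer` and `timer_length` are hypothetical; reading “closest power of 2 less than the packet size” as strictly less than, and starting the counter at zero, are assumptions of this sketch.

```python
def timer_length(packet_size: int) -> int:
    """Largest power of 2 strictly below packet_size (assumed interpretation)."""
    return 1 << ((packet_size - 1).bit_length() - 1)


class SlipTimer:
    """Counter clocked by the DAC that inhibits slips until it saturates."""

    def __init__(self, maximum: int = 64) -> None:
        self.maximum = maximum
        self.count = 0  # assumed to start in the reset (inhibiting) condition

    def tick(self) -> None:
        """Advance by one DAC clock period, saturating at the maximum."""
        if self.count < self.maximum:
            self.count += 1

    def expired(self) -> bool:
        """True once the maximum count is reached; slips are then permitted."""
        return self.count >= self.maximum

    def reset(self) -> None:
        """Called when a slip event (repetition or deletion) has occurred."""
        self.count = 0
```

For an 80-sample packet, `timer_length(80)` gives 64, so at least 64 DAC clock periods separate consecutive slip events.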
- A simplified view of this block is shown in FIG. 9.
- a timer 447 is coupled to an increment control block 443.
- An increment generator 442 is coupled to the increment control 443 and generates a final_value 441.
- the increment generator 442 is coupled to an adder block 444, which in turn is coupled to a select block 445, which in turn is coupled to a register block 446, which in turn is coupled to a read address block 448.
- the read address counter is implemented as an accumulator that is updated based on the DAC-derived clock (“Read_Clock”). Under normal operation the increment is one unit (corresponding to packet size). That is, the read operation will sequence through the jitter buffer in a normal manner. The adjustment of the “Read_Clock” interval based on the slip buffer mechanism between DSP and DAC will account for frequency offset between DAC and far-end ADC clock. If the condition is “red” (see item “d” above) then the increment is either 0 units (the packet loss concealment algorithm is invoked) or 2 units (one packet is effectively deleted).
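The read address accumulator can be sketched as a single update step. The function name `next_read_address` is hypothetical, and the modulo-2N wrap-around on the read side is an assumption carried over from the circular-buffer description of the write side.

```python
def next_read_address(rd_add: int, increment: int, n: int) -> int:
    """Advance the read address by one Read_Clock event.

    increment is 1 under normal operation, 0 when the same location is
    held while packet loss concealment is invoked, and 2 when one packet
    is effectively deleted.
    """
    if increment not in (0, 1, 2):
        raise ValueError("increment must be 0, 1 or 2 units")
    return (rd_add + increment) % (2 * n)  # wrap around the 2N locations
```

With N equal to 10, a normal step from location 19 wraps to location 0, and a two-unit step from location 19 wraps to location 1.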
- Final_value is the control value for the double buffer between the DSP block and the DAC.
- the nominal value will be called “N” in the following.
- (N-1) and (N+1) are the values for Final_value that will delete or repeat a (DAC) sample, respectively
- the block labeled “Increment Control” is one aspect of the invention of the adaptive play-out buffer. The actions have been described before but are summarized here for completeness. Based on the various state conditions this block controls the generation of the increment used by the read address counter:
- i. Deliver message to signal processing entity that packet loss concealment (deletion or synthesis, based on sign of Δn) is required.
- FIG. 9 does not show this control signal explicitly but it is implied.
- ii. If timer has not expired, set Final_value to “N”.
- iii. If timer has expired, set Final_value to (N-1) or (N+1) depending on sign of Δn and reset timer. 3.
- State is orange: i. If timer has not expired, set Final_value to “N”. ii. If timer has expired, set Final_value to (N-1) or (N+1) depending on sign of Δn and reset timer. 4. If State is yellow. i.
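The increment-control actions summarized above can be sketched as a single decision function. This is a hedged illustration only: the names `Action` and `increment_control` are hypothetical, and the state is passed in as a label rather than derived from the thresholds, since the threshold comparisons are not reproduced here. Two further assumptions are made: in the red state a positive Δn is taken to select deletion (increment 2) and a non-positive Δn synthesis (increment 0), by analogy with the sample-level rule; and any state other than red, orange, or yellow is treated as requiring no action.

```python
from typing import NamedTuple


class Action(NamedTuple):
    increment: int     # units added to the read address (0, 1 or 2)
    final_value: int   # control value for the DSP/DAC double buffer
    reset_timer: bool  # True when a sample slip is performed
    request_plc: bool  # True when packet loss concealment is requested


def increment_control(state: str, delta_n: int, timer_expired: bool,
                      flag: bool = True, n: int = 80) -> Action:
    """Map the buffer state to read increment and Final_value."""
    # Positive delta_n (write catching up with read): delete a sample (N-1).
    slip_value = n - 1 if delta_n > 0 else n + 1
    if state == "red":
        increment = 2 if delta_n > 0 else 0  # assumed sign convention
        if timer_expired:
            return Action(increment, slip_value, True, True)
        return Action(increment, n, False, True)
    if state == "orange":
        slip_allowed = timer_expired           # the flag is ignored
    elif state == "yellow":
        slip_allowed = timer_expired and flag  # sample must be actionable
    else:
        slip_allowed = False                   # assumed: no action needed
    if slip_allowed:
        return Action(1, slip_value, True, False)
    return Action(1, n, False, False)
```

For example, in the orange state with a positive Δn and an expired timer, the sketch returns an increment of 1 with Final_value 79 and a timer reset, that is, a single-sample deletion.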
- a second problem is that the transport is asynchronous and therefore the receiving end may be operating at a different timing-base from the sending end.
- the packetized nature of VoIP necessitates the use of a jitter buffer and, possibly, a second buffer to interface to the actual digital to analog converter (DAC).
- DAC digital to analog converter
- Salient points of the invention are:
- the DAC double buffer is made adaptive in the sense that controlled slips are implemented.
- the signal-processing entity can flag samples from segments of speech that are considered “actionable”.
- the slip action can, optionally, be inhibited if the sample affected has been flagged as “nonactionable”.
- the controlled slip action is instantiated by monitoring the fill of the jitter buffer.
- the jitter buffer FIFO is implemented as a circular buffer and the difference between the read and write pointers is used as a measure of buffer fill.
- a timer is used to ensure that slip events do not occur too close to each other.
- a timer is used to ensure that the frequency control is not too rapid.
- the term program and/or the phrase computer program are intended to mean a sequence of instructions designed for execution on a computer system (e.g., a program and/or computer program may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer or computer system).
- the term substantially is intended to mean largely but not necessarily wholly that which is specified.
- the term approximately is intended to mean at least close to a given value (e.g., within 10% of).
- the term generally is intended to mean at least approaching a given state.
- the term coupled is intended to mean connected, although not necessarily directly, and not necessarily mechanically.
- the term proximate as used herein, is intended to mean close, near adjacent and/or coincident; and includes spatial situations where specified functions and/or results (if any) can be carried out and/or achieved.
- the term distal is intended to mean far, away, spaced apart from and/or non-coincident, and includes spatial situation where specified functions and/or results (if any) can be carried out and/or achieved.
- the term deploying is intended to mean designing, building, shipping, installing and/or operating.
- the terms first or one, and the phrases at least a first or at least one, are intended to mean the singular or the plural unless it is clear from the intrinsic text of this document that it is meant otherwise.
- the terms second or another, and the phrases at least a second or at least another, are intended to mean the singular or the plural unless it is clear from the intrinsic text of this document that it is meant otherwise.
- the terms a and/or an are employed for grammatical style and merely for convenience.
- the term plurality is intended to mean two or more than two.
- the term any is intended to mean all applicable members of a set or at least a subset of all applicable members of the set.
- the phrase any integer derivable therein is intended to mean an integer between the corresponding numbers recited in the specification.
- the phrase any range derivable therein is intended to mean any range within such corresponding numbers.
- the term means, when followed by the term “for” is intended to mean hardware, firmware and/or software for achieving a result.
- the term step, when followed by the term “for” is intended to mean a (sub)method, (sub)process and/or (sub)routine for achieving the recited result.
- all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In case of conflict, the present specification, including definitions, will control.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computer Hardware Design (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Telephone Function (AREA)
Abstract
A method includes monitoring a fill in an adaptive slip buffer of a digital to analog convertor; adjusting a number of samples that are read from the adaptive slip buffer per page as a function of the fill; and reading the number of samples from the adaptive slip buffer. An apparatus includes a digital to analog convertor including an adaptive slip buffer and a read address generator coupled to the adaptive slip buffer, wherein the read address generator includes an increment control that adjusts a number of samples that are read from the adaptive slip buffer per page as a function of fill of the adaptive slip buffer.
Description
- This application claims a benefit of priority under 35 U.S.C. 119(e) from copending provisional patent applications U.S. Ser. No. 61/340,923, filed Mar. 24, 2010, U.S. Ser. No. 61/340,922, filed Mar. 24, 2010 and U.S. Ser. No. 61/340,906, filed Mar. 24, 2010, the entire contents of all of which are hereby expressly incorporated herein by reference for all purposes.
- 1. Field of the Invention
- Embodiments of the invention relate generally to the field of digital networking communications. More particularly, an embodiment of the invention relates to methods and systems for packet (and/or frame) switched networking that include an adaptive slip double buffer.
- 2. Discussion of the Related Art
- With the advent of Internet Protocol (“IP”), packet-based transmission and routing schemes are becoming ever more popular. It is well accepted that Next Generation Networks (“NGN”s) will be built upon these principles. However, several services, such as real-time voice and voice-band communication, that are well suited for circuit-switched (“TDM”) transmission and switching, have to be supported by this new architecture. VoIP (“voice over IP”) is one such example. The underlying premise of VoIP is that speech, after conversion from analog to digital format, can be packetized and several protocols such as RTP and RTCP (see Ref. [1,2]) have been developed to support the ability of IP networks to provide such real-time services.
- One of the premises of NGNs is that the Quality of Experience (QoE) should be at least as good as, or even better than, that provided by the legacy circuit-switched network or PSTN (Public Switched Telephone Network). It is clear that delay is an important parameter in determining the QoE. It is well known that one-way delays that are very large (of the order of 400 ms or larger) are extremely detrimental from the view of subjective quality, making regular full-duplex conversation difficult. At lower one-way delays, the impact of echo is important. The Quality of Experience, for a given level of Echo Return Loss (ERL), drops rapidly with increasing delay.
- The overall delay has four principal components. The process of packetization involves buffering information to fill the packet payload and thus introduces delay. The encoding and decoding algorithms, especially in the case of source codecs, require buffering as well. These two delays are often known quantities. The third component is the delay through the network. This delay is difficult to predict a priori since it depends on the physical distance, the number of intermediate packet switches involved in the end-to-end transport of a packet, and the bandwidth of the links between switches (routers). However, for two given end-points there is, in principle, a minimal network delay corresponding to the transit time of the fastest possible packet transmission. Considering that in a pure IP network the transmission path could be different for different packets, and the queuing delay in intermediate nodes is a function of congestion, the delay experienced by packets will be variable, ranging from the minimal delay to infinity (a packet lost in the network is construed as an instance of infinite delay). Some maximum delay threshold must be determined and packets with delay greater than this maximum are discarded. Received packets are stored in a buffer whose size corresponds to the difference between minimum and maximum delays and so, practically speaking, fast packets are delayed so that the packets can be decoded and converted back to analog signals in a smooth fashion. The notion of play-out, or dejittering, whereby some delay is introduced via a jitter buffer constitutes the fourth delay component. Clearly, in order to maximize the subjective quality of the call, the play-out buffer, also referred to as the jitter buffer, should be as small as possible.
- There is a need for the following embodiments of the invention. Of course, the invention is not limited to these embodiments.
- According to an embodiment of the invention, a process comprises: monitoring a fill in an adaptive slip buffer of a digital to analog convertor; adjusting a number of samples that are read from the adaptive slip buffer per page as a function of the fill; and reading the number of samples from the adaptive slip buffer. According to another embodiment of the invention, a machine comprises: a digital to analog convertor including an adaptive slip buffer and a read address generator coupled to the adaptive slip buffer, wherein the read address generator includes an increment control that adjusts a number of samples that are read from the adaptive slip buffer per page as a function of fill of the adaptive slip buffer.
- These, and other, embodiments of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following description, while indicating various embodiments of the invention and numerous specific details thereof, is given for the purpose of illustration and does not imply limitation. Many substitutions, modifications, additions and/or rearrangements may be made within the scope of an embodiment of the invention without departing from the spirit thereof, and embodiments of the invention include all such substitutions, modifications, additions and/or rearrangements.
- The drawings accompanying and forming part of this specification are included to depict certain embodiments of the invention. A clearer concept of embodiments of the invention, and of components combinable with embodiments of the invention, and operation of systems provided with embodiments of the invention, will be readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings (wherein identical reference numerals (if they occur in more than one view) designate the same elements). Embodiments of the invention may be better understood by reference to one or more of these drawings in combination with the following description presented herein. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale.
- FIG. 1 is a functional block view of a simplified depiction of a VoIP call (only one direction shown), appropriately labeled “PRIOR ART.”
- FIG. 2 is a functional block view of a circular buffering action separating ADC and DAC clocks, appropriately labeled “PRIOR ART.”
- FIG. 3 is a functional block view of a simplified model of VoIP over an IP network, appropriately labeled “PRIOR ART.”
- FIG. 4 is a functional block view of transmission of voice-band signals over a packet network, appropriately labeled “PRIOR ART.”
- FIG. 5 is a functional block view depicting the functions involved in generating the received speech signal, representing an embodiment of the invention.
- FIG. 6 is a functional block view of an underlying principle of a retiming FIFO buffer (play-out buffer), representing an embodiment of the invention.
- FIG. 7 is a functional block view of a double buffer arrangement for delivering samples to the DAC, representing an embodiment of the invention.
- FIG. 8 is a functional block view of a simplified circular buffer arrangement, representing an embodiment of the invention.
- FIG. 9 is a functional block view in more detail of “Read Add. Gen.” (433 in FIG. 8), representing an embodiment of the invention.
- Embodiments of the invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well known starting materials, processing techniques, components and equipment are omitted so as not to unnecessarily obscure the embodiments of the invention in detail. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only and not by way of limitation. Various substitutions, modifications, additions and/or rearrangements within the spirit and/or scope of the underlying inventive concept will become apparent to those skilled in the art from this disclosure.
- Within this application several publications are referenced by Arabic numerals, or principal author's name followed by year of publication, within parentheses or brackets. Full citations for these, and other, publications may be found at the end of the specification immediately preceding the claims after the section heading References. The disclosures of all these publications in their entireties are hereby expressly incorporated by reference herein for the purpose of indicating the background of embodiments of the invention and illustrating the state of the art.
- The invention described herein describes a novel approach to the play-out buffer, providing a method to maintain optimal performance even in situations where the analog-to-digital converter (ADC) and digital-to-analog converter (DAC) have different underlying time-bases. In particular, a method based on controlled slips, a technique that is well known as being efficient in TDM architectures for addressing clock offset, is presented. The invention is an extension of controlled slip behavior. In particular, the slip mechanism is invoked primarily when the speech segment represents a synthetic signal such as during periods of silence or if the characteristics of the speech segment are such that the repetition/deletion of a speech sample will have minimal subjective annoyance. It will be seen that an adaptive play-out buffer of the manner described here can form an integral part of an adaptive jitter buffer mechanism. Extensions of the invention include methods to implement adaptive clock control with minimal impact on subjective quality.
- Strictly speaking, the term synchronization applies to alignment of time and the term syntonization applies to alignment of frequency, but in the telecommunication environment we often use the term synchronization to refer to either time-alignment, or frequency-alignment, or both. It is generally clear from the context which meaning is appropriate. All real-time communication carried over a digital network requires synchronization to some degree. This can be illustrated by considering the example of delivering a real-time voice signal between two geographically disparate points across a network.
- The situation is depicted in
FIG. 1. The analog source is converted into digital format by an analog-to-digital converter (ADC or A/D) operating at a sampling clock rate of nominally 8 kHz. Each sample is, conventionally, quantized to 8 bits so that the digital stream carrying the voice information is 8 kilo-octets-per-second or 64 kbps (see ITU-T Rec. G.711, Ref. [3], and Ref. [4]). This is regarded as a DS0 and represents “uncompressed” voice. In a conventional circuit-switched or TDM (Time Division Multiplexed) architecture, this DS0 is delivered “as is” to the destination for conversion back to analog format. In a packet-switched environment, exemplified by Voice-over-IP (VoIP), the DS0 is, possibly, compressed and organized into packets. These packets are delivered to the destination where the expansion to DS0 format is performed prior to conversion back to analog. Whereas the schemes described here are applicable regardless of the word-length employed for A/D conversion or D/A conversion, we shall henceforth assume here that these are done with a word-length of 8 bits (1 octet) (representative of μ-law and A-law formats provided in ITU-T Recommendation G.711) for specificity.
- It is important to recognize that at each end the digital-to-analog converter (DAC or D/A) and analog-to-digital converter (ADC or A/D) are usually in the same integrated circuit chip or on the same circuit board and thus the same clock is used for both functions at any one end. In the event that the (digital) signal processing includes echo cancellation, it is mandatory that the same clock be used for both functions else the echo canceller will exhibit sub-par performance and there will be instances of echo leakage and other phenomena that negatively impact the quality of experience. In
FIG. 1 we show a single direction of transmission solely for convenience in representation and explanation. - The rate at which packets are generated (in the encoder) is determined by the A/D clock, shown as fA in
FIG. 1. In most VoIP schemes, one packet is generated for every 80 samples from the A/D converter. That is, using the conventional sampling rate of 8 kHz (nominal), each packet represents 10 ms (ms=millisecond) of speech (there are variants that use block sizes other than 10 ms, such as 5 ms, 20 ms, 30 ms, etc. but the principles described here are applicable in all cases). The nominal word-length associated with each sample is 8 bits, following G.711 (see Ref. [3]) so the “uncompressed” signal represents a bit-rate of 64 kbps (or DS0). Compression algorithms are employed to reduce the effective bit-rate. For example, ADPCM (adaptive differential pulse code modulation) following ITU-T Recommendation G.726 (see Ref. [5]) reduces the word-length associated with each sample to 4 bits, effectively reducing the data rate to 32 kbps. ITU-T Recommendation G.727 (see Ref. [5]) describes methods for reducing the bits/sample from 8 down to 5, 4, 3, or 2, corresponding to bit-rates of 40, 32, 24, and 16 kbps, respectively. More sophisticated schemes, such as those described in ITU-T Recommendation G.723 and G.729 (see Ref. [5]) are even more effective in reducing the bit-rate. The notion of a “10-ms-packet” is the collection of information produced by the coder that permits the decoder at the far end to synthesize a 10-ms block of speech. Depending on the coding algorithm it is possible that information from previous packets is necessary as well. At the receiving end the decoder recreates the appropriate digital signal (DS0) for conversion back into analog format. The D/A clock is shown as fD in FIG. 1.
- If the frequencies of the A/D clock (fA) and the D/A clock (fD) are not equal, then slips will occur. The notion of a slip is simple. If fA>fD then the DAC will experience a surfeit of samples; if fA<fD then the DAC will experience a shortage of samples. Rate-adaptation then requires that samples be deleted or inserted.
In the circuit-switched architecture of the legacy PSTN, every transmission boundary element is required to extract DS0s from an incoming digital signal (typically a DS1) and reinsert the information into an outgoing digital signal (typically a DS1) that may, potentially, have a different time-base. Therefore slip buffers are very common. To minimize the occurrence of slips, the circuit-switched network is well synchronized and this approach to network synchronization has the derivative benefit that the clock offset between the end points is minimized. In an NGN, where asynchronous transport is employed, there is no guarantee that the clock offset between the end points is negligible.
- This phenomenon is not necessarily catastrophic, but the DAC would have to either insert or delete a sample to account for the difference in sampling rates. This insertion or deletion of a block of information, such as a sample, is referred to as a slip. Note that a slip is the result of the difference in sampling rates and is independent of the word length associated with the quantization and compression. The degradation of perceptual quality caused by slips is in addition to any degradation caused by other factors. In conventional circuit-switched telephony, the unit of information inserted or deleted is one sample (or octet). Considering the nominal sampling rate is 8 kHz (one sample every 125 μs), a slip occurs when the accumulated phase difference, expressed in time units, caused by the aforementioned frequency difference, crosses 125 μs. In a packetized scenario, the unit could be as large as a block of speech, typically of duration 20 ms, and thus slips would have an impact similar to packet loss. Note that 20-ms slips occur much less frequently than 125-μs slips but have a greater impact each time they occur. The thrust of the current invention is to get the benefits of single-octet (single-sample) slips in a packet environment. Furthermore, the thrust of the current invention is to get the benefits of a single-octet slip in low-cost implementations such as in customer-premises-equipment (CPE) integrated-access-devices (IADs) and residential gateways.
- In the following table we provide the slip rate assuming that the D/A conversion clock uses a free-running oscillator and that the A/D clock is accurate (relative to a Primary Reference Source). Also provided is the typical technology used for that accuracy and a budgetary estimate (order of magnitude) of the cost of the oscillator. The last three columns provide an approximate time between slip occurrences for different block sizes. In generating this table it was assumed that the transmission link between the A/D and D/A is equivalent to a “null” link that adds no impairments such as excessive time-delay variation or transmission errors. The intent is to lay the baseline for the minimum impairment that is introduced by the lack of synchronization between the end-points.
- With regard to Table 1 as shown below, the terminology used includes XO (Crystal Oscillator), TCXO (Temperature-Compensated Crystal Oscillator), and OCXO (Oven-Controlled Crystal Oscillator).
TABLE 1
Relationship between frequency offset and interval between buffer overflow/underflow events

Accuracy              Technology        Cost     125-μsec slip                  10-ms slip
1 × 10^−10            Rubidium          ~$1000   1.25 × 10^6 sec. (14.5 days)   1 × 10^8 sec. (3.2 years)
50 × 10^−9 (50 ppb)   Hi-Quality OCXO   ~$500    25 × 10^3 sec. (41.7 min)      2 × 10^5 sec. (2.3 days)
5 × 10^−6 (5 ppm)     OCXO              ~$50     25 sec.                        2 × 10^3 sec. (33.3 min)
50 × 10^−6 (50 ppm)   TCXO              ~$10     2.5 sec.                       20 sec.
1 × 10^−3 (0.1%)      XO                ~$1      0.125 sec. (8 per sec.)        1 sec.
1 × 10^−2 (1%)        XO                ~$0.1    12.5 msec. (80 per sec.)       0.1 sec.

- It should be noted that in carrier-grade equipment such as that used in large telecom service provider networks, the higher quality clock sources (oscillators) are appropriate. For customer-premise equipment, including cases where the application runs on a personal computer, the quality of the oscillator is likely to be of the XO or, at best, TCXO class.
- The perceptual degradation in quality caused by slips is very subjective. The impact of an isolated slip in conventional telephony using uncompressed signals (G.711) is typically a “click” that could well be imperceptible, especially if it occurs during a silent interval. However, the perceived quality degrades rapidly as the slip-rate increases. The various digital switches in the PSTN are all provided a PRS (Primary Reference Source) traceable reference and thus have an absolute accuracy of better than 1×10^−11. A call traversing two distinct timing domains may experience slips corresponding to a worst-case frequency difference of 2×10^−11. Considering that this equates to one slip every 72 days, we can, for all practical purposes, ignore the phenomenon of slips in the traditional circuit-switched network. In VoIP applications, the end points are quite cost sensitive and therefore it is likely that the quality of oscillator deployed will be represented by one of the last three rows of Table 1 and clearly slips may play an important role in determining the quality of experience (or lack thereof).
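The single-octet slip intervals quoted above follow from dividing the slip unit (expressed in seconds) by the fractional frequency offset. The following is a minimal Python sketch of that arithmetic, assuming a constant offset and an otherwise impairment-free link; the function name `slip_interval_s` and the constant `SAMPLE_PERIOD_S` are illustrative and do not appear in the specification.

```python
def slip_interval_s(slip_unit_s: float, fractional_offset: float) -> float:
    """Seconds between slips for a constant fractional frequency offset.

    slip_unit_s       -- duration of the block inserted/deleted per slip
    fractional_offset -- relative frequency difference, e.g. 50e-6 for 50 ppm
    """
    if fractional_offset <= 0:
        raise ValueError("offset must be positive; a zero offset never slips")
    return slip_unit_s / fractional_offset


SAMPLE_PERIOD_S = 1.0 / 8000.0  # 125 microseconds at the nominal 8 kHz rate

# Free-running 50 ppm oscillator, single-octet slips: roughly 2.5 s apart.
print(slip_interval_s(SAMPLE_PERIOD_S, 50e-6))

# Two PRS-traceable timing domains, worst case 2e-11: roughly 72 days apart.
print(slip_interval_s(SAMPLE_PERIOD_S, 2e-11) / 86400.0)
```

The same function applies to larger slip units by passing the block duration in seconds in place of the sample period.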
- Most studies for evaluating the perceptual quality of compressed voice are done in a controlled environment and consider only a single compression/expansion. Additional study is required to assess the impact of tandem connections wherein there may be multiple conversions of format. Furthermore, the impact of an isolated slip may have a different perceptual effect on synthetic speech, such as that inherent in CELP (Code Excited Linear Prediction) methods for compression, such as G.729 (see Ref. [5]). However, it is quite well accepted that the controlled slip method, where one sample (octet) is deleted/inserted in an “uncompressed” stream, works very well provided that slips do not manifest themselves too often.
- If the size of the buffer is large, then the relative frequency of occurrence of buffer overflow/underflow events will be small. However, large buffers imply the introduction of delay and the decrease in quality of experience. Nevertheless, even with large buffers deployed to mitigate the occurrence of buffer overflow/underflow, there are other impairments that arise because of a difference in clock between the end-points. Note that if there is a long-term-average difference in the clock (frequency) at the two end-points then buffer overflow/underflow will occur—the size of the buffer will just determine the interval between these catastrophic events.
- The analog signal from the source enters the network and is converted into a digital signal by the analog-to-digital converter (ADC). The network acts as a pipe for these digital words (samples) that are delivered to the far-end digital-to-analog converter (DAC) for conversion back to analog. The conversion points could be in equipment, such as a customer-premise located IAD or PBX or even a Class-5 switch operated by the local telephone company. It is important to recognize that the time-base governing the A/D clock could be different from the time-base governing the D/A clock and thus there could be a difference in the sampling rates associated with these two conversions. That is, in every digital network there is the potential of encountering the pitch modification effect. The frequency difference could be small, of the order of 2 parts in 10^11, if the conversion clocks are traceable to a Stratum-1 source (or sources); the frequency difference could be significant, of the order of 64 parts in 10^6 (64 parts per million or 64 ppm), if the only guarantee given is that the conversion clocks are Stratum-4 quality (Stratum-4 implies an accuracy of no worse than ±32 ppm). {The notions of clock strata and the frequency accuracy of different classes of clocks are available in Ref. [6,7].}
- Clearly, if the conversion rates are different, then the DAC will experience a surfeit of samples if the ADC clock is higher than the DAC clock, or a dearth of samples if the situation is reversed. In fact, such a phenomenon could be manifested at multiple places in the network where there is a connection between two Network Elements with different clock references. Clock offsets of this type are accommodated by the use of slip-buffers. Whereas buffers are always required to compensate for accumulated jitter and wander, it is the effect of a frequency offset that is the primary focus here.
- Again for simplicity, we shall assume that there is just one buffer, and that this buffer is associated with the DAC. This buffer will be of a FIFO (first-in-first-out) nature where the data is written into the buffer under control of the ADC clock and read out of the buffer under control of the DAC clock. Clearly, if there is a frequency offset between the two clocks, the buffer will, eventually, either overflow (ADC clock is higher) or underflow (DAC clock is higher). In practice the buffering method is called “double buffering” wherein there are two pages, say A and B, and while data is being written into page A, data is being read out of page B. If there is no frequency offset, then the opposite-page nature of read and write will, for the most part, be preserved. Such a buffer needs to be just big enough to accommodate any relative wander or jitter between the two clocks. It is convenient to describe the size of the buffer in terms of time. For example, if each page is “10 ms”, then each page has 80 octets, assuming a nominal sampling rate of 8 kHz and one octet per sample (e.g. G.711; see Ref. [3] or [4]). The overall buffer is then 20 ms deep, introduces a nominal delay of 10 ms and can accommodate ±10 ms of wander.
- A good way of visualizing the double-buffer action is to consider a circular buffer as depicted in
FIG. 2. The memory is organized in a circular manner with address calculations done Modulo-2N, where 2N is the total number of memory locations. From the viewpoint of the DS0 channel under consideration, each location holds one octet (corresponding to one octet per sample), the buffer has a “length” of (2N/8) ms, introduces a nominal delay of (N/8) ms, and can accommodate ±(N/8) ms of wander. The operation is quite simple. With each write operation the write pointer moves one location counter-clockwise and likewise the read pointer moves one location counter-clockwise with each read operation. If the relative time error between the read and write clocks is zero, then the pointers remain a fixed distance apart. A frequency offset will result in one pointer catching up to the other, resulting in an overflow/underflow. The reset position is when the pointers access diametrically opposite locations. When an overflow/underflow occurs, one pointer is forcibly moved to be diametrically opposite to the other. This action causes data corruption in the sense that N octets will be either lost or repeated. It should be emphasized that allowing large buffers to overflow/underflow results in losses of large amounts of data when such events occur and this could have a much more deleterious impact on end-user (human) quality of experience than losses of small amounts of data that may occur more frequently.
- One special case is when the buffer is 250 μsec deep. This is the notion of a conventional slip buffer. Considering the sampling rate is 8 kHz (125 μsec period), a slip buffer has two octets and the overflow/underflow results in either the deletion of an octet or the repetition of an octet. This is called a controlled slip. A slip occurs when the relative time interval error between read and write clocks exceeds 125 μs. For example, if the relative frequency offset between the two clocks is 64 ppm, then a slip will occur approximately every 2 seconds.
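The circular double-buffer behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation of any figure: the class name `CircularSlipBuffer` is hypothetical, and the choice to always move the read pointer on a collision is an assumption, since the text says only that one pointer is forcibly moved.

```python
class CircularSlipBuffer:
    """Circular buffer of 2N octets with modulo-2N read/write pointers."""

    def __init__(self, n: int):
        self.n = n
        self.size = 2 * n
        self.data = [0] * self.size
        self.write_ptr = 0
        self.read_ptr = n          # reset position: diametrically opposite
        self.overflows = 0         # write pointer caught up with read pointer
        self.underflows = 0        # read pointer caught up with write pointer

    def _recenter(self) -> None:
        # Assumption: the read pointer is the one that is forcibly moved.
        self.read_ptr = (self.write_ptr + self.n) % self.size

    def write(self, octet: int) -> None:
        self.data[self.write_ptr] = octet
        self.write_ptr = (self.write_ptr + 1) % self.size
        if self.write_ptr == self.read_ptr:
            self.overflows += 1    # N unread octets are skipped (lost)
            self._recenter()

    def read(self) -> int:
        octet = self.data[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.size
        if self.read_ptr == self.write_ptr:
            self.underflows += 1   # N octets will be read again (repeated)
            self._recenter()
        return octet


# Equal read and write rates: the pointers stay N locations apart.
buf = CircularSlipBuffer(n=2)
played = []
for sample in range(6):
    buf.write(sample)
    played.append(buf.read())
print(played, buf.overflows, buf.underflows)
```

With equal rates each octet is read N write-times after it was written, which is the nominal delay of (N/8) ms mentioned in the text; writing without reading eventually triggers the overflow branch and recenters the pointers.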
- In packet-switched networks the delay through the network is not steady as is the case of circuit-switched networks. Therefore, even if the rates of the ADC and DAC are equal, the write clock may, on a short-term basis, appear to be faster (or slower) than the read clock. This requires the use of a buffer that is called a jitter buffer because the term used in the industry for variable transit delay is “jitter”.
- Now suppose that the buffer is 200 ms deep. The buffer will overflow (underflow) when the relative time interval error between the two clocks exceeds 100 ms. A 64 ppm offset will thus result in overflows (underflows) approximately every 3000 seconds. Considering that a telephone call rarely lasts 50 minutes, it is clear that overflows (underflows) that are a result of a clock offset may be ignored for all practical purposes. This is one of the (incorrect) reasons given by proponents of IP networks that frequency synchronization is not required because free-running clocks can support VoIP considering that buffer overflows and underflows can be made rare by increasing the size of the buffer.
- It should be recognized that:
-
- if a frequency offset exists then there will be occurrences of buffer overflow/underflow.
- the relative rate of such events will be smaller for larger buffer sizes.
- the larger the buffer size the greater is the loss of data when such an event occurs.
- the larger the buffer size the larger the delay (important for human quality of experience).
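The trade-off in the list above can be made concrete with a short calculation. The sketch below assumes a buffer that starts at its midpoint and a constant frequency offset, so that an overflow/underflow occurs once the accumulated time error reaches half the buffer depth; the function name `buffer_tradeoff` and the returned field names are hypothetical.

```python
def buffer_tradeoff(depth_ms: float, offset_ppm: float,
                    sample_rate_hz: int = 8000) -> dict:
    """Delay, event interval, and data lost per event for a given buffer depth."""
    half_depth_s = (depth_ms / 2.0) * 1e-3
    return {
        # Nominal delay added by a buffer operated about its midpoint.
        "nominal_delay_ms": depth_ms / 2.0,
        # Time for the accumulated error to consume half the buffer.
        "interval_s": half_depth_s / (offset_ppm * 1e-6),
        # Samples lost or repeated when the pointers are re-centered.
        "samples_per_event": round(half_depth_s * sample_rate_hz),
    }


# Assumed 50 ppm offset; depths from a two-octet slip buffer up to 200 ms.
for depth_ms in (0.25, 20.0, 200.0):
    print(depth_ms, buffer_tradeoff(depth_ms, offset_ppm=50.0))
```

Under these assumptions a deeper buffer spaces the events further apart, but each event discards or repeats proportionally more samples and the standing delay grows with the depth.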
- The thrust of this invention is to use multiple buffers. One buffer is similar to a traditional jitter buffer. The incoming packets are written into the jitter buffer upon arrival. Note that this write operation is tied, effectively, to the ADC clock (of the far end) with additional jitter introduced by the packet delay variation in the network. The packets are extracted (read out) from the jitter buffer using the DSP block (explained later) that is nominally uniform. The rate of packet extraction by the DSP block is determined by the rate of the DAC clock. The second buffer is a double buffer whose size is altered occasionally to adjust the rate at which the jitter buffer data is extracted by the DSP block.
- A network based on packet switching and transmission can be quite complex, but the simple model depicted in
FIG. 3 is sufficient to illustrate how synchronization and adaptive play-out buffers play a role. We consider an IAD (Integrated Access Device) at the customer premises as the traffic aggregator. All the various services are provided from the IAD to which all the customer equipment is connected. To allow for attachment of legacy devices such as telephones and Fax machines, the IAD will provide an FXS port to which the Fax machine (telephone) is connected. To the Fax machine (telephone), the FXS port appears, for all intents and purposes, as the line circuit of a traditional Class-5 switch. The IAD contains the codec where the conversion between analog and digital is accomplished. The information, however, is not transported as a conventional DS0 would in a TDM (time division multiplexed) or circuit-switched scenario. The data is packetized and encapsulated in the appropriate “wrappers” for transmission over the packet network. - In terms of the important processes involved after call set-up, a simple, though accurate, view is depicted in
FIG. 4 . For convenience only one direction of transmission is shown. The analog signal from the source Fax machine or telephone (“srce”) is converted into digital format using an A/D converter. It is quite conventional to use a conventional telephony codec that uses a sampling rate of 8 kHz and encodes the sample value in an octet (G.711 coding) though there are implementations described in the literature where a higher sampling rate and a higher word-length are used for improved fidelity. These samples are assembled into packets. For speech applications there may be some signal processing involved for purposes of echo cancellation and data compression; for Fax calls the samples are generally used without modification. The packets are delivered to the destination by the packet network. - Speech implementations also allow for voice activity detection (VAD) whereby intervals of silence are detected and transmission bandwidth conserved by just transmitting an indication of silence rather than (encoded) speech sample information. At the receiving end intervals of silence are synthesized using comfort noise.
- Whereas packet architectures are superior to circuit-switched architectures in terms of efficiency of bandwidth utilization (because of statistical multiplexing), they have some drawbacks, comparatively speaking. Packet architectures tend to increase latency (average delay) and introduce time delay variations. In order to accommodate time delay variations, jitter buffers are required. That is, buffers of an “elastic” nature are used to account for the burstiness of the packet arrival pattern. In order to avoid loss of data the depth of these buffers must be large enough to span the peak-to-peak time delay variation over the network. Put another way, the size (depth) of the jitter buffer determines the peak-to-peak time delay variation that is allowed for the network and a variation greater than this maximum value will result in packets being lost or used incorrectly.
- If the jitter buffer is too small, time delay variation can be the primary cause of packet loss. For normal voice (speech) calls, packet loss concealment (“PLC”) algorithms are available to mitigate the impact of lost packets. However, it should be emphasized that the mitigation of the deleterious impact does not mean that the problem is eliminated. In Ref. [8] a general picture of the impact of packet loss on Quality of Experience is provided. One way to reduce packet loss is to increase the size of the jitter buffer. However, this approach, too, has its drawbacks since the increase in delay caused by increasing the depth of the jitter buffer has a negative impact on the Quality of Experience for voice calls for several reasons (see Ref. [8]). Consequently most prior art VoIP implementations utilize what is referred to as an adaptive jitter buffer: algorithms have been developed to make the jitter buffer size dynamic, the intent being to keep the buffer just large enough such that the loss of packets due to time delay variation is within an acceptable limit, which the ITU-T Recommendations specify as 0.05%. However, adaptive jitter buffer operation in the prior art has a major problem because the proponents of VoIP and adaptive jitter buffers have ignored the effects of lack of clock synchronization.
- With the jitter buffer set at its “optimum” size, and provided adequate traffic engineering is in place to give the real-time services (such as VoIP) the appropriate priority, it is assumed that time delay variation will not cause packet loss except in situations of high traffic congestion. However, the frequency offset between source and destination has two deleterious effects. One is the pitch modification effect that has been described elsewhere (see Ref. [12], for example) and while important, is not the thrust of this invention. The other is a “buffer shrink” effect. If the DAC clock is faster than the ADC clock, the jitter buffer will empty faster than it is being filled. Suppose for example the buffer size is 200 ms. Then, whereas at the start of the call a 200 ms buffer will, theoretically, allow a ±100 ms time delay variation, the emptying of the buffer will affect the lower threshold. Similarly, if the ADC clock is faster than the DAC clock, the buffer will fill faster than it is being emptied and this will affect the upper threshold. For example, a frequency difference of 50 ppm will cause a threshold reduction (either the upper or the lower) of 50 μsec every second or 1 ms every 20 seconds. Therefore, whereas the probability of losing a packet due to time delay variation may have been small to nonexistent at the start of the call, the probability increases with the duration of the call and, for calls of long duration, could become appreciable.
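The “buffer shrink” arithmetic above can be expressed as a small function. This is a sketch assuming a constant frequency offset and no corrective action during the call; the name `remaining_margin_ms` is illustrative only.

```python
def remaining_margin_ms(initial_margin_ms: float, offset_ppm: float,
                        call_duration_s: float) -> float:
    """Jitter margin left on the affected threshold after a given call time."""
    # A fractional offset of x ppm accumulates x microseconds of error per second.
    drift_ms = offset_ppm * 1e-6 * call_duration_s * 1e3
    return max(initial_margin_ms - drift_ms, 0.0)


# 200 ms buffer (100 ms margin per side), 50 ppm offset.
print(remaining_margin_ms(100.0, 50.0, 20.0))    # 1 ms consumed after 20 s
print(remaining_margin_ms(100.0, 50.0, 600.0))   # 30 ms consumed after 10 min
```

The margin on the opposite threshold grows by the same amount, so the total buffer depth is unchanged; only the usable headroom in the direction of the drift shrinks.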
- For voice calls there have been several methods described in the literature to handle such problems. The notion of an adaptive jitter buffer is to modify the size of the jitter buffer to match the existing time-delay variation condition being experienced. Silence-stretching and silence-compressing algorithms have been proposed to delete or expand sections (sub-intervals) of silence. Packet loss concealment algorithms have been developed to insert or delete sections of “non-silence” in such a manner as to reduce (subjectively) any annoying effects of packet loss. The interested reader is pointed to Ref [9,10] for further information on these methods.
- In the context of this invention, silence-manipulation and packet loss concealment will be designated as extreme measures. Such measures are necessary because the general behaviour of IP networks is such that packets will be lost in the network for a variety of reasons, including excessive time-delay variation that could lead to jitter buffer overflow or underflow. In the context of this invention, the block labeled “Depacketization, Jitter Buffer, and Signal Processing” in
FIG. 4 will be, logically, split into multiple entities:
a. Depacketization. The packets received from the IP network are processed and the information content required for synthesis of the speech signal extracted. As part of the depacketization process, the protocol wrappers are examined to detect whether a packet was lost in the network. If a packet is detected as “lost”, then the packet loss concealment algorithm must be invoked. The current invention does not relate in particular to depacketization algorithms and implementations and most methods prevalent in the state-of-the-art can be employed. Packets contain both time-stamps and sequence numbers (also called frame numbers) and between these two it is straightforward to decide whether there was a missing packet or whether the apparent missing packet was actually a “no_transmission” corresponding to a silence packet. Basically the block labeled “Extract Frames” in
FIG. 4 extracts the (encoded) speech frames from the packets. Note that each IP packet may contain more than one speech frame. That is, each IP packet may contain the information for multiple (1 or more) blocks of speech. For example, if the block size used is 10 ms, the IP packet may contain 20 ms or two blocks worth of speech information in encoded form. For convenience, the unit of storage in the jitter buffer (see below) is the speech frame since this is the most convenient and useful unit of storage and can be either in the form of encoded speech or even decoded speech (see the notion of processing, below).
b. Jitter Buffer. The jitter buffer in prior art VoIP decoders comprised a first-in first-out (FIFO) buffer that was large enough to accommodate the time delay variation encountered by packets as they traverse the IP network from source (encoder/packetization) to the destination decoder. In one possible first implementation, the incoming packets are written in as they arrive and read out by the signal processing entity at the play-out rate. That is, the jitter buffer contains the actual received packets with, possibly, the protocol wrappers removed. In a second possible implementation, the incoming packets are treated by the signal processing entity as they arrive and the synthesized speech samples written into the FIFO. In this second implementation the FIFO contains actual speech samples destined for the DAC and is emptied based on the clock of the DAC. The invention disclosed herein applies to both modes of operation. The reason for the first mode of operation is that the jitter buffer module includes the logic required to handle missing packets as well as “silence” when there are really no packets available and the missing packets are synthesized as “silence” based on other information such as time-stamps available in the packets. Specifically, if the sequence numbers of consecutive packets are in correct sequence but the time-stamps indicate a time gap greater than the unit (frames or packets) then it is deemed that there were silent frames/packets between the two in-correct-order-sequence-number packets. In the second mode of operation there must be logic to determine silence packets. The invention described here is applicable to both implementations though, for specificity, the first implementation scheme is assumed.
c. Signal Processing. The information extracted from the received packet is processed with the appropriate algorithms to generate the speech segment. This includes the codec function, echo treatment (if any), comfort noise generation to synthesize silence, and packet loss concealment. The current invention does not relate in particular to the signal processing algorithms and implementation and just about any methods prevalent in the state-of-the-art can be employed. - There is one additional (though optional) requirement on the signal processing implementation arising from the current invention. That is, a flag is associated with each sample (octet) of speech signal recreated/synthesized. This flag is asserted (“true”) if the speech sample generated was part of a silence segment or a segment of signal artificially created via the packet loss concealment algorithm or had some particular characteristic as will be described later. The intent in this flag is to indicate that the sample is “actionable” and will have a minimal subjective annoyance in the event that the sample was deleted (or repeated) as part of the adaptive slip double buffer that is the crux of the invention disclosed herein. If the signal processing entity is incapable of providing such a flag for any reason, then the play-out buffer will, in essence, ignore the flag and assume that all samples are “actionable”.
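The decision described under items a and b above, namely whether a gap between consecutive packets reflects a lost packet or a suppressed silence interval, can be sketched from the sequence numbers and time-stamps alone. The sketch below is an assumption-laden illustration: the `PacketHeader` type, the function name `classify_gap`, and the use of time-stamps counted in samples are hypothetical, and counter wrap-around is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class PacketHeader:
    seq: int        # sequence (frame) number, assumed not to wrap here
    timestamp: int  # time-stamp in sample units (assumption)


def classify_gap(prev: PacketHeader, cur: PacketHeader,
                 samples_per_packet: int = 80) -> str:
    """Classify the interval between two consecutively received packets."""
    seq_step = cur.seq - prev.seq
    if seq_step <= 0:
        return "out_of_order"      # late or duplicated packet
    if seq_step > 1:
        return "lost"              # sequence numbers skipped: conceal the loss
    # Sequence numbers are consecutive; check the time-stamp spacing.
    if cur.timestamp - prev.timestamp > samples_per_packet:
        return "silence"           # nothing was sent: synthesize silence
    return "contiguous"


print(classify_gap(PacketHeader(10, 800), PacketHeader(11, 880)))
print(classify_gap(PacketHeader(10, 800), PacketHeader(11, 1120)))
print(classify_gap(PacketHeader(10, 800), PacketHeader(13, 1040)))
```

The three calls exercise the contiguous, silence, and lost cases respectively; a practical depacketizer would also need to handle counter wrap-around and reordering windows.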
- The notion of “actionable” is that the frame of speech is either representative of silence or is representative of a synthetic frame of speech used for packet loss concealment. In the case where the speech is compressed, the nominal short-term power of the speech is computed by the encoding function (at the analog-to-digital converter side) and communicated to the decoding side (the digital-to-analog converter side). In the case where there is no compression, the decoding side must compute the short-term power of the signal and invoke suitable algorithms to determine whether the current decoded speech is part of a silence interval. Implementing slips introduces degradation, but the degradation is much less consequential if the slip is invoked during periods of silence.
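For the uncompressed case, one simple way to derive an “actionable” flag at the decoder is to compare the short-term power of each decoded frame against a threshold. The sketch below assumes linear PCM sample values (that is, after μ-law or A-law expansion); the threshold value and the function names are hypothetical placeholders, not values taken from the specification, and a practical detector would likely add smoothing and hangover logic.

```python
from typing import Sequence


def short_term_power(frame: Sequence[int]) -> float:
    """Mean-square value of one frame of linear PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def is_actionable(frame: Sequence[int], concealed: bool = False,
                  silence_threshold: float = 100.0) -> bool:
    """True if deleting or repeating a sample here should be least annoying.

    concealed         -- frame was synthesized by packet loss concealment
    silence_threshold -- hypothetical power level below which the frame
                         is treated as silence
    """
    return concealed or short_term_power(frame) < silence_threshold


quiet_frame = [0, 1, -1, 2] * 20        # 80 low-level samples
loud_frame = [1000, -1000] * 40         # 80 high-level samples
print(is_actionable(quiet_frame))                   # True
print(is_actionable(loud_frame))                    # False
print(is_actionable(loud_frame, concealed=True))    # True
```

If no such flag can be produced, the fallback described in the text corresponds to treating every frame as actionable.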
- The invention disclosed here deals with an adaptive play-out buffer that is also called an adaptive slip double buffer. This is described below by considering the fundamentals of prior-art and the extensions that comprise the invention.
- The underlying principle of retiming is quite straightforward. The play-out buffer can be viewed as a retimer as described here. Fundamentally, the data (speech samples or octets) as well as a clock (“recovered clock”) are recovered from the incoming packet stream. The “recovered clock” is used to write the incoming packets into a buffer that is operated in a FIFO (“first-in-first-out”) mode. The recovered clock in this scenario is a burst mode clock corresponding to packet arrival instants. The data is read out of the buffer using, effectively, the DAC clock (the retiming function generally involves inserting the “reference” clock), and then packets read out from the FIFO can be applied to the signal processing function to generate the digital speech samples for the DAC. The function of “retiming” is illustrated in
FIG. 6 . - Referring to
FIG. 6, a FIFO buffer 412 is coupled to a depacketization block 411. A digital signal processor 413 is coupled to the FIFO buffer 412. - In
FIG. 6 , the block labeled “DeP” refers to the circuitry used to implement the depacketization functions. The block labeled DSP represents the DSP functions that generate the speech samples for handing off to the digital-to-analog converter (DAC). The FIFO buffer represents the jitter buffer. The DSP block reads out of the jitter buffer based on the DAC clock. The DeP writes into the jitter buffer when packets arrive and this can be viewed as a jittered version of the encoder clock from the far end. - For illustrative purposes, the FIFO can be viewed as a “pipe” with the receive data that is written into the FIFO viewed as being pushed into the pipe. The transmit data that is read out of the FIFO is viewed as being pulled out of the pipe. The arrow designated as “fill position” indicates where the next frame/packet that must be read out is located within the pipe. The action of “write” moves the fill position to the right and each read operation moves the fill position to the left. At the beginning or “reset” situation, the fill position, arbitrarily, points to the middle of the FIFO buffer. With such an arrangement, if the size of the FIFO buffer is 2N units (typically frames), short-term frequency variations, referred to as wander, can be accommodated without loss of data. In particular, up to N unit intervals (“UI”) of time-delay variation in the packet network (2N UI, peak-to-peak) can be absorbed (1 UI is equivalent to 1 frame-time, 10 ms for a frame size of 80 samples if the underlying sample rate is 8 kHz). Needless to say, the arrangement adds transmission delay of, on the average, N UI. A FIFO of this nature can serve as a jitter buffer accommodating up to ±N UI of time-delay variation. For reference, if N is 10, up to ±100 ms of time-delay variation (wander) can be absorbed.
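The arithmetic in the preceding paragraph can be checked with a short sketch. The function name `jitter_buffer_budget` and its default arguments (80 samples per frame, 8 kHz sampling) are illustrative assumptions rather than part of the disclosure; the sketch only restates that a FIFO of 2N units absorbs up to ±N unit intervals and adds N unit intervals of delay on the average.

```python
def jitter_buffer_budget(n_units, frame_samples=80, sample_rate_hz=8000):
    """Summarize what a FIFO of 2*n_units frames can absorb (hypothetical helper)."""
    # One unit interval (UI) is one frame time: 80 samples at 8 kHz is 10 ms.
    ui_ms = 1000.0 * frame_samples / sample_rate_hz
    absorbed_ms = n_units * ui_ms  # one-sided (plus or minus) delay variation
    return {
        "ui_ms": ui_ms,
        "absorbed_ms": absorbed_ms,
        "peak_to_peak_ms": 2 * absorbed_ms,
        "mean_delay_ms": absorbed_ms,  # fill position starts at the middle
    }


budget = jitter_buffer_budget(10)
print(budget["absorbed_ms"])  # one-sided wander absorbed for N = 10
```

For N = 10 the sketch reproduces the ±100 ms figure quoted above, at the cost of 100 ms of average added delay.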
- If the (long-term) average frequencies of the write clock and read clock are different, then the buffer will either overflow or underflow. With respect to
FIG. 6 , the fill position will move all the way to the right if the write clock is high or all the way to the left if the write clock is low. In this situation data will be corrupted; either some data is lost (“overflow”), or some “garbage” data must be inserted (“underflow”). In a generic retiming application, the appropriate way to handle such frequency offsets is to force the fill position to the center (the equivalent of “reset”) whenever the fill position rails at either extreme. In such a situation, either N frames are discarded (“lost”) or N frames are repeated (“garbage”). In a VoIP scenario, where the signal processing entity is capable of packet loss concealment, the advent of underflow can be anticipated and instead of “garbage”, speech segments can be synthesized that have much less subjective annoyance. Likewise, the advent of overflow can be detected and packet loss concealment methods applied to “delete” packets in a manner that is not arbitrary but introduces less impairment from a subjective standpoint. - One key element of the disclosed invention is the anticipation of overflow/underflow events.
- This will be described shortly.
- Another key element of the disclosed invention is the manner in which the clock used by the DSP to read frames out of the jitter buffer is derived from the DAC and adjusted to minimize the impact of clock offset between the local DAC and the far-end ADC.
- This is described next.
- The arrangement for delivering samples to the digital-to-analog converter generally involves a double buffer arrangement. The reason for this buffering is that the actual conversion is done on a sample by sample basis using a “continuous” clock. The DSP unit will usually generate the samples as a block of samples. Thus while the DSP unit generates the correct number of samples per unit time on the average, it generates the samples in bursts.
- The most common arrangement for implementing the double-buffer function involves the use of two buffers of equal size, say N octets, and referred to as “Page-A” and “Page-B”. One of the sides (we shall assume the “write” side for specificity and ease of explanation) accesses the buffer(s) sequentially. That is, the write operation first fills buffer Page-A, moves to buffer Page-B, fills it, and returns to filling buffer Page-A. The read operation empties the buffers. Under “normal” conditions, the read side is accessing buffer Page-B while the write side is accessing buffer Page-A, and vice-versa. If the average (long-term) frequencies of the read and write operations are equal, then the accesses will, substantially, remain in opposite buffers. This arrangement is sometimes referred to as a linear buffer arrangement to distinguish it from a circular buffer arrangement. The advantage of a linear buffer arrangement is that the memory allocation for the buffer can be slightly more than the actual page size.
- In
FIG. 7, a simplified depiction of a double buffer arrangement for implementing the interface to the DAC is shown. A first buffer 421 (Page-A) is coupled to a second buffer 422 (Page-B). The actual DAC converts samples that are read out of the appropriate page. The two buffers are often referred to as Page-A and Page-B. The trajectory of the write pointer (“WP”) (the address to which the next write operation will pertain) is shown. In particular, after filling Page-A, the pointer moves to the bottom of Page-B and commences filling Page-B. The trajectory of the read pointer (“RP”) follows the same principle and is implied. At the beginning (or “reset”), the WP and RP point to different pages. It is especially pertinent to make the page size equal to one frame. For example, in implementations using a 10 ms frame with an 8 kHz sampling rate the frame size is nominally 80 samples. Also, in this situation the DSP writes into the buffer the entire frame (nominally 80 samples) almost instantaneously. That is, it computes the appropriate sample values and fills the buffer in one “write statement”. The pseudo code for this operation will appear as:
Get initial value for write_pointer (establish whether it is page A or page B)
For l = 0,1,2,...,N1 {
    Write X[l] into address defined by write_pointer [write instruction “W1”]
    Increment write_pointer
}
Write X[N1] into address defined by write_pointer [write instruction “W2”]
Switch page designation for next block (from page A to page B and vice versa)
- In the above block of code N1 is 79 if the block size is 80 since the range of the index l starts at 0. It is assumed that the DSP has computed the requisite sample values and these are available in the array {X[j]; j=0, 1, 2, . . . , N1}. At the start write_pointer identifies the memory address of the first element in the appropriate page (page A or page B). The instruction following the loop (write instruction “W2”) is important. What this achieves is the replication of sample value X[N1] into page-A/B location N1 as well as (N1+1). Thus for the case of an 80-sample frame, the same value is placed in the 80th as well as the 81st location of the buffer. Note that this approach is suitable for a linear buffer arrangement; slight modifications are required for circular buffer operation.
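A minimal runnable sketch of the write operation described above follows. It assumes each page is a Python list with one slot more than the frame size; the names `write_frame`, `FRAME_SAMPLES`, and the dictionary keyed by "A" and "B" are hypothetical and chosen only for illustration.

```python
FRAME_SAMPLES = 80  # nominal frame: 10 ms at 8 kHz

# Each page holds one frame plus one extra slot for the replicated last sample.
pages = {"A": [0] * (FRAME_SAMPLES + 1), "B": [0] * (FRAME_SAMPLES + 1)}


def write_frame(pages, current_page, x):
    """Write frame x into the current page and return the next page designation."""
    page = pages[current_page]
    n1 = len(x) - 1            # N1 is 79 for an 80-sample frame
    write_pointer = 0          # first element of the selected page
    for l in range(n1 + 1):
        page[write_pointer] = x[l]   # write instruction "W1"
        write_pointer += 1
    page[write_pointer] = x[n1]      # write instruction "W2": replicate X[N1]
    return "B" if current_page == "A" else "A"


samples = list(range(FRAME_SAMPLES))
next_page = write_frame(pages, "A", samples)
print(pages["A"][79], pages["A"][80], next_page)
```

After the call, the 80th and 81st locations of Page-A hold the same value, so a read of 81 samples repeats the last sample instead of returning stale data.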
- Note that the speed of the write operation is determined by the speed at which the DSP operates and not by the rate of the DAC clock. Generally speaking the machine-cycle time of the DSP will be very small and the entire process of writing 81 samples will be a very small fraction of the 10 ms frame duration.
- In common implementations in customer-premises equipment such as the Integrated Access Device (IAD), the DAC clock is locally generated and may or may not be locked to a network reference. That is, it may be derived from a free-running oscillator. In either case it is not controlled by the DSP module that is reading out from the jitter buffer because implementing a clock synchronization method based on jitter-buffer fill (also referred to as adaptive clock recovery) requires expensive oscillators to smooth out the jitter introduced by the packet network, which can be quite large (see Ref. [11] for example).
- A key aspect of the invention is to allow the DAC clock to run asynchronously with respect to the far-end ADC clock and yet account for the frequency offset using a slip mechanism that is based on single-sample slips while simulating clock synchronization as applied to the jitter buffer read/write. That is, the intent of this simulated synchronization is to avoid the “buffer shrink” effect, and keeping the data corrupted by a slip small (one sample) minimizes the deleterious effect on end-user quality of experience.
- The typical manner in which the “read clock” (the “DAC-derived clock” in
FIG. 6) for the DSP unit is generated is based on the premise that the DAC unit will provide a marker (generally implemented using the notion of a “software interrupt”) every 80 samples. In most implementations the DAC will empty one page, say Page-A, and provide a software interrupt signal to initiate the DSP unit operation so that the DSP will read one frame's worth of information from the jitter buffer and complete the signal processing required to fill Page-A while the DAC is reading from Page-B. The operation of the DAC unit will involve a counter that starts from 0 for each page and is incremented by 1 when a sample is extracted from that page. When the counter reaches a “final value” the page is deemed to have been emptied and the DAC unit flags the DSP unit that one frame interval has transpired. If this “final value” is 80 then the frame interval is 10 ms (assuming that the sampling rate is nominally 8 kHz) according to the DAC clock. One key aspect of the invention is to allow the controlling entity to adjust the “final value”. Thus if the final value is set at 79, the DAC will interrupt the DSP unit in less than 10 ms (10 ms minus 125 μs) and if the final value is set at 81, the DAC will interrupt the DSP unit in more than 10 ms (10 ms plus 125 μs) where these time intervals are based on the DAC clock. - That is, the method of changing the “final value” provides the means to either shorten or lengthen the apparent frame interval corresponding to an apparent increase or decrease of the apparent DAC clock frequency from the viewpoint of the read clock.
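The relationship between the “final value” and the apparent frame interval can be sketched as follows. The function name `apparent_frame_interval_ms` is a hypothetical helper, and the 8 kHz default simply mirrors the example in the text.

```python
def apparent_frame_interval_ms(final_value, sample_rate_hz=8000):
    """Time between DAC interrupts, measured on the DAC clock, in milliseconds."""
    # The DAC raises the interrupt after extracting `final_value` samples,
    # and each sample lasts 1/sample_rate_hz seconds (125 us at 8 kHz).
    return 1000.0 * final_value / sample_rate_hz


for final_value in (79, 80, 81):
    print(final_value, apparent_frame_interval_ms(final_value))
```

A final value of 79 shortens the interval by one sample period (125 μs) and a final value of 81 lengthens it by the same amount, which matches the figures given above.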
- Some important points associated with this method:
- a. If the final value (N1) is set at 78 then the DAC will extract only 79 out of the 80 valid samples from the page. That is, effectively we have deleted one sample.
b. If the final value (N1) is set at 80 then the DAC will attempt to extract 81 samples from the page though there are only 80 valid samples. To ensure that this is done in a reasonable manner, the buffer size should be 81 and when the DSP writes 80 samples into the buffer it repeats the last sample to get the 81st sample. That is, samples 80 and 81 are the same. Consequently the DAC is repeating one sample.
c. The controlling entity should change the final value occasionally, and only when necessary. At all other times it should be left at the nominal value of 80 (N1 set to 79). Note that the example cited above assumed a frame size of 10 ms and a sampling rate of 8 kHz. The same technique is applicable for different frame sizes and different sampling rates though the specific values such as 79, 80, and 81 for the “final value” will depend on the sampling rate and chosen frame size. - The overall adaptive jitter double buffer arrangement can be viewed as a combination of the linear double buffer between the DSP block and the DAC and a “traditional” jitter buffer that stores packets between the depacketization block and the DSP block (as depicted by the FIFO in
FIG. 6 ). The FIFO is advantageously implemented as a circular buffer. - A simplified view of the circular buffer arrangement is depicted in
FIG. 8. A buffer 432 is coupled to a write address generator 431. A read address generator 433 is coupled to the buffer 432. A page control block 434 is coupled between the write address generator 431 and the read address generator 433. A control signals block 435 is coupled to the read address generator 433. The data written into the buffer comprises the packets extracted from the IP stream by the depacketization block. The size of the circular buffer is 2N “locations”, each “location” containing the data associated with the packet. The data read out of the buffer comprises the packet data that is used by the DSP to extract the speech samples. As mentioned before, based on each read access the DSP block gets enough information to synthesize one block/frame worth of samples that will be fed to the DAC. In this implementation it is assumed that the nuances of the method are implemented in the “Read Add. Gen.” block and thus the “Write Add. Gen.” block where the write address (“WR_ADD”) is generated can be quite simple. The block labeled “Δ” generates the difference between the read and write addresses [“RD_ADD”-“WR_ADD”] where the B-bit numbers are interpreted as 2's-complement represented integers. The block labeled “Control Signals” represents the circuitry implementing the logic associated with the control signals required by the “Read Add. Gen.” block. The functions associated with the various blocks are elaborated upon next. These functions have a direct counterpart for a linear buffer arrangement. - The “Write Add. Gen.” block is quite straightforward. The starting address is provided as the initial value of the write_pointer and then for every write operation the write_pointer is incremented. Since a circular buffer operation is used, modulo-2N arithmetic provides the wrap-around feature.
When the write instruction is asserted (see write instruction W1 in pseudo-code; this applies for the jitter buffer as well), the input data is written into the buffer in the location pointed to by the counter contents, “WR_ADD”, and the write_pointer incremented by one. In the case of a linear buffer arrangement software instructions are needed to determine the suitable memory address of the start of the page.
- The “page ctrl” block represents a function that monitors whether the read operation and the write operation are happening in the same “location”. If so, then the buffer has overflowed/underflowed and the correct action is to forcibly move one or the other side to the opposite part of the circular buffer. This is achieved by adding “N” (modulo-2N) to the write_address or to the read_address (depending on which is to be forcibly moved to the other page). Minor modifications are required in the case of a linear buffer arrangement.
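The write address generation and the page-control re-centering described above can be sketched as below. The class name `WriteAddressGenerator` and the function `move_to_opposite` are hypothetical; the sketch assumes the circular buffer is a Python list of 2N locations, each holding one packet.

```python
class WriteAddressGenerator:
    """Counter that wraps modulo 2N, one location per packet (illustrative only)."""

    def __init__(self, n, start=0):
        self.size = 2 * n
        self.write_pointer = start % self.size

    def write(self, buffer, packet):
        # Write into the location pointed to by WR_ADD, then increment modulo 2N.
        buffer[self.write_pointer] = packet
        self.write_pointer = (self.write_pointer + 1) % self.size


def move_to_opposite(address, n):
    """Page-control action: add N modulo 2N to move an address to the opposite side."""
    return (address + n) % (2 * n)


n = 4
buffer = [None] * (2 * n)
gen = WriteAddressGenerator(n, start=6)
for packet in ("p0", "p1", "p2"):
    gen.write(buffer, packet)
print(gen.write_pointer, move_to_opposite(gen.write_pointer, n))
```

Starting at location 6 of an 8-location buffer, three writes wrap the pointer around to location 1, and the opposite location computed by the page-control function is 5.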
- The block labeled “Δ” generates the difference [“RD_ADD”-“WR_ADD”]=Δn. This difference is done modulo-2N; when the memory addresses are at diametrically opposite parts of the circular buffer the difference will be N; when the addresses are close to each other the difference is small in magnitude; when they coincide the difference is zero. Considering the circular nature of the buffer, defining which is “ahead” is somewhat moot. For our purposes, if Δn is positive the write pointer is “catching up” to the read pointer; if Δn is negative the read pointer is catching up to the write pointer.
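A minimal sketch of the “Δ” computation follows. The wrapping convention is an assumption: the raw modulo-2N difference is mapped into the range (−N, N], so that diametrically opposite addresses give N and coincident addresses give zero, as stated above. A B-bit two's-complement implementation would instead report −N for the opposite case. The function name `delta_n` is hypothetical.

```python
def delta_n(rd_add, wr_add, n):
    """Signed difference RD_ADD - WR_ADD for a circular buffer of 2N locations."""
    d = (rd_add - wr_add) % (2 * n)   # raw difference in 0 .. 2N-1
    if d > n:
        d -= 2 * n                    # fold the upper half onto negative values
    return d


n = 8
print(delta_n(8, 0, n))    # diametrically opposite
print(delta_n(1, 15, n))   # read slightly ahead: write is catching up to read
print(delta_n(15, 1, n))   # read slightly behind: read is catching up to write
```

With this convention a small positive result means the write pointer is closing in on the read pointer, and a small negative result means the read pointer is closing in on the write pointer.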
- Assigning appropriate actions based on the value of Δn is a key aspect of the invention.
- To this end, three “threshold values”, T3>T2>T1 are predetermined. Suitable choices for these thresholds and the underlying rationale are provided later. Comparison of Δn with these determines the “state” of the adaptive play-out buffer; the state then determines the appropriate action.
- a. If |Δn−N|≦T1, the state is “green”. The implication of the “green” state is that the read and write pointers are far apart and no special action is taken. Note that the furthest they can be apart is, essentially, N, implying that the read and write operations are occurring in diametrically opposite parts of the circular buffer. The “increment” applied to the read address pointer (discussed shortly) is unity implying the read function operates in a normal manner.
b. If T2>|Δn−N|≧T1, the state is “yellow”. The implication of the “yellow” state is that the read and write pointers are possibly coming closer and some action is required. This takes the form of a controlled slip provided some other conditions are met. A controlled slip involves repeating or deleting one signal sample by changing the final_value in the linear double-buffer arrangement between the DSP and the DAC. - This is achieved by modifying the final_value to (N1+1) as described earlier. As described before, this implies that we are essentially repeating a sample. This is done if Δn is negative (read catching up with write). What this accomplishes is artificially increasing the duration of a “frame” from the viewpoint of accessing the jitter buffer, slowing down the rate at which the read is catching up with the write.
- Making the final_value equal to (N1−1) means the read address reads one less location from the page, essentially deleting a sample. This is done if Δn is positive (write catching up with read). What this accomplishes is artificially decreasing the duration of a “frame” from the viewpoint of accessing the jitter buffer, slowing down the rate at which the write is catching up with the read.
- The aforementioned conditions for allowing a slip operation to take place are the following:
- 1) The flag associated with the current read data should be true. The flag will be set true by the signal processing block if the sample is part of an “actionable” signal segment.
2) The timer has expired. The timer is essentially a counter that is reset (to zero) when a slip event (repetition/deletion) has occurred. The timer counter is incremented by the DAC clock and saturates at a (pre-determined) maximum value. Until it reaches this maximum count, slip events are inhibited. The intent is to ensure that slip events are not allowed to occur too close together.
c. If T3>|Δn−N|≧T2, the state is “orange”. The implication of the “orange” state is that the read and write pointers are very likely coming closer and some action is definitely required. This takes the form of a controlled slip provided some other conditions are met. This is similar to the yellow state with relaxed conditions. In particular, the flag is ignored. The timer constraint is the same as for the yellow state.
d. If |Δn−N|>T3, the state is “red”. The implication of the “red” state is that the read and write pointers are very close to each other and some extreme action is required. This takes the form of a controlled slip provided the timer constraint is met (as in the orange state) as well as a request to the signal processing entity that packet loss concealment must be initiated. If Δn is negative a segment of synthetic speech must be inserted; if Δn is positive a segment of speech must be deleted. In the red state we invoke not just effective change of frame duration by 1 DAC sample interval, but an entire frame in addition. - Traditional “adaptive” jitter buffers adjust the size of the jitter buffer to mitigate the occurrence of such overflow/underflow events. That is, the size of the jitter buffer is increased if the trend is seen to be towards such overflow/underflow events. Traditional adaptive algorithms for jitter buffers malfunction because they make no distinction between overflow/underflow that is the result of packet delay variation and the result of a clock offset. The slip function implemented in this algorithm addresses the clock offset issue and therefore if overflow/underflow does occur it is because the jitter buffer is not large enough to accommodate the packet delay variation in the network. Consequently the invention disclosed here will improve and enhance the behavior of conventional adaptive jitter buffer algorithms.
- e. If Δn=0, the state is “catastrophic” implying that the write pointer and read pointer are coincident. This requires very drastic action. This is achieved by re-centering the jitter buffer. That is, the read pointer is “reset” to be diametrically opposite to the write pointer. N packets will be lost or repeated by this action that is equivalent to jitter buffer overflow/underflow. Suitable values for the thresholds are T3=(¾)N; T2=(½)N; T1=(¼)N, where the size of the overall jitter buffer is 2N. If the packet loss concealment algorithm is not very sophisticated and thus should be minimally invoked, an alternate set of threshold values is T3=(⅞)N; T2=(¾)N; T1=(⅛)N. These choices are well suited for efficient implementation and it is unlikely that “optimum” values for these thresholds, derived by any sophisticated means, will provide an efficacy that much greater than this particular set to warrant an increase in implementation complexity. The value for N, the buffer size, depends on the expected time-delay variation. If we assume a packet size of 10 ms (80 speech samples) a “typical” time-delay variation will be ±10 ms, corresponding to ±0.5 packet duration.
- A suitable value for the timer is the closest power of 2 less than the packet size and in this case is 64. With this choice of timer, the slip events will be constrained to no more than twice per packet duration.
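The state classification and the slip timer described above can be sketched as follows. Several choices here are assumptions made for illustration: the quantity |Δn−N| is read as N − |Δn| so that it is symmetric in the sign of Δn; a value exactly equal to T1 is treated as green; a value equal to or above T3 is treated as red; and the timer starts at zero. The names `default_thresholds`, `classify_state`, and `SlipTimer` are hypothetical.

```python
def default_thresholds(n):
    """T1, T2, T3 as one quarter, one half, and three quarters of N."""
    return n // 4, n // 2, (3 * n) // 4


def classify_state(delta_n, n, t1, t2, t3):
    """Map the pointer difference onto the states named in the text."""
    if delta_n == 0:
        return "catastrophic"          # pointers coincide
    deviation = n - abs(delta_n)       # 0 when diametrically opposite
    if deviation <= t1:
        return "green"
    if deviation < t2:
        return "yellow"
    if deviation < t3:
        return "orange"
    return "red"


class SlipTimer:
    """Saturating counter clocked by the DAC; slips are inhibited until it saturates."""

    def __init__(self, maximum=64):
        self.maximum = maximum
        self.count = 0

    def tick(self):
        self.count = min(self.count + 1, self.maximum)

    def expired(self):
        return self.count >= self.maximum

    def reset(self):
        self.count = 0


n = 8
t1, t2, t3 = default_thresholds(n)
print([classify_state(d, n, t1, t2, t3) for d in (8, 5, -3, 1, 0)])
```

With N = 8 the thresholds are 2, 4, and 6, and the five sample differences fall into the green, yellow, orange, red, and catastrophic states respectively.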
- The block labeled “Read Add. Gen.” is important since this is a key aspect of the invention. A simplified view of this block is shown in
FIG. 9. A timer 447 is coupled to an increment control block 443. An increment generator 442 is coupled to the increment control 443 and generates a final_value 441. The increment generator 442 is coupled to an adder block 444, which in turn is coupled to a select block 445, which in turn is coupled to a register block 446, which in turn is coupled to a read address block 448. - The entity M-WR_ADD represents the WR_ADD modified to represent the address diametrically opposite the current location that is being written into. If Δn=0, the drastic action taken is to make the select control choose M-WR_ADD to load into the read address register (see item “e” above). The read address counter is implemented as an accumulator that is updated based on the DAC-derived clock (“Read_Clock”). Under normal operation the increment is one unit (corresponding to packet size). That is, the read operation will sequence through the jitter buffer in a normal manner. The adjustment of the “Read_Clock” interval based on the slip buffer mechanism between DSP and DAC will account for frequency offset between DAC and far-end ADC clock. If the condition is “red” (see item “d” above) then the increment is either 0 units (the packet loss concealment algorithm is invoked) or 2 units (one packet is effectively deleted).
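The read address update described above can be sketched as a single function. The name `next_read_address` and its argument list are hypothetical; the sketch assumes addresses index a circular buffer of 2N locations and that the select control is modeled by a boolean `reset` argument.

```python
def next_read_address(rd_add, wr_add, n, increment=1, reset=False):
    """Return the read address to load on the next Read_Clock edge."""
    size = 2 * n
    if reset:
        # Select M-WR_ADD: the location diametrically opposite the write address.
        return (wr_add + n) % size
    # Accumulator path: increment is 1 normally, 0 or 2 in the red state.
    return (rd_add + increment) % size


n = 4
print(next_read_address(7, 3, n))               # normal step with wrap-around
print(next_read_address(7, 3, n, increment=0))  # hold: concealment is invoked
print(next_read_address(7, 3, n, increment=2))  # skip: one packet deleted
print(next_read_address(2, 2, n, reset=True))   # re-center opposite the write side
```

An increment of zero leaves the read address unchanged, an increment of two skips one location, and asserting the reset loads the address opposite the current write location.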
- The notion of “Final_value” is the control value for the double buffer between the DSP block and the DAC. The nominal value will be called “N” in the following. (N−1) and (N+1) are the values for Final_value that will delete or repeat a (DAC) sample, respectively.
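The effect of Final_value on the samples delivered to the DAC can be illustrated with a small sketch. It assumes a page of 81 slots in which the last slot replicates the 80th sample, as in the write operation described earlier; the function name `dac_read_page` is hypothetical.

```python
def dac_read_page(page, final_value):
    """Extract samples from one page until the DAC counter reaches final_value."""
    if final_value > len(page):
        raise ValueError("final_value exceeds the page size")
    out = []
    counter = 0                      # starts from 0 for each page
    while counter < final_value:
        out.append(page[counter])    # one sample per DAC clock
        counter += 1
    return out


page = list(range(80)) + [79]        # 80 valid samples plus the replicated last one
print(len(dac_read_page(page, 80)))  # nominal frame
print(dac_read_page(page, 79)[-1])   # last valid sample (79) is never played
print(dac_read_page(page, 81)[-2:])  # last sample is played twice
```

With the nominal value of 80 the page is read exactly; 79 deletes the final sample and 81 repeats it.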
- The block labeled “Increment Control” is one aspect of the invention of the adaptive play-out buffer. The actions have been described before but are summarized here for completeness. Based on the various state conditions this block controls the generation of the increment used by the read address counter:
- 1. If State is catastrophic (Δn=0):
i. Assert reset (forcing read pointer to be diametrically opposite to write pointer)
ii. Reset timer. This is optional. Included for specificity.
iii. Set increment to one unit. This is optional since counter action is overridden by reset action. Set Final_value to “N”.
2. If State is red:
i. Deliver message to signal processing entity that packet loss concealment (deletion or synthesis, based on sign of Δn) is required. FIG. 9 does not show this control signal explicitly but it is implied. Set increment to 0 or 2 units.
ii. If timer has not expired, set Final_value to “N”.
iii. If timer has expired, set Final_value to (N−1) or (N+1) depending on sign of Δn and reset timer.
3. If State is orange:
i. If timer has not expired, set Final_value to “N”.
ii. If timer has expired, set Final_value to (N−1) or (N+1) depending on sign of Δn and reset timer.
4. If State is yellow:
i. If timer has not expired, or flag is false, set Final_value to “N”.
ii. If timer has expired, and flag is true, set Final_value to (N−1) or (N+1) depending on sign of Δn and reset timer.
iii. Note: If the signal processing entity does not provide the flag it is deemed to be always true.
5. If State is green:
i. Set Final_value to “N”. (Normal slip buffer operation)
Note: In states orange, yellow, and green the increment for the read address for the jitter buffer (i.e. RD_ADD in FIG. 9) is set to one unit.
- One of the problems associated with communication of real-time information over packet networks is the time-delay variation introduced. A second problem is that the transport is asynchronous and therefore the receiving end may be operating at a different timing-base from the sending end. The packetized nature of VoIP necessitates the use of a jitter buffer and, possibly, a second buffer to interface to the actual digital to analog converter (DAC). The invention described herein deals with simple and efficient methods to address the jitter buffer and clock offset issues.
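The increment-control rules listed earlier (states catastrophic, red, orange, yellow, and green) can be collected into one decision function. This is a sketch under stated assumptions: the `Action` record, the function name `increment_control`, and the state strings are hypothetical, the nominal Final_value defaults to 80, and a missing flag is modeled by leaving `flag` at its default of true.

```python
from dataclasses import dataclass


@dataclass
class Action:
    increment: int = 1               # units added to the jitter buffer read address
    final_value: int = 80            # samples per page read by the DAC
    reset_read_pointer: bool = False
    reset_timer: bool = False
    request_plc: bool = False        # ask for packet loss concealment


def increment_control(state, delta_n, timer_expired, flag=True, nominal=80):
    if state not in ("catastrophic", "red", "orange", "yellow", "green"):
        raise ValueError("unknown state: " + state)
    action = Action(final_value=nominal)
    if state == "catastrophic":
        action.reset_read_pointer = True   # re-center opposite the write pointer
        action.reset_timer = True          # optional per the text
        return action
    if state == "green":
        return action                      # normal slip buffer operation
    if state == "red":
        action.request_plc = True
        # Negative: synthesize a segment (hold). Positive: delete a segment (skip).
        action.increment = 0 if delta_n < 0 else 2
    # Yellow additionally requires the sample to be flagged as actionable.
    slip_allowed = timer_expired and (flag or state != "yellow")
    if slip_allowed:
        # Negative: repeat a sample (N+1). Positive: delete a sample (N-1).
        action.final_value = nominal + 1 if delta_n < 0 else nominal - 1
        action.reset_timer = True
    return action


print(increment_control("yellow", -3, timer_expired=True))
print(increment_control("red", 1, timer_expired=False))
```

The first call yields a one-sample repeat (Final_value of 81) with a timer reset; the second requests concealment and skips one packet while leaving Final_value at its nominal value because the timer has not expired.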
- Salient points of the invention are:
- 1) The DAC double buffer is made adaptive in the sense that controlled slips are implemented.
2) The signal-processing entity can flag samples from segments of speech that are considered “actionable”.
3) The slip action can, optionally, be inhibited if the sample affected has been flagged as “nonactionable”.
4) The controlled slip action is instantiated by monitoring the fill of the jitter buffer.
5) The jitter buffer FIFO is implemented as a circular buffer and the difference between the read and write pointers used as a measure of buffer fill.
6) A timer is used to ensure that slip events do not occur too close to each other.
7) A timer is used to ensure that the frequency control is not too rapid. - The term program and/or the phrase computer program are intended to mean a sequence of instructions designed for execution on a computer system (e.g., a program and/or computer program, may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer or computer system).
- The term substantially is intended to mean largely but not necessarily wholly that which is specified. The term approximately is intended to mean at least close to a given value (e.g., within 10% of). The term generally is intended to mean at least approaching a given state. The term coupled is intended to mean connected, although not necessarily directly, and not necessarily mechanically. The term proximate, as used herein, is intended to mean close, near adjacent and/or coincident; and includes spatial situations where specified functions and/or results (if any) can be carried out and/or achieved. The term distal, as used herein, is intended to mean far, away, spaced apart from and/or non-coincident, and includes spatial situation where specified functions and/or results (if any) can be carried out and/or achieved. The term deploying is intended to mean designing, building, shipping, installing and/or operating.
- The terms first or one, and the phrases at least a first or at least one, are intended to mean the singular or the plural unless it is clear from the intrinsic text of this document that it is meant otherwise. The terms second or another, and the phrases at least a second or at least another, are intended to mean the singular or the plural unless it is clear from the intrinsic text of this document that it is meant otherwise. Unless expressly stated to the contrary in the intrinsic text of this document, the term or is intended to mean an inclusive or and not an exclusive or. Specifically, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present). The terms a and/or an are employed for grammatical style and merely for convenience.
- The term plurality is intended to mean two or more than two. The term any is intended to mean all applicable members of a set or at least a subset of all applicable members of the set. The phrase any integer derivable therein is intended to mean an integer between the corresponding numbers recited in the specification. The phrase any range derivable therein is intended to mean any range within such corresponding numbers. The term means, when followed by the term “for” is intended to mean hardware, firmware and/or software for achieving a result. The term step, when followed by the term “for” is intended to mean a (sub)method, (sub)process and/or (sub)routine for achieving the recited result. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In case of conflict, the present specification, including definitions, will control.
- The described embodiments and examples are illustrative only and not intended to be limiting. Although embodiments of the invention can be implemented separately, embodiments of the invention may be integrated into the system(s) with which they are associated. All the embodiments of the invention disclosed herein can be made and used without undue experimentation in light of the disclosure. Although the best mode of the invention contemplated by the inventor(s) is disclosed, embodiments of the invention are not limited thereto. Embodiments of the invention are not limited by theoretical statements (if any) recited herein. The individual steps of embodiments of the invention need not be performed in the disclosed manner, or combined in the disclosed sequences, but may be performed in any and all manner and/or combined in any and all sequences.
- Various substitutions, modifications, additions and/or rearrangements of the features of embodiments of the invention may be made without deviating from the spirit and/or scope of the underlying inventive concept. All the disclosed elements and features of each disclosed embodiment can be combined with, or substituted for, the disclosed elements and features of every other disclosed embodiment except where such elements or features are mutually exclusive. The spirit and/or scope of the underlying inventive concept as defined by the appended claims and their equivalents cover all such substitutions, modifications, additions and/or rearrangements.
- The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase(s) “means for” and/or “step for.” Subgeneric embodiments of the invention are delineated by the appended independent claims and their equivalents. Specific embodiments of the invention are differentiated by the appended dependent claims and their equivalents.
- [1] RFC 3550, RTP: A Transport Protocol for Real-Time Applications, Internet Engineering Task Force Request for Comment.
- [2] RFC 3551, RTP Profile for Audio and Video Conferences with Minimal Control, Internet Engineering Task Force Request for Comment.
- [3] ITU-T Recommendation G.711, Pulse Code Modulation (PCM) of Voice Frequencies, Geneva, 1989.
- [4] Kishan Shenoi, Digital Signal Processing in Telecommunications, Prentice-Hall, 1995. ISBN 0-13-096751-3.
- [5] ITU-T Recommendations series G, Transmission systems and media, digital systems and networks.
- [6] Stefano Bregni, Synchronization of Digital Telecommunications Networks, John Wiley & Sons, 2002. ISBN 0 471 61550 1.
- [7] P. K. Bhatnagar, Engineering Networks for Synchronization, CCS 7, and ISDN, IEEE Press, 1997. ISBN 0-7803-1158-2.
- [8] Danny De Vleeschauwer and Jan Janssen, Voice Performance over packet-based networks, An Alcatel White Paper.
- [9] Ramachandran Ramjee, Jim Kurose, Don Towsley, and Henning Schulzrinne, Adaptive playout mechanisms for packetized audio applications in wide-area networks, Proceedings of the Conference on Computer Communication (IEEE INFOCOM), Toronto, Canada, June 1994.
- [10] Aman Kansal and Abhay Karandikar, Jitter-free audio playout over Best Effort packet networks, in ATM Forum—International Symposium on Broadband Communication in the New Millenium, August 2001.
- [11] Kishan Shenoi, Synchronization implications of providing Circuit Emulation Services in an IP Network, NFOEC/OFC, Anaheim, Calif., March 2005.
- [12] Kishan Shenoi, Synchronization Implications in VoIP, NIST-ATIS Workshop on Synchronization in Telecommunications Systems (WSTS), February 2004.
Claims (14)
1. A method, comprising:
monitoring a fill in an adaptive slip buffer of a digital to analog convertor;
adjusting a number of samples that are read from the adaptive slip buffer per page as a function of the fill; and
reading the number of samples from the adaptive slip buffer.
2. The method of claim 1, wherein the number of samples defines an apparent frame interval as a function of a clock frequency of the digital to analog convertor.
3. The method of claim 2, wherein the number of samples is increased when the fill is decreasing and the number of samples is decreased when the fill is increasing.
4. The method of claim 3, wherein the number of samples is decreased when a sample is flagged as actionable.
5. The method of claim 3, wherein the number of samples is changed when a minimum slip time interval has been exceeded.
6. The method of claim 3, wherein the number of samples is changed when an apparent frequency change threshold has not been exceeded.
7. A computer program, comprising computer or machine readable program elements translatable for implementing the method of claim 1.
8. A machine readable medium, comprising a program for performing the method of claim 1.
9. An apparatus, comprising: a digital to analog convertor including an adaptive slip buffer and a read address generator coupled to the adaptive slip buffer, wherein the read address generator includes an increment control that adjusts a number of samples that are read from the adaptive slip buffer per page as a function of fill of the adaptive slip buffer.
10. The apparatus of claim 9, wherein the number of samples controls an apparent frame interval as a function of a clock frequency of the digital to analog convertor.
11. The apparatus of claim 9, wherein the adaptive slip buffer includes a circular buffer.
12. The apparatus of claim 9, wherein the adaptive slip buffer includes a double buffer.
13. The apparatus of claim 9, wherein the adaptive slip buffer includes a linear buffer.
14. A digital switched network integrated access device, comprising the apparatus of claim 11.
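As an illustration only, the method of claims 1, 2, 3 and 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification: the class name `SlipController`, the nominal page size of 160 samples, the slip of exactly one sample, the minimum slip spacing expressed in pages, and the use of the difference between successive fill readings as the fill trend are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlipController:
    """Chooses how many samples to read per page from the buffer fill trend."""
    nominal: int = 160          # assumed samples per page (20 ms at 8 kHz)
    min_slip_pages: int = 3     # assumed minimum spacing between slips, in pages
    last_fill: Optional[int] = None
    pages_since_slip: int = 0

    def samples_for_page(self, fill: int) -> int:
        """Return the number of samples to read for the next page."""
        # Fill trend: difference between this fill reading and the previous one.
        trend = 0 if self.last_fill is None else fill - self.last_fill
        self.last_fill = fill
        self.pages_since_slip += 1

        # Claim 5 reading: slip only after the minimum interval has been exceeded.
        if trend == 0 or self.pages_since_slip <= self.min_slip_pages:
            return self.nominal

        self.pages_since_slip = 0
        # Claim 3: read more samples when the fill is decreasing,
        # fewer samples when the fill is increasing.
        return self.nominal + 1 if trend < 0 else self.nominal - 1


def apparent_frame_interval(samples: int, clock_hz: float) -> float:
    """Apparent frame interval in seconds for a given converter clock (claim 2)."""
    return samples / clock_hz
```

With these assumed values, a page of 160 samples at an 8 kHz converter clock corresponds to an apparent frame interval of 20 ms, and a one-sample slip lengthens or shortens that interval by 125 microseconds.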
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/065,583 US20110234200A1 (en) | 2010-03-24 | 2011-03-24 | Adaptive slip double buffer |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US34092210P | 2010-03-24 | 2010-03-24 | |
US34092310P | 2010-03-24 | 2010-03-24 | |
US34090610P | 2010-03-24 | 2010-03-24 | |
US13/065,583 US20110234200A1 (en) | 2010-03-24 | 2011-03-24 | Adaptive slip double buffer |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110234200A1 (en) | 2011-09-29 |
Family
ID=44655644
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/065,585 Expired - Fee Related US8379151B2 (en) | 2010-03-24 | 2011-03-24 | Synchronization of audio and video streams |
US13/065,584 Abandoned US20110235500A1 (en) | 2010-03-24 | 2011-03-24 | Integrated echo canceller and speech codec for voice-over IP(VoIP) |
US13/065,583 Abandoned US20110234200A1 (en) | 2010-03-24 | 2011-03-24 | Adaptive slip double buffer |
Country Status (1)
Country | Link |
---|---|
US (3) | US8379151B2 (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103503340B (en) * | 2011-03-04 | 2016-01-13 | 株式会社电通 | Synchronizing content distributing broadcasting system |
US9208798B2 (en) | 2012-04-09 | 2015-12-08 | Board Of Regents, The University Of Texas System | Dynamic control of voice codec data rate |
US9568985B2 (en) * | 2012-11-23 | 2017-02-14 | Mediatek Inc. | Data processing apparatus with adaptive compression algorithm selection based on visibility of compression artifacts for data communication over camera interface and related data processing method |
CN103888630A (en) * | 2012-12-20 | 2014-06-25 | 杜比实验室特许公司 | Method used for controlling acoustic echo cancellation, and audio processing device |
RU2633107C2 (en) * | 2012-12-21 | 2017-10-11 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Adding comfort noise for modeling background noise at low data transmission rates |
US9589533B2 (en) * | 2013-02-28 | 2017-03-07 | Robert Bosch Gmbh | Mobile electronic device integration with in-vehicle information systems |
US9553954B1 (en) * | 2013-10-01 | 2017-01-24 | Integrated Device Technology, Inc. | Method and apparatus utilizing packet segment compression parameters for compression in a communication system |
GB201318653D0 (en) | 2013-10-22 | 2013-12-04 | Microsoft Corp | Adapting a jitter buffer |
US9313300B2 (en) | 2013-11-07 | 2016-04-12 | Integrated Device Technology, Inc. | Methods and apparatuses for a unified compression framework of baseband signals |
EP3160147A1 (en) | 2015-10-19 | 2017-04-26 | Thomson Licensing | Method for fast channel change, corresponding arrangement and device |
US10045346B1 (en) * | 2016-08-02 | 2018-08-07 | Sprint Spectrum L.P. | Assigning a low-GDV carrier to a high-speed UE |
US10264116B2 (en) * | 2016-11-02 | 2019-04-16 | Nokia Technologies Oy | Virtual duplex operation |
US10541933B2 (en) * | 2016-11-10 | 2020-01-21 | Disney Enterprises, Inc. | Systems and methods for aligning frames of a digital video content in IP domain |
TW201931863A (en) * | 2018-01-12 | 2019-08-01 | 圓剛科技股份有限公司 | Multimedia signal synchronization apparatus and synchronization method thereof |
US10834295B2 (en) * | 2018-08-29 | 2020-11-10 | International Business Machines Corporation | Attention mechanism for coping with acoustic-lips timing mismatch in audiovisual processing |
US11197054B2 (en) | 2018-12-05 | 2021-12-07 | Roku, Inc. | Low latency distribution of audio using a single radio |
US11031026B2 (en) * | 2018-12-13 | 2021-06-08 | Qualcomm Incorporated | Acoustic echo cancellation during playback of encoded audio |
US11165463B1 (en) * | 2019-07-12 | 2021-11-02 | Cable Television Laboratories, Inc. | Systems and methods for broadband signal equalization |
US11844038B2 (en) | 2020-09-15 | 2023-12-12 | Texas Instruments Incorporated | Synchronization of wireless network nodes for efficient communications |
GB2624172A (en) * | 2022-11-08 | 2024-05-15 | Virtex Entertainment Ltd | Apparatus and methods for virtual events |
US12147376B2 (en) * | 2023-03-27 | 2024-11-19 | Cypress Semiconductor Corporation | Efficient transmission of video and audio over slave FIFO interface |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020141338A1 (en) * | 2001-02-22 | 2002-10-03 | Snowshore Networks, Inc. | Minimizing latency with content-based adaptive buffering |
US6912224B1 (en) * | 1997-11-02 | 2005-06-28 | International Business Machines Corporation | Adaptive playout buffer and method for improved data communication |
US7023942B1 (en) * | 2001-10-09 | 2006-04-04 | Nortel Networks Limited | Method and apparatus for digital data synchronization |
US20060088000A1 (en) * | 2004-10-27 | 2006-04-27 | Hans Hannu | Terminal having plural playback pointers for jitter buffer |
US20080256271A1 (en) * | 2006-12-12 | 2008-10-16 | Breed Paul T | Methods and apparatus for reducing storage usage in devices |
US20090185695A1 (en) * | 2007-12-18 | 2009-07-23 | Tandberg Telecom As | Method and system for clock drift compensation |
US20110261719A1 (en) * | 2000-03-22 | 2011-10-27 | Texas Instruments Incorporated | Systems, processes and integrated circuits for improved packet scheduling of media over packet |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5857167A (en) * | 1997-07-10 | 1999-01-05 | Coherant Communications Systems Corp. | Combined speech coder and echo canceler |
IL123906A0 (en) * | 1998-03-31 | 1998-10-30 | Optibase Ltd | Method for synchronizing audio and video streams |
AU1359601A (en) * | 1999-11-03 | 2001-05-14 | Tellabs Operations, Inc. | Integrated voice processing system for packet networks |
JP4411499B2 (en) * | 2000-06-14 | 2010-02-10 | ソニー株式会社 | Information processing apparatus, information processing method, and recording medium |
US20030120484A1 (en) * | 2001-06-12 | 2003-06-26 | David Wong | Method and system for generating colored comfort noise in the absence of silence insertion description packets |
WO2003034725A1 (en) * | 2001-10-18 | 2003-04-24 | Matsushita Electric Industrial Co., Ltd. | Video/audio reproduction apparatus, video/audio reproduction method, program, and medium |
US7283585B2 (en) * | 2002-09-27 | 2007-10-16 | Broadcom Corporation | Multiple data rate communication system |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8669892B2 (en) * | 2012-03-22 | 2014-03-11 | Silicon Laboratories Inc. | Digital-to-analog converter resolution enhancement using circular buffer |
US9069701B2 (en) * | 2012-12-11 | 2015-06-30 | International Business Machines Corporation | Virtual machine failover |
US20140164709A1 (en) * | 2012-12-11 | 2014-06-12 | International Business Machines Corporation | Virtual machine failover |
US20140164701A1 (en) * | 2012-12-11 | 2014-06-12 | International Business Machines Corporation | Virtual machines failover |
US9032157B2 (en) * | 2012-12-11 | 2015-05-12 | International Business Machines Corporation | Virtual machine failover |
US9047221B2 (en) * | 2012-12-11 | 2015-06-02 | International Business Machines Corporation | Virtual machines failover |
US20140165056A1 (en) * | 2012-12-11 | 2014-06-12 | International Business Machines Corporation | Virtual machine failover |
US20170026298A1 (en) * | 2014-04-15 | 2017-01-26 | Dolby Laboratories Licensing Corporation | Jitter buffer level estimation |
US10103999B2 (en) * | 2014-04-15 | 2018-10-16 | Dolby Laboratories Licensing Corporation | Jitter buffer level estimation |
EP3193459A1 (en) * | 2016-01-15 | 2017-07-19 | Deutsche Telekom AG | Method and system for testing the transmission quality in a communication network |
WO2023282959A1 (en) * | 2021-07-09 | 2023-01-12 | Arris Enterprises Llc | System and method to synchronize rendering of multi-channel audio to video presentation |
US12231713B2 (en) | 2021-07-09 | 2025-02-18 | Arris Enterprises Llc | System and method to synchronize rendering of multi-channel audio to video presentation |
WO2024020186A1 (en) * | 2022-07-21 | 2024-01-25 | Shure Acquisition Holdings, Inc. | Communications between networked audio devices |
Also Published As
Publication number | Publication date |
---|---|
US8379151B2 (en) | 2013-02-19 |
US20110235500A1 (en) | 2011-09-29 |
US20110234902A1 (en) | 2011-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20110234200A1 (en) | Adaptive slip double buffer | |
US7894489B2 (en) | Adaptive play-out buffers and adaptive clock operation in packet networks | |
US7636022B2 (en) | Adaptive play-out buffers and clock operation in packet networks | |
US6366959B1 (en) | Method and apparatus for real time communication system buffer size and error correction coding selection | |
US7460479B2 (en) | Late frame recovery method | |
US20020167911A1 (en) | Method and apparatus for determining jitter buffer size in a voice over packet communications system | |
US7450601B2 (en) | Method and communication apparatus for controlling a jitter buffer | |
US8437026B2 (en) | Compensation for facsimile transmission in a packet switched network | |
KR20090018853A (en) | Clock Drift Compensation Technology for Audio Decoding | |
US7639716B2 (en) | System and method for determining clock skew in a packet-based telephony session | |
US9686033B2 (en) | System and method for advanced adaptive pseudowire | |
US7177306B2 (en) | Calculation of clock skew using measured jitter buffer depth | |
US20090257455A1 (en) | Method and apparatus for synchronizing timing of signal packets | |
US7050465B2 (en) | Response time measurement for adaptive playout algorithms | |
US7542465B2 (en) | Optimization of decoder instance memory consumed by the jitter control module | |
CA2610550A1 (en) | A method and system for providing via a data network information data for recovering a clock frequency | |
Noro et al. | Circuit emulation over IP networks | |
Riegel | Requirements for edge-to-edge emulation of time division multiplexed (TDM) Circuits over packet switching networks | |
JP3669660B2 (en) | Call system | |
Verbiest et al. | Variable bit rate video coding in ATM networks | |
JPH04352537A (en) | ATM transmission system | |
Lee et al. | Variable rate video transport in broadband packet networks | |
Steinbach et al. | Adaptive media playout | |
RU2369015C2 (en) | Synchronisation of vodsl for dslam, connected to ethernet only | |
Baba et al. | Adaptive multimedia playout method based on semantic structure of media stream |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: FLOREAT, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: SHENOI, KISHAN; REEL/FRAME: 026390/0019. Effective date: 20110520 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |