- Waveform algorithms (coders) Waveform algorithms have the following functions and characteristics:
- Use predictive differential methods to reduce bandwidth
- Affect voice quality as bandwidth is reduced (the lower the bit rate, the greater the quality impact)
- Do not take advantage of speech characteristics
- Examples include G.711 and G.726
- Source algorithms (coders) Source algorithms have the following functions and characteristics:
- Vocoders take advantage of speech characteristics.
- Bandwidth reduction occurs by sending linear-filter settings.
- Codebooks store specific predictive waveshapes of human speech. The coder matches incoming speech against these waveshapes and transmits a short code for each phrase; the receiver looks up the coded phrase in its own codebook and reproduces the stored waveshape (a toy lookup is sketched after this list).
- Examples include G.728 and G.729
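The receive-side lookup described in the codebook item above can be made concrete with a toy sketch. The three-entry codebook and its eight-sample waveshapes below are entirely hypothetical; a real vocoder codebook holds many entries derived from speech modeling.

```python
# Toy receiver codebook: transmitted index -> stored predictive waveshape.
# The entries are invented purely for illustration.
codebook = {
    0: [0, 3, 5, 3, 0, -3, -5, -3],
    1: [0, 6, 9, 6, 0, -6, -9, -6],
    2: [1, 1, 1, 1, -1, -1, -1, -1],
}

def decode(indices):
    """Rebuild a sample stream by looking up each received code."""
    samples = []
    for code in indices:
        samples.extend(codebook[code])
    return samples

# The sender transmits only the short codes; the receiver expands them.
print(decode([1, 0, 2]))
```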
The following three common voice compression techniques are standardized by the ITU-T:
- PCM Amplitude of the voice signal is sampled and quantized 8000 times per second. Each sample is then represented by one octet (8 bits) and transmitted. Before quantization, either a-law or µ-law companding is applied to achieve a more uniform signal-to-noise ratio across loud and quiet speech (a sketch of the companding curve follows this list).
- ADPCM The difference between the current sample and its predicted value (based on past samples) is encoded and transmitted. Each difference is represented by 2, 3, 4, or 5 bits. This method reduces the bandwidth requirement at the expense of signal quality.
- CELP An excitation value and a set of linear-predictive filter settings are transmitted. The filter settings are transmitted less frequently than excitation values and are sent on an as-needed basis.
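As a rough sketch of the companding step mentioned in the PCM item, the continuous µ-law curve maps a normalized linear sample onto a compressed value before quantization. The constant µ = 255 matches G.711 µ-law; the test amplitudes are chosen arbitrarily for illustration.

```python
import math

MU = 255  # G.711 mu-law companding constant

def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the mu-law companding curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# Quiet samples are stretched relative to loud ones, so quantizing the
# companded value keeps the signal-to-noise ratio roughly uniform.
for x in (0.01, 0.1, 0.5, 1.0):
    print(f"linear {x:4.2f} -> companded {mulaw_compress(x):.3f}")
```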
Table 2-4 describes the CODECs and compression standards.
A common type of waveform encoding is pulse code modulation (PCM). Standard PCM is known as ITU standard G.711, which requires 64,000 bits per second of bandwidth to transport the voice payload (that is, not including any overhead), as shown in Figure 2-30.
Figure 2-30 shows that PCM requires 1 polarity bit, 3 segment bits, and 4 step bits, which equals 8 bits per sample. The Nyquist Theorem requires sampling at twice the highest voice frequency (2 * 4 kHz), or 8000 samples per second; therefore, you can figure the required bandwidth as follows:
8 bits * 8000 samples per second = 64,000 bits per second
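The 8-bit sample layout and the bandwidth arithmetic can be sketched in code. The packing routine below is a simplified illustration of the polarity/segment/step structure only; it is not a bit-exact G.711 encoder.

```python
SAMPLE_RATE = 8000          # samples per second (2 x 4 kHz, per Nyquist)

def pack_sample(linear: int) -> int:
    """Pack a linear sample into 1 polarity bit, 3 segment bits, 4 step bits.

    Simplified segment logic for illustration; not bit-exact G.711.
    """
    polarity = 0x80 if linear < 0 else 0x00
    magnitude = min(abs(linear), 0x0FFF)           # clamp to 12 bits
    segment = max(magnitude.bit_length() - 5, 0)   # 0..7, each range doubles
    shift = 1 if segment == 0 else segment
    step = (magnitude >> shift) & 0x0F             # 4-bit step within segment
    return polarity | (segment << 4) | step

bits_per_sample = 8
print(bits_per_sample * SAMPLE_RATE)   # 64000 bits per second of voice payload
```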
Adaptive differential pulse code modulation (ADPCM) coders, like other waveform coders, encode analog voice signals into digital signals, but they adaptively predict each new sample by looking at the immediate past. This adaptive feature reduces the number of bits per second that the PCM method requires to encode voice signals. ADPCM does this by taking 8000 samples per second of the analog voice and turning them into linear PCM samples. ADPCM then calculates the predicted value of the next sample, based on the immediate past sample, and encodes the difference. The ADPCM process generates 4-bit words, thereby generating 16 possible bit patterns.
The ADPCM algorithm from the Consultative Committee for International Telegraph and Telephone (CCITT) transmits all 16 possible bit patterns. The ADPCM algorithm from the American National Standards Institute (ANSI) uses 15 of the 16 possible bit patterns. The ANSI ADPCM algorithm does not generate a 0000 pattern.
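A toy differential encoder makes the predict-and-encode idea concrete. The step-adaptation rule and constants below are invented for illustration; G.726 uses a far more elaborate adaptive predictor and quantizer.

```python
def adpcm_encode(samples, step=16):
    """Toy adaptive differential encoder: one 4-bit word per integer sample.

    Predicts each sample from the previous reconstruction, quantizes the
    difference into a sign bit plus 3 magnitude bits, and adapts the step
    size. Illustrative only; not the G.726 algorithm.
    """
    predicted = 0
    codes = []
    for s in samples:
        diff = s - predicted
        sign = 0x8 if diff < 0 else 0x0
        magnitude = min(abs(diff) // step, 7)                     # 3 magnitude bits
        codes.append(sign | magnitude)                            # one of 16 patterns
        predicted += (-magnitude if sign else magnitude) * step
        step = max(4, step * 2 if magnitude >= 6 else step - 1)   # adapt step size
    return codes

print(adpcm_encode([0, 40, 110, 200, 180, 90, 10, -60]))
```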
The ITU standards for compression are as follows (a quick arithmetic check of these rates appears after the list):
- G.711 rate: 64 kbps = (2 * 4 kHz) * 8 bits/sample
- G.726 rate: 32 kbps = (2 * 4 kHz) * 4 bits/sample
- G.726 rate: 24 kbps = (2 * 4 kHz) * 3 bits/sample
- G.726 rate: 16 kbps = (2 * 4 kHz) * 2 bits/sample
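The listed rates follow directly from the 8000-sample-per-second Nyquist rate and the number of bits carried per sample, as this quick check shows (the dictionary labels are informal names, not formal standard designations):

```python
NYQUIST_SAMPLE_RATE = 2 * 4000   # 2 x 4 kHz voice band = 8000 samples/second

bits_per_sample = {"G.711": 8, "G.726-32": 4, "G.726-24": 3, "G.726-16": 2}

for codec, bits in bits_per_sample.items():
    rate_kbps = NYQUIST_SAMPLE_RATE * bits / 1000
    print(f"{codec}: {rate_kbps:g} kbps")   # 64, 32, 24, and 16 kbps
```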
Code excited linear prediction (CELP) compression transforms analog voice as follows (a simplified codebook-search sketch follows these steps):
1. The input to the coder is converted from an 8-bit to a 16-bit linear PCM sample.
2. A codebook uses feedback to continuously learn and predict the voice waveform.
3. The coder is excited (that is, begins its lookup process) by a white noise generator.
4. The mathematical result is sent to the far-end decoder for synthesis and generation of the voice waveform.
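A drastically simplified codebook search illustrates steps 2 through 4: choose the stored excitation vector (and gain) that best matches the target, and transmit only the index and gain. The five-sample vectors and the plain squared-error criterion are invented for illustration; real CELP coders search the excitation codebook through a linear-prediction synthesis filter.

```python
def best_codebook_match(target, codebook):
    """Toy CELP-style search: pick the excitation vector and gain that
    minimize squared error against the target, then send only the index
    and gain. Illustrative only; not an ITU CELP algorithm.
    """
    best = (None, 0.0, float("inf"))                 # (index, gain, error)
    for index, entry in enumerate(codebook):
        energy = sum(e * e for e in entry)
        gain = sum(t * e for t, e in zip(target, entry)) / energy if energy else 0.0
        error = sum((t - gain * e) ** 2 for t, e in zip(target, entry))
        if error < best[2]:
            best = (index, gain, error)
    return best[0], best[1]

# Hypothetical five-sample target and a tiny excitation codebook.
codebook = [[1, 0, -1, 0, 1], [1, 1, 0, -1, -1], [0, 1, 1, 1, 0]]
print(best_codebook_match([2.0, 1.8, 0.1, -1.9, -2.2], codebook))
```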
Two forms of CELP include Low-Delay CELP (LDCELP) and Conjugate Structure Algebraic CELP (CS-ACELP). LDCELP is similar to CS-ACELP, except for the following:
- LDCELP uses a smaller codebook and operates at 16 kbps to minimize delay, or look-ahead, from 2 to 5 ms, while CS-ACELP minimizes bandwidth requirements (8 kbps) at the expense of increased delay (10 ms).
- The 10-bit code word is produced from every five speech samples from the 8 kHz input with no look-ahead.
- Four of these 10-bit code words are called a subframe; they take approximately 2.5 ms to encode. (CS-ACELP, by comparison, uses eight 10-bit code words.)
- Two of these subframes are combined into a 5-ms block for transmission.
CS-ACELP is a variation of CELP that performs these functions (the resulting frame timing and bit rates are sketched after this list):
- Codes 80-byte frames (80 PCM samples), which take approximately 10 ms to buffer and process.
- Adds a look-ahead of 5 ms. A look-ahead is a coding mechanism that continuously analyzes, learns, and predicts the next waveshape.
- Adds noise reduction and pitch-synthesis filtering to processing requirements.
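The timing figures above translate directly into the two coders' bit rates. The arithmetic below is a quick sketch; the 80-bit (10-byte) G.729 frame payload is a standard figure stated here as an assumption, since the text above gives only the frame duration and the 8-kbps rate.

```python
SAMPLE_RATE = 8000                       # 8-kHz input, samples per second

# LDCELP (G.728): one 10-bit code word for every 5 input samples.
ldcelp_rate = 10 / (5 / SAMPLE_RATE)     # bits per second
print(ldcelp_rate)                       # 16000.0 -> 16 kbps

# CS-ACELP (G.729): 80 input samples are buffered per frame; at 8 kbps
# each 10-ms frame carries 80 bits (10 bytes) of coded speech.
frame_ms = 80 / SAMPLE_RATE * 1000
print(frame_ms, 8000 * frame_ms / 1000)  # 10.0 ms, 80.0 bits per frame
```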
Cisco VoIP environments typically leverage the benefits of G.729 when transmitting voice traffic over the IP WAN. These benefits include the ability to minimize bandwidth demands while maintaining an acceptable level of voice quality. Several variants of G.729 exist.





