Communication Technology

Communication Technology

Encode/Decode

coding and decoding are fundamental concepts in communication systems that are used to transform data for efficient and reliable transmission and reception.

Wireless communication can be affected by various problems, including poor signal strength, interference, and noise. These issues can degrade the quality of the received signal, leading to a poor user experience, such as jittery video streaming, echoing voice calls, or high latency in online gaming.

Channel coding is a technique used to mitigate these problems and ensure that data packets are transmitted and received error-free.

Block Code

Overview

Generation of Linear Block Code

Galois Field

gfconv

Examples of Channel Coding

Convolutional Code

Representation of Convolutional Code

Python Packages

Coding

The process of transmitting information over a communication channel can be divided into two main stages: source coding (or encoding) and channel coding (or encoding). Each stage serves a specific purpose and plays a crucial role in the overall communication process.

(With Reference to Lec 2 | MIT 6.450 Principles of Digital Communications I, Fall 2006 (YouTube) )

The overall purpose and functionality can be summarized as follows.

Source Coding (Encoding):

Purpose: The primary goal of source coding is to represent the original information in a more efficient and compact form, reducing the amount of data that needs to be transmitted over the communication channel.
Process: Source coding involves compressing the original data by removing redundancy and irrelevant information. This is achieved using various data compression algorithms, which represent the data in a more compact form without significant loss of information.
Examples: Common source coding techniques include Huffman coding, run-length encoding, and transform coding (e.g., JPEG for images, MP3 for audio). Lossless compression methods preserve the original data exactly, while lossy compression methods sacrifice some information for higher compression ratios.
Outcome: The result of source coding is a compressed version of the original data, which requires less bandwidth to transmit over the communication channel.

Channel Coding (Encoding):

Purpose: The primary goal of channel coding is to protect the transmitted data against errors that may occur during transmission due to factors such as noise, interference, and fading.
Process: Channel coding involves adding redundancy to the transmitted data, which allows the receiver to detect and, in some cases, correct errors in the received data. This is achieved using various error-correcting codes, which add extra bits or symbols to the data based on specific coding rules.
Examples: Common channel coding techniques include Hamming codes, Reed-Solomon codes, convolutional codes, turbo codes, and LDPC codes. These codes add redundancy to the data, enabling the receiver to detect and correct errors.
Outcome: The result of channel coding is a more robust and error-resistant version of the transmitted data, which improves the reliability and quality of the received signal.

If you take a look at a big picture as illustrated above, it may looks very simple. Then why the text book about this is so thick ?

It is because "We want to implement Efficient process". What do you mean by "Effiecient process" ? It may roughly mean "a process to send data using as less number of bits as possible" and "a process to be guaranteed to work". All that the thick text book is trying to say is "How we can make the efficient process ?", "How do you represent it in mathemaical form ?", "How do you prove (in mathemtical way) that it is efficient and always work?"

In real work, most of engineers doesn't care about this. There are only handsful of persons who digest all those textbook issues and come out with 'what they think is efficient process' and produce 'technical specification (e.g, 3GPP)'. Around 99% of us just follows the specification. That's why we don't read the text book after we graduate from university and still can survive -:)

NOTE : Does every communication system use both source coding and channel coding ? It seems almost every wireless system use channel coding, but not necessarily for source coding. For example, in cellular communication system (3G,4G,5G) we see explicit specification (3GPP) about channel coding, but I don't see corresponding specification for source coding

I think this is a reasonable observation. In wireless communication systems, especially in cellular communication systems like 3G, 4G, and 5G, channel coding is explicitly specified and used to ensure reliable data transmission over the wireless channel. Channel coding is essential for protecting the transmitted data against errors that may occur due to factors such as noise, interference, and fading in the wireless channel.

On the other hand, source coding is not always explicitly specified in the standards for cellular communication systems. This is because source coding is typically applied at the application layer, rather than the physical or link layers where channel coding is applied. Source coding is used to compress data for efficient storage and transmission, and the choice of source coding technique often depends on the specific type of data being transmitted (e.g., audio, video, text).

For example, in a cellular communication system, voice calls may use audio compression algorithms like AMR (Adaptive Multi-Rate) or EVS (Enhanced Voice Services) for source coding. Video streaming services may use video compression algorithms like H.264 or H.265 for source coding. These source coding techniques are typically chosen and implemented by the application or service provider, rather than being explicitly specified in the cellular communication standards.

Source Coding

There would be a couple of variations of the definitions of Source Coding depending on the context. The first type of definition is 'to convert the original data(user data) into a certain binary sequence' which is illustrated as below. Converting the original data into a sequence of binary sequence can be a kind of source coding because it can be the process of converting the original data (user data) into a certain binary sequence. Usually the converted sequence would represents the data in a more efficient and compact form. This binary sequence is typically shorter than the original data, resulting in data compression.

An example of Source coding that involves the process of converting a user data(original data) into a certain type of binary sequence is illustrated below. This is an example of source coding that convert an analog input into digital(binary) sequence.

(With Reference to Lec 2 | MIT 6.450 Principles of Digital Communications I, Fall 2006 (YouTube) )

First, let me describe on the source coding process as follows :

Input Waveform: The process starts with an analog input waveform, which could represent a continuous-time signal such as an audio signal or a video signal.
[Sampler]: The sampler is the first step in the process. It converts the continuous-time input waveform into a discrete-time analog sequence by taking samples of the waveform at regular intervals. This process is known as sampling. The sampling rate (the number of samples taken per second) is typically chosen based on the Nyquist-Shannon sampling theorem, which states that the sampling rate should be at least twice the highest frequency present in the input waveform.
Analog Sequence: The output of the sampler is a discrete-time analog sequence, which represents the input waveform as a series of discrete samples. Each sample is still an analog value.
[Quantizer]: The quantizer is the next step in the process. It converts the discrete-time analog sequence into a discrete-time symbol sequence by assigning each sample to one of a finite number of discrete levels or symbols. This process is known as quantization. Quantization introduces some loss of information, as the original analog values are approximated by the discrete symbols.
Symbol Sequence: The output of the quantizer is a discrete-time symbol sequence, which represents the input waveform as a series of discrete symbols. Each symbol corresponds to a specific range of analog values.
[Discrete Coder]: The discrete coder is the final step in the process. It converts the discrete-time symbol sequence into a binary sequence by assigning a unique binary code to each symbol. This process is known as discrete coding or binary coding. The binary codes are typically chosen to minimize the length of the binary sequence, resulting in data compression.
Binary Sequence: The output of the discrete coder is a binary sequence, which represents the input waveform as a series of binary digits (bits). This binary sequence can be transmitted over a digital communication channel or stored in digital form.

Next, let's talk about the source decoding process. Basically it is just reverse process of channel coding.

Binary Sequence: The process starts with a binary sequence, which represents the input waveform as a series of binary digits (bits). This binary sequence was generated by the source coding process.
[Discrete Decoder]: The discrete decoder is the first step in the process. It converts the binary sequence back into a discrete-time symbol sequence by assigning a unique symbol to each binary code. This process is known as discrete decoding or binary decoding.
Symbol Sequence: The output of the discrete decoder is a discrete-time symbol sequence, which represents the input waveform as a series of discrete symbols. Each symbol corresponds to a specific range of analog values.
[Table Lookup]: The table lookup is the next step in the process. It converts the discrete-time symbol sequence back into a discrete-time analog sequence by assigning an analog value to each symbol based on a lookup table. This process is known as dequantization.
Analog Sequence: The output of the table lookup is a discrete-time analog sequence, which represents the input waveform as a series of discrete samples. Each sample is an analog value.
[Analog Filter]: The analog filter is the final step in the process. It converts the discrete-time analog sequence back into a continuous-time analog output waveform by interpolating between the samples and removing any high-frequency artifacts introduced by the sampling and quantization processes. This process is known as reconstruction or filtering.
Output Waveform: The output of the analog filter is a continuous-time analog output waveform, which represents the original input waveform that was converted into a binary sequence by the source coding process.

Another type of definition for source coding (according to Ref [2]) is to convert a source data into a form that can minimize the bandwidth that are required to transmit the data. Simply put, it is just another name for 'Data Compression'.

Again, It doesn't look that complicated. Then why the text book about this is so thick ?

Same answer as above.

Channel Coding

As mentioned above, the primary goal of channel coding is to protect the transmitted data against errors that may occur during transmission due to factors such as noise, interference, and fading.

Channel coding involves adding redundancy to the transmitted data, which allows the receiver to detect and, in some cases, correct errors in the received data

A very simple method for Channel Coding is to replace 'original data bits' with 'some other bits (normally longer than the original bits)'. For example, the simplest coding would be as follows. In this method, we just repeating the data multiple times. so this kind of coding is called repetition code.

0 --> 0000 : replace all '1' in orginal data into '0000'

1 --> 1111 : replace all '1' in orginal data into '1111'

Another example of coding would be addition of parity bits as in RS 232 communication

original seven bit data --> original seven bit data + one parity bit

In communication system, we usually call 'the original data' as a 'message' and call the ecoded data as 'Codeword'. In all encoding process, the length of Codeword is greater than the length of 'message', in means that in coding process some additional bits are added to the original data(message) and this additional bits are called 'redundency bits'.

The examples mentioned above are the simplest type of coding just to give you some basic idea. There are many more types of channel coding technique that are used in communication system. Followings are some examples of those techniques and typical applications.

Channel Coding Type	Applications
Repetition Code	Simple error detection and correction, low-noise communication channels
Parity Code	Error detection in computer memory, data transmission, and storage
Convolutional Code	Wireless communication, satellite communication, deep-space communication
Hamming Code	Computer memory error detection and correction, satellite communication
BCH Code (Bose-Chaudhuri-Hocquenghem Code)	digital television broadcasting and data storage devices.
Golay Code	deep-space communication
Reed-Solomon Code	CDs, DVDs, QR codes, satellite communication, digital television broadcasting
Turbo Code	3G and 4G cellular communication, deep-space communication
LDPC (Low-Density Parity-Check) Code	5G cellular communication, Wi-Fi, digital television broadcasting
Polar Code	5G cellular communication

Why Channel Coding ?

Why do we need this kind of Coding ?

Most of the experts in this area would come out with something like 'Shannon's theorem'.. but let's just look at this in intuitive way rather than bringing up such a scary words.

Let's look at the simple wireless communication process that we went through in previous section. In this process, what do you think is the biggest problem ?

Just give yourself a moment to think before you go on.

The answer is the 'noise' which is added by 'Channel'. As you can guess, this noise would make a lot of errors while the reciever is demodulate the received signal into the bit stream.

What would be the solution for this problem ?

We can think about two possible options.

Make the channel noise-free. (Construct the channel in such a way there is no noise)
Use some method to detect and correct the error

Which option do you prefer ? You would know that the first option would be almost impossible especially in wireless communication. If it is wired communication, at least you can make some try to reduce the noise in the channel but in wireless it is almost impossible to remove the noise directly from the channel.

It means the only option is to develop some method (algorithm) to detect and correct the error caused by the noisy channel. This is the main motivation for 'Channel Coding'.

Main idea of Channel Coding is to add some additional bits (we call this as redundancy bits) to original data in a very special way (not in a random/arbitrary way) so that they can be used to detect the exact location of error and correct it.

In communication chain, following two blocks are added for this purpose. 'Encoding' is the process of adding redundancy bits and 'Decoding' is the process of extracting the error corrected bits from the received data.

Usually Encoding/Decoding blocks locate as shown below. Encoding takes in the bits stream and produce coded bit stream.

Downside of Channel Coding

In previous section, I said "Encoding/Decoding is mainly for detecting/correcting erros generated in communication process". It is very good and important step. But once you gain something, you would lose some other things. It is almost nature of everything including our life.

What is the gain for Encoding/Decoding ? It is the received data with less or almost no error.

What is the loss ? What is downside of this process ? Following are the major issues caused by Encoding/Decoding process.

Data Transmission Overhead (Due to redundancy bits) : There is a trade-off between the error correction capability of a channel code and the amount of redundancy (overhead) it adds to the original data. Higher error correction capability typically requires more redundancy, which can increase the bandwidth requirements and reduce the data transmission rate.
Increased Complexity: Complicate the data transmission and reception process (due to encoding/decoding algorithm). Channel coding techniques, especially advanced error-correcting codes, can be computationally complex. This can increase the complexity of the transmitter and receiver, leading to higher power consumption, increased hardware costs, and greater implementation challenges.
Increased Latency/Increased bandwidth: Channel coding involves adding redundancy to the original data, encoding the data, transmitting the codeword, detecting and correcting errors at the receiver, and decoding the corrected codeword to recover the original data. These processes can introduce delays, leading to increased latency in the communication system and requiring the increased bandwidth.
Limitations in Correcting Errors: While channel coding can improve the reliability of data transmission, it has limitations in correcting errors. Some channel codes may only be able to correct a certain number of errors in a block of data. If the number of errors exceeds this limit, the receiver may not be able to recover the original data.

Fundamental Questions of Channel Coding

As I said earlier, Channel Coding is adding some redundancy bits to original data and these redundancy bits will be used to detect and correct error in decoding process. There are many different Encoding/Decoding algorithms which has it's own advantage and disadvantages. So we would always have the following question when we implement Coding/Decoding Blocks.

How many bits of redundancy bits I have to add to minimize the number of redundancy bits and maximize the error correction ?
Which Encoding/Decoding algorithms would be the best for the communication system I am implementing. .

There would not be a simple/clear-cut answers to these questions.

In general, the more redundancy bits you add, you would have higher error detection/correction capability. But the more redudancy bit you add, the less throughput you would get because larger portions of transmitted bits should be allocated for the redundancy bits, not for the information that you want to send. So you have to carefully determine how many bits you would allocate for the redundancy bits.

Fortunately in most mobile communication system, somebody else has already determined the answer to these questions. For example, in LTE or WCDMA system.. 3GPP specification specifies all the details of Encoding/Decoding process and you don't have to worry about these questions.

For example, in LTE case, 3GPP 36.212 specifies the Coding algorithm and related parameters as shown below. Chapter 5 of the specification provides all the detailed parameters to implement the coding process.

From now, I will introduce some of the basic functions in Matlab/Octave related to Encoding/Decoding. My goal is to extend this section to Turbo Coding/Convolution Coding which is most widely used in mobile communication system, but I am not sure by when I can achieve this goal. Let's just start with the simple things first.

Block Code

Block Code is an encoding method in which we split the whole input data stream into small blocks and replace the small block (Message) with another small block(Codeword).

Overview

The process in generating a block code can be illustrated as shown below. Block coding operates on fixed-size blocks of data. Block Coding would be done by several steps as below.

Data Segmentation: Divide the original data into fixed-size blocks. Each block will be encoded separately.
Redundancy Addition: Add redundancy to each block of data. This can be done in various ways, such as adding parity bits, check bits, or performing mathematical operations on the data. The specific method of adding redundancy depends on the chosen block coding technique.
Error Correction Capability: Determine the error correction capability of the chosen block coding technique. This refers to the number of errors that can be corrected in each block of data. The error correction capability is typically a parameter of the block coding technique.
Encoding Algorithm: Apply the encoding algorithm of the chosen block coding technique to each block of data. The encoding algorithm generates a codeword for each block of data. The codeword includes both the original data and the added redundancy.
Codeword Formation: Combine the original data and the added redundancy to form the codeword for each block of data. The codeword is larger than the original block of data due to the added redundancy.
Repeat for Each Block: Repeat previous steps for each block of data.

In terms of designing a certain block coding algorithm, you may take all of these steps into account, but in implementation of the algorithm you may not see each of these factors as separate steps. Here goes an simple and high level example on how block coding is done.

The small block produced by Data Segmentation gets into the Encoding block. The block getting into the Encoding function is called a 'message' and the block getting out of the Encoding function is called 'Codeword'. We also express this process in a symbol (n,k). n is the number of bits for codeword(output) and k is the number of bits for message (input). n is always greater than k, meaning the length of output(Codeword) is always longer than the length of input(Message).

One example of block code is as shown below. In this example, we split the input data stream into 4 bit chunks and replace each of the chunks with 7 bit chunk. The mapping table between each possible 4 bit chunk(input = message) and 7 bit chunk(output = codeword) is shown below.

Meaning of 'Linear'

'Linear' when you say 'Linear Coding' means a set of code in which any linear combination (modular 2 sum) of two codes within the set results in a code which also belong to the original set. Let's assume that you have a set of codes as follows. Take out any two codes from the set and take modular 2 sum of them. The result is also a member of the set as shown below. Take any other two codes and try yourself.

Meaning of Cyclic

'Cyclic' is a set of code in which a code can be generated by cyclic shift of another code in the set. or in other word, any number of cyclic shift of a code in the set also fall within the set. For example, let's assume that we have a code set which is made up of following four codes. If you look into each of these codes, you would notice that any of these codes can be created by the cyclic shift of another code as illustrated below.

Take the code (0 1 0 1 0 1) and cyclic shift right by one and it gives (1 0 1 0 1 0) which is also a member of the set.

Take the code (0 1 0 1 0 1) and cyclic shift right by two and it gives (0 1 0 1 0 1) which is also a member of the set.

In short, in this set .. whatever code you take and however many times you do cyclic shift, the result is always within the original set.

Generation of Linear Block Code

Now you may ask "how can I create a mapping table as shown above ?".

The simple answer would be "You can create it in any way as you like ?".

Really, Is it true ? Can I use any arbitrary mapping table as pops up in my mind now ?

Yes, you can. but it doesn't mean that what you created in arbitrary way would be 'efficient' mapping (coding).

Then what would be the efficient coding ? There may be multiple answers to this, but some common answer would be

Capability of detecting/correcting with minimum number of redundancy bits.
Easy of implementatione

Generation By Generation Matrix

One of the way to perform coding is to use what is called 'Generator Function'. You might come across this term in any kind of materials related to 'Coding'. My first response when I first heard of this function was 'What the hell, is it ?'. Is it something like those functions that we learned in high school ? It looks similar to what we learned in high school days.. e.g, g(x) = x^4 + x^2 + 1, but the way they are used looks pretty different from high school.

My next question was "Why they call it 'Generator Function' ?".

I would not try to give you answer to these question here.. you will get it as you proceed. What I am trying to say is "Don't feel yourself as stupid if you have the same questions" -:).

Generator Function is a kind of 'Binary Function' which can perform 'coding' (Convering a message to a codeword) in following manner.

Expressing the equation in more detailed form, it can be expressed as shown below.

The best way to understand the real meaning of the equation above would be by example. Here goes one example.

Let's suppose you have four generating functions

g1(x) = 1 + 1 x + 0 x^2 + 1 x^3 + 0 x^4 + 0 x^6 + 0 x^7 = 1 + x + x^3

g2(x) = 0 + 1 x + 1 x^2 + 0 x^3 + 1 x^4 + 0 x^6 + 0 x^7 = 0 + x + x^2 + x^4

g3(x) = 1 + 1 x + 1 x^2 + 0 x^3 + 0 x^4 + 1 x^6 + 0 x^7 = 1 + x + x^2 + x^6

g4(x) = 1 + 0 x + 1 x^2 + 0 x^3 + 0 x^4 + 0 x^6 + 1 x^7 = 1 + x + x^2 + x^7

you can pack these equations into a matrix as shown below. (Note that the lowest order comes leftmost).

Let's suppose you have a message as shown below.

m = (1 1 0 1)

Now, let's figure out codeword for this message (figure out the output bit stream when this message go through the encoder block). This is the point where we use the generator function shown above. The process to calculate the codeword using the generator function is as shown below.

For practice, try all the messages shown in overview section and check if your calculation matches the codewords in the section.

Now let's take a high level view of generator function (generator matrix) and see if there is any recognizable pattern. It is as shown below.

You may notice followings right away.

i) The number of rows is same as the length of message (number of bits in message)

ii) The matrix contains the k x k Identity matrix, where k is the length of the message

Now you would ask another again. How can I get these generator functions ? (How can I know which generator functions I have to use ?). Don't worry.. in most case these functions/matrix will be given to you from the specification of the communication system. (Of course, there is theoretical background that justifies those generator functions given by the specification and you can refer to Communication Text Books or surf on internet for it. I would not get further into it). In think, just understanding the meaning of the generator functions and how to use them to implement 'encoder' block would be enough in most case.

Now let's look into the codebook we generated from higher level. As shown below, you would notice that the codebook contains the message part as it is at the end and redundant bits at the beginning. This is one of the characteristics of what we call 'linear block coding'.

Generation by Generator Polynomial

NOTE : How (1 + X + X^2)(1 + X + X^3) produces the result 1 + X^4 + X^5 ?

The result would seem odd in terms of regular polynomial calculation that we have learned in high school math. It is because the polynomial equation in this context is 'Binary' polynomials. In binary polynomial arithmetic, the coefficients of the polynomial can only be 0 or 1, and the addition operation is performed modulo 2. This means that the sum of two 1s is 0 (1 + 1 = 0), and the sum of a 1 and a 0 is 1 (1 + 0 = 1).\

Let's multiply the two given polynomials and then simplify the result using binary polynomial arithmetic:

(1 + X + X^2)(1 + X + X^3)

= 1*(1 + X + X^3) + X*(1 + X + X^3) + X^2*(1 + X + X^3)

= 1 + X + X^3 + X + X^2 + X^4 + X^2 + X^3 + X^5

Now, let's simplify the result by combining like terms and performing addition modulo 2:

= 1 + (X + X) + (X^2 + X^2) + (X^3 + X^3) + X^4 + X^5

= (1 mod 2) + (2 mod 2) X + (2 mod 2) X^2 + (2 mod 2) X^3 + (1 mod 2) X^4 + (1 mod 2) X^5

= 1 + 0 + 0 + 0 + X^4 + X^5

= 1 + X^4 + X^5

So, the result of the multiplication in binary polynomial arithmetic is:

1 + X^4 + X^5

Convolutional Code

It is hard to cleary explain what is the convolutional code is, but you would get a rough image if you think about the 'convolution' that you learned in your engineering math.

One important criteria for convolutional code would be

Each bit on output stream is influenced not only by the current input bit but also influenced by the previous output bits.

How to represent convolutional coding process ?

Representing the convolutional coding process is not simple and clearly explained in short space. So I created a separate page for this(See ConvolutionalCode page).

Python Packages

Python Package	Description
Scipy	Provides functions for encoding and decoding data using error-correcting codes such as Reed-Solomon codes and BCH codes.
Pyldpc	Provides tools for encoding and decoding data using Low-Density Parity-Check (LDPC) codes.
CommPy	Provides tools for modulation, demodulation, channel coding, and channel decoding. Supports various channel coding techniques, including convolutional codes and turbo codes.
PyCode	Provides tools for encoding and decoding data using various coding techniques, including Hamming codes, Reed-Solomon codes, and BCH codes.
Galois	Provides tools for finite field arithmetic, which is used in error-correcting codes such as Reed-Solomon codes and BCH codes.

Reference :

The Basics of Channel Coding
SIMULATION OF DIGITAL COMMUNICATION SYSTEMS USING MATLAB By Mathuranathan Viswanathan

YouTube :

Lec 2 | MIT 6.450 Principles of Digital Communications I, Fall 2006 (YouTube)