5G/NR - AI/ML  




AI/ML - PHY - CSI Report

I think one of the most widely adopted applications of AI/ML in the physical layer is the CSI report process. In this note, I am trying to consolidate various concepts and ideas from different sources.

Comparison of Non-AI and AI-enabled CSI Feedback Frameworks


Feedback Framework




Codebook-based CSI Feedback


The UE searches for the nearest codeword in a predefined codebook that is shared by the UE and the BS. The UE then feeds back the index of the selected codeword to the BS, which obtains the corresponding codeword by looking it up in the codebook.

To get accurate feedback, the codebook should be large, but the search complexity increases with the codebook size (and the index needs more bits).
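To make the codebook search concrete, here is a minimal Python sketch. It assumes a hypothetical oversampled DFT-beam codebook for a single-layer precoder (not the actual 3GPP Type I/Type II codebooks) and treats the "nearest" codeword as the one with the largest |c^H h|, which for unit-norm codewords is the same as the smallest chordal distance to the channel direction. The names `dft_codebook` and `select_codeword` are illustrative only.

```python
import cmath
import math

def dft_codebook(n_tx, oversampling=1):
    """Hypothetical single-layer DFT-beam codebook: n_tx * oversampling unit-norm steering vectors."""
    n_beams = n_tx * oversampling
    return [[cmath.exp(2j * math.pi * k * b / n_beams) / math.sqrt(n_tx) for k in range(n_tx)]
            for b in range(n_beams)]

def select_codeword(h, codebook):
    """UE side: index of the codeword with the largest |c^H h| (beamforming gain)."""
    def gain(c):
        return abs(sum(ci.conjugate() * hi for ci, hi in zip(c, h)))
    return max(range(len(codebook)), key=lambda i: gain(codebook[i]))

# Hypothetical channel: a single path aligned with beam 5 of a 4x oversampled, 8-antenna codebook
n_tx, over = 8, 4
codebook = dft_codebook(n_tx, over)
h = [cmath.exp(2j * math.pi * k * 5 / (n_tx * over)) for k in range(n_tx)]

index = select_codeword(h, codebook)                  # the UE feeds back only this index
feedback_bits = math.ceil(math.log2(len(codebook)))   # bits needed to send the index
w = codebook[index]                                   # the BS looks the codeword up
print(index, feedback_bits)
```

Only `index` (5 bits for 32 codewords) crosses the air interface. The search loops over every codeword, which is why a larger codebook improves accuracy at the cost of more computation and more index bits.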

CS-based CSI Feedback


This type of feedback relies on the assumption that the CSI is sparse in a certain domain. The downlink CSI is compressed with a sensing matrix at the UE and then reconstructed at the BS by a CS recovery algorithm.

The main concerns are the sparsity assumption and the complexity of the reconstruction algorithms.

AI-enabled Implicit and Explicit CSI Feedback


The UE first estimates the entire channel matrix from the received reference signal. It then extracts a codebook index, a precoding matrix (i.e., the V part of the SVD), or a compressed channel matrix.

This information is sent to the gNB, which recovers the CSI using an AI model.

The main difficulties are fair performance evaluation, managing computational complexity, designing protocols for UE-BS collaboration, ensuring model generalization, managing information shared between companies, integrating with channel prediction, combining with enhanced reciprocity-based feedback, and adapting to new use cases and variable computational power.


NOTE : The term 'sparsity' in this context means that the CSI can be represented in a sparse or compressed form in a certain domain, such as the angular or delay domain. This sparsity allows for more efficient transmission of CSI from the user equipment (UE) to the base station (BS), as the sparse representation can be transmitted with less data than the full CSI.

As an example, let's consider a wireless communication system where a base station (BS) communicates with multiple user equipment (UE) devices in a multipath environment. In such a scenario, the Channel State Information (CSI) can be represented as a matrix in the angular-delay domain, where each entry represents the channel gain associated with a particular angle of departure at the BS and a particular propagation delay, i.e., with the propagation paths that fall into that angle/delay bin.

In a typical urban environment, the signal reaches a UE from the BS over only a limited number of paths (due to buildings, trees, etc.). This means that most angle/delay bins contain no significant path, so the corresponding entries in the CSI matrix are zero or close to zero. The result is a sparse matrix, i.e., a matrix with mostly (near-)zero entries.

This is the sparsity in the context of CSI feedback. By exploiting this sparsity, advanced signal processing techniques such as compressive sensing can be used to compress the CSI matrix for feedback from the UE to the BS, reducing the amount of data that needs to be transmitted and therefore saving bandwidth.
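A small numerical illustration of angular-domain sparsity, as a hedged Python sketch: a hypothetical 16-antenna channel made of two paths whose directions sit exactly on DFT bins. In the antenna domain every entry is non-zero, but after a DFT across the antennas only two entries remain. Real path angles rarely fall exactly on the grid, so in practice the energy leaks into neighboring bins and the representation is only approximately sparse.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) DFT: X[m] = sum_k x[k] * exp(-j 2 pi k m / N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n)) for m in range(n)]

n_tx = 16
# Hypothetical two-path channel; each path's spatial frequency sits exactly on a DFT bin
paths = [(0.9, 3), (0.4 * cmath.exp(1j * 0.7), 11)]   # (complex gain, angular bin)
h = [sum(g * cmath.exp(2j * math.pi * k * b / n_tx) for g, b in paths) for k in range(n_tx)]

h_angle = dft(h)                                       # antenna domain -> angular domain
significant = [m for m, v in enumerate(h_angle) if abs(v) > 1e-6]
print(significant)   # prints [3, 11]
```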

Use Case and Models

There can be various types of AI/ML models applicable to this application, and I will try to collect these models from various sources.

Report Enhancement

Let me start with the models described in the paper AI for CSI Feedback Enhancement in 5G-Advanced. The frameworks proposed in this paper are well summarized by the following figure.


Source : AI for CSI Feedback Enhancement in 5G-Advanced


These models can be summarized and compared as follows.

AI-enabled One-sided Refinement for Implicit CSI Feedback

  • Procedure:
    • The UE constructs the channel matrix across the whole CSI reference signal.
    • The UE performs Singular Value Decomposition (SVD) on the channel matrix, which produces a precoding matrix.
    • The UE then uses a shared codebook to feed back the precoding matrix to the Base Station (BS).
    • The BS receives the feedback and identifies the index of the selected precoding codeword.
    • Using the shared codebook, the BS selects the codeword that matches the index received from the UE.
    • A pretrained Neural Network (NN) at the BS then refines the obtained codeword to enhance the CSI feedback.
  • Changes to existing feedback strategy: No. This framework does not change the existing feedback framework and is easy to deploy.
  • Deployment stage: Can be embedded into the existing BS without standardization.

Autoencoder-based Two-sided Enhancement for Implicit CSI Feedback

  • Procedure:
    • The UE constructs the channel matrix across the whole CSI reference signal.
    • The UE performs SVD on the channel matrix to obtain the precoding matrix.
    • An NN-based encoder at the UE then compresses and quantizes this precoding matrix.
    • The compressed and quantized precoding matrix is converted into a feedback bitstream.
    • This feedback bitstream is sent from the UE to the BS.
    • At the BS, an NN-based decoder reconstructs the original precoding matrix from the received feedback bitstream.
  • Changes to existing feedback strategy: Yes. The NN-based encoder and decoder replace the original codebook-based encoding and decoding.
  • Deployment stage: Expected to be introduced in 5G-Advanced.

Autoencoder-based Two-sided Enhancement for Explicit CSI Feedback

  • Procedure:
    • The UE constructs the channel matrix across the whole CSI reference signal.
    • The UE uses an NN-based encoder to convert the channel matrix into a compressed bitstream.
    • This compressed bitstream, which represents the entire downlink channel matrix, is sent from the UE to the BS.
    • Upon receiving the bitstream, the BS uses an NN-based decoder to reconstruct the original channel matrix.
    • The BS now has the CSI recovered from the received bitstream, which it can use for further processing and decision-making.
  • Changes to existing feedback strategy: Yes. This completely changes the CSI feedback and utilization strategy.
  • Deployment stage: Expected to be deployed in 6G and beyond.
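The SVD step in the implicit frameworks above can be sketched for the rank-1 case. Below is a minimal Python sketch, assuming a single-layer precoder: it uses power iteration on H^H H to obtain the dominant right singular vector v1 (the first column of V in H = U S V^H), which is the precoding vector the UE would quantize or encode. Multi-layer precoders need more columns of V, and a real UE would use a proper SVD routine; the helper names here are hypothetical.

```python
import math

def matvec(a, x):
    """Matrix (list of rows) times vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def conj_transpose(a):
    """Hermitian transpose H^H of a list-of-rows matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def dominant_right_singular_vector(h, iters=100):
    """Power iteration on H^H H -> v1 (first column of V in H = U S V^H) and sigma1."""
    h_herm = conj_transpose(h)
    v = [complex(k + 1) for k in range(len(h[0]))]    # arbitrary non-degenerate start
    for _ in range(iters):
        w = matvec(h_herm, matvec(h, v))              # w = H^H H v
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    sigma1 = math.sqrt(sum(abs(x) ** 2 for x in matvec(h, v)))   # ||H v1|| = largest singular value
    return v, sigma1

# Hypothetical rank-1 channel: 2 receive x 4 transmit antennas, H = u r
u = [1, 2]
r = [1, 1j, -1, -1j]
h = [[complex(ui * rj) for rj in r] for ui in u]

v1, sigma1 = dominant_right_singular_vector(h)
print(round(sigma1, 4))   # equals sqrt(5) * 2 for this channel
```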

NOTE : What do 'Implicit' and 'Explicit' mean?

  • Implicit CSI Feedback: In this case, the UE does not send the full CSI back to the BS. Instead, it sends a more compact representation, such as a precoding matrix or a codeword index from a predefined codebook. The BS then uses this information to infer the CSI. This method reduces the amount of data that needs to be sent back to the BS, saving bandwidth. However, it may not be as accurate as explicit feedback, especially in rapidly changing or complex environments.
  • Explicit CSI Feedback: In this case, the UE sends the full CSI back to the BS. This method can provide more accurate and detailed information about the channel to the BS, which can be beneficial for optimizing communication. However, it requires more bandwidth to send the full CSI, and it may also require more complex processing at the UE and the BS.
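To put rough numbers on the difference, the arithmetic below compares payload sizes for a hypothetical configuration (4 receive antennas, 32 transmit antennas, 13 subbands, 16 bits per real value for explicit feedback, and a 256-entry codebook per subband for implicit feedback). These parameters are assumptions for illustration and do not reproduce any 3GPP payload calculation.

```python
import math

def explicit_bits(n_rx, n_tx, n_subbands, bits_per_real):
    """Full channel: one complex entry per (rx, tx, subband), sent as real + imaginary parts."""
    return n_rx * n_tx * n_subbands * 2 * bits_per_real

def implicit_bits(n_subbands, codebook_size):
    """Codeword index per subband: ceil(log2(codebook size)) bits each."""
    return n_subbands * math.ceil(math.log2(codebook_size))

# Hypothetical configuration, not a 3GPP payload calculation
print(explicit_bits(n_rx=4, n_tx=32, n_subbands=13, bits_per_real=16))   # 53248
print(implicit_bits(n_subbands=13, codebook_size=256))                    # 104
```

Even with these made-up numbers, uncompressed explicit feedback is 512 times larger, which is why explicit feedback is realistic only with strong compression such as the autoencoder approaches above.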

I found another well-described method in the MathWorks document CSI Feedback with Autoencoders. I think the best part of this document is that it shows the detailed procedure of each step along the entire process.

The overall process is illustrated as follows : As you see here, the channel coefficients for every resource element of every antenna are preprocessed (data reduction), encoded (compressed), sent to the receiver, and recovered into channel coefficients for every subcarrier and antenna.

A high-level description of this process is as follows :

  • Preprocess:
    • The input data, which has dimensions [N_sc, N_sym, N_rx, N_tx, N_slot], represents the CSI matrix, where N_sc is the number of subcarriers, N_sym is the number of symbols, N_rx is the number of receiver antennas, N_tx is the number of transmitter antennas, and N_slot is the number of time slots.
    • The preprocessing stage reduces the size of the CSI data before encoding; as detailed below, it averages over symbols, transforms the data into the angle-delay domain, and truncates the delay dimension. The heatmap suggests a distribution of signal characteristics across subcarriers and transmitter antennas before this preprocessing.
  • Encoder:
    • This stage compresses the preprocessed CSI data into a lower-dimensional space. The encoder is part of the autoencoder architecture and is designed to capture the most important features of the input data.
    • The output of the encoder is a compact representation, often referred to as a "code" or "latent space representation". The numbers shown in the brackets ([0.0287, 0.0446, ...]) represent this compressed feature vector, which significantly reduces the dimensionality from the original input (a few numbers compared to potentially thousands in the input data).
  • Decoder:
    • The decoder is the second part of the autoencoder that attempts to reconstruct the original data from the compressed form created by the encoder. The objective is to have the output of the decoder match the original input data as closely as possible.
    • This step is critical for understanding how well the autoencoder can compress and reconstruct the CSI information, which is essential for reducing the overhead in feedback channels.
  • Postprocess:
    • After the decoder, there might be some postprocessing operations. These could include reshaping the data back to its original dimensions, scaling it back to its original range if normalization was applied, or applying some form of error correction.
    • The final heatmap on the right of the diagram looks similar to the initial input heatmap, which implies that the autoencoder is able to reconstruct the CSI data effectively from the compressed representation.
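The figure does not show how the latent vector becomes feedback bits. One simple possibility, sketched below in Python, is a uniform scalar quantizer. The range [-1, 1], the 4-bit resolution, and the helper names are assumptions; the first two latent values are the ones quoted above and the last two are made up. The decoder maps each index back to the center of its bin, so the error per value is at most half a quantization step.

```python
def quantize_to_bits(latent, n_bits, lo=-1.0, hi=1.0):
    """Uniform scalar quantizer: each latent value -> n_bits-bit index, concatenated into a bit string."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    bits = []
    for x in latent:
        x = min(max(x, lo), hi)                       # clip to the quantizer range
        idx = min(int((x - lo) / step), levels - 1)   # bin index (top edge folded into the last bin)
        bits.append(format(idx, f"0{n_bits}b"))
    return "".join(bits)

def dequantize_from_bits(bitstream, n_bits, lo=-1.0, hi=1.0):
    """Decoder side: map each n_bits chunk back to the center of its quantization bin."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    return [lo + (int(bitstream[i:i + n_bits], 2) + 0.5) * step
            for i in range(0, len(bitstream), n_bits)]

latent = [0.0287, 0.0446, -0.31, 0.72]   # first two values quoted above, last two made up
bits = quantize_to_bits(latent, n_bits=4)
recovered = dequantize_from_bits(bits, n_bits=4)
print(len(bits))   # 16 = 4 values x 4 bits
```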

Within the entire process, the preprocessing step can be broken down into the following steps. I think a strong point of the MATLAB document is that it explains this preprocessing step, whereas most academic papers explain only the core part (the encoder/decoder).

High level description for each step is as follows :

  • Average over Symbols:
    • The CSI is initially represented as a multidimensional matrix with dimensions corresponding to the number of subcarriers, symbols, receiver antennas, transmitter antennas, and time slots.
    • The first step is to average the signal over multiple symbols to stabilize the CSI estimate by reducing noise and transient effects. This averaging process helps in obtaining a more reliable representation of the channel characteristics.
  • 2D - DFT over SC-Tx Antenna:
    • A two-dimensional Discrete Fourier Transform (2D-DFT) is applied over the subcarriers and transmitter antennas.
    • The DFT over the transmitter antennas reveals the spatial frequency components that correspond to different angles of departure (AoD) of the transmitted signal. Each DFT bin corresponds to a DFT beam (steering vector), so this step is essentially a projection of the channel onto a grid of beamforming directions.
    • The DFT over the subcarriers converts the frequency response into the delay domain, where each sample corresponds to a multipath delay.
    • Note that after this step the two axes become angle and delay samples.
  • Truncate Delay (Ndelay):
    • After the 2D-DFT, the signal in the delay domain is truncated to retain only the first Ndelay samples.
    • This truncation is a form of dimensionality reduction. It works because the channel's delay spread is typically short compared with the OFDM symbol duration, so most of the energy of the channel's impulse response is concentrated in the first few delay samples.
  • 2D - IDFT over SC-Tx Antenna:
    • A two-dimensional Inverse Discrete Fourier Transform (2D-IDFT) is then applied to the truncated signal.
    • This step is the reverse of the 2D-DFT and transforms the signal back into the spatial and subcarrier domains. It's essentially reconstructing the signal from its spatial frequency components.
  • Complex to Real-Imaginary:
    • The complex-valued matrix resulting from the 2D-IDFT is then separated into its real and imaginary parts.
    • This separation is necessary because typical neural network layers operate on real-valued tensors, so the real and imaginary parts are fed to the network as separate real-valued inputs (e.g., two channels).
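The five preprocessing steps can be chained in a short sketch. The Python below is a minimal, assumption-laden version: it drops the receive-antenna and slot axes, uses a plain O(N^2) DFT with the -j sign for the forward transform, and does not try to match MathWorks's exact conventions or normalization. The example channel is hypothetical, built from a single path at delay bin 1 and angle bin 2 so that nothing is lost by truncation.

```python
import cmath
import math

def dft(x, inverse=False):
    """O(N^2) 1-D DFT (sign -j); the inverse uses sign +j and a 1/N scale."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / n if inverse else 1
    return [scale * sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def dft2(grid, inverse=False):
    """2-D DFT of a list of rows: transform along each row, then along each column."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def preprocess(h, n_delay):
    """h[sc][sym][tx] (complex) -> [n_delay][tx][2] real values. Rx and slot axes omitted."""
    n_sc, n_sym, n_tx = len(h), len(h[0]), len(h[0][0])
    # 1) Average over symbols -> h_avg[sc][tx]
    h_avg = [[sum(h[sc][s][tx] for s in range(n_sym)) / n_sym for tx in range(n_tx)]
             for sc in range(n_sc)]
    # 2) 2-D DFT over (subcarrier, tx antenna) -> grid indexed [delay][angle]
    h_delay_angle = dft2(h_avg)
    # 3) Keep only the first n_delay delay samples
    h_trunc = h_delay_angle[:n_delay]
    # 4) 2-D IDFT over the truncated grid
    h_back = dft2(h_trunc, inverse=True)
    # 5) Split each complex value into [real, imag] for a real-valued network input
    return [[[v.real, v.imag] for v in row] for row in h_back]

# Hypothetical example: 8 subcarriers, 2 symbols, 4 tx antennas, one path at delay 1 / angle 2
n_sc, n_sym, n_tx, n_delay = 8, 2, 4, 3
grid = [[1.0 if (d, a) == (1, 2) else 0.0 for a in range(n_tx)] for d in range(n_sc)]
h_freq = dft2(grid, inverse=True)          # build the channel from its delay-angle image
h = [[[h_freq[sc][tx] for tx in range(n_tx)] for _ in range(n_sym)] for sc in range(n_sc)]

features = preprocess(h, n_delay)
print(len(features), len(features[0]), len(features[0][0]))   # 3 4 2
```

Because the single path lies inside the kept delay range, every output value in this example has magnitude 1/12 (the 1/(3 x 4) scale of the truncated IDFT).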

As with any type of machine learning algorithm, it is crucial and challenging to figure out how to train the model. It is even more challenging for an autoencoder model, because one model is split across different locations: half of the model (the encoder) is located in the UE and the other half (the decoder) is located in the RAN. The challenges and possible workarounds are well summarized in An Overview of the 3GPP Study on Artificial Intelligence for 5G New Radio, as illustrated below.

Image Source : An Overview of the 3GPP Study on Artificial Intelligence for 5G New Radio

Here is an overall description of the training types shown in this illustration.

  • Type 1: Joint training at one side - Both parts of the two-sided model (the UE-side encoder and the network-side decoder) are trained together at a single entity, either the network side or the UE side, and the part needed by the other side then has to be made available to it.
  • Type 2: Joint training at two sides - Training involves both the network and UE sides, and the two parts of the model are trained collaboratively. In the CSI compression case, where the encoder is at the UE and the decoder at the network, the UE-side model computes its part of the forward pass and sends the resulting forward activation (its latent output) to the network-side model. The network-side model completes the forward pass, computes the loss, and sends back the backward gradient with respect to that activation, which the UE uses to update its own weights. 'Forward activation' and 'backward gradient' refer to the two halves of the backpropagation algorithm used in training neural networks. In joint training, these processes are distributed between the network and the UE, allowing collaborative training that uses the computational resources and data available on both sides. Here's a simplified breakdown:
    • Forward Activation: This is the process where input data is passed through the network layer by layer until the output layer is reached. At each layer, the input is transformed using weights and a non-linear activation function. The final output is the 'activation' that is then used to make predictions.
    • Backward Gradient: After the forward pass, the output is compared to the desired outcome using a loss function, and the error is calculated. During the backward pass, this error is propagated back through the network, which involves computing the gradient of the loss function with respect to the weights of the network. This gradient is used to update the weights in the network to minimize the loss, hence the term 'backward gradient'.
  • Type 3: Separate training at two sides - Both network and UE sides train their respective models separately but share a training dataset. The nature of the training dataset exchanged is usually in the form of feature sets or labeled data samples that are relevant to both the network-side and user equipment (UE)-side models. The exchange of such datasets aims to ensure that both models, though trained separately, benefit from a harmonized understanding of the environment they operate in, leading to improved overall performance when deployed in a real-world setting. The dataset would typically consist of:
    • Data samples that have been preprocessed and labeled from both network and UE perspectives.
    • Information that is relevant for both models to learn from, which might include signal characteristics, network conditions, user behavior, or other context-specific information that can improve the model's performance on both ends.
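Type 2 training can be illustrated with a toy split model. The sketch below assumes a hypothetical scalar linear encoder at the UE and a scalar linear decoder at the network, trained to reproduce its input with a squared-error loss. Following the CSI compression setup, where the encoder sits at the UE, the UE sends its forward activation to the network, and the network sends back the gradient of the loss with respect to that activation. Neither side ever sees the other's weights.

```python
# Hypothetical toy two-sided model: scalar linear encoder (UE) and scalar linear decoder (network)
class UESide:
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x
        return self.w * x                     # forward activation sent to the network

    def backward(self, grad_z, lr):
        self.w -= lr * grad_z * self.x        # dL/dw_enc = dL/dz * x


class NetworkSide:
    def __init__(self, w):
        self.w = w

    def step(self, z, target, lr):
        y = self.w * z                        # decoder output (reconstruction)
        grad_y = 2 * (y - target)             # dL/dy for L = (y - target)^2
        grad_z = grad_y * self.w              # gradient w.r.t. the received activation (old weight)
        self.w -= lr * grad_y * z             # update the decoder weight
        return (y - target) ** 2, grad_z      # grad_z is sent back to the UE


ue, nw = UESide(0.3), NetworkSide(0.4)
samples = [0.5, -1.0, 2.0, 1.5]               # made-up "CSI" scalars; the target is the input itself
for _ in range(500):
    for x in samples:
        z = ue.forward(x)                     # UE -> network: forward activation
        loss, grad_z = nw.step(z, x, lr=0.05) # network computes loss and gradient
        ue.backward(grad_z, lr=0.05)          # network -> UE: backward gradient
print(round(ue.w * nw.w, 4))                  # end-to-end gain approaches 1.0
```

Real two-sided CSI models are deep networks and the exchange happens per batch, but the message pattern (activation one way, gradient the other) is the same.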

Report Prediction

Getting an accurate CSI report at the proper time is crucial for the physical layer operation of most wireless systems. This is especially true for a system like 5G/NR, which relies heavily on MIMO technology. However, keeping the CSI up to date at all times would require very frequent CSI reports, and frequent CSI reporting causes huge overhead. For example, based on my observation, the most common CSI report intervals that I see in live 5G/NR networks seem to be 40 ms, 80 ms, and 160 ms. However, a lot can happen in the radio channel over such a period (e.g., 40 ms), so the CSI report that the gNB just received may not be accurate enough to use now or in the near future, as illustrated below.

In other words, the motivation for using AI/ML algorithms for Channel State Information (CSI) prediction is to address channel aging, i.e., the mismatch caused by the delay between when CSI is reported and when it is used by the gNB (gNodeB). This delay makes the reported CSI outdated, particularly at higher UE (User Equipment) speeds. This is a significant issue in MU-MIMO (multi-user multiple-input multiple-output) scenarios, especially in massive MIMO deployments, where performance degrades when UEs move at medium to high speeds. AI/ML-based CSI prediction aims to mitigate the effects of outdated CSI by forecasting future CSI, enabling more accurate and timely adjustments and enhancing overall communication performance.


Image Source : Predicting Future CSI Feedback For Highly-Mobile Massive MIMO Systems


What would be the solution for keeping the CSI accurate over the whole period before the next report? The answer is to 'properly' estimate the CSI for the period between the latest CSI report and the next one.

This is easy to say, but not easy at all to do. Various algorithms for predicting future values from past data existed even before AI/ML came along, so it is natural to think of applying AI/ML to this application.
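Before any AI/ML, channel aging and a classical predictor can be compared on a toy example. The Python sketch below assumes a hypothetical single-path channel whose phase rotates by a fixed angle omega per report interval (a pure Doppler shift) and compares sample-and-hold (reuse the last report) with simple linear extrapolation. This is a non-AI baseline, not the method of any paper cited here.

```python
import cmath
import math

def sample_and_hold(history):
    """Baseline: reuse the last reported CSI value."""
    return history[-1]

def linear_extrapolation(history):
    """Classical (non-AI) predictor: extend the last step, h[n+1] ~ 2 h[n] - h[n-1]."""
    return 2 * history[-1] - history[-2]

# Hypothetical single-path channel whose phase rotates by omega per report interval
omega = 0.2
h = [cmath.exp(1j * omega * n) for n in range(10)]
history, actual = h[:-1], h[-1]

err_hold = abs(sample_and_hold(history) - actual)
err_lin = abs(linear_extrapolation(history) - actual)
print(round(err_hold, 4), round(err_lin, 4))
```

For this channel the sample-and-hold error is |e^{j omega} - 1| = 2 sin(omega/2), about 0.20, while the linear-extrapolation error is its square, about 0.04. Once omega exceeds pi/3 (faster movement relative to the report interval), the square exceeds the original error and extrapolation becomes worse than holding; real multipath channels are harder still, which is where learned predictors come in.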

In conclusion, the dynamic nature of the wireless environment and the extensive use of MIMO in 5G/NR make it hard to obtain accurate and timely CSI, and the high overhead of frequent CSI reporting only amplifies the difficulty. AI/ML algorithms are promising tools that can "fill in the gaps" by predicting CSI between report intervals, thereby increasing the effective accuracy of the reported CSI. Implementing such algorithms is not trivial, but their potential to improve the efficiency and effectiveness of CSI reporting makes them an avenue worth exploring, and I think the integration of AI/ML into this part of wireless communication systems will be a key focus going forward.

What type of Deep Learning Model can be used for this type of application ?

Since the main focus of this application is to predict a value in the time domain, typical sequence-prediction models like RNN, LSTM, and GRU are the candidates that pop up in your head right away. However, some research has shown that a more conventional model such as a CNN also works for this type of application.

The following is an example of applying a CNN to CSI prediction. The framework proposed in this paper uses a 3-D convolutional neural network to capture the temporal, spatial, and frequency correlations of downlink channel samples. The proposed model significantly improves performance compared to the sample-and-hold approach and mitigates the impact of the dynamic communication environment.


Image Source : Predicting Future CSI Feedback For Highly-Mobile Massive MIMO Systems


An overall description of the CNN model proposed in this paper is as follows :

  • Input: The input to the model is the past L channel observations.
  • Conv Block 1 and Conv Block 2: These are convolutional blocks that include extra normalization layers and activation layers. They build on top of the 3-D convolutional layer.
  • Conv Res Block: This is a convolutional residual block built on top of the convolutional block. It adopts a residual architecture, which is mainly used for extracting deeper features that are hardly found by shallow networks while keeping the training experience efficient.
  • MaxPool 1 and MaxPool 2: These are max-pooling layers used to reduce the dimensionality of the input, which helps to reduce the number of parameters in the last fully connected layer.
  • FC Block: This is a fully-connected block employed to reshape the output to have the desired dimension.
  • Prediction: The output of the model is the predicted future channel state information.

NOTE : There is a stage where the output of MaxPool1 and the output of the Conv Res Block are combined. What is the purpose of it?

    The combination of the output of MaxPool1 and the Conv Res Block is a key part of the proposed deep learning model's architecture. This combination is a feature of the residual architecture used in the model.

    The main purpose of introducing the second max-pooling layer (MaxPool2) is to reduce the number of parameters in the last fully connected layer (FC Block), because Nt and K are normally relatively large.

    • Nt refers to the number of antennas at the base station. It's a parameter that represents the transmit dimension in the spatial domain of the system.
    • K refers to the number of resource blocks. It's a parameter that represents the frequency dimension of the system.

    The residual architecture is used to extract deeper features that are hardly found by shallow networks while keeping the training experience efficient. The output of the MaxPool1 and the Conv Res Block are combined and passed through another Conv Block (Conv Block 2) and then through MaxPool2. This process allows the model to capture more complex and abstract features from the input data, which can improve the accuracy of the model's predictions.

    In short, this combination allows the network to learn from both the original features and the transformed features, which can help to improve the model's performance.
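The two ideas in this note, the skip connection and the parameter saving from pooling, can be sketched in Python. The sketch assumes the combination is an element-wise addition (the standard residual form; the paper could also use another merge such as concatenation) and uses hypothetical sizes C = 8 channels, Nt = 32, and K = 52 with a 2x2 pooling over the (Nt, K) axes. None of these numbers come from the paper.

```python
def residual_add(a, b):
    """Skip connection: element-wise sum of two equally shaped feature maps (nested lists)."""
    if isinstance(a, list):
        return [residual_add(x, y) for x, y in zip(a, b)]
    return a + b

def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Hypothetical sizes: C channels, Nt antennas, K resource blocks; output = one complex CSI snapshot
C, Nt, K = 8, 32, 52
out = Nt * K * 2                                          # real and imaginary parts
no_pool = fc_params(C * Nt * K, out)
with_pool = fc_params(C * (Nt // 2) * (K // 2), out)      # one extra 2x2 pooling over (Nt, K)
print(no_pool, with_pool)
```

With these assumed sizes the extra pooling step cuts the FC weights by about a factor of 4 (from 44,305,664 to 11,078,912 parameters).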