I have one excellent tip for throughput troubleshooting and throughput optimization.
The tip is: "There is NO SHORTCUT for throughput troubleshooting or throughput optimization".
I have seen a lot of cases where people just try to find a shortcut and eventually spend more time and energy.
I will give you my personal approach in this section.
First, write down all the components in the data path (really "write down", on paper or in a document on your computer). Following are a couple of examples of data path descriptions. You would have more cases of your own, and you would describe them in more detail, putting down more detailed components in the path. The more components you can write
down, the sooner you will achieve your goal.
Case 1 : UE Packet App on UE PC -(1)-> UE IP Driver -(2)-> UE PDCP -(3)-> UE RLC -(4)-> UE MAC -(5)-> UE PHY
-(6)-> RF Connection -(7)-> Equipment PHY -(8)-> Equipment MAC -(9)-> Equipment RLC
-(10)-> Equipment PDCP -(11)-> Equipment TE -(12)-> Network Interface on Server PC -(13)-> Packet App
on Server PC
Case 2 : UE Packet App on UE -(1)-> UE PDCP -(2)-> UE RLC -(3)-> UE MAC -(4)-> UE PHY
-(5)-> RF Connection -(6)-> Equipment PHY -(7)-> Equipment MAC -(8)-> Equipment RLC
-(9)-> Equipment PDCP -(10)-> Equipment TE -(11)-> Network Interface on Server PC -(12)-> Packet App
on Server PC
Case 3 : Client UE Packet App on UE -(1)-> WiFi Stack on Client UE -(2)-> WiFi Connection
-(3)-> WiFi Stack on Mobile Hotspot UE -(4)-> Hotspot UE PDCP -(5)-> Hotspot UE RLC
-(6)-> Hotspot UE MAC -(7)-> Hotspot UE PHY -(8)-> RF Connection -(9)-> Equipment PHY
-(10)-> Equipment MAC -(11)-> Equipment RLC -(12)-> Equipment PDCP
-(13)-> Equipment TE -(14)-> Network Interface on Server PC -(15)-> Packet App on Server PC
Second, ask yourself "Do I have any measures/tools to see what's happening in each and every component?" (Wireshark, a UE logging tool, and a network logging tool would be the minimum requirement).
Third, ask yourself "Do I have the knowledge and skills to analyze each and every component I wrote down at step 1?"
It is unlikely that you are the one who knows everything. At least try to get other people ready to help you analyze those data.
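The first three steps can be sketched as a simple checklist: map every component you wrote down to the tool (or person) that covers it, and flag the blind spots. The component and tool names below are illustrative placeholders, not a fixed list.

```python
# Checklist sketch: each data-path component mapped to the tool that
# can observe it. These names are made-up examples for illustration.
data_path_tools = {
    "UE Packet App":           "Wireshark on UE PC",
    "UE PDCP/RLC/MAC/PHY":     "UE logging tool",
    "Equipment PHY..PDCP":     "Network (equipment) logging tool",
    "Network Interface":       "Wireshark on Server PC",
    "Packet App on Server PC": None,   # <-- no tool identified yet
}

# Any component mapped to None is a blind spot you should close
# (or find a colleague to cover) before you start testing.
blind_spots = [c for c, tool in data_path_tools.items() if tool is None]
print(blind_spots)  # ['Packet App on Server PC']
```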
Fourth, try to identify the important parameters influencing the throughput. The more, the better. Following is an example list coming from my experience. I will split these factors into two main categories as listed below.
Factors : Data Path
One simple and obvious rule in throughput is "higher layer throughput can never be larger than lower layer throughput". Putting it another way with a specific case: IP layer throughput can never be larger than L1/PHY layer throughput. It implies that if you want to get max IP throughput, you first have to guarantee that L1/PHY is in a condition that
allows its maximum capacity with no errors. More specifically in LTE, it means the maximum possible transport block size is allocated in every subframe and there is no HARQ NACK/DTX for any transmission. In many cases, checking this condition for every subframe is very tedious and time consuming, but without this step, trying to achieve higher layer throughput is almost meaningless.
I have been asked many times to troubleshoot various throughput issues without any information on the lower layers. The first difficulty I had was making people understand why this information is important.
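A quick way to apply this rule is to compute the hard PHY-layer ceiling from the maximum DL-SCH bits per TTI (the per-category values below come from 3GPP TS 36.306) and compare every higher-layer measurement against it. A minimal sketch:

```python
# LTE delivers one set of transport blocks every 1 ms subframe, so the
# PHY ceiling is simply (max DL-SCH bits per TTI) x 1000 subframes/sec.
# Max bits per TTI per UE category, from 3GPP TS 36.306.
MAX_DLSCH_BITS_PER_TTI = {3: 102048, 4: 150752}

def phy_ceiling_mbps(category: int) -> float:
    return MAX_DLSCH_BITS_PER_TTI[category] * 1000 / 1e6

# Measured IP throughput above this value means a measurement error;
# IP throughput well below it with no HARQ NACKs points to a
# higher-layer bottleneck rather than the radio link.
print(phy_ceiling_mbps(3))   # 102.048
print(phy_ceiling_mbps(4))   # 150.752
```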
|Transport Block Size
In ideal conditions with a very good radio signal, you would get higher throughput as you increase the Transport Block Size (TBS). But if you increase TBS even when the radio condition is poor, the chance of reception failure gets high, resulting in a lot of retransmissions, which in turn leads to throughput degradation.
When you are doing a throughput test using test equipment, handling the TBS issue is pretty straightforward since you can explicitly set a certain TBS as you like for each and every subframe. However, if you are doing the throughput test in a live network, in most cases you would not have such controllability. In that case, you need very detailed logging that shows the TBS allocation for each and every subframe.
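The TBS-versus-error trade-off above can be checked directly from such a per-subframe log. A minimal sketch (the TBS values and ACK/NACK pattern below are invented sample data, not real logs):

```python
# Effective MAC-layer throughput from a per-subframe TBS log: only
# subframes that end in HARQ ACK actually deliver their transport block.
def effective_mbps(tbs_log):
    """tbs_log: list of (tbs_bits, acked) tuples, one per 1 ms subframe."""
    delivered_bits = sum(tbs for tbs, acked in tbs_log if acked)
    return delivered_bits / len(tbs_log) / 1000  # bits per ms -> Mbps

# A large TBS pushed through a poor radio link (40% BLER here) can
# deliver less than a smaller TBS that always succeeds:
aggressive = [(75376, i % 10 >= 4) for i in range(1000)]   # 40% NACK
conservative = [(51024, True) for _ in range(1000)]        # no NACK
print(effective_mbps(aggressive))    # ~45.2 Mbps
print(effective_mbps(conservative))  # ~51.0 Mbps
```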
||Code Rate may not be considered a direct factor for throughput, but in some cases it can negatively influence it. If the Code Rate gets too high, the probability of CRC errors increases, leading to retransmissions and in turn to lower throughput. Code Rate tends to start being an important factor from Category 3 (100 Mbps) throughput and to become a more common factor at higher categories.
||CFI is not a direct factor, but it can influence the Code Rate, which in turn may lead to throughput variation. See the CFI page on how CFI can influence Code Rate and throughput.
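As a rough illustration of how CFI eats into the code rate, here is a deliberately simplified estimate: it ignores reference-signal and other overheads (so real code rates come out somewhat higher), and the TBS/PRB numbers are just example values.

```python
# Rough code-rate estimate: (TBS + 24-bit CRC) / (available REs x bits
# per modulation symbol). RE count is simplified to 12 subcarriers x
# (14 - CFI) symbols per PRB pair, ignoring RS overhead, so this is
# an optimistic lower bound on the real code rate.
def approx_code_rate(tbs_bits, n_prb, cfi, mod_order):
    n_re = n_prb * 12 * (14 - cfi)
    return (tbs_bits + 24) / (n_re * mod_order)

# Same TBS, 64QAM (6 bits/symbol), 100 PRB: raising CFI from 1 to 3
# removes two OFDM symbols from PDSCH and pushes the code rate up.
print(approx_code_rate(75376, 100, 1, 6))  # ~0.81
print(approx_code_rate(75376, 100, 3, 6))  # ~0.95
```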
|BSR (Buffer Status Report)
||This is mainly for uplink throughput. If the network or test equipment is configured to schedule uplink data transmission (PUSCH) based on BSR, the network will not allocate a large TBS unless the UE sends a BSR with a high value. If you need to troubleshoot throughput issues in this case, you need to check all BSR values and see whether the UE sends proper values and the network allocates a proper TBS based
on them.
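A minimal sketch of this check, with invented (bsr_bytes, granted_bytes) pairs standing in for real log entries:

```python
# Compare what the UE reports in BSR against the UL grant it receives
# on each scheduling occasion. The pairs below are made-up sample data.
log = [(150000, 9000), (141000, 9000), (132000, 9000)]

# If the UE keeps reporting a large buffer while grants stay an order
# of magnitude smaller, the bottleneck is the scheduler (or the
# BSR-to-grant mapping), not the UE's data generation.
starved = all(bsr > 10 * grant for bsr, grant in log)
print(starved)  # True -> the scheduler is not draining the UE buffer
```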
|CQI Report Accuracy
When the UE reports a lower CQI value than it should, reception reliability may increase a little since the network will allocate a smaller TBS than the UE can handle, but you will get a little less throughput than can be achieved at maximum capacity.
When the UE reports a higher CQI value than it should, reception reliability may decrease and cause reception errors if the network allocates the max TBS for the CQI value the UE reported.
|Transmission Mode
||In ideal conditions, a transmission mode for MIMO (e.g., TM3, TM4) will lead to higher throughput than the transmission mode for SISO (TM1) or diversity (TM2).
|RLC Window Size
||Generally a larger RLC window size is helpful in communication conditions where not much RLC retransmission occurs, but it is hard to say which value would be best for a specific condition. In most cases, this does not seem to influence throughput much unless you set it too low.
|RLC Reordering Timer
||In most cases, this value doesn't influence throughput much in my experience, but there were some cases in which I had to tweak it several times to achieve ideal throughput. It is hard to say whether a large value or a low value is better; you may need to tweak it depending on the situation.
|TCP Window Size
||Generally speaking, a larger TCP window size may help achieve higher throughput, but there can be some overhead. Recently, many UE or PC TCP stacks keep changing the TCP window size dynamically based on their own internal algorithms. This is good if everything works fine, but it is very hard to troubleshoot if the dynamic TCP window size changes cause any problems.
|IP Packet Latency (RTT)
||This does not influence UDP throughput much, but it influences TCP-based throughput (e.g., FTP, HTTP) a lot. I strongly recommend you try throughput tests with different RTTs and see how your device is influenced by this factor. In my experience, I see a great deal of throughput reduction when the RTT gets over 50~60 ms.
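The bound behind this observation is that TCP can never move more than one window of data per round trip, so throughput is capped at window_size / RTT regardless of the radio link's capacity. A small sketch:

```python
# TCP throughput upper bound: one window of data per round trip.
def tcp_bound_mbps(window_bytes, rtt_ms):
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

# With the classic 64 KB (unscaled) window, 60 ms RTT caps TCP far
# below LTE rates; a scaled 1 MB window restores the headroom.
print(tcp_bound_mbps(65535, 60))     # ~8.7 Mbps
print(tcp_bound_mbps(1048576, 60))   # ~139.8 Mbps
```

This is why a UDP test at the same RTT can still run at full rate while FTP/HTTP collapses.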
|RLC SDU capacity/Layer 2 Buffer
||Up to Category 3, I didn't see any of these items play an important role in throughput, but I have seen these factors start influencing throughput performance especially from Cat 6 or higher. See the UE Category pages for the 3GPP requirements for these factors. The 3GPP requirement is a kind of recommendation for these factors; it seems some test equipment supports less capability
than the 3GPP recommendation and some equipment supports more.
|Average IP packet size
||For all test equipment (probably even in a live network), there is a limit on the number of RLC SDUs, PDCP packets, etc. that can be processed within one TTI. So if the average IP packet size being pumped from the test tool is small, the max throughput will be lower than expected even when the tool is generating enough IP packets. (Some people test throughput not only with the max IP packet size (e.g., around 1300~1500
bytes per packet) but also with various combinations of smaller packets.)
||MTU size depends on the capability of each NIC (Network Interface Card) and is also related to IP packet size. In most cases this value is set to 1200~1500, and I think the Windows default value is 1300. You would need to try several different values to find which one is best. (Refer to the Setting MTU Size section
to change the value on Windows.)
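The per-TTI packet-count limit described above translates directly into a throughput cap that depends on average packet size. A back-of-the-envelope sketch (the 30-SDUs-per-TTI figure is invented for illustration; check your equipment's actual spec):

```python
# If the stack can process at most a fixed number of SDUs per 1 ms TTI,
# achievable throughput scales linearly with average packet size.
def max_mbps(avg_packet_bytes, sdus_per_tti=30):  # 30 is a made-up limit
    return sdus_per_tti * avg_packet_bytes * 8 * 1000 / 1e6

print(max_mbps(1400))  # 336.0 Mbps with near-full-size packets
print(max_mbps(200))   # 48.0 Mbps with small packets, same packet rate
```

This is why a small-packet test can look like a radio problem when it is really a packet-processing limit.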
|Data Buffer Size in test Software
||Most IP throughput applications have one or more types of internal data buffers. Sometimes, especially in very high throughput cases, those buffer size settings are very important for achieving the targeted throughput and keeping it stable (e.g., Internal Transfer Buffer Size and socket buffer size in FileZilla).
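At the OS level, socket buffers can likewise be inspected and raised per socket. A small sketch; note that the OS may round, double, or cap the requested value (Linux, for example, caps it at a system-wide maximum), so always read the value back:

```python
# Request a larger receive buffer on a socket and verify what the OS
# actually granted. The 4 MB request is an arbitrary example value.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
actual = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(actual)  # granted size; may differ from the 4 MB requested
s.close()
```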
|USB Driver/Version
||Even at low throughput, I have seen many cases where the USB driver causes issues that result in poor throughput. For very high throughput, you have to consider the USB version as well. For example, it would be impossible to achieve Cat 6 max throughput (300 Mbps) with USB v2; you should use USB v3 to achieve this level of throughput.
|Ethernet Cable/Switch
||Most of the Ethernet cables and switches you have been using support 10/100 BASE by default, so you would not have many issues with them up to 100 Mbps throughput. But if you want to achieve throughput much higher than 100 Mbps (e.g., Cat 4 max throughput, 150 Mbps), you have to make sure that the cable is a CAT 6 cable (supporting Gigabit Ethernet) and that all the ports on the network switch also support Gigabit Ethernet.
|Mobile Hot Spot Efficiency
Factors : Software Tools/PC
|Linux vs Windows
||If you are OK with around 90% of the ideal throughput at the IP layer (e.g., around 90 Mbps under Cat 3 max throughput conditions), there may not be any issue whether you use Linux or Windows. But if you want to get very close to the ideal max, I would recommend a Linux PC.
|CPU utilization ratio
||LTE-level throughput (e.g., 100 Mbps and over) is a pretty tough task not only for the IP stack, but also for the IP application software and the CPU. So, instead of directly jumping into a max throughput test, increase the throughput step by step and check the CPU utilization (e.g., you can monitor it using Windows Task Manager).
||You can find many different versions of iperf, and in my experience different versions show pretty different CPU utilization. This would not cause serious problems when you test downlink-only or uplink-only throughput, but it can cause critical issues when you try bidirectional throughput.
|Active vs Passive mode in FTP
||This would not make any difference in throughput, but there may be some cases in which the data does not go through at all in active mode. In this case, try passive mode.
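With Python's ftplib, for instance, the mode can be toggled per connection (passive is the default in recent versions):

```python
# Toggling FTP active/passive mode in Python's standard ftplib.
import ftplib

ftp = ftplib.FTP()       # not connected yet; just configuring the object
ftp.set_pasv(False)      # active mode: server opens the data connection
                         # back to the client (often blocked by NAT/firewalls)
ftp.set_pasv(True)       # passive mode: client opens the data connection
print(ftp.passiveserver) # True
```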
Lastly, do as much testing and analysis as possible before the problem is found by somebody else. Normally, when a problem happens, almost everybody (including me) wants to get it solved right away. But solving a throughput-related problem right away is a matter of luck, not of engineering/science. I don't like any situation that depends only on luck. The best way is
to analyze the device in as much detail as possible and see how each of the factors listed above influences the throughput of the device. Each factor influences different device models/software in different ways. This is the only way to find the solution soonest when a problem happens in the field.
 LTE Throughput Optimization: Part 1 – PDCCH Capacity Enhancement
 LTE Throughput Optimization: Part 2 – Spectral Efficiency