A floating-point number is a way to represent real numbers in computers, particularly very large or very small ones. They are called "floating-point" because the decimal (or binary) point can "float" to different positions within the number, allowing a wide range of values to be represented with a fixed number of digits.

How it works:
A floating-point number consists of three parts:
1. Sign: a single bit indicating whether the number is positive or negative.
2. Exponent: determines the magnitude of the number, i.e., where the point "floats".
3. Mantissa (significand): holds the significant digits of the number.

Why use floating-point numbers?
They let a fixed number of bits cover an enormous range of magnitudes, from tiny fractions to very large values, which a fixed-point representation cannot do efficiently.
Limitations:
- Approximation: most real numbers cannot be represented exactly as floating-point numbers, leading to rounding errors. Computers have finite memory, so floating-point numbers are essentially approximations of real numbers.
- Uneven distribution: the spacing between representable floating-point numbers is not uniform, so accuracy varies with the magnitude of the number.

Important Note: when working with floating-point numbers in calculations, be aware of their limitations and potential for rounding errors.

Floating-Point Representation
In IEEE 754 single-precision floating-point representation, a number is represented as:

(-1)^s × 1.m × 2^e

where s is the sign bit, m is the 23-bit mantissa (fraction) field, and e is the unbiased exponent.
How to Calculate the Exponent?
The exponent in an IEEE 754 single-precision floating-point number is not stored directly. Instead, it is represented in a biased format: a fixed value, called the bias, is added to the actual exponent before storing it. For single precision, the bias is 127. To determine the actual exponent, extract the 8-bit exponent field from the 32-bit floating-point representation, convert this binary value to decimal, and subtract the bias (127) to obtain the true exponent. This shifts the range of representable exponents so that both positive and negative exponents can be encoded without an explicit sign bit for the exponent itself. Remember that special cases exist for representing zero, infinity, and NaN (Not a Number), which use reserved exponent field values. Understanding the bias and its role in exponent representation is crucial for interpreting and manipulating floating-point numbers accurately.

Steps to Calculate the Exponent
1. Extract the 8-bit exponent field (bits 30-23) from the 32-bit representation.
2. Convert the binary field to decimal.
3. Subtract the bias (127) to obtain the actual exponent.
Example Calculation
Let's take an example to illustrate the calculation.

Example Number: 12.375
In binary, 12.375 = 1100.011. Normalizing gives 1.100011 × 2^3, so the actual exponent is 3. The stored exponent field is 3 + 127 = 130, which is 10000010 in binary.
IEEE 754 Representation of 12.375
Putting it all together, the IEEE 754 single-precision representation of 12.375 is:

0 10000010 10001100000000000000000

Explanation of Each Component
- Sign (s = 0): the number is positive.
- Exponent (10000010 = 130): the actual exponent is 130 - 127 = 3.
- Mantissa (10001100000000000000000): the fraction bits of 1.100011, with the leading 1 left implicit.
Steps in Code (Python Example)
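The original listing is not reproduced here; the following is a minimal sketch of the same steps using Python's standard struct module (the function name float32_exponent is illustrative):

    import struct

    def float32_exponent(x):
        """Return the unbiased exponent of x as an IEEE 754 single-precision float."""
        # Reinterpret the 32-bit float as an unsigned integer to access its bits.
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        exponent_field = (bits >> 23) & 0xFF   # bits 30..23: the biased exponent field
        return exponent_field - 127            # subtract the bias to get the true exponent

    value = 12.375
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    print("Exponent field (binary):", format((bits >> 23) & 0xFF, '08b'))
    print("Exponent field (decimal):", (bits >> 23) & 0xFF)
    print("Actual exponent:", float32_exponent(value))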
Output
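Running the sketch above prints:

    Exponent field (binary): 10000010
    Exponent field (decimal): 130
    Actual exponent: 3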
Summary
The stored exponent field is the actual exponent plus a bias of 127; extracting the 8-bit field and subtracting the bias recovers the true exponent.
How to Calculate the Mantissa?
The mantissa, also known as the significand, in an IEEE 754 single-precision floating-point number represents the fractional part of the number. The 23-bit mantissa field stores only the digits after the leading 1; this leading 1 is implicit and not stored, which provides an additional bit of precision. To obtain the full significand, combine the hidden bit with the 23 stored bits, convert the combined binary value to decimal, and divide it by 2^23 to normalize it to a value between 1 and 2. This process applies only to normal numbers; special cases such as zero, infinity, and NaN have specific mantissa encodings. Recognizing the role of the hidden bit and normalization in mantissa calculation is key to accurately interpreting the precision and magnitude of floating-point numbers.

Steps to Calculate the Mantissa
1. Extract the 23-bit mantissa field (bits 22-0) from the 32-bit representation.
2. Prepend the implicit leading 1 to form a 24-bit value.
3. Divide the combined value by 2^23 to obtain the significand, a value in [1, 2).
Example Calculation
Let's take the same example to illustrate the calculation.

Example Number: 12.375
As before, 12.375 = 1100.011 in binary, which normalizes to 1.100011 × 2^3. The fraction bits after the leading 1 are 100011, so the stored 23-bit mantissa field is 10001100000000000000000. The full significand is 1.100011 (binary) = 1.546875 (decimal).
IEEE 754 Representation of 12.375
Putting it all together, the IEEE 754 single-precision representation of 12.375 is:

0 10000010 10001100000000000000000

Explanation of Each Component
- Sign (s = 0): the number is positive.
- Exponent (10000010): 130 - 127 = 3.
- Mantissa (10001100000000000000000): combined with the hidden bit this gives 1.10001100000000000000000 = 1.546875, and 1.546875 × 2^3 = 12.375.
Steps in Code (Python Example)
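As with the exponent, the original listing is not reproduced here; below is a minimal sketch (float32_significand is an illustrative name):

    import struct

    def float32_significand(x):
        """Return the significand (1.m) of x as an IEEE 754 single-precision float."""
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        mantissa_field = bits & 0x7FFFFF        # low 23 bits: the stored fraction
        # Prepend the implicit leading 1, then divide by 2^23 to get a value in [1, 2).
        return ((1 << 23) | mantissa_field) / (1 << 23)

    value = 12.375
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    print("Mantissa field (binary):", format(bits & 0x7FFFFF, '023b'))
    print("Significand (1.m):", float32_significand(value))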
Output
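Running the sketch above prints:

    Mantissa field (binary): 10001100000000000000000
    Significand (1.m): 1.546875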
Summary
The 23-bit mantissa field stores only the fraction bits after the implicit leading 1; combining the hidden bit with the stored bits and dividing by 2^23 yields the significand, a value between 1 and 2.
Floating-Point Compression - BF1 (Block Floating Point Compression)
Block Floating Point (BFP) compression, specifically BF1, is a technique used to reduce the storage space required for a set of IEEE 754 single-precision floating-point numbers. The core idea is to find a common exponent that can represent the entire block of numbers with reasonable accuracy. This common exponent is stored once, and then only the mantissas of the individual numbers are stored, saving the space that would otherwise hold an individual exponent for each number.

To compress a block of floating-point numbers using BF1, you first find the largest absolute value within the block. The exponent of this largest value becomes the shared exponent for the entire block. Next, each number's mantissa is shifted to match the shared exponent. Finally, only these adjusted mantissas and the shared exponent are stored.

Decompression reverses this process: the shared exponent is applied to each stored mantissa, shifting it back to its original position and restoring the floating-point representation of each number in the block.

While BFP compression can significantly reduce storage requirements, it introduces some loss of precision due to the shared-exponent approximation. Its effectiveness depends on the characteristics of the data being compressed, such as the range of values and the desired level of accuracy.

The following diagram depicts the overall process of BFP based on how I understand the process. In BFP, compression (i.e., a reduced number of stored bits) happens at two levels:
1. The per-number exponents are replaced by a single shared exponent for the whole block.
2. Each mantissa is quantized to fewer bits than the original 23-bit field.
Introduction to BF1 Compression
Block Floating Point (BF1) compression is a technique used to compress a block of floating-point numbers by using a common exponent for all the numbers in the block. This method is particularly effective in reducing the data size while maintaining a reasonable level of precision.

Steps in BF1 Compression
1. Determine the common exponent for the block.
2. Normalize each number by scaling it with the common exponent.
3. Quantize each scaled mantissa to a reduced number of bits.
4. Store the single common exponent together with the quantized mantissas.
In BF1 compression, we don't need to store the exponent of each number separately because we use a common exponent for the entire block. Every number in the compressed data uses the same (common) exponent, so saving one common exponent plus a quantized mantissa for each number is enough to recover every individual value. The BF1 (Block Floating Point) technique therefore achieves compression through two key steps (see the worked bit count after this paragraph):
1. Storing a single shared exponent for the block instead of one exponent per number.
2. Quantizing each mantissa to fewer bits than the original 23-bit field.
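The "Compression Percentage: 43.75%" figure in the output below can be reproduced under one plausible accounting; the per-value width (a sign bit plus a 12-bit quantized mantissa) is an assumption on my part, inferred from the printed mantissas:

    original_bits   = 8 * 32               # 8 float32 values = 256 bits
    compressed_bits = 8 + 8 * (1 + 12)     # one 8-bit shared exponent + per-value sign + 12-bit mantissa = 112 bits
    print(compressed_bits / original_bits) # 0.4375 -> compressed size is 43.75% of the original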
Example of BF1 Compression
Let's consider a block of floating-point numbers and compress them using BF1 compression.

Example Block: [0.015, 0.020, 0.018, 0.022, 0.016, 0.021, 0.017, 0.019]

For this block the common exponent is -7. Each number is divided by 2^-7 (i.e., multiplied by 128), giving scaled mantissas such as 0.015 × 128 = 1.92, and each scaled mantissa is then quantized to a reduced number of bits (for example, 1.92 becomes 1.919921875 with 10 fraction bits).
BF1 Decompression
To decompress the data, each stored quantized mantissa is multiplied by 2 raised to the common exponent, restoring an approximation of the original value.

Example of Decompression
For instance, the quantized mantissa 1.919921875 multiplied by 2^-7 gives 0.0149993896484375, close to the original 0.015.
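The downloadable code itself is not reproduced here. Below is a minimal sketch that reproduces the compressed and decompressed values printed below; the shared-exponent choice (the smallest exponent in the block, -7 for this data) and the 10-fraction-bit truncation are assumptions inferred from that output, and compress_bf1/decompress_bf1 are illustrative names:

    import math

    FRACTION_BITS = 10  # assumed quantization width, inferred from the printed mantissas

    def compress_bf1(block):
        """BF1 sketch: share one exponent across the block, quantize each mantissa."""
        # Shared exponent: the smallest exponent in the block (-7 for this data),
        # which is what the printed output below implies.
        common_exp = min(math.frexp(abs(x))[1] - 1 for x in block)
        scale = 1 << FRACTION_BITS
        # Scale each value by 2^-common_exp, then truncate to FRACTION_BITS fraction bits.
        mantissas = [int(x / 2.0 ** common_exp * scale) / scale for x in block]
        return common_exp, mantissas

    def decompress_bf1(common_exp, mantissas):
        """Recover approximate values: quantized_mantissa * 2^common_exponent."""
        return [m * 2.0 ** common_exp for m in mantissas]

    block = [0.015, 0.020, 0.018, 0.022, 0.016, 0.021, 0.017, 0.019]
    common_exp, mantissas = compress_bf1(block)
    print("Common Exponent (Decimal):", common_exp)   # -7
    print("Compressed Block (Decimal):", mantissas)   # 1.919921875, 2.5595703125, ...
    for orig, dec in zip(block, decompress_bf1(common_exp, mantissas)):
        print(f"Original: {orig:.6f}, Decompressed: {dec:.6f}, Error: {abs(orig - dec):.6f}")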
Python Code Example
Download from here (

The output of the python code is shown below.

###### Original Block ######
Original Block (Decimal): [0.015, 0.02, 0.018, 0.022, 0.016, 0.021, 0.017, 0.019]
Original Block (Binary Components):
Number: 0.015 s: 0 e: 01111000 mantissa: 11101011100001010001111
Number: 0.02 s: 0 e: 01111001 mantissa: 01000111101011100001010
Number: 0.018 s: 0 e: 01111001 mantissa: 00100110111010010111100
Number: 0.022 s: 0 e: 01111001 mantissa: 01101000011100101011000
Number: 0.016 s: 0 e: 01111001 mantissa: 00000110001001001101111
Number: 0.021 s: 0 e: 01111001 mantissa: 01011000000100000110001
Number: 0.017 s: 0 e: 01111001 mantissa: 00010110100001110010110
Number: 0.019 s: 0 e: 01111001 mantissa: 00110111010010111100011
###### Compressed Block ######
Common Exponent (Decimal): -7
Common Exponent (Binary): 01111000
Compressed Block (Decimal): [1.919921875, 2.5595703125, 2.3037109375, 2.8154296875, 2.0478515625, 2.6875, 2.17578125, 2.431640625]
Compression Percentage: 43.75%
Compressed Block (Binary Components):
Quantized Mantissa: 1.919921875 s: 0 e: 01111000 <== this individual exponent is not saved because it is the same for every element in the block. quantized mantissa: 11110101110 <== this mantissa has fewer bits than the mantissa of the original block.
Quantized Mantissa: 2.5595703125 s: 0 e: 01111000 quantized mantissa: 101000111101
Quantized Mantissa: 2.3037109375 s: 0 e: 01111000 quantized mantissa: 100100110111
Quantized Mantissa: 2.8154296875 s: 0 e: 01111000 quantized mantissa: 101101000011
Quantized Mantissa: 2.0478515625 s: 0 e: 01111000 quantized mantissa: 100000110001
Quantized Mantissa: 2.6875 s: 0 e: 01111000 quantized mantissa: 101011000000
Quantized Mantissa: 2.17578125 s: 0 e: 01111000 quantized mantissa: 100010110100
Quantized Mantissa: 2.431640625 s: 0 e: 01111000 quantized mantissa: 100110111010
###### Decompressed Block ######
Decompressed Block (Decimal): [0.0149993896484375, 0.01999664306640625, 0.01799774169921875, 0.02199554443359375, 0.01599884033203125, 0.02099609375, 0.016998291015625, 0.0189971923828125]
Decompressed Block (Binary Components):
Decompressed Number: 0.0149993896484375 = 'quantized_mantissa * (2 ^ common_exponent)' = 1.919921875 * (2 ^ -7) s: 0 e: 01111000 mantissa: 11101011100000000000000
Decompressed Number: 0.01999664306640625 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.5595703125 * (2 ^-7) s: 0 e: 01111001 mantissa: 01000111101000000000000
Decompressed Number: 0.01799774169921875 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.3037109375 * (2 ^-7) s: 0 e: 01111001 mantissa: 00100110111000000000000
Decompressed Number: 0.02199554443359375 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.8154296875 * (2 ^-7) s: 0 e: 01111001 mantissa: 01101000011000000000000
Decompressed Number: 0.01599884033203125 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.0478515625 * (2 ^-7) s: 0 e: 01111001 mantissa: 00000110001000000000000
Decompressed Number: 0.02099609375 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.6875 * (2 ^-7) s: 0 e: 01111001 mantissa: 01011000000000000000000
Decompressed Number: 0.016998291015625 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.17578125 * (2 ^-7) s: 0 e: 01111001 mantissa: 00010110100000000000000
Decompressed Number: 0.0189971923828125 = 'quantized_mantissa * (2 ^ common_exponent)' = 2.431640625 * (2 ^-7) s: 0 e: 01111001 mantissa: 00110111010000000000000
###### Verification ######
Original: 0.015000, Decompressed: 0.014999, Error: 0.000001
Original: 0.020000, Decompressed: 0.019997, Error: 0.000003
Original: 0.018000, Decompressed: 0.017998, Error: 0.000002
Original: 0.022000, Decompressed: 0.021996, Error: 0.000004
Original: 0.016000, Decompressed: 0.015999, Error: 0.000001
Original: 0.021000, Decompressed: 0.020996, Error: 0.000004
Original: 0.017000, Decompressed: 0.016998, Error: 0.000002
Original: 0.019000, Decompressed: 0.018997, Error: 0.000003

Summary
BF1 compression works by determining a common exponent for a block of floating-point numbers, normalizing the numbers with this common exponent, quantizing the mantissas, and storing the compressed data. During decompression, each stored quantized mantissa is multiplied by 2 raised to the common exponent to reconstruct an approximation of the original value.