What is floating-point?
Last updated
4 years ago
Date created
Nov 30, 2020
Floating Point (recall)
(3 cards)
What is floating-point?
The representation for non-integral numbers
What are some examples of floating-point numbers?
How does floating-point work in binary?
$$\pm 1.xxxxxxx_2 \times 2^{yyyy}$$ (for example, $$1.0_2 \times 2^{-1}$$)
The decimal point represents the binary point.
The x's represent the fraction.
The first 2 represents base 2.
The y's represent the exponent.
There's a tradeoff between precision and range.
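The precision side of this tradeoff can be seen with a small C sketch. It assumes the host `float` is IEEE 754 single precision with a 24-bit significand (1 implicit bit plus 23 stored bits); the helper name `survives_float_round_trip` is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Returns 1 if the integer n is still the same value after being
   converted to float, and 0 if the conversion had to round it. */
int survives_float_round_trip(int32_t n)
{
    volatile float f;               /* volatile: force a real 32-bit float */
    assert(FLT_MANT_DIG == 24);     /* assumption: single-precision float */
    f = (float)n;
    return (int64_t)f == (int64_t)n;
}
```

Under that assumption, 16777216 (2^24) survives the round trip but 16777217 does not: a float reaches far beyond the int32_t range, but it only has 24 significant bits to spend.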
IEEE 754 Floating-Point Format
(1 card)
What is the IEEE 754 Floating-Point Format?
We have this example where x = -11111111.01 x 2^7
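The s is the sign bit: 0 for positive, 1 for negative (1 bit).
The (1+Fraction) is the significand; only the Fraction bits are stored (single 23 bits, double 52 bits) [The fraction takes the LSH completely]
The 2^(actual exponent) is the scale factor; the stored exponent is the actual exponent plus a bias, 127 for single and 1023 for double (single 8 bits, double 11 bits).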
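The three fields can be pulled out of a single-precision value with a short C sketch. It assumes the host `float` uses the IEEE 754 single-precision layout; `FloatFields` and `split_float` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits: stored exponent (actual + 127) */
    uint32_t fraction;  /* 23 bits: digits after the binary point */
} FloatFields;

/* Split a float into its sign, stored exponent, and fraction fields. */
FloatFields split_float(float x)
{
    uint32_t bits;
    FloatFields f;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);     /* copy the raw bits, no conversion */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, `split_float(-0.75f)` should give sign 1, stored exponent 126, and fraction 0x400000 (binary 1 followed by 22 zeros).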
Floating-Point Data Type
(1 card)
What is the single-precision floating-point (float) type?
float versus int32_t
(1 card)
What is the difference between int32_t y = 1000 and float x = 1000. (note the decimal point)?
int32_t y
vs
float x
Here, the sign bit is the MSB. The exponent is stored in excess-127: since 1000_10 = 1111101000_2 = 1.111101_2 x 2^9, the actual exponent is 9, so the stored 8-bit exponent is 9 + 127 = 136_10. Finally, the fraction follows: the bits after the binary point are stored starting right after the exponent field and are padded with zeros on the right if necessary.
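A minimal C sketch makes the two bit patterns visible side by side. It assumes an IEEE 754 single-precision `float` on the host; `float_bits` and `int_bits` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw 32-bit pattern of a float. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Raw 32-bit pattern of an int32_t (two's complement). */
uint32_t int_bits(int32_t y)
{
    return (uint32_t)y;
}
```

With these, `int_bits(1000)` is 0x000003E8, while `float_bits(1000.0f)` is 0x447A0000: sign 0, stored exponent 10001000_2 (136), fraction 1111010...0.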
Floating-Point example
(4 cards)
How do we represent 0.75 in IEEE 754 in single-precision format?
Convert the number to binary.
$$ 0.75_{10} \to 0.11_2 $$
Normalize the number.
$$ 0.11_2 \to 1.1 * 2^{-1} $$
With our given exponent, we can calculate the exponent part.
$$ -1 + 127 = 126 $$
Convert to binary and store in the next 8 positions after the MSB, which is 0 (positive).
$$126_{10} \to 01111110_2 $$
Insert the remaining decimal/fractional part in the next bit positions.
How do we represent decimal -0.5 in IEEE 754 single-precision format?
We take into account the MSB is 1 since the number is negative.
Convert decimal into binary
$$ 0.5_{10} \to 0.1_2 $$
Normalize the number
$$ 0.1_2 = 1.0_2 \times 2^{-1} $$
Given our exponent, we calculate the excess
$$ -1 + 127 = 126 $$
Convert the excess into binary
$$ 126_{10} \to 01111110_2 $$
Save this excess in the corresponding bit positions after the MSB.
Save the fractional part in the remaining bit positions.
What is the binary representation of the binary number 111111.01 x 2^0 in the IEEE 754 single-precision format?
How do we represent decimal 2.0 in IEEE 754 single-precision format?
Convert the whole part to binary.
Normalize the binary.
Given the exponent, add it to 127, convert the sum to an 8-bit binary number, and store it after the MSB, which is 0 (positive).
Store the remaining fractional part in the respective bit positions.
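The steps used in the worked examples (sign bit, exponent plus 127, then fraction) can be sketched in C. This is only an illustration for normalized numbers; `pack_float` is a hypothetical name, and the fraction is passed in as a ready-made 23-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Build a single-precision bit pattern from its three parts.
   sign:            0 (positive) or 1 (negative)
   actual_exponent: exponent after normalizing to 1.xxx * 2^e
   fraction:        the 23 bits after the binary point */
uint32_t pack_float(uint32_t sign, int actual_exponent, uint32_t fraction)
{
    uint32_t stored;
    assert(sign <= 1u);
    assert(actual_exponent >= -126 && actual_exponent <= 127);
    assert(fraction <= 0x7FFFFFu);
    stored = (uint32_t)(actual_exponent + 127);   /* excess-127 */
    return (sign << 31) | (stored << 23) | fraction;
}
```

For 0.75 = 1.1_2 x 2^-1 the call is `pack_float(0, -1, 0x400000)`, giving 0x3F400000. Likewise -0.5 is `pack_float(1, -1, 0)` = 0xBF000000 and 2.0 is `pack_float(0, 1, 0)` = 0x40000000.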
Single-precision Floating-point Representing Zero and One
(2 cards)
How do we represent 0 in single-precision floating-point?
We just know both the stored exponent and fractional part are 0 in binary.
How do we represent 1 in single-precision floating-point?
In binary, we know 1 is 0001.
We can normalize it to 1.0 x 2^0.
This gives us the exponent part to be 127 and the fractional part to be all 0's.
Floating-Point Registers
(3 cards)
Can floating point registers S0 to S15 be modified?
These may be modified by functions
Can floating-point registers S16 to S31 be modified?
No; they must be preserved by functions.
What are floating-point registers D0-D15 used for?
They are used to hold 64-bit values. Yet, we can't perform 64-bit arithmetic.
Floating-point unit
(5 cards)
What are the processors used in this floating-point unit?
How does the FP processor process a constant as an input?
We use the VMOV instruction, which works for only a few values. Not recommended.
How does the integer processor interact with the main memory?
We have the LDR and STR instructions.
How does the FP processor interact with the main memory?
We have the VSTR and VLDR instructions.
How does the integer processor process a constant as an input?
Floating-Point Push & Pop
(2 cards)
How do we push FPU registers?
We use the VPUSH instruction. Use at function entry to preserve registers S16 to S31 that are modified.
How do we pop FPU registers?
We use the VPOP instruction. Use at function exit to restore registers S16 to S31 that are modified.
Function Parameters and Return Values
(2 cards)
How does the function call look for floating-point parameters?
How does a function with floating-point parameters look in assembly?
Loading Floating-Point Constants
(2 cards)
How does VLDR work?
Unlike LDR, VLDR has no pseudo-instruction form. We need to place the number in memory with the .float directive and give it a label. We can then use VLDR with that label, provided the label is defined in the file.
How does VMOV work?
It works like MOV except we can work with a limited set of constants.
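As a rough guide to which constants qualify, the sketch below checks a value against the form that ARM documents for floating-point VMOV immediates: plus or minus m x 2^-n with 16 <= m <= 31 and 0 <= n <= 7. The function name `is_vmov_immediate` is hypothetical, and the assembler remains the final authority on what it accepts.

```c
/* Returns 1 if x can be written as +/- m * 2^(-n)
   with 16 <= m <= 31 and 0 <= n <= 7, otherwise 0. */
int is_vmov_immediate(float x)
{
    float mag = (x < 0.0f) ? -x : x;    /* the sign is encoded separately */
    int m, n;
    for (n = 0; n <= 7; n++) {
        for (m = 16; m <= 31; m++) {
            /* m / 2^n is exact in single precision */
            if (mag == (float)m / (float)(1 << n)) {
                return 1;
            }
        }
    }
    return 0;
}
```

Under this rule 4.0 (16 x 2^-2) and 0.5 (16 x 2^-5) qualify, while 0.0 and 3.14159 do not, which is why those values need another approach.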
VMOV Immediate Constants
(1 card)
What are some of the VMOV immediate constants?
Yup
Copying Floating-Point Data -- Register -> Register
(1 card)
What are some of the functions for VMOV?
NOTE: These only copy data. They do NOT convert between integer and floating-point representations.
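The difference between copying bits and converting a value can be mimicked in C. This is an analogy only, assuming an IEEE 754 single-precision `float`; `copy_bits_to_float` stands in for a raw register copy and `convert_to_float` for a real conversion, and both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an integer as a float (no conversion). */
float copy_bits_to_float(int32_t n)
{
    float f;
    assert(sizeof(float) == sizeof(int32_t));
    memcpy(&f, &n, sizeof f);
    return f;
}

/* Convert the integer value to the nearest float. */
float convert_to_float(int32_t n)
{
    return (float)n;
}
```

`convert_to_float(1000)` is 1000.0, but `copy_bits_to_float(1000)` is a tiny value near zero, because the bit pattern 0x000003E8 has a stored exponent of 0. Only the pattern 0x447A0000 reads back as 1000.0.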
Copying Floating-Point Data -- Memory -> (Single or Double) Registers
(2 cards)
How does VSTR work?
How does VLDR work?
Converting between Integer and Floating Point
(1 card)
How does VCVT work?
Rounding Modes
(1 card)
What are the rounding modes?
float versus int32_t
(2 cards)
How do you define
float x = -1000.5 //or -1111101000.1_2
How do you define
int32_t y = -1000
Arithmetic with real numbers
(4 cards)
Solve the following Pythagorean Theorem function in assembly
float Hypotenuse(float side1, float side2)
Hypotenuse:
VMUL.F32 S0,S0,S0
// S0 = side1 * side1
VMLA.F32 S0,S1,S1
// S0 += side2 * side2
VSQRT.F32 S0,S0
// S0 = square root of S0
BX LR // Return
Find the discriminant in assembly
float Discriminant(float a, float b, float c){
return b * b - 4.0 * a * c ;
}
Find the volume of a cube in assembly
float VolumeOfCube(float height, float width, float depth){
return height * width * depth;
}
What kind of arithmetic can you do with real numbers and floating-point?
Arbitrary Floating-Point Constants
(1 card)
Find the area of a circle given a radius.
float AreaOfCircle(float radius){
return 3.14159 * radius * radius ;
}
Using Expressions to Create Constants
(1 card)
Calculate the volume of a sphere given a radius
float VolumeOfSphere(float radius){
return (4.0 / 3.0) * 3.14159 * radius *
radius * radius ;
}
Comparing Real numbers
(1 card)
How can we compare floating-point numbers?
Interpreting Flags After VCMP
(1 card)
What are the possible conditions where we can compare two floating-point numbers?
Copying 0 to a Floating-Point Register
(1 card)
How do we copy 0 to a FP register?
Given the issues with VMOV S0,0.0, we are left with
VSUB.F32 S0,S0,S0
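One caveat worth keeping in mind, sketched in C under IEEE 754 arithmetic: subtracting a value from itself gives zero only when the value is finite. The helper name `zero_by_subtraction` is hypothetical.

```c
#include <math.h>

/* Mirrors VSUB.F32 S0,S0,S0: subtract a value from itself. */
float zero_by_subtraction(float x)
{
    return x - x;   /* 0.0 for finite x, NaN if x is infinity or NaN */
}
```

So the subtraction trick is a sketch that relies on the register holding an ordinary finite number beforehand; if it might hold infinity or NaN, the result is NaN rather than 0.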
Pointer to an Array of Floats
(4 cards)
Given this function
float TaylorPoly(float x, float coef_a[], int32_t terms)
what do we know about it?
FP reg S0: x
int reg R0: &coef_a
int reg R1: terms
When we pass an array of floats, where is this address passed to?
Integer registers R0-R3
How can we load the values of an array of floats to our registers?
VLDMIA R0!,{S1,S2,S3}
When we pass an array of floats as a fn parameter, what are we passing?
The address of the array that points to the first element of the array
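For reference, here is one way the body of TaylorPoly might look in C. The prototype is the one given above; the body is an assumption (the cards do not show it) that the function sums coef_a[i] * x^i over the first `terms` coefficients.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behavior: coef_a[0] + coef_a[1]*x + coef_a[2]*x^2 + ... */
float TaylorPoly(float x, float coef_a[], int32_t terms)
{
    float sum = 0.0f;
    float power = 1.0f;             /* x^0 */
    int32_t i;
    assert(coef_a != 0 && terms >= 0);
    for (i = 0; i < terms; i++) {
        sum += coef_a[i] * power;   /* coef_a arrives as an address */
        power *= x;                 /* next power of x */
    }
    return sum;
}
```

Note that `coef_a` is only a pointer inside the function, which matches the register assignment above: the float x travels in S0, the address travels in R0, and the count travels in R1.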
Preserving Floating-Point Registers
(1 card)
Find the volume of a cone in assembly
float VolumeOfCone(float radius, float height){
return AreaOfCircle(radius) * height / 3.0 ;
}
Floating-Point Compare & Flags
(1 card)
int32_t ImaginaryRoots(float a, float b, float c){
return Discriminant(a, b, c) < 0.0 ;
//returns 1 or 0
}
FPU Instructions in IT Blocks
(1 card)
float LimitedIncrement(float a, float b){
if (a < b) a += 1.0 ;
return a ;
}
Floating-Point Equality Test
(1 card)
Extended Floating-Point Example
(1 card)
float QuadraticRoot(float a, float b, float c, int minus){
float root = sqrt(Discriminant(a, b, c)) ;
float top ;
if (minus) top = -b - root ;
else top = -b + root ;
return top / (2*a) ; // divide by the whole denominator 2a
}
FPU Programming - The Essentials
(11 cards)
Can we use constants as expressions for FP instructions?
What registers belong to float?
What about Pointers, Addresses, and integers?
Float: S0-S15
Else: R0-R3
Can we write pseudo-instructions with VLDR like with LDR?
No. Use .float to create a constant in memory, add a label to it, and use VLDR Sn,label to load it.
Can we use VMOV with many integers?
No. VMOV supports a very restricted set of immediate constants. Easiest to only use it to load small integers (like 4.0) and some simple fractions (like 0.5).
Do we need to specify the operand type for certain instructions?
Yes. All instructions that perform arithmetic, data type conversion, or compares must specify the operand type, as in VADD.F32.
Do we append a condition code to an FPU instruction?
Yes. In an IT block, append the condition code to an FPU instruction BEFORE appending the data type specifier, as in VADDLE.F32
Do we need to use VMRS after VCMP?
Yes. Comparing two FPU values requires VCMP followed by VMRS APSR_nzcv,FPSCR before the conditional branch or IT block.
How do we preserve FPU regs?
By copying S0-S15 to R4-R8 and push/pop R4,R8
Do VPUSH and VPOP work with int regs?
No, only FPU regs.
Does VMOV convert an integer to float and vice-versa?
No. That requires a combination of VMOV and VCVT.
What are the addressing modes for VSTR?
[Rn] and [Rn, constant]
FPU instruction cycle counts
(1 card)
What are the cycle counts for FPU instructions?
Division by an Arbitrary Constant
(2 cards)
How can you float divide using reciprocal multiplication?
How can you divide using integer multiplication?
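Both ideas can be sketched in C. The divisors 3 and 10 and the function names are chosen here purely for illustration; the integer version uses the well-known constant 0xCCCCCCCD, which is 2^35 / 10 rounded up.

```c
#include <stdint.h>

/* Float division by 3 replaced by multiplication with the reciprocal.
   The compiler evaluates 1.0f / 3.0f once, at compile time. The result
   may differ from x / 3.0f in the last bit because 1/3 is rounded. */
float divide_by_3_float(float x)
{
    return x * (1.0f / 3.0f);
}

/* Unsigned integer division by 10 using a 32x32 -> 64-bit multiply
   and a shift instead of a divide instruction. */
uint32_t divide_by_10_uint(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

The integer version is exact for every 32-bit unsigned input, whereas the float version trades a possible one-bit rounding difference for a cheaper multiply.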