Floating Point (recall)

Last updated

3 years ago

Date created

Nov 30, 2020

Floating Point (recall)

(3 cards)

What is floating-point?

The representation for non-integral numbers

- Including very small and very large numbers

What are some examples of floating-point numbers?

- Scientific notation
- $$-2.34 * 10^{56}$$
- $$+0.002*10^{-4}$$
- $$+987.02*10^9$$

- Binary
- $$\pm 1.xxxxxxx_2 * 2^{yyyy}$$

- Types in C
- float (single-precision FP)
- double (double-precision FP)

How does the floating-point work in binary?

$$\pm 1.xxxxxxx_2 * 2^{yyyy}$$ (for example, $$1.0_2*2^{-1}$$)

The point is a binary point rather than a decimal point.

The x's represent the fraction.

The subscript 2 marks the significand as a base-2 number.

The y's represent the exponent.

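There's a tradeoff between precision and range: with a fixed number of bits, more fraction bits give more precision, while more exponent bits give more range.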
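The precision side of this tradeoff can be sketched in C. This is a minimal illustration, assuming `float` is an IEEE 754 single-precision value with the default round-to-nearest mode; the helper name `through_float` is hypothetical. A single-precision value carries 24 significant bits, so an `int32_t` that needs more than that cannot survive a round trip through `float`.

```c
#include <stdint.h>

/* Hypothetical helper: convert an int32_t to float and back again.
   Values that need more than 24 significant bits are rounded on the way in. */
int32_t through_float(int32_t n) {
    volatile float f = (float)n;   /* volatile forces a real single-precision store */
    return (int32_t)f;
}
```

For example, 16777216 (2^24) comes back unchanged, while 16777217 (2^24 + 1) comes back as 16777216 because the lowest bit does not fit in the fraction.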

IEEE 754 Floating-Point Format

(1 card)

What is the IEEE 754 Floating-Point Format?

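The general form is x = (-1)^s x (1 + Fraction) x 2^(actual exponent); for example, x = -11111111.01 x 2^7.

s is the sign bit (1 bit): 0 for positive, 1 for negative.

(1 + Fraction) is the significand. Only the fraction is stored (23 bits for single, 52 bits for double), and it occupies the least significant bits of the word.

2^(actual exponent) is the scale factor. The exponent is stored with a bias added (8 bits with bias 127 for single, 11 bits with bias 1023 for double).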
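The three fields can be pulled out of a single-precision value with shifts and masks. The sketch below assumes `float` is a 32-bit IEEE 754 value; the `FloatFields` type and the `float_fields` function are names made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three single-precision fields. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits, the part after the binary point */
} FloatFields;

FloatFields float_fields(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* copy the raw bit pattern, no conversion */

    FloatFields f;
    f.sign     = bits >> 31;               /* bit 31 */
    f.exponent = (bits >> 23) & 0xFFu;     /* bits 30..23 */
    f.fraction = bits & 0x7FFFFFu;         /* bits 22..0 */
    return f;
}
```

Subtracting 127 from the `exponent` field recovers the actual exponent of a normalized value.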

Floating-Point Data Type

(1 card)

What is the single-precision floating-point (float) type?

float versus int32_t

(1 card)

How do int32_t y = 1000 and float x = 1000. (note the decimal point) differ in their binary representation?

int32_t y

vs

float x

For float x, the sign bit is the MSB (0, since 1000 is positive). Normalizing gives $$1000_{10} = 1111101000_2 = 1.111101_2 * 2^9$$, so in excess-127 the 8-bit stored exponent is 9 + 127 = 136_10. The fraction follows: the bits to the right of the binary point are stored starting at the most significant fraction bit, and we zero-extend to the right to fill all 23 bits.
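The difference between the two encodings is easy to see by printing or comparing the raw bit patterns. This is a small sketch, assuming a 32-bit IEEE 754 `float`; the helper names `int_bits` and `float_bits` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a 2's complement integer. */
uint32_t int_bits(int32_t y) {
    return (uint32_t)y;
}

/* Raw bit pattern of a single-precision float (copied, not converted). */
uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

With these helpers, `int_bits(1000)` is 0x000003E8, while `float_bits(1000.0f)` is 0x447A0000: sign 0, stored exponent 136, and fraction 1111010 followed by zeros.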

Floating-Point example

(4 cards)

How do we represent 0.75 in IEEE 754 in single-precision format?

Convert the number to binary.

$$ 0.75_{10} \to 0.11_2 $$

Normalize the number.

$$ 0.11_2 \to 1.1_2 * 2^{-1} $$

Add the bias of 127 to the exponent to get the stored exponent.

$$ -1 + 127 = 126 $$

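Convert the stored exponent to binary and place it in the 8 bit positions after the MSB (the sign bit), which is 0 because the number is positive.

$$126_{10} \to 01111110_2 $$

Insert the fraction bits (here a single 1, zero-extended to the right) in the remaining bit positions. The resulting bit pattern is 0x3F400000.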

How do we represent decimal -0.5 in IEEE 754 single-precision format?

We take into account the MSB is 1 since the number is negative.

Convert decimal into binary

$$ 0.5_{10} \to 0.1_2 $$

Normalize the number

$$ 0.1_2 \to 1.0_2 * 2^{-1} $$

Given our exponent, we calculate the excess

$$ -1 + 127 = 126 $$

Convert the excess into binary

$$ 126_{10} \to 01111110_2 $$

Save this excess in the corresponding bit positions after the MSB.

Save the fractional part (here all zeros) in the remaining bit positions. The resulting bit pattern is 0xBF000000.

What is the binary representation of the binary number 111111.01 x 2^0 in the IEEE 754 single-precision format?

Back

How do we represent decimal 2.0 in IEEE 754 single-precision format?

Convert the whole part to binary.

Normalize the binary.

Given the exponent, add it to 127 and store the sum after the MSB, which is 0 (positive).

Convert the sum to an 8-bit binary number before storing it.

Store the remaining fraction bits in their respective bit positions.
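The encoding steps used in the cards above (sign bit, exponent plus 127, then fraction bits) can be sketched as one C function. It assumes a 32-bit IEEE 754 `float` and handles normalized values only; `pack_float` and its parameter names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a normalized single-precision value from
   its sign (0 or 1), actual exponent (-126..127), and 23-bit fraction. */
float pack_float(uint32_t sign, int32_t actual_exponent, uint32_t fraction) {
    uint32_t stored = (uint32_t)(actual_exponent + 127);   /* excess-127 */
    uint32_t bits = (sign << 31)                 /* MSB: sign */
                  | (stored << 23)               /* next 8 bits: stored exponent */
                  | (fraction & 0x7FFFFFu);      /* low 23 bits: fraction */
    float x;
    memcpy(&x, &bits, sizeof x);                 /* reinterpret the bits as a float */
    return x;
}
```

For instance, `pack_float(0, -1, 0x400000)` yields 0.75, `pack_float(1, -1, 0)` yields -0.5, and `pack_float(0, 1, 0)` yields 2.0.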

Single-precision Floating-point Representing Zero and One

(2 cards)

How do we represent 0 in single-precision floating-point?

Zero is a special case: both the stored exponent and the fraction are all 0 bits.

How do we represent 1 in single-precision floating-point?

In binary, we know 1 is 0001.

We can normalize it to 1.0 x 2^0.

This gives a stored exponent of 0 + 127 = 127 (01111111 in binary) and a fraction of all 0's.

Floating-Point Registers

(3 cards)

Can floating point registers S0 to S15 be modified?

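Yes. Functions may modify them without preserving them.

Can floating-point registers S16 to S31 be modified?

Not freely. They must be preserved by functions, so a function that modifies them has to save and restore them.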

What are floating-point registers D0-D15 used for?

They hold 64-bit values, but we can't perform 64-bit arithmetic on them.

Floating-point unit

(5 cards)

What are the processors used in this floating-point unit?

- We have the Integer Processor
- We hold the registers R0 to R12
- We have APSR flags
- We have more familiarity with this processor

- We have the FP (floating-point) Processor
- We hold the registers S0 to S31
- We have FPU flags

How does the FP processor process a constant as an input?

We can use VMOV with an immediate constant, but it only works for a few values, so it is not recommended.

How does the integer processor interact with the main memory?

We have the LDR and STR instructions.

How does the FP processor interact with the main memory?

We have the VSTR and VLDR instructions.

How does the integer processor process a constant as an input?

- We can copy constants into a register with the MOV, MVN, MOVW, and MOVT instructions
- We can use LDR as a pseudo-instruction to load a constant to a register

Floating-Point Push & Pop

(2 cards)

How do we push FPU registers?

We use the VPUSH instruction. Use at function entry to preserve registers S16 to S31 that are modified.

How do we pop FPU registers?

We use the VPOP instruction. Use at function exit to restore registers S16 to S31 that are modified.

Function Parameters and Return Values

(2 cards)

How does the function call look for floating-point parameters?

Back

How does a function with floating-point parameters look in assembly?

Back

Loading Floating-Point Constants

(2 cards)

How does VLDR work?

Unlike LDR, VLDR cannot be used as a pseudo-instruction to load a constant. We need to place the number in memory with the .float directive and give it a label. We can then use VLDR with that label, provided the label is defined in the same file.

How does VMOV work?

It works like MOV, except that it supports only a limited set of immediate constants.

VMOV Immediate Constants

(1 card)

What are some of the VMOV immediate constants?

Yup

Copying Floating-Point Data -- Register -> Register

(1 card)

What are some of the functions for VMOV?

**NOTE: **These only copy data. They do NOT convert between integer and floating-point representations.

Copying Floating-Point Data -- Memory -> (Single or Double) Registers

(2 cards)

How does VSTR work?

Back

How does VLDR work?

Back

Converting between Integer and Floating Point

(1 card)

How does VCVT work?

Back

Rounding Modes

(1 card)

What are the rounding modes?

- Round to nearest even (default)
- Round toward positive infinity
- Round toward negative infinity
- Round toward zero
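The four modes are easiest to compare on a float-to-integer conversion. The sketch below implements each mode by hand in C rather than through any FPU setting; the function names are hypothetical, and it assumes inputs small enough to fit in an `int32_t`.

```c
#include <stdint.h>

/* Round toward zero: a C cast simply drops the fractional part. */
int32_t round_toward_zero(float x) {
    return (int32_t)x;
}

/* Round toward negative infinity: step down if truncation went up. */
int32_t round_toward_neg_inf(float x) {
    int32_t t = (int32_t)x;
    return ((float)t > x) ? t - 1 : t;
}

/* Round toward positive infinity: step up if truncation went down. */
int32_t round_toward_pos_inf(float x) {
    int32_t t = (int32_t)x;
    return ((float)t < x) ? t + 1 : t;
}

/* Round to nearest; an exact tie goes to the even neighbor. */
int32_t round_nearest_even(float x) {
    int32_t lo = round_toward_neg_inf(x);
    float diff = x - (float)lo;             /* distance above lo, in [0, 1) */
    if (diff > 0.5f) return lo + 1;
    if (diff < 0.5f) return lo;
    return (lo % 2 == 0) ? lo : lo + 1;     /* tie: choose the even integer */
}
```

Under round to nearest even, 2.5 becomes 2 and 3.5 becomes 4, whereas -2.7 becomes -2 toward zero and -3 toward negative infinity.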

float versus int32_t

(2 cards)

How do you define

`float x = -1000.5 //or -1111101000.1_2`

Back

How do you define

`int32_t y = -1000`

Back

Arithmetic with real numbers

(4 cards)

Solve the following Pythagorean Theorem function in assembly

`float Hypotenuse(float side1, float side2)`

```
Hypotenuse:
VMUL.F32 S0,S0,S0
// S0 = side1 * side1
VMLA.F32 S0,S1,S1
// S0 += side2 * side2
VSQRT.F32 S0,S0
// S0 = square root of S0
BX LR // Return
```

Find the discriminant in assembly

```
float Discriminant(float a, float b, float c){
return b * b - 4.0 * a * c ;
}
```

Back

Find the volume of a cube in assembly

```
float VolumeOfCube(float height, float width, float depth){
return height * width * depth;
}
```

Back

What kind of arithmetic can you do with real numbers and floating-point?

Back

Arbitrary Floating-Point Constants

(1 card)

Find the area of a circle given a radius.

```
float AreaOfCircle(float radius){
return 3.14159 * radius * radius ;
}
```

Back

Using Expressions to Create Constants

(1 card)

Calculate the volume of a sphere given a radius

```
float VolumeOfSphere(float radius){
return (4.0 / 3.0) * 3.14159 * radius *
radius * radius ;
}
```

Back

Comparing Real numbers

(1 card)

How can we compare floating-point numbers?

Back

Interpreting Flags After VCMP

(1 card)

What are the possible conditions where we can compare two floating-point numbers?

Back

Copying 0 to a Floating-Point Register

(1 card)

How do we copy 0 to a FP register?

VMOV S0,0.0 does not work because 0.0 is not one of the immediate constants VMOV supports, so we are left with

`VSUB.F32 S0,S0,S0`

Pointer to an Array of Floats

(4 cards)

Given this function

`float TaylorPoly(float x, float coef_a[], int32_t terms)`

what do we know about it?

FP reg S0: x

int reg R0: &coef_a

int reg R1: terms

When we pass an array of floats, where is this address passed to?

Integer registers R0-R3

How can we load the values of an array of floats to our registers?

VLDMIA R0!,{S1,S2,S3}

When we pass an array of floats as a fn parameter, what are we passing?

The address of the first element of the array
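On the C side, the array parameter is just a pointer to the first float, which is why it travels in an integer register while x travels in an FP register. The body below is only a guess at what `TaylorPoly` computes: it assumes `coef_a[i]` is the coefficient of x^i and evaluates the polynomial with Horner's rule.

```c
#include <stdint.h>

/* Assumed behavior: sum of coef_a[i] * x^i for i = 0 .. terms-1.
   coef_a is received as the address of the first element. */
float TaylorPoly(float x, float coef_a[], int32_t terms) {
    float sum = 0.0f;
    for (int32_t i = terms - 1; i >= 0; i--) {
        sum = sum * x + coef_a[i];   /* Horner's rule, highest power first */
    }
    return sum;
}
```

Calling it with coefficients {1, 2, 3} and x = 2 evaluates 1 + 2*2 + 3*4 = 17.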

Preserving Floating-Point Registers

(1 card)

Find the volume of a cone in assembly

```
float VolumeOfCone(float radius, float height){
return AreaOfCircle(radius) * height / 3.0 ;
}
```

Back

Floating-Point Compare & Flags

(1 card)

```
int32_t ImaginaryRoots(float a, float b, float c){
return Discriminant(a, b, c) < 0.0 ;
//returns 1 or 0
}
```

Back

FPU Instructions in IT Blocks

(1 card)

```
float LimitedIncrement(float a, float b){
if (a < b) a += 1.0 ;
return a ;
}
```

Back

Floating-Point Equality Test

(1 card)

Front

Back

Extended Floating-Point Example

(1 card)

```
float QuadraticRoot(float a, float b, float c, int minus){
float root = sqrt(Discriminant(a, b, c)) ;
float top ;
if (minus) top = -b - root ;
else top = -b + root ;
return top / (2*a) ;
}
```

Back

FPU Programming - The Essentials

(11 cards)

Can we use constants as expressions for FP instructions?

Back

What registers belong to float?

What about Pointers, Addresses, and integers?

Float: S0-S15

Else: R0-R3

Can we write pseudo-instructions with VLDR like with LDR?

No. Use .float to create a constant in memory, add a label to it, and use VLDR Sn,label to load it.

Can we use VMOV with many integers?

No. VMOV supports a very restricted set of immediate constants. Easiest to only use it to load small integers (like 4.0) and some simple fractions (like 0.5).

Do we need to specify the operand type for certain instructions?

Yes. All instructions that perform arithmetic, data type conversion, or compares must specify the operand type, as in VADD.F32.

- VCVT requires two specifiers (VCVT.F32.S32).

Do we append a condition code to an FPU instruction?

Yes. In an IT block, append the condition code to an FPU instruction BEFORE appending the data type specifier, as in VADDLE.F32

Do we need to use VMRS after VCMP?

Yes. Comparing two FPU values requires VCMP followed by VMRS APSR_nzcv,FPSCR before the conditional branch or IT block.

How do we preserve FPU regs?

By copying S0-S15 to R4-R8 and push/pop R4,R8

Do VPUSH and VPOP work with int regs?

No, only FPU regs.

Does VMOV convert an integer to float and vice-versa?

No. That requires a combination of VMOV and VCVT.

- VCVT.S32.F32 Sd,Sm
- // Sd <- (int32_t) Sm, where Sm is a float and the result is a 2's comp integer

- VMOV Rn, Sd
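The same copy-versus-convert distinction exists in C, which may make it easier to remember. In this sketch (assuming a 32-bit IEEE 754 `float`; both function names are hypothetical) a cast plays the role of a conversion, while `memcpy` plays the role of a plain bit copy.

```c
#include <stdint.h>
#include <string.h>

/* Conversion: the value is preserved and the bit pattern changes. */
float convert_int_to_float(int32_t n) {
    return (float)n;
}

/* Copy: the bit pattern is preserved and is reinterpreted as a float. */
float copy_bits_to_float(int32_t n) {
    float f;
    memcpy(&f, &n, sizeof f);
    return f;
}
```

Converting the integer 1 gives 1.0, but copying its bits gives a tiny denormalized value; only the bit pattern 0x3F800000 copies to 1.0.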

What are the addressing modes for VSTR?

[Rn] and [Rn, constant]

FPU instruction cycle counts

(1 card)

What are the cycle counts for FPU instructions?

Back

Division by an Arbitrary Constant

(2 cards)

How can you float divide using reciprocal multiplication?

Back

How can you divide using integer multiplication?

Back
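The answers to these two cards are not recorded here, so the following is only a generic sketch of both ideas in C, using division by 3 as an assumed example. The float version multiplies by a precomputed reciprocal; the integer version multiplies by the well-known constant 0xAAAAAAAB (the ceiling of 2^33 / 3) and shifts right by 33. Both function names are hypothetical.

```c
#include <stdint.h>

/* Float division by 3 as a multiplication by the reciprocal.
   The result can differ from x / 3.0f in the last bit. */
float divide_by_3_recip(float x) {
    return x * (1.0f / 3.0f);
}

/* Unsigned integer division by 3 as a multiply and shift.
   The 64-bit product keeps the high bits that the shift selects. */
uint32_t divide_by_3_mul(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}
```

The integer version is exact for every 32-bit unsigned input, while the float version trades a possible last-bit difference for avoiding a divide instruction.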