Numerical precision in computational methods

Last reviewed: October 27, 2013

Abstract

This paper describes the use of floating point numbers in computer programming. It examines how floating point numbers and binary coded decimal numbers are stored in memory, and evaluates each for accuracy and efficiency. Because floating point arithmetic often produces rounding errors, the use of guard bits to reduce these errors is also examined.

Floating Point Numbers

With the proliferation of increasingly complex software and hardware that perform tasks as varied as financial calculations and scientific experiments, the arithmetic involved in these operations has also grown cumbersome. Simple arithmetic is no longer sufficient for computing quantities such as telephone call rates, which are billed by the second and require six or more fractional digits, or the Gross National Product of a country, which may require fifteen digits to the left of the decimal point (Cowlishaw, 2003, p. 3). Handling such a wide range of magnitudes efficiently requires floating point numbers.

Computers cannot work with real numbers directly: most real numbers have infinitely many digits, and no finite amount of memory can store them exactly. The floating point number is designed to work around this problem by storing a finite approximation. Floating point numbers can be either single precision or double precision. Single precision numbers have about eight significant decimal digits, while double precision numbers have about seventeen (Mak, 2003, p. 9). These numbers are represented in base 2 instead of base 10 to conform to the internal binary form that computers require (Mak, 2003, p. 9).
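The effect of storing a base 10 value in base 2 at two different precisions can be illustrated with a minimal Python sketch. It assumes that Python's `float` is a double precision number and uses the standard `struct` module to round a value to single precision. The helper name `to_single` is hypothetical and is not taken from any of the cited sources.

```python
import struct


def to_single(x: float) -> float:
    """Round a double precision value to the nearest single precision value."""
    # Packing as a 32-bit float and unpacking again discards the extra bits.
    return struct.unpack(">f", struct.pack(">f", x))[0]


one_tenth_double = 0.1
one_tenth_single = to_single(0.1)

# Neither stored value is exactly 0.1, because 0.1 has no finite base 2 expansion.
print(f"{one_tenth_double:.20f}")  # 0.10000000000000000555
print(f"{one_tenth_single:.20f}")  # 0.10000000149011611938
```

The single precision value departs from 0.1 after about eight significant digits and the double precision value after about seventeen, in line with the figures given above.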

Both kinds of number are stored in the computer's memory in the same way, with single precision requiring a total of 32 bits and double precision requiring 64 (Mak, 2003, p. 34). For both single and double precision numbers, one bit is used to store the sign of the number, which is either "0" for positive or "1" for negative (Mak, 2003, p. 34). The next part of the number that is stored in memory is the exponent, which comprises eight bits for single precision and eleven bits for double (Mak, 2003, p. 34). Finally, the fractional value of the number is stored using 23 bits for single precision and 52 bits for double (Mak, 2003, p. 34). Because floating point numbers occupy a small, fixed amount of memory, calculations with them are fast and efficient. However, the same fixed size means that most values can only be approximated, which leads to errors in both rounding and computation.
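The field layout described above can be inspected with a short Python sketch. The function names `decode_single` and `decode_double` are hypothetical. The exponent biases mentioned in the comments (127 and 1023) come from the IEEE 754 standard, not from the sources cited in this paper.

```python
import struct


def decode_single(x: float) -> tuple:
    """Split a 32-bit float into (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with a bias of 127
    fraction = bits & ((1 << 23) - 1)  # 23 bits
    return sign, exponent, fraction


def decode_double(x: float) -> tuple:
    """Split a 64-bit float into (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, fraction


# -6.5 is -1.625 x 2^2, so the sign bit is 1 and the unbiased exponent is 2.
print(decode_single(-6.5))  # (1, 129, 5242880)
print(decode_double(-6.5))  # (1, 1025, 2814749767106560)
```

In both cases the field widths add up to the totals given above: 1 + 8 + 23 = 32 bits and 1 + 11 + 52 = 64 bits.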

Binary Coded Decimal

Binary coded decimal (BCD) representations incorporate many of the advantages of floating point numbers while adding the accuracy of decimal encodings (Sanchez & Canton, 2007, p. 52). Since such representations do not have formally established standards, each machine or software package implements them in its own, often incompatible, way (Sanchez & Canton, 2007, p. 52). When BCD encoding is used, these formats can be useful for input-output operations and arithmetic calculations.

One type of BCD encoding that is often used is called BCD12. It is so named because it requires twelve bytes of memory storage, or 96 bits in total (Sanchez & Canton, 2007, p. 52). The four high-order bits of the first byte represent the sign of the number, either 0000B for a positive number or 0001B for a negative one (Sanchez & Canton, 2007, p. 52). The four low-order bits of the first byte represent the sign of the exponent, which is again either 0000B for a positive exponent or 0001B for a negative one (Sanchez & Canton, 2007, p. 52). The next two bytes encode the magnitude of the exponent as four BCD digits, ranging from 0000 to 9999 (Sanchez & Canton, 2007, p. 52). The remaining nine bytes, which hold eighteen BCD digits, compose the significand field (Sanchez & Canton, 2007, p. 52).
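The twelve-byte layout can be sketched in Python as follows. This is an illustration under stated assumptions, not the encoding routine from Sanchez and Canton: the function names are hypothetical, and the sketch assumes that digits are packed two per byte with the most significant digit first and that short significands are padded on the right with zeros.

```python
def pack_bcd_digits(digits: str) -> bytes:
    """Pack an even-length string of decimal digits, two digits per byte."""
    if len(digits) % 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected an even number of decimal digits")
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def encode_bcd12(negative: bool, exponent: int, significand_digits: str) -> bytes:
    """Build a 12-byte record: sign byte, 2 exponent bytes, 9 significand bytes."""
    if not -9999 <= exponent <= 9999:
        raise ValueError("exponent magnitude must fit in four BCD digits")
    if not (significand_digits.isascii() and significand_digits.isdigit()):
        raise ValueError("significand must consist of decimal digits")
    if len(significand_digits) > 18:
        raise ValueError("significand holds at most 18 digits")
    # High nibble: sign of the number. Low nibble: sign of the exponent.
    sign_byte = (int(negative) << 4) | int(exponent < 0)
    exponent_field = pack_bcd_digits(f"{abs(exponent):04d}")
    # Assumption: pad short significands on the right with zeros.
    significand_field = pack_bcd_digits(significand_digits.ljust(18, "0"))
    return bytes([sign_byte]) + exponent_field + significand_field


record = encode_bcd12(negative=True, exponent=-3, significand_digits="12345")
print(len(record))   # 12
print(record.hex())  # 110003123450000000000000
```

Because every hexadecimal digit of the record is also a decimal digit of the number, the value can be read or printed without any base conversion.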

The BCD format allows easy conversion to and from decimal form and can make use of the processor's BCD arithmetic instructions to speed calculations of this type. However, BCD formats store numbers in memory much less efficiently than floating point encodings do. This is because BCD encoding uses more bits than necessary for some fields, such as four bits for the sign when only one is required (Sanchez & Canton, 2007, p. 53). Also, each BCD digit occupies four bits but uses only ten of the sixteen combinations that those bits could encode, leaving six unused. At the byte level, two packed BCD digits use only 100 of the 256 possible codings, so 156 are wasted (Sanchez & Canton, 2007, p. 53).
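The storage cost of the unused four-bit combinations can be checked by enumeration. The following Python sketch counts how many of the 256 byte values are valid packed BCD, then compares the bits needed for an eighteen-digit significand in BCD and in plain binary. The function name `is_packed_bcd` is hypothetical.

```python
def is_packed_bcd(byte: int) -> bool:
    """Return True when both four-bit halves of the byte are decimal digits."""
    return (byte >> 4) <= 9 and (byte & 0x0F) <= 9


valid = sum(is_packed_bcd(b) for b in range(256))
unused = 256 - valid
print(valid, unused)  # 100 156

# Bits needed to hold any 18-digit decimal value.
bcd_bits = 18 * 4                        # four bits per BCD digit
binary_bits = (10**18 - 1).bit_length()  # plain binary integer
print(bcd_bits, binary_bits)  # 72 60
```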

Rounding Errors

Though the floating point number is more efficient for computations than either exponential or binary coded decimal representations, it still has significant drawbacks, such as rounding errors. A calculation is typically performed first and its result is then rounded to the number of digits that can be stored, which introduces an error into the final value. These rounding errors are compounded when further arithmetic is performed on the rounded results, often producing much larger errors.
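How rounding errors compound can be seen in a small Python sketch that adds 0.1 repeatedly. It assumes Python's `float` is a double precision number and uses the standard `decimal` module for comparison. The function name `accumulate` is hypothetical.

```python
from decimal import Decimal


def accumulate(step: float, count: int) -> float:
    """Add `step` to a running total `count` times, rounding after every addition."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


small = accumulate(0.1, 10)
print(small == 1.0)  # False
print(small)         # 0.9999999999999999

# The error after 100,000 additions is far larger than after 10.
small_error = abs(small - 1.0)
large_error = abs(accumulate(0.1, 100_000) - 10_000.0)
print(large_error > small_error)  # True

# Decimal arithmetic represents 0.1 exactly, so the same sum is exact.
decimal_total = sum(Decimal("0.1") for _ in range(10))
print(decimal_total == Decimal("1"))  # True
```

Each individual addition is off by only a tiny amount, but because every later addition starts from an already rounded total, the discrepancy grows with the number of operations.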

References

7 sources cited in this paper

Cowlishaw, M.F. (2003). Decimal floating-point: Algorism for computers. Proceedings of the 16th IEEE Symposium on Computer Arithmetic. Retrieved from http://www.cs.tufts.edu/~nr/cs257/archive/mike-cowlishaw/decimal-arith.pdf
Govindarajalu, B. (2004). Computer architecture and organization. India: McGraw-Hill.
Mak, R. (2003). Java number cruncher: The Java programmer’s guide to numerical computing. Upper Saddle River, NJ: Pearson Education, Inc.
Sanchez, J. & Canton, M.P. (2007). Microcontroller programming. Boca Raton, FL: Taylor & Francis.

Cite This Paper

PaperDue. (2013). Numerical precision in computational methods. PaperDue. https://paperdue.com/essay/numerical-precision-125659
