Another month has passed; time for another T-SQL Tuesday post. The monthly blog party is hosted by Brent Ozar (b|t) this time. And he wants us to write about our favourite (or least favourite) data type.

I don’t really have a data type I like. I don’t hate any data type either. What I do hate is seeing the wrong data type used for the job at hand.

So instead of doing what Brent asks, I will write a short post about a very common misconception about one data type, a misconception that triggers me whenever I read it. Which happens a lot.

# Precise versus approximate

One of the most widespread falsehoods about data types is that *float* (and its lower-precision sibling *real*, which in SQL Server is equivalent to *float(24)*) should never be used because it is imprecise. And when people say that, they often back it up with examples showing how computations involving *float* data yield incorrect results: multiply a price by a quantity, subtract the amount paid, and the remaining balance is one cent off. That demo is then used as “proof” that *float* is a bad data type that should always be avoided.
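The rounding residue in such demos is easy to reproduce outside the database. As a sketch (in Python rather than T-SQL; Python’s built-in float and SQL Server’s default *float* are both IEEE 754 double-precision values, so they round the same way):

```python
# The classic "float is bad for money" demo, using IEEE 754 doubles
# (the same binary representation as SQL Server's default float).
item1 = 0.10   # price of the first item
item2 = 0.20   # price of the second item
paid = 0.30    # the customer paid the exact total

balance = item1 + item2 - paid
print(balance)   # not exactly 0.0: a tiny binary rounding residue remains
```

Neither 0.10, 0.20, nor 0.30 has an exact binary representation, so the balance comes out as a tiny non-zero number instead of 0.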

What’s weird is that if I run that same query with data type *int*, I probably also get the wrong results. Yet nobody claims *int* is a terrible data type that should not be used. When I use data type *datetime*, the same repro will likely result in an error. When I use data type *varchar* and use it to compute the sum of 123 and 456, the result will be 123456 and nobody will take that as proof that *varchar* sucks.

If I seriously made the claims above, people would quickly respond that it’s my own fault for using a data type in the wrong situation. So here’s my question: why does nobody realize that the example of multiplying quantity by price and getting a rounding error is **also** not “proof” that *float* sucks? If anything, it merely proves that *float* is not the right choice for computing prices.

# Back to high school

To explain why *float* is not a bad data type at all, I’ll need to take you back to your high school days. To science class.

Remember the practicum part of your science class? Remember how you were taught to use instruments that measure size, weight, distance, speed, power, or whatever else you had to measure? And most importantly, remember how you were taught to assess how precise the instrument and your reading were? When you measured a length, you knew the centimetres (sorry, Americans), but how sure were you about the number of millimetres you wrote down? Would you dare to write down fractional millimetres? In those cases, you said that your measurement was significant to the centimetre; you might write down your best estimate of the millimetres for good measure, but any more digits than that were considered wrong. And of course, when you were allowed to use a high-precision measuring device, you had more significant digits; but even those instruments do not have infinite precision.

So let’s say you measured an object, and you wrote down the length as 512.5 centimetres, the width as 371.3 centimetres, and the depth as 110.7 centimetres; all of these significant to the centimetre, so they all had 3 significant digits. To compute the volume, you had to multiply the numbers in metres: 5.125 * 3.713 * 1.107 = 21.065241375 m^{3}. If you gave that as the answer on your test, you’d get some points deducted, because you gave way too many digits. Your measurements were trustworthy up to three significant digits, and that means that the answer needs to be given in three significant digits (this is a simplification; see here for the “beyond high school level” version), so the correct answer was 21.1 m^{3}.
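The same bookkeeping can be sketched in code (Python rather than T-SQL; the `to_sig_figs` helper is my own illustration, not a standard function):

```python
from math import floor, log10

def to_sig_figs(x, n):
    """Round x to n significant digits (illustrative helper)."""
    return round(x, -int(floor(log10(abs(x)))) + (n - 1))

# Three measurements, each trustworthy to three significant digits.
length, width, depth = 5.125, 3.713, 1.107   # metres

volume = length * width * depth
print(volume)                  # ~21.065241375: far more digits than the data supports
print(to_sig_figs(volume, 3))  # 21.1, matching the precision of the inputs
```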

Now let’s assume your teacher asked you to compute the energy stored in a single plutonium atom, using Einstein’s famous formula: E = mc^{2}. This formula expects the mass and the speed in consistent units; for this example, we’ll express the mass of that single atom in grammes. If I did my math right and didn’t miscount the zeros, that number is 0.0000000000000000000004052. A long number, but there are only four significant digits. Scientists prefer to write this in a different way, as 4.052 * 10^{-22} gram.

Similarly, c (the speed of light) needs to be expressed in metres per second. Let’s say we have measured / computed this speed with four significant digits as well. That would then be 299900000 m/s. Written this way, it is not even immediately obvious whether there are four significant digits, or five and the fifth just happens to be a zero. So in this case there’s even more reason to write it as 2.999 * 10^{8}.

# Return to the database

Let’s return to the database. Let’s figure out a way to store these numbers appropriately.

Could we use *decimal* (or its synonym *numeric*)? Well, yes. We can. We need 25 digits after the decimal place for 0.0000000000000000000004052, and 9 digits before the decimal place for 299900000, so both would fit in a decimal(34,25). But if you try to compute c^{2} first, so you can then multiply it by the mass, you’ll run into an overflow error.

Okay. You can use different definitions for each. Play around with the precision and scale of the various variables and you’ll find some combinations that don’t return an error. But now you’ll notice different results. I got 0.0000364437000000000000000 as the answer in one of my tests. In another test, the answer was 0.0000360000000000000000000. Still correct, but only to two significant digits. We had four in our input! For sure, the teacher will dock points for this!
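Python’s decimal module uses a different precision model than SQL Server’s *decimal* (a single count of significant digits rather than precision and scale), but it can sketch the same effect: when the working precision is too tight, significant digits silently disappear.

```python
from decimal import Decimal, getcontext

m = Decimal("4.052E-22")   # mass of the atom in grammes, 4 significant digits
c = Decimal("2.999E+8")    # speed of light in m/s, 4 significant digits

getcontext().prec = 28     # generous working precision
print(m * c * c)           # 0.000036443692052: all four input digits survive

getcontext().prec = 2      # starve the computation of working precision
print(m * c * c)           # 0.000036: only two significant digits remain
```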

And what if you are designing a general-use database for a science faculty, where measurements will be stored and computations run for all departments? What data type should you use for a (measured or computed) mass, when it’s used by nuclear physicists to store the mass of a single plutonium atom, but also by astronomers to store the mass of our Sun?

And this is where *float* makes its big entrance! With *float* data types, the mass of that plutonium atom is stored as 4.052E-22 (which is the standard “computer” representation of 4.052 * 10^{-22}). The speed of light is stored as 2.999E+08. The square of that is then computed and stored as 8.994001E+16, and multiplying this by the mass of a plutonium atom results in 3.64437E-05, or as scientists would write: 3.64437 * 10^{-5}. Is that last digit really correct? Are we sure there’s no rounding error in the computation? I don’t know. And I don’t care. Remember, we only had four significant digits in our measurements anyway, so that means only four, or at most five, digits should be retained: 3.644 * 10^{-5}. Whether or not that seven was correct makes no difference at all!
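Sketched in Python (whose float is the same IEEE 754 double as SQL Server’s default *float*), the whole computation just works, with no overflow and no fiddling with precision and scale:

```python
m = 4.052e-22   # mass of one plutonium atom in grammes (4 significant digits)
c = 2.999e+8    # speed of light in m/s (4 significant digits)

c_squared = c * c
energy = m * c_squared      # E = m * c^2

print(c_squared)            # 8.994001e+16: no overflow
print(energy)               # ~3.64437e-05, i.e. 3.644 * 10^-5 to four digits
```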

# Conclusion

There are no “bad” data types in SQL Server. Data types can be inappropriate for the job. For working with money, which is by definition precise numeric data, data types such as *int*, *varchar*, and *float* won’t work. But *decimal* and *money* are ideal for this job!

But when dealing with scientific data, often the results of measurements and computations based on measurements, we are handling data that by its very definition is an approximation. So this data is best handled by a data type designed for approximate-number data, and that would be *float*.

## Comments

Hear, hear!

[…] Hugo Kornelis levies a defense of floating point data types: […]

[…] Float Does Not Suck by Hugo Kornelis – “To explain why float is not a bad data type at all, I’ll need to take you back to your high school days. To science class.” […]

I once looked after an ERP SQL Server db which had been set up with floats as primary keys – amazingly enough, it even survived moving from 32-bit to 64-bit CPUs without throwing a wobbly

Uhhhmm … yikes, I guess?

Glad it worked for you. But believe me, this is really one of those “just because it works …” cases. Floating point data is even worse as a primary key than guids. In fact, I’d struggle to come up with a worse choice. (Perhaps sql_variant…)