Bitten by the floating point bug

My game lives in a world of integers. In a modern era of powerful floating point geometry, for reasons of optimisation and simplicity, the main mechanic requires me to use only the wholest of numbers. However, midway through development, I decided it might be useful to use box2D in the future. I have my own very specific physics engine within the integer maths framework of my game, but box2D offers very fast solutions to a lot of mechanics my own physics can’t handle. Potentially dumping parts of my world into it, and pulling out the solutions as and when it was apt, seemed like a reasonable thing to do (a lot of games use box2D for their physics, including some rather famous ones; I love this anecdote from GDC). Unfortunately, according to the documentation, box2D works best at scales from 10cm to 10m, so my base scale of 1 would not sit well. Taking this into consideration, I decided to introduce a world scale and have my entities give their positions and sizes in this scale, converting back and forth to integer maths where necessary. In many places I held onto both scale systems for speed: a test that only converted to ints at the last minute, but every frame, caused a noticeable slow-down from all those casts.

Then I came across a bug where entities were appearing one pixel off from where I was telling them to appear. So I delved into the debugger and found a common issue with floats, but in a form I thought would be “safe”. To quickly recap, floats are stored internally using a mantissa and an exponent, with the value being something like mantissa * 2^exponent. If you check out this calculator you can see that the number 123 is actually stored as 1.921875 * 2^6. The problem is that most decimal fractions have no exact binary form, so the number 0.7 might be stored as something like 0.69999999, and if you multiply that by 10.0 and naively cast it to an int you could end up with the integer value 6 rather than 7. (As someone kindly pointed out, the value 7 itself is stored exactly as a float, but as soon as you start performing operations on it, the certainty disappears.)
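As a rough illustration of that recap, here is a minimal sketch that pulls the mantissa and exponent out of a 32-bit float. Python is used purely for illustration; its own floats are 64-bit doubles, so the helper names to_float32 and decompose are hypothetical, and to_float32 simply rounds a value to what a 32-bit float would store.

```python
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def decompose(x):
    """Split a positive, normal 32-bit float into (mantissa, exponent).

    The stored value equals mantissa * 2**exponent, with 1 <= mantissa < 2.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = ((bits >> 23) & 0xFF) - 127     # 8 exponent bits, biased by 127
    mantissa = 1 + (bits & 0x7FFFFF) / 2**23   # 23 fraction bits plus the implicit leading 1
    return mantissa, exponent


print(decompose(123.0))            # (1.921875, 6)
print(decompose(7.0))              # (1.75, 2)
print(f"{to_float32(0.7):.10f}")   # 0.6999999881
```

Whole numbers like 123 and 7 come out exact, while 0.7 lands slightly below the value that was asked for.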

Unfortunately, with my world scale of 0.1, I was taking an object of size 4, turning it into 0.4 and then later dividing by the same scale to get back to 4 for some internal machination. In this case, sadly, it turned it back into a 3. I scratched my head and ran some tests dividing and multiplying numbers by 0.1 in the function where the problem was occurring, but these numbers all came out fine. As both numbers were stored as variables, I tried dividing the 0.4 variable by a float literal 0.1 and it gave me 3, but doing the same with a 0.4 float literal and the 0.1 variable gave me 4. The debugger displayed the variable values cleanly as 0.4 and 0.1, not as the near-miss approximations that would have hinted at a problem.

One solution was to use roundf, which gave me the number I was looking for by rounding the float to the nearest whole number. But if this could happen in this one spot, it could happen anywhere else I was doing the same sort of calculation, and I didn’t want to spend the time introducing a rounding function everywhere in my code, nor pay the CPU overhead of calling it. In the end I decided that, as I wasn’t using box2D now and could kick that can down the road, the thing to change was the world scale itself: I would drop it, and do all the division necessary to get objects into box2D as and when I end up using it.
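The sketch below shows one way a 4 can come back as a 3, and how rounding instead of truncating avoids it. It is an assumption about the kind of mismatch involved (a 64-bit 0.4 divided by a 32-bit 0.1), not a diagnosis of the actual code. Python is used for illustration only, and the names to_float32, SCALE_F32, to_int_truncating and to_int_rounding are all hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Assumed world scale, held as a 32-bit float: slightly MORE than 0.1.
SCALE_F32 = to_float32(0.1)


def to_int_truncating(world_size, scale):
    """Naive conversion back to integer units: divide, then chop the fraction."""
    return int(world_size / scale)


def to_int_rounding(world_size, scale):
    """Convert back by rounding to the nearest whole number.

    Python's round() breaks exact .5 ties differently from C's roundf,
    but that does not matter for values this close to an integer.
    """
    return round(world_size / scale)


# A 64-bit 0.4 divided by the 32-bit scale is about 3.99999994.
print(to_int_truncating(0.4, SCALE_F32))              # 3
print(to_int_rounding(0.4, SCALE_F32))                # 4

# When both values are 32-bit floats, 0.4 is exactly 4 times 0.1.
print(to_int_truncating(to_float32(0.4), SCALE_F32))  # 4
```

This also fits the observation that the result changed depending on which operand was a literal and which was a variable: the two can end up at different precisions, even though both display as 0.4 and 0.1.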

I guess this is just one more example of where future-proofing development has come back to bite me in the ass. That I introduced this issue before being advised against future-proofing my project makes me feel a little better, but every day seems like a constant struggle to get the game closer to being finished, and wasting hours on bugs like this is not where I want to spend my time (though learning a little something new about coding always feels worthwhile).

7 thoughts on “Bitten by the floating point bug”

  1. Not a bug. Floating point numbers are represented as binary digits and thus can’t precisely represent certain numbers. Review “What every computer scientist should know about floating-point arithmetic”
    http://www.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf

    If you want an example of a real FP bug, there was a real bug in the floating point division operation on Pentiums: http://www.intel.com/support/processors/pentium/sb/CS-013007.htm

    • I should say that I never thought this was a bug in how floating point numbers worked, just that it was a bug in my code caused by my use of floating point numbers. Unfortunately my decision to use a pun for a title has also bitten me in the ass.
      (thanks for the link though)

  2. Instead of using a scale of 0.1, use a scale of 0.125, or something else that is exactly representable in a floating point number.

  3. It is not a bug. *Binary* Floating Point was invented for the convenience of the computer, not the human. It uses a binary exponent for the fractional part of a number, instead of the decimal exponent that everyone is taught in school. Floating point was also designed to have a very large scale (range of numbers) at the sacrifice of precision. For calculating the mass of an atom or star, or your location in your favorite FPS, this error is fine, but for issues dealing with money, financial calculations, or anything that you do with decimal fractions, it is NOT ok to use binary floating point.

    Sadly, I have never met another programmer who knew about the difference between binary and decimal floating point, and they carry on using binary floating point (floats and doubles) for all their calculations. It is also very sad that most of the “popular” programming languages being used right now only support data types native to a modern CPU, i.e. binary floating point only. Don’t they teach this in college any more? Thankfully languages like COBOL provide types for properly working with decimal numbers, or we would all be in trouble right now (well, more trouble anyway).

    Ironically, your chosen scale multiplier of 0.1 happens to be one of the numbers that binary floating point cannot represent exactly. It is stored as an approximation like 0.100000001490116 (as a 32-bit float) or whatever. Typically this won’t be a problem until you start to carry out some multiplication or division on the number and increase the error. The value 0.1 also happens to be how most would write “ten cents” in money calculations, and imagine a large bank that had a system where the programmers used binary floating point for your savings account, or for doing their million and billion dollar transactions…

    If you want to work with decimal floating point and have your math work, then use integers scaled up to your largest rounding policy (i.e.: 10000 would be 1 dollar with the two “cents” digits, plus two digits of precision after that for calculations and rounding), or use a library designed for decimal numbers, like this one that has been around for over 20 years…

    http://speleotrove.com/decimal/

    @niggler: If you *really* want an example of a floating point bug, try this:

    http://www.ima.umn.edu/~arnold/disasters/patriot.html

    • I appreciate it’s not a bug. I now very much regret the need I feel to title my posts with something other than a direct description. Bad puns are not appreciated on the internet. Thanks for the verbiage and links.

      • If it makes you happy, I appreciated the pun. And I enjoyed your story, from the perspective of a novice (computer scientist and user of floating point numbers).
