2015年12月30日 星期三

Fixed Point Math in PHP with BCMath, precision loss cases

When dealing with fixed point numbers, you have to be very careful – especially if you develop with PHP and MySQL. In this article, obstacles and subtleties of working with the PHP BCMath extension, MySQL fixed point expression handling and persisting fixed point data from PHP to MySQL are described. Despite the occurring barriers we try to figure out how to work with fixed point numbers and not to lose a digit.

Troubles with BCMath

BCMath documentation says:
For arbitrary precision mathematics PHP offers the Binary Calculator which supports numbers of any size and precision, represented as strings.
So BCMath function parameters should be represented as strings. Passing numeric values to bcmathcan lead to wrong results, the same precision loss as when we treat double value as string

Case 1

echo bcmul(776.210000, '100', 10) . PHP_EOL;
    echo bcmul(776.211000, '100', 10) . PHP_EOL;
    echo bcmul(776.210100, '100', 10) . PHP_EOL;

    echo bcmul(50018850776.210000, '100', 10) . PHP_EOL;
    echo bcmul(50018850776.211000, '100', 10) . PHP_EOL;
    echo bcmul(50018850776.210100, '100', 10) . PHP_EOL;
Results are:
77621.00
    77621.100
    77621.0100
    5001885077621.00
    5001885077621.100
    5001885077621.00 //here we can see precision loss
Never pass numeric values to BCMath functions, only string values that represent numbers. Even when not dealing with floating points, BCMath can output strange results:

Case 2

echo bcmul('10', 0.0001, 10) . PHP_EOL;
 echo bcmul('10', 0.00001, 10) . PHP_EOL;
 echo 10*0.00001 . PHP_EOL;
Results are:
0.0010
 0 // thats really strange!!!
 0.0001
The reason for this is that BCMath converts its arguments to strings, and there are cases in which a number’s string representation has exponential notation.

Case 3

echo bcmul('10', '1e-4', 10) . PHP_EOL; //outputs 0 as well
PHP is a weakly typed language and in some cases you can’t control input in a strict way – you want to process as many requests as possible.
For example we can “fix” Case 2 and Case 3 by applying sprintf transformation:
$val = sprintf("%.10f", '1e-5');
 echo bcmul('10', $val, 10) . PHP_EOL;
 // gives us 0.0001000000
but applying the same transformation can break Case 1 “proper” behaviour:
$val = sprintf("%.10f", '50018850776.2100000000');
 echo bcmul('10', $val, 10) . PHP_EOL;
 echo bcmul('10', 50018850776.2100000000, 10) . PHP_EOL;
 500188507762.0999908450 //WRONG
 500188507762.10 //RIGHT
So the sprintf solution is not suitable for BCmath. Assuming all user inputs are strings, we can implement a simple validator, catching all exponential notation numbers and converting them properly. This technique is done in php-bignumbers, so we can safely pass in arguments like 1e-20 and50018850776.2101 without losing precision.
echo bcmul("50018850776.2101", '100', 10) . PHP_EOL;
 echo bcmul(Decimal::create("50018850776.2101"), '100', 10) . PHP_EOL;

 echo bcmul(Decimal::create("1e-8"), '100', 10) . PHP_EOL;
 echo bcmul("1e-8", '100', 10) . PHP_EOL;

 echo bcmul(50018850776.2101, '100', 10) . PHP_EOL;
 echo bcmul(Decimal::create(50018850776.2101), '100', 10) . PHP_EOL;
 
 // Result
 // 5001885077621.0100
 // 5001885077621.0100
 // 0.00000100
 // 0
 // 5001885077621.00
 // 5001885077621.00982700
But the last two lines of the example show us that floating point caveats cannot be avoided by input parsing (which is completely logical – we can not deal with PHP internal double representation).

BCMath final guidelines

Never use floating point numbers as fixed point operation arguments. String conversion does not help, because we can not manage the precision loss in any way.
When using BCMath extension operations, be careful with arguments in exponential representation. BCMath functions do not process exponential arguments (i.e. ‘1e-8’) correctly, so you should convert them manually. Be careful, do not use sprintf or similar conversion techniques, because it leads to precision loss.
You can use the php-bignumbers library which handles input arguments in exponential form and provides users with fixed point math operations functions. However, its performance is worse than that of the BCMath extension, so it’s a kind of compromise between a robust package and performance.

MySQL and fixed point numbers

In MySQL, fixed point numbers are handled with the DECIMAL column type. You can read the official MySQL documentation for data types and precision math operations.
The most interesting part is how MySQL handles expressions:
Handling of a numeric expression depends on the kind of values the expression contains:
If any approximate values are present, the expression is approximate and is evaluated using floating-point arithmetic.
If no approximate values are present, the expression contains only exact values. If any exact value contains a fractional part (a value following the decimal point), the expression is evaluated using DECIMAL exact arithmetic and has a precision of 65 digits. The term “exact” is subject to the limits of what can be represented in binary. For example, 1.0/3.0 can be approximated in decimal notation as .333…, but not written as an exact number, so (1.0/3.0)*3.0 does not evaluate to exactly 1.0.
Otherwise, the expression contains only integer values. The expression is exact and is evaluated using integer arithmetic and has a precision the same as BIGINT (64 bits).
If a numeric expression contains any strings, they are converted to double-precision floating-point values and the expression is approximate.
Here is a short example that demonstrates fractional part cases:
mysql> CREATE TABLE fixed_point (
        ->   amount NUMERIC(40,20) NOT NULL
        -> ) engine=InnoDB, charset=utf8;
    Query OK, 0 rows affected (0.02 sec)

    mysql> INSERT INTO fixed_point (amount) VALUES(0.2);
    Query OK, 1 row affected (0.00 sec)

    mysql> SELECT amount, amount + 0.1, amount + 1e-1, amount + '0.1' FROM fixed_point;
    +------------------------+------------------------+---------------------+---------------------+
    | amount                 | amount + 0.1           | amount + 1e-1       | amount + '0.1'      |
    +------------------------+------------------------+---------------------+---------------------+
    | 0.20000000000000000000 | 0.30000000000000000000 | 0.30000000000000004 | 0.30000000000000004 |
    +------------------------+------------------------+---------------------+---------------------+
    1 row in set (0.00 sec)
It may seen quite straightforward, but let’s look at how to deal with it within PHP.

Precision math in PHP & MySQL

So now we have to persist our fixed point values from PHP into MySQL. The right way is to use prepared statements and placeholders within our queries. Then we do parameter binding and everything is safe and secure.
$amount_to_add = "0.01";
 $stmt = $dbh->prepare("UPDATE fixed_point SET amount = amount + :amount");
 $stmt->bindValue("amount", $amount_to_add);
 $stmt->execute();
When we bind a value to a statement placeholder, we can specify its type by the bindValue third argument. Possible types are represented by constants PDO::PARAM_BOOLPDO::PARAM_NULL,PDO::PARAM_INTPDO::PARAM_STRPDO::PARAM_LOB and PDO::PARAM_STMT. So the problem is that the PHP PDO extension does not have a decimal parameter type for binding. As a result, all math expressions in queries are treated as floating point expressions, not as fixed point expressions.
$dbh = new PDO("mysql:host=localhost;dbname=test", "root", "");
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $sql = "
        CREATE TABLE IF NOT EXISTS fixed_point (
            amount DECIMAL(43,20)
        )
    ";
    $dbh->query($sql);

    $dbh->query("DELETE FROM fixed_point");
    $dbh->query("INSERT INTO fixed_point VALUES(0.2)");

    $amount_to_add = "0.1";
    $stmt = $dbh->prepare("UPDATE fixed_point SET amount = amount + :amount");
    $stmt->bindValue("amount", $amount_to_add);
    $stmt->execute();

    $stmt = $dbh->prepare("SELECT amount FROM fixed_point");
    $stmt->execute();
    var_dump($stmt->fetchColumn());

    //output is string(22) "0.30000000000000004000"
If we want to take the advantage of prepared statements and work with fixed point numbers, the best way is to perform all math operations in PHP and save results to MySQL.
$amount_to_add = "0.1";
 $stmt = $dbh->prepare("SELECT amount FROM fixed_point");
 $stmt->execute();
 $amount = $stmt->fetchColumn();
 $new_amount = bcadd($amount, $amount_to_add, 20);
 $stmt = $dbh->prepare("UPDATE fixed_point SET amount=:amount");
 $stmt->bindValue("amount", $new_amount);
 $stmt->execute();
 $stmt = $dbh->prepare("SELECT amount FROM fixed_point");
 $stmt->execute();
 $amount_after_change = $stmt->fetchColumn();
 echo $amount_after_change . PHP_EOL;

Conclusion

We’ve reached the following conclusions:
  • Never use floating point numbers as fixed point operations arguments in BCMath PHP extension funcitons. Only strings.
  • BCMath extension does not work with string numbers in exponential representation
  • MySQL supports fixed point number expressions, but all operands have to be in decimal format. If at least one agrument is in exponential format or string, it is treated as floating point number and the expression is evaluated as floating point number.
  • PHP PDO extension does not have Decimal parameter type, so if you use prepared statements and binding parameters in SQL expressions that contain fixed point operands – you won’t get precise results.
  • To perform precise math operations in PHP+MySQL applications you can choose two ways. The first one is to process all operations in PHP and persist data to MySQL only with INSERT or UPDATEstatements. In this case you can use prepared statements and parameter binding. The second one is to build SQL queries manually (you can still use prepared statements, but you have to escape parameters by yourself) so all SQL math expressions are in decimal number representation.
My personal favorite approach is the first one: all math operations in PHP. I agree that PHP and MySQL may be not the best choice for applications with precision math, but if you chose this technology stack, it’s good to know that there is a way to deal with it the right way.

沒有留言:

張貼留言