atof(将字符串转化为浮点数的Java实现)

时间：2014-07-03 13:32:26 收藏：0 阅读：524

原文链接：http://www.loveyqq.tk/blog/2014/07/03/atof-jiang-zi-fu-chuan-zhuan-hua-wei-fu-dian-shu-de-javashi-xian/

atof，是C语言中的一个字符串转化为浮点数的函数，在Java在也有一个对应的实现，就是大家所熟悉的Double.parseDouble(String s)函数。

既然是讲atof的Java实现，肯定脱离不开C语言的实现，引用[我的算法学习之路]中的一句话

“stof是一个简单到爆的“算法””

事实却是如此，字符串转化为浮点数整个的算法核心只有一个，如何将字符‘0‘~‘9‘转化为计算机能识别的数字0~9，而在C语言中有一个很简单的转化方式：int x = (char)c - ‘0‘; 剩下的就是一些异常处理以及如果有效的得到该字符串的位数。

本文核心是想理解Java的实现，借此可以了解Java将字符串转为为浮点数的原理，通过这些代码可以看到，代码编写人注意了每一行代码的变量声明，以及if的逻辑控制提高效率：

<span style="font-size:18px;">public strictfp double doubleValue(){
        int     kDigits = Math.min( nDigits, maxDecimalDigits+1 );
        long    lValue;
        double  dValue;
        double  rValue, tValue;

        // First, check for NaN and Infinity values
        if(digits == infinity || digits == notANumber) {
            if(digits == notANumber)
                return Double.NaN;
            else
                return (isNegative?Double.NEGATIVE_INFINITY:Double.POSITIVE_INFINITY);
        }
        else {
            if (mustSetRoundDir) {
                roundDir = 0;
            }
            /*
             * convert the lead kDigits to a long integer.
             */
            // (special performance hack: start to do it using int)
            int iValue = (int)digits[0]-(int)'0';
            int iDigits = Math.min( kDigits, intDecimalDigits );
            for ( int i=1; i < iDigits; i++ ){
                iValue = iValue*10 + (int)digits[i]-(int)'0';
            }
            lValue = (long)iValue;
            for ( int i=iDigits; i < kDigits; i++ ){
                lValue = lValue*10L + (long)((int)digits[i]-(int)'0');
            }
            dValue = (double)lValue;
            int exp = decExponent-kDigits;
            /*
             * lValue now contains a long integer with the value of
             * the first kDigits digits of the number.
             * dValue contains the (double) of the same.
             */

            if ( nDigits <= maxDecimalDigits ){
                /*
                 * possibly an easy case.
                 * We know that the digits can be represented
                 * exactly. And if the exponent isn't too outrageous,
                 * the whole thing can be done with one operation,
                 * thus one rounding error.
                 * Note that all our constructors trim all leading and
                 * trailing zeros, so simple values (including zero)
                 * will always end up here
                 */
                if (exp == 0 || dValue == 0.0)
                    return (isNegative)? -dValue : dValue; // small floating integer
                else if ( exp >= 0 ){
                    if ( exp <= maxSmallTen ){
                        /*
                         * Can get the answer with one operation,
                         * thus one roundoff.
                         */
                        rValue = dValue * small10pow[exp];
                        if ( mustSetRoundDir ){
                            tValue = rValue / small10pow[exp];
                            roundDir = ( tValue ==  dValue ) ? 0
                                :( tValue < dValue ) ? 1
                                : -1;
                        }
                        return (isNegative)? -rValue : rValue;
                    }
                    int slop = maxDecimalDigits - kDigits;
                    if ( exp <= maxSmallTen+slop ){
                        /*
                         * We can multiply dValue by 10^(slop)
                         * and it is still "small" and exact.
                         * Then we can multiply by 10^(exp-slop)
                         * with one rounding.
                         */
                        dValue *= small10pow[slop];
                        rValue = dValue * small10pow[exp-slop];

                        if ( mustSetRoundDir ){
                            tValue = rValue / small10pow[exp-slop];
                            roundDir = ( tValue ==  dValue ) ? 0
                                :( tValue < dValue ) ? 1
                                : -1;
                        }
                        return (isNegative)? -rValue : rValue;
                    }
                    /*
                     * Else we have a hard case with a positive exp.
                     */
                } else {
                    if ( exp >= -maxSmallTen ){
                        /*
                         * Can get the answer in one division.
                         */
                        rValue = dValue / small10pow[-exp];
                        tValue = rValue * small10pow[-exp];
                        if ( mustSetRoundDir ){
                            roundDir = ( tValue ==  dValue ) ? 0
                                :( tValue < dValue ) ? 1
                                : -1;
                        }
                        return (isNegative)? -rValue : rValue;
                    }
                    /*
                     * Else we have a hard case with a negative exp.
                     */
                }
            }

            /*
             * Harder cases:
             * The sum of digits plus exponent is greater than
             * what we think we can do with one error.
             *
             * Start by approximating the right answer by,
             * naively, scaling by powers of 10.
             */
            if ( exp > 0 ){
                if ( decExponent > maxDecimalExponent+1 ){
                    /*
                     * Lets face it. This is going to be
                     * Infinity. Cut to the chase.
                     */
                    return (isNegative)? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
                }
                if ( (exp&15) != 0 ){
                    dValue *= small10pow[exp&15];
                }
                if ( (exp>>=4) != 0 ){
                    int j;
                    for( j = 0; exp > 1; j++, exp>>=1 ){
                        if ( (exp&1)!=0)
                            dValue *= big10pow[j];
                    }
                    /*
                     * The reason for the weird exp > 1 condition
                     * in the above loop was so that the last multiply
                     * would get unrolled. We handle it here.
                     * It could overflow.
                     */
                    double t = dValue * big10pow[j];
                    if ( Double.isInfinite( t ) ){
                        /*
                         * It did overflow.
                         * Look more closely at the result.
                         * If the exponent is just one too large,
                         * then use the maximum finite as our estimate
                         * value. Else call the result infinity
                         * and punt it.
                         * ( I presume this could happen because
                         * rounding forces the result here to be
                         * an ULP or two larger than
                         * Double.MAX_VALUE ).
                         */
                        t = dValue / 2.0;
                        t *= big10pow[j];
                        if ( Double.isInfinite( t ) ){
                            return (isNegative)? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
                        }
                        t = Double.MAX_VALUE;
                    }
                    dValue = t;
                }
            } else if ( exp < 0 ){
                exp = -exp;
                if ( decExponent < minDecimalExponent-1 ){
                    /*
                     * Lets face it. This is going to be
                     * zero. Cut to the chase.
                     */
                    return (isNegative)? -0.0 : 0.0;
                }
                if ( (exp&15) != 0 ){
                    dValue /= small10pow[exp&15];
                }
                if ( (exp>>=4) != 0 ){
                    int j;
                    for( j = 0; exp > 1; j++, exp>>=1 ){
                        if ( (exp&1)!=0)
                            dValue *= tiny10pow[j];
                    }
                    /*
                     * The reason for the weird exp > 1 condition
                     * in the above loop was so that the last multiply
                     * would get unrolled. We handle it here.
                     * It could underflow.
                     */
                    double t = dValue * tiny10pow[j];
                    if ( t == 0.0 ){
                        /*
                         * It did underflow.
                         * Look more closely at the result.
                         * If the exponent is just one too small,
                         * then use the minimum finite as our estimate
                         * value. Else call the result 0.0
                         * and punt it.
                         * ( I presume this could happen because
                         * rounding forces the result here to be
                         * an ULP or two less than
                         * Double.MIN_VALUE ).
                         */
                        t = dValue * 2.0;
                        t *= tiny10pow[j];
                        if ( t == 0.0 ){
                            return (isNegative)? -0.0 : 0.0;
                        }
                        t = Double.MIN_VALUE;
                    }
                    dValue = t;
                }
            }

            /*
             * dValue is now approximately the result.
             * The hard part is adjusting it, by comparison
             * with FDBigInt arithmetic.
             * Formulate the EXACT big-number result as
             * bigD0 * 10^exp
             */
            FDBigInt bigD0 = new FDBigInt( lValue, digits, kDigits, nDigits );
            exp   = decExponent - nDigits;

            correctionLoop:
            while(true){
                /* AS A SIDE EFFECT, THIS METHOD WILL SET THE INSTANCE VARIABLES
                 * bigIntExp and bigIntNBits
                 */
                FDBigInt bigB = doubleToBigInt( dValue );

                /*
                 * Scale bigD, bigB appropriately for
                 * big-integer operations.
                 * Naively, we multiply by powers of ten
                 * and powers of two. What we actually do
                 * is keep track of the powers of 5 and
                 * powers of 2 we would use, then factor out
                 * common divisors before doing the work.
                 */
                int B2, B5; // powers of 2, 5 in bigB
                int     D2, D5; // powers of 2, 5 in bigD
                int Ulp2;   // powers of 2 in halfUlp.
                if ( exp >= 0 ){
                    B2 = B5 = 0;
                    D2 = D5 = exp;
                } else {
                    B2 = B5 = -exp;
                    D2 = D5 = 0;
                }
                if ( bigIntExp >= 0 ){
                    B2 += bigIntExp;
                } else {
                    D2 -= bigIntExp;
                }
                Ulp2 = B2;
                // shift bigB and bigD left by a number s. t.
                // halfUlp is still an integer.
                int hulpbias;
                if ( bigIntExp+bigIntNBits <= -expBias+1 ){
                    // This is going to be a denormalized number
                    // (if not actually zero).
                    // half an ULP is at 2^-(expBias+expShift+1)
                    hulpbias = bigIntExp+ expBias + expShift;
                } else {
                    hulpbias = expShift + 2 - bigIntNBits;
                }
                B2 += hulpbias;
                D2 += hulpbias;
                // if there are common factors of 2, we might just as well
                // factor them out, as they add nothing useful.
                int common2 = Math.min( B2, Math.min( D2, Ulp2 ) );
                B2 -= common2;
                D2 -= common2;
                Ulp2 -= common2;
                // do multiplications by powers of 5 and 2
                bigB = multPow52( bigB, B5, B2 );
                FDBigInt bigD = multPow52( new FDBigInt( bigD0 ), D5, D2 );
                //
                // to recap:
                // bigB is the scaled-big-int version of our floating-point
                // candidate.
                // bigD is the scaled-big-int version of the exact value
                // as we understand it.
                // halfUlp is 1/2 an ulp of bigB, except for special cases
                // of exact powers of 2
                //
                // the plan is to compare bigB with bigD, and if the difference
                // is less than halfUlp, then we're satisfied. Otherwise,
                // use the ratio of difference to halfUlp to calculate a fudge
                // factor to add to the floating value, then go 'round again.
                //
                FDBigInt diff;
                int cmpResult;
                boolean overvalue;
                if ( (cmpResult = bigB.cmp( bigD ) ) > 0 ){
                    overvalue = true; // our candidate is too big.
                    diff = bigB.sub( bigD );
                    if ( (bigIntNBits == 1) && (bigIntExp > -expBias+1) ){
                        // candidate is a normalized exact power of 2 and
                        // is too big. We will be subtracting.
                        // For our purposes, ulp is the ulp of the
                        // next smaller range.
                        Ulp2 -= 1;
                        if ( Ulp2 < 0 ){
                            // rats. Cannot de-scale ulp this far.
                            // must scale diff in other direction.
                            Ulp2 = 0;
                            diff.lshiftMe( 1 );
                        }
                    }
                } else if ( cmpResult < 0 ){
                    overvalue = false; // our candidate is too small.
                    diff = bigD.sub( bigB );
                } else {
                    // the candidate is exactly right!
                    // this happens with surprising frequency
                    break correctionLoop;
                }
                FDBigInt halfUlp = constructPow52( B5, Ulp2 );
                if ( (cmpResult = diff.cmp( halfUlp ) ) < 0 ){
                    // difference is small.
                    // this is close enough
                    if (mustSetRoundDir) {
                        roundDir = overvalue ? -1 : 1;
                    }
                    break correctionLoop;
                } else if ( cmpResult == 0 ){
                    // difference is exactly half an ULP
                    // round to some other value maybe, then finish
                    dValue += 0.5*ulp( dValue, overvalue );
                    // should check for bigIntNBits == 1 here??
                    if (mustSetRoundDir) {
                        roundDir = overvalue ? -1 : 1;
                    }
                    break correctionLoop;
                } else {
                    // difference is non-trivial.
                    // could scale addend by ratio of difference to
                    // halfUlp here, if we bothered to compute that difference.
                    // Most of the time ( I hope ) it is about 1 anyway.
                    dValue += ulp( dValue, overvalue );
                    if ( dValue == 0.0 || dValue == Double.POSITIVE_INFINITY )
                        break correctionLoop; // oops. Fell off end of range.
                    continue; // try again.
                }

            }
            return (isNegative)? -dValue : dValue;
        }
    }</span>

上面就是整个实现的源代码，之所以列出来，省的各位去找了，下面就来一行行解读：

首先跟C语言中的实现一样，去掉了字符串前后的空格：

<span style="font-size:18px;"> in = in.trim()</span>

判断是否为空的字符串：

<span style="font-size:18px;"> int l = in.length();
 if (l == 0) throw new NumberFormatException("empty String");</span>

取出第一个字符，判断是否为正负数：

<span style="font-size:18px;"> int i = 0;
            switch (c = in.charAt(i)) {
                case '-':
                    isNegative = true;
                    //FALLTHROUGH
                case '+':
                    i++;
                    signSeen = true;
            }</span>

检查是否为Infinity（无穷大）或者NaN（不明确的数值结果，一般被除数为0会出现这个结果）

<span style="font-size:18px;">  c = in.charAt(i);

    if (c == 'N' || c == 'I');</span>

如果既不是Infinity或者NaN，则检查是为十六进制浮点数

<span style="font-size:18px;"> else if (c == '0');</span>

之后就是把字符串中的每个字符拆解出来放到array中，

Java提供了一个方法，将字符array转化为数字，即doubleValue(),

从doubleValue()中可以看到

<span style="font-size:18px;">int iValue = (int)digits[0]-(int)'0';</span>

直接通过了强制类型转换进行数值的转换，剩下的任务就是异常判断以及是否为科学计数法。

总结：从上面可以看出，从算法本身来讲，Java简直弱爆了，一个用C语言十几行代码就能实现的算法，用Java却达到了一百来行的代码。

atof(将字符串转化为浮点数的Java实现),布布扣,bubuko.com