Friday, January 26, 2018

Double-Precision Floating-Point Encodings

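The encodings of a double are as follows (columns: class, sign bit, 11-bit exponent, implied integer (J) bit in parentheses, 52-bit fraction):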

+Infinity   0   11..11 (1)  00..00
+Normals    0   11..10 (1)  11..11  the max is less than (1 SHL (2046-1023)) * 2 = 1 SHL 1024 = 2^1024; the Intel manual says it is less than 2^1023, which turns out to be incorrect
+Normals    0   00..01 (1)  00..00  the min is 2^-1022; checked the normal min and normal max in JavaScript
+Denormals  0   00..00 (0)  11..11  the max is less than 1 SHL (1-1023) = 2^-1022; explained below
+Denormals  0   00..00 (0)  00..01  the min is (1 SHL (1-1023)) SHR 52 = 2^-1074
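
The positive rows of the table can be checked by assembling each bit pattern and reinterpreting it as a double. The following is a minimal sketch in Python; the helper name bits_to_double and the variable names are made up for illustration and are not part of any library.

```python
import struct
import sys

def bits_to_double(sign, exponent, fraction):
    """Pack sign (1 bit), exponent (11 bits) and fraction (52 bits) into a double."""
    raw = (sign << 63) | (exponent << 52) | fraction
    # Reinterpret the 64-bit integer as an IEEE 754 double (big-endian on both sides).
    return struct.unpack(">d", struct.pack(">Q", raw))[0]

FRACTION_ONES = (1 << 52) - 1  # fraction field 11..11

pos_infinity = bits_to_double(0, 0x7FF, 0)              # exponent 11..11, fraction 00..00
normal_max = bits_to_double(0, 0x7FE, FRACTION_ONES)    # exponent 11..10, fraction 11..11
normal_min = bits_to_double(0, 0x001, 0)                # exponent 00..01, fraction 00..00
denormal_max = bits_to_double(0, 0x000, FRACTION_ONES)  # exponent 00..00, fraction 11..11
denormal_min = bits_to_double(0, 0x000, 1)              # exponent 00..00, fraction 00..01

for name, value in [
    ("+Infinity", pos_infinity),
    ("+Normals max", normal_max),
    ("+Normals min", normal_min),
    ("+Denormals max", denormal_max),
    ("+Denormals min", denormal_min),
]:
    print(f"{name:15s} {value!r}")
```

On a platform with IEEE 754 doubles, normal_max and normal_min should match sys.float_info.max and sys.float_info.min, and denormal_min should equal 2.0 ** -1074.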

The smallest fraction of a denormal is 00..01, not 00..00 (00..00 is reserved for zero). So if we assumed the integer (J) bit of Denormals.min were 1, its significand would be 1.00..01.

Working with 1.00..01 is not convenient, so the J bit of a denormal is taken as 0 and its exponent is taken as -1022 instead of -1023.
With this convention, Denormals.min can be calculated in the way shown above, and so can Denormals.max.
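
As a rough check of this convention, the sketch below computes a denormal as 0.fraction * 2^-1022 using exact rational arithmetic and compares it with the value the hardware decodes. The function names denormal_value and decode are hypothetical helpers written for this post.

```python
import struct
from fractions import Fraction

def denormal_value(fraction_bits):
    """Exact value of a positive denormal: (fraction / 2^52) * 2^-1022, J bit = 0."""
    return Fraction(fraction_bits, 1 << 52) * Fraction(1, 1 << 1022)

def decode(raw):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", raw))[0]

FRACTION_ONES = (1 << 52) - 1

# Denormals.min: fraction 00..01 gives 2^-52 * 2^-1022 = 2^-1074
print(denormal_value(1) == Fraction(1, 1 << 1074))

# Denormals.max: fraction 11..11 gives (1 - 2^-52) * 2^-1022, just below 2^-1022
print(denormal_value(FRACTION_ONES) < Fraction(1, 1 << 1022))

# Both agree exactly with the decoded bit patterns.
print(Fraction(decode(1)) == denormal_value(1))
print(Fraction(decode(FRACTION_ONES)) == denormal_value(FRACTION_ONES))
```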

+0          0   00..00 (0)  00..00  all zeros
-0          1   00..00 (0)  00..00

-Denormals  1   00..00 (0)  00..01  the max is (-1) * (+Denormals.min)
-Denormals  1   00..00 (0)  11..11  the min is (-1) * (+Denormals.max)
-Normals    1   00..01 (1)  00..00  the max is (-1) * (+Normals.min)
-Normals    1   11..10 (1)  11..11  the min is (-1) * (+Normals.max)
-Infinity   1   11..11 (1)  00..00
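
The negative encodings differ from the positive ones only in the sign bit. A small sketch of that, again with a hypothetical decode helper:

```python
import math
import struct

def decode(raw):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", raw))[0]

SIGN = 1 << 63                 # sign bit set, everything else zero
FRACTION_ONES = (1 << 52) - 1  # fraction field 11..11

neg_zero = decode(SIGN)
neg_denormal_max = decode(SIGN | 1)                           # closest to zero
neg_denormal_min = decode(SIGN | FRACTION_ONES)
neg_normal_max = decode(SIGN | (0x001 << 52))
neg_normal_min = decode(SIGN | (0x7FE << 52) | FRACTION_ONES)
neg_infinity = decode(SIGN | (0x7FF << 52))

# -0 compares equal to +0; only the sign bit tells them apart.
print(neg_zero == 0.0, math.copysign(1.0, neg_zero))
print(neg_denormal_max == -decode(1))
print(neg_infinity == -decode(0x7FF << 52))
```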

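NaNs are not detailed here, but the number of NaNs is easy to get:

A NaN has the same exponent as an Infinity (11..11) but a nonzero fraction. That leaves 2^52 - 1 fractions for each of the two signs, so
 Number of NaNs = 2 * (2^52 - 1) = 2^53 - 2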
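
The count can be reproduced with a few lines of arithmetic, plus a spot check that patterns with an all-ones exponent and a nonzero fraction really decode to NaN. This is only a sketch; the names below are illustrative.

```python
import math
import struct

def decode(raw):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", raw))[0]

FRACTION_BITS = 52
EXPONENT_ONES = 0x7FF << FRACTION_BITS  # exponent field 11..11

# Every fraction except 00..00 (which is Infinity), for both signs.
nans_per_sign = (1 << FRACTION_BITS) - 1
nan_count = 2 * nans_per_sign
print(nan_count == 2 ** 53 - 2)

# Spot check: smallest and largest fraction, both signs.
samples = [
    EXPONENT_ONES | 1,
    EXPONENT_ONES | nans_per_sign,
    (1 << 63) | EXPONENT_ONES | 1,
    (1 << 63) | EXPONENT_ONES | nans_per_sign,
]
print(all(math.isnan(decode(raw)) for raw in samples))
```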
