The other day I was on a customer site where a subject area in the data warehouse had been based on another division’s data warehouse.
There were a few changes due to different versions of the source system, but these were largely trivial. The main change was that this data warehouse was in SQL Server, whereas the original data warehouse database had been Oracle. In both cases the source system was an Oracle database.
The customer couldn’t get the new SQL Server data warehouse to balance back to the source system.
It didn’t take long to work out the cause: precision.
Here’s some background on Oracle:
The Oracle documentation describes the NUMBER datatype as follows:
The NUMBER datatype stores fixed and floating-point numbers. Numbers of virtually any magnitude can be stored … up to 38 digits of precision.
Columns using the NUMBER datatype can be defined using the following conventions:
- NUMBER – see below
- NUMBER(precision) – i.e.: scale is zero
- NUMBER(*, scale) – i.e.: precision is 38
- NUMBER(precision, scale)
Oracle always stores NUMBER values in a variable-length format as “mainframe” computational-3 (COMP-3) packed decimals. In English this means a value with p significant digits uses ROUND((p + s) / 2) + 1 bytes (s is 0 for positive values and 1 for negatives).
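That storage formula can be sketched in Python. This is a rough model of the documented formula, not Oracle internals, and it uses round-half-up to mirror Oracle's ROUND:

```python
def oracle_number_bytes(significant_digits: int, negative: bool = False) -> int:
    """Approximate on-disk bytes for one Oracle NUMBER value:
    ROUND((p + s) / 2) + 1, where p is the count of significant digits
    and s is 1 for negative values, 0 otherwise."""
    s = 1 if negative else 0
    # (p + s + 1) // 2 is integer round-half-up of (p + s) / 2
    return (significant_digits + s + 1) // 2 + 1

print(oracle_number_bytes(1))        # 2 bytes for a one-digit positive value
print(oracle_number_bytes(38))       # 20 bytes at full precision
print(oracle_number_bytes(5, True))  # 4 bytes for a five-digit negative value
```

So a full-precision NUMBER costs about 20 bytes, but small values cost far less, which is what the variable-length format buys you.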
All other numeric datatypes in Oracle (except for BINARY_FLOAT and BINARY_DOUBLE) are aliases for NUMBER with specific precision and scale values. For example, if you create a table with a column that has a datatype of INTEGER the column will really be NUMBER(38).
If a column is NUMBER without precision or scale, it will accept any value in the following list:
- Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
- Negative numbers from -1 x 10^-130 to -9.99...9 x 10^125 with up to 38 significant digits
As you can see the NUMBER datatype without precision or scale is very flexible. Unfortunately, it doesn’t move nicely to other relational databases. This brings us back to the original issue.
If you move a table from Oracle to SQL Server, then for basic datatypes, you do the following:
- CHAR(x) stays the same
- VARCHAR2(x) becomes VARCHAR(x)
- DATE becomes DATETIME (note: in SQL Server 2008 there’s now a DATE datatype, so you can leave these as DATE)
- NUMBER(x) becomes NUMERIC(x) **1
- NUMBER(x,y) becomes NUMERIC(x,y) **1
- NUMBER becomes ??? **2
There are two important points to make here:
**1: The maximum precision in both Oracle and SQL Server is 38. However, the range of values SQL Server can represent is much smaller than Oracle’s: NUMERIC values can only range from (-1 x 10^38) + 1 to (1 x 10^38) – 1. Changing to FLOAT(53) will allow any Oracle NUMBER value outside the SQL Server NUMERIC range to be stored, but only with 15 digits of precision… This lack of precision is why the FLOAT and REAL datatypes in SQL Server are referred to as “Approximate-number data types” in SQL Server Books Online.
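To see why FLOAT(53) only keeps about 15 digits, here’s a quick illustration in Python, whose float is the same IEEE 754 double-precision type that FLOAT(53) maps to:

```python
exact = 12345678901234567890   # a 20-digit integer, comfortably inside Oracle NUMBER
as_double = float(exact)       # what survives a round-trip through a 53-bit double

print(int(as_double))          # 12345678901234567168 (the low digits drift)
print(int(as_double) == exact) # False
```

Beyond roughly 15 significant digits, a double can only store the nearest representable value, so exact balancing back to the source is off the table.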
Note: In SQL Server, REAL is just an alias for FLOAT(24).
**2: NUMBER (without precision or scale) is non-trivial. If you have a column in SQL Server with a datatype of NUMERIC, what you actually get is NUMERIC(18,0).
This was largely unexpected; it certainly surprised me and the developers on the client site. Of course, it turned out to be the cause of my original issue: when you load data into a NUMERIC(18,0) column in SQL Server, any decimal digits are lost. The data warehouse failed to balance back to the source system because of rounding introduced by the implicit conversion from Oracle NUMBER columns into SQL Server NUMERIC columns (which were actually NUMERIC(18,0)).
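The effect is easy to reproduce with Python’s decimal module. This simulates the scale-0 conversion; the exact rounding mode of a real load depends on the tool, so treat it as a sketch:

```python
from decimal import Decimal, ROUND_HALF_UP

# Source side: Oracle NUMBER values carrying decimal digits
source_rows = [Decimal("10.40"), Decimal("10.40"), Decimal("10.40")]

# Target side: what lands in SQL Server NUMERIC(18,0); scale 0 discards decimals
target_rows = [v.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for v in source_rows]

print(sum(source_rows))   # 31.20
print(sum(target_rows))   # 30, so the warehouse no longer balances to the source
```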
Therefore, you can’t change NUMBER in Oracle to NUMERIC in SQL Server unless you know the NUMBER column in Oracle only contains smallish integer values.
What you need to do is look at the data in the column in Oracle and explicitly manage it into the best SQL Server option. A couple of options are:
- If the biggest value you have is 20 digits long in Oracle, I’d change NUMBER to be NUMERIC(38,10) in SQL Server.
- If the most significant decimal digits you have is two in Oracle, I’d change NUMBER to be NUMERIC(38,2) in SQL Server.
- And so on…
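One way to automate that judgement call is to profile the column first. The helper below is hypothetical (the name and the policy are mine): it scans the values and proposes a NUMERIC(38, s) whose scale covers the widest decimal part seen, capped so the biggest integer part still fits within 38 total digits.

```python
from decimal import Decimal

def suggest_sqlserver_type(values) -> str:
    """Hypothetical profiler: suggest a SQL Server NUMERIC(38, s)
    for data coming from an Oracle NUMBER (no precision/scale) column."""
    max_int_digits = 0
    max_dec_digits = 0
    for v in values:
        d = Decimal(str(v))
        sign, digits, exponent = d.as_tuple()
        dec_digits = max(0, -exponent)            # digits right of the point
        int_digits = max(1, len(digits) + exponent)  # digits left of the point
        max_int_digits = max(max_int_digits, int_digits)
        max_dec_digits = max(max_dec_digits, dec_digits)
    scale = min(max_dec_digits, 38 - max_int_digits)
    return f"NUMERIC(38,{scale})"

print(suggest_sqlserver_type(["12345678901234567890.5", "0.25"]))  # NUMERIC(38,2)
```

Run something like this over a sample (or all) of the column’s data before committing to a scale.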
But you have to manage the rounding explicitly to make sure values always fit in the future. And you need to balance future-proofing against a big value coming along with the fixed-size (non-variable) storage of the NUMERIC datatype in SQL Server too!
Out of interest, I had a look at the other two databases supported by WhereScape RED: IBM DB2 and Teradata.
The DB2 DECIMAL datatype and the SQL Server NUMERIC datatype are very similar to each other, with a few exceptions:
- The maximum precision is 31 in DB2
- DECIMAL without precision and scale gives you a DECIMAL(5,0) in DB2
- DECIMAL in DB2 can hold values from (-1 x 10^31) + 1 to (1 x 10^31) – 1
- DOUBLE can be used to hold bigger numbers
- DECIMAL storage works the same way as Oracle, so is variable by default
So, the answer is the same: moving from Oracle to DB2, you need to make the same judgement calls on the scale and precision to use.
There are even more issues moving to Teradata from Oracle. In fact, you’re going to have issues moving to Teradata from Oracle, SQL Server and DB2, especially if you’re still using Teradata V2R6.x.
- The maximum precision is only 18 prior to Teradata 12. From Teradata 12 onwards, this increases to 38. Beware using precision over 18, as some built in database functions only work if the precision is less than or equal to 18.
- NUMERIC without precision and scale gives you a NUMERIC(5,0) in Teradata (same as DB2)
- NUMERIC in Teradata can hold values from (-1 x 10^38) + 1 to (1 x 10^38) – 1 from Teradata 12 onwards. Before Teradata 12, the range is (-1 x 10^18) + 1 to (1 x 10^18) – 1.
- FLOAT can be used to hold bigger numbers
- In Teradata, NUMERIC storage is fixed-length, stepping up in bands as the declared precision grows. For example, a NUMERIC(19) uses twice the space of a NUMERIC(18), regardless of the value stored in the field.
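Those bands can be sketched as a simple lookup. The thresholds below are my reading of the Teradata manuals, so verify them against your release:

```python
def teradata_decimal_bytes(precision: int) -> int:
    """Fixed storage for a Teradata DECIMAL/NUMERIC, by precision band
    (assumed bands: 1-2, 3-4, 5-9, 10-18, 19-38 digits)."""
    for max_p, size in ((2, 1), (4, 2), (9, 4), (18, 8), (38, 16)):
        if precision <= max_p:
            return size
    raise ValueError("precision must be between 1 and 38")

print(teradata_decimal_bytes(18))  # 8 bytes
print(teradata_decimal_bytes(19))  # 16 bytes, double the space for one more digit
```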