WhereScape RED provides all needed mechanisms for multiple sources management when populating dimensions. The new “post update” feature, available from version 126.96.36.199, brings the next steps in term of performance.
Let’s says for instance that your customer dimension table must be loaded from a load_customer table that contains the whole history.
The business key is the customer number and the load table contains all related information as well (let’s says customer name, tel, address, country, creation date and modification date).
Naturally, we don’t want the process to check if every single customer of the whole history exists before updating the dim data or inserting a new record. It is time for a lookup populated dimension using the post update feature!
First thing to do: another load table… Its source obviously is load_customer and it needs only one column: the customer number. Let’s call this table load_customer_code.
In that example (to make it simple), we build ourselves the lookup table from the source. In many cases, the lookup data will come from somewhere else. It could be a list of customers flagged by the marketing, the list of modified/ created customers since the last load…
Now, when creating the dimension, all related customer information will come from load_customer but cust numbers will be loaded from the second table. When building the procedure, WhereScape RED will detect multiple sources and proposes to populate the dimension via a join or a lookup. We select “lookup”; the primary table obviously is load_customer_code because it contains all customer identifiers we want to process.
The business key definition box appears, select the customer number as key. Let’s Tick “Not Exists Select” checkbox, we just ensured in a single click that Red will only deal with new customers in the cursor processing part of the procedure. For that a “Not exists” statement (Oracle: Minus) is added in the cursor definition. It should be as follows:
DECLARE c_Load CURSOR LOCAL FAST_FORWARD READ_ONLY FOR SELECT code code FROM load_customer_code WHERE NOT EXISTS ( SELECT 1 FROM dim_customer WHERE ( load_customer_code.code = dim_customer.code OR ( load_customer_code.code IS NULL AND dim_customer.code IS NULL )) )
Last step, the lookup definition box appears. Let’s tick “post update” then define the join between the dim table and the source table (in our example, dim_customer and load_customer). The procedure is ready to boost!
When loading the table, what happens behind the scenes is:
1) Not exists option: the cursor defined will create all new records very quickly (as it contains only new customer codes).
2) Post update: In that example, the cursor creates blank records with only the business key column populated. At the end of the procedure, a global update (As it would be for a “set” mode) occurs. It updates existing records on one hand and populates related information for new data on the other.
Everything is built and ready, fast…
To go even further, if we can pre-extract in the load_table all modified / new customer information as source of the dimension, it will go faster as the post update will only deal with modified / new customers. Besides, the functionality will obviously allow more complex lookups with other tables containing additional customer related information for instance.
To conclude, combining the “not exist” option with the post update lookup will allow to minimize the cursor processing and then gain in performance. To finish with something that could be important: even if the cursor deals with records to be “created only” in theory, it will always try to update first the table before inserting. That is because the source could contain doubles for instance. This is something that advanced WhereScape RED developers may want to tune but at the cost of a custom modified procedure…