When the data volume is getting big, salvation always comes from partitioning. Cubes are no exceptions. I am always amazed to see how it can both simplify business user’s life and mine. Simply defining partitions by periods of time radically shortens the loading duration as well as improving performance reporting.
Let’s take an example: we want to build a basic sales cube holding 8 years of history. Data is changing nightly on the current year. We naturally do not want to rebuild the full cube every night and the users won’t want to stand around waiting for full scan queries. Partitioning by year will allow us to only rebuild the current year nightly. Trends and comparisons between periods will be a lot faster. Users, who will generally focus on current sales, will appreciate the performance. Less than 1/8 of the cube needs to be parsed each time they’re accessing current year’s data. Everybody wins.
WhereScape RED has been designed to be a great help to create your multi-partitioned cube. But before letting the magic happens, it’s all about finding a good partition key. Logically, the natural choice comes to a time period but not always. The choice really depends on data volume, data organization and history limits.
Using WhereScape Red, we have 3 options for partitioning. The list box “Partitioning method” is located in the properties of the cube, under the “Cube Processing / Partition”. This tab contains all related information. From there, it is pretty easy to define the way we populate the cube’s data as well as its dimensions, depending on the chosen method of partitioning.
Every cube possesses at least one partition. The first option, for small cubes, is to keep a single partition. This works when the data volume is not huge and the cube doesn’t need tuning in the first place. The population option in that case would logically be the full refresh mode.
The second one allows us to manually create partitions. For that, select “Manually managed multiple partitions” in the “partitioning method” dropdown list. Define the other parameters then exit the properties. By right clicking on the cube in the left pane, we choose “Add partition”. A dialog box appears with all the new partition attributes. We just have to fill the blanks. This option is handy if we want to create for instance a partition on the last few years only and keep another big one for all the previous history.
The third one is really powerful; you can set the partitioning to be automatically done. This option is available if the cube is to be partitioned by one numeric value. It should preferably be on a time period value like day, month, quarter or year. Once the key is defined, WhereScape RED will create all necessary partitions for us. By using the next options in the tab, we are able to optimize the query made by WhereScape RED to check if a new partition needs to be created as well as we are able to limit or not the number of partitions. The last option is very useful; we can ask WhereScape RED to populate only the last x partitions during the normal periodic process (daily in our example). It becomes really easy to replace some partitions and recreate only parts of the cube.
As a conclusion, partitioned cubes combined with a well structured dataflow upstream and an efficient fact table population strategy (such as fact table partitioning…) will ensure performance on the whole chain of data warehousing. All the users will see is their cubes ready when they arrive in the morning instead having to wait for mid-afternoon, and better performance – nice…