There are numerous ASO cubes among my flock. Usually the choice to use ASO was not arrived at lightly; it was (and is) for very specific technical reasons. Typically, the main reason I have for using ASO is the fast data loads and the ability to load oodles of noodles… I mean data. Yes, oodles of data. The downsides (with version 7.x) are that I give up calc scripts, incremental loading (although this has been somewhat addressed in later versions), native Dynamic Time Series, and some flexibility with my hierarchies, and I have to have just one database per application (you… uhh… were doing that already, right?). Also, due to the sparsity of much of the data, trying to use BSO in this particular instance would result in a very unwieldy cube.
I have a set of four cubes that are all very similar except for their Measures dimensions, which range from 10,000 to 40,000 members. This isn't huge, but in conjunction with the sizes of the other dimensions, the "maximum possible blocks" potential is incredible (side note for EAS: one of the most worthless pieces of information to know about your cube. Really, why?). The performance of these cubes is generally pretty acceptable (considering the amount of data), but occasionally user Excel queries (especially those involving attribute dimensions) really pound the server and take a while to come back. So I started looking into ways to squeeze out a little more performance.
Due to the nature of the aggregations in these cubes, they all have Dynamic hierarchies in the Accounts/Measures dimension. This is due to the use of the minus (-) operator, some label-only members, and shared members, all of which ASO is very particular about, especially in this version. All of the other dimensions have Stored hierarchies or are set to Multiple Hierarchies (such as the Time dimension, where MDX calcs get me some Year-To-Date members).
Actually, it turns out that all of these cubes have Measures dimensions that make it prohibitively difficult to change Measures from Dynamic to Stored, except for one. So, even though I would need to spin off a separate EIS metaoutline in order to build the dimension differently (these cubes are all generated from the same metaoutline, just with different filters), it might be worth it if I can get better retrieval performance on this cube, particularly when queries start to put some of the attribute dimensions or other MDX calcs into play.
What I needed was some method of testing the performance of typical retrieves against the two variants of the cube. I set up one cube as normal, loaded it up with data, and materialized a gig worth of aggregations. I had also copied the cube within EAS, made the tweak to change Measures from Dynamic to Stored, loaded the same data, and materialized a gig of aggregations there as well. At this point I had two cubes with identical data, one with a Dynamic Measures hierarchy (10,000 or so members) and one with a Stored hierarchy. Time to compare.
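For what it's worth, the load-and-aggregate step for each test database boils down to something like the following MaxL sketch. This is not the exact script I used; the login, application, database, and file names, as well as the size factor, are all hypothetical placeholders.

```
/* Minimal sketch (not the exact script): load data into one of the test   */
/* databases and pre-aggregate it. All names and the size factor here are  */
/* hypothetical placeholders.                                               */
login admin identified by password on localhost;

/* load the test data with a server-side load rule */
import database TestApp.DynDb data
    from server data_file 'testdata.txt'
    using server rules_file 'ldall'
    on error abort;

/* build aggregate views until the database reaches a multiple of its input size */
execute aggregate process on database TestApp.DynDb
    stopping when total_size exceeds 1.2;

logout;
exit;
```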
I cooked up some report scripts, MaxL scripts, and batch files. The batch file loads a configuration file that specifies which database to hit and which report to run. It then runs the report against that database, records a timestamp before and after the run, and dumps it all to a text file. It's not an exact science, but in theory it'll give me somewhat of an idea as to whether making the hierarchy Stored is going to help my users' retrieval operations.
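The batch wrapper (the config parsing and the before/after timestamps) isn't shown here, but the MaxL piece it calls for each report is roughly the sketch below. It assumes the batch file passes the database and report script names as positional arguments; the login, file names, and script name are hypothetical.

```
/* run_report.msh -- minimal sketch of the per-report MaxL call (not the     */
/* exact script). The batch file handles the timestamps and logging and      */
/* invokes this along the lines of:  essmsh run_report.msh TestApp.DynDb rep01 */
/* $1 = application.database, $2 = server-side report script name            */
login admin identified by password on localhost;

/* run the report script on the server and capture the output to a file */
export database $1 using server report_file $2 to data_file 'report_out.txt';

logout;
exit;
```

And without further ado, here are the results: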
Starting new process at Tue 01/20/2009 10:11:08.35

| Event | Time | Duration | Winner |
|---|---|---|---|
| start-report_01-DB (Dynamic) | 11:10.0 | | |
| finish-report_01-DB (Dynamic) | 11:13.4 | 00:03.4 | |
| start-report_01-DB (Stored) | 11:14.6 | | |
| finish-report_01-DB (Stored) | 11:21.4 | 00:06.8 | Dynamic |
| start-report_02-DB (Dynamic) | 11:22.6 | | |
| finish-report_02-DB (Dynamic) | 11:51.9 | 00:29.3 | |
| start-report_02-DB (Stored) | 11:53.0 | | |
| finish-report_02-DB (Stored) | 12:00.0 | 00:07.0 | Stored |
| start-report_03-DB (Dynamic) | 12:01.3 | | |
| finish-report_03-DB (Dynamic) | 12:02.2 | 00:00.9 | |
| start-report_03-DB (Stored) | 12:03.9 | | |
| finish-report_03-DB (Stored) | 12:42.1 | 00:38.2 | Dynamic |
| start-report_04-DB (Dynamic) | 12:43.6 | | |
| finish-report_04-DB (Dynamic) | 12:50.2 | 00:06.6 | |
| start-report_04-DB (Stored) | 12:51.3 | | |
| finish-report_04-DB (Stored) | 14:26.4 | 01:35.1 | Dynamic |
| start-report_05-DB (Dynamic) | 14:36.3 | | |
| finish-report_05-DB (Dynamic) | 15:18.3 | 00:42.0 | |
| start-report_05-DB (Stored) | 15:19.6 | | |
| finish-report_05-DB (Stored) | 17:32.0 | 02:12.4 | Dynamic |

Starting new process at Tue 01/20/2009 10:30:55.65

| Event | Time | Duration | Winner |
|---|---|---|---|
| start-report_01-DB (Dynamic) | 30:57.5 | | |
| finish-report_01-DB (Dynamic) | 30:59.9 | 00:02.4 | |
| start-report_01-DB (Stored) | 31:01.0 | | |
| finish-report_01-DB (Stored) | 31:05.8 | 00:04.7 | Dynamic |
| start-report_02-DB (Dynamic) | 31:07.7 | | |
| finish-report_02-DB (Dynamic) | 31:40.8 | 00:33.1 | |
| start-report_02-DB (Stored) | 31:42.5 | | |
| finish-report_02-DB (Stored) | 31:46.1 | 00:03.5 | Stored |
| start-report_03-DB (Dynamic) | 31:50.4 | | |
| finish-report_03-DB (Dynamic) | 31:51.0 | 00:00.6 | |
| start-report_03-DB (Stored) | 31:52.4 | | |
| finish-report_03-DB (Stored) | 31:52.8 | 00:00.3 | Tie |
| start-report_04-DB (Dynamic) | 31:54.0 | | |
| finish-report_04-DB (Dynamic) | 32:06.3 | 00:12.3 | |
| start-report_04-DB (Stored) | 32:12.1 | | |
| finish-report_04-DB (Stored) | 32:51.4 | 00:39.3 | Dynamic |
| start-report_05-DB (Dynamic) | 32:55.5 | | |
| finish-report_05-DB (Dynamic) | 33:38.1 | 00:42.6 | |
| start-report_05-DB (Stored) | 33:39.7 | | |
| finish-report_05-DB (Stored) | 36:42.5 | 03:02.8 | Dynamic |
So, interestingly enough, the Dynamic hierarchy comes out on top, at least for most of the tests I wrote. There is one test, though (report_02), where the Stored hierarchy completely smokes the Dynamic one. I wrote these report scripts kind of randomly, so I definitely need to do some more testing, but in the meantime I feel better about using a Dynamic hierarchy. Since the aggregation method for these ASO cubes is simply to process aggregations until the database size is a certain multiple of its original size, one of the next steps I could look at for query optimization would be to enable query tracking, build up query statistics by running some representative reports, and then use those stats to design the aggregations (roughly the MaxL sketched below). In any case, I'm glad I am looking at some actual data rather than just blindly implementing a change and hoping for the best.
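In MaxL terms, that approach would look roughly like the sketch below. The database name and the size factor are hypothetical, and the exact clause options on the aggregate process statement are worth double-checking against the MaxL reference for whatever version is in play.

```
/* Sketch of the query-tracking approach; the database name and size factor */
/* are hypothetical placeholders.                                            */
alter database TestApp.DynDb enable query_tracking;

/* ...run the representative report scripts / user queries here so the      */
/* server can collect query statistics...                                    */

/* then build aggregate views driven by the collected query data */
execute aggregate process on database TestApp.DynDb
    stopping when total_size exceeds 1.2
    based on query_data;
```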
This isn't to say that Dynamic is necessarily better than Stored or vice versa; however, I ran this very limited set of tests numerous times and got essentially the same results. At the very least, it goes to show that there isn't really a silver bullet for optimization and that experimentation is always a good way to go (except on your production servers, of course). I am curious, however, to go back and look at report_02 to see what it is about that particular report that is apparently so conducive to Stored hierarchies.