I thought this up while attending Dan Pressman’s Kscope presentation, “How ASO Works and How to Design for Performance,” which definitely appealed to my inner Hyperion geek. Dan did a crazy deep dive on performance tuning with particular respect to loading ASO. He had some pretty bangin hardware to play with too.
Long story short (and many of us have known this for a while), there are ways to format your Essbase load files so that they load faster. Basically what you are trying to do is make things easier on Essbase: stream in less data, don’t repeat things you don’t need to repeat, don’t thrash blocks in and out of memory, and so on. That’s all well and good.
The advent and proliferation of SSDs in the enterprise has done wonderful things for Hyperion performance by eliminating a lot of the performance quirks with rotational media and penalties from fragmentation. But at the end of the day we are still looking for ways to pump ever-increasing amounts of information into our cubes even faster than we were the day before.
For cases where we are loading a file, whether it resides on the same machine as the Hyperion apps/cubes or comes across the network, I wonder what performance benefits, if any, we would see if we had the ability to import a zip file directly.
Zip files can get awesome compression on text files. They can also have their uncompressed contents streamed. In other words, it’s not necessary to extract the contents of a zip file before you can read the contents (starting at the beginning). In theory, if one achieved moderate to decent compression on their zip file and handed that to Essbase (say with a specialized import data MaxL command), it would be saving time on the disk-read aspect of the data load, at the expense of some additional CPU usage. Many Essbase load operations are disk I/O bound anyway so this seems like a reasonable tradeoff to make.
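To make the streaming point concrete, here is a minimal Python sketch of reading a text file out of a zip archive line by line without extracting it to disk first. This is only an illustration of the idea, not anything Essbase does today: the function names, the `data.txt` member name, and the sample rows are all made up for the example.

```python
import io
import zipfile


def stream_records(zip_source, member_name, encoding="utf-8"):
    """Yield the lines of one zip member, decompressing as we read."""
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() returns a file-like object; nothing is written to disk.
        with archive.open(member_name) as raw:
            for line in io.TextIOWrapper(raw, encoding=encoding):
                yield line.rstrip("\r\n")


def build_sample_zip(row_count=1000):
    """Build a small in-memory zip holding made-up, load-file-style rows."""
    text = "".join('"Jan" "Sales" "East" %d\n' % i for i in range(row_count))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.txt", text)  # hypothetical member name
    buffer.seek(0)
    return buffer, len(text.encode("utf-8"))


if __name__ == "__main__":
    sample, raw_size = build_sample_zip()
    print("uncompressed bytes:", raw_size)
    print("zip bytes:", len(sample.getvalue()))
    for i, record in enumerate(stream_records(sample, "data.txt")):
        if i < 3:
            print(record)
```

The sample rows are far more repetitive than a real load file, so the compression here is flattering. The point is just that the reader only ever touches the compressed bytes, and pays for it with CPU time spent inflating them.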
As an additional benefit or elaboration on the concept, perhaps multiple text files could be placed into the same zip file, perhaps with a “load manifest” or options on the load command, and Essbase would attempt to parallelize the data load to the extent it can. This would likely be an add-on feature once the basic support is in place. In all, you would need to augment the data load process with a zip file reader routine (this would be an off-the-shelf library that is quite common), a couple of new MaxL import data variants, and an augmentation to the Java API. I suppose you could leave the MaxL command alone and just program the interpreter to look for a .zip extension and treat it accordingly, but it seems like the better choice would be to specifically indicate that the data load is from a compressed file.
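Here is a rough Python sketch of what the manifest idea might look like. Everything in it is hypothetical: the `manifest.json` name, its `{"files": [...]}` layout, and the `load_member` stand-in (which just counts rows instead of loading anything) are my own inventions for illustration, not an Essbase feature or format.

```python
import io
import json
import zipfile
from concurrent.futures import ThreadPoolExecutor

MANIFEST_NAME = "manifest.json"  # hypothetical manifest file name


def read_manifest(zip_bytes):
    """Read the (made-up) manifest listing which members to load."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return json.loads(archive.read(MANIFEST_NAME).decode("utf-8"))


def load_member(zip_bytes, member_name):
    """Stand-in for a real load: stream one member and count its rows."""
    rows = 0
    # Each worker opens its own handle so no file position is shared.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            for _ in io.TextIOWrapper(raw, encoding="utf-8"):
                rows += 1
    return member_name, rows


def parallel_load(zip_bytes, max_workers=4):
    """Process every file named in the manifest, one task per file."""
    names = read_manifest(zip_bytes)["files"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda name: load_member(zip_bytes, name), names)
        return dict(results)


def build_sample_archive():
    """Build an in-memory zip with a manifest and two made-up data files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_NAME, json.dumps({"files": ["east.txt", "west.txt"]}))
        archive.writestr("east.txt", "".join('"Jan" "Sales" "East" %d\n' % i for i in range(300)))
        archive.writestr("west.txt", "".join('"Jan" "Sales" "West" %d\n' % i for i in range(200)))
    return buffer.getvalue()


if __name__ == "__main__":
    print(parallel_load(build_sample_archive()))
```

The threads here only show the shape of the thing: one archive, one manifest, several independent streams. How much a real load could actually run in parallel would be up to Essbase, not the zip reader.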
Of course, if you’re loading just from SQL, this whole thing wouldn’t apply to you. Loading data files may seem low-tech, but it’s incredibly common, and oftentimes I prefer it because I have an exact text file to tie back to, if need be, versus a possibly changing SQL data store (but that’s a conversation for a different blog post). This feature would cater to the performance nuts out there, and if Kscope is any indication, there are plenty. I’d be curious to hear anyone’s thoughts on this.