Essbase has been a great addition to the decision-support toolbox in our enterprise because the financial pages are produced via a “home-cooked” solution. It’s actually quite impressive, although in some ways I would say it is limited by its rather rigid structure. Our world consists of seven-digit accounts for everything. You can glean a little bit about what an account is based on which digits are where, although don’t count on it. Accounts are then associated with lines (NET SALES, GROSS PROFIT, RENT, EBITDA, etc.), and the lines make up the “physical pages.” The lines are numbered 00 through 99, and they quite literally correspond to a position on a piece of paper: 00 is the first line, followed by 01, and so on until you reach 99 on a following page. It’s not necessary to use all the lines; in fact, many of them are blank. Accounts are then directed to lines. They are not “nested” in a true hierarchy; an account is simply directed multiple times. For example, sales for a particular department are directed to the NET SALES line, but since they also affect GROSS PROFIT, that particular account is also directed to the GROSS PROFIT line. This is all maintained in the Cross Reference, which is the master list of all report types and account groups (basically just different accounting systems for different business segments), each account, the line it rolls to, a description of the account, and a description of the line.
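To make the “directed multiple times” idea concrete, here’s a minimal sketch (in Python, with made-up account numbers, line numbers, and amounts, none of which come from the real Cross Reference) of one account rolling to more than one line:

```python
# Hypothetical cross-reference rows: (account, line, line_description).
# An account appears once per line it is directed to -- no nesting.
cross_ref = [
    ("4100123", "05", "NET SALES"),
    ("4100123", "20", "GROSS PROFIT"),  # same account, second line
    ("5200456", "20", "GROSS PROFIT"),
]

# Hypothetical period balances by account.
balances = {"4100123": 1000.0, "5200456": -400.0}

# Rolling up: each line simply sums every account directed to it.
line_totals = {}
for account, line, desc in cross_ref:
    key = (line, desc)
    line_totals[key] = line_totals.get(key, 0.0) + balances[account]

print(line_totals)
# NET SALES picks up 1000.0; GROSS PROFIT picks up 1000.0 + (-400.0) = 600.0
```

The point is that the same account contributes to every line it is directed to, rather than rolling up through a single parent.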
The trick, then, is how do we make this useful in Essbase? Our goal is to mirror the financials and provide a window into them that previously did not exist, but essentially our Measures dimension is a fairly fast-moving target. It can be (and is) changed at any time, and we have anywhere from 5,000 to 80,000 accounts that affect a single financial page (the decision to model separate cubes after particular groups of financial pages was made long ago, both to keep the cubes manageable and to make more sense to the end-users). We also need to be able to load the data for a new financial period right after it closes. So altogether, here’s what we know:
- The Measures dimension has tens of thousands of members, and they change every period (additions and deletions).
- The Measures dimension is a ragged hierarchy based on the way the accounts roll up on the financial page itself.
- The Location dimension changes each period, as do the Attribute dimensions associated with it.
- The raw data for a single period is about 200 megabytes (a straight, tab-delimited pull from DB2).
- In this particular case, there will be four cubes modeled after four different groups of financial pages. This means that data will be loaded four times.
- Data will be final (the books closed for a particular period) at 4AM Eastern time. All databases must be loaded and ready by 7AM.
Although using BSO isn’t automatically ruled out in this scenario, it was definitely not a favored option. The databases in this scenario would easily be tens of gigabytes each. Databases like these already exist for just a single division, and processing a single period start to finish takes many hours. ASO was the obvious choice. ASO has a few quirks, probably the most notable being that when you touch the outline, you lose all the data. This means that we are going to load all of the data, for all periods and all years, every period. That isn’t really a bad thing, though: because history restatements can occur for any location and any measure, it is basically necessary to reload all historical data each period anyway.
Well, let’s tackle the issue of the outlines first and worry about the data second. Essbase Integration Server (EIS) was an obvious choice for this. The cross-reference and the list of locations are both stored across the WAN in a DB2 system. The Essbase and SQL Server boxes are in a different state, but all on the same LAN as each other. The format of the cross-reference data (that is, the way everyone besides the Essbase folks looks at it) isn’t really usable for us in its present state; we’re going to need to massage it a little bit. Because I need the data on my own SQL box, and I want it on the same LAN as Essbase (so I can test faster, among other things), I set up a quick DTS job to pull the data down for me.
The Merant DB2 drivers are garbage, so I found some IBM DB2 ODBC drivers to set up on the SQL box, then hooked up the DTS job. This is SQL Server 2000 (ugh); otherwise I would have used whatever they call it now (SSIS?). It’s about 3 million rows altogether and takes a little while to pull down. I can automate this in the future, but for now I am content to just refresh it by hand once a period. Now I have my very own cross-reference, yay! I set up some views on it for convenience (to split up some fields for me), and indexed on the physical page to make some of my queries a bit snappier. I did the same thing for the list of locations (all five thousand or so) that I’m also pulling down in a DTS job.
The Measures dimension is quite ragged, so I chose the Parent/Child method of building things in EIS. The final table that I use as the source data for EIS is generally of the form Parent, Child, Alias, Agg, UDA, Formula, SolveOrder, OrderWeight, and DataStorage. This works out pretty well for me and gives me everything I need for EIS. I don’t generally need more alias tables; if I did, I would probably just bust out another table and do a physical join on some ID. With some fairly simple Transact-SQL, I have written a script that generates the data I want (all the children and their parents), using some temporary tables, some REPLACEing, and a few other tricks. Of special note: because of the sign on accounts that roll to GROSS PROFIT, I have tagged those accounts with a UDA (‘GP’) so that I can do sign flips on them when I go to load that particular cube.
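As an illustrative sketch (in Python rather than the actual T-SQL, and with invented line and account values), generating a subset of those Parent/Child build rows from cross-reference data might look something like this:

```python
# Hypothetical cross-reference rows: (line, line_desc, account, acct_desc).
cross_ref = [
    ("05", "NET SALES",    "4100123", "Dept 12 sales"),
    ("20", "GROSS PROFIT", "4100123", "Dept 12 sales"),
    ("20", "GROSS PROFIT", "5200456", "Dept 12 cost of goods"),
]

GP_LINE = "20"  # accounts rolling to GROSS PROFIT get the 'GP' UDA

rows = []   # (Parent, Child, Alias, UDA) -- a subset of the real columns
seen = set()

for line, line_desc, account, acct_desc in cross_ref:
    # Line members sit under the dimension root; accounts under their line.
    for parent, child, alias in (
        ("Measures", f"Line_{line}", line_desc),
        (f"Line_{line}", account, acct_desc),
    ):
        if (parent, child) not in seen:
            seen.add((parent, child))
            uda = "GP" if line == GP_LINE and child == account else ""
            rows.append((parent, child, alias, uda))

for r in rows:
    print(r)
```

An account directed to more than one line simply shows up under multiple parents, which is how the non-nested Cross Reference maps onto a Parent/Child build table.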
Speaking of ragged hierarchies, Location is ragged too, so I set up some T-SQL to translate it from a flat structure (store, zone, division) into Parent/Child as well. I took care in EIS to set up the model carefully so I could use it effectively for all four of the cubes. Whip up some build scripts so that the outline updates can be scripted, and we’re golden. The outline update process only takes about a minute for each cube; keep in mind that the data is dumped from the get-go, which makes things substantially faster.
Next time: data load strategies with an emphasis on loading as fast as we can.