In this post we take a closer look at BI architecture and continue exploring the details of our BI solution, along with some interesting BI architecture designs we want to share.
In our previous post we described the optimization of the ETL process in one of our BI solutions for a leading insurance brokerage company. That article only briefly touched on the Business Intelligence architecture itself, so here are more details:
We are implementing a large enterprise solution for one of our customers. It currently holds around 400,000 members, and that number has been growing fast as we load more and more customers into the system. One of the next tasks was to build up-to-date reports providing various statistics.
For example: a Total Premium report, a Gender report (the count of males and females in the system), a New Employee Additions report, and so on.
Calculating all these numbers in real time whenever a user opens the Reports section would be slow and would put heavy load on SQL Server. According to the specification, real-time calculation is not required: it is acceptable to show a report reflecting the system state as of the previous day. So we decided to implement BI reports.
We received a very good Reports Specification and started thinking about the best way to implement the BI architecture:
As a first step, we analyzed all the reports we needed to build and determined what data we would need from the database to build them. We then thought through the best OLAP cube structure: which cubes, measures, and dimensions we would need. The result was a new architecture, an "Analysis Services Multidimensional and Data Mining Project" (Image 2), with all the cubes, measures, dimensions, calculations, and dependencies (Image 3).
OLAP Database schema
We analyzed all the reports and identified five distinct groups, which gave us five cubes:
- Claims Cube
- Management Cube
- Member Age Cube
- Transactions Cube
- Policies Cube
Based on this analysis, we built a list of general dimensions (General dimensions) and cube-specific dimensions (Schema, dimensions and measures of the Claims cube). Each cube also has its own list of measures (Schema, dimensions and measures of the Claims cube).
Schema, dimensions and measures of the Claims cube
General dimensions are used across cubes. For example, the "Dim Gender" dimension is used to build these reports: Count of Males and Females in Different Policies (a report based on the Policies cube) and Count of Male and Female Additions, Deletions and Modifications in the System (a report based on the Transactions cube).
Cube-specific dimensions are used only within a single cube. For example, the "Dim Payment Type" dimension relates only to Claims and shows payment statistics for claims.
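The split between general and cube-specific dimensions maps directly onto a star schema: a shared dimension table is referenced by several fact tables, while a cube-specific one is referenced by only one. A minimal sketch in SQLite (all table and column names here are illustrative, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Shared dimension: referenced by several fact tables (several cubes).
cur.execute("CREATE TABLE DimGender (Id INTEGER PRIMARY KEY, Name TEXT)")
# Cube-specific dimension: referenced only by the Claims fact table.
cur.execute("CREATE TABLE DimPaymentType (Id INTEGER PRIMARY KEY, Name TEXT)")

# Fact tables for two different cubes, both pointing at DimGender.
cur.execute("""CREATE TABLE FactPolicies (
    Id INTEGER PRIMARY KEY,
    GenderId INTEGER REFERENCES DimGender(Id),
    Premium REAL)""")
cur.execute("""CREATE TABLE FactClaims (
    Id INTEGER PRIMARY KEY,
    GenderId INTEGER REFERENCES DimGender(Id),
    PaymentTypeId INTEGER REFERENCES DimPaymentType(Id),
    Amount REAL)""")

cur.executemany("INSERT INTO DimGender VALUES (?, ?)", [(1, "Male"), (2, "Female")])
cur.executemany("INSERT INTO DimPaymentType VALUES (?, ?)", [(1, "Cash"), (2, "Transfer")])
cur.executemany("INSERT INTO FactPolicies VALUES (?, ?, ?)",
                [(1, 1, 100.0), (2, 2, 150.0), (3, 2, 200.0)])
cur.executemany("INSERT INTO FactClaims VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 50.0), (2, 2, 2, 75.0)])

# The same shared dimension slices facts from either cube.
premium_by_gender = cur.execute(
    "SELECT g.Name, SUM(f.Premium) FROM FactPolicies f "
    "JOIN DimGender g ON g.Id = f.GenderId "
    "GROUP BY g.Name ORDER BY g.Name").fetchall()
print(premium_by_gender)  # [('Female', 350.0), ('Male', 100.0)]
```

The payoff of sharing a dimension is that one conformed "Dim Gender" gives consistent slicing across every report, no matter which cube it comes from.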
Once the cubes were designed and implemented, we started building the Data Warehouse database, which contains pre-aggregated and denormalized data. This is a very important part of the BI architecture: it is the whole reason the reports are "fast", because queries against SQL Server with JOIN, DISTINCT, and similar operations take time to execute. For example, imagine we need to build the Gender report. For this we need two tables: Member and Gender. In the simplest case the structure looks like this: the Member table has Id, Name, and GenderId (a foreign key), and the Gender table has Id and Name:
Example tables for the Gender report
Let's add some test data to these tables:
There are two ways to build the Gender report. Option one: write a direct SQL query:
Gender report using SQL query
Option two: pre-aggregate and denormalize the data, build a cube, and write an MDX query against the cube to get the data:
Table “MembersToGender” data denormalization
Option two is faster because the data is already denormalized, so queries against this table execute more quickly. Our Data Warehouse database therefore contains all the denormalized data needed by the OLAP database.
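The two options can be compared side by side. A sketch in SQLite, assuming the simple Member/Gender structure described above (the test data here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Gender (Id INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("""CREATE TABLE Member (
    Id INTEGER PRIMARY KEY, Name TEXT,
    GenderId INTEGER REFERENCES Gender(Id))""")
cur.executemany("INSERT INTO Gender VALUES (?, ?)", [(1, "Male"), (2, "Female")])
cur.executemany("INSERT INTO Member VALUES (?, ?, ?)",
                [(1, "John", 1), (2, "Mary", 2), (3, "Peter", 1),
                 (4, "Anna", 2), (5, "Mike", 1)])

# Option one: join the normalized tables on every report request.
joined = cur.execute(
    "SELECT g.Name, COUNT(*) FROM Member m "
    "JOIN Gender g ON g.Id = m.GenderId "
    "GROUP BY g.Name ORDER BY g.Name").fetchall()

# Option two: pre-aggregate once into a denormalized table
# (the article's "MembersToGender") and read it directly at report time.
cur.execute("CREATE TABLE MembersToGender (Gender TEXT, MemberCount INTEGER)")
cur.execute(
    "INSERT INTO MembersToGender "
    "SELECT g.Name, COUNT(*) FROM Member m "
    "JOIN Gender g ON g.Id = m.GenderId GROUP BY g.Name")
denormalized = cur.execute(
    "SELECT Gender, MemberCount FROM MembersToGender ORDER BY Gender").fetchall()

print(joined)        # [('Female', 2), ('Male', 3)]
print(denormalized)  # same numbers, but without the join at report time
```

Both paths produce identical numbers; the difference is *when* the join cost is paid: on every report request, or once per nightly load.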
As a result, we ended up with the following Data Warehouse database solution:
Data Warehouse database project
The next step was to write the SQL queries that perform all the pre-aggregation, denormalization, and inserts. These queries pull data from the production OLTP database and load it into the Data Warehouse. The next image shows an example query that gets all claims for the Data Warehouse table "factClaims":
Denormalize data for Data Warehouse table “factClaim”
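The actual query is in the image above; in general, a load of this kind is an INSERT … SELECT that resolves the OLTP joins once, producing one wide, denormalized row per claim. A sketch in SQLite with made-up OLTP table names (the real schema will differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified OLTP side (table and column names are illustrative).
cur.execute("CREATE TABLE Claim (Id INTEGER PRIMARY KEY, MemberId INTEGER, "
            "PaymentTypeId INTEGER, Amount REAL)")
cur.execute("CREATE TABLE Member (Id INTEGER PRIMARY KEY, GenderId INTEGER)")
cur.execute("CREATE TABLE PaymentType (Id INTEGER PRIMARY KEY, Name TEXT)")
cur.executemany("INSERT INTO Member VALUES (?, ?)", [(1, 1), (2, 2)])
cur.executemany("INSERT INTO PaymentType VALUES (?, ?)", [(1, "Cash"), (2, "Transfer")])
cur.executemany("INSERT INTO Claim VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 50.0), (2, 2, 2, 75.0), (3, 1, 2, 25.0)])

# Data Warehouse side: one denormalized row per claim.
cur.execute("""CREATE TABLE factClaims (
    ClaimId INTEGER, GenderId INTEGER, PaymentType TEXT, Amount REAL)""")

# The denormalization step: resolve the joins once, at load time.
cur.execute("""
    INSERT INTO factClaims
    SELECT c.Id, m.GenderId, p.Name, c.Amount
    FROM Claim c
    JOIN Member m ON m.Id = c.MemberId
    JOIN PaymentType p ON p.Id = c.PaymentTypeId""")

rows = cur.execute("SELECT * FROM factClaims ORDER BY ClaimId").fetchall()
print(rows)
```

Every report that later touches factClaims now reads flat rows instead of re-running these joins against the production database.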
At this point we had designed and implemented the OLAP solution, the Data Warehouse database, and the SQL queries that denormalize the data. Next we could start building the ETL solution. For this purpose we created a new Integration Services solution:
New ETL solution
As mentioned before, we provide reports for the previous day, so the ETL solution runs every 24 hours. We designed the ETL to keep the full history of reports: every day it calculates a new version of the reports from the current state of the production database. This lets us use time filters and see, for example, what the Gender report looked like 100 days ago.
On a daily basis the ETL performs the following steps:
- Update Data Warehouse tables (in this step we use the query from the image "Denormalize data for Data Warehouse table 'factClaim'" and all the other SQL queries that do pre-aggregation, denormalization, and inserts)
- Process Cube (deploy and process the OLAP database, i.e. pull all the data from the Data Warehouse database into OLAP)
- Update Access (apply role-based security to the cubes, as described in the previous article)
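The history-keeping behind the steps above can be modelled with a snapshot-date column on the fact tables: each nightly run appends the current aggregates tagged with the run date, so a time filter can select any past day. A sketch (the SnapshotDate column and table names are my assumption, not the actual schema):

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Member (Id INTEGER PRIMARY KEY, GenderId INTEGER)")
cur.execute("CREATE TABLE factGender (SnapshotDate TEXT, GenderId INTEGER, "
            "MemberCount INTEGER)")

def nightly_load(load_date):
    # Append today's aggregate tagged with the run date; never overwrite history.
    cur.execute(
        "INSERT INTO factGender "
        "SELECT ?, GenderId, COUNT(*) FROM Member GROUP BY GenderId",
        (load_date.isoformat(),))

# Day 1: two members in the system.
cur.executemany("INSERT INTO Member VALUES (?, ?)", [(1, 1), (2, 2)])
day1 = date(2024, 1, 1)
nightly_load(day1)

# Day 2: one more member arrives; the old snapshot stays untouched.
cur.execute("INSERT INTO Member VALUES (3, 1)")
nightly_load(day1 + timedelta(days=1))

# Time filter: the report "as of" day 1 still shows the old numbers.
as_of_day1 = cur.execute(
    "SELECT GenderId, MemberCount FROM factGender "
    "WHERE SnapshotDate = ? ORDER BY GenderId", (day1.isoformat(),)).fetchall()
print(as_of_day1)  # [(1, 1), (2, 1)]
```

With a date dimension on top of SnapshotDate, "what did the Gender report look like 100 days ago" becomes an ordinary cube filter rather than a restore from backup.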
The final ETL flow is shown in the next image (ETL flow):
There are a few ways to deploy an Integration Services solution. We did it as follows. When you build the ETL solution, you get a .dtsx package:
Bin folder of ETL solution
Double-clicking the .dtsx package runs the flow from the "ETL flow" image.
Running of DTSX
We decided to create a SQL Server Agent job with a schedule that executes this .dtsx package every 24 hours:
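SQL Server Agent handles the scheduling in our setup; for illustration, the same package can also be launched from the command line via dtexec, SSIS's command-line runner. A minimal sketch (the package path is hypothetical, and the actual invocation is left commented out since it needs SSIS installed):

```python
import subprocess
from datetime import datetime, timedelta

# Hypothetical path to the built package from the ETL solution's bin folder.
PACKAGE_PATH = r"C:\ETL\bin\Package.dtsx"

def build_dtexec_command(package_path):
    # dtexec /F <file> runs a .dtsx package: the same flow a double-click starts.
    return ["dtexec", "/F", package_path]

def next_run(last_run):
    # The job fires every 24 hours, so each run covers the previous day.
    return last_run + timedelta(hours=24)

if __name__ == "__main__":
    cmd = build_dtexec_command(PACKAGE_PATH)
    # subprocess.run(cmd, check=True)  # uncomment on a machine with SSIS installed
    print(cmd, next_run(datetime(2024, 1, 1, 2, 0)))
```

In practice the Agent job is preferable to a hand-rolled scheduler: it gives retry logic, run history, and alerting out of the box.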
SQL Job to run DTSX package
As a result, we have an ETL process that performs all the steps and processes the OLAP database on a daily basis:
OLAP Cube database
Now we can connect to the cube and retrieve the needed reports using MDX queries:
MDX query for Premium Report
This is a short description of the designed architecture. The reports have now been in production for more than 10 months and everything works perfectly: the SQL job runs the ETL package every day without any issues, the reports are extremely fast, and the architecture is stable and easy to support and extend.
Best regards,
Andrew Tsopych, CTO of Diceus