Thanks to everyone who attended my session “Building an Effective Data Warehouse Architecture” for Pragmatic Works. There were over 500 attendees! The abstract is below and the recording of the session is available here. I received a ton of questions and I have attempted to answer most of them below.
Here is the PowerPoint presentation: Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
You’re a DBA and your boss asks you to determine if a data warehouse would help the company. So many questions pop into your head: Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there any hardware I can purchase that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions so your response to your boss will provoke amazement and lead to a big raise. Or at least help to lead you down the correct path!
Questions from the presentation:
Q: If CDC violates a vendor TOS, would CDC work against a OLTP mirror done implemented with log shipping?
A: No it is not possible to enable CDC at Log Shipping Destination. This is because changing (enabling CDC) to the log shipped database (destination) would require changes to the schema; this would bring the need to take the database online, breaking the log chain and making it inaccessible to the additional logs (During log shipping, the destination/secondary database will be in Standby or No Recovery mode that means it will be available for Read-Only).
Q: How’s the fast tracking dw performance compared with the PDW from Microsoft?
A: Fast Track database can have very high performance (see SQL Performance Tuning Test Results for Data Warehouse Architectures) but once you get to 50TB or more you will reach the limit of the hardware. That is where PDW comes into play and queries that can take hours on a SMP server can be done in seconds on a MPP server.
Q: For the ELT I notice you put SSIS… Do you recommend SSIS over TSQL for this process? Is there a reason to use one over the other?
A: I have a blog about this: http://www.jamesserra.com/archive/2011/08/when-to-use-t-sql-or-ssis-for-etl/
Q: For the cubes, once they are processed.. If new or updated data is added to the data warehouse, do the cubes have to be re-processed to recognize the new data?
A: Yes, the cubes have to be reprocessed, but you can create daily partitions in the cube so you only have to process one partition and not the whole cube
Q: Does DW typically keep history or is this only done in dimensional models?
A: It is usually only done in data mart dimensional models. I would only keep history in the EDW if I needed history from day one and I was not planning on creating a data mart for a while
Q: Appliances are good as stand alone so what are the options for blending this into a SAN environment?
A: SAN’s are very slow compared to the direct-storage that is used with an appliance. I’m afraid you are out of luck if you want to use an appliance with a SAN
Q: What do u mean by indirect end user access of data vs direct
A: Data marts can be directly accessed (a user can query it). EDW should not allow a user to query it. Instead, a data mart should be created from a EDW and the user can query that data mart
Q: The diagram showed the Inmon final Data Marts as being in 3NF, was this right or is it a typo?
A: That diagram is correct. It’s rare, but in some cases you may want to have a data mart in 3NF. For example, if you are not reporting off of it but rather exporting it to a csv
Q: is tabular model only available in SSAS in sql 2012? And is tabular model better than multi-dimensional model?
A: Tabular model is not available in versions prior to SQL Server 2012. I have a blog post comparing it to multidimensional: http://www.jamesserra.com/archive/2012/04/sql-server-2012-multidimensional-vs-tabular/
Q: In your “Why use a Data Warehouse?” slide…the comment ..”plug holes in source systems”. In your opinion, where do you draw the line between “plugging holes” and establishing that a fix/re-design of the application and its associated data architecture are the best approach before loading that data in the DW?
A: Plugging a hole in the source system only helps for future data entry, so you will still need to correct prior data issues via tools like SSIS or DQS. I have never had a case where I recommended a redesign of a source system – I just spend more time plugging holes and correcting prior data. I don’t think a client would like to hear me say they need to rebuild their source system 🙂
Q: Could you comment about BI and Microsoft cloud Azure? Does it perform?
Q: What books and/or internet resources you could recommend to start with DW?
A: Go with Kimball: http://www.jamesserra.com/archive/2013/05/ralph-kimball-books/
Q: Does tabular models handle ragged hierarchies?
A: No, but BIDS helper comes to the rescue