Wednesday, May 11, 2011

Dimensional Lifecycle Road Map


            As we know, building an efficient data warehouse requires both technological knowledge and knowledge of business processes, so we follow a defined lifecycle to build it. The steps of this lifecycle organize our workforce and, if followed wisely, are likely to yield greater productivity from the data warehouse. We can therefore conclude that the dimensional lifecycle acts as a road map for successfully building an efficient data warehouse (Kimball, Page 332).
            The lifecycle starts with project planning and moves through several processes for identifying the requirements, both technical and analytical, and then on to developing, maintaining and growing the data warehouse.
            The first step in the data warehousing lifecycle is project planning. We need to assess the readiness of our organization to proceed with the project (Kimball, Page 334).
            Take the example of a video store that is growing day by day. If we feel that we need a data warehouse for the business, we first need to figure out whether creating one is feasible. We need to consider the culture and the scoping of the video store (Kimball, Page 336). We need to weigh the cost of building the data warehouse against the benefit we get out of it, and then decide whether it is worth investing money in a data warehouse for our store. We can hire individuals who are passionate about videos and who also have sound knowledge of database and data warehouse technology.
            We need to develop a project plan that tracks our progress through the project lifecycle, showing the time frame for each task and the tasks we have already accomplished.
            The second important phase of our lifecycle is requirements definition. We need to find out what information we need from our data warehouse so that the top-level executives find it easier to make smart decisions, be productive and stay at the top of a competitive market. We can look at the current database of the video store and collect the questions that it cannot answer correctly, since those gaps hamper the productivity of the store.
            We can also set the requirements by posing questions to the dimensional model that we create for our store. These would be the questions that the top-level executives want answered to make the store more productive. One example of a requirement for our store would be, "How many people come to our store because of our ongoing marketing promotions?"
            We need to document the requirements we come up with. We also need requirements for the architecture of our data warehouse. For each requirement, we need to create a fact table and the dimension tables that support it. We should also find out the analytical requirements for our video store.
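            As a rough illustration of turning a requirement into a fact table and a dimension table, the promotion question for the video store could be sketched as follows. This is only a minimal sketch using Python's built-in sqlite3 module; the table names (dim_promotion, fact_store_visit), the column names and the sample rows are all made up for this example and do not come from Kimball's book.

```python
import sqlite3


def build_promotion_mart(conn):
    """Create one hypothetical dimension table and one fact table, then add sample rows."""
    conn.executescript("""
        CREATE TABLE dim_promotion (
            promotion_key  INTEGER PRIMARY KEY,
            promotion_name TEXT NOT NULL
        );
        CREATE TABLE fact_store_visit (
            visit_date    TEXT    NOT NULL,
            customer_id   INTEGER NOT NULL,
            promotion_key INTEGER NOT NULL REFERENCES dim_promotion (promotion_key)
        );
    """)
    # Invented promotions; key 0 stands for visits that no promotion explains.
    conn.executemany(
        "INSERT INTO dim_promotion VALUES (?, ?)",
        [(0, "No promotion"), (1, "Two-for-one Tuesday"), (2, "Student discount")],
    )
    # Invented visits: one row per store visit.
    conn.executemany(
        "INSERT INTO fact_store_visit VALUES (?, ?, ?)",
        [
            ("2011-05-03", 101, 1),
            ("2011-05-03", 102, 1),
            ("2011-05-04", 101, 0),
            ("2011-05-05", 103, 2),
            ("2011-05-10", 102, 1),
        ],
    )


def visitors_per_promotion(conn):
    """Count the distinct customers who visited the store under each promotion."""
    rows = conn.execute("""
        SELECT p.promotion_name, COUNT(DISTINCT f.customer_id)
        FROM fact_store_visit AS f
        JOIN dim_promotion AS p ON p.promotion_key = f.promotion_key
        GROUP BY p.promotion_name
        ORDER BY p.promotion_name
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_promotion_mart(conn)
print(visitors_per_promotion(conn))
```

            The fact table holds one row per visit and the dimension table describes the promotions, so the executive's question becomes a simple join and count.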
            We finally deploy the data warehouse and keep performing maintenance and growth, because our requirements and business processes change over time and we need to accommodate the change. This is a never-ending process, and there is always scope for change in the data warehouse.
            Basically, the data warehousing process becomes more organized when we apply the dimensional lifecycle road map, which helps the team responsible for building the organization's data warehouse. This lifecycle doesn't have hard and fast rules and could be different for different organizations. It is just a guideline for the project team members and the project managers, but it has proved productive and is followed by many organizations. I personally feel that it is very helpful for the members of the data warehouse team in reaching their milestones and deliverables on time and with greater efficiency.

Works Cited:
Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.

Goals of data warehouse


May 10, 2011
            Data warehousing is a process of extracting the useful data out of massive databases, which helps the top-level management make decisions and get the information required for further analysis of the business process. It looks at the organization's data from a bird's-eye view and gives us a clear picture of the overall organization.
            The data warehouse should present the organization's information clearly, without misleading the decision makers. For example, a sales database looks only at the customer's sales transactions and may suggest that the customer is loyal to the company, but the accounts database may show that the same customer was actually late on payments and is a burden on our business. Hence, the overall data, combined across all departments, is crucial for a business to be successful. Therefore, we need to present the organization's data clearly to the top-level management for correct decision making.
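            The loyal-customer example above can be sketched in a few lines of Python. This is only an illustration under assumptions: the CustomerView class, the 500-dollar loyalty threshold and the sample sales and payment records are invented here to show how a combined view can contradict a sales-only view.

```python
from dataclasses import dataclass


@dataclass
class CustomerView:
    """Combined picture of one customer across the sales and accounts data."""
    customer_id: int
    total_sales: float
    late_payments: int

    @property
    def looks_loyal(self) -> bool:
        # What the sales database alone suggests (the threshold is an assumption).
        return self.total_sales >= 500.0

    @property
    def is_reliable(self) -> bool:
        # What the combined data shows: good sales and no late payments.
        return self.looks_loyal and self.late_payments == 0


def combine(sales, payments):
    """Merge (customer_id, amount) sales with (customer_id, paid_on_time) payments."""
    totals = {}
    for customer_id, amount in sales:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    late = {}
    for customer_id, paid_on_time in payments:
        if not paid_on_time:
            late[customer_id] = late.get(customer_id, 0) + 1
    return [CustomerView(cid, totals[cid], late.get(cid, 0)) for cid in sorted(totals)]


# Invented sample data: customer 2 buys a lot but pays late.
sales = [(1, 300.0), (1, 400.0), (2, 650.0)]
payments = [(1, True), (1, True), (2, False), (2, False)]

views = combine(sales, payments)
for view in views:
    print(view.customer_id, view.looks_loyal, view.is_reliable)
```

            Both customers look loyal from the sales data alone, but only the combined view shows that the second one is paying late.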
            We need to make sure that an organization's data and information are easily accessible to the users (Kimball, Page 3). The tools that we use should be easy for our staff to learn and use. We also need to make sure that the reports we generate from the data warehouse are not too complex for the managerial staff to understand. We need to hide the complexity and show the report in a simple form. This makes the information easy to access, and we can expect good decision making for the profitability of our business.
            Business processes keep changing, and more advanced technology is introduced into the market every day. We can't stop the advancement of technology or the change in business processes. What we can do instead is make the data warehouse adaptable to future changes in technology and in business processes. For example, if our business operates in only one location but is very likely to expand to other locations in the future, then we must make the data warehouse adaptable to many branches. This would save time and money for our business in the long run, even though the return on investment at that particular time seems to argue against it.
            One of the most important aspects of data warehousing is security. The warehouse contains reports and analysis of the overall business, and this information is very sensitive. If it reaches our competitors' hands, they could use it against our business. Hence, we need to make sure there are no loopholes in our system through which hackers could enter and compromise our information.
            The main purpose of a data warehouse is to support decision making by the management. Therefore, it must serve as the application on which the managerial staff base their decisions, using the facts the data warehouse provides. For example, the sales manager of a video store may be unsure whether to introduce a sale on videos older than four months. He should be able to get the facts from the data warehouse on whether such a sale would be beneficial. Hence, the data warehouse should help make decisions that eventually yield profit for the business.
            It doesn't matter how good and efficient a data warehouse system we build unless its users accept it (Kimball, Page 4). This is very important, because if they avoid using it, the data warehouse cannot deliver the results it should. This means that we need to build a data warehouse that its users will accept. Therefore, when defining the data warehouse requirements, we need to consider the technical knowledge of its users and make it suitable for them.
            The most important goal of building a data warehouse is to make the business owners happy.
            How can we make our business owners happy?
            The answer is not with our words but with our work: by building an efficient system that analyzes the overall data of our business, is free of problems and can be trusted. We need to constantly develop and maintain the data warehouse so that the owners can make important decisions based on the facts it shows.
            Basically, in today's information economy, the most important goal of a data warehouse is to analyze the data in all of the databases of our business and produce an overall view of the organization, setting aside unnecessary information, so that we can make better decisions for better business management.

Works Cited:
Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.

Sunday, April 24, 2011

Data Mining and Neural Networks


April 24, 2011
Data mining in information economy
            Data mining is a popular concept in today's information economy that helps the management of many sectors, such as business, research, science and supply chain (Kudyba, Stephan P.). Every day, individuals perform many transactions, for example swiping a card, becoming a member of an organization or connecting to a social networking website. Organizations such as Facebook, banks, health care providers and e-commerce companies store, process and analyze data that is growing faster than ever before and will continue to grow in the future.
            Therefore, we need a process or a system that enables business managers and top-level executives to refine the massive data into a set of data that helps them make decisions for the prosperity of their organization.
            Basically, data mining is a mathematical method that may include mathematical equations, algorithms, logistics, neural networks, segmentation and classification, clustering, statistics and many more (Kudyba, Stephan P.). It processes huge amounts of data and extracts trends and patterns, which is what the top-level executives are most interested in, so that they can make good decisions and make a profit. Data is always growing, so data mining will always be useful to business executives. Therefore, data mining has huge potential for application growth in the future, because wherever there is data, data mining comes into play.
            Data mining should be performed by the technical staff of the information technology department, and its results analyzed by the top-level management of the organization. The technical staff should be able to extract, transform and load data into our data warehouse system; store and manage the data in a multidimensional system; provide access to the concerned parties; analyze the data; and prepare reports and documentation for possible changes in business processes (Data Mining: What Is Data Mining?).
            Neural networks have an important future. In my opinion, they can be one of the major innovations in technology and change the world we see now in terms of how data is analyzed and processed. I can only imagine a system that is able to think like our brain and store information the way we do in order to reason. Such a system could be more intelligent than us, because we can hardly extract information that seems minor to us, whereas a system definitely would. In any case, data can be analyzed by artificial neural networks, which learn through training and can be used for data mining (Data Mining: What Is Data Mining?). Data can also be analyzed using genetic algorithms, decision trees, the nearest neighbor method, rule induction or data visualization (Data Mining: What Is Data Mining?).
            A simple example of data mining could be an e-commerce website that has many customers purchasing products and services. If an executive of the website wants to know which of our customers bought more than one thousand dollars' worth of products and services in a three-month period, then we perform data mining by querying our database. The executive's motive for wanting that information may be to offer those customers a fifty-dollar coupon so that they are motivated to buy more.
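            The executive's question could be sketched as a query like the one below. This is a minimal sketch with Python's built-in sqlite3 module, not the way any particular website does it; the purchase table, its columns, the sample rows and the date window are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase (customer_id INTEGER, purchased_on TEXT, amount REAL)"
)
# Invented purchases; dates are ISO strings so that they compare in date order.
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [
        (1, "2011-02-10", 600.0),
        (1, "2011-03-15", 550.0),
        (2, "2011-03-01", 400.0),
        (3, "2010-11-20", 900.0),  # falls outside the three-month window
        (3, "2011-04-02", 300.0),
    ],
)


def big_spenders(conn, start, end, threshold):
    """Return (customer_id, total) for customers who spent more than the
    threshold between start (inclusive) and end (exclusive)."""
    return conn.execute(
        """
        SELECT customer_id, SUM(amount) AS total
        FROM purchase
        WHERE purchased_on >= ? AND purchased_on < ?
        GROUP BY customer_id
        HAVING SUM(amount) > ?
        ORDER BY customer_id
        """,
        (start, end, threshold),
    ).fetchall()


coupon_candidates = big_spenders(conn, "2011-01-24", "2011-04-25", 1000.0)
print(coupon_candidates)
```

            Only the first customer crosses the one thousand dollar mark inside the window, so only that customer would receive the coupon.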
            To sum up, every business these days, small or big, needs data mining and the tools that support it to make executive decisions, so that it can be productive in the competitive market of today's information economy.

Works Cited:
"Data Mining: What Is Data Mining?" UCLA. Web. 24 Apr. 2011. <http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm>.
Kudyba, Stephan P. "YouTube - What Is Data Mining?" YouTube - Broadcast Yourself. 25 May 2010. Web. 24 Apr. 2011. <http://www.youtube.com/watch?v=R-sGvh6tI04>.
            

Sunday, April 17, 2011

Comparison and contrast of two different ETL tools

April 15, 2011



The data staging area, which can be imagined as the kitchen of a restaurant, consists of a storage area and a set of processes called extract-transformation-load (ETL) (Kimball, Page 8). To create a data warehouse for our changing business needs, we need to capture data from the source systems by using ETL tools. The ETL tool then loads the data into the database of our data warehouse (Taylor, Art).

ETL tools should extract, transform and load data from the source system to the warehouse. Extraction collects the required data from the source system, taking the proper format and data layout into account. Transformation involves applying "a series of rules or functions" to the data being transformed (ETL - Extract Transform Load). During this process we need to perform operations such as duplicate removal, standardization, filtering, translating, sorting and consistency checking. Loading simply writes the extracted and transformed data into the warehouse.
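
The three steps can be sketched in a few lines of Python to make them concrete. This is only a toy illustration using the standard library, not how any of the commercial or open-source tools work internally; the CSV layout, the rental table and the sample rows are made up for this example.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (a video store's rental log).
SOURCE_CSV = """customer,title,rental_date,price
 alice ,The Matrix,2011-04-02,3.50
BOB,Inception,2011-04-01,4.00
 alice ,The Matrix,2011-04-02,3.50
carol,Avatar,2011-04-03,
"""


def extract(text):
    """Extract: read the raw rows from the source layout."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: filter, standardize, remove duplicates and sort."""
    seen = set()
    clean = []
    for row in rows:
        if not row["price"]:  # filtering: drop rows with no price
            continue
        record = (
            row["customer"].strip().title(),  # standardization of names
            row["title"].strip(),
            row["rental_date"],
            float(row["price"]),
        )
        if record in seen:  # duplicate removal
            continue
        seen.add(record)
        clean.append(record)
    return sorted(clean, key=lambda r: r[2])  # sorting by rental date


def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rental "
        "(customer TEXT, title TEXT, rental_date TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM rental ORDER BY rental_date").fetchall()
print(loaded)
```

Of the four source rows, one is dropped as a duplicate and one is filtered out for its missing price, so two standardized rows reach the warehouse table.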

These days ETL tools are being developed to offer more functionality than simply extracting, transforming and loading data into the data warehouse. Some of those features include real-time or near real-time data integration, data migration, compliance, Master Data Management (MDM) and operational BI (It.toolbox.com).



We can see many ETL tools in the market. Before we choose one, we need to define our business requirements and then consider the technical aspects of our data warehouse. Some of the commercial tools are IBM InfoSphere DataStage, Informatica PowerCenter, Oracle Warehouse Builder (OWB), Oracle Data Integrator (ODI) and SAS ETL Studio (ETL Tools Comparison). Some of the freeware and open-source tools are Pentaho Data Integration (Kettle), Talend Integrator Suite and CloverETL (ETL Tools Comparison). If our business needs to invest less in the data warehouse and we have qualified technical people to assist with the data warehousing, then we can use the open-source tools, which are free and can be adapted to individual business needs by changing the tool itself. Even though the tool comes free, adapting it to fit our business may sometimes cost more than the commercial tools available.

The InfoSphere Information Server platform from IBM is one of the ETL tools with a strong vision in the market; it is very flexible and is progressing towards a common metadata platform (ETL Tools Comparison). IBM has taken good initiatives to make it even stronger, and the businesses that use it report a high level of satisfaction (ETL Tools Comparison). At the same time, it is hard to learn and takes time to get used to. It also has long implementation cycles and requires a lot of data storage (in the gigabytes) and processing power (ETL Tools Comparison).

SQL Server Integration Services from Microsoft comes with a broad set of documentation and support, and it offers easy and fast implementation with standardized data integration, unlike IBM's tool (ETL Tools Comparison). It has real-time and message-based capabilities at a relatively low cost for clients investing in warehousing (ETL Tools Comparison). Apart from those positives, it has problems in non-Windows environments, as Microsoft products tend to have. Microsoft also lacks a clear vision and strategy for the future of its ETL tool (ETL Tools Comparison).

Basically, we need to choose the ETL tool that best serves our business requirements. We need to consider the future, the business plan, the cost and many other things before we settle on one tool to extract, transform and load data from the source system to the warehouse.


Works Cited:

ETL - Extract Transform Load. Web. 17 Apr. 2011. <http://www.etltools.org/>.

"ETL Tools Comparison." ETL Software Tools. Web. 17 Apr. 2011. <http://www.etltools.net/etl-tools-comparison.html>.

It.toolbox.com. 25 Oct. 2010. Web. 17 Apr. 2011. <http://it.toolbox.com/wiki/index.php/Extract,_Transform,_Load>.

Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.

Taylor, Art. "Data Warehouse and the ETL Tool." The Data Administration Newsletter – TDAN.com. 1 June 1999. Web. 17 Apr. 2011. <http://www.tdan.com/view-articles/5259/>.

Sunday, April 10, 2011

Agile Architectures: Creating and Evolving Data Architectures that Adapt to Changing Business


April 5, 2011
Data Architecture - Why don't we go Agile?

            John O'Brien works in BI and data warehousing and has a lot of experience with data architecture and its evolution towards better business practices. His presentation at the "TDWI BI Executive Summit" opened my eyes: it made me think about the data warehouse and the importance of data architecture not only from the technical perspective but also from the organization's point of view, such as profit making and decision making.
            We can use a traditional data architecture for our business, but with the business environment changing faster than ever, more businesses are going agile (Powell, James E). Why do we want our data architecture to be agile?
            Business processes change every day, and more and more technology is evolving. By going agile, we can set small targets and strike a balance between current business needs and the future of the architecture. We can communicate with the business people to learn more about their needs and come up with a solution that is not final, yet fulfills the basic requirements for the business to run (O'Brien, John).
            It is an iterative process in which we keep developing the reusable architecture we have already built, arriving at a better architecture that serves our business more efficiently as the business needs or the process itself change (O'Brien, John). If we didn't go agile and instead tried to build a solid architecture that took many years (time) and more money (investment), it might not suit the business: it might not serve the business when it is needed most (return on investment), and it might become outdated as the business process changes (O'Brien, John).
            So, what is the use of a perfect system that the business may not be able to use? We therefore need to balance quick delivery against the consistency and efficiency of the architecture we are building with the agile method (O'Brien, John).
            Thin slicing by layers is very important when we go agile. First, we connect to the sources and deliver data marts. If we are comfortable, we can go directly to staging. As things get more complex, we can move to the EDL (Enterprise Dimensional Layer). When the changing business needs become even more demanding, we can put a data warehouse on the layer (O'Brien, John). Evolution is always ongoing, and we don't move to the advanced layers unless the business really needs them. This is the advantage of going agile.
            When going agile, sandboxing may help us create the architecture efficiently. By creating a sandbox on a medium-sized database server, we can play with the actual data (Singh, Amit). This helps us explore the requirements for building the data architecture with the agile methodology (O'Brien, John).
            Another important step in going agile is to modularize, that is, to write code that can be reused (O'Brien, John). Communication is very important, both within the team and with the management. As a team working to create the architecture, we need to set standards for the code and applications we write, so that others see value in following our standards (O'Brien, John). Over time, we need someone to play a governance role, because the scope of the project becomes huge and someone must help the team leverage more of what it has already delivered (Powell, James E).
            Basically, just like RAD (Rapid Application Development) for applications, going agile in building our data architecture proves to be more practical and efficient in today's world, and it has many positives to offer our business and its changing requirements.


Works Cited:
O'Brien, John. "Agile Architectures: Creating and Evolving Data Architectures That Adapt to Changing Business." TDWI - The Data Warehousing Institute. TDWI, 15 Feb. 2011. Web. 09 Apr. 2011. <http://tdwi.org/pages/video/lv-wc-ss-videos/video-assets/agile-architectures-creating-and-evolving-data-architectures-that-adapt.aspx>.
Powell, James E. "Q and A: Agile BI Architectures." TDWI - The Data Warehousing Institute. 21 July 2010. Web. 10 Apr. 2011. <http://tdwi.org/articles/2010/07/21/agile-bi-architectures.aspx>.
Singh, Amit. "Sandboxing." A Taste of Computer Security. June 2004. Web. 10 Apr. 2011. <http://www.kernelthread.com/publications/security/sandboxing.html>.

Friday, April 1, 2011

Data Warehouse and its Uses

April 1, 2011



In today's world, all kinds of information are stored in computer systems, replacing the paper that used to be the main medium for storing information and data. Gone are the days when we had to file and index documents to manage information. Today we have search algorithms, and we search massive amounts of information with the click of a button.

Information and data are constantly stored in an organized way by the general staff of the organization. They don't deal with the upper-level management and don't make decisions based on that information. Instead, the staff involved with the data warehouse are responsible for that work. They tend to look at the data from a bird's-eye view, unlike the general staff, who take a worm's-eye view and are not concerned with the broad perspective of the data.

Therefore, data warehousing is a process of extracting the useful data out of massive databases from different sources, which is very useful for the upper-level management in making decisions and getting useful information for further analysis of the business process.

Data warehousing requires technical knowledge of how the data is stored in the system as well as knowledge of the business process, so that the data warehouse managers can perform their tasks. In other words, the data warehouse process involves both technical and managerial skills.

A data warehouse is very useful in today's world, where we can clearly sense information overload in most organizations. It helps an organization make decisions by selecting the information that is needed and setting aside the data that is not useful for that particular process (Kimball, Page 4). It helps an organization access information easily, present it to the top-level management efficiently and adapt to changes in technology (Kimball, Page 3). Information security is always a top priority for any organization. Data warehousing can help the organization decide how to secure its information through reports that reveal loopholes in the system or in the organization itself (Kimball, Page 4).

With the help of data warehousing we can reduce the occurrence of slow reports and dashboards (Hillam, Jared). It provides a mission-level view of the organization by extracting the useful data out of massive databases from different sources. It helps the top-level management make efficient decisions that lead to profit (Hillam, Jared). It also helps reduce inefficient and expensive information access (Hillam, Jared).

A data warehouse solves the problem of the multiple versions of the truth that individual departments may generate. Warehousing looks at the data from all the departments as a whole and gives us reports that bring all the related data together (Hillam, Jared). Hence, it helps resolve the problem of dealing with multiple versions of the truth.



Works Cited:

Hillam, Jared. "YouTube - Demystifying the Data Warehouse." YouTube - Broadcast Yourself. 18 June 2009. Web. 01 Apr. 2011. .



Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.