Sunday, April 24, 2011

Data Mining and Neural Networks


April 24, 2011
Data mining in information economy
            Data mining is a popular concept in today's information economy that supports management in every sector, including business, research, science, and the supply chain (Kudyba, Stephan P.). Every day, individuals perform countless transactions: swiping a card, joining an organization, logging in to a social networking website, and so on. Organizations such as Facebook, banks, health care providers, and e-commerce companies are storing, processing, and analyzing data that is growing faster than ever before and will continue to grow in the future.

            Therefore, we need a process or system that enables business managers and top-level executives to refine this massive volume of data into a smaller set that helps them make decisions for the prosperity of their organization.
            Basically, data mining is a mathematical method that may draw on equations, algorithms, logistic regression, neural networks, segmentation and classification, clustering, statistics, and many other techniques (Kudyba, Stephan P.). It processes huge volumes of data and extracts the trends and patterns that top-level executives are most interested in, so that they can make sound decisions and profit from their business. Data is always growing, and the concept of data mining will always be useful to business executives. Therefore, data mining has huge potential for growth, because wherever there is data, data mining comes into play.
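
            As a small illustration of one of these techniques, here is a minimal clustering sketch in Python using scikit-learn; the library choice and the customer figures are my own assumptions, invented purely for illustration.

# A minimal customer-segmentation sketch with scikit-learn's KMeans.
# The data is made up: each row is [annual spend, visits per month].
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [120.0,  1],
    [150.0,  2],
    [900.0,  8],
    [950.0, 10],
    [500.0,  5],
])

# Group the customers into two segments based on their behaviour.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = model.fit_predict(customers)
print(segments)                # e.g. which segment each customer fell into
print(model.cluster_centers_)  # the "average" customer in each segment
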
            Data mining should be performed by the technical staff of the information technology department, and the results analyzed by the top-level management of the organization. The technical staff should be able to extract, transform, and load data into the data warehouse; store and manage the data in a multidimensional system; provide access to the concerned parties; and analyze the data and prepare reports and documentation for possible changes to business processes (Data Mining: What Is Data Mining?).

            The future of neural networks is very important. In my opinion, they could be one of the major innovations in technology and change how data is analyzed and processed. I can only imagine a system able to think like our brain and store information the way we do in order to reason. Such a system might even surpass us, because we can hardly recall details that seem minor to us, while a system certainly could. In any case, data analysis for data mining can be done with artificial neural networks that learn through training (Data Mining: What Is Data Mining?). It can also be performed using genetic algorithms, decision trees, the nearest neighbor method, rule induction, or data visualization (Data Mining: What Is Data Mining?).
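
            To make the idea of learning through training concrete, here is a tiny sketch using scikit-learn's MLPClassifier; the library choice and the toy purchase data are my own assumptions, not taken from the sources.

# A toy artificial neural network that "learns through training".
# The data is invented: each row is [purchases last month, account age in years],
# and the label says whether that customer later bought a premium service.
from sklearn.neural_network import MLPClassifier

X = [[1, 0.5], [2, 1.0], [8, 3.0], [9, 4.0], [7, 2.5], [1, 0.2]]
y = [0, 0, 1, 1, 1, 0]

# Train the network on the labelled examples...
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
net.fit(X, y)

# ...then ask it about a customer it has never seen.
print(net.predict([[6, 2.0]]))  # e.g. [1] -- predicted to buy the premium service
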

            A simple example of data mining is an e-commerce website with many customers purchasing products and services. If an executive wants to know which customers bought more than one thousand dollars' worth of products and services in a three-month period, we answer that question by mining the data, querying our database. The executive's motive might be to offer those customers a fifty-dollar coupon so that they are motivated to buy more.
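
            A hedged sketch of that query, using Python's built-in sqlite3 module, might look like the following; the table name, columns, and sample orders are hypothetical.

# A hypothetical "orders" table queried for customers who spent more than
# $1000 in the first quarter of 2011. The schema and rows are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 600.0, "2011-02-10"),
        ("alice", 550.0, "2011-03-15"),
        ("bob",   200.0, "2011-03-20"),
    ],
)

rows = conn.execute(
    """
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    WHERE order_date BETWEEN '2011-01-01' AND '2011-03-31'
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
    """
).fetchall()
print(rows)  # [('alice', 1150.0)] -- candidates for the fifty-dollar coupon
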
            To sum up, every business these days, small or big, needs data mining and the tools that support it for executive decision making, so that it can stay productive in the competitive market of today's information economy.

Works Cited:
"Data Mining: What Is Data Mining?" UCLA. Web. 24 Apr. 2011.             <http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm>.
Kudyba, Stephan P. "YouTube - What Is Data Mining?" YouTube - Broadcast Yourself. 25 May   2010. Web. 24 Apr. 2011. <http://www.youtube.com/watch?v=R-sGvh6tI04>.     
            

Sunday, April 17, 2011

Comparison and contrast of two different ETL tools

April 15, 2011


Comparison and contrast of two different ETL tools

The data staging area, which can be imagined as the kitchen of a restaurant, consists of a storage area and a set of processes called extract-transform-load (ETL) (Kimball, Page 8). To create a data warehouse for our changing business needs, we need to capture data from the source systems using an ETL tool, which then loads the data into the database for our data warehouse (Taylor, Art).

ETL tools extract, transform, and load data from the source systems into the warehouse. Extraction collects the required data from the source systems, taking the proper format and data layout into account. Transformation applies "a series of rules or functions" to the extracted data (ETL - Extract Transform Load); during this step we perform operations such as duplicate removal, standardization, filtering, translating, sorting, and consistency checks. Loading simply loads the extracted and transformed data into the warehouse.
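
A minimal sketch of these three steps in plain Python might look like this; the source rows and field names are invented for illustration, and a real ETL tool would read from actual source systems and load into a warehouse database.

# Extract: pull the raw rows from the "source system" (here, a hard-coded list).
source_rows = [
    {"name": " alice ", "country": "us", "amount": 120.0},
    {"name": "Bob",     "country": "US", "amount": -5.0},   # inconsistent record
    {"name": " alice ", "country": "us", "amount": 120.0},  # duplicate
]

# Transform: standardize fields, filter out bad records, remove duplicates, sort.
cleaned = [
    {"name": row["name"].strip().title(),
     "country": row["country"].upper(),
     "amount": row["amount"]}
    for row in source_rows
    if row["amount"] > 0
]
deduped = [dict(items) for items in {tuple(sorted(row.items())) for row in cleaned}]
transformed = sorted(deduped, key=lambda row: row["name"])

# Load: append the clean rows to the warehouse target (here, another list).
warehouse_table = []
warehouse_table.extend(transformed)
print(warehouse_table)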

These days, ETL tools are being extended to offer more functionality than simply extracting, transforming, and loading data into the data warehouse. Additional features may include real-time or near-real-time data integration, data migration, compliance, Master Data Management (MDM), and operational BI (It.toolbox.com).



There are many ETL tools on the market. Before choosing one, we need to define our business requirements and then consider the technical aspects of our data warehouse. Some of the commercial tools are IBM InfoSphere DataStage, Informatica PowerCenter, Oracle Warehouse Builder (OWB), Oracle Data Integrator (ODI), and SAS ETL Studio (ETL Tools Comparison). Some of the freeware and open-source tools are Pentaho Data Integration (Kettle), Talend Integrator Suite, and CloverETL (ETL Tools Comparison). If our goal is to spend less on the data warehouse and we have qualified technical people to assist, we can use the open-source tools, which are free and can also be adapted to individual business needs by modifying the tool itself. Even though the tool comes free, the cost of customizing it to fit our business can sometimes exceed the cost of the commercial alternatives.

IBM's InfoSphere Information Server platform is an ETL tool with a strong vision for the market; it is very flexible and is progressing toward a common metadata platform (ETL Tools Comparison). IBM has taken good initiatives to make it even stronger, and businesses that use it report a high level of satisfaction (ETL Tools Comparison). At the same time, it is hard to learn and takes time to get used to. It also has long implementation cycles and demands a lot of storage, on the order of gigabytes, as well as processing power (ETL Tools Comparison).

SQL Server Integration Services from Microsoft gives clients a broad set of documentation and support, with easy and fast implementation and standardized data integration, unlike IBM's offering (ETL Tools Comparison). Microsoft provides real-time and message-based capabilities at a relatively low cost for clients investing in warehousing (ETL Tools Comparison). On the other hand, it has problems in non-Windows environments, as Microsoft products often do, and Microsoft lacks a strong vision and strategy for the future of its ETL tool (ETL Tools Comparison).

Basically, we need to choose the ETL tool that serves our business requirements best. We should consider the future, the business plan, the cost, and much more before we commit to one tool to extract, transform, and load data from the source systems into the warehouse.


Works Cited:

ETL - Extract Transform Load. Web. 17 Apr. 2011. <http://www.etltools.org/>.

"ETL Tools Comparison." ETL Software Tools. Web. 17 Apr. 2011. <http://www.etltools.net/etl- tools-comparison.html>.

It.toolbox.com. 25 Oct. 2010. Web. 17 Apr. 2011. <http://it.toolbox.com/wiki/index.php/Extract,_Transform,_Load>.

Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.

Taylor, Art. "Data Warehouse and the ETL Tool." The Data Administration Newsletter – TDAN.com. 1 June 1999. Web. 17 Apr. 2011. <http://www.tdan.com/view-articles/5259/>.

Sunday, April 10, 2011

Agile Architectures: Creating and Evolving Data Architectures that Adapt to Changing Business


April 5, 2011
Data Architecture - Why don't we go Agile?

            John O'Brien is a BI and data warehousing practitioner with a lot of experience in data architecture and its evolution toward better business practices. His presentation at the TDWI BI Executive Summit opened my eyes, making me think about data warehousing and the importance of data architecture not only from a technical perspective but also from an organizational view: profit making, decision making, and so on.
            We can use a traditional data architecture for our business, but with the business environment changing faster than ever, more businesses are going agile (Powell, James E). Why would we want our data architecture to be agile?
            Business processes change every day, and technology keeps evolving. By going agile, we can set small targets, striking a balance between current business needs and the future of the architecture. We can communicate with the business people to learn more about their needs and come up with a solution that is not final, yet fulfills the basic requirements for the business to run (O'Brien, John).
            It is an iterative process in which we continually build on the reusable architecture we have already developed, arriving at a better architecture that serves the business more efficiently as its needs and processes change (O'Brien, John). If, instead of going agile, we tried to build a solid architecture that took many years (time) and more money (investment), it might not suit the business: it may not be ready when the business needs it most (return on investment), and it may become outdated as the business process changes (O'Brien, John).
            So what is the use of a perfect system that the business may never be able to use? We need to balance quick delivery against the consistency and efficiency of the architecture we are building with the agile method (O'Brien, John).
            Thin slicing by layers is very important when we go agile. First, we connect to the sources and deliver data marts. When we are comfortable, we can add a staging layer. As things get more complex, we can move to an EDL (Enterprise Dimensional Layer), and when the requirements become even more demanding for changing business needs, we can add the data warehouse layer (O'Brien, John). Evolution is ongoing, and we don't move to the advanced layers unless the business really needs them. That is the advantage of going agile.
            When going agile, sandboxing may help us create the architecture efficiently. By creating a sandbox, a medium-sized database server of our own, we can actually play with the real data (Singh, Amit). This helps us explore the requirements while building the data architecture with an agile methodology (O'Brien, John).
            Another important step in going agile is to modularize: to write code that can be reused (O'Brien, John). Communication is very important within the team as well as with management. As a team working to create the architecture, we need to set standards for the code and applications we write, so that others see value in following those standards (O'Brien, John). Over time, we also need someone to play a governance role, because the scope of the project grows and someone must help the team leverage more of what it has already delivered (Powell, James E).
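            As a toy illustration of what "reusable" might mean in code, here is a small Python sketch; the function and parameter names are my own invention, not from the presentation.

# One small, reusable loading function instead of copy-pasted logic.
def load_slice(rows, keep_columns, target):
    """Copy only the agreed-upon columns of each row into the target list."""
    for row in rows:
        target.append({column: row[column] for column in keep_columns})
    return target

# The same module is reused for two different data marts.
source = [{"id": 1, "region": "east", "sales": 100, "internal_note": "x"}]
sales_mart = load_slice(source, ["id", "region", "sales"], [])
region_mart = load_slice(source, ["id", "region"], [])
print(sales_mart, region_mart)
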
            Basically, just as with RAD (Rapid Application Development), going agile when building our data architecture proves more practical and efficient in today's world, and it has many positives to offer our business and our changing business requirements.


Works Cited:
O'Brien, John. "Agile Architectures: Creating and Evolving Data Architectures That Adopt to Changing             Business." TDWI - The Data Warehousing Institute. TDWI, 15 Feb. 2011. Web. 09 Apr. 2011.             <http://tdwi.org/pages/video/lv-wc-ss-videos/video-assets/agile-architectures-creating-and-    evolving-data-architectures-that-adapt.aspx>.
Powell, James E. "Q and A: Agile BI Architectures." TDWI - The Data Warehousing Institute.      21 July 2010. Web. 10 Apr. 2011. <http://tdwi.org/articles/2010/07/21/agile-bi-  architectures.aspx>.
Singh, Amit. "Sandboxing." A Taste of Computer Security. June 2004. Web. 10 Apr. 2011.             <http://www.kernelthread.com/publications/security/sandboxing.html>.

Friday, April 1, 2011

Data Warehouse and its Uses

April 1, 2011


Data Warehouse and its Uses

In today's world, information is stored in systems, replacing the paper that used to be the main medium for information and data storage. Gone are the days when we had to file and index everything by hand to manage information. Today, we have search algorithms, and we can search through massive amounts of information with the click of a button.

Information and data are constantly stored in an organized way by the general staff of the organization. These staff members don't deal with upper-level management and don't make decisions based on that information. Instead, the staff involved with the data warehouse are responsible for those operations. They look at the data from a bird's-eye view, unlike the general staff, who take a worm's-eye view and are not concerned with the broader perspective on the data.

Therefore, data warehousing is the process of extracting useful data out of massive databases from different sources, which is very valuable to upper-level management for decision making and for obtaining information for further analysis of the business process.
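
As a toy sketch of that idea, the Python snippet below combines data from two invented "source systems" into one summary a manager could use; everything here is made up for illustration.

# Two invented "source systems"...
orders_system = [{"customer": "alice", "amount": 300.0},
                 {"customer": "bob",   "amount": 150.0},
                 {"customer": "alice", "amount": 200.0}]
support_system = [{"customer": "alice", "open_tickets": 0},
                  {"customer": "bob",   "open_tickets": 2}]

# ...combined into one customer-level summary for management.
summary = {}
for order in orders_system:
    record = summary.setdefault(order["customer"], {"total_spent": 0.0, "open_tickets": 0})
    record["total_spent"] += order["amount"]
for ticket in support_system:
    record = summary.setdefault(ticket["customer"], {"total_spent": 0.0, "open_tickets": 0})
    record["open_tickets"] = ticket["open_tickets"]

print(summary)  # one consolidated view instead of two separate systems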

Data warehousing requires technical knowledge of how the data is stored in the system as well as knowledge of the business process, so that data warehouse managers can perform their tasks. In other words, the data warehouse process involves technical as well as managerial skills.

A data warehouse is very useful in today's world, where we can clearly sense information overload in most organizations. It helps an organization make decisions by selecting the information it needs and setting aside data that is not useful for the process at hand (Kimball, Page 4). It helps an organization access information easily, present it to top-level management efficiently, and adapt to changes in technology (Kimball, Page 3). Information security is always a top priority for any organization, and data warehousing can support security decisions through reports that reveal loopholes in the system or in the organization itself (Kimball, Page 4).

With the help of data warehousing we can reduce the occurrence of slow reports and dashboards (Hillam, Jared). It provides a mission-level view of the organization by extracting the useful data out of massive databases from different sources. It supports efficient, well-informed decision making by top-level management, which helps the organization earn a profit (Hillam, Jared). It also helps reduce inefficient and expensive information access (Hillam, Jared).

A data warehouse also addresses the problem of the multiple versions of the truth that individual departments may generate. Warehousing looks at the data as a whole, across all departments, and produces reports that consolidate all of the related data (Hillam, Jared). Hence, it helps resolve the problem of dealing with multiple versions of the truth.



Works Cited:

Hillam, Jared. "YouTube - Demystifying the Data Warehouse." YouTube - Broadcast Yourself. 18 June 2009. Web. 01 Apr. 2011. .



Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.