The Definition of the data ware house focuses on the data storage,however , the means to retrieve and analyse the data,to extract ,transform and load data,and to manage the data dictionary are also considered essential components of a data warehousing system.
Now lets understand Data warehouse Architecture
Simple conceptualization of data warehouse architecture consists of the following interconnected layers:
1.Operational Database Layer
-An organisation’s Enterprise Resource Planning system fall into this layer
2.Data Access Layer
-The interface between the operational and informational access layer,tools to extract,transform,load data into the warehouse fall into this layer.
-The Data dictionary ,this is more detailed data about data stored
4.Informational Access Layer
-The Data Access for reporting and analyzing and the tools for reporting and analyzing data- Business intelligence tools fall into this category
Now lets see the objectives of data warehouse
1.The primary objective of data ware house system is to provide consolidated,flexible,meaningful data storage to the end user for reporting and analysis
2.Achieving a constant and efficient connection to the data source
3.Harnessing efficient data cleaning and loading technologies to the warehousing system. this process is known as Data Transformation services or Data Preparation and Staging
4.Data storage-The cleaned and stored data will have to be partitioned,summarized and stored for efficient Query and analysis
5.Finally tools necessary for Query ,analysis and reporting on data would have to be built into the system to the process to deliver a rich end user experience. This process is known as Data Representation
Now its time to understand the Basic Difference between Data warehouse and Database System.
There are a number of fundamental differences which separates a data ware house from a database. the primary difference between an application database and a data warehouse is that while former is designed to record,the latter has to be designed to respond to analysis questions that are critical for a business.
Basically a database is any system which keeps data in a table format.A Data warehouse is a especially setup database designed to hold large amount of data for reporting purposes. while normal database is optimized for transitional activity while keeping small amount of history , a data ware house is optimized for large scale reporting. within a data warehouse data from several systems are typically be merged together to present a global Enterprise view.
All data warehouses are databases,not all databases are data warehouses.
OLTP-On-line Transaction processing systems-a application database where every transaction has to be recorded , and super fast at that.ATM transaction before the ATM machine is the good example.
Advantages of data warehousing are as follows
1.A data warehouse provides a common data model for all data of interest regardless of the data source. this makes it easier to report and analyze information.
2.Prior ro loading data into the data warehouse,inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
3.As it is separate from operational system ,Data warehouse provides retrieval of data without slowing down the operational systems.
Disadvantage of Data warehouses are as follows
1.As data must be extracted,transformed and loaded into the ware house ,there is an element of latency in the data ware houses
2.Over the life data warehouses can have high costs. data warehouses is usually not static. maintenance cost is high.
We still have to understand some basic database terminology that we can come across many time while reading the other tutorials ahead. so lets have a look at it.
A metadata is the data about data.The metadata is the data that describes objects in the database and makes easier for those objects to be accessed or manipulated. it describes the database structure,constraints,authorization and sizes of data types and so on. These are often used as an integral tool for information resource management.
Data mining is the process of finding certain kind of patterns in the data. there are different types of data mining tools like OLAP- online analytical processing tools simplify analysts tasks by letting them view summaries of data in different ways. for example an analyst can see total sales by region or by product or by combination of product and region.
Data mining commonly involves four classes of tasks
4.Association rule learning
A Data mart is a collection of subject areas organised for decision support based on the needs of a given department. Finance has their data mart,marketing has theirs, and sales have theirs like wise. Each Department has its own interpretation of what a data mart should like and each department is specific to its own needs
Reason for creating data mart
1.Easy access to frequently needed data
2.creates collective view by a group of users
3.improves end user response time
4.Ease of creation.
5.lower cost that implementing a full data warehouse
A System Catalog is a repository of information describing the data in the database,that its the metadata. System catalog is a system created database that describes all databases objects , Data dictionary information and user access information, it also describes table related data such as table names,creators or owners,columns name,data types,data sizes,foreign keys and primary keys,indexed files, users,user access privileges .
Data Items/Field and Record and Files
A Data Item is the smallest unit of the data that has meaning to its user .it is traditionally called a field or data element.
A Record is collection of logically related fields or data items, with each fields possessing a fixed number of bytes and having a fixed data types. a record consist of value for each fields.
A file is a collection of related sequence of records.All records in a file are of the same record type.
Data dictionary also called information repositories are mini database management systems that manages metadata. it is repository of information about a database ,attributes names and definition for each table in the database.
Data Dictionary usually stores and manages following types of information
1.Description of the schema of the database
2.detailed information on the physical database design ,such as storage structures and access paths and files and record size
Component of Data Dictionary
Data dictionary contain following components
1.Entity-Any Item about which information is stored is called entity.
2.Attributes-An Attribute is property or characteristic or field of an Entity
3.Relationship– an association or the way that different entities relate to each other is called Relationship.
4.Key-Key is single attribute or combination of attributes of an entity set that is used to identify one or more instance of the set. There is Primary Key which is uniquely identify a record.there are other types of keys as well that we will discuss as we move on with other tutorials.
In the next tutorial we will discuss in detail about the Database management System.