Provide an example of a data warehouse model defining the grain, dimensions and facts of the data warehouse.

Task 1: 180-200 words with reference discussion reply to the discussion post.

Discussion Question:

Provide an example of a data warehouse model defining the grain, dimensions and facts of the data warehouse.

Discussion Post:

The grain states exactly what a single row in the fact table represents (The Kimball Group, n.d.).  This matters because it fixes the level of detail available in the fact table and, in turn, what the dimensional model can offer.  Granularity is usually determined during requirements gathering, based on what best suits the business (IBM, n.d.). Defining the grain is the starting point and one of the most important steps in building any dimensional model, because it determines what data will be available in the database; every dimension and fact must be consistent with the grain (IBM, n.d.).

Put simply, choosing a grain means choosing the template that the dimensions and the fact table must follow.  The granularity must be specified to determine what data is kept in the database. For example, if a date field holds months and years but not days, its granularity is at the month level.  In that case, any analysis of the data can only reveal month-to-month patterns, not daily activity (IBM, n.d.).
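
To make the month-level point concrete, here is a minimal Python sketch. The sales records and the to_month_grain helper are invented for illustration and do not come from the cited sources. Once daily rows are rolled up to one row per month, the individual days can no longer be recovered from the result.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (sale date, amount). Values are invented.
daily_sales = [
    (date(2024, 1, 3), 120.0),
    (date(2024, 1, 17), 80.0),
    (date(2024, 2, 5), 200.0),
    (date(2024, 2, 6), 50.0),
]


def to_month_grain(rows):
    """Roll daily rows up to one row per (year, month)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly_sales = to_month_grain(daily_sales)
print(monthly_sales)  # {(2024, 1): 200.0, (2024, 2): 250.0}
```

The four daily rows become two monthly rows, so a question such as "which day in February sold the most?" cannot be answered from monthly_sales alone.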

Some examples of a grain are:

· A car in a car dealership with price, age, model and mileage

· Items on a bill to be paid by a certain date

· The amount of money in a bank account on a certain date.

Each of the above describes a single record in a fact table, which in turn determines what information the dimensions should contain.
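
As a rough sketch of how a grain, its dimensions, and a fact fit together, the bank account example could be modeled as below using Python's built-in sqlite3 module. The table and column names (dim_account, dim_date, fact_daily_balance) and the sample values are my own assumptions, not taken from the cited sources. The grain is one row per account per day, the dimensions are the account and the date, and the fact is the balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_account (
        account_key INTEGER PRIMARY KEY,
        owner TEXT,
        account_type TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        full_date TEXT
    );
    -- Grain: one row per account per day.
    CREATE TABLE fact_daily_balance (
        account_key INTEGER REFERENCES dim_account(account_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        balance REAL,
        PRIMARY KEY (account_key, date_key)
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO dim_account VALUES (1, 'A. Example', 'checking')")
conn.execute("INSERT INTO dim_date VALUES (20240131, '2024-01-31')")
conn.execute("INSERT INTO fact_daily_balance VALUES (1, 20240131, 1500.0)")

# A second balance for the same account and day would break the grain,
# so the composite primary key rejects it.
try:
    conn.execute("INSERT INTO fact_daily_balance VALUES (1, 20240131, 99.0)")
    grain_violated = False
except sqlite3.IntegrityError:
    grain_violated = True

print(grain_violated)  # True
```

Using the dimension keys as a composite primary key is only one way to enforce the grain; the point of the sketch is that every fact row must correspond to exactly one combination of the declared dimensions.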

 

Works Cited

IBM. (n.d.). Identify the Grain. Retrieved from IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/SS9UM9_9.1.1/com.ibm.datatools.dimensional.ui.doc/topics/c_dm_design_cycle_2_idgrain.html

The Kimball Group. (n.d.). Grain. Retrieved from kimballgroup.com: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/grain/

Discussion reply:

Task 2: 180-200 words with reference discussion reply

Discussion Question:

Identify the importance of selecting the Grain of a data warehouse in the Kimball Data Warehouse Model. Provide examples of grains within a Data Warehouse.

Discussion post:

This week’s discussion is about the importance of selecting the grain of a data warehouse. Before I discuss why selecting a grain matters, I would like to define what it means to declare a grain in a data warehouse.  According to Kimball (2003), declaring the grain means saying exactly what a fact table record represents.  The following are a few examples of declaring a grain in a data warehouse:

· A movie ticket used by someone at the movies

· A line item on a receipt from a grocery store

· An inventory measurement taken every day of products in a warehouse

Declaring the grain of a data warehouse is important because it clearly identifies which dimensions can be used and which cannot. Declaring the grain also has a powerful effect on the design: “Every fact table design must be rooted in the realities of available physical data sources” (Kimball, 2003). With the grain declared, one can clearly visualize the possible dimensionality of a fact table when considering the data sources and decide whether a given dimension can be attached to that data. This keeps every fact in the design anchored to the realities of the available data sources. Declaring the grain also allows creativity in adding dimensions to a fact table design that may not be obvious in the data source. Finally, declaring the grain helps to verify the measured numeric facts: each fact must be true to the grain.

References

Kimball, R. (2003). Declaring the Grain. Retrieved from https://www.kimballgroup.com/2003/03/declaring-the-grain/

IBM. (n.d.). Identify the grain. Retrieved from https://www.ibm.com/support/knowledgecenter/en/SS9UM9_9.1.1/com.ibm.datatools.dimensional.ui.doc/topics/c_dm_design_cycle_2_idgrain.html

Discussion Reply:

Task 3: 180-200 words with reference discussion reply

Discussion Question:

Describe one unique and specific example where you would use classification of type Decision Tree, Bayesian or Rule-Based and explain WHY. Use references and justification to support your point of view.

Discussion post:

Decision trees are used to decide on a specific action, which can be traced from the root to a leaf. Two kinds of entities make up the tree: decision nodes and leaves. The decision nodes split the data, and the leaves are the final decisions or results. Decision trees are most useful when we have a complete data set and want to test whether independent variables are related to a target variable. A decision tree is a simple display of how data is classified, and it is an easy method to learn and understand.

For example, consider predicting whether a person is fit from information such as age, eating habits, and physical activity. The decision nodes are questions like “What is the person’s age?”, “Does he exercise?”, and “Does he eat many pizzas?”, and the leaves are either “fit” or “unfit.” This is a binary classification problem (a yes/no type problem), and the decision variable is categorical (Kulkarni, 2017).
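
The fit/unfit tree can be sketched in a few lines of Python. This is a hand-written illustration rather than a tree learned from data: the Person fields, the age threshold of 30, and the order of the questions are my own assumptions. Each if statement plays the role of a decision node, and each returned string is a leaf.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    eats_many_pizzas: bool
    exercises: bool


def classify(person):
    """Walk the tree from the root to a leaf and return the leaf label."""
    # Root decision node: age (the threshold of 30 is an assumed value).
    if person.age < 30:
        # Decision node for younger people: eating habits.
        return "unfit" if person.eats_many_pizzas else "fit"
    # Decision node for older people: physical activity.
    return "fit" if person.exercises else "unfit"


print(classify(Person(age=25, eats_many_pizzas=False, exercises=False)))  # fit
print(classify(Person(age=45, eats_many_pizzas=True, exercises=False)))   # unfit
```

In practice the questions and thresholds would be chosen by a learning algorithm from training data, but the resulting classifier has the same shape: a chain of readable questions ending in a class label.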

The decision tree is one of the most popular classification algorithms in data mining and machine learning. Examples include evaluating brand expansion opportunities for a company using historical sales data; determining the likely buyers of a product using demographic data, in order to target a limited advertising budget; and helping to prioritize the treatment of emergency room patients using a predictive model based on factors such as age, blood pressure, gender, location and severity of pain, and other measurements (Decision-making solutions, n.d.).

References

Kulkarni, M. (2017). Decision Trees for Classification: A Machine Learning Algorithm. Retrieved from https://www.xoriant.com/blog/product-engineering/decision-trees-machine-learning-algorithm.html

Decision-making solutions. (n.d.). The decision making tree – A simple way to visualize a decision. Retrieved from https://www.decision-making-solutions.com/decision-making-tree.html

Discussion Reply:

Task 4: 180-200 words with reference discussion reply

Discussion Question:

Describe one unique and specific example where you would use classification of type Decision Tree, Bayesian or Rule-Based and explain WHY. Use references and justification to support your point of view.

Discussion post:

Naive Bayes is a method of classification that is derived from the probability theory of the Reverend Thomas Bayes.  Naive Bayes looks at the attribute values of an instance and determines the probability of each possible class, which is then used to classify the instance.  Training data is used to estimate these probabilities; this is done by looking at one attribute at a time and seeing how often a certain value of that attribute occurs with each class.  Naive Bayes combines prior probability (the probability of a class before any attribute values of the instance are known) and conditional probability (the probability of an attribute value given the class, estimated from the training data) into one formula that often classifies accurately (Bramer, 2016).

An example of the use of Naive Bayes from the digital marketing world that I live in would be estimating the probability of a user making a purchase based on certain previous events.  For example, we might have a data set where the class indicates whether or not an item was purchased, as a + or -. Some of the attributes in this data set are related to other activities the potential buyer may have participated in.  Some of these items are:

· Purchased item previously

· Received promotional email

· Visited product page

· Buyer location

· Buyer age

Naive Bayes can be used to predict the class from the values of some of these attributes.  The probability of a purchase given one or more of these attributes can then serve as a predictor of whether a purchase will occur.
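
To show how the prior and conditional probabilities combine, here is a small Python sketch using three of the attributes above (purchased previously, received promotional email, visited product page). The eight training rows are invented for illustration, and the train and posterior helpers are my own names, not from Bramer (2016). The sketch uses plain frequency counts with no smoothing and relies on the usual "naive" assumption that the attributes are independent given the class.

```python
from collections import Counter, defaultdict

# Hypothetical training rows:
# ((purchased_before, received_email, visited_page), class)
training = [
    ((True, True, True), "+"),
    ((True, False, True), "+"),
    ((False, True, True), "+"),
    ((True, True, False), "+"),
    ((False, False, False), "-"),
    ((False, True, False), "-"),
    ((False, False, True), "-"),
    ((True, False, False), "-"),
]


def train(rows):
    """Count how often each class and each (class, attribute, value) occurs."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for attrs, label in rows:
        for i, value in enumerate(attrs):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def posterior(attrs, class_counts, value_counts):
    """Return the normalized probability of each class for one instance."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior probability of the class
        for i, value in enumerate(attrs):
            # conditional probability of this attribute value given the class
            score *= value_counts[label][i][value] / n
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:
        return scores  # every class scored zero; nothing to normalize
    return {label: s / norm for label, s in scores.items()}


class_counts, value_counts = train(training)
probs = posterior((True, True, True), class_counts, value_counts)
print(max(probs, key=probs.get))  # +
```

For a user who purchased before, received the email, and visited the page, the + class gets by far the higher probability on this toy data. Because no smoothing is applied, a single attribute value never seen with a class would drive that class's score to zero, which is why real implementations usually adjust the counts.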

Bramer, M. (2016). Principles of Data Mining. Undergraduate Topics in Computer Science. Springer London. https://doi.org/10.1007/978-1-4471-7307-6

Discussion Reply: