Data Warehousing and Data Mining - Unit Wise Questions
1. What are the key steps in knowledge discovery in databases? Explain.
1. Differentiate between data warehouse and data mining. Explain the stages of knowledge discovery in databases with an example.
1. Differentiate between data warehouse and data mining.
1. Write down one advantage and one disadvantage of MOLAP over ROLAP. Define a signed network. How do you check whether it is balanced or not? How does beam search reduce space complexity? Illustrate with an example. [2+4+4]
2. Why do we need to preprocess the data before running a data mining algorithm? What are the processes involved? Explain. Give some examples of noise that must be removed from the data when extracting patterns.
2. Explain the functionalities and classification of data mining systems with examples.
2. Explain the various data mining task primitives in detail.
2. How is a concept hierarchy used in extracting information? Generate the frequent patterns from the following data set using FP-growth, where minimum support = 3. [2+8]
3. Explain the architecture and implementation of a data warehouse with an example.
3. How do you compare two classifiers? Given the points A(3,7), B(4,6), C(5,5), D(6,4), E(7,3), F(6,2), G(7,2) and H(8,4), find the core points and outliers using DBSCAN. Take Eps = 2.5 and MinPts = 3. [2+8]
3. Explain the architecture of a data mining system with a schematic diagram.
3. What kind of data preprocessing do we need before applying a data mining algorithm to a data set? Explain the binning method to handle noisy data with an example.
4. What are the stages of knowledge discovery in databases (KDD)?
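A short Python sketch for checking the DBSCAN part of this question. It assumes Euclidean distance, treats q as a neighbour of p when dist(p, q) ≤ Eps, and counts a point in its own Eps-neighbourhood (the convention of the original DBSCAN definition); if an answer uses the other convention (excluding the point itself), the core set changes, so the convention should be stated. The names `POINTS`, `neighbours` and `classify` are illustrative.

```python
import math

# Points, Eps and MinPts as given in the question.
POINTS = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS = 2.5
MIN_PTS = 3


def neighbours(name):
    """Eps-neighbourhood of a point (assumption: it includes the point itself)."""
    return [q for q, xy in POINTS.items() if math.dist(POINTS[name], xy) <= EPS]


def classify():
    """Split the points into core, border and noise (outlier) points."""
    core = {p for p in POINTS if len(neighbours(p)) >= MIN_PTS}
    # Border point: not core, but inside the Eps-neighbourhood of a core point.
    border = {p for p in POINTS
              if p not in core and any(q in core for q in neighbours(p))}
    noise = set(POINTS) - core - border
    return core, border, noise


core, border, noise = classify()
print("core:", sorted(core))
print("border:", sorted(border))
print("noise (outliers):", sorted(noise))
```

Under these assumptions B, C, D, E, F, G and H are core points, A is a border point (it lies within Eps of the core point B), and no point is an outlier.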
4. What are the basic stages of KDD?
4. What is the purpose of cluster analysis in data mining? Explain.
5. How does KDD differ from data mining? Describe the stages of data mining.
2. "Data mining is a part of KDD." Do you agree or disagree? Justify. Explain the different stages in KDD. [3+7]
5. Describe the types of data used in data mining.
4. What significant role does classification play in data mining? Explain.
4. When is a pattern said to be interesting? List the issues of data mining. [1+4]
5. Is the information given by data mining always useful? What are the issues in data warehousing and data mining?
6. Differentiate between OLAP and OLTP.
3. How can data be modeled in a multidimensional data model? Explain the conceptual modeling of a data warehouse. [4+6]
5. Define data discretization. Describe the tasks for data preprocessing. [1+4]
7. Differentiate between KDD and Data Mining.
6. Explain the four characteristics of a data warehouse.
7. Differentiate between KDD and Data Mining.
6. Define spatial data mining. What are the challenges of multimedia mining? Describe with an example. [2+3]
7. Consider the following data set.
Find out whether the object with attributes Confident = Yes, Sick = No will Fail or Pass using Bayesian classification. [5]
4. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. [5]
8. What are the choices for data cube materialization? Explain the strategies for cube computation. [2+3]
5. Can we use an operational database instead of a data warehouse? List the nature of a data warehouse. [1+4]
9. Show the conflict between the theory of balance and the theory of status. How can Apriori be improved? [2+3]
6. Why is it necessary to pre-compute the data cube? What are the possible issues in performing data cube computation? [3+2]
10. Differentiate between star schema and snowflake schema. List any two methods for data normalization. [2+3]
7. Describe any three methods to normalize the group of data.[5]
11. Differentiate between KDD and data mining.
10. Describe the genetic algorithm as a problem-solving technique in data mining.
11. How do you evaluate the accuracy of a classifier? Discuss the advantages of using k-fold cross-validation. [2+3]
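A minimal Python sketch of how k-fold cross-validation partitions the records, which is where its main advantage comes from: every record is used for testing exactly once and for training k − 1 times, so the accuracy estimate (the average of the k per-fold accuracies) does not depend on one lucky or unlucky hold-out split. The function names are hypothetical, and in practice the records would be shuffled (or stratified by class) before splitting.

```python
def k_fold_indices(n, k):
    """Split record indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n, k):
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in cross_validation_splits(10, 3):
    print("train:", train, "test:", test)
```

A classifier would be trained on each `train` list and scored on the matching `test` list; the reported accuracy is the mean of those k scores.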
13. Write short notes (Any Two)
a) MOLAP
b) Data cubes
c) Snowflakes
d) Regression
8. What are the significances of association rules in data mining? List the types of association rules with examples.[2+3]
13. Write short notes (Any Two)
a) Stars
b) HOLAP
c) Data Specification
d) Mining and the World Wide Web (WWW)
13. Write short notes (Any Two)
a) HOLAP
b) Hierarchy specification
c) Spatial Database
13. Write short notes (Any Two)
a) Data cubes
b) HOLAP
c) Spatial Database
12. Apply the K-means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for two iterations and show the clusters. Initially, choose the first two objects as the initial centroids. [5]
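A small Python sketch for tracing this question: plain K-means with Euclidean distance, the first two objects as initial centroids, and each iteration taken as one assignment step followed by one centroid update. It assumes no cluster becomes empty (true for this data); the helper names are illustrative.

```python
import math

DATA = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]


def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def mean(cluster):
    """Centroid of a non-empty cluster of 2-D points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def kmeans(points, centroids, iterations):
    """Run a fixed number of assign/update iterations and record each result."""
    history = []
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [mean(c) for c in clusters]
        history.append((clusters, centroids))
    return history


history = kmeans(DATA, [DATA[0], DATA[1]], iterations=2)
for it, (clusters, centroids) in enumerate(history, start=1):
    print(f"Iteration {it}: clusters={clusters} centroids={centroids}")
```

With this data the first iteration gives the clusters {(185, 72), (179, 68), (182, 72), (188, 77)} and {(170, 56), (168, 60)}, with new centroids (183.5, 72.25) and (169, 58); the second iteration reproduces the same clusters, so the algorithm has converged.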
9. How do you index OLAP data? Give examples.[5]
13. Write short notes (Any Two)
a) Text Database Mining
b) Back propagation Algorithm
c) Regression
d) HOLAP
13. Write short notes on (Any Two)
a. Evolution analysis
b. Decision trees
c. Text mining
d. Classification using Regression
10. Apriori needs to scan the dataset many times, which reduces its efficiency. Explain some mechanisms to improve its efficiency. [5]
11. Differentiate between OLTP and OLAP. [5]
12. Which approach is better for clustering, hierarchical or partitioning? Justify. List some drawbacks of k-means. [2+3]
13. Write short notes (Any Two)
a. Outlier Analysis
b. Web Mining
c. Query Manager
d. Pros and Cons of Association rules
1. Explain the architecture of a data mining system with a block diagram.
2. Explain the difference between a DBMS and a data warehouse.
2. Do pattern and information refer to the same aspect? Justify. Differentiate between data warehouse and operational database.
1. Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures: count and avg-grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg-grade measure stores the actual course grade of the student. At higher conceptual levels, avg-grade stores the average grade for the given combination.
a) Draw a snowflake schema diagram for the data warehouse.
b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each Big University student?
c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?
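For part (c), the usual count for a cube whose dimensions have concept hierarchies is the product, over the n dimensions, of the number of levels in each hierarchy with the virtual top level "all" included. Writing L_i for the number of levels of dimension i excluding "all" (here n = 4 dimensions, each with L_i = 4 levels below "all"), a worked version is:

```latex
T = \prod_{i=1}^{n} (L_i + 1) = (4 + 1)^{4} = 5^{4} = 625
```

So, under this reading of "five levels (including all)", the cube contains 625 cuboids, counting both the base and apex cuboids.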
3. Explain the architecture and implementation of a data warehouse with an example.
3. Explain the data warehouse architecture. Differentiate between distributed and virtual data warehouse.
4. What do you mean by knowledge discovery in databases (KDD)?
4. Differentiate between data marts and metadata.
4. Explain the multidimensional data model with an example.
5. List down the functions of metadata.
5. What do you mean by a virtual data warehouse?
5. Explain the application of data warehouse and data mining.
5. Differentiate between DBMS and Data Warehouse.
6. Explain distributed and virtual data warehouses.
6. Explain the similarities and dissimilarities between operational database and data warehouse.
7. Explain the multidimensional data model.
5. Differentiate between data marts and data cubes.
7. Explain the data mining techniques.
8. How are different schemas used to model a data warehouse? Explain.
9. Why is data cube computation an essential task in data mining? Describe the general strategy for data cube computation.
8. How does the multidimensional data model help in retrieving information? Explain with a suitable example.
10. Describe the different components of a data warehouse.
11. Define dimension table and fact table. What makes the multidimensional data model necessary?
12. What is DMQL? How do you define Star Schema using DMQL?
2. Describe how bitmap and join indexing are used to represent OLAP data. Explain the different components of data warehouse.
5. Differentiate between OLTP and OLAP.
6. Differentiate between OLAP and OLTP.
6. Explain OLAP operations with examples.
7. List the types of OLAP operations with examples.
6. Explain OLAP operations with examples.
9. Compare the OLAP servers, ROLAP, MOLAP and HOLAP.
11. Differentiate between OLTP and OLAP.
12. Explain the data mining languages.
13. Write short notes on (any two):
a) Concept hierarchy
b) Data mining Query Language
c) Text mining
d) ROLAP vs MOLAP
6. Explain the tuning and testing of Data Warehouse.
6. Explain the tuning and testing of Data Warehouse.
8. List down the data mining tools.
8. What are the data warehouse back end tools? Explain.
7. Explain the optimization techniques in data cube computation.
9. Describe the significance of pre-computing the data cube.
11. What is a data cube? Explain with an example.
12. What does data warehouse tuning mean? Describe the parameters.
7. Explain the data cube with an example.
8. Explain the Apriori Algorithm.
1. Consider the following training dataset of 14 records, which assigns a credit risk of high, moderate or low to people based on the following properties of their credit rating:
a. Collateral with possible values {Adequate, None}
b. Income with possible values {"Rs 0K to Rs 15K", "Rs 15K to Rs 35K", "Over Rs 35K"}
c. Debt with possible values {High, Low}
d. Credit history with possible values {Good, Bad, Unknown}
Classify the individual with credit history = unknown, debt = low, collateral = adequate and income = Rs 15K to Rs 35K using a decision tree. Use the ID3 algorithm to build the decision tree. [10]
4. List and describe the five primitives for specifying a data mining task.
7. Explain the primitives of data mining query language.
8. Explain the data mining query language with an example.
8. Explain the data mining query language.
1. What do you mean by the representative object-based clustering technique? Explain in detail with an example.
1. Discuss the types of web mining. Explain why K-means is sensitive to outliers and how K-medoids minimizes this issue.
2. Define clustering. Explain the partitioning and hierarchical clustering methods with examples.
2. What do you mean by clustering? Explain the K-means and K-medoids algorithms with examples.
3. List the two steps used in the classification approach, with their issues. Is it always the right decision to use a neural network as a classifier? Give your opinion. Discuss the working mechanism of the backpropagation classification algorithm.
3. Explain the K-means and K-medoids algorithms with examples.
9. Explain the K-medoids algorithm.
7. List the drawbacks of the ID3 algorithm, including overfitting, and the techniques to remedy them.
8. Write the algorithm for K-means clustering. Compare it with the k-nearest neighbor algorithm.
10. What are the types of Regression? Explain.
10. Explain the types of Regression.
10. What is the objective of the K-means algorithm?
12. Discuss the approach behind Bayesian classification. Why is a smoothing technique necessary in Bayesian classification?
13. Write short notes (Any Two)
a) OLAP queries
b) Snowflakes
c) K-means
d) Mining text databases
1. You are given the transaction data shown below from a fast food restaurant. There are 9 distinct transactions (order 1 to order 9) and 5 meals in total (M1 to M5) involved in the transactions.
Order | List of item IDs |
order 1 | M1, M2, M5 |
order 2 | M2, M4 |
order 3 | M2, M3 |
order 4 | M1, M2, M4 |
order 5 | M1, M3 |
order 6 | M2, M3 |
order 7 | M1, M3 |
order 8 | M1, M2, M3, M5 |
order 9 | M1, M2, M3 |
Minimum support = 2, Minimum confidence = 0.7
Apply the Apriori algorithm to the database to identify the frequent k-itemsets and find all strong association rules.
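A compact Python sketch of Apriori for checking a hand-traced answer to this question. Supports are absolute counts and confidence is support(X ∪ Y) / support(X). As a simplification, candidates are generated by taking the union of any two frequent (k−1)-itemsets that yields k items, instead of the textbook prefix-based join; after the prune step this gives the same candidate set. The function names are illustrative.

```python
from itertools import combinations

# Orders 1-9 from the table above.
TRANSACTIONS = [
    {"M1", "M2", "M5"}, {"M2", "M4"}, {"M2", "M3"},
    {"M1", "M2", "M4"}, {"M1", "M3"}, {"M2", "M3"},
    {"M1", "M3"}, {"M1", "M2", "M3", "M5"}, {"M1", "M2", "M3"},
]
MIN_SUP = 2       # minimum support (absolute count)
MIN_CONF = 0.7    # minimum confidence


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_sup):
    """Return {frozenset: support count} for every frequent itemset."""
    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    frequent = {}
    k = 1
    while level:
        counts = {c: support(c, transactions) for c in level}
        lk = {c: n for c, n in counts.items() if n >= min_sup}   # L_k
        frequent.update(lk)
        k += 1
        # Simplified join: unions of two frequent (k-1)-itemsets with k items.
        joined = {a | b for a in lk for b in lk if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in joined
                 if all(frozenset(s) in lk for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent, min_conf):
    """Rules X -> Y with support(X u Y) / support(X) >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


frequent = apriori(TRANSACTIONS, MIN_SUP)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
for lhs, rhs, conf in strong_rules(frequent, MIN_CONF):
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.0%}")
```

For this data the sketch reports L1 = {M1: 6, M2: 7, M3: 6, M4: 2, M5: 2}, six frequent 2-itemsets, and the frequent 3-itemsets {M1, M2, M3} and {M1, M2, M5} (support 2 each). The rules that reach 70% confidence are M4 → M2, M5 → M1, M5 → M2, {M1, M5} → M2, {M2, M5} → M1 and M5 → {M1, M2}, each with confidence 100%.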
3. List the problems of Apriori algorithm with its possible solutions. Consider the following transaction dataset.
Transaction_ID | Item_List |
T1 | {K, A, D, B} |
T2 | {D, A, C, E, B} |
T3 | {C, A, B, E} |
T4 | {B, A, D} |
What association rules can be found in this set if the minimum support is 3 and the minimum confidence is 80%?
3. Give any two types of association rules with examples. Trace the results of using the Apriori algorithm on the grocery store example with support threshold 2 and confidence threshold 60%. Show the candidate and frequent itemsets for each database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that are generated.
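Because this data set has only four transactions and six distinct items, a brute-force check of all 63 non-empty itemsets is a quick way to verify a hand-traced Apriori answer. The sketch below is that check, not Apriori itself (it does no level-wise candidate pruning), and it reads the minimum support of 3 as an absolute count of transactions.

```python
from itertools import combinations

TRANSACTIONS = [
    {"K", "A", "D", "B"},        # T1
    {"D", "A", "C", "E", "B"},   # T2
    {"C", "A", "B", "E"},        # T3
    {"B", "A", "D"},             # T4
]
MIN_SUP = 3      # minimum support (absolute count)
MIN_CONF = 0.8   # minimum confidence (80%)

ITEMS = sorted(set().union(*TRANSACTIONS))


def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


# Brute force: test every non-empty itemset (2**6 - 1 = 63 of them here).
frequent = {}
for k in range(1, len(ITEMS) + 1):
    for combo in combinations(ITEMS, k):
        n = count(frozenset(combo))
        if n >= MIN_SUP:
            frequent[frozenset(combo)] = n

# Every non-empty proper subset X of a frequent itemset gives a rule X -> rest.
rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(sorted(itemset), r)):
            conf = sup / frequent[lhs]
            if conf >= MIN_CONF:
                rules.append((lhs, itemset - lhs, conf))

for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.0%}")
```

It finds the frequent itemsets {A}, {B}, {D}, {A, B}, {A, D}, {B, D} and {A, B, D}, and the rules A → B, B → A, D → A, D → B, {A, D} → B, {B, D} → A and D → {A, B}, all at 100% confidence; rules such as A → D (75%) fall below the 80% threshold.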
Transaction_ID | Items |
T1 | HotDogs, Buns, Ketchup |
T2 | HotDogs, Buns |
T3 | HotDogs, Coke, Chips |
T4 | Chips, Coke |
T5 | Chips, Ketchup |
T6 | HotDogs, Coke, Chips |
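A Python sketch that traces Apriori scan by scan on the grocery data, printing each candidate set C_k with its counts and marking which candidates survive into L_k, then listing the rules that reach 60% confidence. The support threshold is read as an absolute count; candidates come from unions of two frequent (k−1)-itemsets followed by the subset-based prune, a simplification of the textbook prefix join that yields the same pruned candidates. Names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"HotDogs", "Buns", "Ketchup"},   # T1
    {"HotDogs", "Buns"},              # T2
    {"HotDogs", "Coke", "Chips"},     # T3
    {"Chips", "Coke"},                # T4
    {"Chips", "Ketchup"},             # T5
    {"HotDogs", "Coke", "Chips"},     # T6
]
MIN_SUP = 2      # support threshold (absolute count)
MIN_CONF = 0.6   # confidence threshold (60%)


def trace_apriori(transactions, min_sup):
    """Yield (k, C_k with counts, L_k with counts) for each database scan."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        # One database scan counts every candidate k-itemset.
        ck = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        lk = {c: n for c, n in ck.items() if n >= min_sup}
        yield k, ck, lk
        k += 1
        # Join L_(k-1) with itself, then prune candidates with an infrequent subset.
        joined = {a | b for a in lk for b in lk if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in lk for s in combinations(c, k - 1))}


scans = list(trace_apriori(TRANSACTIONS, MIN_SUP))
frequent = {}
for k, ck, lk in scans:
    print(f"Scan {k}:")
    for c, n in sorted(ck.items(), key=lambda kv: sorted(kv[0])):
        print(f"  {sorted(c)} count={n}", "(frequent)" if c in lk else "(infrequent)")
    frequent.update(lk)

rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(sorted(itemset), r)):
            conf = sup / frequent[lhs]
            if conf >= MIN_CONF:
                rules.append((lhs, itemset - lhs, conf))
                print(sorted(lhs), "->", sorted(itemset - lhs), f"confidence={conf:.0%}")
```

With these thresholds, scan 1 keeps all five items, scan 2 keeps {HotDogs, Buns}, {HotDogs, Coke}, {HotDogs, Chips} and {Coke, Chips}, and scan 3 finds the only frequent 3-itemset, {HotDogs, Coke, Chips} (count 2). Eight rules reach 60% confidence, for example Buns → HotDogs (100%) and Chips → Coke (75%).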
2. A = {A1, A2, A3, A4, A5, A6}. Assume σ = 35%. Use the Apriori algorithm to find the frequent itemsets in the following data.
A1 | A2 | A3 | A4 | A5 | A6 |
0 | 0 | 0 | 1 | 1 | 1 |
0 | 1 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 1 | 1 | 1 |
1 | 1 | 0 | 1 | 0 | 0 |
1 | 0 | 1 | 0 | 1 | 1 |
0 | 1 | 1 | 1 | 0 | 1 |
0 | 0 | 0 | 1 | 1 | 0 |
0 | 1 | 0 | 1 | 0 | 1 |
1 | 0 | 0 | 1 | 0 | 0 |
1 | 1 | 1 | 1 | 1 | 1 |
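A Python sketch for this question, under the assumption that σ = 35% is a relative minimum support, so an itemset is frequent when it occurs in at least 35% of the 10 rows (that is, in 4 or more rows). Support is counted by intersecting per-item sets of row indices, a vertical layout chosen here only for brevity; the level-wise generate-and-prune loop is the Apriori part. Names are illustrative.

```python
from itertools import combinations

ITEMS = ["A1", "A2", "A3", "A4", "A5", "A6"]
MATRIX = [            # one row per transaction, 1 = item present
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
SIGMA = 0.35          # assumed relative minimum support

# Vertical layout: item -> set of row indices in which it is 1.
ROWS = {item: {r for r, row in enumerate(MATRIX) if row[j]}
        for j, item in enumerate(ITEMS)}


def support(itemset):
    """Fraction of rows that contain every item in itemset."""
    return len(set.intersection(*(ROWS[i] for i in itemset))) / len(MATRIX)


def apriori():
    """Level-wise search; returns {frozenset: relative support} of frequent itemsets."""
    level = [frozenset([i]) for i in ITEMS]
    frequent = {}
    k = 1
    while level:
        sup = {c: support(c) for c in level}
        lk = {c: s for c, s in sup.items() if s >= SIGMA}
        frequent.update(lk)
        k += 1
        joined = {a | b for a in lk for b in lk if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in lk for s in combinations(c, k - 1))]
    return frequent


frequent = apriori()
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")
```

Under this reading all six single items are frequent, the frequent 2-itemsets are {A1, A4}, {A2, A4}, {A4, A5}, {A4, A6} and {A5, A6}, and the only 3-item candidate, {A4, A5, A6}, occurs in 3 rows (30%), so no 3-itemset is frequent.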
4. Explain the use of the frequent itemset generation process.
9. Explain the Apriori algorithm.
9. What are the advantages and disadvantages of association rules?
9. Write down the two measures of association rules.
11. Explain the association rules with advantages and disadvantages.
11. Explain the Apriori Algorithm.
12. Explain the Apriori Algorithm.
1. List some issues of multimedia mining. Describe how backpropagation is used in classification.
9. Explain the data mining tasks performed on a text database.
10. Define the spatial database and its features.
10. Define the spatial database and its features.
11. Explain the application of spatial databases.
9. What is text mining? Explain the text indexing techniques.
12. Explain mining text databases.
12. Explain the applications of mining on the WWW.
12. Explain the methods of mining multimedia database.
11. What do you mean by WWW mining? Explain WWW mining techniques.