Data Warehousing and Data Mining 2078

Tribhuvan University
Institute of Science and Technology
2078
Bachelor Level / Seventh Semester / Science
Computer Science and Information Technology (CSC410)
(Data Warehousing and Data Mining)
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Group A

Attempt any two questions: (2*10 = 20)

1. Consider the following training dataset of 14 records, which assigns a credit risk of high, moderate, or low to people based on the following properties of their credit rating:

a. Collateral with possible values { Adequate, None}

b. Income with possible values {"Rs 0K to Rs 15K","Rs 15 K to Rs 35K","Over Rs 35 K"}

c. Debt with possible values {High, Low}

d. Credit history with possible values {Good, Bad, Unknown}

Classify the individual with credit history = unknown, debt = low, collateral = adequate, and income = Rs 15K to Rs 35K using a decision tree. Use the ID3 algorithm to build the decision tree. [10]

2. "Data mining is a part of KDD." Do you agree or disagree? Justify. Explain the different stages in KDD. [3+7]

3. How can data be modeled in a multidimensional data model? Explain the conceptual modeling of a data warehouse. [4+6]

Group B

Short Answer Questions. [5*8 = 40]

4. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. [5]

5. Can we use an operational database instead of a data warehouse? List the characteristics of a data warehouse. [1+4]

6. Why is it necessary to pre-compute the data cube? What are the possible issues in performing data cube computation? [3+2]

7. Describe any three methods to normalize a group of data. [5]

8. What is the significance of association rules in data mining? List the types of association rules with examples. [2+3]

9. How do you index OLAP data? Give examples. [5]

10. Apriori needs to scan the dataset many times, which reduces its efficiency. Explain some mechanisms to improve its efficiency. [5]

11. Differentiate between OLTP and OLAP. [5]

12. Which approach is better for clustering, hierarchical or partitioning? Justify. List some drawbacks of k-means. [2+3]

13. Write short notes. (Any two) [5]

a. Outlier Analysis

b. Web Mining

c. Query Manager

d. Pros and Cons of Association Rules