Data Warehousing and Data Mining 2078 (New Course)

Tribhuvan University
Institute of Science and Technology
2078 (New Course)
Bachelor Level / Seventh Semester / Science
Computer Science and Information Technology (CSC410)
(Data Warehousing and Data Mining)
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Section A

Attempt any TWO questions [2*10=20]

1. Write down one advantage and one disadvantage of MOLAP over ROLAP. Define a signed network. How do you check whether it is balanced? How does beam search reduce space complexity? Illustrate with an example. [2+4+4]

2. How is a concept hierarchy used in extracting information? Generate the frequent patterns from the following data set using FP-growth, where minimum support = 3. [2+8]

3. How do you compare two classifiers? Given the points A(3,7), B(4,6), C(5,5), D(6,4), E(7,3), F(6,2), G(7,2), and H(8,4), find the core points and outliers using DBSCAN. Take Eps = 2.5 and MinPts = 3. [2+8]
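
The DBSCAN part of question 3 can be checked with a short script. The sketch below is not an official solution. It assumes Euclidean distance and the common convention that a point counts as a member of its own Eps-neighbourhood; the question states neither, and a different convention changes which points are core. The function name `classify` is illustrative.

```python
from math import dist

# Points exactly as given in question 3.
points = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS, MIN_PTS = 2.5, 3


def classify(points, eps, min_pts):
    """Label every point as core, border, or noise (outlier)."""
    # Assumption: a point belongs to its own neighbourhood.
    neighbours = {
        p: [q for q in points if dist(points[p], points[q]) <= eps]
        for p in points
    }
    core = [p for p in points if len(neighbours[p]) >= min_pts]
    # A border point is not core but lies within eps of a core point.
    border = [
        p for p in points
        if p not in core and any(q in core for q in neighbours[p])
    ]
    # Whatever is neither core nor border is noise.
    noise = [p for p in points if p not in core and p not in border]
    return core, border, noise


core, border, noise = classify(points, EPS, MIN_PTS)
print("core:", core)
print("border:", border)
print("noise:", noise)
```

Under these assumptions A has only B within Eps, so it is not a core point, but it still lies within Eps of the core point B and is therefore a border point rather than an outlier.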

Section B

Attempt any EIGHT questions [8*5=40]

4. When is a pattern said to be interesting? List the issues in data mining. [1+4]

5. Define data discretization. Describe the tasks for data preprocessing. [1+4]

6. Define spatial data mining. What are the challenges of multimedia mining? Describe with an example. [2+3]

7. Consider the following data set.

Find out whether an object with attributes Confident = Yes and Sick = No will Fail or Pass using Bayesian classification. [5]

8. What are the choices for data cube materialization? Explain the strategies for cube computation. [2+3]

9. Show the conflict between the theories of balance and status. How do you improve Apriori? [2+3]

10. Differentiate between star schema and snowflake schema. List any two methods for data normalization. [2+3]

11. How do you evaluate the accuracy of a classifier? Discuss the advantages of using K-fold cross validation. [2+3]

12. Apply the K-means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two iterations and show the clusters. Choose the first two objects as the initial centroids. [5]
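
The two iterations asked for in question 12 can be traced with a short script. The sketch below is not an official solution. It assumes Euclidean distance, which the question does not state, and it assumes that no cluster becomes empty, which holds for this data. The function name `k_means` is illustrative.

```python
from math import dist

# Data exactly as given in question 12.
data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]


def k_means(data, centroids, iterations):
    """Run a fixed number of assign-then-update K-means iterations."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # Assumption: every cluster has at least one point.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            for cluster in clusters
        ]
    return clusters, centroids


# The first two objects are the initial centroids, as the question requires.
clusters, centroids = k_means(data, data[:2], iterations=2)
print("clusters:", clusters)
print("centroids:", centroids)
```

With this data the second iteration reproduces the clusters from the first, so the assignment has already stabilised after two passes.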
