Data Warehousing and Data Mining - Syllabus

Course Overview and Structure

This page covers the Data Warehousing and Data Mining (DWDM) course offered within Tribhuvan University's CSIT department. Aligned with the 2074 syllabus, the course (CSC410) combines theoretical frameworks with practical sessions to build a comprehensive understanding of the subject. Assessment follows a 60 + 20 + 20 marks system with a minimum passing threshold, which encourages students to develop a firm grasp of the course content.

This 3 credit-hour course bridges theory and application. Beyond theoretical comprehension, students take part in practical sessions and acquire skills for real-world scenarios. The sections below give the course description, objective, key topics, and laboratory work.


Course Description:

This course introduces advanced aspects of data warehousing and data mining, encompassing the principles, research results, and commercial applications of current technologies.

Course Objective:

The main objective of this course is to provide knowledge of different data mining techniques and data warehousing.


Units

Key Topics

  • Introduction to E-commerce
    IN-1

    Overview of E-commerce and its significance in the digital age.

  • E-business vs E-commerce
    IN-2

    Understanding the differences between E-business and E-commerce.

  • Features of E-commerce
    IN-3

    Key characteristics and benefits of E-commerce.

  • Pure vs Partial E-commerce
    IN-4

    Types of E-commerce models and their applications.

  • History of E-commerce
    IN-5

    Evolution and development of E-commerce over time.

  • E-commerce Framework
    IN-6

    Understanding the components of E-commerce framework including People, Public Policy, Marketing and Advertisement, Support Services, and Business Partnerships.

  • Types of E-commerce
    IN-7

    Overview of different types of E-commerce including B2C, B2B, C2B, C2C, M-commerce, U-commerce, Social E-commerce, and Local E-commerce.

  • Challenges in E-commerce
    IN-8

    Common obstacles and difficulties faced in E-commerce.

  • Status of E-commerce in Nepal
    IN-9

    Current state and trends of E-commerce in Nepal.

  • Overview of Electronic Transaction Act of Nepal
    IN-10

    Understanding the legal framework governing E-commerce in Nepal.

  • Software Engineering Ethics
    IN-11

    Ethical considerations and principles in software engineering, including accountability, privacy, and intellectual property.

  • Distributed Computing in Grid and Cloud
    IN-12

    Exploring the role of distributed computing in grid and cloud environments, including its applications and benefits.

  • Trends in Data Warehousing
    IN-13

    Current and emerging trends in data warehousing, including big data, cloud computing, and real-time analytics.

Key Topics

  • Introduction to Computers
    IN-01

    An overview of computers and their significance in today's world. This topic sets the stage for understanding the basics of computers.

  • Digital and Analog Computers
    IN-02

    Understanding the difference between digital and analog computers, their characteristics, and applications.

  • Characteristics of Computers
    IN-03

    Exploring the key characteristics of computers, including input, processing, storage, and output.

  • History of Computers
    IN-04

    A brief history of computers, from their inception to the present day, highlighting key milestones and developments.

  • Generations of Computers
    IN-05

    Understanding the different generations of computers, including their features, advantages, and limitations.

  • Classification of Computers
    IN-06

    Categorizing computers based on their size, functionality, and application, including desktops, laptops, and mobile devices.

  • The Computer System
    IN-07

    An in-depth look at the components of a computer system, including hardware and software.

Key Topics

  • Introduction to Databases
    DA-1

    Introduction to databases, including examples and basic concepts.

  • Database Management System
    DA-2

    Introduction to Database Management Systems (DBMS), including advantages and examples.

  • Database Users
    DA-3

    Types of database users, including actors on the scene and workers behind the scene.

  • Benefits of Databases
    DA-4

    Advantages and benefits of using databases.

  • Data Models
    DA-5

    Types of data models, including hierarchical, network, ER, relational, and object models.

  • Three-Schema Architecture
    DA-6

    Three-schema architecture, including internal, conceptual, and external views.

Key Topics

  • Control Word and Microprogram
    MI-1

    This topic covers the concept of control words and microprograms in microprogrammed control, including their roles in controlling the flow of data and instructions in a computer system.

  • Address Sequencing and Conditional Branch
    MI-2

    This topic explains how address sequencing and conditional branching are used to control the flow of instructions in a microprogrammed control unit, including the use of conditional branch instructions and subroutines.

  • Microinstruction Format and Symbolic Microinstructions
    MI-3

    This topic covers the format of microinstructions and the use of symbolic microinstructions to represent complex control sequences in a microprogrammed control unit.

  • Design of Control Unit
    MI-4

    This topic covers the design principles and considerations for building a control unit using microprogrammed control, including the organization of control memory and the role of the sequencer.

  • Association Rules
    MI-5

    Association rules are statements that describe the relationship between different items in a dataset. They are used to identify patterns and correlations between items.

  • Types of Association Rules
    MI-6

    There are different types of association rules, including single dimensional, multidimensional, multilevel, and quantitative rules. Each type has its own characteristics and applications.

  • Finding Frequent Itemsets
    MI-7

    Finding frequent itemsets involves using algorithms such as Apriori and FP-growth to identify patterns in a dataset.

  • Generating Association Rules
    MI-8

    Generating association rules involves using frequent itemsets to create rules that describe the relationships between items.

  • Limitations and Improvements of Apriori
    MI-9

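    The Apriori algorithm has limitations: it scans the database repeatedly and can generate a very large number of candidate itemsets, which makes it computationally expensive. Improvements include hash-based counting, transaction reduction, partitioning, sampling, and parallel processing.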

  • From Association Mining to Correlation Analysis
    MI-10

    Association mining can be extended to correlation analysis, which checks whether the items in a strong rule are actually statistically dependent, using measures such as lift and chi-square.

  • Lift
    MI-11

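    Lift measures how much more often the items in an association rule occur together than would be expected if they were independent: lift(A → B) = support(A and B together) / (support(A) × support(B)). A value above 1 indicates positive correlation, a value below 1 negative correlation, and a value of 1 independence.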
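
The association-mining topics above (frequent itemsets, rule generation, and lift) can be illustrated with a small sketch. It is a minimal level-wise Apriori in plain Python, not a reference implementation; the transactions, item names, and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions; the item names are illustrative only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets (level-wise search)."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in current:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    conf = both / support(antecedent, transactions)
    lift = conf / support(consequent, transactions)
    return both, conf, lift


frequent = apriori(TRANSACTIONS, min_support=0.5)
print(sorted(tuple(sorted(s)) for s in frequent))
print(rule_metrics({"butter"}, {"bread"}, TRANSACTIONS))
```

With this toy data, the pair {milk, butter} falls below the 0.5 support threshold, so the three-item candidate is pruned without being counted. The rule butter → bread has a lift above 1, while bread → milk has a lift below 1 even though its support meets the threshold.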

Key Topics

  • Common Client-side Web Technologies
    CL-1

    This topic covers the fundamental technologies used on the client-side of web development, including HTML, CSS, and JavaScript.

  • jQuery
    CL-2

    This topic explores the use of jQuery, a popular JavaScript library, for client-side scripting and DOM manipulation.

  • Forms and Validation
    CL-3

    This topic discusses the importance of form validation and how to implement it using ASP.NET Core, including client-side and server-side validation techniques.

  • Single Page Application (SPA) Frameworks
    CL-4

    This topic introduces Single Page Application (SPA) frameworks, including Angular and React, and their role in building dynamic and interactive client-side applications.

  • Software-as-a-Service (SaaS)
    CL-5

    SaaS implementation issues, key characteristics of SaaS, benefits of the SaaS model.

  • Jericho Cloud Cube Model
    CL-6

    A cloud service model framework.

  • User Defined Objects
    CL-10

    Creating custom objects with properties and methods.

  • Event Handling and Form Validation
    CL-11

    Handling events and validating form data with JavaScript.

  • Error Handling
    CL-12

    Catching and handling errors in JavaScript code.

  • Handling Cookies
    CL-13

    Storing and retrieving data with cookies in JavaScript.

  • Graphic Presentation
    CL-7

    Graphic presentation involves using graphs such as histograms, frequency polygons, and frequency curves to present data. It is a visual way of presenting data, making it easy to understand and analyze.

  • Histogram
    CL-8

    A histogram is a type of graph that uses bars to represent the frequency of different ranges of values. It is commonly used to display continuous data.

  • Frequency Polygon
    CL-9

    A frequency polygon is a type of graph that uses lines to connect the points representing the frequency of different ranges of values. It is commonly used to display continuous data.
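
As a rough illustration of the histogram and frequency polygon described above, the sketch below counts values into equal-width classes and then derives the polygon's points from the class midpoints. The marks data and the function names are hypothetical, and a real lab would more likely use a plotting tool.

```python
def histogram(values, low, width, bins):
    """Count how many values fall in each of `bins` equal-width classes.

    Class i covers [low + i*width, low + (i+1)*width); values outside
    the overall range are ignored.
    """
    counts = [0] * bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < bins:
            counts[i] += 1
    return counts


def frequency_polygon(counts, low, width):
    """Return (midpoint, frequency) points, one per class."""
    return [(low + (i + 0.5) * width, c) for i, c in enumerate(counts)]


# Hypothetical exam marks, used only for illustration.
marks = [12, 18, 25, 27, 33, 35, 38, 41, 47, 49]
counts = histogram(marks, low=10, width=10, bins=4)
points = frequency_polygon(counts, low=10, width=10)

# Text rendering of the histogram: one '#' per observation.
for mid, c in points:
    print(f"{mid - 5:.0f}-{mid + 5:.0f}: {'#' * c}")
```

Joining the returned points with straight lines gives the frequency polygon; drawing a bar of height `c` over each class gives the histogram.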

Key Topics

  • Optimization Problems and Greedy Algorithms
    GR-1

    Introduction to optimization problems and the concept of optimal solutions, with an overview of greedy algorithms and their elements.

  • Greedy Algorithm Applications
    GR-2

    Exploration of various applications of greedy algorithms, including fractional knapsack, job sequencing with deadlines, Kruskal's algorithm, Prim's algorithm, and Dijkstra's algorithm.

  • Huffman Coding
    GR-3

    Introduction to Huffman coding, including its purpose, prefix codes, and the Huffman coding algorithm, along with its analysis.

  • Social Network Analysis
    GR-4

    Social network analysis is the process of examining social structures, relationships, and interactions within a network. It involves using graph theory and statistical methods to understand social behavior and patterns.

  • Link Mining
    GR-5

    Link mining is a subfield of graph mining that focuses on the analysis of links between nodes in a graph. It involves discovering patterns and relationships between entities in a network.

  • Friends of Friends
    GR-6

    Friends of friends is a concept in social network analysis that refers to the friends of an individual's friends. It is used to study social relationships and network structures.

  • Degree Assortativity
    GR-7

    Degree assortativity is a measure of the tendency of nodes in a network to be connected to other nodes with similar degrees. It is used to study network structures and patterns.

  • Signed Networks
    GR-8

    Signed networks are graphs that contain both positive and negative edges, representing friendships and antagonisms between nodes. Theories such as structural balance and status are used to analyze them.

  • Trust in a Network
    GR-9

    Trust in a network refers to the level of confidence or reliability between nodes. Algorithms such as atomic propagation and iterative propagation are used to predict trust and distrust in a network.

  • Predicting Positive and Negative Links
    GR-10

    This topic involves using machine learning and graph mining techniques to predict the formation of positive and negative links in a network, such as friendships and antagonisms.
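
Two of the social-network ideas above, friends of friends and balance in signed networks, can be sketched in a few lines. This is only an illustrative sketch: the graphs, node names, and function names are hypothetical, and the balance check uses the common rule that a triangle is balanced when the product of its edge signs is positive.

```python
# Hypothetical undirected friendship graph as an adjacency map.
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}


def friends_of_friends(graph, person):
    """People two hops away: friends of friends who are not already friends."""
    direct = graph[person]
    two_hop = set()
    for friend in direct:
        two_hop |= graph[friend]
    return two_hop - direct - {person}


# Hypothetical signed edges: +1 for friendship, -1 for antagonism.
SIGNS = {
    frozenset({"a", "b"}): +1,
    frozenset({"b", "c"}): -1,
    frozenset({"a", "c"}): -1,
    frozenset({"c", "d"}): -1,
    frozenset({"a", "d"}): -1,
}


def triangle_is_balanced(signs, x, y, z):
    """A triangle is balanced when the product of its three edge signs is positive."""
    product = (
        signs[frozenset({x, y})]
        * signs[frozenset({y, z})]
        * signs[frozenset({x, z})]
    )
    return product > 0


print(friends_of_friends(FRIENDS, "ann"))
print(triangle_is_balanced(SIGNS, "a", "b", "c"))
print(triangle_is_balanced(SIGNS, "a", "c", "d"))
```

In the signed example, the triangle a-b-c (two friends sharing a common enemy) is balanced, while a-c-d (three mutual antagonists) is not.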

Key Topics

  • Spatial Data Mining
    MI-01

    Spatial data mining involves discovering patterns and relationships in spatial data, such as geographic information. It includes techniques for mining spatial association and spatial data cubes.

  • Spatial Data Cube
    MI-02

    A spatial data cube is a multidimensional representation of spatial data, allowing for efficient querying and analysis of spatial relationships.

  • Mining Spatial Association
    MI-03

    Mining spatial association involves discovering relationships between spatial objects, such as proximity, distance, and orientation.

  • Multimedia Data Mining
    MI-04

    Multimedia data mining involves discovering patterns and relationships in multimedia data, such as images, videos, and audio files.

  • Similarity Search in Multimedia Data
    MI-05

    Similarity search in multimedia data involves finding similar multimedia objects based on their features and attributes.

  • Mining Association in Multimedia Data
    MI-06

    Mining association in multimedia data involves discovering relationships between multimedia objects, such as co-occurrence and correlation.

  • Text Mining
    MI-07

    Text mining involves discovering patterns and relationships in unstructured text data, using techniques from natural language processing and information extraction.

  • Web Mining
    MI-08

    Web mining involves discovering patterns and relationships in web data, including web content, structure, and usage.

  • Web Content Mining
    MI-09

    Web content mining involves extracting useful information from web pages, such as text, images, and links.
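
Web content mining, as described above, starts with pulling text and links out of a page. The sketch below does this with Python's standard `html.parser` module; the `ContentExtractor` class name and the sample page are hypothetical, and real pages would need more robust handling.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect visible text and hyperlink targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside script/style.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page used only for illustration.
PAGE = """
<html><head><style>p { color: red; }</style></head>
<body><p>Data mining <a href="https://example.com/dm">overview</a></p>
<script>var x = 1;</script></body></html>
"""

extractor = ContentExtractor()
extractor.feed(PAGE)
print(extractor.links)
print(" ".join(extractor.text_parts))
```

The extracted links feed web structure mining (the link graph), while the extracted text can be passed on to text mining techniques.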

Lab works

Laboratory Works:

The laboratory work should cover all the topics mentioned in the course, including data preprocessing and cleaning; implementing classification, clustering, and association algorithms in any programming language; and data visualization using data mining tools.
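
As one possible starting point for the preprocessing and clustering parts of the lab, the sketch below fills missing values with the mean, rescales with min-max normalization, and runs a one-dimensional k-means. The data, function names, and starting centroids are hypothetical; the syllabus leaves the choice of language and algorithms open.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_means_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its old centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical raw attribute with one missing value.
raw = [10, None, 30, 20, 100, 110]
clean = fill_missing(raw)
scaled = min_max(clean)
centroids, clusters = k_means_1d(scaled, centroids=[0.0, 1.0])
print(clean)
print(centroids)
```

Here the missing entry is replaced by the mean of the five observed values, and the two clusters separate the four smaller values from the two larger ones.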