Information Retrieval - Syllabus

Course Overview and Structure

Embark on an academic exploration of the Information Retrieval (IR) course within Tribhuvan University's CSIT department. Aligned with the 2074 Syllabus, this course (CSC413) merges theoretical frameworks with practical sessions, ensuring a comprehensive understanding of the subject. Rigorous assessment based on a 60 + 20 + 20 marks system, coupled with a challenging passing threshold of , propels students to strive for excellence, fostering a deeper grasp of the course content.

This 3 credit-hour journey unfolds as a holistic learning experience, bridging theory and application. Beyond theoretical comprehension, students actively engage in practical sessions, acquiring valuable skills for real-world scenarios. Immerse yourself in this well-structured course, where each element, from the course description to interactive sessions, is meticulously crafted to shape a well-rounded and insightful academic experience.


Course Description:

This course familiarizes students with the concepts and techniques of information retrieval, focusing mainly on clustering, classification, search engines, ranking, and query operations.

Course Objective:

The main objective of this course is to provide knowledge of different information retrieval techniques so that students will be able to develop an information retrieval engine.

Units

Key Topics

  • Introduction to Computers
    IN-01

    An overview of computers and their significance in today's world. This topic sets the stage for understanding the basics of computers.

  • Digital and Analog Computers
    IN-02

    Understanding the difference between digital and analog computers, their characteristics, and applications.

  • Characteristics of Computers
    IN-03

    Exploring the key characteristics of computers, including input, processing, storage, and output.

  • History of Computers
    IN-04

    A brief history of computers, from their inception to the present day, highlighting key milestones and developments.

  • Generations of Computers
    IN-05

    Understanding the different generations of computers, including their features, advantages, and limitations.

  • Classification of Computers
    IN-06

    Categorizing computers based on their size, functionality, and application, including desktops, laptops, and mobile devices.

  • The Computer System
    IN-07

    An in-depth look at the components of a computer system, including hardware and software.

Key Topics

  • Types of Statistical Hypotheses
    TE-1

    This topic covers the different types of statistical hypotheses, including null and alternative hypotheses, and their roles in hypothesis testing.

  • Power of the Test and P-Value
    TE-2

    This topic explains the power of a test and the p-value, and their use in decision making during hypothesis testing.

  • Steps in Testing of Hypothesis
    TE-3

    This topic outlines the steps involved in testing a hypothesis, from formulating the hypothesis to making a decision based on the test results.

  • One Sample Tests for Mean of Normal Population
    TE-4

    This topic covers one sample tests for the mean of a normal population, including tests for known and unknown variance.

  • Test for Single Proportion
    TE-5

    This topic explains how to conduct a test for a single proportion, including the test statistic and p-value calculation.

  • Test for Difference between Two Means
    TE-6

    This topic covers the test for the difference between two means, including the test statistic and p-value calculation.

  • Test for Difference between Two Proportions
    TE-7

    This topic explains how to conduct a test for the difference between two proportions, including the test statistic and p-value calculation.

  • Paired Sample T-Test
    TE-8

    This topic covers the paired sample t-test, including its application and interpretation.

  • Linkage between Confidence Interval and Testing of Hypothesis
    TE-9

    This topic explains the relationship between confidence intervals and hypothesis testing, including how to use confidence intervals to make inferences about a population.

  • Inverted Indices
    TE-10

    Inverted indices are data structures used to store and retrieve information efficiently. They map each term to a postings list of the documents that contain it, often together with term or document frequencies.

  • Positional Inverted Index
    TE-11

    A positional inverted index is a type of inverted index that stores the position of each word in a document. This allows for more efficient phrase querying and proximity searching.

  • Natural Language Processing in Information Retrieval
    TE-12

    Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the interaction between computers and human language. In information retrieval, NLP is used to improve the understanding and retrieval of text data.

  • Basic NLP Tasks
    TE-13

    Basic NLP tasks include Part-of-Speech (POS) tagging and shallow parsing. These tasks are used to analyze and understand the structure and meaning of text data.
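
The inverted index and positional inverted index topics above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: tokenization is a plain lowercase whitespace split, the three sample documents are made up, and the helper names `build_positional_index` and `phrase_query` are hypothetical rather than part of the syllabus.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: naive tokenization (lowercase, split on whitespace).
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_query(index, phrase):
    """Return sorted ids of documents that contain the phrase terms consecutively."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    # Only documents that contain every term can match the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    matches = []
    for doc_id in candidates:
        # A start position matches if term i occurs at position start + i.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[term][doc_id] for i, term in enumerate(terms)):
                matches.append(doc_id)
                break
    return sorted(matches)


# Made-up sample collection.
docs = {
    1: "information retrieval is fun",
    2: "retrieval of information",
    3: "modern information retrieval systems",
}
index = build_positional_index(docs)
print(index["retrieval"])
print(phrase_query(index, "information retrieval"))
```

Dropping the position lists and keeping only the document ids per term gives a plain (non-positional) inverted index, which is enough for Boolean queries but not for phrase or proximity queries.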

Key Topics

  • Memory Read
    BA-01

    Memory Read operation involves retrieving data from memory locations. It is a fundamental operation in microprocessor-based systems.

  • Memory Write
    BA-02

    Memory Write operation involves storing data in memory locations. It is a crucial operation in microprocessor-based systems.

  • I/O Read
    BA-03

    I/O Read operation involves retrieving data from input/output devices. It enables the microprocessor to interact with the external environment.

  • I/O Write
    BA-04

    I/O Write operation involves sending data to input/output devices. It enables the microprocessor to interact with the external environment.

  • Direct Memory Access
    BA-05

    Direct Memory Access (DMA) is a technique that allows peripheral devices to access system memory directly, reducing the microprocessor's workload.

  • Interrupt
    BA-06

    An interrupt is a signal to the microprocessor that an event has occurred, requiring immediate attention. It enables the microprocessor to handle asynchronous events.

  • Types of Interrupts
    BA-07

    There are different types of interrupts, including maskable and non-maskable interrupts, which vary in their priority and handling by the microprocessor.

  • Interrupt Masking
    BA-08

    Interrupt Masking is a technique that enables the microprocessor to temporarily ignore or mask interrupts, allowing it to focus on high-priority tasks.

  • Non-Overlapping Lists
    BA-09

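    A model based on non-overlapping lists divides the text of each document into non-overlapping regions (such as chapters or sections) collected in lists, so that structural queries can be answered by matching regions rather than whole documents.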

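  • Proximal Nodes Model
    BA-10

    The proximal nodes model is a structured text retrieval model that defines one or more hierarchical index structures (for example chapters, sections, and paragraphs) over the same document text, so that queries can combine structural conditions with word occurrences.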

Key Topics

  • Event Handling Concept
    EV-1

    Understanding the concept of event handling in Java, including the role of listeners and event sources. This topic lays the foundation for handling events in Java applications.

  • Listener Interfaces
    EV-2

    Exploring the different listener interfaces in Java, including their methods and usage. This topic covers the interfaces that must be implemented to handle events.

  • Using Action Commands
    EV-3

    Learning how to use action commands to handle events, including setting and getting action commands. This topic covers the basics of using action commands in event handling.

  • Adapter Classes
    EV-4

    Understanding the role of adapter classes in event handling, including their usage and benefits. This topic covers how adapter classes simplify event handling.

  • Handling Action Events
    EV-5

    Handling action events, including button clicks and other actions. This topic covers the specifics of handling action events in Java applications.

  • Handling Key Events
    EV-6

    Handling key events, including key presses and releases. This topic covers the specifics of handling key events in Java applications.

Key Topics

  • Query Processing
    QU-1

    Concept of query processing, including the steps involved in processing a query and the role of the query processor.

  • Query Trees and Heuristics
    QU-2

    Query trees and heuristics for query optimization, including the use of query trees to represent queries and heuristics to guide optimization.

  • Query Execution Plans
    QU-3

    Choice of query execution plans, including the factors that influence the choice of plan and the importance of plan selection.

  • Cost-Based Optimization
    QU-4

    Cost-based optimization, including the use of cost estimates to guide optimization and the role of cost-based optimization in query processing.

Key Topics

  • Search Engines
    WE-01

    Understanding the working principle of search engines, including their architecture and functionality.

  • Spidering
    WE-02

    Exploring the structure and algorithms of spiders, including simple and multithreaded approaches, and their role in web crawling.

  • Directed Spidering
    WE-03

    Learning about topic-directed and link-directed spidering techniques, and their applications in web search.

  • Crawlers
    WE-04

    Understanding the basic architecture and functionality of crawlers in web search.

  • Link Analysis
    WE-05

    Studying link analysis techniques, including HITS and PageRank, and their role in ranking web pages.

  • Query Log Analysis
    WE-06

    Analyzing query logs to understand user behavior and improve search engine performance.

  • Handling Invisible Web
    WE-07

    Exploring techniques for handling the 'invisible' web, along with snippet generation and CLIR (Cross-Language Information Retrieval).
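
As an illustration of the link analysis topic above, the following is a minimal Python sketch of PageRank computed by power iteration. It rests on several assumptions: the four-page link graph is made up, the damping factor 0.85 and the fixed iteration count are conventional choices rather than values from the syllabus, every link target must also appear as a key of the graph, and the function name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns dict page -> score."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly among its outlinks.
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * ranks[page] / n
                for target in pages:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Made-up graph: C is linked from A, B, and D; nothing links to D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

The scores sum to 1, so they can be read as the probability that a random surfer is on each page. HITS differs in that it keeps two scores per page (hub and authority) and is usually computed on a query-specific subgraph.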

Key Topics

  • Relational Database Design Using ER-to-Relational Mapping
    RE-1

    Learn how to design relational databases using ER-to-relational mapping, including mapping of regular entities, weak entities, relationship types, multivalued attributes, and N-ary relationships.

  • Informal Design Guidelines for Relational Schemas
    RE-2

    Understand informal design guidelines for relational schemas, including semantics of attributes in relations, redundant information in tuples and update anomalies, NULL values in tuples, and generation of spurious tuples.

  • Functional Dependencies
    RE-3

    Study functional dependencies, including definition, inference rules, Armstrong's axioms, attribute closure, equivalence of functional dependencies, and minimal sets of functional dependencies.
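
The attribute closure mentioned above can be computed with a simple fixed-point loop. The sketch below is an illustration only: the relation attributes A to E and the three functional dependencies are made up, and the function name `attribute_closure` is hypothetical.

```python
def attribute_closure(attributes, fds):
    """Compute the closure X+ of an attribute set under functional dependencies.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already determines lhs, it also determines rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# Made-up dependencies: A -> B, B -> C, CD -> E.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(sorted(attribute_closure({"A"}, fds)))
print(sorted(attribute_closure({"A", "D"}, fds)))
```

An attribute set is a superkey exactly when its closure contains every attribute of the relation, so the same routine can be used to test candidate keys.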

Key Topics

  • Measurement of Queuing System Performance
    QU-5

    This topic covers the metrics and methods used to measure the performance of queuing systems, including efficiency, effectiveness, and quality of service.

  • Networks of Queuing Systems
    QU-6

    This topic explores the concept of networks of queuing systems, with a focus on computer systems and their applications.

  • Applications of Queuing Systems
    QU-7

    This topic highlights the various applications of queuing systems in real-world scenarios, including manufacturing, healthcare, and transportation.

Key Topics

  • Active Database Concepts and Triggers
    AD-1

    This topic covers the concepts of active databases, including triggers, and their applications in advanced database systems.

  • Temporal Database Concepts
    AD-2

    This topic explores the concepts and techniques of temporal databases, which manage time-varying data and support temporal queries.

  • Spatial Database Concepts
    AD-3

    This topic introduces the concepts and techniques of spatial databases, which manage spatial data and support spatial queries and analysis.

  • Multimedia Database Concepts
    AD-4

    This topic covers the concepts and techniques of multimedia databases, which manage multimedia data such as images, audio, and video.

  • Deductive Database Concepts
    AD-5

    This topic explores the concepts and techniques of deductive databases, which use logical rules to derive new information from existing data.

  • Introduction to Information Retrieval and Web Search
    AD-6

    This topic provides an introduction to the concepts and techniques of information retrieval and web search, including indexing, querying, and ranking.

  • Multitasking
    AD-7

    Multitasking in 80286 architecture allows multiple tasks to run concurrently, improving system performance and responsiveness.

Lab works

Laboratory Works:

The laboratory work should cover all the features mentioned in the course and should include at least the following tasks:

  1. Write a program to demonstrate the Boolean Retrieval Model and the Vector Space Model
  2. Tokenize the words of large documents according to type and token
  3. Write a program to find the similarity between documents
  4. Implement the Porter stemmer
  5. Build a spider that follows only links to Nepali documents
  6. Group online news into different categories, such as sports, entertainment, and politics
  7. Build a recommender system for an online music store
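
As a possible starting point for tasks 1 and 3, the following minimal Python sketch shows a Boolean AND query and a vector space similarity measure. It is a sketch under assumptions, not a reference solution: documents are tokenized by a lowercase whitespace split, vectors use raw term frequencies rather than tf-idf weights, the sample documents are made up, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive tokenization (lowercase, split on whitespace).
    return text.lower().split()


def boolean_and(docs, terms):
    """Return sorted ids of documents that contain every query term."""
    wanted = {term.lower() for term in terms}
    return sorted(
        doc_id for doc_id, text in docs.items() if wanted <= set(tokenize(text))
    )


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the raw term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample collection.
docs = {
    1: "nepal wins football match",
    2: "football match postponed",
    3: "new music album released",
}
print(boolean_and(docs, ["football", "match"]))
print(round(cosine_similarity(docs[1], docs[2]), 4))
print(cosine_similarity(docs[1], docs[3]))
```

Replacing the raw counts with tf-idf weights and stemming the tokens first (task 4) would bring the sketch closer to a typical vector space implementation.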