Information Retrieval - Syllabus

Course Overview and Structure

Information Retrieval (CSC-405) is offered by the CSIT department of Tribhuvan University under the 2065 syllabus. The course combines theoretical frameworks with practical sessions. Assessment uses a 60-mark system with a minimum passing mark.

This 3-credit-hour course bridges theory and application: alongside the lectures, students take part in practical sessions that build skills for real-world scenarios.


Course Synopsis: Advanced aspects of Information Retrieval and Search Engines

Goal: To study advanced aspects of information retrieval and the working principles of search engines, encompassing the principles, research results, and commercial applications of current technologies.

Units

Key Topics

  • Introduction to Computers
    IN-01

    An overview of computers and their significance in today's world. This topic sets the stage for understanding the basics of computers.

  • Digital and Analog Computers
    IN-02

    Understanding the difference between digital and analog computers, their characteristics, and applications.

  • Characteristics of Computers
    IN-03

    Exploring the key characteristics of computers, including input, processing, storage, and output.

  • History of Computers
    IN-04

    A brief history of computers, from their inception to the present day, highlighting key milestones and developments.

  • Generations of Computers
    IN-05

    Understanding the different generations of computers, including their features, advantages, and limitations.

  • Classification of Computers
    IN-06

    Categorizing computers based on their size, functionality, and application, including desktops, laptops, and mobile devices.

Key Topics

  • Introduction to IR Models
    2.1

    Overview of information retrieval models and their significance in IR systems.

  • Taxonomy of IR Models
    2.2

    Categorization of information retrieval models and their relationships.

  • Document Retrieval and Ranking
    2.3

    The process of retrieving and ranking documents based on relevance to a query.

  • Formal Characterization of IR Models
    2.4

    Mathematical representation of IR models and their underlying assumptions.

  • Boolean Retrieval Model
    2.5

    A model that retrieves documents based on exact matching of query terms.

  • Vector-Space Retrieval Model
    2.6

    A model that represents documents and queries as vectors in a high-dimensional space.

  • Probabilistic Retrieval Model
    2.7

    A model that estimates the probability of a document being relevant to a query.

  • Text-Similarity Metrics
    2.8

    Measures of similarity between documents and queries, including TF-IDF and cosine similarity.
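
The vector-space model and the text-similarity metrics listed above can be illustrated with a minimal sketch. It assumes whitespace-tokenized documents, raw term counts for TF, and the unsmoothed `log(N / df)` form of IDF; other weighting schemes are common. The function names `tf_idf_vectors` and `cosine` and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts (an assumed TF variant)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection.
docs = [
    "information retrieval ranks documents".split(),
    "search engines rank web documents".split(),
    "cooking recipes for dinner".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: the two share the term "documents"
print(cosine(vecs[0], vecs[2]))  # zero: no shared terms
```

A query can be treated as one more short document and ranked against the collection with the same `cosine` function.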

Key Topics

  • Memory Read
    BA-01

    Memory Read operation involves retrieving data from memory locations. It is a fundamental operation in microprocessor-based systems.

  • Memory Write
    BA-02

    Memory Write operation involves storing data in memory locations. It is a crucial operation in microprocessor-based systems.

  • I/O Read
    BA-03

    I/O Read operation involves retrieving data from input/output devices. It enables the microprocessor to interact with the external environment.

  • I/O Write
    BA-04

    I/O Write operation involves sending data to input/output devices. It enables the microprocessor to interact with the external environment.

  • Direct Memory Access
    BA-05

    Direct Memory Access (DMA) is a technique that allows peripheral devices to access system memory directly, reducing the microprocessor's workload.

  • Interrupt
    BA-06

    An interrupt is a signal to the microprocessor that an event has occurred, requiring immediate attention. It enables the microprocessor to handle asynchronous events.

  • Types of Interrupts
    BA-07

    There are different types of interrupts, including maskable and non-maskable interrupts, which vary in their priority and handling by the microprocessor.

  • Interrupt Masking
    BA-08

    Interrupt Masking is a technique that enables the microprocessor to temporarily ignore or mask interrupts, allowing it to focus on high-priority tasks.

  • Non-Overlapping Lists
    BA-09

    A structured text retrieval model that divides the text of each document into non-overlapping regions (for example chapters, sections, or paragraphs) and collects each kind of region in its own list for indexing and querying.

  • Proximal Nodes Model
    BA-10

    A structured text retrieval model that defines hierarchical indexes over the text, so that a query can match structural components (nodes), such as sections or paragraphs, that contain the query terms.

  • Performing CDB and PDB Flashback
    BA-11

    Understanding flashback technology, including performing flashback on Container Database (CDB) and Pluggable Database (PDB).

Relevance and retrieval, Performance metrics, Basic measures of text retrieval (recall, precision, and F-measure)
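
The basic measures named above can be computed from a set of retrieved documents and a set of relevant documents. The following is a minimal sketch that assumes set-based (unranked) evaluation and the balanced F-measure (F1); the function name `precision_recall_f1` and the document IDs are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical result list and relevance judgments.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r, f)
```

Here two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents were retrieved (recall 2/3).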

Key Topics

  • Query Processing
    QU-1

    Concept of query processing, including the steps involved in processing a query and the role of the query processor.

  • Query Trees and Heuristics
    QU-2

    Query trees and heuristics for query optimization, including the use of query trees to represent queries and heuristics to guide optimization.

  • Query Execution Plans
    QU-3

    Choice of query execution plans, including the factors that influence the choice of plan and the importance of plan selection.

  • Cost-Based Optimization
    QU-4

    Cost-based optimization, including the use of cost estimates to guide optimization and the role of cost-based optimization in query processing.

Word statistics (Zipf's law), Morphological analysis, Index term selection, Using thesauri, Metadata, Text representation using markup languages (SGML, HTML, XML)
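
Zipf's law states that a word's frequency is roughly inversely proportional to its rank, so the product rank × frequency stays approximately constant across a large corpus. A minimal sketch for inspecting this on any token list follows; the function name `rank_frequency` and the toy input are hypothetical, and a realistic check would need a much larger text.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, term, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending frequency; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, term, freq, rank * freq)
        for rank, (term, freq) in enumerate(ranked, start=1)
    ]


# Toy input, far too small to show the law convincingly.
rows = rank_frequency("a a a a b b c".split())
for row in rows:
    print(row)
```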

Key Topics

  • Challenges and Approach of E-government Security
    SE-1

    This topic covers the challenges faced by e-government in terms of security and the approaches to address them. It explores the importance of security in e-government and the ways to mitigate risks.

  • Security Management Model
    SE-2

    This topic introduces a security management model for e-government, outlining the key components and processes involved in ensuring the security of e-government systems.

  • E-Government Security Architecture
    SE-3

    This topic delves into the architecture of e-government security, including the design and implementation of secure systems and infrastructure for e-government services.

  • Security Standards
    SE-4

    This topic covers the security standards and guidelines for e-government, including international standards and best practices for ensuring the security of e-government systems and data.

  • Data Transaction Security
    SE-5

    Security measures for protecting data during transactions in e-commerce.

  • Security Mechanisms
    SE-6

    Various security mechanisms used in e-commerce including cryptography, hash functions, digital signatures, authentication, access controls, intrusion detection systems, and secured socket layer (SSL).

Categorization algorithms (Rocchio, naive Bayes, decision trees, and nearest neighbor), Clustering algorithms (agglomerative clustering, k-means, expectation maximization (EM)), Applications to information filtering and organization

Personalization, Collaborative filtering recommendation, Content-based recommendation

Information extraction and applications, Extracting data from text, Evaluating IE accuracy, XML and information extraction, Semantic web (purpose, relation to hypertext pages), Collecting and integrating specialized information on the web

Probabilistic models, Generalized Vector Space Model, Latent Semantic Indexing (LSI), Efficient string searching, Pattern matching

Key Topics

  • Multiple Correlation
    MU-1

    Introduction to multiple correlation, its concept, and application in statistics. Exploring the relationship between multiple variables.

  • Partial Correlation
    MU-2

    Understanding partial correlation, its concept, and application in statistics. Analyzing the relationship between two variables while controlling for other variables.

  • Introduction to Multiple Linear Regression
    MU-3

    Basic concepts and principles of multiple linear regression, including model formulation and estimation. Understanding the relationship between multiple independent variables and a dependent variable.

  • Hypothesis Testing of Multiple Regression
    MU-4

    Testing hypotheses in multiple regression, including significance testing and confidence intervals. Evaluating the overall fit and significance of the regression model.

Lab works

The laboratory work should cover all the features mentioned in the course.

Samples

1. Program to demonstrate the Boolean Retrieval Model and Vector Space Model

2. Program to find the similarity between documents

3. Tokenize large documents and count their word types and tokens

4. Segment the documents according to sentences

5. Implement Porter stemmer

6. Try to build a stemmer for Nepali language

7. Build a spider that tracks only the links of Nepali documents

8. Group online news into different categories such as sports, entertainment, and politics

9. Build a recommender system for an online music store
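
As a possible starting point for sample 1, the Boolean retrieval model can be sketched with an inverted index and set operations. This is only an illustration under simple assumptions (lowercased whitespace tokens, no stemming or stop-word removal); the helper names `build_inverted_index`, `boolean_and`, `boolean_or`, and `boolean_not` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every given term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one of the given terms."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def boolean_not(index, term, all_ids):
    """Documents that do not contain the given term."""
    return set(all_ids) - index.get(term.lower(), set())


# Hypothetical toy collection keyed by document ID.
docs = {
    1: "boolean retrieval model",
    2: "vector space retrieval model",
    3: "probabilistic model",
}
index = build_inverted_index(docs)
print(boolean_and(index, "retrieval", "model"))
print(boolean_or(index, "boolean", "probabilistic"))
# "model AND NOT retrieval"
print(boolean_and(index, "model") & boolean_not(index, "retrieval", docs))
```

Documents either match a Boolean query or they do not, so this model returns an unranked set, in contrast to the vector-space model.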