Tribhuvan University CSIT 2065 Syllabus Information Retrieval

Information Retrieval - Syllabus

Course Overview and Structure

Information Retrieval (CSC-405) is offered by the CSIT department of Tribhuvan University under the 2065 Syllabus. The course combines theoretical frameworks with practical sessions and is assessed on a 60 marks system.

The course carries 3 credit hours. Beyond the theory, students take part in practical sessions that build skills for real-world scenarios.

Course Synopsis: Advanced aspects of Information Retrieval and Search Engines

Goal: To study advanced aspects of information retrieval and the working principles of search engines, encompassing the principles, research results, and commercial applications of current technologies.

Units

Key Topics

Introduction to Computers
IN-01

An overview of computers and their significance in today's world. This topic sets the stage for understanding the basics of computers.
Digital and Analog Computers
IN-02

Understanding the difference between digital and analog computers, their characteristics, and applications.
Characteristics of Computers
IN-03

Exploring the key characteristics of computers, including input, processing, storage, and output.
History of Computers
IN-04

A brief history of computers, from their inception to the present day, highlighting key milestones and developments.
Generations of Computers
IN-05

Understanding the different generations of computers, including their features, advantages, and limitations.
Classification of Computers
IN-06

Categorizing computers based on their size, functionality, and application, including desktops, laptops, and mobile devices.

Key Topics

Introduction to IR Models
2.1

Overview of information retrieval models and their significance in IR systems.
Taxonomy of IR Models
2.2

Categorization of information retrieval models and their relationships.
Document Retrieval and Ranking
2.3

The process of retrieving and ranking documents based on relevance to a query.
Formal Characterization of IR Models
2.4

Mathematical representation of IR models and their underlying assumptions.
Boolean Retrieval Model
2.5

A model that retrieves documents based on exact matching of query terms.
Vector-Space Retrieval Model
2.6

A model that represents documents and queries as vectors in a high-dimensional space.
Probabilistic Retrieval Model
2.7

A model that estimates the probability of a document being relevant to a query.
Text-Similarity Metrics
2.8

Measures of similarity between documents and queries, including TF-IDF and cosine similarity.
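
The vector-space model and the similarity measures listed above can be illustrated with a small sketch. It assumes whitespace tokenization and the plain weighting tf × log(N/df); the function names and the three sample documents are hypothetical, and real systems usually add normalization and smoothing variants.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real systems use richer tokenizers.
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sample collection.
docs = [
    "information retrieval with search engines",
    "search engines rank documents",
    "cooking recipes for dinner",
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares "search" and "engines"
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

Documents that share weighted terms score above zero, while documents with no terms in common score exactly zero.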

Key Topics

Memory Read
BA-01

Memory Read operation involves retrieving data from memory locations. It is a fundamental operation in microprocessor-based systems.
Memory Write
BA-02

Memory Write operation involves storing data in memory locations. It is a crucial operation in microprocessor-based systems.
I/O Read
BA-03

I/O Read operation involves retrieving data from input/output devices. It enables the microprocessor to interact with the external environment.
I/O Write
BA-04

I/O Write operation involves sending data to input/output devices. It enables the microprocessor to interact with the external environment.
Direct Memory Access
BA-05

Direct Memory Access (DMA) is a technique that allows peripheral devices to access system memory directly, reducing the microprocessor's workload.
Interrupt
BA-06

An interrupt is a signal to the microprocessor that an event has occurred, requiring immediate attention. It enables the microprocessor to handle asynchronous events.
Types of Interrupts
BA-07

There are different types of interrupts, including maskable and non-maskable interrupts, which vary in their priority and handling by the microprocessor.
Interrupt Masking
BA-08

Interrupt Masking is a technique that enables the microprocessor to temporarily ignore or mask interrupts, allowing it to focus on high-priority tasks.
Non-Overlapping Lists
BA-09

Non-overlapping lists are used in some retrieval models to improve the efficiency of retrieval by reducing the number of documents to be ranked.
Proximal Nodes Model
BA-10

The proximal nodes model is a retrieval model that uses the proximity of terms in a document to improve the retrieval of relevant documents.
Performing CDB and PDB Flashback
BA-11

Understanding flashback technology, including performing flashback on Container Database (CDB) and Pluggable Database (PDB).

Relevance and Retrieval, performance metrics, Basic Measures of text retrieval (Recall, Precision and F-measure)
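
As a worked illustration of the basic measures named above, the following sketch computes precision, recall, and the balanced F-measure (F1) for one query. The function name and the document ids are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical run: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r, f = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(p, r, f)
```

Here precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.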

Key Topics

Query Processing
QU-1

Concept of query processing, including the steps involved in processing a query and the role of the query processor.
Query Trees and Heuristics
QU-2

Query trees and heuristics for query optimization, including the use of query trees to represent queries and heuristics to guide optimization.
Query Execution Plans
QU-3

Choice of query execution plans, including the factors that influence the choice of plan and the importance of plan selection.
Cost-Based Optimization
QU-4

Cost-based optimization, including the use of cost estimates to guide optimization and the role of cost-based optimization in query processing.

Word statistics (Zipf's law), Morphological analysis, Index term selection, Using thesauri, Metadata, Text representation using markup languages (SGML, HTML, XML)
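
Zipf's law states that a term's frequency is roughly inversely proportional to its frequency rank, so rank × frequency stays approximately constant over a large corpus. The sketch below builds the rank-frequency table used to check this. The function name and the toy sentence are hypothetical, and a sample this small only shows the mechanics, not the law itself.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, term, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(text.lower().split())
    return [
        (rank, term, freq, rank * freq)
        for rank, (term, freq) in enumerate(counts.most_common(), start=1)
    ]

# Hypothetical toy text; a real experiment needs a large corpus.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequency(sample)
for row in table:
    print(row)
```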

Key Topics

Challenges and Approach of E-government Security
SE-1

This topic covers the challenges faced by e-government in terms of security and the approaches to address them. It explores the importance of security in e-government and the ways to mitigate risks.
Security Management Model
SE-2

This topic introduces a security management model for e-government, outlining the key components and processes involved in ensuring the security of e-government systems.
E-Government Security Architecture
SE-3

This topic delves into the architecture of e-government security, including the design and implementation of secure systems and infrastructure for e-government services.
Security Standards
SE-4

This topic covers the security standards and guidelines for e-government, including international standards and best practices for ensuring the security of e-government systems and data.
Data Transaction Security
SE-5

Security measures for protecting data during transactions in e-commerce.
Security Mechanisms
SE-6

Various security mechanisms used in e-commerce, including cryptography, hash functions, digital signatures, authentication, access controls, intrusion detection systems, and Secure Sockets Layer (SSL).

Categorization algorithms (Rocchio, naive Bayes, decision trees, and nearest neighbor), Clustering algorithms (agglomerative clustering, k-means, expectation maximization (EM)), Applications to information filtering and organization
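
Of the categorization algorithms listed, naive Bayes is the shortest to sketch. The example below is a minimal multinomial naive Bayes text classifier with add-one (Laplace) smoothing. The function names, the four training sentences, and the two labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (text, label). Returns (class_counts, term_counts, vocab)."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior score."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior plus log likelihoods with add-one smoothing.
        score = math.log(class_counts[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log(
                    (term_counts[label][token] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data.
training = [
    ("team wins the football match", "sports"),
    ("player scores a late goal", "sports"),
    ("parliament passes the new budget", "politics"),
    ("minister announces election date", "politics"),
]
model = train_naive_bayes(training)
print(classify(model, "football goal"))
print(classify(model, "election budget"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.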

Personalization, Collaborative filtering recommendation, Content-based recommendation
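
Collaborative filtering predicts a user's interest in an item from the ratings of similar users. The sketch below is one simple user-based variant, assuming cosine similarity over co-rated items and a similarity-weighted average. The user names, song ids, and function names are hypothetical.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two rating dicts, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_sim(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "asha": {"song_a": 5, "song_b": 1},
    "bikash": {"song_a": 5, "song_b": 1, "song_c": 5},
    "chandra": {"song_a": 1, "song_b": 5, "song_c": 1},
}
score = predict(ratings, "asha", "song_c")
print(score)
```

The prediction leans toward the rating given by the user whose tastes match more closely. Content-based recommendation would instead compare item descriptions with a profile of the user's past choices.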

Information extraction and applications, Extracting data from text, Evaluating IE Accuracy, XML and Information Extraction, Semantic web (purpose, Relation to hypertext page), Collecting and integrating specialized information on the web.

Probabilistic models, Generalized Vector Space Model, Latent Semantic Indexing (LSI), Efficient string searching, Pattern matching
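
For the string searching and pattern matching topics, one standard linear-time technique is the Knuth-Morris-Pratt (KMP) algorithm, chosen here only as an example since this outline does not name a specific algorithm. The function names below are hypothetical.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure

def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    failure = build_failure(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern instead of rescanning the text.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # continue, allowing overlapping matches
    return matches

print(kmp_search("abracadabra", "abra"))
```

Because the text pointer never moves backwards, the search runs in time proportional to the combined length of the text and the pattern.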

Key Topics

Multiple Correlation
MU-1

Introduction to multiple correlation, its concept, and application in statistics. Exploring the relationship between multiple variables.
Partial Correlation
MU-2

Understanding partial correlation, its concept, and application in statistics. Analyzing the relationship between two variables while controlling for other variables.
Introduction to Multiple Linear Regression
MU-3

Basic concepts and principles of multiple linear regression, including model formulation and estimation. Understanding the relationship between multiple independent variables and a dependent variable.
Hypothesis Testing of Multiple Regression
MU-4

Testing hypotheses in multiple regression, including significance testing and confidence intervals. Evaluating the overall fit and significance of the regression model.

Lab works

The laboratory work should cover all the features mentioned in the course.

Samples

1. Program to demonstrate the Boolean Retrieval Model and Vector Space Model

2. Program to find the similarity between documents

3. Tokenize the words of large documents and count them by type and token.

4. Segment the documents into sentences.

5. Implement the Porter stemmer.

6. Try to build a stemmer for the Nepali language.

7. Build a spider that tracks only the links of Nepali documents.

8. Group online news into different categories such as sports, entertainment, and politics.

9. Build a recommender system for an online music store.
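
As a possible starting point for sample 1, the sketch below shows Boolean retrieval over an inverted index, with AND, OR, and NOT as set operations on posting sets. The function names and the three sample documents are hypothetical, and tokenization is plain whitespace splitting.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents that contain every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, *terms):
    """Documents that contain at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())

# Hypothetical sample collection.
docs = {
    1: "nepali news about football",
    2: "football match results",
    3: "nepali music store",
}
index = build_index(docs)
print(boolean_and(index, "nepali", "football"))
print(boolean_or(index, "music", "match"))
print(boolean_not(index, docs, "nepali"))
```

Boolean retrieval returns an unranked set of exact matches. Ranking the results would need a weighting scheme such as the vector-space model.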


Units

1. Introduction
2. Basic IR Models
3. Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
4. Experimental Evaluation of IR
5. Query Operations and Languages
6. Text Representation
7. Search Engine
8. Text Categorization and Clustering
9. Recommender Systems
10. Information Extraction and Integration
11. Advanced IR Models with Indexing and Searching Text
12. Multimedia IR
