Information Retrieval - Syllabus
This page presents the syllabus for the Information Retrieval (IR) course in the CSIT department of Tribhuvan University. Aligned with the 2074 syllabus, the course (CSC413) combines theoretical frameworks with practical sessions. Assessment follows a 60 + 20 + 20 marks system with a set passing threshold.
This 3 credit-hour course bridges theory and application: beyond the theory, students take part in practical sessions and build skills they can apply to real-world problems.
Course Description:
This course familiarizes students with different concepts of information retrieval techniques, focusing mainly on clustering, classification, search engines, ranking, and query operations.
Course Objective:
The main objective of this course is to provide knowledge of different information retrieval techniques so that students will be able to develop an information retrieval engine.
Units
Key Topics
-
Introduction to Computers
IN-01 An overview of computers and their significance in today's world. This topic sets the stage for understanding the basics of computers.
-
Digital and Analog Computers
IN-02 Understanding the difference between digital and analog computers, their characteristics, and applications.
-
Characteristics of Computers
IN-03 Exploring the key characteristics of computers, including input, processing, storage, and output.
-
History of Computers
IN-04 A brief history of computers, from their inception to the present day, highlighting key milestones and developments.
-
Generations of Computers
IN-05 Understanding the different generations of computers, including their features, advantages, and limitations.
-
Classification of Computers
IN-06 Categorizing computers based on their size, functionality, and application, including desktops, laptops, and mobile devices.
-
The Computer System
IN-07 An in-depth look at the components of a computer system, including hardware and software.
Key Topics
-
Types of Statistical Hypotheses
TE-1 This topic covers the different types of statistical hypotheses, including null and alternative hypotheses, and their roles in hypothesis testing.
-
Power of the Test and P-Value
TE-2 This topic explains the concept of power of the test, p-value, and its use in decision making during hypothesis testing.
-
Steps in Testing of Hypothesis
TE-3 This topic outlines the steps involved in testing a hypothesis, from formulating the hypothesis to making a decision based on the test results.
-
One Sample Tests for Mean of Normal Population
TE-4 This topic covers one sample tests for the mean of a normal population, including tests for known and unknown variance.
-
Test for Single Proportion
TE-5 This topic explains how to conduct a test for a single proportion, including the test statistic and p-value calculation.
-
Test for Difference between Two Means
TE-6 This topic covers the test for the difference between two means, including the test statistic and p-value calculation.
-
Test for Difference between Two Proportions
TE-7 This topic explains how to conduct a test for the difference between two proportions, including the test statistic and p-value calculation.
-
Paired Sample T-Test
TE-8 This topic covers the paired sample t-test, including its application and interpretation.
-
Linkage between Confidence Interval and Testing of Hypothesis
TE-9 This topic explains the relationship between confidence intervals and hypothesis testing, including how to use confidence intervals to make inferences about a population.
-
Inverted Indices
TE-10 An inverted index is a data structure for efficient retrieval. It maps each term to a postings list of the documents that contain it, often together with term and document frequencies.
-
Positional Inverted Index
TE-11 A positional inverted index is an inverted index that also stores the positions at which each term occurs in every document. This makes phrase queries and proximity searches possible.
-
Natural Language Processing in Information Retrieval
TE-12 Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the interaction between computers and human language. In information retrieval, NLP is used to improve the understanding and retrieval of text data.
-
Basic NLP Tasks
TE-13 Basic NLP tasks include Part-of-Speech (POS) tagging and shallow parsing. These tasks are used to analyze the grammatical structure of text data.
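The inverted index and positional inverted index topics above can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and an in-memory dictionary; the function names `build_positional_index` and `phrase_query` and the sample documents are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_query(index, phrase):
    """Return sorted ids of documents that contain the phrase's words consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        # A match starts at position p if term i occurs at position p + i.
        for p in index[terms[0]][doc_id]:
            if all(p + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample collection.
docs = {
    1: "information retrieval with inverted indices",
    2: "retrieval of information from the web",
    3: "web search and information retrieval",
}
index = build_positional_index(docs)
print(phrase_query(index, "information retrieval"))
```

Document 2 contains both words but not next to each other, so only the stored positions let the phrase query exclude it. A plain (non-positional) index could answer only the Boolean question of whether both terms occur.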
Key Topics
-
Memory Read
BA-01 Memory Read operation involves retrieving data from memory locations. It is a fundamental operation in microprocessor-based systems.
-
Memory Write
BA-02 Memory Write operation involves storing data in memory locations. It is a crucial operation in microprocessor-based systems.
-
I/O Read
BA-03 I/O Read operation involves retrieving data from input/output devices. It enables the microprocessor to interact with the external environment.
-
I/O Write
BA-04 I/O Write operation involves sending data to input/output devices. It enables the microprocessor to interact with the external environment.
-
Direct Memory Access
BA-05 Direct Memory Access (DMA) is a technique that allows peripheral devices to access system memory directly, reducing the microprocessor's workload.
-
Interrupt
BA-06 An interrupt is a signal to the microprocessor that an event has occurred, requiring immediate attention. It enables the microprocessor to handle asynchronous events.
-
Types of Interrupts
BA-07 There are different types of interrupts, including maskable and non-maskable interrupts, which vary in their priority and handling by the microprocessor.
-
Interrupt Masking
BA-08 Interrupt Masking is a technique that enables the microprocessor to temporarily ignore or mask interrupts, allowing it to focus on high-priority tasks.
-
Non-Overlapping Lists
BA-09 The non-overlapping lists model is a structured text retrieval model. The text is divided into non-overlapping regions (for example chapters or sections) that are collected in lists, and queries can refer to those regions.
-
Proximal Nodes Model
BA-10 The proximal nodes model is a structured text retrieval model that defines hierarchical index structures over the same text, so that queries can combine content with structural components.
Key Topics
-
Event Handling Concept
EV-1 Understanding the concept of event handling in Java, including the role of listeners and event sources. This topic lays the foundation for handling events in Java applications.
-
Listener Interfaces
EV-2 Exploring the different listener interfaces in Java, including their methods and usage. This topic covers the interfaces that must be implemented to handle events.
-
Using Action Commands
EV-3 Learning how to use action commands to handle events, including setting and getting action commands. This topic covers the basics of using action commands in event handling.
-
Adapter Classes
EV-4 Understanding the role of adapter classes in event handling, including their usage and benefits. This topic covers how adapter classes simplify event handling.
-
Handling Action Events
EV-5 Handling action events, including button clicks and other actions. This topic covers the specifics of handling action events in Java applications.
-
Handling Key Events
EV-6 Handling key events, including key presses and releases. This topic covers the specifics of handling key events in Java applications.
Key Topics
-
Query Processing
QU-1 Concept of query processing, including the steps involved in processing a query and the role of the query processor.
-
Query Trees and Heuristics
QU-2 Query trees and heuristics for query optimization, including the use of query trees to represent queries and heuristics to guide optimization.
-
Query Execution Plans
QU-3 Choice of query execution plans, including the factors that influence the choice of plan and the importance of plan selection.
-
Cost-Based Optimization
QU-4 Cost-based optimization, including the use of cost estimates to guide optimization and the role of cost-based optimization in query processing.
Key Topics
-
Search Engines
WE-01 Understanding the working principle of search engines, including their architecture and functionality.
-
Spidering
WE-02 Exploring the structure and algorithms of spiders, including simple and multithreaded approaches, and their role in web crawling.
-
Directed Spidering
WE-03 Learning about topic-directed and link-directed spidering techniques, and their applications in web search.
-
Crawlers
WE-04 Understanding the basic architecture and functionality of crawlers in web search.
-
Link Analysis
WE-05 Studying link analysis techniques, including HITS and PageRank, and their role in ranking web pages.
-
Query Log Analysis
WE-06 Analyzing query logs to understand user behavior and improve search engine performance.
-
Handling Invisible Web
WE-07 Exploring techniques for handling the 'invisible' web, together with snippet generation and CLIR (Cross Language Information Retrieval).
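As an illustration of the link analysis topic above, here is a minimal sketch of PageRank computed by power iteration. The damping factor of 0.85, the fixed iteration count, the function name `pagerank`, and the four-page toy graph are assumptions made for this example and do not come from the syllabus.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns dict page -> score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for t in targets:
                # Each page shares its current rank equally among its out-links.
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy web graph: page -> pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph, page C receives links from three pages and ends up with the highest score, while D, which nothing links to, keeps only the baseline share. HITS differs in that it keeps two scores per page (hub and authority) and is not shown here.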
Key Topics
-
Relational Database Design Using ER-to-Relational Mapping
RE-1 Learn how to design relational databases using ER-to-relational mapping, including mapping of regular entities, weak entities, relationship types, multivalued attributes, and N-ary relationships.
-
Informal Design Guidelines for Relational Schemas
RE-2 Understand informal design guidelines for relational schemas, including semantics of attributes in relations, redundant information in tuples and update anomalies, NULL values in tuples, and generation of spurious tuples.
-
Functional Dependencies
RE-3 Study functional dependencies, including definition, inference rules, Armstrong's axioms, attribute closure, equivalence of functional dependencies, and minimal sets of functional dependencies.
Key Topics
-
Measurement of Queuing System Performance
QU-5 This topic covers the metrics and methods used to measure the performance of queuing systems, including efficiency, effectiveness, and quality of service.
-
Networks of Queuing Systems
QU-6 This topic explores the concept of networks of queuing systems, with a focus on computer systems and their applications.
-
Applications of Queuing Systems
QU-7 This topic highlights the various applications of queuing systems in real-world scenarios, including manufacturing, healthcare, and transportation.
Key Topics
-
Active Database Concepts and Triggers
AD-1 This topic covers the concepts of active databases, including triggers, and their applications in advanced database systems.
-
Temporal Database Concepts
AD-2 This topic explores the concepts and techniques of temporal databases, which manage time-varying data and support temporal queries.
-
Spatial Database Concepts
AD-3 This topic introduces the concepts and techniques of spatial databases, which manage spatial data and support spatial queries and analysis.
-
Multimedia Database Concepts
AD-4 This topic covers the concepts and techniques of multimedia databases, which manage multimedia data such as images, audio, and video.
-
Deductive Database Concepts
AD-5 This topic explores the concepts and techniques of deductive databases, which use logical rules to derive new information from existing data.
-
Introduction to Information Retrieval and Web Search
AD-6 This topic provides an introduction to the concepts and techniques of information retrieval and web search, including indexing, querying, and ranking.
-
Multitasking
AD-7 Multitasking in 80286 architecture allows multiple tasks to run concurrently, improving system performance and responsiveness.
Laboratory Works:
The laboratory work should cover all the features mentioned in the course. It should include at least the following tasks:
- Write a program to demonstrate the Boolean Retrieval Model and the Vector Space Model
- Tokenize the words of large documents and count types and tokens
- Write a program to find the similarity between documents
- Implement the Porter stemmer
- Build a spider that follows only links to Nepali documents
- Group online news into different categories such as sports, entertainment, and politics
- Build a recommender system for an online music store
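As a starting point for the tokenization and document similarity tasks, the following is a minimal sketch of a vector space comparison, assuming TF-IDF weights of the form tf × log(N / df) and cosine similarity. The regular-expression tokenizer, the helper names `tokenize`, `tfidf_vectors`, and `cosine`, and the three sample sentences are illustrative assumptions, not part of the prescribed lab.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: tf * math.log(n / df[term]) for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents.
docs = [
    "The striker scored a late goal in the football match",
    "The football team won the match with a late goal",
    "Parliament passed the new budget after a long debate",
]
tokens = tokenize(docs[0])
print(len(tokens), "tokens,", len(set(tokens)), "types")
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Words that occur in every document, such as "the" and "a", get a weight of zero under this weighting, so the two football sentences score as similar while the budget sentence does not. A stemmer such as Porter's could be applied inside `tokenize` before the vectors are built.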