Information Retrieval - Syllabus
This page covers the Information Retrieval course (CSC-405) offered by the CSIT department of Tribhuvan University under the 2065 Syllabus. The course combines theory with practical sessions. Assessment is based on a 60-mark system with a set passing threshold.
The course carries 3 credit hours. Alongside the theory, students take part in practical sessions that build skills for real-world scenarios.
Course Synopsis: Advanced aspects of Information Retrieval and Search Engines
Goal: To study advanced aspects of information retrieval and the working principles of search engines, encompassing the principles, research results, and commercial applications of current technologies.
Units
Key Topics
- Introduction to Computers (IN-01): An overview of computers and their significance in today's world. This topic sets the stage for understanding the basics of computers.
- Digital and Analog Computers (IN-02): Understanding the difference between digital and analog computers, their characteristics, and applications.
- Characteristics of Computers (IN-03): Exploring the key characteristics of computers, including input, processing, storage, and output.
- History of Computers (IN-04): A brief history of computers, from their inception to the present day, highlighting key milestones and developments.
- Generations of Computers (IN-05): Understanding the different generations of computers, including their features, advantages, and limitations.
- Classification of Computers (IN-06): Categorizing computers based on their size, functionality, and application, including desktops, laptops, and mobile devices.
Key Topics
- Introduction to IR Models (2.1): Overview of information retrieval models and their significance in IR systems.
- Taxonomy of IR Models (2.2): Categorization of information retrieval models and their relationships.
- Document Retrieval and Ranking (2.3): The process of retrieving and ranking documents based on relevance to a query.
- Formal Characterization of IR Models (2.4): Mathematical representation of IR models and their underlying assumptions.
- Boolean Retrieval Model (2.5): A model that retrieves documents by exact matching against a Boolean query expression (terms combined with AND, OR, and NOT), without ranking.
- Vector-Space Retrieval Model (2.6): A model that represents documents and queries as vectors in a high-dimensional term space.
- Probabilistic Retrieval Model (2.7): A model that estimates the probability of a document being relevant to a query.
- Text-Similarity Metrics (2.8): Measures of similarity between documents and queries, such as cosine similarity over TF-IDF-weighted vectors.
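The Boolean and vector-space models listed above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the whitespace tokenizer, the `log(N / df)` IDF variant, and the sample `DOCS` collection are all assumptions chosen for brevity.

```python
import math
from collections import Counter

# Hypothetical toy collection used only for illustration.
DOCS = [
    "information retrieval with boolean queries",
    "vector space model for retrieval",
    "cooking recipes for dinner",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(tokenize(text)):
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index, query):
    """Boolean model: return documents containing every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; unknown terms get weight 0."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(docs, query):
    """Vector-space model: score every document against the query."""
    index = build_inverted_index(docs)
    idf = {term: math.log(len(docs) / len(ids)) for term, ids in index.items()}
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, tfidf_vector(tokenize(d), idf)), i)
              for i, d in enumerate(docs)]
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    print(boolean_and(build_inverted_index(DOCS), "retrieval model"))
    print(rank(DOCS, "vector retrieval"))
```

The Boolean query returns an unranked set, while `rank` orders all documents by similarity, which is the practical difference between the two models.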
Key Topics
- Memory Read (BA-01): Memory Read operation involves retrieving data from memory locations. It is a fundamental operation in microprocessor-based systems.
- Memory Write (BA-02): Memory Write operation involves storing data in memory locations. It is a crucial operation in microprocessor-based systems.
- I/O Read (BA-03): I/O Read operation involves retrieving data from input/output devices. It enables the microprocessor to interact with the external environment.
- I/O Write (BA-04): I/O Write operation involves sending data to input/output devices. It enables the microprocessor to interact with the external environment.
- Direct Memory Access (BA-05): Direct Memory Access (DMA) is a technique that allows peripheral devices to access system memory directly, reducing the microprocessor's workload.
- Interrupt (BA-06): An interrupt is a signal to the microprocessor that an event has occurred, requiring immediate attention. It enables the microprocessor to handle asynchronous events.
- Types of Interrupts (BA-07): There are different types of interrupts, including maskable and non-maskable interrupts, which vary in their priority and handling by the microprocessor.
- Interrupt Masking (BA-08): Interrupt Masking is a technique that enables the microprocessor to temporarily ignore or mask interrupts, allowing it to focus on high-priority tasks.
- Non-Overlapping Lists (BA-09): A structured text retrieval model that divides the text of each document into non-overlapping regions, which are collected in lists and can be queried by region.
- Proximal Nodes Model (BA-10): A structured text retrieval model that defines hierarchical index structures (such as chapters, sections, and paragraphs) over the document text and answers queries using nodes that are close to each other in that hierarchy.
- Performing CDB and PDB Flashback (BA-11): Understanding flashback technology, including performing flashback on Container Database (CDB) and Pluggable Database (PDB).
Relevance and Retrieval, performance metrics, Basic Measures of text retrieval (Recall, Precision and F-measure)
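The three basic measures named above can be computed from a retrieved set and a relevant set. The sketch below assumes set-based (unranked) evaluation; the function name `precision_recall_f` and the `beta` parameter are illustrative choices, with `beta=1.0` giving the balanced F-measure.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F-measure) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    F_beta    = (1 + beta^2) * P * R / (beta^2 * P + R)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure

if __name__ == "__main__":
    # 4 documents retrieved, 3 relevant overall, 2 in common.
    print(precision_recall_f([1, 2, 3, 4], [2, 4, 5]))
```

With 2 hits out of 4 retrieved and 3 relevant documents, precision is 1/2, recall is 2/3, and the balanced F-measure is 4/7.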
Key Topics
- Query Processing (QU-1): Concept of query processing, including the steps involved in processing a query and the role of the query processor.
- Query Trees and Heuristics (QU-2): Query trees and heuristics for query optimization, including the use of query trees to represent queries and heuristics to guide optimization.
- Query Execution Plans (QU-3): Choice of query execution plans, including the factors that influence the choice of plan and the importance of plan selection.
- Cost-Based Optimization (QU-4): Cost-based optimization, including the use of cost estimates to guide optimization and the role of cost-based optimization in query processing.
Word statistics (Zipf's law), Morphological analysis, Index term selection, Using thesauri, Metadata, Text representation using markup languages (SGML, HTML, XML)
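Zipf's law states that a word's frequency is roughly inversely proportional to its frequency rank, so rank times frequency stays approximately constant across a large corpus. A small sketch for inspecting this on any text, assuming naive whitespace tokenization (the function name `rank_frequency` is illustrative):

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(text.lower().split())
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]

if __name__ == "__main__":
    for row in rank_frequency("a a a a b b c"):
        print(row)
```

On a real corpus the last column only levels off approximately, and the fit is usually worst at the very top and very bottom of the ranking.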
Search engines (working principle), Spidering (Structure of a spider, Simple spidering algorithm, multithreaded spidering, Bot), Directed spidering (Topic directed, Link directed), Crawlers (Basic crawler architecture), Link analysis (e.g. hubs and authorities, Page ranking, Google PageRank), shopping agents
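The link-analysis topic can be illustrated with a power-iteration sketch of PageRank. This is a simplified version under stated assumptions: the link graph is a plain dict, the damping factor defaults to the commonly used 0.85, a fixed number of iterations replaces a convergence test, and pages without outgoing links spread their score evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a graph given as {page: [linked pages]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Hypothetical three-page web: A links to B and C, B to C, C to A.
    print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```

In this toy graph page C collects links from both A and B, so it ends up with a higher score than B, which is linked only from A.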
Categorization algorithms (Rocchio, naive Bayes, decision trees, and nearest neighbor), Clustering algorithms (agglomerative clustering, k-means, expectation maximization (EM)), Applications to information filtering and organization
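Of the categorization algorithms listed, naive Bayes is the shortest to sketch. The example below is a multinomial naive Bayes text classifier with add-one (Laplace) smoothing; the tiny training set, the whitespace tokenizer, and the choice to ignore words unseen in training are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(model, text):
    """Return the label with the highest log posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data.
TRAIN = [
    ("team wins football match", "sports"),
    ("player scores goal", "sports"),
    ("parliament passes budget", "politics"),
    ("minister wins election vote", "politics"),
]

if __name__ == "__main__":
    model = train_nb(TRAIN)
    print(classify_nb(model, "football goal"))
    print(classify_nb(model, "election budget"))
```

Working in log space keeps the product of many small probabilities from underflowing on longer documents.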
Personalization, Collaborative filtering recommendation, Content-based recommendation
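Collaborative filtering recommends items that similar users rated highly. The following is a minimal user-based sketch, assuming ratings are stored as nested dicts and that users are compared by cosine similarity; the user names, song names, and function names are hypothetical.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two users' rating dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[item] * b[item] for item in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    weighted_sum, sim_sum = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_sim(ratings[user], other_ratings)
        if sim <= 0.0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
                sim_sum[item] = sim_sum.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / sim_sum[item] for item in weighted_sum}
    # Highest predicted rating first; ties broken alphabetically.
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]

# Hypothetical ratings on a 1-5 scale.
RATINGS = {
    "asha": {"song_a": 5, "song_b": 4},
    "bibek": {"song_a": 5, "song_b": 4, "song_c": 5},
    "chandra": {"song_a": 1, "song_d": 2},
}

if __name__ == "__main__":
    print(recommend(RATINGS, "asha"))
```

A content-based recommender would instead compare item descriptions with a profile of the user's past choices, so it needs no ratings from other users.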
Information extraction and applications, Extracting data from text, Evaluating IE Accuracy, XML and Information Extraction, Semantic web (purpose, Relation to hypertext page), Collecting and integrating specialized information on the web.
Probabilistic models, Generalized Vector Space Model, Latent Semantic Indexing (LSI), Efficient string searching, Pattern matching
Introduction, multimedia data support in commercial DBMSs, Query languages, Trends and research issues
Lab works
The laboratory work should cover all the features mentioned in the course.
Samples
1. Program to demonstrate the Boolean Retrieval Model and the Vector Space Model.
2. Program to find the similarity between documents.
3. Tokenize the words of large documents and count them by type and token.
4. Segment documents into sentences.
5. Implement the Porter stemmer.
6. Try to build a stemmer for the Nepali language.
7. Build a spider that follows only links to Nepali documents.
8. Group online news into different categories such as sports, entertainment, and politics.
9. Build a recommender system for an online music store.
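As a starting point for samples 3 and 4, the sketch below segments text into sentences and counts word types versus tokens. It is a rough regex-based approach that assumes English punctuation and ASCII words; Nepali text would need a different sentence delimiter (the danda, "।") and Devanagari-aware tokenization. All function names are illustrative.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def tokenize(text):
    """Lowercase word tokens; keeps simple apostrophe forms such as "don't"."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

def type_token_counts(text):
    """Return (number of distinct word types, number of tokens)."""
    tokens = tokenize(text)
    return len(set(tokens)), len(tokens)

if __name__ == "__main__":
    sample = "The spider crawls. The spider indexes pages!"
    print(split_sentences(sample))
    print(type_token_counts(sample))
```

The naive sentence rule will split incorrectly after abbreviations such as "Dr." or "e.g.", which is a useful limitation to explore in the lab.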