Information Retrieval

(updated Jan. 2017)

 

Course Type      : Elective

Course Credits  : 06

Contact Hours   : 04 (theory) + 04 (laboratory, tutorials, presentations) per week

 

Course Objectives & Prerequisites: 

 

The course introduces the paradigms and techniques of modern Information Retrieval (IR). It focuses on retrieval from the World Wide Web (Web) and covers the algorithms, data structures and techniques used for it.

 

The course is designed as an introduction to IR. It assumes only that the student opting for this elective has successfully completed a basic course in programming and understands the fundamental concepts of Computer Networks and the Web. Prior courses in Data Structures and Artificial Intelligence, and hands-on programming experience in Java, Python or R, will help improve the pace of learning.

 

 

Course Contents:   

 

Introduction: Information, Information Need and Relevance; The IR System; Early Developments in IR; User Interfaces.

 

Retrieval and IR Models: Boolean Retrieval; Term Vocabulary and Postings Lists; Index Construction; Ranked and other alternative Retrieval Models.
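
As an illustration of Boolean retrieval over a term vocabulary and postings lists, here is a minimal sketch in Python. The function names (build_index, intersect, boolean_and) and the four toy documents are assumptions made for this example; tokenisation is plain whitespace splitting, which a real index would replace with proper text processing.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted postings list of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def boolean_and(index, *terms):
    """Answer the conjunctive query term1 AND term2 AND ..."""
    postings = [index.get(t.lower(), []) for t in terms]
    if not postings:
        return []
    # Process the shortest lists first to keep intermediate results small.
    postings.sort(key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result

# Hypothetical toy collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_index(docs)
print(boolean_and(index, "home", "july"))  # [2, 3, 4]
```

Keeping the postings sorted is what lets the intersection run in time linear in the combined length of the two lists.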

 

Retrieval Evaluation: Notions of Precision and Recall; Precision-Recall Curve; Standard Performance Measures such as MAP, Reciprocal Rank, F-measure, NDCG and Rank Correlation.
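
The measures listed above can be sketched for a single query as follows. This is a minimal illustration, not a reference implementation: the function names and the sample ranking are assumptions, relevance is binary except for NDCG, and NDCG uses the common log2 discount with the ideal ordering taken from the supplied gains only. MAP is the mean of average_precision over a set of queries.

```python
import math

def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-measure (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k where a relevant doc appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    relevant = set(relevant)
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains, k):
    """DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# Hypothetical system output and relevance judgements for one query.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}
print(precision_recall_f1(ranking, relevant))
print(average_precision(ranking, relevant))
print(reciprocal_rank(ranking, relevant))
print(ndcg([3, 2, 3, 0, 1], k=5))
```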

 

Document Processing: Representation; Vector Space Model; Feature Selection; Stop Words; Stemming; Notion of Document Similarity; Standard Datasets.
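
To make the vector space model and document similarity concrete, the following sketch builds TF-IDF vectors and compares them with cosine similarity. It is one possible weighting among several: the log-scaled term frequency, the base-10 idf, the tiny stop-word list and the three sample sentences are all assumptions for illustration, and stemming is left out for brevity.

```python
import math
from collections import Counter

# Toy stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "of", "in", "and", "is", "on"}

def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (1 + math.log10(c)) * math.log10(n / df[t])
            for t, c in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical mini-collection.
docs = [
    "the cat sat on the mat",
    "the cat chased a mouse",
    "stock markets fell in asia",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: both mention "cat"
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Note that a term occurring in every document gets an idf of zero under this weighting, so it contributes nothing to similarity.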

 

Classification and Clustering: Notion of Supervised and Unsupervised Algorithms; Naive Bayes, Nearest Neighbour and Rocchio's algorithms for Text Classification; Clustering Methods such as K-Means.
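
As one example of a supervised text classifier, here is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. The function names train_nb and classify_nb and the four-document training set are assumptions for illustration; scores are summed in log space to avoid floating-point underflow, and test terms outside the training vocabulary are ignored.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Estimate log priors and smoothed log term likelihoods per class."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n = sum(class_docs.values())
    priors = {c: math.log(k / n) for c, k in class_docs.items()}
    cond = {}
    for c in class_docs:
        # Add-one smoothing: every vocabulary term gets one extra count.
        total = sum(term_counts[c].values()) + len(vocab)
        cond[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
    return priors, cond, vocab

def classify_nb(model, text):
    """Return the class with the highest posterior log score."""
    priors, cond, vocab = model
    tokens = [t for t in text.lower().split() if t in vocab]
    scores = {c: priors[c] + sum(cond[c][t] for t in tokens) for c in priors}
    return max(scores, key=scores.get)

# Hypothetical toy training set: (document text, class label).
training = [
    ("Chinese Beijing Chinese", "china"),
    ("Chinese Chinese Shanghai", "china"),
    ("Chinese Macao", "china"),
    ("Tokyo Japan Chinese", "japan"),
]
model = train_nb(training)
print(classify_nb(model, "Chinese Chinese Chinese Tokyo Japan"))  # china
print(classify_nb(model, "Tokyo Japan"))  # japan
```

Nearest Neighbour, Rocchio and K-Means operate on document vectors instead, so they would build on a vector representation rather than on these per-class term counts.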

 

Applications/ Laboratory Exercises.

 

 

Text and Reference Books: 

  • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, Addison-Wesley, 2011. [Companion Website - contains certain downloadable chapters, slides and resources]

  • C.D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. [Companion Website - contains downloadable book, slides and exercises]

  • David A. Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd Ed., Springer, 2008.

  • Stefan Büttcher, Charles L.A. Clarke and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.

  • W. Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2009.

 

Web Resources: 

 

Students are encouraged to visit the following links to representative portals, journals and SIGs that report research work in IR:

 

Some other interesting links:

 

Wikipedia: Information Retrieval

A. Singhal: Modern Information Retrieval: A Brief Overview (an old but useful article)

David Austin: How Google Finds Your Needle in the Web's Haystack

W.B. Croft: What Do People Want from Information Retrieval? (very old but still interesting)

IEEE Internet Computing Article, Sep-Oct. 1997

 

Links with pointers to more resources:

A list of Information Retrieval resources by Chris Manning

Information Retrieval and the Web Research at Google

 

Slides and PDF copies of some reading material will be shared as the class progresses. More printed and online material, particularly related to the assignments, will also be suggested.

 

Assessment Criteria:

 

Mid Semester Test (Open Book, Without Prior Notice) - 20 Marks

Seminar (about 25 minutes, on a topic of contemporary research) - 10 Marks

End Semester Examination - 70 Marks

Lab Exercises - 02 of the total 06 credits

 

Lab and Presentation Assignments

 

Queries and feedback may be sent to vivek@bhu.ac.in