
IR-Superproject-2023

The integration of large language models such as BERT, GPT, and ChatGPT into search engine applications is revolutionizing the way we search for information. This themed project aims to help you understand, engage with, and advance this technology.

Through this project, you will develop an in-depth understanding of these language models and their applications in powering search engines. You will have the opportunity to explore one of several research directions that we have identified. For instance, you may choose to investigate the effectiveness of these methods under specific conditions, such as studying possible biases and robustness issues, or to design, develop, and evaluate new solutions to address known problems that affect these methods.

While completion of the INFS7410 course at UQ, or a similar Information Retrieval and Web Search course at another university, is desirable, we will provide background information and study material in the initial weeks of the project to allow you to explore these methods in depth. Therefore, if you possess a strong understanding of key artificial intelligence concepts but lack specific information retrieval knowledge, you are still encouraged to undertake this project.

Project Directions

  1. Reproduce the paper Penha, G., Câmara, A. and Hauff, C., 2022, April. Evaluating the robustness of retrieval pipelines with query variation generators. In Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part I (pp. 397-412). Cham: Springer International Publishing.

  2. Reproduce the paper Chen, X., Luo, J., He, B., Sun, L., and Sun, Y., 2022. Towards Robust Dense Retrieval via Local Ranking Alignment. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22).

  3. Reproduce the paper Wu, C., Zhang, R., Guo, J., Fan, Y. and Cheng, X., 2022. Are neural ranking models robust? ACM Transactions on Information Systems, 41(2), pp.1-36.

  4. Pre-trained language model-based rankers for Product Search. You will work with the Amazon Shopping Queries Dataset, which is publicly available. Several directions are possible:

    • Query Generation. In product search, rankers are very effective if they model user behaviour; however, new products lack behavioural features and, importantly, have no relevant queries associated with them. You will set up a ranking pipeline (at varying levels of complexity) and implement query generation methods.
    • Neural features, such as those generated by cross-encoder rankers, have been shown to be very effective when used in a learning-to-rank pipeline for product search (e.g. one based on gradient-boosted trees). However, cross-encoder features are expensive to generate and can only be produced for historic queries, i.e. queries observed in a query log, offline rather than in real time. In this direction you will study the effect that not generating cross-encoder features has on the rankers, and investigate the effectiveness of weaker but computationally feasible neural features, e.g. those generated by bi-encoders (dense retrievers).
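To see why bi-encoder features are so much cheaper than cross-encoder ones, here is a minimal sketch of the bi-encoder workflow. The encoder below is a toy hash-based stand-in (a real system would use a trained dense retriever); the point is the offline/online split: product vectors are computed once ahead of time, and scoring a new query is a single matrix-vector product, whereas a cross-encoder needs one full forward pass per (query, product) pair at query time.

```python
import numpy as np

def toy_embed(text, dim=64):
    """Toy stand-in for a trained bi-encoder: hash each token to a
    pseudo-random Gaussian vector and sum them. Illustrative only."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        rng = np.random.default_rng(abs(hash(tok)) % (2**32))
        vec += rng.standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Offline: embed the whole product catalogue once and store the matrix.
products = ["red running shoes", "blue cotton shirt", "wireless headphones"]
index = np.stack([toy_embed(p) for p in products])

# Online: embed only the query; one matrix-vector product scores every
# product at once. A cross-encoder would instead run a transformer forward
# pass for each (query, product) pair, which cannot be precomputed for
# unseen queries.
query_vec = toy_embed("red shoes")
scores = index @ query_vec
best = products[int(np.argmax(scores))]
```

In a real pipeline the dot-product scores (or the cross-encoder scores, where available) would be fed as features into the downstream learning-to-rank model.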
  5. Participate in a TREC competition (only available to students with GPA >= 6). This covers all aspects of the competition, including creation of the pipeline, baselines, implementation of our methods, and result analysis. Competitions of interest:

    • TREC Product Search: This competition uses the Amazon Shopping Queries Dataset (https://arxiv.org/abs/2206.06588), which is publicly available. Task 1 (Product Ranking Task): The first task focuses on product ranking. In this task we provide an initial ranking of 100 documents from a BM25 baseline, and you are expected to re-rank the products in terms of their relevance to the user's given intent. The ranking task is a focused one: the candidate sets are fixed and there is no need to implement complex end-to-end systems, which makes experimentation quick and runs easy to compare. Task 2 (Product Retrieval Task): The second task focuses on end-to-end product retrieval. In this task we provide a large collection of products, and participants need to design end-to-end retrieval systems that leverage whichever information they find relevant or useful. Unlike the ranking task, the focus here is on understanding the interplay between retrieval and re-ranking systems.
    • NeuCLIR Track: The track is focused on the application of modern neural computing techniques to cross-language information retrieval. NeuCLIR topics are written in English. NeuCLIR has three target language collections in Chinese, Persian, and Russian.
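For Task 1, the natural starting point is re-scoring the fixed candidate set. As a reference for what the provided baseline computes, here is a self-contained Okapi BM25 scorer over pre-tokenised documents (k1=1.2 and b=0.75 are common defaults, not necessarily the track's settings; in the competition you would replace or combine this with a neural re-ranker):

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Re-rank a fixed candidate set of tokenised docs with Okapi BM25.

    query: list of query terms; docs: list of token lists.
    Returns document indices, best-scoring first.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many candidates each term occurs.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

# Toy candidate set: the doc matching both query terms ranks first.
candidates = [["blue", "shirt"], ["red", "shoes"], ["red", "shirt"]]
ranking = bm25_rank(["red", "shoes"], candidates)
```

Note that computing IDF over only the 100 candidates, as above, differs slightly from the collection-wide statistics a full index would use; it is enough to sanity-check a re-ranking pipeline before swapping in a learned model.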

Activities for Semester 2

| Week | Meeting date (every Wed 2-4pm) | Deliverables this week | Meeting activity | Work plan | Due |
|------|------|------|------|------|------|
| 1 | | | | | |
| 2 | | | | read related works; get familiar with your experimental environment (e.g. computation resources, datasets, code) | |
| 3 | Aug 9 | | check-up meetings | read related works; get familiar with your experimental environment (e.g. computation resources, datasets, code) | |
| 4 | | [writing] Have a draft of the related work chapter of the thesis | no meeting | write the related work chapter and submit the draft to the teaching team; start investigating your project | |
| 5 | Aug 23 | [slides] Make slides showing your plans for related methods and experiment settings before this week's meeting | meeting for feedback on the experiment plans | make slides for your research plan (i.e. related methods and experiment settings); start experimentation | |
| 6 | Aug 30 | | QA | experimentation | |
| 7 | Sep 6 | | feedback on the related work chapter | experimentation; revise the related work chapter based on feedback | |
| 8 | Sep 13 | [writing] Have a skeleton for the conference paper (masters only) | feedback on the paper skeleton | draft the skeleton based on current results and plans; experimentation | |
| 9 | Sep 20 | | QA | experimentation | |
| 10 | Oct 4 | [slides] Make slides showing all experiment results so far | feedback and discussion on experiment results | make slides for current experiment results (figures and tables that will be used in your paper or thesis) | |
| 11 | Oct 11 | | feedback on paper draft (masters only) | write the conference paper and submit | Oct 12: conference paper (masters only) |
| 12 | Oct 18 | | feedback on posters | work on posters and submit | Oct 20: poster & demonstration |
| 13 | Oct 25 | | feedback on thesis draft | work on thesis draft | Nov 6: thesis report |

Activities for Semester 1

Useful Links

Possible computing infrastructure

Background material and videos

Links to videos:

  • BERT
  • BERT For Ranking
  • BERT Limitations
  • Handling Length by Scores
  • Handling Length by Representations with PARADE
  • duoBERT
  • doc2query
  • DPR
  • ANCE
  • RepBERT
  • CLEAR
  • EPIC
  • DRs Performance
  • TILDE
  • TILDEv2

Readings:
