
    ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search

    Large Language Models (LLMs) are effective at modeling the syntactic and semantic content of text, making them a strong choice for conversational query rewriting. While previous approaches relied on custom NLP models that require significant engineering effort, ours is straightforward and conceptually simpler: it improves effectiveness over the current state of the art while also attending to cost and efficiency. We explore the use of pre-trained LLMs fine-tuned to generate high-quality user query rewrites, aiming to reduce computational costs while maintaining or improving retrieval effectiveness. As a first contribution, we study various prompting approaches, including zero-, one-, and few-shot methods, with ChatGPT (i.e., gpt-3.5-turbo). We observe an increase in rewrite quality that leads to improved retrieval. We then fine-tune smaller open LLMs on the query rewriting task. Our results show that the fine-tuned models, including the smallest with 780 million parameters, outperform gpt-3.5-turbo in the retrieval phase. To fine-tune the selected models, we use the QReCC dataset, which is specifically designed for query rewriting. For evaluation, we use the TREC CAsT datasets to assess the retrieval effectiveness of the rewrites produced by both gpt-3.5-turbo and our fine-tuned models. Our findings show that fine-tuning LLMs on conversational query rewriting datasets can be more effective than relying on generic instruction-tuned models or traditional query reformulation techniques.
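
    A minimal Python sketch of the few-shot prompting setup described in the abstract. The example conversation, the prompt wording, and the llm_complete callback are illustrative assumptions, not the paper's actual prompts or API calls.

        # Hypothetical sketch of few-shot conversational query rewriting.
        # `llm_complete` stands in for any chat-completion client (e.g. gpt-3.5-turbo
        # or a fine-tuned open LLM); it is not a real library call.

        FEW_SHOT_EXAMPLES = [
            ("Q: Who wrote Dune?\nQ: When was it published?",
             "When was the novel Dune by Frank Herbert first published?"),
        ]

        def build_prompt(history: list, current_query: str) -> str:
            """Assemble a few-shot prompt: instruction, worked examples, then the live turn."""
            parts = ["Rewrite the last question so that it is self-contained, "
                     "resolving every reference to the conversation history."]
            for context, rewrite in FEW_SHOT_EXAMPLES:
                parts.append(f"Conversation:\n{context}\nRewrite: {rewrite}")
            parts.append("Conversation:\n" + "\n".join(history + [current_query]) + "\nRewrite:")
            return "\n\n".join(parts)

        def rewrite_query(history: list, current_query: str, llm_complete) -> str:
            """Ask the model for a decontextualized rewrite of the current query."""
            return llm_complete(build_prompt(history, current_query)).strip()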

    On Improving Efficiency/Effectiveness trade-offs with Neural Network Compression

    Deep Neural Networks (DNNs) deliver state-of-the-art performance in various fields at the price of huge computational requirements. In this thesis, we propose three solutions to reduce the computational requirements of DNNs in Learning to Rank (LtR), Image Classification, and multi-term Dense Retrieval (DR). LtR is the field of machine learning concerned with ranking candidate documents in a search engine. We propose a methodology to train efficient and effective neural networks for LtR by employing pruning and cross-modal knowledge distillation. Furthermore, we develop analytic time predictors that estimate the execution time of sparse and dense neural networks, thus easing the design of neural models that match the desired time requirements. In Image Classification, we propose Automatic Prune Binarization (APB), a novel compression framework that enriches the expressiveness of binary networks with a few full-precision weights. Moreover, we design two innovative matrix multiplication algorithms for extremely low-bit configurations, based on highly efficient bitwise and logical CPU instructions. In multi-term DR, we propose two contributions, working with uncompressed and compressed vector representations, respectively. The former exploits the merging of query terms and document terms to speed up the search phase while jointly reducing the memory footprint. The latter introduces Product Quantization during the document scoring phase and presents a highly efficient filtering step implemented using bit vectors.
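
    As a rough illustration of the APB idea above, binarizing most weights while retaining a small fraction at full precision, here is a NumPy sketch. The magnitude-based threshold and the function name apb_compress are assumptions for illustration, not the thesis's actual training procedure.

        import numpy as np

        def apb_compress(weights: np.ndarray, full_precision_fraction: float = 0.01) -> np.ndarray:
            """Toy APB-style split: binarize most weights, keep the largest-magnitude
            fraction at full precision. The real framework learns this partition during
            training; here the split is a simple magnitude threshold."""
            flat = np.abs(weights).ravel()
            k = max(1, int(full_precision_fraction * flat.size))
            threshold = np.partition(flat, -k)[-k]           # magnitude cut-off for full precision
            keep = np.abs(weights) >= threshold              # the few full-precision weights
            alpha = np.abs(weights[~keep]).mean()            # scaling factor for the binary part
            binary_part = alpha * np.sign(weights) * ~keep   # {-alpha, 0, +alpha} outside `keep`
            return binary_part + weights * keep              # dequantized approximation

        W = np.random.randn(256, 256).astype(np.float32)
        W_hat = apb_compress(W)
        print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))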

    Pairing Clustered Inverted Indexes with κ-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations

    Learned sparse representations form an effective and interpretable class of embeddings for text retrieval. While exact top-k retrieval over such embeddings faces efficiency challenges, a recent algorithm called Seismic has enabled remarkably fast, highly accurate approximate retrieval. Seismic statically prunes inverted lists, organizes each list into geometrically cohesive blocks, and augments each block with a summary vector. At query time, each inverted list associated with a query term is traversed one block at a time in an arbitrary order, with the inner product between the query and the summary determining whether a block must be evaluated. When a block is deemed promising, its documents are fully evaluated with a forward index. Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and significantly outperforms the winning graph-based submissions to the BigANN 2023 Challenge. In this work, we speed up Seismic further by introducing two innovations in its query processing subroutine. First, we traverse blocks in order of importance, rather than arbitrarily. Second, we take the list of documents retrieved by Seismic and expand it to include the neighbors of each document, using an offline k-regular nearest neighbor graph; the expanded list is then ranked to produce the final top-k set. Experiments on two public datasets show that our extension, named SeismicWave, can reach almost-exact accuracy levels and is up to 2.2x faster than Seismic.
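
    A minimal sketch of the neighbor-expansion step described above. The dict-based k-NN graph and the score callback are assumed data structures for illustration; this is not the SeismicWave implementation.

        def expand_and_rerank(seed_docs, knn_graph, score, k):
            """Expand an approximate top-k list with each document's precomputed neighbors,
            then rescore the union and keep the best k.
            knn_graph maps doc_id -> list of neighbor doc_ids (offline k-regular graph);
            score maps doc_id -> query-document score (e.g. an inner product against a forward index)."""
            candidates = set(seed_docs)
            for doc in seed_docs:
                candidates.update(knn_graph.get(doc, ()))
            return sorted(candidates, key=score, reverse=True)[:k]

        # Example with a tiny toy graph and toy scores.
        graph = {1: [4, 5], 2: [5, 6], 3: [1, 2]}
        scores = {d: 1.0 / d for d in range(1, 7)}
        print(expand_and_rerank([1, 2, 3], graph, scores.get, k=3))   # -> [1, 2, 3]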

    Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

    Learned sparse representations form an attractive class of contextual embeddings for text retrieval because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging, due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically cohesive blocks, each equipped with a summary vector. During query processing, we use the summaries to quickly determine whether a block must be evaluated. As we show experimentally, single-threaded query processing with our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the MS MARCO dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin.
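
    A simplified sketch of the block-skipping idea described above: inverted lists are stored as blocks, each with a summary vector, and a block's documents are scored only when the query-summary inner product clears a threshold. The data layout and pruning rule shown here are assumptions for illustration, not Seismic's actual implementation.

        import numpy as np

        def search_with_block_skipping(query_vec, inverted_lists, forward_index, threshold, k):
            """Approximate top-k retrieval over a blocked inverted index.
            inverted_lists[term] is a list of (summary_vector, doc_ids) blocks;
            forward_index[doc_id] is the document embedding as a dense NumPy vector."""
            scores = {}
            for term in np.flatnonzero(query_vec):               # visit only the query's terms
                for summary, doc_ids in inverted_lists.get(int(term), []):
                    if np.dot(query_vec, summary) < threshold:   # summary says: skip this block
                        continue
                    for doc in doc_ids:                          # promising block: full evaluation
                        scores[doc] = float(np.dot(query_vec, forward_index[doc]))
            return sorted(scores, key=scores.get, reverse=True)[:k]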

    Distilled Neural Networks for Efficient Learning to Rank

