A lightweight, reproducible toolkit for LLM-based query reformulation
🌐 Website • 📊 Leaderboard • 🚀 Dashboard • 📚 Docs • 📦 PyPI • 📄 Paper
- Single Prompt Bank (YAML) with metadata
- Simple DataLoader: Dependency-free file loading for queries, qrels, and contexts
- Format Loaders: Optional BEIR, MS MARCO, and BRIGHT format loaders in querygym.loaders
- OpenAI-compatible LLM client (works with any OpenAI API–compatible endpoint)
- Pyserini optional: either pass contexts (JSONL) or pass a retriever instance to build contexts
- Export-only: emits reformulated queries; optionally generates a bash script for Pyserini + trec_eval
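As an illustration of what dependency-free loading of queries and qrels can look like, here is a minimal standard-library sketch. It assumes queries are stored as `qid<TAB>query` lines and qrels follow the usual TREC layout (`qid iteration docid relevance`); the helper names `read_queries_tsv` and `read_trec_qrels` are hypothetical and are not part of QueryGym's API.

```python
import csv
import io
from collections import defaultdict


def read_queries_tsv(handle):
    """Parse 'qid<TAB>query' lines into a {qid: query} dict (assumed layout)."""
    queries = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        queries[row[0]] = row[1]
    return queries


def read_trec_qrels(handle):
    """Parse TREC-style qrels lines: 'qid iteration docid relevance'."""
    qrels = defaultdict(dict)
    for line in handle:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, relevance = fields
        qrels[qid][docid] = int(relevance)
    return dict(qrels)


# In-memory stand-ins for queries.tsv and qrels.txt
queries = read_queries_tsv(io.StringIO("q1\twhat is bm25\nq2\tneural reranking\n"))
qrels = read_trec_qrels(io.StringIO("q1 0 d7 1\nq1 0 d9 0\nq2 0 d3 2\n"))
print(queries["q1"], qrels["q1"])
```

In practice `qg.load_queries` and `qg.load_qrels` cover this; the sketch only shows that nothing beyond the standard library is needed for these formats.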
QueryGym implements the following query reformulation methods:
| Method | Description | Paper |
|---|---|---|
| GenQR | Generic keyword expansion using LLM | Wang et al., 2023 |
| GenQR Ensemble | Ensemble of 10 instruction variants for diverse keyword expansion | Dhole & Agichtein, 2024 |
| Query2Doc | Generates pseudo-documents from LLM knowledge | Wang et al., 2023 |
| QA Expand | Question-answer based expansion with sub-questions | Seo et al., 2025 |
| MuGI | Multi-granularity information expansion with adaptive concatenation | Zhang et al., 2024 |
| LameR | Context-based passage synthesis using retrieved documents | Mackie et al., 2023 |
| CSQE | Context-based sentence-level query expansion (KEQE + CSQE) | Lee et al., 2024 |
| ThinkQE | Multi-round reasoning-based query expansion with corpus feedback | Le et al., 2025 |
| Query2E | Query to entity/keyword expansion | Jagerman et al., 2023 |
| ReFormeR | Pattern-based, document-conditioned reformulation via learned transformation rules | Bigdeli et al., 2026 |
For detailed usage and parameters, see the Methods Reference.
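To make the expansion idea behind several of these methods concrete, the sketch below shows one common way to combine LLM-generated text with the original query for sparse retrieval: the query is repeated a few times so its terms are not drowned out by the longer generated passage, in the style described for Query2Doc. The function name `expand_query`, the repeat count, and the hard-coded pseudo-document are illustrative assumptions, not QueryGym's implementation.

```python
def expand_query(query: str, generated_text: str, query_repeats: int = 5) -> str:
    """Concatenate repeated copies of the query with LLM-generated text."""
    if query_repeats < 1:
        raise ValueError("query_repeats must be at least 1")
    parts = [query.strip()] * query_repeats + [generated_text.strip()]
    return " ".join(parts)


# Stand-in for an LLM response; a real pipeline would call a model here.
pseudo_doc = "Vitamin D supports calcium absorption and bone health."
expanded = expand_query("vitamin d benefits", pseudo_doc, query_repeats=2)
print(expanded)
```

The repeat count trades off fidelity to the original query against the extra vocabulary contributed by the generated text, so it is worth tuning per retriever.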
pip install querygym
# GPU version (default)
docker pull ghcr.io/ls3-lab/querygym:latest
docker run -it --gpus all ghcr.io/ls3-lab/querygym:latest
# CPU version (lightweight)
docker pull ghcr.io/ls3-lab/querygym:cpu
docker run -it ghcr.io/ls3-lab/querygym:cpu
# Or use Docker Compose
docker compose run --rm querygym
📖 Docker Setup: See DOCKER_SETUP.md for quick start or the full Docker guide for detailed usage.
import querygym as qg
# Load data
queries = qg.load_queries("queries.tsv")
qrels = qg.load_qrels("qrels.txt")
contexts = qg.load_contexts("contexts.jsonl")
# Create reformulator
reformulator = qg.create_reformulator("genqr_ensemble", model="gpt-4")
# Reformulate
results = reformulator.reformulate_batch(queries)
# Save
qg.DataLoader.save_queries(
[qg.QueryItem(r.qid, r.reformulated) for r in results],
"reformulated.tsv"
)
pip install -e .[hf,beir,dev]
export OPENAI_API_KEY=sk-...
# Run a method (e.g., genqr_ensemble)
querygym run --method genqr_ensemble \
--queries-tsv queries.tsv \
--output-tsv reformulated.tsv \
--cfg-path querygym/config/defaults.yaml
BEIR:
import querygym as qg
# Download with the BEIR library (returns the path to the unzipped dataset)
from beir import util
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
data_path = util.download_and_unzip(url, "datasets")
# Load with querygym
queries = qg.loaders.beir.load_queries(data_path)
qrels = qg.loaders.beir.load_qrels(data_path)
MS MARCO:
import querygym as qg
# Load from local files (download with ir_datasets)
queries = qg.loaders.msmarco.load_queries("queries.tsv")
qrels = qg.loaders.msmarco.load_qrels("qrels.tsv")
BRIGHT:
import querygym as qg
from datasets import load_dataset
examples = load_dataset("xlangai/BRIGHT", "examples")["biology"]
documents = load_dataset("xlangai/BRIGHT", "documents")["biology"]
queries = qg.loaders.bright.load_queries(examples)
reasoning_queries = qg.loaders.bright.load_reasoning_queries(examples)
qrels = qg.loaders.bright.load_qrels(examples)
corpus = qg.loaders.bright.load_corpus(documents)
See the examples directory for:
- Code snippets - Quick reference examples
- Docker examples - Containerized workflows with Jupyter notebooks
- QueryGym + Pyserini - Complete retrieval pipelines
- Methods Reference - Complete guide to all query reformulation methods
Check examples/README.md for the full guide.
We welcome contributions! Here's how you can help:
- Edit querygym/prompt_bank.yaml
- Add an entry with fields: id, method_family, version, introduced_by, license, authors, tags, template: {system, user}, notes
- Create a class under querygym/methods/*.py
- Subclass BaseReformulator, annotate VERSION, and register with @register_method("name")
- Pull templates via PromptBank.render(prompt_id, query=...)
- Found a bug? Open an issue
- Have a feature request? We'd love to hear it!
For detailed development guidelines, see the Contributing Guide in our documentation.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you use QueryGym in your research, please cite:
@inproceedings{10.1145/3774905.3793135,
author = {Bigdeli, Amin and Hamidi Rad, Radin and Incesu, Mert and Arabzadeh, Negar and Clarke, Charles and Bagheri, Ebrahim},
title = {QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation},
booktitle = {Companion Proceedings of the ACM Web Conference 2026},
series = {WWW Companion '26},
year = {2026},
pages = {196--199},
publisher = {Association for Computing Machinery},
doi = {10.1145/3774905.3793135},
url = {https://doi.org/10.1145/3774905.3793135}
}