Scalable In-context Ranking with Generative Models
BlockRank imposes blockwise sparse attention and leverages query-token attention signals for efficient in-context ranking
Learnable graph-based search index for classification and retrieval in large output spaces; scalable to label space on a single A100 GPU, and achieves SOTA on multiple large-scale extreme classification benchmarks