Scalable In-context Ranking with Generative Models

BlockRank imposes blockwise sparse attention and leverages query-token attention signals for efficient in-context ranking

NeurIPS 2025 · 2 min · Nilesh Gupta, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Inderjit S. Dhillon, Felix Yu