Scalable In-context Ranking with Generative Models
BlockRank imposes blockwise sparse attention and leverages query-token attention signals for efficient in-context ranking
BlockRank imposes blockwise sparse attention and leverages query-token attention signals for efficient in-context ranking