REST: A Stress-Testing Framework for Evaluating Multi-Problem Reasoning in Large Reasoning Models
Large Reasoning Models (LRMs) have advanced rapidly, showing strong performance on complex problem-solving tasks in domains such as mathematics, coding, and scientific reasoning. However, current evaluation approaches primarily focus on single-question…
