Live eval

Verify-Bench

Job URL truth: can an agent confirm a listing is still active before you tailor a CV?


What we measure

Verify-Bench stress-tests long-horizon agent behavior on job-listing URLs. Given a posting link, the agent must determine whether the role is still open, closed, or ambiguous. Listings span multiple job boards and employer sites, and the checks play out over days, not seconds.
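To make the open / closed / ambiguous task concrete, here is a minimal Python sketch of how per-day verdicts could be scored against ground truth. Everything in it is an assumption for illustration: the `Verdict` enum, the `Check` record, the `accuracy` function and the example URL are hypothetical and are not Verify-Bench's actual harness or metric.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """Hypothetical three-way label mirroring open / closed / ambiguous."""
    OPEN = "open"
    CLOSED = "closed"
    AMBIGUOUS = "ambiguous"


@dataclass(frozen=True)
class Check:
    """One verification of one URL on one day (hypothetical record shape)."""
    url: str
    day: int
    verdict: Verdict


def accuracy(predicted: list[Check], truth: list[Check]) -> float:
    """Fraction of (url, day) pairs where the agent's verdict matches ground truth.

    A (url, day) pair the agent never checked counts as a miss.
    Returns 0.0 when there is no ground truth to score against.
    """
    gold = {(c.url, c.day): c.verdict for c in truth}
    if not gold:
        return 0.0
    got = {(c.url, c.day): c.verdict for c in predicted}
    hits = sum(1 for key, label in gold.items() if got.get(key) == label)
    return hits / len(gold)


# Usage: the listing closes between day 0 and day 3, but the agent
# repeats its day-0 answer, so only one of the two checks is correct.
truth = [
    Check("https://example.com/jobs/123", 0, Verdict.OPEN),
    Check("https://example.com/jobs/123", 3, Verdict.CLOSED),
]
agent = [
    Check("https://example.com/jobs/123", 0, Verdict.OPEN),
    Check("https://example.com/jobs/123", 3, Verdict.OPEN),  # stale answer
]
print(accuracy(agent, truth))  # 0.5
```

Scoring each (url, day) pair separately, rather than each URL once, is what makes the sketch long-horizon: a verdict that was right on the first day but never refreshed is penalized once the listing's status changes.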

Shipped in

This eval is the foundation of Lemu Work: every application workflow runs through URL verification before the user invests time in tailoring a CV.