Anthropic Reveals New Test to Counter AI Cheating

Anthropic has redesigned its take-home coding test after AI models matched top candidates, exposing a new challenge for hiring in the age of advanced assistants.

At a Glance

  • The company’s performance optimization team introduced a take-home test in 2024.
  • AI models Claude Opus 4 and Claude Opus 4.5 soon surpassed many human applicants.
  • A new version of the test was released on January 22, 2026, to outpace current AI.
  • The change aims to keep the hiring process fair without in-person proctoring.
  • Why it matters: It highlights the growing difficulty of assessing human skill when AI can imitate it.

The challenge of evaluating talent has become more complex as artificial intelligence grows smarter. In a blog post published on Wednesday, team lead Tristan Hume explained how Anthropic’s take-home test has evolved to stay ahead of the curve.

Background

Since 2024, the performance optimization team has required job applicants to complete a take-home coding test. The goal was to verify that candidates possess the necessary skills to work on Anthropic’s next-generation models.

The Challenge of AI Cheating

AI coding tools have improved rapidly, and the test needed frequent updates to remain effective. Hume wrote:

> “Each new Claude model has forced us to redesign the test.”

When Claude Opus 4 was released, it outperformed most human applicants within the same time limit. That was an early warning sign, even though the strongest candidates could still be distinguished from the model’s output at that point.

> “When given the same time limit, Claude Opus 4 outperformed most human applicants. That still allowed us to distinguish the strongest candidates – but then, Claude Opus 4.5 matched even those.”

The problem became acute because the test is take-home and unsupervised. Without proctoring, there is no way to guarantee that an applicant isn’t using an AI assistant.

> “Under the constraints of the take-home test, we no longer had a way to distinguish between the output of our top candidates and our most capable model,” Hume writes.

This issue mirrors broader concerns about AI cheating in academic settings, though as the developer of the very models in question, Anthropic is unusually well placed to respond.

Evolution of the Test

The test has gone through several iterations. Below is a concise timeline of the key changes:

| Date | Model | Test Status | Key Change |
|------|-------|-------------|------------|
| 2025 | Claude Opus 4 | Outperformed most human applicants | Redesign needed |
| 2025 | Claude Opus 4.5 | Matched top human candidates | Further redesign |
| January 22, 2026 | – | New test released | AI-resistant design |

Key Adjustments

  • Shifted focus from hardware optimization to novel problem sets.
  • Reduced reliance on patterns that AI models can easily replicate.
  • Increased emphasis on creative problem-solving and human intuition.
  • Added time constraints that limit the depth of AI-generated solutions.
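
The article does not publish the contents of the test itself, so purely as illustration, here is a minimal sketch of the kind of exercise a performance-optimization take-home might resemble: a slow baseline function, one candidate optimization, and a small timing harness. The task (pairwise squared distances), the function names, and the input sizes are hypothetical stand-ins, not material from Anthropic’s actual assessment.

```python
# Illustrative only: a toy harness in the spirit of a performance-optimization
# take-home. The real Anthropic test is not public; the task below
# (pairwise squared distances) is a hypothetical stand-in.
import time

import numpy as np


def naive_pairwise_sq_dists(points):
    """Baseline: O(n^2) Python loops, the kind of code a candidate might be asked to speed up."""
    n = len(points)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return out


def vectorized_pairwise_sq_dists(points):
    """One possible optimization: push the inner loops into NumPy broadcasting."""
    p = np.asarray(points)                  # shape (n, d)
    diff = p[:, None, :] - p[None, :, :]    # shape (n, n, d)
    return (diff ** 2).sum(axis=-1)         # shape (n, n)


def benchmark(fn, points, repeats=3):
    """Return the best wall-clock time over a few runs of fn(points)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(points)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((300, 16)).tolist()
    print(f"naive:      {benchmark(naive_pairwise_sq_dists, pts):.3f} s")
    print(f"vectorized: {benchmark(vectorized_pairwise_sq_dists, pts):.3f} s")
```

A harness like this keeps the comparison honest by timing both versions on the same input and reporting the best of several runs, which is the basic shape of any throughput-focused coding exercise.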

Impact on Hiring

The inability to distinguish human output from AI output threatens the integrity of Anthropic’s hiring. An applicant who quietly relies on an AI assistant could outscore candidates who completed the test unaided, skewing the selection process.

Anthropic’s approach to this dilemma demonstrates a proactive stance. By continuously updating the test, the company aims to preserve the meritocratic nature of its hiring pipeline.

Anthropic’s New Solution

Hume’s latest iteration was designed to be sufficiently novel that contemporary AI tools struggle to solve it. The test now includes:

  • Unpredictable constraints that require on-the-fly adaptation.
  • Domain-specific puzzles that demand deep knowledge of machine-learning theory.
  • Human-centric reasoning tasks that test logical flow rather than code syntax.

Hume invited the community to challenge the new test:

> “If you can best Opus 4.5,” the post reads, “we’d love to hear from you.”

The invitation encourages outside developers to probe the limits of the new assessment and to help Anthropic refine it further.

Call to Action

Anthropic’s updated test is publicly available. Candidates and researchers are encouraged to attempt it and share their solutions. The company welcomes constructive feedback to ensure the assessment remains robust.

Key Takeaways

  • Anthropic introduced a take-home coding test in 2024 to evaluate talent.
  • AI models Claude Opus 4 and Claude Opus 4.5 soon matched or surpassed top human candidates.
  • The unsupervised nature of the test made it vulnerable to AI cheating.
  • A new version of the test was released on January 22, 2026, with AI-resistant features.
  • The company seeks external input to continuously improve the assessment.

The story underscores the evolving battle between AI capabilities and human evaluation methods, reminding organizations that hiring processes must adapt alongside technological progress.

Author

Michael A. Turner is a Philadelphia-based journalist with a deep-rooted passion for local reporting, government accountability, and community storytelling. He covers Philadelphia city government for Newsofphiladelphia.com, turning budgets, council votes, and municipal documents into clear stories about how decisions affect neighborhoods. A Temple journalism grad, he is known for data-driven reporting that holds city hall accountable.
