Niall OHiggins

Can AI Beat a Human Dev? Jules vs Codex on Live Django Issues with APX

APX is a Continuous Reliability platform built to proactively detect and resolve production issues that easily cost your team 30-40% of their engineering time. It seamlessly integrates with your existing tools, such as GitHub, Datadog, Sentry, and Postgres, providing fixes before problems lead to major production incidents.

In this test, we evaluated two newly released coderbots, OpenAI's Codex and Google's Jules, on real-world Django production issues identified by APX. How did they do, and which one has the edge?

What Makes APX Different?

Unlike traditional monitoring platforms, APX doesn't overwhelm you with data and noise that you have to manually sift through; it provides clear red/yellow/green signals and actionable solutions. APX integrates deeply with your existing workflow - with the “Copy to LLM” button, you can directly feed your APX-provided solution to an IDE like Cursor or a coderbot like Codex or Jules.

Scenario 1: Optimizing a Slow Django Endpoint

APX identified a slow query within a Django application endpoint. Typically, diagnosing such issues involves manual investigation into traces, database queries and potential code inefficiencies.

We tasked Codex and Jules with optimizing this Django endpoint, giving both the same APX recommendation as input:

  • Jules’ response (4 min):
    • Explicitly listed fields to limit database fetch overhead, though it unnecessarily included organization_id.
    • Included a helpful comment suggesting future optimizations, like caching and database maintenance.
  • Codex’s response (3 min):
    • Similar optimization with explicit fields, but notably cleaner by excluding redundant fields.
    • Provided a concise solution without additional commentary.

Conclusion: Codex delivered a cleaner solution in less time than Jules, so Codex is the winner on this task.

Scenario 2: Removing Unused Postgres Indexes

Another test involved removing unused indexes flagged by APX in Postgres, which were causing database performance issues.

  • Jules’ approach:
    • Proposed directly dropping indexes through Django migrations without adjusting the corresponding Django models, a significant oversight.
  • Codex’s approach:
    • Similarly overlooked adjusting ORM definitions, directly suggesting index removal in migrations only.

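Conclusion: Both AI models failed to grasp the full context of Django ORM and database best practices, highlighting their limitations compared to human engineers. The proposed changes are risky: when an index is dropped in a migration while the model still declares it, the model definition and the database schema drift apart, and a later makemigrations run can, for example, quietly re-create the index.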

Why Developer Oversight is Crucial

Both Codex and Jules demonstrated potential in addressing straightforward issues quickly, with Codex slightly outperforming Jules. However, neither matched the nuanced understanding of a human engineer with experience in Django framework best practices.

Leveraging APX for Continuous Reliability

APX enables teams to proactively manage reliability, performance, and cost optimization without the overhead of manual investigation. Developers maintain full control, ensuring solutions align precisely with their team's standards and practices.

Interested in experiencing APX? Sign up for a free trial to see how APX can enhance your observability and optimize your Django applications today.

StepChange

StepChange Labs © 2025·Privacy&Terms