The fundamental idea of server-side autonomous coding agents is to move from having a dedicated tool on your local PC to interacting with remote agent coworkers. Instead of operating in a micro-managed, pair-programming-like setting, autonomous coding agents work independently for longer periods. They make decisions as they go and come back with a pull request when they are done: just like a human colleague!
This could enable human contributors to scale their work considerably, moving from individual agent sessions to parallel agent sessions and thus enabling faster iteration cycles. At this point, it is no longer just about using coding agents; it is about scaling a human dev team by integrating multiple autonomous agent workers, and about controlling that scaling without compromising code quality or security.
This leaves human developers in a management position: they focus on supervisory work such as task definition, code review, and test-driven verification. It is only natural to leverage existing, well-established platforms and processes to fulfill these oversight duties:
Reference Architecture: Server-Side, Autonomous Coding Agents Produce Pull Requests Under Human Guidance
Orchestrating server-side autonomous coding agents through typical human communication channels such as chat or email, handing off tasks via project-management solutions like Jira or GitHub Projects, and reviewing finished work as pull requests on developer platforms has an added advantage: the same processes work for both human and AI colleagues.
This all sounds great, but what's the catch?
The ability of AI to complete long coding tasks is high and growing, even for tasks that would take a human several hours. At the same time, agents are potentially faster and more cost-efficient than humans, which makes assigning whole tickets to them both feasible and attractive for cutting costs and increasing development speed. However, coding agents still make mistakes and can produce low-quality code; Andrej Karpathy recently summarized the prototypical shortcomings that coding agents are prone to.
While these issues may be solved eventually, we are not yet living in a world where humans exclusively focus on product management, goal setting, and design. Instead, it is paramount that developers and architects review agent-generated PRs thoroughly and iterate on the code by requesting changes or taking over critical parts of a ticket themselves. This also means that development speed is currently capped by the team's review bandwidth rather than its implementation velocity.
While we cannot yet pipe feature requests into a fully autonomous agent system that cuts the product requirements document into tickets, implements them, delivers pull requests, and ultimately returns with a new product release, development teams are certainly in a position to start thinking in that direction: they can establish an AI-ready development process that leverages the strengths of autonomous coding agents while mitigating their current flaws.
These considerations may help anticipate and work with the principal strengths and weaknesses of autonomous coding agents in software projects, but how do coding agents fare in practice? How much more code can you expect to ship in the same amount of time today? What are the implications for code quality, and, should you decide to try it for yourself, how can you set up your own team of autonomous coding agents?
Recently, autonomous coding agents have gained momentum in the industry. Famous case studies were published by Stripe and Spotify, reporting that more than 1000 pull requests were merged each week, with code generated entirely by coding agents. In these reports, humans only supervised through reviews. Both setups share some similarities, such as automatic feedback loops based on linting and testing, while also having some differences, such as the underlying coding agent: Spotify relies mostly on Claude Code, while Stripe uses a custom agent harness.
Academic research has also taken up the topic. Agarwal et al. studied the introduction of autonomous coding agents into two types of repositories: (1) repositories with no prior coding-agent involvement and (2) repositories where IDE coding agents were already in use. For (1), they found velocity gains (+36.3% commits and +76.6% lines of code); for (2), essentially no gain (+3.1% commits and -6.3% lines). For both types, they reported front-loaded velocity gains and decreasing code quality. Based on these findings, the authors highlight the need for selective deployment and active oversight, and suggest future work on long-horizon post-adoption usage and on collaboration patterns that balance acceleration with code quality. Overall, reactions to autonomous coding agents are mixed: reports oscillate between enthusiasm and skepticism, citing favorable, neutral, and critical numbers.
We think autonomous coding agents are here to stay, as they have shown significant potential in practical applications, and we are confident that their role and autonomy in software projects will increase over time. However, research and experience clearly show that we still have much to learn about this technology and its integration into development cycles. Adopting autonomous coding agents will be a process, and adoption speeds will differ from product to product. Until the dust settles and stable, industry-wide learnings emerge, many teams should try out autonomous coding agents for themselves. After all, this is not a trend to overlook!
To get you started, you may want to check out the following tools and frameworks that enable you to collect hands-on experience in your projects:
Tools and Frameworks to Set Up and Orchestrate Autonomous Coding Agents
| Tool | Basic setup | Entry point | Key strengths | Main trade-offs |
|---|---|---|---|---|
| Claude Code GitHub Action | Custom GitHub Action + Claude Code | Every possible GitHub workflow trigger (mainly issue/PR creation + comments) | GitHub-native automation, powerful Claude Code coding agent, full Claude Code configuration possibilities | Limited to Claude Code coding agent, requires careful workflow setup and permission scoping, no full issue-tracker and orchestration layer |
| GitHub Copilot Cloud Agents | GitHub native (Action) + GitHub Copilot Cloud Agent | Issues, Agents Panel, Copilot Chat, GitHub CLI, IDEs, MCP, Jira, Slack, Teams, Linear, ... | Many different integrations and entry points, Support for multiple top-tier models (Google, OpenAI, Anthropic) | Less customizable than self-built workflows and integrations, no self-hosted option since the Agent is running in the cloud |
| n8n | Workflow automation + coding agent integrations | Everything that n8n supports natively or via community nodes | Extensive number of integrations, visual workflow builder | No native coding agent solution (only community integrations of coding agents, such as Claude Code into n8n) |
| OpenClaw | Autonomous agent + coding agent integrations | Everything that OpenClaw supports natively or via community plugins | Extensive number of integrations, high level of autonomy | No native coding agent solution, Limited guardrails and potential security problems (NemoClaw could solve some) |
Architecture: Extending Claude Code Actions to Support SAP AI Core through a LiteLLM Proxy
From Claude Code's perspective, it is talking to the standard Anthropic API. The proxy handles SAP AI Core authentication and request routing transparently. Feel free to have a look at how we implemented the necessary changes here.
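Conceptually, the proxy's job can be restated in a few lines of Python. The sketch below is illustrative only and not the actual LiteLLM implementation: the header names mirror the Anthropic API, while the token parameter stands in for the result of SAP AI Core's OAuth client-credentials flow, which is omitted here.

```python
def route_request(path: str, headers: dict, aicore_base_url: str,
                  oauth_token: str) -> tuple[str, dict]:
    """Rewrite an Anthropic-style request so it targets SAP AI Core.

    Illustrative sketch: Claude Code believes it is calling the standard
    Anthropic API; the proxy swaps in the SAP AI Core endpoint and replaces
    the Anthropic API key with an OAuth bearer token (token acquisition via
    AICORE_AUTH_URL and client credentials is not shown).
    """
    upstream_headers = dict(headers)
    upstream_headers.pop("x-api-key", None)  # Anthropic key is irrelevant upstream
    upstream_headers["Authorization"] = f"Bearer {oauth_token}"
    return f"{aicore_base_url}{path}", upstream_headers
```

From the agent's point of view nothing changes: it still posts to the usual Anthropic endpoints with its usual headers; only the proxy knows about SAP AI Core.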
Using this solution, connecting SAP AI Core to Claude Code Action is as easy as setting `use_litellm` to `"true"` in the workflow configuration. You don't need to do any additional setup for the proxy itself.
```yaml
name: Claude Code
on:
  issue_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review_comment:
    types: [created]
  pull_request_review:
    types: [submitted]

jobs:
  claude:
    runs-on: [self-hosted, solinas]
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - uses: <your-github-org>/claude-code-action@main
        with:
          trigger_phrase: "@claude"
          use_litellm: "true"
          litellm_model: "sap/anthropic--claude-4.6-sonnet"
          github_token: ${{ secrets.GITHUB_TOKEN }}
        env:
          AICORE_AUTH_URL: ${{ vars.AICORE_AUTH_URL }}
          AICORE_BASE_URL: ${{ vars.AICORE_BASE_URL }}
          AICORE_CLIENT_ID: ${{ vars.AICORE_CLIENT_ID }}
          AICORE_CLIENT_SECRET: ${{ secrets.AICORE_CLIENT_SECRET }}
```

First, we added a step that caches the Claude session directory between workflow runs. Place it before the Claude Code Action step, since `actions/cache` restores the cache when the step executes (and saves it in the post-job phase), so the previous session must be restored before the agent starts:

```yaml
      - name: Cache Claude Session
        id: cache_session
        continue-on-error: true
        uses: actions/cache@v5
        with:
          path: ~/.claude/projects
          key: claude-${{ github.event.pull_request.number && 'pr' || 'issue' }}-${{ github.event.pull_request.number || github.event.issue.number }}-${{ github.run_id }}-${{ github.run_attempt }}
          restore-keys: |
            claude-${{ github.event.pull_request.number && 'pr' || 'issue' }}-${{ github.event.pull_request.number || github.event.issue.number }}-
```
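The `key` and `restore-keys` expressions implement a simple scoping scheme, restated here in plain Python (a hypothetical helper, for illustration only):

```python
def cache_key(pr_number, issue_number, run_id, run_attempt):
    # Mirrors the workflow expression: scope the cache to 'pr' when a
    # pull-request number is present, otherwise to 'issue'; run_id and
    # run_attempt make each saved key unique.
    scope = "pr" if pr_number else "issue"
    return f"claude-{scope}-{pr_number or issue_number}-{run_id}-{run_attempt}"

def restore_prefix(pr_number, issue_number):
    # Mirrors restore-keys: without run_id/run_attempt, the prefix matches
    # the most recently saved cache for the same issue or PR, which is what
    # lets a later workflow run resume the previous session.
    scope = "pr" if pr_number else "issue"
    return f"claude-{scope}-{pr_number or issue_number}-"
```

Every saved key starts with the restore prefix for its issue or PR, so each new run falls back to the latest session of the same conversation.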
Second, we added `claude_args` to the `with` section of the Claude Code Action itself:
Code Snippet: Adapt Claude Code Action for Claude Session Caching
```yaml
          claude_args: |
            --continue
```
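With the session directory restored from the cache, `--continue` tells Claude Code to pick up the most recent conversation instead of starting fresh. Conceptually, that amounts to selecting the newest session file in the project's session directory; the following is a sketch of the idea under that assumption, not Claude Code's actual session handling:

```python
from pathlib import Path

def newest_session(session_dir: Path):
    # Sketch: assume one session file per conversation; "continue" means
    # resuming the file that was modified most recently, or starting a
    # fresh session (None) when nothing was restored from the cache.
    sessions = sorted(session_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return sessions[-1] if sessions else None
```

This is also why `continue-on-error: true` on the cache step is safe: on a cache miss the directory is simply empty and the agent starts a new session.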
The broad spectrum of reports, ranging from positive to rather discouraging effects of (autonomous) coding agents on development speed and quality, leaves plenty of room for further investigation and optimization. This calls for your own stories and experiences! Whether you start with a setup like the one described above or explore alternative frameworks, now is the perfect time to gain first experiences, establish learnings and best practices, and pave the way for new ways of developing software securely and autonomously. We will definitely continue exploring this space and are curious to hear your thoughts and experiences in the comments.
Reach out and tell us what works best for you!