The fundamental idea of server-side autonomous coding agents is to move from having a dedicated tool on your local PC to interacting with remote agent coworkers. Instead of operating in a micro-managed, pair-programming-like setting, autonomous coding agents work independently for longer periods. They make decisions as they go and come back with a pull request when they are done: just like a human colleague!
This could enable human contributors to scale their work considerably, moving from individual agent sessions to parallel agent sessions and thus enabling faster iteration cycles. At this point, it is no longer just about using coding agents; it is about scaling a human dev team by integrating multiple autonomous agent workers, and about controlling that scaling without compromising code quality or security.
This leaves human developers in a management position: they focus on supervisory work such as task definition, code review, and test-driven verification. It is only natural to leverage existing, well-established platforms and processes to fulfill these oversight duties:
Reference Architecture: Server-Side, Autonomous Coding Agents Produce Pull Requests Under Human Guidance
Orchestrating server-side autonomous coding agents through typical human communication channels such as chat or email, handing off tasks via project-management solutions like Jira or GitHub Projects, and reviewing finished work as pull requests on developer platforms has an added advantage: the same processes work for both human and AI colleagues.
This all sounds great, but what's the catch?
The ability of AI to complete long coding tasks is high and growing, even for tasks that would take a human several hours. At the same time, agents are potentially faster and more cost-efficient than humans, which makes assigning whole tickets to them both feasible and attractive for cutting costs and increasing development speed. However, coding agents still make mistakes and can produce low-quality code; Andrej Karpathy recently summarized the prototypical shortcomings that coding agents are prone to.
While these issues may be solved eventually, we are not yet living in a world where humans exclusively focus on product management, goal setting, and design. Instead, it is paramount that developers and architects review agent-generated PRs thoroughly and iterate on the code by requesting changes or taking over critical parts of a ticket themselves. This also means that development speed is currently capped by the team's review bandwidth rather than its implementation velocity.
While we cannot yet pipe feature requests into a fully autonomous agent system that cuts the product requirements document into tickets, implements them, delivers pull requests, and ultimately returns with a new product release, development teams are certainly in a position to start thinking in that direction: they can establish an AI-ready development process that leverages the strengths of autonomous coding agents while mitigating their current flaws.
These considerations may help anticipate and work with the principal strengths and weaknesses of autonomous coding agents in software projects, but how do coding agents fare in practice? How much more code can you expect to ship in the same amount of time today? What are the implications for code quality, and, should you decide to try it for yourself, how can you set up your own team of autonomous coding agents?
Recently, autonomous coding agents have gained momentum in the industry. Famous case studies were published by Stripe and Spotify, reporting that more than 1000 pull requests were merged each week, with code generated entirely by coding agents. In these reports, humans only supervised through reviews. Both setups share some similarities, such as automatic feedback loops based on linting and testing, while also having some differences, such as the underlying coding agent: Spotify relies mostly on Claude Code, while Stripe uses a custom agent harness.
Academic research has also taken up the topic. Agarwal et al. studied the introduction of autonomous coding agents into two types of repositories: (1) repositories with no prior coding-agent involvement and (2) repositories where IDE coding agents were already in use. For (1), they found velocity gains (+36.3% commits and +76.6% lines of code); for (2), essentially no gain (+3.1% commits and -6.3% lines). For both types, they reported front-loaded velocity gains and decreasing code quality. Based on these findings, the authors highlight the need for selective deployment and active oversight, and suggest future work on long-horizon post-adoption usage and on collaboration patterns that balance acceleration with code quality. Overall, reactions to autonomous coding agents are mixed: reports oscillate between enthusiasm and skepticism, citing favorable, neutral, and critical numbers.
We think autonomous coding agents are here to stay, as they have shown significant potential in practical applications, and we are confident that their role and autonomy in software projects will increase over time. However, research and experience clearly show that we still have much to learn about this technology and its integration into development cycles. Adopting autonomous coding agents will be a process, and adoption speeds will differ from product to product. Until the dust settles and stable, industry-wide learnings emerge, many teams should try out autonomous coding agents for themselves. After all, this is not a trend to overlook!
To get you started, you may want to check out the following tools and frameworks that enable you to collect hands-on experience in your projects:
Tools and Frameworks to Set Up and Orchestrate Autonomous Coding Agents
| Tool | Basic setup | Entry point | Key strengths | Main trade-offs |
|---|---|---|---|---|
| Claude Code GitHub Action | Custom GitHub Action + Claude Code | Every possible GitHub workflow trigger (mainly issue/PR creation + comments) | GitHub-native automation, powerful Claude Code coding agent, full Claude Code configuration possibilities | Limited to Claude Code coding agent, requires careful workflow setup and permission scoping, no full issue-tracker and orchestration layer |
| GitHub Copilot Cloud Agents | GitHub native (Action) + GitHub Copilot Cloud Agent | Issues, Agents Panel, Copilot Chat, GitHub CLI, IDEs, MCP, Jira, Slack, Teams, Linear, ... | Many different integrations and entry points, Support for multiple top-tier models (Google, OpenAI, Anthropic) | Less customizable than self-built workflows and integrations, no self-hosted option since the Agent is running in the cloud |
| n8n | Workflow automation + coding agent integrations | Everything that n8n supports natively or via community nodes | Extensive number of integrations, visual workflow builder | No native coding agent solution (only community integrations of coding agents, such as Claude Code into n8n) |
| OpenClaw | Autonomous agent + coding agent integrations | Everything that OpenClaw supports natively or via community plugins | Extensive number of integrations, high level of autonomy | No native coding agent solution, Limited guardrails and potential security problems (NemoClaw could solve some) |
Architecture: Extending Claude Code Actions to Support SAP AI Core through a LiteLLM Proxy
From Claude Code's perspective, it is talking to the standard Anthropic API. The proxy handles SAP AI Core authentication and request routing transparently. Feel free to have a look at how we implemented the necessary changes here.
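Conceptually, the proxy's job can be restated in a few lines of Python. The sketch below is illustrative only and not the actual LiteLLM implementation: the header names mirror the Anthropic API, while the token parameter stands in for the result of SAP AI Core's OAuth client-credentials flow, which is omitted here.

```python
def route_request(path: str, headers: dict, aicore_base_url: str,
                  oauth_token: str) -> tuple[str, dict]:
    """Rewrite an Anthropic-style request so it targets SAP AI Core.

    Illustrative sketch: Claude Code believes it is calling the standard
    Anthropic API; the proxy swaps in the SAP AI Core endpoint and replaces
    the Anthropic API key with an OAuth bearer token (token acquisition via
    AICORE_AUTH_URL and client credentials is not shown).
    """
    upstream_headers = dict(headers)
    upstream_headers.pop("x-api-key", None)  # Anthropic key is irrelevant upstream
    upstream_headers["Authorization"] = f"Bearer {oauth_token}"
    return f"{aicore_base_url}{path}", upstream_headers
```

From the agent's point of view nothing changes: it still posts to the usual Anthropic endpoints with its usual headers; only the proxy knows about SAP AI Core.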
Using this solution, connecting SAP AI Core to Claude Code Action is as easy as setting `use_litellm` to `"true"` in the workflow configuration. You don't need to do any additional setup for the proxy itself.
```yaml
name: Claude Code
on:
  issue_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review_comment:
    types: [created]
  pull_request_review:
    types: [submitted]

jobs:
  claude:
    runs-on: [self-hosted, solinas]
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - uses: <your-github-org>/claude-code-action@main
        with:
          trigger_phrase: "@claude"
          use_litellm: "true"
          litellm_model: "sap/anthropic--claude-4.6-sonnet"
          github_token: ${{ secrets.GITHUB_TOKEN }}
        env:
          AICORE_AUTH_URL: ${{ vars.AICORE_AUTH_URL }}
          AICORE_BASE_URL: ${{ vars.AICORE_BASE_URL }}
          AICORE_CLIENT_ID: ${{ vars.AICORE_CLIENT_ID }}
          AICORE_CLIENT_SECRET: ${{ secrets.AICORE_CLIENT_SECRET }}
```

First, we added a step that caches the Claude session directory between workflow runs. Place it before the Claude Code Action step, since `actions/cache` restores the cache when the step executes (and saves it in the post-job phase), so the previous session must be restored before the agent starts:

```yaml
      - name: Cache Claude Session
        id: cache_session
        continue-on-error: true
        uses: actions/cache@v5
        with:
          path: ~/.claude/projects
          key: claude-${{ github.event.pull_request.number && 'pr' || 'issue' }}-${{ github.event.pull_request.number || github.event.issue.number }}-${{ github.run_id }}-${{ github.run_attempt }}
          restore-keys: |
            claude-${{ github.event.pull_request.number && 'pr' || 'issue' }}-${{ github.event.pull_request.number || github.event.issue.number }}-
```
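The `key` and `restore-keys` expressions implement a simple scoping scheme, restated here in plain Python (a hypothetical helper, for illustration only):

```python
def cache_key(pr_number, issue_number, run_id, run_attempt):
    # Mirrors the workflow expression: scope the cache to 'pr' when a
    # pull-request number is present, otherwise to 'issue'; run_id and
    # run_attempt make each saved key unique.
    scope = "pr" if pr_number else "issue"
    return f"claude-{scope}-{pr_number or issue_number}-{run_id}-{run_attempt}"

def restore_prefix(pr_number, issue_number):
    # Mirrors restore-keys: without run_id/run_attempt, the prefix matches
    # the most recently saved cache for the same issue or PR, which is what
    # lets a later workflow run resume the previous session.
    scope = "pr" if pr_number else "issue"
    return f"claude-{scope}-{pr_number or issue_number}-"
```

Every saved key starts with the restore prefix for its issue or PR, so each new run falls back to the latest session of the same conversation.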
Second, we added `claude_args` to the `with` section of the Claude Code Action itself:
Code Snippet: Adapt Claude Code Action for Claude Session Caching
```yaml
          claude_args: |
            --continue
```
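With the session directory restored from the cache, `--continue` tells Claude Code to pick up the most recent conversation instead of starting fresh. Conceptually, that amounts to selecting the newest session file in the project's session directory; the following is a sketch of the idea under that assumption, not Claude Code's actual session handling:

```python
from pathlib import Path

def newest_session(session_dir: Path):
    # Sketch: assume one session file per conversation; "continue" means
    # resuming the file that was modified most recently, or starting a
    # fresh session (None) when nothing was restored from the cache.
    sessions = sorted(session_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return sessions[-1] if sessions else None
```

This is also why `continue-on-error: true` on the cache step is safe: on a cache miss the directory is simply empty and the agent starts a new session.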
The broad spectrum of reports, ranging from positive to rather discouraging effects of (autonomous) coding agents on development speed and quality, leaves plenty of room for further investigation and optimization. This calls for your own stories and experiences! Whether you start with a setup like the one described above or explore alternative frameworks, now is the perfect time to gain first experiences, establish learnings and best practices, and pave the way for new ways of developing software securely and autonomously. We will definitely continue exploring this space and are curious to hear your thoughts and experiences in the comments.
Reach out and tell us what works best for you!