Every third-party dependency in an agent deployment is a supply-chain risk, a cold-start penalty, and a potential version conflict. A new arXiv paper (2605.21405v1) by Peng Ding and Rick Stevens asks whether LLMs can eliminate that overhead by rewriting popular Python libraries using only the standard library. The answer determines whether agents can shrink deployment footprints and reduce cold-start overhead in Lambda and similar constrained environments.
The research introduces zerodep, a collection of 40+ single-file, stdlib-only reimplementations of common libraries. Each module is LLM-generated, API-compatible with its reference library, and validated for correctness. The goal is selective elimination where stdlib suffices.
Why Dependency Elimination Matters for Agents
Agent deployments face three dependency-related problems:
- Cold-start latency: Deployments on AWS Lambda and similar platforms pay a cold-start penalty that grows with package size. A typical agent with requests, pandas, and numpy can hit 50MB+ before you write a single tool.
- Supply-chain attacks: PyPI typosquatting, compromised maintainers, and transitive dependencies create attack surface. Agents that execute arbitrary tool calls in production environments need minimal trust boundaries.
- Environment constraints: Edge devices, air-gapped systems, and compliance-heavy deployments often restrict or audit external packages. AWS Lambda caps the unzipped deployment package, layers included, at 250MB. Stdlib-only code passes through these gates with less friction.
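To see where a deployment package's megabytes actually go, a short stdlib script can total the on-disk size of each top-level entry in a site-packages directory. This is a minimal sketch for illustration: the names `package_sizes` and `report` are hypothetical and not part of zerodep.

```python
from pathlib import Path


def package_sizes(site_packages: str) -> dict[str, int]:
    """Map each top-level entry in `site_packages` to its total size in bytes."""
    sizes = {}
    for entry in Path(site_packages).iterdir():
        if entry.is_dir():
            # Sum every regular file below the package directory.
            total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            total = entry.stat().st_size
        sizes[entry.name] = total
    return sizes


def report(site_packages: str, top: int = 10) -> None:
    """Print the largest entries, biggest first."""
    ranked = sorted(package_sizes(site_packages).items(),
                    key=lambda item: item[1], reverse=True)
    for name, size in ranked[:top]:
        print(f"{size / 1_000_000:8.2f} MB  {name}")
```

Calling `report(sysconfig.get_paths()["purelib"])` after `import sysconfig` lists the heaviest packages in the current environment, which is usually enough to spot the candidates worth replacing.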
The trade-off is correctness. Third-party libraries handle edge cases, platform quirks, and performance optimizations that stdlib code may miss. The question is how often that trade-off is worth it.
What the Paper Measured
The zerodep project reimplemented libraries across 12 categories, including:
- Serialization (JSON, YAML, TOML)
- Networking (HTTP clients, WebSocket)
- Cryptography (JWT, bcrypt, HMAC)
- Agent protocols (OpenAI API, Anthropic API)
- Text processing (Markdown, regex utilities)
Each reimplementation was benchmarked against the reference library for:
- Correctness: Does it pass the reference library’s test suite?
- Performance: How does runtime compare on representative workloads?
- Code size: How many lines of stdlib code replace how many lines of third-party code?
Results show that stdlib-only implementations achieve performance within 2x of the reference library in the majority of cases. The paper reports that many widely-used libraries carry architectural overhead that LLM-generated stdlib reimplementations avoid, yielding 5x to 115x speedups in specific scenarios. The exceptions are predictable: C-extension-backed operations like image processing, binary serialization, and low-level crypto fall off a cliff.
For example, a stdlib-only JWT implementation using hmac and json was 5x faster than PyJWT on signing operations because it skipped unnecessary abstraction layers. Conversely, any attempt to replace Pillow or cryptography with pure Python stdlib code results in 10-100x performance degradation.
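To make the JWT case concrete, here is a minimal sketch of HS256 signing and verification built on hmac, json, and base64. It is not the paper's implementation: the function names are illustrative, and it deliberately skips claim validation such as expiry checks.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Return a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        _b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(segments + [_b64url(signature)])


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature checks out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm so a forged header cannot downgrade it (e.g. to "none").
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```

The whole thing fits in one file with four stdlib imports, which illustrates why a thin implementation can skip layers that a general-purpose library carries for algorithms and options an agent may never use.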
Architecture: How LLMs Generate Stdlib-Only Code
Ding and Stevens describe a constrained code generation pipeline:
- Prompt construction: The LLM receives the reference library’s API surface, a sample of its test suite, and the constraint that only stdlib imports are allowed.
- Iterative refinement: The LLM generates a candidate implementation. A validation harness runs the reference library’s tests against the stdlib version. Failures are fed back to the LLM with error traces.
- Correctness gate: The implementation is accepted only if it passes a threshold of the reference test suite (typically 90%+ coverage).
- Performance profiling: Accepted implementations are benchmarked. If an implementation is more than 10x slower than the reference, the LLM is prompted to optimize.
This is not a one-shot generation task. The paper reports an average of 3-7 refinement iterations per module. The LLM’s job is not to write perfect code. Its job is to navigate the constraint space until the validation harness stops failing.
# Example: stdlib-only HTTP POST with JSON payload
import json
import urllib.error
import urllib.request


def post_json(url, data, headers=None, timeout=30):
    """Stand-in for requests.post(url, json=data).json().

    Uses only stdlib: urllib.request, json, no external dependencies.
    Agents detect failures via exception type or schema validation.
    """
    # Copy so the caller's dict is not mutated.
    headers = dict(headers or {})
    headers['Content-Type'] = 'application/json'
    req = urllib.request.Request(
        url,
        data=json.dumps(data).encode('utf-8'),
        headers=headers,
        method='POST'
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return json.loads(response.read().decode('utf-8'))
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError, so this catches both.
        raise RuntimeError(f"HTTP request failed: {e}") from e


# Usage in agent deployment
response = post_json("https://api.example.com/tool", {"action": "execute"})
This works for simple cases. It falls short when you need connection pooling, retry logic, or custom proxy handling. The stdlib provides urllib.request.OpenerDirector and its handler classes for some of this (proxies and authentication, for example), but the API is verbose and error-prone, and pooling and retries are not built in. The LLM’s task is to generate the glue code that bridges the gap.
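Retry logic is one of those gaps, and it is a good example of the glue code in question. The sketch below is one possible stdlib-only retry wrapper with exponential backoff; the name `with_retries`, its parameters, and the choice of which status codes count as retryable are assumptions for illustration.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient network errors with exponential backoff."""
    for attempt in range(attempts):
        last_attempt = attempt == attempts - 1
        try:
            return call()
        except urllib.error.HTTPError as exc:
            # HTTPError must be caught before URLError, its parent class.
            # Most 4xx responses will not improve on retry; 429 and 5xx might.
            retryable = exc.code == 429 or exc.code >= 500
            if not retryable or last_attempt:
                raise
        except (urllib.error.URLError, TimeoutError):
            if last_attempt:
                raise
        # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The wrapper is meant to sit around a raw `urllib.request.urlopen` call, for example `with_retries(lambda: urllib.request.urlopen(req, timeout=30).read())`, because a helper that converts every failure into a generic exception would hide the status code the retry decision depends on. Injecting `sleep` keeps the backoff testable without real waiting.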
Correctness Gaps and Runtime Detection
The paper’s systematic benchmarking reveals where stdlib-only implementations fail:
- Edge-case handling: LLMs miss rare input patterns (malformed UTF-8, timezone edge cases, protocol violations). Detection requires fuzz testing and differential validation against the reference library.
- Platform quirks: Stdlib behavior differs across operating systems (e.g., os.path on Windows vs. Unix, socket timeouts on BSD vs. Linux). Cross-platform CI with the reference test suite catches these.
- Security vulnerabilities: LLM-generated cryptographic code requires manual audit for timing attacks, weak random number generation, and improper padding. Static analysis tools like Bandit and Semgrep catch some issues, but expert review is necessary.
Agents that use stdlib-only code need runtime detection for these failures. The simplest approach is differential testing in production: run both implementations on a sample of requests, compare outputs, and alert on divergence. This catches correctness bugs before they propagate but adds latency and complexity.
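A sampled comparison of this kind might look like the sketch below. The function name `shadow_compare`, the sampling default, and the report format are hypothetical; the idea is only that the stdlib candidate always serves the request, while a fraction of calls are also run through the reference implementation and compared.

```python
import random
from typing import Any, Callable


def _outcome(fn: Callable[..., Any], args: tuple) -> tuple[str, Any]:
    """Run fn and capture either its return value or the exception it raised."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", exc)


def shadow_compare(
    candidate: Callable[..., Any],
    reference: Callable[..., Any],
    *args: Any,
    sample_rate: float = 0.01,
    report: Callable[[dict], None] = print,
    rng: Callable[[], float] = random.random,
) -> Any:
    """Serve the candidate's result; on sampled calls, diff it against the reference."""
    if rng() >= sample_rate:
        return candidate(*args)  # fast path: no comparison overhead
    cand = _outcome(candidate, args)
    ref = _outcome(reference, args)
    # Divergence: one raised and the other did not, or both returned different values.
    diverged = cand[0] != ref[0] or (cand[0] == "ok" and cand[1] != ref[1])
    if diverged:
        report({"args": args, "candidate": cand, "reference": ref})
    if cand[0] == "error":
        raise cand[1]
    return cand[1]
```

Note the cost this makes visible: the reference library has to stay installed wherever sampling runs, so this pattern suits a canary or staging tier better than the size-constrained function itself.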
Deployment Trade-Offs
Here is when stdlib-only code makes sense:
- Serverless agents with tight cold-start budgets: Eliminating common dependencies reduces package size and cold-start overhead. The paper reports runtime within 2x of the reference library in most cases, though specific savings depend on the dependency mix and runtime environment.
- Air-gapped or compliance-heavy environments: Stdlib-only code passes through security audits faster because the attack surface is smaller and the code is auditable.
- Single-purpose tools: If your agent only needs to parse JSON, sign JWTs, and make HTTP requests, stdlib covers it. You do not need the full feature set of requests or PyJWT.
Here is when it does not:
- C-extension-backed workloads: Image processing, binary serialization, and low-level crypto are 10-100x slower in pure Python. Avoid reimplementing Pillow or cryptography with stdlib.
- Complex protocols: WebSocket, HTTP/2, and gRPC require state machines and edge-case handling that stdlib does not provide. The LLM will generate buggy code.
- Mature library ecosystems: If you need pandas-level data manipulation or numpy-level numerical computing, stdlib is not a substitute. The performance and correctness gap is too wide.
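One way to decide which side of that line a dependency falls on is to measure what it costs at startup. The sketch below times a fresh interpreter importing a module and subtracts a baseline; the function names are hypothetical, and wall-clock numbers from a laptop are only a rough proxy for cold-start behavior on a serverless platform.

```python
import subprocess
import sys
import time


def import_seconds(module: str, runs: int = 3) -> float:
    """Best-of-`runs` wall time to start a fresh interpreter and import `module`."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


def import_overhead(module: str, runs: int = 3) -> float:
    """Approximate import cost with interpreter startup subtracted out."""
    # `sys` is always loaded already, so importing it measures bare startup.
    baseline = import_seconds("sys", runs)
    return max(0.0, import_seconds(module, runs) - baseline)
```

For a per-module breakdown rather than a single number, CPython's `-X importtime` flag prints the time spent in each import.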
Supply-Chain Risk vs. LLM-Generated Risk
Eliminating dependencies shrinks the supply-chain attack surface. You no longer trust PyPI maintainers, transitive dependencies, or package mirrors. But you introduce a new risk: the replacement is freshly generated code that no one has audited. Cryptographic code in particular needs manual review for timing attacks, weak random number generation, and improper padding. The paper’s validation harness caught functional bugs, but security vulnerabilities require static analysis and expert review.
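Two of these audit findings are easy to show side by side. The sketch below is an illustration, not code from zerodep: it contrasts a tag check that functional tests would accept with the constant-time version a reviewer would ask for, and generates keys with `secrets` rather than `random`.

```python
import hashlib
import hmac
import secrets


def new_key() -> bytes:
    """Generate key material from the OS CSPRNG; `random` is not suitable for this."""
    return secrets.token_bytes(32)


def sign(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over `message`."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_naive(message: bytes, key: bytes, tag: bytes) -> bool:
    # Functionally correct, so a test suite passes it. The == comparison can
    # stop at the first differing byte, which may leak timing information.
    return sign(message, key) == tag


def verify_constant_time(message: bytes, key: bytes, tag: bytes) -> bool:
    # hmac.compare_digest is designed to avoid that content-dependent timing.
    return hmac.compare_digest(sign(message, key), tag)
```

Both verifiers return the same answers on every input, which is exactly why a test-suite gate cannot tell them apart.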
The mitigation strategy is allowlisting: only use stdlib-only code for low-risk operations (HTTP clients, JSON parsing, text processing). For cryptography, authentication, and payment processing, use audited third-party libraries. The goal is zero unnecessary dependencies, not zero dependencies at all costs.
Technical Verdict
Use stdlib-only code when:
- Cold-start latency or package size is a hard constraint (serverless, edge deployments).
- You need to minimize supply-chain risk and can afford manual audit of LLM-generated code.
- Your agent’s tool calls are simple (HTTP requests, JSON parsing, file I/O) and do not require C-extension performance.
Avoid it when:
- You need C-extension performance (image processing, binary serialization, numerical computing).
- Your agent handles cryptography, authentication, or payment processing (use audited libraries).
- You lack the engineering capacity to validate LLM-generated code against the reference library’s test suite.
The real value is not replacing every dependency. The real value is knowing which dependencies are optional and which are essential. This paper gives you the data to make that call.
Source Links
- ArXiv Paper (2605.21405v1)
- Category: cs.AI