Slashy is a YC S25 general-purpose agent that connects to 15 SaaS apps (Gmail, Slack, Notion, Linear, HubSpot, Airtable, and others) and executes multi-step workflows across them. The team rejected the Model Context Protocol (MCP) abstraction layer and built custom tools in-house. They use a single-agent architecture instead of multi-agent orchestration, claiming it reduces hallucinations. The demo shows workflows like “research this company, create a Google Doc, add contacts to CRM, schedule follow-ups, and send personalized emails” in one task.
This is automation infrastructure, not financial automation. The source was discovered via a financial automation query, but the architecture applies to any cross-app workflow: sales ops, HR onboarding, research synthesis, or customer support. The plumbing (tool registration, semantic search, memory, credential scoping) is domain-agnostic. If you’re building agents that need to read from three apps and write to two in one task, this is the relevant architecture.
This is not a wrapper around ChatGPT with OAuth buttons. Here’s how each layer works.
Tool Registration and Direct API Calls
Slashy avoids MCP because, in the team’s view, most community-built connectors are low quality and the abstraction adds overhead. Instead, they build tools directly against each app’s API.
Tool design principles:
- Each tool exposes a single action (send email, create calendar event, search Slack messages, update Notion page)
- Tools normalize output to Markdown instead of preserving native formats (Slack blocks become Markdown, Notion rich text becomes Markdown)
- The LLM receives a tool schema with parameters, descriptions, and examples
- Tool calls are synchronous: the agent waits for the API response before proceeding
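A tool definition that follows these principles might look like the sketch below. The shape mirrors the JSON Schema style used by common function-calling APIs; the tool name `gmail_send_email`, its parameters, and the `validate_call` helper are hypothetical and are not Slashy's actual schema.

```python
# Hypothetical single-action tool definition (an assumption, not Slashy's schema).
SEND_EMAIL_TOOL = {
    "name": "gmail_send_email",
    "description": (
        "Send one email from the user's Gmail account. "
        "Example: to='jane@example.com', subject='Follow-up', body='...'"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient email address"},
            "subject": {"type": "string", "description": "Subject line"},
            "body": {"type": "string", "description": "Message body in Markdown"},
        },
        "required": ["to", "subject", "body"],
    },
}


def validate_call(tool: dict, arguments: dict) -> list:
    """Return a list of problems with the arguments; an empty list means the call is acceptable."""
    schema = tool["parameters"]
    problems = [
        f"missing required parameter: {name}"
        for name in schema["required"]
        if name not in arguments
    ]
    problems += [
        f"unknown parameter: {name}"
        for name in arguments
        if name not in schema["properties"]
    ]
    return problems
```

Checking arguments before the API call gives the model a readable error to correct, instead of an opaque failure from the downstream app.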
Why Markdown normalization matters:
When an agent reads a Slack message and writes it to a Notion page, format translation becomes a failure point. Slack uses a block-based JSON structure. Notion uses a proprietary rich text format. If you preserve native formats, the agent must learn 15 different serialization schemes. Markdown is a lossy but universal intermediate representation. You lose some formatting (nested lists, custom colors) but gain reliability.
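As an illustration of the lossy conversion, the sketch below flattens a Notion-style rich text array into Markdown. It assumes each span carries `plain_text`, `annotations`, and `href` fields as in Notion's public API; the function name `rich_text_to_markdown` is hypothetical. Color is deliberately dropped, which is the kind of loss the paragraph above describes.

```python
def rich_text_to_markdown(rich_text: list) -> str:
    """Flatten Notion-style rich text spans into one Markdown string (color is dropped)."""
    parts = []
    for span in rich_text:
        text = span.get("plain_text", "")
        ann = span.get("annotations", {})
        if ann.get("code"):
            text = f"`{text}`"
        if ann.get("bold"):
            text = f"**{text}**"
        if ann.get("italic"):
            text = f"*{text}*"
        if ann.get("strikethrough"):
            text = f"~~{text}~~"
        if span.get("href"):
            text = f"[{text}]({span['href']})"
        parts.append(text)
    return "".join(parts)


# Sample input: the red color on the second span has no Markdown equivalent.
SAMPLE = [
    {"plain_text": "Ship the ", "annotations": {}, "href": None},
    {"plain_text": "Q3 roadmap", "annotations": {"bold": True, "color": "red"}, "href": None},
    {"plain_text": ", see ", "annotations": {}, "href": None},
    {"plain_text": "doc", "annotations": {}, "href": "https://example.com/doc"},
]
markdown = rich_text_to_markdown(SAMPLE)
```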
Trade-off table:
| Approach | Pros | Cons |
|---|---|---|
| MCP abstraction | Community connectors, faster initial integration | Low quality, extra latency, format mismatches |
| Direct API + Markdown | Consistent output, full control, lower latency | Must maintain 15 API clients, loses rich formatting |
| Native format preservation | No data loss, perfect fidelity | Agent must learn 15 schemas, high hallucination risk |
The team chose row two. They accept format loss in exchange for predictable tool behavior.
Cross-App Semantic Search
Slashy indexes content from all connected apps into a single semantic search layer. When you ask “find all emails about the Q3 roadmap and related Slack threads,” the agent queries one index, not 15 separate APIs.
Index structure (inferred from behavior):
- Each document chunk gets an embedding (likely from a model such as OpenAI text-embedding-3-small or text-embedding-3-large)
- Metadata includes source app, timestamp, author, and document ID
- Embeddings are stored in a vector database (Pinecone, Weaviate, or Postgres with pgvector)
- Search returns top-k chunks with metadata, then the agent uses tool calls to fetch full documents
Why this matters:
Without unified search, the agent would need to:
- Call Gmail API search
- Call Slack API search
- Call Notion API search
- Merge results in the LLM context window
- Re-rank by relevance
That’s three API calls with different query syntaxes and result formats, plus a merge step and a re-rank step inside the context window. Unified search collapses it to one vector query plus targeted tool calls for full content.
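A minimal in-memory sketch of the single-query idea follows. It assumes toy two-dimensional embeddings and a brute-force cosine ranking; a production system would delegate this to a vector database. The names `IndexedChunk` and `unified_search` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class IndexedChunk:
    chunk_id: str  # e.g. "slack:t7:0"
    app: str
    user_id: str
    text: str
    embedding: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unified_search(index, query_embedding, user_id, k=3):
    """Rank one user's chunks from every app against a single query vector."""
    candidates = [c for c in index if c.user_id == user_id]
    candidates.sort(
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    return candidates[:k]


# Toy embeddings; real ones would come from an embedding model.
INDEX = [
    IndexedChunk("gmail:m1:0", "gmail", "u1", "Q3 roadmap draft attached", [0.9, 0.1]),
    IndexedChunk("slack:t7:0", "slack", "u1", "roadmap thread for Q3", [0.8, 0.2]),
    IndexedChunk("notion:p4:0", "notion", "u1", "lunch menu", [0.1, 0.9]),
    IndexedChunk("gmail:m9:0", "gmail", "u2", "Q3 roadmap (other user)", [0.9, 0.1]),
]
hits = unified_search(INDEX, [1.0, 0.0], user_id="u1", k=2)
```

The Gmail and Slack chunks for user `u1` come back from one query, and the other user's chunk is excluded by the `user_id` filter before ranking.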
Indexing pipeline (likely shape):
```python
# Python 3.7+ with asyncio
# Pseudocode for cross-app indexing. get_connected_apps, get_api_client,
# chunk_document, embed, and vector_db are placeholders, not real library calls.
async def index_user_data(user_id: str):
    apps = get_connected_apps(user_id)
    for app in apps:
        client = get_api_client(app, user_id)
        # Fetch recent documents
        docs = await client.fetch_recent(limit=1000)
        for doc in docs:
            # Chunk long documents
            chunks = chunk_document(doc.content, max_tokens=512)
            for chunk in chunks:
                embedding = await embed(chunk.text)
                await vector_db.upsert({
                    "id": f"{app}:{doc.id}:{chunk.index}",
                    "embedding": embedding,
                    "metadata": {
                        "app": app,
                        "doc_id": doc.id,
                        "user_id": user_id,
                        "timestamp": doc.created_at,
                        "author": doc.author,
                    },
                    "text": chunk.text,
                })
```
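The pipeline leaves `chunk_document` undefined. One simple way to fill it in is sketched below, under the assumption that whitespace-separated words are an acceptable stand-in for model tokens; a real implementation would count tokens with the embedding model's tokenizer. The `Chunk` dataclass is hypothetical and only carries the `index` and `text` fields the pipeline reads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    index: int
    text: str


def chunk_document(content: str, max_tokens: int = 512) -> List[Chunk]:
    """Split content into consecutive chunks of at most max_tokens words.

    Assumption: whitespace-separated words approximate model tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = content.split()
    return [
        Chunk(index=start // max_tokens, text=" ".join(words[start:start + max_tokens]))
        for start in range(0, len(words), max_tokens)
    ]
```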
Failure modes:
- Stale index: if a user edits a Notion page, the embedding is outdated until the next sync
- Permission drift: if a user loses access to a Slack channel, the index still contains those messages
- Embedding model mismatch: if you switch from OpenAI to Cohere embeddings, you must re-index everything
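Stale vectors and model mismatches can be detected with metadata stored next to each vector, and permission drift can be contained with a query-time filter. The sketch below is one possible mitigation, not something Slashy has described; the field names `content_hash`, `embedding_model`, and `source`, and the model identifier, are all assumptions.

```python
import hashlib

EMBEDDING_MODEL = "example-embedding-v1"  # hypothetical model identifier


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(stored_meta, current_text, model=EMBEDDING_MODEL):
    """True if the stored vector is missing, stale, or produced by another model."""
    if stored_meta is None:
        return True
    return (
        stored_meta["content_hash"] != content_hash(current_text)
        or stored_meta["embedding_model"] != model
    )


def drop_revoked(hits, accessible_sources):
    """Keep only search hits whose source the user can still access."""
    return [h for h in hits if h["source"] in accessible_sources]
```

The hash check avoids re-embedding unchanged documents on every sync, and the filter keeps revoked content out of results even before the index is cleaned up.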
Personalized Memory
Slashy has not published memory implementation details. The following analysis is based on observed behavior in the demo video and may not reflect actual design.
Slashy maintains per-user memory so the agent remembers context across sessions. If you tell it “I prefer emails in bullet points,” it should apply that preference in future workflows.
Memory architecture:
The architecture likely uses one of two patterns:
- Vector store per user: Each user gets a namespace in the vector DB. Preferences, past actions, and entity relationships are embedded and retrieved during task planning.
- Structured memory graph: A graph database (Neo4j or similar) stores entities (people, companies, projects) and relationships (works at, owns, mentioned in). The agent queries the graph for relevant context before executing tools.
The observed behavior suggests a hybrid approach (vector store for unstructured preferences, graph for entities), though this is educated inference rather than confirmed design.
Memory retrieval flow:
- User sends a request: “Send a follow-up email to the CEO of Acme Corp”
- Agent queries memory: “What do I know about Acme Corp and this user’s email preferences?”
- Memory returns: “Acme Corp: Series B, 50 employees, CEO is Jane Doe. User prefers short emails with action items.”
- Agent constructs email using that context
- After sending, agent writes to memory: “Sent follow-up to Jane Doe on 2026-05-19”
Memory write strategy:
- Append-only log: every action and user correction is logged
- Periodic summarization: a background job condenses logs into structured facts
- Conflict resolution: if the user says “I prefer long emails” after previously saying “short emails,” the newer preference wins
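The write strategy above can be sketched as an append-only log with a summarization pass in which the newest preference wins. This is an illustration under assumptions, since Slashy has not published its memory design; `MemoryEvent`, `MemoryLog`, and the `email_length` key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryEvent:
    timestamp: float  # seconds since epoch
    kind: str         # "preference" or "action"
    key: str
    value: str


@dataclass
class MemoryLog:
    events: List[MemoryEvent] = field(default_factory=list)

    def append(self, event: MemoryEvent) -> None:
        # Append-only: earlier entries are never edited or removed.
        self.events.append(event)

    def summarize_preferences(self) -> Dict[str, str]:
        """Condense the log into current preferences; the newest entry per key wins."""
        latest: Dict[str, MemoryEvent] = {}
        for event in self.events:
            if event.kind != "preference":
                continue
            if event.key not in latest or event.timestamp >= latest[event.key].timestamp:
                latest[event.key] = event
        return {key: event.value for key, event in latest.items()}
```

Because the log keeps every entry, a summarization bug can be fixed and the facts regenerated without losing the user's history.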
Orchestration: Single Agent vs. Multi-Agent
Slashy uses a single-agent architecture. One LLM instance plans the workflow, calls tools, and synthesizes results. The alternative is multi-agent: a planner agent delegates to specialist agents (email agent, calendar agent, research agent).
Why single agent:
- Fewer handoff points means fewer places for context to degrade
- No inter-agent communication protocol to debug
- Simpler state management (one context window, one tool call history)
When single agent breaks down:
- Long workflows exceed context window (even with 128k tokens)
- Parallel execution is hard (agent must call tools sequentially)
- Specialist knowledge is diluted (a general agent is worse at SQL than a dedicated SQL agent)
The team claims single agent reduces hallucinations. The mechanism: multi-agent systems introduce ambiguity in task boundaries. If the planner agent says “research this company,” the research agent might interpret that differently than intended. Single agent eliminates that translation layer.
Orchestration pseudocode:
```python
# Python 3.7+ with asyncio
# llm, available_tools, and the helper functions are placeholders.
async def execute_task(user_request: str, user_id: str):
    # Retrieve memory and search context
    memory = await get_user_memory(user_id)
    search_results = await semantic_search(user_request, user_id)

    # Build initial prompt
    system_prompt = build_system_prompt(memory, search_results)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
    ]

    # Iterative tool-calling loop, capped to prevent runaway workflows
    max_iterations = 20
    for _ in range(max_iterations):
        response = await llm.chat(messages, tools=available_tools)
        if response.finish_reason == "stop":
            return response.content
        if response.finish_reason != "tool_calls":
            # e.g. truncated output; fail instead of looping on it
            raise RuntimeError(f"Unexpected finish reason: {response.finish_reason}")

        # Record the assistant turn that requested the tools, so each
        # tool result below has a matching tool_call_id in the history.
        messages.append({
            "role": "assistant",
            "content": response.content,
            "tool_calls": response.tool_calls,
        })
        for tool_call in response.tool_calls:
            result = await execute_tool(tool_call, user_id)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    raise RuntimeError(f"Workflow exceeded {max_iterations} iterations")
```
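The loop delegates to `execute_tool` without showing it. A minimal registry-based dispatcher might look like the sketch below. It is simplified to take a tool name and a JSON argument string rather than a `tool_call` object, and the `register_tool` decorator and the `create_calendar_event` tool are hypothetical.

```python
import asyncio
import json

TOOL_REGISTRY = {}


def register_tool(name):
    """Decorator that registers an async function under a tool name."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("create_calendar_event")
async def create_calendar_event(user_id: str, title: str, start: str) -> str:
    # Placeholder for a real calendar API call; returns Markdown text.
    return f"Created event **{title}** at {start}"


async def execute_tool(name: str, arguments_json: str, user_id: str) -> str:
    """Dispatch one tool call; errors come back as text so the model can recover."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"
    try:
        arguments = json.loads(arguments_json)
        return await fn(user_id=user_id, **arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        return f"Error: bad arguments for '{name}': {exc}"
```

Note that `user_id` is supplied by the executor, not by the model's arguments, which keeps identity out of the LLM's control.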
Credential Scoping and Token Management
Each user connects up to 15 apps via OAuth. The agent needs valid tokens to call APIs on their behalf.
Credential storage:
- Tokens are encrypted at rest (likely AES-256)
- Each token is scoped to a user and an app
- Refresh tokens are stored separately from access tokens
- Tokens are never logged or passed to the LLM
Token expiration handling:
- Tool executor attempts the API call with the access token (the LLM never sees it)
- API returns 401 Unauthorized
- Executor retrieves the refresh token
- Executor calls the OAuth refresh endpoint
- Executor stores the new access token
- Executor retries the original API call
Failure mode:
If the refresh token is also expired (user revoked access or token TTL exceeded), the agent cannot proceed. The user must re-authenticate. The agent should detect this early and prompt for re-auth before starting a multi-step workflow.
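The refresh flow and its failure mode can be sketched as a single retry wrapper. This is an illustration, not Slashy's code: `Unauthorized` stands in for an HTTP 401, and `api_call` and `refresh` are caller-supplied callables that a real system would back with HTTP requests.

```python
class ReauthRequired(Exception):
    """The refresh token is no longer valid; the user must reconnect the app."""


class Unauthorized(Exception):
    """Stands in for an HTTP 401 response from the app's API."""


def call_with_refresh(api_call, tokens: dict, refresh):
    """Run api_call(access_token); on a 401, refresh once and retry.

    refresh(refresh_token) must return a new access token or raise ReauthRequired.
    """
    try:
        return api_call(tokens["access_token"])
    except Unauthorized:
        # Store the new access token, then retry the original call exactly once.
        tokens["access_token"] = refresh(tokens["refresh_token"])
        return api_call(tokens["access_token"])  # a second 401 propagates to the caller
```

Retrying only once avoids a refresh loop when the API rejects the call for a reason other than token expiry. Running `refresh` for every connected app before a multi-step workflow starts is one way to surface `ReauthRequired` early.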
Security boundaries:
- Each tool call includes a user_id parameter
- The tool executor validates that the user owns the credential before making the API call
- If Slashy supports shared workspaces (unconfirmed), credential isolation must prevent one member’s tool calls from using another member’s tokens
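One way to enforce these boundaries is to key credentials by user and app, and to resolve them only from the authenticated caller's identity. The sketch below uses an in-memory dictionary as a stand-in for an encrypted store; `CredentialStore` and `run_tool` are hypothetical names.

```python
class CredentialStore:
    """In-memory stand-in for an encrypted token store keyed by (user_id, app)."""

    def __init__(self):
        self._tokens = {}

    def put(self, user_id: str, app: str, token: str) -> None:
        self._tokens[(user_id, app)] = token

    def get_for(self, user_id: str, app: str) -> str:
        try:
            return self._tokens[(user_id, app)]
        except KeyError:
            raise PermissionError(
                f"user {user_id!r} has no credential for {app!r}"
            ) from None


def run_tool(store: CredentialStore, user_id: str, app: str, action):
    """Resolve the credential from the caller's identity, never from model output."""
    token = store.get_for(user_id, app)
    return action(token)
```

Because the lookup key includes `user_id`, a tool call made on behalf of one user cannot reach another user's token even if the model emits the wrong app or account name.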
Observability and Debugging
When a workflow fails, you need to know which tool call broke and why.
Observability requirements:
- Log every tool call with input parameters and output
- Track token usage per workflow (important for cost control)
- Capture API errors with full context (status code, error message, request ID)
- Store the full message history so you can replay the workflow
Likely tooling:
- Structured logging (JSON logs with trace IDs)
- Distributed tracing (OpenTelemetry spans for each tool call)
- Replay capability (store the initial request and all tool results, re-run the LLM with the same inputs)
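A structured log line for one tool call might be built as in the sketch below. The field names and the `SENSITIVE_KEYS` set are assumptions; the point is that each record carries a trace ID, the inputs and outputs, and the API error context, with tokens redacted before anything is written.

```python
import json
import time
import uuid

SENSITIVE_KEYS = {"access_token", "refresh_token", "authorization"}


def redact(params: dict) -> dict:
    """Replace credential-like values so tokens never reach the logs."""
    return {
        key: ("[REDACTED]" if key.lower() in SENSITIVE_KEYS else value)
        for key, value in params.items()
    }


def tool_call_record(trace_id, tool, params, output=None, error=None, tokens_used=0):
    """Build one JSON log line describing a single tool call."""
    return json.dumps({
        "trace_id": trace_id,
        "ts": time.time(),
        "tool": tool,
        "params": redact(params),
        "output": output,
        "error": error,
        "tokens_used": tokens_used,
    })


trace_id = str(uuid.uuid4())
line = tool_call_record(
    trace_id,
    "gmail_send_email",
    {"to": "jane@example.com", "access_token": "secret"},
    error={"status": 429, "message": "rate limited", "request_id": "req-123"},
)
```

Filtering logs by `trace_id` then yields the ordered tool call history for one workflow, which is the starting point for replay.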
Debugging a failed workflow:
- User reports: “Agent sent the wrong email”
- Look up trace ID from user session
- Inspect tool call history: which tools were called, in what order
- Check memory retrieval: did the agent have the right context?
- Review LLM response: did the model choose the wrong tool or pass bad parameters?
- Fix: update tool schema, add validation, or adjust system prompt
Technical Verdict
Use Slashy’s Markdown normalization pattern when:
- You’re building agents that read from 3+ SaaS apps with incompatible native formats (Slack blocks, Notion rich text, Gmail MIME). The universal intermediate representation prevents format translation bugs that break cross-app workflows.
- Your users care more about reliability than preserving nested lists, custom colors, or embedded media. Markdown loses these but gains predictable parsing.
Use Slashy’s single-agent architecture when:
- Your workflows are mostly sequential with 8-15 tool calls per task (the demo suggests this is the sweet spot). Beyond 20 steps, context window limits and latency compound.
- The cost of hallucinations (wrong email sent, wrong CRM entry) exceeds the cost of slower execution (2-10 seconds per workflow vs. sub-second multi-agent parallelism).
- You can afford to build and maintain custom API clients for each integration. MCP saves development time but adds latency and quality risk.
Use Slashy’s unified semantic search when:
- Cross-app queries are common (“find all emails and Slack threads about Project X”). Without unified search, you’d need 15 separate API calls with different query syntaxes.
- You can tolerate 2-5 minute index lag. Real-time workflows (trading, incident response, compliance monitoring) break with stale embeddings.
Avoid Slashy’s architecture when:
- You need to preserve rich formatting across apps. Markdown normalization loses nested structures, custom colors, and complex layouts. If users demand pixel-perfect fidelity, you must preserve native formats and accept higher hallucination risk.
- Your workflows require 50+ sequential steps or heavy parallelism. Single agent will hit context limits or be too slow. Multi-agent with DAG execution is better for complex orchestration.
- You need domain-specific reasoning (financial compliance, medical diagnosis, legal analysis). A general agent cannot match specialist performance. Use dedicated agents with domain-specific tools and fine-tuned models.
- You’re operating at thin margins. As rough estimates, LLM inference ($0.25 per complex workflow), embedding costs ($0.00005 per 512-token chunk), and vector storage ($0.10-0.30 per GB per month) add up to $10-50 per active user per month. If your unit economics are tight, this infrastructure is expensive.
- Your users demand sub-second response times. OAuth refresh (200-500ms), vector search (100-300ms), and sequential tool calls (500ms-2s each) add 2-10 seconds even to a short workflow of about four tool calls, and longer workflows take proportionally more. Real-time use cases need faster execution.
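The latency estimate can be checked with simple arithmetic. The sketch below takes the per-step ranges quoted in the last bullet at face value and sums them for a sequential workflow; the number of tool calls is the free parameter, and LLM inference time is excluded because the text gives no figure for it.

```python
# Per-step latency ranges in seconds, taken from the estimates in the text.
OAUTH_REFRESH = (0.2, 0.5)
VECTOR_SEARCH = (0.1, 0.3)
TOOL_CALL = (0.5, 2.0)


def workflow_latency(tool_calls: int, needs_refresh: bool = True):
    """Return (best, worst) seconds for a sequential workflow, excluding LLM inference."""
    low = VECTOR_SEARCH[0] + tool_calls * TOOL_CALL[0]
    high = VECTOR_SEARCH[1] + tool_calls * TOOL_CALL[1]
    if needs_refresh:
        low += OAUTH_REFRESH[0]
        high += OAUTH_REFRESH[1]
    return round(low, 2), round(high, 2)
```

With these figures, `workflow_latency(4)` gives (2.3, 8.8) seconds and `workflow_latency(15)` gives (7.8, 30.8) seconds, so step count dominates the budget.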
The single-agent pattern is a bet that simplicity beats specialization. It works when task complexity is moderate and the cost of hallucinations is higher than the cost of slower execution. If you’re building a cross-app agent, copy the Markdown normalization and unified search patterns. Skip MCP unless you need community connectors more than you need reliability.