<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>ai-assisted.dev</title>
	<subtitle>References for AI-native software development</subtitle>
	<link href="https://ai-assisted.dev/" rel="alternate" type="text/html"/>
	<link href="https://ai-assisted.dev/rss.xml" rel="self" type="application/atom+xml"/>
	<id>https://ai-assisted.dev/</id>
	<updated>2026-04-28T10:00:00.000Z</updated>
	<entry>
		<title>A field guide to writing your own eval harness</title>
		<link href="https://ai-assisted.dev/p/a-001"/>
		<id>https://ai-assisted.dev/p/a-001</id>
		<updated>2026-04-28T10:00:00.000Z</updated>
		<summary>Why &quot;vibes-based&quot; testing collapses past 50 prompts, and the smallest harness that scales without becoming a second product.</summary>
		<category term="evals"/><category term="agents"/><category term="tooling"/><category term="methodology"/>
	</entry>
	<entry>
		<title>Building effective agents</title>
		<link href="https://www.anthropic.com/engineering/building-effective-agents"/>
		<id>https://www.anthropic.com/engineering/building-effective-agents</id>
		<updated>2026-04-26T10:00:00.000Z</updated>
		<summary>The taxonomy I keep coming back to. Workflows vs. agents, with worked patterns.</summary>
		<category term="agents"/><category term="patterns"/>
	</entry>
	<entry>
		<title>How we&apos;re shipping with autonomous agents at scale</title>
		<link href="https://medium.com/"/>
		<id>https://medium.com/</id>
		<updated>2026-04-25T10:00:00.000Z</updated>
		<summary>A pragmatic field report. The section on guardrail budgets alone is worth the read.</summary>
		<category term="agents"/><category term="engineering"/>
	</entry>
	<entry>
		<title>Benchmarking 7 coding agents on a real refactor</title>
		<link href="https://ai-assisted.dev/p/a-002"/>
		<id>https://ai-assisted.dev/p/a-002</id>
		<updated>2026-04-22T10:00:00.000Z</updated>
		<summary>Same 12k-line TypeScript codebase, same task: extract a domain layer. I ran every agent twice and graded the diffs.</summary>
		<category term="benchmarks"/><category term="agents"/><category term="refactoring"/><category term="data"/>
	</entry>
	<entry>
		<title>SWE-bench Verified — leaderboard notes</title>
		<link href="https://www.swebench.com/"/>
		<id>https://www.swebench.com/</id>
		<updated>2026-04-20T10:00:00.000Z</updated>
		<summary>Worth reading the Verified subset methodology before quoting any number from the headline leaderboard.</summary>
		<category term="benchmarks"/><category term="evals"/>
	</entry>
	<entry>
		<title>The new contract between developers and AI</title>
		<link href="https://thenewstack.io/"/>
		<id>https://thenewstack.io/</id>
		<updated>2026-04-18T10:00:00.000Z</updated>
		<summary>Skim the intro, read the middle. Their framing of &quot;negotiated autonomy&quot; is sticky.</summary>
		<category term="industry"/><category term="agents"/>
	</entry>
	<entry>
		<title>Things I have learned about LLMs in 2025</title>
		<link href="https://simonwillison.net/"/>
		<id>https://simonwillison.net/</id>
		<updated>2026-04-16T10:00:00.000Z</updated>
		<summary>Annual roundup. The section on tool-calling reliability is gold.</summary>
		<category term="retrospective"/><category term="tooling"/>
	</entry>
	<entry>
		<title>The shape of a good MCP server</title>
		<link href="https://ai-assisted.dev/p/a-003"/>
		<id>https://ai-assisted.dev/p/a-003</id>
		<updated>2026-04-12T10:00:00.000Z</updated>
		<summary>Most MCP servers I see are CRUD wrappers. Here is what changes when you design tools as if a model were a junior engineer with no memory.</summary>
		<category term="mcp"/><category term="tooling"/><category term="agents"/><category term="design-notes"/>
	</entry>
	<entry>
		<title>Subagents and the verifier pattern</title>
		<link href="https://docs.anthropic.com/"/>
		<id>https://docs.anthropic.com/</id>
		<updated>2026-04-05T10:00:00.000Z</updated>
		<summary>Forking a verifier you never await is the cheapest CI you will ever ship.</summary>
		<category term="claude-code"/><category term="patterns"/>
	</entry>
	<entry>
		<title>Latency budgets for agentic UIs</title>
		<link href="https://ai-assisted.dev/p/a-004"/>
		<id>https://ai-assisted.dev/p/a-004</id>
		<updated>2026-03-30T10:00:00.000Z</updated>
		<summary>A 90th-percentile breakdown of where the seconds actually go in a tool-using chat. Spoiler: it is not the model.</summary>
		<category term="ux"/><category term="performance"/><category term="agents"/><category term="data"/>
	</entry>
	<entry>
		<title>Aider — repo-map heuristics, annotated</title>
		<link href="https://aider.chat/"/>
		<id>https://aider.chat/</id>
		<updated>2026-03-24T10:00:00.000Z</updated>
		<summary>The PageRank-on-symbols trick is more interesting than the headline feature.</summary>
		<category term="tooling"/><category term="open-source"/>
	</entry>
	<entry>
		<title>Designing for the model in the loop</title>
		<link href="https://ai-assisted.dev/p/a-005"/>
		<id>https://ai-assisted.dev/p/a-005</id>
		<updated>2026-03-19T10:00:00.000Z</updated>
		<summary>When the user is not the only one reading your UI. A short manifesto on machine-legible interfaces.</summary>
		<category term="ux"/><category term="agents"/><category term="design-notes"/><category term="patterns"/>
	</entry>
</feed>