This article would benefit from a date. It looks like it's recent (Internet Arch...

__mharrison__ · 2026-06-05T21:16:21 1780694181

Here's a portion of my AGENTS.md from this week (playing FDE, implementing a custom workflow for a client that 20x'd their productivity).

    # Python Tooling
    
    - Use `uv` to manage Python environments and dependencies.
    - Use `uv run` to execute Python scripts and commands.
    - Use `pytest` for testing your code.
    - Use the `hypothesis` library for property-based testing when you have complex input spaces or need to test edge cases.
    - Don't edit `pyproject.toml` directly. Instead, use `uv add` and `uv add --dev` to manage dependencies.
    - Use `ruff`, `ty`, `prek`, and `wily` for code quality and linting.
    - Don't use excessive casting. If you find yourself needing to cast types frequently, consider refactoring your code to use more appropriate types. Casting should only be done in boundary layers where you are interfacing with external systems.
    - Run appropriate tooling after making changes to your code to ensure it meets quality standards.
    - When you come across a bug or regression, think hard about writing a test and also how to create code that will prevent this from happening again in the future.
    - When creating a command line interface, add `--verbose` flag that provides logging output useful for debugging issues.
    - Before creating code, brainstorm 5 different approaches to solve the problem and sort them by their probable effectiveness. Then, choose the best approach and implement it.
    - Use Test Driven Development (TDD) for all code you write. Write tests before writing the implementation code.
    - Collect pytest fixtures in a `conftest.py` file to avoid duplication.
    - Prefer testing real code where possible. Use doubles and `monkeypatch` only when absolutely necessary. Try to avoid mocking as much as possible.
    - Favor pytest's `monkeypatch` over `mock`.
    - When a test fails, run the last failed test first using `uv run pytest --last-failed`.
    - Use numpy-style docstrings for all functions and classes you create.
    - Include doctests in the docstrings of your functions to provide examples.
    - Use type hints for all function parameters and return types.
    - Use logging to provide insight into failures. Don't use print for debugging. Don't use logging to hide stack traces.
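
For a sense of what several of these rules look like together, here is a minimal sketch using only the standard library: type hints, a numpy-style docstring with doctests, `logging` instead of debug prints, and a `--verbose` flag. The `count_words` function and the CLI around it are hypothetical, made up for illustration; they are not from the client project.

```python
from __future__ import annotations

import argparse
import logging
from collections.abc import Sequence

logger = logging.getLogger(__name__)


def count_words(text: str) -> int:
    """Count whitespace-separated words in a string.

    Parameters
    ----------
    text : str
        Input text; may be empty.

    Returns
    -------
    int
        Number of whitespace-separated tokens in ``text``.

    Examples
    --------
    >>> count_words("uv run pytest")
    3
    >>> count_words("")
    0
    """
    words = text.split()
    # Debug-level logging only shows up when --verbose is passed.
    logger.debug("Split %r into %d words", text, len(words))
    return len(words)


def build_parser() -> argparse.ArgumentParser:
    """Build the command line parser, including the ``--verbose`` flag."""
    parser = argparse.ArgumentParser(description="Count words in TEXT.")
    parser.add_argument("text", help="text to count words in")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="emit debug logging useful for diagnosing issues",
    )
    return parser


def main(argv: Sequence[str] | None = None) -> int:
    """Parse arguments, configure logging, and print the word count."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.WARNING)
    print(count_words(args.text))  # program output, not debug output
    return 0
```

Calling `main(["--verbose", "hello world"])` prints `2` and logs the split at debug level. In a real module you'd add the usual `if __name__ == "__main__":` guard, and the doctests can be collected with `uv run pytest --doctest-modules`.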

porphyra · 2026-06-05T20:49:24 1780692564

A lot of prompt engineering goes out of date quickly. Nobody nowadays goes "you are an expert software engineer. make no mistakes" lol.

As a personal anecdote, I find that a lot of big prompts and skills use up context window budget and in many cases agents will eagerly try to use a skill even if it isn't super relevant or necessary for the current task. So when I have too many skills I have to spend a bunch of time toggling the checkboxes to figure out which ones are needed for the task at hand before starting...

Royce-CMR · 2026-06-06T03:45:37 1780717537

I can't find the link now, but Anthropic has a post about using either a light model call or other logic (regex etc) to dynamically decide what tools to expose per incoming request.

I've run into the same issue and I still end up manually curtailing what's exposed to the model, limiting to the task at hand, but I like the idea of another (smaller I hope) model doing 70% of the clipping instead, automagically.
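
A rough sketch of the "other logic (regex etc)" version of that idea, to make it concrete. This is not Anthropic's implementation (I don't have the post either); the tool names, patterns, and the `limit` cap are all made up for illustration, and a small model call could stand in for `select_tools`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A tool or skill plus the pattern that makes it relevant."""

    name: str
    pattern: re.Pattern[str]


# Hypothetical registry: names and regexes are placeholders.
TOOLS: tuple[Tool, ...] = (
    Tool("tdd", re.compile(r"\b(test|pytest|tdd|coverage)\b", re.IGNORECASE)),
    Tool("sql-migrations", re.compile(r"\b(sql|migration|schema)\b", re.IGNORECASE)),
    Tool("release-notes", re.compile(r"\b(changelog|release)\b", re.IGNORECASE)),
)


def select_tools(
    request: str, tools: tuple[Tool, ...] = TOOLS, limit: int = 2
) -> list[str]:
    """Return the names of at most ``limit`` tools whose pattern matches.

    Tools are considered in registry order, so earlier entries win
    when more than ``limit`` of them match.
    """
    matches = [tool.name for tool in tools if tool.pattern.search(request)]
    return matches[:limit]
```

With this registry, `select_tools("Add a failing pytest test for the parser")` exposes only `tdd`, and a request that matches nothing exposes no tools at all, which keeps the context window free for the task itself.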

oefrha · 2026-06-06T10:19:57 1780741197

> Nobody nowadays goes "you are an expert software engineer. make no mistakes"

You know what, I checked Opus 4.8's instructions to a review subagent the other day and it literally opened with

> You are a senior infrastructure/security engineer doing a thorough, adversarial code review...

I didn't say anything like that myself.

mathgeek · 2026-06-06T12:06:52 1780747612

Much like agents, I can tell myself I'm a senior infrastructure/security engineer doing a thorough, adversarial code review, but that doesn't change the results much.

tclancy · 2026-06-06T15:10:05 1780758605

You have to be looking in a mirror and slap your face a couple of times to make it work.

jasonswett · 2026-06-06T00:47:36 1780706856

Good point! Will add a date.

nextaccountic · 2026-06-06T10:04:18 1780740258

https://github.com/jasonswett/llm-skills/blob/main/tdd/SKILL... has a timestamp (Mar 14, 2026 as of today)

disgruntledphd2 · 2026-06-05T20:19:09 1780690749

Me too, although I dislike the fact that it over-focuses on mocks (which I accept are over-represented in the training data).

galsapir · 2026-06-05T20:46:51 1780692411

sometimes I also feel it tries to optimise for "per line coverage" over more "real, complex use cases" type tests

chrisweekly · 2026-06-06T00:13:01 1780704781

Every article should include a date!

0123456789ABCDE · 2026-06-05T23:28:16 1780702096

fwiw, response headers include: Last-Modified: Fri, 22 May 2026 19:08:09 GMT
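
If anyone wants to check this themselves, here's a small sketch with the standard library. `parse_last_modified` and `fetch_last_modified` are helper names I made up; the fetch helper assumes the server answers HEAD requests and isn't called here. Keep in mind `Last-Modified` is when the server says the file last changed, not necessarily when the article was written.

```python
from __future__ import annotations

from datetime import datetime
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(value: str) -> datetime:
    """Parse an HTTP date header value into a timezone-aware datetime."""
    return parsedate_to_datetime(value)


def fetch_last_modified(url: str) -> datetime | None:
    """Send a HEAD request and return the Last-Modified time, if present."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        value = response.headers.get("Last-Modified")
    return parse_last_modified(value) if value else None


# The header value quoted above, parsed without any network access.
parsed = parse_last_modified("Fri, 22 May 2026 19:08:09 GMT")
print(parsed.isoformat())  # 2026-05-22T19:08:09+00:00
```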