Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This article would benefit from a date. It looks like it's recent (Internet Archive first grabbed it on May 29th) but it's the kind of information that can quickly become stale as models and agents improve.

(I've been getting solid results recently from simply telling Claude Code and Codex "Test with uv run pytest, use red/green TDD".)

 help



Here's a portion of my AGENTS.md from this week (playing FDE, implementing a custom workflow for a client that 20x their productivity).

    # Python Tooling
    
    - Use `uv` to manage Python environments and dependencies.
    - Use `uv run` to execute Python scripts and commands.
    - Use `pytest` for testing your code.
    - Use the `hypothesis` library for property-based testing when you have complex input spaces or need to test edge cases.
    - Don't edit `pyproject.toml` directly. Instead, use `uv add` and `uv add --dev` to manage dependencies.
    - Use ruff, ty, prek, wily for code quality and linting.
    - Don't use excessive casting. If you find yourself needing to cast types frequently, consider refactoring your code to use more appropriate types. Casting should only be done in boundary layers where you are interfacing with external systems.
    - Run appropriate tooling after making changes to your code to ensure it meets quality standards.
    - When you come across a bug or regression, think hard about writing a test and also how to create code that will prevent this from happening again in the future.
    - When creating a command line interface, add `--verbose` flag that provides logging output useful for debugging issues.
    - Before creating code, brainstorm 5 different approaches to solve the problem and sort them by their probable effectiveness. Then, choose the best approach and implement it.
    - Use Test Driven Development (TDD) for all code you write. Write tests before writing the implementation code. 
    - Collect pytest fixtures in a `conftest.py` file to avoid duplication 
    - Prefer testing real code where possible. Use doubles and `monkeypatch` when absolute necessary. Try to avoid mocking as much as possible.
    - Favor pytest monkeypatch to mock.
    - When a test fails, run the last failed test first using `uv run pytest --last-failed` 
    - Use numpy-style docstrings for all functions and classes you create.
    - Include doctests in the docstrings of your functions to provide examples
    - Use type hints for all function parameters and return types.
    - Use logging to provide insight into failures. Don't use print for debugging. Don't use logging to hide stack traces.

A lot of prompt engineering goes out of date quickly. Nobody nowadays goes "you are an expert software engineer. make no mistakes" lol.

As a personal anecdote, I find that a lot of big prompts and skills use up context window budget and in many cases agents will eagerly try to use a skill even if it isn't super relevant or necessary for the current task. So when I have too many skills I have to spend a bunch of time toggling the checkboxes to figure out which ones are needed for the task at hand before starting...


I can't find the link now, but Anthropic has a post about using either a light model call or other logic (regex etc) to dynamically decide what tools to expose per incoming request.

I've run into the same issue and I still end up manually curtailing what's exposed to the model, limiting to the task at hand, but I like the idea of another (smaller I hope) model doing 70% of the clipping instead, automagically.


> Nobody nowadays goes "you are an expert software engineer. make no mistakes"

You know what, I checked Opus 4.8's instructions to a review subagent the other day and it literally opened with

> You are a senior infrastructure/security engineer doing a thorough, adversarial code review...

I didn't say anything like that myself.


Much like agents, I can tell myself I'm a senior infrastructure/security engineer doing a thorough, adversarial code review, but that doesn't change the results much.

You have to be looking in a mirror and slap your face a couple of times to make it work.

Good point! Will add a date.


Me too, although I dislike the fact that it over-focuses on mocks (which I accept is over-represented in the training data).

sometimes I also feel it tries to optimise for "per line coverage" over more "real, complex use cases" type tests

Every article should include a date!

fwiw, response headers include: Last-Modified: Fri, 22 May 2026 19:08:09 GMT



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: