Hacker News | mparis's comments

I've been playing with the Gemini CLI w/ the gemini-pro-3 preview. First impressions are that it's still not really ready for prime time within existing complex code bases. It does not follow instructions.

The pattern I keep seeing is that I ask it to iterate on a design document. It will, but then it will immediately jump into changing source files despite explicit asks to only update the plan. It may be a Gemini CLI problem more than a model problem.

Also, whoever at these labs is deciding to put ASCII boxes around their inputs needs to try using their own tool for a day.

People copy and paste text in terminals. Someone at Gemini clearly thought about this, as they have an annoying `ctrl-s` hotkey that you need to use for some unnecessary reason. But they then also provide the stellar experience of copying "a line of text where you then get | random pipes | in the middle of your content".

Codex figured this out. Claude took a while but eventually figured it out. Google, you should also figure it out.

Despite model supremacy, the products still matter.


We've been running structured outputs via Claude on Bedrock in production for a year now and it works great. Give it a JSON schema, inject a '{', and sometimes do a bit of custom parsing on the response. GG

Nice to see them support it officially. OpenAI has officially supported this for a while, but, at least historically, I was unable to use it because it adds deterministic validation that errors on certain standard JSON Schema elements we used. That lack of "official" support is the feature that pushed us to use Claude in the first place.

It's unclear to me that we will need "modes" for these features.

Another example: I used to think that I couldn't live without Claude Code "plan mode". Then I used Codex and asked it to write a markdown file with a todo list. A bit more typing but it works well and it's nice to be able to edit the plan directly in editor.

Agree or Disagree?


Before Claude Code shipped with plan mode, the workflow for using most coding agents was to have it create a `PLAN.md` and update/execute that plan. Planning mode was just a first class version of what users were already doing.


Claude Code keeps coming out with a lot of really nice tools that others haven't started to emulate, from what I've seen.

My favorite one is going through the plan interactively. It turns it into a multiple choice / option TUI, and the last choice is always to reprompt that section of the plan.

I had to switch back to Codex recently, and not being able to do my planning solely in the CLI feels like the early 1900s.

To trigger the interactive mode, do something like:

Plan a fix for:

<Problem statement>

Please walk me through any options or questions you might have interactively.


> Give it a JSON schema, inject a '{', and sometimes do a bit of custom parsing on the response

I would hope that this is not what OpenAI/Anthropic do under the hood, because otherwise, what if one of the strings needs a lot of \escapes? Is it also supposed to never write actual newlines in strings? It's awkward.

The ideal solution would be to have some special tokens like [object_start] [object_end] and [string_start] [string_end].


AI x Healthcare Startup | Boston, MA Onsite | Full-time | Early Engineer

We're looking for a backend-leaning fullstack dev. You will be one of the first engineers outside of the founding team. Here is a bit more about us:

We're a seed stage AI startup backed by several top tier VCs. We're on a mission to ensure patients get the coverage they deserve from their health insurance.

We’re building deep, vertically integrated technology systems to solve fundamental problems in US healthcare - the biggest market in the world ($5T). We use AWS, K8s, React, and Rust but there is no requirement to have prior experience with them specifically.

We are a founding team made up of ex-YC, Amazon, Meta, Microsoft, and Harvard Business School with previous successful exits. We have a 6+ month customer waitlist and are growing insanely fast.

We're hiring our first engineer outside the founders. You'll work directly with customers to understand their needs, design the right solution, and build from zero to one. You'll own entire parts of the roadmap and tech stack while wearing multiple hats. Most important characteristics are resilience, work ethic, and curiosity. We care about slope, not where you are today.

This is an opportunity to work in an insanely fast paced, high ownership environment while solving real problems in healthcare. We're happy to share more details on the role in person/on zoom. Please fill out this form if interested!

https://wgwx7h7be0p.typeform.com/to/LV0t8OjI


Went through the form; it seems like a data-harvesting survey. It asks for several pieces of personal information, step by step, and then ends by saying they'll be in contact.

No details at all about the position in that link




Very interested. Filled out the form. I wrote this about finding the right people. It's a plug, but might actually be worth a look.

https://news.ycombinator.com/item?id=44415442


I'm a recent snafu (https://docs.rs/snafu/latest/snafu/) convert over thiserror (https://docs.rs/thiserror/latest/thiserror/). You pay the cost of adding `context` calls at error sites, but it leads to great error propagation and enables multiple error variants that reference the same source error type, which I always had issues with in `thiserror`.

No dogma. If you want an error per module that seems like a good way to start, but for complex cases where you want to break an error down more, we'll often have an error type per function/struct/trait.


Thanks for using SNAFU! Any feedback you'd like to share?


> multiple error variants that reference the same source error type which I always had issues with in `thiserror`.

Huh?

    #[derive(Debug, thiserror::Error)]
    enum CustomError {
        #[error("failed to open a: {0}")]
        A(std::io::Error),
        #[error("failed to open b: {0}")]
        B(std::io::Error),
    }
    
    fn main() -> Result<(), CustomError> {
        std::fs::read_to_string("a").map_err(CustomError::A)?;
        std::fs::read_to_string("b").map_err(CustomError::B)?;
        Ok(())
    }
If I understand correctly, the main feature of snafu is "merely" reducing the boilerplate when adding context:

    low_level_result.context(ErrorWithContextSnafu { context })?;
    // vs
    low_level_result.map_err(|err| ErrorWithContext { err, context })?;
But to me, the win seems too small to justify the added complexity.


You certainly can use thiserror to accomplish the same goals! However, your example does a subtle little sleight-of-hand that you probably didn't mean to, and leaves off the enum name (or the `use` statement):

    low_level_result.context(ErrorWithContextSnafu { context })?;
    low_level_result.map_err(|err| CustomError::ErrorWithContext { err, context })?;
Other small details:

- You don't need to move the inner error yourself.

- You don't need to use a closure, which saves a few characters. This is even true in cases where you have a reference and want to store the owned value in the error:

    #[derive(Debug, Snafu)]
    struct OpenFileError { source: std::io::Error, filename: PathBuf }

    let filename: &Path = todo!();
    result.context(OpenFileSnafu { filename })?; // `context` will change `&Path` to `PathBuf`
- You can choose to capture certain values implicitly, such as a source file location, a backtrace, or your own custom data (the current time, a global-ish request ID, etc.)

----

As an aside:

    #[error("failed to open a: {0}")]
It is now discouraged to include the text of the inner error in the `Display` of the wrapping error. Including it leads to duplicated data when printing out chains of errors in a nicer / structured manner. SNAFU has a few types that work to undo this duplication, but it's better to avoid it in the first place.


Congrats on the launch. Seems like a natural domain for an AI tool. One nice aspect about pen testing is it only needs to work once to be useful. In other words, it can fail most of the time and no one but your CFO cares. Nice!

A few questions:

On your site it says, "MindFort can assess 1 or 100,000 page web apps seamlessly. It can also scale dynamically as your applications grow."

Can you provide more color as to what that really means? If I were actually to ask you to assess 100,000 pages, what would actually happen? Is it possible for my usage to block/brown-out another customer's usage?

I'm also curious what happens if the system does detect a vulnerability. Is there any chance the bot does something dangerous with, e.g., its newly discovered escalated privileges?

Thanks and good luck!


Thanks so much!

Regarding scale: we absolutely can assess at that scale, but it would require quite a large enterprise contract upfront, as we would need to get the required capacity from our providers.

The system is designed to safely test exploitation, and not perform destructive testing. It will traverse as far as it can, but it won't break anything along the way.


There will of course be high-flyers whose wings will melt and who will fall back to earth, but don't be so quick to dismiss the teams that are bringing real value to industries that have historically been tricky to make more productive. For every red-hot AI demo that drops promising to change the world, there is some other team using AI to do something that may sound boring, yet is important.

For example, I work in healthcare, and it's difficult to overstate how much time it can take to do the most basic things. The people tasked with doing those basic things are often highly-educated, highly-skilled, and highly-paid; and it still takes a long time.

I suspect there is an unreasonable amount of cost to shed from doing simple things. Things like:

1. Reading, reasoning over, and copying structured data from lightly-structured, highly variable documents like PDFs.

2. Reducing the amount of time a human sits on hold on the phone. I'm of the opinion the AI doesn't even need to do the talking to deliver huge amounts of value. Just help me help my highly-skilled employees move from high-value task to high-value task without the tedium in the middle.

3. Logging in and copying basic details from any of the 1000s of healthcare-specific websites, each of which does more or less the same thing, slightly differently. RPA has always been so costly to build and maintain. The high-variation fan-out just got a lot easier.

In the short term, I'm most bullish on AI to solve these low-value, highly-variable, highly-annoying tasks. I'm also reasonably confident that the AI we have today is already good enough to do it.

Give it time and we'll start to see companies operate at margins that were previously impossible in industries that we thought were near-impossible to make more productive.


It's not that it's not useful at all, it's that the mismatch between the reality (what you described) and hype ("build full-stack AI companies to outcompete human-first ones") is wilder than anything I have seen in my lifetime. It's reminiscent of stories about the dot-com bubble era, which I am not old enough to have seen first hand.

(Maybe a bit less than what you described. It's something I tried and I don't think LLMs can deal with most unstructured data at scale very well.)


The problem with AI isn't that it isn't useful.

It's that, as an industry or business at the macro level, it is OBSCENELY overvalued.

Generative AI may be a 50bn business. Or a 25bn, or 75bn.

What it definitely isn't is a multi-trillion dollar game changer that's going to revolutionize the world in ways unheard of; and yet, that seems to be how it is being presented and, more importantly, pitched to VCs and hyperscalers.


That's fair. I don't know what the generative AI industry will end up being worth. Maybe you're right it's only worth 25bn or 75bn. But.. also.. maybe you're missing something. I certainly don't know, but I try to hold a spectrum of possible futures in mind.

I acknowledge your bear case and hear the possibility that it's all hype and the aggregate value of all generative AI (measured in dollars) will be worth less than e.g. the market capitalization of a single company like Uber.

BUT, hear me out. Forgive me as I fall back to healthcare... The US spent ~$4.9 trillion on healthcare in 2023 alone (according to CMS). That cost is spread across a lot of things, some of which is work that AI can help make more productive, some of which is not applicable to AI. When we are spending nearly $5 trillion a year, it does not take much on a percentage basis to start seeing really significant dollar values in savings.

It's a story of death by 1000 cuts. I suspect it won't be one big fix where AI magically solves healthcare all at once. But we will optimize 1% here and 1% there using focused solutions that actually solve pain points. If someone improves productivity in healthcare by even just 1%, one time, then we are talking about savings of ~$50bn per year.


This project resonates with me a lot. Call me old-fashioned, but I still appreciate a nice ol' deterministic program that I can fully understand and operate reliably.

With that said, there is undoubtedly still room to innovate on the long-tail of RPA. In the healthcare domain, for example, there are 1000s of sites that might need to be scraped occasionally, somewhat transactionally as e.g. a new patient comes in. However, there are other sites that need regular attention and even the smallest of errors can be catastrophic.

The combination of browser-use & workflow-use seems like a really natural fit for such use cases. Nice work!

We've also experimented with the self-healing ideas you are playing with here. In our case, we wrote a chrome extension that connects to an LLM of your choice as well as a process running locally on your machine. You write a description of the job to be done, click around the browser, and then click "go". The extension grabs all the context, asks the LLM to write a typescript program, sends that typescript program to the local process where it is compiled & type-checked against our internal workflow harness, and then immediately allows you to execute the program against your existing, open browser context.

We've found that even this basic loop is outrageously productive. If the script doesn't do what you expect, there is a big "FIX IT" button that lets you tweak and try again. For the record, we're not a competitor and have no intention of trying to sell/offer this extension externally.

I suspect one of the harder parts about this whole ordeal will be how to integrate with the rest of the workflow stack. For us, we've really appreciated the fact that our extension outputs typescript that seamlessly fits into our stack and that is more easily verifiable than JSON. The TS target also allows us to do nice things like tell the self-healing bot which libraries will be available so that e.g. it can use `date-fns` instead of `Date`. We've also thought about adopting more traditional workflow tools like Temporal to manage the core workflow logic, vending out the browser connectivity remotely. Curious how you guys are thinking about this?

Rooting for you guys, we will be sure to keep an eye on your progress and consider adopting the technology as it matures!

PS. If you like things like this, want to work at a growing health-tech startup, and live in Boston, we're hiring! Reach out here: https://wgwx7h7be0p.typeform.com/to/LV0t8OjI


AI x Healthcare Startup | Boston, MA Onsite | Full-time | Founding Engineer

We're looking for a backend-leaning fullstack dev. You will be the first engineer outside of the founding team. Here is a bit more about us:

We're a seed stage AI startup backed by several top tier VCs.

We're on a mission to ensure patients get the coverage they deserve from their health insurance.

We’re building deep, vertically integrated technology systems to solve fundamental problems in US healthcare - the biggest market in the world ($5T). We use AWS, K8s, React, and Rust but there is no requirement to have prior experience with them specifically. We'll teach you!

We are a founding team made up of ex-YC, Amazon, Meta, Microsoft, and Harvard Business School with previous successful exits. We have a 6+ month customer waitlist and are growing.

We're hiring our first engineer outside the founders. You'll work directly with customers to understand their needs, design the right solution, and build from zero to one. You'll own entire parts of the roadmap and tech stack while wearing multiple hats. Most important characteristics are resilience, work ethic, and curiosity. We care about slope, not where you are today.

This is an opportunity to work in an insanely fast paced, high ownership environment while solving real problems in healthcare.

We're happy to share more details on the role in person/on zoom. Please fill out this form if interested!

https://wgwx7h7be0p.typeform.com/to/LV0t8OjI


I also love rust and we use it heavily at our startup, but I agree with you and wish there were a mainstream alternative that kept much of the type system, pervasive expressions, and pattern matching while being smaller. I’d accept “very fast” even if it’s not as fast as rust.

One project I’ve seen that I don’t think is particularly active but that I really like the ideas behind is borgo. It compiles to go (thus GC) but is decidedly more rustacean.

Check it out. I hope someone makes something like this widespread.

https://borgo-lang.github.io/

PS. I have no affiliation with borgo, just an online fan.

