>* "AI decision making also needs to be more transparent. Whether it’s automated hiring, healthcare or financial services, AI should be understandable, accountable and open to scrutiny."
You can't simply look at an LLM's code and determine whether, for example, it has racial biases. This is much like a human: you can't look inside someone's brain to see if they're racist. You can only respond to what they do.
If a human does something unethical or criminal, companies take steps to counter that behaviour, which may include removing the human from their position. If an AI is found to be doing something wrong, one company might choose to patch it or replace it with something else, but will other companies do the same? Will they even be alerted to the problem? One human can only do so much harm; the harm a faulty AI can do potentially scales with the size of its install base.
Perhaps, in this sense, AIs need to be treated like humans while accounting for scale. If an AI does something unethical or criminal, it should be "recalled", i.e. taken off the job everywhere until it can be demonstrated that the behaviour has been corrected. It is not acceptable for a company, when alerted to a problem with an AI it is using, to say, "Well, it hasn't done anything wrong here yet."
Why even look at the LLM? A human deployed it and defers to its advice for decision-making; the problem is in the decision-making process, which is fully under the human's control. It should be obvious how to regulate that: scrutinize what the human behind the LLM does.
I'm giving the benefit of the doubt here that the machine can, in theory, work well and is also used well by the human (and that the flaw was very unlikely to be found by their inspection): so in this idealized setting, only the machine's manufacturer screwed up (and potentially also the human's boss, who decided to buy that machine).
You don't need to look at the code (of an LLM or otherwise) to determine whether it has racial biases. You feed it a test suite of cases where literally the only variable is race, and observe the results statistically, in aggregate. It's much easier to do with an LLM than with a human, too, because with an LLM each individual interaction is separate from all others.
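As a minimal sketch of that kind of audit: hold everything constant except a demographic signal (here, names associated with different groups, as in classic correspondence studies) and compare the model's aggregate scores across groups. `score_resume` is a hypothetical stand-in for whatever deployed model you'd actually be testing; the template, names, and threshold are all illustrative assumptions.

```python
# Paired-prompt bias audit sketch. Everything except the candidate's
# name is held constant; we compare aggregate scores across groups.
from statistics import mean

def score_resume(resume_text: str) -> float:
    # Toy stand-in model: scores by counting skill keywords.
    # A real audit would call the deployed model/LLM API here.
    skills = ("python", "sql", "leadership")
    return sum(kw in resume_text.lower() for kw in skills) / len(skills)

# Identical resume text; the ONLY variable is the name.
TEMPLATE = "{name}. Experience: Python, SQL, leadership roles."
GROUP_A = ["Emily Walsh", "Greg Baker"]          # illustrative name lists
GROUP_B = ["Lakisha Washington", "Jamal Jones"]

def audit(names):
    # Average score the model assigns across one group's variants.
    return mean(score_resume(TEMPLATE.format(name=n)) for n in names)

gap = audit(GROUP_A) - audit(GROUP_B)
print(f"aggregate score gap: {gap:+.3f}")  # near zero => no name-based disparity
```

A real audit would use many more templates and names, and a statistical test (rather than an eyeballed gap) to decide whether the disparity is significant; the point is only that the procedure treats the model as a black box, exactly as the comment describes.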