Sunday, July 16, 2023

Large Machine Learning Models Are Not Intrinsically Ethical - And Neither Are Large Corporations

I think the screen actors' and writers' concerns about the use of large AI models are legitimate, since the models cannot exist and could not be successful without being trained using a ginormous amount of human-created input, typically without the permission or even knowledge of the original creators.

But that's just the tip of the iceberg, being currently the most visible public example of this concern.

Eventually, software engineers will wise up and figure out they have the same issue, with companies training AIs using software - including open source software - written by humans, most of whom are no longer, or never were, employees of theirs, without any compensation, consent, or acknowledgement.

Worse, companies will try to get around using expensive, experienced, and ethical developers by training AIs to generate software that will be used in safety-critical or weapons systems.

Eventually, companies will save even more money, and avoid any intellectual property issues, by training AIs using software that was itself generated by other AIs, and... it's turtles all the way down. With each iteration, it will be like a game of telephone, the quality of the output getting worse and worse. Except sometimes with ground-to-air missiles.

In time, there will be corporate executives for some prime defense contractor sitting in front of a Congressional committee, trying to explain why their automated weapons system shot down a commercial airliner because it thought it was a Russian bomber. They will be forced to admit that no one - not their scientists, not their own engineers, not anyone - really understands how the AI in the system came to that decision.

Because that's how complex large neural network machine learning models are. It's not traditional if-then-else logic, a so-called "rule-based" system, like I studied when I was a graduate student in Computer Science. It's an almost incomprehensibly gigantic simulated network of neurons that was configured by an almost unbelievably huge dataset of input. A dataset whose contents no human moderated or approved or even examined. Or, because of its volume, could examine.

I admit this isn't my area of expertise. But I have a couple of degrees in Computer Science from an accredited program at a university. I have worked for many years in large multi-national corporations, part of that time in the defense-industrial complex. So I feel like I have a reasonably informed opinion on both the technical aspects and how large corporations work.

I believe that the application of very large machine learning models to weapons systems is inevitable. If not by the U.S., then by other countries, perhaps including our allies. The results will be unpredictable. And unexplainable.


Postscript 1

It only just now occurred to me that how large machine learning models work might be a good metaphor for the hive minds of large organizations.

Not really joking.

Postscript 2

My use of "hive minds" above was quite deliberate, BTW, since my train of thought first connected machine learning models with the emergent behavior of some insect colonies, e.g., bees. The individual bee - and the neural network inside its brain - is relatively simple, but the group behavior of a lot of bees is quite complex - and not even remotely understood by any individual bee.

Postscript 3

I couldn't read this paywalled article from Bloomberg [2023-07-16], but the part I could see, just a few minutes ago, coincidentally, was enough.

"Israel Quietly Embeds AI Systems in Deadly Military Operations

Selecting targets for air strikes and executing raids can now be conducted with unprecedented speed, according to army officials.

The Israel Defense Forces have started using artificial intelligence to select targets for air strikes and organize wartime logistics as tensions escalate in the occupied territories and with arch-rival Iran.

Though the military won’t comment on specific operations, officials say that it now uses an AI recommendation system that can crunch huge amounts of data to select targets for air strikes. Ensuing raids can then be rapidly assembled with another artificial intelligence model called Fire Factory, which uses data about military-approved targets to calculate munition loads, prioritize and assign thousands of targets to aircraft and drones, and propose a schedule."

Postscript 4

There's a very recent article on Vox about how the inner workings of large machine learning models are unknowable.

Postscript 5

Postscript 6

The article from VentureBeat that I cite just above makes an interesting point: the fact that using AI model output as training data for another AI model leads to "model collapse" means that high-quality human-generated or human-curated training data becomes increasingly rare and valuable. I predict this will lead to new open source licenses, GNU and otherwise, that restrict data or code use as training data for machine learning models. (And of course, AI developers will routinely violate those open source licenses, just as they are violated now.)
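For the curious, here is a toy sketch of the "model collapse" effect, not anything taken from the article. It assumes the simplest possible "model", a Gaussian described by a mean and a standard deviation, and each generation is "trained" only on a small sample drawn from the previous generation. The function name simulate_collapse and all of its parameters are my own invention for illustration; real large models are vastly more complicated, so treat this as an analogy rather than a demonstration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Hypothetical toy: the Gaussian stands in for a "model", and each
    generation sees only the output of the generation before it.
    Returns the list of standard deviations, one per generation.
    """
    rng = random.Random(seed)
    mean, stdev = 0.0, 1.0  # generation 0: the original "human" data
    history = [stdev]
    for _ in range(generations):
        # Generate "training data" from the previous model...
        samples = [rng.gauss(mean, stdev) for _ in range(sample_size)]
        # ...and fit the next model to that generated data alone.
        mean = statistics.fmean(samples)
        stdev = statistics.pstdev(samples)
        history.append(stdev)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation:3d}: stdev = {history[generation]:.3e}")
```

In this toy, the spread of the distribution tends to shrink from one generation to the next, because each small sample loses a little of the variety in its parent and nothing ever puts it back. That is the game of telephone in miniature.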
