Large Language Models - what passes for "AI" these days (there are other kinds, but this is what most people mean when they use the term) - are in effect gigantic autocomplete algorithms. They are implemented using a technique called "machine learning", built on artificial neural networks (loosely modeled on how we currently believe the brain works) and scaled up to trillions of parameters, computed from terabytes of training data, much of which is copyrighted and used without the creators' permission. An LLM produces the output its algorithm deems most likely to be a response to your input prompt, based on its model of that training data. If that output happens to represent actual truth or facts, it's only because the training data made that seem probable.
LLMs "hallucinating" isn't a bug; it's fundamental to how they operate.
I've read several articles on LLMs whose basic theme is "no one knows how LLMs work". This is true, but probably not in the way that most people think. The LLM developers who work for the AI companies know exactly how the software algorithms work - it's not just code, it's code that they for the most part wrote. It's the trillions of parameters, derived algorithmically from the terabytes of training data, that are the big mystery.
Imagine a vast warehouse, on the scale of the scenes at the end of Citizen Kane or Raiders of the Lost Ark. That warehouse is full of file cabinets, and together those cabinets hold paper files on every person who has ever lived in the United States, for as long as the U.S. Government has been keeping records. Your job: tally the number of people in those files whose first name ends in "e" and who had a sibling whose first name ends in "r".
You understand the job. The task is straightforward. The algorithm you could use to accomplish this is obvious. But could you do it? No. The dataset is too ginormous. You literally won't live long enough to get it done, even if you could maintain your interest.
But if all that information were digitized, stored in a huge database, the database indexed to link records of family members together, and a program written to answer the original question, a computer could come up with the answer in a few minutes. These kinds of mundane, repetitive tasks are what computers excel at.
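As a rough sketch of what that program might look like - the table layout here is entirely made up for illustration, since the real records could be organized many ways - the heart of it is a few lines of SQL once the records are in a relational database:

```python
import sqlite3

# Hypothetical schema, invented for this example: one row per person, plus a
# table linking siblings together.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE siblings (person_id INTEGER, sibling_id INTEGER);

    INSERT INTO people VALUES (1, 'Jane'), (2, 'Peter'), (3, 'Alice'), (4, 'Tom');
    INSERT INTO siblings VALUES (1, 2), (2, 1), (3, 4), (4, 3);
""")

# Count people whose first name ends in 'e' and who had at least one sibling
# whose first name ends in 'r'.
count = conn.execute("""
    SELECT COUNT(DISTINCT p.id)
    FROM people p
    JOIN siblings s ON s.person_id = p.id
    JOIN people sib ON sib.id = s.sibling_id
    WHERE p.first_name LIKE '%e' AND sib.first_name LIKE '%r'
""").fetchone()[0]

print(count)  # 1 for this toy data (Jane, whose sibling is Peter)
```

With proper indexes, running that query over hundreds of millions of rows is minutes of work, not lifetimes.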
(This isn't the perfect metaphor but it's the best I've got at the moment.)
LLMs are more complicated than that, and more probabilistic, but it's the same idea. We understand how the code part of the task works. It's the data - the artificial neural network and its implications - that we don't understand. That we can't understand. Not just the training data, which is far too much for us to read and digest, but the interconnections among the trillions of parameters and the statistical weights that are computed as the training data are processed.
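For a sense of the scale involved, here's a back-of-the-envelope calculation. The architecture numbers below are illustrative (roughly the published GPT-3 configuration), and the formula is a common approximation, not an exact count:

```python
# Rough parameter count for a GPT-style transformer, using the common estimate
# of ~12 * layers * d_model^2 weights in the attention and feed-forward blocks
# (embeddings and biases ignored).
n_layers = 96      # number of transformer layers (illustrative)
d_model = 12288    # width of each layer's vector representation (illustrative)

params = 12 * n_layers * d_model ** 2
print(f"{params:,}")  # 173,946,175,488 - about 174 billion learned numbers
```

None of those individual numbers means anything on its own; whatever the model "knows" is smeared across all of them at once.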
If someone asks "How did the AI come up with that response?", that's the part where we have to say "We don't know." The artificial neural network is just too big, and stepping through it manually, tracing every single step the algorithm made, while technically not impossible, is just too tedious and time-consuming. And relating the parameters and weights of the neural net back to the original training data would be like trying to unscramble an egg.
Knowing how the code works will get more complicated as we use LLMs themselves to revise or rewrite that code. This isn't a crazy idea, and if it's not happening now, it will happen, perhaps soon. And then the code - the part we thought we understood - will evolve until we no longer know how it works either.
Admittedly, machine learning models based on artificial neural networks aren't my area of expertise. But I'm not completely ignorant about how they work, and I think there are myriad applications for them. For example, I think we'll use them to discover new drug pathways just waiting to be found in existing voluminous clinical datasets (although any such results will have to be carefully verified experimentally by human researchers). But I'm becoming increasingly skeptical of the more grandiose claims made for them - sometimes by people who should know better.
2 comments:
Several times I've attended a professional function where someone gives a talk on how to use AI (really, LLMs) in business and how it's going to revolutionize things. That's probably true. But when they say (as someone did just the other day) "We won't have a problem with hallucinations because we're using our own training data", I know that they don't really understand how LLMs work. This happens a lot.
you know, I used to know (mostly) how my phone worked...