Sunday, August 17, 2025

What do we mean when we say we don't know how Large Language Models work?

Large Language Models - what passes for "AI" these days (there are other kinds, but this is what most people mean when they use the term) - are in effect gigantic autocomplete algorithms. They are implemented using a technique called "machine learning", which is based on artificial neural networks (loosely modeled on how we currently believe the brain works), scaled up to trillions of parameters computed from terabytes of training data, much of which is copyrighted and used without the creators' permission. An LLM produces the output that its algorithm deems most likely to be a response to your input prompt, based on its model of that training data. If that output represents actual truth or facts, it's only because the training data made it seem probable.
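
To make that concrete, here's a toy sketch in C of the core loop, with a made-up vocabulary and made-up probabilities: pick the next word at random, weighted by how likely the model thinks it is. A real LLM computes those probabilities from its trillions of parameters; everything below is invented purely for illustration.

    /* Toy "autocomplete": sample the next token from a probability
     * distribution. The tokens and probabilities are made up; a real
     * model derives them from its learned parameters. */

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *tokens[] = { "blue", "cloudy", "falling", "purple" };
        /* Hypothetical probabilities for completing "The sky is ..." */
        const double probs[] = { 0.55, 0.30, 0.10, 0.05 };
        const int n = 4;

        srand(42);
        double r = (double)rand() / RAND_MAX;   /* uniform in [0, 1] */

        int pick = n - 1;                       /* fall back to the last token */
        double cumulative = 0.0;
        for (int i = 0; i < n; i++) {
            cumulative += probs[i];
            if (r <= cumulative) { pick = i; break; }
        }

        printf("The sky is %s\n", tokens[pick]);
        return 0;
    }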

LLMs "hallucinating" isn't a bug; it's fundamental to how they operate.

I've read several articles on LLMs whose basic theme is "no one knows how LLMs work". This is true, but probably not in the way that most people think. The LLM developers who work for the AI companies know exactly how the software algorithms work - it's not just code, it's code that they, for the most part, wrote. It's the trillions of parameters, derived algorithmically from the terabytes of training data, that are the big mystery.

Imagine a vast warehouse, on the scale of the scenes at the end of Citizen Kane or Raiders of the Lost Ark. That warehouse is full of file cabinets. Each file cabinet is full of paper files about every person who has ever lived in the United States, for as long as the U.S. Government has been keeping records. Your job: tally the number of people in those files whose first name ends in "e" and who had a sibling whose first name ends in "r".

You understand the job. The task is straightforward. The algorithm you could use to accomplish this is obvious. But could you do it? No. The dataset is too ginormous. You literally won't live long enough to get it done, even if you could maintain your interest.

But if all that information were to be digitized, stored in a huge database, the database indexed to link records of family members together, and a program written to answer the original question, a computer could come up with the answer after a few minutes. These kinds of mundane repetitive tasks are what computers excel at.
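
If you'll forgive some invented data structures, here's roughly what that program would look like once the files were digitized and the family links indexed - a sketch, not the real thing:

    /* Toy version of the warehouse tally, assuming the records have
     * already been digitized and linked. All data here is invented. */

    #include <stdio.h>
    #include <string.h>

    struct person {
        const char *first_name;
        int family_id;   /* people with the same family_id are siblings */
    };

    static int ends_with(const char *name, char c)
    {
        size_t len = strlen(name);
        return len > 0 && name[len - 1] == c;
    }

    int main(void)
    {
        struct person people[] = {
            { "Jane",   1 }, { "Peter", 1 },
            { "Steve",  2 }, { "Alice", 2 },
            { "Claire", 3 }, { "Oscar", 3 },
        };
        const int n = sizeof(people) / sizeof(people[0]);

        int tally = 0;
        for (int i = 0; i < n; i++) {
            if (!ends_with(people[i].first_name, 'e'))
                continue;
            for (int j = 0; j < n; j++) {
                if (j != i && people[j].family_id == people[i].family_id
                    && ends_with(people[j].first_name, 'r')) {
                    tally++;        /* first name ends in 'e', sibling's in 'r' */
                    break;
                }
            }
        }
        printf("Count: %d\n", tally);   /* prints "Count: 2" for this toy data */
        return 0;
    }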

(This isn't the perfect metaphor but it's the best I've got at the moment.)

LLMs are more complicated than that, and more probabilistic, but it's the same idea. We understand how the code part of the task works. But it's the data - the artificial neural network and its implications - that we don't understand. We can't understand it. Not just the training data - which is far too much for us to read and digest - but the interconnections among the trillions of parameters that are formed, and the statistical weights that are computed, as the training data are processed.
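
To put it another way, here's a sketch in C of a single artificial "neuron", with made-up numbers. The arithmetic is trivial and completely understood; what no one can explain is why, after training, the trillions of real weights take on the particular values they do.

    /* One artificial "neuron": a weighted sum pushed through a
     * nonlinearity. The weights below are invented; in a real model
     * they are learned during training. (Compile with: cc neuron.c -lm) */

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double inputs[]  = { 0.2, -1.3, 0.7 };
        double weights[] = { 0.9,  0.4, -0.2 };   /* learned during training */
        double bias = 0.1;

        double sum = bias;
        for (int i = 0; i < 3; i++)
            sum += inputs[i] * weights[i];

        double output = 1.0 / (1.0 + exp(-sum));  /* sigmoid activation */
        printf("neuron output: %f\n", output);
        return 0;
    }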

If someone asks "How did the AI come up with that response?", that's the part where we have to say "We don't know." The artificial neural network is just too big, and stepping through it manually - tracing every single step the algorithm made - while technically not impossible, is just too tedious and time-consuming. And relating the parameters and weights of the neural net back to the original training data would be like trying to unscramble an egg.

Knowing how the code works will get more complicated as we use LLMs themselves to revise or rewrite that code. This isn't a crazy idea, and if it's not happening now, it will happen, perhaps soon. And then the code - the part we thought we understood - will evolve to the point where we no longer know how it works either.

Admittedly, machine learning models based on artificial neural networks aren't my area of expertise. But I'm not completely ignorant about how they work. I think there are myriad applications for them. For example, I think we'll use them to discover new drug pathways waiting to be found in existing voluminous clinical datasets (although any such results will have to be carefully verified experimentally by human researchers). But I'm becoming increasingly skeptical about the more grandiose claims made for them - sometimes by people who should know better.

Saturday, August 16, 2025

Events 2

I read a transcript of a science explainer by Dr. Sabine Hossenfelder about physicist David Deutsch's "Constructor Theory", which I had not heard of before, and how it accounts for time.


It sounds like just the 180° opposite of what I've been talking about: a model of physics that looks a lot like the kind of real-time systems I work on, used as a basis for reality. The shortest time period (Planck time?) is the recycle time of a kind of null task - a term right out of Real-Time Operating Systems. That's basically how I think of the world around me - based solely on decades of professional experience - but it seems weird to think of it as a legitimate Theory of Everything.

Down deep, real-time computer systems - with their asynchronous, concurrent, and parallel behavior - are a lot more non-deterministic than people might think. That's one of the reasons such systems are hard to debug: a bug might only reveal itself under a particular timing or a particular order of events. Determinism is a kind of emergent property, created by engineers hiding the details under the hood from the user - kind of like Newtonian physics layered on top of relativity and quantum mechanics.
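
Here's a classic demonstration of that non-determinism, sketched in C with POSIX threads: two threads increment a shared counter with no synchronization, and the final total depends on exactly how their operations interleave. Run it a few times and you may get different answers.

    /* Unsynchronized shared counter: a deliberate data race.
     * (Compile with: cc -pthread race.c) */

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;      /* unprotected read-modify-write: a data race */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("expected 2000000, got %ld\n", counter);
        return 0;
    }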

Once you become accustomed to architecting, implementing, and debugging such systems, it's easy - it was for me, anyway - to start seeing the entire world through the same lens. Maybe I should not be surprised that there's one candidate for a Theory of Everything that takes this viewpoint.

Friday, August 08, 2025

Events

I've spent decades as a software/firmware developer of real-time systems, going all the way back to the 1970s when I was writing software in the assembler languages of the IBM 360/370 and the PDP-11. The term "real-time" always seemed kind of ironic, since it is easy, when closely scrutinizing such systems - with their asynchronous, concurrent, and parallel behavior - to come to the conclusion that time doesn't exist. Only ordered events. We don't have a way to measure time, except by counting events produced by some oscillator that ultimately derives its periodicity from nature. We call such devices "clocks". Since the only way to test the accuracy and precision of a clock is with a better clock, it's turtles all the way down.
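
At bottom, such a clock is just that: a counter driven by some periodic event source. Here's a trivial sketch in C, with the oscillator simulated and the usual 32,768 Hz watch-crystal rate assumed:

    /* A clock as nothing more than an oscillator and a counter: count
     * ticks from a periodic source and divide down to the unit you want.
     * The tick source is simulated here; in hardware it would be a
     * crystal or an atomic resonance. */

    #include <stdio.h>

    #define TICKS_PER_SECOND 32768UL   /* typical watch-crystal frequency */

    int main(void)
    {
        unsigned long ticks = 0;

        /* Pretend the oscillator interrupted us this many times. */
        for (unsigned long i = 0; i < 5UL * TICKS_PER_SECOND; i++)
            ticks++;                   /* one count per oscillator event */

        printf("elapsed \"time\": %lu seconds\n", ticks / TICKS_PER_SECOND);
        return 0;
    }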

Turing Award-winning computer scientist Leslie Lamport even wrote what came to be a classic paper on this topic, "Time, Clocks, and the Ordering of Events in a Distributed System" (Communications of the ACM, vol. 21, no. 7, July 1978). He proposed a "logical clock": simply a counter that each process increments with every event and reconciles against the timestamps carried on incoming messages, allowing events to be placed in a consistent order. I remember reading this paper as a graduate student. And again, later. And again, even later. I may read it again today.
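
Here's a minimal sketch of Lamport's logical clock in C, following the rules from the paper: tick the local counter for each event, stamp outgoing messages with it, and on receipt advance the counter past the timestamp the message carries. The two "processes" here are just structs in one program, for illustration.

    /* Lamport logical clocks, reduced to their essentials. */

    #include <stdio.h>

    struct process {
        const char *name;
        long clock;
    };

    static long local_event(struct process *p)
    {
        return ++p->clock;             /* rule 1: tick before each event */
    }

    static long send_message(struct process *p)
    {
        return local_event(p);         /* timestamp carried by the message */
    }

    static void receive_message(struct process *p, long timestamp)
    {
        if (timestamp > p->clock)
            p->clock = timestamp;      /* rule 2: never run behind the sender */
        local_event(p);
    }

    int main(void)
    {
        struct process a = { "A", 0 }, b = { "B", 0 };

        local_event(&a);               /* A does some work */
        long ts = send_message(&a);    /* A sends to B */
        receive_message(&b, ts);       /* B receives: its clock jumps past A's */
        local_event(&b);

        printf("A=%ld B=%ld\n", a.clock, b.clock);
        return 0;
    }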

Years ago I mentioned this line of thought to a colleague of mine who happened to have a Ph.D. in physics and had worked at Fermilab. (It's handy to keep such folks around just for this reason.) He immediately brought up the fact - now obvious to me - that time must exist: Einstein's special and general relativity.

Einstein's theories of SR and GR have been experimentally verified time and again (no pun intended). You can synchronize two atomic clocks side by side, take one up to the top of a mountain (where it experiences less gravity because it's farther from the center of the Earth, and hence its time runs faster: that's GR) and back down, and find that the two now differ by just the predicted amount. This experiment has been done many times.

The U.S. Global Positioning System (and indeed every other Global Navigation Satellite System) works by just transmitting the current time to receivers on the Earth. Fundamentally, that's it. All the heavy lifting, computationally, is done by the GPS receiver in your hand. But the atomic clocks inside every GPS satellite have to be carefully adjusted - partly before launch, partly by controllers on the ground - to account for GR (because the satellites in their orbits are farther from the center of the Earth than you are, so their clocks run faster) and for SR (because the satellites are moving much faster relative to you, so their clocks run slower). GPS wouldn't give useful results if these corrections weren't applied.
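
Here's a rough back-of-the-envelope check of those two effects in C, using the standard weak-field approximations and approximate published values for the GPS orbital radius and speed. The point is just the magnitudes: GR gains tens of microseconds per day, SR loses a few, and the net is roughly +38 microseconds per day.

    /* Relativistic rate offsets of a GPS satellite clock relative to
     * the ground, to first order. Orbital values are approximate. */

    #include <stdio.h>

    int main(void)
    {
        const double GM = 3.986004418e14;   /* Earth's gravitational parameter, m^3/s^2 */
        const double c  = 2.99792458e8;     /* speed of light, m/s */
        const double r_ground = 6.371e6;    /* mean Earth radius, m */
        const double r_orbit  = 2.656e7;    /* approx. GPS orbital radius, m */
        const double v_orbit  = 3.874e3;    /* approx. GPS orbital speed, m/s */
        const double seconds_per_day = 86400.0;

        /* GR: higher gravitational potential -> satellite clock runs faster */
        double gr = GM * (1.0 / r_ground - 1.0 / r_orbit) / (c * c);

        /* SR: orbital speed -> satellite clock runs slower */
        double sr = -(v_orbit * v_orbit) / (2.0 * c * c);

        printf("GR:  %+.1f microseconds/day\n", gr * seconds_per_day * 1e6);
        printf("SR:  %+.1f microseconds/day\n", sr * seconds_per_day * 1e6);
        printf("Net: %+.1f microseconds/day\n", (gr + sr) * seconds_per_day * 1e6);
        return 0;
    }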

The resonant frequency of the cesium-133 atom defines the "second" in the International System of Units (SI). Count off exactly 9,192,631,770 cycles of the microwave radiation corresponding to the transition between the two hyperfine levels of cesium-133's ground state, and that's one second. If cesium is lying to us, we'll never know.

Or maybe we would. Experimental atomic clocks using elements like ytterbium are already running in national metrology labs. These are called "optical" atomic clocks because they are driven by lasers at hundreds of terahertz instead of microwaves at gigahertz, so one cycle lasts femtoseconds instead of a tenth of a nanosecond. The time is near when the definition of the SI second will be changed to use these clocks.

Clocks so precise that their positions have to be determined by careful surveying, because their readings differ if the altitude of the laboratory optical bench changes by a centimeter, thanks to GR.

Clocks that are still nothing more than oscillators and counters.

(I took the photograph below in 2018: a survey marker embedded in the concrete floor of an optical atomic clock laboratory at NIST's Boulder, Colorado facility.)
