The other day I wanted to look up a specific IBM PS/2 model, a circa 1992 PS/2 Server system. So I punched the model into Google, and got this:
That did not look quite right, since the machine I was looking for had 486 processors (yes, plural). And it most certainly did use Micro Channel (MCA).
Alright, let’s try again:
Simply re-running the identical query produces a different summary, although the AI still claims that the PS/2 Model 280 is an ISA-based 286 system. Maybe the third time is the charm?
The AI is really quite certain that the PS/2 Model 280 was a 286-based system released in 1987, but I was looking for a newer machine. Interestingly, the first time around the AI claimed that Model 280 had 1 MB RAM expandable to 6 MB, yet now it supposedly has only 640 KB RAM. But the AI seems sure that Model 280 had a 1.44 MB drive and VGA graphics.
What if we try again? After a couple of attempts, yet another answer pops up:
Oh look, now the PS/2 Model 280 is a 286 expandable to 128 MB RAM. Amazing! Never mind that the 286 was architecturally limited to 16 MB.
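For the record, the 16 MB limit is simple arithmetic: the 80286 has a 24-bit physical address bus. A quick check (Python, just to spell out the numbers):

    # The 80286 drives 24 address lines, so its physical address space is:
    print(2 ** 24)             # 16777216 bytes
    print(2 ** 24 // 2 ** 20)  # 16 MB -- a far cry from the claimed 128 MB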
Even better, the AI now tells us that “PS/2 Model 280 was a significant step forward in IBM’s personal computer line, and it helped to establish the PS/2 as a popular and reliable platform.”
The only problem with all that? There is no PS/2 Model 280, and never was. I simply had the model number wrong. The Google AI just “helpfully” hallucinates something that at first glance seems quite plausible, but is in fact utter nonsense.
But wait, that’s not the end of the story. If you try repeating the query often enough, you might get this:
That answer is actually correct! “Model 280 was not a specific model in the PS/2 series”, and there was in fact an error in the query.
Here’s another example of a correct answer:
Unfortunately the correct answer comes up maybe 10% of the time when repeating the query, if at all. In the vast majority of attempts, the AI simply makes stuff up. I do not consider made-up, hallucinated answers useful; in fact, they are worse than useless.
This minor misadventure might provide a good window into AI-powered Internet search. To a non-expert, the made-up answers will seem highly convincing, because there is a lot of detail and overall the answer does not look like junk.
An expert will immediately notice discrepancies in the hallucinated answers and will, for example, consult the List of IBM PS/2 Models article on Wikipedia, which very quickly establishes that there is no Model 280.
The (non-expert) users who would most benefit from an AI search summary will be the ones most likely to be misled by it.
How much would you value a research assistant who gives you a different answer every time you ask, and whose incorrect answers look, if anything, more “real” than the occasional correct ones?
When Google says “AI responses may include mistakes”, do not take it lightly. The AI generated summary could be utter nonsense, and just because it sounds convincing doesn’t mean it has anything to do with reality. Caveat emptor!
I’ve often noticed that LLMs are really bad at “admitting” that they don’t “know” something. They’ll pretty much always give a plausible-looking (at least at first glance) answer to more-or-less any question that’s not obviously nonsense…
Of course, that also means they’ll hallucinate plausible answers to questions based on incorrect premises.
I can only imagine the implications for “vibe coding”…
Re “vibe coding”: I wonder how the quality of AI-generated code depends on the skills of the person using the AI. Do the comparisons and whatnot take into account that whoever is most likely to write the queries is probably far from an expert in the field?
Speaking of incorrect results: it would be great if search engines kept track of changes in the major things people search for. When commenting, I was about to use the autofill feature in Firefox and got annoyed that it had incorrectly saved some random junk as an alternative to my name. Googled it; the first result was totally incorrect. Limiting results to at most one year old gave the correct suggestion (type the first letter of the incorrect suggestion, hold Shift, select the incorrect suggestion, keep holding Shift and press Delete. Poof, the incorrect suggestion is gone!)
This is why me avoids the term “AI” — it’s more of an AS, really —
in favour of “ML” (machine learning).
In fact, me’s always understood that ML was one of the things to come
out of the grand *failure* of the {6,7}0s AI projects. (The problem?
after extensive research into replicating intelligence, they found out
they couldn’t really define “intelligence” at all! Oops.)
Either way, the main diff between contemporary “AI” and the 80s stuff is
that the contemporary version runs with a *much* bigger database
(anything it can scrape off the interwebs); it still operates on the
basis of statistical inference (if me has me terminology right). What
could *possibly* go wrong…?
The question WTH one would even want an “AI” modeled on humans me’ll
leave as an exercise for fellow commenta^W^Wthe reader.
OTOH, modern models still have not learned one simple logic rule (one of the cornerstones of the scientific method): “No matter how many negative statements you had before, after one verified positive statement all those negative statements are false.” So from my point of view they are just giant semantic networks, not intellect.
The way it was explained to me is that LLMs have absolutely zero concept of “I’m certain about this” or “I am very uncertain about this”. It is really just a statistical machine generating the most likely sequence of tokens. The LLM does not know what it knows or doesn’t know, it just plows ahead and produces something. That arguably completely undermines any utility LLMs might have in the area of research, because if what you get could be (and often is) complete garbage, why even bother?
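To make that concrete, here is a toy sketch (emphatically not a real LLM; the vocabulary and probabilities below are entirely made up) of next-token sampling. Note that nothing in the loop represents certainty, and repeated runs cheerfully produce different answers:

    import random

    # Toy next-token table with invented probabilities. A real LLM works over
    # tens of thousands of tokens, but the basic mechanism is the same: pick
    # the next token according to a probability distribution, then repeat.
    NEXT_TOKEN = {
        "Model": [("280", 0.4), ("80", 0.3), ("70", 0.3)],
        "280":   [("was", 1.0)],
        "80":    [("was", 1.0)],
        "70":    [("was", 1.0)],
        "was":   [("a", 1.0)],
        "a":     [("286-based", 0.5), ("386-based", 0.5)],
    }

    def generate(token, steps=4):
        out = [token]
        for _ in range(steps):
            choices = NEXT_TOKEN.get(out[-1])
            if not choices:
                break
            tokens, weights = zip(*choices)
            out.append(random.choices(tokens, weights=weights)[0])
        return " ".join(out)

    # Different run, different answer -- and no code path for "I don't know".
    print(generate("Model"))  # e.g. "Model 280 was a 386-based"

Sampling tweaks (temperature and the like) change how often the likeliest token wins, but none of them add a notion of truth or confidence.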
Numerous times I have observed that LLM translation would rather produce a nonsensical string of digits than just say “I don’t know what this means”.
I agree that artificial it may be, intelligent it’s certainly not. But Google calls it AI so… that’s their problem if they want to give AI a bad name.
AI is definitely something that’s been around for 50+ years. Browsing old magazines, I noticed how 386-based PCs were touted as “great for AI” back in 1986-87.
Transport Tycoon gave “AI” a bad name way before Google was around =)
Bad terminology, like all bad habits, is quite pervasive and stubborn.
@zeurkous:
Haha, games that generate infrastructure seem to always struggle to some extent.
The spiritual successor to Transport Tycoon would be Transport Fever 2, and the way the “AI” expands cities is sometimes really weird.
Also, Workers & Resources: Soviet Republic has a mode that optionally generates pre-built “old” cities and roads when starting a new game (at least on a random map; can’t remember if it can do this on custom maps?), and in some cases the infrastructure clearly shows that the “AI” struggled to build things. Bonus fun fact: if you immediately pause the game when starting a new game with pre-built cities and roads, you can sometimes see the animation for removing a road where the “AI” at first decided to build a road and then regretted its choice :O
The whole most recent AI craze started with something invented at Google for one and ONE task only: machine translation. That’s it, nothing less, nothing more.
It turned out that it can translate between many different things, not just between human languages, but also between “description of an algorithm in English” and “description of an algorithm in Python”, and it can even “translate” from the “name of a PS/2 system” into the “expanded specification of said PS/2 system”.
But when you ask it to translate from the name of a model that doesn’t exist… well… it does a VERY credible imitation of a student during an exam in a similar situation: it “tries to read the answer in the eyes of the examiner”.
Whether to call it AI or not is a good question, but there is no thinking or deduction involved, just memory.
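(For what it’s worth, the translation framing is easy to try out yourself. Here is a minimal sketch using the Hugging Face transformers library, assuming transformers plus a PyTorch backend are installed; t5-small is a real public checkpoint, but treat this as illustrative rather than definitive:

    # The same seq2seq machinery that powers machine translation, here doing
    # English -> German. The model produces *a* fluent translation for any
    # input string -- including statements about machines that never existed.
    from transformers import pipeline

    translator = pipeline("translation_en_to_de", model="t5-small")
    result = translator("The PS/2 Model 280 uses Micro Channel.")
    print(result[0]["translation_text"])

The model never balks at the nonexistent Model 280; it just translates.)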
It doesn’t matter what it is called. Is it useful? So far, the answer is no. Sure, some developer could carefully craft a demo that seems to work for one problem, but it will be a long time before it can be relied on.