Here I illustrate my two most recent interactions with GPT-powered customer support. Both were awful failures, far worse than the pre-GenAI experience. Indeed, I had to fall back on old-fashioned Google search to get help. This is typical of what hundreds of millions of users now experience every day.
➡️ First example:
I receive payments via Stripe. A contact asked me to pay him through Stripe, so I asked their support how to send a payment, as opposed to receiving one. After 30 minutes of prompts to the AI support bot, I got nowhere: I could not extract a meaningful answer (see featured image). In the end I paid my contact through a different platform.
➡️ Second example:
A VC I had started interacting with sent me a few messages, but I never received any of them. I tried to contact my email provider, only to face a GenAI bot. I posed a precise question: his email address is xyz, mine is abc, his messages do not even show up in my spam folder, and I did not block his domain; how do I fix this? After receiving irrelevant answers, I asked point blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrased the question. In the end I told my contact to send messages to an alternate email address.
LLM 2.0 has been brewing for a long time. Now it is becoming mainstream and replacing LLM 1.0, thanks to its ability to deliver better ROI to enterprise customers at a much lower cost. Much of the past resistance to its adoption lay in one question: how can you possibly do better with no training, no GPU, and zero parameters? It is as if a long tradition had convinced everyone that multi-billion-parameter models are mandatory.
Yet this machinery trains models on tasks irrelevant to the end purpose, relying on self-reinforcing evaluation metrics that fail to capture desirable qualities such as depth, conciseness, or exhaustiveness. Not that standard LLMs are bad: I use OpenAI and Perplexity a lot for code generation, for writing my investor deck, and even to answer advanced number theory questions. But their strength comes from all the sub-systems they rely on, not from the central deep neural network. Remove or simplify that part, and you get a product far easier to maintain and upgrade, costing far less to develop, and, if done right, delivering more accurate results without hallucinations, without prompt engineering, and without the need to double-check the answers. That last point matters: errors are often subtle and easily overlooked.
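To make the "sub-systems" point concrete, here is a minimal sketch in Python of a retrieval-first pipeline: a plain keyword index over a small corpus produces a grounded, source-attributed answer with no neural network in the loop. The corpus, document IDs, and function names are all hypothetical illustrations, not the architecture of any actual product.

```python
# Toy sketch (hypothetical corpus and names): a keyword-overlap retriever
# over a small knowledge base answers queries directly, with no neural
# network involved. Real sub-systems would index a large enterprise corpus.

from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical knowledge base, keyed by document ID.
CORPUS = {
    "payments-out": "To send a payment, open the Payouts tab and add a recipient.",
    "payments-in": "To receive a payment, share your payment link with the payer.",
    "email-delivery": "If messages never arrive, check the provider's delivery logs.",
}


def retrieve(query: str) -> tuple[str, str]:
    """Return the (doc_id, text) with the largest token overlap with the query."""
    q = Counter(tokenize(query))

    def overlap(item: tuple[str, str]) -> int:
        _, text = item
        return sum((q & Counter(tokenize(text))).values())

    return max(CORPUS.items(), key=overlap)


def answer(query: str) -> str:
    """Assemble a grounded answer: the retrieved snippet plus its source."""
    doc_id, text = retrieve(query)
    return f"{text} [source: {doc_id}]"


if __name__ == "__main__":
    print(answer("How can I send a payment to someone?"))
    # -> To send a payment, open the Payouts tab and add a recipient. [source: payments-out]
```

Because every answer is assembled from indexed source text, there is nothing to hallucinate and each response carries its provenance, which is the kind of accuracy and auditability the heavy central model does not provide on its own.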
A good LLM 1.0 still saves a lot of time but requires significant vigilance. There is plenty of room for improvement, but more parameters and black-box DNNs have shown their limitations.