If the screwdriver were invented by the tech industry today, it would be widely deployed for a variety of tasks, including hammering nails. Since the debut of ChatGPT, there has been growing fervor for, and backlash against, large language models (LLMs). Indeed, many applications of the technology seem ill-suited to it, and its capabilities are overhyped given its frequent lack of veracity. This is not to say there are no great uses for an LLM, but you should answer some key questions before going full bore.
Will an LLM be better or at least equal to human responses?
Does anyone like those customer service chatbots that can’t answer any question that isn’t already on the website’s front page? On the other hand, talking to a customer service representative who just reads a script and isn’t empowered to help is equally frustrating. Any deployment of an LLM should be tested to confirm it is equal or superior to the chatbot or human responses it is replacing.
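One simple way to run that comparison is a blind side-by-side test: show raters an anonymized LLM response and an anonymized human (or existing chatbot) response to the same question, and tally which they prefer. The sketch below is illustrative only; the function name and sample ratings are hypothetical, and the data would come from your own raters.

```python
from collections import Counter

def tally_preferences(ratings):
    """Summarize blind rater picks as fractions.

    ratings: a list of 'llm', 'human', or 'tie' choices, one per
    blind comparison of two anonymized responses.
    """
    counts = Counter(ratings)
    total = len(ratings)
    return {choice: counts[choice] / total for choice in ("llm", "human", "tie")}

# Hypothetical rater data for six blind comparisons.
ratings = ["llm", "human", "llm", "tie", "llm", "human"]
print(tally_preferences(ratings))  # {'llm': 0.5, 'human': 0.333..., 'tie': 0.166...}
```

If the LLM cannot at least tie the responses it is replacing on a sample like this, the deployment needs rethinking before it goes live.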
What is the liability exposure?
In a litigious society, any new process or technology has to be evaluated against its potential for legal exposure. There are obvious places for caution, like medicine, law, or finance, but what about an LLM-generated answer that offers policy guidance or advice that is misleading or impermissible? Bad company policies and management have already produced class action lawsuits over human-generated responses. An improperly trained or constrained LLM could generate such responses for a far larger number of users and create unintended liability.
Is it actually cheaper?
Sure, it is easy to measure the cost of a subscription to a general LLM like ChatGPT, but more specific custom systems can have costs beyond just the compute power. What about the staff and other infrastructure needed to maintain and debug the system? You can hire quite a few customer service personnel for the price of one AI expert. Additionally, ChatGPT and similar services appear to be subsidized by investment at the moment. Presumably, at some point they will want to turn a profit, and your costs could then go up. Is it actually cheaper, and will it stay so for the life of your system?
How will you maintain it?
Most LLM systems will be custom-trained on specific data sets. A disadvantage of the neural networks on which LLMs rely is that they are notoriously difficult to debug. As the technology progresses, a model may eventually be able to update (or unlearn) something it has learned, but for now this can be quite difficult. What is your process for regularly updating the LLM, especially when it gives a bad response?
What is your testing process?
A key benefit of an LLM is that you don’t have to anticipate every possible permutation of a question for it to provide a credible answer. However, “credible” doesn’t mean correct. At a minimum, the most common questions and their permutations should be tested. If your LLM is replacing a human or existing machine process, the questions people are asking today are a good data set to start with.
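In practice, that test suite can be as simple as a list of real questions paired with the facts a correct response must contain. The sketch below assumes a hypothetical answer() function wrapping your deployed system; here it is stubbed with canned responses so the harness itself is runnable.

```python
def answer(question):
    # Stand-in for the real system; replace with a call to your LLM.
    canned = {
        "what are your hours?": "We are open 9am-5pm, Monday to Friday.",
        "how do i reset my password?": "Use the 'Forgot password' link on the login page.",
    }
    return canned.get(question.lower(), "I'm not sure.")

TEST_CASES = [
    # (question, facts the response must contain)
    ("What are your hours?", ["9am", "5pm"]),
    ("How do I reset my password?", ["Forgot password"]),
]

def run_tests(cases):
    """Return a list of (question, missing facts) for failing cases."""
    failures = []
    for question, required in cases:
        response = answer(question)
        missing = [fact for fact in required if fact not in response]
        if missing:
            failures.append((question, missing))
    return failures

print(run_tests(TEST_CASES))  # [] means every required fact was present
```

Checking for required facts rather than exact wording matters because an LLM rarely phrases the same answer twice; the suite should also be rerun after every model update, tying back to the maintenance question above.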
There is an old proverb of dubious provenance that translates roughly to “slow down, I’m in a hurry.” Not everything will be a great fit for LLMs, and there is ample evidence (no, Grammarly, I don’t want to sound more positive!) that enthusiasm has outstripped capabilities. However, by measuring quality and economy, and by establishing decent maintenance and testing procedures, LLMs can be a valuable tool in many different use cases.