Large language models (LLMs) like ChatGPT, Claude 2, Falcon, and now Llama 2 have captured the public imagination with their ability to generate human-like text. Businesses are racing to leverage these powerful AI tools to create next-generation applications. But while the potential is immense, there are also significant risks involved in deploying LLMs in production environments. In this article, we’ll explore the key challenges developers face when going from LLM proof-of-concept to building real products.

Hallucinating Instead of Knowing

A persistent issue with LLMs is their tendency to hallucinate – making up plausible-sounding but totally incorrect or nonsensical statements. Unlike humans, LLMs have no actual understanding of the words they generate. They simply predict sequences based on the statistical patterns they’ve observed in training data. So you may ask a question and get a very convincing but fabricated response. This is obviously problematic for any application relying on accuracy.

Possible solutions involve grounding the LLM’s knowledge by retrieving and summarizing relevant data from knowledge bases before generating text. This human-in-the-loop approach produces more factual, supported responses, though even retrieval techniques have limitations in coverage and can propagate biases. Many experts argue LLMs fundamentally lack the reasoning capabilities needed for truly trustworthy answers.

Beware Prompt Engineering’s Limits

Given their statistical nature, LLMs require carefully engineered prompts to shape their outputs. But prompt engineering has limits. While you can nudge an LLM’s response in a certain direction, the wording will always impact the result in ways that are challenging to control fully. So prompt engineering often involves an inefficient process of trial-and-error tweaking.

Some argue prompt engineering equates to programming. But unlike code, prompts are ambiguous and fail to provide the deterministic instructions computers need. Better prompt engineering tools would help, but tension remains between natural language’s expressiveness and a computer’s need for specificity. This suggests prompt engineering alone may not suffice for complex applications.

Prompt Injection Opens Security Risks

Allowing users to input natural language queries also opens the door to malicious exploit through prompt injection. If prompts can deliberately alter an LLM’s behavior, bad actors could potentially induce dangerous outputs. Imagine a medical app that generates dosage recommendations. A poisoned prompt could lead to hazardous advice.

There are tactics to mitigate these risks, like filtering and sanitizing user input. But prompt injection remains an inherent vulnerability for LLMs. For any safety-critical application, extensive testing and validation would be needed to avoid potentially disastrous edge cases an injected prompt could trigger.

Beware the Cost of Large Models

Developing with LLMs also demands substantial computational resources for training and inference. Leading models like GPT4 require thousands of dollars per month to access, putting them out of reach for many. Options are expanding with open source models and cloud services, but productionizing LLMs still involves non-trivial costs, especially at scale.

Not every application will need massive models with billions of parameters. But developers should still architect with efficiency in mind, and understand LLMs incur a tax on resources.

Challenges Remain in Production Use

None of this is to say LLMs can’t or won’t drive amazing new applications. Their versatility enables innovation across many verticals. But moving from an impressive demo to a reliable product involves clearing hurdles. Factfulness, controllability, security, and costs are issues developers must consider. Progress is being made both in techniques to stabilize models and entirely new architectures better suited for rigors of production. There is still much research needed, but the technology is advancing rapidly.

For now, temper expectations, rigorously test, and understand tradeoffs in leveraging LLMs versus other tools. With a thoughtful approach, their profound potential can be harnessed to build the future.

About the Author

Priyank Kapadia is the Product and Technology Partner at Accolite Digital.

This article was originally published in CXOToday.

Read more at:

https://www.cxotoday.com/specials/the-promise-and-peril-of-building-with-large-language-models/