The Promise and Peril of Building with Large Language Models

Large language models (LLMs) like ChatGPT, Claude 2, Falcon, and now Llama 2 have captured the public imagination with their ability to generate human-like text. Businesses are racing to leverage these powerful AI tools to create next-generation applications. But while the potential is immense, there are also significant risks involved in deploying LLMs in production environments. In this article, we’ll explore the key challenges developers face when going from LLM proof-of-concept to building real products.

Hallucinating Instead of Knowing

A persistent issue with LLMs is their tendency to hallucinate – making up plausible-sounding but totally incorrect or nonsensical statements. Unlike humans, LLMs have no actual understanding of the words they generate. They simply predict sequences based on the statistical patterns they’ve observed in training data. So you may ask a question and get a very convincing but fabricated response. This is obviously problematic for any application relying on accuracy.

Possible solutions involve grounding the LLM’s knowledge by retrieving and summarizing relevant data from knowledge bases before generating text. This human-in-the-loop approach produces more factual, supported responses, though even retrieval techniques have limitations in coverage and can propagate biases. Many experts argue LLMs fundamentally lack the reasoning capabilities needed for truly trustworthy answers.

Beware Prompt Engineering’s Limits

Given their statistical nature, LLMs require carefully engineered prompts to shape their outputs. But prompt engineering has limits. While you can nudge an LLM’s response in a certain direction, the wording will always impact the result in ways that are challenging to control fully. So prompt engineering often involves an inefficient process of trial-and-error tweaking.

Some argue prompt engineering equates to programming. But unlike code, prompts are ambiguous and fail to provide the deterministic instructions computers need. Better prompt engineering tools would help, but tension remains between natural language’s expressiveness and a computer’s need for specificity. This suggests prompt engineering alone may not suffice for complex applications.

Prompt Injection Opens Security Risks

Allowing users to input natural language queries also opens the door to malicious exploit through prompt injection. If prompts can deliberately alter an LLM’s behavior, bad actors could potentially induce dangerous outputs. Imagine a medical app that generates dosage recommendations. A poisoned prompt could lead to hazardous advice.

There are tactics to mitigate these risks, like filtering and sanitizing user input. But prompt injection remains an inherent vulnerability for LLMs. For any safety-critical application, extensive testing and validation would be needed to avoid potentially disastrous edge cases an injected prompt could trigger.

Beware the Cost of Large Models

Developing with LLMs also demands substantial computational resources for training and inference. Leading models like GPT4 require thousands of dollars per month to access, putting them out of reach for many. Options are expanding with open source models and cloud services, but productionizing LLMs still involves non-trivial costs, especially at scale.

Not every application will need massive models with billions of parameters. But developers should still architect with efficiency in mind, and understand LLMs incur a tax on resources.

Challenges Remain in Production Use

None of this is to say LLMs can’t or won’t drive amazing new applications. Their versatility enables innovation across many verticals. But moving from an impressive demo to a reliable product involves clearing hurdles. Factfulness, controllability, security, and costs are issues developers must consider. Progress is being made both in techniques to stabilize models and entirely new architectures better suited for rigors of production. There is still much research needed, but the technology is advancing rapidly.

For now, temper expectations, rigorously test, and understand tradeoffs in leveraging LLMs versus other tools. With a thoughtful approach, their profound potential can be harnessed to build the future.

About the Author

Priyank Kapadia is the Product and Technology Partner at Accolite Digital.

This article was originally published in CXOToday.

See our latest announcements and updates

View all news

Feb 1, 2024

Accolite and Bounteous Join Forces, Forming Global Leader in Digital Transformation Services

Nov 15, 2023

Security Boulevard

An assessment of how ‘Gen-AI’ has begun to transform DevSecOps

Oct 31, 2023

Optimize the existing. Conceptualize the new. Reveal the unknown.

AI Based Solution to Enable On Demand Forecasting

Deeper knowledge and broader capabilities in every sector

Modernizing the Omnichannel Journey & Providing a Superior Customer Experience

Our latest thinking and perspectives

The Accolite Podcast

Fearless ideas meet exceptional results

Accolite Ignites Global Enterprises to be Future-Ready with a Generative AI Centre of Excellence

Unlimited opportunity for growth and reinvention

In the News