
AI Buyers Guide for HR: Navigating the Pitfalls of ‘Hallucinating’ GPT Models (#3 in Series)
by Frank P. Ginac, CTO at TalentGuard

Welcome to the next installment in my series, AI Buyers Guide for Human Resources (HR) Professionals. This is article number 3 in the series. My objective for this series is to arm HR professionals responsible for selecting, deploying, and managing AI-based HR Tech solutions in the enterprise with the knowledge they need to perform these tasks confidently. The information shared here is valuable not just to HR professionals but to any buyer of AI-based software. I hope you find it helpful, and I welcome your feedback and comments. I would greatly appreciate it if you’d share this article and the others in the series with your network.

This article discusses the capabilities and limitations of Large Language Models (LLMs) like ChatGPT, highlighting that while they are powerful tools for certain tasks, they are not universal AI solutions. It addresses common misconceptions about LLMs, such as their use as a “source of truth,” and illustrates this with examples from legal and medical fields where reliance on LLMs led to inaccuracies. The article emphasizes the need for a balanced approach to AI application, combining LLMs’ strengths with human expertise and other AI techniques for effective and responsible use, particularly in fields like HR Tech.

ChatGPT continues to dominate AI news. With hundreds of millions of users at last count, it seems everyone has jumped on the LLM bandwagon. The media paints it as a universal AI that can solve just about any problem it’s tasked with. While it is impressive, and I expect it will continue to improve as users find more and more creative ways to apply it, it can’t (and never will) solve every kind of task that the vast array of AI/ML approaches and algorithms can solve today.

Of deep concern to practitioners and researchers is the misconception of its capabilities and its use as a “source of truth.” Consider the lawyers who used ChatGPT to write a legal brief that included six fabricated citations. After being sanctioned by the court, the law firm issued the statement, “We made a good faith mistake in failing to believe that a piece of technology could be making up cases out of whole cloth.” [1]

More alarming are concerns about the use of LLMs in medical diagnoses, the creation of treatment plans, and the like. Consider a recent research paper published by the Journal of Medical Internet Research that evaluated the accuracy of ChatGPT’s diagnoses and found that “ChatGPT achieved an overall accuracy of 71.7% across 36 clinical vignettes.” [2]

Deep Learning Neural Networks and LLMs perform very well at certain tasks, in some cases much better than other AI approaches, but they have their limits, as the cases above show. LLM stands for “Large Language Model”: a machine learning model trained on data to complete a sequence. Tasked with completing the sequence “Jack and Jill went…”, a well-trained LLM will respond with “…up the hill to fetch a pail of water.” LLMs can be trained to complete sequences from many domains: natural languages, programming languages, symbolic languages, and more. Think back to the failures described above; does it now make sense why LLMs are not well suited to every task? To be fair, ChatGPT is much more than a bare LLM. It has undergone further fine-tuning to learn how to follow instructions. At its core, however, it is still an LLM, with all of an LLM’s limitations.
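To make the “complete the sequence” idea concrete, here is a minimal sketch in Python. A toy bigram model is nothing like a real LLM (which uses a neural network trained on vast token corpora), but it illustrates the same core objective: predict a plausible continuation from patterns in the training data, with no notion of whether that continuation is true.

```python
from collections import defaultdict
import random

# Toy illustration of the sequence-completion objective at the heart of an
# LLM. This bigram model simply learns which word tends to follow which;
# it is a sketch of the idea, not of how real LLMs are built.

training_text = (
    "jack and jill went up the hill to fetch a pail of water "
    "jack fell down and broke his crown and jill came tumbling after"
)

# Record which word follows which in the training data.
transitions = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word].append(next_word)

def complete(prompt, length=8, seed=0):
    """Continue the prompt by repeatedly sampling a plausible next word."""
    random.seed(seed)
    out = prompt.split()
    for _ in range(length):
        candidates = transitions.get(out[-1])
        if not candidates:
            break  # never saw this word during training; stop generating
        out.append(random.choice(candidates))
    return " ".join(out)

print(complete("jack and jill went"))
# → jack and jill went up the hill to fetch a pail of
```

Notice that the model produces fluent output because the pattern appeared in its training data, not because it “knows” anything about Jack or Jill; that distinction is exactly what makes an LLM a poor source of truth.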

In my world, HR Tech, I spoke to an analyst recently who posited that the only AI of value is “big data” AI — Deep Learning Neural Networks and LLMs that require vast amounts of data to train. I hear more and more VCs, analysts, CEOs, and other business decision-makers making the same ill-informed claim. When it comes to deciding which approach or algorithm to use to solve a particular problem or even whether or not AI should be applied at all, I turn to the following advice from experts in the field:

“Any problem that can be solved by your in-house expert in a 10–30 minute telephone call can be developed as an expert system.” [3]

“If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or soon… Such tasks are ripe for automation. However, they often fit into a larger context or business process; figuring out these linkages to the rest of your business is also important.” [4]

Ginac’s corollary to Andrew Ng’s “one second of thought” proposition is that anything that takes longer than a few seconds of thought is unlikely to be automated with supervised machine learning approaches, at least, not with today’s approaches.

When thinking about ways to apply AI/ML at TalentGuard, these two rules weigh heavily on what we choose to automate and how. Recently, we embarked on a project to automate the updating of thousands of out-of-date learning references in one of our data sets. The question we needed to answer: given a learning reference in the form of a book, with a description, publisher, date of publication, and ISBN, published more than 5 years ago, is there a recently published book that covers the same material?

The easy first experiment was to see how OpenAI’s gpt-4 model would do when tasked with generating replacements for over 3,000 book references. The model generated titles, descriptions, publishers, dates, and even ISBNs for each one! But only 10% of the books it generated were real books.

This phenomenon is known as model “hallucination,” and it is precisely why LLMs should not be used as a source of truth, even for tasks they perform quite well. However, combining LLM inference with data from source-of-truth databases and other AI techniques allowed us to update over 90% of our out-of-date references through automation.
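The pattern can be sketched in a few lines. Everything below is a hypothetical illustration: the catalog dictionary stands in for a real source-of-truth ISBN database, the candidate dictionaries stand in for raw LLM output, and the ISBNs are made up. The point it shows is the one made above: treat a generated reference as a candidate, and accept it only after it verifies against trusted data.

```python
# Hedged sketch of "LLM output as candidate, not fact." All names, titles,
# and ISBNs here are hypothetical; in production the lookup would query a
# real bibliographic database and the candidates would come from an LLM API.

TRUSTED_CATALOG = {
    # isbn -> (title, year): stand-in for a source-of-truth database
    "978-0-00-000001-0": ("Workforce Analytics in Practice", 2022),
}

def verify_candidate(candidate):
    """Accept an LLM-suggested replacement only if its ISBN resolves to a
    real record whose title matches; otherwise reject it as a hallucination."""
    record = TRUSTED_CATALOG.get(candidate["isbn"])
    if record is None:
        return None  # ISBN does not exist: fabricated reference
    title, year = record
    if title != candidate["title"]:
        return None  # ISBN exists but points at a different book
    return {"title": title, "isbn": candidate["isbn"], "year": year}

# A plausible-looking but fabricated suggestion is filtered out...
fake = {"title": "Modern HR Mastery", "isbn": "978-1-99-999999-9"}
assert verify_candidate(fake) is None

# ...while a suggestion that checks out against the catalog is accepted.
real = {"title": "Workforce Analytics in Practice",
        "isbn": "978-0-00-000001-0"}
print(verify_candidate(real))
```

The design choice matters more than the code: the LLM proposes, but a deterministic system backed by trusted data disposes.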

The key lies in leveraging AI as a tool to augment human intelligence and expertise, rather than as a standalone solution. By combining LLM inference with data verification from reliable sources and complementary AI technologies, we can harness these models more effectively and responsibly.

[1] Reuters. New York Lawyers Sanctioned for Using Fake ChatGPT Cases in Legal Brief.

[2] Journal of Medical Internet Research.

[3] Firebaugh, M. W. Artificial Intelligence.

[4] Andrew Ng. What AI Can and Can’t Do.
