Since late 2022 and the release of ChatGPT, a new crop of “generative artificial intelligence” tools has been taking AI out of academia and into the mainstream. Yet, one year on, it remains unclear what this technology means for boards of directors and the main instrument they rely on to do their job — the board pack.
Generative AI builds on past breakthroughs in machine learning, namely neural networks and deep learning — systems that don’t need to be told exactly what steps to follow, but can instead be given an end goal and work out for themselves, through trial and error, how to reach it. These have transformed tasks long thought out of reach, such as image recognition, making it possible to, say, identify who’s who in a picture or spot tumours in a CT scan.
Generative AI differs from its predecessors in two ways. First, it can tackle much more varied problems (not “just” answer “Does that lump look cancerous?” but also “What are the key risks highlighted in this report?” and everything in between). And second, you can interact with it in plain, everyday language. That means most of us can find some use for generative AI and can use it without being a data scientist — and that’s what makes it so powerful and popular.
That also means we can easily mistake it for an “everything” tool and use it on the wrong problems. And when we do, our performance suffers. A Harvard study showed this in action, taking smart, tech-savvy BCG consultants and asking them to complete a range of tasks with and without generative AI tools. The consultants were 19 percentage points less likely to reach correct conclusions when using generative AI on tasks that appeared well-suited for it but were actually outside of its capabilities. In contrast, on appropriate tasks, they produced 40% higher quality results and were 25% quicker. The researchers concluded that the “downsides of AI may be difficult for workers and organizations to grasp.”
For boards, then, knowing what generative AI is good and bad at, and steering both their own and their organisation’s use of these new tools, might soon become a key part of their role. The issue? According to a survey we conducted with the Corporate Governance Institute, one-third of directors fear they’re not tech-literate enough and nearly two-thirds believe that most of their colleagues lack sufficient expertise to oversee the use of technology. The first step, therefore, is to get a better understanding of what goes on behind the AI scenes.
You might have heard that generative AI consists of Large Language Models (“LLMs”) and that said models are trained to “predict the next word.” But how does that work?
If you’re short on time, just know that AI doesn’t “answer” your question; it “autocompletes” it. It does so by reusing text that it’s already seen and is statistically likely to follow your prompt, and modifying it to fit the words used in your request.
To jump ahead to what this means for the board and what directors need to watch out for when it comes to AI and board reporting, click the link below.
What are the dangers of using AI to generate or analyse board information? →
In a nutshell, LLMs start by looking at vast amounts of training text and measuring how often words appear near each other (for example, “work” and “university” would frequently co-occur; “work” and “zebra” wouldn’t). They then plot these words, according to how interconnected they are, onto a vector space — think your usual graph with X and Y axes, except that it has thousands of dimensions instead of just two. The closer two words are on a dimension, the closer their meaning is likely to be in that dimension (“zebra” would be close to “giraffe” and “lion” on one dimension, and close to “crossing” and “pedestrian” on another), thus creating clusters of concepts.
A 3D representation of the semantic clusters that emerge alongside dimensions (here, age and gender) when grouping words frequently found together. © Carnegie Mellon University
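For the technically curious, here’s roughly what that first step looks like in a few lines of Python. The three-sentence “corpus” is made up purely for illustration (real models ingest billions of sentences and use far more sophisticated statistics), but the principle of counting which words keep appearing together is the same:

```python
from collections import Counter
from itertools import combinations

# A made-up, three-sentence "corpus"; real models ingest billions of sentences.
sentences = [
    "she went to work after university",
    "he left work early for a university lecture",
    "the zebra crossed at the zebra crossing",
]

co_occurrences = Counter()
for sentence in sentences:
    words = set(sentence.split())
    # count every pair of distinct words that appear in the same sentence
    for pair in combinations(sorted(words), 2):
        co_occurrences[pair] += 1

print(co_occurrences[("university", "work")])  # 2: they keep showing up together
print(co_occurrences[("work", "zebra")])       # 0: they never do
```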
Because words are now represented as coordinates rather than letters, we can use arithmetic to move between these clusters and find a word’s nearest match in other concepts. For instance, take the vector for “king”, subtract the vector for “man”, and add the vector for “woman”: the word you arrive closest to is “queen”. More subtle (and potentially controversial) matches can be made too: “London” minus “UK” plus “France” gives you “Paris”, whilst “animal” plus “ethics” equals “human”.
How LLMs find which word to use when moving between concepts. © Carnegie Mellon University
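To see that arithmetic in action, here’s a toy Python sketch. The vectors below are hand-written and only two-dimensional (real embeddings are learned and run to thousands of dimensions), so treat it as an illustration of the principle rather than of how any production model stores words:

```python
import numpy as np

# Hand-written, purely illustrative vectors; dimensions stand roughly for "royalty" and "gender".
vectors = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

def nearest(target):
    """Return the vocabulary word whose vector sits closest to the target point."""
    return min(vectors, key=lambda word: np.linalg.norm(vectors[word] - target))

result = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(result))  # "queen"
```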
These connections between words, and how to navigate them, aren’t specified by programmers but self-discovered. This happens through a training process in which LLMs blank out some words in their training data, guess (at random, at first) what those words should be, then compare their prediction with the original text. Depending on how well they did, they adjust the vectors and try again. Over time, they become able to tell apart different meanings of the same word (e.g. “match” probably means “game”, not “fire starter”, if it appears alongside “football”), build links across increasingly abstract concepts, and predict what the next word in a sentence should be.
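Here is a minimal sketch of that guess-check-adjust loop, written in PyTorch with a toy vocabulary and a deliberately tiny stand-in “model”. A real LLM has billions of parameters and trillions of words of training text, but the training signal is the same idea:

```python
import torch
import torch.nn as nn

# Toy vocabulary; a real LLM works with tens of thousands of tokens.
vocab = ["the", "board", "met", "on", "monday", "friday"]
embed_dim = 16

# A deliberately tiny stand-in "model": word vectors plus one layer that guesses the hidden word.
model = nn.Sequential(
    nn.Embedding(len(vocab), embed_dim),
    nn.Flatten(),
    nn.Linear(embed_dim * 4, len(vocab)),  # reads 4 context words, scores every vocabulary word
)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# "The board met on [___]", where the blanked-out word is "monday".
context = torch.tensor([[0, 1, 2, 3]])  # indices of "the board met on"
target = torch.tensor([4])              # index of "monday"

for step in range(200):
    logits = model(context)             # the model's guess: a score for every word in the vocabulary
    loss = loss_fn(logits, target)      # how far off was that guess?
    optimiser.zero_grad()
    loss.backward()                     # work out which way to nudge the vectors
    optimiser.step()                    # adjust them, then try again

print(vocab[model(context).argmax()])   # after training, the blank is filled with "monday"
```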
The key differentiator of today’s LLMs is that they perform these steps whilst looking at the entire text, not just nearby words. Using something called a “transformer”, words can be linked to what or whom they refer to, even if that information lives in a distant paragraph or is scattered throughout the text. That gives modern models a much greater context — letting them know, for example, that “Rose” is a 23-year-old student in Michigan, not a flower, or that “the couple” alludes to Frank and Mary who were discussed several sentences earlier. And that, in turn, helps them predict which word should come next much more accurately.
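The calculation at the heart of a transformer, known as attention, fits in a few lines of numpy. This is a single “head” fed with random toy numbers rather than a full model, but it shows how every word gets to weigh up every other word in the sequence:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: every word weighs up every other word in the sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                         # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])                  # how relevant is word j to word i?
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                                       # blend each word with its relevant context

rng = np.random.default_rng(0)
seq_len, d = 6, 8                           # a 6-word sentence, 8-dimensional word vectors
X = rng.normal(size=(seq_len, d))           # the sentence's word vectors (random, for illustration)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (6, 8): one context-aware vector per word
```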
Combine all this, and you end up with something able to generate “new” text — in effect, old text from the training data, picked because it’s statistically likely to follow your request and modified through maths to fit it. Want a sad story about unicorns? The AI has learned from its training corpus that the words you used (“write me a story about…”) are usually followed by a tale. It can take a conceptually close, happy story about rabbits that it’s already read, subtract the “happy” and “rabbit” vectors and add the “sad” and “unicorn” vectors to turn “sunshine” into “rain” or “paws” into “hooves”, et voilà.
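Stripped to its bones, that “autocomplete” behaviour can be mimicked with nothing more than a lookup table of word counts. The counts below are invented, but the mechanism (pick each next word in proportion to how often it followed the previous one in the training text) is the statistical heart of the approach:

```python
import random

# Invented counts of which word followed which in some "training text".
next_word_counts = {
    "write": {"me": 8, "a": 5},
    "me": {"a": 9},
    "a": {"story": 6, "sad": 3},
    "sad": {"story": 4},
    "story": {"about": 7},
    "about": {"unicorns": 2, "rabbits": 5},
}

def autocomplete(prompt_word, length=5):
    words = [prompt_word]
    for _ in range(length):
        options = next_word_counts.get(words[-1])
        if not options:
            break
        # pick the next word in proportion to how often it followed the previous one
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(autocomplete("write"))  # e.g. "write me a story about rabbits"
```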
ChatGPT and its kin are, of course, vastly more refined (they don’t operate on whole words but on chunks of words, for starters), but all the generative AI tools currently making the news follow these principles — even the image-generating ones. They identify patterns and relationships, project that information onto a mathematical space, then manipulate it with algebra to not really “answer” your prompt but rather “autocomplete” it.
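Those “chunks of words” are called tokens, and you can inspect them yourself with tiktoken, OpenAI’s open-source tokeniser library. (The exact split depends on the encoding; “cl100k_base”, shown here, is one widely used encoding, and other model families use their own.)

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used tokeniser encoding
tokens = enc.encode("Generative AI autocompletes your board pack.")

print(tokens)                             # a list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the chunks: whole words, sub-words, and punctuation
```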
So if generative AI is “just” statistical autocomplete, where do its seemingly intelligent abilities come from? That’s the wild thing: we’re not sure.
Once LLMs reach a certain size, capabilities seem to appear without having been coded in by the models’ developers. And figuring out why is tricky, because these models self-organise in ways that aren’t familiar to human minds. In the words of Ian Hogarth, chair of the UK Government’s AI Foundation Model Taskforce, “We don’t really program them — we grow them.”
According to Jeff Dean, chief scientist at Google DeepMind and Google Research, “there are currently few compelling explanations for why such abilities emerge.” Language and reasoning may be simpler than we think, or these AIs may just be great at fooling us. This is fuelling numerous debates in fields including cognition, linguistics, and philosophy, that are far outside the scope of this guide.
Once you understand how generative AI works, it’s easy to spot its limitations — and where we might go wrong if we start using it to generate (or summarise) content for the board pack. Don’t dismiss AI yet, though; there’s good news in the next section of this guide.
First, because AI models reflect the way humans use words, they also reflect many of the biases that humans exhibit and that show up in the training data. For example, we’ve seen how “king - man + woman = queen”. But an AI tool trained on data that mostly mentions male doctors and female nurses would just as happily calculate “doctor - man + woman = nurse”. In a stark illustration of that problem, Oxford researchers found that a popular image-generating AI tool could create pictures of white doctors providing care to black children but not of black doctors doing the same for white children.
What a biased AI generates when prompted for images of a “black African doctor helping poor and sick white children”. © Alenichev, Kingori, Grietens; University of Oxford
These biases are difficult to protect against because they’re embedded, right from the start, into the data that the model derives everything from. And they can be subtle, too. Research published by the National Academy of Sciences suggests that LLMs focus on what appeals to us rather than on what’s pertinent. In the authors’ words, “When using LLMs to summarise texts . . . we must expect that the most reproduced contents are, as for us, the ‘cognitively attractive’ ones and not necessarily the most informative.”
How could that affect boards? We know from our research with the Chartered Governance Institute UK & Ireland that board packs are getting longer and that directors struggle to identify the key messages within. Using AI to summarise verbose reports might therefore seem appealing… but is a dangerous idea if done by the readers of those reports. Without the writers’ oversight, the model may mimic the biases it’s been trained on and could, say, disregard perspectives that differ from the dominant opinions it has seen, focus exclusively on the good news, or incorrectly recap suggestions made by authors who don’t write like the stereotypical executive.
Biased machines are, alas, nothing new. As covered by MIT Technology Review, Enron’s email archive is so far the only large database of executive correspondence to have been digitised and shared, which means it has been used as training data for numerous pieces of software, such as tools that automatically prioritise certain messages in an inbox or summarise them.
But while the archive is representative of how people within Enron communicated, it isn’t representative of the general population. And that means the algorithms are susceptible to the same biases that the company’s senior executives were in the 2000s — when they were running a company that’s become synonymous with “fraud”.
Second, while AI is great at making its answers appear plausible and written by a human, the way they’re generated means that they’re not necessarily factually correct — the model simply extrapolates words from its training data and approximates a solution. As Dr Haomiao Huang, an investor at renowned Silicon Valley venture firm Kleiner Perkins, puts it: “Generative AI doesn’t live in a context of ‘right and wrong’ but rather ‘more and less likely.’”
Often, that’s sufficient to generate a close-enough answer. But sometimes that leads to entirely made-up facts. And in an ironic twist for computers, AI can particularly struggle with basic maths.
An incorrect answer confidently served by ChatGPT. © Ars Technica
That can have real-life consequences. In one high-profile incident, a lawyer relied on ChatGPT to find judicial opinions from over half a dozen court decisions in support of his argument before the defence discovered that none of these cases were real. “I did not comprehend that ChatGPT could fabricate cases,” the lawyer later confessed at his own court hearing. “I falsely assumed [that it] was a super search engine.”
“LLMs are not search engines looking up facts; they are pattern-spotting engines that guess the next best option in a sequence. Because of this inherent predictive nature, LLMs can also fabricate information . . . They can generate made-up numbers, names, dates, quotes — even web links or entire articles.”
For board directors, being given incorrect facts is bad enough — they’ll likely make poor decisions as a result. But they also run the risk of breaching their regulatory duties, notably around the need to exercise reasonable care, skill, and diligence. Just like lawyers blindly trusting AI-generated court cases, directors relying on AI-generated summaries of their board papers could easily be accused of not doing their job or being reckless with their power.
Research is ongoing to find ways to limit these “hallucinations”. But, for the time being, any material touched by AI should always be thoroughly checked by a subject matter expert before it’s sent to the board.
Finally, since AI models base their answers on what they saw in their training data, they tend to leak information from that original dataset — sometimes downright copy-pasting entire, unmodified sentences.
This can become a problem when those models are also set to continuously learn from their interactions with users, because the content of one conversation could later be reused in another conversation with someone else. Imagine your CFO asking, “We want to acquire competitor X, what should I know about them?” and that information then being served to an outsider asking “What’s up with company X?” (It’s not just theoretical: Samsung banned generative AI internally after noticing that proprietary code had been involuntarily shared that way.)
Running a private model within your organisation may seem like a solution. But it would only stop leaks between your organisation and outsiders, not between people within it. Imagine one of your colleagues, who doesn’t sit on the board, asking, “What should I include in my report to impress the CFO?” and the AI replying, “The CFO is thinking of buying X, you should research that company.”
For boards, then, this means that any AI tool with access to board information should explicitly not use that information for its own training, unless it can guarantee that training stays separate for each user and is never shared with anyone else.
Given all this, you might be thinking that AI is inherently flawed and has no place in your boardroom. Well, that’s only half right.
The crux of the matter is that generative AI cannot be trusted as an author of board material. But it can shine as an editor, where its immense capabilities to analyse and rephrase text can nudge report writers in the right direction and act as a critical, always-on-call friend.
For most organisations, such an editor is sorely needed. Boards know that the reports they get aren’t great: overall, 80% of board members and governance professionals score their board packs as “Weak” or “Poor”. The typical response is to put the management team through training sessions or ask them to use templates that present the information that matters in ways that make sense. The issue is that these solutions rarely stick. Templates get misused, training gets forgotten, and habits are hard to change.
AI isn’t so easily ignored or forgotten. It can give instant feedback and real-time prompts that challenge what your team members are writing, as they’re writing it, nudging them in the right direction and continuously training them. And that means its impact can be much more durable.
So, in practice, how could AI be used as an augmentation tool to improve report writing in the long term, rather than as an automation tool to cut out report writing in the short term? We’ve identified two main ways: stimulating critical thinking (i.e. ensuring the content of the paper stacks up), and guiding writing (i.e. ensuring the content is easy to read and make sense of).
AI can guide your report writers’ thinking, letting them know if they’re falling into the most common board reporting traps and nudging them towards more actionable insights and plans. For instance:
In each case, it’s then up to the report author to decide how to act on that feedback. For example, maybe the bad news hasn’t been omitted and the report looks overly optimistic simply because last quarter was absolutely great. As the subject matter expert, the writer will know better than the AI. What matters is that the AI challenged the thought process and pointed out potential gaps in the writer’s thinking, helping produce a more robust paper and sharper insight for the board to act on.
Board Intelligence’s AI-powered management reporting software, Lucia, provides real-time prompts and feedback that challenge your team to think more deeply as they write.
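To give a flavour of how such an “editor” can be wired up in general terms, here is a hypothetical sketch using the OpenAI Python SDK. The model name, prompt, and draft report are illustrative assumptions rather than a description of how Lucia works; the point is that the AI is asked to challenge the author, not to write the paper for them:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

draft_report = (
    "Q3 revenue grew 12% and customer satisfaction hit an all-time high. "
    "We expect this trend to continue into Q4."
)

review = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": (
                "You are a critical reviewer of board papers. Do not rewrite the paper. "
                "Instead, list the questions a director would ask: What bad news might be "
                "missing? Which risks are unquantified? What decision is being asked for?"
            ),
        },
        {"role": "user", "content": draft_report},
    ],
)

print(review.choices[0].message.content)  # feedback for the author to act on, or to overrule
```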
Robust thinking is only half the battle, however, and great information won’t do much good if it isn’t easy for the reader to parse and process it. Here, too, AI can help by asking:
Because generative AI excels at modifying existing text and “translating” it from one representation into another, it can help fix the issues it spots with the click of a button. For example, by extracting the report’s key points and putting them in an executive summary that follows best practice.
Lucia’s Auto-Summarisation builds executive summaries that put the key points up front.
These translation capabilities can also be used for meeting minutes. Provided that it’s been trained to know what proper minutes look like, AI can convert meeting notes or transcripts into best-practice meeting minutes — and format them appropriately too.
Minute Writer’s AI turns notes into meeting minutes in just a few clicks.
Here too, the report writer remains the one in charge, ensuring no “hallucinations” slip through and finessing the end result if needed, but saving significant amounts of time by letting AI do the heavy lifting.
Lucia can tell writers when their reports include too much data and too little “So what?”
If you want to bring the best of generative AI to your board reporting process and be confident you’re doing it the right way, take a look at Lucia, our AI-powered management reporting software. It offers live feedback, real-time prompts, smart automatic editing tools — and much more.
Or check out Minute Writer and turn notes or transcripts into ready-to-go, formatted minutes in just a few clicks, with the help of a purpose-built AI designed by board experts.