Marking ChatGPT’s Homework with TDL’s Data Scientists

Alfred Tiley, Data Scientist at The Data Lab, reviews ChatGPT’s predictions on data trends for 2023

Recently, The Data Lab team asked ChatGPT to write a blog post on the data trends for 2023. This is unlikely to be the first or the last time that ChatGPT is called on to help with an assignment, but its responses were telling: they illustrate both the benefits and some significant limitations of the technology.

This week, here in The Data Lab’s Data Team, we took the opportunity to mark ChatGPT’s homework, examining the quality of its predictions for data trends in 2023. On behalf of my team, I will issue ChatGPT’s (metaphorical) report card here – highlighting where it hit the mark and, importantly, what it got wrong. But first, let’s review the basics …

What is ChatGPT?

In short, ChatGPT is a chatbot: a software application that engages in human-like conversation with users via written text (or spoken word). ChatGPT was launched in November 2022 by OpenAI, a San Francisco-based artificial intelligence research laboratory and company. Currently free and available to the general public, ChatGPT has proved immensely popular, accruing over a million users in the first week after its launch. Since then it has reached 100 million users, garnered sustained attention from the global media, and provoked both excitement and calls for caution from the tech world.

Why is there so much focus on ChatGPT?

Chatbots have existed in some form since the 1960s, so why has ChatGPT generated so much attention? Like other chatbots, ChatGPT’s purpose is to provide natural language responses and answers to free-form textual prompts or questions from the user. What makes ChatGPT special is the high level of detail, convincingly human character, and sheer versatility of its responses.

For example, while ChatGPT can hold impressively natural-sounding discussions on a wide range of topics, it can also be used to generate and debug computer code, build software applications from scratch, run virtual machines, compose entire works of literary fiction, write fully formed essays and articles, or create personalised meal and exercise plans.

How does ChatGPT work?

The power behind ChatGPT is OpenAI’s GPT-3, the family of language models on which it is built. Language models use probability distributions to guess which word, or words, should appear at a given point in a sentence or text. In modern language modelling, these distributions are learned by neural networks via exposure to example text or speech.
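To make the idea of "guessing the next word from a probability distribution" concrete, here is a minimal sketch of a toy bigram model in Python. It is far simpler than the neural networks behind GPT-3 and is not how OpenAI's models are built: it simply counts which word follows which in a tiny, made-up corpus. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    """Turn the raw follow-on counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}  # word never seen with a successor: no prediction
    return {word: n / total for word, n in counts[prev].items()}


# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)

dist = next_word_distribution(model, "the")
best_guess = max(dist, key=dist.get)
print(dist)        # probabilities for each word seen after "the"
print(best_guess)  # the single most likely next word
```

A modern language model replaces the lookup table of counts with a neural network that conditions on much more than the previous word, but the output has the same shape: a probability for every candidate next word.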

Whether they know it or not, most people are already using widespread applications of language models. From asking Apple’s digital assistant Siri for directions to the airport, to performing a Google search, or accepting an autocorrect suggestion on a smartphone, many of us regularly depend on language models in our day-to-day lives.

GPT-3 stands out from other contemporary language models due to its size. It was fed 570GB of text, sourced from books and a substantial fraction of the internet, to optimise its 175 billion parameters. Its immense size means GPT-3 can carry out tasks that it was not explicitly trained for, making it very versatile.

ChatGPT itself was built by fine-tuning an updated iteration of the model, GPT-3.5, using both supervised learning and reinforcement learning from human feedback.

In the former, the model was provided with conversations in which human trainers adopted the role of both the user and the chatbot. In the latter, the model was asked to output several responses to a prompt, which were then ranked for suitability by a human trainer. These rankings were used to create reward models for further fine-tuning.
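As a rough illustration of how human rankings can become a training signal, the sketch below expands one ranking into "preferred versus rejected" pairs and scores them with a logistic pairwise loss, a common choice for preference learning. This is a sketch under stated assumptions rather than a description of OpenAI's actual pipeline: the response labels and reward scores are hypothetical, and in practice the scores would come from a neural reward model rather than a hand-written dictionary.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Logistic pairwise loss: small when the preferred response
    already scores well above the rejected one, large otherwise."""
    margin = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-margin))


# A human trainer's ranking of three responses, best first (hypothetical).
ranking = ["response_a", "response_b", "response_c"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores from a reward model that is still being trained.
scores = {"response_a": 2.0, "response_b": 0.5, "response_c": -1.0}

mean_loss = sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)
print(pairs)
print(mean_loss)
```

Training would adjust the reward model so that this loss falls, meaning its scores agree more closely with the human rankings. The resulting reward model can then be used to guide further fine-tuning of the chatbot.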

What are the limitations of ChatGPT?

ChatGPT’s capabilities are extremely impressive. However, users should exercise caution, as those capabilities come with some key limitations. Ironically, the most important limitation stems from ChatGPT’s main strength: the large language model at its core. The model allows ChatGPT to produce detailed and plausible-sounding outputs on most topics, but this does not mean that those outputs are always correct. And because ChatGPT’s style is so convincing, inaccuracies in its responses, particularly on complex subjects, may only be picked up by human experts in the field.

For example, a company could ask ChatGPT to draw up an employment contract for future employees. But it would be risky to put that contract into use without a qualified lawyer first reviewing it. It might also prove unwise to request and follow instructions from ChatGPT on how to build an aeroplane without seeking oversight from an aeronautical engineer. Similarly, users can avoid real-world harm by seeking medical advice from trained practitioners rather than ChatGPT.

Another subtle problem stems from the chatbot’s natural language interface. There is, of course, a huge benefit in allowing users to pose free-form questions in their own words. However, this also means that ChatGPT’s responses can depend on the input phrasing. In some scenarios this can lead to it behaving more like a supportive friend than an objective and reliable source of truth, tailoring its responses to account for loaded questions from the user.

So how did ChatGPT perform with its 2023 data predictions?

Let’s take a look at ChatGPT’s ‘Top data trends to look out for in 2023’. On the face of it, the chatbot’s responses are respectable. It correctly identifies data privacy and security, big data, the Internet of Things, and machine learning and artificial intelligence as important factors in data science for the year. However, these are hardly new trends and certainly not unique to 2023.

It also flags the rise of 5G networks as something to watch this year. While it’s true that 5G coverage will continue to expand in 2023, designating the technology as a top trend for the year is a stretch. In the UK at least, about 50% of the population has 5G coverage, but the technology won’t be nationwide until circa 2028, and it isn’t expected to overtake 4G in prominence until 2025.

Overall, the factors ChatGPT highlights are generic. They wouldn’t be much use, for example, to a business looking for a competitive advantage in 2023. And when pressed for further details, its responses are at best more verbose but still fairly textbook elaborations on the same points.

For example, asking it to discuss which particular elements of data privacy and security are important for data scientists in 2023 simply elicits a list of definitions of sub-topics in the field. A partial explanation for this generality lies in ChatGPT’s training data, which extends only to 2021. So its 2023 predictions are ultimately sourced from texts that are lagging by at least a year.

At worst, requests for clarification can result in misleading responses from the chatbot. The Data Lab team asked it to explain to businesses why 5G will be important for their 2023 data strategies. ChatGPT duly replied with a detailed explanation of why 5G is important. This is a good example of ChatGPT’s sensitivity to input phrasing. In the Data Team, we posed the opposite question but with similar phrasing: “Why is 5G not important to data scientists’ strategies?” Its response was a much less enthusiastic endorsement of the importance of 5G, from the perspective of data scientists …

“5G technology is important for many industries, including telecommunications and internet of things (IoT), but it may not be a significant consideration for data scientists when developing their strategies. Data scientists are primarily focused on utilizing data to gain insights and make predictions, and the technology used to transmit that data is generally less important than the quality and quantity of the data itself. However, 5G can be important for data scientists as it’s important for data-intensive applications such as self-driving cars, virtual reality, real-time streaming, big data analytics and edge computing. It enables faster data transfer, lower latency and better connectivity which can help data scientists to get the data faster and process it with less delay.”

So, what grade should ChatGPT receive for its efforts? The very fact that it can produce an article on data trends for 2023 with enough detail that it warrants a serious assessment by TDL’s Data Team is impressive. It speaks volumes as to how far modern large language models have progressed. However, some of ChatGPT’s key limitations are evident in its responses.

On a complex, wide-ranging, and perhaps subjective topic, its outputs seem authoritative. But under closer scrutiny, flaws and inaccuracies are revealed. Similarly, issues of objectivity and sensitivity to user input phrasing are apparent in its article. On balance, ChatGPT receives a B+ for its work.

Looking to the future of ChatGPT

Modern language models have achieved such size and complexity that they can now be flexibly employed to assist in a huge range of creative and administrative tasks. We should expect to see them continue to permeate the business world, with ChatGPT – and in particular its future iterations – having the potential to significantly impact ways of working across many sectors.

Despite the widespread applicability of these models, predictions that ChatGPT or similar models will replace human effort entirely are unlikely to come true, at least for the foreseeable future. Their routine use to increase human productivity, however, is a much safer bet.

The continued rise of sophisticated large language models will also come with many complex challenges beyond those discussed here, including issues pertaining to privacy, plagiarism, bias, and discrimination. Additionally, the quality of future models will depend on effective ways to identify and exclude existing artificially generated content from training data. Tackling these problems will be an integral step in developing safe, fair, and effective applications of the technology in the future.
