Synthetic data: what information managers need to know
Jinfo Blog

22nd July 2025

By Anja Chemnitz Thygesen

Abstract

Synthetic data, meaning data generated by a model, was discussed at the KiMRA (Knowledge & Information Management, Research & Analysis) conference in London, UK, both last year and this year.

Here, Jinfo explores synthetic data to understand what it is, how it can be used, and what it could mean for information managers.

What is synthetic data?

To start out, let's define synthetic data:

"Synthetic data is data that has been generated using a purpose-built mathematical model or algorithm, with the aim of solving a (set of) data science task(s)"
(Source: The Royal Society)

In other words, synthetic data is artificially created using mathematical models or algorithms to replicate the patterns and statistical properties of real-world data.

Rather than being observed or collected from people or events, it is generated to serve analytical and modelling purposes.

Its goal is to maintain the realism of actual data while avoiding some of the limitations and risks that come with using genuine datasets.

Synthetic data preserves relationships, distributions, and structures from source data without containing real personal identifiers. It comes in many forms and can be structured (e.g. numbers, tables) or unstructured (e.g. interviews, natural language responses).
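As a rough illustration of what "replicating statistical properties" can mean, the sketch below fits a normal distribution to a small numeric column and draws new values from it. The figures and the function name synthesise_column are made up for illustration, and the choice of a normal distribution is an assumption. Real generators are far richer: they model many columns at once and the relationships between them.

```python
from statistics import NormalDist, mean, stdev


def synthesise_column(real_values, n, seed=0):
    """Fit a normal distribution to real_values and draw n synthetic values."""
    # Estimate mean and standard deviation from the real data.
    model = NormalDist.from_samples(real_values)
    # Draw new values from the fitted distribution; none is copied from the input.
    return model.samples(n, seed=seed)


# Hypothetical "real" column (made-up ages, for illustration only).
real_ages = [23, 35, 41, 29, 52, 38, 47, 31, 44, 36]
synthetic_ages = synthesise_column(real_ages, n=1000)

print(f"real:      mean={mean(real_ages):.1f}, stdev={stdev(real_ages):.1f}")
print(f"synthetic: mean={mean(synthetic_ages):.1f}, stdev={stdev(synthetic_ages):.1f}")
```

The two printed lines should show similar means and spreads, even though no synthetic value is taken from a real record.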

Advantages of synthetic data

Synthetic data offers a range of benefits:

  • Contains no information about individuals
  • Cheaper and faster to produce
  • Accessible and scalable
  • Can be used more freely, with fewer data protection constraints.

Synthetic data can also help correct imbalance or bias in a dataset, especially where certain groups are underrepresented.
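One simple way to picture rebalancing is random oversampling: smaller groups are topped up until every group is the same size. The sketch below uses that technique as a stand-in. It resamples existing records rather than generating genuinely new ones, and the field names (age_group, synthetic) and the data are hypothetical.

```python
import random
from collections import Counter


def oversample(records, group_key, seed=0):
    """Top up smaller groups by resampling until all groups are equal in size."""
    rng = random.Random(seed)
    by_group = {}
    for rec in records:
        by_group.setdefault(rec[group_key], []).append(rec)

    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Added records are copies drawn with replacement, flagged as synthetic
        # so they can always be told apart from the originals.
        for _ in range(target - len(rows)):
            balanced.append(dict(rng.choice(rows), synthetic=True))
    return balanced


# Hypothetical survey: eight younger respondents, two older ones.
survey = [{"age_group": "18-34"}] * 8 + [{"age_group": "65+"}] * 2
balanced = oversample(survey, "age_group")
counts = Counter(rec["age_group"] for rec in balanced)
print(counts)
```

Flagging the added records matters for the governance points made later: anyone reusing the dataset can see which rows were observed and which were added.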

Synthetic data is still ... synthetic

Despite its benefits, synthetic data is not without drawbacks. One major concern is that the data might not accurately reflect real-world behaviour or complexity, especially if it's poorly modelled.

Synthetic datasets can still reflect, or even exaggerate, biases if the data used to train the generating model is flawed or not properly understood.

Validating synthetic data is critical. If used in AI models, poorly validated data can lead to misleading conclusions or underperformance in real-world applications.
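A very basic validation check compares summary statistics of a synthetic column against the real one. The sketch below does this for the mean and standard deviation; the function name compare_columns, the 10% tolerance, and the sample figures are assumptions for illustration. Passing such a check is a minimum bar, not proof that the synthetic data behaves like the real data.

```python
from statistics import mean, stdev


def compare_columns(real, synthetic, tolerance=0.10):
    """Report whether synthetic mean and stdev fall within tolerance of the real ones."""
    report = {}
    for name, stat in (("mean", mean), ("stdev", stdev)):
        r, s = stat(real), stat(synthetic)
        # Relative difference; fall back to the absolute value if the real statistic is 0.
        rel_diff = abs(s - r) / abs(r) if r else abs(s)
        report[name] = {
            "real": r,
            "synthetic": s,
            "within_tolerance": rel_diff <= tolerance,
        }
    return report


# Hypothetical figures for illustration only.
real_scores = [10, 12, 14, 16, 18]
close_scores = [10, 13, 14, 15, 18]
far_scores = [30, 31, 32, 33, 34]

close_report = compare_columns(real_scores, close_scores)
far_report = compare_columns(real_scores, far_scores)
print(close_report)
print(far_report)
```

In practice a fuller review would also look at distributions, relationships between fields, and performance on the actual task the data is meant to support.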

The technology and infrastructure needed to produce, test, and maintain synthetic data systems can be resource-intensive and technically demanding.

Using synthetic data in market research

Synthetic data is increasingly used to improve the quality, speed, and depth of market research. It enhances traditional methods while opening new opportunities for experimentation and insight generation.

When used correctly, synthetic data enables the creation of synthetic personas or "digital twins" that simulate customer behaviours, allowing brands to test ideas before launching them publicly.

Researchers can use it to boost sample sizes in underrepresented demographics or to fill in missing responses in datasets.

It also supports scenario modelling, helping marketers forecast reactions to new products, pricing strategies, or branding changes more efficiently.

Managing synthetic data effectively

Information managers play a vital role in ensuring that synthetic data is used responsibly and effectively. This involves careful oversight, robust validation (using critical thinking), and alignment with ethical standards.

It is important to know where and how synthetic data is used in your own organisation. Treat it with the same diligence as real data, including documentation, version control, and quality monitoring.

Final note

Synthetic data is not a silver bullet, but when used wisely, it's a powerful complement to real-world data. It supports innovation, protects privacy, and enables more inclusive research.

However, its success depends on diligent testing, thoughtful integration, and ethical use throughout the data lifecycle.
