Jinfo: Synthetic data: what information managers need to know

Jinfo Blog

22nd July 2025

Abstract

Synthetic data, meaning data that has been generated by a model, was discussed at the KiMRA (Knowledge & Information Management, Research & Analysis) conference in London, UK, both last year and this year.

Here, Jinfo explores synthetic data to understand what it is, how it can be used, and what it could mean for information managers.

What is synthetic data?

To start out, let's define synthetic data:

"Synthetic data is data that has been generated using a purpose-built mathematical model or algorithm, with the aim of solving a (set of) data science task(s)"
(Source: The Royal Society)

In other words, synthetic data is artificially created using mathematical models or algorithms to replicate the patterns and statistical properties of real-world data.

Rather than being observed or collected from people or events, it is generated to serve analytical and modelling purposes.

Its goal is to maintain the realism of actual data while avoiding some of the limitations and risks that come with using genuine datasets.

Synthetic data preserves relationships, distributions, and structures from source data without containing real personal identifiers. It comes in many forms and can be structured (e.g. numbers, tables) or unstructured (e.g. interviews, natural language responses).
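
To make "preserving relationships and distributions" concrete, here is a minimal sketch in Python. It assumes a deliberately simple generating model (a two-column Gaussian fitted to the source data); real synthetic data tools use far richer models. The columns (age and spend), the figures, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_gaussian_model(xs, ys):
    """Summarise the source data as means, spreads and one correlation."""
    return {
        "mean_x": statistics.fmean(xs),
        "std_x": statistics.stdev(xs),
        "mean_y": statistics.fmean(ys),
        "std_y": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def sample_synthetic(model, n, rng):
    """Draw n synthetic (x, y) rows that follow the fitted summary."""
    rho = model["rho"]
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mean_x"] + model["std_x"] * z1
        # Mixing z1 into y reproduces the correlation found in the source.
        y = model["mean_y"] + model["std_y"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Hypothetical "real" data: customer age and monthly spend.
rng = random.Random(42)
real_age = [rng.gauss(40, 10) for _ in range(500)]
real_spend = [2 * age + rng.gauss(0, 10) for age in real_age]

model = fit_gaussian_model(real_age, real_spend)
synthetic = sample_synthetic(model, 2000, rng)
syn_age = [row[0] for row in synthetic]
syn_spend = [row[1] for row in synthetic]

print("real correlation:     ", round(model["rho"], 3))
print("synthetic correlation:", round(pearson(syn_age, syn_spend), 3))
```

None of the synthetic rows is a copy of a real customer, yet the averages and the age-to-spend relationship carry over. Anything the model does not capture, such as unusual subgroups or outliers, is lost.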

Advantages of synthetic data

Synthetic data offers a range of benefits:

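Contains no records of real individuals when generated properly
Often cheaper to produce than collecting real-world data
Accessible, scalable and fast to produce
Typically subject to fewer data protection restrictions, provided it cannot be linked back to real people.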

Last but not least, imbalance or bias in datasets can also be corrected with the use of synthetic data, especially when certain groups are underrepresented.
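
As a rough illustration of rebalancing, the sketch below creates extra records for an underrepresented group by interpolating between pairs of existing records. It is a simplified variant of interpolation-based oversampling (it pairs records at random rather than using nearest neighbours), and the group sizes, features, and function name are assumptions made for the example.

```python
import random


def oversample_minority(records, target_size, rng):
    """Create synthetic records until the group reaches target_size.

    records: list of tuples of numeric features for the small group.
    Returns only the newly created synthetic records.
    """
    if len(records) < 2:
        raise ValueError("need at least two records to interpolate between")
    synthetic = []
    while len(records) + len(synthetic) < target_size:
        a, b = rng.sample(records, 2)
        t = rng.random()  # position between record a and record b
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Hypothetical survey: 90 respondents in one group, only 10 in another.
rng = random.Random(7)
majority_count = 90
minority = [(rng.uniform(18, 25), rng.uniform(1, 5)) for _ in range(10)]

new_records = oversample_minority(minority, majority_count, rng)
balanced_minority = minority + new_records
print(len(minority), "real +", len(new_records), "synthetic =", len(balanced_minority))
```

Because every new record sits between two existing ones, this approach cannot add variety that the original small group lacked. It evens out the counts but does not replace collecting more real responses.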

Synthetic data is still ... synthetic

Despite its benefits, synthetic data is not without drawbacks. One major concern is that the data might not accurately reflect real-world behaviour or complexity, especially if it's poorly modelled.

Synthetic datasets can still reflect, or even exaggerate, biases if the source data used to build the generating model is flawed or not properly understood.

Validating synthetic data is critical. If used in AI models, poorly validated data can lead to misleading conclusions or underperformance in real-world applications.
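
One simple validation check is to compare the distribution of a synthetic column against the real one. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions) using only the Python standard library. The data is made up, and this is one possible check among many, not a complete validation procedure.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (0 to 1)."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical example: one well-modelled and one poorly modelled column.
rng = random.Random(1)
real = [rng.gauss(50, 10) for _ in range(1000)]
good_synthetic = [rng.gauss(50, 10) for _ in range(1000)]
poor_synthetic = [rng.gauss(60, 10) for _ in range(1000)]  # shifted mean

good_gap = ks_statistic(real, good_synthetic)
poor_gap = ks_statistic(real, poor_synthetic)
print("well-modelled gap:  ", round(good_gap, 3))
print("poorly modelled gap:", round(poor_gap, 3))
```

A value near 0 means the two columns are distributed similarly, and a value near 1 means they barely overlap. What counts as acceptable depends on the use case, and a column-by-column check like this says nothing about whether relationships between columns were preserved.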

The technology and infrastructure needed to produce, test, and maintain synthetic data systems can be resource-intensive and technically demanding.

Using synthetic data in market research

Synthetic data is increasingly used to improve the quality, speed, and depth of market research. It enhances traditional methods while opening new opportunities for experimentation and insight generation.

When used correctly, synthetic data enables the creation of synthetic personas or "digital twins" that simulate customer behaviours, allowing brands to test ideas before launching them publicly.

Researchers can use it to boost sample sizes in underrepresented demographics or to fill in missing responses in datasets.

It also supports scenario modelling, helping marketers forecast reactions to new products, pricing strategies, or branding changes more efficiently.

Managing synthetic data effectively

Information managers play a vital role in ensuring that synthetic data is used responsibly and effectively. This involves careful oversight, robust validation grounded in critical thinking, and alignment with ethical standards.

It is important to be aware of synthetic data and how it is used in your own organisation. Synthetic data should be treated with the same diligence as real data, including documentation, version control, and quality monitoring.
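
As one way to picture documentation and version control for a synthetic dataset, the sketch below stores a small provenance record alongside a content fingerprint. The field names, the example values, and the choice of a SHA-256 hash over a JSON dump are all assumptions for illustration, not an established standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def fingerprint(rows):
    """Stable SHA-256 fingerprint of a dataset given as a list of dicts."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class SyntheticDatasetRecord:
    name: str
    version: str
    generator: str            # model or tool that produced the data
    source_description: str   # what real data the generator was built from
    random_seed: int
    content_sha256: str


# Hypothetical dataset and record.
rows = [
    {"segment": "A", "satisfaction": 4},
    {"segment": "B", "satisfaction": 2},
]
record = SyntheticDatasetRecord(
    name="customer-survey-synthetic",
    version="1.0.0",
    generator="example-gaussian-model",
    source_description="2024 customer survey, anonymised extract",
    random_seed=42,
    content_sha256=fingerprint(rows),
)
print(json.dumps(asdict(record), indent=2))
```

If the dataset is regenerated or edited, the fingerprint changes, which makes it easier to tell which version fed a given analysis.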

Final note

Synthetic data is not a silver bullet, but when used wisely, it's a powerful complement to real-world data. It supports innovation, protects privacy, and enables more inclusive research.

However, its success depends on diligent testing, thoughtful integration, and ethical use throughout the data lifecycle.

Next actions

Is your organisation working with synthetic data? We'd love to hear your experiences. Please send me a note or book a call with me.
Read about other things that we picked up at the KiMRA conference, in the Jinfo Report "Five takeaways from the KiMRA conference."
