Synthetic data: what information managers need to know
Jinfo Blog
22nd July 2025
What is synthetic data?
To start out, let's define synthetic data:
"Synthetic data is data that has been generated using a purpose-built mathematical model or algorithm, with the aim of solving a (set of) data science task(s)"
(Source: The Royal Society)
In other words, synthetic data is artificially created using mathematical models or algorithms to replicate the patterns and statistical properties of real-world data.
Rather than being observed or collected from people or events, it is generated to serve analytical and modelling purposes.
Its goal is to maintain the realism of actual data while avoiding some of the limitations and risks that come with using genuine datasets.
Synthetic data preserves relationships, distributions, and structures from source data without containing real personal identifiers. It comes in many forms and can be structured (e.g. numbers, tables) or unstructured (e.g. interviews, natural language responses).
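To make the idea concrete, here is a minimal sketch in Python of what "generated using a mathematical model" can mean in the simplest case. It assumes a single numeric column and a normal (bell-curve) model; the `real_ages` values are made up for illustration, and real synthetic data tools use far richer models that also capture relationships between columns.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical "real" observations, e.g. customer ages (made-up values).
real_ages = [23, 35, 31, 42, 28, 39, 45, 33, 37, 29]

# Step 1: fit a simple model (a normal distribution) to the real data.
model = NormalDist.from_samples(real_ages)

# Step 2: draw new, artificial values from the model.
# A fixed seed makes the run repeatable.
synthetic_ages = model.samples(1000, seed=42)

# The synthetic values are not copies of real records,
# but their average and spread should be close to the originals.
print(f"real:      mean={mean(real_ages):.1f}, sd={stdev(real_ages):.1f}")
print(f"synthetic: mean={mean(synthetic_ages):.1f}, sd={stdev(synthetic_ages):.1f}")
```

The point of the sketch is that the synthetic values come from the model rather than from any individual, while the overall statistical shape is retained.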
Advantages of synthetic data
Synthetic data offers a range of benefits:
- Contains no information about individuals
- Cheaper to produce
- Accessible, scalable and faster to produce
- Can be used more freely, as it typically carries fewer data protection restrictions.
Last but not least, imbalance or bias in datasets can also be corrected with the use of synthetic data, especially when certain groups are underrepresented.
Synthetic data is still ... synthetic
Despite its benefits, synthetic data is not without drawbacks. One major concern is that the data might not accurately reflect real-world behaviour or complexity, especially if it's poorly modelled.
Synthetic datasets can still reflect, or even exaggerate, biases if the training data is flawed or not properly understood.
Validating synthetic data is critical. If used in AI models, poorly validated data can lead to misleading conclusions or underperformance in real-world applications.
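One simple form of validation is to check how closely a synthetic column matches the real one. The sketch below, in Python, is an illustration under stated assumptions rather than a standard method: it fits a normal curve to each dataset and measures how much the two curves overlap. The datasets and the `THRESHOLD` value are hypothetical, and a fuller validation would also compare relationships between columns and performance on the downstream task.

```python
from statistics import NormalDist

def distribution_overlap(real, synthetic):
    """Return a score from 0 to 1: the overlap between normal curves fitted to each dataset."""
    real_model = NormalDist.from_samples(real)
    synthetic_model = NormalDist.from_samples(synthetic)
    return real_model.overlap(synthetic_model)

# Made-up example values for illustration only.
real = [23, 35, 31, 42, 28, 39, 45, 33, 37, 29]
good_synthetic = [25, 34, 30, 41, 27, 40, 44, 32, 38, 30]
poor_synthetic = [61, 66, 70, 58, 73, 64, 69, 75, 62, 67]

THRESHOLD = 0.8  # hypothetical acceptance level; choose one that suits your use case

good_score = distribution_overlap(real, good_synthetic)
poor_score = distribution_overlap(real, poor_synthetic)

for name, score in [("good_synthetic", good_score), ("poor_synthetic", poor_score)]:
    verdict = "acceptable" if score >= THRESHOLD else "needs review"
    print(f"{name}: overlap={score:.2f} ({verdict})")
```

A low score does not say what is wrong with the synthetic data, only that it deserves a closer look before it is used.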
The technology and infrastructure needed to produce, test, and maintain synthetic data systems can be resource-intensive and technically demanding.
Using synthetic data in market research
Synthetic data is increasingly used to improve the quality, speed, and depth of market research. It enhances traditional methods while opening new opportunities for experimentation and insight generation.
When used correctly, synthetic data enables the creation of synthetic personas or "digital twins" that simulate customer behaviours, allowing brands to test ideas before launching them publicly.
Researchers can use it to boost sample sizes in underrepresented demographics or to fill in missing responses in datasets.
It also supports scenario modelling, helping marketers forecast reactions to new products, pricing strategies, or branding changes more efficiently.
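As an illustration of boosting an underrepresented group, the following Python sketch tops up a small survey segment with synthetic respondents. Everything in it is an assumption made for the example: the `responses` data, the `boost_group` helper, and the choice of a normal model for scores. Note that each row is labelled as real or synthetic, so the two can always be told apart later.

```python
from collections import Counter
from statistics import NormalDist

# Hypothetical survey rows: (age group, satisfaction score out of 10). Made-up values.
responses = [
    ("18-34", 7.0), ("18-34", 8.0), ("18-34", 6.5), ("18-34", 7.5),
    ("18-34", 8.5), ("18-34", 7.0),
    ("65+", 5.0), ("65+", 6.0), ("65+", 5.5),
]

def boost_group(rows, group, target_count, seed=0):
    """Return synthetic rows for `group` so that it reaches `target_count` rows in total."""
    scores = [score for g, score in rows if g == group]
    needed = max(0, target_count - len(scores))
    # Fit a simple model to the group's real scores (needs at least two real values).
    model = NormalDist.from_samples(scores)
    # Sample new scores, clamped to the valid 0-10 range and rounded to one decimal.
    return [(group, round(min(10.0, max(0.0, s)), 1))
            for s in model.samples(needed, seed=seed)]

counts = Counter(g for g, _ in responses)
target = max(counts.values())  # size of the largest group
synthetic_rows = boost_group(responses, "65+", target)

# Keep provenance explicit: label every row as real or synthetic.
combined = ([(g, s, "real") for g, s in responses]
            + [(g, s, "synthetic") for g, s in synthetic_rows])

print(Counter(g for g, _, _ in combined))
```

With only three real respondents in the smaller group, the synthetic rows can only echo what those three said, which is why validation and clear labelling matter.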
Managing synthetic data effectively
Information managers play a vital role in ensuring that synthetic data is used responsibly and effectively. This involves careful oversight, robust validation (using critical thinking), and alignment with ethical standards.
It is important to know whether and how synthetic data is used in your own organisation. Synthetic data should be treated with the same diligence as real data, including documentation, version control, and quality monitoring.
Final note
Synthetic data is not a silver bullet, but when used wisely, it's a powerful complement to real-world data. It supports innovation, protects privacy, and enables more inclusive research.
However, its success depends on diligent testing, thoughtful integration, and ethical use throughout the data lifecycle.
Next actions
- Is your organisation working with synthetic data? We'd love to hear your experiences. Please send me a note or book a call with me.
- Read about other things that we picked up at the KiMRA conference, in the Jinfo Report "Five takeaways from the KiMRA conference."