Can AI salvage the surveys abandoned by humans? A study on synthetic data completion.
- Potloc
- Published April 1, 2026
Consultants, investors, and business leaders run on survey data. Yet the system has a massive efficiency leak: up to 34% of responses are abandoned before completion and ultimately discarded by researchers. To study whether synthetic data can "complete the incompletes", we compared 18,836 AI predictions against actual human responses. See what our findings reveal.
Introduction
At Potloc, our Research on Research series serves a single mission: uncovering the foundations of high-quality insights. Our studies have gathered data on everything from better engaging respondents to spotting increasingly sophisticated signs of bad data.
But it’s an ironic moment for our industry. While fighting AI-driven fraud, we’re simultaneously finding ways to apply it to our advantage.
To contribute findings to this fast-growing area, we decided to examine how synthetic data can tackle a systemic issue in the survey industry: incomplete surveys.
Incomplete surveys: Trash or treasure?
In high-stakes research environments, data rejection is part of the discipline. Fraudulent responses, speeders, duplicates, and bot-generated answers are systematically filtered out. Incompletes have traditionally been grouped into this same category: unusable.
Yet incomplete surveys differ from these rejects in meaningful ways. Historically, the industry has treated them as unusable, with most players excluding them from the final sample, and the volume is substantial: according to one analysis, web survey dropout rates range from 16% to 34%, while telephone survey dropout rates have been reported as high as 26%.
These unsalvageable “incompletes” waste research resources, force costly re-recruitment, and lock critical insights in partial responses. For decades, this has been largely accepted as an inevitable cost of market research.
But are incompletes necessarily “bad”? Think about it: Fraudsters and bots are actually very good at finishing surveys. They push through any survey friction to maximize rewards. On the other hand, quitting a survey mid-way is the most human thing you can do. Maybe the respondent got interrupted, tired, or bored. The very act of abandoning can actually be the strongest signal that someone is real.
What if there were a way to recover value from these incomplete surveys through synthetic data completion? Could a multi-agent AI system use the first parts of a real human's answers to accurately predict the rest?
But first, what is synthetic data?
In survey research, synthetic data is artificially generated data that mimics the statistical patterns, relationships, and distributions of real human responses.
Of course, it’s an expansive umbrella term. There are many different tactics to incorporate synthetic data, including but not limited to:
- Complete the incompletes: Constructing missing answers from a partially completed survey.
- Sample augmentation: Reaching a target N-size when human feasibility is exhausted.
- Digital twins: Creating AI personas that mirror specific personas.
- Full synthetic sample: Generating an entire dataset from scratch based on historical patterns.
Crucially, not all survey research looks the same. There are B2B and B2C studies. There are surveys for thought leadership, usage & attitude, brand tracking, concept testing, segmentation, and pricing studies — all with distinct functions and features.
We decided to evaluate the “complete the incompletes” tactic on two types of surveys:
- Usage & attitude (U&A) surveys.
- Thought leadership (TL) surveys.
Our study: Can AI rescue surveys that humans abandon?
In this early experiment, we started with completed surveys from real respondents across nine survey projects, split between U&A and TL, and covering topics ranging from healthcare and AI to wellness.
We then simulated incomplete surveys by cropping human-completed surveys at different stages (30%, 50%, 70%, and 90% completion rates) — these defined the level of context available to the AI model, while the rest was hidden.
The AI model was then tasked with generating responses for the hidden portion of each survey. This allowed us to systematically vary the amount of context the AI had to predict the remaining answers.
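To make the masking step concrete, here is a minimal sketch of how cropping a completed response at a given completion rate can be simulated. The function names and data shapes are illustrative assumptions, not Potloc’s production pipeline.

```python
# Illustrative sketch (not Potloc's actual code): simulate "incomplete" surveys
# by cropping a completed response at a given completion rate. The visible
# answers become the model's context; the cropped tail is what it must predict.
from typing import Dict, List, Tuple

COMPLETION_RATES = [0.30, 0.50, 0.70, 0.90]

def crop_response(
    answers: List[Tuple[str, str]],  # ordered (question_id, answer) pairs
    completion_rate: float,
) -> Dict[str, List[Tuple[str, str]]]:
    """Split one completed response into visible context and hidden targets."""
    cutoff = int(len(answers) * completion_rate)
    return {
        "context": answers[:cutoff],  # what the AI is allowed to see
        "hidden": answers[cutoff:],   # what it must predict
    }

# Example: a 10-question response cropped at 30% leaves 3 visible answers
# and 7 hidden ones for the model to reconstruct.
demo = [(f"q{i}", f"answer_{i}") for i in range(1, 11)]
print({rate: len(crop_response(demo, rate)["hidden"]) for rate in COMPLETION_RATES})
```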
We then evaluated performance by measuring the “accuracy” of 18,836 synthetic predictions against what human respondents actually said.
We tested our model’s accuracy across survey types (U&A and TL), different survey completion rates (30%, 50%, 70%, and 90%), and different question types (Binary questions, multiple-choice questions, NPS/sentiment questions, and open-ended questions).
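As a rough illustration of the evaluation step, the sketch below scores each synthetic prediction against the human answer and aggregates accuracy along the dimensions listed above. The field names and example records are hypothetical, not the study’s actual data.

```python
# Illustrative sketch (hypothetical field names): score each prediction against
# the human answer, then aggregate accuracy by survey type, completion rate,
# or question format.
from collections import defaultdict

def exact_match(predicted: str, actual: str) -> float:
    """Strict match for closed-ended questions: 1.0 if identical, else 0.0."""
    return 1.0 if predicted.strip().lower() == actual.strip().lower() else 0.0

def accuracy_by(records, key):
    """Mean score grouped by a record field, e.g. 'survey_type' or 'format'."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        sums[r[key]] += r["score"]
        counts[r[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Each record stands in for one of the 18,836 prediction/answer pairs.
records = [
    {"survey_type": "U&A", "completion": 0.5, "format": "binary",
     "score": exact_match("Yes", "Yes")},
    {"survey_type": "TL", "completion": 0.9, "format": "single_choice",
     "score": exact_match("Option B", "Option C")},
]
print(accuracy_by(records, "survey_type"))
```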
How we built our AI model.
We didn't just "ask an LLM" to guess answers; we built a specialized multi-agent architecture to intelligently predict how real respondents would answer the remaining survey.
- Our layered system deployed specialized agents based on question formats (binary, single-choice, NPS, open-ended).
- We used a LangGraph pipeline to route questions intelligently based on complexity, sending simple binary questions to GPT-4o-mini and complex subjective judgments to Claude 4.5 Sonnet (a simplified sketch of this routing idea follows below).
- Each prediction was fed a "human seed" consisting of the respondent's demographic profile (age, income, location) and their answers before they dropped out, along with base rates that provided context on the statistical likelihood of specific answers within the broader population.
- Accuracy measurement varied by question type: exact match for binary and single-choice questions (strict, no partial credit), and a 5-component semantic similarity score for open-ended questions (including embedding similarity, keyword overlap, and length). Because open-ended answers are scored on this graded scale rather than on strict matching, their accuracy figures are not an apples-to-apples comparison with binary or single-choice accuracy.
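Below is a simplified, plain-Python illustration of the routing idea, not the actual LangGraph pipeline. The model names follow the text above; the routing table, field names, and prompt format are assumptions made for the example.

```python
# Simplified illustration of format-based routing (not the actual LangGraph
# pipeline). Cheap, objective formats go to a lighter model; subjective or
# open-ended formats go to a stronger one. The exact routing table here is an
# assumption beyond the two examples named in the text.
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    fmt: str  # "binary" | "single_choice" | "nps" | "open_end"

ROUTES = {
    "binary": "gpt-4o-mini",
    "single_choice": "gpt-4o-mini",
    "nps": "claude-4.5-sonnet",
    "open_end": "claude-4.5-sonnet",
}

def build_prompt(question: Question, human_seed: dict) -> str:
    """Combine the 'human seed' (profile, answers so far, base rates) with the
    question the respondent never reached."""
    return (
        f"Respondent profile: {human_seed['profile']}\n"
        f"Answers before drop-out: {human_seed['answers']}\n"
        f"Population base rates: {human_seed['base_rates']}\n"
        f"Predict this respondent's answer to: {question.text}"
    )

def route(question: Question) -> str:
    """Pick a model for a question based on its format."""
    return ROUTES[question.fmt]

q = Question("Which trend will dominate your industry?", "open_end")
seed = {"profile": {"age": 34, "income": "60-80k", "location": "Montreal"},
        "answers": [("q1", "Yes"), ("q2", "Weekly")], "base_rates": {}}
print(route(q))                           # -> "claude-4.5-sonnet"
print(build_prompt(q, seed).splitlines()[0])
```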
Our findings: How did the synthetic model perform?
While we found some encouraging findings on the accuracy of synthetic completion, we also uncovered a phenomenon that gave us pause.
Overall performance.
Overall, the system achieved 55.78% exact-match accuracy — roughly twice the random baseline for this question mix. But as you’ll see further on, this headline figure tells only part of the story.
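For readers wondering what “roughly twice the random baseline” means in practice, here is a purely illustrative calculation. The question-mix shares and option counts below are invented for the example; the study’s actual mix is not published here.

```python
# Purely illustrative: a blended random-guess baseline for a mix of
# closed-ended question formats. The shares and option counts are invented
# assumptions, not the study's actual question mix.
mix = {
    # format: (assumed share of questions, chance of a random correct guess)
    "binary": (0.25, 1 / 2),
    "single_choice_5_options": (0.50, 1 / 5),
    "nps_11_point_scale": (0.25, 1 / 11),
}
baseline = sum(share * p_correct for share, p_correct in mix.values())
print(f"Blended random baseline: {baseline:.1%}")
# Under these assumptions the baseline lands near 25%, which is how a ~56%
# exact-match rate can come out at roughly double random guessing.
```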
The mystery: Where more context made AI worse.
In U&A research, the AI model performed as expected. The more it knew from past responses (i.e., higher completion rate), the better it predicted the remaining survey responses.
This aligned with our intuition. U&A research tends to be rooted in recall and evaluation: what consumers bought, how frequently they used it, and how satisfied they felt. These domains align closely with the pattern-recognition strengths inherent to large language models. In TL surveys, however, the pattern inverted: the more context the model was given, the less accurately it predicted the remaining answers. That paradox is what we unpack next.
Analysis: What explains the synthetic completion paradox?
To understand why more context (i.e., higher completion rates) led to lower accuracy in TL surveys, we dug into a few hypotheses related to survey architecture.
1. Was it the survey content?
First, our analysis found that accuracy was impacted by the temporal orientation of the survey content.
- U&A surveys are largely retrospective. They are rooted in the past and present (e.g., "What did you buy?"). AI excels at identifying these historical behavioral patterns.
- Conversely, TL surveys are increasingly prospective. They ask for a point of view on an unwritten future (e.g., "What trends will shape your industry?"). These questions largely require strategic foresight that isn’t always grounded in existing data.
The differences in survey content explain why accuracy degraded with higher survey completion for TL surveys. While the earlier portions of TL surveys center on demographic, awareness, and screening questions rooted in the past or present, the questions become more strategic and future-facing as the survey goes on. Presumably, accuracy degrades as the survey moves from "what is" to "what will be".
To confirm our hunch, we looked deeper into the anatomical differences between our two types of survey research: the format of the questions themselves — as well as where they sit in the survey.
2. Was it the question formats?
We started by analyzing the impact of question format on accuracy, looking at four types of survey questions.
- Binary questions, like ‘Are you familiar with Web3?’ or ‘Do you track emerging trends?’
- Multiple-choice questions, like ‘Which feature do you use?’ or ‘Which trend will dominate?’
- NPS or sentiment questions, like ‘How likely are you to recommend this product?’ or ‘How optimistic are you about AI regulation?’
- Open-ended questions, like ‘What would improve this service?’ or ‘How will this emerging trend shape your industry?’
Our findings showed that alongside the survey content, the question format also impacted the accuracy of synthetic prediction.
The pattern was consistent: AI reliably handles what can be inferred from established behavior, and breaks down where the question demands genuine human judgment — strategic, speculative, or sentiment-driven.
So, we saw that survey content and question format played a significant role. But there was one other thing that affected accuracy — and better explained the synthetic completion paradox.
3. Was it the question distribution?
While question content and format play a role, our investigation revealed that the real difference in accuracy stems from survey structure.
Survey questions demand varying levels of cognitive load. Beyond their content and format, questions also require distinct levels of mental effort from the respondent. We identified four categories of questions based on cognitive load.
Cognitive demands escalate differently across surveys. These question levels, from L1 recall questions to L4 speculative questions, are actually distributed differently across U&A and TL surveys.
What we found was that TL surveys have a steeper cognitive progression than U&A surveys, which are typically flatter.
- U&A surveys usually stay within L1-L3 throughout. Later questions ask about satisfaction, preferences, and opinions based on past or current behaviors and experiences.
- Thought leadership surveys shift to mostly L4 questions after the first half. The latter half demands more foresight and strategic judgment.
These cognitive curves dictate how difficult it is for the AI to accurately predict responses. For U&A surveys (with a flatter cognitive curve), more context about past behavior genuinely helps predict later questions on preferences and sentiments.
On the other hand, for TL surveys (with a steeper cognitive curve), past context can’t reliably predict responses that demand speculation on the future or have no single objective answer. In other words, TL surveys tend to get more difficult earlier on, with questions that are harder for AI to predict.
That’s why we see the survey completion paradox in TL surveys, where more context (i.e., higher completion rates) was associated with a lower average accuracy of synthetic predictions.
- TL surveys: Average accuracy is higher at 50% survey completion than at 90% — an inverse relationship — because higher completion rates force the AI to predict only L4 questions. The L4 questions toward the end of a TL survey are all about future-oriented or strategic judgment, and context from L1-L3 answers does not help the model predict these reliably. Thus, accuracy decreases as the completion rate rises from 50% to 90%.
- U&A surveys: The second half of a U&A survey maintains L2-L3 difficulty, so more context from earlier questions genuinely helps predict these responses. This is why average accuracy increases from 50% to 90% survey completion.
Takeaways: What to ask before you apply.
So, what should you do? This is just an early study based on a proof-of-concept for a synthetic completion model.
A few specific constraints of our study are worth naming.
- First, our validation used a small held-out sample per project (roughly 10 respondents per completion rate), which is enough to identify directional patterns but not to claim statistical precision. The findings are consistent across nine independent projects, which gives us confidence in the direction, but larger-scale validation is the clear next step.
- Second, our simulation assumed respondents dropped out sequentially from the end of a survey. In practice, dropout can be triggered by a specific question type, a moment of friction, or a sensitive topic — patterns our masking strategy doesn't fully replicate.
- Third, all nine projects in this study were conducted in English. Whether these findings hold across multilingual surveys or non-Western research contexts remains an open question, and an important one for a global research industry.
Naturally, we need more data before we can be confident in these takeaways. This area of research is evolving rapidly, and whether to use synthetic data is rarely a simple yes-or-no question.
You just need to ask yourself the right questions.
Past or future
Are the questions left to be completed about past behaviors and sentiments, or about what respondents think will happen in the future?
- In our study, AI could not adequately replicate tacit domain expertise or future judgment, so the stakes dictate the application. Questions based on past or present behavior (L1-L2) are currently simpler to predict, whereas questions involving future predictions or strategic judgment (L4) remain less reliable. It’s the difference between publishing a TL report on how many people across nations are currently using EVs, and one built on expert POVs about the future of the EV industry.
Question distribution
How many questions, and what kinds, are left in your survey?
- As we saw, for U&A surveys, the accuracy of synthetic completion is stable across all stages. For TL surveys, it degrades once a survey is more than 50–60% complete and mostly L4 questions remain to be predicted.
- The value of consulting and investment research is often its “jagged edge”, or the outlier insight. We observed that using AI to complete open-ended questions risks regressing your POV to a “probabilistic average”.
Risk tolerance
Will the synthetic data inform directional signals or high-stakes strategy?
- Using synthetic completion for TL surveys, or to answer other high-stakes strategic questions, is as much a technical question as it is a reputational one. Even if synthetic data can technically complete a survey, there’s a bigger question: how does it affect the credibility of your work? For many audiences, tolerance for AI may be higher if it's seen as an investment to make the research more robust — rather than a cost-saving measure.
AI in market research is moving fast. Experimentation must speed up, too.
The landscape of AI-powered market research is shifting weekly, and there are already meaningful AI applications in survey design, AI-assisted interviewing, data quality controls, and analysis that we’ve implemented to improve research efficiency and outcomes.
In this early study on “completing the incompletes”, we’ve only explored a small section of what’s possible with synthetic data. We need more data before making confident claims, and further research is the only way to see exactly where AI’s strengths end and the human premium lives.
In the spirit of experimentation, next time you conduct a survey with Potloc, we’re happy to help you compare the real human responses in your sample against synthetic predictions from our multi-agent AI model. Talk to one of our experts if this could be of interest to your firm.