Well, it depends. While analysing 25,000 rows of open-ended responses, we learned that publicly available LLMs work well for coding qualitative responses into pre-defined categories. However, these products fail to do so at scale.
2023 is arguably Generative AI’s breakout year. As with all new technologies, it has evoked conversations about defining the role of AI in our lives, jobs and the world we live in. A big part of the conversation on the role of AI in our jobs is usually the question, “Will AI replace us?” Much like everyone else, our team at YUX had (and still has) differing views on the role of AI in today’s world. However, in the middle of the year, we worked with Google on a project that had us thinking deeply about AI and its role in analysing large-scale qualitative data.
The User Research Challenge
This project was an exercise in learning as it challenged our team to think of new ways to collect and analyse qualitative data. The data collection methodology designed by the client was new to our team: a diary study with 1,000 participants across 4 African countries. At the end of the data collection process, we had received 25,000 responses. Each response combined answers to open-ended and multi-select questions. While we could chart the multi-select answers using different graphs, our biggest challenge was finding a tool to code the qualitative responses. This article describes:
- how we used our market research tool, LOOKA, for recruitment and data collection, and
- the learnings and challenges of analysing large-scale qualitative data with generative AI tools on the market.
image generated with DALL-E-3
Finding a Tool to Code Large-scale Qualitative Data
As soon as we started the study, we began searching for a tool to analyse the 25,000 qualitative responses. In our search, we looked for speed and efficiency. The perfect product for the job would automate coding 25,000 open-ended responses into pre-defined categories in record time. This would leave our team with enough time to focus on drawing correlations, quizzing the data and contextualising our findings. We tested a few tools for mixed-method data analysis recommended by our network and quickly realised that they were not built for:
- the scale of data we anticipated, and
- automating the process of coding the responses.
Using a dummy dataset of 30 survey responses filled in by members of the team, we tested a few tools and landed on one that suited our needs: Numerous AI. It is a ChatGPT-enabled Google Sheets plugin with an INFER() formula that learns from rows of qualitative responses and their matching codes, then codes and quantifies the remaining rows of qualitative responses.
How Did We Approach Recruitment and Data Collection?
Collecting 25,000 responses tested our process and challenged us to design a better process for data collection. We used LOOKA to recruit participants and collect survey responses during the study. Regarding the recruitment process, we had to make sure that the samples for each country were nationally representative. Given the diversity of profiles, we used a mix of methods including:
- Picking from our proprietary panel
- Using online ads to source digitally accessible participants
- On-the-ground recruitment (surveyors) for participants, especially those in rural areas, low-income segments and/or above 50 years of age
The use case for data collection was new as each participant had to fill in the same form multiple times a day over a few days. Each of the 20 LOOKA moderators on this project managed 50 participants in a WhatsApp group, reminding them to fill in the form. We had hoped for responses in a few local languages spoken across the 4 countries (the participants had the option to fill out the form in local languages). However, we received very few responses in these languages. People who filled out the survey actively use the internet and probably perceive English as the language of the internet.
We also experienced a few more challenges:
- High drop-off rate: Given the frequency of collecting responses (5 times a day for 5 days), we experienced a higher drop-off rate compared to previous diary studies. Each participant was expected to submit 25 responses. For an individual’s participation to be valid, they needed to reach the minimum threshold of 75%. Anyone who couldn’t reach the threshold was replaced in the middle of data collection. To mitigate this, we organised additional catch-up groups to reach the different quotas set initially.
- Dishonest participation: We encountered cases of participants who tried to bypass the recruitment process. Also, some didn’t respect the timeline and waited till the last 1-2 days to send in all 25 responses. Others shared the same information for every entry, against the study instructions. To ensure the data collected was reliable, we implemented a multi-layered verification process from registration to the payment of incentives. We double-checked IP addresses and bank/MoMo accounts to identify potentially fraudulent participants. We also made changes to our survey platform to limit the number of responses per participant.
- Managing external factors: Factors such as power outages and poor internet connection impacted the data collection process. Some participants had to write down their answers on paper and enter them in the online form whenever it was possible.
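The validity rule and two of the warning signs described above can be expressed as simple checks. The following is a minimal sketch under stated assumptions, not the verification process the LOOKA team actually ran: the `Entry` type, its field names and the "late days" cut-off are hypothetical, while the 25 expected responses and the 75% threshold come from the study design.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

EXPECTED_RESPONSES = 25   # 5 entries a day for 5 days (from the study design)
MIN_COMPLETION = 0.75     # validity threshold (from the study design)


@dataclass
class Entry:
    participant_id: str   # hypothetical field names, for illustration only
    day: int              # study day, 1 to 5
    text: str             # open-ended answer


def review_participants(entries, late_days=(4, 5)):
    """Return {participant_id: set of flags} for participants needing review."""
    by_participant = defaultdict(list)
    for entry in entries:
        by_participant[entry.participant_id].append(entry)

    # 75% of 25 is 18.75, so a participant needs at least 19 entries.
    min_entries = math.ceil(EXPECTED_RESPONSES * MIN_COMPLETION)

    flagged = {}
    for pid, items in by_participant.items():
        flags = set()
        if len(items) < min_entries:
            flags.add("below_threshold")
        # Same answer for every entry (ignoring case and surrounding spaces).
        answers = {e.text.strip().lower() for e in items}
        if len(items) > 1 and len(answers) == 1:
            flags.add("identical_entries")
        # Every entry sent on the last days of the study (assumed cut-off).
        if all(e.day in late_days for e in items):
            flags.add("all_late")
        if flags:
            flagged[pid] = flags
    return flagged
```

Flags like these only point a moderator to participants worth a closer look; they do not replace checks such as comparing IP addresses or payment accounts.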
Expectation vs Reality: What Was Analysis Like?
We anonymised the data and were transparent with the client on the use of AI-powered tools for the analysis.
The first phase of the analysis involved brainstorming correlations. We held sessions to brainstorm and prioritise correlations while we were still collecting the data. As soon as the LOOKA team wrapped up data collection, our first step was charting some of these correlations using pivot tables and seeing what insights came up.
The second phase of the analysis was refining and contextualising the different codes for the open-ended question. The client shared a list of codes, and we included a few more we found from the data. Numerous AI was helpful with this as it identified a few more categories we hadn’t picked up on.
As soon as we moved on to the third phase of the analysis, coding the rest of the data, we ran into a few challenges with Numerous AI. Though we had paid for tokens, we noticed that they were exhausted after fewer than 3,000 rows of data. One of the reasons this kept happening was that the INFER() function couldn’t take in the full volume of data at once without throwing an error. So, we decided to run the function in batches of a few hundred rows. Then another challenge arose: rows that had already been categorised were re-categorised (zapping the tokens) any time a team member opened or refreshed the Google Sheet. Further clarification with the Numerous AI team revealed the limitations that the Google Sheets integration with ChatGPT posed for using the plugin.
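Outside a spreadsheet formula, one way to avoid paying twice for the same row is to store each result once and only send rows that have no stored code yet. This is a minimal sketch of that idea, not a description of how Numerous AI works internally: `classify` is a hypothetical callable standing in for whatever model call is used, and the local JSON file is an assumed storage choice.

```python
import json
from pathlib import Path


def code_once(rows, classify, cache_path):
    """Code each row at most once across repeated runs.

    rows: {row_id: response text}; row ids must be strings (JSON keys)
    classify: hypothetical callable, text -> category label (e.g. an LLM call)
    cache_path: JSON file holding {row_id: category} from earlier runs
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    for row_id, text in rows.items():
        if row_id not in cache:          # rows coded earlier are not re-sent
            cache[row_id] = classify(text)
    path.write_text(json.dumps(cache, indent=2))
    return cache
```

Running `code_once` a second time on the same rows makes no further calls to `classify`, which is the behaviour we were missing when the sheet recalculated on every refresh.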
As a result, we adapted and used ChatGPT directly to code the qualitative responses. We wrote a prompt, tested it with some data, refined it, and then fed it the rest of the data. The prompt worked as expected. The goal was to feed ChatGPT all the rows we needed to categorise at once, but we soon realised that it could not code more than 150 rows at a time without hallucinating or sometimes returning empty cells. This was a big challenge for us as we had to code in batches, but we managed to pull through.
Another challenge was that ChatGPT didn’t tell us when it failed at the task. Every output it returned came with a disclaimer, but we still had to figure out which rows it had skipped and spend time cleaning up the data.
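Batching of this kind is easy to script. The sketch below is illustrative only: the prompt wording is not the prompt we used, the function names are made up for this example, and the tab-separated reply format is an assumption. Only the 150-row limit comes from our experience.

```python
def make_batches(rows, batch_size=150):
    """Split a list of (row_id, text) pairs into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def build_prompt(categories, batch):
    """Build one prompt asking for a single pre-defined category per row.

    The wording and the reply format are assumptions for illustration.
    """
    lines = [
        "Assign exactly one category to each response below.",
        "Allowed categories: " + ", ".join(categories),
        "Reply with one line per response: <row_id><TAB><category>.",
        "",
    ]
    for row_id, text in batch:
        # Collapse tabs and line breaks so each response stays on one line.
        lines.append(f"{row_id}\t{' '.join(text.split())}")
    return "\n".join(lines)
```

If all 25,000 responses were sent this way in batches of 150, that would be 167 separate prompts, which gives a sense of how much manual work the limit created.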
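Finding skipped rows does not have to be done by eye. Here is a minimal sketch of a reconciliation step, assuming each row was sent with an id and the model was asked to reply with one `<row_id><TAB><category>` line per row; that reply format and the function name are assumptions for illustration, not what we had in place at the time.

```python
def check_batch(batch_ids, categories, reply):
    """Compare a model reply against the rows that were sent.

    Returns (coded, invalid, missing):
      coded   - {row_id: category} for rows with an allowed category
      invalid - {row_id: label} for rows with an empty or off-list label
      missing - row ids that never appeared in the reply
    """
    allowed = set(categories)
    expected = set(batch_ids)
    coded, invalid = {}, {}
    for line in reply.splitlines():
        row_id, sep, label = line.partition("\t")
        row_id, label = row_id.strip(), label.strip()
        if not sep or row_id not in expected:
            continue  # disclaimers, blank lines and unknown ids are ignored
        if row_id in coded or row_id in invalid:
            continue  # keep the first answer if a row is repeated
        if label in allowed:
            coded[row_id] = label
        else:
            invalid[row_id] = label
    missing = [rid for rid in batch_ids if rid not in coded and rid not in invalid]
    return coded, invalid, missing
```

Rows in `invalid` and `missing` can then be re-sent in a smaller batch or coded by hand, instead of being discovered later during clean-up.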
Reflections and Future Outlook: Expectations for AI-Powered UX Research Products
So, is generative AI the tool for uncovering Africa’s user needs or analysing large-scale qualitative data? The answer is yes, but there is a growing need for AI-powered products designed specifically for this use case and experience. The current tools, both free and paid, have limitations, as our experience on this project shows. What have we learned about the current state of AI and AI-enabled tools from working on this project?
- Discoveries in the analysis: We didn’t go into the research expecting AI to be a tool for discovery, but it was. Numerous AI (a ChatGPT-enabled paid tool) helped us identify new categories for coding the data.
- Publicly available tools not efficient at scale: On the other hand, ChatGPT and Numerous AI (paid versions of these tools) fell short of our expectations of efficiency in coding the volume of data we were working with.
- Are large-scale AI-moderated interviews the future of qualitative research and analysis? Recently, I learnt about Outset AI, an AI-moderated chatbot for conducting and analysing qualitative research. While it doesn’t solve the problem this research posed for our team, it is an interesting product.
- Organisations building internal AI tools to analyse qualitative data at scale: This case study by the GE team on their internally built AI tool, which analyses a large volume of customer support tickets to find UX research insights, is super insightful. While they didn’t analyse the data based on pre-defined categories, their internal tool identified recurring trends. Reading the case study, it appears they didn’t run into the challenge of efficiency at scale. This raises the question, “Is building an internal tool the best approach to using AI for qualitative data analysis at scale?”
Leading this study was a very fulfilling experience as it stretched me and built my confidence in quantitative data analysis. It also got me reading about how other researchers use AI tools for large-scale qualitative research and analysis. While I am not a futurist, I have been thinking, reading and listening to people talk about the role of AI in the future of work and qualitative research and analysis.
As much as I am open-minded about the future of AI in user research, I am also measured in my thinking as AI trains on the biases that exist in the world. Sense-making and local context are important for analysing qualitative data, and AI falls short here. However, I am excited by the possibility of AI surfacing new findings and leaving researchers with more time to focus on sense-making.
We’re still searching for an AI-powered product that can meet the user research analysis needs this project has raised for us. Do you know of a product, or are you working on one? Send a message on LinkedIn or an email to email@example.com. I’ll be happy to chat with you.