
This article is part of the PROMPT series.
By Massimo Terenzi and Fabio Giglietto
The Digital Services Act (DSA), which officially came into force on November 16, 2022, is a game-changer for social media research. By requiring Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) to grant researchers access to their data, the EU-wide law promises to revolutionise transparency. In a time when disinformation, algorithmic bias, and content moderation failures shape public discourse, accessing platform data is critical for understanding how these dynamics unfold.
However, while the DSA is a step forward, it is far from being a silver bullet. Platforms have responded in different ways, with inconsistent implementations, access restrictions, and technical roadblocks that limit the law’s effectiveness. The question now is not just what’s changing but whether these changes will actually help researchers do their work.
The DSA: a breakthrough for platform transparency?
For years, social media research has been constrained by opaque policies, unpredictable API shutdowns, and restrictive terms of service. Investigating coordinated disinformation campaigns, engagement manipulation, and content amplification has often required workarounds—ranging from scraping publicly visible data to relying on platforms’ voluntary transparency initiatives, many of which could be revoked at any time.
The EU is trying to change this with the DSA. Article 40.12 establishes that social media platforms must provide researchers with access to publicly accessible data—without requiring special agreements or privileged access. This data includes everything from public posts and engagement metrics to metadata on how content spreads.
The goal? To create a stable, reliable, and scalable research infrastructure, allowing researchers to study systemic risks like disinformation, election interference, and online harms.
At least, that’s the theory. The reality is proving more complicated.
The promise of data access: what’s working?
For the first time, platforms are being legally compelled to provide structured data access for researchers studying systemic risks like disinformation, algorithmic amplification, and online manipulation. While research APIs and tools have existed before, they were often unreliable, fragmented, or restricted to select partners. With the Digital Services Act, the landscape is shifting toward a more structured and enforceable framework, reducing the risk of arbitrary shutdowns or data restrictions.
Some platforms have begun rolling out interactive dashboards, designed for non-technical researchers to explore engagement trends, filter content, and generate insights without writing a single line of code. These dashboards lower the barrier to entry, making it easier for journalists, policymakers, and academics without programming expertise to analyse platform behaviour.
For those conducting large-scale, quantitative studies, application programming interfaces (APIs) provide direct access to massive datasets. These APIs allow researchers to pull granular metadata on posts, comments, and engagement metrics, enabling longitudinal studies that track how online behaviors evolve over time. In some cases, platforms are offering data export features, allowing research teams to merge social media datasets with other sources, such as election data or economic indicators, to uncover broader patterns.
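To make this concrete, here is a minimal Python sketch of what API-based collection typically looks like: paging through public posts that match a query. The endpoint, parameter names, authentication scheme, and response fields are all assumptions for illustration; every platform's research API defines its own.

```python
import requests

# Hypothetical DSA research API endpoint -- real endpoints, parameters,
# and authentication schemes vary by platform.
BASE_URL = "https://api.example-platform.com/research/v1/posts"
API_TOKEN = "YOUR_RESEARCH_TOKEN"  # issued after an approved access request

def fetch_posts(query: str, since: str, until: str, max_pages: int = 10):
    """Collect public posts matching a query, following cursor-based pagination."""
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    params = {"q": query, "since": since, "until": until, "limit": 100}
    posts, cursor = [], None

    for _ in range(max_pages):
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(BASE_URL, headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        posts.extend(payload.get("data", []))
        cursor = payload.get("next_cursor")  # assumed pagination field
        if not cursor:
            break
    return posts

# Example: collect posts mentioning an election topic over one week.
posts = fetch_posts("election", "2024-06-01", "2024-06-07")
print(f"Collected {len(posts)} posts")
```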
This shift has enormous implications for digital research, unlocking new capabilities to analyse:
- How narratives spread across different platforms, revealing cross-platform amplification and the influence of key actors.
- Which types of content are amplified by algorithmic ranking systems, providing insight into bias, echo chambers, and virality mechanisms.
- How moderation policies impact engagement and visibility over time, helping assess whether enforcement is consistent, effective, or prone to unintended consequences.
- The role of bot networks and coordinated influence operations, by cross-referencing metadata and engagement patterns to detect manipulation tactics in real time.
However, the real challenge isn’t just accessing the data—it’s ensuring that it is complete, usable, and comparable across platforms. Without standardised formats, consistent metadata structures, and transparency about data omissions, even the most advanced tools risk producing incomplete or misleading insights. The potential is there, but the execution remains an open question.
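As an illustration of that comparability problem, the sketch below shows the kind of normalisation layer research teams currently have to write for themselves: it maps records from two hypothetical platforms, with different field names and different metric granularity, onto a single common format. Every field name here is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CommonPost:
    """Platform-agnostic record used for cross-platform comparison."""
    platform: str
    post_id: str
    created_at: datetime
    text: str
    engagement: int  # coarsest shared metric: total interactions

def from_platform_a(raw: dict) -> CommonPost:
    # Platform A (hypothetical) exposes granular per-metric counts.
    return CommonPost(
        platform="A",
        post_id=raw["id"],
        created_at=datetime.fromtimestamp(raw["created_ts"], tz=timezone.utc),
        text=raw["body"],
        engagement=raw["likes"] + raw["shares"] + raw["comments"],
    )

def from_platform_b(raw: dict) -> CommonPost:
    # Platform B (hypothetical) only reports an aggregated statistic,
    # so granularity is lost -- exactly the comparability gap described above.
    return CommonPost(
        platform="B",
        post_id=raw["post_id"],
        created_at=datetime.fromisoformat(raw["published_at"]),
        text=raw["content"],
        engagement=raw["total_interactions"],
    )
```

Every additional platform means another bespoke adapter like these, and any metric that one platform reports only in aggregate caps the granularity of the entire comparison.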
Where the DSA falls short: a fractured system
Despite the law’s ambitions, data access under the DSA remains fragmented. Each VLOP and VLOSE interprets the regulation differently, leading to:
- major discrepancies in the type and volume of data provided;
- varying degrees of transparency regarding what is included (or excluded) from datasets;
- significant inconsistencies in API structures, making cross-platform comparisons nearly impossible.
For instance, while some platforms offer detailed engagement metrics, others provide only aggregated statistics, stripping away valuable context. In some cases, publicly available data is locked behind application processes, requiring researchers to submit extensive documentation before even getting access.
Technical barriers make things even harder. Rate limits, for example, cap the number of API queries a researcher can make in a given timeframe. This makes it difficult—if not impossible—to collect large-scale, real-time data. For studies on fast-moving events like elections or misinformation surges, delays in access can make findings obsolete before they’re even published.
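A common coping strategy is client-side throttling with exponential backoff, sketched below. The assumption that the API signals rate limiting with an HTTP 429 status and a Retry-After header is a widespread convention, not something the DSA or any specific platform guarantees.

```python
import time
import requests

def get_with_backoff(url: str, params: dict, max_retries: int = 5) -> dict:
    """Issue a GET request, backing off exponentially when rate-limited.

    Assumes the API returns HTTP 429 when the quota is exhausted and may
    include a Retry-After header; both are conventions, not guarantees.
    """
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(url, params=params, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Honour the server's hint if present, otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```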
And then there’s the issue of data accuracy. Several researchers have already reported discrepancies between API data and what is actually visible on platforms, raising concerns about whether platforms are selectively filtering information or limiting research scope. Without independent auditing mechanisms, verifying data reliability becomes an uphill battle.
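In the absence of formal audits, research teams can at least flag suspicious gaps themselves. The sketch below compares engagement counts returned by an API against values observed independently on the platform (for instance, from a manually coded sample) and reports records that are missing or diverge beyond a tolerance. The field names, toy data, and threshold are illustrative assumptions.

```python
def flag_discrepancies(api_records: dict, observed: dict, tolerance: float = 0.05):
    """Compare API-reported engagement against independently observed values.

    api_records / observed: mappings of post_id -> engagement count.
    Returns post ids whose relative difference exceeds the tolerance.
    """
    flagged = []
    for post_id, seen in observed.items():
        reported = api_records.get(post_id)
        if reported is None:
            flagged.append((post_id, "missing from API"))
            continue
        if seen and abs(reported - seen) / seen > tolerance:
            flagged.append((post_id, f"API {reported} vs observed {seen}"))
    return flagged

# Toy example: post "p2" is under-reported by the API, "p3" is absent entirely.
api = {"p1": 100, "p2": 40}
seen = {"p1": 102, "p2": 90, "p3": 15}
print(flag_discrepancies(api, seen))
```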
The bureaucratic maze of data access
Beyond the technical limitations, the bureaucratic hurdles researchers face under the DSA are significant. Platforms often require:
- detailed project proposals, including research objectives, methodologies, and expected outcomes;
- institutional backing, making access more difficult for independent researchers or journalists;
- strict ethics approvals, which, while necessary, create delays in time-sensitive studies.
Even after clearing these hurdles, the approval process remains opaque. Researchers frequently report long wait times and unclear rejection criteria, leading to frustration and uncertainty about whether they will ever gain access at all.
Moreover, some platforms require researchers to work within secure research environments, such as virtual clean rooms, where data can be analysed but not exported. While these measures are designed to protect user privacy, they also create logistical challenges that limit how researchers can use and integrate data into broader studies.
Fixing the system: what needs to happen next?
For the DSA to fulfil its promise, a coordinated effort is needed among platforms, regulators, and the research community. At least three key areas need urgent attention.
- Standardisation of data access
Platforms need to align their APIs, documentation, and data formats to enable cross-platform analysis. Without a unified structure, researchers will continue to struggle with fragmented, inconsistent datasets that hinder meaningful comparisons.
- Increased transparency in application processes
The criteria for accessing platform data should be clear and publicly available. Platforms must streamline approval processes, reduce unnecessary bureaucratic barriers, and provide timely responses to access requests.
- Independent auditing and verification
To ensure data reliability, third-party audits should be conducted on platform-provided datasets. This would help detect biases, missing information, or manipulations that could distort research findings.
From silos to synergy: strengthening research networks
Ensuring the reliability of data and tools, though, also requires a stronger, more connected research community. Right now, researchers working under the DSA often interact with platforms on an individual basis, negotiating access, troubleshooting technical issues, and navigating opaque approval processes largely on their own. This siloed approach creates inefficiencies, preventing the kind of collective problem-solving that could significantly improve research outcomes.
One major issue is the lack of a shared space where researchers and other stakeholders with API access can communicate in real time. When something goes wrong—whether it’s an unexpected API outage, missing data fields, or inconsistencies between datasets—there is no immediate way to determine whether the issue is widespread or specific to a single user. Instead, researchers must rely on platform representatives for clarification, slowing down investigations and creating unnecessary bottlenecks.
The absence of a centralised research network also makes it harder to develop and disseminate best practices. Each group is left to figure out methodologies, ethical guidelines, and data validation techniques on their own, often reinventing the wheel rather than building on collective knowledge. This fragmentation weakens the broader impact of research efforts and limits the ability to hold platforms accountable.
To address these challenges, a coordinated infrastructure is needed—one that facilitates horizontal information sharing, rapid troubleshooting, and collaborative problem-solving. Establishing independent researcher forums, shared documentation repositories, and open-access knowledge hubs would allow for quicker adaptation to technical changes, more effective oversight of platform-provided data, and a stronger collective voice in shaping future transparency initiatives.
A step forward, but not the finish line
The Digital Services Act was supposed to mark a turning point for digital transparency, forcing social media giants to open their doors to researchers studying disinformation, algorithmic bias, and platform governance. In many ways, it has. Access to structured datasets, APIs, and transparency tools has improved, creating new opportunities to analyse how content spreads, how algorithms shape engagement, and how moderation policies impact visibility.
But access alone isn’t enough. If data is incomplete, inconsistent, or restricted by technical and bureaucratic barriers, research suffers—and so does the public’s ability to understand what’s happening online. Without standardised APIs, cross-platform studies remain difficult. Without clear and efficient application processes, independent researchers and journalists risk being shut out. Without stronger guarantees on data reliability, findings could be skewed by hidden omissions or discrepancies.
What happens if these problems aren’t fixed? The stakes are high. Disinformation networks won’t wait for smoother access to research tools. Political manipulation, economic scams, and algorithmic amplification continue to evolve in real time, shaping public opinion before researchers can even collect the data needed to analyse them. The risk isn’t just slow research—it’s missing the bigger picture altogether.
The next step isn’t about rewriting the rules, but making sure they work as intended. Stronger standards for data access, better transparency on how platforms share information, and a research community that can test, validate, and refine these systems over time will be crucial. If the digital world is to be held accountable, the people studying it must be given the tools to do their jobs—without roadblocks, uncertainty, or compromise.
The foundation has been laid. Now, it’s time to build something that lasts.
This article was originally published on https://de.ejo-online.eu. The original article is available here: https://de.ejo-online.eu/digitales/den-code-knacken-zugang-zu-sozialdaten-und-die-auswirkungen-des-dsa-auf-die-desinformationsforschung
Opinions expressed on this website are those of the authors alone and do not necessarily reflect or represent the views, policies or positions of the EJO or the organisations with which they are affiliated.