Open science has transformed our information landscape, promising democratized knowledge and greater transparency. In this rapidly evolving environment, platforms such as The Lens, OpenAlex, and OpenAIRE are emerging as alternatives to traditional databases like Scopus, with collections that exceed 200 million records.

However, SCImago has led an analysis titled “A Comparative Analysis of Open and Commercial Bibliographic Infrastructures,” and its findings invite deeper reflection: scale is not always synonymous with quality.

The study compared Scopus (with approximately 74 million records) against open platforms that now surpass the 200 million record mark. The exponential growth of the latter relies on aggressive automation designed to capture preprints, repositories, and grey literature.

Over 200 million records vs. 74 million: The massive growth of open platforms compared to the traditional editorial curation model.

The main finding is that more data does not necessarily equate to better data. The pursuit of total coverage sacrifices precision, creating a “signal-to-noise” problem that complicates serious bibliometric evaluation.

Two worlds within a single database

The study shows that open platforms are not uniform collections. A closer look reveals two very distinct operational realities:

The “curated core”: A stable set of between 60 and 63 million records that overlap with databases like Scopus. Thanks to rigorous filtering processes, this core maintains robust metadata, reliable identifiers, and a consistent structure.

The “extended literature”: Approximately 150 million additional records captured through large-scale automation to encompass preprints, grey literature, and repositories. Here, the quality of metadata drops drastically.

Two worlds within the same database: The stable curated core overlapping with Scopus versus the vast and fragmented extended literature.

The warning signs in this periphery are evident. While over 93% of the records in the curated core include a DOI, this coverage plummets to 46% in The Lens and 53% in OpenAlex within the extended literature. Similarly, ISSN coverage falls below 32%, leaving millions of documents disconnected from any clear publishing framework.
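The coverage figures above are shares of records that carry a given identifier, computed separately for each segment. As a minimal sketch of that calculation, assuming a toy record list with hypothetical field names (`segment`, `doi`, `issn`) that do not correspond to any platform's actual schema or to the study's own method:

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"segment": "core", "doi": "10.1000/a1", "issn": "1234-5678"},
    {"segment": "core", "doi": "10.1000/a2", "issn": None},
    {"segment": "extended", "doi": None, "issn": None},
    {"segment": "extended", "doi": "10.1000/b1", "issn": None},
    {"segment": "extended", "doi": None, "issn": "2345-6789"},
    {"segment": "extended", "doi": None, "issn": None},
]


def coverage_by_segment(records, field):
    """Return {segment: share of records with a non-empty value for `field`}."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for rec in records:
        seg = rec["segment"]
        totals[seg] += 1
        if rec.get(field):  # None or empty string counts as missing
            filled[seg] += 1
    return {seg: filled[seg] / totals[seg] for seg in totals}


doi_coverage = coverage_by_segment(records, "doi")
issn_coverage = coverage_by_segment(records, "issn")

for seg in doi_coverage:
    print(f"{seg}: DOI {doi_coverage[seg]:.0%}, ISSN {issn_coverage[seg]:.0%}")
```

The same function could be pointed at any other field, such as an affiliation identifier, to measure how much of each segment can be tied to an institution.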


Challenges for institutional strategy

For institutional strategy, the most pressing issue is the absence of institutional affiliation data. In the extended literature, nearly 74% of the records cannot be linked to any institution.

The identity challenge: Why 74% of exclusive records in open platforms cannot be linked to any institution.

This poses a direct challenge to the visibility and positioning of universities and research centers. Research cannot be accurately evaluated, funded, or managed if the underlying identities of its creators are missing.

The impact imbalance

Compounding this lack of standardization is a deep structural problem. The millions of additional records on open platforms provide references that boost the impact metrics of already established journals, yet they rarely receive citations in return.

This imbalance is striking: while records in the curated core receive an average of 25 to 27 citations, documents found exclusively on open platforms average between 1.3 and 2 citations. In practice, this reinforces a well-known dynamic in science where visibility simply attracts more visibility, leaving much of the peripheral literature in the shadows.
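To put the gap in proportion, the reported averages can be turned into a ratio. The short sketch below uses only the figures quoted above; the variable names are illustrative, and the result is a rough range rather than a figure taken from the study:

```python
# Average citations per record, as reported in the text (low, high).
core_avg = (25.0, 27.0)       # curated core
exclusive_avg = (1.3, 2.0)    # records found only on open platforms

# Smallest gap: lowest core average against highest exclusive average.
min_ratio = core_avg[0] / exclusive_avg[1]
# Largest gap: highest core average against lowest exclusive average.
max_ratio = core_avg[1] / exclusive_avg[0]

print(f"Core records are cited roughly {min_ratio:.1f}x to {max_ratio:.1f}x more often")
```

On these numbers, a record in the curated core is cited somewhere between about 12 and 21 times as often as one found only on open platforms.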

The impact paradox: The curated core receives between 25 and 27 citations on average, while exclusive literature barely reaches 2 citations.

Towards a strategic and hybrid approach

Open infrastructures represent a fundamental achievement and have created new opportunities for research discovery. However, accessibility is not synonymous with reliability.

For institutional leaders and scientific evaluation specialists, the lesson is clear: more data does not automatically mean better data. When the majority of a system’s exclusive records cannot even be linked to an institution, an uncomfortable question emerges: Is your organization prepared to make funding or hiring decisions based on data where many researchers are effectively anonymous?

For rigorous bibliometric analysis, the hybrid approach remains the safest path. Leveraging the breadth of open science requires maintaining, at the same time, the strict quality-control standards that support informed, strategic decision-making.

To explore these findings further, you can read the full report at: https://www.scimagoepi.com/producto/a-comparative-analysis