PubMed Central
PubMed Central (PMC) is a free digital repository that archives open access full-text scholarly articles that have been published in biomedical and life sciences journals. As one of the major research databases developed by the National Center for Biotechnology Information (NCBI), PubMed Central is more than a document repository. Submissions to PMC are indexed and formatted for enhanced metadata, medical ontology, and unique identifiers which enrich the XML structured data for each article.[1] Content within PMC can be linked to other NCBI databases and accessed via Entrez search and retrieval systems, further enhancing the public's ability to discover, read and build upon its biomedical knowledge.[2]
Producer | United States National Library of Medicine (United States) |
---|---|
History | 2000–present |
Access | |
Cost | Free |
Coverage | |
Disciplines | Medicine |
Record depth | Index, abstract & full-text |
Format coverage | Journal articles |
Links | |
Website | www |
Title list(s) | www |
PubMed Central is distinct from PubMed.[3] PubMed Central is a free digital archive of full articles, accessible to anyone from anywhere via a web browser (with varying provisions for reuse). PubMed, by contrast, is a searchable database of biomedical citations and abstracts; the full-text article resides elsewhere (in print or online, free or behind a subscriber paywall).
As of December 2018, the PMC archive contained over 5.2 million articles,[4] with contributions coming from publishers or from authors depositing their manuscripts into the repository per the NIH Public Access Policy. Earlier data show that author-initiated deposits exceeded 103,000 papers in the 12 months from January 2013 to January 2014.[5] PMC identifies about 4,000 journals that participate in some capacity by depositing their published content into the PMC repository.[6] Some publishers delay the release of their articles on PubMed Central for a set time after publication, referred to as an "embargo period", ranging from a few months to a few years depending on the journal. (Embargoes of six to twelve months are the most common.) PubMed Central is a key example of "systematic external distribution by a third party",[7] which is still prohibited by the contributor agreements of many publishers.
History
PubMed Central began as E-biomed, initially proposed in May 1999 by then-NIH director Harold Varmus.[8] The idea came to him "abruptly" in December 1998, inspired by the early use of arXiv for preprints after a presentation from Pat Brown of Stanford and David Lipman, director of NCBI:[9][10]
But my views broadened abruptly one morning in December of 1998 when I met Pat Brown for coffee, at the café that was formerly the famed Tassajara Bakery, on the corner of Cole and Parnassus, during a visit to San Francisco. [...] A few weeks before our coffee, Pat had learned about the methods being used by the physicist Paul Ginsparg and his colleagues at Los Alamos to allow physicists and mathematicians to share their work with one another over the Internet. They were posting “preprints” (articles not yet submitted or accepted for publication) at a publicly accessible website (called LanX or arXiv) for anyone to read and critique. [...] The more I thought about this, the more I was convinced that a radical restructuring of methods for publishing, transmitting, storing, and using biomedical research reports might be possible and beneficial. In a spirit of enthusiasm and political innocence, I wrote a lengthy manifesto, proposing the creation of an NIH-supported online system, called E-biomed.
The goal of E-biomed was to provide free access to all biomedical research. Papers submitted to E-biomed could take one of two routes: either immediately published as a preprint, or through a traditional peer review process. The peer review process was to resemble contemporary overlay journals, with an external editorial board retaining control over the process of reviewing, curating, and listing papers which would otherwise be freely accessible on the central E-biomed server. Varmus intended to realize the new possibilities presented by communicating scientific results digitally, imagining continuous conversation about published work, versioned documents, and enriched "layered" formats allowing for multiple levels of detail.[8]
The proposal to create a central index of biomedical research was a radical departure from prevailing publishing norms. Prior to the internet, publication indexes operated largely like ISBNs: allocated by registration agencies to secondary publishers. The notion that anyone could own their own address space via a domain name and create their own indexing system was wholly new.[11][12] Major commercial publishers had begun experimenting with an indexing system for scientific papers shared across publishers as early as 1993, and were spurred to action following the E-biomed proposal. At the October 1999 STM Annual Frankfurt Conference, several publishers led by Springer-Verlag reached a hurried conference room consensus to launch their competitor prototype:[13]
At the Board meeting of the STM association, held the afternoon of Monday, October 11, before the fair’s Wednesday opening, discussion focused on an emerging U.S. National Library of Medicine (NLM) initiative called E-Biomed (later PubMed Central) that had been proposed by Harold Varmus of the National Institutes of Health in the spring of 1999. Varmus envisioned a digital archive of journals, accessible free of charge and with the added value of reference linking. “Our consensus was that publishers should be the ones doing the linking,” said Bob Campbell, who chaired the meeting. “Since we were ‘higher up the stream,’ so to speak, we should be able to link our articles ahead of the NLM as part of the process of producing them. Stefan von Holtzbrinck then set the ball rolling by offering to link Nature publications with anyone else’s. We decided to issue an announcement of a broad STM reference linking initiative. It was, of course, a strategic move only, since we had neither plan nor prototype.”
A small group led by Arnoud de Kemp of Springer-Verlag met in an adjacent room immediately following the Board meeting to draft the announcement, which was distributed to all attendees of the STM annual meeting the following day and published in an STM membership publication. [...] The potential benefit of the service that would become CrossRef was immediately apparent. Organizations such as AIP and IOP (Institute of Physics) had begun to link to each other’s publications, and the impossibility of replicating such one-off arrangements across the industry was obvious. As Tim Ingoldsby later put it, “All those linking agreements were going to kill us.”
Following vigorous lobbying by commercial publishers and scientific societies who feared lost profits,[14] NIH officials announced a revised PubMed Central proposal in August 1999.[9] PMC would receive submissions from publishers, rather than from authors as in E-biomed. Publishers were allowed to keep articles behind time-embargoed paywalls for up to one year. PMC would accept only peer-reviewed work, not preprints.[15] The then-unnamed publisher-led linking system shortly thereafter became CrossRef and the larger DOI system. Varmus, Brown, and others including Michael Eisen went on to found the Public Library of Science (PLoS) in 2001, reaching the conclusion "that if we really want to change the publication of scientific research, we must do the publishing ourselves."[16]
Adoption
Launched in February 2000, the repository has grown rapidly, driven by the NIH Public Access Policy, which is designed to make all research funded by the National Institutes of Health (NIH) freely accessible to anyone, and by the many publishers working cooperatively with the NIH to provide free access to their works. In late 2007, the Consolidated Appropriations Act of 2008 (H.R. 2764) was signed into law; it included a provision requiring the NIH to modify its policies and require that complete electronic copies of peer-reviewed research and findings from NIH-funded research be deposited in PubMed Central. These articles must be included within 12 months of publication. This was the first time the US government had required an agency to provide open access to research, and it was an evolution from the 2005 policy, in which the NIH asked researchers to voluntarily add their research to PubMed Central.[17]
A UK version of the PubMed Central system, UK PubMed Central (UKPMC), has been developed by the Wellcome Trust and the British Library as part of a nine-strong group of UK research funders. This system went live in January 2007. On 1 November 2012, it became Europe PubMed Central. The Canadian member of the PubMed Central International network, PubMed Central Canada, was launched in October 2009.
The National Library of Medicine's journal article markup language, the "NLM Journal Publishing Tag Set", is freely available.[18] The Association of Learned and Professional Society Publishers comments that "it is likely to become the standard for preparing scholarly content for both books and journals".[19] A related DTD is available for books.[20] The Library of Congress and the British Library have announced support for the NLM DTD.[21] It has also been popular with journal service providers.[22]
With the release of public access plans for many agencies beyond NIH, PMC is in the process of becoming the repository for a wider variety of articles.[23] This includes NASA content, with the interface branded as "PubSpace".[24][25]
Technology
Articles are sent to PubMed Central by publishers in XML or SGML, using a variety of article DTDs. Older and larger publishers may have their own established in-house DTDs, but many publishers use the NLM Journal Publishing DTD (see above).
Received articles are converted via XSLT to the very similar NLM Archiving and Interchange DTD. This process may reveal errors, which are reported back to the publisher for correction. Graphics are also converted to standard formats and sizes. Both the original and the converted forms are archived. The converted form is moved into a relational database, along with associated files for graphics, multimedia, or other associated data. Many publishers also provide PDFs of their articles, and these are made available without change.[26]
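The normalization step can be illustrated with a minimal sketch. It does not reproduce PMC's actual XSLT pipeline: a plain tag-renaming table stands in for the stylesheets, so that the example runs with the Python standard library alone. The in-house tag names (`doc`, `headline`, `summary`, `para`) and the list of required elements are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house publisher tags mapped to NLM-style element names.
TAG_MAP = {
    "doc": "article",
    "headline": "article-title",
    "summary": "abstract",
    "para": "p",
}
# Elements this sketch assumes every converted article must contain.
REQUIRED = ("article-title", "abstract")


def normalize(publisher_xml):
    """Return (converted_root, errors); errors would be reported to the publisher."""
    try:
        root = ET.fromstring(publisher_xml)
    except ET.ParseError as exc:
        return None, [f"not well-formed: {exc}"]
    # Rename every known in-house tag; unknown tags are left unchanged.
    for elem in root.iter():
        elem.tag = TAG_MAP.get(elem.tag, elem.tag)
    errors = [
        f"missing <{tag}>" for tag in REQUIRED if root.find(f".//{tag}") is None
    ]
    return root, errors
```

Calling `normalize("<doc><headline>T</headline></doc>")` returns the converted tree together with one error noting the missing abstract, mirroring how conversion can surface problems in the submitted file.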
Bibliographic citations are parsed and automatically linked to the relevant abstracts in PubMed, articles in PubMed Central, and resources on publishers' Web sites. PubMed links also lead to PubMed Central. Unresolvable references, such as to journals or particular articles not yet available at one of these sources, are tracked in the database and automatically come "live" when the resources become available.
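The deferred linking described above can be sketched as a small data structure. This is an illustration only, not PMC's implementation: the class name, the citation key (here a tuple of journal, volume and first page) and the URLs are all hypothetical.

```python
from collections import defaultdict


class CitationLinker:
    """Links citations to known targets; unresolved ones go live later."""

    def __init__(self):
        self.targets = {}                  # citation key -> URL of the resource
        self.pending = defaultdict(list)   # citation key -> waiting article IDs
        self.links = defaultdict(dict)     # citing article ID -> {key: URL}

    def cite(self, citing_id, key):
        """Record a citation; link it now if possible, otherwise track it."""
        if key in self.targets:
            self.links[citing_id][key] = self.targets[key]
        else:
            self.pending[key].append(citing_id)

    def add_target(self, key, url):
        """Register a newly available resource and resolve waiting citations."""
        self.targets[key] = url
        for citing_id in self.pending.pop(key, []):
            self.links[citing_id][key] = url
```

A citation recorded with `cite` before its target exists stays in `pending`; a later `add_target` call for the same key turns it into a link without the citing article being reprocessed.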
An in-house indexing system provides search capability, and is aware of biological and medical terminology, such as generic vs. proprietary drug names, and alternate names for organisms, diseases and anatomical parts.
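Terminology-aware search of this kind can be approximated by mapping synonyms to one canonical term before indexing and before querying. The sketch below assumes a tiny hand-written synonym table; the function names are hypothetical, and a production system would presumably rely on a curated vocabulary rather than a hard-coded list.

```python
import re
from collections import defaultdict

# Tiny illustrative synonym groups (generic and proprietary drug names).
SYNONYM_GROUPS = [
    {"acetaminophen", "paracetamol", "tylenol"},
    {"ibuprofen", "advil", "motrin"},
]
# Map every synonym to one canonical term (alphabetically first in its group).
CANONICAL = {term: min(group) for group in SYNONYM_GROUPS for term in group}


def terms(text):
    """Lowercase word tokens, each mapped to its canonical synonym."""
    return {CANONICAL.get(w, w) for w in re.findall(r"[a-z0-9]+", text.lower())}


def build_index(docs):
    """Map each canonical term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents matching every query term, synonyms included."""
    sets = [index.get(t, set()) for t in terms(query)]
    return set.intersection(*sets) if sets else set()
```

With this table, a query for "acetaminophen" also retrieves documents that mention only "paracetamol" or "Tylenol", because all three are stored under the same canonical term.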
When a user accesses a journal issue, a table of contents is automatically generated by retrieving all articles, letters, editorials, etc. for that issue. When an actual item such as an article is reached, PubMed Central converts the NLM markup to HTML for delivery, and provides links to related data objects. This is feasible because the variety of incoming data has first been converted to standard DTDs and graphic formats.
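As a rough illustration of on-demand table-of-contents generation, the sketch below selects every item belonging to one issue and renders it as an HTML list. The record fields (`id`, `issue`, `type`, `title`, `first_page`) and the `/articles/` URL scheme are assumptions for this example, not PMC's actual schema.

```python
from html import escape

# Hypothetical article records; field names are assumptions for this sketch.
ARTICLES = [
    {"id": "a1", "issue": "12(3)", "type": "article",
     "title": "Gene X & disease", "first_page": 101},
    {"id": "a2", "issue": "12(3)", "type": "editorial",
     "title": "In this issue", "first_page": 99},
    {"id": "a3", "issue": "12(4)", "type": "letter",
     "title": "Reply", "first_page": 150},
]


def table_of_contents(articles, issue):
    """Build an HTML list of every item in one issue, ordered by first page."""
    items = sorted(
        (a for a in articles if a["issue"] == issue),
        key=lambda a: a["first_page"],
    )
    rows = [
        f'<li><a href="/articles/{escape(a["id"])}">{escape(a["title"])}</a>'
        f' ({escape(a["type"])})</li>'
        for a in items
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Because every record shares the same fields regardless of which publisher supplied it, one function can serve all journals, which is the benefit of normalizing incoming data first.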
In a separate submission stream, NIH-funded authors may deposit articles into PubMed Central using the NIH Manuscript Submission (NIHMS) system. Articles submitted this way typically undergo XML markup so that they can be converted to the NLM DTD.
Reception
Reactions to PubMed Central among the scholarly publishing community range from genuine enthusiasm among some[27] to cautious concern among others.[28]
While PMC is a welcome partner to open access publishers in its ability to augment the discovery and dissemination of biomedical knowledge, that same ability causes others to worry about traffic being diverted from the published version of record, the economic consequences of reduced readership, and the effect on maintaining a community of scholars within learned societies.[29][30] A 2013 analysis found strong evidence that public repositories of published articles were responsible for "drawing significant numbers of readers away from journal websites" and that "the effect of PMC is growing over time".[31]
Libraries, universities, open access supporters, consumer health advocacy groups, and patient rights organizations have applauded PubMed Central, and hope to see similar public access repositories developed by other federal funding agencies so as to freely share any research publications that resulted from taxpayer support.[32]
The Antelman study of open access publishing found that in philosophy, political science, electrical and electronic engineering and mathematics, open access papers had a greater research impact.[33] A randomised trial found an increase in content downloads of open access papers, with no citation advantage over subscription access one year after publication.[34]
The NIH policy and open access repository work inspired a 2013 presidential directive, which has in turn sparked action in other federal agencies.
In March 2020, PubMed Central accelerated its deposit procedures for the full text of publications on coronavirus. The NLM did so upon request from the White House Office of Science and Technology Policy and international scientists to improve access for scientists, healthcare providers, data mining innovators, AI healthcare researchers, and the general public.[35]
PMCID
The PMCID (PubMed Central identifier), also known as the PMC reference number, is a bibliographic identifier for the PubMed Central open access database, much as the PMID is the bibliographic identifier for the PubMed database. The two identifiers are distinct, however. A PMCID consists of "PMC" followed by a string of digits, seven in the example below. The format is:[36]
- PMCID: PMC1852221
Authors applying for NIH awards must include the PMCID in their application.
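A PMCID can be checked with a short pattern match. The following is a minimal sketch, assuming only the format shown above ("PMC" followed by digits); the function name is hypothetical, and the pattern deliberately does not enforce a particular digit count, which is an assumption of this example.

```python
import re

# "PMC" followed by one or more ASCII digits; the digit count is not enforced.
PMCID_PATTERN = re.compile(r"PMC([0-9]+)")


def parse_pmcid(text):
    """Return the numeric part of a PMCID such as 'PMC1852221', or None."""
    match = PMCID_PATTERN.fullmatch(text.strip())
    return int(match.group(1)) if match else None
```

A bare number or a string with a different prefix yields `None`, which helps catch a PMID entered where a PMCID is expected.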
See also
- Europe PubMed Central
- JATS (technology)
- MEDLINE, an international literature database of life sciences and biomedical information
- PMID (PubMed Identifier)
- PubMed Central Canada
- Redalyc (similar project focused on Latin America)
- SciELO (similar service)
References
- Beck J (2010). "Report from the Field: PubMed Central, an XML-based Archive of Life Sciences Journal Articles". Proceedings of the International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML. 6. doi:10.4242/BalisageVol6.Beck01. ISBN 978-1-935958-02-4.
- Maloney C, Sequeira E, Kelly C, Orris R, Beck J (5 December 2013). PubMed Central. National Center for Biotechnology Information (US). Archived from the original on 28 July 2020. Retrieved 8 September 2017.
- "MEDLINE, PubMed, and PMC (PubMed Central): How are they different?". www.nlm.nih.gov. 9 September 2019. Archived from the original on 1 November 2020. Retrieved 29 January 2020.
- "Openness by Default", Inside Higher Ed, 16 January 2017.
- "NIHMS Statistics". nihms.nih.gov. Archived from the original on 2014-02-10. Retrieved 2014-02-07.
- "Home - PMC - NCBI". www.ncbi.nlm.nih.gov. Archived from the original on 2011-08-13. Retrieved 2017-09-08.
- Ouerfelli N. "Author rights: what's it all about" (PDF). Archived (PDF) from the original on 2019-01-19. Retrieved 2019-01-17.
- Varmus, Harold (1999-04-19). "E-Biomed: A Proposal for Electronic Publications in the Biomedical Sciences (Draft and Addendum)". NIH Preprint. 04 (99). 101584926X356. Retrieved 2023-10-19.
- Kling, Rob; Spector, Lisa B.; Fortuna, Joanna (2004). "The real stakes of virtual publishing: The transformation of E-Biomed into PubMed Central". Journal of the American Society for Information Science and Technology. 55 (2): 127–148. doi:10.1002/asi.10352. Retrieved 2023-10-20.
- Varmus, Harold (2009). The Art and Politics of Science. New York: W.W. Norton & Company. pp. 255–256. ISBN 978-0-393-06128-4. Retrieved 2023-10-20.
- Saunders, Jonny (2022-09-01). "Decentralized Infrastructure for (Neuro)science". p. 26. arXiv:2209.07493 [cs.GL].
- "The goal of the semantic web is to express real life. Many things in real life, real questions which we will face are not efficiently computable. There are two solutions to this: The classical (pre-web) solution is to constrain the langage of expression so that all queries terminate in finite time. The weblike solution is to allow the expression of facts and rules in an overall language which is sufficiently flexible and powerful to express real life. Create subsets fo the web in which specific constraints give you specific computational properties. An anlogy is with the human-information systems which existed before the web. Most forced one to keep ones data in a hierarchy (sometimes of fixed depth or a matrix (often with a specific number of dimensions). This gave consistency properties within the information system. I bet DARPA has many of these systems and still does. They only way they could be integrated was to express them in terms of a much more powerful language - global hypertext. Hypertext did not have any of these reassuring properties. People were frightened about getting lost in it. You could follow links forever. As it turns out, it is true of course that there is a problem that you can follow links forever in the Web. " Berners-Lee, Tim. "What the Semantic Web can represent". Web design issues. Retrieved 2023-10-20.
- CrossRef (2009). The Formation of CrossRef: A Short History (PDF). CrossRef. p. 8. Retrieved 2023-10-20.
- Public Library of Science. "Public Library of Science post". The Harold Varmus Papers. NIH, National Library of Medicine. Retrieved 2023-10-20.
- Kutz, Myer. "The Scholars Rebellion Against Scholarly Publishing Practices: Varmus, Vitek, and Venting". Searcher. Retrieved 2020-10-20.
- Ashburner, Michael; Brown, Patrick O.; Eisen, Michael; Kirschner, Marc; Khosla, Chaitan; Nusse, Roel; Roberts, Richard J.; Scott, Matthew; Varmus, Harold; Wold, Barbara. "Letter from the Public Library of Science [to signers of the PLoS open letter]". Harold Varmus - Profiles in Science. NIH, National Library of Medicine. Retrieved 2023-10-20.
- "Public access to NIH research made law". Science Codex. 2007. Archived from the original on 4 March 2016. Retrieved 6 November 2013.
- "Journal Publishing Tag Set". National Center for Biotechnology Information. Archived from the original on 16 October 2020. Retrieved 6 November 2013.
- French D (4 August 2006). "ALPSP Technology Update: A Standard XML Document Format: The case for the adoption of NLM DTD". ALPSP. Archived from the original on 26 October 2020. Retrieved 6 November 2013.
- "NCBI Book Tag Set". dtd.nlm.nih.gov. Archived from the original on 2020-10-16. Retrieved 2006-09-27.
- "News from the Library of Congress". Library of Congress. 19 April 2006. Archived from the original on 17 April 2016. Retrieved 6 November 2013.
- "Inera Inc. - NLM DTD Resources". 19 February 2013. Archived from the original on 2013-02-19.
- "Public Access Plans of U.S. Federal Agencies". cendi.gov. Archived from the original on 2020-10-18. Retrieved 2016-08-19.
- Kovo Y (22 July 2016). "Public Access to Results of NASA-Funded Research". nasa.gov. Archived from the original on 19 August 2016.
- "NASA in PMC". preview.ncbi.nlm.nih.gov. Archived from the original on 2020-07-27. Retrieved 2016-08-19.
- "Journal Archiving and Interchange Tag Suite". dtd.nlm.nih.gov. Retrieved 2023-10-17.
- "PLOS Applauds Congress for Action on Open Access". Archived from the original on 2016-05-07. Retrieved 2014-02-07.
- "ACS Submission to the Office of Science and Technology Policy Request for Information on Public Access to Peer-Reviewed Scholarly Publications Resulting from Federally Funded Research" (PDF). Office of Science and Technology Policy. Archived (PDF) from the original on 2017-02-13. Retrieved 2014-02-07 – via National Archives.
- Davis PM (October 2012). "The effect of public deposit of scientific articles on readership". The Physiologist. 55 (5): 161, 163–5. PMID 23155924.
- Davis PM (July 2013). "Public accessibility of biomedical articles from PubMed Central reduces journal readership--retrospective cohort analysis". FASEB Journal. 27 (7): 2536–41. doi:10.1096/fj.13-229922. PMC 3688741. PMID 23554455.
- Davis PM (July 2013). "Public accessibility of biomedical articles from PubMed Central reduces journal readership--retrospective cohort analysis". FASEB Journal. 27 (7): 2536–41. doi:10.1096/fj.13-229922. PMC 3688741. PMID 23554455.
- "Autism Speaks Announces New Policy to Give Families Easy, Free Access to Key Research Findings - Press Release - Autism Speaks". www.autismspeaks.org. 25 July 2012. Archived from the original on 12 June 2018. Retrieved 7 February 2014.
- Antelman, Kristin (2004). "Do Open-Access Articles Have a Greater Research Impact?". College & Research Libraries. 65 (5): 372–382. doi:10.5860/crl.65.5.372., summarized by Stemper J, Williams K (2006). "Scholarly communication: Turning crisis into opportunity". College & Research Libraries News. 67 (11): 692–696. doi:10.5860/crln.67.11.7720.
- Davis PM, Lewenstein BV, Simon DH, Booth JG, Connolly MJ (July 2008). "Open access publishing, article downloads, and citations: randomised controlled trial". BMJ. 337: a568. doi:10.1136/bmj.a568. PMC 2492576. PMID 18669565.
- "The National Library of Medicine expands access to coronavirus literature through PubMed Central". National Institutes of Health (NIH). 2020-03-25. Archived from the original on 2020-12-21. Retrieved 2020-03-31. "To support this initiative, NLM is adapting its standard procedures for depositing articles into PMC to provide greater flexibility that will ensure coronavirus research is readily available."
- "Include PMCID in Citations | publicaccess.nih.gov". publicaccess.nih.gov. Archived from the original on 2020-12-29. Retrieved 2017-07-01.