NCDS Data Internship


The ability to provide data services is now a part of many job openings and new opportunities in health sciences and academic libraries. In turn, the goal of the NCDS Internship Program is to provide practical experiences to interns that include the soft and hard skills needed to enter data librarian positions, including working on a team, understanding the data lifecycle, and working with data.

In an effort to diversify the profession, the National Center for Data Services (NCDS) of the Network of the National Library of Medicine (NNLM) provides internships for people from underrepresented racial and ethnic groups. The goal of this program is to introduce students from historically excluded racial and ethnic groups to data librarianship in a health sciences context. These paid internships offer opportunities to gain practical experience while working with a mentor in a guided environment on structured, data-related projects.

There are multiple projects available from our site partners. These projects involve structured activities including data cleaning, structuring, analysis, or visualization, or a guided in-depth project in data curation. Each intern will be provided with training and mentoring throughout. The practical experiences developed during the internship will provide participants with skills needed to be competitive for data librarian positions.

**All internships are currently 100% virtual.**


The 10-week internship runs from June to August each year, contingent upon the availability of funds.

About Data Librarianship

Project Details


2024 Participants and Sites

The announcement and descriptions of our 2024 participants are now available! Read more about the selected students at

Project Sites

The following are descriptions of the projects and sites being offered for 2024. Each project will have 3-4 interns working on it with a variety of intended outcomes. To see previous years' projects, visit the Past Programs tab using the button above.


Project 1: 

Cleaning, Analyzing, and Visualizing Data to Understand the Role of the Data Curator Through Job Postings

with Data Curation Network

In this project, internship participants will analyze job postings to investigate similarities and differences for data curator postings across the listservs and institution types. Also, the team will investigate trends in positions over time. In addition to becoming familiar with data curator roles, the interns on this project will employ basic data exploration processes to reach their findings.

Data Skills to be Developed: 1) CURATE(D) workflow for data curation, 2) Tableau for data visualization, and 3) OpenRefine for data cleaning

Project 2: 

Forest Ecology Data Rescue

with Cary Institute of Ecosystem Studies

The Forest Responses to Stress and Damage (FORSTAD) project was a multi-investigator effort that took place over 10 years (~1992-2002), monitoring long-term changes in the structure and function of Hudson Valley forests to understand the role of crucial stress and damage agents in producing those changes. The internship participants working for this site will learn about FORSTAD and explore the data shared with them in order to curate and prepare data for publishing in a repository.

Data Skills to be Developed:  1) get comfortable working with research data outside interns' domain(s), 2) effectively organize historical data for sharing, 3) apply data curation principles and skills to make data FAIR, ideally using the DCN CURATED model, 4) structure tabular data for sharing, and 5) create metadata and data documentation.

Project 3: 

University Library Use in Health Sciences and STEM Fields: An In-Depth Analysis of 2023-24 Survey Data from Faculty, Staff, and Students

with The University of Michigan Library in Ann Arbor

The internship project centers on quantitative and qualitative analysis of the health sciences and STEM field responses to the University of Michigan Library's campus-wide survey. The survey includes questions about the use of library spaces; physical and electronic materials of all types; services such as consultation, document delivery, and library workshops; and the special needs of the library's users.

Data Skills to be Developed:  1) quantitative data analysis in SPSS, 2) qualitative data analysis in Dedoose, 3) data reporting and visualization, and 4) science communication skills.

Project 4: 

Michael Kudish's Catskill Mountains Field Notes

with Cary Institute of Ecosystem Studies

The Forest Ecosystem Monitoring Cooperative (FEMC) has collaborated with the Cary Institute of Ecosystem Studies to archive the thousands of field notes taken by Michael Kudish since the 1960s. These notes primarily include documentation of species presence, elevation, phenology, and disturbance history throughout the Catskill Mountains. For the internship, this site's participants will learn about field note digitization and use Kudish's plant records to link plant phenology to climate change, estimating coordinates using elevation and field markings. This group will use R and the National Oceanic and Atmospheric Administration (NOAA) API to obtain historical climate data for each coordinate.


Data Skills to be Developed:  1) digitize notes for data interpretation, 2) standardize and clean data, 3) access historical climate data, 4) work with spatial data, and 5) investigate real-world issues using evidence-based data analysis.


  • Must be a US citizen or permanent resident.
  • Must be currently enrolled in an accredited LIS graduate program.
  • Must be a member of a marginalized racial or ethnic group.
  • Must complete all application materials.


Applications are open from March 1 to April 1, 2024. The job description can be found at

To apply, please complete the following application form (the link will be updated annually). Applications are reviewed according to a rubric, and applicants who are accepted will be notified by April 26, 2024.

2024 Application Form 

Up to 12 applicants will be selected.


Would you like to contact a coach? 

A coach is a person external to the process who can assist with ideas for your letter or questions about how to format a resume. Want someone to review your materials and give you feedback? Or are you looking for any other support as you work on your application?

Contact Negeen Aghassibake, Data Visualization Librarian at University of Washington Libraries:

Information Session

Join us for an information session where we provide helpful information about the internship and answer any questions you may have!

March 20, 2024 at 1 PM Central. To register, visit 


External reviewers are Network members who have representative data librarian experience. The external reviewers make scored recommendations for selection to the NCDS Internship Committee.

Applicants may request to receive a copy of reviewer comments and rubric. 

Scored Review Criteria

The application will be scored in the following areas:

  • Describes applicant's interest in the internship and how the internship would benefit the applicant. (20 pts)
  • Describes the applicant's experience working with or seeking to learn about data or providing data services. (5 pts)
  • Describes and details projects the applicant may be interested in undertaking as part of the internship. (5 pts)
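
As an illustration only, the point values above can be tallied as in the short sketch below. The area labels and the example scores are hypothetical, and the sketch assumes a reviewer's area scores are simply summed, which this page does not state.

```python
# Maximum points per scored area, taken from the criteria listed above.
# The dictionary keys are hypothetical labels chosen for this sketch.
MAX_POINTS = {
    "interest_and_benefit": 20,
    "data_experience": 5,
    "project_interests": 5,
}


def total_score(scores):
    """Sum one reviewer's scores after checking each against its maximum."""
    for area, points in scores.items():
        if area not in MAX_POINTS:
            raise KeyError(f"unknown scoring area: {area}")
        if not 0 <= points <= MAX_POINTS[area]:
            raise ValueError(f"{area} must be between 0 and {MAX_POINTS[area]}")
    return sum(scores.values())


# Highest possible total across the three areas.
max_total = sum(MAX_POINTS.values())

# Made-up example scores, not real application data.
example = {"interest_and_benefit": 17, "data_experience": 4, "project_interests": 3}
print(total_score(example), "of", max_total)  # prints "24 of 30"
```

Under these assumptions, the first area accounts for 20 of the 30 available points, so the statement of interest carries the most weight.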

After the scoring process, candidates are selected based on geographical distribution, to align with funding, and on skill/project match.

Applicants who are accepted will be notified in late April. Onboarding procedures will follow in order to begin processing for employment. NCDS will provide an internship agreement for all selected candidates.

Expectations for Participation: Student interns will attend regular meetings and check-ins, complete all assigned trainings, and submit a final project report. Students are strongly encouraged to submit their work to conferences and journals. Each intern will have the opportunity to receive funding from NNLM to present at one conference where their proposal has been accepted.

Recipients of NNLM funding are required to deposit any peer-reviewed manuscript in PubMed Central upon acceptance for publication, in accordance with the NIH Public Access Policy.

Data Sharing and Development of Training Materials

To facilitate the dissemination of knowledge and information associated with the NNLM Cooperative Agreement Award, all funding recipients are required to share any data or training material resulting from funding. This information must be submitted to the following collection sites as applicable:

  • Network of the National Library of Medicine (NNLM) website
  • Other websites specifically designated by the NLM as part of the Network of the National Library of Medicine (considering changes in the project and data repositories required to maintain sharing within the Network)

In addition, recipients of funding are expected to use or adapt existing training materials before developing new materials. Consult with your RML/Office and the NNLM Training Office (NTO) prior to developing materials.

Publication and Copyrighting: Per Section 8.2.1, Rights in Data (Publication and Copyrighting), of the NIH Grants Policy Statement, the NIH must be given a royalty-free, nonexclusive, and irrevocable license for the Federal government to reproduce, publish, or otherwise use any materials developed as a result of funding and to authorize others to do so for Federal purposes, i.e., the ongoing development of the Network of the National Library of Medicine.

Data developed by participants and consultants are also subject to this policy.

NIH Acknowledgement

Any resources developed with project funds must include an acknowledgment of NIH grant support and a disclaimer. Please consult with the NCDS for the specific acknowledgement statement to be used for your project award.

Internship Alumni Information


Click the links below to read about past participants in the NCDS internship.

2022 Internship Participants

2023 Internship Participants

Alumni Posts

Panel of Former Interns on Their Experiences

Authored Works

NCDS Data Librarianship Intern Resource Guide by Aundria Parkman, Justin de la Cruz, Genevieve Milliken, Peace Ossom-Williamson, Mikala Narlock, Shawna Taylor, Jennifer Darragh, Wind Cowles, and Scout Calvert

Primer for Researchers on How to Manage Data by Maria Arteaga Cuevas, Shawna Taylor, and Mikala Narlock

Clinical Trials Data Primer by Liliana Gonzalez, Mikala Narlock, and Shawna Taylor


2023 Medical Library Association Southern Chapter and South Central Chapter Joint Meeting - Absorbing Data: An Intern’s Journey Through the National Center for Data Services Program by Amanda Pazos

2023 Midwest Data Librarian Symposium - Preparing the Next Generation of Data Librarians: A Roundtable Session with National Center for Data Services (NCDS) Interns by Corey E. Black, Katya E. Mueller, and Jennifer Ye Moon-Chung

2023 Midwest Data Librarian Symposium - Engaging MLIS Interns with Data Curation Primers by Maria Lee

2023 Medical Library Association Upstate New York & Ontario Chapter Meeting - The National Center for Data Services Data Librarianship Internship: Training for Underrepresented, MLIS Students through a Bibliometrics Project by Phoebe Yip

2024 Research Data Access and Preservation Summit - NCDS Internship Showcase (3 presentations) by Peace Ossom, Avianna Wooten, and Maria Lee

2022 Digital Library Federation Forum conference - Hands-On Practical Experience in Data Services: Findings from the First Cohort of a Paid Summer Internship for BIPOC Graduate Students by Justin de la Cruz, Dev Wilder, Silvia Wu, and Loida Pan

2022 Joint Conference of Librarians of Color - Diversifying Data Librarianship via a National Internship Program: Practical Experiences for POC by Peace Ossom, Maria Arteaga, Aundria Parkman, and Robert Rosas (session was cancelled due to hurricane)

2023 Research Data Access and Preservation Summit - NCDS Internship Showcase (6 presentations) by Peace Ossom, Dev Wilder, Loida Pan, Liliana Gonzalez, Maria Arteaga Cuevas, and Silvia Wu

Previous Projects

Project 1: "Exploring the NNLM Data Warehouse." (NNLM National Evaluation Center) The NNLM Data Warehouse stores NNLM data from the early 2000s to the present. This data is used to feed interactive dashboards, and NNLM members have the option to request data for their own research, analysis, and reporting. This project will help adapt existing data to a new data reporting system (CiviCRM) and add modifications and improvements to enhance reporting capabilities and analyses.

Data Skills to be Developed: 1) Learn how to interpret and understand data models and database schemas, 2) Learn how to use relational databases, 3) Become proficient in the SQL programming language, 4) Use Python to interact with the database and create data pipeline scripts, 5) Integrate the GitHub code repository into your coding workflow, 6) Learn how to retrieve data from APIs, 7) Create data visualizations with Tableau


Project 2: "Enhanced Research Metrics: Turning Publication Data into Actionable Insights" (Edward G. Miner Libraries, University of Rochester) Research metrics, such as author collaboration networks and publication impact, can provide valuable insights into scientific production. However, collecting and organizing these data can be a time-consuming and resource-intensive process. This project aims to utilize bibliographic data from Scopus to generate research metrics for various medical departments at our medical center.

Data Skills to be Developed: 1) Data analysis and visualization, 2) Data manipulation and cleaning, 3) Use of an IDE (e.g., Visual Studio Code), 4) Familiarity with the R biblioshiny package


Project 3: "Ecology of infectious disease" (Cary Institute of Ecosystem Studies) We are looking to predict the next disease outbreak before it happens. This project focuses on (1) identifying animals that amplify disease, using computer algorithms that compare traits of known disease carriers with species not yet known to carry disease, and (2) examining which combinations of species, pathogens, and environmental conditions give rise to disease outbreaks. Interns will help clean and augment a subset of data from the Global Infectious Disease and Epidemiology Network (GIDEON).

Data Skills to be Developed: 1) Creating tidy data, 2) Working with and exploring biological and ecological data about mammals and zoonotic pathogens around the world, 3) Using R, Python, or other scripting and coding languages, 4) Plotting and visualizing data, 5) Learning about data and software management plans


Project 4: "Git Primer Development" (The Data Curation Network) Data Curation Primers are detailed reference documents centered on a specific subject, disciplinary area, or curation task that can be used by curators when curating a dataset that falls outside of their expertise. These step-by-step resources provide a shared knowledge base for a specific data format, method, or tool. Interns will help develop a primer for Git, a distributed version control system that tracks changes in any set of computer files.

Data Skills to be Developed: 1) Git competency 2) GitHub competency 3) Familiarity with different data types 4) Data literacy   


Project 1: NYU Health Sciences Library, NYU Langone Health. The NYU Health Sciences Library Data Services Team provides workshops on various topics for the NYU Langone Health community. Each workshop requests feedback from participants through a survey. NCDS interns will work to analyze the information received in surveys dating from January 2019 as a way of improving educational offerings.

Data Skills to be Developed: 1) Data collection with REDCap, 2) Data cleaning and visualization in Python 3) Quantitative and qualitative data analysis


Project 2: NNLM National Evaluation Center. Interns in these positions will join our developers in enhancing the NNLM Data Warehouse, which stores NNLM data on subaward projects, activities conducted by staff and subawardees, and participants in NNLM activities. Interns will help to fulfill data requests, test new functionality, and contribute to the code base.

Data Skills to be Developed: 1) Build and use relational databases, 2) Become proficient in the SQL programming language, 3) Use Python to interact with the database and create data pipeline scripts, 4) Retrieve data from APIs, 5) Create data dashboards with Tableau


Project 3: Data Curation Network. The Data Curation Network is a membership network composed of fifteen organizations with well-established public access data repositories, which share functional and subject expertise to facilitate the more robust management and curation of research data. Interns will be partnered with a team of DCN mentors and receive data curation training from DCN colleagues for the CURATED protocol, gain exposure to de-identification strategies through existing DCN material, and curate data using the CURATED protocol. The interns will then complete a rotation with DCN curators to obtain hands-on experience in curating various types of datasets that are deposited in the organizational repository.

Data Skills to be Developed:  1) Using the CURATED workflow for processing data for sharing and preservation, 2) Evaluating data quality, 3) De-identification of data


NNLM National Center for Data Services

About NNLM

The mission of the Network of the National Library of Medicine (NNLM) is to advance the progress of medicine and improve the public's health by providing U.S. researchers, health professionals, public health workforce, educators, and the public with equal access to biomedical and health information resources and data. NNLM’s main goals are to work through libraries and other members to support a highly trained workforce for biomedical and health information resources and data, improve health literacy, and increase health equity through information. The NNLM Regional Medical Libraries (RMLs), Offices, and Centers rely upon partnerships with Network members to achieve these goals by providing training and funding and other opportunities for development. 

About NCDS

The National Center for Data Services supports NIH Aims to (1) Accelerate discovery and advance health by providing the tools for data-driven research, and (2) Build a workforce for data-driven research and health through the goals of

  • developing capacity to conduct data science and/or deliver data services in the health information community,
  • partnering with other national data and/or health sciences organizations to maximize impact, and
  • supporting NIH priorities in the domain of data science, data services, and data governance.

Internship Committee

The internship committee runs the program, teaches workshops, and enacts improvements.

  • Peace Ossom, chair
  • Justin de la Cruz
  • Genevieve Milliken
  • Nicole Contaxis

For more information, visit the NCDS Staff Directory

Contact us at
