Renewal of NIDCR FaceBase Program

May 2023

Translational Genomics Research Branch
Division of Extramural Research


Goal

The overall goal of this concept is to seek renewal of the FaceBase Program to support a state-of-the-art, public repository of dental, oral, and craniofacial (DOC) research and clinical data. The funded repository is expected to recruit, transform, and publicly share research and clinical data that cover the full translational spectrum in specific DOC areas identified by NIDCR. The goal is to enable and accelerate data-driven efforts in knowledge discovery, translation of knowledge gained into health and health care solutions, and the delivery of those solutions.

The funded repository will be required to meet the following expectations:

  • Possess expertise in scientific leadership, community outreach and training, dentistry, and biomedicine, as well as the development, improvement, and operation of public repositories of scientific data.
  • Recruit and publicly share multidimensional scientific data that cover the full translational spectrum in specific DOC biology, health, and disease areas as identified by NIDCR in consultation with the community that it serves.
  • Maintain compliance with the TRUST and FAIR principles.
  • Enable data-driven research that uses AI/ML/DL approaches.
  • Facilitate training of data scientists with diverse backgrounds.
  • Ensure interoperability or synergy with complementary resources and efforts that power data-driven research with data, data analytics, and computing environments.
  • Enable and collaborate with the scientific community to comply with the NIH Data Management and Sharing Policy.

Deliverables will be metrics-driven and evaluated accordingly.


Background

NIDCR initiated the FaceBase Program in 2009, supporting a data hub and multiple research projects (“spokes”) to publicly share craniofacial development and malformation research data produced by the spokes. This hub-spoke model was maintained in the second FaceBase funding period, with a new hub and new spokes awarded in 2014. The third funding period, starting in 2019, marked the program’s transition to a “hub only” model that supports the FaceBase data repository and broadly recruits scientific data from the DOC research community for public sharing via FaceBase. This funding period will end in July 2024.

Over time, the FaceBase repository has evolved to meet growing needs for data use in research. Its coverage of developmental stages, anatomical sites, diseases and disorders, biological levels of organization (molecular, cellular, tissue, and so on), and model organisms has expanded substantially. Its collection of experimental data outputs (data types) generated using cutting-edge technology platforms has continued to grow, and tools for data contributors and users have improved. Today, FaceBase is the central repository of DOC datasets generated using a variety of molecular, cellular, genomic, and imaging technologies, capturing typical and disrupted embryonic development in multiple model organisms. Notable data resources include FishFace, an atlas of zebrafish craniofacial development; MusMorph, a database of standardized mouse morphology data covering numerous genotypes and developmental stages; the newly launched EnamelBase, a sharing site for genetic mouse reagents, research data, and protocols for studying amelogenesis, the multi-stage process of dental enamel formation; and 3D Facial Norms, a web-based collection of normal-range human facial images from diverse populations, together with the 3D morphometric library of craniofacial dysmorphic syndromes, both of which are accompanied by sequence information and have utility in diagnostics. Datasets are treated as citable academic works with archive-grade Digital Object Identifiers.

Backed by the DERIVA data management platform, the FaceBase team adopted an agnostic data model early on to accommodate increasingly diverse data types. Other features built over time to streamline processes, reduce manual effort, and better serve data contributors and users include self-curation by data contributors, automation of data pipelines, standards and ontologies, structured metadata descriptions, anatomy-based data navigation, UCSC genome browser tracks, the Human Genomics Analysis Interface, and secure management and sharing of human datasets. Currently, migration of the production system to the AWS cloud is ongoing; CoreTrustSeal certification is nearing completion; and automated bulk data submission for ingesting very large AI/ML-ready datasets is being implemented for production. Together, these features provide built-in flexibility for a variety of high-quality data types, along with scalability, making FaceBase a “hybrid” data repository. In other words, it fills a critical gap between the two ends of the data ecosystem: it accommodates many experimental data types that might otherwise be housed in a generalist repository while ensuring the high-quality curation typical of a specialized repository, thereby making the data more reusable by the diverse community of dental and craniofacial researchers and others.

Aside from the efforts to streamline processes and reduce manual work, FaceBase staff host community forums, bootcamps, conference events, and office hours to recruit data and share know-how on data submission, curation, navigation, and access. In so doing, FaceBase plays a significant role in nurturing a data-intensive DOC research community. Website usage shows an upward trend, with 21,000 unique website visitors and 29,000 user sessions from March 2022 to March 2023. So far, the data have been reused in DOC and other biomedical research. In particular, the facial image data have been used for method and product development, along with other research purposes. Data reuse has led to publications on the shared genetics of human face and brain shape, regulatory pathway reconstitution, the impact of caries on overall health, and so on. Taken together, the FaceBase repository is poised to evolve along with the fields of DOC translational research and data science and, in turn, to equip investigators in these fields with data and data-use tools to carry out data-driven research toward the discovery and delivery of health solutions for all people.


Gaps and Opportunities

Overall, this concept is consistent with community input gathered through 1) regular meetings with the External Scientific Panel (ESP), which offers input to NIDCR about the performance and future directions of the FaceBase Program; 2) annual FaceBase community forums; 3) user surveys conducted by the FaceBase team; 4) user interviews conducted by a third party; and 5) the program evaluation meeting held in January 2023 with the ESP plus external experts on translational data repositories and precision medicine.

NIDCR recognizes our unprecedented capabilities to generate, integrate, harmonize, and analyze highly scaled and complex data. The institute’s Strategic Plan 2021-2026 calls for a translational DOC ecosystem of data; infrastructure for data storage, processing, and integration; and applications and data analytics that enable investigators and clinicians to ask questions across diverse datasets. To that end, NIDCR will be implementing pending recommendations from the Council Data Science Strategy Working Group (DSS-WG) on how to make a DOC data ecosystem available and empower DOC research, including research in oral health disparities and inequities. This concept is guided by the Strategic Plan and dovetails with the DSS-WG’s effort. NIH-wide, the concept is aligned with the mission of the NIH Office of Data Science Strategy: catalyzing new capabilities in biomedical data science. Furthermore, the concept addresses the newly implemented NIH Data Management and Sharing Policy (DMSP) by aiming to support a state-of-the-art data repository open to DOC as well as the larger biomedical fields. Indeed, a “hybrid” approach, as exemplified by the FaceBase repository, is applicable to repositories of data representing other areas of biology, health, and disease. Finally, the funded repository can be leveraged to train a diverse future generation of data scientists.


Impact

Renewal of the FaceBase Program will facilitate data-driven research, potentially leading to insights into biological systems and societal challenges at a depth previously unimaginable.


Current Portfolio

The FaceBase Program supports a unique public repository of DOC data. Complementing the Program are, for example, three long-standing NIDCR Notices of Funding Opportunity (NOFOs) that support secondary data analyses, listed below. The data analytics used in projects funded under those NOFOs are increasingly robust and applicable to various data types and scales, addressing diverse research questions. Renewal of the FaceBase Program will feed data analysis projects and synergize with other data-driven research that will be initiated following pending recommendations from the DSS-WG.

  • PAR-23-133: NIDCR Research Grants for Analyses of Existing Genomics Data (R01) (Clinical Trial Not Allowed)
  • PAR-23-132: NIDCR Small Research Grants for Analyses of Existing Genomics Data (R03) (Clinical Trial Not Allowed)
  • PAR-22-160: NIDCR Small Research Grants for Oral Health Data Analysis and Statistical Methodology Development (R03) (Clinical Trial Not Allowed)

Last Reviewed
April 2024