  CSTL Programs
 

 

Data and Informatics

For 30 years, NIST has provided well-documented numeric data to scientists and engineers for use in technical problem-solving, research, and development. These recommended values are based on data that have been extracted from the world's literature, assessed for reliability, and then evaluated to select the preferred values. These data activities are conducted by scientists at NIST and in university data centers.

One of CSTL's goals is to assure that US industry has access to accurate and reliable data and predictive models to determine the chemical and physical properties of materials and processes. CSTL's data and informatics activities impact all industry sectors from biotechnology and microelectronics to energy and instrument manufacturers. Versatile interactive databases provide easy access to high-quality NIST data. Many databases are now available via the World Wide Web. The NIST Standard Reference Database (SRD) series has grown to over 80 electronic databases in chemistry, physics, materials, building and fire research, software recognition, and electronics. Through this program CSTL provides SRDs for Analytical Chemistry, Atomic and Molecular Physics, Biotechnology, Chemical and Crystal Structure, Chemical Kinetics, Industrial Fluids and Chemical Engineering, Materials Properties, Surface Data, and Thermodynamics and Thermochemistry. A few of the highlights in the area of Data and Informatics are described below, and a full listing of activities, with references to the appropriate program section, is also provided.

Additional details on:
Bioinformatics
Thermodynamics Research Group
NIST Mass Spectral Library
NIST Chemistry WebBook

Projects in this Area:
FY 2006
FY 2005

 

Bioinformatics

One data area of increasing focus for CSTL is bioinformatics. CSTL researchers are developing adaptive, automated methods for processing and presenting biological and chemical data. These methods use connection tables that are flexible and easy to use, so that users can find, with confidence, the structurally relevant data needed for structure-based drug design. NIST, in collaboration with NIH-NCI, unveiled the HIV Structural Database, an online database that contains the structures of HIV protease and of compounds targeted against this enzyme. This database permits faster and more reliable access to standardized data related to the design and development of compounds against HIV. The availability of such a resource to industry is expected to foster the development of new and better drug products.

The Biological Macromolecule Crystallization Database (BMCD) contains crystal data and crystallization conditions compiled from the literature. The current version of the BMCD includes 3547 crystal entries from 2526 biological macromolecules for which diffraction-quality crystals have been obtained. These include proteins, protein:protein complexes, nucleic acids, nucleic acid:nucleic acid complexes, protein:nucleic acid complexes, and viruses.

The Short Tandem Repeat (STR) DNA Internet Database supports research on short tandem repeat DNA markers and their application to human identity testing. CSTL scientists also maintain other web-based bio-related databases: the Human Mitochondrial Protein Database and the Thermodynamics of Enzyme-Catalyzed Reactions Database.

Thermodynamics Research Group

NIST researchers are working with other scientists and organizations to establish data standards and more rapid methods of data entry in a number of data areas, including structural biology, thermodynamics, and kinetics. One example is the Thermodynamics Research Center (TRC). The TRC group is working with several journals to have the thermodynamic data from accepted articles go directly into the TRC database entry system through an electronic process. This assures that customers have the most up-to-date and complete information possible. The creation of data transfer and traceability standards is another key area of development. These standards remove barriers to the sharing of information and allow researchers to analyze results and collaborate in new ways. Another key concept is the establishment of the pedigree of data, in which enough information is retained to trace results easily and assign uncertainties to measured values, thus answering the vital question: “How good is that number?” In 2004, ThermoML, an XML-based format for thermodynamic property data, was completed with the incorporation of extensions for critically evaluated data, predicted data, and equation representations. In 2005, ThermoML was accepted as the foundation for the development of the IUPAC (International Union of Pure and Applied Chemistry) standard for thermodynamic data communications. In addition, in order to build an infrastructure for global thermodynamic data communication, Guided Data Capture (GDC) software was developed for mass-scale abstraction of experimental data from the literature.
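The idea of a data pedigree can be illustrated with a small record type that keeps a value together with its uncertainty and its origin. The following is only a minimal sketch: the class name, field names, and sample numbers are hypothetical placeholders chosen for illustration, and they do not reflect the ThermoML schema or the TRC database layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasuredValue:
    """One property value plus the information needed to trace it.

    Hypothetical layout for illustration; not the ThermoML schema.
    """
    property_name: str
    value: float
    unit: str
    std_uncertainty: float  # standard uncertainty, same unit as value
    source: str             # citation the value was abstracted from
    method: str             # measurement technique reported by the source

    def relative_uncertainty(self) -> float:
        """Standard uncertainty as a fraction of the value."""
        return abs(self.std_uncertainty / self.value)

    def interval(self, k: float = 2.0) -> tuple:
        """Expanded-uncertainty interval: value +/- k * std_uncertainty."""
        half_width = k * self.std_uncertainty
        return (self.value - half_width, self.value + half_width)


# Placeholder numbers, not real measurements.
record = MeasuredValue(
    property_name="normal boiling temperature",
    value=350.0,
    unit="K",
    std_uncertainty=0.5,
    source="(hypothetical citation)",
    method="(hypothetical method)",
)
print(record.interval())
```

Because the source and method travel with the number, a later user can answer "How good is that number?" from the record alone.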

NIST Mass Spectral Library

Over 2,500 NIST Mass Spectral Libraries are installed on GC/MS instruments each year. The most recent version of the library was released in 2005, and it remains the most comprehensive, reliable library of mass spectral ‘fingerprints’ to assist in the task of compound identification by GC/MS. GC/MS is the most widely used analytical tool for low-concentration analysis in food safety and environmental monitoring. In addition, it is used extensively in general organic analysis, the development of new flavoring agents, the analysis of fragrances, and many medical applications. However, at extremely low concentrations, it can be difficult to extract the trace signal from the mass spectrum because of the very complex background that is present. NIST therefore developed a deconvolution software tool. The Automatic Mass spectral Deconvolution and Identification Software (AMDIS) was originally developed for the detection of chemical weapons in complex mixtures such as might be found in the environment or in chemical process streams. It was designed to work without analyst input, as a way of ensuring that sensitive business information that could be present in a process stream was not compromised. In the last year, the use of AMDIS by the organic analytical community has grown strongly. One of the most exciting developments has been the incorporation of AMDIS into a new set of tools for automatic analysis developed by Agilent Technologies. The tools have been given the general name of Deconvolution Reporting Software (DRS). They combine Agilent Technologies' retention time locking technology, the NIST search software, and AMDIS in a single tool that allows users to identify pesticides at lower concentrations and with more confidence than had been possible with the Agilent system alone. More details are provided in the article entitled: Automatic Mass Spectral Deconvolution and Identification Software (AMDIS).
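To illustrate how a spectral "fingerprint" can be compared against a library, the sketch below scores a query spectrum against reference spectra with a plain cosine similarity. This is a generic textbook measure used here as a stand-in; it is not the NIST search algorithm or the AMDIS deconvolution method. The function names, compound labels, and peak intensities are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    shared = set(a) & set(b)
    dot = sum(a[mz] * b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return the name of the library spectrum most similar to the query."""
    return max(library, key=lambda name: cosine_similarity(query, library[name]))


# Toy library with invented peaks (hypothetical compounds).
library = {
    "compound_A": {41: 30.0, 43: 100.0, 57: 60.0},
    "compound_B": {51: 20.0, 77: 100.0, 105: 80.0},
}
query = {41: 35.0, 43: 95.0, 57: 55.0}

print(best_match(query, library))
```

In practice the query would be a background-subtracted spectrum, which is the step a deconvolution tool is meant to supply; a noisy, unresolved spectrum would score poorly against every library entry.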

NIST Chemistry WebBook

The NIST Chemistry WebBook remains one of the most used resources for chemical and physical property data. The number of users, between 10,000 and 20,000 per week, and their variety, spanning industry, government, and academia, clearly indicate the success of the WebBook. The fraction of returning users, typically 45% to 55%, is a good indication that the user community finds the resource valuable. The NIST Chemistry WebBook has been awarded "Best Chemistry Site on the Web - Portals and Information Hubs" by ChemIndustry.com Inc., John Wiley and Sons, Inc., and the Royal Society of Chemistry, UK. The WebBook is second in total use among chemistry database web sites (only the Chemical Abstracts site has higher usage), and over 2,500 sites link directly to the WebBook, including essentially every technical library in the world. The Chemistry WebBook is now being translated into other languages. The first results are a set of web pages that allow the basic search to be done in French, Spanish, Czech, and Portuguese.

The WebBook is also a tool to aid future evaluation projects, both at NIST and in collaboration with outside organizations. It is difficult to overstate the possible impact of the ongoing work on developing standard protocols for the transmission of chemical data. The need for such standards has only grown as the use of the Internet in electronic commerce has grown. This need has been acknowledged by a large number of commercial and governmental entities, in particular those working on IUPAC and ASTM committees.


NIST is an agency of the U.S. Commerce Department's Technology Administration

Date created: June 27, 2006
Last updated: April 4, 2007