Impact Focus: Building Trust in Software Systems: Standards, Interoperability, and the Future of AI. Q&A with Dr. Ram D. Sriram
Q1. Dr. Sriram, you have been involved with object-oriented systems (OOS) for more than four decades. Can you give us a brief history of your involvement with object-oriented systems, in particular object-oriented database management systems?
My foray into object-oriented systems started with my doctoral dissertation at Carnegie Mellon University (CMU). I was in the Department of Civil Engineering but was very fortunate to have met Prof. Steven Fenves, who suggested that I do my doctoral dissertation with him on AI for engineering design. He also suggested that I take courses in computer science, in particular AI. So, I took an undergraduate course on AI, which was taught by Elaine Kant.
I also attended a graduate course in AI, which was taught by AI stalwarts such as Profs. Raj Reddy and Allen Newell. Inspired by Prof. Raj Reddy’s work on Hearsay-II, I decided around 1983 to focus on the development of an integrated structural analysis and design system using the blackboard problem-solving paradigm. This led to the development of DESTINY for integrated design. Instead of using the bottom-up approach of speech recognition, DESTINY used a top-down approach by organizing the blackboard into various levels of abstraction. Several Knowledge Modules (KMs), which can be viewed as agents, communicated through the Blackboard and were controlled by an Inference Mechanism. This was probably the first use of the Blackboard Architecture for engineering design. The entire DESTINY system was built using a frame-based language called Schema Representation Language (SRL). SRL was based on object-oriented concepts and was an extension of LISP.
After I graduated from CMU, I joined the faculty of Civil Engineering at the Massachusetts Institute of Technology (MIT) in 1986. I collaborated with Bob Logcher, who, along with Steve Fenves, developed the first large-scale analysis package on a digital computer. Bob and I envisioned that the Internet, which at that time was used by engineers mostly for e-mail, would transform the way design was done. Although we faced a lot of resistance from traditional engineering scientists, we managed to conceptualize and build one of the first computer-supported collaborative design environments, called DICE (Distributed and Integrated Computer Aided Engineering Environment), which had all the features that Facebook initially had (actually with much better security mechanisms).[1] This was feasible because of the world-class graduate students we had.
Nicolas Groleau (Nic) who joined MIT in 1987, after having finished his undergraduate studies in France, was recruited to work on the initial prototype. At the same time an undergraduate electrical engineering student, Karl Büttner, started working on an object-oriented implementation of the Blackboard architecture. Within a year, Karl was able to develop an initial prototype in Smalltalk-80 on a Tektronix machine, which was donated to MIT by Mike Freiling’s group. Karl’s system ran on a single machine and did not support a distributed architecture. Nic used the client-server paradigm and developed a prototype in LISP. This prototype illustrated many of the concepts we envisioned. He finished this work in February 1989.
The Blackboard in Nic’s prototype resided in main memory, which created scalability problems. To address this, we decided to use a database management system for storing and retrieving the evolving design. Since most of our KMs were developed in LISP, we wanted a LISP-based object-oriented database management system (OODBMS). ORION, being developed at the Microelectronics and Computer Technology Corporation (MCC) by Won Kim and his group, was ideally suited for our work. However, due to various contractual agreements, MCC could not provide us with ORION. Albert Wong and Shamim Ahmed (who joined us in 1989 and 1990, respectively) developed a prototype in GemStone™, a Smalltalk-based OODBMS.
In the meantime, C++ was gaining increasing acceptance in the commercial world, and we decided to re-implement various applications in C++. We also wanted our user interfaces to run on different computer platforms. X-Windows, developed at MIT, seemed appropriate at that moment, just as Web-based environments would be ideal today. After experimenting with several OODBMS, we decided to use ObjectStore™, a C++-based OODBMS we acquired through the good graces of Tim Andrews at Object Design Inc.
As I mentioned before, for pragmatic reasons we decided to utilize C++ for implementing various DICE modules. C++ offers the advantages of object-oriented programming, while retaining the efficiency of C. However, C++ is a statically typed language and does not support the incremental addition of class objects, which was needed for rapid prototyping. Further, C++ does not come with a problem solving mechanism. Hence, we developed an object-oriented knowledge-based building tool—called COSMOS (C++ Object-oriented System Made fOr expert System development)—to address these deficiencies.
The evolution of COSMOS makes an interesting software engineering anecdote. It was developed by my students over a two-year span (1989–1991). The following people contributed to the final MIT version: Bruno Fromont (Forward Chaining), Sreenivasa Rao Gorti (Backward Chaining), V. Vaidyanathan (Parsers), Vincent Su (User Interfaces), Murali Vemulapati (Database), and Albert Wong (Overall Architecture and Integration). COSMOS was extensively tested in my knowledge-based expert systems course. I also embarked on the development of a knowledge representation language for engineering problem solving (KREEPS), a general framework for representing engineering knowledge that used COSMOS as the representation language and an appropriate OODBMS for database storage.
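COSMOS’s actual forward- and backward-chaining machinery is not reproduced in this interview, but the core idea of forward chaining is easy to sketch. The following is a minimal, generic illustration in Python (COSMOS itself was written in C++), with invented engineering-flavored rules and facts:

```python
# Minimal forward-chaining sketch (generic illustration, not the
# actual COSMOS implementation, which was written in C++).
def forward_chain(facts, rules):
    """Repeatedly fire rules (premises -> conclusion) until no new
    facts can be derived; return the closed set of facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are established facts.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Invented rules for illustration only.
rules = [
    ({"span > 10m"}, "use steel frame"),
    ({"use steel frame", "coastal site"}, "specify corrosion protection"),
]
derived = forward_chain({"span > 10m", "coastal site"}, rules)
```

Backward chaining inverts this loop: it starts from a goal and recursively seeks rules whose conclusion matches it.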
Collaborative engineering environments require a flexible framework for concurrency management of highly interleaved and interactive transactions. Traditional database management systems, including commercial OODBMS, provide various locking mechanisms in order to maintain data integrity. The techniques provided in these systems are good for short-duration transactions, such as banking. However, design transactions are long-duration in nature, and the techniques developed for traditional DBMS are not appropriate, as they inhibit information sharing and may result in reduced concurrency and intolerably long waits. To address these limitations, Shamim Ahmed developed a framework and a prototype implementation. Shamim’s system provided various facilities needed for collaborative engineering. It also introduced a security mechanism, known nowadays as role-based access control.
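Shamim’s specific mechanisms are not detailed in this interview; as a generic illustration of the role-based idea, access decisions are made by mapping roles (not individual users) to permitted actions. The roles and permissions below are invented for illustration, not his design:

```python
# Minimal role-based access control sketch (generic illustration;
# the roles and permissions are invented, not the original design).
ROLE_PERMISSIONS = {
    "architect": {"read", "annotate", "modify-layout"},
    "engineer":  {"read", "annotate", "modify-structure"},
    "reviewer":  {"read", "annotate"},
}

def can(role, action):
    """Return True if the given role is permitted to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

The point of the indirection is that access policy is administered per role, so adding a collaborator never requires editing individual permission lists.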
Although ObjectStore™ provided most of the facilities for developing our prototypes, it lacked the dynamic linking capabilities needed for design schema evolution. Murali Vemulapati implemented dynamic linking facilities over EXODUS, a public-domain OODBMS developed by Mike Carey and his group at the University of Wisconsin, Madison.
Mike Carey moved to the University of California, Irvine, and extended this line of work into AsterixDB (https://asterix.ics.uci.edu/), which I believe formed the underlying DBMS for some commercial systems.
Other students and faculty involved in the DICE project include: Nestor Agbayani, Navin Chandra, Jonathan Cherneff, Salal Humair, Karim Hussein, Masatoshi Kano, Atsuo Nishino, Feniosky Peña-Mora, and Amar Gupta. The details of the implementation of various DICE modules can be found in my book entitled “Distributed and Integrated Computer-Aided Engineering Design.” If you would like a PDF version of the book I will be glad to send it to you.
During my tenure at MIT, I started a graduate course on Object-Oriented Systems. I developed a notation similar to UML, but when the UML book by Rumbaugh et al. came out, I decided to follow their notation. We also offered it as a summer course. We had several guest lecturers, including Stan Zdonik of Brown University and Brad Cox, developer of Objective-C. One of the outputs of this is a five-volume video series that is available here. We also wrote a paper comparing various object-oriented database management systems in the Journal of Object-Oriented Programming. One particular vendor did not like the way we evaluated his system and complained to MIT, but Stan Zdonik came to my rescue.
In 1994, on the advice of Pradeep Khosla, I moved to NIST, where we built the NIST Design Repository using object-oriented techniques. The NIST Design Repository project, developed with Simon Szykman starting in 1995 and extending DICE’s product modeling framework, became a model for design repository research. It revolutionized engineering design by providing a framework for capturing, sharing, and reusing rich, heterogeneous design knowledge. However, for pragmatic reasons, we had to use a relational database as a backend. Later, after I joined the Software and Systems Division, we worked on mapping UML to Category Theory (more on Category Theory in a later question).
Q2. During your tenure as Chief of the Software and Systems Division, how has NIST’s approach to software testing and quality assurance evolved to meet the challenges of modern software development? Can you discuss the transition from traditional testing methodologies to your current focus on “scientifically rigorous and innovative software testing techniques,” and highlight some breakthrough projects that exemplify this evolution in the Software Quality Group’s work?
The main purpose of the Software and Systems Division (SSD) is to inspire and cultivate trust and confidence in software, systems, and their measurements.
The division accelerates the development and adoption of correct, reliable, interoperable, and testable software in many application areas, such as digital forensics, health care, biosciences, smart grid, the Internet of Everything, cloud computing, the Materials Genome Initiative, and scalable computing applications.
In particular, the Software Quality Group, led by Barbara Guttman, develops tools, methods, and related models for improving the process of ensuring that software behaves correctly, and for identifying software defects, thus helping industry improve the quality of software development and maintenance.
Improving the quality of software, especially for reducing security vulnerabilities, is an ongoing national priority. Recent attacks have highlighted, yet again, our dependence on software and how adversaries and software errors can cause significant harm.
SSD’s efforts in this area focus on developing core reference data and techniques that support the software assurance marketplace. Complementary work in our sister division — the Computer Security Division (CSD) — addresses secure software development practices and supply chain risk management.
SSD has focused on key aspects of software assurance related primarily to understanding, finding, and preventing software bugs. Our goal is to develop knowledge about bugs so that better tools and techniques can be developed to prevent or fix them.
Understanding bugs. SSD is developing a Bugs Framework (BF), led by Irena Bojanova. BF is a structured, complete, orthogonal, and language-independent classification of software bugs. Each BF class (e.g., Injection (INJ) or Memory Use Bugs (MUS)) is a taxonomic category of a kind of bug, defined by all possible cause→consequence transitions, a set of operations, and a set of attributes. Structured means that a weakness is described via one cause, one operation, one value per attribute, and one consequence from the appropriate lists of values defining a BF class.
Complete means that BF has the expressiveness to describe any possible software weakness. Orthogonal means the sets of operations of any two BF classes do not overlap. Language-independent means it is applicable to source code written in any programming language. BF extends the Common Weakness Enumeration (CWE) as a back-end (via causes, operations, consequences, and related attributes) and in coverage (eliminating gaps and overlaps).
It also allows unambiguous descriptions of the particular instance(s) of the weakness(es) associated with a particular vulnerability, such as those recorded in the Common Vulnerabilities and Exposures (CVE) catalog. BF thus provides a more formal approach to identifying, mitigating, and preventing the root causes of vulnerabilities. SSD also developed, and is extending, the Software Assurance Reference Dataset (SARD), led by Vadim Okun. It is a collection of over 170,000 programs with documented weaknesses, in multiple languages, including C, C++, Java, C#, and PHP. The dataset includes “wild,” “synthetic,” and “injected” test cases. “Wild” means production software with some weaknesses identified. “Synthetic” means written to test or auto-generated. “Injected” means vulnerabilities were carefully inserted into production software.
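As a rough illustration of what “structured” means in the BF sense: each weakness instance carries exactly one cause, one operation, one consequence, and one value per attribute, and weaknesses chain when one weakness’s consequence is the next one’s cause. The specific class names, causes, and attributes below are simplified stand-ins, not the official BF taxonomy:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BFWeakness:
    """One BF-style weakness instance: exactly one cause, one operation,
    one consequence, and one value per attribute (illustrative fields;
    not the official BF class definitions)."""
    bf_class: str      # e.g., "MUS" for Memory Use Bugs
    cause: str
    operation: str
    consequence: str
    attributes: tuple  # (name, value) pairs, one value per attribute

# A CVE-style chain: the consequence of one weakness is the cause of the next.
use_after_free = (
    BFWeakness("MAD", "Forbidden Action", "Deallocate",
               "Dangling Pointer", (("Mechanism", "Direct"),)),
    BFWeakness("MUS", "Dangling Pointer", "Read",
               "Use After Free", (("Span", "Heap"),)),
)

# Validate the cause -> consequence chaining across the pair.
chained = all(a.consequence == b.cause
              for a, b in zip(use_after_free, use_after_free[1:]))
```

This chaining of single-cause, single-consequence records is what allows an unambiguous, machine-checkable description of a vulnerability’s root cause.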
Finding bugs. SSD has run several Static Analysis Tool Exposition (SATE) events.
SATE is a non-competitive study of static analysis tool effectiveness, aiming at improving tools and increasing public awareness and adoption.
Participating tool makers run their static analyzers on a set of programs provided by NIST and return the tool outputs to NIST researchers for analysis.
The purpose of SATE is not to evaluate or choose the “best” tools. Rather, it aims to explore the following tool characteristics: the relevance of warnings to security, their correctness, and their prioritization.
SATE provides feedback that helps toolmakers improve their tools’ accuracy, precision, and impact, and it enables empirical research about static analysis.
Currently, our team (Vadim Okun, Aure Delaitre, Amine Lbath) is working on using AI techniques for detecting bugs in software (more about this later in Question 7).
Preventing bugs. Excellent software development must minimize bugs from the beginning, for instance through formal methods.
SSD has researched approaches applicable early in the lifecycle and documented them in interagency reports, such as Dramatically Reducing Software Vulnerabilities: Report to the White House Office of Science and Technology Policy (NIST IR 8151, 2016) and Formal Methods for Statistical Software (NIST IR 8274, 2019). SSD is developing automated software measures to show which formal methods are best suited to different software architectures and to guide developers in making software more suited to formal verification and assurance.
Q3. The Systems Interoperability Group has been instrumental in advancing healthcare IT standards and interoperability testing. Given your dual role as manager of NIST’s Health IT Program, how has the division’s work contributed to resolving critical interoperability challenges in healthcare systems? Can you share insights into specific projects like the Healthcare Data Interoperability & Productivity Platform and their real-world impact on healthcare data exchange and patient care?
NIST’s Systems Interoperability Group, whose group leaders since I took over included Lisa Carnahan, Kevin Brady, and John Garguilo, has been a driving force in modernizing healthcare IT by developing and promoting robust standards for data exchange. Bettijoyce Lide played a significant leadership role in the initial stages of the project, a role later assumed by Kamie Roberts. Our work has directly improved the reliability of healthcare data through the creation of conformance testing tools and methodologies that are now widely adopted across the industry. We have been instrumental in authoring key standards for critical areas such as laboratory results, immunizations, public health reporting, medical devices, and medical document sharing.
Our Healthcare Data Interoperability & Productivity Platform, with leadership from Robert Snelick and aided by Bill Majurski, Michael Indovina, Sheryl Taylor, Caroline Rosin, Nicholas Crouzier and associates Youssef Bouij, Hossam Tamri, Abdelghani El Quakili, and Ismail Mellouli, has empowered major health organizations, including the Centers for Disease Control and Prevention (CDC), the Association of Public Health Laboratories (APHL), the American Immunization Registry Association (AIRA), and Health IT vendors, to create, implement, and test their own data exchange specifications.
The real-world impact of our efforts is evident in the enhanced quality and interoperability of Electronic Health Record (EHR) and Registry systems, which have seen a dramatic increase in adoption and performance since the implementation of the CMS Meaningful Use program. One specific example is AIRA’s use of the NIST Immunization test tools to test Immunization Registries quarterly for conformance; AIRA has documented a dramatic increase in the capability and adherence of these systems. By ensuring the quality of healthcare data, we are not only improving patient care today but also laying the groundwork for the future of AI-driven healthcare solutions.
For more information, see: https://www.nist.gov/itl/ssd/systems-interoperability-group.
NIST’s direct impact on healthcare systems interoperability includes improving healthcare data exchange standards and providing tools to test healthcare systems for conformance to those standards.
Standards Impact:
- The vast majority of healthcare data is exchanged via the HL7 v2 standards. NIST is the principal author of the conformance methodology specification for HL7 v2, which provides the methods for writing requirements for this standard.
- NIST has aided in authoring specific data exchange standards, including those for: Laboratory Results (e.g., reporting Lipid Panels); Laboratory Ordering; Laboratory Results Reporting to Public Health; Immunization Reporting and Querying; and Syndromic Surveillance Reporting.
- The NIST Productivity Platform has been used by many stakeholders to create and publish national and local jurisdictional specifications. These stakeholders include: Centers for Disease Control and Prevention (CDC); Association of Public Health Laboratories (APHL); American Immunization Registry Association (AIRA); Local, State, and Jurisdictional Public Health Agencies; Integrating the Healthcare Enterprise (IHE); Health Information Technology Developers and Vendors.
Testing Tools Impact:
- NIST has created specific conformance testing tools to test healthcare systems for compliance. The tools are used for: HHS ASTP/ONC EHR Certification (for CMS Meaningful Use); AIRA Assessment of Immunization Registries; APHL Assessment of Public Health Laboratories; Syndromic Surveillance Reporting; Integrating the Healthcare Enterprise (IHE); On-boarding of provider systems for immunizations, lab results, and syndromic surveillance systems; Health Information Technology Developers and Vendors.
- NIST has created a platform for stakeholders to create their own customized conformance testing tools. The platform allows for: Public Health Agencies to create state and jurisdiction-specific specifications and tools derived from national standards; and Health Information Technology Developers and Vendors to define interface definitions and tools to test their products.
Real-World Impact on Healthcare Data Exchange and Patient Care:
- Quality standards are the key to interoperability and data quality. NIST plays a vital role in improving the methods for writing standards, authoring specific standards, and providing a platform to create computer-processable standards.
- Testing is key to interoperability and data quality. NIST provides tools to test healthcare systems for conformance.
- Through standards improvement and product testing, the adoption, evolution, and performance of Electronic Health Record (EHR) systems have increased dramatically since the onset of the CMS Meaningful Use program. NIST has played a critical role in this endeavor.
Examples of Specific Impacts:
- Laboratory results are one of the most important analytical tools providers use for patient diagnosis and treatment plans. Healthcare systems exchanging lab result messages have been tested by the NIST conformance testing tools.
- AIRA uses the NIST Immunization test tools to test Immunization Registries quarterly for conformance. AIRA has shown a dramatic increase in the capability and adherence of these systems.
Downstream Impact of Better Healthcare Data Quality:
- Healthcare records contain vast amounts of data, including treatment histories and patient outcomes. Data quality is the key to AI because the accuracy, reliability, and relevance of the outputs generated by these systems are directly dependent on the quality of the data they are trained on; essentially, “garbage in, garbage out.” Poor data leads to inaccurate and unreliable AI results, while high-quality data ensures that AI models learn accurate patterns and produce meaningful insights.
- AI can analyze this data to help develop more effective healthcare plans for patients with similar conditions. A robust knowledge base is essential—ensuring that this data is accurate, complete, consistent, relevant, and up-to-date is critical for effective AI-driven healthcare solutions.
- NIST’s efforts in promoting interoperability and data quality in healthcare systems directly impact AI applications. Data is the fuel, and AI is the engine—NIST’s work in the healthcare IT space improves the fuel for AI.
Q4. The Configurable Data Curation System (CDCS) represents a significant leap in scientific data management and curation. Can you walk us through the conceptual origins of CDCS, the key technical and organizational challenges your team overcame during its development, and how it has evolved to serve not just NIST but broader scientific communities? What lessons have you learned about building platforms that enable collaborative scientific data sharing across diverse domains?
The Origin, Vision and Architecture of the CDCS. The Configurable Data Curation System (CDCS) concept was a direct response to data needs identified by the National Institute of Standards and Technology’s (NIST) Materials Genome Initiative (MGI) program. The MGI sought to accelerate and scale the discovery and development of advanced materials. However, its starting point was a landscape of valuable, but disparate, community data and activities. Most of these lacked the standardized formats, systems, or infrastructure necessary for integration—a prerequisite for achieving the MGI’s envisioned gains.
Consequently, the original CDCS design was directly targeted at these gaps. It established essential requirements and provided prototype systems for a global-scale informatics platform. This platform was designed to host, discover, share, and transform modular data of many types and formats, thereby achieving the integration and interoperability required across a community-scale infrastructure.
The CDCS architecture is a modular informatics platform built on top of Django’s Python-based web-application framework.
From a web-application standpoint, the CDCS inherits the Model-View-Template (MVT) architectural pattern from Django (a variant of the well-known Model-View-Controller, or MVC, object-oriented design pattern). CDCS realizes a scalable, distributed architecture composed of interconnected web-application nodes.
The system features two primary node types, each serving a distinct function: 1) Repository: contains the actual datasets for a specific research group, organization, or domain; 2) Registry: provides domain-specific indexing and search capabilities, allowing users to discover data within and between domains, along with the repositories that contain it. A fundamental feature of any CDCS node is the ability for users to rapidly create structured, validated content using XML and/or JSON. Built on the Django stack, the system is flexible enough to accommodate various storage engines; by default, it is configured to use PostgreSQL and MongoDB out of the box. CDCS nodes are designed to form a federated network, enabling them to exchange and synchronize data via OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). When configured, this protocol allows CDCS nodes to asynchronously harvest datasets from one another, ensuring consistency and broader data discovery across the network.
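The federation step can be sketched at the protocol level. The snippet below parses a generic OAI-PMH ListRecords envelope, the kind of response one node would harvest from another; the record identifier and content are invented for illustration, and this is not CDCS code:

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 default XML namespace, in ElementTree's Clark notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def parse_list_records(xml_text):
    """Extract (identifier, datestamp, metadata-element) triples from an
    OAI-PMH ListRecords response, plus the resumptionToken (if any)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        header = rec.find(OAI_NS + "header")
        ident = header.findtext(OAI_NS + "identifier")
        stamp = header.findtext(OAI_NS + "datestamp")
        meta = rec.find(OAI_NS + "metadata")
        records.append((ident, stamp, meta))
    # A resumptionToken signals that more pages should be harvested.
    token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    return records, token

# Invented sample response (a real harvester would fetch this over HTTP).
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:rec-1</identifier>
        <datestamp>2024-01-15</datestamp>
      </header>
      <metadata/>
    </record>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

Asynchronous harvesting then amounts to periodically requesting ListRecords with a `from` datestamp and following resumption tokens until none remain.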
Technical and Organizational Challenges. The CDCS faced significant challenges, both technical and organizational. Technically, the core issue was making minimal, yet essential, design decisions that would allow the platform to flexibly host arbitrary modular data formats. Around this core, the system needed to grow capabilities for distributed transformation, rendering, sharing, searching, and processing of the data. Organizationally, the challenge was a collision of scientific, technical, and cultural norms. Our researchers not only had to learn and incorporate informatics but also had to become system owners and developers themselves.
Fostering a Sustained Infrastructure. The CDCS was built as a user problem-solving platform, grounded deeply in domain-specific goals, practices, and knowledge. Its initial growth was significantly catalyzed by the open-source approach, which provided a way to accelerate R&D, data sharing, rapid prototyping, and system integration across NIST, the U.S., and internationally. Early CDCS nodes—registries and repositories—were installed at key metrology and materials science hubs such as the Bureau International des Poids et Mesures (BIPM) and the Materials Data Facility, as well as at NIST.
However, growing and maintaining knowledge and infrastructure at a community scale requires additional resources.
This is as much a cultural challenge as a technical one, as it involves cultivating the very medium through which domain researchers conduct their research. The global infrastructure is, in fact, the collective “system” that these individual projects compose. Sustaining this infrastructure requires continuous engagement with community stakeholders to foster an awareness of this shared system, ensuring the communities that benefit from its advances will, in turn, sustain it.
Core Lessons for Collaborative Scientific Data Sharing. The Primacy of a Stable Development Process: A robust, iterative, open-source development process is essential for stability and scaling. Complex systems thrive on small, incremental changes (local evolutions), not massive, frequent overhauls. This approach, championed by talented developers focusing on key logical and architectural abstractions, is the only way to sustain and scale a community-level platform effectively over time.
Infrastructure Alone Is Necessary, But Not Sufficient: Simply providing infrastructure and access to data offers initial gains, but it won’t achieve the desired level of accelerated, scalable processing and powerful inference. To unlock maximum capability, you need to focus on the quality and structure of the data itself.
The Power of Advanced Community-Scale Modeling: Optimal processing is both cultural and information-theoretic.
The true precondition for maximum computational power and efficiency is the logical structure and quality of the data models.
This requires: a balance between the modular properties of the data and the rich, domain-specific requirements for representation; community-scale modeling that defines and shares reusable patterns across diverse projects and domains.
Cultural Habits Drive and Sustain the Infrastructure: As Pieter Hintjens suggested, “the physics of software is people.” Infrastructure is not grown and maintained by technology alone, but by the habits and culture of its users. The most effective platform designs are those that reflect the existing cultural habits of problem-solving. When infrastructure aligns with long-lived collective behavior, it achieves long-term value and embodies the logical structure that promotes the most powerful and efficient inference at scale.
A brief history of CDCS. The project was started by Mary Brady and Alden Dima with funding from the Materials Genome Initiative, which was headed by Jim Warren. Notable names in SSD who have contributed to the project over a long period of time are Walid Keirouz, Ben Long, Philippe Dessauw, and Guillaume Sousa Amaral. Others who have contributed for a shorter duration are Alexandre Bardakoff, Sarra Chouder, Marcus Newrock, and June Lau.
A longer history of CDCS. The roots of the CDCS can be traced to MGI-related informatics discussions that began in the first few years of the MGI at NIST, materializing as initial collaborations between NIST’s ITL, MML, and ODI operational units. The original efforts grew out of initial discussions between Mary Brady, Alden Dima, and Jim Warren, which rapidly expanded into collaborations between ITL and MML/ODI staff members, notably Chandler Becker, Carelyn Campbell, Ursula Kattner, Robert Hanisch, Gretchen Greene, Raymond Plante, Zachary Trautt, Lucas Hale, and Shengyen Li.
The initial version of the CDCS system was prototyped by Alden Dima, Philippe Dessauw, and Pierre Savonitto. Soon, Sharief Youssef and Guillaume Sousa Amaral joined the team and moved the initial prototype to the 1.x series implemented in Django. Collaborations with MML/ODI and other stakeholders resulted in a number of initial deployments such as the NIST Materials Resource Registry (NMRR), the BIPM International Metrology Resource Registry (IMRR), and project-specific repositories at NIST (such as the Interatomic Potentials Repository and Phase-Data Repository).
The CDCS team gained a number of developers through the years, including the following talented members who joined Guillaume and Philippe to form the core development team: Adrien Catel, Augustin Chini, Xavier Schmitt, Pierre-Francois Rigodiat, and Hamza Bouhanni. As the team grew over the late 2010s, CDCS was redesigned into a modular Django-app-based architecture which formed the 2.x series. During the pandemic, the development stack continued to evolve offering more flexibility to the community in the 3.x series as the team continued to take on challenging informatics tasks such as hosting an ongoing stream of increasing COVID-19 literature in CDCS systems.
Since then, the core has continued to evolve based on stakeholder projects and has sought to adapt strategically (such as bringing analytics and LLM usage closer to the core) to bring researchers close to the tools and resources necessary to address their R&D problems over time. The CDCS leadership team evolved from being originally led by Alden Dima to eventually being led by Kevin Brady Sr., and then by Benjamin Long (to the present). A number of additional projects sprang up around the CDCS, its tooling, APIs, and various R&D applications at NIST and beyond.
While the total number of collaborators, contributors, and application developers would be too numerous to enumerate exhaustively, the core team acknowledges contributions from the following researchers: Alexander Bardakoff, Samia Benjida, Augustin Chini, Sarra Chouder, Faical Yannick Congo, Ali Daudi, Timothee Keyrkhah, Jaiwon Kim, Gerard Lemson, Melvin Martins, Yande Ndiaye, Mylene Simon, Joshua Taillon, Peter Bajcsy, Kathryn Beers, Kamal Choudhary, Noah Last, June Lau, Lyle Levine, Yan Lu, Frederick Phelan Jr., Kelsea Schumacher, Paul Witherell, Laura Bartolo, Ben Blaiszik, David Elbert, Michele Griffa, Greta Lindwall, Stefan Szyniszewski, Marcus Newrock, Adam Morey, and Karen Price. This project and its collective impacts have been defined and enriched by this vibrant community of collaborators.
Q5. Your background spans both manufacturing systems integration (where you led work on CAD system interoperability standards) and software systems. How has this unique perspective influenced your approach to leading the Software and Systems Division? What synergies have you discovered between standards development in manufacturing and software domains, and how do these insights shape current projects in areas like AI applications and sustainable systems?
While my previous work on the DICE project at MIT was on tools, techniques, and shared modeling, my focus at NIST was on information exchange standards, as lack of interoperability was costing the engineering industry several billion dollars each year. We identified several types of information exchange mechanisms between engineering software tools: knowledge-based design to computer-aided design (CAD); CAD to CAD; computer-aided engineering analysis (CAEA) to CAEA; CAD to CAEA; CAD to solid freeform fabrication (or additive manufacturing); CAD to process planning; and product data management systems to everything else. We also looked into representational schemes that extend product information exchange beyond the geometry served by CAD systems. To achieve this, we embarked on a project to develop a representation scheme with proper semantics. This project, based on DICE's shared product model, resulted in the Core Product Model (CPM) and its various extensions, developed in collaboration with Drs. Steven Fenves, R. Sudarsan, and Eswaran Subrahmanian.
CPM is an abstract object-oriented model with generic semantics; meaningful semantics about a particular domain are embedded within an implementation model and the policy of use of that model. In addition to traditional geometric representations, CPM supported the notions of form, function, and behavior. The Open Assembly Model (OAM) extended CPM to provide a standard representation and exchange protocol for assembly, which includes tolerance representation and propagation as well as kinematics representation. Several extensions to CPM/OAM, such as mappings to ontologies, were also developed.
Along with Deba Dutta and Lalit Patil, we demonstrated semantic interoperability between computer-aided design and manufacturing systems. Other extensions to CPM/OAM were made by Abdelaziz Bouras, Sebti Foufou, SK Gupta, Xenia Fiorentini, Farrokh and Janet Mistree, Vadim Shapiro, and Jami Shah. Under my leadership, the Process Specification Language (PSL), which resulted in ISO 18629, was developed by Craig Schlenoff and Michael Gruninger.
Conrad Bock led the development of OMG's SysML/UML and Business Process Modeling Notation (BPMN). CPM/OAM led to the development of a program in sustainable manufacturing, in which we developed representational schemes for shop-floor manufacturing, with leadership from R. Sudarsan, Kevin Lyons, and Mahesh Mani. CPM and its various extensions have found widespread use in industry (e.g., Boeing, GM, Ford, and the European aerospace industry) and academia, and have resulted in a substantial reduction in computer-based interoperability costs in manufacturing. Our work on standards for 3D printing resulted in better information exchange between commercial CAD systems and layered manufacturing processes.
PSL has been used by many manufacturers (e.g., Siemens, Celestica, Rolls Royce) and enterprise software providers (e.g., SAP), while SysML/UML and BPMN (through the Object Management Group) have greatly improved productivity and business innovation across the industrial and manufacturing sectors. Many patents have been influenced by our work.
For example, patent US 9,158,865 B2, titled "Process, program and apparatus for displaying an assembly of objects …", cites our work as prior art. Patent US 10,311,182 B2, titled "Topological change in a constrained asymmetric subdivision mesh", refers to our work on product life-cycle management as prior art.
The work on sustainable manufacturing led to the establishment of new committees for developing standards in the area of sustainable manufacturing (ASTM E60.13). This work provided a scientific basis for two ASTM standards (E2986-15 Standard Guide for Evaluation of Environmental Aspects of Sustainability of Manufacturing Processes, WK35705 New Guide for Sustainability Characterization of Manufacturing Processes) and a maturity model for deploying the sustainable manufacturing practices for SMEs. There are many such examples of how the work my team did aided the U.S. industry in acquiring various technology patents. This helped them to take leadership roles in providing software solutions for manufacturing.
The realization that healthcare and manufacturing shared organizational, technological, and informational parallels inspired me to launch the Manufacturing Metrology and Standards for the Healthcare Enterprise (MMSHE) program in the Manufacturing Engineering Lab (MEL) at NIST in 2005. As part of this program, I co-authored the report Healthcare Strategic Focus Area: Clinical Informatics (NIST Interagency/Internal Report 7263), which for the first time presented a set of components for an interoperable framework. This report identified key roles for NIST in systems engineering, semantic languages, model-driven architecture, and interoperability and conformance testing. I continued working on interoperability for health IT (see the answer to question 3) after I moved to the Information Technology Laboratory as chief of the Software and Systems Division.
In all types of communication, the ability to share information is often hindered because the meaning of information can be drastically affected by the context in which it is viewed and interpreted. Different representations of the same information may be based on different assumptions about the world, and use differing concepts and terminology — and conversely, the same terms may be used in different contexts to mean different things. Often, the loosely defined natural-language definitions associated with the terms will be too ambiguous to make the differences evident, or will not provide enough information to resolve the differences.
To address these challenges, various groups within industry, academia, and government have been developing sharable and reusable models known as ontologies. All ontologies consist of a vocabulary along with some specification of the meaning or semantics of the terminology within the vocabulary. In doing so, ontologies support interoperability by providing a common vocabulary with a shared semantics. Rather than develop point-to-point translators for every pair of applications, one simply needs to write one translator between the application’s terminology and the common ontology. Similarly, ontologies support reusability by providing a shared understanding of generic concepts that span across multiple projects, tasks and environments. The 2016 Ontology Summit (see https://ontologforum.org/index.php/OntologySummit2016), which I helped organize, delves further into use of ontologies for semantic integration.
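The payoff of a common ontology is combinatorial: with point-to-point translators, n applications require on the order of n(n-1) translators, whereas a shared vocabulary requires only one mapping per application. The sketch below illustrates this hub-and-spoke idea in Python; the term mappings and record fields (`component`, `item`, `tolerance`, etc.) are invented for illustration, not drawn from CPM or any actual standard.

```python
# Illustrative sketch: interoperability via a shared ontology.
# Each application maps its local terms to a common vocabulary once,
# so n applications need n translators instead of n*(n-1) pairwise ones.
# All term names here are hypothetical.

# Per-application term mappings into the common ontology
cad_to_common = {"component": "part", "product": "assembly", "tol": "tolerance"}
plm_to_common = {"item": "part", "bom": "assembly", "allowance": "tolerance"}

def to_common(record: dict, mapping: dict) -> dict:
    """Translate an application's record keys into common-ontology terms."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def from_common(record: dict, mapping: dict) -> dict:
    """Translate common-ontology terms back into an application's local terms."""
    reverse = {v: k for k, v in mapping.items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}

# CAD record -> common ontology -> PLM record,
# with no direct CAD-to-PLM translator ever written.
cad_record = {"component": "bracket", "tol": 0.01}
shared = to_common(cad_record, cad_to_common)
plm_record = from_common(shared, plm_to_common)
print(plm_record)  # {'item': 'bracket', 'allowance': 0.01}
```

The same `shared` record could be handed to any other application that has declared its own mapping, which is precisely why the common vocabulary, and a shared semantics for it, matters.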
The various ontologies that have been developed can be distinguished by their degree of formality in the specification of meaning. With informal ontologies, the definitions are expressed loosely in natural language. Semi-formal ontologies, such as taxonomies, provide weak constraints for the interpretation of the terminology. Formal ontologies use languages based on mathematical logic. Informal and semi-formal ontologies can serve as a framework for shared understanding among people, but they are often insufficient to support interoperability, since any ambiguity can lead to inconsistent interpretations and hence hinder integration. This can be represented by the Ontology Spectrum, which in the words of one of my colleagues Leo Obrst can be stated as “The Ontology Spectrum depicts a range of semantic models.
What is normally known as an ontology can range from the simple notion of a Taxonomy, to a Thesaurus (terms, synonyms, broader-than/narrower-than term taxonomies, association relations), to a Conceptual Model (concepts structured in a subclass hierarchy, generalized relations, properties, attributes, instances), to a Logical Theory (elements of a Conceptual Model, focusing however on real-world semantics and extended with axioms and rules, also represented in a logical KR language enabling machine semantic interpretation).
A Conceptual Model can be considered a weak ontology; a Logical Theory can be considered a strong ontology. The Ontology Spectrum therefore displays the range of models in terms of expressivity or richness of the semantics that the model can represent, from "weak" or less expressive semantics at the left (a value set, for example), to "strong" or more expressive semantics at the right. XML is sufficient for syntactic interoperability, XML Schema enables structural interoperability, but a minimum of RDF is necessary for semantic interoperability." UML, which sits near the left end of this spectrum, has been used quite extensively for representing ontologies. One major issue is that the above formalisms are not good at representing engineering mechanics. One potential scheme that can represent both predicate calculus and Newtonian mechanics is Category Theory (CT).
Along with Spencer Breiner and Eswaran Subrahmanian, I embarked on exploring the use of category theory (CT) in information and knowledge modeling. While CT has been applied in programming languages and other parts of computer science, my group led the field in applying CT to systems design and engineering. Our work on CT was motivated by the search for a knowledge representation that is both composable and formal, a critical requirement for applying AI across domains.
This effort is expected to aid not just domain modeling but also formalized models for interoperability, leading to a substrate for applying AI in multi-domain collaborative work. Any AI system requires the data available from different sources to be in a form that can be used for reasoning. In this context, integrating data across diverse databases for applying AI tools is vital. My group is using CT to coherently and consistently integrate data sources so that the appropriate data can be fed to AI tools. Several pieces of work exploring the use of CT in IoT, in the description of complex systems, and in process models that seamlessly incorporate AI components have been published. More recently, we have been looking into neurocognitive, or neuro-symbolic, reasoning, which combines neural networks with knowledge networks/ontologies. Commercial ventures such as Symbolica (https://www.symbolica.ai/) are utilizing CT as the foundation for AI reasoning systems. We believe our work will lead to formal approaches to system specification, design, interoperability, and integration, thus resulting in high-quality systems.
Q6. SSD emphasizes collaborative efforts with industry, academia, and government agencies to accelerate software adoption and build trust in deployed systems. What are the most effective collaboration models you’ve developed or refined during your leadership? Can you discuss specific examples where these partnerships have led to significant breakthroughs or standards adoption, and what challenges remain in fostering these multi-sector collaborations in rapidly evolving technology landscapes?
To achieve our mission, we collaborate with academia, industry, and other government agencies in several ways. With academia, we host students and faculty at NIST: we have a program for undergraduate students called SURF (see https://www.nist.gov/surf), and we support graduate students and faculty through several other programs. We have several mechanisms for working with U.S. industry and standards organizations. We also collaborate with other government agencies, including performing research on their behalf. Before its reorganization, we held several leadership positions at NITRD (https://www.nitrd.gov/), which "coordinates Federal R&D to identify, develop, and transition into use the secure, advanced IT, high-performance computing, networking, and software capabilities needed by the Nation, and to foster public-private partnerships that provide world-leading IT capabilities." We also have a program to host researchers from other countries whose work is aligned with U.S. interests.
Some of our products in SSD include: NIST SD 28, the National Software Reference Library; the Computer Forensics Federated Testing Suite; the Software Assurance Reference Dataset; Health IT Conformance Testing Tools (described in the answer to question 3); the Voluntary Voting System Guidelines; the Web Image Processing Pipeline (WIPP); Accelerated Computing Pipelines; ISO/IEC 5140:2024, Concepts for multi-cloud and the use of multiple cloud services; ISO/IEC 30141:2024 (https://share.google/WPm3lOs5Uqvrhli6t), Internet of Things (IoT) Reference Architecture; and sub-nanosecond DC-QNet synchronization at metropolitan scales. Our evaluation work for other government agencies includes TrojAI and Video-LINCS. Brief descriptions of the above products are given below.
National Software Reference Library. The National Software Reference Library (NSRL) provides a trusted repository of known software, file profiles, and file signatures for use in digital forensic examinations, systems integrity management, and other applications. The NSRL is used daily by virtually all computer forensics operations (nationally and many internationally).
It contains over 1.5 billion file signatures from over 300,000 software packages, ranging from personal computing software (PC, Mac, and Linux) to gaming software and mobile apps. The NSRL has published quarterly updates since the fall of 2001, but its collection includes historical software going back to the 1980s. NIST personnel include Doug White, Austin Snelick, Erica Blanco, and Eric Trapnell. (https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl)
Computer Forensics Tool Testing. There is a critical need in the law enforcement community to ensure the reliability of computer forensic tools. The goal of the Computer Forensic Tool Testing (CFTT) project is to establish a methodology for testing computer forensic software tools by development of general tool specifications, test procedures, test criteria, test sets, and test hardware.
The results provide the information necessary for toolmakers to improve tools, for users to make informed choices about acquiring and using computer forensics tools, and for interested parties to understand the tools' capabilities. Our approach to testing computer forensic tools is based on well-recognized international methodologies for conformance testing and quality testing. One of the outputs of CFTT, SP 800-101: Guidelines for Mobile Device Forensics, was heavily cited in the Supreme Court case Riley v. California (2014). NIST personnel include Jenise Reyes Rodriguez and Barbara Guttman. (https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt/federated-testing)
Software Assurance. Software assurance is a set of methods and processes to prevent, mitigate or remove weaknesses and vulnerabilities to ensure that software functions as intended. Our efforts include defining bug classes, collecting a corpus of programs with known bugs, and enabling better understanding of tool effectiveness. The Software Assurance Reference Dataset (SARD) is a growing collection of thousands of test programs with documented weaknesses. The Bugs Framework (BF) is a structured, complete, orthogonal, and language-independent classification of software weaknesses (bugs). BF allows unambiguous descriptions of software vulnerabilities. We are developing a program to support the use of AI in bug detection. NIST personnel include Vadim Okun, Irena Bojanova, and Aurelien Delaitre.
Voting. The 2002 Help America Vote Act requires NIST to provide technical support for the development of the Voluntary Voting System Guidelines (VVSG). The current version, VVSG 2.0, was finalized in 2021. In addition, we have developed a set of implementation guides to help developers and election officials effectively use the VVSG. Topics include usability, multi-factor authentication, interoperability, and accessibility. The VVSG is used by most states as part of their voting equipment selection process. NIST personnel include Ben Long, Liliana Rodriguez, Kristen Greene, Shanee Dawkins, Gema Howell, Noah Wallack, and Barbara Guttman. (https://www.nist.gov/itl/voting)
Computational Metrology. Our goal is to develop measurement methods and reference software to ensure that software used in imaging-based experiments produces validated measurements with known uncertainties. Using neural networks, my team has developed a tool – Web Image Processing Pipeline – that is currently being used by the NIH to determine the appropriate stem cells for retinal pigment epithelial (RPE) cell implants for age-related macular degeneration, which afflicts nearly 11 million people in the U.S. The first U.S. patient received autologous stem cell therapy to treat dry AMD using techniques developed by my team – see (https://www.nih.gov/news-events/news-releases/first-us-patient-receives-autologous-stem-cell-therapy-treat-dry-amd). Essentially, we implemented a novel computational framework for viewing, quantifying, sharing, and modeling image-based measurements for cell biology. This improved the integrity and accuracy of scientific findings through validated measurements.
Key people in this project include Peter Bajcsy, Mary Brady, and Walid Keyrouz. Other supporting team members are Philippe Dessauw, Mylene Simon, and Guillaume Sousa Amaral. (https://www.nist.gov/programs-projects/trusted-computations-terabyte-sized-images-using-cluster-and-cloud-computing-web)
An additional effort in imaging-based metrology in healthcare is aimed at enabling the use of MRI instruments with much lower magnetic fields (low-field MRI, 64 mT) instead of high-field instruments (3 T). This effort uses artificial intelligence techniques to upscale the resolution and quality of images acquired by low-field instruments so that they match those of images acquired by high-field instruments. Key people in this project are Joe Chalfoun and Adele Peskin, as well as their collaborators from other laboratories at NIST.
Imaging Metrology for Telemedicine. The aging U.S. population and the need to provide access to remote areas will make telemedicine an integral part of any future health care ecosystem. This will involve the storage, transmission, and display of medical images and videos, which must be of sufficient quality for the physician to make appropriate recommendations. Our work is focused on metrological issues in a variety of applications, ranging from transmissive display characterization and 3D displays to near-the-eye display metrology. We have developed new methods for measuring 3D display quality and a suite of optical characterization methods to evaluate and help improve the performance of large immersive displays. Our work has been incorporated into various standards (ISO TC159/SC4/WG2, IEC TC110, etc.).
This project is led by John Penczek.
Molecular Property Prediction & Design. This effort starts with a known molecular signature (e.g., an amino-acid sequence), uses AI techniques to predict or retrieve the relevant properties (e.g., minimum inhibitory concentration) of molecules matching this signature, and then uses generative AI techniques to generate candidate molecules with the desired properties. One of the applications targeted by this multi-year project is identifying antimicrobial peptides applicable to antibiotic-resistant pathogens. Antonio Cardone and Sarala Padi, along with colleagues from other laboratories, are focusing on this work.
Cloud Computing. NIST is tasked with leading federal agencies in the development of standards and implementation guidelines to facilitate the secure exchange of information via the cloud. Our group has developed reference architectures and standards guidelines for cloud computing, which are accelerating secure and effective cloud computing adoption in the U.S. John Messina is playing a leadership role in INCITS SC38 to develop ISO/IEC 5140, Cloud Computing Concepts for Multi-Cloud (https://www.iso.org/standard/80910.html). Earlier, Eric Simmon (retired) held leadership positions in the Digital Twin Consortium (DTC) and INCITS SC41 to develop ISO/IEC 30141, Internet of Things: Reference Architecture (https://www.iso.org/standard/88800.html). Jacob Collard is now serving as chair of the Patterns working group in the DTC.
Precision Timing Infrastructure. NIST has been tasked with establishing and characterizing timing network infrastructure for the Mobile Robots and Quantum Networking testbeds. With ACMD, we are also co-leading the quantum network testbed infrastructure development in DC-QNet, a research consortium comprising seven Washington, D.C.-area agencies, along with our Mid-Atlantic Crossroads partners at the University of Maryland, in support of DARPA quantum-augmented networking and other experimental research. This project is led by Ya-Shian Li-Baboud, with support from Spencer Breiner, M. Dodge Mumford, and Dan Rosiak.
TrojAI. Using current machine learning methods, an AI model is trained on data, learns relationships in that data, and is then deployed to the world to operate on new data. For example, an AI model can be trained on images of traffic signs, learn what stop signs and speed limit signs look like, and then be deployed as part of an autonomous car.
The problem is that an adversary who can disrupt the training pipeline can insert Trojan behaviors into the AI model. For example, an AI model learning to distinguish traffic signs can be given just a few additional examples of stop signs with yellow squares on them, each labeled "speed limit sign." If the AI model were deployed in a self-driving car, an adversary could cause the car to run through a stop sign just by putting a sticky note on it. The goal of the TrojAI program is to combat such Trojan attacks by developing techniques to inspect AI models for Trojans. Our contributions to TrojAI can be found at: https://www.nist.gov/itl/ssd/trojai.
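The poisoning mechanism described above can be made concrete with a toy sketch. This is not TrojAI evaluation code; it simply shows, on synthetic data, how an adversary stamps a small trigger onto a few training images and flips their labels so that a model trained on the set would associate the trigger, rather than the sign itself, with the wrong class. All array shapes and the trigger pattern are invented for illustration.

```python
# Toy illustration of a training-data trojan (not TrojAI code).
import numpy as np

rng = np.random.default_rng(0)

def stamp_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a small bright patch (the 'sticky note' trigger) in a corner."""
    poisoned = image.copy()
    poisoned[:4, :4] = 1.0  # 4x4 trigger patch
    return poisoned

# Clean dataset: 32x32 grayscale images labeled stop (0) or speed-limit (1)
images = rng.random((100, 32, 32))
labels = np.array([0] * 50 + [1] * 50)

# The adversary poisons a handful of stop-sign images and flips their labels
poison_idx = [0, 1, 2]
for i in poison_idx:
    images[i] = stamp_trigger(images[i])
    labels[i] = 1  # stop sign mislabeled as "speed limit"

# The poisoned set looks almost entirely normal...
print(f"poisoned fraction: {len(poison_idx) / len(images):.2%}")
# ...yet any model fit to it can learn the shortcut: trigger present -> class 1.
```

Because only a tiny fraction of the data is altered, accuracy on clean inputs stays high, which is exactly why post-hoc model inspection of the kind TrojAI pursues is needed.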
Key personnel include Walid Keyrouz, Timothy Blattner, and Michael Majurski.
Category Theory. My group has organized three workshops to foster a community around the potential uses of CT. Besides papers, we worked on a project with Chevron on early oil-well design, in which information brought by many different actors was integrated into a decision-making process. We built a very early proof-of-concept prototype in collaboration with CMU and the Topos Institute.
Q7. Looking ahead, how is the Software and Systems Division positioning itself to address emerging challenges in artificial intelligence, machine learning integration, and next-generation software assurance? Given your research background in AI methods and their applications to healthcare, what do you see as the most critical areas where NIST’s standards development and testing expertise will be essential? How are you preparing the division to maintain its leadership role in software quality and interoperability as technology continues to evolve rapidly? What are your thoughts about the future of OOS in critical and emerging technologies?
AI and Quantum Information Science have been identified as two top science and technology priorities by the current administration. SSD will place a major emphasis on these technologies. I believe object-oriented techniques will play an important role in many projects. I will describe some of our ongoing projects in these areas (I have previously mentioned our people working on these various projects).
AI Metrology. As AI permeates our daily lives, the need for rigorous measurement and evaluation of AI systems becomes critical. Our work will focus on the emerging field of AI metrology, which aims to establish standardized methods for assessing the accuracy, reliability, and trustworthiness of AI models. We will focus on AI tools for metrology, including testing algorithms and developing standard datasets. We are working on several machine and deep learning metrology projects to help industry, academia and government build secure software and software systems, and working with stakeholders to develop standards to ensure fair and reliable software. By examining specific applications in selected domains, we will develop methodologies on how AI can enhance metrology practices and contribute to more reliable and trustworthy AI-driven solutions.
AI for Software Testing. The shortage of large-scale, high-quality vulnerability datasets undermines the development of reliable automated vulnerability detection systems that are essential for protecting critical infrastructure. As we mentioned in Question 2, the SAMATE team is developing metrics, tools, and source-code-based datasets to accelerate AI-driven bug detection and automated remediation. The next step is to develop an AI-based vulnerability injection system to automatically create realistic, diverse, and accurate vulnerability datasets. These datasets will enable researchers to evaluate software assurance tools more rigorously and train AI-based vulnerability detection tools with larger, more accurate data. By enhancing software quality at its core, our work strengthens the security, reliability, and trustworthiness of modern computing systems.
AI Focused Testing Infrastructure for Interoperability. We have outlined our work on a testing platform for health IT interoperability in a previous question. We hope to extend this platform utilizing AI techniques. For example, a user can input test cases and the platform can automatically generate test methods for interoperability in any domain (currently our system only supports HL7) using neuro-symbolic computing. We will also develop standardized benchmarks and tools for enhancing data quality.
Cyberinfrastructure for Quantum Information Science (QIS). Quantum networking has the potential to transform distributed computing, sensing, and secure communications. Realizing the practical application of QIS in noisy environments requires high-precision metrology as well as real-time software algorithms that account for complex temporal and spatial dynamics to support reliable quantum-enhanced information distribution. SSD will collaborate with NIST partners in CTL, ITL, and PML on the development of a cyberinfrastructure for quantum networking research testbeds. SSD has also begun to research novel algorithms to understand and manage the complexity of spatial-temporal dynamics to advance high-quality measurements, computations, and communications at quantum levels. SSD will also support QIS by researching software and network stacks and standards for integrating quantum information into classical computing infrastructures for cloud, internet of things/digital twins, and positioning, navigation, and timing technologies.
Standards Infrastructure. Standards, which are necessary for system interoperability, have the potential to scale innovation and expand market share. As technology advances rapidly, the resources available and the ability to develop high-quality standards for complex critical technology systems become increasingly challenging. The standards development process requires timely, dynamic discourse and consensus to achieve the greatest societal benefit. SSD is exploring, with Standards Development Organizations (ASTM, INCITS), use of computational linguistics and AI to formalize requirements and standards for reasoning over them. This effort aims to extract the implicit conceptual structures in engineering and standards documents that form the basis for creating formal structures using category-theoretic models. Formal reasoning could support consistency of specifications, definitions, and requirements, and harmonization of potential conflicts. In addition, formal reasoning will serve as the basis for SSD in developing smart standards using ML, GenAI, and NLP tools.
[1] A query to perplexity.ai on a comparison of DICE with Facebook produced the following output. “The MIT DICE project was a sophisticated early platform for collaborative engineering design, integrating knowledge-based systems, communication, and coordination in distributed teams—anticipating social networking features but applied to engineering workflows. Facebook, by contrast, was initially a straightforward social networking site focused on connecting individuals socially rather than collaborating on engineering projects. While both involved networked collaboration, their goals, domains, and technological foci were quite different—DICE being a pioneering research project in computer-supported cooperative work within engineering, and Facebook being a broad social communication platform evolving towards a major global social network.”
…………………………………………………………….

Ram Sriram is the Chief of the Software & Systems Division in the National Institute of Standards and Technology’s Information Technology Laboratory (ITL). Before joining the Software and Systems Division, Sriram was the leader of the Design and Process group in the Manufacturing Systems Integration Division, Manufacturing Engineering Laboratory, where he conducted research on standards for interoperability of computer-aided design systems. He was also the manager of the Sustainable Manufacturing Program. Prior to joining NIST, he was on the engineering faculty (1986-1994) at the Massachusetts Institute of Technology (MIT) and was instrumental in setting up the Intelligent Engineering Systems Laboratory. At MIT, Sriram initiated the MIT-DICE project, which was one of the pioneering projects in collaborative engineering and documented in the book entitled Distributed and Integrative Collaborative Engineering Design, Sarven Publishers, 2002.