By Asher Hancock

    The promise of vision-language-action models is straightforward: Take a foundation model that understands the world and teach it to act in it. The challenge is that learning to act often erases that understanding.

    Vision-language models (VLMs) are trained on massive image–text datasets and develop broad multimodal knowledge. They can recognize objects, interpret scenes, and follow instructions in multiple languages. Naturally, they’ve become a promising foundation for robot learning.

    But when researchers fine-tune a VLM to control a robot, turning it into a vision-language-action model (VLA), performance on visual reasoning, multilingual tasks, and open-world queries often degrades. This phenomenon, known as catastrophic forgetting, has become a central obstacle to adapting foundation models for embodied use.

    In Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting (Hancock, Wu, Zha, Russakovsky, Majumdar; 2025), which is being presented at ICLR this week, researchers at Princeton University propose a simple shift in perspective: Instead of changing the model to accommodate robot actions, change how actions are represented.


    The Action Representation Mismatch

    Distribution of action probabilities under Gemma-3-12B-IT before fine-tuning on robot teleoperation data. The model assigns significantly higher log-probabilities to actions represented as natural language than to actions encoded via tokenizer modifications (e.g., reassigning the least likely tokens).

    VLMs are pretrained to reason and answer in natural language. Robot policies, however, must output continuous motor commands — numerical vectors describing movement and gripper state. Most existing VLAs bridge this gap by assigning special tokens to represent actions or by adding separate action-generation modules. Both approaches introduce a distribution shift between what the original VLM learned during internet-scale pre-training and what it sees during robotics fine-tuning.
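    The mismatch can be made concrete with a toy sketch (illustrative only, not the paper's code): the same end-effector action encoded as reserved tokens versus as natural language. Only the latter lies in the VLM's pretraining distribution. The binning scheme and token names below are assumptions for illustration.

```python
# Illustrative sketch of the representation mismatch (not the paper's code).
action = [0.05, -0.02, -0.10, 0.0, 0.0, 0.0, 1.0]  # dx, dy, dz, rotation deltas, gripper

def to_special_tokens(a, bins=256, low=-0.25, high=0.25):
    """Common VLA scheme: uniformly bin each action dimension, then map each
    bin to a reserved token the VLM never saw during pretraining."""
    ids = [min(bins - 1, max(0, int((x - low) / (high - low) * bins))) for x in a]
    return " ".join(f"<act_{i}>" for i in ids)

as_tokens = to_special_tokens(action)
as_language = ("move forward and slightly left, then move significantly "
               "downward before closing the gripper")

print(as_tokens)    # out-of-distribution reserved tokens
print(as_language)  # in-distribution natural language
```

    Fine-tuning on the first representation forces the model to reassign meaning to tokens it never used in pretraining; the second stays within the distribution the model already speaks.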


    Actions as Language

    We present VLM2VLA, a data pipeline and training methodology for fine-tuning VLMs into VLAs while preserving their foundational perceptual and reasoning capabilities. Our policy retains its pretraining knowledge, enabling strong VQA performance and superior generalization in real robotic manipulation tasks.

    Our work, VLM2VLA, takes a different route: Express robot actions directly in natural language. Instead of predicting a high-dimensional motor vector, the model generates text such as:

    “To complete the task, the robot must move forward and slightly left, then move significantly downward before closing the gripper to grasp the object,”

    before producing the corresponding low-level commands to physically control the robot.

    Because this representation lives in language space, which the model is already familiar with, fine-tuning can be done using LoRA, a parameter-efficient method that updates small low-rank weight matrices instead of modifying the full network. The model learns to act without overwriting what it already knows.
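    A minimal NumPy sketch of the LoRA idea (dimensions and scaling are illustrative assumptions, not the paper's configuration): the pretrained weight W stays frozen, and only two small low-rank factors train. Because the up-projection B is initialized to zero, the adapted model starts out exactly equal to the base model, which is why pretrained behavior is preserved at the outset.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 32, 4, 8  # illustrative sizes; r << d_in, d_out

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base output plus the scaled low-rank update (standard alpha / r scaling).
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B at zero, the LoRA model reproduces the base model exactly.
assert np.allclose(lora_forward(x), W @ x)
```

    Training then updates only A and B (2 * r * d parameters per layer rather than d_out * d_in), which is what keeps the fine-tune from overwriting the frozen pretrained weights.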

    Control is structured hierarchically: The model predicts a subtask, describes a spatial plan in language, and then generates the low-level action — all as text. The team automatically relabels existing robot trajectories into this format, converting demonstrations into language-aligned training data.
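    The relabeling step can be sketched as follows. This is a hypothetical reconstruction: the field names, units, and phrasing are assumptions for illustration, not the paper's exact schema. It turns a numeric end-effector delta into the subtask / spatial plan / low-level action hierarchy, all rendered as text.

```python
# Hypothetical sketch of relabeling a trajectory step into the hierarchical,
# all-text format described above (illustrative schema, not the paper's code).
def relabel_step(instruction, delta):
    dx, dy, dz, grip = delta  # end-effector deltas in meters, gripper in {0, 1}
    plan = (f"move {'forward' if dx >= 0 else 'backward'} {abs(dx)*100:.0f} cm, "
            f"{'left' if dy < 0 else 'right'} {abs(dy)*100:.0f} cm, "
            f"{'down' if dz < 0 else 'up'} {abs(dz)*100:.0f} cm, "
            f"then {'close' if grip > 0.5 else 'open'} the gripper")
    return (f"Subtask: {instruction}\n"
            f"Plan: {plan}\n"
            f"Action: {dx:.2f} {dy:.2f} {dz:.2f} {grip:.0f}")

print(relabel_step("grasp the carrot", (0.05, -0.02, -0.10, 1.0)))
```

    Applied across a demonstration dataset, a relabeler like this converts raw teleoperation trajectories into supervised text targets, so the fine-tune looks like ordinary language modeling rather than a new output head.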


    What Happens in Practice?

    Across twelve multimodal understanding benchmarks, VLM2VLA retains over 85% of the base model’s performance. In contrast, conventional VLAs show substantial drops after fine-tuning.

    On more than 800 real-world trials with a 6-DoF robotic arm, VLM2VLA matches baseline performance on standard manipulation tasks like picking up and placing objects.

    The real payoff appears in out-of-distribution settings — tasks the robot was never trained on.

    Multilingual instruction following. The robot was asked to pick up a carrot using commands in Spanish (recoger la zanahoria), Mandarin (拿起胡萝卜), and Hindi (गाजर उठाओ). VLM2VLA significantly outperformed all baselines, correctly translating the instruction and identifying the target object among distractors.

    A qualitative demonstration of VLM2VLA’s zero-shot multilingual capabilities. Given the language instruction in Hindi (“pick up the carrot”), our model identifies the correct object amidst distractors (eggplant and banana), demonstrating a genuine understanding of the task.

    Open-world semantic reasoning. When instructed to “pick up the item above Ash Ketchum,” the system had to recognize Ash Ketchum, the well-known Pokémon character, reason about spatial relationships, and manipulate the correct object. VLM2VLA achieved a 60% success rate; baselines performed near zero.

    Preserving pretrained knowledge directly translates to stronger embodied generalization.

    Comparative evaluation of VLA performance on in-distribution (ID) and out-of-distribution (OOD) robotic manipulation tasks. VLM2VLA maintains high success rates on OOD tasks, highlighting its superior generalization capabilities. Each bar corresponds to an average over thirty trials, except for the ‘Pick Up -T’ task, where each bar corresponds to an average over ninety trials.

    Does the Representation Itself Matter?

    To isolate the role of language-based action representation, the team trained an otherwise identical model that encodes actions using low-likelihood reserved tokens instead of natural language. Both models use LoRA and train on the same data.

    On simple tasks, performance is similar. But as reasoning demands increase, the language-based model pulls ahead. On the Ash Ketchum task, it achieves roughly twice the success rate of its token-based counterpart — suggesting that representation choice itself plays a key role in connecting world knowledge to physical action.


    Language as a Unified Representation

    Rather than introducing new architectures or sophisticated training pipelines, VLM2VLA shows that a representational shift in the data can be enough. By describing robot data in natural language, it becomes possible to add control capability without sacrificing multimodal understanding. More broadly, representing actions as language opens the possibility of seamlessly mixing robot interaction data with standard VLM corpora — enabling models that reason, communicate, and act within a unified representation space.


    Curious to learn more? You can read the full paper on arXiv and visit the project website for videos.
