"if you can do reliable genome generation, you can create plants that sequester carbon at 1000x the typical rate" -- it seems I'm still missing the point of these generative models even after reading your excellent essay, as I don't understand how one could, even in principle, request a specific function from these models. All they know is how to generate natural-looking sequences, and I fail to see how we get from that to 1000x faster carbon sequestration.
Well, I’m speaking in terms of being able to do reliable conditional genome generation :) We can do similar things for e.g. enzymes (see the ZymCTRL paper) or protein binders, so it’s not a huge stretch to imagine it being done for genomes.
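To make "conditional" concrete, here is a deliberately tiny sketch of the idea: the model is trained on (tag, sequence) pairs, and at sampling time you supply the tag for the property you want. Everything in it is an assumption for illustration: the tags `GC_RICH` and `AT_RICH`, the toy training sequences, and the Markov-chain model itself, which stands in for a real language model and is not how ZymCTRL or any genome model is implemented.

```python
import random
from collections import defaultdict

def train(examples, order=2):
    """Count next-base frequencies for every (tag, context) pair."""
    counts = defaultdict(lambda: defaultdict(int))
    for tag, seq in examples:
        padded = "^" * order + seq  # "^" marks the start of a sequence
        for i in range(order, len(padded)):
            counts[(tag, padded[i - order:i])][padded[i]] += 1
    return counts

def generate(counts, tag, length, order=2, seed=0):
    """Sample a sequence base by base, conditioned on the requested tag."""
    rng = random.Random(seed)
    out = "^" * order
    for _ in range(length):
        options = counts.get((tag, out[-order:]))
        if not options:  # context never seen under this tag: stop early
            break
        bases, weights = zip(*options.items())
        out += rng.choices(bases, weights=weights)[0]
    return out[order:]

# Hypothetical labelled training data: the tag plays the role of the
# "function" you would later ask for.
examples = [
    ("GC_RICH", "GCGCGGCCGCGGGCCGCG"),
    ("GC_RICH", "CCGGCGCGCCGGGCGC"),
    ("AT_RICH", "ATATTAATATTTAAATAT"),
    ("AT_RICH", "TTAATATATAAATTTA"),
]

model = train(examples)
gc = generate(model, "GC_RICH", 12)
at = generate(model, "AT_RICH", 12)
print(gc, at)
```

The "request" is just the tag: the same sampler produces different sequences depending on what it is conditioned on. The hard part for genomes is getting labels (and a model) rich enough that a tag like "high carbon sequestration" means something, which this toy says nothing about.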
I take your point about the usefulness of generating complex features like antibody synthesis or whatever, but are nucleotide language models the right level for that, as opposed to a model that operates at a higher level of abstraction? Like with the glycosylation stuff: why do you need to do base-by-base generation, essentially slightly re-engineering each glycosyltransferase, as opposed to gene-by-gene, where you just paste in the appropriate gene sequence or enhancer element or whatever? It would look more like a systems biology model than a language model, or maybe something like a Future House-esque automated scientist + tons of compute for reasoning.
...though come to think of it, an AI scientist would probably still consult a language model while doing the reasoning, so it's good to have around. I slightly wonder how core it would be, though.