3 Comments

I am excited by MINT but I am not sure if they used association data. “We start with 2.4 billion protein-protein interactions (PPIs) comprising 59.3 million unique protein sequences classified as physical links from the STRING database.” I believe physical links are those where there is evidence of binding or a complex forming.

Good catch, my bad!

(1) thanks for the shout-out!

(2) I'm not aware of any big controversy re: GNoME; it seems like a solid piece of work. the comment you linked seems about right. most of the complaints are about presentation and framing (as opposed to the previous self-driving materials exploration work, where much of the XRD data was just incompetently analyzed and flat-out wrong).

(3) re: AF3 protein–peptide modeling, at least for small-molecule ligands it seems that OOD performance is not great (cf. the papers I cite in my Achilles/tortoise post). I'm curious whether the observed good performance is a function of data leakage, or whether cofolding methods are just better at peptides. it's very plausible that these methods are better at handling peptides, since there's much more data (and much less diversity), and one would hope that some of the classic AlphaFold ML performance would translate!
