Cadeddu, A., Wylie, E. K., Jurczak, J., Wampler‐Doty, M., & Grzybowski, B. A. (2014). Organic chemistry as a language and the implications of chemical linguistics for structural and retrosynthetic analyses. Angewandte Chemie International Edition, 53(31), 8108-8112.
综述类,关联NLP方法和应用领域的表格挺有价值的:
Öztürk, H., Özgür, A., Schwaller, P., Laino, T., & Ozkirimli, E. (2020). Exploring chemical space using natural language processing methodologies for drug discovery. Drug Discovery Today, 25(4), 689-705.
Asgari, E., & Mofrad, M. R. K. (2015). Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE, 10(11), 1–15.
Protein与word embedding的结合: Bepler, T., & Berger, B. (2019). Learning protein sequence embeddings using information from structure. 7th International Conference on Learning Representations, ICLR 2019, 1–17.
Nam, J., & Kim, J. (2016). Linking the neural machine translation and the prediction of organic chemistry reactions. arXiv preprint arXiv:1612.09529.
Seq2Seq最佳: Schwaller, P., Gaudin, T., Lanyi, D., Bekas, C., & Laino, T. (2018). “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chemical science, 9(28), 6091-6098.
另一篇比较有价值的Seq2Seq: Karimi, M., Wu, D., Wang, Z., & Shen, Y. (2019). DeepAffinity: Interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks. Bioinformatics, 35(18), 3329–3338.
漂亮的标题漂亮的intro,但内容不是很惊艳的BERT应用:
Vig, J., Madani, A., Varshney, L. R., Xiong, C., Socher, R., & Rajani, N. F. (2020). Bertology meets biology: Interpreting attention in protein language models. arXiv preprint arXiv:2006.15222.