Analysis of the confidence in the prediction of the protein folding by artificial intelligence

  1. Paloma Tejera-Nevado (1, 3)
  2. Emilio Serrano (1)
  3. Ana González-Herrero (2)
  4. Rodrigo Bermejo-Moreno (2)
  5. Alejandro Rodríguez-González (1, 3)

Affiliations:
  1. ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
  2. Margarita Salas Center for Biological Research (CIB-CSIC), Spanish National Research Council, Madrid, Spain
  3. Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain

Book:
Practical applications of computational biology and bioinformatics, 17th International Conference (PACBB 2023)
  1. Miguel Rocha (ed.)
  2. Florentino Fdez-Riverola (ed.)
  3. Mohd Saberi Mohamad (ed.)
  4. Ana Belén Gil-González (ed.)

Publisher: Springer Switzerland

ISBN: 978-3-031-38079-2, 978-3-031-38078-5

Year of publication: 2023

Pages: 84-93

Conference: Practical Applications of Computational Biology & Bioinformatics (PACBB). International Conference (17th, 2023, Miño)

Type: Conference contribution

Abstract

Deep learning models, which can predict protein folding from protein sequences, have facilitated the determination of protein structure. In some cases, the predicted structure can be compared to the already-known distribution when information is available from classic methods such as nuclear magnetic resonance (NMR) spectroscopy, X-ray crystallography, or electron microscopy (EM). However, challenges arise when the proteins are not abundant, their structure is heterogeneous, or protein sample preparation is difficult. The models provide different metrics to indicate the level of confidence that supports a prediction. These values are important in two ways: they offer information about the strength of the result, and they can supply an overall picture of the structure when different models are combined. This work provides an overview of the different deep-learning methods used to predict protein folding and of the metrics that support their outputs. The confidence of the model is evaluated in detail using two proteins that contain four domains of unknown function.