Machine Learning Based Approach for Functional Annotation of Lytic Polysaccharide Monooxygenases

Srivastava, Pulkit Anupam; Yennamalli, Ragothaman M. [Guided by]

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/7234

Title:	Machine Learning Based Approach for Functional Annotation of Lytic Polysaccharide Monooxygenases
Authors:	Srivastava, Pulkit Anupam; Yennamalli, Ragothaman M. [Guided by]
Keywords:	Lytic polysaccharide monooxygenases; Deep neural network; Long short-term memory; Proteome
Issue Date:	2019
Publisher:	Jaypee University of Information Technology, Solan, H.P.
Abstract:	Lytic polysaccharide monooxygenases (LPMOs), a family of copper-dependent oxidative enzymes, boost the degradation of crystalline polysaccharides, such as cellulose and chitin, by breaking an internal glycosidic bond, thereby exposing the polymer to further degradation. Recently, the sequence diversity of LPMOs has increased significantly, with newer sequences identified in organisms across the tree of life. Accurate functional assignment of as-yet-uncharacterized sequences to the LPMO family is an important step towards producing enzymatic mixtures capable of efficiently degrading recalcitrant polysaccharides. While multiple experimental methods are used to identify LPMOs accurately, a computational method that can accurately classify sequences as LPMOs is needed to keep pace with the sequences being generated. Thus, to screen potential LPMOs, we developed a machine learning based tool that employs two different approaches to functionally classify given protein sequences as belonging to the LPMO family or not. As a proof of concept, we worked on classifying sequences belonging to either the AA9 or the AA10 family of LPMOs. The first approach uses traditional neural network based prediction after calculating sequence features. The second approach uses bi-directional long short-term memory (LSTM) units, a type of recurrent neural network, which extract important features directly from the sequence and use an internal state, i.e., memory, to process input data. The optimized models trained with both approaches were cross-validated on a validation set to test precision and recall. Specifically, the feature-based traditional neural network approach was able to correctly discriminate AA9 LPMO sequences from non-AA9 LPMOs with a recall of 96.4% and a precision of 100%, and AA10 LPMO sequences from non-AA10 LPMOs with a recall of 86.9% and a precision of 100%. The LSTM approach, on the other hand, had a recall of 93.4% and a precision of 90.7% on the AA9 dataset, and a recall of 91.7% and a precision of 89.6% on the AA10 dataset. Further, we validated our method with an independent benchmark set of LPMO sequences, where we observed significant precision and recall compared to dbCAN2, an existing HMM-profile based CAZyme prediction tool. The working code is freely available at: https://github.com/PulkiD/PreDSLpmo.
URI:	http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/7234
Appears in Collections:	B.Tech. Project Reports
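
Note on the metrics: the precision and recall figures quoted in the abstract follow the standard binary-classification definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). The sketch below shows how such values could be computed from true and predicted labels. It is only an illustration: the function name `precision_recall` and the toy labels are assumptions made here, and are taken neither from the report nor from the PreDSLpmo repository.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`.

    Hypothetical helper for illustration; not part of PreDSLpmo.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: LPMO sequences predicted as LPMO.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-LPMO sequences predicted as LPMO.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: LPMO sequences predicted as non-LPMO.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels (1 = LPMO, 0 = non-LPMO), not data from the report.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case no negative is predicted as positive, so precision is perfect while one missed positive lowers recall, the same pattern as the 100% precision with sub-100% recall reported for the feature-based model.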

Files in This Item:

File	Description	Size	Format
Machine Learning Based Approach for Functional Annotation of Lytic Polysaccharide Monooxygenases.pdf		1.43 MB	Adobe PDF
