About this Event
110 SW Park Terrace, Corvallis, OR 97331
INTERPRETABLE MACHINE LEARNING: APPLICATIONS IN BIOLOGY AND GENOMICS
Machine learning and deep learning models impact our daily lives through applications in natural language modeling, image analysis, healthcare, genomics, and bioinformatics. The exponential growth in biological sequence data also creates a need for computational methods that can take advantage of these data. Although deep learning has proven highly effective at classifying and detecting biological sequences, challenges remain in extracting meaningful patterns and information from the learned models. To realize the potential of deep learning in biology, we need to develop new strategies for model interpretation so that new biological principles can be revealed. In this work, we first present different problems and methods to classify patterns with high performance. Next, we describe a series of newly developed techniques to understand the machine learning models and identify meaningful biological patterns. The main focus of each problem is the creation of interpretable intelligent systems without sacrificing performance. To test our approaches for model interpretation, we first focused our analysis on known biological patterns, and then extended the search beyond what is known. This work can be categorized into three stages: I) the development of bpRNA, a novel annotation tool capable of parsing RNA secondary structures. The result of bpRNA is a richly annotated database that contains over 100,000 structures from 7 different sources with their base pairing information. We plan to use these data in Pseudoknow, a machine learning model to detect pseudoknots from sequence data alone; II) the classification of cell type from gene expression data using Stacked Denoising Autoencoders (SDAE). In particular, we applied this approach to distinguish healthy cells from various types of cancer. The goal was not only high performance in the classification task, but also the identification of genes that are informative for the prediction of cancer cells. Our study suggests that the most influential genes for the dimensionality reduction performed by SDAE were highly predictive of cell type; III) the identification of transcription start sites from DNA sequences using convolutional neural networks. Our preliminary data indicate that our model can accurately detect transcription start sites at the sequence level. Going forward, our analysis will interpret the learned filters and connections to higher convolutional layers to discover novel biological motifs and their relationships.
Major Advisor: David Hendrix
Committee: Prasad Tadepalli
Committee: Stephen Ramsey
Committee: Weng-Keen Wong
Committee: Xiaoli Fern
GCR: Brett Tyler