Big Data Pilot Project Awards (2017)
"Automating Pathology with Deep Learning"
Cancer is the second leading cause of death in the US. Diagnosis and prognosis are typically determined by a pathologist's histological analysis of tissue samples, which is time-consuming, costly, and prone to diagnostic inconsistency. Machine vision offers opportunities to analyze large numbers of cancer images, to discover novel histological features in pathology images that may have biological significance, and to use those features to automate classification and prognosis. However, machine vision in pathology has been limited to narrow domains and has yet to offer a real alternative to human assessment. Deep Neural Networks (DNNs) are a class of algorithms that have demonstrated great potential over the past few years, approaching or even exceeding human performance in various visual recognition tasks. But DNNs have yet to impact pathology, in large part because of limitations in the size and variety of current pathology datasets. We aim to fill this gap through a multi-institutional collaboration that will collect the largest existing database of pathology images for machine vision: on the order of 100,000 unique images of benign and malignant tissue across several tissue types, together with patient outcome data. This will allow us to develop novel DNN approaches suited to histological analysis and a DNN-based machine vision system for analyzing histological slides. We will train DNNs both to localize malignancy in the images and to relate their morphological features to patient outcomes, leading to machine vision systems that can provide accurate diagnosis and prognosis of cancer in tissue slides. After establishing that DNNs effectively learn from cancer data, we will take additional steps to validate their decisions.
By leveraging state-of-the-art machine learning techniques, we will visualize and characterize the visual features used by DNNs to make their decisions, ensuring the use of biologically significant rather than artefactual visual information.