MIT 6.S191: AI in Healthcare

Transcription for the video titled "MIT 6.S191: AI in Healthcare".


Beginning Segment

Introduction (00:00)

I've been at Google for 16 years. The last six years I've been in life sciences and healthcare. I generally like running more interactive classes. Given the size of the group, we thought polls might work. So I'll launch a couple of polls throughout the talk, and I'll try to keep an eye on chat as well if you have questions, but I might save them for the end. So let me talk through the agenda a little bit. I'm hoping to give you some information about AI, in particular deep learning, in healthcare. I will be using AI and deep learning interchangeably, because the name of our team is Google AI, but the examples you'll be seeing are all deep learning examples. As you know, AI does include other things like robotics and non-neural-network approaches, so I just wanted to be clear that when I use the terms, I don't mean to be conflating them entirely. Once I cover some of the key applications of what we've done in AI and healthcare, I'd like to discuss the unique opportunity I think we have, because of deep learning, to create a much more equitable society while we're deploying AI models, and we can talk about how that's possible. Finally, I'll touch on one last set of applications for AI and healthcare at the end. In terms of the history behind AI in healthcare, we are benefiting from the maturation of deep learning, especially the end-to-end capabilities where we can learn directly from the raw data.
This is extremely useful for advances in computer vision and speech recognition, which are highly valuable in the medical field. The other area, as you all know, is the increase in localized compute power via GPUs, which has allowed neural networks to outperform the non-neural-network approaches used in the past. And the third is the value of all these open source, large labeled data sets, ImageNet being one for non-health-related areas, but there are public data sets like UK Biobank and even MIMIC, which has been truly helpful and was actually developed and produced at the MIT labs.

Discussion On AI Applications In Healthcare And Research

Applications of AI in healthcare (02:40)

So you'll be hearing about some of the applications of AI in healthcare next. One of the things that we do is make sure we look at the needs in the industry and match them up to the tech capabilities. Healthcare specifically has enormous amounts of complex data sets. It's estimated to be generating on the order of several thousand exabytes of healthcare data a year. Just to put that in perspective a bit, it's estimated that if you were to take the internet data, that's something more like hundreds of exabytes. So it's several thousand times more. And what we're looking at in terms of those applications, as you'll see in a bit, is pattern detection and the ability to recognize things like lesions and tumors in really nuanced, subtle imagery. Another area that it's useful for is addressing the limited medical expertise globally. If you look to the right, what you'd like to see is one medical specialist, like a radiologist, to about 12,000 people in the population. And what you can see on the graph to the right is that in developing countries, it looks more like one to 100,000 or even one to a million. So the benefit of AI in healthcare is that it can help scale up some of these complex, valuable tasks that medical experts are capable of. The third is really addressing human inconsistencies. We'll talk a little bit about this, especially when we're talking about generating labels. AI models obviously don't suffer from recency or cognitive biases. They are also able to work tirelessly, which is an issue when you have to work overtime, as often happens in the medical field. Let me just talk a little bit through the next application, which is lung cancer. What we developed was a computer-aided diagnostic, and in this case, it was to help screen for lung cancer using low-dose CT scans.
What you normally see is the survival rates increasing dramatically if you catch it at earlier stages, but about 80 percent of lung cancers are not caught early. What they usually use to do these screenings are low-dose CT scans, which, if you look at the diagram to the right, are three-dimensional imaging of your entire body.

End-to-end lung cancer screening (05:12)

It creates hundreds of images for the radiologists to look through, and typically the actual lung cancer signs are very subtle. So what our models were able to do when we looked at this was not just outperform the state of the art. More importantly, we compared it to the radiologists to see if there was an absolute reduction in both false positives and false negatives. False positives lead to overutilization of the system, and false negatives lead to not being able to catch the cancer early enough, and both were reduced. Pathology is another area that's a hard deep learning problem, with even more complex data.
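To make the false positive and false negative comparison above concrete, here is a minimal sketch of how the two rates can be computed for a reader and for a model on the same cases. The function name `screening_rates` and all of the labels are made up for illustration; they are not the data or the evaluation code from the talk.

```python
def screening_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for binary labels (1 = cancer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # healthy cases flagged by mistake
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # cancers that were missed
    return fpr, fnr


# Hypothetical ground truth and calls for ten scans (illustrative only).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
model = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

reader_fpr, reader_fnr = screening_rates(truth, reader)
model_fpr, model_fnr = screening_rates(truth, model)

print(f"reader: FPR={reader_fpr:.2f} FNR={reader_fnr:.2f}")
print(f"model:  FPR={model_fpr:.2f} FNR={model_fnr:.2f}")
```

An absolute reduction in both rates, as described in the talk, would show up here as the model's two numbers both being lower than the reader's.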

Pathology (06:05)

On the left, you can see that when you take a biopsy, you have slices of the body tissue, and these are magnified up to 40 times, which creates about 10 to 15 megapixels of information per slide. The part that is inherently complex is that when you're doing pathology, you want to see the tissue at a highly magnified level so that you can characterize the lesion. You also need to understand the overall tissue architecture to provide context for it, and that's at a lower power. So you have a multi-scale problem, and it is also inherently complex to differentiate between benign and malignant tumors. There are hundreds of different pathologies that can affect tissue, so being able to visually differentiate them is very challenging. We built the model to detect breast cancer from pathology images, and the pathologists actually had no false positives. The model was able to capture more of the cancer lesions, greater than 95% compared to the 73% that pathologists were getting, but it also increased the number of false positives. So what we tried then was to combine them and have the model and pathologists work together to see if the accuracy could improve, and it absolutely did. This combined effort also led to the development of an augmented microscope, where you can see the model detecting the patches inside the microscope view itself. And we'll come back later to the fact that the models had certain weaknesses and how we dealt with that.

Genomics (08:02)

Genomics is another area that's benefited significantly from deep learning. It's worth noting that when you do whole genome sequencing, what you're doing is tearing up your DNA into a billion reads of about 100 bases. And there's about a 30x oversampling, with errors, when you do that. When you try to figure out the sequence, what you're trying to do is something like take 30 years of a Sunday newspaper, 30 copies each with errors introduced, shred them into 20-word snippets, and then try to put them back together. That's essentially what's happening when you're doing your sequencing. So we recast this problem as a deep learning problem. We looked at how image recognition, and specifically convolutional neural networks, would be able to perform in this space, and developed a tool called DeepVariant, which is open sourced and available for anyone to use. We've been improving it over time. This has proven to be highly accurate. The US FDA runs a precisionFDA competition every few years, and it's outperformed most and won the awards for three out of four accuracy areas. You can see on the right that it's quite visually obvious when you actually get an error, a false variant, in the sequencing. So this was a clever way to rapidly detect errors in variant calls. So we talked about the different needs that are in the medical field, and one of them was the limited medical expertise. There's one way to help, which is scaling up the tasks that experts run so that they can be automated. This is another way of addressing it, which is returning time to the doctors. What you're seeing in this picture is a drawing by a girl of her experience when visiting a doctor. You can see the doctor is actually facing the computer to the left. This sparked a lot of discussion within the health care industry about the cost of technology and how it's interfering with patient care.
Doctors at this point spend something on the order of six hours a day interacting with their electronic health records to get data entered. One of the areas that's ripe for supporting medical doctors is scribing: human scribes have been deployed, medical dictation has gotten much better, automatic speech recognition now has end-to-end models that are highly accurate, and natural language processing has improved significantly as well. So these are all ways in which a more assistive kind of AI can help relieve doctors of the burden of documentation. I'm going to launch the poll right now just to see what people think is the most valuable application. Let me see here if I can do that. And just to quickly recap: there were computer-aided diagnostics, which are useful for screening and diagnoses, and that was demonstrated with the radiology. There was improved prognosis: pathology is useful for determining therapeutics, being able to determine treatment efficacy and the progression of the disease, and that's what both pathology and genomics are highly utilized for. And then returning time to experts is really the AI assistance through medical dictation and scribing. Great. So let me just keep going while the poll is going. I want to talk about how you can actually achieve a greater moonshot.
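As a quick check on the whole-genome sequencing numbers quoted earlier in this section (a billion reads of about 100 bases, roughly 30x oversampling), the arithmetic can be sketched as below. The genome size of about 3 billion bases is an assumption added here, not a figure from the talk, and `mean_coverage` is a hypothetical helper name.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: (reads * length) / genome size."""
    return n_reads * read_length / genome_size


n_reads = 1_000_000_000      # "a billion reads" (from the talk)
read_length = 100            # "about 100 bases" (from the talk)
genome_size = 3_000_000_000  # assumption: human genome is roughly 3 billion bases

coverage = mean_coverage(n_reads, read_length, genome_size)
print(f"{coverage:.1f}x")  # 33.3x
```

Under that assumption the result lands at about 33x, which is in line with the "about 30x" figure.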

Higher quality and more equitable learning (11:44)

So let me take a step back here and look at how the world of healthcare looks right now. It's tremendously fragmented, it's fairly impersonal, and it's inequitably distributed. One of the things I've noted is that tech amplifies whatever system you apply it to. Tech is a way to both augment and scale up what exists. So if you're applying it to a broken system with perverse incentives, it won't inherently fix the system, it will accelerate it. But at the core of machine learning and these deep learning technologies, what we're doing is looking at the data very carefully and utilizing that to build predictions and determine outcomes. In this case, given that the world is not full of equity, you run the risk of training the wrong models. We also published a paper to help address this. Societal inequities and biases are often codified in the data that we're using. We actually have the opportunity to examine those historical biases and proactively promote a more equitable future when we're developing the models. You can do that by correcting for bias in the training data. You can also correct bias in the model design and the problem formulation, meaning what you're trying to solve for, and we'll talk about that in a bit. And finally, if none of that is applicable, then you can also test for and ensure equal outcomes and resource allocations at the end, when you're deploying the AI models.

Moonshots at Google (13:34)

So I used to work in Google X, which is Google's effort to do moonshots. The way we define moonshots is the intersection of a huge problem, breakthrough technology, and radical solutions. A huge problem here is that the world is uncertain and impersonal, and it also needs higher accuracy. We have the benefit of a breakthrough tech right now, which is AI and deep learning. And I'm just going to say that digital mobile tools are actually breakthrough tech for health care, because health care tends to lag about a decade behind other industries due to regulations, safety, privacy, and quality needs. The radical solution here is that we actually think about not just improving the quality of care that we're delivering, but making sure that when we do that, we also make it more equitable. And every time I see a technological wave happen, I realize that it's an opportunity for us to reshape our future. So in the case of deep learning, I'd like to talk about the opportunities for actually moving. Sorry, I didn't realize the slides weren't advancing. I want to talk about the opportunity to actually make the AI models much more equitable and how we would do that. The two key areas I'll talk about are community participation, and how that's going to affect the models and the data evaluation, and then also planning for model limitations and how you can do that effectively with AI. One of the things that we did was work with the regions that we were directly going to deploy the models in. On the left here, you see us working with the team in India. And on the right, it's our team working with those in Thailand. What we found was that the socioeconomic situation absolutely mattered in terms of where you would deploy the model.
An example is that we developed the model with the ophthalmology centers, since that's where the eye disease is treated, and diabetic retinopathy is a leading and growing cause of blindness in the world. This is where the models were developed, but the use case was most acute in the diabetic centers, so the endocrinology offices. People were not making the 100 meter distance from the endocrinology offices to the ophthalmology offices because of access issues, challenges with lines, and so on and so forth. So this is an area that we explored extensively using user research, to make sure that we thought through where the AI models would land and how that would impact the users.

Generating labels, bias, and uncertainty (16:43)

Another area that we looked at is generating labels for the models. You can see on the left that, as you would classically expect, when you get more data, the model continues to improve. It kind of flattens out here at 60,000 images. At some point that's sufficient, and you're not going to get much more improvement from that. What you actually benefit from, if you look at the right graph, is improving the quality of the labels, or what we refer to as the grades on the images. Each doctor gives an image a grade, which is their diagnostic opinion of what they think they're seeing. As we got multiple opinions on single images and were able to reconcile them, we were able to continuously improve the model output and improve the accuracy. There's something that's often said in the healthcare space: if you ask three doctors, you get four opinions, because even the doctors themselves may not be consistent with themselves over time. The way that this is addressed in some countries is to use the Delphi method, which was developed during the Cold War. It helps determine consensus where individual opinions vary. And we developed a tool to do asynchronous adjudication of different opinions. This has led to much higher quality ground truth data. It's because doctors sometimes just miss what another doctor notices, so they generally do reconcile and are able to come to agreement on what the actual severity or diagnosis should be. This was something that we saw was really impactful, because when we were doing the analysis with ophthalmologists, we would see things like 60% consistency across the doctors. And this was a way to actually address that level of variance. And here's the last area that I want to talk about for community engagement.
This is around going even further upstream, to the problem formulation. This is a case where they didn't think through the inputs to their models and algorithms. This algorithm was trying to determine the utilization needs of the community, and they were using health costs as a proxy for the actual health needs. This inadvertently led to a racial bias, because less money was spent on Black patients. And this was caught after the fact. So if you just click one more time, this is one of the key areas where having input from the community would have actually caught that much earlier on, when they were doing the algorithm development. This is something that we practice frequently now. And I know you're working on projects, so one of the polls I wanted to put out there, let's see if we can get it to launch, is which of these approaches are actually potentially relevant for the projects that you're working on. Okay, great. I'll keep going with the talk then while this is being saved, and it would be nice to look back on it.
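The label-reconciliation idea from earlier in this section (several doctors grading the same image, then reconciling disagreements) can be sketched with a simple majority vote. This is only an illustration: it is not the Delphi method and not the asynchronous adjudication tool mentioned in the talk, and the function names, image IDs, and grades are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_grade(grades):
    """Return the most common grade, or None when the top grades tie
    (a tie is where a case would be sent on for adjudication)."""
    counts = Counter(grades).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pairwise_agreement(grades_by_image):
    """Fraction of grader pairs, across all images, that gave the same grade."""
    agree = total = 0
    for grades in grades_by_image.values():
        for a, b in combinations(grades, 2):
            agree += int(a == b)
            total += 1
    return agree / total if total else 0.0


# Hypothetical severity grades (0-4) from three graders per image.
grades_by_image = {
    "img_001": [2, 2, 3],
    "img_002": [0, 0, 0],
    "img_003": [1, 2, 3],
    "img_004": [4, 4, 3],
}

consensus = {img: majority_grade(g) for img, g in grades_by_image.items()}
agreement = pairwise_agreement(grades_by_image)

print(consensus)
print(f"pairwise agreement: {agreement:.2f}")
```

In this sketch, `img_003` has no majority, so it comes back as `None`; that is the kind of case where graders would need to discuss and reconcile rather than be outvoted.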

Plan for model limitations (20:22)

On the left here, I mentioned earlier how our pathology models had certain weaknesses in terms of false positives, but they were also capturing more of the cancer lesions than the pathologists were. So we developed a way to try to explain the models through similar image lookup. What this does is use a clustering algorithm to find features that were not known before to pathologists and that might be meaningful indicators of the actual diagnosis or prognosis. In this case, pathologists have started to use the tool to learn from it. There's also the benefit of the pathologists being able to recognize any issues with the models and inform the models to improve. So you get a virtuous cycle of the models and the pathologists learning from each other. On the right is another way that we use to explain the model output. You see saliency maps, which are a way to identify which features the model is actually paying attention to, in this case which pixels the model is paying attention to, and light those up. We do this so that we know how the model is actually determining the diagnosis: if it's a particular skin condition, that it's looking at the actual skin abnormalities and not some unintentional side correlation to skin tone or demographic information. So this has been valuable to use as a way of checking the models. And the last thing that I mentioned is doing model evaluation for equal outcomes. There's something in the dermatology space known as the Fitzpatrick skin type. It allows you to see the different skin tones. What we do is have test sets in the different skin tones for the model evaluation, to see if we do get equal outcomes. And this is something where, as the model developer, you have to make some hard choices.
If you see that your models aren't performing well for a particular category or demographic, ideally what happens is you supplement your data set so that you can improve your models further to appropriately address those regions, or you may have to make a decision to limit your model output so that there can be equal outcomes. And sometimes you actually choose not to deploy the models. So these are some of the real-world implications of developing AI models in the healthcare space. The last application I wanted to talk through with this group is the concept of healthcare. Typically, in the past, health care has been thought of as being for patients.
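The evaluation for equal outcomes described in this section, with separate test sets per skin tone, can be sketched as a per-group sensitivity check. The Fitzpatrick scale has six types (I to VI); pairing them into three buckets, the helper name `sensitivity_by_group`, and every record below are assumptions made for illustration, not the evaluation setup from the talk.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, truth, prediction) with 1 = condition present.
    Returns {group: true positives / all actual positives} for each group."""
    tp = defaultdict(int)
    fn = defaultdict(int)
    for group, truth, pred in records:
        if truth == 1:
            if pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    groups = set(tp) | set(fn)
    return {g: tp[g] / (tp[g] + fn[g]) for g in groups}


# Hypothetical test cases bucketed by Fitzpatrick skin type (illustrative only).
records = [
    ("I-II", 1, 1), ("I-II", 1, 1), ("I-II", 1, 0), ("I-II", 0, 0),
    ("III-IV", 1, 1), ("III-IV", 1, 1), ("III-IV", 1, 1), ("III-IV", 0, 1),
    ("V-VI", 1, 1), ("V-VI", 1, 0), ("V-VI", 1, 0), ("V-VI", 0, 0),
]

per_group = sensitivity_by_group(records)
gap = max(per_group.values()) - min(per_group.values())

for group in sorted(per_group):
    print(f"{group}: sensitivity={per_group[group]:.2f}")
print(f"largest gap between groups: {gap:.2f}")
```

A large gap like the one in this toy data is the signal that would prompt the choices described in the talk: supplement the data set for the weaker group, limit the model output, or hold back deployment.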

Healthcare patient vs person (23:31)

And while every patient is a person, not every person is a patient. Patients, on the left here, are typically considered to be people who are sick or at risk. They're entering the health care system. The models are quite different when you're thinking about people of this nature, whether they have acute or chronic diseases. They're the ones that we talked about a bit earlier, which are screening, diagnostics, prognosis, and treatment. Those are what the models tend to focus on. If you're looking at people, they are considered quote unquote well, but their health is impacted every single day by what we call social determinants of health, which are your environmental and social circumstances, your behavioral and lifestyle choices, and how your genes are interacting with the environment. The models here look dramatically different in terms of how you would approach the problem. They tend to focus on preventative care, so eating well, sleeping well, exercising. They also focus on public health, which I think has been a big known issue now with coronavirus. And of course, screening is actually consistent across both sides. So when we talk about public health, there can be things like epidemiological models, which are extremely valuable, but there are also things happening right now around what is probably one of the biggest global threats to public health, which is climate change. One of the things that's happening in places like India is flood forecasting for public health alerts. In India, there's actually a lot of alert fatigue, so it's unclear when people should care about the alerts or not. What this team did was focus on building a scalable, high-resolution hydraulic model, using convolutional neural nets to estimate inputs like snow levels, soil moisture, and permeability. These hydraulic models simulate the water behavior across floodplains and were far more accurate than what was being used before.
And this has now been deployed to help with alerting across regions of India during the monsoon seasons. Let's see. And so I just want to leave this group with the idea that on the climate change side, there is a lot going on right now. Nature is essential to the health of the planet, but also to the people that live on it. We currently rely on these ecosystem services. What that means is people rely on things like clean air, water supply, pollination of agriculture for food, land stability, and climate regulation. And this is an area that's ripe for AI to help us understand far better, and value, those services that we currently don't pay a lot for but will probably have to in the future. And so this last slide, let me see if we can get it to appear, is for the poll. I just wanted to compare and understand if the perception around health is any different in terms of what might be the most exciting for AI to be applied to. Thanks for launching the final poll.

Summary and conclusion (27:35)

And the last thing I want to leave the group with is that none of the work that we do in AI and healthcare is possible without a huge team and a lot of collaboration across medical research organizations, research institutes, and healthcare systems. So this is our team as it's grown over the years and in different forms. It's not even all of our team anymore, but this is certainly where a lot of the work has been generated from.

Poll And Q&A Sessions

Poll results and Q&A (28:03)

Let me take a look at the questions in chat now. I'll just recap what the poll results were. So it looks like the diagnostic models, 54. Oh yeah, so I guess you can pick multiple ones. So 50, about 60 people, half the people, felt that the diagnostics and the therapeutics were valuable, and they were less interested in the assistance, though it was still seen as valuable. Thanks for filling those out. Let me look at the questions. Given the fast advancement in terms of new models, what is the main bottleneck to scaling ML diagnostic solutions to more people around the world? It's regulatory: the automation of meeting the regulatory needs. The long pole for diagnostics is ensuring patient safety and proper regulation. Usually you go through FDA or CE marks, and that can take time. There are quality management systems that have to be built to make sure that the system is robust from a development perspective, so it's software as a medical device. And this is always going to be true when you're dealing with patients. The other part, maybe, is the open source data sets. Having more labeled data sets out there so that everyone can have access and move that space forward is valuable. Second question here: good data sets are essential to developing useful, equitable models. What efforts and technologies do we need to invest in to continue collecting data sets to inform our models? So, one of the things that's happening is developing scalable labeling infrastructure. That's one way to be able to generate better data sets. But raw data that directly reflects the outcomes is also valuable.
So an example is data that's coming straight from the user in terms of their vital signs or their physiological signals. These are things that are as close to ground truth as you can get about the individual's wellbeing. And obviously what we saw with COVID-19 was that it was even harder to get information like how many deaths were actually happening and what the causes of those deaths were. So for these kinds of data sets, the pipelines need to be thought of in the context of how they can support public health goods and how that data accurately gets out the door. We do have an effort right now that lots of people got pulled into, especially for coronavirus, which is on GitHub now, and I can provide a link for it later. It's volunteers who have built a transparent data pipeline for the source of the data. Provenance is very important to keep track of when you create these data sets, to make sure you know what the purpose of the data is, how reliable it is, and where the source is coming from. So these are the kinds of things that need to be built out to inform the models that you build. This last question: how do you facilitate conversation and awareness around potential algorithmic bias related to the products you are developing? Several things. One is the team you build: the more it can be representative of a broader population, the more meaningful that is, more than I think people realize. What I mean by that is if you have a diverse team working on it, or you bring in people who can be contributors or part of a consortium that can reflect on the problem space that you're trying to tackle, that is actually a really good way to hear and discover things that you might not have ever thought of before.
But again, it can start with the team that you build, and then the network around you that you're actually getting feedback loops from. And if you can afford it, you would want to do that in a way that is quite measurable and quantitative. But if you can't, it's actually still quite meaningful to proactively just have these conversations about what space you're going into and how you're going to think about the inputs to your models. All right. So thank you. It looks like those were the majority of the questions.
