MIT 6.S191: AI Bias and Fairness

Transcription for the video titled "MIT 6.S191: AI Bias and Fairness".


Note: This transcription is split and grouped by topics and subtopics, with timestamps referencing the original video.


Introduction and Motivation

Introduction and motivation (00:00)

Hi everyone. Welcome to our second hot topic lecture in 6.S191, where we're going to learn about algorithmic bias and fairness. This topic has recently emerged as a truly pervasive issue in modern deep learning and AI more generally. And it's something that can occur at all stages of the AI pipeline, from data collection all the way to model interpretation. In this lecture, we'll not only learn about what algorithmic bias is and how it may arise, but we will also explore some new and exciting methodological advances where we can start to think about how we can build machines capable of identifying and, to some degree, actually mitigating these biases. The concept of algorithmic bias points to the observation that neural networks, and AI systems more broadly, are susceptible to significant biases, and that these biases can lead to very real and detrimental societal consequences. Indeed, today more than ever we are already seeing this manifesting in society, in everything from facial recognition to medical decision making to voice recognition. And on top of this, algorithmic bias can actually perpetuate existing social and cultural biases, such as racial and gender biases. We are now coming to appreciate and recognize that algorithmic bias in deep learning is a truly pervasive and severe issue, and because of this, we really need strategies at all levels to combat the problem.


Types and Mitigation of Biases in Machine Learning

What does bias mean? (01:40)

To start, first we have to understand what exactly algorithmic bias actually means. So let's consider this image. What do you see in this image? How would you describe it? Well, the first thing you may say to describe this image is watermelon. What if I tell you to look closer and describe it in more detail? Okay, maybe you'll say watermelon slices, or watermelon with seeds, or other descriptors like juicy watermelon, layers of watermelon, watermelon slices next to each other. But as you were thinking about this to yourself, I wonder how many of you thought to describe this image as red watermelon. If you're anything like me, most likely you did not. Now let's consider this new image. What is in this image here? Now you're probably much more likely to place a yellow descriptor when describing this watermelon. Your top answer is probably going to be yellow watermelon, and then with slices, with seeds, juicy, etc. But why is this the case? And why did we not say red watermelon when we saw the original image? Well, when we see an image like this, our tendency is to just think of it as watermelon rather than red watermelon. And that's because of our own biases, for example due to geography, that make us used to seeing watermelon that looks like this and has this red color, as this represents the prototypical watermelon flesh that we expect to see. But perhaps if you're from another part of the world, where the yellow watermelon originated, you could have a different prototypical sense of what color watermelons should be. And this points to a broader fact about how we as humans go about perceiving and making sense of the world. In all aspects of life, we tend to label and categorize things as a way of imposing order to simplify and make sense of the world. And as a result, what this means is that for everything, there's generally going to be some typical representation, what we can think of as a prototype. And based on the frequencies of what each of us observes, our tendency is going to be to point out things that don't fit what we as individuals consider to be the norm, those things that are atypical to us. For example, the yellow watermelon for me. And critically, biases and stereotypes can arise when particular labels, which may not necessarily be the minority label, confound our decision making, whether that's human-driven or suggested by an algorithm. And in this lecture, we're going to focus on sources of algorithmic bias and discuss some emerging approaches to try to combat it. To do that, let's first consider how exactly bias can and does manifest in deep learning and AI.


Bias in machine learning (04:22)

One of the most prevalent examples of bias in deep learning that we see today is in facial detection. Recently, there have been a couple of review analyses that have evaluated the performance of commercial facial detection and classification systems across different social demographics. For example, in an analysis of gender classifiers, this review showed that commercial pipelines performed significantly worse on faces of darker females relative to other demographic groups. And another analysis, which considered facial recognition algorithms, again found that error rates were highest on female faces of color. This notion of algorithmic bias can manifest in a myriad of different ways. So as another example, let's consider the problem of image classification generally. And let's say we have a trained CNN and this image on the left here, which shows a prototypical example of a bride in some North American and European countries. Now, in a recent analysis, when this particular image of a bride was passed into a CNN that was trained on an open-source large-scale image data set, the predicted class labels that were outputted by the CNN were, perhaps unsurprisingly, things like bride, dress, wedding, ceremony, women, as expected. However, when this image, which is a prototypical example of a bride in other parts of the world, such as in South Asia, was passed into that very same CNN, the predicted class labels did not in fact reflect the ground truth label of bride: clothing, event, costume art. As you can see, nothing here about a bride or a wedding or even a human being. So clearly this is a very, very significant problem, and this is not at all the desired or expected behavior for something that, in deep learning, we may think of as quote-unquote solved: image classification. And indeed, similar behavior to what I showed here was also observed in another setting, for object recognition. When this image of spices, which was taken from a home in North America, was passed into a CNN trained to do object detection and recognition, the labels for the detected objects in this image were as expected: seasoning, spice, spice rack, ingredient. Now again, for this image of spices, shown now on the left, which was in fact taken from a home in the Philippines, when that image was fed into that very same CNN, once again the predicted labels did not reflect the ground truth label that this was an image of spices. Again, this points to something really concerning going on. Now, what was really interesting about this analysis was that they asked: okay, not only do we observe this bias, but what could be the actual drivers and reasons for this bias? And it turned out from this analysis that the accuracy of the object recognition model actually correlated with the income of the homes where the test images were taken and generated. This points to a clear bias in these algorithms favoring data from homes of higher incomes versus those from lower incomes. Why could this be? What could be the source of this bias? Well, it turns out that the vast majority of the data that was used to train such a model was taken from the United States, Canada, and Western Europe.
But in reality, this distribution does not at all match the distribution of the world's population, given that the bulk of the world's population is in East and South Asia. So here I think this is a really telling and powerful example because it shows how bias can be perpetuated and exist on multiple levels in a deep learning or AI pipeline.


Bias at all stages in the AI life cycle (08:32)

And this particular analysis started to uncover and unearth some of those biases. And indeed, as I mentioned, bias can truly poison all stages of the AI development and life cycle: beginning with the data, where imbalances with respect to class labels or even features can result in unwanted biases; to the model itself; to the actual training and deployment pipeline, which can reinforce and perpetuate biases; to evaluation, and the types of analyses that are and should be done to evaluate fairness and performance across various demographics and subgroups; and finally in our human interpretation of the results, outcomes, and decisions from these AI systems, where we ourselves can inject human error and impose our own biases that distort the meaning and interpretation of such results.


Outline of the lecture (09:25)

So in today's lecture, we're going to explore this problem of algorithmic bias, first in terms of the different manifestations and sources of this bias, and we'll then move on to discuss different strategies to mitigate each of these biases and ultimately work towards improving the fairness of AI algorithms. By no means is this a solved problem. In fact, the setup and the motivation behind this lecture is to introduce these topics so we can begin to think about how we can continue to advance this field forward.


Taxonomy (types) of common biases (10:00)

Alright, so let's start by thinking about some common types of biases that can manifest in deep learning systems. I think we can broadly categorize these as being data-driven or interpretation-driven. On the data-driven side, we can often face problems where data are selected such that proper randomization is not achieved, or particular types of data or features in the data are represented more or less frequently relative to others, and also instances in which the data that's available to us as users does not reflect the real-world likelihood of particular instances occurring. All of these, as you'll see and appreciate, are very, very intertwined and very related. Interpretation-driven biases refer more to issues in how human interpretation of results can perpetuate some of these types of problems. For example, with respect to falsely equating correlation and causation, trying to draw more general conclusions about the performance or the generalization of an AI system even in the face of very limited test data, and finally actually favoring or trusting decisions from an algorithm over those of a human. By no means is this survey of common biases an exhaustive list. It's simply meant to get you thinking about different ways and different types of biases that can manifest. So today we're going to touch on several of these types of biases.


Interpretation driven biases (11:29)

And I'd first like to begin by considering the interpretation-driven issues of correlation fallacy and overgeneralization. All right, so let's suppose we have this plot that, as you can see, shows trends in two variables over time. And as you notice, the data from these two variables are tracking very well together. And let's say, and it turns out, that in fact these black points show the number of computer science PhDs awarded in the United States. And we could easily imagine building a machine learning pipeline that can use these data to predict the number of computer science doctorates that would be awarded in a given year. And specifically, we could use the red variable, because it seems to correlate very well with the number of CS doctorates, as the input to our machine learning model to try to predict the black variable. And ultimately, what we would want to do is, you know, train on a particular data set from a particular time frame and test in the current time frame, 2021 or further beyond, to try to predict the number of computer science PhDs that are going to be awarded. Well, it turns out that this red variable is actually the total revenue generated by arcades in a given year. And it was a variable that correlated with the number of computer science doctorates over this particular time frame. But in truth, it was obscuring some underlying causal factor that was what was ultimately driving the observed trend in the number of computer science doctorates. And this is an instance of the correlation fallacy. And the correlation fallacy can actually result in bias, because a model trained on data like this, with revenue generated by arcades as input and computer science doctorates as the output, could very, very easily break down, because it doesn't actually capture the fundamental driving force that's leading to the observed trend in the variable that we're ultimately trying to predict. So the correlation fallacy is not just about correlation not equating to causation; it can also generate and perpetuate bias when wrongfully or incorrectly used. Let's also consider the assumption of overgeneralization. So let's suppose we want to train a CNN on some images of mugs from some curated internal data set and take our resulting model and deploy it in the real world to try to predict and identify mugs. Well, mug instances in the real world are likely to not be very similar to the instances on which the model was trained. And the overgeneralization bias reflects the fact that our model could perform very well on select manifestations of mugs, those that are similar to the training examples it has seen, but actually fail and show poor performance on mugs that were less significantly represented in the data, even though we expect it to generalize. This phenomenon can often be thought of and described as distribution shift, and it can truly bias networks to have worse performance on examples that they have not encountered before. One strategy that was recently proposed to try to mitigate this source of bias is to start with the dataset and try to construct an improved dataset that already accounts for potential distribution shifts. And this is done, for example, by specifying example sets of, say, images for training and then shifting with respect to a particular variable to construct the test data set.
So for example, in this instance, the distribution shift that occurs between the train and the test is with respect to the time and the geographic region of the images here. Or in the instance of medical images this could mean sourcing data from different hospitals for each of train, validation, and test. And as greater awareness of this issue of distribution shift is brought to light, datasets like this could actually help try to tame and tune back the generalization bias that can occur, because they inherently impose this necessity of already testing your model on a distribution shifted series of examples. All right, so that gives you hopefully a sense of interpretation-driven biases and why they can be problematic.
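As a toy sketch of what constructing such a shifted split might look like, consider a metadata table with time and region information for each image; this is a hedged illustration, and the column names, values, and thresholds here are purely hypothetical.

```python
import pandas as pd

# Hypothetical metadata: one row per image, with year and region (or hospital)
# recorded alongside the image path and label.
metadata = pd.DataFrame({
    "path":   ["img0.png", "img1.png", "img2.png", "img3.png"],
    "label":  [0, 1, 1, 0],
    "year":   [2015, 2015, 2019, 2020],
    "region": ["north_america", "europe", "south_asia", "east_asia"],
})

# Split so that test examples come from later years and different regions than the
# training examples, i.e. evaluation explicitly covers a distribution shift.
train = metadata[(metadata.year <= 2016) & metadata.region.isin(["north_america", "europe"])]
test = metadata[(metadata.year >= 2019) & ~metadata.region.isin(["north_america", "europe"])]
```

The same pattern applies to the medical example: split on a hospital column instead of year and region, so that validation and test data come from institutions the model never saw during training.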


Data driven biases - class imbalance (16:04)

Next, we're going to turn most of our attention to what are, in my opinion, some of the most pervasive sources and forms of bias in deep learning, which are driven by class or feature imbalances that are present in the data. First, let's consider how class imbalances can lead to bias. Let's consider some example data shown here, and let's say that this plot on the left shows the real-world distribution of the data that we're trying to model with respect to some series of classes. And let's suppose that in the data that is available to us, the frequency of these classes is actually completely different from what occurs in the real world. What is going to be the resulting effect on the model's accuracy across these classes? Will the model's accuracy reflect the real-world distribution of data? No. What instead is going to happen is that the model's accuracy can end up biased based on the data that it has seen, specifically such that it is biased towards greater accuracies on the more frequently occurring classes. And this is definitely not desired. What is ultimately desired is for the resulting model to be unbiased with respect to its performance, its accuracy, across these various classes. The accuracies across the classes should be about the same. And if our goal is then to train a model that exhibits fair performance across all these classes, in order to understand how we can achieve that, we first need to understand why, fundamentally, class imbalance can be problematic for actually training the model. To understand the root of this problem, let's consider a simple binary classification task. And let's suppose we have this data space, and our task is to build a classifier that sees points somewhere in this data space and classifies them as orange or blue. And we begin in our learning pipeline by randomly initializing the classifier such that it divides up the space. Now let's suppose we start to see data points. They're starting to be fed into the model, but our data set is class imbalanced, such that for every one orange point the model sees, it's going to see 20 blue points. Now, the process of learning, as you know from gradient descent, is that incremental updates are going to be made to the classifier on the basis of the data that it has observed. So for example, in this instance, after seeing these blue points, the decision boundary will shift according to these particular observations, and that's going to occur. Now we've made one update, more data is going to come in, and again they're all going to be blue points due to this 1-to-20 class imbalance, and as a result the decision boundary is going to move accordingly. So far the random samples that we have seen have reflected this underlying class imbalance, but let's suppose now we see an orange data point for the first time. What's going to happen to that decision boundary? Well, it's going to shift closer to the orange point to account for this new observation, but ultimately, remember, this is only one orange point, and for every one orange point we're going to see 20 blue points. So in the end, our classifier's decision boundary is going to end up occupying more of the blue space, since it will have seen more blue samples, and it will be biased towards the majority class.
So this is a very, very simplified example of how learning can end up skewed due to stark class imbalances. And I assure you that class imbalance is a very, very common problem which you almost certainly will encounter when dealing with real-world data that you will have to process and curate. And in fact, one setting in which class imbalance is particularly relevant is in medicine and healthcare. And this is because the incidence of many diseases, such as cancer, is actually relatively rare when you look at the general population. So to understand why this could be problematic and why this is not an ideal setting for training and learning, let's imagine that we want to try to build a deep learning model to detect the presence of cancer from medical images like MRI scans. And let's suppose we're working with a brain tumor called glioblastoma, which is the most aggressive and deadliest brain tumor that exists, but is also very rare, occurring at an incidence of approximately three out of every 100,000 individuals. Our task is going to be to try to train a CNN to detect glioblastoma from MRI scans of the brain. And let's suppose that the class incidence in our data set reflected the real-world incidence of this disease, meaning that for a data set of 100,000 brain scans, only three of them actually had brain tumors. What would be the consequences on the model if it was trained in this way? Remember that a classification model is ultimately being trained to optimize its classification accuracy. So what this model could basically fall back towards is just predicting healthy all the time, because if it did so it would actually reach 99.997% accuracy, even if it predicted healthy for instances when it saw a brain tumor, because that was the rate at which healthy occurred in its data set. Obviously this is extremely problematic, because the whole point of building up this pipeline was to detect tumors when they arise. All right, so how can we mitigate this? To understand this, we're going to discuss two very common approaches that are often used to try to achieve class balance during learning. Let's again consider our simple classification problem, where we randomly initialize our classifier dividing our data space. The first approach to mitigate this class imbalance is to select and feed in batches that are class balanced. What that means is that we're going to use data in batches that exhibit a one-to-one class ratio. Now, during learning, our classifier is going to see equal representation with respect to these classes and move the decision boundary accordingly, and once again the next batch that comes in is again going to be class balanced and the decision boundary will once again be updated. Our end result is going to be quite a reasonable decision boundary that divides this space roughly equally, due to the fact that the data the model has seen is much more informative than what would have been seen with starkly imbalanced data. And in practice, this balanced batch selection is an extremely important technique to try to alleviate this issue. Another approach is to weight the likelihood of individual data points being selected for training according to the inverse of the frequency at which they occur in the data set.
So classes that are more frequent will have lower weights, classes that are less frequent will have higher weights, and the end result is that we're going to produce a class-balanced data set where different classes will ultimately contribute equally to the model's learning process. Another way we can visualize this reweighting idea is by using the size of these data points to reflect their probability of selection during training. And what example weighting means is that we can increase the probability that rare classes will be selected during training and decrease the probability that common classes will be selected.
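To make these two ideas a bit more concrete, here is a minimal sketch in Python/NumPy of inverse-frequency example weighting and class-balanced batch selection; the label array, class ratio, and batch size are hypothetical.

```python
import numpy as np

# Hypothetical integer class labels for an imbalanced training set (95% class 0, 5% class 1).
labels = np.random.choice([0, 1], size=10000, p=[0.95, 0.05])

# Example re-weighting: each class gets a weight inversely proportional to its frequency,
# so examples from rare classes are more likely to be selected during training.
classes, counts = np.unique(labels, return_counts=True)
class_weights = {c: 1.0 / n for c, n in zip(classes, counts)}
sample_p = np.array([class_weights[y] for y in labels])
sample_p /= sample_p.sum()
reweighted_batch = np.random.choice(len(labels), size=32, p=sample_p)

# Balanced batch selection: explicitly draw the same number of examples from each class.
per_class = 16
balanced_batch = np.concatenate([
    np.random.choice(np.where(labels == c)[0], size=per_class) for c in classes
])
```

Either way, the batches the model sees exhibit roughly a one-to-one class ratio, which is what keeps the decision boundary from drifting toward the majority class.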


Bias within the features (24:02)

So far, we have focused on the issue of class imbalance and discussed these two approaches to mitigate it. What if our classes are balanced? Could there still be biases and imbalances within each class? Absolutely. To get at this, let's consider the problem where we're trying to train a facial detection system. And let's say we have an equal number of images of faces and non-faces that we can use to train the model. Still, there could be hidden biases lurking within each class, which in fact may be even harder to identify and even more dangerous. And this could reflect a lack of diversity in the within-class feature space, that is to say, the underlying latent space of this data. So continuing on with the facial detection example, one example of such a feature may be the hair color of the individuals whose images are in our face class data. And it turns out that in the real world, the ground truth distribution of hair color is about 75% to 80% of the world's population having black hair, 18% to 20% having brown hair, 2% to 5% having blonde hair, and approximately 2% having red hair. However, some gold standard data sets that are commonly used for image classification and face detection do not reflect this distribution at all, in that they over-represent brown and blonde hair and under-represent black hair. And of course, in contrast to this, a perfectly balanced dataset would have equal representation for these four hair colors. I'll say here that this is a deliberate oversimplification of the problem. In truth, all features, including hair color, will exist on a spectrum, a smooth manifold in data space. And so ideally, what we'd ultimately like is a way that we can capture more subtlety about how these features are distributed across the data manifold and use that knowledge to actively debias our deep learning model. But for the purpose of this example, let's continue with the simplified view and suppose we take this gold standard data set and use it to train a CNN for facial detection. What could end up occurring at test time is that our model ends up being biased with respect to its performance across these different hair color demographics. And indeed, as I introduced at the beginning of this lecture, these exact same types of biases manifest quite strongly in large-scale, commercial-grade facial detection and classification systems. Together, I think these results and considerations raise the critical questions of how we can actually identify potential biases, which may be hidden and not as overtly obvious as skin tone or hair color, and how we can actually integrate this information into the learning pipeline. And going a step beyond this, how can learning pipelines and techniques actually use this information to mitigate these biases once they are identified?
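As a small illustration, one simple audit is to compare the observed within-class feature proportions against a reference distribution like the one quoted above; in this toy Python sketch the annotation array is hypothetical and the reference values are rough midpoints of the quoted ranges.

```python
import numpy as np

# Hypothetical hair-color annotations for the images in our "face" class.
dataset_hair_color = np.array(["brown", "blonde", "brown", "black", "blonde", "black"])

# Approximate real-world proportions (midpoints of the ranges quoted above).
real_world = {"black": 0.775, "brown": 0.19, "blonde": 0.035, "red": 0.02}

for color, reference in real_world.items():
    observed = np.mean(dataset_hair_color == color)
    print(f"{color:>6}: dataset {observed:.2f} vs. real world {reference:.2f}")
```

Of course, an audit like this only works when the feature happens to be annotated; the approaches discussed next tackle the harder case where it is not.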


Mitigate biases in the model/dataset (27:09)

And these two questions introduce an emerging area of research within deep learning, and that's this idea of using machine learning techniques to actually improve the fairness of these systems. And I think this can be done in two principal ways. The first is this idea of bias mitigation: in this case, we're given some biased model, data set, or learning pipeline, and we want to apply a machine learning technique that is designed to remove aspects of the signal that are contributing to unwanted bias. And the outcome is that this bias is effectively mitigated, reduced along the particular axis from which we remove the signal, resulting in a model with improved fairness. We can also consider techniques that, rather than trying to remove signal, try to add back signal for greater inclusion of underrepresented regions of the data space or of particular demographics, to ultimately increase the degree to which the model sees particular slices of the data. And in general, this idea of using learning to improve fairness and equitability is an area of research which I hope will continue to grow and advance in the coming years as these problems gain more traction. All right, so to discuss and understand how learning techniques can actually mitigate bias and improve fairness, we first need to set up a few definitions and metrics for how we can formally evaluate the bias or fairness of a machine or deep learning model. For the sake of these examples, we'll consider the setting of supervised learning, specifically classification. A classifier should ultimately produce the same output decision across some series of sensitive characteristics or features, given what it should be predicting. Therefore, moving from this, we can define that a classifier is biased if its decision changes after it is exposed to particular sensitive characteristics or feature inputs, which means it is fair with respect to a particular variable z if the classifier's output is the same whether we condition on that variable or not. So for example, if we have a single binary variable z, the likelihood of the prediction being correct should be the same whether z equals 0 or z equals 1. So this gives a framework for which we can think about how to evaluate the bias of a supervised classifier. We can take this a step further to actually define performance metrics and evaluation analyses to determine these degrees of bias and fairness. One thing that's commonly done is to measure performance across different subgroups or demographics that we are interested in. This is called disaggregated evaluation. So, say we're working with colored shapes: this could be with respect to the color feature, keeping shape constant, or the shape feature, keeping color constant. We can also look at the performance at the intersections of different subgroups or demographics, which in our shape and color example would mean simultaneously considering both color and shape and comparing performance on blue circles against performance on orange squares, and so on and so forth.
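To make this disaggregated evaluation concrete, here is a minimal sketch in Python, assuming we have arrays of ground-truth labels, model predictions, and per-example subgroup annotations; the function and variable names (disaggregated_accuracy, color_labels, shape_labels) are hypothetical.

```python
import numpy as np

def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy broken out by subgroup annotation (one annotation per example)."""
    return {g: np.mean(y_true[groups == g] == y_pred[groups == g])
            for g in np.unique(groups)}

# Hypothetical usage: evaluate per color, and at the intersection of color and shape.
# per_color = disaggregated_accuracy(y_true, y_pred, color_labels)
# per_intersection = disaggregated_accuracy(
#     y_true, y_pred,
#     np.array([f"{c}-{s}" for c, s in zip(color_labels, shape_labels)]))
# A simple bias measure is the gap between the best and worst performing subgroup:
# gap = max(per_color.values()) - min(per_color.values())
```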
So together, now that we've defined what a fair classifier would look like and also some ways we can actually evaluate the bias of a classification system, we have the framework in place to discuss some recent works that actually used deep learning approaches to mitigate bias in the context of supervised classification. The first approach uses a multi-task learning setup and adversarial training. In this framework, the way it works is that we, the human users, need to start by specifying an attribute z that we seek to debias against. And the learning problem is such that we're going to train a model to jointly predict an output Y as well as the value of this attribute Z. So a particular input X is going to be passed into the network via embedding and hidden layers, and at the output the network will have two heads, each corresponding to one of the prediction tasks: the first being the prediction of the target label Y, and the second being the prediction of the value of the sensitive attribute that we're trying to debias against. And our goal is to try to remove any confounding effect of the sensitive attribute on the outcome of the task prediction decision. This effect removal is done by imposing an adversarial objective into training, specifically by negating the gradient from the attribute prediction head during backpropagation. And the effect of this is to remove any confounding effect that the attribute prediction has on the task prediction. When this model was proposed, it was first applied to a language modeling problem where the sensitive attribute that was specified was gender, and the task of interest was the problem of analogy completion, where the goal is to predict the word that is likely to complete an analogy. For example, he is to she as doctor is to blank. And when a biased model was tested on this particular analogy, the top predictions it returned were things like nurse, nanny, fiance, which clearly suggested a potential gender bias. However, a debiased model employing this multi-task approach, with specification of gender as the attribute, was more likely to return words like pediatrician or physician, examples or synonyms of doctor, which suggested some degree of mitigation of the gender bias.
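As a rough illustration of this idea, here is a minimal sketch in TensorFlow of a two-headed model with a gradient reversal operation on the attribute head; the layer sizes, dimensions, and loss choices are illustrative assumptions, not the exact architecture used in the work described here.

```python
import tensorflow as tf

# Gradient reversal: identity on the forward pass, negated gradient on the backward pass.
@tf.custom_gradient
def grad_reverse(x):
    def grad(dy):
        return -dy
    return tf.identity(x), grad

class GradReverse(tf.keras.layers.Layer):
    def call(self, x):
        return grad_reverse(x)

# Hypothetical dimensions for illustration only.
input_dim, num_classes = 64, 10

inputs = tf.keras.Input(shape=(input_dim,))
shared = tf.keras.layers.Dense(128, activation="relu")(inputs)  # shared embedding
# Task head: predict the target label Y.
y_head = tf.keras.layers.Dense(num_classes, activation="softmax", name="y")(shared)
# Attribute head: predict a (binary) sensitive attribute Z through the reversed gradient.
z_head = tf.keras.layers.Dense(1, activation="sigmoid", name="z")(GradReverse()(shared))

model = tf.keras.Model(inputs, [y_head, z_head])
model.compile(
    optimizer="adam",
    loss={"y": "sparse_categorical_crossentropy", "z": "binary_crossentropy"},
)
# The z head learns to predict the sensitive attribute, but the reversed gradient pushes
# the shared embedding to discard information about z, reducing its confounding effect on y.
```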


Automated debiasing from learned latent structure (33:20)

However, one of the primary limitations of this approach is the requirement for us, the human user, to specify the attribute to debias against. And this can be limiting in two ways. First, because there could be hidden and unknown biases that are not necessarily apparent from the outset, and ultimately we want to debias against these as well. Furthermore, by specifying what the sensitive attribute is, we humans could be inadvertently propagating our own biases by way of telling the model what we think it is biased against. And so ultimately, what we want and what we desire is an automated system that could try to identify and uncover potential biases in the data without any annotation or specification. And indeed, this is a perfect use case for generative models, specifically those that can learn and uncover the underlying latent variables in a dataset. In the example of facial detection, if we're given a data set with many, many different faces, we may not know what the exact distribution of particular latent variables in this data set is going to be. And there could be imbalances with respect to these different variables, for example face pose or skin tone, that could end up resulting in unwanted biases in our downstream model. And as you may have seen in working through Lab 2, using generative models we can actually learn these latent variables and use this information to automatically uncover underrepresented and overrepresented features and regions of the latent landscape, and use this information to mitigate some of these biases. We can achieve this by using a variational autoencoder structure. And in recent work, we showed that based on this VAE network architecture, we can learn the underlying latent structure of a dataset in a completely unbiased, unsupervised manner; for example, in the case of face images, picking up particular latent variables such as orientation, which were once again never specified to the model. It picked up and learned this as a particular latent variable by looking at a lot of different examples of faces and recognizing that this was an important factor. From this learned latent structure, we can then estimate the distributions of each of these learned latent variables, which means the distribution of values that these latent variables can take. And certain instances are going to be overrepresented. So for example, if our data set has many images of faces of a certain skin tone, those are going to be overrepresented, and thus the likelihood of selecting a particular image that has this particular skin tone during training will be unfairly high, which could result in unwanted biases in favor of these types of faces. Conversely, faces with rare features, like shadows, darker skin, glasses, or hats, may be underrepresented in the data, and thus the likelihood of selecting instances with these features to actually train the model will be low, resulting in unwanted bias. From this uncovering of the distribution of the latent structure, we showed that this model could actually adaptively adjust the sampling probabilities of individual data instances to re-weight them during the training process itself, such that these latent distributions and this resampling approach could be used to adaptively generate a fairer and more representative dataset for training.


Adaptive latent space debiasing (37:11)

To dig more into the math behind how this resampling operation works, the key point to this approach is that the latent space distribution is approximated via this joint histogram over the individual latent variables. Specifically, we estimate individual histograms for each of the individual latent variables and for the purpose of this approximation assume that these latent variables are independent, such that we can take their product to arrive at a joint estimate, an estimate of the joint distribution across the whole latent space. And based on this estimated joint distribution, we can then define the adjusted probability for sampling a particular data point x during training, based on the latent space for that input instance x. Specifically, we define the probability of selecting that data point according to the inverse of the approximated joint distribution across latent space, which is once again defined by each of these individual histograms and furthermore weighted by a debiasing parameter alpha, which tunes the degree of debiasing that is desired. Using this approach and applying it to facial detection, we showed that we could actually increase the probability of resampling for faces that had underrepresented features. And this qualitatively manifested when we inspected the top faces with the lowest and highest resampling probabilities respectively. We then could deploy this approach to actually select batches during training itself, such that batches sampled with this learned debiasing algorithm would be more diverse with respect to features such as skin tone, pose, and illumination. And the power of this approach is that it conducts this resampling operation based on learned features that are automatically learned. There's no need for human annotation of what the attributes or biases should be, and thus it's more generalizable and also allows for de-biasing against multiple factors simultaneously. To evaluate how well this algorithm actually mitigated bias, we tested on a recent benchmark dataset for evaluation of facial detection systems that is balanced with respect to the male and female sexes as well as skin tone.
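As a sketch of how these sampling probabilities might be computed, here is a short NumPy example. It assumes we already have the encoder's latent means for every training image, approximates each latent dimension's distribution with a histogram, and places the debiasing parameter alpha inside each per-dimension term, which is one common way to realize the "inverse of the joint distribution, weighted by alpha" idea described above; the function name and default values are hypothetical.

```python
import numpy as np

def latent_sampling_probabilities(latent_means, num_bins=10, alpha=0.001):
    """Debiasing resampling weights from per-dimension latent histograms.

    latent_means: array of shape (num_examples, latent_dim), e.g. the encoder
    means for every face image in the training set.
    """
    num_examples, latent_dim = latent_means.shape
    weights = np.ones(num_examples)
    for i in range(latent_dim):
        z_i = latent_means[:, i]
        # Histogram approximates the distribution of this latent variable.
        density, bin_edges = np.histogram(z_i, bins=num_bins, density=True)
        bin_idx = np.digitize(z_i, bin_edges[1:-1])  # bin index for each example
        # Inverse-density weighting: rare latent values get larger weights.
        # Larger alpha pushes the weights back towards uniform (less debiasing).
        weights *= 1.0 / (density[bin_idx] + alpha)
    # Normalize into a valid sampling distribution over the training set.
    return weights / weights.sum()

# Hypothetical usage: resample a training batch with these probabilities.
# probs = latent_sampling_probabilities(encoder_means)
# batch_idx = np.random.choice(len(probs), size=32, p=probs)
```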


Evaluation towards decreased racial and gender bias (39:39)

And to determine the degree of bias present, we evaluated performance across subgroups in this dataset, grouped on the basis of the male-female annotation and the skin tone annotation. When we considered the performance first of the model without any debiasing, the supposedly biased model, we observed that it exhibited the lowest accuracy on dark males and the highest accuracy on light males, with around a 12% difference between the two. We then compared this accuracy to that of the debiased models and found that with increasing debiasing, the accuracy actually increased overall, and in particular on subgroups such as dark males and dark females. And critically, the difference in accuracy between dark male and light male faces decreased substantially with the debiased model, suggesting that this approach could actually significantly decrease categorical bias. To summarize, in today's lecture we've explored how different biases can arise in deep learning systems and how they manifest, and we also went beyond this to discuss some emerging strategies that actually use deep learning algorithms to mitigate some of these biases.


Conclusion and Future Considerations in AI Fairness

Summary and future considerations for AI fairness (41:00)

Finally, I'd like to close by offering some perspectives on what I think are key considerations for moving towards improved fairness of AI. The first is what I like to call best practices, things that should become standard in the science and practice of AI: things like providing documentation and reporting upon the publication of datasets, as well as of models, that summarize things like training information, evaluation metrics, and model design. The goal of this is to improve the reproducibility and transparency of these data sets and models as they're used and deployed. The second class of steps that I think need to be taken is new algorithmic solutions to actually detect and mitigate biases during all aspects of the learning pipeline. Today we considered two such approaches, but there's still so much work to be done in this field to really build up robust and scalable methods that can be seamlessly integrated into existing and new AI pipelines to achieve improved fairness. The third criterion, I think, will be improvements in terms of data set generation, in terms of sourcing and representation, as well as with respect to distribution shift, and also formalized evaluations that can become standard practice for evaluating the fairness and potential bias of new models that are produced. Above all, I think what is going to be really critical is a sustained dialogue and collaboration between AI researchers and practitioners, as well as end users, politicians, corporations, and ethicists, so that there is increased awareness and understanding of the potential societal consequences of algorithmic bias, and furthermore, discussion about new solutions that can mitigate these biases and promote inclusivity and fairness. With that, I'll conclude. I'd like to remind those of you taking the course that the entries for the lab competitions are due today at midnight Eastern Time. Please submit them on Canvas, and as always, if you have any questions on this lecture, the labs, or any other aspects of the course, please come to the course Gather Town. Thank you for your attention.

