Blinding identities during evaluations is often proposed as a way to combat discrimination and alleviate disparities, but previous studies have found mixed results. I implement blind review at an academic conference and directly compare the scores that the same submitted paper received from blind and non-blind reviewers. With this design, I find that the effects of blinding differed according to how an applicant would have performed "under the status quo" without blinding. Blinding did not significantly affect the gender score gap among those who would have performed best or worst under the status quo, but it significantly exacerbated the gap among applicants who would have scored near the median without blinding. Consequently, the effect of blinding on acceptance rate gaps varied with the overall acceptance rate. This can help reconcile why blinding has "worked" in some contexts but not in others. Ultimately, it is necessary to examine distributional effects to understand in exactly which situations blinding produces the desired outcomes. Even a small average treatment effect on gaps may mask important heterogeneity for individuals at different margins.