Everything that makes up the web—text, images, video and audio—can be easily discovered. Many people who are blind or have low vision rely on screen readers to make the content of web pages accessible through spoken feedback or braille.
For images and graphics, screen readers rely on descriptions created by developers and web authors, which are usually referred to as “alt text” or “alt attributes” in the code. However, there are millions of online images without any description, leading screen readers to say “image,” “unlabeled graphic,” or a lengthy, unhelpful reading of the image’s file name. When a page contains images without descriptions, people who are blind may not get all of the information conveyed, or even worse, the site may be totally unusable for them. To improve that experience, we’ve built an automatic image description feature called Get Image Descriptions from Google. When a screen reader encounters an image or graphic without a description, Chrome will create one.
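For web authors who want to find the images on their own pages that have no description, here is a minimal sketch using only Python's standard library. The class and function names (`MissingAltFinder`, `find_images_missing_alt`) are made up for this illustration and have nothing to do with how Chrome detects unlabeled images. The sketch assumes that an `alt` attribute that is present but empty is intentional (the usual way to mark a decorative image), so it only reports images where the attribute is absent.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # alt="" is treated as a deliberate "decorative image" marker,
        # so only a completely absent alt attribute is reported.
        if "alt" not in attr_map:
            self.missing.append(attr_map.get("src") or "")


def find_images_missing_alt(html):
    """Return the src values of images that lack an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


page = (
    '<p><img src="menu.png">'
    '<img src="logo.png" alt="Company logo">'
    '<img src="spacer.gif" alt=""></p>'
)
print(find_images_missing_alt(page))
```

On this sample page only `menu.png` is reported, since the other two images carry an `alt` attribute.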
Image descriptions automatically generated by a computer aren’t as good as those written by a human who can include additional context, but they can be accurate and helpful. An image description might help a blind person read a restaurant menu, or better understand what their friends are posting on social media.
If someone using a screen reader chooses to opt in through Settings, an unlabeled image in Chrome is sent securely to a Google server running machine learning software. The technology aggregates data from multiple machine-learning models. Some models look for text in the image, including signs, labels, and handwritten words. Other models look for objects they’ve been trained to recognize, such as a pencil, a tree, a person wearing a business suit, or a helicopter. The most sophisticated model can describe the main idea of an image using a complete sentence.
The description is evaluated for accuracy and valuable information: Does the annotation describe the image well? Is the description useful? Based on whether the annotation meets those criteria, the machine learning model determines what should be shown to the person, if anything. We’ll only provide a description if we have reasonable confidence it’s correct. If any of our models indicate the results may be inaccurate or misleading, we err on the side of giving a simpler answer, or nothing at all.
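To make the idea of "a simpler answer, or nothing at all" concrete, here is a rough sketch of confidence-gated selection. It is not Chrome's actual logic. The `Candidate` type, the `MIN_CONFIDENCE` threshold of 0.7, and the preference order (full sentence, then recognized objects, then detected text) are all assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    kind: str          # hypothetical labels: "sentence", "objects", or "text"
    description: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


# Assumed threshold; the real system's criteria are not public in this post.
MIN_CONFIDENCE = 0.7

# Assumed ordering: try the richest description first, then simpler ones.
PREFERENCE = ["sentence", "objects", "text"]


def choose_description(
    candidates: List[Candidate], min_confidence: float = MIN_CONFIDENCE
) -> Optional[str]:
    """Return the richest description that clears the threshold, else None."""
    for kind in PREFERENCE:
        best = max(
            (c for c in candidates if c.kind == kind),
            key=lambda c: c.confidence,
            default=None,
        )
        if best is not None and best.confidence >= min_confidence:
            return best.description
    # Nothing was confident enough, so say nothing rather than mislead.
    return None


candidates = [
    Candidate("sentence", "a helicopter landing on a beach", 0.4),
    Candidate("objects", "helicopter", 0.9),
]
print(choose_description(candidates))
```

In this example the sentence-level candidate falls below the assumed threshold, so the sketch falls back to the simpler object label. If every candidate scored below the threshold, it would return `None` and no description would be given.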
Here are a couple of examples of the actual descriptions generated by Chrome when used with a screen reader.