ML Kit x Material Design: Design Patterns for Mobile Machine Learning (Google I/O’19)


[MUSIC PLAYING] RACHEL BEEN: Hi, audience. Hello, live stream. I’m Rachel, Creative
Director of Material Design. This is Kunal and Phil,
Senior Designers on the team. Let’s start out with
why we're doing this. Why are we talking about this? Why is a design systems team concerned with machine learning? OK, let's do a quick primer for
those of you maybe unfamiliar with machine learning. I love using cats to talk
about ML, so humor me. So what is machine learning? It basically gives
computers the ability to make predictions
and solve problems without specific instructions. So by identifying
really complex patterns, machine learning can
be used for a variety of product experiences. Identifying music,
responses in a chat app, identifying imagery. And products, as many
of you saw yesterday, like ML Kit and Firebase,
really make this technology available to developers
of any skill set. But the issue is this
technology is not effective if users can’t
understand its benefits. So this is an example of
one of the original demos out of ML Kit, looking at some object detection patterns. Sorry, ML Kit, if anyone from ML Kit is in the audience. Now, it's picking up
the dog to some degree, but how is this experience
super beneficial to users seeing this on the front end? So the goal of
material design is to build the design system that
harnesses technology like this into a beautiful and
usable interface. And this well-designed interface
is what’s useful to people. But a well-designed
interface is also allows you to implement the
technology faster and provide assurances that your
audience and your users will understand the interface. And it will work for your users. One of the most important
things about a design system machine learning, are really
the design solutions when things do not go as planned. Because machine learning can
be fluid, and unpredictable, and augmented by users, these
situational design solutions must be considered for a
functional product experience. So there's no single design pattern that, as many of you probably know, encompasses the entire myriad of things that machine learning can do. This is an example of the
plethora of features that machine learning can power. We selected some of the most
prevalent machine learning powered patterns. Visual search patterns. What does that mean? Really it means
using your camera to search the world, instead
of more traditional text input options. Kunal and Phil are going to talk
about these patterns upcoming. But before we start, I want to
mention a few tactical pieces of information to make this
guidance more understandable. What are we providing? We’re providing three visual
search pattern articles on material.io which
is the guideline site for material design. There’s also a great demo
app provided by the ML Kit team available on GitHub that
is specific for material design and specific for these
patterns that you can download. There are also really great demos from Adidas, Ikea, and Flutter that we'll talk
a little bit about later. And with that, I’m going
to hand it off to Kunal to start talking
about the patterns. KUNAL PATEL: All right. Thanks, Rachel. So as Rachel
mentioned, we’re going to focus on two ML Kit
APIs today and walk through three patterns. As we were working through
the object detection and tracking API,
we realized there were significant differences in user experience for how a user would go through this flow with the streaming mode of the API, which uses a live camera feed, versus the static mode, which uses a provided image. So we've split that one API
into two separate design guidelines based on the
mode that you’re using. And as we go
through these flows, for those of you who are
familiar with material design, you’ll see a number of
familiar components. A top app bar,
buttons, dialogs, and also some new elements that we've added, such as the reticle and object markers, to extend the experience to work for these visual search patterns. In addition, building on our announcement last year of material theming, we wanted to make sure that
just like the rest of material design, these new visual search
experiences and the elements that we were adding can be
customized to match the look and feel of your application. What you’re seeing
here is a sticker sheet of visual search elements for
our material studies shrine. And we’ll be using this as an
example in all of our flows. So now I’m going to talk
about the first pattern we’re going to walk through,
which is object detection in a live camera. You may be asking, what do
I mean by object detection? What does live camera mean? And why should I care about
either of those terms? So object detection is really the first step of a visual search journey, as Rachel was talking about. The advantage of searching by an image, or visually, is that if users don't know what to type or say, they can use an image instead to do that work for them. In addition, with the streaming mode of ML Kit's object detection and tracking API, they can do this without having to manually take a photo themselves. Just point at an object
and learn about it. Our guidelines for
this flow are going to cover tracking a single
prominent object in the user’s camera. So how does this
work technically? Users can detect an object on their device using ML Kit's object detection and tracking API. The API is going to crop the camera frame to just the image area of the detected object and send that along to your own machine learning model for classification and more information. Once you have results available, those are sent back to the user.
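To make that pipeline concrete, here is a minimal sketch, assuming the standalone ML Kit object detection client for Android (class and package names differ slightly from the Firebase-era SDK shown at I/O '19); `classifyWithOwnModel` is a hypothetical hook for your own model, and the frame is assumed to already be upright.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

// Streaming mode; the default detector tracks only the single most prominent
// object in the frame, which matches this pattern.
val options = ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
    .build()
val detector = ObjectDetection.getClient(options)

fun onCameraFrame(frame: Bitmap) {
    val image = InputImage.fromBitmap(frame, 0)  // frame assumed already rotated upright
    detector.process(image)
        .addOnSuccessListener { objects ->
            val detected = objects.firstOrNull() ?: return@addOnSuccessListener
            // Crop the frame to the reported bounding box and hand the crop
            // to your own classification model for results.
            val box = detected.boundingBox
            val crop = Bitmap.createBitmap(frame, box.left, box.top, box.width(), box.height())
            classifyWithOwnModel(crop)
        }
}

fun classifyWithOwnModel(crop: Bitmap) { /* hypothetical: query your own model or backend */ }
```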
When we were thinking about designing these experiences, we separated the user's experience into three phases: sensing, recognizing, and communicating. Let's take a closer
look at each one. So in the sensing
phase, this is when users have opened the visual
search feature in your app, and the app has begun
looking for an object. This could be to learn
more about a plant in a garden, an item
at a museum, or a shoe that they want to purchase. For first time
users, we want to be sure to explain how
this feature works. Many users are familiar with
using their smartphone camera to take photos. But not as kind of
a remote control with which to learn about
objects around them. So we recommend having
a light and fast one-screen on-boarding
experience that focuses on what the user
can do, what kinds of things they can detect,
and using animation to show that moving their device
to actually identify an object may be required. After that on-boarding process,
when they’re in the experience, we want to communicate
that the app is looking for objects in the camera. As I mentioned, our guidelines
for object detection in a live camera cover using
the prominent setting of ML Kit's API, which is going to look
for the largest object in the center of the camera. So we want to draw users’
attention to that area and let them know this is
where they have to search. We do that using this
new visual element that we call the reticle. It’s animated to draw attention
to the center of the screen and to let the user
know that it’s actively looking for items. In addition, we reinforce
what the reticle is doing with a tooltip at
the bottom of the screen that prompts the user to point
their camera at an object. A user may have trouble detecting objects based on conditions in the environment around them. Maybe there aren't objects of the types recognized by your app in their environment. Maybe it's too bright or dark, or objects are too close together for the app to get an accurate reading of objects in the environment. What we recommend doing is setting a detection timeout here, so that if you notice that users haven't detected something after a certain amount of time, you can bring up an error message through a banner and direct them to help to troubleshoot the issues that they may be having.
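One hypothetical way to wire up that timeout on Android is sketched below; the eight-second value, the callback names, and `showHelpBanner` are placeholders for your own app.

```kotlin
import android.os.Handler
import android.os.Looper

const val DETECTION_TIMEOUT_MS = 8_000L  // placeholder value; tune per app

private val timeoutHandler = Handler(Looper.getMainLooper())
private val timeoutRunnable = Runnable { showHelpBanner() }

fun onCameraSearchStarted() {
    // Start (or restart) the countdown when the app begins looking for objects.
    timeoutHandler.removeCallbacks(timeoutRunnable)
    timeoutHandler.postDelayed(timeoutRunnable, DETECTION_TIMEOUT_MS)
}

fun onObjectDetected() {
    // Cancel the countdown as soon as something is detected.
    timeoutHandler.removeCallbacks(timeoutRunnable)
}

fun showHelpBanner() {
    // Show a banner explaining likely causes (lighting, distance, unsupported
    // objects) and link to a help screen for troubleshooting.
}
```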
In the second phase of the experience, when the user has found an object that they want to search for, the app needs to detect that object and begin the visual search for them. When we do recognize an object in the camera, we want to let the user know that we've identified it by adding a border at the bounding box coordinates that we're getting back from the ML Kit API. You can notice that the
frame here is rectangular. This matches the
coordinates that you’re going to get back from
the API, and also lets the user know that
this is going to crop an area maybe a
little bit larger than the object
that’s being shown. However, when we
detect an object, we don’t want to necessarily
begin a search immediately. Users may be moving their device
around looking for objects, and we don’t want
to start a search for every single thing we see. That’s not going to align
with their expectations, and it's going to be very
expensive for your app to do. Instead, we need
some light signal to confirm that
users are actually interested in this object. That they want to search for it. And the way we do
that is by asking them to keep their camera
still for a moment. Sort of hover over that object
for a brief moment to let us know that they’re interested
in searching for it. That amount of time that we wait before beginning the search can be preset by your own application, so it can be customized. We want to communicate that to the user by including a loader within that center element, the reticle, so they know how much longer they have to keep their device still.
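This hold-to-confirm behavior can be sketched using the tracking IDs that ML Kit's stream mode returns for each detected object; `CONFIRM_HOLD_MS`, `updateReticleProgress`, and `startVisualSearch` are hypothetical names for your own app's hooks.

```kotlin
import com.google.mlkit.vision.objects.DetectedObject

const val CONFIRM_HOLD_MS = 1_500L  // preset per app; tune to taste

private var candidateId: Int? = null
private var candidateSince = 0L

fun onObjectTracked(obj: DetectedObject, now: Long = System.currentTimeMillis()) {
    val id = obj.trackingId ?: return
    if (id != candidateId) {
        // A different object entered the reticle: restart the hold timer.
        candidateId = id
        candidateSince = now
        return
    }
    val heldFor = now - candidateSince
    updateReticleProgress(heldFor.toFloat() / CONFIRM_HOLD_MS)  // drive the reticle's loader
    if (heldFor >= CONFIRM_HOLD_MS) {
        candidateId = null       // avoid re-triggering on later frames
        startVisualSearch(obj)   // pause the camera, crop, query your model
    }
}

fun updateReticleProgress(fraction: Float) { /* animate the reticle's loader ring */ }
fun startVisualSearch(obj: DetectedObject) { /* hypothetical: begin the search */ }
```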
Once they do indicate that they want us to search for this object, the first thing we're going to do is pause that live camera feed. This is a strong
signal to the user that they can now
move their device to a more comfortable
position without losing track of the object. And the second thing we’re going
to do is remove the reticle and replace it with an
indeterminate progress indicator to let them know that
we’ve started running a search and are looking
results for this item. At this stage, one
thing that can go wrong, or that you want to
keep an eye out for, is how far the user
is from the object. So while an object may be
detected from a distance, you may want to set a minimum
size for detected objects in order to begin your search. And this is because
the image that’s going to be used by your
own model to find results is based on what’s
in the camera view. So if the object is very
small in the camera, it may be missing
details that are going to be helpful for
identifying that item. So we included a partial border style for you to use for an object that's detected but is maybe too far away, and tips on how to message and change the reticle style to let a user know that they need to move closer before we can actually search for this object.
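A minimal sketch of that size check follows; the fraction threshold and the reticle-styling functions are hypothetical and would be tuned for your own model and UI.

```kotlin
import android.graphics.Rect

const val MIN_BOX_FRACTION = 0.25f  // hypothetical threshold; tune for your model

fun isCloseEnough(box: Rect, frameWidth: Int, frameHeight: Int): Boolean {
    val boxArea = box.width().toFloat() * box.height()
    val frameArea = frameWidth.toFloat() * frameHeight
    return boxArea / frameArea >= MIN_BOX_FRACTION
}

fun styleReticleFor(box: Rect, frameWidth: Int, frameHeight: Int) {
    if (isCloseEnough(box, frameWidth, frameHeight)) {
        showConfirmingReticle()   // full border plus the hold-to-search loader
    } else {
        showMoveCloserReticle()   // partial border plus a "move closer" tooltip
    }
}

fun showConfirmingReticle() { /* ... */ }
fun showMoveCloserReticle() { /* ... */ }
```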
In the last phase of this experience, the app has results for that object that it's ready to send back to the user. And the user needs to be able to focus on them to complete their task. The first thing we
want to do is use a thumbnail of that image
of the detected object and present it above
the results space. This serves two
important functions. One, it’s a bridge between
the recognizing phase and the communicating phase. It confirms what the user
was looking to search for and provides it
to them on results for easy comparison
with any information that you’re returning them. The second thing is that
we’re using a modal bottom sheet to present the results. And this has a
couple advantages. One, modal bottom sheets
come with this layer that separates the sheet from
the rest of the app UI called the scrim. This scrim darkens the
camera view behind and brings more emphasis to the results. And also provides a way for
users to return to the camera by tapping on it. Users can also return to the camera to conduct another search by tapping the header of the results sheet.
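A minimal sketch of presenting results this way with the Material Components library's `BottomSheetDialogFragment` is below; the `ResultsSheet` class name and the layout resource are hypothetical.

```kotlin
import android.os.Bundle
import android.view.LayoutInflater
import android.view.View
import android.view.ViewGroup
import com.google.android.material.bottomsheet.BottomSheetDialogFragment

class ResultsSheet : BottomSheetDialogFragment() {
    override fun onCreateView(
        inflater: LayoutInflater,
        container: ViewGroup?,
        savedInstanceState: Bundle?
    ): View? {
        // A sheet with the detected-object thumbnail at the top and a list
        // or grid of results below it (layout name is hypothetical).
        return inflater.inflate(R.layout.sheet_visual_search_results, container, false)
    }
}

// Showing it as a modal sheet brings the scrim over the paused camera view;
// tapping the scrim dismisses the sheet and returns the user to the camera.
// ResultsSheet().show(supportFragmentManager, "results")
```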
In terms of the layout for results, we really want this to be based on the needs of your app. What is the specific kind
of content that you’re presenting to users? How many results
are you returning? And what is your
confidence in the results? If you’re returning
multiple search results to a user based on
a visual search, we recommend using
a list or grid format to show those results. And if you just have
a single result, or a high-confidence result
that you want to promote, customize that
layout to your needs. If your results are mostly low-confidence, which is to say that your model doesn't believe they're very similar to the object that was detected and doesn't have a lot of faith in its prediction, let users know that they may want to search again. You want to make sure that you're also setting a confidence threshold for what results you're presenting to users. And if the results are kind of borderline, let users know that they can search again, and maybe provide some tips on what they could do to improve in case the results don't meet their expectations.
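One hypothetical way to apply those thresholds on your own model's output is sketched here; `SearchResult`, the threshold values, and the display functions are all placeholders.

```kotlin
data class SearchResult(val title: String, val confidence: Float)

const val PRESENT_THRESHOLD = 0.5f     // below this, don't show the result at all
const val BORDERLINE_THRESHOLD = 0.7f  // below this, suggest searching again

fun presentResults(raw: List<SearchResult>) {
    val results = raw.filter { it.confidence >= PRESENT_THRESHOLD }
    when {
        results.isEmpty() ->
            showNoResultsBanner()              // link out to help
        results.all { it.confidence < BORDERLINE_THRESHOLD } ->
            showResultsWithRetryHint(results)  // "not what you expected? search again"
        else ->
            showResults(results)               // list, grid, or single-result layout
    }
}

fun showNoResultsBanner() { /* ... */ }
fun showResultsWithRetryHint(results: List<SearchResult>) { /* ... */ }
fun showResults(results: List<SearchResult>) { /* ... */ }
```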
If no results are found, maybe because the user was too far away from the object, didn't capture it from the right angle that was expected, or the environment affected how bright or dark the object appeared, you want to make sure
you have a way to message these error cases
and direct the user to Help for more information. A lot of these
issues are not going to be things that you
can detect or help a user with in their
app and are things they are going to need to
change about their environment. Maybe it’s turning
on their camera flash to increase the brightness
of the object that’s being detected. Or change their position
to try photographing the item from another angle. So I mentioned
that we were going to look at how these
experiences could be customized. So let’s take a look at how
this live object detection flow can be customized
to match your app. So we’re going to use
Shrine, which is our material study for a retail app. It has a very kind of minimal and clean aesthetic. It uses these angled corners for key elements, based on the geometric logo that Shrine has. And it has this light pink brand color. So in thinking about how this gets applied to a visual search experience, we wanted this to feel seamless. We want these elements to blend in with the rest of your application and carry over key elements from color, typography, and shape. If we take a closer look at
some of the new elements we introduce here, like the
reticle, in our baseline flow it has this very rounded shape. But since Shrine has a more
geometric approach to the app and to key elements, we’ve
gone with a diamond shape instead to reflect
the brand and fit in with the rest of the
key elements of the app. Our tooltip, which in the baseline form uses Roboto and a black container, now uses Rubik, which is Shrine's font, and the light pink background color that's used in the rest of the app. So now that we've walked
through how detecting objects from a
live camera works, and how it can be
customized to your app, I’m going to turn
it over to Phil to talk about how this
works in a static image. [APPLAUSE] PHILLIPE CAO: Wow. If you like that, wait till
you see what’s coming up next. Thanks, Kunal. So object detection
and a static image allows users to select
an image on their device. And then detect up to five
objects located inside of that image. This feature is
really useful for when users would like to analyze
an image that they captured before. Or if they are not able
to detect something right there on the spot. So one of the ways that
this can be integrated is in a search flow. Like what we see
here in this example where a user is looking
for plants in a photo that they took. So let’s break down
the technical flow of this experience. This is pretty similar
to the live camera flow that Kunal shared earlier,
just that here, our source is different. We’re using a static image
instead of a live camera. So first, the objects are detected on device using ML Kit. Then the objects are classified with your own model. And finally, the results are presented back to the user.
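A minimal sketch of configuring ML Kit for this static-image case follows, again assuming the standalone Android SDK: single-image mode with multiple objects enabled; `classifyRegionWithOwnModel` and `showImageErrorBanner` are hypothetical hooks.

```kotlin
import android.content.Context
import android.graphics.Rect
import android.net.Uri
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

val staticOptions = ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
    .enableMultipleObjects()  // the API returns up to five objects per image
    .build()
val staticDetector = ObjectDetection.getClient(staticOptions)

fun detectInImage(context: Context, imageUri: Uri) {
    val image = InputImage.fromFilePath(context, imageUri)
    staticDetector.process(image)
        .addOnSuccessListener { objects ->
            // Each detected object carries a bounding box: classify each region
            // with your own model and drop an object marker at its center.
            objects.forEach { obj -> classifyRegionWithOwnModel(obj.boundingBox) }
        }
        .addOnFailureListener { showImageErrorBanner() }
}

fun classifyRegionWithOwnModel(box: Rect) { /* hypothetical hook for your model */ }
fun showImageErrorBanner() { /* e.g. unusable image or lost network connection */ }
```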
Now, as before, we split this flow into three distinct phases. And these should probably look
familiar to you at this point. So here we have
Input, where the user selects an image to search. Recognize, where we
wait for the objects to be detected and identified. And Communicate, where
we review the results and complete the task. So in the Input phase,
we’re introducing the flow to the user, and we ask them
to provide an image to search. When the user opens a
feature for the first time, it’s important to explain how it
works so that there’s a better chance for a successful search. And the best way to
do this is, again, with a simple
on-boarding screen. So as Kunal had
mentioned earlier, we’re limiting
this to one screen and providing a
short explanation for how this feature works. So it’s important
here that we don’t want to use this moment to go
through every possible error that a user can encounter. Instead, we’re focusing
on the user interactions. What does the user
need to do to get to the information they want? And once we get through
the on-boarding screen and into the image
selection screen, we recommend using the operating system's native selection screens so that users are already familiar with how to select an image.
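For example, on Android the AndroidX Activity Result API can launch the system's own picker with a couple of lines; this sketch hands the chosen image off to the hypothetical `detectInImage` function from the earlier sketch, and the activity name is made up.

```kotlin
import android.net.Uri
import androidx.activity.ComponentActivity
import androidx.activity.result.contract.ActivityResultContracts

class ImageSearchActivity : ComponentActivity() {
    // The system picker is already familiar to users, so no custom gallery UI.
    private val pickImage =
        registerForActivityResult(ActivityResultContracts.GetContent()) { uri: Uri? ->
            uri?.let { detectInImage(applicationContext, it) }  // hand off to ML Kit
        }

    fun onImageSearchTapped() {
        pickImage.launch("image/*")  // any image type
    }
}
```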
So that's the Input phase done. Now we move on to the Recognize phase, where we enter a transition state while the user is waiting for objects in the image to be detected and identified. In this phase, we're using
an indeterminate loader paired with a tooltip to
clearly communicate to the user that they should be
expecting a short delay. And in addition, we’re also
using a translucent scrim on top of the image. So this does two things for us. The first is that it helps
to obscure the image slightly so that users know
the image is not quite ready to interact with yet. The second is that it helps
provide adequate contrast for the loader and the
tooltip to be visible on top. And sometimes
certain factors can affect whether an image is
suitable for object detection. These are things like
poor image quality, the object in the image
being too small, low contrast between an object
and it’s background, or an object being shown
from an unrecognizable angle. Other times, it might not
be with the image itself, and more that the user lost
connection to the network. So if a user
encounters an error, it’s important to
anticipate these issues, and to facilitate a smooth
experience by explaining the issue in a banner. And giving the user an
opportunity to try a new image. You can also include a way
for the user to learn more in a dedicated help section. Finally, once we have
the objects detected, we move on to the last
phase, Communicate. So in this part of
the flow, we show the user which objects
have been detected, and give them the opportunity
to inspect those results. So the way that we
identify detected objects is through these
cute little object markers. They should be placed in the center of a detected object's bounding box, and they're elevated with a shadow to make sure that they're visible against an image.
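Positioning a marker is a simple mapping from image coordinates to view coordinates; this hypothetical sketch assumes the image fills the on-screen view edge to edge.

```kotlin
import android.graphics.PointF
import android.graphics.Rect

fun markerPosition(
    box: Rect,
    imageWidth: Int,
    imageHeight: Int,
    viewWidth: Int,
    viewHeight: Int
): PointF {
    val scaleX = viewWidth.toFloat() / imageWidth
    val scaleY = viewHeight.toFloat() / imageHeight
    return PointF(box.exactCenterX() * scaleX, box.exactCenterY() * scaleY)
}

// For each detected object, place a small marker view at
// markerPosition(obj.boundingBox, bitmap.width, bitmap.height, view.width, view.height)
// and give it elevation so it casts a shadow over the photo.
```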
So remember how earlier Kunal talked about using a thumbnail in the results view of the live camera mode? This is conceptually doing
the same thing, right. We’re helping the user to
compare the object in the image to the object in the results. And in addition to
the object markers, we’re giving the users
a preview of the results through these little
mini cards at the bottom. We place them in this
horizontally scrolling carousel at the bottom of the screen
so that it’s easier for users to reach. And finally, how do we encourage
users to explore and interact with their results? With a little something
called the power of design. The first is through
the motion transition. So the way that these elements
appear on screen is staggered, and that helps to
demonstrate that there are multiple results in view. And second, the
tooltip at the bottom allows us to use language
to prompt the user to explore the results. And finally at the
end, notice how as the carousel is
being scrolled through, each marker scales up in size as its corresponding card scrolls into view. So this helps the user to draw
a relationship between each card and its matching dot. And then tapping each
card or the object marker brings up more details about
it in the bottom sheet. Same as what Kunal showed us
earlier in the live camera mode. And now, again, when we talk
about errors and issues, it’s important to account
for result confidence. If a search returns
a result with only low-confidence scores,
you can let the user know at the bottom of the list itself
with links to search again and tips to improve
their search. And aside from
low-confidence results, a search can just fail
and return without matches for several reasons. Like if the object isn’t
in a known set of objects, or if the image is low quality. So when this happens we
recommend displaying a banner that guides users
to a help section for more information about
how to improve their search. OK, now in terms of
theming, let’s quickly talk about how we can take
these baseline patterns and customize them to
express your brand. Once again, we’re going to
use the example of Shrine that Kunal shared earlier. So here’s an overview of
what those key phases look like in these screens. The user might tap on the image
search icon in the top bar, select an image
from their device, and then explore the results
in the following screen. Like we mentioned,
Shrine's visual language is all about using those
angular shapes, right. So we’ve expressed that here
through the object markers, where we turn them from
circles into diamonds, and through the cut
corners of the cards. And, of course, we’ve
transferred the typography and the color scale of Shrine
into the card title and shape. So that's object detection in a static image in a nutshell. So, now I want to welcome Kunal back to the stage to walk you through the last
experience, barcode scanning. [APPLAUSE] KUNAL PATEL: Thanks, Phil. So barcodes are an easy
and convenient format for passing information from
the real world to your device. ML Kits barcode scanning API
reads most standard barcode formats and provides an
easy way for your own app to be able to recognize barcodes
without users having to open up a separate application. And this is a great
way to either have users be able to search
in a different way or to automatically input
information by scanning a code, rather than by manually
typing things in. So how this works, technically,
is the ML Kit barcode scanning API can detect
most common one-dimensional and two-dimensional
barcode formats and then read the value. And if that value is
like a string, an ID that needs to be looked up,
you can send it off to your own database to get
results back and present them to the user. But one cool feature of the barcode scanning API is that if the barcode's value is in one of several common structured data formats, for contact information, event details, things like that, the API can automatically parse that structured data and let you present it immediately to users without you having to do any extra work.
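A minimal sketch of that branching, assuming the standalone ML Kit barcode scanning client (package names vary slightly across SDK versions); the three display/lookup functions are hypothetical hooks.

```kotlin
// Note: the Barcode class lives in a slightly different package in older
// versions of the SDK (com.google.mlkit.vision.barcode.Barcode).
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.barcode.common.Barcode
import com.google.mlkit.vision.common.InputImage

val scanner = BarcodeScanning.getClient()  // all supported formats by default

fun scanFrame(image: InputImage) {
    scanner.process(image)
        .addOnSuccessListener { barcodes ->
            val barcode = barcodes.firstOrNull() ?: return@addOnSuccessListener
            when (barcode.valueType) {
                Barcode.TYPE_CONTACT_INFO ->
                    showContactFields(barcode.contactInfo)  // parsed name, phones, emails
                Barcode.TYPE_CALENDAR_EVENT ->
                    showEventFields(barcode.calendarEvent)  // parsed title, start, end
                else ->
                    lookUpValue(barcode.rawValue)           // e.g. an ID for your database
            }
        }
}

fun showContactFields(contact: Barcode.ContactInfo?) { /* key-value text fields */ }
fun showEventFields(event: Barcode.CalendarEvent?) { /* key-value text fields */ }
fun lookUpValue(value: String?) { /* query your own backend, then show results */ }
```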
So we thought about this experience in the same three phases that we used for the other live camera experience, the object detection flow. So I'm going to try to focus on
what’s unique to barcodes here. So in the sensing phase,
we open up the feature. The app has begun
looking for barcodes. We want to do that same
type of on-boarding. Keep it simple. Keep it short. Focus on the motion
that users might need to do to get a
barcode into view to scan. And once they’re in the
experience, instead of the reticle that we used in
the live object detection flow, we have what we call a
barcode frame instead. This barcode frame is also
at the center of the screen, but it’s setting the area
that the barcode will be automatically
scanned in once entered. So providing a
more prominent area for users to place
the barcode to scan. And we’re also animating
it to draw attention to the center of the screen. One last thing to note
is that the aspect ratio of this barcode frame can be adjusted to match your own application. So if you're only looking to scan QR codes, then this frame can take on a more square shape to provide another hint or cue to the user about the types of barcodes they should be looking for. If they have any difficulties with detecting barcodes, as we covered in our previous flows, we want to have the same kind of error cases and error styles for the barcode frame and for the rest of the app that we mentioned in the earlier flows. In the Recognition phase, the user has scanned a barcode, the app has read its value, and it's loading results. So similar to what we talked
about in object detection. For most simple
barcode formats, you shouldn’t need to really set
a high minimum detection size. Most formats are graphically simple and should be read pretty quickly once they're in that barcode frame area. But more complex
formats, such as PDF 417, have a lot of detail that
will need a higher quality image in order to be
accurately read by the barcode scanning API. For these types of formats,
we recommend setting a minimum detection size. And using this
partial fill style for the barcode frame and a
tooltip message to let the user know that they need to move
closer in order to get a higher quality image to be scanned. If you need to send
information about the barcode off to another
database or other part of your app for
processing, we want to communicate any
loading time to the user by turning the barcode frame
into an indeterminate progress indicator. If the value can be
read immediately, you can skip straight
to showing results. So when we do have results
available to users, there are a couple of unique
things about displaying results that we haven’t covered
in our previous flows. The first is, what I mentioned
in the technical flow demonstration, is that the
barcode scanning API can read both structured data and what we'll call unstructured data, or data that you need to look up and that may have custom information. For structured data that's in key-value pairs, text fields provide a really great format for representing this information. The key-and-value kind of relationship matches really well to a text field's label and input areas, and it gives you room to
present actions to copy, and also a consistent
format for users to scan the different
types of information you may have available. For more custom
information that you’re looking up and returning,
customize the layout of the container that you’re
using to present the results. Another difference
with barcode scanning is that the types of
tasks users are doing may be very different. For cases where users are going
to be scanning multiple items, let’s say they’re an
e-commerce app, and maybe doing price comparison, and
looking up multiple items. We recommend displaying
the barcode results in a modal bottom sheet
so that users can easily return to the camera and
scan another barcode. But for tasks like completing a
form, like scanning a gift card and getting its balance
added to your account, displaying that information over
the barcode scanning feature doesn’t make a lot of
sense for the user. It’s actually creating
more work for them. So we recommend, for these types of flows where you're completing data entry, returning the user automatically back to the previous screen with the appropriate fields filled in. We want to do that work for them when we can.
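One hypothetical way to hand the scanned value back to the launching form on Android is the standard activity result mechanism; the extra key and extension function here are made-up names.

```kotlin
import android.app.Activity
import android.content.Intent

const val EXTRA_SCANNED_VALUE = "scanned_value"  // key agreed with the form screen

fun Activity.returnScannedValue(rawValue: String) {
    val data = Intent().putExtra(EXTRA_SCANNED_VALUE, rawValue)
    setResult(Activity.RESULT_OK, data)
    finish()  // drop straight back to the form, ready to fill in the field
}

// On the form side, read EXTRA_SCANNED_VALUE in the activity result callback,
// populate the gift card field, and let the user continue their task.
```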
A number of tasks may fall somewhere in between.
information, for example, you may want to
edit someone’s name, add additional notes, before
you save that information and exit the barcode feature. So we recommend bringing
those results up in a modal bottom sheet
or a full screen dialog, so that if a user needs to
go back and change anything, they can. And they have this
lightweight way of editing that information
before saving and exiting the feature entirely. So as we did in the
other flows, I’ll quickly share an example of
how barcode scanning can be customized to match your app. So we’re seeing
here in the flow, we’re bringing in some of
Shrine's colors and typography again. One thing to call out here is that the scrim, which separates the area that can be scanned for the barcode from the rest of the camera view, is using Shrine's text color instead of pure black for a slightly more branded experience. In addition, if we take a closer look at the barcode frame, where we usually have these slightly rounded corners, we're using sharp corners to reflect the sharp cuts in the logo. The same typography
and color choices for the tooltip that we
discussed in the previous flows carry through here as well. So we’ve gone through
three different patterns in a lot of detail, thrown
a lot of information and micro-interaction
decisions at you. But hopefully you
noticed there were some common themes across
all three of these flows. I’m going to turn it back over
to Phil to talk about some of the design principles
that we use to guide our decision making process. [APPLAUSE] PHILLIPE CAO: Thanks, Kunal. So now that we’re familiar
with these three experiences, why don’t we zoom
out for a second and walk through the
core design principles that you should keep
in mind as you’re implementing these patterns
into your products. Our first principle
is to make the camera UI minimal and meaningful. We want to make sure that the
central means of input, which is the camera, is unobscured. And when things do
have to be overlaid on top, like a reticle,
or an object marker, we’re making sure that they’re
legible against any kind of image, light or dark. Our second principle is to keep
users informed at every moment. These new patterns can be
pretty unfamiliar to users, and explaining what’s
going on at every step is really important to
ensure a smooth experience. So first, really rely
on these design phases that we’ve shown you to
help organize your flows. And use multiple methods
like language, motion, and components to implicitly
and explicitly communicate to your users
what’s happening now and what they
should expect later. And finally, outside
of the main flow, introduce users to the
essentials of your app through an on-boarding
experience. And give them a persistent way
to learn more about the feature through a dedicated
help section. Our last principle
is to anticipate opportunities for issues. This is what Rachel was talking
about at the beginning, about designing for the
fringes of these use cases. These flows are prone to errors
from a variety of factors, and it could be really
frustrating for your user if you don’t take these
factors into account. So before designing
these flows, before even getting into the
design side, really test and learn the
model’s boundaries. What is it good at doing? What is it bad at doing? And what common errors
might you encounter because of those boundaries? Also, always account for
environmental conditions. So most apps wouldn’t otherwise
care how bright or dark a user’s environment is, but
for something like visual search that’s really important, right. So these are also
common issues that are applicable to
all kinds of users, regardless of their
technical familiarity or how good their device is. And finally, adapt your
design based on the confidence of your results. So this is less about explicitly
communicating the confidence at every step of the way. But more about
accommodating when you do have suboptimal
results so that users have a better understanding
of where their information is coming from. So to recap here’s our three
high level design principles. Make your camera UI
minimal and meaningful. Keep users informed
at every moment. And anticipate
opportunities for issues. And now to close things
off, I’m going to throw it all the way back to Rachel. RACHEL BEEN: OK, I’m back. Let’s close this out. OK. So really quick,
I’m going to talk about some collaborations
we did that I mentioned in the beginning. We worked with Ikea and Adidas. We worked with these two
early-access partners to really look at
what these flows would look like in actual products. And most specifically,
using real training data. It was really fantastic
to work with these teams, not only to get feedback,
but to really see how these teams approached theming. How they really took this
same pattern of live object detection, but really
tailored really minute details to their brand. You can check out both of these
demos in the AIML tent, which is very close by. Secondly, we worked
with Flutter. They have downloadable
source code on GitHub that you can see a
hypothetical barcode flow. It is using our [? FA ?]
e-commerce app, Shrine, to showcase what this
hypothetical flow would look like. And really valuable
to see what that might look like in practice. OK, real quick. Real quick recap on
some of the resources to actually help you implement. ML Kit, and all of the
dev docs, and the demo that is really specific
to material design is available on the ML
Kit site and on GitHub. For those of you at
the talk yesterday, the people in our research team
recently launched a guidebook. A really fantastic resource
providing tools and best practices for designing
with machine learning. That's available at pair.withgoogle.com and smallthanks.withgoogle.com. And of course, all of
these three patterns are available on the material.io
site in-depth and in-detail. And with that, thank you
for coming to our talk. We are going to be in the
sandbox right next door right after, if you want to
ask us any questions. And goodbye. [MUSIC PLAYING]
