Supercharging Firebase Apps with Machine Learning and Cloud Functions (Google I/O ’17)


LAUREN LONG: My name is
Lauren, and I’m an engineer on the Firebase team. I joined Google just
about a year ago. In fact, next Tuesday is
actually my Google-versary, so I’m really
excited about that. [CHEERS] Thank you, thank you. So during the last year, I’ve
been working on the Cloud Functions for Firebase team. And we launched
in beta, in March, something called Cloud
Functions for Firebase. This is a product that allows
you to run custom server code on Google
servers in response to events happening in your app. So you might have already
attended some of the talks this week on Cloud
Functions for Firebase. It has many powerful
applications. We are sharing with
you one of them today, which is how to easily add
machine learning to your app. So before we dive in,
you must be thinking, why would I care about
machine learning? I’m not trying to build
the next self-driving car. Maybe you are, and if
you are, good on you. But if you’re not, it’s
still useful to you. It is not just something that
requires this huge team of data scientists and engineers. It is something that, thanks
to Google’s technologies, you can implement with
one or two engineers. And you can use it to
enhance the user experience of your app, something
that is important no matter what you are building. So what do I mean by that? One example is adding
multi-language support. So you have big
ambitions for your app. You don’t just want
to limit your users to those that speak English. With machine learning,
you can automatically translate content, especially
user-generated content, so that people that otherwise
do not have a common language can be connected
through your app. A great example of this is
closed captioning for YouTube. So this is a feature
that you can turn on, where machine learning
has automatically not only transcribed
the audio of the video, but also translated it. So here we have a
video introducing Cloud Functions for Firebase. We only made it in English,
but a developer in China can simply turn on closed
captioning in Chinese and follow along. Another great example of how
machine learning enhances the user experience is
providing contextual data so that your app can
respond intelligently when the user does something. My favorite example of
this is in Google Photos. So if you’ve used
it before, you might know that when you
upload a photo, machine learning
automatically picks out exactly what’s in each
photograph and annotates them. This way, later, when you’re
looking for something, you can find it really quickly. So recently I was trying
to go back to a hike that I did that
I really enjoyed. I knew it was by the ocean. I knew it was in Marin County. I had no idea what
the name of it was. But I knew that I had
taken a photo of the sign, so I searched up “ocean”
inside Google Photos. It quickly pulled up all
of these photos of oceans I took in the last few months. I saw the ones that matched
the view I remembered, looked at that date, pulled
up the rest of the photos from that day, and it turned
out to be the Rocky Point Road trail in Mill Valley. So I was able to go
back to that hike, and that was a really
magical experience. Again, machine learning
here was behind the scenes. It automatically detected
what was in my photo when I uploaded it. I didn’t have to manually tag
it to try to remember it later. So that was a really
great experience for me. So another example of
using machine learning comes from our
partner Auger Labs. They’re a Firebase customer,
one of the early users of Cloud Functions for Firebase. They make mobile apps for
art communities and galleries that enhance the
art-viewing experience. So they use machine
learning in two ways. One, they perform
image recognition. When someone snaps a
photo of an artwork, it knows exactly which
piece it is and pulls up the relevant information about
the artist, about the work. Secondly, they use machine
learning whenever someone uploads photos of artwork
to then annotate them so they’re easily
searchable later. So this is a great
example of a company that is making an app where
they’re using machine learning to make the experience of
something even more fun, in this case walking
through an art gallery. So enough talking
about machine learning. You must be really anxious
to see it in action. So today I am going to
show you a few things. So we thought we could just
do a demo where we show you a bunch of machine learning
apps and show you how they work, but it would be even more
fun to make a game out of it and use Cloud Functions to
tie together a few APIs. One of the APIs we
really wanted to show you is called the Cloud
Video Intelligence API. This is something that Google
launched in March, in beta. And for the first time,
you can use machine learning in a very easy to
use API format to analyze video content. So you might have done speech
or vision before for photos, but video content
was, prior to this, really difficult because of
the massive amounts of data in each file. But now you can easily
feed it a video, and it’ll pull out
exactly what’s there. So what we did
prior to this talk is we found a few videos,
cranked them through this API, and looked at the labels. And the game we’re
going to play today is where you as
players have to guess what the Video Intelligence API
guessed based on what you see. So I will demo one round
of the game right now, then I’ll invite up Brendan
to explain the magic sauce behind the game. Then RJ will walk
us through the code, and last but not least, you
all get a chance to play. So if you can switch over
to the demo screen, please. So here we’ve got
one video that we have cranked through the API. I’m going to play it in a sec,
and here I’ve got on my phone– which you can see on the
screen being livestreamed– and I’m going to guess
what’s in the video. So I’m going to
play it right now. And the way I interact with
this app is by speaking to it. So this red microphone
button, I’m going to press, speak when I hold it down, and
release when I’m done talking. Sand. So what’s happening
now is that the audio is being captured in the file. It gets uploaded to Cloud
Storage for Firebase. That kicks off a Cloud
function, which then analyzes it for correctness. And here I got it correct. So that was sand. So some browsers may not
support microphone access. In that case, you can press this
keyboard button at the top left and type in an answer instead. So I see a lot of sun here. Maybe sun is something else. Oh, it turns out it was not sun. You can see here on my
app you’ve got a score. So my score of one correct
guess, the audience score of two, because we, of
course, have a demo bug and it duplicated my answers. And another thing that
we see on the big screen is you see the audience score,
just like we did before, but also number of languages. So here I’ve only
spoken in English. It is only one for that. I grew up in China. I lived there until I was 10. So I’m going to speak
in Mandarin here. So I select Mandarin from
this list, click Submit, go back to microphone. So I’m going to say the
Mandarin word for “ocean.” [MANDARIN] So here it’s not only
transcribing what I’m saying, but also translating it and then
evaluating it for correctness. Oh, I see what’s going on. A lot of you are
playing the game. OK, not demo fail, user error. I’m just kidding. But thank you for participating. Don’t worry, you will all
get a chance to play later. So I want to draw your
attention to a few things. So here we’ve got an
Activity Stream of everything that’s happening. So the blue is correct guesses,
and the red is failed guesses. And at the end of each round I’m
going to pull up this summary. And you can see this
is the answer key, so this is what the
Video Intelligence API has pulled out. And a few of them we’ve guessed,
a few of them we haven’t. So if you can switch back
to the slide, please, we are going to
have Brendan walk us through what’s happening. [APPLAUSE] BRENDAN LIN: All right. Thanks, Lauren. All right, hello, everyone. My name is Brendan Lin. I’m the product manager in
Cloud Functions for Firebase. So you’re probably wondering
how we built the app that Lauren just demoed. So the magic behind
the scenes involves using a variety of
Google’s machine learning APIs on top of
Cloud Functions for Firebase. So we’ll quickly go through
some of the different machine learning APIs and see how we
can use them in our own app to improve the app experience. So first off, the Cloud
Video Intelligence API is one of Cloud’s newest
machine learning APIs. You can take any video and
run it through this API to detect different objects,
labels, and even scene changes. For our app, we used it to
detect objects in the video. These objects will
be the answers that you’ll be trying to guess
when we all play together.
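
For reference, a minimal sketch of that kind of call with the Cloud Video Intelligence Node.js client could look like the following; the bucket path and video name are placeholders, and the talk's own preprocessing script may have looked different.

```javascript
// Sketch only: list the labels the Video Intelligence API detects in one video.
const video = require('@google-cloud/video-intelligence').v1;
const client = new video.VideoIntelligenceServiceClient();

async function listLabels(gcsUri) {
  // annotateVideo starts a long-running operation; wait for it to finish.
  const [operation] = await client.annotateVideo({
    inputUri: gcsUri, // e.g. 'gs://my-bucket/beach.mp4' (placeholder)
    features: ['LABEL_DETECTION'],
  });
  const [result] = await operation.promise();
  // Each annotation carries an entity description such as 'sand' or 'ocean'.
  return result.annotationResults[0].segmentLabelAnnotations.map(
    (label) => label.entity.description
  );
}

listLabels('gs://my-bucket/beach.mp4').then(console.log);
```
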
So we also used the Cloud Speech API to convert audio to text. So when Lauren spoke
into the microphone, we ran that audio through
the Cloud Speech API so we could extract that text. The Cloud Speech
API also recognizes over 80 different languages,
and can even stream text results in real time. So our app lets
you submit answers in many different languages. For this we use the
Cloud Translation API. This lets you pass in
any arbitrary string and translate it to over
100 different languages. So each possible
answer in our app has been run through this API
to build out our answer bank.
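
As a rough idea of what building such an answer bank could look like with the Cloud Translation Node.js client; the word list, target language, and data shape here are illustrative, not the app's real schema.

```javascript
// Sketch only: translate a list of English answers into one target language.
const {Translate} = require('@google-cloud/translate').v2;
const translate = new Translate();

async function buildAnswerBank(englishAnswers, targetLanguage) {
  const bank = {};
  for (const answer of englishAnswers) {
    // translate() resolves to [translation, apiResponse].
    const [translation] = await translate.translate(answer, targetLanguage);
    // Map the translated word back to its English answer for later lookup.
    bank[translation.toLowerCase()] = answer;
  }
  return bank;
}

buildAnswerBank(['ocean', 'sand', 'beach'], 'zh').then(console.log);
```
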
So although we’re not using this in our app, it’s important to mention
another machine learning API, one of the most popular ones. So this is the Cloud Vision API. It allows you to analyze images
to detect individual objects, faces, extract text,
even to figure out if there are certain
landmarks in your photo. So this is an actual
photo of my dog. And it did a wonderful job
of detecting what type of dog she is, and even
noticed that there’s a stuffed toy in the photo.
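
A minimal sketch of that kind of label detection with the Cloud Vision Node.js client; the file path is a placeholder.

```javascript
// Sketch only: list the labels the Vision API detects in a local photo.
const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient();

async function describePhoto(filePath) {
  const [result] = await client.labelDetection(filePath);
  // Each label has a description and a confidence score between 0 and 1.
  return result.labelAnnotations.map(
    (label) => `${label.description} (${Math.round(label.score * 100)}%)`
  );
}

describePhoto('./dog.jpg').then(console.log); // e.g. ['dog (98%)', 'terrier (93%)', ...]
```
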
So for most use cases, these APIs are best implemented off the
client and on your backend. This lets you have
a centralized place to have all of your
logic for processing input from multiple
users, allowing you to keep
long-running processes and tasks off the client. You can keep things
like secrets, as well as secure logic,
away from prying eyes. But what if you
don’t want to deal with setting up and managing
your own custom servers? Well, we have this
thing for that– as you may have guessed,
Cloud Functions for Firebase. So Cloud Functions is
our programmatic glue. It allows you to write
custom JavaScript code, deploy to Google’s
Cloud, where it can be triggered by Firebase and Cloud events. Also, there’s no need to
think about servers here. Cloud Functions
for Firebase lets you run your mobile
backend code without having to worry about setting up or
managing your own servers. You just write your functions,
deploy them, and that’s it. So with Cloud
Functions for Firebase, you also don’t have to
worry about scaling. We’ll automatically spin
up new instances whenever you need them, and
also scale back down to zero once they’re done. That way you only
pay for what you use. So earlier, Lauren mentioned
Auger Labs, one of our partners that uses machine learning to
build apps for art communities and galleries. They enjoyed using Cloud
Functions because they only had to focus on writing the
functions, and that’s it. So let’s quickly dive in and
see exactly how it works. So Cloud Functions
are event-driven. That means your functions
listen for an event. And once that event is
emitted, your function gets triggered, which executes
the code in your function. So what kind of
events do we support? Right now there’s support for
Google Analytics for Firebase, the Firebase Realtime Database,
Cloud Storage for Firebase, Cloud Pub/Sub, Firebase
Authentication, and even HTTP requests for
integrating with third parties. So let’s see how this
looks with a real use case. So say we have an app that
allows users to upload videos to Cloud Storage. So a video upload will
trigger a function that listens for a particular
change on a bucket. Within that function, you
can analyze the video using the Video Intelligence
API, take those results, then store them into
the Realtime Database.
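
A condensed sketch of that workflow, written in the pre-1.0 firebase-functions trigger style current at the time of the talk; the database path, key sanitization, and client usage are assumptions for illustration.

```javascript
// Sketch only: storage upload -> Video Intelligence labels -> Realtime Database.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const video = require('@google-cloud/video-intelligence').v1;

admin.initializeApp(functions.config().firebase);
const videoClient = new video.VideoIntelligenceServiceClient();

exports.analyzeVideo = functions.storage.object().onChange((event) => {
  const object = event.data;
  if (object.resourceState !== 'exists') return null; // ignore deletions
  const gcsUri = `gs://${object.bucket}/${object.name}`;
  // Database keys cannot contain '.', '#', '$', '/', '[' or ']', so sanitize the file name.
  const key = object.name.replace(/[.#$/\[\]]/g, '_');
  return videoClient
    .annotateVideo({inputUri: gcsUri, features: ['LABEL_DETECTION']})
    .then(([operation]) => operation.promise())
    .then(([result]) => {
      const labels = result.annotationResults[0].segmentLabelAnnotations.map(
        (l) => l.entity.description
      );
      // Store the label list as the answer bank for this video.
      return admin.database().ref(`/scenes/${key}/labels`).set(labels);
    });
});
```
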
So let’s quickly take a look at some code. So this is actually
a real example that we’re using in our app. So after a user
account is created, we want to set their
default language to English. So this is super simple
for us to accomplish using Cloud Functions for Firebase. So here we’re
creating a function that listens for the
user onCreate event from Firebase Auth. First, we grab the user’s
ID from the event payload. And finally, we set the user’s
default language to English in the Realtime database
using Firebase Admin. And that’s it.
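
That function looks roughly like this; the sketch follows the pre-1.0 firebase-functions SDK used when this talk was given, where the handler receives an event whose data is the new user record, and the /users path is an assumption.

```javascript
// Sketch only: set a default language for every newly created user.
const functions = require('firebase-functions');
const admin = require('firebase-admin');

admin.initializeApp(functions.config().firebase);

exports.setDefaults = functions.auth.user().onCreate((event) => {
  // Grab the user's ID from the event payload.
  const uid = event.data.uid;
  // Set the user's default language to English in the Realtime Database.
  return admin.database().ref(`/users/${uid}/language`).set('en');
});
```
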
You can also rely on multiple Cloud Functions to create specific workflows. For instance, we
have a workflow here that’s comprised of
two separate functions. So an audio file is
uploaded to Cloud Storage, which triggers a function. This function analyzes
the audio and transcribes it using the Cloud Speech API. This then gets written
to the database, which triggers another function. That function that was listening
for the write in the database will run that text through
the Cloud Translation API, which will then save the
result back to the database.
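
A sketch of the second function in that chain, the database-triggered translation step; the paths, field names, and SDK style are assumptions for illustration.

```javascript
// Sketch only: translate a newly written transcription back to English.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const {Translate} = require('@google-cloud/translate').v2;

admin.initializeApp(functions.config().firebase);
const translate = new Translate();

exports.translateTranscription = functions.database
  .ref('/transcriptions/{pushId}')
  .onWrite((event) => {
    const data = event.data.val();
    // Skip deletions and entries already translated (avoids retrigger loops).
    if (!data || data.english) return null;
    if (data.language === 'en') {
      return event.data.ref.child('english').set(data.text);
    }
    return translate
      .translate(data.text, 'en')
      .then(([english]) => event.data.ref.child('english').set(english));
  });
```
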
Also, as I mentioned, you want to try and keep secure, trusted code off the
client since there are ways to inspect the client code. We use Cloud Functions to
handle all of our scoring logic. So when someone
submits an answer, that gets written to
our database, which then triggers a Cloud
Function that contains all of our scoring logic. And depending on whether
or not the answer is right, we’ll update the user score. So with that, I’m
going to hand it over to RJ, who’s going to
be diving into our code. ROBERT-JAN HUJSMAN:
Thank you, Brendan. So we thought that before we
let you all play the game, we’d show you some of the
backend code, the Cloud Functions code,
that powers it all. And we won’t be
showing you all of it, but what we want to show you
is two parts that we think are most interesting. We’ll show you the part that
does the judging of the guesses that you have. So if you type in
a guess, there’s some custom scoring logic
that we do on the backend. But we’ll actually start by
showing you the code that does speech analysis. Now, if we can switch over
to the demo computer, please. The code that we’re
looking at is Node.js. This is server-side
code, as if you were to run it on your own
server, except it runs on ours. And what you see
is, this looks– if you’ve written
in node before, this might look familiar. We import two main dependencies. One of them is our
functions module, and the other is a helper module
that contains my actual logic. We’ll get to that
in just a second. What you see next
is the setDefaults function that Brendan
was talking about before. I folded the actual code,
but the structure is exactly as you’ve seen it on a slide. The cool thing, one of
my favorite features about Cloud Functions
for Firebase, is that when you
deploy your code, the Firebase command line tool
will actually read your code. And it will see that, for
example, in this case, setDefaults is an
authentication function that runs when a user gets created. And it’ll install
the function for you without any further
configuration on your part. It sees this from
the code structure. Similarly, if we scroll
down a little bit here to the
analyzeSpeech function– which is the one that I’m going
to show you all of the code for– you can see that this
is a Cloud function that acts on a storage trigger. When a storage object changes– for example, we upload
a new storage object– we get an event. And this event triggers
this function over here. So the first thing I
do, this is a little bit of just housekeeping, basically. When a function
triggers, you might have triggered it accidentally. For example, let’s say that I’m
cleaning up my storage buckets. So what I’m doing first is, oh,
if I triggered it accidentally, if this is not actually the
event that I wanted to get, I skip this one. But let’s say that, as
is usually the case, I’m actually getting an
event that I’m interested in. I will pull out two
pieces of information from the event that
I got, in this case the URL of the file
that was uploaded, a publicly available
URL, as well as the file name of the
object that was uploaded. And then I hand it off to
this other module that I have, my own custom, you might call
it business logic module. So why am I using
a separate module? The reason for that is because
I care about unit testing. And by factoring out my business
logic into a separate module, I can write unit tests
for exactly that module and just treat this function as
the entry point into my module.
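
The shape of that entry point is roughly the following; the helper module name and the mediaLink field are assumptions here, not necessarily what the real app uses.

```javascript
// Sketch only: keep the trigger thin and delegate to a separately testable module.
const functions = require('firebase-functions');
const speechHelper = require('./speech-helper'); // hypothetical business-logic module

exports.analyzeSpeech = functions.storage.object().onChange((event) => {
  const object = event.data;
  // Housekeeping: skip events we are not interested in, such as deletions.
  if (object.resourceState !== 'exists') return null;
  // Hand the file's URL and name to the business logic.
  return speechHelper.analyze(object.mediaLink, object.name);
});
```
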
Now, if this is something you’re interested in, we won’t go into depth
on it in this talk, but at 2:30 me and
two other colleagues will be talking about that
more here in a session called Cloud Functions Testability
and Open Source. So if you’re interested
in it, come back then. For now we’ll just
dive into my business logic, which is over here. So this is the actual
logic of the function that does speech analysis. And the first thing we do
is we parse the file name that I passed in to get
some useful information, like the user ID, the language
that the user was speaking when they were speaking to
their phone, the timestamp of the speaking, and then
we formulate a request that we will send to
the Cloud Speech API. Now, this is actually
deceptively easy. When I started using
the Cloud Speech API, I was expecting to have
to do a lot of work, but this is six lines of code to
do the entire call to the Cloud Speech API, which
is pretty cool.
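
A minimal sketch of such a call, assuming the Cloud Speech Node.js client and a browser-recorded OGG_OPUS file already sitting in Cloud Storage; the real app's request construction differs.

```javascript
// Sketch only: transcribe one uploaded audio file with the Cloud Speech API.
const speechApi = require('@google-cloud/speech');
const speech = new speechApi.SpeechClient();

async function transcribe(gcsUri, languageCode) {
  const request = {
    audio: {uri: gcsUri}, // e.g. 'gs://my-bucket/guesses/abc123.ogg' (placeholder)
    config: {encoding: 'OGG_OPUS', sampleRateHertz: 48000, languageCode},
  };
  // "await" pauses here until the API responds, then execution continues below.
  const [response] = await speech.recognize(request);
  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join(' ');
}
```
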
There’s a fun thing here, by the way, that I’m using. If you’ve worked in
JavaScript before, you may be looking at
this “await” keyword at the speech.recognize line. Let me pull it up a little
bit so that people in the back can see it as well. There we go. So this recognize operation,
you would intuitively assume– and you would be correct– that this is a somewhat
longer running operation. It might take a second. And in JavaScript,
that means it’s usually an asynchronous operation. But I like reading my code in a
more synchronous reading flow, even though it is
asynchronous code. And so I’m using the
new “await” keyword to block on that line of
code until it finishes. And then I’ll just
move on to the next. This is basically a more
fluid way of writing promises, if you’ve used
JavaScript before. If you’ve never used
JavaScript– or in this case, actually, TypeScript– before, then don’t worry. This is actually just code
that executes top to bottom. So yay. All right. So we have asked
the Cloud Speech API to recognize
an audio segment, and it turns that
into a transcription, which we match with the
current scene that we’re in. The scene is something
stored in our database. We skip over it. If you accidentally
pressed the button and didn’t say
anything to your phone, just pressed the button by
accident, we skip over that. And then the next
thing we do is, the transcription will be
a full sentence that you spoke to your phone. So if you said,
“beach, uh, sand,” then what we get from
the Cloud Speech API is “beach, uh, sand.” And we want to be
generous to you. We want to give you
all of these guesses. So we’ll split it
into individual words so that you get the guesses
beach, uh, and sand. And with that, we write
that back to our database as a guess.
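
In sketch form, that splitting and write-back might look like this; the /guesses path and payload shape are assumptions, and it presumes the Admin SDK was initialized in the functions entry point.

```javascript
// Sketch only: turn "beach, uh, sand" into three separate guesses in the database.
const admin = require('firebase-admin');

function recordGuesses(uid, language, transcription) {
  // Split on whitespace and commas so each word becomes its own guess.
  const words = transcription.toLowerCase().split(/[\s,]+/).filter(Boolean);
  const guessesRef = admin.database().ref('/guesses');
  return Promise.all(
    words.map((word) =>
      guessesRef.push({uid, language, word, timestamp: Date.now()})
    )
  );
}
```
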
So in just a few lines of code, I’ve done something that
I, at least, didn’t know how to do two years ago. Two years ago, I
would not have known how to analyze an
audio file for speech in any number of languages. And here it was in
a few lines of code. So that’s pretty cool. [APPLAUSE] Why, thank you. So I wanted to show
you one more part, which is the
function that judges the guesses that we make. So in this function, this
is, again, my index file. So this is my entry point. This function is a
database function, and it triggers when we write to
a certain path in the database. On that write, I get an event
that, just like with the speech event, I will skip if
it’s an event that I’m not that interested in. But in the vast majority
of cases, I am interested, and I will move on to extracting
a few important pieces of information. And then I call my
business logic again. So very similar to
what we saw before. Even though it’s a
completely different type of event triggering
this function, the logic is very much the same. And that takes us to
the business logic here. Judging a guess starts with
filtering out any pranksters, and then getting some
information from the database again. In this case, the
language that the user was speaking, and the original
English translation of the word that you said. Now, if the guess did not have
an English translation for us already in the database,
then it was incorrect. But if it did have
a translation, then it was correct. So we know whether you had
a correct guess or not. We give you a score
based on that, and then we do a
number of operations to write that score
back to multiple places in the database.
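
A condensed sketch of that judging function, again in the pre-1.0 trigger style; the /guesses, /answers, and /scores paths and the guess payload are assumptions.

```javascript
// Sketch only: judge each new guess against the scene's answer bank and score it.
const functions = require('firebase-functions');
const admin = require('firebase-admin');

admin.initializeApp(functions.config().firebase);

exports.judgeGuess = functions.database
  .ref('/guesses/{guessId}')
  .onWrite((event) => {
    const guess = event.data.val();
    // Skip deletions and guesses we have already judged (avoids retrigger loops).
    if (!guess || guess.correct !== undefined) return null;
    // A guess is correct only if the scene's answer bank maps it to an English answer.
    return admin
      .database()
      .ref(`/answers/${guess.sceneId}/${guess.word}`)
      .once('value')
      .then((snapshot) => {
        const correct = snapshot.exists();
        const scoreUpdate = correct
          ? admin
              .database()
              .ref(`/scores/${guess.uid}/points`)
              .transaction((points) => (points || 0) + 1)
          : Promise.resolve();
        // Mark the guess as judged and bump the player's score if it was right.
        return Promise.all([scoreUpdate, event.data.ref.child('correct').set(correct)]);
      });
  });
```
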
So why are we doing this server side? Because all of this logic you
could have done client side as well, right? Well, the reason is
we don’t trust you. We know that you have laptops,
we know you can pull them out, and we know that you can edit
the code that runs client side. You cannot edit this code. We determine whether your guess
is correct or not, not you, which we like. So with that, I
will hand it back to Lauren, who is going
to play the game with us. Can we switch back
to the slides? LAUREN LONG: Actually, sorry,
stay on the slide, yeah. Sorry. ROBERT-JAN HUJSMAN: Yes, ma’am! LAUREN LONG: Oh, no,
sorry, stay on the slide. ROBERT-JAN HUJSMAN: Oh,
stay on the slides, yes. LAUREN LONG: Sorry. I misheard you. Thank you. If we can go to– I can control this, actually– the next slide. All right. So you’ve all been
anxious to play. Some of you have
already gotten practice, so I expect very high
scores this time. Go to saythat.io. So if you’re streaming
in remotely, wherever you are in the world, you can
also play this game with us by going on this exact URL. You will be prompted
to sign in first. So you can do a Google
account, a GitHub account, or an email password
combination. Great. And then some browsers
will also have a popup that asks
you for permission to access your microphone. So if you would like to
be speaking to your phone during this game,
please grant it access. You can also play on your
laptop if you don’t have– well, I guess if you don’t
have a phone available, you have other concerns. But you can also
play on a laptop. Great. So if you can switch back to
the demo computer, please. OK. So we are going to be
playing a new scene. Space. Are you all ready? All right. OK, let the guessing begin. AUDIENCE: Space. LAUREN LONG: We only have
two languages so far, so I need you guys to
up the language game. If you speak another language,
tap that English button on the top left corner,
switch over to something else that you know. Five, that’s pretty good. I think we can do
better, though. All right. Wow, you guys are doing great. OK, let’s see what
the result is. Oh, nice. You guys have managed to guess
all of the possible answers. So that was galaxy, nebula,
space, star, universe. And you managed to do it
in a bunch of languages. OK. I think some people
might be confused about what scene we’re on. OK, now we’re OK. Nice, we got way more
languages this time. Great job. OK. So what the API saw from
this video was animal, carnivore– which some
of you guessed right, that’s really impressive. I did not see a carnivore when I
was looking at this cute puppy. Dog, pet, and the actual breed
of the dog, which was terrier. So that’s pretty cool. And one of you has really
great dog breed knowledge. 11 languages, very nice. Awesome. OK, let’s see. So this one was bamboo, forest,
nature, tree, and vegetation, which unfortunately
nobody got right. But I guess the API saw
vegetation from this scene. Great. Can we switch back to
the slides, please? Awesome. So in conclusion, I hope
we’ve managed to convince you that adding machine learning
will greatly improve your app’s user experience. And you can do it easily with
Cloud Functions for Firebase. So Cloud Functions allows
you to scale from zero to planet scale. It is great for
resource-intensive tasks that otherwise would slow
down your client, would be too intensive
for your client to do. It offers a trusted
environment for your code, so you can keep your business
logic and the secret sauce of your app within it. And with Google Cloud’s
machine learning APIs, you don’t need a PhD to
be able to offer things like multi-language support
and other machine learning experiences for a more
contextual and intuitive feel to your app. We used the top three from
the slide in our demo today, but we have other machine
learning APIs available to you, such as the Natural Language
API and the Vision API. And I encourage you to
check them out as well. So thank you very much for
being here with us this morning. If you want the source code of
the app that we just played, you can find it at
saythat.io/source. The three of us will all be
at the Firebase Sandbox right after this talk, so you can
come ask your questions then. You can also tweet at us with
the hashtag #io17-functions-ml to let us know what you
thought about our talk today. Thank you very much. [APPLAUSE]

15 thoughts on “Supercharging Firebase Apps with Machine Learning and Cloud Functions (Google I/O ’17)”

  1. Thank you very much for this demo using the Video Intelligence API; our startup will surely use this API in our upcoming mobile web bot app. My question: we already have the code for a chatbot based on API.ai + Google Cloud Functions. Now we want that API.ai chatbot to connect with the kind of demo shown in this video, so the bot's mobile web app can automatically answer some questions from users. There are few tutorials on the web for this capability. Any tips and suggestions would be highly appreciated.

  2. Hi Lauren,
    I have been asking for a solution on Twitter and here on YouTube but haven't gotten one. Could you provide a solution for the following:

    1. I have a node with a field containing a base64 string. I need a database on-write trigger that converts the base64 string to an image or thumbnail and uploads the image to Cloud Storage using Firebase Functions.

    2. Secondly, after uploading the image, it should also update the node with the download URL.

  3. Hi, I tried to follow this tutorial:
    https://cloud.google.com/speech/docs/async-recognize

    and I get this error (in speech recognize):

    Error: Illegal value for audiovalue element of type message: string (object expected)

    I haven't seen anything like this mentioned anywhere >_<
    I await your answer 🙂
    For my full code, please see this link:
    https://stackoverflow.com/questions/45804808/google-cloud-speech-error-illegal-value-for-audiovalue-element-of-type-message
