Knewton – Education Datapalooza


So the human race is about to enter a totally data-mined existence, and it's going to be really fun to watch. It's going to be one of those things where our grandkids are going to tell our kids, "I can't believe you grew up in a world like that," just the way our kids complain that we went to record stores. Think of Tom Cruise walking through the mall in Minority Report, when the ad beams right into his eyes and says, "Hey, Mr. Cruise, you should go on that Caribbean vacation you've been thinking about."
I know some entrepreneurs who work on that technology right now. And I'm still waiting for the day when my refrigerator's going to know when I'm running out of milk and order it for me automatically on Fast Track. I think that day's coming in a few years; it's not far off. The world in 30 years is going to be unrecognizably data-mined. So what does that mean for education? Well, education happens to be today the world's most data-minable industry by far, and it's not even close. Maybe one day healthcare will be up there, when they have little nanobots in your bloodstream doing real-time analysis, but until then it's not close: education beats everything else hands down.
So let's look at other big data industries. The really big data industries in the world right now are, not surprisingly, on the internet, because that's where it's easy to grab the data and that's also where the talent that understands data congregates. So let's just look at it by the numbers, because the name of the game is data per user. One of the things that fakes us out about data in education is that education, because it's so big (it's like the fourth biggest industry in the world), produces an incredible quantity of data. But data that produces just one or two points per user per day is not really all that valuable to an individual user. It might be valuable to, say, a school district administrator, but maybe not even then.
So let's just compare. Netflix and Amazon get in the ones of data points per user per day. Google and Facebook get in the tens of data points per user per day. So if you do 10 minutes of messing around in Google, you produce about a dozen data points for Google. Okay, great. Knewton today gets five to ten million actionable data points per student per day. We do that because we get people, if you can believe it, to tag every single sentence of their content. Publishers tag all their content (we have a large publishing partnership with Pearson, and they tag all of theirs), and because we're an open standard, anyone can tag their content to it. If you tag all your content, and you do it down to the atomic concept level, down to the sentence, down to the clause, you unlock an incredible amount of trapped, hidden data.
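To put the per-user figures quoted above side by side, here is a small Python sketch. The counts are the talk's own round numbers; reading "ones" as 1 and "tens" as 10 is an assumption made for illustration, and the result is only an order-of-magnitude comparison, not a measurement.

```python
import math

# Per-user, per-day data-point counts as quoted in the talk.
# "Ones" -> 1 and "tens" -> 10 are assumed round values for illustration.
DATA_POINTS_PER_USER_PER_DAY = {
    "Netflix/Amazon": 1,
    "Google/Facebook": 10,
    "Knewton (low estimate)": 5_000_000,
    "Knewton (high estimate)": 10_000_000,
}


def orders_of_magnitude(a: float, b: float) -> float:
    """Return log10(a / b), i.e. how many powers of ten separate a from b."""
    return math.log10(a / b)


if __name__ == "__main__":
    baseline = DATA_POINTS_PER_USER_PER_DAY["Google/Facebook"]
    for name in ("Knewton (low estimate)", "Knewton (high estimate)"):
        gap = orders_of_magnitude(DATA_POINTS_PER_USER_PER_DAY[name], baseline)
        print(f"{name} vs Google/Facebook: {gap:.1f} orders of magnitude")
```

On these assumed numbers the gap works out to roughly 5.7 to 6 powers of ten, which is in line with the "five orders of magnitude" comparison the speaker makes later in the talk.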
Why do you do that? Well, if you use programmatic taxonomy models and item response theory, and the approach at the bottom that we haven't given a name yet, what you figure out is that everything in education is correlated to everything else, down to the concept. Now, this is where education is different from search and social networking. Suppose someone tagged every single line, every single sentence, of all the world's web pages for Google, or every single line of dialogue for Netflix. No one's done that, but even if they had, there aren't really a whole lot of interesting correlations there. Everything in education is correlated to everything else: every single concept is correlated in a predictable way to everything else, using psychometrics. So if you do 10 minutes of work in Google, you produce a dozen data points for Google. But because everything we do is tagged at such a granular level, if you do 10 minutes of work for Knewton, you cascade out lots and lots of other data, and here's why.
When you took the SAT, there might have been 40 different concepts about isosceles triangles tested across all the SATs given in any one year. But you didn't get all 40 questions; you got two questions on isosceles triangles, because the test makers figure that if you're in the top 13% on this one and the top 15% on that one, so the top 14% on those two questions, the odds are 98% that you're in the top 14% on every concept in isosceles triangles. And there's a 96% chance that you're in the top 15% on all triangle concepts: 3-4-5 triangles, 30-60-90 triangles, isosceles triangles, etc.
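The talk doesn't spell out the model behind the triangle example, but the standard psychometric tool for this kind of inference is item response theory. The Python sketch below is a minimal illustration, not Knewton's method: it assumes a two-parameter logistic (2PL) model, a standard-normal prior on ability, and made-up item parameters (`a` for discrimination, `b` for difficulty). It estimates a learner's ability from two answered items, then predicts performance on related items the learner never saw.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses: list[tuple[float, float, bool]], step: float = 0.01) -> float:
    """MAP estimate of ability by grid search over [-4, 4], with a standard-normal prior.

    The prior keeps the estimate finite when every answer is right (or every answer
    is wrong), which a pure maximum-likelihood estimate would not.
    """
    best_theta, best_log_post = 0.0, -math.inf
    for i in range(int(round(8 / step)) + 1):
        theta = -4.0 + i * step
        log_post = -0.5 * theta * theta  # log N(0, 1) density, up to a constant
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            log_post += math.log(p if correct else 1.0 - p)
        if log_post > best_log_post:
            best_theta, best_log_post = theta, log_post
    return best_theta


if __name__ == "__main__":
    # Two answered isosceles-triangle items (hypothetical parameters), both correct.
    answered = [(1.5, 1.0, True), (1.2, 0.8, True)]
    theta = estimate_ability(answered)

    # Untested items in the same concept cluster (hypothetical parameters).
    untested = {"base angles": (1.3, 0.5), "apex bisector": (1.1, 1.2), "area": (0.9, 1.8)}
    print(f"estimated ability: {theta:.2f}")
    for concept, (a, b) in untested.items():
        print(f"P(correct) on {concept}: {p_correct(theta, a, b):.2f}")
```

The "cascade" in this sketch is simply that one ability estimate, fitted from two responses, yields a predicted probability for every other item whose parameters are known.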
You did a little bit of work for Knewton, and we used the established science of psychometrics to cascade out hundreds of other data points. So we can produce incredible quantities of data per user per day. It's really, really hard to get that, but if you can get all that tagging done (and that's one of our taxonomies up there; it's a small part of our overall taxonomy, just part of one course, and we have dozens of taxonomies), then you can do this.
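As an illustration of what sentence-level tagging makes possible, here is a hypothetical Python data structure. The field names and concept IDs are invented for this sketch and are not Knewton's or Pearson's format: each content unit carries a set of atomic-concept tags, and inverting those tags yields an index from every concept to the exact sentences that teach it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContentUnit:
    """A taggable piece of content, such as one sentence or clause (hypothetical schema)."""
    unit_id: str
    text: str
    concepts: set[str] = field(default_factory=set)  # atomic-concept tags


def build_concept_index(units: list[ContentUnit]) -> dict[str, list[str]]:
    """Invert the tags: map each atomic concept to the ids of every unit tagged with it."""
    index: dict[str, list[str]] = defaultdict(list)
    for unit in units:
        for concept in sorted(unit.concepts):
            index[concept].append(unit.unit_id)
    return dict(index)


if __name__ == "__main__":
    units = [
        ContentUnit("geom.ch3.s1", "An isosceles triangle has two equal sides.",
                    {"isosceles.definition"}),
        ContentUnit("geom.ch3.s2", "Its base angles are therefore equal.",
                    {"isosceles.base_angles", "isosceles.definition"}),
        ContentUnit("geom.ch3.s3", "The three angles of any triangle sum to 180 degrees.",
                    {"triangle.angle_sum"}),
    ]
    for concept, unit_ids in sorted(build_concept_index(units).items()):
        print(concept, "->", unit_ids)
```

In this sketch, a student's response to any unit can be traced back to every concept tagged on it, so a single interaction becomes evidence about several concepts at once.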
What you can do with the data, if you actually do all that work, is figure out exactly what students know and how well they know it. You can figure it out down to the percentile versus the rest of the population.
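Reporting mastery "down to the percentile" amounts to a percentile rank against the rest of the population. A minimal Python sketch, assuming each learner already has a single numeric mastery score for the concept (the scores below are invented): it counts the share of the population scoring below the learner, with exact ties counted as half, which is one common convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, population: list[float]) -> float:
    """Percent of the population scoring below `score`, counting exact ties as half."""
    if not population:
        raise ValueError("population must not be empty")
    ranked = sorted(population)
    below = bisect_left(ranked, score)
    ties = bisect_right(ranked, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ranked)


if __name__ == "__main__":
    cohort = [0.42, 0.55, 0.61, 0.68, 0.70, 0.74, 0.79, 0.83, 0.88, 0.93]
    print(percentile_rank(0.83, cohort))  # 75.0
```

Under this convention, "top 14%" in the talk's phrasing corresponds to a percentile rank of about 86.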
As for Knewton students, today we have about 180,000. By December it'll be 650,000, early next year it'll be in the millions, and next year it'll be closer to 10 million, and that's just through our Pearson partnership. For every one of those students, we can figure out within a few hours what they're strong at and what they're weak at, at the beginning of the course. So we can produce a unique syllabus for each student each day, literally unique. There's not enough time in the universe for any two students to have the same syllabus on any one day; that's how many possible syllabi there are. So it's optimized for each kid, down to the atomic concept. And then we can figure out things like: here's your homework tomorrow night, and you're going to struggle with that homework or fail it, because there are concepts in it whose prerequisite concepts we know you haven't mastered. Or there are concepts in that homework that [inaudible 04:53] very highly concepts always have trouble with. So we know you're going to fail, we know it in advance, and we can prevent it in advance. We'll grab some content from somewhere else in the portfolio and seamlessly blend it into your homework tonight.
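The homework prediction described here rests on prerequisite relationships between concepts. A minimal Python sketch, assuming a hand-built prerequisite graph and a set of concepts already judged mastered (both hypothetical; the talk doesn't describe Knewton's actual representation): it walks back from the homework's concepts, collects every unmastered prerequisite, and orders them so each remedial concept comes after its own prerequisites. A non-empty result is the signal that the student is likely to struggle.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def unmastered_prerequisites(
    homework: list[str],
    prereqs: dict[str, set[str]],
    mastered: set[str],
) -> list[str]:
    """Return unmastered prerequisites of the homework concepts, in a teachable order.

    `prereqs` maps each concept to its direct prerequisites. The walk stops at mastered
    concepts, on the assumption that their own foundations are already in place.
    """
    gaps: set[str] = set()
    stack = list(homework)
    while stack:
        concept = stack.pop()
        for pre in prereqs.get(concept, set()):
            if pre not in mastered and pre not in gaps:
                gaps.add(pre)
                stack.append(pre)
    # Restrict the graph to the gaps; static_order() yields prerequisites first.
    subgraph = {c: prereqs.get(c, set()) & gaps for c in gaps}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    prereqs = {
        "rules_of_exponents": {"multiplication", "powers"},
        "powers": {"multiplication"},
        "multiplication": {"addition"},
    }
    gaps = unmastered_prerequisites(["rules_of_exponents"], prereqs, mastered={"addition"})
    print(gaps or "on track")  # ['multiplication', 'powers']
```

The ordered list is what a system along these lines could use to decide which remedial content to blend into the assignment first.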
So every kid gets a perfectly optimized textbook, except it's also video and other rich media, dynamically generated in real time. And it also uses the combined data power of the entire network. Here's what I mean by that. Like I said, next year we'll have close to 10 million students; a few years from now we'll have 100 million. Student number 100,000,001 shows up to learn something like the rules of exponents or subject-verb agreement, whatever. We take the combined data power of all hundred million to figure out exactly how to teach every concept to each kid.
So student number 100,000,001 shows up to learn the rules of exponents. Great, let's go find a group of people who are psychometrically equivalent to that kid. They learn the same ways, they have the same learning style, they know the same stuff, because Knewton can figure out things like: you learn math best in the morning between 8:40 and 9:13 am. You learn science best in 42-minute bite sizes; at the 44-minute mark you click right [inaudible 05:47], and you start missing questions you would normally get right. You learn social studies best with video clips, or 22% video to 78% text, or whatever your optimal cocktail is. We can tell when we should return content to you for optimal retention. We literally know everything about what you know and how you learn best, because we have five orders of magnitude more data about you than Google has. We literally have more data about our students than any company has about anybody else about anything, and it's not even close. That's why we can do all that stuff. So then what we can do is take that profile across the 100 million kids (next year it'll be 10 million) and figure out: okay, who's exactly like that kid? Whose learning styles, up and down the line, are just the same?
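The matching step described here, and the comparison of what those look-alike learners did that the talk goes on to describe, can be sketched as a nearest-neighbor search followed by an outcome comparison. This is a deliberately simple stand-in, not Knewton's algorithm: each learner is assumed to have a numeric profile vector (the features, learner IDs, and outcome scores below are invented), similarity is plain Euclidean distance, and the "best" next step is the one with the highest average outcome among the nearest neighbors.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Optional


def nearest_learners(target: list[float], profiles: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k learners whose profile vectors are closest to `target` (Euclidean)."""
    return sorted(profiles, key=lambda lid: math.dist(target, profiles[lid]))[:k]


def best_next_step(history: list[tuple[str, float]], min_count: int = 2) -> Optional[str]:
    """Return the choice with the highest mean outcome, ignoring choices seen fewer than min_count times."""
    outcomes: dict[str, list[float]] = defaultdict(list)
    for choice, outcome in history:
        outcomes[choice].append(outcome)
    eligible = {c: mean(v) for c, v in outcomes.items() if len(v) >= min_count}
    return max(eligible, key=eligible.get) if eligible else None


if __name__ == "__main__":
    # Hypothetical features: mastery of two prerequisite concepts, share of video used.
    profiles = {
        "s1": [0.90, 0.80, 0.30],
        "s2": [0.10, 0.20, 0.90],
        "s3": [0.85, 0.75, 0.25],
        "s4": [0.88, 0.82, 0.35],
    }
    # What each learner did next on this concept, and a later outcome score (0 to 1).
    history = {
        "s1": [("video", 0.92)],
        "s2": [("hard_quiz", 0.40)],
        "s3": [("text", 0.60)],
        "s4": [("video", 0.85)],
    }
    new_student = [0.88, 0.80, 0.30]
    cohort = nearest_learners(new_student, profiles, k=3)
    pooled = [entry for lid in cohort for entry in history[lid]]
    print(cohort, best_next_step(pooled))
```

Two design caveats apply to this sketch: features should be scaled comparably before computing distances, and because the neighbors' choices were not randomly assigned, a simple average of outcomes can mislead; the `min_count` threshold only guards against choices backed by a single learner.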
Who knew the same stuff at the same level of mastery when they had [inaudible 06:24]? Great. Statistically speaking, it has to be the case that some 5% or 10%, through shared bad luck, did the absolute wrong thing for themselves without knowing it. They did questions that were too hard, they got discouraged, they bounced. They accessed text when they should have gotten the video, whatever. It also has to be a fact of statistics that, through pure blind luck, some top 1% did the absolute perfect thing for themselves without realizing it. And we go take the whole combined data power of that network of millions, soon to be tens of millions, eventually hundreds of millions of people. Your child learns 2,000 concepts in a particular semester-long math course, and for every single atomic concept, we take the combined data power of that vast network and use it to find the perfect path forward for that kid for that concept. So that's what we do right now.
Let me give you a couple of examples. This is one student. There are a few hundred learning clusters there, and a few tens of thousands of atomic learning objects. That's one student's path; this is a real student at a US college right now. And you'll see that each student has a totally different path. Some students have short paths, some have long paths. In this particular course there were students who finished it in 14 days, and there were students who finished it in two semesters.
This is a course at ASU. They had to change their semester structure to a modular semester structure, because we were suddenly telling them things like: if you give this woman here the final right now, she'll get an A. It's only 14 days into the course. I promise you she'll get an A. You can keep her in that seat if you want, and that's what we've always done; now we don't have to. So let me show you this.
This is one class of 150 students. They kind of all look like fleas, but each one is an individual learning path. Notice that some of them are going really fast and some of them are going really slow, and then they all kind of speed up when the test comes. It's kind of organic. Those different color-coded things are concept clusters. Some test obviously just happened; that's why they all started working. And you can look at some of those students and think, boy, that poor schmuck is really in a lot of trouble, because they're going too slowly.
So, where we think we're going with this: obviously it's in market right now. We're going to be in K-12 starting next year, and it's an open platform; anyone can plug into it and use it via APIs. And where we think we're going on the data side, which is the really fun stuff for today, is that within a few years we'll be able to start predicting grade performance. If a teacher grades consistently year in and year out, we can match up student profiles, down to the atomic concept level, against grade performance. We can tell you that you're on track to get a B- in this course right now. If your teacher grades totally inconsistently, we can't tell you that, but that's another problem. If your teacher grades consistently, we can tell you what your grade's going to be based on what you know and how fast you're learning it, and that if you do another 30 minutes a day, three days a week, you can get it up to an A-. We can tell you things like that.
We're really excited to correlate with other people's datasets via open APIs. Something we've talked about as kind of a joke, but which really should work, is a food diary. You tell us what you had for breakfast every morning at the beginning of the semester, and by the end of the semester we should be able to tell you what you had for breakfast, because you always do better on the days you have scrambled eggs or whatever. And more importantly, we should be able to tell you what you should have for breakfast. That's the power of data: when you unlock millions of data points per user per day, you can accomplish things that people aren't even conceiving of right now. That world is coming. We're trying to bring it to you, and we're going to be an open system that allows anyone to plug their data in, take it out, and plug it back in. Thanks very much.
JOSE FERREIRA © 2012 OfficeOfEdTech

2 thoughts on “Knewton – Education Datapalooza”

  1. This talk makes the assumption that everyone learns best through an electronic device. While I love technology, some of the best learning occurs away from electronic devices... My sons are in the third grade and they are already tired of all the testing stuff... sad. 🙁

  2. Why on earth do you even care what a student's grade in a course is or what percentile the student scores in? If we REALLY want to abandon the "one size fits all" model we need to abandon the comparisons that result from such a system. If we move to a pure mastery system, which is the true promise of technology, we don't need to have "percentile rankings" or "grades". Does anyone really care how long it took me to learn how to drive sufficiently well to get a license? Or does anyone care that it took me three tries to pass the written part of the driver's license test? Do I get a "percentile ranking" or a "GPA" on my driver's license? 
