Cio Review

Welcome to episode 156 of CXOTalk. I’m Michael Krigsman, and today I am joined by Anthony Scriffignano, who is the chief data scientist at Dun and Bradstreet. Anthony, how are you today?

Hello Michael, how are you?

I’m doing great. Hey listen, thank you so much for taking the time.

Not at all, it’s my pleasure.

Anthony, let’s begin by having you tell us some background about Dun and Bradstreet. It’s an interesting company and it’s been around for many, many years.

Yeah, it’s a fascinating company, to me anyway, and I think to many people. It’s been around
for, we’re at 174 years now, so it started before the Civil War. And it’s been through many, many iterations over the years. The company has between 4,000 and 5,000 employees, but then we also have a worldwide network of partner associations around the world, so it’s a pretty big company. Most of our customers focus on problems in the area of either total risk or total opportunity, so think credit and also sales and marketing. And then some of the related issues, like compliance, government relations, onboarding customers, and things like that.

So very quickly, because I’m curious about this: you started before the Civil War, and I know that a number of presidents have actually worked for the company, including Abraham Lincoln. So what did Mr. Dun and Mr. Bradstreet (I’m assuming they were misters) do before
the Civil War?

Well, they were. So if you think about what was going on at that time, you had westward expansion and you had a lot of businesses on the East Coast that were trying to do business with people who were increasingly far away. And it got to the point where you couldn’t go to visit them and judge the character and quality of the person, or how real they were, or whether their operations appeared to be significant enough for you.

So they wanted people who could essentially be their representatives in forming those opinions, and that’s how this all got started: help me understand people I can’t see. And that’s pretty much what we do now. Instead of trying to deal with the two-month stagecoach
ride, it’s the two-second trip over the internet, but it’s the same problem.

So that two-second trip over the internet comes down to data analytics and data science. In a sense, back when the company was founded there was the transfer of information, as you said, over stagecoach, and then there was some type of analytical method you used to evaluate the risks. Now you use data science and you’re the chief data scientist, so what does that actually mean in this context?

Well, you know, the joke would be, how hard can it be, right? The issue is that as you try to make a decision, let’s take ourselves back to pre-Civil War days, what would you look at to try and make a decision about whether a business is ‘worthy’, right?

The first thing is, are they real? And then you ask some questions, like how long have they been around, what kind of business are they in. Well, we do the same kind of thing, but when you think about data science and the literally millions of sources of data that are potentially available to make such a decision, how do you decide what’s true? How do you decide whether what you’re seeing is what it appears to be? How do you find that very small, very new business that just came into being?

What happens when a business has a name, or address, or phone number, or any kind of physical presence that’s in some way transient or virtual? So the questions are really the same kinds of questions, but the data science version of it is how do you use new types
of data, as opposed to just places where you can go and look. It’s a very similar problem, but obviously much more algorithmic, much more automated, much more ‘scientific’.

So how do you use data to determine ‘what’s true’? That’s the question you ask.

That’s a big question. If you think about what true means, sometimes that’s relative, right? Suppose the question is, is this business out of business? That seems like a very binary thing: either they’re out of business or they’re not out of business. Well, not really. When you look at a very small business, they’re not necessarily going to go bankrupt. They’re not necessarily going to call us and say, by the way, we’re going out of business now.
They’re not going to put a notice in the newspaper. There’s not going to be any kind of press release. There’s not going to be anything.

They just stop. And then what if they’re just resting for a while? What happens if a small business is actually still in business but the proprietor is just doing something else: he’s sailing around the world for a year, or he’s in the hospital, or she decided to go and do some other business for a while and she’ll be back, right? So we have these versions of kind of parked as opposed to definitely gone out of business, and that’s a very nuanced kind of thing. So how do you figure that out with a stream of data? Obviously you could look at suits, liens, judgments, business deterioration.
Look at those things as precursors to businesses that really die, as opposed to things that look like they were going well and all of a sudden they stopped.

You might look at the type of business that we’re talking about. You might look at the location in the world. You might look at the owner of that business in the context of the business and see if you see them popping up elsewhere. There are lots of different signals you might get in a situation like that.

So explain how you go about, as a data scientist, analyzing some of these problems.

Well, let’s take the issue of fraud as a great example. When we talk about fraud, we think we know what we mean, but everybody means something different. So fraud by any
definition around the world is some sort of misrepresentation of information for financial gain. When people lie to us, they haven’t gained anything yet, so is that fraud? We call it malfeasance sometimes.

If you think about the problem of fraud in the context of how you see it in data, or even how you see it in real life, it’s often referred to as a quantum observation problem: when you observe it, it changes. People committing fraud behave differently when they know they’ve been detected.

And so if you try to use regressive methods that only look backwards at pre-existing data and prior examples of fraud, you’ll get very good at catching the things that used to happen, which is counterintuitive, because the thing you’re looking at, you know
it’s changing. So data science would say yes, do that, because the old behavior is not going to completely stop, but it’s not sufficient; you need to do more. So how do you find types of bad behavior that haven’t occurred yet? Well, the first thing you do is look for types of behavior you haven’t seen before, and then you try to vet those behaviors against behaviors that are known to be malfeasant to see if there are similarities.

And data science provides non-regressive methods that do things like that, working with the connected space, what we call dyadic relationships: relationships among multiple parties, looking for observable relationships that are different from the ones we’ve seen before. That allows us to focus on and address a problem like that. So it’s a very long way of saying,
you start looking for things that are new, and you start to unpack them and see what they tell you.

But you’re doing more than simple comparisons. In a sense, if I can incorrectly boil down what you just said, you compare that which we don’t know to that which we know.

Exactly.

But that seems a fairly trivial observation, so I assume that the data science part is quite a bit more involved than that.

Yes, absolutely. The part we don’t know is where the challenge lies, right? You have so much data in front of you, and you have to make a decision about which parts you are going
to look at and which parts you are not going to look at. There’s a huge opportunity cost to a decision like that. You can’t just bring in all the data and keep pressing the ‘learn things’ button, right? So every time new data becomes available, there’s a step of discovery: realizing that it’s available. There’s the step of curation: making a decision about whether or not you bring it in, and if you did, what would it mean, and, by the way, are you allowed to bring it in? Do you have permissible use, things like that. And then there’s the synthesis: making sense out of that. And that all sounds easy until you try to do it at the scale of the creation of information, which is off the chart.

There’s so much information being created right now that we’ve actually lost the
ability to measure the rate at which it’s increasing. Not only don’t we know how much information there is, we don’t know how fast it’s growing anymore.

Okay: discovery, curation, synthesis. Can you give us a concrete example from your work that ties these pieces together, so that we can understand the data analysis process that you go through in order to learn something new from the data that you didn’t see before?

Sure. Let me give you an example that seems obvious but is not. Let’s suppose that we’re trying to understand how a company represents itself around the world in different languages and different writing systems. You might think that you could just translate, and translating works really well for common nouns, but it doesn’t work very well for
proper nouns. So if you have your own name, how do you represent that in Arabic or Chinese? Those are decisions that you have to make, and they involve sound and the interpretation of maybe the symbols you might use, or how those sounds sound in different languages. Different languages have different phonetic palettes.

My name, Scriffignano, has a ‘gn’ sound in it, the (neah); that’s not an English name. So when I tell people how to say it, I say, well, say lasagna, because you already know how to say that, right? So that’s a sort of technique, right?

So how do you now discover the presence of an organization or a person in different parts of the world when they’re represented differently? You can’t just sort of flip the letters
around, especially when we’re talking about different writing systems. So one of the things that you do is ingest a very large corpus of information that you understand.

You might ingest something like, say, a listing that a chamber of commerce might produce of the directors and officers, CEOs, and owners of businesses. So now I’ve found a listing of a whole bunch of names, and I already have, let’s say, another listing of a whole bunch of names. The curation is trying to make a correlation between those two, saying, how much of this thing that I’ve just ingested, which is in a language I don’t know, can I understand from the context that it’s in?

And then the synthesis is, can I discover any rules? I’m just thinking of an example:
in Greek they have the letter chi, which sort of looks like an X. That sound doesn’t really exist in English. Does it turn into a ‘ch’, or an ‘x’, or a ‘k’? Those three different decisions will each lead you down a different path.

So now, once I have that question (is it ch, x, or k?), I can start to look at the data and say which seems to be more appropriate. Over time I can develop rules, and then over time those rules can form new processes. I can tune those processes. I can do what’s called heuristic analysis, where I get a group of people to observe what the machine is doing and see whether they agree or disagree, and you tune these things over time, and eventually it sort of approaches the collective experience of a person doing the same thing. There’s
a thing called the Turing test that you might be familiar with. That’s the ultimate example of that: at what point does it appear to you to be intelligent?

So at what point does it appear to you to be intelligent? At what point do you make the decision that with all of this analysis, this normalization of multiple data streams, you’ve done enough, and now, based on that analysis, you actually do know what is ‘true’?

So ‘true’ is a very dangerous word, but what we’re looking for is something to converge on. In the case of heuristics, the gold standard is a group of similarly instructed, similarly incentivized people.

So you look at a large enough collection of information, and you make sure that you ingest and interpret that information the same way as a group of people who are similarly instructed and all have the same to gain or lose. You can’t have, like, 10 experts and five interns; they’ve got to be kind of the same.

And then there are techniques for normalizing for optimism, and pessimism, and fatigue, and things like that. Eventually what you’ll get is not something that is necessarily always true, but we like to use the phrase that consistently wrong is better than inconsistently right. Get to something that’s consistent, which you continue to tune as you understand how it behaves and whether you like or don’t like what it’s doing.
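
[Editorial sketch, not part of the conversation.] The idea of choosing among the ‘ch’, ‘x’, and ‘k’ rules for chi by checking them against a panel of similarly instructed reviewers could look roughly like the Python below. Everything in it is an assumption made for illustration: the names, the panel votes, and the helper functions `romanize`, `panel_consensus`, and `agreement_rate` are hypothetical and are not Dun and Bradstreet’s method. It also leaves out the normalization for optimism, pessimism, and fatigue that Anthony mentions.

```python
from collections import Counter
from typing import Dict, List

# Toy data, invented for illustration: each key is a name with the Greek
# letter chi left in place, and each value holds the spellings chosen by a
# hypothetical panel of similarly instructed reviewers.
PANEL_VOTES: Dict[str, List[str]] = {
    "χristos": ["christos", "christos", "kristos"],
    "χania": ["chania", "chania", "hania"],
    "araχova": ["arachova", "arahova", "arachova"],
}

CANDIDATE_RULES = ["ch", "x", "k"]  # the three options named in the interview


def romanize(name: str, chi_as: str) -> str:
    """Apply one candidate rule: replace every chi with the given spelling."""
    return name.replace("χ", chi_as)


def panel_consensus(votes: List[str]) -> str:
    """Return the spelling most reviewers chose (first seen wins a tie)."""
    return Counter(votes).most_common(1)[0][0]


def agreement_rate(chi_as: str, panel_votes: Dict[str, List[str]]) -> float:
    """Share of names where the rule's output matches the panel consensus."""
    hits = sum(
        romanize(name, chi_as) == panel_consensus(votes)
        for name, votes in panel_votes.items()
    )
    return hits / len(panel_votes)


scores = {rule: agreement_rate(rule, PANEL_VOTES) for rule in CANDIDATE_RULES}
best_rule = max(scores, key=scores.get)
print(scores, best_rule)
```

The point of the sketch is the loop it implies rather than the numbers: each time the rules are tuned, the same panel comparison is rerun, so the machine’s behavior stays consistent and measurable even when it is not yet ‘true’.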

So the first step, then, is to aggregate a large amount of data, what we commonly hear called big data.

I would say the first step would be to become aware of the data that could potentially be aggregated.

So what does that actually mean?

Don’t try to eat the whole salad bar. Don’t try to take everything in. Look at what’s available, decide what you’re going to have for your salad, and have a reason for deciding that.

So you have to be clear about the problem that you’re trying to solve?

Exactly. It really goes back to this: you never lead with the data and you never lead with the technology; you lead with the problem. Now, there are times where you might pull in the data and say, what can this data tell me? But in general, for a business problem you should start with the problem. You should start with what’s the real thing you’re trying to do.

I have used examples with you of discovering fraud, or finding new businesses, or discovering when businesses have died. Those are real business problems. You start with the problem, and then from there you look at the data. There’s the data that you already have, the data that you could go out and discover, and the data that you’re never going to get to. And you have to evaluate
the relative size and importance of those three classes of data against the problem that you’re trying to solve.

So we hear this buzzword big data all the time. What does big data actually mean in the context of your world, as a data scientist who is looking at these large blocks or aggregations of data in a more rigorous way? I guess, compare big data as a marketing phrase versus a large volume of data. And I’ve also heard you use the term smart data in making this comparison.

Yeah, I can only define smart data juxtaposed to big data, so let me take the first predicate in your question first. So big data, you know, we jokingly refer to it as ‘mmm’ now, because
you’re almost not supposed to talk about it anymore, but it hasn’t gone away. Big data is described in many different ways. What I try to do is describe it very formally, very empirically, and very consistently. So you’ll hear me say that you have these aspects of volume, velocity, veracity, variety, and value: the Vs.

And you have a big data problem when those Vs overwhelm the best attempts to deal with them. That doesn’t mean you’re too cheap to hire the right people or you have the wrong technology. But when you throw the best of the best at it and you’re still overwhelmed by one or more of those Vs, now you have a big data problem.

So it’s not just having a lot of data. It’s not just having data that’s changing really
quickly. It’s not just having data where some of it’s true and some of it’s not and you can’t tell the difference. It’s all of those things, more or less at the same time, and when they start to overwhelm the system, that’s when you have a big data problem.

Smart data: some people use that term to differentiate between the big data and the smart data. The smart data is the subset of that data that will actually apply to your problem, that can be used intelligently in a way that takes you towards a solution.

And I would add to that definition: it doesn’t necessarily have to take you towards a solution. It could also take you towards breaking a large unsolved problem down to a smaller problem that’s still unsolved.

Think about, like, curing cancer, right? You may not cure cancer, but you may say, all right, cancer has nothing to do with the color of your blood, moving along. So you’ve taken the problem and made it smaller.

And the other thing about that journey is there might be data that uncovers a question that you forgot to ask before. So we’ve been focused on, are there planets outside our solar system? And we kind of decided that there must be; you know, logic and histology says there has to be. But until recently we couldn’t prove that there were any exo-planets. Now all of a sudden we have tens of thousands of exo-planets that we know about. So the next question along the way is, well, do any of them look like ours? That’s not the only next question, because someone
could say, what’s so special about looking like ours? You know, might they look like something else and still be of interest? So you get these two classes of people: one class is looking for water, and the other class is looking for a certain planetary mass. That’s first asking a question you forgot to ask, and then taking that question and breaking it down into a smaller question that’s still unsolved, but it’s moving you towards an answer.

We have an interesting question from Arsalan Khan, so…

Nothing to do with exo-planets, I assume?

You know, I suspect you may be able to make a linkage here, but I’ll let you do that
one. So you mentioned that this concept of truth is a rather tricky concept, and there is no ground truth necessarily. So he’s wondering: you as a data scientist come up with your conclusions, and then a company executive looking at those conclusions says, no way, not a chance. Your data’s wrong, because that’s not the truth of the world. The truth of the world is this over here. What do you say to that?

Well, first I’d say that if I start by saying here’s the truth that I’ve discovered, then I deserve that kind of a reaction. Data science is about the data part, but it’s also the science part. And we have this thing called the scientific method, which means that we observe the world around us. We form a hypothesis about that world. We ask a research
question. We look at what literature is out there, like what everybody else has done first. We then pick a method to answer our question. We prove that that’s the best method.

Then and only then do we go out and collect some data and use that data according to our method to answer our question. We talk about the answers that we’ve concluded. We talk about the bias in those answers and their weaknesses, and we support our answers. And then, if we’re really good, we raise questions for future research.

So if we did all those things, I don’t just go to the leadership in my organization and say, I think this data proves that there is life on other planets. I go to my leadership and say, I asked myself the question, is there life on other planets?
I said, well, life as we know it right now is based on water and some other things. So what I did was I looked for evidence of water. Here’s how I decided to look for evidence of water: I looked for hydrogen and oxygen and whatever you do. And here’s what I found, and here’s what I think it means.

Now, if you disagree with me, tell me what you think I got wrong. Did I ask the wrong question? Did I understand the data wrong? Did I use the wrong method? If they can answer any of those things, and if I’m a good scientist, I should be able to respond to those things. That’s called defending your hypothesis, right? If I can’t respond, then I’ve done bad science, and then shame on me.

So you’re one of those tricky ones, because, you know, I’m playing the role of an executive:
I hear everything that you’re saying, I see your data, and yet, looking at that planet, it sure looks like it has a pinkish cast to me. And in fact I know that it does, and I’ve been working with planets that have a pinkish cast, or sets of data like this one, my entire life. I know this population. And you’re now telling me something through your scientific methods that contradicts my firm beliefs about how I see the world, and I know the way the world works. What about that?

So Michael, let me respect your knowledge of pink planets. I really appreciate your observation and your experience, and I’m certainly not calling you wrong, because what you believe is what you believe. Help me understand how you’ve come to this opinion about the relationship
between life and pink planets.

And pretty soon what’s going to happen is you’re going to be saying, ‘It just is, it’s in my experience.’ And I’m not calling you wrong. I’m asking you to help me understand why you believe what you believe.

So if we’re really going to be scientists, then that means we have to be open to conflicting opinions, and if we’re open to conflicting opinions, those people who have those opinions should be able to defend them.

Now, sooner or later you could say, look, I’m your boss, and what part of ‘I’m your boss’ did you fail to understand? Go back and prove my pink planet hypothesis right. If you tell me what to go prove, you’re basically asking me to engage in bad science, and now we have a whole different problem.

Boy, it sure sounds like a lot of businesses I know.

Yeah, it does. It sounds like a lot of them I’ve worked with; fortunately not the one I work for right now, not at all. But you also have to be very careful. You can be right and dead, and part of being a good data scientist is being able to use what you’ve learned to tell a story that credibly approaches a problem that somebody has. You can’t just walk in and say, ‘Oh look, I used all of these great methods, and look what I learned, and you should bow down to the data.’ Absolutely not.

You have to understand the problems that people are trying to solve. You have to understand how you can be relevant in the context of those problems. You can’t always do all
those steps I articulated, because time, and money, and reality are going to get in the way sometimes. So you have to be reasonable and practical. But by all means you have to be empirical. You have to do something that you can repeat. You have to do something that you can defend.

You should never use the fact that someone’s in a hurry or shouting loudly as a reason to go and do something completely irrelevant or negligent. You have to be very careful. There are a lot of solutions out there that will let you just ingest a ton of data, push the magic button, and reach some kind of a conclusion. And that’s great; I mean, sometimes that’s all you have. You have no idea what this data means. But at some point you have to do better than that.

A great example is if you just didn’t know
anything about playgrounds, and you drove past a playground and saw a bunch of kids playing in it, you might initially conclude that this is chaos; it’s just a bunch of kids doing stuff.

If you observed the playground more closely, you would see a baseball diamond or a football field; you’d see lines. You’d see things that imply some sort of structure, and you might, if you looked closely, see playground monitors. You might see people there who are enforcing rules. You might see that the little boys and the little girls are doing different things; they’re playing differently. You might start to uncover behavioral aspects.

By using correct observational techniques and being careful about what you see, you’d learn more about that playground. Now, you
could yell and scream and say, I’ve looked at playgrounds all my life and you never, you absolutely never, see business being conducted on a playground. And I say, ‘Well, that’s great. But what about those two guys in the suits over there pointing at the foundations on that jungle gym?’ ‘Oh well, those are contractors, they’re not kids.’ ‘Well, you didn’t say they weren’t kids; you were talking about playgrounds.’ You know, we’ve got to make sure that we understand what we’re saying to each other.

So you were talking earlier about connected spaces, and people and relationships. Can you elaborate on what a connected space is in this context?

Well, great question. You have to be very careful when you use a term like that that
you know what you mean. Things can be connected in many, many different ways, and even defining what a connection is is somewhat problematic. One of the things that we talk about, and what I talked about before, is a dyadic relationship. It’s a relationship between two entities. At Dun and Bradstreet we mostly talk about businesses. A connected relationship might be ownership: you have a branch and a parent owner of that branch; you have a subsidiary and a parent. One type of linkage we define is majority ownership. So if there’s a subsidiary that is more than 50% owned, or something like that, that would be a type of dyadic relationship.

Another type of dyadic relationship amongst business entities might be that they’ve sued each other, or that they’ve mentioned each
other on social media, or that they have collaborated on some observable intellectual property, or that someone from one company is connected to another company on a platform like LinkedIn or Facebook or something like that. Those would all be types of discoverable dyadic relationships.

And then the question is, how can you observe all of those dyadic relationships and how they’re changing over time to form conclusions about things: maybe that the business is growing, or that the two companies are collaborating, or that the two companies are adversaries, or that there seems to be some kind of fraud or malfeasant behavior going on. Those might all be conclusions that you might try to reach by observing those dyadic relationships. There are other relationships,
that are more than one-to-one. They have different names, and they have different problems and uses.

Give us an example of something that’s really hard. What’s the hard kind of problem you face? And maybe talk about it in a business context.

So hard is always involved where people’s behavior is involved. So fraud (we keep talking about fraud), fraud is hard because bad guys keep innovating while we’re innovating in how we detect the behavior of the bad guys, and that’s a really bad problem.

Another example that involves behavior is when businesses have connections to each other that are not formed through owning pieces of each other. So they form temporary relationships,
alliances. They form groups. They form, you know, there are lots of different words for ‘you have to sue us separately’, like, we’re not part of the same thing, right? Really hard, because they’re very squishy kinds of things, and anything that involves behavior is not observing a strict set of rules that you can go and discover.

So it’s when you either have the human element, or you know that there are connections that exist but the companies have been structured to reduce or eliminate, to the extent possible, direct business connections, even though they are related.

Yeah, there may be intent like that behind it, or it might just be something that’s happening sort of organically. You know, if you think about it, sometimes when there’s an
external event, like a flood, or the Arab Spring, or some, you know, major change in who’s in charge of the country or the region or whatever, all of a sudden in the business world you see a lot of shifting around. And it’s not like everybody gets together and says, okay, how are we going to react to the fact that there was an earthquake in Taiwan?

It’s just that there was an earthquake in Taiwan, and now some businesses start doing work for humanitarian reasons and other businesses start seeing opportunities where they didn’t exist before. You get this, I don’t want to say chaotic, but atypical behavior. And to say that you’re going to model that behavior, well, maybe if something very, very similar has happened in a reasonably recent period of time against the same type of universe, you might be able
to do that. But usually all of those preconditions aren’t met, so you have something very squishy that involves behavior, and you have to respond to it or choose not to; either one is a choice.

How do you make the decision of which data problems to solve? Since you mentioned that’s the first question, how do you decide what’s a good data problem to be looking at?

Well, to be looking at and to solve are two different questions. I have to kind of park that, but, you know, there are some guiding principles. We have general guidelines; I call them foul lines. We don’t just do things because they’re fun or interesting or scientifically challenging. There’s got to be some real business frame for it.

At the beginning I talked to you about total risk and total opportunity, so at Dun and Bradstreet
normally we look at things like that. On the total risk side, it usually has something to do with: are they going to pay, are they going to stay in business, are they going to commit some kind of malfeasant act, or are they going to in some way threaten some business objective?

On the opportunity side it’s: how big are they, how much do they look like my best customers, how much do they complement my best customers, what’s the white space in this industry? Those are all opportunity kinds of questions.

So normally I would start from one of those frames. If somebody just said, ‘Here’s this really cool language problem, and you guys do a lot of computational linguistics, wouldn’t you like this? Look at Arabic.’ Well, yeah, of course I’d like to
look at it, but do I have any data to look at? Is it part of a problem that our customers have? And would anybody notice if I made any progress there? And, you know, if the answer to any of those is no, then you probably ought to move on and just keep an eye on this and come back to it later.

A different question here altogether: what’s the relationship between data science, big data, artificial intelligence, and machine learning? We hear these buzzwords thrown around, and usually they’re thrown around by marketing departments. So from a data science perspective, what’s going on with that?

Well, artificial intelligence and machine learning are tools that are used by data science. So
some of these things that you hear about, neural networks and quantum algorithms and machine learning, those are all tools and techniques that can be applied in the field of data science.

Data science is a complex combination of being able to understand the methods for understanding data scientifically and also using it to tell a story. And in the business world that story has to relate to a real problem that is meaningful to the population. So you can think about data science as the part where you use all those other things. And I would also add that we often misuse all of those other things, because you’ve either been tricked into using them by somebody who says that they’ll solve all of your problems, or you are, you know, hoping that this is somehow going to be your silver bullet, or you’ve
been in a conversation that started with my favorite words, ‘Why don’t we just…’, you know, and we’ll push this button and everything will get easier.

So, you know, there’s a sort of dark side to all of this, that you’ll go and use all of those tools and techniques without really understanding what you’re trying to do. That would be like me going into a hardware store or a tool store and buying a laser saw, and I’m not a carpenter. Well, that’s great, you have this tool, and, you know, you’re trying to carve a pumpkin: you bought the wrong tool. You need to understand what you’re trying to do; you don’t just jump right for the tools.

So data science is about telling a business story. Is that the ultimate goal, your end
objective, in this sense?

Using data to address a problem, and to be able to answer that problem in a way that’s meaningful to the business. And what I would add, the science part, is: in a repeatable, defendable way. Many people would not add that last part. I would.

So, defeatable… repeatable. It’s often defeatable as well, but that’s another problem.

Well, if you go through the steps you’ve been describing, then hopefully it’s less defeatable and more repeatable.

Yes, we should hope so, yes.

so what about innovation? right now the internet of things, innovation around data seems to be where the future is taking us, maybe give your point of view on that.

yeah so thanks for that. you know it used to be a couple years ago if you wanted to be a pundit and talk about technology and where we were going you had to say mobile, social, cloud, analytics. you had to get those four words out. and what i’ve been saying recently is that you know those four words lead to lots of other words. so you know if you’re going to talk about mobile you’d better talk about the internet of things, right. mobile technology is just sort of things that are out there and moving around, and the internet of things, some of those things move around and some of them don’t, but they’re certainly out there and we may not necessarily know where they are or what they are when they’re talking, and that presents a whole slew of problems.

just like if you talked about, i don’t know, cloud computing, you’d better be talking about data sovereignty and you know the different rules and regulations. you can’t just put data out on the cloud. nobody thinks there’s hard drives floating around on the cloud, right, that data sits somewhere.

so your question was about the internet of things, which to me is an extension of you know where we were a couple of years ago. and i think a lot of people think this, that we’ve got a thing or two to learn in this space. if you look at bluetooth from a number of years ago, i think, as a good analogy, bluetooth was sort of invented and then it took about 10 years to catch on.

and part of the reason, in my humble opinion, was we forgot to think about a number of questions, like you know, it’s great that you can have a bluetooth headset but how do we keep my headset from discovering your phone and eavesdropping on it, right. well we’ll put this four digit passcode in there and nobody knows the number, so they’re always 0000 or 1234 and all of a sudden all you have to do is try a few numbers. so we’ve got to be better than that. with the internet of things we’ve got you know tens of thousands, hundreds of millions of things right now talking to millions of other things, and we’re sort of making those same mistakes.

there was a big issue not too long ago. i won’t name the company but there was a doll and the doll could talk to a cloud application and your kid could talk to the doll and the doll seemed to know what was going on and it would get smarter as other kids talked to the same doll. that’s great, except someone realized that that was a device on the internet of things that had an ip address, and if i can hack into it, it has a microphone. and if the kid leaves the doll in the parents’ home office i can eavesdrop on the conversation and maybe short a stock or do things that are maleficent, and then that started to happen. whoops, didn’t think of that.

so if you’re going to build something on the internet of things you’d better be thinking about how it might be used in unintended ways. you also better be able to think about what happens if it’s used in intended ways at a scale that goes way beyond what you ever intended. you also better be able to think about how other people might use it to solve unknown unmet needs. somebody starts to use your thing to solve a completely different problem that you didn’t plan on it solving, and you didn’t build it for that purpose, and now all of a sudden you’re negligent in a way that you didn’t even intend.

we’ve got to be a lot smarter about this. we can’t just rush to say, ‘oh isn’t it great that things can talk to other things’. yes, but what might they say to each other and how might they all of a sudden help people do things we didn’t intend. very big questions, we’d better be asking those questions.

and as you’re asking those questions at dun and bradstreet, what are some of the answers or the points of view or the trajectories that you’re coming up with?

well so things themselves don’t necessarily play into our landscape right away, although some of those things might talk to us and ask about businesses. i won’t get into the complexities, but there’s ways that things can ask about businesses, right. the reality is the only things that we foresee asking about businesses right now are other computers. so we worry about the transactional response time of that question and answer and the ontology of the question and the ontology of the answer, and all that’s great, but now do we do anything to detect what type of thing might be on the other end of the question.

and you know without getting into any security, there are things that we do today to make sure that the thing that we’re talking to is something that we intend to be talking to. we’ve got to do a lot more ideation to make sure that what we believe remains true as things get smarter and talk faster and find new ways to whisper in our ear and all of that. just like any other company that touches the internet, you just can’t say, well we were safe yesterday so we’ll probably be safe tomorrow. that’s crazy.

what advice, as we go towards the close here, what advice do you have for business people to use data science effectively?

so maybe i can tell you a quick story about something that happened in my experience here that literally has changed my life. a number of years ago we had this horrible situation in japan where there was an earthquake that caused a tsunami, the tsunami hit the coast of sendai, 20,000 people were washed out to sea. you had a nuclear meltdown at the daiichi power plant. you had all these things happening to japan all at once. absolutely horrific. unprecedented.

no data science in the world ever foresaw anything like that happening, right. and here we are, we come together. i was on a conference call a few days later; i was in japan right before that happened, and we said look, we can do things from a humanitarian standpoint, but also from a business standpoint. there’s got to be something we can do to help these mostly small businesses in japan that are kind of living hand to mouth and now everybody assumes they are now out of business.

many of them are still in business. many of them are still there and doing fine, and if everybody assumes they’re not then things are going to get even worse. on top of radiation and tidal waves, they’re going to have to deal with no money.

so we started to look at a database that said everything was just the way it was just before this thing happened. to fix it the old fashioned way was going to take a very long time, way longer than these people have. and so we had to look at new ways of collecting information. we started to look at new types of data that were available. we looked at crowd sourced radiation data. we taught algorithms how to find the skyline and to measure the change in the skyline before and after. we looked at uninterrupted straight and curved lines that became interrupted and geospatial imaging. we looked at the propagation of the tectonic wave from the epicenter of the earthquake and we built 19 different car detectors to measure whether or not cars were there and what they looked like. you could argue that some of that capability already existed but we didn’t have time to go find it.

very quickly we put all of this together and we built a heuristic like i described to you before and we taught it how to look at the data and we fixed all of the data in japan in about three months, and it would have taken us well over a couple of years the old fashioned way, for very good reasons.

we then had this dataset that was probably the most valuable dataset that you could own at that point relative to japan, and we could have made a lot of money on that. and what we did, we put it on the internet and gave it away for free, and every time i tell that story i get tears in my eyes. so my very long winded answer to your question is we’ve got to be better than just making another dollar.

we’ve got to think about the unintended impact of doing nothing. we’ve got to think about letting the bad guys get ahead of the good guys. we’ve got to think about what we’re teaching our kids. we’ve got to think about what we’re teaching ourselves, or we’re just going to drown in this data and lose unbelievable opportunity and just find ourselves swimming in stupid decisions because we didn’t have the time to do anything better. we’re much better than that, and i truly believe that if we bring science into the room we can at least make new mistakes every day, which is a very good start.

and what words of advice do you have for business people that are dealing with the data and they’re finding that the data is pointing out viewpoints on the world that are different from their previously held beliefs, and we know change is hard, so what advice do you have there?

so i would say three things. first of all just knowing that you have that problem is the first step, so being what’s called the reflective leader. thinking about what you believe in and why you believe it is extremely important. your example of the guy with the pink planets before, screaming that he’s got lots of experience, that’s great. but we’ve got to be better than that. so the first thing is to be very clear about what we believe and why we believe it.

the second step is, once we understand that, and presumably we can ask better questions about the business and what we’re trying to prove and all of that, the second step is to look at the skills that we’re bringing into the organization and make sure that we’re not just bringing in people that have rebranded themselves in this data science space, but people that really understand the different ways of knowing, the different ways of discovery, the different issues with regulation and with synthesis of information; bringing in the skills that we need.

and the last thing and probably the most important thing is constantly looking inwardly at ourselves and making sure that the skills that made us successful so far, those are just table stakes. we’ve got to be constantly improving. this is a whole new world out here and we’ve got to have the conversations that are tough, but you know you’re not as good as you think you were, because you’ve got to be much better tomorrow than to just stay where you were.

anthony scriffignano, chief data scientist at dun and bradstreet, what can i possibly ask you beyond your last comment. thank you so much for taking the time today, it’s been enlightening.

it’s been a delightful conversation, thank you so much for the opportunity.

we have been talking with anthony scriffignano, who is the chief data scientist at dun and bradstreet. what an amazing conversation, and i would like to thank anthony and thank the folks at dun and bradstreet for making this possible. and especially to everybody who is watching, thank you, and come back next time because we’ll be here next friday as always.
