Welcome to episode 156 of CXOTalk. I'm Michael Krigsman, and today I am joined by Anthony Scriffignano, who is the chief data scientist at Dun and Bradstreet. Anthony, how are you today?

Hello Michael, how are you?

I'm doing great. Hey listen, thank you so much for taking the time.

Not at all, it's my pleasure.

Anthony, let's begin by telling us some background about Dun and Bradstreet. It's an interesting company and it's been around for many, many years.

Yeah, it's a fascinating company, to me anyway, and I think to many people. It's been around
for, we're at 174 years now, so it started before the Civil War. And it's been through many, many iterations over the years. The company has between 4,000 and 5,000 employees, but then we also have a worldwide network, partner associations around the world, so it's a pretty big company. Most of our customers focus on problems in the area of either total risk or total opportunity, so think credit and also sales and marketing. And then some of the related issues, like compliance and government relations, onboarding customers, and things like that.

So very quickly, because I'm curious about this. You started before the Civil War, and I know that a number of presidents have actually worked for the company, including Abraham Lincoln, so what did Mr. Dun and Mr. Bradstreet (I'm assuming they were misters) do before
the Civil War?

Well, they were. So if you think about what was going on at that time, you had westward expansion and you had a lot of businesses on the East Coast that were trying to do business with people who were increasingly far away. And it got to the point where you couldn't go to visit them and judge the character and quality of the person, or how real they were, or whether their operations appeared to be significant enough for you. So they wanted people who could essentially be their representative in forming those opinions, and that's how this all got started: help me understand people I can't see. And that's pretty much what we do now. Instead of trying to deal with the two-month stagecoach
ride, it's the two-second trip over the internet, but it's the same problem.

So that two-second trip over the internet comes down to data analytics and data science. So in a sense, back when the company was founded there was the transfer of information, as you said, over stagecoach, and then there was some type of analytical method you used to evaluate the risks. Now you use data science and you're the chief data scientist, so what does that actually mean in this context?

Well, you know, the joke would be, how hard can it be, right? The issue is, as you try to make a decision, let's take ourselves back to pre-Civil War days, right: what you would look at to try and make a decision about whether a business is 'worthy', right.
The first thing is, are they real? And then you ask some questions, like how long have they been around, what kind of business are they in. Well, we do the same kind of thing, but when you think about data science and think about the literally millions of sources of data that are potentially available to make such a decision, how do you decide what's true? How do you decide whether what you're seeing is what it appears to be? How do you find that very small, very new business that just came into being? What happens when a business has a name, or address, or a phone number, or any kind of physical presence that's in some way transient or virtual? So the questions are really the same kinds of questions, but the data science version of it is how do you use new types
of data as opposed to just places where you can go and look. It's a very similar problem but, obviously, much more algorithmic, much more automated, much more 'scientific'.

So how do you use data to determine 'what's true'? That's the question you ask.

That's a big question. So if you think about what true means, sometimes that's relative, right? Suppose the question is, is this business out of business? That seems like a very binary thing, either they're out of business or they're not out of business. Well, not really. When you look at a very small business, they're not necessarily going to go bankrupt, they're not necessarily going to call us and say, by the way, we're going out of business now.
They're not going to put a notice in the newspaper. There's not going to be any kind of press release. There's not going to be anything. They just stop. And then what if they're just resting for a while? What happens if a small business is actually still in business but the proprietor of the business is just doing something else: he's sailing around the world for a year, or he's in the hospital, or she decided to go and do some other business for a while and she'll be back, right? So we have the versions of kind of parked as opposed to definitely gone out of business, and that's a very nuanced kind of thing. So how do you figure that out with a stream of data? Obviously you could look at suits, liens, judgments, business deterioration.
Look at those things as precursors to businesses that really die, as opposed to things that look like they were going well and all of a sudden they stopped. You might look at the type of business that we're talking about. You might look at the location in the world. You might look at the owner of that business in the context of the business and see if you see them popping up elsewhere. There are lots of different signals you might get in a situation like that.

So explain how you go about, as a data scientist, analyzing some of these problems.

Well, let's take the issue of fraud as a great example. So fraud, when we talk about it we think we know what we mean, but everybody means something different. So fraud by any
definition around the world is some sort of misrepresentation of information for financial gain. When people lie to us they haven't gained anything yet, so is that fraud? We call it malfeasance sometimes. If you think about the problem of fraud in the context of how you see it in data, or even how you see it in real life, it's often referred to as a quantum observation problem: when you observe it, it changes. So people committing fraud behave differently when they know they've been detected. And so if you try to use regressive methods that only look backwards at pre-existing data and pre-existing examples of fraud, you'll get very good at catching the things that used to happen, which is counterintuitive because the thing you're looking at, you know
it's changing. So data science would say yes, do that, because it's not going to completely stop, but it's also not sufficient; you need to do more. So how do you find types of bad behavior that haven't occurred yet? Well, the first thing you do is you look for types of behavior you haven't seen before, and then you try to vet those behaviors against behaviors that are known to be malfeasant to try to see if there are similarities. And data science provides non-regressive methods that do things like that, with the connected space, what we call dyadic relationships: relationships among multiple parties, and looking for observable relationships that are different from the ones we've seen before. And then that allows us to focus and address a problem like that. So it's a very long way of saying,
you start looking for things that are new and you start to unpack them and see what they tell you.

But you're doing more than simple comparisons, and in a sense, if I can incorrectly boil down what you just said, you compare that which we don't know to that which we know.

Exactly.

But that seems a fairly trivial observation, so I assume that the data science part is quite a bit more involved than that.

Yes, absolutely. So the part we don't know is where the challenge lies, right? You have so much data in front of you and you have to make a decision: which parts are you going
to look at and which parts are you not going to look at? There's a huge opportunity cost to making a decision like that. You can't just bring in all the data and keep pressing the 'learn things' button, right? So every time new data becomes available there's a step of discovery, realizing that it's available. There's the step of curation, making a decision about whether or not you bring it in, and if you did, what would it mean, and by the way, are you allowed to bring it in: do you have permissible use, things like that. And then there's the synthesis, making sense out of that. And that all sounds easy until you try and do it at the scale of the creation of information, which is off the chart. There's so much information being created right now that we've actually lost the
ability to measure the rate at which it's increasing. Not only don't we know how much information there is, we don't know how fast it's growing anymore.

Okay: discovery, curation, synthesis. Can you give us a concrete example from your work that ties these pieces together, so that we can understand the data analysis process that you go through in order to learn something new from the data that you didn't see before?

Sure, so let me give you an example that seems obvious but is not. Let's suppose that we're trying to understand how a company represents itself around the world in different languages and different writing systems. So you might think that you might translate, but translating works really well for common nouns and it doesn't work very well for
proper nouns. So if you have your own name, how do you represent that in Arabic or Chinese? Those are decisions that you have to make, and they involve sound and the interpretation of maybe the symbols you might use, or how those sounds sound in different languages. Different languages have different phonetic palettes. My name, Scriffignano, has a 'gn' sound in it, the (neah); that's not an English name. So when I tell people how to say it, I say, well, say lasagna, because you already know how to say that, right? So that's a sort of technique, right? So how do you now discover the presence of an organization or a person in different parts of the world when they're represented differently? You can't just sort of flip the letters
around, especially when we're talking about different writing systems. So one of the things that you do is you ingest a very large corpus of information that you understand. So you might ingest something like, think about maybe a chamber of commerce that might produce a listing of the directors and officers, CEOs and owners of businesses. So now I've pulled in, I've found a listing of a whole bunch of names, and I have, let's say, another listing of a whole bunch of names. The curation is trying to make a correlation between those two, saying, how much of this thing that I've just ingested that's in a language I don't know can I understand from the sort of context that it's in? And then the synthesis is, can I discover any rules? So I'm just thinking of an example:
in Greek they have the letter chi, which sort of looks like an x. That sound doesn't really exist in English; does that turn into a ch, or does it turn into an x, or does it turn into a k? And those three different decisions will lead you down different paths. So now once I have that question, is it ch, x, or k, now I can start to look at the data and say which seems to be more appropriate, and over time I can develop rules, and then over time those rules can form new processes. I can tune those processes. I can do what's called heuristic analysis, where I get a group of people to observe what the machine is doing and see whether they agree or disagree, and you tune these things over time and eventually it sort of approaches the collective experience of a person doing the same thing. There's
a thing called the Turing test that you might be familiar with. That's the ultimate example of that: at what point does it appear to you to be intelligent?

So at what point does it appear to you to be intelligent? At what point do you make the decision that all of this analysis, this normalization of multiple data streams, all the analysis that you're doing, that you've done enough? And now, based on that analysis, you actually do know what is 'true'.

So true is a very dangerous word, but what we're looking for is something to converge on. In the case of heuristics, the gold standard is a group of similarly instructed, similarly incentivized people.
So you look at a large enough collection of information and you make sure that you ingest and interpret that information the same as a group of people who are similarly instructed and all have the same to gain or lose. You can't have like 10 experts and five interns; they've got to be kind of the same. And then there are techniques for normalizing for optimism, and pessimism, and for fatigue, and things like that. And eventually what you'll get is not something necessarily always true, but we like to use the phrase that consistently wrong is better than inconsistently right. Get to something that's consistent that you continue to tune as you understand how it behaves and you either like or don't like what it's doing.
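
The chi example described above (is it ch, x, or k?) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not Dun and Bradstreet's actual method: the `BASE` letter table is a deliberately tiny, hypothetical mapping, the name pairs in `PAIRS` are invented for the example, and the function names are made up here. The idea is simply to try each candidate rule against names whose Latin spelling is already known and count which rule explains the data most often.

```python
from collections import Counter

# Hypothetical, incomplete letter table covering only the letters used below.
# It is an assumption for illustration, not a standard romanization scheme.
BASE = {"α": "a", "ρ": "r", "η": "i", "σ": "s", "ς": "s", "τ": "t",
        "ο": "o", "ι": "i", "λ": "l", "μ": "m"}

# The three candidate renderings of chi mentioned in the interview.
CHI_CANDIDATES = ("ch", "x", "k")


def romanize(name, chi):
    """Romanize a Greek name, rendering chi with the given candidate."""
    return "".join(chi if c == "χ" else BASE.get(c, c) for c in name.lower())


def score_rules(pairs):
    """Count, per candidate, how many known (greek, latin) pairs it reproduces."""
    hits = Counter()
    for greek, latin in pairs:
        for chi in CHI_CANDIDATES:
            if romanize(greek, chi) == latin.lower():
                hits[chi] += 1
    return hits


def best_rule(pairs):
    """Return the candidate that explains the most pairs."""
    return score_rules(pairs).most_common(1)[0][0]


# Invented example pairs standing in for an ingested, already-understood listing.
PAIRS = [
    ("χρηστος", "christos"),
    ("χαρης", "charis"),
    ("μιχαλης", "michalis"),
    ("χαρα", "xara"),
]

if __name__ == "__main__":
    print(score_rules(PAIRS))
    print(best_rule(PAIRS))
```

In the spirit of the heuristic analysis described above, the winning rule would then be shown to a group of similarly instructed reviewers and retuned as they agree or disagree; that human step is not modeled here.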
So the first step then is to aggregate a large amount of data, in what we commonly hear termed big data.

I would say the first step would be to become aware of the data that could potentially be aggregated.

So what does that actually mean?

Don't try to eat the whole salad bar. Don't try to take everything in. Look at what's available and decide what you're going to have for your salad, and have a reason for deciding that.

So you have to be clear about the problem that you're trying to solve?
Exactly. It really goes back to: you never lead with the data and you never lead with the technology, you lead with the problem. Now there are times where you might pull in the data and say, what can this data tell me, but in general for a business problem you should start with the problem. You should start with what's the real thing you're trying to do. I have used examples with you of discovering fraud, or finding new businesses, or discovering when businesses have died. Those are real business problems. You start with the problem and then from there you look at the data. There's the set of data: there's the data that you already have, the data that you could go out and discover, and the data that you're never going to get to. And you have to evaluate
the relative size and importance of those three classes of data against the problem that you're trying to solve.

So we hear this buzzword big data all the time. What does big data actually mean in the context of your world, as a data scientist who is looking at these large blocks of data or aggregations of data in a more rigorous way? So I guess compare big data as a marketing phrase versus a large volume of data. And I've also heard you use the term smart data in making this comparison.

Yeah, I can only define smart data juxtaposed to big data, so let me take the first predicate in your question first. So big data, you know, we jokingly refer to it as 'mmm' now because
you're almost not supposed to talk about it anymore, but it hasn't gone away. Big data is described in many different ways. What I try to do is describe it very formally and very empirically and very consistently. So you'll hear me say that you have these aspects of volume, velocity, veracity, variety, and value, the Vs. And you have a big data problem when those Vs overwhelm the best attempts to deal with them. That doesn't mean you're too cheap to hire the right people or you have the wrong technology. But when you throw the best of the best at it and you're still overwhelmed by one or more of those Vs, now you have a big data problem. So it's not just having a lot of data. It's not just having data that's changing really
quickly. It's not just having data where some of it's true and some of it's not and you can't tell the difference. It's all of those things, more or less at the same time, and when they start to overwhelm the system, that's when you have a big data problem. Smart data: some people use that term to differentiate between the big data and the smart data. The smart data is the subset of that data that will actually apply to your problem, that can be used intelligently in a way that takes you towards a solution. And I would add to that definition, it doesn't necessarily have to take you towards a solution. It could also take you towards breaking a large unsolved problem down to a smaller problem that's still unsolved.
Think about, like, curing cancer, right? You may not cure cancer, but you may say, all right, cancer has nothing to do with the color of your blood, moving along. So you've taken the problem and made it smaller. And the other thing about that journey is there might be data that uncovers a question that you forgot to ask before. So we've been focused on, are there planets outside our solar system, and we kind of decided that there must be, you know, logic and histology says there has to be. But until recently we couldn't prove that there were any exoplanets. Now all of a sudden we have tens of thousands of exoplanets that we know about. So the next question along the way is, well, do any of them look like ours? That's not the only next question, because someone
could say, what's so special about looking like ours? You know, might they look like something else and still be of interest? So you get these two classes of people; you know, one class is looking for water and the other class is looking for a certain planetary mass. That's first asking a question you forgot to ask, and then taking that question and breaking it down into a smaller question that's still unsolved but it's moving you towards an answer.

We have an interesting question from Arsalan Khan, so...

Nothing to do with exoplanets, I assume?

You know, I suspect you may be able to make a linkage here. But I'll let you do that
one. So you mentioned that this concept of truth is a rather tricky concept and there is no ground truth necessarily, and so he's wondering: you as a data scientist come up with your conclusions, and then an executive at the company looking at those conclusions says, no way, not a chance. Your data's wrong because that's not the truth of the world. The truth of the world is this over here. And what do you say to that?

Well, first I'd say that if I start by saying here's the truth that I've discovered, then I deserve that kind of a reaction. So data science is about the data part, but it's also the science part. And we have this thing called a scientific method, so it means that we observe the world around us. We form a hypothesis about that world. We ask a research
question. We look at what literature is out there, like what everybody else has done first. We then pick a method to answer our question. We prove that that's the best method. Then and only then do we go out and collect some data and use that data according to our method to answer our question. We talk about the answers that we've concluded. We talk about the bias in those answers, the weaknesses of it, and we support our answers. And then if we're really good we ask questions for future research. So if we did all those things, I don't just go to the leadership in my organization and say, I think this data proves that there is life on other planets. I go to my leadership and I say, I asked myself the question, is there life on other planets?
I said, well, life as we know it right now is based on water and some other things. So what I did was I looked for evidence of water. Here's how I decided to look for evidence of water: I looked for hydrogen and oxygen and whatever you do. And here's what I found and here's what I think it means. Now if you disagree with me, tell me what you think I got wrong. Did I get the wrong question? Did I understand the data wrong? Did I use the wrong method? And if they can answer any of those things, and if I'm a good scientist, I should be able to respond to those things. That's called defending your hypothesis, right? If I can't respond then I've done bad science, and then shame on me.

So you're one of those tricky ones, because yes, you know, I'm playing the role of an executive,
I hear everything that you're saying, I see your data, and yet looking at that planet it sure looks like it has a pinkish cast to me. And in fact I know that it does, and I've been working with planets that have a pinkish cast, or sets of data like this one, my entire life. I know this population. And you're now telling me something through your scientific methods that contradicts firm beliefs of how I see the world, and I know the way the world works. What about that?

So Michael, let me respect your knowledge of pink planets. I really appreciate your observation and your experience, and I'm certainly not calling you wrong, because what you believe is what you believe. Help me understand how you've come to this opinion about the relationship
between life and pink planets. And pretty soon what's going to happen is you're going to be saying, 'it just is, it's in my experience', and I'm not calling you wrong. I'm asking you to help me understand why you believe what you believe. So if we're really going to be scientists, then that means we have to be open to conflicting opinions, and if we're open to conflicting opinions, those people who have those opinions should be able to defend them. Now sooner or later you could say, look, I'm your boss, and what part of 'I'm your boss' did you fail to understand, and go back and prove my pink planet hypothesis right. If you go and tell me what to go prove, you're basically asking me to engage in bad science, and now we have a whole different problem.
Boy, it sure sounds like a lot of businesses I know.

Yeah, it does. It sounds like a lot of them I've worked with, fortunately not the one I work for right now, not at all. But you also have to be very careful. You can be right and dead, and part of being a good data scientist is being able to use what you've learned to tell a story that credibly approaches a problem that somebody has. You can't just walk in and say, 'oh look, I used all of these great methods and look what I learned and you should bow down to the data'. Absolutely not. You have to understand the problems that people are trying to solve. You have to understand how you can be relevant in the context of those problems. You can't always do all
those steps I articulated, because time, and money, and reality are going to get in the way sometimes. So you have to be reasonable and practical. But by all means you have to be empirical. You have to do something that you can repeat. You have to do something that you can defend. You should never use the fact that someone's in a hurry or shouting loudly to go and do something completely irrelevant or negligent. You have to be very careful. There are a lot of solutions out there that will let you just ingest a ton of data and push the magic button and reach some kind of a conclusion. And that's great; I mean, sometimes that's all you have. You have no idea what this data means, but at some point you have to do better than that. A great example is if you just didn't know
anything about playgrounds, and you drove past a playground and you saw a bunch of kids playing in the playground, you might initially conclude that this is chaos; it's just a bunch of kids doing stuff. If you observed the playground more closely, you would see a baseball diamond or a football field, you'd see lines. You'd see things that imply some sort of structure, and you might, if you looked closely, see playground monitors. You might see people there that are enforcing rules. You might see the little boys and the little girls are doing different things, they're playing differently. You might start to uncover behavioral aspects. By using correct observational techniques and being careful about what you see, you'd learn more about that playground. Now you
could yell and scream and say, I've looked at playgrounds all my life and you never, you absolutely never see business being conducted on a playground. And I say, 'Well, that's great. But what about those two guys in the suits over there pointing at the foundations on that jungle gym?' 'Oh well, those are contractors, they're not kids.' 'Well, you didn't say they weren't kids, you were talking about playgrounds.' You know, we've got to make sure that we understand what we're saying to each other.

So you were talking earlier about connected spaces, and people and relationships. Can you elaborate: what is a connected space in this context?

Well, great question. So you have to be very careful when you use a term like that that
you know what you mean. Things can be connected in many, many different ways, and even defining what a connection is is somewhat problematic. One of the things that we talk about, and what I talked about before, is a dyadic relationship. It's a relationship between two entities. So at Dun and Bradstreet we mostly talk about businesses. A connected relationship might be ownership, so you have a branch and a parent owner of that branch; you have a subsidiary and a parent. So one type of linkage we define is majority ownership. So if there's a subsidiary that is more than 50% owned, or something, that would be a type of dyadic relationship. Another type of dyadic relationship amongst business entities might be if they've sued each other, or that they've mentioned each
other on social media, or that they have co-collaborated in some observable intellectual property. Or that someone from one company is connected to another company on a platform like LinkedIn or Facebook or something like that. Those would all be types of discoverable dyadic relationships. And then the question is how can you observe all of those dyadic relationships and how they're changing over time to form conclusions about things, like maybe the business is growing, or that the two companies are collaborating, or that the two companies are adversaries. Or that there seems to be some kind of fraud or malfeasant behavior going on. Those might all be conclusions that you might try to reach by observing those dyadic relationships. There are other relationships,
that are more than one-to-one. They have different names and they have different problems and uses.

Give us an example of something that's really hard. What's the hard kind of problem you face, and maybe talk about it in a business context?

So hard is always involved where people's behavior is involved. So fraud, we keep talking about fraud, and fraud is hard, because bad guys keep innovating while we're innovating in how we detect the behavior of the bad guys, and that's a really bad problem. Another example that involves behavior is when businesses have connections to each other that are not formed through owning pieces of each other. So they form temporary relationships,
alliances. They form groups. They form, you know, lots of different words for 'you have to sue us separately'; like, we're not part of the same thing, right? Really hard, because they're very squishy kinds of things, and the thing that involves behavior is not observing a strict set of rules that you can go and discover.

So it's when you either have the human element, or you know that there are connections that exist but the companies have been structured to reduce or eliminate, to the extent possible, direct business connection even though they are related there.

Yeah, there may be intent like that behind it, or it might just be something that's happening sort of organically. You know, if you think about sometimes when there's an
external event like a flood or the Arab Spring or some, you know, major change in who's in charge of the country or the region or whatever. All of a sudden in the business world you see a lot of shifting around, and it's not like everybody gets together and says, okay, how are we going to react to the fact that there was an earthquake in Taiwan. It's just that there was an earthquake in Taiwan, and now some businesses start doing work for humanitarian reasons and other businesses start seeing opportunities where they didn't exist before. You get this, I don't want to say chaotic, but atypical behavior, and to say that you're going to model that behavior, maybe if something very, very similar has happened in a reasonably recent period of time against the same type of universe you might be able
to do that. But usually all of those preconditions aren't met, so you have something very squishy and it involves behavior, and you have to respond to it, or choose not to; either one is a choice.

How do you make the decision of which data problems to solve? Since you mentioned that's the first question, how do you decide what's a good data problem to be looking at?

Well, to be looking at and to solve are two different questions, I have to kind of park that, but you know, there are some guiding principles. So we have general guidelines; I call them foul lines. We don't just do things because they're fun or interesting or scientifically challenging. There's got to be some real business frame for it. At the beginning I talked to you about total risk and total opportunity, so at Dun and Bradstreet
normally we look at things like that. On the total risk side, it usually has something to do with: are they going to pay, are they going to stay in business, are they going to commit some kind of malfeasant act, or are they going to in some way threaten some business objective. On the opportunity side it's: how big are they, how much do they look like my best customers, how much do they complement my best customers, what's the white space in this industry. Those are all opportunity kinds of questions. So normally I would start from one of those frames. If somebody just said, 'Here's this really cool language problem and you guys do a lot of computational linguistics, wouldn't you like to look at Arabic?', well yeah, of course I'd like to
look at it, but do I have any data to look at it, and is it part of a problem that our customers have, and would anybody notice if I made any progress there? And you know, if the answer to any of those is no, then you probably ought to move on and just keep an eye on this and come back to it later.

A different question here altogether: what's the relationship between data science, big data, artificial intelligence, machine learning? We hear these buzzwords thrown around, and usually they're thrown around by marketing departments, so from a data science perspective what's going on with that?

Well, artificial intelligence and machine learning are tools that are used by data science. So
some of these things that you hear about, neural networks and quantum algorithms and machine learning, those are all tools and techniques that can be applied in the field of data science. Data science is a complex combination of being able to understand the methods for understanding data scientifically and also using it to tell a story. And in the business world that story has to relate to a real problem that is meaningful to the population. So think about data science as the part where you use all those other things. And I would also add, we often misuse all of those other things, because you've either been tricked into using them by somebody who says that they'll solve all of your problems, or you are, you know, hoping that is somehow going to be your silver bullet, or you've
been in a conversation that started with my favorite words, 'why don't we just...', you know, and we'll push this button and everything will get easier. So you know, there's a sort of a dark side to all of this, that you'll go and use all of those tools and techniques without really understanding what you're trying to do. That would be like me going into a hardware store or into a tool store and buying a laser saw, and I'm not a carpenter. Well, that's great, you have this tool, and you know, you're trying to carve a pumpkin: you bought the wrong tool. You need to understand what you're trying to do; you don't just jump right for the tools.

So data science is about telling a business story. Is that the ultimate goal, your end
objective in this sense? using data to address a problem and to be able to answer that problem in a way that's meaningful to the business. and what i would add, the science part is, in a repeatable defendable way. many people would not add that last part. i would. so defendable, repeatable. it's often defeatable as well but that's another problem. well if you go through the steps you've been describing then hopefully it's less defeatable and more repeatable. yes we should hope so yes.
so what about innovation, so right now the internet of things, innovation around data seems to be where the future is taking us, maybe give your point of view on that. yeah so thanks for that. you know it used to be a couple years ago if you wanted to be a pundit and talk about technology and where we were going you had to say mobile, social, cloud, analytics. you had to get those four words out. and what i've been saying recently is that you know those four words lead to lots of other words. so you know if you're going to talk about mobile you'd better talk about the internet of things right. mobile technology is just sort of things that are out there and moving around, and the internet of things, some of those things move around and some of them don't but they're certainly
out there and we may not necessarily know where they are or what they are when they're talking and that presents a whole slew of problems. just like if you talked about, i don't know, cloud computing you'd better be talking about data sovereignty and you know the different rules and regulations. you can't just put data out on the cloud. nobody thinks there's hard drives floating around on the cloud right, that data sits somewhere. so your question was about the internet of things which to me is an extension of you know where we were a couple of years ago. and i think a lot of people think this, that we've got a thing or two to learn in this space. if you look at bluetooth from a number of years ago i think as a good analogy, bluetooth
was sort of invented and then it took about 10 years to catch on. and part of the reason, in my humble opinion, was we forgot to think about a number of questions, like you know, it's great that you can have a bluetooth headset but how do we keep my headset from discovering your phone and eavesdropping on it right. well we'll put this four-digit passcode in there and nobody knows the number, so they're always 0000 or 1234 and all of a sudden all you have to do is try a few numbers. so we've got to be better than that. with the internet of things we've got you know tens of thousands, hundreds of millions of things right now talking to millions of other things, and we're sort of making those same mistakes. there was a big issue not too long ago. i
won't name the company but there was a doll and the doll could talk to a cloud application and your kid could talk to the doll and the doll seemed to know what was going on and it would get smarter as other kids talked to the same doll. that's great except someone realized that that was a device on the internet of things that had an ip address, and if i can hack into it it has a microphone. and if the kid leaves the doll in the parents' home office i can eavesdrop on the conversation and maybe short stock or do things that are maleficent, and then that started to happen. whoops, didn't think of that. so if you're going to build something on the internet of things you'd better be thinking about how it might be used in unintended ways. you also better be able to think about what
happens if it's used in intended ways at a scale that goes way beyond what you ever intended. you also better be able to think about how other people might use it to solve unknown, unmet needs. somebody starts to use your thing to solve a completely different problem that you didn't plan on it solving, and you didn't build it for that purpose and now all of a sudden you're negligent in a way that you didn't even intend. we've got to be a lot smarter about this. we can't just rush to say, 'oh isn't it great that things can talk to other things', yes, but what might they say to each other and how might they all of a sudden help people do things we didn't intend. very big questions, we'd better be asking those questions.
and as you're asking those questions at dun and bradstreet, what are some of the answers or the points of view or the trajectories that you're coming up with? well so things themselves don't necessarily play into our landscape right away, although some of those things might talk to us and ask about businesses. i won't get into the complexities, but there's ways that things can ask about businesses right. the reality is the only things that we foresee asking about businesses right now are other computers. so we worry about the transactional response time of that question and answer and the ontology of the question and the ontology of the answer and all that's great, but now do we do anything to detect what type of thing might be on the other end of the question.
and you know without getting into any security, there are things that we do today to make sure that the thing that we're talking to is something that we intend to be talking to. we've got to do a lot more ideation to make sure that what we believe remains true as things get smarter and talk faster and find new ways to whisper in our ear and all of that, just like any other company that touches the internet, you just can't say, well we were safe yesterday so we'll probably be safe tomorrow. that's crazy. what advice as we go towards the close here, what advice do you have for business people to use data science effectively? so maybe i can tell you a quick story about something that happened in my experience here
that literally has changed my life. a number of years ago we had this horrible situation in japan where there was an earthquake that caused a tsunami, the tsunami hit the coast of sendai, 20,000 people were washed out to sea. you had a nuclear meltdown at the daiichi power plant. you had all these things happening to japan all at once. absolutely horrific. unprecedented. no data science in the world ever foresaw anything like that happening right. and here we are, we come together. i was on a conference call a few days later. i was in japan right before that happened, and we said look, we can do things from a humanitarian standpoint, but also from a business standpoint. there's got to be something we can do to help these mostly small businesses in japan that are
kind of living hand to mouth and now everybody assumes they are now out of business. many of them are still in business. many of them are still there and doing fine and if everybody assumes they're not then things are going to get even worse. on top of radiation and tidal waves, they're going to have to deal with no money. so we started to look at a database that said everything was just the way it was just before this thing happened. to fix it the old fashioned way was going to take a very long time. way longer than these people have. and so we had to look at new ways of collecting information. we started to look at new types of data that were available. we looked at crowdsourced radiation data. we taught algorithms how to find the skyline and to measure the change
in the skyline before and after. we looked at uninterrupted straight and curved lines that became interrupted and geospatial imaging. we looked at the propagation from the tectonic wave from the epicenter of the earthquake and we built 19 different car detectors to measure whether or not cars were there and what they looked like. you could argue that some of that capability already existed but we didn't have time to go find it. very quickly we put all of this together and we built a heuristic like i described to you before and we taught it how to look at the data and we fixed all of the data in japan in about three months, and it would have taken us well over a couple of years the old fashioned way for very good reasons. we then had this dataset that was probably
the most valuable dataset that you could own at that point relative to japan, and we could have made a lot of money on that. and what we did, we put it on the internet and gave it away for free, and every time i tell that story i get tears in my eyes. so my very long winded answer to your question is we've got to be better than just making another dollar. we've got to think about the unintended impact of doing nothing. we've got to think about letting the bad guys get ahead of the good guys. we've got to think about what we're teaching our kids. we've got to think about what we're teaching ourselves or we're just going to drown in this data and lose unbelievable opportunity and just find ourselves swimming in stupid decisions
because we didn't have the time to do anything better. we're much better than that, and i truly believe that if we bring science into the room we can at least make new mistakes every day which is a very good start. and what words of advice do you have to business people that are dealing with the data and they're finding that the data is pointing out viewpoints on the world that are different from their previously held beliefs and we know change is hard, so what advice do you have there? so i would say three things. first of all just knowing that you have that problem is the first step, so being what's called the reflective leader. thinking about what you
believe in and why you believe it is extremely important. your example of the guy with the pink planets before, screaming that he's got lots of experience, that's great. but we've got to be better than that. so the first thing is to be very clear about what we believe and why we believe it. the second step is once we understand that and presumably we can ask better questions about the business and what we're trying to prove and all of that. the second step is to look at the skills that we're bringing into the organization and make sure that we're not just bringing in people that have rebranded themselves in this data science space, but people that really understand the different ways of knowing, the different ways of discovery, the different issues with regulation and with
synthesis of information; bringing in the skills that we need. and the last thing and probably the most important thing is constantly looking inwardly at ourselves and making sure that the skills that made us successful so far, those are just table stakes. we've got to be constantly improving. this is a whole new world out here and we've got to have the conversations that are tough, but you know you're not as good as you think you were, because you've got to be much better tomorrow than to just stay where you were. anthony scriffignano, chief data scientist at dun and bradstreet, what can i possibly ask you beyond your last comment. thank you so much for taking the time today, it's
been enlightening. it's been a delightful conversation, thank you so much for the opportunity. we have been talking with anthony scriffignano, who is the chief data scientist at dun and bradstreet. what an amazing conversation and i would like to thank anthony and thank the folks at dun and bradstreet for making this possible. and especially to everybody who is watching thank you and come back next time because we'll be here next friday as always.