How to Detect Fraud Using Benford's Law
Hi, welcome to another New Jersey Friends Accountant discussion. Today's discussion is going to be very interesting because it covers one of the techniques we use in the vast majority of our forensic accounting analyses. It's called Benford's Law. A lot of people have been asking me to go into some of the techniques we actually use, so I'm going to discuss Benford's Law.
Now, as a forensic accountant, I have many ways and techniques to spot fraud. One of them is especially useful when analyzing tax returns, general ledgers, and other items that contain a large amount of numerical data. Remember, when we get into a case, a lot of times people will give us a huge amount of information, millions or hundreds of millions of pieces of data, and we have to find out whether it occurred naturally or whether there is fraud. Is it manipulated in some way? The first thing we always do is apply Benford's Law.
What the law basically states is that in a large set of naturally occurring numbers, each digit from 1 through 9 shows up as the first digit with a specific, predictable frequency. Those frequencies come from what's called a base 10 logarithm, and it's very, very accurate. When you apply this to a large amount of numbers, you should get something that looks like this bar chart right here. You can see that the bars are about 30%, 18%, and 12% for the digits 1, 2, and 3, and the chart slants down from there. If you run this analysis on your data and you don't get something that looks like this, there's a high probability that the data is manipulated in some way. It's not a natural occurrence. For example, if you apply Benford's Law to the distance of the planets from the sun, to see if it's manipulated, if someone put the planets there or if it's random, you'll get a histogram that looks just like this.
Or if you plot the distance of the stars from the Earth, you'll get something like this, and if you take all the phone numbers in the phonebook, you'll get the same histogram.
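As a rough sketch of where the bars in that chart come from: the standard statement of Benford's Law gives the expected share of numbers whose first digit is d as log10(1 + 1/d). The function name below is just an illustration and does not come from the video.

```python
import math


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine shares add up to 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected bar heights as percentages.
for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
```

The first three values come out at roughly 30%, 18%, and 12%, which matches the bars described above.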
The way it works is simple: we take the first digit of each number in a data set. We analyze those digits, and we spot anomalies that tip us off that there's a high probability the numbers have been manipulated. From there, we can perform a forensic accounting. Once I know there are problems with the data we have, we can dig down and find out what happened. For example, here is what it should look like, based on Benford's Law. But now look at these here: revenue per PSE firms, population, motor vehicle theft cases. They're not really in line. What that's telling me is that this data is probably manipulated in some way. Something's wrong with the population count. Something's wrong with the number of motor vehicle theft cases. Maybe some of these aren't thefts. The one that's pretty close is population, almost, so maybe there's just a problem up here, something going on. But anyway, we apply this, you look at the data, and it's pretty easy to see that there's an issue there.
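In the video the charts are compared by eye. One possible way to put a number on "not really in line" is the mean absolute deviation between the observed first-digit shares and the Benford shares. That measure is a stand-in chosen for this sketch, not something named in the video, and the flat distribution below is made up purely for illustration.

```python
import math

# Benford shares for leading digits 1-9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def mean_absolute_deviation(observed_shares):
    """Average gap between observed and Benford shares over digits 1-9.

    observed_shares maps a digit to its share of the data (a fraction, not a
    percentage). Digits missing from the mapping count as a share of zero.
    """
    gaps = (abs(observed_shares.get(d, 0.0) - BENFORD[d]) for d in range(1, 10))
    return sum(gaps) / 9


# Hypothetical example: every first digit equally common.
flat = {d: 1 / 9 for d in range(1, 10)}
print(mean_absolute_deviation(BENFORD))  # a perfect match has no gap
print(mean_absolute_deviation(flat))     # a flat distribution has a clear gap
```

A larger value means the histogram sits further from the expected shape. Where to draw the line between "close" and "off" is a judgment call that this sketch does not settle.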
Now, the steps in using Benford's Law. Let me just say, it's very difficult to understand Benford's Law. It's very complex. Explaining the logarithm and all that can take days, if not weeks, to understand. So I'll go through an example, because that's the easy way to understand it. I'm making this video when COVID-19 is pretty prevalent. It's the end of the summer, and a lot of cases have been reported to the CDC. Now, some people are saying that the cases are over-reported, that hospitals are over-reporting cases, that the deaths are over-reported. So what I'm going to do is go to the CDC website and apply Benford's Law to the data, and I'll show you how we actually do it in real life. Here is the website for the Centers for Disease Control and Prevention. Here they're talking about the deaths. I mean, this disease is horrible. You have total cases, new cases, and USA deaths over 200,000. What I'm going to do is download cases in the last seven days by territory. I'm going to download this data here, and let's see what it looks like. Okay, here's what I get when I download this from the CDC website. I'm going to fix up this data so that we can use it.
So you can see here the total cases, confirmed cases, probable cases, and so on. It goes through all this good data. Let's say someone hired my firm to do a fraud analysis. The first thing I would do is use Benford's Law. I would basically get rid of all this other data and just focus on the total cases in each state. Now, we've already done this. We take the data and put it in an Excel sheet, because Excel has some decent formula capabilities. Once we've done that, it looks like this. You can see that we have the states and the number of reported cases, and then we use this formula, which is =LEFT(B2). It goes to this cell, takes the first digit, and puts it in this column. We do that for all the states and some of the territories in the United States. Then we list the numbers 1 through 9, which are the digits, because Benford's Law states that a certain share of these numbers should start with a 1, a certain share should start with a 2, a certain share should start with a 3, and so on.
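For readers working outside Excel, the first-digit step can be sketched in Python. This is an approximation of the spreadsheet formula rather than the exact one used in the video: it skips signs, leading zeros, and decimal points, and returns None for a value of zero. The function name and the sample rows are hypothetical.

```python
def first_digit(value):
    """Return the first non-zero digit of a number, or None if it has none.

    Plays the role of Excel's LEFT on a cell, except that it skips anything
    that is not one of the digits 1-9, so 0 and blanks yield None.
    """
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    return None


# Hypothetical rows of (name, reported cases), invented for illustration.
rows = [("State A", 23456), ("State B", 801), ("Territory C", 0)]
for name, cases in rows:
    print(name, cases, first_digit(cases))
```

Returning None for zero keeps those rows out of the later counts, so it is worth checking how many rows were dropped before computing percentages.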
Then I go to this other formula, COUNTIFS. What it does is count all the numbers that start with each digit: there are 23 that start with 1, 8 that start with 2, and so on down to 2 that start with 9. It goes through all these columns. There's a total of 56 here, and I verify that because some of these have zeros. So there are basically 56 states and territories in this database that we downloaded. Then we do a calculation: 41% of the numbers here start with 1, and 14% start with 2 and 3. Now we have the percentages, and then we just draw this histogram here. You can see this does not look like what Benford's Law would predict. The percentages are way out of whack compared with the logarithm. So this is telling me, just looking at this data, that this is not legitimate data. At this point, if we were doing this case, I would tell my clients, "Hey listen, the data we're looking at is definitely manipulated." What we need to do next is go in and look at the various hospitals and see how they're reporting these cases. Where are they coming from? How reliable is the data? We test it, and we actually back it up. But if I had to go to court, this is the first thing I would show is that [inaudible 00:08:22], they're saying that the data is manipulated, because we know what it should look like, right? What should it look like? It should look like this. It doesn't. It looks like this. So anyway, Benford's Law is great. I recommend it if you get into large databases. It lets you quickly find out if there's something you want to look into.
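The counting and comparison steps can also be sketched in Python. The video compares the histogram to the Benford chart by eye; the chi-square goodness-of-fit statistic below is a common substitute for that visual check and is an assumption of this sketch, not the method shown. The full per-digit counts from the video are not available here, so the example uses the first 56 powers of 2, a sequence known to follow Benford's Law closely, as stand-in data.

```python
import math
from collections import Counter


def digit_counts(values):
    """Count how many values start with each digit 1-9 (the COUNTIFS step)."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            if ch in "123456789":  # first non-zero digit; zeros are skipped
                counts[int(ch)] += 1
                break
    return [counts.get(d, 0) for d in range(1, 10)]


def chi_square_vs_benford(counts):
    """Chi-square statistic of observed first-digit counts against Benford."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no non-zero values to test")
    stat = 0.0
    for d, observed in zip(range(1, 10), counts):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed - expected) ** 2 / expected
    return stat


# Stand-in data: 2**1 through 2**56, NOT the CDC figures from the video.
values = [2 ** k for k in range(1, 57)]
counts = digit_counts(values)
shares = [c / sum(counts) for c in counts]
print(counts)
print([f"{s:.0%}" for s in shares])
print(chi_square_vs_benford(counts))
```

With nine digit categories there are 8 degrees of freedom, and the usual 5% critical value is 15.507. A statistic above that suggests the first digits do not fit Benford's Law, which is a reason to dig into the underlying records rather than a conclusion in itself, especially with only a few dozen values.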
So listen guys, we went through this quickly. If you have any questions, just leave them below. And if you like this video, please join my YouTube channel. It helps a lot with getting the recognition and the name out there for this kind of stuff. We hope you enjoyed it. Thanks a lot. Bye.