Lumira’s blend of agility and power has made it very popular with financial analysts. Let’s examine using some simple statistical functions to do a quick sanity check of some of our data to check for fraud. For this blog we’ll implement Benford’s law in SAP Lumira. The application of this mathematical law is useful in fraud detection during financial audits.
If you don’t trust that your data is correct, then you cannot make data driven decisions.
Wikipedia describes Benford’s law as “an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets which obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time.”
Through this, if your data conforms to this slope, you can say that it has less chance of being fraudulent. If it does not conform to this slope, then it warrants further investigation.
Let’s walk through how you would easily implement this statistical test.
I am using Lumira 1.31 and a sample dataset from stackoverflow which can be downloaded here:
http://samplecsvs.s3.amazonaws.com/SalesJan2009.csv
First, we’ll import the file into Lumira 1.31.
Then the magic is going to happen in the ‘prepare’ room. Click on prepare
Benford’s law treats numbers as strings, so we need to convert our price object to a string. Click on the header of the price field and you will see a number of data action on the right hand side. Choose to convert to text with 0 decimal places
This will create a ‘price (2)’ object. Click the gear icon on this new field and select ‘create calculated dimension
’
Name this new dimension ‘leftmost int.’
Use the substring formula to extra the first character
SubString({Price (2)},1,1)
Once you do this you will find that on the right hand side Lumira automatically profiles the values, so we can start to see the distribution. However Benford’s law require that we really compare the log values of these leftmost int. So, we need to do a bit of math on top of the new field we just created. To do this, let’s convert the leftmost int into a number type.
Click on the header of the leftmost int and select ‘convert to number’
This creates a ‘leftmost int (2)’ column. Click the gear icon on Leftmost int (2) and select ‘create measure’
We will then create a new measure that will use this leftmost (2) object
Name this measure Benford distribution. The formula is:
Log({leftmost int (2)} + 1) - Log({leftmost int (2)})
You are now ready to visualize the data. Go to the Visualize room and create a Column chart with Benford distribution as the measure and leftmost int as the X axis
We can see that according to the distribution, this dataset appears to be highly suspect. Given that this is a sample dataset, it fits with our expectation.
I hope this blog has been helpful. Please post any questions below