Understanding how Artificial Intelligence can redesign recovery audit services
Introduction
For the largest part of my professional life I have been working in the outsourcing sector for various large companies that did business in different industries, from manufacturing to retail to banking. Within these companies I [anonymized] I [anonymized], and accounts receivable.
Over these more than 10 years of experience I have gained a deep understanding of the back office processes of large multinational companies and have been exposed to the difficulties they have in avoiding operational errors, especially when we are talking about millions of transactions every year (invoices, [anonymized], [anonymized]). For the last 2 years I have been working for Societe Generale European Business Services (a [anonymized]) where I [anonymized], accounts receivable and management accounting areas.
My teams handle approximately 1,000,000 transactions per year (400,000 invoices, 200,000 purchase orders, etc.) and we make about 4 [anonymized] (under- and overpayments), VAT errors and fraud. With this in mind I started an internal project covering a part of our transactions and only one type of error, i.e. to identify double payments made in the last 5 years and to attempt to recover as much as possible. Within this project we used our extensive experience in the transactional space to define algorithms which would point towards potential double payments. We did so and the project was a [anonymized] 4 million euros in double payments, out of which we were able to recover close to 2 million euros. The project funded itself and had a payback period of a couple of weeks.
This brings me to my motivation for proposing a [anonymized]. I believe that this is a strategically important project for two very important reasons. The first one is that within the above-mentioned project we only looked at a small fraction of the Societe Generale scope and only at one type of error (i.e. duplicate payments), and as such there is great potential for a recovery audit spin-off or additional service line which could offer this activity to all of the entities of the group, especially as the service would improve the client's bottom line at no cost. This effort would finance itself out of the operational errors that are recovered, and a part would go directly to the client as a net benefit with no effort involved.
The second reason is that SG European Business Services should, at some point after the offshoring of processes is finished, expand into other areas in order to continue its growth, and creating such a spin-off would allow it to go out on the Romanian market, where such a service is not very present. The spin-off would validate the business opportunity within the group and would gain a lot more experience in this business as well as fine-tune the technology aspects (tools, algorithms, storage). With this validation, expanding on the Romanian market would be a lot easier in terms of credibility and chances of success.
I believe that this is the right moment to start this service for a few good reasons. The first one is that big data and machine learning have gained a lot of momentum in the finance area and technologies to support them are available. Within this spin-off we would start to leverage the extensive business knowledge gathered in the transactional space on how to identify various operational errors and risks, and with big data and machine learning we would develop a solution that we would apply within the group before expanding to the Romanian market. The second reason is that, Societe Generale being a very large company, there are many entities where this learning can take place, which would not only improve the algorithms used but would also generate a lot of revenue for the spin-off and a positive bottom line impact for the respective entities.
Background on recovery audit
Companies today, and especially large companies that deal with a very large number of payment transactions, have significant issues in avoiding financial losses due to errors in the process chain. According to a report published by Ardent Partners (Ardent Partners – ePayables 2013: AP's New Dawn) in 2013, close to 1% to 3% of a company's accounts payable is lost due to duplicate payments and overpayments. In addition, the Institute of Finance & Management (Institute of Finance & Management – 2013: AP Department Benchmarks and Analysis) has estimated that 0.1% to 0.5% is lost due to duplicate payments alone. To exemplify this dramatic situation: if a company has paid 100 million euros, it has lost 100 to 500 thousand euros due to duplicate payments alone.
This situation has caused the emergence of a large number of recovery audit companies. Recovery audit companies are specialized in identifying and recovering lost money caused by duplicate payments, overpayments and/or under-deductions. The services of recovery audit companies have been requested by large companies to help them identify any erroneous transactions that may have caused them a loss they are not even aware of. External audit usually does not offer the same depth of analysis and keeps findings at a high level, while recovery audits analyse a significantly higher number of transactions and subsequently have more findings.
There is another factor that has pushed companies into attempting to analyse their payment transactions more thoroughly which is the Sarbanes-Oxley act of 2002. This act was signed by President Bush in an attempt to re-establish confidence in the US markets after several financial disasters culminating with the Enron scandal. The focal point of the Sarbanes-Oxley Act is the financial statement certification that attempts to create a transparent representation of the financial situation as well as the effectiveness of internal controls and procedures.
So while Sarbanes-Oxley is aimed at ensuring accurate financial statements, recovery audit has the same aim, as through the amounts recovered the financial statements can be improved (Section 302 of the Sarbanes-Oxley Act, SOX-Online, www.sox-online.com).
Business Context
Now let's examine the business problem in more detail to better understand the issue companies face in identifying these operational errors. In order to better grasp the issue we will narrow the analysis down to one type of operational error, namely duplicate payments.
We need to look at some of the main causes of duplicate payments to better understand the context.
Invoice related causes of duplicate payments include the following:
Multiple copies of the same invoice being sent due to delay in payment
High number of invoices processed and as such hard to control
Human error in data input (mistake in keying a 0 instead of the letter O)
Intentional fraudulent transmission of multiple invoices by the same supplier
As mentioned above, the Sarbanes-Oxley Act forces large companies, and especially listed ones, to perform a specific set of controls in order to ensure the risk is mitigated.
In a study by KPMG (KPMG: Low Tech Approach to High Risk Challenges – 2012), 64% of organisations rely on manual controls to verify duplicate invoices. If we now look at the controls deployed by most companies on their invoices in order to avoid duplicate payments, we can observe the following:
most companies have a control within their operation to avoid duplicate payments, however in all cases there is a person that actually verifies the outcomes of these controls
these controls are based on specific criteria (e.g. same invoice number, amount, supplier) that point towards a potential error
these criteria are static, i.e. they do not evolve on their own based on real outcomes
these static controls generate a high number of false positives; because companies have limited human resources to look at these potential errors, they deliberately choose to check only the potential errors with the highest likelihood of being a true error and as such leave many payment transactions unverified.
The above was also confirmed by subject matter experts (SMEs) from the outsourcing and banking industries, such as:
Alexandru Stoenescu – Vice-President Finance, Genpact LLC (global professional services firm focusing on Business Process Outsourcing (Business Services & Technology Solutions) and Digital Transformation)
Lucian Vintu – General Manager Finance, WNS Global Services (global Business Process Management leader)
Serge Pouhaer – Global Process Owner Procure to Pay for Societe Generale Group
In addition, the SMEs confirmed that they are actively looking for solutions driven by new technological developments to increase the efficiency of their duplicate payment controls, as the controls established among their clients are not effective in mitigating the risk.
All of the SMEs confirmed their strong conviction that the solution to this problem will be provided by emerging technologies such as artificial intelligence supported by a big data foundation. As such, the research objective will be to understand "how can artificial intelligence support improving the identification of duplicate payments". Based on this research question the idea will be developed into a consultancy plan for the above-mentioned spin-off.
Let's explore in more depth what artificial intelligence is and subsequently how it can impact the recovery audit service.
Artificial intelligence – what is it?
(Figure: definitions of artificial intelligence, from Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach)
There are many definitions of artificial intelligence, but a very good consolidation of these definitions was made by Stuart Russell and Peter Norvig, shown above, organised along two main dimensions. The definitions on top look at thought processes and reasoning, while the ones at the bottom focus more on behaviour. In addition, the definitions on the left measure the outcome against human performance, while the ones on the right measure it against an ideal concept of intelligence.
As such, according to Russell and Norvig, this leaves four possible objectives within the field of artificial intelligence, described in the sections below.
Historically all four avenues were explored and all of them have provided valuable insights that we will explore in more detail.
Systems that act like humans
In the 1950s Alan Turing attempted to provide an operational definition of intelligence in terms of achieving a specific cognitive performance sufficient to fool an interrogator: if, during an interrogation of a computer by a person, the person cannot say whether on the other end of the conversation there is a computer or a human being, we can consider this computer intelligent. In order to fulfil these requirements the computer would need to have the following capabilities:
Natural language processing: the ability to process and understand human languages
Knowledge representation: to store and understand received information
Automatic reasoning: using the stored information to generate new conclusions and answer questions
Machine learning: to adapt to new scenarios and to identify and extrapolate patterns
Systems that think like humans
In order to be able to say that a system thinks like a human, it is required to understand how humans think, which can be done either by introspection or through specific psychological experiments. Building on this, if a system treats the input/output and the timing similarly to a human, we can say that the system thinks like a person.
Cognitive science is a key field that can help in describing how the human mind works in terms of language processing, perception, memory and reasoning. These processes, if described in enough detail, can be transmitted to a machine through knowledge representation techniques.
Systems that think rationally
In order to understand what thinking rationally would look like in the case of a computer, we need to come back to the foundations that developed logic as a separate field, starting from the syllogisms first developed by Aristotle, according to which specific argument patterns ensure correct conclusions if the premises are correct.
In the case of a computer program, this approach can be represented in the form of a logical notation that provides very precise statements about things in the world and the relationships between them. A computer program given this representation can attempt to solve a specific issue by following this rational path and can be considered as thinking rationally.
Systems that act rationally
Now it is obvious that we need to think about acting rationally slightly differently when it comes to a computer program. Russell and Norvig have defined acting rationally as "acting as to achieve one's goal given one's beliefs" (Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig). Artificial intelligence can only perceive and act and as such needs to be viewed as rational; thus artificial intelligence as a field attempts to construct rational agents.
As described above, the field of logic has defined specific thinking patterns that allow one to arrive at the correct conclusion given a correct premise. People involved in AI have focused a lot on making sure that these patterns, transmitted to a computer program as a logical notation, provide the correct inferences. A rational agent is required to arrive at these correct inferences in order to prove its rational reasoning capability for a specific scenario that allows it to reach a conclusion and act on it.
How might AI impact the recovery audit industry?
Now that we have an introduction to artificial intelligence, let's see how it can affect recovery audit services. We will do so by exploring in more detail some elements of the intelligent computer, i.e. machine learning and natural language processing. Let's first see how exactly a recovery audit company operates and what stages it goes through, to understand the underlying operation.
Several years ago, while I was working for WNS Global Services (a global professional services firm focusing on Business Process Outsourcing (Business Services & Technology Solutions) and Digital Transformation) as a Global Operating Lead for a manufacturing client, I was asked by senior management to review a company that provides recovery audit services and see if we could partner to develop a new service for our clients as well as for prospects.
The company was and still is called "Transparent" (www.transparent.nl) and operates in the recovery audit field. Their business model is based on attempting to discover erroneous financial transactions, recover the lost money and charge the customer a percentage of the recovered transactions. Recovery areas according to their website are: duplicate payments, overpayments, unprocessed credits, unissued rebates etc. The attractive part for the customer is that there is no cost involved, as the process funds itself out of what is recovered.
By analysing the operation of Transparent I discovered 3 distinct phases in their approach:
1. Data extraction and data cleaning
In this phase the recovery audit company requests from the customer all the data required to kick-start its analysis, such as: payments made in the last 3 years, the database of active suppliers, the database of contracts etc.
At this stage, considering the variety of companies out there, one of the main tasks is to take all of this data, consolidate it and clean it so that it can be entered into the analysis and lead to specific outcomes. This is a challenging step, as data is usually received in a great variety of formats (pdf, excel, txt, csv) and, more often than not, it is not clean, i.e. it contains duplicate fields and incomplete information, and it is not structured (unstructured data is data that follows no relational model, i.e. does not have a predefined data model) and as such is extremely difficult to analyse.
2. Data Mining
In this phase the data has been extracted and cleaned and it is ready for analysis. The data is entered in a tool for analysis that uses specific static parameters to identify areas of potential loss for the customer in the following manner:
For duplicate payments
Within the file with the payments made in the last 3 years, the tool applies an analysis that attempts to extract all transactions that have criteria in common, e.g. payments with the same supplier, the same amount, the same invoice number and the same date.
The system then runs various other combinations of just 3 elements in common between transactions in order to cover scenarios where there was a human error in data entry (e.g. if an algorithm includes the invoice number and the person that processed the invoice typed "O" instead of "0", the transaction will not be highlighted).
Within the same data other algorithms are deployed to discover different areas of error; a good example here is VAT errors. One case is the reverse charge mechanism, where a lot of errors occur that may lead to a wrong VAT statement. The typical algorithm deployed here on the payment transactions is to look for payments made to a supplier from an EU country, say Germany, where the VAT quoted is however from another country, say France. As we know, for transactions within the EU the reverse charge mechanism says that the customer has to declare the VAT and not the seller. As such, a potential error is highlighted, as illustrated in the sketch below.
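As an illustration, here is a minimal sketch in Python of such a static reverse-charge check. This is not Transparent's actual tool: the column names, the country list and the data are hypothetical, and real VAT logic is considerably more involved.

```python
# Minimal sketch of a static reverse-charge check: flag payments to an EU supplier
# where the VAT country quoted on the invoice differs from the supplier's country.
# Column names and data are hypothetical.
import pandas as pd

EU_COUNTRIES = {"DE", "FR", "IT", "ES", "NL", "BE", "RO"}  # abbreviated list

payments = pd.DataFrame([
    {"payment_id": 1, "supplier_country": "DE", "vat_country": "DE", "amount": 1200.0},
    {"payment_id": 2, "supplier_country": "DE", "vat_country": "FR", "amount": 980.0},
    {"payment_id": 3, "supplier_country": "FR", "vat_country": "FR", "amount": 310.0},
])

# A cross-border EU purchase should normally fall under the reverse-charge mechanism
# (the buyer self-accounts for VAT), so a mismatch between the supplier's country and
# the VAT country quoted is treated as a potential error.
mask = (
    payments["supplier_country"].isin(EU_COUNTRIES)
    & payments["vat_country"].isin(EU_COUNTRIES)
    & (payments["supplier_country"] != payments["vat_country"])
)
potential_vat_errors = payments[mask]
print(potential_vat_errors)  # payment_id 2 is flagged for manual review
```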
A significant issue for recovery audit companies is that up to the end of stage 2 they have still not generated any income for themselves, as they have only identified potential areas of recovery but no actual recovery has happened on the basis of which to charge the customer anything.
In the interview with the then CEO of Transparent I found that one of the biggest challenges for a recovery audit company is to say whether a newly acquired customer will generate any income at all and whether this income will exceed the costs incurred to run the analysis. Let's remember that the business proposition for a customer is that the service is free of charge, as the recovery audit company will only charge a fee when it has recovered something. According to Transparent, in the first 3 years 40% to 60% of the customers they onboarded did not bring any income at all. This almost put the company out of business.
3. Data findings review and recovery commencement
In the third stage the findings of stage 2, i.e. the potential duplicate payments, are passed on to a person in order to review the invoices in more detail, e.g. to see if the services described are the same and whether there is any other indication on the invoices that they refer to the same object. This is a cumbersome stage, as it takes time to analyse all of the potential duplicates and most of them turn out to be false positives. This consumes time and money for the recovery audit company, still without generating any income. This high number of false positives is somewhat normal, as we need to remember that we are using static parameters: invoice number, supplier, amount, invoice date. Except for the supplier criterion, all of the other parameters may be present in the data more than once for valid reasons (different suppliers may invoice the same amount, also on the same date, and invoice numbers might be the same between suppliers as there is no unified numbering system used by all suppliers), but this aspect significantly increases the potential duplicates found and as such the effort to analyse each transaction in more detail.
But assuming the analysis is finalized and the real duplicate payments are identified, the difficult task of actually recovering the money begins. We have seen earlier that the Institute of Finance & Management (2013: AP Department Benchmarks and Analysis) has estimated that 0.1% to 0.5% is lost due to duplicate payments alone. If we now apply this number to an actual recovery audit case where the recovery audit company would discover 0.5% duplicate payments out of the entire list of payments performed over several years, this constitutes the maximum possible amount to be recovered, out of which only a percentage would go to the audit company.
For example, if the analysis were done for a large company that has payments of roughly 50 million euros and we apply the 0.5% duplicate payment rate, we would be looking at a maximum of 250,000 euros to be recovered. However, this is not the actual income of the recovery audit company, as only a percentage of this amount is charged to the end customer, usually within the range of 20% to 30%. So the actual maximum income for a recovery audit company would be between 50,000 and 75,000 euros for 50,000,000 euros of payments analysed. This includes the assumption that everything discovered as a true duplicate payment is also recovered, which might not be the case, as the company that the duplicate payment was made to may go out of business or may simply not acknowledge the duplicate payment.
We need to remember that the recovery audit company incurs costs in all of these 3 phases, both in terms of personnel and IT or other types of expenses, and can only estimate, at the end of the data findings stage, a maximum amount of recovery and as such a maximum potential income for itself.
From the interviews with the Transparent CEO Willem-Jeroen Stevens as well as with the industry SMEs, we can conclude that there are some driving indicators for recovery audit companies, indicators that we will use to compare a live environment where all of the above techniques are deployed against a situation where artificial intelligence is deployed:
% of false positives: percentage of findings that turn out not to be an actual duplicate payment (this number should be as low as possible)
Time to recovery: average time for one transaction from the moment of data extraction to the actual recovery of money (this number should be as low as possible)
Cost per transaction: the entire cost involved in processing 1 transaction, i.e. from the data extraction to the actual recovery (this number should be as low as possible)
% of actual recovery: how much is actually recovered from the total findings (this number should be as high as possible)
We will now explore the key elements of artificial intelligence and try to estimate their impact on recovery audit services. Let's look closer at machine learning.
Machine Learning
Machine learning is a subdivision of artificial intelligence and focuses on algorithms that allow computers to learn. According to Toby Segaran in his book "Programming Collective Intelligence", these algorithms are given sets of data on which they infer information based on recognized patterns, and this allows the algorithm to make predictions about other data that it might see in the future.
Patterns in the data are used by the algorithm to generalize, and in order to do that it trains a model to determine the important aspects of the data. A good example here is the spam filtering mechanism used today in many email systems. Humans are good at spotting patterns, and when they get an email containing the words "special offer" they recognize that it's an ad and move it to spam. They have thus created a model of what a spam email is, and after several emails have been classified like this, the machine learning algorithm in the email system picks this model up and applies it to future emails containing these words, as in the toy sketch below.
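As a toy illustration of this idea, the sketch below trains a very small classifier on a handful of hand-labelled emails and applies it to a new one. The emails, labels and the choice of a naive Bayes model are purely illustrative; real spam filters are far more sophisticated.

```python
# Toy spam-filter sketch: a few human-labelled emails train a simple classifier,
# which then predicts a label for an unseen email. Example texts are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "special offer just for you",
    "limited time special offer discount",
    "minutes from yesterday's project meeting",
    "please review the attached invoice for march",
]
labels = ["spam", "spam", "ham", "ham"]   # the human's manual classification

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)      # word counts as features

model = MultinomialNB().fit(X, labels)

# The learned model is applied to a new email containing the tell-tale words.
new_email = ["one more special offer inside"]
print(model.predict(vectorizer.transform(new_email)))  # ['spam']
```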
There are many machine learning algorithms out there, most of them suited to a specific type of problem. In addition, some of these algorithms, such as decision trees, are very transparent in their thought process, while others, such as neural networks, are a black box where one can only evaluate the output but cannot properly understand how that output was derived.
There are many applications of machine learning that we experience every day, with sites such as Amazon and Netflix using information about their customers to try and determine what they would like in order to make recommendations. Other applications of machine learning algorithms can be seen in biotechnology (machine learning is applied to DNA sequences and protein structures to understand more about biological processes), financial fraud detection (in order to detect if a transaction is fraudulent, credit card companies have employed techniques such as neural networks) and also in product marketing (for understanding demographics and trends).
Broadly speaking, we have 3 main categories of machine learning techniques:
Supervised learning: in this category of algorithms the target is to predict a dependent variable from a set of independent variables. For this to work it is vital to map the inputs to the desired outputs; this process continues until an acceptable level of accuracy is achieved. Typical algorithms here are decision trees, random forests and regression.
Unsupervised learning: in this category of algorithms there is no dependent variable to be predicted; instead they are used mainly to identify patterns in the data in order to cluster the population, which is very useful in customer segmentation exercises. Typical algorithms here are K-means and neural networks.
Reinforcement learning: in this category of algorithms the computer is exposed to an environment where, by using a trial-and-error approach, it learns from past experiences in order to increase the accuracy of its decisions. A typical framework here is the Markov decision process.
But let's take one algorithm in more detail and see how it might impact a recovery audit service.
One of the most used algorithms in machine learning is the decision tree algorithm, which is considered an off-the-shelf solution for data mining tasks such as the one we are discussing as part of the recovery audit exercise. This algorithm has a key advantage over others with respect to the transparency of its thought process, i.e. one can trace how the computer arrived at a particular decision.
A decision tree algorithm is a supervised learning technique used for classification problems. It can be defined as an algorithm using inductive inference that can be represented as sets of if-then rules ("Machine Learning", Tom Mitchell).
The algorithm has 2 main stages, one being the training of the model and the second one being the prediction. We need to remember that in the case of this algorithm we have a predefined dependent variable and we need to properly map the input to the output in order to train the model.
Let's take an example from Tom Mitchell's "Machine Learning". Here we want to deploy a decision tree algorithm that predicts whether we are going to play tennis or not (i.e. the target variable) based on specific input criteria (independent variables) such as the weather outlook, the outside temperature, the humidity and the wind strength.
It is important to observe that in all of these scenarios we have mapped the right output, i.e. what the right answer is in terms of playing tennis or not. This type of table provides the training for an algorithm that, given similar input criteria, will be able to predict whether we are going to play tennis or not.
(Extract from Tom Mitchell, "Machine Learning", page 59)
The thought process of the algorithm is, as mentioned above, a series of if-then questions that sorts a specific example down the decision tree. From a graphical representation standpoint it would look the following way:
This graphical representation shows the steps the algorithm would perform in order to arrive at a prediction. The decision process would work in the following way:
If Outlook = Sunny and If Humidity = Normal then the prediction for Playing Tennis is Yes.
This is a very simplistic representation of the decision tree algorithm, as usually there are more input variables, but the way it works is similar; a short sketch of these rules in code is shown below.
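Expressed in code, the tree above reduces to a handful of nested if-then rules. The sketch below follows the structure of Mitchell's play-tennis example; it is hand-written for illustration rather than produced by a learning algorithm.

```python
# The Play Tennis decision tree expressed as plain if-then rules.
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    """Predict 'Yes' or 'No' for playing tennis."""
    if outlook == "Sunny":
        # Under a sunny outlook the deciding attribute is humidity
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        # Under rain the deciding attribute is wind strength
        return "Yes" if wind == "Weak" else "No"
    return "No"

print(play_tennis("Sunny", "Normal", "Weak"))   # Yes
print(play_tennis("Rain", "High", "Strong"))    # No
```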
Now we need to look at how such an algorithm would work in a recovery audit scenario. For this, let's take a look at the data set below.
We have here a random example of potential supplier payments made within a specific period. We want to predict in this case whether a transaction is a duplicate payment or not. The input variables are clearly visible, i.e. vendor number, invoice number, invoice date and invoice amount. We have also theoretically mapped all of these variables to a specific outcome, e.g. a "Yes" or "No" in the Duplicate Payment column.
The machine learning algorithm attempts to classify the data and to build the decision tree in order to arrive at a prediction for a set of data where it has no outcome mapped. The key aspect here is that the algorithm continues the classification until the data can no longer be split in any different way, i.e. down to the highest level of detail possible. To be clearer about this aspect, the if-then scenarios would always be exhausted for this data set, as the algorithm attempts to cover all possible combinations and outcomes.
In the above table I have highlighted the payments that have been tagged as duplicates, and we can see for the highlighted sections that row 1 is very similar to row 5: the same supplier, the same invoice number and amount but a different invoice date. Similarly, for rows 9 and 12 we have a similar supplier, invoice date and amount but a different invoice number.
The thought process that the algorithm might take in this case is:
If supplier number = same and if amount = same, then Duplicate Payment is Yes. That is because the algorithm is learning that for that specific vendor number other parameters, such as the invoice number or the invoice date, are not conclusive in terms of an actual duplicate payment. A sketch of how such a classifier could be trained is shown below.
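One possible way to frame this as a supervised learning problem is to compare pairs of payments and describe each pair by which fields match. The sketch below, using a scikit-learn decision tree on invented toy data, only illustrates that framing; the feature names and labels are hypothetical and are not Societe Generale data.

```python
# Candidate pairs of payments are labelled and used to train a decision tree
# classifier; the learned rules remain inspectable, which is the transparency
# advantage of decision trees mentioned above. Toy data for illustration only.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

pairs = pd.DataFrame([
    # same_vendor, same_invoice_nr, same_amount, same_date, is_duplicate (label)
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
], columns=["same_vendor", "same_invoice_nr", "same_amount", "same_date", "is_duplicate"])

X = pairs.drop(columns="is_duplicate")
y = pairs["is_duplicate"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned if-then rules
print(export_text(tree, feature_names=list(X.columns)))

# Predict for a new, unlabelled candidate pair: same vendor and amount,
# different invoice number and date.
print(tree.predict(pd.DataFrame([[1, 0, 1, 0]], columns=X.columns)))
```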
Now how is this different from the recovery audit approach? In a recovery audit scenario the classification is kept at the highest level. What I mean by this is that for a dataset of payment transactions one possible classification is: transactions that have the same supplier, invoice number and invoice date. Based on this classification, a person manually checks whether it is a true duplicate payment or not. This classification also never changes, while a decision tree algorithm continuously updates itself. The effect is that the high-level classification requires much more effort, because all such transactions are treated as equally probable to be duplicate payments, which is not the case. A decision tree algorithm can avoid that and implicitly reduce the time for the analysis.
There is a second element of artificial intelligence that could impact the recovery audit services quite dramatically and that element is natural language processing.
Natural Language processing
Natural language processing is an element of artificial intelligence that describes the ability of a computer to process and understand human language.
There are several aspects of natural language processing which relate to the component steps of communication. It is important to understand them in order to better grasp the concept of natural language processing.
According to Stuart J. Russell and Peter Norvig (Artificial Intelligence: A Modern Approach) there are seven different stages in communication:
Intention: the intent of a speaker agent to communicate something worth communicating to another agent, which often involves reasoning about the goals of the hearer
Generation: the speaker agent uses knowledge about language to know what to say and chooses the words
Synthesis: the speaker agent utters the words, addressing them to the hearer
Perception: the hearer agent perceives the language. In the case of written speech this is called optical character recognition
Analysis: there are 2 parts to it, both done by the hearer agent: syntactic interpretation and semantic interpretation. Syntactic interpretation, also called parsing, refers to the systematic allocation of a specific part of speech to each word (noun, verb, adjective etc.). Semantic interpretation includes both understanding and integrating knowledge about the situation, at the same time dealing with ambiguity.
Disambiguation: in written and oral speech there are ambiguous aspects, such as multiple meanings of words based on context (e.g. the sentence "the athlete is dead" may mean, in the context of a race, that he is extremely fatigued; in another context, such as being caught doping, that he will have serious problems; or the usual sense that he is deceased). In the case of multiple meanings the hearer agent uses the disambiguation function and chooses the best option given the available data.
Incorporation: the hearer agent decides to believe the information received from the speaker and integrates this information in his knowledge base
Now why is this important in a recovery audit scenario? It relates a lot to what I said in earlier pages. The algorithms used by the traditional approach are static from all points of view, i.e. they contain rigid criteria (such as, for the identification of a duplicate payment, the criteria of same invoice number, same amount and same vendor) that never update themselves based on real outcomes; moreover, these algorithms do not take into account any business context, and this is very important.
Why is business context important? Let's imagine the case of a company that has signed a consulting contract with a consulting firm, where the agreement is that on the first day of the month the consulting firm will issue all its invoices based on the work provided. The result may be that we have invoices on the same day from the same supplier, and in many cases with the same amounts. Any duplicate payment control performed a year later will indicate that there are many potential duplicates. The business context is not taken into account, which generates a lot of false positives that take time to analyse.
This business context is usually present in larger companies, either in procedure manuals or within other databases (contract databases, for example), and with knowledge representation techniques it can be transferred to a machine that can use this knowledge to run a much more accurate analysis of the payment transactions.
By doing so, the results would be much more relevant. For this entire analysis to work, the machine needs an extensive knowledge base of words, grammatical rules, semantic variants and so on. A simple sketch of how such context could be applied to the candidate list is shown below.
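A very simple illustration of feeding such business context into the analysis is shown below: suppliers known from a (hypothetical) contract database to bill a fixed monthly fee are excluded from the "same supplier, same amount, same date" rule. Column names and data are invented for the example; a real knowledge base would of course be richer than a list of names.

```python
# Business context as a simple filter: suppliers known (e.g. from a contract
# database) to issue fixed monthly invoices are excluded from the
# same-supplier / same-amount / same-date duplicate rule.
import pandas as pd

monthly_fixed_fee_suppliers = {"CONSULTCO", "FACILITY-SERV"}  # from a contract database

candidates = pd.DataFrame([
    {"vendor": "CONSULTCO", "amount": 5000.0, "invoice_date": "2015-03-01"},
    {"vendor": "ACME GMBH", "amount": 1200.0, "invoice_date": "2015-03-01"},
])

# Keep a candidate only if it is not explained by a known billing pattern.
filtered = candidates[~candidates["vendor"].isin(monthly_fixed_fee_suppliers)]
print(filtered)  # only the ACME GMBH candidate remains for manual review
```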
Research Framework
Problem Statement
As mentioned earlier in the paper, one of the major concerns for large businesses is how to avoid financial loss within their payment transactions. Both in the interviews with the SMEs and in the existing market research on the topic, the major factors that drive these financial losses are duplicate payments, overpayments, fraud and VAT errors.
Equally validated by the interviews was the strong assumption, to be tested by this research, that artificial intelligence will significantly impact the efficiency of recovery audit services.
Research Objective
The research objective will be to understand "how can artificial intelligence support improving the identification of duplicate payments". Based on this research question the idea will be developed into a consultancy plan for the above-mentioned spin-off.
We have seen that there are some driving indicators that recovery audit companies use in order to measure their performance. In this research we will first apply a traditional recovery audit approach to a data set and measure these indicators.
In a second stage we will attempt to measure the same indicators in a context where artificial intelligence is present and compare the results.
Research Limitations
It is important to mention several limitations of the framework before describing further the data collection process:
First of all, in terms of limitations, we will only look, as mentioned in the research question, at duplicate payments, as data is more easily available and this focuses the research.
The second limitation is that we will not be able to actually test an artificial intelligence solution, as it is not yet developed. This research, along with the business plan that follows, is intended to serve as a proposal for Societe Generale to invest in such a development.
Theoretical Framework
In order to compare the traditional approach of recovery audit companies to the proposed approach of using artificial intelligence, we will build the research framework around some leading indicators for the recovery audit business and will compare the results.
Let’s see these indicators as well as their calculation method:
% of false positives: percentage of findings that turn out not to be true duplicate payments (this number should be as low as possible). To be calculated as the number of findings that are not true duplicate payments divided by the total number of potential duplicate payments.
Time to recovery: average time for one transaction from the moment of data extraction to the actual recovery of money (this number should be as low as possible). To be calculated as the number of days between data extraction and actual recovery of money; an average across all transactions is then calculated.
Cost per transaction: the entire cost involved in processing 1 transaction, i.e. from the data extraction to the actual recovery (this number should be as low as possible). To be calculated as the cost of resources (limited to human resources) in terms of salary over the period, divided by the number of transactions.
% of actual recovery: how much is actually recovered from the total findings (this number should be as high as possible). To be calculated as the amount actually recovered divided by the total potential of recovery.
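For clarity, the sketch below expresses the four calculation rules above as small functions. The numbers used in the example calls are placeholders, not research results.

```python
# The four indicators expressed as small functions; example values are placeholders.
from datetime import date

def pct_false_positives(false_positives: int, total_findings: int) -> float:
    """Findings that are not true duplicates, as a % of all potential duplicates."""
    return false_positives / total_findings * 100

def avg_time_to_recovery(extraction_dates: list, recovery_dates: list) -> float:
    """Average number of days from data extraction to actual recovery."""
    days = [(rec - ext).days for ext, rec in zip(extraction_dates, recovery_dates)]
    return sum(days) / len(days)

def cost_per_transaction(total_salary_cost: float, transactions: int) -> float:
    """Human resource cost over the period divided by transactions processed."""
    return total_salary_cost / transactions

def pct_actual_recovery(amount_recovered: float, total_potential: float) -> float:
    """Amount actually recovered as a % of the total potential recovery."""
    return amount_recovered / total_potential * 100

# Placeholder example values
print(pct_false_positives(80, 100))                                   # 80.0 %
print(avg_time_to_recovery([date(2016, 1, 1)], [date(2016, 5, 30)]))  # 150.0 days
print(cost_per_transaction(50_000, 4_000))                            # 12.5 EUR
print(pct_actual_recovery(150_000, 250_000))                          # 60.0 %
```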
Research Assumptions
Due to the nature of this research we will have to build in several assumptions especially related to the second phase of testing:
The main assumption will be that we have a working artificial intelligence solution with at least one functioning algorithm. The algorithm will be described; however, its impact on the outcomes will be estimated.
The indicators used to compare phase 1 and phase 2 will only be estimated for phase 2, as no real numbers will exist due to the first assumption.
Cost elements for human resources will be calculated based on a theoretical rate card
Data Collection and Testing
We will undergo 2 phases of testing within the mentioned framework. As the basis of both phases we will use the same data extraction, i.e. payment transactions for the Societe Generale parent company between 2010 and 2015. Data will be anonymised in the sense that neither supplier names nor the contents of these transactions will be visible, just the data relevant for this exercise.
In phase 1 of the testing we will look at the traditional recovery audit approach along with the regular algorithms that are used. We will describe the algorithms used as well as their outcomes.
Our main hypothesis is the following: “Artificial Intelligence will improve the identification of duplicate payments”
Let's now look at the testing scenarios to validate or invalidate the above hypothesis, which we will split into two phases:
Phase 1.
Societe Generale entities will serve as the companies whose payment transactions will be audited for potential duplicate payments. We will do this audit in several steps:
Data extraction across multiple years, entities and processes:
We will extract payment transactions between 2010 and 2015
We will cover different entities covering the investment banking sector, domestic and international retail banking
We will cover entities with different internal processes in order to have a good representation
Definition of criteria to be used in auditing those transactions
These payment transactions will be audited using the same criteria, i.e. invoice number, supplier, amount, invoice date. We will split these criteria into different combinations and view the results by bucket.
Preliminary finding analysis and conclusions
These results will be analysed in the sense that a representative sample of the invoices will be verified to determine whether they are true duplicate payments.
We will use stratified sampling in order to ensure a proper representation of all entities audited.
The results will be extrapolated to the entire population to calculate the total amount at stake
At the end of Phase 1 we will calculate the key indicators as they were defined in the framework.
Phase 2
In phase 2 we will assume a working AI solution with specific elements such as machine learning, which we will only apply theoretically to the data and draw conclusions.
We will use the same data i.e. payment transactions between 2010 and 2015
We will describe big data functionalities and machine learning techniques and will apply them only theoretically to the same data
We will compare the theoretical results to the real outcomes out of phase 1
At the end of Phase 2 we will calculate the same key indicators as they were defined in the framework.
All of the results will be used to validate or invalidate our initial hypothesis. This will then be used to build the business case for Societe Generale. Below we can see a graphical representation of the timeline.
Research Commencement
Phase 1.
We have extracted payment transactions between 2010 and 2015 for the below entities:
The data contains the below number of invoices per year:
French Retail Banking including:
Global Transaction & Payment Services
Direction des Activités Immobiliers
Direction Commerciale et Marketing
Délégation Régionale
Direction Franfinance
Direction des Ressources Humaines
Direction Stratégie, du Digital et de la Relation Client
Secrétariat General
Innovation, Technologies et Informatique au Service des Métiers
Global Banking and Investor Solutions
Corporate & Investment Banking, Private Banking, Asset Management and Securities Services
International Retail Banking & Financial Services – Retail banking outside France
Central Division including :
Direction du Contrôle Périodique
Direction des Risques
Direction de la Communication Groupe
Direction des Ressources Groupe
The data has a structured format, as it contains labelled columns following a relational data model where we can clearly identify the supplier, the amount, the invoice number and the invoice date.
To simulate the regular recovery audit scenario we have used the following static algorithms, as a normal recovery audit company would.
Algorithms used:
In order to identify the possible duplicates from the base extraction and start the analysis, five algorithms were used:
Nr. 1: All transactions with identical: Invoice date – Invoice nr – Vendor name – Amount
Nr. 2: All transactions with identical: Invoice date – Amount – Invoice nr
Nr. 3: All transactions with identical: Vendor name – Invoice nr – Amount
Nr. 4: All transactions with identical: Vendor name – Invoice nr – Invoice date
Nr. 5: All transactions with identical: Vendor name – Amount – Invoice date
These various combinations of criteria were used in order to cover all potential issues that might impact the analysis, such as (see the sketch after this list):
An incorrect data entry of the invoice number might not highlight a potential duplicate, hence the algorithms where the invoice number is not present
A duplicate vendor record might not highlight a potential duplicate payment, hence the algorithms where the vendor name is not present
An incorrect data entry of the amount might not highlight a potential duplicate, hence the algorithms where the amount is not present
An incorrect data entry of the date might not highlight a potential duplicate, hence the algorithms where the date is not present
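Assuming the extract were loaded into a pandas DataFrame, the five static algorithms could be expressed as the sketch below. The file name and the column names are hypothetical; the point is only that each algorithm flags every group of two or more transactions sharing the given fields.

```python
# Sketch of the five static algorithms over a payments extract.
# "payments_2010_2015.csv" and the column names are hypothetical.
import pandas as pd

payments = pd.read_csv("payments_2010_2015.csv")

combinations = {
    "algorithm_1": ["invoice_date", "invoice_nr", "vendor_name", "amount"],
    "algorithm_2": ["invoice_date", "amount", "invoice_nr"],
    "algorithm_3": ["vendor_name", "invoice_nr", "amount"],
    "algorithm_4": ["vendor_name", "invoice_nr", "invoice_date"],
    "algorithm_5": ["vendor_name", "amount", "invoice_date"],
}

for name, keys in combinations.items():
    # keep=False marks every member of a group with identical key fields
    flagged = payments[payments.duplicated(subset=keys, keep=False)]
    print(name, len(flagged), "potential duplicates,",
          round(flagged["amount"].sum(), 2), "EUR at stake")
```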
After applying the above-mentioned algorithms to the data, here are the results of all combinations: 190,377 invoices were highlighted as possible duplicates, with a total amount of 376,775,528.23 €.
As we can see, for algorithms 1 to 4 we have a reasonable number of potential duplicate payments, for which we will be able to verify 100% of the items to determine whether they are true duplicate payments or not; however, for the 5th algorithm we need to select a sample, as the number of potential duplicates is too high. This is a first sign that the traditional method increases the effort unnecessarily.
I chose a confidence level of 99% with a 4% margin of error, and as such the representative sample contains 1026 potential duplicate payments. By analysing them we will be able to make statements about the population within the above-mentioned confidence level and margin of error; the arithmetic behind the sample size is sketched below.
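For reference, the sketch below shows the standard sample size arithmetic for a proportion at 99% confidence and a 4% margin of error, followed by the finite population correction. The population size N used here is only illustrative, since the exact count of flagged items for this algorithm is not restated at this point.

```python
# Sample size for a proportion at 99% confidence and a 4% margin of error,
# with the finite population correction. N is illustrative only.
z = 2.576   # z-score for 99% confidence
e = 0.04    # margin of error
p = 0.5     # most conservative assumed proportion

n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # infinite-population sample size, about 1037
N = 100_000                              # illustrative population of flagged items
n = n0 / (1 + (n0 - 1) / N)              # with finite population correction, about 1026

print(round(n0), round(n))
```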
As per the above-mentioned results of the algorithms, all of the potential duplicates (100% for algorithms 1 to 4 and the sample for algorithm 5) will be given to a person to be checked and, in the case of a true duplicate payment, to attempt a recovery. But what are the detailed steps the person takes for this entire analysis?
The steps performed by a person for an end to end analysis are:
Analysis of the Documents
Open the 2 possible duplicate documents
Look at the invoice image and compare the invoice details such as: invoice number; invoice date; amount; description of the cost; invoice addressed “To”
In the case of a real duplicate, the agent will check whether the money was already refunded in order to continue with the investigation
In case the money has not been refunded, a document will be generated and the supplier will be contacted to ask for a refund
Recovery process
The agent will start the recovery process by searching for the contact details of the supplier, taken from a recent invoice or the vendor master data, or by searching the internet and the vendor's website.
The real investigation process starts after the vendor replies to the recovery action. Most of the time the supplier will send a statement of account (because the duplicate investigation refers to 2010-2015, most suppliers will not keep any open record mentioning the double payment).
The agent needs to take the statement of account, which most of the time is not easy to understand due to different formats and different information, and perform a deep analysis in order to see whether:
there is a lost credit note cancelling the double booking of the invoice, in which case the supplier is no longer a debtor
the supplier mentions that the duplicate payment was deducted from a subsequent invoice
the supplier has changed its name, accounting system or details, in which case the agent will need to analyse 2 different vendor accounts to see whether the supplier did send the refund
The follow-up with suppliers is done on a weekly basis, and most of the time, if there is a response, the recovery process is successful in 80% of the cases.
After the refund is received, the treasury department will be contacted and the correction document will be closed with the refund.
As we can see, the effort for the recovery is much higher than for the actual analysis, which gives a clear indication that if the analysis were more accurate and produced fewer potential duplicates, the effort could be better invested in the recovery phase.
KPI measurement Phase 1
Let’s now look at the results of the various indicators we have defined in the framework:
KPI 1 – % of false positives: percentage of findings that turn out not to be true duplicate payments (this number should be as low as possible). To be calculated as the number of findings that are not true duplicate payments divided by the total number of potential duplicate payments.
For the first indicator we said that we would look at the number of false positives that the traditional recovery audit method generates. Below we have the results of the 5 combinations of algorithms used.
It is clear from the results that the traditional method generates too many false positives. For all of the various algorithms the percentage of false positives lies between 76% and 99%.
If we combine the results of all 5 algorithms we have an astonishing 94% of false positives (4466 out of 4736 results), which is incredibly inaccurate. This is important because these false positives require effort to be analysed, which in a real business scenario would incur cost with no chance of a return.
KPI 2 – Time to recovery: average time for one transaction from the moment of data extraction to the actual recovery of money (this number should be as low as possible). To be calculated as the number of days between data extraction and actual recovery of money; an average across all transactions is then calculated.
As per the workflow steps described above that a person follows to identify the real duplicates and attempt the recovery, below are the results. We can see that the number of days is very high, i.e. between 68 and 206 days.
The average of the above figures is 149 days per duplicate payment between the data extraction and the actual cash back. We need to remind ourselves that in a real recovery audit environment this would be the number of days the recovery company would need to wait before it could charge the end customer anything, considering the business model where a fee is charged only in case of recovery.
So on average the audit company would have to carry the cost of a transaction for about 5 months before it generates any revenue. This makes it very difficult for the recovery audit company to sustain its business and is a further indication that technology in the form of artificial intelligence could impact this indicator dramatically.
KPI 3 – Cost per transaction: the entire cost involved in processing 1 transaction, i.e. from the data extraction to the actual recovery (this number should be as low as possible). To be calculated as the cost of resources (limited to human resources) in terms of salary over the period, divided by the number of transactions.
In order to calculate the cost for 1 transaction we first need to clarify which cost elements are taken into account. In the view below we are only looking at the cost of human resources. The resource cost taken into account here is 25,000 EUR per annum, which is the standard cost of a full-time equivalent based on the average cost calculation performed by SG EBS and currently built into the rate card charged to final customers. This cost includes all employee and employer related contributions as well as some indirect elements such as IT cost.
The total cost for each algorithm was computed by dividing the 25,000 EUR per annum by 365 days and multiplying it by the average days per transaction for the respective algorithm.
The total cost across all of the algorithms was 51,232 EUR, which is equivalent to 2 full-time employees for 1 year or 4 full-time employees for 6 months. If we now divide the 51,232 EUR by the total number of transactions checked, i.e. 4736, we get a cost per transaction of 10.81 EUR. The arithmetic is sketched below.
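The arithmetic, as described, can be sketched as follows. Only the 25,000 EUR annual cost, the 68-day and 206-day extremes, the 129-day average for the 5th algorithm and the 4,736 checked transactions come from the text; the remaining per-algorithm day figures are placeholders chosen so that the total lands near the reported 51,232 EUR.

```python
# Cost arithmetic sketch. Grounded figures from the text: 25,000 EUR per FTE per
# year, the 68/206-day extremes, the 129-day average for algorithm 5, 4,736
# checked transactions and a reported total of about 51,232 EUR. The other
# per-algorithm day figures are placeholders.
annual_fte_cost = 25_000.0
daily_rate = annual_fte_cost / 365                    # about 68.5 EUR per day

avg_days_per_algorithm = [68, 172, 173, 206, 129]     # algorithms 1-5 (partly placeholders)
total_cost = sum(daily_rate * days for days in avg_days_per_algorithm)

transactions_checked = 4_736
cost_per_transaction = total_cost / transactions_checked

print(round(total_cost))               # about 51,233 EUR
print(round(cost_per_transaction, 2))  # about 10.8 EUR per transaction
```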
This number needs to be viewed in the following context. As we have seen with the first indicator, the percentage of false positives is very high: in this case 4466 out of 4736 were false positives. If we multiply this number by 10.81 EUR, it means that 48,277 EUR out of 51,232 EUR were wasted due to the inaccurate initial analysis.
Only the 270 true duplicates generate revenue for the company, which means that by increasing the accuracy of the initial analysis the recovery audit company can significantly reduce its cost as well as speed up the entire effort. This is beneficial from a customer standpoint as well.
KPI 4 – % of actual recovery: how much is actually recovered from the total findings (this number should be as high as possible). To be calculated as the amount actually recovered divided by the total potential of recovery.
The fourth and final KPI relates to the actual recovery of money. This is important as it is the ultimate indicator that produces money for the recovery audit company. We can see in this example that the workflow of analysis and recovery effort described above has yielded specific results.
For each algorithm a specific percentage of the true duplicate payments was recovered, and this allows the recovery audit company to finally charge the end customer. Usually a 20% fee is charged by recovery audit companies, which would mean that for the total amount recovered, in this case 1,984,080 EUR, the total charge would be close to 400,000 EUR, which would be the recovery audit company's turnover.
However, as we have seen in the previous indicator, it could take up to 6 months before any money could be charged to the end customer; in the meantime the recovery audit company would need to sustain itself.
Phase 2
For phase 2, as stated, we will work on the same data; however, we will estimate how artificial intelligence, through machine learning, might impact the KPIs described above.
First of all we need to remind ourselves what types of machine learning exist. Earlier in this paper I defined 3 types of machine learning techniques, i.e. supervised, unsupervised and reinforcement learning.
However, there is another way to categorize machine learning, which is based on the task it is supposed to perform. We can differentiate several types of machine learning tasks; the 2 main ones are:
Classification – here the machine learning algorithm needs to learn to allocate the right label to the input data. The main example quoted in most sources is the spam filter in our email accounts; the classification task is for the machine learning algorithm to properly allocate the label of spam or not spam to incoming emails.
Clustering – the task of identifying similar classes of data points. This works really well in the retail business for customer segmentation based on purchasing patterns.
In our example for the recovery audit business we are clearly looking at a classification task that we expect the machine learning algorithm to perform. To be more precise, we expect the algorithm to allocate the label of "Duplicate Payment" or "No Duplicate Payment".
The difficult task is how to build the correct classifier for the task at hand, as there are many algorithms that can be used. According to Pedro Domingos (author of "The Master Algorithm"), in an article published within the Department of Computer Science and Engineering of the University of Washington called "A Few Useful Things to Know About Machine Learning", choosing the right algorithm consists of 3 components:
Representation – which refers to the requirement that a classifier must be represented in a language that the computer can use.
Evaluation – a function used to separate good classifiers from bad ones
Optimization – a method to search among the classifiers for the one that scores highest on the evaluation function
The above table includes the main approaches for each of the 3 components (adapted from "A Few Useful Things to Know About Machine Learning" by Pedro Domingos).
Before I explain how machine learning might impact our current dataset, it is worth mentioning that the ultimate goal of machine learning is to generalize beyond the items available in the training set. For our concrete examples, we want the machine learning algorithm to generalize from past experience on training sets of duplicate payments to new sets of data and to correctly allocate the label of duplicate payment.
Within the limitations of this framework I did mention that the estimations in this second phase rely on an assumption that the artificial intelligence platform is developed.
I want to take this assumption a bit further and assume in addition that we have already chosen our classifier, which is the decision tree algorithm. As mentioned earlier, a decision tree algorithm is a supervised learning technique used for classification problems. It can be defined as an algorithm using inductive inference that can be represented as sets of if-then rules ("Machine Learning", Tom Mitchell). Now let's see how the decision tree algorithm might compare to the traditional data mining approach mentioned in phase 1.
One of the things I have mentioned several times as a key drawback of the traditional approach is the fact that the algorithms are static. What I mean by that is that each algorithm has only one layer of the above-mentioned IF-THEN function.
Algorithm 1 described in phase 1 has a combination of Invoice Number – Invoice Date – Invoice Amount – Supplier, which translates into the following single layer of analysis: IF 2 or more transactions have the same Invoice Nr – Invoice Date – Invoice Amount – Supplier THEN this is a duplicate payment. As is clear from the results of the first KPI, this is not enough, as the accuracy of this algorithm was only 24%, i.e. we had 76% false positives created by this layer.
The decision tree adds more IF-THEN layers to the analysis, but before it can do that it needs to be trained on part of this data set. By training I mean that it needs to be shown items in the data set that turned out to be true duplicates, so that it can attempt to generalize on the parts of the data it has not seen.
But how do we add these additional layers? According to Pedro Domingos in his article “A few useful things to know about machine learning”, data alone is not enough to produce a proper generalization.
General assumptions can be added on top of the data that support the generalization tremendously. In our specific case this type of general assumption comes from the people involved in the day to day business, who can add business context that translates into additional layers of our decision tree algorithm and implicitly reduces the number of false positives.
Let’s see one example and its impact. In our first KPI we calculated the percentage of false duplicates that traditional algorithms produce. The results showed an extremely inaccurate outcome, with between 76% and 99% false positives.
For our example let’s look at the 5th algorithm, where we selected just a sample because the potential duplicates were too numerous to analyse in full. Here the traditional algorithm uses one IF-THEN layer based on the combination Vendor – Amount – Invoice Date.
The high number of potential hits is explained by the business itself: after discussions with the SMEs working on this type of invoices I learned that they refer to consulting and contractor invoices that are always issued by the respective company on the same date and with the same amount. The way the business operates creates these items, although most of them are valid payments.
During the analysis of the relevant sample of 1026 items we discovered a key aspect in differentiating potential from true duplicates, which allowed the team to identify the 24 true duplicates. The differentiator was the period the invoice covered: if we had an invoice from the same supplier, with the same amount, on the same date, and the invoiced period was also the same, then the probability of the invoice being a duplicate increased significantly.
This is the additional layer that we could have built into the decision tree algorithm, which in our example would have reduced the number of potential duplicates from 1026 to around 300, i.e. the accuracy of the algorithm would have increased from roughly 2% to 8%, as we would have had 24 true duplicates and around 275 false positives versus 1002 false positives in the first version of the algorithm.
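As an illustration only, the sketch below shows how such an additional layer could be learned rather than hard coded, assuming a training set of previously reviewed candidates labelled by the team as true duplicates or false positives. The file name, the engineered features (including same_invoice_period, which encodes the SME insight about the invoiced period) and the label column are all hypothetical.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical labelled candidates produced by the traditional
# Vendor - Amount - Invoice Date rule and reviewed by the team.
train = pd.read_csv("labelled_candidates.csv")

# Boolean features comparing the two transactions of each candidate pair;
# same_invoice_period captures the business context learned from the SMEs.
features = ["same_vendor", "same_amount", "same_invoice_date",
            "same_invoice_period"]

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(train[features], train["is_true_duplicate"])

# The learned IF-THEN layers can be printed and validated with the SMEs.
print(export_text(tree, feature_names=features))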
According to another study published in the International Journal of Engineering and Innovative Technology (“Study on Performance of Machine Learning Algorithms Using Supermarket Dataset”, Volume 3, Issue 11, May 2014) that attempted to rank specific classifiers, the accuracy of a mature decision tree algorithm was 75%. Building on that thought, by adding only one additional IF-THEN layer we gained 6 percentage points in the accuracy of the analysis, something a traditional, static algorithm cannot do.
Now, based on this finding, let’s see how the other KPIs might be impacted by this additional layer of our algorithm. We have already established that the first KPI would improve its accuracy from around 2% to around 8%, but how would the other KPIs be impacted?
Having established the impact on the first KPI let’s look at the impact for the second KPI which is the average time for one transaction from the data extraction to the actual recovery of money. According to the results of phase 1 for the 5th algorithm we have an average of 129 days per transaction.
Earlier in this paper we identified 3 steps in the recovery audit process, i.e. Data Extraction and Data Cleaning, Data Mining, and Data Findings and Recovery Commencement. It is also safe to assume that the 129 days can be split across the 3 steps in the following way: 20% for the extraction and data cleaning (i.e. 26 days), 40% for the data mining (i.e. 52 days) and 40% for the recovery step (i.e. 52 days). Steps 1 and 3 would remain unaffected by the additional layer in our machine learning algorithm; however, for step number 2, as we have seen for the first KPI, the effort decreases from having to analyse 1026 transactions to having to analyse around 300 transactions, an effort reduction of around 70%. Thus we can conclude that the second step could be performed in 15 days instead of 52. As such, KPI number 2 would be positively impacted with a reduction of the average number of days per transaction from 129 to 93, i.e. roughly 30% higher efficiency.
Continuing with the same logic for KPI number 3, which in the first phase we computed as an average of 10.81 EUR per transaction: if we apply this cost to the transactions in the 5th algorithm and build in the above assumptions for KPI number 2, we can safely say that the cost per transaction will gain the same efficiency as the earlier KPI. A roughly 30% efficiency gain translates into a cost per transaction reduced from 10.81 EUR to about 7.56 EUR. This makes sense, as reducing the number of false positives to analyse reduces the human effort involved and as such the cost.
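The arithmetic behind the three KPI figures can be reproduced in a few lines; the 20/40/40 split of the 129 days and the ~70% effort reduction on the data mining step are the assumptions stated above.

# KPI 1: accuracy of the analysis
candidates_before, candidates_after, true_duplicates = 1026, 300, 24
accuracy_before = true_duplicates / candidates_before    # ~2%
accuracy_after = true_duplicates / candidates_after      # 8%

# KPI 2: average days per transaction, split 20% / 40% / 40% as assumed above
extraction_days, mining_days, recovery_days = 26, 52, 52
mining_days_after = round(mining_days * candidates_after / candidates_before)   # ~15 days
days_after = extraction_days + mining_days_after + recovery_days                # 93 days

# KPI 3: cost per transaction follows the ~30% overall efficiency gain
cost_before = 10.81
cost_after = cost_before * (1 - 0.30)     # roughly 7.6 EUR (quoted as 7.56 EUR in the text)

print(round(accuracy_after, 2), days_after, round(cost_after, 2))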
But what about the last KPI relating to the percentage of the actual recovery?
This performance indicator will most likely be unaffected by a performance improvement in the previous KPIs. This is because the machine learning algorithm only reduces the effort involved in the initial analysis so that the recovery effort can commence as soon as possible; it does not increase the likelihood of recovery itself. One could argue that by reaching the recovery phase earlier we could increase the percentage of actual recovery, however in the experience of the SMEs the actual recovery is influenced more by factors such as whether the company still exists or the duration and outcome of a potential dispute with the supplier about the refund.
Summarizing our findings in phase 2, which we built on the assumption of a multi-layered machine learning algorithm acting on the same data as phase 1 for algorithm number 5, we obtained the following results:
KPI 1 – % of false positives: percentage of findings that turn out not to be true duplicate payments (this number should be as low as possible) – accuracy improvement from 2% to 8%
KPI 2 – Time to recovery: average time for one transaction from the moment of data extraction to the actual recovery of money (this number should be as low as possible) – reduction from 129 to 93 days
KPI 3 – Cost per transaction: the entire cost involved in processing 1 transaction, i.e. from the data extraction to the actual recovery (this number should be as low as possible) – cost reduction from 10.81 to 7.56 EUR
KPI 4 – % of actual recovery: how much is actually recovered from the total findings (this number should be as high as possible) – no impact
We can conclude from the above analysis that artificial intelligence could impact recovery audit services in a significant way, by making the analysis much more accurate and by reducing the effort involved in the entire exercise.
Business Opportunity for Societe Generale European Business Services
Executive Summary
Societe Generale European Business Services is a shared service center of the Societe Generale Group that aims to centralize European finance, HR and IT services in order to achieve cost reductions for the group, in light of the very low interest rates that put a lot of pressure on cost management within the banking sector. Societe Generale has historically established two other shared service centers, one in Paris and one in Bangalore. All these service centers will be the object of this business plan.
Societe Generale European Business Services has grown from 50 employees in 2008 to 1000 employees in 2016. SG EBS delivers at its current size approximately 45 million euros per annum in savings due to labor arbitrage and aims to deliver similar savings of 45 million euros per annum plus 30% efficiency on top in the next 3 years. The value proposition for customers (internal entities of Societe Generale, e.g. ALD Automotive) was to benefit from the highly qualified, multilingual staff in Bucharest to deliver accounting, HR and IT services; after the centralization of the services, the processes are made more efficient through various process improvement initiatives as well as disruptive technology.
Within the business lines, SG EBS covers a large part of the investment banking business line as well as the central divisions, however it struggles to attract the retail banking sector both within France and internationally (e.g. BRD). This is mostly because, especially for the international retail banking sector of Societe Generale, there is no labour arbitrage (e.g. no arbitrage between BRD and SG EBS), and growth, especially in finance, has reached a blocking point.
There are 2 major growth opportunities for SG EBS as a finance shared service center.
The first major growth opportunity is within the group, i.e. to attract the business lines currently not covered. However, we already know that these business lines cannot be attracted through labour arbitrage.
This is why this paper builds on the research presented until now in order to develop a new service offering focusing on recovery audit, so that service delivery can be expanded to these internal business lines.
The second major growth opportunity is to provide the new service offering in the form of a spinoff beyond the internal Societe Generale entities and go out into the Romanian market.
Within this business case we will focus on building a spin-off or an additional service line that provides this service initially just for internal entities, but which may expand to the Romanian market after a first period.
Service offering description
As mentioned in previous pages, exploiting this service offer could take various forms, i.e. a new business line within the current finance shared service center or a spin-off out of the current entity, Societe Generale European Business Services.
Irrespective of the form, the mission of this new business line would be to provide recovery audit services to its clients. It attempts to improve the financial results of its clients by discovering and recovering various financial leakages that happen in day to day operations, such as duplicate payments, overpayments and VAT errors.
The recovery audit business line would have to fulfil 2 major objectives. First of all, it should generate a new revenue stream for SG EBS by attracting entities that are not drawn to the shared service model due to the lack of labor arbitrage, such as the international retail banking sector within Societe Generale. Second of all, it may create a new product that could also be offered to external entities, i.e. medium and large companies in the Romanian market, implicitly creating top line impact for the group as well.
The recovery audit industry as a whole was created or at least significantly impacted by the Sarbanes-Oxley Act of 2002 that forced companies to pay much more attention to the accuracy of their financial statements. Areas of loss such as duplicate payments, overpayments and various other operational errors that create financial leakage in the companies distort the accuracy of the financial statements.
Like all industries, recovery audit services cannot remain untouched by current technological developments. This is also the main purpose of this paper: to demonstrate that emerging technologies such as artificial intelligence and big data will have a tremendous impact on this industry, especially when combined with business expertise.
Societe Generale European Business Services is the ideal incubator for the development of such a start-up. Firstly, one of the main activities of this shared service center is to provide financial services to the group's entities, so there is a lot of business expertise that can support this start-up. Secondly, within the same financial services there is ample availability of data and a safe testing and development environment for such a new business offer.
The business model is based on the creation of an IT platform that combines artificial intelligence with business acumen in order to create a robust solution that gives the client an immediate picture of his potential areas of leakage and the associated business benefit.
Because it combines artificial intelligence with ample business understanding, the same platform will make the recovery audit process much more efficient and much more cost effective, as illustrated in the detailed research.
Having worked in this field for over 10 years, I see a lot of opportunity for differentiation by combining the 2 elements, i.e. artificial intelligence and business acumen.
Product description
The Recovery Audit Services product is in fact an online, AI powered platform that connects to the payment transactions of the customer and runs a diagnostic on his accounts payable.
According to a report published by Ardent Partners in 2013 (Ardent Partners – ePayables 2013: AP’s New Dawn), close to 1% to 3% of a company’s accounts payable is lost due to duplicate payments and overpayments. In addition, the Institute of Finance & Management (Institute of Finance & Management – 2013: AP Department Benchmarks and Analysis) has estimated that 0.1% to 0.5% is lost due to duplicate payments alone. To exemplify this dramatic situation: if a company has paid 100 million euros, it has lost 100 to 500 thousand euros due to duplicate payments alone.
Let’s say we analyse an AP value of 100 million: we would run the analysis on the customer’s payments and identify potential areas of loss such as duplicate payments and over and under payments. In this example our product highlights a potential area of loss of 500.000 EUR.
The result would be immediately visible to the customer in terms of the potential benefit he could obtain. In this case the benefit would be the 500.000 EUR minus the 15% fee, i.e. a 425.000 EUR benefit with no upfront investment.
Traditional recovery audit services are very present in mature economies, where this is a well developed service. These companies try to convince large corporations such as Societe Generale to provide them with their payment transaction data so they can attempt to discover erroneous transactions and then recover them, while charging a specific fee.
But what they are selling is not a tailored analysis; they are selling a standard diagnostic based on unchanging parameters, and then they use people to go through the transactions manually. They are actually selling unavailable capacity, as large companies don’t have the resources to check all of their transactions themselves.
Because of this approach they target large companies: since their analysis is not specific, they cannot tell whether a customer will bring them money or not, so they assume that the larger the client the larger the probability of recovery. They also lack any business context in their analysis, and their result is very unspecific, i.e. they find a lot of potential errors but only a few are real errors.
Our product does something very different. We do not sell spare capacity; we sell valuable insights into the data, quickly giving the customer an image of his areas of leakage. We combine our significant understanding of the internal customer’s business (as we are processing his invoices and we have detailed procedure manuals) to build algorithms tailored to the customer and to his business type. We build this context into the traditional machine learning algorithms to be more accurate in the findings and to move directly to recovery, skipping the heavy transaction by transaction manual checks. As the analysis is automated, our effort goes into the recovery of the lost funds.
The product is free of charge for the customer: if the analysis finds nothing, the customer doesn’t have to pay. The customer is charged a fee only in case of recovery, so he can only have a positive ROI. Our solution is not tied to any specific ERP; it can analyse data from all sources, so our customer landscape is wide. The below image describes one transaction from a customer’s standpoint: in case of a duplicate payment of 10.000 EUR, if the amount is recovered the customer pays a fee of 15%, i.e. 1.500 EUR, and receives 8.500 EUR without having to invest anything.
But why 15%?
Established companies in the recovery audit field such as Transparent (www.transparent.nl), Caatalyst (www.caatalyst.com) and Apex Analytix (www.apexanalytix.com) usually charge 20% to 30%. Within the description of their services, most of these companies either quote or specifically indicate that a large part of their audit is a manual review of the transactions. As such, it is safe to conclude that they run a much heavier operational model that involves a lot of human resources.
Our model, however, is different, as it aims to eliminate as much as possible the manual step in which each transaction is verified and validated as a true error. Thus our service runs a much slimmer operation, as we have cut a lot of the effort by replacing the human element in the audit process with artificial intelligence. This also allows us to be more competitive on the price charged to the final customer, which differentiates us from the client’s standpoint.
If we try to pinpoint all of the advantages of the solution we can say the following:
Free of charge for the customer. The customer is charged only in case something is recovered, which means he can only have a positive impact.
Our solution is system agnostic, i.e. the ERP or system landscape of the customer is not important.
Due to the automated analysis we move very quickly to the recovery, which is what creates the end value.
Lower price
We have a wide landscape where we are connected daily to the customers’ business, so we can continuously improve the algorithms used and make the outcomes more and more accurate.
Operations
The following section describes the operational flow of the spin-off. The operational flow is largely similar to that of a regular recovery audit company and commences once the customer is on board. It consists of 3 main stages that have also been described earlier in this paper:
Data extraction and data cleaning
In this phase we request from the customer all the data required to kick-start the analysis, such as: payments made in the last 3 years, the database of active suppliers, the database of contracts etc.
At this stage one of the main tasks is to consolidate and clean this entire data set so that it can be entered into our platform and analysed. This is a challenging step, as data is usually received in a great variety of formats (pdf, excel, txt, csv) and, more often than not, it is not clean, i.e. it contains duplicate fields and incomplete information, and it is not structured (unstructured data is data that follows no relational model, i.e. it does not have a predefined data model) and as such is extremely difficult to analyse.
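A minimal consolidation and cleaning sketch is shown below, assuming three hypothetical source extracts in different formats and a few illustrative column names; real customer data would require considerably more source-specific handling.

import pandas as pd

# Hypothetical source files delivered by the customer in different formats.
sources = [
    pd.read_csv("payments_erp_a.csv"),
    pd.read_excel("payments_erp_b.xlsx"),
    pd.read_csv("payments_erp_c.txt", sep="\t"),
]
payments = pd.concat(sources, ignore_index=True)

# Basic cleaning before the data is loaded into the platform.
payments.columns = [c.strip().lower().replace(" ", "_") for c in payments.columns]
payments = payments.drop_duplicates()                              # duplicate rows
payments = payments.dropna(subset=["supplier", "invoice_amount"])  # incomplete records
payments["invoice_date"] = pd.to_datetime(payments["invoice_date"], errors="coerce")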
Data mining
Here we load the data into the platform and allow our tailored algorithms to commence the identification of payment errors. Due to our differentiated approach, in which we combine our significant business acumen with machine learning algorithms, we very quickly receive the erroneous payment transactions to investigate.
As described in the earlier research, we attempt to move through phase number 2 as quickly as possible so we can begin the recovery, which is the stage that generates revenue.
Recovery Commencement
In the recovery commencement phase the findings are passed on to the recovery experts. This is the critical phase, as it is the one that generates the income for our company as well as the benefit for the end customer.
This phase is very similar to a collection effort: the companies that benefited from these erroneous payment transactions on our client’s side (e.g. duplicate payments, overpayments) are contacted through various means, i.e. email, phone calls and post, in order to clarify how the amount can be refunded. This can be a very cumbersome stage, as depending on how long ago the error happened, the company that benefited from it might be out of business, might not recognize the error or might simply refuse to reimburse.
In terms of the structure of operations, we have 4 key units that will cover the operational requirements. These units are:
IT Unit and data management Unit
There are several deliverables of the IT unit and they relate to data management, platform enhancements and infrastructure requirements.
In terms of data management, the IT unit will define the way data is stored and the way data is processed. In terms of data storage, both in the short term and in the long term, own storage capabilities must be developed. On top of this, the IT unit will define the data ingestion infrastructure, i.e. how various data sources are handled, the size of the data items, the rate of data ingestion and data quality.
The IT unit will also be responsible for the way the data is processed. In the above research we have given an example of a possible algorithm, however depending on the task at hand, i.e. classification, clustering, regression or association analysis, the IT unit will decide on the data mining approach.
Last but not least, the IT unit will be responsible for data security, which is crucial considering that we are processing sensitive customer information.
Analysis and collection Unit
The analysis unit will take over the work once the IT unit has delivered the insights from the customer’s data set. This unit will reconfirm the insights (for example that a flagged item is a true duplicate payment) and will afterwards proceed with the collection.
Once a transaction is validated as a loss, this unit will also be responsible for recovering the lost money. This takes the form of a regular collection exercise, as discussions are required with the various suppliers, first of all to confirm with them the erroneous transaction they benefited from and then the form the refund will take (either a cash refund or a deduction from a future invoice).
Sales Marketing Unit
The sales unit will identify and approach the companies that we will partner with. It will also own any promotion of the company as well as the market strategies.
The following assumptions are considered in order to build the cost structure:
For the first year the team will consist of 2 employees within the IT unit, 2 employees for the back office unit and 1 employee for the sales effort. In addition, the management function will be fulfilled by one person, for a total of 7 employees in year 1.
The average salary is 1.500 EUR gross, including employer contributions (based on the Mercer Salary Study 2017).
I assumed that we need 150 sqm of office space, which will be enough until 2020, at 10 EUR per sqm rent and 3 EUR per sqm maintenance, based on current rates for business parks in western and northern Bucharest.
Equipment and maintenance refers to IT hardware and licenses purchased within year 1.
In terms of operating cost, the estimation for the first year would be the following.
In terms of the implementation strategy, we will have 2 processes running in parallel, which will allow us both to go to market very quickly and to work on the long term solution. We plan to be running the operation on an MVP within 3 months, to have a beta version within 6 months and a release candidate within 8 months. Below is the timeline for the implementation strategy.
Within the first stage we will attempt to establish a running operation within 3 months. Here are some highlights of this phase; the below assumes that all the administrative requirements have been fulfilled, i.e. hiring and facilities are in place and a minimal IT infrastructure exists.
We will develop a minimum viable product within 1 month that will allow us to analyse the financial transactions of the customer, however without relying on any machine learning algorithms.
Findings will be analysed manually one by one and the classification will be stored for later use on the core platform.
In the interim, the back office operation will handle the communication with the suppliers manually and will also track all refunds received manually.
Insights into the data will be logged manually into a knowledge management tool so they can be used as training material for the final machine learning platform.
The budget requirements for Stage 1 are the following, based on an estimation received from the Societe Generale IT function:
In the second stage we will work on the development of the core platform, starting at the moment we have the MVP ready, on which the operation will run. Here are the main deliverables of phase 2 as well as the functionalities of the platform:
We will start the development of the platform after month 1, upon receiving the funds to launch the deployment. During development the platform will use datasets available from past exercises as well as from the work done with our minimum viable product.
A machine learning algorithm will be deployed on the data and its accuracy will be compared against the manual checks performed with the minimum viable product. Once the accuracy of the two analyses is similar we can say the platform is operational.
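The comparison could be as simple as measuring the agreement between the two sets of labels on the same flagged transactions, as in the sketch below; the label values are illustrative placeholders, the real ones would come from the knowledge management tool and from the platform.

from sklearn.metrics import accuracy_score, confusion_matrix

# 1 = true duplicate, 0 = false positive, for the same flagged transactions.
manual_labels   = [1, 0, 0, 1, 0, 0, 0, 1]   # manual review on the MVP
platform_labels = [1, 0, 0, 1, 0, 1, 0, 1]   # predictions of the core platform

print("Agreement:", accuracy_score(manual_labels, platform_labels))
print(confusion_matrix(manual_labels, platform_labels))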
Datasets in different formats will be tested as the platform should be agnostic of the system the customer uses.
Data visualization techniques will be developed into the platform for easier interpretation of the outcomes.
Collection will be partly automated in the platform, with only a small fraction remaining manual.
The below depicts an estimated budget, based on an estimation received from the Societe Generale IT function, for developing the platform as well as the initial project management and support structure cost.
Marketing Plan
Before we go into a more detailed analysis we need to properly define what our market is. In our case the solution could be exploited in different ways, i.e. as either an additional service line or as a spinoff company, and based on the choice made the market is defined in completely different ways. We will explore both possibilities and try to pinpoint the market as well as how to enter it.
However, irrespective of whether the service line is exploited internally or externally, it is crucial to integrate the business objectives with the customer value that makes it all possible. In order to ensure a consistent strategy that combines the two elements we will be using Paul Garrison’s success dashboard from his book Exponential Marketing.
Integrating business and customer objectives is not easy, however it is vital in order to ensure the right focus of the entire business. In our case the business objectives are targeted towards growing the business of SG EBS, which will stagnate after the offshoring opportunities run out. This can only be done by developing a new service that allows both growing the business with existing customers and being attractive to new customers. For this to be effective, though, the value proposition for the customer must be very clear. Our recovery audit solution aims to provide customers with a real time image of their areas of financial leakage and to help them reduce the risk and the inefficiencies of their processes. This value proposition targets a concrete tension of business managers, and especially CFOs, focusing on areas of risk that are otherwise invisible and thus unmanageable.
Customer targeting is critical, as this is the next logical step before defining go to market strategies. Let’s look at some approaches to see how they could help us identify the right customer. The following segmentation is taken from Paul Garrison’s book “Exponential Marketing”:
Brand positioning – Considering we are offering a recovery audit solution, we naturally combine the functional element of delivering the audit with addressing a concrete emotional state of the people responsible for the financial health of their business. If we look at our offering from an internal Societe Generale standpoint these are the CFOs of the various entities; the same is valid for external companies.
Communication strategies – in order to have an engaging communication with the customer, the message needs to be delivered at the right moment and place. For example, if we look internally at Societe Generale, a communication related to our business offer could reach the client after a specific event (e.g. an incident related to financial leakage).
Product development – our recovery audit solution needs to take the customer’s business reality into account. This is exactly what we are offering, as we aim to deliver an analysis tailored to the customer’s business.
We are thus addressing CFOs within the Societe Generale entities and externally we are looking at CFOs or business owners.
In terms of the go to market strategy, there are various stages in the customer acquisition process, each with a different approach:
For the early stage, which focuses on building awareness for the company and creating interest around the topic, the choice is to go with an indirect approach. This will be done mostly through social media, events and email marketing.
For the middle stage of the customer acquisition process the focus will shift to a direct approach, offering the customer the possibility to evaluate the opportunity. Networking and email offers are the tools for this stage.
At a later stage the direct approach will become more specific for each customer, through demos, trials and invitations to events. This stage is the moment a customer is actually engaged in a purchasing opportunity.
In the last stage, when the customer is already on board, the attention shifts to customer retention through advocacy programs, client workshops and case studies.
We have now reached the phase where we need to integrate customer centric indicators into our marketing strategy, indicators that will allow further development of the business. They focus on the value delivered to the customer, in our case the actual amount recovered by our solution, but also on the feedback received from the client about the experience as well as on the customer advocacy of our services. Following these indicators will generate new business both internally and externally.
Let’s now attempt to estimate the market, both internally within Societe Generale and externally on the Romanian market.
First of all we will focus on the scenario in which this business opportunity is exploited as an additional service line for the group’s entities. The Societe Generale group’s total expenditure is 7.5 billion EUR per year. According to a report published by Ardent Partners in 2013 (Ardent Partners – ePayables 2013: AP’s New Dawn), close to 1% to 3% of a company’s accounts payable is lost due to duplicate payments and overpayments. In addition, the Institute of Finance & Management (Institute of Finance & Management – 2013: AP Department Benchmarks and Analysis) has estimated that 0.1% to 0.5% is lost due to duplicate payments alone.
As such, with a conservative approach we can say that 0.5% of the group’s general expenses could be affected by either duplicate payments, overpayments or other types of financial leakage, which means 37.5 million EUR lost for the bank. Considering our fee of 15% for delivering this service to the group’s entities, we can say that the internal market is 5.6 million EUR. Considering that internally there is no competition, we can safely assume a 100% market share. The goal is to cover the entire group with this service by year 3.
In year 1 the focus, which will be described in the next paragraphs, will be on existing customers of the Bucharest shared service center that were recently acquired; in year 2 the focus will move towards covering all existing customers present in the shared service centers around the world; and year 3 will concentrate on entities that are not currently clients of the shared service centers. From a growth standpoint, the first two years grow the current business with existing clients, while year 3 gains new clients for SG EBS.
In terms of the go to market strategy, in years 1 and 2 we look at our internal market within Societe Generale in order to have the proof of concept. The goal is to capture as clients within the first year all the entities that have recently offshored their activity to the Bucharest shared service center, i.e. SG EBS, and that had no robust control framework in place.
This will increase the likelihood of identifying historical errors and recovering part of them; it will also allow SG EBS to provide from day 1 a concrete value beyond the labour arbitrage of placing the activity in Bucharest. The entities are listed below in terms of number of invoices per year and amount of general expenses per year.
If we now apply the same percentages quoted a bit earlier to the client list targeted for year 1, we are looking at 4.7 million EUR of potentially affected payments (0.5% of the 950 million EUR of general expenses) and a potential income of 700.000 EUR for our service line (15% fee), as depicted in the image below.
In year 2 the goal is to cover all clients that have offshored services to either the Paris, Bucharest or Bangalore shared service center. The list of entities can be viewed below with their respective values of general expenses and numbers of transactions. In this case we are looking at 4.9 billion EUR of annual payments and, by applying the same percentages, at 24.5 million EUR of potentially affected payments and a maximum income for year 2 of 3.6 million EUR for our service line (based on the 15% fee).
One of the significant advantages of tackling the internal market this way is that all these financial transactions already exist in the shared service centers, so access to the data is extremely easy; furthermore we have significant insight into the customers’ business type and business cycle, which allows us to build significant business context into our recovery audit platform. Another benefit for the group as an entity is the entire amount recovered, not just the fee charged by our business line, as this is an area of leakage that the group was not aware of, and addressing it will reduce the overall risk posture.
The total scope currently in the 3 shared service centers, which serves as the basis for the year 2 growth expectations, is the following:
Now let’s take a look at the external market from both a size and a competition standpoint. First of all, recovery audit is very present in mature economies such as the United States and Western Europe, however it is virtually non-existent in Romania.
As far as the United States is concerned, we are looking at 3 main players, PRGX (www.prgx.com, listed on the NASDAQ stock exchange), Protiviti (www.protiviti.com) and Connolly (www.cotiviti.com), that have been in this business for a long time and have a proven track record. In Western Europe the established companies in this field are Caatalyst (www.caatalyst.com) and Transparent (www.transparent.nl). Irrespective of the provenance of these companies, their business models are similar, i.e. they operate on a contingency fee, which means that they receive money from their clients only in case of recovery.
In Romania none of these companies is present; the recovery audit service is instead offered as a side product by the big audit companies such as KPMG, E&Y and PwC. However, this side product is not professionalised in any way.
According to www.listafirme.ro, in Romania there are 594 companies with a turnover of at least 50 million EUR, companies that can be qualified as large. Additionally, there are 675 medium sized companies in Romania, with a turnover of 25 to 50 million EUR. If we look just at the roughly 600 companies over 50 million EUR turnover and apply the 0.5% potentially affected payments, we can say that 150 million EUR would be in scope for a recovery audit exercise, and with the 15% fee we apply we can conclude that this part of the market is around 23 million EUR. If we also include the 675 companies with revenues in the area of 25 to 50 million EUR we are looking at an additional 118 million EUR affected and an additional market size of 17 million EUR. The best estimate for the recovery audit market in Romania is thus around 40 million EUR. The target would be a market share of 10%, which means 4 million EUR.
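The market sizing above can be reproduced as follows; the average turnover of 50 million EUR for the large companies and 35 million EUR (the midpoint of the 25-50 million EUR band) for the medium ones are simplifying assumptions.

AFFECTED_RATE, FEE = 0.005, 0.15        # 0.5% affected payments, 15% contingency fee

def segment_market(companies, avg_turnover):
    affected = companies * avg_turnover * AFFECTED_RATE
    return affected, affected * FEE

large_affected, large_market = segment_market(600, 50_000_000)    # ~150M affected, ~23M market
medium_affected, medium_market = segment_market(675, 35_000_000)  # ~118M affected, ~17M market

total_market = large_market + medium_market      # ~40 million EUR
target_revenue = total_market * 0.10             # ~4 million EUR at a 10% market share
print(round(total_market / 1e6, 1), round(target_revenue / 1e6, 1))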
Financial Pack
In terms of financial planning we will first clarify the assumptions that were built into the business plan and their impact from a P&L and balance sheet standpoint. Let’s first look at the assumptions from a growth standpoint:
Growth is projected based on the coverage of the entities within the Societe Generale Group
I assumed that in year 1 we will cover all of the customers that have recently offshored their services to the Bucharest service center, in year 2 all of the clients in the 3 shared service centers, and in year 3 the entire group.
The assumption based on the market studies in terms of the percentage of transactions in scope (i.e. potentially erroneous) is 0.5%, with a 15% contingency based fee. Year 1 in-scope transactions are 950 million EUR, with 0.5% affected payments, i.e. 4.7 million EUR, out of which the 15% contingency fee would bring revenue for year 1 to around 700k EUR. Year 2 in-scope transactions are 4.9 billion EUR, with 0.5%, i.e. 24.5 million EUR affected payments, out of which the 15% contingency fee would bring our revenue to around 3.7 million EUR. Year 3 in-scope transactions are 7.5 billion EUR, with 0.5% affected payments, i.e. 37.5 million EUR, out of which the 15% contingency fee would bring our revenue to around 5.6 million EUR (see the short projection sketch after this list).
The assumption is also that 100% of what is discovered is actually recovered, through either a cash refund or a deduction.
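The revenue figures in the list above follow directly from these assumptions, as the short sketch below shows; the in-scope spend per year is taken from the coverage assumptions, everything else from the percentages just listed.

AFFECTED_RATE, FEE, RECOVERY_RATE = 0.005, 0.15, 1.0   # assumptions listed above

in_scope_spend = {                     # EUR of annual payments covered
    "year_1": 950_000_000,             # clients recently offshored to Bucharest
    "year_2": 4_900_000_000,           # all clients of the 3 shared service centers
    "year_3": 7_500_000_000,           # the entire Societe Generale group
}

for year, spend in in_scope_spend.items():
    revenue = spend * AFFECTED_RATE * RECOVERY_RATE * FEE
    print(year, round(revenue / 1e6, 1), "million EUR")
# -> roughly 0.7, 3.7 and 5.6 million EUR respectively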
From a P&L management standpoint the assumptions are the following; I will build both a best case and a worst case scenario:
The initial investment in this entire business line is 200.000 EUR, which is supposed to cover the phase 1 platform, the phase 2 platform and the initial IT hardware and software cost. I will calculate a net present value both with optimistic assumptions and for a worst case scenario.
I assumed that we will start with 1 IT employee and in a short period of time reach 4 employees, plus 3 back office/sales colleagues; in total we will end the year with 7 people (3 management + 4 employees). I plan to grow the total headcount to 15 in 2019 and 20 in 2020.
I assumed an average gross salary of 1.500 EUR (including bonus and employer contributions) with a 4.5% increase per year, based on the Mercer study (Compensation Planning for 2017, November 2016, Mercer).
I assumed that we need 150 sqm of office space, which will be enough for us until 2020, at 10 EUR per sqm rent and 3 EUR per sqm maintenance (prices based on Colliers research, February 2017). According to the Colliers study rents have remained unchanged in the last 2.5 years, however we will index them with the inflation rate of 3% for 2018 and 3.4% for 2019 according to the national bank forecast (http://www.bnr.ro/Proiectii-BNR-6152.aspx).
Equipment and maintenance for year 1 is estimated at 30.000 EUR, including the 12.000 EUR investment for phase 1 of the platform as well as software and hardware cost. In year 2, 30.000 EUR would cover the annual cost for increasing and maintaining IT capacity, primarily from a storage and processing standpoint; 140.000 EUR is the cost of the entire platform, with a depreciation of 60.000 EUR in year 2. In year 3 we have the same 30.000 EUR annual cost for IT capacity, plus the remaining 80.000 EUR depreciation of the investment in the final platform.
Utilities are estimated at 6.000 EUR annually (electricity, water etc.) and will be indexed with the same inflation rate quoted by the national bank forecast.
Supplies and IT consumables are estimated at 6000 EUR per annum and are indexed until 2020 with the annual inflation rate quoted by the national bank.
Compliance and risk cost is estimated to be extremely reduced in the first and second year. This is because the risk assessment of the clients’ systems and processes was already performed when the processes were transferred to SG EBS. However, considering that in year 3 the target is customers that are not currently in the shared service centers, the risk assessment effort of such an audit increases dramatically. Here the estimation uses the man-day cost currently practiced within the bank of 600 EUR/day and a duration of around 228 man-days, so the total cost for year 3 would be around 137.000 EUR.
I assumed a 16% corporate income tax.
For the optimistic scenario I have taken into account a 15% fee, 100% recovery of the faulty transactions and 100% coverage of the group by year 3. The numbers would look the following way.
To calculate the net present value I will use a discount rate of 30%, which is the usual rate of return expected from internal projects. Taking into account the initial investment of 200.000 EUR and the above mentioned discount rate, we would have the following results.
In the optimistic scenario the net present value would be close to 3.8 million EUR.
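For transparency, the NPV mechanics can be sketched as below; the three yearly net cash flows are hypothetical placeholders chosen only to illustrate the order of magnitude of the optimistic scenario, while the real figures sit in the P&L tables above.

def npv(rate, initial_investment, cash_flows):
    # Discount each year's net cash flow and subtract the initial investment.
    return -initial_investment + sum(
        cf / (1 + rate) ** (year + 1) for year, cf in enumerate(cash_flows)
    )

discount_rate = 0.30            # internal hurdle rate used for project evaluations
investment = 200_000            # phase 1 + phase 2 platform and initial IT hardware
cash_flows = [500_000, 2_800_000, 4_300_000]   # hypothetical net cash flows, years 1-3

print(round(npv(discount_rate, investment, cash_flows)))   # ~3.8 million EUR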
For the pessimistic scenario I have taken into account a fee of just 5%, an 80% recovery rate of the faulty transactions and only 80% coverage of the group’s spend by year 3. The numbers would look the following way.
We will use the same discount rate of 30% to compute the net present value in the pessimistic scenario.
In this case as well we have a positive net present value of around 530.000 EUR, so the recommendation is to invest in this solution.
Risk Management
Let’s now look at the risk aspect of this investment project. We will run the simulation for both the best case and the worst case scenario. For this simulation we will use the risk analysis software from Palisade, which relies on Monte Carlo simulation and is available at http://www.palisade.com/risk/. The Monte Carlo simulation is a mathematical method that plots probability distributions for any factor used in the analysis that has a degree of uncertainty.
In order to calculate the probability distribution for our NPV we need several inputs. First of all we will use our discount factor of 30%, which is typically used within Societe Generale as a constant in project evaluations. In terms of investment cost, in both scenarios I will use 200.000 EUR as the best estimate for the phase 1 and phase 2 platform and the initial IT hardware. Annual fixed costs are calculated as the sum of facilities cost, utilities, supplies and IT consumables as well as ongoing IT hardware requirements (rounded to 66.000 EUR per annum).
The annual growth rate is around 280%, in line with the growth estimates between years 1, 2 and 3, with variable cost at 15% of revenue. Due to the calculation model of the risk analysis tool I have inputted worst, most likely and best case values for investment, revenue and fixed cost. For the annual growth rate I have used a standard deviation of 30%, due to the dependency on the recovery of the actual financial leakages (i.e. not everything that is discovered is also recovered) as well as on the fee used to determine the revenue, i.e. between 5% and 15%. The net present value is calculated below.
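A simplified version of this Monte Carlo exercise can be reproduced without the Palisade tool, as in the sketch below; the triangular bounds, the interpretation of the 280% growth as a roughly 2.8x yearly revenue multiplier and the three-year horizon are assumptions, so the resulting distribution will only approximate the figures reported here.

import numpy as np

rng = np.random.default_rng(0)
N = 10_000                                  # Monte Carlo iterations
discount_rate, variable_rate = 0.30, 0.15

# Triangular distributions (worst, most likely, best) for the uncertain inputs.
investment = rng.triangular(180_000, 200_000, 220_000, N)
revenue_y1 = rng.triangular(500_000, 700_000, 900_000, N)
fixed_cost = rng.triangular(60_000, 66_000, 72_000, N)

# Yearly growth multiplier of ~2.8x with a 30% relative standard deviation.
growth = rng.normal(2.8, 0.3 * 2.8, N)

npv = -investment
revenue = revenue_y1
for year in range(1, 4):
    cash_flow = revenue * (1 - variable_rate) - fixed_cost
    npv = npv + cash_flow / (1 + discount_rate) ** year
    revenue = revenue * growth

print("mean NPV:", round(npv.mean()))
print("5th / 95th percentile:", np.percentile(npv, [5, 95]).round())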
From the results we can conclude the following:
The net present value of this project remains positive in all circumstances, with the minimum, mean and maximum NPV shown in the details below.
There is only a 5% probability that the NPV will be below 3.45 million EUR and a 5% probability that it will be above 3.75 million EUR, i.e. a 90% probability that it falls within this range.
In terms of impact on the overall NPV, the highest impact comes from the year 1 revenue estimation: as the revenue varies over its range, the NPV fluctuates between 3.45 million and 3.74 million EUR. Similarly, the variation of the investment cost may impact the NPV, which would then lie between 3.58 million and 3.62 million EUR.
The same applies to the pessimistic scenario, where the fee applied to the discovered leakages is only 5% with an 80% recovery rate, and where we use the same discount rate of 30% and the same investment cost. The revenue growth estimates are also similar, however there is a significant difference in the variable cost, as it is calculated out of the revenue, which decreases significantly with the lower fee applied.
Standard deviation for the annual growth rate and the variable cost percentage remains the same
The ranges for the first year revenue were adjusted even further to include an even more pessimistic outcome.
From the results of the analysis on the pessimistic scenario we can conclude the following:
The NPV remains positive in all scenarios, with the details shown below.
There is only a 5% probability that the NPV will be lower than 490k EUR and a 5% probability that it will be higher than 520k EUR, i.e. a 90% probability that it lies within this range.
The highest impact on the NPV is again the year 1 revenue: as it fluctuates along its range, the NPV varies between 490k and 520k EUR.
The above detailed risk analysis has confirmed that both the optimistic and the pessimistic scenario would generate a positive NPV, and as such the recommendation is that SG EBS invest in this project.
Developing this project would position SG EBS as a key value driver within the group, as well as offer the possibility at some point to go beyond the boundaries of serving internal entities and expand to the Romanian market with a new service proposition.
