Graph optimisation for business issues
András Csernenszky (OTP Bank) — Gyula Kovács (Sixtep Ltd) – Miklós Krész (University of Szeged) — András Pluhár (University of Szeged)
Graph optimisation for business issues
Banks, similarly to any large modern institution, accumulate and store huge amount of data. Some of these data naturally have a graph structure, or they can be transformed to have such a structure. It is vital to use these data to enhance the performance of the existing forecasting methods.
There are several possible applications of network based models in the operations of a Corporate Bank. In the market development area these are acquisition, estimating customer value, forecasting attrition (performing at the OTP), product development, sales support and campaign optimisation. There are also some applications for the credit risk department such as forecasting errors, using networks for segmenting risk-groups, conducting money laundering investigations, and mapping the client’s total customer-supplier relationship for business purposes.
Our research is concentrated on the network data about the links of large and medium size companies that can be deduced from the bank transactions. Few years ago the main goal for us was to develop a universal model to predict churn and bankruptcy. That work resulted in the implementation of the Domingos-Richardson cascade model for bankruptcy forecasting in 2009 September, which has since performed in the Bank quite well.
In recent years, we have done a lot of research concerning ‘how to define the network for a particular business problem. ’ This question essentially covers three issues:
- What infection values should be assigned to the vertices?
- When do we connect two companies with an edge?
- If the companies are to be connected, what should be the weight of that edge be?
The first question can be rephrased as: how should we compare different solutions? We developed a standard AI application using learning and test sets and a gain function to decide the above question. This leads to an optimisation problem, where the objective function is only implicitly defined.
The exact definition of the transaction graph is crucial. For forecasting bankruptcy, is it better to analyze all corporate clients with all of their transactions or do we need the debtors only? (In that case we found that it is better to consider just the debtors).
The following question I what should the initial distribution of infection on the vertices be?
Here two solutions looked plausible. One approach is to simply write 1 if the company is bankrupted in the given time period and 0 otherwise (this corresponds to the original Independent Cascade model, where a vertex is either healthy or infected.) The other approach needs more elaboration: we assign a priori default probabilities, the score values, to the vertices. For that case we generalized the model allowing fractional infection in the input. Our previous research confirmed that the second one yields better solution, increasing the efficiency on the reported target segment by 25 percent.
One goal of our current research is to re-examine and redefine parameter optimisation options and handle this issue in a more delicate way. The performance of the method greatly depends on the estimation of the edge infection probabilities. In this research we get the better influence values assigned to the edges (which represent the connections between two companies) that further improve the infection predictions. The cornerstones of these improvements are the involvement on certain static variables (graph structure information: such as community information; transaction data: ’relative traffic, that is the transfer of the edge divided by the sum of all incoming transfer’; application data: the age of a company; behavioural data: number of debits, delinquency etc.) and an appropriate optimisation on the parameters that take their effects into account. To find the best parameters for an infection model we used a grid search algorithm, by which we almost doubled the accuracy ration of our infection model.
The other goal of this research is to define which edges should be considered on a transaction graph of the Bank clients. Seemingly, this problem is just a special case of the previous one (for infection models if the edge weight is zero is equivalent that there is no connection), but it is not the case. Which companies belong to the same community is crucial information in an infection model, since it means stronger connection. We used a dynamic community search algorithm for the same corporate portfolio, considering different time periods. With that innovation we were able to measure the stability of communities. Our goal was to find rules for defining the relevant transactions for increasing the stability of communities regarding the relative transacted amount (due to the company size), and the frequency, regularity of the transactions as well. The result is that the stable edges and communities mean even stronger dependence between companies regarding both bankruptcy and churn.
Topics: Financial Statistics, Data mining, Networks,
Keywords: Graph mining, Influence models, Communities, Dynamic community, Graph optimisation






