Fujitsu Laboratories Ltd. today announced the development of machine learning technology that enables highly accurate analysis of graph-structured data that expresses the relationships between people and things.
Fujitsu Laboratories has now developed new technology that allows existing deep learning technology, which has already achieved extremely high accuracy in image and voice recognition, to be applied to graph-structured data. Graph-structured data has a complicated structure, and individual data items vary in size and in how they are expressed. By transforming these varied data into a uniform mathematical expression called a “tensor”(1), the technology makes it possible to perform highly accurate machine learning on graph-structured data using deep learning.
This technology was used to learn the structures and activities of chemical compounds, based on data from the PubChem BioAssay(2) open database of chemical compounds. It was able to learn the relationships between the structures and activities of several hundred thousand chemical compounds, about 100 times as many as previous technology could handle. Also, by extracting features that could not be grasped with existing technology, it achieved about 80% accuracy in predicting activity, a 10% increase compared to existing technology.
In recent years, databases in a variety of fields, such as account transactions in finance and chemical compositions in drug discovery, along with IoT log data for communication between things, have continued to generate an enormous amount of data that can be expressed in a graph structure to show the relationships between people and things (Fig. 1). Fujitsu Laboratories previously developed technology to retrieve and analyze graph-structured data known as “LOD”(3). It is expected that accurately categorizing and analyzing this graph-structured data will lead to the creation of new value and the opening up of new business areas.
Previously, graph-structured data was categorized according to whether it contained partial graphs that people had identified in advance as significant. When categorizing large volumes of graph-structured data, however, many features were not captured by the partial graphs chosen beforehand, which limited the accuracy of categorization.
Deep learning technology, which can automatically extract characteristic features from data, has attracted attention in areas such as image and voice recognition. However, because graph-structured data has a complicated structure and mixes a variety of data sizes and expressions, it has been difficult to apply deep learning technology to it.
About the Technology
Fujitsu Laboratories has now developed new deep learning technology that can learn with high accuracy from a variety of graph-structured data that express the connections between people and things. Features of the technology are as follows:
1. New tensor factorization technology converts graph-structured data to a uniform expression
This technology uses a type of mathematical expression called a tensor, an extension of vectors and matrices, to express graph-structured data that has a variety of expression formats (Fig. 1). It uses a mathematical operation called tensor factorization(4), a cutting-edge data mining technology, to transform data to a uniform expression format (Fig. 2). Conventional tensor factorization could not always transform similar graph-structured data into similar tensor expressions, but now Fujitsu Laboratories has developed a technology that can perform tensor factorization in a way that maximizes the degree of similarity to an arbitrary pattern chosen as a basis.
2. Technology that optimizes uniform expressions and neural network learning
By extending the scope of application of back-propagation(5), which is commonly used in the learning process for neural networks, to tensor expressions, this technology optimizes the uniform expressions at the same time as the neural network, so as to maximize the accuracy of categorization (Fig. 3). Specifically, it updates the basis pattern for the tensor expressions according to how much the categorization error of the neural network changes when the basis pattern is changed.
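As a rough illustration of the idea behind feature 2, the following Python sketch learns a basis pattern together with a classifier by gradient descent. It is a heavily simplified stand-in under stated assumptions, not Fujitsu Laboratories' implementation: graphs are small adjacency matrices with a fixed node ordering, similarity to the basis pattern is a plain inner product rather than a tensor factorization, and the "network" is a single logistic unit. All function names, the toy graphs, and the labels are hypothetical.

```python
import math

def adjacency(n, edges):
    """Symmetric n x n adjacency matrix for an undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj

def similarity(adj, basis):
    """Inner product between a graph and the basis pattern (assumed similarity measure)."""
    return sum(a * b for ra, rb in zip(adj, basis) for a, b in zip(ra, rb))

def predict(adj, basis, w, b):
    """Probability of the positive class from a single logistic unit."""
    return 1.0 / (1.0 + math.exp(-(w * similarity(adj, basis) + b)))

def loss(adj, y, basis, w, b):
    """Cross-entropy categorization error for one graph."""
    p = predict(adj, basis, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mean_loss(graphs, labels, basis, w, b):
    return sum(loss(g, y, basis, w, b) for g, y in zip(graphs, labels)) / len(graphs)

def train(graphs, labels, n, epochs=300, lr=0.1):
    """Jointly update the classifier (w, b) and the basis pattern by back-propagation."""
    basis = [[0.1] * n for _ in range(n)]
    w, b = 1.0, 0.0
    for _ in range(epochs):
        for adj, y in zip(graphs, labels):
            err = predict(adj, basis, w, b) - y      # d(loss)/d(pre-activation)
            grad_w = err * similarity(adj, basis)    # computed before basis changes
            for i in range(n):
                for j in range(n):
                    # Chain rule: d(loss)/d(basis[i][j]) = err * w * adj[i][j]
                    basis[i][j] -= lr * err * w * adj[i][j]
            w -= lr * grad_w
            b -= lr * err
    return basis, w, b

# Toy data (hypothetical): a graph is "active" (1) if it contains edge 0-1.
edge_sets = [[(0, 1)], [(0, 1), (1, 2)], [(0, 1), (0, 2)],
             [(1, 2)], [(0, 2)], [(0, 2), (1, 2)]]
graphs = [adjacency(3, e) for e in edge_sets]
labels = [1, 1, 1, 0, 0, 0]

basis, w, b = train(graphs, labels, 3)
predictions = [round(predict(g, basis, w, b)) for g in graphs]
print(predictions)
```

The update applied to each entry of the basis pattern is the derivative of the categorization error with respect to that entry, which corresponds to the description above of how much the error changes when the basis pattern is changed.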
With this new deep learning technology, it is now possible to use data that can be expressed with a graph structure, such as the communication logs of computers or IoT devices, financial transactions, or chemical compositions, in new analyses.
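To make the notion of expressing such data as a tensor concrete, the following minimal Python sketch turns a handful of communication log records into a three-mode count tensor indexed by source host, destination host, and protocol. The record format, host addresses, and the function name `log_to_tensor` are assumptions made for illustration only; the announcement does not specify the actual encoding used.

```python
def log_to_tensor(records):
    """Build a (source, destination, protocol) count tensor from log records.

    Each record is a hypothetical (source_host, destination_host, protocol) tuple.
    """
    hosts = sorted({h for src, dst, _ in records for h in (src, dst)})
    protos = sorted({p for _, _, p in records})
    h_idx = {h: i for i, h in enumerate(hosts)}
    p_idx = {p: i for i, p in enumerate(protos)}
    n, m = len(hosts), len(protos)
    # tensor[i][j][k] counts messages from host i to host j over protocol k
    tensor = [[[0] * m for _ in range(n)] for _ in range(n)]
    for src, dst, proto in records:
        tensor[h_idx[src]][h_idx[dst]][p_idx[proto]] += 1
    return tensor, hosts, protos

records = [
    ("10.0.0.1", "10.0.0.2", "tcp"),
    ("10.0.0.1", "10.0.0.2", "tcp"),
    ("10.0.0.2", "10.0.0.3", "udp"),
    ("10.0.0.3", "10.0.0.1", "tcp"),
]
tensor, hosts, protos = log_to_tensor(records)
print(hosts, protos)
```

Logs covering different numbers of hosts produce tensors of different sizes, which is the kind of variation in size and expression that the uniform expression described above is meant to absorb.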
In a trial, this technology was applied to data on the structure and activity of chemical compounds from the PubChem BioAssay open database, and then to virtual screening, which searches for candidate chemical compounds for drugs on a computer. It was able to learn the relationships between the structure and activity of several hundred thousand chemical compounds, about 100 times as many as was possible with previous technology using support vector machines(6). By extracting features that could not be grasped with previous technology, it achieved an activity prediction accuracy of about 80%, an improvement of 10% compared with existing technology. It is expected that this will greatly reduce development time and cost, which are pressing issues in drug development.
In addition, Fujitsu Laboratories conducted a trial to detect illicit activity or attacks in which this technology was applied to benchmark data(7) derived from graph-structured data representing the communication relationships between hosts. The result was that false positives were successfully reduced by more than 20% compared with existing methods using support vector machines. It is expected that this will increase the efficiency of network monitoring tasks. Beyond that, by applying this technology to such data as records of transactions with digital currencies or the financing records of social lending services, such improvements as highly accurate detection of improper monetary manipulation or sophisticated judgements of suitability for lending become possible.
Fujitsu Laboratories will continue to improve the accuracy of its categorization technology for graph-structured data, aiming to bring it into practical implementation as a core technology of Human Centric AI Zinrai. In addition, Fujitsu Laboratories will continue to expand the applicability of deep learning technology to more diverse data formats, providing advanced data analysis in a variety of fields from the first half of fiscal 2017.