Data Mining for COMPASS Simulation Model
Masaki Teraoka
Visiting Researcher, SFC Research Institute
Kimio Uno
Professor, Graduate School of Media & Governance and Faculty of Policy Management, Keio University
Simulation models are effective tools for policy experiments that cannot be carried out in the real world. However, a detailed description of the real world complicates the model structure and enlarges the input and output data. Furthermore, it is difficult to obtain meaningful information from the huge volume of output data with typical hypothesis-testing methods, because the quantity of data exceeds what a human analyst can process.
In this research, meaningful information is extracted from simulation models by applying data mining techniques to their accumulated output data. Traditionally, hypothesis testing starts from a hypothesis set by the analyst and uses data prepared for validation. In this research, by contrast, hypotheses were successfully extracted from the data themselves, which makes it possible to utilize the accumulated data of the simulation model. This new method enables effective analysis of the simulation output and sheds new light on information that is usually buried deep in the data.
Simulation models are effective in the following situations:
-A real system is too complex for its outcome to be predicted.
-An experiment in the real system requires a huge amount of time or cost, or carries large risks.
-A system does not exist yet.
-Several solutions must be tested in a system.
However, the target of a model is often too rigidly determined, or constrained by many suppositions and hypotheses. On the other hand, a detailed description of the phenomenon complicates and enlarges the model. Detailed simulation models use an enormous amount of data, but it is not always possible to make use of it because of its overwhelming quantity. Extracting meaningful information from the output data is hindered when one relies only on conventional hypothesis testing and on the data-processing capacity of analysts.
Data mining is the general term for a set of data processing technologies that enable data to be used tactically as knowledge. It has been developed mainly in the fields of customer relationship management in enterprises and market trend analysis. Behind its development lies a strong demand in business management and marketing to put more emphasis on the customer. The growth of customer data, the expansion of database storage capacity, improvements in data analysis technology, and advances in computer processing technology have also played a big role in the development of data mining.
The new method developed in this research enables effective analysis of the output data of simulation models and sheds new light on information that is usually buried deep in the data. By extracting hypotheses from the data themselves and covering an area beyond human data-processing capacity, this research shows that the accumulated data of the simulation model can be utilized effectively.
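The idea of extracting hypotheses from accumulated simulation output, rather than testing a hypothesis chosen in advance, can be illustrated with a minimal sketch. The source does not name the mining technique used for COMPASS, so the example below substitutes a simple pairwise correlation screen. The variable names (`tax_rate`, `investment`, `emissions`), the sample values, and the `threshold` parameter are all hypothetical and are not taken from the COMPASS model.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no linear relationship
    return cov / sqrt(var_x * var_y)


def candidate_hypotheses(runs, threshold=0.8):
    """Return variable pairs whose |correlation| reaches the threshold.

    `runs` is a list of dicts, one per accumulated simulation run.
    Each returned tuple (a, b, r) is a candidate hypothesis
    "a and b move together", ranked by strength.
    """
    variables = sorted(runs[0])
    columns = {v: [run[v] for run in runs] for v in variables}
    found = []
    for a, b in combinations(variables, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, r))
    found.sort(key=lambda item: abs(item[2]), reverse=True)
    return found


# Hypothetical accumulated output; names and values are illustrative only.
runs = [
    {"tax_rate": 0.10, "investment": 50.0, "emissions": 30.0},
    {"tax_rate": 0.20, "investment": 45.0, "emissions": 31.0},
    {"tax_rate": 0.30, "investment": 40.0, "emissions": 29.0},
    {"tax_rate": 0.40, "investment": 35.0, "emissions": 32.0},
]

hypotheses = candidate_hypotheses(runs)
for a, b, r in hypotheses:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

In this sketch the analyst does not choose which relationship to test. Every pair of variables is screened, and only the strong relationships are reported as candidates for closer examination. A real system would likely use richer techniques and far larger data, but the workflow of letting the accumulated data propose the hypotheses is the same.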
As achievements of the Uno Laboratory in 2002, a prototype of the analysis system was created and proof-of-concept experiments were carried out. As a result, data mining for the COMPASS simulation model was confirmed to be effective, and an analysis system that analysts can use was developed successfully. Moreover, a version of the analysis system that works over the network was implemented. Because it uses a TCP/IP network, it is now technically possible to execute data mining for the COMPASS simulation model from anywhere. However, in consideration of data copyright and network security, access is currently open only to the local area network of the Uno Laboratory. Connections from other locations are accepted after the necessary application. In addition, to share this knowledge, Teraoka (2003) wrote "Data mining for COMPASS simulation model - System Reference Manual".