By Aqsa Hameed
Department of Computer Science, Faculty of Sciences
University of Agriculture, Faisalabad, Pakistan
Data mining is an analytical procedure used to explore large volumes of data. As databases spread rapidly through business and every other field of life, immense amounts of data are being produced from heterogeneous sources. Databases are also evolving into data warehouses. A data warehouse holds multidimensional data that is stored in different forms and is complex in nature; data of this kind is called Big Data. Organizations use it for analysis and reporting. Mining this data is necessary to extract valuable information.
Current data mining systems are not applicable to big datasets. PingER is a project that measures end-to-end Internet performance; it has been led by SLAC since 1998. The project has generated a huge amount of data, which can reveal interesting information about power cuts, network bottlenecks, packet loss, and similar events. Analyzing PingER data is important for studying trends in Internet connectivity, but this is not currently possible because such a large amount of data cannot be loaded and the data is not available for user access. In this research, PingER data is loaded into a Big Data platform (a data warehouse or HDFS) and processed using the MapReduce (MR) framework for data mining. Impala OLAP queries are then applied to mine the results and retrieve information from the data warehouse (DWH). The resulting information is in a complex format, so it is converted into graphical form using visualization techniques such as bar charts and line charts.
This process makes the information easy for users to access and understand, and it provides an improved storage architecture for big data.
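As a rough illustration of the map, group, and reduce pattern that such a pipeline relies on, the following minimal Python sketch averages packet loss per site pair and month. It runs in memory and is not the thesis implementation: the record layout, the site names, and the function names are all hypothetical, since the abstract does not describe the PingER schema or the actual jobs.

```python
from collections import defaultdict

# Hypothetical records: (monitor_site, remote_site, month, packet_loss_percent).
# The real PingER record layout is not given in the abstract.
SAMPLE = [
    ("monitor-a", "remote-x", "2015-01", 2.0),
    ("monitor-a", "remote-x", "2015-01", 4.0),
    ("monitor-a", "remote-x", "2015-02", 1.0),
    ("monitor-a", "remote-y", "2015-01", 0.0),
]


def map_record(record):
    """Map step: emit one (key, value) pair per measurement."""
    monitor, remote, month, loss = record
    yield (monitor, remote, month), loss


def shuffle(pairs):
    """Group step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: average the packet loss values for one key."""
    return key, sum(values) / len(values)


def mean_packet_loss(records):
    """Run map, group, and reduce over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())
```

Calling `mean_packet_loss(SAMPLE)` returns a dictionary keyed by (monitor, remote, month). Per-month averages of this kind are the sort of series that a line chart or bar chart would then display.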
Download Thesis (Link to .PDF): https://confluence.slac.stanford.edu/download/attachments/123309267/full%20thesis-Aqsa%20Hameed.pdf