The Power and Perils of Security Analytics
BY Pratyusa K. Manadhata, Hewlett Packard Laboratories
Security analytics is the process of collecting, storing, analyzing, and visualizing security-relevant datasets to extract actionable security information. One example is detecting malware-infected devices in an enterprise (security information) from event logs collected in the enterprise (security data). This is a nascent area that holds promise for improving enterprise security, especially in timely breach detection, and is a marked departure from traditional approaches to securing enterprises.
Enterprises currently use many products, such as anti-malware software and intrusion detection systems, to detect breaches. These products, however, generate many false alarms and are not reliable. The promise of security analytics is to detect true attacks: collect alarms generated by security products in a network, augment the data with events from infrastructure elements in the network such as Domain Name System (DNS) servers and with events related to user activities, and then analyze the data to detect attacks in a reliable, scalable, and timely manner.
These datasets provide a holistic view of an enterprise and are a treasure trove of security information.
For example, DNS servers log DNS requests; firewalls log suspicious network traffic; anti-malware products detect malware infections and raise alarms; and HTTP proxy servers log websites accessed by devices. If devices in an enterprise network are infected with malware, the malware may issue DNS requests to contact its command and control (C&C) servers and may use compromised user credentials to communicate with and infect other devices in the enterprise. Hence DNS logs and Active Directory logs, which record authentication activity, will contain information about malware activities.
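One simple way to use this holistic view is to look for devices that show up in the suspicious events of more than one log source. The sketch below is only an illustration under stated assumptions: the record layout, the device names, and the precomputed `suspicious` flag are hypothetical, and real DNS, firewall, and proxy logs have their own schemas.

```python
from collections import defaultdict

# Hypothetical, simplified event records. Real logs have different schemas,
# and deciding what counts as "suspicious" is itself a hard problem.
EVENTS = [
    {"source": "dns", "device": "laptop-17", "suspicious": True},
    {"source": "anti-malware", "device": "laptop-17", "suspicious": True},
    {"source": "http-proxy", "device": "desktop-03", "suspicious": False},
    {"source": "firewall", "device": "desktop-08", "suspicious": True},
]


def suspicious_sources_by_device(events):
    """Map each device to the set of log sources that flagged it."""
    sources = defaultdict(set)
    for event in events:
        if event["suspicious"]:
            sources[event["device"]].add(event["source"])
    return dict(sources)


def corroborated_devices(events, min_sources=2):
    """Return devices flagged by at least `min_sources` distinct log sources."""
    per_device = suspicious_sources_by_device(events)
    return sorted(
        device
        for device, sources in per_device.items()
        if len(sources) >= min_sources
    )


print(corroborated_devices(EVENTS))
```

Requiring agreement between independent sources is one plausible way to reduce false alarms, at the cost of missing attacks that leave traces in only one log.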
Developing scalable and accurate techniques for detecting attacks from security data, however, remains a challenge for multiple reasons. Big data ecosystems such as Hadoop, MapReduce, Spark, and Storm have made data collection, ingestion, storage, and querying easier. But one still has to design algorithms and queries to identify patterns of interest in large datasets, essentially to automate a process that is currently manual and heuristics-driven.
The algorithms have to evolve continuously to adapt to changes in adversary behavior, user behavior, and network structure. Since benign events outnumber malicious events by orders of magnitude, the algorithms must have extremely low false alarm rates to be usable in practice. Finally, the algorithms should be able to identify ‘intent’ from events, since the same event patterns may be generated by both malicious attacker actions and benign user actions.
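A rough calculation shows why the imbalance between benign and malicious events forces such low false alarm rates. The event counts and rates below are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def alert_precision(total_events, malicious_events, detection_rate, false_alarm_rate):
    """Return the fraction of raised alerts that are truly malicious."""
    benign_events = total_events - malicious_events
    true_alerts = malicious_events * detection_rate
    false_alerts = benign_events * false_alarm_rate
    return true_alerts / (true_alerts + false_alerts)


# Assumed scenario: 1,000,000 events, of which 100 are malicious,
# and a detector that catches 99% of malicious events.
for far in (0.01, 0.0001):
    precision = alert_precision(1_000_000, 100, 0.99, far)
    print(f"false alarm rate {far}: precision {precision:.4f}")
```

Under these assumptions, a 1% false alarm rate means that only about one alert in a hundred points to a real attack, while a 0.01% rate brings that to roughly one in two.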
A much bigger concern with security analytics is its potential impact on privacy. For example, HTTP logs and DNS logs reveal users’ web browsing behavior; unwanted disclosure of this information would violate users’ privacy.
Hence security analytics systems must aim to preserve privacy during the entire process, from data collection through data analysis and reporting. Our observation is that privacy-sensitive data, e.g., web searches and visits to health-related websites, is usually benign data. Hence analytics systems can afford not to collect privacy-sensitive data without significantly affecting detection results. In fact, they may produce better results by focusing on a smaller but relevant subset of data. Though counter-intuitive, we may be able to have both privacy and security at the same time.
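The idea of not collecting privacy-sensitive data can be sketched as a filter applied at collection time. This is a minimal illustration, not the author's system: the category table, the example domains, and the function names are all hypothetical, and a real deployment would need a reliable way to categorize domains.

```python
# Hypothetical category lookup. In practice this might come from a
# URL categorization service; the domains here are placeholders.
DOMAIN_CATEGORIES = {
    "clinic.example": "health",
    "search.example": "search",
    "updates.example": "software",
}

# Assumed set of categories treated as privacy sensitive.
SENSITIVE_CATEGORIES = {"health", "search"}


def is_privacy_sensitive(domain):
    """Return True if the domain falls in a privacy-sensitive category."""
    return DOMAIN_CATEGORIES.get(domain) in SENSITIVE_CATEGORIES


def collect(records):
    """Keep only records whose domain is not privacy sensitive.

    Uncategorized domains are kept, since they may be the ones of interest.
    """
    return [r for r in records if not is_privacy_sensitive(r["domain"])]


RECORDS = [
    {"device": "laptop-17", "domain": "clinic.example"},
    {"device": "laptop-17", "domain": "unknown-host.example"},
    {"device": "desktop-03", "domain": "updates.example"},
]

for record in collect(RECORDS):
    print(record["device"], record["domain"])
```

Filtering before storage, rather than after analysis, means the sensitive records never enter the analytics system at all.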