Using Spark and ThingSpan for Intelligence Analytics
Human Intelligence (HUMINT) data forms a huge graph of connected snippets of information about criminals and terrorists, together with analyst reports and a wealth of background information. In this example, we work primarily with telephone metadata: Call Detail Records and the people involved in the calls.
We will look for suspicious patterns of calls and, if we find any, try to determine whether any of the people involved has been sighted near a potential target, such as an important government facility.
We start by loading Person, Call Detail Record, Location and Sighting data into a ThingSpan graph structure, shown in Figure 1 below. Loading can be done in parallel when a large amount of new information has to be ingested; in practice, hundreds of millions of objects and connections may be loaded per day.
Fig. 1: HUMINT graph in Objectivity’s ThingSpan
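To make the data model concrete, here is a minimal in-memory sketch in Python. It is not the ThingSpan API, which this article does not show: the class `HumintGraph`, its methods, and the edge labels `Placed` and `Received` are hypothetical names chosen for illustration. Each call is stored as its own Call Detail Record vertex, so that a call forms the Person to Call_Detail_Record to Person path used in the next step, and each sighting becomes a "Seen" connection from a Person to a Location.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    vid: str   # e.g. "P7", "CDR-1", "PlaceX"
    kind: str  # "Person", "CallDetailRecord" or "Location"


class HumintGraph:
    """Toy in-memory property graph (hypothetical; not the ThingSpan API)."""

    def __init__(self):
        self.vertices = {}
        # vertex id -> list of (edge label, neighbour id); each edge is
        # recorded at both endpoints so navigation can start from either end.
        self.edges = defaultdict(list)

    def _ensure(self, vid, kind):
        # Create the vertex only if it has not been loaded already.
        if vid not in self.vertices:
            self.vertices[vid] = Vertex(vid, kind)

    def _connect(self, a, label, b):
        self.edges[a].append((label, b))
        self.edges[b].append((label, a))

    def load_call(self, cdr_id, caller, callee):
        # A call becomes the path Person -Placed-> CDR -Received-> Person.
        self._ensure(caller, "Person")
        self._ensure(callee, "Person")
        self._ensure(cdr_id, "CallDetailRecord")
        self._connect(caller, "Placed", cdr_id)
        self._connect(cdr_id, "Received", callee)

    def load_sighting(self, person, place):
        # A sighting becomes a "Seen" connection from a Person to a Location.
        self._ensure(person, "Person")
        self._ensure(place, "Location")
        self._connect(person, "Seen", place)


# Usage with two hypothetical records:
g = HumintGraph()
g.load_call("CDR-1", "P7", "P8")
g.load_sighting("P14", "PlaceX")
print(len(g.vertices))  # 5
```

Storing each call as a vertex rather than as a direct Person-to-Person edge leaves room for per-call attributes such as time and duration, and mirrors the Person to Call_Detail_Record to Person relationship the analysis relies on.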
The next step is to run Spark GraphX graph algorithms over the Person to Call_Detail_Record to Person relationships to look for islands within the graph, shown in Figure 2 below. Islands are subgraphs whose members are connected to one another but not to any other part of the graph. ThingSpan automatically generates the DataFrames that Spark components use to access data, whether it is stored in HDFS or in any other datastore.
Fig. 2: Island identified in graph
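GraphX provides connected components directly (`connectedComponents()` on a `Graph` in its Scala API), and an island is any component other than the main one. The plain-Python sketch below reproduces the idea with union-find so that it runs without a Spark cluster. It projects each Person to Call_Detail_Record to Person path down to a Person-Person link, treats the largest component as the "mainland", and reports the rest as islands. The call records are invented for illustration and arranged so that the island matches the one described in the text.

```python
from collections import defaultdict

# Hypothetical call records: (cdr_id, caller, callee).
CALLS = [
    ("CDR-01", "P1", "P2"), ("CDR-02", "P2", "P3"), ("CDR-03", "P3", "P4"),
    ("CDR-04", "P4", "P5"), ("CDR-05", "P5", "P6"), ("CDR-06", "P6", "P9"),
    ("CDR-07", "P9", "P10"), ("CDR-08", "P10", "P12"), ("CDR-09", "P12", "P13"),
    # The calls below never involve anyone outside P7, P8, P11, P14, P15.
    ("CDR-10", "P7", "P8"), ("CDR-11", "P8", "P11"), ("CDR-12", "P11", "P14"),
    ("CDR-13", "P14", "P15"), ("CDR-14", "P15", "P7"),
]


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_components(calls):
    """Group persons linked by any chain of Person -> CDR -> Person paths."""
    parent = {}
    for _cdr, caller, callee in calls:
        parent.setdefault(caller, caller)
        parent.setdefault(callee, callee)
        root_a, root_b = find(parent, caller), find(parent, callee)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = defaultdict(set)
    for person in parent:
        groups[find(parent, person)].add(person)
    return list(groups.values())


def find_islands(calls, min_size=2):
    """Every component except the largest one, ignoring fragments below min_size."""
    components = sorted(connected_components(calls), key=len, reverse=True)
    return [c for c in components[1:] if len(c) >= min_size]


islands = find_islands(CALLS)
print([sorted(c, key=lambda p: int(p[1:])) for c in islands])
# [['P7', 'P8', 'P11', 'P14', 'P15']]
```

Union-find is a single-machine stand-in here; GraphX computes the equivalent labelling in a distributed way, tagging each vertex with the lowest vertex id in its component.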
There is indeed an island, consisting of P7, P8, P11, P14 and P15; no calls connect any of them to people outside the group. This may simply be because they are a closely knit but isolated family or a group of teenagers. Some further data mining on their demographics could settle that, but it is probably just as fast to move on to the next step.
In Figure 3 below, we now use ThingSpan graph analytics to look for connections that record sightings (“Seen”) between the people in the isolated group and places that we know may be targets for attacks of various kinds. This is a simple navigational query, using only the sighting connections and ignoring the phone calls. We could also use a pathfinding query to find all paths of any type except those involving phone calls.
Fig. 3: Sightings of people of interest identified in graph
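The two query styles described above can be sketched in plain Python over a list of labelled edges. This is an illustration under stated assumptions, not ThingSpan's query interface: the call-edge labels `Placed` and `Received`, the function names, and every place other than PlaceX are hypothetical. `seen_near_targets` is the one-hop navigational query that follows only "Seen" connections; `paths_avoiding_calls` is a breadth-first pathfinding query that may traverse any connection type except phone calls.

```python
from collections import deque

# Hypothetical edges: (source, label, destination).
EDGES = [
    ("P14", "Seen", "PlaceX"),
    ("P15", "Seen", "PlaceX"),
    ("P7", "Seen", "Cafe1"),
    ("P3", "Seen", "PlaceY"),
    ("P7", "Placed", "CDR-10"),
    ("CDR-10", "Received", "P8"),
]
TARGETS = {"PlaceX", "PlaceY"}
ISLAND = {"P7", "P8", "P11", "P14", "P15"}
CALL_LABELS = {"Placed", "Received"}


def seen_near_targets(edges, people, targets):
    """Navigational query: follow only 'Seen' edges from the given people."""
    return {
        (src, dst)
        for src, label, dst in edges
        if label == "Seen" and src in people and dst in targets
    }


def paths_avoiding_calls(edges, start, targets, max_hops=3):
    """BFS over every edge type except phone calls; returns simple paths to targets."""
    adj = {}
    for src, label, dst in edges:
        if label in CALL_LABELS:
            continue  # ignore the phone-call part of the graph
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, []).append(src)
    found, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets and len(path) > 1:
            found.append(path)
            continue
        if len(path) > max_hops:  # path already uses max_hops edges
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep paths simple (no revisits)
                queue.append(path + [nxt])
    return found


print(sorted(seen_near_targets(EDGES, ISLAND, TARGETS)))
# [('P14', 'PlaceX'), ('P15', 'PlaceX')]
print(paths_avoiding_calls(EDGES, "P14", TARGETS))
# [['P14', 'PlaceX']]
```

The navigational query is cheaper because it inspects a single edge type one hop out; the pathfinding variant trades that for the ability to surface indirect links, which is why it is bounded by `max_hops`.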
The ThingSpan navigational query reveals that Persons P14 and P15 have been sighted near PlaceX, which is a potential target. Therefore, it would be wise to dig deeper by getting a warrant to find out more about the people of interest. This investigation should also include the other members of the group, P7, P8 and P11, as shown in Figure 4 below.
Fig. 4: Deeper investigation of people of interest
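Widening the investigation from the sighted people to the rest of their island is a simple set operation. A minimal sketch, assuming the islands and the flagged people have already been produced by the previous steps; the function name `persons_of_interest` is hypothetical:

```python
def persons_of_interest(islands, flagged):
    """Return every member of any island that contains a flagged person."""
    widened = set()
    for island in islands:
        if island & flagged:  # the island contains at least one flagged person
            widened |= island
    return widened


ISLANDS = [{"P7", "P8", "P11", "P14", "P15"}]
FLAGGED = {"P14", "P15"}  # sighted near PlaceX

widened = persons_of_interest(ISLANDS, FLAGGED)
others = widened - FLAGGED
print(sorted(others, key=lambda p: int(p[1:])))  # ['P7', 'P8', 'P11']
```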
In this example, we have shown how the simple combination of Spark components and ThingSpan graph analytics can be used to find patterns in Human Intelligence. ThingSpan can be used to load streaming Fast Data (the Call Detail Records) and static Big Data (people, locations and sightings) whilst simultaneously carrying out parallel pattern-finding queries.
The architecture of such a system is shown below in Figure 5. Apache YARN is used to control the workflows and monitor the ThingSpan components in order to improve service availability.
Fig. 5: ThingSpan architecture
If you’re interested in learning more about how ThingSpan uses open source technologies like Spark, YARN, and HDFS to enable intelligence analytics, download our solution brief or schedule a demonstration by contacting us. We look forward to working with you!
Sponsored by Objectivity