This white paper provides an introduction to the HPCC Systems platform, which solves large-scale data processing problems.
As a result of the continuing information explosion, many organizations now need to process and analyze massive volumes of data. These data-intensive computing requirements can be addressed by scalable systems based on hardware clusters of commodity servers coupled with system software to provide a distributed file storage system, job execution environment, online query capability, parallel application processing, and parallel programming development tools. The LexisNexis HPCC Systems platform provides all of these capabilities in an integrated, easy-to-implement and use, commercially available, high-performance computing environment. This paper provides an introduction to the LexisNexis HPCC Systems architecture, also referred to as the LexisNexis Data Analytics Supercomputer (DAS) in government settings.
LexisNexis Risk Solutions, an industry leader in data content, data aggregation, and information services, independently developed and implemented the HPCC Systems platform as a solution for its own data-intensive computing requirements. In a similar manner to Hadoop (the open source implementation of MapReduce), the LexisNexis approach also uses commodity clusters of hardware running the Linux operating system and includes additional system software and middleware components to provide a complete and comprehensive job execution environment and distributed query and filesystem support needed for data-intensive computing.
The HPCC Systems platform includes a powerful, high-level, heavily optimized, data-centric declarative language for parallel data processing called ECL (Enterprise Control Language). Further information on ECL is provided later in this paper. The power, flexibility, advanced capabilities, speed of development, maturity, and ease of use of the ECL programming language are a primary distinguishing factor between the LexisNexis HPCC Systems platform and other data-intensive computing solutions.
Advantages of selecting the LexisNexis HPCC Systems platform for data-intensive computing include: (1) a highly integrated system environment with capabilities from raw data processing to high-performance queries and data analysis using a common language; (2) an optimized cluster approach which provides high performance at a much lower system cost than other system alternatives, resulting in significantly lower total cost of ownership (TCO); (3) a stable and reliable processing environment proven in production applications for varied organizations over a 10-year period; (4) an innovative data-centric programming language (ECL) with extensive built-in capabilities for data-parallel processing, which significantly increases programmer productivity for application development and automatically optimizes execution graphs with hundreds of processing steps into single efficient workunits; (5) a high level of fault resilience and capabilities which reduce the need for re-processing in case of system failures; (6) suitability for a wide range of data-intensive applications, from large-volume ETL processing to support databases, data warehouses, and high-volume online applications, to network security analysis of massive amounts of log information; and (7) availability from and support by a well-known leader in information services and “large data” solutions (LexisNexis), which is part of one of the world’s largest publishers of information, RELX Group.
Sponsored by HPCC Systems®