Big Data: Three questions to Bigstep
Ioana Hreninciuc, commercial director, Bigstep.
Q1. What is your current product offering?
Bigstep is an Infrastructure as a Service (IaaS) provider that combines bare metal servers with the flexibility of the cloud to create the world’s most powerful public computing infrastructure. It allows organisations to process big data faster and more effectively than by any other means. We can achieve such power and performance because we have improved three essential aspects:
We have removed the hypervisor, which, in the case of big data workloads, can waste 20-80% of a physical server’s processing capabilities. It’s not just that the hypervisor itself needs resources to work: the delay it introduces by performing software switching and resource management also adds up quite significantly when the machine needs to perform billions of operations per second.
We also believe that vertical scalability is important in big data, so our larger instances are actually some of the largest you’ll find in a public cloud. With two ten-core CPUs at 3GHz/core and 192 GB of RAM, they provide massive computational performance and clients have direct access to all of it – we don’t oversell or share resources in any way.
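As a rough illustration of the figures quoted above, the sketch below works out how much of a 20-core instance would remain if 20% or 80% of its capacity were lost to overhead. The function name and the simple linear model are assumptions made for illustration only; real overhead depends on the workload and is not measured here.

```python
# Back-of-the-envelope arithmetic using the numbers quoted in the interview.
# Assumption: overhead scales linearly with core count (a simplification).

CORES_PER_INSTANCE = 2 * 10  # two ten-core CPUs


def effective_cores(total_cores: int, overhead_percent: int) -> float:
    """Return the cores' worth of capacity left after the given overhead."""
    if not 0 <= overhead_percent <= 100:
        raise ValueError("overhead_percent must be between 0 and 100")
    return total_cores * (100 - overhead_percent) / 100


if __name__ == "__main__":
    for overhead in (20, 80):  # the 20-80% range quoted for big data workloads
        left = effective_cores(CORES_PER_INSTANCE, overhead)
        print(f"{overhead}% overhead leaves {left} of {CORES_PER_INSTANCE} cores")
```

Under this simplified model, the quoted range corresponds to losing between 4 and 16 of the 20 cores.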
Because we’ve removed the hypervisor, all our networking is also bare metal. Our unique network architecture also allows us to create a Layer 2 interconnect between any two ports – which means any connection between two instances in our infrastructure is line-rate. Add cut-through switching to that and we essentially provide the fastest network performance that you can get in a cloud: bare metal cut-through switching + line-rate transfer speeds. And to take advantage of this performance at the machine level, we’ve added 4 x 10 Gbps ports to most of our instances. Even our smallest instances come with 4 x 1 Gbps ports – four times the maximum capacity some providers offer.
Our Full Metal Storage is a distributed all-SSD storage system. It not only provides extremely high performance for both reads and writes but is also highly resilient. We use enterprise-grade SSD drives and, although this is never going to be the cheapest storage option, the tremendous difference in performance more than makes up for the difference in price.
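To put the port counts above in perspective, here is a minimal sketch of the aggregate bandwidth and an idealised transfer time. It assumes all four ports are used in parallel at full line rate, ignores protocol overhead, and uses decimal units (1 TB = 10^12 bytes); the function names are hypothetical.

```python
# Idealised bandwidth arithmetic for the port configurations quoted above.
# Assumptions: perfect parallel use of all ports, no protocol overhead.

BITS_PER_TERABYTE = 8 * 10**12  # decimal terabyte
BITS_PER_GIGABIT = 10**9


def aggregate_gbps(ports: int, gbps_per_port: int) -> int:
    """Total theoretical bandwidth across all ports, in Gbps."""
    return ports * gbps_per_port


def transfer_seconds(terabytes: int, gbps: int) -> float:
    """Ideal time to move the given amount of data at the given rate."""
    return terabytes * BITS_PER_TERABYTE / (gbps * BITS_PER_GIGABIT)


if __name__ == "__main__":
    for ports, speed in ((4, 10), (4, 1)):
        total = aggregate_gbps(ports, speed)
        secs = transfer_seconds(1, total)
        print(f"{ports} x {speed} Gbps = {total} Gbps; 1 TB in about {secs:.0f} s")
```

Under these assumptions, 4 x 10 Gbps moves 1 TB in roughly 200 seconds, against roughly 2,000 seconds for 4 x 1 Gbps. Real-world throughput will be lower.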
Q2. Who are your current customers and how do they typically use your products?
Whilst our bare metal cloud can be used by anyone, so far we have seen most demand from three main types of customer:
The most important use-cases we’ve seen so far are:
o Pricing comparison and analysis – retailers will monitor their competition to learn and adapt pricing levels
o Behavioural analysis – users’ behaviour is analysed and paired with recommendation engines, so retailers can send more attractive offers to their clients. Multivariate testing is also included here.
o Social media analysis – usually brand sentiment analysis
– Security companies
They analyse log files or user behaviour in order to prevent fraud, detect threats or improve security systems.
– Big Data as a Service companies
SaaS companies that allow users to query large amounts of data – either historical or stream data.
All these companies use us for infrastructure and deploy different software stacks. Redis, ElasticSearch and Hadoop (usually Cloudera’s distro) seem to be the names we keep encountering from a software perspective but it might also be a snowball effect – we’re better at addressing needs for certain stacks because we’ve come across them more.
Q3. What are the main new technical features you are currently working on and why?
Right now, the main thing that we’re trying to do is go up the stack and allow users to easily deploy and integrate a number of technologies on our infrastructure. Hadoop distributions and NoSQL DBs are obvious candidates. Initially we thought allowing people to deploy these technologies was enough, but we soon found out that it’s connecting them together that’s the difficult part, not installing them. So now we’re working on ways to create an integrated system, rather than just add disparate software and let users do the hard work.
Also, while testing some of these technologies, we identified new infrastructure needs. Some applications are better at vertical scaling, some at horizontal scaling; some work best with distributed storage, some with local. Based on these findings we’re also reviewing our choice of instances and network capacity.