Processing big data with Hadoop

Published on 30/11/2016 in News

More and more companies are working with vast amounts of customer data collected by their systems. “Proximus uses Hadoop for this, an open-source framework with which companies can process and analyze big data,” explains Marc Matthys, Domain Manager in IT at Proximus.

“We use Hadoop to analyze certain types of customer data. The aim is to get to know our customers better and to predict what they expect and what they need. That way, we can raise our service provision to a higher level and improve customer satisfaction still further. Hadoop comes with a whole ecosystem: data warehousing and all sorts of tools to collect, load and analyze data. It’s far more than just business intelligence.”

What can you do with Hadoop?

“The main reason for working with Hadoop is that you want to assemble large volumes of data,” says Marc Matthys. “Volumes that would cost you a lot of money if you were to keep them in conventional databases or data warehouses. And, with Hadoop, you can bring together complex data sources such as social media, sound and video, sensors in the Internet of Things, etc., and analyze them. Other reasons for using Hadoop are migrating data from conventional databases and data warehouses or, quite simply, cheap archiving of information that’s always within easy reach.”

What do you need as a company to start with Hadoop?

“First and foremost, experienced system engineers and big data experts. Of course, you also need a number of servers that Hadoop lets cooperate as a cluster, combining the strength of all the individual machines. In principle, any server can be used for this, but it is best to work with similar systems; in Hadoop these are called nodes. Alternatively, you could install Hadoop on a single PC to start with, to get to know the technology.”
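Getting to know Hadoop on a single PC usually means running it in so-called pseudo-distributed mode, where one machine plays all the cluster roles. A minimal sketch of the two configuration files involved, assuming a standard Apache Hadoop installation (the hostname and port are illustrative):

```xml
<!-- core-site.xml: point the default filesystem at a local HDFS instance -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single node cannot replicate blocks to other machines -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After formatting the namenode (`hdfs namenode -format`) and starting HDFS (`start-dfs.sh`), the one machine behaves like a tiny cluster.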

What difficulties might you encounter and how do you deal with them? 

“The difficulties depend on the configuration you have chosen and how you want to integrate Hadoop with the rest of your IT installation,” says Matthys. “The main way to resolve problems is to have experienced IT people backed up by your Hadoop provider, or to find solutions yourself in the open-source communities.”

How is Hadoop implemented?

“It basically involves five stages:

  1. A Hadoop project starts by choosing a distribution: a commercial one or the plain open-source version.
  2. Then you decide on your hardware configuration depending on your objective(s). Do you want one large cluster with which you will support a number of different projects or applications? Or would you prefer several clusters? Both scenarios have their advantages and disadvantages.
  3. Once the OS has been installed (usually Linux), Hadoop is installed and configured.
  4. After that, the software developers can get down to work with the various components in the ecosystem.
  5. Finally, you determine how your users will work with the information that you provide. Lots of tools are available to the data scientists, but they are fairly complex. In addition, there are countless commercial solutions with user-friendly interfaces. The conventional IT providers are also integrating their solutions with Hadoop so as not to miss the boat.”
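The programming model that the developers in stage 4 typically start from is MapReduce, which Hadoop popularized. The map/shuffle/reduce idea can be illustrated without a Hadoop installation at all; this is a minimal, Hadoop-free Python sketch of a word count, not the actual Hadoop API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts per word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data with Hadoop", "Hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # counted across both lines
```

On a real cluster, the map and reduce functions run in parallel on many nodes and the shuffle moves data between them over the network; the logic, however, is exactly this.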

Why did Proximus go for a commercial version of Hadoop?

“Hadoop is an open-source technology: it evolves thanks to contributions from universities, individuals and companies across the world. But, as a company, you need to be able to rely on a stable version that has undergone the necessary tests on the most common IT infrastructure. A handful of companies offer a version like this plus online support at a price that depends on the size of your work environment. Proximus chose Hortonworks,” Matthys concludes.

Don’t hesitate to visit our Proximus Analytics page for more information.


One magazine is the Proximus B2B magazine for CIOs and IT professionals in large and medium-sized organisations.
