
Collecting data is good. Being able to store it, make it reliable and analyze it is better! To do this, companies need experts who combine technical skills with an understanding of business issues: Data Engineers.

What is data?

All companies collect data, whether they intend to or not. Through their commercial, marketing and operational activities, they gather a volume of data every day that is proportional to the intensity of their activity: CRM, connected objects, social networks, search engines, use of the service by customers, etc.

Over the last twenty years, with the exponential growth of technology, they have been learning how to use this data: internally, to improve their product or service, support decision making, automate through AI and measure results; or externally, by reselling it to other companies.

Now that everyone has understood the importance of data, wasting a single drop is out of the question. It is the new wealth of the 21st century: while oil reserves are running out, data reserves keep filling up, and they have no limits.

What is the role of the Data Engineer?

More and more companies are hiring Data Engineers to make their data reliable and usable, so that it can then be understood and visualized.

Why do companies need this job?

Their activity leads all companies to collect a large amount of data, and they want to extract value from it in different ways:

  • External: B2B resale
  • Internal: product improvement, decision support, measuring the impact of decisions, etc.

We distinguish between companies whose product and/or business depends on the collection of data (advertising, digital marketing, social networks, streaming platforms, etc.) and those whose activity involves a large volume of data (media, networking platforms).

All these companies need the skills of a Data Engineer at the entry point of the data value chain, to collect raw data, transform it into usable data and then format it to make it available to:

  • Data Scientists, who need data in the right format for their algorithms to consume.
  • Data Analysts, who exploit data in the form of dashboards, visualization tools or reports.

Because the Data Engineer works upstream of data processing, his/her impact on the business is indirect: it is felt through the work of Data Scientists and Data Analysts. It can be more or less strong depending on the volume of data that the company collects.

But as the gateway to the data, the Data Engineer is indirectly responsible for all the value that will be extracted from the data collected. Without this profile, it is very difficult, if not impossible, to carry out any data-related task anywhere in the company.

The missions of the Data Engineer

A Data Engineer is someone with a technical background (most often in software development). He/she builds the architecture of the Big Data system and must ensure that data from different sources can be collected, transformed and stored. To do so, he/she develops solutions that can process a large volume of data in a limited time.
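The collect, transform and store steps can be sketched in a few lines. This is only a minimal illustration using the Python standard library: the two sources (`CRM_CSV`, `APP_JSON`), the `purchases` table and the function names are hypothetical, and a real Big Data system would rely on distributed tools rather than an in-memory SQLite database.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical raw sources: a CSV export from a CRM and JSON events from an app.
CRM_CSV = "user_id,country,amount\n1,FR,10.50\n2,de,7.00\n3,FR,\n"
APP_JSON = '[{"user_id": 4, "country": "es", "amount": "3.25"}]'


def collect():
    """Collect raw records from both sources as plain dictionaries."""
    rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    rows.extend(json.loads(APP_JSON))
    return rows


def transform(rows):
    """Normalize types and casing, and drop records with a missing amount."""
    clean = []
    for row in rows:
        if row["amount"] in ("", None):
            continue  # unusable record: no amount
        clean.append(
            (int(row["user_id"]), str(row["country"]).upper(), float(row["amount"]))
        )
    return clean


def store(records, connection):
    """Store the cleaned records in a relational table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS purchases (user_id INTEGER, country TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
store(transform(collect()), connection)

# Once stored, the data can be queried by analysts or data scientists.
totals = connection.execute(
    "SELECT country, SUM(amount) FROM purchases GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```

Keeping the three steps in separate functions is a design choice: each one can be tested, replaced or scaled independently as sources and volumes grow.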

The job of a Data Engineer is to prepare the ground so that a Data Scientist can use "clean" data and exploit it in more complex ways: identifying trends (insights), making predictions and drawing inferences with Machine Learning algorithms.

As part of building this architecture, the Data Engineer chooses storage tools suited to the type of data and to the ratio between how much data is stored and how often it is queried.

With an interest in Development and Operations (DevOps), he/she collaborates directly with the other data roles and knows how to balance release constraints with rapid development iterations.

The main challenges the Data Engineer faces are performance, scalability and the management of large volumes of data.

The role according to the size of the company

In a small tech team, he/she may be responsible for the whole data function, combining the jobs of Data Engineer and Data Analyst. He/she will thus have control over the entire data development cycle, without being able to go deeply into any one subject.

This requires broad (horizontal) knowledge of data issues, at the expense of deep (vertical) expertise.

In a large tech team, he/she works under the responsibility of a Data Manager (Head of Data, CDO, Lead Data Manager), in collaboration with the Data Scientist and the Data Analyst, who may work on the same decision-making issues but produce different outputs.

The Data Analyst will develop visual (dashboard) and reporting tools, while the Data Scientist will implement predictive models.

In charge of setting up the architecture of the Big Data system (hence the alternative title of Data Architect), he/she also works with the DevOps team to build the data reservoirs known as Data Warehouses.

What issues does the Data Engineer address?

The Data Engineer will:

  • Develop and implement a data collection, storage and modeling process. This is where he/she works on data infrastructure issues.
  • Set up relational and non-relational databases to give Data Scientists and Data Analysts access to the data.
  • Protect and secure access to the company's data, to prevent other companies or unauthorized users from accessing its databases.
  • Establish a data policy that complies with the GDPR (RGPD in French), so that the company always meets European standards for user data protection.
  • Bring his/her Big Data expertise to bear in collaboration with the other members of the data team (Data Scientists and Data Analysts).
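As an illustration of the data protection points in the list above, pseudonymization is one technique often applied before data is shared with analysts. The sketch below is an assumption about how it could be done with a keyed hash (HMAC-SHA256) from the Python standard library; `SECRET_KEY`, `pseudonymize` and `protect` are hypothetical names, and pseudonymization alone does not make a data policy compliant.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a secrets manager,
# never from source code.
SECRET_KEY = b"replace-me"


def pseudonymize(value: str) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a personal identifier."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def protect(record: dict) -> dict:
    """Replace the email with its pseudonym and keep the other fields."""
    safe = {key: val for key, val in record.items() if key != "email"}
    safe["user_key"] = pseudonymize(record["email"])
    return safe


raw = {"email": "Alice@example.com", "country": "FR", "amount": 10.5}
safe = protect(raw)
print(safe)
```

Because the same email always maps to the same `user_key`, analysts can still count or join on users without ever seeing the address itself.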

Collaboration in the team

The Data Engineer will work with a Machine Learning Engineer, a Data Scientist or a DevOps engineer.

What are the skills of a data engineer?

He/she uses NoSQL databases most of the time and relies on the cloud for infrastructure. He/she also knows how to use technologies such as Airflow to orchestrate pipelines and Spark to process these large volumes of data properly.
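To give an idea of what "orchestrating" means, the sketch below runs a small pipeline as a graph of dependent tasks. It deliberately uses `graphlib` from the Python standard library (Python 3.9+) rather than Airflow's own API, and the task names are hypothetical; an orchestrator such as Airflow adds scheduling, retries and monitoring on top of this basic idea.

```python
from graphlib import TopologicalSorter

executed = []  # records the order in which tasks actually ran


def extract():
    executed.append("extract")


def clean():
    executed.append("clean")


def aggregate():
    executed.append("aggregate")


def publish():
    executed.append("publish")


# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "publish": {"aggregate"},
}
tasks = {"extract": extract, "clean": clean, "aggregate": aggregate, "publish": publish}

# Run every task in an order that respects the dependencies.
for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()

print(executed)
```

Declaring dependencies instead of hard-coding a call order is what lets an orchestrator run independent tasks in parallel and restart only the steps that failed.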

Generally speaking, the Data Engineer has a developer's background: he/she is an application developer with an affinity for administering IT infrastructure, which helps him/her propose the best solutions.

To summarize, the Data Engineer is a tech profile that specializes in the creation of software solutions around big data.

Soft skills

Rigor, curiosity, communication and team spirit are the key qualities of a good Data Engineer.

Technologies & platforms used

The data engineer will work with several technologies, platforms and tools:

  • DB languages: SQL, NoSQL
  • Storage and ETL: Redshift, Teradata, Cassandra
  • Processing and manipulation: Spark, Hadoop, Kafka
  • Data Analysis (Hadoop suite): HBase, Hive
  • Cloud skills: Microsoft Azure, AWS, GCP.

The Data Engineer will use the following programming languages:

  • Python
  • Java
  • Go

and, as a plus, a specialized language such as:

  • Scala
  • Julia
  • Perl.

What are the training courses to become a data engineer?

The majority of Data Engineers have a background in computer science engineering or a Master's degree in Big Data from a university. Some Data Engineers are also former Software Engineers or Big Data Engineers.

What is the salary of a data engineer?

The salary of a data engineer can double between a junior and a senior profile:

  • Junior Data Engineer: 40 to 50 k€.
  • Confirmed Data Engineer: 48 to 70 k€.
  • Senior Data Engineer: 65 to 100+ k€.

How can a career as a data engineer evolve?

Depending on the candidate's technical and soft skills, the career can evolve towards:

  • Lead Data Engineer
  • Head of Data
  • ML Engineer
  • Data Scientist

December 8, 2020
