We are currently partnering with a scale-up based in Berlin that is looking for a remote Data Engineer with strong Spark knowledge to join their team. The company has recently received substantial funding and is growing rapidly. Please find more details on the position below:
Your mission with us
Empower the BI team to grow and scale by making current data engineering more efficient and building new data competencies.
You will achieve this by
- Taking ownership and implementing highly scalable, big data jobs with Apache Spark.
- Addressing our growing need for data: building on our existing data models and data warehouses, you will conceptualise and design big data solutions for implementation in a scale-up environment.
- Working comfortably with Jupyter Notebooks (or similar) and ETL tools (e.g. Airflow).
- Designing, documenting and implementing data models for the data warehouse.
- Being a sparring partner for the Data Engineering, Development and Data Science communities.
- Working with data engineers, data analysts and data scientists, and applying the best-fitting machine learning models to a variety of data to provide business insights.
- Implementing solutions in Python.
- Working with the BI team to build out cloud computing and parallel processing using AWS services (EMR, EC2, Athena/Redshift Spectrum).
- Making Apache Spark-related DevOps configuration changes when needed in Linux/Mac/BSD environments.
- Creating the most fitting development and production processes and environments for Apache Spark.
- Utilising your experience with CI systems (e.g. Concourse) and testing approaches.
- Refining and orchestrating data warehouse systems (Postgres/MySQL).
- Handling releases and using version control with GitHub.
Skills and experience required
- A degree in Computer Science or a related technical field, or equivalent work experience.
- At least two years of experience with Apache Spark and at least two years of experience with cloud computing (e.g. AWS).
- Work experience with Jupyter, Zeppelin and ETL tools such as Apache Airflow (or similar) is required.
- Experience with CI systems and GitHub.
- High motivation to deepen your coding skills and adapt to new technologies.
- Relevant working experience with Linux, databases and/or data warehouses is essential.
- The ability to work both in teams and independently.
Nice to have:
- Experience with PyTorch or TensorFlow is not required but is a plus.
- Solid knowledge of Python 3, pandas, NumPy, SciPy, Matplotlib (or similar) and Spark MLlib.
- Experience applying mathematical functions and fitting data models; data analytics, data science, machine learning or data mining solutions.
You add to our culture with
- Your motivation to join our team.
- Your proactivity, passion for Data Engineering and strong Apache Spark expertise.
- Your strong communication skills in English.