April 12, 2024

Junior Data Engineer for ML/DL/NLP data science in the field of Research and Innovation and Public Policy

Summary

Based in Barcelona, SIRIS Academic offers junior positions for postgraduate students. We seek highly motivated candidates with a Bachelor's or Master's degree in Software Engineering, Computer Science or Engineering who are eager to embark on a career as a data engineer in our Data Science team.

If you are passionate about data engineering and want to make a significant impact in data solutions and analytics supporting better science, innovation and public policy, SIRIS Academic could be perfect for you. Take the first step towards a rewarding and challenging career in data and join us in our mission to use data and innovation to shape a better future.

Profile

Open position

Based in Barcelona, you will work full-time as a junior or entry-level Data Engineer, developing data-based tools and services in the context of public policy and scientific research and innovation.

You will help public administrations and governments make more rational and accountable decisions, and support universities and research organisations to focus their research on societal needs and high-impact innovation.

Your work will be mainly dedicated to:

  • Data ingestion: Collection, preparation, analysis, and management of data sources in the fields of public policy and scientific research and innovation. This includes data retrieval from remote sources, data scraping, data integration, data cleaning, data matching, and statistical analysis.
  • Machine & deep learning: Assisting in the development of machine learning pipelines, especially in the area of natural language processing (NLP), to enrich data and support advanced analytics in R&D initiatives.
  • Data analytics: Application of advanced analytics and AI tools to manage, refine, and enrich different data sources. This will aid in evidence-based decision making in strategic processes, the preparation of analytical reports, and the development and updating of data platforms and monitoring systems.
Knowledge, skills and abilities

Who you are

The ideal candidate should have:

  • Bachelor's or Master's degree in Software Engineering, Computer Science or Engineering.
  • Solid engineering and coding skills, with the ability to write well-crafted, well-tested, readable and maintainable code. 
  • Proficiency in Java or Python, ideally both.
  • Experience in data modelling and demonstrated proficiency in SQL.
  • Understanding of ETL/ELT infrastructures and Cloud architecture. Experience working with APIs.
  • Experience in data ingestion, custom ETL design, implementation and maintenance. 
  • Experience working with analysis tools such as pandas, SciPy, scikit-learn, and Jupyter/IPython notebooks.
  • Proficiency in English plus, ideally, at least one of the following languages: French, Italian, Spanish, Catalan.
  • Close attention to the complexity and uniqueness of the data, and a commitment to data quality assurance.
  • An appreciation of the importance of good documentation, both internal and external.
  • The ability to work with and improve existing code.
  • The capacity to learn fast, without fear of immersing yourself in new and challenging areas.
  • Curiosity about and interest in the fields of public policy and research and innovation.

Nice to have

While you don't need to be a world-class expert in all the technologies listed below, it would be an advantage if you have an understanding of, and fluency in, at least some of them:

  • Version-control systems: Experience with Git and with platforms such as GitHub, GitLab, or Bitbucket.
  • Containerization and Infrastructure as Code, including Docker and general CI/CD experience. 
  • Data workflows: experience with workflow management engines (e.g. Airflow, Google Cloud Composer).
  • Cloud services: Familiarity with cloud solutions like Amazon Web Services or Google Cloud Platform. Experience working with cloud or on-prem Big Data/MPP analytics platforms (e.g. Amazon Redshift, Google BigQuery or similar).
  • Semantic web and Linked Open Data (LOD) technologies: Knowledge of technologies such as SPARQL, RDF, and OWL.
  • Machine Learning libraries: Prior exposure to machine learning and previous use of libraries like PyTorch, TensorFlow, scikit-learn, Hugging Face, Gensim, spaCy, and NLTK.
  • Test Driven Development: Previous experience with unit testing and shift-left approaches to automate data validation.

These skills are not essential, but they will certainly give you an advantage. We value the ability to learn and adapt over existing expertise, so if you're enthusiastic about data science and willing to learn, we'd love to hear from you.

What you will do

In this role, you will be instrumental in improving our data science environment for the R&D division of the Data Science team as well as supporting our transversal data engineering and back-end activities. The role involves supporting the development of data pipelines and the creation of self-service data tools. 

You will be responsible for:

  • Collaborating with Data Scientists to bring state-of-the-art machine learning models to production.
  • Designing and implementing new features, refactoring existing code, and applying data engineering good practices to R&D projects.
  • Developing and maintaining data pipelines to extract data from various sources (static files, databases, APIs) and integrate it into projects and internal data services, adhering to data modelling best practices.
  • Conducting experiments in your domain to enhance precision, recall, or cost savings.
  • Working with Python or JVM-based languages on Airflow-based data pipelines.
  • Building the necessary infrastructure for optimal extraction, transformation, and loading of data from a variety of data sources using AWS technologies.
  • Participating in Research & Development activities under the guidance of experienced researchers within the company. Improving the current data and computing architecture for R&D initiatives.
  • Writing and updating technical documentation.

Please have a look at our R&D activities and some of our projects for a quick snapshot of what we do and what you will be involved in.

Who you will work with

You will have the opportunity to work in a dynamic, dedicated, multicultural, multilingual, and passionate team of professionals. Our data engineering and data science team uses and develops state-of-the-art tools and methodologies from back-end to analytics to front-end, including Linked Open Data, Virtual Knowledge Graphs, NLP, DL classifiers, Generative AI, and advanced interactive visualisations.

We frequently publish our research and distribute resources openly.

We maintain UNiCS, an open data platform based on semantic technologies that integrates open data repositories about higher education, research and innovation.

Contract
  • Full-time position (37.5 hours/week).
  • After a successful 6-month trial period, the contract will become permanent. 
  • We also offer the possibility of partial remote work.
Hiring Process

Send your CV and cover letter to jobs@sirisacademic.com before 30 April 2024.

Please add the code “2024JuniorDataEngineer-RD” to the subject of the e-mail.

Only applications that include both a CV and a cover letter will be considered. The selection process will follow these steps and dates:

  1. After an initial CV screening, first interviews will be held between 30 April and 15 May.
  2. A final round of face-to-face interviews for selected candidates will take place between 15 May and 31 May.
  3. If selected, you will join SIRIS in June or July at the latest.
Benefits
  • Salary range is dependent on experience and in line with industry standards.
  • 33 days of vacation and an extra holiday on your birthday.
  • Flexible working hours and support for work-life balance.
  • Complimentary health insurance.
  • Bonus tickets to cover public transportation, childcare, restaurants, or gym memberships.
  • You will have a workstation in a shared office and a laptop with the operating system of your choice.
  • Our beautiful and lively office is located in downtown Barcelona, close to the Mercat de Santa Caterina.
  • We cover all work and travel-related expenses.
  • Relocation/moving budget.
  • We are committed to maintaining an ethical and sustainable working environment.
Context
Who are we and what do we do?

What? We are a consulting firm, fully owned by a not-for-profit foundation. We work with decision-makers in the public or non-profit sector, and provide them with insights, processes and tools that challenge them to think and enable them to accomplish their mission better. Our services can take many forms: a strategic or analytical report, a business intelligence tool, a participatory workshop, a coaching session, the moderation of a discussion, a policy recommendation, etc. But while the form varies, the core is always the same: what matters to us is helping leaders make better, more informed and more accountable decisions for the benefit of citizens.

How? We approach problems as researchers: questioning systematically, proposing models, looking for evidence. We never defend a position that we have not tested as deeply as we can. We provide decision-makers with the tools and arguments to make decisions, but we also understand that their job involves political compromises and that the ideal decision is not always possible.

Who? Born from research, our team of 45 people is a colourful mix, with specialities ranging from archaeology to physics, and philosophy to biology. Our team is held together by a strong ethical vision, based on the respect of the contribution of everyone, the belief in collective wisdom, and a deep curiosity.

Where? Based in Barcelona, we work with leaders and senior management in over 25 countries, helping analyse potential, build capacity and shape strategy for institutions such as philanthropic foundations, the European Commission, agencies and governments and over 200 Universities and Research Centres across the continent.

With whom? Our clients are leaders and managers in universities, regional governments, ministries, research organisations, not-for-profit foundations, etc.

In which sectors? Currently, most of our activity focuses on the sectors of higher education and research, science & innovation policy.

And… why? Deep down, we want to make the world a better place by improving public policies. We are strong believers in the power of science-based evidence, participative approaches and accountability processes to improve the way public policies are designed and decisions taken, day after day.