Open position
Based in Barcelona, you will work full-time as a junior or entry-level Data Engineer, developing data-based tools and services in the context of public policy and scientific research and innovation.
You will help public administrations and governments make more rational and accountable decisions, and support universities and research organisations to focus their research on societal needs and high-impact innovation.
Your work will be mainly dedicated to:
- Data ingestion: Collection, preparation, analysis, and management of data sources in the fields of public policy and scientific research and innovation. This includes data retrieval from remote sources, data scraping, data integration, data cleaning, data matching, and statistical analysis.
- Machine & deep learning: Assisting in the development of machine learning pipelines, especially in the area of natural language processing (NLP), to enrich data and support advanced analytics in R&D initiatives.
- Data analytics: Application of advanced analytics and AI tools to manage, refine, and enrich different data sources. This will aid in evidence-based decision making in strategic processes, the preparation of analytical reports, and the development and updating of data platforms and monitoring systems.
Who you are
The ideal candidate should have:
- Bachelor's or Master's degree in Software Engineering, Computer Science or Engineering.
- Solid engineering and coding skills, with the ability to write well-crafted, well-tested, readable and maintainable code.
- Proficiency in Java or Python, ideally both.
- Experience in data modelling and demonstrated proficiency in SQL.
- Understanding of ETL/ELT infrastructures and Cloud architecture. Experience working with APIs.
- Experience in data ingestion, custom ETL design, implementation and maintenance.
- Experience working with analysis tools such as pandas, SciPy, scikit-learn, and Jupyter/IPython notebooks.
- Proficiency in English plus, ideally, at least one of the following languages: French, Italian, Spanish, Catalan.
- Close attention to the complexity and uniqueness of the data, and adherence to data quality assurance practices.
- An appreciation of the importance of good documentation, both internal and external.
- The ability to work with and improve existing code.
- An aptitude for fast learning and a willingness to immerse yourself in new and challenging areas.
- Curiosity and interest in the fields of public policy and research and innovation.
Nice to have
While you don't need to be a world-class expert in all the technologies listed below, it would be an advantage if you have an understanding of, and fluency in, at least some of them:
- Version-control systems: proficiency with Git and experience using platforms such as GitHub, GitLab, or Bitbucket.
- Containerization and Infrastructure as Code, including Docker and general CI/CD experience.
- Data workflows: experience with workflow management engines (e.g. Airflow, Google Cloud Composer).
- Cloud services: familiarity with cloud solutions such as Amazon Web Services or Google Cloud Platform, and experience working with cloud or on-prem Big Data/MPP analytics platforms (e.g. AWS Redshift, Google BigQuery, or similar).
- Semantic web and Linked Open Data (LOD) technologies: Knowledge of technologies such as SPARQL, RDF, and OWL.
- Machine learning libraries: prior exposure to machine learning and previous use of libraries such as PyTorch, TensorFlow, scikit-learn, Hugging Face, Gensim, spaCy, and NLTK.
- Test Driven Development: Previous experience with unit testing and shift-left approaches to automate data validation.
These skills are not essential, but they will certainly give you an advantage. We value the ability to learn and adapt over existing expertise, so if you're enthusiastic about data science and willing to learn, we'd love to hear from you.
What you will do
In this role, you will be instrumental in improving our data science environment for the R&D division of the Data Science team as well as supporting our transversal data engineering and back-end activities. The role involves supporting the development of data pipelines and the creation of self-service data tools.
You will be responsible for:
- Collaborating with Data Scientists to bring state-of-the-art machine learning models to production.
- Designing and implementing new features, refactoring existing code, and applying data engineering good practices to R&D projects.
- Developing and maintaining data pipelines to extract data from various sources (static files, databases, APIs) and integrate it into projects and internal data services, adhering to data modelling best practices.
- Conducting experiments in your domain to enhance precision, recall, or cost savings.
- Working with Python or JVM-based languages on Airflow-based data pipelines.
- Building the necessary infrastructure for optimal extraction, transformation, and loading of data from a variety of data sources using AWS technologies.
- Participating in Research & Development activities under the guidance of experienced researchers within the company. Improving the current data and computing architecture for R&D initiatives.
- Writing and updating technical documentation.
Please have a look at our R&D activities and some of our projects for a quick snapshot of what we do and what you will be involved in.
Who you will work with
You will have the opportunity to work in a dynamic, dedicated, multicultural, multilingual, and passionate team of professionals. Our data engineering and data science team uses and develops state-of-the-art tools and methodologies across the stack, from back-end to analytics to front-end, including Linked Open Data, Virtual Knowledge Graphs, NLP, deep learning classifiers, generative AI, and advanced interactive visualisations.
We frequently publish our research and distribute resources openly.
We maintain UNiCS, an open data platform based on semantic technologies that integrates open data repositories about higher education, research and innovation.
Send your CV and cover letter to jobs@sirisacademic.com before 30 April 2024.
Please add the code “2024JuniorDataEngineer-RD” to the subject of the e-mail.
Only applications that include both a CV and a cover letter will be considered in the selection process. The selection process will unfold according to the following steps and calendar:
- After an initial CV screening, first interviews will be carried out between 30 April and 15 May.
- A final round of face-to-face interviews with selected candidates will take place between 15 May and 31 May.
- If selected, you will join SIRIS in June or July at the latest.