Data Engineer - Singapore - THALES DIS (SINGAPORE) PTE. LTD.

    Description
    Roles & Responsibilities

    As a Data Engineer in AIR Lab, you should be someone who enjoys designing and discussing processing patterns such as data quality control, streaming SQL, data sources/sinks, data synchronization, streaming backfill and stream-to-stream joins. You should be someone who cares about the quality of the technical implementation as much as you care about the quality of the delivery. You should be someone who enjoys working in a team of diverse people with multiple ethnic and cultural backgrounds. You should be someone who enjoys diving into the technical details of a problem and communicating the solution back to the team so that its members can learn from it. You should be someone who loves learning new technologies, finds innovative ways to apply newfound knowledge, has the courage to encourage fellow team members to do the same, and enjoys participating in all aspects of engineering activities in the AIR Lab.

    Responsibilities:

    • Improve and maintain the DataLake cybersecurity posture with regard to data governance and cybersecurity standards by working with other stakeholders (e.g., Data Architect, Data Assessment Office, Cybersecurity Office).
    • Improve and maintain the DataLake service levels for reliable data flow and healthy infrastructure (i.e., compute and storage).
    • Improve and maintain the total cost of ownership of the DataLake; this includes raising efficiencies around FinOps and CloudOps.
    • Improve and maintain the architecture for transforming data between the DataLake and a distributed search and analytics engine (e.g., Elasticsearch).
    • Lead the technical evolution of the DataLake, including (non-exhaustively) exploring new methods, techniques and algorithms (e.g., data meshes, AI/MLOps infrastructure).
    • Improve and maintain the data model, data catalogue (e.g., event data, batched data, persisted, ephemeral).
    • Work with the Data Architect to drive best practices across the engineering organization.
    • Implement features by defining tests, developing the feature and its associated automated tests; where appropriate, implement security tests and load tests.
    • Write and review the necessary technical and functional documentation in documentation repositories (e.g., JIRA, READMEs).
    • Work in an agile, cross-functional multinational team, actively engaging to support the success of the team.

    Requirements:

    Education

    • Bachelor's degree in Computer Science or Information Technology
    • Master's degree in Computer Science or Data Science, if applicable

    Essential Skills/Experience

    • Proficiency in designing and implementing ETL data pipelines (with structured or unstructured data) using frameworks like Dataflow/Apache Beam and Apache Flink; proficient in deploying ETL pipelines into a Kubernetes cluster in the Azure cloud as either virtual machines or containerized workloads (a minimal pipeline sketch follows this list).
    • Proficiency in designing and implementing data lifecycle management using scalable object-storage systems like MinIO (e.g., tiering, object expiration, multi-node deployment).
    • Proficiency in programming languages such as Java and Kotlin, with a focus on designing and developing scalable applications using both microservice and monolith approaches.
    • Proficiency in developing performant abstract data structures (e.g., deterministic data lookups versus heuristic data lookups); able to conduct independent research of methods, techniques and algorithms.
    • Demonstrated experience working with Continuous Integration and/or Continuous Delivery models; you are expected to be familiar with Linux (e.g., shell commands).
    • Proficiency in distributed source code management tools like GitLab and GitHub, and practice of GitOps.
    • With respect to ETL pipelines, you are expected to demonstrate proficiency in the following:
    1. pipeline configuration using GitLab
    2. environment management using GitLab; it is a bonus if you have demonstrated experience in deployment management (e.g., canary, blue/green rollouts) using GitLab
    • Familiar with cloud deployment strategies for public clouds (e.g., Azure Cloud, AWS, GCP) and Kubernetes, using virtualized and containerized workloads (e.g., Kaniko, Docker, virtual machines)
    • Good communication skills in English
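
    As an illustration of the pipeline work described above, the sketch below uses the Apache Beam Java SDK to read line-delimited records, drop empty lines as a toy data-quality check, normalise each record and write the cleaned output. The file paths, transform names and validation rule are hypothetical placeholders, not details of the AIR Lab codebase.

    ```java
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.TypeDescriptors;

    // Minimal batch ETL sketch (hypothetical paths and rules, for illustration only):
    // read line-delimited records, drop empty lines as a toy data-quality check,
    // normalise each record and write the cleaned output.
    public class EtlPipelineSketch {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply("ReadRawRecords", TextIO.read().from("/data/raw/events-*.txt"))
            .apply("DropEmptyLines", Filter.by(
                (SerializableFunction<String, Boolean>) line -> !line.trim().isEmpty()))
            .apply("NormaliseRecords", MapElements
                .into(TypeDescriptors.strings())
                .via((String line) -> line.trim().toLowerCase()))
            .apply("WriteCleanRecords", TextIO.write().to("/data/clean/events"));

        pipeline.run().waitUntilFinish();
      }
    }
    ```

    The same transform code can be submitted to different runners (e.g., the Flink runner on a Kubernetes cluster); the runner is selected through the pipeline options rather than the transform code.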

    Desirable Skills/Experience

    • Working knowledge of designing applications with a "shift-left" cybersecurity approach.
    • Working knowledge of other languages (e.g., Python3, Scala2 or Scala3, Go, TypeScript, C, C++17, Java17)
    • Experience implementing event-driven processing pipelines using frameworks like Apache Kafka and Apache Samza (a minimal sketch follows this list)
    • Familiar with the cloud deployment models (e.g., public, private, community and hybrid)
    • Familiar with the main cloud service models: Software as a Service, Platform as a Service and Infrastructure as a Service.
    • Familiar with designing and/or implementing AI/MLOps pipelines in public cloud (e.g., Azure, AWS, GCP)
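
    As a small illustration of the event-driven processing mentioned above, the following sketch uses the plain Apache Kafka Java clients to consume events from one topic, apply a trivial transformation and publish the result to a downstream topic. The broker address, topic names and transformation are assumptions for illustration only.

    ```java
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Minimal event-driven sketch (hypothetical broker, topics and transformation):
    // consume raw events, apply a trivial transformation, publish downstream.
    public class EventPipelineSketch {
      public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "event-pipeline-sketch");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

          consumer.subscribe(Collections.singletonList("raw-events"));
          while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
              // Hypothetical transformation; a real pipeline would validate and enrich here.
              String enriched = record.value().trim().toUpperCase();
              producer.send(new ProducerRecord<>("clean-events", record.key(), enriched));
            }
          }
        }
      }
    }
    ```

    A production version would add error handling, offset management and schema validation; heavier streaming needs such as joins or backfill are usually better served by a framework like Flink or Samza on top of the same topics.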

    Essential / Desirable Traits

    • Possess learning agility, flexibility and proactivity
    • Comfortable with agile teamwork and user engagement
    Skills

    Kubernetes
    Azure
    Pipelines
    Data Structures
    Architect
    Kotlin
    ETL
    Data Quality
    Data Governance
    Data Engineering
    SQL
    Continuous Integration
    Docker
    Data Science
    Java
    Apache
    Linux