Site Reliability Engineer

Job Title: Site Reliability Engineer
Contract Type: Permanent
Location: Kuala Lumpur
Salary: Attractive (RM)
Contact Name: Albert Lim
Contact Email:
Job Published: October 13, 2020 11:58

Job Description

About our client

Our client is one of the leading fintech start-ups.


About the role

We are obsessed with delivering a seamless and frictionless retail experience for our customers. We strongly believe that we can only deliver these amazing experiences for our customers and merchants when we drive a work culture that inspires innovation, rewards risk-taking and celebrates success. If you live to solve hard problems, love proving out new technologies and take pride in your deliverables, then we'd love to meet you!


Your responsibilities

  • Be an expert in infrastructure and develop best practices to help development teams use infrastructure more effectively.

  • Design, build and test proofs of concept to improve infrastructure performance, efficiency, reliability and scalability.

  • Automate all aspects of deployment and adopt Infrastructure as Code (IaC).

  • Ensure all key services are measured, monitored and raise alerts when needed.

  • Optimise the cost of our infrastructure and tooling.

  • Act as the point person for improving site reliability, and provide support for production issues as required.

  • Provide technical guidance and educate team members on CI/CD and DevOps practices.

  • Brainstorm new ideas and ways to improve development quality and speed.

  • Manage and continuously improve CI/CD pipelines and tooling with the development team.

  • Take the lead on capacity planning and help teams anticipate and prepare for growth.

  • Document and update new and existing processes.


You will have

  • 2+ years of experience as a DevOps, Infrastructure or Site Reliability Engineer working on large-scale, distributed systems.

  • Great verbal and written communication skills, both across teams and up and down the organisation.

  • Experience deploying microservice architectures in production, with an understanding of scaling and high-availability concerns.

  • A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems.

  • Experience deploying Docker containers on orchestrators such as Kubernetes, Rancher or Swarm.

  • Experience building CI/CD pipelines using GitLab, CircleCI, AWS CodePipeline, Jenkins, etc.

  • Experience deploying production workloads on AWS, Azure or Google Cloud Platform.

  • Experience setting up monitoring services such as New Relic, Datadog, Grafana + Prometheus, Elastic APM, etc.

  • Excellent knowledge of Linux and scripting (Bash, PowerShell, Python or similar).

  • Networking knowledge, including the TCP/IP stack, internet routing and load balancing.

  • Able to multitask, prioritize, and manage time efficiently.

  • Experience working with distributed teams across multiple time zones.