At Paymentology, we are redefining what is possible in the payments space. As the first truly global issuer-processor, we provide banks and fintechs with the technology and talent to launch and manage Mastercard, Visa, and UnionPay cards at scale across more than 60 countries.
Our advanced multi-cloud platform delivers real-time data, unmatched scalability, and the flexibility of shared or dedicated processing instances. This global reach and innovation set us apart.
We are looking for a Site Reliability Engineer to ensure the high availability, scalability, and performance of our platform. This role is essential for maintaining reliable systems, reducing operational overhead, and enabling continuous improvement across our global technology landscape. If you are passionate about automation, incident response, and working at the intersection of infrastructure and software, this is your opportunity to help build resilient systems that power financial inclusion worldwide.
What you get to do :
Platform Reliability and Scalability
- Build software that enhances Paymentology services' scalability and reliability.
- Ensure platform services meet required uptime and service quality levels.
- Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
- Regularly review and optimize SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation
- Contribute to the design, implementation, and maintenance of observability and monitoring solutions to track platform health, cost-effectiveness, reliability, and scalability, and identify potential issues for continuous improvement.
- Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
- Enable product teams to self-serve by participating in the development of a developer platform.
Production Issue Resolution
- Play an active role with incident response teams diagnosing and resolving production issues quickly to minimize downtime.
Standards Compliance
- Support product teams in building services that adhere to security and quality standards.
Cross-team Collaboration
- Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the end-to-end software development lifecycle, fostering a culture of reliability.
What you can look forward to :
At Paymentology, it’s not just about building great payment technology; it’s about building a company where people feel they belong and their work matters. You’ll be part of a diverse, global team committed to making a positive impact. Whether working across time zones or supporting local communities, you’ll find purpose and growth opportunities in a supportive, forward-thinking environment.
Travel Requirements :
What it takes to succeed :
- Strong understanding of cloud networking principles.
- Proficiency with monitoring tools such as Datadog, Splunk, Prometheus, Grafana, ELK Stack, and New Relic.
- Programming expertise, especially in systems programming languages and databases.
- Familiarity with CI/CD tools like Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline, CircleCI, and ArgoCD.
- Proven ability to achieve platform-level and end-to-end SLIs, SLOs, and SLAs, fostering accountability.
- Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
- Knowledge of solutions to reduce MTTI and MTTR.
- Comprehensive understanding of large-scale distributed platform architecture.
- Expertise in load balancing, fault tolerance, and resource allocation to maintain service quality at scale.
- Understanding of security best practices within cloud environments.
Education and Experience :
- Bachelor’s Degree in Computer Science, IT, or related field. Professional experience in a similar role may be considered.
- At least 2 years as a Site Reliability Engineer.
- At least 2 years in software development.
- Extensive cloud experience, especially with AWS.
- Proven expertise in infrastructure-as-code tools like Terraform, CloudFormation, Puppet, and Ansible.
- Hands-on experience with Docker, ECS, EKS, and Kubernetes.
Remote Work & Employment Type :
Full-time
Key Skills :
Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting
Experience : 2+ years
Vacancy : 1