Job Details

Site Reliability Engineer

Full Time

Pflugerville, TX

Experience: 3+ years

No. of openings:

Pay: $60 - $70

Job Information

Seeking an experienced Platform Site Reliability Engineer (SRE) with 3+ years of experience in Kubernetes, AWS, and cloud-native infrastructure. The role focuses on improving the reliability, scalability, and performance of digital payment platforms through automation, monitoring, and performance tuning.

Key Responsibilities:

  • Kubernetes Management: Deploy, scale, and optimize Kubernetes clusters.
  • AWS Cloud: Utilize AWS services (EC2, S3, EKS, RDS, etc.) for scalable infrastructure.
  • Automation & IaC: Develop workflows with Terraform, CloudFormation, or Ansible.
  • CI/CD Pipelines: Build and maintain CI/CD pipelines (e.g., Jenkins, GitLab CI).
  • Monitoring: Implement observability tools like Prometheus, Grafana, or CloudWatch.
  • Incident Management: Troubleshoot issues, conduct root cause analysis, and ensure system resilience.
  • Security: Apply best practices for securing infrastructure and maintaining compliance.

Required Qualifications:

  • Kubernetes: Expertise in multi-cluster management and optimization.
  • AWS: Proficiency in key AWS services and cloud best practices.
  • IaC Tools: Hands-on experience with Terraform or similar tools.
  • Scripting: Automation skills using Python, Bash, or Go.
  • Monitoring: Familiarity with observability stacks (ELK, Grafana).
  • Collaboration: Strong communication and teamwork abilities.

Preferred Qualifications:

  • AWS and Kubernetes certifications (e.g., CKA, AWS Solutions Architect).
  • Experience with service mesh (e.g., Istio) and microservices.
  • Knowledge of cost optimization for AWS infrastructure.
