Site Reliability Engineer

Perennial Systems

Pune
Application Deadline 6 days left
Eligibility
Experienced Professionals

Role Overview:

  • We are seeking a Site Reliability Engineer (SRE) with strong expertise in observability and monitoring using Datadog. The ideal candidate will be responsible for ensuring system reliability, performance, scalability, and availability across cloud-native and distributed environments by leveraging Datadog for monitoring, alerting, and incident management.

Key Responsibilities:

  • Design, implement, and maintain comprehensive monitoring and observability solutions using Datadog (APM, Infrastructure Monitoring, Logs, RUM, Synthetic Monitoring).
  • Define and track SLIs, SLOs, and SLAs; build dashboards and alerts aligned with reliability and business goals.
  • Proactively identify performance bottlenecks, system anomalies, and availability risks using Datadog metrics and traces.
  • Lead incident response, root cause analysis (RCA), and postmortems; improve alert quality and reduce noise.
  • Collaborate with engineering, DevOps, and security teams to improve system resilience and operational excellence.
  • Automate monitoring, alerting, and remediation workflows using Infrastructure as Code (Terraform/CloudFormation).
  • Support on-call rotations and continuously improve reliability practices.
  • Integrate Datadog with CI/CD pipelines and cloud services for end-to-end visibility.

Required Skills & Experience:

  • 3+ years of experience as an SRE, DevOps Engineer, or Production Support Engineer.
  • Strong hands-on experience with Datadog (dashboards, monitors, APM, logs, synthetics).
  • Solid understanding of SRE principles: error budgets, toil reduction, availability, latency, and reliability.
  • Experience with cloud platforms (AWS, Azure, or GCP).
  • Proficiency in Linux/Unix systems and networking fundamentals.
  • Experience with containers and orchestration (Docker, Kubernetes).
  • Scripting experience in Python, Bash, or Go.
  • Familiarity with incident management and on-call best practices.

Good to Have:

  • Experience implementing custom Datadog metrics and distributed tracing.
  • Knowledge of CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
  • Experience with configuration management and IaC tools (Terraform, Ansible).
  • Exposure to security monitoring and compliance observability.
  • Prior experience scaling high-traffic, distributed systems.

Important dates & deadlines

  • Registration Deadline: 3 Feb'26, 12:00 AM IST

Additional Information

Job Location(s)

Pune

Salary

Salary: Not Disclosed

Work Detail

Working Days: 5 Days

Job Type/Timing

Job Type: In Office

Job Timing: Full Time
