Overview
Job Opportunity: APPs Observability Specialist
Main Responsibilities
- Observability Setup and Configuration: Implement and configure observability tools (e.g., Dynatrace, New Relic, Datadog, Splunk, OpenTelemetry, or the ELK Stack) to monitor applications, infrastructure, and systems. Ensure end-to-end visibility by configuring metrics, logging, and distributed tracing for enterprise applications.
- Real-Time Performance Monitoring: Monitor application performance, response times, error rates, and system health in real time. Detect, analyze, and escalate anomalies or issues affecting application reliability.
- Troubleshooting and Root Cause Analysis: Help diagnose application performance issues by analyzing telemetry data, log patterns, and performance metrics. Provide root cause analysis (RCA) reports with detailed insights to improve long-term stability.
- Incident Management and Collaboration: Participate in incident detection and resolution processes to minimize downtime and service degradation. Work alongside application developers, DevOps, and SRE teams to resolve issues efficiently.
- Application Optimization: Identify bottlenecks and areas for performance optimization based on observability data. Develop recommendations for application scaling, resource allocation, and code improvements.
- Configuration and Tuning: Fine-tune observability dashboards, alerts, and thresholds to align with business-critical SLAs. Regularly update observability tools to support evolving applications and infrastructure.
- Documentation and Communication: Create and maintain documentation for observability processes, application health policies, and troubleshooting guides. Provide performance insights and weekly/monthly reports to stakeholders.
- Proactive Improvements: Apply predictive analysis and AI/ML-powered techniques to anticipate potential performance degradations. Suggest and implement best practices for observability and monitoring across the organization.