
Senior Observability Engineer

San Francisco, California | IT
Job ID: 103780
Listed on 2/18/2021

KellyMitchell matches the best IT and business talent with premier organizations nationwide. Our clients, ranging from Fortune 500 corporations to rapidly growing high-tech companies, are exceptionally served by our 1500+ IT and business consultants. Our industry is growing rapidly, and now is a great time to launch your career with the KellyMitchell team.

Job Summary: As a Senior Engineer, you will join a team of world-class, highly motivated, innovative, and collaborative engineers with deep domain expertise in cloud-based solutions and tools. You will have the opportunity to apply your wealth of experience in application monitoring and performance at the enterprise level. Our goal is to maintain a culture of continuous improvement across the engineering organization, where software engineers can use deep operational insight to optimize their services and provide more value to customers.

Duties:

  • Work closely with our Engineers and teams to instrument their services with New Relic and Splunk functionality
  • Support Engineers during critical and time-sensitive production events to ensure a timely resolution
  • Support Engineers in the Root Cause and Correction of Error flow to ensure we detect and avoid future customer impacts
  • Support and nurture improvements to processes and the knowledge base
  • Assist in the generation of major incident metrics and reports

Desired Skills/Experience:

  • 5+ years of software development (Python or Java)
  • Familiarity with one of the following monitoring tools: New Relic, Dynatrace, Datadog, or Prometheus
  • Familiarity with one of the following logging tools: Splunk or Elasticsearch
  • Ability to assist engineering teams with training and tools to observe how their code and services are performing so they can easily optimize the experience for the end user
  • Ability to develop monitoring dashboards that convey meaningful business data, software readiness, and service/system health, tailored for different audiences such as engineers and executives
  • Ability to help engineers understand how to optimize their services, drawing on concepts such as utilization, response time, throughput, latency, relative and absolute measures, variance, instrumentation, parallel and asynchronous processing, bottlenecks, tracing, resource contention, and MTTR
  • Strong working knowledge of microservices, containerization (Docker), and orchestration (Kubernetes)

 
