When you join Verizon

Verizon is a leading provider of technology, communications, information and entertainment products, transforming the way we connect across the globe. We’re a diverse network of people driven by our ambition and united in our shared purpose to shape a better future. Here, we have the ability to learn and grow at the speed of technology, and the space to create within every role. Together, we are moving the world forward – and you can too. Dream it. Build it. Do it here.

What you’ll be doing...

In this role, you will lead a cross-functional team that develops the SRE application monitoring framework. You will practice all tenets of SRE and provide the vision and technical leadership needed to carry out best-in-class monitoring practices that improve application reliability.

  • Serve as a strong technical lead who helps execute our vision for Site Reliability Engineering (SRE). Determine how each system relates to the others and use a broad range of tools to build the monitoring framework and automation that improve reliability for customers. Practices such as limiting the time operations spends on alert notifications, correlating alerts, and proactively identifying potential production issues all factor into this iterative improvement.
  • Create plans for intelligent alerting that produces more actionable responses, and help reduce MTTD and MTTR for all critical applications.
  • Develop and implement the next-gen monitoring solution for the enterprise based on SRE principles and practices.
  • Develop an effective data-driven approach for monitoring and alerting that enables the SRE team to maintain high availability and deliver a high quality of service.
  • Develop a log monitoring framework based on exceptions in access logs, server logs, and platform logs.
  • Analyze customer experience and usage-pattern data to identify gaps in current monitoring.
  • Work with SRE and development engineers to fine-tune alert thresholds and to increase alert effectiveness through event correlation and pattern recognition.
  • Develop and onboard new monitoring features and capabilities for critical metrics, and transition them to operations.
  • Define standards, guidelines, and templates for operational and business dashboards and for metrics alerting.
  • Implement, configure, and continually improve the performance of the ELK logging platform in on-premises and AWS environments.
  • Develop a robust alerting system that identifies problematic anomalies and minimizes false alarms.
  • Practice sustainable incident response and blameless postmortems.

What we’re looking for...

You’ll need to have:

  • Bachelor’s degree or four or more years of work experience.
  • Four or more years of relevant work experience.
  • Four or more years of experience in applications development, infrastructure, or database architectures.
  • Strong knowledge of SRE practices and principles to build resilient systems and to provide business continuity.
  • Four or more years of experience in the monitoring space: APM, infrastructure, logging, tracing, and AIOps.
  • Experience creating rich Grafana, Kibana, and New Relic visualizations and dashboards that give users and support staff key metric monitoring information.

Even better if you have:

  • A degree.
  • Six or more years of relevant work experience.
  • Five or more years of experience in applications development, infrastructure, or database architectures.
  • Five or more years of experience in the monitoring space: APM, infrastructure, logging, tracing, and AIOps.
  • Experience developing a log monitoring framework based on exception parsing.
  • Experience developing New Relic APM and infrastructure alerting and monitoring by extracting metric data with New Relic Query Language (NRQL).
  • Experience installing, configuring, and maintaining the Elasticsearch, Logstash, and Kibana (ELK) logging platform.
  • Automation experience and the ability to code or script at an advanced level.
  • Experience in systems architecture; in-depth knowledge of SRE, IT operations, and cloud; coding and scripting experience with Java, JavaScript, Python, .NET, Ansible, CloudFormation, and Ruby; and an understanding of AI/ML.
  • Experience in cloud and container platform monitoring, and a track record of delivering high-value solutions in dynamic and ambiguous environments.
  • Experience leading medium-to-large projects by bringing together the right perspectives, identifying roadblocks, and integrating feedback from clients and team members.
  • Intellectual curiosity, problem solving and collaboration skills.
  • Experience in vendor management.
  • Experience in IT Security and compliance, operations and network services, and application development.
  • Serving as both a mentor and advocate for your team.

Equal Employment Opportunity

We're proud to be an equal opportunity employer - and celebrate our employees' differences, including race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, and Veteran status. At Verizon, we know that diversity makes us stronger. We are committed to a collaborative, inclusive environment that encourages authenticity and fosters a sense of belonging. We strive for everyone to feel valued, connected, and empowered to reach their potential and contribute their best. Check out our diversity and inclusion page to learn more.