SRE

Job added: 2022-07-26 03:30:50

Location: United States (Remote)
Type: Contract · Entry level
Status: Open
1,001-5,000 employees · Internet Publishing

Required Experience

Design and implement full-stack, Java-based custom tooling that automates and optimizes away toil.
Establish the Site Reliability Engineering practice at Fannie Mae, building its principles and culture and leading by example. Assist in training skilled peer engineers and partner with SRE teams embedded in peer platforms.
Introduce enterprise capabilities, tools, and innovations that improve availability across a multi-cloud ecosystem by evolving observability, monitoring, logging, dashboard visualization, CI/CD integration, and continuous testing (performance, smoke, regression, functional, chaos); drive continuous improvement and standardization/automation; and build capabilities to conduct destructive and resiliency testing.
Introduce self-healing and autonomic capabilities that precisely resolve complex operational and systemic issues, including building and training models, automating cognitive processes, and leveraging cutting-edge technologies to improve the availability of the products we provide to customers.
Automate key SRE metrics and IT Service Operations processes, including customer impact, % availability of critical business flows, SLO/SLI adherence, and error budget; automate the IT Service Operations incident process through data integration with unified communications and alerting/notification systems.
Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediating issues through Agile practices.
Proven technical expertise with one or more of the following:
Software Development: Java/J2EE, REST, microservices, messaging technologies such as Kafka or MQ, JavaScript frameworks such as React or Bootstrap, SQL
OS and Platform: Linux; cloud technologies: AWS, Google Cloud Platform, or Azure; container platforms
CI/CD and Automation: Jenkins, GitLab, SonarQube, Artifactory
Observability and AIOps: Grafana, Prometheus, ELK or Splunk, Jaeger or Zipkin, AppDynamics, Dynatrace, or similar
Experience in one or more of the following areas is desired:
AIOps: BigPanda, Moogsoft, Artificial Intelligence (AI) and Machine Learning (ML) frameworks
Testing: Gremlin, Chaos Monkey, Chaos Toolkit, JMeter, BlazeMeter, LoadRunner
Excellent problem-solving skills and proactivity in resolving issues and blockers
Excellent verbal and written communication skills, relationship management skills, and the ability to collaborate with multiple stakeholders
Eagerness to learn and the ability to work independently with minimal guidance

Desired Experience

Bachelor’s Degree in Computer Science, Management Information Systems (MIS), Systems Engineering, or related field
10+ years of full-stack engineering experience or experience in scripting for test automation
Experience conducting failover and failback exercises and blue-green deployments
Deep knowledge of ServiceNow, Splunk, and Dynatrace is a plus
Cloud Developer or Architect Certification
Experience with Scaled Agile Framework (SAFe) and Jira / Confluence
Experience executing disaster recovery plans and failover tests
Understanding of Java performance monitoring (JVM, heap size, message broker)
Experience creating JMeter and Selenium scripts
