Site Reliability Engineer
Contract-to-Hire role in St. Louis, MO
We are seeking professionals to be responsible for overall system operation, utilizing a breadth of tools and approaches to solve a broad set of problems.
• Engage in and improve the software development life cycle – from inception and design, through development, deployment, operation and refinement
• Influence and design infrastructure, architecture, standards and methods for large-scale systems
• Support services prior to production via infrastructure design, software platform development, load testing, capacity planning and launch reviews
• Maintain services during deployment and in production by measuring and monitoring key performance and service level indicators including availability, latency, and overall system health
• Automate system scalability and continually work to improve system resiliency, performance and efficiency
• Practice sustainable incident response as part of an on-call rotation and through blameless postmortems
• Remediate tasks within the corrective action plan via sustainable, preventative, and automated measures whenever possible
• Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot issues.
• Create new tools and scripts designed for auto-remediation of incidents.
• Design and implement Big Data technologies, including Hadoop, MongoDB, Kafka, RabbitMQ, ZooKeeper, Spark, ELK, etc.
• BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience
• 7 years of experience in software development and maintenance.
• 2+ years of experience developing and/or administering software in public cloud
• Experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
• Experience with Web service technologies, including REST, SOAP, JSON, XML
• Experience with database (RDBMS, NoSQL) technologies is a plus.
• Demonstrable cross-functional knowledge of systems, storage, networking, security and databases
• System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
• Proficiency with continuous integration and continuous delivery tooling and practices
• Strong analytical and troubleshooting skills
• Expertise designing, analyzing and troubleshooting large-scale distributed systems.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
• Experience managing infrastructure as code via tools such as Terraform or CloudFormation
• A passion for automation with a desire to eliminate toil whenever possible
• Experience building software or maintaining systems in a highly secure, regulated or compliant industry
• Experience with and passion for working within a DevOps culture and as part of a team