How you’ll make an impact
You will be part of the team that provides 24/7/365 support for our customer-facing applications to achieve new levels of operational performance. In this role, you will work closely with the product teams to agree shared objectives, build and implement sophisticated monitoring and remediation toolsets, and create a culture focussed on continually improving the operation of our platform and applications.
- Design and build tools for application deployment, system automation, and configuration management
- Work closely with other software engineers, CloudOps, DevOps, product managers and QA personnel to deliver cutting-edge cloud solutions
- Create systems to allow product/development teams to self-service their AWS infrastructure needs
- Interact daily with product development and platform teams (Networking, Security, and Customer Tech Support teams)
- Consult with teams on best practices for AWS platform and end-to-end lifecycle system deployments
- Create and implement automation processes and standards for AWS cloud services
- Ensure all cloud infrastructure components meet proper availability, cost, performance and security standards
- Create scalable alerting and auto-remediation systems
- Share an on-call rotation backed by all our product teams
- Perform advanced troubleshooting and monitoring of our systems to ensure SLA and capacity requirements are met
- Help define the tools and philosophies used among the team around deployment, monitoring, testing and security
At a minimum, you should have:
- 4+ years of knowledge and experience in OOP, software design patterns, project build tools, automated build processes and source control
- 4+ years of software development experience in a modern programming language (Golang, Python, C#, Java, Ruby or C++)
- Experience with building reliable and robust software that tolerates and recovers from unreliable dependencies.
- Expert-level software engineering experience (coding, automated testing, profiling, etc.)
- Experience running systems on Amazon Web Services (AWS)
- Good experience with full lifecycle system deployments from requirements gathering to design, implementation, unit testing, system testing, release and ongoing service management.
- Strong knowledge of administering Windows Server 2012 or later
- Experience with common networking protocols and services including TCP, UDP, DNS, DHCP, HTTP, and LDAP
- Familiarity with agile software development methods
- A strong desire to automate processes, build tooling, and create infrastructure-as-code solutions using languages and frameworks such as Terraform, Golang, Ruby, and PowerShell
- A good understanding of large-scale distributed systems, including multi-tier architectures, security, and monitoring
- Able to work effectively in high-pressure situations and handle competing priorities
- Working knowledge of repositories such as GitLab, Artifactory and GitHub, and of log aggregation tools such as Sumo Logic
- Experience with continuous integration/continuous deployment tools (Octopus Deploy, Jenkins, Bamboo, TeamCity, AWS CodeDeploy, etc.)
Bonus points if you have:
- AWS certification
- Experience administering and automating UNIX/Linux operating systems
- Experience with Windows clusters, especially multi-datacenter deployments
- Experience working with business-critical operations in a geographically distributed team
- Experience with risk and impact assessment
- Proven ability to exceed goals in an innovative environment with a high rate of change