Site Reliability Engineer

Experience

--

Employment Type

Full-time

Position

--

Salary Offer

Job Description

Blip is a cutting-edge Portuguese IT company focused on software engineering solutions for sports betting and gaming.

As part of the Flutter Entertainment group, we are an essential piece of the business, delivering safe and entertaining websites, mobile apps, and retail systems for over 7.6 million monthly customers around the globe.   

Creating safe and fun software for sports betting and gaming is what we do. But the way we do it makes us one of the top options when choosing the best place to boost your career. Our Agile-oriented mindset, together with the most exciting technology and a team of 500 bold and inspiring people, will take your star-quality skills to another level.

We bet on people first. That’s why employer branding and flexible practices are cornerstones of our working culture. And our working culture is more than job benefits: it empowers you to come as you are and find the perfect balance between your life and your working challenges. We focus on autonomy, diversity, lifelong learning, and work-life balance.

The Role

As a Site Reliability Engineer in our UK&I division, you'll be accountable for closely monitoring the availability, performance, and stability of our platforms while working with software development teams on how to improve critical components.

Working on complex challenges while ensuring uptime and reliability across different setups (AWS Cloud, AWS Outposts, OpenStack) lets you apply a range of skills in coding, algorithms, and complexity analysis.

What will you be doing?

  • Engage in and improve the whole lifecycle of services, from design through deployment, operation, and refinement;
  • Take an active part in investigating, identifying, and resolving the root causes of production problems (where necessary);
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews;
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health;
  • Take an active part in performance and capacity testing;
  • Optimize reliability monitoring and alerting;
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity;
  • Iteratively audit systems for performance and reliability vulnerabilities;
  • Define and revise Service Level Indicators (SLIs);
  • Practice sustainable incident response and blameless postmortems.

We are looking for someone who:

  • Has knowledge of operating systems and networking;
  • Has experience with programming languages such as Python, Java, or Go;
  • Has experience working with public cloud providers;
  • Has experience working with microservices architectures;
  • Has experience working with message queuing services and databases;
  • Has experience with configuration management tools such as Chef and Ansible;
  • Has knowledge of monitoring solutions like Datadog and Splunk;
  • Is familiar with CI/CD pipelines comprising Jenkins, Git, Artifactory, or others.

This is what you should have. What do we have, you ask? Well... you can check our amazing perks & benefits right here!

So... are you in?