available for opportunities · Dublin, Ireland

Salman Hadi

Site Reliability Engineer · Incident Response Manager

7+ years keeping high-availability fintech and enterprise systems reliable, observable, and resilient. Specialist in monitoring strategy, incident management, and operational automation.

work history

Experience

Feb 2022 – Present · Dublin, Ireland

FreedomPay

Senior Platform Operations Engineer (Site Reliability)

Lead production operations and reliability improvements for a global cloud-based payment platform in high-availability environments.
Design and implement monitoring strategies using Dynatrace, Splunk, and New Relic — reducing alert noise and improving incident detection across production services.
Build end-to-end observability across infrastructure and services for proactive monitoring and performance optimisation.
Lead major incident response and root cause analysis (RCA), coordinating cross-team investigations and defining long-term corrective actions.
Develop Python automation utilities to streamline incident management, including automated timeline capture and P1 incident template generation.
Develop production support readiness frameworks, improving release processes and operational stability across engineering teams.
Perform SQL-based investigations to validate data flows and troubleshoot service anomalies.
Facilitate weekly Change Advisory Board (CAB) reviews for maintenance window activities.

Jan 2019 – Feb 2022 · Dublin, Ireland

Version 1

Technology Consultant

Acted as subject-matter expert (SME) and lead analyst for Revenue.ie customs applications, supporting large-scale transactional systems in national customs operations.
Worked closely with business stakeholders and delivery teams to analyse requirements, support implementations, and troubleshoot production issues.
Investigated data integrity and transactional issues using SQL across Java and COBOL systems in high-volume environments.
Designed Bash automation scripts to streamline operational tasks, reducing manual workload by 30+ hours per month.
Supported AWS cloud transformation projects and virtualisation environments (VMware, Hyper-V).
Delivered internal training and knowledge-sharing sessions to improve team capability and system understanding.

featured work

Projects

⚙️ Incident Timeline Automation Tool ↗

Built a Python utility that automatically captures and formats incident timelines during major outages, eliminating manual logging during high-pressure P1 situations and significantly reducing post-incident documentation effort.

Python · Automation · Incident Mgmt · FreedomPay

📋 P1 Incident Template Generator ↗

Built a Python tool that generates structured P1 incident report templates pre-populated with service context, stakeholder lists, and escalation paths — cutting initial triage time and ensuring consistent communication during critical outages.

Python · Incident Response · Tooling · FreedomPay

📡 Observability & Monitoring Strategy ↗

Designed and implemented a full-stack monitoring strategy across production payment services using Dynatrace, Splunk, and New Relic. Reduced alert noise by consolidating thresholds and introducing synthetic monitoring for critical user journeys.

Dynatrace · Splunk · New Relic · SRE

🛡️ Production Support Readiness Framework ↗

Developed a structured readiness framework to assess and improve operational stability before major releases — covering monitoring coverage, runbook completeness, on-call preparedness, and rollback plans across engineering teams.

Operations · Documentation · Release Mgmt · Fintech

🖥️ Bash Automation Suite — Revenue.ie ↗

Designed a suite of Bash scripts to automate repetitive operational tasks for national customs infrastructure at Version 1, saving 30+ engineering hours per month and reducing risk from manual processes in high-volume environments.

Bash · Linux · Automation · Version 1

☁️ AWS Cloud Transformation Support ↗

Contributed to AWS cloud migration projects at Version 1, supporting workload transitions from on-premises VMware and Hyper-V environments to AWS cloud infrastructure while maintaining operational continuity for public-sector clients.

AWS · VMware · Hyper-V · Cloud Migration

capabilities

Skills & Tooling

☁️

Infrastructure

Cloud & Virtualisation

AWS · Azure · GCP · VMware · Hyper-V

📡

Observability

Monitoring & Alerting

Dynatrace · Splunk · New Relic · CloudWatch · Datadog · Pingdom

⚡

Automation

Scripting & Querying

Python · Bash · SQL · SPL · DQL

🔴

Operations

Incident Management

Incident Response · Root Cause Analysis · CAB Reviews · SRE

🐧

Systems

Operating Systems

Linux · Windows Server

🛠️

Collaboration

Ticketing & Documentation Tools

Jira · Zendesk · Azure DevOps · Confluence · SharePoint

credentials

Certifications

☁

AWS Certified Cloud Practitioner

Google Cloud Platform Fundamentals

Google SRE Fundamentals

Google Project Management

academic background

Education

2017 – 2018

MSc Cloud Computing

National College of Ireland

2011 – 2016

BEng Information Technology

University of Pune

open to new roles

Let's Build Something Reliable

Looking for a Senior SRE or Incident Response role, based in Dublin or remote. Let's connect.

✉ Email · LinkedIn · ☎ +353 894 489 719