Job description

Business: Emerging Technology

Open Positions: 1

Recruiter Name: Lampson Liang

Location: GZ

  

Why join us?

This role sits within Group Emerging Technology, Innovation and Ventures (ETIV). You will join a growing team and work with a wide range of experienced engineers, product managers, and production support specialists supporting our Group AI offering (e.g. speech transcription, translation, knowledge management) for scaled consumption across our Group Business and Functions.

  

The Opportunity:

We are seeking a seasoned Senior Site Reliability Engineer (SRE) with a minimum of 10 years of IT experience to drive reliability, scalability, and performance initiatives across our critical production environments. As a senior member of our team, you will design and implement solutions to ensure the health, availability, and continuous improvement of our infrastructure and services, acting as a technical leader and mentor in SRE methodologies and best practices.

  

What you’ll do:

  • Lead complex troubleshooting and root cause analysis efforts for incidents impacting production, driving rapid resolution and long-term prevention.

  • Design, architect, and enhance scalable, highly available, and secure infrastructure leveraging cloud, container, and orchestration technologies (e.g., AWS/GCP/Azure, Kubernetes, Docker).

  • Champion the adoption and refinement of SRE practices—defining and measuring SLIs/SLOs, establishing error budgets, and automating operational processes to minimize toil.

  • Develop and maintain comprehensive monitoring, logging, and alerting systems using modern observability tools (e.g., Prometheus, Grafana, ELK, Datadog, Splunk).

  • Drive advancements in deployment automation, CI/CD pipelines, infrastructure-as-code (Terraform, Ansible, Helm, etc.), and configuration management.

  • Guide, mentor, and coach junior SREs and engineers, fostering a culture of knowledge sharing, reliability, and continuous learning.

  • Collaborate with software development, QA, product, and operations teams to embed reliability, scalability, and security considerations throughout the software development lifecycle.

  • Participate in and lead on-call rotations, review and improve incident response processes, and perform blameless postmortems.

  • Identify, prioritize, and lead large-scale system improvements and engineering projects that enhance reliability and operational efficiency.

  • Author and maintain thorough documentation, runbooks, and knowledge bases.

Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent experience.

  • At least 10 years of hands-on experience in IT, with significant experience in SRE, DevOps, Production Support, or related roles.

  • Advanced hands-on expertise with containers (Docker, Kubernetes), cloud platforms (AWS, GCP, Azure), and orchestration technologies.

  • Strong background in Linux system administration, networking, and security.

  • Deep experience with monitoring, log management, and observability platforms.

  • Fluent in at least one programming or scripting language (Python, Bash, Go, etc.).

  • Track record of driving large-scale automation and optimization projects.

  • Experience implementing and maturing SRE principles at the organizational level.

  • Excellent problem-solving, analytical, communication, and mentoring skills.

  • Proven ability to operate in high-availability, mission-critical, and/or 24x7 operational environments.

  

What additional skills will be good to have?

  • Experience with infrastructure-as-code (Terraform, Ansible, or similar tools).

  • Experience working in 24x7 or high-availability production environments.

  • SRE and cloud certifications (e.g., GCP Professional SRE, AWS DevOps Engineer, CKA/CKAD).

  • Experience with microservices, distributed systems, and high-throughput architectures.

  • Experience with AIOps to optimize production operations.

  

Link to Candidate User Guide:

https://hsbchrdirect.service-now.com/esc?id=kb_article&table=kb_knowledge&sysparm_article=KB0184596&sys_kb_id=712b5f041b1f82908d3a0fe7ec4bcbd6&searchTerm=IJP

  

You’ll achieve more at HSBC

HSBC is an equal opportunity employer committed to building a culture where all employees are valued and respected and where their opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working, and opportunities to grow within an inclusive and diverse environment. We encourage applications from all suitably qualified persons irrespective of, but not limited to, their gender or genetic information, sexual orientation, ethnicity, religion, social status, medical care leave requirements, political affiliation, disability, color, national origin, veteran status, etc. We consider all applications based on merit and suitability to the role.

Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website.

  

***Issued By HSBC Software Development (GuangDong) Limited***

Recruiter name
Pei Zhi Liang
Recruiter email
lampson.p.liang@hsbc.com.cn