Senior Site Reliability Engineer, VNG Solutions

OfficialTechSystem24-Solution-1301
Location: Ho Chi Minh City

Job Description

VNG Solutions is a subsidiary of VNG Corporation, a reputable Vietnamese provider of IT services and technology operating on a global scale. Headquartered in Vietnam and serving key markets in the Middle East, the US, and Europe, VNG Solutions is dedicated to providing enterprise-level services that empower customers and society to advance decisively into the digital future.

VNG Solutions provides top-notch services in strict adherence to international standards and is publicly recognized as an expert in "the next big technologies". We offer a creative, B2B-focused environment where you can develop your abilities and learn a range of technologies to advance your career.
Responsibilities:
  • Lead deployment and management of web applications, ensuring stability, scalability and reliability.
  • Design and manage reliability solutions for hybrid (cloud and on-premises) environments, optimizing for availability and performance.
  • Implement Infrastructure as Code (IaC) with Terraform to enhance system scalability and maintainability.
  • Orchestrate containerized applications using Kubernetes, focusing on efficient deployment and runtime management.
  • Administer and optimize software (including GIS platforms) and databases, maintaining data integrity and high performance.
  • Analyze and mitigate service disruptions, developing strategic preventative measures to minimize downtime.
  • Participate in network management, applying a strong understanding of network engineering principles.
  • Participate in evaluating and integrating new technologies, enhancing service reliability and operational capabilities.
  • Develop and automate monitoring of critical system health metrics using tools such as Riverbed and the ELK Stack.
  • Manage major incident response efforts, ensuring effective resolution to maintain system stability.
  • Coordinate with cross-functional teams to align SRE practices with business objectives and IT standards.
  • Create and review technical documentation for system architecture and operational procedures.
  • Ensure regulatory compliance and conduct security assessments, implementing best practices to protect system integrity.
  • Participate in on-call (pager duty) rotations, resolving critical incidents.
Note: This position may require continuous international travel for periods of 3 to 6 months. Candidates must accept this requirement as part of the role.

Requirements

  • Over 7 years of experience in IT, with advanced expertise in network engineering and system administration.
  • Over 5 years of experience in site reliability practices; experience with GIS platforms is a plus.
  • Demonstrated excellence in network management, advanced troubleshooting, and system optimization, with a focus on enhancing efficiency and reducing downtime.
  • Over 4 years of experience with cloud environments and containerization technologies, including designing and implementing scalable, resilient infrastructure solutions using platforms such as GCP and Kubernetes.
Qualifications:
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Cisco Certified Network Associate (CCNA) is a plus.
  • Certified Kubernetes Administrator (CKA) is a big plus.
  • Strong skills in scripting and automation, particularly with Python and Bash.
  • Experience with monitoring and logging tools such as Riverbed and ELK Stack.
  • Written and spoken English communication skills at CEFR B1 level or above.