Job Responsibilities:
• Deploy, configure, monitor, tune, and troubleshoot Linux and Windows servers.
• Manage daily operations and capacity planning for virtualization platforms (e.g., VMware, KVM) and cloud environments (e.g., AWS, Azure, Alibaba Cloud).
• Develop and execute system backup and recovery strategies, regularly validating data recoverability.
• Implement system hardening, vulnerability remediation, and compliance configurations.
• Ensure the stability and reliability of middleware and runtime environments that support business applications.
• Install, configure, and maintain web servers (Nginx, Apache) and application servers (Tomcat, WebLogic).
• Collaborate with application teams on system deployments, version releases, and issue resolution.
• Monitor system and application performance, identify bottlenecks, and recommend optimizations.
• Improve operational efficiency and enhance automation in system management.
• Develop scripts and playbooks (Shell, Python, Ansible, etc.) to automate routine operational tasks.
• Maintain and optimize monitoring and alerting systems (e.g., Zabbix, Prometheus) to ensure observability of key metrics.
• Participate in the selection, deployment, and improvement of the operations toolchain.
• Contribute to system architecture design and disaster recovery planning to ensure high availability.
• Review new system architectures and provide operations-oriented recommendations.
• Implement high-availability solutions (e.g., clustering, load balancing) and conduct disaster recovery drills.
• Prepare technical operations documentation and emergency response plans.
Job Requirements:
• Minimum of 3 years of relevant experience; financial industry experience is a plus.
• Familiarity with container technology principles and architectures; hands-on experience with Kubernetes and Docker is preferred.
• Knowledge of mainstream data disaster recovery architectures and platforms; experience with "two locations, three centers" setups (a production center and a same-city backup center, plus a remote disaster recovery center) is a plus.
• Expertise in the maintenance, performance tuning, configuration management, and security hardening of Windows and Linux operating systems.
• Proficiency in scripting languages such as Shell and Python for daily system maintenance.
• Familiarity with infrastructure platforms such as VMware, monitoring tools such as Zabbix, and automation tools such as Puppet and Ansible.
• Strong communication and coordination skills, the ability to learn quickly, flexible thinking, and the ability to perform under pressure.