Design and implement robust network monitoring solutions capable of providing real-time insights into network performance and health.
Develop automation frameworks to streamline network management, including provisioning, configuration, and incident response.
Leverage Kubernetes for deploying and managing containerized network monitoring and automation applications, ensuring scalability and reliability.
AI/ML Integration:
Incorporate AI/ML technologies to enhance network monitoring, including predictive analytics for network performance, anomaly detection, and automated incident response.
Develop and implement machine learning models that can analyze network data and provide actionable insights for optimization.
Technical Leadership:
Lead the development and integration of innovative network monitoring and automation strategies.
Establish and enforce industry standards and best practices for software development within the network domain.
Development and Implementation:
Ensure seamless integration of monitoring and automation tools with existing network infrastructure and third-party systems.
Collaboration and Communication:
Work closely with cross-functional teams, including Network Reliability Engineers (NRE) and Service Reliability Engineers (SRE), to ensure the effective use of monitoring and automation solutions.
Performance Optimization:
Continuously monitor, optimize, and tune network monitoring and automation systems to meet required service levels.
Develop and implement automation scripts and tools to improve network incident response times.
Security and Compliance:
Integrate security considerations into the design of monitoring and automation solutions.
Ensure compliance with industry regulations, including the development of automated auditing and reporting features.
Continuous Improvement:
Provide training and mentorship to team members on the latest technologies and methodologies.
Requirements:
Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
8+ years of experience in network architecture, with a focus on network monitoring and automation.
Strong programming skills in languages such as Python, Java, or Go.
Extensive experience with network monitoring tools (e.g., Nagios, Prometheus), automation frameworks (e.g., Ansible, Puppet, Chef), and Kubernetes.
Deep knowledge of networking protocols such as BGP, VXLAN, SNMP, NetFlow, and streaming telemetry.
Experience with AI/ML technologies, particularly in network monitoring and predictive analytics.
Experience with DevOps practices and tools (e.g., CI/CD, Jenkins, Git).
Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Proven track record of designing and implementing scalable, high-performance network solutions.
Strong problem-solving skills and the ability to work in a fast-paced, dynamic environment.
Excellent communication skills, with the ability to convey complex technical concepts to non-technical audiences.
Industry certifications such as CCIE, ACE, JNCIE, or relevant cloud certifications are a plus.