Kairos
Back to jobs

Network Operations Center (NOC) Analyst

On-site
Lightning AIFort Worth, TX, US / Lisle, IL, US4 weeks agoWebsite
Fresh
Data Center Operations

Compensation

$85,000-$100,000
Apply
Share

Description

What We're Looking For

Lightning AI is seeking Network Operations Center (NOC) Analysts to support 24/7 operations across select high-performance compute data centers with advanced monitoring infrastructure. This is a technically focused role centered on telemetry analysis, infrastructure monitoring, and independent diagnosis of compute, network, and hardware systems.

You will serve as the first line of technical response — analyzing telemetry signals, diagnosing system anomalies, and troubleshooting Linux and network-layer issues before escalating with clear, actionable findings. You will operate with a high degree of independence, applying sound judgment in situations that often extend beyond predefined runbooks.

This role offers a clear pathway toward positions in network engineering, site reliability, or data center operations, and the opportunity work with next-generation AI hardware and some of the most advanced compute infrastructure deployed today.

This role is based onsite at one of our data center facilities in Lisle, IL; Fort Worth, TX; or Quincy, WA. Shift flexibility is required to support our 24/7 operations environment. We are not able to provide visa sponsorship for this position at this time.

What You'll Do

  • Monitor data center systems using telemetry data, dashboards, and alerting tools to detect anomalies and emerging issues
  • Perform independent technical diagnosis across Linux systems, network connectivity, and hardware health using command-line tools, logs, and diagnostic utilities
  • Troubleshoot network-layer issues including connectivity, routing, and interface errors
  • Triage and escalate incidents to the appropriate teams (hardware, network, SRE) with technically accurate summaries, relevant logs, and telemetry findings
  • Create and maintain detailed tickets documenting diagnostic steps, technical findings, and observed system behavior
  • Identify recurring alert patterns through telemetry analysis and surface findings to improve monitoring coverage and reliability

 

What You'll Need

Required Qualifications

  • Hands-on Linux experience including command-line proficiency and system log analysis
  • Practical understanding of networking concepts: TCP/IP, DNS, routing, and diagnostic tools (ping, traceroute, netstat, tcpdump)
  • Ability to independently diagnose technical issues and exercise sound judgment in ambiguous situations
  • Clear, precise communication skills with strong technical documentation ability
  • Availability to work overnight and rotating shifts in a 24/7 environment

Ideal Experience

  • Experience with Grafana, Datadog, or Prometheus
  • Familiarity with HPC or GPU-based infrastructure
  • Scripting experience in Bash or Python

 

Compensation

Stack

PythonGPU
Posted
May 27, 2026
Last seen
Jun 26, 2026
First seen
Jun 26, 2026
Status
active