Senior Solutions Architect, CSP System

On-site
NVIDIA
Shanghai, CN
Posted 18 hours ago
Full-time
Senior

Compensation

Salary undisclosed

Description

As a Senior GPU & AI Infra Expert focusing on Cloud Service Providers (CSPs) in China, you will be a core technical pillar of NVIDIA’s CSP SA team, responsible for driving GPU/AI Infra technical strategy, system-level solution optimization, and high-value customer engagement. You will work closely with major Chinese CSPs to address their critical demands for large-scale AI training/inference, Agentic AI, gaming AI, and distributed computing infrastructure. You will accelerate the mass deployment and performance maximization of NVIDIA GPU software/hardware stacks, and bridge technical gaps between CSP workload iteration and NVIDIA’s global engineering roadmap. This role requires deep expertise in GPU architecture, AI system optimization, cluster networking, and open-source AI infra contributions, along with a strong ability to deliver high-value technical outcomes for hyperscale data center workloads.

What you'll be doing:

  • Partner with Sales, BD, and CPM teams to land NVIDIA GPU and AI Infra technologies in top-tier Chinese CSP accounts, driving technical penetration and sustainable business growth.

  • Serve as the primary technical authority for NVIDIA GPU system and AI infrastructure solutions for Chinese CSPs, providing end-to-end consultation on GPU cluster architecture design, AI workload deployment, heterogeneous computing tuning, and full-stack software optimization.

  • Unlock Vera CPU + GPU co-optimization value for RL training and Agentic AI workloads: eliminate CPU-GPU data movement bottlenecks and optimize the latency and throughput of end-to-end agent training and reasoning pipelines for CSP AI factory scenarios.

  • Lead open-source system architecture contributions for NVIDIA AI infra stacks, upstream optimized patches to key open-source projects, build China-localized best practices, and shape industry technical standards.

  • Conduct in-depth GPU workload bottleneck analysis; implement system-level, kernel-level, and framework-level tuning for AI training, inference, RL, and gaming workloads; and deliver production-ready reference designs and tuning guidelines for CSP mass deployment.

  • Act as the key technical liaison between Chinese CSP customers and NVIDIA’s global engineering, product, and R&D teams: collect high-value local workload requirements, drive product roadmap iteration, and ensure full compliance with NVIDIA global technical policies and export compliance rules.

  • Lead technical workshops, hands-on training, PoCs, and production pilot projects for key CSP accounts; quantify and demonstrate the business value of GPU/AI Infra; and accelerate technology adoption and large-scale replication.

  • Monitor cutting-edge industry trends, including Agentic AI, LLM inference optimization, cloud gaming AI, and next-gen data center system architectures, and provide strategic technical insights to support team and product strategy.

  • Mentor junior SA team members, standardize CSP technical engagement and solution delivery processes, and drive the codification of high-value technical best practices.

What we need to see:

  • Bachelor’s/Master’s/PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field; equivalent industry experience is highly valued.

  • 8+ years of hands-on experience in GPU architecture, AI system optimization, large-scale data center infrastructure, or hyperscale cloud computing, with solid experience in AI training/inference, distributed computing or HPC workloads.

  • Deep understanding of GPU microarchitecture, CUDA programming model, GPU memory hierarchy and system scheduling mechanisms; proficient in performance profiling, bottleneck analysis and end-to-end AI workload tuning.

  • Strong programming proficiency in C/C++ and Python; familiar with CUDA kernels, compiler toolchains, AI framework optimization (PyTorch/TensorRT) and large-scale distributed system tuning.

  • Proven hands-on experience working with major Chinese CSPs or global hyperscalers, with in-depth knowledge of their public cloud AI service architectures, cluster operation mechanisms and core workload characteristics.

  • Excellent technical communication and presentation skills, capable of explaining complex GPU system and AI infra technologies to technical engineers, architecture teams and business stakeholders.

  • Strong cross-functional collaboration capability, able to work efficiently in a global matrix team and prioritize multiple high-value technical projects under fast-paced business demands.

  • Familiarity with NVIDIA full-stack products (data center GPU hardware, TensorRT-LLM, Dynamo, NCCL, the CUDA software stack) is a significant plus.

  • Hands-on engineering capability is mandatory; candidates must be results-oriented, self-driven, and able to independently own end-to-end technical project delivery.

  • Committed, proactive, and capable of sustaining high-quality technical output for long-term strategic CSP projects.

Ways to stand out from the crowd:

  • Hands-on experience with Vera/Grace CPU + GPU heterogeneous co-optimization, and familiarity with AI agent and RL training system tuning.

  • In-depth experience in Dynamo LLM inference optimization, including KV cache management, the intelligent scheduling planner, and dynamic resource scaling.

  • Open-source contribution experience in AI infra, GPU optimization libraries, or distributed computing frameworks with public upstream records.

  • Solid experience in Agentic AI, RL post-training or long-context LLM workload optimization on GPU clusters.

  • Familiarity with semiconductor and data center technology export compliance requirements in the China market.

  • Proven track record of independently leading CSP technical PoC, pilot verification and large-scale production deployment projects with measurable business outcomes.

With competitive salaries and a generous benefits package, we are widely considered to be one of the world’s most desirable employers! We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are rapidly growing. If you're a creative and autonomous person with a real passion for technology, we want to hear from you.

Stack

Python, C++, PyTorch, TensorRT, GPU, LLMs, Agentic AI, Distributed Systems, CUDA
Posted
Jul 1, 2026
Last seen
Jul 1, 2026
First seen
Jul 1, 2026
