MLSys 2024 Career Website

Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2024. Opportunities can be sorted by job category, location, and filtered by any other field using the search box. For information on how to post an opportunity, please visit the help page, linked in the navigation bar above.

Search Opportunities

Please visit our careers page at the link below.


Apply

Bay Area, California Only

Cerebras has developed a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

At Cerebras, we're proud to be among the few companies globally capable of training massive LLMs with over 100 billion parameters. We're active contributors to the open-source community, with millions of downloads of our models on Hugging Face. Our customers include national labs, global corporations across multiple industries, and top-tier healthcare systems. Recently, we announced a multi-year, multi-million-dollar partnership with Mayo Clinic, underscoring our commitment to transforming AI applications across various fields.

The Role

As the Cerebras ML Product Manager, you'll spearhead the transformation of AI across various industries by productizing critical machine learning (ML) use cases. Collaborating closely with Product leadership and ML research teams, you'll identify promising areas within the industry and research community, balancing business value and ML thought leadership. Your role involves translating abstract neural network requirements into actionable tasks for the Engineering team, establishing roadmaps, processes, success criteria, and feedback loops for product improvement. This position requires a blend of deep technical expertise in ML and deep learning concepts, familiarity with modern models, particularly in the Large Language Model (LLM) space, and a solid grasp of mathematical foundations. Ideal candidates will anticipate future trends in deep learning and understand connections across different neural network types and application domains.

Responsibilities

  • Understand deep learning use cases across industries through market analysis, research, and user studies
  • Develop and own the product roadmap for neural network architectures and ML methods on the Cerebras platform
  • Collaborate with end users to define market requirements for AI models
  • Define software requirements and priorities with engineering for ML network support
  • Establish success metrics for application enablement, articulating accuracy and performance expectations
  • Support Marketing, Product Marketing, and Sales by documenting features and defining ML user needs
  • Collaborate across teams to define product go-to-market strategy and expand user community
  • Clearly communicate roadmaps, priorities, experiments, and decisions

Requirements

  • Bachelor’s or Master’s degree in computer science, electrical engineering, physics, mathematics, a related scientific/engineering discipline, or equivalent practical experience
  • 3-10+ years of product management experience, working directly with engineering teams, end users (enterprise data scientists/ML researchers), and senior product/business leaders
  • Strong fundamentals in machine learning/deep learning concepts, modern models, and the mathematical foundations behind them; understanding of how to apply deep learning models to relevant real-world applications and use cases
  • Experience working with a data science/ML stack, including TensorFlow and PyTorch
  • An entrepreneurial sense of ownership of overall team and product success, and the ability to make things happen around you. A bias towards getting things done, owning the solution, and driving problems to resolution
  • Outstanding presentation skills with a strong command of verbal and written communication

Preferred

  • Experience developing machine learning applications or building tools for machine learning application developers
  • Prior research publications in the machine learning/deep learning fields demonstrating deep understanding of the space

Apply

UAE locals or any candidate willing to relocate

Cerebras has developed a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for deep learning workloads.

About the role

As an applied machine learning engineer, you will take today’s state-of-the-art solutions in various verticals and adapt them to run on the new Cerebras system architecture. You will get to see how deep learning is being applied to some of the world’s most difficult problems today and help ML researchers in these fields to innovate more rapidly and in ways that are not currently possible on other hardware systems.

Responsibilities

  • Stay current with state-of-the-art transformer architectures for language and vision models
  • Bring up new state-of-the-art models on the Cerebras system and validate their functionality
  • Train models to convergence and tune hyper-parameters
  • Optimize model code to run efficiently on the Cerebras system
  • Explore new model architectures that take advantage of Cerebras' unique capabilities
  • Develop new approaches for solving real-world AI problems across various domains

Requirements

  • Masters or PhD in Computer Science or related field
  • Familiarity with JAX/TensorFlow/PyTorch
  • Good understanding of how to define custom layers and back-propagate through them (see the sketch after this list)
  • Experience with transformer deep learning models
  • Experience in a vertical such as computer vision or language modeling
  • Experience with Large Language Models such as the GPT family, LLaMA, and BLOOM
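
For a concrete sense of the custom-layer requirement above, here is a minimal, hypothetical PyTorch sketch of defining a custom op and back-propagating through it; the op itself (a clipped ReLU) is invented for illustration:

    import torch

    class ClippedReLU(torch.autograd.Function):
        """Toy custom op: ReLU clipped at max_val, with a hand-written backward."""

        @staticmethod
        def forward(ctx, x, max_val=6.0):
            ctx.save_for_backward(x)
            ctx.max_val = max_val
            return x.clamp(min=0.0, max=max_val)

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            # Gradient passes only where the input was inside the clipping range.
            mask = (x > 0) & (x < ctx.max_val)
            return grad_output * mask, None  # None: no gradient for max_val

    x = torch.randn(4, requires_grad=True)
    ClippedReLU.apply(x).sum().backward()
    print(x.grad)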

Apply

Please use the link below to review all opportunities at Cerebras Systems. We are actively hiring across our Machine Learning, Software, Hardware, Systems, Manufacturing, and Product organizations.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  1. Build a breakthrough AI platform beyond the constraints of the GPU
  2. Publish and open source their cutting-edge AI research
  3. Work on one of the fastest AI supercomputers in the world
  4. Enjoy job stability with startup vitality
  5. Enjoy a simple, non-corporate work culture that respects individual beliefs

Read our blog: Five Reasons to Join Cerebras in 2024.

Apply today and join the forefront of groundbreaking advancements in AI.

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.


Apply

US and Canada only

Cerebras' systems are designed with a singular focus on machine learning. Our processor is the Wafer Scale Engine (WSE), a single chip with performance equivalent to a cluster of GPUs, giving the user cluster-scale capability with the simplicity of programming a single device. Because of this programming simplicity, large model training can be scaled out using simple data parallelism to the performance of thousands of GPUs. ML practitioners can focus on their machine learning, rather than parallelizing and distributing their applications across many devices. The Cerebras hardware architecture is designed with unique capabilities including orders of magnitude higher memory bandwidth and unstructured sparsity acceleration, not accessible on traditional GPUs. With a rare combination of cutting-edge hardware and deep expertise in machine learning, we stand among the select few global organizations capable of conducting large-scale innovative deep learning research and developing novel ML algorithms not possible on traditional hardware.

About the role

Cerebras has senior and junior research scientist roles open with a focus on the co-design and demonstration of novel state-of-the-art ML algorithms on this unique, specialized architecture. We are working on research areas including advancing and scaling foundation models for natural language processing and multi-modal applications, new weight and activation sparsity algorithms, and novel efficient training techniques. A key responsibility of our group is to ensure that state-of-the-art techniques can be applied systematically across many important applications.

As part of the Core ML team, you will have the unique opportunity to research state-of-the-art models as part of a collaborative and close-knit team. We deliver important demos of Cerebras capability as well as publish our findings as ways to support and engage with the community. A key aspect of the senior role will also be to provide active guidance and mentorship to other talented and passionate scientists and engineers.

Research Directions

Our research focuses on improving state-of-the-art foundation models in NLP, computer vision, and multi-modal settings by studying many dimensions unique to the Cerebras architecture:

  • Scaling laws to predict and analyze large-scale training improvements: accuracy/loss, architecture scaling, and hyperparameter transfer
  • Sparse and low-precision training algorithms for reduced training time and increased accuracy, for instance weight and activation sparsity, mixture-of-experts, and low-rank adaptation (a small sparsity sketch follows this list)
  • Optimizers, initializers, normalizers to improve training dynamics and efficiency
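
To make the sparsity direction above concrete, below is a minimal, hypothetical sketch of unstructured magnitude pruning in PyTorch. It illustrates the general technique only; it is not Cerebras' training algorithm:

    import torch

    def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        """Zero out the smallest-magnitude fraction of weights (unstructured sparsity)."""
        k = int(sparsity * weight.numel())
        if k == 0:
            return weight
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight * (weight.abs() > threshold)

    w = torch.randn(512, 512)
    w_90 = magnitude_prune(w, sparsity=0.9)
    print(f"nonzero fraction: {w_90.count_nonzero().item() / w.numel():.3f}")  # ~0.100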

Responsibilities

  • Develop novel training algorithms that advance the state of the art in model quality and compute efficiency
  • Develop novel network architectures that address foundational challenges in language and multi-modal domains
  • Co-design ML algorithms that take advantage of existing unique Cerebras hardware advantages and collaborate with engineers to co-design next generation architectures
  • Design and run research experiments that show novel algorithms are efficient and robust
  • Analyze results to gain research insights, including training dynamics, gradient quality, and dataset preprocessing techniques
  • Publish and present research at leading machine learning conferences
  • Collaborate with engineers in co-design of the product to bring the research to customers

Requirements

  • Strong grasp of machine learning theory, fundamentals, linear algebra, and statistics
  • Experience with state-of-the-art models, such as GPT, LLaMA, DALL-E, PaLI, or Stable Diffusion
  • Experience with machine learning frameworks such as TensorFlow and PyTorch
  • Strong track record of research success through publications at top conferences or journals (e.g. ICLR, ICML, NeurIPS), or patents and patent applications

Apply

US & Canada only

Cerebras is on a mission to accelerate the pace of progress in Generative AI by building AI supercomputers that deliver unprecedented performance for LLM training. Cerebras is leveraging these supercomputers to turbocharge the exploration of end-to-end solutions that address real-world challenges, such as breaking down language barriers, enhancing developer productivity, and advancing medical research breakthroughs. The AppliedML team at Cerebras is a team of Generative AI practitioners and experts who leverage Cerebras AI supercomputers to push the technical frontiers of the domain and work with our partners to build compelling solutions. Some of this team's publicly announced successes include BTLM, the Jais 30B multilingual model, and an Arabic chatbot, among others.

About the role

As an applied machine learning engineer, you will work on adapting state-of-the-art deep learning (DL) models to run on our wafer-scale system. This includes both functional validation and performance tuning of a variety of core models for applications like Natural Language Processing (NLP), Large Language Models (LLMs), Computer Vision (CV), and Graph Neural Networks (GNNs).

As a member of the Cerebras engineering team you will be implementing models in popular DL frameworks like PyTorch and using insights into our hardware architecture to unlock the full potential of our chip. You will work on all aspects of the DL model pipeline, including (the dataloader step is sketched after this list):

  • Dataloader implementation and performance optimization
  • Reference model implementation and functional validation
  • Model convergence and hyper-parameters tuning
  • Model customization to meet customer needs
  • Model architecture pathfinding
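
As a hypothetical illustration of the first pipeline step, here is a minimal PyTorch dataloader for next-token language-model training; the names, vocabulary size, and shapes are invented for the example:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class TokenDataset(Dataset):
        """Toy dataset yielding fixed-length token windows from one long sequence."""

        def __init__(self, tokens: torch.Tensor, seq_len: int = 128):
            self.tokens, self.seq_len = tokens, seq_len

        def __len__(self):
            return (len(self.tokens) - 1) // self.seq_len

        def __getitem__(self, i):
            start = i * self.seq_len
            x = self.tokens[start : start + self.seq_len]
            y = self.tokens[start + 1 : start + self.seq_len + 1]  # next-token targets
            return x, y

    tokens = torch.randint(0, 50_000, (1_000_000,))
    # batch_size, shuffle, num_workers, and pin_memory are the usual throughput knobs.
    loader = DataLoader(TokenDataset(tokens), batch_size=32, shuffle=True)
    x, y = next(iter(loader))
    print(x.shape, y.shape)  # torch.Size([32, 128]) torch.Size([32, 128])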

This role will allow you to work closely with partner companies at the forefront of their fields across many industries. You will get to see how deep learning is being applied to some of the world’s most difficult problems today and help ML researchers in these fields to innovate more rapidly and in ways that are not currently possible on other hardware systems.

Responsibilities

  • Analyze, implement, and optimize DL models for the WSE
  • Validate functionality and convergence of models on the WSE
  • Work with engineering teams to optimize models for the Cerebras stack
  • Support engineering teams in functional and performance scoping of new models and layers
  • Work with customers to optimize their models for the Cerebras stack
  • Develop new approaches for solving real world AI problems on various domains

Requirements

  • Master's degree or PhD in engineering, science, or related field with 5+ years of experience
  • Experience programming in a modern language like Python or C++
  • In-depth understanding of deep learning methods and model architectures
  • Experience with DL frameworks like PyTorch, TensorFlow, and JAX
  • Familiarity with state-of-the-art transformer architectures for language and vision models
  • Experience in model training and hyper-parameter tuning techniques
  • Familiarity with different LLM downstream tasks and datasets

Preferred Skills

  • A deep passion for cutting-edge artificial intelligence techniques
  • Understanding of hardware architecture
  • Experience programming accelerators like GPUs and FPGAs

Apply

Bengaluru, Karnataka, India

Cerebras has developed a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for deep learning workloads.

About the role

The AppliedML team is seeking a senior technical leader to spearhead new initiatives on Generative AI solutions. In this role, you will lead a team of research and software engineers to plan, develop, and deliver end-to-end solutions trained on massive supercomputers. These projects may be part of our customer collaborations or open-source initiatives. These solutions will be trained on some of the largest systems, using unique datasets we have developed in partnership with our diverse collaborators. You will plan and design experiments, execute them using Cerebras' unique workflow, and share the findings with internal stakeholders and external partners.

Responsibilities

  • Lead the technical exploration: frame the problem statement, define the option space, and evaluate the options in a data-driven way to identify the final approach
  • Design experiments to test the different hypotheses, analyze output to distill the learnings, and use them to adjust the project direction
  • Keep up with the state-of-the-art in Generative AI – efficient training recipes, model architecture, alignment, and instruction tuning, among others
  • Influence and mentor a distributed team of engineers
  • Integrate and enhance the latest research in model compression, including sparsity and quantization, to achieve super-linear scaling in model performance and accuracy (a toy quantization sketch follows this list)
  • Achieve breakthrough efficiency by co-designing hardware capabilities, model architecture, and training/deployment recipes
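
For readers unfamiliar with the compression techniques named above, here is a toy post-training quantization routine in PyTorch; this is a generic illustration of the idea, not Cerebras' recipe:

    import torch

    def quantize_int8(w: torch.Tensor):
        """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
        scale = w.abs().max() / 127.0
        q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.to(torch.float32) * scale

    w = torch.randn(256, 256)
    q, scale = quantize_int8(w)
    err = (w - dequantize(q, scale)).abs().max().item()
    print(f"max abs rounding error: {err:.5f}")  # bounded by roughly scale / 2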

Requirements

  • MS in Computer Science, Statistics, or related fields
  • 2+ years of experience providing technical leadership for a moderately sized team
  • Hands-on experience with training DL models for speech, language, vision, or a combination of them (multi-modal)
  • Experience with being the technical lead of a feature or project from conception through productization
  • Experience operating in a self-directed environment with multiple stakeholders
  • Experience working with other leaders to define strategic roadmaps
  • Proven track record of clearly articulating the findings to a broad audience with varying technical familiarity with the subject matter

Preferred

  • Ph.D. in Computer Science, Statistics, or related fields
  • Publications in top conferences such as NeurIPS, ICML, and CVPR, among others
  • Track record of building impactful features through open source or productization
  • People management experience

Apply

Seattle or Remote

OctoAI is a leading startup in the fast-paced generative AI market. Our mission is to empower businesses to build differentiated applications that delight customers with the latest generative AI features.

Our platform, OctoAI, delivers generative AI infrastructure to run, tune, and scale models that power AI applications. OctoAI makes models work for you by providing developers easy access to efficient AI infrastructure so they can run the models they choose, tune them for their specific use case, and scale from dev to production seamlessly. With the fastest foundation models on the market (including Llama-2, Stable Diffusion, and SDXL), integrated customization solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.

Our team consists of experts in cloud services, infrastructure, machine learning systems, hardware, and compilers as well as an accomplished go-to-market team with diverse backgrounds. We have secured over $130M in venture capital funding and will continue to grow over the next year. We're based largely in Seattle but have a remote-first culture with people working all over the US and elsewhere in the world.

We dream big but execute with focus and believe in creativity, productivity, and a balanced life. We value diversity in all dimensions and are always looking for talented people to join our team!

Our Automation team specializes in developing the most efficient engine for generative model deployment. We concentrate on enhancements from detailed GPU kernel adjustments to broader system-level optimizations, including continuous batching.
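
For readers unfamiliar with continuous batching, here is a deliberately simplified, hypothetical sketch of the scheduling idea in Python; production engines (e.g. vLLM) add paged KV-cache management, preemption, and much more:

    from collections import deque

    def continuous_batching(requests, max_batch, step_fn):
        """Admit new requests as soon as slots free up, instead of waiting
        for the whole batch to finish (static batching)."""
        waiting, running, done = deque(requests), [], []
        while waiting or running:
            while waiting and len(running) < max_batch:
                running.append(waiting.popleft())      # admit mid-flight
            step_fn(running)                           # one decode step for the batch
            done += [r for r in running if r["finished"]]
            running = [r for r in running if not r["finished"]]
        return done

    def step_fn(batch):
        # Toy decode step: each request finishes after its own number of steps.
        for r in batch:
            r["remaining"] -= 1
            r["finished"] = r["remaining"] <= 0

    reqs = [{"id": i, "remaining": n, "finished": False} for i, n in enumerate([3, 1, 5, 2])]
    print([r["id"] for r in continuous_batching(reqs, max_batch=2, step_fn=step_fn)])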

We are seeking a highly skilled and experienced Machine Learning Systems Engineer with experience in CUDA Kernel optimization to join our dynamic team. In this role, you will be responsible for driving significant advancements in GPU performance optimizations and contributing to cutting-edge projects in AI and machine learning.


Apply

Seattle or Remote


OctoAI is a leading startup in the fast-paced generative AI market. Our mission is to empower businesses to build differentiated applications that delight customers with the latest generative AI features.

Our platform, OctoAI, delivers generative AI infrastructure to run, tune, and scale models that power AI applications. OctoAI makes models work for you by providing developers easy access to efficient AI infrastructure so they can run the models they choose, tune them for their specific use case, and scale from dev to production seamlessly. With the fastest foundation models on the market (including Llama-2, Stable Diffusion, and SDXL), integrated customization solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.

Our team consists of experts in cloud services, infrastructure, machine learning systems, hardware, and compilers as well as an accomplished go-to-market team with diverse backgrounds. We have secured over $130M in venture capital funding and will continue to grow over the next year. We're based largely in Seattle but have a remote-first culture with people working all over the US and elsewhere in the world.

We dream big but execute with focus and believe in creativity, productivity, and a balanced life. We value diversity in all dimensions and are always looking for talented people to join our team!

Our MLSys Engineering team specializes in developing the most efficient and feature-packed engines for generative model deployment. This includes feature enablement and optimization for popular language and media models, such as Mixtral, Llama-2, Stable Diffusion, SDXL, SVD, and SD3, and thus requires a broad understanding of various system layers, from the serving API down to the hardware level. We do this by building systems that innovate new techniques as well as leveraging and contributing to open-source projects including TVM, MLC-LLM, vLLM, CUTLASS, and more.

We are seeking a highly skilled and experienced Machine Learning Systems Engineer to join our dynamic team. In this role, you will be responsible for contributing to the latest techniques and technologies in AI and machine learning.


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movement. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models and scale generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024 and to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney, and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS, and Wave Computing. Our past successes include building chips for all the global cloud hyperscalers - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

ML Compiler Backend Developer https://jobs.ashbyhq.com/d-Matrix/ed7241c7-8fe0-4023-9813-efb93b43180f

Machine Learning Senior Staff https://jobs.ashbyhq.com/d-Matrix/7bd32e05-677e-48ec-98cb-fbfb4c6a14f3

Machine Learning Performance Architect https://jobs.ashbyhq.com/d-Matrix/64ba00d5-55b7-44c6-a564-eba934c07c2b

SQA (Software Quality Engineer) https://jobs.ashbyhq.com/d-Matrix/bc81c7b1-98aa-40a9-99b7-740592585da0

AI / ML System Software Engineer https://jobs.ashbyhq.com/d-Matrix/71b6738b-1b65-4471-8505-6893e4261ae0


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movement. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models and scale generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024 and to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney, and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS, and Wave Computing. Our past successes include building chips for all the global cloud hyperscalers - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters 3 days per week.

What You Will Do:

The role requires you to be part of the team that helps productize the SW stack for our AI compute engine. As part of the Software team, you will be responsible for the development, enhancement, and maintenance of next-generation AI hardware simulation tools and for developing software kernels for the hardware. You possess experience building functional simulators for new HW architectures, a very strong understanding of various hardware architectures, and knowledge of how to map algorithms to those architectures. You understand how to map computational graphs generated by AI frameworks to the underlying architecture. You have experience working across all aspects of the full-stack toolchain and understand the nuances of optimizing and trading off various aspects of hardware-software co-design. You are able to build and scale software deliverables in a tight development window. You will work with a team of compiler experts to build out the compiler infrastructure, working closely with other software (ML, Systems) and hardware (mixed signal, DSP, CPU) experts in the company.

What You Will Bring:

• MS or PhD in Computer Science, Electrical Engineering, Math, Physics, or a related field (PhD preferred), with 12+ years of industry experience.

• Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals. 

• Proficient in C/C++ and Python development in a Linux environment using standard development tools.

• Experience implementing functional simulators in high-level languages such as C/C++ and Python.

• Self-motivated team player with a strong sense of ownership and leadership.

Desired: 

• Prior startup, small team or incubation experience. 

• Experience implementing algorithms for specialized hardware such as FPGAs, DSPs, GPUs, AI accelerators. 

• Experience with ML algorithms and frameworks such as PyTorch and/or TensorFlow.

• Experience with ML compilers and frameworks such as MLIR, LLVM, TVM, and Glow.

• Experience with a deep learning framework (such as PyTorch, Tensorflow) and ML models for CV, NLP, or Recommendation. 

• Work experience at a cloud provider or AI compute / sub-system company.


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movement. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models and scale generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024 and to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney, and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS, and Wave Computing. Our past successes include building chips for all the global cloud hyperscalers - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters or San Diego, CA Location 3 days per week.

What You Will Do:

The Machine Learning Team is responsible for the R&D of core algorithm-hardware co-design capabilities in d-Matrix's end-to-end solution.  You will be joining a team of exceptional people enthusiastic about researching and developing state-of-the-art efficient deep learning techniques tailored for d-Matrix's AI compute engine.  You will also have the opportunity of collaboration with top academic labs and help customers to optimize and deploy workloads for real-world AI applications on our systems. 

• Design, implement and evaluate efficient deep neural network architectures and algorithms for d-Matrix's AI compute engine.

• Engage and collaborate with internal and external ML researchers to meet R&D goals. 

• Engage and collaborate with SW team to meet stack development milestones. 

• Conduct research to guide hardware design. 

• Develop and maintain tools for high-level simulation and research. 

• Port customer workloads, optimize them for deployment, generate reference implementations and evaluate performance. 

• Report and present progress in a timely and effective manner.

• Contribute to paper publications and intellectual property.

What You Will Bring:

• Master's degree in Computer Science, Electrical and Computer Engineering, or a related technical discipline with 3+ years of industry experience, PhD preferred with 1+ year of industry experience. 

• High proficiency with major deep learning frameworks: PyTorch is a must. 

• High proficiency in algorithm analysis, data structure, and Python programming is a must. 

Desired:

• Proficiency with C/C++ programming is preferred. 

• Proficiency with GPU CUDA programming is preferred.

• Deep, wide, and current knowledge of machine learning and modern deep learning is preferred.

• Experience in real-world data science projects in an industry setting is preferred. 

• Experience with efficient deep learning is preferred: quantization, sparsity, distillation. 

• Experience with specialized HW accelerator systems for deep neural networks is preferred.

• Passionate about AI and thriving in a fast-paced and dynamic startup culture.


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movement. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models and scale generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024 and to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney, and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS, and Wave Computing. Our past successes include building chips for all the global cloud hyperscalers - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters 3 days per week.

The role: Software Engineer, Staff - Kernels

What you will do:

The role requires you to be part of the team that helps productize the SW stack for our AI compute engine. As part of the Software team, you will be responsible for the development, enhancement, and maintenance of software kernels for next-generation AI hardware. You possess experience building software kernels for HW architectures, a very strong understanding of various hardware architectures, and knowledge of how to map algorithms to those architectures. You understand how to map computational graphs generated by AI frameworks to the underlying architecture. You have experience working across all aspects of the full-stack toolchain and understand the nuances of optimizing and trading off various aspects of hardware-software co-design. You are able to build and scale software deliverables in a tight development window. You will work with a team of compiler experts to build out the compiler infrastructure, working closely with other software (ML, Systems) and hardware (mixed signal, DSP, CPU) experts in the company.

What you will bring:

Minimum:

MS or PhD in Computer Engineering, Math, Physics or related degree with 5+ years of industry experience.

Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals. 

Proficient in C/C++ and Python development in Linux environment and using standard development tools. 

Experience implementing algorithms in high level languages such as C/C++, Python. 

Experience implementing algorithms for specialized hardware such as FPGAs, DSPs, GPUs, and AI accelerators using frameworks such as CUDA.

Experience implementing operators commonly used in ML workloads - GEMMs, convolutions, BLAS, and SIMD operators for operations like softmax, layer normalization, and pooling (a reference sketch follows this list).

Experience with development for embedded SIMD vector processors such as Tensilica. 

Self-motivated team player with a strong sense of ownership and leadership. 
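
As a reference point for the operator work listed above, here is a scalar NumPy sketch of two common operators; an actual kernel would be vectorized and tiled for the target SIMD unit, which this sketch deliberately omits:

    import numpy as np

    def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
        """Numerically stable softmax: subtract the row max before exponentiating."""
        z = x - x.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def layer_norm(x: np.ndarray, gamma, beta, eps: float = 1e-5) -> np.ndarray:
        """Normalize over the last dimension, then apply learned scale and shift."""
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gamma * (x - mu) / np.sqrt(var + eps) + beta

    x = np.random.randn(4, 8).astype(np.float32)
    print(softmax(x).sum(axis=-1))                               # each row sums to 1
    print(layer_norm(x, np.ones(8), np.zeros(8)).std(axis=-1))   # ~1 per row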

Preferred:

Prior startup, small team or incubation experience. 

Experience with ML frameworks such as TensorFlow and/or PyTorch.

Experience working with ML compilers and algorithms, such as MLIR, LLVM, TVM, Glow, etc.

Experience with a deep learning framework (such as PyTorch, Tensorflow) and ML models for CV, NLP, or Recommendation. 

Work experience at a cloud provider or AI compute / sub-system company.


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movement. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models and scale generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024 and to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney, and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS, and Wave Computing. Our past successes include building chips for all the global cloud hyperscalers - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters 3-5 days per week.

What You Will Do:

The role requires you to be part of the team that helps productize the SW stack for our AI compute engine. As part of the Software team, you will be responsible for the development, enhancement, and maintenance of the development and testing infrastructure for next-generation AI hardware. You can build and scale software deliverables in a tight development window. You will work with a team of compiler, ML, and HW architecture experts to build performant ML workloads targeted for d-Matrix’s architecture. You will also research and develop forward-looking items that further improve the performance of ML workloads on d-Matrix’s architecture.

What You Will Bring:

MS or PhD preferred in Computer Science, Electrical Engineering, Math, Physics or related degree with 2+ Years of Industry Experience.

Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals

Experience with mapping NLP models (BERT and GPT) to accelerators and awareness of trade-offs across memory, bandwidth, and compute (see the back-of-the-envelope sketch after this list)

Proficient in Python/C/C++ development in Linux environment and using standard development tools

Experience with deep learning frameworks (such as PyTorch, Tensorflow)

Self-motivated team player with a strong sense of ownership and leadership
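
To illustrate the memory/bandwidth/compute trade-off mentioned above, here is a back-of-the-envelope helper; the shapes and byte sizes below are invented for the example:

    def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
        """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n], assuming each
        matrix crosses memory exactly once (an idealized lower bound on traffic)."""
        flops = 2 * m * n * k
        traffic = bytes_per_elem * (m * k + k * n + m * n)
        return flops / traffic

    # A GPT-style projection (batch*seq = 2048, hidden = 4096) in fp16:
    ai = gemm_arithmetic_intensity(2048, 4096, 4096)
    # If the accelerator's FLOPs-per-byte "machine balance" exceeds ai, the layer
    # is memory-bandwidth-bound; otherwise it is compute-bound.
    print(f"arithmetic intensity: {ai:.1f} FLOPs/byte")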

Desired: 

Research background with publication record in top-tier ML/Computer architecture conferences

Prior startup, small team or incubation experience

Experience implementing and optimizing ML workloads and low-level software algorithms for specialized hardware such as FPGAs, DSPs, DL accelerators.

Experience with ML Models from definition to deployment including training, quantization, sparsity, model preprocessing, and deployment

Work experience at a cloud provider or AI compute / sub-system company

Experience implementing SIMD algorithms on vector processors


Apply

Our mission at Capital One is to create trustworthy, reliable and human-in-the-loop AI systems, changing banking for good. For years, Capital One has been leading the industry in using machine learning to create real-time, intelligent, automated customer experiences. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking. Because of our investments in public cloud infrastructure and machine learning platforms, we are now uniquely positioned to harness the power of AI. We are committed to building world-class applied science and engineering teams and continue our industry leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

We are looking for an experienced Director, AI Platforms to help us build the foundations of our enterprise AI capabilities. In this role you will develop generic platform services to support applications powered by Generative AI. You will build SDKs and APIs for agents and information retrieval, and offer models as a service to power generative AI workflows such as optimizing LLMs via retrieval-augmented generation (RAG).

Additionally, you will manage end-to-end coordination with operations, oversee the creation of high-quality curated datasets and the productionization of models, and work with applied research and product teams to identify and prioritize ongoing and upcoming services.
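
To ground the RAG reference above, here is a deliberately minimal sketch of the retrieve-then-prompt flow; the hash-based embed function is a stand-in for a real embedding model, and none of this reflects Capital One's actual stack:

    import numpy as np

    def embed(text):
        """Stand-in embedding; a real service would call an embedding model."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(64)
        return v / np.linalg.norm(v)

    def retrieve(query, docs, k=2):
        """Rank documents by cosine similarity to the query embedding."""
        q = embed(query)
        return sorted(docs, key=lambda d: float(embed(d) @ q), reverse=True)[:k]

    docs = ["How to dispute a charge", "Branch opening hours", "Reporting a lost card"]
    context = retrieve("I see an unusual charge on my card", docs)
    prompt = "Answer using only this context:\n" + "\n".join(context) + "\n\nQuestion: ..."
    # `prompt` would then be sent to the LLM; generation is out of scope here.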


Apply