Find Investable Startups and Competitors
Search thousands of startups using natural language—just describe what you're looking for
Top 50 AI Inference Engine Startups
Discover the top 50 AI inference engine startups. Browse funding data, key metrics, and company insights. Average funding: $111.3M.
Simplismart
Simplismart provides a high-performance inference engine that enables rapid deployment and fine-tuning of generative AI models on-premises or across various cloud platforms. This technology reduces model deployment time from months to days, significantly lowering operational costs while enhancing inference speed and scalability.
Funding: $5M+
Rough estimate of the amount of funding raised
d-Matrix
d-Matrix has developed Corsair, an AI inference platform that achieves 60,000 tokens per second with 1 ms latency for Llama 3 8B models, significantly enhancing throughput and energy efficiency in datacenters. This technology addresses the high computational costs and energy consumption associated with large-scale AI inference, enabling organizations to scale their AI capabilities sustainably.
Funding: $100M+
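The Corsair figures above can be sanity-checked with simple arithmetic: if each stream emits one token per millisecond, the aggregate throughput implies a certain number of concurrent streams. This is illustrative back-of-the-envelope math only, not d-Matrix's published methodology.

```python
# Back-of-the-envelope check of the Corsair figures above
# (illustrative arithmetic only, not vendor-published methodology).

def implied_concurrency(total_tokens_per_sec: float, per_token_latency_ms: float) -> float:
    """If each stream emits one token per per_token_latency_ms,
    how many concurrent streams yield the aggregate throughput?"""
    tokens_per_sec_per_stream = 1000.0 / per_token_latency_ms
    return total_tokens_per_sec / tokens_per_sec_per_stream

# 60,000 tok/s at 1 ms/token implies roughly 60 concurrent streams.
print(implied_concurrency(60_000, 1.0))  # 60.0
```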
Lepton AI
Lepton AI Cloud provides a scalable platform for AI inference and training, utilizing high-performance GPU infrastructure and a fast LLM engine to achieve up to 600 tokens per second. The platform enables enterprises to efficiently deploy and manage AI models, processing over 20 billion tokens and generating 1 million images daily with 99.9% uptime.
Funding: $10M+
Deep Infra
Deep Infra provides a serverless machine learning inference platform that enables businesses to deploy and scale AI models via a simple API, eliminating the need for complex ML infrastructure. It reduces costs and improves efficiency by offering pay-per-use pricing, low-latency performance, and automatic scaling on dedicated A100 and H100 GPUs.
Funding: $20M+
Fireworks AI
Fireworks AI provides a serverless inference platform that enables the rapid deployment and fine-tuning of compound AI models, optimizing for speed and cost efficiency. The technology addresses the challenges of slow model inference and high operational costs, allowing businesses to scale AI applications effectively while maintaining low latency and high throughput.
Groq
Groq accelerates AI inference with custom-designed Language Processing Units (LPUs) that deliver sub-millisecond latency and consistent performance. Their cloud platform and on-premise solutions enable developers to deploy AI models efficiently and cost-effectively.
Positron
Positron provides a transformer inference server that delivers up to 5.2x higher performance and 75% lower cost per token compared to Nvidia DGX-H100 systems, optimizing AI model deployment for power-constrained environments. The platform supports seamless integration with HuggingFace models and offers a managed inference service for remote evaluation, enabling efficient scaling and reduced operational expenses for AI-driven applications.
Funding: $20M+
Fal (Features and Labels)
Fal provides a platform for developers to customize, deploy, and scale generative media models using the fastest inference engine for diffusion models, achieving up to 400% faster performance. This technology addresses the need for efficient and cost-effective model inference, allowing users to run their models on serverless GPUs while only paying for the computing power they consume.
Funding: $10M+
EigenCloud
EigenCloud provides verifiable infrastructure for AI inference, general compute, and data availability, embedding cryptographic proofs into each operation. Its EigenAI service delivers deterministic, OpenAI‑compatible LLM inference, while EigenCompute generates succinct execution proofs for arbitrary container workloads, and EigenDA offers a high‑throughput (≈100 MB/s) data availability layer for rollups. Operators can stake ETH and EIGEN on EigenLayer to secure these off‑chain services and earn rewards.
Funding: $50M+
KamiwazaAI
Kamiwaza.ai provides a Gen AI stack that integrates an Inference Mesh and a locality-aware Distributed Data Engine, enabling enterprises to process data where it resides without compromising privacy. This technology allows businesses to achieve scalable AI solutions, targeting 1 trillion inferences per day while maintaining stringent security protocols for sensitive information.
Funding: $10M+
Crusoe
Crusoe provides a managed AI cloud platform that delivers low‑latency, high‑throughput inference for large‑context models using NVIDIA and AMD GPUs with its MemoryAlloy engine. The service abstracts cluster provisioning via an API‑key workflow, auto‑scales on Kubernetes/Slurm, and includes a web console for one‑click model deployment, while its renewable‑powered data centers reduce compute costs by up to 80%.
Funding: $1B+
Recogni
Recogni develops a multimodal AI inference system utilizing its proprietary Pareto AI Math to enhance performance while significantly reducing power consumption. This technology addresses the high costs and energy demands of generative AI models, enabling efficient and accurate processing for data centers.
Pruna AI
Pruna AI provides an AI optimization engine that enhances machine learning model performance with just two lines of code, utilizing execution kernel and graph optimization techniques. This solution reduces runtime costs and carbon emissions by making AI models faster and more efficient, enabling scalable inference without extensive re-engineering.
Funding: $5M+
Axelera AI
Traction Score: 10 (relative to companies in the same age group, based on online presence metrics)
Axelera AI develops and sells high-performance, energy-efficient AI inference hardware for edge devices. Their Metis AI Platform integrates a specialized in-memory computing architecture with a comprehensive software stack, enabling efficient deployment of deep learning models for computer vision and natural language processing applications.
Funding: $50M+
Latent AI
Latent AI provides an Efficient Inference Platform (LEIP) that enables enterprises to design, deploy, and manage AI models on edge devices with optimized performance and minimal resource consumption. This technology addresses the challenges of slow prototype development and high operational costs by facilitating rapid model retraining and real-time monitoring in the field.
Funding: $20M+
BentoML
BentoML provides a Unified Inference Platform that enables developers to build and deploy scalable AI systems using any model on their preferred cloud infrastructure. The platform addresses the challenges of slow iteration and high costs in AI deployment by offering features like auto-scaling, low-latency serving, and seamless integration with existing cloud resources.
Funding: $10M+
Infermedica
Infermedica offers a Medical Guidance Platform that utilizes an AI-driven Inference Engine and Medical Knowledge Base to automate symptom assessment and digital triage, enhancing patient navigation and communication between healthcare providers and patients. The platform reduces unnecessary medical service utilization and improves care access, having facilitated over 14 million health checks across more than 30 countries.
Funding: $20M+
AICA
AICA provides a visual, node‑based software platform that lets system integrators and robotics engineers build sensor‑driven, adaptive robot applications without custom code. Its hardware abstraction layer and built‑in AI inference engine enable the same skill set to run across multiple robot and sensor vendors, with cloud‑edge deployment, version control, and remote diagnostics.
Funding: $2M+
Together AI
Together AI provides a cloud platform that offers serverless OpenAI‑compatible inference APIs for over 200 open‑source models, accelerated up to 4× by its ATLAS runtime. Users can provision on‑demand or reserved NVIDIA GPU clusters for fine‑tuning and batch inference, with per‑token or hourly usage pricing and enterprise‑grade security.
Inference Labs
Inference Labs provides a decentralized platform that utilizes cryptographic verification to ensure computational integrity for AI models, enabling secure and transparent AI interactions. By implementing agentic native protocols and zero-knowledge proofs, the company addresses the need for trust and reliability in AI inference across distributed networks.
Funding: $2M+
FuriosaAI
FuriosaAI develops the RNGD data center accelerator, utilizing a Tensor Contraction Processor architecture to enhance the efficiency of AI inference with a power profile of just 150W. This technology enables enterprises to deploy large language models and multimodal applications with low latency and high throughput, significantly reducing energy consumption and operational costs in data centers.
Funding: $100M+
SEMRON
SEMRON develops a 3D-scalable AI inference chip using its proprietary CapRAM™ technology, which integrates compute-in-memory architecture to enhance energy efficiency and parameter density for AI applications. This technology addresses the high costs and power consumption of traditional AI chips, enabling efficient deployment of generative AI models directly on edge devices like smartphones and wearables.
Funding: $5M+
NeuReality
NeuReality designs AI-centric infrastructure that integrates a network addressable processing unit (NAPU) with purpose-built software to streamline AI inference workflows. This solution reduces reliance on traditional CPUs and networking components, addressing the complexity and inefficiencies that hinder AI model deployment and scalability.
FriendliAI
FriendliAI provides a platform for deploying and optimizing generative AI models, including large language models (LLMs), with tools for fine-tuning, real-time monitoring, and autoscaling. It reduces GPU costs by over 50% and improves inference performance with techniques like iteration batching, native quantization, and dedicated GPU resource management, enabling businesses to scale AI applications efficiently and securely.
Funding: $5M+
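Iteration batching (often called continuous batching) is the key scheduling idea behind the throughput gains mentioned above: instead of waiting for an entire batch of requests to finish, the scheduler admits new requests and retires finished ones at every decode step. The toy scheduler below only illustrates the concept; it is not FriendliAI's implementation.

```python
# Toy illustration of iteration batching (continuous batching):
# requests join and leave the batch at every decode iteration,
# so short requests do not wait behind long ones.
# Simplified sketch; not FriendliAI's actual scheduler.

from collections import deque

def iteration_batching(requests, max_batch: int) -> int:
    """requests: list of (request_id, tokens_to_generate).
    Returns total decode iterations needed to finish all requests."""
    queue = deque(requests)
    active = {}  # request_id -> tokens remaining
    steps = 0
    while queue or active:
        # Admit new requests as soon as batch slots free up.
        while queue and len(active) < max_batch:
            rid, n = queue.popleft()
            active[rid] = n
        # One decode iteration: every active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]  # slot frees immediately
        steps += 1
    return steps

# Four requests, batch size 2: short requests finish and free slots early.
print(iteration_batching([("a", 2), ("b", 8), ("c", 2), ("d", 2)], max_batch=2))  # 8
```

A static scheduler that waits for each full batch to drain would need 8 + 2 = 10 iterations on the same workload, since the 2-token request "a" would be held until the 8-token request "b" completes.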
OpenGradient
OpenGradient is a decentralized platform that enables secure hosting and inference execution of open-source AI models using EVM-compatible smart contracts and a heterogeneous AI compute architecture. It addresses the challenges of model deployment and verifiable inference in AI applications, allowing developers to build scalable and permissionless solutions.
Funding: $5M+
Neural Magic
Neural Magic provides an enterprise inference server solution that optimizes the deployment of open-source large language models (LLMs) on both CPU and GPU infrastructures. By enhancing computational efficiency and reducing hardware requirements, the platform enables organizations to run AI models securely and cost-effectively across various environments, including cloud and edge.
Funding: $20M+
CLIKA
CLIKA provides an SDK that automatically compresses and optimizes AI models for diverse hardware backends. Its engine generates tailored compression plans based on model architecture, reducing model size and accelerating inference with minimal accuracy loss.
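CLIKA's compression plans are proprietary, but one common building block such engines apply per layer is symmetric int8 post-training quantization: weights shrink 4x versus fp32 while reconstruction error stays bounded by half the quantization scale. A minimal sketch of that general idea, not CLIKA's method:

```python
# Minimal symmetric int8 post-training quantization, a common
# building block in model-compression engines.
# Generic illustration; CLIKA's actual compression plans are proprietary.

def quantize_int8(weights):
    """Map float weights to int8 values with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.27]
q, s = quantize_int8(w)
print(q)  # [50, -127, 2, 127]
recovered = dequantize(q, s)
# Per-weight reconstruction error is bounded by scale / 2.
print(max(abs(a - b) for a, b in zip(w, recovered)))
```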
Lumiphase
Traction Score: 10
Lumiphase develops silicon photonics-based optical processors for AI inference, enabling faster and more energy-efficient AI computation. Their technology replaces traditional electronic components with light-based circuits, accelerating AI workloads while reducing power consumption in data centers and edge devices.
Funding: $2M+
Mythic
Mythic provides analog compute‑in‑memory AI inference accelerators that integrate compute and weight storage on a single silicon plane, eliminating off‑chip memory traffic. Delivered as standard M.2 cards, the APUs achieve up to 25 TOPS with 3‑4× lower power than comparable digital accelerators, and are compatible with TensorFlow and PyTorch for edge devices such as robots, drones, and smart‑city cameras.
Funding: $10M+
XMOS
XMOS provides the XCORE® Generative System‑on‑Chip (GenSoC), a programmable silicon platform that compiles natural‑language system specifications into deterministic, parallel firmware with sub‑microsecond latency. The SoC integrates audio I/O, voice‑fusion DSP, motor‑control peripherals and an on‑chip AI inference engine, allowing OEMs to replace multiple discrete chips with a single component for audio, voice, robotics and industrial automation applications. This reduces hardware bill‑of‑materials, development time and timing‑error risk while delivering guaranteed real‑time performance.
Funding: $10M+
RaiderChip
RaiderChip designs semiconductor hardware accelerators that enhance AI performance by addressing memory bandwidth limitations. Their solutions enable efficient AI inference for both edge and cloud applications, allowing users to run complex large language models locally with full privacy and without ongoing subscriptions.
Funding: $1M+
Inferless
Inferless provides a serverless GPU platform that enables rapid deployment of custom machine learning models from various sources, including Hugging Face and Docker, while automatically scaling resources to handle unpredictable workloads. This solution reduces operational costs by up to 90% and eliminates the complexities associated with traditional GPU clusters, allowing businesses to efficiently manage their machine learning inference needs.
Funding: $3M+
Untether AI
Untether AI develops high-density AI accelerators that utilize at-memory computing to enhance the speed and energy efficiency of AI inference tasks. Their technology enables real-world applications, such as autonomous vehicles and smart cities, to operate more effectively and affordably.
Funding: $100M+
Fractile
Fractile is developing specialized chips that perform all operations for running large language models directly in memory, eliminating the significant delays caused by moving model weights to the processor. This technology enables the fastest possible inference of the largest transformer networks, achieving speeds up to 100 times faster at one-tenth the cost of current systems.
Funding: $10M+
Nebius AI
Nebius AI provides a fully managed AI cloud platform powered by NVIDIA® H100 and H200 Tensor Core GPUs, offering scalable GPU clusters with InfiniBand networking for high-speed data processing. It enables efficient model training, fine-tuning, and inference with tools like MLflow, PostgreSQL, and Apache Spark, reducing the complexity and cost of deploying AI applications at scale.
Funding: $500M+
TitanML
TitanML provides an enterprise-grade LLM cluster for high-performance language model inference, enabling organizations to deploy AI applications securely within their own infrastructure. This solution addresses the need for data privacy and control while optimizing operational costs and performance through advanced inference techniques.
Funding: $10M+
Cactus
Cactus offers a cross-platform inference framework for deploying AI models directly onto mobile devices, enabling low-latency, on-device multimodal processing. This ensures user privacy by keeping data local and optimizes performance through hardware acceleration for edge AI applications.
Mirai
Mirai provides an on‑device inference SDK for iOS and macOS that runs any AI model using the device’s GPU and Apple Neural Engine. Its Smart Routing engine automatically decides whether to execute locally or fall back to the cloud based on latency, privacy, or cost policies, while the built‑in conversion and quantization pipeline prepares models for fast, low‑latency inference. The drop‑in Swift, Objective‑C, and TypeScript bindings let developers integrate AI features in minutes, reducing cloud GPU usage and ensuring data confidentiality.
FlyMy.AI
FlyMy.AI provides a cloud platform that enables businesses to run and integrate thousands of AI models with optimized inference times as low as 55.7 milliseconds, utilizing a compiler-first architecture for peak performance. This solution eliminates the need for extensive engineering teams and reduces operational costs by offering autoscaling and per-second billing, making advanced AI capabilities accessible to companies of all sizes.
Denvr Dataworks
Denvr Cloud provides on-demand and dedicated GPU computing for AI inference and model training, utilizing NVIDIA GPUs and Intel AI accelerators to enhance performance and scalability. The platform simplifies AI operations by offering transparent pricing and real-time cost monitoring, addressing the need for efficient and cost-effective infrastructure in AI development.
Funding: $10M+
Sqwish
Sqwish offers a real-time input optimization layer via API to compress generative AI prompts and context by up to tenfold, significantly reducing token usage and inference costs. Its reinforcement learning engine adapts model selection and context based on live user interactions, optimizing AI performance directly against business outcomes like conversions.
Funding: $2M+
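Since inference APIs bill per token, shrinking the prompt translates directly into cost savings. Sqwish's RL-driven compressor is proprietary; the toy below only illustrates the underlying idea of dropping low-information words to cut token count, using a hypothetical stopword list.

```python
# Toy prompt compression: drop low-information words to cut token count.
# Sqwish's actual RL-based compressor is proprietary; this only shows
# why fewer input tokens mean lower per-request inference cost.

# Hypothetical low-information word list for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "that", "please", "very"}

def compress(prompt: str) -> str:
    kept = [w for w in prompt.split() if w.lower() not in STOPWORDS]
    return " ".join(kept)

prompt = "Please give a very short summary of the quarterly report that is attached"
short = compress(prompt)
print(short)  # "give short summary quarterly report attached"
print(len(prompt.split()), "->", len(short.split()))  # 13 -> 6
```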
Databiomes
Databiomes develops ultra-efficient 'nano' language models that require 200 times less data for training, enabling the creation of AI agents optimized for environments with limited memory. Their technology focuses on intelligent data generation and a custom inference engine, enhancing model accuracy while minimizing data governance needs.
Symbiosis
Symbiosis offers an AI orchestration platform that automates the selection and deployment of optimal AI models for diverse tasks, enhancing efficiency and reducing costs. It ensures data privacy and security through homomorphic encryption and provides scalable, serverless AI inference with extended context windows.
Doubleword AI
Doubleword AI provides an inference platform that lets enterprises run large language models securely across on‑premise, private‑cloud, and public‑cloud environments. Its Batch Inference service delivers high‑throughput, cost‑optimized token processing with 1‑hour and 24‑hour SLAs, while the Control Layer adds centralized authentication, role‑based access, usage metering, and audit‑ready logging. The platform auto‑generates OpenAI‑compatible endpoints, uses GPU‑aware autoscaling and infrastructure‑as‑code for reliable, self‑healing deployments, enabling AI/ML teams to serve models without building custom infrastructure.
Blumind
Blumind develops analog machine learning inferencing engines tailored for edge smart sensors and devices, enhancing real-time data processing in resource-constrained environments. This technology enables efficient decision-making by allowing devices to analyze data locally without relying on cloud computing.
ClearML
ClearML provides an integrated AI infrastructure platform that centralizes GPU resource orchestration, experiment tracking, hyper‑parameter optimization, and model versioning through a unified control plane. Its GenAI App Engine enables scalable LLM inference with built‑in load balancing, A/B testing, and monitoring, while role‑based access and audit logging meet enterprise security requirements. The platform streamlines end‑to‑end AI workflows, improving compute utilization and reducing time‑to‑production for data‑science and ML engineering teams.
MyMagic
MyMagic is a batch inference orchestration platform that automates the entire AI inference workflow, enabling users to process large volumes of data efficiently. By integrating with various data sources and utilizing an Inference API, the platform reduces operational costs by at least 50% while ensuring compliance with pharmacovigilance regulations.
Funding: $100K+
Elastix
Elastix offers an AI inference platform that dynamically adapts resource allocation for next-generation AI workloads. It focuses on achieving a breakthrough total cost of ownership (TCO) per token by optimizing computational strategies for diverse deployment environments.
Recursal AI
Recursal.ai develops a post-transformer architecture that enables instant, serverless inference of Hugging Face models, achieving 100x cost efficiency for over 100 languages. Their platform allows users to effortlessly fine-tune and deploy the RWKV foundation model, making advanced AI accessible to a global audience.
Mentium Technologies Inc.
Mentium develops co-processors that combine hybrid in-memory and digital computation to deliver cloud-quality AI inference at ultra-low power for mission-critical applications on the ground and in space. Their technology targets environments where both reliability and power consumption are critical, achieving 100 times the speed and 50 times the efficiency of current solutions without requiring external memory.