Find Investable Startups and Competitors
Search thousands of startups using natural language—just describe what you're looking for
Top 50 AI Inference Engine Startups
Discover the top 50 AI inference engine startups. Browse funding data, key metrics, and company insights. Average funding: $39.1M.
NeuReality
Caesarea, Israel
NeuReality designs AI-centric infrastructure that integrates a network addressable processing unit (NAPU) with purpose-built software to streamline AI inference workflows. This solution reduces reliance on traditional CPUs and networking components, addressing the complexity and inefficiencies that hinder AI model deployment and scalability.
Funding: $100M+
Rough estimate of the amount of funding raised
Lepton AI
San Francisco, United States
Lepton AI Cloud provides a scalable platform for AI inference and training, utilizing high-performance GPU infrastructure and a fast LLM engine to achieve up to 600 tokens per second. The platform enables enterprises to efficiently deploy and manage AI models, processing over 20 billion tokens and generating 1 million images daily with 99.9% uptime.
Funding: $10M+
Groq
Mountain View, United States
Groq accelerates AI inference with custom-designed Language Processing Units (LPUs) that deliver sub-millisecond latency and consistent performance. Their cloud platform and on-premise solutions enable developers to deploy AI models efficiently and cost-effectively.
Funding: $500M+
Untether AI
Toronto, Canada
Untether AI develops high-density AI accelerators that utilize at-memory computing to enhance the speed and energy efficiency of AI inference tasks. Their technology enables real-world applications, such as autonomous vehicles and smart cities, to operate more effectively and affordably.
Funding: $100M+
Deeplite
Toronto, Canada
Deeplite provides AI optimization software that enhances the performance of deep neural networks by reducing their size and energy consumption. This technology enables faster inference times, making AI applications more efficient and cost-effective for various industries.
Funding: $5M+
Inferless
Bengaluru, India
Inferless provides a serverless GPU platform that enables rapid deployment of custom machine learning models from various sources, including Hugging Face and Docker, while automatically scaling resources to handle unpredictable workloads. This solution reduces operational costs by up to 90% and eliminates the complexities associated with traditional GPU clusters, allowing businesses to efficiently manage their machine learning inference needs.
Funding: $3M+
Fireworks AI
Redwood City, United States
Fireworks AI provides a serverless inference platform that enables the rapid deployment and fine-tuning of compound AI models, optimizing for speed and cost efficiency. The technology addresses the challenges of slow model inference and high operational costs, allowing businesses to scale AI applications effectively while maintaining low latency and high throughput.
Funding: $50M+
Axelera AI
Axelera AI develops and sells high-performance, energy-efficient AI inference hardware for edge devices. Their Metis AI Platform integrates a specialized in-memory computing architecture with a comprehensive software stack, enabling efficient deployment of deep learning models for computer vision and natural language processing applications.
Funding: $50M+
Pruna AI
Munich, Germany
Pruna AI provides an AI optimization engine that enhances machine learning model performance with just two lines of code, utilizing execution kernel and graph optimization techniques. This solution reduces runtime costs and carbon emissions by making AI models faster and more efficient, enabling scalable inference without extensive re-engineering.
Funding: $5M+
Latent AI
Menlo Park, United States
Latent AI provides an Efficient Inference Platform (LEIP) that enables enterprises to design, deploy, and manage AI models on edge devices with optimized performance and minimal resource consumption. This technology addresses the challenges of slow prototype development and high operational costs by facilitating rapid model retraining and real-time monitoring in the field.
Funding: $20M+
CLIKA
San Jose, United States
CLIKA provides an SDK that automatically compresses and optimizes AI models for diverse hardware backends. Its engine generates tailored compression plans based on model architecture, reducing model size and accelerating inference with minimal accuracy loss.
Inference Labs
Hamilton, Canada
Inference Labs provides a decentralized platform that utilizes cryptographic verification to ensure computational integrity for AI models, enabling secure and transparent AI interactions. By implementing agentic native protocols and zero-knowledge proofs, the company addresses the need for trust and reliability in AI inference across distributed networks.
Funding: $2M+
SpiNNcloud Systems
Dresden, Germany
SpiNNcloud Systems develops specialized hardware that replicates brain-like parallel processing to enhance real-time computing for complex simulations and data analysis. This technology overcomes the limitations of traditional computing architectures, significantly improving efficiency in handling large-scale data tasks.
Funding: $500K+
Neural Magic
Somerville, United States
Neural Magic provides an enterprise inference server solution that optimizes the deployment of open-source large language models (LLMs) on both CPU and GPU infrastructures. By enhancing computational efficiency and reducing hardware requirements, the platform enables organizations to run AI models securely and cost-effectively across various environments, including cloud and edge.
Funding: $20M+
BentoML
San Francisco, United States
BentoML provides a Unified Inference Platform that enables developers to build and deploy scalable AI systems using any model on their preferred cloud infrastructure. The platform addresses the challenges of slow iteration and high costs in AI deployment by offering features like auto-scaling, low-latency serving, and seamless integration with existing cloud resources.
Funding: $10M+
d-Matrix
Santa Clara, United States
d-Matrix has developed Corsair, an AI inference platform that achieves 60,000 tokens per second with 1 ms latency for Llama3 8B models, significantly enhancing throughput and energy efficiency in datacenters. This technology addresses the high computational costs and energy consumption associated with large-scale AI inference, enabling organizations to scale their AI capabilities sustainably.
Funding: $100M+
Deep Infra
Palo Alto, United States
Deep Infra provides a serverless machine learning inference platform that enables businesses to deploy and scale AI models via a simple API, eliminating the need for complex ML infrastructure. It reduces costs and improves efficiency by offering pay-per-use pricing, low-latency performance, and automatic scaling on dedicated A100 and H100 GPUs.
Funding: $20M+
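Many of the serverless providers in this list, Deep Infra among them, expose OpenAI-compatible chat-completion endpoints, so "deploy via a simple API" typically means a single HTTP POST. A minimal sketch using only the Python standard library; the base URL, token, and model name below are placeholders for illustration, not values from any provider's documentation:

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_token: str, model: str, prompt: str):
    """Build an OpenAI-style chat-completion HTTP request (constructed only, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; no network call is made here.
req = build_chat_request(
    "https://inference.example.com", "YOUR_TOKEN", "llama-3-8b-instruct", "Hello"
)
print(req.full_url)
```

Sending the request would be one more call (`urllib.request.urlopen(req)`); check the chosen provider's documentation for the real base URL and model identifiers.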
Okahu
Redwood City, United States
The startup offers an artificial intelligence infrastructure platform that enhances the transparency of deep learning models by making their decision-making processes explainable. This platform provides insights into AI operations and optimizes cost management, eliminating the need for custom integration or extensive log analysis.
Funding: $5M+
VSORA
Meudon, France
The startup manufactures semiconductor chips with a multicore DSP architecture that accelerates the design of complex integrated circuits for mobile and network infrastructure. By eliminating the need for DSP coprocessors, these chips enable chipmakers to efficiently develop next-generation digital communication systems, including fifth-generation technologies.
Funding: $20M+
RaiderChip
Ares, Spain
RaiderChip designs semiconductor hardware accelerators that enhance AI performance by addressing memory bandwidth limitations. Their solutions enable efficient AI inference for both edge and cloud applications, allowing users to run complex large language models locally with full privacy and without ongoing subscriptions.
Funding: $1M+
Habana
San Jose, United States
Habana Labs develops Intel® Gaudi® AI accelerators designed for high-performance deep learning training and inference, providing enterprises and cloud providers with efficient compute solutions. Their technology delivers up to 40% better price/performance on cloud instances, addressing the need for cost-effective and scalable AI infrastructure.
Funding: $50M+
DeGirum
Santa Clara, United States
DeGirum develops an AI Hub that integrates hardware and software infrastructure to streamline the development and deployment of edge AI applications. By providing a unified software toolchain and flexible hardware options, it reduces time to market and minimizes costs associated with multiple hardware and application investments.
Funding: $20M+
NEUCHIPS
Hsinchu, Taiwan
NEUCHIPS develops AI ASIC solutions, including the Evo Gen 5 PCIe Card and Gen AI N3000 Accelerator, specifically designed for deep learning inference in data centers. Their technology addresses the need for energy-efficient hardware that minimizes total cost of ownership (TCO) while enhancing performance for machine learning applications.
Funding: $50M+
Positron
Reno, United States
Positron provides a transformer inference server that delivers up to 5.2x higher performance and 75% lower cost per token compared to Nvidia DGX-H100 systems, optimizing AI model deployment for power-constrained environments. The platform supports seamless integration with HuggingFace models and offers a managed inference service for remote evaluation, enabling efficient scaling and reduced operational expenses for AI-driven applications.
Funding: $20M+
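For context on the cost claim in the Positron entry: "75% lower cost per token" is equivalent to serving 4x as many tokens per dollar, since tokens-per-dollar is the reciprocal of cost-per-token. A quick check of that arithmetic (the 75% figure is the vendor's claim, not an independent measurement):

```python
# Vendor-claimed figure from the listing above (not independently verified):
# cost per token is 75% lower than the reference system.
relative_cost_per_token = 1.0 - 0.75  # 25% of the baseline cost per token

# Tokens per dollar is the reciprocal of cost per token, so a 75% cost
# reduction corresponds to serving 4x as many tokens for the same spend.
tokens_per_dollar_multiplier = 1.0 / relative_cost_per_token
print(tokens_per_dollar_multiplier)  # 4.0
```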
Mako
Boston, United States
MAKO provides automated GPU kernel selection and tuning technology that enables the deployment of AI models with up to 70% lower computing costs across any hardware infrastructure. This solution eliminates the need for manual optimization and vendor lock-in, allowing businesses to efficiently scale their AI operations in any cloud or on-premises environment.
Funding: $5M+
Fractile
Fractile is developing specialized chips that perform all operations for running large language models directly in memory, eliminating the significant delays caused by moving model weights to the processor. This technology enables the fastest possible inference of the largest transformer networks, achieving speeds up to 100 times faster at one-tenth the cost of current systems.
Funding: $10M+
Nscale
London, United Kingdom
Nscale provides a GPU cloud platform optimized for AI workloads, featuring on-demand compute and inference services, dedicated training clusters, and scalable GPU nodes. The platform addresses the high costs and inefficiencies associated with AI model training and deployment by offering a fully integrated infrastructure powered by renewable energy in Europe.
Funding: $100M+
TitanML
London, United Kingdom
TitanML provides an enterprise-grade LLM cluster for high-performance language model inference, enabling organizations to deploy AI applications securely within their own infrastructure. This solution addresses the need for data privacy and control while optimizing operational costs and performance through advanced inference techniques.
Funding: $10M+
WhiteFiber
New York, United States
WhiteFiber provides on-demand GPU cloud infrastructure optimized for AI and machine learning workloads. Their platform offers scalable GPU clusters, high-speed storage, and secure networking, enabling teams to accelerate model training and deployment.
Cactus
San Francisco, United States
Cactus offers a cross-platform inference framework for deploying AI models directly onto mobile devices, enabling low-latency, on-device multimodal processing. This ensures user privacy by keeping data local and optimizes performance through hardware acceleration for edge AI applications.
SwarmOne
Mountain View, United States
The startup offers an AI training platform that enables instance-less deployment across thousands of GPUs with minimal code, facilitating rapid model training. This technology allows AI engineers to achieve faster training times, improved model performance, and reduced operational costs.
Funding: $50M+
AiM Future, Inc.
Seoul, South Korea
The startup develops an AI-based NeuroMosAIc Processor (NMP) that integrates a RISC-V architecture for high-performance computing in semiconductor applications. Its technology enables clients to efficiently evaluate neural network performance metrics such as accuracy, memory bandwidth, and run-time using SDK solutions compatible with TensorFlow, Caffe, PyTorch, and ONNX frameworks.
Funding: $5M+
FlyMy.AI
FlyMy.AI provides a cloud platform that enables businesses to run and integrate thousands of AI models with optimized inference times as low as 55.7 milliseconds, utilizing a compiler-first architecture for peak performance. This solution eliminates the need for extensive engineering teams and reduces operational costs by offering autoscaling and per-second billing, making advanced AI capabilities accessible to companies of all sizes.
Graphcore
Bristol, United Kingdom
Zetic.ai
Seoul, South Korea
ZETIC.ai provides NPU-powered on-device AI solutions that eliminate the need for cloud servers, reducing operational costs by up to 99%. Their automated pipeline converts existing AI models within 24 hours, achieving runtime performance up to 60 times faster than traditional CPU execution.
Focoos AI
Turin, Italy
The startup develops AI-driven software that automates the design and training of neural networks for artificial vision applications. This platform enables companies to deploy optimized vision models that achieve high accuracy while minimizing power consumption.
Funding: $300K+
INF AI
INF Tech provides industry-specific AI solutions, such as INF FIN for finance and INF MED for healthcare, that leverage neuro-symbolic technology to ensure reasoning accuracy. Their AI-native applications are designed to avoid common AI pitfalls, like hallucinations, and are used by security firms and hospitals.
Kog
Paris, France
This company offers a real-time AI platform that combines foundational models and intuitive interfaces for rapid iteration and multi-agent orchestration. Their platform helps businesses and developers build faster, more efficient AI-driven experiences by overcoming limitations in speed and output.
NovuMind Inc.
Santa Clara, United States
The startup develops chip technology that integrates big data analytics and heterogeneous computing to enhance the functionality of the Internet of Things. This technology enables industries, such as automotive and healthcare, to incorporate artificial intelligence into their products and services, improving operational efficiency and decision-making capabilities.
Funding: $10M+
Yotta Labs
Seattle, United States
Yotta Labs offers a decentralized operating system for AI workloads on distributed GPUs. The platform deploys large language models (LLMs) and AI applications on decentralized GPU networks, optimizing inference flows and scheduling workloads to make the most of available compute.
Quadric
Burlingame, United States
Quadric has developed the Chimera GPNPU, a licensable processor architecture that integrates on-device machine learning inference with the ability to run complex C++ code without requiring code partitioning across multiple processor types. This technology scales from 1 to 864 TOPS and supports all machine learning models, including classical networks and large language models, streamlining SoC design and accelerating model porting.
Funding: $20M+
Elastix
Seattle, United States
Elastix offers an AI inference platform that dynamically adapts resource allocation for next-generation AI workloads. It focuses on achieving a breakthrough total cost of ownership (TCO) per token by optimizing computational strategies for diverse deployment environments.
Mentium Technologies Inc.
Mentium develops co-processors that utilize hybrid in-memory and digital computation to deliver cloud-quality AI inference at ultra-low power for mission-critical applications on the ground and in space. Their technology addresses the need for reliable and efficient AI processing in environments where performance and power consumption are critical, achieving 100 times the speed and 50 times the efficiency of current solutions without requiring external memory.
Cerebrium
London, United Kingdom
Cerebrium is a data and AI platform that enables businesses to deploy applications without the need for a dedicated data team, utilizing efficient build processes and low-latency inference. The platform optimizes resource allocation and costs, ensuring applications are live in seconds while maintaining 99.999% uptime and compliance with security standards.
Funding: $500K+
OpenInfer
Danville, United States
OpenInfer provides an AI engine that enables developers to create intelligent agents capable of local inference across various hardware platforms, ensuring data privacy and reducing operational costs. This technology facilitates continuous, long-chain reasoning for applications ranging from robotics to personal assistants, enhancing the functionality of smart devices.
MK One
Menlo Park, United States
MK1 Flywheel is a high-performance LLM inference engine that integrates directly into existing software stacks, allowing businesses to manage GPU resources efficiently while keeping customer data and model weights secure. It enables faster response times and higher request processing rates, optimizing token costs by allowing users to utilize their own GPUs and cloud contracts without vendor lock-in.
Enot
Luxembourg
Enot offers neural network compression and acceleration tools to optimize AI model performance for faster inference and lower computational overhead. Their platform reduces model complexity and memory footprint, enabling efficient AI deployment on edge devices and in the cloud.
turba
Mountain View, United States
The startup operates an artificial intelligence infrastructure optimization platform that enhances the deployment and management of machine learning models through smart workload scheduling and real-time monitoring tools. By providing a digital twin for infrastructure recommendations, the platform enables organizations to improve operational efficiency and reduce costs in multi-cloud environments.
Funding: $10M+
deepsilicon
San Francisco, United States
Deepsilicon develops software and hardware solutions that optimize neural network performance on-device, achieving 8x less RAM usage, 20x higher throughput, and 100x improved power efficiency. This technology addresses the challenges of high resource consumption and slow processing speeds in running complex AI models.
Funding: $500K+