Top 50 Data Warehouse Service - Late Stage
Discover the top 50 Data Warehouse Service startups at Late Stage. Browse funding data, key metrics, and company insights. Average funding: $158.6M.
Firebolt
Firebolt is a cloud data warehousing platform that utilizes specialized indexing and JOIN acceleration to deliver sub-second query performance on terabytes of data. It enables businesses to analyze large datasets efficiently, reducing query latency from days to seconds while minimizing storage costs.
Funding: $200M+
Rough estimate of the amount of funding raised
MotherDuck
MotherDuck is a cloud-based data warehouse that enhances DuckDB's in-process analytics capabilities, enabling real-time, collaborative data analysis without the overhead of traditional systems. It provides users with fast performance and efficient pricing, allowing for the rapid onboarding of non-technical users and the creation of interactive data applications.
Yellowbrick Data
Yellowbrick Data provides a high-performance SQL data platform that supports enterprise data warehousing and streaming analytics with continuous data ingestion and low-latency query execution. This technology enables organizations to efficiently handle large-scale, concurrent workloads while minimizing unpredictable query runtimes, facilitating faster decision-making.
Funding: $50M+
Fivetran
Fivetran provides an automated ELT platform that extracts, loads, and optionally transforms data from over 700 SaaS, database, ERP, and file sources into data warehouses, lakes, or downstream applications. The service handles schema drift, change‑data‑capture, and real‑time replication without custom code, offering enterprise‑grade security, governance, and hybrid deployment options. Users configure pipelines via a web UI or API and are billed per million rows synced.
Funding: $100M+
Onehouse
Onehouse is a fully managed cloud-native lakehouse service that ingests data from various sources in near real-time, enabling organizations to maintain a single source of truth without the need for complex data replication. By leveraging Apache Hudi and supporting multiple query engines, it reduces operational costs by over 50% while providing scalable access to analytics-ready data.
Funding: $50M+
Dremio
Dremio provides a unified lakehouse platform that combines the flexibility of data lakes with the performance of data warehouses, utilizing Apache Iceberg for efficient data management and optimization. This solution enables organizations to perform high-speed, self-service analytics on all their data without the complexities of traditional ETL processes, significantly reducing total cost of ownership and time to insight.
Funding: $200M+
RudderStack
RudderStack provides a warehouse-native customer data platform that enables businesses to collect, unify, and activate customer data in real-time. By centralizing data collection and ensuring data quality, it eliminates the complexities of data integration and compliance, allowing teams to deliver actionable insights and improve customer engagement efficiently.
Space and Time
Space and Time provides a decentralized data platform that combines blockchain indexing, data warehousing, and API services, all secured by sub-second zero-knowledge (ZK) proofs for SQL queries. This technology enables smart contracts to access verified on-chain and off-chain data in real-time, enhancing the reliability and efficiency of data-driven applications.
Funding: $50M+
Cybersyn
Cybersyn provides data-as-a-service (DaaS) by delivering analytics-ready data directly to Snowflake instances, enabling businesses to make informed decisions without the need for complex data engineering. The platform offers real-time insights into consumer behavior and market trends, allowing companies to enhance their competitive strategies and operational efficiency.
Funding: $50M+
Anomalo
Anomalo provides automated AI-driven data quality monitoring for enterprise data warehouses, utilizing unsupervised machine learning to detect anomalies and validate data integrity without requiring code. This solution addresses the issue of unreliable data by enabling rapid identification and resolution of data quality problems, ensuring accurate and trustworthy insights for business operations.
Funding: $100M+
Airbyte
Airbyte is an open-source data integration engine that enables organizations to sync data from various applications to data warehouses, facilitating seamless data movement across multi-cloud environments. By providing a platform for building custom connectors with low-code or no-code options, Airbyte addresses the challenge of managing diverse data sources while ensuring data privacy and governance.
Dune
Dune provides a cloud‑native platform that aggregates and normalizes on‑chain data from over 100 public blockchains into a unified, queryable schema. Users can run SQL‑compatible queries, build visual dashboards, and access results via REST APIs or export connectors for data warehouses and machine‑learning pipelines, all with enterprise‑grade security and near‑real‑time freshness.
Funding: $50M+
Census
Census is a Data Activation and Reverse ETL platform that enables businesses to define and sync trusted data from their data warehouse to over 150 operational tools without the need for code or CSVs. This solution eliminates data silos, allowing marketing and data teams to collaborate effectively by providing real-time access to actionable insights and standardized datasets.
Starburst
Starburst provides a data analytics platform built on an enhanced Trino SQL engine, enabling businesses to query and analyze data across hybrid, on-premises, and multi-cloud environments without moving it. This approach reduces data processing time by 25% and supports complex queries over exabytes of data, streamlining insights for data teams while maintaining security and scalability.
Funding: $200M+
StarTree
StarTree provides a platform-as-a-service built on Apache Pinot, enabling real-time analytics with sub-second query response times on petabyte-scale data. This solution allows businesses to efficiently handle high concurrency demands while minimizing costs associated with data processing and analysis.
Funding: $50M+
Incorta
Incorta provides a unified data and analytics platform that connects directly to source systems such as ERP, CRM, and other core applications using its Direct Data Mapping engine, removing the need for traditional batch ETL pipelines. The platform streams transaction‑level data into an in‑memory engine for sub‑second queries, offers low‑code data modeling, AI‑driven natural‑language analytics, automated workflows, and role‑based security, with open APIs for integration into existing BI and cloud environments.
Funding: $100M+
Sumo Logic
Sumo Logic provides a cloud‑native platform that consolidates log management, infrastructure monitoring, and SIEM into a single searchable data lake. It ingests petabyte‑scale data from over 450 integrations, applies AI/ML for threat detection and automated alert triage, and delivers real‑time dashboards, customizable queries, and response playbooks via web UI or API. The service includes built‑in compliance controls and a pay‑for‑data‑used pricing model for enterprise security and observability teams.
Addepar
Addepar provides a cloud‑native platform that continuously aggregates asset data from over 450 custodial and market sources into a unified model, enabling real‑time visualization, scenario analysis, and on‑demand reporting for multi‑asset portfolios. The solution includes integrated trading and automated rebalancing tools, white‑label report generation, and open APIs that connect to CRM, accounting and analytics systems, all secured with role‑based access and SOC 3‑certified encryption. It is used by wealth managers, family offices, private banks and institutional investors to streamline data consolidation, analytics and client reporting.
Funding: $200M+
Ocient
Ocient operates a data analytics platform that enables rapid analysis of large datasets, handling tens of terabytes to exabytes with trillions of rows. By ingesting billions of rows per second and providing filtered aggregate results, the platform simplifies complex data ecosystems for organizations.
Funding: $100M+
Materialize
Materialize is a cloud operational data store that uses Differential Dataflow to provide strongly consistent, real-time views of operational data with sub-second latency. This technology enables businesses to quickly respond to changes by integrating and querying data from multiple sources without the complexity of traditional data processing methods.
Voltron Data
Voltron Data provides Theseus, a GPU-accelerated SQL engine designed for processing petabyte-scale data without the need for indexing or data movement. It enables enterprises to significantly reduce query times, server counts, and operational costs, making it ideal for large-scale ETL and machine learning preprocessing tasks.
Funding: $100M+
Saronic
Saronic offers a cloud‑native AI platform that centralizes the full machine‑learning lifecycle for enterprise teams. It provides auto‑scaling compute for distributed training, automated data‑ingestion and feature‑store pipelines, version‑controlled model management, and secure inference APIs with built‑in explainability and audit logging. The platform integrates with major data warehouses, enabling data‑science and analytics groups to deploy predictive models at scale while maintaining governance and compliance.
Funding: $500M+
Sigma Computing
Sigma provides a cloud analytics solution with a spreadsheet-like interface that allows users to analyze billions of records in real-time using SQL, Python, or AI. This platform enables teams to collaborate effectively and automate data workflows while maintaining security and performance, addressing the need for accessible and scalable data analysis in organizations of all sizes.
Funding: $200M+
PlanetScale
PlanetScale offers a database-as-a-service built on Vitess, enabling horizontal scaling of MySQL databases through sharding across multiple nodes. This platform ensures zero downtime for schema changes and migrations, providing high availability and performance for applications handling large volumes of data.
Flexe
Flexe provides a cloud‑based platform that links enterprises to a network of over 800 warehouse operators across the United States and Canada, allowing on‑demand scaling of storage and fulfillment capacity. The system integrates with WMS, OMS and IMS via API, EDI or XML and delivers real‑time order routing, inventory visibility, and analytics while using a pay‑as‑you‑go pricing model to avoid capital expenditures and long‑term contracts. A dedicated logistics analyst control‑tower monitors performance and ensures service‑level compliance across the flexible network.
Funding: $100M+
WEKA
WEKA provides a cloud-native, software-defined data platform that enables organizations to efficiently store, process, and manage large volumes of data across on-premises and cloud environments. By transforming stagnant data silos into streaming data pipelines, WEKA enhances performance for AI and high-performance computing workloads while reducing energy consumption and carbon emissions.
Funding: $100M+
Treasure Data
Treasure Data offers a cloud-based data analytics platform that enables organizations to manage and analyze large volumes of data efficiently. The platform addresses the challenges of data silos and integration, providing actionable insights to enhance decision-making and operational performance.
Funding: $200M+
Cherre
Cherre offers a financial database platform that aggregates real estate data from public, private, and internal sources to provide detailed market evaluations, property valuations, and tax assessments. This platform enables clients to reduce manual analytics costs and enhance their strategic decision-making processes.
Funding: $100M+
ClickHouse
ClickHouse develops a real-time database management system optimized for online analytical processing (OLAP) that enables organizations to run fast queries on large datasets. It addresses the challenge of slow data retrieval and high costs associated with traditional databases, providing significant improvements in query speed and storage efficiency.
Eppo
Eppo is an experimentation and feature management platform that utilizes a warehouse-native architecture to automate A/B testing and analysis, ensuring data integrity and reducing analysis cycles to zero. By providing feature flags and out-of-the-box reporting, Eppo enables organizations to implement a culture of experimentation, enhancing decision-making and minimizing the risk of false positives.
Funding: $50M+
Replica
Replica is an enterprise data platform that aggregates over a dozen datasets and 50+ metrics related to transportation, demographics, and land use, providing accurate and up-to-date insights for planning and operational decisions. It addresses the challenge of accessing high-quality, recent data about the built environment, enabling agencies and investors to make informed decisions based on comprehensive analytics.
Nozomi
Nozomi provides a straightforward tool for collecting and organizing data from API endpoints, enabling users to efficiently manage their data flow. This solution addresses the challenge of data fragmentation by simplifying the integration and accessibility of diverse API data sources.
Funding: $100M+
Pigment
Pigment provides a cloud‑native business‑planning platform that consolidates data from ERP, CRM, HRIS and data‑lake sources into a single, auditable model. Its native agentic AI suite automates data cleaning, forecasting, reporting and scenario generation, enabling finance, sales, HR and supply‑chain teams to run real‑time, collaborative what‑if analyses. The solution includes bi‑directional integrations, no‑code modeling tools, and enterprise‑grade security and compliance.
Funding: $100M+
Reveal
Reveal is a cloud-based platform that enables organizations to capture and analyze critical business data efficiently. By streamlining data collection and reporting processes, it enhances decision-making and operational effectiveness for enterprises.
Funding: $200M+
Cribl
Cribl provides a centralized data engine that ingests logs, metrics, and traces from any source, applies real‑time routing, transformation, and reduction, and forwards the processed telemetry to any destination without additional agents. The platform includes a visual pipeline builder, a persistent raw data lake for replay, and AI‑driven analytics for anomaly detection and enrichment, giving IT and security teams granular control over data volume, format, and cost.
Funding: $200M+
VAST Data
VAST Data provides a unified data platform that integrates storage, database, and compute capabilities, eliminating the need for data tiering and silos. This architecture enables organizations to manage unstructured data efficiently, enhancing accessibility and performance for AI-driven applications.
HAI ROBOTICS
Hai Robotics provides autonomous mobile robots and a cloud‑native Warehouse Execution System that automate case handling, pallet picking, and item retrieval in multi‑level warehouses. The robots use lidar‑SLAM and vision‑guided navigation, support up to 300 kg payloads, and integrate via REST/OPC UA APIs with existing WMS/ERP systems to increase storage density, speed order fulfillment, and reduce labor costs.
Funding: $100M+
VIMAAN
VIMAAN provides AI-driven computer vision solutions that enhance inventory tracking in warehouses by automating cycle counting, order validation, and real-time inventory visibility. This technology significantly reduces labor costs and improves inventory accuracy, enabling warehouses to achieve nearly instant ROI and minimize mis-shipments.
Funding: $50M+
Crisp
Crisp automates the ingestion and analysis of retail data from over 40 retailers and distributors, providing brands with real-time insights into inventory and sales performance. This enables companies to reduce out-of-stocks, optimize supply chain operations, and enhance revenue growth through actionable data-driven decisions.
Funding: $100M+
ChaosSearch
ChaosSearch provides a data platform that integrates with Databricks to enable scalable log analytics using native Elasticsearch query capabilities. The solution consolidates log and event data in a unified data lake, offering unlimited retention and reducing costs by 50-80% while eliminating the need for data movement or transformation.
Funding: $50M+
Vendia
Vendia provides a data sharing platform that enables real-time synchronization and secure exchange of business data across diverse systems and partners, eliminating the complexities of fragmented data workflows. By offering fine-grained access controls and instant data reconciliation, Vendia enhances operational efficiency and reduces costs for enterprises in industries such as finance, manufacturing, and supply chain.
Funding: $50M+
dbt Labs
dbt provides an analytics engineering platform that centralizes SQL transformation logic, testing, and documentation within a version‑controlled workflow. Its Fusion engine compiles and executes queries up to 30× faster, while built‑in governance features such as automated testing, schema enforcement, and lineage visualizations ensure data reliability. The platform also includes AI‑driven Copilot assistance and low‑code tools like dbt Canvas for collaborative development across analytics engineers and analysts.
Funding: $200M+
SafeGraph
SafeGraph offers a machine learning-based data platform that integrates and verifies data from thousands of sources, including business names, addresses, and operational hours. This platform provides companies with accurate records essential for analyzing human movement patterns and making informed decisions.
Funding: $50M+
Attabotics
Attabotics offers a modular robotic warehousing system that utilizes three-dimensional automated storage and retrieval technology to optimize space and efficiency in fulfillment operations. By reducing warehouse footprint by up to 85% and labor costs by 75%, the platform enables faster delivery times and lowers capital expenditures for businesses.
Funding: $100M+
Simon Data
Simon Data provides a Customer Data Platform (CDP) built on Snowflake that enables marketing teams to unify and access customer data for real-time segmentation and personalization. This platform addresses the challenge of outdated data architectures, allowing brands to enhance campaign performance and customer engagement through precise, data-driven marketing strategies.
Funding: $50M+
PingCAP US
PingCAP develops TiDB, a distributed SQL database that provides seamless scalability and high availability for managing large volumes of data without the need for manual sharding. It enables organizations to achieve real-time insights and operational agility while ensuring data security and compliance across various workloads.
Volumez
Volumez offers a Data Infrastructure as a Service (DIaaS) platform that dynamically orchestrates compute, network, and storage resources across cloud environments to create optimized data infrastructures for various workloads. This solution addresses the challenges of performance inconsistency and resource inefficiency in data-intensive applications by delivering guaranteed high throughput, low latency, and maximized GPU utilization.
Funding: $50M+
LaunchDarkly
LaunchDarkly provides a cloud platform for feature flag management, enabling developers to toggle code paths, target specific user segments, and execute progressive rollouts without redeploying. The service includes real‑time monitoring, automated rollback, and a built‑in experimentation engine that delivers statistical analysis and product analytics for data‑driven decisions. It also supports AI Configs for versioning prompts, models, and agents, and integrates with CI/CD, observability, and data‑warehouse tools via 35+ SDKs.
Funding: $200M+
inVia Robotics
inVia Robotics provides a Robots-as-a-Service solution that integrates autonomous mobile robots and AI-powered Warehouse Execution System software to enhance warehouse productivity. Their technology enables e-commerce distribution centers to achieve up to 5x productivity increases while minimizing labor costs and utilizing existing infrastructure.
Funding: $50M+
Hammerspace
Hammerspace provides an automated data orchestration system that unifies and manages unstructured data across edge locations, data centers, and public cloud environments using a standards-based parallel file system architecture. This technology eliminates data silos, enabling organizations to optimize resource utilization and accelerate access to critical data for AI and high-performance computing applications.
Funding: $50M+