Top 50 Data Warehouse Service Startups - Series B
Discover the top 50 Data Warehouse Service startups at Series B. Browse funding data, key metrics, and company insights. Average funding: $64.5M.
MotherDuck
MotherDuck is a cloud-based data warehouse that enhances DuckDB's in-process analytics capabilities, enabling real-time, collaborative data analysis without the overhead of traditional systems. It provides users with fast performance and efficient pricing, allowing for the rapid onboarding of non-technical users and the creation of interactive data applications.
Yellowbrick Data
Yellowbrick Data provides a high-performance SQL data platform that supports enterprise data warehousing and streaming analytics with continuous data ingestion and low-latency query execution. This technology enables organizations to efficiently handle large-scale, concurrent workloads while minimizing unpredictable query runtimes, facilitating faster decision-making.
Funding: $50M+
Rough estimate of the amount of funding raised
Fivetran
Fivetran provides an automated ELT platform that extracts, loads, and optionally transforms data from over 700 SaaS, database, ERP, and file sources into data warehouses, lakes, or downstream applications. The service handles schema drift, change‑data‑capture, and real‑time replication without custom code, offering enterprise‑grade security, governance, and hybrid deployment options. Users configure pipelines via a web UI or API and are billed per million rows synced.
Funding: $100M+
Onehouse
Onehouse is a fully managed cloud-native lakehouse service that ingests data from various sources in near real-time, enabling organizations to maintain a single source of truth without the need for complex data replication. By leveraging Apache Hudi and supporting multiple query engines, it reduces operational costs by over 50% while providing scalable access to analytics-ready data.
Funding: $50M+
SCUBA Analytics
SCUBA is a Time-Series Data Warehouse optimized for real-time decision intelligence, enabling businesses to leverage first-party and AI-generated data without complex integrations. The platform provides in-the-moment insights and activation across channels while ensuring data privacy and enhancing audience profiles through agnostic identity matching.
Revefi
Revefi develops Raden, an AI-powered data engineering platform that automates data observability, quality, and performance optimization for cloud data warehouses. The platform enables businesses to achieve a 30-50% reduction in data infrastructure costs and enhances operational efficiency by providing actionable insights within minutes.
Funding: $20M+
Tembo
Tembo provides a fully managed PostgreSQL platform that supports over 200 extensions, enabling developers to build, scale, and optimize applications with features like data warehousing, message queuing, machine learning, and geospatial processing. By consolidating multiple data services into a single, extensible database solution, it reduces infrastructure sprawl and simplifies operational complexity for real-time and analytical workloads.
Funding: $20M+
Rivery
Rivery is a no-code ELT platform that enables rapid data ingestion, transformation, and orchestration from various sources into cloud data warehouses. It streamlines data workflows and reduces processing time, allowing businesses to efficiently manage their data operations and enhance analytics capabilities.
RudderStack
RudderStack provides a warehouse-native customer data platform that enables businesses to collect, unify, and activate customer data in real-time. By centralizing data collection and ensuring data quality, it eliminates the complexities of data integration and compliance, allowing teams to deliver actionable insights and improve customer engagement efficiently.
Vaultspeed
Vaultspeed provides a no-code, metadata-driven data modeling platform that automates data integration and business logic for cloud data warehouses, lakehouses, and meshes. This solution enables data teams to rapidly build and deploy data products in under two sprints, significantly reducing time-to-market and technical debt.
Funding: $20M+
Space and Time
Space and Time provides a decentralized data platform that combines blockchain indexing, data warehousing, and API services, all secured by sub-second zero-knowledge (ZK) proofs for SQL queries. This technology enables smart contracts to access verified on-chain and off-chain data in real-time, enhancing the reliability and efficiency of data-driven applications.
Funding: $50M+
Cybersyn
Cybersyn provides data-as-a-service (DaaS) by delivering analytics-ready data directly to Snowflake instances, enabling businesses to make informed decisions without the need for complex data engineering. The platform offers real-time insights into consumer behavior and market trends, allowing companies to enhance their competitive strategies and operational efficiency.
Funding: $50M+
Anomalo
Anomalo provides automated AI-driven data quality monitoring for enterprise data warehouses, utilizing unsupervised machine learning to detect anomalies and validate data integrity without requiring code. This solution addresses the issue of unreliable data by enabling rapid identification and resolution of data quality problems, ensuring accurate and trustworthy insights for business operations.
Funding: $100M+
Airbyte
Airbyte is an open-source data integration engine that enables organizations to sync data from various applications to data warehouses, facilitating seamless data movement across multi-cloud environments. By providing a platform for building custom connectors with low-code or no-code options, Airbyte addresses the challenge of managing diverse data sources while ensuring data privacy and governance.
Dune
Dune provides a cloud‑native platform that aggregates and normalizes on‑chain data from over 100 public blockchains into a unified, queryable schema. Users can run SQL‑compatible queries, build visual dashboards, and access results via REST APIs or export connectors for data warehouses and machine‑learning pipelines, all with enterprise‑grade security and near‑real‑time freshness.
Funding: $50M+
Census
Census is a Data Activation and Reverse ETL platform that enables businesses to define and sync trusted data from their data warehouse to over 150 operational tools without the need for code or CSVs. This solution eliminates data silos, allowing marketing and data teams to collaborate effectively by providing real-time access to actionable insights and standardized datasets.
Kubit
Kubit provides a no-code analytics platform that activates customer journey insights directly from existing data warehouses, eliminating the need for ETL processes. This solution enables teams to access real-time, event-level visibility into user behavior, facilitating faster, data-driven decision-making without data duplication.
Decodable
Decodable offers a fully managed, SQL‑native streaming platform that enables data teams to create continuous data pipelines using ANSI‑SQL, while the service automatically handles scaling, state management, and fault tolerance. The platform provides a wide connector catalog for ingesting from event hubs and exporting to data warehouses, APIs, and analytics tools, and includes built‑in observability, lineage, and enterprise‑grade security features.
Funding: $20M+
Tabular (now part of Databricks)
Tabular develops a data automation platform built on Apache Iceberg, enabling table format interoperability across data lakes and warehouses. It addresses data fragmentation and inefficiency by providing a unified standard for managing and processing large-scale datasets.
Funding: $20M+
Qatalog
Qatalog provides a data integration platform that connects and harmonizes information from emails, files, and various applications, including databases like BigQuery and Snowflake. It enables organizations to query their entire data ecosystem without requiring technical expertise, while ensuring secure, permission-based access to insights. This streamlines data retrieval and collaboration for teams, reducing time spent on manual data aggregation and improving decision-making efficiency.
Harbr
Harbr provides a cloud-based platform for creating, governing, and distributing data products across enterprises without moving data from its source. It enables data sharing, integration with tools and AI models, and the establishment of self-service data marketplaces, reducing the complexity and cost of data access and distribution.
LanceDB
LanceDB provides an AI‑native multimodal lakehouse that stores raw media, structured data, and billions of vectors together in a columnar file format, enabling hybrid keyword‑vector search and on‑the‑fly reranking. It offers declarative, versioned feature pipelines with LLM‑as‑UDF support and integrates high‑throughput SQL and PyTorch/JAX data loaders for efficient model training, while separating compute from storage to reduce costs. The platform can be deployed on‑premises, in existing data lakes, or as a managed service and includes enterprise‑grade security and compliance.
Funding: $20M+
Modjoul
Modjoul provides an AI-driven platform that transforms warehouse data into real-time insights, enabling management to enhance safety and optimize workflows. The technology specifically targets workplace injuries and operational inefficiencies by autonomously monitoring wearable and machine data to deliver actionable notifications.
Funding: $20M+
Pantomath
Pantomath provides an AI‑driven Data Operations Center that continuously monitors databases, data warehouses, streaming platforms, and orchestration tools, aggregating telemetry into a unified traceability graph. The platform automatically correlates events, performs root‑cause analysis, and triggers self‑healing actions while delivering compliance and SLA metrics for data‑trust governance. It helps large and mid‑market enterprises reduce manual triage, operational costs, and downtime in complex analytics pipelines.
StarTree
StarTree provides a platform-as-a-service built on Apache Pinot, enabling real-time analytics with sub-second query response times on petabyte-scale data. This solution allows businesses to efficiently handle high concurrency demands while minimizing costs associated with data processing and analysis.
Funding: $50M+
Castor
Castor provides an AI-powered data discovery platform that enables users to find, understand, and trust their data through natural language search, automated documentation, and SQL query simplification. It reduces reliance on IT by offering self-service analytics while ensuring data governance, compliance, and security at scale.
Funding: $20M+
Syncari
Syncari provides an autonomous data management platform that unifies, synchronizes, and governs enterprise data across multiple systems using AI-driven workflows and real-time multi-directional sync. This eliminates data silos, ensures consistent data quality, and enables accurate reporting, helping organizations make informed decisions and scale operations efficiently.
Funding: $20M+
Silverfin
Silverfin provides a cloud‑native platform that aggregates client ledgers, source documents and financial data into a single encrypted repository and automates accounting tasks such as journal entry validation, consolidation and tax computation through configurable, rule‑based workflows. The solution includes real‑time collaboration, integrated analytics and KPI dashboards, and native integrations with over 50 ERP, banking and accounting systems via an open REST API. It also offers an advisory services module that lets firms package data‑driven consulting services within the same environment.
Funding: $20M+
Simple Form
Simple Form provides real-time access to essential corporate data for financial institutions, enabling them to make informed decisions based on accurate and up-to-date information. This service addresses the challenge of data fragmentation by consolidating critical information on domestic corporations into a single, reliable source.
Funding: $20M+
Hevo Data
Hevo Data provides an automated data pipeline platform that integrates data from over 150 sources in near real-time, enabling users to prepare and load analytics-ready data with minimal maintenance. The platform ensures 100% data accuracy and 99.9% uptime, addressing the challenges of data management and operational inefficiencies for modern data teams.
Incorta
Incorta provides a unified data and analytics platform that connects directly to source systems such as ERP, CRM, and other core applications using its Direct Data Mapping engine, removing the need for traditional batch ETL pipelines. The platform streams transaction‑level data into an in‑memory engine for sub‑second queries, offers low‑code data modeling, AI‑driven natural‑language analytics, automated workflows, and role‑based security, with open APIs for integration into existing BI and cloud environments.
Funding: $100M+
Tessell
Tessell provides a fully-managed database service that optimizes cloud database performance and reduces total cost of ownership by over 40% through infrastructure right-sizing and database consolidation. The platform ensures high availability, data protection, and simplified management, enabling enterprises to achieve significant ROI while minimizing operational complexities.
Hightouch
Hightouch is a Composable Customer Data Platform (CDP) that enables marketers to activate and sync customer data directly from their data warehouse to over 200 marketing and sales tools without the need for coding. This approach allows businesses to create targeted audiences, optimize campaign performance, and enhance customer engagement while maintaining data security and governance.
Funding: $20M+
Sumo Logic
Sumo Logic provides a cloud‑native platform that consolidates log management, infrastructure monitoring, and SIEM into a single searchable data lake. It ingests petabyte‑scale data from over 450 integrations, applies AI/ML for threat detection and automated alert triage, and delivers real‑time dashboards, customizable queries, and response playbooks via web UI or API. The service includes built‑in compliance controls and a pay‑for‑data‑used pricing model for enterprise security and observability teams.
Ocient
Ocient operates a data analytics platform that enables rapid analysis of large datasets, handling tens of terabytes to exabytes with trillions of rows. By ingesting billions of rows per second and providing filtered aggregate results, the platform simplifies complex data ecosystems for organizations.
Funding: $100M+
PrimeNumber
PrimeNumber offers a data integration service called TROCCO® that automates the data acquisition process, enabling engineers to efficiently manage and utilize data from various sources. This solution addresses the challenge of fragmented data environments by providing a centralized platform for data orchestration, enhancing data accessibility and operational efficiency for businesses.
Funding: $20M+
Preset
Preset offers a cloud-based analytics platform that aggregates and analyzes large datasets from diverse sources, enabling real-time insights for organizations. This platform enhances data-driven decision-making and collaboration among teams, addressing the challenge of fragmented data access and analysis.
Omni
Omni is a business intelligence platform that utilizes a unified data model and SQL to provide reliable, curated metrics with centralized management and clear permissions. It enables teams to perform ad hoc analysis and access data independently, reducing the need for extensive in-house resources while ensuring data governance.
Funding: $20M+
Materialize
Materialize is a cloud operational data store that uses Differential Dataflow to provide strongly consistent, real-time views of operational data with sub-second latency. This technology enables businesses to quickly respond to changes by integrating and querying data from multiple sources without the complexity of traditional data processing methods.
Voltron Data
Voltron Data provides Theseus, a GPU-accelerated SQL engine designed for processing petabyte-scale data without the need for indexing or data movement. It enables enterprises to significantly reduce query times, server counts, and operational costs, making it ideal for large-scale ETL and machine learning preprocessing tasks.
Funding: $100M+
Sigma Computing
Sigma provides a cloud analytics solution with a spreadsheet-like interface that allows users to analyze billions of records in real-time using SQL, Python, or AI. This platform enables teams to collaborate effectively and automate data workflows while maintaining security and performance, addressing the need for accessible and scalable data analysis in organizations of all sizes.
Funding: $200M+
PlanetScale
PlanetScale offers a database-as-a-service built on Vitess, enabling horizontal scaling of MySQL databases through sharding across multiple nodes. This platform ensures zero downtime for schema changes and migrations, providing high availability and performance for applications handling large volumes of data.
Tracer
Tracer offers a data intelligence platform that automatically collects and organizes non-personally identifiable data, ranging from encrypted user identities to corporate revenue statements. By providing subscription-based access and consulting services, the platform enables businesses to gain transparency into their performance and make informed decisions based on accurate data analysis.
Funding: $20M+
Similarweb
Similarweb offers an AI‑enhanced digital intelligence platform that aggregates billions of data points from websites, mobile apps, search keywords, advertising spend and shopper behavior into a unified, real‑time view. The service provides automated competitive analysis, SEO/PPC performance metrics, sales prospecting insights and a Data‑as‑a‑Service API that can be integrated with CRM, BI and marketing automation tools. Users such as marketers, sales teams and analysts use the platform to benchmark performance, detect market trends and generate data‑driven recommendations.
Funding: $20M+
Flexe
Flexe provides a cloud‑based platform that links enterprises to a network of over 800 warehouse operators across the United States and Canada, allowing on‑demand scaling of storage and fulfillment capacity. The system integrates with WMS, OMS and IMS via API, EDI or XML and delivers real‑time order routing, inventory visibility, and analytics while using a pay‑as‑you‑go pricing model to avoid capital expenditures and long‑term contracts. A dedicated logistics analyst control‑tower monitors performance and ensures service‑level compliance across the flexible network.
Funding: $100M+
Y42
Y42 offers a turnkey data orchestration platform that enables users to build, monitor, and maintain data pipelines using Google BigQuery and Snowflake. It addresses fragmented data flows and inefficient maintenance by providing a unified architecture with built-in monitoring and version control, allowing teams to streamline their data operations.
Funding: $20M+
WEKA
WEKA provides a cloud-native, software-defined data platform that enables organizations to efficiently store, process, and manage large volumes of data across on-premises and cloud environments. By transforming stagnant data silos into streaming data pipelines, WEKA enhances performance for AI and high-performance computing workloads while reducing energy consumption and carbon emissions.
Funding: $100M+
Abett
Abett provides a centralized hub that aggregates and unifies employee benefits data from various sources, enabling employers to access clear insights and streamline decision-making. This approach addresses the challenge of fragmented data management, allowing organizations to optimize healthcare expenditure and enhance benefits strategy.
Funding: $20M+
GoodData
GoodData provides an API‑first data intelligence platform that lets developers embed analytics, AI assistants and automated workflows into applications via Python and React SDKs. The platform includes an analytics lake with an in‑memory FlexQuery engine built on Apache Arrow and a headless semantic layer exposing real‑time metrics through REST, GraphQL and PostgreSQL. All components are managed as code‑as‑infrastructure, supporting CI/CD, multi‑tenant isolation and fine‑grained security for scalable, governed deployments.
Funding: $20M+
Cherre
Cherre offers a financial database platform that aggregates real estate data from public, private, and internal sources to provide detailed market evaluations, property valuations, and tax assessments. This platform enables clients to reduce manual analytics costs and enhance their strategic decision-making processes.
Funding: $100M+