Top 50 Data Labeling Service - Series B
Discover the top 50 Data Labeling Service startups at Series B. Browse funding data, key metrics, and company insights. Average funding: $48.8M.
Surge AI
Surge AI provides a data labeling platform that utilizes human feedback to enhance the training of large language models (LLMs). By delivering high-quality labeled data, Surge AI enables organizations to improve the accuracy and performance of their NLP applications.
Funding: $20M+
Rough estimate of the amount of funding raised
HumanSignal
HumanSignal provides a data labeling platform that combines automation and human oversight to prepare training data, fine-tune large language models, and evaluate AI outputs. This solution enhances model accuracy and efficiency while ensuring compliance and data security across various use cases and data types.
Labelbox
Labelbox operates a data training platform that utilizes AI-assisted labeling and a global network of experts to provide high-quality data curation and evaluation for machine learning applications. This platform addresses the challenge of efficiently managing large-scale data labeling and evaluation, enabling businesses to accelerate model development and improve AI performance.
Centaur Labs
Centaur Labs provides a medical AI platform that utilizes a global network of expert annotators for precise data labeling across various modalities, including text, audio, and imaging. This approach addresses the challenge of slow and inconsistent data annotation by ensuring high-quality labels through automated quality checks and performance metrics.
V7
V7 is an AI training data platform that provides high-quality image and video annotations for computer vision models, utilizing AI-assisted labeling tools to enhance accuracy and efficiency. The platform addresses the challenge of slow and error-prone data labeling processes by streamlining workflows and enabling rapid deployment of training data.
Funding: $20M+
Kognic
Kognic offers a data annotation platform specifically designed for sensor-fusion datasets, enabling efficient management and accurate labeling of complex multi-sensor data. By utilizing an auto-label co-pilot, Kognic reduces annotation time by up to 68%, addressing the high costs and complexities associated with generating and curating representative datasets.
Funding: $20M+
Clarifai
Clarifai offers an end-to-end AI lifecycle platform that automates data labeling, model training, and deployment, enabling organizations to build and operationalize AI applications efficiently. By standardizing workflows and optimizing compute resources, the platform reduces development time and costs, allowing enterprises to scale AI solutions rapidly.
Funding: $50M+
Snorkel AI
Snorkel AI develops Snorkel Flow, an AI data development platform that lets data scientists programmatically label and annotate large datasets, significantly reducing the time required for data preparation. By leveraging domain knowledge and automated techniques, the platform improves the accuracy and efficiency of training data for specialized AI applications in fields such as bioinformatics and natural language processing.
Funding: $100M+
Kili Technology
Kili Technology provides tailored data annotation and evaluation services for large language models, utilizing expert-led project management to streamline the data pipeline. This approach eliminates data bottlenecks, enabling companies to enhance model performance and accelerate AI project deployment.
Funding: $20M+
SuperAnnotate
SuperAnnotate is an AI data platform that integrates dataset creation, curation, and model evaluation into a single workflow, enabling users to build and fine-tune high-quality models efficiently. The platform addresses the challenges of data annotation and model performance assessment by providing customizable tools and access to a global marketplace of trained annotation teams.
Encord
Encord is an AI data development platform that enables computer vision and multimodal AI teams to manage, curate, and annotate diverse data types, including images, videos, and documents, all in one place. By transforming unstructured data into high-quality training datasets, Encord enhances AI model performance and accelerates labeling processes, resulting in significant improvements in accuracy and efficiency.
Dataloop AI
Dataloop provides a data management and annotation platform that automates the preprocessing and curation of unstructured visual data, enabling the rapid generation of machine-readable datasets. This solution makes AI application development more efficient by streamlining data pipelines and integrating human feedback for improved accuracy.
Funding: $20M+
Pienso
Pienso provides a no-code platform for training and deploying customized Large Language Models (LLMs) using both structured and unstructured data, enabling users to categorize, label, and analyze their data efficiently. The solution ensures data privacy by operating in the user's environment, allowing businesses to gain real-time insights while maintaining control over their sensitive information.
Funding: $20M+
Outlier AI
Outlier AI connects AI development companies with a global network of domain experts for specialized data annotation and model evaluation. The platform facilitates remote, flexible work, enabling experts to improve AI model accuracy through tasks like rating AI outputs and evaluating multi-modal data.
Funding: $20M+
Voxel51
Voxel51 provides the FiftyOne platform, which enables machine learning and computer vision teams to efficiently curate, visualize, and manage large datasets while automating the identification of annotation errors. This technology enhances model performance by ensuring high-quality data is readily available for training and evaluation, streamlining the development of visual AI applications.
Coactive AI
Coactive AI is a machine learning platform that automates metadata generation for unstructured image and video data, achieving 95% accuracy without manual tagging. This technology enhances content discoverability and optimizes media management systems, enabling businesses to unlock the value of their digital archives.
Funding: $20M+
Roboflow
Roboflow provides a platform for developers to manage image data and streamline the process of training and deploying computer vision models. By offering tools for dataset annotation, preprocessing, and one-click model training, it simplifies the complexities of computer vision projects, enabling faster development and deployment.
Cleanlab
Cleanlab automates data error detection and correction using AI-powered algorithms to enhance the quality of datasets for machine learning and analytics. This technology addresses issues such as label noise, outliers, and data drift, significantly reducing the time and cost associated with data management while improving model performance.
Funding: $20M+
aiMotive
aiMotive provides an end‑to‑end platform that automates sensor data ingestion, AI‑assisted labeling, and photorealistic simulation while delivering modular, ISO‑26262‑aligned perception, planning, and control software for radar‑camera‑only ADAS and automated driving. The integrated cloud‑based NPU emulator enables faster‑than‑real‑time software‑in‑the‑loop testing within CI/CD pipelines, helping OEMs and Tier‑1 suppliers reduce development time and validation costs for L2‑L4 features.
Funding: $20M+
Chooch
Chooch offers a visual recognition platform that autonomously processes diverse visual data, including infrared and X-ray images, and tags objects of interest. The company positions this as a way to improve operational efficiency and deliver high-quality results for clients across various industries.
Funding: $20M+
Datagen
Datagen Technologies develops simulated data technology that generates scalable, bias-free datasets with automatic annotation capabilities. This technology addresses the challenges of data scarcity and bias in machine learning, enabling more accurate and reliable model training.
Funding: $50M+
Ripcord
Ripcord utilizes robotics and AI to digitize and classify both paper and digital documents, extracting and enriching data for easy access and automation. This process addresses the inefficiency of managing unstructured data, enabling organizations to streamline operations and enhance decision-making with accurate, readily available information.
Nansen
Nansen is a blockchain analytics platform that utilizes wallet labeling and on-chain data querying to provide crypto investors with actionable insights and real-time alerts on market movements. By enabling users to identify significant wallet activities and trends across multiple blockchains, Nansen helps investors make informed decisions and mitigate risks in their portfolios.
Accern
Accern provides a no-code natural language processing (NLP) platform that classifies content to enhance research workflows and improve model accuracy across various industries. By automating the classification of key information, the platform helps businesses reduce costs and increase revenue through more efficient data utilization.
Funding: $20M+
DatologyAI
DatologyAI develops automated data curation tools that utilize modality-agnostic algorithms to identify and eliminate redundant and noisy data points without requiring labels. This technology enables organizations to optimize their deep learning model training, significantly improving performance while reducing computational costs.
Kargo
Kargo utilizes computer vision to automate loading dock operations, capturing data from labels and images to provide real-time inventory updates and damage verification. This technology enhances efficiency and accuracy while reducing manual labor and freight claims in supply chain management.
Funding: $20M+
Parsable.ai
Parsable.ai provides a RESTful API that extracts structured data from PDFs, scanned images, DOCX, HTML, and other office files using AI‑enhanced OCR and transformer‑based NLP. Users define extraction templates through a low‑code UI or programmatically via the API, receiving results in JSON, XML, or CSV and integrating with cloud storage, webhooks, or message queues. The service includes enterprise‑grade security, audit logging, and a monitoring dashboard to automate data entry for finance, real‑estate, and HR processes.
Funding: $50M+
Bedrock Security
Bedrock Security provides a data security platform that uses patented AI-driven discovery and classification technology to identify, monitor, and protect sensitive data across cloud, SaaS, and on-premises environments. It reduces the risks of unauthorized access, data exposure, and compliance failures by enabling real-time visibility, policy enforcement, and automated threat response without moving data outside the enterprise.
Funding: $20M+
1touch.io
1touch.io provides a sensitive data intelligence platform built on supervised AI, which the company reports achieves 98.6% accuracy on structured data and 100% accuracy on unstructured data across environments including on-premises and multi-cloud systems. The platform enables organizations to identify and protect sensitive information in real time, addressing unknown data exposure and compliance with privacy regulations.
Funding: $20M+
Defined.ai
Defined.ai provides a marketplace for ethically sourced training data, specializing in diverse datasets for speech recognition, natural language processing, and medical image analysis. The company addresses the need for high-quality, bias-free data that complies with ethical and legal standards, enabling organizations to develop AI solutions responsibly and effectively.
Funding: $50M+
Borneo
Borneo is a data security platform that provides real-time visibility, assessment, and remediation of sensitive data across its entire lifecycle, utilizing machine learning for automated discovery and classification of personally identifiable information (PII). It addresses the challenges of data misuse and compliance gaps by enabling organizations to maintain control over their data while ensuring adherence to privacy regulations without ongoing management overhead.
Funding: $20M+
Anyline
Anyline offers AI-powered optical character recognition (OCR) technology that extracts data from images, including barcodes and QR codes, directly on the device. Its solution converts scanned text into editable data without requiring a server connection, enabling offline data extraction for a range of applications.
Funding: $20M+
Dasera
Dasera is a Data Security Posture Management (DSPM) platform that automates the discovery, classification, and governance of structured and unstructured data across on-premises, cloud, and hybrid environments. By providing precise visibility into and control over data access and usage, Dasera reduces the risk of data breaches and regulatory non-compliance.
Funding: $20M+
Mine
MineOS is a data privacy and governance platform that utilizes AI-driven data mapping and classification techniques to automate compliance with privacy regulations. It addresses the challenges of managing sensitive information and data subject requests by providing centralized workflows and real-time visibility across diverse data sources.
Normalyze
Normalyze provides a Data Security Posture Management (DSPM) platform that continuously discovers, classifies, and monitors sensitive data across cloud, SaaS, PaaS, and on-premises environments. It identifies and prioritizes data access risks, enables real-time remediation, and supports compliance by generating detailed risk assessments and actionable security insights.
Funding: $20M+
Lemonilo
Lemonilo manufactures snack, noodle, and ready‑to‑eat products reformulated with low‑glycemic, high‑fiber, plant‑based ingredients, using low‑temperature extrusion and high‑pressure processing to retain nutrients and extend shelf life. The company distributes these affordable, healthier FMCG items through modern trade and e‑commerce channels, providing QR‑code labeling for transparent nutrition information.
Funding: $20M+
Better Trucks
Better Trucks provides package delivery services for online retailers, using a network of strategically placed warehouses to sort and label packages. Its system lets businesses optimize dispatch schedules and delivery routes, reducing operational costs and improving capacity management.
Funding: $20M+
Sentra
Sentra provides an AI-powered Data Security Posture Management (DSPM) platform that automatically discovers, classifies, and monitors sensitive data across cloud (SaaS and IaaS) and on-premises environments. It mitigates risk by enforcing least-privilege access, detecting policy violations, and preventing data breaches through continuous monitoring and contextual threat analysis.
Funding: $50M+
Syncell
Syncell develops the Microscoop® platform, which utilizes automated photo-biotinylation for high-precision microscopy-guided proteomic discovery at cellular and subcellular levels. This technology enables the unbiased identification of protein constituents in tissue samples, addressing the limitations of traditional proximity labeling and mass spectrometry methods in understanding disease-associated protein interactions.
Funding: $20M+
Mindee
Mindee provides an AI-driven platform for precise data extraction from various document types, reducing manual data-entry errors by up to 30%. The solution lets businesses automate complex workflows, improving operational efficiency and cutting turnaround times by 57%.
CreativeX
CreativeX offers a creative measurement platform that tags the content of images and videos and cross-references it against brand guidelines and digital ad performance. The resulting AI-derived insights help clients make their visual marketing more effective.
Funding: $20M+
Open Raven
Open Raven provides a data security platform that enables multi-cloud data discovery, classification, and posture management, ensuring visibility and control over sensitive data across environments like AWS and Google Drive. The platform identifies potential data leaks and compliance risks, allowing organizations to proactively manage their data security posture and streamline compliance efforts.
Funding: $20M+
Superb AI
Superb AI offers an end-to-end training data platform that automates data preparation and curation, enabling rapid and systematic dataset creation for AI model development. This solution addresses the inefficiencies in data handling, allowing organizations to streamline their AI workflows and enhance model deployment speed.
Flatfile
Flatfile provides a data onboarding platform that utilizes a JavaScript snippet to import, map, and normalize customer data from spreadsheets into software applications. This technology reduces the time and cost associated with manual data cleanup, ensuring high-quality, validated data for seamless integration into business systems.
Hyperscience
Hyperscience is an intelligent document-processing platform that utilizes machine learning to automate the extraction and validation of data from various document types, achieving over 96% accuracy and 99% automation. The platform addresses the inefficiencies in manual document processing, enabling enterprises to significantly reduce turnaround times and operational costs.
Flatfile
Flatfile (Obvious) provides an AI‑driven data preparation platform that automates extraction, schema mapping, cleaning, transformation, and validation of enterprise files from any source. The system combines a smart extractor, semantic AI mapping, real‑time validation with AutoFix, and natural‑language bulk edits, while offering both no‑code configuration and extensible SDKs in a secure, collaborative workspace. It is aimed at data onboarding teams and system integrators building pipelines for ERP systems such as NetSuite and Workday.
Alkymi
Alkymi develops an AI-powered platform that automates the extraction and transformation of unstructured investment data into structured, actionable datasets, enabling seamless integration with existing financial systems. This solution addresses the inefficiencies in managing diverse investment document workflows, allowing firms to process data faster and make informed, data-driven decisions.
Funding: $20M+
Anonos
Anonos provides the Data Embassy platform, a policy‑as‑code engine that encodes GDPR, CCPA and other regulations into machine‑readable policies and enforces them automatically on data wherever it resides—on‑prem, public cloud, or partner sites. The platform applies protection at ingestion, transformation, and export, supports cross‑border residency routing, AI data provenance, and on‑demand synthetic or masked test data, while logging all actions to an immutable audit trail accessible via API. It enables regulated enterprises to share and analyze sensitive data faster without compromising compliance.
Funding: $50M+
SafeGraph
SafeGraph offers a machine-learning-based data platform that integrates and verifies data such as business names, addresses, and operating hours from thousands of sources. The platform gives companies accurate records for analyzing human movement patterns and making informed decisions.
Funding: $50M+
Tracer
Tracer offers a data intelligence platform that automatically collects and organizes non-personally identifiable data, ranging from encrypted user identities to corporate revenue statements. Through subscription-based access and consulting services, the platform gives businesses transparency into their performance and supports decisions grounded in data analysis.
Funding: $20M+