Arroyo

About Arroyo

Arroyo is a serverless stream processing platform that enables users to execute SQL queries on real-time data streams from Kafka, allowing for the transformation, filtering, and aggregation of data with sub-second latency. It eliminates the need for a dedicated operations team by scaling automatically to handle millions of events per second while ensuring exactly-once processing semantics.

```xml <problem> Traditional stream processing systems often require dedicated operations teams to manage infrastructure and ensure scalability, adding complexity and overhead. Existing solutions can also struggle to provide both analytical SQL capabilities and sub-second latency for real-time data processing. </problem> <solution> Arroyo is a serverless stream processing platform designed for cloud-native environments, enabling users to execute analytical SQL queries on real-time data streams with sub-second latency. The platform automatically scales from zero to millions of events per second without requiring a dedicated operations team. Built on Rust and the Arrow in-memory analytics format, Arroyo offers high performance and exactly-once processing semantics. It allows data scientists and engineers to build end-to-end real-time applications and dashboards using familiar SQL syntax. </solution> <features> - Analytical SQL support for transforming, filtering, aggregating, and joining data streams - Serverless architecture that automatically scales to handle varying workloads - Exactly-once processing semantics to ensure data accuracy and consistency - Native support for JSON, Avro, Parquet, and raw text/binary formats - User-defined functions (UDFs) for extending SQL functionality with Rust (and soon Python) - Web UI for managing connections, developing SQL queries, and monitoring pipelines - REST API for declarative orchestration and management at scale - Support for time windows (sliding, tumbling, and session) with watermark processing - Connectors for integrating with various data sources and sinks, including Kafka, Kinesis, Confluent Cloud, and more </features> <target_audience> Arroyo is targeted towards data scientists, data engineers, and application developers who need to build real-time data pipelines and applications with low latency and high scalability. </target_audience> ```

What does Arroyo do?

Arroyo is a serverless stream processing platform that enables users to execute SQL queries on real-time data streams from Kafka, allowing for the transformation, filtering, and aggregation of data with sub-second latency. It eliminates the need for a dedicated operations team by scaling automatically to handle millions of events per second while ensuring exactly-once processing semantics.

Where is Arroyo located?

Arroyo is based in Berkeley, United States.

When was Arroyo founded?

Arroyo was founded in 2022.

How much funding has Arroyo raised?

Arroyo has raised 500000.

Who founded Arroyo?

Arroyo was founded by Micah Wylde and Jackson Newhouse.

  • Micah Wylde - Founder
  • Jackson Newhouse - Founder
Location
Berkeley, United States
Founded
2022
Funding
500000
Employees
2 employees
Major Investors
Y Combinator
Looking for specific startups?
Try our free semantic startup search

Arroyo

Score: 100/100
AI-Generated Company Overview (experimental) – could contain errors

Executive Summary

Arroyo is a serverless stream processing platform that enables users to execute SQL queries on real-time data streams from Kafka, allowing for the transformation, filtering, and aggregation of data with sub-second latency. It eliminates the need for a dedicated operations team by scaling automatically to handle millions of events per second while ensuring exactly-once processing semantics.

arroyo.dev700+
cb
Crunchbase
Founded 2022Berkeley, United States

Funding

$

Estimated Funding

$500K+

Major Investors

Y Combinator

Team (<5)

Micah Wylde

Founder

Jackson Newhouse

Founder

Company Description

Problem

Traditional stream processing systems often require dedicated operations teams to manage infrastructure and ensure scalability, adding complexity and overhead. Existing solutions can also struggle to provide both analytical SQL capabilities and sub-second latency for real-time data processing.

Solution

Arroyo is a serverless stream processing platform designed for cloud-native environments, enabling users to execute analytical SQL queries on real-time data streams with sub-second latency. The platform automatically scales from zero to millions of events per second without requiring a dedicated operations team. Built on Rust and the Arrow in-memory analytics format, Arroyo offers high performance and exactly-once processing semantics. It allows data scientists and engineers to build end-to-end real-time applications and dashboards using familiar SQL syntax.

Features

Analytical SQL support for transforming, filtering, aggregating, and joining data streams

Serverless architecture that automatically scales to handle varying workloads

Exactly-once processing semantics to ensure data accuracy and consistency

Native support for JSON, Avro, Parquet, and raw text/binary formats

User-defined functions (UDFs) for extending SQL functionality with Rust (and soon Python)

Web UI for managing connections, developing SQL queries, and monitoring pipelines

REST API for declarative orchestration and management at scale

Support for time windows (sliding, tumbling, and session) with watermark processing

Connectors for integrating with various data sources and sinks, including Kafka, Kinesis, Confluent Cloud, and more

Target Audience

Arroyo is targeted towards data scientists, data engineers, and application developers who need to build real-time data pipelines and applications with low latency and high scalability.

Arroyo - Funding: $500K+ | StartupSeeker