Exxa

About Exxa

EXXA provides a cost-efficient asynchronous LLM inference service that uses a custom scheduler to aggregate unused compute resources across multiple data centers, enabling batch processing of requests. This approach brings costs down to as low as $0.30 per million input tokens while optimizing energy consumption, making it well suited to applications that can tolerate some delay in processing.


When was Exxa founded?

Exxa was founded in 2023.

Who founded Exxa?

Exxa was co-founded by Etienne Balit, who serves as CTO.

  • Etienne Balit - Co-founder/CTO

Founded: 2023
Employees: 3

Exxa

Score: 49/100
AI-Generated Company Overview (experimental) – could contain errors

withexxa.com
Founded 2023

Funding

No funding information available.

Team (<5)

Etienne Balit

Co-founder/CTO

Company Description

Problem

Large language model (LLM) inference can be computationally expensive, leading to high costs for applications that require processing substantial volumes of text. Traditional systems often fail to efficiently utilize available compute resources, resulting in wasted capacity and increased energy consumption.

Solution

EXXA provides a cost-optimized asynchronous LLM inference service by aggregating unused compute capacity across multiple data centers. Its custom scheduler and orchestrator efficiently capture intermittent compute windows, enabling batch processing of requests at significantly reduced cost. By leveraging underutilized resources, EXXA offers a more sustainable and affordable option for LLM inference, particularly for applications that can tolerate a delay in processing. The platform supports open-source models such as Llama 3, providing high-quality output at a fraction of the cost of traditional inference services.

Features

• Asynchronous batch processing with a typical turnaround time of under 24 hours (see the workflow sketch after this list)
• Custom scheduler and orchestrator to maximize the use of intermittent and low-cost compute resources
• Predictive inference optimizer to select optimal settings for each payload, including batch size and context size
• Specialized inference engine optimized for the batch API, featuring a persistent KV cache and cross-platform/cross-GPU compatibility
• Support for Llama 3.1 70B and 8B models, with plans to add more models
• Detailed energy consumption data provided for each request via the API
• Option to offset carbon footprint by purchasing certified carbon credits through the platform
• Batch cancellation, charging only for completed work
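Taken together, these features describe an asynchronous batch workflow: submit a payload, poll until it completes (typically within 24 hours), and optionally cancel, paying only for finished work. The Python sketch below illustrates what that flow could look like; the base URL, endpoint paths, and response field names are hypothetical illustrations, not EXXA's documented API.

```python
import time

import requests

API_BASE = "https://api.withexxa.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder credential


def submit_batch(payload: dict) -> str:
    """Submit a batch of requests for asynchronous processing (hypothetical endpoint)."""
    resp = requests.post(f"{API_BASE}/batches", json=payload, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]  # assumed response field


def wait_for_batch(batch_id: str, poll_seconds: int = 600) -> dict:
    """Poll until the batch reaches a terminal state; turnaround is typically under 24 hours."""
    while True:
        resp = requests.get(f"{API_BASE}/batches/{batch_id}", headers=HEADERS)
        resp.raise_for_status()
        batch = resp.json()
        if batch["status"] in ("completed", "failed", "cancelled"):
            return batch
        time.sleep(poll_seconds)


batch_id = submit_batch({
    "model": "Llama-3.1-70b-instruct-fp16",
    "requests": [
        {"messages": [{"role": "user", "content": "Classify the sentiment of: ..."}]},
    ],
})
result = wait_for_batch(batch_id)
print(result["status"])
# Per the feature list, a cancellation endpoint would charge only for
# work completed before the cancel, and each request's response would
# include its energy consumption data.
```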

Target Audience

EXXA primarily targets businesses and developers who require cost-effective LLM inference for large-scale tasks such as LLM evaluation, contextual retrieval, classification, translation, parsing, and synthesis.

Revenue Model

EXXA charges per million tokens, with different rates for input and output tokens depending on the model used; for example, Llama-3.1-70b-instruct-fp16 is priced at $0.30 per million input tokens and $0.50 per million output tokens.
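As a worked example at those rates, a batch with 10 million input tokens and 2 million output tokens on Llama-3.1-70b-instruct-fp16 would cost 10 × $0.30 + 2 × $0.50 = $4.00. A minimal cost estimator (the helper name is illustrative, not part of any EXXA SDK):

```python
# Per-million-token rates for Llama-3.1-70b-instruct-fp16, from the pricing above.
INPUT_RATE_PER_M = 0.30   # USD per million input tokens
OUTPUT_RATE_PER_M = 0.50  # USD per million output tokens


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a batch job at the quoted per-million-token rates."""
    return (input_tokens / 1e6) * INPUT_RATE_PER_M + (output_tokens / 1e6) * OUTPUT_RATE_PER_M


# Example: 10M input tokens and 2M output tokens -> $4.00
print(f"${batch_cost(10_000_000, 2_000_000):.2f}")
```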