Exxa
About Exxa
EXXA provides a cost-efficient asynchronous LLM inference service that uses a custom scheduler to aggregate unused compute resources across multiple data centers, enabling batch processing of requests. This approach reduces costs to as low as $0.30 per million input tokens while optimizing energy consumption, making it ideal for applications that can tolerate some delay.
```xml
<problem>
Large language model (LLM) inference can be computationally expensive, leading to high costs for applications that require processing substantial volumes of text. Traditional systems often fail to efficiently utilize available compute resources, resulting in wasted capacity and increased energy consumption.
</problem>
<solution>
EXXA provides a cost-optimized asynchronous LLM inference service by aggregating unused compute capacity across multiple data centers. Its custom scheduler and orchestrator efficiently capture intermittent compute windows, enabling batch processing of requests at significantly reduced cost. By leveraging underutilized resources, EXXA offers a more sustainable and affordable solution for LLM inference, particularly for applications that can tolerate a delay in processing. The platform supports open-source models like Llama 3, providing high-quality output at a fraction of the cost of traditional inference services.
</solution>
<features>
- Asynchronous batch processing with a typical turnaround time of under 24 hours
- Custom scheduler and orchestrator to maximize the use of intermittent and low-cost compute resources
- Predictive inference optimizer to select optimal settings for each payload, including batch size and context size
- Specialized inference engine optimized for the batch API, featuring a persistent KV cache and cross-platform/cross-GPU compatibility
- Support for Llama 3.1 70B and 8B models, with plans to add more
- Detailed energy consumption data provided for each request via the API
- Option to offset carbon footprint by purchasing certified carbon credits through the platform
- Batch cancellation, charging only for completed work
</features>
<target_audience>
EXXA primarily targets businesses and developers who need cost-effective LLM inference for large-scale tasks such as LLM evaluation, contextual retrieval, classification, translation, parsing, and synthesis.
</target_audience>
<revenue_model>
EXXA charges per million tokens, with different rates for input and output tokens depending on the model; for example, Llama-3.1-70b-instruct-fp16 is priced at $0.30 per million input tokens and $0.50 per million output tokens.
</revenue_model>
```
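The per-token pricing above is easy to turn into a cost estimate. The sketch below uses the published Llama-3.1-70b-instruct-fp16 rates ($0.30 per million input tokens, $0.50 per million output tokens); the function name and structure are illustrative, not part of any EXXA SDK.

```python
# Cost estimate for EXXA's Llama-3.1-70b-instruct-fp16 pricing,
# using the rates quoted in the revenue model above.

INPUT_RATE = 0.30 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.50 / 1_000_000  # dollars per output token

def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a batch given its token counts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a batch with 10M input tokens and 2M output tokens
print(f"${batch_cost(10_000_000, 2_000_000):.2f}")  # $4.00
```

At these rates, input-heavy workloads such as classification or contextual retrieval stay dominated by the cheaper $0.30/M input rate.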
What does Exxa do?
EXXA runs an asynchronous batch inference service for open-source LLMs such as Llama 3.1. A custom scheduler aggregates unused compute capacity across multiple data centers, which lets it price inference as low as $0.30 per million input tokens while reducing energy consumption. It is best suited to large-scale workloads that can tolerate up to about 24 hours of turnaround.
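An asynchronous batch service like this typically follows a submit-then-poll pattern: you send a batch of prompts, receive a job id, and check back later for results. The sketch below is a self-contained simulation of that pattern only; EXXA's real endpoints, field names, and authentication are not documented here, so the `FakeBatchService` class and all its method names are assumptions for illustration.

```python
# Illustrative submit/poll workflow for an asynchronous batch service.
# This is an in-memory stand-in, not EXXA's actual API.
import itertools

class FakeBatchService:
    """Simulates an async batch endpoint that completes after a few polls."""
    _ids = itertools.count(1)

    def __init__(self, polls_until_done: int = 3):
        self._jobs = {}
        self._polls_until_done = polls_until_done

    def submit(self, prompts):
        """Accept a batch of prompts and return a job id."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"prompts": prompts, "polls": 0}
        return job_id

    def status(self, job_id):
        """Each poll advances the fake job toward completion."""
        job = self._jobs[job_id]
        job["polls"] += 1
        return "completed" if job["polls"] >= self._polls_until_done else "in_progress"

    def results(self, job_id):
        return [f"output for: {p}" for p in self._jobs[job_id]["prompts"]]

service = FakeBatchService()
job = service.submit(["Translate 'bonjour'", "Classify this ticket"])
while service.status(job) != "completed":
    pass  # a real client would sleep and retry within the ~24h window
print(service.results(job))
```

The key design property is that the client never blocks on a live connection: results are fetched whenever the batch finishes, which is what allows the scheduler to place work in cheap, intermittent compute windows.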
When was Exxa founded?
Exxa was founded in 2023.
Who founded Exxa?
Exxa was founded by Etienne Balit.
- Etienne Balit - Co-founder/CTO
- Founded: 2023
- Employees: 3