Enabling an AI Product to Go to Market Faster

Client Background

Our client is a pioneering AI startup that has developed a groundbreaking product to combat online fraud and identity theft via facial recognition. Their cutting-edge machine learning models are the backbone of their solutions, transforming how businesses and individuals protect themselves online.

The Challenge

As their business rapidly scaled, they faced a critical challenge: their AI inferencing APIs, built using PyTorch, FastAPI, and containers, were struggling to keep up with the surging demand. The product was unable to scale effectively in production, jeopardizing the company's plans for a major product launch.

Seeking a swift resolution to their scaling issues, they turned to CloudRaft, known for expertise in AI, Kubernetes, and cloud-native solutions.


The Solution

Our specialists collaborated closely with their team to diagnose and address the underlying performance issues.

  • Container Optimization: Our team optimized their container images, reducing their size by ~43%. The smaller images significantly decreased cold-start time, enabling faster and more efficient model inferencing.
  • Multithreading: We advised them to leverage multithreading in their API and adjust the worker-to-thread ratio to achieve maximum throughput, unlocking substantial performance gains.
  • Load Testing and Benchmarking: We conducted extensive load testing to benchmark candidate configurations and verify that the product met the customer's non-functional requirements, ensuring a high-performance, scalable experience for end users.
  • Cost & Performance Optimization: Careful analysis of their resource utilization showed that even Cloud Run's dedicated warm pool could not deliver the required performance. We therefore moved the workload from Cloud Run to GKE Autopilot, reducing the cold-start penalty and securing consistent CPU cycles, which resulted in a 5x performance improvement and 3x cost savings.
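
Image-size reductions of this kind typically come from multi-stage builds that keep build tooling and caches out of the runtime image. Below is a minimal sketch, assuming a FastAPI/PyTorch service; the image tags, the `app/` module, and the CPU-only PyTorch wheel index are illustrative, not the client's actual setup:

```dockerfile
# Build stage: install dependencies into an isolated prefix
FROM python:3.11-slim AS build
WORKDIR /app
COPY requirements.txt .
# CPU-only PyTorch wheels are far smaller than the default CUDA builds
RUN pip install --no-cache-dir --prefix=/install \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    -r requirements.txt

# Runtime stage: copy only what the service needs to run
FROM python:3.11-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY app/ ./app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
```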

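The effect of the worker-to-thread tuning above can be explored with a small, stdlib-only benchmark. The `time.sleep` call below is a hypothetical stand-in for an inference step that releases the GIL (as PyTorch ops and downstream I/O do), so the absolute numbers are illustrative only:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_inference(_):
    # Stand-in for a GIL-releasing model forward pass or I/O call.
    time.sleep(0.01)
    return 1


def throughput(n_threads, n_requests=50):
    """Requests completed per second with a given thread count."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(fake_inference, range(n_requests)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


if __name__ == "__main__":
    # Sweep thread counts to find the knee of the throughput curve.
    for n in (1, 2, 4, 8):
        print(f"{n:>2} threads: {throughput(n):6.0f} req/s")
```

In a real deployment the same sweep would be run against the live API (varying gunicorn/uvicorn worker and thread counts) until throughput stops improving.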
The Results

Our strategic guidance and technical expertise helped our client overcome their scaling challenges and position their AI product for success, all within a very short duration.

The key outcomes include:

  • 5x Performance Improvement: Our client’s model inferencing API achieved a remarkable 5x performance boost, enabling the seamless handling of surging user demand.
  • 3x Cost Savings: By optimizing cloud resource utilization and moving from serverless to GKE Autopilot, we helped them reduce their cloud infrastructure costs by 3x.
  • Successful Product Launch: With the scalability and performance issues resolved, they were able to launch their game-changing AI product, unlocking new growth opportunities.

Get an Expert Consultation

We provide end-to-end solutions and support for Building AI Cloud, Cloud Optimization, Platform Engineering, Observability, Monitoring, and many more areas. Empower yourself with best-in-class Kubernetes and Cloud Native solutions.