Rezolve.ai is an AI-powered modern Employee Service Desk that brings instant employee support within Microsoft Teams, reducing enterprise friction and enhancing the employee experience.
Employees of Rezolve’s customers use the product to access the knowledge base and to raise incidents and support requests. Application availability and security are therefore of prime importance.
Risk of falling out of support: Rezolve’s infrastructure runs on Microsoft Azure and uses services such as Kubernetes, PostgreSQL, and Azure Cognitive Services. Some core components, such as Azure Kubernetes Service and the PostgreSQL database, were on old versions that the cloud provider was deprecating, so they risked falling out of support. The major challenge was to upgrade the infrastructure without incurring long downtime.
Security and compliance: Rezolve is a GDPR- and SOC 2-compliant product, so it needs to audit and enforce compliance requirements in the cloud and in Kubernetes. The team also wanted faster feedback from static code analysis and on vulnerabilities, along with the capability to manage them.
Cloud governance issues and pending cost optimization: Optimizing the infrastructure and reducing its cost as far as possible had long been on the wishlist.
Implementation of an MLOps platform: Being an AI company, Rezolve is constantly innovating in AI, specifically in NLP and generative AI. It requires an MLOps platform to train models on the customer’s dataset and manage their complete lifecycle.
Rezolve selected Cloudraft to overcome these challenges. Our team crafted a plan and assessed the whole cloud environment to understand the feasibility of upgrading and optimizing the Azure cloud infrastructure. During the process, we collaborated on research and development (R&D) and provided Rezolve with consulting services to drive internal innovation.
To upgrade Kubernetes from version 1.19 to 1.24, we used a blue-green strategy together with Velero, an open-source cloud-native backup and recovery solution, to make the migration fast. While working through API deprecations and other Kubernetes upgrade nuances, we tested all the applications before switching traffic to the new cluster.
Additionally, we dissociated the public-facing IPs from the ingress controller and locked them using the Azure resource lock feature to avoid losing them.
We also implemented mechanisms like Autoscaling, PodDisruptionBudget, and Quotas to improve the reliability of the workload.
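To illustrate the API deprecation hurdle mentioned above, here is a minimal Python sketch that flags manifests still using API versions removed in Kubernetes 1.22, a release that sits between 1.19 and 1.24. It is not the tooling used in this project: the function name `find_removed_apis`, the short lookup table, and the sample manifest are assumptions made for illustration, and the table covers only a few well-known removals.

```python
import re

# A few API versions that Kubernetes 1.22 stopped serving, mapped to their
# stable replacements. Illustrative subset only, not a complete list.
REMOVED_BY_1_22 = {
    "extensions/v1beta1": "networking.k8s.io/v1",  # Ingress
    "networking.k8s.io/v1beta1": "networking.k8s.io/v1",  # Ingress, IngressClass
    "apiextensions.k8s.io/v1beta1": "apiextensions.k8s.io/v1",  # CustomResourceDefinition
    "rbac.authorization.k8s.io/v1beta1": "rbac.authorization.k8s.io/v1",
    "admissionregistration.k8s.io/v1beta1": "admissionregistration.k8s.io/v1",
}

# Match only top-level (unindented) keys so nested apiVersion fields are ignored.
API_VERSION_RE = re.compile(r"^apiVersion:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)
KIND_RE = re.compile(r"^kind:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)


def find_removed_apis(manifest_text):
    """Return (kind, removed_api_version, replacement) for each affected document."""
    findings = []
    # Multi-document YAML files separate documents with a '---' line.
    for doc in re.split(r"^---\s*$", manifest_text, flags=re.MULTILINE):
        api = API_VERSION_RE.search(doc)
        if api is None:
            continue
        version = api.group(1)
        if version in REMOVED_BY_1_22:
            kind = KIND_RE.search(doc)
            findings.append(
                (kind.group(1) if kind else "unknown", version, REMOVED_BY_1_22[version])
            )
    return findings


if __name__ == "__main__":
    # Hypothetical manifest: one outdated Ingress and one up-to-date Deployment.
    sample = (
        "apiVersion: networking.k8s.io/v1beta1\n"
        "kind: Ingress\n"
        "metadata:\n"
        "  name: web\n"
        "---\n"
        "apiVersion: apps/v1\n"
        "kind: Deployment\n"
        "metadata:\n"
        "  name: web\n"
    )
    for kind, old, new in find_removed_apis(sample):
        print(f"{kind}: {old} is no longer served; migrate to {new}")
```

A scan like this only finds candidates. Some replacements also change the resource schema, so each flagged manifest still needs to be reviewed and tested on the new cluster before traffic is switched.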
We initially tried Azure Database Migration Service (DMS), but it didn’t work out: the out-of-the-box (OOTB) tooling was very slow and unreliable. We therefore developed a custom strategy and solution to move the database from Single Server Postgres v9.6 to Flexible Server v14, using various open-source and custom tools and programs. The goal of the solution was to minimize outages.
We also assessed the performance of the Postgres database and optimized it to improve performance and reduce storage costs.
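The custom migration tooling itself is not described here. As one illustration of a check that can precede a cutover, the Python sketch below compares per-table row counts between the old and new servers. The function name `compare_row_counts` and the table names are hypothetical, and the counts are assumed to have been collected separately, for example with a `SELECT count(*)` query per table on each server.

```python
def compare_row_counts(source_counts, target_counts):
    """Return {table: (source_count, target_count)} for every table that differs.

    A count of None means the table is missing on that side.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        dst = target_counts.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts gathered from the v9.6 source and the v14 target.
    source = {"tickets": 120_000, "users": 4_500}
    target = {"tickets": 120_000, "users": 4_498}
    for table, (src, dst) in compare_row_counts(source, target).items():
        print(f"{table}: source={src} target={dst}")
```

Matching row counts are a necessary signal, not a sufficient one. A fuller check would also compare checksums or sample rows.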
Implemented static code analysis with Semgrep in the CI pipeline. Together with vulnerability management tools like DefectDojo, this helps developers rectify vulnerabilities and see their status in a centralized dashboard.
We audited the Azure cloud, fixed the security issues we found, and implemented IDP and DDoS protection for the workload.
Implemented cert-manager to automatically manage the SSL certificates for all the services.
Audited the overall environment for compliance requirements.
- Participated in R&D and assessed various MLOps platforms for running machine learning and AI workloads on Azure cloud and Kubernetes. Implemented solutions like Haystack and ClearML to build the MLOps platform.
We thoroughly analyzed the cloud environment to identify opportunities to reduce costs, and implemented techniques like autoscaling, rightsizing, instance reservations, and garbage collection to optimize the workload.
Detailed assessment of the Postgres database, schema, and queries to identify performance bottlenecks.
- Fast turnaround and minimal business disruption for the core services: a major Kubernetes upgrade with less than 10 minutes of downtime.
- 30% cost saving, enhanced reliability, and performance improvements.
- Better security: applied a shift-left strategy to reduce costly late changes.
- Innovation and quicker prototyping support in AI/ML projects.
“Cloudraft assisted Rezolve.ai in migrating to the latest versions of Kubernetes and Postgres, while also taking advantage of additional capabilities. The task of managing the live infrastructure that is actively used by customers in production requires special attention, and they successfully navigated this challenge. Furthermore, they researched new technologies relevant to our use case and provided valuable recommendations. Cloudraft team was dedicated and professional in delivering the desired outcome, and we highly recommend their services.” – Uday Bhaskar Reddy, CTO, Rezolve.ai
Is your Cloud and Kubernetes infrastructure optimized for cost and performance? Are you aware of the potential risks lurking within your system? Don't fret – we're here to help!
Take advantage of our expertise and book a free, no-obligation session with us today. Our team of seasoned professionals will guide you through a comprehensive assessment of your Cloud and Kubernetes infrastructure.