Alexey Fateev

MLOps | LLMOps Engineer

Summary

  • 6+ years of professional experience in the Information Technology industry
  • 4+ years of professional experience in Python development
  • 3+ years of professional experience in Data Engineering
  • 2+ years of professional experience in MLOps and LLMOps

MLOps Engineer focused on building robust machine learning infrastructure and applying DevOps practices to ML workflows. Specializes in developing and optimizing pipelines for model training and deployment, implementing GitOps approaches with ArgoCD and FluxCD, and automating ML processes. Experienced with high-load systems and production LLM deployments. Passionate about building efficient ML infrastructure and continuously exploring new approaches to optimizing MLOps processes, with a drive to innovate in ML systems automation and scaling.

Experience

Tech Lead | LLMOps Engineer

KTS | August 2024 - Present | Russia

In this role, I lead the MLOps team on a project to build a unified RAG platform for the entire bank. The work combines technical leadership, model optimization, and collaboration with business stakeholders to integrate new solutions.

Key Achievements:

  • Leading a cross-functional team of 15+ professionals (Data Scientists, ML Engineers, Data Engineers, System Analysts) as Tech Lead, driving technical strategy and execution across multiple AI initiatives
  • Successfully delivered 5 production-ready RAG-based products and AI Agent solutions, serving the entire bank's AI infrastructure needs
  • Architected and implemented from scratch an A/B testing platform for RAG products leveraging Istio Service Mesh and Argo Rollouts, enabling data-driven product optimization
  • Designed and deployed canary deployment strategy from the ground up, significantly reducing production deployment risks and enabling safer rollouts
  • Established unified technical and infrastructure layer across all AI products, ensuring consistency, scalability, and maintainability
  • Optimized LLM model inference for a 40% performance improvement, halving the response time of the entire RAG service
  • Ensured high performance and reliability of the service, maintaining a 5-second SLA under loads of up to 250,000 requests per day
  • Developed and implemented production-ready MLOps pipelines for LLM model deployment using KServe and vLLM (see the sketch after this list)
  • Resolved infrastructure constraints by building vLLM from source with flash-attention support for legacy CUDA (11.8)
  • Implemented a unified gateway (HiGress) for all LLM models and MCP (Model Context Protocol), centralizing management and access
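The KServe and vLLM deployment pipelines referenced above ultimately come down to producing and applying InferenceService resources. Below is a minimal Python sketch of that idea using the official kubernetes client; the namespace, image, model path, and resource values are illustrative placeholders, not the production configuration.

```python
# Minimal sketch: registering a KServe InferenceService that serves an LLM with a
# vLLM-based container. Namespace, image, model path, and resources are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llm-rag", "namespace": "ml-serving"},
    "spec": {
        "predictor": {
            "minReplicas": 1,
            "containers": [
                {
                    "name": "kserve-container",
                    "image": "example-registry/vllm-openai:cu118",  # hypothetical image
                    "args": [
                        "--model", "/mnt/models",
                        "--max-model-len", "8192",
                        "--gpu-memory-utilization", "0.9",
                    ],
                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                }
            ],
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="ml-serving",
    plural="inferenceservices",
    body=inference_service,
)
```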

Core Responsibilities:

  • Designing architecture and participating in RAG system implementation
  • Deploying and maintaining LLM inference infrastructure in new clusters based on KServe, including troubleshooting Knative and Istio components
  • Client interaction: conducting meetings, developing connection schemes for new clients to RAG service, and effort estimation
  • Creating unified pipelines for deploying various non-model services across multiple environments (clusters), improving release speed and consistency
  • Research and implementation of best practices for optimizing and accelerating LLM model inference (a sketch follows this list)
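As an illustration of the inference-optimization work above, the snippet below shows the kind of vLLM configuration such experiments revolve around (tensor parallelism, KV-cache memory budget, sampling limits). The model name and parameter values are examples only, not the settings used on the project.

```python
# Illustrative vLLM configuration for inference experiments; model and values are examples.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, not the production one
    dtype="float16",
    tensor_parallel_size=1,        # shard across GPUs when > 1
    gpu_memory_utilization=0.90,   # fraction of GPU memory for weights + KV cache
    max_model_len=8192,            # cap context length to control KV-cache size
)

params = SamplingParams(temperature=0.2, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Summarize what a RAG pipeline does."], params)
for out in outputs:
    print(out.outputs[0].text)
```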
Kubernetes · KServe · vLLM · RAG · ArgoCD · Argo Rollouts · Istio · Python · Jenkins · AI Agents

MLOps Engineer

VK | May 2023 - August 2024 · 1 year 4 months | Russia
  • Developed and maintained a machine learning model deployment platform, managing 100+ ML models as part of a specialized ML team
  • Orchestrated database operations, including table creation and structure optimization for enhanced performance
  • Led critical aspects of a large-scale infrastructure migration, including server relocation and system upgrades
  • Authored and implemented Lua scripts for a Tarantool Cartridge cluster during application migration
  • Enhanced a Golang-based database emulator for ClickHouse, improving integration testing capabilities
  • Streamlined Python environment migration through RPM packaging and GitLab CI pipeline development
  • Developed and deployed a chat-bot application utilizing the OpenAI API, LangChain, and RAG for custom report generation (see the sketch at the end of this role)
  • Deployed applications in Kubernetes (k8s) environments, ensuring scalability and efficient container orchestration
  • Utilized Puppet for automated server deployment and configuration management
Python · RAG · Lua · Golang · ClickHouse · RPM · GitLab CI · OpenAI API · LangChain · Kubernetes · Puppet
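The custom-report chat-bot mentioned above followed the usual RAG pattern: embed documents, retrieve the most relevant chunk for a question, and pass it to the model as context. A compressed sketch of that flow is below, using the OpenAI Python SDK directly (the project itself used LangChain); model names and document contents are placeholders.

```python
# Minimal RAG flow with the OpenAI Python SDK (openai>=1.0): embed documents,
# pick the closest one to the question, and answer with that context.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "Monthly report template: revenue, churn, and activation metrics.",
    "Escalation policy: critical incidents are reported within 30 minutes.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vectors = embed(documents)

def answer(question: str) -> str:
    q_vec = embed([question])[0]
    # Cosine similarity against every document; take the best match as context.
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = documents[int(scores.argmax())]
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("How fast must critical incidents be reported?"))
```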

Data Engineer

Метр квадратный | March 2022 - May 2023 · 1 year 3 months | Russia
  • Maintained the DWH
  • Modeled new database objects, transforming non-relational data into relational form
  • Implemented Grafana and Prometheus monitoring to track DAG execution metrics
  • Created and maintained ETL pipelines to automate CRM interactions with customers across communication channels (email, SMS, push notifications, etc.)
  • Used asynchronous I/O to run independent queries concurrently and speed up execution (see the sketch below)
  • Integrated with external systems via their APIs
Python · DWH · Apache Airflow · Apache Kafka · PostgreSQL
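The asynchrony item above refers to running independent queries concurrently rather than one after another. A minimal sketch of the idea with asyncio and asyncpg is below; the DSN and queries are placeholders.

```python
# Minimal sketch: run independent PostgreSQL queries concurrently with asyncio + asyncpg.
# DSN and queries are placeholders.
import asyncio
import asyncpg

QUERIES = [
    "SELECT count(*) FROM customers",
    "SELECT count(*) FROM email_events WHERE sent_at >= now() - interval '1 day'",
    "SELECT count(*) FROM sms_events WHERE sent_at >= now() - interval '1 day'",
]

async def main() -> None:
    pool = await asyncpg.create_pool(dsn="postgresql://user:pass@host:5432/dwh")
    try:
        # Each query gets its own connection from the pool and runs concurrently.
        results = await asyncio.gather(*(pool.fetchval(q) for q in QUERIES))
        for query, value in zip(QUERIES, results):
            print(query, "->", value)
    finally:
        await pool.close()

asyncio.run(main())
```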

Data Engineer

DataArt | August 2021 - March 2022 · 8 months | Russia
  • Developed data pipelines in GCP for financial data processing, including encryption and anonymization in PCI environment
  • Built backend services using FastAPI and deployed them to Cloud Run and Cloud Functions (see the sketch at the end of this role)
  • Created and maintained data analytics protocols, standards and documentation
  • Developed web application using Django and Plotly Dash for IT job market trend analysis
  • Implemented ETL pipelines using Apache Airflow for data processing
  • Worked with technologies: GKE, Cloud Pub/Sub, BigQuery, Cloud Build, PostgreSQL, Docker, Redis
GCP · FastAPI · Django · Plotly Dash · Apache Airflow · GKE · Cloud Pub/Sub · BigQuery · PostgreSQL · Docker · Redis
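The FastAPI services mentioned above were thin HTTP layers in front of GCP data services. A stripped-down sketch of that shape is below; the endpoint, dataset, and query are illustrative, not the project's actual API.

```python
# Stripped-down FastAPI service of the kind deployed to Cloud Run: a health check
# plus one endpoint backed by BigQuery. Table and query are illustrative.
from fastapi import FastAPI
from google.cloud import bigquery

app = FastAPI()
bq = bigquery.Client()  # uses the service account attached to the Cloud Run revision

@app.get("/healthz")
def healthz() -> dict:
    return {"status": "ok"}

@app.get("/jobs/count")
def job_count(region: str = "EU") -> dict:
    # Parameterized query to avoid string interpolation of user input.
    query = "SELECT count(*) AS n FROM `example_dataset.job_postings` WHERE region = @region"
    job = bq.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("region", "STRING", region)]
        ),
    )
    row = next(iter(job.result()))
    return {"region": region, "count": row["n"]}
```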

Education

Master of Mathematical Modeling and Computer Science

Voronezh State University | 2009 - 2015 | Russia