### Core Objective

This project demonstrates the end-to-end development and deployment of a production-ready LLM microservice. The goal was to transform a standard Hugging Face model (GPT-2) into a scalable, secure, and highly optimized RESTful API using FastAPI and Docker.

### Key Technical Challenges & Solutions

- **Optimized Multi-Stage Docker Build:** To satisfy production requirements, I implemented a two-stage Dockerfile. Stage 1 handles the heavy build dependencies, while Stage 2 produces a lean, secure production image of 1.53 GB by using CPU-only PyTorch.
- **Enterprise-Grade Security:** Unlike basic scripts, this API is secured with API key authentication. The service validates requests via a custom `X-API-KEY` header, with secrets managed through environment variables.
- 🧵 **Concurrency & Performance:** To prevent the server from blocking during heavy AI inference, I offloaded inference to a worker thread pool (anyio). This allows the service to handle 5+ concurrent requests without stalling or crashing.
- ⏳ **Efficiency via Lazy Loading:** The model class follows a singleton pattern: the model stays uninitialized until the first API call, significantly reducing container startup time.

### 🛠️ Technical Stack

- Language: Python 3.10
- API Framework: FastAPI (Uvicorn)
- ML Library: Hugging Face Transformers (GPT-2)
- Infrastructure: Docker & Docker Compose
- Testing: Python Requests & Concurrent Futures

### Outcomes Verified

- [200 OK] Health monitoring endpoint functional.
- [403 Forbidden] Unauthorized requests successfully blocked.
- [Stable] Passed stress testing with simultaneous generation requests.
- [Optimized] Final deployment image size verified under 1.6 GB.
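The two-stage build described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual Dockerfile: the file paths, the `requirements.txt` contents, and the CPU-only PyTorch index URL are assumptions.

```dockerfile
# Stage 1: build wheels with the full toolchain available
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
# CPU-only torch wheels keep the final image small (assumed pinning strategy)
RUN pip wheel --no-cache-dir --wheel-dir /wheels \
    -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu

# Stage 2: lean runtime image without build tools, running as non-root
FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY app/ ./app
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Because only the built wheels and application code are copied into stage 2, compilers and build caches never reach the production image.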
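The API key check can be reduced to a small header-comparison function. This is a stdlib-only sketch (the environment variable name and function names are assumptions); in the actual service this logic would sit inside a FastAPI dependency that rejects the request with a 403.

```python
import os
import secrets

API_KEY_ENV = "API_KEY"  # assumed name of the environment variable holding the secret


def is_authorized(headers: dict) -> bool:
    """Compare the X-API-KEY request header against the secret from the environment.

    secrets.compare_digest performs a constant-time comparison, which avoids
    leaking information about the key through response timing.
    """
    expected = os.environ.get(API_KEY_ENV)
    provided = headers.get("X-API-KEY")
    if not expected or not provided:
        return False
    return secrets.compare_digest(provided, expected)

# In FastAPI, a dependency would call is_authorized(...) and raise
# HTTPException(status_code=403) when it returns False.
```

Keeping the secret in an environment variable means the key never appears in the image or the source tree.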
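The thread-pool offloading pattern looks roughly like this. The project uses anyio (FastAPI's async backend); this sketch uses the stdlib equivalent, `asyncio.to_thread`, and `slow_generate` is a stand-in for the real GPT-2 inference call.

```python
import asyncio
import time


def slow_generate(prompt: str) -> str:
    # Stand-in for blocking model inference.
    time.sleep(0.2)
    return f"generated: {prompt}"


async def handle_request(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    # to accept other requests (anyio.to_thread.run_sync plays the same
    # role inside FastAPI).
    return await asyncio.to_thread(slow_generate, prompt)


async def main() -> list:
    # Five concurrent requests complete in roughly one inference's time,
    # not five, because they run in parallel threads.
    return await asyncio.gather(*(handle_request(f"p{i}") for i in range(5)))


results = asyncio.run(main())
```

Without this offloading, a single long generation would freeze the event loop and stall every other client.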
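The lazy-loading singleton can be sketched as below. The class and method names are illustrative, and the placeholder loader stands in for the expensive real load (something like `transformers.pipeline("text-generation", model="gpt2")`).

```python
class ModelService:
    """Lazy singleton: the heavy model is built on first use, not at import time."""

    _instance = None

    def __init__(self):
        self.model = self._load_model()

    @staticmethod
    def _load_model():
        # Placeholder for the expensive model load; the real service would
        # construct the Hugging Face GPT-2 pipeline here.
        return object()

    @classmethod
    def get(cls):
        if cls._instance is None:  # the first API call pays the load cost
            cls._instance = cls()
        return cls._instance


# Container startup stays fast: nothing is loaded until the first call to get().
a = ModelService.get()
b = ModelService.get()
```

Every subsequent request reuses the same loaded model, so inference cost is paid once per container rather than once per request.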
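The stress test pairs `concurrent.futures` with the Requests library. To keep this sketch self-contained and runnable, `send_generation_request` is a hypothetical stub; the real test would POST to the running container with an `X-API-KEY` header, as indicated in the comment.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def send_generation_request(prompt: str) -> int:
    # Hypothetical stand-in for the real call, which would be roughly:
    #   requests.post(URL, json={"prompt": prompt},
    #                 headers={"X-API-KEY": KEY}).status_code
    return 200


prompts = [f"prompt-{i}" for i in range(5)]
with ThreadPoolExecutor(max_workers=5) as pool:
    # Fire all five requests at once to exercise the thread-pool offloading.
    futures = [pool.submit(send_generation_request, p) for p in prompts]
    statuses = [f.result() for f in as_completed(futures)]
```

A passing run means every simultaneous request returned 200, confirming the service does not block or crash under concurrent generation load.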