Running SmolLM2-360M-Instruct from HuggingFace in pure Go with zero Python dependencies.

QUICK START:

Download any HuggingFace model:
huggingface-cli download HuggingFaceTB/SmolLM2-360M-Instruct

Start the backend server (pure Go):
go run serve_model_bytes.go -model HuggingFaceTB/SmolLM2-360M-Instruct -port 8080

Start the web interface (pure Go):
go run web_interface.go -model HuggingFaceTB/SmolLM2-360M-Instruct -backend http://localhost:8080 -port 5000

Open your browser at http://localhost:5000 for streaming LLM inference with no Python runtime!

WHAT'S HAPPENING:
- Loads HuggingFace safetensors directly (no conversion step)
- Pure Go BPE tokenizer (no transformers library)
- Native transformer implementation with multi-head attention, RMSNorm, and SwiGLU
- Streaming generation via Server-Sent Events
- Embedded web UI (no separate frontend)
- Single 10MB binary deployment

WHY THIS MATTERS:
✓ No Python runtime required
✓ No Docker needed
✓ Cross-platform: Linux, macOS, Windows, ARM
✓ 10MB binary vs 5GB+ Python containers
✓ Under 1 second cold start vs 10+ seconds for Python
✓ Works on Raspberry Pi, edge devices, and air-gapped systems
✓ The same code deploys everywhere

WORKS WITH THESE MODELS:
- Qwen2.5 (0.5B, 1.5B, 3B, 7B)
- SmolLM2 (135M, 360M, 1.7B)
- TinyLlama (1.1B)
- Mistral (7B)
- Any Llama-architecture model from HuggingFace

LOOM/WELVET FEATURES:
- 10 layer types: Dense, Conv2D, Multi-Head Attention, RNN, LSTM, Softmax, LayerNorm, RMSNorm, SwiGLU, Residual
- Cross-platform determinism (MAE below 1e-8)
- Native Mixture of Experts via Grid Softmax
- Published to PyPI, npm, and NuGet
- Browser WASM support
- Full training and inference

INSTALLATION OPTIONS:
Python: pip install welvet
JavaScript: npm install @openfluke/welvet
C#: dotnet add package Welvet

LINKS:
GitHub: https://github.com/openfluke/loom
Documentation: https://github.com/openfluke/loom/blob/main/README.md
Model used: https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct

Drop your questions in the comments!
#go #golang #llm #ai #machinelearning #huggingface #mlops #inference #edge #opensource #transformer #llama #qwen #smollm