Lab: Deploying a Local LLM with Ollama
Overview
In this lab, you will set up a local environment to run a Large Language Model (LLM). You will install Ollama, pull a lightweight model (e.g., Llama 3 8B), and interact with it using both the command-line interface and its REST API.
Scenario
Your development team needs a sandbox environment to test prompts and understand LLM capabilities before moving to a costly cloud-based GPU cluster. You are tasked with providing a local, functional LLM endpoint on a standard developer machine.
Tasks
- Install Ollama: Download and install the Ollama binary on the Linux VM provided to you.
- Pull the Model: Use the Ollama CLI to pull the `llama3` model (`ollama pull llama3`). Observe the download size and initialization process.
- CLI Interaction: Run the model interactively (`ollama run llama3`) and ask it a few basic questions to verify functionality.
- API Integration: Use `curl` to send a POST request to the Ollama REST API (served at `http://localhost:11434` by default) and parse the JSON response.
- Resource Monitoring: While generating a long response, use `htop` or a similar tool in another terminal to observe CPU and RAM usage spikes.
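The API Integration task can also be scripted. An equivalent `curl` call is `curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'`. The Python sketch below sends the same kind of request using only the standard library. It assumes the server listens on the default `localhost:11434`, and the helper names (`build_payload`, `parse_response`, `generate`) are hypothetical. Because `/api/generate` streams newline-delimited JSON objects unless `"stream": false` is set, the parser accepts both shapes, and it derives a rough tokens-per-second figure from the `eval_count` and `eval_duration` (nanoseconds) fields of the final object.

```python
import json
import urllib.request

# Assumed default endpoint for a local Ollama server.
GENERATE_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3") -> bytes:
    """Encode an /api/generate request body as UTF-8 JSON."""
    # "stream": False asks for one JSON object instead of a stream of chunks.
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def parse_response(body: str) -> dict:
    """Extract the generated text and a tokens/s estimate from a response body.

    Accepts either a single JSON object (stream=false) or newline-delimited
    JSON chunks (the streaming default), joining each chunk's "response".
    """
    try:
        chunks = [json.loads(body)]
    except json.JSONDecodeError:  # several objects, one per line
        chunks = [json.loads(line) for line in body.splitlines() if line.strip()]
    text = "".join(chunk.get("response", "") for chunk in chunks)
    final = chunks[-1]  # the last object carries "done" and the timing stats
    eval_count = final.get("eval_count", 0)  # tokens generated
    eval_ns = final.get("eval_duration", 0)  # generation time in nanoseconds
    tokens_per_s = eval_count / (eval_ns / 1e9) if eval_ns else 0.0
    return {"text": text, "done": final.get("done", False), "tokens_per_s": tokens_per_s}


def generate(prompt: str, model: str = "llama3") -> dict:
    """POST a prompt to the local server and return the parsed result."""
    request = urllib.request.Request(
        GENERATE_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # CPU-only inference can take minutes, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        return parse_response(response.read().decode("utf-8"))


if __name__ == "__main__":
    result = generate("Explain in one sentence what a REST API is.")
    print(result["text"])
    print(f"done={result['done']}, ~{result['tokens_per_s']:.1f} tokens/s")
```

Asking for a longer answer (for example, a multi-paragraph essay) keeps the CPU busy long enough to watch the usage spikes in `htop` for the Resource Monitoring task.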
Success Criteria
- Ollama is running as a background service.
- The `llama3` model is successfully downloaded and cached.
- You receive a coherent response from both the CLI and the REST API.
- You have documented the peak memory usage during inference.
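Several of these criteria can be spot-checked with a short script. This is a sketch that assumes a Linux host with `/proc` and the default address `http://localhost:11434`; the function names are hypothetical. It queries `/api/tags`, which lists locally cached models, and reads `VmHWM` (the kernel's peak resident set size, reported in kB) from `/proc/<pid>/status` for every process whose name contains "ollama". A reachable HTTP endpoint does not by itself prove that Ollama runs as a background service; on systemd-based VMs, `systemctl status ollama` confirms that. `VmHWM` covers system RAM only (not GPU memory) and is lost if the model runner process exits, so read it soon after the long generation.

```python
import json
import os
import urllib.request

# Assumed default Ollama address; change it if the server was configured differently.
BASE_URL = "http://localhost:11434"


def server_is_up(base_url: str = BASE_URL) -> bool:
    """Return True if the Ollama HTTP endpoint answers with 200 OK."""
    try:
        with urllib.request.urlopen(base_url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # URLError, refused connections, and timeouts are all OSErrors
        return False


def fetch_tags(base_url: str = BASE_URL) -> str:
    """Return the raw JSON body of GET /api/tags (locally cached models)."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        return resp.read().decode("utf-8")


def model_is_cached(tags_json: str, model: str = "llama3") -> bool:
    """True if the tag list contains the model, e.g. "llama3" or "llama3:latest"."""
    models = json.loads(tags_json).get("models", [])
    names = [m.get("name", "") for m in models]
    return any(n == model or n.startswith(model + ":") for n in names)


def peak_rss_kib(status_text: str) -> int:
    """Extract VmHWM (peak resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    return 0  # field absent (e.g. kernel threads)


def ollama_peak_rss_kib() -> dict:
    """Map PID -> peak RSS in kB for every process whose name contains "ollama"."""
    peaks = {}
    if not os.path.isdir("/proc"):  # not Linux; nothing to inspect
        return peaks
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if "ollama" not in f.read():
                    continue
            with open(f"/proc/{pid}/status") as f:
                peaks[int(pid)] = peak_rss_kib(f.read())
        except OSError:  # process exited or is not readable
            continue
    return peaks


if __name__ == "__main__":
    up = server_is_up()
    print("Ollama endpoint reachable:", up)
    if up:
        print("llama3 cached:", model_is_cached(fetch_tags()))
    for pid, kib in ollama_peak_rss_kib().items():
        print(f"PID {pid}: peak RSS {kib / 1024:.0f} MiB")
```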