Lab: Deploying a Local LLM with Ollama

Overview

In this lab, you will set up a local environment to run a Large Language Model (LLM). You will install Ollama, pull a lightweight model (e.g., Llama 3 8B), and interact with it through both the Ollama command-line interface (CLI) and the Ollama REST API.

Scenario

Your development team needs a sandbox environment to test prompts and understand LLM capabilities before moving to a costly cloud-based GPU cluster. You are tasked with providing a local, functional LLM endpoint on a standard developer machine.

Tasks

Install Ollama: Download and install the Ollama binary on the Linux VM provided for this lab.
Pull the Model: Use the Ollama CLI to pull the llama3 model. Observe the download size and initialization process.
CLI Interaction: Run the model interactively and ask it a few basic questions to verify functionality.
API Integration: Use curl to send a POST request to the Ollama REST API (served by default at http://localhost:11434, e.g. the /api/generate endpoint) and parse the JSON response.
Resource Monitoring: While the model generates a long response, use htop or a similar tool in a second terminal to observe the resulting spikes in CPU and RAM usage.
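The API Integration step can also be scripted. The sketch below uses only the Python standard library and assumes Ollama is listening on its default address (localhost:11434) with llama3 already pulled; the helper names build_payload, parse_response, and generate are illustrative, not part of Ollama. Setting "stream" to false asks the server for a single JSON object instead of a series of partial chunks, so the whole reply can be parsed with one json.loads call.

```python
# Minimal sketch: call Ollama's /api/generate endpoint and parse the reply.
# Assumes the default address (localhost:11434) and a pulled llama3 model.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    # "stream": False asks for one JSON object instead of a chunked stream.
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def parse_response(raw: bytes) -> dict:
    """Extract the generated text and a rough tokens/s figure."""
    data = json.loads(raw)
    eval_count = data.get("eval_count", 0)
    eval_ns = data.get("eval_duration", 0)  # Ollama reports durations in ns
    tokens_per_s = eval_count / (eval_ns / 1e9) if eval_ns else None
    return {
        "text": data.get("response", ""),
        "done": data.get("done", False),
        "tokens_per_s": tokens_per_s,
    }


def generate(prompt: str, model: str = "llama3", timeout: float = 300.0) -> dict:
    """POST the prompt to Ollama and return the parsed result."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_response(resp.read())


if __name__ == "__main__":
    try:
        result = generate("Why is the sky blue? Answer in one sentence.")
    except OSError as exc:  # URLError, HTTPError and timeouts subclass OSError
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        print(result["text"])
        if result["tokens_per_s"] is not None:
            print(f"~{result['tokens_per_s']:.1f} tokens/s")
```

Running the file prints the model's answer plus a rough generation speed derived from the eval_count and eval_duration fields; if the server cannot be reached it prints an error instead of a traceback.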
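If the curl request leaves out "stream": false, Ollama streams its answer as newline-delimited JSON: each line is an object carrying a "response" fragment, and the last line has "done": true. The following sketch stitches such output back into one answer, assuming the curl output was first saved to a file; join_stream.py and out.ndjson are hypothetical names used only for illustration.

```python
# Sketch: rebuild the full answer from Ollama's streamed (NDJSON) output.
# Hypothetical usage, with made-up file names:
#   curl -s http://localhost:11434/api/generate \
#        -d '{"model": "llama3", "prompt": "Hi"}' > out.ndjson
#   python3 join_stream.py out.ndjson
import json
import sys
from typing import Iterable


def join_stream(lines: Iterable[str]) -> str:
    """Concatenate "response" fragments until a chunk reports done."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the answer
    return "".join(parts)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print(join_stream(f))
```

Comparing this with the single-object reply makes it clear why raw streamed curl output looks fragmented on screen.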
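htop is good for watching usage live, but recording a peak figure is easier with a small sampler. This sketch is Linux-only (it assumes /proc/meminfo is available) and treats memory in use as MemTotal minus MemAvailable, so it measures the whole VM rather than Ollama alone; comparing the peak against a baseline taken just before inference gives an approximate figure for the model. The function names are illustrative.

```python
# Sketch: sample system-wide memory use and report the peak (Linux only).
# Assumption: /proc/meminfo is available; "used" = MemTotal - MemAvailable.
import time


def parse_meminfo(text: str) -> dict:
    """Return numeric /proc/meminfo fields (values in kB, as reported)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def used_mib(fields: dict) -> float:
    """Memory in use in MiB (the kernel's "kB" unit is 1024 bytes)."""
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024


def read_used_mib(path: str = "/proc/meminfo") -> float:
    """Read /proc/meminfo once and return current usage in MiB."""
    with open(path) as f:
        return used_mib(parse_meminfo(f.read()))


def sample_peak(duration_s: float = 60.0, interval_s: float = 0.5) -> tuple:
    """Poll memory for duration_s seconds; return (baseline_mib, peak_mib)."""
    baseline = peak = read_used_mib()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        peak = max(peak, read_used_mib())
    return baseline, peak
```

For example, call sample_peak(duration_s=60) from a Python shell, then start a long prompt in another terminal before the minute is up; the difference between the returned peak and baseline approximates the memory used during inference.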

Success Criteria

Ollama is running as a background service.
The llama3 model is downloaded and stored locally (it appears in the output of ollama list).
You receive a coherent response from both the CLI and the REST API.
You have documented the peak memory usage during inference.
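The first two criteria can be spot-checked from a script. This sketch assumes the default endpoint and queries /api/tags, which lists the models stored locally. A reply proves the server is up, but whether it runs as a background service is still a question for the service manager (for example, systemctl status ollama on systemd-based installs). The function names are illustrative.

```python
# Sketch: spot-check that the Ollama server responds and llama3 is stored.
# Assumes the default endpoint; /api/tags lists locally available models.
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"


def has_model(tags_json: bytes, model: str = "llama3") -> bool:
    """True if a local model is named exactly `model` or `model:<tag>`."""
    entries = json.loads(tags_json).get("models", [])
    names = [entry.get("name", "") for entry in entries]
    return any(n == model or n.startswith(model + ":") for n in names)


def check(timeout: float = 5.0) -> bool:
    """Print a pass/fail line per check; return True if both pass."""
    try:
        with urllib.request.urlopen(TAGS_URL, timeout=timeout) as resp:
            body = resp.read()
    except OSError as exc:  # connection refused, HTTP error, or timeout
        print(f"FAIL  server not reachable at {TAGS_URL}: {exc}")
        return False
    print("PASS  server is responding")
    ok = has_model(body)
    print(("PASS" if ok else "FAIL") + "  llama3 is stored locally")
    return ok


if __name__ == "__main__":
    check()
```

Matching on the exact name or a "llama3:" prefix accepts tags such as llama3:latest or llama3:8b while rejecting different model families such as llama3.1.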

Deploying a Multi-Region Cluster Mesh Capstone Lab: Building the Foundation