How to Run Local AI Models on Your MacBook Pro M1: Practical Tools for Developers

I was trying to figure out how to actually use AI models directly on my MacBook Pro M1 without sending everything to the cloud. Running things locally seems like a big deal, especially for privacy and speed. After reading up on local AI, I found a few projects that look really interesting for developers and for people who just want to experiment.

What are the key tools for running local AI on Apple Silicon?

The main takeaway is that the M1 chip is well suited for this: its unified memory is shared by the CPU, GPU, and Neural Engine, so model weights don't have to be shuffled between separate memory pools, which keeps local inference fast and power-efficient. I looked at a few projects that each cover a different piece of local AI: data retrieval, data management, and building local assistants.

Localizing Knowledge: RAG and Data Management

One area that caught my attention is Retrieval-Augmented Generation, or RAG. RAG connects an LLM (a large language model) to your own private documents: relevant passages are retrieved first, and the model answers based on them instead of on whatever it happened to memorize during training. For developers, I saw a project called `shinpr/mcp-local-rag`, a local RAG server designed for searching code and technical documents using a combination of semantic and keyword search. The cool part is that it's fully private and advertises zero setup, which seems super convenient.
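To make the semantic-plus-keyword idea concrete, here's a minimal sketch of hybrid retrieval in Python. This is not code from `shinpr/mcp-local-rag`; the embedding model, the toy documents, and the 0.7/0.3 blend weights are all my own assumptions, just to show the shape of the technique.

```python
# Minimal hybrid retrieval sketch: semantic similarity + keyword overlap.
# Everything runs on-device; no cloud calls.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

docs = [
    "def connect(host, port): opens a TCP socket to the given host.",
    "The retry decorator re-runs a function up to n times on failure.",
    "Config values are loaded from config.yaml at startup.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model, runs locally
doc_vecs = model.encode(docs, normalize_embeddings=True)

def search(query: str, top_k: int = 2) -> list[str]:
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    semantic = doc_vecs @ q_vec  # cosine similarity, since vectors are normalized
    words = set(query.lower().split())
    keyword = np.array(
        [len(words & set(d.lower().split())) / len(words) for d in docs]
    )
    scores = 0.7 * semantic + 0.3 * keyword  # blend weights chosen arbitrarily here
    return [docs[i] for i in np.argsort(scores)[::-1][:top_k]]

print(search("how does retry work"))
```

The retrieved passages would then be pasted into the LLM's prompt as context; blending the two scores is what lets a query match both exact identifiers and paraphrased descriptions.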

Another piece of the puzzle is managing the data itself. I checked out `LightlyStudio`, a tool that unifies the workflow for curating, annotating, and managing data, especially images and videos. Its core is written in Rust, which keeps it fast, and it reportedly runs well on a MacBook Pro M1 with 16 GB of memory. It supports classification, segmentation, and automatic tagging based on folder structure, which seems like a great way to prepare data before feeding it into an AI model.
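The folder-structure-to-tags idea is easy to picture with a short sketch. This is not LightlyStudio's actual API; the `data/<split>/<class>/` layout is a hypothetical example of how directory names can become labels automatically.

```python
# Derive tags from where a file sits in the directory tree,
# e.g. data/train/cats/01.jpg -> ["train", "cats"].
from pathlib import Path

def tags_from_path(root: Path, file: Path) -> list[str]:
    # every directory between the dataset root and the file becomes a tag
    return list(file.relative_to(root).parts[:-1])

root = Path("data")
media = {".jpg", ".jpeg", ".png", ".mp4"}
dataset = {
    str(p): tags_from_path(root, p)
    for p in root.rglob("*")
    if p.suffix.lower() in media
}

for path, tags in sorted(dataset.items()):
    print(path, tags)
```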

Building Local AI Agents

Finally, there are projects focused on building full local AI assistants. I saw `Jarvis · Flat-Out HUD`, a fully local, voice-driven AI kiosk. It runs a stack of local models (Qwen for language, Whisper for speech-to-text, and Kokoro for text-to-speech), so there's no cloud round-trip for the actual AI inference. It's designed as an on-premise assistant with a Heads-Up Display and integrates with tools for video editing and communication. It shows how you can chain different local models into a powerful, self-contained system.
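Conceptually, that pipeline is just speech in, text through an LLM, speech out. Here's a rough sketch of one turn of that loop, assuming `openai-whisper` and the `ollama` Python client with a Qwen model pulled locally; the `speak` function is a placeholder since I haven't used Kokoro's API, and none of this is Jarvis's actual code.

```python
# One turn of a local voice-assistant loop:
# Whisper (speech-to-text) -> Qwen via Ollama (reply) -> TTS placeholder.
# Assumes: pip install openai-whisper ollama, plus a running Ollama daemon.
import whisper
import ollama

stt = whisper.load_model("base")  # small local speech-to-text model

def speak(text: str) -> None:
    # placeholder: a real kiosk would synthesize audio here (e.g. with Kokoro)
    print(f"[TTS] {text}")

def handle_utterance(wav_path: str) -> str:
    text = stt.transcribe(wav_path)["text"]  # speech -> text, fully on-device
    reply = ollama.chat(
        model="qwen2.5",  # assumes the model was fetched with `ollama pull qwen2.5`
        messages=[{"role": "user", "content": text}],
    )
    answer = reply["message"]["content"]
    speak(answer)  # text -> speech
    return answer

# handle_utterance("question.wav")  # e.g. a clip recorded from the kiosk mic
```

The point of the sketch is the hand-off: each stage is a separate local model, and the orchestration is just plain code gluing their inputs and outputs together.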

Overall, what I'm learning is that running AI locally isn't just about running one big model; it's about building a whole local ecosystem: tools for managing the data (like LightlyStudio), tools for retrieving information (like RAG servers), and systems that orchestrate the whole experience (like Jarvis). It's a lot to take in, and I'm still figuring out the best way to connect these pieces, but the potential for privacy and control is really exciting.