
Running Mistral 7B on AWS

These are my notes from following a very good tutorial on running Mistral 7B on an AWS GPU VM: https://www.youtube.com/watch?v=88ByWjM-KGM&t=617s

After the point where the presenter SSHes into the VM, I iterated on my own until inference ran as a service that accepts HTTP POST requests via Flask.

Model: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

Note: the change below, torch_dtype=torch.float16, is what lets the model fit within a G5 instance's GPU memory. Roughly, 7B parameters at 2 bytes each is ~14 GB in float16 versus ~28 GB in float32, and the G5's A10G GPU has 24 GB, so only the half-precision load fits. The compromise is reduced numerical precision.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16)
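
To sanity-check the load before wiring up a server, here is a minimal generation sketch; the prompt and generation settings are mine, not from the tutorial:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model.to("cuda")

# Mistral Instruct expects the chat template, not a raw prompt
messages = [{"role": "user", "content": "What is an AWS G5 instance?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))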

Set up remote editing (I don't use VSCode):

mkdir ~/remote-server
sshfs ubuntu@34.220.227.12:infer ~/remote-server

# if the mount ever gets in a bad state because you brought down the VM:
sudo diskutil umount force ~/remote-server

Check drivers:

nvidia-smi

SSH to the remote server and install transformers and friends:

python3 -m venv venv
source venv/bin/activate
pip install git+https://github.com/huggingface/transformers torch gunicorn flask
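
A quick sanity check that PyTorch actually sees the GPU (my addition, not from the tutorial):

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"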

NOTE: if Python still can't find the installed modules, python -m pip install blah sometimes works, since it guarantees pip runs under the interpreter you're actually using.

Tear down remote editing:

umount ~/remote-server

MAKING A SERVER:
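
The server itself is infer_server.py. For reference, a minimal sketch of the shape of such a server; the /generate route and the JSON fields are my assumptions, not necessarily what's in the repo:

import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"

# load once at startup so every request reuses the model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda")

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    # expects JSON like {"prompt": "..."}
    prompt = request.get_json()["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=200)
    return jsonify({"response": tokenizer.decode(output[0], skip_special_tokens=True)})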

Test:

gunicorn -w 1 -b 0.0.0.0:8080 infer_server:app
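
Then hit it from another terminal (assuming the /generate route from the sketch above):

curl -X POST http://localhost:8080/generate -H "Content-Type: application/json" -d '{"prompt": "Hello"}'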

Install the infer.service file at /etc/systemd/system/infer.service
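
For reference, a unit file along these lines should work; paths, user, and venv location are assumptions, and the repo's infer.service is the source of truth:

[Unit]
Description=Mistral 7B inference server
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/infer
ExecStart=/home/ubuntu/infer/venv/bin/gunicorn -w 1 -b 0.0.0.0:8080 infer_server:app
Restart=on-failure

[Install]
WantedBy=multi-user.target

(systemd may need a sudo systemctl daemon-reload to pick up a new unit file.)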

sudo systemctl enable infer.service
sudo systemctl start infer.service
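
To confirm it came up (health_check.sh in the repo presumably does something similar, but the /generate route here is still my assumption):

sudo systemctl status infer.service
curl -s -X POST http://localhost:8080/generate -H "Content-Type: application/json" -d '{"prompt": "ping"}'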