NVIDIA GeForce RTX 3090

// 24 GB GDDR6X · 350W TDP · 35.6 TFLOPS FP32
▸ AI VALUE 4.3/5 (ENTHUSIAST · RANK #4.4)
▸ VRAM 24 GB
▸ FP32 35.6 TFLOPS
▸ FP16 35.6 TFLOPS
▸ MEM BW 936 GB/s
▸ TDP 350 W

LLM Inference Performance

Model            Tokens / sec   Local Fit
Mistral 7B Q4    102 tok/s      fits · single GPU
Llama 3 8B Q4    95 tok/s       fits · single GPU
Llama 3 13B Q4   55 tok/s       fits · single GPU
Llama 3 70B Q4   OOM            OOM / offload

Local Model Compatibility

7B params (int) fits
13B params fits
70B (4-bit quant) OOM

Spec Sheet

▸ COMPUTE
▸ ARCHITECTURE Ampere
▸ GPU CHIP GA102
▸ CUDA CORES 10,496
▸ TENSOR CORES 328
▸ BOOST CLOCK 1695 MHz
▸ FP32 35.6 TFLOPS
▸ FP16 / BF16 35.6 TFLOPS
▸ LAUNCH YEAR 2020
▸ MEMORY & RATINGS
▸ VRAM 24 GB GDDR6X
▸ BANDWIDTH 936 GB/s
▸ TIER enthusiast
▸ OVERALL 4.4/5
▸ AI VALUE 4.3/5
▸ GAMING VALUE 4.0/5
▸ POWER
▸ TDP 350 W
▸ PERF/W (FP32) 0.102 TFL/W
▸ MODEL FIT
▸ RUNS 7B (INT) yes
▸ RUNS 13B yes
▸ RUNS 70B (4-bit) no
▸ PLATFORM CUDA · HIP via the CUDA backend (ROCm itself targets AMD GPUs)
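
The FP32 and PERF/W rows can be reproduced from the core count, boost clock, and TDP listed in this sheet. The sketch below assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock at the boost frequency, which is the usual convention for theoretical peak numbers. The function names are illustrative, not part of any tool.

```python
# Figures taken from the spec sheet.
CUDA_CORES = 10_496
BOOST_CLOCK_MHZ = 1695
TDP_WATTS = 350

# Assumption: one fused multiply-add (2 FLOPs) per core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_fp32_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


def perf_per_watt(tflops: float, tdp_watts: float) -> float:
    """Peak TFLOPS delivered per watt of rated TDP."""
    return tflops / tdp_watts


peak = peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_MHZ)
print(f"{peak:.1f} TFLOPS")                               # 35.6 TFLOPS
print(f"{perf_per_watt(peak, TDP_WATTS):.3f} TFLOPS/W")   # 0.102 TFLOPS/W
```

These are paper ceilings at boost clock; sustained throughput in real workloads is typically lower.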

Analysis notes

Quick Summary

The RTX 3090's AI story in 2026 is all about value: 24GB of VRAM and full CUDA support at used prices that newer 24GB cards can’t touch. For local LLM inference, where VRAM capacity rather than raw compute is usually the limiting factor, it remains the smartest dollar-per-gigabyte buy in this database.

Specs That Matter for AI

24GB of GDDR6X runs 7B and 13B quantized models with headroom, and 936 GB/s of bandwidth keeps token generation brisk, since generating each token means streaming the model's weights out of VRAM. At 35.6 TFLOPS FP32 it is slower on paper than newer cards, but for inference, memory capacity and bandwidth matter more than the math rate.
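
A rough way to see why bandwidth sets the pace: if every generated token requires reading each weight once from VRAM, dividing the 936 GB/s figure by the model's size gives an upper bound on tokens per second. The sketch below assumes exactly 4 bits per weight and ignores KV-cache traffic and runtime overhead (real Q4 files are somewhat larger), so treat it as a back-of-envelope ceiling. The function names are illustrative.

```python
MEM_BANDWIDTH_GB_S = 936  # from the spec sheet


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float = MEM_BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/sec, assuming one full pass over the weights per token."""
    return bandwidth_gb_s / weight_size_gb(params_billion, bits_per_weight)


ceiling_8b = decode_ceiling_tok_s(8, 4)
print(f"8B @ 4-bit ceiling: {ceiling_8b:.0f} tok/s")                  # 234 tok/s
print(f"listed 95 tok/s is {95 / ceiling_8b:.0%} of that ceiling")    # 41%
```

The listed 95 tok/s sits well under this ceiling, which is expected given everything the estimate leaves out.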

Performance

Expect ~95 tok/s on Llama 3 8B Q4 and ~55 tok/s on 13B Q4, both comfortably interactive. A 70B Q4 model still overflows 24GB (its weights alone come to roughly 35 GB at 4 bits per parameter), so it remains multi-GPU or offload territory.
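
The fit results follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below applies that to the three model sizes on this page. The `overhead_gb` allowance for KV cache and runtime buffers is a hypothetical placeholder, not a measured value, and real quantized files run slightly larger than the pure 4-bit figure.

```python
VRAM_GB = 24  # RTX 3090


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = VRAM_GB, overhead_gb: float = 2.0) -> bool:
    # overhead_gb is an assumed allowance for KV cache and runtime buffers.
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


for size in (7, 13, 70):
    need = weights_gb(size, 4)
    verdict = "fits" if fits_in_vram(size, 4) else "OOM"
    print(f"{size}B @ 4-bit: ~{need:.1f} GB weights -> {verdict}")
# 7B @ 4-bit: ~3.5 GB weights -> fits
# 13B @ 4-bit: ~6.5 GB weights -> fits
# 70B @ 4-bit: ~35.0 GB weights -> OOM
```

Even with the overhead allowance set to zero, the 70B weights exceed 24 GB, so the OOM result does not depend on that assumption.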

Verdict

For an inference-first build on a budget, a used RTX 3090 is hard to beat: 24GB of VRAM, full CUDA, low cost. Pay for the speed of a 4090 only if your workload is compute-bound or you fine-tune frequently.

Frequently Asked Questions

Why is the RTX 3090 popular for AI?
It pairs 24GB of VRAM with full CUDA support, and used prices have fallen well below newer 24GB cards. That makes it the best dollar-per-gigabyte option for local LLM inference, where VRAM capacity matters more than raw speed.
How fast is the RTX 3090 for LLMs?
Around 95 tok/s on Llama 3 8B Q4, which is slower than a 4090 but plenty responsive. For inference of 7B–13B models the bottleneck is usually VRAM, not compute, so the 3090 punches above its age.
Should I buy a used RTX 3090 in 2026?
For an inference-focused build it is excellent value. Check the card's thermals and fan health, since many were used for mining or heavy compute; GDDR6X runs hot.
