
# 🚀 Essential Containers for Data Engineering (with Docker commands)
This post highlights a set of ready-to-use containers that speed up any data project.
Below is a summary of the commands needed to bring them up.
## 🧰 Technologies and commands
### 🐳 PostgreSQL

For databases.

```bash
docker run \
  --name postgres \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  -d \
  postgres
```
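Once the container is running, a quick way to sanity-check it is to open a `psql` session inside it. A minimal sketch, assuming the default `postgres` superuser created by the command above:

```bash
# Check that the server is accepting connections
docker exec postgres pg_isready -U postgres

# Open an interactive psql shell inside the container
docker exec -it postgres psql -U postgres
```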
### 🐳 MySQL

For databases.

```bash
docker run \
  --name mysql \
  -e MYSQL_ROOT_PASSWORD=root \
  -p 3306:3306 \
  -d \
  mysql
```
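Similarly, you can verify MySQL by opening a client session inside the container. A minimal sketch, assuming the root password set via `MYSQL_ROOT_PASSWORD` above (note that the server takes a moment to initialize on first run):

```bash
# Ping the server to confirm it is up
docker exec mysql mysqladmin ping -uroot -proot

# Open an interactive mysql shell inside the container
docker exec -it mysql mysql -uroot -proot
```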
### 🔄 Apache Airflow

For orchestration.

```bash
docker run -d -p 8080:8080 puckel/docker-airflow webserver
```
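One caveat: `puckel/docker-airflow` is a community image that is no longer maintained. A rough equivalent using the official `apache/airflow` image is its `standalone` subcommand, which bootstraps a metadata database, an admin user, and the webserver in one container; this is a sketch for local testing only, and the tag below is just an example:

```bash
# Sketch using the official image; `standalone` is for local testing only.
# The admin password is auto-generated -- look for it in the container logs.
docker run -d --name airflow -p 8080:8080 apache/airflow:2.9.3 standalone
docker logs airflow
```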
### ⚡ Apache Spark

For distributed processing.

```bash
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
```
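Because this command runs in the foreground (`-it --rm`), the tokenized Jupyter URL is printed straight to your terminal; open it to get a notebook environment with PySpark preinstalled. If you'd rather run it detached, a sketch (the container name is just an example):

```bash
# Detached variant; fetch the tokenized login URL from the logs
docker run -d --name pyspark -p 8888:8888 jupyter/pyspark-notebook
docker logs pyspark 2>&1 | grep token
```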
### 📦 MinIO (S3 compatible)

For S3-like storage.

```bash
docker run -p 9000:9000 -p 9001:9001 \
  -e "MINIO_ROOT_USER=admin" \
  -e "MINIO_ROOT_PASSWORD=password" \
  quay.io/minio/minio server /data --console-address ":9001"
```
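After startup, the web console is available at http://localhost:9001 with the credentials from the command above, and the S3 API itself listens on port 9000. If you have the AWS CLI installed, a minimal sketch of talking to it (the bucket name is just an example):

```bash
# Credentials match MINIO_ROOT_USER / MINIO_ROOT_PASSWORD above
export AWS_ACCESS_KEY_ID=admin
export AWS_SECRET_ACCESS_KEY=password

# Create and list a bucket against the local MinIO endpoint
aws --endpoint-url http://localhost:9000 s3 mb s3://demo
aws --endpoint-url http://localhost:9000 s3 ls
```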
### 📊 Metabase

For quick visualization.

```bash
docker run -d -p 3000:3000 --name metabase metabase/metabase
```
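Metabase takes a little while to initialize its internal database on first boot; follow the logs, then open http://localhost:3000 once startup finishes to create the admin user:

```bash
# Follow the startup logs; browse to http://localhost:3000
# once initialization completes
docker logs -f metabase
```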
## 🧩 A brief explanation

If you’re getting started:
A container is like a portable box that includes everything a tool needs to run without complicated setup.
Instead of installing everything manually, you simply run a command like:

```bash
docker run ...
```

and the service starts the same way on any machine.
This lets you learn, test, and deploy data solutions without headaches.
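The day-to-day lifecycle is the same for every container above; a quick sketch using the postgres container as the example:

```bash
docker ps               # list running containers
docker logs postgres    # inspect a container's output
docker stop postgres    # stop it, keeping its state
docker start postgres   # bring it back with the same settings
docker rm -f postgres   # remove it entirely
```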
More information at the link 👇
Also published on LinkedIn.

