
Containers for Data Engineering


🚀 Essential Containers for Data Engineering (with Docker commands)

This post highlights a set of ready-to-use containers that speed up any data project.
Below is a summary of the commands needed to bring them up.

🧰 Technologies and commands

🐳 PostgreSQL

For databases

docker run \
    --name postgres \
    -e POSTGRES_PASSWORD=postgres \
    -p 5432:5432 \
    -d \
    postgres
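Once the container is up, you can check that the database is reachable by opening a psql session inside it (the container name `postgres` comes from the `--name` flag above):

```shell
# Open an interactive psql session inside the running container
docker exec -it postgres psql -U postgres

# Or run a one-off query to confirm the server is alive
docker exec postgres psql -U postgres -c "SELECT version();"
```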

🐳 MySQL

For databases

docker run \
    --name mysql \
    -e MYSQL_ROOT_PASSWORD=root \
    -p 3306:3306 \
    -d \
    mysql
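MySQL takes a little while to initialize on first start. Once it is ready, you can run a quick query as root (the password matches `MYSQL_ROOT_PASSWORD` above):

```shell
# Run a one-off query inside the container to verify the server is up
docker exec mysql mysql -uroot -proot -e "SHOW DATABASES;"
```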

🔄 Apache Airflow

For orchestration

docker run -d -p 8080:8080 puckel/docker-airflow webserver
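Note that `puckel/docker-airflow` is a community image that is no longer maintained. A sketch using the official `apache/airflow` image instead (the version tag is illustrative; `standalone` is meant for local testing, not production):

```shell
# Official image; `standalone` bootstraps the metadata database, creates an
# admin user (credentials are printed in the logs), and starts the webserver
# and scheduler in a single container.
docker run -it -p 8080:8080 apache/airflow:2.9.3 standalone
```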

⚡ Apache Spark

For distributed processing

docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
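After the container starts, Jupyter prints a URL with an access token in the logs; open it at http://localhost:8888. The same image can also drop you straight into a PySpark shell instead of the notebook server (assuming the image's entrypoint passes the command through, as the Jupyter Docker Stacks images do):

```shell
# Start an interactive PySpark REPL instead of the notebook server
docker run -it --rm jupyter/pyspark-notebook pyspark
```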

📦 MinIO (S3 compatible)

For S3-like storage

docker run -p 9000:9000 -p 9001:9001 \
  -e "MINIO_ROOT_USER=admin" \
  -e "MINIO_ROOT_PASSWORD=password" \
  quay.io/minio/minio server /data --console-address ":9001"
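With MinIO running, any S3 client can talk to it at http://localhost:9000. A sketch using the AWS CLI (assumes it is installed locally; the bucket name is illustrative):

```shell
# Point the AWS CLI at the local MinIO endpoint using the credentials above
export AWS_ACCESS_KEY_ID=admin
export AWS_SECRET_ACCESS_KEY=password

# Create a bucket and list the buckets on the server
aws --endpoint-url http://localhost:9000 s3 mb s3://demo-bucket
aws --endpoint-url http://localhost:9000 s3 ls
```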

📊 Metabase

For quick visualization

docker run -d -p 3000:3000 --name metabase metabase/metabase
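Metabase takes a minute or two to initialize on first start. You can follow the logs until it is ready, then open http://localhost:3000 to complete the setup wizard:

```shell
# Follow the startup logs until Metabase reports it is ready
docker logs -f metabase

# Optional readiness check: /api/health should return {"status":"ok"}
curl http://localhost:3000/api/health
```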

🧩 A brief explanation

If you’re getting started: a container is like a portable box that packages everything a tool needs to run, with no complicated setup.
Instead of installing each tool manually, you simply run a command like:

docker run ...

and the service starts the same way on any machine.
This lets you learn, test, and deploy data solutions without headaches.
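As a concrete breakdown, here is what each flag in a typical docker run command does, using the PostgreSQL example from above:

```shell
# --name       gives the container a readable name you can reference later
# -e KEY=VAL   sets an environment variable inside the container
# -p HOST:CONT publishes a container port on the given host port
# -d           runs the container detached, in the background
# last arg     the image to run (pulled from Docker Hub if not already local)
docker run --name postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres
```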

More information at the link 👇

Also published on LinkedIn.
Author
Juan Pedro Bretti Mandarano