Hi, my name is Emeka.

I build reliable data platforms.

I am a data and platform engineer specializing in production lakehouse architectures, cloud infrastructure, streaming and CDC pipelines, and analytics systems that are meant to be used by real people.

Berlin-based · AWS + GCP · Platform engineering

Data and Platform Engineer

Building cloud systems that are clear, trusted, and usable.

Enterprise workflow

What I build: data moving safely from source systems into trusted analytics products.

Terraform: multi-environment platform foundation
GitHub Actions: CI/CD, validation, and controlled deployments
Airflow: orchestration, scheduling, operational flow
BigLake: governance, external access, fine-grained control

Source (PostgreSQL): application records and transactional data
Change feed (CDC + Kafka): event capture, streaming, and reliable delivery
Bronze (S3 + GCS lake): raw zones and historical storage for traceability and replay
Silver (PySpark cleanup): validated, deduplicated, analytics-ready records
Gold (dbt models): business-ready marts for reporting and reuse
Warehouse (BigQuery + Redshift + Athena): trusted models exposed for analytics; Athena serves SQL over the lake
Serving (Agent + BI): dashboards, APIs, and natural-language analytics

AWS production platform · GCP enterprise rebuild · Dev, staging, and prod discipline

4+ years building end-to-end data pipelines and platform systems

1M+ daily records handled in real-time platform work using Databricks and Kinesis

AWS · GCP: cloud environments used across architecture, delivery, and platform learning

LLM analytics: a natural-language analytics agent built on top of curated Gold data

01. About

I build reliable data platforms with a strong focus on cloud architecture, medallion data layers, CDC pipelines, analytics serving, and developer workflows. My best work sits at the intersection of engineering depth and clarity: I care about systems that not only run but can also be understood, maintained, and improved by others.

Recently, that has meant building a full AWS enterprise data platform repo by repo, then recreating the same ideas in GCP to deepen my platform understanding in a new cloud environment. I like work that goes beyond isolated ETL jobs into full platform design: infrastructure, identity, transformation, orchestration, serving, and docs.

Beyond platform engineering, I have expanded into LLM application development, shipping a natural-language analytics agent on top of a curated Gold layer so users can ask questions in plain English and get back answers, SQL, charts, and reports.

Core strengths

Production lakehouse architecture, streaming pipelines, infrastructure as code

Warehouses and platforms

Snowflake, BigQuery, Redshift, Databricks, Athena, AWS, GCP

Languages and tools

Python, SQL, Terraform, PySpark, dbt, Kafka, Airflow, Streamlit, FastAPI

02. Experience

2025 – Present

Independent Data and Platform Engineer

Architecting a production AWS medallion lakehouse platform: PostgreSQL changes captured with DMS CDC into a Bronze S3 layer, Glue PySpark transformations into Silver, dbt on Athena for Gold, and Redshift Serverless with Spectrum. The platform is provisioned through modular Terraform across Dev, Staging, and Prod, with GitHub Actions CI/CD.

AWS · GCP · Terraform · Kafka · dbt · FastAPI

2022 – 2024

Data Engineer · Digital Spine GmbH

Built a real-time IoT data platform using Databricks Asset Bundles and Amazon Kinesis, processing over one million elevator telemetry records daily with sub-minute latency. Established GitHub Actions CI/CD for Databricks and dbt, enforced data-quality checks, and built Power BI dashboards to support operational decisions.

Databricks · Kinesis · PySpark · Power BI · GitHub Actions

2020 – 2022

Data Specialist · Potters Real Estate Limited

Built ELT workflows using dbt and AWS to centralize and standardize real estate data, replacing manual processes with automated pipelines. Deployed dashboards that contributed to stronger lead generation and better decision support.

dbt · S3 · Redshift · Power BI

03. Work

Core platform

Enterprise Data Platform on AWS

A full multi-repo platform with PostgreSQL via DMS CDC into Bronze, Glue PySpark into Silver, dbt on Athena into Gold, and an analytics agent on top.

Infrastructure

terraform-platform-infra-live

Networking, data lake buckets, IAM, serving infrastructure, monitoring, and the overall AWS platform foundation.

Analytics agent

platform-analytics-agent

Natural-language analytics with SQL guardrails, cost tracking, charts, interactive sessions, and PDF reporting.

PySpark and CDC

platform-glue-jobs

Shared CDC reconciliation logic and six Glue jobs that turn raw change events into clean current-state Silver tables.
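
The core idea behind CDC reconciliation can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the code in platform-glue-jobs: it assumes DMS-style change records whose operation is `I`, `U`, or `D`, plus a hypothetical `commit_ts` ordering column and integer `key`. For each key it keeps only the most recent event and drops keys whose latest event is a delete, which leaves the current state of every row.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ChangeEvent:
    op: str         # "I" insert, "U" update, "D" delete (DMS-style operation codes)
    commit_ts: int  # hypothetical ordering column: a higher value means a later change
    key: int        # hypothetical primary key of the source row
    row: dict       # column values after the change


def reconcile(events: Iterable[ChangeEvent]) -> dict[int, dict]:
    """Collapse change events into current-state rows keyed by primary key."""
    latest: dict[int, ChangeEvent] = {}
    for event in events:
        current = latest.get(event.key)
        # Keep only the newest event per key, even if events arrive out of order.
        if current is None or event.commit_ts > current.commit_ts:
            latest[event.key] = event
    # A key whose newest event is a delete no longer exists in the current state.
    return {key: ev.row for key, ev in latest.items() if ev.op != "D"}


if __name__ == "__main__":
    events = [
        ChangeEvent("I", 1, 10, {"id": 10, "status": "new"}),
        ChangeEvent("U", 3, 10, {"id": 10, "status": "active"}),
        ChangeEvent("I", 2, 20, {"id": 20, "status": "new"}),
        ChangeEvent("D", 4, 20, {"id": 20, "status": "new"}),
        ChangeEvent("U", 2, 10, {"id": 10, "status": "pending"}),  # late, older change
    ]
    print(reconcile(events))  # prints {10: {'id': 10, 'status': 'active'}}
```

In a PySpark job the same keep-latest rule is usually written as a window instead of a Python loop: `row_number()` partitioned by the key and ordered by the timestamp descending, keeping row 1 and then filtering out deletes.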

Session lifecycle

platform-session-orchestrator

GitHub Actions workflows that bring the platform up for a working session and tear it down cleanly to control cost.

Current lab

Enterprise rebuild on GCP

Ongoing private work translating the same platform ideas to GCP, covering foundation and bootstrap setup, BigLake governance, and strong multi-environment discipline.

04. Contact

Get in touch or explore the work in more detail.

If you are interested in data platforms, cloud data engineering, analytics systems, or practical enterprise architecture, the links below are the best places to reach me or browse my work.

Direct email

For collaboration, platform discussions, or project questions.

nweke.edeh@gmail.com

Professional profile

Background, experience timeline, and career highlights.

linkedin.com/in/edeh

GitHub

Code and build work

Repositories, implementation details, and platform experiments.

github.com/ChuquEmeka
