Khang Do

Data Engineer building production pipelines on Azure & GCP. Currently: real-time fraud detection + live market analytics.

About Me
I am Khang Do, a Data Engineer. I build pipelines that run in production.

I've spent the last year building production data pipelines on Azure and GCP: designing ETL/ELT architectures, building real-time fraud detection that runs in under 5 seconds, and adding observability layers that tell you when your data is stale before your stakeholders do. I am currently at Kyanon Digital, delivering data solutions across real estate analytics and enterprise domains.

Technical Skills

Core competencies built through production experience.

Programming

Python, SQL, PySpark

Data Engineering

Airflow, dbt, Spark, Kafka, Delta Lake

Cloud & Infra

Azure (Synapse, ADF, ADLS), GCP (BigQuery), Docker, Terraform

Databases & Stores

PostgreSQL, DuckDB, Redis, Supabase

Featured Projects

A selection of data engineering projects showcasing end-to-end pipeline development, automation, and cloud infrastructure expertise.

FinFlow

Lambda Architecture Fintech Platform

  • Real-time fraud detection with Kafka + Spark Structured Streaming
  • Batch analytics pipeline with Airflow + dbt (Bronze/Silver/Gold layers)
  • Self-monitoring observability: schema drift detection + freshness SLA alerts
Kafka, PySpark, Airflow, dbt, Redis, FastAPI
LIVE
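
The two observability checks named above, schema drift detection and freshness SLA alerts, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not FinFlow's production code: the function names, the column names, and the 15-minute SLA are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def detect_schema_drift(expected: Dict[str, str], observed: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {column: type} mappings and report how they differ."""
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    retyped = sorted(
        col for col in set(expected) & set(observed)
        if expected[col] != observed[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def is_stale(last_loaded_at: datetime, sla: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True when the newest load is older than the freshness SLA."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at > sla


# Hypothetical schemas: "amount" changed type and "channel" is a new column.
expected_schema = {"txn_id": "string", "amount": "double", "ts": "timestamp"}
observed_schema = {"txn_id": "string", "amount": "string", "ts": "timestamp", "channel": "string"}
drift = detect_schema_drift(expected_schema, observed_schema)

# Hypothetical freshness check: last load 30 minutes ago against a 15-minute SLA.
check_time = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = is_stale(check_time - timedelta(minutes=30), timedelta(minutes=15), now=check_time)

if stale or any(drift.values()):
    print(f"ALERT stale={stale} drift={drift}")
```

In a real pipeline the observed schema and the last-load timestamp would come from the table's metadata, and the alert would go to a pager or chat channel rather than standard output.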

Vietnam Market Pulse

Real-time Financial Data Platform

  • Live BTC/ETH/BNB prices in VND via Supabase Realtime WebSocket
  • USD/VND exchange rates updated every 15 minutes via GitHub Actions pipeline
  • VN-Index tracking with pipeline health monitoring dashboard
Python, Supabase, GitHub Actions, Next.js, Recharts

Enterprise Data Sync Pipeline

End-to-end data synchronization pipeline for employee, member roles, and account management with comprehensive failure logging.

  • Designed Role/Member-Role Entity schema
  • Built Employee & Account Role sync pipeline
  • Implemented failure logging system
ETL, Database Design, Data Sync

ELT Pipeline with dbt

Modern Data Stack implementation with a containerized ELT system, automated workflows, and real-time data quality monitoring.

  • 100% automated daily workflows via Airflow
  • Star Schema transformation with dbt
  • Data Quality Framework (6 dimensions)
dbt, Airflow, Docker, Star Schema

CMS Migration Automation

Test automation framework for CMS migration using SeleniumBase, enabling automated QC and content verification.

  • Automated login & page creation flows
  • QC scripts for CMS migration
  • Content consistency verification
SeleniumBase, Python, QA Automation

Legal Data Platform

Backend development and refactoring for EU legal data platform, including web crawlers, authentication, and subscription systems.

  • Refactored EU legal web crawlers
  • Improved authentication & billing
  • Role-based access control system
Python, Web Scraping, Backend, RBAC

Cloud Analytics Audit

Data stack audit and optimization for analytics infrastructure, including BigQuery, Fivetran, and BI tool evaluation.

  • Audited data architecture
  • Resolved cloud data warehouse connectivity issues
  • Hex vs Looker Studio comparison
BigQuery, Fivetran, Looker, Mixpanel

Azure Data Lakehouse for Wood Import & Export

End-to-end modern data lakehouse architecture on Microsoft Azure for tracking and analyzing wood supply chain operations.

  • Architected Medallion (Bronze/Silver/Gold) layer
  • Automated ETL workflows via Azure Data Factory
  • Optimized data transformations using Databricks
Azure, Databricks, ADF, Medallion Architecture

Experience & Growth

Career progression focusing on data engineering, cloud infrastructure, and building scalable data systems.

Data Engineer / Azure Data Lakehouse for Wood Import & Export

Kyanon Digital | Nov 2025 - Apr 2026

  • Engineered a scalable Medallion ETL pipeline using Azure Synapse Analytics and PySpark for automated ingestion and dynamic schema mapping.
  • Optimized Spark on constrained hardware, eliminating critical OOM errors in heavy workloads (more than 1,400 stages) by writing reliable checkpoints to ADLS.
  • Orchestrated resource-efficient workflows (Parallel Ingest & Sequential ETL) and integrated Azure Monitor for automated alerting.
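
The "Parallel Ingest & Sequential ETL" pattern in the last bullet can be illustrated with the standard library alone. This is a rough sketch of the scheduling idea, not the Synapse pipeline itself: `ingest`, `to_silver`, `to_gold`, and the source names are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest(source: str) -> str:
    """Hypothetical ingest step: pretend to land one source in the bronze layer."""
    return f"bronze/{source}"


def to_silver(paths):
    """Hypothetical transform: bronze to silver."""
    return [p.replace("bronze/", "silver/") for p in paths]


def to_gold(paths):
    """Hypothetical transform: silver to gold."""
    return [p.replace("silver/", "gold/") for p in paths]


def run_pipeline(sources, transforms, max_workers=4):
    # Phase 1: ingest sources concurrently, since they are independent of each other.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        landed = list(pool.map(ingest, sources))  # map preserves input order

    # Phase 2: run transforms one at a time so only one heavy step is active at once.
    state = landed
    for transform in transforms:
        state = transform(state)
    return state


result = run_pipeline(["orders", "shipments"], [to_silver, to_gold])
print(result)
```

The design choice is that ingestion is usually I/O bound and safe to fan out, while the transform stages compete for memory, so serializing them keeps peak usage predictable on constrained hardware.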

Data Engineer / Cloud Analytics Audit

Kyanon Digital | Aug 2025 - Nov 2025

  • Audited the cloud data architecture and resolved critical pipeline connectivity issues between Fivetran and BigQuery.
  • Led BI tool evaluation (Hex vs Looker Studio) and revamped Mixpanel dashboards to optimize product analytics reporting.
  • Authored comprehensive technical documentation covering data validation, architecture logic, and end-user guides.

Automation Engineer / CTI CMS Automation

Kyanon Digital | Jun 2025 - Aug 2025

  • Architected a robust testing framework using SeleniumBase to automate secure authentication flows and environment setups.
  • Engineered automation scripts to bulk-generate product categories and orchestrate page tree creation/approval workflows in the new CMS.
  • Developed automated Quality Control (QC) scripts to validate URL integrity and cross-verify content consistency during the legacy-to-new CMS migration.

Data Engineer / Modern ELT Pipeline & Data Quality Framework

Kyanon Digital | Mar 2025 - May 2025

  • Engineered a containerized Modern Data Stack using Docker and Airflow, achieving 100% automated daily workflows and fully decoupling operational from analytical databases.
  • Architected BI-ready Star Schema models with dbt, establishing a scalable, modular, and maintainable data transformation codebase.
  • Implemented a robust Data Quality framework to automatically validate 6 key quality dimensions, integrated with real-time anomaly detection and alerting mechanisms.

B.E. in Information Technology

Ho Chi Minh City University of Technology and Education | Sep 2021 - Jul 2025

  • Relevant coursework: Database Systems, Data Structures, Cloud Computing
  • Completed capstone project on data pipeline automation

Career Goal

Lead the design and implementation of end-to-end data infrastructures for complex projects. Master the Modern Data Stack ecosystem to deliver optimized, scalable solutions.