Tag: PostgreSQL

Data profiling is an important but often overlooked component of ETL pipelines and exploratory data analysis (EDA). It provides a way to look into the data and understand its structure, inter-relationships and dependencies. It can also uncover data quality issues that may arise inside a data pipeline during migration, preventing data…

Data Engineering

PostgreSQL is one of the most popular and feature-rich open-source relational databases. It supports different types of encodings, e.g. ‘SQL_ASCII’, ‘UTF8’, ‘LATIN1’, ‘EUC_KR’ etc. Joel Spolsky has a must-read article about Unicode and character encoding, but basically a character encoding is a mapping between a set of bytes and their corresponding characters. Without…

Database

An ETL process usually involves a few discrete tasks that rely on each other. One way to run them is to use crontab (or a Kubernetes CronJob), where each task is scheduled to run at a specific time. However, ensuring inter-dependencies among the tasks, where one task should only start when the previous one has finished, is not…

Data Engineering

PostgreSQL supports JSON (JavaScript Object Notation), which lets you store semi-structured or unstructured data in a table and gives applications greater flexibility and NoSQL-like features. Currently, there are two JSON data types in PostgreSQL: JSON and JSONB. In this post, I want to try some common operations for…

Database

As data engineers, we have to collect data from different types of sources and often have to come up with custom data pipelines and ETL tools to move data from one system to another and consolidate it into a single data warehouse. The sources can be conventional relational databases, NoSQL databases or message bus…

Data Engineering