Extracting data from PostgreSQL to Amazon Redshift
Extracting data from PostgreSQL is a foundational step in many modern data integration projects, particularly when centralizing analytics in data warehouses like Amazon Redshift. Establishing an efficient and reliable data pipeline ensures that your organization can benefit from timely, accurate insights while minimizing operational complexity. In this article, we will guide you through the essential steps for moving your PostgreSQL data into Redshift. We begin with the creation of an identity in PostgreSQL, ensuring secure and controlled access for your extraction process. For users of Matillion, a widely adopted ETL platform, we discuss how to check for (or acquire) the necessary JDBC driver that enables connectivity between services. Next, we examine network connectivity requirements to allow seamless data transfer from your PostgreSQL source to the Redshift target. Finally, we cover techniques for querying data, handling both the initial full load and subsequent incremental updates. Whether you are setting up your pipeline for the first time or refining an existing process, this article provides practical guidance to streamline your PostgreSQL to Redshift data integration.
What is PostgreSQL?
PostgreSQL is a powerful, open-source relational database known for advanced SQL features like transactional integrity, multi-version concurrency control (MVCC), and diverse indexing, including full-text search. Highly extensible, it supports custom data types and procedural languages, making it adaptable for varied applications. Adhering to ACID principles and SQL standards, PostgreSQL ensures reliable, consistent, and interoperable data management. Its active community drives frequent updates and a broad ecosystem of tools, solidifying its reputation as a top choice for scalable, reliable, and feature-rich database solutions in both enterprise and open-source environments.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse optimized for fast querying and analysis of large datasets. Built on PostgreSQL, it uses columnar storage, parallel processing, and compression for high-performance analytic workloads. Redshift ingests data from sources like Amazon S3, DynamoDB, and on-premises databases, and supports complex SQL queries via BI tools within AWS. Key features include automatic scaling, workload management, and materialized views, supporting enterprise BI and real-time analytics. Its architecture enables seamless connectivity and integration with AWS services, allowing users to build robust end-to-end analytics pipelines.
Why Move Data from PostgreSQL into Amazon Redshift
The Advantages of Transferring Data from PostgreSQL to Redshift for Analytics
A data engineer or architect may choose to copy data from PostgreSQL into Amazon Redshift for several compelling reasons. Firstly, the data housed within PostgreSQL often contains valuable information that can inform business decisions and drive analytical insights. By transferring this data to Redshift, it becomes possible to integrate PostgreSQL data with other enterprise data sources in a centralized data warehouse, thereby amplifying its value through comprehensive analysis and cross-source correlations. Importantly, performing complex queries, reporting, and data integration operations directly on the transactional PostgreSQL database can negatively impact its performance and affect critical business applications. By leveraging Redshift’s high-performance, scalable architecture for analytical workloads, organizations can ensure that operational databases remain unburdened while still gaining timely and actionable intelligence from their data.
Creating a User in PostgreSQL
This guide provides step-by-step instructions for creating a user (role) in a PostgreSQL database using the PostgreSQL command-line interface (psql). These steps assume you have administrative (superuser or sufficient privilege) access to the database server.
1. Connect to PostgreSQL
First, access your PostgreSQL environment via the terminal:
```bash
psql -U postgres
```
- Replace `postgres` with your superuser or admin username if different.
2. Check Existing Roles
To list existing roles, run:
\du
3. Create a New User
Use the `CREATE ROLE` statement or its shortcut `CREATE USER` to create a new database user. Here is an example:
CREATE USER username WITH PASSWORD 'user_password';
- Replace `username` with the desired PostgreSQL username.
- Replace `user_password` with a strong, secure password.
Example:
CREATE USER reporting_user WITH PASSWORD 'Str0ngP@ssw0rd!';
4. Grant Privileges (Optional)
By default, the new user cannot create databases or modify schemas. To grant additional privileges, use:
Grant Permission to Create Databases
ALTER USER username CREATEDB;
Grant Permission to Login (Almost always required for users):
The `LOGIN` privilege is included by default when using `CREATE USER`. To be explicit, you can write:
CREATE ROLE username WITH LOGIN PASSWORD 'user_password';
Grant User All Privileges on a Specific Database
GRANT ALL PRIVILEGES ON DATABASE database_name TO username;
Example:
GRANT ALL PRIVILEGES ON DATABASE mydb TO reporting_user;
5. Verify the New User
To confirm the new user was created successfully, run:
\du
The new username should appear in the list of roles.
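If you prefer a plain SQL check (for example, from a client that does not support psql meta-commands), you can query the pg_roles catalog view instead; the role name below is the reporting_user example from earlier.

```sql
-- Confirm the role exists and can log in (reporting_user is the example role created above)
SELECT rolname, rolcanlogin, rolcreatedb
FROM pg_roles
WHERE rolname = 'reporting_user';
```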
Note:
Managing users in PostgreSQL is a privileged operation. Only superusers or users with adequate rights can create and manage roles and permissions. For more advanced configurations—such as role inheritance or restricting user access to specific tables or schemas—refer to the PostgreSQL documentation on roles and privileges.
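For a pipeline that only extracts data, read-only access is usually sufficient. As a minimal sketch, assuming the source tables live in the public schema and reusing the reporting_user example role, the grants might look like this:

```sql
-- Allow the extraction role to see objects in the schema and read existing tables
GRANT USAGE ON SCHEMA public TO reporting_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_user;

-- Optionally extend SELECT to tables created in this schema in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO reporting_user;
```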
Checking the JDBC Driver
The PostgreSQL JDBC driver is already installed in Matillion Data Productivity Cloud, so there is no need to download or install it separately. To confirm the presence of the JDBC driver, follow these steps:
- In Matillion Data Productivity Cloud, create a new Orchestration pipeline.
- Drag and drop a "Database Query" component into the workspace.
- Open the configuration for the "Database Query" component and locate the "Database Type" dropdown list.
- Check that "PostgreSQL" appears as an option in the dropdown list. This confirms that the corresponding JDBC driver is available.
If you would like to check the version of the JDBC driver in your environment, follow the instructions provided in the Matillion documentation: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/#database-driver-versions
Checking network connectivity
To enable Matillion Data Productivity Cloud to connect to your PostgreSQL database, you must ensure the database accepts incoming connections according to your deployment configuration:
- Full SaaS Agent Configuration: Configure your PostgreSQL database (and any relevant firewalls or network controls) to allow incoming connections from the IP addresses published by Matillion for Full SaaS environments. You can find the current list of required IP addresses here: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS Deployment: In this deployment model, connections to your PostgreSQL database originate from within your own virtual private cloud (VPC) where the Hybrid SaaS agent is running. You must allow incoming connections from the appropriate VPC resources. For assistance with network diagnostics, you can use the utilities available at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Note: If your PostgreSQL database is referenced using a DNS address (rather than a direct IP), ensure that the Full SaaS or Hybrid SaaS agent can resolve the hostname to the correct address. Network DNS settings must be configured to allow proper name resolution within your chosen deployment.
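Beyond firewalls, the PostgreSQL server itself must be configured to accept remote connections (listen_addresses in postgresql.conf plus a matching pg_hba.conf rule). As a quick, hedged sanity check, the queries below can be run as a superuser; note that pg_hba_file_rules requires PostgreSQL 10 or later.

```sql
-- Check whether the server listens on more than localhost
SHOW listen_addresses;

-- Review the client authentication rules currently loaded from pg_hba.conf (PostgreSQL 10+)
SELECT type, database, user_name, address, auth_method
FROM pg_hba_file_rules;
```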
Querying Data from a PostgreSQL Database
To efficiently retrieve and replicate data from a PostgreSQL database—especially when integrating with platforms like Amazon Redshift—follow the best practices outlined below.
Example SELECT Queries (PostgreSQL)
Here are common SQL `SELECT` statements used to query data from PostgreSQL:
```sql
-- Select all columns from a table
SELECT * FROM public.customers;

-- Select specific columns from a table
SELECT id, first_name, last_name FROM public.customers;

-- Filter data using WHERE
SELECT * FROM public.orders WHERE order_date >= '2024-01-01';

-- Join two tables
SELECT o.id, o.order_date, c.first_name, c.last_name
FROM public.orders o
JOIN public.customers c ON o.customer_id = c.id
WHERE o.order_status = 'SHIPPED';
```
Datatype Conversion
When replicating or synchronizing data from PostgreSQL to Amazon Redshift, pay attention to datatype conversions. Some PostgreSQL datatypes do not have a direct equivalent in Redshift and may require explicit casting or adjustment. For example:
- `serial` or `bigserial` (PostgreSQL) ⇨ `INTEGER` or `BIGINT` (Redshift)
- `text` ⇨ `VARCHAR`
- `timestamp with time zone` ⇨ `TIMESTAMPTZ`
Consult the official documentation for detailed mapping between PostgreSQL and Redshift datatypes.
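Where a type has no direct equivalent, an explicit cast in the extraction query keeps the mapping predictable. The example below is only a sketch against a hypothetical public.orders table; the column names and the VARCHAR length are illustrative assumptions.

```sql
-- Cast columns during extraction so they map cleanly onto Redshift types
SELECT
    id::BIGINT                     AS id,                 -- serial/bigserial -> BIGINT
    notes::VARCHAR(65535)          AS notes,              -- text -> bounded VARCHAR
    order_modified_at::TIMESTAMPTZ AS order_modified_at   -- timestamp with time zone -> TIMESTAMPTZ
FROM public.orders;
```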
Loading Patterns: Initial and Incremental Loads
The optimal loading strategy from PostgreSQL to Redshift (or similar targets) is a two-stage approach:
1. Initial (Once-Off) Load
- Extract the entire dataset from the source table.
- Use the Database Query component without a filter clause:
SELECT * FROM public.orders;
- All available rows are retrieved and replicated.
2. Incremental Loads
- Only extract and replicate data that has changed (new or updated) since the last load.
- Use the Database Query component with a filter clause referencing a suitable column (typically a timestamp or incrementing ID):
SELECT * FROM public.orders
WHERE order_modified_at > '2024-06-01 00:00:00';
Adjust the value in the filter dynamically with each load, so that only newly changed rows are processed; one way to obtain that value is sketched below.
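A common way to derive that filter value, sketched here under the assumption that the Redshift target table public.orders already holds the previously loaded rows, is to read the high-water mark from the target and feed it into the next extraction (for example via a pipeline variable):

```sql
-- Run against the Redshift target; the result replaces the literal timestamp
-- in the WHERE clause of the next incremental extraction from PostgreSQL
SELECT MAX(order_modified_at) AS last_loaded_at
FROM public.orders;
```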
Further Reading: A detailed explanation of these patterns can be found on Matillion Exchange: Incremental Load Data Replication Strategy
By following the above query patterns and load strategies, and understanding datatype implications, your data integrations with PostgreSQL will be robust and performant.
Data Integration Architecture
Loading data in advance of integration represents a "divide and conquer" strategy, where the challenge of integrating heterogeneous data sources is separated into two distinct steps. This approach underpins the Extract, Load, and Transform (ELT) architecture, delivering clear advantages over traditional models. Data is first extracted and loaded efficiently into Amazon Redshift, and only then subjected to necessary integration and transformation. Data integration inherently requires transformation—for example, harmonizing different formats or applying business rules—and this is best achieved through robust data transformation pipelines. In the ELT paradigm, all data transformation and integration processes are executed within the target Redshift database itself. This confers further benefits: transformations are fast, on-demand, and naturally scalable with the Redshift cluster, while organizations avoid the cost and operational complexity of maintaining separate data processing infrastructure externally.
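To make the ELT idea concrete, here is a small sketch of a transformation executed inside Redshift after the load step. It assumes hypothetical staged tables stage.customers and stage.orders and an analytics schema; all names are illustrative only.

```sql
-- Example in-warehouse (ELT) transformation: reshape staged data into an analytics table
CREATE TABLE analytics.customer_orders AS
SELECT
    c.id              AS customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.id)       AS order_count,
    MAX(o.order_date) AS latest_order_date
FROM stage.customers c
LEFT JOIN stage.orders o
    ON o.customer_id = c.id
GROUP BY c.id, c.first_name, c.last_name;
```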