
Upgrade Your Amazon Redshift Cluster from DC1 to DC2 Nodes

Improve Query Performance at No Additional Cost


Our Redshift data warehouse is still relatively small: about 40 GB of disk space used, which for us is about 1 billion rows. That means we only need a single-node cluster. When I created the cluster in September 2017, DC1 was the only Dense Compute generation available.

While perusing the Stitch blog yesterday, I saw an article about upgrading from DC1 to the new DC2 node type. (Stitch is an ETL, or data pipeline, tool that moves data from cloud applications into the data warehouse of your choice.) DC2 nodes cost the same as DC1 nodes of the same size, but I/O throughput is significantly higher. The one spec that goes down is ECU on the 8xlarge size: 104 units on dc1.8xlarge versus 99 units on dc2.8xlarge.

Upgrading to DC2 is simple. In the AWS Console, from the dashboard for your Redshift cluster, click the “Cluster” dropdown and select “Resize.”

Choose the node type (dc2.large for us) and the number of nodes, then click Resize.

Redshift - Resize Cluster Modal
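
If you would rather script the resize than click through the console, the same request can probably be made with boto3. This is a minimal sketch, not what we actually ran: it assumes boto3 is installed and AWS credentials are configured, the helper names `build_resize_request` and `resize_cluster` are made up for illustration, and `my-cluster` in the usage comment is a placeholder identifier. It relies on the Redshift `modify_cluster` call, which starts a resize when you change the node type or node count.

```python
def build_resize_request(cluster_id, node_type, number_of_nodes=1):
    """Build keyword arguments for boto3's Redshift modify_cluster call.

    Hypothetical helper: kept separate from the API call so the request
    can be inspected without touching AWS.
    """
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    params = {"ClusterIdentifier": cluster_id, "NodeType": node_type}
    if number_of_nodes == 1:
        # A single-node cluster takes no NumberOfNodes argument.
        params["ClusterType"] = "single-node"
    else:
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = number_of_nodes
    return params


def resize_cluster(cluster_id, node_type, number_of_nodes=1):
    """Submit the resize request. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("redshift")
    request = build_resize_request(cluster_id, node_type, number_of_nodes)
    return client.modify_cluster(**request)


# Example (placeholder identifier, not our real cluster):
# resize_cluster("my-cluster", "dc2.large")
```

As with the console, the cluster goes read-only once the request is accepted, so the same off-hours advice applies.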

The resize puts your cluster in read-only mode until it finishes, which took a little over an hour for us, so schedule it during off-hours. The operation appears to restart your cluster with both the old nodes and the new nodes, then transfers the data from the old to the new. For us, the restart took about 10 minutes and the data transfer took about an hour for 40 GB of data. You can view progress under the Status tab.

Redshift - Resize Cluster Status Tab
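
The progress shown in the Status tab should also be available through the Redshift `describe_resize` API. The sketch below is an assumption-laden illustration rather than something we used: `summarize_resize` and `fetch_resize_status` are hypothetical helper names, and it assumes boto3 is installed with credentials configured. It reads the `Status`, `ProgressInMegaBytes`, and `TotalResizeDataInMegaBytes` fields from the response and turns them into a one-line summary.

```python
def summarize_resize(response):
    """Turn a describe_resize response dict into a short progress string."""
    status = response.get("Status", "UNKNOWN")
    total_mb = response.get("TotalResizeDataInMegaBytes") or 0
    done_mb = response.get("ProgressInMegaBytes") or 0
    if total_mb > 0:
        percent = 100.0 * done_mb / total_mb
        return f"{status}: {done_mb} of {total_mb} MB transferred ({percent:.0f}%)"
    # Early in the resize the size fields may be missing or zero.
    return f"{status}: transfer size not reported yet"


def fetch_resize_status(cluster_id):
    """Ask Redshift for the current resize status. Needs boto3 and credentials."""
    import boto3  # imported lazily so summarize_resize works without boto3

    client = boto3.client("redshift")
    return summarize_resize(client.describe_resize(ClusterIdentifier=cluster_id))
```

Calling `fetch_resize_status` every few minutes from a small script would give roughly the same view as refreshing the Status tab.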

If you use Stitch, your integrations will report errors while the cluster is read-only, but the data will be loaded once your cluster comes back online. It’s all pretty seamless. Here’s what it looked like during our resize:

Stitch Notifications During Redshift Resize

Once the cluster came back online, the notification count started going down over the next few hours.

Performance Gains

I recently wrote a query that scans about 100 million rows on our Redshift cluster and does an aggregation. On the DC1 node, it took an average of 5 minutes and 30 seconds to run (average of 5 executions, all between 5 and 6 minutes). On the DC2 node, the same query runs in 3 minutes and 15 seconds. That’s about a 40% reduction in run time!

Redshift DC2 Performance Improvements Over DC1
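
For anyone who wants to check the math or plug in their own timings, here is the calculation as a short sketch. The function name `runtime_reduction` is just illustrative; the two timings are the ones quoted above.

```python
def runtime_reduction(before_seconds, after_seconds):
    """Percentage drop in run time going from `before` to `after`."""
    return 100.0 * (before_seconds - after_seconds) / before_seconds


dc1_seconds = 5 * 60 + 30  # 5 min 30 s average on the DC1 node
dc2_seconds = 3 * 60 + 15  # 3 min 15 s on the DC2 node

reduction = runtime_reduction(dc1_seconds, dc2_seconds)
speedup = dc1_seconds / dc2_seconds

print(f"{reduction:.1f}% less run time, {speedup:.2f}x speedup")
# prints "40.9% less run time, 1.69x speedup"
```

Put another way, the query went from 330 seconds to 195 seconds, so the DC2 node gets through the same work roughly 1.7 times as fast.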

I'm the Analytics Therapist at Redox, a quickly growing technology platform that enables organizations to send healthcare data back and forth. Here, I write about our journey to become a data-driven organization, and the technical challenges I've faced along the way. All views and opinions are my own and do not represent those of my employer.
