Redshift Observatory

Blog

2025-05-16

Data Sharing

I’ve not investigated data sharing, but I happened to start looking at a cluster today where it’s in use, and it looks like - as expected - it’s full local copies of remote tables.

Having not looked at it, I was thinking the copy might be block-level, but it looks like it’s an actual table, which raises questions about sorting, distribution, column encodings and vacuum.

On the face of it, sort/dist/encoding would be expected to be the same as the source table - but I wouldn’t bet on it. They might also all be set to AUTO.
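If I wanted to check, the obvious first step is comparing the table properties on the two clusters. A minimal sketch, assuming the consumer-side copies show up in SVV_TABLE_INFO like ordinary local tables (which I’ve not verified), and with a hypothetical table name:

    -- Run on both producer and consumer, and compare.
    -- 'my_shared_table' is a hypothetical name.
    SELECT "schema",
           "table",
           diststyle,   -- distribution style, e.g. EVEN, ALL, KEY(column), AUTO(...)
           sortkey1,    -- first sort key column, NULL if none
           encoded,     -- whether any column has a compression encoding
           unsorted,    -- percentage of rows unsorted, i.e. vacuum state
           size         -- size in 1 MB blocks
    FROM   svv_table_info
    WHERE  "table" = 'my_shared_table';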

Vacuum I imagine is being left to auto-vacuum, but given the table is presumably being updated when the source table is updated, wow, it means the vacuum on your cluster is now being driven by what’s going on over on the source cluster.

Replacement System Tables

I’ll be making a bug-fix release of RST on Saturday.

  1. There’s a bug in one of the “base layer” views, where I missed out the +1 to the column ordinal needed to join STV_BLOCKLIST to pg_attribute, which causes the views showing information about columns to go wrong (see the sketch after this list). That’s what happens when you haven’t yet implemented a comprehensive test suite.

  2. I’ll be moving over to using late-binding views. I absolutely do not want to, and this is a full misuse of them, but the Redshift maintenance update process breaks if you have views on the system tables (or at least, enough of them, and complex enough). To my eye this should not occur - I’ve implemented code in the past to fully remove and then replace dependent views (and their privs, ownerships, etc) - it seems to be an implementation issue in the maintenance process. No matter how you cut it, users should not be able to break the database update process.
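For the first item, to make the off-by-one concrete, a minimal sketch of the join involved (not the actual view): STV_BLOCKLIST.col is zero-based, pg_attribute.attnum is one-based for user columns, so the join needs the +1.

    -- Per-column block counts for a table; the +1 aligns the zero-based
    -- STV_BLOCKLIST.col with the one-based pg_attribute.attnum. The three hidden
    -- columns at the end of every table have no matching attnum and simply drop
    -- out of the join.
    SELECT pa.attname,
           COUNT(*) AS blocks
    FROM   stv_blocklist bl
    JOIN   pg_attribute  pa
      ON   pa.attrelid = bl.tbl
     AND   pa.attnum   = bl.col + 1   -- the missing +1
    WHERE  pa.attnum > 0              -- exclude system columns
    GROUP  BY pa.attname;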

I’ve had time to write just a little bit more documentation, which will go out in the docs.

I have to say I’m thinking now about ditching support for having multiple versions installed concurrently. I think it’s the right design, but I can see it’s not user friendly.

2025-05-18

Replacement System Tables Update

I’ve decided on the new install design/approach, and created the new index page for the site.

Now need to do the engineering work to produce the new installer.

No Recursion Support with Late-binding Views

So, users can break Redshift maintenance by having normal views on system tables.

Since AWS are not going to fix this, late-binding views have to be used.
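For anyone unfamiliar with them, the only syntactic difference is the trailing clause. A schematic example, not one of the actual views - the schema and view names here are made up, and note a late-binding view requires schema-qualified references:

    -- Normal (bound) view: Redshift records a dependency on the system table,
    -- and it is such dependencies which trip up the maintenance update process.
    CREATE VIEW some_schema.block_counts AS
    SELECT tbl, COUNT(*) AS blocks
    FROM   pg_catalog.stv_blocklist
    GROUP  BY tbl;

    -- Late-binding view: same body, but no dependency is recorded and names are
    -- resolved only at query time.
    CREATE VIEW some_schema.block_counts_lb AS
    SELECT tbl, COUNT(*) AS blocks
    FROM   pg_catalog.stv_blocklist
    GROUP  BY tbl
    WITH NO SCHEMA BINDING;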

This means the installer generators for the replacement system tables now need to maintain two builds, one with normal views and one with late-binding views, because normal view dependency tracking is extremely useful for detecting bugs in views.

However, I’ve just discovered that late-binding views do not support view recursion - which is in use by the views which enumerate dependencies.
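To illustrate the kind of recursion in question (assuming this means recursive CTEs in the view body) - schematic only, not the actual view - a dependency enumeration walks the view-on-relation edges held in pg_rewrite/pg_depend with a recursive CTE, something along these lines:

    -- Schematic dependency walk: a view depends on the relations it references via
    -- its rewrite rule, so pg_rewrite/pg_depend provide the edges and the recursive
    -- CTE follows them to any depth.
    WITH RECURSIVE view_tree (view_oid, referenced_oid, depth) AS
    (
        -- direct dependencies: view -> relation it references
        SELECT rw.ev_class,
               d.refobjid,
               1
        FROM   pg_depend  d
        JOIN   pg_rewrite rw ON rw.oid = d.objid
        WHERE  d.refobjid <> rw.ev_class

        UNION ALL

        -- indirect dependencies: keep following while the referenced object is itself a view
        SELECT vt.view_oid,
               d.refobjid,
               vt.depth + 1
        FROM   view_tree  vt
        JOIN   pg_rewrite rw ON rw.ev_class = vt.referenced_oid
        JOIN   pg_depend  d  ON d.objid     = rw.oid
        WHERE  d.refobjid <> rw.ev_class
    )
    SELECT cv.relname AS dependent_view,
           cr.relname AS referenced_object,
           vt.depth
    FROM   view_tree vt
    JOIN   pg_class  cv ON cv.oid = vt.view_oid
    JOIN   pg_class  cr ON cr.oid = vt.referenced_oid;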

So I now have two builds, and one of them has an exceptional case.

I used to have a simple installer. I’m now bearing development and maintenance costs to compensate for Redshift’s improperly implemented cluster upgrade process.

I’m also now left with the question of whether or not the recursive normal views are going to be enough to break cluster upgrades. I can’t test this, because I can’t trigger cluster upgrades.

New Replacement System Tables Release

I’ve fixed up replacement system tables to use late-binding views, to work around the problems with Redshift cluster upgrades, where they fail when normal views exist on the system tables.

I’ve also taken the opportunity to simplify installation - the versionless sf schema has gone; you now modify the schema search path.
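For example - the schema name here is hypothetical; use whichever versioned schema the installer actually creates:

    -- Per session:
    SET search_path TO rst_v_1_0, '$user', public;

    -- Or persistently, per user:
    ALTER USER some_user SET search_path TO rst_v_1_0, '$user', public;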

Since this so greatly changes how installation occurs, and since I think adoption of the original replacement system tables was minimal, I’ve taken the unprecedented step of actually retiring the original release, as its installation arrangement is simply incompatible with the new release.

Finally note this has been done with limited time available, and so I’ve not checked as much as usual - in particular, I’ve not benchmarked on a reasonable system. I think the release is okay, given the nature and limited extent of changes made, but nothing works until tested. I’ll get this done in the next couple of days.

https://www.redshift-observatory.ch/replacement_system_tables/index.html

2025-05-24

dc2 Node Type Retirement

An email arrived from AWS about a week ago: the dc2 node types will be retired at the end of April 2026.

This means ra3 only from now on.

The ra3 node type claims to separate storage from compute, but it doesn’t actually do so. The problem is that local disk is much faster than S3/RMS. Local disk is a block-based LRU cache of S3. Redshift does not use S3 directly - it only uses local disk - and when a block is not on local disk, it’s faulted in from S3. When local disk is larger than data, everything is on disk and it’s fast - I benchmark local disk on ra3.xlplus to be four times faster than S3/RMS. When you have enough data that local disk is full, you then start pushing data out into S3/RMS, and then you start swapping. I’ve seen client systems hit the wall when they have enough data, because performance is now bound by swapping data into local disk. I had them add two more nodes, to get more local disk, and performance was restored.

The claim then that ra3 separates compute and storage is true for, and only for, the amount of storage. It is not true for performance. As it is, in my experience, most Redshift users have small data, everything fits on local disk, and so this problem does not arise. This leads to the discussion about who ought to be using Redshift in the first place, but that’s for another day.

Note also the smallest ra3 node type costs 0.543 cents per minute, where the smallest dc2 costs 0.25 cents per minute. The cheapest cluster will roughly double in price.


