Redshift Observatory

Blog

2024-10-11

`ra3.xlplus` local disk and `stv_partitions`

Mmm. Interesting. ra3.xlplus nodes seem to have changed a bit - I think they used to have two disk partitions of 932gb, one for their own data and the other being one k-safety. I see this still with dc2.large (but with smaller disks of course). Now in ra3.xlplus however I see a single disk partition of twice that size. Half of that must be being used for k-safety, but stv_partitions can’t indicate this any more, since there’s only one disk, so owner now equals host. On the face of it, stv_partitions is now showing incorrect data for ra3.xlplus.

2024-10-12

`ra3.large`

First ra3.large benchmarks;

https://www.redshift-observatory.ch/cross_region_benchmarks/index.html

Failed to start, or is unavailable, in us-east-1.

Bring up times for ra3.large are long.

https://www.redshift-observatory.ch/bring_up_times/index.html

2024-10-13

Leader-Node Views Considered Faster

Running a leader-node only query from a view is about 0.5s to 1s faster than running the same SQL by issuing it directly (this on an ra3.16xlarge). Postgres (the leader node) stores views in a parsed form, not their original SQL, so creating a view I would guess is performing work which otherwise has to be performed when the query is issued.

2024-10-27

Layered Views

I am coming to the view that layered views to do not work, because it becomes too hard to know what’s going on with joins, and in RS, you need to, and can, avoid a lot of joins by using window functions, and you really need to do that, for performance, so you have to know what’s going on with joins, and layered views make that too difficult.

This means then that each view has to be whole and complete in itself.

In practice you can have some layering - but those layers in fact almost cannot contain joins.

2024-10-30

Interval

Little discovery. There are three different formats for the string representation of an interval, and the user can switch between them using SET.

dev=# select interval '10 day 5 hour 16 minute 55 second';
     interval     
------------------
 10 days 05:16:55
(1 row)

Time: 125.447 ms

dev=# set intervalstyle to postgres_verbose;
SET
Time: 217.357 ms
dev=# select interval '10 day 5 hour 16 minute 55 second';
             interval              
-----------------------------------
 @ 10 days 5 hours 16 mins 55 secs
(1 row)

Time: 126.878 ms

dev=# set intervalstyle to sql_standard;
SET
Time: 213.241 ms
dev=# select interval '10 day 5 hour 16 minute 55 second';
   interval   
--------------
 10 5:16:55.0
(1 row)

Time: 208.436 ms
dev=# exit

So you can’t rely on the string output being what you expect.

2024-10-31

`sf.copy_errors`

Check this - a view from the upcoming replacement system tables which shows, in human readable form, the single most recent COPY error for your user - what you see here is actual output, not formatting from psql. I’ve done it this way because often columns have wide values, which means the line often wraps, and then it’s hard to read.

prod=# select * from sf.copy_errors ;
    key    |                                           value                                            
-----------+--------------------------------------------------------------------------------------------
 event_ts  | 2024-10-30 17:39:24.228849
 column    | public.test_table.column_1
 reason    | Delimiter not found
 file_name | s3://wib-aoplop-bucket/aoplop/wib_for_aoplop/20240901-20241001/wib_for_aoplop-00005.csv.gz
 line      | 2
 value     | alpha_beta_gamma
(6 rows)

Nice, huh? :-)

Home 3D Друк Blog Bring-Up Times Cross-Region Benchmarks Email Forums Mailing Lists Redshift Price Tracker Redshift Version Tracker Replacement System Tables Reserved Instances Marketplace Slack System Table Tracker The Known Universe White Papers