Uncategorized, Vertica

How to know the disk space consumption of Vertica Cluster (bit 3)

This post is a part of Bit Series blog.

Have you ever wondered on how much data size or catalog size is dedicated for Vertica on a single host / complete cluster.

This information is going to help you get the stats on the below

  1. How much data size dedicated for catalog and actual vertica data
  2. How much is size used by catalog and data on each host
  3. Stats in terms of percentage

Use the below query as a super user on Vertica to get the stats.

navin => select node_name, storage_usage, disk_space_used_mb, disk_space_free_mb, disk_space_free_percent  
	from disk_storage order by node_name, storage_usage;

     node_name | storage_usage | disk_space_used_mb| disk_space_free_mb| disk_space_free_percent
 v_db_node0001 | CATALOG       |              4540 |             46138 | 91%
 v_db_node0001 | DATA,TEMP     |            354548 |            270111 | 43%
 v_db_node0002 | CATALOG       |              9595 |             41083 | 81%
 v_db_node0002 | DATA,TEMP     |            362546 |            262113 | 41%
 v_db_node0003 | CATALOG       |              6957 |             43721 | 86%
 v_db_node0003 | DATA,TEMP     |            355516 |            269143 | 43%
(6 rows)

Note : This post only covers how to get stats for disk space utilization of Vertica data / catalog, besides this Vertica also stores the catalog in memory on every host and that can lead to some issues in performance of regular DML queries and TM operations.

Before Vertica 8.0 version, Vertica catalog issues were common, where a global catalog lock was held on every node of vertica to update the catalog at very frequent intervals, when there was such a lock held by Vertica Engine all the DML operations and Tuple Mover operations would be blocked. The main reason for this was Vertica tried to keep a centralized global catalog and sync of this log across all nodes would take time internally

Later they fixed by having a copy of global catalog on every node and thus reducing the time taken to sync to centralized catalog. Also they did reduce the size of global catalog by reducing the metadata related to projections inside each copy.

For more information on Catalog size in memory and checklist of identifying if your cluster has catalog performance issues, please refer to the below blogs by Vertica Engineering

  1. https://www.vertica.com/blog/catalog-size-debugging/
  2. https://www.vertica.com/blog/calculate-the-catalog-size-in-memory-on-each-node-quick-tip/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s