Estimating Database and Table Size in Vertica

Hello All,

Today we will be looking at how to calculate size of database / tables in a Vertica Cluster.
So how is Vertica different in terms of storage – It is columnar
So Vertica stores data in Columnar format and hence the compression ratio is much better than other databases. I will not emphasize on Columnar storage of Vertica in this article (Will surely have a complete different article for this).
Below are some snippets through which you can get to know how much data is residing in your database or how much space is consumed by your data in database.

We will be calculating two different sizes in this article with respect to Raw data size and Actual Database Size:
1.       Database Size
2.       Table Size

Let’s start with Database size
The function used for calculating the database size is AUDIT                                                                          
AUDIT:Audit performs a full database audit and gives the actual database size in terms of raw data. Audit considers all the data in the database including the one is your WOS (Write optimized Store / Memory of your cluster)
Audit results are given in terms of bytes. It can be supplied with two optional parameters error_tolerance and confidence_level

Lowering the error_tolerance to zero (0) and raising the confidence_level to 100, provides the most accurate size estimate, and increases the performance impact of calling the AUDIT function. During a detailed, low error-tolerant audit, all of the data in the database is dumped to a raw format to calculate its size. Since performing a stringent audit can significantly impact database performance, never perform a full audit of a production database.
Audit function estimates the database size of your database considering the raw database size. What is raw data – raw data means the uncompressed, un-federated data stored in a single Vertica database.

What is excluded in while calculating raw database size
1.       Same data in multiple projections
2.       Data stored in temp tables
3.       Data which is marked for deletion ( Delete Vectors )
4.       Data stored in Monitoring, Data Collector and system tables
Usage of Audit Function

For calculating complete database size
NNANI=> select audit(”);

(1 row)
For calculating complete database size with high accuracy and high performance impact
NNANI=> select audit(”,0,100);

(1 row)
Here the 1st parameter is error_tolerance and second is confidence_level ….so the difference can be seen in both the scenarios regarding accuracy level.
To calculate size of a table
NNANI=> select audit(‘public.bank_holidays’);

(1 row

The size of table bank holidays  table is 319 bytes
Calculating Actual Database Size:
Actual database size will be the footprint of your database .i.e. it will be including all the projections for same table, data in WOS, Data in system tables. In simple words the space your data is consuming in your database.
For calculating this we can use two monitoring tables
1.       Column_storage
2.       Projection_storage
Using Column_storage table to calculate the database size
NNANI=> select round((sum(column_used_bytes)/1024/1024/1024), 2.0) || ‘ GB’ as database_size_in_GB from
NNANI-> (select anchor_table_schema, anchor_table_name,node_name, column_name, sum(used_bytes) as column_used_bytes
NNANI(> from column_storage where
NNANI(> node_name IN (select node_name from nodes)
NNANI(> group by 1,2,3,4) sub;
 0.1 GB

(1 row)
Using projection_storage table to calculate the database size

NNANI=> select round(sum(used_bytes)/1024/1024/1024, 2.0) || ‘ GB’ as database_size_in_GB  from
NNANI-> (select anchor_table_name,anchor_table_schema,node_name,sum(ros_used_bytes) as used_bytes
NNANI(> from projection_storage
NNANI(> group by 1,2,3
NNANI(> order by 3) sub;
 0.1 GB

(1 row)
Using Column_storage table to calculate the table size
NNANI=> select CASE WHEN round((sum(column_used_bytes)), 2.0) > 0 THEN round((sum(column_used_bytes)), 2.0) || ‘ MB’ END as table_size_in_Bytes,
NNANI-> CASE WHEN round((sum(column_used_bytes)/1024), 2.0) > 0 THEN round((sum(column_used_bytes)/1024), 2.0) || ‘ KB’ END as table_size_in_KB,
NNANI-> CASE WHEN round((sum(column_used_bytes)/1024/1024), 2.0) > 0 THEN round((sum(column_used_bytes)/1024/1024), 2.0) || ‘ MB’ END as table_size_in_MB,
NNANI-> CASE WHEN round((sum(column_used_bytes)/1024/1024/1024), 2.0) > 0 THEN round((sum(column_used_bytes)/1024/1024/1024), 2.0) || ‘ GB’ END as table_size_in_GB
NNANI-> from
NNANI-> (select anchor_table_schema, anchor_table_name,node_name, column_name, sum(used_bytes) as column_used_bytes
NNANI(> from column_storage where anchor_table_schema=’public’ and anchor_table_name=’bank_holidays’ and
NNANI(> node_name IN (select node_name from nodes)

NNANI(> group by 1,2,3,4) sub;

table_size_in_Bytes | table_size_in_KB | table_size_in_MB | table_size_in_GB
 828 MB              | 0.81 KB          |                  |
(1 row)
You can customize the above query according to your needs. You can even use the projection_storage table in same manner as column_storage is used.
This was a short article on calculating table size and database size in Vertica.
Please suggest any corrections if needed to this article
Your feedback is appreciated


Vertica SQL function Words_to_number

Hi There,

Just a quick update where we left last time – Vertica SQL Function

Here is a good example of Vertica SQL Function

The name of this function is words_to_number
This function takes arguments as a two word digit say ‘seventy one’ and gives the result in numbers .i.e 71

Just observe the way we can use the SQL function in Vertica to optimize the reusable codes.

Function Definition

WHEN REGEXP_COUNT(SPLIT_PART(x, ' ', 1), 'ty') = 1 then
WHEN SPLIT_PART(lower(x), ' ', 1) = 'twenty' THEN 2*10 
WHEN SPLIT_PART(lower(x), ' ', 1) = 'thirty' THEN 3*10 
WHEN SPLIT_PART(lower(x), ' ', 1) = 'forty' THEN 4*10
WHEN SPLIT_PART(lower(x), ' ', 1) = 'fifty' THEN 5*10
WHEN SPLIT_PART(lower(x), ' ', 1) = 'sixty' THEN 6*10
WHEN SPLIT_PART(lower(x), ' ', 1) = 'seventy' THEN 7*10
WHEN SPLIT_PART(lower(x), ' ', 1) = 'eighty' THEN 8*10
WHEN SPLIT_PART(lower(x), ' ', 1) = 'ninety' THEN 9*10 ELSE '0' END)
ELSE '0' END +
WHEN REGEXP_COUNT(x, ' [a-z]') = 1 then
WHEN SPLIT_PART(lower(x), ' ', 2) = 'one' THEN 1*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'two' THEN 2*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'three' THEN 3*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'four' THEN 4*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'five' THEN 5*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'six' THEN 6*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'seven' THEN 7*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'eight' THEN 8*1
WHEN SPLIT_PART(lower(x), ' ', 2) = 'nine' THEN 9*1
WHEN lower(x) = 'one' THEN 1
WHEN lower(x) = 'two' THEN 2
WHEN lower(x) = 'three' THEN 3
WHEN lower(x) = 'four' THEN 4
WHEN lower(x) = 'five' THEN 5
WHEN lower(x) = 'six' THEN 6
WHEN lower(x) = 'seven' THEN 7
WHEN lower(x) = 'eight' THEN 8
WHEN lower(x) = 'nine' THEN 9
ELSE '0' END) 


Always remember to use BEGIN and END statements as in Function definition.
And the most important or you can say limit to Vertica SQL Function is no usage of Select statement is allowed in Function statement.

Some example to use this function
Function Usage

nnani=> select WORDS_TO_NUMBER('twenty three');

(1 row)

You can also try some more example with this function

select WORDS_TO_NUMBER('ninety nine');

select WORDS_TO_NUMBER('seventy one');

select WORDS_TO_NUMBER('one');

The rules / limits when using this functions are the words can only extend till two digits, you cannot have a three word digit in the arguments.

Go ahead and use this function. Let me know your reviews on this.