Data Archive Considerations
This page highlights some initial considerations around archiving data in the vAirify platform, including some very rough calculations to decide whether concentrating on archiving is worth the effort.
As the vAirify platform continues to gather data through daily runs of the ETL processes, the overall size of both the database and the pre-processed data texture volumes will continue to grow. Currently there is no upper limit on this growth.
The forecast ETL runs twice a day; the in-situ data ETL runs once an hour. For the purposes of this estimate, logs are ignored.
To get a rough estimate of how quickly our data stores are likely to grow, I first cleared down the three main database tables and all local data textures, then reran the ETLs to repopulate them. I then removed any data from the current and previous days, as these may not have represented complete datasets. In effect, the only data stored covered a 5-day period, with the following document counts:
| Date | Forecast documents | In situ documents | Data texture documents |
|---|---|---|---|
| 1st Aug | 12546 | 32708 | 42 |
| 2nd Aug | 12546 | 34019 | 42 |
| 3rd Aug | 12546 | 34793 | 42 |
| 4th Aug | 12546 | 34358 | 42 |
| 5th Aug | 12546 | 33880 | 42 |
According to MongoDB, the storage sizes of these databases were:
| Database | Storage size |
|---|---|
| forecast_data | 5.39 MB |
| in_situ_data | 12.88 MB |
| data_textures | 28.67 kB |
This translates to roughly 3.7 MB a day (about 18.3 MB in total across the 5-day period).
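The daily database growth figure can be reproduced from the storage table above. A minimal sketch, using the measured sizes as inputs and the 5-day window described earlier:

```python
# Per-database storage sizes (MB), as reported by MongoDB's dbStats output.
sizes_mb = {
    "forecast_data": 5.39,
    "in_situ_data": 12.88,
    "data_textures": 0.03,  # 28.67 kB, rounded up
}

DAYS_MEASURED = 5  # complete days of data retained after the clear-down

total_mb = sum(sizes_mb.values())
daily_mb = total_mb / DAYS_MEASURED
print(f"{total_mb:.2f} MB over {DAYS_MEASURED} days -> {daily_mb:.1f} MB/day")
# prints "18.30 MB over 5 days -> 3.7 MB/day"
```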
In addition to the database tables, we have the data textures themselves, stored separately on disk. On my local machine these took up 221 MB overall, or 22.1 MB a day.
Combining these gives a very rough estimate of 25.8 MB added daily by our processes.
Given that the Linux box has 200 GB of storage, if we were (very) cautious we could allocate 100 GB to data storage, which would take (100 × 1000) / 25.8 ≈ 3,876 days, or just over 10.5 years, to fill.
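Putting the rough numbers together, the back-of-the-envelope capacity calculation can be sketched as follows (the growth rates are the estimates above, not measured limits):

```python
DB_GROWTH_MB_PER_DAY = 3.7        # MongoDB collections (estimate above)
TEXTURE_GROWTH_MB_PER_DAY = 22.1  # data texture files on disk (estimate above)
ALLOCATED_GB = 100                # cautious half of the 200 GB Linux box

daily_mb = DB_GROWTH_MB_PER_DAY + TEXTURE_GROWTH_MB_PER_DAY
days_to_fill = ALLOCATED_GB * 1000 / daily_mb  # GB -> MB, then divide by daily rate

print(f"{daily_mb:.1f} MB/day -> {days_to_fill:.0f} days (~{days_to_fill / 365:.1f} years)")
# prints "25.8 MB/day -> 3876 days (~10.6 years)"
```

Note that this uses 1 GB = 1000 MB, matching the calculation in the text; at this level of precision the distinction from 1024-based units is immaterial.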
It should be noted that these calculations are VERY high level and rough.