Reorg tiering policy sections into manage tiering #3524

Open
wants to merge 18 commits into
base: latest
from

atovpeko
Contributor

No description provided.

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-3508-docs-rfc-reorg-tiering-policy-sections-into-manage-tiering

Contributor

@billy-the-fish billy-the-fish left a comment

Few comments, good stuff.

use-timescale/data-tiering/enabling-data-tiering.md (10 outdated review threads, resolved)
atovpeko and others added 6 commits October 30, 2024 12:15
…-tiering' of github.com:timescale/docs into 3508-docs-rfc-reorg-tiering-policy-sections-into-manage-tiering
use-timescale/data-tiering/enabling-data-tiering.md (9 outdated review threads, resolved)
* [Disable tiering on a hypertable][disabling-data-tiering] on an individual table if you no longer want to associate it with tiered storage.
This section explains the following:
* [Learn about the object storage tier][about-data-tiering]: understand tiered storage before you
[Manage tiering][enabling-data-tiering].
Contributor

should remove this.

* [Manage tiering][enabling-data-tiering]: enable and disable data tiering, automate tiering with
policies or tier and untier manually.
* [Query tiered data][querying-tiered-data]: query and performance for tiered data.
* [Replicas and forks with tiered data][replicas-and-forks]: billing and tiered storage.
Contributor

Suggested change
* [Replicas and forks with tiered data][replicas-and-forks]: billing and tiered storage.
* [Replicas and forks with tiered data][replicas-and-forks]: How does tiered storage work with forks and replicas.

older than the `move_after` threshold to the object storage tier. This works similarly to a
[data retention policy][data-retention], but chunks are moved rather than deleted.

A tiering policy schedules a job that runs periodically to asynchronously migrate eligible chunks to object storage. Chunks are considered tiered once they appear in the `timescaledb_osm.tiered_chunks` view.
Contributor

Suggested change
A tiering policy schedules a job that runs periodically to asynchronously migrate eligible chunks to object storage. Chunks are considered tiered once they appear in the `timescaledb_osm.tiered_chunks` view.
A tiering policy schedules a job that runs periodically to asynchronously migrate eligible chunks to object storage. After chunks are tiered, they appear in the `timescaledb_osm.tiered_chunks` view.
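
To illustrate the threshold logic described above, here is a minimal Python sketch of how a periodic job might pick the chunks that are older than `move_after`. It is a toy model, not Timescale's implementation: the `Chunk` class, the `eligible_for_tiering` function, and the rule that a chunk qualifies once the end of its time range passes the cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a hypertable chunk; not a Timescale type."""
    name: str
    range_end: datetime  # upper bound of the chunk's time range


def eligible_for_tiering(chunks, move_after, now):
    """Return the chunks whose time range ends at or before now - move_after.

    Assumption: eligibility is decided on the end of the chunk's range.
    """
    cutoff = now - move_after
    return [c for c in chunks if c.range_end <= cutoff]


now = datetime(2024, 10, 30)
chunks = [
    Chunk("chunk_old", datetime(2024, 7, 1)),
    Chunk("chunk_recent", datetime(2024, 10, 29)),
]

# With a 90-day threshold, only the older chunk is selected for migration.
to_tier = eligible_for_tiering(chunks, timedelta(days=90), now)
print([c.name for c in to_tier])
```

In this model the selected chunks are moved rather than deleted, which is the difference from a retention policy that the text points out.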

Contributor

@gayyappan gayyappan left a comment

minor edits suggested.
The overall "Manage tiering" section looks good!

Contributor

@billy-the-fish billy-the-fish left a comment

These pages are really coming together.

* [Disable tiering on a hypertable][disabling-data-tiering] on an individual table if you no longer want to associate it with tiered storage.
This section explains the following:
* [Learn about the object storage tier][about-data-tiering]: understand tiered storage.
* [Tour tiered storage][tour-data-tiering]: see the different features in tiered storage.
Contributor

Can you remove this link please.

---

# Tier data to the object storage tier
# Manage tiering
Contributor

Maybe the title should explain more clearly what we explain. Manage automatic and manual tiering?

---

# About the object storage tier

The tiered storage architecture complements Timescale's standard high-performance storage tier with a low-cost object storage tier.
The Timescale's tiered storage architecture includes a standard high-performance storage tier and a low-cost object storage tier built on Amazon S3. You can use the standard tier for data that requires quick access, and the object tier for rarely used historical data. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers. A compressed chunk uses a different storage representation after tiering.
Contributor

Suggested change
The Timescale's tiered storage architecture includes a standard high-performance storage tier and a low-cost object storage tier built on Amazon S3. You can use the standard tier for data that requires quick access, and the object tier for rarely used historical data. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers. A compressed chunk uses a different storage representation after tiering.
Timescale's tiered storage architecture includes a standard high-performance storage tier, and a low-cost object storage tier built on Amazon S3. You use the standard tier for data that requires quick access, and the object tier for rarely used historical data. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers. A compressed chunk uses a different storage representation after tiering.

build views on tiered data, and even define continuous aggregates on tiered data.
In fact, because the implementation of continuous aggregates also use hypertables,
they can be tiered to low-cost storage as well.
In the standard storage, chunks are stored in the block format. In the object storage, they are stored in a compressed, columnar format. This format is different from that of the internals of the database, for better interoperability across various platforms. It allows for more efficient columnar scans across longer time periods, and Timescale uses other metadata and query optimizations to reduce the amount of data that needs to be fetched from the object storage tier to satisfy a query.
Contributor

Suggested change
In the standard storage, chunks are stored in the block format. In the object storage, they are stored in a compressed, columnar format. This format is different from that of the internals of the database, for better interoperability across various platforms. It allows for more efficient columnar scans across longer time periods, and Timescale uses other metadata and query optimizations to reduce the amount of data that needs to be fetched from the object storage tier to satisfy a query.
In high-performance storage, chunks are stored in the block format. In the object storage, they are stored in a compressed, columnar format. For better interoperability across various platforms, this format is different from that of the internals of the database. It allows for more efficient columnar scans across longer time periods, and Timescale Cloud uses other metadata and query optimizations to reduce the amount of data that needs to be fetched from the object storage tier to satisfy a query.

an object store built on Amazon S3.
There, it's stored in the Apache Parquet format, which is a compressed
columnar format well-suited for S3. Data remains accessible both during and after the migration.
The tiered storage backend works by periodically and asynchronously moving older chunks to the object storage tier.
Contributor

Suggested change
The tiered storage backend works by periodically and asynchronously moving older chunks to the object storage tier.
The tiered storage backend works by periodically and asynchronously moving older chunks from high-performance storage to the object storage tier.


The result is transparent queries across standard PostgreSQL storage and S3
storage, so your queries fetch the same data as before.
* Chunk pruning - exclude the chunks that fall outside the query time window.
Contributor

Can you put Chunk pruning: etc in bold to match the other lists in the page please.

* Row group pruning - identify the row groups within the Parquet object that satisfy the query.
* Column pruning - fetch only columns that are requested by the query.
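
The three pruning steps in the list above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `TieredChunk`, `RowGroup`, and `scan` are hypothetical names, timestamps are plain integers, and the min/max statistics stand in for the kind of metadata a Parquet row group carries. It is not how Timescale reads from object storage.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group with min/max timestamp statistics."""
    min_ts: int
    max_ts: int
    columns: dict  # column name -> list of values


@dataclass
class TieredChunk:
    """Hypothetical tiered chunk covering the range [start, end)."""
    start: int
    end: int
    row_groups: list


def scan(chunks, ts_from, ts_to, wanted_columns):
    """Fetch only what a query on [ts_from, ts_to) needs.

    Returns the fetched column data and the number of row groups read.
    Row-level filtering would still happen afterwards; it is omitted here.
    """
    fetched = {name: [] for name in wanted_columns}
    groups_read = 0
    for chunk in chunks:
        # Chunk pruning: skip chunks outside the query time window.
        if chunk.end <= ts_from or chunk.start >= ts_to:
            continue
        for rg in chunk.row_groups:
            # Row group pruning: skip row groups whose statistics
            # show they cannot contain matching rows.
            if rg.max_ts < ts_from or rg.min_ts >= ts_to:
                continue
            groups_read += 1
            # Column pruning: fetch only the requested columns.
            for name in wanted_columns:
                fetched[name].extend(rg.columns[name])
    return fetched, groups_read


chunks = [
    TieredChunk(0, 100, [
        RowGroup(0, 49, {"ts": [0, 49], "temp": [1.0, 2.0], "note": ["a", "b"]}),
        RowGroup(50, 99, {"ts": [50, 99], "temp": [3.0, 4.0], "note": ["c", "d"]}),
    ]),
    TieredChunk(100, 200, [
        RowGroup(100, 199, {"ts": [100, 199], "temp": [5.0, 6.0], "note": ["e", "f"]}),
    ]),
]

fetched, groups_read = scan(chunks, 60, 90, ["ts", "temp"])
print(fetched, groups_read)
```

Here the second chunk is skipped entirely, the first row group of the first chunk is skipped by its statistics, and the `note` column is never fetched.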

The result is transparent queries across standard PostgreSQL storage and S3 storage, so your queries fetch the same data as before.
Contributor

Suggested change
The result is transparent queries across standard PostgreSQL storage and S3 storage, so your queries fetch the same data as before.
The result is transparent queries across high-performance storage and S3 object storage, so your queries fetch the same data as before.


Enable tiered storage to begin migrating rarely used data from Timescale's standard high-performance storage tier
to the object storage tier to save on storage costs.
You use tiered storage to save on storage costs. Specifically, you can migrate rarely used data from Timescale's standard high-performance storage to the object storage. After you [enable tiered storage](#enable-tiered-storage), you then either [create automated tiering policies](#automate-tiering-with-policies) or [manually tier and untier data](#manually-tier-and-untier-chunks).
Contributor

Suggested change
You use tiered storage to save on storage costs. Specifically, you can migrate rarely used data from Timescale's standard high-performance storage to the object storage. After you [enable tiered storage](#enable-tiered-storage), you then either [create automated tiering policies](#automate-tiering-with-policies) or [manually tier and untier data](#manually-tier-and-untier-chunks).
You use tiered storage to save on storage costs. Specifically, you can migrate rarely used data from Timescale's standard high-performance storage to object storage. After you [enable tiered storage](#enable-tiered-storage), you then either [create automated tiering policies](#automate-tiering-with-policies) or [manually tier and untier data](#manually-tier-and-untier-chunks).

@@ -23,95 +21,170 @@ sessions.
With tiered reads enabled, you can query your data normally even when it's distributed across different storage tiers.
Your hypertable is spread across the tiers, so queries and `JOIN`s work and fetch the same data as usual.

<!-- vale Google.Acronyms = YES -->

<Highlight type="warning">
Contributor

I'd make this into a sentence without the warning and link to the performance section. If you must, make it an info admonition.

Development

Successfully merging this pull request may close these issues.

[Docs RFC] Reorg tiering policy sections into Manage tiering
3 participants