Add some technical notes on using MultiversX data from BigQuery #816

Merged (6 commits) on Feb 6, 2024
141 changes: 141 additions & 0 deletions docs/sdk-and-tools/google-bigquery.md
@@ -0,0 +1,141 @@
---
id: google-bigquery
title: Google BigQuery
---

[comment]: # "mx-abstract"

This page succinctly describes how to use Google BigQuery to analyze data from the MultiversX blockchain.

[comment]: # "mx-context-auto"

## Overview

[**BigQuery**](https://cloud.google.com/bigquery/docs/introduction) is Google's fully managed, serverless data warehouse that enables analysis of extremely large datasets using [SQL queries](https://cloud.google.com/bigquery/docs/introduction-sql) and / or visual tools (such as [Google Looker Studio](https://lookerstudio.google.com)); it also has built-in [machine learning capabilities](https://cloud.google.com/bigquery/docs/bqml-introduction).

[**MultiversX Blockchain data**](https://console.cloud.google.com/marketplace/product/bigquery-public-data/blockchain-analytics-multiversx-mainnet-eu) is published to Google BigQuery and is available for free through the [**Google Cloud Marketplace**](https://console.cloud.google.com/marketplace/product/bigquery-public-data/blockchain-analytics-multiversx-mainnet-eu). The dataset, [**`bigquery-public-data.crypto_multiversx_mainnet_eu`**](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=crypto_multiversx_mainnet_eu&page=dataset), is one of many crypto datasets available within [**Google Cloud Public Datasets**](https://cloud.google.com/bigquery/public-data). These datasets can be queried for free, up to 1 TB of processed data per month.

The MultiversX BigQuery dataset closely resembles the set of indices of the [**MultiversX Elasticsearch instance**](/sdk-and-tools/elastic-search#elasticsearch-indices). Their schemas and data are **approximately equivalent**: the data is [mirrored from the Elasticsearch instance to BigQuery](https://github.com/multiversx/multiversx-etl) at regular intervals (most tables are updated _hourly_, and a few are updated every _4 hours_).

:::note
As of February 2024, the MultiversX BigQuery dataset **is not updated in real time** (see the update intervals above). For real-time data, [use the public APIs](/sdk-and-tools/rest-api).
:::

:::note
If you experience any issue with the published dataset, please [let us know](https://github.com/multiversx/multiversx-etl/issues).
:::

## Query from BigQuery Studio

[**Google BigQuery Studio**](https://cloud.google.com/bigquery/docs/query-overview#bigquery-studio) is a unified workspace for Google Cloud's data analytics suite which incorporates, among others, an SQL editor (optionally [assisted by AI](https://cloud.google.com/bigquery/docs/write-sql-duet-ai)) and Python notebooks. It is a great way to explore the MultiversX dataset, and to run queries. Below, we'll explore a few example queries.

:::tip
Make sure to explore the dataset, the tables and their schema before running queries. Both the schema and a data preview are available in BigQuery Studio.
:::

#### How many transactions were processed on MultiversX, in the last couple of days?

> **Review comment (Contributor):** `####` -> `###`? Otherwise, they won't appear in the right-hand sidebar.
>
> **Reply (@andreibancioiu, Contributor Author, Feb 6, 2024):** Thanks! That was intentional (long sub-titles etc.).

```sql
SELECT
    DATE(`timestamp`) `day`,
    COUNT(*) `transactions`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.transactions`
WHERE DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY `day`
ORDER BY `day` DESC
```

#### Which were the top used Smart Contracts, in the last couple of days?

```sql
SELECT
    DATE(`timestamp`) `day`,
    `receiver` `contract`,
    COUNT(DISTINCT `sender`) `num_users`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.transactions`
WHERE
    `isScCall` = true
    -- Filter by date before grouping, so that older rows are not aggregated needlessly
    AND DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
GROUP BY `day`, `contract`
HAVING `num_users` > 1000
ORDER BY `day` DESC, `num_users` DESC
```

#### What ESDT tokens have the most holders?

```sql
SELECT
    `token`,
    `type`,
    COUNT(`_id`) `num_holders`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.accountsesdt`
WHERE `type` = 'FungibleESDT' OR `type` = 'MetaESDT'
GROUP BY `token`, `type`
HAVING `num_holders` > 5000
ORDER BY `num_holders` DESC
```

#### What are the transactions with the largest transferred EGLD amounts, in the last couple of days?

```sql
SELECT
    `day`,
    `hash`,
    `sender`,
    `receiver`,
    `amount`
FROM (
    SELECT
        DATE(`timestamp`) `day`,
        `_id` `hash`,
        `sender`,
        `receiver`,
        -- `value` is a string; the parsed amount is in atomic units (1 EGLD = 10^18 units)
        PARSE_BIGNUMERIC(`value`) `amount`,
        -- Rank the transactions of each day by amount, largest first
        ROW_NUMBER() OVER (
            PARTITION BY DATE(`timestamp`)
            ORDER BY PARSE_BIGNUMERIC(`value`) DESC
        ) AS `row_num`
    FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.transactions`
    WHERE
        `status` = 'success'
        AND DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
)
WHERE `row_num` = 1
ORDER BY `day` DESC
LIMIT 7;
```
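
The `amount` column returned by the query above is the raw transaction `value`. Assuming it is expressed in atomic units with 18 decimals (1 EGLD = 10^18 units), a client-side conversion to EGLD could look like the minimal Python sketch below. The helper name `atomic_to_egld` is hypothetical and not part of any MultiversX SDK.

```python
from decimal import Decimal

# Assumption: EGLD amounts are stored in atomic units, with 18 decimals.
EGLD_DECIMALS = 18


def atomic_to_egld(value: str) -> Decimal:
    """Convert an amount in atomic units (given as a decimal string) to EGLD.

    Decimal is used instead of float so that large amounts keep their digits.
    """
    return Decimal(value).scaleb(-EGLD_DECIMALS)


if __name__ == "__main__":
    # 1.5 EGLD expressed in atomic units
    print(atomic_to_egld("1500000000000000000"))
```

Passing the value as a string mirrors how the column is read before `PARSE_BIGNUMERIC` is applied, and avoids any loss of precision on the way in.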

#### What is the (global) network hit rate, per day, in the last month?

```sql
SELECT
    DATE(`timestamp`) `day`,
    -- 14400 is the number of rounds per day, and 3 + 1 = 4 is the number of shards
    ROUND(COUNT(*) / (14400 * 4), 4) `hit_rate`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.blocks`
WHERE
    DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    AND DATE(`timestamp`) < CURRENT_DATE()
GROUP BY `day`
ORDER BY `day` DESC
```
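
The arithmetic behind `hit_rate` in the query above can be reproduced outside BigQuery. The sketch below takes the constants from the query comment (14400 rounds per day, 4 shards); the function name and its parameters are hypothetical, chosen only for illustration.

```python
# Constants taken from the comment in the SQL query above.
ROUNDS_PER_DAY = 14_400
NUM_SHARDS = 4  # 3 + 1, as stated in the query comment


def hit_rate(
    blocks_in_day: int,
    rounds_per_day: int = ROUNDS_PER_DAY,
    num_shards: int = NUM_SHARDS,
) -> float:
    """Share of rounds (across all shards) in which a block was produced."""
    expected_blocks = rounds_per_day * num_shards
    return round(blocks_in_day / expected_blocks, 4)


if __name__ == "__main__":
    # A day with a block in every round on every shard gives a hit rate of 1.0.
    print(hit_rate(ROUNDS_PER_DAY * NUM_SHARDS))
```

The query excludes the current day (`DATE(timestamp) < CURRENT_DATE()`) because an incomplete day would always show an artificially low hit rate against the full-day denominator.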

:::note
Even though BigQuery includes a generous free tier, it is important to be mindful of the costs associated with running queries. For more information, see [BigQuery pricing](https://cloud.google.com/bigquery/pricing).

If you believe that specific optimizations can be applied to the dataset (to improve query performance), please [let us know](https://github.com/multiversx/multiversx-etl/issues).
:::
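
One way to stay mindful of costs is to estimate how much data a query would process before running it. The sketch below is a minimal example, assuming the `google-cloud-bigquery` Python package is installed and that Application Default Credentials are configured. It uses a dry run, which validates the query and reports the bytes it would process without executing it. The helper names and the `FREE_TIER_BYTES` constant (the 1 TB monthly allowance, treated here as 1 TiB) are assumptions made for illustration.

```python
# Assumption: the monthly free allowance ("1 TB") is treated as 1 TiB.
FREE_TIER_BYTES = 2**40


def free_tier_share(bytes_processed: int) -> float:
    """Fraction of the monthly free allowance that a query of this size would use."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / FREE_TIER_BYTES


def estimate_bytes_processed(sql: str) -> int:
    """Dry-run a query and return the number of bytes it would process."""
    # Imported lazily, so that free_tier_share() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on Application Default Credentials
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)  # a dry run returns immediately
    return job.total_bytes_processed
```

For example, `free_tier_share(estimate_bytes_processed(sql))` gives a rough idea of how much of the monthly allowance a single run of `sql` would consume.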

## Analyze using Looker Studio

[**Google Looker Studio**](https://lookerstudio.google.com) is a powerful tool for analyzing data and creating (shareable) reports. Out of the box, it connects to BigQuery (and many other data sources), which makes it a great way to explore the MultiversX dataset.

Example of a report created in Looker Studio (leveraging the MultiversX dataset in BigQuery):

![img](/sdk-and-tools/looker_studio_1.png)

:::tip
In BigQuery Studio, you can save the results of a given query as your own BigQuery tables, then immediately import them into Looker Studio to create visualizations and reports.
:::

## Programmatic access

One can also query datasets programmatically, using the [BigQuery client libraries](https://cloud.google.com/bigquery/docs/reference/libraries).

See how to [query a public dataset with the BigQuery client libraries](https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries).
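
As an illustration, the sketch below runs the daily transactions query from this page through the Python client library. It is a minimal example, assuming the `google-cloud-bigquery` package is installed and that Application Default Credentials are configured; the function names `build_daily_transactions_query` and `fetch_daily_transactions` are hypothetical.

```python
TABLE = "bigquery-public-data.crypto_multiversx_mainnet_eu.transactions"


def build_daily_transactions_query(days: int) -> str:
    """Build the 'transactions per day' query for the last `days` days."""
    # Only a positive integer is interpolated into the SQL text.
    if isinstance(days, bool) or not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT DATE(`timestamp`) `day`, COUNT(*) `transactions`\n"
        f"FROM `{TABLE}`\n"
        f"WHERE DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)\n"
        "GROUP BY `day`\n"
        "ORDER BY `day` DESC"
    )


def fetch_daily_transactions(days: int = 7) -> list:
    """Run the query and return (day, transactions) tuples."""
    # Imported lazily, so that the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on Application Default Credentials
    rows = client.query(build_daily_transactions_query(days)).result()
    return [(row["day"], row["transactions"]) for row in rows]


if __name__ == "__main__":
    for day, count in fetch_daily_transactions():
        print(day, count)
```

Keeping the query builder separate from the network call makes it possible to inspect or dry-run the SQL text before any data is processed.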
1 change: 1 addition & 0 deletions sidebars.js
@@ -248,6 +248,7 @@ const sidebars = {
],
},
"sdk-and-tools/notifier",
"sdk-and-tools/google-bigquery",
"sdk-and-tools/devcontainers",
{
type: "category",
Binary file added static/sdk-and-tools/looker_studio_1.png