Skip to content
/ eset Public

Empirical Study of Ethereum Transactions

Notifications You must be signed in to change notification settings

aphd/eset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 

Repository files navigation

Empirical Study of Ethereum Transactions (ESET)

Limitations

Miners rarely post accurate clock times.

Blocks variables

Attribute Description
confirmed Recorded time at which block was built. Note: Miners rarely post accurate clock times.
height The height of the block in the blockchain.
fees The total number of fees collected by miners in this block. (GWei)
size Raw size of block (including header and all transactions) in bytes.
n_tx The number of transactions included in the block.
lowest_gas_price The lowest gas price accepted by the block.

Transactions variables

Attribute Description
block_height Height of the block that contains this transaction. If this is an unconfirmed transaction, it will equal -1.
hash The hash of the transaction.
gas_price The price of gas in this transaction. (Wei)
gas_used The amount of gas used by this transaction.
gas_limit It is the maximum amount of Gas that a user is willing to pay for confirming a transaction. If not set, default is 21000 gas for external accounts and 80000 gas for contract accounts.
fees The total number of fees collected by miners in this transaction. Equal to gas_price * gas_used. (GWei)
received Time this transaction was received by a specific mempool server.
confirmed Time at which transaction was included in a block; only present for confirmed transactions.
size The size of the transaction in bytes.

EthGasStation variables

Attribute Description
timestamp The time at which the EthGasStation variables are recorded by the server.
fastest Gas to pay to have the transaction confirmed within 1 to 2 blocks time.
fast -
safeLow -
average -
blockNum -

Test

cd src
export PYTHONPATH="${PYTHONPATH}:app"
cp app/tokens-sample.py app/tokens.py
python3 -m unittest discover

Cron/Job scheduler

* * * * * cd ~/github/eset/src/; export PYTHONPATH="${PYTHONPATH}:app"; FN=fetch_oracle.py; /usr/bin/python3.6 app/$FN >> /tmp/$FN.log 2>&1

A descriptive analysis

The dataset is available to this link: 10.5281/zenodo.3584242.

From sqlite to csv

wget https://zenodo.org/record/3584242/files/db.sqlite3.bz2?download=1
bzip2 -d -v  /tmp/db.sqlite3.bz2
sqlite3 /tmp/db.sqlite3
sqlite> .headers on
sqlite> .mode csv
sqlite> .output block.csv
sqlite> select * from block
sqlite> .quit

R Programming analisys

df <- read.csv("/tmp/block.csv")

str(df)

ggplot(df, aes(y=n_tx, x="")) 
    + geom_violin() + geom_boxplot(width=0.1)


ggplot(df, aes(y=fees, x="")) 
    + geom_violin() + geom_boxplot(width=0.1) 
    + scale_y_continuous(trans = "log10")
    
ggplot(df, aes(y=fees, x="")) 
    + ylab("Fees (Wei)") + xlab("") 
    + geom_violin() + geom_boxplot(width=0.1) 
    + scale_y_continuous(trans = "log10")

with(df, hist(block_num[block_num >= 0 & block_num < 131], breaks=seq(0,130,by=1)))
with(df[df$gas_price >=20 & df$block_num >= 0, ] , hist(block_num[block_num >= 0 & block_num < 50], breaks=seq(0,50,by=1)))

Screenshot 2020-05-24 at 18 44 55

Oracles data analysis

bunzip2 block.csv.bz2
import pandas as pd
df = pd.read_csv('oracle.csv')
df['egs'] = df['egs']/10
oracles = pd.DataFrame({'Ether Chain':df["ec"], 'Ether Gas Station': df["egs"]})
print(((oracles.describe()).transpose()).to_latex())

table-latex

import pandas as pd
import seaborn as sns
df = pd.read_csv('oracle.csv')
ec_oracle = pd.DataFrame({'Gas Price (GWei)':df["ec"], 'Oracles': 'Ether Chain'})
egs_oracle = pd.DataFrame({'Gas Price (GWei)':df["egs"]/10, 'Oracles': 'Ether Gas Station'})
oracles = ec_oracle.append(egs_oracle)
sns.set(style="whitegrid", rc={'figure.figsize':(7,8)})
ax = sns.violinplot( x="Oracles" ,  y="Gas Price (GWei)", data=oracles)

violin-plot

import pandas as pd
df =pd.read_csv('/tmp/egsOracle.csv')

df['fastest'] = df['fastest']/10
df['fast'] = df['fast']/10
df['safeLow'] = df['safeLow']/10
df['average'] = df['average']/10

egs_fastest = pd.DataFrame({'y':df["fastest"], 'x': 'fastest'})
egs_fast = pd.DataFrame({'y':df["fast"], 'x': 'fast'})
egs_safeLow = pd.DataFrame({'y':df["safeLow"], 'x': 'safeLow'})
egs_average = pd.DataFrame({'y':df["average"], 'x': 'average'})

sns.set(style="whitegrid", rc={'figure.figsize':(30,8)})
ax = sns.violinplot( x="x" , y="y", data=egs_categories)

categories violin plots

Transactions data analysis

bunzip2 txs.csv.bz2
import pandas as pd
import seaborn as sns
df = pd.read_csv('txs.csv')
df = df[df.waiting_time > 15] # a tx needs to wait at least 1 block equals to 15 seconds
df = df[df.waiting_time < 150] # outliners set to txs waiting more than 30 blocks
df = df[df.gas_price < 10]
txs_less_10 = pd.DataFrame({'Waiting time (sec)': df["waiting_time"], 'x': 'Gas Price < 10 Gwei'})


df = pd.read_csv('txs.csv')
df = df[df.waiting_time > 15] # a tx needs to wait at least 1 block equals to 15 seconds
df = df[df.waiting_time < 150] # outliners set to txs waiting more than 30 blocks
df = df[df.gas_price > 10]
txs_greater_10 = pd.DataFrame({'Waiting time (sec)': df["waiting_time"], 'x': 'Gas Price > 10 Gwei'})

ax = sns.violinplot( y="Waiting time (sec)" , x="x", data=txs_less_10.append(txs_greater_10))

tx-confirmation-time-vs-gas-price

df = pd.read_csv('block.csv')

x=(df['fees']/10e+9)
x.describe()

ax = sns.boxplot(x)
ax.set_xscale("log")

Block data analysis

To calculate the block_time from the received_time variable of block, I calculated the difference between previous and current row.

SELECT
    height , received_time, fees, size, n_tx, lowest_gas_price,
    received_time - LAG ( received_time, 1, 0 ) OVER ( ORDER BY height ) block_time ,
    height - LAG ( height, 1, 0 ) OVER ( ORDER BY height ) height_diff
FROM
    block 

LIMIT 10
OFFSET 1;

SELECT
    received_time, height, fees, size, n_tx, lowest_gas_price, block_time
FROM
    block
ORDER BY height
LIMIT 10
OFFSET 1;

Generate descriptive statistics in latex

import pandas as pd

df = pd.read_csv('txs.csv')

print(((df.describe()).transpose()).to_latex())

# to suppress scientific notation
print(((df.describe().apply(lambda s: s.apply(lambda x: format(x, 'g')))).transpose()).to_latex())

References

  1. https://www.blockcypher.com/dev/ethereum/#blockchain
  2. https://www.etherchain.org/

About

Empirical Study of Ethereum Transactions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages