Ethereum Dataset


Ethereum On-chain Data

Introduction

We run an Ethereum full node (up to block 8,099,999) to obtain the on-chain data (blocks, traces, and receipts) and process them into the following datasets:

Block: information on 9,000,000 blocks.

NormalTransaction: 590,040,569 transactions extracted from the block data.

InternalEtherTransaction: Ether is the native cryptocurrency of Ethereum. Ether is transferred not only by the transactions recorded in blocks but also during smart contract execution. We collect 374,352,941 Ether transfers among 62,792,733 addresses.

ContractInfo: Ethereum can be considered a platform for smart contracts. There are 19,678,790 smart contracts created by 145,060 addresses, which implies that many users create multiple contracts. 7,261,009 contracts have been deleted (self-destructed), refunding their Ether balances to 19,136,653 addresses.

ContractCall: In the EVM, a smart contract can call another contract to invoke its code or functions. The dataset contains 1,392,685,279 contract calls, of which 842,001,519 carry input code and 171,862,694 contain errors.
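Calls that carry input code can be grouped by the function they invoke. Under the Solidity ABI convention, the first 4 bytes of the call data are the function selector. The sketch below extracts it from a hex string; whether the `callingFunction` column stores the raw input, the selector, or a decoded name is not stated on this page, so the input format and the helper name `function_selector` are assumptions.

```python
from typing import Optional


def function_selector(call_input: str) -> Optional[str]:
    """Return the 4-byte function selector of hex-encoded call data.

    Assumes `call_input` is a hex string such as "0xa9059cbb...". Returns
    None when fewer than 4 bytes are present (e.g. a plain Ether transfer).
    """
    data = call_input[2:] if call_input.startswith("0x") else call_input
    if len(data) < 8:  # 8 hex characters = 4 bytes
        return None
    return "0x" + data[:8].lower()


# An ERC20 transfer(address,uint256) call starts with the selector 0xa9059cbb.
calldata = "0xa9059cbb" + "00" * 64
print(function_selector(calldata))  # prints 0xa9059cbb
print(function_selector("0x"))      # prints None
```

Counting selectors over the calls with input code gives a quick view of which functions dominate contract-to-contract traffic.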

ERC20Transaction: To collect token information, we process the receipt dataset to extract the standard events defined in the ERC20 standard of the Ethereum community. Each ERC20 token also has basic information such as its name, symbol, and total supply. There are 290,592,657 ERC20 transactions among 46,634,221 holder addresses.

ERC721Transaction: ERC721 is another token standard proposed by the Ethereum community. We find 2,789 ERC721 contracts with 23,308,838 token transactions and 457,739 holder addresses.
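Both token datasets come from receipt logs. ERC20 and ERC721 each emit a `Transfer(address,address,uint256)` event with the same topic hash; in ERC20 the amount sits in the log data, whereas in ERC721 the token ID is indexed as a fourth topic. The sketch below decodes one log dictionary shaped like a JSON-RPC receipt log (`address`, `topics`, `data`). It is an illustration under that assumption, not the extraction code used to build these datasets, and non-compliant tokens may not decode this way.

```python
from typing import Any, Dict, List, Optional

# keccak256("Transfer(address,address,uint256)"): topic0 of the Transfer event
# in both ERC20 and ERC721.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    # Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes.
    return "0x" + topic[-40:].lower()


def decode_transfer(log: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Decode one receipt log into an ERC20 or ERC721 transfer record.

    Assumes JSON-RPC-style fields: "address", "topics" (0x-prefixed hex
    strings) and "data" (0x-prefixed hex string). Returns None for logs that
    are not a Transfer event with 3 or 4 topics.
    """
    topics: List[str] = log.get("topics", [])
    if not topics or topics[0].lower() != TRANSFER_TOPIC:
        return None
    record: Dict[str, Any] = {"tokenAddress": log["address"].lower()}
    if len(topics) == 3:
        # ERC20: from/to are indexed, the amount is ABI-encoded in data.
        record.update(standard="ERC20", value=int(log["data"], 16))
    elif len(topics) == 4:
        # ERC721: the token ID is indexed as a fourth topic.
        record.update(standard="ERC721", tokenId=int(topics[3], 16))
    else:
        return None
    record["from"] = topic_to_address(topics[1])
    record["to"] = topic_to_address(topics[2])
    return record
```

The number of topics is what separates the two standards here, since they share the event signature.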

More details and analysis are given in the paper “XBlock-ETH: Extracting and Exploring Blockchain Data from Ethereum”.

Data details

About this table
Ethereum block information.
Columns (14 columns)
blockNumber block number
timestamp timestamp
size block size
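difficulty mining difficulty
transactionCount number of transactions in the block
minerAddress miner address
minerExtra extra data set by the miner
gasLimit block gas limit
gasUsed total gas used by the block
minGasPrice minimum gas price in the block
maxGasPrice maximum gas price in the block
avgGasPrice average gas price in the block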
miner miner address
reward miner reward
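The minGasPrice, maxGasPrice, and avgGasPrice columns summarize the gas prices of a block's transactions. Their exact definitions are not given here; the sketch assumes a plain minimum, maximum, and unweighted mean, with zeros for blocks that contain no transactions. `gas_price_stats` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List


def gas_price_stats(gas_prices: List[int]) -> Dict[str, float]:
    """Summarize the gas prices of one block's transactions.

    Assumption: avgGasPrice is an unweighted mean; the dataset may define
    it differently (e.g. weighted by gas used).
    """
    if not gas_prices:  # assumed convention for empty blocks
        return {"minGasPrice": 0, "maxGasPrice": 0, "avgGasPrice": 0.0}
    return {
        "minGasPrice": min(gas_prices),
        "maxGasPrice": max(gas_prices),
        "avgGasPrice": mean(gas_prices),
    }


print(gas_price_stats([20, 10, 30]))
```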
About this table
Ethereum block normal transactions information.
Columns (10 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
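from sender address
to recipient address
creates address of the contract created by the transaction, if any
value amount of Ether transferred
gasLimit gas limit of the transaction
gasPrice gas price
gasUsed gas used by the transaction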
About this table
Ether transfers, including both transfers in the transactions recorded in blocks and transfers that occur during smart contract execution.
Columns (8 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
from sender address
to recipient address
fromIsContract whether the sender is a contract
toIsContract whether the recipient is a contract
value amount of Ether transferred
About this table
Ethereum Contract Information
Columns (11 columns)
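address contract address
createdBlockNumber block number in which the contract was created
createdTransactionHash hash of the transaction that created the contract
creator address of the contract creator
createValue amount of Ether sent with the creation
creationCode creation bytecode
contractCode deployed contract bytecode
decreatedBlockNumber block number in which the contract was destroyed
decreatedTransactionHash hash of the transaction that destroyed the contract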
refunder address to which the remaining balance was refunded
refundValue refunded amount
About this table
Ethereum contract calls
Columns (11 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
from caller address
to callee address
fromIsContract whether the caller is a contract
toIsContract whether the callee is a contract
callType call type
callingFunction function invoked by the call
value amount of Ether transferred
isError whether the call ended with an error
About this table
Ethereum ERC20 token transaction information.
Columns (7 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
tokenAddress token contract address
from sender address
to recipient address
value amount of tokens transferred
About this table
Ethereum ERC721 token transaction information.
Columns (7 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
tokenAddress token contract address
from sender address
to recipient address
tokenId ID of the transferred token

Citation

BibTeX

@article{zhen2020xblock,
title={XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum},
author={Zheng, Peilin and Zheng, Zibin and Wu, Jiajing and Dai, Hong-ning},
journal={IEEE Open Journal of the Computer Society},
year={2020},
volume={1},
number={},
pages={95-106},
}

IEEE

P. Zheng, Z. Zheng, J. Wu and H. Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” in IEEE Open Journal of the Computer Society, vol. 1, pp. 95-106, 2020, doi: 10.1109/OJCS.2020.2990458.

ACM

Peilin Zheng, Zibin Zheng, Jiajing Wu, and Hongning Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” IEEE Open Journal of the Computer Society, vol. 1, pp. 95-106, 2020, doi: 10.1109/OJCS.2020.2990458.

Ethereum Partial Transaction Dataset

Introduction

Unlike the Ethereum On-chain Data, the Ethereum Partial Transaction Dataset consists of three relatively small Ethereum datasets (EthereumG1, EthereumG2, and EthereumG3) that are easier to analyze. The transaction data are modeled as complex networks and can be used for graph analysis tasks such as link prediction.

In the constructed network, a node represents an Ethereum account and a link (i.e. edge) represents an Ethereum transfer transaction.

In our work, we conduct temporal link prediction on these three datasets. The links observed in the past (smaller timestamps) serve as training data for predicting the links that occur in the future (larger timestamps). More details are available in the related research.
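The temporal setup can be sketched as a split in timestamp order plus negative sampling. The output keys mirror those of the dataset split pickle for EthereumG1 (train_edges_pos, test_edges_pos, train_edges_false, test_edges_false); the function name `temporal_split`, the 80/20 ratio, and uniform negative sampling with one negative per positive pair are hypothetical choices, not the settings of the related research.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, float, int]  # (from_node_num, to_node_num, value, timestamp)
Pair = Tuple[int, int]


def temporal_split(edges: List[Edge], train_ratio: float = 0.8,
                   seed: int = 0) -> Dict[str, List[Pair]]:
    """Split edges by time, then sample negative (non-edge) node pairs.

    Illustrative assumptions: a cut in timestamp order at `train_ratio`
    and uniform negative sampling, one negative per positive pair.
    """
    ordered = sorted(edges, key=lambda e: e[3])  # oldest links first
    cut = int(len(ordered) * train_ratio)
    train_pos = [(u, v) for u, v, _, _ in ordered[:cut]]
    test_pos = [(u, v) for u, v, _, _ in ordered[cut:]]

    nodes = sorted({n for u, v, _, _ in edges for n in (u, v)})
    existing: Set[Pair] = {(u, v) for u, v, _, _ in edges}
    rng = random.Random(seed)

    def sample_negatives(k: int) -> List[Pair]:
        # Draw random ordered pairs that never occur as an edge; the attempt
        # cap stops the loop on very dense graphs (fewer pairs may return).
        negatives: List[Pair] = []
        attempts = 0
        while len(negatives) < k and attempts < 100 * (k + 1):
            attempts += 1
            u, v = rng.choice(nodes), rng.choice(nodes)
            if u != v and (u, v) not in existing:
                negatives.append((u, v))
        return negatives

    return {
        "train_edges_pos": train_pos,
        "test_edges_pos": test_pos,
        "train_edges_false": sample_negatives(len(train_pos)),
        "test_edges_false": sample_negatives(len(test_pos)),
    }
```

Because `sorted` is stable, edges with equal timestamps keep their input order, so the cut is deterministic for a given edge list.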

The data details of EthereumG1 are described below. EthereumG2 and EthereumG3 have a file structure similar to that of EthereumG1; see the README file for more information.

Data details

About this table
Each row represents a mapping from Ethereum address to unique node number (ID).
Columns (2 columns)
addr Ethereum address
idx Node number (ID)
About this table
All collected edges, sorted by timestamp.
Columns (4 columns)
From The sender of the transaction
To The recipient of the transaction
Value The amount of Ether transferred
Timestamp When the transaction occurred
About this table
The edge list of the training data for temporal link prediction problem.
Columns (4 columns)
from_node_num Node number (From)
to_node_num Node number (To)
value Value
timestamp Timestamp
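The address-to-ID mapping and the sorted edge file combine into the node-number edge list above. A minimal sketch, assuming both files are CSVs with header rows named as in the tables (addr, idx and From, To, Value, Timestamp) and that Timestamp is integer Unix time; the file names are not given here, and `load_address_index` and `to_node_edges` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_address_index(f: TextIO) -> Dict[str, int]:
    # Mapping file: header "addr,idx"; addresses are lower-cased for matching.
    return {row["addr"].lower(): int(row["idx"]) for row in csv.DictReader(f)}


def to_node_edges(f: TextIO, index: Dict[str, int]) -> List[Tuple[int, int, float, int]]:
    # Edge file: header "From,To,Value,Timestamp"; Timestamp is assumed to be
    # integer Unix time. Value is parsed as float, which loses precision for
    # very large integer amounts.
    edges = [
        (index[row["From"].lower()], index[row["To"].lower()],
         float(row["Value"]), int(row["Timestamp"]))
        for row in csv.DictReader(f)
    ]
    return sorted(edges, key=lambda e: e[3])  # keep chronological order


mapping = io.StringIO("addr,idx\n0xAA,0\n0xBB,1\n")
transfers = io.StringIO(
    "From,To,Value,Timestamp\n0xbb,0xaa,1.0,1500000100\n0xaa,0xbb,2.5,1500000000\n"
)
print(to_node_edges(transfers, load_address_index(mapping)))
# prints: [(0, 1, 2.5, 1500000000), (1, 0, 1.0, 1500000100)]
```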
About this table
The dataset split, stored as a pickle file that loads as a dict.
Keys (4 elements)
train_edges_pos positive node pairs in training set
test_edges_pos positive node pairs in test set
train_edges_false negative node pairs in training set
test_edges_false negative node pairs in test set

Citation

BibTeX

@article{lin2020modeling,
title={Modeling and Understanding {Ethereum} Transaction Records via A Complex Network Approach},
author={Lin, Dan and Wu, Jiajing and Yuan, Qi and Zheng, Zibin},
journal={IEEE Transactions on Circuits and Systems--II: Express Briefs},
year={2020},
note = {to be published, doi: \url{10.1109/TCSII.2020.2968376}},
publisher={IEEE},
}

IEEE

D. Lin, J. Wu, Q. Yuan, and Z. Zheng, “Modeling and understanding ethereum transaction records via a complex network approach,” IEEE Transactions on Circuits and Systems–II: Express Briefs, 2020, to be published, doi: 10.1109/TCSII.2020.2968376.

ACM

Dan Lin, Jiajing Wu, Qi Yuan, and Zibin Zheng, “Modeling and understanding ethereum transaction records via a complex network approach,” IEEE Transactions on Circuits and Systems–II: Express Briefs, 2020, to be published, doi: 10.1109/TCSII.2020.2968376.

Smart Ponzi Scheme Labels

Introduction

A Ponzi scheme is a fraudulent investment operation where the operator generates returns for older investors through revenue paid by new investors.

Ponzi_label.csv: labels of smart Ponzi contracts, obtained by manual checking.

Data details

About this table
Labels indicating whether each contract is a smart Ponzi scheme.
Columns (2 columns)
Contract The address of the contract
Ponzi Whether the contract is a smart Ponzi scheme (1 if yes)
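Loading the labels takes a few lines with the standard csv module. The sketch assumes Ponzi_label.csv has a header row with the Contract and Ponzi columns and stores labels as integers; addresses are lower-cased so they can be joined against other datasets regardless of checksum capitalization. `load_ponzi_labels` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, TextIO


def load_ponzi_labels(f: TextIO) -> Dict[str, int]:
    # Map each contract address (lower-cased) to its label: 1 = Ponzi scheme.
    return {row["Contract"].strip().lower(): int(row["Ponzi"])
            for row in csv.DictReader(f)}


# In-memory sample standing in for Ponzi_label.csv.
sample = io.StringIO("Contract,Ponzi\n0xAbC,1\n0xdef,0\n")
labels = load_ponzi_labels(sample)
print(sum(labels.values()), "of", len(labels), "contracts labeled Ponzi")
# prints: 1 of 2 contracts labeled Ponzi
```

With the real file, pass `open("Ponzi_label.csv", newline="")` instead of the in-memory sample.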

Citation

BibTeX

@inproceedings{chen2018detecting,  
title={Detecting ponzi schemes on ethereum: Towards healthier blockchain technology},
author={Chen, Weili and Zheng, Zibin and Cui, Jiahui and Ngai, Edith and Zheng, Peilin and Zhou, Yuren},
booktitle={Proceedings of the 2018 World Wide Web Conference},
pages={1409--1418},
year={2018},
organization={International World Wide Web Conferences Steering Committee}
}

IEEE

W. Chen, Z. Zheng, J. Cui, E. Ngai, P. Zheng, and Y. Zhou. 2018. Detecting Ponzi Schemes on Ethereum: Towards Healthier Blockchain Technology. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1409–1418. DOI:https://doi.org/10.1145/3178876.3186046

ACM

Weili Chen, Zibin Zheng, Jiahui Cui, Edith Ngai, Peilin Zheng, and Yuren Zhou. 2018. Detecting Ponzi Schemes on Ethereum: Towards Healthier Blockchain Technology. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1409–1418. DOI:https://doi.org/10.1145/3178876.3186046

Smart Contract Attribute Dataset

Introduction

Open Source Contract Info.csv contains about 14,000 contracts whose source code is published on Etherscan.

ContractInfo.csv was crawled from the Ethereum main chain through the geth RPC interface.
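Which RPC calls the crawler used is not stated. As an illustration, the standard JSON-RPC method `eth_getCode` returns the deployed bytecode of an address, the kind of data in the contractCode column. The endpoint URL is an assumption (geth's default HTTP port, with the HTTP RPC enabled), and `rpc_payload` and `get_contract_code` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: a local geth node with its HTTP JSON-RPC endpoint enabled.
NODE_URL = "http://localhost:8545"


def rpc_payload(method: str, params: list, request_id: int = 1) -> Dict[str, Any]:
    # JSON-RPC 2.0 request body accepted by Ethereum clients.
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def get_contract_code(address: str, block: str = "latest", url: str = NODE_URL) -> str:
    """Fetch deployed bytecode via eth_getCode ("0x" means no code)."""
    body = json.dumps(rpc_payload("eth_getCode", [address, block])).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Creation bytecode is a different artifact: it is the input of the creating transaction rather than the code stored at the address, so a crawler needs further calls to fill the creationCode column.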

Data details

About this table
All Open Source Contracts in Ethereum.
Columns (9 columns)
address Address of Open Source Smart Contract
ContractCode Bytecode of Contract
timestamp Timestamp of Contract Creation
createValue Value Sent at Contract Creation
createBlockNumber Block Height of Contract Creation
createdTransactionHash Transaction of Contract Creation
creationCode Bytecode of Contract Creation
creator Address of Creator
code Source Code of Contract  
About this table
Information of Smart Contract of Ethereum.
Columns (7 columns)
address Address of Open Source Smart Contract
createdBlockNumber Block Height of Contract Creation
createdTransactionHash Transaction of Contract Creation
creator Address of Contract Creator
createValue Value Sent at Contract Creation
creationCode Bytecode of Contract Creation
contractCode Bytecode of Contract

Citation

BibTeX

@inproceedings{huang2019recommending,
   title={Recommending differentiated code to support smart contract update},
   author={Huang, Yuan and Kong, Queping and Jia, Nan and Chen, Xiangping and Zheng, Zibin},
   booktitle={Proceedings of the 27th International Conference on Program Comprehension},
   pages={260--270},
   year={2019},
   organization={IEEE Press}
}

IEEE

Y. Huang, Q. Kong, N. Jia, X. Chen and Z. Zheng, "Recommending Differentiated Code to Support Smart Contract Update," 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), Montreal, QC, Canada, 2019, pp. 260-270.

ACM

Yuan Huang, Queping Kong, Nan Jia, Xiangping Chen, and Zibin Zheng. 2019. Recommending differentiated code to support smart contract update. In Proceedings of the 27th International Conference on Program Comprehension (ICPC ’19). IEEE Press, 260–270. DOI:https://doi.org/10.1109/ICPC.2019.00045

Second-order Transaction Network of Phishing Nodes

Introduction

The second-order transaction network dataset contains 1660 target phishing nodes and 1700 non-phishing nodes crawled from Etherscan.

The second-order transaction network is divided into two parts: CSV files recording the transactions between each target node and its first-order transaction nodes (accounts that transacted directly with the target), and CSV files recording the transactions between each first-order transaction node and its own counterparties, which are the target's second-order transaction nodes.

Data details

About this table
Transaction information between the target node and its first-order transaction node.
Columns (6 columns)
TxHash Transaction hash
BlockHeight Height of block
TimeStamp Transaction timestamp
From Address of transaction initiating node
To Address of transaction receiving node
Value Transaction amount

Citation

BibTeX

@misc{xblockEthereum,
author = {Yuan, Zihao and Wu, Jiajing and Zheng, Zibin},
title = {{XBLOCK Blockchain Datasets}: {InPlusLab} Ethereum Transaction Network of Phishing Nodes Datasets},
howpublished = {\url{http://xblock.pro/ethereum/}},
month = Mar,
year = 2020
}

IEEE

Z. Yuan, J. Wu, and Z. Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Transaction Network of Phishing Nodes Datasets,” http://xblock.pro/ethereum/, Accessed: Mar 2020.

ACM

Zihao Yuan, Jiajing Wu, and Zibin Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Transaction Network of Phishing Nodes Datasets,” http://xblock.pro/ethereum/, Accessed: Mar 2020.

Ethereum Phishing Transaction Network

Introduction

Cryptocurrency, the best-known application of blockchain, suffers huge economic losses from phishing scams. In our work, Ethereum accounts and transactions are treated as nodes and edges, so detecting phishing accounts can be modeled as a node classification problem.

In this work, we collected the Ethereum phishing nodes reported in Etherscan's label cloud. Starting from these phishing nodes, we crawled a large Ethereum transaction network via second-order BFS. The dataset contains 2,973,489 nodes, 13,551,303 edges, and 1,165 labeled nodes.
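Second-order BFS means expanding up to two hops from each phishing seed. A minimal sketch, assuming a hypothetical `neighbors` callback that returns an account's counterparties (for instance from an explorer's transaction list); `khop_crawl` is likewise a hypothetical name. It records every discovered address with its hop distance and does not expand nodes at the maximum depth.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def khop_crawl(seeds: Iterable[str],
               neighbors: Callable[[str], Iterable[str]],
               max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first crawl up to `max_depth` hops from the seed accounts.

    Returns each discovered address with its hop distance (0 for seeds).
    """
    depth: Dict[str, int] = {s: 0 for s in seeds}
    queue = deque(depth)  # start from all seeds
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the second order
        for nb in neighbors(node):
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return depth


toy = {"phish": ["a", "b"], "a": ["c"], "b": ["phish"], "c": ["d"]}
print(khop_crawl(["phish"], lambda n: toy.get(n, [])))
# prints: {'phish': 0, 'a': 1, 'b': 1, 'c': 2}
```

Account "d" is three hops away, so it is never reached; this depth cap is what keeps a second-order crawl bounded.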

MulDiGraph.pkl: this dataset is a networkx object stored in pickle format. Each node is an address with an attribute called isp indicating whether it is a phishing node. Each edge has two attributes, amount and timestamp, which represent the amount and the timestamp of the transaction, respectively. The graph has 2,973,489 nodes and 13,551,303 transactions, and its average degree (edges per node) is 4.5574.
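The reported average degree matches edges per node: 13,551,303 / 2,973,489 ≈ 4.5574. In a directed graph each edge adds one out-degree and one in-degree, so this is the mean out-degree (equal to the mean in-degree); the mean total degree would be about 9.11. The sketch below loads the pickle and recomputes these figures. It assumes networkx is installed (unpickling needs it) and that nodes carry the isp attribute; `load_graph` and `summarize` are hypothetical helpers.

```python
import pickle
from typing import Any, Dict


def load_graph(path: str) -> Any:
    # The pickled object is a networkx graph, so networkx must be importable.
    # Only unpickle files from sources you trust.
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(g: Any) -> Dict[str, float]:
    # Works on a networkx (Multi)DiGraph whose nodes carry an "isp" attribute;
    # nodes without it are counted as non-phishing.
    n = g.number_of_nodes()
    m = g.number_of_edges()
    phishing = sum(1 for _, isp in g.nodes(data="isp", default=0) if isp == 1)
    return {
        "nodes": n,
        "edges": m,
        "phishing_nodes": phishing,
        # m / n is the mean out-degree, which equals the mean in-degree.
        "mean_out_degree": m / n if n else 0.0,
    }
```

Usage would be `summarize(load_graph("MulDiGraph.pkl"))`, assuming the file sits in the working directory.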

Data details

About this table
Because networkx data cannot be displayed directly as a table, the isp, amount, and timestamp attributes were reorganized into the following tabular form.
Columns (6 columns)
from Address
to Address
amount Amount of the transaction
timestamp When the transaction finished
fromIsPhi 1 if the sender is a labeled phishing node, otherwise 0
toIsPhi 1 if the recipient is a labeled phishing node, otherwise 0
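The flattening into this table can be reproduced roughly as follows, assuming the node attribute isp and the edge attributes amount and timestamp described for MulDiGraph.pkl. `flatten_edges` and `write_rows` are hypothetical helper names, and nodes without an isp attribute are treated as non-phishing.

```python
import csv
from typing import Any, Dict, List, TextIO

COLUMNS = ["from", "to", "amount", "timestamp", "fromIsPhi", "toIsPhi"]


def flatten_edges(g: Any) -> List[Dict[str, Any]]:
    """Turn a networkx MultiDiGraph into one row per transaction edge."""
    isp = dict(g.nodes(data="isp", default=0))  # node -> phishing label
    rows = []
    for u, v, attrs in g.edges(data=True):  # parallel edges are kept
        rows.append({
            "from": u,
            "to": v,
            "amount": attrs.get("amount"),
            "timestamp": attrs.get("timestamp"),
            "fromIsPhi": int(isp[u] == 1),
            "toIsPhi": int(isp[v] == 1),
        })
    return rows


def write_rows(rows: List[Dict[str, Any]], f: TextIO) -> None:
    # Open `f` with newline="" when writing to a real file.
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```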

Citation

BibTeX

@misc{xblockEthereum,
author = {Chen, Liang and Peng, Jiaying and Liu, Yang and Li, Jintang and Xie, Fenfang and Zheng, Zibin},
title = {{XBLOCK Blockchain Datasets}: {InPlusLab} Ethereum Phishing Detection Datasets},
howpublished = {\url{http://xblock.pro/ethereum/}},
year = 2019
}

IEEE

L. Chen, J. Peng, Y. Liu, J. Li, F. Xie, and Z. Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Phishing Detection Datasets,” http://xblock.pro/ethereum/, Accessed: Nov 2019.

ACM

Liang Chen, Jiaying Peng, Yang Liu, Jintang Li, Fenfang Xie, and Zibin Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Phishing Detection Datasets,” http://xblock.pro/ethereum/, Accessed: Nov 2019.

Ether Price and Volume Dataset

Introduction

This dataset contains Ether market data (prices and volumes) from August 2015, when Ether first appeared, to March 2019. The data are sampled at four-hour intervals: each record holds the prices and volumes of one four-hour period.

The raw Ether market data were obtained from Poloniex, one of the most active crypto-asset exchanges.

Data details

About this table
Ether market data for the ETH/USDT trading pair, with prices quoted in USDT.
Columns (8 columns)
close The close price in the period
date The timestamp at the beginning of the period
high The highest price in the period
low The lowest price in the period
open The open price in the period
quoteVolume The quote volume in the period
volume The base volume in the period
weightedVolume The volume-weighted average price over the period's base and quote volume
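Two derived quantities are common with four-hour bars like these: a volume-weighted price and log returns of the close price. The sketch assumes Poloniex's convention for this pair (base volume in USDT, quote volume in ETH), so volume / quoteVolume gives a price in USDT per ETH; check this against the weightedVolume column before relying on it. The helper names are hypothetical.

```python
import math
from typing import Dict, List


def weighted_average_price(row: Dict[str, float]) -> float:
    # Assumed convention: "volume" in USDT, "quoteVolume" in ETH, so the
    # ratio is the period's volume-weighted price. NaN when nothing traded.
    return row["volume"] / row["quoteVolume"] if row["quoteVolume"] else float("nan")


def log_returns(closes: List[float]) -> List[float]:
    # r_t = ln(close_t / close_{t-1}) for consecutive four-hour periods.
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


print(weighted_average_price({"volume": 300.0, "quoteVolume": 2.0}))  # prints 150.0
print(len(log_returns([100.0, 110.0, 99.0])))                          # prints 2
```

Log returns are additive over time, which makes them a convenient input for volume-return and long-range-dependence analyses of the kind cited below.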

Citation

BibTeX

@article{han2020long,
title={Long-range dependence, multi-fractality and volume-return causality of Ether market},
author={Han, Qing and Wu, Jiajing and Zheng, Zibin},
journal={Chaos: An Interdisciplinary Journal of Nonlinear Science},
volume={30},
number={1},
pages={011101},
year={2020},
publisher={AIP Publishing LLC}
}

IEEE

Q. Han, J. Wu and Z. Zheng, “Long-range dependence, multi-fractality and volume-return causality of Ether market,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 30, no. 1, pp. 011101, 2020.

ACM

Qing Han, Jiajing Wu and Zibin Zheng, “Long-range dependence, multi-fractality and volume-return causality of Ether market,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 30, no. 1, pp. 011101, 2020.
