Ethereum Dataset
Ethereum On-chain Data
Introduction
We run an Ethereum full node (up to block 8099999) to get the on-chain data (blocks, traces, receipts) and process it into the following datasets:
Block: Information on 9,000,000 blocks.
NormalTransaction: There are 590,040,569 transactions generated from the block data.
InternalEtherTransaction: Ether is the native cryptocurrency of Ethereum. Ether transfers happen not only in the transactions recorded in blocks, but also during smart contract execution. We collected 374,352,941 Ether transactions among 62,792,733 addresses.
ContractInfo: Ethereum can be considered a platform for smart contracts. There are 19,678,790 smart contracts created by 145,060 addresses, which implies that many users create multiple contracts. 7,261,009 contracts have been deleted, refunding their Ether balances to 19,136,653 addresses.
ContractCall: In the EVM, a smart contract can call another one to invoke code or functions. The dataset consists of 1,392,685,279 contract calls, among which 842,001,519 contain input code and 171,862,694 contain errors.
ERC20Transaction: To collect token information, we process the receipt dataset to extract the standard events defined in the ERC20 protocol of the Ethereum community. Additionally, each ERC20 token contains basic information such as name, symbol, and total supply. There are 290,592,657 ERC20 transactions among 46,634,221 holder addresses.
ERC721Transaction: The ERC721 token is another contract protocol proposed by the Ethereum community. We find 2,789 ERC721 contracts, which contain 23,308,838 token transactions and 457,739 holder addresses.
You can get more details and analysis from the paper “XBlock-ETH: Extracting and Exploring Blockchain Data from Ethereum”.
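As an illustration of how the standard token events mentioned above can be recognized in receipt logs, here is a minimal Python sketch. It assumes each log has the `address`, `topics`, and `data` fields used by JSON-RPC transaction receipts; the function names are hypothetical, and this is not the pipeline used to build the datasets. The ERC20 and ERC721 `Transfer` events share one signature hash, and differ in whether the third argument is indexed.

```python
# keccak256("Transfer(address,address,uint256)"): the topic hash shared by
# the ERC20 and ERC721 Transfer events.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic):
    """Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log):
    """Return a dict describing a Transfer event, or None for other logs.

    Assumption: `log` looks like one entry of the `logs` array in a
    JSON-RPC transaction receipt, with "address", "topics" and "data" keys.
    """
    topics = log.get("topics", [])
    if not topics or topics[0].lower() != TRANSFER_TOPIC:
        return None
    if len(topics) == 3:
        # ERC20: from and to are indexed, the value sits in the data field.
        return {
            "standard": "ERC20",
            "token": log["address"],
            "from": topic_to_address(topics[1]),
            "to": topic_to_address(topics[2]),
            "value": int(log["data"], 16),
        }
    if len(topics) == 4:
        # ERC721: the token id is indexed as well, so it is the fourth topic.
        return {
            "standard": "ERC721",
            "token": log["address"],
            "from": topic_to_address(topics[1]),
            "to": topic_to_address(topics[2]),
            "token_id": int(topics[3], 16),
        }
    return None  # non-standard layout, skipped in this sketch


# Hand-made example log (not taken from the dataset).
sample_log = {
    "address": "0x" + "11" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "aa" * 20,
        "0x" + "00" * 12 + "bb" * 20,
    ],
    "data": "0x" + format(1000, "064x"),
}
decoded = decode_transfer(sample_log)
print(decoded)
```

Counting the topics is a common heuristic for telling the two standards apart; contracts that emit a non-standard `Transfer` layout are simply skipped here.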
Data details
Citation
BibTeX
@article{zhen2020xblock,
title={XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum},
author={Zheng, Peilin and Zheng, Zibin and Wu, Jiajing and Dai, Hong-ning},
journal={IEEE Open Journal of the Computer Society},
year={2020},
volume={1},
number={},
pages={95-106},
}
IEEE
P. Zheng, Z. Zheng, J. Wu and H. Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” in IEEE Open Journal of the Computer Society, vol. 1, pp. 95-106, 2020, doi: 10.1109/OJCS.2020.2990458.
ACM
Peilin Zheng, Zibin Zheng, Jiajing Wu, and Hongning Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” IEEE Open Journal of the Computer Society, vol. 1, pp. 95-106, 2020, doi: 10.1109/OJCS.2020.2990458.
Ethereum Partial Transaction Dataset
Introduction
Unlike the Ethereum On-chain Data, the Ethereum Partial Transaction Datasets are three relatively small Ethereum datasets (namely EthereumG1, EthereumG2, and EthereumG3) for easier analysis. The transaction datasets are modeled as complex networks, which can be used in graph analysis such as link prediction.
In the constructed network, a node represents an Ethereum account and a link (i.e. edge) represents an Ethereum transfer transaction.
In our work, we conduct temporal link prediction with these three datasets. We use the existing links in the past (with smaller timestamps) as the training data to predict the occurrences of links in the future (with larger timestamps). You can learn more details in the Related Research.
The data details of EthereumG1 are described below. The file structure of EthereumG2 and EthereumG3 is similar to that of EthereumG1. You can find more information in the README file.
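The temporal split used for link prediction (earlier links for training, later links for prediction) can be sketched as follows. This is only an illustration: the `(sender, receiver, timestamp)` tuple layout, the function name, and the `train_fraction` parameter are assumptions, not the format of the EthereumG1 files.

```python
def temporal_split(edges, train_fraction=0.8):
    """Split timestamped edges into past (training) and future (test) links.

    Assumption: each edge is a (sender, receiver, timestamp) tuple.
    Edges are ordered by timestamp and cut at the given fraction, so every
    training edge is no later than every test edge. Edges that share the
    cut timestamp may fall on either side in this simple sketch.
    """
    ordered = sorted(edges, key=lambda edge: edge[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy edges, not taken from the dataset.
toy_edges = [
    ("a", "b", 40),
    ("b", "c", 10),
    ("c", "d", 30),
    ("a", "d", 20),
]
train, test = temporal_split(toy_edges, train_fraction=0.75)
print(train)
print(test)
```

Splitting by time rather than at random keeps future links out of the training data, which matches the prediction setting described above.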
Data details
Citation
BibTeX
@article{lin2020modeling,
title={Modeling and Understanding {Ethereum} Transaction Records via A Complex Network Approach},
author={Lin, Dan and Wu, Jiajing and Yuan, Qi and Zheng, Zibin},
journal={IEEE Transactions on Circuits and Systems--II: Express Briefs},
year={2020},
note = {to be published, doi: \url{10.1109/TCSII.2020.2968376}},
publisher={IEEE},
}
IEEE
D. Lin, J. Wu, Q. Yuan, and Z. Zheng, “Modeling and understanding ethereum transaction records via a complex network approach,” IEEE Transactions on Circuits and Systems–II: Express Briefs, 2020, to be published, doi: 10.1109/TCSII.2020.2968376.
ACM
Dan Lin, Jiajing Wu, Qi Yuan, and Zibin Zheng, “Modeling and understanding ethereum transaction records via a complex network approach,” IEEE Transactions on Circuits and Systems–II: Express Briefs, 2020, to be published, doi: 10.1109/TCSII.2020.2968376.
Smart Ponzi Scheme Labels
Introduction
A Ponzi scheme is a fraudulent investment operation where the operator generates returns for older investors through revenue paid by new investors.
Ponzi_label.csv: This dataset contains the labels of smart Ponzi contracts, obtained by manual checking.
Data details
Citation
BibTeX
@inproceedings{chen2018detecting,
title={Detecting ponzi schemes on ethereum: Towards healthier blockchain technology},
author={Chen, Weili and Zheng, Zibin and Cui, Jiahui and Ngai, Edith and Zheng, Peilin and Zhou, Yuren},
booktitle={Proceedings of the 2018 World Wide Web Conference},
pages={1409--1418},
year={2018},
organization={International World Wide Web Conferences Steering Committee}
}
IEEE
W. Chen, Z. Zheng, J. Cui, E. Ngai, P. Zheng, and Y. Zhou, “Detecting Ponzi schemes on Ethereum: Towards healthier blockchain technology,” in Proceedings of the 2018 World Wide Web Conference (WWW ’18), 2018, pp. 1409–1418, doi: 10.1145/3178876.3186046.
ACM
Weili Chen, Zibin Zheng, Jiahui Cui, Edith Ngai, Peilin Zheng, and Yuren Zhou. 2018. Detecting Ponzi Schemes on Ethereum: Towards Healthier Blockchain Technology. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1409–1418. DOI:https://doi.org/10.1145/3178876.3186046
Smart Contract Attribute Dataset
Introduction
The Open Source Contract Info.csv dataset contains about 14 thousand contracts that are open source on Etherscan.
The ContractInfo.csv dataset is crawled from the Ethereum main chain using the geth RPC tool.
Data details
Citation
BibTeX
@inproceedings{huang2019recommending,
title={Recommending differentiated code to support smart contract update},
author={Huang, Yuan and Kong, Queping and Jia, Nan and Chen, Xiangping and Zheng, Zibin},
booktitle={Proceedings of the 27th International Conference on Program Comprehension},
pages={260--270},
year={2019},
organization={IEEE Press}
}
IEEE
Y. Huang, Q. Kong, N. Jia, X. Chen and Z. Zheng, "Recommending Differentiated Code to Support Smart Contract Update," 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), Montreal, QC, Canada, 2019, pp. 260-270.
ACM
Yuan Huang, Queping Kong, Nan Jia, Xiangping Chen, and Zibin Zheng. 2019. Recommending differentiated code to support smart contract update. In Proceedings of the 27th International Conference on Program Comprehension (ICPC ’19). IEEE Press, 260–270. DOI:https://doi.org/10.1109/ICPC.2019.00045
Second-order Transaction Network of Phishing Nodes
Introduction
The second-order transaction network dataset contains 1660 target phishing nodes and 1700 non-phishing nodes crawled from Etherscan.
The second-order transaction network is divided into two parts, both stored as csv files. The first part records the transactions between each target node and its first-order transaction nodes. The second part records the transactions between each first-order transaction node and its corresponding second-order transaction nodes.
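To make the first-order and second-order terminology concrete, here is a minimal Python sketch that derives both neighborhoods of a target node from a list of transactions. It assumes each transaction is a `(sender, receiver)` pair and treats both directions as a connection; the function name and the toy data are hypothetical, and the real csv column names are not shown here.

```python
from collections import defaultdict


def second_order_neighborhood(edges, target):
    """Return (first_order, second_order) node sets around `target`.

    Assumption: `edges` holds (sender, receiver) pairs and a transaction in
    either direction connects two nodes.
    """
    adjacency = defaultdict(set)
    for sender, receiver in edges:
        adjacency[sender].add(receiver)
        adjacency[receiver].add(sender)

    # Nodes that transacted directly with the target.
    first_order = set(adjacency[target]) - {target}

    # Nodes that transacted with a first-order node, excluding nodes
    # already counted as the target or as first-order.
    second_order = set()
    for node in first_order:
        second_order |= adjacency[node]
    second_order -= first_order | {target}
    return first_order, second_order


# Toy transactions, not taken from the dataset.
toy_edges = [("T", "A"), ("B", "T"), ("A", "C"), ("B", "D"), ("C", "E")]
first, second = second_order_neighborhood(toy_edges, "T")
print(sorted(first), sorted(second))
```

In the toy data, `E` is three hops from the target and is therefore left out of both sets.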
Data details
Citation
BibTeX
@misc{xblockEthereum,
author = {Yuan, Zihao and Wu, Jiajing and Zheng, Zibin},
title = {{XBLOCK Blockchain Datasets}: {InPlusLab} Ethereum Transaction Network of Phishing Nodes Datasets},
howpublished = {\url{http://xblock.pro/ethereum/}},
month = Mar,
year = 2020
}
IEEE
Z. Yuan, J. Wu, and Z. Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Transaction Network of Phishing Nodes Datasets,” http://xblock.pro/ethereum/, Accessed: Mar 2020.
ACM
Zihao Yuan, Jiajing Wu, and Zibin Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Transaction Network of Phishing Nodes Datasets,” http://xblock.pro/ethereum/, Accessed: Mar 2020.
Ethereum Phishing Transaction Network
Introduction
Cryptocurrency, as blockchain’s most famous implementation, suffers a huge economic loss due to phishing scams. In our work, accounts and transactions in Ethereum are treated as nodes and edges, thus detection of phishing accounts can be modeled as a node classification problem.
In this work, we collected phishing nodes from Ethereum that are reported in the Etherscan label cloud. Starting from the phishing nodes, we crawl a huge Ethereum transaction network via second-order BFS. The dataset contains 2,973,489 nodes, 13,551,303 edges, and 1,165 labeled nodes.
MulDiGraph.pkl: This dataset is stored in pickle format and is a networkx object. Each node is an address with an attribute called isp indicating whether it is a phishing node. Each edge has two attributes, amount and timestamp, which represent the amount of the transaction and the timestamp of the transaction, respectively. In this dataset, the total number of nodes is 2,973,489, the number of transactions is 13,551,303, and the average degree is 4.5574.
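The following Python sketch shows how a graph with this layout could be inspected with networkx. To stay self-contained it builds a tiny stand-in graph and round-trips it through pickle instead of reading the real file. The attribute names `isp`, `amount`, and `timestamp` come from the description above; the value encoding (1 for phishing, 0 otherwise), the helper names, and the toy addresses are assumptions.

```python
import pickle

import networkx as nx


def build_toy_graph():
    """Tiny stand-in with the same attribute names as MulDiGraph.pkl."""
    graph = nx.MultiDiGraph()
    graph.add_node("0xaaa", isp=1)  # assumed encoding: 1 marks a phishing node
    graph.add_node("0xbbb", isp=0)
    graph.add_node("0xccc", isp=0)
    # A multigraph keeps parallel edges, one per transaction.
    graph.add_edge("0xbbb", "0xaaa", amount=1.5, timestamp=1500000000)
    graph.add_edge("0xbbb", "0xaaa", amount=0.25, timestamp=1500000600)
    graph.add_edge("0xaaa", "0xccc", amount=3.0, timestamp=1500001200)
    return graph


def summarize(graph):
    """Collect phishing nodes, total transferred amount and edges per node."""
    phishing = [node for node, isp in graph.nodes(data="isp") if isp == 1]
    total_amount = sum(data["amount"] for _, _, data in graph.edges(data=True))
    return {
        "phishing_nodes": phishing,
        "total_amount": total_amount,
        # Edges divided by nodes, which is the ratio that reproduces the
        # reported 4.5574 for 13,551,303 edges and 2,973,489 nodes.
        "average_degree": graph.number_of_edges() / graph.number_of_nodes(),
    }


# With the real file one would instead call pickle.load on the opened
# MulDiGraph.pkl; only unpickle files from a source you trust.
graph = pickle.loads(pickle.dumps(build_toy_graph()))
summary = summarize(graph)
print(summary)
```

A pickle written by a different networkx version may not load cleanly, so checking the README for the expected version is a reasonable first step.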
Data details
Citation
BibTeX
@misc{xblockEthereum,
author = {Chen, Liang and Peng, Jiaying and Liu, Yang and Li, Jintang and Xie, Fenfang and Zheng, Zibin},
title = {{XBLOCK Blockchain Datasets}: {InPlusLab} Ethereum Phishing Detection Datasets},
howpublished = {\url{http://xblock.pro/ethereum/}},
year = 2019
}
IEEE
L. Chen, J. Peng, Y. Liu, J. Li, F. Xie, and Z. Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Phishing Detection Datasets,” http://xblock.pro/ethereum/, Accessed: Nov 2019.
ACM
Liang Chen, Jiaying Peng, Yang Liu, Jintang Li, Fenfang Xie, and Zibin Zheng, “XBLOCK Blockchain Datasets: InPlusLab Ethereum Phishing Detection Datasets,” http://xblock.pro/ethereum/, Accessed: Nov 2019.
Ether Price and Volume Dataset
Introduction
This is the market data of Ether in terms of price and volume from August 2015 (when Ether first appeared) to March 2019. The sampling interval is four hours; that is, we take each kind of price and the volume for every four-hour interval as the original data.
The original market data of Ether are obtained from Poloniex, one of the most active crypto asset exchanges.
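For readers who want to reproduce a four-hour sampling from finer-grained data, here is a minimal Python sketch that groups timestamped ticks into four-hour bars. It assumes that "each kind of price" means open, high, low, and close, and that ticks are `(unix_timestamp, price, volume)` tuples; both are assumptions for illustration, not a description of the published files or of how they were produced.

```python
FOUR_HOURS = 4 * 60 * 60  # sampling interval in seconds


def four_hour_bars(ticks):
    """Aggregate (unix_timestamp, price, volume) ticks into four-hour bars.

    Returns a dict keyed by the start time of each interval. The
    open/high/low/close fields are an assumed reading of "each kind of price".
    """
    bars = {}
    for timestamp, price, volume in sorted(ticks):
        start = timestamp - timestamp % FOUR_HOURS
        bar = bars.get(start)
        if bar is None:
            # First tick of the interval sets every price field.
            bars[start] = {
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": volume,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # ticks are sorted, so the last one wins
            bar["volume"] += volume
    return bars


# Toy ticks, not taken from the dataset.
toy_ticks = [(0, 10.0, 1.0), (3600, 12.0, 2.0), (14400, 11.0, 0.5)]
bars = four_hour_bars(toy_ticks)
print(bars)
```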
Data details
Citation
BibTeX
@article{han2020long,
title={Long-range dependence, multi-fractality and volume-return causality of Ether market},
author={Han, Qing and Wu, Jiajing and Zheng, Zibin},
journal={Chaos: An Interdisciplinary Journal of Nonlinear Science},
volume={30},
number={1},
pages={011101},
year={2020},
publisher={AIP Publishing LLC}
}
IEEE
Q. Han, J. Wu and Z. Zheng, “Long-range dependence, multi-fractality and volume-return causality of Ether market,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 30, no. 1, pp. 011101, 2020.
ACM
Qing Han, Jiajing Wu and Zibin Zheng, “Long-range dependence, multi-fractality and volume-return causality of Ether market,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 30, no. 1, pp. 011101, 2020.