Transaction Dataset
Labeled Dataset
First-order Transaction Network of Phishing Nodes
Second-order Transaction Network of Phishing Nodes
Ethereum Phishing Transaction Network
Bitcoin Partial Transaction Dataset
Unlabeled Dataset
Ethereum On-chain Data
EOSIO On-chain Data
Ethereum Partial Transaction Dataset
Ethereum On-chain Data
Introduction
We run an Ethereum full node (up to block 10,999,999) to obtain the on-chain data (blocks, traces, receipts) and process it into the following datasets:
Block: Information on 11,000,000 blocks.
NormalTransaction: There are 858,580,934 transactions extracted from the block data.
InternalEtherTransaction: Ether is the native cryptocurrency of Ethereum. Ether transfers occur not only in the transactions recorded in blocks, but also during smart contract execution. We collect 529,634,152 Ether transactions among 87,570,650 addresses.
ContractInfo: Ethereum can be considered a platform for smart contracts. There are 31,949,110 smart contracts created by 182,142 addresses, which implies that many users create multiple contracts. 14,357,190 contracts have been deleted, refunding their Ether balances to 19,139,210 addresses.
ContractCall: In the EVM, a smart contract can call another contract to invoke its code or functions. The dataset consists of 2,205,957,409 contract calls, of which 1,518,793,033 contain input code.
ERC20Transaction: To collect token information, we process the receipt dataset to extract the standard events defined in the ERC20 protocol of the Ethereum community. Additionally, each ERC20 token carries basic information such as name, symbol and total supply. There are 467,603,485 ERC20 transactions among 66,719,139 holder addresses.
ERC721Transaction: ERC721 is another token contract protocol proposed by the Ethereum community. We find 6,255 ERC721 contracts with 28,313,312 token transactions and 653,535 holder addresses.
You can get more details and analysis from the paper “XBlock-ETH: Extracting and Exploring Blockchain Data from Ethereum”.
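The ERC20Transaction dataset is built from standard events found in transaction receipts. As a minimal sketch of that idea (not the authors' actual pipeline), the following decodes a single ERC20 `Transfer` log. The dictionary layout of `log` and the function name `decode_erc20_transfer` are assumptions made for illustration; the topic constant is the Keccak-256 hash of the event signature `Transfer(address,address,uint256)`.

```python
# Keccak-256 hash of the event signature "Transfer(address,address,uint256)".
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def decode_erc20_transfer(log):
    """Decode one receipt log into an ERC20 transfer, or return None.

    `log` is a hypothetical dict with "address", "topics" and "data" keys,
    all hex strings, mirroring the shape of an Ethereum receipt log.
    """
    topics = log["topics"]
    # An ERC20 Transfer log has exactly three topics: signature, from, to.
    # ERC721 reuses the signature but indexes tokenId as a fourth topic.
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "token": log["address"],
        # Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes.
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        # The non-indexed value is a 256-bit big-endian integer in `data`.
        "value": int(log["data"], 16),
    }


# Synthetic log used only to exercise the decoder.
example_log = {
    "address": "0x" + "11" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "aa" * 20,
        "0x" + "00" * 12 + "bb" * 20,
    ],
    "data": "0x" + format(1000, "064x"),
}
transfer = decode_erc20_transfer(example_log)
print(transfer)
```

Counting topics is one simple way to keep ERC20 and ERC721 transfers apart, since both standards share the same event signature.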
Data details
Citation
BibTeX
@article{zhen2020xblock,
title={XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum},
author={Zheng, Peilin and Zheng, Zibin and Wu, Jiajing and Dai, Hong-ning},
journal={IEEE Open Journal of the Computer Society},
year={2020},
volume={1},
number={},
pages={95-106},
}
IEEE
P. Zheng, Z. Zheng, J. Wu and H. Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” in IEEE Open Journal of the Computer Society, vol. 1, pp. 95-106, 2020, doi: 10.1109/OJCS.2020.2990458.
ACM
Peilin Zheng, Zibin Zheng, Jiajing Wu, and Hongning Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” IEEE Open Journal of the Computer Society, vol. 1, pp. 95-106, 2020, doi: 10.1109/OJCS.2020.2990458.
Ethereum Partial Transaction Dataset
Introduction
Unlike the Ethereum On-chain Data, the Ethereum Partial Transaction Dataset consists of three relatively small Ethereum datasets (namely EthereumG1, EthereumG2 and EthereumG3) for easier analysis. The transaction datasets are modeled as complex networks, which can be used in graph analysis tasks such as link prediction.
In the constructed network, a node represents an Ethereum account and a link (i.e. edge) represents an Ethereum transfer transaction.
In our work, we conduct temporal link prediction with these three datasets. We use the existing links in the past (with smaller timestamps) as the training data to predict the occurrences of links in the future (with larger timestamps). You can learn more details in the Related Research.
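The train/test protocol described above can be sketched as a split by timestamp. This is only an illustration: the `(source, target, timestamp)` tuple layout, the function name `temporal_split` and the 80/20 ratio are assumptions, not details taken from the dataset or the paper.

```python
def temporal_split(edges, train_fraction=0.8):
    """Split (source, target, timestamp) edges into past and future links.

    Edges are ordered by timestamp; the earliest `train_fraction` of them
    form the training set and the remaining, later edges form the test set.
    """
    ordered = sorted(edges, key=lambda edge: edge[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy transfer records with made-up account names and timestamps.
edges = [
    ("a", "b", 30),
    ("b", "c", 10),
    ("a", "c", 50),
    ("c", "d", 20),
    ("d", "a", 40),
]
train, test = temporal_split(edges)
print("train:", train)
print("test:", test)
```

Splitting on time rather than at random keeps future links out of the training data, which is the point of the temporal setting.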
The data details of EthereumG1 are described below. The file structure of EthereumG2 and EthereumG3 is similar to that of EthereumG1. You can find more information in the README file.
Data details
Citation
BibTeX
@article{lin2020modeling,
title={Modeling and Understanding {Ethereum} Transaction Records via A Complex Network Approach},
author={Lin, Dan and Wu, Jiajing and Yuan, Qi and Zheng, Zibin},
journal={IEEE Transactions on Circuits and Systems--II: Express Briefs},
year={2020},
note = {to be published, doi: \url{10.1109/TCSII.2020.2968376}},
publisher={IEEE},
}
IEEE
D. Lin, J. Wu, Q. Yuan, and Z. Zheng, “Modeling and understanding ethereum transaction records via a complex network approach,” IEEE Transactions on Circuits and Systems–II: Express Briefs, 2020, to be published, doi: 10.1109/TCSII.2020.2968376.
ACM
Dan Lin, Jiajing Wu, Qi Yuan, and Zibin Zheng, “Modeling and understanding ethereum transaction records via a complex network approach,” IEEE Transactions on Circuits and Systems–II: Express Briefs, 2020, to be published, doi: 10.1109/TCSII.2020.2968376.
Second-order Transaction Network of Phishing Nodes
Introduction
The second-order transaction network dataset contains 1660 target phishing nodes and 1700 non-phishing nodes crawled from Etherscan.
The second-order transaction network is divided into two parts: the csv files record the transactions between each target node and its first-order transaction nodes, as well as the transactions between each first-order transaction node and its corresponding second-order transaction nodes.
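To make the notion of first-order and second-order transaction nodes concrete, here is a minimal sketch of a breadth-first search limited to two hops. The adjacency dictionary, the node names and the function name `k_order_neighborhood` are hypothetical; the sketch treats any transaction counterpart as a neighbor regardless of direction, which is an assumption.

```python
from collections import deque


def k_order_neighborhood(adjacency, target, max_order=2):
    """Return {node: order} for all nodes within `max_order` hops of `target`."""
    order = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if order[node] == max_order:
            continue  # do not expand beyond the requested order
        for neighbor in adjacency.get(node, ()):
            if neighbor not in order:
                order[neighbor] = order[node] + 1
                queue.append(neighbor)
    return order


# Toy counterpart lists; "e" is three hops away and must be excluded.
adjacency = {
    "target": ["a", "b"],
    "a": ["target", "c"],
    "b": ["d"],
    "c": ["e"],
}
orders = k_order_neighborhood(adjacency, "target")
print(orders)
```

Nodes with order 1 correspond to first-order transaction nodes and nodes with order 2 to second-order transaction nodes.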
Data details
Citation
BibTeX
@InProceedings{yuan2020Phishing,
author="Yuan, Zihao and Yuan, Qi and Wu, Jiajing",
title="Phishing Detection on Ethereum via Learning Representation of Transaction Subgraphs",
booktitle="Blockchain and Trustworthy Systems",
year="2020",
publisher="Springer Singapore",
pages="178--191",
}
IEEE
Z. Yuan, Q. Yuan, and J. Wu, Phishing Detection on Ethereum via Learning Representation of Transaction Subgraphs, vol. 2. Springer Singapore, 2020.
ACM
Zihao Yuan, Qi Yuan, and Jiajing Wu, Phishing Detection on Ethereum via Learning Representation of Transaction Subgraphs, vol. 2. Springer Singapore, 2020.
Ethereum Phishing Transaction Network
Introduction
Cryptocurrency, the best-known application of blockchain, has suffered huge economic losses due to phishing scams. In our work, Ethereum accounts and transactions are treated as nodes and edges, so the detection of phishing accounts can be modeled as a node classification problem.
We collected phishing nodes on Ethereum that are reported in the Etherscan label cloud. Starting from these phishing nodes, we crawled a large Ethereum transaction network via second-order BFS. The dataset contains 2,973,489 nodes, 13,551,303 edges and 1,165 labeled nodes.
MulDiGraph.pkl: This dataset is stored in pickle format as a networkx object. Each node is an address with an attribute called isp indicating whether it is a phishing node. Each edge has two attributes, amount and timestamp, which represent the amount and the timestamp of the transaction, respectively. In this dataset, the total number of nodes is 2,973,489, the number of transactions is 13,551,303, and the average degree is 4.5574.
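As a rough consistency check, the reported average degree equals the number of edges divided by the number of nodes. The sketch below verifies that ratio and then uses a toy stand-in for the graph to show how the isp, amount and timestamp attributes might be used. The plain dictionaries and tuples, the 0/1 encoding of isp and all addresses and values are assumptions for illustration; they are not the structure or contents of the networkx object in MulDiGraph.pkl.

```python
def average_degree(num_nodes, num_edges):
    """Edges per node, the ratio that matches the reported average degree."""
    return num_edges / num_nodes


print(round(average_degree(2_973_489, 13_551_303), 4))  # prints 4.5574

# Toy stand-in for a directed multigraph (hypothetical addresses and values).
nodes = {
    "0xA": {"isp": 1},  # assumed encoding: 1 marks a phishing node
    "0xB": {"isp": 0},
    "0xC": {"isp": 0},
}
edges = [
    ("0xB", "0xA", {"amount": 1.5, "timestamp": 1_500_000_000}),
    ("0xB", "0xA", {"amount": 0.2, "timestamp": 1_500_000_600}),  # parallel edge
    ("0xC", "0xA", {"amount": 3.0, "timestamp": 1_500_001_200}),
]

# Select labeled phishing nodes and total the amounts sent to them.
phishing = [node for node, attrs in nodes.items() if attrs["isp"] == 1]
received = sum(attrs["amount"] for _, dst, attrs in edges if dst in phishing)
print(phishing, received)
```

Parallel edges between the same pair of addresses are kept separately, which is why a multigraph rather than a simple graph is the natural model for repeated transfers.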
Data details
Citation
BibTeX
@article{chen2020phishing,
author = {Chen, Liang and Peng, Jiaying and Liu, Yang and Li, Jintang and Xie, Fenfang and Zheng, Zibin},
title = {Phishing Scams Detection in Ethereum Transaction Network},
year = {2020},
issue_date = {December 2020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {21},
number = {1},
doi = {10.1145/3398071},
journal = {ACM Trans. Internet Technol.},
}
IEEE
L. Chen, J. Peng, Y. Liu, J. Li, F. Xie, and Z. Zheng. 2020. Phishing Scams Detection in Ethereum Transaction Network. ACM Trans. Internet Technol. 21, 1, Article 10 (December 2020), 16 pages. DOI:https://doi.org/10.1145/3398071
ACM
Liang Chen, Jiaying Peng, Yang Liu, Jintang Li, Fenfang Xie, and Zibin Zheng. 2020. Phishing Scams Detection in Ethereum Transaction Network. ACM Trans. Internet Technol. 21, 1, Article 10 (December 2020), 16 pages. DOI:https://doi.org/10.1145/3398071
EOSIO On-chain Data
Introduction
We run an EOSIO full node and replay all transactions (up to block 89,829,999) to obtain the on-chain data (blocks, transaction receipts, action traces) and process it into the following datasets:
Block: Information on 89,829,999 blocks.
EOSTransferAction(EOSTA): EOS is the native cryptocurrency of EOSIO. There are 1,356,748,049 internal EOS transfers and 653,529,552 external EOS transfers that occur among 1,156,658 accounts.
ContractInvocationAction(CIA): Unlike Ethereum, all actions (transactions) in EOSIO are completed by calling contracts, including ordinary EOS transfers. There are several system contract accounts in EOSIO, such as eosio, eosio.token and eosio.msig. To investigate the contract development ecosystem of EOSIO, we extracted the invocation data of all contracts except the system contracts. A total of 775,082 authorization accounts initiated 2,189,162,705 contract invocations.
ContractCodeAction(CCA): EOSIO can be considered a platform for smart contracts. There are 55,735 SetCode actions for creating or updating 5,594 contracts. It is worth noting that users can easily update contract code in EOSIO, which is not allowed in Ethereum.
ContractAbiAction(CAA): It is worth noting that users can also easily update the contract ABI, which is not allowed in Ethereum.
TokenInfoAction(TInfoA): In EOSIO, a contract that contains the three standard functions create, issue and transfer can be regarded as a standard token contract. 1,826 contracts are considered standard token contracts, and a total of 4,811 tokens have been created and issued. This implies that in EOSIO a contract can issue multiple tokens, which differs from Ethereum.
TokenTransferAction(TTA): A total of 1,128,111,142 token transfers occurred among 1,295,389 holding accounts.
TokenIssueAction(TIssueA): Token issuers can send tokens directly to any account without permission, which is commonly known as a token airdrop.
NewAccountAction(NAA): In most public blockchain systems, creating a new address (account) is easy and free. However, in EOSIO, creating a new account requires a creator to buy RAM for storing account information. In addition, the creator will generally stake some CPU and NET resources for the new account to initiate transactions. There are 1,636,043 different accounts (or NewAccounts), which were created by only 45,350 account creators.
CPUNETAction(CPUNETA): In EOSIO, users need to stake CPU and NET for transaction calculation and network transmission. There are 5,474,353 CPU-Related actions, including 3,805,742 stakecpu actions and 1,668,611 unstakecpu actions. Meanwhile, there are 3,100,820 NET-Related actions, including 2,324,444 stakenet actions and 776,376 unstakenet actions.
RAMAction(RAMA): Users need to buy RAM to store information in EOSIO. There are a total number of 2,983,276 RAM-related actions, including 2,546,849 buyram actions and 436,427 sellram actions.
REXAction(REXA): To solve the problem that users do not have enough EOS to stake for CPU, EOSIO officially launched a CPU and NET leasing mechanism, the REX mechanism, on May 1, 2019 (around block 56,000,000). Users can deposit EOS tokens in the REX pool through the buyrex action to lease them to others, and can retrieve their EOS and collect the corresponding rent at any time through the sellrex action. Meanwhile, users can rent CPU or NET from the REX pool through the rentcpu or rentnet actions.
You can get more details and analysis from the paper “XBlock-EOS: Extracting and Exploring Blockchain Data From EOSIO”.
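The criterion described under TokenInfoAction, that a contract exposing create, issue and transfer can be regarded as a standard token contract, can be sketched as a set check over a contract's action names. The contract names and action lists below are hypothetical, and how the authors actually parse ABIs is not specified here.

```python
# The three standard functions named in the TokenInfoAction description.
REQUIRED_ACTIONS = {"create", "issue", "transfer"}


def is_standard_token_contract(action_names):
    """Return True if the action names include create, issue and transfer."""
    return REQUIRED_ACTIONS.issubset(action_names)


# Hypothetical contracts mapped to the action names declared in their ABIs.
contracts = {
    "exampletoken": ["create", "issue", "transfer", "retire", "open", "close"],
    "examplegame1": ["play", "transfer", "claim"],
}
token_contracts = [
    name for name, actions in contracts.items()
    if is_standard_token_contract(actions)
]
print(token_contracts)
```

Because the check is applied per contract and not per token, a single matching contract can still account for several tokens, in line with the counts given above.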
Data details
Citation
BibTeX
@article{zheng2020xblock,
title={XBlock-EOS: Extracting and Exploring Blockchain Data From EOSIO},
author={Zheng, Weilin and Zheng, Zibin and Dai, Hong-Ning and Chen, Xu and Zheng, Peilin},
journal={arXiv preprint arXiv:2003.11967},
year={2020}
}
IEEE
Zheng, W., Zheng, Z., Dai, H. N., Chen, X., & Zheng, P. (2020). XBlock-EOS: Extracting and Exploring Blockchain Data From EOSIO. arXiv preprint arXiv:2003.11967.
ACM
Zheng, Weilin, Zibin Zheng, Hong-Ning Dai, Xu Chen, and Peilin Zheng. "XBlock-EOS: Extracting and Exploring Blockchain Data From EOSIO." arXiv preprint arXiv:2003.11967 (2020).
Bitcoin Partial Transaction Dataset
Introduction
The Bitcoin Partial Transaction Dataset contains three snapshots of Bitcoin transaction data for easier analysis, namely dataset1_2014_11_1500000, dataset2_2015_6_1500000 and dataset3_2016_1_1500000. We sample the snapshots from November 2014 to January 2016 at intervals of roughly six months. Each snapshot contains the first 1,500,000 transaction records of its corresponding month, namely Nov. 2014, Jun. 2015 and Jan. 2016.
We also provide a file of labeled addresses belonging to mixing services; these addresses were active during the observation period of our snapshots.
Due to the pseudonymous nature of Bitcoin, it is difficult to enforce Know-Your-Customer (KYC) processes, which are guidelines in anti-money laundering. Moreover, mixing services in Bitcoin, originally designed to enhance transaction anonymity, have been widely employed for money laundering to complicate the tracing of illicit funds.
In our work, we study mixing service detection with this dataset. In further studies, users involved in criminal activities could be traced by analyzing the users who take part in Bitcoin mixing.
The details of dataset1_2014_11_1500000 are described below. The file structure of dataset2_2015_6_1500000 and dataset3_2016_1_1500000 is similar to that of dataset1_2014_11_1500000. You can find more information in the README file.
Data details
Citation
BibTeX
@misc{wu2020detecting,
title={Detecting Mixing Services via Mining Bitcoin Transaction Network with Hybrid Motifs},
author={Jiajing Wu and Jieli Liu and Weili Chen and Huawei Huang and Zibin Zheng and Yan Zhang},
year={2020},
eprint={2001.05233},
archivePrefix={arXiv},
primaryClass={cs.SI},
}
IEEE
J. Wu, J. Liu, W. Chen, H. Huang, Z. Zheng, and Y. Zhang, “Detecting Mixing Services via Mining Bitcoin Transaction Network with Hybrid Motifs,” arXiv preprint arXiv:2001.05233, 2020.
ACM
Jiajing Wu, Jieli Liu, Weili Chen, Huawei Huang, Zibin Zheng, and Yan Zhang, “Detecting Mixing Services via Mining Bitcoin Transaction Network with Hybrid Motifs,” arXiv preprint arXiv:2001.05233, 2020.
First-order Transaction Network of Phishing Nodes
Introduction
Recently, blockchain technology has become a topic in the spotlight but also a hotbed of various cybercrimes. Ethereum is currently the largest blockchain platform that supports smart contracts, and its cryptocurrency, Ether, is the second-largest cryptocurrency. Among the various security issues of blockchain-based cryptocurrencies, phishing scams have accounted for more than 50% of all cybercrimes on Ethereum since 2017, making them a serious threat to the trading security of Ethereum and of the wider blockchain ecosystem.
Our work shares phishing account information collected from Etherscan, together with the code used to crawl it. The trans2vec algorithm for phishing detection is also shared.
Data details
Citation
BibTeX
@article{phishingdetection,
title={Who Are the Phishers? Phishing Scam Detection on Ethereum via Network Embedding},
author={Wu, Jiajing and Yuan, Qi and Lin, Dan and You, Wei and Chen, Weili and Chen, Chuan and Zheng, Zibin},
journal={IEEE Transactions on Systems, Man, and Cybernetics: Systems},
year={2020},
note = {to be published, doi: \url{10.1109/TSMC.2020.3016821}},
publisher={IEEE},
}
IEEE
J. Wu, Q. Yuan, D. Lin, W. You, W. Chen, C. Chen and Z. Zheng, "Who are the phishers? Phishing scam detection on ethereum via network embedding", IEEE Transactions on Systems, Man, and Cybernetics: Systems, to be published, doi: 10.1109/TSMC.2020.3016821.
ACM
Jiajing Wu, Qi Yuan, Dan Lin, Wei You, Weili Chen, Chuan Chen and Zibin Zheng, "Who are the phishers? Phishing scam detection on ethereum via network embedding", IEEE Transactions on Systems, Man, and Cybernetics: Systems, to be published, doi: 10.1109/TSMC.2020.3016821.