Downloading all Datasets

1 Ethereum Dataset

1.1 Temporal Link Prediction

Introduction

The task of link prediction is to predict the occurrence of links in a given graph from observed information. In temporal link prediction, unlike static link prediction where links carry no timestamps, the links observed in the past (with smaller timestamps) serve as training data for predicting which links will occur in the future (with larger timestamps).
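The chronological split described above can be sketched as follows. This is a minimal illustration that assumes edges are held in memory as tuples in the column order of the edge table (from_node_num, to_node_num, value, timestamp). `temporal_split` and its default 80/20 ratio are hypothetical choices; the dataset ships its own split as a pickle file.

```python
from typing import List, Tuple

# One edge as (from_node_num, to_node_num, value, timestamp),
# mirroring the four columns of the edge table.
Edge = Tuple[int, int, float, int]


def temporal_split(edges: List[Edge], train_ratio: float = 0.8) -> Tuple[List[Edge], List[Edge]]:
    """Split edges chronologically: the earliest edges train, the latest edges test."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    ordered = sorted(edges, key=lambda e: e[3])  # ascending timestamp
    cut = int(len(ordered) * train_ratio)
    # Edges sharing a timestamp may land on both sides of the cut.
    return ordered[:cut], ordered[cut:]


edges = [(0, 1, 1.5, 300), (1, 2, 0.2, 100), (2, 0, 3.0, 200),
         (0, 2, 0.7, 400), (1, 0, 2.0, 500)]
train, test = temporal_split(edges, train_ratio=0.6)
print([e[3] for e in train])  # [100, 200, 300]
print([e[3] for e in test])   # [400, 500]
```

Every training edge is strictly older than every test edge here, which is the property that distinguishes this setup from a random (static) split.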

Data details

About this table
Each row represents one edge.
Columns (4 columns)
from_node_num Node number (From)
to_node_num Node number (To)
value Value
timestamp Timestamp
About this table
Each row represents a mapping from Ethereum address to unique node number (ID).
Columns (2 columns)
addr Ethereum address
idx Node number (ID)
About this table
All collected edges, sorted by timestamp.
Columns (4 columns)
From The sender of the transaction
To The recipient of the transaction
Value The amount of money transferred
Timestamp When the transaction happens
About this table
Dataset split, stored as a pickle file that loads as a dict.
Keys (4 elements)
train_edges_pos positive node pairs in training set
test_edges_pos positive node pairs in test set
train_edges_false negative node pairs in training set
test_edges_false negative node pairs in test set
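A minimal loader for the split file, assuming only what the table states: a pickle that loads as a dict with these four keys. `load_split` is a hypothetical helper, and since the element type stored under each key is not documented, the sketch does not assume one.

```python
import pickle
from typing import Any, Dict

# The four keys documented for the split file.
EXPECTED_KEYS = ("train_edges_pos", "test_edges_pos",
                 "train_edges_false", "test_edges_false")


def load_split(path: str) -> Dict[str, Any]:
    """Load the dataset-split pickle and check that it is a dict with the documented keys."""
    with open(path, "rb") as f:
        split = pickle.load(f)  # only unpickle files from a trusted source
    if not isinstance(split, dict):
        raise TypeError(f"expected a dict, got {type(split).__name__}")
    missing = [k for k in EXPECTED_KEYS if k not in split]
    if missing:
        raise KeyError(f"split file is missing keys: {missing}")
    return split
```

For example, `split = load_split(path)` followed by `split["train_edges_pos"]`, where `path` points at the downloaded split file. Unpickling can execute arbitrary code, so load only files from a source you trust; if the file was written under Python 2, `pickle.load(f, encoding="latin1")` may be needed.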

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

@misc{
xblockNetworkAnalysis, author = {Lin, Dan and Wu, Jiajing and Zheng, Zibin}, title = {{XBLOCK Datasets}: {InPlusLab} Blockchain Network Dataset Collection}, howpublished = {\url{http://xblock.pro/Data-Sets-details/}}, month = Nov, year = 2019 }

1.2 Ethereum On-chain Data

Information

We ran an Ethereum full node (up to block 8,099,999) to obtain the on-chain data (blocks, traces, and receipts) and processed them into the following datasets:
Block and NormalTransaction: To investigate the basic statistics of Ethereum, we extract information about the blocks and the transactions they contain. The block data yields 8,100,000 blocks and 491,562,222 transactions.
InternalEtherTransaction: Ether is the native cryptocurrency of Ethereum. Ether transfers happen not only in the transactions recorded in blocks but also during smart contract execution. We collected 329,020,672 Ether transactions among 54,720,018 addresses.
ContractInfo: Ethereum can be considered a platform for smart contracts. There are 16,609,273 smart contracts created by 133,484 addresses, which implies that many users create multiple contracts. 5,564,823 contracts have been deleted (self-destructed), refunding their Ether balances to 19,133,481 addresses.
ContractCall: In the EVM, a smart contract can call another contract to invoke its code or functions. The dataset contains 1,148,572,009 contract calls, of which 639,336,722 carry input data and 169,463,261 contain errors.
ERC20Transaction: To collect token information, we process the receipt dataset to extract the standard events defined in the ERC20 token standard of the Ethereum community. Each ERC20 token also has basic information such as its name, symbol, and total supply. There are 227,698,645 ERC20 transactions among 42,146,575 holder addresses.
ERC721Transaction: ERC721 is another token standard proposed by the Ethereum community. We find 1,954 ERC721 contracts with 7,524,827 token transactions among 414,829 holder addresses.
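The ERC20 and ERC721 transfer records are derived from receipt logs. The source does not describe the exact extraction code, so the following is only a sketch of one common approach: match the `Transfer(address,address,uint256)` event signature, whose Keccak-256 hash is the well-known topic below, and tell the two standards apart by the number of indexed topics (ERC721 also indexes `tokenId`). It assumes each log is a dict shaped like a JSON-RPC receipt log (`address`, `topics`, `data`); `decode_transfer` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional

# Keccak-256 of "Transfer(address,address,uint256)", the event signature that
# ERC20 and ERC721 share for Transfer events.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def _topic_to_address(topic: str) -> str:
    # Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes.
    return "0x" + topic[-40:].lower()


def decode_transfer(log: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Decode one receipt log into an ERC20 or ERC721 transfer, else None."""
    topics: List[str] = log.get("topics", [])
    if not topics or topics[0].lower() != TRANSFER_TOPIC:
        return None
    if len(topics) == 3:
        # ERC20: from and to are indexed; the amount is ABI-encoded in data.
        return {"standard": "ERC20", "tokenAddress": log["address"],
                "from": _topic_to_address(topics[1]),
                "to": _topic_to_address(topics[2]),
                "value": int(log["data"], 16)}
    if len(topics) == 4:
        # ERC721: from, to and tokenId are all indexed.
        return {"standard": "ERC721", "tokenAddress": log["address"],
                "from": _topic_to_address(topics[1]),
                "to": _topic_to_address(topics[2]),
                "tokenId": int(topics[3], 16)}
    return None  # same signature but a non-standard layout
```

Under this sketch, `tokenAddress` is simply the address of the contract that emitted the log, matching the `tokenAddress` column of both token tables. Contracts that emit non-standard events would be missed by this filter.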

Data details

About this table
Ethereum block information.
Columns (14 columns)
blockNumber block number
timestamp timestamp 
size block size
difficulty difficulty
transactionCount transaction count
minerAddress miner address
minerExtra miner extra
gasLimit gas limit
gasUsed gas used
minGasPrice minimum gas price
maxGasPrice maximum gas price
avgGasPrice average gas price
miner miner address
reward miner reward
About this table
Ethereum block normal transactions information.
Columns (10 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
from from
to to
creates address of the contract created by the transaction, if any
value value
gasLimit gas limit
gasPrice gas price
gasUsed gas used
About this table
Ether transactions. Ether transfers happen not only in the transactions recorded in blocks but also during smart contract execution.
Columns (8 columns)
blockNumber block number
timestamp timestamp 
transactionHash transaction hash  
from from  
to to
fromIsContract from is contract or not
toIsContract to is contract or not
value value
About this table
Ethereum Contract Information
Columns (11 columns)
address address
createdBlockNumber created block number
createdTransactionHash created transaction hash
creator creator
createValue createValue
creationCode creation code
contractCode contract code
decreatedBlockNumber block number in which the contract was deleted
decreatedTransactionHash hash of the transaction that deleted the contract
refunder refunder
refundValue refund value
About this table
Ethereum Contract Calling
Columns (11 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
from from
to to
fromIsContract from is contract or not
toIsContract to is contract or not
callType call type
callingFunction calling function
value value
isError error or not
About this table
Ethereum ERC20 token transaction information.
Columns (7 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
tokenAddress token contract address
from from
to to
value value
About this table
Ethereum ERC721 token transaction information.
Columns (7 columns)
blockNumber block number
timestamp timestamp
transactionHash transaction hash
tokenAddress token contract address
from from
to to
tokenId token id

Download

Download all of the dataset.

@article{
    zheng2019xblock,
    title={Xblock-ETH: Extracting and Exploring Blockchain Data from Ethereum},
    author={Zheng, Peilin and Zheng, Zibin and Dai, Hong-ning},
    journal={arXiv preprint arXiv:1911.00169},
    year={2019}
}

1.3 Smart Ponzi Scheme Detection

Introduction

The task of smart Ponzi scheme detection is to predict whether a given smart contract is a Ponzi scheme. To obtain labels for training the classification model, we manually check more than 3,000 smart contracts by reading their source code. User transaction data are also used to train the model.

Data details

About this table
Labels indicating whether each contract is a smart Ponzi scheme.
Columns (2 columns)
Contract The address of contracts
Ponzi Whether the contract is a smart Ponzi scheme (1 if yes)
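A short sketch for loading the labels and checking class balance before training. It assumes the file is a CSV whose header row uses the column names `Contract` and `Ponzi`; the file name and the helpers `load_ponzi_labels` and `class_balance` are hypothetical. Addresses are lowercased so that checksummed and lowercase forms of the same address match.

```python
import csv
from collections import Counter
from typing import Dict


def load_ponzi_labels(path: str) -> Dict[str, int]:
    """Map each contract address (lowercased) to its integer Ponzi label."""
    labels: Dict[str, int] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            labels[row["Contract"].strip().lower()] = int(row["Ponzi"])
    return labels


def class_balance(labels: Dict[str, int]) -> Counter:
    """Count contracts per label, e.g. to decide on a stratified split."""
    return Counter(labels.values())
```

Ponzi contracts are typically a minority of any labeled sample, so checking the counts first helps decide whether stratified splitting or class weighting is needed.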

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

1.4 Smart Contract

Introduction

The Open Source Contract Info.csv dataset contains about 14,000 contracts whose source code is published on Etherscan.

The ContractInfo.csv dataset was crawled from the Ethereum mainnet using the geth RPC interface.
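As an illustration of the RPC route, the sketch below queries a contract's deployed bytecode with the standard Ethereum JSON-RPC method `eth_getCode`. The exact crawler used for ContractInfo.csv is not described, so this is an assumption; the helper names are hypothetical, and `http://localhost:8545` in the usage note is only geth's default HTTP endpoint, which must be enabled on the node.

```python
import json
import urllib.request
from typing import Any, Dict


def get_code_request(address: str, block: str = "latest", request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body for eth_getCode."""
    return {"jsonrpc": "2.0", "method": "eth_getCode",
            "params": [address, block], "id": request_id}


def fetch_code(rpc_url: str, address: str, block: str = "latest") -> str:
    """POST eth_getCode to a node's HTTP RPC endpoint and return the hex bytecode."""
    body = json.dumps(get_code_request(address, block)).encode("utf-8")
    req = urllib.request.Request(rpc_url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]  # "0x" means no code at that address
```

For example, `fetch_code("http://localhost:8545", address)` returns the runtime bytecode (the `contractCode` column). `eth_getCode` does not return the creation bytecode (`creationCode`): for contracts deployed by an external transaction it is the `input` of the creation transaction, while contracts created by other contracts require trace data.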

Data details

About this table
All Open Source Contracts in Ethereum.
Columns (9 columns)
address Address of Open Source Smart Contract
ContractCode Bytecode of Contract
timestamp Timestamp of Contract Creation
createValue Value Sent with Contract Creation
createBlockNumber Block Height of Contract Creation
createdTransactionHash Transaction of Contract Creation
creationCode Bytecode of Contract Creation
creator Address of Creator
code Source Code of Contract  
About this table
Information of Smart Contract of Ethereum.
Columns (7 columns)
address Address of Smart Contract
createdBlockNumber Block Height of Contract Creation
createdTransactionHash Transaction of Contract Creation
creator Address of Contract Creator
createValue Parameter of Contract Creation
creationCode Bytecode of Contract Creation
contractCode Bytecode of Contract

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

@misc{
xblockSCdataset, author = {Kong, Queping and Chen, Xiangping and Zheng, Zibin}, title = {{XBLOCK Datasets}: {InPlusLab} Ethereum Smart Contract Dataset}, howpublished = {\url{http://xblock.pro/Data-Sets-details/}}, month = Jan, year = 2020 }

1.5 EPTransNet

Introduction

Cryptocurrency, the best-known application of blockchain, suffers huge economic losses from phishing scams. In our work, accounts and transactions in Ethereum are treated as nodes and edges, so detecting phishing accounts can be modeled as a node classification problem. To tackle this problem, we propose a detection method based on a Graph Convolutional Network (GCN) and an autoencoder that better mines structural information and precisely distinguishes phishing accounts.
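For reference, the standard GCN propagation rule of Kipf and Welling is shown below, where A is the adjacency matrix of the transaction graph with N nodes, H^(l) holds the node representations at layer l (H^(0) being the input features), W^(l) is a trainable weight matrix, and sigma is a nonlinearity. The source does not specify which GCN variant is used or how the autoencoder is attached, so this is background rather than the authors' exact model.

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Adding self-loops (A + I_N) keeps each account's own features in its update, and the symmetric normalization keeps high-degree accounts, such as exchanges, from dominating their neighbors' representations.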

In this work, we collected Ethereum phishing nodes reported in Etherscan's label cloud. Starting from these phishing nodes, we crawled a large Ethereum transaction network via second-order breadth-first search (BFS). The dataset contains 2,973,489 nodes, 13,551,303 edges, and 1,165 labeled nodes.
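The second-order BFS crawl can be sketched as a breadth-first search that stops expanding at depth 2. This is an illustration under assumptions: `neighbors` is a hypothetical callback returning an account's transaction counterparties (for example, from an explorer API or a local node index), and `k_hop_bfs` is not the authors' crawler.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable


def k_hop_bfs(seeds: Iterable[str], neighbors: Callable[[str], Iterable[str]],
              max_depth: int = 2) -> Dict[str, int]:
    """Return every node within max_depth hops of any seed, with its hop distance."""
    depth: Dict[str, int] = {}
    queue: Deque[str] = deque()
    for s in seeds:
        if s not in depth:
            depth[s] = 0
            queue.append(s)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the second order
        for nb in neighbors(node):
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return depth
```

Passing the phishing accounts as `seeds` returns every account within two hops of any of them. Whether the original crawl followed incoming transactions, outgoing ones, or both is not stated in the source; the callback is where that choice would be made.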

Data details

About this table
The dataset is a graph and is not suited to a tabular presentation. Its summary statistics are:
Number of nodes: 2,973,489; number of edges: 13,551,303; average degree: 4.5574
Columns (3 columns)
isp Node label: 1 for a phishing node, 0 otherwise.
amount The amount of the transaction.
timestamp The timestamp of the transaction.
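The reported average degree appears to be the number of edges divided by the number of nodes. For a directed transaction graph this equals the mean in-degree (equivalently, the mean out-degree); counting both endpoints of every edge would give twice that value:

```latex
\frac{|E|}{|V|} = \frac{13\,551\,303}{2\,973\,489} \approx 4.5574,
\qquad
\frac{2|E|}{|V|} \approx 9.1147
```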

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

@inproceedings {
   author ={Chen, Liang and Peng, Jiaying and Liu, Yang and Li, Jintang and Xie, Fenfang and Zheng, Zibin},
   title = {{XBLOCK Datasets}: {InPlusLab} Phishing Scams Detection in Ethereum Transaction Network},
   journal={Transactions on Internet Technology, submitted},
   year={2019}
}

2 Bitcoin Dataset

2.1 Bitcoin Mixing Detection

Introduction

Because Bitcoin is pseudonymous, it is hard to enforce Know-Your-Customer (KYC) processes, which are standard anti-money-laundering guidelines. Moreover, mixing services in Bitcoin, originally designed to enhance transaction anonymity, have been widely used for money laundering because they make illicit funds harder to trace. We provide this dataset to support research on mixing service detection; by analyzing users who take part in Bitcoin mixing, one can further trace users involved in criminal activities.

In this task we focus on detecting Bitcoin addresses belonging to mixing services.

Because the number of Bitcoin transactions grew explosively during this period, we take three snapshots of Bitcoin transaction data with mixing labels between November 2014 and January 2016, sampled at six-month intervals. Each snapshot contains 1,500,000 transaction records.

Data details

About this table
Information of block.
Columns (4 columns)
blockID Block ID
bhash Block hash (identifier in the blockchain)
btime Creation time of block
txs Number of transactions
About this table
Transaction ID and hash pairs.
Columns (2 columns)
txID Transaction ID
txhash Transaction hash
About this table
Bitcoin address ID and address pairs.
Columns (2 columns)
addrID Address ID
addr String representation of the address
About this table
Information of transaction.
Columns (5 columns)
txID Transaction ID
blockID Block ID
n_inputs Number of inputs
n_outputs Number of outputs
btime Creation time
About this table
List of all transaction inputs.
Columns (3 columns)
txID Transaction ID
addrID Sending address
value Integer sum in Satoshis (1e-8 BTC)
About this table
List of all transaction outputs.
Columns (3 columns)
txID Transaction ID
addrID Receiving address
value Integer sum in Satoshis (1e-8 BTC)
About this table
label.rar contains 38 files, each with the same structure as BitMixer.io.csv. They list addresses of BitMixer.io, BitLaunder.com, BitcoinFog, and HelixMixer crawled from WalletExplorer.
Columns (1 column)
Address Address belonging to this service (separated by ‘\n’)
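A sketch of turning the input and output tables into an address-level graph and tagging mixing addresses. It assumes rows are (txID, addrID, value) tuples as in the two tables and that each label file holds one address per line with an optional `Address` header. Linking every input address of a transaction to every output address is one common simplification, not necessarily how the dataset's authors built their graph; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (txID, addrID, value in satoshis): the three columns of both I/O tables.
Row = Tuple[int, int, int]


def address_edges(inputs: Iterable[Row], outputs: Iterable[Row]) -> Set[Tuple[int, int]]:
    """Link every sending address of a transaction to every receiving address."""
    senders: Dict[int, Set[int]] = defaultdict(set)
    receivers: Dict[int, Set[int]] = defaultdict(set)
    for tx_id, addr_id, _value in inputs:
        senders[tx_id].add(addr_id)
    for tx_id, addr_id, _value in outputs:
        receivers[tx_id].add(addr_id)
    return {(s, r)
            for tx_id, ins in senders.items()
            for s in ins
            for r in receivers.get(tx_id, set())}


def mixing_addr_ids(label_lines: Iterable[str], addr_to_id: Dict[str, int]) -> Set[int]:
    """Map the addresses in one label file (one per line) to addrIDs."""
    ids: Set[int] = set()
    for line in label_lines:
        addr = line.strip()
        # Skip blanks, an optional "Address" header, and unseen addresses.
        if addr and addr != "Address" and addr in addr_to_id:
            ids.add(addr_to_id[addr])
    return ids
```

Values in both tables are integer satoshis, so divide by 100,000,000 to obtain BTC. Self-loops appear when an output goes back to one of the input addresses (typically change); keep or drop them depending on the model.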

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

@misc{
xblockNetworkAnalysis, author = {Liu, Jieli and Wu, Jiajing and Zheng, Zibin}, title = {{XBLOCK Datasets}: {InPlusLab} Blockchain Network Dataset Collection}, howpublished = {\url{http://xblock.pro/Data-Sets-details/}}, month = Jan, year = 2020 }

2.2 Mt. Gox Leaked Transactions

Introduction

This dataset contains the transaction data leaked from the Mt. Gox exchange. We first combine the buy and sell records of each trade into a single record, and then deduplicate the records by transaction time, trading account, and other fields so that each trade appears exactly once. The data is useful for analyzing user behavior in the Bitcoin market.
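The merge-then-deduplicate step can be sketched as below. The raw field names (`trade_id`, `side`, `user`, `date`, `bitcoins`, `money`) are hypothetical, since the layout of the leaked files is not documented here; the output uses the column names of the released table. The deduplication key is likewise an assumption built from the fields the text mentions (time and accounts) plus the traded amounts.

```python
from typing import Any, Dict, Iterable, List


def merge_and_dedup(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge one-sided buy/sell records per trade, then drop duplicate trades."""
    # Step 1: combine the sell side (Source) and buy side (Target) of each trade.
    trades: Dict[str, Dict[str, Any]] = {}
    for r in records:
        t = trades.setdefault(r["trade_id"], {
            "Trade_Id": r["trade_id"], "Date": r["date"],
            "Bitcoins": r["bitcoins"], "Money": r["money"]})
        if r["side"] == "sell":
            t["Source"] = r["user"]  # user who sells bitcoins
        else:
            t["Target"] = r["user"]  # user who buys bitcoins
    # Step 2: keep the first trade for each (time, accounts, amounts) key.
    seen = set()
    unique: List[Dict[str, Any]] = []
    for t in trades.values():
        key = (t["Date"], t.get("Source"), t.get("Target"), t["Bitcoins"], t["Money"])
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique
```

Because Python dicts preserve insertion order, the first occurrence of each duplicated trade is the one kept.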

Data details

About this table
Transactions of bitcoin market.
Columns (8 columns)
Source The user who sells bitcoins
Target The user who buys bitcoins
Trade_Id The ID of the current transaction
Bitcoins Number of bitcoins involved in the current transaction
Money Dollars spent buying bitcoins
Money_rate Price per bitcoin
Date Date of transaction
label Types of users

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

@inproceedings{
chen2019market, title={Market Manipulation of Bitcoin: Evidence from Mining the Mt. Gox Transaction Network}, author={Chen, Weili and Wu, Jun and Zheng, Zibin and Chen, Chuan and Zhou, Yuren}, booktitle={IEEE Conference on Computer Communications}, pages={964--972}, year={2019}, organization={IEEE} }

3 EOS Dataset

3.1 Activities of DApps

Introduction

There are many DApps on platforms such as Ethereum, EOS, and Steem. Some of them are always active (i.e., new transactions involving them keep appearing), while others appear to have been dead for a long time.
This dataset contains two files, one from DappRadar and the other from State of the Dapps. It provides information about DApps' activity, such as their transaction counts, total transaction values, and active users over 24 hours, 7 days, and 30 days.

Data details

About this table
Activity information of DApps on DappRadar.
Columns (15 columns)
Txs_24h Transaction count of that DApp in 24 hours
Txs_7d Transaction count of that DApp in 7 days
balance Balance
category Category
link Link to the DApp
name DApp name
protocol DApp protocol
rank DApp rank, can be used to identify one DApp
users_24h Active users of the DApp in 24 hours
volume_24h Volume of that DApp in 24 hours
volume_7d Volume of that DApp in 7 days
long_intro Introduction of the DApp
other_link Other important links
submitted Time added
contract Contracts contained in that DApp
About this table
Activity information of DApps on State of the Dapps.
Columns (28 columns)
category Category
dev_activity_30d Dev activity in 30 days
link Link to the DApp
name DApp name
platform DApp platform
rank DApp rank, can be used to identify one DApp
users_24h Active users of the DApp in 24 hours
volume_7d Volume of that DApp in 7 days
short_intro Brief introduction of the DApp
long_intro Introduction of the DApp
status DApp status
author App authors
software_license Software license
submitted_updated Submitted date
mainnet Mainnet
contract Contract addresses contained in that DApp
tag DApp tags
last_updated Last updated date
transaction_count Transaction count of that DApp
transaction_count_volume_greater_0 Nonzero transaction count
transaction_count_ratio The ratio of nonzero transactions
total_transaction_volume_ether Total transaction volume in ethers
contract_count Contract count
dapp_total_loss The DApp’s net income
user_count_unique_remove_contract_creator Unique users count, contract creator not included
user_total_loss User total net income
user_loss_ratio User income ratio
user_loss_average Average user net income

Download

See the README for details on the reliability, availability, and timeliness of the dataset.

Download all of the dataset.

@inproceedings {
   author ={Chen, Liang and Peng, Jiaying and Liu, Yang and Li, Jintang and Xie, Fenfang and Zheng, Zibin},
   title = {{XBLOCK Datasets}: {InPlusLab} Phishing Scams Detection in Ethereum Transaction Network},
   journal={Transactions on Internet Technology, submitted},
   year={2019}
}