Michal Zalecki

Using IPFS with Ethereum for Data Storage

Ethereum is a well-established blockchain that enables developers to create smart contracts: programs that run on the blockchain and can be triggered by transactions. People often refer to a blockchain as a database, but using a blockchain as a data store is prohibitively expensive.

At the current price ($530 per ETH, gas at 4 gwei), storing 250GB on Ethereum would cost you $106,000,000. In general, we can put up with the high cost because a) we don’t save that much data on blockchains, and b) the censorship resistance, transparency and robustness of blockchains are worth it.
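To make the arithmetic behind that number easier to follow, here is a minimal sketch. The rate of 200 gas per stored byte is my assumption: it is the value that reproduces the $106,000,000 figure, and other ways of storing data on Ethereum are priced differently. The function name is made up for illustration.

```javascript
// Back-of-the-envelope cost of storing raw bytes on Ethereum.
// ASSUMPTION: 200 gas per stored byte. This rate reproduces the
// $106,000,000 figure quoted above; it is not stated in the article.
const GAS_PER_BYTE = 200;
const GWEI_PER_ETH = 1e9;

function storageCostUsd(bytes, gasPriceGwei, ethPriceUsd) {
  const gas = bytes * GAS_PER_BYTE;                // total gas needed
  const eth = (gas * gasPriceGwei) / GWEI_PER_ETH; // gas fee converted to ETH
  return eth * ethPriceUsd;                        // ETH converted to USD
}

// 250GB at 4 gwei with ETH at $530
console.log(storageCostUsd(250e9, 4, 530)); // 106000000
```

Under the same assumption, the cost scales linearly with both the gas price and the ETH price, so the figure moves a lot with market conditions.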

Decentralized Storage

IPFS (InterPlanetary File System) has some guarantees we know from blockchains, namely decentralization and tamper-proof storage, but doesn’t cost more than conventional disk space. Running your own EC2 t2.micro instance with 250GB of EBS storage would cost you about $15/mo.

A unique feature of IPFS is the way it addresses files. Instead of using location-based addressing (like a domain name, an IP address, the path to the file, etc.), it uses content-based addressing. After adding a file (or a directory) to the IPFS repository, you can refer to it by its cryptographic hash.

$ ipfs add article.json
added Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM article.json

$ ipfs cat Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM
{
  "title": "This is an awesome title",
  "content": "paragraph1\r\n\r\nparagraph2"
}

$ curl https://ipfs.io/ipfs/Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM
{
  "title": "This is an awesome title",
  "content": "paragraph1\r\n\r\nparagraph2"
}

You can then access files using an IPFS client or any public gateway. You can also create a non-public gateway, make it writable (it is read-only by default), and implement your own authorization scheme to get programmatic access to the IPFS network.

It’s important to understand that IPFS is not a service where other peers will store your content no matter what. If your content isn’t popular, the garbage collector will remove it from other nodes unless they pinned the hash (they are not interested in renting you their disk space). As long as at least one peer on the network cares about your files and has an interest in storing them, other nodes on the network can easily fetch them. Even when your file disappears from the network, it can be added again later, and unless its content changes, its address (hash) will be the same.
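The property that identical content always gets the same address can be illustrated with a plain SHA-256 digest. This is a simplified sketch, not how IPFS computes its identifiers: real IPFS hashes are multihashes over chunked data, so the digest below will not match the Qm... value printed by ipfs add. The function name is hypothetical.

```javascript
const crypto = require("crypto");

// Simplified content addressing: the address is a digest of the bytes.
// NOTE: this is NOT the IPFS algorithm; it only shows the principle.
function contentAddress(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

const original = '{"title": "This is an awesome title"}';
const reAdded = '{"title": "This is an awesome title"}';
const edited = '{"title": "This is a different title"}';

// Same bytes, same address, no matter when or where they are added.
console.log(contentAddress(original) === contentAddress(reAdded)); // true
// Any change to the content produces a different address.
console.log(contentAddress(original) === contentAddress(edited)); // false
```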

IPFS and Ethereum Smart Contracts

Although the Ethereum protocol doesn’t provide any native way to connect to IPFS, we can fall back on off-chain solutions like Oraclize to remedy that. Oraclize allows feeding smart contracts with all sorts of data. One of the available data sources is URL: we could use a public gateway to read our JSON file from IPFS, but relying on a single gateway would be a weak link. Instead, we are going to use another data source, IPFS. With the JSON parser, which is part of the query sent to the Oraclize smart contract, we can extract a specific field from the JSON document.

oraclize_query("IPFS", "json(Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM).title");
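To show what such a query asks for, here is a small off-chain sketch in JavaScript. Both function names are hypothetical, and extractField is only a stand-in for what the oracle does with the fetched document; it is not Oraclize's actual implementation and handles top-level fields only.

```javascript
// Builds the query string in the same shape as the call above:
// json(<IPFS hash>).<field>
function buildJsonQuery(ipfsHash, field) {
  return `json(${ipfsHash}).${field}`;
}

// Hypothetical stand-in for the oracle's JSON parser: parse the fetched
// document and return one top-level field as a string.
function extractField(jsonText, field) {
  const doc = JSON.parse(jsonText);
  if (!(field in doc)) {
    throw new Error(`Field "${field}" not found in document`);
  }
  return String(doc[field]);
}

const hash = "Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM";
const article = '{"title": "This is an awesome title", "content": "paragraph1"}';

console.log(buildJsonQuery(hash, "title"));
console.log(extractField(article, "title")); // This is an awesome title
```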

Oraclize needs to fetch the file within 20 seconds, otherwise the query times out. If you upload the file using a well-connected node, the timeout is not something you should be concerned about. Our EC2 instance (EU Frankfurt) connects to roughly 750 peers, and fetching files through public gateways or a locally running daemon is almost instant. The response is asynchronous: the oraclize_query call returns a query id (bytes32), which you use as an identifier for the data coming back from Oraclize.

function __callback(bytes32 _queryId, string _data) public {
  require(msg.sender == oraclize_cbAddress());
  process_data(_data);
}

For safety reasons, we want to make sure that only Oraclize is allowed to call the __callback function.

You can find the full codebase of our decentralized blog example on GitHub: tooploox/ipfs-eth-database!

Performance and Implementation

Initially, I was concerned about performance. Can you fetch JSON files hosted on IPFS as quickly as centralized services send a response? I was pleasantly surprised.

$ wrk -d10s https://ipfs.io/ipfs/Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM
Running 10s test @ https://ipfs.io/ipfs/Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM
  2 threads and 10 connections
  Thread Stats Avg Stdev Max +/- Stdev
    Latency 59.18ms 24.36ms 307.93ms 94.73%
    Req/Sec 86.34 15.48 101.00 85.57%
  1695 requests in 10.05s, 1.38MB read
Requests/sec: 168.72
Transfer/sec: 140.70KB

In our implementation of the censorship-resistant blog, the author has to enter only the IPFS hash when calling addPost on the smart contract. We read the title from the file using IPFS and Oraclize and store it using Ethereum events. We don’t need to keep the title accessible to other smart contracts, so using events is good enough for our use case. It might not be the most groundbreaking example, but it nicely shows how to optimize for low transaction fees.

pragma solidity 0.4.24;

import "openzeppelin-solidity/contracts/ownership/Ownable.sol";
import "./lib/usingOraclize.sol";
import "./lib/strings.sol";


contract Blog is usingOraclize, Ownable {
  using strings for *;

  mapping(address => string[]) public hashesByAuthor;
  mapping(bytes32 => string) public hashByQueryId;
  mapping(bytes32 => address) public authorByHash;

  event PostAdded(address indexed author, string hash, uint timestamp, string title);
  event PostSubmitted(address indexed author, string hash, bytes32 queryId);

  uint private gasLimit;

  constructor(uint _gasPrice, uint _gasLimit) public {
    setCustomOraclizeGasPrice(_gasPrice);
    setCustomOraclizeGasLimit(_gasLimit);
  }

  function getPrice(string _source) public view returns (uint) {
    return oraclize_getPrice(_source);
  }

  function setCustomOraclizeGasPrice(uint _gasPrice) public onlyOwner {
    oraclize_setCustomGasPrice(_gasPrice);
  }

  function setCustomOraclizeGasLimit(uint _gasLimit) public onlyOwner {
    gasLimit = _gasLimit;
  }

  function withdraw() public onlyOwner {
    owner.transfer(address(this).balance);
  }

  function __callback(bytes32 _queryId, string _title) public {
    require(msg.sender == oraclize_cbAddress());
    require(bytes(hashByQueryId[_queryId]).length != 0);
    string memory hash = hashByQueryId[_queryId];
    address author = authorByHash[keccak256(bytes(hash))];
    hashesByAuthor[author].push(hash);
    emit PostAdded(author, hash, now, _title);
  }

  function addPost(string _hash) public payable returns (bool) {
    require(authorByHash[keccak256(bytes(_hash))] == address(0), "This post already exists");
    require(msg.value >= oraclize_getPrice("IPFS"), "The fee is too low");
    bytes32 queryId = oraclize_query("IPFS", "json(".toSlice().concat(_hash.toSlice()).toSlice().concat(").title".toSlice()), gasLimit);
    authorByHash[keccak256(bytes(_hash))] = msg.sender;
    hashByQueryId[queryId] = _hash;
    emit PostSubmitted(msg.sender, _hash, queryId);
    return true;
  }

  function getPriceOfAddingPost() public view returns (uint) {
    return oraclize_getPrice("IPFS");
  }
}

The frontend reads events using Web3 and builds a list of all blog posts for a given author.

The content of the article, in markdown, is also stored on IPFS, which allows keeping a fixed fee for adding new blog posts. We use a range of public IPFS gateways, starting with our own. That makes sense especially when you upload files from the same node. You can also pin files programmatically if you decide to run your gateway in write mode (by default it’s read-only). We also allow users to specify their own gateway. Users who have installed IPFS Companion can take advantage of running their own node.

BlogEvents.getPastEvents("PostAdded", { fromBlock: 0, filter: { author } }).then(events => {
  this.setState({ addedPosts: events.map(e => e.returnValues) });
});

// ...

getPost(gatewayIndex = 0) {
  this.fetchPostFromIpfs(gateways[gatewayIndex])
    .catch(() => this.retry(gatewayIndex))
}
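The retry logic above is abbreviated, so here is a self-contained sketch of the same gateway fallback idea. It is an illustration under assumptions rather than the code from our repository: fetchFromGateways is a hypothetical name, gateway.example.com is a placeholder for your own gateway, and the fetch function is passed in as a parameter so the sketch can run without network access.

```javascript
// Tries each gateway in order and resolves with the first successful body.
// `fetchFn` is injected; in a browser you would pass the global `fetch`.
async function fetchFromGateways(hash, gateways, fetchFn) {
  let lastError = new Error("No gateways configured");
  for (const gateway of gateways) {
    try {
      const response = await fetchFn(`${gateway}/ipfs/${hash}`);
      if (!response.ok) {
        throw new Error(`HTTP ${response.status} from ${gateway}`);
      }
      return await response.text();
    } catch (error) {
      lastError = error; // remember the failure, move on to the next gateway
    }
  }
  throw lastError; // every gateway failed
}

// Demo with a stub instead of real network calls:
// the first (placeholder) gateway is down, the second one answers.
const stubFetch = async (url) => {
  if (url.startsWith("https://gateway.example.com")) {
    throw new Error("connection refused");
  }
  return { ok: true, status: 200, text: async () => '{"title": "Hello"}' };
};

fetchFromGateways(
  "Qmd4PvCKbFbbB8krxajCSeHdLXQamdt7yFxFxzTbedwiYM",
  ["https://gateway.example.com", "https://ipfs.io"],
  stubFetch
).then((body) => console.log(body)); // {"title": "Hello"}
```

Trying gateways one at a time keeps the load low; requesting from all of them in parallel and taking the first response would be faster but noisier.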

Conclusions

Our little experiment with requesting IPFS data from Ethereum smart contracts let us dive deeper into IPFS performance and build the foundation for further implementation in more production use cases.

The only place where performance can be an issue is IPNS. IPNS is the naming system for IPFS and allows for mutable URLs. An IPNS hash corresponds to the peer id instead of the file or directory content hash. The new IPNS resolver and publisher introduced in version 0.4.14 have mitigated some of the problems. Make sure you have an up-to-date version and run the daemon with the --enable-namesys-pubsub option to benefit from nearly instant IPNS updates.

We had no significant problems with continuously running an IPFS node on Amazon Linux 2.


This article was originally posted on Tooploox's blog: Using IPFS with Ethereum for Data Storage

Photo by David Menidrey on Unsplash.