Using IPFS with Ethereum for Data Storage

Store JSON files on IPFS and access the data from your smart contracts using Oraclize

Ethereum is a well-established blockchain that enables developers to create smart contracts: programs that run on the blockchain and can be triggered by transactions. People often refer to a blockchain as a database, but using one as a data store is prohibitively expensive.

At the time of writing (ETH at $530, gas price of 4 gwei), storing 250 GB on Ethereum would cost you $106,000,000. In general, we can put up with the high cost because a) we don’t save that much data on blockchains, and b) the censorship resistance, transparency, and robustness of blockchains are worth it.
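The $106,000,000 figure can be checked with a few lines of arithmetic. The sketch below is a rough estimate, not a gas model: it assumes 250 GB means 250 × 10^9 bytes and charges a flat gas cost per byte, ignoring per-transaction overhead. With 200 gas per byte (what the Ethereum Yellow Paper charges per byte of deployed contract code) it reproduces the figure above; contract storage via SSTORE, at 20,000 gas per new 32-byte word (625 gas per byte), would cost roughly three times as much.

```python
from fractions import Fraction

GWEI_PER_ETH = 10**9


def storage_cost_usd(num_bytes, gas_per_byte, gas_price_gwei, eth_price_usd):
    """Rough USD cost of writing num_bytes to Ethereum at a flat per-byte gas cost.

    gas_per_byte is an assumption: 200 matches the contract code-deposit cost,
    625 matches SSTORE (20,000 gas per new 32-byte word).
    """
    gas = num_bytes * gas_per_byte
    eth = Fraction(gas * gas_price_gwei, GWEI_PER_ETH)  # exact, no float rounding
    return eth * eth_price_usd


STORAGE_BYTES = 250 * 10**9  # 250 GB in decimal units (assumption)

print(storage_cost_usd(STORAGE_BYTES, 200, 4, 530))  # 106000000
print(storage_cost_usd(STORAGE_BYTES, 625, 4, 530))  # 331250000
```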

If you are new to Ethereum, check out this introduction.

Decentralized Storage

IPFS (InterPlanetary File System) offers some of the guarantees we know from blockchains, namely decentralization and tamper-proof storage, but costs no more than conventional disk space. Running your own EC2 t2.micro instance with 250 GB of EBS storage would cost you about $15/mo.

A unique feature of IPFS is the way it addresses files. Instead of location-based addressing (a domain name, an IP address, a path to the file, etc.), it uses content-based addressing. After you add a file (or a directory) to an IPFS repository, you can refer to it by its cryptographic hash.
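To make content-based addressing concrete, here is a toy content-addressed store in Python. It rests on a loud simplification: the address is a plain SHA-256 hex digest, whereas real IPFS addresses are CIDs (multihash-encoded hashes of a chunked DAG), so these strings will not match what `ipfs add` prints. What it does show is the key property: identical content always gets the same address, any change produces a different one, and a reader can verify that the bytes it received match the address it asked for.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (illustrative; not how IPFS computes CIDs)."""

    def __init__(self):
        self._blocks = {}  # address -> bytes

    def add(self, data: bytes) -> str:
        # The address is derived from the content, not from where it is stored.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Tamper check: recompute the hash and compare it with the requested address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
first = store.add(b'{"title": "Hello IPFS"}')
again = store.add(b'{"title": "Hello IPFS"}')
edited = store.add(b'{"title": "Hello IPFS!"}')
print(first == again, first == edited)  # True False
```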

You can then access files using an IPFS client or any public gateway. You can also run a non-public gateway, make it writable (gateways are read-only by default), and implement your own authorization scheme on top of it to get programmatic write access to the IPFS network.

It’s important to understand that IPFS is not a service where other peers will store your content no matter what. If your content isn’t popular, nodes that have fetched it will eventually remove it with their garbage collector unless they pinned its hash (they are not interested in renting you their disk space). As long as at least one peer on the network cares about your files and keeps them, other nodes on the network can easily fetch them. Even if your file disappears from the network, it can be added again later, and as long as its content doesn’t change, its address (hash) will be the same.
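The interplay between caching, pinning, and garbage collection can be modeled in a few lines. This is a hypothetical toy model, not the go-ipfs implementation: each `Node` keeps a dict of cached blocks and a set of pinned addresses, `gc()` drops everything unpinned, and addresses are again simplified to SHA-256 digests.

```python
import hashlib


class Node:
    """Toy IPFS node: cached blocks plus the set of addresses it has pinned."""

    def __init__(self):
        self.blocks = {}     # address -> data held locally
        self.pinned = set()  # addresses this node has promised to keep

    def add(self, data: bytes, pin: bool = True) -> str:
        address = hashlib.sha256(data).hexdigest()  # simplified, not a real CID
        self.blocks[address] = data
        if pin:
            self.pinned.add(address)
        return address

    def fetch(self, address: str, peers) -> bytes:
        if address in self.blocks:
            return self.blocks[address]
        for peer in peers:
            if address in peer.blocks:
                # Fetched blocks are cached locally but not pinned.
                self.blocks[address] = peer.blocks[address]
                return self.blocks[address]
        raise KeyError(f"no reachable peer has {address}")

    def unpin(self, address: str) -> None:
        self.pinned.discard(address)

    def gc(self) -> None:
        # Garbage collection keeps only pinned blocks.
        self.blocks = {a: d for a, d in self.blocks.items() if a in self.pinned}


author, reader = Node(), Node()
post = b"# My first post"
address = author.add(post)       # the author pins the post
reader.fetch(address, [author])  # the reader only caches it
reader.gc()
print(address in reader.blocks)                 # False: the reader's copy is gone
print(reader.fetch(address, [author]) == post)  # True: the author still has it
```

If the author also unpins and garbage-collects, no peer holds the post anymore; calling `author.add(post)` again restores it under the same address.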

IPFS and Ethereum Smart Contracts

Although the Ethereum protocol doesn’t provide any native way to connect to IPFS, we can fall back on off-chain solutions like Oraclize. Oraclize allows for feeding smart contracts with all sorts of data. One of the available data sources is URL, so we could read our JSON file through a public IPFS gateway, but relying on a single gateway would be a weak link. Instead, we use another data source Oraclize offers: IPFS. By including Oraclize’s JSON parser in the query, the smart contract can extract a specific field from the JSON document.
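The JSON parser is part of the query string itself. Assuming the query takes the form `json(<IPFS hash>).title`, mirroring the `json(...)` helper syntax Oraclize documents for its URL data source, the Python sketch below mimics what happens off-chain: fetch the document for the hash, walk the dotted path, and return the value as a string. The `FAKE_IPFS` dict and the example hash are hypothetical stand-ins for the network; this is not Oraclize’s code.

```python
import json
import re

# Hypothetical stand-in for the IPFS network: hash -> file content.
FAKE_IPFS = {
    "QmExamplePostHash": json.dumps({"title": "Hello IPFS", "tags": ["ipfs", "ethereum"]}),
}

# json(<source>) followed by zero or more .segment parts.
QUERY_RE = re.compile(r"^json\((?P<source>[^)]+)\)(?P<path>(\.[^.]+)*)$")


def resolve_json_query(query: str, fetch) -> str:
    """Evaluate a json(<source>).a.b.0 style query against the fetched JSON."""
    match = QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"not a json(...) query: {query!r}")
    value = json.loads(fetch(match["source"]))
    for key in filter(None, match["path"].split(".")):
        # Numeric segments index into JSON arrays, others into objects.
        value = value[int(key)] if isinstance(value, list) else value[key]
    # The callback receives a string; serializing non-strings is an assumption.
    return value if isinstance(value, str) else json.dumps(value)


print(resolve_json_query("json(QmExamplePostHash).title", FAKE_IPFS.__getitem__))
```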

Oraclize waits up to 20 seconds for the file; if it can fetch the file within that window, you can expect a response. If you upload the file using a well-connected node, the timeout is not something you should be concerned about. Our EC2 (EU Frankfurt) instance connects to roughly 750 peers, and fetching files through public gateways or a locally running daemon is almost instant. The response is asynchronous: the `oraclize_query` call returns a query ID (bytes32), which you use to match the data Oraclize later delivers to your contract.

For safety reasons, we want to make sure that only Oraclize is allowed to call the `__callback` function.
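The asynchronous flow and the sender check described above can be modeled as follows. This is a Python simulation, not Solidity and not the Oraclize API: `BlogContract`, `FakeOracle`, `add_post`, and the addresses are hypothetical names. In a real Oraclize contract, the equivalent guard in `__callback` compares `msg.sender` with `oraclize_cbAddress()`. The sketch additionally rejects query IDs the contract never issued or has already processed, a common extra precaution.

```python
import os

ORACLE_ADDRESS = "0xOracle"  # hypothetical callback address of the oracle


class BlogContract:
    """Toy model of the query/callback pattern (a simulation, not Solidity)."""

    def __init__(self, oracle_address: str):
        self.oracle_address = oracle_address
        self.pending = {}  # query id -> (author, ipfs_hash)
        self.events = []   # stand-in for emitted Ethereum events

    def add_post(self, sender: str, ipfs_hash: str, oracle) -> bytes:
        # The oracle call returns an id right away; the result arrives later.
        query_id = oracle.query(self, f"json({ipfs_hash}).title")
        self.pending[query_id] = (sender, ipfs_hash)
        return query_id

    def callback(self, sender: str, query_id: bytes, result: str) -> None:
        # Only the oracle may deliver results (cf. msg.sender == oraclize_cbAddress()).
        if sender != self.oracle_address:
            raise PermissionError("callback not sent by the oracle")
        if query_id not in self.pending:
            raise KeyError("unknown or already processed query id")
        author, ipfs_hash = self.pending.pop(query_id)
        self.events.append(("PostAdded", author, ipfs_hash, result))


class FakeOracle:
    """Queues queries and answers them later, like an off-chain oracle service."""

    def __init__(self, address: str, resolve):
        self.address = address
        self.resolve = resolve  # function: query string -> result string
        self.queue = []

    def query(self, contract, query: str) -> bytes:
        query_id = os.urandom(32)  # random 32-byte id, like a bytes32
        self.queue.append((contract, query_id, query))
        return query_id

    def deliver_all(self) -> None:
        while self.queue:
            contract, query_id, query = self.queue.pop(0)
            contract.callback(self.address, query_id, self.resolve(query))


oracle = FakeOracle(ORACLE_ADDRESS, lambda query: "Hello IPFS")
blog = BlogContract(ORACLE_ADDRESS)
blog.add_post("0xAuthor", "QmExamplePostHash", oracle)
print(blog.events)  # []: nothing yet, the answer is asynchronous
oracle.deliver_all()
print(blog.events)  # [('PostAdded', '0xAuthor', 'QmExamplePostHash', 'Hello IPFS')]
```

The `events` list plays the role of the Ethereum events in which the blog records each post’s title.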

You can find the full codebase of our decentralized blog example on GitHub: tooploox/ipfs-eth-database!

Performance and Implementation

Initially, I was concerned about performance. Can you fetch JSON files hosted on IPFS as quickly as centralized services send a response? I was pleasantly surprised.

In our implementation of the censorship-resistant blog, the author only has to provide the IPFS hash when calling `addPost` on the smart contract. The contract then reads the post’s title from the file through Oraclize’s IPFS data source and stores it using Ethereum events. We don’t need to keep the title accessible to other smart contracts, so events are good enough for our use case. It might not be the most groundbreaking example, but it nicely shows how to optimize for low transaction fees.

The frontend reads events using Web3 and builds a list of all blog posts for a given author.
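On the frontend, this step would typically be a web3.js `getPastEvents` call on the contract; if the event’s author field is declared `indexed`, the node can filter by it. The Python sketch below only models the client-side part: it takes already-decoded logs of a hypothetical `PostAdded` event (the field names are assumptions) and builds one author’s post list, newest first. Addresses are compared case-insensitively because the checksummed and lowercase forms of the same address differ only in letter case.

```python
from typing import List, NamedTuple


class PostAdded(NamedTuple):
    """Hypothetical shape of a decoded PostAdded event log."""
    block_number: int
    author: str
    ipfs_hash: str
    title: str


def posts_by_author(events: List[PostAdded], author: str) -> List[PostAdded]:
    """Return the author's posts, newest (highest block number) first."""
    mine = [e for e in events if e.author.lower() == author.lower()]
    return sorted(mine, key=lambda e: e.block_number, reverse=True)


logs = [
    PostAdded(100, "0xAbC1", "QmPostOne", "First post"),
    PostAdded(105, "0xdef2", "QmOtherPost", "Someone else"),
    PostAdded(110, "0xABC1", "QmPostTwo", "Second post"),
]
for post in posts_by_author(logs, "0xabc1"):
    print(post.block_number, post.title, post.ipfs_hash)
```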

The article content, written in Markdown, is also stored on IPFS. Because the transaction only carries the hash, the fee for adding a new blog post stays fixed regardless of the post’s length. To read the content, we use a list of IPFS gateways, starting with our own. That makes sense especially when you upload files through the same node. You can also pin files programmatically if you decide to run your gateway in write mode (by default, it’s read-only). We also allow users to specify their own gateway, so users who have installed IPFS Companion can take advantage of running their own node.
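The gateway fallback can be sketched with Python’s standard library. Assumptions: gateways use the path-style `<gateway>/ipfs/<hash>` URL form, the local daemon’s gateway listens on go-ipfs’s default port 8080, `ipfs.io` stands in for the public gateways, and a user-supplied gateway (for example one exposed by IPFS Companion) is tried first. The list and ordering are illustrative, not the project’s configuration. The `get` parameter makes the network call swappable, which also makes the function easy to test.

```python
import urllib.request
from typing import Callable, Iterable, Optional

# Illustrative gateway list: our own node first, then a public gateway.
GATEWAYS = [
    "http://127.0.0.1:8080",  # local daemon (go-ipfs default gateway port)
    "https://ipfs.io",
]


def gateway_url(gateway: str, ipfs_hash: str) -> str:
    """Path-style gateway URL: <gateway>/ipfs/<hash>."""
    return f"{gateway.rstrip('/')}/ipfs/{ipfs_hash}"


def http_get(url: str, timeout: float) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_from_gateways(
    ipfs_hash: str,
    gateways: Iterable[str],
    user_gateway: Optional[str] = None,
    get: Callable[[str, float], bytes] = http_get,
    timeout: float = 5.0,
) -> bytes:
    """Return the file from the first gateway that answers."""
    candidates = ([user_gateway] if user_gateway else []) + list(gateways)
    failures = []
    for gateway in candidates:
        url = gateway_url(gateway, ipfs_hash)
        try:
            return get(url, timeout)
        except OSError as exc:  # URLError, HTTPError and timeouts subclass OSError
            failures.append(f"{url}: {exc}")
    raise RuntimeError("all gateways failed: " + "; ".join(failures))
```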


Conclusions

Our little experiment with requesting IPFS data from Ethereum smart contracts let us dive deeper into IPFS performance and laid the foundation for further work on production use cases.

The one place where performance can be an issue is IPNS. IPNS is the naming system for IPFS and allows for mutable URLs: an IPNS name corresponds to a peer ID instead of the content hash of a file or directory. The new IPNS resolver and publisher introduced in go-ipfs version 0.4.14 have mitigated some of the problems. Make sure you have an up-to-date version and run the daemon with the `--enable-namesys-pubsub` option to benefit from nearly instant IPNS updates.
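The difference between a content hash and an IPNS name can be shown with a toy resolver. This is a deliberate simplification: real IPNS records are signed with the peer’s private key and distributed through the DHT (or pubsub), and it is that distribution step that was slow; here a plain dict stands in for the network, and the peer ID and contents are hypothetical. The point is that the name stays fixed while the content address it points to changes with every update, with each new record carrying a higher sequence number.

```python
import hashlib


def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()  # simplified, not a real CID


class ToyNameSystem:
    """Mutable names keyed by peer ID (no signatures, no DHT)."""

    def __init__(self):
        self.records = {}  # peer id -> (sequence number, content address)

    def publish(self, peer_id: str, address: str) -> None:
        sequence = self.records.get(peer_id, (0, None))[0] + 1
        self.records[peer_id] = (sequence, address)

    def resolve(self, peer_id: str) -> str:
        return self.records[peer_id][1]


names = ToyNameSystem()
peer_id = "QmExamplePeerId"  # hypothetical peer ID
names.publish(peer_id, content_address(b"blog index v1"))
names.publish(peer_id, content_address(b"blog index v2"))
print(names.resolve(peer_id) == content_address(b"blog index v2"))  # True
```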

We had no significant problems whatsoever running an IPFS node continuously on Amazon Linux 2.