Zeno, a distributed cross-chain notariser
How does it work?
- Each member of a group of notaries runs the daemon and the chains.
- It inspects the chains to figure out what to do (Zeno is stateless).
- Creates a round seed that all participants will be able to recreate given similar chain state.
- Performs a series of steps that conclude with some bit of data being written to a blockchain.
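To make the determinism concrete, here is a minimal sketch (hypothetical names, and a toy stand-in hash rather than a real cryptographic one) showing that nodes observing the same chain state derive the same round seed:

```haskell
import Data.List (foldl')

type ChainState = String   -- stand-in for observed chain data
type RoundSeed  = Int

-- Toy stand-in hash; the real thing would be a cryptographic hash.
hashStr :: String -> Int
hashStr = foldl' (\h c -> (h * 33 + fromEnum c) `mod` 1000000007) 5381

-- All honest nodes observing the same chain state derive the same seed.
deriveSeed :: String -> ChainState -> RoundSeed
deriveSeed configHash st = hashStr (configHash ++ "|" ++ st)

main :: IO ()
main = print (deriveSeed "cfg" "height:1000" == deriveSeed "cfg" "height:1000")
-- True: identical inputs always give identical round seeds
```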
Ok so each member runs the chains?
Right, such that we get a reliable view of the network (unless everyone is using infura.io :eyes:). This does mean that all nodes need a beefy server with a 1 TB SSD.
How does it figure out what to do?
Depends on the mode of notarisation, e.g.:
- Synchronous notarisation between 2 chains
- Unilateral checkpointing of 1 chain
However, in general, Zeno will inspect the chains to discover parameters for the next notarisation.
Given these parameters, it will generate a round seed, which is the hash of a few bits of information:
- A hash representing the data Zeno is configured with
- The input data to the current notarisation round
Zeno will then try to agree on this information with other online nodes to reach a threshold agreement. If this is successful, a transaction is written. If not, state is wiped and the process starts over (Top level loop).
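The top-level loop described above can be sketched roughly like this (hypothetical names; the real round logic is far richer):

```haskell
-- A round either reaches threshold agreement and writes a transaction,
-- or times out, in which case state is wiped and the loop starts over.

data Outcome = Agreed String | TimedOut deriving Show

-- Stand-in for a full consensus round: succeeds when enough
-- peer responses match our round seed.
runRound :: Int -> [Int] -> Int -> Outcome
runRound threshold responses seed
  | length (filter (== seed) responses) >= threshold =
      Agreed ("notarise:" ++ show seed)
  | otherwise = TimedOut

-- Try successive batches of peer responses until one round succeeds.
topLevel :: [[Int]] -> Int -> Int -> Maybe String
topLevel [] _ _ = Nothing                      -- gave up (for the sketch)
topLevel (rs:rest) threshold seed =
  case runRound threshold rs seed of
    Agreed tx -> Just tx                       -- transaction written
    TimedOut  -> topLevel rest threshold seed  -- wipe state, start over

main :: IO ()
main = print (topLevel [[1,2], [7,7,7]] 2 7)
-- Just "notarise:7": the first attempt times out, the second succeeds
```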
How do nodes co-operate?
Once a node has examined the chain state and figured out the round inputs, it enters a cooperative mode, where each step is taken in concert with other nodes listening on the same round ID. If not enough nodes are active on the same round ID, a timeout occurs while trying to collect a threshold number of items for a step. If a packet is received for a round ID that a node is not currently active on, it is cached in a fixed-size rotating cache, in case the node is just about to catch up; this speeds up replication greatly. Many rounds may be active at the same time. Completed rounds need not be dropped immediately: since they run in their own threads, they can be left open to help other nodes catch up, and killed at some later point.
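The fixed-size rotating cache can be sketched like this (an illustration, not Zeno's actual implementation): the oldest round ID is evicted once capacity is reached.

```haskell
import qualified Data.Map.Strict as M

-- Fixed-size rotating cache keyed by round ID.
data Cache k v = Cache
  { capacity :: Int
  , order    :: [k]        -- insertion order, oldest first
  , items    :: M.Map k v
  }

emptyCache :: Int -> Cache k v
emptyCache n = Cache n [] M.empty

insertPkt :: Ord k => k -> v -> Cache k v -> Cache k v
insertPkt k v c
  | M.member k (items c) = c { items = M.insert k v (items c) }
  | length (order c) < capacity c =
      c { order = order c ++ [k], items = M.insert k v (items c) }
  | otherwise =                        -- full: rotate out the oldest entry
      let (old:rest) = order c
      in c { order = rest ++ [k]
           , items = M.insert k v (M.delete old (items c)) }

lookupPkt :: Ord k => k -> Cache k v -> Maybe v
lookupPkt k = M.lookup k . items

main :: IO ()
main = do
  let c = foldl (\acc (k,v) -> insertPkt k v acc) (emptyCache 2)
            [("round1","a"), ("round2","b"), ("round3","c")]
  print (lookupPkt "round1" c)  -- Nothing: evicted by the rotation
  print (lookupPkt "round3" c)  -- Just "c"
```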
Here is a function that ends in a transaction being written, if enough nodes agree. The runConsensus function runs code in the Consensus context, which provides functions to synchronise data between nodes.
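As a rough illustration of the shape of such a context (hypothetical types; the real Consensus API differs), a threshold-collecting step might look like:

```haskell
-- A toy Consensus context: a function from round parameters to either
-- a failure (e.g. timeout) or a result. Not Zeno's actual API.

data RoundParams = RoundParams { threshold :: Int, peerData :: [String] }

newtype Consensus a =
  Consensus { runConsensus :: RoundParams -> Either String a }

instance Functor Consensus where
  fmap f (Consensus g) = Consensus (fmap f . g)

instance Applicative Consensus where
  pure x = Consensus (\_ -> Right x)
  Consensus f <*> Consensus g = Consensus (\p -> f p <*> g p)

instance Monad Consensus where
  Consensus g >>= f = Consensus (\p -> g p >>= \x -> runConsensus (f x) p)

-- Collect items from peers; fail the round if fewer than threshold arrive.
collectThreshold :: Consensus [String]
collectThreshold = Consensus $ \p ->
  if length (peerData p) >= threshold p
    then Right (take (threshold p) (peerData p))
    else Left "timeout: not enough items"

-- A round that ends in a (pretend) transaction if enough nodes agree.
notarise :: Consensus String
notarise = do
  collected <- collectThreshold
  pure ("tx:" ++ concat collected)

main :: IO ()
main = print (runConsensus notarise (RoundParams 2 ["a","b","c"]))
-- Right "tx:ab"
```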
Consensus in Zeno
- Consensus in Zeno refers mainly to data synchronisation and cooperation of nodes.
- In reality, Zeno is not solving the two generals problem at all. Finality is provided by the chains that are being written to.
- Sometimes, you need the network to agree on something which cannot be consistently determined by common observation, e.g. an ordered subset of nodes that are online. For this, you can select a proposer.
- A proposer is an SPOF, and it would be easy for an adversary to DoS your proposer. Tendermint tries to protect against this at the network level, using sentry nodes (kind of like firewalls). Bitcoin uses PoW, so it cannot be determined ahead of time who will release a block. Algorand uses VRFs, which is similar to PoW in that it is weighted by probability.
Consensus frontend
Guts of data synchronisation; i.e. building an inventory of data from different nodes for a step
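A toy sketch of inventory building under these assumptions (the names are my own): each node holds a partial map from peer address to item, and inventories are merged until a threshold is reached.

```haskell
import qualified Data.Map.Strict as M

type Address   = String
type Inventory = M.Map Address String  -- address -> item (e.g. a signature)

-- Merge what we have with what a peer sent us; left-biased on conflicts.
mergeInventories :: Inventory -> Inventory -> Inventory
mergeInventories = M.union

-- A step is complete once items from enough distinct nodes are collected.
hasThreshold :: Int -> Inventory -> Bool
hasThreshold t inv = M.size inv >= t

main :: IO ()
main = do
  let mine   = M.fromList [("alice","sigA")]
      theirs = M.fromList [("bob","sigB"), ("carol","sigC")]
  print (hasThreshold 3 (mergeInventories mine theirs))
  -- True: items from 3 distinct nodes collected
```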
Concurrency in Zeno
- Haskell uses scalable user space threads and has an event manager (io loop) as part of the runtime.
- The networking connections have one thread per inbound and outbound connection (Node datatype), because the lifecycle of the thread is tied to the connection and it doesn't matter if they die. There are points of resource contention but they are not too hard to deal with.
- However, the concurrent round runner, which may have many rounds / steps running at the same time, is much more complex and has a hierarchical structure. And occasionally we might want to send the same message to all steps in a round, or kill the round from outside. So, we gain value from encoding the structure of the rounds / steps explicitly and implementing each step as a coroutine so that they can all be run from a single thread. This reduces the number of interfaces that need to be thread safe and makes it easier to test.
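The coroutine idea can be sketched like this (a toy model, not Zeno's actual runner): each step either finishes or suspends waiting for the next message, so a single thread can drive any number of steps.

```haskell
-- A step is a coroutine: either done with a result, or suspended
-- awaiting the next message.
data Step = Done String | AwaitMsg (String -> Step)

-- Feed one message to every live step from a single thread;
-- completed steps drop out of the live set.
feedAll :: String -> [Step] -> ([String], [Step])
feedAll msg steps = go steps [] []
  where
    go [] done live = (reverse done, reverse live)
    go (Done r : ss)     done live = go ss (r:done) live
    go (AwaitMsg k : ss) done live =
      case k msg of
        Done r -> go ss (r:done) live
        s      -> go ss done (s:live)

-- A step that waits for two messages, then completes.
twoMsgStep :: Step
twoMsgStep = AwaitMsg (\a -> AwaitMsg (\b -> Done (a ++ b)))

main :: IO ()
main = do
  let (d1, live1) = feedAll "x" [twoMsgStep, twoMsgStep]
      (d2, _)     = feedAll "y" live1
  print d1   -- []: no step finished after one message
  print d2   -- ["xy","xy"]: both steps completed
```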
Zeno talks with Ethereum
- Zeno was designed to get its configuration from an Ethereum contract; so it is configured at init time with a contract address.
- It then queries this contract for a key and expects a JSON object in return.
- The Gateway contract holds the runtime configs, member list and also proxies the writes in an authenticated manner.
- Since the Gateway contract is somewhat complex, and critically important, it is well tested. However, it is hard to overstate the importance of testing where smart contracts are involved, and without formal verification the best guarantee you can give is "pretty good I guess".
- Zeno also has its own implementation of ABI and Transaction data structures. I also have a closed-form Merkle-Patricia trie implemented for an earlier application of Zeno.
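As a flavour of what hand-rolling ABI support involves, here is a minimal, hypothetical helper (my own illustration, not Zeno's actual code) that encodes a non-negative uint256 argument as the 32-byte left-padded word the Ethereum ABI expects, shown as hex:

```haskell
import Numeric (showHex)

-- Encode a non-negative uint256 as 64 hex chars (32 bytes),
-- left-padded with zeros as the Ethereum ABI requires.
abiEncodeUint :: Integer -> String
abiEncodeUint n = replicate (64 - length hex) '0' ++ hex
  where hex = showHex n ""

main :: IO ()
main = putStrLn (abiEncodeUint 255)
-- prints 64 hex chars ending in "ff"
```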
List shuffle algorithm
I implemented a deterministic list shuffle to support proposer selection. It might seem strange to implement this myself, given that it is such a common problem; however, it was quite fun.
Tests.
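A deterministic shuffle might look like the following sketch (hypothetical; driven here by a simple LCG stand-in PRNG rather than whatever Zeno actually uses). Because the PRNG is seeded from shared data such as the round seed, every node computes the same ordering.

```haskell
import Data.List (sort)

-- Minimal linear congruential generator as a deterministic stand-in PRNG.
lcg :: Int -> Int
lcg s = (s * 1103515245 + 12345) `mod` 2147483648

-- Deterministic shuffle: repeatedly pick an index from the PRNG,
-- move that element to the front, and recurse on the remainder.
shuffle :: Int -> [a] -> [a]
shuffle _ [] = []
shuffle seed xs =
  let i             = seed `mod` length xs
      (pre, y:post) = splitAt i xs
  in y : shuffle (lcg seed) (pre ++ post)

main :: IO ()
main = do
  let a = shuffle 42 "abcde"
  print (a == shuffle 42 "abcde")  -- True: same seed, same ordering
  print (sort a == "abcde")        -- True: the result is a permutation
```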
Secp256k1
Zeno uses Secp256k1 extensively, and exchanges compact recoverable signatures to authenticate all messages. The Haskell wrapper for Secp256k1 used to support recovery; however, under heavy workloads it exhibited a bug, which I could not track down. The author of the library fixed the bug, but in the process removed support for recoverable mode. It seems the issue comments may have been editorialized. Anyhow, I made the decision to remove the dependency and implemented my own higher-level functions to more closely match my use cases. This was for the same reason that Zeno has its own Ethereum data structures, rather than using the Haskell web3 library: if you can easily implement or copy the bits you need, it has a smaller surface area and it is easier to fix things, especially if the library you are depending on is bloated or not well maintained.
Reflections on development with Haskell
Haskell has some properties that make it amazing to work with:
- Best-in-class type system; constructs like sum types, monads, and pure functions are very powerful tools.
- Interpreted mode keeps the "save / test" development workflow very snappy. GHC is really incredible considering it's 30 years old.
- You never need to think about memory or ownership unless you want to, but code is compiled, so execution is typically an order of magnitude faster than an interpreted program. Good enough for most use cases.
However, I would not use it for a big project in 2020 because:
- Haskell is in fact showing its age; at 30 years old, it is not a streamlined experience.
- Packages are frequently not well maintained. Rust is miles ahead in this regard, and although Haskell saves you a lot of time given its advantages, it also cost me a lot of time digging into obscure issues or making up for the lack of support for things in external packages. I even encountered a few compiler bugs, but GHC has thousands of open issues, so it was unlikely that mine would get looked at in a reasonable amount of time.
- Zeno has memory leaks that I could not track down. They are minor, but as an operator you would need to compensate by restarting it every once in a while. It doesn't sit very well with me that I could not find them, though perhaps it's a reality of many medium/large size projects.
And unfortunately there is not yet any new serious compiler to compete with Haskell's feature set. I think part of the problem is that GC-less functional programming is impossible.
Screenshots
- monkeytop
- normalop
- scale-htop
- scaletest