In this blog post, I’d like to introduce you to an evening ramble with Dean Eigenmann called No-Tag-Back Gossiping, or NTB-G. This post came about as we worked through ideas to make gossiping more reliable.
Make sure to read the spec first, as it clearly lays out the important components. Below, I’ll talk about the whys and hows of some of the requirements we establish.
When you gossip a message, the idea is that the message will traverse the whole network and eventually be sent to everyone. But not everyone cares about your message and the network might be bigger than you think. Before you know it, you clogged the network and pissed off everyone, and worst of all, they don’t know how to let you know about it or to shut you out. All they get is your message, one after the other, like an acid drip that just won’t stop.
At the same time, what if you’d like to trigger reorgs of your network to make it easier to communicate with distant peers (in terms of number of hops)? And get them some messages in time? We’re going to cover all this.
Before we get started, please consider reading the Plumtree paper. Otherwise not much of what follows will make sense to you. If we’re both going to pretend you read it, here are some coping strategies:
- Print it on tissues so you can read it, one sentence at a time, when you blow your nose.
- Print it on cotton. Make a pillow of it. Many pillows even. Then splurge on Plumtree whenever you have TV night.
- Note that paper is from 2007 and people must have come up with something better. Become outraged that the author of this article came up with such an old paper for what is a seminal topic, well-researched by academics. Get an acm.org membership. Research the subject. Map out all papers about it. Eventually take to Twitter and in 2 tweets put an end to this author’s career, bringing an end to his indigence in fact checking.
- Live tweet the whole paper, one paragraph at a time. Add comments in the voice of Bob Ross or Neil de Grasse Tyson.
- Or read below the cliff notes, which won’t replace reading the paper:
Plumtree: A bunch of peers get connected. They send each other messages. You send to everyone. You get a new message, you broadcast to everyone (except the sender, doh). Eventually some of them reply to you saying, “hey, I got it already, thanks” and let you know to just send a signature or hash of the message going forward. But sometimes their paper boy is late, they don’t get the right news with Fox, so they come back and say “hey, thanks for that hash, can you give me the full message again?” in which case you start sending them full messages again.
Plumtree has been equated to epidemic broadcast trees. It allows the network over time to get better at circulating messages. And we’re going to riff on it through this whole post.
A few definitions for you, dear reader:
- Full message: all the bytes of the original message, as the sender intended.
- Headers: the hash of the message, possibly signed by you (spoiler), so they can check they got it. Others call those attestations or metadata.
- Eager peers: the unfortunate peers that get your whole messages.
- Lazy peers: the peers who get just the headline, ie the signature or hash.
OK, that was the hard part. Now let’s engage on the standard components of NTB-G.
Identity through handshake
First and foremost, you have to have some notion of identity. Take any peer-to-peer system; it’s usually a public key. And to show that you have the private key somewhere, you use that public key as part of the handshake process, meaning when you first connect to a peer. This notion of identity is going to be helpful down the road.
Note what we didn’t say: You can implement a system where your software has more than one identity. Actually that sounds fun. You can change your identity as often as you like. More power to you.
You’re in this beautiful new world of peer-to-peer communications, and looking forward to blasting your entire collection of genuine gifs of german metal bands. But here’s the thing, not everybody likes your content or is even looking for it.
So they are going to apply rate limiting, which means they will slow down pushing or outright stop passing around your message if you send more than your quota. Here’s how it works:
We’re going to define a window of time relative to the two peers. Note we don’t use clocks, because clocks never work across computers very well. We’re going to apply a sliding window of time, defined usually in seconds, for how many messages we’re going to deal with from you. We’re going to allow a certain number of full messages, and an even greater number of headers. This way, if you all of the sudden have a million messages to send, you can first send all the full messages you are allowed, and then start sending headers.
We set those parameter limits at handshake time. This is important, because it gives you an idea of the throughput your peer can experience on the network – and therefore you know how many peers you’ll need.
If your node goes over the limits, the peer may decide to blacklist you. That’s pretty harsh. It means you get disconnected from the peer, and cannot reconnect, for a period up to the peer (and not advertised, so you can’t really guess or play it).
New rules in NTB-G
So far, we haven’t defined anything groundbreaking, so let’s go and play with our parameters a bit.
- We are going to add metadata to each message, full or headers.
- The metadata is the trail of the peers the message went through (or just a counter if you’re constrained for size). Each peer is signing the message with their pub key (including the previous metadata), so you can verify it’s legit.
- We’re going to create a notion of max hops for the messages. We’ll go into that some more.
- And we’re going to add, optionally, that any message can have a peer or trail of peers as targets.
First, during handshake, you add two new limits: the number of hops for full messages, and the number of hops for headers. Hopefully you’re more likely to keep repeating headers more than full messages, so your max hops for headers should be much higher than the max hops for full messages. Then, when your node receives a message, it verifies the validity of the message by checking the last peer signed it properly and signs it. It checks the list of peers – anybody blacklisted in there? If yes, your peer may decide to stop right there and then and stop propagating the message.
Your node changes the message to sign it with its own public key – you know the one we talked about earlier? Your node then sends the new message to all its eager peers and sends the headers to all lazy peers – just like Plumtree intended.
With a twist..
By counting the trail of peers or looking at the counter, your node may decide to stop propagating a message if it reached the maximum number of hops. Your node can stop propagation. In the case of a full message, it can switch over to headers and send them instead.
If the node gets headers and wants the full message, it can request the full message from the sender of the message, which may pass it back along. Note in that case you embed the path into the request, and any peer in that path with the full message can stop the request and send back the message.
Routing optimization #1
Now you’ve got a system where peers talk to each other, and have a remote idea of how they’re tied together. So as part of that request for a full message, your node may also give its personal information (“here’s my number”). That allows any remote peer in the trail to opportunistically connect directly, if that makes sense (and it does, here’s a distant peer who wants your message, you should optimize routing for them).
Routing optimization #2
A full message comes in, and you send to all your peers. Well, not the sender, right? You can also look at all the peers the message went through. If any of them are your peers, you can avoid sending them the full message. Send them the header’s message instead.
Routing optimization #3
You get a message, and it’s intended for a specific peer. It’s one of your peers. You send it to that peer and you stop propagating the message once you get an ACK back (or none, I never said we had to be transactional about this). You get a message and it’s for you. You don’t send that message to everyone you know.
Gossiping as a gateway drug
Gossiping is noisy and messy. So sooner or later, you should specialize your comms. While it’s ok to blast everyone with consensus related info, it’s likely other peers will want to communicate with you directly. This type of reallocation of connections allows just that.
Step 1: send all your peers a free sample
Step 2: watch them send back a response asking for the real thing, alongside their connection info
Step 3: connect to them, either directly and/or gossiping over other peers (think TCP holepunching). You can encrypt messages using their public key if you feel particularly shy.
Step 4: Profit?
In this blog post, I outlined some ramblings I had with Dean over Telegram and HackMD. We want to allow gossip communications to be resilient to spam, so we introduced rate limiting, and realism to the way they propagate to the network (gotta stop at some point). By introducing those limits, we discover new fun ways that allow us to optimize routing messages, which may make the system more viable under load with many, many participants.
If you have feedback please join our Telegram or discuss the post on our community forum.