2026-05-08
Lecture Friday: A New Way to Look at Networking
The first two sections of this talk give a terse overview of our move from telephony networks to packet-switched networks. I'd have loved more depth, since it's likely difficult to follow without prior familiarity. If you've never seen an electromechanical automatic telephone exchange in action, go search for videos. They're super cool.
The harder problem is that the third section is also terse, and it's where he introduces a new way of thinking about the role of a network. I'm left curious but apprehensive. For example, Kademlia already exists by the time of this talk. It's not the same as his pitch about getting access to your data via overhead airplanes, but I'm also skeptical of that sort of rhetoric. It's like invoking microcontrollers mixed into drywall to discuss the future of IoT.
To recap: Telegraphy understands the system as humans across the country with wires between them. By connecting two devices with a wire, you can send data from one end of the wire to the other nearly instantly. Applications use the network to send a message asynchronously. You first give someone on the network a message. They pass the message over the wires to another person on the network closer to the intended recipient. This repeats until the message reaches the person closest to the receiver, who delivers it either when the recipient checks for new telegrams or by dispatching a courier.
Telephony understands the system as a series of links: wire segments that can be joined into one long circuit between endpoints. Applications specify a path through those links to create a circuit. By attaching devices to the ends of those temporary circuits, you can directly connect devices to each other over long distances.
TCP/IP understands the system as sender and receiver. By moving routing into the network and out of the application, you leave getting between the two up to the network. Your application only needs to know the name of its destination, either directly by address or, more commonly, indirectly by domain name, which translates into an address. Once you have that address, you can send data to that system, and once you start sending, the receiver obtains your address to reply.
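In today's terms, that whole model is a couple of lines of Python (the hostname and request here are placeholders, not anything from the talk):

```python
import socket

# Translate the indirect name (a domain) into the direct name (an address).
addr = socket.gethostbyname("example.com")

# Connect and send; how the bytes get between the two endpoints is
# entirely the network's problem, not the application's.
with socket.create_connection((addr, 80)) as conn:
    conn.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    reply = conn.recv(4096)  # the receiver learned our address and can reply
```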
In the data network as described, it sounds like the system removes sender and receiver and only deals in fetching and publishing data. Data has a name; systems do not. You request a name from the network or publish a list of available names to the network. The application requests data by name, and the network fetches and returns the data for that name, optionally caching it. But this is a subset of what networks currently provide, not a superset as pitched.
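To make that concrete, here's a toy sketch of the kind of interface I understand him to be describing. Every name and call here is invented; the talk doesn't specify an API:

```python
# Hypothetical name-oriented network API: data has names, hosts don't.
class DataNetwork:
    def __init__(self):
        self.store = {}          # name -> data, held somewhere in the network

    def publish(self, name, data):
        self.store[name] = data  # announce a name the network can now satisfy

    def fetch(self, name):
        # The network locates and returns the data, possibly from a cache.
        return self.store.get(name)

net = DataNetwork()
net.publish("/nytimes/frontpage", b"<html>...</html>")
page = net.fetch("/nytimes/frontpage")  # no host address anywhere in sight
```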
I'm ignoring the model where data is broadcast into the network and left there, since it devolves into the storage server being treated as an edge node of the network. Someone has to store it, and they pay a cost to do so (both the cost of disk capacity and the opportunity cost of not storing something else). I'm unlikely to pay the network to store data for me if I can store it myself; then I only cover the cost of transit, less the savings from efficient caching that reduces the overhead of redundant copies. I might pay to store data if I don't want to run a server 24/7, or if I want redundant copies closer to users for faster response times. In that case, you can think of the network as running a server for you. He mentions the idea of networks simply storing broadcast copies among nodes to move data around. That's expensive in space and only makes sense when the storage cost is less than the cost of transit. It gets worse as the number of nodes in the network grows and only pays off when more and more clients request the same data.
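A back-of-envelope version of that storage-versus-transit comparison, with made-up unit costs (only the shape of the inequality matters):

```python
size_gb        = 1.0
storage_per_gb = 0.02  # $/GB-month to hold a copy at a node (assumed)
transit_per_gb = 0.01  # $/GB to move it across the network (assumed)
cached_fetches = 5     # redundant transfers a stored copy absorbs per month

# Storing a copy only pays off when the transit it saves exceeds what
# the storage costs; replicating across more nodes raises the left side.
storage_cost  = size_gb * storage_per_gb
transit_saved = size_gb * transit_per_gb * cached_fetches
print("store it" if transit_saved > storage_cost else "just re-send it")
```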
I'm not worried about the caching. That's the network's concern in the system as described, and a robust scheme has to work even when nothing is cached. Availability of the data is still left to the endpoints. The network is responsible for returning data by name and for storing the list of names endpoints are publishing or subscribed to. The unfortunate part is that the network is free to return stale data and may return data after the publisher no longer wishes it to be available. This is a trade-off, similar to the existing trade-off in TCP/IP where your data may route through hostile nodes because you no longer control the paths. By connecting only data, not endpoints, you lose control of lifetimes.
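A toy cache node shows the lifetime problem; the TTL field is my own assumption, since the talk doesn't describe an expiry mechanism at all:

```python
import time

cache = {}  # one node's copy of the world: name -> (data, expiry)

def publish(name, data, ttl_seconds):
    cache[name] = (data, time.time() + ttl_seconds)

def fetch(name):
    data, expires = cache[name]
    if time.time() > expires:
        pass  # a cooperative cache would drop this; nothing forces it to
    return data  # possibly stale, possibly past the publisher's intent
```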
How do you send data back to the origin of some data? You need that for dynamic RPCs: I don't just want to see my bank balance, I want to move money too. What's described is robust for static data, but mutable data hasn't really been thought through beyond a vague versioning mechanism and a generic-to-specific name alias. I need a way, based on that first piece of data, to send some data back to a listener so they can create new data for me. Do you subscribe to data names, with the network ensuring you get every update to a name at least once? Do you have to poll? Can you publish names in a message that clients can publish back to? For example, how do you create an account on the New York Times? I assume I publish my account profile to the network indicating my preference for investigative journalism. How do they know about it? If I update that profile, how do they get the changes? Do they subscribe to all of the millions of customer profiles published onto the network? How are we efficiently tracking these subscriptions?
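To pin down what's missing, here's the hypothetical subscription API the design would need; every call in this sketch is invented:

```python
# None of this machinery is specified in the talk.
class NamedDataClient:
    def publish(self, name, data): ...
    def subscribe(self, name, on_update): ...  # push? poll? at-least-once?

me, nyt = NamedDataClient(), NamedDataClient()

# I publish my profile under a name I chose...
me.publish("/users/me/nyt-profile", b'{"beat": "investigative"}')

# ...but for the Times to see changes, they'd have to hold one of these
# per customer, and the network has to track millions of subscriptions.
nyt.subscribe("/users/me/nyt-profile", on_update=lambda data: print("changed"))
```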
That also implies data names can be rapidly added to and removed from the network. If I run a bespoke data stream on top of the data network (like listing a client's bank balance or showing a checkout page), I'm publishing new names for each client in real time. How do we manage name propagation and discovery? It's not unsolvable, but it presents considerable overhead for static nodes compared to the endpoint addressing scheme. Granted, mobile nodes present overhead right now, so it's a trade-off.
If a datum's name is derived from its content, you can ban data by name. You can work around that by adding non-semantic padding to generate new names for the same effective data. It's also notable that this enables tracking even with encrypted payloads. Since senders likely want to publish statically encrypted data to get any benefit from caching, you can track the contents of requests at the blob level, not just the service level. You may not know what someone downloaded, but you can check whether what you downloaded was downloaded by anyone else. Including random padding ensures uniqueness, but it invalidates any of the caching wins this architecture could provide, since every user now receives unique data. It also runs into the rapid-naming problem above.
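Assuming names are content hashes (the talk leaves the naming scheme open), banning, tracking, and cache-busting all fall out of the same property:

```python
import hashlib, os

data = b"the same pamphlet everyone is sharing"

# Identical content yields identical names: great for caching and
# deduplication, equally great for banning or tracking by name.
assert hashlib.sha256(data).hexdigest() == hashlib.sha256(data).hexdigest()

# Non-semantic padding mints a fresh name for the same effective data,
# dodging a ban or a tracker, but also defeating every cache hit.
padded = data + os.urandom(16)
assert hashlib.sha256(padded).hexdigest() != hashlib.sha256(data).hexdigest()
```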
I know this talk was given in 2008, so security was still somewhat fuzzy. But he mentions the example of only being able to secure the tunnel, then alludes to how getting spam over a secure tunnel isn't great. He suggests the network could fix this. I'm not sure where he's going; let's just ignore that email is unencrypted here (it was 2008). Why is your network doing spam detection and not the server? That raises a number of red flags around censorship. He says you can communicate over anything that moves and therefore shouldn't worry. Sneakernets are a fun slot machine but quite limited: the equivalent of relying on the little free libraries in front of houses instead of a metropolitan or university reference library.
Let's talk mobile endpoints, one area he touts. This is one year before the IETF starts work on MPTCP to let cellular networks send a TCP stream over multiple edge connections. That protocol allows seamless transition of a stream as you move between towers. For a query-response model on static data, this design is pretty streamlined. It just unfortunately requires applications to be explicitly designed to take advantage of it, unlike MPTCP, which was designed to be transparent to applications. Backward compatibility tends to win.
Given the marginal gains for mobile connectivity, the broader question becomes whether this architecture could replace the current stack or simply layer onto it. That brings me to extend versus overlay. TCP/IP emerged because it could be overlaid on the existing network until it proved valuable enough for networks to pay for equipment capable of moving it down into the hardware. After that, it inspired converged networks, where it became easier to run packet networks with voice on top than to run voice and packet networks side by side. To that end, is it easier to run data networks with connections on top? Could you, for example, run a voice call on this data network? I'm not so sure.
And that's also where my mind goes. We've rigged up the existing system to provide essentially what this does, by treating addresses much the way names function in the hypothetical data network. You can use BGP to anycast an address and/or GeoDNS to split-horizon a name, returning different addresses based on where the requester's IP address implies the client is geographically. This is how a CDN works. It's an overlay that excels at static data: it pretends to be a single HTTP server but is in reality many different servers all serving the same data.
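A split-horizon resolver is just a lookup keyed on the client's inferred location. The regions and addresses below are invented (the IPs are from documentation ranges):

```python
# One name, many addresses, identical data behind each of them.
EDGE_SERVERS = {"us-east": "192.0.2.10", "eu-west": "198.51.100.20"}

def resolve(name, client_region):
    # Real GeoDNS infers the region from the resolver's or client's IP.
    # Whichever address comes back is acting as a locator for the name.
    return EDGE_SERVERS.get(client_region, EDGE_SERVERS["us-east"])
```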
He mentions Akamai but says you can't use a CDN for the Olympics because it's dynamic content. I'm not sure that's right. You can stream video in chunks to distribution servers that clients pull from. Just add a layer to your star topology: instead of everyone pulling from the origin, you set up proxy distributors that pull from the origin and each serve a subset of all clients. Still too many clients? Add another star layer: origin to intermediate distribution, those out to end delivery. This is why every data link standard converges on serial point-to-point links connected in a tiered-star topology transporting packets.
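The arithmetic on that tiered star works out quickly: with each server feeding at most some fan-out of downstreams, the depth you need grows only logarithmically (the numbers here are illustrative):

```python
import math

def tiers_needed(clients, fanout):
    # Each tier multiplies reach by `fanout`, so depth is log_fanout(clients).
    return math.ceil(math.log(clients, fanout))

# Origin feeds 1,000 distributors, which can feed 1,000,000, which can
# serve well past 10,000,000 clients: three tiers of links.
print(tiers_needed(10_000_000, 1000))  # -> 3
```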
For benefits, you could save the DNS round trip, since the network inherently speaks names. On the other hand, everyone publishing things on the web effectively contributes what amounts to BGP routing updates. He notes that the design of names would have to be very careful, and this is partly why. Route aggregation is critical to the function of the internet: storing individual routes for all four billion IPv4 addresses in fast hardware memory is prohibitively expensive. Now imagine storing every HTTP path on every domain. Even that assumes every endpoint returns static data, a vast undercount of the number of names the system needs to handle. Without a hierarchical naming system, flat name routing would bloat forwarding tables, and handing what is effectively the power of BGP route advertisements to adversarial users is a disaster waiting to happen.
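Rough sizing makes the point. Even at the IPv4 scale he'd be replacing, a flat table is enormous (the per-entry size is my assumption):

```python
routes    = 4_000_000_000  # one entry per IPv4 address, zero aggregation
bytes_per = 64             # name + next hop + bookkeeping per entry (assumed)

# ~238 GiB of fast-path memory before a single HTTP path enters the table.
print(f"{routes * bytes_per / 2**30:.0f} GiB")
```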
If it doesn't need to be a new internet, it could be useful for clients to explicitly support a protocol designed as a CDN overlay on top of TCP/IP. Otherwise we're stuck cobbling it together from a dozen different protocols. I'm not sure there's a good business case to get networks to adopt this transport. Speaking of business models, as he did, the biggest flaw right now is the complete collapse of network companies in America and Europe. They've consolidated into monopolies and outsourced all their expertise. You're not going to see anything interesting out of them anymore without using politics to reshape the way they make money.
To give this something of a conclusion: I know I've raised objection after objection. That's probably just because I'm not fully getting it. Understanding these sorts of ideas and working through the implications is a lot of fun, kind of like alternative-history fan fiction. Sometimes it helps you realize we're stuck in a local minimum. Other times you realize we're stuck with one set of trade-offs when a different set was possible and we just didn't end up with it. Like the Dvorak or Colemak keyboard layouts, foundational changes usually require the alternative to improve things by an order of magnitude to see widespread adoption, not just ten or twenty percent. We can't even switch to IPv6 to get rid of the NAT hack we added to buy time. There has to be an unmistakable business case for switching. In the case of IPv6, not switching is at least creating a reverse pressure, as it creates a rental market for addresses.