Transcript from the "Introducing Kappa Architecture" Lesson
>> The Kappa Architecture is sort of an old idea. It's not like it was invented. It's just kind of a practice that's evolved over time. You might also hear people talk about things like event sourcing, similar kinds of ideas. They basically all mean this idea. Sometimes they emphasize different parts of it and sometimes they leave out important parts.
[00:00:21] So it's important to know when they're doing that. Because things like the immutability. And the Pindell leanness are kind of important to get the nice properties of Merkle DAGs. I see on the chat someone's asking if Bitcoin uses a Merkle DAG, it does, yeah. So that's how Bitcoin kind of has like a single log.
[00:00:42] So it doesn't really allow forking, except with some of the extensions like side chains and that kind of stuff. I'm no expert in Bitcoin, but I know that it is one, all right. So getting back to the Kappa Architecture, the main idea with a Kappa Architecture is that you have an append only logs.
[00:01:04] So you can only add items to the end of the log and it's like a log book in a ship or a store or something. You're not gonna go back and erase things, scroll them out. You can only add things. In fact, if you want to amend something you said previously, you have to do that as a new document.
[00:01:22] It's like say, by the way, I messed up way back there at that hash. It really ought to have been this way. This idea is a really important idea for building systems that kind of have the certain robustness properties because there's always a list of what happened. And you can always take that list.
[00:01:46] And because it's the single source of truth, you can always play it back and get into the same state that you were at previously, so long as your indexes which are also called materialized views in this in this parlance. If those materialized views are completely deterministic, which they ought to be, then you'll get back to where you started.
[00:02:09] So this is a really useful idea if you're building an enterprise bus with Kafka or something. But it's also really handy for building a peer to peer system where you don't even have any servers. If you're just running the software on people's own computers, their phones, web pages even that kind of thing.
[00:02:32] So to build an append only log, we can use the same trick that we've just seen with hashes to make that log into a Merkle DAG. This does mean that the log might diverge in some cases. So this would be a constraint that if you can deal with logs, then that fork, then you can let the hashes be whatever you want.
[00:02:53] But otherwise, if there's only supposed to be a single, a non-forking log, then that's just something you have to verify in your client software. And then other clients would have to verify that property for you. So for example, if there was a fork, the clients could just like reject the slug altogether.
[00:03:10] It's like, this is no good. And that forking couldn't be caused by an adversary if you do things like use cryptographic signatures to verify authorship of that log. So the best thing about append only logs is that replication is literally concatenation in the naive case. There's fancier ways to do it but if you wanna make something really simple, like we could write some shell scripts.
[00:03:38] And just using the cat command, we could do sync with two logs just by concatenating them. This is also true because with a Merkle DAG, if you're addressing is based on hashes, you can always write the document with the hash as many times as you like in the graph structure doesn't change.
[00:03:58] Because you're not linking to anything different. So with concatenation, it doesn't matter if you've seen the same documents again, you just, whatever, I already have them, that's okay. I'm gonna get the same end result in either case. So it's just the easiest way that you can get data synchronization is to put your data into an append only log.