35:13Right.
35:13And I definitely did hit it.
35:16You will find in the resource, it's
not addressing that issue right now.
35:20It's in a very naive state of: when a change
is made, send a fetch call to the server.
35:26And there are a few problems with
that, even if you're the only client.
35:30Because first off, if you send one event
per request, it's very possible that you
35:35send a lot of events in very quick succession,
in a certain order, and then they reach the
35:41server in a different order because maybe
you pushed the first change on low network
35:47latency, the next one on high latency,
or actually the reverse of that where
35:50the first change hits after the second
change because that's just the speed of
35:54the network and that's what happened.
35:56And also you need to think
about offline capabilities.
36:00If you push changes when they
happen, how do you queue them
36:04when you're not connected to the
internet and then run through that
36:07queue once you're back online?
36:09That's another consideration
you kind of have to think about.
36:12Could be solved with just like
an in-memory event log and
36:15just kind of work with that.
36:16But you still have the order issue.
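A minimal sketch of the in-memory event log idea mentioned above, assuming a hypothetical `Outbox` class and a `send` callback that returns `True` once the server has accepted an event (neither name comes from the conversation). Events are queued as they happen and flushed oldest-first, so one client's events leave in the order they were recorded. It does not settle the ordering between several clients.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, object]


class Outbox:
    """Hypothetical in-memory queue of change events waiting to be uploaded."""

    def __init__(self, send: Callable[[Event], bool]) -> None:
        # `send` is assumed to return True once the server accepted the event.
        self._send = send
        self._pending: Deque[Event] = deque()
        self._next_seq = 1

    def record(self, payload: str) -> Event:
        """Queue a change event, tagged with a per-client sequence number."""
        event: Event = {"seq": self._next_seq, "payload": payload}
        self._next_seq += 1
        self._pending.append(event)
        return event

    def flush(self) -> int:
        """Send queued events oldest-first and stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline: keep this event and everything after it
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> List[Event]:
        return list(self._pending)
```

Calling `flush()` again once the connection is back drains the queue in the original order.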
36:19I'm familiar with atomic
clocks as a method to do this.
36:23There are even SQLite extensions
that'll sort of enforce that, having
36:28not implemented atomic clocks.
36:30Is it kind of this silver bullet
to that problem or are there more
36:34considerations to think about than
just reaching for something like that?
36:38Right.
36:39I suppose you're referring
to vector clocks, or logical
36:42clocks at a higher level?
36:43Yeah.
36:44Because atomic clocks, at least in
my understanding, that's
36:47actually what, at least in some
super high-end hardware,
36:51gives us
the wall clock.
36:56So right now it's, like,
36:58about 6:30 PM my time, but
this clock might drift, and this
37:03is what makes it so difficult.
37:04So what you were referring to with logical
clocks, this is where, basically,
37:09instead of saying like, hey, it's
6:30 in this time zone, which makes
37:14everything even more complicated, I'm
keeping track of my time as, like, 1, 2, 3.
37:20It might just be a logical
counter, which is actually much simpler
37:24than wall clock time.
37:26This is easier to reason about,
and there are no weird issues
37:31like daylight saving time, where suddenly
the clock is going backwards,
37:36or someone tinkering with the time.
This is why you need logical clocks.
37:40And that's at least the mechanism
that I've also landed on
37:44to impose a total order.
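A logical counter of the "1, 2, 3" kind can be sketched as a Lamport-style clock. This is an illustration under that assumption, not the implementation discussed in the conversation; the class and method names are hypothetical.

```python
class LamportClock:
    """Hypothetical Lamport-style logical clock: a counter, not wall-clock time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the counter for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the timestamp carried by an incoming event.

        Jumping past the larger of the two counters ensures the receive
        is ordered after both the local history and the remote event.
        """
        self.time = max(self.time, remote_time) + 1
        return self.time
```

The counter never goes backwards, so daylight saving changes or a manually adjusted system clock cannot reorder events.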
37:47But then it's also tricky:
how do you exchange that?
37:50How does your client know what, like,
three means in my client, et cetera?
37:54And the answer that I found to
this is that we all trust
38:00a single authority in the system.
38:02So this is where, and I think this is also
what you're going for with the Git
38:07analogy, what we are trusting as authority
in that system is GitHub or GitLab.
38:13And this is where, basically,
theoretically, you could
38:17send me your IP address and I could
try to pull directly from you.
38:20It would work, and that would also
work with the system that you've built.
38:25However, there might still be what
are called network partitions,
38:29where the two of us have
synced up, but some others haven't.
38:33So as long as we're all connected to
the same main upstream node, that
38:39is the easiest way to model this.
38:41An alternative would be to go full on
peer to peer, which makes everything
38:46a lot, lot, lot more complicated.
38:49And this is where
an extension of logical clocks, called
38:53vector clocks, can come in handy.
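As a rough illustration of what vector clocks add over a single counter: each node keeps one counter per node, and comparing two clocks tells you whether one event happened before the other or whether they were concurrent. The function names below are hypothetical, and this is a sketch rather than anything the speakers describe having built.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s own counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used when one node learns about another's events."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

The "concurrent" result is the case a peer-to-peer system has to resolve by some other rule, since neither side has seen the other's change.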
38:55You've mentioned the book Designing
Data-Intensive Applications by Martin
39:00Kleppmann. Had him on the show before.
39:02He's actually working on version two
of that book right now, but he's also done
39:06a fantastic free course about distributed
systems where he walks through all of
39:12that with a whiteboard. I actually think
the two of
39:18you have both nailed
the craft of showing, with simple
39:24strokes, some very complicated matters.
39:27So highly recommend to anyone
who wants to learn more there:
39:31learn it from Martin.
39:33He's an absolute master
of explaining those difficult
39:37concepts in a simple way.
39:40But, yeah, a lot of things go kind
of downstream from that total order.
39:45So just to go together on one little
journey to understand a downstream
39:51problem of this, let's say we have
implemented the queuing of those events.
39:56So let's say you're currently on
a plane ride and you're
40:00writing your blog post,
you're very happy with it.
40:03You now have, like, a thousand
change
40:07events that capture your work.
40:09Your SQLite database is up to date.
40:12But you didn't just create this new blog
post. Maybe while you were still at
40:16the airport, you created the initial
version of it with, like, TBD in the body.
40:21And your coworker thought, oh,
actually I have a lot of thoughts on this.
40:26And they also started writing
down some notes in there.
40:29And now the worlds have
kind of drifted apart.
40:33Your coworker
40:35has written down some important
things they don't want to lose,
40:38and you've written down some things.
You're not aware of their changes,
40:42neither are they of yours, and at some point
the semantic merge needs to happen.
40:48But how do you even make that happen
in this sync engine thing here?
40:52And this is where you need the total
order, where basically, in the worst
40:57case, this is what decides who gets
a say in this, who gets the last say, and in
41:04which order those events have happened.
41:07The model that I've landed on, and
I think that's similar to what Git
41:12does with rebasing, is basically that
before you get to push your own stuff,
41:18you need to pull down the events
first, and then you need to reconcile
41:22your kind of stashed local changes
41:26on top of the work of whoever
got lucky enough to push
41:32first without being told to pull first.
41:35So in that case, it might have
been your coworker because they've
41:39stayed online and kept pushing.
41:41And now it sort of like falls
on you to reconcile that.
41:46And I've implemented a, like an
actual rebase mechanism for this,
41:51where you now have this set of
new events that your coworker has
41:56produced and you still have your set
of events that, reflect your changes.
42:01And now you need to reconcile this.
42:03So that is purely on the
42:05event log level, but given that we
both want to use SQLite now, we don't
42:12just need to think about going forward
with SQLite, but we also now need to
42:17think about, like, hey, how do we go back?
42:19Like in Git you have like, you
have this stack of events, right?
42:24So you have like a commit, which has
a parent of another commit, which
42:27has a parent of another commit.
42:29It's very similar to what your events in
this event log look like, except it's now
42:36no longer just one event log, but you also
get this little branch from your coworker.
42:41So now you need to go to
the last common ancestor.
42:44And from there you need
to figure out like.
42:46How do I linearize this?
42:49I've opted for a model where everything
that was pushed once cannot be
42:53overwritten, so there's no force push.
42:55So you basically just get
to append stuff at the end.
42:59But in order to get there, you need
to first roll back your own stuff, then
43:05play forward what you've gotten from upstream,
43:08and then add your own on top.
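The roll back, play forward, append sequence can be sketched at the event-log level like this. Events are reduced to plain IDs, the function names are hypothetical, and the sketch assumes the append-only rule described above (nothing already pushed is rewritten). A real rebase would also have to decide what to do when a replayed local event no longer applies cleanly.

```python
from typing import List, Tuple


def common_prefix_length(local: List[str], upstream: List[str]) -> int:
    """Number of leading events both logs share (the last common ancestor)."""
    shared = 0
    for mine, theirs in zip(local, upstream):
        if mine != theirs:
            break
        shared += 1
    return shared


def rebase(local: List[str], upstream: List[str]) -> Tuple[List[str], List[str]]:
    """Linearize two diverged logs without rewriting anything upstream.

    Returns the new local log and the local events that had to be
    rolled back and then replayed on top of the upstream events.
    """
    base = common_prefix_length(local, upstream)
    unpushed = local[base:]              # roll these back first
    new_log = upstream + unpushed        # play upstream forward, then append ours
    return new_log, unpushed
```

For example, with a local log `["e1", "e2", "mine1"]` and an upstream log `["e1", "e2", "theirs1"]`, the result keeps `theirs1` in place and puts `mine1` after it.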
43:10And the rolling back with SQLite is
a thing that I've put a lot of
43:15time into, where I've been using another
SQLite extension, called the SQLite
43:21session extension, which allows you,
per SQLite write, to basically record
43:27what the thing has actually done.
43:30So instead of storing "insert
43:33into issues, blah, blah, blah",
43:35when running that, you get a blob
of, let's say, 30 bytes, and that has
43:40recorded, on the SQLite level, what has
happened to the SQLite database.
43:46And I store that alongside each
change event that sits in the event log.
43:53And the very cool thing about this
is, I can use that to replay it
43:57on top of another database, to
kind of catch it up more quickly.
44:01But I can also invert it.
44:03So now I have basically,
let's say, 20 events.
44:07And for each, I've recorded what
has happened on the SQLite level,
44:11and now I can basically say:
44:13when I need to roll back, I can revisit
each of those, invert each of those
44:17changesets, apply them again on the
SQLite database, and then I'll end up
44:22where I was before, and that's how I've
implemented rollback on top of SQLite.
44:27So, as mentioned, when
you're going down the rabbit hole
44:32of imposing a total order,
44:34there are a lot of downstream
things you need to do that make
44:37this even more complicated.
44:39But, from what I can see,
you're, on the right track if
44:43you wanna pursue this further.
44:45Yeah.
44:45And I do have a rebasing mechanism
in place in mine that's more
44:52just kind of a sledgehammer.
44:53I've got two SQLite databases in mine.
44:56In the same way that in Git you have
your local copy of the main line and your
45:00local copy of your work, there's always
this local copy of main, that's just
45:05whatever events have come from the server.
45:07So this is the source of truth that the
server has told me about, and that was
45:12something I forgot to mention earlier
45:13when explaining all of this: the
server is the source of truth.
45:16It has that main line of the order
of all of the events, and that is
45:21what all the clients trust.
45:23But yeah, it has like that local
copy, and then when it pulls from
45:27the server, it'll update that copy.
45:29It'll look at all the events that
are kind of ahead in the client,
45:33and then it'll say, okay, I'm gonna
roll back my client copy of my
45:39branch to whatever the server is.
45:41And it's literally just a file write call.
45:43So it just overwrites
45:46your client SQLite file
with a copy of the server one.
45:50And then we look at the events that
the server didn't acknowledge yet
45:53and then we replay those on top as
a very basic way to pull and make
45:58sure, because it's very possible that
you made some changes locally that
46:02the server hasn't acknowledged yet.
46:04Like you've pushed them up still
in process and you pull down the
46:08latest changes and you don't see
all of that stuff that you pushed
46:11up yet because of network latency.
46:14So this sort of avoids that problem
where you pull down from the server
46:18and now you need to replay whatever
you did on the client that the
46:21server hasn't acknowledged yet.
46:23It hasn't received that network request.
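A rough sketch of this "sledgehammer" pull, using Python's built-in `sqlite3` module and its `Connection.backup` method in place of a file write. The two-connection setup, the event shape (an SQL statement plus parameters) and the function names are assumptions made for illustration, not the project's actual code.

```python
import sqlite3
from typing import Iterable, Tuple

Event = Tuple[str, tuple]  # hypothetical event shape: (SQL statement, parameters)


def apply_events(db: sqlite3.Connection, events: Iterable[Event]) -> None:
    """Run each event's statement against the database and commit."""
    for sql, params in events:
        db.execute(sql, params)
    db.commit()


def pull_and_replay(
    main: sqlite3.Connection,
    working: sqlite3.Connection,
    server_events: Iterable[Event],
    unacknowledged: Iterable[Event],
) -> None:
    """Reset the working copy to the server's view, then replay local events."""
    apply_events(main, server_events)      # advance the local copy of main
    main.backup(working)                   # overwrite the working copy wholesale
    apply_events(working, unacknowledged)  # replay what the server has not seen
```

After the call, `working` contains everything the server knows about plus the local changes still in flight, while `main` stays a clean mirror of the server.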
46:25So that was a very basic need to
have some rebasing, but it does
46:30get a lot more complicated when you
have collaborators on a document.
46:34I've seen a few different
versions of this.
46:37CRDTs are the fun, like, magic wand.
46:40It does everything.
46:42But there are also solutions from
Figma, for example, where they
46:47say everything in Figma is kind
of its own little data structure.
46:50Like you can put some text and
that's its own little data field.
46:54You have rectangles.
46:54Those are a data field.
46:56And whenever you update a rectangle,
like you update the pixel width of
47:01a rectangle, that's like an update
event on some SQL table that stores
47:05all the rectangles for this document.
47:07So whenever you make that update, it'll
update the pixel value of whatever
47:12that row entry is, and then it'll push
it up for other people to receive.
47:17And when you pull it down,
it's last write wins.
47:20In other words, whoever the last
person is in that order that the
47:24server decided on, that total order.
47:26That's a new word I know about now.
47:28Didn't know it was called total order,
but yeah, once you pull it down,
47:31whatever the server said was the order
of events, that's gonna be the final
47:35state of that rectangle on your device.
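Last write wins under a server-decided order can be sketched as a simple fold over the updates. The tuple shape and function name are hypothetical, and this is not a description of how Figma actually stores or syncs its data.

```python
from typing import Dict, Iterable, Tuple

# (row id, field, value), already sorted into the server's total order
Update = Tuple[str, str, object]


def apply_in_server_order(updates: Iterable[Update]) -> Dict[str, Dict[str, object]]:
    """Apply updates in order; a later update to the same field overwrites it."""
    state: Dict[str, Dict[str, object]] = {}
    for row_id, field, value in updates:
        state.setdefault(row_id, {})[field] = value
    return state
```

Two people resizing different rectangles never collide, but two people writing to the same field simply overwrite each other, which is the glitch described next.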
47:38The only time it becomes a problem, and
you may have experienced this if you've
47:41ever worked on a FigJam together
with a bunch of people, is if you're all
47:45typing in the same text box: everyone's
just overwriting each other, and the
47:48text box glitches out and changes to
whatever's on the other person's screen.
47:52You can't see people's cursors
because you're fighting to update
47:55the exact same entry in the database
and it can't reconcile those changes.
48:00So it only works as long as
you're editing different things
48:04in the file and you're not really
stepping on each other too much.
48:08As soon as you're stepping on each other
trying to edit like the same text field,
48:12then you wanna reach for something
that's very, very fancy, like CRDTs.
48:17Which will try to merge elegantly
all of the changes that you're
48:20typing into the same database field.
48:23It's maybe over-prescribed because of how
powerful it is, but for those specific
48:28scenarios, it's really nice to reach for,
and we can talk about them if you want.
48:32I only have a high-level understanding
of what CRDTs do, but it would be
48:36something to apply to that kind of problem.
48:39My takeaway on where to apply CRDTs
versus where I would apply event sourcing
48:45is: CRDTs are great in two scenarios.
48:51One, if you don't quite know
yet where you want to go,
48:54and where in the past you might've
reached for, let's say, Firebase to
48:59just have a backend as a service.
49:00You know, you might want to change
it later, but you just, for now,
49:04you just want to get going and,
you can, particularly if you
49:08don't have like a strict schema
across your entire application.
49:12So you just try to like, not go off
the rails too much, but at least the
49:17data is like, mostly, like across
the applications in a good spot.
49:22But as you roll this out in
production, and, we are shipping
49:26an iOS app as well, that someone
is, running an old version on.
49:31Now you don't quite know, oh, this
document, this data document that has
49:35been synced around here, this might
not yet have this field that the
49:39newer application version depends on.
49:42So this is where
time drifts in a more significant
49:47way. In the more traditional
application architecture approach,
49:52you don't trust
the client in the first place.
49:54You have your API endpoint,
the API is versioned, et cetera, and
49:58everything is governed through the API.
50:01But now you also need to
tame that problem somehow.
50:03So at this point you're already
going a little bit beyond where I
50:07think CRDTs shine right now, which
brings me to my next, more
50:12evergreen scenario for CRDTs, which
is very specific tasks.
50:19And so text editing,
particularly rich text editing.
50:22Is such a scenario where I think CRDTs
are just like a very, very good, approach.
50:28You can also use
OT, operational transform, which
50:32is somewhat related. Under the covers it
works a bit differently, but the way
50:37you would use it is pretty similar.
50:40And, related to rich text editing
is also when you have like complex
50:45list structures where you wanna
move things within the list.
50:49So if you want to go for the, Figma
scenario, let's say you change the
50:55order of like multiple rectangles, like
where do they sit in that layer order?
51:01how do you convey how
you wanna change that?
51:04You could always, have like maybe
an array of all the IDs that give
51:08you this perfect order, but if
this kind of happens concurrently,
51:13then you need to reconcile that.
51:14So that's not great.
51:16And this is where CRDTs are also
like a very, special purpose
51:20tool, which works super well.
51:23And so what I've landed on is: use
event sourcing for everything except
51:28where I need those special purpose
tools, and this is where I then reach
51:33for CRDTs or for something else.
51:35That's kind of the conclusion I, took away
if you like the event sourcing approach.
51:41But, I think ultimately it really
comes down to what is the application
51:46that you're building and what are,
like, what is the domain of what
51:51you're building and which sort
of trade-offs does this require?
51:54So I think in Figma.
51:56The real timeness is really important
and it is recognized that those different
52:02pieces that are floating around, they're
like pretty, independent from each other.
52:07So, and if they're independent,
then you don't need that total order
52:10between that, which makes everything
a lot easier in terms of scalability,
52:14in terms of correctness, and then
you don't need to rebase as much.
52:18distributed systems is the
ultimate case of it depends.
52:22and I think trying to build one like
you did, I think is a very good way
52:28to like build a better understanding.
52:30And also I think that opens your eyes,
like, ah, now I understand why Figma
52:35has this shortcoming, or Notion, where if we are
trying to change the same line, change the
52:40same block, last-writer-wins applies.
52:43Whereas in Google Docs, for example, we
could easily change the, same word even.
52:49And it would reconcile
that in a, in a better way.
52:52But, maybe you have some advice for
people like yourself when you're
52:57just getting started on that journey.
53:00What would you tell people they
should do, or maybe shouldn't yet do,
53:05today in 2025?
53:07There are more technologies out there now.