TCP is not reliable, it’s just ordered

I gave a talk last Sunday at the Free and Open Source Developers’ European Meeting (FOSDEM) 2007 on how Jaiku uses Jabber, aka XMPP (slides with notes, even though I don’t usually do that).

I realized that there was this undertone in my talk that TCP is somehow broken by GPRS, since packets get acked even if they are not received by the phone, and since the connection breaks a lot and has high latency. But that’s of course not true: what is being _broken_ by this isn’t TCP, just our mistaken view of what TCP is.

TCP tends to be described as reliable: RFC 793 introduces it as a “highly reliable host-to-host protocol”; Wikipedia claims that it “guarantees reliable [...] delivery” and man tcp calls it “reliable” as well.

The trick is that the “reliability” these documents talk about is quite different from what we tend to think of as reliable. TCP’s reliability means in-order delivery and integrity (no bits flipped). What people often have in mind when they think of reliable is some vague idea of “data getting to the other end”. Which of course isn’t the case.

If you want reliability, you 0) give your data a unique identifier, 1) store the data you are going to send persistently, 2) send it to the other side, 3) store the data persistently on the other side, 4) send an acknowledgement to the sender, and 5) delete it on the sender. TCP does something like this, but it forgets ‘persistently’: the data sits only in OS buffers, which get thrown away if you close the socket, your program or the machine.
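
The steps above can be sketched roughly as follows. This is only an illustration, not how Jaiku or any particular system does it: the `Sender`, `Receiver` and `deliver` names are made up for the example, plain dictionaries stand in for persistent storage, and `deliver` stands in for one pass over a connection that may lose acknowledgements.

```python
import uuid


class Sender:
    def __init__(self):
        self.outbox = {}  # stand-in for a durable store (file, database)

    def queue(self, payload):
        msg_id = str(uuid.uuid4())     # step 0: unique identifier
        self.outbox[msg_id] = payload  # step 1: store before sending
        return msg_id

    def pending(self):
        # Everything not yet acknowledged; resend this after a reconnect.
        return list(self.outbox.items())

    def on_ack(self, msg_id):
        self.outbox.pop(msg_id, None)  # step 5: delete only after the ack


class Receiver:
    def __init__(self):
        self.store = {}  # stand-in for a durable store

    def on_message(self, msg_id, payload):
        # step 3: store first; the id makes a resent duplicate harmless
        self.store.setdefault(msg_id, payload)
        return msg_id                  # step 4: acknowledge by id


def deliver(sender, receiver, drop_ack_for=()):
    """Step 2: one pass over the connection; acks in drop_ack_for are lost."""
    for msg_id, payload in sender.pending():
        ack = receiver.on_message(msg_id, payload)
        if ack not in drop_ack_for:
            sender.on_ack(ack)


sender, receiver = Sender(), Receiver()
first = sender.queue("x")
second = sender.queue("y")
deliver(sender, receiver, drop_ack_for={second})  # connection dies before the ack for "y"
still_pending = len(sender.pending())             # "y" is kept on the sender
deliver(sender, receiver)                         # reconnect and resend
```

Because the sender deletes only on acknowledgement, a lost ack means a resend, and the unique identifier is what lets the receiver ignore the duplicate.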

So, guys and gals, TCP is just ordered: if you send x, y and z, the receiver will never receive just x and z. But that’s all. If you want “reliability” you have to do what I described above on the application level.


10 Responses to TCP is not reliable, it’s just ordered

  1. Anonymous Hero says:

    I disagree with your definition of “reliability”. TCP (as a transport layer protocol) does indeed guarantee reliable data transfer, but anything else to do with received data is outside TCP’s “jurisdiction”, so to say.

  2. Symbiatch says:

    I, too, disagree with this a bit. Or rather claim that any network programmer with half a brain is aware of these things.

    TCP guarantees that the data gets there, or the connection is closed. As you said, there can’t be a situation where only x and z get through. Naturally if you want to know for sure if the other end got the packet, the other end must tell you this. It’s not like the OS must truly block everything until the data has gone through to the other side. Even the manuals say that send and similar calls block only until there is space in the send buffer. So even if the send command succeeds, it doesn’t mean that any data has been sent anywhere.
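
The point about send can be seen in a small sketch over the loopback interface. It assumes a local TCP connection is available; the variable names are made up for the example. The send call reports success even though the application on the other end never reads anything.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

sent = client.send(b"hello")     # succeeds once the bytes are in the send buffer
server_side.close()              # the peer application never called recv()
client.close()
listener.close()
```

Here `sent` is the number of bytes the stack accepted, which says nothing about whether the receiving application ever saw them.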

    So anyone who reads manuals knows this and can design the protocol so that both ends really know what’s gotten through if it is important.

    BTW, you say that packets get ACKed even if they don’t get there. Any network dumps where this has happened? Would be interesting to see this. I’ve never seen this but I have noticed that data transfer to a device on GPRS causes quite a lot of packet retransmissions :P

  3. Mika says:

    I maintain that the use of the word ‘reliable’ in connection with TCP is misleading and makes people not use that half a brain.

    Even Symbiatch’s statement ‘the data gets there or the connection is closed’ is actually a good example of the problem: there is actually no meaningful connection between these two things. Yes, if the connection gets closed no more data will get through, but you have no idea about what of the data that you sent before that got through, if any. So ‘get there or the connection is closed’ is either false (if you meant an exclusive-or) or meaningless (if you meant an inclusive-or).

    Pooh. I do agree with you guys on the principle (neither the specs nor the man pages actually lie), but not in practice.

    My post was inspired by the long thread on the Jabber Standards-JIG list about the need to ack stanzas in XMPP. This is also where the claim that GPRS gateways ack TCP packets on their own came from. After Symbiatch’s question I’ve been digging around to see if anybody else is talking about this, which they aren’t. So I guess I’ll have to try and produce some traces :-).

    Even without any non-transparent gateways in the way, most stacks take a long time to close a connection that is getting no acks. Lots of data can get sent in five minutes, and many popular programming environments (Java, anyone?) don’t give you access to any feedback from the stack about what has been acked on the TCP level and what not.

  4. Anonymous Hero says:

    Data is guaranteed to be delivered if you perform graceful shutdown and the application on the other end behaves.

    E.g. HTTP
    - Server sends response, shuts down writing.
    - Client receives response, then EOF, shuts down writing too.
    - Server receives EOF and hence *knows* that data has got to the other end and was read and understood by the client.
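
The exchange above might look like this with plain sockets. It is a sketch over the loopback interface, not real HTTP: the `client` function and the `b"response"` payload are made up for the example, and the client runs in a thread only so that both ends fit in one script.

```python
import socket
import threading


def client(address, received):
    with socket.create_connection(address) as sock:
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EOF: the server has shut down its write side
                break
            chunks.append(data)
        received.append(b"".join(chunks))  # the response has been read in full
        sock.shutdown(socket.SHUT_WR)      # only now send our own EOF


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)

received = []
worker = threading.Thread(target=client, args=(listener.getsockname(), received))
worker.start()

conn, _ = listener.accept()
conn.sendall(b"response")
conn.shutdown(socket.SHUT_WR)            # server sends EOF after the response
server_saw_eof = conn.recv(4096) == b""  # blocks until the client's EOF arrives
conn.close()
listener.close()
worker.join()
```

The server would also see an EOF if the client closed the socket without reading, so the EOF only counts as an acknowledgement because the client agrees to send it after it has read the whole response.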

  5. Mika says:

    Ah, we’ve been talking from slightly different reference points. You are quite correct: once you have done a graceful close of the socket, you know all the data was sent.

    I was looking at it from the POV of long-lived connections (like Jabber), where you don’t close the socket after a transaction, but have many interesting transactions over the same TCP connection.

  6. XYZ says:

    Yes Mika,
    the reliability is only between the hardware, or better, between the TCP stacks.
    The user space application cannot trust it if there are memory problems etc.
    Maybe you found a tcp/ip bug (memory leaks)?
    It should be easy for a clever man like you to verify it with a self made udp based communication ;)

  7. Mika says:

    I stand appropriately chastised for being a smart-ass. But face it: that’s why you are reading this blog.

    And no, I’m not saying I want to do it myself over UDP. ‘Ordered’ is a powerful concept. Let’s look at Anonymous Hero’s HTTP case. It creates reliability from ordering:
    - Client sends [ request ]
    - Server receives [ request ]
    - Server sends [ x, y, z, EOF ]
    - Client receives [ x, y, z, EOF ]
    - Client sends [ EOF ]
    - Server receives [ EOF ]

    Now the only additional contract the client and server have agreed to is that the client only sends EOF after it has read and understood the response, whose end is marked. But assume for example that the client could have received EOF before receiving y. The contract would be useless. So using the ordering we create reliability in this case.

    But it’s actually the same as I was talking about. The data the server sends is x, y and z and the client acknowledges it by sending an EOF.

  8. XYZ says:

    Ouuuh, next time I read a bit slower. Now I understand: You meant the ORDER of the packets and not missing packets. Sorry.

  9. Symbiatch says:

    Anyway I do agree with you in that it would be nice if we had some kind of feedback about data packets etc, but unfortunately we have to do it ourselves.

    It’s like an IM thingy in one place in the wild years. You connect, ask for messages, server sends them to you and deletes them immediately. If your connection breaks, tough luck. If the client crashes, tough luck. Messages gone forever.

    Since you mentioned Jabber, I must say that I really dislike pure XML protocols. You have to parse the stuff on the fly to know when a message has come through etc. If only they had length info somewhere, but at least the last time I checked they didn’t. And it’s not that nice to read some bytes, search for an attribute which tells the length and then read the rest (since it wouldn’t be nice to send length as binary before XML).

    WBXML to the rescue? ;)

  10. bah. says:

    This post and the following discussion seem to me like FUD, even if it was unintentional. I will attempt to clarify some things about TCP for whoever is reading.

    The abstraction of a reliable in-order byte-stream pipe is one which is easy to understand, implement and use, so a good one in my opinion. Adding any explicit feedback channel to it would not give any significant advantages and would complicate the currently clean interface. I’ll try to explain why.

    The lack of application layer access to the information carried in TCP acks is a feature, not a bug: TCP implementations will send acks to the sender when data has been stored in the receiving TCP’s buffer, so if your receiving application crashes before it has processed (or even read) the data but your sending application peeks at the TCP acks to see what has been received, it could mistakenly think the data reached its destination (your receiving application). This is why you must implement the feedback at the application layer if you need it: TCP acks contain information relevant to TCP, and TCP does not know the needs of your application. For a thorough discussion, real-life experiences with horror stories and everything, search for all the end-to-end principles papers and read them.

    TCP is reliable even if you don’t close the socket. The data will eventually get there, you just have no way of knowing _when_ unless you implement feedback at the application layer. And for the reasons mentioned above, doing it at the application layer is what you should do.

    You don’t need to store the data you’ve written to a TCP socket, nor do you ever have a need to write it again to the socket (as a basic rule). You can either trust it to get to the other side eventually or you can make the receiver keep the sender posted about what it has received, so the sender.

    If all you need to know is how many messages are in-transit, you can just store a number which you increment when you send a message and decrement when you receive an ‘ack’ (sent by your receiving application). The ‘ack’ doesn’t need to contain anything except the stuff (in your protocol) necessary to identify it as an ack.
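
The counter described above could be as small as this. It is a sketch with a made-up `InFlightCounter` class, and it assumes the sending application calls `on_send` for each message it writes and `on_ack` for each application-level ack it reads back.

```python
class InFlightCounter:
    def __init__(self):
        self.in_flight = 0

    def on_send(self):
        # Called each time a message is written to the socket.
        self.in_flight += 1

    def on_ack(self):
        # Called each time the receiving application's ack is read back.
        if self.in_flight == 0:
            raise ValueError("ack received with nothing in flight")
        self.in_flight -= 1


counter = InFlightCounter()
for _ in range(3):
    counter.on_send()
counter.on_ack()
remaining = counter.in_flight  # two messages are still unacknowledged
```

Since TCP delivers in order, acks arrive in the order the messages were sent, which is why a bare count is enough and the ack needs no message identifier.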

    If your OS has a TCP/IP stack which tears down connections prematurely or if your operator changes your IP address every once in a while, that’s a completely different story, but in these cases, all the reliability features of TCP are useless (including in-order delivery!), and you get to reimplement the whole thing in your application anyway, but it’s not the fault of TCP.