Message and connection semantics: UDP vs. TCP

The parsers have these stub implementations:

module foo;

public type Request = unit {
    payload: bytes &eod;
};

public type Response = unit {
    payload: bytes &eod;
};

Both units use &eod to extract all available data. What "all data" means differs between UDP and TCP parsers:

  • UDP: UDP has no notion of connections, so Zeek synthesizes UDP "connections" from flows by grouping UDP messages with the same 5-tuple within a time window. UDP has no reassembly, so a new parser instance is created for each UDP packet; &eod means until the end of the current packet.
  • TCP: TCP supports connections and packet reassembly, so both sides of a connection are modelled as streams with reassembled data; &eod means until the end of the stream. The stream is unbounded.

Because a TCP stream is unbounded, a single unit using &eod would only finish when the connection ends. For this reason one usually models parsing of a TCP connection as a vector of protocol messages, e.g.,

public type Requests = unit {
    : Request[];
};

type Request = unit {
    # TODO: Parse protocol message.
};

where

  • the length of the vector of messages is unspecified, so it is detected dynamically;
  • the vector is an anonymous field, to avoid storing an unbounded vector of messages;
  • parsing of the protocol messages is responsible for detecting when a message ends.
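
As a minimal sketch of the last point, assume a hypothetical wire format in which every message starts with a 2-byte length followed by that many payload bytes. This framing and the field names len and payload are assumptions for illustration only. Each Request then consumes exactly one message, and the anonymous vector in Requests moves on to the next one:

```spicy
module foo;

public type Requests = unit {
    # Anonymous field: messages are parsed one after another but not stored.
    : Request[];
};

type Request = unit {
    # Assumed framing: 16-bit length (network byte order by default),
    # followed by exactly that many payload bytes.
    len: uint16;
    payload: bytes &size=self.len;
};
```

A delimiter-based protocol could bound the payload with &until instead of &size. Either way, the message unit itself decides where a message ends, rather than &eod.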