Message and connection semantics: UDP vs. TCP
The parsers have these stub implementations:
    module foo;

    public type Request = unit {
        payload: bytes &eod;
    };

    public type Response = unit {
        payload: bytes &eod;
    };
We have used &eod to denote that we want to extract all data. The semantics of "all data" differ between TCP and UDP parsers:
- UDP: UDP has no connection concept, so Zeek synthesizes UDP "connections" from flows by grouping UDP messages with the same 5-tuple in a time window. UDP has no reassembly, so a new parser instance is created for each UDP packet; &eod means until the end of the current packet.
- TCP: TCP supports connections and packet reassembly, so both sides of a connection are modelled as streams with reassembled data; &eod means until the end of the stream. The stream is unbounded.
For this reason one usually wants to model parsing of a TCP connection as a vector of protocol messages, e.g.,
    public type Requests = unit {
        : Request[];
    };

    type Request = unit {
        # TODO: Parse protocol message.
    };
- The length of the vector of messages is unspecified, so it is detected dynamically.
- To avoid storing an unbounded vector of messages, we use an anonymous field for the vector.
- Parsing of the protocol messages is responsible for detecting when a message ends.
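To make the last point concrete, here is a minimal sketch that assumes a hypothetical protocol in which every message starts with a 2-byte length followed by that many payload bytes. The `length` field and this framing are assumptions made for illustration; a real protocol defines its own way of marking message boundaries (a length prefix, a delimiter, a fixed size, and so on).

```spicy
module foo;

# Entry point for one side of a TCP connection: keep parsing messages until
# the stream ends. The field is anonymous, so parsed messages are not stored.
public type Requests = unit {
    : Request[];
};

# Hypothetical framing (an assumption): 2-byte length prefix, then payload.
type Request = unit {
    length: uint16;                    # network byte order by default
    payload: bytes &size=self.length;  # exactly `length` bytes, instead of &eod

    on %done { print self; }           # debugging aid only
};
```

Because each `Request` consumes only the bytes announced by its own length prefix, the enclosing vector can start the next message right where the previous one ended. If `spicy-driver` is available, one way to try this is to pipe a byte stream containing a few such messages back to back into it and watch the `%done` hook fire once per message.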