Exercise: Input not matching parser grammar
The PCAP tftp-unsupported.pcap
contains TFTP
traffic, but there are problems. Using just the logs produced by Zeek figure
out what is the issue.
-
Make sure you have the package
zeek/spicy-tftp
in versionv0.0.5
installed, its analyzerspicy_TFTP
is working, and the package scripts are loaded.Hint
The package should be listed
zkg
, e.g.,$ zkg list zeek/zeek/spicy-tftp (installed: v0.0.5) - Spicy-based analyzer for the TFTP protocol.
Otherwise install it with
$ zkg install zeek/spicy-tftp --version v0.0.5 The following packages will be INSTALLED: zeek/zeek/spicy-tftp (v0.0.5) Proceed? [Y/n] Running unit tests for "zeek/zeek/spicy-tftp" Installing "zeek/zeek/spicy-tftp". Installed "zeek/zeek/spicy-tftp" (v0.0.5) Loaded "zeek/zeek/spicy-tftp"
If the package is installed a TFTP analyzer should be listed by
zeek -NN
, e.g.,$ zeek -NN | grep -i tftp [Analyzer] spicy_TFTP (ANALYZER_SPICY_TFTP, enabled)
Running the analyzer on some test data should produce a
tftp.log
. For this the scripts from the package need to be loaded, e.g.,@load packages
should be present inlocal.zeek
.Using the file
tftp_rrq.pcap
from the test data in the package repository:$ zeek -Cr tftp_rrq.pcap $ ls *.log conn.log packet_filter.log tftp.log $ cat tftp.log | jq { "ts": 1737142107.374696, "uid": "CAMTBX2d54B930rgkg", "id.orig_h": "127.0.0.1", "id.orig_p": 60027, "id.resp_h": "127.0.0.1", "id.resp_p": 69, "wrq": false, "fname": "hello.txt", "mode": "octet", "uid_data": "Csmqzc1PPz5ZPJ1uj", "size": 0, "block_sent": 0, "block_acked": 0 }
-
Run
zeek
ontftp-unsupported.pcap
. This should produce at least the log filesconn.log
,tftp.log
andanalyzer.log
.Hint
If you only see
packet_filter.log
andweird.log
you need to invokezeek
with an additional argument.Hint
weird.log
reports that the PCAP has bad checksums.cat weird.log | jq { "ts": 1737142107.374696, "id.orig_h": "127.0.0.1", "id.orig_p": 0, "id.resp_h": "127.0.0.1", "id.resp_p": 0, "name": "bad_IP_checksum", "notice": false, "peer": "zeek", "source": "IP" }
Run
zeek
with-C
(or--no-checksums
) causes Zeek to ignore checksums and produce the three log files. -
Working top to bottom, what is going on and what is the issue?
a. What do you see in
conn.log
?Solution
- two connections are reported,
spicy_tftp_data
with data payload, andspicy_tftp
for some request with TCP data not ack'd (leading to the checksum error) - the
resp_h
andresp_p
ofspicy_tftp_data
connection match theorig_h
andorig_p
of thespicy_tftp
connection, so they seem to belong to the same TFTP transaction - both connections report some data
- apart from the missing ACK nothing seems obviously off
$ cat conn.log | jq { "ts": 1737142107.374971, "uid": "Csmqzc1PPz5ZPJ1uj", "id.orig_h": "127.0.0.1", "id.orig_p": 50012, "id.resp_h": "127.0.0.1", "id.resp_p": 60027, "proto": "udp", "service": "spicy_tftp_data", "duration": 0.0007979869842529297, "orig_bytes": 468, "resp_bytes": 12, "conn_state": "SF", "local_orig": true, "local_resp": true, "missed_bytes": 0, "history": "Dd", "orig_pkts": 3, "orig_ip_bytes": 552, "resp_pkts": 3, "resp_ip_bytes": 96, "ip_proto": 17 } { "ts": 1737142107.374696, "uid": "CAMTBX2d54B930rgkg", "id.orig_h": "127.0.0.1", "id.orig_p": 60027, "id.resp_h": "127.0.0.1", "id.resp_p": 69, "proto": "udp", "service": "spicy_tftp", "conn_state": "S0", "local_orig": true, "local_resp": true, "missed_bytes": 0, "history": "D", "orig_pkts": 1, "orig_ip_bytes": 58, "resp_pkts": 0, "resp_ip_bytes": 0, "ip_proto": 17 }
b. What do you see in
tftp.log
?Solution
- a single TFTP transaction for reading the file
hello.txt
is reported - both
block_sent
andblock_acked
are reported as zero, but no error seems to have occurred
$ cat tftp.log | jq { "ts": 1737142107.374696, "uid": "CAMTBX2d54B930rgkg", "id.orig_h": "127.0.0.1", "id.orig_p": 60027, "id.resp_h": "127.0.0.1", "id.resp_p": 69, "wrq": false, "fname": "hello.txt", "mode": "octet", "uid_data": "Csmqzc1PPz5ZPJ1uj", "size": 0, "block_sent": 0, "block_acked": 0 }
b. What do you see in
analyzer.log
? How it related to the grammar?Solution
- the TFTP analyzer
SPICY_TFTP
reports a protocol violation since it sees an unknown and unhandled opcode (Opcode::<unknown-6>
) - the payload data triggering this error is
b"\x00\x06blksize\x00256\x00"
$ cat analyzer.log | jq { "ts": 1737142107.374971, "cause": "violation", "analyzer_kind": "protocol", "analyzer_name": "SPICY_TFTP", "uid": "Csmqzc1PPz5ZPJ1uj", "id.orig_h": "127.0.0.1", "id.orig_p": 50012, "id.resp_h": "127.0.0.1", "id.resp_p": 60027, "failure_reason": "no matching case in switch statement for value 'Opcode::<unknown-6>' (/root/spicy-tftp/analyzer/tftp.spicy:20:5-26:10)", "failure_data": "\\x00\\x06blksize\\x00256\\x00" }
Looking at the error location
/root/spicy-tftp/analyzer/tftp.spicy:20:5-26:10
, the protocol violation happens since noOpcode=6
is modelled in the grammar.public type Packet = unit { # public top-level entry point for parsing op: uint16 &convert=Opcode($$); switch ( self.op ) { # ~~~~~~~~~~~~~~~~~~~~ no matching case in switch statement for value 'Opcode::<unknown-6>' Opcode::RRQ -> rrq: Request(True); Opcode::WRQ -> wrq: Request(False); Opcode::DATA -> data: Data; Opcode::ACK -> ack: Acknowledgement; Opcode::ERROR -> error: Error; }; };
# TFTP supports five types of packets [...]: # # opcode operation # 1 Read request (RRQ) # 2 Write request (WRQ) # 3 Data (DATA) # 4 Acknowledgment (ACK) # 5 Error (ERROR) type Opcode = enum { RRQ = 0x01, WRQ = 0x02, DATA = 0x03, ACK = 0x04, ERROR = 0x05 };
- two connections are reported,
-
Figure out the spec of what you need to implement to parse this connection.
Hint
Possible search terms:
TFTP opcode 6 RFC blksize
(🔎 duckduckgo.com)Solution
Relevant RFCs:
RFC2348 is specifically about the
blksize
negotiation we are seeing here, but implementing option support more generally according to RFC2347 does not seem much more work. -
Write a minimal extension to the grammar so it can parse this connection (ignore logging for now).
You only need to extend parsing for the response, why is the extra data in the request ignored? Add parsing to the request side as well and make sure we do not miss additional data in the future.
Sometimes surrounding context is needed in addition to the payload information from the logs, but probably not here. If you still want to have a look at the PCAP with Wireshark you could install and run
tshark
like this:$ apt-get update $ apt-get install -y tshark $ tshark -r /workspaces/zeek-playground/tftp-unsupported.pcap Running as user "root" and group "root". This could be dangerous. 1 0.000000 127.0.0.1 → 127.0.0.1 TFTP 62 Read Request, File: hello.txt, Transfer type: octet, blksize=256 2 0.000275 127.0.0.1 → 127.0.0.1 TFTP 46 Option Acknowledgement, blksize=256 3 0.000520 127.0.0.1 → 127.0.0.1 TFTP 36 Acknowledgement, Block: 0 4 0.000590 127.0.0.1 → 127.0.0.1 TFTP 292 Data Packet, Block: 1 5 0.000816 127.0.0.1 → 127.0.0.1 TFTP 36 Acknowledgement, Block: 1 6 0.000876 127.0.0.1 → 127.0.0.1 TFTP 226 Data Packet, Block: 2 (last) 7 0.001073 127.0.0.1 → 127.0.0.1 TFTP 36 Acknowledgement, Block: 2
Solution
-
To parse this trace without errors we only need to add support for Option Acknowledgment (OACK) packets.
-
Add support for OACK
Opcode
:type Opcode = enum { RRQ = 0x01, WRQ = 0x02, DATA = 0x03, ACK = 0x04, - ERROR = 0x05 + ERROR = 0x05, + OACK = 0x06, };
-
Add support for parsing OACK packets ignoring data for now since it will not be logged:
public type Packet = unit { # public top-level entry point for parsing op: uint16 &convert=Opcode($$); switch ( self.op ) { Opcode::RRQ -> rrq: Request(True); Opcode::WRQ -> wrq: Request(False); Opcode::DATA -> data: Data; Opcode::ACK -> ack: Acknowledgement; Opcode::ERROR -> error: Error; + Opcode::OACK -> : skip bytes &eod; # Ignore OACK payload for now. }; };
-
-
Making sure all request data is consumed.
The request was parsed successfully even with the additional option data since the parser operates on UDP data. For UDP data "connections" are only assembled by Zeek, but on the Spicy side each packet is parsed individually, so there is no stray data "left on the wire" for any other parser to stumble over.
With that any extra data passed to
Request
simply falls off the end. We could encode that no additional data is expected by adding askip
field which consumesbytes
until&eod
, but can at most consume zero bytes (i.e., nothing expected until EOD).type Request = unit(is_read: bool) { filename: bytes &until=b"\x00"; mode: bytes &until=b"\x00"; + : skip bytes &eod &max-size=0; on %done { spicy::accept_input(); } };
-
Consume options on the request side.
By consuming an unknown number of options we activate lookahead parsing which would fail for non-option data, so above explicit skipping is not needed anymore.
-
Add a unit to parse options:
+type Option = unit { + name: skip bytes &until=b"\x00"; + value: skip bytes &until=b"\x00"; +};
-
Extend request to also consume options (data ignored for now):
type Request = unit(is_read: bool) { filename: bytes &until=b"\x00"; mode: bytes &until=b"\x00"; + options: skip Option[]; on %done { spicy::accept_input(); } };
-
-
-
If you have not already done so, add testing for your parser changes. What are the possible approaches and where should we test?
Solution
Applying the terminology described in the section about testing:
- parsing changes affect individual units
- the changed units are
public
- changes only affect the Spicy parsing layer with no visible external effects beyond absence of parse errors
Ideally add some test based on a Spicy batch file, but for TFTP we make the connection between request and response only in the Zeek scripts. Instead add an system, end-to-end test based on the original PCAP (which is small):
# @TEST-EXEC: zeek -NN Zeek::Spicy > l # # @TEST-EXEC: zeek -Cr ${TRACES}/tftp_blksize_option.pcap ${PACKAGE} # # Note: Try to extract relevant columns from common logs to # reduce churn with upstream changes. # @TEST-EXEC: cat conn.log | zeek-cut uid service orig_pkts resp_pkts > conn.log.min # @TEST-EXEC: btest-diff conn.log.min # # @TEST-EXEC: btest-diff tftp.log
Add some unit-style tests:
## Precompile grammar so we can reuse it across multiple tests. # @TEST-EXEC: spicyc -dj ${DIST}/analyzer/tftp.spicy %INPUT -o tftp.hlto ## Note: We use bash's `printf` to output binary data (BTest defaults to `sh`) ## below. We should capture this in a script for reuse. ## Validate parsing of OACK responses ## ---------------------------------- ## ## Baseline: ## ## TFTP::Packet { ## op: OACK ## } ## # @TEST-EXEC: bash -c 'printf "\x00\x06blksize\x00256\x00"' | spicy-dump tftp.hlto >oack.log 2>&1 # @TEST-EXEC: btest-diff oack.log ## Validate parsing of options in requests ## --------------------------------------- ## No options ## ~~~~~~~~~~ ## ## Baseline: ## ## TFTP::Packet { ## op: RRQ ## rrq: TFTP::Request { ## filename: hello.txt ## mode: octet ## options: [] ## } ## } ## # @TEST-EXEC: bash -c 'printf "\x00\x01hello.txt\x00octet\x00"' | spicy-dump tftp.hlto >request-no-opts.log 2>&1 # @TEST-EXEC: btest-diff request-no-opts.log ## Single option ## ~~~~~~~~~~~~~ ## ## Baseline: ## ## TFTP::Packet { ## op: RRQ ## rrq: TFTP::Request { ## filename: hello.txt ## mode: octet ## options: [ ## TFTP::Option {} ## ] ## } ## } ## # @TEST-EXEC: bash -c 'printf "\x00\x01hello.txt\x00octet\x00blksize\x00256\x00"' | spicy-dump tftp.hlto >request-single-opt.log 2>&1 # @TEST-EXEC: btest-diff request-single-opt.log ## Multiple options ## ~~~~~~~~~~~~~~~~ ## ## Deliberately test an unsupported option with slightly weird value (empty). ## ## Baseline: ## ## TFTP::Packet { ## op: RRQ ## rrq: TFTP::Request { ## filename: hello.txt ## mode: octet ## options: [ ## TFTP::Option {} ## TFTP::Option {} ## ] ## } ## } ## # @TEST-EXEC: bash -c 'printf "\x00\x01hello.txt\x00octet\x00blksize\x00256\x00my special option\x00\x00"' | spicy-dump tftp.hlto >request-two-opts.log 2>&1 # @TEST-EXEC: btest-diff request-two-opts.log
Spoiler
During this exercise you basically implemented a solution for zeek/spicy-tftp#14.
Key takeaways
analyzer.log
is needed to operate parsers and should be ingested into the SIEM solution- similarly,
dpd.log
contains useful information about the DPD phase and is needed to diagnose issues where parsers never make it past DPD
- similarly,
- even if a parser fully supports a protocols RFC, extensions are common (codified in another RFC, or private)
- Always try to add tests when changing an analyzer. This not only makes sure that you do not accidentally break your parser, but also serves as living documentation of its features.