Testing parsers with shared state
If parsers share state, e.g., via a %context, we might not be able to fully test them in isolation.
For this, Spicy can parse batch input, i.e., trace files similar to PCAPs.
As an example, consider this PCAP:
$ tshark -r http-get.pcap
1 0.000000 ::1 → ::1 TCP 56150 → 8080 [SYN] Seq=0 Win=65535 Len=0 MSS=16324 WS=64 TSval=2906150528 TSecr=0 SACK_PERM
2 0.000147 ::1 → ::1 TCP 8080 → 56150 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=16324 WS=64 TSval=91891620 TSecr=2906150528 SACK_PERM
3 0.000173 ::1 → ::1 TCP 56150 → 8080 [ACK] Seq=1 Ack=1 Win=407744 Len=0 TSval=2906150528 TSecr=91891620
4 0.000185 ::1 → ::1 TCP [TCP Window Update] 8080 → 56150 [ACK] Seq=1 Ack=1 Win=407744 Len=0 TSval=91891620 TSecr=2906150528
5 0.000211 ::1 → ::1 HTTP GET /hello.txt HTTP/1.1
6 0.000233 ::1 → ::1 TCP 8080 → 56150 [ACK] Seq=1 Ack=87 Win=407680 Len=0 TSval=91891620 TSecr=2906150528
7 0.000520 ::1 → ::1 TCP HTTP/1.1 200 OK
8 0.000540 ::1 → ::1 TCP 56150 → 8080 [ACK] Seq=87 Ack=275 Win=407488 Len=0 TSval=2906150528 TSecr=91891620
9 0.000584 ::1 → ::1 HTTP HTTP/1.1 200 OK (text/plain)
10 0.000602 ::1 → ::1 TCP 56150 → 8080 [ACK] Seq=87 Ack=293 Win=407488 Len=0 TSval=2906150528 TSecr=91891620
11 0.000664 ::1 → ::1 TCP 56150 → 8080 [FIN, ACK] Seq=87 Ack=293 Win=407488 Len=0 TSval=2906150528 TSecr=91891620
12 0.000686 ::1 → ::1 TCP 8080 → 56150 [ACK] Seq=293 Ack=88 Win=407680 Len=0 TSval=91891620 TSecr=2906150528
13 0.000704 ::1 → ::1 TCP 8080 → 56150 [FIN, ACK] Seq=293 Ack=88 Win=407680 Len=0 TSval=91891620 TSecr=2906150528
14 0.000758 ::1 → ::1 TCP 56150 → 8080 [ACK] Seq=88 Ack=294 Win=407488 Len=0 TSval=2906150528 TSecr=91891620
We can convert this to a Spicy batch file batch.dat by loading a Zeek policy script (redef Spicy::filename to change the output path):
$ zeek -Cr http-get.pcap -b policy/frameworks/spicy/record-spicy-batch
tracking [orig_h=::1, orig_p=56150/tcp, resp_h=::1, resp_p=8080/tcp, proto=6]
recorded 1 session total
output in batch.dat
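Batch input can also be generated without capturing a PCAP first. The following Python sketch assumes the v2 layout of the batch.dat shown below: a `!spicy-batch v2` header, one `@begin-conn` line, `@data <flow> <size>` records each followed by exactly `<size>` payload bytes and (by assumption) a terminating newline, and a closing `@end-conn`. The helper `write_batch` and its parameters are hypothetical and not part of Spicy or Zeek.

```python
# Hypothetical helper (not part of Spicy or Zeek): builds a v2 batch file for
# one TCP connection, mirroring the layout of the Zeek-generated batch.dat.
from typing import Iterable, Tuple


def write_batch(cid: str, resp_port: str,
                chunks: Iterable[Tuple[bool, bytes]]) -> bytes:
    """Return the batch file as bytes.

    cid:       connection ID, e.g. "::1-56150-::1-8080-tcp"
    resp_port: responder port used to select parsers, e.g. "8080/tcp"
    chunks:    (is_orig, payload) pairs in the order they were seen
    """
    orig_id, resp_id = f"{cid}-orig", f"{cid}-resp"
    out = bytearray(b"!spicy-batch v2\n")
    # Both flows are streams; each side's parser is chosen by port and direction.
    out += (f"@begin-conn {cid} stream {orig_id} {resp_port}%orig "
            f"{resp_id} {resp_port}%resp\n").encode()
    for is_orig, payload in chunks:
        flow = orig_id if is_orig else resp_id
        # The count is the exact payload length in bytes (each CRLF counts as 2).
        out += f"@data {flow} {len(payload)}\n".encode()
        # Assumption: a newline terminates the raw payload bytes.
        out += payload + b"\n"
    out += f"@end-conn {cid}\n".encode()
    return bytes(out)


request = (b"GET /hello.txt HTTP/1.1\r\nHost: localhost:8080\r\n"
           b"User-Agent: curl/8.7.1\r\nAccept: */*\r\n\r\n")
batch = write_batch("::1-56150-::1-8080-tcp", "8080/tcp",
                    [(True, request), (False, b"Well hello there!\n")])
print(batch.decode())
```

With CRLF line endings the request above is 86 bytes long, the same count that appears on the first @data line of the Zeek-generated file.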
Now batch.dat contains data that can be processed with, e.g., spicy-driver, and it can also be edited.
Most data portions in this batch file have lines terminated with CRLF, but only LF is rendered here.
!spicy-batch v2
@begin-conn ::1-56150-::1-8080-tcp stream ::1-56150-::1-8080-tcp-orig 8080/tcp%orig ::1-56150-::1-8080-tcp-resp 8080/tcp%resp
@data ::1-56150-::1-8080-tcp-orig 86
GET /hello.txt HTTP/1.1
Host: localhost:8080
User-Agent: curl/8.7.1
Accept: */*
@data ::1-56150-::1-8080-tcp-resp 274
HTTP/1.1 200 OK
content-length: 18
content-disposition: inline; filename="hello.txt"
last-modified: Thu, 23 Jan 2025 09:46:26 GMT
accept-ranges: bytes
content-type: text/plain; charset=utf-8
etag: "af67690:12:67920ff2:34f489e1"
date: Thu, 23 Jan 2025 09:46:41 GMT
@data ::1-56150-::1-8080-tcp-resp 18
Well hello there!
@end-conn ::1-56150-::1-8080-tcp
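Because every @data record carries an explicit byte count, a payload has to be read by size rather than line by line, and hand edits have to keep the count in sync with the payload. The Python sketch below walks the records of such a file to show this. It is a hypothetical reader, not Spicy's own parser, and it assumes that a single newline follows each payload.

```python
# Hypothetical reader (not part of Spicy): walks the records of a v2 batch
# file such as the batch.dat above.
from typing import Iterator, List, Tuple


def read_batch(raw: bytes) -> Iterator[Tuple[str, List[str], bytes]]:
    """Yield (command, arguments, payload); payload is empty except for @data."""
    header = b"!spicy-batch v2\n"
    if not raw.startswith(header):
        raise ValueError("not a v2 Spicy batch file")
    pos = len(header)
    while pos < len(raw):
        end = raw.find(b"\n", pos)
        if end == -1:
            end = len(raw)
        line = raw[pos:end].decode()
        pos = end + 1
        if not line.strip():
            continue  # this reader tolerates blank lines between records
        command, *args = line.split()
        payload = b""
        if command == "@data":
            size = int(args[1])
            # Take exactly <size> bytes: the payload itself contains (CR)LFs,
            # so it cannot be read line by line.
            payload = raw[pos:pos + size]
            pos += size
            # Assumed terminator; a wrong count usually (not always) trips this.
            if raw[pos:pos + 1] != b"\n":
                raise ValueError(f"@data for {args[0]}: size {size} looks wrong")
            pos += 1
        yield command, args, payload
```

Reading the batch.dat above with open("batch.dat", "rb") and passing its contents to read_batch would be expected to yield @begin-conn, three @data records of 86, 274 and 18 bytes, and @end-conn.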
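The originator and responder of this connection are on ports 56150/tcp and 8080/tcp, respectively. In batch.dat, both directions select their parser via the responder port (8080/tcp%orig and 8080/tcp%resp), so any analyzer whose %port matches 8080/tcp would be invoked for this traffic automatically, e.g.,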
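# Contents of parse.spicy.
module foo;
public type X = unit {
    # Use this parser for traffic on 8080/tcp (no direction given, so both sides).
    %port = 8080/tcp;
    # Consume all data of the flow.
    data: bytes &eod;
};
# Print the parsed unit once it is complete.
on foo::X::%done {
    print self;
}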
$ spicy-driver -F batch.dat parse.spicy -d
[$data=b"GET /hello.txt HTTP/1.1\x0d\x0aHost: localhost:8080\x0d\x0aUser-Agent: curl/8.7.1\x0d\x0aAccept: */*\x0d\x0a\x0d\x0a"]
[$data=b"HTTP/1.1 200 OK\x0d\x0acontent-length: 18\x0d\x0acontent-disposition: inline; filename=\"hello.txt\"\x0d\x0alast-modified: Thu, 23 Jan 2025 09:46:26 GMT\x0d\x0aaccept-ranges: bytes\x0d\x0acontent-type: text/plain; charset=utf-8\x0d\x0aetag: \"af67690:12:67920ff2:34f489e1\"\x0d\x0adate: Thu, 23 Jan 2025 09:46:41 GMT\x0d\x0a\x0d\x0aWell hello there!\x0a"]
The same mechanism works for MIME types via %mime-type.
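The selection step can be pictured as a table lookup from each parser spec in the @begin-conn line to a unit. The Python sketch below is a simplified, hypothetical model of that idea, not spicy-driver's implementation. The names REGISTRY, resolve and parsers_for are illustrative, and a %mime-type registration would presumably be another kind of key in such a table. The sketch covers ports only.

```python
# Simplified, hypothetical model of parser selection; not spicy-driver's code.
from typing import Dict, Tuple

# Stand-in registry: unit selected per (port, direction). A unit declaring
# `%port = 8080/tcp;` with no direction qualifier is entered for both sides.
REGISTRY: Dict[Tuple[str, str], str] = {
    ("8080/tcp", "orig"): "foo::X",
    ("8080/tcp", "resp"): "foo::X",
}


def resolve(spec: str) -> str:
    """Map a parser spec such as '8080/tcp%orig' to a unit name."""
    port, _, direction = spec.partition("%")
    try:
        return REGISTRY[(port, direction)]
    except KeyError:
        raise LookupError(f"no parser registered for {spec}") from None


def parsers_for(begin_conn: str) -> Dict[str, str]:
    """Resolve the parser for each flow named in an @begin-conn line."""
    _cmd, _cid, _kind, orig_id, orig_spec, resp_id, resp_spec = begin_conn.split()
    return {orig_id: resolve(orig_spec), resp_id: resolve(resp_spec)}


line = ("@begin-conn ::1-56150-::1-8080-tcp stream ::1-56150-::1-8080-tcp-orig "
        "8080/tcp%orig ::1-56150-::1-8080-tcp-resp 8080/tcp%resp")
print(parsers_for(line))
```

Both flows resolve to foo::X, which matches the two records printed by spicy-driver for this connection.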
With Spicy 1.13 or later (part of Zeek 7.2 or later), one can also specify externally how analyzers should be mapped to ports, so the grammars do not need to specify %port/%mime-type, e.g.,
# Grammar has no `%port` attribute.
$ spicy-driver -F batch.dat parse.spicy -d --parser-alias '8080/tcp=foo::X'