Exercises: A naive CSV parser
Assume the following simplified CSV format:
- rows are separated by newlines b"\n"
- individual columns are separated by b","
- there are no separators anywhere else (e.g., no , in quoted column values)
A sample input would be
I,a,ABC
J,b,DEF
K,c,GHI
When copying the above data into a file, make sure the file ends in a single newline. If you use the copy to clipboard button (upper right of the snippet), the data should be copied correctly.
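If copying by hand is error-prone, the file can also be written from a shell. The following is only a sketch, assuming a POSIX shell with printf, tail and od available; the file name input.csv is taken from the example invocation in this section.

```shell
# Write the three sample rows. Each \n in the format string terminates one row,
# so the resulting file ends in exactly one newline.
printf 'I,a,ABC\nJ,b,DEF\nK,c,GHI\n' > input.csv

# Optional sanity check: show the last two bytes of the file, which should be
# the final "I" of the last row followed by a single newline.
tail -c 2 input.csv | od -c
```

Using printf rather than echo avoids differences between shells in how trailing newlines and escape sequences are handled.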
For testing you can use the -f flag to spicy-dump or spicy-driver to read input from a file instead of stdin, e.g.,
spicy-driver csv_naive.spicy -f input.csv
- Write a parser which extracts the bytes on each row into a vector.
Hint 1
Your top-level parser should contain a vector of rows which has unspecified length.
Hint 2
Define a new parser for a row which parses bytes until it finds a newline and consumes it.
Solution
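module csv_naive;

public type CSV = unit {
    # A vector of rows of unspecified length.
    rows: Row[];
};

type Row = unit {
    # Extract bytes up to the next newline; the newline is consumed but not stored.
    data: bytes &until=b"\n";
};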
- Extend your parser so it also extracts individual columns (as bytes) from each row.
Hint
The &convert attribute allows changing the value and/or type of a field after it has been extracted. This allows you to split the row data into columns.
Is there a builtin function which splits your row data at a separator (consuming the iterator)? Functions on bytes are documented in the Spicy reference documentation. You can access the currently extracted data via $$.
Solution
module csv_naive;

public type CSV = unit {
    rows: Row[];
};

type Row = unit {
    # Split the extracted row at each comma into a vector of columns.
    cols: bytes &until=b"\n" &convert=$$.split(b",");
};
- Without changing the actual parsing, can you change your grammar so the following output is produced? This can be done without explicit loops.
$ spicy-driver csv_naive.spicy -f input.csv
[[b"I", b"a", b"ABC"], [b"J", b"b", b"DEF"], [b"K", b"c", b"GHI"]]
Hint 1
You could add a unit hook for your top-level unit which prints the rows.
on CSV::%done { print self.rows; }
Since rows is a vector of units, you still need to massage its data though ...
Hint 2
You can use a unit &convert attribute on your row type to transform it to its row data.
Solution
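module csv_naive;

public type CSV = unit {
    rows: Row[];
};

type Row = unit {
    data: bytes &until=b"\n" &convert=$$.split(b",");
} &convert=self.data;  # Replace each Row unit by its vector of columns.

# Print all rows once the top-level unit has been fully parsed.
on CSV::%done {
    print self.rows;
}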