Exercises: A naive CSV parser

Assuming the following simplified CSV format:

  • rows are separated by newlines b"\n"
  • individual columns are separated by b","
  • there are no separators anywhere else (e.g., no , in quoted column values)

A sample input would be

I,a,ABC
J,b,DEF
K,c,GHI

Note

When copying the data above into a file, make sure the file ends in a single newline. If you use the copy-to-clipboard button (upper right of the snippet), the data should be copied correctly.

For testing you can use the -f flag to spicy-dump or spicy-driver to read input from a file instead of stdin, e.g.,

spicy-driver csv_naive.spicy -f input.csv

  1. Write a parser which extracts the bytes on each row into a vector.

    Hint 1

    Your top-level parser should contain a vector of rows of unspecified length.

    Hint 2

    Define a new parser for a row that parses bytes up to the next newline and consumes that newline.

    Solution
    module csv_naive;
    
    public type CSV = unit {
        rows: Row[];
    };
    
    type Row = unit {
        # &until consumes the newline but does not include it in data.
        data: bytes &until=b"\n";
    };
    
  2. Extend your parser so it also extracts individual columns (as bytes) from each row.

    Hint

    The &convert attribute allows changing the value and/or type of a field after it has been extracted. This allows you to split the row data into columns.

    Is there a built-in function which splits your row data at a separator (consuming the separator)? Functions on bytes are documented in the Spicy reference for the bytes type. You can access the currently extracted data via $$.
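
    As a small illustration of a field-level &convert on a different task (so it does not give away the solution), the following sketch parses a line of ASCII digits and converts it to an integer. The module name convert_demo and the unit name Number are hypothetical, chosen only for this example.

    ```spicy
    module convert_demo;

    public type Number = unit {
        # Read ASCII digits up to (and consuming) the newline, then convert
        # the extracted bytes ($$) into an unsigned integer.
        value: bytes &until=b"\n" &convert=$$.to_uint();

        on %done { print self.value; }
    };
    ```

    After parsing, value holds an unsigned integer instead of bytes. The same pattern applies to any other method on bytes that returns a new value.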

    Solution
    module csv_naive;
    
    public type CSV = unit {
        rows: Row[];
    };
    
    type Row = unit {
        # Split the row at each separator; cols becomes a vector of bytes.
        cols: bytes &until=b"\n" &convert=$$.split(b",");
    };
    
  3. Without changing the actual parsing, can you change your grammar so the following output is produced? This can be done without explicit loops.

    $ spicy-driver csv_naive.spicy -f input.csv
    [[b"I", b"a", b"ABC"], [b"J", b"b", b"DEF"], [b"K", b"c", b"GHI"]]
    
    Hint 1

    You could add a unit hook for your top-level unit which prints the rows.

    on CSV::%done {
        print self.rows;
    }
    

    Since rows is a vector of units, you still need to transform its data to get the output above.

    Hint 2

    You can use a unit &convert attribute on your row type to transform it to its row data.
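
    To show what a unit-level &convert does without giving away the solution, here is a minimal sketch on an unrelated input format. The names unit_convert_demo, Pair, and Pairs are hypothetical. Each Pair parses two ASCII digits, and the &convert on the unit replaces the parsed unit with an integer wherever Pair is used as a field type.

    ```spicy
    module unit_convert_demo;

    # Parses two ASCII digits. Because of the unit-level &convert, a field of
    # type Pair holds the converted integer rather than the unit itself.
    type Pair = unit {
        digits: bytes &size=2;
    } &convert=self.digits.to_uint();

    public type Pairs = unit {
        values: Pair[];

        on %done { print self.values; }
    };
    ```

    For an input such as 123456 without a trailing newline, values is then a vector of the integers 12, 34, and 56 rather than a vector of Pair units.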

    Solution
    module csv_naive;
    
    public type CSV = unit {
        rows: Row[];
    };
    
    type Row = unit {
        data: bytes &until=b"\n" &convert=$$.split(b",");
    } &convert=self.data;
    
    on CSV::%done {
        print self.rows;
    }