Christoph and Nate lift concepts from the raw log-parsing series.

Reflecting on the lessons learned in the log series.
(01:15) Concept 1: We found Clojure to be useful for DevOps.

Everything is a web application these days, but "The only UIs in DevOps are dashboards."
For most of the series, our UI was our connected editor.
We grabbed a chunk of the log file and were fiddling with the data in short order.
We talk about connected editors in our REPL series, starting with Episode 12.
Being able to iteratively work on the log parsing functions in our editor was key to exploring the data in the log files.

(04:04) Concept 2: Taking a lazy approach is essential when working with a large data set.

Lazily going through a sequence is reminiscent of database cursors. You are at some point in a stream of data.
We ran into some initial downsides.
When using with-open, fully lazy processing results in an I/O error, because the file has already been closed by the time the sequence is realized.
Don't be too eager too early either, or the entire data set will reside in memory.
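A minimal sketch of the with-open pitfall (function names here are illustrative, not from the episode):

```clojure
(require '[clojure.java.io :as io])

;; Broken: with-open closes the reader as soon as the body returns,
;; but line-seq is lazy, so most reads happen later and throw
;; java.io.IOException: Stream closed.
(defn log-lines-broken [path]
  (with-open [rdr (io/reader path)]
    (line-seq rdr)))

;; One fix: realize (or fully process) the lines while the file is
;; still open. doall forces the whole seq into memory, trading
;; laziness for safety -- fine for small files, not for huge logs.
(defn log-lines-eager [path]
  (with-open [rdr (io/reader path)]
    (doall (line-seq rdr))))
```

For large files, the better pattern is to do all the filtering and reducing inside the with-open body, so only the final (small) result escapes.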
Two kinds of functions: lazy and eager.

Lazy functions only take from a sequence as they need more values.
Eager functions consume the whole sequence before returning.

Ensure that only the last function in the processing chain is eager.
"It only takes one eager to get everybody unlazy."

(08:38) Concept 3: Clojure helps you make your own lazy sequences using lazy-seq.

Clojure has a deep library of functions for making and processing lazy sequences.
We were able to make our own lazy sequences that could then be used with those functions.
Wrap the body in lazy-seq and return either nil (to indicate the end) or a sequence created by calling cons on a real value and a recursive call to itself.
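That recipe in miniature, using a toy sequence rather than the episode's log reader:

```clojure
;; A hand-rolled lazy sequence of powers of 2, built with lazy-seq.
;; The body returns either nil (when past the limit, ending the
;; sequence) or a cons of a real value onto a recursive call.
(defn powers-of-2
  ([] (powers-of-2 1))
  ([n]
   (lazy-seq
     (when (<= n 1024)
       (cons n (powers-of-2 (* 2 n)))))))

(take 5 (powers-of-2))  ; => (1 2 4 8 16)
```

Because the result is an ordinary lazy seq, all of Clojure's sequence functions (map, filter, take, reduce) work on it unchanged.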

(12:41) Concept 4: We work with information at different levels, and that forms an information hierarchy.

The data goes from bits to characters to lines, and then we get involved.
We move from lines on up to more meaningful entities. Parsed lines are maps that have richer information, and then errors are richer still.
Our parsers take a sequence and emit a new sequence that is at a higher level of information.
We first explored this concept in the Time series.
The transformations from one level to the next are all pure.
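A sketch of one such lifting parser (the line format and keys are invented for illustration):

```clojure
(require '[clojure.string :as str])

;; Lift a raw string one level up to a map with richer information.
;; Assumed format: "2023-04-01 ERROR disk full"
(defn parse-line [line]
  (let [[date level & message] (str/split line #" ")]
    {:line/raw     line
     :line/date    date
     :line/level   (keyword (str/lower-case level))
     :line/message (str/join " " message)}))

;; The parser over the whole sequence: seq of strings in, seq of
;; maps out, one level higher. Pure -- no I/O, no mutation.
(defn parse-lines [lines]
  (map parse-line lines))
```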

(14:53) Concept 5: Sometimes you have to go down before you can go up again another way.

We abstracted a little prematurely, only accepting lines that had all of the data we were looking for (time, log level, etc.).
Exceptions broke that abstraction, so we reworked our "parsed line" map to make the missing keys optional.

(15:54) Concept 6: Maps are flexible bags of dimensions. They are a set of attributes rather than a series of rigid slots that must be filled.

Functions only need to look at the parts of the map that they need.
Every time we amplify the data, we add a new set of dimensions.
Thanks to namespacing, all of these dimensions coexist peacefully.
Multiple levels of dimensions give you more to filter/map/reduce on.
Just because you distill doesn't mean you want to lose the essence.
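A sketch of amplifying a map with new namespaced dimensions (keys and values invented for illustration):

```clojure
;; The original level of information.
(def parsed-line
  {:line/raw    "14:02:11 ERROR timeout"
   :line/number 42})

;; Amplifying merges in a new set of dimensions. The namespaces keep
;; :log/level and :time/hour from colliding with the :line/* keys,
;; so every level's attributes coexist peacefully in one map.
(def amplified
  (merge parsed-line
         {:log/level :error
          :time/hour 14}))

;; Functions look only at the dimensions they need, ignoring the rest.
(defn error? [entry]
  (= :error (:log/level entry)))
```

Note that merging adds dimensions without discarding any: the distilled map still carries the raw line.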

(21:09) Concept 7: Operating within a level of information is a different concern than lifting up to a higher level of information.

Within a level, functions aid in filtering and aggregating.
Between levels, functions recognize patterns and groupings to produce higher levels of information.
Make the purpose of the function clear in how you name it.
Separate functions that "lift" the data from functions that operate at the same level of information.
When exploring data, you don't know where it will lead, so start by moving the data up a level in small steps.
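One way to make that separation visible in the names (these functions and keys are illustrative, not the episode's):

```clojure
;; Same-level operation: filters within the "entry" level,
;; returning entries.
(defn errors-only [entries]
  (filter #(= :error (:log/level %)) entries))

;; Lifting operation: groups entries up to a higher "summary" level.
;; The arrow name signals that the level of information changes.
(defn entries->hourly-counts [entries]
  (->> entries
       (group-by :time/hour)
       (map (fn [[hour es]]
              {:hour/hour  hour
               :hour/count (count es)}))))
```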

Related episodes:

012: Embrace the REPL
015: Finding the Time
028: Fail Donut
029: Problem Unknown: Log Lines
030: Lazy Does It
031: Eager Abstraction
032: Call Me Lazy
033: Cake or Ice Cream? Yes!
034: Break the Mold

Clojure in this episode:

lazy-seq, cons
with-open