12 Unnesting
Sometimes, data thatβs read into a data-frame will have deeply nested structure (for example, when reading JSON directly into a data-frame). These operations convert these variables into tidy data, as described in Pivoting; with enough unnesting, these operations can turn this data into a tidy data-frame (when all hashes, vectors, lists, etc are removed).
These operations only work on "list-like" (the sequence returns one value per iteration) or "dictionary-like" (two values per iteration) sequences.
> (define deep-df (column-df [character #("Toothless" "Dory" "Holly")] [metadata (vector (hash 'species "dragon" 'color "black" 'films (vector "How to Train Your Dragon" "How to Train Your Dragon 2" "How to Train Your Dragon: The Hidden World")) (hash 'species "blue tang" 'color "blue" 'films (vector "Finding Nemo" "Finding Dory")) (hash 'species "glaceon" 'color "also blue" 'films #f))]))
procedure
(unnest-wider df column-name [ #:index-prefix index-prefix #:remove? remove?]) β data-frame? df : data-frame? column-name : string? index-prefix : string? = "idx-" remove? : boolean? = #t
If column-nameβs data is "list-like" (returns one element per iteration), the column names will be index-prefix concatenated with its index plus one. Otherwise, the column names will be the "keys" (first element) of the sequence.
If the keys of the given sequence are not strings, they will be converted to strings with ~a.
If remove? is true, column-name will be removed from the result data-frame.
> (define expanded-metadata (~> deep-df (unnest-wider "metadata") introspect))
data-frame: 3 rows x 4 columns
βββββββββββ¬ββββββββββ¬ββββββββββ¬ββββββββββ
βcharacterβcolor βfilms βspecies β
βββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βToothlessβblack β#<vector>βdragon β
βββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βDory βblue β#<vector>βblue tangβ
βββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βHolly βalso blueβ#f βglaceon β
βββββββββββ΄ββββββββββ΄ββββββββββ΄ββββββββββ
> (~> expanded-metadata (unnest-wider "films") (slice ["character" "idx-1" "idx-2" "idx-3"]) show)
data-frame: 3 rows x 4 columns
βββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
βcharacterβidx-3 βidx-1 βidx-2 β
βββββββββββΌβββββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββΌβββββββββββββββββββββββββββ€
βToothlessβHow to Train Your Dragon: The Hidden WorldβHow to Train Your DragonβHow to Train Your Dragon 2β
βββββββββββΌβββββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββΌβββββββββββββββββββββββββββ€
βDory β#f βFinding Nemo βFinding Dory β
βββββββββββΌβββββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββΌβββββββββββββββββββββββββββ€
βHolly β#f β#f β#f β
βββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββ΄βββββββββββββββββββββββββββ
procedure
(unnest-longer df column-name [ #:keys-to keys-to #:values-to values-to #:remove? remove?]) β data-frame? df : data-frame? column-name : string? keys-to : (or/c string? #f) = #f values-to : (or/c string? #f) = #f remove? : boolean? = #t
If values-to is unspecified or #f, it defaults to column-name.
If keys-to is unspecified or #f, it defaults to (string-append column-name "-keys").
If column-nameβs data is "list-like" (returns one element per iteration), the column keys-to is not added, even if it specified.
If remove? is #t, column-name will be removed from the result data-frame, unless column-name is equal to values-to.
> (~> deep-df (unnest-longer "metadata") show)
data-frame: 9 rows x 3 columns
βββββββββββββββ¬ββββββββββ¬ββββββββββ
βmetadata-keysβcharacterβmetadata β
βββββββββββββββΌββββββββββΌββββββββββ€
βcolor βToothlessβblack β
βββββββββββββββΌββββββββββΌββββββββββ€
βspecies βToothlessβdragon β
βββββββββββββββΌββββββββββΌββββββββββ€
βfilms βToothlessβ#<vector>β
βββββββββββββββΌββββββββββΌββββββββββ€
βcolor βDory βblue β
βββββββββββββββΌββββββββββΌββββββββββ€
βspecies βDory βblue tangβ
βββββββββββββββΌββββββββββΌββββββββββ€
βfilms βDory β#<vector>β
βββββββββββββββ΄ββββββββββ΄ββββββββββ
3 rows, 0 cols elided
(use (show df everything #:n-rows 'all) for full frame)
> (~> expanded-metadata (unnest-longer "films") show)
data-frame: 6 rows x 4 columns
ββββββββββββββββββββββββββββββββββββββββββββ¬ββββββββββ¬ββββββββββ¬ββββββββββ
βfilms βcolor βspecies βcharacterβ
ββββββββββββββββββββββββββββββββββββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βHow to Train Your Dragon βblack βdragon βToothlessβ
ββββββββββββββββββββββββββββββββββββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βHow to Train Your Dragon 2 βblack βdragon βToothlessβ
ββββββββββββββββββββββββββββββββββββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βHow to Train Your Dragon: The Hidden Worldβblack βdragon βToothlessβ
ββββββββββββββββββββββββββββββββββββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βFinding Nemo βblue βblue tangβDory β
ββββββββββββββββββββββββββββββββββββββββββββΌββββββββββΌββββββββββΌββββββββββ€
βFinding Dory βblue βblue tangβDory β
ββββββββββββββββββββββββββββββββββββββββββββΌββββββββββΌββββββββββΌββββββββββ€
β#f βalso blueβglaceon βHolly β
ββββββββββββββββββββββββββββββββββββββββββββ΄ββββββββββ΄ββββββββββ΄ββββββββββ