10 Pivoting
The majority of data found in the wild will not be tidy, and therefore not work with the above operations. The goal of pivoting is to help make your data tidy.
Every column is a variable.
Every row is an observation.
Every cell is a single value.
So, for example, if you were to have a column corresponding to a value and not a variable (such as a site), or a column corresponding to a "type" of observation, these operations would help.
syntax
(pivot-longer df slice-spec #:names-to names-to #:values-to values-to)
df : data-frame?
names-to : string?
values-to : string?
slice-spec is an expression in the slice sub-language. See Slicing for more information on this language. Any not column returned by evaluating slice-spec will be brought along so that former observations line up, but its data will not be modified.
All the names of every column selected by slice-spec is brought into a new column with name names-to, and all the values are brought into a new column with name values-to.
This function does not work with grouped data frames, as it has potential to destroy some internal invariants.
> (define wide-df (row-df [day hour a b c] 1 10 97 84 55 2 11 78 47 54))
> (~> wide-df (pivot-longer ["a" "b" "c"] #:names-to "site" #:values-to "catch") show)
data-frame: 6 rows x 4 columns
ββββββ¬ββββββ¬βββββ¬ββββ
βhourβcatchβsiteβdayβ
ββββββΌββββββΌβββββΌββββ€
β10 β97 βa β1 β
ββββββΌββββββΌβββββΌββββ€
β11 β78 βa β2 β
ββββββΌββββββΌβββββΌββββ€
β10 β84 βb β1 β
ββββββΌββββββΌβββββΌββββ€
β11 β47 βb β2 β
ββββββΌββββββΌβββββΌββββ€
β10 β55 βc β1 β
ββββββΌββββββΌβββββΌββββ€
β11 β54 βc β2 β
ββββββ΄ββββββ΄βββββ΄ββββ
procedure
(pivot-wider df #:names-from names-from #:values-from values-from) β data-frame? df : data-frame? names-from : string? values-from : string?
names-from is the column to create new columns from, and values-from is the column to get the corresponding data from.
If a value is not found in the long format data-frame, it will be replaced with "NA" (#f).
This function does not work with grouped data frames, as it has potential to destroy some internal invariants.
> (define long-df1 (row-df [day grp val] 1 "A" 10 1 "B" 20 2 "B" 30))
> (~> long-df1 (pivot-wider #:names-from "grp" #:values-from "val") show)
data-frame: 2 rows x 3 columns
ββββ¬βββ¬ββββ
βB βA βdayβ
ββββΌβββΌββββ€
β20β10β1 β
ββββΌβββΌββββ€
β30β#fβ2 β
ββββ΄βββ΄ββββ
> (define long-df2 (row-df [day hour grp val] 1 10 "a" 83 1 10 "b" 78 1 11 "a" 80 1 11 "b" 105 2 10 "a" 95 2 10 "b" 77 2 11 "a" 96 2 11 "b" 99))
> (~> long-df2 (pivot-wider #:names-from "grp" #:values-from "val") show)
data-frame: 4 rows x 4 columns
βββββ¬βββ¬βββββ¬ββββ
βb βa βhourβdayβ
βββββΌβββΌβββββΌββββ€
β78 β83β10 β1 β
βββββΌβββΌβββββΌββββ€
β77 β95β10 β2 β
βββββΌβββΌβββββΌββββ€
β105β80β11 β1 β
βββββΌβββΌβββββΌββββ€
β99 β96β11 β2 β
βββββ΄βββ΄βββββ΄ββββ