── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.1.0
✔ tidyr 1.3.0 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
データ加工・操作用関数
データの結合 (mutating join)
x
, y
, by
は関数の引数です。by
で指定したキー(変数名)が一致するように行を合わせることができる。
full_join(x, y, by)
:全ての x
と y
行と列を結合する。
inner_join(x, y, by)
:x
と y
で共通する行と列を結合する。
left_join(x, y, by)
:左側()第 1 引数のtibble に y
の変数を追加する。
right_join(x, y, by)
:右側()第 2 引数のtibble に x
の変数を追加する。
データの結合 (join)
bind_cols()
:渡したtibbleを横に結合する(行数が異なったらエラーが発生する)。
bind_rows()
:渡した tibble を立てに結合する(一致する変数名を合わせてくれます)。
mutating join のつかいかた
X = tibble (x = c ("A" , "B" , "C" , "G" ), y = c (NA , rnorm (3 , mean = 5 )))
Y = tibble (x = c ("A" , "C" , "D" , "E" ), z = c (rpois (3 , lambda = 5 ), NA ))
# A tibble: 4 × 2
x y
<chr> <dbl>
1 A NA
2 B 4.28
3 C 3.72
4 G 3.37
# A tibble: 4 × 2
x z
<chr> <int>
1 A 4
2 C 2
3 D 5
4 E NA
# A tibble: 6 × 3
x y z
<chr> <dbl> <int>
1 A NA 4
2 B 4.28 NA
3 C 3.72 2
4 G 3.37 NA
5 D NA 5
6 E NA NA
inner_join (X, Y, by = "x" )
# A tibble: 2 × 3
x y z
<chr> <dbl> <int>
1 A NA 4
2 C 3.72 2
left_join (X, Y, by = "x" )
# A tibble: 4 × 3
x y z
<chr> <dbl> <int>
1 A NA 4
2 B 4.28 NA
3 C 3.72 2
4 G 3.37 NA
right_join (X, Y, by = "x" )
# A tibble: 4 × 3
x y z
<chr> <dbl> <int>
1 A NA 4
2 C 3.72 2
3 D NA 5
4 E NA NA
# A tibble: 8 × 3
x y z
<chr> <dbl> <int>
1 A NA NA
2 B 4.28 NA
3 C 3.72 NA
4 G 3.37 NA
5 A NA 4
6 C NA 2
7 D NA 5
8 E NA NA
bind_rows ("X" = X, "Y" = Y, .id = "origin" )
# A tibble: 8 × 4
origin x y z
<chr> <chr> <dbl> <int>
1 X A NA NA
2 X B 4.28 NA
3 X C 3.72 NA
4 X G 3.37 NA
5 Y A NA 4
6 Y C NA 2
7 Y D NA 5
8 Y E NA NA
New names:
• `x` -> `x...1`
• `x` -> `x...3`
# A tibble: 4 × 4
x...1 y x...3 z
<chr> <dbl> <chr> <int>
1 A NA A 4
2 B 4.28 C 2
3 C 3.72 D 5
4 G 3.37 E NA
行と列の加工・操作用関数
列における操作
mutate()
:既存の変数の書き換えや変数の追加する
select()
:既存の変数を選らぶ
rename()
:既存の変数の名前を変える
pull()
:既存の変数をリストとして抽出する
`relocate()``:指定した列の位置を変える
行における操作
filter()
:条件を満たした行を返す
distinct()
:指定した変数から重複している行を外す
slice()
:指定した行インデックスを返す
arrange()
:指定した列の昇順で行を並べ替える
列の加工
iris |> as_tibble () |> mutate (P2 = Petal.Length^ 2 )
# A tibble: 150 × 6
Sepal.Length Sepal.Width Petal.Length Petal.Width Species P2
<dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 5.1 3.5 1.4 0.2 setosa 1.96
2 4.9 3 1.4 0.2 setosa 1.96
3 4.7 3.2 1.3 0.2 setosa 1.69
4 4.6 3.1 1.5 0.2 setosa 2.25
5 5 3.6 1.4 0.2 setosa 1.96
6 5.4 3.9 1.7 0.4 setosa 2.89
7 4.6 3.4 1.4 0.3 setosa 1.96
8 5 3.4 1.5 0.2 setosa 2.25
9 4.4 2.9 1.4 0.2 setosa 1.96
10 4.9 3.1 1.5 0.1 setosa 2.25
# … with 140 more rows
iris |> as_tibble () |> select (Species, Petal.Length)
# A tibble: 150 × 2
Species Petal.Length
<fct> <dbl>
1 setosa 1.4
2 setosa 1.4
3 setosa 1.3
4 setosa 1.5
5 setosa 1.4
6 setosa 1.7
7 setosa 1.4
8 setosa 1.5
9 setosa 1.4
10 setosa 1.5
# … with 140 more rows
iris |> as_tibble () |> select (matches ("Length" ))
# A tibble: 150 × 2
Sepal.Length Petal.Length
<dbl> <dbl>
1 5.1 1.4
2 4.9 1.4
3 4.7 1.3
4 4.6 1.5
5 5 1.4
6 5.4 1.7
7 4.6 1.4
8 5 1.5
9 4.4 1.4
10 4.9 1.5
# … with 140 more rows
iris |> as_tibble () |> rename (PL = Petal.Length)
# A tibble: 150 × 5
Sepal.Length Sepal.Width PL Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# … with 140 more rows
iris |> as_tibble () |>
rename_with (~ str_replace_all (.x, "[(a-z.)]" , "" ), .cols = matches ("(Pet)|(Sep)" ))
# A tibble: 150 × 5
SL SW PL PW Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# … with 140 more rows
iris |> as_tibble () |> pull (Species)
[1] setosa setosa setosa setosa setosa setosa
[7] setosa setosa setosa setosa setosa setosa
[13] setosa setosa setosa setosa setosa setosa
[19] setosa setosa setosa setosa setosa setosa
[25] setosa setosa setosa setosa setosa setosa
[31] setosa setosa setosa setosa setosa setosa
[37] setosa setosa setosa setosa setosa setosa
[43] setosa setosa setosa setosa setosa setosa
[49] setosa setosa versicolor versicolor versicolor versicolor
[55] versicolor versicolor versicolor versicolor versicolor versicolor
[61] versicolor versicolor versicolor versicolor versicolor versicolor
[67] versicolor versicolor versicolor versicolor versicolor versicolor
[73] versicolor versicolor versicolor versicolor versicolor versicolor
[79] versicolor versicolor versicolor versicolor versicolor versicolor
[85] versicolor versicolor versicolor versicolor versicolor versicolor
[91] versicolor versicolor versicolor versicolor versicolor versicolor
[97] versicolor versicolor versicolor versicolor virginica virginica
[103] virginica virginica virginica virginica virginica virginica
[109] virginica virginica virginica virginica virginica virginica
[115] virginica virginica virginica virginica virginica virginica
[121] virginica virginica virginica virginica virginica virginica
[127] virginica virginica virginica virginica virginica virginica
[133] virginica virginica virginica virginica virginica virginica
[139] virginica virginica virginica virginica virginica virginica
[145] virginica virginica virginica virginica virginica virginica
Levels: setosa versicolor virginica
iris |> as_tibble () |> relocate (Species, .before = "Sepal.Length" )
# A tibble: 150 × 5
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3 1.4 0.2
3 setosa 4.7 3.2 1.3 0.2
4 setosa 4.6 3.1 1.5 0.2
5 setosa 5 3.6 1.4 0.2
6 setosa 5.4 3.9 1.7 0.4
7 setosa 4.6 3.4 1.4 0.3
8 setosa 5 3.4 1.5 0.2
9 setosa 4.4 2.9 1.4 0.2
10 setosa 4.9 3.1 1.5 0.1
# … with 140 more rows
iris |> as_tibble () |> relocate (Species, matches ("Length" ), .before = "Sepal.Length" )
# A tibble: 150 × 5
Species Sepal.Length Petal.Length Sepal.Width Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 1.4 3.5 0.2
2 setosa 4.9 1.4 3 0.2
3 setosa 4.7 1.3 3.2 0.2
4 setosa 4.6 1.5 3.1 0.2
5 setosa 5 1.4 3.6 0.2
6 setosa 5.4 1.7 3.9 0.4
7 setosa 4.6 1.4 3.4 0.3
8 setosa 5 1.5 3.4 0.2
9 setosa 4.4 1.4 2.9 0.2
10 setosa 4.9 1.5 3.1 0.1
# … with 140 more rows
行の加工
iris |> as_tibble () |> filter (str_detect (Species, "versicolor" ))
# A tibble: 50 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 7 3.2 4.7 1.4 versicolor
2 6.4 3.2 4.5 1.5 versicolor
3 6.9 3.1 4.9 1.5 versicolor
4 5.5 2.3 4 1.3 versicolor
5 6.5 2.8 4.6 1.5 versicolor
6 5.7 2.8 4.5 1.3 versicolor
7 6.3 3.3 4.7 1.6 versicolor
8 4.9 2.4 3.3 1 versicolor
9 6.6 2.9 4.6 1.3 versicolor
10 5.2 2.7 3.9 1.4 versicolor
# … with 40 more rows
iris |> as_tibble () |> filter (Petal.Length > 6 & Sepal.Length > 7.5 )
# A tibble: 6 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 7.6 3 6.6 2.1 virginica
2 7.7 3.8 6.7 2.2 virginica
3 7.7 2.6 6.9 2.3 virginica
4 7.7 2.8 6.7 2 virginica
5 7.9 3.8 6.4 2 virginica
6 7.7 3 6.1 2.3 virginica
iris |> as_tibble () |> distinct (Species)
# A tibble: 3 × 1
Species
<fct>
1 setosa
2 versicolor
3 virginica
iris |> as_tibble () |> distinct (Petal.Length, .keep_all = T)
# A tibble: 43 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.4 3.9 1.7 0.4 setosa
5 4.8 3.4 1.6 0.2 setosa
6 4.3 3 1.1 0.1 setosa
7 5.8 4 1.2 0.2 setosa
8 4.6 3.6 1 0.2 setosa
9 4.8 3.4 1.9 0.2 setosa
10 7 3.2 4.7 1.4 versicolor
# … with 33 more rows
iris |> as_tibble () |> slice (1 : 5 )
# A tibble: 5 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
iris |> as_tibble () |> slice_head (n = 2 )
# A tibble: 2 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
iris |> as_tibble () |> slice_tail (n = 2 )
# A tibble: 2 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 6.2 3.4 5.4 2.3 virginica
2 5.9 3 5.1 1.8 virginica
iris |> as_tibble () |> slice_min (Petal.Length)
# A tibble: 1 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 4.6 3.6 1 0.2 setosa
iris |> as_tibble () |> slice_max (Petal.Length)
# A tibble: 1 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 7.7 2.6 6.9 2.3 virginica
iris |> as_tibble () |> slice_sample (n = 3 )
# A tibble: 3 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.7 4.4 1.5 0.4 setosa
2 6.1 2.8 4.7 1.2 versicolor
3 6.3 3.3 4.7 1.6 versicolor
iris |> as_tibble () |> arrange (Sepal.Length)
# A tibble: 150 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 4.3 3 1.1 0.1 setosa
2 4.4 2.9 1.4 0.2 setosa
3 4.4 3 1.3 0.2 setosa
4 4.4 3.2 1.3 0.2 setosa
5 4.5 2.3 1.3 0.3 setosa
6 4.6 3.1 1.5 0.2 setosa
7 4.6 3.4 1.4 0.3 setosa
8 4.6 3.6 1 0.2 setosa
9 4.6 3.2 1.4 0.2 setosa
10 4.7 3.2 1.3 0.2 setosa
# … with 140 more rows
iris |> as_tibble () |>
arrange (desc (Sepal.Length), desc (Sepal.Width))
# A tibble: 150 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 7.9 3.8 6.4 2 virginica
2 7.7 3.8 6.7 2.2 virginica
3 7.7 3 6.1 2.3 virginica
4 7.7 2.8 6.7 2 virginica
5 7.7 2.6 6.9 2.3 virginica
6 7.6 3 6.6 2.1 virginica
7 7.4 2.8 6.1 1.9 virginica
8 7.3 2.9 6.3 1.8 virginica
9 7.2 3.6 6.1 2.5 virginica
10 7.2 3.2 6 1.8 virginica
# … with 140 more rows
グループ化・ネストに関する関数
group_by()
:tibble
をグループ化する
group_nest()
:グループ化した tibble
をネスト(入れ子)する
nest()
:渡した列をネストする
unnest()
:ネストされている列を展開(アンネスト)する
group_map()
:グループ化した tibble
に関数を適応して、リストを返す
group_modify()
:グループ化した tibble
に関数を適応して、tibble を返す
tibble
のグループ化
iris |> as_tibble () |> select (1 : 3 )
# A tibble: 150 × 3
Sepal.Length Sepal.Width Petal.Length
<dbl> <dbl> <dbl>
1 5.1 3.5 1.4
2 4.9 3 1.4
3 4.7 3.2 1.3
4 4.6 3.1 1.5
5 5 3.6 1.4
6 5.4 3.9 1.7
7 4.6 3.4 1.4
8 5 3.4 1.5
9 4.4 2.9 1.4
10 4.9 3.1 1.5
# … with 140 more rows
iris |> as_tibble () |> group_by (Species) |> select (1 : 3 )
Adding missing grouping variables: `Species`
# A tibble: 150 × 4
# Groups: Species [3]
Species Sepal.Length Sepal.Width Petal.Length
<fct> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4
2 setosa 4.9 3 1.4
3 setosa 4.7 3.2 1.3
4 setosa 4.6 3.1 1.5
5 setosa 5 3.6 1.4
6 setosa 5.4 3.9 1.7
7 setosa 4.6 3.4 1.4
8 setosa 5 3.4 1.5
9 setosa 4.4 2.9 1.4
10 setosa 4.9 3.1 1.5
# … with 140 more rows
iris |> as_tibble () |> group_nest (Species)
# A tibble: 3 × 2
Species data
<fct> <list<tibble[,4]>>
1 setosa [50 × 4]
2 versicolor [50 × 4]
3 virginica [50 × 4]
iris |> as_tibble () |> nest (data = matches ("Length|Width" ))
# A tibble: 3 × 2
Species data
<fct> <list>
1 setosa <tibble [50 × 4]>
2 versicolor <tibble [50 × 4]>
3 virginica <tibble [50 × 4]>
iris |> as_tibble () |> group_nest (Species) |> unnest (data)
# A tibble: 150 × 5
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3 1.4 0.2
3 setosa 4.7 3.2 1.3 0.2
4 setosa 4.6 3.1 1.5 0.2
5 setosa 5 3.6 1.4 0.2
6 setosa 5.4 3.9 1.7 0.4
7 setosa 4.6 3.4 1.4 0.3
8 setosa 5 3.4 1.5 0.2
9 setosa 4.4 2.9 1.4 0.2
10 setosa 4.9 3.1 1.5 0.1
# … with 140 more rows
iris |> as_tibble () |> group_by (Species) |> group_map (~ head (.x, n = 2 ))
[[1]]
# A tibble: 2 × 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2
2 4.9 3 1.4 0.2
[[2]]
# A tibble: 2 × 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 7 3.2 4.7 1.4
2 6.4 3.2 4.5 1.5
[[3]]
# A tibble: 2 × 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 6.3 3.3 6 2.5
2 5.8 2.7 5.1 1.9
iris |> as_tibble () |> group_by (Species) |> group_modify (~ head (.x, n = 2 ))
# A tibble: 6 × 5
# Groups: Species [3]
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3 1.4 0.2
3 versicolor 7 3.2 4.7 1.4
4 versicolor 6.4 3.2 4.5 1.5
5 virginica 6.3 3.3 6 2.5
6 virginica 5.8 2.7 5.1 1.9
その他の関数
drop_na()
:NA
(欠損値)を含む行を削除
replace_na()
:NA
を他の値と書き換える
fill()
:NA
を直前の値で埋める
separate()
:文字列の変数を任意の区切りで複数変数に分裂する
unite()
:複数の変数を任意の区切りで 1 列にまとめる
# A tibble: 4 × 2
x y
<chr> <dbl>
1 A NA
2 B 4.28
3 C 3.72
4 G 3.37
# A tibble: 3 × 2
x y
<chr> <dbl>
1 B 4.28
2 C 3.72
3 G 3.37
X |> replace_na (list (x = "Z" , y = 0 ))
# A tibble: 4 × 2
x y
<chr> <dbl>
1 A 0
2 B 4.28
3 C 3.72
4 G 3.37
X |> mutate (y = replace_na (y, 0 ))
# A tibble: 4 × 2
x y
<chr> <dbl>
1 A 0
2 B 4.28
3 C 3.72
4 G 3.37
# A tibble: 4 × 2
x z
<chr> <int>
1 A 4
2 C 2
3 D 5
4 E 5
tibble (x = c (NA , "Iris.setosa" , "Iris.virginica" , "Iris.versicolor" )) |>
separate (x, into = c ("Genus" , "Species" ))
# A tibble: 4 × 2
Genus Species
<chr> <chr>
1 <NA> <NA>
2 Iris setosa
3 Iris virginica
4 Iris versicolor
tibble (x = rep ("Iris" , 3 ), y = c ("setosa" , "virginica" , "versicolor" )) |>
unite (Species, x, y, sep = "_" )
# A tibble: 3 × 1
Species
<chr>
1 Iris_setosa
2 Iris_virginica
3 Iris_versicolor
ピボット・tibble
を変形する関数
pivot_longer()
:tibble
を wide format (横広)から long format (縦長)に変える
pivot_wider()
:tibble
をlong format から wide format に変える
重要な引数
pivot_longer()
cols
:動かす変数
names_to
:動かした変数の名前の移動先
values_to
:動かした変数の値の移動先
names_transform
:移動先の変数のタイプを変換
pivot_wider()
id_cols
:行(値)を区別するための列名
names_from
:移動先の列名になる変数
values_from
:移動したい値
values_fill
:存在しない要素の埋め込み方法
values_fn
:行の区別ができないときの処理(デフォルトはリスト)
pivot_longer()
の使い方
relig_income |> as_tibble ()
# A tibble: 18 × 11
religion `<$10k` $10-2…¹ $20-3…² $30-4…³ $40-5…⁴ $50-7…⁵ $75-1…⁶ $100-…⁷
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Agnostic 27 34 60 81 76 137 122 109
2 Atheist 12 27 37 52 35 70 73 59
3 Buddhist 27 21 30 34 33 58 62 39
4 Catholic 418 617 732 670 638 1116 949 792
5 Don’t know/r… 15 14 15 11 10 35 21 17
6 Evangelical … 575 869 1064 982 881 1486 949 723
7 Hindu 1 9 7 9 11 34 47 48
8 Historically… 228 244 236 238 197 223 131 81
9 Jehovah's Wi… 20 27 24 24 21 30 15 11
10 Jewish 19 19 25 25 30 95 69 87
11 Mainline Prot 289 495 619 655 651 1107 939 753
12 Mormon 29 40 48 51 56 112 85 49
13 Muslim 6 7 9 10 9 23 16 8
14 Orthodox 13 17 23 32 32 47 38 42
15 Other Christ… 9 7 11 13 13 14 18 14
16 Other Faiths 20 33 40 46 49 63 46 40
17 Other World … 5 2 3 4 2 7 3 4
18 Unaffiliated 217 299 374 365 341 528 407 321
# … with 2 more variables: `>150k` <dbl>, `Don't know/refused` <dbl>, and
# abbreviated variable names ¹`$10-20k`, ²`$20-30k`, ³`$30-40k`, ⁴`$40-50k`,
# ⁵`$50-75k`, ⁶`$75-100k`, ⁷`$100-150k`
relig_income |> as_tibble () |>
pivot_longer (! religion, names_to = "income" , values_to = "count" )
# A tibble: 180 × 3
religion income count
<chr> <chr> <dbl>
1 Agnostic <$10k 27
2 Agnostic $10-20k 34
3 Agnostic $20-30k 60
4 Agnostic $30-40k 81
5 Agnostic $40-50k 76
6 Agnostic $50-75k 137
7 Agnostic $75-100k 122
8 Agnostic $100-150k 109
9 Agnostic >150k 84
10 Agnostic Don't know/refused 96
# … with 170 more rows
# A tibble: 317 × 79
artist track date.ent…¹ wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9
<chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA NA
2 2Ge+h… The … 2000-09-02 91 87 92 NA NA NA NA NA NA
3 3 Doo… Kryp… 2000-04-08 81 70 68 67 66 57 54 53 51
4 3 Doo… Loser 2000-10-21 76 76 72 69 67 65 55 59 62
5 504 B… Wobb… 2000-04-15 57 34 25 17 17 31 36 49 53
6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2 3
7 A*Tee… Danc… 2000-07-08 97 97 96 95 100 NA NA NA NA
8 Aaliy… I Do… 2000-01-29 84 62 51 41 38 35 35 38 38
9 Aaliy… Try … 2000-03-18 59 53 38 28 21 18 16 14 12
10 Adams… Open… 2000-08-26 76 76 74 69 68 67 61 58 57
# … with 307 more rows, 67 more variables: wk10 <dbl>, wk11 <dbl>, wk12 <dbl>,
# wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>, wk18 <dbl>,
# wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,
# wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>,
# wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
# wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>,
# wk43 <dbl>, wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, …
billboard |> as_tibble () |>
pivot_longer (col = starts_with ("wk" ),
names_to = "week" , names_prefix = "wk" ,
values_to = "rank" , values_drop_na = TRUE )
# A tibble: 5,307 × 5
artist track date.entered week rank
<chr> <chr> <date> <chr> <dbl>
1 2 Pac Baby Don't Cry (Keep... 2000-02-26 1 87
2 2 Pac Baby Don't Cry (Keep... 2000-02-26 2 82
3 2 Pac Baby Don't Cry (Keep... 2000-02-26 3 72
4 2 Pac Baby Don't Cry (Keep... 2000-02-26 4 77
5 2 Pac Baby Don't Cry (Keep... 2000-02-26 5 87
6 2 Pac Baby Don't Cry (Keep... 2000-02-26 6 94
7 2 Pac Baby Don't Cry (Keep... 2000-02-26 7 99
8 2Ge+her The Hardest Part Of ... 2000-09-02 1 91
9 2Ge+her The Hardest Part Of ... 2000-09-02 2 87
10 2Ge+her The Hardest Part Of ... 2000-09-02 3 92
# … with 5,297 more rows
# A tibble: 7,240 × 60
country iso2 iso3 year new_s…¹ new_s…² new_s…³ new_s…⁴ new_s…⁵ new_s…⁶
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan AF AFG 1980 NA NA NA NA NA NA
2 Afghanistan AF AFG 1981 NA NA NA NA NA NA
3 Afghanistan AF AFG 1982 NA NA NA NA NA NA
4 Afghanistan AF AFG 1983 NA NA NA NA NA NA
5 Afghanistan AF AFG 1984 NA NA NA NA NA NA
6 Afghanistan AF AFG 1985 NA NA NA NA NA NA
7 Afghanistan AF AFG 1986 NA NA NA NA NA NA
8 Afghanistan AF AFG 1987 NA NA NA NA NA NA
9 Afghanistan AF AFG 1988 NA NA NA NA NA NA
10 Afghanistan AF AFG 1989 NA NA NA NA NA NA
# … with 7,230 more rows, 50 more variables: new_sp_m65 <dbl>,
# new_sp_f014 <dbl>, new_sp_f1524 <dbl>, new_sp_f2534 <dbl>,
# new_sp_f3544 <dbl>, new_sp_f4554 <dbl>, new_sp_f5564 <dbl>,
# new_sp_f65 <dbl>, new_sn_m014 <dbl>, new_sn_m1524 <dbl>,
# new_sn_m2534 <dbl>, new_sn_m3544 <dbl>, new_sn_m4554 <dbl>,
# new_sn_m5564 <dbl>, new_sn_m65 <dbl>, new_sn_f014 <dbl>,
# new_sn_f1524 <dbl>, new_sn_f2534 <dbl>, new_sn_f3544 <dbl>, …
who %>% as_tibble () |>
pivot_longer (cols = new_sp_m014: newrel_f65,
names_to = c ("diagnosis" , "gender" , "age" ),
names_pattern = "new_?(.*)_(.)(.*)" ,
values_to = "count" , values_drop_na = TRUE )
# A tibble: 76,046 × 8
country iso2 iso3 year diagnosis gender age count
<chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 Afghanistan AF AFG 1997 sp m 014 0
2 Afghanistan AF AFG 1997 sp m 1524 10
3 Afghanistan AF AFG 1997 sp m 2534 6
4 Afghanistan AF AFG 1997 sp m 3544 3
5 Afghanistan AF AFG 1997 sp m 4554 5
6 Afghanistan AF AFG 1997 sp m 5564 2
7 Afghanistan AF AFG 1997 sp m 65 0
8 Afghanistan AF AFG 1997 sp f 014 5
9 Afghanistan AF AFG 1997 sp f 1524 38
10 Afghanistan AF AFG 1997 sp f 2534 36
# … with 76,036 more rows
# A tibble: 11 × 8
x1 x2 x3 x4 y1 y2 y3 y4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.7 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.1 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.1 5.39 12.5
9 12 12 12 8 10.8 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89
anscombe %>% as_tibble () |>
pivot_longer (everything (), names_to = c (".value" , "set" ), names_pattern = "(.)(.)"
)
# A tibble: 44 × 3
set x y
<chr> <dbl> <dbl>
1 1 10 8.04
2 2 10 9.14
3 3 10 7.46
4 4 8 6.58
5 1 8 6.95
6 2 8 8.14
7 3 8 6.77
8 4 8 5.76
9 1 13 7.58
10 2 13 8.74
# … with 34 more rows
pivot_wider()
の使い方
# A tibble: 114 × 3
fish station seen
<fct> <fct> <int>
1 4842 Release 1
2 4842 I80_1 1
3 4842 Lisbon 1
4 4842 Rstr 1
5 4842 Base_TD 1
6 4842 BCE 1
7 4842 BCW 1
8 4842 BCE2 1
9 4842 BCW2 1
10 4842 MAE 1
# … with 104 more rows
fish_encounters |> as_tibble () |>
pivot_wider (names_from = station, values_from = seen)
# A tibble: 19 × 12
fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
<fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 4842 1 1 1 1 1 1 1 1 1 1 1
2 4843 1 1 1 1 1 1 1 1 1 1 1
3 4844 1 1 1 1 1 1 1 1 1 1 1
4 4845 1 1 1 1 1 NA NA NA NA NA NA
5 4847 1 1 1 NA NA NA NA NA NA NA NA
6 4848 1 1 1 1 NA NA NA NA NA NA NA
7 4849 1 1 NA NA NA NA NA NA NA NA NA
8 4850 1 1 NA 1 1 1 1 NA NA NA NA
9 4851 1 1 NA NA NA NA NA NA NA NA NA
10 4854 1 1 NA NA NA NA NA NA NA NA NA
11 4855 1 1 1 1 1 NA NA NA NA NA NA
12 4857 1 1 1 1 1 1 1 1 1 NA NA
13 4858 1 1 1 1 1 1 1 1 1 1 1
14 4859 1 1 1 1 1 NA NA NA NA NA NA
15 4861 1 1 1 1 1 1 1 1 1 1 1
16 4862 1 1 1 1 1 1 1 1 1 NA NA
17 4863 1 1 NA NA NA NA NA NA NA NA NA
18 4864 1 1 NA NA NA NA NA NA NA NA NA
19 4865 1 1 1 NA NA NA NA NA NA NA NA
# A tibble: 114 × 3
fish station seen
<fct> <fct> <int>
1 4842 Release 1
2 4842 I80_1 1
3 4842 Lisbon 1
4 4842 Rstr 1
5 4842 Base_TD 1
6 4842 BCE 1
7 4842 BCW 1
8 4842 BCE2 1
9 4842 BCW2 1
10 4842 MAE 1
# … with 104 more rows
# 存在しない組み合わせの要素を埋める
fish_encounters |> as_tibble () |>
pivot_wider (names_from = station, values_from = seen, values_fill = 0 )
# A tibble: 19 × 12
fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
<fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 4842 1 1 1 1 1 1 1 1 1 1 1
2 4843 1 1 1 1 1 1 1 1 1 1 1
3 4844 1 1 1 1 1 1 1 1 1 1 1
4 4845 1 1 1 1 1 0 0 0 0 0 0
5 4847 1 1 1 0 0 0 0 0 0 0 0
6 4848 1 1 1 1 0 0 0 0 0 0 0
7 4849 1 1 0 0 0 0 0 0 0 0 0
8 4850 1 1 0 1 1 1 1 0 0 0 0
9 4851 1 1 0 0 0 0 0 0 0 0 0
10 4854 1 1 0 0 0 0 0 0 0 0 0
11 4855 1 1 1 1 1 0 0 0 0 0 0
12 4857 1 1 1 1 1 1 1 1 1 0 0
13 4858 1 1 1 1 1 1 1 1 1 1 1
14 4859 1 1 1 1 1 0 0 0 0 0 0
15 4861 1 1 1 1 1 1 1 1 1 1 1
16 4862 1 1 1 1 1 1 1 1 1 0 0
17 4863 1 1 0 0 0 0 0 0 0 0 0
18 4864 1 1 0 0 0 0 0 0 0 0 0
19 4865 1 1 1 0 0 0 0 0 0 0 0
us_rent_income |> as_tibble ()
# A tibble: 104 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 01 Alabama income 24476 136
2 01 Alabama rent 747 3
3 02 Alaska income 32940 508
4 02 Alaska rent 1200 13
5 04 Arizona income 27517 148
6 04 Arizona rent 972 4
7 05 Arkansas income 23789 165
8 05 Arkansas rent 709 5
9 06 California income 29454 109
10 06 California rent 1358 3
# … with 94 more rows
us_rent_income |> as_tibble () |>
pivot_wider (names_from = variable, values_from = c (estimate, moe))
# A tibble: 52 × 6
GEOID NAME estimate_income estimate_rent moe_income moe_rent
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 01 Alabama 24476 747 136 3
2 02 Alaska 32940 1200 508 13
3 04 Arizona 27517 972 148 4
4 05 Arkansas 23789 709 165 5
5 06 California 29454 1358 109 3
6 08 Colorado 32401 1125 109 5
7 09 Connecticut 35326 1123 195 5
8 10 Delaware 31560 1076 247 10
9 11 District of Columbia 43198 1424 681 17
10 12 Florida 25952 1077 70 3
# … with 42 more rows
# us_rent_income |> as_tibble() |>
# pivot_wider(names_from = variable,
# names_sep = ".",
# values_from = c(estimate, moe))
# us_rent_income |> as_tibble() |>
# pivot_wider(names_from = variable,
# names_glue = "{variable}_{.value}",
# values_from = c(estimate, moe))
warpbreaks |> as_tibble ()
# A tibble: 54 × 3
breaks wool tension
<dbl> <fct> <fct>
1 26 A L
2 30 A L
3 54 A L
4 25 A L
5 70 A L
6 52 A L
7 51 A L
8 26 A L
9 67 A L
10 18 A M
# … with 44 more rows
warpbreaks |> as_tibble () |>
pivot_wider (names_from = wool,
values_from = breaks)
Warning: Values from `breaks` are not uniquely identified; output will contain
list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
{data} %>%
dplyr::group_by(tension, wool) %>%
dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
dplyr::filter(n > 1L)
# A tibble: 3 × 3
tension A B
<fct> <list> <list>
1 L <dbl [9]> <dbl [9]>
2 M <dbl [9]> <dbl [9]>
3 H <dbl [9]> <dbl [9]>
warpbreaks |> as_tibble ()
# A tibble: 54 × 3
breaks wool tension
<dbl> <fct> <fct>
1 26 A L
2 30 A L
3 54 A L
4 25 A L
5 70 A L
6 52 A L
7 51 A L
8 26 A L
9 67 A L
10 18 A M
# … with 44 more rows
warpbreaks |> as_tibble () |>
pivot_wider (names_from = wool,
values_from = breaks,
values_fn = mean)
# A tibble: 3 × 3
tension A B
<fct> <dbl> <dbl>
1 L 44.6 28.2
2 M 24 28.8
3 H 24.6 18.8
不都合なデータ構造
fname = "photosynthesis1_low.csv"
dset1_low = read_csv (fname)
# A tibble: 35 × 12
sample min `0` `5` `10` `15` `20` `25` `30` `35` `40` `45`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 9.99 10.1 10.0 10.0 10.1 9.81 10.0 10.1 10.0 10.2
2 2 0 9.97 9.99 10.1 10.0 9.99 9.94 9.91 9.95 9.79 9.97
3 3 0 10.0 10.1 10.1 9.97 10.1 10.0 10.0 10.0 9.88 10.0
4 4 0 10.0 9.87 10.1 9.87 10.0 9.95 10.2 10.0 10.1 10.1
5 5 0 9.92 9.97 9.88 9.86 10.0 9.91 10.0 9.94 10.0 9.95
6 1 5 9.67 10.0 9.93 10.5 10.4 10.7 10.8 10.6 10.7 10.6
7 2 5 9.40 9.87 10.1 10.0 10.4 10.6 10.7 10.6 10.6 10.7
8 3 5 9.37 9.84 10.2 10.3 10.5 10.6 10.5 10.6 10.7 10.7
9 4 5 9.52 9.71 9.92 10.1 10.5 10.5 10.7 10.5 10.7 10.8
10 5 5 9.65 9.83 10.1 10.4 10.4 10.6 10.5 10.5 10.7 10.7
# … with 25 more rows
sample
と min
の列はサンプル番号と時間 (minutes) の変数です。 それぞれに、サンプル番号と時間の値が入っています。 0
から 45
の列には溶存酸素濃度の値が入っています。 この時の変数名は光条件ですね。
ワイドからロングへ変換
dset1_low |>
pivot_longer (cols = matches ("[0-9]+" ), names_to = "light" ,
names_transform = list (light = as.numeric))
# A tibble: 350 × 4
sample min light value
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 9.99
2 1 0 5 10.1
3 1 0 10 10.0
4 1 0 15 10.0
5 1 0 20 10.1
6 1 0 25 9.81
7 1 0 30 10.0
8 1 0 35 10.1
9 1 0 40 10.0
10 1 0 45 10.2
# … with 340 more rows
残りのデータの読み込み
dset1_high = read_csv ("photosynthesis1_high.csv" )
dset2_low = read_csv ("photosynthesis2_low.csv" )
dset2_high = read_csv ("photosynthesis2_high.csv" )
ピボットしてから結合
dset1_low = dset1_low |> pivot_longer (cols = matches ("[0-9]+" ), names_to = "light" , names_transform = list (light = as.numeric))
dset1_high = dset1_high |> pivot_longer (cols = matches ("[0-9]+" ), names_to = "light" , names_transform = list (light = as.numeric))
dset2_low = dset2_low |> pivot_longer (cols = matches ("[0-9]+" ), names_to = "light" , names_transform = list (light = as.numeric))
dset2_high = dset2_high |> pivot_longer (cols = matches ("[0-9]+" ), names_to = "light" , names_transform = list (light = as.numeric))
alldata = bind_rows (dset1_low, dset2_low, dset1_high, dset2_high)
alldata
# A tibble: 910 × 4
sample min light value
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 9.99
2 1 0 5 10.1
3 1 0 10 10.0
4 1 0 15 10.0
5 1 0 20 10.1
6 1 0 25 9.81
7 1 0 30 10.0
8 1 0 35 10.1
9 1 0 40 10.0
10 1 0 45 10.2
# … with 900 more rows
結合してからピボット
Rows: 35 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (12): sample, min, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): sample, min, 50, 75, 100
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (12): sample, min, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): sample, min, 50, 75, 100
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dset1 = full_join (dset1_low, dset1_high, by = c ("sample" , "min" ))
dset2 = full_join (dset2_low, dset2_high, by = c ("sample" , "min" ))
alldata = bind_rows (dset1, dset2)
alldata = alldata |>
pivot_longer (cols = matches ("[0-9]+" ), names_to = "light" ,
names_transform = list (light = as.numeric))