Dplyr rowwise not working on unnamed position identifiers - r

I'm trying to get the minimum time for each row in a dataframe. I don't know the names of the columns that I will be choosing, but I do know they will be the first to fifth columns:
data <- structure(list(Sch1 = c(99, 1903, 367),
Sch2 = c(292,248, 446),
Sch3 = c(252, 267, 465),
Sch4 = c(859, 146,360),
Sch5 = c(360, 36, 243),
Student.ID = c("Ben", "Bob", "Ali")),
.Names = c("Sch1", "Sch2", "Sch3", "Sch4", "Sch5", "Student.ID"), row.names = c(NA, 3L), class = "data.frame")
# this gets overall min for ALL rows
data %>% rowwise() %>% mutate(min_time = min(.[[1]], .[[2]], .[[3]], .[[4]], .[[5]]))
# this gets the min for EACH row
data %>% rowwise() %>% mutate(min_time = min(Sch1, Sch2, Sch3, Sch4, Sch5))
Should column notation .[[1]] return all values when in rowwise mode? I've also tried grouping on Student.ID instead of using rowwise, but this doesn't make any difference.

The reason column notation .[[1]] returns all values even after grouping is that . is not actually grouped. In a pipeline, . is simply the whole data frame passed in from the left-hand side, so .[[1]] accesses every value in the first column, not just the value for the current row.
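A small check can make this visible. The sketch below assumes dplyr is installed and uses a cut-down version of the question's data; the column names n_dot and n_col are made up for illustration.

```r
library(dplyr)

# Cut-down version of the question's data (two score columns only)
data <- data.frame(
  Sch1 = c(99, 1903, 367),
  Sch2 = c(292, 248, 446),
  Student.ID = c("Ben", "Bob", "Ali")
)

lengths_seen <- data %>%
  rowwise() %>%
  mutate(
    n_dot = length(.[[1]]),  # `.` is the full data frame: the whole first column
    n_col = length(Sch1)     # bare column name: only the current row's value
  ) %>%
  ungroup()

lengths_seen
```

Here n_dot counts the whole column on every row, while n_col counts a single value, which is why min(.[[1]], ...) ends up as the overall minimum.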
You may have to mutate the data and add a row_number column. This allows you to index the columns you are mutating at their corresponding row numbers. The following should do:
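data %>%
  mutate(rn = row_number()) %>%
  rowwise() %>%
  # index each of the first five columns at the current row number
  mutate(min_time = min(.[[1]][rn], .[[2]][rn], .[[3]][rn], .[[4]][rn], .[[5]][rn])) %>%
  select(-rn)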
Should yield:
# Sch1 Sch2 Sch3 Sch4 Sch5 Student.ID min_time
# <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
# 1 99 292 252 859 360 Ben 99
# 2 1903 248 267 146 36 Bob 36
# 3 367 446 465 360 243 Ali 243
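As an alternative sketch (not part of the answer above), the row number helper can be avoided. This assumes dplyr 1.0.0 or later for c_across(), which accepts column positions; the base R pmin() variant needs no packages at all. The object names by_row and vectorised are illustrative.

```r
library(dplyr)

data <- data.frame(
  Sch1 = c(99, 1903, 367),
  Sch2 = c(292, 248, 446),
  Sch3 = c(252, 267, 465),
  Sch4 = c(859, 146, 360),
  Sch5 = c(360, 36, 243),
  Student.ID = c("Ben", "Bob", "Ali")
)

# Option 1: c_across() selects columns by position within each row
by_row <- data %>%
  rowwise() %>%
  mutate(min_time = min(c_across(1:5))) %>%
  ungroup()

# Option 2: base R pmin() is vectorised, so no rowwise() is needed
vectorised <- data
vectorised$min_time <- do.call(pmin, data[1:5])
```

Both options select by position, so the column names never have to be known in advance. The pmin() version is likely to be faster on large data because it avoids per-row evaluation.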

Related

Pivot wider dataframe with difficult structure dplyr

I was working on something I thought would be simple, but maybe today my brain isn't working. My data is like this:
df <- tibble(metric = c('income', 'income_low', 'income_upp', 'n', 'n_low', 'n_upp'),
             value = c(120, 100, 140, 10, 8, 12))
metric value
income 120
income_low 100
income_upp 140
n 10
n_low 8
n_upp 12
And I want to pivot_wider so it looks like this:
metric value value_low value_upp
income 120 100 140
n 10 8 12
I'm having trouble separating the metrics, because pivot_wider as is produces a data frame that's too wide:
df %>% pivot_wider(names_from = 'metric', values_from = value)
How can I achieve this or should I pivot longer after the pivot wider?
Thanks!
I think if you convert metric into a column with "value", "value_upp" and "value_low" values, you can pivot_wider:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
  mutate(param = case_when(str_detect(metric, "upp") ~ "value_upp",
                           str_detect(metric, "low") ~ "value_low",
                           TRUE ~ "value"),
         metric = str_remove(metric, "_low|_upp")) %>%
  pivot_wider(names_from = param, values_from = value)
I like to use separate() when I have text in a column like this. This function splits a column into multiple columns wherever a separator appears in its values.
In particular in this example we would want to use the arguments sep="_" and into = c("metric", "state") to convert into columns with those names.
Then mutate() and pivot_wider() can be used as you had previously specified.
library(tidyverse)
df <- tribble(~metric, ~value,
"income", 120,
"income_low", 100,
"income_upp", 140,
"n", 10,
"n_low", 8,
"n_upp", 12)
df |>
separate(metric, sep = "_", into = c("metric", "state")) |>
mutate(state = ifelse(is.na(state), "value", state)) |>
pivot_wider(id_cols = metric, names_from = state, values_from = value, names_sep = "_")
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2 rows [1, 4].
#> # A tibble: 2 × 4
#> metric value low upp
#> <chr> <dbl> <dbl> <dbl>
#> 1 income 120 100 140
#> 2 n 10 8 12
Created on 2022-12-21 with reprex v2.0.2
Note you can use the argument names_glue or names_prefix in pivot_wider() to add the "value" as a prefix to the column names.
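A minimal sketch of the names_prefix route, assuming the same df as above. Because the prefix is applied to every new column, the plain "value" column comes out as value_value and is renamed afterwards; fill = "right" is used only to silence the missing-pieces warning. The object name wide is illustrative.

```r
library(dplyr)
library(tidyr)

df <- tribble(
  ~metric,      ~value,
  "income",     120,
  "income_low", 100,
  "income_upp", 140,
  "n",          10,
  "n_low",      8,
  "n_upp",      12
)

wide <- df |>
  # fill = "right" pads the missing second piece with NA without a warning
  separate(metric, into = c("metric", "state"), sep = "_", fill = "right") |>
  mutate(state = if_else(is.na(state), "value", state)) |>
  # every new column gets the "value_" prefix, including "value" itself
  pivot_wider(names_from = state, values_from = value, names_prefix = "value_") |>
  rename(value = value_value)

wide
```

This gives the column names asked for in the question: metric, value, value_low and value_upp.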
A data.table approach (if you can live with the trailing underscore after value_):
library(data.table)
setDT(df)
# create some new columns based on metric
df[, c("first", "second") := tstrsplit(metric, "_")]
# metric value first second
# 1: income 120 income <NA>
# 2: income_low 100 income low
# 3: income_upp 140 income upp
# 4: n 10 n <NA>
# 5: n_low 8 n low
# 6: n_upp 12 n upp
# replace NA with ""
df[is.na(df)] <- ""
# now cast to wide, creating colnames on the fly
dcast(df, first ~ paste0("value_", second), value.var = "value")
# first value_ value_low value_upp
# 1: income 120 100 140
# 2: n 10 8 12

What if I want to transpose values in multiple columns based on unique names in another column?

Suppose I have dataframe containing columns A, B, C and D. I now want to transpose values in columns B, C and D based on unique values in column A.
Dataframe:
Key  Date Occ  Amount
123  03-04-18  45000
345  31-12-18  92045
123  17-04-18  2400
345  04-07-19  1045
I would require:
Key  Date Occ1  Amount1  Date Occ2  Amount2
123  03-04-18   45000    17-04-18   2400
345  31-12-18   92045    04-07-19   1045
1) First add a name column to identify the first and second occurrence of each Key, use pivot_wider with multiple values_from columns and finally rearrange the columns to be as shown. The last line could be omitted if the order of the columns is not important. The names_sep = "" argument could be omitted if the default of separating the final number in the column names from the prefix using underscore is acceptable.
library(dplyr)
library(tidyr)
DF %>%
group_by(Key) %>%
mutate(name = 1:n()) %>%
ungroup %>%
pivot_wider(values_from = c(`Date Occ`, Amount), names_sep = "") %>%
select(Key, `Date Occ1`, Amount1, `Date Occ2`, Amount2)
giving
# A tibble: 2 × 5
Key `Date Occ1` Amount1 `Date Occ2` Amount2
<int> <chr> <int> <chr> <int>
1 123 03-04-18 45000 17-04-18 2400
2 345 31-12-18 92045 04-07-19 1045
2) If the first instances of each key come at the beginning and the second instance at the end then this base R code could be used.
nr <- nrow(DF)
merge(head(DF, nr/2), tail(DF, nr/2), by = 1, suffixes = 1:2)
giving:
Key Date Occ1 Amount1 Date Occ2 Amount2
1 123 03-04-18 45000 17-04-18 2400
2 345 31-12-18 92045 04-07-19 1045
Note
The input in reproducible form
DF <-
structure(list(Key = c(123L, 345L, 123L, 345L), `Date Occ` = c("03-04-18",
"31-12-18", "17-04-18", "04-07-19"), Amount = c(45000L, 92045L,
2400L, 1045L)), class = "data.frame", row.names = c(NA, -4L))
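To see what the name column in approach 1 contains before pivoting, the intermediate step can be run on its own. This is only an illustrative sketch: row_number() is used in place of 1:n() (they give the same numbering here), and the object name numbered is made up.

```r
library(dplyr)

DF <- data.frame(
  Key = c(123L, 345L, 123L, 345L),
  `Date Occ` = c("03-04-18", "31-12-18", "17-04-18", "04-07-19"),
  Amount = c(45000L, 92045L, 2400L, 1045L),
  check.names = FALSE  # keep the space in "Date Occ"
)

# Number the occurrences of each Key in order of appearance
numbered <- DF %>%
  group_by(Key) %>%
  mutate(name = row_number()) %>%
  ungroup()

numbered
```

pivot_wider() then uses name as its default names_from column, which is why the answer does not need to specify it.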

Mutate column based on list of lists in R

I have a dataframe that I want to gather so that it is in tall format, and then mutate on another column with values based on membership of a string from another column in a list of lists. For example, I have the following data frame and list of lists:
dummy_data <- data.frame("id" = 1:20,"test1_10" = sample(1:100, 20),"test2_11" = sample(1:100, 20),
"test3_12" = sample(1:100, 20),"check1_20" = sample(1:100, 20),
"check2_21" = sample(1:100, 20),"sound1_30" = sample(1:100, 20),
"sound2_31" = sample(1:100, 20),"sound3_32" = sample(1:100, 20))
dummylist <- list(c('test1_','test2_','test3_'),c('check1_','check2_'),c('sound1_','sound2_','sound3_'))
names(dummylist) <- c('shipments','arrivals','departures')
And then I gather the data frame like so:
dummy_data <- dummy_data %>%
gather("part", "number", 2:ncol(.))
What I want to do is add a column that has the name of the list found in dummylist where the string before the underscore in the part column is a member. And I can do that like this:
dummy_data <- dummy_data %>%
  mutate(Group = case_when(
    str_extract(part, '.*_') %in% dummylist[[1]] ~ names(dummylist[1]),
    str_extract(part, '.*_') %in% dummylist[[2]] ~ names(dummylist[2]),
    str_extract(part, '.*_') %in% dummylist[[3]] ~ names(dummylist[3])
  ))
However, this requires a separate str_extract line for each list/group within the dummylist. And my real data has way more than 3 lists/groups. So I'm wondering if there is a more efficient way to do this mutate step to get the names of the lists in?
Any help is much appreciated, thanks!
It may be easier with a regex_left_join after converting 'dummylist' to a two-column dataset:
library(fuzzyjoin)
library(dplyr)
library(tidyr)
library(tibble)
dummy_data %>%
# // reshape to long format - pivot_longer instead of gather
pivot_longer(cols = -id, names_to = 'part', values_to = 'number') %>%
# // join with the tibble/data.frame converted dummylist
regex_left_join(dummylist %>%
enframe(name = 'Group', value = 'part') %>%
unnest(part)) %>%
rename(part = part.x) %>%
select(-part.y)
Output:
# A tibble: 160 × 4
id part number Group
<int> <chr> <int> <chr>
1 1 test1_10 72 shipments
2 1 test2_11 62 shipments
3 1 test3_12 17 shipments
4 1 check1_20 89 arrivals
5 1 check2_21 54 arrivals
6 1 sound1_30 39 departures
7 1 sound2_31 94 departures
8 1 sound3_32 95 departures
9 2 test1_10 77 shipments
10 2 test2_11 4 shipments
# … with 150 more rows
If you prepare your lookup table beforehand, you don't need any extra libraries beyond dplyr and tidyr:
lookup <- sapply(
names(dummylist),
\(nm) { setNames(rep(nm, length(dummylist[[nm]])), dummylist[[nm]]) }
) |>
setNames(nm = NULL) |>
unlist()
lookup
# test1_ test2_ test3_ check1_ check2_ sound1_ sound2_ sound3_
# "shipments" "shipments" "shipments" "arrivals" "arrivals" "departures" "departures" "departures"
Now you just gsub on the fly to extract each prefix and translate your parts within the usual mutate() verb:
dummy_data |>
pivot_longer(-id, names_to = 'part', values_to = 'number') |>
mutate(group = lookup[gsub('^(\\w+_).*$', '\\1', part)])
# # A tibble: 160 × 4
# id part number group
# <int> <chr> <int> <chr>
# 1 1 test1_10 91 shipments
# 2 1 test2_11 74 shipments
# 3 1 test3_12 46 shipments
# 4 1 check1_20 62 arrivals
# 5 1 check2_21 7 arrivals
# 6 1 sound1_30 35 departures
# 7 1 sound2_31 23 departures
# 8 1 sound3_32 84 departures
# 9 2 test1_10 59 shipments
# 10 2 test2_11 73 shipments
# # … with 150 more rows
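The same named lookup vector can also be built a little more directly in base R. This is a sketch under the assumption that every prefix appears in exactly one group; the parts vector below is a made-up sample of column names, not the full data.

```r
dummylist <- list(
  shipments  = c("test1_", "test2_", "test3_"),
  arrivals   = c("check1_", "check2_"),
  departures = c("sound1_", "sound2_", "sound3_")
)

# Repeat each group name once per prefix, then name the result by prefix
lookup <- setNames(
  rep(names(dummylist), lengths(dummylist)),
  unlist(dummylist, use.names = FALSE)
)

# Hypothetical sample of part names; keep everything up to the first underscore
parts  <- c("test1_10", "check2_21", "sound3_32")
groups <- unname(lookup[sub("_.*$", "_", parts)])
groups
```

Because the lookup is driven by the list itself, adding more groups to dummylist requires no changes to the code.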

pivot_wider() generates new dataframe filled with NULL values and other misprinted values [duplicate]

I am using pivot_wider() in an attempt to transform this dataframe.
subject_id test_name test_result test_unit
12 Spanish 100 print
12 English 99 online
13 Spanish 98 print
13 English 91 print
Into:
subject_id spanish_test english_test
12 100 99
13 98 91
I used pivot_wider with the following code:
test %>%
pivot_wider(id_cols = subject_id,
names_from = Test_Name,
values_from = Test_Unit)
And I got the individual test columns generated, however, they were filled with the units or NULL values. Here is the dataframe for reference:
subject_id <- c(12, 12, 13, 13)
test_name <- c("Spanish", "English", "Spanish", "English")
test_result <- c(100, 99, 98, 91)
test_unit <- c("print", "online", "print", "print")
df <- data.frame(subject_id, test_name, test_result, test_unit)
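The new columns were filled with units because values_from pointed at the unit column. Point it at test_result instead, and use names_glue to add the _test suffix: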
tidyr::pivot_wider(df,
id_cols = subject_id,
names_from = test_name,
values_from = test_result,
names_glue = '{test_name}_test')
# subject_id Spanish_test English_test
# <dbl> <dbl> <dbl>
#1 12 100 99
#2 13 98 91
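The desired output in the question uses lowercase column names. One way to get there, sketched here on the assumption that dplyr 1.0.0 or later is available for rename_with(), is to lowercase the names after pivoting. The object name wide is illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  subject_id  = c(12, 12, 13, 13),
  test_name   = c("Spanish", "English", "Spanish", "English"),
  test_result = c(100, 99, 98, 91),
  test_unit   = c("print", "online", "print", "print")
)

wide <- df %>%
  pivot_wider(id_cols = subject_id,
              names_from = test_name,
              values_from = test_result,
              names_glue = "{test_name}_test") %>%
  # lowercase every column name: Spanish_test becomes spanish_test
  rename_with(tolower)

wide
```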
An alternative using reshape2 along with dplyr to rename the columns.
library(reshape2)
library(dplyr)
reshape2::dcast(df, subject_id ~ test_name,
value.var = "test_result") %>%
dplyr::rename_at(vars(Spanish:English), list( ~ paste0(., "_test"))) %>%
dplyr::rename_all(tolower) %>%
dplyr::select(subject_id, spanish_test, english_test)
Output
subject_id spanish_test english_test
1 12 100 99
2 13 98 91

How to reshape a data table

I am using R and have the following table (example):
ID Euros N Euros N Euros N
1 A 133.911,20 451 134.208,78 450 442,03 328
2 C 9.470,35 2856 26,18 2721 26,28 2699
My desired behaviour is to have Euros in one row and N in another row, instead of in columns:
ID Var1 Var2 Var3 Var4
1 A Euros 133.911,20 134.208,78 442,03
2 A N 451 450 328
3 C Euros 9.470,35 26,18 26,28
4 C N 2856 2721 2699
I have tried to do so only with A group and using the following code:
mydatatable_wide <- spread(mydatatable, Euros, N)
But I don't get my expected result. What I get is:
ID 133.911,20 134.208,78 442,03
1 A 451 450 328
This needs some work to achieve. Here is an approach using dplyr and tidyr:
library(dplyr)
library(tidyr)
# Here is the tribble from your question
# Note: in R code "." is the decimal point and no thousands separator is used,
# so 133.911,20 is written as 133911.20
df <- tribble(
~ID, ~Euros, ~N, ~Euros, ~N, ~Euros, ~N,
"A", 133911.20, 451, 134208.78, 450, 442.03, 328,
"C", 9470.35, 2856, 26.18, 2721, 26.28, 2699)
df %>%
# first convert your data set into a long version with multiple lines per ID
# contains all the numerical values Euros & N
pivot_longer(cols = where(is.numeric), names_to = "var", values_to = "value") %>%
# then split them into multiple group of Euros using group_by & group_map
group_by(var) %>%
group_map(~ {
.x %>%
group_by(ID) %>%
# in group map within each ID create a index var for those values
mutate(index_name = paste0("var_", seq(1, n(), by =1))) %>%
# then pivot them wider to have one line per ID & (Euros/N)
pivot_wider(names_from = "index_name", values_from = value, values_fill = NA)
}, .keep = TRUE) %>%
# Finally combined all the data.frame from group_map into one data.frame
bind_rows()
Output
ID var var_1 var_2 var_3
<chr> <chr> <dbl> <dbl> <dbl>
1 A Euros 133911. 134209. 442.
2 C Euros 9470. 26.2 26.3
3 A N 451 450 328
4 C N 2856 2721 2699
