Cast multiple values in R [duplicate]

This question already has answers here:
Convert data from long format to wide format with multiple measure columns
(6 answers)
Closed 1 year ago.
Is there a way to cast multiple values in R?
asd <- data.frame(week = c(1,1,2,2), year = c("2019","2020","2019","2020"), val = c(1,2,3,4), cap = c(3,4,6,7))
Expected output
week 2019_val 2020_val 2019_cap 2020_cap
   1        1        2        3        4
   2        3        4        6        7

If you want to do this in base R, you can use reshape:
reshape(asd, direction = "wide", idvar = "week", timevar = "year", sep = "_")
#>   week val_2019 cap_2019 val_2020 cap_2020
#> 1    1        1        3        2        4
#> 3    2        3        6        4        7
Note that it is best not to start the new column names with the year: names beginning with a digit are not syntactically valid in R, so they always need to be quoted. Writing asd$`2020_val` rather than asd$val_2020 quickly becomes tiresome, and forgetting the quotes leads to errors.
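The quoting point can be seen in a small base R sketch. The data frame `wide` and its two columns are hypothetical, built here only for illustration; they are not part of the original answer.

```r
# Hypothetical data frame with one non-syntactic and one syntactic column name.
wide <- data.frame(week = c(1, 2))
wide[["2019_val"]] <- c(1, 3)   # name starts with a digit
wide[["val_2019"]] <- c(1, 3)   # syntactic name

wide$`2019_val`   # has to be quoted (here with backticks) every time
wide$val_2019     # no quoting needed

# make.names() shows how R would turn the first name into a valid one.
make.names("2019_val")
#> [1] "X2019_val"
```

Both columns hold the same values; only the amount of quoting needed to reach them differs.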

With tidyr::pivot_wider you could do:
asd <- data.frame(week = c(1,1,2,2), year = c("2019","2020","2019","2020"), val = c(1,2,3,4), cap = c(3,4,6,7))
tidyr::pivot_wider(asd, names_from = year, values_from = c(val, cap), names_glue = "{year}_{.value}")
#> # A tibble: 2 × 5
#>    week `2019_val` `2020_val` `2019_cap` `2020_cap`
#>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1     1          1          2          3          4
#> 2     2          3          4          6          7

For completeness, here is a data.table option:
library(data.table)
dcast(setDT(asd), week~year, value.var = c('val', 'cap'))
#    week val_2019 val_2020 cap_2019 cap_2020
# 1:    1        1        2        3        4
# 2:    2        3        4        6        7

Slightly different approach using pivot_longer and pivot_wider together:
library(tidyr)
library(dplyr)
asd %>%
  pivot_longer(
    cols = -c(week, year)
  ) %>%
  pivot_wider(
    names_from = c(year, name)
  )
#>    week `2019_val` `2019_cap` `2020_val` `2020_cap`
#>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1     1          1          3          2          4
#> 2     2          3          6          4          7

Related

How to find the first observation of a column that matches a condition

I have a data frame:
df = tibble(a=c(7,6,10,12,12), b=c(3,5,8,8,7), c=c(4,4,12,15,20), week=c(1,2,3,4,5))
# A tibble: 5 x 4
      a     b     c  week
  <dbl> <dbl> <dbl> <dbl>
1     7     3     4     1
2     6     5     4     2
3    10     8    12     3
4    12     8    15     4
5    12     7    20     5
and I want, for each of the columns a, b and c, the first week in which the observation equals or exceeds 10.
That is, for column a it would be week 3, for column b it would be NA, and for column c it would be week 3 as well.
A desired outcome could look like this:
tibble(abc=c("a", "b", "c"), value=c(10, NA, 12), week=c(3, NA, 3))
# A tibble: 3 x 3
  abc   value  week
  <chr> <dbl> <dbl>
1 a        10     3
2 b        NA    NA
3 c        12     3
One way would be to get the data into long format and, for each column name, select the first value that is greater than or equal to 10. The missing combinations are then filled in with complete.
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = -week, names_to = 'abc') %>%
  group_by(abc) %>%
  slice(which(value >= 10)[1]) %>%
  ungroup %>%
  complete(abc = names(df)[-4])
# A tibble: 3 x 3
#  abc    week value
#  <chr> <dbl> <dbl>
#1 a         3    10
#2 b        NA    NA
#3 c         3    12
Another way is to first calculate what we want and then transform the dataset into long format.
df %>%
  summarise(across(a:c, list(week = ~week[which(. >= 10)[1]],
                             value = ~.[. >= 10][1]))) %>%
  pivot_longer(cols = everything(),
               names_to = c('abc', '.value'),
               names_sep = "_")
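The same "first row that reaches the threshold" logic can also be sketched in base R, which may help to see what both answers compute. This is only an illustration under the question's example data; the names `cols`, `idx` and `result` are hypothetical helpers, not part of either answer.

```r
df <- data.frame(a = c(7, 6, 10, 12, 12), b = c(3, 5, 8, 8, 7),
                 c = c(4, 4, 12, 15, 20), week = c(1, 2, 3, 4, 5))
cols <- c("a", "b", "c")

# Row index of the first value >= 10 in each column (NA if there is none).
idx <- vapply(df[cols], function(x) which(x >= 10)[1], integer(1))

# Indexing with NA returns NA, so columns without a match fall out as NA.
result <- data.frame(
  abc   = cols,
  value = unname(vapply(cols, function(col) df[[col]][idx[[col]]], numeric(1))),
  week  = df$week[unname(idx)]
)
result
```

Column b never reaches 10, so its value and week are both NA, matching the desired outcome in the question.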

pivot_wider in R dropping variables that I need [duplicate]

This question already has answers here:
R Reshape data frame from long to wide format? [duplicate]
(2 answers)
Closed 2 years ago.
I'm so confused here. I have a dataset that looks like this:
dataset <- data.frame(
  Label = c(1.1,1.1,1.1,2.1,2.1,2.1,3.1,3.1,3.1,1.6,1.6,1.6,2.6,2.6,2.6,3.6,3.6,3.6),
  StudyID = c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
  ScanNumber = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
  Timepoint = c(1,1,1,1,1,1,1,1,1,6,6,6,6,6,6,6,6,6),
  Fat = c(3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8),
  Lean = c(5,5,5,6,6,6,7,7,7,3,3,3,4,4,4,5,5,5)
)
I want to pivot_wider so that I have triplicate Fat and Lean measurements for each StudyID and Timepoint. You can see the Label contains information on the StudyID and Timepoint combined (for example, say StudyID = 1 and Timepoint = 6, Label is 1.6). This is how I am doing it:
newdataset <- dataset %>%
  pivot_wider(
    id_cols = Label,
    names_from = ScanNumber,
    names_sep = "_",
    values_from = c(Fat, Lean)
  )
However, the output I get no longer includes StudyID and Timepoint. I require these variables to then merge the dataset with another dataset. I have been searching the internet but can't seem to find how to keep StudyID and Timepoint in the new dataset after performing pivot_wider. What am I missing?
Thanks in advance.
Combine them within id_cols, which are preserved (and grouped):
dataset %>%
  pivot_wider(
    id_cols = c(Label, StudyID, Timepoint),
    names_from = ScanNumber,
    names_sep = "_",
    values_from = c(Fat, Lean)
  )
# # A tibble: 6 x 9
#   Label StudyID Timepoint Fat_1 Fat_2 Fat_3 Lean_1 Lean_2 Lean_3
#   <dbl>   <dbl>     <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
# 1   1.1       1         1     3     3     3      5      5      5
# 2   2.1       2         1     4     4     4      6      6      6
# 3   3.1       3         1     5     5     5      7      7      7
# 4   1.6       1         6     6     6     6      3      3      3
# 5   2.6       2         6     7     7     7      4      4      4
# 6   3.6       3         6     8     8     8      5      5      5
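As a related sketch: when id_cols is not supplied, pivot_wider treats every column that is not named in names_from or values_from as an identifier, so leaving the argument out should keep StudyID and Timepoint as well. The example below assumes tidyr 1.0 or later and rebuilds the question's data compactly with rep(); the object name `wide` is hypothetical.

```r
library(tidyr)

# Same values as the question's dataset, built with rep() for brevity.
dataset <- data.frame(
  Label      = rep(c(1.1, 2.1, 3.1, 1.6, 2.6, 3.6), each = 3),
  StudyID    = rep(rep(1:3, each = 3), times = 2),
  ScanNumber = rep(1:3, times = 6),
  Timepoint  = rep(c(1, 6), each = 9),
  Fat        = rep(3:8, each = 3),
  Lean       = rep(c(5, 6, 7, 3, 4, 5), each = 3)
)

# No id_cols: Label, StudyID and Timepoint are all used as identifiers.
wide <- pivot_wider(
  dataset,
  names_from  = ScanNumber,
  names_sep   = "_",
  values_from = c(Fat, Lean)
)
names(wide)
```

Listing the identifier columns explicitly, as in the answer above, remains the safer choice when the data has extra columns that should not define a row.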

Creating wide data that has only 1 ID column [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I have a data frame that looks like this:
ID Code_Type Code date
 1        10    4    1
 1         9    5    2
 2        10    6    3
 2         9    7    4
and I would like it to look like this:
ID date.1 date.2 9 10
 1      1      2 5  4
 2      3      4 7  6
Where the different dates have different columns on the same row.
My current code is this:
#Example df
df <- data.frame("ID" = c(1,1,2,2),
"Code_Type" = c(10,9,10,9),
"Code" = c(4,5,6,7),
"date"= c(1,2,3,4))
spread(df, Code_Type,Code)
This outputs:
ID date  9 10
 1    1 NA  4
 1    2  5 NA
 2    3 NA  6
 2    4  7 NA
This is similar to what I want, but I have no idea how to turn the date column into multiple columns. Any help or extra reading is appreciated.
To clarify, this is my expected output data frame:
ID date.1 date.2 9 10
 1      1      2 5  4
 2      3      4 7  6
You could use reshape from base R.
reshape(dat, idvar=c("ID"), timevar="Code_Type", direction="wide")
#   ID Code.10 date.10 Code.9 date.9
# 1  1       4       1      5      2
# 3  2       6       3      7      4
Data
dat <- structure(list(ID = c(1, 1, 2, 2), Code_Type = c(10, 9, 10, 9),
                      Code = c(4, 5, 6, 7), date = c(1, 2, 3, 4)),
                 class = "data.frame", row.names = c(NA, -4L))
Here's a dplyr / tidyr alternative:
df %>%
  mutate(date.1 = date %% 2 * date) %>%
  mutate(date.2 = - (date %% 2 - 1) * date) %>%
  select(-date) %>%
  spread(Code_Type, Code) %>%
  group_by(ID) %>%
  summarise_all(list(~ sum(.[!is.na(.)])))
# A tibble: 2 x 5
     ID date.1 date.2   `9`  `10`
  <dbl>  <dbl>  <dbl> <dbl> <dbl>
1     1      1      2     5     4
2     2      3      4     7     6
The idea is to split the date column into two columns depending on whether date is odd or even. This is done using the modulo (%%) operator (and some additional number crunching). date.1 = date %% 2 * date catches the odd numbers in date and is 0 for all the others; date.2 = - (date %% 2 - 1) * date catches the even numbers and is 0 for all the others.
Afterwards it's straightforward: select all columns but date, spread to wide format and, a bit tricky again, summarise by ID while dropping all NAs (group_by(ID) %>% summarise_all(list(~ sum(.[!is.na(.)])))).
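The modulo arithmetic described above can be checked on a plain vector. This is a minimal sketch; `odd_part` and `even_part` are hypothetical names standing in for date.1 and date.2.

```r
date <- c(1, 2, 3, 4)

# date %% 2 is 1 for odd values and 0 for even ones,
# so multiplying by date keeps only the odd dates.
odd_part <- date %% 2 * date

# -(date %% 2 - 1) flips the indicator: 1 for even values, 0 for odd ones.
even_part <- -(date %% 2 - 1) * date

odd_part
#> [1] 1 0 3 0
even_part
#> [1] 0 2 0 4
```

The trick depends on each ID having exactly one odd and one even date, which holds for the example data but is an assumption about real data.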

Moving rows to columns in R using identifier

I have a dataset in R with two columns of numerical data and one with an identifier. Some of the rows share the same identifier (i.e. they are the same individual), but contain different data. I want to use the identifier to move the rows that share an identifier into columns. There are currently 600 rows, but there should be 400.
Can anyone share R code that might do this? I am new to R, and have tried the reshape (cast) programme, but I can't really follow it, and am not sure it's exactly what I'm trying to do.
Any help gratefully appreciated.
UPDATE:
Current
ID Age Sex
 1   3   1
 1   5   1
 1   6   1
 1   7   1
 2   1   2
 2  12   2
 2   5   2
 3   3   1
Expected output
ID Age Sex Age2 Sex2 Age3 Sex3 Age4 Sex4
 1   3   1    5    1    6    1    7    1
 2   1   2   12    2    5    2
 3   3   1
UPDATE 2:
So far I have tried using the melt and dcast commands from reshape2. I am getting there, but it still doesn't look quite right. Here is my code:
x <- melt(example, id.vars = "ID")
x$time <- ave(x$ID, x$ID, FUN = seq_along)
example2 <- dcast (x, ID ~ time, value.var = "value")
and here is the output using that code:
ID  A  B  C  D  E  F  G  H   (for clarity I have labelled these)
 1  3  5  6  7  1  1  1  1
 2  1 12  5  2  2  2
 3  3  1
So, as you can probably see, it is mixing up the 'sex' and 'age' variables and combining them in the same column. For example column D has the value '7' for person 1 (age4), but '2' for person 2 (Sex). I can see that my code is not instructing where the numerical values should be cast to, but I do not know how to code that part. Any ideas?
Here's an approach using gather, spread and unite from the tidyr package:
suppressPackageStartupMessages(library(tidyverse))
x <- tribble(
~ID, ~Age, ~Sex,
1, 3, 1,
1, 5, 1,
1, 6, 1,
1, 7, 1,
2, 1, 2,
2, 12, 2,
2, 5, 2,
3, 3, 1
)
x %>% group_by(ID) %>%
  mutate(grp = 1:n()) %>%
  gather(var, val, -ID, -grp) %>%
  unite("var_grp", var, grp, sep ='') %>%
  spread(var_grp, val, fill = '')
#> # A tibble: 3 x 9
#> # Groups:   ID [3]
#>      ID Age1  Age2  Age3  Age4  Sex1  Sex2  Sex3  Sex4
#> * <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1     1 3     5     6     7     1     1     1     1
#> 2     2 1     12    5           2     2     2
#> 3     3 3                       1
If you prefer to keep the columns numeric then just remove the fill='' argument from spread(var_grp, val, fill = '').
Other questions which might help with this include:
R spreading multiple columns with tidyr
How can I spread repeated measures of multiple variables into wide format?
I have recently come across a similar issue in my data, and wanted to provide an update using the tidyr 1.0 functions as gather and spread have been retired. The new pivot_longer and pivot_wider are currently much slower than gather and spread, especially on very large datasets, but this is supposedly fixed in the next update of tidyr, so hope this updated solution is useful to people.
library(tidyr)
library(dplyr)
x %>%
  group_by(ID) %>%
  mutate(grp = 1:n()) %>%
  pivot_longer(-c(ID, grp), names_to = "var", values_to = "val") %>%
  unite("var_grp", var, grp, sep = "") %>%
  pivot_wider(names_from = var_grp, values_from = val)
#> # A tibble: 3 x 9
#> # Groups:   ID [3]
#>      ID  Age1  Sex1  Age2  Sex2  Age3  Sex3  Age4  Sex4
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     3     1     5     1     6     1     7     1
#> 2     2     1     2    12     2     5     2    NA    NA
#> 3     3     3     1    NA    NA    NA    NA    NA    NA
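For readers who prefer to avoid extra packages, the same idea (number the rows within each ID, then spread on that counter) can be sketched with base R's ave and reshape. This is an illustrative alternative, not one of the answers above; the `time` column and the object name `wide` are assumptions made for the example.

```r
x <- data.frame(
  ID  = c(1, 1, 1, 1, 2, 2, 2, 3),
  Age = c(3, 5, 6, 7, 1, 12, 5, 3),
  Sex = c(1, 1, 1, 1, 2, 2, 2, 1)
)

# Running counter within each ID: 1, 2, 3, 4 for ID 1, then 1, 2, 3, and so on.
x$time <- ave(x$ID, x$ID, FUN = seq_along)

# Spread Age and Sex side by side, one pair of columns per counter value.
wide <- reshape(x, idvar = "ID", timevar = "time",
                direction = "wide", sep = "")
wide
```

Because the counter is computed per ID on the original rows (before any melting), Age and Sex values stay in their own columns, which avoids the mixing described in the question.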

recoding categorical with no mapping values

I have a data frame with a lot of variables (82), many of which are used for further calculations. I've tried to convert them to numeric, but it is a huge amount of work to find the distinct values of every variable and then assign numbers to them.
I wonder if there's a more automated way of doing it, since I don't care which number is assigned to a value as long as it is not repeated.
My approach so far (for the sake of clarity, dummy data):
df <- data.frame(original.var1 = c("display","memory","software","display","disk","memory"),
                 original.var2 = c("skeptic","believer","believer","believer","skeptic","believer"),
                 original.var3 = c("round","square","triangle","cube","sphere","hexagon"),
                 original.var4 = c(10,20,30,40,50,60))
Given that this worked fine:
library(dplyr)
library(magrittr)
df$NEW1 <- as.numeric(interaction(df$original.var1, drop=TRUE))
I've tried to adapt to dplyr and pipes this way
df %<>% mutate(VAR1= as.numeric(interaction(original.var1, drop=TRUE))) %>%
mutate(VAR2= as.numeric(interaction(original.var2, drop=TRUE))) %>%
mutate(VAR3= as.numeric(interaction(original.var2, drop=TRUE)))
but the results are wrong from the third variable onward:
df %>% dplyr::group_by(original.var1,VAR1) %>% tally()
# A tibble: 4 x 3
# Groups:   original.var1 [?]
  original.var1  VAR1     n
         <fctr> <dbl> <int>
1          disk     1     1
2       display     2     2
3        memory     3     2
4      software     4     1
df %>% dplyr::group_by(original.var2,VAR2) %>% tally()
# A tibble: 2 x 3
# Groups:   original.var2 [?]
  original.var2  VAR2     n
         <fctr> <dbl> <int>
1      believer     1     4
2       skeptic     2     2
df %>% dplyr::group_by(original.var3,VAR3) %>% tally()
# A tibble: 6 x 3
# Groups:   original.var3 [?]
  original.var3  VAR3     n
         <fctr> <dbl> <int>
1          cube     1     1
2       hexagon     1     1
3         round     2     1
4        sphere     2     1
5        square     1     1
6      triangle     1     1
Any approach or package to recode not having the mapping declared previously?
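Note that the third mutate() in your code recodes original.var2 again instead of original.var3, which is why VAR3 looks wrong. To recode every factor column in one go, you can use mutate_if,
library(dplyr)
mutate_if(df, is.factor, ~ as.numeric(interaction(., drop = TRUE)))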
which gives,
  original.var1 original.var2 original.var3 original.var4
1             2             2             3            10
2             3             1             5            20
3             4             1             6            30
4             2             1             1            40
5             1             2             4            50
6             3             1             2            60
Alternatively, you can read your data frame with stringsAsFactors = FALSE (the default since R 4.0.0) and use is.character instead of is.factor, but it's the same thing.
To address your comment: if you also want to keep your original columns, then
mutate_if(df, is.factor, list(new = ~ as.numeric(interaction(., drop = TRUE))))
Using purrr: keep only the factor columns and operate on them, then merge with the numeric columns at the end.
df %>% purrr::keep(is.factor) %>% mutate_all(~ as.numeric(interaction(., drop = TRUE)))
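The same recoding can also be sketched in base R, which sidesteps the question of which dplyr helpers are available. This is an illustration under the question's dummy data, assuming the text columns are stored as character vectors; `is_chr` is a hypothetical helper name. Codes follow the sorted order of the distinct values, which is fine when any unique number per value will do.

```r
df <- data.frame(
  original.var1 = c("display", "memory", "software", "display", "disk", "memory"),
  original.var2 = c("skeptic", "believer", "believer", "believer", "skeptic", "believer"),
  original.var3 = c("round", "square", "triangle", "cube", "sphere", "hexagon"),
  original.var4 = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Pick out the character columns and replace each with integer codes.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(x) as.integer(factor(x)))
df
```

The numeric column original.var4 is left untouched, and the codes for the other three columns match the mutate_if output shown above.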
