pivot_wider in R dropping variables that I need [duplicate] - r

This question already has answers here:
R Reshape data frame from long to wide format? [duplicate]
(2 answers)
Closed 2 years ago.
I'm so confused here. I have a dataset that looks like this:
dataset <- data.frame(
Label = c(1.1,1.1,1.1,2.1,2.1,2.1,3.1,3.1,3.1,1.6,1.6,1.6,2.6,2.6,2.6,3.6,3.6,3.6),
StudyID = c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
ScanNumber = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
Timepoint = c(1,1,1,1,1,1,1,1,1,6,6,6,6,6,6,6,6,6),
Fat = c(3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8),
Lean = c(5,5,5,6,6,6,7,7,7,3,3,3,4,4,4,5,5,5)
)
I want to pivot_wider so that I have triplicate Fat and Lean measurements for each StudyID and Timepoint. You can see the Label contains information on the StudyID and Timepoint combined (for example, say StudyID = 1 and Timepoint = 6, Label is 1.6). This is how I am doing it:
newdataset <- dataset %>%
pivot_wider(
id_cols = Label,
names_from = ScanNumber,
names_sep = "_",
values_from = c(Fat, Lean)
)
However, the output I get no longer includes StudyID and Timepoint. I require these variables to then merge the dataset with another dataset. I have been searching the internet but can't seem to find how to keep StudyID and Timepoint in the new dataset after performing pivot_wider. What am I missing?
Thanks in advance.

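Include them in id_cols. Only the columns listed there are kept as identifiers, and each unique combination of their values becomes one output row:
library(dplyr)
library(tidyr)
dataset %>%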
pivot_wider(
id_cols = c(Label, StudyID, Timepoint),
names_from = ScanNumber,
names_sep = "_",
values_from = c(Fat, Lean)
)
# # A tibble: 6 x 9
# Label StudyID Timepoint Fat_1 Fat_2 Fat_3 Lean_1 Lean_2 Lean_3
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1.1 1 1 3 3 3 5 5 5
# 2 2.1 2 1 4 4 4 6 6 6
# 3 3.1 3 1 5 5 5 7 7 7
# 4 1.6 1 6 6 6 6 3 3 3
# 5 2.6 2 6 7 7 7 4 4 4
# 6 3.6 3 6 8 8 8 5 5 5
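As an alternative sketch, id_cols can be left out altogether. By default pivot_wider treats every column not named in names_from or values_from as an identifier, so Label, StudyID and Timepoint are all kept. This assumes the dataset from the question (rebuilt here with rep() for brevity) and that no other column varies between scans:

```r
library(tidyr)

# Same values as the question's dataset, written compactly
dataset <- data.frame(
  Label      = rep(c(1.1, 2.1, 3.1, 1.6, 2.6, 3.6), each = 3),
  StudyID    = rep(rep(1:3, each = 3), times = 2),
  ScanNumber = rep(1:3, times = 6),
  Timepoint  = rep(c(1, 6), each = 9),
  Fat        = rep(3:8, each = 3),
  Lean       = rep(c(5, 6, 7, 3, 4, 5), each = 3)
)

# No id_cols: every column other than ScanNumber, Fat and Lean
# is used as an identifier and therefore kept in the result
wide <- pivot_wider(
  dataset,
  names_from  = ScanNumber,
  values_from = c(Fat, Lean),
  names_sep   = "_"
)

names(wide)
```

Listing the identifiers explicitly, as in the answer above, is the safer choice when the data has extra columns that should not define rows.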

Related

I want to summarise a data frame [duplicate]

This question already has answers here:
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 1 year ago.
I want to summarize the following data frame into a summary table.
plot <- c(rep(1,2), rep(2,4), rep(3,3))
bird <- c('a','b', 'a','b', 'c', 'd', 'a', 'b', 'c')
area <- c(rep(10,2), rep(5,4), rep(15,3))
birdlist <- data.frame(plot,bird,area)
birdlist
plot bird area
1 1 a 10
2 1 b 10
3 2 a 5
4 2 b 5
5 2 c 5
6 2 d 5
7 3 a 15
8 3 b 15
9 3 c 15
I tried the following
birdlist %>%
group_by(plot, area) %>%
mutate(count(bird))
I am trying to get a data frame as result that looks like the following
plot bird area
1 2 10
2 4 5
3 3 15
Please help or advise on how to count bird for each plot and its respective area. Thanks.
You were very close. Use summarize instead of mutate, and use n() to count the rows in each group you specify:
library(tidyverse)
birdlist %>%
group_by(plot, area) %>%
summarize(bird = n(),
.groups = "drop")
#> # A tibble: 3 x 3
#> plot area bird
#> <dbl> <dbl> <int>
#> 1 1 10 2
#> 2 2 5 4
#> 3 3 15 3
If you're set on count, you would use it without group_by.
birdlist %>%
count(plot, area, name = "bird")
We could also group_by plot alone and recover area with unique(). This assumes each plot has a single area value:
birdlist %>%
group_by(plot) %>%
summarise(bird = n(), area = unique(area))
plot bird area
<dbl> <int> <dbl>
1 1 2 10
2 2 4 5
3 3 3 15
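The unique(area) approach relies on every plot having exactly one area value. A quick way to check that assumption, sketched here with the birdlist data from the question (the names area_check and n_area are only illustrative):

```r
library(dplyr)

birdlist <- data.frame(
  plot = c(rep(1, 2), rep(2, 4), rep(3, 3)),
  bird = c("a", "b", "a", "b", "c", "d", "a", "b", "c"),
  area = c(rep(10, 2), rep(5, 4), rep(15, 3))
)

# Count the distinct area values within each plot
area_check <- birdlist %>%
  group_by(plot) %>%
  summarise(n_area = n_distinct(area))

# TRUE when every plot maps to a single area
all(area_check$n_area == 1)
```

If the check fails, grouping by both plot and area, as in the first answer, avoids the ambiguity.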

Is there a way to count occurrence within a group in R? [duplicate]

This question already has answers here:
r Group by and count
(3 answers)
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 1 year ago.
I have a list of people grouped by county and by village. I would like to count the number of villages in each county. So far I am only able to count the number of people in each county.
library(dplyr)
set.seed(123)
df <- data.frame(
person = 1:100,
county = round(runif(100, 1, 5)),
village = round(runif(100, 1, 10))
)
# Number of people per county
df %>% count(county )
library(dplyr)
df %>%
group_by(county) %>%
add_count(village)
output:
person county village n
<int> <dbl> <dbl> <int>
1 1 2 6 4
2 2 4 4 8
3 3 3 5 5
4 4 5 10 1
5 5 5 5 3
6 6 1 9 2
7 7 3 9 1
8 8 5 6 2
9 9 3 5 5
10 10 3 2 6
# ... with 90 more rows
Would this work for you, Moses?
df %>%
group_by(county) %>%
count(village, county)
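The snippets above count people per county and village pair. If the goal is the number of distinct villages in each county, a minimal sketch (assuming the df from the question; villages_per_county and n_villages are illustrative names) could use n_distinct():

```r
library(dplyr)

set.seed(123)
df <- data.frame(
  person  = 1:100,
  county  = round(runif(100, 1, 5)),
  village = round(runif(100, 1, 10))
)

# One row per county, with the number of different villages seen in it
villages_per_county <- df %>%
  group_by(county) %>%
  summarise(n_villages = n_distinct(village))

villages_per_county
```

An equivalent route is df %>% distinct(county, village) %>% count(county), which first removes repeated county and village pairs and then counts the remaining rows.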

Cast multiple values in R [duplicate]

This question already has answers here:
Convert data from long format to wide format with multiple measure columns
(6 answers)
Closed 1 year ago.
Is there a way to cast multiple value columns from long to wide format in R?
asd <- data.frame(week = c(1,1,2,2), year = c("2019","2020","2019","2020"), val = c(1,2,3,4), cap = c(3,4,6,7))
Expected output
week 2019_val 2020_val 2019_cap 2020_cap
1 1 2 3 4
2 3 4 6 7
If you want to do this in base R, you can use reshape:
reshape(asd, direction = "wide", idvar = "week", timevar = "year", sep = "_")
#> week val_2019 cap_2019 val_2020 cap_2020
#> 1 1 1 3 2 4
#> 3 2 3 6 4 7
Note that it is best not to start the new column names with the year: names beginning with a digit are not syntactically valid in R, so they always have to be quoted or wrapped in backticks. Writing asd$'2020_val' instead of asd$val_2020 quickly becomes tiresome, and forgetting the quotes is an easy source of errors.
With tidyr::pivot_wider you could do:
asd <- data.frame(week = c(1,1,2,2), year = c("2019","2020","2019","2020"), val = c(1,2,3,4), cap = c(3,4,6,7))
tidyr::pivot_wider(asd, names_from = year, values_from = c(val, cap), names_glue = "{year}_{.value}")
#> # A tibble: 2 × 5
#> week `2019_val` `2020_val` `2019_cap` `2020_cap`
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 2 3 4
#> 2 2 3 4 6 7
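If the year-first names are not essential, dropping names_glue gives syntactic names for free. A short sketch, assuming the same asd data: with several values_from columns, pivot_wider's default naming puts the value column first and the year second.

```r
library(tidyr)

asd <- data.frame(
  week = c(1, 1, 2, 2),
  year = c("2019", "2020", "2019", "2020"),
  val  = c(1, 2, 3, 4),
  cap  = c(3, 4, 6, 7)
)

# Default names are built as <value column>_<year>
wide <- pivot_wider(asd, names_from = year, values_from = c(val, cap))

names(wide)
```

The resulting columns can then be used as wide$val_2019 without any quoting.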
For completeness, here is a data.table option:
library(data.table)
dcast(setDT(asd), week~year, value.var = c('val', 'cap'))
# week val_2019 val_2020 cap_2019 cap_2020
#1: 1 1 2 3 4
#2: 2 3 4 6 7
Slightly different approach using pivot_longer and pivot_wider together:
library(tidyr)
library(dplyr)
asd %>%
pivot_longer(
cols = -c(week, year)
) %>%
pivot_wider(
names_from = c(year, name)
)
week `2019_val` `2019_cap` `2020_val` `2020_cap`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 3 2 4
2 2 3 6 4 7

Creating wide data that has only 1 ID column [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I have a data frame that looks like this:
ID Code_Type Code date
1 10 4 1
1 9 5 2
2 10 6 3
2 9 7 4
and I would like it to look like this:
ID date.1 date.2 9 10
1 1 2 5 4
2 3 4 7 6
Where the different dates have different columns on the same row.
My current code is this:
#Example df
df <- data.frame("ID" = c(1,1,2,2),
"Code_Type" = c(10,9,10,9),
"Code" = c(4,5,6,7),
"date"= c(1,2,3,4))
spread(df, Code_Type,Code)
This outputs:
ID date 9 10
1 1 NA 4
1 2 5 NA
2 3 NA 6
2 4 7 NA
This is similar to what I want, but I have no idea how to turn the date column into multiple columns. Any help or extra reading is appreciated.
You could use reshape from base R.
reshape(dat, idvar=c("ID"), timevar="Code_Type", direction="wide")
# ID Code.10 date.10 Code.9 date.9
# 1 1 4 1 5 2
# 3 2 6 3 7 4
Data
dat <- structure(list(ID = c(1, 1, 2, 2), Code_Type = c(10, 9, 10, 9
), Code = c(4, 5, 6, 7), date = c(1, 2, 3, 4)), class = "data.frame", row.names = c(NA,
-4L))
Here's a dplyr / tidyr alternative:
df %>%
mutate(date.1 = date %% 2 * date) %>%
mutate(date.2 = - (date %% 2 - 1) * date) %>%
select(-date) %>%
spread(Code_Type, Code) %>%
group_by(ID) %>%
summarise_all(list(~ sum(.[!is.na(.)])))
# A tibble: 2 x 5
ID date.1 date.2 `9` `10`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 2 5 4
2 2 3 4 7 6
The idea is to split the date column into two columns depending on whether date is odd or even. This is done using the modulo (%%) operator (and some additional number crunching). date.1 = date %% 2 * date catches the odd numbers in date and is 0 for all the others; date.2 = - (date %% 2 - 1) * date catches the even numbers and is 0 for all the others. Note that this only works because, in this example, the first date of each ID is odd and the second is even.
Afterwards it's straightforward: select all columns but date, spread to wide format and, a bit tricky again, summarise by ID while dropping all NAs (group_by(ID) %>% summarise_all(list(~ sum(.[!is.na(.)])))).
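A sketch that does not depend on the dates being odd or even: number the rows within each ID, widen the dates by that index, widen the codes by Code_Type, and join the two results. The helper names idx, dates, codes and wide are illustrative, and df is the example data from the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  ID        = c(1, 1, 2, 2),
  Code_Type = c(10, 9, 10, 9),
  Code      = c(4, 5, 6, 7),
  date      = c(1, 2, 3, 4)
)

# Dates: one column per position of the row within its ID (date.1, date.2, ...)
dates <- df %>%
  group_by(ID) %>%
  mutate(idx = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = ID, names_from = idx,
              values_from = date, names_prefix = "date.")

# Codes: one column per Code_Type
codes <- df %>%
  pivot_wider(id_cols = ID, names_from = Code_Type, values_from = Code)

# One row per ID with both sets of columns
wide <- left_join(dates, codes, by = "ID")
wide
```

The code columns appear in the order the Code_Type values first occur in the data (10 before 9 here), so reorder them with select() if the order matters.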

How to average all columns in dataset by group [duplicate]

This question already has answers here:
How to calculate mean of all columns, by group?
(6 answers)
Closed 4 years ago.
I'm using aggregate in R to try and summarize my dataset. I currently have 3-5 observation per ID and I need to average these so that I have 1 value (the mean) per ID. Some columns are returning all "NA" when I use aggregate.
So far, I've created a vector for each column to average it, then tried to use merge to combine all of them. Some columns are characters, so I tried converting them to numbers using as.numeric(as.character(column)), but that returns too many NA in the column.
library(dplyr)
Tr1 <- data %>% group_by(ID) %>% summarise(mean = mean(Tr1))
Tr2 <- data %>% group_by(ID) %>% summarise(mean = mean(Tr2))
Tr3 <- data %>% group_by(ID) %>% summarise(mean = mean(Tr3))
data2 <- merge(Tr1,Tr2,Tr3, by = ID)
From this code I first get a warning:
There were 50 or more warnings (use warnings() to see the first 50)
then,
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical
My original dataset looks like:
ID Tr1 Tr2 Tr3
1 4 5 6
1 5 3 9
1 3 5 9
4 5 1 8
4 2 6 4
6 2 8 6
6 2 7 4
6 7 1 9
and I am trying to find code that makes it look like:
ID Tr1 Tr2 Tr3
1 4 4.3 8
4 3.5 3.5 6
6 3.7 5.3 6.3
You can use summarise_all instead of multiple uses of summarise:
library(dplyr)
data %>%
group_by(ID) %>%
summarise_all(mean)
# A tibble: 3 x 4
ID Tr1 Tr2 Tr3
<int> <dbl> <dbl> <dbl>
1 1 4 4.33 8
2 4 3.5 3.5 6
3 6 3.67 5.33 6.33
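In dplyr 1.0.0 and later, summarise_all is superseded by across(). A minimal sketch with the example data from the question (the data frame is rebuilt here, and means is an illustrative name); restricting to numeric columns with where(is.numeric) is an assumption about what the asker wants, since any character columns are then left out of the result instead of producing NA:

```r
library(dplyr)

data <- data.frame(
  ID  = c(1, 1, 1, 4, 4, 6, 6, 6),
  Tr1 = c(4, 5, 3, 5, 2, 2, 2, 7),
  Tr2 = c(5, 3, 5, 1, 6, 8, 7, 1),
  Tr3 = c(6, 9, 9, 8, 4, 6, 4, 9)
)

# Mean of every numeric column per ID, ignoring missing values
means <- data %>%
  group_by(ID) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

means
```

Columns that hold numbers stored as text would still need to be converted first, and values that fail to convert become NA, which na.rm = TRUE then skips.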
