Does reshape2 have a CASESTOVARS equivalent? - r

I have this data table:
Id Time v1 v2 v3
 1   T1  2  1  2
 1   T2  3  1  2
 1   T3  1  3  3
Basically, I have data in three waves (T1, T2, T3). I need to convert it to wide format so that it looks like this:
id v1T1 v2T1 v3T1 v1T2 v2T2 v3T2 v1T3 v2T3 v3T3
 1    2    1    2    3    1    2    1    3    3
I have tried the following code:
data %>%
  group_by(id) %>%
  mutate(id = paste0("id", row_number())) %>%
  spread(id, v1, v2, v3)
What am I missing? I know how to do this with CASESTOVARS in SPSS, but I can't replicate it in R.

You can use pivot_wider:
tidyr::pivot_wider(df, names_from = Time, values_from = v1:v3)
# Id v1_T1 v1_T2 v1_T3 v2_T1 v2_T2 v2_T3 v3_T1 v3_T2 v3_T3
# <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 2 3 1 1 1 3 2 2 3
Or using data.table:
library(data.table)
dcast(setDT(df), Id~Time, value.var = c('v1', 'v2', 'v3'))
Data:
df <- structure(list(Id = c(1, 1, 1), Time = c("T1", "T2", "T3"), v1 = c(2L,
3L, 1L), v2 = c(1L, 1L, 3L), v3 = c(2L, 2L, 3L)), row.names = c(NA,
-3L), class = "data.frame")

Since your question title mentions "reshape2", the function you'd be looking for in that package is recast. The recast function is basically a melt followed by a dcast. The melt step is needed because reshape2::dcast can only cast a single value variable; data.table::dcast, by contrast, accepts a vector of value variables, as demonstrated in Ronak's answer.
Here's what it looks like:
library(reshape2)
recast(df, ... ~ variable + Time, id.var=1:2)
## Id v1_T1 v1_T2 v1_T3 v2_T1 v2_T2 v2_T3 v3_T1 v3_T2 v3_T3
## 1 1 2 3 1 1 1 3 2 2 3
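Because recast is a melt followed by a dcast, the same result can be sketched as two explicit reshape2 steps, which lets you inspect the long ("molten") intermediate. This assumes the same df as in the other answer and that reshape2 is installed.

```r
library(reshape2)

df <- data.frame(Id = c(1, 1, 1), Time = c("T1", "T2", "T3"),
                 v1 = c(2L, 3L, 1L), v2 = c(1L, 1L, 3L), v3 = c(2L, 2L, 3L))

# Step 1: melt to long form; v1, v2 and v3 are stacked into 'variable'/'value'
molten <- melt(df, id.vars = c("Id", "Time"))

# Step 2: cast back to wide, one column per variable/Time combination
wide <- dcast(molten, Id ~ variable + Time, value.var = "value")
wide
```

The molten data has one row per Id, Time and original column (9 rows here), which is why a single value column is enough for reshape2::dcast in the second step.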
For reference, you can also do this with reshape from base R:
reshape(df, direction = "wide", idvar = "Id", timevar = "Time")
## Id v1.T1 v2.T1 v3.T1 v1.T2 v2.T2 v3.T2 v1.T3 v2.T3 v3.T3
## 1 1 2 1 2 3 1 2 1 3 3
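By default, reshape joins the value-column name and the time value with ".". If you prefer the v1_T1 style produced by the other approaches, reshape's sep argument changes the separator; a sketch with the same df:

```r
df <- data.frame(Id = c(1, 1, 1), Time = c("T1", "T2", "T3"),
                 v1 = c(2L, 3L, 1L), v2 = c(1L, 1L, 3L), v3 = c(2L, 2L, 3L))

# sep controls how the value-column name and the time value are joined
wide <- reshape(df, direction = "wide", idvar = "Id", timevar = "Time", sep = "_")
wide
```

Note that reshape still groups the columns by time (v1_T1, v2_T1, v3_T1, v1_T2, ...), whereas recast and pivot_wider group them by variable.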
That said, many people find reshape very hard to learn. And while the "reshape2" package will continue to be maintained, it is no longer being actively developed: you can expect that things won't break, but new features won't be added. For those, you'll have to look at the "data.table" implementation or start using "tidyr" or other alternatives.

Related

Faster ways to calculate frequencies and cast from long to wide

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.
Example of what I've tried so far (tried a bunch of other things, including adding a dummy variable = 1 and then fun.aggregate = sum over that):
library(plyr)
ddply(data, .(id), dcast, id ~ week, value_var = "id",
      fun.aggregate = length, fill = 0, .parallel = TRUE)
However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?
Input:
id week
1 1
1 2
1 3
1 1
2 3
Output:
1 2 3
1 2 1 1
2 0 0 1
You could just use the table command:
table(data$id,data$week)
1 2 3
1 2 1 1
2 0 0 1
If "id" and "week" are the only columns in your data frame, you can simply use:
table(data)
# week
# id 1 2 3
# 1 2 1 1
# 2 0 0 1
You don't need ddply for this. The dcast from reshape2 is sufficient:
dat <- data.frame(
id = c(rep(1, 4), 2),
week = c(1:3, 1, 3)
)
library(reshape2)
dcast(dat, id~week, fun.aggregate=length)
id 1 2 3
1 1 2 1 1
2 2 0 0 1
Edit: For a base R solution (other than table, as posted by Joshua Ulrich), try xtabs:
xtabs(~id+week, data=dat)
week
id 1 2 3
1 2 1 1
2 0 0 1
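table() and xtabs() return a contingency table rather than a data frame, and calling plain as.data.frame() on one gives long format (one row per id/week cell). If you need a wide data frame shaped like the dcast result, one base R sketch converts the table with as.data.frame.matrix() and moves the row labels into an id column:

```r
dat <- data.frame(id = c(rep(1, 4), 2), week = c(1:3, 1, 3))

counts <- xtabs(~ id + week, data = dat)

# as.data.frame.matrix keeps the wide layout: one column per week level
wide <- as.data.frame.matrix(counts)

# Row names hold the id levels; move them into a regular column
wide <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

The week columns are named by the table's column levels ("1", "2", "3"), so access them with backticks or [[ ]], e.g. wide[["1"]].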
The reason ddply is taking so long is that the splitting by group is not run in parallel (only the computations on the 'splits' are), so with a large number of groups it will be slow, and setting .parallel = TRUE will not help.
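Counting does not actually need a per-group split. The sketch below uses a hypothetical helper, count_wide (not part of any package), that maps each (id, week) pair to a single cell index and counts all cells in one vectorized pass with base R's tabulate(); this is roughly the strategy base R's table() uses.

```r
# Example data from the question
df <- data.frame(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L, 1L, 3L))

# Hypothetical helper: counts of row_var x col_var without splitting by group
count_wide <- function(row_var, col_var) {
  r <- factor(row_var)
  k <- factor(col_var)
  nr <- nlevels(r)
  nc <- nlevels(k)
  # One cell index per observation, laid out column-major like a matrix
  cell <- as.integer(r) + nr * (as.integer(k) - 1L)
  # Count every cell in a single pass; empty cells get 0 and NAs are dropped
  counts <- tabulate(cell, nbins = nr * nc)
  matrix(counts, nrow = nr, ncol = nc,
         dimnames = list(levels(r), levels(k)))
}

m <- count_wide(df$id, df$week)
m
```

The work grows with the number of rows rather than the number of groups, which is why a single table() or dcast() call scales far better than splitting by id first.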
An approach using data.table::dcast (data.table version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:
library(data.table)
dcast(setDT(data), id ~ week)
# Using 'week' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
# id 1 2 3
# 1: 1 2 1 1
# 2: 2 0 0 1
Or setting the arguments explicitly:
dcast(setDT(data), id ~ week, value.var = "week", fun.aggregate = length)
# id 1 2 3
# 1: 1 2 1 1
# 2: 2 0 0 1
For alternatives that work with data.table versions before 1.9.2, see this answer's edit history.
A tidyverse option could be:
library(dplyr)
library(tidyr)
df %>%
count(id, week) %>%
pivot_wider(names_from = week, values_from = n, values_fill = list(n = 0))
# spread(week, n, fill = 0)  # in older versions of tidyr
# id `1` `2` `3`
# <dbl> <dbl> <dbl> <dbl>
#1 1 2 1 1
#2 2 0 0 1
Using only pivot_wider:
tidyr::pivot_wider(df, names_from = week,
values_from = week, values_fn = length, values_fill = 0)
Or using tabyl from janitor:
janitor::tabyl(df, id, week)
# id 1 2 3
# 1 2 1 1
# 2 0 0 1
Data:
df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L,
1L, 3L)), class = "data.frame", row.names = c(NA, -5L))