Is there an easy way, or a base R function, that is equivalent to tidyr::expand?
To elaborate on the comment made by @onyambu, you could do
mtcars |> with(expand.grid(cyl=unique(cyl), am=unique(am)))
# cyl am
# 1 6 1
# 2 4 1
# 3 8 1
# 4 6 0
# 5 4 0
# 6 8 0
whereas tidyr returns the same combinations, sorted by value:
library(magrittr)
mtcars %>% tidyr::expand(cyl, am)
# # A tibble: 6 × 2
# cyl am
# <dbl> <dbl>
# 1 4 0
# 2 4 1
# 3 6 0
# 4 6 1
# 5 8 0
# 6 8 1
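If the sorted order matters, expand.grid can be made to mimic it. This is only a sketch: expand_base is a hypothetical helper name, and it assumes the columns are passed as strings and that sorting the unique values is the behaviour you want.

```r
# Hypothetical helper: a base R approximation of tidyr::expand()
expand_base <- function(data, ...) {
  cols <- c(...)
  # expand.grid varies its first factor fastest, so reverse the columns
  # to make the first requested column vary slowest
  lev <- lapply(data[rev(cols)], function(x) sort(unique(x)))
  out <- expand.grid(lev, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  # restore the requested column order
  out[rev(names(out))]
}

res <- expand_base(mtcars, "cyl", "am")
res
```

The result is a plain data.frame rather than a tibble.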
I would like to conditionally mutate variables (var1, var2) within groups (id) at different timepoints (timepoint), using previously updated/mutated values, according to this function:
change_function <- function(value,pastvalue,timepoint){
if(timepoint==1){valuenew=value} else
if(value==0){valuenew=pastvalue-1}
if(value==1){valuenew=pastvalue}
if(value==2){valuenew=pastvalue+1}
return(valuenew)
}
Here pastvalue is the mutated/updated value from the previous timepoint (timepoint - 1), for timepoints 2:4.
Here is example data and the desired output:
``` r
#example data
df <- data.frame(id=c(1,1,1,1,2,2,2,2),timepoint=c(1,2,3,4,1,2,3,4),var1=c(1,0,1,2,2,2,1,0),var2=c(2,0,1,2,3,2,1,0))
df
#> id timepoint var1 var2
#> 1 1 1 1 2
#> 2 1 2 0 0
#> 3 1 3 1 1
#> 4 1 4 2 2
#> 5 2 1 2 3
#> 6 2 2 2 2
#> 7 2 3 1 1
#> 8 2 4 0 0
#desired output
output <- data.frame(id=c(1,1,1,1,2,2,2,2),timepoint=c(1,2,3,4,1,2,3,4),var1=c(1,0,0,1,2,3,3,2),var2=c(2,1,1,2,3,4,4,3))
output
#> id timepoint var1 var2
#> 1 1 1 1 2
#> 2 1 2 0 1
#> 3 1 3 0 1
#> 4 1 4 1 2
#> 5 2 1 2 3
#> 6 2 2 3 4
#> 7 2 3 3 4
#> 8 2 4 2 3
```
<sup>Created on 2020-11-23 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
My approach: apply my function with dplyr::mutate_at
library(dplyr)
df %>%
group_by(id) %>%
mutate_at(.vars=vars(var1,var2),
.funs=funs(.=change_function(.,dplyr::lag(.),timepoint)))
However, this does not work because if/else is not vectorized: it tests a single condition, not one condition per row.
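To make the difference concrete, here is a small sketch with made-up labels ("down", "same", "up" are only for illustration): if() handles one value at a time, while ifelse() evaluates the test for every element.

```r
value <- c(0, 1, 2)

# if() expects a single TRUE/FALSE, so it can only look at one element
first_label <- if (value[1] == 0) "down" else "other"

# ifelse() is vectorized: one result per element of value
all_labels <- ifelse(value == 0, "down",
                     ifelse(value == 1, "same", "up"))

first_label
all_labels
```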
Update 1:
Using a nested ifelse function does not give the desired output either, because dplyr::lag returns the original previous values, not the updated ones:
change_function <- function(value,pastvalue,timepoint){
ifelse((timepoint==1),value,
ifelse((value==0),pastvalue-1,
ifelse((value==1),pastvalue,
ifelse((value==2),pastvalue+1,NA))))
}
library(dplyr)
df %>%
group_by(id) %>%
mutate_at(.vars=vars(var1,var2),
.funs=funs(.=change_function(.,dplyr::lag(.),timepoint)))
id TimePoint var1 var2 var1_. var2_.
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 2 1 2
2 1 2 0 0 0 1
3 1 3 1 1 0 0
4 1 4 2 2 2 2
5 2 1 2 3 2 3
6 2 2 2 2 3 4
7 2 3 1 1 2 2
8 2 4 0 0 0 0
Update 2:
According to the comments, purrr::accumulate could be used.
Thanks to akrun, I was able to get the correct function:
# write a vectorized function
change_function <- function(prev, new) {
change=if_else(new==0,-1,
if_else(new==1,0,1))
if_else(is.na(new), new, prev + change)
}
# use purrr::accumulate
df %>%
group_by(id) %>%
mutate_at(.vars=vars(var1,var2),
.funs=funs(accumulate(.,change_function)))
# A tibble: 8 x 4
# Groups: id [2]
id timepoint var1 var2
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 2
2 1 2 0 1
3 1 3 0 1
4 1 4 1 2
5 2 1 2 3
6 2 2 3 4
7 2 3 3 4
8 2 4 2 3
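For comparison, the same running update can be sketched in base R with Reduce(accumulate = TRUE) inside ave. The names step and run_codes are made up for this example, and the sketch assumes the rows are already ordered by timepoint within each id.

```r
df <- data.frame(id = rep(1:2, each = 4),
                 timepoint = rep(1:4, times = 2),
                 var1 = c(1, 0, 1, 2, 2, 2, 1, 0),
                 var2 = c(2, 0, 1, 2, 3, 2, 1, 0))

# One update step: code 0 lowers the running value, 1 keeps it, anything else raises it
step <- function(prev, new) {
  prev + ifelse(new == 0, -1, ifelse(new == 1, 0, 1))
}

# The first value of each group is the starting point; later values are codes
run_codes <- function(x) Reduce(step, x, accumulate = TRUE)

out <- df
out$var1 <- ave(df$var1, df$id, FUN = run_codes)
out$var2 <- ave(df$var2, df$id, FUN = run_codes)
out
```

Like purrr::accumulate, Reduce feeds each updated value into the next step, which is what the lag-based attempt was missing.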
Unfortunately, I have spotted a weird inconsistency in the column names when cbind-ing two particular objects: a tibble that has been group_by()ed and a matrix. I am writing this here because I would like to understand what is going on under the hood with the cbind operation and these two objects.
Consider the following objects:
Simple tibble
library(tidyverse)
tbl <- tibble(tbl_name = seq(1,8))
# # A tibble: 8 x 1
# tbl_name
# <int>
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
# 6 6
# 7 7
# 8 8
Simple data.frame
df <- data.frame(df_name = seq(1,8))
df
# df_name
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
# 6 6
# 7 7
# 8 8
Simple matrix
mtx <- matrix(seq(1,8), nrow = 8)
colnames(mtx) <- "mtx_name"
# mtx_name
# [1,] 1
# [2,] 2
# [3,] 3
# [4,] 4
# [5,] 5
# [6,] 6
# [7,] 7
# [8,] 8
group_by()ed tibble
tb2 <- tibble(tbl2_name = seq(1,8),
tbl_group_by = c("a","b","b","c","d","d","d","d"))
tb2 <- tb2 %>%
group_by(tbl_group_by) %>%
mutate(N_by_group = n())
# A tibble: 8 x 3
# Groups: tbl_group_by [4]
# tbl2_name tbl_group_by N_by_group
# <int> <chr> <int>
# 1 1 a 1
# 2 2 b 2
# 3 3 b 2
# 4 4 c 1
# 5 5 d 4
# 6 6 d 4
# 7 7 d 4
# 8 8 d 4
When I cbind them:
This works (i.e., keeps the correct names):
# Comparison
# tibble & data.frame: OK
cbind(tbl,df)
# tbl_name df_name
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# 6 6 6
# 7 7 7
# 8 8 8
# matrix & data.frame: OK
cbind(mtx,df)
# mtx_name df_name
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# 6 6 6
# 7 7 7
# 8 8 8
# tibble & matrix: OK
cbind(tbl,mtx)
# tbl_name mtx_name
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# 6 6 6
# 7 7 7
# 8 8 8
This doesn't work (i.e., the column name of the matrix is destroyed):
# tibble(group_by()) & matrix: oops!!!!
cbind(tb2,mtx)
# New names:
# * NA -> ...4
# # A tibble: 8 x 4
# # Groups: tbl_group_by [4]
# tbl2_name tbl_group_by N_by_group ...4[,"mtx_name"]
# <int> <chr> <int> <int>
# 1 1 a 1 1
# 2 2 b 2 2
# 3 3 b 2 3
# 4 4 c 1 4
# 5 5 d 4 5
# 6 6 d 4 6
# 7 7 d 4 7
# 8 8 d 4 8
Any insight into what's happening, or how to prevent it, is very welcome. Thank you in advance.
We can remove the grouping attributes with ungroup; cbind should then work:
library(dplyr)
cbind(ungroup(tb2), mtx)
Output:
# tbl2_name tbl_group_by N_by_group mtx_name
#1 1 a 1 1
#2 2 b 2 2
#3 3 b 2 3
#4 4 c 1 4
#5 5 d 4 5
#6 6 d 4 6
#7 7 d 4 7
#8 8 d 4 8
Or call cbind.data.frame explicitly, because on a grouped tibble cbind may dispatch to a different method than it does for a plain data frame:
cbind.data.frame(tb2, mtx)
When creating 'tb2', make sure to ungroup after the grouped operation to prevent this kind of issue:
tb2 <- tb2 %>%
group_by(tbl_group_by) %>%
mutate(N_by_group = n()) %>%
ungroup
Or make use of is_grouped_df to find if the data is grouped or not and then ungroup
f1 <- function(dat) {
if(dplyr::is_grouped_df(dat)) {
dat <- ungroup(dat)
}
dat
}
cbind(f1(tb2), mtx)
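To check whether grouping is the culprit in a given case, it can help to look at the class vector and to try the same cbind on a plain data.frame. A minimal sketch, rebuilding the objects from the question (the names plain and res are only for illustration):

```r
library(dplyr)

tb2 <- tibble(tbl2_name = 1:8,
              tbl_group_by = c("a", "b", "b", "c", "d", "d", "d", "d")) %>%
  group_by(tbl_group_by) %>%
  mutate(N_by_group = n())
mtx <- matrix(1:8, nrow = 8, dimnames = list(NULL, "mtx_name"))

# A grouped tibble carries an extra class in front of the tibble classes
class(tb2)

# Converting to a plain data.frame drops the grouping as well
plain <- as.data.frame(tb2)
res <- cbind(plain, mtx)
names(res)
```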
If I have a grouping:
mtcars %>% group_by(cyl,carb)
How can I add a column that counts the number of unique group combinations; so carb groups within cyl groups? This would be something like:
cyl carb combination
6 2 1
6 4 2
6 6 3
4 2 1
4 4 2
4 6 3
Maybe there's a better way to avoid the n column, but below should be a good start:
mtcars %>% count(cyl,carb) %>% group_by(cyl) %>% mutate(combination=1:n())
# A tibble: 9 x 4
# Groups: cyl [3]
cyl carb n combination
<dbl> <dbl> <int> <int>
1 4 1 5 1
2 4 2 6 2
3 6 1 2 1
4 6 4 4 2
5 6 6 1 3
6 8 2 4 1
7 8 3 3 2
8 8 4 6 3
9 8 8 1 4
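One way to avoid the n column is to start from the distinct (cyl, carb) pairs instead of counting. A base R sketch, using ave to number the pairs within each cyl (the name combos is only for illustration):

```r
# Distinct cyl/carb pairs, sorted so the numbering follows carb within cyl
combos <- unique(mtcars[c("cyl", "carb")])
combos <- combos[order(combos$cyl, combos$carb), ]

# Number the rows 1, 2, ... within each cyl group
combos$combination <- ave(seq_len(nrow(combos)), combos$cyl, FUN = seq_along)
combos
```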
There are many ways to do this; this is the way I did it:
library(dplyr)
mtcars %>% group_by(cyl,carb) %>% summarize("count" = length(carb))
I am trying to get an individual frequency table for each variable using a loop and the dplyr package. An example of my code, using the mtcars data, is below:
library(dplyr)
var= c("vs", "am", "gear")
for (i in var){
mtcars %>%
group_by(carb) %>%
count(i)
}
Unfortunately, I only get:
Error: Column `i` is unknown
I also tried with
for (i in var){
mtcars %>%
group_by(carb) %>%
summarise_each(funs(n()), i)
}
But without success.
Any advice would be greatly appreciated.
We can use !!sym() for the variable names. I would also recommend to save the results to a list as follows.
var <- c("vs", "am", "gear")
library(dplyr)
count_tables <- list()
for (i in var){
temp <- mtcars %>%
group_by(carb) %>%
count(!!sym(i))
count_tables[[i]] <- temp
}
count_tables
# $vs
# # A tibble: 8 x 3
# # Groups: carb [6]
# carb vs n
# <dbl> <dbl> <int>
# 1 1 1 7
# 2 2 0 5
# 3 2 1 5
# 4 3 0 3
# 5 4 0 8
# 6 4 1 2
# 7 6 0 1
# 8 8 0 1
#
# $am
# # A tibble: 9 x 3
# # Groups: carb [6]
# carb am n
# <dbl> <dbl> <int>
# 1 1 0 3
# 2 1 1 4
# 3 2 0 6
# 4 2 1 4
# 5 3 0 3
# 6 4 0 7
# 7 4 1 3
# 8 6 1 1
# 9 8 1 1
#
# $gear
# # A tibble: 11 x 3
# # Groups: carb [6]
# carb gear n
# <dbl> <dbl> <int>
# 1 1 3 3
# 2 1 4 4
# 3 2 3 4
# 4 2 4 4
# 5 2 5 2
# 6 3 3 3
# 7 4 3 5
# 8 4 4 4
# 9 4 5 1
# 10 6 5 1
# 11 8 5 1
It is also common to use lapply to loop through a vector or a list to apply a function and return the objects as a list. The following generates the same output as the for-loop.
count_tables <- lapply(var, function(x) {
mtcars %>%
group_by(carb) %>%
count(!!sym(x))
})
names(count_tables) <- var
To pass the variable programmatically as a string, you can use the versions of those functions with an underscore at the end, such as count_, group_by_, etc. (note that these underscore variants are deprecated in newer dplyr releases).
In this case it would be:
for (i in var){
mtcars %>%
group_by(carb) %>%
count_(i) %>%
print()
}
You specifically asked for a for loop, but for your consideration, here is an lapply alternative, which makes it easier to store the different results in one place for later access:
lapply(var, FUN = function(i) mtcars %>% group_by(carb) %>% count_(i))
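Another option for string column names is the .data pronoun. The following is a sketch, assuming a dplyr version that supports .data[[ ]] inside count(); the names vars and count_tables are just for this example.

```r
library(dplyr)

vars <- c("vs", "am", "gear")

# .data[[v]] looks up the column whose name is stored in the string v
count_tables <- lapply(setNames(vars, vars), function(v) {
  mtcars %>%
    group_by(carb) %>%
    count(.data[[v]]) %>%
    ungroup()
})

names(count_tables)
```

Using setNames on the input means the resulting list is already named by variable, so each table can be retrieved as count_tables[["vs"]] and so on.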
I'm trying to delete rows that do not satisfy a condition,
e.g. remove every Subject that does not have a value for all periods.
The following is the data frame:
Subject Period
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
4 1
4 2
4 3
Expected output:
Subject Period
1 1
1 2
1 3
2 1
2 2
2 3
4 1
4 2
4 3
A dplyr solution.
library(dplyr)
dat %>%
group_by(Subject) %>%
filter(all(unique(dat$Period) %in% Period)) %>%
ungroup()
# # A tibble: 9 x 2
# Subject Period
# <int> <int>
# 1 1 1
# 2 1 2
# 3 1 3
# 4 2 1
# 5 2 2
# 6 2 3
# 7 4 1
# 8 4 2
# 9 4 3
A base R solution.
dat_list <- split(dat, f = dat$Subject)
keep_vec <- sapply(dat_list, function(x) all(unique(dat$Period) %in% x$Period))
dat_keep <- dat_list[keep_vec]
dat2 <- do.call(rbind, dat_keep)
dat2
# Subject Period
# 1.1 1 1
# 1.2 1 2
# 1.3 1 3
# 2.4 2 1
# 2.5 2 2
# 2.6 2 3
# 4.9 4 1
# 4.10 4 2
# 4.11 4 3
A solution using purrr and dplyr.
library(purrr)
library(dplyr)
dat2 <- dat %>%
split(f = .$Subject) %>%
keep(~all(unique(dat$Period) %in% .x$Period)) %>%
bind_rows()
dat2
# Subject Period
# 1 1 1
# 2 1 2
# 3 1 3
# 4 2 1
# 5 2 2
# 6 2 3
# 7 4 1
# 8 4 2
# 9 4 3
DATA
dat <- read.table(text = "Subject Period
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
4 1
4 2
4 3",
header = TRUE)
Consider ave for inline aggregation, then subset accordingly (this assumes the periods run from 1 to 3, so a complete subject has a maximum of 3):
sub_df <- subset(df, ave(Period, Subject, FUN=max) == 3)
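The ave approach can also count distinct periods per subject instead of relying on the hard-coded value 3. A sketch using the dat object from the DATA section above (the names n_periods, seen and complete are only for illustration):

```r
dat <- read.table(text = "Subject Period
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
4 1
4 2
4 3", header = TRUE)

# Number of distinct periods in the whole data set
n_periods <- length(unique(dat$Period))

# For every row, the number of distinct periods observed for that row's Subject
seen <- ave(dat$Period, dat$Subject, FUN = function(p) length(unique(p)))

# Keep only subjects that cover every period
complete <- dat[seen == n_periods, ]
complete
```

This also handles a subject that has period 3 but is missing an earlier one, which a comparison against the maximum would not catch.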