I am manipulating my dataset tidyverse-style, but the recode() call at the end of the pipeline does not work. Here is an example:
olddata <- data.frame(
x = rep(1,12),
var_a = sample(1:10, 12, replace = TRUE),
var_b = sample(1:10, 12, replace = TRUE),
var_c = sample(1:10, 12, replace = TRUE))
newdata <- olddata %>%
gather(var, type, var_a:var_c) %>%
separate(var, into = c("var", "role"), sep = -1) %>%
recode(role, "a"=1, "b"=2, "c"=3)
The error message says:
Error in UseMethod("recode") : no applicable method for 'recode' applied to an object of class "data.frame"
What is the problem here?
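gather has been superseded by pivot_longer, which can combine the gather and separate steps here. Note that recode has to be called inside mutate, because it works on a column (a vector), not on the whole data frame.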
library(dplyr)
library(tidyr)
olddata %>%
pivot_longer(cols = -x,
names_to = c('var', 'role'),
names_pattern = '(var_)(.*)') %>%
mutate(role = recode(role, "a"=1, "b"=2, "c"=3))
# x var role value
# <dbl> <chr> <dbl> <int>
# 1 1 var_ 1 6
# 2 1 var_ 2 1
# 3 1 var_ 3 9
# 4 1 var_ 1 4
# 5 1 var_ 2 6
# 6 1 var_ 3 7
# 7 1 var_ 1 5
# 8 1 var_ 2 8
# 9 1 var_ 3 8
#10 1 var_ 1 7
# … with 26 more rows
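As a minimal sketch of how names_pattern behaves (the data frame toy and its values are made up for illustration), each capture group in the regular expression fills the corresponding entry of names_to:

```r
library(tidyr)

# Hypothetical toy data with fixed values so the result is reproducible
toy <- data.frame(x = 1:2, var_a = c(10, 20), var_b = c(30, 40))

long <- pivot_longer(
  toy,
  cols = -x,
  names_to = c("var", "role"),     # one name per capture group
  names_pattern = "(var_)(.*)"     # group 1 -> var, group 2 -> role
)

long
```

Here the first group always matches the literal prefix "var_", so the var column is constant, and the second group picks up the trailing letter that later gets recoded.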
recode takes a vector as its first argument and returns a vector, so piping the whole data frame into it fails. If you're aiming for a tidyverse workflow, call it inside mutate to recode the column:
newdata <- olddata %>%
gather(var, type, var_a:var_c) %>%
separate(var, into = c("var", "role"), sep = -1) %>%
mutate( role = recode(role, "a"=1, "b"=2, "c"=3))
head(newdata)
x var role type
1 1 var_ 1 3
2 1 var_ 1 5
3 1 var_ 1 2
4 1 var_ 1 4
5 1 var_ 1 10
6 1 var_ 1 7
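To make the cause of the error concrete, here is a small sketch (the objects role, toy and fixed are illustrative names, not from the question). It assumes a dplyr version in which recode still dispatches on vectors only:

```r
library(dplyr)

role <- c("a", "b", "c", "a")

# recode() on a character vector returns a vector of the same length
recoded <- recode(role, "a" = 1, "b" = 2, "c" = 3)

toy <- data.frame(role = role)

# Passing the data frame itself (which is what the pipe does) raises an error;
# the message is captured here instead of stopping the script
err <- tryCatch(
  recode(toy, "a" = 1, "b" = 2, "c" = 3),
  error = function(e) conditionMessage(e)
)

# Inside mutate(), recode() receives the column, so it works
fixed <- toy %>% mutate(role = recode(role, "a" = 1, "b" = 2, "c" = 3))
```

The pipe always passes its left-hand side as the first argument, which is why the last step of the original pipeline handed recode a data frame.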
I have conflict data that looks like this
conflict_ID country_code SideA
1 1 1
1 2 1
1 3 0
2 4 1
2 5 0
Now I want to make it into dyadic conflict data that looks like this (SideA=1 should be country_code_1):
conflict_ID country_code_1 country_code_2
1 1 3
1 2 3
2 4 5
Can anyone point me in the right direction?
Here's a direct approach:
df %>%
filter(SideA == 1) %>%
select(conflict_ID, country_code_1 = country_code) %>%
left_join(
df %>%
filter(SideA == 0) %>%
select(conflict_ID, country_code_2 = country_code),
by = "conflict_ID"
)
# conflict_ID country_code_1 country_code_2
# 1 1 1 3
# 2 1 2 3
# 3 2 4 5
Using this data:
df = read.table(text = 'conflict_ID country_code SideA
1 1 1
1 2 1
1 3 0
2 4 1
2 5 0 ', header = T)
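For comparison, the same idea can be sketched in base R with merge. This is only an illustration of the join logic, not part of the answer above; side_a, side_b and dyads are names chosen here:

```r
# Same example data as in the question
df <- data.frame(
  conflict_ID  = c(1, 1, 1, 2, 2),
  country_code = c(1, 2, 3, 4, 5),
  SideA        = c(1, 1, 0, 1, 0)
)

# Split the countries by side and rename the code column accordingly
side_a <- df[df$SideA == 1, c("conflict_ID", "country_code")]
side_b <- df[df$SideA == 0, c("conflict_ID", "country_code")]
names(side_a)[2] <- "country_code_1"
names(side_b)[2] <- "country_code_2"

# all.x = TRUE keeps every SideA country, like left_join()
dyads <- merge(side_a, side_b, by = "conflict_ID", all.x = TRUE)
dyads <- dyads[order(dyads$conflict_ID, dyads$country_code_1), ]
dyads
```

Every SideA country is paired with every opposing country in the same conflict, so a conflict with several countries on both sides would yield one row per pair.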
This extends the previous question you posted. With the data stored in mydf, you could generate all pairs of countries within each conflict_ID, and then drop the pairs whose country_code_2 belongs to a country with SideA == 1.
library(dplyr)
library(tidyr)
mydf %>%
group_by(conflict_ID) %>%
summarise(country_code = combn(country_code, 2, sort, simplify = FALSE),
.groups = 'drop') %>%
unnest_wider(country_code, names_sep = '_') %>%
anti_join(filter(mydf, SideA == 1),
by = c("conflict_ID", "country_code_2" = "country_code"))
# # A tibble: 3 × 3
# conflict_ID country_code_1 country_code_2
# <int> <int> <int>
# 1 1 1 3
# 2 1 2 3
# 3 2 4 5
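The combination step can be looked at in isolation. The sketch below uses only base R and the three countries of conflict 1; codes, side_a, pairs and kept are illustrative names:

```r
# Countries involved in conflict 1, and those on SideA
codes  <- c(1, 2, 3)
side_a <- c(1, 2)

# All sorted pairs of countries, kept as a list of length-2 vectors
pairs <- combn(codes, 2, sort, simplify = FALSE)

# Drop pairs whose second element is a SideA country,
# which is what the anti_join() above does on country_code_2
kept <- Filter(function(p) !(p[2] %in% side_a), pairs)
kept
```

Only the pairs (1, 3) and (2, 3) remain, matching the first two rows of the output above. Note that this relies on the SideA countries having smaller codes than their opponents, as in the example data.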
I'd like to take a tibble (or dataframe), convert one of the columns to numeric, only select the same column plus a third column, and filter out NAs.
Given the following data:
library(tidyverse)
set.seed(1)
mytib <- tibble(a = as.character(c(1:5, NA)),
b = as.character(c(6:8, NA, 9:10)),
c = as.character(sample(x = c(0,1), size = 6, replace = TRUE)))
vars <- c("a", "b")
I have created the following function
convert_tib <- function(var, tib){
tib <- tib %>%
mutate("{var}" = as.numeric({{ var }})) %>%
dplyr::select({{ var }}, c) %>%
filter(!is.na({{ var }}))
return(tib)
}
And run it with purrr::map
map(vars, ~ convert_tib(var = ., tib = mytib))
The output of this code unfortunately does not convert the vector to numeric and it also doesn't filter out the NA. I have tried many different strategies such as ensym(var) and enquo(var) inside the function and leaving out the curly-curly operators.
What I'd like to get is the following:
> map(vars, ~ convert_tib(var = ., tib = mytib))
[[1]]
# A tibble: 5 × 2
a c
<int> <int>
1 1 0
2 2 1
3 3 0
4 4 0
5 5 1
[[2]]
# A tibble: 5 × 2
b c
<int> <int>
1 6 0
2 7 1
3 8 0
4 9 1
5 10 0
You may try this. I used the ensym function inside your custom function, since I noticed you would like to specify the variable names as strings. I then used !!, the bang-bang operator, to unquote the resulting symbol. Finally, you need := in place of = to define a custom variable name on the left-hand side:
library(dplyr)
library(rlang)
library(purrr)
convert_tib <- function(var, tib){
var <- ensym(var)
tib <- tib %>%
dplyr::select(!!var, c) %>%
mutate(!!var := as.integer(!!var),
c = as.integer(c)) %>%
filter(!is.na(!!var))
return(tib)
}
map(vars, convert_tib, mytib)
The output:
[[1]]
# A tibble: 5 x 2
a c
<int> <int>
1 1 0
2 2 1
3 3 0
4 4 0
5 5 1
[[2]]
# A tibble: 5 x 2
b c
<int> <int>
1 6 0
2 7 1
3 8 0
4 9 1
5 10 0
You can do this without injection or embracing:
library(dplyr)
library(purrr)
convert_tib <- function(tib, var) {
tib %>%
transmute(across(c(all_of(var), c), as.integer)) %>%
filter(!is.na(.data[[var]]))
}
map(vars, convert_tib, tib = mytib)
[[1]]
# A tibble: 5 x 2
a c
<int> <int>
1 1 0
2 2 1
3 3 0
4 4 0
5 5 1
[[2]]
# A tibble: 5 x 2
b c
<int> <int>
1 6 0
2 7 1
3 8 0
4 9 1
5 10 0
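When the column name arrives as a string, the .data pronoun and a glue-style name on the left of := are often enough. The following is a sketch under that assumption, with a made-up tibble toy rather than the data from the question:

```r
library(dplyr)

# Hypothetical input: character columns, one missing value in a
toy <- tibble(a = c("1", "2", NA), c = c("0", "1", "1"))
var <- "a"   # column name held in a string

out <- toy %>%
  # "{var}" on the left of := interpolates the string into the new name;
  # .data[[var]] looks up the column named by the string
  mutate("{var}" := as.integer(.data[[var]])) %>%
  filter(!is.na(.data[[var]]))

out
```

In the original function, {{ var }} is applied to a string, so as.numeric receives the text "a" rather than the column a, which would explain why nothing was converted or filtered.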
I want a function where I can enter a varying number of column names and have the data grouped by them. The first piece of code here works:
df <- data.frame(col_a = sample(1:10, 100, replace = T),
col_b = sample(letters, 100, replace = T),
col_c = sample(LETTERS, 100, replace = T))
my_fun = function(df, ...) {
df %>% group_by_(...) %>% summarise(n = n())
}
my_fun(df , 'col_a')
my_fun(df , 'col_a', 'col_b')
my_fun(df , 'col_a', 'col_b', 'col_c')
What I now want is to apply the complete function, so all possible values in each grouped variable are present. I've manually typed col_a and col_b into the complete() function below. I'd want to pass the possible values as a function argument though, as I'm not always going to be grouping by col_a & col_b.
my_fun = function(df, ...) {
df %>% group_by_(...) %>% summarise(count = n()) %>%
ungroup() %>%
complete(col_a = 1:10, col_b = letters, fill = list(count = 0))
}
my_fun(df , 'col_a', 'col_b')
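You can capture the possible values as a named list with list(...), where each name is a grouping column and each element holds the values to complete. group_by + summarise(n()) can be replaced with count.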
library(tidyverse)
my_fun = function(df, ...) {
args <- list(...)
df %>%
count(across(all_of(names(args))), name = 'count') %>%
complete(!!!args, fill = list(count = 0))
}
This can be run as:
my_fun(df , 'col_a' = 1:12)
# col_a count
# <int> <dbl>
# 1 1 9
# 2 2 15
# 3 3 4
# 4 4 11
# 5 5 7
# 6 6 12
# 7 7 12
# 8 8 10
# 9 9 5
#10 10 15
#11 11 0
#12 12 0
my_fun(df , 'col_a' = 1:10, 'col_b' = letters)
# col_a col_b count
# <int> <chr> <dbl>
# 1 1 a 1
# 2 1 b 0
# 3 1 c 0
# 4 1 d 0
# 5 1 e 0
# 6 1 f 1
# 7 1 g 0
# 8 1 h 0
# 9 1 i 0
#10 1 j 0
# … with 250 more rows
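Because the example data above is random, a small deterministic check can make the behaviour easier to verify. The sketch below reuses the same technique on made-up data; toy, count_complete and res are illustrative names:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with known counts
toy <- data.frame(col_a = c(1, 1, 2), col_b = c("x", "y", "x"))

count_complete <- function(df, ...) {
  args <- list(...)   # named list: column name -> values to complete
  df %>%
    count(across(all_of(names(args))), name = "count") %>%
    complete(!!!args, fill = list(count = 0))   # splice the list into complete()
}

res <- count_complete(toy, col_a = 1:3, col_b = c("x", "y"))
res
```

The three observed combinations keep a count of 1, and the combinations that never occur, including everything for col_a = 3, are filled with 0.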
I have a very simple case where I want to combine several data frames into one, keeping only the ids that occur in one particular data frame.
Example:
id <- c(1, 2, 3)
x <- c(10, 12, 14)
data1 <- data.frame(id, x)
id <- c(2, 3)
x <- c(20, 22)
data2 <- data.frame(id, x)
id <- c(1, 3)
x <- c(30, 32)
data3 <- data.frame(id, x)
Which gives us,
$data1
id x
1 1 10
2 2 12
3 3 14
$data2
id x
1 2 20
2 3 22
$data3
id x
1 1 30
2 3 32
Now, I want to combine all three data frames based on the ids in data3. The expected output should look like
> comb
id x
1 1 10
2 1 NA
3 1 30
4 3 14
5 3 22
6 3 32
I am trying the following, but not getting the expected output.
library(dplyr)
library(tidyr)
combined <- bind_rows(data1, data2, data3, .id = "id") %>% arrange(id)
Any idea how to get the expected output?
Does this work:
library(dplyr)
library(tidyr)
data1 %>% full_join(data2, by = 'id') %>% full_join(data3, by = 'id') %>% arrange(id) %>% right_join(data3, by = 'id') %>%
pivot_longer(cols = -id) %>% select(-name) %>% distinct()
# A tibble: 6 x 2
id value
<dbl> <dbl>
1 1 10
2 1 NA
3 1 30
4 3 14
5 3 22
6 3 32
Bind the 3 data frames into one with bind_rows, use filter to keep only the ids that occur in the 3rd data frame, and use complete to add the missing combinations as NA.
library(dplyr)
library(tidyr)
bind_rows(data1, data2, data3, .id = "new_id") %>%
filter(id %in% id[new_id == 3]) %>%
complete(new_id, id)
# new_id id x
# <chr> <dbl> <dbl>
#1 1 1 10
#2 1 3 14
#3 2 1 NA
#4 2 3 22
#5 3 1 30
#6 3 3 32
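The role of the .id argument may be easier to see on a smaller case. In this sketch (first, second, stacked and kept are made-up names) bind_rows labels each row with the position of the data frame it came from, and that label is then used to look up the ids to keep:

```r
library(dplyr)

first  <- data.frame(id = c(1, 2, 3), x = c(10, 12, 14))
second <- data.frame(id = c(1, 3),    x = c(30, 32))

# With unnamed inputs, the .id column holds "1", "2", ... as character
stacked <- bind_rows(first, second, .id = "source")

# Keep only the ids that appear in the second data frame
kept <- stacked %>% filter(id %in% id[source == "2"])
kept
```

Choosing a name for .id that does not clash with an existing column (here source rather than id) keeps the original id values available for the filter.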
A pure base R solution also works:
lst <- list(data1, data2, data3)
reshape(
subset(
reshape(
do.call(rbind, Map(cbind, lst, grp = seq_along(lst))),
idvar = "id",
timevar = "grp",
direction = "wide"
),
id %in% lst[[3]]$id
),
idvar = "id",
varying = -1,
direction = "long"
)[c("id", "x")]
which gives
id x
1.1 1 10
3.1 3 14
1.2 1 NA
3.2 3 22
1.3 1 30
3.3 3 32
Using base R
do.call(rbind, unname(lapply(mget(ls(pattern = "^data\\d+$")), \(x) {
x1 <- subset(x, id %in% data3$id)
v1 <- setdiff(data3$id, x1$id)
if(length(v1) > 0) rbind(x1, cbind(id = v1, x = NA)) else x1
})))
Output:
id x
1 1 10
3 3 14
2 3 22
11 1 NA
12 1 30
21 3 32
bind_rows(data1, data2, data3, .id = 'grp')%>%
complete(id, grp)%>%
select(-grp) %>%
filter(id%in%data3$id)
# A tibble: 6 x 2
id x
<dbl> <dbl>
1 1 10
2 1 NA
3 1 30
4 3 14
5 3 22
6 3 32
I am trying to gather() a data.frame, but somehow it is not doing what I want.
This is my data:
df <- data.frame("id" = c(1),
"reco_1"= c(2),
"sim_1" = c(2),
"title_1"= c(2),
"reco_2" = c(3),
"sim_2" = c(3),
"title_2"= c(3))
And this is what it looks like printed:
> df
id reco_1 sim_1 title_1 reco_2 sim_2 title_2
1 1 2 2 2 3 3 3
When I now gather() my df, it looks like this:
> df %>% gather(reco, sim, -id)
id reco sim
1 1 reco_1 2
2 1 sim_1 2
3 1 title_1 2
4 1 reco_2 3
5 1 sim_2 3
6 1 title_2 3
However, what I would like to have is the following structure:
id reco sim title
1 1 2 2 2
2 2 3 3 3
I would appreciate any help, since I do not even know whether gather() is even the right verb for it.
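We can use pivot_longer. The ".value" entry in names_to turns the first part of each column name (reco, sim, title) into its own output column, while the part after the underscore goes into new_id: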
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-id, names_to = c(".value", "new_id"), names_sep = "_") %>%
select(-id)
# A tibble: 2 x 4
new_id reco sim title
<chr> <dbl> <dbl> <dbl>
1 1 2 2 2
2 2 3 3 3
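A slightly different toy example may help show what ".value" does. The data frame wide and the column name wave below are invented for illustration, and the original id column is kept this time:

```r
library(tidyr)

# Hypothetical wide data: two measures (reco, sim) at two waves (1, 2)
wide <- data.frame(id = 1, reco_1 = 2, sim_1 = 20, reco_2 = 3, sim_2 = 30)

long <- pivot_longer(
  wide,
  cols = -id,
  names_to = c(".value", "wave"),  # part before "_" -> column name, part after -> wave
  names_sep = "_"
)

long
```

Each distinct suffix becomes one row, so the result has two rows with columns id, wave, reco and sim. The wave column is character; it could be converted with as.integer afterwards if a numeric index is needed.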