I have a data frame:
df = data.frame(col1=1:4, col2 = 5:8, col3 = 9:12)
I want to change the value in row 2 of col2 to 44.
In base R I use df["2","col2"] = 44. How do I do this in the tidyverse?
df = data.frame(col1=1:4, col2 = 5:8, col3 = 9:12)
df
df["2","col2"]=44
df
A possible solution:
library(dplyr)
df %>%
  mutate(col2 = ifelse(row_number() == 2, 44, col2))
#>   col1 col2 col3
#> 1    1    5    9
#> 2    2   44   10
#> 3    3    7   11
#> 4    4    8   12
Another option is rows_update() from dplyr, which matches rows on a key column (col1 here) and overwrites the other columns you supply:
rows_update(df, tibble(col1=2, col2=44))
# Matching, by = "col1"
# A tibble: 4 x 3
col1 col2 col3
<int> <int> <int>
1 1 5 9
2 2 44 10
3 3 7 11
4 4 8 12
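A further variant, offered only as a sketch and not taken from the answers above: base R's replace() can be used inside mutate() to overwrite a value by position. The name out is just an illustrative choice, and 44L is used on the assumption that col2 should stay an integer column.

```r
library(dplyr)

df <- data.frame(col1 = 1:4, col2 = 5:8, col3 = 9:12)

# replace() returns col2 with the element at position 2 swapped for 44L;
# the integer literal keeps the column type unchanged
out <- df %>%
  mutate(col2 = replace(col2, 2, 44L))
out
```

Unlike rows_update(), this addresses the row by position, so it needs no key column.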
I ask this because I feel like I've overcomplicated my current solution and I'm hoping to find something that makes more sense. I want to create a column that contains a sorted, comma-separated string of values taken from other columns. So I have a table like this:
dsA = tibble(
col1 = 21:25
, col2 = 16:20
, col3 = 11:15
, col4 = 6:10
)
   col1  col2  col3  col4
  <int> <int> <int> <int>
1    21    16    11     6
2    22    17    12     7
3    23    18    13     8
4    24    19    14     9
5    25    20    15    10
And I want to add a column of sorted values based on a subset of columns c("col2", "col3", "col4") in dsA
so I have this:
   col1  col2  col3  col4 idString
  <int> <int> <int> <int> <chr>
1    21    16    11     6 6, 11, 16
2    22    17    12     7 7, 12, 17
3    23    18    13     8 8, 13, 18
4    24    19    14     9 9, 14, 19
5    25    20    15    10 10, 15, 20
What I've done looks like this:
# columns to sort
sortCols <- c("col2", "col3", "col4")
# create list function
fnCreateList <- function(x)
  list(unname(x[names(x) %in% sortCols
                & !is.na(x)]))
# add the list to the tibble
dsA$colList <- apply(dsA, 1, fnCreateList)
# sort the list and convert to a string
dsA <- dsA %>%
  rowwise() %>%
  transmute(
    col1, col2, col3, col4
    , idString = toString(sort(unlist(colList)))
  )
The entire thing feels overly complex, and I don't think I'm seeing the correct solution.
library(tidyr)
dsA = tibble(
col1 = 21:25
, col2 = 16:20
, col3 = 11:15
, col4 = 6:10
)
dsA$idString <- apply(dsA[-1], 1, function(x) toString(sort(x)))
dsA
#> # A tibble: 5 x 5
#> col1 col2 col3 col4 idString
#> <int> <int> <int> <int> <chr>
#> 1 21 16 11 6 6, 11, 16
#> 2 22 17 12 7 7, 12, 17
#> 3 23 18 13 8 8, 13, 18
#> 4 24 19 14 9 9, 14, 19
#> 5 25 20 15 10 10, 15, 20
Created on 2021-05-28 by the reprex package (v2.0.0)
Perhaps just this:
library(dplyr)
dsA = tibble(
col1 = 21:25
, col2 = 16:20
, col3 = 11:15
, col4 = 6:10
)
# drop col1, then sort each row and paste the values together
v <- dsA %>% select(-col1) %>%
  apply(1, function(row) paste(sort(unlist(row)), collapse = ", "))
dsA$idString <- v
Will this work:
library(dplyr)
library(stringr)
dsA %>%
  rowwise() %>%
  mutate(id = str_c(sort(c_across(col2:col4)), collapse = ','))
# A tibble: 5 x 5
# Rowwise:
col1 col2 col3 col4 id
<int> <int> <int> <int> <chr>
1 21 16 11 6 6,11,16
2 22 17 12 7 7,12,17
3 23 18 13 8 8,13,18
4 24 19 14 9 9,14,19
5 25 20 15 10 10,15,20
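tidyr has a function, unite(), that pastes columns together. Note that it does not sort: the values are joined in the order the columns are listed. Listing them as col4:col2 gives the desired string here only because col4 < col3 < col2 in every row of this data.
library(tidyr)
dsA %>%
  unite(idString, col4:col2, sep = ", ", remove = FALSE)
You can pipe this output to select(all_of(names(dsA)), everything()) if you want to keep the original columns first, with idString at the end.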
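To make the difference between joining and sorting concrete, here is a small sketch on made-up data. The tibble dsB and its values are hypothetical (they are not from the question); they are chosen so that the column order does not match the value order and one value is missing. It reuses the apply() approach from the earlier answer, restricted to the sortCols subset named in the question.

```r
library(dplyr)

# hypothetical data: values are not ordered by column, and one is NA
dsB <- tibble(
  col1 = 1:3,
  col2 = c(16L, NA, 3L),
  col3 = c(11L, 12L, 20L),
  col4 = c(6L, 30L, 8L)
)

sortCols <- c("col2", "col3", "col4")

# sort() drops NA by default, so missing values simply vanish from the string
dsB$idString <- apply(dsB[sortCols], 1, function(x) toString(sort(x)))
dsB$idString
```

Selecting the columns by name (dsB[sortCols]) rather than by position (dsB[-1]) keeps the result correct if more columns are added later.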
I have a list with two dataframes, the first of which has two columns and the second of which has three.
dat.list<-list(dat1=data.frame(col1=c(1,2,3),
col2=c(10,20,30)),
dat2= data.frame(col1=c(5,6,7),
col2=c(30,40,50),
col3=c(7,8,9)))
# $dat1
# col1 col2
# 1 1 10
# 2 2 20
# 3 3 30
# $dat2
# col1 col2 col3
# 1 5 30 7
# 2 6 40 8
# 3 7 50 9
I am trying to create a new column in both dataframes using map(), mutate() and case_when(). I want this new column to be identical to col3 if the dataframe has more than two columns, and identical to col1 if it has two or fewer columns. I have tried to do this with the following code:
library(tidyverse)
dat.list %>% map(~ .x %>%
mutate(newcol=case_when(ncol(.)>2 ~ col3,
TRUE ~ col1),
))
However, this returns the following error: "object 'col3' not found". How can I get the desired output? Below is the exact output I am trying to achieve.
# $dat1
# col1 col2 newcol
# 1 1 10 1
# 2 2 20 2
# 3 3 30 3
# $dat2
# col1 col2 col3 newcol
# 1 5 30 7 7
# 2 6 40 8 8
# 3 7 50 9 9
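A plain if/else will do. case_when() evaluates every right-hand side regardless of the condition, so col3 is looked up even in the data frame that lacks it; if/else only evaluates the branch it takes: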
library(dplyr)
library(purrr)
dat.list %>% map(~ .x %>% mutate(newcol= if(ncol(.) > 2) col3 else col1))
#$dat1
# col1 col2 newcol
#1 1 10 1
#2 2 20 2
#3 3 30 3
#$dat2
# col1 col2 col3 newcol
#1 5 30 7 7
#2 6 40 8 8
#3 7 50 9 9
Base R using lapply :
lapply(dat.list, function(x) transform(x, newcol = if(ncol(x) > 2) col3 else col1))
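A variation on the same idea, as a sketch only: instead of counting columns, test whether col3 actually exists, choose the source column name first, and then pull it with dplyr's .data pronoun. The helper variable src and the result name out are illustrative, not part of the original answers.

```r
library(dplyr)
library(purrr)

dat.list <- list(
  dat1 = data.frame(col1 = c(1, 2, 3), col2 = c(10, 20, 30)),
  dat2 = data.frame(col1 = c(5, 6, 7), col2 = c(30, 40, 50), col3 = c(7, 8, 9))
)

out <- map(dat.list, function(d) {
  # decide on the column name outside mutate(), so a missing col3 is never evaluated
  src <- if ("col3" %in% names(d)) "col3" else "col1"
  mutate(d, newcol = .data[[src]])
})
out
```

Checking for the column by name avoids relying on the assumption that any data frame with more than two columns has one called col3.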
Following a question I came across today, I would like to know how I can use the bind_rows function in a pipe while avoiding duplicated rows and NA values. Consider the following simple tibble:
df <- tibble(
col1 = c(3, 4, 5),
col2 = c(5, 3, 1),
col3 = c(6, 4, 9),
col4 = c(9, 6, 5)
)
I would like to bind col1 & col2 row-wise with col3 & col4 so that I have a tibble with 2 columns and 6 observations, and finally rename the columns to newcol1 and newcol2.
But when I use bind_rows in a pipe I get the following output, with a lot of duplicated rows and NA values:
df %>%
bind_rows(
select(., 1:2),
select(., 3:4)
)
# A tibble: 9 x 4
col1 col2 col3 col4
<dbl> <dbl> <dbl> <dbl>
1 3 5 6 9
2 4 3 4 6
3 5 1 9 5
4 3 5 NA NA
5 4 3 NA NA
6 5 1 NA NA
7 NA NA 6 9
8 NA NA 4 6
9 NA NA 9 5
# My desired output would be something like this:
f1 <- function(x) {
df <- x %>%
set_names(nm = rep(c("newcol1", "newcol2"), 2))
bind_rows(df[, c(1, 2)], df[, c(3, 4)])
}
f1(df)
# A tibble: 6 x 2
newcol1 newcol2
<dbl> <dbl>
1 3 5
2 4 3
3 5 1
4 6 9
5 4 6
6 9 5
I can get the desired output without a pipe, but I would like to know, first, how to use bind_rows in a pipe without getting NA values and duplicated rows and, second, whether I can use select inside bind_rows; I remember Hadley Wickham once using filter wrapped in bind_rows.
I would appreciate any explanation to this problem and thank you in advance.
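The pipe passes df itself as the first argument of bind_rows(), in addition to the two select() calls, which is why you get 9 rows. Instead, select the first two columns, then bind_rows() a copy of col3 and col4 that transmute() has renamed to col1 and col2:
df1 <- df %>%
  select(col1, col2) %>%
  bind_rows(
    df %>%
      transmute(col1 = col3, col2 = col4)
  )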
Results:
# A tibble: 6 x 2
col1 col2
<dbl> <dbl>
1 3 5
2 4 3
3 5 1
4 6 9
5 4 6
6 9 5
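On the second part of the question (using select() inside bind_rows()): a minimal sketch, assuming the target names newcol1 and newcol2 from the question's desired output. select() can rename while it selects, and calling bind_rows() directly, instead of piping df into it, keeps df from being added as an extra first argument.

```r
library(dplyr)

df <- tibble(
  col1 = c(3, 4, 5),
  col2 = c(5, 3, 1),
  col3 = c(6, 4, 9),
  col4 = c(9, 6, 5)
)

# each select() picks one pair of columns and renames it on the fly,
# so both pieces share the same names and stack without NA padding
out <- bind_rows(
  select(df, newcol1 = col1, newcol2 = col2),
  select(df, newcol1 = col3, newcol2 = col4)
)
out
```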
I'm trying to concatenate a string that identifies the order of the columns by their value.
set.seed(100)
df <- tibble(id = 1:5,
col1 = sample(1:50, 5),
col2 = sample(1:50, 5),
col3 = sample(1:50, 5)) %>%
mutate_at(vars(-id), ~if_else(. <= 20, NA_integer_, .))
# A tibble: 5 x 4
id col1 col2 col3
<int> <int> <int> <int>
1 1 NA 44 NA
2 2 38 23 34
3 3 48 22 NA
4 4 25 NA 48
5 5 NA NA 43
res <- df %>%
add_column(order = c('col2',
'col2_col3_col1',
'col2_col1',
'col1_col3',
'col3'))
# A tibble: 5 x 5
id col1 col2 col3 order
<int> <int> <int> <int> <chr>
1 1 NA 44 NA col2
2 2 38 23 34 col2_col3_col1
3 3 48 22 NA col2_col1
4 4 25 NA 48 col1_col3
5 5 NA NA 43 col3
My current data is in the form of df, while the column I'm trying to add is the order column in res. The order of the elements in the string is determined by the value in each column, and NAs need to be skipped. The values are times in days, so the string identifies the sequence in which each ID populated its columns. Not all IDs have a value in every column, so there are missing values throughout. I usually work within the tidyverse, but any solution or thoughts would be much appreciated.
A simpler option is apply(): loop over the rows (MARGIN = 1), drop the NA elements, order() the remaining values, use that index to pick the column names, and paste them together.
df$order <- apply(df[-1], 1, function(x) {
  x1 <- x[!is.na(x)]
  paste(names(x1)[order(x1)], collapse = "_")
})
df$order
#[1] "col2" "col2_col3_col1" "col2_col1" "col1_col3" "col3"
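To see what happens inside that function, here is the same logic applied to a single row, written out step by step. The vector x mirrors row 2 of the question's data; the intermediate names ord and res are just for illustration.

```r
# one row of the data as a named vector (row 2 of df in the question)
x <- c(col1 = 38L, col2 = 23L, col3 = 34L)

# drop missing values (none in this row)
x1 <- x[!is.na(x)]

# order() gives the positions of the values from smallest to largest
ord <- order(x1)

# index the names with those positions and join them
res <- paste(names(x1)[ord], collapse = "_")
res
```

Here the smallest value is in col2, followed by col3 and then col1, so the names come out in that sequence.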
Or using tidyverse
library(dplyr)
library(tidyr)
library(stringr)
df %>%
  pivot_longer(cols = -id, values_drop_na = TRUE) %>%
  arrange(id, value) %>%
  group_by(id) %>%
  summarise(order = str_c(name, collapse="_")) %>%
  right_join(df, by = "id") %>%
  select(names(df), order)
# A tibble: 5 x 5
# id col1 col2 col3 order
# <int> <int> <int> <int> <chr>
#1 1 NA 44 NA col2
#2 2 38 23 34 col2_col3_col1
#3 3 48 22 NA col2_col1
#4 4 25 NA 48 col1_col3
#5 5 NA NA 43 col3
Or using pmap from purrr
library(purrr)
df %>%
  mutate(order = pmap_chr(select(., starts_with('col')), ~ {
    x <- c(...)
    x1 <- x[!is.na(x)]
    str_c(names(x1)[order(x1)], collapse="_")
  }))
I have the following tibble:
library(tidyverse)
df <- tibble::tribble(
~gene, ~colB, ~colC,
"a", 1, 2,
"b", 2, 3,
"c", 3, 4,
"d", 1, 1
)
df
#> # A tibble: 4 x 3
#> gene colB colC
#> <chr> <dbl> <dbl>
#> 1 a 1 2
#> 2 b 2 3
#> 3 c 3 4
#> 4 d 1 1
What I want to do is filter every column after the gene column for values greater than or equal to 2 (>= 2), resulting in this:
gene  colB  colC
a       NA     2
b        2     3
c        3     4
How can I achieve that?
In practice there are more than just two columns after gene.
One solution: convert from wide to long format, so you can filter on just one column, then convert back to wide at the end if required. Note that this will drop genes where no values meet the condition.
library(tidyverse)
df %>%
gather(name, value, -gene) %>%
filter(value >= 2) %>%
spread(name, value)
# A tibble: 3 x 3
gene colB colC
* <chr> <dbl> <dbl>
1 a NA 2
2 b 2 3
3 c 3 4
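gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(). A sketch of the same wide-to-long-to-wide approach with the newer functions follows; the result name out is illustrative. One difference to be aware of: pivot_wider() creates columns in order of first appearance, so names_sort = TRUE is used here to get colB before colC, as spread() did.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a",   1,     2,
  "b",   2,     3,
  "c",   3,     4,
  "d",   1,     1
)

out <- df %>%
  pivot_longer(-gene) %>%   # long format: gene, name, value
  filter(value >= 2) %>%    # keep only values meeting the condition
  pivot_wider(names_from = name, values_from = value,
              names_sort = TRUE)   # back to wide; dropped cells become NA
out
```

As with the gather()/spread() version, genes with no qualifying value (d here) disappear from the result.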
The forthcoming dplyr 0.6 (install from GitHub now, if you like) has filter_at, which can be used to keep any rows that have a value greater than or equal to 2. The values below 2 can then be set to NA through mutate_at with replace() (na_if() is not suitable here, because it tests for equality with its second argument and does not take a condition), so
df %>%
  filter_at(vars(-gene), any_vars(. >= 2)) %>%
  mutate_at(vars(-gene), funs(replace(., . < 2, NA)))
#> # A tibble: 3 x 3
#>    gene  colB  colC
#>   <chr> <dbl> <dbl>
#> 1     a    NA     2
#> 2     b     2     3
#> 3     c     3     4
or similarly,
df %>%
  mutate_at(vars(-gene), funs(replace(., . < 2, NA))) %>%
  filter_at(vars(-gene), any_vars(!is.na(.)))
which can be translated for use with dplyr 0.5:
df %>%
  mutate_at(vars(-gene), funs(replace(., . < 2, NA))) %>%
  filter(rowSums(is.na(.)) < (ncol(.) - 1))
All return the same thing.
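In dplyr 1.0 and later the scoped verbs (filter_at, mutate_at) and funs() have been superseded by across() and if_any(). A sketch of the equivalent pipeline follows, assuming dplyr 1.0.4 or newer (where if_any() is available); the result name out is illustrative.

```r
library(dplyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a",   1,     2,
  "b",   2,     3,
  "c",   3,     4,
  "d",   1,     1
)

out <- df %>%
  # keep rows where at least one non-gene column is >= 2
  filter(if_any(-gene, ~ .x >= 2)) %>%
  # blank out the remaining values below 2
  mutate(across(-gene, ~ replace(.x, .x < 2, NA)))
out
```

Because both steps select columns with -gene, the pipeline works unchanged however many value columns follow gene.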
We can use data.table
library(data.table)
setDT(df)[df[, Reduce(`|`, lapply(.SD, `>=`, 2)), .SDcols = colB:colC]
][, (2:3) := lapply(.SD, function(x) replace(x, x < 2, NA)), .SDcols = colB:colC][]
# gene colB colC
#1: a NA 2
#2: b 2 3
#3: c 3 4
Or with melt/dcast
dcast(melt(setDT(df), id.var = 'gene')[value>=2], gene ~variable)
# gene colB colC
#1: a NA 2
#2: b 2 3
#3: c 3 4
Alternatively, we could try the code below:
df %>%
  rowwise %>%
  filter(any(c_across(starts_with('col'))>=2)) %>%
  mutate(across(starts_with('col'), ~ifelse(!(.>=2), NA, .)))
Created on 2023-02-05 with reprex v2.0.2
# A tibble: 3 × 3
# Rowwise:
gene colB colC
<chr> <dbl> <dbl>
1 a NA 2
2 b 2 3
3 c 3 4