R - How to recode multiple columns [duplicate] - r

This question already has answers here:
Replacing character values with NA in a data frame
(7 answers)
Closed 1 year ago.
I am trying to change the 6s to NAs across multiple columns. I have tried using the mutate_at command in dplyr, but can't seem to make it work. Any ideas?
library(dplyr)
ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) #Create vector of IDs for ID column.
Score1 <- c(1, 2, 3, 2, 5, 6, 6, 2, 5, 4) #Create vector of scores for Score1 column.
Score2 <- c(2, 2, 3, 6, 5, 6, 6, 2, 3, 4) #Create vector of scores for Score2 column.
Score3 <- c(3, 2, 3, 4, 5, 5, 6, 2, 6, 4) #Create vector of scores for Score3 column.
df <- data.frame(ID, Score1, Score2, Score3) #Combine columns into a data frame.
VectorOfNames <- as.vector(c("Score1", "Score2", "Score3")) #Create a vector of column names.
df <- mutate_at(df, VectorOfNames, 6=NA) #Within the data frame, apply the function (6=NA) to the columns specified in VectorOfNames.

dplyr has the na_if() function for precisely this task. You were almost there with your code and can use:
mutate_at(df, VectorOfNames, ~na_if(.x, 6))
ID Score1 Score2 Score3
1 1 1 2 3
2 2 2 2 2
3 3 3 3 3
4 4 2 NA 4
5 5 5 5 5
6 6 NA NA 5
7 7 NA NA NA
8 8 2 2 2
9 9 5 3 NA
10 10 4 4 4
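In recent dplyr releases mutate_at() is superseded by across(), so the same recoding can also be written as below. This is a minimal sketch assuming dplyr 1.0 or later; the names score_cols and df_na are only illustrative.

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  ID     = 1:10,
  Score1 = c(1, 2, 3, 2, 5, 6, 6, 2, 5, 4),
  Score2 = c(2, 2, 3, 6, 5, 6, 6, 2, 3, 4),
  Score3 = c(3, 2, 3, 4, 5, 5, 6, 2, 6, 4)
)

# Columns to recode (illustrative name, plays the role of VectorOfNames).
score_cols <- c("Score1", "Score2", "Score3")

# all_of() selects columns from a character vector;
# na_if() turns every 6 in those columns into NA and leaves ID untouched.
df_na <- df %>%
  mutate(across(all_of(score_cols), ~ na_if(.x, 6)))
```

Because ID is not listed in score_cols, the 6 in the ID column is kept.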

You could use :
library(dplyr)
df %>% mutate_at(VectorOfNames, ~ replace(., . == 6, NA))
#OR
#df %>% mutate_at(VectorOfNames, ~ ifelse(. == 6, NA, .))
# ID Score1 Score2 Score3
#1 1 1 2 3
#2 2 2 2 2
#3 3 3 3 3
#4 4 2 NA 4
#5 5 5 5 5
#6 6 NA NA 5
#7 7 NA NA NA
#8 8 2 2 2
#9 9 5 3 NA
#10 10 4 4 4
Or in base R :
df[VectorOfNames][df[VectorOfNames] == 6] <- NA

Related

Apply same recoding rules to multiple data frames

I have 5 data frames. I want to recode all variables ending with "_comfort", "_agree", and "_effective" using the same rules for each data frame. As is, the values in each column are 1:5, and I want to recode 5's to 1, 4's to 2, 2's to 4, and 1's to 5 (3 will stay the same).
I do not want the end result to be one merged dataset, but instead want to apply the same recoding rules across all 5 independent data frames. For simplicity's sake, let's just assume I have 2 data frames:
df1 <- data.frame(a_comfort = c(1, 2, 3, 4, 5),
b_comfort = c(1, 2, 3, 4, 5),
c_effective = c(1, 2, 3, 4, 5))
df2 <- data.frame(a_comfort = c(1, 2, 3, 4, 5),
b_comfort = c(1, 2, 3, 4, 5),
c_effective = c(1, 2, 3, 4, 5))
What I want is:
df1 <- data.frame(a_comfort = c(5, 4, 3, 2, 1),
b_comfort = c(5, 4, 3, 2, 1),
c_effective = c(5, 4, 3, 2, 1))
df2 <- data.frame(a_comfort = c(5, 4, 3, 2, 1),
b_comfort = c(5, 4, 3, 2, 1),
c_effective = c(5, 4, 3, 2, 1))
Conventionally, I would use dplyr's mutate_at and ends_with to achieve my goal, but I have not been successful with this method across multiple data frames. I am thinking a combination of the purrr and dplyr packages will work, but I haven't nailed down the exact technique.
Thanks in advance for any help!
You can use get() and assign() in a loop:
library(dplyr)
for (df_name in c("df1", "df2")) {
df <- mutate(
get(df_name),
across(
ends_with(c("_comfort", "_agree", "_effective")),
\(x) 6 - x
)
)
assign(df_name, df)
}
Result:
#> df1
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
#> df2
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
Note, however, that it's often better practice to keep multiple related data frames in a list than loose in the global environment. In this case, you can use purrr::map() (or base::lapply()):
library(dplyr)
library(purrr)
dfs <- list(df1, df2)
dfs <- map(
dfs,
\(df) mutate(
df,
across(
ends_with(c("_comfort", "_agree", "_effective")),
\(x) 6 - x
)
)
)
Result:
#> dfs
[[1]]
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
[[2]]
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
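For comparison, the base::lapply() route mentioned above might look like the sketch below. It uses a named list so each data frame keeps its name, and it adds an extra id column (an assumption, not in the question) to show that non-matching columns are left alone. The helper name reverse_likert is hypothetical.

```r
# Example data; the id column is an added assumption for illustration.
df1 <- data.frame(a_comfort = 1:5, b_comfort = 1:5, c_effective = 1:5, id = 101:105)
df2 <- data.frame(a_comfort = 1:5, b_comfort = 1:5, c_effective = 1:5, id = 201:205)

# A named list keeps track of which data frame is which.
dfs <- list(df1 = df1, df2 = df2)

# Hypothetical helper: reverse a 1-5 scale in every column whose name
# ends with one of the given suffixes.
reverse_likert <- function(df, suffixes = c("_comfort", "_agree", "_effective")) {
  pattern <- paste0("(", paste(suffixes, collapse = "|"), ")$")
  cols <- grep(pattern, names(df), value = TRUE)
  df[cols] <- lapply(df[cols], function(x) 6 - x)
  df
}

dfs <- lapply(dfs, reverse_likert)
```

The recoded data frames are then available as dfs$df1 and dfs$df2.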
You can use ls(pattern = 'df\\d+') to find all objects whose names match a certain pattern. Then store them into a list and pass to purrr::map or lapply for recoding.
library(dplyr)
df.lst <- purrr::map(
mget(ls(pattern = 'df\\d+')),
~ .x %>% mutate(6 - across(ends_with(c("_comfort", "_agree", "effective"))))
)
# $df1
# a_comfort b_comfort c_effective
# 1 5 5 5
# 2 4 4 4
# 3 3 3 3
# 4 2 2 2
# 5 1 1 1
#
# $df2
# a_comfort b_comfort c_effective
# 1 5 5 5
# 2 4 4 4
# 3 3 3 3
# 4 2 2 2
# 5 1 1 1
You can further overwrite those dataframes in your workspace from the list through list2env().
list2env(df.lst, .GlobalEnv)
Please try the code below, where I convert the columns to factors and then recode them. Note that the recoded columns are factors, not numbers.
data
a_comfort b_comfort c_effective
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
code
library(tidyverse)
df1 %>% mutate(across(c(ends_with('comfort'),ends_with('effective')), ~ factor(.x, levels=c('1','2','3','4','5'), labels=c('5','4','3','2','1'))))
output
a_comfort b_comfort c_effective
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
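If numeric columns are needed after the factor-based recoding, the labels have to be converted back explicitly. The sketch below assumes dplyr 1.0 or later; the names recoded and numeric_again are illustrative. It also shows the usual pitfall: calling as.numeric() directly on a factor returns the internal level codes rather than the labels.

```r
library(dplyr)

df1 <- data.frame(a_comfort   = c(1, 2, 3, 4, 5),
                  b_comfort   = c(1, 2, 3, 4, 5),
                  c_effective = c(1, 2, 3, 4, 5))

# Recode through a factor: level 1 gets label "5", level 2 gets "4", and so on.
recoded <- df1 %>%
  mutate(across(ends_with(c("comfort", "effective")),
                ~ factor(.x, levels = 1:5, labels = 5:1)))

# Pitfall: this returns the level codes (1, 2, 3, 4, 5), not the labels.
codes <- as.numeric(recoded$a_comfort)

# Go through character to recover the labels as numbers (5, 4, 3, 2, 1).
numeric_again <- recoded %>%
  mutate(across(where(is.factor), ~ as.numeric(as.character(.x))))
```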

Completing the NAs of a Tibble in R

I have a database in R where there are some NAs in the variables. I would like to apply a logic function where the NAs would be filled with the immediately preceding value. Below is an example:
dados <- tibble::tibble(x = c(2, 3, 5, NA, 2, 1, NA, NA, 9, 3),
y = c(4, 1, 9, NA, 8, 5, NA, NA, 1, 2)
)
# A tibble: 10 x 2
x y
<dbl> <dbl>
1 2 4
2 3 1
3 5 9
4 NA NA
5 2 8
6 1 5
7 NA NA
8 NA NA
9 9 1
10 3 2
In this case, the 4th value of the variable x would be filled with a 5 and so on.
Thank you!
We could use fill from tidyr package:
library(tidyr)
library(dplyr)
dados %>%
fill(c(x,y), .direction = "down")
x y
<dbl> <dbl>
1 2 4
2 3 1
3 5 9
4 5 9
5 2 8
6 1 5
7 1 5
8 1 5
9 9 1
10 3 2
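We can use coalesce with lag. Note that this fills each NA only from the row immediately above, so the second NA in a run (row 8 below) stays NA: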
library(dplyr)
dados %>%
mutate(across(x:y, ~ coalesce(., lag(.))))
# A tibble: 10 x 2
x y
<dbl> <dbl>
1 2 4
2 3 1
3 5 9
4 5 9
5 2 8
6 1 5
7 1 5
8 NA NA
9 9 1
10 3 2
A similar approach uses case_when with lag; it has the same limitation for consecutive NAs:
library(dplyr)
dados %>%
mutate(x = case_when(is.na(x) ~ lag(x),
TRUE ~ x),
y = case_when(is.na(y) ~ lag(y),
TRUE ~ y))
The following works only if the first value in the column is not NA; handling that case is left as an exercise, to keep the code clear and simple. For one column, we can do:
library(tibble)
dados <- tibble::tibble(x = c(2, 3, 5, NA, 2, 1, NA, NA, 9, 3),
y = c(4, 1, 9, NA, 8, 5, NA, NA, 1, 2)
)
#where are the NA?
pos <- dados$x |>
is.na() |>
which()
# replace
while(any(is.na(dados$x)))
dados$x[pos] <- dados$x[pos-1]
dados
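The loop above can be generalised to every column, and to a leading NA, with a small base R helper. This is only a sketch: the function name locf is hypothetical, and a plain data.frame is used here instead of a tibble to keep the example free of package dependencies.

```r
dados <- data.frame(x = c(2, 3, 5, NA, 2, 1, NA, NA, 9, 3),
                    y = c(4, 1, 9, NA, 8, 5, NA, NA, 1, 2))

# Hypothetical helper: carry the last non-NA value forward.
locf <- function(v) {
  # For each position, count how many non-NA values have been seen so far
  # (0 means nothing has been seen yet).
  seen <- cumsum(!is.na(v))
  # Prepend NA so that a count of 0 maps to NA, then look up by count.
  c(NA, v[!is.na(v)])[seen + 1]
}

# Apply to every column while keeping the data frame structure.
dados[] <- lapply(dados, locf)
```

A leading NA stays NA because there is no earlier value to copy.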

How to bind tibbles by row with different number of columns in R [duplicate]

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed 2 years ago.
I want to bind df1 with df2 by row, keeping the same column name, to obtain df3.
library(tidyverse)
df1 <- tibble(a = c(1, 2, 3),
b = c(4, 5, 6),
c = c(1, 5, 7))
df2 <- tibble(a = c(8, 9),
b = c(5, 6))
# how to bind these tibbles by row to get
df3 <- tibble(a = c(1, 2, 3, 8, 9),
b = c(4, 5, 6, 5, 6),
c = c(1, 5, 7, NA, NA))
Created on 2020-10-30 by the reprex package (v0.3.0)
Try this using bind_rows() from dplyr. Updated, with credit to @AbdessabourMtk:
df3 <- dplyr::bind_rows(df1,df2)
Output:
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 4 1
2 2 5 5
3 3 6 7
4 8 5 NA
5 9 6 NA
A base R option
df2[setdiff(names(df1),names(df2))]<-NA
df3 <- rbind(df1,df2)
giving
> df3
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 4 1
2 2 5 5
3 3 6 7
4 8 5 NA
5 9 6 NA
We can use rbindlist from data.table
library(data.table)
rbindlist(list(df1, df2), fill = TRUE)
Output:
# a b c
#1: 1 4 1
#2: 2 5 5
#3: 3 6 7
#4: 8 5 NA
#5: 9 6 NA

Alternative to dcast in data.table? [duplicate]

This question already has an answer here:
R spreading multiple columns with tidyr [duplicate]
(1 answer)
Closed 5 years ago.
I'm using the below method to cast variables in a dataframe from long to wide format. However, I'm looking for an alternative way, using another package.
Any help is much appreciated.
subject <- c(1:10, 1:10)
condition <- c(rep(1,10), rep(2,10))
value <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
rating <- c(1, 3, 5, 2, 3, 5, 6, 7, 5, 3, 5, 7, 3, 6, 3, 5, 6, 7, 7, 8)
df <- data.frame(subject, condition, value, rating)
library(data.table)
df_wide <- dcast(setDT(df), subject ~ condition, value.var=c("rating", "value"))
We can use tidyverse
library(tidyverse)
df %>%
gather(key, val, value:rating) %>%
unite(cond, key, condition) %>%
spread(cond, val)
# subject rating_1 rating_2 value_1 value_2
#1 1 1 5 1 1
#2 2 3 7 2 2
#3 3 5 3 3 3
#4 4 2 6 4 4
#5 5 3 3 5 5
#6 6 5 5 1 1
#7 7 6 6 2 2
#8 8 7 7 3 3
#9 9 5 7 4 4
#10 10 3 8 5 5
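In tidyr 1.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Assuming such a version is installed, the same reshape can be sketched in one step, because pivot_wider() accepts several value columns at once:

```r
library(tidyr)

subject   <- c(1:10, 1:10)
condition <- c(rep(1, 10), rep(2, 10))
value     <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
rating    <- c(1, 3, 5, 2, 3, 5, 6, 7, 5, 3, 5, 7, 3, 6, 3, 5, 6, 7, 7, 8)
df <- data.frame(subject, condition, value, rating)

# One output column per combination of value column and condition,
# named <column>_<condition> by default.
df_wide <- pivot_wider(df,
                       names_from  = condition,
                       values_from = c(rating, value))
```

The result has one row per subject and the columns rating_1, rating_2, value_1 and value_2.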

Linear interpolation in data.table [duplicate]

I would like to perform a linear interpolation on a variable of a data frame, taking into account: 1) the time difference between the two points, 2) the moment when the data was taken, and 3) the individual on which the variable was measured.
For example, given the following data frame:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
df
I would like to obtain:
result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, 4, 5, 6, 7, 5, 5.5, 6))
result
I cannot simply apply na.approx from the zoo package to the whole column, because the observations are not all consecutive: some belong to one individual and some to another. If the second individual's first observation were NA, na.approx on the whole column would use information from individual 1 to interpolate the NA of individual 2 (for example, the following data frame would produce such an error):
df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
df_2
I have tried using the packages zoo and dplyr:
library(dplyr)
library(zoo)
proof <- df %>%
group_by(Individuals) %>%
na.approx(df$Value)
But I cannot perform group_by in a zoo object.
Do you know how to interpolate NA values in one variable by groups?
Thanks in advance,
Use na.approx inside mutate, after group_by(Individuals), so that each individual is interpolated separately. Setting na.rm=FALSE keeps leading and trailing NAs in place, so the result has the same length as the input. The example data below differ slightly from the question's, to include a leading NA:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))
library(dplyr)
library(zoo)
df %>%
group_by(Individuals) %>%
mutate(ValueInterp = na.approx(Value, na.rm=FALSE))
time Individuals Value ValueInterp
1 1 1 NA NA
2 2 1 2 2
3 3 1 3 3
4 4 1 NA 4
5 5 1 5 5
6 6 1 NA 6
7 7 1 7 7
8 1 2 8 8
9 2 2 NA 9
10 3 2 10 10
Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. We use mutate_at to run na.approx on all columns that include "Value" in the column name. Passing the named list list(interp=na.approx) tells mutate_at to keep the original columns and store the results in new columns with interp appended as a suffix:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)
df %>%
group_by(Individuals) %>%
mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)
time Individuals Value1 Value2 Value1_interp Value2_interp
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 NA NA NA NA
2 2 1 2 4 2 4
3 3 1 3 6 3 6
4 4 1 NA NA 4 8
5 5 1 5 10 5 10
6 6 1 NA NA 6 12
7 7 1 7 14 7 14
8 1 2 8 16 8 16
9 2 2 NA NA 9 18
10 3 2 10 20 10 20
If you don't want to preserve the original, uninterpolated columns, you can do:
df %>%
group_by(Individuals) %>%
mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
We can use data.table
library(data.table)
library(zoo)
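# na.rm = FALSE keeps leading/trailing NAs so each group's result has the same length as its input
setDT(df)[, ValueInterp := na.approx(Value, na.rm = FALSE), by = Individuals]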
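The question also asks for the time difference between points to be taken into account. One way to do that without extra packages is stats::approx(), using the time column as the x coordinate within each individual. This is a sketch only: the helper name interp_group is hypothetical, and groups with fewer than two observed values are simply left unchanged.

```r
df <- data.frame(time        = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3),
                 Individuals = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                 Value       = c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))

# Hypothetical helper: interpolate Value against time within one individual.
interp_group <- function(d) {
  ok <- !is.na(d$Value)
  if (sum(ok) >= 2) {
    # approx() returns NA outside the observed time range by default,
    # so leading or trailing NAs are not extrapolated.
    d$ValueInterp <- approx(d$time[ok], d$Value[ok], xout = d$time)$y
  } else {
    d$ValueInterp <- d$Value
  }
  d
}

# Split by individual, interpolate each piece, and stack the pieces again.
result <- do.call(rbind, lapply(split(df, df$Individuals), interp_group))
```

Because each individual is processed separately, values from one individual are never used to fill gaps in another.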
