Updating individual values (not rows) in an R data.frame

I would like to update values of var3 in an R data.frame mydata according to a simple criterion.
var1 var2 var3
1 1 4 5
2 3 58 800
3 8 232 8
I would think that the following should do:
mydata$var3[mydata$var3 > 500,] <- NA
However, this replaces the entire row of every matching record with NA (all cells of the row), instead of just the var3 value (cell):
var1 var2 var3
1 1 4 5
2 NA NA NA
3 8 232 8
How can I ensure that just the value for the selected variable is replaced? mydata should then look like
var1 var2 var3
1 1 4 5
2 3 58 NA
3 8 232 8

Use which and arr.ind=TRUE
> mydata[which(mydata[,3]>500, arr.ind=TRUE), 3] <- NA
> mydata
var1 var2 var3
1 1 4 5
2 3 58 NA
3 8 232 8
Or just modify your previous attempt...
mydata[mydata$var3 > 500, 3] <- NA
This also works
mydata$var3[mydata$var3 > 500 ] <- NA # note no comma is inside [ ]
Your attempt didn't work because mydata$var3 is a vector, and indexing it with [mydata$var3 > 500,] treats it as if it were a matrix, so a dimension error is thrown. You almost had it: all you have to do is remove the comma in your code (see my last alternative).
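A minimal, self-contained sketch that rebuilds mydata from the question's table and checks that the three suggested forms give identical results, and that the original comma version errors. The helper make_data() is a hypothetical name used only for this illustration. One practical reason to prefer which(): if var3 already contains NA, a logical row index containing NA is rejected by data-frame assignment, whereas which() simply drops the NA comparisons.

```r
# Hypothetical helper: rebuild the example data frame from the question
make_data <- function() {
  data.frame(var1 = c(1, 3, 8), var2 = c(4, 58, 232), var3 = c(5, 800, 8))
}

# 1. Row/column indexing on the data frame itself
d1 <- make_data()
d1[d1$var3 > 500, "var3"] <- NA

# 2. Logical indexing on the extracted column vector (note: no comma)
d2 <- make_data()
d2$var3[d2$var3 > 500] <- NA

# 3. which() returns the matching row numbers; arr.ind only matters for matrices
d3 <- make_data()
d3[which(d3$var3 > 500), "var3"] <- NA

# All three change only var3 in the matching row
stopifnot(identical(d1, d2), identical(d2, d3))
print(d1)

# The original attempt fails: a plain vector has only one dimension
res <- tryCatch({
  d4 <- make_data()
  d4$var3[d4$var3 > 500, ] <- NA
  "no error"
}, error = function(e) conditionMessage(e))
print(res)
```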

Related

How to call column names from an object in dplyr?

I am trying to replace all zeros in multiple columns with NA using dplyr.
However, since I have many variables, I do not want to call them all by one, but rather store them in an object that I can call afterwards.
This is a minimal example of what I did:
library(dplyr)
Data <- data.frame(var1=c(1:10), var2=rep(c(0,4),5), var3 = rep(c(2,0,3,4,5),2), var4 = rep(c(7,0),5))
col <- Data[,c(2:4)]
Data <- Data %>%
mutate(across(col , na_if, 0))
However, if I do this, I get the following error message:
Error: Problem with 'mutate()' input '..1'.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type 'data.frame<
var2: double
var3: double
var4: double>'.
i It must be numeric or character.
i Input '..1' is '(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...'.
I have tried to change the format of col to a tibble, but that did not help.
Could anyone tell me how to make this work?
If you want to target numeric columns only, try a selection helper like where(), which selects every variable for which the given function returns TRUE. The main benefit here is targeting a specific type of variable.
library(dplyr)
# where(is.numeric) selects var1, var2, var3, and var4
# Note: is.numeric() is also TRUE for integers, so var1 is selected too;
# it is left unchanged only because it contains no zeros
# Useful when you want to completely ignore a specific type of variable
Data <- data.frame(
var1 = c(1:10),
var2 = rep(c(0, 4),5),
var3 = rep(c(2, 0 ,3, 4, 5), 2),
var4 = rep(c(7, 0), 5)
)
Data %>%
mutate(across(where(is.numeric), ~na_if(., 0)))
Here is the output:
var1 var2 var3 var4
1 1 NA 2 7
2 2 4 NA NA
3 3 NA 3 7
4 4 4 4 NA
5 5 NA 5 7
6 6 4 2 NA
7 7 NA NA 7
8 8 4 3 NA
9 9 NA 4 7
10 10 4 5 NA
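A short sketch checking which columns where(is.numeric) actually picks up. Since is.numeric() is TRUE for the integer column var1 as well, genuinely skipping integer columns needs a stricter predicate; is.double() is used below as one option (it is a base R function, but choosing it here is an illustration, not part of the answer above).

```r
library(dplyr)

Data <- data.frame(
  var1 = c(1:10),                  # integer column
  var2 = rep(c(0, 4), 5),          # double columns
  var3 = rep(c(2, 0, 3, 4, 5), 2),
  var4 = rep(c(7, 0), 5)
)

# is.numeric() is TRUE for every column, including the integer var1
print(sapply(Data, is.numeric))
# is.double() is FALSE for var1
print(sapply(Data, is.double))

# Restrict the replacement to double columns, so var1 is never touched
out <- Data %>%
  mutate(across(where(is.double), ~ na_if(.x, 0)))
print(out)
```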
The other answer you'll find here is great and allows you to select any arbitrary number of columns.
Here, col should hold the column names of Data, not a subset of the data itself. Because col is also the name of a base R function, it is better to give the object a different name; then wrap it with all_of() and replace 0 with NA inside across().
library(dplyr)
col1 <- names(Data)[2:4]
Data <- Data %>%
mutate(across(all_of(col1), ~ na_if(.x, 0)))
Output:
Data
# var1 var2 var3 var4
#1 1 NA 2 7
#2 2 4 NA NA
#3 3 NA 3 7
#4 4 4 4 NA
#5 5 NA 5 7
#6 6 4 2 NA
#7 7 NA NA 7
#8 8 4 3 NA
#9 9 NA 4 7
#10 10 4 5 NA
NOTE: Here the OP asked about looping over columns based on either their index or their names.
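Since the selection can be stored either as names or as positions, here is a sketch showing both forms inside all_of() (which accepts character names or numeric locations) and checking that they give the same result. The object names cols_by_name and cols_by_index are illustrative only.

```r
library(dplyr)

Data <- data.frame(
  var1 = c(1:10),
  var2 = rep(c(0, 4), 5),
  var3 = rep(c(2, 0, 3, 4, 5), 2),
  var4 = rep(c(7, 0), 5)
)

# Store the selection in an object, either as names or as positions
cols_by_name  <- names(Data)[2:4]   # "var2", "var3", "var4"
cols_by_index <- 2:4

by_name  <- Data %>% mutate(across(all_of(cols_by_name),  ~ na_if(.x, 0)))
by_index <- Data %>% mutate(across(all_of(cols_by_index), ~ na_if(.x, 0)))

# Both selections touch exactly the same columns
print(identical(by_name, by_index))
```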

Replace values within a range in a data frame in R

I have ranked rows in a data frame based on the values in each column, with ranks from 1 to 10.
I have code that replaces values with NA or 1. But I can't figure out how to replace a range of numbers, e.g. 3-6, with 1 and then replace the rest (1-2 and 7-10) with NA.
lag.rank <- as.matrix(lag.rank)
lag.rank[lag.rank > n] <- NA
lag.rank[lag.rank <= n] <- 1
At the moment it only replaces numbers above or below n. Any suggestions? I figure it should be fairly simple.
Is this what you are trying to accomplish?
> x <- sample(1:10,20, TRUE)
> x
[1] 1 2 8 2 6 4 9 1 4 8 6 1 2 5 8 6 9 4 7 6
> x <- ifelse(x %in% c(3:6), 1, NA)
> x
[1] NA NA NA NA 1 1 NA NA 1 NA 1 NA NA 1 NA 1 NA 1 NA 1
If your data aren't whole numbers, x %in% 3:6 won't match values such as 4.5; use between() from the dplyr package instead (it includes both endpoints):
library(dplyr)
x <- ifelse(between(x, 3, 6), 1, NA)
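One caveat for the OP's matrix: ifelse() copies the attributes of its test argument, and x %in% 3:6 returns a plain vector, so applying it to lag.rank would drop the matrix dimensions. A comparison such as lag.rank >= 3 keeps them. The sketch below follows the OP's own two-assignment style; the small lag.rank matrix is hypothetical, since the real data appears only in the question's picture.

```r
# Hypothetical stand-in for lag.rank: a small matrix of ranks from 1 to 10
lag.rank <- matrix(c(1, 3, 6, 7,
                     2, 4, 10, 5), nrow = 2, byrow = TRUE)

# Logical matrix with the same dims; computed once, before any values change
in_range <- lag.rank >= 3 & lag.rank <= 6

lag.rank[in_range] <- 1    # ranks 3-6 become 1
lag.rank[!in_range] <- NA  # everything else becomes NA
print(lag.rank)
```

The comparison-based mask also works for non-integer values, so it covers the between() case without an extra package.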

R How to select a value in a dataframe by selecting specific row and column

My dataframe looks like this:
var1 var2 var3
1 2 5 "other"
2 25 3 "sample"
3 4 5 "baseline_other"
4 60 5 "baseline_sample"
5 40 5 "other"
6 60 5 "other"
7 25 3 "sample"
8 6 8 "other"
9 60 7 "other"
10 4 3 "other"
I want to add a new column with a calculation that, for rows where df$var3 == "other", uses the value of df$var1 from the row where df$var3 == "baseline_other".
And for rows where df$var3 == "sample", it should use the value of df$var1 from the row where df$var3 == "baseline_sample". In other words, I would like x_other to hold the value of df$var1 in the baseline_other row (and likewise x_sample for the baseline_sample row).
I tried x_other <- df[df$var3=="baseline_other",df$var1] but I get an error saying "undefined columns selected".
Once that works, I would like to add a column with the calculation mentioned above, which would look like this:
df$new <- ifelse(df$var3=="other", (abs(df$var1-x_other)/x_other), NA)
and repeat this for "sample" so:
df$new <- ifelse(df$var3=="sample", (abs(df$var1-x_sample)/x_sample), NA)
I hope that is clear enough. My actual problem is selecting a specific value in my dataframe by row and column, and then using that value in a calculation.
My new df should look like this:
var1 var2 var3 var4
1 2 5 "other" 0.5
2 25 3 "sample" 0.6
3 4 5 "baseline_other" NA
4 60 5 "baseline_sample" NA
5 40 5 "other" 9
6 60 5 "other" 14
7 25 3 "sample" 0.6
8 6 8 "other" 0.5
9 60 7 "other" 14
10 4 3 "sample" 0.9
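In the failed attempt, df$var1 is used as the column index, so its values (2, 25, 4, ...) are treated as column positions that don't exist, hence "undefined columns selected". A sketch, assuming the data from the question's first table: filter the row with a logical vector and name the column as a string. Also, running the two ifelse() lines one after the other would overwrite the "other" results with NA, so the sketch nests them in a single assignment. The column name var4 is taken from the desired output.

```r
df <- data.frame(
  var1 = c(2, 25, 4, 60, 40, 60, 25, 6, 60, 4),
  var2 = c(5, 3, 5, 5, 5, 5, 3, 8, 7, 3),
  var3 = c("other", "sample", "baseline_other", "baseline_sample", "other",
           "other", "sample", "other", "other", "other")
)

# Select the row with a logical condition, then the column by name
x_other  <- df[df$var3 == "baseline_other",  "var1"]   # 4
x_sample <- df[df$var3 == "baseline_sample", "var1"]   # 60

# Handle both cases in one assignment so the second doesn't overwrite the first
df$var4 <- ifelse(df$var3 == "other",  abs(df$var1 - x_other)  / x_other,
           ifelse(df$var3 == "sample", abs(df$var1 - x_sample) / x_sample, NA))
print(df)
```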

Setting values to NA in a dataframe in R

Here is some reproducible code that shows the problem I am trying to solve in another dataset. Suppose I have a dataframe df with some "NULL" values in it. I would like to replace these with NAs, as I attempt to do below. But when I print the result, the value comes out as <NA>. See the second dataframe, which is the dataframe I would like to produce from df, in which the NA is a regular old NA without the angle brackets.
> df = data.frame(a=c(1,2,3,"NULL"),b=c(1,5,4,6))
> df[4,1] = NA
> print(df)
a b
1 1 1
2 2 5
3 3 4
4 <NA> 6
>
> d = data.frame(a=c(1,2,3,NA),b=c(1,5,4,6))
> print(d)
a b
1 1 1
2 2 5
3 3 4
4 NA 6
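The angle brackets are how print() shows a missing value in a character (or factor) column; the "NULL" string forces column a to be character (a factor in R before 4.0). A sketch of one fix: replace every "NULL" entry (the same effect as df[4,1] = NA here) and convert the column back to numeric. Going through as.character() first keeps the conversion correct if the column happens to be a factor.

```r
df <- data.frame(a = c(1, 2, 3, "NULL"), b = c(1, 5, 4, 6))

# The "NULL" string makes column a character, not numeric
print(class(df$a))

# Mark every "NULL" entry as missing, then convert the column to numeric
df$a[df$a == "NULL"] <- NA
df$a <- as.numeric(as.character(df$a))

# a is now numeric, so the missing value prints as NA
print(df)
```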
