replacing cell values in dataframe for specific variables - r

I have thousands of rows in each column. I need to find specific values in column A based on the value of column B, and replace column A with a new value if it is greater than a specific value.
For example, in every row where column B equals 1 and the value in column A is greater than 2, I want to set column A to 2.
I've tried this code:
if(dt$B=='1'){
dt <- dt %>% mutate(A = ifelse(A > 2, 2, A))
}
But this does not work. I've tried some other methods as well, but nothing I do works. Please let me know if you can help with this! Thank you.

We can combine both conditions with & in the test argument of ifelse, which is evaluated row by row:
library(dplyr)
dt <- dt %>%
  mutate(A = ifelse(A > 2 & B == 1, 2, A))
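The if() attempt in the question fails because if() expects a single TRUE or FALSE, while dt$B == '1' yields one value per row. As a rough illustration, the same replacement can be sketched in base R with logical indexing. The contents of dt below are invented for the example and are not from the question.

```r
# Toy data, invented for illustration only
dt <- data.frame(A = c(1, 3, 5, 4, 2),
                 B = c(1, 1, 2, 1, 2))

# Row positions where both conditions hold: B is 1 and A exceeds 2.
# which() drops NA results, so rows with missing A or B are left alone.
cap <- which(dt$B == 1 & dt$A > 2)

# Overwrite only those rows of A with the cap value
dt$A[cap] <- 2

dt$A
```

Rows with B equal to 2 keep their original A, even when it is above 2.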

Related

Creating new column with values based on old columns including NA

I have a dataset where I'm focusing on 2 specific columns. I want to create a new column with the following:
How can I do this in R?
Thanks
This is the code that I tried, which didn't work. Gave me an error. Also, I wasn't sure how to include all the NA values in this code.
data_2=data_1%>%mutate(majoramp_indX=case_when(c("amplevel_r","amplevel_l")>=2~1,c("amplevel_r","amplevel_l")<2~0))
Then I also tried this, which gave me all 1s in the new column
data_1$majoramp_indX=case_when(c("amplevel_r","amplevel_l")>=2~1,c("amplevel_r","amplevel_l")<2~0)
Welcome to SO Mufti!
There is a point that needs clarifying in your two conditions: what to do if, for example A = 1 and B = 2? Is the result 0 or 1? To get you going, I've done code for the two conditions.
To code your first condition, you can then do
library(dplyr)
myData <- myData |> mutate(Con1 = ifelse((A %in% c(2:5) | (B %in% c(2:5))), 1, NA))
For your second condition, you can do
myData <- myData |> mutate(Con2 = ifelse((A %in% c(1, NA) | (B %in% c(1,NA))), 0, NA))
Hope this helps! :-)
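The answer uses %in% rather than == because the two treat missing values differently, which matters when NA has to count as a valid category. A small sketch of that difference, using a made-up vector x:

```r
# Made-up vector with one missing value
x <- c(1, 2, NA, 5)

# == propagates NA: the missing element is neither TRUE nor FALSE
eq_one <- x == 1

# %in% never returns NA; a missing element matches only if NA is listed
in_one_or_na <- x %in% c(1, NA)
in_two_to_five <- x %in% 2:5

eq_one
in_one_or_na
in_two_to_five
```

This is why A %in% c(1, NA) can flag rows with a missing A, whereas A == 1 would leave them as NA in the test.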

Clustering rows by ID based on a column value condition multiple times

Some time ago I opened a related question in this post
Suppose I have the following df:
data <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1,1,1,1,1,1),
Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1),
Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10),
ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,6))
And I want to obtain:
data <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1,1,1,1,1,1),
Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1),
Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10),
ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,6),
DesiredResultClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,6,6,6,7,8,9,10,10,11))
The conditions are:
If value of 'Control' is higher than 12 and actual 'Obs1' value is equal to 1 and to previous 'Obs1' value, 'DesiredResultClusterObs1' value should add +1 (the main difference with the other question is that consecutive control values above 12 must be considered)
Any idea how I can achieve the desired result?
I don't know much about how to use the with() and rle() functions, but I've got a solution to the problem using ifelse.
library(dplyr)
data <- data %>%
  mutate(aux = ifelse(Control > 12 & Obs1 == 1 & lag(Obs1) == 1, 1, 0),
         DesiredResultClusterObs1 = ClusterObs1 + cumsum(aux))
The aux variable is not necessary; it just helps to see the result step by step. You can also do the following:
data <- data %>%
  mutate(DesiredResultClusterObs1 = ClusterObs1 +
           cumsum(ifelse(Control > 12 & Obs1 == 1 & lag(Obs1) == 1, 1, 0)))
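To check the logic against the desired column, here is a base R sketch of the same idea using the data from the question. The names prev_obs, bump and result are introduced only for this example. The explicit !is.na() guard is an addition. With dplyr's lag() the first row has no previous value, so if that row ever had Control above 12 and Obs1 equal to 1, the NA would carry through cumsum.

```r
Obs1        <- c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1)
Control     <- c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10)
ClusterObs1 <- c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,6)
Desired     <- c(1,1,1,2,2,3,3,3,4,4,4,4,5,6,6,6,7,8,9,10,10,11)

# Previous row's Obs1; the first row has no predecessor, hence NA
prev_obs <- c(NA, head(Obs1, -1))

# TRUE where a new cluster should start
bump <- Control > 12 & Obs1 == 1 & !is.na(prev_obs) & prev_obs == 1

# Each TRUE shifts the cluster id of that row and all later rows by one
result <- ClusterObs1 + cumsum(bump)

all(result == Desired)
```

The example has a single ID. With several IDs, the shift and the cumulative sum would presumably have to be computed within each ID (for instance after group_by(ID) in the dplyr version).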

Check if a column in a dataframe is of the same value

It is a follow-up question to this one. What I would like to check is whether any column in a data frame contains the same value (numerical or string) in all rows. For example,
sample <- data.frame(col1=c(1, 1, 1), col2=c("a", "a", "a"), col3=c(12, 15, 22))
The purpose is to inspect each column in a data frame to see which columns do not have an identical entry in every row. How can I do this? Note that there are both numbers and strings.
My expected output would be a vector containing the numbers of the columns that have non-identical entries.
We can use apply column-wise (margin = 2), count the unique values in each column, and select the columns whose number of unique values is not equal to 1.
which(apply(sample, 2, function(x) length(unique(x))) != 1)
#col3
# 3
The same code can also be done using sapply or lapply call
which(sapply(sample, function(x) length(unique(x))) != 1)
#col3
# 3
A dplyr version could be
library(dplyr)
sample %>%
  # funs() is deprecated in current dplyr; across() and where() replace it
  summarise(across(everything(), n_distinct)) %>%
  select(where(~ .x != 1))
# col3
#1 3
We can use Filter
names(Filter(function(x) length(unique(x)) != 1, sample))
#[1] "col3"
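If this check is needed in several places, the sapply idea could be wrapped in a small helper. The sketch below uses vapply, which enforces that every column yields exactly one integer and avoids the matrix conversion that apply performs. The function name non_constant_cols is made up for this example.

```r
sample <- data.frame(col1 = c(1, 1, 1),
                     col2 = c("a", "a", "a"),
                     col3 = c(12, 15, 22))

# Hypothetical helper: positions (with names) of columns that are not constant
non_constant_cols <- function(df) {
  # Count distinct values per column; integer(1) makes vapply check the type
  n_unique <- vapply(df, function(x) length(unique(x)), integer(1))
  which(n_unique != 1)
}

res <- non_constant_cols(sample)
res
```

Note that unique() treats NA as a value of its own, so a column holding one repeated value plus NAs would be reported as non-constant.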

Subset by value of next row

I'm looking to subset rows by the value of the next row for one column.
df <- data.frame(t = c(1,2,3,4,5,6,7,8),
b = c(1,2,1,0,1,0,1,2))
So I want to subset df and get the rows where b == 1 and the next row has b == 2. The subset should therefore return 2 rows (where t=1 and t=7).
I tried using which and lag from dplyr, as mentioned in other answers, but I couldn't get that to work.
We can get the next value with lead, create a condition to check whether it is equal to 2 and the current value is 1 and use that expression in the filter
library(dplyr)
df %>%
filter(b == 1, lead(b)==2)
# t b
#1 1 1
#2 7 1
Or use subset from base R
subset(df, c(b[-1] == 2, FALSE) & b == 1)
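Both answers rely on shifting b by one position. A base R sketch that makes the shift explicit, with helper names (next_b, prev_b, keep_before, keep_after) chosen only for this example, and that also covers the opposite reading of the question (the rows with b == 2 that come right after a 1):

```r
df <- data.frame(t = c(1,2,3,4,5,6,7,8),
                 b = c(1,2,1,0,1,0,1,2))

# b of the following row; the last row has no successor
next_b <- c(df$b[-1], NA)
# b of the preceding row; the first row has no predecessor
prev_b <- c(NA, df$b[-nrow(df)])

# Rows with b == 1 whose next row has b == 2
keep_before <- which(df$b == 1 & next_b == 2)
# Rows with b == 2 whose previous row has b == 1
keep_after <- which(df$b == 2 & prev_b == 1)

df[keep_before, ]
df[keep_after, ]
```

which() discards the NA produced at the edge of the shifted vector, so no extra handling of the first or last row is needed here.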

R function or loop that could go through a binary variable (1 and 0) in a dataframe and returns a third variable (y) value from a different column

I need some help. I am trying to build a function or a loop in R that goes through a binary variable (1 and 0) in a data frame so that every time a 1 is followed by a 0, I can save, in a vector, the value of a third variable (y) from the row of that 0. I tried a couple of options based on previous posts, but nothing gets me even close to that.
My data looks a bit like that:
ID <- rep(1001, 5)
variable <- c(1, 1, 0, 1, 0)
y <- c(10, 20, 30, 40, 50)
df <- cbind(ID, variable, y)
In this case, for example, the answer would give me a vector with the y values 30 and 50. Sorry if someone already has answered that, I could not find something similar. Thanks a lot!
Here's a vectorized solution. Basically, I paste together variable at positions i and i+1, then check whether the combination is "10". The position you want is actually the next one (i.e. i+1), so we add 1.
df <- data.frame(ID, variable, y)
idx <- which(paste0(df$variable[-nrow(df)], df$variable[-1]) == "10") + 1
df$y[idx]
Here is an approach with tidyverse:
library(tidyverse)
df %>%
  as_tibble() %>%   # as.tibble() is deprecated in favour of as_tibble()
  mutate(y1 = ifelse(lag(variable) == 1 & variable == 0, y, NA)) %>%
  pull(y1)
#output
[1] NA NA 30 NA 50
and in base R:
ifelse(c(NA, df[-nrow(df),2]) == 1 & df[, 2] == 0, df[, 3], NA)
If the lag of variable is 1 and variable is 0, return y; otherwise return NA.
If you would like to remove the NAs, wrap the result in na.omit().
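A short sketch of that last step on the question's data, in base R. The names prev, y1 and y_hits are introduced for this example. as.vector() is used to strip the bookkeeping attribute that na.omit() attaches to its result.

```r
variable <- c(1, 1, 0, 1, 0)
y        <- c(10, 20, 30, 40, 50)

# Value of variable on the previous row; NA for the first row
prev <- c(NA, variable[-length(variable)])

# y where a 1 is followed by a 0, NA everywhere else
y1 <- ifelse(prev == 1 & variable == 0, y, NA)

# Drop the NAs to keep only the matching y values
y_hits <- as.vector(na.omit(y1))
y_hits
```

One caveat of this route: if y itself were missing on a matching row, that row would be dropped along with the others, so indexing with which() may be preferable in that case.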
