Replace selected columns' negative values with 0s or NAs using R

For the example data df, I want to replace the negative values in the first column (x1) with 0 and those in the third column (x3) with NA, using a function replace_negatives as follows:
df <- data.frame(x1 = -3:1,
x2 = -1,
x3 = -2:2)
df
Out:
x1 x2 x3
1 -3 -1 -2
2 -2 -1 -1
3 -1 -1 0
4 0 -1 1
5 1 -1 2
Please note that I do not index by column names because there are many columns in the actual data and the column names are not fixed.
replace_negatives <- function(data){
df <<- data %>%
mutate(.[[1]] = if_else(.[[2]] < 0, 0, .[[1]])) %>%
mutate(.[[3]] = if_else(.[[3]] < 0, NA, .[[3]]))
return(df)
}
lapply(df, replace_negatives)
But it raises an error:
> replace_negatives <- function(data){
+ df <<- data %>%
+ mutate(.[[1]] = if_else(.[[2]] < 0, 0, .[[1]])) %>%
Error: unexpected '=' in:
" df <<- data %>%
mutate(.[[1]] ="
> mutate(.[[3]] = if_else(.[[3]] < 0, NA, .[[3]]))
Error: unexpected '=' in " mutate(.[[3]] ="
> return(df)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
Any help would be appreciated.
The expected output:
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2

To perform the required operation, here's a base R method:
df <- data.frame(x1 = -3:1,
x2 = -1,
x3 = -2:2)
df[[1]] <- ifelse(df[[1]] < 0, 0, df[[1]])
df[[3]] <- ifelse(df[[3]] < 0, NA, df[[3]])
df
#> x1 x2 x3
#> 1 0 -1 NA
#> 2 0 -1 NA
#> 3 0 -1 0
#> 4 0 -1 1
#> 5 1 -1 2
Created on 2022-04-18 by the reprex package (v2.0.1)

You could use across in the function:
library(tidyverse)
replace_negatives <- function(data){
df <- data %>%
mutate(across(1, ~ ifelse(. < 0, 0, .)),
across(3, ~ ifelse(. < 0, NA, .)))
return(df)
}
replace_negatives(df)
Output
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2

Here is a base R version of your function:
replace_negatives <- function(df){
# negative values in the first column become 0
index <- df[,1] < 0
df[,1][index] <- 0
# negative values in the third column become NA
is.na(df[,3]) <- df[,3] < 0
return(df)
}
replace_negatives(df)
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
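
Because the real data has many columns and the names are not fixed, the same idea can be wrapped in a function that takes column positions as arguments. This is only a minimal sketch: the name replace_negatives_at and its zero_cols and na_cols parameters are made up for illustration and do not come from any package.

```r
# Hypothetical helper: columns at positions zero_cols get 0 for negatives,
# columns at positions na_cols get NA for negatives.
replace_negatives_at <- function(data, zero_cols = integer(), na_cols = integer()) {
  for (j in zero_cols) {
    neg <- which(data[[j]] < 0)  # which() ignores values that are already NA
    data[[j]][neg] <- 0
  }
  for (j in na_cols) {
    neg <- which(data[[j]] < 0)
    data[[j]][neg] <- NA
  }
  data
}

df <- data.frame(x1 = -3:1,
                 x2 = -1,
                 x3 = -2:2)
res <- replace_negatives_at(df, zero_cols = 1, na_cols = 3)
res
```

Note that the function is called once on the whole data frame. lapply(df, replace_negatives), as in the question, would instead call the function once per column, passing a plain vector each time.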

Related

How to change values of R cells (dataframe) based on a condition for specific rows?

I have the following dataframe,
C1 C2 C3
0 0 0
1 1 0
0 0 0
1 1 0
0 0 0
I now want to apply the following conditions to the dataframe, for specific row indexes only:
C1 should be equal to 0
A random number should be less than 0.5
If both conditions hold, I want to change the values of the cells in C1 and C2 to 1; otherwise do nothing.
I am trying the following (rowsIndex holds the specific indexes to which I want to apply the conditions):
apply(DF[rowsIndex,], 2, fun)
where fun is:
fun<- function(x) {
ifelse(x==0,ifelse(runif(n=1)<0.5,x <- 1,x),x )
print(x)
}
My questions are:
In my function, how do I apply the conditions to a certain column only, i.e. C1? (I have tried using DF[rowsIndex,c(1)], but it gives an error.)
Is there any other approach I can take? This approach is not giving me any results, and the same DF is printed.
Thanks

If you want to stay in base R:
#your dataframe
DF <- data.frame(C1 = c(0, 1, 0, 1, 0),
C2 = c(0, 1, 0, 1, 0),
C3 = c(0, 0, 0, 0, 0))
fun<- function(x) {
if(x[1]==0 && runif(n=1)<0.5) {
x[1:2] <- 1
}
return(x)
}
#your selection of rows you want to process
rowsIndex <- c(1, 2, 3, 4)
#Using MARGIN = 1 applies the function to each row of the dataframe
#apply() returns the processed rows as the columns of a matrix, so t() transposes them back into rows
DF_processed <- t(apply(DF[rowsIndex,], 1, fun))
#replace the selected rows in the original DF by the processed rows
DF[rowsIndex, ] <- DF_processed
print(DF)

Something like this?
library(dplyr)
df %>%
mutate(across(c(C1, C2), ~ifelse(C1 == 0 & runif(1) < 0.5, 1, .)))
C1 C2 C3
1 1 0 0
2 1 1 0
3 1 0 0
4 1 1 0
5 1 0 0
Applying it to your function:
fun<- function(df, x, y) {
df %>%
mutate(across(c({{x}}, {{y}}), ~ifelse({{x}} == 0 & runif(1) < 0.5, 1, .)))
}
fun(df, C1, C2)
C1 C2 C3
1 0 0 0
2 1 1 0
3 0 0 0
4 1 1 0
5 0 0 0
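
Note that runif(1) draws a single number, so inside across() the same draw is shared by every row of a column. If each selected row should get its own draw, a vectorised base R sketch could look like this. DF and rowsIndex follow the question; set.seed() is only there to make the run repeatable and is an addition for illustration.

```r
DF <- data.frame(C1 = c(0, 1, 0, 1, 0),
                 C2 = c(0, 1, 0, 1, 0),
                 C3 = c(0, 0, 0, 0, 0))
rowsIndex <- c(1, 2, 3, 4)

set.seed(1)  # assumption: fixed seed, used only for reproducibility
# one random draw per selected row, combined with the C1 == 0 condition
hit <- DF$C1[rowsIndex] == 0 & runif(length(rowsIndex)) < 0.5
# set C1 and C2 to 1 only in the selected rows that met both conditions
DF[rowsIndex[hit], c("C1", "C2")] <- 1
DF
```

Rows outside rowsIndex (row 5 here) and the C3 column are never touched, and no apply() call or transpose is needed.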

Extract certain columns from data frame R

My data frame looks like this:
x s1 s2 s3 s4
1 x1 1 1954 1 yes
2 x2 2 1955 1 no
3 x3 1 1976 2 yes
4 x4 2 1954 2 yes
5 x5 3 1943 1 no
Sample data:
df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
Is it possible to extract the data frame's columns containing integers 1 to 3? For example, the new data frame would look like:
newdf
x s1 s3
1 x1 1 1
2 x2 2 1
3 x3 1 2
4 x4 2 2
5 x5 3 1
Is it possible to change the s1 and s3 columns to 0 or 1 depending on whether or not the value in the column is 1? The altered data frame would then look like:
newdf2
x s1 s3
1 x1 1 1
2 x2 0 1
3 x3 1 0
4 x4 0 0
5 x5 0 1

base R
newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 2 1
# 3 x3 1 2
# 4 x4 2 2
# 5 x5 3 1
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
Walk-through:
First, we determine which columns are numeric and contain the number 1 or 3:
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
# x s1 s2 s3 s4
# FALSE TRUE FALSE TRUE FALSE
This will exclude any column that is not numeric, meaning that a character column that contains a literal "1" or "3" will not be retained. This is complete inference on my end; if you want to accept the string versions then remove the is.numeric(z) component.
Second, we extract the names of the columns that are TRUE and prepend "x":
c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x" "s1" "s3"
Third, wrap that in unique(.) in case "x" itself is numeric and contains 1 or 3 (this step is purely defensive; you may not strictly need it).
Fourth, select those columns, defensively adding drop=FALSE so that a full data.frame is returned even if only one column is matched.
Finally, replace just those columns (excluding the first column, "x") with 0 or 1: z == 1 returns a logical vector, and the wrapping +(..) converts it to 0 (false) or 1 (true).
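
The last step of the walk-through relies on unary plus coercing a logical vector to integers. A small standalone illustration, using the values of the s1 column from the question:

```r
z <- c(1, 2, 1, 2, 3)   # values of the s1 column
is_one <- z == 1        # logical vector: TRUE where the value is 1
flag <- +is_one         # unary plus coerces TRUE/FALSE to 1L/0L
flag
# [1] 1 0 1 0 0
```

as.integer(z == 1) gives the same result and may read more clearly to people unfamiliar with the +(..) idiom.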
dplyr
library(dplyr)
df %>%
select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
mutate(across(-x, ~ +(. == 1)))
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1

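I think this is what you expect:
library(dplyr)
library(stringr)
my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
# paste each column into one string; storing the result as a new column only works here because the data has as many rows (5) as columns (5)
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))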
my_df <- my_df %>% group_by(x) %>% mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
my_var <- which(my_df$end == my_df$end2)
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
my_df <- my_df[, c(1, my_var)]

Replace column values based on column name

I have a data frame with several binary variables: x1, x2, ... x100. I want to replace the entry 1 in each column with the number in the name of the column, i.e.:
data$x2[data$x2 == 1] <- 2
data$x3[data$x3 == 1] <- 3
data$x4[data$x4 == 1] <- 4
data$x5[data$x5 == 1] <- 5
...
How can I achieve this in a loop?

Using col:
# example data
set.seed(1); d <- as.data.frame(matrix(sample(0:1, 12, replace = TRUE), nrow = 3))
names(d) <- paste0("x", seq(ncol(d)))
d
# x1 x2 x3 x4
# 1 0 0 0 1
# 2 1 1 0 0
# 3 0 0 1 0
ix <- d == 1
d[ ix ] <- col(d)[ ix ]
d
# x1 x2 x3 x4
# 1 0 0 0 4
# 2 1 2 0 0
# 3 0 0 3 0
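
The col() approach works because col() returns, for every cell, the position of its column, and in that example the position happens to equal the number in the column name. When the names are not consecutive, the number can be parsed from the name instead. A sketch under that assumption; the data frame d2 and its column names are invented for illustration:

```r
# Hypothetical data whose column names are not x1, x2, x3, ...
d2 <- data.frame(x2 = c(1, 0),
                 x5 = c(0, 1),
                 x7 = c(1, 1))

col(d2)                                       # column position of every cell
nums <- as.integer(sub("^x", "", names(d2)))  # number parsed from each name: 2, 5, 7
ix <- d2 == 1                                 # logical matrix marking the 1s
d2[ix] <- nums[col(d2)[ix]]                   # look up the name's number via the column position
d2
#   x2 x5 x7
# 1  2  0  7
# 2  0  5  7
```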

dplyr approach (using zx8754's data):
library(dplyr)
d %>%
mutate(across(starts_with('x'), ~ . * as.numeric(gsub('x', '', cur_column()))))
#> x1 x2 x3 x4
#> 1 0 0 0 4
#> 2 1 2 0 0
#> 3 0 0 3 0
Created on 2021-05-26 by the reprex package (v2.0.0)

Here is a base R solution with a lapply loop.
data[-1] <- lapply(names(data)[-1], function(k){
n <- as.integer(sub("[^[:digit:]]*", "", k))
data[data[[k]] == 1, k] <- n
data[[k]]
})
data
Test data.
set.seed(2021)
data <- replicate(6, rbinom(10, 1, 0.5))
data <- as.data.frame(data)
names(data) <- paste0("x", 1:6)

A solution based on a simple for loop is below (otherwise similar to the accepted answer using lapply):
for (i in 2:100) {
k <- paste0('x', i)
data[data[[k]] == 1, k] <- i
}

Converting counts to individual observations in r

I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# b 1 0 0
# b 0 1 0
# b 0 0 1
And so on. I have seen that untable is recommended on some other threads, but it does not appear to work with the current version of R. Is there a package that can convert summarised counts into individual observations?

You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
rep.vec <- colSums(df[colnames(df) %in% paste0("judgement", (1:nrow(df)), sep="")], na.rm = TRUE)
want <- data.frame(name=df$name, cbind(diag(nrow(df))))
colnames(want)[-1] <- paste0("judgement", (1:nrow(df)), sep="")
(want <- want[rep(1:nrow(want), rep.vec), ])

I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
df[is.na(df)] <- 0 # replace NAs
out <- lapply(count.cols, function(x) { # for each column with counts
z <- df[rep(1:nrow(df), df[,x]), ] # replicate rows
z[, -c(id.col)] <- 0 # set all other columns to zero
z[, x] <- 1 # replace the count values with 1
z
})
out <- do.call(rbind, out) # combine the list
out <- out[order(out[,c(id.col)]),] # reorder (you can change this)
rownames(out) <- NULL # return to simple row numbers
out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
# name judgement1 judgement2 judgement3
#1 a 1 0 0
#2 a 1 0 0
#3 a 1 0 0
#4 a 1 0 0
#5 a 1 0 0
#6 a 0 1 0
#7 a 0 0 1
#8 a 0 0 1
#9 b 0 1 0
#10 b 0 0 1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
df[rep(1:nrow(df), num), ]
}
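
The same rep()-based idea can also be written without a per-column loop, by repeating each (row, column) cell as many times as its count and then building the 0/1 indicator columns from the column index. This is a sketch only; the helper variable names (count_cols, cell_row, cell_col and so on) are invented for illustration, and NA counts are assumed to mean zero observations.

```r
df <- data.frame(name = c("a", "b", "c"),
                 judgement1 = c(5, 0, NA),
                 judgement2 = c(1, 1, NA),
                 judgement3 = c(2, 1, NA))
count_cols <- c("judgement1", "judgement2", "judgement3")

counts <- as.matrix(df[count_cols])
counts[is.na(counts)] <- 0              # assumption: NA means no observations

cell_row <- as.vector(row(counts))      # source row of every cell
cell_col <- as.vector(col(counts))      # source column of every cell
n <- as.vector(counts)                  # how often each cell is repeated

obs_row <- rep(cell_row, n)             # one entry per individual observation
obs_col <- rep(cell_col, n)

# row k of the identity matrix is the 0/1 pattern for judgement k
indicator <- diag(length(count_cols))[obs_col, , drop = FALSE]
colnames(indicator) <- count_cols

out <- data.frame(name = df$name[obs_row], indicator)
out <- out[order(out$name), ]           # group the observations by name
rownames(out) <- NULL
out
```

Each output row has exactly one 1, and the column totals of out equal the column totals of the original counts.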

Convert negative values to zero in dataframe in R

I have a dataframe in R in which I would like to convert the negative values in all columns (except the id columns) to zero:
id1 id2 var1 var2 var3
-1 -1 0 -33 5
-1 -2 9 -10 -1
I can convert all columns with a line of code like:
temp[temp < 0] <- 0
But I can't adjust it to only a subset of columns. I've tried:
temp[temp < 0, -c(1,2)] <- 0
But this fails with an error saying that non-existent rows are not allowed.

Adjust your attempt slightly:
temp[,-c(1,2)][temp[, -c(1,2)] < 0] <- 0

You can try using replace:
> mydf[-c(1, 2)] <- replace(mydf[-c(1, 2)], mydf[-c(1, 2)] < 0, 0)
> mydf
id1 id2 var1 var2 var3
1 -1 -1 0 0 5
2 -1 -2 9 0 0

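We can use data.table
library(data.table)
setDT(d1) # d1 is the data frame from the question; setDT converts it to a data.table by reference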
for(j in grep('^var', names(d1))){
set(d1, i= which(d1[[j]]<0), j= j, value=0)
}
d1
# id1 id2 var1 var2 var3
# 1: -1 -1 0 0 5
# 2: -1 -2 9 0 0

There might be fancier or more compact ways, but here's a vectorised replacement you can apply to the var columns:
mytable <- read.table(textConnection("
id1 id2 var1 var2 var3
-1 -1 0 -33 5
-1 -2 9 -10 -1"), header = TRUE)
mytable[, grep("^var", names(mytable))] <-
apply(mytable[, grep("^var", names(mytable))], 2, function(x) ifelse(x < 0, 0, x))
mytable
## id1 id2 var1 var2 var3
## 1 -1 -1 0 0 5
## 2 -1 -2 9 0 0

You could use pmax:
dat <- data.frame(id1=c(-1,-1), id2=c(-1,-2), var1=c(0,9), var2=c(-33,-10), var3=c(5,-1))
dat[,-c(1,2)] <- matrix(pmax(unlist(dat[,-c(1,2)]),0), nrow=nrow(dat))
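
The attempt in the question fails because temp < 0 is a logical matrix covering every cell, and it is being used as a row index. A variant of the pmax idea that avoids the round trip through unlist() and matrix() is to apply pmax() column by column. A minimal sketch, assuming the id columns are named id1 and id2 as in the question; the variable value_cols is introduced here for illustration:

```r
dat <- data.frame(id1 = c(-1, -1), id2 = c(-1, -2),
                  var1 = c(0, 9), var2 = c(-33, -10), var3 = c(5, -1))

# every column except the id columns
value_cols <- setdiff(names(dat), c("id1", "id2"))

# pmax(x, 0) keeps each value unless it is below 0, in which case it returns 0
dat[value_cols] <- lapply(dat[value_cols], function(x) pmax(x, 0))
dat
#   id1 id2 var1 var2 var3
# 1  -1  -1    0    0    5
# 2  -1  -2    9    0    0
```

Working column by column keeps each column as a separate vector, which matters if the value columns do not all share one type.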