I have a 4 x 2 data frame (test), and I intended to use the ifelse function to fix the dataset. The lines of code are below:
test <- data.frame(cbind(c(4,-5,-6,1),c("1","-3","4","-3")),stringsAsFactors = F)
test$X1 <- as.numeric(test$X1)
test$X2 <- as.numeric(test$X2)
test$X2 <- ifelse(test$X1<0 & test$X2>0, test$X2, test$X2*-1)
How do we write code that applies the condition in both directions? That is, if X1 < 0 & X2 > 0, then make X2 < 0, and apply the same logic to X1 (if X2 < 0 & X1 > 0, then make X1 < 0).
The expected output is:
X1 <- 4 -5 -6 -1
X2 <- 1 -3 -4 -3
Would appreciate any ideas.
We could achieve the desired result as follows using dplyr (assuming I understood the logic correctly: if X1 < 0 & X2 > 0, then make X2 < 0, and vice versa for X1):
library(dplyr)
test %>%
mutate(X2 = ifelse(X1 <0 & X2>0, -X2, X2),
X1 = ifelse(X2<0 & X1>0, -X1,X1))
X1 X2
1 4 1
2 -5 -3
3 -6 -4
4 -1 -3
You could do
test$X2 <- with(test, X2 * c(1, -1)[(X1 < 0 & X2 > 0) + 1])
test$X1 <- with(test, X1 * c(1, -1)[(X1 > 0 & X2 < 0) + 1])
test
# X1 X2
#1 4 1
#2 -5 -3
#3 -6 -4
#4 -1 -3
To explain, let's take the first case.
The condition returns a logical vector
with(test, X1 < 0 & X2 > 0)
#[1] FALSE FALSE TRUE FALSE
By adding + 1 we convert it to numerical index where FALSE becomes 1 and TRUE becomes 2
with(test, X1 < 0 & X2 > 0) + 1
#[1] 1 1 2 1
We use this index to subset c(1, -1)
c(1, -1)[with(test, X1 < 0 & X2 > 0) + 1]
#[1] 1 1 -1 1
which is then multiplied by X2
with(test, X2 * c(1, -1)[(X1 < 0 & X2 > 0) + 1])
#[1] 1 -3 -4 -3
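As a quick sanity check on the walk-through above, the index trick can be compared against a plain ifelse call. This is only a sketch: the names flip, via_index and via_ifelse are made up for illustration, and the data frame is rebuilt directly as numeric columns rather than through cbind.

```r
# Rebuild the example data as numeric columns (assumption: same values as the question)
test <- data.frame(X1 = c(4, -5, -6, 1), X2 = c(1, -3, 4, -3))

# Logical vector marking rows where X2 should change sign
flip <- with(test, X1 < 0 & X2 > 0)

# Index trick: FALSE + 1 picks the multiplier 1, TRUE + 1 picks -1
via_index <- test$X2 * c(1, -1)[flip + 1]

# The same operation written with ifelse
via_ifelse <- ifelse(flip, -test$X2, test$X2)

identical(via_index, via_ifelse)
```

Both forms should give the same vector; which one to use is mostly a matter of readability.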
Related
For the example data df, I want to replace the negative values in the first column (x1) with 0 and the third column (x3) with NA by the function replace_negatives as follows:
df <- data.frame(x1 = -3:1,
x2 = -1,
x3 = -2:2)
df
Out:
x1 x2 x3
1 -3 -1 -2
2 -2 -1 -1
3 -1 -1 0
4 0 -1 1
5 1 -1 2
Please note that I do not index by column names because there are many columns in the actual data and the column names are not fixed.
replace_negatives <- function(data){
df <<- data %>%
mutate(.[[1]] = if_else(.[[2]] < 0, 0, .[[1]])) %>%
mutate(.[[3]] = if_else(.[[3]] < 0, NA, .[[3]]))
return(df)
}
lapply(df, replace_negatives)
But it raises an error:
> replace_negatives <- function(data){
+ df <<- data %>%
+ mutate(.[[1]] = if_else(.[[2]] < 0, 0, .[[1]])) %>%
Error: unexpected '=' in:
" df <<- data %>%
mutate(.[[1]] ="
> mutate(.[[3]] = if_else(.[[3]] < 0, NA, .[[3]]))
Error: unexpected '=' in " mutate(.[[3]] ="
> return(df)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
Any help would be appreciated.
The expected output:
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
To perform the required operation, here's a base R method:
df <- data.frame(x1 = -3:1,
x2 = -1,
x3 = -2:2)
df[[1]] <- ifelse(df[[1]] < 0, 0, df[[1]])
df[[3]] <- ifelse(df[[3]] < 0, NA, df[[3]])
df
#> x1 x2 x3
#> 1 0 -1 NA
#> 2 0 -1 NA
#> 3 0 -1 0
#> 4 0 -1 1
#> 5 1 -1 2
Created on 2022-04-18 by the reprex package (v2.0.1)
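Since the question says the column names are not fixed, the two ifelse lines above could be wrapped in a small helper that takes column positions. The following is a minimal sketch, not part of the original answer; the function name replace_negatives_at and its arguments cols and value are hypothetical.

```r
# Hypothetical helper: replace negative entries in the columns given by position
replace_negatives_at <- function(data, cols, value) {
  for (j in cols) {
    # Guard against existing NAs so the logical index never contains NA
    neg <- !is.na(data[[j]]) & data[[j]] < 0
    data[[j]][neg] <- value
  }
  data
}

df <- data.frame(x1 = -3:1,
                 x2 = -1,
                 x3 = -2:2)

out <- replace_negatives_at(df, 1, 0)    # first column: negatives become 0
out <- replace_negatives_at(out, 3, NA)  # third column: negatives become NA
out
```

The function returns a modified copy, so there is no need for the global assignment (<<-) used in the question.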
You could use across in the function:
library(tidyverse)
replace_negatives <- function(data){
df <- data %>%
mutate(across(1, ~ ifelse(. < 0, 0, .)),
across(3, ~ ifelse(. < 0, NA, .)))
return(df)
}
replace_negatives(df)
Output
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
Here is a base R version of your function:
replace_negatives <- function(df){
# negatives in the first column become 0
index <- df[,1] < 0
df[,1][index] <- 0
# negatives in the third column become NA
is.na(df[,3]) <- df[,3] < 0
return(df)
}
replace_negatives(df)
x1 x2 x3
1 0 -1 NA
2 0 -1 NA
3 0 -1 0
4 0 -1 1
5 1 -1 2
My data frame looks like this:
x s1 s2 s3 s4
1 x1 1 1954 1 yes
2 x2 2 1955 1 no
3 x3 1 1976 2 yes
4 x4 2 1954 2 yes
5 x5 3 1943 1 no
Sample data:
df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
Is it possible to extract the data frame's columns containing integers 1 to 3? For example, the new data frame would look like:
newdf
x s1 s3
1 x1 1 1
2 x2 2 1
3 x3 1 2
4 x4 2 2
5 x5 3 1
Is it possible to change the s1 and s3 columns to 0 or 1 depending on whether or not the value in the column is 1? The altered data frame would then look like:
newdf2
x s1 s3
1 x1 1 1
2 x2 0 1
3 x3 1 0
4 x4 0 0
5 x5 0 1
base R
newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 2 1
# 3 x3 1 2
# 4 x4 2 2
# 5 x5 3 1
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
Walk-through:
first, we determine which columns are numbers and contain the numbers 1 or 3:
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
# x s1 s2 s3 s4
# FALSE TRUE FALSE TRUE FALSE
This will exclude any column that is not numeric, meaning that a character column that contains a literal "1" or "3" will not be retained. This is complete inference on my end; if you want to accept the string versions then remove the is.numeric(z) component.
second, we extract the names of those that are true, and prepend "x"
c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x" "s1" "s3"
wrap that in unique(.) if, for some reason, "x" is also numeric and contains 1 or 3 (this step is purely defensive, you may not strictly need it)
select those columns, defensively adding drop=FALSE so that if only one column is matched, it still returns a full data.frame
replace just those columns (excluding the first column which is "x") with 0 or 1; the z == 1 returns logical, and the wrapping +(..) converts logical to 0 (false) or 1 (true).
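The walk-through above keeps numeric columns that contain a 1 or a 3. If the intent is instead "every value in the column is one of 1, 2, 3" (one possible reading of the question, and an assumption on my part), a stricter test could be sketched like this; the name only_1_to_3 is hypothetical.

```r
df <- data.frame(x  = c('x1', 'x2', 'x3', 'x4', 'x5'),
                 s1 = c(1, 2, 1, 2, 3),
                 s2 = c(1954, 1955, 1976, 1954, 1943),
                 s3 = c(1, 1, 2, 2, 1),
                 s4 = c('yes', 'no', 'yes', 'yes', 'no'))

# TRUE only for numeric columns whose values all lie in 1:3
only_1_to_3 <- sapply(df, function(z) is.numeric(z) && all(z %in% 1:3))

# Keep "x" plus the matching columns
newdf <- df[, c("x", names(df)[only_1_to_3]), drop = FALSE]
names(newdf)
```

On the sample data both tests select s1 and s3; they would differ for a column such as c(1, 7, 9), which contains a 1 but also values outside 1 to 3.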
dplyr
library(dplyr)
df %>%
select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
mutate(across(-x, ~ +(. == 1)))
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
I think this is what you expect:
library(dplyr)
library(stringr)
my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))
my_df <- my_df %>% group_by(x) %>% mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
my_var <- which(my_df$end == my_df$end2)
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
my_df <- my_df[, c(1, my_var)]
I have a vector of numbers:
v1 <- c(1,2,3)
and I want to programmatically analyze the impact of sign change whose variants could be:
v1[1] + v1[2] + v1[3]
[1] 6
v1[1] + v1[2] - v1[3]
[1] 0
v1[1] - v1[2] - v1[3]
[1] -4
v1[1] - v1[2] + v1[3]
[1] 2
How can I exchange signs ('+', '-') programmatically? I'm thinking this is a silly question, but can't think my way out of my box, though my line of analysis points to evaluating changing signs.
Here's a quick way to get all possibilities with matrix multiplication:
signs = rep(list(c(1, -1)), length(v1))
signs = do.call(expand.grid, args = signs)
signs$sum = as.matrix(signs) %*% v1
signs
# Var1 Var2 Var3 sum
# 1 1 1 1 6
# 2 -1 1 1 4
# 3 1 -1 1 2
# 4 -1 -1 1 0
# 5 1 1 -1 0
# 6 -1 1 -1 -2
# 7 1 -1 -1 -4
# 8 -1 -1 -1 -6
If you don't want all combinations, you could filter down the signs data frame to the combos of interest, or build it in a way that only creates the combos you care about.
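As one way of building only the combinations of interest, the sign of the first element can be held at +1, which matches the four variants listed in the question. This is a sketch under that assumption; the object names signs and sums are illustrative.

```r
v1 <- c(1, 2, 3)

# First element always gets +1; the remaining elements get every +/- combination
signs <- expand.grid(c(list(1), rep(list(c(1, -1)), length(v1) - 1)))

# Each row of the sign matrix times v1 gives one signed sum
sums <- as.vector(as.matrix(signs) %*% v1)
cbind(signs, sum = sums)
```

This yields 2^(length(v1) - 1) rows instead of 2^length(v1), since each dropped row is just the negation of a kept one.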
You may use gtools::permutations to get all possible permutations of signs, then use apply to evaluate the value of each combination.
v1 <- c(1,2,3)
symbols <- c('+', '-')
all_comb <- gtools::permutations(length(symbols), length(v1) - 1, v = symbols, repeats.allowed = TRUE)
do.call(rbind, apply(all_comb, 1, function(x) {
exp <- do.call(sprintf, c(fmt = gsub(',', ' %s', toString(v1)), as.list(x)))
data.frame(exp, value = eval(parse(text = exp)))
}))
# exp value
#1 1 - 2 - 3 -4
#2 1 - 2 + 3 2
#3 1 + 2 - 3 0
#4 1 + 2 + 3 6
I have a data frame with several binary variables: x1, x2, ... x100. I want to replace the entry 1 in each column with the number in the name of the column, i.e.:
data$x2[data$x2 == 1] <- 2
data$x3[data$x3 == 1] <- 3
data$x4[data$x4 == 1] <- 4
data$x5[data$x5 == 1] <- 5
...
How can I achieve this in a loop?
Using col:
# example data
set.seed(1); d <- as.data.frame(matrix(sample(0:1, 12, replace = TRUE), nrow = 3))
names(d) <- paste0("x", seq(ncol(d)))
d
# x1 x2 x3 x4
# 1 0 0 0 1
# 2 1 1 0 0
# 3 0 0 1 0
ix <- d == 1
d[ ix ] <- col(d)[ ix ]
d
# x1 x2 x3 x4
# 1 0 0 0 4
# 2 1 2 0 0
# 3 0 0 3 0
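Note that col(d) returns column positions, so the approach above relies on the names x1, x2, ... lining up with those positions. If the names carry numbers that differ from the positions, the number could be parsed from each name instead. A minimal sketch, with a made-up data frame d2 and an illustrative vector num:

```r
# Example where the number in the name differs from the column position
d2 <- data.frame(x2 = c(0, 1, 1), x5 = c(1, 0, 1))

# Parse the number out of each column name
num <- as.integer(sub("^x", "", names(d2)))

# For each column, replace 1 with the number taken from its name
d2[] <- Map(function(column, n) ifelse(column == 1, n, column), d2, num)
d2
```

Assigning into d2[] keeps the object a data frame with its original column names.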
dplyr approach (using zx8754's data):
library(dplyr)
d %>%
mutate(across(starts_with('x'), ~ . * as.numeric(gsub('x', '', cur_column()))))
#> x1 x2 x3 x4
#> 1 0 0 0 4
#> 2 1 2 0 0
#> 3 0 0 3 0
Created on 2021-05-26 by the reprex package (v2.0.0)
Here is a base R solution with a lapply loop.
data[-1] <- lapply(names(data)[-1], function(k){
n <- as.integer(sub("[^[:digit:]]*", "", k))
data[data[[k]] == 1, k] <- n
data[[k]]
})
data
Test data.
set.seed(2021)
data <- replicate(6, rbinom(10, 1, 0.5))
data <- as.data.frame(data)
names(data) <- paste0("x", 1:6)
A solution based on a simple for loop is below (otherwise similar to the accepted answer using lapply):
for (i in 2:100) {
k <- paste0('x', i)
data[data[[k]] == 1, k] <- i
}
I have a data frame in R in which I would like to replace the negative values with zero in all columns except the id columns:
id1 id2 var1 var2 var3
-1 -1 0 -33 5
-1 -2 9 -10 -1
I can convert all columns with code line like:
temp[temp < 0] <- 0
But I can't adjust it to only a subset of columns. I've tried:
temp[temp < 0, -c(1,2)] <- 0
But this fails with an error saying non-existent rows are not allowed.
Modify your attempt slightly:
temp[,-c(1,2)][temp[, -c(1,2)] < 0] <- 0
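The same idea can be written with column names instead of positions, which may be easier to read when the id columns are known by name. This is a sketch that rebuilds the question's data; the vector vars is an illustrative name.

```r
temp <- data.frame(id1  = c(-1, -1),
                   id2  = c(-1, -2),
                   var1 = c(0, 9),
                   var2 = c(-33, -10),
                   var3 = c(5, -1))

# Columns to modify: everything except the id columns
vars <- setdiff(names(temp), c("id1", "id2"))

# The logical matrix is built from the same subset it is used to index
temp[vars][temp[vars] < 0] <- 0
temp
```

The original attempt failed because temp < 0 was placed in the row position of temp[rows, cols], whereas here the logical matrix indexes the selected columns directly.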
You can try using replace:
> mydf[-c(1, 2)] <- replace(mydf[-c(1, 2)], mydf[-c(1, 2)] < 0, 0)
> mydf
id1 id2 var1 var2 var3
1 -1 -1 0 0 5
2 -1 -2 9 0 0
We can use data.table
setDT(d1)
for(j in grep('^var', names(d1))){
set(d1, i= which(d1[[j]]<0), j= j, value=0)
}
d1
# id1 id2 var1 var2 var3
# 1: -1 -1 0 0 5
# 2: -1 -2 9 0 0
There might be fancier or more compact ways, but here's a vectorised replacement you can apply to the var columns:
mytable <- read.table(textConnection("
id1 id2 var1 var2 var3
-1 -1 0 -33 5
-1 -2 9 -10 -1"), header = TRUE)
mytable[, grep("^var", names(mytable))] <-
apply(mytable[, grep("^var", names(mytable))], 2, function(x) ifelse(x < 0, 0, x))
mytable
## id1 id2 var1 var2 var3
## 1 -1 -1 0 0 5
## 2 -1 -2 9 0 0
You could use pmax:
dat <- data.frame(id1=c(-1,-1), id2=c(-1,-2), var1=c(0,9), var2=c(-33,-10), var3=c(5,-1))
dat[,-c(1,2)] <- matrix(pmax(unlist(dat[,-c(1,2)]),0), nrow=nrow(dat))
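A variation on the pmax idea that avoids the unlist/matrix round trip is to apply pmax column by column with lapply. This is a sketch using the question's values, not part of the original answer.

```r
dat <- data.frame(id1  = c(-1, -1),
                  id2  = c(-1, -2),
                  var1 = c(0, 9),
                  var2 = c(-33, -10),
                  var3 = c(5, -1))

# pmax(column, 0) raises every negative value to 0; the id columns are skipped
dat[-(1:2)] <- lapply(dat[-(1:2)], pmax, 0)
dat
```

Working per column keeps each column's own type, which matters if the non-id columns are not all numeric of the same kind.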