Subtract from values in a dataframe by condition - r

I want to select some columns from my dataframe and subtract a number from all values meeting a condition. In my case, I want to select columns 5:10 of my data and subtract 10 from all values >5, while keeping all other values the same, and then save the resulting dataframe.
The solution I have tried (below) just subtracts 10 from all the values. How can I do this? Any help is much appreciated.
data <- data.frame(replicate(10,sample(-1:10,1000,rep=TRUE))) #generate random data
# what i have tried so far
(data[, 5:10] > 5) - 10

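In base R you may use lapply. Note that lapply returns a list, so assign the result back to the selected columns to keep the changes:
data[5:10] <- lapply(data[5:10], function(x) ifelse(x > 5, x - 10, x))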
In dplyr you can do
data <- data.frame(replicate(10,sample(-1:10,1000,rep=TRUE)))
library(dplyr, warn.conflicts = FALSE)
data %>%
mutate(across(5:10, ~ifelse(.>5, . - 10, .)))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 9 3 3 5 -2 -4 1 4 -4 -1
2 1 0 7 7 -2 3 2 -1 4 -3
3 2 -1 8 1 1 1 -4 0 0 3
4 9 9 4 6 -2 -3 3 0 0 0
5 7 -1 9 5 0 1 1 -1 -1 2
6 4 9 4 7 4 1 0 -1 -3 -1
.
.
.
.

You can use -
cols <- 5:10
data[cols] <- data[cols] - 10 * +(data[cols] > 5)
+(data[cols] > 5) would give you 1/0 values which is multiplied by 10. So you'll have 10 for values which are greater than 5 and 0 otherwise. These values are subtracted from the selected columns of the dataframe.
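To make the mask concrete, here is a minimal sketch on a tiny made-up data frame (the name toy and its values are assumptions for illustration, not taken from the question). The unary + turns the logical matrix into 1/0, so multiplying by 10 gives the amount to subtract in each cell.

```r
# Hypothetical toy data, chosen only to illustrate the mask
toy <- data.frame(a = c(2, 6, 9), b = c(7, 5, -1))

# Logical comparison, then unary + to coerce TRUE/FALSE to 1/0
mask <- +(toy > 5)
print(mask)

# 10 is subtracted only where the mask is 1; other cells lose 0
result <- toy - 10 * mask
print(result)
```

Because the subtraction is plain arithmetic, no ifelse() or loop is needed.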

I would use dplyr and base subsetting here.
library(dplyr)
data %>% mutate(across(5:10, ~{.x[.x>5]<-.x[.x>5]-10; .x}))
We can also substitute the whole subsetted dataframe in place, without loops or lapply, which can be done with not-so-beautiful but potentially very fast code:
data[,5:10][data[,5:10]>5]<-data[,5:10][data[,5:10]>5]-10
output
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 8 -1 0 3 1 -1 -2 0 1 -1
2 5 6 5 4 4 4 -1 3 2 -3
3 10 4 4 4 4 0 -3 -3 4 -1
4 1 7 5 5 -2 0 -3 5 1 5
5 0 6 7 1 0 -3 0 -1 -1 3
6 8 4 7 4 -3 5 0 -4 1 2
7 -1 5 9 7 0 1 0 2 4 4
8 9 8 5 3 -1 5 -3 -1 -4 -1
9 9 9 8 8 4 2 1 -1 1 3
10 8 8 9 5 2 -4 2 -3 -3 -1
......
[ reached 'max' / getOption("max.print") -- omitted 900 rows ]

Using vapply():
colIndices <- seq(5, 10)
data[, colIndices] <- vapply(
  data[, colIndices],
  function(x) {
    # as.numeric() keeps the result double even when no value exceeds 5
    as.numeric(ifelse(x > 5, x - 10, x))
  },
  numeric(nrow(data))
)
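As a sanity check, the base R approaches in this thread can be compared on a small data frame. The sketch below is only an illustration under assumptions: toy, a, b, and d are hypothetical names, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
toy <- data.frame(replicate(10, sample(-1:10, 20, replace = TRUE)))
cols <- 5:10

# 1) lapply + ifelse, assigned back into the selected columns
a <- toy
a[cols] <- lapply(a[cols], function(x) ifelse(x > 5, x - 10, x))

# 2) arithmetic with a 1/0 mask
b <- toy
b[cols] <- b[cols] - 10 * +(b[cols] > 5)

# 3) logical-matrix indexing on the selected columns
d <- toy
d[cols][d[cols] > 5] <- d[cols][d[cols] > 5] - 10

# All three should agree, and columns 1:4 should be untouched
print(isTRUE(all.equal(a, b)))
print(isTRUE(all.equal(a, d)))
```

all.equal() is used rather than identical() because some approaches return doubles while others may keep integers.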

Related

Specifying multiple column names inside mutate

How do I specify column names inside mutate when multiple columns are generated?
In this example:
set.seed(5)
data.frame(x2 = sample(1:10, 10),
x3 = sample(1:10, 10),
x1 = sample(1:10, 10),
y3 = sample(1:10, 10),
y2 = sample(1:10, 10),
y1 = sample(1:10, 10)) |>
mutate(z1 = x1 - y1,
z2 = x2 - y2,
z3 = x3 - y3) |>
mutate(zz = across(num_range(prefix = 'x',
range = 1:3)) - across(num_range(prefix = 'y',
range = 1:3)))
Resulting in:
x2 x3 x1 y3 y2 y1 z1 z2 z3 zz.x1 zz.x2 zz.x3
1 2 3 9 10 9 6 3 -7 -7 3 -7 -7
2 9 10 6 6 4 5 1 5 4 1 5 4
3 7 6 4 8 8 3 1 -1 -2 1 -1 -2
4 3 2 3 4 10 8 -5 -7 -2 -5 -7 -2
5 1 5 2 5 7 7 -5 -6 0 -5 -6 0
6 6 4 5 3 6 2 3 0 1 3 0 1
7 5 8 10 2 1 4 6 4 6 6 4 6
8 10 7 8 7 3 1 7 7 0 7 7 0
9 4 1 1 9 2 9 -8 2 -8 -8 2 -8
10 8 9 7 1 5 10 -3 3 8 -3 3 8
I want zz.x1 to be named zz1, and so on.
Here's a dplyr-way:
mutate takes its name(s) from the first across and we can change this using the .names-argument. It accepts glue-style input that we can adapt to your needs using str_replace().
library(dplyr)
library(stringr)
df |>
mutate(across(num_range(prefix = 'x', range = 1:3),
.names = "{str_replace(.col, 'x', 'z')}")
- across(num_range(prefix = 'y',range = 1:3)))
Output:
x2 x3 x1 y3 y2 y1 z1 z2 z3
1 2 3 9 10 9 6 3 -7 -7
2 9 10 6 6 4 5 1 5 4
3 7 6 4 8 8 3 1 -1 -2
4 3 2 3 4 10 8 -5 -7 -2
5 1 5 2 5 7 7 -5 -6 0
6 6 4 5 3 6 2 3 0 1
7 5 8 10 2 1 4 6 4 6
8 10 7 8 7 3 1 7 7 0
9 4 1 1 9 2 9 -8 2 -8
10 8 9 7 1 5 10 -3 3 8
Data:
set.seed(5)
df <- data.frame(x2 = sample(1:10, 10),
x3 = sample(1:10, 10),
x1 = sample(1:10, 10),
y3 = sample(1:10, 10),
y2 = sample(1:10, 10),
y1 = sample(1:10, 10))
Update: Or similar to OP's desired output
df2 |>
mutate(across(num_range(prefix = 'x', range = 1:3),
.names = "{str_replace(.col, 'x', 'zz')}")
- across(num_range(prefix = 'y',range = 1:3)))
Output:
x2 x3 x1 y3 y2 y1 z1 z2 z3 zz1 zz2 zz3
1 2 3 9 10 9 6 3 -7 -7 3 -7 -7
2 9 10 6 6 4 5 1 5 4 1 5 4
3 7 6 4 8 8 3 1 -1 -2 1 -1 -2
4 3 2 3 4 10 8 -5 -7 -2 -5 -7 -2
5 1 5 2 5 7 7 -5 -6 0 -5 -6 0
6 6 4 5 3 6 2 3 0 1 3 0 1
7 5 8 10 2 1 4 6 4 6 6 4 6
8 10 7 8 7 3 1 7 7 0 7 7 0
9 4 1 1 9 2 9 -8 2 -8 -8 2 -8
10 8 9 7 1 5 10 -3 3 8 -3 3 8
Data
set.seed(5)
df2 <- data.frame(x2 = sample(1:10, 10),
x3 = sample(1:10, 10),
x1 = sample(1:10, 10),
y3 = sample(1:10, 10),
y2 = sample(1:10, 10),
y1 = sample(1:10, 10)) |>
mutate(z1 = x1 - y1,
z2 = x2 - y2,
z3 = x3 - y3)
I don't know how to do this with dplyr, but in base R it is pretty straightforward. This might also partly answer your previous question.
# hard-coded variable suffixes
suff <- 1:3
# OR suffixes extracted from data
suff <- sort(unique(sub('[a-z]*', '', names(df))))
for (i in suff) {
df[[paste0('zz', i)]] <- df[[paste0('x', i)]] - df[[paste0('y', i)]]
}
df
# x2 x3 x1 y3 y2 y1 zz1 zz2 zz3
# 1 2 3 9 10 9 6 3 -7 -7
# 2 9 10 6 6 4 5 1 5 4
# 3 7 6 4 8 8 3 1 -1 -2
# 4 3 2 3 4 10 8 -5 -7 -2
# 5 1 5 2 5 7 7 -5 -6 0
# 6 6 4 5 3 6 2 3 0 1
# 7 5 8 10 2 1 4 6 4 6
# 8 10 7 8 7 3 1 7 7 0
# 9 4 1 1 9 2 9 -8 2 -8
# 10 8 9 7 1 5 10 -3 3 8
A more efficient way which avoids the loop over suffixes would be like this:
zz <- df[paste0('x', suff)] - df[paste0('y', suff)]
names(zz) <- paste0('zz', suff)
df <- cbind(df, zz)
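One thing worth noting about this version: subtracting two data frames works position by position, so it is the paste0() ordering that pairs x1 with y1 even when the columns are stored out of order. A small sketch with made-up values (df_toy and its numbers are hypothetical):

```r
# Columns deliberately stored out of suffix order
df_toy <- data.frame(x2 = c(5, 1), x1 = c(3, 8),
                     y1 = c(1, 2), y2 = c(4, 4))
suff <- 1:2

# Selecting by constructed names puts both sides in the same suffix order
zz <- df_toy[paste0("x", suff)] - df_toy[paste0("y", suff)]
names(zz) <- paste0("zz", suff)

df_toy <- cbind(df_toy, zz)
print(df_toy)
```

Selecting the columns by position instead (for example df_toy[1:2] - df_toy[3:4]) would pair x2 with y1 here, which is why the name-based selection matters.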
Data:
set.seed(5)
df <- data.frame(x2 = sample(1:10, 10),
x3 = sample(1:10, 10),
x1 = sample(1:10, 10),
y3 = sample(1:10, 10),
y2 = sample(1:10, 10),
y1 = sample(1:10, 10))

Generating all possible outcomes in r

Given a vector with numeric values, how do I generate all possible outcomes for subtraction to find the differences and put them in a data.frame?
dataset1 <- data.frame(numbers = c(1,2,3,4,5,6,7,8,9,10))
i.e. (1 - 1, 1 - 2 , 1 - 3,...)
Ideally, I would want the output to give me a data frame with 3 columns (Number X, Number Y, Difference) using dataset1.
The expand.grid function can get you "pairings" which are different from the pairings you get with combn. Since you included 1-1, I'm assuming you didn't want combn, since it doesn't return 1-1 and only gives you 45 combinations.
> pairs=expand.grid(X=1:10, Y=1:10)
> pairs$diff <- with(pairs, X-Y)
> pairs
X Y diff
1 1 1 0
2 2 1 1
3 3 1 2
4 4 1 3
5 5 1 4
6 6 1 5
7 7 1 6
8 8 1 7
9 9 1 8
10 10 1 9
11 1 2 -1
12 2 2 0
13 3 2 1
14 4 2 2
15 5 2 3
16 6 2 4
17 7 2 5
snipped remainder (total of 100 rows)
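The difference between the two kinds of pairings can be checked by counting rows. A minimal sketch (variable names are my own): expand.grid() gives every ordered pair, including a number paired with itself, while combn() gives each unordered pair once.

```r
numbers <- 1:10

# Every ordered pair (X, Y), including X == Y
ordered_pairs <- expand.grid(X = numbers, Y = numbers)

# Each unordered pair once, never pairing a number with itself
unordered_pairs <- t(combn(numbers, 2))

print(nrow(ordered_pairs))    # 10 * 10 rows
print(nrow(unordered_pairs))  # choose(10, 2) rows
print(any(ordered_pairs$X == ordered_pairs$Y))
print(any(unordered_pairs[, 1] == unordered_pairs[, 2]))
```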
Use outer as another way to get such a group of paired differences:
> tbl <- matrix( outer(X=1:10, Y=1:10, "-"), 10, dimnames=list(X=1:10, Y=1:10))
> tbl
Y
X 1 2 3 4 5 6 7 8 9 10
1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9
2 1 0 -1 -2 -3 -4 -5 -6 -7 -8
3 2 1 0 -1 -2 -3 -4 -5 -6 -7
4 3 2 1 0 -1 -2 -3 -4 -5 -6
5 4 3 2 1 0 -1 -2 -3 -4 -5
6 5 4 3 2 1 0 -1 -2 -3 -4
7 6 5 4 3 2 1 0 -1 -2 -3
8 7 6 5 4 3 2 1 0 -1 -2
9 8 7 6 5 4 3 2 1 0 -1
10 9 8 7 6 5 4 3 2 1 0
But I didn't see a compact way to create a dataframe of the sort you specified.
The now deleted comment by @RitchieSacramento was correct:
> tbl <- matrix( outer(X=1:10, Y=1:10, "-"), 10, dimnames=list(X=1:10, Y=1:10))
> as.data.frame.table(tbl)
X Y Freq
1 1 1 0
2 2 1 1
3 3 1 2
4 4 1 3
5 5 1 4
6 6 1 5
7 7 1 6
8 8 1 7
9 9 1 8
10 10 1 9
11 1 2 -1
12 2 2 0
13 3 2 1
14 4 2 2
15 5 2 3
16 6 2 4
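Two details of that result may need tidying: the value column is called Freq, and X and Y come back as factors. A hedged sketch of one way to handle both, using the responseName and stringsAsFactors arguments of as.data.frame.table() (the name res is my own):

```r
tbl <- outer(1:10, 1:10, "-")
dimnames(tbl) <- list(X = 1:10, Y = 1:10)

# Name the value column directly and keep X and Y as character, not factor
res <- as.data.frame.table(tbl, responseName = "Difference",
                           stringsAsFactors = FALSE)

# Convert the labels back to numbers
res$X <- as.numeric(res$X)
res$Y <- as.numeric(res$Y)

print(head(res, 3))
```

This gives the three columns asked for in the question, with one row per ordered pair.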
You can use the combn() function to generate the list of all combinations taken 2 at a time.
numbers = c(1,2,3,4,5,6,7,8,9,10)
output <-combn(numbers, 2, FUN = NULL, simplify = TRUE )
answer <- as.data.frame(t(output))
answer$Difference <- answer[ ,1] - answer[ ,2]
head(answer)
V1 V2 Difference
1 1 2 -1
2 1 3 -2
3 1 4 -3
4 1 5 -4
5 1 6 -5
6 1 7 -6
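combn() can also apply a function to each pair through its FUN argument, which computes the differences in a single call. A sketch, where the column names NumberX and NumberY are my own choice to match the layout described in the question:

```r
numbers <- 1:10

# One row per unordered pair
pairs <- t(combn(numbers, 2))

answer <- data.frame(
  NumberX = pairs[, 1],
  NumberY = pairs[, 2],
  # FUN receives each pair as a length-2 vector
  Difference = combn(numbers, 2, FUN = function(p) p[1] - p[2])
)

print(head(answer, 3))
```

Since combn() always lists the smaller element of an increasing vector first, every difference here is negative.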

How do I sum a specific value from a particular column given other criteria in R?

Let's say I have the following table:
> df <- data.frame("1"=c(9,10,11,10,11,9,10,10,9,11), "2"=c(1,1,2,2,1,2,1,2,2,1), "3"=c(3,1,0,0,3,3,3,3,1,0))
> df
X1 X2 X3
1 9 1 3
2 10 1 1
3 11 2 0
4 10 2 0
5 11 1 3
6 9 2 3
7 10 1 3
8 10 2 3
9 9 2 1
10 11 1 0
How do I find the sum of all the 3's in the column X3, given the criteria that the value in column X1 must be 9, and the value in column X2 is 1?
We can use == with & to create a logical vector, get the sum and multiply by 3
with(df, 3 * sum(X3 == 3 & X1 == 9 & X2 == 1))
#[1] 3
Or another option is
3 * sum(do.call(paste0, df) == '913')
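An equivalent way to read this is to subset X3 with the logical vector and sum what remains, which avoids hard-coding the multiplication by 3. A sketch using the df from the question (keep and total are names I introduced):

```r
df <- data.frame("1" = c(9, 10, 11, 10, 11, 9, 10, 10, 9, 11),
                 "2" = c(1, 1, 2, 2, 1, 2, 1, 2, 2, 1),
                 "3" = c(3, 1, 0, 0, 3, 3, 3, 3, 1, 0))

# TRUE only for rows meeting all three conditions
keep <- df$X1 == 9 & df$X2 == 1 & df$X3 == 3

# Sum the X3 values of the matching rows
total <- sum(df$X3[keep])
print(total)
```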

How to count observations satisfying specific criteria in R [duplicate]

Suppose I have the following data frame:
Data1
X1 X2
1 15 1
2 3 1
3 7 0
4 11 1
5 1 0
6 9 0
7 18 0
8 6 1
9 3 1
I would like to know how to find the total number of observations where X1 is greater than 9 and X2 is equal to 1?
I think I will need to use sum(), but I have no idea what to put in the parenthesis.
data1='
X1 X2
15 1
3 1
7 0
11 1
1 0
9 0
18 0
6 1
3 1'
data1=read.table(text=data1,header=T)
1)
nrow(data1[data1$X1 > 9 & data1$X2 ==1,])
2)
sum(data1$X1 > 9 & data1$X2 ==1)
3)
With data.table:
library(data.table)
dataDT = data.table(data1)
dataDT[X1 > 9 & X2 == 1, .N]
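Option 2 works because TRUE counts as 1 and FALSE as 0 when summed. If a column could contain NA (an assumption; the example data has none), the comparison can yield NA and sum() then returns NA unless na.rm = TRUE is set. A minimal sketch, with with_na as a hypothetical modified copy:

```r
data1 <- data.frame(X1 = c(15, 3, 7, 11, 1, 9, 18, 6, 3),
                    X2 = c(1, 1, 0, 1, 0, 0, 0, 1, 1))

# Logical vector: TRUE where both conditions hold
hits <- data1$X1 > 9 & data1$X2 == 1
n_hits <- sum(hits)
print(n_hits)

# Hypothetical variant with a missing value in a row where X2 == 1
with_na <- data1
with_na$X1[2] <- NA
n_na <- sum(with_na$X1 > 9 & with_na$X2 == 1)
n_na_rm <- sum(with_na$X1 > 9 & with_na$X2 == 1, na.rm = TRUE)
print(n_na)
print(n_na_rm)
```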

Subtract multiple columns ignoring NA

I'm fairly new to R and have run into an issue with NA's. This question may have been answered elsewhere but I can't seem to find the answer. I'm trying to do sort of the opposite of rowSums() in that I'm trying to subtract x2 and x3 from x1 in order to generate x4 without NA's. The code I'm currently using is as follows:
> x <- data.frame(x1 = 3, x2 = c(4:1, 2:5), x3=c(1,NA))
> x$x4=x$x1-x$x2-x$x3
> x
x1 x2 x3 x4
1 3 4 1 -2
2 3 3 NA NA
3 3 2 1 0
4 3 1 NA NA
5 3 2 1 0
6 3 3 NA NA
7 3 4 1 -2
8 3 5 NA NA
In other words, I want to ignore the NAs, similar to how rowSums allows the na.rm=TRUE argument, so that I get this result:
x1 x2 x3 x4
1 3 4 1 -2
2 3 3 NA 0
3 3 2 1 0
4 3 1 NA 2
5 3 2 1 0
6 3 3 NA 0
7 3 4 1 -2
8 3 5 NA -2
Any help is greatly appreciated.
You can use something like this if any of the columns may contain NAs:
x$x4 <- ifelse(is.na(x$x1), 0, x$x1) - ifelse(is.na(x$x2), 0, x$x2) - ifelse(is.na(x$x3), 0, x$x3)
This assumes you want to treat NAs as 0. Otherwise, replace the 0s in the formula above with the value you need.
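The same idea can be written without repeating ifelse() for every column by replacing the NAs in a copy first. This is a sketch under the same assumption that NA should count as 0; x0 is a hypothetical helper copy.

```r
x <- data.frame(x1 = 3, x2 = c(4:1, 2:5), x3 = c(1, NA))

# Work on a copy so the original NAs stay visible in x
x0 <- x[c("x1", "x2", "x3")]
x0[is.na(x0)] <- 0

x$x4 <- x0$x1 - x0$x2 - x0$x3
print(x)
```

Keeping the replacement in a copy means x3 still shows NA in the final data frame, matching the desired output in the question.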
Just use rowSums:
> x$x4 <- x$x1 - rowSums(x[,2:3], na.rm=TRUE)
> x
x1 x2 x3 x4
1 3 4 1 -2
2 3 3 NA 0
3 3 2 1 0
4 3 1 NA 2
5 3 2 1 0
6 3 3 NA 0
7 3 4 1 -2
8 3 5 NA -2
