Related
Let´s say I have two vectors
a <- c(5,10,12)
b <- c(4,11,15)
I would like to compare a with b and obtain the smaller closest value to each element. The smaller closest value to 5 is 4, for 10 is 4 and for 12 is 11. And the same but finding the closest bigger value. For 5 is 11, for 10 is 11 and for 12 is 15.
Desired vector of closest smaller values:
4 4 11
Desired vector of closest bigger values:
11 11 15
I found another example using the function closest from the package DescTools, but the results are different
> unlist(lapply(a, function(i) min(Closest(x = b, a = i))))
[1] 4 11 11
> unlist(lapply(a, function(i) max(Closest(x = b, a = i))))
[1] 4 11 11
Do you know how I could achieve my objective?
This should do it:
> sapply(a,function(x) b[tail(which(b<x),1)])
[1] 4 4 11
> sapply(a,function(x) b[head(which(b>x),1)])
[1] 11 11 15
Assuming both are sorted and unique:
idx <- findInterval(a, b)
a[idx]
[1] 5 5 10
b[idx+1]
[1] 11 11 15
I have a dataframe called barometre2013 with a column called q0qc that contain this numbers:
[1] 15 1 9 15 9 3 6 3 3 6 6 10 15 6 15 10
I want to add +1 to the numbers that are >= 10, so the result should be this:
[1] 16 1 9 16 9 3 6 3 3 6 6 11 16 6 16 11
I have tried this code:
if (barometre2013$q0qc > 9) {
barometre2013$q0qc <- barometre2013$q0qc + 1
}
But this add +1 to all the numbers without respecting the condition:
[1] 16 2 10 16 10 4 7 4 4 7 7 11 16 7 16 11
How can I do what I want ?
Thank a lot.
When you executed:
if (barometre2013$q0qc > 9) {
barometre2013$q0qc <- barometre2013$q0qc + 1
}
... you should have seen a warning about "only the first value being evaluated". That first value in barometre2013$q0qc was 15 and since it was TRUE, then that assignment was done on the entire vector. ifelse or Boolean logic are approaches suggested in the comments for conditional evaluation and/or assignment. The first:
barometre2013$q0qc <- barometre2013$q0qc + (barometre2013$q0qc >= 10)
... added a vector of 1 and 0's to the starting vector; 1 if the logical expression is satisfied and 0 if not. If you wanted to add something other than one (which is the numeric value of TRUE) you could have multiplied that second term by the desired increment or decrement.
Another approach was to use ifelse which does do a conditional test of its first argument on returns either the second or third argument on an item-by-item basis:
barometre2013$q0qc <- barometre2013$q0qc + ifelse(barometre2013$q0qc >= 10, 1, 0)
The third approach suggested by dash2 would be to only modify those values that meet the condition. Note that this method requires having the "test vector on both sides of the assignment (which is why dash2 was correcting the earlier comment:
barometre2013$q0qc[barometre2013$q0qc>=10] <-
barometre2013$q0qc[barometre2013$q0qc>=10]+ 1
data <- c(15,1,9,15,9,3,6,3,3,6,6,10,15,6,15,10)
data2 <-
as.numeric(
for(i in data){
if(i >= 10){ i = i +1 }
print(i)
}
)
class(data)
class(data2)
I have two integer/posixct vectors:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements
Now my resulting vector c should contain for each element of vector a the nearest element of b:
c <- c(4,4,4,4,4,6,6,...)
I tried it with apply and which.min(abs(a - b)) but it's very very slow.
Is there any more clever way to solve this? Is there a data.table solution?
As it is presented in this link you can do either:
which(abs(x - your.number) == min(abs(x - your.number)))
or
which.min(abs(x - your.number))
where x is your vector and your.number is the value. If you have a matrix or data.frame, simply convert them to numeric vector with appropriate ways and then try this on the resulting numeric vector.
For example:
x <- 1:100
your.number <- 21.5
which(abs(x - your.number) == min(abs(x - your.number)))
would output:
[1] 21 22
Update: Based on the very kind comment of hendy I have added the following to make it more clear:
Note that the answer above (i.e 21 and 22) are the indexes if the items (this is how which() works in R), so if you want to get the actual values, you have use these indexes to get the value. Let's have another example:
x <- seq(from = 100, to = 10, by = -5)
x
[1] 100 95 90 85 80 75 70 65 60 55 50 45 40 35 30 25 20 15 10
Now let's find the number closest to 42:
your.number <- 42
target.index <- which(abs(x - your.number) == min(abs(x - your.number)))
x[target.index]
which would output the "value" we are looking for from the x vector:
[1] 40
Not quite sure how it will behave with your volume but cut is quite fast.
The idea is to cut your vector a at the midpoints between the elements of b.
Note that I am assuming the elements in b are strictly increasing!
Something like this:
a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements
cuts <- c(-Inf, b[-1]-diff(b)/2, Inf)
# Will yield: c(-Inf, 5, 8, 13, Inf)
cut(a, breaks=cuts, labels=b)
# [1] 4 4 4 4 4 6 6 6 10 10 10 10 10 16 16
# Levels: 4 6 10 16
This is even faster using a lower-level function like findInterval (which, again, assumes that breakpoints are non-decreasing).
findInterval(a, cuts)
[1] 1 1 1 1 2 2 2 3 3 3 3 3 4 4 4
So of course you can do something like:
index = findInterval(a, cuts)
b[index]
# [1] 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
Note that you can choose what happens to elements of a that are equidistant to an element of b by passing the relevant arguments to cut (or findInterval), see their help page.
library(data.table)
a=data.table(Value=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
a[,merge:=Value]
b=data.table(Value=c(4,6,10,16))
b[,merge:=Value]
setkeyv(a,c('merge'))
setkeyv(b,c('merge'))
Merge_a_b=a[b,roll='nearest']
In the Data table when we merge two data table, there is an option called nearest which put all the element in data table a to the nearest element in data table b. The size of the resultant data table will be equal to the size of b (whichever is within the bracket). It requires a common key for merging as usual.
For those who would be satisfied with the slow solution:
sapply(a, function(a, b) {b[which.min(abs(a-b))]}, b)
Here might be a simple base R option, using max.col + outer:
b[max.col(-abs(outer(a,b,"-")))]
which gives
> b[max.col(-abs(outer(a,b,"-")))]
[1] 4 4 4 4 6 6 6 10 10 10 10 10 16 16 16
Late to the party, but there is now a function from the DescTools package called Closest which does almost exactly what you want (it just doesn't do multiple at once)
To get around this we can lapply over your a list, and find the closest.
library(DescTools)
lapply(a, function(i) Closest(x = b, a = i))
You might notice that more values are being returned than exist in a. This is because Closest will return both values if the value you are testing is exactly between two (e.g. 3 is exactly between 1 and 5, so both 1 and 5 would be returned).
To get around this, put either min or max around the result:
lapply(a, function(i) min(Closest(x = b, a = i)))
lapply(a, function(i) max(Closest(x = b, a = i)))
Then unlist the result to get a plain vector :)
this is for setting
#this is for setting
A <- c(1,1,2,2,2,3,4,4,5,5,5)
B <- c(1:10)
C <- c(11:20)
ABC <- data.frame(A,B,C)
#so, I made up my own ABC like this
A B C
1 1 1 11
2 1 2 12
3 2 3 13
4 2 4 14
5 2 5 15
6 3 6 16
7 4 7 17
8 4 8 18
9 5 9 19
10 5 10 20
On this setting,
I want to know, when (A) are in a specific condition, how to get average (B)or(C)
For example
if condition(A) are 2:4, get mean (B), and mean(C)
new_ABC <- subset(ABC, ABC$A >= 2 & ABC$A <= 4)
mean(newABC$B)
mean(newABC$C)
and it works.
But if I want to make a function like this, I tried severe days, I have no idea...
getMeanB <- function(condition){
for(i in min(condition) : max(condition){
# I do not really know what to do..
}
}
Any helps will very thanks!!
If the argument 'condition' is a vector, then we can do it
getMean <- function(data, condition, cName) {
minC <- min(condition)
maxC <- max(condition)
i1 <- data[[cName]] >= minC & data[[cName]] <= maxC
colMeans(data[i1,setdiff(names(data), cName)], na.rm = TRUE)
}
getMean(ABC, 2:4, "A")
# B C
# 5.5 15.5
NOTE: Here, the 'data' and 'cName' arguments are added to make it more dynamic and applied to other datasets with different column names.
Assume you have a data frame like this:
df <- data.frame(Nums = c(1,2,3,4,5,6,7,8,9,10), Cum.sums = NA)
> df
Nums Cum.sums
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
and you want an output like this:
Nums Cum.sums
1 1 0
2 2 0
3 3 0
4 4 3
5 5 5
6 6 7
7 7 9
8 8 11
9 9 13
10 10 15
The 4. element of the column Cum.sum is the sum of 1 and 2, the 5. element of the Column Cum.sum is the sum of 2 and 3 and so on...
This means, I would like to build the cumulative sum of the first row and save it in the second row. However I don't want the normal cumulative sum but the sum of the element 2 rows above the current row plus the element 3 rows above the current row.
I allready tried to play a little bit around with the sum and cumsum function but I failed.
Any ideas?
Thanks!
You could use the embed function to create the appropriate lags, rowSums to sum, then lag appropriately (I used head).
df$Cum.sums[-(1:3)] <- head(rowSums(embed(df$Nums,2)),-2)
You don't need any special function, just use normal vector operations (these solutions are all equivalent):
df$Cum.sums[-(1:3)] <- head(df$Nums, -3) + head(df$Nums[-1], -2)
or
with(df, Cum.sums[-(1:3)] <- head(Nums, -3) + head(Nums[-1], -2))
or
df$Cum.sums[-(1:3)] <- df$Nums[1:(nrow(df)-3)] + df$Nums[2:(nrow(df)-2)]
I believe the first 3 sums SHOULD be NA, not 0, but if you prefer zeroes, you can initialize the sums first:
df$Cum.sums <- 0
Another solution, elegant and general, using matrix multiplication - and so very inefficient for large data. So it's not much practical, though a nice excercise:
len <- nrow(df)
sr <- 2 # number of rows to sum
lag <- 3
mat <- matrix(
head(c(
rep(0, lag * len),
rep(rep(1:0, c(sr, len - sr + 1)), len)
), len * len),
nrow = 10, byrow = TRUE
)
mat %*% df$Nums