I have a logical vector, for which I wish to insert new elements at particular indexes. I've come up with a clumsy solution below, but is there a neater way?
probes <- rep(TRUE, 15)
ind <- c(5, 10)
probes.2 <- logical(length(probes)+length(ind))
probes.ind <- ind + 1:length(ind)
probes.original <- (1:length(probes.2))[-probes.ind]
probes.2[probes.ind] <- FALSE
probes.2[probes.original] <- probes
print(probes)
gives
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
and
print(probes.2)
gives
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
So it works but is ugly looking - any suggestions?
These are all very creative approaches. I think working with indexes is definitely the way to go (Marek's solution is very nice).
I would just mention that there is a function to do roughly that: append().
probes <- rep(TRUE, 15)
probes <- append(probes, FALSE, after=5)
probes <- append(probes, FALSE, after=11)
Or you could do this recursively with your indexes (you need to grow the "after" value on each iteration):
probes <- rep(TRUE, 15)
ind <- c(5, 10)
for(i in 0:(length(ind)-1))
probes <- append(probes, FALSE, after=(ind[i+1]+i))
Incidentally, this question was also previously asked on R-Help. As Barry says:
"Actually I'd say there were no ways of doing this, since I dont think you can actually insert into a vector - you have to create a new vector that produces the illusion of insertion!"
You can do some magic with indexes:
First create vector with output values:
probs <- rep(TRUE, 15)
ind <- c(5, 10)
val <- c( probs, rep(FALSE,length(ind)) )
# > val
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [13] TRUE TRUE TRUE FALSE FALSE
Now trick. Each old element gets rank, each new element gets half-rank
id <- c( seq_along(probs), ind+0.5 )
# > id
# [1] 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0
# [16] 5.5 10.5
Then use order to sort in proper order:
val[order(id)]
# [1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
# [13] TRUE TRUE TRUE TRUE TRUE
probes <- rep(TRUE, 1000000)
ind <- c(50:100)
val <- rep(FALSE,length(ind))
new.probes <- vector(mode="logical",length(probes)+length(val))
new.probes[-ind] <- probes
new.probes[ind] <- val
Some timings:
My method
user system elapsed
0.03 0.00 0.03
Marek method
user system elapsed
0.18 0.00 0.18
R append with for loop
user system elapsed
1.61 0.48 2.10
How about this:
> probes <- rep(TRUE, 15)
> ind <- c(5, 10)
> probes.ind <- rep(NA, length(probes))
> probes.ind[ind] <- FALSE
> new.probes <- as.vector(rbind(probes, probes.ind))
> new.probes <- new.probes[!is.na(new.probes)]
> new.probes
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
That is sorta tricky. Here's one way. It iterates over the list, inserting each time, so it's not too efficient.
probes <- rep(TRUE, 15)
probes.ind <- ind + 0:(length(ind)-1)
for (i in probes.ind) {
probes <- c(probes[1:i], FALSE, probes[(i+1):length(probes)])
}
> probes
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
This should even work if ind has repeated elements, although ind does need to be sorted for the probes.ind construction to work.
Or you can do it using the insertRow function from the miscTools package.
probes <- rep(TRUE, 15)
ind <- c(5,10)
for (i in ind){
probes <- as.vector(insertRow(as.matrix(probes), i, FALSE))
}
I came up with a good answer that's easy to understand and fairly fast to run, building off Wojciech's answer above. I'll adapt the method for the example here, but it can be easily generalized to pretty much any data type for an arbitrary pattern of missing points (shown below).
probes <- rep(TRUE, 15)
ind <- c(5,10)
probes.final <- rep(FALSE, length(probes)+length(ind))
probes.final[-ind] <- probes
The data I needed this for is sampled at a regular interval, but many samples are thrown out, and the resulting data file only includes the timestamps and measurements for those retained. I needed to produce a vector containing all the timestamps and a data vector with NAs inserted for timestamps that were tossed. I used the "not in" function stolen from here to make it a bit simpler.
`%notin%` <- Negate(`%in%`)
dat <- rnorm(50000) # Data given
times <- seq(from=554.3, by=0.1, length.out=70000] # "Original" time stamps
times <- times[-sample(2:69999, 20000)] # "Given" times with arbitrary points missing from interior
times.final <- seq(from=times[1], to=times[length(times)], by=0.1)
na.ind <- which(times.final %notin% times)
dat.final <- rep(NA, length(times.final))
dat.final[-na.ind] <- dat
Um, hi, I had the same doubt, but I couldn't understand what people had answered, because I'm still learning the language. So I tried make my own and I suppose it works! I created a vector and I wanted to insert the value 100 after the 3rd, 5th and 6th indexes. This is what I wrote.
vector <- c(0:9)
indexes <- c(6, 3, 5)
indexes <- indexes[order(indexes)]
i <- 1
j <- 0
while(i <= length(indexes)){
vector <- append(vector, 100, after = indexes[i] + j)
i <-i + 1
j <- j + 1
}
vector
The vector "indexes" must be in ascending order for this to work. This is why I put them in order at the third line.
The variable "j" is necessary because at each iteration, the length of the new vector increases and the original values are moved.
In the case you wish to insert the new value next to each other, simply repeat the number of the index. For instance, by assigning indexes <- c(3, 5, 5, 5, 6), you should get vector == 0 1 2 100 3 4 100 100 100 5 100 6 7 8 9
Related
This should be very simple, but my r knowledge is limited.
I'm trying to find out if any value is greater than all previous values.
An example would be
x<-c(1.1, 2.5, 2.4, 3.6, 3.2)
results:
NA True False True False
My real values are measurements with many decimal places so I doubt I will get the same value twice
You can use cummax() to get the biggest value so far. x >= cummax(x) basically gives you the answer, although element 1 is TRUE, so you just need to change that:
> out = x >= cummax(x)
> out[1] = NA
> out
[1] NA TRUE FALSE TRUE FALSE
Although #Marius has got this absolutely correct. Here is an option with a loop
sapply(seq_along(x), function(i) all(x[i] >= x[seq_len(i)]))
#[1] TRUE TRUE FALSE TRUE FALSE
Or same logic with explicit for loop
out <- logical(length(x))
for(i in seq_along(x)) {
out[i] <- all(x[i] >= x[seq_len(i)])
}
out[1] <- NA
out
#[1] NA TRUE FALSE TRUE FALSE
We can use lapply
unlist(lapply(seq_along(x), function(i) all(x[i] >=x[seq(i)])))
#[1] TRUE TRUE FALSE TRUE FALSE
Or with max.col
max.col(t(sapply(x, `>=`, x)), 'last') > seq_along(x)
#[1] FALSE TRUE FALSE TRUE FALSE
or with for loop
mx <- x[1]
i1 <- logical(length(x))
for(i in seq_along(x)) {i1[i][x[i] > mx] <- TRUE; mx <- max(c(mx, x[i]))}
how can I check to see if vector exists inside a matrix. The vector will be of size 2. I have an approach but I would like something vectorized/faster.
dim(m)
[1] 30 2
x = c(1, -2)
for(j in 1:nrow(m)){
if ( isTRUE(as.vector(x[1]) == as.vector(m[j,1])) && as.vector(x[2] == as.vector(m[j,2]) )) {
print(TRUE)
}
}
note, x=c(1, -2) is not the same as -2, 1 in the matrix.
If we are comparing the rows of the matrix ('m') with 'x' having the same length as the number of columns of 'm', we can replicate 'x' (x[col(m)]) to make the lengths same, compare (!=), get the rowSums. If the sum is 0 for a particular row, it means that all the values in the vector matches that row of 'm'. Negate (!) to convert 0 to TRUE and all other values as FALSE.
indx1 <- !rowSums(m!=x[col(m)])
Or if we need a solution using apply, we can use identical
indx2 <- apply(m, 1, identical, y=x)
identical(indx1, indx2)
#[1] TRUE
If this to find only a single TRUE/FALSE, we can wrap any to 'indx1' or 'indx2'.
data
x <- c(1, -2)
set.seed(24)
m <- matrix(sample(c(1,-2,3,4), 30*2, replace=TRUE), ncol=2)
Try
m<-matrix(rnorm(60),30)
x<-m[8,]
m[9,]<-c(x[2],x[1]) # to prove 1,-2 not same -2,1
apply(m,1,function(n,x) all(n==x),x=x)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[24] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
if you need just one T/F use any() you
any(apply(m,1,function(n,x) all(n==x),x=x))
[1] TRUE
if run this code with akrun's data
x <- c(1, -2)
set.seed(24)
m <- matrix(sample(c(1,-2,3,4), 30*2, replace=TRUE), ncol=2)
any(apply(m,1,function(n,x) all(n==x),x=x))
[1] TRUE
I have a matrix and a vector with values:
mat<-matrix(c(1,1,6,
3,5,2,
1,6,5,
2,2,7,
8,6,1),nrow=5,ncol=3,byrow=T)
vec<-c(1,6)
This is a small subset of a N by N matrix and 1 by N vector. Is there a way so that I can subset the rows with values in vec?
The most straight forward way of doing this that I know of would be to use the subset function:
subset(mat,vec[,1] == 1 & vec[,2] == 6) #etc etc
The problem with subset is you have to specify in advance the column to look for and the specific combination to do for. The problem I am facing is structured in a way such that I want to find all rows containing the numbers in "vec" in any possible way. So in the above example, I want to get a return matrix of:
1,1,6
1,6,5
8,6,1
Any ideas?
You can do
apply(mat, 1, function(x) all(vec %in% x))
# [1] TRUE FALSE TRUE FALSE TRUE
but this may give you unexpected results if vec contains repeated values:
vec <- c(1, 1)
apply(mat, 1, function(x) all(vec %in% x))
# [1] TRUE FALSE TRUE FALSE TRUE
so you would have to use something more complicated using table to account for repetitions:
vec <- c(1, 1)
is.sub.table <- function(table1, table2) {
all(names(table1) %in% names(table2)) &&
all(table1 <= table2[names(table1)])
}
apply(mat, 1, function(x)is.sub.table(table(vec), table(x)))
# [1] TRUE FALSE FALSE FALSE FALSE
However, if the vector length is equal to the number of columns in your matrix as you seem to indicate but is not the case in your example, you should just do:
vec <- c(1, 6, 1)
apply(mat, 1, function(x) all(sort(vec) == sort(x)))
# [1] TRUE FALSE FALSE FALSE FALSE
This question already has answers here:
Check which elements of a vector is between the elements of another one in R
(4 answers)
Closed 9 years ago.
I have two vectors. I want to check the first element of first vector is between first and second element of second vector , then check the second element of first vector is between the third and forth element of the second vector ,.....How can I do this in R?
For example, If we have tow vectors
a = c(1.5, 2, 3.5)
b = c(1, 2, 3, 5, 3, 8)
the final result in R should be for 1.5 is TRUE and 3.5 is TRUE and for 2 is FALSE.
x <- c(1.5,3.5,3.5,3.5,4)
y <- 1:5
x > y & x < c(y[-1],NA)
#[1] TRUE FALSE TRUE FALSE FALSE
You need to take care of vector lengths and think about, what you want the result to be for the last element of x and of course.
More robust solution:
x <- c(1.5,3.5,3.5,3.5,4)
findInterval(x,y) == seq_along(x)
#[1] TRUE FALSE TRUE FALSE FALSE
x1 <- c(1.5,3.5)
findInterval(x1,y) == seq_along(x1)
#[1] TRUE FALSE
x2 <- c(1.5,3.5,1:5+0.5)
findInterval(x2,y) == seq_along(x2)
#[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Here's one way.
s <- seq_along(a)
b[s] < a[s] & a[s] < b[s+1]
# [1] TRUE FALSE TRUE
Maybe this is not an ideal and fastest solution, but it works.
a <- rnorm(99)
b <- rnorm(100)
m <- cbind(b[-length(b)], b[-1])
a > m[,1] & a < m[,2]
You should check the lengths of both initial vectors.
Here is one-line solution:
sapply(1:length(a), function(i) {a[i] > b[i] & a[i] < b[i+1]})
I have two data frames with different number of rows, but the same number of columns. In the example below data frame 1 is 4 x 2, data frame 2 is 3 x 2. I need a 4 x 3 logical matrix where TRUE indicates that the all the rows in the data frames match. This example works, but takes a very long time to run with larger data frames (I'm trying two data frames with about 5,000 rows, but still just two columns). Is there a more efficient way of doing this?
> df1 <- data.frame(row.names=1:4, var1=c(TRUE, TRUE, FALSE, FALSE), var2=c(1,2,3,4))
> df2 <- data.frame(row.names=5:7, var1=c(FALSE, TRUE, FALSE), var2=c(5,2,3))
>
> m1 <- t(as.matrix(df1))
> m2 <- as.matrix(df2)
>
> apply(m2, 1, FUN=function(x) { apply(m1, 2, FUN=function(y) { all(x==y) } ) })
5 6 7
1 FALSE FALSE FALSE
2 FALSE TRUE FALSE
3 FALSE FALSE TRUE
4 FALSE FALSE FALSE
Thanks in advance for any help.
I was drawn here by your post on R-bloggers: http://jason.bryer.org/posts/2013-01-24/Comparing_Two_Data_Frames.html
If like you say, your data has no numeric vectors, then I think I can suggest a faster approach. It consists in:
turn your two data.frames into two matrices of integers
compute the Euclidean distance between rows of your two datas
Quick example using your data:
mat1 <- as.matrix(sapply(df1, as.integer))
mat2 <- as.matrix(sapply(df2, as.integer))
library(fields)
rdist(mat1, mat2) < 1e-9
# [,1] [,2] [,3]
# [1,] FALSE FALSE FALSE
# [2,] FALSE TRUE FALSE
# [3,] FALSE FALSE TRUE
# [4,] FALSE FALSE FALSE
A few comments:
if your data contained vectors of characters, you would have to convert them into factors and make sure that they share the same factor levels.
I used the fields package to compute the Euclidean distance. It uses a Fortran implementation and is as far as I know the fastest R package around for the task (and I have tested many, trust me.)
I'm honestly not sure if this will be faster, but you might try:
foo <- Vectorize(function(x,y) {all(df1[x,] == df2[y,])})
> outer(1:4,1:3,FUN = foo)
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE TRUE FALSE
[3,] FALSE FALSE TRUE
[4,] FALSE FALSE FALSE
I feel compelled to at least mention the danger in using == for comparisons as opposed to all.equal or identical. I'm presuming that you're comfortable enough with the data types involves that this won't be a problem.
I suspect that the optimal solution depends on how many unique rows and how many total rows you have.
For the example on your blog, where there are 1000-1500 rows but only 20 unique values (for the seed you set there), I think it's faster to do this:
assign ids to each unique row and then
run outer on the vector of ids seen in each data.frame.
Here's the performance I got. #flodel's approach does about the same on my computer; it's the third one below. Disclaimer: I don't know much about running these kinds of tests.
> set.seed(2112)
> df1 <- data.frame(row.names=1:1000,
+ var1=sample(c(TRUE,FALSE), 1000, replace=TRUE),
+ var2=sample(1:10, 1000, replace=TRUE) )
> df2 <- data.frame(row.names=1001:2500,
+ var1=sample(c(TRUE,FALSE), 1500, replace=TRUE),
+ var2=sample(1:10, 1500, replace=TRUE))
>
> # candidate method on blog
> system.time({
+ df1$var3 <- apply(df1, 1, paste, collapse='.')
+ df2$var3 <- apply(df2, 1, paste, collapse='.')
+ df6 <- sapply(df2$var3, FUN=function(x) { x == df1$var3 })
+ dimnames(df6) <- list(row.names(df1), row.names(df2))
+ })
user system elapsed
1.13 0.00 1.14
>
> rownames(df1) <- NULL # in case something weird happens to rownames on merge
> rownames(df2) <- NULL
> # id method
> system.time({
+ df12 <- unique(rbind(df1,df2))
+ df12$id <- rownames(df12)
+
+ id1 <- merge(df12,df1)$id
+ id2 <- merge(df12,df2)$id
+
+ x <- outer(id1,id2,`==`)
+ })
user system elapsed
0.11 0.02 0.13
>
> library(fields)
> # rdlist from fields method
> system.time({
+ mat1 <- as.matrix(sapply(df1, as.integer))
+ mat2 <- as.matrix(sapply(df2, as.integer))
+ rdist(mat1, mat2) < 1e-9
+ })
user system elapsed
0.15 0.00 0.16
I guess the rbind and the merges would make this solution relatively more costly with different data.