I am running a loop in R to find indices of a vector when its elements are equal to elements of a reference vector.
As far as I know R, I need to declare the variable before the for-loop, but in this case I do not know the final length of my indices vector (see code below).
How can I create a variable that R can resize during the for loop?
Extract of my code:
k <- 1
for(i in 1:length(Lid.time)){
ind <- which(Net.time==Lid.time[i])
if(length(ind)>0){
ind.Net[k] <- ind
k <- k+1
}
}
Notes about the code:
Lid.time is a vector of a different length than Net.time.
I need to find an array of indices that tells me where Net.time is equal to Lid.time. I do not know in advance how long the ind.Net vector will be, so how can I declare the vector ind.Net?
Thanks for your help
As Dason stated, match will work just fine for that specific task:
>a <- seq(2,20,2)
#[1] 2 4 6 8 10 12 14 16 18 20
>b <- c(4,14,18)
>match(b,a)
#[1] 2 7 9 # The indices!
>a %in% b #shorthand logical version of match
#[1] FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
But to answer your question of a vector of unknown length within a loop:
Vector <- c()
for(i in sample(1:100,20)) {
if(i<50) {Vector <- append(Vector, i)}
}
length(Vector)
It will be different every time you run it because of sample.
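If the number of results is not known in advance, one common alternative to calling append() on every iteration is to store each iteration's result in a list that has one slot per iteration, and flatten it once at the end. This is only a sketch: the values of Net.time and Lid.time below are made up, and only the variable names come from the question.

```r
# Hypothetical data; only the variable names come from the question.
Net.time <- c(10, 20, 30, 20, 40)
Lid.time <- c(20, 40, 50)

# One list slot per iteration; a slot may hold zero, one, or several indices.
hits <- vector("list", length(Lid.time))
for (i in seq_along(Lid.time)) {
  hits[[i]] <- which(Net.time == Lid.time[i])
}

# Flatten once at the end; empty slots contribute nothing.
ind.Net <- unlist(hits)
ind.Net
# [1] 2 4 5
```

Unlike ind.Net[k] <- ind, this also copes with a value that occurs more than once in Net.time.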
No need for a loop as it sounds like match does what you want.
a <- 1:10
b <- c(2, 7, 9)
match(a, b)
# [1] NA 1 NA NA NA NA 2 NA 3 NA
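Applied to the names used in the question, the two vectorized options might look like the sketch below. The data are invented for illustration. Note that match() reports only the first position of each value, whereas which() combined with %in% reports every matching position.

```r
# Hypothetical data; only the variable names come from the question.
Net.time <- c(10, 20, 30, 20, 40)
Lid.time <- c(20, 40, 50)

# First position in Net.time of each Lid.time value (NA when absent, dropped here).
first_hit <- match(Lid.time, Net.time)
first_hit <- first_hit[!is.na(first_hit)]
first_hit
# [1] 2 5

# Every position in Net.time whose value appears somewhere in Lid.time.
all_hits <- which(Net.time %in% Lid.time)
all_hits
# [1] 2 4 5
```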
I have a list r containing n vectors of different length.
And a separate vector a also of length n
x <- 1:100
r <- slider(x,.size=5)
a <- 1:length(r)
From every element in each vector of the list r I want to subtract an element of a.
So the first element of a shall be subtracted from every element of the first vector of r.
Something like this, but on a larger scale and keeping the vectors in the list r
r[1]-a[1]
r[2]-a[2]
r[3]-a[3]
This gives me Error in r[1] - n[1] : non-numeric argument to binary operator
Disclaimer: The vectors of r in the example do NOT have different lengths. I do not know how to do this when generating the example.
You can use Map :
Map(`-`, r, a)
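The same result as in @RonakShah's answer can be obtained with mapply (when all vectors in r have the same length, mapply simplifies the result to a matrix unless you pass SIMPLIFY = FALSE):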
mapply(`-`,r,a)
Output:
[[1]]
[1] 0 1 2 3 4 5 6 7
[[2]]
[1] -1 0 1 2 3 4 5 6 7
[[3]]
[1] -2 -1 0 1 2 3 4 5 6 7
We could use a for loop
out <- vector('list', length(r))
for(i in seq_along(r)) {
out[[i]] <- r[[i]] - a[i]
}
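Since the example in the question does not actually produce vectors of different lengths, here is a small sketch with a hand-built list that does. The contents of r are an assumption made for illustration; the calls themselves are the ones suggested in the answers.

```r
# Hypothetical list whose vectors really do differ in length.
r <- list(1:3, 1:5, 1:2)
a <- seq_along(r)

# Map() always returns a list, whatever the element lengths are.
res_map <- Map(`-`, r, a)

# mapply() gives the same list once simplification is switched off.
res_mapply <- mapply(`-`, r, a, SIMPLIFY = FALSE)

res_map
# [[1]]
# [1] 0 1 2
#
# [[2]]
# [1] -1  0  1  2  3
#
# [[3]]
# [1] -2 -1
```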
I want to make 2 vectors subsetting from the same data, with replace=TRUE.
Even if both vectors can contain the same values, they cannot be the same at the same index position.
For example:
> set.seed(1)
> a <- sample(15, 10, replace=T)
> b <- sample(15, 10, replace=T)
> a
[1] 4 6 9 14 4 14 15 10 10 1
> b
[1] 4 3 11 6 12 8 11 15 6 12
> a==b
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
In this case, vectors a and b contain the same value at index 1 (value==4), which is wrong for my purposes.
Is there an easy way to correct this?
And can it be done on the subset step?
Or should I go through a loop checking element by element and if the values are identical, make another selection for b[i] and check again if it's not identical ad infinitum?
many thanks!
My idea is, instead of getting 2 samples of length 10 with replacement, get 10 samples of length 2 without replacement
library(purrr)
l <- rerun(10,sample(15,2,replace=FALSE))
Each element in l is a vector of integers of length two. Those two integers are guaranteed to be different because we specified replace=FALSE in sample
# from l extract all first element in each element, this is a
a <- map_int(l,`[[`,1)
# from list extract all second elements, this is b
b <- map_int(l,`[[`,2)
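The same idea can be sketched in base R with replicate(), which may be useful because rerun() is, as far as I know, deprecated in recent purrr releases. Each column of the resulting matrix is one draw of two distinct values; the name pairs is just an illustrative choice.

```r
set.seed(1)

# Ten draws of two distinct values each; replicate() returns a 2 x 10 matrix.
pairs <- replicate(10, sample(15, 2, replace = FALSE))

a <- pairs[1, ]  # first value of every draw
b <- pairs[2, ]  # second value of every draw

all(a != b)
# [1] TRUE
```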
How about a two-stage sampling process
set.seed(1)
x <- 1:15
a <- sample(x, 10, replace = TRUE)
b <- sapply(a, function(v) sample(x[x != v], 1))
a != b
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
We first draw samples a; then for every sample from a, we draw a new sample from the set of values x excluding the current sample from a. Since we're doing this one-sample-at-a-time, we automatically allow for sampling with replacement.
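A fully vectorized variant of the same two-stage idea is sketched below. It is not taken from the answer above: instead of sampling b from the remaining values one at a time, it adds a random non-zero offset to a and wraps around, which also guarantees a different value at each position. The names n_values and offset are illustrative.

```r
set.seed(1)
n_values <- 15

a <- sample(n_values, 10, replace = TRUE)

# An offset between 1 and 14 can never bring us back to the same value.
offset <- sample(n_values - 1, 10, replace = TRUE)

# Shift a by the offset and wrap around so the result stays within 1..15.
b <- (a - 1 + offset) %% n_values + 1

all(a != b)
# [1] TRUE
```

Because the offset is uniform over 1 to 14, each b[i] is uniform over the 14 values other than a[i], just as in the sapply version.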
I'd like to understand what's going on in this piece of R code I was testing. I'd like to replace part of a vector with another vector. The original and replacement values are in a data.frame. I'd like to replace all elements of the vector that match the original column with the corresponding replacement values. I have the answer to the larger question, but I'm unable to understand how it works.
Here's a simple example:
> vecA <- 1:5;
> vecB <- data.frame(orig=c(2,3), repl=c(22,33));
> vecA[vecA %in% vecB$orig] <- vecB$repl #Question-1
> vecA
[1] 1 22 33 4 5
> vecD<-data.frame(orig=c(5,7), repl=c(55,77))
> vecA[vecA %in% vecD$orig] <- vecD$repl #Question-2
Warning message:
In vecA[vecA %in% vecD$orig] <- vecD$repl :
number of items to replace is not a multiple of replacement length
> vecA
[1] 1 22 33 4 55
Here are my questions:
How does the assignment on Line-3 (marked #Question-1) work? The LHS expression is a 2-item vector, whereas the RHS is a 5-element vector.
Why does the assignment on Line-6 (marked #Question-2) give a warning (but still work)?
The First Question
R goes through each element in vecA and checks whether it exists in vecB$orig. The %in% operator returns a logical vector with one value per element of vecA. If you run the command vecA %in% vecB$orig you get the following:
[1] FALSE TRUE TRUE FALSE FALSE
which is telling you that in the vector 1 2 3 4 5 it sees 2 and 3 in vecB$orig.
By subsetting vecA by this command, you are isolating only the TRUE values in vecA, so vecA[vecA %in% vecB$orig] returns:
[1] 2 3
The assignment then writes the values on the RHS, vecB$repl, in order into the positions of vecA where vecA %in% vecB$orig is TRUE, which replaces 2 3 in vecA with 22 33.
The Second Question
In this case, the same logic applies for subsetting, but running vecA[vecA %in% vecD$orig] gives you
[1] 5
as 7 does not exist in vecA. You are trying to replace a vector of length 1 with a vector of length 2, which is what triggers the warning. In this case, R just uses the first element of vecD$repl, which happens to be 55, and discards the rest.
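Note that the %in% assignment pairs values purely by position, so it only gives the intended result when the replacement values happen to line up with the matched elements. A sketch of a lookup that pairs each element with its own replacement, using match(), is shown below. The data frame name lookup and its row order are assumptions chosen to make the difference visible.

```r
vecA <- 1:5

# Hypothetical lookup table, deliberately not in the same order as vecA
# and containing a value (7) that does not occur in vecA.
lookup <- data.frame(orig = c(3, 2, 7), repl = c(33, 22, 77))

# Row of lookup for each element of vecA; NA where there is no match.
idx <- match(vecA, lookup$orig)
hit <- !is.na(idx)

# Replace only the matched elements, each with its own replacement value.
vecA[hit] <- lookup$repl[idx[hit]]
vecA
# [1]  1 22 33  4  5
```

This form raises no warning for unmatched lookup rows, because the right-hand side always has exactly as many values as there are positions to fill.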
Following is related to R language.
x1 <- c(1, 4, 3, NA, 7)
is.na(x1) <- which(x1 == 7)
I don't understand: the LHS in the last line gives you a logical vector and the RHS is a value (the index where x1 == 7, which is 5 in this case). So what does it mean to assign a logical vector a value of 5?
is.na from the docs returns:
The default method for is.na applied to an atomic vector returns a logical vector of the same length as its argument x, containing TRUE for those elements marked NA or, for numeric or complex vectors, NaN, and FALSE otherwise.
So is.na(x1) on its own gives a logical vector. On the left-hand side of an assignment, however, R calls the replacement function `is.na<-` instead: is.na(x1) <- value marks the elements of x1 at the indices in value as NA. Nothing is assigned to the logical vector itself.
Here which(x1 == 7) supplies the index 5, so the fifth element of x1 becomes NA.
To put it in practice:
This is the output from is.na(x1):
is.na(x1)
[1] FALSE FALSE FALSE TRUE FALSE
The corresponding output from which(x1 == 7):
which(x1 == 7)
[1] 5
Combining the two, the element at position 5 becomes NA, so is.na() now returns TRUE for it:
is.na(x1) <- which(x1 == 7)
x1
[1] 1 4 3 NA NA
Indices beyond the end of the vector extend it. Starting again from the original x1, the following turns the first element into an NA and appends two more NAs so that index 7 is an NA as well:
is.na(x1) <- c(1,7)
x1
[1] NA 4 3 NA 7 NA NA
Compare with this example from the docs:
(xx <- c(0:4))
is.na(xx) <- c(2, 4)
xx
[1] 0 NA 2 NA 4
From the above, it is clear that c(2,4) refers to positions in xx: the elements at indices 2 and 4 become NAs and the rest are left unchanged.
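For an ordinary atomic vector, the assignment form of is.na behaves like plain subassignment of NA at the given indices. The short sketch below compares the two; the variable names x_isna and x_index are illustrative.

```r
x_isna  <- c(1, 4, 3, NA, 7)
x_index <- c(1, 4, 3, NA, 7)

# Replacement-function form: mark position(s) where the value is 7 as NA.
is.na(x_isna) <- which(x_isna == 7)

# Plain indexing form: assign NA at the same position(s).
x_index[which(x_index == 7)] <- NA

x_isna
# [1]  1  4  3 NA NA
identical(x_isna, x_index)
# [1] TRUE
```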
I don't find the help page for the replace function from the base package to be very helpful. Worst part, it has no examples which could help understand how it works.
Could you please explain how to use it? An example or two would be great.
If you look at the function (by typing its name at the console) you will see that it is just a simple functionalized version of the [<- function, which is described at ?"[". [ is a rather basic function in R, so you would be well advised to look at that page for further details. Especially important is learning that the index argument (the second argument in replace) can be a logical, numeric, or character vector. Recycling will occur when the second and third arguments differ in length.
You should "read" the function call as: "within the first argument, use the second argument as an index for placing the values of the third argument into the first":
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10
You can also use logical tests as the index, for example on data frame columns:
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)
Here are two simple examples:
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10
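One point worth making explicit is that replace() returns a modified copy and leaves its first argument untouched, so the result has to be assigned if you want to keep it. A minimal sketch:

```r
x <- letters[1:4]

# replace() returns a new vector; x itself is not changed.
y <- replace(x, 3, "Z")

x
# [1] "a" "b" "c" "d"
y
# [1] "a" "b" "Z" "d"
```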
Be aware that in the examples given above the third parameter (values) is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=T))
This will create something like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose you wanted to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not provide the desired outcome, however. The a*2 will be evaluated once, as a fixed vector, rather than row by row against the 'a' column. The vector 'a*2' will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the 'replace' operation. Thus, at the first row where 'b' equals "X", the value in 'a' will be replaced by 2. At the second such row, it will be replaced by 4, and so on. It will not be replaced by two times the value of 'a' in that particular row.
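Two ways to get the row-wise result are sketched below: subset the replacement values with the same condition, or use ifelse(). The example uses a small base data.frame with fixed values instead of the random one above, so the numbers here are assumptions made for illustration.

```r
# Small fixed example; the values are made up for illustration.
tmp <- data.frame(a = 1:5, b = c("X", "Y", "Y", "X", "Z"))

# Subset the replacement values with the same condition as the index,
# so each selected row gets twice its own value.
via_replace <- with(tmp, replace(a, b == "X", a[b == "X"] * 2))

# ifelse() evaluates the condition element by element.
via_ifelse <- with(tmp, ifelse(b == "X", a * 2, a))

via_replace
# [1] 2 2 3 8 5
```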
Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector be changed into a character vector and with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for (i in 1:3) {
  test <- replace(test, test == i, letts[i])
}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C
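If a plain character vector is all that is needed, the integers can also be used directly as indices into the vector of replacement values, without a loop or replace(). This is a sketch of an alternative, not what the answer above did; the name as_chars is illustrative.

```r
test <- c(rep(1, 3), rep(2, 2), rep(3, 1))
letts <- c("A", "B", "C")

# Each integer in test picks the letter at that position in letts.
as_chars <- letts[test]
as_chars
# [1] "A" "A" "A" "B" "B" "C"
```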