A friend wrote this function for finding the unique members of a vector. I can't work out (mentally) what one line is doing, and it's the crux of the function. Any help is greatly appreciated.
myUniq <- function(x){
  len = length(x) # getting the length of the argument
  logical = rep(T, len) # creating a vector of logicals as long as the arg, populating with true
  for(i in 1:len){ # for i -> length of the argument
    logical = logical & x != x[i] # logical vector = logical vector & arg vector where arg vector != x[i] ??????
    logical[i] = T
  }
  x[logical]
}
This line I can't figure out:
logical = logical & x != x[i]
can anyone explain it to me?
Thanks,
Tom
logical is a vector, I presume a logical one containing len values TRUE. x is a vector of some other data of the same length.
The second part x != x[i] is creating a logical vector with TRUE where elements of x aren't the same as the current value of x for this iteration, and FALSE otherwise.
As a result, both sides of & are now logical vectors. & is an element-wise AND: the result is TRUE where the elements of logical and x != x[i] are both TRUE, and FALSE otherwise. Hence, after the first iteration, logical becomes a logical vector with TRUE for every element of x that differs from the first element of x (i = 1), and FALSE for those that are the same. The next line, logical[i] = T, then switches position i itself back to TRUE.
Here is a bit of an example:
logical <- rep(TRUE, 10)
set.seed(1)
x <- sample(letters[1:4], 10, replace = TRUE)
> x
[1] "b" "b" "c" "d" "a" "d" "d" "c" "c" "a"
> logical
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> x != x[1]
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> logical & x != x[1]
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
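To make the evaluation order explicit, here is a minimal sketch of a single pass through the loop body. The input vector and the names keep, step_a and step_b are made up for illustration. It relies on != binding more tightly than & in R, so the line reads as logical & (x != x[i]).

```r
# Hypothetical small input, chosen only for illustration
x <- c("a", "b", "a", "c")
keep <- rep(TRUE, length(x))   # plays the role of `logical` in myUniq
i <- 1

# `!=` has higher precedence than `&`, so these two lines are equivalent
step_a <- keep & x != x[i]
step_b <- keep & (x != x[i])
same <- identical(step_a, step_b)

# After the AND, every element equal to x[i] (including x[i] itself) is FALSE
keep <- step_a
# The loop then switches position i back on
keep[i] <- TRUE
print(same)
print(keep)
```

Later passes can switch earlier positions off again, which is why the order of the final result can differ from what unique() returns.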
This seems very complex. Do you get the same results as:
unique(x)
gives you? If I run my x above through myUniq() and unique() I get the same output:
> myUniq(x)
[1] "b" "d" "c" "a"
> unique(x)
[1] "b" "c" "d" "a"
(well, except for the ordering...)
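The different ordering looks consistent with myUniq() keeping the last occurrence of each value rather than the first. As a rough check (a sketch, not a claim about what your friend intended), base R's duplicated() with fromLast = TRUE seems to reproduce it on the example vector; last_occurrence is just an illustrative name.

```r
# The function from the question, copied as-is
myUniq <- function(x){
  len = length(x)
  logical = rep(T, len)
  for(i in 1:len){
    logical = logical & x != x[i]
    logical[i] = T
  }
  x[logical]
}

# The example vector from the answer above, typed in literally
x <- c("b", "b", "c", "d", "a", "d", "d", "c", "c", "a")

# Keep each value at its LAST position instead of its first
last_occurrence <- x[!duplicated(x, fromLast = TRUE)]

print(myUniq(x))
print(last_occurrence)
print(unique(x))   # keeps each value at its FIRST position
```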
Related
I have two vectors:
a = strsplit("po","")[[1]]
[1] "p" "o"
b = strsplit("polo","")[[1]]
[1] "p" "o" "l" "o"
I'm trying to compare them using ==.
Unfortunately, a==b gives an unexpected result.
a==b
[1] TRUE TRUE FALSE TRUE
While I expect to have:
[1] TRUE TRUE FALSE FALSE
So, what is causing this? and how can one achieve the expected result?
The problem seems to be related to the fact that the last element of both vectors is the same as changing b to e.g. polf does give the expected result, and also because setting b to pooo gives TRUE TRUE FALSE TRUE and not TRUE TRUE TRUE TRUE.
Edit
In other words, I'd expect missing elements (when lengths differ) to be passed as nothing (only "" seems to give TRUE TRUE FALSE FALSE, NA and NULL give different results).
c("p","o","","")==c("p","o","l","o")
[1] TRUE TRUE FALSE FALSE
The problem you've encountered here is due to recycling (not the eco-friendly kind). When applying an operation to two vectors that requires them to be the same length, R often automatically recycles, or repeats, the shorter one, until it is long enough to match the longer one. Your unexpected results are due to the fact that R recycles the vector c("p", "o") to be length 4 (length of the larger vector) and essentially converts it to c("p", "o", "p", "o"). If we compare c("p", "o", "p", "o") and c("p", "o", "l", "o") we can see we get the unexpected results of above:
c("p", "o", "p", "o") == c("p", "o", "l", "o")
#> [1] TRUE TRUE FALSE TRUE
It's not exactly clear to me why you would expect the result to be TRUE TRUE FALSE FALSE, as it's somewhat of an ambiguous comparison to compare a length-2 vector to a length-4 vector, and recycling the length-2 vector (which is what R is doing) seems to be the most reasonable default aside from throwing an error.
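A short sketch of the recycling described above: spelling the recycling out with rep_len() gives the same comparison as a == b. The name recycled is only for illustration.

```r
a <- c("p", "o")
b <- c("p", "o", "l", "o")

# What R effectively does to the shorter vector before comparing
recycled <- rep_len(a, length(b))   # "p" "o" "p" "o"

implicit <- a == b          # recycling done by R
explicit <- recycled == b   # recycling done by hand

print(recycled)
print(identical(implicit, explicit))
```

When the longer length is not a multiple of the shorter one, R still recycles but also issues a warning, which can be a useful hint that the lengths were not what you expected.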
To get the result shown in OP we may put the two vectors in a list, adapt their lengths to maximum lengths (by adding NA's) and test if the comparison is %in% TRUE.
list(a, b) |>
(\(.) lapply(., `length<-`, max(lengths(.))))() |>
(\(.) do.call(\(x, y, ...) (x == y) %in% TRUE, .))()
# [1] TRUE TRUE FALSE FALSE
Note: R version 4.1.2 (2021-11-01)
Data:
a <- c("p", "o")
b <- c("p", "o", "l", "o")
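For readers who find the piped version hard to follow, the same idea can be sketched step by step in base R: pad both vectors with NA up to the longer length, compare, and treat NA as FALSE. The names n, a_pad, b_pad and res are illustrative only.

```r
a <- c("p", "o")
b <- c("p", "o", "l", "o")

n <- max(length(a), length(b))

# Indexing past the end of a vector yields NA, which pads to length n
a_pad <- a[seq_len(n)]   # "p" "o" NA  NA
b_pad <- b[seq_len(n)]   # unchanged, already length n

# a_pad == b_pad is NA where either side is NA; %in% TRUE maps NA to FALSE
res <- (a_pad == b_pad) %in% TRUE
print(res)
```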
We may create a function that pads the shorter string with spaces on the right (stringr::str_pad) when the strings have different numbers of characters, before the strsplit
checkStrings <- function(s1, s2) {
  n1 <- nchar(s1)
  n2 <- nchar(s2)
  if(n1 != n2) {
    n <- max(n1, n2)
    # index of the shorter string: 1 for s1, 2 for s2
    i1 <- which.min(c(n1, n2))
    if(i1 == 1) {
      s1 <- stringr::str_pad(s1, width = n, pad = " ", side = "right")
    } else {
      # pad s2 itself (the original padded s1 here by mistake)
      s2 <- stringr::str_pad(s2, width = n, pad = " ", side = "right")
    }
  }
  s1v <- strsplit(s1, "")[[1]]
  s2v <- strsplit(s2, "")[[1]]
  return(s1v == s2v)
}
-testing
> checkStrings(str1, str2)
[1] TRUE TRUE FALSE FALSE
data
str1 <- "po"
str2 <- "polo"
Another way to solve the problem is to create a vector of length(b) and replace the first values with a:
a <- replace(character(length(b)), seq_along(a), a)
a
# [1] "p" "o" "" ""
Then we can appropriately compare the two vectors using ==:
a==b
# [1] TRUE TRUE FALSE FALSE
character(length(b)) creates a vector of "" of length(b). vector(,length(b)) is another option, but it creates a vector of FALSE instead.
If one wants to do it over two or more strings, a possible function is:
matchLength = function(strings){
  l = lapply(strings,\(x) strsplit(x,"")[[1]])
  larger = which.max(lengths(l))
  # seq_along() also works when a string has a single character
  lapply(l, function(x) replace(character(length(l[[larger]])), seq_along(x), x))
}
Which gives the desired output:
strings=c("po","polo","polka")
matchLength(strings)
# [[1]]
# [1] "p" "o" "" "" ""
#
# [[2]]
# [1] "p" "o" "l" "o" ""
#
# [[3]]
# [1] "p" "o" "l" "k" "a"
I've two vectors with different lengths and want to get all the occurrences of the first one in the second one.
I've tried:
vec <- c("jan-fev-mar", "abr-mai-jun", "jul-ago-set")
vec2 <- c("jan-fev-mar", "abr-mai-jun", "jul-ago-set", "out-nov-dez", "jan-fev-mar", "abr-mai-jun", "jul-ago-set", "out-nov-dez")
# It returns: TRUE TRUE TRUE
vec %in% vec2
I expect to get all the occurrences of vec on vec2, like: TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
vec %in% vec2 returns TRUE for each element of vec that matches any element of vec2. The result is a logical vector whose length equals length(vec).
It seems you want vec2 %in% vec, which returns:
vec2 %in% vec
[1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
You could interpret it like the following:
(vec2 %in% vec)[1]: Is there a match for vec2[1] (= "jan-fev-mar") in vec? TRUE
(vec2 %in% vec)[2]: Is there a match for vec2[2] (= "abr-mai-jun") in vec? TRUE
...
(vec2 %in% vec)[8]: Is there a match for vec2[8] (= "out-nov-dez") in vec? FALSE
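As a small follow-up sketch, the logical vector can be turned into positions or used to subset vec2 directly. The names in_vec, positions and hits are only for illustration.

```r
vec <- c("jan-fev-mar", "abr-mai-jun", "jul-ago-set")
vec2 <- c("jan-fev-mar", "abr-mai-jun", "jul-ago-set", "out-nov-dez",
          "jan-fev-mar", "abr-mai-jun", "jul-ago-set", "out-nov-dez")

# One TRUE/FALSE per element of vec2 (the left-hand side sets the length)
in_vec <- vec2 %in% vec

# Positions in vec2 where an element of vec occurs
positions <- which(in_vec)

# The matching elements themselves
hits <- vec2[in_vec]

print(in_vec)
print(positions)
```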
Consider two vectors:
test1 <- c(1,2,3,4,5,3)
test2 <- c(2,3,4,5,6,7,2)
My goal is to create a vector that contains only the values found in both vectors. The result should be a vector like 2 3 4 5.
I have two questions.
1) How can I get the wanted result in R? (Ideally this would also work with three vectors: say test3 <- c(1,3,5,6,7), and I wanted all the values found in all three vectors, 3 5.)
2) I tried to write a loop for this, but it would not do the job as intended. Curiously if I run each step of my code manually, everything works out as intended. What am I missing? Why doesn't my code work?
The idea is to create a vector test4 <- c(test1, test2) and iteratively check, if the value can be found in test1 and test2.
for(i in levels(as.factor(test4))){ #loop for all occurring levels
  log1 <- rep(0,nlevels(as.factor(test4))) #create logical vector
  log1 <- as.logical(log1) #to store results
  if(is.element(i,test1) == TRUE & is.element(i,test2) == TRUE){
    log1[which(levels(as.factor(test4)) == i)] <- TRUE
  } else{
    log1[which(levels(as.factor(test4)) == i)] <- FALSE
  }
  #if i is element of test1 and test2 then the corresponding entry
  #in log1 becomes TRUE, otherwise FALSE
}
This leads to the result
log1
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Now one can think of errors in the loops. To check for that, I printed the values and they are all correct:
for(i in levels(as.factor(test4))){
if(is.element(i,test1) == TRUE & is.element(i,test2) == TRUE){
print(TRUE)
} else{
print(FALSE)
}
}
[1] FALSE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE
To check the index i I ran this code:
for(i in levels(as.factor(test4))){
  j <- which(levels(as.factor(test4)) == i)
  print(j)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
All seems to be correct to this point. Now I run the code manually and get the wanted result:
test1 <- c(1,2,3,4,5)
test2 <- c(2,3,4,5,6,7)
test4 <- c(test1, test2)
log1 <- rep(0,nlevels(as.factor(test4)))
log1 <- as.logical(log1)
log1[1] <- is.element(1,test1) == TRUE & is.element(1,test2) == TRUE
log1[2] <- is.element(2,test1) == TRUE & is.element(2,test2) == TRUE
log1[3] <- is.element(3,test1) == TRUE & is.element(3,test2) == TRUE
log1[4] <- is.element(4,test1) == TRUE & is.element(4,test2) == TRUE
log1[5] <- is.element(5,test1) == TRUE & is.element(5,test2) == TRUE
log1[6] <- is.element(6,test1) == TRUE & is.element(6,test2) == TRUE
log1[7] <- is.element(7,test1) == TRUE & is.element(7,test2) == TRUE
log1
[1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE
I also tried setting an index j <- which(levels(as.factor(test4)) == i) and replacing the entries log1[j].
The if statement is not necessary, but it helped to locate the problem. The for loop could be written as
for(i in levels(as.factor(test4))){
log1 <- rep(0,nlevels(as.factor(test4)))
log1 <- as.logical(log1)
log1[which(levels(as.factor(test4)) == i)] <- is.element(i,test1) == TRUE & is.element(i,test2) == TRUE
}
Which doesn't help. I really don't know, what I did wrong here. I searched on the web and on stack overflow, but I could not find a solution. I hope you can!
Gather the unique values of each vector, then keep those that are duplicated:
all <- c(unique(test1), unique(test2))
all[duplicated(all)]
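On the second part of the question: the loop in the question creates log1 inside the loop body, so every pass throws away the results of the previous passes, and only the last level ("7") survives. Below is a sketch with the initialisation moved in front of the loop, next to base R's intersect(), which does the same job directly. The names lev, from_loop, common2 and common3 are illustrative.

```r
test1 <- c(1, 2, 3, 4, 5, 3)
test2 <- c(2, 3, 4, 5, 6, 7, 2)
test3 <- c(1, 3, 5, 6, 7)

# Direct approach: values present in both (or in all three) vectors
common2 <- intersect(test1, test2)
common3 <- Reduce(intersect, list(test1, test2, test3))

# Loop from the question, with log1 created ONCE before the loop
test4 <- c(test1, test2)
lev <- levels(as.factor(test4))   # "1" ... "7"
log1 <- logical(length(lev))      # all FALSE
for (i in lev) {
  log1[lev == i] <- is.element(i, test1) & is.element(i, test2)
}
from_loop <- as.numeric(lev[log1])

print(common2)
print(common3)
print(log1)
```

Reduce() simply applies intersect() pairwise along the list, so the same line should extend to more than three vectors.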
I can't get my head around this problem regarding ifelse:
Say I have two vectors:
x <- c(0, 1:4, 1:4)
y <- letters[1:3]
When I do
ifelse(x==2, y[x], x)
I get
"0" "1" "c" "3" "4" "1" "c" "3" "4"
However, it should return "b" at position 2 of vector y.
Why is ifelse doing that?
To explain this strange behaviour the source code of ifelse is helpful (see below).
As soon as you call ifelse the expressions passed as the arguments test, yes and no are evaluated resulting in:
Browse[2]> test
[1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
Browse[2]> yes
[1] "a" "b" "c" NA "a" "b" "c" NA
Browse[2]> no
[1] 0 1 2 3 4 1 2 3 4
Observe that y[x] uses the values of x to pick values from y. The index 0 selects nothing (it is dropped) and indices above 3 give NA, which is why the yes argument becomes
[1] "a" "b" "c" NA "a" "b" "c" NA
The code line
ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
is then applied at the end and effectively updates all TRUE elements using the test logical vector:
yes[test]
which results in:
[1] "c" "c"
being stored in the result indices 3 and 7
ans[test & ok]
So the problem is the combination of using y[x] as the second argument to ifelse and the non-intuitive way ifelse uses a logical index to pick the "TRUE" results from y[x].
Lesson learned: avoid complicated ifelse logic; it has a lot of side effects (e.g. you may lose the correct data type or attributes).
# ifelse function
function (test, yes, no)
{
if (is.atomic(test)) {
if (typeof(test) != "logical")
storage.mode(test) <- "logical"
if (length(test) == 1 && is.null(attributes(test))) {
if (is.na(test))
return(NA)
else if (test) {
if (length(yes) == 1) {
yat <- attributes(yes)
if (is.null(yat) || (is.function(yes) && identical(names(yat),
"srcref")))
return(yes)
}
}
else if (length(no) == 1) {
nat <- attributes(no)
if (is.null(nat) || (is.function(no) && identical(names(nat),
"srcref")))
return(no)
}
}
}
else test <- if (isS4(test))
methods::as(test, "logical")
else as.logical(test)
ans <- test
ok <- !(nas <- is.na(test))
if (any(test[ok]))
ans[test & ok] <- rep(yes, length.out = length(ans))[test &
ok]
if (any(!test[ok]))
ans[!test & ok] <- rep(no, length.out = length(ans))[!test &
ok]
ans[nas] <- NA
ans
}
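To see the mechanism without stepping through the debugger, here is a sketch that redoes the relevant steps by hand and then shows one possible way to keep the yes argument aligned with x: replacing the 0 index with NA so that nothing is dropped. The names yes, recycled, idx, aligned and res are illustrative, and the replace() approach is just one option among several.

```r
x <- c(0, 1:4, 1:4)
y <- letters[1:3]

# The 0 index is dropped, so y[x] has 8 elements while x has 9
yes <- y[x]
len_yes <- length(yes)

# ifelse recycles `yes` to the length of the test, then picks by position
recycled <- rep(yes, length.out = length(x))
picked <- recycled[x == 2]   # the elements at positions 3 and 7

# Possible fix: index with NA instead of 0 so every position is kept
idx <- replace(x, x == 0, NA)
aligned <- y[idx]            # NA "a" "b" "c" NA "a" "b" "c" NA
res <- ifelse(x == 2, aligned, x)

print(len_yes)
print(picked)
print(res)
```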
You are using 0 as an index in the first element so that is why the alignment is messed up.
y[x]
[1] "a" "b" "c" NA "a" "b" "c" NA
So
> y[0]
character(0)
> y[1]
[1] "a"
> y[2]
[1] "b"
> y[3]
[1] "c"
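So the length of y[x] is different from the length of x.
One way to restore the alignment is to shift every index up by one and put a placeholder in front of y:
> ifelse(x==2, c(NA, y)[x+1], x)
[1] "0" "1" "b" "3" "4" "1" "b" "3" "4"
This works because x+1 no longer contains a 0, so the indexed vector has the same length as x.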
Old answer
Because
x <- c(0, 1:4, 1:4)
returns
[1] 0 1 2 3 4 1 2 3 4
so x==2
returns
[1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
so for y = letters[1:3]
ifelse(x==2, y[x], x)
You are going to get the letters in the third and seventh positions.
The documentation for ifelse says that if one vector is too short it will be recycled which you would expect to be
c("a","b","c","a","b","c","a").
However when I try
ifelse(x==3, y[x], x)
I get
[1] "0" "1" "2" NA "4" "1" "2" NA "4"
Which tells me that the recycling is not working the way I would expect.
So that's the nominal reason you are getting the result. The reason it works like that is something I don't know now, but if I figure it out I will add to this answer. I suspect it has to do with the conversion to a string.
Just looking at y[x] I get
[1] "a" "b" "c" NA "a" "b" "c" NA
Which, by the way is only length 8 even though x is length 9.
So this really doesn't have to do with ifelse() at all, it is really about a different issue with recycling.
From Comment: It returns c because: which(x==2) returns 3 and 7. I don't know why it doesn't recycle 7 but chooses only 3. Perhaps because y is less than length 7
Try:
ind<-which(x==2)
ind1<-ind[1]-1
ifelse(x==2,y[ind1],x)
[1] "0" "1" "b" "3" "4" "1" "b" "3" "4"
Here's an attempt to make a function:
dynamic_index<-function(ind,x,y){
x<-x
y<-y
ind1<-which(x==ind)
ind2<-ind1[1]-1
ifelse(x==ind,y[ind2],x)
}
dynamic_index(2,x,y)
The result occurs that way because the == function returns a vector of logicals:
x <- c(0, 1:4, 1:4)
y <- letters[1:3]
ifelse(x==2, y[x], x)
#look at x==2
x==2
[1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
It's a logical vector that has true in the third position, not the second so it is the third value of y that is selected. This also shows why the answer that references the behavior of which is incorrect.
x <- c(0, 1:4, 1:4)
y <- letters[1:3]
ifelse(x==2, y[x], x)
ifelse checks every position of the test x == 2. Where the test is TRUE, it returns the element of y[x] at that same position (not y indexed by the value of x at that position); where it is FALSE, it returns the element of x at that position.
I have the following vector:
p<-c(0,0,1,1,1,3,2,3,2,2,2,2)
I'm trying to write a function that returns TRUE if there are x consecutive duplicates in the vector.
The function call found_duplications(p,3) will return TRUE because there are three consecutive 1's. The function call found_duplications(p,5) will return FALSE because there are no 5 consecutive duplicates of a number. The function call found_duplications(p,4) will return TRUE because there are four consecutive 2's.
I have a couple ideas. There's the duplicated() function:
duplicated(p)
> [1] FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
I can make a for loop that counts the number of TRUE's in the vector but the problem is that the consecutive counter would be off by one. Can you guys think of any other solutions?
You could also do
find.dup <- function(x, n){
n %in% rle(x)$lengths
}
find.dup(p,3)
#[1] TRUE
find.dup(p,2)
#[1] TRUE
find.dup(p,5)
#[1] FALSE
find.dup(p,4)
#[1] TRUE
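One thing to be aware of: n %in% rle(x)$lengths asks whether some run has exactly n elements. If "x consecutive duplicates" should also be satisfied by a longer run, a variant along these lines may be closer to what is wanted. This is a sketch, and has_run is a made-up name.

```r
p <- c(0, 0, 1, 1, 1, 3, 2, 3, 2, 2, 2, 2)

# TRUE if any run of equal values has at least n elements
has_run <- function(x, n) {
  any(rle(x)$lengths >= n)
}

# A vector whose only run has length 4
q <- c(2, 2, 2, 2)

exact_match <- 3 %in% rle(q)$lengths   # exact run length: no run of length 3
at_least <- has_run(q, 3)              # a run of 4 contains 3 in a row

print(has_run(p, 3))
print(has_run(p, 5))
print(exact_match)
print(at_least)
```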
p<-c(0,0,1,1,1,3,2,3,2,2,2,2)
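find.dup <- function(x, n) {
  consec <- 1 # length of the current run of equal values
  # seq_along(x)[-1] is empty for vectors shorter than 2, unlike 2:length(x)
  for(i in seq_along(x)[-1]) {
    if(x[i] == x[i-1]) {
      consec <- consec + 1
    } else {
      consec <- 1
    }
    if(consec == n)
      return(TRUE) # or you could return x[i]
  }
  return(FALSE)
}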
find.dup(p,3)
# [1] TRUE
find.dup(p,4)
# [1] TRUE
find.dup(p,5)
# [1] FALSE