Find duplicated values from list or vector in R

I have this example data
a<-c(1,5,7,8,10,15)
b<-c(2,6,7,9,10,20,31)
I need to find the duplicated values (the values that appear in both vectors) and create a new vector containing these numbers. It should look like this:
c<-c(7,10)
Because the vectors have different lengths, I tried putting them into a list of vectors
l<-list(a=a,b=b)
and tried
duplicated(l)
or
duplicated(a,b)
but both give nonsense output. I'm looking for a correct solution but still cannot find one. Any advice?
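A short sketch of why those two calls misbehave, based on the documented signature duplicated(x, incomparables = FALSE, ...): on a list, duplicated() compares the list elements (here, two whole vectors) with each other, and in duplicated(a, b) the second argument is taken as incomparables rather than as a second vector to compare against.

```r
a <- c(1, 5, 7, 8, 10, 15)
b <- c(2, 6, 7, 9, 10, 20, 31)
l <- list(a = a, b = b)

# On a list, duplicated() asks whether each whole element repeats an earlier one:
# vector b is not identical to vector a, so both entries are FALSE
dup_list <- duplicated(l)

# The second positional argument is 'incomparables', not a vector to compare with;
# a has no repeats of its own, so all six entries are FALSE
dup_ab <- duplicated(a, b)
```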

Looks like a job for intersect()
a<-c(1,5,7,8,10,15)
b<-c(2,6,7,9,10,20,31)
c<-intersect(a,b)
c
[1] 7 10
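Since the vectors were already put into a list, a possible extension (a sketch, not part of the original answer) is Reduce(intersect, l), which applies intersect() pairwise and so works for any number of vectors. The extra vector d below is made up for illustration.

```r
a <- c(1, 5, 7, 8, 10, 15)
b <- c(2, 6, 7, 9, 10, 20, 31)
l <- list(a = a, b = b)

# Reduce() folds intersect() over the list: intersect(intersect(l[[1]], l[[2]]), ...)
common <- Reduce(intersect, l)

# A third (hypothetical) vector narrows the result further
d <- c(10, 11, 12)
common3 <- Reduce(intersect, c(l, list(d = d)))
```

With the example data, common holds 7 and 10, and common3 holds only 10.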

c(a, b)[duplicated(c(a, b))]
produces:
[1] 7 10
duplicated applied to a vector returns a logical vector of the same length, with TRUE for every value that has already appeared earlier in the vector. You can use that to subset the combined vector c(a, b).
Note that if values can be repeated within a single vector and you only want the values shared between the vectors, you should deduplicate each vector first:
a.b <- c(unique(a), unique(b))
a.b[duplicated(a.b)]
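A small sketch with made-up vectors showing why the unique() step matters: the value 1 is repeated inside a2 but never appears in b2, so the plain approach reports it anyway.

```r
# Made-up vectors: 1 is repeated inside a2 but does not occur in b2
a2 <- c(1, 1, 7)
b2 <- c(7, 9)

both <- c(a2, b2)
naive <- both[duplicated(both)]      # 1 and 7: the internal repeat of 1 is counted

a2.b2 <- c(unique(a2), unique(b2))
shared <- a2.b2[duplicated(a2.b2)]   # 7: only the value present in both vectors
```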

Keeping within the scope of the original question, you could use match:
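> a[!is.na(match(a, b))]
# [1] 7 10
Or, more simply, use %in%:
> a[a %in% b]
# [1] 7 10
Both logical vectors have one entry per element of a, so they should index a; indexing b with them only happens to work here because 7 and 10 sit at the same positions in both vectors.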
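A logical index built from a (such as a %in% b) has one entry per element of a. The sketch below uses made-up vectors whose shared values sit at different positions, to show what R's silent recycling of a logical index does when it is applied to a different vector.

```r
# Made-up vectors: the shared values 7 and 10 are at different positions
a3 <- c(10, 7)
b3 <- c(7, 10, 3)

in_b <- a3 %in% b3                       # TRUE TRUE: one entry per element of a3
right <- a3[in_b]                        # 10 7
right_match <- a3[!is.na(match(a3, b3))] # same result via match()
wrong <- b3[in_b]                        # length-2 index recycled over 3 elements: 7 10 3
```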

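Solved by creating a function like this:
library(parallel)  # provides mclapply()
duplicated_values <- function(x) {
  # return x if it also occurs in b; otherwise the function returns NULL
  if (x %in% b) {
    return(x)
  }
}
# iterate over the values of a (not the indices 1:length(a)) and drop the NULLs
values <- unlist(mclapply(a, duplicated_values))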
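For comparison, a sequential sketch of the same per-value test using base R's Filter(), which keeps the elements of a for which the predicate returns TRUE. Note that mclapply() relies on forking, so on Windows it cannot use more than one core; for a task this small, the vectorized a[a %in% b] is likely faster anyway.

```r
a <- c(1, 5, 7, 8, 10, 15)
b <- c(2, 6, 7, 9, 10, 20, 31)

# Filter() applies the predicate to each value of a and keeps those returning TRUE
values_seq <- Filter(function(x) x %in% b, a)
```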

Related

R function like which to match multiple inputs to multiple values

I have a vector of multiple values that I want to match to multiple values without the use of a loop. Is there a function that can do this?
x <- c(2,5,4)
y <- 2:10
which(x==y) #won't work
The expected output is 1,4,3.
In my real use case, you can assume that there is only 1 correct match and that it will match y every time. I need this to be as fast as possible, which is why I'm trying to avoid a loop. As a side note, this part is already inside a foreach loop.
You want match
match(x,y)
# 1 4 3
The which() version would be which(x %in% y). But I don't think this fits your purpose, as it returns 1,2,3 (positions in x) rather than the expected 1,4,3.
If you apply which(y %in% x) instead, the output will be 1,3,4: positions in y, but sorted rather than in the order of x.
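A side-by-side sketch of the three calls discussed above, using the question's data, makes the difference in what each one returns explicit.

```r
x <- c(2, 5, 4)
y <- 2:10

m1 <- match(x, y)        # position in y of each element of x, in x's order: 1 4 3
w1 <- which(x %in% y)    # positions in x that have a match in y: 1 2 3
w2 <- which(y %in% x)    # positions in y that have a match in x, ascending: 1 3 4
```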

Can I rename vector elements based on their nchar length and their current values using R?

I have a vector with 5 values
vec<-c("Col1","Col2islonger","Col3isabitlonger","Col4isless","Col5willbelongest")
I run
nchar(vec)
my results are
4,12,16,10,17
Based on those values, I would like to use a conditional statement or a for loop, whichever is better, to rename the elements based on their lengths and current values. For example:
If nchar(vec) is less than or equal to 10, keep the name as is. If it is greater than 10, the renamed element should take the first 9 characters and then skip over to the last one:
newvec<- c("Col1","Col2islonr","Col3isabir","Col4isless","Col5willbt")
Try this:
#Data
vec<-c("Col1","Col2islonger","Col3isabitlonger","Col4isless","Col5willbelongest")
#Rename
vec2 <- ifelse(nchar(vec)<=10,vec,paste0(substr(vec,1,9),substr(vec,nchar(vec),nchar(vec))))
Output:
[1] "Col1" "Col2islonr" "Col3isabir" "Col4isless" "Col5willbt"
We can use sub to trim vec. First create a logical vector 'i1' that is TRUE for the elements of 'vec' with more than 10 characters. Then, for those elements only, use sub to remove everything from the 10th character up to (but not including) the last one, and assign the result back:
i1 <- nchar(vec) > 10
vec[i1] <- sub("^(.{1,9}).*(.)", "\\1\\2", vec[i1])
Output:
vec
#[1] "Col1" "Col2islonr" "Col3isabir" "Col4isless" "Col5willbt"
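The same rule can be wrapped in a reusable function. This is a sketch; the name shorten_names and its max_len parameter are hypothetical, with the default of 10 taken from the question.

```r
# Hypothetical helper: names of at most max_len characters are kept as they are;
# longer names keep their first (max_len - 1) characters plus their last character
shorten_names <- function(x, max_len = 10) {
  n <- nchar(x)
  long <- n > max_len
  x[long] <- paste0(substr(x[long], 1, max_len - 1),
                    substr(x[long], n[long], n[long]))
  x
}

vec <- c("Col1", "Col2islonger", "Col3isabitlonger", "Col4isless", "Col5willbelongest")
shortened <- shorten_names(vec)
```

With the default max_len this reproduces the newvec shown in the question.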

Accessing a vector element by name in R when some names are duplicated

I have the vector x <- 1:5 and I named its elements (wrongly) with names(x) <- rep(c(letters[1:4], "a")). How can I access the last element by name?
x["a"] only returns the first element named "a".
How about:
x[names(x) == "a"]
# a a
# 1 5
Or to get only the final one:
x[tail(which(names(x) == "a"), 1L)]
# a
# 5
This is more readable but marginally slower than getting at what tail does directly (see getAnywhere("tail.default")):
x[(idx <- which(names(x) == "a"))[length(idx)]]
# a
# 5
The function duplicated() returns a logical vector that is TRUE for every occurrence of a value except the first. In your case, that is only the second "a". Consequently,
x[duplicated(names(x))]
would give you the second entry. If you add more "a" entries to the vector, you get a vector of 2, 3, and so on elements: all except the first, so you would still need to pick out the one you want.
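A small sketch with made-up names, showing that duplicated(names(x)) picks up repeats of every name, not just "a", and how to combine it with a name test to keep only the repeated "a" entries.

```r
x <- 1:5
names(x) <- c("a", "b", "b", "c", "a")   # made-up names with two different repeats

any_repeat <- x[duplicated(names(x))]                    # b = 3 and a = 5
a_repeat   <- x[names(x) == "a" & duplicated(names(x))]  # a = 5 only
```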

Lookup of entries with multiplicities

Suppose I have a vector data <- c(1,2,2,1) and a reference table, say: ref <- cbind(c(1,1,2,2,2,2,4,4), c(1,2,3,4,5,6,7,8))
I would like my code to return the following vector: result <- c(1,2,3,4,5,6,3,4,5,6,1,2). It's like using the R function match(), but match() only returns the first match in the reference vector, and %in% has a similar limitation.
I have tried functions like merge() and join(), but I would like something using only a combination of the rep() and seq() functions.
You can try
ref[ref[,1] %in% data,2]
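This returns the second-column value for every row whose first-column value is in data, but each matching row appears only once and in the order of ref (here 1 2 3 4 5 6). To follow the order and repetitions of data, wrap the lookup in lapply: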
unlist(lapply(data, function(x) ref[ref[,1] ==x, 2]))
You can get the indices you are looking for like this:
indices <- sapply(data,function(xx)which(ref[,1]==xx))
Of course, that is a list, since the number of hits differs between entries of data and sapply() therefore cannot simplify the result. Just unlist() it:
ref[unlist(indices),2]
[1] 1 2 3 4 5 6 3 4 5 6 1 2
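Closer to the rep()/seq() idea from the question, here is a vectorized sketch. It assumes that rows sharing a key are contiguous in ref (true in the example) and uses sequence(), base R's vectorized seq_len(), to expand each run of rows.

```r
data <- c(1, 2, 2, 1)
ref <- cbind(c(1, 1, 2, 2, 2, 2, 4, 4), c(1, 2, 3, 4, 5, 6, 7, 8))

# Assumption: all rows with the same key in ref[, 1] are adjacent
starts <- match(data, ref[, 1])                        # first row for each key: 1 3 3 1
counts <- rowSums(outer(data, ref[, 1], "=="))         # rows per key:           2 4 4 2
rows   <- rep(starts, counts) + sequence(counts) - 1   # expand each run of rows
result <- ref[rows, 2]
```

A value of data that is absent from ref gets a count of 0 and therefore contributes nothing to result.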

the [] operator in matrices R

So I know that if you have:
m = matrix(1:9, 3,3)
z = as.matrix(expand.grid(1:3, 1:3))
and you do
m[z]
# you get back 1 2 3 4 5 6 7 8 9
But if you do
m[] = m[z]
# You get back a matrix..
I'm a little confused about what this [] operator does. Why doesn't something like m[][z] or m[z][] return a matrix? And how would I get it to return a matrix without assigning to m[]?
Thanks!
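The key here is that when the argument to "[" (the [] operator, which is really a function) is a two-column matrix like the one you provided, each row of that matrix is treated as a (row, column) pair: the first column specifies the row and the second column specifies the column of the matrix being indexed. The result is a vector with one element per row of the index matrix. This is a "feature" (and a very handy one, I might add) of the language.
The index matrix might or might not contain every combination of row and column, so the result cannot in general be something that sensibly forms a matrix of the same dimensions; that is why m[z] returns a plain vector. Assigning to m[] is different: it replaces all elements of m while keeping m's dimensions, so the vector is poured back into the existing 3 x 3 shape. The form m[] <- m[ z[1:4, ] ] will produce a result, but also a warning, because 4 values are recycled into 9 positions. You should look at the result and then make an effort to understand what is happening.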
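For the last part of the question (getting a matrix back without assigning to m[]), a minimal sketch: rebuild the shape explicitly with matrix(), or use replace(), which performs the x[list] <- values assignment on a copy and therefore keeps m's dim attribute. The final line shows that a two-column index matrix can pick arbitrary (row, column) pairs, which is why m[z] cannot promise a matrix-shaped result.

```r
m <- matrix(1:9, 3, 3)
z <- as.matrix(expand.grid(1:3, 1:3))

# Option 1: reshape the vector returned by m[z] into m's dimensions
m2 <- matrix(m[z], nrow = nrow(m), ncol = ncol(m))

# Option 2: replace() assigns into a copy of m, so the dim attribute survives
m3 <- replace(m, seq_along(m), m[z])

# Matrix indexing with arbitrary (row, column) pairs: m[1, 2] and m[3, 1]
picked <- m[cbind(c(1, 3), c(2, 1))]
```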
