I was wondering if there is a package or generic function in R that counts sequence lengths.
For instance, if I input a sequence
s1<-c('a','a','b','a','a','a','b','b')
The proposed function F(s1,'a') would return a vector:
[2,3]
and F(s1,'b') would return [1,2]
One option is rle() (run-length encoding): keep the lengths of the runs whose value matches el:
s1<- c('a','a','b','a','a','a','b','b')
F1 <- function(s, el) {
  r <- rle(s)                   # run-length encoding: r$lengths and r$values
  r$lengths[r$values == el]     # lengths of the runs whose value is el
}
F1(s1, "a")
#[1] 2 3
F1(s1, "b")
#[1] 1 2
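If you want the run lengths for every value at once, one option is to group the rle() lengths by their run values with split(). This is a small sketch on the same s1; the name runs is just illustrative.

```r
s1 <- c('a','a','b','a','a','a','b','b')
r <- rle(s1)                          # runs: lengths 2 1 3 2, values a b a b
runs <- split(r$lengths, r$values)    # group run lengths by the value of each run
runs$a   # 2 3
runs$b   # 1 2
```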
I have the following numeric vectors x and y
x <- c(a=1,b=2,c=3)
y <- c(d=2,e=1,f=4)
I want to find the element-wise (parallel) maximum of the vectors, so I used:
> pmax(x,y)
a b c
2 2 4
The output has the right values, but the wrong names. The documentation for pmax mentions that it returns the attributes of the first argument, hence the a b c. Is there a way of getting the names of the maximum values? The desired output is as follows:
d b f
2 2 4
One option is to use max.col to find the column index of the maximum value in each row. For that, we need to create matrices by cbinding the vectors ('xy') and their names ('nmxy'). Then create a row/column index ('ij'), subset the elements of 'xy' with it, and set the names from 'nmxy'.
xy <- cbind(x,y)
nmxy <- cbind(names(x), names(y))
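# (row, column) of each row's maximum; max.col breaks ties at random by default, "first" makes it deterministic
ij <- cbind(seq_len(nrow(xy)), max.col(xy, ties.method = "first"))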
setNames(xy[ij], nmxy[ij])
# d b f
# 2 2 4
Alternatively, compute
r <- pmax(x,y)
and then rename the elements where the maximum came from y:
names(r)[y == r] <- names(y)[y == r]
If you want to be fancy, you can mask pmax with your own two-argument version that has the desired output:
pmax <- function(x, y) {
  r <- base::pmax(x, y)                  # element-wise maximum; names come from x
  names(r)[y == r] <- names(y)[y == r]   # use y's name wherever y supplied the maximum
  r
}
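The max.col approach also extends to more than two vectors. A minimal sketch, assuming all inputs are named and have the same length; pmax_named is a hypothetical helper, not a base R function. ties.method = "first" sends ties to the earliest argument instead of breaking them at random.

```r
# Hypothetical helper: element-wise maximum that keeps the name of the winning element
pmax_named <- function(...) {
  vals <- cbind(...)                                  # one column per input vector
  nms  <- do.call(cbind, lapply(list(...), names))    # matching matrix of names
  ij   <- cbind(seq_len(nrow(vals)), max.col(vals, ties.method = "first"))
  setNames(vals[ij], nms[ij])
}

x <- c(a=1,b=2,c=3)
y <- c(d=2,e=1,f=4)
z <- c(g=0,h=5,i=1)
pmax_named(x, y, z)
# d h f
# 2 5 4
```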
Does anyone know if there's a built-in function in R that can return indices of duplicated elements corresponding to the unique elements?
For instance I have a vector
a <- c("A","B","B","C","C")
unique(a) will give "A" "B" "C"
duplicated(a) will give FALSE FALSE TRUE FALSE TRUE
Is there a built-in function to get a vector of indices, the same length as the original vector a, that shows the location of a's elements in the unique vector (which is 1 2 2 3 3 in this example)?
i.e., something like the output variable "ic" of the MATLAB function "unique" (if we let c = unique(a), then a = c(ic,:)).
http://www.mathworks.com/help/matlab/ref/unique.html
Thank you!
We can use match
match(a, unique(a))
#[1] 1 2 2 3 3
Or convert to factor and coerce to integer
as.integer(factor(a, levels = unique(a)))
#[1] 1 2 2 3 3
data
a <- c("A","B","B","C","C")
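If a is already sorted, this also works (for unsorted input it gives positions in sort(a), not in a):
cumsum(!duplicated(sort(a)))
Or, to index into the sorted unique values (the order MATLAB's unique uses by default):
as.numeric(factor(a))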
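To see how these differ when a is not sorted, here is a small check on a hypothetical unsorted vector. match() against unique() indexes values in order of first appearance, while factor() levels (like MATLAB's default unique) are sorted. In both cases, indexing the unique values with the result rebuilds a, mirroring a = c(ic) in MATLAB.

```r
a <- c("C", "A", "C", "B", "A")     # hypothetical unsorted input

u <- unique(a)                      # "C" "A" "B": order of first appearance
ic <- match(a, u)                   # 1 2 1 3 2
stopifnot(identical(u[ic], a))      # indexing u by ic rebuilds a

s <- sort(unique(a))                # "A" "B" "C": sorted, like MATLAB's default
ic_sorted <- as.integer(factor(a))  # 3 1 3 2 1
stopifnot(identical(s[ic_sorted], a))

# The cumsum trick gives positions in sort(a), so it does not map back to a here
cumsum(!duplicated(sort(a)))        # 1 1 2 3 3
```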
I am writing an xor function for a class, so although any recommendations on currently existing xor functions would be nice, I have to write my own. I have searched online, but have not been able to find any solution so far. I also realize my coding style may be sub-optimal. All criticisms will be welcomed.
I am writing a function that returns an element-wise TRUE iff exactly one condition is true. Conditions are given as strings; otherwise they throw an error due to unexpected symbols (e.g. >). I would like to output a list of the pairwise elements of a and b for which my xor function is true.
The problem is that, while I can create a logical vector of xor T/F based on the conditions, I cannot access the objects directly to subset them. It is the conditions that are function arguments, not the objects themselves.
'%xor%' <- function(condition_a, condition_b) {
  # Perform an element-wise "exclusive or" on the conditions being true.
  logical_a <- eval(parse(text= condition_a)) # Evaluate and store each logical condition
  logical_b <- eval(parse(text= condition_b))
  if (length(logical_a) != length(logical_b))
    stop("Objects are not of equal length.") # Objects must be equal length to proceed
  xor_vector <- logical_a + logical_b == 1 # Only one condition may be true.
  xor_indices <- which(xor_vector) # Indices of the elements that satisfy the xor condition.
  # Somehow access the objects in the condition strings
  list(a = a[xor_indices], b = b[xor_indices]) # Desired output
}
# Example:
a <- 1:10
b <- 4:13
"a < 5" %xor% "b > 4"
Desired output:
$a
[1] 1 5 6 7 8 9 10
$b
[1] 4 8 9 10 11 12 13
I have thought about combining ls() and grep() to find existing object names in the conditions, but this would run into problems if the objects in the conditions were not stored in named variables, for example if someone tried to run "c(1:10) < 5" %xor% "c(4:13) > 4".
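One possible approach, sketched under the assumption that each condition is a single binary comparison whose left-hand side is the object you want to subset (e.g. "a < 5" or "c(1:10) < 5"): parse each string into a call, evaluate it in the caller's environment, and evaluate element 2 of the parsed call (its left-hand side) to recover the object. This avoids ls()/grep() and also works for unnamed expressions. It uses base R's xor() for the element-wise test, which you may need to replace with your own logic for the class.

```r
`%xor%` <- function(condition_a, condition_b) {
  env <- parent.frame()                     # evaluate where the operator was called
  expr_a <- parse(text = condition_a)[[1]]  # e.g. the call `a < 5`
  expr_b <- parse(text = condition_b)[[1]]
  logical_a <- eval(expr_a, env)
  logical_b <- eval(expr_b, env)
  if (length(logical_a) != length(logical_b))
    stop("Objects are not of equal length.")
  xor_indices <- which(xor(logical_a, logical_b))  # exactly one condition is TRUE
  # Assumption: each condition is a binary comparison, so element 2 of the call is its left-hand side
  object_a <- eval(expr_a[[2]], env)
  object_b <- eval(expr_b[[2]], env)
  list(a = object_a[xor_indices], b = object_b[xor_indices])
}

a <- 1:10
b <- 4:13
"a < 5" %xor% "b > 4"
"c(1:10) < 5" %xor% "c(4:13) > 4"   # works without named objects
```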
To simplify my question: we have a data frame like this:
dt <- data.frame(x=c(1,2,3), y=c("a", "b", "c"))
f <- function(x, y){
  # f only accepts vectors of length one.
}
So I need to call f once per row, like this:
f(1, "a")
f(2, "b")
f(3, "c")
I know I can use a for loop:
for (i in seq_len(nrow(dt))) {
  f(dt$x[i], dt$y[i])
}
But it seems stupid and ugly.
Is there any better way to do such work?
One option is to vectorize f with Vectorize(), which works nicely in some cases (e.g. when f returns a vector), as in:
# returns a vector of length 1
f <- function(x, y) paste(x[1], y[1])
# returns a vector with length == nrow(dt)
Vectorize(f)(dt$x, dt$y)
# returns a vector of length 2
f <- function(x, y) rep(x[1], 2)
# returns a matrix with 2 rows and nrow(dt) columns
Vectorize(f)(dt$x, dt$y)
# returns a vector of length x[1], so the length differs between rows
f <- function(x, y) rep(y[1], x[1])
# returns a list with length == nrow(dt)
Vectorize(f)(dt$x, dt$y)
but not in others (e.g. compound return values such as lists), as in:
# returns a list
f <- function(x, y) list(x[1], y[1])
# returns a 2 x nrow(dt) matrix of list cells rather than a list of per-row results
Vectorize(f)(dt$x, dt$y)
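For the list case, one option is to turn off simplification so each row's result stays its own list element. A minimal sketch using base R's mapply() with SIMPLIFY = FALSE; Map() is mapply() with that setting, and Vectorize() also accepts SIMPLIFY = FALSE. stringsAsFactors = FALSE is set so y is character on older R versions too.

```r
dt <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"), stringsAsFactors = FALSE)
f <- function(x, y) list(x[1], y[1])

# One call of f per row; results are kept as a list of length nrow(dt)
res <- mapply(f, dt$x, dt$y, SIMPLIFY = FALSE)
res[[2]]    # the list 2, "b"

# Equivalent forms
res_map <- Map(f, dt$x, dt$y)
res_vec <- Vectorize(f, SIMPLIFY = FALSE)(dt$x, dt$y)
```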
So I have 50 variables with values ranging from 1 to 4, and I want to count how many are 1 or 2 and how many are 3 or 4.
e.g. abc1=2, abc2=2, ... abc50=3
Here is my code:
#Create new variables to store the counts
abc.low=0
abc.high=0
And here is the code I am stuck at (it doesn't work)
for (i in 1:50){
ifelse (paste("abc",i,sep="")==1|paste("abc",i,sep="")==2,
(abc.low<-abc.low+1),(abc.low<-abc.low))
}
for (i in 1:50){
ifelse (paste("abc",i,sep="")==3|paste("abc",i,sep="")==4,
(abc.high<-abc.high+1),(abc.high<-abc.high))
}
I am assuming the paste function is not appropriate for what I am trying to do.
For example:
abc1=3
abc1==3
#TRUE
paste("abc",1,sep="")==3
#FALSE
whereas, for my purpose, the paste expression should return TRUE.
I appreciate your input!
Try this for example:
table(unlist(mget(paste0('abc',1:50))))
mget() returns a list of the variables' values, and unlist() flattens it into a vector.
table() then counts the occurrences of each value, for example:
1 2 3 4
14 13 13 10
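To get the two counts directly, the same vector can be compared with %in%. get() is the single-variable counterpart of mget(): it looks a variable up by name, which is why get(paste0("abc", 1)) == 3 behaves the way the question expects, while comparing the pasted string does not. The abc1 ... abc50 values below are simulated for illustration only.

```r
# Hypothetical data: 50 variables abc1 ... abc50 holding values 1 to 4
set.seed(1)
for (i in 1:50) assign(paste0("abc", i), sample(1:4, 1))

# get() looks up a variable by its name, so this compares values, not strings
get(paste0("abc", 1)) == abc1   # TRUE

vals <- unlist(mget(paste0("abc", 1:50)))   # values of abc1 ... abc50 as one vector
abc.low  <- sum(vals %in% c(1, 2))          # how many are 1 or 2
abc.high <- sum(vals %in% c(3, 4))          # how many are 3 or 4
```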
This will help you (simulated data: 50 draws from a binomial distribution):
groups = rbinom(n = 50, size = 32, prob = 0.4)
tapply(groups, groups, length)
The tapply call above returns the number of elements in groups for each distinct value.