I have an integer vector:
a <- c(1,1,3,1,4)
where each element in a indicates how many times its index should be replicated in a new vector.
So the resulting vector should be:
b <- c(1,2,3,3,3,4,5,5,5,5)
What would be the most efficient way to do this?
For example using rep:
rep(seq_along(a),a)
[1] 1 2 3 3 3 4 5 5 5 5
Another, less efficient, option is to use inverse.rle:
inverse.rle(list(lengths=a,values=seq_along(a)))
[1] 1 2 3 3 3 4 5 5 5 5
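As a quick check of the two approaches above, the sketch below confirms that they return the same vector and times them on a larger input. The vector `big` and its size are arbitrary choices made for illustration, and timings will vary by machine.

```r
a <- c(1, 1, 3, 1, 4)

# rep() repeats the i-th index a[i] times
b1 <- rep(seq_along(a), times = a)

# inverse.rle() rebuilds a vector from run lengths and run values
b2 <- inverse.rle(list(lengths = a, values = seq_along(a)))

identical(b1, b2)

# Hypothetical larger input, used only to compare timings
set.seed(1)
big <- sample(0:5, 1e5, replace = TRUE)
system.time(rep(seq_along(big), times = big))
system.time(inverse.rle(list(lengths = big, values = seq_along(big))))
```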
I have a vector of alternating TRUE and FALSE values:
dat <- c(T,F,F,T,F,F,F,T,F,T,F,F,F,F)
I'd like to number each TRUE with a unique sequential number and to assign each FALSE the number of the TRUE that precedes it.
Therefore, my desired output for the example dat above (which has 4 TRUE values) is:
1 1 1 2 2 2 2 3 3 4 4 4 4 4
What I tried:
I've tried the following (which works), but I know there must be a simpler solution!!
whichT <- which(dat==T)
whichF <- which(dat==F)
l1 <- lapply(1:length(whichT),
FUN = function(x)
which(whichF > whichT[x] & whichF < whichT[(x+1)])
)
l1[[length(l1)]] <- which(whichF > whichT[length(whichT)])
replaceFs <- unlist(
lapply(1:length(whichT),
function(x) l1[[x]] <- rep(x,length(l1[[x]]))
)
)
replaceTs <- 1:length(whichT)
dat2 <- dat
dat2[whichT] <- replaceTs
dat2[whichF] <- replaceFs
dat2
[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
I need a simpler and quicker solution because my real data set is 181k rows long!
Base R solutions are preferred, but any solution works.
cumsum(dat) will do what you want. In arithmetic functions, TRUE is converted to 1 and FALSE to 0, so the cumulative sum increases by 1 at every TRUE and stays unchanged at every FALSE, which is exactly the numbering you want.
dat <- c(T,F,F,T,F,F,F,T,F,T,F,F,F,F)
cumsum(dat)
# [1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
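To make the coercion visible, this small sketch converts the logical vector explicitly before summing. The `split()` step is only an illustration of using the result as a grouping key; it is not part of the original question.

```r
dat <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
         TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

as.integer(dat)      # TRUE becomes 1, FALSE becomes 0
grp <- cumsum(dat)   # running count of the TRUE values seen so far

# Illustration only: positions of dat grouped by the run they belong to
split(seq_along(dat), grp)
```

If the vector started with FALSE, those leading elements would get the number 0, since no TRUE has been counted yet.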
Instead of indexing, this can be done easily with cumsum from base R. Here, TRUE/FALSE is coerced to 1/0, so the cumulative sum is incremented by 1 wherever there is a TRUE.
cumsum(dat)
#[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
cumsum() is the most straightforward way. However, you can also do:
Reduce("+", dat, accumulate = TRUE)
[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
The numeric variable weitage is given as:
> weitage
[1] 20 10 50 10 5 5
Then,
sort_wei<-sort(weitage,decreasing = T)
sort_wei
[1] 50 20 10 10 5 5
match(sort_wei,weitage)
results in 3 1 2 2 5 5, but the positions I actually need are 3 1 2 4 5 6. How can I get these positions? Can I use match() in R?
We can use the order function, which returns the indices that would sort the input vector, so each position appears exactly once:
order(weitage, decreasing=TRUE)
#[1] 3 1 2 4 5 6
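The difference between the two calls can be seen side by side. match() returns the first position of each value, so tied values collapse to the same index, whereas order() returns every position exactly once. A minimal sketch using the data from the question:

```r
weitage <- c(20, 10, 50, 10, 5, 5)

# Positions in decreasing order of value
ord <- order(weitage, decreasing = TRUE)
ord

# Indexing with ord reproduces the sorted vector
identical(weitage[ord], sort(weitage, decreasing = TRUE))

# match() finds only the first occurrence of each value
match(sort(weitage, decreasing = TRUE), weitage)
```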
I have the following vector:
v = c(1,2,3,1,3,2,3,4,3,3,1, 5, 5,2)
I would like to obtain the vector
v_new = c(3,3,2,3,4,3,3,5,5,2)
from which I removed the four smallest elements, which are 1, 1, 1, 2. Please note that I do not want to remove the other occurrences of the number 2. The function order almost gives me what I need, but its output looks odd to me: v[order(v)] gives the elements in increasing order, yet order(v) itself does not give the ranks of the elements. rank also gives something strange:
v[rank(v)]
[1] 2 3 3 2 3 3 3 5 3 3 2 5 5 3
Any help would be much appreciated.
order is what you need, but to make it work, you need negative indexing. By itself, order returns the set of indices that would sort the input vector:
v = c(1,2,3,1,3,2,3,4,3,3,1,5,5,2)
order(v)
#> [1] 1 4 11 2 6 14 3 5 7 9 10 8 12 13
v[order(v)]
#> [1] 1 1 1 2 2 2 3 3 3 3 3 4 5 5
You can use negative indexing to remove elements from a vector:
(5:1)[c(-1, -2)]
#> [1] 3 2 1
Putting the two together, to remove the smallest elements from a vector, negate the first n elements of the results of order:
v[-order(v)[1:4]]
#> [1] 3 3 2 3 4 3 3 5 5 2
Note that order breaks ties by original position, so among equal values the earliest one comes first, which is why the first 2 is the one removed.
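The same idea can be wrapped in a small function. This is a sketch only: `drop_smallest` is a hypothetical helper name, not a base R function. It guards the case n = 0, because indexing with an empty negative index returns an empty vector rather than the full one.

```r
# drop_smallest() is a hypothetical helper, not part of base R
drop_smallest <- function(v, n) {
  if (n <= 0) return(v)       # v[-integer(0)] would return an empty vector
  idx <- head(order(v), n)    # positions of the n smallest, ties by position
  v[-idx]
}

v <- c(1, 2, 3, 1, 3, 2, 3, 4, 3, 3, 1, 5, 5, 2)
drop_smallest(v, 4)
drop_smallest(v, 0)
```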
Assume I have the following dataset:
dt<-data.frame(X=sample(5),Y=sample(5))
Now I need to compare these two features and, for each row, select the smaller one.
X Y
1 4 3
2 5 2
3 2 4
4 3 5
5 1 1
Then the expected answer would be
3
2
2
3
1
I know
min(dt[1,])
could be helpful but it only gives me 1
Use pmin, which is the vectorized version of min:
pmin(dt$X,dt$Y)
Like this:
> dt<-data.frame(X=sample(5),Y=sample(5))
> dt
X Y
1 3 2
2 4 3
3 1 5
4 2 4
5 5 1
> pmin(dt$X,dt$Y)
[1] 2 3 1 2 1
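Because sample() gives different values on each run, the sketch below uses the fixed values from the question so the result is reproducible. It also shows one way to extend the idea to every column of a data frame with do.call().

```r
# Fixed values from the question instead of sample()
dt <- data.frame(X = c(4, 5, 2, 3, 1), Y = c(3, 2, 4, 5, 1))

# Row-wise minimum of two columns
res <- pmin(dt$X, dt$Y)
res

# A data frame is a list of columns, so do.call() passes each column to pmin()
res_all <- do.call(pmin, dt)
res_all
```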
low <- apply(dt[,c("X","Y")], 1, min)
is another implementation (use max instead of min for the row-wise maximum).
A length-0 result happens when one of X or Y has length 0. From the documentation:
For min or max, a length-one vector. For pmin or pmax, a vector of length the longest of the input vectors, or length zero if one of the inputs had zero length.
For example, max(which(1:3 == 5),10) returns 10, but pmax(which(1:3 == 5),10) returns a zero-length vector.
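The zero-length behaviour can be checked directly. The fallback at the end is only one possible guard and assumes that 10 is the value wanted when nothing matches.

```r
idx <- which(1:3 == 5)   # no element matches, so idx has length 0

max(idx, 10)             # max() reduces all inputs to a single value
length(pmax(idx, 10))    # pmax() returns a zero-length vector here

# One possible guard (assumes 10 is the desired fallback)
safe <- if (length(idx) == 0) 10 else pmax(idx, 10)
safe
```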
I have some data:
Length(cm) Frequency
1 5
2 2
3 3
4 5
Is there a way to expand these numbers in R without typing them out manually, so that I can work out the standard error of the mean for length? I would like a dataset like:
1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
which I can then work on? Thanks
You can use rep.
> l <- 1:4
> f <- c(5,2,3,5)
> rep(l,f)
[1] 1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
In addition to using rep to replicate the observations you could also use the wtd.mean and wtd.var functions in the Hmisc package to compute the weighted summaries without expanding (this will be better if the expanded vector would take up a large portion of memory).
I recommend keeping the values in a data frame (here called data, with columns length and freq):
sd(rep(data$length, data$freq))
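Since the question asks for the standard error of the mean, here is a minimal sketch that computes it both from the expanded vector and directly from the frequencies in base R. The variable names `len` and `freq` are chosen for illustration.

```r
len  <- 1:4
freq <- c(5, 2, 3, 5)

# Expanded approach: replicate each length by its frequency
x <- rep(len, freq)
se_expanded <- sd(x) / sqrt(length(x))

# Frequency-weighted approach, without building the expanded vector
n <- sum(freq)
m <- sum(len * freq) / n                  # weighted mean
v <- sum(freq * (len - m)^2) / (n - 1)    # sample variance
se_weighted <- sqrt(v / n)

all.equal(se_expanded, se_weighted)
```

The weighted version avoids allocating the expanded vector, which matters mainly when the frequencies are large.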