how to get index of sorted array elements - r

Say I have an array in R : c(10, 7, 4, 3, 8, 2)
Upon sorting, this would be : c(2, 3, 4, 7, 8, 10)
What is the best way in R to return the indices of the sorted array elements in the original array? I'm looking for an output like:
6(index of 2), 4(index of 3), 3(index of 4), 2(index of 7), 5(index of 8), 1(index of 10)

The function you're looking for is order:
> x
[1] 10 7 4 3 8 2
> order(x)
[1] 6 4 3 2 5 1
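As a small sketch of how the result of order relates to sort (the variable names idx, sorted and idx_desc are only illustrative): indexing the original vector with the order output reproduces the sorted values, and decreasing = TRUE gives the indices for a descending sort.

```r
x <- c(10, 7, 4, 3, 8, 2)

# Positions in x of the smallest, second smallest, ... element
idx <- order(x)

# Indexing x with those positions gives the same values as sort(x)
sorted <- x[idx]

# Same idea for a descending sort
idx_desc <- order(x, decreasing = TRUE)

print(idx)
print(sorted)
print(idx_desc)
```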

sort has an index.return argument, which is FALSE by default:
x <- c(10,7,4,3,8,2)
sort(x, index.return=TRUE) # returns a list with `$x` as the sorted values
# and `$ix` as the index.
#$x
#[1] 2 3 4 7 8 10
#$ix
#[1] 6 4 3 2 5 1
You can extract the index by
sort(x, index.return=TRUE)$ix
#[1] 6 4 3 2 5 1

Related

How to repeat a data list with two vectors in R

I have a list data X with two vectors
X[1]=(1,2,3,5,6,9,7,8)
X[2]=(2,3,4,5,6)
I want to get a new list data Y
Y[1]=(1,2,3,5,6,9,7,8,1,2,3,5,6,9,7,8)-repeat x[1]
Y[2]=(2,3,4,5,6,2,3,4,5,6)-repeat x[2]
I used Y<-rep(X,2) but get
Y[1]:(1,2,3,5,6,9,7,8)
Y[2]:(2,3,4,5,6)
Y[3]:(1,2,3,5,6,9,7,8)
Y[4]:(2,3,4,5,6)
How to do it right? Many thanks.
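Use lapply (sapply gives the same result here, because the two vectors differ in length and cannot be simplified to a matrix):
lapply(X, rep, 2)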
#[[1]]
# [1] 1 2 3 5 6 9 7 8 1 2 3 5 6 9 7 8
#[[2]]
# [1] 2 3 4 5 6 2 3 4 5 6
data
X <- list(c(1, 2, 3, 5, 6, 9, 7, 8), c(2, 3, 4, 5, 6))
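To illustrate why the choice between lapply and sapply can matter, here is a minimal sketch. The second list Z is a made-up example with equal-length vectors; in that case sapply simplifies its result to a matrix, while lapply always returns a list.

```r
X <- list(c(1, 2, 3, 5, 6, 9, 7, 8), c(2, 3, 4, 5, 6))

# lapply always returns a list, one repeated vector per element
Y <- lapply(X, rep, times = 2)

# Hypothetical list whose elements have equal length
Z <- list(c(1, 2), c(3, 4))

# sapply simplifies equal-length results to a matrix (4 rows, 2 columns)
m <- sapply(Z, rep, times = 2)

# lapply keeps the list structure regardless of the lengths
l <- lapply(Z, rep, times = 2)

print(Y)
print(dim(m))
```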
You are having problems accessing the list elements - use [[1]] etc.
X <- list( c(1,2,3,5,6,9,7,8),
c(2,3,4,5,6))
Y = list(rep(X[[1]], 2),
rep(X[[2]], 2))
# R > Y
# [[1]]
# [1] 1 2 3 5 6 9 7 8 1 2 3 5 6 9 7 8
#
# [[2]]
# [1] 2 3 4 5 6 2 3 4 5 6
Using map from purrr
library(purrr)
map(X, rep, 2)
data
X <- list(c(1, 2, 3, 5, 6, 9, 7, 8), c(2, 3, 4, 5, 6))

How to condense non-sequential integers?

I'm trying to condense non-sequential numbers to subset haplotype data. I could do it manually, but given that I've got hundreds to do, I'd rather not if there's an alternative
class(haplotype1[[1]])
#[1] "integer"
haplotype1[[1]]
#[1] 1 2 3 4 5 7 8 9 10 11
I want to get [1:5, 7:11], which seems simple, but I haven't found a solution exactly matching my problem
Thanks!
Using cumsum to create the sequential groups,
tapply(x, cumsum(c(TRUE, diff(x) != 1)), FUN = function(i)paste(i[1], i[length(i)], sep = ':'))
# 1 2
#"1:5" "7:11"
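The one-liner above can be unpacked step by step. This is only a sketch, and the intermediate names gaps, grp and ranges are introduced here for illustration: diff marks where the sequence breaks, and cumsum turns those breaks into a run id for each element.

```r
x <- c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 9L, 10L, 11L)

# TRUE wherever a new run starts (the first element always starts one)
gaps <- c(TRUE, diff(x) != 1)

# Cumulative count of run starts = run id of each element
grp <- cumsum(gaps)

# First and last value of each run, pasted as "start:end"
ranges <- tapply(x, grp, function(i) paste(i[1], i[length(i)], sep = ":"))

print(grp)
print(ranges)
```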
It's unclear what type of object you want to create. I would just store the start and end values.
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
starts <- x[!c(FALSE, diff(x) == 1L)]
#[1] 1 7
ends <- x[!c(diff(x) == 1L, FALSE)]
#[1] 5 11
paste(starts, ends, sep = ":")
#[1] "1:5" "7:11"
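If the goal is to use the condensed ranges for subsetting, the stored start and end values are enough to rebuild the full index. A minimal sketch, assuming the values are sorted and increasing as in the question (the name rebuilt is illustrative):

```r
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)

# First and last value of each run of consecutive integers
starts <- x[!c(FALSE, diff(x) == 1)]
ends <- x[!c(diff(x) == 1, FALSE)]

# Expand each start/end pair back into a sequence and concatenate
rebuilt <- unlist(Map(seq, starts, ends))

print(rebuilt)
```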
Maybe you want something like this ?
vec <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
split(vec, cumsum(c(1,diff(vec)>1)))
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 7 8 9 10 11

How can I merge a list of vectors (in R) such that they have a minimum length?

I have the following four vectors:
A <- c(5, 5, 5, 5, 5, 5, 5)
B <- c(6, 6, 6)
C <- c(7, 7, 7, 7)
D <- c(8, 8, 8, 8)
In the end, I'd like to transform this list of vectors (here 4 vectors) into all pairwise combinations of full vectors (i.e. no splitting of individual vector) with the concatenated vector having a minimum length of 7. One possible solution would thereby be:
s1 <- c(5, 5, 5, 5, 5, 5, 5, 6, 6, 6)
s2 <- c(7, 7, 7, 7, 8, 8, 8, 8)
How would I do that?
I use combn to make pairwise combinations of your individual vector (ignoring direction c(A,B) == c(B,A) and self-concatenation, c(A,A) is not desired)
temp <- combn(list(A,B,C,D), 2)
threshold <- 7
L1 <- lapply(1:ncol(temp), function(x) Reduce("c", c(temp[1,x], temp[2,x])))
Discard concatenated vectors with length less than 7. I use lengths (note different from length) to grab length of all vectors in the list
L2 <- L1[lengths(L1) >= threshold]
Output
[[1]]
[1] 5 5 5 5 5 5 5 6 6 6
[[2]]
[1] 5 5 5 5 5 5 5 7 7 7 7
[[3]]
[1] 5 5 5 5 5 5 5 8 8 8 8
[[4]]
[1] 6 6 6 7 7 7 7
[[5]]
[1] 6 6 6 8 8 8 8
[[6]]
[1] 7 7 7 7 8 8 8 8
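A more compact variant of the same idea, offered as a sketch rather than as the answer's own method: combn accepts a FUN argument that is applied to each combination, so unlist can do the concatenation directly, and Filter can apply the length threshold. The names pairs, keep and stricter are illustrative.

```r
A <- c(5, 5, 5, 5, 5, 5, 5)
B <- c(6, 6, 6)
C <- c(7, 7, 7, 7)
D <- c(8, 8, 8, 8)
threshold <- 7

# Each pair of vectors is passed to unlist as a two-element list;
# simplify = FALSE keeps the results as a list of vectors
pairs <- combn(list(A, B, C, D), 2, FUN = unlist, simplify = FALSE)

# Keep only the concatenations that reach the minimum length
keep <- Filter(function(v) length(v) >= threshold, pairs)

# With a stricter threshold, the B + C pair (length 7) is dropped
stricter <- Filter(function(v) length(v) >= 8, pairs)

print(length(keep))
print(length(stricter))
```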

New variable with values depending on combination of other variables

I'm very inexperienced in R, and although this site has been tremendously helpful, I have a very specific situation and cannot find a solution. I imagine I need to write a function to accomplish this. However, my current time frame does not allow me to spend the time doing trial/error. (I apologize in advance for anything unclear).
Here is an example of my current data:
UniqueID, Time1.Feel1, Time2.Feel1.1, Time2.Feel1.2, Time2Num
1, 9, 5, 6, 1
1, 9, 7, 5, 2
2, 4, 3, 4, 1
2, 4, 5, 6, 2
3, 7, 4, 7, 1
3, 7, 6, 5, 2
I want to create a new variable: Time2.Feel1, which consists of the values of either Time2.Feel1.1 OR Time2.Feel1.2, depending on the value of Time2Num.
So, this:
UniqueID, Time1.Feel1, Time2.Feel1.1, Time2.Feel1.2, Time2Num, Time2.Feel1
1, 9, 5, 6, 1, 5
1, 9, 7, 5, 2, 5
2, 4, 3, 4, 1, 3
2, 4, 5, 6, 2, 6
3, 7, 4, 7, 1, 4
3, 7, 6, 5, 2, 5
I need to do this 30 times (i.e., Time2Num has values 1:30 and there are 30 different Time2.Feel1 variables: Time2.Feel1.1:30)
I then want to calculate a correlation between Time1.Feel1 and Time2.Feel1 for EACH UniqueID, creating a new data frame with the variables UniqueID and the new correlations. This part is less of a concern; I think I've figured out how to do that, but if the combined steps could be done more simply, I'd prefer that.
Thanks in advance!
To expound on @thelatemail's comment, you could do this:
dat <- read.csv(text="UniqueID, Time1.Feel1, Time2.Feel1.1, Time2.Feel1.2, Time2Num
1, 9, 5, 6, 1
1, 9, 7, 5, 2
2, 4, 3, 4, 1
2, 4, 5, 6, 2
3, 7, 4, 7, 1
3, 7, 6, 5, 2")
dat$Time2.Feel1 <- dat[c("Time2.Feel1.1","Time2.Feel1.2")][cbind(seq(nrow(dat)),dat$Time2Num)]
# UniqueID Time1.Feel1 Time2.Feel1.1 Time2.Feel1.2 Time2Num Time2.Feel1
# 1 1 9 5 6 1 5
# 2 1 9 7 5 2 5
# 3 2 4 3 4 1 3
# 4 2 4 5 6 2 6
# 5 3 7 4 7 1 4
# 6 3 7 6 5 2 5
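The indexing trick used above relies on a base R feature: subsetting with a two-column matrix treats each row as a (row, column) pair. A minimal sketch on a plain matrix, where m, pick and picked are made-up names and the values mirror the first three rows of the example data:

```r
# Two "choice" columns: column 1 is 5, 7, 3 and column 2 is 6, 5, 4
m <- matrix(c(5, 7, 3, 6, 5, 4), nrow = 3)

# Which column to take in each row (plays the role of Time2Num)
pick <- c(1, 2, 1)

# Each row of idx is one (row, column) pair
idx <- cbind(seq_len(nrow(m)), pick)

# Returns m[1, 1], m[2, 2], m[3, 1]
picked <- m[idx]

print(picked)
```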
Doing that 30 times isn't very efficient, so you could use a loop:
## creating some example data which I think matches your format
nr <- nrow(dat)
set.seed(1)
dat1 <- lapply(1:15, function(ii)
matrix(c(sample(1:9, nr * 2, replace = TRUE),
sample(1:2, nr, replace = TRUE)), nrow = nr,
dimnames = list(NULL, c(paste0('Time2.Feel1.', 1 + 2 * (ii - 1)),
paste0('Time2.Feel1.', 2 + 2 * (ii - 1)),
sprintf('Time%sNum', 2 + 2 * (ii - 1))))))
dat1 <- data.frame(do.call('cbind', dat1))
# Time2.Feel1.1 Time2.Feel1.2 Time2Num Time2.Feel1.3 Time2.Feel1.4 Time4Num
# 1 3 9 2 4 3 1
# 2 4 6 1 7 4 2
# 3 6 6 2 9 1 1
# 4 9 1 1 2 4 1
# 5 2 2 2 6 8 2
# 6 9 2 2 2 4 2
# Time2.Feel1.5 Time2.Feel1.6 Time6Num Time2.Feel1.7 Time2.Feel1.8 Time8Num
# 1 8 8 2 1 9 1
# 2 1 5 2 1 3 2
# 3 7 5 1 3 5 1
# 4 4 8 2 5 3 2
# 5 8 1 1 6 6 1
# 6 6 5 1 4 3 2
# Time2.Feel1.9 Time2.Feel1.10 Time10Num Time2.Feel1.11 Time2.Feel1.12 Time12Num
# 1 4 7 2 3 5 1
# 2 4 9 1 1 4 2
# 3 5 4 2 6 8 2
# 4 9 7 1 8 6 1
# 5 8 4 1 8 6 1
# 6 4 3 1 8 4 1
etc, etc
So you can start here. First, build the vectors of column names:
xx is Time2.Feel1.1, Time2.Feel1.3, Time2.Feel1.5, etc.
yy is Time2.Feel1.2, Time2.Feel1.4, Time2.Feel1.6, etc.; xx and yy are your two "choices"
zz is the "decision" column: Time2Num, Time4Num, Time6Num, etc.
Then use mapply to do the same indexing as above, mapping 1-1 over those three vectors. Note that xx, yy, and zz are all the same length.
n <- 30
xx <- paste0('Time2.Feel1.', seq(1, n - 1, by = 2))
yy <- paste0('Time2.Feel1.', seq(2, n, by = 2))
zz <- sprintf('Time%sNum', seq(2, n, by = 2))
nn <- sprintf('Time%s.Feel1', seq(2, n, by = 2))
res <- mapply(function(x, y, z) dat1[, c(x, y)][cbind(1:nr, dat1[, z])],
xx, yy, zz, SIMPLIFY = FALSE)
res <- `colnames<-`(do.call('cbind', res), nn)
# Time2.Feel1 Time4.Feel1 Time6.Feel1 Time8.Feel1 Time10.Feel1 Time12.Feel1
# [1,] 9 4 8 1 7 3
# [2,] 4 4 5 3 4 4
# [3,] 6 9 7 3 4 8
# [4,] 9 2 8 3 9 8
# [5,] 2 8 8 6 8 8
# [6,] 2 4 6 3 4 8
And then you can combine the results back. You would need to reorder them if that is important to you
## combine results into original data
cbind(dat1, res)
When searching for the error I received when trying the answer from @user12202013, I came across this solution using ifelse, found here: Conditional assignment of one variable to the value of one of two other variables
# NA (rather than "") as the fallback keeps the result numeric
Time2.Feel1 <- ifelse(Time2Num == 1, Time2.Feel1.1, ifelse(Time2Num == 2,
Time2.Feel1.2, NA))
Although it is definitely not the most efficient solution, particularly because I need to nest it 30 times and I need to do it for 9 items, it solved my problem. A simpler answer is still welcome, though!
Thanks for your answers!
You want to do something like:
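Time2.Feel1 <- rep(NA, length(Time2Num))
# subset the right-hand side with the same condition so the lengths match
Time2.Feel1[Time2Num == 1] <- Time2.Feel1.1[Time2Num == 1]
Time2.Feel1[Time2Num == 2] <- Time2.Feel1.2[Time2Num == 2]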
This creates a vector called Time2.Feel1, initialized with NA values. Where Time2Num is 1 we fill in the values from Time2.Feel1.1, and where Time2Num is 2 we fill in the values from Time2.Feel1.2. Anywhere Time2Num is neither 1 nor 2, Time2.Feel1 will keep its NA value.
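The same fill-by-condition idea extends to more than two source columns with a short loop. This is a sketch under assumptions: the vectors below copy the question's example data, and choices is a hypothetical list holding the candidate columns in order. The right-hand side is subset with the same positions as the left-hand side so the lengths always match, and which() keeps any NA in Time2Num from breaking the assignment.

```r
Time2Num      <- c(1, 2, 1, 2, 1, 2)
Time2.Feel1.1 <- c(5, 7, 3, 5, 4, 6)
Time2.Feel1.2 <- c(6, 5, 4, 6, 7, 5)

# Candidate columns, in the order matching the values of Time2Num
choices <- list(Time2.Feel1.1, Time2.Feel1.2)

# Start with NA everywhere; rows that match no choice stay NA
Time2.Feel1 <- rep(NA_real_, length(Time2Num))

for (k in seq_along(choices)) {
  hit <- which(Time2Num == k)           # rows that select choice k
  Time2.Feel1[hit] <- choices[[k]][hit] # copy only those rows
}

print(Time2.Feel1)
```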
Edit:
Not sure what the error message is referring to since I am able to do this
# reproducible example
set.seed(1)
A <- letters
B <- sample(c(0, 1, NA), 26, TRUE)
A[B == 1] <- '5' # assignment where subscript contains NAs
A[B == 0] <- NA # assigning NA values
A
[1] NA "5" "5" "d" NA "f" "g" "5" "5" NA NA NA "m" "5" "o" "5" "q" "r" "5" "t" "u" NA "5" NA NA "5"
I would need to see more complete code to know what is causing the error.

Convenient indexing to end of array

So I'm coming from Python, where I can index until the end of an array by omitting the end index. (Note that Python is 0-indexed):
In [1]: x = range(10)
In [2]: x
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: x[4:]
Out[3]: [4, 5, 6, 7, 8, 9]
In [4]: x[-6:]
Out[4]: [4, 5, 6, 7, 8, 9]
In [5]: x[-1]
Out[5]: 9
Also, negative indices start from the end of the array. In R, so far I've found that I need to do things like this:
> x <- 0:9
> x[5:length(x)]
[1] 4 5 6 7 8 9
and so on. Is there any sort of syntactic sugar for length(x)? (Perhaps something like end that MATLAB uses).
You could use the tail function instead of indexing.
In the following, the final 6 elements of the vector x are returned. 6 is the default return length for tail (and head, shown below), and the second argument changes that. So if you were to write tail(x, 5) for example, the final 5 elements will be returned.
> x <- 0:9
> tail(x)
# [1] 4 5 6 7 8 9
> tail(x, 5)
# [1] 5 6 7 8 9
Similarly, there is a head function for viewing the first few elements, which operates in the same manner.
> head(x)
# [1] 0 1 2 3 4 5
> head(x, 5)
# [1] 0 1 2 3 4
Logical indexing is one method. Also, as mentioned above (though not demonstrated there), head and tail accept a negative second argument, which drops that many elements from the end or the start:
> x <- 1:10
> head(x,-3)
[1] 1 2 3 4 5 6 7
> tail(x,-3)
[1] 4 5 6 7 8 9 10
> x[x>=4]
[1] 4 5 6 7 8 9 10
Mixing negative and positive indices is not allowed.
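As a rough mapping from the Python slices in the question to R, the following sketch uses only tail and plain indexing (the variable names are illustrative). Note that a negative index in R removes elements rather than counting from the end.

```r
x <- 0:9

# Python x[4:]  -> drop the first 4 elements
from_fifth <- tail(x, -4)

# Python x[-6:] -> keep the last 6 elements
last_six <- tail(x, 6)

# Python x[-1]  -> the last element
last_one <- tail(x, 1)

# In R, x[-1] means "everything except the first element"
dropped_first <- x[-1]

print(from_fifth)
print(last_six)
print(last_one)
print(dropped_first)
```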
