Percentile in list - r

I have the following list (h):
> h
[[1]]
[1] 0.9613971
[[2]]
[1] 0.9705882
[[3]]
[1] 0.9503676
[[4]]
[1] 0.9632353
[[5]]
[1] 0.9779412
[[6]]
[1] 0.9852941
[[7]]
[1] 0.9852941
[[8]]
[1] 0.9816176
I would like to add a new column that will calculate the percentile of each number in the list.
I tried to use the following and I get errors:
perc.rank <- function(x, xo) length(x[x <= xo])/length(x)*100
perc.rank <- function(x) trunc(rank(x))/length(x)
trunc(rank(h))/length(h)
In addition, given a number such as 0.9503676 (the third number) or its position (3), how can I find its percentile?

You can do this more efficiently by first converting your list into a vector as follows:
h <- unlist(h)
Next, create a function that returns the percentile. The easiest way is to build an empirical CDF for your data as follows:
perc.rank <- ecdf(h)
To find the percentile of any number, for example the third number, do the following:
perc.rank(0.9503676)
This will work even if the number isn't in your list; e.g. perc.rank(0.91) gives you the percentile for 0.91. You can also pass several numbers to the function at once, as in perc.rank(c(0.950, 0.913, 0.6)). The result is a proportion between 0 and 1, so multiply by 100 if you want a percentage.
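As a minimal sketch using the values of h from the question, the ECDF can be applied to the whole vector at once, and a lookup by position is then plain indexing. The names pct and pct_by_pos are illustrative only.

```r
# Values from the question, already flattened to a numeric vector
h <- c(0.9613971, 0.9705882, 0.9503676, 0.9632353,
       0.9779412, 0.9852941, 0.9852941, 0.9816176)

# ecdf() returns a step function: the proportion of values <= its argument
perc.rank <- ecdf(h)

pct <- perc.rank(h)     # percentile (as a proportion) of every element
pct_by_pos <- pct[3]    # percentile of the third element, looked up by position

print(data.frame(value = h, percentile = pct * 100))
```

Note that the ECDF counts values less than or equal to the argument, so the two tied values 0.9852941 both land at 100%, whereas rank(h)/length(h) averages the tie.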

Converting to a data frame will make things easier. Here is one solution using dplyr (the data are random, so your numbers will differ):
library(dplyr)
df <- data.frame(x = rnorm(10))
df %>% mutate(percrank = rank(x) / length(x) * 100)
             x percrank
1   1.56254900      100
2  -0.52554968       10
3   0.16410991       70
4   0.95150575       80
5   0.01960002       60
6  -0.22860395       30
7   1.43025012       90
8  -0.15836126       40
9  -0.01150753       50
10 -0.39064474       20
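The same column can be sketched in base R, without dplyr. Here it is applied to the h from the question rather than to random data, so the result is reproducible; the column name percrank is simply carried over from the answer above.

```r
# The list from the question
h <- list(0.9613971, 0.9705882, 0.9503676, 0.9632353,
          0.9779412, 0.9852941, 0.9852941, 0.9816176)

# One row per list element
df <- data.frame(x = unlist(h))

# rank() averages ties, so the two equal values share rank 7.5
df$percrank <- rank(df$x) / nrow(df) * 100

print(df)
```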

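This appends two values to each element of the current list h.
The second value in each element is the percentile as you have it (rank divided by length).
The third value is the rank itself; tied values share an averaged rank, which is why both 0.9852941 entries get 7.5.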
h <- list(.9613971, .9705882, .9503676, .9632353, .9779412, .9852941, .9852941, .9816176)
# create percentiles
rnk1 <- rank(unlist(h)) / length(h)
# ordinal rank
rnk2 <- rank(unlist(rnk1))
# combine the original lists with the two additional elements
res <- mapply(c, h, rnk1, rnk2, SIMPLIFY=FALSE)
res
[[1]]
[1] 0.9613971 0.2500000 2.0000000
[[2]]
[1] 0.9705882 0.5000000 4.0000000
[[3]]
[1] 0.9503676 0.1250000 1.0000000
[[4]]
[1] 0.9632353 0.3750000 3.0000000
[[5]]
[1] 0.9779412 0.6250000 5.0000000
[[6]]
[1] 0.9852941 0.9375000 7.5000000
[[7]]
[1] 0.9852941 0.9375000 7.5000000
[[8]]
[1] 0.9816176 0.7500000 6.0000000
A lookup function that finds an element by its value:
perc.rank <- function(x, xo) {
  # match xo against the first value stored in each element
  x[[match(xo, sapply(x, "[[", 1))]]
}
perc.rank(res, .9779412)
[1] 0.9779412 0.6250000 5.0000000
This shows that .9779412 has percentile 0.625 and is ranked number 5.

Related

Pairwise Similarity Matrix from Function (HPOSim)

I am trying to create a pairwise similarity matrix where I compare the similarity of each HPO term to every other HPO term using the "getSimWang" function of the R package HPOSim. Package available here: https://sourceforge.net/projects/hposim/
I can create the pairwise similarity matrix for a subset of the HPO terms (there are ~13,000) using the following:
list1<-c("HP:0002404","HP:0011933","HP:0030286")
custom <- function(x,y){
z <- getSimWang(x,y)
return(z)
}
outer(list1, list1, Vectorize(custom))
[,1] [,2] [,3]
[1,] 1.0000000 0.6939484 0
[2,] 0.6939484 1.0000000 0
[3,] 0.0000000 0.0000000 1
sapply(list1, function(x) sapply(list1, function(y) custom(x,y)))
HP:0002404 HP:0011933 HP:0030286
HP:0002404 1.0000000 0.6939484 0
HP:0011933 0.6939484 1.0000000 0
HP:0030286 0.0000000 0.0000000 1
However, when I tried to expand this code to apply to the rest of the HPO terms, R was calculating for 24+ hours, and when I used pbsapply to estimate the time it would take, it estimated it would be 20 days!
I have also tried mapply - but that only gives me a subset of the calculations (x1y1, x2y2, and x3y3) rather than all combinations (x1y1, x1y2, x1y3, etc).
mapply(custom, list1, list1)
HP:0002404 HP:0011933 HP:0030286
1 1 1
I also tried the xapply solution, but when I run it I lose the information about which terms are being compared:
xapply(FUN = custom, list1, list1)
[[1]]
[1] 1
[[2]]
[1] 0.6939484
[[3]]
[1] 0
[[4]]
[1] 0.6939484
[[5]]
[1] 1
[[6]]
[1] 0
[[7]]
[1] 0
[[8]]
[1] 0
[[9]]
[1] 1
Is there a different method that I am missing in order to get the pairwise (or ideally non-redundant pairwise) calculations for the similarity? Or is this really going to take 20 days?!?
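One way to get the non-redundant pairs while keeping the term labels is to enumerate unordered pairs with combn and fill a named symmetric matrix. The sketch below does not call HPOSim: toy_sim is a hypothetical stand-in for getSimWang, and the code assumes the similarity is symmetric and equals 1 on the diagonal, as in the output shown above. It roughly halves the number of calls but does not make each call faster.

```r
terms <- c("HP:0002404", "HP:0011933", "HP:0030286")

# Hypothetical stand-in for getSimWang(): share of matching characters.
# Replace with the real similarity function when HPOSim is loaded.
toy_sim <- function(x, y) {
  mean(strsplit(x, "")[[1]] == strsplit(y, "")[[1]])
}

# Each column of idx is one unordered pair (i < j): choose(n, 2) pairs in total
idx <- combn(length(terms), 2)
vals <- mapply(function(i, j) toy_sim(terms[i], terms[j]), idx[1, ], idx[2, ])

# Start from the identity (assumes sim(x, x) == 1) and label rows and columns
sim <- diag(1, length(terms))
dimnames(sim) <- list(terms, terms)

# Fill the upper triangle, then mirror it into the lower triangle
sim[cbind(idx[1, ], idx[2, ])] <- vals
sim[cbind(idx[2, ], idx[1, ])] <- vals

print(sim)
```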

sample list randomly and remove used values

I have a problem (maybe it is not that difficult, but I cannot figure it out):
I have a list (l) of 25 elements and I want to divide it randomly into 5 groups. The problem is that if I call sample(l, 5) five times, it does not give me unique samples. So basically, I am looking for a way to choose 5 elements, remove them from the list, and then sample again.
I hope someone has a solution... thanks
If you want Andrew's method as a function
sample2 <- function(x, sample.size){
split(x, sample(ceiling(seq_along(x)/sample.size)))
}
sample2(1:20, 5)
gives
$`1`
[1] 1 15 6 3 18
$`2`
[1] 11 7 5 10 14
$`3`
[1] 2 12 4 13 17
$`4`
[1] 19 16 20 8 9
Another method...
x <- 1:20
matrix(x[sample(seq_along(x),length(x))],ncol = 4)
Here we are randomly reordering your vector by sampling index values, then dumping the results into a matrix so that its columns represent your groups (with 20 values and ncol = 4 that is four groups of five; set ncol to the number of groups you want). You could also leave it as a vector, or make a list if you don't want your output as a matrix.
You could do something like this...
l <- as.list(LETTERS[1:25])
l2 <- split(l,rep(1:5,5)[sample(25)])
l2 #is then a list of five lists containing all elements of l...
$`1`
$`1`[[1]]
[1] "D"
$`1`[[2]]
[1] "I"
$`1`[[3]]
[1] "M"
$`1`[[4]]
[1] "W"
$`1`[[5]]
[1] "Y"
$`2`
$`2`[[1]]
[1] "C"
$`2`[[2]]
[1] "E"
$`2`[[3]]
[1] "H"
$`2`[[4]]
[1] "T"
$`2`[[5]]
[1] "X"
etc...
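The "choose 5, remove them, sample again" idea is equivalent to shuffling once and cutting the shuffled list into consecutive chunks. A minimal sketch for the 25-element case follows; set.seed is there only to make the illustration repeatable, and the names shuffled, group_id and groups are illustrative.

```r
set.seed(42)  # only so the example is repeatable

l <- as.list(LETTERS[1:25])

# A random permutation of the list: every element appears exactly once
shuffled <- sample(l)

# 1 1 1 1 1 2 2 2 2 2 ... : consecutive blocks of five
group_id <- ceiling(seq_along(shuffled) / 5)

# A list of five lists, each holding five elements of l
groups <- split(shuffled, group_id)

print(lengths(groups))
```

Because each element is drawn once in the shuffle, no value can end up in two groups.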

Replacing values in a list based on a condition

I have a list of values called squares and would like to replace all values that are 0 with 40.
I tried:
replace(squares, squares==0, 40)
but the list remains unchanged
If it is a list, then loop through the list with lapply and use replace
squares <- lapply(squares, function(x) replace(x, x==0, 40))
squares
#[[1]]
#[1] 40 1 2 3 4 5
#[[2]]
#[1] 1 2 3 4 5 6
#[[3]]
#[1] 40 1 2 3
data
squares <- list(0:5, 1:6, 0:3)
I think for this purpose, as long as every element of the list is a single value, you can just treat it as if it were a vector as follows:
squares=list(2,4,6,0,8,0,10,20)
squares[squares==0]=40
Output:
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 40
[[5]]
[1] 8
[[6]]
[1] 40
[[7]]
[1] 10
[[8]]
[1] 20
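If the list may contain vectors or further nested lists, a sketch with rapply covers both cases in one call. The example data below are made up for illustration and combine the shapes used in the two answers above.

```r
# Illustrative data: two vectors plus a nested list
squares <- list(0:5, 1:6, list(0, c(7, 0)))

# how = "replace" applies the function to every leaf and keeps the structure
fixed <- rapply(squares,
                function(x) replace(x, x == 0, 40),
                how = "replace")

str(fixed)
```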

How to expand.grid specific objects in a list to form a new list

I have a list data as follows:
a<-list(10,c(8,9),5,14,c(3,7),c(2,3),5,13,c(3,4),4,5,8,12,c(2,3),c(5,7))
a
[[1]]
[1] 10
[[2]]
[1] 8 9
[[3]]
[1] 5
[[4]]
[1] 14
[[5]]
[1] 3 7
[[6]]
[1] 2 3
[[7]]
[1] 5
[[8]]
[1] 13
[[9]]
[1] 3 4
[[10]]
[1] 4
[[11]]
[1] 5
[[12]]
[1] 8
[[13]]
[1] 12
[[14]]
[1] 2 3
[[15]]
[1] 5 7
Then I want to apply "expand.grid" to every 3 consecutive objects in list a.
That is to say, I want to expand.grid elements 1-3, 4-6, 7-9, 10-12 and 13-15 separately, then combine the results into a new list.
The result should be a list with one expand.grid result per group of three.
So far I have only solved this in the most naive way:
list(expand.grid(a[1:3]),expand.grid(a[4:6]),expand.grid(a[7:9]),expand.grid(a[10:12]),expand.grid(a[13:15]))
When I tried to use "sapply":
sapply(1:(length(a)/3), function(x){expand.grid(a[1:3+3*x-3])})
it didn't work; the result was not the list I expected.
I don't know why. Could you help me with this problem? Thank you so much!
We can create a grouping index with gl, split the sequence of 'a', subset the list elements using the index and use expand.grid.
lapply(split(seq_along(a), as.numeric(gl(length(a), 3, length(a)))),
function(i) expand.grid(a[i]))
We can also use sapply, but we must make sure to set simplify=FALSE.
The OP's code with simplify=FALSE becomes
sapply(1:(length(a)/3), function(x)
{expand.grid(a[1:3+3*x-3])}, simplify=FALSE)
According to ?sapply
simplify: logical or character string; should the result be simplified
to a vector, matrix or higher dimensional array if possible? For
sapply it must be named and not abbreviated. The default value, TRUE,
returns a vector or matrix if appropriate, whereas if simplify =
"array" the result may be an array of “rank” (=length(dim(.))) one
higher than the result of FUN(X[[i]]).
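To see why the default fails, note that each data frame is itself a list of three columns, so sapply simplifies the five results into a 3 x 5 matrix of list cells. The sketch below shows that, next to a version that keeps one data frame per group; the names simplified and kept are illustrative.

```r
a <- list(10, c(8, 9), 5, 14, c(3, 7), c(2, 3), 5, 13,
          c(3, 4), 4, 5, 8, 12, c(2, 3), c(5, 7))

# Default simplify = TRUE: five 3-column data frames collapse into a list-matrix
simplified <- sapply(1:(length(a) / 3),
                     function(x) expand.grid(a[1:3 + 3 * x - 3]))
print(dim(simplified))

# Splitting the list itself into blocks of three keeps one data frame per group
kept <- lapply(split(a, ceiling(seq_along(a) / 3)), expand.grid)
print(sapply(kept, nrow))
```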

Replace all values of a recursive list with values of a vector

Say, I have the following recursive list:
rec_list <- list(list(rep(1,5), 10), list(rep(100, 4), 20:25))
rec_list
[[1]]
[[1]][[1]]
[1] 1 1 1 1 1
[[1]][[2]]
[1] 10
[[2]]
[[2]][[1]]
[1] 100 100 100 100
[[2]][[2]]
[1] 20 21 22 23 24 25
Now, I would like to replace all the values of the list, say, with the vector seq_along(unlist(rec_list)), and keep the structure of the list. I tried using the empty index subsetting like
rec_list[] <- seq_along(unlist(rec_list))
But this doesn't work.
How can I achieve the replacement while keeping the original structure of the list?
You can use relist:
relist(seq_along(unlist(rec_list)), skeleton = rec_list)
# [[1]]
# [[1]][[1]]
# [1] 1 2 3 4 5
#
# [[1]][[2]]
# [1] 6
#
#
# [[2]]
# [[2]][[1]]
# [1] 7 8 9 10
#
# [[2]][[2]]
# [1] 11 12 13 14 15 16
If you wanted to uniquely index each element of a nested list, you could start with the rapply() function, which is a recursive version of lapply(). Here I use a closure that keeps a running counter, so it can index uniquely across a list of any structure:
rapply(rec_list,
local({i<-0; function(x) {i<<-i+length(x); i+seq_along(x)-length(x)}}),
how="replace")
Other functions are simpler; for example, if you just wanted to seq_along each subvector:
rapply(rec_list, seq_along, how="replace")
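As a small check of the relist approach, the sketch below refills the list and confirms that the nesting and the length of every leaf are unchanged. The names new_vals, filled and same_shape are illustrative. The direct assignment rec_list[] <- ... fails to do this because it only addresses the two top-level elements.

```r
rec_list <- list(list(rep(1, 5), 10), list(rep(100, 4), 20:25))

# One replacement value per leaf value, in unlist() order
new_vals <- seq_along(unlist(rec_list))

# relist() pours the flat vector back into the shape of the skeleton
filled <- relist(new_vals, skeleton = rec_list)

# Compare leaf lengths of the original and the refilled list
same_shape <- identical(rapply(rec_list, length, how = "list"),
                        rapply(filled, length, how = "list"))

print(same_shape)
str(filled)
```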
