Calculate the rank of each index in a vector - r

I'd like to calculate the rank of each index within a vector, e.g:
x <- c(0.82324952352792, 0.11953364405781, 0.588659686036408, 0.41683742380701,
0.11452184105292, 0.438547774450853, 0.586471405345947, 0.943002870306373,
0.28184655145742, 0.722095313714817)
calcRank <- function(x){
sorted <- x[order(x)]
ranks <- sapply(x, function(x) which(sorted==x))
return(ranks)
}
calcRank(x)
> calcRank(x)
[1] 9 2 7 4 1 5 6 10 3 8
Is there a better way to do this?

Why not just:
rank(x) # ..... ?
# [1] 9 2 7 4 1 5 6 10 3 8

match is what you want:
match(x, sort(x))

Related

Return a vector free of duplicates [without using unique() or duplicate()]

I am trying to get rid of duplicates within a vector without using unique function (as this one doesn`t work in that instance).
My loop looks as follows:
#finding and deleting duplicates
dupes <- function(x) {
for (i in 1:(length(x))){
while (is_true(all.equal(x[i],x[i+1]))){
x=x[-i]
}
}
print(x)
}
I want to run a vector through the function and get a vector (free of dupes) returned.
Here's one simple way to do it -
# for numeric vector
x <- c(1:8, 4:10)
# [1] 1 2 3 4 5 6 7 8 4 5 6 7 8 9 10
x[ave(x, x, FUN = seq_along) == 1]
# [1] 1 2 3 4 5 6 7 8 9 10
# for character vector
x <- as.character(iris$Species)
x[ave(x, x, FUN = seq_along) == 1]
# [1] "setosa" "versicolor" "virginica"
Here are a couple of ways to do it, assuming that your vector is NOT numeric (i.e. It is integer or character),
set.seed(666)
v1 <- sample(15:20, 10, replace = TRUE)
as.integer(names(table(v1)))
#[1] 15 16 17 19 20
rle(sort(v1))$values
#[1] 15 16 17 19 20
dplyr's distinct() function will work for you .
library(dplyr)
df_new <- distinct(your_vector)

Remove elements in a list in R

I want to remove part of the list where it is a complete set of the other part of the list. For example, B intersect A and E intersect C, therefore B and E should be removed.
MyList <- list(A=c(1,2,3,4,5), B=c(3,4,5), C=c(6,7,8,9), E=c(7,8))
MyList
$A
[1] 1 2 3 4 5
$B
[1] 3 4 5
$C
[1] 6 7 8 9
$E
[1] 7 8
MyListUnique <- RemoveSubElements(MyList)
MyListUnique
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9
Any ideas ? Any know function to do it ?
As long as your data is not too huge, you can use an approach like the following:
# preparation
MyList <- MyList[order(lengths(MyList))]
idx <- vector("list", length(MyList))
# loop through list and compare with other (longer) list elements
for(i in seq_along(MyList)) {
idx[[i]] <- any(sapply(MyList[-seq_len(i)], function(x) all(MyList[[i]] %in% x)))
}
# subset the list
MyList[!unlist(idx)]
#$C
#[1] 6 7 8 9
#
#$A
#[1] 1 2 3 4 5
Similar to the other answer, but hopefully clearer, using a helper function and 2 sapplys.
#helper function to determine a proper subset - shortcuts to avoid setdiff calculation if they are equal
is.proper.subset <- function(x,y) !setequal(x,y) && length(setdiff(x,y))==0
#double loop over the list to find elements which are proper subsets of other elements
idx <- sapply(MyList, function(x) any(sapply(MyList, function(y) is.proper.subset(x,y))))
#filter out those that are proper subsets
MyList[!idx]
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9

How to make a function to get average (B) when (A) is a certain condition on data.frame in R

this is for setting
#this is for setting
A <- c(1,1,2,2,2,3,4,4,5,5,5)
B <- c(1:10)
C <- c(11:20)
ABC <- data.frame(A,B,C)
#so, I made up my own ABC like this
A B C
1 1 1 11
2 1 2 12
3 2 3 13
4 2 4 14
5 2 5 15
6 3 6 16
7 4 7 17
8 4 8 18
9 5 9 19
10 5 10 20
On this setting,
I want to know, when (A) are in a specific condition, how to get average (B)or(C)
For example
if condition(A) are 2:4, get mean (B), and mean(C)
new_ABC <- subset(ABC, ABC$A >= 2 & ABC$A <= 4)
mean(newABC$B)
mean(newABC$C)
and it works.
But if I want to make a function like this, I tried severe days, I have no idea...
getMeanB <- function(condition){
for(i in min(condition) : max(condition){
# I do not really know what to do..
}
}
Any helps will very thanks!!
If the argument 'condition' is a vector, then we can do it
getMean <- function(data, condition, cName) {
minC <- min(condition)
maxC <- max(condition)
i1 <- data[[cName]] >= minC & data[[cName]] <= maxC
colMeans(data[i1,setdiff(names(data), cName)], na.rm = TRUE)
}
getMean(ABC, 2:4, "A")
# B C
# 5.5 15.5
NOTE: Here, the 'data' and 'cName' arguments are added to make it more dynamic and applied to other datasets with different column names.

Place on specific positions of dataframes specific values

DF <- data.frame(x1=c(NA,7,7,8,NA), x2=c(1,4,NA,NA,4)) # a data frame with NA
WhereAreMissingValues <- which(is.na(DF), arr.ind=TRUE) # find the position of the missing values
Modes <- apply(DF, 2, function(x) {which(tabulate(x) == max(tabulate(x)))}) # find the modes of each column
DF
WhereAreMissingValues
Modes
I would like to replace the NAs of each column of DF with the mode, accordingly.
Please for some help.
Map provides here a one line solution:
data.frame(Map(function(u,v){u[is.na(u)]=v;u},DF, Modes))
# x1 x2
#1 7 1
#2 7 4
#3 7 4
#4 8 4
#5 7 4
Here's how I would do this.
First I'll define an helper function
Myfunc <- function(x) as.numeric(names(sort(-table(x)))[1L])
Then just use lapply over the data set
DF[] <- lapply(DF, function(x){x[is.na(x)] <- Myfunc(x) ; x})
DF
# x1 x2
# 1 7 1
# 2 7 4
# 3 7 4
# 4 8 4
# 5 7 4

R Dataframe comparison which, scaling bad

The idea is extracting the position of df charactes with a reference of other df, example:
L<-LETTERS[1:25]
A<-c(1:25)
df<-data.frame(L,A)
Compare<-c(LETTERS[sample(1:25, 25)])
df[] <- lapply(df, as.character)
for (i in 1:nrow(df)){
df[i,1]<-which(df[i,1]==Compare)
}
head(df)
L A
1 14 1
2 12 2
3 2 3
This works good but scale very bad, like all for, any ideas with apply, or dplyr?
Thanks
Just use match
Your data (use set.seed when providing data using sample)
df <- data.frame(L = LETTERS[1:25], A = 1:25)
set.seed(1)
Compare <- LETTERS[sample(1:25, 25)]
Solution
df$L <- match(df$L, Compare)
head(df)
# L A
# 1 10 1
# 2 23 2
# 3 12 3
# 4 11 4
# 5 5 5
# 6 21 6

Resources