Ranking in R based on multiple criteria - r

I am working in R and would like some help with ranking on multiple criteria. Using the mtcars dataset, I want to generate a new column based initially on the rank of mtcars$mpg, with ties broken by the rank of mtcars$qsec. I have mtcars["rank"] = NA and then mtcars$rank = rank(mtcars$mpg), but I am not sure how to handle the ties. I've tried mtcars$rank = order(mtcars$mpg, mtcars$qsec), but that does not give the outcome I want: the initial ranking should come from mtcars$mpg, and ties should be decided by the lower ranking in mtcars$qsec. Thanks.

I would first sort the data frame by mpg, then by qsec to break ties.
mtcars <- mtcars[order(mtcars$mpg, mtcars$qsec), ]
Once the rows are sorted, the rank is simply the row position.
mtcars$rank <- seq_len(nrow(mtcars))
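If the original row order needs to be kept, one possible alternative is to apply order() twice. order() returns the row indices in sorted order rather than ranks, which is likely why using it directly did not give the expected column; ordering those indices a second time turns them into each row's rank. This is only a sketch, and the name cars_ranked is made up for the example.

```r
# Work on a copy so the built-in dataset stays untouched
# ('cars_ranked' is a hypothetical name used only for this example)
cars_ranked <- mtcars

# Inner order(): row indices sorted by mpg, ties broken by qsec
# Outer order(): converts that permutation into a rank for every row
cars_ranked$rank <- order(order(cars_ranked$mpg, cars_ranked$qsec))

head(cars_ranked[, c("mpg", "qsec", "rank")])
```

The rows stay in their original order, and every rank from 1 to nrow(mtcars) is used exactly once.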

Related

R creating conditional random variables in data frame

I am a beginner with R and am facing the following challenge, for which searching did not turn up an answer.
I have a data frame with a group assignment in the first column, and I want to create conditional random variables based on the group. For example, everyone in group A should get a normally distributed random variable with mean 50 and standard deviation 10. The resulting values would then be added as an additional column.
Example:
group_assigned <- c("A","A","B","C","A","C")
dframe <- data.frame(group_assigned)
groups <-c("A","B","C")
group_mean <- c(50,40,30)
group_stddev <- c(10,5,5)
group_properties <- data.frame(groups,group_mean, group_stddev)
Can you guide me to a solution? Thank you for your help!
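One way this could be approached, sketched here under the assumption that every group in dframe also appears in group_properties: look up each row's group with match(), then pass the matched means and standard deviations to rnorm(), which is vectorised over both. The column name value and the fixed seed are illustrative choices, not part of the question.

```r
set.seed(42)  # assumption: fixed seed only to make the example reproducible

group_assigned <- c("A", "A", "B", "C", "A", "C")
dframe <- data.frame(group_assigned)
group_properties <- data.frame(
  groups       = c("A", "B", "C"),
  group_mean   = c(50, 40, 30),
  group_stddev = c(10, 5, 5)
)

# Row of group_properties that belongs to each row of dframe
idx <- match(dframe$group_assigned, group_properties$groups)

# One draw per row, using that row's group mean and standard deviation
# ('value' is a hypothetical column name)
dframe$value <- rnorm(
  nrow(dframe),
  mean = group_properties$group_mean[idx],
  sd   = group_properties$group_stddev[idx]
)
```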

Using a global function to identify which subfunctions to run

I have a dataset with a categorical variable that may take around 6 or 7 unique values. Depending on which value that is, I need to run several functions, each of which differs according to the value of the categorical variable.
I don't know how to program this so that the right functions are called. Keep in mind that this example is simple, but my real scenario is much more complicated, with lots of sub-functions.
library(dplyr)
func1_value_one = function(multiplication_value){
mtcars$check="value_one"
mtcars$mpg =mtcars$mpg * multiplication_value
filter(mtcars, mpg>60)
}
func0_value_zero = function(division_value){
mtcars$check="value_zero"
mtcars$mpg =mtcars$mpg / division_value
filter(mtcars, mpg <3)
}
help_function=function(category_p,change_p){
mtcars=return(filter(mtcars, vs==category_p))
data=ifelse(category_p==0,return(func0_value_zero(change_p)),return(func1_value_one(change_p)))
return(data)
}
#i want to filter for the values that meet the parameter passed in and then perform the update on the values
# right now I am not able to both filter for the values and then perform the correct function call.
help_function(0,2)
So help_function(0,20) would return only the rows from mtcars that have vs==0, divide all the mpg values by 20, and create a new column called check with all values equal to 'value_zero'.
In the broader context, my dataset would use a flag to determine many different table join combinations and depending upon the data perform a variety of calculations and adjustments.
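A minimal sketch of one way to structure this, assuming each category has its own helper that receives the already-filtered rows: keep the helpers in a named list and look up the right one by category value. It uses base R only, and the names apply_value_zero, apply_value_one and handlers are hypothetical. The extra mpg filters from the original helpers are left out, since the described result does not mention them.

```r
# Hypothetical per-category helpers: each receives the already-filtered rows
apply_value_zero <- function(data, division_value) {
  data$check <- "value_zero"
  data$mpg <- data$mpg / division_value
  data
}
apply_value_one <- function(data, multiplication_value) {
  data$check <- "value_one"
  data$mpg <- data$mpg * multiplication_value
  data
}

# Dispatch table keyed by the category value (stored as a string)
handlers <- list("0" = apply_value_zero, "1" = apply_value_one)

help_function <- function(category_p, change_p, data = mtcars) {
  rows <- data[data$vs == category_p, , drop = FALSE]   # filter first
  handler <- handlers[[as.character(category_p)]]       # pick the helper
  if (is.null(handler)) stop("No handler for category ", category_p)
  handler(rows, change_p)                               # then update
}

result <- help_function(0, 20)
```

Adding another category then only means writing one more helper and registering it in handlers; the top-level function stays unchanged.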

how to make groups of variables from a data frame in R?

Dear friends, I would appreciate help with a question in R.
I have a data frame with 8 variables, say (v1, v2, ..., v8). I would like to produce groups of datasets based on all possible combinations of these variables. That is, with a set of 8 variables I can produce 2^8-1=255 non-empty subsets of variables, like {v1}, {v2}, ..., {v8}, {v1,v2}, ..., {v1,v2,v3}, ..., {v1,v2,...,v8}.
My goal is to compute a specific statistic for each of these groupings and then compare which subset produces the better statistic. My problem is how to produce these combinations.
Thanks in advance.
You need the function combn. It creates all the combinations of a vector that you provide it. For instance, in your example:
names(yourdataframe) <- c("V1","V2","V3","V4","V5","V6","V7","V8")
varnames <- names(yourdataframe)
combn(x = varnames,m = 3)
This gives you all combinations of V1-V8 taken 3 at a time, one per column of the returned matrix (56 columns in total).
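Since the question asks for subsets of every size, combn can be called once per size and the results collected into one list. A short sketch, with all_subsets as a made-up name:

```r
varnames <- paste0("V", 1:8)

# For each subset size m = 1..8, list every combination of m variable names,
# then flatten the per-size lists into a single list of character vectors
all_subsets <- unlist(
  lapply(seq_along(varnames), function(m) combn(varnames, m, simplify = FALSE)),
  recursive = FALSE
)

length(all_subsets)
```

Each element is a character vector of column names, so yourdataframe[, all_subsets[[i]], drop = FALSE] would select the matching columns.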
Here is another approach. I'll use data.table instead of data.frame, and I'll include an extraneous variable (id) for robustness.
This will get you your subsetted data tables:
library(data.table)
nn<-8L
dt<-setnames(as.data.table(cbind(1:100,matrix(rnorm(100*nn),ncol=nn))),
c("id",paste0("V",1:nn)))
#should be a smarter (read: more easily generalized) way to produce this,
# but it's eluding me for now...
#basically, this generates the indices to include when subsetting
x<-cbind(rep(c(0,1),each=128),
rep(rep(c(0,1),each=64),2),
rep(rep(c(0,1),each=32),4),
rep(rep(c(0,1),each=16),8),
rep(rep(c(0,1),each=8),16),
rep(rep(c(0,1),each=4),32),
rep(rep(c(0,1),each=2),64),
rep(c(0,1),128)) *
t(matrix(rep(1:nn),2^nn,nrow=nn))
#now get the correct column names for each subset
# by subscripting the nonzero elements
incl<-lapply(1:(2^nn),function(y){paste0("V",1:nn)[x[y,][x[y,]!=0]]})
#now subset the data.table for each subset
ans<-lapply(1:(2^nn),function(y){dt[,incl[[y]],with=FALSE]})
You said you wanted some statistics from each subset, in which case it may be more useful to instead specify the last line as:
ans2<-lapply(1:(2^nn),function(y){unlist(dt[,incl[[y]],with=FALSE])})
#exclude the first element, which is the empty subset (NULL)
means<-lapply(2:(2^nn),function(y){mean(ans2[[y]])})
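The inclusion matrix built by hand above could probably be generated more generally with expand.grid, which enumerates every on/off combination for nn variables. A sketch under that assumption follows; the names vars, mask and incl_alt are made up, and the subsets come out in a different order than in the hand-built version (here the first variable toggles fastest).

```r
nn <- 8L
vars <- paste0("V", seq_len(nn))

# One row per subset, one logical column per variable (2^nn rows in total)
mask <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), nn)))

# Column names included in each subset; the first row is the empty subset
incl_alt <- lapply(seq_len(nrow(mask)), function(i) vars[mask[i, ]])

length(incl_alt)
```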

Generating new variable values by subset

I have a data set, and I am trying to create a new variable with random values that are associated with a particular subset.
For example, given the data frame:
data(iris)
iris=iris
I want another variable that associates each value of iris$Species with a random number (between 0 and 1). This can be accomplished in a circuitous fashion by creating a data frame:
df=data.frame(unique(iris$Species),runif(length(unique(iris$Species))))
And merging it with the original data frame:
iris=merge(iris,df,by.x="Species",by.y="unique.iris.Species.")
This accomplishes what I want, but it is inelegant. Furthermore, if I wanted to replicate this process many times over different variables this process would be burdensome. What I would hope for is some quick indexing method that would hopefully look something like:
iris$Species.unif=runif(length(unique(iris$Species)))[iris$Species]
Given that indexing in R is typically very slick, I expect there is some way of doing this that I am not aware of.
Thank you in advance.
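You may want to try using levels:
iris <- iris
iris$species_unif <- iris$Species
levels(iris$species_unif) <- runif(length(levels(iris$Species)))
Note that species_unif is then a factor whose labels are the random numbers; use as.numeric(as.character(iris$species_unif)) if you need numeric values.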
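The indexing idea from the question can also be made explicit: indexing a vector by a factor uses the factor's integer codes, so one draw per level can be spread across the rows directly and the result stays numeric. A minimal sketch, where iris2 and u are made-up names and the seed is fixed only for reproducibility:

```r
set.seed(1)    # assumption: fixed seed only to make the example reproducible
iris2 <- iris  # working copy, so the built-in dataset is left alone

# One random number per species level
u <- runif(nlevels(iris2$Species))

# as.integer() gives each row's level code (1, 2 or 3), used here as an index
iris2$Species.unif <- u[as.integer(iris2$Species)]
```

The same two lines can be repeated for any other factor column.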

computing a subset using a loop

I have a data frame with several variables, and I want to build different subsets of it based on some conditions. I want to use a loop because there will be many subsets and this would save a lot of time.
These are the conditions:
Variable A holds an ID for an area, variable B holds different species (1, 2, 3, etc.), and I want to compute subsets from these columns. The name of every subset should be the ID of a point, and the content should be all individuals of a certain species at this point.
For a better understanding, this would be the code for one subset, and I want to use a loop to produce them all:
A_2_NGF_Abies_alba <- subset(A_2_NGF, subset = Baumart %in% c("Abies alba"))
Is this possible in R?
Thanks
Does this help you?
Baumdaten <- data.frame(pointID=sample(c("A_2_SEF","A_2_LEF","A_3_LEF"), 10, TRUE), Baumart=sample(c("Abies alba", "Betula pendula", "Fagus sylvatica"), 10, TRUE))
split(Baumdaten, Baumdaten[, 1:2])
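To get names close to the ones in the question and to skip point/species pairs that do not occur, the grouping key could be built explicitly with interaction(). This is a sketch on a small made-up data frame (trees, key and subsets are hypothetical names, and the rows are illustrative only):

```r
# Small illustrative data set; values are invented for the example
trees <- data.frame(
  pointID = c("A_2_NGF", "A_2_NGF", "A_3_LEF"),
  Baumart = c("Abies alba", "Fagus sylvatica", "Abies alba")
)

# One key per row: point ID and species joined by "_";
# drop = TRUE removes combinations that never occur in the data
key <- interaction(trees$pointID, trees$Baumart, sep = "_", drop = TRUE)

# Named list with one data frame per point/species combination
subsets <- split(trees, key)

names(subsets)
subsets[["A_2_NGF_Abies alba"]]
```

Keeping the subsets in one named list avoids creating many separate objects, and lapply(subsets, ...) can then be used in place of an explicit loop.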
