R creating conditional random variables in data frame - r

I am fairly new to R and am facing the following problem, for which searching did not turn up an answer.
I have a data frame that has a group assignment in the first column, and I want to create conditional random variables based on the group. E.g. everyone in group A should get a normally distributed random variable with mean 50 and stddev 10. The result of this random variable would then be added as an additional column.
Example:
group_assigned <- c("A","A","B","C","A","C")
dframe <- data.frame(group_assigned)
groups <-c("A","B","C")
group_mean <- c(50,40,30)
group_stddev <- c(10,5,5)
group_properties <- data.frame(groups,group_mean, group_stddev)
Can you guide me to a solution? Thank you for your help!
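One possible approach, sketched under the assumption that every value of group_assigned also appears in group_properties$groups: look up each row's group with match() and hand the resulting vectors of means and standard deviations to rnorm(), which is vectorized over both parameters. The column name random_value and the seed are illustrative and not part of the question.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

group_assigned <- c("A", "A", "B", "C", "A", "C")
dframe <- data.frame(group_assigned)
group_properties <- data.frame(
  groups = c("A", "B", "C"),
  group_mean = c(50, 40, 30),
  group_stddev = c(10, 5, 5)
)

# Row of group_properties that corresponds to each row of dframe
idx <- match(dframe$group_assigned, group_properties$groups)

# rnorm() is vectorized over mean and sd, so each row gets its own parameters
dframe$random_value <- rnorm(
  n = nrow(dframe),
  mean = group_properties$group_mean[idx],
  sd = group_properties$group_stddev[idx]
)
```

If a group were missing from group_properties, idx would contain NA and rnorm() would return NA for those rows, so checking anyNA(idx) first may be worthwhile.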

Related

Ranking in R based on multiple criteria

I am working in R and would like some help on ranking with multiple criteria. Using the mtcars dataset, I want to generate a new column based initially on the rank of mtcars$mpg, with ties decided by the rank of mtcars$qsec. I have mtcars["rank"] = NA and then mtcars$rank=rank(mtcars$mpg), but I am not sure how to deal with the ties. I've tried mtcars$rank=order(mtcars$mpg, mtcars$qsec) but I am not getting the outcome I want: the initial ranking should come from mtcars$mpg and, in the event of ties, be decided by the lower ranking in mtcars$qsec. Thanks.
I would first order it based on mpg and qsec.
mtcars <- mtcars[order(mtcars$mpg, mtcars$qsec), ]
The rank is now simply the row index of the sorted data frame.
mtcars$rank <- 1:nrow(mtcars)
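If the original row order should be kept, a possible alternative is to apply order() twice: the inner call gives the row indices in sorted order, and the outer call converts those indices into each row's position in that ordering. This is a sketch; the copy named ranked is only there to leave mtcars untouched.

```r
ranked <- mtcars  # illustrative copy so the built-in dataset is not modified

# order(order(...)) yields each row's rank by mpg, with ties broken by qsec
ranked$rank <- order(order(ranked$mpg, ranked$qsec))

# Inspect the lowest-ranked rows without reordering the data frame itself
head(ranked[order(ranked$rank), c("mpg", "qsec", "rank")])
```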

how to make groups of variables from a data frame in R?

Dear friends, I would appreciate it if someone could help me with a question in R.
I have a data frame with 8 variables, let's say (v1,v2,...,v8). I would like to produce groups of datasets based on all possible combinations of these variables. That is, with a set of 8 variables I am able to produce 2^8-1=255 subsets of variables like {v1},{v2},...,{v8}, {v1,v2},....,{v1,v2,v3},....,{v1,v2,...,v8}
My goal is to compute a specific statistic for each of these groupings and then compare which subset produces the better statistic. My problem is how to produce these combinations.
Thanks in advance.
You need the function combn. It creates all the combinations of a vector that you provide it. For instance, in your example:
names(yourdataframe) <- c("V1","V2","V3","V4","V5","V6","V7","V8")
varnames <- names(yourdataframe)
combn(x = varnames,m = 3)
This gives you all combinations of V1-V8 taken 3 at a time.
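To cover every subset size rather than only m = 3, one option is to loop over m and collect the results in a single list. This is a sketch that assumes the eight column names V1 to V8 from above; the name all_subsets is illustrative.

```r
varnames <- paste0("V", 1:8)

# For each subset size m, combn() returns the subsets as list elements;
# unlist(..., recursive = FALSE) flattens the per-size lists into one list
all_subsets <- unlist(
  lapply(seq_along(varnames), function(m) combn(varnames, m, simplify = FALSE)),
  recursive = FALSE
)

length(all_subsets)  # 2^8 - 1 = 255 non-empty subsets
```

Each element is a character vector of column names, so yourdataframe[, all_subsets[[i]], drop = FALSE] would give the corresponding subset of columns.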
I'll use data.table instead of data.frame;
I'll include an extraneous variable for robustness.
This will get you your subsetted data frames:
library(data.table)
nn<-8L
dt<-setnames(as.data.table(cbind(1:100,matrix(rnorm(100*nn),ncol=nn))),
c("id",paste0("V",1:nn)))
#should be a smarter (read: more easily generalized) way to produce this,
# but it's eluding me for now...
#basically, this generates the indices to include when subsetting
x<-cbind(rep(c(0,1),each=128),
rep(rep(c(0,1),each=64),2),
rep(rep(c(0,1),each=32),4),
rep(rep(c(0,1),each=16),8),
rep(rep(c(0,1),each=8),16),
rep(rep(c(0,1),each=4),32),
rep(rep(c(0,1),each=2),64),
rep(c(0,1),128)) *
t(matrix(rep(1:nn),2^nn,nrow=nn))
#now get the correct column names for each subset
# by subscripting the nonzero elements
incl<-lapply(1:(2^nn),function(y){paste0("V",1:nn)[x[y,][x[y,]!=0]]})
#now subset the data.table for each subset
ans<-lapply(1:(2^nn),function(y){dt[,incl[[y]],with=FALSE]})
You said you wanted some statistics from each subset, in which case it may be more useful to instead specify the last line as:
ans2<-lapply(1:(2^nn),function(y){unlist(dt[,incl[[y]],with=FALSE])})
#exclude the first element, which is the empty subset
means<-lapply(2:(2^nn),function(y){mean(ans2[[y]])})
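The comment in the code above asks for a more easily generalized way to build the inclusion indices. One candidate, offered as a sketch in base R, is expand.grid() over nn copies of c(FALSE, TRUE). The name grid is illustrative, and the rows come out in a different order than in the hand-built matrix x (the first variable varies fastest here), though the set of subsets is the same.

```r
nn <- 8L

# One logical column per variable; each row is one inclusion pattern
grid <- expand.grid(rep(list(c(FALSE, TRUE)), nn))
names(grid) <- paste0("V", seq_len(nn))

# Column names selected by each pattern (row 1 is the empty subset)
incl <- lapply(seq_len(nrow(grid)), function(i) names(grid)[unlist(grid[i, ])])
```

Because nothing is hard-coded for eight variables, changing nn is enough to handle a different number of columns.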

Multiple comparisons of two proportions prop.test

I have a large number of treatment and control groups I need to provide a comparison of population proportions for. I'm looking for a way to loop through a data.frame providing the test against each of the categories.
Sample data:
test_data <- data.frame(
Category = c("A","A","B","B"),
Churn = c(56,46,83,58),
Other = c(180,555,144,86))
For example, compare category A (56/180 to 46/555) and so forth.
My initial solution:
by(test_data, test_data$Category,
function(x) prop.test(test_data$Churn, test_data$Other))
The problem: The solution outputs by category but provides a 4 sample test instead of a two sample test. I've found lots of solutions that iterate well through rows but not so much by a category. Output as a list is fine for now.
Really appreciate the help on this one!
Your by() function is incorrect. You are not using the x value that is passed in. By using the original variable name (test_data) no data is being subset for each by() call. Try
by(test_data, test_data$Category,
function(x) prop.test(x$Churn, x$Other))
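To keep the results as a plain list and pull out a single number per category, the same idea can be written with split() and lapply(). This is a sketch; like the question's "56/180", it treats Other as the number of trials, and the names tests and p_values are illustrative.

```r
test_data <- data.frame(
  Category = c("A", "A", "B", "B"),
  Churn = c(56, 46, 83, 58),
  Other = c(180, 555, 144, 86)
)

# split() yields one data frame per category; each gets a two-sample test
tests <- lapply(split(test_data, test_data$Category),
                function(x) prop.test(x$Churn, x$Other))

# Extract the p-value from each "htest" object into a named vector
p_values <- sapply(tests, function(res) res$p.value)
```

With many categories, the p-values would normally need a multiple-comparison correction, for example via p.adjust(p_values, method = "holm").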

Generating new variable values by subset

I have a data set, and I am trying to create a new variable with random values that are associated with a particular subset.
For example, given the data frame:
data(iris)
iris=iris
I want another variable that associates each value of iris$Species with a random number (between 0 and 1). This can be accomplished in a circuitous fashion by creating a data frame:
df=data.frame(unique(iris$Species),runif(length(unique(iris$Species))))
And merging it with the original data frame:
iris=merge(iris,df,by.x="Species",by.y="unique.iris.Species.")
This accomplishes what I want, but it is inelegant. Furthermore, if I wanted to replicate this process many times over different variables this process would be burdensome. What I would hope for is some quick indexing method that would hopefully look something like:
iris$Species.unif=runif(length(unique(iris$Species)))[iris$Species]
Given that indexing in R is typically very slick, I expect there is some way of doing this that I am not aware of.
Thank you in advance.
You may want to try using levels:
iris <- iris
iris$species_unif <- iris$Species
levels(iris$species_unif ) <- runif(length(levels(iris$Species)))
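Note that assigning numbers to levels() stores them as character labels, so species_unif above is still a factor. If a numeric column is wanted, a sketch along the lines of the indexing idea in the question is to draw one value per level and index that vector by the factor's integer codes. The names iris2 and u and the seed are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

iris2 <- iris  # illustrative copy of the built-in dataset
u <- runif(nlevels(iris2$Species))  # one random value per species

# as.integer() on a factor returns its level codes, used here as an index
iris2$species_unif <- u[as.integer(iris2$Species)]
```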

computing a subset using a loop

I have a data frame with several variables, and I want to build different subsets of it based on some conditions. I want to use a loop because there will be a lot of subsets and this would save a lot of time.
These are the conditions:
Variable A holds an ID for an area and variable B holds different species (1,2,3, etc.), and I want to compute different subsets from these columns. The name of every subset should be the ID of a point, and the content should be all individuals of a certain species at this point.
For a better understanding:
This would be the code for one subset, and I want to use a loop:
A_2_NGF_Abies_alba <- subset(A_2_NGF, subset = Baumart %in% c("Abies alba"))
Is this possible in R?
Thanks
Does this help you?
Baumdaten <- data.frame(pointID=sample(c("A_2_SEF","A_2_LEF","A_3_LEF"), 10, TRUE), Baumart=sample(c("Abies alba", "Betula pendula", "Fagus sylvatica"), 10, TRUE))
split(Baumdaten, Baumdaten[, 1:2])
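As a sketch of how the result might be used: split() returns a named list, so each subset can be looked up by name instead of being stored in its own variable. Passing drop = TRUE leaves out point/species combinations that have no rows. The seed and the name subsets are illustrative.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

Baumdaten <- data.frame(
  pointID = sample(c("A_2_SEF", "A_2_LEF", "A_3_LEF"), 10, TRUE),
  Baumart = sample(c("Abies alba", "Betula pendula", "Fagus sylvatica"), 10, TRUE)
)

# One data frame per combination of point and species that actually occurs
subsets <- split(Baumdaten, Baumdaten[, c("pointID", "Baumart")], drop = TRUE)

names(subsets)  # names have the form "<pointID>.<Baumart>"
```

A single subset is then available as, for example, subsets[["A_2_SEF.Abies alba"]], provided that combination occurs in the data.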
