Using a variable to point to a vector - r

I am trying to run comparisons between many different items using a Spearman test (from the package pspearman). I would like a way to automate swapping in the variables, so that rather than running the test for one variable at a time, I can switch in each variable and run them all at once.
I tried to pass the list of the vectors that I would like to compare it to.
spearman.test(access_sam2$Area,access_sam2$B)
All the columns are in the data frame access_sam2. In the y position, there is a list of columns that I need to run:
"CD8_PD1, CD8_PDL1, CD8_GBNEG_FOXP3, CD8_GBNEG_FOXP3_CD45RO, CD8_GBNEG_FOXP3NEG_CD45RO, CD8NEG_PD1, CD8NEG_PDL1, CD8NEG_FOXP3, CD8NEG_FOXP3_CD45RO,CD68_PDL1, CK_PDL1."
The problem is that I cannot use a range of indexes, because the columns are not sequential and the data frame has 660+ columns.
I could write 7 Spearman tests, but changing all 7 for each Area variable seems inefficient.

First set yvars to be a character vector that names the columns you want, or a numeric vector that gives their column numbers. Only the first few elements are shown below. Then define a function that takes a variable name and returns the Spearman test. Finally, use Map to apply that function to each element of yvars.
yvars <- c("CD8_PD1", "CD8_PDL1", "CD8_GBNEG_FOXP3")
sptest <- function(yvar) spearman.test(access_sam2$Area, access_sam2[[yvar]])
Map(sptest, yvars)
Reproducible example
Below is a reproducible example using the mtcars data frame that comes with R.
library(pspearman)
yvars <- c("cyl", "disp", "hp")
sptest <- function(yvar) spearman.test(mtcars$mpg, mtcars[[yvar]])
Map(sptest, yvars)
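Map returns one test object per column, which can be awkward to scan when there are many columns. As a rough sketch of how the results might be collected into a single table, the example below uses base R's cor.test with method = "spearman" as a stand-in for pspearman's spearman.test (an assumption made so that the example runs without extra packages). The names results and summary_table are hypothetical.

```r
# Stand-in: stats::cor.test(method = "spearman") replaces pspearman::spearman.test
yvars <- c("cyl", "disp", "hp")

results <- lapply(yvars, function(yvar) {
  # exact = FALSE avoids the warning about exact p-values with tied ranks
  ct <- cor.test(mtcars$mpg, mtcars[[yvar]], method = "spearman", exact = FALSE)
  data.frame(yvar = yvar,
             rho = unname(ct$estimate),
             p.value = ct$p.value,
             stringsAsFactors = FALSE)
})

# Stack the one-row data frames into a single summary table
summary_table <- do.call(rbind, results)
print(summary_table)
```

Each row corresponds to one y variable, so adding a column to the analysis only requires adding its name to yvars.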

Related

Convert range of column titles to variables for vars() function

I have a data frame with 100+ variables listed in columns, and each subject in rows. I'd like to loop through each column to perform an ANOVA, and while the loop function works fine the step I am stuck on is listing which columns to loop through. Currently I can set these by manually typing/pasting each variable name but this is obviously not practical.
The loop runs through my list of vars, which I currently build by typing the column names manually:
variables <- vars(height, width, strength)
Which only loops for those selected 3 out of 100+ variables that I have had to manually type in.
I had thought I could list the range of column names for dataframe df between columns 3 to 100 within the vars expression as below...
variables <- vars(colnames(df[3:100]))
This just provides one variable of the name colnames(df[3:100]).
Any ideas to avoid typing or manually inserting commas/removing quotation marks from 100+ different variable names? Thanks in advance.
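Consider do.call, which calls a function with its arguments supplied as a list. Note that the second argument must be a list, so the character vector of names is wrapped in as.list. Specifically, the following:
variables <- do.call(vars, as.list(colnames(df)[3:100]))
is equivalent to the expanded version: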
variables <- vars(colnames(df)[3], colnames(df)[4], ..., colnames(df)[100])
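To make the do.call mechanics concrete, here is a small base R sketch that uses paste in place of vars, so that it runs without dplyr (that substitution, the mtcars columns, and the names cols, joined and failed are assumptions for illustration). It shows that each list element becomes one argument, and that do.call rejects a second argument that is not a list.

```r
cols <- colnames(mtcars)[3:5]   # "disp" "hp" "drat"

# Each list element becomes one positional argument; sep is passed by name
joined <- do.call(paste, c(as.list(cols), sep = "+"))
print(joined)

# Passing the bare character vector fails, because do.call() requires a list
failed <- inherits(try(do.call(paste, cols), silent = TRUE), "try-error")
print(failed)
```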

R - Problems creating dataframe from sapply results that combine lists

I'm trying to create a dataframe from the results of sapply.
The variable fits is a collection of nls objects, all with the same names for coefficients. I am able to create a dataframe where the vectors are the coefficients and the rows are the individual fits using this:
Coefficients <- sapply(fits,function(x){coef(x)},simplify = TRUE,USE.NAMES = TRUE)
Coefficients_df <- data.frame(t(Coefficients))
This works fine and when I do summary(Coefficients_df), I get what I expect, the normal summary of a dataframe with 3 columns.
But what I want is to have a dataframe that includes the coefficients and the average error for each fit. I can get a matrix using this:
Coef_and_Error <- sapply(fits,function(x){c(coef(x),list(error = summary(x)$sigma))},simplify = TRUE,USE.NAMES = TRUE)
But then I do not get a proper dataframe using this:
Coef_and_Error_df <- data.frame(t(Coef_and_Error))
RStudio reports this as a dataframe (4105 obs. of 4 variables), but when I try to see a summary of the dataframe, it spews out a long list of "1 -none- numeric", flooding the console.
Also, when I click the triangle next to "Coef_and_Error_df" in RStudio, it does not show the sort of summary I'm used to. Instead of a simple summary with one row per column (e.g. k num 233 189 391 ...), it says "k: list of 4105".
I have been able to get the output I want by creating two separate matrices, converting both to dataframes and adding one as a new vector to the other, but that shouldn't be necessary.
I tried using "append" instead of "c" to combine the values, but no luck.
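One likely cause is that c(coef(x), list(...)) returns a list, so sapply builds a matrix whose cells are list elements rather than numbers. A minimal sketch of one possible fix is to return a plain named numeric vector instead. Here lm fits on subsets of mtcars stand in for the nls objects (an assumption made so that the example runs as-is); coef and summary(x)$sigma are used the same way as in the question.

```r
# Stand-in fits: one lm per cylinder group (hypothetical replacement for the nls fits)
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Return a named numeric vector, not a list, so sapply yields a numeric matrix
coef_and_error <- sapply(fits, function(x) c(coef(x), error = summary(x)$sigma))

# Transpose so that each fit is a row and each quantity is a column
coef_and_error_df <- data.frame(t(coef_and_error))
str(coef_and_error_df)
```

With numeric columns, summary(coef_and_error_df) should give the usual per-column summary.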

correlation of several columns need to be calculated

I'm trying to get the correlation coefficient for corresponding columns of two csv files. I simply use the following but get errors. Each csv file has 50 columns.
first_values <- read.csv("")
second_values <- read.csv("")
correlation.csv <- cor(x = first_values, y = second_values, method = "spearman")
But I get the error: 'x' must be numeric
subset of one csv file
Thanks for your help
The read.table function and all of its derivatives return a data.frame, which is an R list object. The mapply function processes lists in "parallel". If the matching columns are in the same order in the two datasets, have the same number of rows, and do not have spaces in their names, it would be as simple as:
mapply(cor, first_values , second_values)
If it's more complicated than that, then you need to fill in the missing details with example data by editing the question (not by responding in comments).
There must be some categorical variable in x. You can first separate that categorical variable from x and then pass the remaining numeric columns to cor().
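The two suggestions above can be combined in a short sketch: keep only the columns that are numeric in both data frames, then correlate them pairwise with mapply, passing method = "spearman" through MoreArgs. The simulated data frames and the names numeric_cols and rho are assumptions for illustration, and the sketch assumes both files have the same columns in the same order.

```r
set.seed(1)
# Simulated stand-ins for the two csv files, each with one non-numeric column
first_values  <- data.frame(a = rnorm(20), b = rnorm(20),
                            label = sample(letters, 20), stringsAsFactors = FALSE)
second_values <- data.frame(a = rnorm(20), b = rnorm(20),
                            label = sample(letters, 20), stringsAsFactors = FALSE)

# Columns that are numeric in both data frames
is_num <- sapply(first_values, is.numeric) & sapply(second_values, is.numeric)
numeric_cols <- names(first_values)[is_num]

# Pairwise Spearman correlation of matching columns
rho <- mapply(cor,
              first_values[numeric_cols],
              second_values[numeric_cols],
              MoreArgs = list(method = "spearman"))
print(rho)
```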

how to make groups of variables from a data frame in R?

Dear friends, I would appreciate it if someone could help me with a question in R.
I have a data frame with 8 variables, let's say (v1,v2,...,v8). I would like to produce groups of datasets based on all possible combinations of these variables. That is, with a set of 8 variables I am able to produce 2^8-1=255 subsets of variables like {v1},{v2},...,{v8}, {v1,v2},....,{v1,v2,v3},....,{v1,v2,...,v8}
My goal is to produce a specific statistic based on these groupings and then compare which subset produces a better statistic. My problem is how to produce these combinations.
Thanks in advance.
You need the function combn. It creates all the combinations of a vector that you provide it. For instance, in your example:
names(yourdataframe) <- c("V1","V2","V3","V4","V5","V6","V7","V8")
varnames <- names(yourdataframe)
combn(x = varnames,m = 3)
This gives you all combinations of V1-V8 taken 3 at a time, as a matrix with one combination per column.
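To cover every subset size rather than only size 3, one option is to call combn for each m from 1 to 8 and pool the results. The sketch below is one way to do that; the name subsets is hypothetical, and simplify = FALSE is used so that each combination comes back as its own character vector.

```r
varnames <- paste0("V", 1:8)

# For each subset size m, list all combinations, then flatten one level
subsets <- unlist(
  lapply(seq_along(varnames), function(m) combn(varnames, m, simplify = FALSE)),
  recursive = FALSE
)

length(subsets)   # 2^8 - 1 = 255 non-empty subsets
subsets[[9]]      # the first two-variable subset
```

Each element can then be used to index the data frame, for example yourdataframe[subsets[[9]]].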
I'll use data.table instead of data.frame;
I'll include an extraneous variable for robustness.
This will get you your subsetted data frames:
library(data.table)
nn<-8L
dt<-setnames(as.data.table(cbind(1:100,matrix(rnorm(100*nn),ncol=nn))),
             c("id",paste0("V",1:nn)))
#should be a smarter (read: more easily generalized) way to produce this,
# but it's eluding me for now...
#basically, this generates the indices to include when subsetting
x<-cbind(rep(c(0,1),each=128),
         rep(rep(c(0,1),each=64),2),
         rep(rep(c(0,1),each=32),4),
         rep(rep(c(0,1),each=16),8),
         rep(rep(c(0,1),each=8),16),
         rep(rep(c(0,1),each=4),32),
         rep(rep(c(0,1),each=2),64),
         rep(c(0,1),128)) *
  t(matrix(rep(1:nn),2^nn,nrow=nn))
#now get the correct column names for each subset
# by subscripting the nonzero elements
incl<-lapply(1:(2^nn),function(y){paste0("V",1:nn)[x[y,][x[y,]!=0]]})
#now subset the data.table for each subset
ans<-lapply(1:(2^nn),function(y){dt[,incl[[y]],with=FALSE]})
You said you wanted some statistics from each subset, in which case it may be more useful to instead specify the last line as:
ans2<-lapply(1:(2^nn),function(y){unlist(dt[,incl[[y]],with=FALSE])})
#skip the first element, which corresponds to the empty subset
means<-lapply(2:(2^nn),function(y){mean(ans2[[y]])})

Specifying names of columns to be used in a loop R

I have a df with over 30 columns and over 200 rows, but for simplicity will use an example with 8 columns.
X1<-c(sample(100,25))
B<-c(sample(4,25,replace=TRUE))
C<-c(sample(2,25,replace =TRUE))
Y1<-c(sample(100,25))
Y2<-c(sample(100,25))
Y3<-c(sample(100,25))
Y4<-c(sample(100,25))
Y5<-c(sample(100,25))
df<-cbind(X1,B,C,Y1,Y2,Y3,Y4,Y5)
df<-as.data.frame(df)
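I wrote a function that melts the data and generates a plot, with X1 giving the x-axis values and facets defined by the values in B and C.
# assumes melt() comes from reshape2 and the plotting functions from ggplot2
library(reshape2)
library(ggplot2)
plotdata<-function(l){
  # reshape to long format, keeping only the requested measure column
  melted<-melt(df,id.vars=c("X1","B","C"),measure.vars=l)
  plot<-ggplot(melted,aes(x=X1,y=value))+geom_point()
  plot2<-plot+facet_grid(B ~ C)
  ggsave(filename=paste("X_vs_",l,"_faceted.jpeg",sep=""),plot=plot2)
}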
I can then manually input the required Y variable
plotdata("Y1")
I don't want to generate plots for all columns. I could just type the column of interest into plotdata and then get the result, but this seems quite inelegant (and time consuming). I would prefer to be able to manually specify the columns of interest e.g. "Y1","Y3","Y4" and then write a loop function to do all those specified.
However, I am new to writing for loops and can't find a way to loop over the specific column names that my function requires. A standard for(i in 1:length(df)) wouldn't be appropriate because I only want to loop over the user-specified columns.
Apologies if there is already an answer to this on Stack Overflow. I couldn't find it if there was.
Thanks to Roland for providing the following answer:
Try
for (x in c("Y1","Y3","Y4")) {plotdata(x)}
The index variable doesn't have to be numeric; a for loop can iterate directly over a character vector of column names.
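As a small runnable sketch of the same idea, the example below stores the chosen names in a vector and loops over it. Here describe_column is a hypothetical stand-in for plotdata, so that the example runs without ggplot2, and the data frame is simulated. The lapply variant is an alternative that also keeps the return values.

```r
set.seed(42)
df <- data.frame(X1 = sample(100, 25), Y1 = sample(100, 25),
                 Y3 = sample(100, 25), Y4 = sample(100, 25))

# Hypothetical stand-in for plotdata(): takes a column name as a string
describe_column <- function(col) sprintf("%s: mean = %.2f", col, mean(df[[col]]))

wanted <- c("Y1", "Y3", "Y4")

# Loop directly over the names
for (col in wanted) message(describe_column(col))

# Same idea with lapply, keeping one named result per column
results <- lapply(setNames(wanted, wanted), describe_column)
```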
