I'm trying to iterate through columns in an R data.frame.
To do so, I'm hoping to write a for loop which loops over the column names and then filters the data.table accordingly with values.
My issue is that given the syntax:
df[which(df$XX == y), ]
XX needs to actually be a column name versus a variable that is a string equivalent to the column name.
Is there a way to loop over the columns via inputting a variable?
Many thanks!
I am trying to a comparison between many different items using a Spearman test(from the package pspearman). I would like to have a way to automate the switching in of variables so that rather than running it one of at a time and running it would be able to just switch in each one and run all at once.
I tried to pass the list of the vectors that I would like to compare it to.
spearman.test(access_sam2$Area,access_sam2$B)
All the columns are in the dataframe access_sam2. In the y, position there is a list of columns that I need to run:
"CD8_PD1, CD8_PDL1, CD8_GBNEG_FOXP3, CD8_GBNEG_FOXP3_CD45RO, CD8_GBNEG_FOXP3NEG_CD45RO, CD8NEG_PD1, CD8NEG_PDL1, CD8NEG_FOXP3, CD8NEG_FOXP3_CD45RO,CD68_PDL1, CK_PDL1."
The problem is that it is not possible to use indexes because they are not sequential columns, and has 660+ columns.
I could write 7 spearman tests but changing all 7 for each Area variable seems inefficent
First set yvars to be a character variable which names the columns you want or is a numeric variable that gives their column numbers. We have shown the first few elements of it below. Then we define a function which takes a variable name and outputs the spearman test. Finally use Map to apply that function to each component of yvars.
yvars <- c("CD8_PD1", "CD8_PDL1", "CD8_GBNEG_FOXP3")
sptest <- function(yvar) spearman.test(access_sam2$Area, access_sam2[[yvar]])
Map(sptest, yvars)
Reproducible example
Below is a reproducible example using the mtcars data frame that comnes with R.
library(pspearman)
yvars <- c("cyl", "disp", "hp")
sptest <- function(yvar) spearman.test(mtcars$mpg, mtcars[[yvar]])
Map(sptest, yvars)
I am having a bit of trouble with trying to script a code in R so that it separates a data frame based on the character in a data frame column without manually specifying a subset command. Below is the script for reproduction in R:
a=c("Model_A","R1",358723.0,171704.0,1.0,36.818500,4.0222700,1.38895000)
b=c("Model_A","R2",358723.0,171704.0,2.6,36.447300,4.0116100,1.37479000)
c=c("Model_A","R3",358723.0,171704.0,5.0,35.615400,3.8092600,1.34301000)
d=c("Model_B","R1",358723.0,171704.0,1.0,39.818300,2.4475600,1.50384000)
e=c("Model_B","R2",358723.0,171704.0,2.6,39.391600,2.4209900,1.48754000)
f=c("Model_B","R3",358723.0,171704.0,5.0,38.442700,2.3618400,1.45126000)
g=c("Model_C","R1",358723.0,171704.0,1.0,31.246400,2.2388000,1.30652000)
h=c("Model_C","R2",358723.0,171704.0,2.6,30.911600,2.2144800,1.29234000)
i=c("Model_C","R3",358723.0,171704.0,5.0,30.166700,2.1603000,1.26077000)
df=data.frame(a,b,c,d,e,f,g,h,i)
df=t(df)
df=data.frame(df)
col_list=list("Model","Receptor.name","X(m.)","Y(m.)","Z(m.)",
"nox","PM10","PM2.5")
colnames(df)=col_list
Essentially what I am trying is to separate the data frame (df) by the Model names ("Model_A", "Model_B", and "Model_C") and store them in new and different data frames. I have been trying to use the following command
df_test=split(df,with(df,interaction(Model,Model)), drop = TRUE)
This command separates the data frame but stores them in lists, and I don't know how to extract the lists individually and store them as data frames. Is there a simpler solution (avoiding the subset command if possible as I need the script to be dynamic and relative) or does anyone know how to use the last command shown above to separate the lists into individual data frames? Also if possible, is it possible to name the data frame after the model?
I apologize if these are a lot of questions but any help would be hugely appreciated! Thank you!
list2env(split(df, df$Model), envir = .GlobalEnv) will give you three dataframes in your global environment, named after the models, containing the relevant rows.
> Model_A
Model Receptor.name X(m.) Y(m.) Z(m.) nox PM10 PM2.5
a Model_A R1 358723 171704 1 36.8185 4.02227 1.38895
b Model_A R2 358723 171704 2.6 36.4473 4.01161 1.37479
c Model_A R3 358723 171704 5 35.6154 3.80926 1.34301
Although I would just keep the list of three dataframes by only using dflist <- split(df, df$Model).
Why a list? Lists allow you the use of lapply - a looping function that applies an operation over every list element. A quick example: Let's say you'd want to get a frequency table for both PM variables in your data for all three datasets.
For single elements in your global environment this would be
table(Model_A$PM10)
table(Model_A$PM2.5)
...
table(Model_C$PM2.5)
With a list, it would be
lapply(dflist, function(x) table(x["PM10"]))
lapply(dflist, function(x) table(x["PM2.5"]))
Right now, it seems to only save some lines of code, but better yet, the output of lapply is again a list, which you can store in an object and further use for different operations. Due to this, you can have a global environment with only a few objects in it, each being lists which contain certain similar objects, like dataframes, tables, summaries or even plots.
I'm trying to get the correlation coefficient for corresponding columns of two csv files. I simply use the followings but get errors. consider each csv file has 50 columns
first values <- read.csv("")
second values <- read.csv("")
correlation.csv <- cor(x= first values , y=second values, method="spearman)
But i get x' must be numeric error!
subset of one csv file
Thanks for your help
The read.table function and all of it's derivatives return a data.frame which is an R list object. The mapply function processes lists in "parallel". If the matching columns are in the same order in the two datasets and have the same number of rows and do not have spaces in their names, it would be as simple as:
mapply(cor, first_values , second_values)
If it's more complicated tahn that, then you need to fill in the missing details with example data by editing the question (not by responding in comments.)
There must be some categorical variable in X.So you can first separate that categorical variable from X and then use X in cor() function.
I have a df with over 30 columns and over 200 rows, but for simplicity will use an example with 8 columns.
X1<-c(sample(100,25))
B<-c(sample(4,25,replace=TRUE))
C<-c(sample(2,25,replace =TRUE))
Y1<-c(sample(100,25))
Y2<-c(sample(100,25))
Y3<-c(sample(100,25))
Y4<-c(sample(100,25))
Y5<-c(sample(100,25))
df<-cbind(X1,B,C,Y1,Y2,Y3,Y4,Y5)
df<-as.data.frame(df)
I wrote a function that melts the data generates a plot with X1 giving the x-axis values and faceted using the values in B and C.
plotdata<-function(l){
melt<-melt(df,id.vars=c("X1","B","C"),measure.vars=l)
plot<-ggplot(melt,aes(x=X1,y=value))+geom_point()
plot2<-plot+facet_grid(B ~ C)
ggsave(filename=paste("X_vs_",l,"_faceted.jpeg",sep=""),plot=plot2)
}
I can then manually input the required Y variable
plotdata("Y1")
I don't want to generate plots for all columns. I could just type the column of interest into plotdata and then get the result, but this seems quite inelegant (and time consuming). I would prefer to be able to manually specify the columns of interest e.g. "Y1","Y3","Y4" and then write a loop function to do all those specified.
However I am new to writing for loops and can't find a way to loop in the specific column names that are required for my function to work. A standard for(i in 1:length(df)) wouldn't be appropriate because I only want to loop the user specified columns
Apologies if there is an answer to this is already in stackoverflow. I couldn't find it if there was.
Thanks to Roland for providing the following answer:
Try
for (x in c("Y1","Y3","Y4")) {plotdata(x)}
The index variable doesn't have to be numeric