Creating multiple scatter plots with a for loop in R

I am looking for a way to use a for loop to go through multiple columns of a CSV file and plot them.
Here is an example of how I have been making a scatter plot:
ggplot(top_scorers, aes(x=`Win%`,y=PER)) + geom_point()
top_scorers is the name of the data read from the CSV, with Win% and PER being columns in the file (Win% needs backticks because it is not a syntactic R name). I was hoping for a way to keep the x value the same while looping through different columns for the y value. If this is confusing, please let me know and I will try to clear up any issues. Thanks

You can use an lapply-based solution over the columns of your data frame. Since you want to keep the first column as the x dimension, use aes_string in ggplot. Using mtcars as an example:
library(ggplot2)
lapply(colnames(mtcars)[2:length(colnames(mtcars))], function(nm){
  # x is always the first column; y is the column named by nm
  ggplot(mtcars) +
    geom_point(aes_string(x = colnames(mtcars)[1],
                          y = nm))
})
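In ggplot2 3.0.0 and later, aes_string() is soft-deprecated, so the same idea can be sketched with the .data pronoun instead. This is only a minimal sketch on mtcars, as in the answer above; the names x_col, y_cols, and plots are illustrative and not from the original.

```r
library(ggplot2)

x_col  <- colnames(mtcars)[1]    # keep the first column ("mpg") on the x axis
y_cols <- colnames(mtcars)[-1]   # every remaining column becomes a y variable

# Build one scatter plot per y column; .data[[nm]] looks the column up by its string name
plots <- lapply(y_cols, function(nm) {
  ggplot(mtcars, aes(x = .data[[x_col]], y = .data[[nm]])) +
    geom_point() +
    labs(x = x_col, y = nm)
})
names(plots) <- y_cols

# Inside a for loop, ggplot objects are only drawn when printed explicitly
for (nm in y_cols) print(plots[[nm]])
```

Storing the plots in a named list makes it easy to pull out a single one later, for example plots[["hp"]].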

Related

How can I make a histogram for more than one column of a data frame?

I have a data frame df and want to make a histogram with ggplot2's geom_histogram, merging the data of two columns, 1 and 2.
So I tried:
v<-c(df$column1,df$column2)
myplot = ggplot(v)
myplot+geom_histogram()
I get an error:
ggplot2 doesn't know how to deal with data of class numeric
Is there another way to merge columns?
My only problem is that I have yearly data and I just want to compare it without considering years. Put differently, I want to pool it all together.
v<-c(df$column1,df$column2)
library(ggplot2)
ggplot()+aes(v)+geom_histogram(binwidth = (0.01))+xlim(c(-0.1,0.1))+labs(x="Jahresuberschuss",y="count")
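The error in the question arises because ggplot() expects a data frame rather than a bare numeric vector. One possible alternative is to wrap the pooled values in a one-column data frame first. The sketch below uses made-up stand-in data for df; the names pooled and value are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's df: two columns of yearly values
set.seed(1)
df <- data.frame(column1 = rnorm(50, 0, 0.03),
                 column2 = rnorm(50, 0, 0.03))

# Pool both columns into one long column, ignoring which year each value came from
pooled <- data.frame(value = c(df$column1, df$column2))

# ggplot() now receives a data frame, so the usual aes() mapping works
p <- ggplot(pooled, aes(x = value)) +
  geom_histogram(binwidth = 0.01) +
  labs(x = "value", y = "count")
print(p)
```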

Specifying names of columns to be used in a loop R

I have a df with over 30 columns and over 200 rows, but for simplicity will use an example with 8 columns.
X1<-c(sample(100,25))
B<-c(sample(4,25,replace=TRUE))
C<-c(sample(2,25,replace =TRUE))
Y1<-c(sample(100,25))
Y2<-c(sample(100,25))
Y3<-c(sample(100,25))
Y4<-c(sample(100,25))
Y5<-c(sample(100,25))
df<-cbind(X1,B,C,Y1,Y2,Y3,Y4,Y5)
df<-as.data.frame(df)
I wrote a function that melts the data and generates a plot with X1 giving the x-axis values, faceted by the values in B and C.
library(reshape2)  # provides melt(); the older reshape package also has it
library(ggplot2)
plotdata<-function(l){
  # long format: one row per (X1, B, C, value) for the chosen column l
  long<-melt(df,id.vars=c("X1","B","C"),measure.vars=l)
  p<-ggplot(long,aes(x=X1,y=value))+geom_point()+facet_grid(B ~ C)
  ggsave(filename=paste("X_vs_",l,"_faceted.jpeg",sep=""),plot=p)
}
I can then manually input the required Y variable
plotdata("Y1")
I don't want to generate plots for all columns. I could just type the column of interest into plotdata and then get the result, but this seems quite inelegant (and time consuming). I would prefer to be able to manually specify the columns of interest e.g. "Y1","Y3","Y4" and then write a loop function to do all those specified.
However I am new to writing for loops and can't find a way to loop in the specific column names that are required for my function to work. A standard for(i in 1:length(df)) wouldn't be appropriate because I only want to loop the user specified columns
Apologies if this has already been answered on Stack Overflow. I couldn't find it if so.
Thanks to Roland for providing the following answer:
Try
for (x in c("Y1","Y3","Y4")) {plotdata(x)}
The index variable doesn't have to be numeric; a for loop can iterate directly over a character vector.
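To see the loop mechanics without writing any files, here is a small base R sketch. The function plotdata_stub is a hypothetical stand-in that only returns the file name plotdata would save to, and wanted and saved are illustrative names.

```r
# Rebuild a small example data frame with the same kind of columns as above
set.seed(42)
df <- data.frame(X1 = sample(100, 25),
                 B  = sample(4, 25, replace = TRUE),
                 C  = sample(2, 25, replace = TRUE),
                 Y1 = sample(100, 25),
                 Y3 = sample(100, 25),
                 Y4 = sample(100, 25))

# Hypothetical stand-in for plotdata(): returns the file name only
# (assumption: the real function would also build and save the plot)
plotdata_stub <- function(l) paste("X_vs_", l, "_faceted.jpeg", sep = "")

wanted <- c("Y1", "Y3", "Y4")            # user-specified columns
stopifnot(all(wanted %in% names(df)))    # fail early on a misspelled name

# The loop variable takes each string in turn
saved <- character(0)
for (col in wanted) {
  saved <- c(saved, plotdata_stub(col))
}
print(saved)
```

Checking the requested names against names(df) before looping catches typos before any plot is attempted.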

Data organisation in R: vectors of differing lengths as a single object

I'm trying to plot multiple overlaying density plots for two vectors on the same figure. As far as I know, I'm not able to do so unless they are in the same object.
In order to plot the data, I need to have a data.frame() with two columns; one for the value, and one to specify which vector each value belongs to.
My first vector contains 400 values. The second contains 1200. My current (somewhat inelegant) solution involves concatenating the two vectors into one column of a new data.frame, and adding a second column which contains 400 'a's and 1200 'b's, to indicate which vector the original data came from. This only works because I know how many values there were in each original vector.
Surely there must be a more efficient way to do this?
Let's say my original data are from dframe1$vector and dframe2$vector. I'm looking to create a new object called dframe3 which contains the columns $value and $original_vector_number. How do I do this?
You're trying to solve a problem you don't need to solve. You don't need to have them in the same object to plot their densities. Just use lines.
x <- rnorm(400,0,1)
y <- rnorm(1200,2,2)
plot(density(x))
lines(density(y))
Use library(reshape) and melt if you don't want to do this by hand:
library(reshape)
dframe <- data.frame(a = rnorm(400,1,1),b = rnorm(1200,1.2,2))
df.m <- melt(dframe)
library(ggplot2)
ggplot(df.m,aes(x = value,color = variable)) + geom_density()
Note that this will not give a strictly correct answer: data.frame() recycles the shorter vector (400 values) to match the longer one (1200 rows), so every value of a appears three times. The correct way to do this and plot in ggplot is the following:
By hand:
vecA <- data.frame(rnorm(400,1,1),'a')
vecB <- data.frame(rnorm(1200,1.2,2),'b')
names(vecA) <- c('value','name')
names(vecB) <- c('value','name')
dtf <- rbind(vecA,vecB)
library(ggplot2)
ggplot(dtf,aes(x=value,color=name))+geom_density()
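If typing the labels by hand feels fragile, base R's stack() can build the same two-column layout from a named list of vectors of any lengths. This is a minimal sketch; the object name long and the renamed columns are assumptions chosen to match the by-hand example.

```r
set.seed(1)
x <- rnorm(400, 1, 1)
y <- rnorm(1200, 1.2, 2)

# stack() turns a named list of vectors into a two-column data frame:
# "values" holds the data, "ind" holds the name of the source vector
long <- stack(list(a = x, b = y))
names(long) <- c("value", "name")

# Counts per source vector, without typing any lengths by hand
print(table(long$name))
```

The resulting data frame can be passed to ggplot in the same way as dtf above.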

Selecting matching row values from a column (data frame) to create plots using a loop in R

I have a set of data that looks like this,
species<-"ABC"
ind<-rep(1:4,each=24)
hour<-rep(seq(0,23,by=1),4)
depth<-runif(length(ind),1,50)
df<-data.frame(species,ind,hour,depth)  # no cbind(): it would coerce every column to character
In the real data, the column "ind" has more levels, and they do not always have the same length (here there are 4 individuals with 24 rows each, but in reality some individuals have thousands of rows of data, while others have only a few lines).
What I would like to do is to have an outer loop or function that will select all the rows from each individual ("ind") and generate a boxplot using the depth/hour columns.
This is the idea that I have in mind,
for (i in 1:length(unique(df$ind))){
data<-df[df$ind==df$ind[i],]
individual[i]<-data
plot.boxplot<-function(data){
boxplot(depth~hour,dat=data,xlab="Hour of day",ylab="Depth (m)")
}
}
par(mfrow=c(2,2),mar=c(5,4,3,1))
plot.boxplot(individual)
I realized that this loop might be inappropriate, but I am still learning. I can do the boxplot for each individual at a time, but I would like a faster, more efficient way of selecting the data for each individual and creating or storing boxplot results. This will be very useful for when I have many more individuals (instead of doing one at a time...). Thanks a lot in advance.
What about something like this?
par(mfrow=c(2,2))
invisible(
  by(df,df$ind,
     function(x)
       boxplot(depth~hour,data=x,xlab="Hour of day",ylab="Depth (m)")
  )
)
To provide some explanation, this runs a boxplot for each group of cases in df defined by df$ind. The invisible wrapper just makes it so that the bunch of output used for the boxplot is not written to the console.
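For comparison, the same grouping can be sketched with split() and an explicit for loop, which also makes it easy to title each panel. This is a hedged alternative, not the answerer's method; per_ind and id are illustrative names, and the data are regenerated here so the block runs on its own.

```r
set.seed(1)
df <- data.frame(species = "ABC",
                 ind     = rep(1:4, each = 24),
                 hour    = rep(0:23, 4))
df$depth <- runif(nrow(df), 1, 50)

# split() returns a named list with one data frame per individual
per_ind <- split(df, df$ind)

par(mfrow = c(2, 2), mar = c(5, 4, 3, 1))
for (id in names(per_ind)) {
  # One boxplot panel per individual, labelled with its id
  boxplot(depth ~ hour, data = per_ind[[id]],
          xlab = "Hour of day", ylab = "Depth (m)",
          main = paste("Individual", id))
}
```

Because split() works off the values in df$ind, groups of unequal size need no special handling.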

Looping over ggplot2 with columns

I am attempting to loop over both data frames and columns to produce multiple plots. I have a list of data frames, and for each, I want to plot the response against one of several predictors.
For example, I can easily loop across data frames:
df1=data.frame(response=rpois(10,1),value1=rpois(10,1),value2=rpois(10,1))
df2=data.frame(response=rpois(10,1),value1=rpois(10,1),value2=rpois(10,1))
#Looping across data frames
lapply(list(df1,df2), function(i) ggplot(i,aes(y=response,x=value1))+geom_point())
But I am having trouble looping across columns within a data frame:
lapply(list("value1","value2"), function(i) ggplot(df1,aes_string(x=i,y=response))+geom_point())
I suspect it has something to do with the way I am treating the aesthetics.
Ultimately I want to string together lapply to generate all combinations of data frames and columns.
Any help is appreciated!
EDIT: Joran has it! When using aes_string, any column name that is not supplied by the loop variable must be put in quotes:
lapply(list("value1","value2"), function(i) ggplot(df1,aes_string(x=i,y="response"))+geom_point())
For reference, here is stringing the lapply functions to generate all combinations:
lapply(list(df1,df2), function(x)
lapply(list("value1","value2"), function(i) ggplot(x,aes_string(x=i,y="response"))+geom_point() ) )
Inside aes_string(), all variables need to be given as character strings. Add quotes around "response".
lapply(list("value1","value2"),
function(i) ggplot(df1, aes_string(x=i, y="response")) + geom_point())
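As an alternative to nesting lapply calls, all data frame and column combinations can be enumerated with expand.grid() and walked with Map(). The sketch below also swaps aes_string() for the .data pronoun, since aes_string() is soft-deprecated in ggplot2 3.0.0 and later. The names dfs, predictors, combos, and plots are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Named list of data frames, mirroring df1 and df2 from the question
dfs <- list(
  df1 = data.frame(response = rpois(10, 1), value1 = rpois(10, 1), value2 = rpois(10, 1)),
  df2 = data.frame(response = rpois(10, 1), value1 = rpois(10, 1), value2 = rpois(10, 1))
)
predictors <- c("value1", "value2")

# One row per (data frame, predictor) pair
combos <- expand.grid(df = names(dfs), predictor = predictors,
                      stringsAsFactors = FALSE)

# Map() walks the two columns in parallel and builds one plot per pair
plots <- Map(function(d, p) {
  ggplot(dfs[[d]], aes(x = .data[[p]], y = response)) +
    geom_point() +
    labs(title = paste0(d, ": response vs ", p), x = p)
}, combos$df, combos$predictor)
names(plots) <- paste(combos$df, combos$predictor, sep = "_")
```

The flat, named list avoids the nested list structure that chained lapply calls produce, so a single plot can be retrieved as plots[["df2_value1"]].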
