using loop function to plot multiple columns - r

I'm trying to produce 2,695 plots from the columns of my dataset. The x-axis is the same for every plot: the "instrument.supersaturation" column. The y-axis is each of the remaining columns, which are labelled with dates and times.
I tried the following code to draw all 2,695 plots in a for loop. The code runs and the x-axis shows the instrument supersaturation values, but the y-axis does not use the concentrations in each column, so every plot is just a straight line.
library(ggplot2)
col_names <- colnames(rotated.plot.data)
col_names <- col_names[-1]
for(i in col_names){
plot <- ggplot(rotated.plot.data, aes(x=rotated.plot.data$instrument.supersaturation, y="i"))+
geom_point()
print(plot)}

I tried it your way. The problem is that i is in quotation marks, so ggplot treats it as the literal string "i" rather than as a column name. sym() turns a column name into a symbol (dropping the quotation marks), and eval() evaluates that symbol as an expression.
Phil's method would be much easier if you are familiar with map().
library(ggplot2)
library(tidyverse)
iris<-iris %>% select(-c(Species))
for(i in 1:(length(colnames(iris))-1)){
plot <- ggplot(iris, aes(x=Sepal.Length, y=eval(sym(colnames((iris[i+1]))))))+
geom_point()
print(plot)}
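As an alternative sketch (not taken from the answers here), the same loop can iterate over the column names and look each one up with the .data pronoun that ggplot2 provides for use inside aes(). The iris data stands in for rotated.plot.data, and the names dat, y_cols and plots are chosen only for this illustration:

```r
library(ggplot2)

# iris stands in for the asker's data; Sepal.Length plays the role of the fixed x column
dat <- iris[, names(iris) != "Species"]
y_cols <- setdiff(names(dat), "Sepal.Length")

plots <- list()
for (col in y_cols) {
  # .data[[col]] looks up the column whose name is stored in col
  plots[[col]] <- ggplot(dat, aes(x = Sepal.Length, y = .data[[col]])) +
    geom_point() +
    ylab(col)
  print(plots[[col]])
}
```

Storing the plots in a list as well as printing them makes it easier to save or arrange them later, which may matter with several thousand columns.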

Related

How do I make my row names appear on my x axis? And the numbers on from my variables appear as the y axis?

I created a data frame with countries as row names and percentages as the observations of each variable, but when I make a histogram the percentages occupy the x-axis and the country names don't appear at all. How do I put the countries' names on the x-axis and the variable values on the y-axis?
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
row.names(G08) <- G08$Country
G08[1] <- NULL
hist(G08$Anxiety.Disorders)
I use the melt() call to create one observation per row. Then, I use ggplot to produce the bar plot.
library(ggplot2)
library(reshape2)
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia-Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
G08melt <- melt(G08, "Country")
G08.bar <- ggplot(G08melt, aes(x = Country, y=value)) +
geom_bar(aes(fill=variable),stat="identity", position ="dodge") +
theme_bw()+
theme(axis.text.x = element_text(angle=-40, hjust=.1))
G08.bar
Looking at your question, I think you want a grouped column diagram rather than a histogram. You can draw it directly with the barplot function from the graphics package, but first you need to convert your data frame into a matrix. I removed the first column (Country) from G08; this assumes G08 still contains the Country column, i.e. as it is before the G08[1] <- NULL step.
mat<-G08[,-1]
Now simply call barplot on the transpose of mat and use the names argument (which partially matches names.arg) to write the country names on the x-axis:
barplot(t(mat),beside=T,col=c('red','blue','gold'),border=NA,names=G08$Country,cex.names=0.45,las=2)
par(new=T)
legend('topright',c("Anxiety","Depressive","Bipolar"),fill=c("red","blue","gold"),cex=0.5,title='Disorder types')
Suggestion:
For a bit more 'fresh air' in the graph, you can set beside=F in barplot to get a stacked column diagram.
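A minimal sketch of that stacked variant, using only the first three countries from the question's data to keep it short (the name bar_mids is introduced here just to hold barplot's return value):

```r
# First three countries from the data in the question
G08 <- data.frame(
  Country = c("Albania", "Armenia", "Austria"),
  Anxiety.Disorders = c(3.38, 2.73, 5.22),
  Depressive.Disorders = c(2.42, 3.16, 3.66),
  Bipolar.Disorder = c(0.72, 0.77, 0.95)
)
mat <- as.matrix(G08[, -1])

# beside = FALSE stacks the three disorder types into one column per country
bar_mids <- barplot(t(mat), beside = FALSE, col = c("red", "blue", "gold"),
                    border = NA, names.arg = G08$Country, las = 2)
legend("topright", c("Anxiety", "Depressive", "Bipolar"),
       fill = c("red", "blue", "gold"), cex = 0.5, title = "Disorder types")
```

With the full 42-country data, cex.names=0.45 from the grouped version would probably still be needed to keep the labels readable.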

R ggplot loop: in a for loop of ggplot histograms, how can you automatically set the y axis scale based on max frequency?

I have the following loop to produce several histograms based off certain columns (columns 2 to 5) in a larger dataset (df):
loop.vector <- 2:5
for (i in loop.vector){
x <- df[,i]
print(ggplot(df,aes(x=x)) + geom_histogram(binwidth=1)+scale_x_continuous(breaks=seq(0,max((x),1))))
}
I'd like the y-axis scale to be set automatically, as I have done for the x-axis, so that it ranges from zero to the maximum frequency in increments of 1.
I know how to set these values manually: plot, look at the result, and enter the maximum y-axis value by hand. But I'd like to do this automatically within the loop.
Thanks!
Answering the question: how to access max counts for a histogram plot?
The information you're missing on each plot in order to create your scale_y_continuous command is the maximum number of counts. There is a nice way to access this information once you have created a ggplot object, which is to use the built-in ggplot_build() function from ggplot2. For a given plot, myPlot, the following will give you a list of dataframes that are used for each layer in your plot:
ggplot_build(myPlot)$data
In the case of your example, you can access the count column of the first data frame (since you only have one histogram geom layer). Here's how you can write the loop to do what you need it to do. I'll use an example dataset that can show you the results. Note that I've also changed your scale_x_continuous line to accommodate positive and negative numbers by using a combination of min(), max(), and the ceiling() and floor() functions:
library(ggplot2)
set.seed(1234)
df <- data.frame(
y1=rnorm(100,10,1),
y2=rnorm(100,12,3),
y3=rnorm(100,5,4),
y4=rnorm(100,13,5))
for (i in 1:ncol(df)) {
p <- ggplot(df, aes(df[,i])) +
geom_histogram(alpha=0.5, color='black', fill='red', binwidth=1) +
scale_x_continuous(breaks=seq(floor(min(df[,i])),ceiling(max(df[,i])))) +
ggtitle(names(df)[i])
# get max counts
max_count <- max(ggplot_build(p)$data[[1]]$count)
p <- p + scale_y_continuous(breaks=seq(0,max_count,1))
print(p)
}
Is there a better way?
While that gets you what you need, it's typically hard to deal with multiple plots sent to your graphics device one after another. I would recommend rewriting the above code as a function, calling it with lapply(), and using something like plot_grid() from cowplot to display the output. This suggested approach is detailed in the code below:
myPlots <- function(data, column, fill_color = 'red') {
  # column = character name of column
  # fill_color = fill colour of the histogram bars
  p <- ggplot(data, aes_string(x=column)) +
    geom_histogram(fill=fill_color, binwidth=1, alpha=0.5, color='black') +
    scale_x_continuous(breaks=seq(floor(min(data[[column]])), ceiling(max(data[[column]])),1)) +
    ggtitle(column)
  # get max counts from the built plot's first (and only) layer
  max_count <- max(ggplot_build(p)$data[[1]]$count)
  p <- p + scale_y_continuous(breaks=seq(0,max_count,1))
  return(p)
}
library(cowplot)
plotList <- lapply(names(df), myPlots, data=df)
plot_grid(plotlist = plotList)
Figured it out - my values are integers, so what ended up working was a variation on Duck's response. See below:
loop.vector <- 2:5
for (i in loop.vector){
x <- df[,i]
print(ggplot(df,aes(x=x)) + geom_histogram(binwidth=1)+scale_x_continuous(breaks=seq(0,max((x),1)))+scale_y_continuous(breaks=seq(0,max(table(x)),1)))
}

How to plot a CSV file and a mathematical function on the same ggplot2 plot?

I am trying to accomplish a very basic task using R+ggplot2: plotting data from a CSV file along with a polynomial fit obtained outside R. The way I am trying to do it is the following:
library(ggplot2)
# This CSV file contains dates and numeric values
data <- read.csv("data.csv")
# I want to plot using numeric values in the x axis, so I am adding this column
data$idx <- as.numeric(row.names(data))
# Some arbitrary function to plot over the data
eq = function(x){x*x}
# Plotting the data
p <- ggplot() + geom_line(aes(x=idx, y=Close), data=data)
# Adding the function
p + stat_function(fun=eq)
print(p)
The problem is that this is only plotting the data. I can't get the function to appear in the same plot no matter what I do. The function is supposed to be calculated at the idx values created above, by the way.
What am I doing wrong?
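One likely cause, offered as a sketch rather than a confirmed diagnosis: p + stat_function(fun=eq) builds a new plot object but never stores it, so print(p) still draws the original plot without the function. Mapping x in the ggplot() call also gives stat_function() an x range to evaluate over. The data frame below is a made-up stand-in for data.csv and assumes only that the real file has a numeric Close column:

```r
library(ggplot2)

# Synthetic stand-in for data.csv (assumption: the real file has a numeric Close column)
data <- data.frame(Close = c(1, 5, 8, 18, 24, 37, 50, 63, 82, 99))
data$idx <- seq_len(nrow(data))

# Some arbitrary function to plot over the data
eq <- function(x) x * x

# Mapping x in ggplot() lets every layer, including stat_function(), share it
p <- ggplot(data, aes(x = idx)) +
  geom_line(aes(y = Close))

# The sum must be assigned back; p + ... on its own leaves p unchanged
p <- p + stat_function(fun = eq, colour = "red")
print(p)
```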

Plotting distributions of all columns in an R data frame

I'm trying to come up with a clean way to plot a grid view of all the columns in an R data frame. The problem is that my data frame has both discrete and numeric values in it. For simplicity's sake, we can use the sample dataset provided by R called iris. I would use par(mfrow = c(x, y)) to split my plots and maybe mapply to cycle through each column? I'm unsure what's best here.
I'm thinking something akin to:
ggplot(iris, aes(Sepal.Length))+geom_density()
But instead plotted for each column. My concern is the "Species" column being discrete. Maybe "geom_density" wouldn't be the right plot to use here, but the idea is to see the distribution of each of the data frame's variables in one plot, even the discrete ones. Bar plots for the discrete values would serve the purpose. Basically I'm trying to do the following:
Cycle through each column in the data frame
If numeric, plot a histogram
If discrete (a string basically), plot a bar plot
Any thoughts or advice would be appreciated!
You can use the function plot_grid from the cowplot package. This function takes a list of plots generated by ggplot and creates a new plot, combining them in a grid.
First, create a list of plots with lapply, using geom_density for numeric variables and geom_bar for everything else.
library(ggplot2)
library(cowplot)
my_plots <- lapply(names(iris), function(var_x){
  p <-
    ggplot(iris) +
    aes_string(var_x)
  if(is.numeric(iris[[var_x]])) {
    p <- p + geom_density()
  } else {
    p <- p + geom_bar()
  }
  # return the finished plot explicitly
  p
})
Now we simply call plot_grid.
plot_grid(plotlist = my_plots)

R - How to histogram multiple matrixes using qplot/ggplot2

I'm using R to read and plot data from NetCDF files (ncdf4). I've started using R only recently thus I'm very confused, I beg your pardon.
Let's say from the files I obtain N 2-D matrixes of numerical values, each with different dimensions and many NA values.
I have to histogram these values in the same plot, with bins of given width and within given limits, the same for every matrix.
For just one matrix, I can do this:
library(ncdf4)
library(ggplot2)
file0 <- nc_open("test.nc")
#Read a variable
prec0 <- ncvar_get(file0,"pr")
#Some settings
min_plot=0
max_plot=30
bin_width=2
xlabel="mm/day"
ylabel="PDF"
title="Precipitation"
#Get maximum of array, exclude NAs
maximum_prec0=max(prec0, na.rm=TRUE)
#Store the histogram
histo_prec0 <- hist(prec0, xlim=c(min_plot,max_plot), right=FALSE, breaks=seq(0,ceiling(maximum_prec0),by=bin_width))
#Plot the histogram densities using points instead of bars, which is what we want
qplot(histo_prec0$mids, histo_prec0$density, xlim=c(min_plot,max_plot), color=I("yellow"), xlab=xlabel, ylab=ylabel, main=title, log="y")
#If necessary, can transform matrix to vector using
#vector_prec0 <- c(prec0)
However, it occurs to me that it would be best to use a data frame for plotting multiple matrices. I'm not certain of that, nor of how to do it. This would also allow for automatic legends and all the other advantages of using data frames with ggplot2.
What I want to achieve is something akin to this:
https://copy.com/thumbs_public/j86WLyOWRs4N1VTi/scatter_histo.jpg?size=1024
Where on Y we have the Density and on X the bins.
Thanks in advance.
To be honest, it is unclear what you are after (scatter plot or histogram of data with values as points?).
Here are a couple of examples using ggplot which might fit your goals (based on your last sentence: "Where on Y we have the Density and on X the bins"):
library(ggplot2)
# some data
nsample<- 200
d1<- rnorm(nsample,1,0.5)
d2<- rnorm(nsample,2,0.6)
#transformed into histogram bins and collected in a data frame
hist.d1<- hist(d1)
hist.d2<- hist(d2)
data.d1<- data.frame(hist.d1$mids, hist.d1$density, rep(1,length(hist.d1$density)))
data.d2<- data.frame(hist.d2$mids, hist.d2$density, rep(2,length(hist.d2$density)))
colnames(data.d1)<- c("bin","den","group")
colnames(data.d2)<- c("bin","den","group")
ddata<- rbind(data.d1,data.d2)
ddata$group<- factor(ddata$group)
# plot
plots<- ggplot(data=ddata, aes(x=bin, y=den, group=group)) +
geom_point(aes(color=group)) +
geom_line(aes(color=group)) #optional
print(plots)
However, you could also produce smooth density plots (or histograms) directly in ggplot:
ddata2<- cbind(c(rep(1,nsample),rep(2,nsample)),c(d1,d2))
ddata2<- as.data.frame(ddata2)
colnames(ddata2)<- c("group","value")
ddata2$group<- factor(ddata2$group)
plots2<- ggplot(data=ddata2, aes(x=value, group=group)) +
geom_density(aes(color=group))
# geom_histogram(aes(color=group, fill=group)) # for histogram instead
dev.new() # open a new graphics device (windows() is Windows-only)
print(plots2)
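To connect this to the original setup of several matrices with different dimensions and NA values, here is a rough sketch of stacking them into one long data frame before plotting. The matrices are random stand-ins for what ncvar_get() would return, and the names mats, model_a, model_b and long are hypothetical; after_stat() assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-ins for matrices read with ncvar_get(); sizes and values are made up
mats <- list(
  model_a = matrix(rgamma(60, shape = 2, scale = 3), nrow = 6),
  model_b = matrix(rgamma(80, shape = 3, scale = 3), nrow = 8)
)
mats$model_a[c(3, 17)] <- NA  # simulate missing cells

# Flatten each matrix, drop NAs, and stack everything into one long data frame
long <- do.call(rbind, lapply(names(mats), function(nm) {
  v <- as.vector(mats[[nm]])
  data.frame(source = nm, value = v[!is.na(v)])
}))

# Same bin width and limits for every matrix; density drawn as one point per bin.
# position = "identity" stops the points of different matrices being stacked.
p <- ggplot(long, aes(x = value, colour = source)) +
  stat_bin(aes(y = after_stat(density)), geom = "point",
           binwidth = 2, boundary = 0, position = "identity") +
  coord_cartesian(xlim = c(0, 30)) +
  labs(x = "mm/day", y = "PDF", title = "Precipitation")
print(p)
```

Because the source column is mapped to colour, the legend is generated automatically, which is the data frame advantage the question mentions.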
