Plotting distributions of all columns in an R data frame - r

I'm trying to come up with a clean way to plot a grid view of all the columns in an R data frame. The problem is that my data frame has both discrete and numeric values in it. For simplicity's sake, we can use the sample dataset provided by R called iris. I would use par(mfrow = c(x, y)) to split my plots and maybe mapply to cycle through each column? I'm unsure what's best here.
I'm thinking something akin to:
ggplot(iris, aes(Sepal.Length))+geom_density()
But plotted for each column instead. My concern is that the "Species" column is discrete. Maybe geom_density wouldn't be the right plot to use here, but the idea is to see the distribution of each of the data frame's variables in one plot, even the discrete ones. Bar plots for the discrete values would serve the purpose. Basically I'm trying to do the following:
Cycle through each column in the data frame
If numeric, plot a histogram
If discrete (a string basically), plot a bar plot
Any thoughts or advice would be appreciated!

You can use the function plot_grid from the cowplot package. This function takes a list of plots generated by ggplot and creates a new plot, combining them in a grid.
First, create a list of plots with lapply, using geom_density for numeric variables and geom_bar for everything else.
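library(ggplot2)
library(cowplot)
my_plots <- lapply(names(iris), function(var_x) {
  # Look the column up by name; .data[[var_x]] replaces the deprecated aes_string()
  p <- ggplot(iris, aes(x = .data[[var_x]])) + xlab(var_x)
  if (is.numeric(iris[[var_x]])) {
    p <- p + geom_density()  # numeric column: density curve
  } else {
    p <- p + geom_bar()      # anything else: bar chart of counts
  }
  p  # return the finished plot
})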
Now we simply call plot_grid.
plot_grid(plotlist = my_plots)
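If you would rather stay in base graphics, as the par(mfrow) idea in the question suggests, the same numeric/discrete branching can be sketched roughly as below. This is only a sketch: plot_all_columns is a made-up helper name, and taking the grid shape from grDevices::n2mfrow is one convenient choice, not a requirement.

```r
# Hypothetical helper (not from any package): one panel per column,
# histogram for numeric columns, bar plot of counts for everything else.
plot_all_columns <- function(df) {
  old_par <- par(mfrow = grDevices::n2mfrow(ncol(df)))
  on.exit(par(old_par))  # restore the previous layout afterwards
  kinds <- vapply(names(df), function(col) {
    x <- df[[col]]
    if (is.numeric(x)) {
      hist(x, main = col, xlab = col)
      "histogram"
    } else {
      barplot(table(x), main = col)
      "barplot"
    }
  }, character(1))
  invisible(kinds)  # named vector recording which plot each column got
}

kinds <- plot_all_columns(iris)
```

The returned vector is only there so you can check afterwards which kind of plot each column received.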

Related

using loop function to plot multiple columns

I'm trying to plot 2,695 different plots using the columns of my dataset. The x axis will be the same for all of the plots: the "instrument.supersaturation" column. The y axis will be each of the remaining columns, which are labelled with dates and times.
I have tried the following code to draw all 2,695 plots in a loop. The code runs and shows the instrument supersaturation values on the x axis, but I'm having trouble getting the concentrations from each column onto the y axis, so every plot comes out as a straight line.
library(ggplot2)
col_names <- colnames(rotated.plot.data)
col_names <- col_names[-1]
for(i in col_names){
plot <- ggplot(rotated.plot.data, aes(x=rotated.plot.data$instrument.supersaturation, y="i"))+
geom_point()
print(plot)}
I tried it your way. The problem is the "i" in quotation marks: ggplot treats it as the constant string "i" rather than as a column name, so every point gets the same y value. The sym function turns a string into a symbol, and the eval function then evaluates that symbol as an expression.
Phil's method would be much easier if you are familiar with map().
library(tidyverse)  # also attaches ggplot2
iris <- iris %>% select(-Species)
# Plot every remaining column (from the second one on) against Sepal.Length
for (i in 1:(ncol(iris) - 1)) {
  plot <- ggplot(iris, aes(x = Sepal.Length, y = eval(sym(colnames(iris)[i + 1])))) +
    geom_point()
  print(plot)
}
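An alternative that avoids eval() is ggplot2's .data pronoun, which looks a column up by a name held in a string. The sketch below assumes a ggplot2 version with tidy evaluation support and reuses iris as stand-in data; y_cols is just an illustrative name.

```r
library(ggplot2)

# Every numeric column except the one used on the x axis
y_cols <- setdiff(names(iris)[sapply(iris, is.numeric)], "Sepal.Length")

for (col in y_cols) {
  # .data[[col]] refers to the column whose name is stored in col
  p <- ggplot(iris, aes(x = Sepal.Length, y = .data[[col]])) +
    geom_point() +
    ylab(col)
  print(p)  # drawn right away, while col still holds this column's name
}
```

Printing inside the loop matters here: aes() is evaluated lazily, so a plot stored now and printed after the loop would look up whatever col holds at that later point.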

Apply ggplot2 across columns

I am working with a dataframe with many columns and would like to produce certain plots of the data using ggplot2, namely, boxplots, histograms, density plots. I would like to do this by writing a single function that applies across all attributes (columns), producing one boxplot (or histogram etc) and then storing that as a given element of a list into which all the boxplots will be chained, so I could later index it by number (or by column name) in order to return the plot for a given attribute.
The issue I have is that, if I try to apply across columns with something like apply(df,2,boxPlot), I have to define boxPlot as a function that takes just a vector x. And when I do so, the attribute/column name and index are no longer retained. So e.g. in the code for producing a boxplot, like
bp <- ggplot(df, aes(x=Group, y=Attr, fill=Group)) +
geom_boxplot() +
labs(title="Plot of length per dose", x="Group", y =paste(Attr)) +
theme_classic()
the function has no idea how to extract the info necessary for Attr from just vector x (as this is just the column data and doesn't carry the column name or index).
(Note the x-axis is a factor variable called 'Group', which has 6 levels A,B,C,D,E,F, within X.)
Can anyone help with a good way of automating this procedure? (Ideally it should work for all types of ggplots; the problem here seems to simply be how to refer to the attribute name, within the ggplot function, in a way that can be applied / automatically replicated across the columns.) A for-loop would be acceptable, I guess, but if there's a more efficient/better way to do it in R then I'd prefer that!
Edit: something like what would be achieved by the top answer to this question: apply box plots to multiple variables. Except that in that answer, with his code you would still need a for-loop to change the indices on y=y[2] in the ggplot code and get all the boxplots. He has also used expand.grid to include different x possibilities (I have only one, the Group factor), but it would be easy to simplify down if the looping problem could be handled.
I'd also prefer just base R if possible--dplyr if absolutely necessary.
Here's an example of iterating over all columns of a data frame to produce a list of plots, while retaining the column name in the ggplot axis label
library(tidyverse)
plots <-
  imap(select(mtcars, -cyl), ~ {
    ggplot(mtcars, aes(x = cyl, y = .x)) +
      geom_point() +
      ylab(.y)
  })
plots$mpg
You can also do this without purrr and dplyr
to_plot <- setdiff(names(mtcars), 'cyl')
plots <-
  Map(function(.x, .y) {
    ggplot(mtcars, aes(x = cyl, y = .x)) +
      geom_point() +
      ylab(.y)
  }, mtcars[to_plot], to_plot)
plots$mpg
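Applied to the boxplot from the question, the same pattern might look like the sketch below. The data frame df, its Group factor and the attribute columns are invented here for illustration, and make_boxplot is a hypothetical helper. The column is looked up by name with ggplot2's .data pronoun, so the name stays available for the axis label.

```r
library(ggplot2)

# Invented example data: a 6-level Group factor plus two numeric attributes
set.seed(1)
df <- data.frame(
  Group  = factor(rep(LETTERS[1:6], each = 10)),
  length = rnorm(60, mean = 10),
  weight = rnorm(60, mean = 50)
)

# Hypothetical helper: boxplot of one attribute (given by name) per Group
make_boxplot <- function(data, attr) {
  ggplot(data, aes(x = Group, y = .data[[attr]], fill = Group)) +
    geom_boxplot() +
    labs(title = paste("Plot of", attr, "per group"), x = "Group", y = attr) +
    theme_classic()
}

attrs <- setdiff(names(df), "Group")
# sapply(simplify = FALSE) returns a list named after the columns
boxplots <- sapply(attrs, make_boxplot, data = df, simplify = FALSE)

boxplots$length  # index by column name
boxplots[[2]]    # or by position
```

Swapping geom_boxplot() for another geom inside the helper should be all that is needed for histograms or density plots, although those would need a different x/y mapping.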

Sorting data vector for a histogram using ggplot and R

So I have 10,000 values in a vector from a Monte Carlo simulation. I want to plot this data as a histogram and a density plot. Doing this with the hist() function is easy, and it will calculate the frequency of the different values automatically. However, my ambition is to do this in ggplot.
My biggest problem right now is how to transform the data so ggplot can handle it. I would like my x-axis to show the "price" while the y-axis shows the frequency or density. My data has a lot of decimals, as shown in the example data below.
myData <- c(266.8997, 271.5137, 225.4786, 223.3533, 258.1245, 199.5601, 234.2341, 231.7850, 260.2091, 184.5102, 272.8287, 203.7482, 212.5140, 220.9094, 221.2627, 236.3224)
My current code using the hist()-function, and the plot is shown below.
hist(myData,
xlab ="Price",
prob=TRUE)
lines(density(myData))
[Figure: histogram of the data vector containing 10,000 values]
How would you sort the data, and how would you do this with ggplot? Should I round the numbers as well?
Hard to say exactly without seeing a sample of your data, but have you tried:
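# ggplot() needs a data frame, so wrap the vector in one first
ggplot(data.frame(Price = myData), aes(Price)) + geom_histogram()
or:
ggplot(data.frame(Price = myData), aes(Price)) + geom_density()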
Just try this:
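ggplot(data.frame(Price = myData), aes(Price)) +
  geom_histogram(aes(y = after_stat(density))) +  # histogram on the density scale
  geom_density()                                  # so the density curve lines up with it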
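On the sorting and rounding part of the question: neither should be needed, because a histogram bins the values itself. A small base R check of that, using the sample vector from the question (h_raw and h_sorted are just illustrative names):

```r
myData <- c(266.8997, 271.5137, 225.4786, 223.3533, 258.1245, 199.5601,
            234.2341, 231.7850, 260.2091, 184.5102, 272.8287, 203.7482,
            212.5140, 220.9094, 221.2627, 236.3224)

# hist(plot = FALSE) returns the binning without drawing anything
h_raw    <- hist(myData, plot = FALSE)
h_sorted <- hist(sort(myData), plot = FALSE)

h_raw$breaks  # bin edges chosen by hist()
h_raw$counts  # number of values falling into each bin

# The order of the input makes no difference to the bins
identical(h_raw$counts, h_sorted$counts)
```

geom_histogram() does the same kind of binning internally, so the raw vector can be passed on as it is.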

R: Plot multiple box plots using columns from data frame

I would like to plot an INDIVIDUAL box plot for each unrelated column in a data frame. I thought I was on the right track with boxplot.matrix from the sfsmisc package, but it seems to do the same as boxplot(as.matrix(plotdata)), which is to plot everything in a shared boxplot with a shared scale on the axis. I want (say) 5 individual plots.
I could do this by hand like:
par(mfrow=c(2,2))
boxplot(data$var1)
boxplot(data$var2)
boxplot(data$var3)
boxplot(data$var4)
But there must be a way to use the data frame columns?
EDIT: I used iterations, see my answer.
You could use the reshape package to simplify things
data <- data.frame(v1=rnorm(100),v2=rnorm(100),v3=rnorm(100), v4=rnorm(100))
library(reshape)
meltData <- melt(data)
boxplot(data=meltData, value~variable)
or even then use ggplot2 package to make things nicer
library(ggplot2)
p <- ggplot(meltData, aes(factor(variable), value))
p + geom_boxplot() + facet_wrap(~variable, scales="free")
From ?boxplot we see that we have the option to pass multiple vectors of data as elements of a list, and we will get multiple boxplots, one for each vector in our list.
So all we need to do is convert the columns of our matrix to a list:
m <- matrix(1:25,5,5)
boxplot(x = as.list(as.data.frame(m)))
If you really want separate panels each with a single boxplot (although, frankly, I don't see why you would want to do that), I would instead turn to ggplot and faceting:
library(reshape)  # provides melt(), as in the answer above
m1 <- melt(as.data.frame(m))
library(ggplot2)
ggplot(m1,aes(x = variable,y = value)) + facet_wrap(~variable) + geom_boxplot()
I used iteration to do this. I think perhaps I wasn't clear in the original question. Thanks for the responses none the less.
par(mfrow=c(2,5))
for (i in seq_along(plotdata)) {
  # one panel per column, titled with the column name
  boxplot(plotdata[[i]], main = names(plotdata)[i])
}
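The same iteration can also be written without an index by walking the columns and their names together. A rough sketch, with made-up data standing in for plotdata and a hard-coded 2 x 3 layout:

```r
# Made-up data standing in for plotdata
set.seed(42)
plotdata <- data.frame(a = rnorm(50), b = runif(50), c = rexp(50),
                       d = rnorm(50, 5), e = rpois(50, 3))

old_par <- par(mfrow = c(2, 3))  # 6 panels, enough for 5 columns
# Map() pairs each column with its name, so no index is needed
stats <- Map(function(x, nm) boxplot(x, main = nm), plotdata, names(plotdata))
par(old_par)                     # restore the previous layout
```

boxplot() returns its summary statistics invisibly, so stats ends up as a list of those summaries named after the columns.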

Barplot in ggplot

I'm having problems making a barplot using ggplot.
I tried different combinations of qplot and ggplot, but I either get a histogram, or it swaps my bars, or it decides to use log-scaling.
Using the ordinary plot functions, I would do it like:
d <- 1/(10:1)
names(d) <- paste("id", 1:10)
barplot(d)
To plot a bar chart in ggplot2, you have to use geom="bar" or geom_bar. Have you tried any of the geom_bar examples on the ggplot2 website?
To get your example to work, try the following:
ggplot needs a data.frame as input. So convert your input data into a data.frame.
Map your data to aesthetics on the plot using aes(x=x, y=y). This tells ggplot which columns in the data to map to which elements on the chart.
Use geom_bar to create the bar chart. In this case, you probably want to tell ggplot that the data is already summarised using stat="identity", since the default is to count the observations in each group, as a histogram does.
(Note that the function barplot that you used in your example is part of base R graphics, not ggplot.)
The code:
d <- data.frame(x=1:10, y=1/(10:1))
ggplot(d, aes(x, y)) + geom_bar(stat="identity")
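If you also want to keep the "id 1" ... "id 10" labels from the named vector in the question, one possible sketch is below. The column names id and value and the data frame name bars are arbitrary choices made for this example.

```r
library(ggplot2)

d <- 1 / (10:1)
names(d) <- paste("id", 1:10)

# Store the labels as a factor whose levels follow the original order;
# otherwise the bars are sorted alphabetically ("id 10" before "id 2")
bars <- data.frame(id = factor(names(d), levels = names(d)),
                   value = unname(d))

ggplot(bars, aes(x = id, y = value)) +
  geom_col()  # geom_col() is shorthand for geom_bar(stat = "identity")
```

Fixing the factor levels is what stops ggplot from reordering the bars, which may be the "swapping" described in the question.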
