I am completely new to ggplot2 but have heard of its great plotting capabilities. I have a list of different samples, and for each sample there are observations from three instruments. I would like to turn that into a figure with boxplots. I cannot include a figure, but the code to make an example figure is included below. The idea is to have, for each instrument, a figure with one boxplot per sample.
In addition, next to the plots I would like to make a sort of legend giving a name to each of the sample numbers. I have no idea how to start doing this with ggplot2.
Any help will be appreciated.
The R-code to produce the example image is:
#Make data example
Data<-list();
Data$Sample1<-matrix(rnorm(30),10,3);
Data$Sample2<-matrix(rnorm(30),10,3);
Data$Sample3<-matrix(rnorm(30),10,3);
Data$Sample4<-matrix(rnorm(30),10,3);
#Make the plots
par(mfrow=c(3,1)) ;
boxplot(data.frame(Data)[seq(1,12,by=3)],names=c(1:4),xlab="Sample number",ylab="Instrument 1");
boxplot(data.frame(Data)[seq(2,12,by=3)],names=c(1:4),xlab="Sample number",ylab="Instrument 2");
boxplot(data.frame(Data)[seq(3,12,by=3)],names=c(1:4),xlab="Sample number",ylab="Instrument 3");
First, you'll want to set your data up differently: as a data.frame rather than a list of matrices. You want one column for sample, one column for instrument, and one column for the observed value. Here's a fake dataset:
df <- data.frame(sample = rep(c("One","Two","Three","Four"),each=30),
                 instrument = rep(rep(c("My Instrument","Your Instrument","Joe's Instrument"),each=10),4),
                 value = rnorm(120))
> head(df)
sample instrument value
1 One My Instrument 0.08192981
2 One My Instrument -1.11667766
3 One My Instrument 0.34117450
4 One My Instrument -0.42321236
5 One My Instrument 0.56033804
6 One My Instrument 0.32326817
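If you already have the list of matrices from the question, here is a minimal sketch of one way to stack it into that long layout. The instrument labels ("Instrument 1", and so on) are placeholders I made up, and I'm assuming each matrix column corresponds to one instrument, as in the question's boxplot calls.

```r
# Rebuild the question's example: a list of 10 x 3 matrices, one per sample
set.seed(1)
Data <- list()
for (s in paste0("Sample", 1:4)) Data[[s]] <- matrix(rnorm(30), 10, 3)

# Stack each matrix into long format: one row per observation.
# Assumption: column j of every matrix holds the readings of instrument j.
long <- do.call(rbind, lapply(names(Data), function(s) {
  m <- Data[[s]]
  data.frame(sample     = s,
             instrument = rep(paste("Instrument", seq_len(ncol(m))), each = nrow(m)),
             value      = as.vector(m))  # as.vector() reads the matrix column by column
}))

str(long)
```

The result has one row per observation, which is the shape ggplot2 expects.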
To get three plots, we're going to use faceting. To get boxplots we use geom_boxplot. The code looks like this:
library(ggplot2)
ggplot(df, aes(x=sample,y=value)) +
  geom_boxplot() +
  facet_wrap(~ instrument, ncol=1)
Rather than including a legend for the sample numbers, if you put the names directly in the sample variable it will print them below the plots. That way people don't have to reference numbers to names: it's immediately clear what sample each plot is for. Note that ggplot puts the factors in alphabetical order by default; if you want a different ordering you have to change it manually.
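As a rough sketch of changing the order manually: turn the column into a factor and list the levels in the order you want. The data here is the same fake dataset as above, rebuilt so the snippet runs on its own.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(sample = rep(c("One", "Two", "Three", "Four"), each = 30),
                 instrument = rep(rep(c("My Instrument", "Your Instrument",
                                        "Joe's Instrument"), each = 10), 4),
                 value = rnorm(120))

# Without this line the samples are drawn alphabetically: Four, One, Three, Two.
# Listing the levels explicitly fixes the order on the x axis.
df$sample <- factor(df$sample, levels = c("One", "Two", "Three", "Four"))

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_boxplot() +
  facet_wrap(~ instrument, ncol = 1)
print(p)
```

The same trick works on the instrument column if you want the facets in a particular order.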
I have a data set with multiple categories of study type for pond data. The column of overall categories is organized with each type having individual values that follow. I can make a histogram for each when I produce individual sheets to use. I have dug around for a while, but cannot find how to make the same histogram for the study types from the overall data set.
Piece of data sheet that I am working with. As you can see, there are multiple study types that we have each with their own data.
Basically, I want to pull each individual study type and the num_divided to make a histogram for the types. My end goal is to make one image with the 9 different histograms stacked above one another. Each having the same x-axis values and their individual names on the left-hand side.
The trouble I am running into is that when I make the histograms from the separated sheets, I cannot make the stacked image I want. I apologize in advance if this lacks some information, but I also thank anyone that offers advice.
ggplot2 is the best option.
You didn't give reproducible data but it's easy to make some. Here are 9 studies each with 100 values:
set.seed(111)
dat <- data.frame(study = rep(letters[1:9], each = 100), num_divided = rnorm(900))
What you want is a faceted plot.
library(ggplot2)
ggplot(dat, aes(x = num_divided)) + geom_histogram() + facet_grid(study ~ .)
If you don't know much about ggplot2, a good starting point is the R Cookbook.
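Since the question asks for the study names on the left-hand side, here is a hedged variation on the plot above. It uses the switch argument of facet_grid to move the strip labels; the bin count of 30 is an arbitrary choice of mine.

```r
library(ggplot2)

set.seed(111)
dat <- data.frame(study = rep(letters[1:9], each = 100), num_divided = rnorm(900))

p <- ggplot(dat, aes(x = num_divided)) +
  geom_histogram(bins = 30) +            # explicit bin count (arbitrary choice)
  facet_grid(study ~ ., switch = "y") +  # one row per study, strip labels on the left
  theme(strip.placement = "outside")     # put the labels outside the y axis
print(p)
```

All nine panels share the same x axis because facet_grid keeps the scales fixed unless told otherwise.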
I have one table of derived vegetation indices for 63 sample sites from different satellites. This gives me a table with 63 observations (sample sites) and 56 variables (1 sample ID, 50 vegetation indices, 4 biomass and 1 LAI). The last 5 columns of the table are the biomass and LAI, and the first column is the sample ID.
I want to generate a plot showing the relationship between a single vegetation index and one of the biomass parameters.
I am able to do this using the plot function, for one observation and variable at a time.
plot(data$Dry10, data$X8047EVImea)
I don't want to run this code 50 times and again by 5 sets for each biomass and LAI parameter.
Is there a way to loop or nested loop this plot function so that I can generate 200 graphs at once?
Also, I will place a regression line in each plot to see what vegetation index will best represent the amount of biomass present at the sample site.
This is my first post on stackoverflow, so please don't hesitate to request more information on the problem if I have missed something.
As noted in my comment you can accomplish this with a faceted plot in the ggplot2 package. This does require a little bit of data re-arrangement that can be accomplished with the reshape2 package. Here is some code that will be close to what you want to do but since I don't completely know your data formats it might take some fixes:
library(ggplot2)
library(reshape2)
## keep the sample ID (first column) so values can be matched by site
idCol <- names(data)[1]
vegDat <- data[, 1:51]          # ID + 50 vegetation indices
bioDat <- data[, c(1, 52:56)]   # ID + 4 biomass columns + LAI
## melt the data.frames so the biomass and vegetation headers are now variables
vegDatM <- melt(vegDat, id.vars=idCol, variable.name='vegInd', value.name='vegVal')
bioDatM <- melt(bioDat, id.vars=idCol, variable.name='bioInd', value.name='bioVal')
## Join on the sample ID so each vegetation value is paired with the
## biomass/LAI value from the same site
gdat <- merge(vegDatM, bioDatM, by=idCol)
## plot the data in a faceted grid
ggplot(gdat) + geom_point(aes(x=vegVal, y=bioVal)) + facet_grid(vegInd ~ bioInd)
Note that since there are 50 rows of plots you may want to open a device with a large height (or width if you swap the facets), e.g. pdf('foo.pdf', height=20). Hope this gets you on the right track.
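For the regression lines mentioned in the question, here is a small self-contained sketch on simulated data. Every column name in it (SampleID, VI1, Dry, LAI, ...) is invented for illustration, and it uses only 3 indices and 2 response columns to keep it short. The point to note is that the two melted tables are merged on the sample ID, so each index value is plotted against the biomass value from the same site.

```r
library(ggplot2)
library(reshape2)

# Simulated stand-in for the real table (all names are hypothetical)
set.seed(42)
n <- 63
data <- data.frame(SampleID = seq_len(n),
                   VI1 = rnorm(n), VI2 = rnorm(n), VI3 = rnorm(n))
data$Dry <- 2 * data$VI1 + rnorm(n)
data$LAI <- 0.5 * data$VI2 + rnorm(n)

# Long format, keeping the sample ID as the identifier
vegM <- melt(data[, c("SampleID", "VI1", "VI2", "VI3")], id.vars = "SampleID",
             variable.name = "vegInd", value.name = "vegVal")
bioM <- melt(data[, c("SampleID", "Dry", "LAI")], id.vars = "SampleID",
             variable.name = "bioInd", value.name = "bioVal")

# Merge on SampleID: one row per (site, index, response) combination
gdat <- merge(vegM, bioM, by = "SampleID")

p <- ggplot(gdat, aes(x = vegVal, y = bioVal)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # least-squares line per panel
  facet_grid(vegInd ~ bioInd, scales = "free")
print(p)
```

geom_smooth fits a separate line in every facet, so no explicit loop is needed.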
I've been trying to create a proportional stacked bar graph using ggplot and a huge data set that is one column of a dummy variable and one column a factor variable with 14 different levels.
I posted a small sample of the data here.
Despite not having a clear y-variable in my data, I can produce a plot, but it is only really useful for the factor levels that have a lot of observations; when there are only one or two, you can't see the proportion at all. The code I used is here.
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar()
ggplot says you need to apply a ddply function to the data frame.
ce<-ddply(data,"factor",transform, percent_y=y/sum(y)*100)
Their example doesn't really apply in the case of this data since there's no clear y-variable to call in the plot; just counts of each factor that is 1 or 0.
My best guess for a ddply function spits out an error about differing numbers of rows.
ce<-ddply(plot,"factor(data$factor)",transform,
percent=sum(data$dummy)*100/(dim(data$dummy)[1]))
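One possible way around the ddply step, sketched on made-up data: geom_bar can rescale every bar to full height with position = "fill", so sparse levels show their proportions as clearly as the large ones. The column names category and dummy are stand-ins for the real ones, not taken from the posted sample.

```r
library(ggplot2)

# Made-up stand-in: three levels of very different sizes and a 0/1 dummy
data <- data.frame(category = rep(c("a", "b", "c"), times = c(180, 18, 2)),
                   dummy    = rep(c(0, 1), length.out = 200))

# position = "fill" stacks the counts and scales each bar to a total of 1
p <- ggplot(data, aes(x = category, fill = factor(dummy))) +
  geom_bar(position = "fill") +
  labs(y = "proportion", fill = "dummy")
print(p)

# The same proportions as a table, one row per level
prop <- prop.table(table(data$category, data$dummy), margin = 1)
print(prop)
```

Note that the columns are referred to by bare name inside aes(), rather than as data$column.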
So... newbie R user here. I have some observations that I'd like to record using R and be able to add to later.
The items are sorted by weights, and the number at each weight recorded. So far what I have looks like this:
weights <- c(rep(171.5, times=1), rep(171.6, times=2), rep(171.7, times=4), rep(171.8, times=18), rep(171.9, times=39), rep(172.0, times=36), rep(172.1, times=34), rep(172.2, times=25))
There will be a total of 500 items being observed.
I'm going to be taking additional observations over time to (hopefully) see how the distribution of weights changes with use/wear. I'd like to be able to make plots showing either stacked histograms or boxplots.
What would be the best way to format / store this data to facilitate this kind of use case? A matrix, dataframe, something else?
As other comments have suggested, the most versatile (and perhaps useful) container (structure) for your data would be a data frame, for use with library(ggplot2) for your future plotting and graphing needs (such as a box plot with ggplot and various histograms).
Toy example
All the code below does is use your weights vector above, to create a data frame with some dummy IDs and plot a box and whisker plot, and results in the below plot.
library(ggplot2)
IDs<-sample(LETTERS[1:5],length(weights),TRUE) #dummy ID values
df<-data.frame(ID=IDs,Weights=weights) #make data frame with your
#original `weights` vector
ggplot(data=df,aes(factor(ID),Weights))+geom_boxplot() #box-plot
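To cover the "add to later" part, here is a sketch of one layout that might work: a long data frame with one row per item and a column saying which round of measurement it came from. The round labels and the second batch of weights are simulated, purely for illustration.

```r
library(ggplot2)

weights <- c(rep(171.5, 1), rep(171.6, 2), rep(171.7, 4), rep(171.8, 18),
             rep(171.9, 39), rep(172.0, 36), rep(172.1, 34), rep(172.2, 25))

# First round: one row per item ("round 1" is a hypothetical label)
obs <- data.frame(round = "round 1", weight = weights)

# A later round (simulated values) is simply appended with rbind()
set.seed(1)
later <- data.frame(round  = "round 2",
                    weight = round(rnorm(length(weights), mean = 171.9, sd = 0.15), 1))
obs <- rbind(obs, later)

# One boxplot per round, or stacked histograms coloured by round
p_box  <- ggplot(obs, aes(x = round, y = weight)) + geom_boxplot()
p_hist <- ggplot(obs, aes(x = weight, fill = round)) +
  geom_histogram(binwidth = 0.1, position = "stack")
print(p_box)
print(p_hist)
```

Each new round becomes another rbind() call, and the plotting code stays the same.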
I don't know how to plot this in a better way.
I have
df1 <- data.frame(x=c(1,3,5), y=c(2,4,6))
df2 <- data.frame(x=c(2,6,10,12), y=c(1,4,7,15))
Those data frames have x as time, y as its own value.
The data frames have different numbers of elements.
I want to combine these data by x (time), and I need one of two ways to show them on one plot: a) put df1.y on the x axis to see how df2 is distributed against df1, so the two data frames are linked by time (x) but each is shown on its own axis, or b) use three axes, with the y axis for df1.y on the right side of the plot.
For clearer terminology, I will rename your example variables according to your sample plots.
df1 <- data.frame(time=c(1,3,5), memory=c(2,4,6))
df2 <- data.frame(time=c(2,6,10,12), threads=c(1,4,7,15))
Your first plot:
From your description, I assume that you want to do the following: For each available time value get the value of df1$memory and df2$threads. However, that value may not always be available. One suitable approach is to fill up missing values by linear interpolation. This may be done using the approx-function:
merged.time <- sort(unique(c(df1$time, df2$time)))
merged.data <- data.frame(time = merged.time,
                          memory = approx(df1$time, df1$memory, xout=merged.time)$y,
                          threads = approx(df2$time, df2$threads, xout=merged.time)$y
)
Note that approx(...)$y just extracts the interpolated data.
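One thing worth knowing: by default approx only interpolates inside the range of the supplied times and returns NA outside it. A quick illustration with the memory series (the variable names inside and held are mine); rule = 2 is one option if you would rather hold the nearest end value:

```r
df1 <- data.frame(time = c(1, 3, 5), memory = c(2, 4, 6))
merged.time <- c(1, 2, 3, 5, 6, 10, 12)

# Default (rule = 1): times beyond the last measurement (5) give NA
inside <- approx(df1$time, df1$memory, xout = merged.time)$y
print(inside)
# 2 3 4 6 NA NA NA

# rule = 2: outside the range, reuse the value at the nearest end point
held <- approx(df1$time, df1$memory, xout = merged.time, rule = 2)$y
print(held)
# 2 3 4 6 6 6 6
```

Whether carrying the last value forward makes sense depends on your data; with the default, rows containing NA are simply left out of the plot.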
Plotting may now be done using standard plotting commands (or, as your tags suggest, using ggplot2):
ggplot(data=merged.data, aes(x=memory, y=threads)) + geom_line()
Your second plot
... is not something ggplot2 supports directly: it deliberately leaves out independent dual y-axes, for numerous reasons (for example, see here).
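For what it's worth, newer ggplot2 releases (2.2.0 and later) do provide sec_axis(), which draws a second axis that is a fixed transformation of the first. It is not a truly independent axis, but a rough sketch of approximating the second plot with it could look like this. The scaling factor is my own choice, picked so that both series fill the panel:

```r
library(ggplot2)

df1 <- data.frame(time = c(1, 3, 5), memory = c(2, 4, 6))
df2 <- data.frame(time = c(2, 6, 10, 12), threads = c(1, 4, 7, 15))

# Assumption: rescale memory so that its maximum lines up with the threads maximum
scale_factor <- max(df2$threads) / max(df1$memory)

p <- ggplot() +
  geom_line(data = df2, aes(x = time, y = threads)) +
  geom_line(data = df1, aes(x = time, y = memory * scale_factor),
            linetype = "dashed") +
  scale_y_continuous(name = "threads",
                     # the right-hand axis undoes the rescaling for its labels
                     sec.axis = sec_axis(~ . / scale_factor, name = "memory"))
print(p)
```

The dashed memory line is drawn on the threads scale, and the right-hand axis converts the tick labels back to memory units.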