Split plot legend per group of variables - r

I would like to split the legend that is generated by ggplot based on the variables that are taken from the data frame. I'm starting from melted data.
Using the Iris data set, I'd like to get this:
What I'm looking for is a legend box that includes only the Sepal variables and another legend box with the Petal variables. My real data comes from several different sources, and I would like to make clear which source each variable belongs to.
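One possible approach, sketched under assumptions since the target figure is not shown: ggplot2 draws a separate legend for each aesthetic, so mapping the Sepal variables to `colour` and the Petal variables to `fill` yields two legend boxes. The `id` column, the scatter layout, and the names `long`, `sepal` and `petal` are illustrative choices, not part of the original question.

```r
library(ggplot2)

# Melt the four measurement columns into long format with base R.
long <- stack(iris[, 1:4])               # columns: values, ind
names(long) <- c("value", "variable")
long$id <- rep(seq_len(nrow(iris)), times = 4)  # hypothetical row index for the x-axis

# Split by variable group and drop the unused factor levels in each half.
sepal <- droplevels(long[grepl("^Sepal", long$variable), ])
petal <- droplevels(long[grepl("^Petal", long$variable), ])

# Sepal uses the colour aesthetic, Petal uses fill (shape 21 has a fill),
# so each group gets its own legend box.
p <- ggplot(mapping = aes(x = id, y = value)) +
  geom_point(data = sepal, aes(colour = variable)) +
  geom_point(data = petal, aes(fill = variable), shape = 21) +
  labs(colour = "Sepal", fill = "Petal")
print(p)
```

If both groups need the same aesthetic, an add-on package is usually required; this sketch stays within ggplot2 itself.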

Related

Making individual histograms for multiple categories from one sheet in R

I have a data set with multiple categories of study type for pond data. The column of overall categories is organized with each type having individual values that follow. I can make a histogram for each when I produce individual sheets to use. I have dug around for a while, but cannot find how to make the same histogram for the study types from the overall data set.
Piece of data sheet that I am working with. As you can see, there are multiple study types that we have each with their own data.
Basically, I want to pull each individual study type and its num_divided values to make a histogram for each type. My end goal is to make one image with the 9 different histograms stacked above one another, each with the same x-axis values and its own name on the left-hand side.
The trouble I am running into is that when I make the histograms from the separated sheets, I cannot make the stacked image I want. I apologize in advance if this lacks some information, but I also thank anyone that offers advice.
ggplot2 is a good option for this.
You didn't give reproducible data but it's easy to make some. Here are 9 studies each with 100 values:
set.seed(111)
dat <- data.frame(study = rep(letters[1:9], each = 100), num_divided = rnorm(900))
What you want is a facetted plot.
library(ggplot2)
ggplot(dat, aes(x = num_divided)) + geom_histogram() + facet_grid(study ~ .)
If you don't know much about ggplot2, a good starting point is the R Cookbook.
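The question also asks for each study's name on the left-hand side. A minimal sketch of one way to get that, assuming the same simulated `dat` as above: `facet_grid()` accepts `switch = "y"`, which moves the row strips from the right to the left. The `bins = 30` value is an arbitrary choice for illustration.

```r
library(ggplot2)

# Same simulated data as in the answer: 9 studies, 100 values each.
set.seed(111)
dat <- data.frame(study = rep(letters[1:9], each = 100),
                  num_divided = rnorm(900))

# One histogram per study, stacked vertically with a shared x-axis;
# switch = "y" places the study labels on the left.
p <- ggplot(dat, aes(x = num_divided)) +
  geom_histogram(bins = 30) +
  facet_grid(study ~ ., switch = "y")
print(p)
```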

Box plot categories from two variables

Sorry for the basic question, I am new to R.
I would like to draw a box plot with subcategories, and then with measurements taken over time.
For example I have tried this:
boxplot(field_data$week_1~field_data$field, ylab= 'number of infected plants')
This gives me two box plots (field is either ‘north’ or ‘south’). I want to split each boxplot into two boxplots by "position" variable (1 or 2). Is there a way to make it so that I will still have a plot with 2 main categories defined by "field", but then each will consist of two boxplots defined by "position" variable. I would also then like to plot the results from the ‘week_2’ readings next to the 'week_1' set of box plots. All of the data is in one df. I have other variables ('beds' and 'rows') with different levels too that categorise the measurements taken.
I have tried with ggplot but not sure how to do this or if this is the right function.
Thank you.
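A minimal sketch of one way this could be done with ggplot2, assuming the column names from the question (`field`, `position`, `week_1`, `week_2`). The simulated `field_data` below is a hypothetical stand-in for the real data. The idea is to stack the weekly columns into long format, then use `fill` for position and a facet for week.

```r
library(ggplot2)

# Hypothetical stand-in for field_data; values are simulated.
set.seed(1)
field_data <- data.frame(
  field    = rep(c("north", "south"), each = 20),
  position = factor(rep(rep(1:2, each = 10), times = 2)),
  week_1   = rpois(40, lambda = 5),
  week_2   = rpois(40, lambda = 8)
)

# Stack week_1 and week_2 into a single value column (long format).
long <- data.frame(
  field    = rep(field_data$field, times = 2),
  position = rep(field_data$position, times = 2),
  week     = rep(c("week_1", "week_2"), each = nrow(field_data)),
  infected = c(field_data$week_1, field_data$week_2)
)

# Two main categories on the x-axis (field), each split by position,
# with the two weeks shown side by side as panels.
p <- ggplot(long, aes(x = field, y = infected, fill = position)) +
  geom_boxplot() +
  facet_wrap(~ week) +
  ylab("number of infected plants")
print(p)
```

Other grouping variables such as 'beds' or 'rows' could be carried into `long` in the same way and used as additional facets.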

ggplot making a descriptive bar graph with no clear y variable

I've been trying to create a proportional stacked bar graph using ggplot and a huge data set with two columns: a dummy variable and a factor variable with 14 different levels.
I posted a small sample of the data here.
Despite not having a clear y-variable in my data, I can produce a plot, but it is only really useful for the factor levels that have a lot of observations; when there are only one or two, you can't see the proportion at all. The code I used is here.
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar()
ggplot says you need to apply a ddply function to the data frame.
ce<-ddply(data,"factor",transform, percent_y=y/sum(y)*100)
Their example doesn't really apply in the case of this data since there's no clear y-variable to call in the plot; just counts of each factor that is 1 or 0.
My best guess for a ddply function spits out an error about differing numbers of rows.
ce<-ddply(plot,"factor(data$factor)",transform,
percent=sum(data$dummy)*100/(dim(data$dummy)[1]))
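One way to avoid computing percentages by hand, sketched here on hypothetical simulated data: `geom_bar(position = "fill")` counts the rows itself and rescales every bar to a height of 1, so the proportions stay visible even for levels with very few observations. The column names `factor` and `dummy` follow the question; the values are invented for illustration.

```r
library(ggplot2)

# Hypothetical data: a 0/1 dummy and a factor with very uneven group sizes.
set.seed(42)
data <- data.frame(
  factor = factor(sample(letters[1:5], 200, replace = TRUE,
                         prob = c(0.5, 0.3, 0.15, 0.04, 0.01))),
  dummy  = rbinom(200, 1, 0.4)
)

# No y aesthetic is needed: geom_bar() counts rows, and position = "fill"
# normalises each bar so the stacked segments show proportions.
p <- ggplot(data, aes(x = factor, fill = factor(dummy))) +
  geom_bar(position = "fill") +
  labs(x = "factor", y = "proportion", fill = "dummy")
print(p)
```

Note that the columns are referenced by bare name inside `aes()` rather than as `data$factor`.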

How to structure data for R?

So... newbie R user here. I have some observations that I'd like to record using R and be able to add to later.
The items are sorted by weights, and the number at each weight recorded. So far what I have looks like this:
weights <- c(rep(171.5, times=1), rep(171.6, times=2), rep(171.7, times=4), rep(171.8, times=18), rep(171.9, times=39), rep(172.0, times=36), rep(172.1, times=34), rep(172.2, times=25))
There will be a total of 500 items being observed.
I'm going to be taking additional observations over time to (hopefully) see how the distribution of weights changes with use/wear. I'd like to be able to make plots showing either stacked histograms or boxplots.
What would be the best way to format / store this data to facilitate this kind of use case? A matrix, dataframe, something else?
As other comments have suggested, the most versatile (and perhaps most useful) container for your data would be a data frame, which works well with library(ggplot2) for your future plotting needs, such as box plots and various histograms.
Toy example
All the code below does is use your weights vector above, to create a data frame with some dummy IDs and plot a box and whisker plot, and results in the below plot.
library(ggplot2)
IDs <- sample(LETTERS[1:5], length(weights), TRUE)  # dummy ID values
# make a data frame from your original `weights` vector
df <- data.frame(ID = IDs, Weights = weights)
ggplot(data = df, aes(factor(ID), Weights)) + geom_boxplot()  # box plot
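To add observations later, one option is a long data frame with a column marking the observation round, so that each new round is appended with `rbind()`. This is a sketch under assumptions: the `session` column name and the `round2` values are hypothetical (simulated purely for illustration), while `round1` reproduces the counts from the question.

```r
library(ggplot2)

# First round: expand the weight/count pairs from the question into rows.
round1 <- data.frame(
  session = "round 1",
  weight  = rep(c(171.5, 171.6, 171.7, 171.8, 171.9, 172.0, 172.1, 172.2),
                times = c(1, 2, 4, 18, 39, 36, 34, 25))
)

# Hypothetical later round; these values are simulated, not real data.
set.seed(1)
round2 <- data.frame(
  session = "round 2",
  weight  = round(rnorm(nrow(round1), mean = 171.9, sd = 0.15), 1)
)

# Appending a new round is a single rbind().
obs <- rbind(round1, round2)

# One box per round, to compare how the distribution shifts over time.
p <- ggplot(obs, aes(x = session, y = weight)) +
  geom_boxplot()
print(p)
```

The same `obs` data frame would also feed `geom_histogram()` with `facet_grid(session ~ .)` for stacked histograms.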

Boxplots using ggplot2

I am completely new to using ggplot2 but have heard of its great plotting capabilities. I have a list of different samples, and for each sample there are observations from three instruments. I would like to turn that into a figure with boxplots. I cannot include a figure, but the code to make an example figure is included below. The idea is to have, for each instrument, a figure with boxplots for each sample.
In addition, next to the plots I would like to make a sort of legend giving a name to each of the sample numbers. I have no idea on how to start doing this with ggplot2.
Any help will be appreciated
The R-code to produce the example image is:
#Make data example
Data<-list();
Data$Sample1<-matrix(rnorm(30),10,3);
Data$Sample2<-matrix(rnorm(30),10,3);
Data$Sample3<-matrix(rnorm(30),10,3);
Data$Sample4<-matrix(rnorm(30),10,3);
#Make the plots
par(mfrow=c(3,1)) ;
boxplot(data.frame(Data)[seq(1,12,by=3)],names=c(1:4),xlab="Sample number",ylab="Instrument 1");
boxplot(data.frame(Data)[seq(2,12,by=3)],names=c(1:4),xlab="Sample number",ylab="Instrument 2");
boxplot(data.frame(Data)[seq(3,12,by=3)],names=c(1:4),xlab="Sample number",ylab="Instrument 3");
First, you'll want to set your data up differently: as a data.frame rather than a list of matrices. You want one column for sample, one column for instrument, and one column for the observed value. Here's a fake dataset:
df <- data.frame(sample = rep(c("One","Two","Three","Four"),each=30),
instrument = rep(rep(c("My Instrument","Your Instrument","Joe's Instrument"),each=10),4),
value = rnorm(120))
> head(df)
  sample    instrument       value
1    One My Instrument  0.08192981
2    One My Instrument -1.11667766
3    One My Instrument  0.34117450
4    One My Instrument -0.42321236
5    One My Instrument  0.56033804
6    One My Instrument  0.32326817
To get three plots, we're going to use faceting. To get boxplots we use geom_boxplot. The code looks like this:
ggplot(df, aes(x=sample,y=value)) +
geom_boxplot() +
facet_wrap(~ instrument, ncol=1)
Rather than including a legend for the sample numbers, if you put the names directly in the sample variable it will print them below the plots. That way people don't have to reference numbers to names: it's immediately clear what sample each plot is for. Note that ggplot puts the factors in alphabetical order by default; if you want a different ordering you have to change it manually.
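As a sketch of changing the ordering manually, one common approach is to convert `sample` to a factor with explicit `levels`. This assumes the same fake dataset as above; the seed is added only to make the example repeatable.

```r
library(ggplot2)

# Same fake dataset as in the answer.
set.seed(1)
df <- data.frame(
  sample = rep(c("One", "Two", "Three", "Four"), each = 30),
  instrument = rep(rep(c("My Instrument", "Your Instrument", "Joe's Instrument"),
                       each = 10), 4),
  value = rnorm(120)
)

# Without this, the x-axis would be ordered alphabetically: Four, One, Three, Two.
df$sample <- factor(df$sample, levels = c("One", "Two", "Three", "Four"))

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_boxplot() +
  facet_wrap(~ instrument, ncol = 1)
print(p)
```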

Resources