How to create a barplot with several variables - r

I am very new to R and I am trying to create a barplot with my infiltration on the y axis and burn on the x axis comparing the grazing treatment (example of the data set is below)
Infiltration Grazing Burn
3301.145496 G S
8165.771889 U S
9937.833576 G L
11576.5892 U L
32739.07643 G N
25923.84328 U N
25942.3 G C
So far I have produced the code so that it reads the table
#reads the basic data in
ana<-read.table("D:\\Dave.txt",head=T)
attach(ana)
head(ana)
dim(ana)
However I do not understand how to write the code to produce a barplot.
I have attached an image of how I produced the graph in Excel.
Also, I do not have ggplot installed, so how do I write this using "barplot(....."?

With the sample dataset available, an example could be:
barplot(as.matrix(mtcars[,10:11]), beside = TRUE)

With your dataset, and assuming 'ana$Burn' and 'ana$Grazing' are coded as factors:
barplot(xtabs(ana$Infiltration ~ ana$Burn + ana$Grazing),beside = TRUE)
beside = TRUE draws the bars side by side; to stack them instead, use beside = FALSE (try both and see which you prefer).
If 'ana$Burn' and 'ana$Grazing' are not coded as factors, convert them first:
ana$Burn = as.factor(ana$Burn)
ana$Grazing = as.factor(ana$Grazing)
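As a minimal sketch, assuming the seven sample rows from the question are typed in by hand rather than read from the original file: putting Grazing first in the formula should make Burn the grouping on the x axis, with one bar per grazing treatment inside each group. The colours and legend settings are illustrative choices, not part of the original answer.

```r
# Sample rows copied from the question (typed in here as an assumption,
# instead of reading the original file).
ana <- data.frame(
  Infiltration = c(3301.145496, 8165.771889, 9937.833576, 11576.5892,
                   32739.07643, 25923.84328, 25942.3),
  Grazing = factor(c("G", "U", "G", "U", "G", "U", "G")),
  Burn    = factor(c("S", "S", "L", "L", "N", "N", "C"))
)

# Rows = Grazing, columns = Burn. With beside = TRUE, barplot() draws one
# group of bars per column, so Burn ends up on the x axis.
tab <- xtabs(Infiltration ~ Grazing + Burn, data = ana)

barplot(tab, beside = TRUE,
        xlab = "Burn", ylab = "Infiltration",
        col = c("grey30", "grey80"),
        legend.text = rownames(tab),
        args.legend = list(title = "Grazing", x = "topleft"))
```

Note that xtabs() sums the values in each cell and fills a combination that has no rows (here U with C) with 0, so with replicated data you may prefer to compute means first.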


Gradient colours in ggplot (relatively simple)

I have a dataframe which I have constructed by interpolating a series of origin destination points (they relate to a cycle share scheme that used to run in Seattle).
I've called the dataframe interpolated_flows:
line_id long lat seg_num count
1 1 -122.3170 47.61855 1 155
2 1 -122.3170 47.61911 2 155
3 1 -122.3170 47.61967 3 155
4 1 -122.3170 47.62023 4 155
5 1 -122.3169 47.62079 5 155
6 1 -122.3169 47.62135 6 155
What I would like to do (and I think is relatively simple if you know ggplot) is to plot these flows (lines) with the width of a line determined by the count and the gradient determined by the seg_num.
This is my attempt so far:
#Create variables to store relevant data for simplicity of code
X <- interpolated_flows$long
Y <- interpolated_flows$lat
sgn <- interpolated_flows$seg_num
ct <- interpolated_flows$count
#Create a map from flow data and include the bounded box as a base
g <- ggplot(interpolated_flows,aes(x=X, y=Y),group=interpolated_flows$line_id,color=sgn)
map <- ggmap(seattle_map,base_layer = g)
map <- map + geom_path(size=as.numeric(ct)/100,alpha=0.4)+
scale_alpha_continuous(range = c(0.03, 0.3))+coord_fixed(ratio=1.3)+
scale_colour_gradient(high="red",low="blue")
png(filename='Seattle_flows_gradient.png')
print(map)
dev.off()
And I end up with the image attached. I have spent a long time playing around with various parameters in the plotting part of the code but without success so would really appreciate if someone could point me in the right direction.
Edit:
base <- ggplot(interpolated_flows,aes(x=X, y=Y))
map <- ggmap(seattle_map,base_layer = g)
map <- map+geom_path(aes(color=seg_num,size=as.numeric(count)))+
scale_size_continuous(name="Journey Count",range=c(0.05,0.4))+
scale_color_gradient(name="Journey Path",high="white",low="blue",breaks=c(1,10), labels=c('Origin','Destination'))+
coord_fixed(ratio=1.3)+scale_x_continuous("", breaks=NULL)+
scale_y_continuous("", breaks=NULL)
png(filename='Seattle_flows_gradient.png')
print(map)
dev.off()
This is the plot I have now got to which looks like this. I have only two questions - 1) does anyone know a way to improve the resolution of the background map? I tried changing the zoom parameter in the get_map function but it didn't seem to help. 2) The lines I have plotted seem very 'white' heavy. It doesn't look to me like the gradient is evenly distributed. Anyone have any ideas why this would be and how to fix?
See if this suits you. I have created a new dataset so that the differences are visible. Once the data.frame is created, you can pass it as the first argument to ggplot and reference its columns by name, as Mako212 says.
long<-seq(-122,-123,length.out = 6)
lat<-seq(47,48,length.out = 6)
seg_num<-seq(1,6,1)
count<-seq(155,165,length.out = 6)
interpolated_flows<-data.frame(long,lat,seg_num,count,stringsAsFactors = FALSE)
base_plot<-ggplot(interpolated_flows,aes(x=long, y=lat))
base_plot+
geom_path(aes(color=seg_num,size=as.numeric(count/100),alpha=lat))+
#notice that size, color and alpha are mapped inside aes()
scale_size_continuous(name="Count")+
scale_alpha_continuous(name="Latitude",range = c(0.03, 0.3))+ #not needed if you don't want variable transparency;
#in that case set alpha to the desired fixed value outside aes()
scale_color_gradient(name="Seg_num",high="red",low="blue")+
coord_fixed(ratio=1.3)
Hope it helps

Multiple histograms in Julia using Plots.jl

I am working with a large number of observations, and to really get to know them I want to draw histograms using Plots.jl.
My question is how I can do multiple histograms in one plot as this would be really handy. I have tried multiple things already, but I am a bit confused with the different plotting sources in julia (plots.jl, pyplot, gadfly,...).
I don't know if it would help for me to post some of my code, as this is a more general question. But I am happy to post it, if needed.
There is an example that does just this:
using Plots
pyplot()
n = 100
x1, x2 = rand(n), 3rand(n)
# see issue #186... this is the standard histogram call
# our goal is to use the same edges for both series
histogram(Any[x1, x2], line=(3,0.2,:green), fillcolor=[:red :black], fillalpha=0.2)
I looked for "histograms" in the Plots.jl repo, found this related issue and followed the links to the example.
With Plots, there are two possibilities to show multiple series in one plot:
First, you can use a matrix, where each column constitutes a separate series:
a, b, c = randn(100), randn(100), randn(100)
histogram([a b c])
Here, hcat is used to concatenate the vectors (note the spaces instead of commas).
This is equivalent to
histogram(randn(100,3))
You can apply options to the individual series using a row matrix:
histogram([a b c], label = ["a" "b" "c"])
(Again, note the spaces instead of commas)
Second, you can use plot! and its variants to update a previous plot:
histogram(a) # creates a new plot
histogram!(b) # updates the previous plot
histogram!(c) # updates the previous plot
Alternatively, you can specify which plot to update:
p = histogram(a) # creates a new plot p
histogram(b) # creates an independent new plot
histogram!(p, c) # updates plot p
This is useful if you have several subplots.
Edit:
Following Felipe Lema's links, you can implement a recipe for histograms that share the edges:
using StatsBase
using PlotRecipes
function calcbins(a, bins::Integer)
lo, hi = extrema(a)
StatsBase.histrange(lo, hi, bins) # nice edges
end
calcbins(a, bins::AbstractVector) = bins
@userplot GroupHist
@recipe function f(h::GroupHist; bins = 30)
args = h.args
length(args) == 1 || error("GroupHist should be given one argument")
bins = calcbins(args[1], bins)
seriestype := :bar
bins, mapslices(col -> fit(Histogram, col, bins).weights, args[1], 1)
end
grouphist(randn(100,3))
Edit 2:
Because it is faster, I changed the recipe to use StatsBase.fit for creating the histogram.

How to put 2 boxplot in one graph in R without additional libraries?

I have this kind of dataset
Defect.found Treatment Program
1 Testing Counter
1 Testing Correlation
0 Inspection Counter
3 Testing Correlation
2 Inspection Counter
I would like to create two boxplots, one of detected defects per program and one of detected defects per technique, but in one graph.
Meaning having:
boxplot(exp$Defect.found ~ exp$Treatment)
boxplot(exp$Defect.found ~ exp$Program)
In a joined graph.
Searching on Stackoverflow I was able to create it but with lattice library typing:
bwplot(exp$Treatment + exp$Program ~ exp$Defects.detected)
but I would like to know if it's possible to create the graph without additional libraries like ggplot and lattice.
Prepare the plot window to receive two plots in one row and two columns (default is obviously one row and one column):
par(mfrow = c(1, 2))
My suggestion is to avoid using the word exp, because it is already used for the exponential function. Use for instance mydata.
Defects found against treatment (frame = F suppresses the external box):
with(mydata, plot(Defect.found ~ Treatment, frame = F))
Defects found against program (ylab = NA suppresses the y label because it is already shown in the previous plot):
with(mydata, plot(Defect.found ~ Program, frame = F, ylab = NA))
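Putting the steps together, a minimal self-contained sketch might look like the following. It assumes the five sample rows from the question typed in by hand, and it calls boxplot() directly instead of plot(), since the formula interface of boxplot() also accepts grouping columns that are character rather than factor. The y-axis label text is an illustrative choice.

```r
# Sample rows from the question; the name mydata follows the suggestion above.
mydata <- data.frame(
  Defect.found = c(1, 1, 0, 3, 2),
  Treatment = c("Testing", "Testing", "Inspection", "Testing", "Inspection"),
  Program   = c("Counter", "Correlation", "Counter", "Correlation", "Counter")
)

old_par <- par(mfrow = c(1, 2))   # one row, two columns

# Left panel: defects by treatment.
b1 <- boxplot(Defect.found ~ Treatment, data = mydata, frame = FALSE,
              ylab = "Defects found")

# Right panel: defects by program; ylab = NA hides the repeated y label.
b2 <- boxplot(Defect.found ~ Program, data = mydata, frame = FALSE,
              ylab = NA)

par(old_par)                      # restore the previous layout
```

Restoring the old par() settings at the end keeps later plots in the same session from inheriting the two-column layout.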

Assigning "beanplot" object to variable in R

I have found that the beanplot is the best way to represent my data. I want to look at multiple beanplots together to visualize my data. Each of my plots contains 3 variables, so each one looks something like what would be generated by this code:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
beanplot(a, b ,c ,ylim = c(-4, 4), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
(Would have just included an image but my reputation score is not high enough, sorry)
I have 421 of these that I want to put into one long PDF (EDIT: One plot per page is fine, this was just poor wording on my part). The approach I have taken was to first generate the beanplots in a for loop and store them in a list at each iteration. Then I will use the multiplot function (from the R Cookbook page on multiplot) to display all of my plots on one long column so I can begin my analysis.
The problem is that the beanplot function does not appear to be set up to assign plot objects as a variable. Example:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
plot1 <- beanplot(a, b, ylim = c(-5,5), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
plot1
If you then type plot1 into the R console, you will get back two of the plot parameters but not the plot itself. This means that when I store the plots in the list, I am unable to graph them with multiplot. It will simply return the plot parameters and a blank plot.
This behavior does not seem to be the case with qplot for example which will return a plot when you recall the stored plot. Example:
library(ggplot2)
a <- rnorm(100)
b <- rnorm(100)
plot2 <- qplot(a,b)
plot2
There is no equivalent to the beanplot that I know of in ggplot. Is there some sort of workaround I can use for this issue?
Thank you.
You can simply open a PDF device with pdf() and keep the default parameter onefile=TRUE. Then call all your beanplot()s, one after the other, and close the device with dev.off(). They will all be in one PDF document, each one on a separate page.
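A minimal sketch of that approach is below. To keep it runnable without extra packages, boxplot() is used as a stand-in for beanplot(); the output path, the loop length of 5 (instead of 421), and the random data are all assumptions for illustration.

```r
# Draw several plots into one multi-page PDF.
# boxplot() is a stand-in here; replace it with your beanplot() call.
set.seed(1)
out_file <- file.path(tempdir(), "all_plots.pdf")  # hypothetical output path

pdf(out_file, onefile = TRUE)        # onefile = TRUE is the default
for (i in 1:5) {                     # 421 in the question; 5 here for brevity
  x1 <- rnorm(100)
  x2 <- rnorm(100)
  x3 <- rnorm(100)
  # Each high-level plotting call starts a new page in the PDF.
  boxplot(list(a = x1, b = x2, c = x3), ylim = c(-4, 4),
          main = paste("Plot", i))
}
dev.off()                            # close the device so the file is complete
```

Because the plots are drawn straight onto the device, nothing needs to be stored in a list, which sidesteps the fact that base-graphics style functions do not return a plot object.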

want to use another df for errorbars in R with barplot

I have these two df.
x;
experiment expression
1 HC 50
2 LC 4
3 HR 10
4 LR 2
y;
HC_conf_lo HC_conf_hi LC_conf_lo LC_conf_hi HR_conf_lo HR_conf_hi LR_conf_lo LR_conf_hi
1 63.3293 109.925 2.33971 5.26642 8.8504 16.7707 0.124013 0.434046
I want to use df:y to plot low and high conf. points. Output should be a barplot with errorbars. Can someone show me using lines in the basic package how to do this?
I don't know whether your data are valid, so I am assuming the confidence intervals are correct.
Here's what you can do to get error bars in your data
#First reading in your data
x<-read.table("x.txt", header=T)
y<-read.table("y.txt", header =T)
#reshaping y to merge it with x
y.wide <-data.frame(matrix(t(y),ncol=2,byrow=T)) #Transpose Y,
#matrix with 2 cols, byrow,
#so we get the lo and hi values in one row
names(y.wide)<-c("lo","hi") #name the columns in y.wide
#Make a data.frame of x and y.wide
xy.df <-data.frame(x,y.wide) # this will be used for plotting the error bars
#make a matrix for using with barplot (barplot takes only matrix or table)
xy<-as.matrix(cbind(expression=x$expression,y.wide))
rownames(xy)<-x$experiment #rownames, so barplot can label the bars
#Get ylimts for barplot
ylimits <-range(xy) #xy is a matrix, so $ cannot be used; this covers expression, lo and hi
barx <-barplot(xy[,1],ylim=c(0,ylimits[2])) #get the x co-ords of the bars
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
# ? as don't know if it's C.I, or what
with(xy.df, arrows(barx,expression,barx,lo,angle=90, code=1,length=0.1))
with(xy.df, arrows(barx,expression,barx,hi,angle=90, code=1,length=0.1))
Resultant Plot
But it doesn't look right. This is because your expression values don't fall between the lo and hi values.
With the hack below,
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
with(xy.df, arrows(barx,lo,barx,hi,angle=90, code=2,length=0.1))
with(xy.df, arrows(barx,hi,barx,lo,angle=90, code=2,length=0.1))
The resultant plot
Look at both arrows() calls carefully and you will see how I achieved it.
I would recommend double checking your calculations though.
And this is far easier with ggplot2. Look at this page for examples and code
http://docs.ggplot2.org/0.9.3.1/geom_errorbar.html
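For reference, a compact base-graphics sketch is shown below. It assumes the values from the question are typed in directly (the lo and hi vectors are hypothetical names) and uses a single arrows() call with code = 3, which draws a cap at both ends, in place of the two calls above.

```r
# Values copied from the question; lo and hi are illustrative names.
x <- data.frame(experiment = c("HC", "LC", "HR", "LR"),
                expression = c(50, 4, 10, 2))
lo <- c(63.3293, 2.33971, 8.8504, 0.124013)
hi <- c(109.925, 5.26642, 16.7707, 0.434046)

# barplot() returns the x positions of the bar midpoints.
barx <- barplot(x$expression, names.arg = x$experiment,
                ylim = c(0, max(hi, x$expression)),
                ylab = "expression")

# angle = 90 turns the arrow heads into flat caps;
# code = 3 draws a cap at both the lo and the hi end.
arrows(barx, lo, barx, hi, angle = 90, code = 3, length = 0.1)
```

As noted above, several of the intervals do not contain the bar height, so the bars and the error bars will not line up until the underlying calculations are checked.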
