ggplot plotting problems and error bars - r

So I have some data that I imported into R using read.csv.
d = read.csv("Flux_test_results_for_R.csv", header=TRUE)
rows_to_plot = c(1,2,3,4,5,6,13,14)
d[rows_to_plot,]
It looks like it worked fine:
> d[rows_to_plot,]
strain selective rate ci.low ci.high
1 4051 rif 1.97539e-09 6.93021e-10 5.63066e-09
2 4052 rif 2.33927e-09 9.92957e-10 5.51099e-09
3 4081 (mutS) rif 1.32915e-07 1.05363e-07 1.67671e-07
4 4086 (mutS) rif 1.80342e-07 1.49870e-07 2.17011e-07
5 4124 (mutL) rif 5.53369e-08 4.03940e-08 7.58077e-08
6 4125 (mutL) rif 1.42575e-07 1.14957e-07 1.76828e-07
13 4760-all rif 6.74928e-08 5.41247e-08 8.41627e-08
14 4761-all rif 2.49119e-08 1.91979e-08 3.23265e-08
So now I'm trying to plot the column "rate", with "strain" as labels, and ci.low and ci.high as boundaries for confidence intervals.
Using ggplot, I can't even get the plot to work. This gives a plot where all the dots are at 1 on the y-axis:
g <- ggplot(data=d[rows_to_plot,], aes(x=strain, y=rate))
g + geom_dotplot()
Attempt at error bars:
error_limits = aes(ymax = d2$ci.high, ymin = d2$ci.low)
g + geom_errorbar(error_limits)
As you can tell I'm a complete noob to plotting things in R, any help appreciated.
Answer update
There were two things going on. As per boshek's answer, which I selected, it seems that geom_point(), not geom_dotplot(), was the way to go.
The other issue was that originally, I filtered the data to only plot some rows, but I didn't also filter the error limits by row. So I switched to:
d2 = d[c(1,2,3,4,5,6,13,14),]
error_limits = aes(ymax = d2$ci.high, ymin = d2$ci.low)
g = ggplot(data=d2, ...etc...

A couple general comments. Get away from using attach. Though it has its uses, for beginners it can be very confusing. Get used to things like d$strain and d$selective. That said, once you call the dataframe with ggplot() you can refer to variables in that dataframe subsequently just by their names. Also you really need to ask questions with a reproducible example. This is a very important step in figuring out how to ask questions in R.
Now for the plot. I think this should work:
# ci.low and ci.high are already absolute bounds, so map them directly
error_limits = aes(ymax = ci.high, ymin = ci.low)
ggplot(data=d[rows_to_plot,], aes(x=strain, y=rate)) +
geom_point() +
geom_errorbar(error_limits)
But of course this is untested because you haven't provided a reproducible example.
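For a version that can be pasted and run, here is a minimal sketch that rebuilds the eight rows printed in the question by hand. The name d2 follows the asker's update; the log-scaled y-axis and the rotated axis labels are added assumptions, chosen because the rates span two orders of magnitude. Since ci.low and ci.high are already absolute bounds, they are mapped straight to ymin and ymax.

```r
library(ggplot2)

# The eight rows printed in the question, typed in by hand
d2 <- data.frame(
  strain = c("4051", "4052", "4081 (mutS)", "4086 (mutS)",
             "4124 (mutL)", "4125 (mutL)", "4760-all", "4761-all"),
  selective = "rif",
  rate = c(1.97539e-09, 2.33927e-09, 1.32915e-07, 1.80342e-07,
           5.53369e-08, 1.42575e-07, 6.74928e-08, 2.49119e-08),
  ci.low = c(6.93021e-10, 9.92957e-10, 1.05363e-07, 1.49870e-07,
             4.03940e-08, 1.14957e-07, 5.41247e-08, 1.91979e-08),
  ci.high = c(5.63066e-09, 5.51099e-09, 1.67671e-07, 2.17011e-07,
              7.58077e-08, 1.76828e-07, 8.41627e-08, 3.23265e-08)
)

# Points for the rates, error bars from the interval columns of the same data frame
p <- ggplot(d2, aes(x = strain, y = rate)) +
  geom_point() +
  geom_errorbar(aes(ymin = ci.low, ymax = ci.high), width = 0.2) +
  scale_y_log10() +  # assumption: a log axis keeps the low-rate strains readable
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

Keeping the interval columns inside aes() without the d2$ prefix means the error bars always use the same rows as the points, which avoids the row-mismatch problem described in the update.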

Related

Changing the xlim of numeric value causing error ggplot R

I have a grouped barplot produced using ggplot in R with the following code
ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_discrete(name = "Area",
labels = c("Everywhere", "New York")) +
xlab("Reasons") +
ylab("Proportion of total complaints") +
coord_flip() +
ggtitle("Comparison between NY and all areas")
mTogether is created using the following code
mTogether <- melt(together, id.vars = 'USuniquNegR')
The Data Frame together is made up of
USperReasons USperReasonsNY USuniquNegR
1 0.198343304187759 0.191304347826087 Late Flight
2 0.35987114588127 0.321739130434783 Customer Service Issue
3 0.0667280257708237 0.11304347826087 Lost Luggage
4 0.0547630004601933 0.00869565217391304 Flight Booking Problems
5 0.109065807639208 0.121739130434783 Can't Tell
6 0.00460193281178095 0 Damaged Luggage
7 0.0846755637367694 0.0782608695652174 Cancelled Flight
8 0.0455591348366314 0.0521739130434783 Bad Flight
9 0.0225494707777266 0.0347826086956522 longlines
10 0.0538426138978371 0.0782608695652174 Flight Attendant Complaints
Together can be generated by the following
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
where
USperReasons <- c(0.19834,0.35987,.06672,0.05476,0.10906,.00460,.08467,0.04555,0.02254,0.05384)
USperReasonsNY <- c(0.191304348,0.321739130,0.113043478,0.008695652,0.121739130,0.000000000,0.078260870,0.05217391,0.034782609,0.078260870)
USuniquNegR <- c("Late Flight","Customer Service Issue","Lost Luggage","Flight Booking Problems","Can't Tell","Damaged Luggage","Cancelled Flight","Bad Flight","longlines","Flight Attendant Complaints")
The problem is when I try change xlim of the ggplot using
+ xlim(0, 1)
I just seem to get an error:
Discrete value supplied to continuous scale
I can't understand why this happens but I need to resolve it because currently the x axis starts below 0 and is very highly packed:
image of ggplot output
The problem is that you are cbind()ing your column vectors together, which converts the numbers to characters. Fix that and the rest should fix itself.
together<-data.frame(USperReasons,USperReasonsNY,USuniquNegR)
You need to remove the cbind from
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
because str(together) shows that all three columns are factors (character vectors in R 4.0 and later), not numbers.
With
together <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)
the plot looks reasonable to me (without having to use ylim or xlim).
So, the error was not within ggplot2 but in data preparation.
Therefore, please, provide a full working example which can be copied, pasted and run when asking a question next time. Thank you.
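The coercion is easy to see in isolation. The following is a small sketch with made-up vectors (nums and reasons are illustrative names, not the asker's data) that compares the two ways of building the data frame:

```r
nums    <- c(0.198, 0.360, 0.067)
reasons <- c("Late Flight", "Customer Service Issue", "Lost Luggage")

# cbind() builds a matrix, and a matrix holds a single type,
# so the numbers are coerced to character strings
m <- cbind(nums, reasons)
typeof(m)

# Wrapping that matrix in data.frame() cannot bring the numbers back
bad <- data.frame(m)
is.numeric(bad$nums)

# Passing the vectors directly lets each column keep its own type
good <- data.frame(nums, reasons)
is.numeric(good$nums)
```

Running str() on both data frames shows the difference at a glance, which is usually the quickest first check when ggplot complains about a discrete value on a continuous scale.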

How to map lots of data points on one plot? when grouping by shape is not applicable

I'm trying to plot two metrics' scores from 50 simulations. I need to map each simulation on the plot, but shape accepts only 8 values, and using color for 50 groups doesn't look good on the plot at all (I tried and it was terrible!). Any suggestions?
myplot<- ggplot(new, aes(sppp_loss, history)) +
geom_point(aes(colour = metric),
position = position_jitter(width = 0.3, height = 0)) +
geom_smooth (aes(x=sppp_loss , y= history, color=metric, group=(metric)), method="lm", se=FALSE)
Subset of data
metric history sppp_loss sim
ED_loss 1.209177471 5 tree1
ED_loss 1.453112762 5 tree2
ED_loss 1.174947503 5 tree3
ED_loss 1.226344648 5 tree4
ED_loss 0.972865697 5 tree5
cheers
Since my comment solved your problem, I'm converting it to an answer:
You could use the value of sim as the actual point markers, but it will be very cluttered with 50 different values. At the least you'll want to shorten it to t1, t2, etc. (or even just the number alone), but faceting is a better option (as noted by @thelatemail). Anyway, here's a conceptual example using a built-in data set:
ggplot(mtcars, aes(factor(gear), mpg)) +
geom_text(aes(label=carb), position=position_jitter(width=0.2,height=0))
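The faceting option mentioned above could look roughly like this. It is a sketch on simulated stand-in data: the column names match the asker's subset, but the values, the second metric name "PD_loss", and the reduced number of simulations (6 rather than 50) are all made up for illustration.

```r
library(ggplot2)

# Made-up stand-in data: 6 simulations x 2 metrics x 4 loss levels
set.seed(1)
sim_df <- expand.grid(
  sim = paste0("tree", 1:6),
  metric = c("ED_loss", "PD_loss"),   # "PD_loss" is a hypothetical second metric
  sppp_loss = c(5, 10, 15, 20)
)
sim_df$history <- 1 + 0.02 * sim_df$sppp_loss + rnorm(nrow(sim_df), sd = 0.1)

# One panel per simulation, so neither shape nor colour has to encode sim
p <- ggplot(sim_df, aes(sppp_loss, history, colour = metric)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  facet_wrap(~ sim)

print(p)
```

With 50 simulations the panels get small, so it may help to set ncol in facet_wrap() and save the plot at a larger size.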

How do you create a bar graph for a data frame in R that uses percentages as the y-axis instead of a count?

If I had a data frame like this (but larger):
ID Rating
12 Good
12 Good
16 Good
16 Bad
16 Very Bad
34 Very Good
38 Very Bad
52 Bad
What would I have to do to make a plot show the percent of the count of each type. Basically, the graph should look like 4 bars on the x-axis for each type of rating and the y-axis should be the percent of the time the rating appears. For example, the data frame above would have 4 bars with Very Bad and Bad being 25%, Good being 37.5% and Very Good being 12.5%. I would really prefer to get an answer in ggplot2, but, since I really can't find this at all, anything in R would work.
This is the best answer I found:
# create data
data <- data.frame(ID = as.factor(c(12,12,16,16,16,34,38,52)),
Rating = c("Good","Good","Good","Bad","Very Bad","Very Good","Very Bad","Bad"))
# get summary table of Rating
t <- table(data$Rating)
# get percentage list
percent <- as.vector(t)/nrow(data)
# plot
library(ggplot2)
ggplot(data = data,aes(x=Rating)) +
# after_stat() replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
geom_bar(aes(y = after_stat(count / sum(count)))) +
ylab("Percentage") +
ylim(0,0.4)
library(ggplot2)
# create some data
DT <- data.frame(ID=1:10,Rating=sample(c("Very Good","Good","Bad","Very Bad"),20,replace=TRUE))
ggplot(DT, aes(factor(Rating))) + geom_bar()
Reference: ggplot2 docs
For showing proportions in base barplots, with actual proportions displayed as text over the bars:
tmp.table <- prop.table(table(dat$Rating))
with(dat, barplot(tmp.table, xlab= "Rating", ylab="proportion", ylim=c(0,.40)))
text(x = c(0.75, 2, 3.1, 4.25), y = tmp.table + .01, labels=paste(tmp.table*100,"%"))
Data
dat <- read.csv(text="Rating
Good
Good
Good
Bad
Very Bad
Very Good
Very Bad
Bad")
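Another way to get there, shown here only as a sketch, is to compute the proportions first and then plot them as given. The vector ratings reproduces the eight example rows from the question; props is an illustrative name, and the percent axis labels rely on the scales package, which is installed alongside ggplot2.

```r
library(ggplot2)

ratings <- c("Good", "Good", "Good", "Bad", "Very Bad",
             "Very Good", "Very Bad", "Bad")

# Share of each rating, as a data frame with columns Rating and Freq
props <- as.data.frame(prop.table(table(Rating = ratings)))

# geom_col() plots the precomputed heights as they are
p <- ggplot(props, aes(x = Rating, y = Freq)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  ylab("Percentage")

print(p)
```

Precomputing keeps the numbers available for inspection (print props to check them against the expected 25%, 25%, 37.5% and 12.5%) before anything is drawn.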

ggplot2: how to overlay 2 plots when using stat_summary

I am totally new to R, so maybe the answer to the question is trivial, but I couldn't find any solution after searching the net for days.
I am using ggplot2 to create graphs containing the mean of my samples with the confidence interval in a ribbon (I can't post the pic, but something like this: S1).
I have a data frame (df) with time in the first column and the values of the variable measured in the other columns (each column is a replicate of the measurement).
I do the following:
mdf<-melt(df, id='time', variable_name="samples")
p <- ggplot(data=mdf, aes(x=time, y=value)) +
geom_point(size=1,colour="red")
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, geom=geom, colour="red", ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
and I get the graph I have shown at the beginning.
My question is: if I have two different data frames, each one with a different variable, measured in the same sample at the same time, how can I plot the two graphs in the same plot? Everything I have tried ends up doing the statistics on both sets of data together or on just one of them, but not on each of them. Is it possible just to overlay the plots?
And a second small question: is it possible to change the colour of the ribbon?
Thanks!
something like this:
library(ggplot2)
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5),
g = rep(c('a', 'b'), each = 20))
ggplot(a, aes(x=x,y=y, group = g, colour = g)) +
geom_point(aes(colour = g)) +
geom_smooth(aes(fill = g))
I'd suggest reading up on the basics of ggplot. Check ?ggplot2 for help on ggplot, as well as the available help topics, particularly on how the group aesthetic may be manipulated.
You may find the discussion group on Google Groups useful, and might want to join it. Quick-R also has a lot of examples of ggplot graphs, as does, obviously, Stack Overflow.
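For the case where the two variables really do live in two separate data frames, one possible approach is to tag each data frame with its origin, stack them, and let the colour and fill aesthetics split the summaries. The sketch below uses simulated data (df1, df2 and the labels "var1" and "var2" are made up) and mean_se from ggplot2, which draws the mean plus or minus one standard error rather than the confidence interval from the question; mean_cl_normal can be swapped in if Hmisc is installed. The fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
times <- rep(c(1, 2, 3, 5, 7, 10), each = 5)  # 5 replicates per time point

# Two made-up data frames in long format (time, value)
df1 <- data.frame(time = times, value = rnorm(30, mean = 4 - 0.3 * times))
df2 <- data.frame(time = times, value = rnorm(30, mean = 1 + 0.2 * times))

# Tag each data frame with its origin, then stack them
df1$variable <- "var1"
df2$variable <- "var2"
both <- rbind(df1, df2)

cols <- c(var1 = "steelblue", var2 = "darkorange")

# colour/fill split the data, so each summary is computed per variable
p <- ggplot(both, aes(time, value, colour = variable, fill = variable)) +
  geom_point(size = 1) +
  stat_summary(fun.data = mean_se, geom = "ribbon", alpha = 0.3, colour = NA) +
  stat_summary(fun = mean, geom = "line") +
  scale_fill_manual(values = cols) +    # ribbon colours
  scale_colour_manual(values = cols)    # point and line colours

print(p)
```

The fill scale is what controls the ribbon colour, which also covers the second question.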

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
library(ggplot2)
library(reshape2) # for melt()
melt_df <- melt(df)
head(melt_df) # so you can see it
ggplot(melt_df, aes(Date,value,fill=Date)) +
geom_bar(stat = "identity") + # heights come from value, not from counting rows
facet_wrap(~ variable)
However, I think in general, that changes over time are much better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
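If side-by-side bars in a single panel are still wanted, as the question asked, a dodged bar chart on the melted data is one option. This sketch types in the three rows from the question; the column names category and count are illustrative choices, and geom_col() assumes ggplot2 2.2.0 or later.

```r
library(ggplot2)
library(reshape2)  # for melt()

# The three weeks shown in the question
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9),
  B = c(0, 0, 0),
  C = c(11, 3, 2),
  D = c(1, 3, 6)
)

# Long format: one row per Date/category pair
long <- melt(df, id.vars = "Date",
             variable.name = "category", value.name = "count")

# position = "dodge" places the categories next to each other instead of stacking
p <- ggplot(long, aes(x = Date, y = count, fill = category)) +
  geom_col(position = "dodge") +
  labs(title = "Number per category", x = "Date", y = "Number") +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```

Date is kept as text here, so the bars are ordered alphabetically; for a longer series it is probably safer to convert it with as.Date(Date, format = "%d-%m-%Y") first.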
