I would like to recreate the following chart in R using ggplot. My data is laid out like the table shown: for each code (A, B, C, etc.) I have a current value, the value 12M ago, and the range (max, min) over the period.
My chart needs to show the current value in red, the value 12M ago in blue, and a line showing the range from min to max.
I can produce this painstakingly in Excel using error bars, but I would like to reproduce it in R.
Any ideas on how I can do this using ggplot? Thanks.
Here's what I came up with, but first a note: when you post your dataset, please don't post an image. Post the result of dput(your.data.frame) instead. That output is easily copy-pasted into the console to replicate your dataset, whereas I had to recreate your data frame manually. :/
A few points first regarding your data as is and the intended plot:
The red and blue hash marks used to indicate 12 months ago and today are not a geom I know of off the top of my head, so I'm using geom_point here to show them (easiest way). You can pick another geom if you wish to show them differently.
The ranges for high and low are already specified by those column names. I'll use those values for the required aesthetics in geom_errorbar.
You can use your data as is to plot and use two separate geom_point calls (one for "today" and one for "12M ago"), but that's going to make creating the legend more difficult than it needs to be, so the better option is to adjust the dataset to support having the legend created automatically. For that, we'll use the gather function from tidyr, being sure to just "gather together" the information in "today" and "12M ago" (my column name for that was different b/c you need to start with a letter in the data frame), but leave alone the columns for "high", "low", and the letters (called "category" in my dataframe).
Where df is the original data frame:
library(tidyr)
df1 <- df %>% gather(time, value, -category, -high, -low)
The new dataframe (df1) looks like this (18 observations total):
category high low time value
1 A 82 28 M12.ago 81
2 B 82 54 M12.ago 80
3 C 80 65 M12.ago 75
4 D 76 34 M12.ago 70
5 E 94 51 M12.ago 93
6 F 72 61 M12.ago 65
where "time" has "M12.ago" or "today".
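In current tidyr releases gather() is superseded by pivot_longer(). A minimal sketch of the same reshape follows. It assumes only the six categories shown above; the high, low and M12.ago numbers come from that table, while the today values are made-up placeholders because the original ones are not shown.

```r
library(tidyr)

# high, low and M12.ago are copied from the table above;
# the `today` values are hypothetical placeholders.
df <- data.frame(
  category = c("A", "B", "C", "D", "E", "F"),
  high     = c(82, 82, 80, 76, 94, 72),
  low      = c(28, 54, 65, 34, 51, 61),
  M12.ago  = c(81, 80, 75, 70, 93, 65),
  today    = c(50, 60, 70, 40, 60, 66)
)

# Stack M12.ago and today into one "value" column, labelled by "time";
# category, high and low are repeated for each stacked row.
df1 <- pivot_longer(df, cols = c(M12.ago, today),
                    names_to = "time", values_to = "value")
```

The rows come out interleaved (A/M12.ago, A/today, B/M12.ago, ...) rather than stacked block by block as with gather(), which makes no difference to the plot.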
For the plot, you apply category to x and value to y, and specify ymax and ymin with high and low, respectively for the geom_errorbar:
library(ggplot2)
ggplot(df1, aes(x=category, y=value)) +
  geom_errorbar(aes(ymin=low, ymax=high), width=0.2) + ylim(0,100) +
  geom_point(aes(color=time), size=2) +
  # values takes a named vector: names match the levels of "time"
  scale_color_manual(values=c('M12.ago'='blue', 'today'='red')) +
  theme_bw() + labs(color="") + theme(legend.position='bottom')
Giving you this:
I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. What I thus ideally want is the frequencies (5,10,20,30,40) in the x-axis and the amount of synapses/cells plotted on the y-axis (usually a numerical value from 10 - 20). The graph then will contain 5 lines of the different ages (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph, with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph
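One possible cause, which the post does not confirm, is that Frequency was imported as text or as a factor: text values sort alphabetically, so "5" lands after "40". The sketch below uses made-up puncta6 values and assumes the Frequency entries are plain numbers stored as text; converting them to numeric makes group = 1 unnecessary.

```r
library(ggplot2)

# Hypothetical data: Frequency stored as text, as can happen on import
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12.1, 15.3, 17.8, 16.4, 13.0)
)

# Convert to numbers (as.character() first, in case the column is a factor)
mydata$Frequency <- as.numeric(as.character(mydata$Frequency))

# With a numeric x, geom_line() connects the points in frequency order
p <- ggplot(mydata, aes(x = Frequency, y = puncta6)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = mydata$Frequency)
```

For all five ages on one plot, the same long-format idea applies: one column for age, one for the count, and aes(colour = age, group = age).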
I've seen similar questions asked, and this discussion about adding functionality to ggplot: Setting x/y lim in facet_grid. In my research I often want to produce several panel plots, say for different simulation trials, where the axis limits remain the same to highlight differences between the trials. This is especially useful when showing the plot panels in a presentation. In each panel plot I produce, the individual plots require independent y axes as they're often weather variables: temperature, relative humidity, windspeed, etc. Using
ggplot() + ... + facet_wrap(~ ..., scales = 'free_y')
works great as I can easily produce plot panels of different weather variables.
When I compare different panel plots, it's nice to have consistent axes. Unfortunately ggplot provides no way of setting the individual limits of each plot within a panel plot. It defaults to using the range of the given data. The Google Group discussion linked above discusses this shortcoming, but I was unable to find any updates as to whether this could be added. Is there a way to trick ggplot into setting the individual limits?
A first suggestion that somewhat sidesteps the solution I'm looking for is to combine all my data into one data table and use facet_grid on my variable and simulation
ggplot() + ... + facet_grid(variable~simulation, scales = 'free_y')
This produces a fine looking plot that displays the data in one figure, but can become unwieldy when considering many simulations.
To 'hack' the plotting into producing what I want, I first determined which limits I desired for each weather variable. These limits were found by looking at the greatest extents for all simulations of interest. Once determined I created a small data table with the same columns as my simulation data and appended it to the end. My simulation data had the structure
'year' 'month' 'variable' 'run' 'mean'
1973 1 'rhmax' 1 65.44
1973 2 'rhmax' 1 67.44
... ... ... ... ...
2011 12 'windmin' 200 0.4
So I created a new data table with the same columns
library(data.table)
# two rows per variable: the lower and upper y limit
ylims.sims <- data.table(year = 1, month = 13,
  variable = rep(c('rhmax','rhmin','sradmean','tmax','tmin','windmax','windmin'), each = 2),
  run = 201, mean = c(20, 100, 0, 80, 100, 350, 25, 40, 12, 32, 0, 8, 0, 2))
Which gives
'year' 'month' 'variable' 'run' 'mean'
1 13 'rhmax' 201 20
1 13 'rhmax' 201 100
1 13 'rhmin' 201 0
1 13 'rhmin' 201 80
1 13 'sradmean' 201 100
1 13 'sradmean' 201 350
1 13 'tmax' 201 25
1 13 'tmax' 201 40
1 13 'tmin' 201 12
1 13 'tmin' 201 32
1 13 'windmax' 201 0
1 13 'windmax' 201 8
1 13 'windmin' 201 0
1 13 'windmin' 201 2
While the choice of year and run is arbitrary, the month needs to be a value outside 1:12. I then appended this to my simulation data
# append the limit rows (ylims.sims, created above) to the simulation data
sim1data.ylims <- rbind(sim1data, ylims.sims)
ggplot() + geom_boxplot(data = sim1data.ylims, aes(x = factor(month), y = mean)) +
  facet_wrap(~variable, scales = 'free_y') + xlab('month') +
  xlim('1','2','3','4','5','6','7','8','9','10','11','12')
When I plot these data with the y limits, I limit the x-axis values to those in the original data. The appended data table with y limits has month values of 13. As ggplot still scales axes to the entire dataset, even when the axes are limited, this gives me the y limits I desire. Important to note that if there are data values greater than the limits you specify, this will not work.
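The post does not include a reproducible example. As a rough sketch of the same idea, the limit rows can be supplied through geom_blank() instead of being appended as month-13 rows; this is a substitute technique, not the exact code above. The data, the two variables and the limits below are all simulated for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for one simulation: 20 runs x 12 months x 2 variables
sim1data <- data.frame(
  month    = rep(1:12, times = 2 * 20),
  variable = rep(c("tmax", "windmax"), each = 12 * 20),
  mean     = c(rnorm(240, mean = 30, sd = 1), runif(240, min = 1, max = 6))
)

# Desired y limits: two rows (lower, upper) per variable.
# month = 1 only gives the rows a valid x position; nothing is drawn.
ylims <- data.frame(
  month    = 1,
  variable = rep(c("tmax", "windmax"), each = 2),
  mean     = c(25, 40, 0, 8)
)

p <- ggplot(sim1data, aes(x = factor(month), y = mean)) +
  geom_boxplot() +
  geom_blank(data = ylims) +   # invisible layer that stretches the y scale
  facet_wrap(~variable, scales = "free_y") +
  xlab("month")
```

The same caveat applies as with the appended rows: the invisible layer can only widen each panel's range, so data outside the chosen limits still stretch the axis.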
Before: Notice the differences in the y limits for each weather variable between the panels.
After: Now the y limits remain consistent for each weather variable between the panels.
I hope to edit this post in the coming days and add a reproducible example for better explanation. Please comment if you've heard anything about adding this functionality to ggplot.
Data:
I have a data frame comprising 4 variables and about 300k rows including a unique account ID, a start date in yyyy-mm-dd, a start year, and the total number of months to-date the customer has held an account active. Snippet of the data below (don't let the row numbers confuse, this is obviously a subset, if more data is necessary, let me know):
> head(ten.by.id)
acct.id start_date strt.yr max_ten
1 155 1998-11-01 1998 175
19 902 2001-09-01 2001 143
39 995 2001-09-01 2001 143
59 1014 2000-10-01 2000 153
78 1017 2000-04-01 2000 160
100 1137 2000-11-01 2000 153
Problem (Why I want to render a faceted plot):
Showing a histogram of the entire dataset across all years renders the following:
Obviously, there are mixed distributions of information here, but the effect is unknown. First I thought I'd check for time domain effects with a visual. By using facets, I can provide a serial histogram of frequency distributions by year, overlaying the KDE plot for each year.
If multiple distributions were a product of something that occurred over time, I could spot check relevant shape changes (i.e. uni to multimodal). I used the code below to generate this plot:
# each "+" must end a line; a line that starts with "+" is parsed as a new statement
maxten_time <- ggplot(ten.by.id, aes(max_ten)) +
  geom_histogram(colour="grey19", fill="orange", binwidth=2, stat="bin") +
  scale_y_continuous(breaks=seq(0,12000,by=100)) +
  scale_x_continuous(breaks=seq(0,180,by=45)) +
  labs(title ="Serial Distribution of Max Length of Tenure for all Customers by Start Date", x="Max Tenure(months)", y="# of Customers", colour="blue") +
  facet_grid(. ~ strt.yr) + geom_density(fill=NA, colour="orange", cex=1) + aes(y = ..count..)
Which renders the following:
Questions for recreating the faceted plot:
What I wish to do is add a horizontal line (or some other single marker) to each facet which indicates the total # of customer starts for each year. Can this be done in a faceted plot?
I would like to add an additional axis that spans across the facets to mark the number of months across all years (1 to 175). Am I reaching with ggplot to try to do this (i.e. since each facet is its own plot, would aligning the month markers across all facets even be possible)? I haven't seen any relevant examples on doing something quite like this.
The objective is merely to combine the horiz lines in each facet and the axis across facets into the entire plot. Any direction would be helpful.
Phillip
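For the first question, one common approach is to give geom_hline() its own summary data frame containing the faceting column, so that each facet picks up its own line. The sketch below uses simulated data in place of ten.by.id; the summary table starts and its column n are names made up for this example.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for ten.by.id (only the two columns that are plotted)
ten.by.id <- data.frame(
  strt.yr = rep(c(1998, 2000, 2001), times = c(50, 120, 80)),
  max_ten = c(rpois(50, 175), rpois(120, 155), rpois(80, 143))
)

# One row per start year holding the number of customer starts
starts <- aggregate(max_ten ~ strt.yr, data = ten.by.id, FUN = length)
names(starts)[2] <- "n"

# geom_hline() draws each row of `starts` only in the facet whose
# strt.yr matches, giving one horizontal marker per year
p <- ggplot(ten.by.id, aes(max_ten)) +
  geom_histogram(binwidth = 2, colour = "grey19", fill = "orange") +
  geom_hline(data = starts, aes(yintercept = n), linetype = "dashed") +
  facet_grid(. ~ strt.yr)
```

Because the yearly total is much larger than any single bin count, the line stretches the y axis; a text label per facet (geom_text() fed the same summary table) may read better.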
I'm working on an R Shiny program that can take any csv file and output graphs of it. The user who uploads the csv has some guidelines on how the data should look, but I don't want it to be too strict.
I'm currently trying to use ggplot2 to graph multiple lines of the same dataset on one plot for comparison.
The data I am currently uploading looks like this (simplified, as the data has over 1000 rows):
Date Hamburgers Salads Sodas Fries
12-01 4 4 3 2
12-02 1 7 3 9
12-03 22 24 45 34
12-04 23 44 46 22
I'm trying to output a graph that has the time on the X-axis (the user chooses this via a sidebar, as he can choose any axis, but time makes the most sense here). For the Y axis, I want 4 lines, colored differently, plotting each variable over time.
I have all of the 'user taking in input and choosing which columns to graph' implemented, but for simplicity's sake, we can assume that for the most part, this has been hard coded (so Y variable will actually be input$y, etc in my implementation)
The portion of my code where I try to graph the data is:
output$plotLine <- renderPlot({
p <- ggplot(data, aes_string(x=X, y=Y), environment = environment())
p <- p + geom_point(size = 3)
p <- p + geom_line(aes(group=1))
print(p)
})
This plots one of the lines, but I have no idea how to plot the others on the same plot. I've read about using 'group' in the aes function, but this depends on having a classifier in the dataset, which this one currently does not have.
I have also looked into the melt() function from the reshape2 package but am not sure how it would help me (both for the multiple line problem and the greater sense of this project, so that the user doesn't have to abide by strict rules for upload format of the csv).
Any help would be much appreciated!
Assuming you put the xaxis variable (Date) in selectedxaxis, the selected products in selectedproducts and with data holding the loaded data:
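library(reshape2)
library(ggplot2)
selectedxaxis = "Date"
selectedproducts = c("Sodas", "Salads")
# keep only the x column and the chosen product columns
widedata = subset(data, select = c(selectedxaxis, selectedproducts))
longdata = melt(widedata, id.vars=selectedxaxis, variable.name='Product', value.name='Count')
# group=Product keeps one line per product even when Date is text
ggplot(longdata) + geom_line(aes(Date, Count, color=Product, group=Product))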
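A self-contained sketch of the same approach, using the four sample rows from the question, might look like the following. Looking the x column up through ggplot2's .data pronoun is an addition here, so that the user-chosen column name is not hard-coded; in the Shiny app the two selected* values would come from the inputs.

```r
library(ggplot2)
library(reshape2)

# Sample rows from the question
data <- data.frame(
  Date       = c("12-01", "12-02", "12-03", "12-04"),
  Hamburgers = c(4, 1, 22, 23),
  Salads     = c(4, 7, 24, 44),
  Sodas      = c(3, 3, 45, 46),
  Fries      = c(2, 9, 34, 22)
)

selectedxaxis    <- "Date"                # column chosen for the x axis
selectedproducts <- c("Sodas", "Salads")  # columns chosen for the lines

# Keep the chosen columns, then melt to one row per (Date, Product) pair
widedata <- data[, c(selectedxaxis, selectedproducts)]
longdata <- melt(widedata, id.vars = selectedxaxis,
                 variable.name = "Product", value.name = "Count")

# .data[[...]] looks the x column up by name
p <- ggplot(longdata, aes(x = .data[[selectedxaxis]], y = Count,
                          colour = Product, group = Product)) +
  geom_line() +
  geom_point(size = 3)
```

Melting is what supplies the "classifier" column the question says is missing: Product plays that role, so any number of selected columns becomes one coloured line each.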
I have the following dataframe:
Catergory Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 41
8 Improved Unclear 117
I'm trying to make a grouped bar chart, with Species as the bar height and two colours for Catergory.
Here is my code:
Reasonstats<-read.csv("bothstats.csv")
Reasonstats2<-as.matrix(Reasonstats[,3])
barplot((Reasonstats2),beside=T,col=c("darkblue","red"),ylab="number of species",
        names.arg=Reasonstats$Reason, cex.names=0.8,las=2,space=c(0,100),
        ylim=c(0,120))
box(bty="l")
Now what I want, is to not have to label the two bars twice and to group them apart, I've tried changing the space value to all sorts of things and it doesn't seem to move the bars apart. Can anyone tell me what I'm doing wrong?
With ggplot2:
library(ggplot2)
Animals <- read.table(
header=TRUE, text='Category Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 41
8 Improved Unclear 117')
ggplot(Animals, aes(factor(Reason), Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
Not a barplot solution but using lattice and barchart:
library(lattice)
barchart(Species~Reason,data=Reasonstats,groups=Catergory,
scales=list(x=list(rot=90,cex=0.8)))
There are several ways to do plots in R; lattice is one of them, and always a reasonable solution, +1 to @agstudy. If you want to do this in base graphics, you could try the following:
Reasonstats <- read.table(text="Category Reason Species
Decline Genuine 24
Improved Genuine 16
Improved Misclassified 85
Decline Misclassified 41
Decline Taxonomic 2
Improved Taxonomic 7
Decline Unclear 41
Improved Unclear 117", header=T)
ReasonstatsDec <- Reasonstats[which(Reasonstats$Category=="Decline"),]
ReasonstatsImp <- Reasonstats[which(Reasonstats$Category=="Improved"),]
Reasonstats3 <- cbind(ReasonstatsImp[,3], ReasonstatsDec[,3])
colnames(Reasonstats3) <- c("Improved", "Decline")
rownames(Reasonstats3) <- ReasonstatsImp$Reason
dev.new()  # opens a new graphics device on any platform (windows() is Windows-only)
barplot(t(Reasonstats3), beside=TRUE, ylab="number of species",
cex.names=0.8, las=2, ylim=c(0,120), col=c("darkblue","red"))
box(bty="l")
Here's what I did: I created a matrix with two columns (because your data were in columns) where the columns were the species counts for Decline and for Improved. Then I made those categories the column names. I also made the Reasons the row names. The barplot() function can operate over this matrix, but wants the data in rows rather than columns, so I fed it a transposed version of the matrix. Lastly, I deleted some of your arguments to your barplot() function call that were no longer needed. In other words, the problem was that your data weren't set up the way barplot() wants for your intended output.
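As a shorter variation on the same idea, the two-row matrix can be built in one step with base R's xtabs(). This is only a sketch; the data frame is retyped from the table above, and the legend placement is a choice made for this example.

```r
# Data retyped from the table above
Reasonstats <- data.frame(
  Category = c("Decline", "Improved", "Improved", "Decline",
               "Decline", "Improved", "Decline", "Improved"),
  Reason   = c("Genuine", "Genuine", "Misclassified", "Misclassified",
               "Taxonomic", "Taxonomic", "Unclear", "Unclear"),
  Species  = c(24, 16, 85, 41, 2, 7, 41, 117)
)

# Rows = Category, columns = Reason, cells = Species counts.
# This is already the layout barplot() wants, so no t() is needed.
counts <- xtabs(Species ~ Category + Reason, data = Reasonstats)

# One group of two bars per Reason; legend labels come from the row names
barplot(counts, beside = TRUE, col = c("darkblue", "red"),
        legend.text = TRUE, args.legend = list(x = "topleft"),
        ylab = "number of species", cex.names = 0.8, las = 2,
        ylim = c(0, 120))
box(bty = "l")
```

With beside = TRUE each column of the matrix becomes one labelled group, so every Reason is labelled once and the groups are separated automatically.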
I wrote a wrapper function for barplot() called bar() to do what you are trying to do here, since I need to do similar things frequently. The GitHub link to the function is here. After copying and pasting it into R, you do
bar(dv = Species,
factors = c(Category, Reason),
dataframe = Reasonstats,
errbar = FALSE,
ylim=c(0, 140)) #I increased the upper y-limit to accommodate the legend.
The one convenience is that it will put a legend on the plot using the names of the levels in your categorical variable (e.g., "Decline" and "Improved"). If each of your levels has multiple observations, it can also plot the error bars (which does not apply here, hence errbar=FALSE).