Modify legend and labels of stacked-area plot in R/ggplot2

EDIT: Solved by Haboryme in comments; the problem was my use of xlab and ylab instead of x and y as the names of keyword arguments to labs() (explaining the graph labels), and a redundant use of colour= in the second call to aes() (explaining the persistence of the original legend).
I'd like to make a stacked-area chart from some CSV data with R and ggplot2. For example:
In file "test.csv":
Year,Column with long name 1,Column with long name 2
2000,1,1
2001,1,1.5
2002,1.5,2
I run this code (imitating the answer to this GIS.SE question):
library(ggplot2)
library(reshape)
df <- read.csv('test.csv')
df <- melt(df, id="Year")
png(filename="test.png")
gg <- ggplot(df,aes(x=as.numeric(Year),y=value)) +
# Add a new legend
scale_fill_discrete(name="Series", labels=c("Foo bar", "Baz quux")) +
geom_area(aes(colour=variable,fill=variable)) +
# Change the axis labels and add a title
labs(title="Test",xlab="Year",ylab="Values")
print(gg)
dev.off()
The result, in file "test.png":
Problems: my attempt to change the axis labels was ignored, and my new legend (with code borrowed from the R Cookbook's suggestions) was added to, not substituted for, the (strangely recolored) default one. (Other solutions offered by the R Cookbook, such as calling guides(fill=FALSE), do more or less the same thing.) I'd rather not use the workaround of editing my dataframe (e.g. stripping the periods that read.csv() substitutes for spaces in column headers) so that the default labels turn out correct. What should I do?

ggplot(df,aes(x=as.numeric(Year),y=value)) +
scale_fill_discrete(name="Series", labels=c("Foo bar", "Baz quux")) +
geom_area(aes(fill=variable)) +
labs(title="Test",x="Year",y="Values")
The argument colour in the aes() of geom_area() only colours the contour and hence doesn't add anything to the plot here.
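If you do want coloured outlines as well, a minimal sketch (assuming a recent ggplot2 and a small hand-built data frame standing in for the melted CSV) is to give the colour scale the same name and labels as the fill scale. ggplot2 merges legends whose titles and labels match, so only one legend should appear. The names series_labels and the "A"/"B" series are made up for illustration.

```r
library(ggplot2)

# Stand-in for the melted test.csv (hypothetical series names "A" and "B")
df <- data.frame(
  Year     = rep(2000:2002, 2),
  variable = rep(c("A", "B"), each = 3),
  value    = c(1, 1, 1.5, 1, 1.5, 2)
)

series_labels <- c("Foo bar", "Baz quux")

gg <- ggplot(df, aes(x = Year, y = value)) +
  geom_area(aes(colour = variable, fill = variable)) +
  # Same name and labels on both scales, so the two legends merge into one
  scale_fill_discrete(name = "Series", labels = series_labels) +
  scale_colour_discrete(name = "Series", labels = series_labels) +
  # labs() takes x and y, not xlab and ylab
  labs(title = "Test", x = "Year", y = "Values")

print(gg)
```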

Related

Plotting bar-plot/clustered column charts in R from csv file using ggplot2

I am planning to plot a bar plot/clustered column chart of revenue against time, with a trend line connecting the tops of the bars, for the years 1981 to 1988.
I used this code to read the CSV: read.csv("file_location/Revenue.csv", header = T, sep=",", dec = ".")
for the plotting: pl <- ggplot(data,aes(x=ï..Year))
and then: pl + geom_bar(color='red',fill='blue').
Unfortunately, I end up with something like this, whereas I'd prefer something like this.
I used only the ggplot2 library in this case. Should I use tidyr or dplyr as well? Am I confusing continuous and discrete variables? Any advice on aesthetic changes to beautify the plot, or solutions to the problem itself, would be really appreciated, as I am still learning the basics of ggplot and data visualization.
I have added the file in case you want to check it: Revenue.csv
Check the documentation here for some information, but the big change you should make is to use geom_col in place of geom_bar. Your current call specifies an x= aesthetic (what should be on the x axis), but not a y= aesthetic (what should be on the y axis). geom_bar shows the number of cases/observations at each x value by default, whereas geom_col displays a bar of length y at each x value... but for that you need a y aesthetic.
With all that being said, try this:
pl <- ggplot(data,aes(x=ï..Year, y=your.y.column.name)) +
geom_col(color='red',fill='blue')
As for aesthetics, I might change the color scheme a bit and also the theme, but that's kind of personal preference. My suggestion would be to at least change your color scheme for geom_bar/geom_col. The color= argument specifies the outline of the bars, and fill= is the color of the bars themselves. Your code would give you bright blue bars with a red outline... not awesome. I would also make your bars a bit skinnier by lowering the width= argument from the default of 0.9 (90% of the spacing between x values) to something smaller. Here is an example with a dummy dataset. Most people (me included) would not want to download someone else's data via a link, sorry.
df <- data.frame(x=1:10, y=1:10)
ggplot(df, aes(x=x, y=y)) +
geom_col(fill='steelblue', color='black', width=0.5) +
theme_bw()
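For the trend line connecting the tops of the bars that the question asks about, one possible sketch is to layer geom_line and geom_point over geom_col. The revenue data frame below is dummy data (the real Revenue.csv was not inspected), and the column names Year and Revenue are assumptions.

```r
library(ggplot2)

# Dummy data: made-up revenue values for the years in the question
revenue <- data.frame(
  Year    = 1981:1988,
  Revenue = c(10, 12, 15, 14, 18, 21, 20, 25)
)

pl <- ggplot(revenue, aes(x = Year, y = Revenue)) +
  geom_col(fill = "steelblue", color = "black", width = 0.5) +
  # The line and points reuse the same x/y mapping, so they sit on the bar tops
  geom_line(color = "firebrick") +
  geom_point(color = "firebrick") +
  # One tick per year instead of ggplot2's default continuous breaks
  scale_x_continuous(breaks = revenue$Year) +
  theme_bw()

print(pl)
```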

Annotate is giving error in ggplot2 when using facet

I had previously used annotate() to add letters to facet panels of ggplots. After updating R (to 3.6.1), code that had previously worked with annotate no longer does.
I can solve this by making a separate dataframe to label each facet, but that is cumbersome when I have a decent number of plots to make that vary in how many facets they have. All I want is a letter (e.g., a-f) on each panel for identification in a journal article.
library(ggplot2)
data(diamonds)
ggplot(diamonds, aes(x=carat,y=price)) +geom_point()+ facet_wrap(~cut) + annotate("text",label=letters[1:5],x=4.5,y=15000,size=6,fontface="bold")
ggplot(diamonds, aes(x=carat,y=price)) +geom_point()+ facet_wrap(~cut) + annotate("text",label=letters[1],x=4.5,y=15000,size=6,fontface="bold")
The first ggplot should produce a plot that has the facets labeled with lowercase letters. Instead, I get the error:
Error: Aesthetics must be either length 1 or the same as the data (25): label
The code does work if only one letter is used, as seen in the second ggplot, so annotate will work, but not with multiple values as it previously did.
I almost always use an external data frame for faceted annotations, because it is more traceable to me.
df_labels=unique(diamonds[,"cut"])
df_labels$label=letters[as.numeric(df_labels$cut)] #to preserve factor level ordering
df_labels$x=4.5
df_labels$y=15000
ggplot(diamonds, aes(x=carat,y=price)) +
geom_point()+ facet_wrap(~cut) +
geom_text(data=df_labels,aes(x=x,y=y,label=label))
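If building the label data frame by hand for every plot is the cumbersome part, the same steps could be wrapped in a small helper. This is only a sketch: facet_letters is a hypothetical function name, not part of ggplot2, and it assumes a single faceting column with at most 26 distinct values.

```r
library(ggplot2)

# Hypothetical helper: one row per facet value, lettered in factor-level order
facet_letters <- function(data, facet_var, x, y) {
  out <- unique(data[facet_var])
  out$label <- letters[as.integer(factor(out[[facet_var]]))]
  out$x <- x
  out$y <- y
  out
}

labels_df <- facet_letters(diamonds, "cut", x = 4.5, y = 15000)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  facet_wrap(~cut) +
  geom_text(data = labels_df, aes(x = x, y = y, label = label),
            size = 6, fontface = "bold")

print(p)
```

The helper keeps the faceting column's original class, so the labels should land in the matching panels whatever the number of facets.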

Adding bar labels with italics and superscript to a barplot

I am using barplot for my data.
I need to insert x-axis bar labels (sample names) which have superscripts and should be italicized. For instance, one of the sample names (bar labels) is lab(delta21). Apart from the whole name to be in italics, I want the delta in (delta21) to be in symbol form and (delta21) to be a superscript of lab. (This is nothing fancy, just how biological gene mutant names are written).
I have tried fiddling around with names.arg=expression() but could not get it to work.
Any suggestions/ideas are most welcome.
Please try this minimal example:
x <- rnorm(2)
barplot(x, names.arg = c(expression(paste(italic("1")^"st")), expression(paste(italic("2")^"nd"))))
italic() does the italic part, ^ does the superscript part.
You may need to use ggplot2 to create your barplot because "bold, italic and bolditalic do not apply to symbols, and hence not to the Greek symbols such as mu" quoted from this help page. I am also assuming that different numbers are assigned to different samples (e.g., Lab_delta21, Lab_delta22, etc).
library(ggplot2)
library(reshape)
## make up data
data_table <- cast(mtcars, gear ~., value="mpg", mean)
data_table <- rename(data_table, c("(all)"="mean_mpg"))
lab_number <- 21:23
fancy_labels <- sapply(lab_number, function(x) paste0("italic(Lab[delta]", "[", x, "])"))
ggplot(data_table, aes(gear, mean_mpg)) + geom_bar(stat = "identity") +
scale_y_continuous(limits=c(0, 30))+
geom_text(aes(label=fancy_labels), parse=TRUE, hjust=0.5, vjust=-0.5, size=7)
The second "[]" is necessary, as in [delta][21], because without it geom_text recognizes [delta21] as one word and does not render delta as a Greek letter.
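For the base barplot() route with the delta as part of a superscript, a minimal sketch along the lines of the first answer could use plotmath's paste() for juxtaposition. The bar heights and the sample numbers 21 and 22 are made up; as quoted above, italic() will not slant the Greek symbol itself.

```r
# Dummy bar heights for two hypothetical samples
x <- c(3, 5)

# italic("lab") with a superscript made of the delta symbol followed by a number
labs_expr <- c(
  expression(italic("lab")^paste(delta, "21")),
  expression(italic("lab")^paste(delta, "22"))
)

barplot(x, names.arg = labs_expr)
```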

add extra legend to plot

Hi, can I add an additional legend to a ggplot? For example, take the following code:
d <- melt(as.matrix(data.frame(y1=1/(1:10),y2=1/(10:1))))
ggplot(d, aes(x=Var1, y=value,fill=Var2)) + geom_bar(stat="identity",position='dodge')
This generates a nice legend containing the name of my dataframe.
But is it possible to add an extra legend that contains some extra information generated from the data?
In standard R, I would add the additional legend like this:
d<-data.frame(y1=1/(1:10),y2=2*1/(10:1))
barplot(t(d),beside=T)
legend("top",paste("sums:",apply(d,2,sum)))
Thanks
This seems to work for me.
plot.new()
d <- melt(as.matrix(data.frame(y1=1/(1:10),y2=1/(10:1))))
ggplot(d, aes(x=Var1, y=value,fill=Var2)) +
geom_bar(stat="identity",position='dodge')
Then the exciting stuff:
legend('top',paste("sums:",tapply(d$value,d$Var2,sum)))
I changed the apply statement to work on the molten data.
I am not aware of a ggplot solution, but I would love to see one.
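One ggplot-only possibility, offered as a sketch rather than a true second legend, is to fold the extra information into the labels of the existing fill legend. The data frame is built directly here instead of with melt(), and legend_labels is just an illustrative name.

```r
library(ggplot2)

# Same values as in the question, built in long form without reshape
d <- data.frame(
  Var1  = rep(1:10, 2),
  Var2  = rep(c("y1", "y2"), each = 10),
  value = c(1 / (1:10), 1 / (10:1))
)

# Per-series sums, in the same (alphabetical) order ggplot2 uses for the legend
sums <- tapply(d$value, d$Var2, sum)
legend_labels <- paste0(names(sums), " (sum: ", round(sums, 2), ")")

p <- ggplot(d, aes(x = Var1, y = value, fill = Var2)) +
  geom_col(position = "dodge") +
  scale_fill_discrete(labels = legend_labels)

print(p)
```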

How can I make a legend in ggplot2 with one point entry and one line entry?

I am making a graph in ggplot2 consisting of a set of datapoints plotted as points, with the lines predicted by a fitted model overlaid. The general idea of the graph looks something like this:
names <- c(1,1,1,2,2,2,3,3,3)
xvals <- c(1:9)
yvals <- c(1,2,3,10,11,12,15,16,17)
pvals <- c(1.1,2.1,3.1,11,12,13,14,15,16)
ex_data <- data.frame(names,xvals,yvals,pvals)
ex_data$names <- factor(ex_data$names)
graph <- ggplot(data=ex_data, aes(x=xvals, y=yvals, color=names))
print(graph + geom_point() + geom_line(aes(x=xvals, y=pvals)))
As you can see, both the lines and the points are colored by a categorical variable ('names' in this case). I would like the legend to contain 2 entries: a dot labeled 'Data', and a line labeled 'Fitted' (to denote that the dots are real data and the lines are fits). However, I cannot seem to get this to work. The (awesome) guide here is great for formatting, but doesn't deal with the actual entries, while I have tried the technique here to no avail, i.e.
print(graph + scale_colour_manual("", values=c("green", "blue", "red"))
+ scale_shape_manual("", values=c(19,NA,NA))
+ scale_linetype_manual("",values=c(0,1,1)))
The main trouble is that, in my actual data, there are >200 different categories for 'names,' while I only want the 2 entries I mentioned above in the legend. Doing this with my actual data just produces a meaningless legend that runs off the page, because the legend is trying to be a key for the colors (of which I have way too many).
I'd appreciate any help!
I think this is close to what you want:
ggplot(ex_data, aes(x=xvals, group=names)) +
geom_point(aes(y=yvals, shape='data', linetype='data')) +
geom_line(aes(y=pvals, shape='fitted', linetype='fitted')) +
scale_shape_manual('', values=c(19, NA)) +
scale_linetype_manual('', values=c(0, 1))
The idea is that you specify two aesthetics (linetype and shape) for both lines and points, even though it makes no sense, say, for a point to have a linetype aesthetic. Then you manually map these "nonsense" aesthetics to "null" values (NA and 0 in this case), using a manual scale.
This has been answered already, but based on feedback I got to another question (How can I fix this strange behavior of legend in ggplot2?) this tweak may be helpful to others and may save you headaches (sorry couldn't put as a comment to the previous answer):
ggplot(ex_data, aes(x=xvals, group=names)) +
geom_point(aes(y=yvals, shape='data', linetype='data')) +
geom_line(aes(y=pvals, shape='fitted', linetype='fitted')) +
scale_shape_manual('', values=c('data'=19, 'fitted'=NA)) +
scale_linetype_manual('', values=c('data'=0, 'fitted'=1))
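Both answers drop the colouring by names. If you want to keep it without the 200-entry colour key, one possible variant (a sketch, assuming a ggplot2 version that accepts guides(colour = "none")) is to map colour as before, hide only the colour guide, and give each layer just the aesthetic it actually uses. This produces two one-entry legends stacked on top of each other rather than a single merged one.

```r
library(ggplot2)

# Example data from the question
ex_data <- data.frame(
  names = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  xvals = 1:9,
  yvals = c(1, 2, 3, 10, 11, 12, 15, 16, 17),
  pvals = c(1.1, 2.1, 3.1, 11, 12, 13, 14, 15, 16)
)

p <- ggplot(ex_data, aes(x = xvals, colour = names, group = names)) +
  # Constant strings inside aes() create the "Data" and "Fitted" legend entries
  geom_point(aes(y = yvals, shape = "Data")) +
  geom_line(aes(y = pvals, linetype = "Fitted")) +
  scale_shape_manual("", values = c(Data = 19)) +
  scale_linetype_manual("", values = c(Fitted = 1)) +
  # Keep the colours on the plot but suppress their legend
  guides(colour = "none")

print(p)
```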
