I have seen many examples using directlabels to place labels on the plot itself. However, all the examples only label the lines or points with the name of the series, i.e. pretty much like a legend.
Is it possible to label the first and last data points with the values of the points? E.g.
     1-Jan   2-Jan   ...   31-Jan
A    10      3       ...   7
B    8       11      ...   20
If the above data is plotted as a line chart, is it possible to place labels to the left of the two lines showing 10 and 8, and likewise label the rightmost points with 7 and 20?
Update: Thanks for the comments. Yes, I am using ggplot. I attach a mock-up below just to illustrate my requirement:
With ggplot2 you can pass the subset of rows that you want labelled to the data argument of geom_text.
Add the following to your ggplot:
p + geom_point() + geom_text(data = <subset of the data>, ...)
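As a minimal sketch of that idea, the example below labels only the first and last point of each series with its value. The data frame, its column names (series, day, value) and the numbers are made up for illustration, and the hjust offsets are just one way to nudge the labels off the points.

```r
library(ggplot2)

# Made-up long-format data: one row per series and day
df <- data.frame(
  series = rep(c("A", "B"), each = 5),
  day    = rep(1:5, times = 2),
  value  = c(10, 3, 6, 9, 7, 8, 11, 14, 17, 20)
)

# Keep only the first and the last day for labelling
first_pts <- df[df$day == min(df$day), ]
last_pts  <- df[df$day == max(df$day), ]

p <- ggplot(df, aes(x = day, y = value, colour = series)) +
  geom_line() +
  geom_point() +
  # hjust > 1 puts the label to the left of the point, hjust < 0 to the right
  geom_text(data = first_pts, aes(label = value), hjust = 1.5, show.legend = FALSE) +
  geom_text(data = last_pts, aes(label = value), hjust = -0.5, show.legend = FALSE) +
  # Widen the x-axis a little so the labels are not clipped
  scale_x_continuous(expand = c(0.1, 0))
p
```

Each geom_text layer gets its own data argument, so the lines and points still use the full data set while the labels use only the subset.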
I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. What I ideally want is the frequencies (5, 10, 20, 30, 40 kHz) on the x-axis and the number of synapses/cells on the y-axis (usually a numerical value from 10 to 20). The graph will then contain 5 lines, one for each age (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I am trying this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph, with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph
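One possible cause, which is an assumption since the data is only shown as a picture, is that Frequency was imported as text rather than as numbers. Text values are ordered alphabetically, which would put "5" after "40". The sketch below uses a hypothetical stand-in for mydata (the puncta6 values are made up) and shows two ways to get the intended order.

```r
library(ggplot2)

# Hypothetical stand-in for mydata, with Frequency stored as text
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12, 15, 18, 17, 14),
  stringsAsFactors = FALSE
)

# Text is sorted alphabetically, which is not the numeric order
sort(mydata$Frequency)

# Option 1: treat frequency as a number (continuous axis, no group = 1 needed)
mydata$freq_num <- as.numeric(mydata$Frequency)
p_num <- ggplot(mydata, aes(x = freq_num, y = puncta6)) +
  geom_line() +
  geom_point()

# Option 2: keep evenly spaced categories, but set the level order explicitly
mydata$freq_fac <- factor(mydata$Frequency, levels = c("5", "10", "20", "30", "40"))
p_fac <- ggplot(mydata, aes(x = freq_fac, y = puncta6, group = 1)) +
  geom_line() +
  geom_point()
```

Option 1 spaces the frequencies by their numeric value, while option 2 keeps them equally spaced, which may be preferable for a frequency axis with roughly doubling steps.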
I want to create a scatterplot in ggplot where there are multiple y values for each x value. I want to add these y values and plot the sum against the x value.
>df
a b
1 2
1 2
2 1
2 4
3 1
3 5
I want a plot that shows the sum of the b values for each a:
a b
1 4
2 5
3 6
I can do this for a barplot by making a stacked barplot:
ggplot(data=df, aes(x=a, y=b)) + geom_bar(stat="identity")
but if I do this with geom_point, ggplot just plots each value of y without stacking.
I could use ddply for this, but that would require several more steps. If there is a more expedient way, I'd appreciate it.
I searched the site for other answers. While there were plenty about "stacked scatterplots" they were all about overlaid plots.
I don't see anything stacked about your bar chart example. If you just want to summarize the values to a single point, you can use stat_summary (the fun argument was called fun.y before ggplot2 3.3.0):
ggplot(data=df, aes(x=a, y=b)) + stat_summary(fun=sum, geom="point")
There are many ways to achieve this effect - of a 'histogram' but without bars, whose height is the sum of all values at the same X.
This type of graph is called a Cleveland Dot Plot, and is used because the conspicuous bars of a histogram can be a distraction or, at worst, misleading (see works by Cleveland, Tufte, etc.).
One way to achieve this is to pre-process the data to do the sum, using functions such as table or hist or tapply or xtabs...
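As a small sketch of the pre-processing route, the example below sums b for each a with aggregate() and tapply() and then plots the totals as points. It uses the example data from the question; the object names sums and totals are just illustrative.

```r
library(ggplot2)

# The example data from the question
df <- data.frame(a = c(1, 1, 2, 2, 3, 3), b = c(2, 2, 1, 4, 1, 5))

# Sum b within each value of a; the result has one row per a
sums <- aggregate(b ~ a, data = df, FUN = sum)

# tapply() gives the same totals as a named vector
totals <- tapply(df$b, df$a, sum)

# Plot the pre-computed sums as points
p <- ggplot(sums, aes(x = a, y = b)) + geom_point()
p
```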
Note that base R has the function dotchart for the production of this type of graph.
dotchart(xtabs(rev(df)))
... but since we are discussing ggplot, which has powerful ways to summarise the data while plotting it, let's stick to MrFlick's theme of how to do it directly with ggplot operators (i.e. without preprocessing).
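Using a weighted count summary statistic (older ggplot2 versions used stat="bin" here; current versions need stat="count" for a discrete x):
ggplot(data=df, aes(x=factor(a),weight=b)) + geom_point(stat="count")
You may want to adjust the lower y limit to 0 here.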
By stacking the height of the points:
ggplot(data=df, aes(x=factor(a),y=b)) + geom_point(position="stack")
The additional dots visible on this plot are probably superfluous and definitely ambiguous, but they highlight the fact that the source data contains multiple values per x.
Building a dotplot
This one is popular in newspapers, but usually has dollar bills instead of giant black holes:
ggplot(data=df, aes(x=factor(a),weight=b)) + geom_dotplot(method="histodot")
It's probably not what you are looking for, but it's worth being aware of.
You should also be aware that scales are difficult to get correct in this mode, so it's best used in a hand-tuned mode, with the y scale numbering turned off.
I am looking for a way to connect data points from top to bottom in order to visualize a ranking. The y-axis represents the rank and the x-axis the attributes. With the normal settings the line connects the points from left to right, so the points are connected in the wrong order.
With the data below, the line should be connected from (6,1) to (4,2) and then (5,3) etc. Ideally the ranking scale should also be inverted so that rank one is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates. The next command (coord_flip()) flips x- and y-axis, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
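An alternative sketch, not part of the answer above, avoids flipping the coordinates: geom_path() joins points in the order of the rows, so sorting the data by rank first gives the same top-down connection. The object name ranked is just illustrative.

```r
library(ggplot2)

# The data from the question
data <- data.frame(
  attribute = 1:11,
  rank      = c(6, 5, 4, 2, 3, 1, 7, 11, 10, 8, 9)
)

# geom_path() connects rows in the order they appear, so sort by rank first
ranked <- data[order(data$rank), ]

p <- ggplot(ranked, aes(x = attribute, y = rank)) +
  geom_path() +
  geom_point() +
  scale_y_reverse()  # puts rank 1 at the top
p
```

Here the aesthetics stay in their natural roles (attribute on x, rank on y), at the cost of having to sort the data before plotting.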
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.
I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot a and b on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both the a and b vectors (in different colors) on the x-axis and the number of days on the y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is by melting the data (as you mentioned), especially if you are going to add more vectors. This is the code:
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
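For comparison, here is a sketch of the same reshaping with tidyr::pivot_longer() instead of reshape2::melt(), including a manual colour scale. The column names series and value and the object name df_long are arbitrary choices for this example, as are the colours.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(a = 23:52, b = 33:62, days = 1:30)

# One row per (days, series) pair instead of one column per series
df_long <- pivot_longer(df, cols = c(a, b), names_to = "series", values_to = "value")

p <- ggplot(df_long, aes(x = value, y = days, colour = series)) +
  geom_line() +
  # Map each series name to a colour explicitly
  scale_color_manual(values = c(a = "red", b = "blue"))
p
```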
A simpler solution is to use only ggplot. The following code will work in your case
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
ggplot(data = df)+
geom_line(aes(x = days,y = a), color = "blue")+
geom_line(aes(x = days,y = b), color = "red")
I added the colors, you might want to use them to differentiate between your variables.
I am trying to get my head around ggplot2, which creates beautiful graphs as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.
1. using an intermediate table doing the counts and geom_line() to plot the results
2. using geom_freqpoly() on my raw dataframe
The basic charts look the same; however, chart nr. 2 seems to be missing the points for all the 0 values of the counts (e.g. for the cities right of SACRAMENTO, there is no data for Condo, Multi-Family or Unknown, which seems to be missing completely in this graph).
I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).
So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values - to show the nr. trx per city (on the x axis) with a different colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data - to show the nr. trx per city (on the x axis) with a different colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)
As @joran pointed out, this gives a "similar" plot, when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), as the breaks are off. Instead of binning from 1 to 39 with a binwidth of 1, it for some reason starts at 0.5 and goes to 39.5.
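If the goal is simply to keep the zero counts, another route is to count first and then fill in the missing city/type combinations explicitly. The sketch below assumes dplyr and tidyr are available and uses a small made-up stand-in (sales) for the transactions file, so the object names and values are illustrative only.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the transactions data (values are made up)
sales <- data.frame(
  city = c("ELK GROVE", "ELK GROVE", "SACRAMENTO", "SACRAMENTO", "SACRAMENTO", "WILTON"),
  type = c("Residential", "Condo", "Residential", "Condo", "Multi-Family", "Residential")
)

counts <- sales %>%
  count(city, type) %>%                      # observed combinations only
  complete(city, type, fill = list(n = 0L))  # add missing combinations with n = 0

p <- ggplot(counts, aes(x = city, y = n, group = type, colour = type)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 90, hjust = 0))
p
```

This is closer in spirit to the first method (an intermediate table), but it keeps the counting in one pipeline next to the plotting code.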