Why do geom_line() and geom_freqpoly() give back different graphs? - r

I am trying to get my head around ggplot2, which creates beautiful graphs as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.
1. using an intermediate table doing the counts and geom_line() to plot the results
2. using geom_freqpoly() on my raw dataframe
The basic charts look the same; however, chart nr. 2 seems to be missing points for all the 0 values of the counts (e.g. for the cities right of SACRAMENTO, there is no data for Condo, Multi-Family or Unknown, which seems to be missing completely in this graph).
I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).
So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values - to show the nr. trx per city (on the x axis) with a different colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data - to show the nr. trx per city (on the x axis) with a different colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)

As @joran pointed out, this gives a "similar" plot, when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), because the breaks are off: instead of binning from 1 to 39 with a binwidth of 1, it starts, for some reason, at 0.5 and runs to 39.5.
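The gap between the two methods can be reproduced without the downloaded file. The sketch below uses a small made-up data frame (`toy`, with hypothetical cities and types) to show that table() emits a row for every city/type combination, including the ones that never occur, whereas the raw rows contain no record of those combinations at all. It only illustrates where the zeros in method 1 come from; it is not a statement about how geom_freqpoly() bins internally.

```r
# Hypothetical toy data: Condo only ever appears in city A
toy <- data.frame(
  city = factor(c("A", "A", "B", "C")),
  type = factor(c("Condo", "Residential", "Residential", "Residential"))
)

# table() crosses all factor levels, so absent combinations get a count of 0
counts <- as.data.frame(table(city = toy$city, type = toy$type),
                        responseName = "qty")

# Combinations that actually occur in the raw rows
observed <- unique(toy[, c("city", "type")])

# Rows that exist only because table() filled them in
zero_rows <- counts[counts$qty == 0, ]
print(counts)
```

Plotting `counts` with geom_line(), as in alternative 1, therefore has a point at 0 for Condo in cities B and C, while the raw rows in `toy` give a stat nothing to count there.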

Related

plotting multiple lines in ggplot R

I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. What I thus ideally want is the frequencies (5,10,20,30,40) in the x-axis and the amount of synapses/cells plotted on the y-axis (usually a numerical value from 10 - 20). The graph then will contain 5 lines of the different ages (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph
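One likely cause, assuming the Frequency column was read in as text or as a factor (the "only one observation" message suggests a discrete x), is that the values are ordered alphabetically, which puts "5" after "40". The sketch below uses made-up numbers for `puncta6` and shows the ordering before and after converting the column to numeric; the data frame here is hypothetical and only mirrors the column names from the question.

```r
library(ggplot2)

# Hypothetical data; Frequency stored as text, as it might be after import
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12.1, 15.3, 17.8, 16.2, 13.4),
  stringsAsFactors = FALSE
)

# Text is sorted character by character, so "5" lands after "40"
text_order <- sort(mydata$Frequency)

# Convert to numbers so the x axis is continuous and ordered by value
mydata$Frequency <- as.numeric(mydata$Frequency)

# With a numeric x, no group aesthetic is needed
p <- ggplot(mydata, aes(x = Frequency, y = puncta6)) +
  geom_line() +
  geom_point()
```

If the column is a factor rather than character, converting with as.numeric(as.character(...)) avoids getting the internal level codes instead of the values.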

Plot mean and standard deviation by category

I'm trying to make a plot with mean and sd bars for the three levels of a factor.
(After two hours of searching on the internet, then checking the Rbook and Rgraphs book I'm still not finding the answer. I think this is because it is a very simple question.)
I have a simple data frame with three columns: my categories, mean, sd.
I would like to do a plot with the mean by category and its sd bars, just like
this one (edit: link broken)
My dataframe looks like this
color mean.temp sd
black 37.93431 2.267125
red 37.01423 1.852052
orange 36.61345 1.339032
I'm so sorry for asking this dumb question but I sincerely couldn't find any simple answer to my simple question.
With ggplot:
read data:
df=read.table(text=' color mean.temp sd
1 black 37.93431 2.267125
2 red 37.01423 1.852052
3 orange 36.61345 1.339032',header=TRUE)
plotting:
ggplot(df, aes(x=color, y=mean.temp)) +
geom_errorbar(aes(ymin=mean.temp-sd, ymax=mean.temp+sd), width=.2) +
geom_line() +
geom_point()
output
Create a data.frame holding your data:
foo <- data.frame(color=c("black","red","orange"),
mean.temp=c(37.93431,37.01423,36.61345),
sd=c(2.267125,1.852052,1.339032))
Now, we first plot the means as dots, making sure that we have enough room horizontally (xlim) and vertically (ylim), suppressing x axis annotation (xaxt="n") and all axis labeling (xlab="", ylab="").
plot(1:3,foo$mean.temp,pch=19,xlab="",ylab="",xaxt="n",xlim=c(0.5,3.5),
ylim=c(min(foo$mean.temp-foo$sd),max((foo$mean.temp+foo$sd))))
Next, we plot the standard deviations as lines. You could also use three separate lines commands, which may be easier to read. This way, we first collect the data into matrices via rbind(). R will automatically turn these matrices into vectors and recycle them. The NAs are there so we don't join the end of one line to the beginning of the next one. (Try removing the NAs to see what happens.)
lines(rbind(1:3,1:3,NA),rbind(foo$mean.temp-foo$sd,foo$mean.temp+foo$sd,NA))
Finally, annotate the x axis:
axis(side=1,at=1:3,labels=foo$color)
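To see why the NA rows in the lines() call keep the error bars separate, it helps to look at the vectors that rbind() produces. This is a small illustration with made-up lower and upper values (`lower` and `upper` are hypothetical names), not part of the plot itself.

```r
# Hypothetical bar ends for three categories
lower <- c(35, 35, 35)
upper <- c(40, 39, 38)

# Matrices are stored column by column, so flattening gives
# x1, x1, NA, x2, x2, NA, ... which is one segment per category
xs <- as.vector(rbind(1:3, 1:3, NA))
ys <- as.vector(rbind(lower, upper, NA))

print(xs)
print(ys)
```

lines() stops drawing at each NA and starts a new segment after it, which is why each category gets its own vertical bar instead of one zigzag.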

Trying to label bar chart ggplot2, get "Error: Aesthetics must either be length one, or the same length..."

I'm using a bar chart to show the income distribution of parking meters in a city. My data frame includes columns for parking meter ID, annual revenue for that meter, and which decile (1-10) that meter falls into based on its total revenue. So my command looks like this:
> rev <- ggplot(parking, aes(x=decile, y=revenue))
> rev + geom_bar(stat="identity")
And the result is exactly what I want, but I'd like to add the total revenue for each decile atop each bar in the graph, and I don't know how. I tried this:
> aggrev <- aggregate(revenue~decile, data=parking, sum)
> totals <- aggrev$revenue
> rev + geom_bar(stat="identity") + geom_text(aes(label=totals))
But I get this error message:
Error: Aesthetics must either be length one, or the same length as the
dataProblems:totals.
I checked length(decile) and length(totals), and their values are 4600 and 10, respectively. So I understand why this is happening, but why can't I just add any 10 characters to the 10 bars? Or to get the chart to display the bar totals automatically, maybe using "identity"? I've decided to just run this:
ggplot(aggrev, aes(x=decile,y=revenue))+geom_bar(stat="identity")+geom_text(aes(label=revenue))
which works, but I'd rather not have to make a new dataframe each time I want to have labels.
Add the totals to the parking dataframe, matching each meter to the total for its decile:
parking$totals <- aggrev$revenue[match(parking$decile, aggrev$decile)]
That will allow the token "totals" to get found in the correct environment. (You may also need to specify an x and y vector.)
Simply put, you need a dataset that looks like what you are plotting, plus a "label" variable that maps each label to an x, y coordinate. This gets a little tricky once you add facet_grid.
If your dataframe has 80 rows with 4 factors, your label dataframe will have 80 rows of x, y values as well as your factor variable and "label".
datain <- data.frame(type=rep(c('dog','cat'),40),height=c(1,5,3,2,5,2,6,8,10,3))
datain <- datain[order(datain$type),]
for_text <- data.frame(type=c('dog','cat'),height=c(10.5,9),label=c('dog','cat'))
plot <- ggplot(datain,aes(x=type,y=height)) +
geom_boxplot(width=1) +
geom_text(data=for_text,aes(x=type,y=height,label=label))
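Applied to the parking question, the same idea can be sketched by handing the aggregated data frame only to the text layer, so the bars still come from the full data. The `parking` data below is randomly generated and purely hypothetical; only the column names follow the question.

```r
library(ggplot2)

# Hypothetical stand-in for the parking data: 5 meters per decile
set.seed(1)
parking <- data.frame(
  decile  = rep(1:10, each = 5),
  revenue = round(runif(50, min = 100, max = 1000))
)

# One row per decile with the summed revenue
aggrev <- aggregate(revenue ~ decile, data = parking, sum)

# Bars use the full data; the text layer gets its own 10-row data frame
# and inherits x = decile, y = revenue from the plot-level aes()
p <- ggplot(parking, aes(x = decile, y = revenue)) +
  geom_bar(stat = "identity") +
  geom_text(data = aggrev, aes(label = revenue), vjust = -0.5)
```

Because the stacked bars add up to the decile total, the label's y position (the total) sits at the top of each bar. A separate data frame is still created, but it is only passed to the layer that needs it.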

Plotting multiple lines with ggplot2 in R

I have data from some psychophysical experiments that I'd like to plot. My dataframe contains multiple observations from multiple participants in three paradigms of an experiment.
In other words, each participant took part in three psychophysical experiments and I'd like to plot the data on a single graph.
At present, my plot looks like this:
The data on the right of the plot are from one of the experiments (1), whilst the mass of data on the left are from the two other experiments (2 & 3). Essentially, I'm trying to show graphically that experiment 1 yields very different results to experiments 2 & 3.
This plot is of two parameters, 'probability_seen' and 'visual_acuity'. My dataframe also contains two other columns: subject_initials and experiment_type. As you can see, I'm separating out the subjects by colour. I'd also like to join the lines up for each of the experiments (the above plot actually contains three curves for each subject), but if I add geom_line() to my plot, I get this:
Obviously, I haven't asked ggplot2 to respect the state of 'experiment_type'. How do I do this?
n.b. I currently call the plot with the following code:
qplot(visual_acuity, probability_seen, data = dframe1, colour = subject_initials,
xlab = "Visual acuity", ylab = "Probability seen") + geom_line()
As @Baptiste has stated, the solution is to add group = experiment_type to the qplot call.
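If one curve per subject per experiment is wanted, grouping on experiment_type alone may still join points from different subjects. A possible variation, shown here as a sketch on made-up data (the values and subject initials are hypothetical), is to group on the combination of both columns with interaction():

```r
library(ggplot2)

# Hypothetical data: 2 subjects x 3 experiments x 3 acuity levels
dframe1 <- expand.grid(
  visual_acuity    = c(0.1, 0.2, 0.4),
  subject_initials = c("AB", "CD"),
  experiment_type  = factor(1:3)
)
dframe1$probability_seen <- seq(0.1, 0.9, length.out = nrow(dframe1))

# One group per subject/experiment pair, i.e. one line each
line_groups <- interaction(dframe1$subject_initials, dframe1$experiment_type)

p <- ggplot(dframe1,
            aes(x = visual_acuity, y = probability_seen,
                colour = subject_initials,
                group = interaction(subject_initials, experiment_type))) +
  geom_point() +
  geom_line() +
  labs(x = "Visual acuity", y = "Probability seen")
```

Colour still distinguishes subjects, while the group aesthetic decides which points are connected.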

introducing a gap in continuous x axis using ggplot

This builds on my previous post, creating a stacked area/bar plot with missing values (all the script I run can be found there). In this post, however, I'm asking whether it's possible to leave a gap in a continuous x axis. I have a month-by-month time series over a year, but for one sample one month is missing, and I would like to show this month as a complete gap in the plot. It would be almost like plotting one graph for Jan-Aug (Sep is missing) and one for Oct-Dec, and merging these with a gap for Sep.
The only things I have come up with are treating the missing month as zero or NA, which creates a huge drop in the area chart for Sep, or excluding it, which leaves an x axis ranging from 1-11 (see plots in dropbox folder).
The data set I'm working on can be found in my dropbox folder and is named r_class.txt; you can also see the two different plots there (Rplots1 and 2).
Any ideas would really be appreciated!
Plot the series as two separate data frames:
#Load libraries
require(ggplot2)
require(reshape)
#Code copied from your linked post:
wa=read.table('wa_class.txt', sep="", header=F, na.string="0")
names(wa)=c("Class","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
wam=melt(wa)
wam$variablen=as.numeric(wam$variable)
#For readability, split the melted data frame into two separate data frames
wam1 <- wam[wam$variablen %in% 1:6,]
wam2 <- wam[wam$variablen %in% 8:12, ]
ggplot() +
geom_area(data=wam1, aes(x=variablen, y=value, fill=Class)) +
geom_area(data=wam2, aes(x=variablen, y=value, fill=Class))
#and add lineranges, etc., accordingly
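Since the data file is not included here, the same two-layer idea can be tried on a self-contained toy example. The sketch below assumes, as in the question, that September (month 9) is the missing month; the data frame `toy` and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical long-format data: two classes, months 1-12 with month 9 absent
toy <- expand.grid(month = c(1:8, 10:12), Class = c("a", "b"))
toy$value <- rep(c(3, 5), each = 11)

# Split at the missing month so each area is drawn separately
before_gap <- toy[toy$month < 9, ]
after_gap  <- toy[toy$month > 9, ]

# Two geom_area layers share the same aesthetics; nothing is drawn
# between month 8 and month 10
p <- ggplot(mapping = aes(x = month, y = value, fill = Class)) +
  geom_area(data = before_gap) +
  geom_area(data = after_gap) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
```

Setting the breaks to 1:12 keeps a tick for the missing month, so the gap reads as a month without data rather than a compressed axis.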

Resources