Plotting multiple lines in ggplot (R)

I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. Ideally, I want the frequencies (5, 10, 20, 30, 40 kHz) on the x-axis and the number of synapses/cells on the y-axis (usually a numerical value between 10 and 20). The graph will then contain 5 lines, one for each age (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I tried this with ggplot, starting with just one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph but no line, and the following message: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph, with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph
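A likely cause, assuming the Frequency column was read in as text or as a factor rather than as numbers: text is ordered alphabetically, so "5" ends up after "40". The sketch below uses an invented stand-in for the spreadsheet (the values and the puncta17 column are made up, only Frequency and puncta6 appear in the question), converts Frequency to numeric, and stacks the per-age columns so that each age becomes its own coloured line.

```r
library(ggplot2)

# Hypothetical stand-in for the Excel sheet: one row per frequency,
# one column of counts per age. All values are invented.
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),  # assumed to be stored as text
  puncta6   = c(12.1, 15.3, 17.8, 16.4, 13.0),
  puncta17  = c(11.8, 14.9, 17.1, 15.7, 12.2)  # hypothetical second age column
)

# Convert to numbers so that 5 sorts before 10, 20, 30 and 40.
mydata$Frequency <- as.numeric(as.character(mydata$Frequency))

# Stack the per-age columns into long format: one row per frequency and age.
long <- data.frame(
  Frequency = rep(mydata$Frequency, times = 2),
  age_weeks = rep(c(6, 17), each = nrow(mydata)),
  puncta    = c(mydata$puncta6, mydata$puncta17)
)

# One line per age; colour and group both follow the age column.
p <- ggplot(long, aes(x = Frequency, y = puncta,
                      colour = factor(age_weeks), group = age_weeks)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(5, 10, 20, 30, 40)) +
  labs(x = "Frequency (kHz)", y = "Puncta", colour = "Age (weeks)")
p
```

With a numeric x-axis the group = 1 workaround should no longer be needed for a single age, because ggplot2 only splits data into one group per value when x is discrete.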

Related

R facet_wrap not showing facet that has different numbers of observations

I am trying to plot a set of time-series data where some data are observed quarterly and others are observed monthly. When I plot my data, however, the facet containing the data that has fewer observations doesn't come out correctly.
In my actual data set, I'm getting a slightly different outcome, but I think the problem in the example below is probably due to the same underlying cause. You'll see in the example below that the last facet doesn't display a line at all. In my actual dataset, the last facet displays all of the points, but they are compressed together at the start of the graph as if the observations were happening monthly. So the line in the last facet is a quarter of the length of the frame, which spans the whole time period.
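library(ggplot2)   # provides ggplot() and the economics data set
library(reshape2)  # provides melt()
a <- economics[1:100,]
a[seq(1,100,2), "unemploy"] <- NA   # blank out every other observation
b <- melt(a, id.vars="date")
smPlot <- ggplot(b, aes(x=date, y=value)) +
  geom_line() +
  facet_wrap(~ variable, ncol=5, scales="free")
smPlot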
Which creates the following plot:
The issue in your example code is that a line must have at least 2 points. If every other point is NA (as in the last facet), each line segment has only one point in it so does not show anything.
To expand on this a little, consider how a line trace is drawn. It is composed of many line segments, each with a start and an end. In continuous data without any NAs, each segment starts where the last one ended. But an NA in the data breaks the line, and a new line starts from the next non-NA data point. So when there is only a single data point between each pair of NAs, each piece contains a single point and cannot be drawn as a line (but could be shown as a point using geom_point).
If you just subset out the rows with NA, it should look fine:
smPlot <- ggplot(b[!is.na(b$value), ], aes(x=date, y=value)) +
  geom_line() +
  facet_wrap(~ variable, ncol=5, scales="free")
smPlot
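The "single point per piece" argument can be checked without plotting. This is a minimal sketch on an invented toy vector (not the economics data): it measures the lengths of the consecutive runs of non-missing values, which are the pieces a line would be drawn through.

```r
# Toy series with every other value missing, mirroring the pattern
# created in the "unemploy" column above. Values are invented.
value <- c(NA, 3, NA, 5, NA, 4, NA, 6)

# Run-length encode the "is not missing" flag; the TRUE runs are the
# stretches of consecutive points that a line could connect.
runs <- rle(!is.na(value))
piece_lengths <- runs$lengths[runs$values]
piece_lengths  # every piece holds a single point, so no segment can be drawn

# After dropping the NAs, the remaining points are all adjacent and
# can be joined into one line.
kept <- value[!is.na(value)]
length(kept)
```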

How to plot two y axis? or combine(merge) two plots? Should handle faceted column as well

I've a combination of two difficult(I'm naive) requirements :(
Consider the Weather data as example. Let's say I've dataset with following information.
"Datetime", "Word", "Frequency", "Temperature"
Visualization: I want to see change in frequency of a word over time and at temperature.
X-axis shows the time series(date)
Y-axis has the frequency scale(0 to max freq).
Requirements:
I need to draw frequencies of several words(Column "word") over the time.
Correlate the frequency with temperature.
I started with ggplot2:
ggplot(TemperatureData, aes(x=timeId, y=termFrequency)) + geom_line() + facet_wrap(~Keyword) +
geom_line(data = TemperatureData, aes(y = temperature)) +
labs(x="Time Series over X days", y = "Term Frequency")
The above approach results in overlapping y-axes (frequency, temperature) and a separate bin for each "Word" (a facet in ggplot), i.e. the plot has 3 bins for each keyword. Each bin shows temperature over time and the frequency of a word over time.
Problems:
I want separate y-axes for temperature and frequency. Also, I do not want to normalize these y-axes, as it then gets tough to understand what the high/low values of each axis are over the days, and the plot loses readability. I learnt that two y-axes are not possible using ggplot2.
Separate bin for each keyword is not required. One horizontal line per keyword is what I'm looking for.
The plot should have only one appearance(line graph) of temperature to reflect change over time.
I tried using par(), but could not succeed.
Example solution using plotrix package
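For comparison, here is a rough base R sketch of the same idea using par(new = TRUE). It is not the plotrix solution and not ggplot2. The data are invented; only the column names timeId, Keyword, termFrequency and temperature are taken from the question. Keyword frequencies go on the left axis, one line per keyword, and temperature is drawn once against its own right-hand axis.

```r
# Hypothetical data: column names follow the question, values are invented.
set.seed(1)
days <- 1:10
weather <- data.frame(
  timeId        = rep(days, times = 2),
  Keyword       = rep(c("rain", "sun"), each = 10),
  termFrequency = c(rpois(10, 8), rpois(10, 4))
)
temperature <- data.frame(timeId = days,
                          temperature = 15 + cumsum(rnorm(10)))

# One column per keyword, one row per day.
wide <- tapply(weather$termFrequency,
               list(weather$timeId, weather$Keyword), sum)

# Widen the right margin so the second axis and its label fit.
op <- par(mar = c(5, 4, 4, 4) + 0.1)

# Left axis: one solid line per keyword.
matplot(days, wide, type = "l", lty = 1, col = c("blue", "darkgreen"),
        xlab = "Time (days)", ylab = "Term frequency")

# Overlay temperature on the same region with its own, unnormalized scale.
par(new = TRUE)
plot(temperature$timeId, temperature$temperature, type = "l",
     col = "red", lty = 2, axes = FALSE, xlab = "", ylab = "")
axis(side = 4)
mtext("Temperature", side = 4, line = 3)

legend("topleft", legend = c(colnames(wide), "temperature"),
       col = c("blue", "darkgreen", "red"), lty = c(1, 1, 2), bty = "n")
par(op)
```

Each axis keeps its own units here, so nothing is rescaled, but the two scales are unrelated and any apparent crossing of the lines has no meaning.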

Plotting multiple frequency polygon lines using ggplot2

I have a dataset with records that have two variables: "time" which are id's of decades, and "latitude" which are geographic latitudes. I have 7 time periods (numbered from 26 to 32).
I want to visualize a potential shift in latitude through time. So what I need ggplot2 to do is plot a graph with latitude on the x-axis and the count of records at a certain latitude on the y-axis. I need it to do this for the separate time periods and plot everything in one graph.
I understood that I need the function freqpoly from ggplot2, and I got this so far:
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25)
This gives me the correct graph of the data, ignoring the time. But how can I bring in the time? I tried subsetting the data, but I can't really figure out whether this is the best way.
So basically I'm trying to get a graph with 7 lines showing the frequency distribution in each decade in order to look for a latitude shift.
Thanks!!
Without sample data it is hard to answer, but try adding color=factor(time) (where time is the name of your column with the time periods). This will draw a line for each time period in a different color.
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25,
      color=factor(time))
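The same idea written with ggplot() instead of qplot(), as a sketch on simulated data. The lat_data frame below is invented: it keeps the column names from the question, but the latitudes are random draws with an arbitrary upward drift across the 7 periods.

```r
library(ggplot2)

# Simulated records: 7 time periods (26 to 32), 200 latitudes each,
# with a made-up shift of the mean latitude over time.
set.seed(42)
lat_data <- data.frame(
  time     = rep(26:32, each = 200),
  latitude = rnorm(1400,
                   mean = rep(seq(50, 53, length.out = 7), each = 200),
                   sd = 1)
)

# Mapping colour to factor(time) makes ggplot2 bin and draw each
# time period separately, giving one frequency polygon per period.
p <- ggplot(lat_data, aes(x = latitude, colour = factor(time))) +
  geom_freqpoly(binwidth = 0.25) +
  labs(x = "Latitude", y = "Number of records", colour = "Time period")
p
```

Wrapping time in factor() matters: a numeric time column would be treated as continuous and would not split the data into separate lines.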

Why do geom_line() and geom_freqpoly() give back different graphs?

I am trying get my head around ggplot2 which creates beautiful graphs as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.
1. using an intermediate table doing the counts and geom_line() to plot the results
2. using geom_freqpoly() on my raw dataframe
The basic charts look the same; however, chart nr. 2 seems to be missing plots for all the 0 values of the counts (e.g. for the cities right of SACRAMENTO, there is no data for Condo, Multi-Family or Unknown (which seems to be missing completely in this graph)).
I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).
So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values - to show the nr. trx per city (on the x axis) with a differently colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data - to show the nr. trx per city (on the x axis) with a differently colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)
As @joran pointed out, this gives a "similar" plot when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
  geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), as the breaks are screwed up. Instead of binning from 1 to 39 with a binwidth of 1, it for some reason starts at 0.5 and goes until 39.5.
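The zero-count difference between the two alternatives in the question can be reproduced without plotting. This is a minimal sketch on invented toy data (the sales data frame and its values are made up). table() crosses every city with every type and reports absent combinations as 0, whereas counting only the rows that actually occur leaves them out, which is consistent with the lines that go missing in the second chart.

```r
# Toy transactions: ELK GROVE has no Condo sales. Values are invented.
sales <- data.frame(
  city = c("SACRAMENTO", "SACRAMENTO", "SACRAMENTO", "ELK GROVE", "ELK GROVE"),
  type = c("Residential", "Condo", "Residential", "Residential", "Residential")
)

# table() builds the full city x type grid, so the missing
# ELK GROVE / Condo combination appears with a count of 0.
counts <- as.data.frame(table(city = sales$city, type = sales$type),
                        responseName = "qty")
counts

# Counting only observed rows: combinations that never occur are
# simply absent from the result.
observed <- aggregate(qty ~ city + type,
                      data = transform(sales, qty = 1), FUN = sum)
observed
```

This suggests that if the zero counts are wanted on the chart, computing the counts first (alternative 1) is the more direct route.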

Plotting multiple lines with ggplot2 in R

I have data from some psychophysical experiments that I'd like to plot. My dataframe contains multiple observations from multiple participants in three paradigms of an experiment.
In other words, each participant took part in three psychophysical experiments and I'd like to plot the data on a single graph.
At present, my plot looks like this:
The data on the right of the plot are from one of the experiments (1), whilst the mass of data on the left are from the two other experiments (2 & 3). Essentially, I'm trying to show graphically that experiment 1 yields very different results to experiments 2 & 3.
This plot is of two parameters, 'probability_seen' and 'visual_acuity'. My dataframe also contains two other columns: subject_initials and experiment_type. As you can see, I'm separating out the subjects by colour. I'd also like to join the lines up for each of the experiments (the above plot actually contains three curves for each subject), but if I add geom_line() to my plot, I get this:
Obviously, I haven't asked ggplot2 to respect the state of 'experiment_type'. How do I do this?
n.b. I currently call the plot with the following code:
qplot(visual_acuity, probability_seen, data = dframe1, colour = subject_initials,
      xlab = "Visual acuity", ylab = "Probability seen") + geom_line()
As @Baptiste has stated, the solution is to add group = experiment_type to the qplot call.
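A sketch of the grouping idea with ggplot(), on invented data that reuses the column names from the question. One assumption goes beyond the answer above: since each subject has three curves, the group here combines subject and experiment via interaction(), so that every subject gets a separate line per experiment while the colour still follows the subject.

```r
library(ggplot2)

# Hypothetical data: 2 subjects x 3 experiments x 5 acuity levels.
dframe1 <- expand.grid(
  visual_acuity    = seq(0, 1, by = 0.25),
  subject_initials = c("AB", "CD"),
  experiment_type  = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Invented S-shaped curves: experiment 1 is shifted far to the right,
# experiments 2 and 3 sit close together; subject CD is offset slightly.
shift  <- c("1" = 0.8, "2" = 0.3, "3" = 0.35)
offset <- ifelse(dframe1$subject_initials == "CD", 0.05, 0)
dframe1$probability_seen <- plogis(
  10 * (dframe1$visual_acuity - shift[dframe1$experiment_type] - offset)
)

# group decides which points are joined; colour only decides the colour.
p <- ggplot(dframe1, aes(x = visual_acuity, y = probability_seen,
                         colour = subject_initials,
                         group = interaction(subject_initials,
                                             experiment_type))) +
  geom_point() +
  geom_line() +
  labs(x = "Visual acuity", y = "Probability seen", colour = "Subject")
p
```

With group = experiment_type alone, points from different subjects in the same experiment would be joined into one line, which may or may not be what is wanted.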
