Plotting multiple frequency polygon lines using ggplot2 - r

I have a dataset with records that have two variables: "time", which holds decade IDs, and "latitude", which holds geographic latitudes. I have 7 time periods (numbered from 26 to 32).
I want to visualize a potential shift in latitude through time. So what I need ggplot2 to do is plot a graph with latitude on the x-axis and the count of records at a certain latitude on the y-axis. I need it to do this for the separate time periods and plot everything in one graph.
I understood that I need the freqpoly geom from ggplot2, and I got this so far:
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25)
This gives me the correct graph of the data, ignoring the time. But how can I implement the time? I tried subsetting the data, but I can't really figure out if this is the best way..
So basically I'm trying to get a graph with 7 lines showing the frequency distribution in each decade in order to look for a latitude shift.
Thanks!!

Without sample data it is hard to answer, but try adding color=factor(time) (where time is the name of your column with time periods). This will draw one line per time period, each in a different color.
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25,
      color = factor(time))
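Since qplot() has been deprecated in recent ggplot2 releases, the same idea can also be written with ggplot() and geom_freqpoly(). The following is only a sketch: the lat_data built here is simulated stand-in data (7 decades numbered 26 to 32 with a small northward drift), not the asker's dataset.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for lat_data: 200 records per decade,
# with the mean latitude drifting slightly from decade to decade.
lat_data <- data.frame(
  time = rep(26:32, each = 200),
  latitude = rnorm(1400,
                   mean = rep(seq(50, 53, length.out = 7), each = 200),
                   sd = 1)
)

# Mapping colour to factor(time) splits the data into one group per decade,
# so geom_freqpoly() draws one frequency polygon for each.
p <- ggplot(lat_data, aes(x = latitude, colour = factor(time))) +
  geom_freqpoly(binwidth = 0.25) +
  labs(x = "Latitude", y = "Number of records", colour = "Decade")

print(p)
```

Wrapping time in factor() matters here: a numeric time column would be treated as continuous and would not split the data into separate lines.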

Related

Add points to geom_density_ridges for groups with small number of observations

I am loving using geom_density_ridges(), with individual points also included for each group. However, some groups have small sample sizes (e.g. n=1 or 2) precluding the generation of the density ridges. For these groups, I'd like to be able to plot the locations of the existing observations - even though no probability density function will be shown.
In this example, I'd like to be able to plot the 2 data points for May on the appropriate line.
library(tidyverse)
library(ggridges)
data("lincoln_weather")
#pull weather from all months that are NOT May
lincoln_weather_nomay<-lincoln_weather[which(lincoln_weather$Month!="May"),]
#pull weather just from May
lincoln_weather_may<-lincoln_weather[which(lincoln_weather$Month=="May"),]
#recombine, keeping only the first two rows for the May dataset
new_weather<-rbind(lincoln_weather_nomay,lincoln_weather_may[c(1:2),])
ggplot( new_weather, aes(x=`Min Temperature [F]`,y=Month,fill=Month))+
geom_density_ridges(alpha = 0.5,jittered_points = TRUE, point_alpha=1,point_shape=21) +
labs(x="Average temperature (F)",y='')+
guides(fill=FALSE,color=FALSE)
How can I add the points for the May observations to the appropriate location (i.e. the May slot) and at the appropriate location along the x-axis?
Simply add a separate geom_point() layer to the plot, in which you subset the data to include only observations for the categories that were not plotted. You can apply any of the usual customizations either to match the points plotted for the other categories or to make these points stand out.
ggplot(new_weather, aes(x = `Min Temperature [F]`, y = Month, fill = Month)) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 1, point_shape = 21) +
  # draw the raw May observations, which are too few for a density ridge
  geom_point(data = subset(new_weather, Month %in% c("May")), shape = 13) +
  labs(x = "Average temperature (F)", y = '') +
  guides(fill = "none", color = "none")

plotting multiple lines in ggplot R

I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. Ideally I want the frequencies (5, 10, 20, 30, 40 kHz) on the x-axis and the number of synapses/cells on the y-axis (usually a numerical value from 10 to 20). The graph should then contain 5 lines, one for each age (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph with the oddly located 5 kHz data point, and a photo of my data in Excel.
example data
example code and graph
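One likely cause, assuming the Frequency column was read in as text rather than as numbers: a text column is treated as discrete and ordered alphabetically, which puts "5" after "40" and also produces the "only one observation" message. The sketch below uses made-up values and a hypothetical second column (puncta17) for illustration. It converts Frequency to numeric and reshapes the data to long format so that each age becomes its own line.

```r
library(ggplot2)

# Hypothetical wide data; Frequency is stored as text on purpose.
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12, 15, 18, 16, 13),
  puncta17  = c(11, 14, 17, 15, 12),  # assumed name for a second age group
  stringsAsFactors = FALSE
)

# Convert to numbers so the x-axis is continuous and ordered 5 < 10 < ... < 40.
mydata$Frequency <- as.numeric(mydata$Frequency)

# Long format: one row per (frequency, age) pair.
long <- data.frame(
  Frequency = rep(mydata$Frequency, times = 2),
  Age       = rep(c("6 weeks", "17 weeks"), each = nrow(mydata)),
  Puncta    = c(mydata$puncta6, mydata$puncta17)
)
# Fix the legend order, which would otherwise be alphabetical.
long$Age <- factor(long$Age, levels = c("6 weeks", "17 weeks"))

p <- ggplot(long, aes(x = Frequency, y = Puncta, colour = Age)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = mydata$Frequency)

print(p)
```

With a numeric x and colour mapped to Age, the group = 1 workaround is no longer needed.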

choosing specific values on the X axis when using ggplot2

I am trying to plot a graph showing the number of events at the Olympics as a function of the year in which each Olympic Games took place.
My data frame is called supertable and it consists of 2 columns, the first is the year and the second is the number of events in the games held that year.
My problem is that on the x-axis I only get the years 1920 and 1980, and I would like to have 1920, 1950, 1980, 2010.
This is my code:
ggplot(data = supertable,aes(x=year,y=no.of.events))+geom_point(colour='red')+
scale_x_discrete(breaks=c(1920,1950,1980,2010))
This is the picture I get
I tried doing this:
scale_x_discrete(breaks=c(1920,1950,1980,2010),limits=c(1920,1950,1980,2010))
but it didn't help.
I am assuming it is something small that I am missing. I tried searching for the answer but didn't find it.
Your x-axis is a continuous variable, so you need to use scale_x_continuous.
You used breaks correctly to indicate where your ticks on the x axis are, but the limits value should be a c(min, max) of the range of the plot you want to show.
Try this: scale_x_continuous(breaks=c(1920,1950,1980,2010), limits = c(1920, 2019))
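Put together, the full call might look like the sketch below. The supertable here is invented data standing in for the asker's data frame; only the column names year and no.of.events come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for supertable: one row per games, numeric year.
supertable <- data.frame(
  year         = seq(1920, 2016, by = 4),
  no.of.events = round(seq(150, 300, length.out = 25))
)

p <- ggplot(data = supertable, aes(x = year, y = no.of.events)) +
  geom_point(colour = "red") +
  # breaks sets the tick positions; limits is the c(min, max) range shown
  scale_x_continuous(breaks = c(1920, 1950, 1980, 2010),
                     limits = c(1920, 2019))

print(p)
```

Note that points falling outside limits are dropped with a warning, so the range should cover all the years you want to see.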

How to plot two y axis? or combine(merge) two plots? Should handle faceted column as well

I've a combination of two difficult(I'm naive) requirements :(
Consider the Weather data as example. Let's say I've dataset with following information.
"Datetime", "Word", "Frequency", "Temperature"
Visualization: I want to see change in frequency of a word over time and at temperature.
X-axis shows the time series(date)
Y-axis has the frequency scale(0 to max freq).
Requirements:
I need to draw frequencies of several words(Column "word") over the time.
Correlate the frequency with temperature.
I started with ggplot2:
ggplot(TemperatureData, aes(x=timeId, y=termFrequency)) + geom_line() + facet_wrap(~Keyword) +
geom_line(data = TemperatureData, aes(y = temperature)) +
labs(x="Time Series over X days", y = "Term Frequency")
The above approach results in overlapping y-axes (frequency, temperature) and a separate panel for each "Word" (a facet in ggplot), i.e. with 3 keywords the plot has 3 panels. Each panel shows temperature over time and the frequency of one word over time.
Problems:
I want separate y-axes for temperature and frequency. Also, I do not want to normalize these y-axes, as it gets tough to understand what the high/low values of each axis are over the days, and the plot loses readability. I learnt that two y-axes are not possible using ggplot2.
A separate panel for each keyword is not required. One line per keyword is what I'm looking for.
The plot should show temperature only once (as a line graph) to reflect its change over time.
I tried using par(), but could not succeed.
Example solution using plotrix package
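More recent ggplot2 versions do offer a secondary axis through sec_axis(), as long as it is a one-to-one transformation of the primary axis. The following is a rough sketch rather than a definitive answer: the data is simulated, the keyword names are invented, and the scale factor k is an assumed linear rescaling. Temperature is multiplied by k for drawing, and the right-hand axis divides by k again so its labels stay in original temperature units.

```r
library(ggplot2)

set.seed(42)
days <- 1:30
# Hypothetical data using the column names from the question's code.
TemperatureData <- data.frame(
  timeId        = rep(days, times = 2),
  Keyword       = rep(c("rain", "sun"), each = 30),
  termFrequency = c(rpois(30, 20), rpois(30, 35)),
  temperature   = rep(round(10 + 8 * sin(days / 5), 1), times = 2)
)

# Temperature is the same for every keyword, so keep one row per day
# and draw it only once.
temp_by_day <- unique(TemperatureData[, c("timeId", "temperature")])

# Assumed scale factor that stretches temperature onto the frequency range.
k <- max(TemperatureData$termFrequency) / max(temp_by_day$temperature)

p <- ggplot(TemperatureData, aes(x = timeId)) +
  # one coloured line per keyword instead of one facet per keyword
  geom_line(aes(y = termFrequency, colour = Keyword)) +
  geom_line(data = temp_by_day, aes(y = temperature * k),
            linetype = "dashed") +
  scale_y_continuous(name = "Term Frequency",
                     sec.axis = sec_axis(~ . / k, name = "Temperature")) +
  labs(x = "Time Series over X days")

print(p)
```

The left axis reads in raw frequency counts and the right axis in raw temperature values, so neither needs to be normalized for reading.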

How to plot binary data together with continuous data in time series with ggplot2?

I have several data sets containing binary and continuous data respectively.
The data sets includes the datetime for the given observation.
The time step in the datetime column is not the same, so I cannot merge the datasets.
(So far I have kept the two datasets apart, especially because the time step within each dataset is itself irregular.)
The binary data is recorded at a lower frequency than the continuous data.
Important: I transformed the time to POSIXct format in order to get around the irregular timesteps in the data
I would like to plot the two datasets in one time series plot with ggplot2.
The binary data (0's and 1's) should shade the continuous curve with rectangular surfaces going from y=-Inf to y=Inf.
Does it make sense?
My question: How do I do that?
How do I create a legend and control the colors of the plot?
So far I have the binary data in one plot using geom_step
and the continuous data in another plot.
I tried multiplot, but it does not seem to work.
The dream situation is, to put multiple plots of different data on top of each other as layers using the POSIXct time as reference somehow!
Not sure I can give some reproducible code..
This is how I transform the time column to POSIXct format:
D$Time <- as.POSIXct(strptime(D$Time, format="%Y/%m/%d %H:%M:%S"))  # strptime() alone returns POSIXlt
This is the plot with two binary data sets using geom_step:
ggplot() +
geom_step(data=E, aes(x=Time, y=Set, group=1, col="high window")) +
geom_step(data=D, aes(x=Time, y=Set, group=1)) +
scale_x_datetime(limits=c(as.POSIXct('0015-01-07 08:00:00'), as.POSIXct('0015-01-07 10:00:00'))) +
scale_y_continuous(breaks=seq(0, 1, 1))
I am currently trying to plot the plot above together with a third dataset which is continuous, which means I need another y-axis if I should continue with geom_step...
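One way to avoid a second y-axis is to draw the binary series as shaded rectangles behind the continuous line, as described earlier in the question. This is a minimal sketch under stated assumptions: both data frames are invented, the column names Value, xmin and xmax are hypothetical, and each binary state is assumed to hold until the next binary timestamp. Because both layers share a POSIXct x-axis, the two datasets never need to be merged.

```r
library(ggplot2)

t0 <- as.POSIXct("2015-01-07 08:00:00", tz = "UTC")

# Hypothetical continuous series with irregular time steps (in seconds).
cont <- data.frame(
  Time  = t0 + c(0, 50, 130, 200, 290, 400, 520, 600),
  Value = c(1.2, 1.5, 1.1, 2.0, 2.4, 1.9, 1.4, 1.6)
)

# Hypothetical binary series, sampled less often.
bin <- data.frame(
  Time = t0 + c(0, 120, 300, 450, 600),
  Set  = c(0, 1, 0, 1, 0)
)

# Turn the step signal into intervals: each state lasts until the next row.
runs <- data.frame(
  xmin = head(bin$Time, -1),
  xmax = tail(bin$Time, -1),
  Set  = head(bin$Set, -1)
)
on <- runs[runs$Set == 1, ]  # keep only the intervals where the signal is 1

p <- ggplot() +
  # shaded bands spanning the full height of the panel
  geom_rect(data = on,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf,
                fill = "window open"),
            alpha = 0.3) +
  geom_line(data = cont, aes(x = Time, y = Value, colour = "measurement")) +
  # naming the fill/colour inside aes() creates legend entries;
  # the manual scales control the actual colors
  scale_fill_manual(name = NULL, values = c("window open" = "orange")) +
  scale_colour_manual(name = NULL, values = c("measurement" = "steelblue")) +
  labs(x = "Time", y = "Value")

print(p)
```

Further binary datasets can be added as extra geom_rect() layers with their own fill labels.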
