Several or multiple timeseries plot outputs from a single data frame - r

Hello,
I have been struggling with this problem for a while now and anyone who can help me out I would greatly appreciate it.
First off, I am working with time series data in a single data frame containing multiple time series. Too many to output individually into graphs. I have tried passing qplot() through ddply() however r tells me it qplot is not a function and therefore it will not work.
the structure of my data is like this...
goodlocs <-
Loc Year dir
Artesia 1983 1490
Artesia 1984 1575
Artesia 1986 1567
Artesia 1987 1630
Artesia 1990 1680
Bogota 1983 1525
Bogota 1984 1610
Bogota 1985 1602
Bogota 1986 1665
Bogota 1990 1715
Carlsbad 1983 1560
Carlsbad 1985 1645
Carlsbad 1986 1637
Carlsbad 1987 1700
Carlsbad 1990 1750
Carlsbad 1992 1595
Datil 1987 1680
Datil 1990 1672
Datil 1991 1735
Datil 1992 1785
I have about 250 Locations(Locs) and would like to be able to go over each stations data on a graph like the following one so I can inspect all of my data visually.
Artesia <- goodlocs[goodlocs$Loc == "Artesia",]
qplot(YEAR, dir, data = Artesia, geom = c("point", "line"), xlab = "Year",
ylab = "DIR", main = "Artesia DIR Over Record Period") +
geom_smooth(method=lm)
I understand that Par() is supposed to help do this but I can not figure it out for the life of me. Any help is greatly appreciated.
Thanks,
-Zia
edit -
as Arun pointed out, I am trying to save a .pdf of 250 different graphs of my goodlocs df split by "Loc", with point and line geometry for data review....
I also tried passing a ddply of my df through qplot as the data but it did not work either, I was not really expecting it to but i had to try.

How about this?
require(ggplot2)
require(plyr)
require(gridExtra)
pl <- dlply(df, .(Loc), function(dat) {
ggplot(data = dat, aes(x = Year, y = dir)) + geom_line() +
geom_point() + xlab("x-label") + ylab("y-label") +
geom_smooth(method = "lm")
})
ml <- do.call(marrangeGrob, c(pl, list(nrow = 2, ncol = 2)))
ggsave("my_plots.pdf", ml, height = 7, width = 13, units = "in")
The idea: First split the data by Loc and create the plot for each subset. The splitting part is done using plyr function dlply that basically takes a data.frame as input and provides a list as output. The plot element is stored in each element of the list corresponding to the subset. Then, we use gridExtra package's marrangeGrob function to arrange multiple plots (which also has the very useful nrow and ncol arguments to set the argument). Then, you can save it using ggsave from ggplot2.
I'll leave you to any additional tweaks you may require.

Related

R GAM visualisation, geom_smooth not fit to all observed data

I've made a GAM model in R using the following code:
mod_gam1 <-gam(y ~ s(ï..x), data=Bird.data, method = "REML")
plot(mod_gam1)
coef(mod_gam1)
plot(mod_gam1, residuals = TRUE, pch = 1)
coef(mod_gam1)
mod_gam1$fitted.values
result <- data.frame(data = c(mod_gam1$fitted.values, Bird.data$y), Year = rep(1991:2019, times = 2),
'source' = c(rep('Modelled', times = 29), rep('Observed', times = 29)))
ggplot(result, aes(x = Year, y = data, colour = source))+ geom_point()+ geom_smooth(span= 0.8)+labs(x="Year", y = "Bird Island Total Debris Count")+ scale_y_continuous(limits = c(0,1000))
and the output looks ok but the shaded area of the geom_smooth error doesn't extend to the whole of my dataset (stops short of my first two datapoints) and I am not sure why.
Any help would be appreciated!
I can't upload a picture as I am new to the site, but yeah basically I have two datasets (observed and GAM modelled values) which both have their SE confidence ribbon, but these start two datapoints in to my datasets not at the first points.
These are my datapoints:
Bird.data
ï..x
y
1991
17
1992
76
1993
328
1994
131
1995
425
1996
892
1997
501
1998
419
1999
297
2000
277
2001
310
2002
282
2003
189
2004
278
2005
322
2006
444
2007
412
2008
241
2009
242
2010
255
2011
289
2012
335
2013
279
2014
628
2015
500
2016
174
2017
636
2018
420
2019
447
Fitted Values
[1] 95.56189 177.01468 255.17074 324.97532 380.28813 415.71334 428.67793 420.86624 398.18522 369.06325
[11] 341.72715 321.65585 310.33971 305.81158 304.53360 303.60521 302.21413 301.75501 304.77184 313.43400
[21] 328.37279 348.39076 371.04203 393.66222 414.29754 432.15104 447.48020 461.14595 474.09266
Negative Binomial
It is because of the limits you have put using scale_y_continuous. If you remove that line (or adjust the y down, so that it allows the minimum y value of the smooth, then you will see the smooth fill completely.
However, you have a larger problem here. You are not actually showing the gam model in the smooth (only the gam point predictions). There are a couple of ways to do this.. Easiest might be to feed Bird.data directly to the ggplot function, and use the method and formula params of the geom_smooth() to directly request the gam smooth:
ggplot(Bird.data, aes(x,y)) +
geom_point() +
geom_smooth(method="gam", formula=y~s(x)) +
labs(x="Year", y = "Bird Island Total Debris Count")
The problem with this approach is that you don't get the prediction points as well. This can be fixed with the following approach
add the se directly to the result dataframe
result$se = c(predict(mod_gam1,se=T)$se, rep(NA,29))
use ggplot as before, but use geom_ribbon, setting the ymin and ymax directly
ggplot(result, aes(x = Year, y = data, colour = source, fill=source))+
geom_point()+
geom_ribbon(aes(ymin=data-1.96*se, ymax=data+1.96*se), alpha=0.2) +
labs(x="Year", y = "Bird Island Total Debris Count")+
scale_y_continuous(limits = c(-200,1000))

How to merge estimation and projection graphs in one plot?

I accessed this graph of estimation of number of cases of diabetes and future projections of numbers for every two year estimation data points from year 2000. The graph is factually incorrect as the points on line do not coincide with the scale on left. I am trying to replot it in ggplot2 or ggplotly. While replotting I intend to make two line graphs in a single plot - One for estimations over last few years and the other for future projections made in those years for next 20-25 years and the year on which the projections were made. Any help is highly appreciable.
Here is the data that was used to plot the graph - Estimated numbers with year are represented in blue while Projected numbers for future years are represented by red line. Since, there are multiple projected numbers for few year, I am intending to keep the highest number on the line graph.
EstimationYear
Estimates (in millions)
Projections (in millions)
Projection Year
2000
151
333
2025
2003
194
380
2025
2006
246
438
2030
2009
285
552
2030
2011
366
578
2030
2013
382
592
2035
2015
415
642
2040
2017
425
629
2045
2019
463
700
2045
Your question is more about the data wrangling than the actual plotting with ggplot. Once you have the data in the right shape, the plotting command is just a few lines.
prepare the data for the estimation (blue) points. Set a column type to "estimation".
prepare the data for the projected (red) points. Set a column type to "projection".
use bind_rows to combine both tables.
In the aesthetics of ggplot use color=type
Here is a start in how you can go recreate the plot from the data. I haven't put any effort in recreating the balloons, set the theme to something more elegant and those kind of things.
library(ggplot2)
txt <- "2000 151 333 2025
2003 194 380 2025
2006 246 438 2030
2009 285 552 2030
2011 366 578 2030
2013 382 592 2035
2015 415 642 2040
2017 425 629 2045
2019 463 700 2045"
df <- read.table(text = txt)
# Putting years and values in the same columns
# Probably some tidyverse function can do this more elegantly
df <- rbind(cbind(unname(df[1:2]), type = "Estimates"),
cbind(unname(df[4:3]), type = "Projection"))
colnames(df) <- c("year", "value", "type")
# We're reordering on value, because the red line does not touch year-duplicates
df <- df[order(df$value, decreasing = TRUE),]
ggplot(df, aes(year, value, colour = type)) +
# Formula notation to filter out data for the line
geom_line(data = ~ subset(., !duplicated(year))) +
geom_point() +
scale_colour_manual(
values = c("Estimates" = "dodgerblue",
"Projection" = "tomato")
) +
scale_y_continuous(limits = c(0, NA),
name = "Millions")
Created on 2021-01-06 by the reprex package (v0.3.0)

How can I get my area plot to stack using ggplot?

I am trying to get my cumulative area plot to stack using the code below, which is based on http://dantalus.github.io/2015/08/16/step-plots/. I have added in position=stack, however the plot still overlaps.
The aim of what I am trying to achieve is to show the cumulative number of publications each year within a given period. So, as an example, in 1940 there may be one publication, the following year there may be 2 more, bringing the cumulative total to 3.
What would be the best way to get the areas to stack on top of each other?
How can the order be controlled? Would I need to use arrange() to order TERM2?
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(data = subset(working, TERM2=="A"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
stat_bin(data = subset(working, TERM2=="B"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack",alpha=0.1) +
stat_bin(data = subset(working, TERM2=="Both"),bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
What I am currently getting:
Example of what I am trying to achieve:
The following chart was created in Excel using the same data which is exactly what I am looking to achieve in R.
My Data:
Example of how my data is currently structured:
Year TERM2
1944 A
1959 B
1966 A
1968 B
1968 A
1970 A
1971 B
1971 B
1971 A
1971 A
1971 Both
1971 Both
1971 Both
1972 A
1972 Both
1972 Both
1973 B
1973 A
1974 A
1974 A
'data.frame': 803 obs. of 6 variables:
$ Year : int 1944 1959 1966 1968 1968 1970 1971 1971 1971 1971 ...
$ TERM2 : Factor w/ 3 levels "B","A","Both": 2 1 2 1 2 2 1 1 2 2 ...
Changes based on user127649's suggestions
This is the plot after user127649's suggestions, which is close to what I would expect except I am looking for it to start at 0 and end at 803 (total number of publications).
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(bins=80, aes(y=cumsum(..count..)), geom="area", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
I think there were two issues.
When You use stat_bin() in three separate layers, each effectively has it’s own independent data set. This will give the correct count, but (and this is a guess really) I think being in three separate layers means you can’t stack them.
If you use stat_bin() on all the layers I think stat = '..count..' performs cumsum() on the data as a whole.
I don’t know whether this is the best approach or not, but I think it’s what you’re after.
Data
The data are grouped and cumsum() is used on each group separately.
library(tidyverse)
working <- working %>%
count(Year, TERM2) %>%
spread(TERM2, n, fill = 0) %>%
mutate_at(vars('A', 'B', 'Both'), cumsum) %>%
gather(TERM2, N, -Year, factor_key = T) #%>%
# mutate(TERM2 = ordered(TERM2, levels = rev(levels(TERM2))))
Plot
This code will produce the first plot below. If you prefer the look of the second plot, you can un-comment the last line of the data manipulation chunk.
ggplot(working, aes(Year, N, fill = TERM2)) +
geom_area(position = 'stack') +
ylab("Total Number")
Result

R ggplot2 the names of the biggest values on x axis

Hi there) can anybody help me. I have a big DF with two columns Country_dest and SumTotal (is value), trying to use qplot function
qplot(country_dest, SumTotal, data=Africa)
Brunei 58
Aruba 73
Cuba 95
Nicaragua 97
Turkmenistan 99
Saint Lucia 102
Honduras 153
Barbados 161
Haiti 165
Montenegro 175
And I would like to draw a plot, but on x axis put the name of the countries (for example 7 or 6 of them) with the highest value of SumTotal, is it possible to do?)
Thank you in advance!
using ggplot, just reorder by population:
ggplot(data = Africa, aes(x= reorder(country_dest, -SumTotal), y= SumTotal)) + geom_bar(stat = "identity")
if you just wanna take say the top 5 use arrange and then subset:
require(dplyr)
Africa.ordered <- arrange(Africa, -SumTotal)
Africa.top5 <- Africa.ordered[1:5,]
and then draw your plot

The difference between geom_density in ggplot2 and density in base R

I have a data in R like the following:
bag_id location_type event_ts
2 155 sorter 2012-01-02 17:06:05
3 305 arrival 2012-01-01 07:20:16
1 155 transfer 2012-01-02 15:57:54
4 692 arrival 2012-03-29 09:47:52
10 748 transfer 2012-01-08 17:26:02
11 748 sorter 2012-01-08 17:30:02
12 993 arrival 2012-01-23 08:58:54
13 1019 arrival 2012-01-09 07:17:02
14 1019 sorter 2012-01-09 07:33:15
15 1154 transfer 2012-01-12 21:07:50
where class(event_ts) is POSIXct.
I wanted to find the density of bags at each location in different times.
I used the command geom_density(ggplot2) and I could plot it very nice. I wonder if there is any difference between density(base) and this command. I mean any difference about the methods that they are using or the default bandwith that they are using and the like.
I need to add the densities to my data frame. If I had used the function density(base), I knew how I can use the function approxfun to add these values to my data frame, but I wonder if it is the same when I use geom_density(ggplot2) .
A quick perusal of the ggplot2 documentation for geom_density() reveals that it wraps up the functionality in stat_density(). The first argument there references that the adjust parameter coming from the base function density(). So, to your direct question - they are built off of the same function, though the exact parameters used may be different. You have some control over setting those parameters, but you may not be able to have the amount of flexibility you want.
One alternative to using geom_density() is to calculate the density that you want outside of ggplot() and then plot it with geom_line(). For example:
library(ggplot2)
#100 random variables
x <- data.frame(x = rnorm(100))
#Calculate own density, set parameters as you desire
d <- density(x$x)
x2 <- data.frame(x = d$x, y = d$y)
#Using geom_density()
ggplot(x, aes(x)) + geom_density()
#Using home grown density
ggplot(x2, aes(x,y)) + geom_line(colour = "red")
Here, they give nearly identical plots, though they may vary more significantly with your data and your settings.

Resources