Time series plot in R

My data looks something like this. There are 10,000 rows, each representing a city, with one column for every month from 1998-01 to 2013-09:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to plot all months since 1998 for any one city, or for several cities at once.
I tried this but I get an error. I am not sure I am even attempting this the right way. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plots(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"

You might try
require(reshape)
require(ggplot2)
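# id.vars must name the ID columns of your own data (the names here are placeholders)
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# month labels such as "1998-01" have no day, so add one before converting
forecl$month <- as.Date(paste0(forecl$month, "-01"))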
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()
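If the month labels look like "1998-01", a quick check in base R shows how they can be turned into dates. This is only a sketch; month_labels is a made-up vector standing in for the melted month column.

```r
# Hypothetical month labels in the same "YYYY-MM" form as the column names.
month_labels <- c("1998-01", "1998-02", "2013-09")

# as.Date() cannot parse a year-month string on its own, so append a day first.
month_dates <- as.Date(paste0(month_labels, "-01"))

# Formatting back to year-month recovers the original labels.
format(month_dates, "%Y-%m")
```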

To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are too many cities and the colors will be hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)

Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case, the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
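Another thing that may be worth checking: ts() treats each row as a time point and each column as a series, so a table with one row per city probably needs to be transposed first. A minimal base R sketch, assuming the month columns are named like "1998-01" (the toy values and the names month_cols, values, city_ts and cities are made up for illustration):

```r
# Toy data in the same wide layout as the question (values are made up).
forecl <- data.frame(
  RegionName = c("New York", "Los Angeles", "Philadelphia"),
  State = c("NY", "CA", "PA"),
  `1998-01` = c(1.34, 12.88, 1.63),
  `1998-02` = c(1.34, 12.55, 0.56),
  `1998-03` = c(1.35, 12.27, 0.24),
  check.names = FALSE
)

# Keep only the month columns and transpose so that rows are months.
month_cols <- grep("^[0-9]{4}-[0-9]{2}$", names(forecl))
values <- t(as.matrix(forecl[, month_cols]))
colnames(values) <- forecl$RegionName

# One column per city, one row per month.
city_ts <- ts(values, start = c(1998, 1), frequency = 12)

# plot.type = "single" draws all selected series in one panel.
cities <- c("New York", "Philadelphia")
plot(city_ts[, cities], plot.type = "single", col = seq_along(cities),
     ylab = "value")
legend("topright", legend = cities, col = seq_along(cities), lty = 1)
```

With plot.type = "single" the ten-series limit of the default "multiple" layout does not apply, though more than a handful of lines in one panel gets hard to read.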

Related

Why is geom_bracket not allowing me to plot a bracket?

I would like to add a bracket using geom_bracket for my first two groups of countries the United Kingdom (UK) and France (FR). I use the following code and it plots the three estimates:
library(ggpubr)
library(ggplot2)
df %>%
ggplot(aes(estimate, cntry)) +
geom_point()
However, whenever I add geom_bracket as below, I get an error. I tried to get around it in different ways but it is still not working. Could someone let me know what I am doing wrong?
df %>%
ggplot(aes(estimate, cntry)) +
geom_point() +
geom_bracket(ymin = "UK", ymax = "FR", x.position = -.75, label.size = 7,
label = "group 1")
Here is a reproducible example:
structure(list(cntry = structure(1:3, .Label = c("BE", "FR",
"UK"), class = "factor"), estimate = c(-0.748, 0.436,
-0.640)), row.names = c(NA, -3L), groups = structure(list(
cntry = structure(1:3, .Label = c("BE", "FR", "UK"), class = "factor"),
.rows = structure(list(1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Well, it's pretty damn late at that, but I figured out a workaround for this. I thought that I might as well post it here in case anyone finds it useful.
Firstly, as Basti mentioned, ymin, ymax, and x.position aren't arguments that can be used - you have to use xmin, xmax, and y.position. Now, won't this only work for a flipped graph (i.e. x = cntry, y = estimate)? Yes, it will. However you can easily get around this by using coord_flip().
Secondly, it turns out that geom_bracket doesn't inherit the data description (df) and won't run without it being defined inside it. Why? No idea. But this is what was causing the error. Additionally, for some reason, merely defining the data isn't enough, a label must also be added. Not a problem here, just thought I might mention it for dumb people like me who decided to use geom_bracket to add brackets to stat_compare_means.
Here's an example of the OP that should work, along with data generation:
library(ggplot2)
library(ggpubr)
library(tibble) #I like tibbles
df <- tibble(cntry = factor(c("BE", "FR", "UK")),
estimate = c(-0.748,0.436,-0.64)) #dataframe generation
df %>%
ggplot(aes(cntry, estimate)) +
geom_point() +
coord_flip() + #necessary if you want to keep this weird x/y orientation
geom_bracket(data = df, xmin = "UK", xmax = "FR", y.position = -.75,
label.size = 7, label = "group 1", coord.flip = T)
#coord.flip = T reflects the added coord_flip()
You can then play around with y coordinates, size, etc. You can also expand the graph using expand_limits().

Plotting different rows as different lines in R with matplot

I would like to plot different rows as different lines in the same plot to illustrate the movements of the average development of 3 groups: All, Men and Women. However, I'm not getting one of the lines printed and the legend is not being filled with the rownames.
I'd be glad for a solution, either in matplot or in ggplot.
Thank you!
Code:
matplot(t(Market_Work), type = 'l', xaxt = 'n', xlab = "Time Period", ylab = "Average", main ="Market Work")
legend("right", legend = seq_len(nrow(Market_Work)), fill=seq_len(nrow(Market_Work)))
axis(1, at = 1:6, colnames(Market_Work))
Data:
2003-2005 2006-2008 2009-2010 2011-2013 2014-2016 2017-2018
All 31.48489 32.53664 30.41938 30.53870 31.15550 31.77960
Men 37.38654 38.16698 35.10247 35.65543 36.54855 36.72496
Women 31.48489 32.53664 30.41938 30.53870 31.15550 31.77960
> dput(Market_Work)
structure(list(`2003-2005` = c(31.4848853173555, 37.3865421137,
31.4848853173555), `2006-2008` = c(32.5366433161048, 38.1669798351148,
32.5366433161048), `2009-2010` = c(30.4193794808191, 35.1024661973137,
30.4193794808191), `2011-2013` = c(30.5387012166381, 35.6554329405739,
30.5387012166381), `2014-2016` = c(31.1555032381292, 36.5485451138792,
31.1555032381292), `2017-2018` = c(31.7795953402235, 36.7249638612854,
31.7795953402235)), row.names = c("All", "Men", "Women"), class = "data.frame")
Here is an example with ggplot2. I changed some of your data, as two rows were the same in your original data.
library(tidyverse)
df <- structure(list(`2003-2005` = c(31.4848853173555, 37.3865421137,
30.4848853173555), `2006-2008` = c(32.5366433161048, 38.1669798351148,
30.5366433161048), `2009-2010` = c(30.4193794808191, 35.1024661973137,
33.4193794808191), `2011-2013` = c(30.5387012166381, 35.6554329405739,
33.5387012166381), `2014-2016` = c(31.1555032381292, 36.5485451138792,
30.1555032381292), `2017-2018` = c(31.7795953402235, 36.7249638612854,
30.7795953402235)), row.names = c("All", "Men", "Women"), class = "data.frame")
df2 <- as.data.frame(t(df))
df2$Year <- rownames(df2)
df2%>% pivot_longer( c(All,Men,Women), names_to = "Category") %>%
ggplot(aes(x = Year, y = value)) + geom_line(aes(group = Category, color = Category))
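For a matplot version, something along these lines should work. It is a sketch with shortened, made-up numbers (the Women row is changed so it no longer coincides with All, which is likely why one line seemed to be missing), and it passes the row names to legend() instead of seq_len(). The names groups and periods are just helpers for this example.

```r
# Shortened toy version of Market_Work (values are made up).
Market_Work <- data.frame(
  `2003-2005` = c(31.5, 37.4, 30.5),
  `2006-2008` = c(32.5, 38.2, 30.5),
  `2009-2010` = c(30.4, 35.1, 33.4),
  row.names = c("All", "Men", "Women"),
  check.names = FALSE
)

groups <- rownames(Market_Work)
periods <- colnames(Market_Work)

# Transpose so that each group becomes one column, i.e. one line.
matplot(t(Market_Work), type = "l", lty = 1, col = seq_along(groups),
        xaxt = "n", xlab = "Time Period", ylab = "Average", main = "Market Work")
axis(1, at = seq_along(periods), labels = periods)

# Use the row names as legend labels, with the same colours as the lines.
legend("right", legend = groups, col = seq_along(groups), lty = 1)
```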

How to draw a loess line in a ts plot

I have spent hours trying to get a loess line to work. The problem is that I know very little R (let's say next to nothing); I only have to use it for one university course.
I created a fake table; the real table is available for download here.
I have to make a time series plot, which worked surprisingly well. Now I have to add two loess lines with different spans, but I don't know how the command really works. I know it should be something like loess(..~.., data=..). The step where I'm stuck is marked with "WHAT BELONGS HERE" in the code below.
table <- structure(list(
Months = c("1980-06", "1980-07", "1980-08", "1980-09",
"1980-10", "1980-11", "1980-12", "1981-01"),
Total = c(75000, 70000, 60000, 73000, 72000, 71000, 76000, 71000)),
.Names = c("Months", "Total of Killed Pigs"),
row.names = c(NA, 8L), class = "data.frame")
ts.obj <- ts(table$`Total of Killed Pigs`, start = c(1980, 6), frequency = 12)
plot(ts.obj)
trend1 <- loess(# **WHAT BELONGS HERE?**, data = table, span =1)
predict1 <- predict(trend1)
lines(predict1, col ="blue")
That is my original code:
obj <- read.csv(file="PATH/monthly-total-number-of-pigs-sla.csv", header=TRUE, sep=",")
ts.obj <- ts(obj$Monthly.total.number.of.pigs.slaughtered.in.Victoria..Jan.1980...August.1995, start = c(1980, 1), frequency = 12)
plot(ts.obj)
trend1 <- loess (WHAT BELONGS HERE?, data = obj, span =1)
predict1 <- predict (trend1)
lines(predict1, col="blue")
We can do away with the data argument as the time series is univariate (just one variable).
The formula ts.obj ~ index(ts.obj) can be read as "value as a function of time": ts.obj gives you the values, index(ts.obj) gives you the time index for those values, and the tilde ~ specifies that the first is a function of, or dependent on, the other.
library(zoo) # for index()
plot(ts.obj)
trend1 <- loess(ts.obj ~ index(ts.obj), span=1)
trend2 <- loess(ts.obj ~ index(ts.obj), span=2)
trend3 <- loess(ts.obj ~ index(ts.obj), span=3)
pred <- sapply(list(trend1, trend2, trend3), predict)
matlines(index(ts.obj), pred, lty=1, col=c("blue", "red", "orange"))
zoo isn't strictly required. If you replace index(ts.obj) with as.numeric(time(ts.obj)) you should be fine, I think.
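A zoo-free variant could look roughly like this. It is a sketch on a made-up monthly series (the real pig data is not reproduced here), and dat, fit_smooth and fit_wiggly are names chosen for the example. As far as I understand ?loess, a span up to 1 is the proportion of points used in each local fit, while values above 1 use all points, so two spans below 1 tend to show the difference more clearly.

```r
# Made-up monthly series standing in for the pig data.
set.seed(1)
ts.obj <- ts(70000 + cumsum(rnorm(48, sd = 2000)),
             start = c(1980, 1), frequency = 12)

# Put time and values into a plain data frame so loess() sees ordinary numeric vectors.
dat <- data.frame(time = as.numeric(time(ts.obj)),
                  value = as.numeric(ts.obj))

# Two fits with different spans: larger span = smoother line.
fit_smooth <- loess(value ~ time, data = dat, span = 0.75)
fit_wiggly <- loess(value ~ time, data = dat, span = 0.3)

plot(ts.obj)
lines(dat$time, predict(fit_smooth), col = "blue")
lines(dat$time, predict(fit_wiggly), col = "red")
```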
In case you were wanting to go with ggplot2:
library(ggplot2)
library(dplyr)
table <- structure(list(
Months = c("1980-06", "1980-07", "1980-08", "1980-09",
"1980-10", "1980-11", "1980-12", "1981-01"),
Total = c(75000, 70000, 60000, 73000, 72000, 71000, 76000, 71000)),
.Names = c("Months", "Total"),
row.names = c(NA, 8L), class = "data.frame")
Change to proper dates:
table <- table %>% mutate(Months = as.Date(paste0(Months,"-01")))
Plot:
ggplot(table, aes(x=Months, y=Total)) +
geom_line() +
geom_smooth(span=1, se= FALSE, color ="red") +
geom_smooth(span=2, se= FALSE, color ="green") +
geom_smooth(span=3, se= FALSE) +
theme_minimal()

how to simply plot similar dates of different years in one plot

I have a dataframe with dates. Here are the first 3 rows with dput:
df.cv <- structure(list(ds = structure(c(1448064000, 1448150400, 1448236800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), y = c(10.4885204292416,
10.456538985014, 10.4264986311659), yhat = c(10.4851491194439,
10.282089547027, 10.4354960430083), yhat_lower = c(10.4169914076864,
10.2162549984153, 10.368531352493), yhat_upper = c(10.5506038959764,
10.3556867861042, 10.5093092789713), cutoff = structure(c(1447977600,
1447977600, 1447977600), class = c("POSIXct", "POSIXt"), tzone = "UTC")),.Names = c("ds",
"y", "yhat", "yhat_lower", "yhat_upper", "cutoff"), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
I'm trying to plot the data with ggplot + geom_line from similar day/month combinations in one plot. So, for example, I want the y-value of 2016-01-01 to appear on the same x-value as 2017-01-01. I found a way to do this, but it seems to be a very complex workaround:
library(tidyverse)
library(lubridate)
p <- df.cv %>%
mutate(jaar = as.factor(year(ds))) %>%
mutate(x = as_date(as.POSIXct(
ifelse(jaar==2016, ds + years(1), ds),
origin = "1970-01-01")))
ggplot(p %>% filter(jaar!=2015), aes(x=x, group=jaar, color=jaar)) +
geom_line(aes(y=y))
It works, but as you can see I first have to extract the year, then use ifelse to add one year to only the 2016 dates, convert back to POSIXct while supplying an origin (because ifelse strips the class), and finally remove the time component with as_date.
Isn't there a simpler, more elegant way to do this?
Use year<- to replace the year with any fixed leap year:
p <- df.cv %>%
mutate(jaar = as.factor(year(ds)),
x = `year<-`(as_date(ds), 2000))
ggplot(p, aes(x = x, y = y, color = jaar)) +
geom_line()
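The same idea can be sketched in base R without lubridate, in case that is useful. The dates below are made up, and x and jaar simply mirror the names used above.

```r
# Made-up dates spanning several years, including a leap day.
ds <- as.Date(c("2015-11-21", "2016-02-29", "2016-11-21", "2017-01-01"))

# Keep month and day but pin every date to the leap year 2000,
# so that 29 February remains a valid date.
x <- as.Date(format(ds, "2000-%m-%d"))

# The original year becomes the grouping variable.
jaar <- factor(format(ds, "%Y"))

data.frame(ds, x, jaar)
```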

First painful attempt to do a Spatial map

I am struggling to get my first map to work. I have read every document I could find but I am not able to pull it all together to view my data on a map.
This is what I have done so far.
1. I created a very basic data table with 3 observations and 5 variables as a very simple starting point.
str(Datawithlatlongnotvector)
'data.frame': 3 obs. of 5 variables:
$ Client: Factor w/ 3 levels "Jan","Piet","Susan": 2 1 3
$ Sales : int 100 1000 15000
$ Lat : num 26.2 33.9 23.9
$ Lon : num 28 18.4 29.4
$ Area : Factor w/ 3 levels "Gauteng","Limpopo",..: 1 3 2
(the Area is the provinces of South Africa and also is as per the SHP file that I downloaded, see below)
I downloaded a map of South Africa from http://www.mapmakerdata.co.uk.s3-website-eu-west-1.amazonaws.com/library/stacks/Africa/South%20Africa/index.htm (selecting "Simple base map") and placed all three files (.dbf, .shp and .shx) in the same directory. Not doing so caused an earlier error, which I resolved thanks to another user's question.
I created a map as follows :
SAMap <- readOGR(dsn = ".", layer = "SOU-level_1")
and I can plot the map of the country showing the provinces with plot(SAMap)
I can also plot the data
plot(datawithlatlong)
I saw the instructions how to make a SpatialPointsData frame and I did that :
coordinates(Datawithlatlong) = ~Lat + Lon
I do not know how to pull it all together and do the following :
Show the data (100,1000 and 15000) on the map with different colours i.e. between 1 and 500 is one colour, between 501 and 10 000 is one colour and above 10 000 is one colour.
You could try ggplot2 with something like:
map = ggplot(df, aes(long, lat, fill = Sales_cat)) + scale_fill_brewer(type = "seq", palette = "Oranges", name = "Sales") + geom_polygon()
With scale_fill_brewer you can represent scales in terms of colours on the map. You should create a factor variable that represents categories according to the range of sales ("Sales_cat"). In any case, the shape file must be transformed into a data.frame.
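Creating that factor could look something like this with cut(). It is a sketch that assumes the three ranges from the question; the labels are my own choice.

```r
# The three sales values from the question.
sales <- c(100, 1000, 15000)

# Breaks follow the ranges in the question: 1-500, 501-10,000 and above 10,000.
# Intervals are closed on the right, so 500 falls into the first category.
Sales_cat <- cut(sales,
                 breaks = c(0, 500, 10000, Inf),
                 labels = c("1-500", "501-10,000", "> 10,000"))

table(Sales_cat)
```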
Try this, with 'SAMap' as the country shapefile and 'datawithlatlong' as your data converted to a SpatialPointsDataFrame:
library(maptools)
library(classInt)
library(RColorBrewer)
# Prepare colour palette
plotclr <- brewer.pal(3,"PuRd")
class<-classIntervals(datawithlatlong@data$Sales, n=3, style="fixed", fixedBreaks=c(0, 500,1000,10000)) # you can adjust the intervals here
colcode <- findColours(class, plotclr)
# Plot country map
plot(SAMap,xlim=c(16, 38.0), ylim=c(-46,-23))# plot your polygon shapefile with appropriate xlim and ylim (extent)
# Plot dataframe converted to SPDF (in your step 5)
plot(datawithlatlong, col=colcode, add=T,pch=19)
# Creating the legend
legend(16.2, -42, legend=names(attr(colcode, "table")), fill=attr(colcode, "palette"), cex=0.6, bty="n") # adjust the x and y for fixing appropriate location for the legend
I generated a bigger dataset because I think with only 3 points it is hard to see how things are working.
library(rgdal)
library(tmap)
library(ggmap)
library(randomNames)
#I downloaded the shapefile with the administrative area polygons
map <- readOGR(dsn = ".", layer = "SOU")
#the coordinate system is not part of the loaded object hence I added this information
proj4string(map) <- CRS("+init=epsg:4326")
# Some sample data with random client names and random region
ADM2 <- sample(map@data$ADM2, replace = TRUE, 50)
name <- randomNames(50)
sales <- sample(0:5000, 50)
clientData <- data.frame(id = 1:50, name, region = as.character(ADM2), sales,
stringsAsFactors = FALSE)
#In order to add the geoinformation for each client I used the awesome
#function `ggmap::geocode` which takes a character string as input and
#provides the lon and lat for the region, city ...
geoinfo <- geocode(clientData$region, messaging = FALSE)
# Use this information to build a Point layer
clientData_point <- SpatialPointsDataFrame(geoinfo, data = clientData)
proj4string(clientData_point) <- CRS("+init=epsg:4326")
Now the part I hope that answers the question:
# Adding all sales which occurred in one region
# If there are 3 clients in one region, the sales of the three are
# summed up and returned in a new layer
sales_map <- aggregate(x = clientData_point[ ,4], by = map, FUN = sum)
# Building a map using the `tmap` package
tm_shape(sales_map) + tm_polygons(col = "sales")
Edit:
Here is a ggplot2 solution because it seems you want to stick with it.
First, for ggplot you have to transform your SpatialPolygonsDataFrame to an ordinary data.frame. Fortunately, broom::tidy() will do the job automatically.
Second, your Lat values are missing a -. I added it.
Third, I renamed your objects for less typing.
point_layer<- structure(list(Client = structure(c(2L, 1L, 3L),
.Label = c("Jan", "Piet", "Susan"),
class = "factor"),
Sales = c(100, 1000, 15000 ),
Lat = c(-26.2041, -33.9249, -23.8962),
Lon = c(28.0473, 18.4241, 29.4486),
Area = structure(c(1L, 3L, 2L),
.Label = c("Gauteng", "Limpopo", "Western Cape"),
class = "factor"),
Sale_range = structure(c(1L, 2L, 4L),
.Label = c("(1,500]", "(500,2e+03]", "(2e+03,5e+03]", "(5e+03,5e+04]"),
class = "factor")),
.Names = c("Client", "Sales", "Lat", "Lon", "Area", "Sale_range"),
row.names = c(NA, -3L), class = "data.frame")
point_layer$Sale_range <- cut(point_layer$Sales, c(1,500.0,2000.0,5000.0,50000.0 ))
library(broom)
library(ggplot2)
ggplot_map <- tidy(map)
ggplot() + geom_polygon(ggplot_map, mapping = aes(x = long, y = lat, group = group),
fill = "grey65", color = "black") +
geom_point(point_layer, mapping = aes(x = Lon, y = Lat, col = Sale_range)) +
scale_colour_brewer(type = "seq", palette = "Oranges", direction = 1)

Resources