R native time series: date data - r

There are R native datasets, such as the Nile dataset, that are time series. However, if I actually look at the data set, be it as it was, after as_tibble(), after as.data.frame() – it doesn't matter –, there is only one column: x (which, in this specific case, is the "measurement of anual flow of the river"). However, if I plot() the data, in any of the three formats (raw, tibble or data.frame), I plots with the dates:
(Technically, the x axis label changes, but that's not the point).
Where are these dates stored? How can I access them (to use ggplot(), for example), or even – how can I see them?

If you use str(Nile) or print(Nile), you'll see that the Nile data set is store in a Time-Series object. You can use the start(), end() and frequency() functions to extract those attribute then create a new column to store those informations.
data(Nile)
new_df = data.frame(Nile)
new_df$Time = seq(from = start(Nile)[[1]], to = end(Nile)[[1]], by = frequency(Nile))

Related

Download forecast values from tsibble (Fable)

I'm sure this is a simple question, but relatively new here. I'm trying to extract the forecasted values in a CSV/table I can use outside of R. I followed along with the multiple series example from here: https://www.mitchelloharawild.com/blog/fable/ . I'm trying to extract the 2 years forecasted data that's completed in this step:
fit %>%
forecast(h = "2 years") %>%
autoplot(tourism_state, level = NULL)
I can see the 3 models in the autoplot, but can't figure out how to get the forecasted values from the Fit tsibble. Any help is appreciated. It looks like there's quite a bit of information that can be genreated (forecast intervals, etc.), so if there's somewhere I can reference on how to parse through what all can be downloaded and how please let me know. Thanks!
The forecasted values of a fable can be saved to a csv using readr::write_csv().
When used with columns that are not in a flat format (such as forecast distributions or intervals), the values will be stored as character strings and information will be lost. Before writing to a file, you should flatten these structures by extracting their components into separate columns.
You can use unpack_hilo() to extract the lower, upper, and level values within a <hilo> to create a flat data structure. Alternatively you can access the components of a <hilo> with $, for example: my_interval$lower.

How to get dates on the xaxis of my Arima forecast plot rather than just numbers

I have imported a netCDF file into R and created a dataset which has 58196 time stamps. I’ve then fitted an Arima model to it and forecasted. However, the format of the time is ‘hours since 1900-01-01 00:00:00’. Each of the times are just in a numerical order up to 58196, but I would like to use ggplot to plot the forecast with dates on the xaxis.
Any ideas? Here is some code I have put in.
I have read in the required variable and taken it along what pressure level I want, so that it is a single variable at 58169 times, 6hourly intervals up to the end of the year in 2018. I have then done the following:
data <- data_array[13, ] # To get my univariate time series.
print(data)
[58176] -6.537371e-01 -4.765177e-01 -4.226107e-01 -4.303621e-01
-3.519134e-01
[58181] -2.706966e-01 -1.864843e-01 -9.974014e-02 2.970415e-02
6.640909e-02
[58186] -1.504763e-01 -3.968417e-01 -4.864971e-01 -5.934973e-01
-7.059880e-01
[58191] -7.812654e-01 -7.622807e-01 -8.968482e-01 -9.414597e-01
-1.003678e+00
[58196] -9.908477e-01
datafit <- auto.arima(data)
datamodel <- Arima(data, order = c(5, 0, 2))
datafcst <- forecast(datamodel, h=60, level=95)
plot(datafcst, xlim=c(58100, 58250))
enter image description here
I have attached the image it yields too. The idea is that I can use ggplot to plot this rather than the standard plot, with dates on the xaxis instead of the numerical values. However, ggplot also won't work for me as it says it isn't considered a data frame?
Many thanks!
as you did not provide a minimal example it is hard to help you but I try. Assume your date is called "date".
dater = as.Date(strptime(date, "%Y-%m-%d"))
And from ?strptime:
format
A character string. The default for the format methods is "%Y-%m-%d %H:%M:%S" if any element has a time component which is not midnight, and "%Y-%m-%d" otherwise.
Hope that helps

Choropleth Plotting polygons with ggplot2 R on a map

I realise this has been asked about 100 times prior, but none of the answers I've read so far on SO seem to fit my problem.
I have data. I have the lat and lon values. I've read around about something called sp and made a bunch of shape objects in a dataframe. I have matched this dataframe with the variable I am interested in mapping.
I cannot for the life of me figure out how the hell to get ggplot2 to draw polygons. Sometimes it wants explicit x,y values (which are a PART of the shape anyway, so seems redundant), or some other shape files externally which I don't actually have. Short of colouring it in with highlighters, I'm at a loss.
if I take an individual sps object (built with the following function after importing, cleaning, and wrangling a shitload of data)
createShape = function(sub){
#This funciton takes the list of lat/lng values and returns a SHAPE which should be plottable on ggmap/ggplot
tempData = as.data.frame(do.call(rbind, as.list(VICshapes[which(VICshapes$Suburb==sub),] %>% select(coords))[[1]][[1]]))
names(tempData) = c('lat', 'lng')
p = Polygon(tempData)
ps = Polygons(list(p),1)
sps = SpatialPolygons(list(ps))
return(sps)
}
These shapes are then stored in the same dataframe as my data - which only this afternoon for some reason, I can't even look at, as trying to look at it yields the following error.
head(plotdata)
Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic
I realise I'm really annoyed at this now, but I've about 70% of a grade riding on this, and my university has nobody capable of assisting.
I have pasted the first few rows of data here - https://pastebin.com/vFqy5m5U - apparently you can't print data with an s4 object - the shape file that I"m trying to plot.
Anyway. I'm trying to plot each of those shapes onto a map. Polygons want an x,y value. I don't have ANY OTHER SHAPE FILES. I created them based on a giant list of lat and long values, and the code chunk above. I'm genuinely at a loss here and don't know what question to even ask. I have the variable of interest based on locality, and the shape for each locality. What am I missing?
edit: I've pasted the summary data (BEFORE making them into shapes) here. It's a massive list of lat/lng values for EACH tile/area, so it's pretty big...
Answered on gis.stackexchange.com (link not provided).

Creating SpatialLinesDataFrame from SpatialLines object and basic df

Using leaflet, I'm trying to plot some lines and set their color based on a 'speed' variable. My data start at an encoded polyline level (i.e. a series of lat/long points, encoded as an alphanumeric string) with a single speed value for each EPL.
I'm able to decode the polylines to get lat/long series of (thanks to Max, here) and I'm able to create segments from those series of points and format them as a SpatialLines object (thanks to Kyle Walker, here).
My problem: I can plot the lines properly using leaflet, but I can't join the SpatialLines object to the base data to create a SpatialLinesDataFrame, and so I can't code the line color based on the speed var. I suspect the issue is that the IDs I'm assigning SL segments aren't matching to those present in the base df.
The objects I've tried to join, with SpatialLinesDataFrame():
"sl_object", a SpatialLines object with ~140 observations, one for each segment; I'm using Kyle's code, linked above, with one key change - instead of creating an arbitrary iterative ID value for each segment, I'm pulling the associated ID from my base data. (Or at least I'm trying to.) So, I've replaced:
id <- paste0("line", as.character(p))
with
lguy <- data.frame(paths[[p]][1])
id <- unique(lguy[,1])
"speed_object", a df with ~140 observations of a single speed var and row.names set to the same id var that I thought I created in the SL object above. (The number of observations will never exceed but may be smaller than the number of segments in the SL object.)
My joining code:
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object)
And the result:
row.names of data and Lines IDs do not match
Thanks, all. I'm posting this in part because I've seen some similar questions - including some referring specifically to changing the ID output of Kyle's great tool - and haven't been able to find a good answer.
EDIT: Including data samples.
From sl_obj, a single segment:
print(sl_obj)
Slot "ID":
[1] "4763655"
[[151]]
An object of class "Lines"
Slot "Lines":
[[1]]
An object of class "Line"
Slot "coords":
lon lat
1955 -74.05228 40.60397
1956 -74.05021 40.60465
1957 -74.04182 40.60737
1958 -74.03997 40.60795
1959 -74.03919 40.60821
And the corresponding record from speed_obj:
row.names speed
... ...
4763657 44.74
4763655 34.8 # this one matches the ID above
4616250 57.79
... ...
To get rid of this error message, either make the row.names of data and Lines IDs match by preparing sl_object and/or speed_object, or, in case you are certain that they should be matched in the order they appear, use
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object, match.ID = FALSE)
This is documented in ?SpatialLinesDataFrame.
All right, I figured it out. The error wasn't liking the fact that my speed_obj wasn't the same length as my sl_obj, as mentioned here. ("data =
object of class data.frame; the number of rows in data should equal the number of Lines elements in sl)
Resolution: used a quick loop to pull out all of the unique lines IDs, then performed a left join against that list of uniques to create an exhaustive speed_obj (with NAs, which seem to be OK).
ids <- data.frame()
for (i in (1:length(sl_obj))) {
id <- data.frame(sl_obj#lines[[i]]#ID)
ids <- rbind(ids, id)
}
colnames(ids)[1] <- "linkId"
speed_full <- join(ids, speed_obj)
speed_full_short <- data.frame(speed_obj[,c(-1)])
row.names(speed_full_short) <- speed_full$linkId
splndf <- SpatialLinesDataFrame(sl_obj, data = speed_full_short, match.ID = T)
Works fine now!
I may have deciphered the issue.
When I am pulling in my spatial lines data and I check the class it reads as
"Spatial Lines Data Frame" even though I know it's a simple linear shapefile, I'm using readOGR to bring the data in and I believe this is where the conversion is occurring. With that in mind the speed assignment is relatively easy.
sl_object$speed <- speed_object[ match( sl_object$ID , row.names( speed_object ) ) , "speed" ]
This should do the trick, as I'm willing to bet your class(sl_object) is "Spatial Lines Data Frame".
EDIT: I had received the same error as OP, driving me to check class()
I am under the impression that the error that was populated for you is because you were trying to coerce a data frame into a data frame and R wasn't a fan of that.

R - How to work with and plot time data

I have the following data (of which the following is a small sample):
times <- c("02:45:00", "02:45:07", "02:45:10", "02:45:20", "02:45:25", "02:45:27", "02:45:27", "02:45:30", "02:45:32", "02:45:37")
I would like to plot these times and be able to have them be in a time variable format if possible. In the graph, I want to be able to have different time bands in order to create a histogram of the different distribution of times.
You could look into strptime to get familiar with the base time format.
Then, something like this might get you started:
hist(strptime(times,"%H:%M:%S"), "secs", freq = TRUE, xlab="seconds")

Resources