Efficient way to compile NC4 file information from separate files in R

I am currently trying to compile temperature information from the WDFE5 data set, which is quite large, and am struggling to find an efficient way to do it. My main goals are to:
Determine the max temperature for individual days for each individual grid cell
Change the time step from hourly to daily and from UTC to MST.
The data set is stored in monthly NC4 files and contains the temperature data in a three-dimensional array (time, lat, lon). My main question is whether there is an efficient way to compile this data to meet my goals, or to manipulate the NC4 files so they are easier to work with (somehow merge the monthly files into one mega file?).
I have tried two rather convoluted ways to handle the gaps between months (for example, due to the time conversion, some days end up spanning two months, which requires me to read in the next file and then continue reading the data).
My first try was to read one month (one file) at a time, using pmax() to get the max value of each grid cell across 24 hourly time steps, and then repeating the process for the next day. I have been using ncvar_get() with start and count to read only one time step at a time. To catch days that span two months, I wrote a convoluted function to merge the two by calculating the number of one-hour periods left in one month and how many would be needed from the next.
My second try still involved pmax(), but I used a different method to fill in the gaps between months. I built a date vector from the time variable, with one entry per hourly time step, and matched time steps that fall on the same day. While this seems better, it still has to read multiple NC4 files, which gets very convoluted compared to just reading one NC4 file with all the needed information.
In the end, I tested a few cases and both solutions seem to work, but they run extremely slowly and seem very overcomplicated to me. I was wondering if anyone had suggestions on how to better set up the NC4 files for reading and time conversion.
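One way to avoid special-casing month boundaries is to concatenate the hourly data along the time axis first and only then group by local calendar day. The following is a minimal base-R sketch of that idea, not a tested solution for the actual files: the arrays a1 and a2 are simulated stand-ins for what would be read from two consecutive monthly files, the dimension order (x, y, time) is an assumption, and MST is treated as a fixed UTC-7 offset with no daylight saving.

```r
# Simulated stand-ins for two consecutive monthly reads (hypothetical data).
# Assumed layout: (x, y, time), hourly steps, timestamps in UTC.
set.seed(1)
nx <- 2; ny <- 3
a1 <- array(rnorm(nx * ny * 48), dim = c(nx, ny, 48))  # last 48 h of January
a2 <- array(rnorm(nx * ny * 48), dim = c(nx, ny, 48))  # first 48 h of February

# Time is the last dimension, so c() concatenates the arrays along time.
temp <- array(c(a1, a2), dim = c(nx, ny, 96))
time_utc <- seq(as.POSIXct("2000-01-30 00:00", tz = "UTC"),
                by = "hour", length.out = 96)

# Shift to MST (assumed fixed UTC-7) and take the calendar date of each step.
local_day <- as.Date(time_utc - 7 * 3600, tz = "UTC")
day_levels <- unique(local_day)

# Per-cell maximum over all hourly steps that fall on the same local day.
daily_max <- vapply(seq_along(day_levels), function(i) {
  idx <- local_day == day_levels[i]
  apply(temp[, , idx, drop = FALSE], c(1, 2), max)
}, FUN.VALUE = matrix(0, nx, ny))

dim(daily_max)  # x, y, number of local days
```

With this layout, a local day that straddles the January/February boundary is handled like any other day, because the grouping only looks at local_day. The first and last local days are partial and would normally be dropped or completed with data from the neighbouring months.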

Related

Creating a Time Series with Half Hourly Data in R

This is my first time ever asking a question on Stack Overflow and I'm a programming novice so any advice as to how to improve my question asking abilities would be appreciated.
Onto my question: I have two csv files, one containing three columns (date time in dd/mm/yyyy hh:(00 or 30) format, production of a certain product, and demand for said product), and the other containing several columns (decomposition of the date time into year, month, day, hour, and whether it is :00 or :30 represented by 1 or 2 respectively, alongside several columns for independent variables which may affect production/demand of said product).
I've only played around with the first csv file, converting the string into a datetime object but the ts() function won't recognise the datetime objects as my times. I've tried adjusting the frequency parameter but ultimately failed and have no idea how to create a time series using half hourly data. Would appreciate any help.
Thanks in advance!
My suggestion is to apply difftime() to all of your time data. For instance, as in the following code, you can use your initial time (the time of the first record) as time_start for every comparison and each of the other times as time_finish. It then returns the time intervals as a number of seconds, and you can use the other column values as the values at those time stamps.
interval <- as.integer(difftime(strptime(time_finish, "%H:%M"), strptime(time_start, "%H:%M"), units = "secs"))
Second 0 10 15 ....
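As an alternative to working with raw second offsets, a regular half-hourly series can be described to ts() through its frequency alone. The sketch below assumes one day is the seasonal period (48 half-hours per day) and uses made-up production values; the stamps vector only imitates the dd/mm/yyyy hh:mm strings described in the question.

```r
# Hypothetical half-hourly stamps in the question's dd/mm/yyyy hh:mm format.
stamps <- format(
  seq(as.POSIXct("2020-03-01 00:00", tz = "UTC"), by = "30 min", length.out = 96),
  "%d/%m/%Y %H:%M"
)
production <- sin(seq_len(96) * 2 * pi / 48)  # made-up values, two days long

# Parse the strings, mainly to check that the series really is regular.
times <- as.POSIXct(stamps, format = "%d/%m/%Y %H:%M", tz = "UTC")
step_mins <- as.numeric(diff(times), units = "mins")

# ts() takes no date-times; it only needs the number of observations per
# period. Here the period is assumed to be one day, i.e. 48 half-hours.
prod_ts <- ts(production, frequency = 48)

frequency(prod_ts)
end(prod_ts)  # period 2 (second day), observation 48
```

The parsed times are kept alongside the series for labelling. If the stamps turn out to be irregular, a plain ts object is no longer a good fit.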

Extracting from data frame at specific time intervals ....index or posixct?

I currently have a dataframe which includes a running timeline of POSIXct (see below).
Basically, given a starting time of my choosing, I want to be able to take rows at specific time intervals. E.g. say I start taking rows at 4 pm: I then want to take nine minutes of rows, skip the next two minutes, then take nine minutes again, and so on. I'm guessing the best approach is indexing, but I also thought something like the lubridate package could be used; I'm just not sure exactly how to do it.
Thanks!
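A take-nine, skip-two pattern repeats every eleven minutes, so one possible approach is modular arithmetic on the minutes elapsed since the chosen start. This is only a sketch on a made-up data frame; the column names time and value and the one-minute spacing are assumptions.

```r
# Hypothetical data frame with one row per minute, starting before 4 pm.
start <- as.POSIXct("2020-01-01 16:00:00", tz = "UTC")
df <- data.frame(
  time  = seq(start - 5 * 60, by = "1 min", length.out = 30),
  value = seq_len(30)
)

# Minutes elapsed since the chosen start (negative before the start).
elapsed <- as.numeric(difftime(df$time, start, units = "mins"))

# The cycle is 11 minutes long: keep the first 9 minutes, drop the last 2.
keep <- elapsed >= 0 & (elapsed %% 11) < 9
picked <- df[keep, ]

nrow(picked)
```

The same logical index works for irregularly spaced rows, since it depends only on each row's timestamp and not on its position in the data frame.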

Generating "Hovmöller" style diagram from dataset with gaps in R

What I have is data in a tab delimited txt file in the following format (http://pastebin.com/XN3y9Wek):
Date Time Flow (L/h)
...
6/10/15 05:19:05 -0.175148624605041
6/10/15 05:34:05 -0.170297042615798
...
7/10/15 07:34:08 -0.033833540932291
7/10/15 07:49:08 -0.0256913011453011
...
The data currently ranges from 6/10/15 to 22/11/15. Measurements occur approximately every 15 minutes, but sometimes there is data loss, which means that there is not the same number of data points for every day. There are also periods with a larger gap (for example from the evening of 16/11 to the morning of 17/11) due to logger malfunction.
From this data I would like to create a figure similar to this one, as it offers a very nice seasonal representation of a large amount of data (my full dataset spans several years):
It's similar in style to a Hovmöller diagram. I have tried experimenting with R and the lattice package, but I struggle with the gaps in my datasets and the irregular number of data points per day.
Any help you can offer me, an R beginner, would be greatly appreciated!
(If it would be possible in PHP or Javascript, feel free to post this as well)
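One way to cope with gaps and an uneven number of points per day is to bin every measurement into a fixed day-by-time-of-day grid and leave missing cells as NA. The sketch below does this in base R with 15-minute slots; the four sample rows are copied (rounded) from the excerpt above, the dates are assumed to be day/month/year, and the plotting call is only a suggestion.

```r
# A few rows in the question's layout (values rounded from the excerpt).
raw <- data.frame(
  Date = c("6/10/15", "6/10/15", "7/10/15", "7/10/15"),
  Time = c("05:19:05", "05:34:05", "07:34:08", "07:49:08"),
  Flow = c(-0.175, -0.170, -0.034, -0.026)
)

# Assumed format: day/month/two-digit year.
stamp <- as.POSIXct(paste(raw$Date, raw$Time),
                    format = "%d/%m/%y %H:%M:%S", tz = "UTC")
day <- as.Date(stamp)
lt <- as.POSIXlt(stamp)

# Row index: days since the first day. Column index: 15-minute slot (1..96).
day_idx <- as.integer(day - min(day)) + 1L
slot <- lt$hour * 4L + lt$min %/% 15L + 1L

# Full grid; cells without a measurement stay NA, so gaps show as blanks.
grid <- matrix(NA_real_, nrow = max(day_idx), ncol = 96)
grid[cbind(day_idx, slot)] <- raw$Flow

# Draw only in an interactive session: x = day, y = hour of day.
if (interactive()) {
  image(x = seq_len(nrow(grid)), y = (seq_len(96) - 0.5) / 4, z = grid,
        xlab = "Day", ylab = "Hour of day")
}
```

If two measurements fall into the same slot, the later one overwrites the earlier one in this sketch; averaging per slot would be a natural refinement.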

Can I make a time series work with date objects rather than integers?

I have time series data that I'm trying to analyse in R. It was provided as a CSV from excel, which I subsequently read as a data.frame all. Let's say it has two columns: all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
ts - ...
start: the time of the first observation. Either a single number
or a vector of two integers, which specify a natural time unit and
a (1-based) number of samples into the time unit. See the examples
for the use of the second form.
I could of course set start=1, but it's a bit painful to figure out what season we're in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we're in, but isn't useful for more precise dates.) Is there a useful way of expressing the date in ts in a human-readable form, so that I don't have to keep calling as.Date() to understand when the interesting features are happening?
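A common workaround is to convert the Excel serial numbers to Date objects once and derive a (year, day-of-year) start for ts() from the first date. The sketch below assumes the serials come from Excel's default 1900 date system, for which an origin of "1899-12-30" is the usual choice in as.Date(); the serial numbers and counts are made up.

```r
# Hypothetical Excel serial dates and counts.
excel_serial <- 38718:38722
people_count <- c(12, 15, 11, 18, 14)

# Assumed origin for Excel's 1900 date system.
dates <- as.Date(excel_serial, origin = "1899-12-30")

# Use (year, day of year) of the first observation as the ts start.
first_year <- as.integer(format(dates[1], "%Y"))
first_doy  <- as.integer(format(dates[1], "%j"))
people <- ts(people_count, start = c(first_year, first_doy), frequency = 365)

format(dates[1])
start(people)
```

With frequency = 365 the time axis drifts by one day after each leap year, so keeping the dates vector next to the series is still useful for exact lookups.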

Plotting hundreds of hours of data with gnuplot

I am trying to plot data from a simulation that tracks simulation time in (hours):(minutes):(seconds) format, but does not turn (hours) into days - so (hours) can be in the hundreds. When gnuplot plots data by time, however ("set xdata time"), it only plots up to 99 hours in one continuous plot; after that, it loops back around and starts overplotting hour 100+ near the beginning (and even then, does weird stuff). Does anyone know why this happens and/or how to get around it?
I also looked into reading the components of the time column (which is the 3rd field of data on each line, but not necessarily a fixed number of characters into the line) in as 3 simple numbers (integers), then converting to a real number, which happens to be a decimal version of the time (e.g., 107:45:00 -> 107.75), which would be fine for the plot, but I haven't been able to figure out how to get gnuplot to do that, either.
Any other ideas are welcome. (I would rather not alter the original file, due to the additional complexity of multiple versions of each file, having to teach others how to convert the file and how to figure out the plot didn't work because they didn't convert the file, etc.)
Version 2 of MathGL (a GPL plotting library) has time ticks which can be set as you want (using the standard strftime() format). However, it is in beta now; a stable version should appear in October 2011.
