How to handle a large collection of time series in R?

I have data that represents about 50,000 different 2-year monthly time series. What would be the most convenient and tidyverse-ish way to store that in R? I'll be using R to review each series, trying to extract characteristic features of their shapes.
Somehow a data frame with 50,000 rows and 24 columns (plus a few more for metadata) seems awkward, because the time axis is in the columns. But what else should I be using? A list of xts objects? A data frame with 50,000x24 rows? A three-dimensional matrix? I'm not really seeing anything obviously convenient, and my friend google hasn't found any great examples for me either. I imagine this means I'm overlooking the obvious solution, so maybe someone can suggest it. Any help?
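For what it's worth, the usual tidyverse answer is the long layout: one row per series per month, with the series id and date as columns, which is exactly the 50,000x24-row data frame mentioned above. A minimal base-R sketch with made-up toy data (tidyr::pivot_longer() and dplyr::group_by()/summarise() express the same steps in tidyverse style):

```r
set.seed(1)
# Toy stand-in: 3 series x 24 months instead of 50,000 x 24
n_series <- 3
months <- seq(as.Date("2020-01-01"), by = "month", length.out = 24)

# Long ("tidy") layout: one row per series per month
long <- data.frame(
  id    = rep(seq_len(n_series), each = length(months)),
  month = rep(months, times = n_series),
  value = rnorm(n_series * length(months))
)
nrow(long)  # one row per series-month: 3 * 24 = 72

# Extracting shape features per series becomes a split-apply step
features <- do.call(rbind, lapply(split(long$value, long$id), function(v)
  data.frame(mean = mean(v), sd = sd(v), range = diff(range(v)))))
```

With 50,000 series this gives 1.2 million rows, which is still small by data-frame standards, and every per-series operation is a grouped computation.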

Related

(r) turning data (DNAbin) into a matrix

I am trying to run stamppFst() and stamppConvert() with haplotype data. The data I have is a sequence of nucleotides in a DNAbin. I have tried to find ways to turn it into a matrix, but what I have read goes way over my head since this is the first time I have ever used R.
data
This is an example of one of the data sets I want to use.
I apologize if this is a very basic question. Thanks for any help!
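If the sequences are aligned (all the same length), ape already provides an as.matrix() method for DNAbin objects, and as.character() on the result gives a plain character matrix. A hedged sketch with made-up toy sequences, assuming the ape package is installed:

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Two toy aligned sequences of equal length
  seqs <- ape::as.DNAbin(list(ind1 = c("a", "c", "g", "t"),
                              ind2 = c("t", "g", "c", "a")))
  m <- as.matrix(seqs)       # DNAbin matrix: one row per individual
  chars <- as.character(m)   # plain character matrix of bases
  dim(chars)                 # 2 x 4
}
```

Whether that matrix is what stamppConvert() expects depends on its input format; check its documentation for the allowed types.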

Forecasting unevenly spaced time series data

I am fairly new to this field and I would like to get some help/advice. Any help would be much appreciated!
I am currently doing a forecasting project with time series data. However, it does not contain any weekend/holiday data. My goal is to predict the future value on a specific date. For example, with data from 2000 to the present, I would like to predict the value on 2023-05-01. I tried creating some plots and used the zoo package. However, I am unsure how to approach this unevenly spaced data. Can someone give me some ideas of what model I should try? Btw, I am using R for this project. Thank you all so much!
I would agree with @jhoward that this is missing data, not unevenly spaced (timestamped) data. So you can interpolate the missing values. Maybe this helps for an overview of the possible techniques: 4-techniques-to-handle-missing-values-in-time-series-data
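A small base-R sketch of that idea: expand the observed (business-day) dates to a full daily grid and linearly interpolate into the gaps with approx(); zoo::na.approx() on a zoo object does the same thing. The dates and values below are made up:

```r
# Observed business days; the weekend (04-29/30) and 05-01 are missing
obs_dates  <- as.Date(c("2023-04-24", "2023-04-25", "2023-04-26",
                        "2023-04-27", "2023-04-28", "2023-05-02"))
obs_values <- c(10, 11, 12, 12, 13, 15)

# Full daily grid spanning the observations
all_dates <- seq(min(obs_dates), max(obs_dates), by = "day")

# Linear interpolation onto the grid
filled <- approx(x = as.numeric(obs_dates), y = obs_values,
                 xout = as.numeric(all_dates))$y

# 2023-05-01 falls in the gap and gets an interpolated value (here 14.5)
filled[all_dates == as.Date("2023-05-01")]
```

Once the series is regular, standard models (ARIMA via forecast::auto.arima(), ETS, etc.) apply directly.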

Subsampling very long time series in R with the goal to display it

Background: I have a very long vector (think many millions of rows) that I cannot display easily, as the data is simply too large. The data is time-series - it exhibits temporal dependency.
My goal is to somehow visualize a part (or parts) of it that is representative enough (i.e. not just the first 10k rows or so)
Normally, if the data were iid and I wanted to display a part of it, I would just do resampling with replacement.
Question: Since the data is a time series, I was thinking of using "block resampling" (I don't know if this is a real term; I was thinking of something like the block bootstrap, but without actually computing any statistics). Does anybody have a good idea (or even packages) for how I can achieve what I am looking for in a clever way?
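One simple version of that idea in base R: draw a few random contiguous blocks and display those, so the temporal dependency is preserved within each block. The block length and count below are arbitrary knobs, not recommendations:

```r
# Concatenate n_blocks random contiguous windows of length block_len
block_subsample <- function(x, block_len = 1000, n_blocks = 10) {
  # random block start positions, kept in time order
  starts <- sort(sample.int(length(x) - block_len + 1, n_blocks))
  unlist(lapply(starts, function(s) x[s:(s + block_len - 1)]))
}

set.seed(42)
x <- cumsum(rnorm(1e6))      # long series with temporal dependence
sub <- block_subsample(x, block_len = 1000, n_blocks = 10)
length(sub)                  # 10,000 points instead of 1,000,000
# plot(sub, type = "l") is now cheap; marking block boundaries with
# abline() helps avoid reading the joins between blocks as real jumps
```

If the goal is purely display rather than inference, simple decimation (plotting every k-th point, or per-window min/max) is another common approach.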

Is there a function to detect individual outliers in longitudinal data in R?

I have a dataset of 5,000 records, and each of those records consists of a series of continuous measurements collected over a decade at various times. Each of the measurements was originally entered manually and, as might be expected, there are a number of errors that need to be corrected.
Typically the incorrect data change by >50% from point to point, while correct data changes by at most 10% at any one time. If I visualize each series individually, these errors are very obvious in an X/Y plot with time on the X-axis.
It's not feasible to graph each of these individually, and I'm trying to figure out if there's a faster way to automate and flag the data that are obviously in error and need to be corrected/removed.
Does anyone have any experience with a problem like this?
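That point-to-point rule can be automated directly: compute successive percent changes within each record and flag anything above the threshold. A base-R sketch with made-up values (note that the point after a spike also trips the rule, since the series "jumps back"; that is usually desirable, as both flags point at the same error):

```r
# Flag points that differ from their predecessor by more than `threshold`
flag_jumps <- function(v, threshold = 0.5) {
  pct_change <- abs(diff(v)) / abs(head(v, -1))
  c(FALSE, pct_change > threshold)  # first point has no predecessor
}

v <- c(100, 105, 210, 103, 101)  # 210 looks like a manual-entry error
flag_jumps(v)                    # flags the spike and the jump back down
```

Applied per record via split()/lapply() (or dplyr's group_by() + mutate()), this gives a flag column over the whole dataset that can be reviewed instead of 5,000 individual plots.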

What is the best way to store ancillary data with a 2D timeseries object in R?

I am currently trying to move from Matlab to R.
I have 2D measurements, consisting of irradiance over time and wavelength, together with quality flags and uncertainty/error estimates.
In Matlab I extended the timeseries object to store both the wavelength array and the auxiliary data.
What is the best way in R to store this data?
Ideally I would like this data to be stored together such that e.g. window(...) keeps all data synchronized.
So far I have looked at the different time series classes like ts, zoo, etc., and some spatio-temporal classes. However, none of them let me attach auxiliary data to the observations, nor can they give me a secondary axis.
Not totally sure what you want, but here is a simple tutorial mentioning R's "ts" and "zoo" time series classes:
http://faculty.washington.edu/ezivot/econ424/Working%20with%20Time%20Series%20Data%20in%20R.pdf
and here is a more comprehensive outline of many more classes (see the Time Series Classes section):
http://cran.r-project.org/web/views/TimeSeries.html
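None of the stock classes handle the wavelength axis plus ancillary arrays out of the box, but one lightweight option in base R is a matrix (time x wavelength) with the ancillary arrays kept in attributes, and a small windowing helper that subsets everything together. A sketch with made-up values; the window2d() helper here is hypothetical, not a standard function:

```r
times <- seq(as.POSIXct("2024-01-01", tz = "UTC"), by = "hour", length.out = 5)
wl    <- c(400, 500, 600)   # wavelengths in nm (made up)

# Irradiance matrix: rows = time, columns = wavelength
irr <- matrix(runif(length(times) * length(wl)),
              nrow = length(times),
              dimnames = list(format(times), wl))
flags <- matrix(0L, nrow = length(times), ncol = length(wl))

# Attach the secondary axis and ancillary data as attributes
obj <- structure(irr, time = times, wavelength = wl, flags = flags)

# Subset rows (times) while keeping all ancillary data synchronized,
# analogous to what window() does for a plain ts/zoo object
window2d <- function(x, i) {
  structure(x[i, , drop = FALSE],
            time       = attr(x, "time")[i],
            wavelength = attr(x, "wavelength"),
            flags      = attr(x, "flags")[i, , drop = FALSE])
}

w <- window2d(obj, 2:3)
```

A more formal route is an S4 class wrapping a zoo matrix plus the ancillary slots, with window() defined as a method; that is essentially the R analogue of extending Matlab's timeseries object.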