Missing times in r, how to subset by timeframe? - r

I have two summers' worth of hourly precip and discharge data collected from Durango data stations from 2012-2013. For this research, I am analyzing how each precip event impacts river discharge on an hourly basis. The discharge data has a value every 15 minutes, every hour, every day, no matter what the weather. The precip data only has rows for hours that had rain. Here are two graphs I made of the first few precip events I have:
#after loading in my .CSVs 'animas' and 'durango':
disc1 <- animas[c(8700:9000), c(3,5)]
prec1 <- durango[c(3:11),c(6:7)]
ggplot(data = disc1, aes(x=datetime, discharge))+geom_point()+theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = prec1, aes(x=DATE, HPCP))+ geom_point()+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
discharge, all hours get plotted
Precipitation, missing hours as zeros
The way that precipitation plots with missing hours is unacceptable for my objective. I need to somehow generate these missing hours and fill the empty precipitation ("HPCP") values with zeros, so I can plot it on the same time scale as discharge.
Also, is there a way to separate this data into individual precipitation events, excluding events that total less than 0.05 inches (as opposed to setting all the time bounds for hundreds of precipitation events by hand)? I need to generate the sets of hours during which each precipitation event occurred and add the discharge values for those hours. I will be plotting these eventually, as well as taking the difference in time between peak rainfall and peak discharge. What data structure should I use, and how?
This seems difficult because zeros between precipitation hours are not present in all cases; for example, two rain events from different dates could be in adjacent rows, one after the other. How can I sort it fast? Can a tail be added to include points from 6 hours before the start time and 6 hours after the end time?
I have messed around with the .csv to obtain two possible date/time configurations (HPCP in this file is precip). Which is better for convenience and for plotting with ggplot2?
All the hours with 0's in HPCP are measurement hours with the flag of 'F', which means a trace amount of precipitation was detected. These are too insignificant for my analysis.
Thank you in advance.
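The steps described in the question (fill the missing hours with zeros, group wet hours into events, drop events under 0.05 inches, pad each event by 6 hours) can be sketched as follows. This is a minimal illustration in plain Python rather than R, not a definitive solution. The function names and sample values are made up, and the rule that wet hours more than 6 hours apart belong to different events is an assumption, since the question does not define where one event ends.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def fill_hourly(records, start, end):
    """Return {hour: precip} for every hour from start to end, using 0.0 where no row exists."""
    filled = {}
    t = start
    while t <= end:
        filled[t] = records.get(t, 0.0)
        t += HOUR
    return filled

def split_events(records, max_gap_hours=6, min_total=0.05):
    """Group wet hours into events; wet hours more than max_gap_hours apart start a new event.

    max_gap_hours is an assumed separation rule, not something given in the question.
    """
    wet_hours = sorted(t for t, p in records.items() if p > 0)
    events = []
    for t in wet_hours:
        if events and t - events[-1][-1] <= max_gap_hours * HOUR:
            events[-1].append(t)
        else:
            events.append([t])
    # Keep only events whose total precipitation reaches the threshold
    return [ev for ev in events if sum(records[t] for t in ev) >= min_total]

def event_window(event, tail_hours=6):
    """Start and end of an event, padded by tail_hours on each side."""
    return event[0] - tail_hours * HOUR, event[-1] + tail_hours * HOUR

# Hypothetical sample: only hours with rain are present, as in the precip file
rain = {
    datetime(2012, 6, 1, 14): 0.10,
    datetime(2012, 6, 1, 15): 0.30,
    datetime(2012, 6, 3, 2): 0.01,   # too small to count as an event
    datetime(2012, 6, 5, 20): 0.20,
}

events = split_events(rain)
start, end = event_window(events[0])
filled = fill_hourly(rain, start, end)
```

The padded window for each event can then be used to select the matching discharge rows by timestamp.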

Related

Problem running asdetect package in r with original age data

I am trying to run the asdetect package on my time series data. The age data for my time series starts at -65 years and goes to 13800 years. The dt argument by default counts the number of rows instead of the actual age. When I customize the interval between tick marks on the x-axis (dt) based on the number of data points divided by the maximum age, it causes the plot to start at 0, which is not accurate. Is there a way to change the x-axis labeling from a time-based scale to the actual age data?
This is what I tried:
detect1 <- asdetect::as_detect(charcoal$influx_area_mm2xcm2xyr, dt=0.5)
or
detect1 <- asdetect::as_detect(charcoal$influx_area_mm2xcm2xyr, dt=38.4)
plot(detect1, type="l", xlab='Time', ylab='Detection Value', ylim=c())

Separating time series of boxplots

I need help creating a time series of boxplots in R. Three separate graphs would work.
I have data which is separated by Treatment (A to D), Wetland (Ann,... Twin (total 9)), and by Time (10, 20, 90).
Example: data (Chl_a_ug_L) is from Wetland Ann, Treatment B and at Time 90.
I am trying to graph the data (Chl_a_ug_L) on the y axis and treatment on the x axis so that each Treatment has 9 boxplots with all locations.
The issue is that the data (Chl_a_ug_L) also needs to be sorted by date collected. I need to separate out the data collected on Time 10, Time 20 and Time 90 to create three separate graphs.
I have:
ggplot(AlgaeData, aes(x=Treatment, y=Chl_a_ug_L, fill=Wetland)) +
  geom_boxplot()
This creates the graph I need but groups all Time data into one graph, instead of separating it.

Plot every year as a line with months on the x-axis and a variable on the y-axis from NetCDF

I have NetCDF data with lat, lon and time as dimensions and temperature (temp) as the variable. It has daily temperature data for 10 years.
For a single location I can plot a time series. But how do I plot every year separately, with the year as hue, months on the x-axis and temp on the y-axis? I want 10 lines for the 10 years on my graph. Every line is a year, which represents 12 monthly means or the daily data. An example is here.
And, if possible, please explain how to add the mean and median of all the years as separate lines among these 10 yearly line plots. (Example picture.)
I'm tempted to agree with the comment that it would be good to show a little more effort in terms of what you've tried. It would also be good to mention what you've read (in e.g. the xarray documentation: https://xarray.pydata.org/en/stable/), which I believe has many of the components you need.
I'll start by setting up some mock data, like you mention, with five years of daily (synthetic) data.
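import numpy as np
import pandas as pd
import xarray as xr

# Daily time axis covering 2000 through 2004
time = pd.date_range("2000-01-01", "2004-12-31")
base = xr.DataArray(
    data=np.ones((time.size, 3, 2)),
    dims=("time", "lat", "lon"),
    coords={
        "time": time,
        "lat": [1, 2, 3],
        "lon": [0.5, 1.5],
    },
)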
To make the data a bit more comparable with your example, I'm going to add yearly seasonality (based on day of year), and make every year increase by 0.1.
seasonality = xr.DataArray(
    data=np.sin((time.dayofyear / 365.0) * (2 * np.pi)),
    coords={"time": time},
    dims=["time"],
)
trend = xr.DataArray(
    data=(time.year - 2000) * 0.1,
    coords={"time": time},
    dims=["time"],
)
da = base + seasonality + trend
(You can obviously skip these two parts; in your case, you'd only do an xarray.open_dataset() or xarray.open_dataarray().)
I don't think your example is grouped by month: it's too smooth. So I'm going to group by day of year instead.
Let's start by getting a single location, then using the dt accessor:
https://xarray.pydata.org/en/stable/time-series.html#datetime-components
In this case, it's also most convenient to store the data as a DataFrame, since it essentially becomes a table (month or dayofyear as the rows, separate years etc. as columns). First we select one location, calculate the minimum and maximum values, and store them in a pandas DataFrame:
location = da.isel(lat=0, lon=0)
# drop_vars replaces the deprecated drop in recent xarray versions
dataframe = location.groupby(da["time"].dt.dayofyear).min().drop_vars(["lat", "lon"]).to_dataframe(name="min")
dataframe["max"] = location.groupby(da["time"].dt.dayofyear).max().values
Next, grab the year by year data, and add it to the DataFrame:
for year, yearda in location.groupby(location["time"].dt.year):
    dataframe[year] = pd.Series(index=yearda["time"].dt.dayofyear, data=yearda.values)
If you want monthly values, add another groupby step. Note that the Series is then indexed by month (1 to 12), so the DataFrame should also be built by grouping on dt.month rather than dt.dayofyear for the rows to line up:
for year, yearda in location.groupby(location["time"].dt.year):
    monthly_mean = yearda.groupby(yearda["time"].dt.month).mean()
    dataframe[year] = pd.Series(index=monthly_mean["month"], data=monthly_mean.values)
Note that by turning the data into a pandas Series first, pandas can add the values appropriately, based on the values of the index (dayofyear here), even though we don't have 366 values for every year.
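To illustrate this alignment on its own, here is a small sketch. The table and variable names are made up for the example, and it uses constant dummy values rather than the temperature data.

```python
import pandas as pd

# Table indexed by day of year; 366 rows so that leap years fit
table = pd.DataFrame(index=pd.RangeIndex(1, 367, name="dayofyear"))

for year in (2000, 2001):  # 2000 is a leap year, 2001 is not
    days = pd.date_range(f"{year}-01-01", f"{year}-12-31")
    # Dummy values: every day of the year simply holds the year as a float
    series = pd.Series(index=days.dayofyear, data=float(year))
    # Assignment aligns on index labels (day of year), not on position
    table[year] = series

print(table.tail(2))
```

Day 366 exists only for 2000, so the 2001 column is left as NaN in that row instead of raising a length mismatch.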
Next, plot it:
dataframe.plot()
It will automatically assign hue based on the columns.
(My minimum and maximum coincide with 2000 and 2004 due to the way I set up the mock data, but you get the idea.)
In terms of styling, options, etc., you might like seaborn better:
https://seaborn.pydata.org/index.html
import seaborn as sns
# seaborn has no generic plot function; lineplot accepts a wide-form DataFrame
sns.lineplot(data=dataframe)
If you want to use different styling or different kinds of plots (e.g. the colored zones your example has), you'll have to combine different plots, e.g. as follows:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.fill_between(x=dataframe.index, y1=dataframe["min"], y2=dataframe["max"], alpha=0.5, color="orange")
dataframe.plot(ax=ax)
Note that seaborn, pandas, xarray, etc. all use matplotlib behind the scenes. Many of the plotting functions also accept an ax argument, to draw on top of an existing plot.
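The question also asks for the mean and median across all years as separate lines. One way to sketch that, assuming a table with one column per year like the one built above, is to compute both row-wise and store them as extra columns. The small `yearly` table here is a hypothetical stand-in with made-up numbers.

```python
import pandas as pd

# Hypothetical stand-in for the per-year table: rows are months, columns are years
yearly = pd.DataFrame(
    {2000: [1.0, 2.0, 3.0], 2001: [2.0, 4.0, 3.0], 2002: [6.0, 3.0, 3.0]},
    index=pd.Index([1, 2, 3], name="month"),
)

# Remember the year columns before adding the summary columns
year_columns = list(yearly.columns)

# Row-wise statistics across the years
yearly["mean"] = yearly[year_columns].mean(axis=1)
yearly["median"] = yearly[year_columns].median(axis=1)

print(yearly)
```

Since the summaries are ordinary columns, a later call such as yearly.plot() would draw them as two additional lines next to the yearly ones.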

How to create a daily time series with monthly cycling patterns

I have a series of daily sales amounts from 1/1/2018 to 10/15/2018; an example is shown below. Some monthly cycling patterns have already been observed in the sales amount: there is always a sales peak at the end of each month, and slight fluctuations in the amount in the middle of the month. Also, in general, sales in June, July and August are higher than in other months. Now I need to predict the sales amount for the 10 days after 10/15/2018. I'm new to time series and ARIMA. Here I have two questions:
1. How to create such a daily time series and plot it with the date?
2. How can I set the cycle(or frequency) to show the monthly cycling pattern?
Date SalesAmount
1/1/2018 31,380.31
1/2/2018 384,418.10
1/3/2018 1,268,633.28
1/4/2018 1,197,742.76
1/5/2018 417,143.36
1/6/2018 693,172.65
1/8/2018 840,384.76
1/9/2018 1,955,909.69
1/10/2018 1,619,242.52
1/11/2018 2,267,017.06
1/12/2018 2,198,519.36
1/13/2018 584,448.06
1/15/2018 1,123,662.63
1/16/2018 2,010,443.35
1/17/2018 958,514.85
1/18/2018 2,190,741.31
1/19/2018 811,623.08
1/20/2018 2,016,031.26
1/21/2018 146,946.29
1/22/2018 1,946,640.57
As there isn't a reproducible example provided in the question, here's one that may help you visualize your data better.
Using the economics dataset that ships with the ggplot2 library, you can easily plot a time series.
library(ggplot2)
theme_set(theme_minimal())
# Basic line plot
ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(color = "#00AFBB", size = 2)
For your question, you just need to pass in x=Date and y=SalesAmount to obtain the plot. For your 2nd question, on predicting sales amounts with time series, you can check out this question: Time series prediction using R
The first thing you need before any kind of forecasting is to detect whether you have any kind of seasonality. I recommend adding more data, as it is hard to determine whether you have a repeated pattern with so few observations. Anyway, you can try to determine the seasonality as follows:
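library(readr)
library(TSA)         # assumed source of periodogram()
library(data.table)
library(dplyr)       # provides %>% and arrange()
test <- read_table2("C:/Users/Z003WNWH/Desktop/test.txt",
                    col_types = cols(Date = col_date(format = "%m/%d/%Y"),
                                     SalesAmount = col_number()))
p <- periodogram(test$SalesAmount)
topF <- data.table(freq = p$freq, spec = p$spec) %>% arrange(desc(spec))
# 1/freq gives the period (in observations) of each component, strongest first
1/topF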
Once you add more data, you can try ggseasonplot (from the forecast package) to visualize the different seasons.

Plotting multiple frequency polygon lines using ggplot2

I have a dataset with records that have two variables: "time", which holds IDs of decades, and "latitude", which holds geographic latitudes. I have 7 time periods (numbered from 26 to 32).
I want to visualize a potential shift in latitude through time. So what I need ggplot2 to do is plot a graph with latitude on the x-axis and the count of records at a certain latitude on the y-axis. I need it to do this for the separate time periods and plot everything in 1 graph.
I understood that I need the freqpoly geom from ggplot2, and I got this so far:
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25)
This gives me the correct graph of the data, ignoring the time. But how can I implement the time? I tried subsetting the data, but I can't really figure out if this is the best way..
So basically I'm trying to get a graph with 7 lines showing the frequency distribution in each decade in order to look for a latitude shift.
Thanks!!
Without sample data it is hard to answer, but try adding color=factor(time) (where time is the name of your column with time periods). This will draw a line for each time period in a different color.
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25,
      color = factor(time))
