R - How to work with and plot time data

I have the following data (of which the following is a small sample):
times <- c("02:45:00", "02:45:07", "02:45:10", "02:45:20", "02:45:25", "02:45:27", "02:45:27", "02:45:30", "02:45:32", "02:45:37")
I would like to plot these times and, if possible, store them in a proper time variable format. In the graph, I want to group the times into time bands so that I can create a histogram of their distribution.

You could look into strptime to get familiar with base R's date-time format.
Then, something like this might get you started:
hist(strptime(times,"%H:%M:%S"), "secs", freq = TRUE, xlab="seconds")
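If you want explicit time bands rather than one bin per second, here is a minimal sketch using base R's cut() on the parsed times. The 10-second band width and the variable names tt, bands and band_counts are my own assumptions, not something from the question.

```r
times <- c("02:45:00", "02:45:07", "02:45:10", "02:45:20", "02:45:25",
           "02:45:27", "02:45:27", "02:45:30", "02:45:32", "02:45:37")

# Parse to a date-time class. strptime() fills in the current date,
# and tz = "UTC" avoids daylight-saving surprises.
tt <- as.POSIXct(strptime(times, "%H:%M:%S", tz = "UTC"))

# Assign each time to a 10-second band (the band width is an assumption).
bands <- cut(tt, breaks = "10 sec")
band_counts <- table(bands)

# Label each bar with the band's start time only (drop the date part).
barplot(band_counts,
        names.arg = sub("^[^ ]+ ", "", levels(bands)),
        xlab = "start of 10-second band", ylab = "count")
```

Changing the breaks string (for example "30 sec" or "1 min") gives wider bands without touching the rest of the code.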

Related

Create Venn Diagram from two DF

I'm trying to create a Venn diagram of two data frames, but I only get incorrect results. Here is an example of the data; both data sets have the same structure:
Chemical     ChemID
Oxidopamine  D016627
Melatonin    D016627
I've only received incorrect results from the following:
VennDiagram::venn.diagram(
x = list(Lewy, Park),
category.names = c("ChemID, ChemID"),
filename ="venndiagramm.png",
output=TRUE)
Ideally, I would like to export an image showing the number of overlapping chemicals between the two sets.
Welcome to SO! As far as I can guess your data structure (two data frames, Lewy and Park, each with a ChemID column), try the following:
VennDiagram::venn.diagram(
x = list(Lewy$ChemID, Park$ChemID), # expects vectors, not dataframes
# category.names = c("ChemID, ChemID"), # see if these are rather to construct nice labels
filename ="venndiagramm.png",
output=TRUE)
You can increase the chance of a useful answer by providing a minimal working data sample with dput(); simulated data is fine. Also try to explain exactly what did not work.
See also ?venn.diagram
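Before drawing anything, it can help to check the overlap directly with base R set operations, so you know which numbers the diagram should show. This is only a sketch: the two data frames and their ChemID values below are made-up placeholders, not your data.

```r
# Hypothetical data frames with the structure described in the question.
# The ChemID values are placeholders, not real identifiers.
Lewy <- data.frame(Chemical = c("Oxidopamine", "Melatonin", "Rotenone"),
                   ChemID   = c("ID_A", "ID_B", "ID_C"),
                   stringsAsFactors = FALSE)
Park <- data.frame(Chemical = c("Melatonin", "Rotenone", "Levodopa"),
                   ChemID   = c("ID_B", "ID_C", "ID_D"),
                   stringsAsFactors = FALSE)

# Elements in both sets, and elements unique to each set
shared    <- intersect(Lewy$ChemID, Park$ChemID)
only_lewy <- setdiff(Lewy$ChemID, Park$ChemID)
only_park <- setdiff(Park$ChemID, Lewy$ChemID)

# These three counts are what a two-set Venn diagram displays
c(only_lewy = length(only_lewy),
  shared    = length(shared),
  only_park = length(only_park))
```

If the diagram's numbers differ from these counts, the input to venn.diagram is probably not the pair of ID vectors you intended.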

How to optimally organize data analysis with different Date granularity

I am analyzing data for a client and I am building a Shiny app to showcase my graphs.
The raw data is very fine-grained: timestamps are in UTC, to the second, e.g. 2019-12-16 17:07:40
I started my analysis at a weekly granularity using the cut function:
Signup_Table_Weekly <- User_Info[ , c("userID", "Signup_Date", "Device")]
Signup_Table_Weekly$Signup_Date <- cut(as.Date(Signup_Table$Signup_Date), "week")
I have the feeling this is not optimal: I now want to analyze the same data at a monthly granularity, and I find myself replicating the same chunk of code with the cut parameter changed from "week" to "month", ending up with multiple data frames:
Signup_Table_Monthly <- User_Info[ , c("userID", "Signup_Date", "Device")]
Signup_Table_Monthly$Signup_Date <- cut(as.Date(Signup_Table$Signup_Date), "month")
Any advice on the proper method? Should I do all my analysis on the raw, fine-grained dates and only aggregate on the date x-axis in ggplot?
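One possible way to avoid the duplicated chunks is to keep a single data frame with the raw timestamps and pass the granularity as a function argument. The sketch below assumes that idea; the helper signups_by() and the sample rows are hypothetical, only the column names come from the question.

```r
# Hypothetical sample with the columns named in the question
User_Info <- data.frame(
  userID      = 1:4,
  Signup_Date = c("2019-12-16 17:07:40", "2019-12-18 09:30:00",
                  "2019-12-23 12:00:00", "2020-01-02 08:15:10"),
  Device      = c("iOS", "Android", "iOS", "Web"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count signups per period.
# 'unit' is any breaks specification cut() accepts for dates,
# e.g. "day", "week" or "month".
signups_by <- function(df, unit) {
  period <- cut(as.Date(df$Signup_Date), unit)
  as.data.frame(table(Period = period), responseName = "Signups")
}

weekly  <- signups_by(User_Info, "week")   # weeks start on Monday
monthly <- signups_by(User_Info, "month")
```

In a Shiny app, the unit argument could then come straight from an input widget, so the raw data is only stored once.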

R native time series: date data

There are R native datasets, such as the Nile dataset, that are time series. However, if I actually look at the data set, whether as it is, after as_tibble(), or after as.data.frame(), there is only one column: x (which, in this specific case, is the "measurement of annual flow of the river"). Yet if I plot() the data in any of the three formats (raw, tibble or data.frame), I get plots with the dates:
(Technically, the x axis label changes, but that's not the point).
Where are these dates stored? How can I access them (to use ggplot(), for example), or even – how can I see them?
If you use str(Nile) or print(Nile), you'll see that the Nile data set is stored in a time-series (ts) object. You can use the start(), end() and frequency() functions to extract those attributes, then create a new column to store that information.
data(Nile)
new_df = data.frame(Nile)
# the time step is 1 / frequency (Nile is annual, so the step is 1 year)
new_df$Time = seq(from = start(Nile)[[1]], to = end(Nile)[[1]], by = 1 / frequency(Nile))
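As an alternative sketch, base R's time() returns the time index of a ts object directly, so you don't have to rebuild it from start, end and frequency. The data frame name nile_df and the column names year and flow are my own choices.

```r
data(Nile)

# time() returns the time index stored in the ts object;
# as.numeric() drops the ts class so both columns are plain numbers.
nile_df <- data.frame(
  year = as.numeric(time(Nile)),
  flow = as.numeric(Nile)
)

head(nile_df)
```

A data frame like this can be passed to ggplot() with year on the x axis and flow on the y axis.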

How to get dates on the xaxis of my Arima forecast plot rather than just numbers

I have imported a netCDF file into R and created a dataset which has 58196 time stamps. I've then fitted an Arima model to it and forecasted. However, the time is stored as 'hours since 1900-01-01 00:00:00', and the series is simply indexed numerically up to 58196. I would like to use ggplot to plot the forecast with dates on the x-axis.
Any ideas? Here is some code I have put in.
I have read in the required variable and subset it at the pressure level I want, so that it is a single variable at 58196 times, at 6-hourly intervals up to the end of 2018. I have then done the following:
data <- data_array[13, ] # To get my univariate time series.
print(data)
[58176] -6.537371e-01 -4.765177e-01 -4.226107e-01 -4.303621e-01 -3.519134e-01
[58181] -2.706966e-01 -1.864843e-01 -9.974014e-02 2.970415e-02 6.640909e-02
[58186] -1.504763e-01 -3.968417e-01 -4.864971e-01 -5.934973e-01 -7.059880e-01
[58191] -7.812654e-01 -7.622807e-01 -8.968482e-01 -9.414597e-01 -1.003678e+00
[58196] -9.908477e-01
datafit <- auto.arima(data)
datamodel <- Arima(data, order = c(5, 0, 2))
datafcst <- forecast(datamodel, h=60, level=95)
plot(datafcst, xlim=c(58100, 58250))
I have attached the image it yields too. The idea is that I can use ggplot to plot this rather than the standard plot, with dates on the x-axis instead of the numerical values. However, ggplot also won't work for me, as it says the object isn't a data frame.
Many thanks!
As you did not provide a minimal example it is hard to help, but I'll try. Assume your date variable is called "date".
dater = as.Date(strptime(date, "%Y-%m-%d"))
And from ?strptime:
format
A character string. The default for the format methods is "%Y-%m-%d %H:%M:%S" if any element has a time component which is not midnight, and "%Y-%m-%d" otherwise.
Hope that helps
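Since the question says the time axis is stored as 'hours since 1900-01-01 00:00:00', here is a sketch of converting such offsets to date-times in base R. The hours_since values are made up for illustration (in practice they would come from the netCDF time variable), and the 6-hour step for the forecast times is taken from the question.

```r
# Origin stated in the question's time units
origin <- as.POSIXct("1900-01-01 00:00:00", tz = "UTC")

# Hypothetical offsets in hours; real values would be read from the file
hours_since <- c(0, 6, 12, 1043136)

# POSIXct arithmetic works in seconds, so convert hours to seconds
timestamps <- origin + hours_since * 3600
format(timestamps, "%Y-%m-%d %H:%M")

# Time stamps for forecast steps: continue at 6-hour intervals
# after the last observed time (3 steps shown here)
future <- seq(timestamps[length(timestamps)] + 6 * 3600,
              by = "6 hours", length.out = 3)
```

Putting timestamps and the series values into a data.frame gives ggplot the data frame it is asking for, with a date-time column for the x axis.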

Twitter plot created time

I'm trying to plot the creation time of tweets. I can extract the creation time with the following code:
tweets <- searchTwitter('weather', n=100,lang='en')
t <- twListToDF(tweets)
s <- t[, c("created")]
The time format I get is something like: 2017-02-25 18:52:06 UTC
Trying to plot it with plot(s) provides just a list of dots. I'm not sure if it is due to the date/time format.
I want to create a bar chart in which each bar represents the count of tweets in an hourly period. The x axis would represent time and the y axis the number of tweets.
Any ideas?
One approach that I've used relies on the 'lubridate' package, which is available on CRAN.
library(lubridate)
date(now())
hour(now())
You would replace now() with your vector s. If your s vector has the class POSIXct, I believe that this will work. There might be alternative solutions that involve the lubridate R package, too. I hope that this helps.
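For the hourly bar chart itself, here is a sketch that uses base R's cut() instead of lubridate. The vector s below is made-up sample data standing in for the created column, and it assumes s is of class POSIXct.

```r
# Made-up creation times standing in for t$created
s <- as.POSIXct(c("2017-02-25 18:52:06", "2017-02-25 18:10:41",
                  "2017-02-25 19:03:15", "2017-02-25 21:30:00"),
                tz = "UTC")

# Bin each time stamp into its hour; hours with no tweets
# are kept as empty levels, so they show up as zero-height bars.
hourly <- table(cut(s, breaks = "hour"))

barplot(hourly, xlab = "hour", ylab = "number of tweets", las = 2)
```

With real data the only change needed should be replacing the sample s with the created column.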

Resources