I am trying to conduct a time series analysis based on this dataset:
time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35
02:00 102.27 206.3
02:30 101.6 206.43
03:00 100.66 206.73
03:30 101.11 206.5
04:00 100.95 206.63
04:30 102.02 206.27
05:00 105.83 207.93
05:30 106.98 207.15
06:00 107.32 206.28
06:30 108.36 204.7
07:00 107.97 203.41
07:30 107.76 202.63
08:00 107.85 201.13
08:30 107.6 198.74
It has been read in with:
austriacus <- read.table("austriacus.txt", header = TRUE)
The time series call x.ts <- ts(POINT_X, time) is not working and produces the following error message: Error in is.data.frame(data) : object 'POINT_X' not found
Any ideas on this?
Try the zoo and chron packages:
Lines <- "time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35
02:00 102.27 206.3
02:30 101.6 206.43
03:00 100.66 206.73
03:30 101.11 206.5
04:00 100.95 206.63
04:30 102.02 206.27
05:00 105.83 207.93
05:30 106.98 207.15
06:00 107.32 206.28
06:30 108.36 204.7
07:00 107.97 203.41
07:30 107.76 202.63
08:00 107.85 201.13
08:30 107.6 198.74
"
library(zoo)
library(chron)
to.times <- function(x) times(paste0(x, ":00"))
# z <- read.zoo("myfile", header = TRUE, FUN = to.times)
z <- read.zoo(text = Lines, header = TRUE, FUN = to.times)
plot(z)
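For what it's worth, the error itself happens because ts(POINT_X, time) looks for a free-standing object named POINT_X, while after read.table() the column only exists inside austriacus. A minimal base-R sketch of the fix; the data frame is rebuilt inline here so the snippet runs without austriacus.txt, and frequency = 2 is my reading of the half-hourly grid:

```r
# Rebuild the first rows of the posted data inline (stand-in for
# read.table("austriacus.txt", header = TRUE))
austriacus <- read.table(text = "time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35", header = TRUE)

# ts(POINT_X, time) fails with "object 'POINT_X' not found" because ts()
# does not look inside the data frame; reference the column explicitly.
# frequency = 2 encodes two observations per hour for half-hourly readings.
x.ts <- ts(austriacus$POINT_X, frequency = 2)
x.ts
```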
I'd like to make this graph in R: bat activity during the year against the time of night. Each yellow dot is an individual; the blue curves are sunset and sunrise over the year.
My Excel file looks like this:
Date Sunrise Sunset Hours_after_sunset
16/08/2020 06:34 20:56 01:05
17/07/2020 05:53 21:42 01:26
11/08/2020 06:27 21:05 02:17
30/09/2020 07:42 19:20 06:45
24/04/2020 06:31 20:49 05:01
01/07/2020 05:38 21:53 04:13
18/07/2020 05:54 21:41 01:42
04/08/2020 06:17 21:18 01:47
13/08/2020 06:30 21:02 05:14
30/06/2020 05:37 21:53 01:37
15/08/2020 06:33 20:58 01:22
04/09/2020 07:03 20:17 07:25
07/09/2020 07:07 20:11 03:30
28/06/2020 05:36 21:54 02:10
01/07/2020 05:38 21:53 04:13
19/08/2020 06:39 20:50 01:32
09/04/2020 07:01 20:26 07:11
16/05/2020 05:55 21:22 01:14
17/06/2020 05:33 21:52 05:36
22/07/2020 05:59 21:36 03:00
11/08/2020 06:27 21:05 08:42
10/08/2020 06:25 21:07 02:36
08/08/2020 06:23 21:11 07:19
05/08/2020 06:18 21:16 03:33
24/08/2020 06:46 20:40 02:12
16/08/2020 06:34 20:56 04:01
24/08/2020 06:46 20:40 04:18
19/08/2020 06:39 20:50 02:27
22/08/2020 06:43 20:44 05:00
22/08/2020 06:43 20:44 01:56
17/09/2020 07:22 19:49 06:01
Hoping you can help me.
Something like this? Plot of the data you posted, sorted by date
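For reference, here is a sketch of how such a plot could be assembled with ggplot2 from the posted columns (Date, Sunrise, Sunset, Hours_after_sunset). The to_hours() helper, the decimal-hour conversion, and shifting sunrise past midnight so it plots above sunset are my own assumptions about the intended layout:

```r
library(ggplot2)

# A subset of the posted data (dd/mm/yyyy dates, hh:mm times)
bats <- read.table(text = "Date Sunrise Sunset Hours_after_sunset
16/08/2020 06:34 20:56 01:05
17/07/2020 05:53 21:42 01:26
30/09/2020 07:42 19:20 06:45
24/04/2020 06:31 20:49 05:01", header = TRUE)

# Convert an 'hh:mm' string to decimal hours
to_hours <- function(x) {
  sapply(strsplit(x, ":"), function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60)
}

bats$Date       <- as.Date(bats$Date, format = "%d/%m/%Y")
bats$sunset_h   <- to_hours(bats$Sunset)
bats$sunrise_h  <- to_hours(bats$Sunrise) + 24          # next-day sunrise, above sunset
bats$activity_h <- bats$sunset_h + to_hours(bats$Hours_after_sunset)

p <- ggplot(bats, aes(x = Date)) +
  geom_line(aes(y = sunset_h),  colour = "blue") +
  geom_line(aes(y = sunrise_h), colour = "blue") +
  geom_point(aes(y = activity_h), colour = "gold3") +
  labs(y = "Hour of night (24 = midnight)")
p
```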
I have a dataset with a column containing the opening and closing times of various stores.
The timings are in string format Opening time - Closing time,
eg: 17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30
I want to extract the minimum opening time within the above string, i.e. 11:30, and the maximum closing time, i.e. 21:00. How do I do that using R?
DPUT:
structure(list(head.timings_remapping.Opening.And.Closing.Time..40. = c("15:30 - 21:30",
"12:00 - 00:00", "11:00 - 15:00 | 16:30 - 20:45", "12:00 - 22:30",
"17:00 - 21:30", "17:00 - 21:30", "16:30 - 00:00", "16:00 - 21:15",
"16:30 - 20:30", "17:00 - 20:00", "16:00 - 23:30", "16:30 - 21:30",
"17:00 - 22:00", "17:00 - 22:00", "17:00 - 21:30", "17:00 - 21:30",
"16:00 - 00:00", "16:30 - 23:59", "11:30 - 22:30", "11:30 - 23:59",
"17:00 - 20:30", "07:30 - 12:50", "16:15 - 23:00", "09:00 - 21:00",
"10:00 - 21:00", "11:00 - 22:00", "07:00 - 12:00 | 07:00 - 13:30 | 12:00 - 13:30",
"07:00 - 13:00 | 10:00 - 15:00", "10:00 - 02:00", "00:00 - 23:59",
"00:00 - 23:59", "11:00 - 20:00", "11:00 - 20:00", NA, "12:00 - 03:30 | 11:00 - 00:00",
"05:30 - 15:00", "07:00 - 16:00", "08:30 - 13:30", "17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30",
"12:00 - 01:00")), class = "data.frame", row.names = c(NA, -40L
))
The final output will have two columns "Opening time" and "Closing time"
Does this work:
library(dplyr)
library(tidyr)
df %>%
  separate(col = head.timings_remapping.Opening.And.Closing.Time..40., into = c('Open_Close', 'A'), sep = '\\|') %>%
  separate(col = Open_Close, into = c('Opening Time', 'Closing Time'), sep = ' - ') %>%
  mutate(`Opening Time` = trimws(`Opening Time`), `Closing Time` = trimws(`Closing Time`)) %>%
  select(-A)
Opening Time Closing Time
1 15:30 21:30
2 12:00 00:00
3 11:00 15:00
4 12:00 22:30
5 17:00 21:30
6 17:00 21:30
7 16:30 00:00
8 16:00 21:15
9 16:30 20:30
10 17:00 20:00
11 16:00 23:30
12 16:30 21:30
13 17:00 22:00
14 17:00 22:00
15 17:00 21:30
16 17:00 21:30
17 16:00 00:00
18 16:30 23:59
19 11:30 22:30
20 11:30 23:59
21 17:00 20:30
22 07:30 12:50
23 16:15 23:00
24 09:00 21:00
25 10:00 21:00
26 11:00 22:00
27 07:00 12:00
28 07:00 13:00
29 10:00 02:00
30 00:00 23:59
31 00:00 23:59
32 11:00 20:00
33 11:00 20:00
34 <NA> <NA>
35 12:00 03:30
36 05:30 15:00
37 07:00 16:00
38 08:30 13:30
39 17:00 21:00
40 12:00 01:00
Using the dplyr and tidyr libraries you can do:
library(dplyr)
library(tidyr)
#Rename the long column name to something smaller
names(df)[1] <- 'Time'
df %>%
  # Create a row index
  mutate(row = row_number()) %>%
  # Split the data into separate rows on '|'
  separate_rows(Time, sep = '\\s*\\|\\s*') %>%
  # Split each entry on '-'
  separate(Time, c("Opening_Time", "Closing_time"), sep = '\\s*-\\s*') %>%
  # Change the times to POSIXct format
  mutate(across(c(Opening_Time, Closing_time), as.POSIXct, format = '%H:%M')) %>%
  # For each original row...
  group_by(row) %>%
  # ...get the minimum opening time and maximum closing time,
  # formatted back to hh:mm
  summarise(Opening_Time = format(min(Opening_Time), "%H:%M"),
            Closing_time = format(max(Closing_time), "%H:%M")) %>%
  # Drop the helper column
  select(-row)
This returns
# Opening_Time Closing_time
# <chr> <chr>
# 1 15:30 21:30
# 2 12:00 00:00
# 3 11:00 20:45
# 4 12:00 22:30
# 5 17:00 21:30
# 6 17:00 21:30
# 7 16:30 00:00
# 8 16:00 21:15
# 9 16:30 20:30
#10 17:00 20:00
# … with 30 more rows
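For comparison, the same min/max extraction can be sketched in base R. Zero-padded "HH:MM" strings sort lexicographically in time order (with "00:00" counting as early morning, matching the POSIXct comparison above), so plain min()/max() on the split pieces suffices. The extract() helper below is hypothetical:

```r
# A few sample cells from the posted column, including a multi-interval one and an NA
times <- c("15:30 - 21:30", "11:00 - 15:00 | 16:30 - 20:45", NA)

# Split each cell on '|' and then on '-', trim whitespace, and take the
# lexicographic min of the opening times and max of the closing times
extract <- function(s) {
  if (is.na(s)) return(c(NA_character_, NA_character_))
  parts <- lapply(strsplit(unlist(strsplit(s, "\\|")), "-"), trimws)
  c(min(sapply(parts, `[`, 1)), max(sapply(parts, `[`, 2)))
}

t(sapply(times, extract))  # one row per cell: min opening time, max closing time
```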
OK, this is making me crazy.
I have several datasets with time values that need to be rolled up into 15 minute intervals.
I found a solution here that works beautifully on one dataset. But on the next one I try, I'm getting weird results. I have a column with character data representing dates:
BeginTime
-------------------------------
1 1/3/19 1:50 PM
2 1/3/19 1:30 PM
3 1/3/19 4:56 PM
4 1/4/19 11:23 AM
5 1/6/19 7:45 PM
6 1/7/19 10:15 PM
7 1/8/19 12:02 PM
8 1/9/19 10:43 PM
And I'm using the following code (which is exactly what I used on the other dataset, except for the names):
df$by15 = cut(mdy_hm(df$BeginTime), breaks="15 min")
but what I get is:
BeginTime by15
-------------------------------------------------------
1 1/3/19 1:50 PM 2019-01-03 13:36:00
2 1/3/19 1:30 PM 2019-01-03 13:21:00
3 1/3/19 4:56 PM 2019-01-03 16:51:00
4 1/4/19 11:23 AM 2019-01-04 11:21:00
5 1/6/19 7:45 PM 2019-01-06 19:36:00
6 1/7/19 10:15 PM 2019-01-07 22:06:00
7 1/8/19 12:02 PM 2019-01-08 11:51:00
8 1/9/19 10:43 PM 2019-01-09 22:36:00
9 1/10/19 11:25 AM 2019-01-10 11:21:00
Any suggestions on why I'm getting such random times instead of the 15-minute intervals I'm looking for? Like I said, this worked fine on the other data set.
You can use the lubridate::round_date() function, which will roll up your datetime data as follows:
library(lubridate) # To handle datetime data
library(dplyr) # For data manipulation
# Creating dataframe
df <-
data.frame(
BeginTime = c("1/3/19 1:50 PM", "1/3/19 1:30 PM", "1/3/19 4:56 PM",
"1/4/19 11:23 AM", "1/6/19 7:45 PM", "1/7/19 10:15 PM",
"1/8/19 12:02 PM", "1/9/19 10:43 PM")
)
df %>%
  # First parse the strings into datetimes; the data is month/day/year on a
  # 12-hour clock, so the format is '%m/%d/%y %I:%M %p'
  mutate(by15 = parse_date_time(BeginTime, '%m/%d/%y %I:%M %p'),
         # Then roll up / round to the nearest 15-minute interval
         by15 = round_date(by15, "15 mins"))
#
# BeginTime by15
# 1/3/19 1:50 PM 2019-01-03 13:45:00
# 1/3/19 1:30 PM 2019-01-03 13:30:00
# 1/3/19 4:56 PM 2019-01-03 17:00:00
# 1/4/19 11:23 AM 2019-01-04 11:30:00
# 1/6/19 7:45 PM 2019-01-06 19:45:00
# 1/7/19 10:15 PM 2019-01-07 22:15:00
# 1/8/19 12:02 PM 2019-01-08 12:00:00
# 1/9/19 10:43 PM 2019-01-09 22:45:00
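As a side note, the "random" times in the question are a property of cut(): with breaks = "15 min" it builds the breaks starting from the earliest timestamp in the data, so the bins are anchored at whatever odd minute the data happens to start on, not at clock-aligned quarters. If you want interval start times rather than nearest-quarter rounding, lubridate::floor_date() snaps to clock boundaries; a small sketch:

```r
library(lubridate)

x <- mdy_hm(c("1/3/19 1:50 PM", "1/3/19 4:56 PM", "1/4/19 11:23 AM"))

# round_date() picks the nearest quarter; floor_date() picks the interval start
data.frame(
  rounded = round_date(x, "15 mins"),
  floored = floor_date(x, "15 mins")
)
```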
My question is about time series data.
Suppose I have one file, named P1, with columns Time.Stamp and Value. The data table is given below:
Time.Stamp        Value
01/01/2017 19:08  12
01/01/2017 19:08  24
01/01/2017 19:08  45
01/01/2017 19:08  56
01/01/2017 19:08  78
01/01/2017 19:08  76
01/01/2017 19:08  34
01/01/2017 19:09  65
01/01/2017 19:09  87
I have another, separate file, named P2, which has two columns, “Transaction from” and “transaction to”:
Transaction from   transaction to
01/01/2017 19:00   01/01/2017 19:15
01/01/2017 19:15   01/01/2017 19:30
02/01/2017 08:45   02/01/2017 09:00
02/01/2017 09:00   02/01/2017 09:15
02/01/2017 09:15   02/01/2017 09:30
02/01/2017 09:30   02/01/2017 09:45
03/01/2017 18:00   03/01/2017 18:15
03/01/2017 18:15   03/01/2017 18:30
03/01/2017 23:45   04/01/2017 00:00
04/01/2017 00:15   04/01/2017 00:30
04/01/2017 01:45   04/01/2017 02:00
Now I want to find, in R, which “Time.Stamp” values from file P1 fall within a “Transaction from” to “transaction to” interval of file P2. If a “Time.Stamp” lies in the range given by those two columns of P2, the Value associated with that Time.Stamp should be aggregated. The two files have different numbers of rows; P1 is much longer than P2.
It would be very helpful if anyone could find a solution in R.
This is a possible duplicate of How to perform join over date ranges using data.table? Assuming that P1 and P2 are data frames and the dates are POSIXct to begin with, here is the lifesaver join provided by data.table:
library(data.table)
setDT(P1)
setDT(P2)
P1[ , dummy := Time.Stamp]
setkey(P2, Transaction.from, transaction.to)
dt <- foverlaps(
P1,
P2,
by.x = c("Time.Stamp", "dummy"),
# mult = "first"/mult = "last" will only choose the first/last match
nomatch = 0L
)[ , dummy := NULL]
# you can run ?data.table::foverlaps for the documentation
Please refer to this great blog post for a step-by-step explanation and other possible answers.
After this point you can simply:
library(dplyr)
dt %>%
  group_by(Transaction.from) %>%
  mutate(total = sum(Value))
Please note that this solution may seem long for the simple aggregation you asked. However, it will come very handy if you need to merge the data frames and conduct more complex analysis.
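The pipeline above can be run end to end on a slice of the posted data; reading the timestamps as dd/mm/yyyy in UTC, and aggregating with data.table itself rather than dplyr, are assumptions made here for a self-contained demo:

```r
library(data.table)

# Rebuild a slice of the posted data (timestamps read as dd/mm/yyyy)
fmt <- "%d/%m/%Y %H:%M"
P1 <- data.table(
  Time.Stamp = as.POSIXct(c("01/01/2017 19:08", "01/01/2017 19:08",
                            "01/01/2017 19:09"), format = fmt, tz = "UTC"),
  Value = c(12, 24, 65)
)
P2 <- data.table(
  Transaction.from = as.POSIXct(c("01/01/2017 19:00", "01/01/2017 19:15"),
                                format = fmt, tz = "UTC"),
  transaction.to   = as.POSIXct(c("01/01/2017 19:15", "01/01/2017 19:30"),
                                format = fmt, tz = "UTC")
)

# foverlaps() wants an interval on both sides, hence the zero-width dummy column
P1[, dummy := Time.Stamp]
setkey(P2, Transaction.from, transaction.to)
dt <- foverlaps(P1, P2, by.x = c("Time.Stamp", "dummy"), nomatch = 0L)[, dummy := NULL]

# Aggregate per transaction window
agg <- dt[, .(total = sum(Value)), by = .(Transaction.from, transaction.to)]
agg  # all three readings fall in 19:00-19:15, so total = 12 + 24 + 65 = 101
```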
First, convert all the dates with as.POSIXct(x, format = "%d/%m/%Y %H:%M"). Then check whether each element of p1$Time.Stamp falls in any period from p2[,1] to p2[,2] with the following function, and aggregate:
isitthere <- function(x, from = p2$`Transaction from`, to = p2$`transaction to`) {
  any(x >= from & x <= to)
}
Apply the function to all p1$Time.Stamp:
index<-sapply(p1$Time.Stamp, isitthere,from=p2$`Transaction from`,to=p2$`transaction to`)
index
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Now aggregate:
sum(p1$Value[index])
[1] 477
I am not clear about what is to be aggregated by what, but assuming that DF1 and DF2 are as defined in the Note at the end, the following will, for each row in DF2, look up zero or more rows in DF1 and then sum Value over the rows having the same Transaction.from and Transaction.to.
library(sqldf)
sqldf("select [Transaction.from], [Transaction.to], sum(Value) as Value
from DF2
left join DF1 on [Time.Stamp] between [Transaction.from] and [Transaction.to]
group by [Transaction.from], [Transaction.to]")
giving:
Transaction.from Transaction.to Value
1 2017-01-01 19:00:00 2017-01-01 19:15:00 477
2 2017-01-01 19:15:00 2017-01-01 19:30:00 NA
3 2017-01-02 08:45:00 2017-01-02 09:00:00 NA
4 2017-01-02 09:00:00 2017-01-02 09:15:00 NA
5 2017-01-02 09:15:00 2017-01-02 09:30:00 NA
6 2017-01-02 09:30:00 2017-01-02 09:45:00 NA
7 2017-01-03 18:00:00 2017-01-03 18:15:00 NA
8 2017-01-03 18:15:00 2017-01-03 18:30:00 NA
9 2017-01-03 23:45:00 2017-01-04 00:00:00 NA
10 2017-01-04 00:15:00 2017-01-04 00:30:00 NA
11 2017-01-04 01:45:00 2017-01-04 02:00:00 NA
Note
Lines1 <- "
Time.Stamp,Value
01/01/2017 19:08,12
01/01/2017 19:08,24
01/01/2017 19:08,45
01/01/2017 19:08,56
01/01/2017 19:08,78
01/01/2017 19:08,76
01/01/2017 19:08,34
01/01/2017 19:09,65
01/01/2017 19:09,87
"
DF1 <- read.csv(text = Lines1)
fmt <- "%d/%m/%Y %H:%M"
DF1 <- transform(DF1, Time.Stamp = as.POSIXct(Time.Stamp, format = fmt))
Lines2 <- "
Transaction.from,Transaction.to
01/01/2017 19:00,01/01/2017 19:15
01/01/2017 19:15,01/01/2017 19:30
02/01/2017 08:45,02/01/2017 09:00
02/01/2017 09:00,02/01/2017 09:15
02/01/2017 09:15,02/01/2017 09:30
02/01/2017 09:30,02/01/2017 09:45
03/01/2017 18:00,03/01/2017 18:15
03/01/2017 18:15,03/01/2017 18:30
03/01/2017 23:45,04/01/2017 00:00
04/01/2017 00:15,04/01/2017 00:30
04/01/2017 01:45,04/01/2017 02:00
"
DF2 <- read.csv(text = Lines2)
DF2 <- transform(DF2, Transaction.from = as.POSIXct(Transaction.from, format = fmt),
Transaction.to = as.POSIXct(Transaction.to, format = fmt))
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Plot dates on the x axis and time on the y axis with ggplot2
I have these data,
Arrival Date
7:50 Apr-19
7:45 Apr-20
7:30 Apr-23
7:30 Apr-24
7:55 Apr-25
7:20 Apr-26
7:30 Apr-27
7:50 Apr-28
8:00 Apr-30
7:45 May-2
8:30 May-3
8:06 May-4
8:25 May-7
7:35 May-8
7:45 May-9
8:02 May-10
7:53 May-11
8:39 May-14
8:14 May-15
8:08 May-16
8:27 May-17
8:20 May-18
12:00 Apr-19
12:00 Apr-20
12:00 Apr-23
12:00 Apr-24
12:00 Apr-25
12:00 Apr-26
12:00 Apr-27
12:00 Apr-28
11:50 Apr-30
12:00 May-2
11:45 May-3
11:50 May-4
12:00 May-7
11:50 May-8
11:55 May-9
12:10 May-10
11:53 May-11
11:54 May-14
11:40 May-15
11:54 May-16
11:45 May-17
12:00 May-18
And I want to plot it using ggplot. This is what I did:
OJT <- read.csv(file = "Data.csv", header = TRUE)
qplot(Date,Arrival, data = OJT, xlab = expression(bold("Date")), ylab = expression(bold("Time"))) + theme_bw() + opts(axis.text.x=theme_text(angle=90)) +geom_point(size = 2, colour = "black", fill = "red", pch = 21)
And here is the output
As you can see, the time and date are not arranged. I want the time to run from 7:00 am to 12:20 pm, and the date from April 19 to May 18. I tried using
as.Date(strptime(OJT$Date,"%m-%dT"))
but I still don't get the right plot, and I can't find similar problems on the internet.
Any ideas to help me solve this?
Thanks
I will try a different approach with some wrangling in lubridate. Target plot:
The code, including your data:
library("ggplot2")
library("lubridate")
df <- read.table(text = "Arrival Date
7:50 Apr-19
7:45 Apr-20
7:30 Apr-23
7:30 Apr-24
7:55 Apr-25
7:20 Apr-26
7:30 Apr-27
7:50 Apr-28
8:00 Apr-30
7:45 May-2
8:30 May-3
8:06 May-4
8:25 May-7
7:35 May-8
7:45 May-9
8:02 May-10
7:53 May-11
8:39 May-14
8:14 May-15
8:08 May-16
8:27 May-17
8:20 May-18
12:00 Apr-19
12:00 Apr-20
12:00 Apr-23
12:00 Apr-24
12:00 Apr-25
12:00 Apr-26
12:00 Apr-27
12:00 Apr-28
11:50 Apr-30
12:00 May-2
11:45 May-3
11:50 May-4
12:00 May-7
11:50 May-8
11:55 May-9
12:10 May-10
11:53 May-11
11:54 May-14
11:40 May-15
11:54 May-16
11:45 May-17
12:00 May-18", header=TRUE)
df$Date <- paste('2012-',df$Date, sep='')
df$Full <- paste(df$Date, df$Arrival, sep=' ')
df$Full <- ymd_hm(df$Full)
df$decimal.hour <- hour(df$Full) + minute(df$Full)/60
p <- ggplot(df, aes(x=Full, y=decimal.hour)) +
geom_point()
p
#make some data in your kind of format
#(dummySeries() here comes from the timeSeries package)
library(timeSeries)
library(ggplot2)
tS <- dummySeries()
a <- rownames(tS)
x <- c(a, a)
y <- 1:24
dat <- data.frame(x, y)
#get it in the format for the plot
v <- paste(dat$x, dat$y, sep = " ")
v2 <- as.POSIXct(strptime(v, "%Y-%m-%d %H", tz = "GMT"))
v3 <- sort(v2)
hrs <- strftime(v2, "%H")
days <- strftime(v2, "%Y-%m-%d")
final <- data.frame(days, hrs)
qplot(days, hrs, data = final) + geom_point()
#ooooff... I bet this can be done much cleaner... I know little about
#time series data.