Sorry if this question seems basic, but the examples I see online are too complex for a beginner like me, so please help me through this. Basically, I want to predict when the Measurement is high and when it is low, based on the data. Below is part of my dataset; the full dataset covers 10 days at 30-minute intervals.
Datetime Measurement
1 4/4/2013 11:59 0.532
2 4/4/2013 12:29 0.647
3 4/4/2013 12:59 0.564
4 4/4/2013 13:29 0.425
5 4/4/2013 13:59 0.253
I tried the following to convert the Datetime column to a date-time type:
data$Datetime <- strptime(x = as.character(data$Datetime),
format = "%m/%d/%Y %H:%M")
which resulted in
Datetime Measurement
1 2013-04-04 11:59:00 0.532
2 2013-04-04 12:29:00 0.647
3 2013-04-04 12:59:00 0.564
4 2013-04-04 13:29:00 0.425
5 2013-04-04 13:59:00 0.253
I then came across this, and maybe it's necessary to convert the data frame to a time series. I tried ts and xts, but I think I'm getting nowhere. I also tried
library(timeSeries)
as.timeSeries(data)
but class(data) still returns data.frame afterwards.
How can I convert it to a time series in this situation? Can I use the link as my basis, and would an ARIMA model make sense? Or are there simpler, reasonable time series analysis methods I could use?
Thank you very much.
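Two side notes before a sketch. First, as.timeSeries(data) returns a new object instead of changing data in place, so class(data) stays "data.frame" unless you assign the result (for example data_ts <- as.timeSeries(data)). Second, strptime() returns a list-based POSIXlt object; as.POSIXct() is usually the safer choice for a data frame column. Below is a minimal base R sketch, assuming the readings are strictly every 30 minutes with no gaps (48 per day). The data are a synthetic stand-in, and the ARIMA orders are an illustrative guess, not a tuned model.

```r
# Synthetic stand-in for `data`: 10 days of half-hourly readings with a daily
# cycle plus noise. Replace with your own data frame.
set.seed(42)
n <- 10 * 48
data <- data.frame(
  Datetime    = seq(as.POSIXct("2013-04-04 11:59", tz = "UTC"),
                    by = "30 min", length.out = n),
  Measurement = 0.4 + 0.2 * sin(2 * pi * (1:n) / 48) + rnorm(n, sd = 0.05)
)

# 48 half-hour slots per day, so one seasonal period = one day
y <- ts(data$Measurement, frequency = 48)

# Seasonal ARIMA from base R's stats package: AR(1) plus one seasonal
# difference at lag 48. These orders are illustrative only.
fit <- arima(y, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 0), period = 48))

# Forecast the next day (48 steps) and attach clock times to the forecast
fc <- predict(fit, n.ahead = 48)
next_times <- max(data$Datetime) + 30 * 60 * (1:48)

high_time <- next_times[which.max(fc$pred)]  # predicted peak
low_time  <- next_times[which.min(fc$pred)]  # predicted trough
cat("Predicted high:", format(high_time), "\nPredicted low: ", format(low_time), "\n")
```

With real data, y <- ts(data$Measurement, frequency = 48) is the only conversion step needed. It is worth comparing a few candidate orders (for example with AIC(fit)) before trusting the forecast, since 10 days is a short history for a daily seasonal model.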
Related
I am dealing with a dataset with dates and various response values at different time intervals as shown below
Id Date Response
1 2008-03-12 4.88
1 2009-06-06 5.39
2 2015-10-22 8.61
2 2019-09-26 6.20
3 2006-09-28 7.40
3 2009-07-15 7.25
3 2011-01-19 9.50
The dates are the x values and Response is the y value.
I am interested in estimating the AUC (area under the curve) for each Id. Any suggestions for accomplishing this are much appreciated.
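One common way to do this is the trapezoidal rule, applied separately to each Id. Below is a minimal base R sketch using the data from the question; the function name auc_trapezoid is hypothetical. Dates are converted to a number of days, so the result is in Response × days.

```r
# The question's data, typed in
df <- data.frame(
  Id       = c(1, 1, 2, 2, 3, 3, 3),
  Date     = as.Date(c("2008-03-12", "2009-06-06", "2015-10-22", "2019-09-26",
                       "2006-09-28", "2009-07-15", "2011-01-19")),
  Response = c(4.88, 5.39, 8.61, 6.20, 7.40, 7.25, 9.50)
)

# Trapezoidal rule: sum of (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2,
# with x converted from Date to a number of days
auc_trapezoid <- function(x, y) {
  o <- order(x)
  x <- as.numeric(x[o])
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Apply it to each Id separately; the result is named by Id
auc <- sapply(split(df, df$Id), function(d) auc_trapezoid(d$Date, d$Response))
auc
```

Because the Ids cover spans of very different length, you may also want to divide each AUC by the number of days spanned to get a time-weighted average response.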
I have a time series with hourly data on energy consumption in the form of a zoo object. And there are 16 indices (in the range [1:143206]) for which the Date is NA. Here is a sample of the data:
Date PJMW_MW
1 2002-04-01 01:00:00 4374
...
8709 2003-03-29 23:00:00 4827
8710 2003-03-30 00:00:00 4611
8711 2003-03-30 01:00:00 4421
8712 NA 4285
8713 2003-03-30 03:00:00 4212
8714 2003-03-30 04:00:00 4321
...
143206 2018-08-03 00:00:00 5489
The data above is a data.frame called dat, but I also have it as a zoo object called hourly_ts:
1 4374
...
7709 6135
7710 6324
7711 6626
7712 6866
7713 6987
7714 7028
7715 7026
...
143206 5265
I would like to see the monthly averages, i.e. in which months consumption is generally higher, and I saw that there is a simple formula for this: boxplot(hourly_ts ~ cycle(hourly_ts))
But it fails with the error Error in cycle.zoo(hourly_ts) : ‘x’ is not regular
The weird thing is that hourly_ts has a specified frequency (24 per day) and start time (April 1st 2002 01:00:00), so there shouldn't be any missing values in the time index.
Assuming the missing values are what makes the series irregular, is there a way to fill them in myself?
I would also like to use the aggregate function, but I have no idea what its by parameter should be.
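A minimal sketch with the zoo package, assuming the rows really are consecutive hours so that each NA timestamp can be filled by interpolating between its neighbours (worth checking first, since daylight-saving changes or skipped rows break that assumption). For aggregate() on a zoo object, by can be a function that is applied to the index; as.yearmon maps every timestamp to its calendar month. The six rows below are a made-up stand-in for dat.

```r
library(zoo)

# Made-up stand-in for `dat`: hourly readings, one with a missing (NA) timestamp
dat <- data.frame(
  Date = as.POSIXct(c("2002-04-30 22:00:00", "2002-04-30 23:00:00", NA,
                      "2002-05-01 01:00:00", "2002-05-01 02:00:00",
                      "2002-05-01 03:00:00"), tz = "UTC"),
  PJMW_MW = c(4000, 4100, 4200, 4300, 4400, 4500)
)

# Fill each NA timestamp by linear interpolation between its neighbours.
# Only valid if the rows really are consecutive hours.
dat$Date <- as.POSIXct(na.approx(as.numeric(dat$Date)),
                       origin = "1970-01-01", tz = "UTC")

# Rebuild the zoo series on the completed time index
hourly_ts <- zoo(dat$PJMW_MW, order.by = dat$Date)

# Monthly means: aggregate.zoo applies the `by` function to index(hourly_ts),
# so every timestamp is grouped by its calendar month
monthly_mean <- aggregate(hourly_ts, as.yearmon, mean)
print(monthly_mean)
```

For the month-by-month comparison, something like boxplot(coredata(hourly_ts) ~ format(index(hourly_ts), "%m")) draws one box per calendar month without needing cycle().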
I have a dataframe as follows:
Date Price1 Price2 Price3 Price4 .... Price24
2017-10-15 60.43 49.40 48.72 48.32
2017-10-16 38.09 30.00 24.47 24.88
2017-10-17 48.80 46.76 46.73 45.82
The goal is to turn the data frame into a time series and to predict the next date, 2017-10-18, with all 24 corresponding prices.
I do get a ts object, but computing the model fails with the following error: Error in ets(stock_prize) : y should be a univariate time series
Any advice?
I think your data structure is not right. I suggest making those dates a factor and keeping only one column for the values. For example, suppose you have something like this:
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
mydates2 <-as.Date(c("2008-06-22", "2005-02-13"))
mydates3 <-as.Date(c("2009-06-22", "2006-02-13"))
hours <- c(8,9)
values <- c(1,2)
a=data.frame(mydates,mydates2,mydates3,hours,values)
a
This is how your data looks:
mydates mydates2 mydates3 hours values
1 2007-06-22 2008-06-22 2009-06-22 8 1
2 2004-02-13 2005-02-13 2006-02-13 9 2
But you should transform them to look something like this:
dates=c(mydates,mydates2,mydates3)
hours_factor=rep(hours,3)
ordered_values=rep(values,3)
b=data.frame(dates,hours_factor,ordered_values)
b
This is how your data should look:
dates hours_factor ordered_values
1 2007-06-22 8 1
2 2004-02-13 9 2
3 2008-06-22 8 1
4 2005-02-13 9 2
5 2009-06-22 8 1
6 2006-02-13 9 2
After that, you can turn the variables into a ts object with the ts function. If you want to predict the next date's value, you can fit an auto-regressive model. This is well documented online, but note that your data have to meet some requirements first (for example, the series should be roughly stationary).
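A sketch of that approach for the 24-price layout in the question, using only base R. It assumes the columns are Price1 to Price24 in hour order; the prices below are synthetic. Reshaping the wide table into one long hourly vector gives a single univariate series, which is what the ets() error asks for. Here stats::ar() plays the role of the auto-regression mentioned above, and forecasting 24 steps ahead gives the 24 values for the next day.

```r
# Synthetic stand-in for the question's table: one row per day, columns
# Price1..Price24 holding the 24 hourly prices of that day.
set.seed(1)
hour <- rep(1:24, times = 3)
vals <- round(40 + 10 * sin(2 * pi * hour / 24) + rnorm(72, sd = 2), 2)
prices <- data.frame(
  Date = as.Date("2017-10-15") + 0:2,
  matrix(vals, nrow = 3, byrow = TRUE,
         dimnames = list(NULL, paste0("Price", 1:24)))
)

# Wide -> long: transposing first makes the 24 hours of each day consecutive
hourly <- as.vector(t(as.matrix(prices[, -1])))

# One univariate series with a 24-hour season
y <- ts(hourly, frequency = 24)

# Auto-regressive model (order chosen by AIC) and a 24-step forecast
fit <- ar(y)
fc <- predict(fit, newdata = y, n.ahead = 24)

next_day <- data.frame(Date  = max(prices$Date) + 1,
                       Hour  = 1:24,
                       Price = round(as.numeric(fc$pred), 2))
head(next_day)
```

The same long vector y can be passed to ets() from the forecast package instead of ar(), since the problem in the question was the multi-column input rather than the model.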
I would like to do a sentiment analysis in R.
I have a dataset with a column "Svolume" containing values such as 1, 5, 41, ..., another column with the names of different companies, and a last column with the dates (yyyy-mm-dd).
I would like to plot the time series by date for each company: if I have 3 companies, I would like 3 lines, with the dates on the x axis and the Svolume values on the y axis.
I wrote this:
data2=data[order(as.Date(data$Date, format="%d/%m/%Y")),]
This sorts the data by date.
plot.ts(data2$Svolume)
axis(1,1:40,data2$Date)
I then used these two lines to plot a time series of Svolume and put the dates on the x axis. I have two problems. First, I get only one line, so I can't see the difference between the 3 companies (I would love to have a line for the total too, if possible). Second, the dates are impossible to read: they are written horizontally, on top of each other. Is there a way to write the dates vertically, for example, so they can be read?
I would like a plot like this one (time series plot 1), but with "volumes" instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this one, but with the Svolume values (time series plot 2).
This is what my data looks like:
> head(data)
Date Time Subject Sscore Smean Svscore Sdispersion Svolume Sbuzz Last close
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000
2 2015-07-08 09:10:00 ACE -0.4415 0.3521 -0.0374 0.500 4 0.7200 104.460000000
3 2015-07-07 09:10:00 AES 1.9821 0.0233 1.1743 1.000 1 1.9445 13.200000000
4 2015-07-04 09:10:00 AFL -2.9335 0.0035 -0.2975 1.000 1 0.8321 61.960000000
5 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000
6 2015-07-07 09:10:00 ACE -0.2331 0.3519 -0.1118 1.000 3 0.7196 103.330000000
Company name Date
1 3M Company 2015-07-08
2 ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4 AFLAC Inc. 2015-07-04
5 3M Company 2015-07-07
6 ACE Limited 2015-07-07
Thank you for your help, and sorry for my English.
I found how to do it; in case it helps someone else:
library(ggplot2)
ggplot(data, aes(x = Date, y = Svolume, group = Subject, colour = Subject)) +
  geom_line()
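If you prefer base graphics (which is what plot.ts() and axis() in the question use), here is a sketch that also covers the two extra requests: a Total line, and vertical date labels via las = 2 in axis(). It uses the Date, Subject and Svolume values from the six rows of head(data) shown in the question; with the full data the same code gives one line per company.

```r
# Date, Subject and Svolume from the six rows of head(data) in the question
data <- data.frame(
  Date    = as.Date(c("2015-07-08", "2015-07-08", "2015-07-07",
                      "2015-07-04", "2015-07-07", "2015-07-07")),
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE"),
  Svolume = c(8, 4, 1, 1, 5, 3)
)

# One row per date, one column per company (summing if a company has several
# rows on the same date); combinations that do not occur become NA
vol <- tapply(data$Svolume, list(data$Date, data$Subject), sum)
vol <- cbind(vol, Total = rowSums(vol, na.rm = TRUE))

dates <- as.Date(rownames(vol))
cols  <- seq_len(ncol(vol))

par(mar = c(7, 4, 2, 1))  # extra bottom margin for the vertical labels
matplot(as.numeric(dates), vol, type = "b", pch = 16, lty = 1, col = cols,
        xaxt = "n", xlab = "", ylab = "Svolume")
# las = 2 draws the labels perpendicular to the axis, i.e. vertical on the x axis
axis(1, at = as.numeric(dates), labels = format(dates, "%Y-%m-%d"),
     las = 2, cex.axis = 0.8)
legend("topleft", legend = colnames(vol), col = cols, lty = 1, pch = 16)
```

type = "b" draws points as well as lines, so a company that appears on only one date is still visible.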
I have a data frame with columns like dates, volumes, and companies.
How can I keep only one row for each company?
grep returns the row numbers, but how can I get the full rows?
Also, how can I plot these volumes per company on a single time series plot?
I found plot.ts, but I can't use it without having volumes per company: if I call plot.ts on my full dataset, it does not distinguish between companies, so I get a wrong time series (many points for a single date).
I would like a plot like this one (time series plot), but with "volumes" instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this one, but with the Svolume values (time series plot 2).
This is what my data looks like:
> head(data)
Date Time Subject Sscore Smean Svscore Sdispersion Svolume Sbuzz Last close
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000
2 2015-07-08 09:10:00 ACE -0.4415 0.3521 -0.0374 0.500 4 0.7200 104.460000000
3 2015-07-07 09:10:00 AES 1.9821 0.0233 1.1743 1.000 1 1.9445 13.200000000
4 2015-07-04 09:10:00 AFL -2.9335 0.0035 -0.2975 1.000 1 0.8321 61.960000000
5 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000
6 2015-07-07 09:10:00 ACE -0.2331 0.3519 -0.1118 1.000 3 0.7196 103.330000000
Company name Date
1 3M Company 2015-07-08
2 ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4 AFLAC Inc. 2015-07-04
5 3M Company 2015-07-07
6 ACE Limited 2015-07-07
Thank you very much!
I'm not yet totally clear about what you have and what you want. It's very difficult without a reproducible example!
I assume you want to summarize your data by company and date. You could achieve this by using the data.table package:
library(data.table)
setDT(data)
newdata <- data[, .(volume = sum(Svolume)), by = .(`Company name`, Date)]
# Notice that you can use any other function instead of sum. mean, to mention one
Once you have that newdata object, you can try to plot it with ggplot2:
library(ggplot2)
library(scales)
pl <- ggplot(newdata, aes(x = Date, y = volume, group = `Company name`, colour = `Company name`)) + geom_line() + scale_x_date(labels = date_format("%d-%m-%y"))
pl
This solution intends to be general (as your desired output is not well defined in your question), so it might require some minor adjustments.
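For the first part of the question (full rows instead of row numbers, and one row per company), a base R sketch: grep() returns row indices, which you can use directly to subset the data frame (grepl() returns a logical vector that works the same way), and !duplicated() keeps the first row for each company. The toy data frame uses the Date, Subject and Svolume values from head(data) in the question.

```r
# Date, Subject and Svolume from the six rows of head(data) in the question
data <- data.frame(
  Date    = as.Date(c("2015-07-08", "2015-07-08", "2015-07-07",
                      "2015-07-04", "2015-07-07", "2015-07-07")),
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE"),
  Svolume = c(8, 4, 1, 1, 5, 3)
)

# grep() gives row numbers; indexing with them returns the full rows
mmm_rows <- data[grep("MMM", data$Subject, fixed = TRUE), ]

# Keep the first row that appears for each company
one_per_company <- data[!duplicated(data$Subject), ]

# Or keep the most recent row per company: sort by date (newest first), then dedupe
latest <- data[order(data$Date, decreasing = TRUE), ]
latest <- latest[!duplicated(latest$Subject), ]
latest
```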