AUC based on Riemann Sum in R

I am dealing with a dataset with dates and various response values at different time intervals, as shown below:
Id Date Response
1 2008-03-12 4.88
1 2009-06-06 5.39
2 2015-10-22 8.61
2 2019-09-26 6.20
3 2006-09-28 7.40
3 2009-07-15 7.25
3 2011-01-19 9.50
Dates are the x values and Response the y values.
I am interested in estimating the AUC for each Id. Any suggestions for accomplishing this are much appreciated.
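One possible approach, sketched under the assumption that the dates are converted to a number of days (so the AUC is in response × days): sort each Id by date and sum the areas between consecutive points. The helper name `auc_by_rule` and its `rule` argument are made up for this illustration. The trapezoidal rule is the average of the left and right Riemann sums, so both are shown.

```r
# Sample data from the question
df <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3, 3),
  Date = as.Date(c("2008-03-12", "2009-06-06", "2015-10-22", "2019-09-26",
                   "2006-09-28", "2009-07-15", "2011-01-19")),
  Response = c(4.88, 5.39, 8.61, 6.20, 7.40, 7.25, 9.50)
)

# Hypothetical helper: AUC of y over x using trapezoids or a left Riemann sum
auc_by_rule <- function(x, y, rule = c("trapezoid", "left")) {
  rule <- match.arg(rule)
  o <- order(x)
  x <- as.numeric(x[o])          # Dates become days since 1970-01-01
  y <- y[o]
  width <- diff(x)               # days between consecutive observations
  height <- if (rule == "left") head(y, -1) else (head(y, -1) + tail(y, -1)) / 2
  sum(width * height)
}

# One AUC per Id
auc_trap <- sapply(split(df, df$Id), function(d) auc_by_rule(d$Date, d$Response))
auc_left <- sapply(split(df, df$Id), function(d) auc_by_rule(d$Date, d$Response, "left"))
```

Since the observations are years apart, dividing the result by 365.25 gives the area in response × years if that unit is easier to read.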

Related

Is my t-test syntax right for my data set? [R beginner]

I started using R only recently, so I would like to ask whether I am running the t-test correctly for my purpose.
I have two data sets like this:
> head(da1)
LiefertagDeliveryDate Price Hour
1 2015-12-31 28.82 1
25 2015-12-30 42.97 1
49 2015-12-29 43.38 1
73 2015-12-28 48.54 1
97 2015-12-27 46.36 1
121 2015-12-26 42.68 1
And,
> head(sp1)
# A tibble: 6 x 3
Date Hour Price
<dttm> <chr> <dbl>
1 2015-12-31 1 16.06
2 2015-12-30 1 28.51
3 2015-12-29 1 20.59
4 2015-12-28 1 27.94
5 2015-12-27 1 13.42
6 2015-12-26 1 -36.07
So the data are daily, from 2015-12-31 back to 2011-01-01, for hour 1. I would like to run t.test on them to compare the average price for hour 1 in each data set and see whether the difference is significant.
For this purpose, I ran:
t.test(da1$Price, sp1$Price, data=rp1, var.equal=TRUE, conf.level = 0.95,
alternative = 'two.sided', paired=F)
(Equal variances, 95% confidence level, two-sided; not paired because these are two different data sets.)
Did I do this right for my purpose? I am very new to R, so I am not sure the call is correct. Since I am also a programming beginner, terms such as strings and logicals sound very foreign to me, which makes the help pages hard to understand.
Thank you in advance for your kind advice.
Have a great day, people!
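A hedged sketch of the same comparison on made-up prices (the `da1` and `sp1` values below are simulated, not the real data). The `data` argument belongs to the formula interface of `t.test`, so with two vectors it can be dropped. Whether `var.equal = TRUE` and an unpaired test are appropriate depends on the data; since both series cover the same dates, a paired test may also be worth considering.

```r
set.seed(1)
# Simulated stand-ins for the two hour-1 price series (not the real data)
da1 <- data.frame(Price = rnorm(50, mean = 40, sd = 8))
sp1 <- data.frame(Price = rnorm(50, mean = 20, sd = 15))

# Two-sample, two-sided t-test with pooled variance
res <- t.test(da1$Price, sp1$Price,
              alternative = "two.sided", paired = FALSE,
              var.equal = TRUE, conf.level = 0.95)
res$p.value      # p-value of the test
res$conf.int     # 95% confidence interval for the difference in means

# Welch's test (the default, var.equal = FALSE) does not assume equal variances
res_welch <- t.test(da1$Price, sp1$Price)
```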

How to create a time series object in a day/hour format and predict values for the next day?

I have a dataframe as follows:
Date Price1 Price2 Price3 Price4 .... Price 24
2017-10-15 60.43 49.40 48.72 48.32
2017-10-16 38.09 30.00 24.47 24.88
2017-10-17 48.80 46.76 46.73 45.82
The goal is to turn the data frame into a time series and to predict the 24 price values for the next date, 2017-10-18.
I do get the ts object, but fitting the model fails with: Error in ets(stock_prize) : y should be a univariate time series
Any advice?
I think your data structure is not correct. I suggest making the dates a factor and keeping all the values in a single column. For example, suppose you have something like this:
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
mydates2 <-as.Date(c("2008-06-22", "2005-02-13"))
mydates3 <-as.Date(c("2009-06-22", "2006-02-13"))
hours <- c(8,9)
values <- c(1,2)
a=data.frame(mydates,mydates2,mydates3,hours,values)
a
This is how your data looks:
mydates mydates2 mydates3 hours values
1 2007-06-22 2008-06-22 2009-06-22 8 1
2 2004-02-13 2005-02-13 2006-02-13 9 2
But you should transform them to look something like this:
dates=c(mydates,mydates2,mydates3)
hours_factor=rep(hours,3)
ordered_values=rep(values,3)
b=data.frame(dates,hours_factor,ordered_values)
b
This is how your data should look:
dates hours_factor ordered_values
1 2007-06-22 8 1
2 2004-02-13 9 2
3 2008-06-22 8 1
4 2005-02-13 9 2
5 2009-06-22 8 1
6 2006-02-13 9 2
After that you can convert the variables to the ts class with the ts function. If you want to predict the next date's values, you can fit an autoregression. It is well documented on the Internet, but note that your data have to meet some requirements first.
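The reshaping and autoregression described above can be sketched as follows. The prices are simulated, the column names Price1..Price24 follow the question, and base R's `ar()` is used here in place of `ets()` so that no extra package is needed; treat it as an illustration rather than a recommended model.

```r
set.seed(42)
# Simulated wide data: one row per day, 24 hourly price columns
wide <- data.frame(Date = as.Date("2017-10-15") + 0:6,
                   matrix(round(runif(7 * 24, 20, 60), 2), nrow = 7,
                          dimnames = list(NULL, paste0("Price", 1:24))))

# Flatten row by row: all 24 hours of day 1, then day 2, and so on
prices <- as.vector(t(as.matrix(wide[, -1])))

# Univariate hourly series with a daily cycle of 24 observations
y <- ts(prices, frequency = 24)

# Autoregression with the order chosen by AIC, then 24 steps ahead (the next day)
fit <- ar(y)
fc <- predict(fit, newdata = y, n.ahead = 24)$pred
```

The result `fc` is itself a `ts` object holding one forecast per hour of the day after the last row.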

Plot Time Series of Sentiment by Company in R

I would like to do a sentiment analysis in R.
I have a dataset with a column "Svolume" holding values such as 1, 5, 41, ...
Another column holds the company names.
A last column holds the dates as yyyy-mm-dd.
I would like to plot the time series by date for each company. So with 3 companies, I would like 3 lines, with the dates on the x axis and the Svolume values on the y axis.
I wrote this:
data2=data[order(as.Date(data$Date, format="%Y-%m-%d")),]
to sort the data by date.
plot.ts(data2$Svolume)
axis(1,1:40,data2$Date)
This plots a time series of the Svolume values and then puts the dates on the x axis. I have two problems. First, I get only one line, so I can't see the difference between the 3 companies (I would also love to have a line for the total, if possible). Second, the dates are impossible to read because they are horizontal and overlap each other. Is there a way to write the dates vertically, for example, so that they can be read?
I would like a plot like this one (time series plot 1), but with the volumes instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this one (time series plot 2), but with the Svolume values.
This is what my data looks like:
> head(data)
Date Time Subject Sscore Smean Svscore Sdispersion Svolume Sbuzz Last close
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000
2 2015-07-08 09:10:00 ACE -0.4415 0.3521 -0.0374 0.500 4 0.7200 104.460000000
3 2015-07-07 09:10:00 AES 1.9821 0.0233 1.1743 1.000 1 1.9445 13.200000000
4 2015-07-04 09:10:00 AFL -2.9335 0.0035 -0.2975 1.000 1 0.8321 61.960000000
5 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000
6 2015-07-07 09:10:00 ACE -0.2331 0.3519 -0.1118 1.000 3 0.7196 103.330000000
Company name Date
1 3M Company 2015-07-08
2 ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4 AFLAC Inc. 2015-07-04
5 3M Company 2015-07-07
6 ACE Limited 2015-07-07
Thank you for your help, and sorry for my English.
I found how to do it, in case it helps someone else:
library(ggplot2)
# Refer to columns by name inside aes(); one coloured line per company
ggplot(data, aes(x=Date, y=Svolume, group=Subject, color=Subject)) +
geom_line()
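The two remaining wishes from the question (a line for the total and readable vertical dates) can also be handled in base graphics. The sketch below uses the six rows shown in the question; the intermediate names `wide` and `dates` are made up for the illustration, and the vertical labels may need a larger bottom margin on real data.

```r
# First rows of the question's data (only the columns needed here)
data <- data.frame(
  Date = as.Date(c("2015-07-08", "2015-07-08", "2015-07-07",
                   "2015-07-04", "2015-07-07", "2015-07-07")),
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE"),
  Svolume = c(8, 4, 1, 1, 5, 3)
)

# Dates x companies matrix of Svolume (NA where a company has no row that day)
wide <- tapply(data$Svolume, list(format(data$Date), data$Subject), sum)
# Extra column with the total over all companies per date
wide <- cbind(wide, Total = rowSums(wide, na.rm = TRUE))
dates <- as.Date(rownames(wide))

# One line per column; the x axis is drawn separately with vertical labels
matplot(as.numeric(dates), wide, type = "b", lty = 1, pch = 1,
        col = seq_len(ncol(wide)), xaxt = "n", xlab = "", ylab = "Svolume")
axis(1, at = as.numeric(dates), labels = format(dates), las = 2)
legend("topleft", legend = colnames(wide), col = seq_len(ncol(wide)), lty = 1)
```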

Incrementally add seconds of a timestamp column grouped by ID in R

I have a data frame that is essentially time series data.
Timestamp <- c("1/27/2015 18:28:16","1/27/2015 18:28:17","1/27/2015 18:28:19","1/27/2015 18:28:20","1/27/2015 18:28:23","1/28/2015 22:43:08","1/28/2015 22:43:09","1/28/2015 22:43:13","1/28/2015 22:43:15","1/28/2015 22:43:16"
)
ID <- c("A","A","A","A","A","B","B","B","B","B")
v1<- c(1.70,1.71,1.77,1.79,1.63,7.20,7.26,7.16,7.18,7.18)
df <- data.frame(Timestamp ,ID,v1)
Timestamp ID v1
1/27/2015 18:28:16 A 1.70
1/27/2015 18:28:17 A 1.71
1/27/2015 18:28:19 A 1.77
1/27/2015 18:28:20 A 1.79
1/27/2015 18:28:23 A 1.63
1/28/2015 22:43:08 B 7.20
1/28/2015 22:43:09 B 7.26
1/28/2015 22:43:13 B 7.16
1/28/2015 22:43:15 B 7.18
1/28/2015 22:43:16 B 7.18
Since I don't really care about the timestamp itself, I was thinking of creating a column called interval so that I can plot this data in one plot.
I am creating the interval column incorrectly like this:
df$interval <- cut(df$Timestamp, breaks="sec")
I want the interval column to hold the elapsed seconds of the timestamp, counted up within each ID. By this I mean that every time a new ID starts, the interval column resets to 1 and then increases by the number of seconds elapsed.
My desired output:
Timestamp ID v1 Interval
1/27/2015 18:28:16 A 1.70 1
1/27/2015 18:28:17 A 1.71 2
1/27/2015 18:28:19 A 1.77 4
1/27/2015 18:28:20 A 1.79 5
1/27/2015 18:28:23 A 1.63 8
1/28/2015 22:43:08 B 7.20 1
1/28/2015 22:43:09 B 7.26 2
1/28/2015 22:43:13 B 7.16 6
1/28/2015 22:43:15 B 7.18 8
1/28/2015 22:43:16 B 7.18 9
I would also like to plot this with ggplot, interval vs. v1 by ID, so that we get 2 time series in the same plot. I will then extract features from it.
Please help me work around this problem so that I can apply it to a larger dataset.
One solution with data.table:
library(data.table)
df <- as.data.table(df)
# Parse the timestamps, then count seconds since the first timestamp of each ID, starting at 1
df[, Timestamp := as.POSIXct(Timestamp, format='%m/%d/%Y %H:%M:%S')]
df[, Interval := as.numeric(difftime(Timestamp, Timestamp[1], units='secs')) + 1, by=ID]
which outputs:
> df
Timestamp ID v1 Interval
1: 2015-01-27 18:28:16 A 1.70 1
2: 2015-01-27 18:28:17 A 1.71 2
3: 2015-01-27 18:28:19 A 1.77 4
4: 2015-01-27 18:28:20 A 1.79 5
5: 2015-01-27 18:28:23 A 1.63 8
6: 2015-01-28 22:43:08 B 7.20 1
7: 2015-01-28 22:43:09 B 7.26 2
8: 2015-01-28 22:43:13 B 7.16 6
9: 2015-01-28 22:43:15 B 7.18 8
10: 2015-01-28 22:43:16 B 7.18 9
Then for ggplot:
library(ggplot2)
ggplot(df, aes(x=Interval, y=v1, color=ID)) + geom_line()
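For comparison, the same per-ID interval can be sketched in base R with `ave()`, which applies a function within each group and returns the results in the original row order. The `tz = "UTC"` setting is an assumption made here only to keep the parsing independent of the local time zone.

```r
# Data from the question
Timestamp <- c("1/27/2015 18:28:16", "1/27/2015 18:28:17", "1/27/2015 18:28:19",
               "1/27/2015 18:28:20", "1/27/2015 18:28:23", "1/28/2015 22:43:08",
               "1/28/2015 22:43:09", "1/28/2015 22:43:13", "1/28/2015 22:43:15",
               "1/28/2015 22:43:16")
ID <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
v1 <- c(1.70, 1.71, 1.77, 1.79, 1.63, 7.20, 7.26, 7.16, 7.18, 7.18)
df <- data.frame(Timestamp, ID, v1)

# Parse the timestamps and express them as seconds
df$Timestamp <- as.POSIXct(df$Timestamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
secs <- as.numeric(df$Timestamp)

# Within each ID: seconds since the first timestamp, starting at 1
df$Interval <- ave(secs, df$ID, FUN = function(s) s - s[1] + 1)
```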

How can I filter specifically for certain months if the days are not the same in each year?

This is probably a very simple question that has been asked already, but...
I have a data frame built from a CSV file generated in Excel. The observations are not sampled uniformly: they cover only the "On Peak" times of electricity usage, which means a different set of days is excluded each year. I have 20 years of data (1993-2012) and am running both non-robust and robust LOESS to extract seasonal and linear trends.
After the decomposition has been done, I want to focus only on the observations from June through September.
How can I create a new data frame of just those results?
Sorry about the formatting, too.
Date MaxLoad TMAX
1 1993-01-02 2321 118.6667
2 1993-01-04 2692 148.0000
3 1993-01-05 2539 176.0000
4 1993-01-06 2545 172.3333
5 1993-01-07 2517 177.6667
6 1993-01-08 2438 157.3333
7 1993-01-09 2302 152.0000
8 1993-01-11 2553 144.3333
9 1993-01-12 2666 146.3333
10 1993-01-13 2472 177.6667
As Joran notes, you don't need anything other than base R:
## Reproducible data
df <- data.frame(Date = seq(as.Date("2009-03-15"), as.Date("2011-03-15"), by="month"),
                 MaxLoad = floor(runif(25,2000,3000)), TMAX=runif(25,100,200))
## One option (months() returns month names in the current locale, so this assumes English)
df[months(df$Date) %in% month.name[6:9],]
# Date MaxLoad TMAX
# 4 2009-06-15 2160 188.4607
# 5 2009-07-15 2151 164.3946
# 6 2009-08-15 2694 110.4399
# 7 2009-09-15 2460 150.4076
# 16 2010-06-15 2638 178.8341
# 17 2010-07-15 2246 131.3283
# 18 2010-08-15 2483 112.2635
# 19 2010-09-15 2174 160.9724
## Another option: strftime() will be more _generally_ useful than months()
df[as.numeric(strftime(df$Date, "%m")) %in% 6:9,]
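The same filter does not depend on which days are present in each year, because it looks only at the month of each date. A small sketch on irregularly spaced dates (only the first row comes from the question; the other rows and the names `obs`, `summer` and `summer_by_year` are made up for illustration):

```r
# Irregularly sampled observations across two years
obs <- data.frame(
  Date = as.Date(c("1993-01-02", "1993-06-01", "1993-06-03", "1993-09-30",
                   "1993-10-01", "1994-07-05", "1994-12-31")),
  MaxLoad = c(2321, 2600, 2650, 2500, 2400, 2700, 2300)
)

# Month number 1..12; the $mon field of POSIXlt is 0-based
month_num <- as.POSIXlt(obs$Date)$mon + 1L

# New data frame with June through September only
summer <- obs[month_num %in% 6:9, ]

# Optional: one data frame per year, if each summer is analysed separately
summer_by_year <- split(summer, format(summer$Date, "%Y"))
```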

Resources