daily, monthly and annual mean - r

I have hourly data. I need to convert it into daily, monthly and then annual means.
Some dates are also missing from the data, so I want to include those dates as well.
#Date
24/02/2000/05:25:00 NaN NaN NaN
26/02/2000/05:10:00 0.227 0.2002496 0.2009378
26/02/2000/06:50:00 NaN NaN NaN
27/02/2000/05:55:00 0.21 0.1687891 0.1630572
28/02/2000/05:00:00 NaN NaN 0.1265696
28/02/2000/06:35:00 0.136 0.1446176 0.1479067
29/02/2000/05:40:00 0.293 0.2279881 0.1900514
01/03/2000/04:45:00 NaN NaN NaN
01/03/2000/06:25:00 0.322 0.3068518 0.2880579
02/03/2000/05:30:00 0.332 0.2793714 0.2391622
02/03/2000/07:05:00 NaN NaN NaN
03/03/2000/06:10:00 0.335 0.2151302 0.2218139
04/03/2000/05:15:00 0.1 0.1138773 0.1168912
04/03/2000/06:55:00 NaN NaN NaN
05/03/2000/06:00:00 0.117 0.1333082 0.147145
06/03/2000/05:05:00 NaN 0.2426362 0.2401871
06/03/2000/06:40:00 NaN 0.32587 0.2845067
07/03/2000/05:45:00 0.323 0.3143821 0.3096662
08/03/2000/04:50:00 NaN NaN NaN
08/03/2000/06:30:00 0.236 0.23232 0.2300107
10/03/2000/06:20:00 0.113 0.1429935 0.1453774
11/03/2000/05:25:00 0.276 0.3238274 0.3150585
11/03/2000/07:00:00 NaN NaN NaN
12/03/2000/06:05:00 0.215 0.2537585 0.2512374
13/03/2000/05:10:00 0.163 0.2273455 0.2679352
13/03/2000/06:50:00 NaN NaN NaN
14/03/2000/05:55:00 0.09 0.1311507 0.1761056
15/03/2000/05:00:00 NaN NaN 0.1447348
15/03/2000/06:35:00 0.125 0.1232291 0.1387782
16/03/2000/05:40:00 0.019 0.06970426 0.11602
17/03/2000/04:45:00 NaN NaN NaN
17/03/2000/06:25:00 0.194 0.1964414 0.1874403
18/03/2000/05:30:00 0.263 0.2749394 0.242199
18/03/2000/07:05:00 NaN NaN NaN
19/03/2000/06:10:00 0.217 0.217737 0.2183706
20/03/2000/05:15:00 0.253 0.2307511 0.2089891
20/03/2000/06:55:00 NaN NaN NaN
21/03/2000/06:00:00 0.282 0.2413632 0.2511235
22/03/2000/05:05:00 NaN 0.382685 0.3944636
22/03/2000/06:45:00 NaN 0.2734097 0.241442
23/03/2000/05:50:00 0.347 0.3289219 0.3003848
24/03/2000/04:50:00 NaN NaN NaN
24/03/2000/06:30:00 0.18 0.1892378 0.2021516
25/03/2000/05:35:00 0.216 0.1871835 0.206762
26/03/2000/06:20:00 0.189 0.1836237 0.2116453
27/03/2000/05:25:00 0.195 0.1817446 0.1804464
27/03/2000/07:00:00 NaN NaN NaN
28/03/2000/06:05:00 0.208 0.168515 0.1819115
29/03/2000/05:10:00 0.162 0.1598227 0.1689523
29/03/2000/06:50:00 NaN NaN NaN
30/03/2000/05:55:00 0.145 0.1472181 0.1723774
31/03/2000/05:00:00 NaN NaN 0.157723
31/03/2000/06:35:00 0.226 0.2108984 0.2339231

I guess you are talking about splitting your date variable into year, month, and day, and then calculating some grouped statistics of another variable, which you did not include in your example. If that is the case, you could do the following:
# load package
library(dplyr)
#Date
Date <- data.frame( Date =strptime(c("24/02/2000/05:25:00",
"26/02/2000/05:10:00",
"26/02/2000/06:50:00",
"27/02/2000/05:56:00",
"28/02/2000/05:00:00",
"28/02/2000/06:35:00",
"29/02/2000/05:40:00",
"01/03/2000/04:45:00",
"01/03/2000/06:25:00",
"02/03/2000/05:30:00",
"02/03/2000/07:05:00",
"03/03/2000/06:10:00",
"04/03/2000/05:15:00",
"04/03/2000/06:55:00",
"05/03/2000/06:00:00",
"06/03/2000/05:05:00",
"06/03/2000/06:40:00",
"07/03/2000/05:45:00",
"08/03/2000/04:50:00",
"08/03/2000/06:30:00",
"10/03/2000/06:20:00",
"11/03/2000/05:25:00",
"11/03/2000/07:00:00",
"12/03/2000/06:05:00",
"13/03/2000/05:10:00",
"13/03/2000/06:50:00",
"14/03/2000/05:55:00",
"15/03/2000/05:00:00",
"15/03/2000/06:35:00",
"16/03/2000/05:40:00",
"17/03/2000/04:45:00",
"17/03/2000/06:25:00",
"18/03/2000/05:30:00",
"18/03/2000/07:05:00",
"19/03/2000/06:10:00",
"20/03/2000/05:15:00",
"20/03/2000/06:55:00",
"21/03/2000/06:00:00",
"22/03/2000/05:05:00",
"22/03/2000/06:45:00",
"23/03/2000/05:50:00",
"24/03/2000/04:50:00",
"24/03/2000/06:30:00",
"25/03/2000/05:35:00",
"26/03/2000/06:20:00",
"27/03/2000/05:25:00",
"27/03/2000/07:00:00",
"28/03/2000/06:05:00",
"29/03/2000/05:10:00",
"29/03/2000/06:50:00",
"30/03/2000/05:55:00",
"31/03/2000/05:00:00",
"31/03/2000/06:35:00"), format = "%d/%m/%Y/%H:%M:%S"))
# Split your Date variable in days, months, and years
Date[,"Year"] <- format(Date$Date, format = "%Y")
Date[,"Month"] <- format(Date$Date, format = "%m")
Date[,"Day"] <- format(Date$Date, format = "%d")
# Make up some random variable to calculate summary statistics on
Date[,"Random"] <- sample(seq(1,7,1),size=dim(Date)[1], replace = TRUE)
# Now you can calculate grouped statistics by day, month, or year
MonthMean <- Date %>%
group_by(Month) %>%
select(Month, Random) %>%
summarise(Mean = mean(Random))
# Output
# A tibble: 2 × 2
Month Mean
<chr> <dbl>
1 02 3.142857
2 03 4.217391

I split the data into Day, Month and Year, then calculated the daily, monthly and annual means
using this code:
# open the file
file1 <-read.table(file.choose(), header=T)
# View the content of the file
View(file1)
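# parse the date (as.Date ignores the time of day that follows the year)
file1$date <- as.character(file1$Date)
time <- as.Date(file1$date, "%d/%m/%Y")
# separate the day, month, year
file1[,"Year"] <- format(time, format = "%Y")
file1[,"Month"] <- format(time, format = "%m")
file1[,"Day"] <- format(time, format = "%d")
# to see the updated file
View(file1)
# daily means; grouping by Day alone would pool the same day number across months,
# so group by Year and Month as well (drop Day, then Month, for monthly and annual means)
aggregate(file1[, 2:4], list(Year = file1$Year, Month = file1$Month, Day = file1$Day), mean, na.rm=T)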
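
Neither answer above fills in the missing dates. The following is a minimal base-R sketch of one way to do that, not the posters' own method. It assumes the three value columns are called V1, V2 and V3 (placeholder names; the question does not give them) and uses only a few rows copied from the question.

```r
# A few rows in the question's layout; V1..V3 are placeholder column names.
raw <- data.frame(
  Date = c("24/02/2000/05:25:00", "26/02/2000/05:10:00",
           "26/02/2000/06:50:00", "27/02/2000/05:55:00",
           "01/03/2000/06:25:00"),
  V1 = c(NaN, 0.227, NaN, 0.21, 0.322),
  V2 = c(NaN, 0.2002496, NaN, 0.1687891, 0.3068518),
  V3 = c(NaN, 0.2009378, NaN, 0.1630572, 0.2880579)
)
vars <- c("V1", "V2", "V3")

# Keep only the calendar date; the trailing time of day is ignored.
raw$Day <- as.Date(raw$Date, format = "%d/%m/%Y")

# Daily means, ignoring NaN readings within a day.
daily <- aggregate(raw[, vars], by = list(Day = raw$Day),
                   FUN = mean, na.rm = TRUE)

# Add a row (filled with NA) for every calendar day that has no record.
all_days <- data.frame(Day = seq(min(daily$Day), max(daily$Day), by = "day"))
daily_full <- merge(all_days, daily, by = "Day", all.x = TRUE)

# Monthly and annual means computed from the daily means.
daily_full$Month <- format(daily_full$Day, "%Y-%m")
daily_full$Year  <- format(daily_full$Day, "%Y")
monthly <- aggregate(daily_full[, vars], by = list(Month = daily_full$Month),
                     FUN = mean, na.rm = TRUE)
annual  <- aggregate(daily_full[, vars], by = list(Year = daily_full$Year),
                     FUN = mean, na.rm = TRUE)
```

Here the monthly and annual values are means of the daily means, so every day with data counts equally no matter how many readings it has. If each reading should count equally instead, aggregate `raw` directly by month or year.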

Related

twang - Error in Di - crossprod(WX[index, ], X[index, ]) : non-conformable arrays

I'm trying to build propensity scores with the twang package, but I keep getting this error:
Error in Di - crossprod(WX[index, ], X[index, ]) : non-conformable arrays
I'm attaching the code:
ps.TPSV.gbm = ps(Cardioversione ~ Sesso+ age,
data = prova)
> ps.TPSV.gbm = ps(Cardioversione ~ Sesso+ age,
+ data = prova)
Fitting boosted model
Iter TrainDeviance ValidDeviance StepSize Improve
1 0.6590 nan 0.0100 nan
2 0.6581 nan 0.0100 nan
3 0.6572 nan 0.0100 nan
4 0.6564 nan 0.0100 nan
5 0.6556 nan 0.0100 nan
6 0.6548 nan 0.0100 nan
7 0.6540 nan 0.0100 nan
8 0.6533 nan 0.0100 nan
9 0.6526 nan 0.0100 nan
...
9900 0.4164 nan 0.0100 nan
9920 0.4161 nan 0.0100 nan
9940 0.4160 nan 0.0100 nan
9960 0.4158 nan 0.0100 nan
9980 0.4157 nan 0.0100 nan
10000 0.4155 nan 0.0100 nan
Diagnosis of unweighted analysis
Error in Di - crossprod(WX[index, ], X[index, ]) : non-conformable arrays
I honestly don't understand what the problem is: one variable is a factor (Sesso) and the other is numeric (age), and there are no missing values. Could anyone help me?
Thank you in advance.
I've already tried changing the variables used in the propensity score model, but the error persists. The example code with the lalonde dataset included in twang works fine.

Is it possible to extract slope and intercept from multiple fitted lines into a tibble?

I'm trying to compare the slope and intercept of many separate fitted lines and would like to extract this information from the equations that are shown using stat_poly_eq. I am able to plot all of the data and lines, but since my actual data has over 50 equations, I'd like a simple way to extract the slope and intercept of each line.
Below is code to generate a similar plot with mtcars.
I'd like to add an output as a tibble with columns for cyl, gear, slope, intercept.
library(tidyverse)
library(ggpmisc)
ggplot(mtcars, aes(x = wt, y = mpg, color = as.character(cyl))) +
geom_point()+
facet_wrap(gear ~ .) +
stat_poly_line(fullrange = TRUE, se = FALSE) +
stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~~")),
parse=TRUE,label.x.npc = "right")
You can use nlme::lmList to run all of the regressions at once.
library(nlme)
#create a new grouping variable for the nested grouping factor
mtcars2 <- mtcars %>% mutate(gear_cyl = interaction(mtcars$gear, mtcars$cyl))
fm1 <- lmList(mpg ~ wt | gear_cyl, mtcars2)
Extracting the coefficients and r-squared from the summary returns the same statistics that are shown on your ggplot.
summary(fm1)$coef
, , (Intercept)
Estimate Std. Error t value Pr(>|t|)
3.4 21.50000 NaN NaN NaN
4.4 40.85910 4.254923 9.602782 4.813403e-08
5.4 41.01754 NaN NaN NaN
3.6 64.70408 NaN NaN NaN
4.6 30.20964 12.042632 2.508558 2.326993e-02
5.6 19.70000 NaN NaN NaN
3.8 25.05942 4.527584 5.534834 4.526681e-05
5.8 22.14000 NaN NaN NaN
, , wt
Estimate Std. Error t value Pr(>|t|)
3.4 NA NaN NA NA
4.4 -5.859279 1.741259 -3.3649675 0.003940916
5.4 -7.017544 NaN NaN NaN
3.6 -13.469388 NaN NaN NaN
4.6 -3.380894 3.866794 -0.8743402 0.394867930
5.6 NA NaN NA NA
3.8 -2.438894 1.085886 -2.2459951 0.039178704
5.8 -2.000000 NaN NaN NaN
summary(fm1)$r.squared
[1] 0.0000000 0.5358963 1.0000000 1.0000000 0.8095674 0.0000000 0.4561613 1.0000000
To put everything into a tibble:
output <- tibble(gear_cyl = names(fm1),
intercept = summary(fm1)$coef[,1,1],
slope = summary(fm1)$coef[,1,2],
r_sq = summary(fm1)$r.squared)
output %>% separate(gear_cyl, c("gear", "cyl"), sep="\\.")
# A tibble: 8 x 5
gear cyl intercept slope r_sq
<chr> <chr> <dbl> <dbl> <dbl>
1 3 4 21.5 NA 0
2 4 4 40.9 -5.86 0.536
3 5 4 41.0 -7.02 1
4 3 6 64.7 -13.5 1
5 4 6 30.2 -3.38 0.810
6 5 6 19.7 NA 0
7 3 8 25.1 -2.44 0.456
8 5 8 22.1 -2.00 1
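
If adding nlme is not an option, the same estimates can also be obtained with base R alone. This is a sketch of an alternative, not part of the answer above: it splits mtcars by gear and cyl and fits lm to each piece. The names `groups`, `coefs` and `result` are only illustrative.

```r
# One data frame per observed gear/cyl combination (names look like "4.6").
groups <- split(mtcars, list(mtcars$gear, mtcars$cyl), drop = TRUE)

# Fit mpg ~ wt in each group and stack the coefficient vectors row-wise.
# Groups with a single car yield NA for the slope.
coefs <- do.call(rbind, lapply(groups, function(d) coef(lm(mpg ~ wt, data = d))))

result <- data.frame(
  gear_cyl  = rownames(coefs),
  intercept = coefs[, "(Intercept)"],
  slope     = coefs[, "wt"],
  row.names = NULL
)
```

The standard errors reported by lmList come from a pooled residual variance, so only the point estimates (intercept and slope) are expected to match a per-group lm fit.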

Where is the problem in my Forward Difference Table using Scilab?

I have to interpolate a function f(x) given by the following data.
x = 0:0.1:2.8;
y = [0 0.717 0.999 0.675 0.0583 0.7568 0.9961 0.6312];
Here's the code I have at the moment.
clc
clear
x = 0:0.1:2.8;
y = [0 0.717 0.999 0.675 0.0583 0.7568 0.9961 0.6312];
n = length(x);
del = %nan * ones (n ,7) ;
del (:,1) = y';
for j = 2:7
for i = 1: n - j +1
del (i,j) = del(i+1,j-1) - del(i,j-1);
end
end
del = [x'del];
del = round ( del *10^3) /10^3;
mprintf ("%5s,%7s,%8s,%9s,%8s,%8s,%8s",'x','y','dy','d2y','d3y','d4y','d5y')
disp ( del )
It gives me a "Submatrix incorrectly defined" error.
Where could the problem be?
x and y should have the same length, but that is not the case with your data: 0:0.1:2.8 has 29 elements, while y has only 8. For example, you can set
y = [0 0.717 0.999 0.675 0.0583 0.7568 0.9961 0.6312];
x = linspace(0,2.8,length(y));
The line del = [x'del]; also fails because a space is missing. It should be written as
del = [x' del];
Then your script outputs the result:
x, y, dy, d2y, d3y, d4y, d5y
0. 0. 0.717 -0.435 -0.171 0.484 0.81 -5.487
0.4 0.717 0.282 -0.606 0.313 1.295 -4.677 9.689
0.8 0.999 -0.324 -0.293 1.608 -3.382 5.012 Nan
1.2 0.675 -0.617 1.315 -1.774 1.629 Nan Nan
1.6 0.058 0.699 -0.459 -0.145 Nan Nan Nan
2. 0.757 0.239 -0.604 Nan Nan Nan Nan
2.4 0.996 -0.365 Nan Nan Nan Nan Nan
2.8 0.631 Nan Nan Nan Nan Nan Nan
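
To cross-check the table outside Scilab, here is a rough equivalent in R (the language used in the rest of this page), offered as a sketch rather than a translation of the script above. It builds the same forward difference columns with `diff()`, using the corrected x so that both vectors have the same length.

```r
y <- c(0, 0.717, 0.999, 0.675, 0.0583, 0.7568, 0.9961, 0.6312)
x <- seq(0, 2.8, length.out = length(y))   # same length as y
n <- length(y)

# Column 1 holds y, column j holds the (j-1)-th forward differences.
del <- matrix(NA_real_, nrow = n, ncol = n - 1)
del[, 1] <- y
for (j in 2:(n - 1)) {
  # Column j has one entry fewer than column j-1.
  del[1:(n - j + 1), j] <- diff(del[1:(n - j + 2), j - 1])
}

tab <- round(cbind(x, del), 3)
colnames(tab) <- c("x", "y", "dy", "d2y", "d3y", "d4y", "d5y", "d6y")
print(tab)
```

Unfilled cells stay NA, which plays the same role as the Nan entries in the Scilab output.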

How to stop printing for "ps" function in "twang" package?

The "ps" function (propensity score estimation) in "twang" package in R keeps printing its report. How can I turn that off?
I already tried to set the "print.level" argument to be 0. But it is not working for me.
D = rbinom(100, size = 1, prob = 0.5)
X1 = rnorm(100)
X2 = rnorm(100)
ps(D ~ ., data = data.frame(D, X1, X2), stop.method = 'es.mean',
estimand = "ATE", print.level = 0)
I expected no progress output, but it keeps printing something like:
Fitting gbm model
Iter TrainDeviance ValidDeviance StepSize Improve
1 1.3040 nan 0.0100 nan
2 1.3012 nan 0.0100 nan
3 1.2985 nan 0.0100 nan
4 1.2959 nan 0.0100 nan
5 1.2932 nan 0.0100 nan
6 1.2907 nan 0.0100 nan
7 1.2880 nan 0.0100 nan
8 1.2855 nan 0.0100 nan
9 1.2830 nan 0.0100 nan
10 1.2804 nan 0.0100 nan
20 1.2562 nan 0.0100 nan
.....
which is annoying.
Presumably you want to capture the result in a variable; if you combine that with the verbose = FALSE parameter, it should do what you need:
res <- ps(D ~ ., data = data.frame(D, X1, X2), stop.method = 'es.mean',
estimand = "ATE", print.level = 0, verbose = FALSE)
I haven't tested whether you still need print.level = 0.

How to deal with NaN in R?

I have two binary files with the same dimensions (corr and rmse). I want to do this:
replace all pixels in rmse with NA wherever corr is NA.
file1:
conne <- file("D:\\omplete.bin","rb")
corr<- readBin(conne, numeric(), size=4, n=1440*720, signed=TRUE)
file2:
rms <- file("D:\\hgmplete.bin","rb")
rmse<- readBin(rms, numeric(), size=4, n=1440*720, signed=TRUE)
I did this:
rmse[corr==NA]=NA
That did not do anything, so I tried this:
rmse[corr==NaN]=NA
That did not do anything either! Can anybody help me with this?
Head of the file corr:
> corr
[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
You need to use the logical test is.nan(), because comparing anything to NA or NaN with == returns NA rather than TRUE. In this case:
rmse[is.nan(corr)]=NA
should do the trick.
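
A small illustration of why the `==` attempts select nothing while `is.nan()` works. The short vectors below are made-up stand-ins for the real `corr` and `rmse` data.

```r
# Made-up stand-ins for the real rasters.
corr <- c(NaN, 0.5, NA, 0.8)
rmse <- c(1.2, 0.3, 0.7, 0.9)

# Comparing with == never yields TRUE: every element of the result is NA.
cmp <- corr == NaN

# is.nan() flags only NaN; is.na() flags both NA and NaN.
nan_only   <- is.nan(corr)   # TRUE FALSE FALSE FALSE
na_or_nan  <- is.na(corr)    # TRUE FALSE TRUE  FALSE

rmse_fixed <- rmse
rmse_fixed[is.nan(corr)] <- NA
```

If cells that are plain NA in `corr` should be masked as well, `is.na(corr)` can be used as the index instead.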

Resources