Aggregating hourly data into monthly in R while omitting NAs [duplicate] - r

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 2 years ago.
I have some data gathered from a weather buoy:
station longitude latitude time wd wspd gst wvht dpd apd mwd bar
42001 -89.658 25.888 1975-08-13T22:00:00Z 23 4.1 NaN NaN NaN NaN NaN 1017.4
42001 -89.658 25.888 1975-08-13T23:00:00Z 59 3.1 NaN NaN NaN NaN NaN 1017.3
42001 -89.658 25.888 1975-08-14T00:00:00Z 30 5.2 NaN NaN NaN NaN NaN 1017.4
42001 -89.658 25.888 1975-08-14T01:00:00Z 70 2 NaN NaN NaN NaN NaN 1017.8
42001 -89.658 25.888 1975-08-14T02:00:00Z 87 5.7 NaN NaN NaN NaN NaN 1018.2
42001 -89.658 25.888 1975-08-14T03:00:00Z 105 5.6 NaN NaN NaN NaN NaN 1018.6
42001 -89.658 25.888 1975-08-14T04:00:00Z 116 5.8 NaN NaN NaN NaN NaN 1018.7
42001 -89.658 25.888 1975-08-14T05:00:00Z 116 5 NaN NaN NaN NaN NaN 1018.5
42001 -89.658 25.888 1975-08-14T06:00:00Z 123 4.5 NaN NaN NaN NaN NaN 1018.1
42001 -89.658 25.888 1975-08-14T07:00:00Z 137 4.1 NaN NaN NaN NaN NaN 1017.9
42001 -89.658 25.888 1975-08-14T08:00:00Z 151 3.6 NaN NaN NaN NaN NaN 1017.7
42001 -89.658 25.888 1975-08-14T09:00:00Z 153 3.5 NaN NaN NaN NaN NaN 1017.6
42001 -89.658 25.888 1975-08-14T10:00:00Z 180 3.5 NaN NaN NaN NaN NaN 1017.7
42001 -89.658 25.888 1975-08-14T11:00:00Z 189 2.8 NaN NaN NaN NaN NaN 1018
42001 -89.658 25.888 1975-08-14T12:00:00Z 183 1.7 NaN NaN NaN NaN NaN 1018.3
42001 -89.658 25.888 1975-08-14T13:00:00Z 172 0.7 NaN NaN NaN NaN NaN 1018.8
42001 -89.658 25.888 2001-11-18T11:00:00Z 38 7.3 8.8 1.1 6.67 4.51 69 1021
42001 -89.658 25.888 2001-11-18T12:00:00Z 29 7.9 9.3 1.01 5.88 4.42 57 1021.4
42001 -89.658 25.888 2001-11-18T13:00:00Z 29 7.4 8.3 1.02 7.14 4.42 65 1022.1
42001 -89.658 25.888 2001-11-18T14:00:00Z 23 8 9.5 0.97 5.56 4.48 55 1022.6
42001 -89.658 25.888 2001-11-18T15:00:00Z 16 7.6 8.9 1 6.67 4.5 64 1023.2
42001 -89.658 25.888 2001-11-18T16:00:00Z 26 8.9 10.2 0.94 4.17 4.49 29 1023.1
42001 -89.658 25.888 2001-11-18T17:00:00Z 26 8.5 10.2 0.98 4.55 4.48 36 1022.7
42001 -89.658 25.888 2001-11-18T18:00:00Z 17 7.8 9.1 1.07 4.76 4.56 30 1021.9
42001 -89.658 25.888 2001-11-18T19:00:00Z 24 8.1 9.1 1.07 4.55 4.6 29 1021
42001 -89.658 25.888 2001-11-18T20:00:00Z 18 8.3 11.1 1.21 6.25 4.6 69 1020
42001 -89.658 25.888 2001-11-18T21:00:00Z 30 8 9.4 1.2 6.67 4.72 77 1019.8
42001 -89.658 25.888 2001-11-18T22:00:00Z 39 8.2 9.6 1.32 6.67 4.8 76 1019.8
42001 -89.658 25.888 2001-11-18T23:00:00Z 32 8.5 9.6 1.21 6.67 4.63 71 1019.7
42001 -89.658 25.888 2001-11-19T00:00:00Z 38 8.9 10.3 1.28 6.25 4.6 72 1019.8
42001 -89.658 25.888 2001-11-19T01:00:00Z 48 8.3 9.6 1.26 6.67 4.53 71 1020.2
42001 -89.658 25.888 2001-11-19T02:00:00Z 54 10.1 11.6 1.28 6.67 4.59 65 1021.1
42001 -89.658 25.888 2001-11-19T03:00:00Z 60 3 4.7 1.29 5.88 4.58 72 1021.5
42001 -89.658 25.888 2001-11-19T04:00:00Z 77 0.8 1.7 1.25 6.67 4.92 63 1021.2
42001 -89.658 25.888 2001-11-19T05:00:00Z 153 2.1 3 1.21 6.67 4.91 64 1021
42001 -89.658 25.888 2001-11-19T06:00:00Z 20 2.2 5.5 1.18 6.25 4.92 65 1020.6
42001 -89.658 25.888 2001-11-19T07:00:00Z 158 6.2 9.7 1.31 6.67 5.22 67 1020.3
42001 -89.658 25.888 2001-11-19T08:00:00Z 162 7.4 9 1.26 6.67 5.42 73 1020.1
42001 -89.658 25.888 2001-11-19T09:00:00Z 218 4.8 6.2 1.2 7.69 4.98 65 1019.9
How can I create a data frame that aggregates the data (using the mean) on a monthly basis while leaving out the NaN values? The start of the record has numerous rows of NaN, but in later years those columns do have values.
I've tried:
DF2 <- transform(buoy1, time = substring(time, 1, 7))
aggregate(as.numeric(wd) ~ time, DF2[-1, ], mean, na.rm = TRUE)
Which generates
401 2010-09 109.20556
402 2010-10 107.42473
403 2010-11 130.67222
404 2010-12 135.75000
405 2011-01 156.11306
406 2011-02 123.33931
407 2011-03 137.29744
408 2011-04 119.85139
409 2011-05 148.65276
410 2011-06 104.74722
411 2011-07 88.16393
412 2011-09 106.60229
413 2011-10 93.32527
414 2011-11 149.52712
415 2011-12 123.09005
416 2012-01 145.38731
417 2012-02 115.40288
418 2012-03 127.44415
419 2012-04 133.02503
420 2012-05 122.34683
421 2012-06 146.95265
422 2012-07 133.58199
423 2012-08 149.08356
Is there a more efficient way to aggregate across all the columns at once?
Something like
DF2[,5:20] <- sapply(DF2[,5:20], as.numeric, na.rm=TRUE)
monthAvg <- aggregate(DF2[, 5:20], cut(time, "month"),mean)
But then I get:
Error in cut.default(time, "month") : 'x' must be numeric

Here is a base R solution
d <- within(buoy1[-1:-3], time <- format(as.POSIXct(time), "%Y-%m"))
aggregate(. ~ time, d, mean, na.rm = TRUE, na.action = NULL)
# "." means anything other than the RHS, which is `time` in this case
Output
time wd wspd gst wvht dpd apd mwd bar
1 1975-08 118.37500 3.806250 NaN NaN NaN NaN NaN 1018.000
2 2001-11 58.04348 6.882609 8.452174 1.157391 6.186957 4.690435 61.04348 1021.043
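As a side note, the error in the question ('x' must be numeric) comes from calling cut() on a character column. A minimal sketch of the cut()-based approach, assuming (as above) that buoy1$time holds ISO-8601 timestamp strings and that the measurement columns wd through bar sit in columns 5 to 12:
DF2 <- buoy1
# the measurement columns may be character; convert them to numeric first
DF2[5:12] <- lapply(DF2[5:12], function(x) as.numeric(as.character(x)))
# cut() needs a Date/POSIXct vector, not a character column
DF2$month <- cut(as.POSIXct(DF2$time, tz = "UTC"), breaks = "month")
monthAvg <- aggregate(DF2[5:12], by = list(month = DF2$month), mean, na.rm = TRUE)
head(monthAvg)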

You could create a new column with the year and month information and take the mean of multiple columns using across().
library(dplyr)
df %>%
  group_by(time = format(as.POSIXct(time), '%Y-%m')) %>%
  summarise(across(gst:bar, mean, na.rm = TRUE)) -> result
result
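Note that passing extra arguments such as na.rm through across() is deprecated in newer dplyr versions (1.1.0 and later). An equivalent sketch using the lambda form; the range could also be widened (e.g. wd:bar) if all measurement columns are wanted:
library(dplyr)
result <- df %>%
  group_by(time = format(as.POSIXct(time), '%Y-%m')) %>%
  summarise(across(gst:bar, ~ mean(.x, na.rm = TRUE)))
result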

Related

How to sample data non-randomly

I have a weather dataset and my data is date-dependent.
I want to predict the temperature from 07 May 2008 until 18 May 2008 (which is maybe a total of 10-15 observations); my data size is around 200 rows.
I will be using decision tree/RF and SVM & NN to make my predictions.
I've never handled data like this, so I'm not sure how to sample non-random data.
I want to split the data into 80% train data and 20% test data, but I want to take the data in the original order, not randomly. Is that possible?
install.packages("rattle")
install.packages("RGtk2")
library("rattle")
seed <- 42
set.seed(seed)
fname <- system.file("csv", "weather.csv", package = "rattle")
dataset <- read.csv(fname, encoding = "UTF-8")
dataset <- dataset[1:200,]
dataset <- dataset[order(dataset$Date),]
set.seed(321)
sample_data <- sample(nrow(dataset), nrow(dataset) * 0.8)
test <- dataset[sample_data, ]   # note: this actually takes the randomly sampled 80% of rows
train <- dataset[-sample_data, ] # and this takes the remaining 20%
Output:
> head(dataset)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
1 2007-11-01 Canberra 8.0 24.3 0.0 3.4 6.3 NW 30
2 2007-11-02 Canberra 14.0 26.9 3.6 4.4 9.7 ENE 39
3 2007-11-03 Canberra 13.7 23.4 3.6 5.8 3.3 NW 85
4 2007-11-04 Canberra 13.3 15.5 39.8 7.2 9.1 NW 54
5 2007-11-05 Canberra 7.6 16.1 2.8 5.6 10.6 SSE 50
6 2007-11-06 Canberra 6.2 16.9 0.0 5.8 8.2 SE 44
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
1 SW NW 6 20 68 29 1019.7
2 E W 4 17 80 36 1012.4
3 N NNE 6 6 82 69 1009.5
4 WNW W 30 24 62 56 1005.5
5 SSE ESE 20 28 68 49 1018.3
6 SE E 20 24 70 57 1023.8
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
1 1015.0 7 7 14.4 23.6 No 3.6 Yes
2 1008.4 5 3 17.5 25.7 Yes 3.6 Yes
3 1007.2 8 7 15.4 20.2 Yes 39.8 Yes
4 1007.0 2 7 13.5 14.1 Yes 2.8 Yes
5 1018.5 7 7 11.1 15.4 Yes 0.0 No
6 1021.7 7 5 10.9 14.8 No 0.2 No
> head(test)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
182 2008-04-30 Canberra -1.8 14.8 0.0 1.4 7.0 N 28
77 2008-01-16 Canberra 17.9 33.2 0.0 10.4 8.4 N 59
88 2008-01-27 Canberra 13.2 31.3 0.0 6.6 11.6 WSW 46
58 2007-12-28 Canberra 15.1 28.3 14.4 8.8 13.2 NNW 28
96 2008-02-04 Canberra 18.2 22.6 1.8 8.0 0.0 ENE 33
126 2008-03-05 Canberra 12.0 27.6 0.0 6.0 11.0 E 46
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
182 E N 2 19 80 40 1024.2
77 N NNE 15 20 58 62 1008.5
88 N WNW 4 26 71 28 1013.1
58 NNW NW 6 13 73 44 1016.8
96 SSE ENE 7 13 92 76 1014.4
126 SSE WSW 7 6 69 35 1025.5
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
182 1020.5 1 7 5.3 13.9 No 0.0 No
77 1006.1 6 7 24.5 23.5 No 4.8 Yes
88 1009.5 1 4 19.7 30.7 No 0.0 No
58 1013.4 1 5 18.3 27.4 Yes 0.0 No
96 1011.5 8 8 18.5 22.1 Yes 9.0 Yes
126 1022.2 1 1 15.7 26.2 No 0.0 No
> head(train)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
7 2007-11-07 Canberra 6.1 18.2 0.2 4.2 8.4 SE 43
9 2007-11-09 Canberra 8.8 19.5 0.0 4.0 4.1 S 48
11 2007-11-11 Canberra 9.1 25.2 0.0 4.2 11.9 N 30
16 2007-11-16 Canberra 12.4 32.1 0.0 8.4 11.1 E 46
22 2007-11-22 Canberra 16.4 19.4 0.4 9.2 0.0 E 26
25 2007-11-25 Canberra 15.4 28.4 0.0 4.4 8.1 ENE 33
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
7 SE ESE 19 26 63 47 1024.6
9 E ENE 19 17 70 48 1026.1
11 SE NW 6 9 74 34 1024.4
16 SE WSW 7 9 70 22 1017.9
22 ENE E 6 11 88 72 1010.7
25 SSE NE 9 15 85 31 1022.4
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
7 1022.2 4 6 12.4 17.3 No 0.0 No
9 1022.7 7 7 14.1 18.9 No 16.2 Yes
11 1021.1 1 2 14.6 24.0 No 0.2 No
16 1012.8 0 3 19.1 30.7 No 0.0 No
22 1008.9 8 8 16.5 18.3 No 25.8 Yes
25 1018.6 8 2 16.8 27.3 No 0.0 No
I use mtcars as an example. One option for non-randomly splitting your data into train and test sets is to first compute a sample size based on the number of rows in your data. After that, you can use split() to split the data exactly at 80% of your rows. You can use the following code:
smp_size <- floor(0.80 * nrow(mtcars))
split <- split(mtcars, rep(1:2, each = smp_size))
With the following code you can turn the split into train and test sets:
train <- split$`1`
test <- split$`2`
Let's check the number of rows:
> nrow(train)
[1] 25
> nrow(test)
[1] 7
Now the data is split into train and test sets without losing the original order.
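If all you need is the first 80% of the rows (in their original order) for training and the remainder for testing, plain indexing works as well; a minimal sketch, again using mtcars:
n <- nrow(mtcars)
cut_off <- floor(0.80 * n)
# keep the original row order: first 80% for training, remaining 20% for testing
train <- mtcars[seq_len(cut_off), ]
test <- mtcars[seq(cut_off + 1, n), ]
nrow(train)  # 25
nrow(test)   # 7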

Delete Several Lines in txt file with conditional in R

I have a problem deleting several lines from txt files and then converting them to CSV with R, because I just want to extract the data from the txt files.
My code can't delete properly, because it also deletes the lines which contain the date of the data.
Here is the code I used:
setwd("D:/tugasmaritim/")
FILES <- list.files( pattern = ".txt")
for (i in 1:length(FILES)) {
l <- readLines(FILES[i],skip=4)
l2 <- l[-sapply(grep("</PRE><H3>", l), function(x) seq(x, x + 30))]
l3 <- l2[-sapply(grep("<P>Description", l2), function(x) seq(x, x + 29))]
l4 <- l3[-sapply(grep("<HTML>", l3), function(x) seq(x, x + 3))]
write.csv(l4,row.names=FALSE,file=paste0("D:/tugasmaritim/",sub(".txt","",FILES[i]),".csv"))
}
My data looks like this:
<HTML>
<TITLE>University of Wyoming - Radiosonde Data</TITLE>
<LINK REL="StyleSheet" HREF="/resources/select.css" TYPE="text/css">
<BODY BGCOLOR="white">
<H2>96749 WIII Jakarta Observations at 00Z 02 Oct 1995</H2>
<PRE>
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1011.0 8 23.2 22.5 96 17.30 0 0 295.4 345.3 298.5
1000.0 98 23.6 22.4 93 17.39 105 8 296.8 347.1 299.8
977.3 300 24.6 22.1 86 17.49 105 8 299.7 351.0 302.8
976.0 311 24.6 22.1 86 17.50 104 8 299.8 351.2 303.0
950.0 548 23.0 22.0 94 17.87 88 12 300.5 353.2 303.7
944.4 600 22.6 21.8 95 17.73 85 13 300.6 352.9 303.8
925.0 781 21.2 21.0 99 17.25 90 20 301.0 351.9 304.1
918.0 847 20.6 20.6 100 16.95 90 23 301.0 351.0 304.1
912.4 900 20.4 18.6 89 15.00 90 26 301.4 345.7 304.1
897.0 1047 20.0 13.0 64 10.60 90 26 302.4 334.1 304.3
881.2 1200 19.4 11.4 60 9.70 90 26 303.3 332.5 305.1
850.0 1510 18.2 8.2 52 8.09 95 18 305.2 329.9 306.7
845.0 1560 18.0 7.0 49 7.49 91 17 305.5 328.4 306.9
810.0 1920 15.0 9.0 67 8.97 60 11 306.0 333.4 307.7
792.9 2100 14.3 3.1 47 6.06 45 8 307.1 325.9 308.2
765.1 2400 13.1 -6.8 24 3.01 40 8 309.0 318.7 309.5
746.0 2612 12.2 -13.8 15 1.77 38 10 310.3 316.2 310.6
712.0 3000 10.3 -15.0 15 1.69 35 13 312.3 318.1 312.6
700.0 3141 9.6 -15.4 16 1.66 35 13 313.1 318.7 313.4
653.0 3714 6.6 -16.4 18 1.63 32 12 316.0 321.6 316.3
631.0 3995 4.8 -2.2 60 5.19 31 11 317.0 333.9 318.0
615.3 4200 3.1 -3.9 60 4.70 30 11 317.4 332.8 318.3
601.0 4391 1.6 -5.4 60 4.28 20 8 317.8 331.9 318.6
592.9 4500 0.6 -12.0 38 2.59 15 6 317.9 326.6 318.4
588.0 4567 0.0 -16.0 29 1.88 11 6 317.9 324.4 318.3
571.0 4800 -1.2 -18.9 25 1.51 355 5 319.1 324.4 319.4
549.8 5100 -2.8 -22.8 20 1.12 45 6 320.7 324.8 321.0
513.0 5649 -5.7 -29.7 13 0.64 125 10 323.6 326.0 323.8
500.0 5850 -5.1 -30.1 12 0.63 155 11 326.8 329.1 326.9
494.0 5945 -4.9 -29.9 12 0.65 146 11 328.1 330.6 328.3
471.7 6300 -7.4 -32.0 12 0.56 110 13 329.3 331.5 329.4
453.7 6600 -9.6 -33.8 12 0.49 100 14 330.3 332.2 330.4
400.0 7570 -16.5 -39.5 12 0.31 105 14 333.5 334.7 333.5
398.0 7607 -16.9 -39.9 12 0.30 104 14 333.4 334.6 333.5
371.9 8100 -20.4 -42.6 12 0.24 95 16 335.4 336.3 335.4
300.0 9660 -31.3 -51.3 12 0.11 115 18 341.1 341.6 341.2
269.0 10420 -36.3 -55.3 12 0.08 79 20 344.7 345.0 344.7
265.9 10500 -36.9 75 20 344.9 344.9
250.0 10920 -40.3 80 28 346.0 346.0
243.4 11100 -41.8 85 37 346.4 346.4
222.5 11700 -46.9 75 14 347.6 347.6
214.0 11960 -49.1 68 16 348.1 348.1
200.0 12400 -52.7 55 20 349.1 349.1
156.0 13953 -66.1 55 25 352.1 352.1
152.3 14100 -67.2 55 26 352.6 352.6
150.0 14190 -67.9 55 26 352.9 352.9
144.7 14400 -69.6 60 26 353.6 353.6
137.5 14700 -72.0 60 39 354.6 354.6
130.7 15000 -74.3 50 28 355.6 355.6
124.2 15300 -76.7 40 36 356.5 356.5
118.0 15600 -79.1 50 48 357.4 357.4
116.0 15698 -79.9 45 44 357.6 357.6
112.0 15900 -79.1 45 26 362.6 362.6
106.3 16200 -78.0 35 24 370.2 370.2
100.0 16550 -76.7 35 24 379.3 379.3
</PRE><H3>Station information and sounding indices</H3><PRE>
Station identifier: WIII
Station number: 96749
Observation time: 951002/0000
Station latitude: -6.11
Station longitude: 106.65
Station elevation: 8.0
Showalter index: 6.30
Lifted index: -1.91
LIFT computed using virtual temperature: -2.80
SWEAT index: 145.41
K index: 6.50
Cross totals index: 13.30
Vertical totals index: 23.30
Totals totals index: 36.60
Convective Available Potential Energy: 799.02
CAPE using virtual temperature: 1070.13
Convective Inhibition: -26.70
CINS using virtual temperature: -12.88
Equilibrum Level: 202.64
Equilibrum Level using virtual temperature: 202.60
Level of Free Convection: 828.70
LFCT using virtual temperature: 909.19
Bulk Richardson Number: 210.78
Bulk Richardson Number using CAPV: 282.30
Temp [K] of the Lifted Condensation Level: 294.96
Pres [hPa] of the Lifted Condensation Level: 958.67
Mean mixed layer potential temperature: 298.56
Mean mixed layer mixing ratio: 17.50
1000 hPa to 500 hPa thickness: 5752.00
Precipitable water [mm] for entire sounding: 36.31
</PRE>
<H2>96749 WIII Jakarta Observations at 00Z 03 Oct 1995</H2>
<PRE>
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1012.0 8 23.6 22.9 96 17.72 140 2 295.7 346.9 298.9
1000.0 107 24.0 21.6 86 16.54 135 3 297.1 345.2 300.1
990.0 195 24.4 20.3 78 15.39 128 4 298.4 343.4 301.2
945.4 600 22.9 20.2 85 16.00 95 7 300.9 348.0 303.7
925.0 791 22.2 20.1 88 16.29 100 6 302.0 350.3 304.9
913.5 900 21.9 18.2 80 14.63 105 6 302.8 346.3 305.4
911.0 924 21.8 17.8 78 14.28 108 6 302.9 345.4 305.5
850.0 1522 17.4 16.7 96 14.28 175 6 304.4 347.1 307.0
836.0 1665 16.4 16.4 100 14.24 157 7 304.8 347.5 307.4
811.0 1925 15.0 14.7 98 13.14 123 8 305.9 345.6 308.3
795.0 2095 14.2 7.2 63 8.08 101 9 306.8 331.6 308.3
794.5 2100 14.2 7.2 63 8.05 100 9 306.8 331.5 308.3
745.0 2642 10.4 2.4 58 6.14 64 11 308.4 327.6 309.6
736.0 2744 11.0 0.0 47 5.23 57 11 310.2 326.7 311.1
713.8 3000 9.2 5.0 75 7.70 40 12 310.9 335.0 312.4
711.0 3033 9.0 5.6 79 8.08 40 12 311.0 336.2 312.6
700.0 3163 8.6 1.6 61 6.18 40 12 312.0 331.5 313.1
688.5 3300 8.3 -6.0 36 3.57 60 12 313.1 324.8 313.8
678.0 3427 8.0 -13.0 21 2.08 70 12 314.2 321.2 314.6
642.0 3874 5.0 -2.0 61 5.17 108 11 315.7 332.4 316.7
633.0 3989 4.4 -11.6 30 2.50 117 10 316.3 324.7 316.8
616.6 4200 3.1 -14.1 27 2.09 135 10 317.1 324.3 317.6
580.0 4694 0.0 -20.0 21 1.36 164 13 319.1 323.9 319.4
572.3 4800 -0.4 -20.7 20 1.29 170 14 319.9 324.5 320.1
510.8 5700 -4.0 -26.6 15 0.86 80 10 326.1 329.2 326.2
500.0 5870 -4.7 -27.7 15 0.79 80 10 327.2 330.2 327.4
497.0 5917 -4.9 -27.9 15 0.78 71 13 327.6 330.5 327.7
491.7 6000 -5.5 -28.3 15 0.76 55 19 327.9 330.7 328.0
473.0 6300 -7.6 -29.9 15 0.68 55 16 328.9 331.4 329.0
436.0 6930 -12.1 -33.1 16 0.54 77 17 330.9 333.0 331.0
400.0 7580 -17.9 -37.9 16 0.37 100 19 331.6 333.1 331.7
388.3 7800 -19.9 -39.9 15 0.31 105 20 331.8 333.1 331.9
386.0 7844 -20.3 -40.3 15 0.30 103 20 331.9 333.1 331.9
372.0 8117 -18.3 -38.3 16 0.38 91 23 338.1 339.6 338.1
343.6 8700 -22.1 -41.4 16 0.30 65 29 340.7 342.0 340.8
329.0 9018 -24.1 -43.1 16 0.26 73 27 342.2 343.2 342.2
300.0 9680 -29.9 -44.9 22 0.23 90 22 343.1 344.1 343.2
278.6 10200 -34.3 85 37 344.1 344.1
266.9 10500 -36.8 60 32 344.7 344.7
255.8 10800 -39.4 65 27 345.2 345.2
250.0 10960 -40.7 65 27 345.4 345.4
204.0 12300 -51.8 55 23 348.6 348.6
200.0 12430 -52.9 55 23 348.8 348.8
194.6 12600 -55.0 60 23 348.1 348.1
160.7 13800 -70.1 35 39 342.4 342.4
153.2 14100 -73.9 35 41 340.6 340.6
150.0 14230 -75.5 35 41 339.9 339.9
131.5 15000 -76.3 50 53 351.6 351.6
124.9 15300 -76.6 50 57 356.2 356.2
122.0 15436 -76.7 57 45 358.3 358.3
118.6 15600 -77.3 65 31 360.2 360.2
115.0 15779 -77.9 65 31 362.2 362.2
112.6 15900 -77.7 85 17 364.8 364.8
107.0 16200 -77.2 130 10 371.2 371.2
100.0 16590 -76.5 120 18 379.7 379.7
</PRE><H3>Station information and sounding indices</H3><PRE>
Station identifier: WIII
Station number: 96749
Observation time: 951003/0000
Station latitude: -6.11
Station longitude: 106.65
Station elevation: 8.0
Showalter index: -0.58
Lifted index: 0.17
LIFT computed using virtual temperature: -0.57
SWEAT index: 222.41
K index: 31.80
Cross totals index: 21.40
Vertical totals index: 22.10
Totals totals index: 43.50
Convective Available Potential Energy: 268.43
CAPE using virtual temperature: 431.38
Convective Inhibition: -84.04
CINS using virtual temperature: -81.56
Equilibrum Level: 141.42
Equilibrum Level using virtual temperature: 141.35
Level of Free Convection: 784.91
LFCT using virtual temperature: 804.89
Bulk Richardson Number: 221.19
Bulk Richardson Number using CAPV: 355.46
Temp [K] of the Lifted Condensation Level: 293.21
Pres [hPa] of the Lifted Condensation Level: 940.03
Mean mixed layer potential temperature: 298.46
Mean mixed layer mixing ratio: 16.01
1000 hPa to 500 hPa thickness: 5763.00
Precipitable water [mm] for entire sounding: 44.54
Here is my data file, and this is what I want to get (an example of the desired output).
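One possible approach (a sketch, not taken from the original thread): instead of deleting fixed-size blocks around each HTML marker, keep only the lines that consist entirely of numeric fields. That drops the markup, the column headers, and the sounding-index block while keeping every data row:
files <- list.files("D:/tugasmaritim/", pattern = "\\.txt$", full.names = TRUE)
for (f in files) {
  l <- readLines(f)
  # a data row contains nothing but numeric fields separated by whitespace (at least 4 fields)
  is_data <- grepl("^\\s*(-?[0-9.]+\\s+){3,}-?[0-9.]+\\s*$", l)
  # note: rows with missing values have fewer fields, so their columns will shift;
  # read.fwf() with fixed column widths would be more robust for this report format
  out <- read.table(text = l[is_data], header = FALSE, fill = TRUE)
  write.csv(out, sub("\\.txt$", ".csv", f), row.names = FALSE)
}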

adding new column to data frame in R

rate len ADT trks sigs1 slim shld lane acpt itg lwid hwy
1 4.58 4.99 69 8 0.20040080 55 10 8 4.6 1.20 12 FAI
2 2.86 16.11 73 8 0.06207325 60 10 4 4.4 1.43 12 FAI
3 3.02 9.75 49 10 0.10256410 60 10 4 4.7 1.54 12 FAI
4 2.29 10.65 61 13 0.09389671 65 10 6 3.8 0.94 12 FAI
5 1.61 20.01 28 12 0.04997501 70 10 4 2.2 0.65 12 FAI
6 6.87 5.97 30 6 2.00750419 55 10 4 24.8 0.34 12 PA
7 3.85 8.57 46 8 0.81668611 55 8 4 11.0 0.47 12 PA
8 6.12 5.24 25 9 0.57083969 55 10 4 18.5 0.38 12 PA
9 3.29 15.79 43 12 1.45333122 50 4 4 7.5 0.95 12 PA
I have a question about adding a new column. My data frame is called highway1, and I want to add a column named S/N, computed as slim divided by acpt. What can I do?
Thanks
> mydf$SN <- mydf$slim/mydf$acpt
> mydf
rate len ADT trks sigs1 slim shld lane acpt itg lwid hwy SN
1 4.58 4.99 69 8 0.20040080 55 10 8 4.6 1.20 12 FAI 11.956522
2 2.86 16.11 73 8 0.06207325 60 10 4 4.4 1.43 12 FAI 13.636364
3 3.02 9.75 49 10 0.10256410 60 10 4 4.7 1.54 12 FAI 12.765957
4 2.29 10.65 61 13 0.09389671 65 10 6 3.8 0.94 12 FAI 17.105263
5 1.61 20.01 28 12 0.04997501 70 10 4 2.2 0.65 12 FAI 31.818182
6 6.87 5.97 30 6 2.00750419 55 10 4 24.8 0.34 12 PA 2.217742
7 3.85 8.57 46 8 0.81668611 55 8 4 11.0 0.47 12 PA 5.000000
8 6.12 5.24 25 9 0.57083969 55 10 4 18.5 0.38 12 PA 2.972973
9 3.29 15.79 43 12 1.45333122 50 4 4 7.5 0.95 12 PA 6.666667
I hope an explanation is not necessary for the above.
While $ is the preferred route, you can also consider cbind.
First, create the numeric vector and assign it to SN:
SN <- Data[,6]/Data[,9]
Now you use cbind to append the numeric vector as a column to the existing data frame:
Data <- cbind(Data, SN)
Again, using the dollar operator $ is preferred, but it doesn't hurt seeing what an alternative looks like.
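For completeness, the same column can be added in one step with transform(), using the names from the question (R would mangle an illegal column name like S/N, so SN is used here); a small sketch:
highway1 <- transform(highway1, SN = slim / acpt)
# or, with dplyr:
# library(dplyr)
# highway1 <- mutate(highway1, SN = slim / acpt)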

Write a dataframe formatted to a csv sheet

I have a data frame which looks like this:
> (eventStudyList120_After)
Dates Company Returns Market Returns Abnormal Returns
1 25.08.2009 4.81 0.62595516 4.184045
2 26.08.2009 4.85 0.89132960 3.958670
3 27.08.2009 4.81 -0.93323011 5.743230
4 28.08.2009 4.89 1.00388875 3.886111
5 31.08.2009 4.73 2.50655343 2.223447
6 01.09.2009 4.61 0.28025201 4.329748
7 02.09.2009 4.77 0.04999239 4.720008
8 03.09.2009 4.69 -1.52822071 6.218221
9 04.09.2009 4.89 -1.48860354 6.378604
10 07.09.2009 4.85 -0.38646531 5.236465
11 08.09.2009 4.89 -1.54065680 6.430657
12 09.09.2009 5.01 -0.35443455 5.364435
13 10.09.2009 5.01 -0.54107231 5.551072
14 11.09.2009 4.89 0.15189458 4.738105
15 14.09.2009 4.93 -0.36811321 5.298113
16 15.09.2009 4.93 -1.31185921 6.241859
17 16.09.2009 4.93 -0.53398643 5.463986
18 17.09.2009 4.97 0.44765285 4.522347
19 18.09.2009 5.01 0.81109101 4.198909
20 21.09.2009 5.01 -0.76254262 5.772543
21 22.09.2009 4.93 0.11309704 4.816903
22 23.09.2009 4.93 1.64429117 3.285709
23 24.09.2009 4.93 0.37294212 4.557058
24 25.09.2009 4.93 -2.59894035 7.528940
25 28.09.2009 5.21 0.29588776 4.914112
26 29.09.2009 4.93 0.49762314 4.432377
27 30.09.2009 5.41 2.17220569 3.237794
28 01.10.2009 5.21 1.67482716 3.535173
29 02.10.2009 5.25 -0.79014302 6.040143
30 05.10.2009 4.97 -2.69996146 7.669961
31 06.10.2009 4.97 0.18086490 4.789135
32 07.10.2009 5.21 -1.39072582 6.600726
33 08.10.2009 5.05 0.04210020 5.007900
34 09.10.2009 5.37 -1.14940251 6.519403
35 12.10.2009 5.13 1.16479551 3.965204
36 13.10.2009 5.37 -2.24208216 7.612082
37 14.10.2009 5.13 0.41327193 4.716728
38 15.10.2009 5.21 1.54473332 3.665267
39 16.10.2009 5.13 -1.73781565 6.867816
40 19.10.2009 5.01 0.66416288 4.345837
41 20.10.2009 5.09 -0.27007314 5.360073
42 21.10.2009 5.13 1.26968917 3.860311
43 22.10.2009 5.01 0.29432965 4.715670
44 23.10.2009 5.01 1.73758937 3.272411
45 26.10.2009 5.21 0.38854011 4.821460
46 27.10.2009 5.21 2.72671890 2.483281
47 28.10.2009 5.21 -1.76846884 6.978469
48 29.10.2009 5.41 2.95523593 2.454764
49 30.10.2009 5.37 -0.22681024 5.596810
50 02.11.2009 5.33 1.38835160 3.941648
51 03.11.2009 5.33 -1.83751398 7.167514
52 04.11.2009 5.21 -0.68721323 5.897213
53 05.11.2009 5.21 -0.26954741 5.479547
54 06.11.2009 5.21 -2.24083342 7.450833
55 09.11.2009 5.17 0.39168239 4.778318
56 10.11.2009 5.09 -0.99082271 6.080823
57 11.11.2009 5.17 0.07924735 5.090753
58 12.11.2009 5.81 -0.34424802 6.154248
59 13.11.2009 6.21 -2.00230195 8.212302
60 16.11.2009 7.81 0.48655978 7.323440
61 17.11.2009 7.69 -0.21092848 7.900928
62 18.11.2009 7.61 1.55605852 6.053941
63 19.11.2009 7.21 0.71028798 6.499712
64 20.11.2009 7.01 -2.38596631 9.395966
65 23.11.2009 7.25 0.55334705 6.696653
66 24.11.2009 7.21 -0.54239847 7.752398
67 25.11.2009 7.25 3.36386413 3.886136
68 26.11.2009 7.01 -1.28927630 8.299276
69 27.11.2009 7.09 0.98053264 6.109467
70 30.11.2009 7.09 -2.61935612 9.709356
71 01.12.2009 7.01 -0.11946242 7.129462
72 02.12.2009 7.21 0.17152317 7.038477
73 03.12.2009 7.21 -0.79343095 8.003431
74 04.12.2009 7.05 0.43919792 6.610802
75 07.12.2009 7.01 1.62169804 5.388302
76 08.12.2009 7.01 0.74055990 6.269440
77 09.12.2009 7.05 -0.99504492 8.045045
78 10.12.2009 7.21 -0.79728245 8.007282
79 11.12.2009 7.21 -0.73784636 7.947846
80 14.12.2009 6.97 -0.14656077 7.116561
81 15.12.2009 6.89 -1.42712116 8.317121
82 16.12.2009 6.97 0.95988962 6.010110
83 17.12.2009 6.69 0.22718293 6.462817
84 18.12.2009 6.53 -1.46958638 7.999586
85 21.12.2009 6.33 -0.21365446 6.543654
86 22.12.2009 6.65 -0.17256757 6.822568
87 23.12.2009 7.05 -0.59940253 7.649403
88 24.12.2009 7.05 NA NA
89 25.12.2009 7.05 NA NA
90 28.12.2009 7.05 -0.22307263 7.273073
91 29.12.2009 6.81 0.76736750 6.042632
92 30.12.2009 6.81 0.00000000 6.810000
93 31.12.2009 6.81 -1.50965723 8.319657
94 01.01.2010 6.81 NA NA
95 04.01.2010 6.65 0.06111069 6.588889
96 05.01.2010 6.65 -0.13159651 6.781597
97 06.01.2010 6.65 0.09545081 6.554549
98 07.01.2010 6.49 -0.32727619 6.817276
99 08.01.2010 6.81 -0.07225296 6.882253
100 11.01.2010 6.81 1.61131397 5.198686
101 12.01.2010 6.57 -0.40791980 6.977920
102 13.01.2010 6.85 -0.53016383 7.380164
103 14.01.2010 6.93 1.82016604 5.109834
104 15.01.2010 6.97 -0.62552046 7.595520
105 18.01.2010 6.93 -0.80490241 7.734902
106 19.01.2010 6.77 2.02857647 4.741424
107 20.01.2010 6.93 1.68204556 5.247954
108 21.01.2010 6.89 1.02683875 5.863161
109 22.01.2010 6.90 0.96765669 5.932343
110 25.01.2010 6.73 -0.57603687 7.306037
111 26.01.2010 6.81 0.50990350 6.300096
112 27.01.2010 6.81 1.64994011 5.160060
113 28.01.2010 6.61 -1.13511086 7.745111
114 29.01.2010 6.53 -0.82206204 7.352062
115 01.02.2010 7.03 -1.03993428 8.069934
116 02.02.2010 6.93 0.61692305 6.313077
117 03.02.2010 7.73 2.53012795 5.199872
118 04.02.2010 7.97 1.96223075 6.007769
119 05.02.2010 9.33 -0.76549820 10.095498
120 08.02.2010 8.01 -0.34391479 8.353915
I write it to a CSV sheet like this:
write.table(eventStudyList120_After$`Abnormal Returns`, file = "C://Users//AbnormalReturns.csv", sep = ";")
In fact, I want it to look like the example sheet, with the dates as the header row and the abnormal returns beneath them. So my question is:
How do I write the data frame as it is into a CSV, and how do I transpose the Abnormal Returns column and put the header as in the example sheet?
Two approaches: transpose the data in R or in Excel
In R
Add an index column, select the columns you want, and transpose the data using the function t():
d <- anscombe
d$index <- 1:nrow(anscombe)
td <- t(d[c("index", "x1")])
write.table(td, "filename.csv", col.names = F, sep = ";")
Result:
"index";1;2;3;4;5;6;7;8;9;10;11
"x1";10;8;13;9;11;14;6;4;12;7;5
In Excel
Excel allows you to transpose data as well: http://office.microsoft.com/en-us/excel-help/switch-transpose-columns-and-rows-HP010224502.aspx
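Applied to the data frame from the question, the same idea might look like this (a sketch, assuming the column names shown above, including the space in "Abnormal Returns"):
# put the dates in the header row and the abnormal returns underneath
td <- t(eventStudyList120_After[c("Dates", "Abnormal Returns")])
write.table(td, "AbnormalReturns.csv", col.names = FALSE, sep = ";")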

Aggregating multiple subtotals?

Is there a way to aggregate multiple sub-totals with reshape2? E.g. for the airquality dataset
require(reshape2)
require(plyr)
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
aqm <- subset(aqm, month %in% 5:6 & day %in% 1:7)
I can make a subtotal column for each month that has the average over all variables within that month:
dcast(aqm, day ~ month+variable, mean, margins = "variable")
day 5_ozone 5_solar.r 5_wind 5_temp 5_(all) 6_ozone 6_solar.r
1 1 41 190 7.4 67 76.350 NaN 286
2 2 36 118 8.0 72 58.500 NaN 287
3 3 12 149 12.6 74 61.900 NaN 242
4 4 18 313 11.5 62 101.125 NaN 186
5 5 NaN NaN 14.3 56 35.150 NaN 220
6 6 28 NaN 14.9 66 36.300 NaN 264
7 7 23 299 8.6 65 98.900 29 127
6_wind 6_temp 6_(all)
1 8.6 78 124.20000
2 9.7 74 123.56667
3 16.1 67 108.36667
4 9.2 84 93.06667
5 8.6 85 104.53333
6 14.3 79 119.10000
7 9.7 82 61.92500
I can also make a subtotal column for each variable that has the average over all months within that variable:
dcast(aqm, day ~ variable+month, mean, margins = "month")
day ozone_5 ozone_6 ozone_(all) solar.r_5 solar.r_6 solar.r_(all)
1 1 41 NaN 41 190 286 238.0
2 2 36 NaN 36 118 287 202.5
3 3 12 NaN 12 149 242 195.5
4 4 18 NaN 18 313 186 249.5
5 5 NaN NaN NaN NaN 220 220.0
6 6 28 NaN 28 NaN 264 264.0
7 7 23 29 26 299 127 213.0
wind_5 wind_6 wind_(all) temp_5 temp_6 temp_(all)
1 7.4 8.6 8.00 67 78 72.5
2 8.0 9.7 8.85 72 74 73.0
3 12.6 16.1 14.35 74 67 70.5
4 11.5 9.2 10.35 62 84 73.0
5 14.3 8.6 11.45 56 85 70.5
6 14.9 14.3 14.60 66 79 72.5
7 8.6 9.7 9.15 65 82 73.5
Is there a way to tell reshape2 to calculate both sets of subtotals in one command? This command is close, adding in the grand total, but omits the monthly subtotals:
dcast(aqm, day ~ variable+month, mean, margins = c("variable", "month"))
If I get your question right, you can use
acast(aqm, day ~ variable ~ month, mean, margins = c("variable", "month"))[,,'(all)']
The acast() call gets you the summary for each day over each variable over each month. The total-aggregate "slice" ([,,'(all)']) has a row for each day, with a column for each variable (averaged over all months) and an '(all)' column averaging each day over all variables and all months.
Is this what you needed?
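The slice returned by acast() is a matrix; if a data frame with day as a column is more convenient, it can be wrapped like this (a small sketch):
totals <- acast(aqm, day ~ variable ~ month, mean, margins = c("variable", "month"))[, , "(all)"]
totals_df <- data.frame(day = as.integer(rownames(totals)), totals, check.names = FALSE)
totals_df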
