R update values based on event based on multiple columns - r

I recently asked a similar question here about having all "Activities" in one column. The solution provided worked very well. Now I would like to change the design so that I can record more detailed information. The table shows information recorded from different fields over several years. All activities on the fields are recorded by date. I would like to add a "Season" column that groups all rows belonging to a harvest season. I define a harvest season as the time between two harvest events (see the table at the bottom for how the result should look). The problem is that seeding is sometimes done in the previous year (e.g. 2012) while the field is harvested in 2013. All of those events would need to be grouped as 2013.
What would I need to change if I start recording more information and give all "Activities" a separate column? I tried:
library(data.table)
DF <- read.table(text="ID|Field|Date |Tillage|Seeding|Fertilizer|Spraying|Harvest
1|A |2012/08/01|Plough |NA|NA|NA|NA
2|A |2012/08/24|NA |Wheat|NA|NA|NA
3|A |2013/03/05|NA |NA|NA|ProduktA|NA
4|A |2013/03/05|NA|NA|TypeB|NA|NA
5|A |2013/07/25|NA |NA|NA|NA|9t
6|B |2012/09/01|Plough |NA|NA|NA|NA
7|B |2012/09/05|NA |Barley|NA|NA|NA
8|B |2013/04/05|NA |NA|NA|ProductB|NA
9|B |2013/07/28|NA |NA|NA|NA|10t
10|B |2010/08/24|Cultivator |NA|NA|NA|NA
11|B |2010/09/29|NA |NA|NA|NA|NA
12|B |2011/05/01|NA|NA|TypeB|NA|NA
13|B |2011/07/12|NA |NA|NA|NA|6t
14|A |2011/09/01|NA |Barley|NA|NA|NA
15|A |2011/10/10|NA |NA|NA|ProductC|NA
16|A |2012/04/10|NA|NA|TypeA|NA|NA
17|A |2012/08/02|NA |NA|NA|NA|7t",
sep="|", header=TRUE, stringsAsFactors=FALSE)
DT <- data.table(DF)
DT[, Harvest:=gsub(" ", "", Harvest, fixed=TRUE)]
DT[, Date:=as.POSIXct(Date)]
setkeyv(DT, c("Field", "Date"))
DT[, Season:=cumsum(c("", !is.na(head(Harvest, -1)))), by=Field]
DT[, Season:=max(year(Date)), by=list(Field, Season)]
However, that seems not to work. The result should look like this with a "season" column at the end that indicates the season:
ID|Field|Date |Tillage|Seeding|Fertilizer|Spraying|Harvest|Season
1|A |2012/08/01|Plough |NA|NA|NA|NA|2013
2|A |2012/08/24|NA |Wheat|NA|NA|NA|2013
3|A |2013/03/05|NA |NA|NA|ProduktA|NA|2013
4|A |2013/03/05|NA|NA|TypeB|NA|NA|2013
5|A |2013/07/25|NA |NA|NA|NA|9t|2013
6|B |2012/09/01|Plough |NA|NA|NA|NA|2013
7|B |2012/09/05|NA |Barley|NA|NA|NA|2013
8|B |2013/04/05|NA |NA|NA|ProductB|NA|2013
9|B |2013/07/28|NA |NA|NA|NA|10t|2013
10|B |2010/08/24|Cultivator |NA|NA|NA|NA|2011
11|B |2010/09/29|NA |NA|NA|NA|NA|2011
12|B |2011/05/01|NA|NA|TypeB|NA|NA|2011
13|B |2011/07/12|NA |NA|NA|NA|6t|2011
14|A |2011/09/01|NA |Barley|NA|NA|NA|2012
15|A |2011/10/10|NA |NA|NA|ProductC|NA|2012
16|A |2012/04/10|NA|NA|TypeA|NA|NA|2012
17|A |2012/08/02|NA |NA|NA|NA|7t|2012

The only differences from the OP's other question are the additional columns and the condition for extracting the harvest dates, which needs to be amended before applying my rolling join answer:
library(data.table)
setDT(DF)[!is.na(Harvest), .(Field, Date, Season = year(Date))][
DF, on = .(Field, Date), roll = -Inf]
Field Date Season ID Tillage Seeding Fertilizer Spraying Harvest
1: A 2012/08/01 2012 1 Plough NA NA NA NA
2: A 2012/08/24 2013 2 NA Wheat NA NA NA
3: A 2013/03/05 2013 3 NA NA NA ProduktA NA
4: A 2013/03/05 2013 4 NA NA TypeB NA NA
5: A 2013/07/25 2013 5 NA NA NA NA 9t
6: B 2012/09/01 2013 6 Plough NA NA NA NA
7: B 2012/09/05 2013 7 NA Barley NA NA NA
8: B 2013/04/05 2013 8 NA NA NA ProductB NA
9: B 2013/07/28 2013 9 NA NA NA NA 10t
10: B 2010/08/24 2011 10 Cultivator NA NA NA NA
11: B 2010/09/29 2011 11 NA NA NA NA NA
12: B 2011/05/01 2011 12 NA NA TypeB NA NA
13: B 2011/07/12 2011 13 NA NA NA NA 6t
14: A 2011/09/01 2012 14 NA Barley NA NA NA
15: A 2011/10/10 2012 15 NA NA NA ProductC NA
16: A 2012/04/10 2012 16 NA NA TypeA NA NA
17: A 2012/08/02 2012 17 NA NA NA NA 7t
Note that the rolling join exposes a flaw in the sample dataset. Row 1 shows Season 2012, although the subsequent harvest (according to the OP's ID order) is in 2013. The reason is that the tillage and harvest dates overlap for field A: the tillage date in row 1 is 2012/08/01, while the harvest date of the same field in row 17 is 2012/08/02, one day after tillage.
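If the records are entered so that each season's rows form one contiguous block that ends with its harvest (as the OP's ID column suggests), the season could be derived from the recording order instead of the dates, which sidesteps the overlap. The following is only a minimal base R sketch under that assumption; the toy data frame and the helper column `block` are made up for illustration and are not part of the rolling join answer.

```r
# Toy data (hypothetical): one field, IDs in recording order, overlapping dates
df <- data.frame(
  ID      = 1:6,
  Field   = "A",
  Date    = as.Date(c("2012-08-01", "2012-08-24", "2013-07-25",
                      "2011-09-01", "2012-04-10", "2012-08-02")),
  Harvest = c(NA, NA, "9t", NA, NA, "7t")
)

# Work in recording order within each field
df <- df[order(df$Field, df$ID), ]

# Count the harvests seen *before* each row, per field, so that all rows
# up to and including a harvest share the same block number
harvested <- as.integer(!is.na(df$Harvest))
df$block <- ave(harvested, df$Field,
                FUN = function(h) cumsum(c(0L, head(h, -1))))

# Season = latest calendar year within each field/block combination
yr <- as.integer(format(df$Date, "%Y"))
df$Season <- ave(yr, paste(df$Field, df$block), FUN = max)

df[, c("ID", "Field", "Date", "Harvest", "Season")]
```

With this grouping, the tillage on 2012-08-01 lands in season 2013 even though the previous harvest is dated one day later. The sketch breaks down if rows of different seasons are interleaved in ID order.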
In case the column order is important, the setcolorder() function can be used to order the columns in place, i.e., without copying:
result <- setDT(DF)[!is.na(Harvest), .(Field, Date, Season = year(Date))][
DF, on = .(Field, Date), roll = -Inf]
setcolorder(result, c(names(DF), "Season"))[]
ID Field Date Tillage Seeding Fertilizer Spraying Harvest Season
1: 1 A 2012/08/01 Plough NA NA NA NA 2012
2: 2 A 2012/08/24 NA Wheat NA NA NA 2013
3: 3 A 2013/03/05 NA NA NA ProduktA NA 2013
4: 4 A 2013/03/05 NA NA TypeB NA NA 2013
5: 5 A 2013/07/25 NA NA NA NA 9t 2013
6: 6 B 2012/09/01 Plough NA NA NA NA 2013
7: 7 B 2012/09/05 NA Barley NA NA NA 2013
8: 8 B 2013/04/05 NA NA NA ProductB NA 2013
9: 9 B 2013/07/28 NA NA NA NA 10t 2013
10: 10 B 2010/08/24 Cultivator NA NA NA NA 2011
11: 11 B 2010/09/29 NA NA NA NA NA 2011
12: 12 B 2011/05/01 NA NA TypeB NA NA 2011
13: 13 B 2011/07/12 NA NA NA NA 6t 2011
14: 14 A 2011/09/01 NA Barley NA NA NA 2012
15: 15 A 2011/10/10 NA NA NA ProductC NA 2012
16: 16 A 2012/04/10 NA NA TypeA NA NA 2012
17: 17 A 2012/08/02 NA NA NA NA 7t 2012
Data
library(data.table)
DF <- fread(
"ID|Field|Date |Tillage|Seeding|Fertilizer|Spraying|Harvest
1|A |2012/08/01|Plough |NA|NA|NA|NA
2|A |2012/08/24|NA |Wheat|NA|NA|NA
3|A |2013/03/05|NA |NA|NA|ProduktA|NA
4|A |2013/03/05|NA|NA|TypeB|NA|NA
5|A |2013/07/25|NA |NA|NA|NA|9t
6|B |2012/09/01|Plough |NA|NA|NA|NA
7|B |2012/09/05|NA |Barley|NA|NA|NA
8|B |2013/04/05|NA |NA|NA|ProductB|NA
9|B |2013/07/28|NA |NA|NA|NA|10t
10|B |2010/08/24|Cultivator |NA|NA|NA|NA
11|B |2010/09/29|NA |NA|NA|NA|NA
12|B |2011/05/01|NA|NA|TypeB|NA|NA
13|B |2011/07/12|NA |NA|NA|NA|6t
14|A |2011/09/01|NA |Barley|NA|NA|NA
15|A |2011/10/10|NA |NA|NA|ProductC|NA
16|A |2012/04/10|NA|NA|TypeA|NA|NA
17|A |2012/08/02|NA |NA|NA|NA|7t")

Related

R Modify dataframes to the same length

I've got a list containing multiple dataframes with two columns (Year and area).
The problem is that some dataframes only contain information from 2002-2015 or 2003-2017 and others from 2001-2018 and so on. So they differ in length.
list:
list(data.frame(Year = c(2001, 2002, 2004, 2005), Area = c(1, 2, 3, 4)),
data.frame(Year = c(2001, 2004, 2018), Area = c(1, 2, 4)),
data.frame(Year = c(2008, 2009, 2014, 2015, 2016), Area = c(1, 2, 3, 4, 5)))
How can I modify them all to the same length (from 2001-2018) by adding NA, or better 0, for area if there is no area information for that year?
Let
A = data.frame(Year= c(2001,2002,2004,2005), Area=c(1,2,3,4))
B = data.frame(Year= c(2001,2004,2018), Area=c(1,2,4))
C = list(A, B)
Then we have
Ref = data.frame(Year = 2001:2018)
New.List = lapply(C, function(x) dplyr::left_join(Ref, x))
with the desired result
[[1]]
Year Area
1 2001 1
2 2002 2
3 2003 NA
4 2004 3
5 2005 4
6 2006 NA
7 2007 NA
8 2008 NA
9 2009 NA
10 2010 NA
11 2011 NA
12 2012 NA
13 2013 NA
14 2014 NA
15 2015 NA
16 2016 NA
17 2017 NA
18 2018 NA
[[2]]
Year Area
1 2001 1
2 2002 NA
3 2003 NA
4 2004 2
5 2005 NA
6 2006 NA
7 2007 NA
8 2008 NA
9 2009 NA
10 2010 NA
11 2011 NA
12 2012 NA
13 2013 NA
14 2014 NA
15 2015 NA
16 2016 NA
17 2017 NA
18 2018 4
To make sure that all data.frames in the list share the same spelling of Year, do
lapply(C, function(x) {colnames(x)[1] = "Year"; x})
provided the first column is always the Year-column.
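The question asks for 0 rather than NA where possible. One way to get that without extra packages is base `merge()` with `all.x = TRUE`, followed by replacing the missing areas. This is a sketch only; the helper name `pad_years` is made up for illustration.

```r
A <- data.frame(Year = c(2001, 2002, 2004, 2005), Area = c(1, 2, 3, 4))
B <- data.frame(Year = c(2001, 2004, 2018), Area = c(1, 2, 4))
C <- list(A, B)

# Reference frame with every year that should appear in the result
Ref <- data.frame(Year = 2001:2018)

# Hypothetical helper: keep all reference years, then turn missing areas into 0
pad_years <- function(x) {
  out <- merge(Ref, x, by = "Year", all.x = TRUE)
  out$Area[is.na(out$Area)] <- 0
  out
}

New.List <- lapply(C, pad_years)
```

Whether 0 is appropriate depends on whether a missing year really means "no area" or just "not recorded"; in the second case keeping NA is the safer choice.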

Fill in blanks from the previous cell multiplied by the current cell in a different column in R

I have the below data:
year<-c(2015:2030)
actual<-c(NA,NA,NA,3170.620936,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)
delta<-c(0.276674282,
0.23515258,
0.133083622,
0.236098022,
0.399974342,
0.385942573,
0.165095681,
0.163945346,
0.155695778,
0.147270755,
0.146505261,
0.133997582,
0.123100693,
0.119131947,
0.115589755,
0.103675414)
df<-cbind.data.frame(year,actual,delta)
df
year actual delta
1 2015 NA 0.2766743
2 2016 NA 0.2351526
3 2017 NA 0.1330836
4 2018 3170.621 0.2360980
5 2019 NA 0.3999743
6 2020 NA 0.3859426
7 2021 NA 0.1650957
8 2022 NA 0.1639453
9 2023 NA 0.1556958
10 2024 NA 0.1472708
11 2025 NA 0.1465053
12 2026 NA 0.1339976
13 2027 NA 0.1231007
14 2028 NA 0.1191319
15 2029 NA 0.1155898
16 2030 NA 0.1036754
What I am trying to do is to fill each NA after the last valid data point with the previous value of "actual" multiplied by the current delta. So, in this case, I want to multiply "actual" in 2018 by "delta" in 2019 and fill in the 2019 value for "actual", and so on down the series. I have tried the below code with no success:
df$actual_filled<-df$actual
df
library(dplyr)
df<-df%>%
mutate( actual_filled=lag(actual_filled,1)*delta)
df
year actual delta actual_filled
1 2015 NA 0.2766743 NA
2 2016 NA 0.2351526 NA
3 2017 NA 0.1330836 NA
4 2018 3170.621 0.2360980 NA
5 2019 NA 0.3999743 1268.167
6 2020 NA 0.3859426 NA
7 2021 NA 0.1650957 NA
8 2022 NA 0.1639453 NA
9 2023 NA 0.1556958 NA
10 2024 NA 0.1472708 NA
11 2025 NA 0.1465053 NA
12 2026 NA 0.1339976 NA
13 2027 NA 0.1231007 NA
14 2028 NA 0.1191319 NA
15 2029 NA 0.1155898 NA
16 2030 NA 0.1036754 NA
As you can see, the filling process ends in 2019. I thought it would populate the new data till the end of the series. The code I wrote is acting as if I am using the "actual" data, rather than "actual_filled". Could someone tell me what I am doing wrong and how I can fix this?
Here's a solution that works via a loop:
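# Start from the observed values, then fill forward row by row
df$actual_filled <- df$actual
for (row in 2:nrow(df)) {
# Fill only missing cells whose predecessor is already known
if (is.na(df$actual_filled[row]) && !is.na(df$actual_filled[row - 1])) {
df$actual_filled[row] <- df$delta[row] * df$actual_filled[row - 1]
}
}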
I'm new to R so it may not be the best solution!
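The `mutate()` attempt stops in 2019 because `lag(actual_filled, 1)` is evaluated once on the whole original column, so only the row directly after a known value can be filled. Since each filled value is the previous one times the current delta, the rows after the last observed value are that value times a running product of the deltas. A vectorized sketch with `cumprod()`, assuming (as in the sample data) that everything after the last non-NA entry should be filled; the names `anchor` and `after` are made up for illustration:

```r
year   <- 2015:2030
actual <- c(NA, NA, NA, 3170.620936, rep(NA, 12))
delta  <- c(0.276674282, 0.23515258, 0.133083622, 0.236098022,
            0.399974342, 0.385942573, 0.165095681, 0.163945346,
            0.155695778, 0.147270755, 0.146505261, 0.133997582,
            0.123100693, 0.119131947, 0.115589755, 0.103675414)
df <- data.frame(year, actual, delta)

# Position of the last observed value and the rows that follow it
anchor <- max(which(!is.na(df$actual)))
after  <- seq_len(nrow(df)) > anchor

# Last observed value times the running product of the later deltas
df$actual_filled <- df$actual
df$actual_filled[after] <- df$actual[anchor] * cumprod(df$delta[after])
```

For this data the result should match the loop; with several observed values separated by gaps, the loop is the simpler tool.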

Duplicate rows while using Merge function in R - but I don't want the sum

So here's my problem: I have about 40 datasets, all csv files that contain only two columns, (a) Date and (b) Price (for each dataset the price column is named after its country). I used the merge function as follows to consolidate all data into a single dataset with one date column and several price columns. This was the function I used:
merged <- Reduce(function(x, y) merge(x, y, by="Date", all=TRUE), list(a,b,c,d,e,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag,ah,ai,aj,ak,al,am,an))
What has happened is I have for instance in date column, 3 values for same date but the corresponding country values are split. e.g.:
# Date India China South Korea
# 01-Jan-2000 5445 NA 4445 NA
# 01-Jan-2000 NA 1234 NA NA
# 01-Jan-2000 NA NA NA 5678
I actually want
# 01-Jan-2000 5445 1234 4445 5678
I don't know how to get this, as the other questions related to this topic ask for summation of values, which I clearly do not need. This is a simple example. Unfortunately I have daily data from Jan 2000 to November 2016 for about 43 countries, all messed up. Any help to solve this would be appreciated.
I would append all dataframes using rbind and reshape the result with spread(), because the outcome of merging depends on the dataframe you start with.
Reproducible example:
library(dplyr)
library(tidyr) # spread() comes from tidyr, not dplyr
a <- data.frame(date = Sys.Date()-1:10, cntry = "China", price=round(rnorm(10,20,5),2))
b <- data.frame(date = Sys.Date()-6:15, cntry = "Netherlands", price=round(rnorm(10,50,10),2))
c <- data.frame(date = Sys.Date()-11:20, cntry = "USA", price=round(rnorm(10,70,25),2))
all <- do.call(rbind, list(a,b,c))
all %>% group_by(date) %>% spread(cntry, price)
results in:
date China Netherlands USA
* <date> <dbl> <dbl> <dbl>
1 2016-11-29 NA NA 78.75
2 2016-11-30 NA NA 66.22
3 2016-12-01 NA NA 86.04
4 2016-12-02 NA NA 17.07
5 2016-12-03 NA NA 75.72
6 2016-12-04 NA 46.90 39.57
7 2016-12-05 NA 51.80 65.11
8 2016-12-06 NA 57.50 96.36
9 2016-12-07 NA 46.42 46.93
10 2016-12-08 NA 45.71 57.63
11 2016-12-09 15.41 60.09 NA
12 2016-12-10 16.66 60.07 NA
13 2016-12-11 23.72 66.21 NA
14 2016-12-12 19.82 45.46 NA
15 2016-12-13 14.22 45.07 NA
16 2016-12-14 27.26 NA NA
17 2016-12-15 20.08 NA NA
18 2016-12-16 15.79 NA NA
19 2016-12-17 17.66 NA NA
20 2016-12-18 26.77 NA NA
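If the merged table already exists and only needs repairing, the split rows could also be collapsed per date by keeping the first non-missing value in each country column. A base R sketch under the assumption that each date has at most one real price per country; the small `merged` data frame and the helper `first_non_na` are made up for illustration:

```r
# Hypothetical merged result with one date split over three rows
merged <- data.frame(
  Date  = c("01-Jan-2000", "01-Jan-2000", "01-Jan-2000", "02-Jan-2000"),
  India = c(5445, NA, NA, 5450),
  China = c(NA, 1234, NA, NA),
  Korea = c(NA, NA, 5678, 5680)
)

# First non-missing value of a vector, or NA if there is none
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[1] else NA
}

# One row per date; every price column is collapsed independently
collapsed <- aggregate(merged[-1], by = list(Date = merged$Date),
                       FUN = first_non_na)
```

If a date can carry two different prices for the same country, this silently keeps only the first, so it is worth checking for such cases beforehand.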

Daily averages of all data frame variables including NA values with aggregate function

I want to calculate daily means of all variables in my data frame, which includes NA values. All my data sets have a value every 30 min, so I'm very interested in using the timestamp with the aggregate function to obtain daily, weekly, monthly... aggregated data.
My data frame is 37795 rows x 54 variables. I've tried two ways to do that. The first option does not give me daily means, because the values I obtain are far too high to be plausible. The second option gives me almost all NA values. I do not know what to do.
I include the head of my data frame and my code below.
head(data)
timestamp day month year.x hour minute doy.x rn_1_1_1 ppfd_1_1_1
1 2013-07-06 00:00:00 6 7 2013 0 0 187.000 -84.37381 0.754
2 2013-07-06 00:30:00 6 7 2013 0 30 187.020 -84.07990 0.808
3 2013-07-06 01:00:00 6 7 2013 1 0 187.041 -82.19991 0.808
4 2013-07-06 01:30:00 6 7 2013 1 30 187.062 -81.12341 0.831
5 2013-07-06 02:00:00 6 7 2013 2 0 187.083 -79.57474 0.708
6 2013-07-06 02:30:00 6 7 2013 2 30 187.104 -77.72460 0.639
ppfdr_1_1_1 p_rain_1_1_1 swc_1_1_1 swc_2_1_1 swc_3_1_1 air_pressure air_pressure.1
1 0.624 0 0.07230304 0.09577876 0.134602791 101212.4165 1012.124165
2 0.587 0 0.07233134 0.09569421 0.134479816 101181.8094 1011.818094
3 0.713 0 0.07242914 0.09566160 0.134203719 101166.0948 1011.660948
4 0.72 0 0.07252077 0.09563419 0.134149141 101144.6151 1011.446151
5 0.564 0 0.07261925 0.09560297 0.134095791 101144.8662 1011.448662
6 0.706 0 0.07271843 0.09557789 0.134037119 101144.5084 1011.445084
u_rot v_rot w_rot wind_speed u. h_scr_qc01_man
1 5.546047919 1.42E-14 4.76E-16 5.546047919 0.426515403 -28.07603618
2 5.122724997 6.94E-15 -8.00E-16 5.122724997 0.408213459 -34.39110979
3 5.248639421 4.56E-15 7.28E-17 5.248639421 0.393959075 -33.29033501
4 4.845257286 2.81E-14 -1.33E-17 4.845257286 0.365475898 -32.62427147
5 4.486426895 1.39E-14 -4.43E-16 4.486426895 0.335905384 -33.80219189
6 4.109603841 7.08E-15 -9.76E-16 4.109603841 0.312610588 -35.77289349
fco2_scr_qc01_man le_scr_qc01_man fco2_scr_qc0 fco2_scr_qc0_man date year.y time
1 -0.306504951 NA NA NA 06-jul-13 2013 0:00
2 -0.206266524 NA -0.206266524 -0.206266524 06-jul-13 2013 0:30
3 -0.268508139 NA -0.268508139 -0.268508139 06-jul-13 2013 1:00
4 -0.203804516 0.426531598 -0.203804516 -0.203804516 06-jul-13 2013 1:30
5 -0.217438742 -0.358248118 -0.217438742 -0.217438742 06-jul-13 2013 2:00
6 -0.193778528 2.571063044 -0.193778528 -0.193778528 06-jul-13 2013 2:30
doy_ent doy.y doy_cum doy_cum_ent mes nrecord bat panel_temp vwc_0.1
1 187 187.0000 187.0000 187 7 24 12.57 22.93 0.06284828
2 187 187.0208 187.0208 187 7 25 12.56 22.85 0.06267169
3 187 187.0417 187.0417 187 7 26 12.55 22.58 0.06261738
4 187 187.0625 187.0625 187 7 27 12.54 22.3 0.06247716
5 187 187.0833 187.0833 187 7 28 12.53 22.01 0.06249525
6 187 187.1042 187.1042 187 7 29 12.52 21.82 0.06236862
vwc_0.5 vwc_1.5 temp_0.1 temp_0.5 temp_1.5 tempsd_0.1 tempsd_0.5 tempsd_1.5
1 0.07569027 0.1007845 30.9 28.96 25.14 0.372 0.961 0.767
2 0.07569027 0.1007743 30.8 28.85 24.99 0.181 1.361 1.087
3 0.07568554 0.1008558 30.53 28.8 25.03 0.98 1.476 0.351
4 0.07559577 0.1008507 30.52 29.09 25.11 0.186 0.229 0.556
5 0.07559577 0.1007743 30.11 29.09 24.87 1.331 0.191 0.954
6 0.07556271 0.1007285 30.15 29.33 25.04 1.447 1.078 0.2
pair pair_avg CO2_0.1 CO2_0.5 CO2_1.5 DCO2_0.1 DCO2_0.5
1 101.2124 101.2118 1161.592832 3275.1134 4888.231603 -24.67422109 34.88538221
2 101.1818 101.2131 1168.144925 3338.24016 4941.418642 6.55209301 63.12675931
3 101.1661 101.2090 1201.049131 3435.235974 5012.525851 32.90420541 96.9958144
4 101.1446 101.2007 1268.613941 3556.723878 5092.96558 67.56481067 121.4879035
5 101.1449 101.1906 1364.315214 3680.188043 5164.795759 95.7012722 123.464165
6 101.1445 101.1805 1472.975286 3808.988677 5236.40855 108.6600723 128.8006346
DCO2_1.5
1 31.30293041
2 53.18703947
3 71.10720845
4 80.43972916
5 71.83017884
6 71.61279156
## Daily avg - OPTION 1
data$timestamp <- as.POSIXct(data$timestamp, format = "%d/%m/%Y %H:%M",tz ="GMT")
dates <- format(data$timestamp,"%Y/%m/%d",tz = "GMT")
datadates <- cbind(data,dates)
dailydata_avg <- aggregate(. ~ dates, datadates, FUN=mean, na.rm=TRUE, na.action = "na.pass")
head(dailydata_avg)
dates timestamp day month year.x hour minute doy.x rn_1_1_1 ppfd_1_1_1
1 2013/07/06 1373111100 6 7 2013 11.5 15 187.489 159.7788 3580.562
2 2013/07/07 1373197500 7 7 2013 11.5 15 188.489 154.0925 3506.688
3 2013/07/08 1373283900 8 7 2013 11.5 15 189.489 152.5259 3460.667
4 2013/07/09 1373370300 9 7 2013 11.5 15 190.489 131.1619 2965.250
5 2013/07/10 1373456700 10 7 2013 11.5 15 191.489 136.7853 3171.958
6 2013/07/11 1373543100 11 7 2013 11.5 15 192.489 145.2757 3282.167
ppfdr_1_1_1 p_rain_1_1_1 swc_1_1_1 swc_2_1_1 swc_3_1_1 air_pressure air_pressure.1
1 2552.396 1.0000 0.07095847 0.09606378 18341.81 25940.167 25940.167
2 2532.542 1.0000 0.06994341 0.09502167 18065.98 24891.000 24891.000
3 2523.562 1.0000 0.06860553 0.09379282 17777.02 23107.271 23107.271
4 2336.000 1.0000 0.06717054 0.09268716 17526.50 19309.500 19309.500
5 2607.229 1.0625 0.06620048 0.09166904 17275.56 8385.646 8385.646
6 2484.521 1.0000 0.06562964 0.09083684 17028.94 3535.438 3535.438
u_rot v_rot w_rot wind_speed u. h_scr_qc01_man fco2_scr_qc01_man
1 32167.83 2215.875 2041.354 32167.83 28531.44 18197.75 15365.65
2 30878.27 1911.312 1939.917 30878.27 26929.62 17605.52 14955.56
3 26052.96 2261.417 2116.458 26052.96 23305.83 19167.98 18399.33
4 17284.04 1987.438 2139.083 17284.04 17704.35 20349.92 18137.65
5 12028.06 2053.812 1960.417 12028.06 15670.00 21997.83 21120.19
6 15607.50 1997.417 1907.646 15607.50 15384.56 18000.94 18810.62
le_scr_qc01_man fco2_scr_qc0 fco2_scr_qc0_man date year.y time doy_ent doy.y
1 17409.67 13032.10 13027.90 137 2013 44.5 187 187.4896
2 15524.38 12077.17 12072.92 163 2013 44.5 188 188.4896
3 16407.71 14775.94 14770.56 189 2013 44.5 189 189.4896
4 16788.04 15024.79 15019.02 215 2013 44.5 190 190.4896
5 17955.58 17737.25 17730.75 241 2013 44.5 191 191.4896
6 14610.02 16605.48 16599.33 267 2013 44.5 192 192.4896
doy_cum doy_cum_ent mes nrecord bat panel_temp vwc_0.1 vwc_0.5 vwc_1.5
1 187.4896 187.5 7 28966.375 111.5208 1836.250 4638.833 4594.396 37.35417
2 188.4896 188.5 7 20801.417 111.7292 1900.812 4656.875 4392.979 26.68750
3 189.4896 189.5 7 4394.500 110.6042 1934.792 4675.604 4238.229 65.20833
4 190.4896 190.5 7 9467.708 104.0000 2090.896 4776.521 4178.729 54.12500
5 191.4896 191.5 7 14796.375 109.7500 2145.875 4907.292 4161.312 108.39583
6 192.4896 192.5 7 20127.958 109.3125 1934.375 4876.021 4123.458 143.10417
temp_0.1 temp_0.5 temp_1.5 tempsd_0.1 tempsd_0.5 tempsd_1.5 pair pair_avg CO2_0.1
1 2018.438 1565.812 797.8750 470.8125 474.3958 508.8333 101.1268 101.1323 10400.27
2 1998.438 1574.000 783.1875 478.3333 460.4583 566.0208 101.0764 101.0789 11292.75
3 1994.833 1568.104 780.2083 463.8125 453.1667 488.5625 100.9967 101.0036 13288.25
4 2042.625 1564.875 780.1667 465.0000 599.2708 437.6042 100.8520 100.8665 16156.60
5 2114.708 1576.729 780.5000 471.5833 406.5417 484.6875 100.4828 100.5169 18656.50
6 2124.604 1591.125 781.8125 516.7500 530.3333 510.7500 100.3025 100.2947 14586.60
CO2_0.5 CO2_1.5 DCO2_0.1 DCO2_0.5 DCO2_1.5
1 26360.38 34371.31 19795.81 20637.94 27123.92
2 26939.60 34558.17 18838.38 20464.56 20452.58
3 27603.06 34608.31 17413.15 19998.02 22754.85
4 28572.69 34678.38 19294.62 21894.92 18379.62
5 28983.29 34644.15 20251.17 20409.58 22077.40
6 28236.12 34736.67 17031.02 18852.04 19684.69
## Daily avg - OPTION 2
data$timestamp <- as.POSIXct(data$timestamp, format = "%d/%m/%Y %H:%M",tz ="GMT")
datatime <- data$timestamp
dailydata_avg <- aggregate( data,
by = list('DATES'= format(datatime,'%Y%m%d' )),
FUN = mean, na.rm=T)
I obtain this console message:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
head(dailydata_avg)
DATES timestamp day month year.x hour minute doy.x rn_1_1_1 ppfd_1_1_1
1 20130706 2013-07-06 13:45:00 6 7 2013 11.5 15 187.489 159.7788 NA
2 20130707 2013-07-07 13:45:00 7 7 2013 11.5 15 188.489 154.0925 NA
3 20130708 2013-07-08 13:45:00 8 7 2013 11.5 15 189.489 152.5259 NA
4 20130709 2013-07-09 13:45:00 9 7 2013 11.5 15 190.489 131.1619 NA
5 20130710 2013-07-10 13:45:00 10 7 2013 11.5 15 191.489 136.7853 NA
6 20130711 2013-07-11 13:45:00 11 7 2013 11.5 15 192.489 145.2757 NA
ppfdr_1_1_1 p_rain_1_1_1 swc_1_1_1 swc_2_1_1 swc_3_1_1 air_pressure air_pressure.1
1 NA NA 0.07095847 0.09606378 NA NA NA
2 NA NA 0.06994341 0.09502167 NA NA NA
3 NA NA 0.06860553 0.09379282 NA NA NA
4 NA NA 0.06717054 0.09268716 NA NA NA
5 NA NA 0.06620048 0.09166904 NA NA NA
6 NA NA 0.06562964 0.09083684 NA NA NA
u_rot v_rot w_rot wind_speed u. h_scr_qc01_man fco2_scr_qc01_man le_scr_qc01_man
1 NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA NA
fco2_scr_qc0 fco2_scr_qc0_man date year.y time doy_ent doy.y doy_cum doy_cum_ent
1 NA NA NA 2013 NA 187 187.4896 187.4896 187.5
2 NA NA NA 2013 NA 188 188.4896 188.4896 188.5
3 NA NA NA 2013 NA 189 189.4896 189.4896 189.5
4 NA NA NA 2013 NA 190 190.4896 190.4896 190.5
5 NA NA NA 2013 NA 191 191.4896 191.4896 191.5
6 NA NA NA 2013 NA 192 192.4896 192.4896 192.5
mes nrecord bat panel_temp vwc_0.1 vwc_0.5 vwc_1.5 temp_0.1 temp_0.5 temp_1.5
1 7 NA NA NA NA NA NA NA NA NA
2 7 NA NA NA NA NA NA NA NA NA
3 7 NA NA NA NA NA NA NA NA NA
4 7 NA NA NA NA NA NA NA NA NA
5 7 NA NA NA NA NA NA NA NA NA
6 7 NA NA NA NA NA NA NA NA NA
tempsd_0.1 tempsd_0.5 tempsd_1.5 pair pair_avg CO2_0.1 CO2_0.5 CO2_1.5 DCO2_0.1
1 NA NA NA 101.1268 101.1323 NA NA NA NA
2 NA NA NA 101.0764 101.0789 NA NA NA NA
3 NA NA NA 100.9967 101.0036 NA NA NA NA
4 NA NA NA 100.8520 100.8665 NA NA NA NA
5 NA NA NA 100.4828 100.5169 NA NA NA NA
6 NA NA NA 100.3025 100.2947 NA NA NA NA
DCO2_0.5 DCO2_1.5
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
How could I do it?
Thanks!!
I didn't use the aggregate function; I used tapply instead.
This is the code I came up with, which handles NAs:
# create a sequence of DateTime with half-hourly data
DateTime <- seq.POSIXt(from = as.POSIXct("2015-05-01 00:00:00", tz = "Etc/GMT+12"),
to = as.POSIXct("2015-05-30 23:59:00", tz = "Etc/GMT+12"), by = 1800)
# create some dummy data of the same length as DateTime vector
aa <- runif(1440, 5.0, 7.5)
bb <- NA
df <- data.frame(DateTime, aa, bb)
# replace a cell with NA in the "aa" column
df[19,2] <- NA # dataframe = df, row = 19, column = 2
# create DateHour column to use later
df$DateHour <- paste(format(df$DateTime, "%Y/%m/%d"), format(df$DateTime, "%H"), sep = " ")
View(df)
# Hourly means
# Calculate hourly mean values
aa.HourlyMean <- tapply(df$aa, df$DateHour, mean, na.rm = TRUE)
# convert the vector to dataframe
aa.HourlyMean <- data.frame(aa.HourlyMean)
# Extract the DateHour column from the "aa" dataframe
aa.HourlyMean$DateHour <- row.names(aa.HourlyMean);
# Delete rownames of "aa" dataframe
row.names(aa.HourlyMean) <- NULL
# Create a tidy DateTime column
aa.HourlyMean$DateTime <- as.POSIXct(aa.HourlyMean$DateHour, format = "%Y/%m/%d %H", tz = "Etc/GMT+12")
# change to a tidy dataframe
aa.HourlyMean <- aa.HourlyMean[,c(3,2,1)]
# You can delete any column (for example, DateHour) by
# aa.HourlyMean$DateHour <- NULL
# You can rename a column in base R by
# names(aa.HourlyMean)[3] <- "NewColumnName"
# View the hourly mean of the "aa" dataframe
View(aa.HourlyMean)
# You can do the same with the "bb" vector
bb.HourlyMean <- tapply(df$bb, df$DateHour, mean, na.rm = TRUE)
bb.HourlyMean <- data.frame(bb.HourlyMean)
# View the hourly mean of the "bb" vector
View(bb.HourlyMean)
# /Hourly means
You then can combine in one dataframe the aa.HourlyMean and bb.HourlyMean vectors.
# Daily means
df$Date <- format(df$DateTime, "%Y/%m/%d")
aa.DailyMean <- tapply(df$aa, df$Date, mean, na.rm = TRUE)
aa.DailyMean <- data.frame(aa.DailyMean)
aa.DailyMean$Date <- row.names(aa.DailyMean); row.names(aa.DailyMean) <- NULL
aa.DailyMean <- aa.DailyMean[,c(2,1)]
View(aa.DailyMean)
# /Daily means
# Weekly means
df$YearWeek <- paste(format(df$DateTime, "%Y"), format(df$DateTime, "%W"), sep = " ")
aa.WeeklyMean <- tapply(df$aa, df$YearWeek, mean, na.rm = TRUE)
aa.WeeklyMean <- data.frame(aa.WeeklyMean)
aa.WeeklyMean$YearWeek <- row.names(aa.WeeklyMean); row.names(aa.WeeklyMean) <- NULL
aa.WeeklyMean <- aa.WeeklyMean[,c(2,1)]
View(aa.WeeklyMean)
# /Weekly means
I created the mean values for hourly, daily and weekly observations but you get the idea how to create the monthly, yearly, ... ones.
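Coming back to `aggregate()`: the warning "argument is not numeric or logical" in the question's Option 2 suggests that some columns were read as text or factors, which might also explain the implausibly large means in Option 1 if factor codes were averaged. That is a guess from the output, not something the question confirms. Under that assumption, a sketch that converts such columns first and then aggregates only the numeric ones by day; the toy `data` frame is made up for illustration:

```r
set.seed(42)

# Hypothetical half-hourly data: two days, one numeric column stored as text
timestamp <- seq(as.POSIXct("2013-07-06 00:00:00", tz = "GMT"),
                 by = "30 min", length.out = 96)
data <- data.frame(
  timestamp = timestamp,
  rn   = rnorm(96, mean = 100, sd = 20),
  ppfd = as.character(round(runif(96), 3)),
  stringsAsFactors = FALSE
)
data$rn[5] <- NA

# Convert text to numbers; for a factor use as.numeric(as.character(x))
data$ppfd <- as.numeric(data$ppfd)

# Aggregate only the numeric columns, grouped by calendar day
num_cols <- vapply(data, is.numeric, logical(1))
day <- format(data$timestamp, "%Y-%m-%d", tz = "GMT")
daily <- aggregate(data[num_cols], by = list(date = day),
                   FUN = mean, na.rm = TRUE)
```

Changing the format string passed to `format()` (for example "%Y-%m" or "%Y-%W") gives monthly or weekly groups with the same call.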

How to create a panel based on a long list of table in r

I have a data set like the following:
wk name score
3 - Davide - 3.070000
6 - Davide - 3.460000
7 - Davide - 3.480000
48 -Cringe- 2.773333
79 -Fabynsane- 2.330000
69 -PiDjO- 2.070000
61 -sjb- 2.310000
I want to use this information to construct a panel like the following:
name1 name2 name3 ...
wk1
wk2
wk3
...
I have tried dcast in reshape:
panel.num = dcast(data, name + wk ~ score)
but it gives me a panel like the following and this is apparently not the one I want:
Authorname wk.list 1 2 3 4 5 6 7 8 9 10 11 12 13
2 - Davide - 3 1 NA NA NA NA NA NA NA NA NA NA NA NA NA
3 - Davide - 6 1 NA NA NA NA NA NA NA NA NA NA NA NA NA
I am wondering what went wrong and how I could fix this issue. Thanks~
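Try doing wk ~ name, i.e.
dat <- data.frame(wk=sample(1:100, 10),
name=sample(c("Davide", "Cringe", "Fabynsane"), 10, replace=TRUE),
score=runif(10, 2, 3))
library(reshape2)
# rows on the left of ~, columns on the right; value.var names the cell values
dcast(dat, wk ~ name, value.var = "score")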
# wk Cringe Davide Fabynsane
# 1 8 NA 2.225543 NA
# 2 12 NA NA 2.958040
# 3 46 NA 2.659209 NA
# 4 47 NA 2.086529 NA
# 5 59 NA NA 2.287232
Other options include
library(tidyr)
spread(dat, name, score)
Or reshape from base R
reshape(dat, idvar='wk', timevar='name', direction='wide')
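The original call went wrong because in a casting formula the left-hand side defines the rows and the right-hand side defines the columns, so `name + wk ~ score` produced one column per score value. The same week-by-name layout can also be sketched in base R with `tapply()`, shown here on rows taken from the question; `mean` is only a placeholder aggregator and is an assumption in case a name has several scores in one week:

```r
dat <- data.frame(
  wk    = c(3, 6, 7, 48, 79),
  name  = c("Davide", "Davide", "Davide", "Cringe", "Fabynsane"),
  score = c(3.07, 3.46, 3.48, 2.773333, 2.33)
)

# Rows = weeks, columns = names, cells = score (NA where no observation)
panel <- tapply(dat$score, list(wk = dat$wk, name = dat$name), mean)
panel
```

The result is a matrix rather than a data frame; `as.data.frame.matrix(panel)` converts it if a data frame is needed.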
