how do you subset a data frame based on column names? - r

I have this data frame:
dput(df)
structure(list(Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
Date = structure(1:6, .Label = c("7/13/2017 15:01", "7/13/2017 15:02",
"7/13/2017 15:03", "7/13/2017 15:04", "7/13/2017 15:05",
"7/13/2017 15:06"), class = "factor"), Host_CPU = c(1.812950134,
2.288070679, 1.563278198, 1.925239563, 5.350669861, 2.612503052
), UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19,
38.22), jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99
), jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57), jvm3 = c(75.65,
76.88, 56.93, 58.99, 65.29, 67.97), jvm4 = c(39.43, 40.86,
42.27, 43.71, 45.09, 45.33), jvm5 = c(27.42, 29.63, 31.02,
32.37, 33.72, 37.71)), .Names = c("Server", "Date", "Host_CPU",
"UsedMemPercent", "jvm1", "jvm2", "jvm3", "jvm4", "jvm5"), class = "data.frame", row.names = c(NA,
-6L))
I only want to be able to subset this data frame based on the vectors names in this variable:
select<-c("jvm3", "jvm4", "jvm5")
so, my final df should look like this:
structure(list(Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
Date = structure(1:6, .Label = c("7/13/2017 15:01", "7/13/2017 15:02",
"7/13/2017 15:03", "7/13/2017 15:04", "7/13/2017 15:05",
"7/13/2017 15:06"), class = "factor"), Host_CPU = c(1.812950134,
2.288070679, 1.563278198, 1.925239563, 5.350669861, 2.612503052
), UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19,
38.22), jvm3 = c(75.65, 76.88, 56.93, 58.99, 65.29, 67.97
), jvm4 = c(39.43, 40.86, 42.27, 43.71, 45.09, 45.33), jvm5 = c(27.42,
29.63, 31.02, 32.37, 33.72, 37.71)), .Names = c("Server",
"Date", "Host_CPU", "UsedMemPercent", "jvm3", "jvm4", "jvm5"), class = "data.frame", row.names = c(NA,
-6L))
any ideas?

Please revisit indices. If you use the index mechanism [ in R, you can use mainly three types of indices:
logical vectors: same length as the number of columns, TRUE means select the column
numeric vectors: selects columns based on position
character vectors: select columns based on name
If you use the index mechanism for data frames, you can treat these objects in two ways:
as a list, because they are internally lists
as a matrix, because they mimick matrix behaviour in many cases
Take the iris data frame as example to compare the multiple ways you can select columns from a data frame. If you treat it as a list, you have the following two options:
Use [[ if you want a single column in the form of a vector:
iris[["Species"]]
# [1] setosa setosa setosa ... : is a vector
Use [ if you want one or more columns, but you need a data frame back :
iris["Species"]
iris[c("Sepal.Width", "Species")]
If you treat it as a matrix, you just do the exact same as you would do with a matrix. If you don't specify any row indices, these commands are actually equivalent to the ones used above:
iris[ , "Species"] # is the same as iris[["Species"]]
iris[ , "Species", drop = FALSE] # is the same as iris["Species"]
iris[ , c("Sepal.Width", "Species")] # is the same as iris[c("Sepal.Width", "Species")]
So in your case, you simply need:
select <- c("Server","Date","Host_CPU","UsedMemPercent",
"jvm3","jvm4","jvm5")
df[select]
Note on subset: subset works, but should ONLY be used interactively. There's a warning on the help page stating :
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.

Saving your dataframe to a variable df:
df <-
structure(
list(
Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
Date = structure(
1:6,
.Label = c(
"7/13/2017 15:01",
"7/13/2017 15:02",
"7/13/2017 15:03",
"7/13/2017 15:04",
"7/13/2017 15:05",
"7/13/2017 15:06"
),
class = "factor"
),
Host_CPU = c(
1.812950134,
2.288070679,
1.563278198,
1.925239563,
5.350669861,
2.612503052
),
UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19,
38.22),
jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99),
jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57),
jvm3 = c(75.65,
76.88, 56.93, 58.99, 65.29, 67.97),
jvm4 = c(39.43, 40.86,
42.27, 43.71, 45.09, 45.33),
jvm5 = c(27.42, 29.63, 31.02,
32.37, 33.72, 37.71)
),
.Names = c(
"Server",
"Date",
"Host_CPU",
"UsedMemPercent",
"jvm1",
"jvm2",
"jvm3",
"jvm4",
"jvm5"
),
class = "data.frame",
row.names = c(NA,-6L)
)
df[,select] should be what youre looking for

Here's one way:
df[,c(1:4,7:9)]
You can also use dplyr to select columns:
select(df, Server,Date,Host_CPU,UsedMemPercent,jvm3,jvm4,jvm5)

Related

cannot coerce class ‘"formula"’ to a data.frame

I am trying to use Hotelling test
When I call hotelling.test(.~Number, bottle.df)
everything is OK.
However, when I try to do Hotelling test to only one element,
```
bottle_elem1<-data.frame(bottle.df$Number,bottle.df$Mn)
hotelling.test(bottle.df.Number, bottle_elem1)
```
it gives an error
> Error in as.data.frame.default(x[[i]], optional = TRUE): cannot coerce class ‘"formula"’ to a data.frame
>Traceback:
>1. data.frame(. ~ Number, bottle.df$Mn)
>2. as.data.frame(x[[i]], optional = TRUE)
>3. as.data.frame.default(x[[i]], optional = TRUE)
>4. stop(gettextf("cannot coerce class %s to a data.frame", sQuote(deparse(class(x))[1L])),
. domain = NA)
I understand that should do it differently, but don't know how. If I use like previously .~Number, there is an error too
What is correct code to do Hotelling test for a column? Maybe I should extract column differently, but don't know how.
bottle is from Hotelling package
structure(list(Number = c(1L, 1L, 1L, 1L, 1L, 1L), Mn = c(56.1,
53.8, 58.7, 54.6, 58.6, 56.8), Ba = c(170.7, 166.2, 184.2, 170.5,
185.2, 180.5), Sr = c(145.1, 143.3, 156.5, 158.1, 161.3, 146.7
), Zr = c(77.4, 71.6, 78.2, 75.3, 83.9, 79.2), Ti = c(267.4,
270, 286.4, 273.6, 289.9, 274)), row.names = c(NA, 6L), class = "data.frame")

Count how often two factors have the same output value

I want to calculate the number of times two individuals share the same group number. I'm working with quite a large dataset (169 individuals and over a 1000 observations (rows) of them) and I'm looking for an efficient way to count the occurrence of them being in the same group. My (simplified) data looks like this:
ID
Group number
Date
Time
Aa
1
15-06-22
15:05:22
Bd
1
15-06-22
15:05:27
Cr
2
15-06-22
15:07:12
Bd
1
15-06-22
17:33:15
Aa
2
15-06-22
17:36:54
Cr
2
15-06-22
17:37:01
...
I would like my output data to look like this:
Aa-Bd
Aa-Cr
Bd-Cr
...
1
1
0
Or:
Occurrence
Dyad
1
Aa-Bd; Aa-Cr
0
Bd-Cr
Or even a matrix might work. I've been trying to replicate the solution posed for this problem: Count occurrences of a variable having two given values corresponding to one value of another variable
but for some reason my matrix remains empty, even though I know that certain individuals have been in groups with others.
Any help and suggestions would be extremely appreciated! I feel like the solution shouldn't be too complicated but for some reason I can't seem to figure it out.
Thanks in advance!
Edit: some example data from dput():
dput(c[1:5,])
structure(list(Date = structure(c(19129, 19129, 19129, 19129,
19129), class = "Date"), Time = c("11:05:58", "11:06:06", "11:06:16",
"11:06:33", "11:06:59"), Data = structure(c(1L, 1L, 1L, 1L, 1L
), .Label = "Crossing", class = "factor"), Group = structure(c(5L,
5L, 5L, 5L, 5L), .Label = c("Ankhase", "Baie Dankie", "Kubu",
"Lemon Tree", "Noha"), class = "factor"), IDIndividual1 = structure(c(158L,
158L, 34L, 153L, 14L), .Label = c("Aaa", "Aal", "Aan", "Aapi",
"Aar", "Aara", "Aare", "Aat", "Amst", "App", "Asis", "Awa", "Beir",
"Bela", "Bet", "Buk", "Daa", "Dais", "Dazz", "Deli", "Dewe",
"Dian", "Digb", "Dix", "Dok", "Dore", "Eina", "Eis", "Enge",
"Fle", "Flu", "Fur", "Gale", "Gaya", "Gese", "Gha", "Ghid", "Gib",
"Gil", "Ginq", "Gobe", "Godu", "Goe", "Gom", "Gran", "Gree",
"Gri", "Gris", "Griv", "Guat", "Gub", "Guba", "Gubh", "Guz",
"Haai", "Hee", "Heer", "Heli", "Hond", "Kom", "Lail", "Lewe",
"Lif", "Lill", "Lizz", "Mara", "Mas", "Miel", "Misk", "Moes",
"Mom", "Mui", "Naal", "Nak", "Ncok", "Nda", "Ndaw", "Ndl", "Ndon",
"Ndum", "Nge", "Nko", "Nkos", "Non", "Nooi", "Numb", "Nurk",
"Nuu", "Obse", "Oerw", "Oke", "Ome", "Oort", "Ouli", "Oup", "Palm",
"Pann", "Papp", "Pie", "Piep", "Pix", "Pom", "Popp", "Prai",
"Prat", "Pret", "Prim", "Puol", "Raba", "Rafa", "Ram", "Rat",
"Rede", "Ree", "Reen", "Regi", "Ren", "Reno", "Rid", "Rim", "Rioj",
"Riss", "Riva", "Rivi", "Roc", "Sari", "Sey", "Sho", "Sig", "Sirk",
"Sitr", "Skem", "Sla", "Spe", "Summary", "Syl", "Tam", "Ted",
"Tev", "Udup", "Uls", "Umb", "Unk", "UnkAM", "UnkBB", "UnkJ",
"UnkJF", "UnkJM", "Upps", "Utic", "Utr", "Vla", "Vul", "Xala",
"Xar", "Xeni", "Xia", "Xian", "Xih", "Xin", "Xinp", "Xop", "Yam",
"Yamu", "Yara", "Yaz", "Yelo", "Yodo", "Yuko"), class = "factor"),
Behaviour = structure(c(2L, 3L, 1L, 1L, 1L), .Label = c("Crossing",
"First Approacher", "First Crosser", "Last Crosser", "Summary"
), class = "factor"), CrossingType = c("Road - Ground Level",
"Road - Ground Level", "Road - Ground Level", "Road - Ground Level",
"Road - Ground Level"), GPSS = c(-27.9999, -27.9999, -27.9999,
-27.9999, -27.9999), GPSE = c(31.20376, 31.20376, 31.20376,
31.20376, 31.20376), Context = structure(c(1L, 1L, 1L, 1L,
1L), .Label = c("Crossing", "Feeding", "Moving", "Unknown"
), class = "factor"), Observers = structure(c(12L, 12L, 12L,
12L, 12L), .Label = c("Christelle", "Christelle; Giulia",
"Christelle; Maria", "Elif; Giulia", "Josefien; Zach; Flavia; Maria",
"Mathieu", "Mathieu; Giulia", "Mike; Mila", "Mila", "Mila; Christelle; Giulia",
"Mila; Elif", "Mila; Giulia", "Nokubonga; Mila", "Nokubonga; Tam; Flavia",
"Nokubonga; Tam; Flavia; Maria", "Nokubonga; Zach; Flavia; Maria",
"Tam; Flavia", "Tam; Zach; Flavia; Maria", "Zach", "Zach; Elif; Giulia",
"Zach; Flavia; Maria", "Zach; Giulia"), class = "factor"),
DeviceId = structure(c(10L, 10L, 10L, 10L, 10L), .Label = c("{129F4050-2294-0D43-890F-3B2DEF58FC1A}",
"{1A678F44-DB8C-1245-8DD7-9C2F92F086CA}", "{1B249FD2-AA95-5745-9A32-56CDD0587018}",
"{2C7026A6-6EDC-BA4F-84EC-3DDADFFD4FD7}", "{2E489E9F-00BE-E342-8CAE-941618B2F0E6}",
"{359CEB57-351F-F54F-B2BD-77A05FB6C349}", "{3727647C-B73A-184B-B187-D6BF75646B84}",
"{7A4E6639-7387-7648-88EC-7FD27A0F258A}", "{854B02F2-5979-174A-AAE8-398C21664824}",
"{89B5C791-1F71-0149-A2F7-F05E0197F501}", "{D92DF19A-9021-A740-AD99-DCCE1D88E064}"
), class = "factor"), Obs.nr = c(1, 1, 1, 1, 1), Gp.nr = c(1,
3, 3, 4, 5)), row.names = c(NA, -5L), groups = structure(list(
Obs.nr = 1, .rows = structure(list(1:5), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
In here Gp.nr is my group number, IDIndividual1 is my ID.
This is not efficient at all, but as a starting point you can use (GN denotes the group number)
my_ID <- unique(df$ID)
matrix <- matrix(nrow = length(my_ID),ncol = length(my_ID))
for (i in 1:length(my_ID)){
for (j in 1:length(my_ID)){
matrix[i,j] <- length(intersect(df$GN[df$ID == my_ID[i]],df$GN[df$ID == my_ID[j]]))}}
Check this out:
## Creating the Dataframe
df = data.frame(ID = c("Aa","Bd","Cc","Dd","Cr"),
GroupNumber=c(1,2,1,3,3))
## Loading the libraries
library(dplyr)
library(tidyverse)
library(stringr)
## Grouping to find out which observations share same group
df1 = df %>%
group_by(GroupNumber) %>%
summarise(ID_=paste(ID, collapse="-"),
CountbyID = n_distinct(ID_)) %>%
filter(str_detect(ID_, "-"))
## Creating all possible pair combinations and then joining and concatenating all rows
df2 = data.frame(t(combn(df$ID,2))) %>%
mutate(Comb = paste(X1,"-",X2, sep = "")) %>%
left_join(df1, by=c("Comb"="ID_")) %>%
select(Comb, CountbyID) %>%
replace(is.na(.), 0) %>%
group_by(CountbyID) %>%
summarise(ID=paste(Comb, collapse=";"))
Hope this helps!
UPDATE
The way the dataframe is setup, its causing issues to the "IDIndividual1" column. Based on the way it is setup, it has more factor levels than the unique data points. Therefore, I simply converted it to a character. Try the code below:
df = df[,c("IDIndividual1","Gp.nr")]
colnames(df) = c("ID","GroupNumber")
df$ID = as.character(df$ID) ## Converting factors to characters
## Loading the libraries
library(dplyr)
library(tidyverse)
library(stringr)
## Grouping to find out which observations share same group
df1 = df %>%
group_by(GroupNumber) %>%
summarise(ID_=paste(ID, collapse="-"),
CountbyID = n_distinct(ID_)) %>%
filter(str_detect(ID_, "-"))
## Creating all possible pair combinations and then joining and concatenating all rows
df2 = data.frame(t(combn(df$ID,2))) %>%
distinct() %>%
filter(X1 != X2) %>%
mutate(Comb = paste(X1,"-",X2, sep = "")) %>%
left_join(df1, by=c("Comb"="ID_")) %>%
select(Comb, CountbyID) %>%
replace(is.na(.), 0) %>%
group_by(CountbyID) %>%
summarise(ID=paste(Comb, collapse=";"))

Speed up/replace the loop for millions data:judge multi date range

Good evening guys,I have 6 millions data and they have four types.
z=structure(list(date = structure(c(11866, 16190, 14729, 11718), class = "Date"),
beg1 = structure(c(12264, 12264, 13970, 12264), class = "Date"),
end1 = structure(c(17621, 14760, 14760, 13298), class = "Date"),
ID1 = c(1003587, 1000396, 1010743, 1002113), beg2 = structure(c(NA,
14790, 14790, 13299), class = "Date"), end2 = structure(c(NA,
17621, 15217, 13969), class = "Date"), ID2 = c(NA, 1024488,
1027877, 1002824), beg3 = structure(c(NA, NA, 15218, 13970
), class = "Date"), end3 = structure(c(NA, NA, 17621, 14760
), class = "Date"), ID3 = c(NA, NA, 1031361, 1002113), beg4 = structure(c(NA,
NA, NA, 14790), class = "Date"), end4 = structure(c(NA, NA,
NA, 17621), class = "Date"), ID4 = c(NA, NA, NA, 1021290),
realID = c(NA, NA, NA, NA)), row.names = c(267365L, 193587L,
5294385L, 2039421L), class = "data.frame")
and I tried to judge and assign a suitalbe ID based on their date in which date ranges(use the loop).
for(i in 1:nrow(z)){tryCatch({print(i)
if(between(z$date[i],z$beg1[i],z$end1[i])==T){z$realID[i]=z$ID1[i]}
if(between(z$date[i],z$beg2[i],z$end2[i])==T){z$realID[i]=z$ID2[i]}
if(between(z$date[i],z$beg3[i],z$end3[i])==T){z$realID[i]=z$ID3[i]}
if(between(z$date[i],z$beg4[i],z$end4[i])==T){z$realID[i]=z$ID4[i]}},error=function(e){})}
The code works.
But,now the problem is I have too many datas,the loop is inefficiency,may be it will take almost one day to loop.
Does anyone know how can I improve or replace the code?
Thanks you so much.
Since R is a vectorized language, to speed up this code it is best to operate on the entire vector as oppose to looping through each element.
As simple solution is to use a series of ifelse statements.
z$realID <- ifelse(!is.na(z$beg1) & z$date> z$beg1 & z$date< z$end1, z$ID1, z$realID)
z$realID <- ifelse(!is.na(z$beg2) & z$date> z$beg2 & z$date< z$end2, z$ID2, z$realID)
z$realID <- ifelse(!is.na(z$beg3) & z$date> z$beg3 & z$date< z$end3, z$ID3, z$realID)
z$realID <- ifelse(!is.na(z$beg4) & z$date> z$beg4 & z$date< z$end4, z$ID4, z$realID)
When the if statement evaluates TRUE, the realID will update if not it will retain its prior value.

Trouble trying to clean a character vector in R data frame (UTF-8 encoding issue)

I'm having some issues cleaning up a dataset after I manually extracted the data online - I'm guessing these are encoding issues. I have an issue trying to remove the "U+00A0" in the "Athlete" column cels along with the operator brackets. I looked up the corresponding UTF-8 code and it's for "No-Break-Space". I'm also not sure how to replace the other UTF-8 characters to make the names legible - for e.g. getting U+008A to display as Š.
Subset of data
head2007decathlon <- structure(list(Rank = 1:6, Athlete = c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)", "<U+00A0>Maurice Smith<U+00A0>(JAM)", "<U+00A0>Dmitriy Karpov<U+00A0>(KAZ)", "<U+00A0>Aleksey Drozdov<U+00A0>(RUS)", "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)", "<U+00A0>Aleksey Sysoyev<U+00A0>(RUS)"), Total = c(8676L, 8644L, 8586L, 8475L, 8371L, 8357L), `100m` = c(11.04, 10.62, 10.7, 10.97, 11.12, 10.8), LJ = c(7.56, 7.5, 7.19, 7.25, 7.42, 7.01), SP = c(15.92, 17.32, 16.08, 16.49, 14.12, 16.16), HJ = c(2.12, 1.97, 2.06, 2.12, 2.06, 2.03), `400m` = c(48.8, 47.48, 47.44, 50, 49.4, 48.42), `110mh` = c(14.33, 13.91, 14.03, 14.76, 14.51, 14.59), DT = c(48.75, 52.36, 48.95, 48.62, 44.48, 49.76), PV = c(4.8, 4.8, 5, 5, 5.3, 4.9), JT = c(71.18, 53.61, 59.84, 65.51, 63.28, 57.75), `1500m` = c(275.32, 273.52, 279.68, 276.93, 272.5, 276.16), Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2007", class = "factor"), Nationality = c(NA, NA, NA, NA, NA, NA)), .Names = c("Rank", "Athlete", "Total", "100m", "LJ", "SP", "HJ", "400m", "110mh", "DT", "PV", "JT", "1500m", "Year", "Nationality"), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
This is what I've tried so far to no success:
1) head2007decathlon$Athlete <- gsub(pattern="\U00A0",replacement="",x=head2007decathlon$Athlete)
2) head2007decathlon$Athlete <- gsub(pattern="<U00A0>",replacement="",x=head2007decathlon$Athlete)
3) head2007decathlon$Athlete <- iconv(head2007decathlon$Athlete, from="UTF-8", to="LATIN1")
4) Encoding(head2007decathlon$Athlete) <- "UTF-8"
5) head2007decathlon$Athlete<- enc2utf8(head2007decathlon$Athlete)
The following would remove the no break space.
head2007decathlon$Athlete <- gsub(pattern="<U\\+00A0>",replacement="",x=head2007decathlon$Athlete)
Not sure how to convert the other characters. One problem could be that the codes are not exactly in a format that R sees as UTF-8.
One example:
iconv('\u008A', from="UTF-8", to="LATIN1")
this seems to have an effect, contrary to trying to convert U+008A. Although
the output is:
[1] "\x8a"
not the character you want. Hope this helps somehow.

convert List to zoo and use rollapply on the List

I would like to convert a list object to zoo and then apply rollapply on the zoo object. Short example reproduced below (I have 90,000 such files to process, using UNIX:)). Assume my list has two dataframes.
1) I would like to convert the date in each of the dataframes to this format:
dates <- as.Date(paste0(mylist$year, "-", mylist$month, "-", mylist$day), format="%Y-%m-%d")
z <- zoo(mylist, order.by=mylist[,1])
I understand lapply can do this but I tried without success.
Once I get my zoo object, I would like to use rollapply:
library(hydroTSM)#for daily2annual function but aggregate can do
x.3max <- rollapply(data=zooobject, width=3, FUN=sum, fill=NA, partial= TRUE,
align="center")
# Maximum value per year of 3-day total rainfall for each one of the simulations
z.3max.annual <- daily2annual(z.3max, FUN=max,na.rm=TRUE)#dates=1
What the code above does is it centers a 3-day window on each column of the dataframes in zooobject and sums the values. The, the max per year of the 3-day total is extracted.
mylist<- list(a,a)
mylist<-lapply(mylist, function(x) x[x[["Month"]] %in% c(12,1,2),])# extract data for DJF for individual sites
library(zoo)
a= structure(list(Year = c(1975L, 1975L, 1975L, 1975L, 1975L, 1975L
), Month = c(1L, 1L, 1L, 1L, 1L, 1L), Site = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "G100", class = "factor"), Day = 1:6,
sim01 = c(28.49, 29.04, 27.62, 28.43, 28.69, 29.16), sim02 = c(29.49,
30.04, 28.62, 29.43, 29.69, 30.16), sim03 = c(30.49, 31.04,
29.62, 30.43, 30.69, 31.16), sim04 = c(31.49, 32.04, 30.62,
31.43, 31.69, 32.16), sim05 = c(32.49, 33.04, 31.62, 32.43,
32.69, 33.16), sim06 = c(33.49, 34.04, 32.62, 33.43, 33.69,
34.16), sim07 = c(34.49, 35.04, 33.62, 34.43, 34.69, 35.16
), sim08 = c(35.49, 36.04, 34.62, 35.43, 35.69, 36.16), sim09 = c(36.49,
37.04, 35.62, 36.43, 36.69, 37.16), sim10 = c(37.49, 38.04,
36.62, 37.43, 37.69, 38.16), sim11 = c(38.49, 39.04, 37.62,
38.43, 38.69, 39.16), sim12 = c(39.49, 40.04, 38.62, 39.43,
39.69, 40.16), sim13 = c(40.49, 41.04, 39.62, 40.43, 40.69,
41.16), sim14 = c(41.49, 42.04, 40.62, 41.43, 41.69, 42.16
), sim15 = c(42.49, 43.04, 41.62, 42.43, 42.69, 43.16), sim16 = c(43.49,
44.04, 42.62, 43.43, 43.69, 44.16), sim17 = c(44.49, 45.04,
43.62, 44.43, 44.69, 45.16), sim18 = c(45.49, 46.04, 44.62,
45.43, 45.69, 46.16), sim19 = c(46.49, 47.04, 45.62, 46.43,
46.69, 47.16), sim20 = c(47.49, 48.04, 46.62, 47.43, 47.69,
48.16)), .Names = c("Year", "Month", "Site", "Day", "sim01",
"sim02", "sim03", "sim04", "sim05", "sim06", "sim07", "sim08",
"sim09", "sim10", "sim11", "sim12", "sim13", "sim14", "sim15",
"sim16", "sim17", "sim18", "sim19", "sim20"), row.names = c(NA,
6L), class = "data.frame")
Output should be similar to:
Year Site Sim01...
1975 G100 ...
1976 G100 ...
1977 G100 ...
Only the values in the months c(12,1,2) are needed.
This produces a list of zoo objects, Lz, and then performs rollapply on each component of the list giving L2. Finally L3 aggregates over year taking the max of each column.
library(zoo)
mylist <- list(a, a) # a is given at bottom of question
Lz <- lapply(mylist, read.zoo, index = 1:3, format = "%Y %m %d")
L2 <- lapply(Lz, rollapply, 3, sum, partial = TRUE)
L3 <- lapply(L2, function(z) aggregate(z, as.numeric(format(time(z), "%Y")), max))
giving:
> L3
[[1]]
sim01 sim02 sim03 sim04 sim05 sim06 sim07 sim08 sim09 sim10 sim11
1975 86.28 89.28 92.28 95.28 98.28 101.28 104.28 107.28 110.28 113.28 116.28
sim12 sim13 sim14 sim15 sim16 sim17 sim18 sim19 sim20
1975 119.28 122.28 125.28 128.28 131.28 134.28 137.28 140.28 143.28
[[2]]
sim01 sim02 sim03 sim04 sim05 sim06 sim07 sim08 sim09 sim10 sim11
1975 86.28 89.28 92.28 95.28 98.28 101.28 104.28 107.28 110.28 113.28 116.28
sim12 sim13 sim14 sim15 sim16 sim17 sim18 sim19 sim20
1975 119.28 122.28 125.28 128.28 131.28 134.28 137.28 140.28 143.28
Solved
lst1 <- lapply(list.files(pattern=".csv"),function(x) read.table(x,header=TRUE,sep="")) # read all files and data and replace -999.9 with NA
lst2<-lapply(lst1, function(x) x[x[["Month"]] %in% c(6,7,8),])#c(6,7,8) extract data for DJF for individual sites
names(lst2)<-list.files(pattern=".csv")
lapply(lst2,tail,4)
lst3<-lapply(lst2, function(x) x[!(names(x) %in% c("Site"))])
Lz <- lapply(lst3, read.zoo, index = 1:3, format = "%Y %m %d")
L2 <- lapply(Lz, rollapply, 3, sum, partial = TRUE)
L3 <- lapply(L2, function(z) aggregate(z, as.numeric(format(time(z), "%Y")), max))
mapply(
write.table,
x=L3, file=paste(names(L3), "csv", sep="."),
MoreArgs=list(row.names=FALSE, sep=",")
) # write files to folder keeping the list names as file names

Resources