Colmeans in a dataframe by factor variable

Colmeans in a dataframe by factor variable - r

I'm trying to get the mean of some variables inside a dataframe for different factors. Say I have:
time geo var1 var2 var3 var4
1 1990 AT 1 7 13 19
2 1991 AT 2 8 14 20
3 1992 AT 3 9 15 21
4 1990 DE 4 10 16 22
5 1991 DE 5 11 17 23
6 1992 DE 6 12 18 24
And I want:
time geo var1 var2 var3 var4 m_var2 m_var3
1 1990 AT 1 7 13 19 8 14
2 1991 AT 2 8 14 20 8 14
3 1992 AT 3 9 15 21 8 14
4 1990 DE 4 10 16 22 11 17
5 1991 DE 5 11 17 23 11 17
6 1992 DE 6 12 18 24 11 17
I've tried a few things with by() and lapply() but I think this goes into the direction of ddply
require(plyr)
Dataset <- data.frame(time=rep(c(1990:1992),2),geo=c(rep("AT",3),rep("DE",3))
,var1=as.numeric(c(1:6)),var2=as.numeric(c(7:12)),var3=as.numeric(c(13:18)),
var4=as.numeric(c(19:24)))
newvars <- c("var2","var3")
newData <- Dataset[,c("geo",newvars)]
Currently, I can choose between two errors:
ddply(newData,newData[,"geo"],colMeans)
#where R apparently thinks AT is the variable?
ddply(newData,"geo",colMeans)
#where R worries about the factor variable not being numeric?
My lapply attempts got me quite far but then left me with a list I could not get back into the dataframe:
lapply(newvars,function(x){
by(Dataset[x],Dataset[,"geo"],function(x)
rep(colMeans(x,na.rm=T),length(unique(Dataset[,"time"]))))
})
I think this must even be able with merge and filters as here:
Lapply in a dataframe over different variables using filters , but I can't get it together. Any help would be appreciated!

Other method with dplyr
library(dplyr)
df1 %>% group_by(geo) %>% mutate(m_var2=mean(var2), m_var3=mean(var3))

Another simple base R solution is just
transform(df, m_var2 = ave(var2, geo), m_var3 = ave(var3, geo))
# time geo var1 var2 var3 var4 m_var2 m_var3
# 1 1990 AT 1 7 13 19 8 14
# 2 1991 AT 2 8 14 20 8 14
# 3 1992 AT 3 9 15 21 8 14
# 4 1990 DE 4 10 16 22 11 17
# 5 1991 DE 5 11 17 23 11 17
# 6 1992 DE 6 12 18 24 11 17
Couple years later, I think a more concise approach would be to both update the actual data set (instead of creating a new one) and operate on a vector of columns (instead of manually writing them)
vars <- paste0("var", 2:3) # Select desired cols
df[paste0("m_", vars)] <- lapply(df[vars], ave, df[["geo"]]) # Loop and update

One option would be to use data.table. We can convert the data.frame to data.table (setDT(df1)), get the mean (lapply(.SD, mean)) for the selected columns ('var2' and 'var3') by specifying the column index in .SDcols, grouped by 'geo'. Create new columns by assigning the output (:=) to the new column names (paste('m', names(df1)[4:5]))
library(data.table)
setDT(df1)[, paste('m', names(df1)[4:5], sep="_") :=lapply(.SD, mean)
,by = geo, .SDcols=4:5]
# time geo var1 var2 var3 var4 m_var2 m_var3
#1: 1990 AT 1 7 13 19 8 14
#2: 1991 AT 2 8 14 20 8 14
#3: 1992 AT 3 9 15 21 8 14
#4: 1990 DE 4 10 16 22 11 17
#5: 1991 DE 5 11 17 23 11 17
#6: 1992 DE 6 12 18 24 11 17
NOTE: This method is more general. We can create the mean columns even for 100s of variables without any major change in the code. ie. if we need to get the mean of columns 4:100, change the .SDcols=4:100 and in the paste('m', names(df1)[4:100].
data
df1 <- structure(list(time = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L
), geo = c("AT", "AT", "AT", "DE", "DE", "DE"), var1 = 1:6, var2 = 7:12,
var3 = 13:18, var4 = 19:24), .Names = c("time", "geo", "var1",
"var2", "var3", "var4"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))

In base R:
cbind(df1,m_var2=ave(df1$var2,df1$geo),m_var3=ave(df1$var3,df1$geo))

Related

Taking rolling differences of columns in R tibble for arbitrary number of columns

I want to take differences for each pair of consecutive columns but for an arbitrary number of columns. For example...
df <- as.tibble(data.frame(group = rep(c("a", "b", "c"), each = 4),
subgroup = rep(c("adam", "boy", "charles", "david"), times = 3),
iter1 = 1:12,
iter2 = c(13:22, NA, 24),
iter3 = c(25:35, NA)))
I want to calculate the differences by column. I would normally use...
df %>%
mutate(diff_iter2 = iter2 - iter1,
diff_iter3 = iter3 - iter2)
But... I'd like to:
accomodate an arbitrary number of columns and
treat NAs such that:
if the number we're subtracting from is NA, then the result should be NA. E.g. NA - 11 = NA
if the number we're subtracting is NA, then that NA is effectively treated as a 0. E.g. 35 - NA = 35
The result should look like this...
group subgroup iter1 iter2 iter3 diff_iter2 diff_iter3
<chr> <chr> <int> <dbl> <int> <dbl> <dbl>
1 a adam 1 13 25 12 12
2 a boy 2 14 26 12 12
3 a charles 3 15 27 12 12
4 a david 4 16 28 12 12
5 b adam 5 17 29 12 12
6 b boy 6 18 30 12 12
7 b charles 7 19 31 12 12
8 b david 8 20 32 12 12
9 c adam 9 21 33 12 12
10 c boy 10 22 34 12 12
11 c charles 11 NA 35 NA 35
12 c david 12 24 NA 12 NA
Originally, this df was in long format but the problem was that I believe the lag() function operates on position within groups and all the groups aren't the same because some have missing records (hence the NA in the wider table shown above).
Starting with long format would do but then please assume the records shown above with NA values would not exist in that longer dataframe.
Any help is appreciated.

An option in tidyverse would be - loop across the columns of 'iter' other than the iter1, then get the column value by replacing the column name (cur_column()) substring by subtracting 1 (as.numeric(x) -1) with str_replace, then replace the NA elements with 0 (replace_na) based on the OP's logic, subtract from the looped column and create new columns by adding prefix in .names ("diff_{.col}" - {.col} will be the original column name)
library(dplyr)
library(stringr)
library(tidyr)
df <- df %>%
mutate(across(iter2:iter3, ~
. - replace_na(get(str_replace(cur_column(), '\\d+',
function(x) as.numeric(x) - 1)), 0), .names = 'diff_{.col}'))
-output
df
# A tibble: 12 × 7
group subgroup iter1 iter2 iter3 diff_iter2 diff_iter3
<chr> <chr> <int> <dbl> <int> <dbl> <dbl>
1 a adam 1 13 25 12 12
2 a boy 2 14 26 12 12
3 a charles 3 15 27 12 12
4 a david 4 16 28 12 12
5 b adam 5 17 29 12 12
6 b boy 6 18 30 12 12
7 b charles 7 19 31 12 12
8 b david 8 20 32 12 12
9 c adam 9 21 33 12 12
10 c boy 10 22 34 12 12
11 c charles 11 NA 35 NA 35
12 c david 12 24 NA 12 NA

Find the columns whose names start with iter, ix, and then take all but the first as df1, all but the last as df2 and replace the NAs in df2 with 0. Then subtract them and cbind df to that. No packages are used.
ix <- grep("^iter", names(df))
df1 <- df[tail(ix, -1)]
df2 <- df[head(ix, -1)]
df2[is.na(df2)] <- 0
cbind(df, diff = df1 - df2)
giving:
group subgroup iter1 iter2 iter3 diff.iter2 diff.iter3
1 a adam 1 13 25 12 12
2 a boy 2 14 26 12 12
3 a charles 3 15 27 12 12
4 a david 4 16 28 12 12
5 b adam 5 17 29 12 12
6 b boy 6 18 30 12 12
7 b charles 7 19 31 12 12
8 b david 8 20 32 12 12
9 c adam 9 21 33 12 12
10 c boy 10 22 34 12 12
11 c charles 11 NA 35 NA 35
12 c david 12 24 NA 12 NA

add rows to data frame for non-observations

I have a dataframe that summarizes the number of times birds were observed at their breeding site one each day and each hour during daytime (i.e., when the sun was above the horizon). example:
head(df)
ID site day hr nObs
1 19 A 202 11 60
2 19 A 202 13 18
3 19 A 202 15 27
4 8 B 188 8 6
5 8 B 188 9 6
6 8 B 188 11 7
However, this dataframe does not include hours when the bird was not observed. Eg. no line for bird 19 on day 202 at 14 with an nObs value of 0.
I'd like to find a way, preferably with dplyr (tidy verse), to add in those rows for when individuals were not observed.

You can use complete from tidyr, i.e.
library(tidyverse)
df %>%
group_by(ID, site) %>%
complete(hr = seq(min(hr), max(hr)))
which gives,
# A tibble: 9 x 5
# Groups: ID, site [2]
ID site hr day nObs
<int> <fct> <int> <int> <int>
1 8 B 8 188 6
2 8 B 9 188 6
3 8 B 10 NA NA
4 8 B 11 188 7
5 19 A 11 202 60
6 19 A 12 NA NA
7 19 A 13 202 18
8 19 A 14 NA NA
9 19 A 15 202 27

One way to do this would be to first build a "template" of all possible combinations where birds can be observed and then merge ("left join") the actual observations onto that template:
a = read.table(text = " ID site day hr nObs
1 19 A 202 11 60
2 19 A 202 13 18
3 19 A 202 15 27
4 8 B 188 8 6
5 8 B 188 9 6
6 8 B 188 11 7")
tpl <- expand.grid(c(unique(a[, 1:3]), list(hr = 1:24)))
merge(tpl, a, all.x = TRUE)
Edit based on comment by #user3220999: in case we want to do the process per ID, we can just use split to get a list of data.frames per ID, get a list of templates and mapply merge on the two lists:
a <- split(a, a$ID)
tpl <- lapply(a, function(ai) {
expand.grid(c(unique(ai[, 1:3]), list(hr = 1:24)))
})
res <- mapply(merge, tpl, a, SIMPLIFY = FALSE, MoreArgs = list(all.x = TRUE))

reshaping columns in a data frame

Say I have the following data frame:
ID<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3, 4,4,4,4,4,5,5,5,5,5)
Score<- sample(1:20, 25, replace=TRUE)
days<-rep(c("Mon", "Tue", "Wed", "Thu", "Fri"), times=5)
t<-cbind(ID, Score, days)
I would like to reshape it so that the new columns are ID and the actual weekday names, (meaning 6 columns) and the Score values are distributed according to their ID and day name. Something like this:
I found that reshape package might do. Tried (melt and cast) but it did not produce the result I wanted, but something like in this post: Melt data for one column

A base R solution that uses the built-in reshape command.
set.seed(12345)
t <- data.frame(id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5),
score = sample(x = 1:20,size = 25,replace = TRUE),
days = rep(x = c("Mon","Tue","Wed","Thu","Fri"),times = 5))
t.wide <- reshape(data = t,
v.names = "score",
timevar = "days",
idvar = "id",
direction = "wide")
names(t.wide) <- gsub(pattern = "score.",replacement = "",x = names(t.wide),fixed = TRUE)
t.wide
id Mon Tue Wed Thu Fri
1 1 15 18 16 18 10
6 2 4 7 11 15 20
11 3 1 4 15 1 8
16 4 10 8 9 4 20
21 5 10 7 20 15 13

You can use reshape2 to do this, but you need a data.frame to do that. Using cbind produces a matrix. (And converts all your numerical variables to characters in this case, as matrices can only hold one data type).
I've changed your code to produce a dataframe, which is already in long format (one row per observation).
set.seed(123)
ID<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3, 4,4,4,4,4,5,5,5,5,5)
Score<- sample(1:20, 25, replace=TRUE)
days<-rep(c("Mon", "Tue", "Wed", "Thu", "Fri"), times=5)
dat<-data.frame(ID, Score, days)
Changing it to wide using reshape2 is then quite straightforward:
library(reshape2)
res <- dcast(ID~days,value.var="Score",data=dat)
> res
ID Fri Mon Thu Tue Wed
1 1 16 3 2 12 6
2 2 19 13 12 7 19
3 3 19 19 17 8 15
4 4 15 3 8 1 20
5 5 3 11 18 8 15

You could also use unstack if your data are complete (same number of days per id).
Here's an example (using the data from TARehman's answer):
unstack(t, score ~ days)
# Fri Mon Thu Tue Wed
# 1 10 15 18 18 16
# 2 20 4 15 7 11
# 3 8 1 1 4 15
# 4 20 10 4 8 9
# 5 13 10 15 7 20
Here's the clean-up for the column ordering, and for adding in the ID column:
cbind(ID = unique(t$id), unstack(t, score ~ days)[c("Mon", "Tue", "Wed", "Thu", "Fri")])
## ID Mon Tue Wed Thu Fri
## 1 1 15 18 16 18 10
## 2 2 4 7 11 15 20
## 3 3 1 4 15 1 8
## 4 4 10 8 9 4 20
## 5 5 10 7 20 15 13

Rather than reshape I'd move to the newer tidyr package and also make use of dplyr like so:
library(dplyr)
library(tidyr)
tdf<-as.data.frame(t) %>%
mutate(Score=as.numeric(Score)) %>%
spread(days,Score, fill=NA)
glimpse(tdf)
HTH

Just another option using splitstackshape
library(splitstackshape)
data = data.frame(t)
out = setnames(cSplit(setDT(data)[, .(x = toString(Score)), by = ID],
'x', ','), c('ID', unique(days)))
#> out
# ID Mon Tue Wed Thu Fri
#1: 1 8 14 11 5 10
#2: 2 16 1 4 14 8
#3: 3 8 18 19 13 3
#4: 4 16 9 19 16 6
#5: 5 7 2 1 2 13

Within both the dplyr & tidyr package, use spread to achieve the following:
library(dplyr)
library(tidyr)
t <- tbl_df(as.data.frame(t))
t %>% spread(days, Score, ID)
and you get the following output:
ID Fri Mon Thu Tue Wed
(fctr) (fctr) (fctr) (fctr) (fctr) (fctr)
1 1 10 10 18 17 10
2 2 18 11 14 3 16
3 3 11 13 9 15 17
4 4 13 13 16 17 11
5 5 7 14 9 15 20

Lapply in a dataframe over different variables using filters

I'm trying to calculate several new variables in my dataframe. Take initial values for example:
Say I have:
Dataset <- data.frame(time=rep(c(1990:1992),2),
geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))
time geo var1 var2
1 1990 AT 1 7
2 1991 AT 2 8
3 1992 AT 3 9
4 1990 DE 4 10
5 1991 DE 5 11
6 1992 DE 6 12
And I want:
time geo var1 var2 var1_1990 var1_1991 var2_1990 var2_1991
1 1990 AT 1 7 1 2 7 8
2 1991 AT 2 8 1 2 7 8
3 1992 AT 3 9 1 2 7 8
4 1990 DE 4 10 4 5 10 11
5 1991 DE 5 11 4 5 10 11
6 1992 DE 6 12 4 5 10 11
So both time and the variable are changing for the new variables. Here is my attempt:
intitialyears <- c(1990,1991)
intitialvars <- c("var1", "var2")
# ideally, I want code where I only have to change these two vectors
# and where it's possible to change their dimensions
for (i in initialyears){
lapply(initialvars,function(x){
rep(Dataset[time==i,x],each=length(unique(Dataset$time)))
})}
Which runs without error but yields nothing. I would like to assign the variable names in the example (eg. "var1_1990") and immediately make the new variables part of the dataframe. I would also like to avoid the for loop but I don't know how to wrap two lapply's around this function. Should I rather have the function use two arguments? Is the problem that the apply function does not carry the results into my environment? I've been stuck here for a while so I would be grateful for any help!
p.s.: I have the solution to do this combination by combination without apply and the likes but I'm trying to get away from copy and paste:
Dataset$var1_1990 <- c(rep(Dataset$var1[which(Dataset$time==1990)],
each=length(unique(Dataset$time))))

This can be done with subset(), reshape(), and merge():
merge(Dataset,reshape(subset(Dataset,time%in%c(1990,1991)),dir='w',idvar='geo',sep='_'));
## geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
## 1 AT 1990 1 7 1 7 2 8
## 2 AT 1991 2 8 1 7 2 8
## 3 AT 1992 3 9 1 7 2 8
## 4 DE 1990 4 10 4 10 5 11
## 5 DE 1991 5 11 4 10 5 11
## 6 DE 1992 6 12 4 10 5 11
The column order isn't exactly what you have in your question, but you can fix that up after-the-fact with an index operation, if necessary.

Here's a data.table method:
require(data.table)
dt <- as.data.table(Dataset)
in_cols = c("var1", "var2")
out_cols = do.call("paste", c(CJ(in_cols, unique(dt$time)), sep="_"))
dt[, (out_cols) := unlist(lapply(.SD, as.list), FALSE), by=geo, .SDcols=in_cols]
# time geo var1 var2 var1_1990 var1_1991 var1_1992 var2_1990 var2_1991 var2_1992
# 1: 1990 AT 1 7 1 2 3 7 8 9
# 2: 1991 AT 2 8 1 2 3 7 8 9
# 3: 1992 AT 3 9 1 2 3 7 8 9
# 4: 1990 DE 4 10 4 5 6 10 11 12
# 5: 1991 DE 5 11 4 5 6 10 11 12
# 6: 1992 DE 6 12 4 5 6 10 11 12
This assumes that the time variable is identical (and in the same order) for each geo value.

With dplyr and tidyr and using a custom function try the following:
Data
Dataset <- data.frame(time=rep(c(1990:1992),2),
geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))
Code
library(dplyr); library(tidyr)
intitialyears <- c(1990,1991)
intitialvars <- c("var1", "var2")
#create this function
myTranForm <- function(dataSet, varName, years){
temp <- dataSet %>% select(time, geo, eval(parse(text=varName))) %>%
filter(time %in% years) %>% mutate(time=paste(varName, time, sep="_"))
names(temp)[names(temp) %in% varName] <- "someRandomStringForVariableName"
temp <- temp %>% spread(time, someRandomStringForVariableName)
return(temp)
}
#Then lapply on intitialvars using the custom function
DatasetList <- lapply(intitialvars, function(x) myTranForm(Dataset, x, intitialyears))
#and loop over the data frames in the list
for(i in 1:length(intitialvars)){
Dataset <- left_join(Dataset, DatasetList[[i]])
}
Dataset

Reshape wide format, to multi-column long format

I want to reshape a wide format dataset that has multiple tests which are measured at 3 time points:
ID Test Year Fall Spring Winter
1 1 2008 15 16 19
1 1 2009 12 13 27
1 2 2008 22 22 24
1 2 2009 10 14 20
2 1 2008 12 13 25
2 1 2009 16 14 21
2 2 2008 13 11 29
2 2 2009 23 20 26
3 1 2008 11 12 22
3 1 2009 13 11 27
3 2 2008 17 12 23
3 2 2009 14 9 31
into a data set that separates the tests by column but converts the measurement time into long format, for each of the new columns like this:
ID Year Time Test1 Test2
1 2008 Fall 15 22
1 2008 Spring 16 22
1 2008 Winter 19 24
1 2009 Fall 12 10
1 2009 Spring 13 14
1 2009 Winter 27 20
2 2008 Fall 12 13
2 2008 Spring 13 11
2 2008 Winter 25 29
2 2009 Fall 16 23
2 2009 Spring 14 20
2 2009 Winter 21 26
3 2008 Fall 11 17
3 2008 Spring 12 12
3 2008 Winter 22 23
3 2009 Fall 13 14
3 2009 Spring 11 9
3 2009 Winter 27 31
I have unsuccessfully tried to use reshape and melt. Existing posts address transforming to single column outcome.

Using reshape2:
# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
Update: Using data.table melt/cast from versions >= 1.9.0:
data.table from versions 1.9.0 imports reshape2 package and implements fast melt and dcast methods in C for data.tables. A comparison of speed on bigger data is shown below.
For more info regarding NEWS, go here.
require(data.table) ## ver. >=1.9.0
require(reshape2)
dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
At the moment, you'll have to write dcast.data.table explicitly as it's not a S3 generic in reshape2 yet.
Benchmarking on bigger data:
# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE),
Test = sample(1e3, 1e7, TRUE),
Year = sample(2008:2014, 1e7,TRUE),
Fall = sample(50, 1e7, TRUE),
Spring = sample(50, 1e7,TRUE),
Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)
reshape2 timings:
reshape2_melt <- function(df) {
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
# user system elapsed
# 43.319 4.909 48.932
df.m <- transform(df.m, Test = paste0("Test", Test))
reshape2_cast <- function(df) {
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
# user system elapsed
# 57.728 9.712 69.573
data.table timings:
DT_melt <- function(dt) {
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- reshape2_melt(DT))
# user system elapsed
# 0.276 0.001 0.279
dt.m[, Test := paste0("Test", Test)]
DT_cast <- function(dt) {
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
# user system elapsed
# 12.732 0.825 14.006
melt.data.table is ~175x faster than reshape2:::melt and dcast.data.table is ~5x than reshape2:::dcast.

Sticking with base R, this is another good candidate for the "stack + reshape" routine. Assuming our dataset is called "mydf":
mydf.temp <- data.frame(mydf[1:3], stack(mydf[4:6]))
mydf2 <- reshape(mydf.temp, direction = "wide",
idvar=c("ID", "Year", "ind"),
timevar="Test")
names(mydf2) <- c("ID", "Year", "Time", "Test1", "Test2")
mydf2
# ID Year Time Test1 Test2
# 1 1 2008 Fall 15 22
# 2 1 2009 Fall 12 10
# 5 2 2008 Fall 12 13
# 6 2 2009 Fall 16 23
# 9 3 2008 Fall 11 17
# 10 3 2009 Fall 13 14
# 13 1 2008 Spring 16 22
# 14 1 2009 Spring 13 14
# 17 2 2008 Spring 13 11
# 18 2 2009 Spring 14 20
# 21 3 2008 Spring 12 12
# 22 3 2009 Spring 11 9
# 25 1 2008 Winter 19 24
# 26 1 2009 Winter 27 20
# 29 2 2008 Winter 25 29
# 30 2 2009 Winter 21 26
# 33 3 2008 Winter 22 23
# 34 3 2009 Winter 27 31

Base reshape function alternative method is below. Though this required using reshape twice, there might be a simpler way.
Assuming your dataset is called df1
tmp <- reshape(df1,idvar=c("ID","Year"),timevar="Test",direction="wide")
result <- reshape(
tmp,
idvar=c("ID","Year"),
varying=list(3:5,6:8),
v.names=c("Test1","Test2"),
times=c("Fall","Spring","Winter"),
direction="long"
)
Which gives:
> result
ID Year time Test1 Test2
1.2008.Fall 1 2008 Fall 15 22
1.2009.Fall 1 2009 Fall 12 10
2.2008.Fall 2 2008 Fall 12 13
2.2009.Fall 2 2009 Fall 16 23
3.2008.Fall 3 2008 Fall 11 17
3.2009.Fall 3 2009 Fall 13 14
1.2008.Spring 1 2008 Spring 16 22
1.2009.Spring 1 2009 Spring 13 14
2.2008.Spring 2 2008 Spring 13 11
2.2009.Spring 2 2009 Spring 14 20
3.2008.Spring 3 2008 Spring 12 12
3.2009.Spring 3 2009 Spring 11 9
1.2008.Winter 1 2008 Winter 19 24
1.2009.Winter 1 2009 Winter 27 20
2.2008.Winter 2 2008 Winter 25 29
2.2009.Winter 2 2009 Winter 21 26
3.2008.Winter 3 2008 Winter 22 23
3.2009.Winter 3 2009 Winter 27 31

tidyverse/tidyr solution:
library(dplyr)
library(tidyr)
df %>%
gather("Time", "Value", Fall, Spring, Winter) %>%
spread(Test, Value, sep = "")

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Colmeans in a dataframe by factor variable - r

Other method with dplyr library(dplyr) df1 %>% group_by(geo) %>% mutate(m_var2=mean(var2), m_var3=mean(var3))

In base R: cbind(df1,m_var2=ave(df1$var2,df1$geo),m_var3=ave(df1$var3,df1$geo))

Related

Taking rolling differences of columns in R tibble for arbitrary number of columns

add rows to data frame for non-observations

reshaping columns in a data frame

Lapply in a dataframe over different variables using filters

Reshape wide format, to multi-column long format

Categories

Resources