This question already has answers here: How to sum a variable by group (18 answers). Closed 7 years ago.
I have a dataset that looks something like this:
Type Age count1 count2 Year Pop1 Pop2 TypeDescrip
A 35 1 1 1990 30000 50000 alpha
A 35 3 1 1990 30000 50000 alpha
A 45 2 3 1990 20000 70000 alpha
B 45 2 1 1990 20000 70000 beta
B 45 4 5 1990 20000 70000 beta
I want to add up the counts for rows that match in both the Type and Age columns. So ideally I would end up with a dataset that looks like this:
Type Age count1 count2 Year Pop1 Pop2 TypeDescrip
A 35 4 2 1990 30000 50000 alpha
A 45 2 3 1990 20000 70000 alpha
B 45 6 6 1990 20000 70000 beta
I've tried using nested duplicated() statements such as below:
typedup = duplicated(df$Type)
bothdup = duplicated(df[(typedup == TRUE),]$Age)
but this returns indices for which Age or Type is duplicated, not necessarily rows where both are duplicated.
I've also tried tapply:
tapply(c(df$count1, df$count2), c(df$Age, df$Type), sum)
but this output is difficult to work with. I want to have a data.frame when I'm done.
I don't want to use a for-loop because my dataset is quite large.
Try
library(dplyr)
df1 %>%
  group_by(Type, Age) %>%
  summarise_each(funs(sum))
# Type Age count1 count2
#1 A 35 4 2
#2 A 45 2 3
#3 B 45 6 6
In newer versions of dplyr, where summarise_each() is deprecated in favour of summarise_all():
df1 %>%
  group_by(Type, Age) %>%
  summarise_all(sum)
Or using base R
aggregate(.~Type+Age, df1, FUN=sum)
# Type Age count1 count2
#1 A 35 4 2
#2 A 45 2 3
#3 B 45 6 6
Or
library(data.table)
setDT(df1)[, lapply(.SD, sum), .(Type, Age)]
# Type Age count1 count2
#1: A 35 4 2
#2: A 45 2 3
#3: B 45 6 6
Update
Based on the new dataset,
df2 %>%
  group_by(Type, Age, Year, Pop1, Pop2, TypeDescrip) %>%
  summarise_each(funs(sum), matches('^count'))
# Type Age Year Pop1 Pop2 TypeDescrip count1 count2
#1 A 35 1990 30000 50000 alpha 4 2
#2 A 45 1990 20000 70000 alpha 2 3
#3 B 45 1990 20000 70000 beta 6 6
data
df1 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L,
35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L,
1L, 3L, 1L, 5L)), .Names = c("Type", "Age", "count1", "count2"
), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L,
35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L,
1L, 3L, 1L, 5L), Year = c(1990L, 1990L, 1990L, 1990L, 1990L),
Pop1 = c(30000L, 30000L, 20000L, 20000L, 20000L), Pop2 = c(50000L,
50000L, 70000L, 70000L, 70000L), TypeDescrip = c("alpha",
"alpha", "alpha", "beta", "beta")), .Names = c("Type", "Age",
"count1", "count2", "Year", "Pop1", "Pop2", "TypeDescrip"),
class = "data.frame", row.names = c(NA, -5L))
@hannah You can also use SQL via the sqldf package:
library(sqldf)
sqldf("select
         Type, Age,
         sum(count1) as sum_count1,
         sum(count2) as sum_count2
       from df
       group by Type, Age")
Related
I have an example dataset:
Road Start End Cat
1 0 50 a
1 50 60 b
1 60 90 b
1 70 75 a
2 0 20 a
2 20 25 a
2 25 40 b
Trying to output following:
Road Start End Cat
1 0 50 a
1 50 90 b
1 70 75 a
2 0 25 a
2 25 40 b
My code doesn't work:
df %>% group_by(Road, cat)
%>% summarise(
min(Start),
max(End)
)
How can I achieve the results I wanted?
We can use rleid from data.table to generate a run-length id for grouping, and then do the summarise:
library(dplyr)
library(data.table)
df %>%
  group_by(Road, grp = rleid(Cat)) %>%
  summarise(Cat = first(Cat), Start = min(Start), End = max(End)) %>%
  select(-grp)
# A tibble: 5 x 4
# Groups: Road [2]
# Road Cat Start End
# <int> <chr> <int> <int>
#1 1 a 0 50
#2 1 b 50 90
#3 1 a 70 75
#4 2 a 0 25
#5 2 b 25 40
Or using data.table methods
library(data.table)
setDT(df)[, .(Start = min(Start), End = max(End)), .(Road, Cat, grp = rleid(Cat))]
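For comparison, the same consecutive-run grouping can be sketched in base R with no packages; this is just an illustration (using the df from the data section), where cumsum over category changes plays the role of rleid:

```r
# df as in the data section
df <- data.frame(Road = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
                 Start = c(0L, 50L, 60L, 70L, 0L, 20L, 25L),
                 End = c(50L, 60L, 90L, 75L, 20L, 25L, 40L),
                 Cat = c("a", "b", "b", "a", "a", "a", "b"))

# build a run-length group id: increments whenever Cat changes
grp <- cumsum(c(TRUE, df$Cat[-1] != df$Cat[-nrow(df)]))

# collapse each Road/run combination to its min Start and max End
out <- do.call(rbind, lapply(split(df, list(df$Road, grp), drop = TRUE),
  function(d) data.frame(Road = d$Road[1], Cat = d$Cat[1],
                         Start = min(d$Start), End = max(d$End))))
out
```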
data
df <- structure(list(Road = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), Start = c(0L,
50L, 60L, 70L, 0L, 20L, 25L), End = c(50L, 60L, 90L, 75L, 20L,
25L, 40L), Cat = c("a", "b", "b", "a", "a", "a", "b")),
class = "data.frame", row.names = c(NA,
-7L))
I have two data frames that I want to connect. For each row of the first table, I need to look up a value in the second table using two dimensions: the letter and the year.
The first dataframe looks like this:
letter year value
A 2001
B 2002
C 2003
D 2004
second one:
letter 2001 2002 2003 2004
A 4 9 9 9
B 6 7 6 6
C 2 3 5 8
D 1 1 1 1
I want to end up with something like this:
letter year value
A 2001 4
B 2002 7
C 2003 5
D 2004 1
Thanks to all of you.
One option is row/column indexing. Here, the row index is simply the sequence of rows, while the column index comes from matching the 'year' column of the first data with the column names of the second. We cbind the two indexes to create a matrix ('m1'), use that to extract values from the second dataset, and assign those to the 'value' column in the first data:
i1 <- seq_len(nrow(df1))
j1 <- match(df1$year, names(df2)[-1])
m1 <- cbind(i1, j1)
df1$value <- df2[-1][m1]
df1
# letter year value
#1 A 2001 4
#2 B 2002 7
#3 C 2003 5
#4 D 2004 1
For the specific example, the pattern to extract seems to be the diagonal elements, in that case, we can also use
df1$value <- diag(as.matrix(df2[-1]))
data
df1 <- structure(list(letter = c("A", "B", "C", "D"), year = 2001:2004),
class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(letter = c("A", "B", "C", "D"), `2001` = c(4L,
6L, 2L, 1L), `2002` = c(9L, 7L, 3L, 1L), `2003` = c(9L, 6L, 5L,
1L), `2004` = c(9L, 6L, 8L, 1L)), class = "data.frame",
row.names = c(NA,
-4L))
Another option in the tidyverse would be to first pivot your value data to a longer data frame (data from #akrun's answer):
df2.long <- df2 %>%
  pivot_longer(`2001`:`2004`, names_to = 'year', values_to = 'value')
# A tibble: 16 x 3
letter year value
<chr> <chr> <int>
1 A 2001 4
2 A 2002 9
3 A 2003 9
4 A 2004 9
5 B 2001 6
6 B 2002 7
7 B 2003 6
8 B 2004 6
9 C 2001 2
10 C 2002 3
...
And then perform an inner_join to the data frame containing your desired letter/year combinations:
df.final <- df2.long %>%
  mutate(year = as.numeric(year)) %>%
  inner_join(df1)
letter year value
<chr> <dbl> <int>
1 A 2001 4
2 B 2002 7
3 C 2003 5
4 D 2004 1
Base R solution:
# Reshape your dataframe from wide to long:
df3 <- reshape(df2,
direction = "long",
idvar = "letter",
varying = c(names(df2)[names(df2) != "letter"]),
v.names = "Value",
timevar = "Year",
times = names(df2)[names(df2) != "letter"],
new.row.names = 1:(nrow(df2) * length(names(df2)[names(df2) != "letter"]))
)
# Inner join the long_df with the first dataframe:
df_final <- merge(df1[,c(names(df1) != "Value")], df3, by = intersect(colnames(df1), colnames(df3)))
Tidyverse solution (slightly expanding on #jdobres' solution above):
lapply(c("dplyr", "tidyr"), require, character.only = TRUE)
df3_long <- df2 %>%
  pivot_longer(`2001`:`2004`, names_to = 'year', values_to = 'value') %>%
  mutate(year = as.numeric(year)) %>%
  inner_join(df1, by = c("letter", "year"))
Data:
df1 <-
structure(list(letter = c("A", "B", "C", "D"), year = 2001:2004),
class = "data.frame",
row.names = c(NA,-4L))
df2 <-
structure(
list(
letter = c("A", "B", "C", "D"),
`2001` = c(4L,
6L, 2L, 1L),
`2002` = c(9L, 7L, 3L, 1L),
`2003` = c(9L, 6L, 5L,
1L),
`2004` = c(9L, 6L, 8L, 1L)
),
class = "data.frame",
row.names = c(NA,-4L)
)
I have this data frame:
token DD1 Type DD2 Price
AB-1 2018-01-01 10:12:15 Low 2018-01-25 10000
AB-5 2018-01-10 10:12:15 Low 2018-01-25 15000
AB-2 2018-01-05 12:25:04 High 2018-01-20 25000
AB-3 2018-01-03 17:04:25 Low 2018-01-27 50000
....
AB-8 2017-12-10 21:08:12 Low 2017-12-30 60000
AB-8 2017-12-10 21:08:12 High 2017-12-30 30000
dput:
structure(list(token = structure(c(2L, 5L, 3L, 4L, 1L, 6L, 6L
), .Label = c("....", "AB-1", "AB-2", "AB-3", "AB-5", "AB-8"), class = "factor"),
DD1 = structure(c(2L, 5L, 4L, 3L, 1L, 6L, 6L), .Label = c("",
"01/01/2018 10:12:15", "03/01/2018 17:04:25", "05/01/2018 12:25:04",
"10/01/2018 10:12:15", "10/12/2017 21:08:12"), class = "factor"),
Type = structure(c(3L, 3L, 2L, 3L, 1L, 3L, 2L), .Label = c("",
"High", "Low"), class = "factor"), DD2 = structure(c(3L,
3L, 2L, 4L, 1L, 5L, 5L), .Label = c("", "20/01/2018", "25/01/2018",
"27/01/2018", "30/12/2017"), class = "factor"), Price = c(10000L,
15000L, 25000L, 50000L, NA, 60000L, 30000L)), .Names = c("token",
"DD1", "Type", "DD2", "Price"), class = "data.frame", row.names = c(NA,
-7L))
From the data frame above I want two kinds of subset data frames:
DF1, by date: the last three dates in descending order (from DD2); if no row is available for a particular date, show that date with all fields as '0'.
DF2, by month: the last three months in descending order, with the same rule for missing months.
Formula for Avg Low (same for Avg High): DD2 - DD1, then take the median over the rows available.
% formula for the month table: (recent value - old value) / (old value).
The code should pick the last three days of data as well as the last three months of data from the data frame whenever I run it.
DF1:
Date nrow for Low Med Low sum of value low nrow for High Med High sum of value High
27-01-2018 1 24 50000 0 0 0
26-01-2018 0 0 0 0 0 0
25-01-2018 2 19.5 25000 0 0 0
DF2
Month nrow low % sum low % nrow high % sum high %
Jan-18 3 200% 75000 25% 1 0% 25000 -17%
Dec-17 1 100% 60000 100% 1 100% 0 100%
Nov-17 0 - - - 0 - - -
Although this question already has an accepted answer, I felt challenged to provide one which uses dcast() and melt(). Any missing dates and months are completed using CJ() and joins, as requested by the OP.
The code tries to reproduce the OP's expected results as closely as possible; that particular customisation is why it looks so convoluted.
If requested, I am willing to explain the code in more detail.
library(data.table)
setDT(DF)
# daily
DF1 <-
DF[, .(n = .N, days = median(difftime(as.Date(DD2, "%d/%m/%Y"),
as.Date(DD1, "%d/%m/%Y"), units = "day")),
sum = sum(Price)), by = .(DD2, Type)][
, Date := as.Date(DD2, "%d/%m/%Y")][
, dcast(.SD, Date ~ Type, value.var = c("n", "days", "sum"), fill = 0)][
.(Date = seq(max(Date), length.out = 3L, by = "-1 days")), on = "Date"][
, setcolorder(.SD, c(1, 3, 5, 7, 2, 4, 6))][
is.na(n_Low), (2:7) := lapply(.SD, function(x) 0), .SDcols = 2:7][]
DF1
Date n_Low days_Low sum_Low n_High days_High sum_High
1: 2018-01-27 1 24.0 days 50000 0 0 days 0
2: 2018-01-26 0 0.0 days 0 0 0 days 0
3: 2018-01-25 2 19.5 days 25000 0 0 days 0
# monthly
DF2 <-
DF[, Month := lubridate::floor_date(as.Date(DD2, "%d/%m/%Y"), unit = "month")][
, .(n = .N, sum = sum(Price)), by = .(Month, Type)][
CJ(Month = seq(max(Month), length.out = 3L, by = "-1 months"), Type = unique(Type)),
on = .(Month, Type)][
, melt(.SD, id.vars = c("Month", "Type"))][
is.na(value), value := 0][
, Pct := {
old <- shift(value); round(100 * ifelse(old == 0, 1, (value - old) / old))
},
by = .(variable, Type)][
, dcast(.SD, Type + Month ~ variable, value.var = c("value", "Pct"))][
, setnames(.SD, c("value_n", "value_sum"), c("n", "sum"))][
, dcast(.SD, Month ~ Type, value.var = c("n", "Pct_n", "sum", "Pct_sum"))][
order(-Month), setcolorder(.SD, c(1, 3, 5, 7, 9, 2, 4, 6, 8))]
DF2
Month n_Low Pct_n_Low sum_Low Pct_sum_Low n_High Pct_n_High sum_High Pct_sum_High
1: 2018-01-01 3 200 75000 25 1 0 25000 -17
2: 2017-12-01 1 100 60000 100 1 100 30000 100
3: 2017-11-01 0 NA 0 NA 0 NA 0 NA
Does the following approach help?
require(tidyverse)
require(lubridate)
Edit
This is a very convoluted approach and is most certainly possible to be solved more elegantly.
dat <- structure(list(token = structure(c(2L, 5L, 3L, 4L, 1L, 6L, 6L), .Label = c("....", "AB-1", "AB-2", "AB-3", "AB-5", "AB-8"), class = "character"), DD1 = structure(c(2L, 5L, 4L, 3L, 1L, 6L, 6L), .Label = c("", "01/01/2018 10:12:15", "03/01/2018 17:04:25", "05/01/2018 12:25:04", "10/01/2018 10:12:15", "10/12/2017 21:08:12"), class = "factor"),
Type = structure(c(3L, 3L, 2L, 3L, 1L, 3L, 2L), .Label = c("", "High", "Low"), class = "character"), DD2 = structure(c(3L, 3L, 2L, 4L, 1L, 5L, 5L), .Label = c("", "20/01/2018", "25/01/2018", "27/01/2018", "30/12/2017"), class = "factor"), Price = c(10000L, 15000L, 25000L, 50000L, NA, 60000L, 30000L)), .Names = c("token", "DD1", "Type", "DD2", "Price"), class = "data.frame", row.names = c(NA, -7L))
#I have included this in the code because structure() of your output had messed up the factors a lot
dat <- dat[c(1:4,6:7),]
dat <- dat %>% mutate(DD1 = dmy_hms(DD1), DD2 = dmy(DD2), Type = as.character(Type))
dat_summary <- dat %>%
  group_by(DD2, Type) %>% # because your operations are performed for each Type by DD2
  mutate(diff_days = round(as.duration(DD1 %--% DD2) / ddays(1), 0)) %>%
  # uses lubridate to calculate the number of days between each DD1 and DD2
  summarise(med = median(diff_days), # calculates the median
            sum = sum(Price),        # and the sum
            n = n())                 # and the number of rows per group
# A tibble: 5 x 5
# Groups: DD2 [?]
DD2 Type med sum n
<date> <chr> <dbl> <int> <int>
1 2017-12-30 2 19.0 30000 1
2 2017-12-30 3 19.0 60000 1
3 2018-01-20 2 14.0 25000 1
4 2018-01-25 3 19.5 25000 2
5 2018-01-27 3 23.0 50000 1
Now find the most recent day with a value in Price
datematch <- dat %>%
  group_by(Type, month = floor_date(DD2, "month")) %>%
  arrange(Type, desc(DD2)) %>%
  summarise(maxDate = max(DD2)) %>%
  select(Type, maxDate)
Now create helper data frames for merging. dummy_date will contain the last day with a value and the two previous days, for both types (low and high); all_dates will contain... well, all dates.
list1 <- split(datematch$maxDate, datematch$Type)
list_type2 <- do.call('c',lapply(list1[['2']], function(x) seq(as.Date(x)-2, as.Date(x), by="days")))
list_type3 <- do.call('c',lapply(list1[['3']], function(x) seq(as.Date(x)-2, as.Date(x), by="days")))
dd_2 <- data.frame (DD2 = list_type2, Type = as.character(rep('2', length(list_type2))), stringsAsFactors = F)
dd_3 <- data.frame (DD2 = list_type3, Type = as.character(rep('3', length(list_type3))), stringsAsFactors = F)
dummy_date = rbind(dd_2, dd_3)
seq_date <- seq(as.Date('2017-12-01'),as.Date('2018-01-31'), by = 'days')
all_dates <- data.frame (DD2 = rep(seq_date,2), Type = as.character(rep(c('2','3'),each = length(seq_date))),stringsAsFactors = F)
Now we can join your data frame with all days, so that every single day in the period gets a row
all_dates <- left_join(all_dates, dat_summary, by = c('DD2', 'Type'))
and we can filter this result with dummy_date, which (as we remember) contains only the required days before the last day with data
df1 <- left_join(dummy_date, all_dates, by = c('DD2', 'Type')) %>% arrange(Type, desc(DD2))
df1
DD2 Type med sum n
1 2018-01-20 2 14.0 25000 1
2 2018-01-19 2 NA NA NA
3 2018-01-18 2 NA NA NA
4 2017-12-30 2 19.0 30000 1
5 2017-12-29 2 NA NA NA
6 2017-12-28 2 NA NA NA
7 2018-01-27 3 23.0 50000 1
8 2018-01-26 3 NA NA NA
9 2018-01-25 3 19.5 25000 2
10 2017-12-30 3 19.0 60000 1
11 2017-12-29 3 NA NA NA
12 2017-12-28 3 NA NA NA
Sorry that 'Type' is not correctly shown as low and high; I had problems reading your data. I hope this helps somewhat.
edit
added suggestion for a way to get to DF2
df1 %>%
  group_by(Type, month = floor_date(DD2, 'month')) %>%
  summarise(sum = sum(sum, na.rm = T),
            n = sum(n, na.rm = T)) %>%
  unite(sum.n, c('sum', 'n')) %>%
  spread(Type, sum.n) %>%
  rename(low = '3', high = '2') %>%
  separate(high, c('high', 'n_high')) %>%
  separate(low, c('low', 'n_low')) %>%
  mutate(dummy_low = as.integer(c(NA, low[1:(length(low) - 1)])),
         dummy_high = as.integer(c(NA, high[1:(length(high) - 1)])),
         low = as.integer(low),
         high = as.integer(high)) %>%
  mutate(perc_low = 100 * (low - dummy_low) / dummy_low)
# A tibble: 2 x 8
month high n_high low n_low dummy_low dummy_high perc_low
<date> <int> <chr> <int> <chr> <int> <int> <dbl>
1 2017-12-01 30000 1 60000 1 NA NA NA
2 2018-01-01 25000 1 75000 3 60000 30000 25.0
It's up to you to add the remaining columns for 'high' and the count. I am sure the solution is not the most elegant one, but it should work. DF2 now has only two months because you provided only two months in your example; it should work with any number of months, and you can then filter the last three.
I have two lists:
list 1:
id name age
1 jake 21
2 ashly 19
45 lana 18
51 james 23
5675 eric 25
list 2 (tv watch):
id hours
1 1.1
1 3
1 2.5
45 5.6
45 3
51 2
51 1
51 2
This is just an example; the real lists are very big: list 1 has 5000 ids, and lists 2/3/4 have more than 1 million rows each (the ids are not unique).
For every list from 2 upward, I need to calculate an average/sum/count per id and add the value to list 1.
Note that the calculation has to be saved into another list with a different number of rows.
example:
list 1:
id name age tv_average
1 jake 21 2.2
2 ashly 19 n/a
45 lana 18 4.3
51 james 23 1.6667
5675 eric 25 n/a
These are my tries:
for (i in 1:nrow(list2)) {
p <- subset(list2,list2$id==i)
list2$tv_average[i==list2$id] <- sum(p$hours)/(nrow(p))
}
error:
out of 22999 rows it only works on 21713 rows.
Try this
#Sample Data
data1 = structure(list(id = c(1L, 2L, 45L, 51L, 5675L), name = structure(c(3L,
1L, 5L, 4L, 2L), .Label = c("ashly", "eric", "jake", "james",
"lana"), class = "factor"), age = c(21L, 19L, 18L, 23L, 25L)
), .Names = c("id",
"name", "age"), row.names = c(NA, -5L), class = "data.frame")
data2 = structure(list(id = c(1L, 1L, 1L, 3L, 45L, 45L, 51L, 51L, 51L,
53L), hours = c(1.1, 3, 2.5, 10, 5.6, 3, 2, 1, 2, 6)), .Names = c("id",
"hours"), class = "data.frame", row.names = c(NA, -10L))
# Use aggregate to calculate Average, Sum, and Count and Merge
merge(x = data1,
y = aggregate(hours~id, data2, function(x)
c(mean = mean(x),
sum = sum(x),
count = length(x))),
by = "id",
all.x = TRUE)
# id name age hours.mean hours.sum hours.count
#1 1 jake 21 2.200000 6.600000 3.000000
#2 2 ashly 19 NA NA NA
#3 45 lana 18 4.300000 8.600000 2.000000
#4 51 james 23 1.666667 5.000000 3.000000
#5 5675 eric 25 NA NA NA
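For a dplyr take on the same idea (a sketch using the data1/data2 above; the column names tv_average, tv_sum, and tv_count are my own choice), summarise data2 per id and left-join onto data1, which leaves NA where an id has no hours:

```r
library(dplyr)

# Sample data, as above
data1 <- data.frame(id = c(1L, 2L, 45L, 51L, 5675L),
                    name = c("jake", "ashly", "lana", "james", "eric"),
                    age = c(21L, 19L, 18L, 23L, 25L))
data2 <- data.frame(id = c(1L, 1L, 1L, 3L, 45L, 45L, 51L, 51L, 51L, 53L),
                    hours = c(1.1, 3, 2.5, 10, 5.6, 3, 2, 1, 2, 6))

res <- data1 %>%
  left_join(data2 %>%
              group_by(id) %>%
              summarise(tv_average = mean(hours),  # per-id mean
                        tv_sum = sum(hours),       # per-id total
                        tv_count = n()),           # per-id row count
            by = "id")
res
```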
I just picked up the package reshape today, and I'm having some trouble understanding how it works.
I have the following dataframe:
name workoutnum time weight raceid final position
tommy 1 12 140 1 2
tommy 2 14 140 1 2
tommy 3 11 140 1 2
sarah 1 10 115 1 1
sarah 2 10 115 1 1
sarah 3 11 115 1 1
sarah 4 15 115 1 1
How would I put all this in one row? So the dataframe would look like:
name workoutnum1 workoutnum2 workoutnum3 workoutnum4 time1 time2 time3 time4 weight raceid final_position
tommy 1 1 1 0 12 14 11 NA 140 1 2
sarah 1 1 1 1 10 10 11 15 115 1 1
So all columns would be attached to the workout values.
Is this even the proper way to do it?
reshape seems like a natural part of what you want to do, but won't get you all the way there.
Here's a reshape2 approach that fully melts the data, then casts it back to data.frame, with some tweaks along the way to get the desired output.
Note that in the call to melt(), the variables in the id.vars argument will remain wide. Then in dcast(), the variable that'll be cast wide is on the RHS of the ~.
library(reshape2)
library(dplyr)
# fully melt the data
d_melt <- melt(d, id.vars = c("name", "raceid", "position", "weight"))
# index the variables within name and variable
d_melt <- d_melt %>%
  group_by(name, variable) %>%
  mutate(i = row_number(),
         wide_variable = paste0(variable, i))
# cast as wide
d_wide <- dcast(d_melt, name + raceid + position + weight ~ wide_variable, value.var = "value")
# replace the workoutnum indices with indicators for missingness
d_wide %>% mutate_each(funs(ifelse(!is.na(.), 1L, 0L)), matches("workoutnum\\d"))
# name raceid position weight time1 time2 time3 time4 workoutnum1 workoutnum2
# 1 sarah 1 1 115 10 10 11 15 1 1
# 2 tommy 1 2 140 12 14 11 NA 1 1
# workoutnum3 workoutnum4
# 1 1 1
# 2 1 0
Data:
structure(list(name = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("sarah", "tommy"), class = "factor"), workoutnum = c(1L, 2L, 3L, 1L, 2L, 3L, 4L), time = c(12L, 14L, 11L, 10L, 10L, 11L, 15L), weight = c(140L, 140L, 140L, 115L, 115L, 115L, 115L), raceid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), position = c(2L, 2L, 2L, 1L, 1L, 1L, 1L)), .Names = c("name", "workoutnum", "time", "weight", "raceid", "position"), class = "data.frame", row.names = c(NA, -7L))
Here's an approach using dcast from "data.table", which reshapes a little more like the reshape function in base R.
The only change I've made to the data is the inclusion of another "time" variable, though, as pointed out by #rawr in the comments, it almost seems like your "workoutnum" is the time variable.
I've used getanID from my "splitstackshape" package to generate the "time" variable, but you can create this variable in many different ways.
library(splitstackshape)
dcast(getanID(mydf, c("name", "raceid", "final_position")),
name + raceid + final_position ~ .id,
value.var = c("workoutnum", "time", "weight"))
## name raceid final_position workoutnum_1 workoutnum_2 workoutnum_3
## 1: sarah 1 1 1 2 3
## 2: tommy 1 2 1 2 3
## workoutnum_4 time_1 time_2 time_3 time_4 weight_1 weight_2 weight_3 weight_4
## 1: 4 10 10 11 15 115 115 115 115
## 2: NA 12 14 11 NA 140 140 140 NA
If you're using getanID, you can also use reshape like this:
reshape(getanID(mydf, c("name", "raceid", "final_position")),
idvar = c("name", "raceid", "final_position"), timevar = ".id",
direction = "wide")
## name raceid final_position workoutnum.1 time.1 weight.1 workoutnum.2 time.2
## 1: tommy 1 2 1 12 140 2 14
## 2: sarah 1 1 1 10 115 2 10
## weight.2 workoutnum.3 time.3 weight.3 workoutnum.4 time.4 weight.4
## 1: 140 3 11 140 NA NA NA
## 2: 115 3 11 115 4 15 115
but dcast would be more efficient in general.
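On current versions of tidyr, the same wide reshape can also be sketched with pivot_wider() (using the data frame from the reshape2 answer above; the indicator step for the workoutnum columns is omitted here, so they keep their raw values):

```r
library(dplyr)
library(tidyr)

# Data as in the reshape2 answer
d <- data.frame(name = c("tommy", "tommy", "tommy", "sarah", "sarah", "sarah", "sarah"),
                workoutnum = c(1L, 2L, 3L, 1L, 2L, 3L, 4L),
                time = c(12L, 14L, 11L, 10L, 10L, 11L, 15L),
                weight = c(140L, 140L, 140L, 115L, 115L, 115L, 115L),
                raceid = 1L,
                position = c(2L, 2L, 2L, 1L, 1L, 1L, 1L))

d_wide <- d %>%
  group_by(name, raceid, position, weight) %>%
  mutate(i = row_number()) %>%                       # index workouts within each racer
  pivot_wider(names_from = i,
              values_from = c(workoutnum, time)) %>% # one column per workout index
  ungroup()
d_wide
```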