How to deal with data from many days using R
I have a data frame covering ten days, and I want to use all ten days of data for a general analysis.
For example: first, I need to split the data frame into groups by time interval (for example, 10 seconds). Second, calculate the percentage of value "1" in each group for columns C and D separately. Finally, plot the percentages for columns C and D against time in one graphic.
time B C D
1 2014-08-04 00:00:04.0 red 0 0
2 2014-08-04 00:00:06.0 red 0 0
3 2014-08-04 00:00:06.0 red 1 0
4 2014-08-04 00:00:06.2 red 0 0
5 2014-08-04 00:00:06.5 red 0 0
6 2014-08-04 00:00:07.0 red 0 1
7 2014-08-04 00:00:07.7 red 0 0
8 2014-08-04 00:00:16.0 red 0 0
9 2014-08-04 00:00:17.0 red 1 0
10 2014-08-04 00:00:18.0 red 0 0
11 2014-08-04 00:00:22.0 red 0 0
12 2014-08-04 00:00:22.0 red 0 0
13 2014-08-04 00:00:22.2 red 0 0
14 2014-08-04 00:00:25.0 red 1 0
15 2014-08-04 00:00:27.0 red 1 0
16 2014-08-04 00:00:28.0 red 0 0
17 2014-08-04 00:00:29.0 red/amber 1 0
18 2014-08-04 00:00:29.0 red/amber 1 1
19 2014-08-04 00:00:30.0 green 0 0
20 2014-08-04 00:00:40.0 green 0 1
21 2014-08-04 00:00:42.4 green 0 0
22 2014-08-04 00:00:43.0 green 0 0
23 2014-08-04 00:00:50.0 red 1 0
24 2014-08-04 00:00:51.2 red 0 0
25 2014-08-04 00:00:52.0 red 0 1
26 2014-08-04 00:00:52.0 red 1 0
27 2014-08-04 00:00:52.2 red 1 0
28 2014-08-04 00:00:52.9 red 1 1
29 2014-08-04 00:00:53.0 red 0 0
30 2014-08-04 00:00:59.0 red 0 1
31 2014-08-04 00:01:02.0 red 0 1
32 2014-08-04 00:01:03.2 red 0 1
33 2014-08-04 00:01:04.0 red 1 1
34 2014-08-04 00:01:06.4 red 0 1
35 2014-08-04 00:01:07.5 red 1 1
36 2014-08-04 00:01:08.0 red 0 1
37 2014-08-04 00:01:08.2 red 0 1
38 2014-08-04 00:01:08.4 red 0 1
39 2014-08-04 00:01:11.0 red 0 1
40 2014-08-04 00:01:13.0 red 0 1
41 2014-08-04 00:01:14.0 red 0 1
42 2014-08-04 00:01:15.0 red/amber 0 1
43 2014-08-04 00:01:15.0 red/amber 0 1
44 2014-08-04 00:01:16.0 green 0 1
45 2014-08-04 00:01:21.0 green 0 0
46 2014-08-04 00:01:26.0 green 0 0
47 2014-08-04 00:01:31.0 amber 0 0
48 2014-08-04 00:01:31.0 amber 0 0
49 2014-08-04 00:01:34.0 red 0 0
50 2014-08-04 00:01:36.0 red 0 0
The data for 11 August:
time B C D
1 2014-08-11 00:00:02.0 red 0 0
2 2014-08-11 00:00:03.0 red 0 0
3 2014-08-11 00:00:04.0 red 0 0
4 2014-08-11 00:00:07.0 red 0 0
5 2014-08-11 00:00:08.0 red 0 0
6 2014-08-11 00:00:08.0 red 0 0
7 2014-08-11 00:00:08.2 red 0 0
8 2014-08-11 00:00:08.5 red 0 0
9 2014-08-11 00:00:08.9 red 0 0
10 2014-08-11 00:00:09.0 red 0 0
11 2014-08-11 00:00:09.5 red 0 0
12 2014-08-11 00:00:10.0 red 0 0
13 2014-08-11 00:00:10.2 red 0 0
14 2014-08-11 00:00:10.4 red 0 0
15 2014-08-11 00:00:10.5 red 0 0
16 2014-08-11 00:00:10.7 red 0 0
17 2014-08-11 00:00:11.7 red 0 0
18 2014-08-11 00:00:11.9 red 0 0
19 2014-08-11 00:00:12.0 red 0 0
20 2014-08-11 00:00:12.0 red 0 0
21 2014-08-11 00:00:12.2 red 0 0
22 2014-08-11 00:00:12.2 red 0 0
23 2014-08-11 00:00:12.5 red 0 0
24 2014-08-11 00:00:12.7 red 0 0
25 2014-08-11 00:00:13.0 red 0 0
26 2014-08-11 00:00:13.2 red 0 0
27 2014-08-11 00:00:13.2 red 0 0
28 2014-08-11 00:00:13.5 red 0 0
29 2014-08-11 00:00:13.7 red 0 0
30 2014-08-11 00:00:13.9 red 0 0
31 2014-08-11 00:00:14.2 red 0 0
32 2014-08-11 00:00:14.4 red 0 0
33 2014-08-11 00:00:14.7 red 0 0
34 2014-08-11 00:00:14.7 red 0 0
35 2014-08-11 00:00:15.0 red 0 0
36 2014-08-11 00:00:15.0 red 0 0
37 2014-08-11 00:00:15.2 red 0 0
38 2014-08-11 00:00:16.5 red 0 1
39 2014-08-11 00:00:17.0 red 0 1
40 2014-08-11 00:00:17.0 red 0 1
41 2014-08-11 00:00:17.9 red 0 1
42 2014-08-11 00:00:18.0 red 0 1
43 2014-08-11 00:00:18.0 red 0 1
44 2014-08-11 00:00:18.2 red 0 1
45 2014-08-11 00:00:18.4 red 0 1
46 2014-08-11 00:00:18.5 red 0 1
47 2014-08-11 00:00:18.7 red 0 1
48 2014-08-11 00:00:19.0 red 0 1
49 2014-08-11 00:00:19.2 red 0 1
50 2014-08-11 00:00:19.7 red 0 1
I only know how to handle the data for a single day.
But how do I plot it using the data from all ten days? The x-axis should show only the time of day, not the date, so that the data from all days are combined into an average result.
This is just an example; I run into difficulties whenever I need to handle data from many days and average them into a general result. Thanks for the help. T^T
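library(reshape2)
library(ggplot2)
# Bin each timestamp into 10-second intervals
df$time <- as.POSIXct(cut(as.POSIXct(df$time), "10 secs"))
# Long format: C and D share one value column, identified by 'variable'
df.mlt <- melt(df, id.var=c("time", "B"))
# Plot the mean per bin (fun replaces fun.y, which is deprecated since ggplot2 3.3.0)
ggplot(df.mlt, aes(x=time, y=value, color=variable)) +
  stat_summary(geom="point", fun=mean, shape=1) +
  stat_smooth()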
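One way to extend the single-day plot above to several days is to move every timestamp onto the same reference date, so that only the time of day is left on the x-axis. The following is a minimal sketch under stated assumptions, not a definitive implementation: the sample rows and the column names tod, bin and long are made up for illustration, timestamps are assumed to be in UTC (so the time of day is the epoch time modulo 86400), and the plotting part assumes ggplot2 3.3.0 or later.

```r
# Hypothetical sample spanning two days, parsed in UTC.
df <- data.frame(
  time = as.POSIXct(c("2014-08-04 00:00:04", "2014-08-04 00:00:17",
                      "2014-08-11 00:00:02", "2014-08-11 00:00:16"), tz = "UTC"),
  C = c(0, 1, 0, 0),
  D = c(0, 0, 0, 1)
)

# Keep the time of day but place every row on the same date (1970-01-01).
df$tod <- as.POSIXct(as.numeric(df$time) %% 86400,
                     origin = "1970-01-01", tz = "UTC")

# 10-second bins aligned to the clock (00:00:00, 00:00:10, ...).
df$bin <- as.POSIXct(floor(as.numeric(df$tod) / 10) * 10,
                     origin = "1970-01-01", tz = "UTC")

# Long format: one row per observation and variable.
long <- rbind(
  data.frame(bin = df$bin, variable = "C", value = df$C),
  data.frame(bin = df$bin, variable = "D", value = df$D)
)

# Plot the mean per bin, pooled over all days, if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = bin, y = value, color = variable)) +
    stat_summary(geom = "point", fun = mean, shape = 1) +
    scale_x_datetime(date_labels = "%H:%M:%S")
}
```

Because all days share one date, rows from different days that fall into the same 10-second bin are summarised together by stat_summary.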
For the first two parts, you could try the following (here the data is split into 10-second bins; it is not clear whether you also want to keep the days separate):
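library(data.table)
# Bin each timestamp into 10-second intervals (bins start at the earliest timestamp)
df$time1 <- as.POSIXct(cut(as.POSIXct(df$time, format= "%Y-%m-%d %H:%M:%S"), "10 secs"))
df1 <- df[,-1] # drop the original time column
dt <- data.table(df1, key='time1')
# Percentage of rows equal to 1 in each bin, for C and D
dt1 <- dt[, list(C1=round(100*(sum(C==1)/.N),2), D1=round(100*(sum(D==1)/.N),2)), by=time1]
dt1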
# time1 C1 D1
#1: 2014-08-04 00:00:04 14.29 14.29
#2: 2014-08-04 00:00:14 16.67 0.00
#3: 2014-08-04 00:00:24 66.67 16.67
#4: 2014-08-04 00:00:34 0.00 33.33
#5: 2014-08-04 00:00:44 57.14 28.57
#6: 2014-08-04 00:00:54 0.00 100.00
#7: 2014-08-04 00:01:04 25.00 100.00
#8: 2014-08-04 00:01:14 0.00 80.00
#9: 2014-08-04 00:01:24 0.00 0.00
#10: 2014-08-04 00:01:34 0.00 0.00
#11: 2014-08-10 23:59:54 0.00 0.00
#12: 2014-08-11 00:00:04 0.00 0.00
#13: 2014-08-11 00:00:14 0.00 65.00
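The output above contains a bin labelled 2014-08-10 23:59:54 because cut() anchors its 10-second breaks at the earliest timestamp (00:00:04) rather than at the start of a minute. A minimal sketch of an alternative, assuming you want bins aligned to the clock and shared across days: compute the seconds since midnight and floor them to a multiple of 10. The sample rows and the column name timeN are illustrative only.

```r
# Hypothetical two-day sample; column names follow the question.
df <- data.frame(
  time = c("2014-08-04 00:00:04.0", "2014-08-04 00:00:06.0",
           "2014-08-04 00:00:16.0", "2014-08-11 00:00:02.0",
           "2014-08-11 00:00:08.5", "2014-08-11 00:00:17.0"),
  C = c(0, 1, 1, 0, 0, 0),
  D = c(0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Parse with fractional seconds and take the seconds since midnight.
tm   <- strptime(df$time, "%Y-%m-%d %H:%M:%OS", tz = "UTC")
secs <- tm$hour * 3600 + tm$min * 60 + tm$sec

# Floor to the start of the 10-second bin and format as HH:MM:SS.
bin <- as.integer(floor(secs / 10) * 10)
df$timeN <- sprintf("%02d:%02d:%02d",
                    bin %/% 3600L, (bin %% 3600L) %/% 60L, bin %% 60L)

# Percentage of rows equal to 1 per time-of-day bin, pooled over days.
pct <- aggregate(cbind(C, D) ~ timeN, data = df,
                 FUN = function(x) 100 * mean(x == 1))
pct
```

With this binning, 00:00:04 on one day and 00:00:02 on another fall into the same 00:00:00 group, so no bin spills over into the previous date.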
Update
dt1[, list(C1=mean(C1), D1= mean(D1)), by=list(timeN=gsub("^.*\\s+","", time1))]
# timeN C1 D1
#1: 00:00:04 7.145 7.145
#2: 00:00:14 8.335 32.500
#3: 00:00:24 66.670 16.670
#4: 00:00:34 0.000 33.330
#5: 00:00:44 57.140 28.570
#6: 00:00:54 0.000 100.000
#7: 00:01:04 25.000 100.000
#8: 00:01:14 0.000 80.000
#9: 00:01:24 0.000 0.000
#10: 00:01:34 0.000 0.000
#11: 23:59:54 0.000 0.000
Update2
I think this is what you need. The values differ from the previous update: there, each value was the average of the per-day percentages. Here, the percentage is computed over all rows that fall into the same time-of-day interval, pooled across days. This is probably the more appropriate calculation.
df1$timeN <- gsub("^.*\\s+", "", df1$time1)
dt <- data.table(df1, key='timeN')
dt1 <- dt[,list(C1=round(100*(sum(C==1)/.N),2), D1=round(100*(sum(D==1)/.N),2)), by=timeN]
dt1
# timeN C1 D1
#1: 00:00:04 14.29 14.29
#2: 00:00:14 16.67 0.00
#3: 00:00:24 66.67 16.67
#4: 00:00:34 0.00 33.33
#5: 00:00:44 57.14 28.57
#6: 00:00:54 0.00 100.00
#7: 00:01:04 25.00 100.00
#8: 00:01:14 0.00 80.00
#9: 00:01:24 0.00 0.00
#10: 00:01:34 0.00 0.00
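The difference between the two updates (averaging per-day percentages versus pooling all rows of an interval) can be illustrated with a small worked example. This is only a sketch with made-up numbers: one interval observed on two days, with 2 rows on the first day and 8 rows on the second.

```r
# Hypothetical observations of column C in one time-of-day interval.
obs <- data.frame(
  day = c(rep("2014-08-04", 2), rep("2014-08-11", 8)),
  C   = c(1, 0, rep(0, 8))
)

# Approach of the first update: percentage per day, then the mean of those.
per_day    <- 100 * tapply(obs$C == 1, obs$day, mean)  # 50 and 0
avg_of_pct <- mean(per_day)                            # 25

# Approach of the second update: pool all rows, then take one percentage.
pooled_pct <- 100 * mean(obs$C == 1)                   # 10
```

The two results agree only when every day contributes the same number of rows to the interval; otherwise the average of percentages gives days with few rows the same weight as days with many.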
Related
get the name of child list in a list in R with lapply function
How can I get the name of a child list in a list in R? My list is like:
$sd1
    freq value order
11  1.15    17     0
12  2.12    13     0
13  2.81    21     0
14  4.13    15     0
15  4.84    18     0
16  7.54    59     0
17  9.36    17     0
$sd2
    freq value order
31  0.63     4     0
32  1.54     3     0
33  3.22     3     0
34  3.98     4     0
35  4.66    38     0
36  7.14     3     0
37  9.39    29     0
$sd3
    freq value order
41  0.97     4     0
42  2.03     7     0
43  2.65     4     0
44  3.34   680     0
45  4.15     4     0
46  6.67    10     0
47  7.51     6     0
48  8.35     4     0
49 10.57     4     0
50 15.97     6     0
I'd like to get sd1, sd2, ... with the lapply function and make some changes to each child list (sd1, sd2, etc.).
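A minimal sketch of one possible approach, assuming the goal is to have both the name and the contents of each child element available inside the function: iterate over names(lst) instead of over the list itself. The list lst, its shortened contents, and the added column source are hypothetical and only serve as an illustration.

```r
# Hypothetical list with two named child data frames.
lst <- list(
  sd1 = data.frame(freq = c(1.15, 2.12), value = c(17, 13), order = 0),
  sd2 = data.frame(freq = c(0.63, 1.54), value = c(4, 3),  order = 0)
)

# Iterate over the names; setNames() makes lapply() return a named list.
res <- lapply(setNames(names(lst), names(lst)), function(nm) {
  child <- lst[[nm]]   # the child element itself
  child$source <- nm   # example change that uses the child's name
  child
})

names(res)
```

Passing the names rather than the elements is what makes the name visible inside the function; lapply(lst, ...) alone only hands over the contents.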
Split data frame into groups by time and apply a function to multiple columns using R
Data frame sg is as follows (the same 50 rows for 2014-08-04, with columns time, B, C and D, that are listed in the question at the top of this page).
First, I need to split the data frame into groups by time interval (for example, 10 seconds). Second, calculate the percentage of value "1" in each group for columns C and D separately. Finally, plot the percentages for columns C and D against time in one graphic.
I did it for a single variable. My solution is:
percentage.occupied <- function(x) (NROW(subset(x,C==1)))/(NROW(x))
splitbytime <- ddply(selectstatus309, .(cut(time,"10 seconds")), percentage.occupied)
colnames(splitbytime) <- c("time","occupancy")
occupancy <- ggplot(splitbytime, aes(x=(as.POSIXct(splitbytime$time)), y=occupancy)) + geom_point(shape=1) + geom_smooth() + xlab("time") + ylab("% occupancy")
The graphic looks like the following picture, plotted for column C. What I need is to plot the percentage for C and D respectively in one graphic. I am not sure if I described my question clearly (┬_┬)
I took BrodieG's solution and applied it to a period (1 hour) of my data. I followed each step, but the plot looks wrong. Besides, there is an error:
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in smooth.construct.cr.smooth.spec(object, data, knots) : x has insufficient unique values to support 10 knots: reduce k.
I guess the error is not the reason for the strange plot. Below is part of the melted df, from which I infer that the result cannot be just 1 or 0.
                     time     B variable value
10520 2014-08-04 15:10:00 green     dt_5     0
10521 2014-08-04 15:10:00 green     dt_5     0
10522 2014-08-04 15:10:00 green     dt_5     0
10523 2014-08-04 15:10:00 green     dt_5     0
10524 2014-08-04 15:10:00 green     dt_5     0
10525 2014-08-04 15:10:00 green     dt_5     0
10526 2014-08-04 15:10:00 green     dt_5     0
10527 2014-08-04 15:10:00 green     dt_5     0
10528 2014-08-04 15:10:00 green     dt_5     1
10529 2014-08-04 15:10:00 amber     dt_5     1
10530 2014-08-04 15:10:00 amber     dt_5     1
10531 2014-08-04 15:10:00 amber     dt_5     1
10532 2014-08-04 15:10:00 amber     dt_5     1
10533 2014-08-04 15:10:00 amber     dt_5     1
10534 2014-08-04 15:10:00 amber     dt_5     1
10535 2014-08-04 15:10:00 amber     dt_5     0
10536 2014-08-04 15:10:00 amber     dt_5     0
10537 2014-08-04 15:10:00 amber     dt_5     0
10538 2014-08-04 15:10:00 amber     dt_5     0
10539 2014-08-04 15:10:00 amber     dt_5     0
10540 2014-08-04 15:10:00 amber     dt_5     0
10541 2014-08-04 15:10:00   red     dt_5     0
10542 2014-08-04 15:10:00   red     dt_5     0
10543 2014-08-04 15:10:00   red     dt_5     0
10544 2014-08-04 15:10:00   red     dt_5     0
10545 2014-08-04 15:10:00   red     dt_5     0
The code is here:
selectstatus309.mlt <- melt(selectstatus309, id.var=c("time","B"))
percentage <- ggplot(selectstatus309.mlt, aes(x=time, y=value, color=variable)) +
  stat_summary(geom="point", fun.y=mean, shape=1) +
  stat_smooth() +
  facet_wrap(~ B)
Sorry for the looooong and verbose story! T。T
Here is an option. First we make our cut time data:
library(reshape2)
library(ggplot2)
df$time <- as.POSIXct(cut(as.POSIXct(df$time), "10 secs"))
Then we melt it so the values in C and D are in the same column, which lets us use that column as an aesthetic. This is the key step for getting the two plots into the same graphic, as you want. Inspect df.mlt to see how it differs from df. ggplot likes data in long format so that it can use its built-in data segmentation tools.
df.mlt <- melt(df, id.var=c("time", "B"))
Then we use stat_summary to plot the dots (no need to resort to ddply):
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
ggplot(df.mlt, aes(x=time, y=value, color=variable)) +
  stat_summary(geom="point", fun=mean, shape=1) +
  stat_smooth()
produces (on your subset of data):
Note how I'm able to split out the data by whether it is "C" or "D". You can even facet by B:
ggplot(df.mlt, aes(x=time, y=value, color=variable)) +
  stat_summary(geom="point", fun=mean, shape=1) +
  stat_smooth() +
  facet_wrap(~ B)
Error in is.constant(y) : (list) object cannot be coerced to type 'double'
I have a sample file that I am using to forecast as a panel series. The steps followed are:
library(plm)
library(Formula)
library(forecast)
library(timeDate)
library(zoo)
df1 <- read.csv("full panel Test Input.csv", header=TRUE, sep=",")
pdf1 <- plm.data(df1, index=c("state","time"))
fmodel1 <- plm(M4~M1+M2+M3, data=pdf1, model="within")
fcast <- forecast(fmodel1, data=pdf1)
I get the error stated in the subject exactly at step nine. The process and the data are fine, since we have checked them in Stata. Any help is appreciated.
     M1    M2  M3  M4 state    time
    466    63  14  10    AZ 2013w31
      0    63   0   0    AZ 2013w32
    480    63  77 270    AZ 2013w33
      0    63   0   0    AZ 2013w34
     10     0 742  40    AZ 2013w35
      0     0   0   0    AZ 2013w36
      0     0 210  10    AZ 2013w37
   1049    28 168  30    AZ 2013w38
   1148    35 203  20    AZ 2013w39
   5130   182  21  10    AZ 2013w40
   8667   427   0  10    AZ 2013w41
 460000 10731  14  20    AZ 2013w42
1000000 27608   0 120    AZ 2013w43
      0 27608   0   0    AZ 2013w44
  18494  1344   7  30    AZ 2013w45
  15775  1176  21  10    AZ 2013w46
  15516  1197   0  40    AZ 2013w47
      0  1197   0   0    AZ 2013w48
      0  1197   0   0    AZ 2013w49
  11280   700   0  30    AZ 2013w50
   6320   336  14  50    AZ 2013w51
    765    35   0  20    AZ 2013w52
    230     0   0  10    AZ  2014w1
      0     0   0   0    NJ 2013w31
      0     0   0   0    NJ 2013w32
      6     0   0  10    NJ 2013w33
Convert data frame from wide to long with 2 variables
I have the following wide data frame (mydf.wide):
DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0
I want to produce the following "semi-long" format:
DAY variable_month value_month value_F
  1            JAN         169       0
I tried:
library(reshape2)
mydf.long <- melt(mydf.wide, id.vars=c("YEAR","DAY"), measure.vars=c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
but this skips the F variables, and I don't know how to deal with two sets of variables...
This is one of those cases where reshape(...) in base R is a better option.
months <- c(2,4,6,8,10,12,14,16,18,20,22,24) # column numbers of months
F <- c(3,5,7,9,11,13,15,17,19,21,23,25) # column numbers of Fn
mydf.long <- reshape(mydf.wide, idvar=1, times=colnames(mydf.wide)[months], varying=list(months,F), v.names=c("value_month","value_F"), direction="long")
colnames(mydf.long)[2] <- "variable_month"
head(mydf.long)
#       DAY variable_month value_month value_F
# 1.JAN   1            JAN         169       0
# 2.JAN   2            JAN         193       0
# 3.JAN   3            JAN         246       0
# 4.JAN   4            JAN         222       0
# 1.FEB   1            FEB         296       0
# 2.FEB   2            FEB         554       0
You can also do this with 2 calls to melt(...):
library(reshape2)
months <- c(2,4,6,8,10,12,14,16,18,20,22,24) # column numbers of months
F <- c(3,5,7,9,11,13,15,17,19,21,23,25) # column numbers of Fn
z.1 <- melt(mydf.wide, id=1, measure=months, variable.name="variable_month", value.name="value_month")
z.2 <- melt(mydf.wide, id=1, measure=F, value.name="value_F")
mydf.long <- cbind(z.1, value_F=z.2$value_F)
head(mydf.long)
#   DAY variable_month value_month z.2$value_F
# 1   1            JAN         169           0
# 2   2            JAN         193           0
# 3   3            JAN         246           0
# 4   4            JAN         222           0
# 5   1            FEB         296           0
# 6   2            FEB         554           0
melt() and dcast() are available from the reshape2 and data.table packages. Recent versions of data.table allow melting multiple columns simultaneously. The patterns() parameter can be used to specify the two sets of columns by regular expressions:
library(data.table) # CRAN version 1.10.4 used
regex_month <- toupper(paste(month.abb, collapse = "|"))
mydf.long <- melt(setDT(mydf.wide), measure.vars = patterns(regex_month, "F\\d"), value.name = c("MONTH", "F"))
# rename factor levels
mydf.long[, variable := forcats::lvls_revalue(variable, toupper(month.abb))][]
    DAY variable MONTH F
 1:   1      JAN   169 0
 2:   2      JAN   193 0
 3:   3      JAN   246 0
 4:   4      JAN   222 0
 5:   1      FEB   296 0
...
44:   4      NOV   241 0
45:   1      DEC   209 0
46:   2      DEC   250 0
47:   3      DEC   164 0
48:   4      DEC   230 0
    DAY variable MONTH F
Note that "F\\d" is used as the regular expression in patterns(). A simple "F" would have caught FEB as well as F1, F2, etc., producing unexpected results.
Also note that mydf.wide needs to be coerced to a data.table object. Otherwise, reshape2::melt() will be dispatched on a data.frame object, which doesn't recognize patterns().
Data
library(data.table)
mydf.wide <- fread(
"DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0",
data.table = FALSE)
R Aggregate A Data Frame By Columns Instead of By Rows
I am trying to aggregate the columns of this data frame by unique column name (date). I keep getting an error. I have tried merge_all, merge_recurse, and aggregate but cannot get it to work. I have hit an impasse that is seemingly unconquerable with my knowledge set, and I cannot find answers that help anywhere. Is this even possible? The data frame is below:
          2014-02-14 2014-02-14 2014-02-14 2014-02-21 2014-06-20 2014-06-20 2014-06-20 2014-09-19 Totals
PutWing           12         -6          0        171          7        -31          0          0    -77
Ten               -6          0          0         24        -19         52          0          0    -10
Eighteen         -15          0          0         73          0        -70          0          0    100
Thirty             0          0          0       -149         41         64          0          0   -463
FortyTwo           0          0          0        -91          0        121          0          0    426
ATM               44          0          0       -118        -25       -199          0          0   -134
FortyTwoC          0          0          0        -67         14          0          0          0    792
ThirtyC            0          0          0         79          0          0          0          0   -509
EighteenC         61          0          0        -57          0        -32          0          0     20
CallWing           1          0          0        -48          0          0          0          0    -28
Totals            95         -6          0       -183         17        -95          0          0    116
SlopeRisk          0          0          0         26          5         -6          0          0    -26
Assuming your data is in df:
df <- t(df)
rownames(df) <- substr(rownames(df), 1, 11) # only necessary if you get funny row names from data import; if your data is as it's shown you can skip this step.
df.agg <- aggregate(df, by=list(rownames(df)), sum)
row.names(df.agg) <- df.agg[[1]]
t(df.agg[-1])
Produces:
#           Totals X2014.02.14 X2014.02.21 X2014.06.20 X2014.09.19
# PutWing      -77           6         171         -24           0
# Ten          -10          -6          24          33           0
# Eighteen     100         -15          73         -70           0
# Thirty      -463           0        -149         105           0
# FortyTwo     426           0         -91         121           0
# ATM         -134          44        -118        -224           0
# FortyTwoC    792           0         -67          14           0
# ThirtyC     -509           0          79           0           0
# EighteenC     20          61         -57         -32           0
# CallWing     -28           1         -48           0           0
# Totals       116          89        -183         -78           0
# SlopeRisk    -26           0          26          -1           0
Basically, you need to transpose your data to use all the group/apply functions that R offers. After transposing, you could also use plyr, data.table, or dplyr to do the aggregation instead of aggregate as I did, but those are all non-base packages. The column names, etc., will need some cleaning up, but I'll leave that up to you.
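The same transpose-then-aggregate idea can be sketched with base R's rowsum(), which sums the rows of a matrix by group. This swaps in a different function from the aggregate() call used in the answer and is only an illustration; the matrix m holds three rows and three columns taken from the question's data, and it assumes the duplicated dates are available as column names.

```r
# Small matrix with duplicated column names (values from the question).
m <- matrix(
  c(12, -6, -15,    # first  2014-02-14 column
    -6,  0,   0,    # second 2014-02-14 column
   171, 24,  73),   # 2014-02-21 column
  nrow = 3,
  dimnames = list(c("PutWing", "Ten", "Eighteen"),
                  c("2014-02-14", "2014-02-14", "2014-02-21"))
)

# Transpose so dates become rows, sum rows that share a date,
# then transpose back to the original orientation.
agg <- t(rowsum(t(m), group = colnames(m)))
agg
```

Here the two 2014-02-14 columns collapse into one, giving 6, -6 and -15 for PutWing, Ten and Eighteen, which matches the corresponding rows of the output above.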