Ceph CRUSH map - replication - rules

I'm still somewhat confused about how Ceph CRUSH maps work and was hoping someone could shed some light. Here's my osd tree:
core#store101 ~ $ ceph osd tree
ID  WEIGHT  TYPE NAME                      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1  6.00000 root default
-2  3.00000     datacenter dc1
-4  3.00000         rack rack_dc1
-10 1.00000             host store101
4   1.00000                 osd.4               up  1.00000          1.00000
-7  1.00000             host store102
1   1.00000                 osd.1               up  1.00000          1.00000
-9  1.00000             host store103
3   1.00000                 osd.3               up  1.00000          1.00000
-3  3.00000     datacenter dc2
-5  3.00000         rack rack_dc2
-6  1.00000             host store104
0   1.00000                 osd.0               up  1.00000          1.00000
-8  1.00000             host store105
2   1.00000                 osd.2               up  1.00000          1.00000
-11 1.00000             host store106
5   1.00000                 osd.5               up  1.00000          1.00000
I'm simply trying to make sure that, with a replication value of 2 or more, the replicas of an object never all end up in the same datacenter. The rule I had (taken from the internet) is:
rule replicated_ruleset_dc {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 2 type datacenter
    step choose firstn 2 type rack
    step chooseleaf firstn 0 type host
    step emit
}
However, if I dump the placement groups, straight off I see two OSDs from the same datacenter: OSDs 5 and 0.
core#store101 ~ $ ceph pg dump | grep 5,0
1.73 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.939197 0'0 96:113 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854945 0'0 2015-07-09 12:05:01.854945
1.70 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.947403 0'0 96:45 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854941 0'0 2015-07-09 12:05:01.854941
1.6f 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.947056 0'0 96:45 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854940 0'0 2015-07-09 12:05:01.854940
1.6c 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.938591 0'0 96:45 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854939 0'0 2015-07-09 12:05:01.854939
1.66 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.937803 0'0 96:107 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854936 0'0 2015-07-09 12:05:01.854936
1.67 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.929323 0'0 96:33 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854937 0'0 2015-07-09 12:05:01.854937
1.65 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.928200 0'0 96:33 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854936 0'0 2015-07-09 12:05:01.854936
1.63 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.927642 0'0 96:107 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854935 0'0 2015-07-09 12:05:01.854935
1.3f 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.924738 0'0 96:33 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854920 0'0 2015-07-09 12:05:01.854920
1.36 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.917833 0'0 96:45 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854916 0'0 2015-07-09 12:05:01.854916
1.33 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.911484 0'0 96:104 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854915 0'0 2015-07-09 12:05:01.854915
1.2b 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.878280 0'0 96:58 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854911 0'0 2015-07-09 12:05:01.854911
1.5 0 0 0 0 0 0 0 0 active+clean 2015-07-09 13:41:36.942620 0'0 96:98 [5,0] 5 [5,0] 5 0'0 2015-07-09 12:05:01.854892 0'0 2015-07-09 12:05:01.854892
How do I ensure that at least one replica is always in another datacenter?

I changed my Ceph CRUSH map yesterday:
ID  WEIGHT    TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1  181.99979 root default
-12 90.99989      rack rack1
-2  15.46999          host ceph0
1   3.64000               osd.1        up  1.00000          1.00000
0   3.64000               osd.0        up  1.00000          1.00000
8   2.73000               osd.8        up  1.00000          1.00000
9   2.73000               osd.9        up  1.00000          1.00000
19  2.73000               osd.19       up  1.00000          1.00000
...
-13 90.99989      rack rack2
-3  15.46999          host ceph2
2   3.64000               osd.2        up  1.00000          1.00000
3   3.64000               osd.3        up  1.00000          1.00000
10  2.73000               osd.10       up  1.00000          1.00000
11  2.73000               osd.11       up  1.00000          1.00000
18  2.73000               osd.18       up  1.00000          1.00000
...
rack rack1 {
    id -12          # do not change unnecessarily
    # weight 91.000
    alg straw
    hash 0          # rjenkins1
    item ceph0 weight 15.470
    ...
}
rack rack2 {
    id -13          # do not change unnecessarily
    # weight 91.000
    alg straw
    hash 0          # rjenkins1
    item ceph2 weight 15.470
    ...
}
root default {
    id -1           # do not change unnecessarily
    # weight 182.000
    alg straw
    hash 0          # rjenkins1
    item rack1 weight 91.000
    item rack2 weight 91.000
}
rule racky {
    ruleset 3
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}
Please show your "root default" section.
And try this:
rule replicated_ruleset_dc {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type datacenter
    step emit
}
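Once you've edited the rule, you can sanity-check the mappings offline with crushtool before relying on them. A minimal sketch, assuming the compiled map is dumped to a file called crush.map (the file name and pool name below are placeholders):
ceph osd getcrushmap -o crush.map
crushtool -i crush.map --test --rule 0 --num-rep 2 --show-mappings
Every reported mapping should contain one OSD from each datacenter. Also verify that the pool actually uses ruleset 0 (pre-Luminous syntax):
ceph osd pool get <poolname> crush_ruleset
ceph osd pool set <poolname> crush_ruleset 0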

Related

get the name of child list in a list in R with lapply function

How can I get the names of the child lists in a list in R? My list is like:
$sd1
freq value order
11 1.15 17 0
12 2.12 13 0
13 2.81 21 0
14 4.13 15 0
15 4.84 18 0
16 7.54 59 0
17 9.36 17 0
$sd2
freq value order
31 0.63 4 0
32 1.54 3 0
33 3.22 3 0
34 3.98 4 0
35 4.66 38 0
36 7.14 3 0
37 9.39 29 0
$sd3
freq value order
41 0.97 4 0
42 2.03 7 0
43 2.65 4 0
44 3.34 680 0
45 4.15 4 0
46 6.67 10 0
47 7.51 6 0
48 8.35 4 0
49 10.57 4 0
50 15.97 6 0
I'd like to get sd1, sd2, ... with an lapply function and make some changes to each child list of sd1, sd2, etc.
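A common pattern is to iterate over names(mylist) rather than the list itself, so each iteration knows which child it is handling. A minimal sketch, assuming your list is called mylist and the "change" is just adding the child's name as a column (both are placeholders for your own list and transformation):
result <- lapply(names(mylist), function(nm) {
  child <- mylist[[nm]]   # the child data frame, e.g. sd1
  child$source <- nm      # example change that uses the child's name
  child
})
names(result) <- names(mylist)   # keep sd1, sd2, ... on the result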

R plot function gives a weird result

I tried to use R's plot to draw the curve of a function. Sometimes I get very weird results. Here is an example:
u=c(2,2,2,2,2)
elas=function(P){
  prob=P/sum(u,P)
  return(prob)
}
plot(elas,0,6)
This code produces an obviously wrong plot. I know that if I change the 3rd line of the code to
prob=P/(sum(u)+P)
it works. But I do not understand why my original code does not work. Does it mean that I cannot plot a function that calls another function inside it?
sum(u,P) is a single value equal to the sum of all the values in u and P. So in elas, every value of P gets divided by the same number (313 in your example).
sum(u) + P is a vector containing each individual value of P with sum(u) added to it. So in the second version of elas (which I've called elas2 below), P/(sum(u) + P) results in element-by-element division of P by sum(u) + P.
Consider the examples below.
u=c(2,2,2,2,2)
x=seq(0,6,length=101)
sum(u,x)
[1] 313
sum(u) + x
[1] 10.00 10.06 10.12 10.18 10.24 10.30 10.36 10.42 10.48 10.54 10.60 10.66 10.72 10.78
[15] 10.84 10.90 10.96 11.02 11.08 11.14 11.20 11.26 11.32 11.38 11.44 11.50 11.56 11.62
[29] 11.68 11.74 11.80 11.86 11.92 11.98 12.04 12.10 12.16 12.22 12.28 12.34 12.40 12.46
[43] 12.52 12.58 12.64 12.70 12.76 12.82 12.88 12.94 13.00 13.06 13.12 13.18 13.24 13.30
[57] 13.36 13.42 13.48 13.54 13.60 13.66 13.72 13.78 13.84 13.90 13.96 14.02 14.08 14.14
[71] 14.20 14.26 14.32 14.38 14.44 14.50 14.56 14.62 14.68 14.74 14.80 14.86 14.92 14.98
[85] 15.04 15.10 15.16 15.22 15.28 15.34 15.40 15.46 15.52 15.58 15.64 15.70 15.76 15.82
[99] 15.88 15.94 16.00
par(mfrow=c(1,3))
elas=function(P) {
  P/sum(u,P)
}
dat = data.frame(x, y=elas(x), y_calc=x/sum(u,x))
plot(dat$x, dat$y, type="l", lwd=2, ylim=c(0,0.020))
plot(elas, 0, 6, lwd=2, ylim=c(0,0.020))
curve(elas, 0, 6, lwd=2, ylim=c(0,0.020))
dat$y - dat$y_calc
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[43] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[85] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
elas2 = function(P) {
  P/(sum(u) + P)
}
dat$y2 = elas2(x)
plot(dat$x, dat$y2, type="l", lwd=2, ylim=c(0,0.4))
plot(elas2, 0, 6, lwd=2, ylim=c(0,0.4))
curve(elas2, 0, 6, lwd=2, ylim=c(0,0.4))
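As a side note (a small sketch beyond the answer above): if you really want plot() or curve() to accept the original elas that calls sum(u, P) internally, you can wrap it with base R's Vectorize() so it is evaluated one point at a time; for a scalar P, sum(u, P) is the same as sum(u) + P:
elas_v <- Vectorize(function(P) P/sum(u, P))   # called once per value of P
curve(elas_v, 0, 6)                            # now matches elas2's curve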
sum(u, P) = a single number: the sum of all the values in u and all the values in P.
sum(u) + P = a vector: sum(u) added to each value of P.
Example: u = c(1,2,3), P = c(5,6)
sum(u, P) = (1+2+3) + (5+6) = 17, a single number
sum(u) + P = 6 + c(5,6) = c(11, 12), a vector
For
elas <- function(P){
  prob = P/sum(u, P)
  return(prob)
}
with u <- c(2,2,2,2,2):
y <- elas(0:6)
print(y)
# output of print(y):
# [1] 0.00000000 0.03225806 0.06451613 0.09677419 0.12903226 0.16129032 0.19354839
plot(0:6,y)
For
elas <- function(P){
  prob = P/(sum(u) + P)
  return(prob)
}
y <- elas(0:6)
print(y)
# output of print(y):
# [1] 0.00000000 0.09090909 0.16666667 0.23076923 0.28571429 0.33333333 0.37500000
plot(0:6, y)

How to extract indexes from a vector contained in a list?

I tried to extract the indexes from okresy where the values fulfill the condition in ifelse. The results of the lapply loop shown below confuse me. What are these ascending large numbers, and how can I extract the indexes from each vector in the list?
okresy <- list(okres96, okres97, okres98, okres99, okres00, okres01, okres02, okres03, okres04, okres05, okres06, okres07, okres08, okres09, okres10, okres11, okres12, okres13, okres14, okres15, okres16, okres17)
day1 <- "1996-05-31"
day2 <- "2012-05-02"
day1 <- as.Date(day1, "%Y-%m-%d")
day2 <- as.Date(day2, "%Y-%m-%d")
values <- lapply(okresy, function(x) ifelse(day1 <= x & x <= day2, x, 0))
values
Results:
[[1]]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[103] 0 0 9647 9650 9651 9652 9654 9657 9658 9659 9660 9661 9664 9665 9666 9667 9668 9671 9672 9673 9674 9675 9678 9679 9680 9681 9682 9685 9686 9687 9688 9689 9692 9693
[137] 9694 9695 9696 9699 9700 9701 9702 9703 9706 9707 9708 9709 9710 9713 9714 9715 9716 9717 9720 9721 9722 9724 9727 9728 9729 9730 9731 9734 9735 9736 9737 9738 9741 9742
[171] 9743 9744 9745 9748 9749 9750 9751 9752 9755 9756 9757 9758 9759 9762 9763 9764 9765 9766 9769 9770 9771 9772 9773 9776 9777 9778 9779 9780 9783 9784 9785 9786 9787 9790
[205] 9791 9792 9793 9794 9797 9798 9799 9800 9804 9805 9806 9807 9808 9812 9813 9814 9815 9818 9819 9820 9821 9822 9825 9826 9827 9828 9829 9832 9833 9834 9835 9836 9839 9840
[239] 9841 9842 9843 9846 9847 9848 9849 9850 9853 9854 9860 9861
[[2]]
[1] 9863 9864 9867 9868 9869 9870 9871 9874 9875 9876 9877 9878 9881 9882 9883 9884 9885 9888 9889 9890 9891 9892 9895 9896 9897 9898 9899 9902
[29] 9903 9904 9905 9906 9909 9910 9911 9912 9913 9916 9917 9918 9919 9920 9923 9924 9925 9926 9927 9930 9931 9932 9933 9934 9937 9938 9939 9940
[57] 9941 9944 9945 9946 9947 9948 9952 9953 9954 9955 9958 9959 9960 9961 9962 9965 9966 9967 9968 9969 9972 9973 9974 9975 9976 9979 9980 9981
[85] 9986 9987 9988 9989 9990 9993 9994 9995 9996 9997 10000 10001 10002 10003 10004 10007 10008 10009 10011 10014 10015 10016 10017 10018 10021 10022 10023 10024
[113] 10025 10028 10029 10030 10031 10032 10035 10036 10037 10038 10039 10042 10043 10044 10045 10046 10049 10050 10051 10052 10053 10056 10057 10058 10059 10060 10063 10064
[141] 10065 10066 10067 10070 10071 10072 10073 10074 10077 10078 10079 10080 10081 10084 10085 10086 10087 10091 10092 10093 10094 10095 10098 10099 10100 10101 10102 10105
[169] 10106 10107 10108 10109 10112 10113 10114 10115 10116 10119 10120 10121 10122 10123 10126 10127 10128 10129 10130 10133 10134 10135 10136 10137 10140 10141 10142 10143
[197] 10144 10147 10148 10149 10150 10151 10154 10155 10156 10157 10158 10161 10162 10163 10164 10165 10168 10169 10170 10171 10172 10177 10178 10179 10182 10183 10184 10185
[225] 10186 10189 10190 10191 10192 10193 10196 10197 10198 10199 10200 10203 10204 10205 10206 10207 10210 10211 10212 10213 10214 10217 10218 10219 10224 10225 10226
(...)
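Two facts explain these numbers. The elements of okresy are Date vectors, which R stores internally as days since 1970-01-01 (9647 corresponds to 1996-05-31, your day1), and ifelse() drops the Date class from its result, so you see the raw day counts. To get positions rather than values, which() is the usual tool; a minimal sketch, assuming each element of okresy is a Date vector:
idx <- lapply(okresy, function(x) which(day1 <= x & x <= day2))   # indexes, not dates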

How to deal with many days of data using R

I have a data frame covering ten days, and I want to use the ten days of data for a general analysis. For example: first, I need to split the data frame into groups by time interval (for example, 10 seconds); second, calculate the percentage of the value "1" in each group for columns C and D separately; finally, plot the percentages for columns C and D against time in one graphic.
time B C D
1 2014-08-04 00:00:04.0 red 0 0
2 2014-08-04 00:00:06.0 red 0 0
3 2014-08-04 00:00:06.0 red 1 0
4 2014-08-04 00:00:06.2 red 0 0
5 2014-08-04 00:00:06.5 red 0 0
6 2014-08-04 00:00:07.0 red 0 1
7 2014-08-04 00:00:07.7 red 0 0
8 2014-08-04 00:00:16.0 red 0 0
9 2014-08-04 00:00:17.0 red 1 0
10 2014-08-04 00:00:18.0 red 0 0
11 2014-08-04 00:00:22.0 red 0 0
12 2014-08-04 00:00:22.0 red 0 0
13 2014-08-04 00:00:22.2 red 0 0
14 2014-08-04 00:00:25.0 red 1 0
15 2014-08-04 00:00:27.0 red 1 0
16 2014-08-04 00:00:28.0 red 0 0
17 2014-08-04 00:00:29.0 red/amber 1 0
18 2014-08-04 00:00:29.0 red/amber 1 1
19 2014-08-04 00:00:30.0 green 0 0
20 2014-08-04 00:00:40.0 green 0 1
21 2014-08-04 00:00:42.4 green 0 0
22 2014-08-04 00:00:43.0 green 0 0
23 2014-08-04 00:00:50.0 red 1 0
24 2014-08-04 00:00:51.2 red 0 0
25 2014-08-04 00:00:52.0 red 0 1
26 2014-08-04 00:00:52.0 red 1 0
27 2014-08-04 00:00:52.2 red 1 0
28 2014-08-04 00:00:52.9 red 1 1
29 2014-08-04 00:00:53.0 red 0 0
30 2014-08-04 00:00:59.0 red 0 1
31 2014-08-04 00:01:02.0 red 0 1
32 2014-08-04 00:01:03.2 red 0 1
33 2014-08-04 00:01:04.0 red 1 1
34 2014-08-04 00:01:06.4 red 0 1
35 2014-08-04 00:01:07.5 red 1 1
36 2014-08-04 00:01:08.0 red 0 1
37 2014-08-04 00:01:08.2 red 0 1
38 2014-08-04 00:01:08.4 red 0 1
39 2014-08-04 00:01:11.0 red 0 1
40 2014-08-04 00:01:13.0 red 0 1
41 2014-08-04 00:01:14.0 red 0 1
42 2014-08-04 00:01:15.0 red/amber 0 1
43 2014-08-04 00:01:15.0 red/amber 0 1
44 2014-08-04 00:01:16.0 green 0 1
45 2014-08-04 00:01:21.0 green 0 0
46 2014-08-04 00:01:26.0 green 0 0
47 2014-08-04 00:01:31.0 amber 0 0
48 2014-08-04 00:01:31.0 amber 0 0
49 2014-08-04 00:01:34.0 red 0 0
50 2014-08-04 00:01:36.0 red 0 0
The data for the 11th of August:
time B C D
1 2014-08-11 00:00:02.0 red 0 0
2 2014-08-11 00:00:03.0 red 0 0
3 2014-08-11 00:00:04.0 red 0 0
4 2014-08-11 00:00:07.0 red 0 0
5 2014-08-11 00:00:08.0 red 0 0
6 2014-08-11 00:00:08.0 red 0 0
7 2014-08-11 00:00:08.2 red 0 0
8 2014-08-11 00:00:08.5 red 0 0
9 2014-08-11 00:00:08.9 red 0 0
10 2014-08-11 00:00:09.0 red 0 0
11 2014-08-11 00:00:09.5 red 0 0
12 2014-08-11 00:00:10.0 red 0 0
13 2014-08-11 00:00:10.2 red 0 0
14 2014-08-11 00:00:10.4 red 0 0
15 2014-08-11 00:00:10.5 red 0 0
16 2014-08-11 00:00:10.7 red 0 0
17 2014-08-11 00:00:11.7 red 0 0
18 2014-08-11 00:00:11.9 red 0 0
19 2014-08-11 00:00:12.0 red 0 0
20 2014-08-11 00:00:12.0 red 0 0
21 2014-08-11 00:00:12.2 red 0 0
22 2014-08-11 00:00:12.2 red 0 0
23 2014-08-11 00:00:12.5 red 0 0
24 2014-08-11 00:00:12.7 red 0 0
25 2014-08-11 00:00:13.0 red 0 0
26 2014-08-11 00:00:13.2 red 0 0
27 2014-08-11 00:00:13.2 red 0 0
28 2014-08-11 00:00:13.5 red 0 0
29 2014-08-11 00:00:13.7 red 0 0
30 2014-08-11 00:00:13.9 red 0 0
31 2014-08-11 00:00:14.2 red 0 0
32 2014-08-11 00:00:14.4 red 0 0
33 2014-08-11 00:00:14.7 red 0 0
34 2014-08-11 00:00:14.7 red 0 0
35 2014-08-11 00:00:15.0 red 0 0
36 2014-08-11 00:00:15.0 red 0 0
37 2014-08-11 00:00:15.2 red 0 0
38 2014-08-11 00:00:16.5 red 0 1
39 2014-08-11 00:00:17.0 red 0 1
40 2014-08-11 00:00:17.0 red 0 1
41 2014-08-11 00:00:17.9 red 0 1
42 2014-08-11 00:00:18.0 red 0 1
43 2014-08-11 00:00:18.0 red 0 1
44 2014-08-11 00:00:18.2 red 0 1
45 2014-08-11 00:00:18.4 red 0 1
46 2014-08-11 00:00:18.5 red 0 1
47 2014-08-11 00:00:18.7 red 0 1
48 2014-08-11 00:00:19.0 red 0 1
49 2014-08-11 00:00:19.2 red 0 1
50 2014-08-11 00:00:19.7 red 0 1
I only know how to deal with one day of data. But how do I plot data from several days together? The x-axis should contain only the time-of-day part, without the date, so the results are aggregated across days; in other words, all days' data combined into an average. This is just an example: I run into difficulties whenever I have to handle many days of data and average them for a general result. Thanks for the help.
library(reshape2)
library(ggplot2)
df$time <- as.POSIXct(cut(as.POSIXct(df$time), "10 secs"))
df.mlt <- melt(df, id.var=c("time", "B"))
ggplot(df.mlt, aes(x=time, y=value, color=variable)) +
  stat_summary(geom="point", fun.y=mean, shape=1) +
  stat_smooth()
For the first two parts, you could try the following (here the data is split by 10 secs; it's not clear whether you want to include the days as well):
library(data.table)
df$time1 <- as.POSIXct(cut(as.POSIXct(df$time, format= "%Y-%m-%d %H:%M:%S"), "10 secs"))
df1 <- df[,-1] #deleted the time column
dt <- data.table(df1, key='time1')
dt1 <- dt[, list(C1=round(100*(sum(C==1)/.N),2), D1=round(100*(sum(D==1)/.N),2)), by=time1]
dt1
# time1 C1 D1
#1: 2014-08-04 00:00:04 14.29 14.29
#2: 2014-08-04 00:00:14 16.67 0.00
#3: 2014-08-04 00:00:24 66.67 16.67
#4: 2014-08-04 00:00:34 0.00 33.33
#5: 2014-08-04 00:00:44 57.14 28.57
#6: 2014-08-04 00:00:54 0.00 100.00
#7: 2014-08-04 00:01:04 25.00 100.00
#8: 2014-08-04 00:01:14 0.00 80.00
#9: 2014-08-04 00:01:24 0.00 0.00
#10: 2014-08-04 00:01:34 0.00 0.00
#11: 2014-08-10 23:59:54 0.00 0.00
#12: 2014-08-11 00:00:04 0.00 0.00
#13: 2014-08-11 00:00:14 0.00 65.00
Update
dt1[, list(C1=mean(C1), D1= mean(D1)), by=list(timeN=gsub("^.*\\s+","", time1))]
# timeN C1 D1
#1: 00:00:04 7.145 7.145
#2: 00:00:14 8.335 32.500
#3: 00:00:24 66.670 16.670
#4: 00:00:34 0.000 33.330
#5: 00:00:44 57.140 28.570
#6: 00:00:54 0.000 100.000
#7: 00:01:04 25.000 100.000
#8: 00:01:14 0.000 80.000
#9: 00:01:24 0.000 0.000
#10: 00:01:34 0.000 0.000
#11: 23:59:54 0.000 0.000
Update2
I think this is what you need. There is a difference in the values: in the previous case it was just the average of the per-day proportions, whereas here I compute the proportions within each cut time interval across all days. This is probably more correct.
df1$timeN <- gsub("^.*\\s+", "", df1$time1)
dt <- data.table(df1, key='timeN')
dt1 <- dt[,list(C1=round(100*(sum(C==1)/.N),2), D1=round(100*(sum(D==1)/.N),2)), by=timeN]
dt1
# timeN C1 D1
#1: 00:00:04 14.29 14.29
#2: 00:00:14 16.67 0.00
#3: 00:00:24 66.67 16.67
#4: 00:00:34 0.00 33.33
#5: 00:00:44 57.14 28.57
#6: 00:00:54 0.00 100.00
#7: 00:01:04 25.00 100.00
#8: 00:01:14 0.00 80.00
#9: 00:01:24 0.00 0.00
#10: 00:01:34 0.00 0.00
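For the plotting part on this across-days result, a small sketch in the spirit of the first answer (melt dt1 to long format and draw both percentage columns; reshape2 and ggplot2 as above):
library(reshape2)
library(ggplot2)
dt1.mlt <- melt(dt1, id.vars="timeN")   # long format: timeN, variable (C1/D1), value
ggplot(dt1.mlt, aes(x=timeN, y=value, colour=variable, group=variable)) +
  geom_line() +
  geom_point()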

R Aggregate A Data Frame By Columns Instead of By Rows

I am trying to aggregate the columns of this data frame by unique column name (date). I keep getting an error. I have tried merge_all, merge_recurse, and aggregate but cannot get it to work. I have hit an impasse that seems insurmountable with my current knowledge, and I cannot find any answers that help. Is this even possible? The data frame is below:
2014-02-14 2014-02-14 2014-02-14 2014-02-21 2014-06-20 2014-06-20 2014-06-20 2014-09-19 Totals
PutWing 12 -6 0 171 7 -31 0 0 -77
Ten -6 0 0 24 -19 52 0 0 -10
Eighteen -15 0 0 73 0 -70 0 0 100
Thirty 0 0 0 -149 41 64 0 0 -463
FortyTwo 0 0 0 -91 0 121 0 0 426
ATM 44 0 0 -118 -25 -199 0 0 -134
FortyTwoC 0 0 0 -67 14 0 0 0 792
ThirtyC 0 0 0 79 0 0 0 0 -509
EighteenC 61 0 0 -57 0 -32 0 0 20
CallWing 1 0 0 -48 0 0 0 0 -28
Totals 95 -6 0 -183 17 -95 0 0 116
SlopeRisk 0 0 0 26 5 -6 0 0 -26
Assuming your data is in df:
df <- t(df)
rownames(df) <- substr(rownames(df), 1, 11) # only necessary if you get funny row names from data import; if your data is as it's shown you can skip this step.
df.agg <- aggregate(df, by=list(rownames(df)), sum)
row.names(df.agg) <- df.agg[[1]]
t(df.agg[-1])
Produces:
# Totals X2014.02.14 X2014.02.21 X2014.06.20 X2014.09.19
# PutWing -77 6 171 -24 0
# Ten -10 -6 24 33 0
# Eighteen 100 -15 73 -70 0
# Thirty -463 0 -149 105 0
# FortyTwo 426 0 -91 121 0
# ATM -134 44 -118 -224 0
# FortyTwoC 792 0 -67 14 0
# ThirtyC -509 0 79 0 0
# EighteenC 20 61 -57 -32 0
# CallWing -28 1 -48 0 0
# Totals 116 89 -183 -78 0
# SlopeRisk -26 0 26 -1 0
Basically, you need to transpose your data to use the group/apply functions that R offers. After transposing, you could also use plyr, data.table, or dplyr for the aggregation instead of aggregate as I did, but those are all non-base packages.
This will need some cleanup (column names, etc.), but I'll leave that up to you.
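An alternative sketch in base R, assuming df holds only numeric values: rowsum() sums rows by a grouping vector, so transposing, grouping by the (import-mangled) column names, and transposing back collapses the duplicate date columns in one step:
m <- as.matrix(df)
grp <- substr(colnames(m), 1, 11)      # collapse mangled names, as with rownames above
t(rowsum(t(m), group=grp))             # one column per unique date, values summed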
