If I do DF$where <- tree$where after fitting an rpart object using DF as my data, will each row be mapped to its corresponding leaf through the column where?
Thanks!
To check whether this holds (assuming I have understood your question correctly), we can work with the first example in ?rpart:
require(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
kyphosis$where <- fit$where
> str(kyphosis)
'data.frame': 81 obs. of 5 variables:
$ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
$ Age : int 71 158 128 2 1 1 61 37 113 59 ...
$ Number : int 3 3 4 5 4 2 2 3 2 6 ...
$ Start : int 5 14 5 1 15 16 17 16 16 12 ...
$ where : int 9 7 9 9 3 3 3 3 3 8 ...
> plot(fit)
> text(fit, use.n = TRUE)
And now look at some tables based on the 'where' vector and some logical tests:
First node:
> with(kyphosis, table(where, Start >= 8.5))
where FALSE TRUE
3 0 29
5 0 12
7 0 14
8 0 7
9 19 0 # so this is the row that describes that split
> fit$frame[9,]
var n wt dev yval complexity ncompete nsurrogate yval2.V1
3 <leaf> 19 19 8 2 0.01 0 0 2.0000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
3 8.0000000 11.0000000 0.4210526 0.5789474 0.2345679
Second node:
> with(kyphosis, table(where, Start >= 8.5, Start>=14.5))
, , = FALSE
where FALSE TRUE
3 0 0
5 0 12
7 0 14
8 0 7
9 19 0
, , = TRUE
where FALSE TRUE
3 0 29
5 0 0
7 0 0
8 0 0
9 0 0
And this is the row of fit$frame that describes the second split:
> fit$frame[3,]
var n wt dev yval complexity ncompete nsurrogate yval2.V1
4 <leaf> 29 29 0 1 0.01 0 0 1.0000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
4 29.0000000 0.0000000 1.0000000 0.0000000 0.3580247
So I would characterize the values of fit$where as row indices into fit$frame that identify the "terminal nodes", which are labeled '<leaf>'. That may or may not be what you were calling the "nodes".
> fit$frame
var n wt dev yval complexity ncompete nsurrogate yval2.V1
1 Start 81 81 17 1 0.17647059 2 1 1.00000000
2 Start 62 62 6 1 0.01960784 2 2 1.00000000
4 <leaf> 29 29 0 1 0.01000000 0 0 1.00000000
5 Age 33 33 6 1 0.01960784 2 2 1.00000000
10 <leaf> 12 12 0 1 0.01000000 0 0 1.00000000
11 Age 21 21 6 1 0.01960784 2 0 1.00000000
22 <leaf> 14 14 2 1 0.01000000 0 0 1.00000000
23 <leaf> 7 7 3 2 0.01000000 0 0 2.00000000
3 <leaf> 19 19 8 2 0.01000000 0 0 2.00000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
1 64.00000000 17.00000000 0.79012346 0.20987654 1.00000000
2 56.00000000 6.00000000 0.90322581 0.09677419 0.76543210
4 29.00000000 0.00000000 1.00000000 0.00000000 0.35802469
5 27.00000000 6.00000000 0.81818182 0.18181818 0.40740741
10 12.00000000 0.00000000 1.00000000 0.00000000 0.14814815
11 15.00000000 6.00000000 0.71428571 0.28571429 0.25925926
22 12.00000000 2.00000000 0.85714286 0.14285714 0.17283951
23 3.00000000 4.00000000 0.42857143 0.57142857 0.08641975
3 8.00000000 11.00000000 0.42105263 0.57894737 0.23456790
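The tables above can also be checked programmatically. The following is a minimal sketch, assuming (as the output above suggests) that fit$where holds row positions in fit$frame rather than node numbers; the names leaf_row, node_id and counts are introduced here only for illustration.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where has one entry per observation: a row position in fit$frame
leaf_row <- fit$where

# The row names of fit$frame are the node numbers shown in the printed tree
node_id <- as.integer(rownames(fit$frame)[leaf_row])

# Every observation should land in a row labeled "<leaf>"
all_leaves <- all(fit$frame$var[leaf_row] == "<leaf>")

# The number of observations per leaf should agree with the n column
counts <- table(leaf_row)
counts_match <- all(as.integer(counts) ==
                      fit$frame$n[as.integer(names(counts))])
```

If both all_leaves and counts_match are TRUE, then assigning kyphosis$where <- fit$where maps each row to its leaf, and node_id gives the corresponding node number.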
The multcomp package has a nice compact letter display function built in, but I'm using the non-parametric multiple comparison package "nparcomp" which does not appear to have a similar feature. I've noticed there are a couple packages such as multcompView and rcompanion that have CLD functions, but I'm not sure how to get my nparcomp summary to cooperate with those tools. Maybe someone here can help me? Here's an example summary of an nparcomp Tukey test:
library(nparcomp)
pristineraw.tukey <- mctp(positif.prop.total ~ dose.log, data = pristineraw, type = "Tukey", conf.level = 0.95, asy.method = "fisher", info = FALSE)
pristineraw.tukey
$Data.Info
Sample Size Effect Lower Upper
1 -4 4 0.7812500 0.65095081 0.8724403
2 -2.95860731484178 2 0.8229167 0.68706660 0.9077133
3 -1.99567862621736 4 0.6145833 0.49050216 0.7253656
4 -0.999565922520681 4 0.4166667 0.33069961 0.5080188
5 4.34272768626649e-05 4 0.1562500 0.08581288 0.2675807
6 1.0000043429231 2 0.2083333 0.12491776 0.3266579
$Contrast
1 2 3 4 5 6
2 - 1 -1 1 0 0 0 0
3 - 1 -1 0 1 0 0 0
4 - 1 -1 0 0 1 0 0
5 - 1 -1 0 0 0 1 0
6 - 1 -1 0 0 0 0 1
3 - 2 0 -1 1 0 0 0
4 - 2 0 -1 0 1 0 0
5 - 2 0 -1 0 0 1 0
6 - 2 0 -1 0 0 0 1
4 - 3 0 0 -1 1 0 0
5 - 3 0 0 -1 0 1 0
6 - 3 0 0 -1 0 0 1
5 - 4 0 0 0 -1 1 0
6 - 4 0 0 0 -1 0 1
6 - 5 0 0 0 0 -1 1
$Analysis
Estimator Lower Upper Statistic p.Value
2 - 1 0.042 -0.431 0.496 0.343 0.99761714
3 - 1 -0.167 -0.586 0.323 -1.381 0.69088411
4 - 1 -0.365 -0.648 0.007 -4.062 0.05318202
5 - 1 -0.625 -0.867 -0.144 -5.151 0.02076608
6 - 1 -0.573 -0.838 -0.090 -4.801 0.02785983
3 - 2 -0.208 -0.609 0.277 -1.763 0.50620162
4 - 2 -0.406 -0.688 -0.019 -4.320 0.04250191
5 - 2 -0.667 -0.894 -0.164 -5.205 0.02026988
6 - 2 -0.615 -0.866 -0.115 -4.930 0.02523067
4 - 3 -0.198 -0.583 0.260 -1.775 0.50151321
5 - 3 -0.458 -0.746 -0.026 -4.365 0.04027067
6 - 3 -0.406 -0.712 0.028 -3.880 0.06346250
5 - 4 -0.260 -0.561 0.101 -2.997 0.14893258
6 - 4 -0.208 -0.559 0.206 -2.078 0.37679610
6 - 5 0.052 -0.380 0.466 0.476 0.99043710
$Analysis.Inf
Estimator Lower Upper Statistic p.Value
2 - 1 0.04166667 -0.4310816 0.496466660 0.3426000 0.99761714
3 - 1 -0.16666667 -0.5861671 0.323305915 -1.3807046 0.69088411
4 - 1 -0.36458333 -0.6475061 0.006668684 -4.0618961 0.05318202
5 - 1 -0.62500000 -0.8671346 -0.143918870 -5.1509655 0.02076608
6 - 1 -0.57291667 -0.8375693 -0.090485809 -4.8010534 0.02785983
3 - 2 -0.20833333 -0.6088821 0.276867328 -1.7626807 0.50620162
4 - 2 -0.40625000 -0.6877026 -0.018637527 -4.3195377 0.04250191
5 - 2 -0.66666667 -0.8944430 -0.164222955 -5.2046137 0.02026988
6 - 2 -0.61458333 -0.8659606 -0.115293472 -4.9298694 0.02523067
4 - 3 -0.19791667 -0.5834144 0.260362074 -1.7746828 0.50151321
5 - 3 -0.45833333 -0.7460603 -0.026382368 -4.3654031 0.04027067
6 - 3 -0.40625000 -0.7115636 0.028113118 -3.8797113 0.06346250
5 - 4 -0.26041667 -0.5608547 0.100626889 -2.9973930 0.14893258
6 - 4 -0.20833333 -0.5594223 0.206138515 -2.0776563 0.37679610
6 - 5 0.05208333 -0.3804685 0.465937204 0.4758687 0.99043710
$Overall
Quantile p.Value
1 4.132777 0.02026988
$input
$input$formula
positif.prop.total ~ dose.log
$input$data
dose positif negatif dead totalNb positif.prop.total dose.log
1 0e+00 17 20 0 37 0.45945946 -4.000000e+00
2 0e+00 23 16 0 39 0.58974359 -4.000000e+00
3 0e+00 18 15 0 33 0.54545455 -4.000000e+00
4 0e+00 14 14 1 28 0.50000000 -4.000000e+00
5 1e-03 19 19 1 38 0.50000000 -2.958607e+00
6 1e-03 20 14 4 34 0.58823529 -2.958607e+00
7 1e-02 22 16 0 38 0.57894737 -1.995679e+00
8 1e-02 18 19 0 37 0.48648649 -1.995679e+00
9 1e-02 15 22 2 37 0.40540541 -1.995679e+00
10 1e-02 11 20 4 31 0.35483871 -1.995679e+00
11 1e-01 12 20 0 32 0.37500000 -9.995659e-01
12 1e-01 12 17 4 29 0.41379310 -9.995659e-01
13 1e-01 8 26 3 34 0.23529412 -9.995659e-01
14 1e-01 5 18 11 23 0.21739130 -9.995659e-01
15 1e+00 3 16 10 19 0.15789474 4.342728e-05
16 1e+00 1 16 5 17 0.05882353 4.342728e-05
17 1e+00 2 24 9 26 0.07692308 4.342728e-05
18 1e+00 7 23 6 30 0.23333333 4.342728e-05
19 1e+01 3 10 8 13 0.23076923 1.000004e+00
20 1e+01 2 20 8 22 0.09090909 1.000004e+00
$input$type
[1] "Tukey"
$input$conf.level
[1] 0.95
$input$alternative
[1] "two.sided"
$input$asy.method
[1] "fisher"
$input$plot.simci
[1] FALSE
$input$control
NULL
$input$info
[1] FALSE
$input$rounds
[1] 3
$input$contrast.matrix
NULL
$input$correlation
[1] FALSE
$input$effect
[1] "unweighted"
$input$const
[1] 0.5875441
$text.Output
[1] "True differences of relative effects are not equal to 0"
$text.output.W
[1] "Global Pseudo Ranks"
$connames
[1] "2 - 1" "3 - 1" "4 - 1" "5 - 1" "6 - 1" "3 - 2" "4 - 2" "5 - 2" "6 - 2"
[10] "4 - 3" "5 - 3" "6 - 3" "5 - 4" "6 - 4" "6 - 5"
$AsyMethod
[1] "Fisher with 5 DF"
attr(,"class")
[1] "mctp"
You can use cldList in package rcompanion. You didn't provide reproducible data so I'll use the iris data set that is included with R:
data(iris)
library(rcompanion)
library(nparcomp)
iris.mc <- mctp(Sepal.Length~Species, iris)
comp <- iris.mc$connames
pv <- iris.mc$Analysis$p.Value
cldList(comparison=comp, p.value=pv)
# Group Letter MonoLetter
# 1 2 a a
# 2 3 b b
# 3 1 c c
With the following data set
u_data
rSLn rwave rexpd y_ij rwave2 u_ij
1 1 1 199.929886 5.302956 1 5.302956
2 1 2 27.738826 3.358249 4 3.358249
3 1 3 144.000000 4.976734 9 4.976734
4 1 4 72.000000 4.290459 16 4.290459
5 1 5 0.000000 0.000000 25 0.000000
6 2 1 392.606361 5.975351 1 5.975351
7 2 2 749.524990 6.620773 4 6.620773
8 2 3 3120.000000 8.045909 9 8.045909
9 2 4 1600.000000 7.378384 16 7.378384
10 2 5 1000.000000 6.908755 25 6.908755
11 2 6 5840.000000 8.672657 36 8.672657
12 2 7 3960.000000 8.284252 49 8.284252
13 2 8 4700.000000 8.455531 64 8.455531
14 2 9 1660.000000 7.415175 81 7.415175
15 2 10 5620.000000 8.634265 100 8.634265
16 3 1 1566.117441 7.356993 1 7.356993
17 3 2 739.702016 6.607598 4 6.607598
18 3 3 0.000000 0.000000 9 0.000000
19 3 4 0.000000 0.000000 16 0.000000
20 3 5 0.000000 0.000000 25 0.000000
21 3 6 0.000000 0.000000 36 0.000000
22 3 7 0.000000 0.000000 49 0.000000
23 3 8 0.000000 0.000000 64 0.000000
24 3 9 600.000000 6.398595 81 6.398595
25 3 10 720.000000 6.580639 100 6.580639
26 4 1 249.912358 5.525104 1 5.525104
27 4 2 9.246275 2.326914 4 2.326914
28 4 3 848.000000 6.744059 9 6.744059
29 4 4 820.000000 6.710523 16 6.710523
30 4 5 968.000000 6.876265 25 6.876265
31 4 6 4800.000000 8.476580 36 8.476580
32 4 7 1572.000000 7.360740 49 7.360740
33 4 8 1960.000000 7.581210 64 7.581210
34 4 9 1800.000000 7.496097 81 7.496097
35 4 10 1700.000000 7.438972 100 7.438972
36 5 1 0.000000 0.000000 1 0.000000
37 5 2 6768.273444 8.820149 4 8.820149
38 5 3 520.000000 6.255750 9 6.255750
39 5 4 1020.000000 6.928538 16 6.928538
40 5 5 1520.000000 7.327123 25 7.327123
41 5 6 2075.000000 7.638198 36 7.638198
42 5 7 1760.000000 7.473637 49 7.473637
43 5 8 1270.000000 7.147559 64 7.147559
44 5 9 5400.000000 8.594339 81 8.594339
45 5 10 6550.000000 8.787373 100 8.787373
And with the following values:
ux_data=as.matrix(u_data[,c(2,5)])
ux_data=cbind(1, ux_data)
class=rbinom(length(unique(u_data$rSLn)),1,0.48)+1
thet.value=c(4.25,5.85,1.26,9.78,6.86)
n_g_i=numeric()
for ( d in unique(u_data$rSLn)){
n_g_i[d]=length(u_data$rwave[u_data$rSLn==d])
}
sigma2=0.7849
SIGMA=matrix(c(100,0,0,
0,1,0,
0,0,1/100), nrow = 3, ncol = 3, byrow = TRUE)
I would like to execute the following code, which works perfectly:
u_ij_C1=(u_data$u_ij[rep(class,times=n_g_i)==1] #u_ij_new belongs to cluster-1
-rep(thet.value[class==1], n_g_i[class==1]))
m_beta_C1=(solve((t(ux_data[rep(class,times=n_g_i)==1,])%*%ux_data[rep(class,times=n_g_i)==1,]/
(sigma2))+solve(SIGMA)) %*%(t(ux_data[rep(class,times=n_g_i)==1,])%*%u_ij_C1/sigma2))
sig2_beta_C1=(solve((t(ux_data[rep(class,times=n_g_i)==1,])
%*%ux_data[rep(class,times=n_g_i)==1,]/(sigma2))+solve(SIGMA)))
u_ij_C2=(u_data$u_ij[rep(class,times=n_g_i)==2] #u_ij_new belongs to cluster-2
-rep(thet.value[class==2], n_g_i[class==2]))
m_beta_C2=(solve((t(ux_data[rep(class,times=n_g_i)==2,])%*%ux_data[rep(class,times=n_g_i)==2,]/
(sigma2))+solve(SIGMA)) %*%(t(ux_data[rep(class,times=n_g_i)==2,])%*%u_ij_C2/sigma2))
sig2_beta_C2=(solve((t(ux_data[rep(class,times=n_g_i)==2,])
%*%ux_data[rep(class,times=n_g_i)==2,]/(sigma2))+solve(SIGMA)))
Each m_beta is a vector of length 3 and each sig2_beta is a 3x3 matrix.
I am trying to do the same with a for loop. Unfortunately, it is not working:
ngrp=2
u_ij_New_C12=numeric()
mu_beta_C12=numeric()
sig_beta_C12=array()
for ( k in 1:ngrp){
u_ij_New_C12[k]=(u_data$u_ij[rep(class,times=n_g_i)==k] #u_ij_new belongs to cluster-k
-rep(theta_i[class==k], n_g_i[class==k])) #repeting thetas belongs to cluster-k
sig_beta_C12[k]=(solve((t(ux_data[rep(class,times=n_g_i)==k,])
%*%ux_data[rep(class,times=n_g_i)==k,]/
(sigma2))+solve(SIGMA)))
mu_beta_C12[k]=(sig_beta_C12[k] %*%(t(ux_data[rep(class,times=n_g_i)==k,])%*%u_ij_New_C12[k]/sigma2))
}
For k=1 I expect the same result as for cluster 1, and for k=2 the same as for cluster 2. For example, mu_beta_C12[1] and sig_beta_C12[1] should be identical to m_beta_C1 and sig2_beta_C1 respectively.
Any help is appreciated.
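One likely cause is that x[k] <- value can only store a single element, while each result here is a vector or a matrix. The loop also refers to theta_i, whereas the values above are stored in thet.value. Below is a minimal sketch that stores each per-cluster result in a list instead. The toy u_data is made up purely so the block runs on its own, class is fixed by hand rather than drawn with rbinom, and the list names and the helpers in_k and X_k are introduced here for illustration.

```r
# Toy stand-in for u_data: same group sizes as above, made-up u_ij values
set.seed(1)
group_sizes <- c(5, 10, 10, 10, 10)
u_data <- data.frame(rSLn = rep(1:5, times = group_sizes),
                     rwave = sequence(group_sizes))
u_data$rwave2 <- u_data$rwave^2
u_data$u_ij <- rnorm(nrow(u_data), mean = 6)

ux_data <- cbind(1, as.matrix(u_data[, c("rwave", "rwave2")]))
class <- c(1, 2, 1, 2, 2)   # fixed by hand so both clusters are non-empty
thet.value <- c(4.25, 5.85, 1.26, 9.78, 6.86)
n_g_i <- as.vector(table(u_data$rSLn))
sigma2 <- 0.7849
SIGMA <- diag(c(100, 1, 1 / 100))

ngrp <- 2
u_ij_list <- vector("list", ngrp)      # one numeric vector per cluster
sig_beta_list <- vector("list", ngrp)  # one 3x3 matrix per cluster
mu_beta_list <- vector("list", ngrp)   # one 3x1 matrix per cluster

for (k in seq_len(ngrp)) {
  in_k <- rep(class, times = n_g_i) == k   # rows belonging to cluster k
  X_k <- ux_data[in_k, , drop = FALSE]
  u_ij_list[[k]] <- u_data$u_ij[in_k] -
    rep(thet.value[class == k], n_g_i[class == k])
  sig_beta_list[[k]] <- solve(crossprod(X_k) / sigma2 + solve(SIGMA))
  mu_beta_list[[k]] <- sig_beta_list[[k]] %*%
    (crossprod(X_k, u_ij_list[[k]]) / sigma2)
}
```

With this layout, mu_beta_list[[1]] and sig_beta_list[[1]] play the roles of m_beta_C1 and sig2_beta_C1. Here crossprod(X_k) is just a shorter way of writing t(X_k) %*% X_k.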
I would like a way to turn an rpart tree object into a nested list of lists (a dendrogram). Ideally, the attributes in each node will include the information in the rpart object (impurity, variable and rule that is used for splitting, the number of observations funneled to that node, etc.).
Looking at the rpart$frame object, it is not clear to me how to read it. Any suggestions?
Tiny example:
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit$frame
var n wt dev yval complexity ncompete nsurrogate yval2.V1 yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
1 Start 81 81 17 1 0.17647059 2 1 1.00000000 64.00000000 17.00000000 0.79012346 0.20987654 1.00000000
2 Start 62 62 6 1 0.01960784 2 2 1.00000000 56.00000000 6.00000000 0.90322581 0.09677419 0.76543210
4 <leaf> 29 29 0 1 0.01000000 0 0 1.00000000 29.00000000 0.00000000 1.00000000 0.00000000 0.35802469
5 Age 33 33 6 1 0.01960784 2 2 1.00000000 27.00000000 6.00000000 0.81818182 0.18181818 0.40740741
10 <leaf> 12 12 0 1 0.01000000 0 0 1.00000000 12.00000000 0.00000000 1.00000000 0.00000000 0.14814815
11 Age 21 21 6 1 0.01960784 2 0 1.00000000 15.00000000 6.00000000 0.71428571 0.28571429 0.25925926
22 <leaf> 14 14 2 1 0.01000000 0 0 1.00000000 12.00000000 2.00000000 0.85714286 0.14285714 0.17283951
23 <leaf> 7 7 3 2 0.01000000 0 0 2.00000000 3.00000000 4.00000000 0.42857143 0.57142857 0.08641975
3 <leaf> 19 19 8 2 0.01000000 0 0 2.00000000 8.00000000 11.00000000 0.42105263 0.57894737 0.23456790
(the function ggdendro:::dendro_data.rpart might be helpful somehow, but I couldn't get it to really solve the problem)
Here is a GitHub gist with the function rpart2dendro for converting an object of class "rpart" to a dendrogram. Note that branches are not weighted in the output object, but it should be fairly straightforward to recursively modify the "height" attributes of the dendrogram to get proportional branch lengths. The Kyphosis example is provided at the bottom.
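Independently of the gist, the frame can be read through its row names, which are node numbers: in rpart's numbering the children of node k are nodes 2k and 2k+1. The following is a minimal sketch built on that, returning a plain nested list rather than an object of class "dendrogram". The function name rpart_to_list and the choice of fields kept per node are hypothetical and only for illustration.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Recursively turn fit$frame into a nested list, starting at the root (node 1).
rpart_to_list <- function(fit, node = 1) {
  frame <- fit$frame
  # Locate this node's row by its node number (stored in the row names)
  i <- match(node, as.numeric(rownames(frame)))
  out <- list(
    node = node,
    var  = as.character(frame$var[i]),  # split variable, or "<leaf>"
    n    = frame$n[i],                  # observations reaching this node
    dev  = frame$dev[i],                # deviance (impurity) at this node
    yval = frame$yval[i]                # fitted value at this node
  )
  if (out$var != "<leaf>") {
    # Children of node k are nodes 2k (left) and 2k + 1 (right)
    out$children <- list(rpart_to_list(fit, 2 * node),
                         rpart_to_list(fit, 2 * node + 1))
  }
  out
}

tree <- rpart_to_list(fit)
str(tree, max.level = 2)
```

Other columns of fit$frame can be added to each node in the same way. The split rules themselves are stored in fit$splits and are not extracted in this sketch.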
I have two tables of data in R:
a = Duration (-10,0] (0,0.25] (0.25,0.5] (0.5,10]
1 2 0 0 0 2
2 3 0 0 10 3
3 4 0 51 25 0
4 5 19 129 14 0
5 6 60 137 1 0
6 7 31 62 15 5
7 8 7 11 7 0
and
b = Duration (-10,0] (0,0.25] (0.25,0.5] (0.5,10]
1 1 0 0 1 266
2 2 1 0 47 335
3 3 1 26 415 142
4 4 3 965 508 5
5 5 145 2535 103 0
6 6 939 2239 15 6
7 7 420 613 86 34
8 8 46 84 36 16
I would like to calculate b/a by matching on Duration. I thought of something like ifelse(), but it does not work. Can someone please help me?
Thanks a lot
Match the order and selection of b with a (in my example y with x). Then do the math.
x <- data.frame(duration = 2:8, v = rnorm(7))
y <- data.frame(duration = 8:1, v = rnorm(8))
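# for each row of x, find the position of the same duration in y
m <- match(x$duration, y$duration)
ym <- y[m[!is.na(m)],]
# y (b) divided by x (a), row by row
ym$v/x$v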
It does not work when x contains items that are not in y, btw.
Do you want something like the following:
a <- a[-1]
b <- b[-1]
a <- a[order(a$Duration),]
b <- b[order(b$Duration),]
durations <- intersect(a$Duration, b$Duration)
b[b$Duration %in% durations,] / a[a$Duration %in% durations,]
Duration (-10,0] (0,0.25] (0.25,0.5] (0.5,10]
2 1 Inf NaN Inf 167.50000
3 1 Inf Inf 41.500000 47.33333
4 1 Inf 18.921569 20.320000 Inf
5 1 7.631579 19.651163 7.357143 NaN
6 1 15.650000 16.343066 15.000000 Inf
7 1 13.548387 9.887097 5.733333 6.80000
8 1 6.571429 7.636364 5.142857 Inf
You may want to replace the NaN and Inf values with something else.
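A variation on the same idea is sketched below: it keeps Duration as a key instead of dividing it, and sets non-finite ratios to NA. It uses only the first two interval columns of the tables from the question to stay short; the names common, a_m, b_m, value_cols, ratio and result are introduced here for illustration, and replacing with NA is just one possible choice.

```r
# First two interval columns of the question's tables
a <- data.frame(Duration = 2:8,
                `(-10,0]` = c(0, 0, 0, 19, 60, 31, 7),
                `(0,0.25]` = c(0, 0, 51, 129, 137, 62, 11),
                check.names = FALSE)
b <- data.frame(Duration = 1:8,
                `(-10,0]` = c(0, 1, 1, 3, 145, 939, 420, 46),
                `(0,0.25]` = c(0, 0, 26, 965, 2535, 2239, 613, 84),
                check.names = FALSE)

# Keep only durations present in both tables, in the same order
common <- intersect(a$Duration, b$Duration)
a_m <- a[match(common, a$Duration), ]
b_m <- b[match(common, b$Duration), ]

# Divide the value columns only, leaving Duration out of the arithmetic
value_cols <- setdiff(names(a), "Duration")
ratio <- b_m[value_cols] / a_m[value_cols]

# Replace Inf and NaN (division by zero) with NA
ratio[] <- lapply(ratio, function(col) replace(col, !is.finite(col), NA))

result <- data.frame(Duration = common, ratio, check.names = FALSE)
```

Because both tables are indexed through match() on the shared durations, this also copes with a duration that appears in only one of them.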
I have a dataset like this:
Anno.2013 Giorni.2013 Anno.2014 Giorni.2014 Stagionalità Destagionata2013
1 18 mar 17 mer Bassa 9.3710954
2 9 mer 5 gio Bassa 4.6855477
3 9 gio 2 ven Bassa 4.6855477
4 8 ven 5 sab Bassa 4.1649313
5 4 sab 2 dom Bassa 2.0824656
6 2 dom 0 lun Bassa 1.0412328
7 1 lun 1 mar Bassa 0.5206164
8 0 mar 0 mer Bassa 0.0000000
9 2 mer 0 gio Bassa 1.0412328
10 0 gio 1 ven Bassa 0.0000000
Destagionata2014 Settimana2013 Settimana2014
1 9.4463412 1 1
2 2.7783356 1 1
3 1.1113343 1 1
4 2.7783356 1 1
5 1.1113343 1 1
6 0.0000000 1 2
7 0.5556671 2 2
8 0.0000000 2 2
9 0.0000000 2 2
10 0.5556671 2 2
> str(domanda)
'data.frame': 365 obs. of 9 variables:
$ Anno.2013 : int 18 9 9 8 4 2 1 0 2 0 ...
$ Giorni.2013 : Factor w/ 7 levels "dom","gio","lun",..: 4 5 2 7 6 1 3 4 5 2 ...
$ Anno.2014 : int 17 5 2 5 2 0 1 0 0 1 ...
$ Giorni.2014 : Factor w/ 7 levels "dom","gio","lun",..: 5 2 7 6 1 3 4 5 2 7 ...
$ Stagionalità : Factor w/ 2 levels "Alta","Bassa": 2 2 2 2 2 2 2 2 2 2 ...
$ Destagionata2013: num 9.37 4.69 4.69 4.16 2.08 ...
$ Destagionata2014: num 9.45 2.78 1.11 2.78 1.11 ...
$ Settimana2013 : Factor w/ 53 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Settimana2014 : Factor w/ 53 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ...
I would like to divide every row of Destagionata2013 by the mean of Destagionata2013 grouped by Settimana2013. For example:
Destagionata2013[1:6]/mean(Destagionata2013[1:6])
I tried to use tapply:
Media_Settimana<-as.vector(tapply(domanda$Anno.2013, domanda$Settimana2013, mean))
Media_Settimana
> Media_Settimana
[1] 8.333333 5.857143 3.142857 4.285714 6.428571 6.714286 13.714286 3.428571
[9] 4.000000 3.285714 11.428571 6.285714 11.714286 7.285714 12.142857 12.000000
[17] 16.000000 20.857143 19.428571 23.428571 33.857143 31.000000 31.714286 32.428571
[25] 38.571429 41.000000 36.000000 38.714286 36.714286 39.857143 40.714286 39.857143
[33] 41.714286 41.857143 41.142857 40.571429 40.428571 37.857143 32.714286 19.714286
[41] 9.000000 4.142857 5.857143 16.285714 11.000000 8.428571 4.428571 6.857143
[49] 6.285714 3.857143 7.000000 5.571429 18.500000
But I am not able to replicate the values for every row.
As MrFlick notes, you need ave instead of tapply, because ave automatically recycles each group's length-1 result to the length of the input. Here we do what you are trying to do with iris (normalize Sepal.Length by the mean Sepal.Width within each species):
transform(iris, norm.sep.len=Sepal.Length / ave(Sepal.Width, Species, FUN=mean))
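Applied to the question's own columns, the same idea might look like the sketch below. The small domanda data frame is rebuilt from the first ten rows shown in the question so the block runs on its own; the names media and Normalizzata2013 are made up for illustration.

```r
# First ten rows of the two relevant columns from the question
domanda <- data.frame(
  Destagionata2013 = c(9.3710954, 4.6855477, 4.6855477, 4.1649313,
                       2.0824656, 1.0412328, 0.5206164, 0.0000000,
                       1.0412328, 0.0000000),
  Settimana2013 = factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))
)

# ave() returns the weekly mean repeated for every row of that week
media <- ave(domanda$Destagionata2013, domanda$Settimana2013, FUN = mean)

# Divide each row by the mean of its own week
domanda$Normalizzata2013 <- domanda$Destagionata2013 / media
```

For the first week this gives the same values as Destagionata2013[1:6]/mean(Destagionata2013[1:6]).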