lattice::xyplot for multiple lines from quantile regression output

This is a data.frame whose third "column" is in fact a matrix:
pred.Alb <- structure(list(Age =
c(20, 30, 40, 50, 60, 70, 80, 20, 30, 40,
50, 60, 70, 80), Sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Male", "Female"),
class = "factor"),
pred = structure(c(4.34976914720261, 4.3165897157342, 4.2834102842658,
4.23952109360855, 4.15279286619591, 4.05535487959442, 3.95791689299294,
4.02417706540447, 4.05661037005163, 4.08904367469879, 4.0942071858864,
3.9902915232358, 3.85910606712565, 3.72792061101549, 4.37709246711838,
4.38914906337186, 4.40120565962535, 4.3964228776405, 4.32428258270227,
4.23530290952571, 4.14632323634915, 4.3, 4.3, 4.3, 4.28809523809524,
4.22857142857143, 4.15714285714286, 4.08571428571429, 4.59781730640631,
4.59910124381436, 4.60038518122242, 4.58132673532165, 4.48089875618564,
4.36012839374081, 4.23935803129598, 4.39298701298701, 4.39711229946524,
4.40123758594347, 4.39484310896076, 4.34636957813428, 4.28737628384687,
4.22838298955946), .Dim = c(14L, 3L), .Dimnames = list(c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14"), c("tau= 0.10", "tau= 0.25", "tau= 0.50")))),
.Names = c("Age", "Sex", "pred"), out.attrs =
structure(list(dim = structure(c(7L, 2L), .Names = c("Age", "Sex")),
dimnames = structure(list(Age = c("Age=20",
"Age=30", "Age=40", "Age=50", "Age=60", "Age=70", "Age=80"),
Sex = c("Sex=Male", "Sex=Female")),
.Names = c("Age", "Sex"))),
.Names = c("dim", "dimnames")), row.names = c(NA, -14L),
class = "data.frame")
It was created with this code:
require(rms) # also loads Hmisc
require(quantreg) # might also get loaded by rms
rqAlb10fit2 <- rq(BL_ALBUMIN ~ rcs(Age,3) *Sex , data=redBan,
tau= c(0.1, 0.25, 0.5) )
pred.Alb <- expand.grid(Age=seq(20,80,by=10), Sex=c("Male", "Female") )
pred.Alb$pred <- predict(rqAlb10fit2,
newdata=expand.grid(Age=seq(20,80,by=10), Sex=c("Male", "Female") ) )
I would like to have a line plot of the predictions by Sex and tau level. I can get a points plot with:
xyplot(pred~Age|Sex, data=pred.Alb, type="p")
When I add type="l", the lines slew back and forth connecting the various levels of tau.
I doubt that it matters, but running on Mac 10.7.5 with quantreg_4.96/rms_3.6-3/Hmisc_3.10-1. If you want to show me a ggplot solution with classic theme, I'm OK with that too, it's just that I am not very good with ggplot2 and Harrell's rms package is mated to lattice.

The problem appears to be that y loses its dimension attribute when it's passed into the panel function, becoming a simple vector. It still goes ahead and plots, recycling x to match y's length, which you can't see with type="p" but can with type="l".
Here is a custom panel function that accomplishes what you want by first converting y back to a matrix and then calling panel.xyplot separately on each of its columns:
panel.matplot <- function(x,y,...) {
y <- matrix(y, nrow=length(x))
apply(y, 2, function(Y) panel.xyplot(x,Y, ...))
}
xyplot(pred~Age|Sex, data=pred.Alb, type="l", panel=panel.matplot)
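To see why the reshaping step works, here is a minimal base-R sketch with made-up numbers (the toy x and m are assumptions, not the data above). It mimics the flattening with as.vector(): the columns end up stacked end to end, so matrix(y, nrow = length(x)) recovers one column per tau level.

```r
# Toy stand-ins (hypothetical values): 3 ages, 2 quantile levels
x <- c(20, 30, 40)
m <- cbind("tau= 0.10" = c(4.3, 4.2, 4.1),
           "tau= 0.50" = c(4.6, 4.5, 4.4))

# Flattening drops the dim attribute and stacks the columns end to end
y <- as.vector(m)

# Rebuilding with nrow = length(x) recovers one column per tau level
y_mat <- matrix(y, nrow = length(x))
restored <- all(y_mat == m)
```

Each column of y_mat can then be drawn as its own line, which is what the apply() call in panel.matplot does.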
BTW: In cases like this, I often find it useful to poke around 'inside' the panel function call. A simple way to do this is to construct a dummy panel function containing a browser() call. Here, for example, is how I discovered the problem in this case:
xyplot(pred~Age|Sex, data=pred.Alb, type="l",
panel = function(x,y,...) browser())
Browse[2]> x
# [1] 20 30 40 50 60 70 80
Browse[2]> y
# [1] 4.349769 4.316590 4.283410 4.239521 4.152793 4.055355 3.957917 4.377092
# [9] 4.389149 4.401206 4.396423 4.324283 4.235303 4.146323 4.597817 4.599101
# [17] 4.600385 4.581327 4.480899 4.360128 4.239358
... at which point the required fix is both (a) pretty obvious and (b) can be tested out from within the existing browser call.

You can do this by reshaping to long and using the groups argument to xyplot:
pred2 <- as.data.frame(pred.Alb$pred)
varying=names(pred2)
pred2$Age <- pred.Alb$Age
pred2$Sex <- pred.Alb$Sex
pred2.long <- reshape(pred2, direction='long', varying=varying, sep='= ')
xyplot(tau~Age|Sex, data=pred2.long, type="l", groups=time)
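For readers who find reshape() hard to follow, the same long layout can be built by hand. This is only a sketch on a made-up toy data frame (toy and its values are assumptions standing in for pred.Alb); it repeats the identifying columns once per tau level and flattens the matrix column.

```r
# Toy stand-in for pred.Alb (hypothetical values): a data frame with a matrix column
toy <- data.frame(Age = c(20, 30, 20, 30),
                  Sex = factor(c("Male", "Male", "Female", "Female")))
toy$pred <- cbind("tau= 0.10" = c(4.3, 4.2, 4.0, 4.1),
                  "tau= 0.50" = c(4.6, 4.5, 4.4, 4.3))

# One row per (Age, Sex, tau) combination
toy_long <- data.frame(
  Age  = rep(toy$Age, times = ncol(toy$pred)),
  Sex  = rep(toy$Sex, times = ncol(toy$pred)),
  tau  = rep(colnames(toy$pred), each = nrow(toy$pred)),
  pred = as.vector(toy$pred)
)
```

With the data in this shape, tau can serve as the groups variable in xyplot, just as time does in the reshape() version.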


Interaction effect plot with CIs and emmeans contrast

I'm having trouble creating an interaction effect plot. There is probably something fairly simple I don't yet know how to do. I'm pretty new to R and ggplot. My reprex is below. Your insight is greatly appreciated!
The data is from UCLA and I'm also adapting their example for my purposes here.
library(here)
library(emmeans)
library(tidyverse)
dat <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2019/03/exercise.csv")
Convert prog into factor variable
dat$prog <- factor(dat$prog, labels = c("jog","swim","read"))
The model
contcat <- lm(loss ~ hours * prog, data=dat)
summary(contcat)
I create mylist with certain points on hours and the two categories in prog that I want to contrast.
(mylist <- list(hours = seq(0, 4, .5), prog=c("jog","read")))
I then pass the object contcat into the emmeans. I request that predicted values of every combination of hours and prog be specified in at=mylist and store the output into an object called emcontcat.
emcontcat <- emmeans(contcat, ~ hours * prog, at=mylist)
I use emmip to output a set of values using plotit=FALSE.
contcatdat <- emmip(contcat, prog ~ hours, at = mylist, CIs=TRUE, plotit=FALSE)
The output object is fed to ggplot. The interaction effect is plotted along with CI bands.
ggplot(data=contcatdat, aes(x=hours, y=yvar, color=prog)) +
geom_line() +
geom_ribbon(aes(ymax=UCL, ymin=LCL, fill=prog), alpha=0.4)
The plot looks like this:
But overlapping CIs do not always correspond to the portions of the lines where there is no significant difference in predicted values. I want to add hashed lines for the portions of the lines where there is no significant difference in predicted values. The figure below shows the kind of figure I'm trying to create. (The figure is from a paper by Trenton Mize (2019) found here at Fig. 14.)
To get the simple effect (i.e., difference of two predicted values), I pass emcontcat into a function called contrast where we can request "pairwise" differences (or simple effects). P-values are given for jog - read at each level of hours that was specified in mylist.
contrast(emcontcat, "pairwise", by="hours")
The output:
Where I am having trouble is how to incorporate the simple effect (i.e., the parts of hours where jog - read are significantly different or not) into ggplot as hashed or solid portions of the lines like the Mize 2019 figure.
We want to know whether the intervals overlap, and if so, we want dashed lines. That is easy with a small helper function, itvl_is_l(). However, on the LHS of the plot there is just one point, and drawing a line needs at least two. So we have to interpolate with approx(), which is also done internally in the plot functions. Since we want to do everything for both progs, we use by().
Preprocessing
## merge interpolations by prog
aux <- by(contcatdat, contcatdat$prog, \(x) {
x <- merge(x, data.frame(hours=with(x, seq.int(min(hours), max(hours),
length.out=1e3))), all=TRUE)
x$prog <- unique(na.omit(x$prog))
u <- c('yvar', 'LCL', 'UCL')
x[u] <- lapply(x[u], \(x) approx(x, xout=seq_along(x))$y)
x
})
## logical test: TRUE where the two intervals do NOT intersect
itvl_is_l <- \(a, b) {unname(as.vector(ifelse(b[, 1] > a[, 2] | a[, 1] > b[, 2], TRUE, FALSE)))}
## check if intersecting CIs
its <- itvl_is_l(aux$jog[c('LCL', 'UCL')], aux$read[c('LCL', 'UCL')])
aux <- lapply(aux, `[<-`, 'its', val=its) ## add as variable
aux <- lapply(aux, \(x) transform(x, itsn=cumsum(c(0, diff(x$its)) != 0) + 1)) ## making a sequence out of it
contcatdat <- do.call(rbind, aux) ## combine back as contcatdat
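The run-numbering trick used for itsn may be easier to follow on a tiny example. The sketch below uses made-up flags and a hypothetical helper named disjoint(); it is an illustration of the idea, not a replacement for the code above.

```r
# Two intervals [lo_a, hi_a] and [lo_b, hi_b] are disjoint when one starts after the other ends
disjoint <- function(lo_a, hi_a, lo_b, hi_b) lo_b > hi_a | lo_a > hi_b

# Hypothetical flags along the x axis: TRUE where the two CIs are disjoint
its <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Start a new run id each time the flag changes from the previous point
itsn <- cumsum(c(0, diff(its)) != 0) + 1
# itsn is 1 1 2 2 2 3 3
```

Each run id then marks one stretch of the curve: run 1 and run 3 are the solid parts, run 2 is the dashed part where the intervals overlap.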
Plot
clr <- c('#FF0000', '#0000FF', '#0000001A') ## some colors
png('foo.png', 600, 400) ## open .png device
plot(yvar ~ hours, contcatdat, type='n')
grid()
## solid lines right of the overlap region (itsn > 2)
lines(yvar ~ hours, contcatdat, subset=prog == 'jog' & itsn > 2, lwd=2, col=clr[1])
lines(yvar ~ hours, contcatdat, subset=prog == 'read' & itsn > 2, lwd=2, col=clr[2])
## lines middle, dashed (itsn == 2, CIs overlap)
lines(yvar ~ hours, contcatdat, subset=prog == 'jog' & itsn == 2, lwd=2, col=clr[1], lty=2)
lines(yvar ~ hours, contcatdat, subset=prog == 'read' & itsn == 2, lwd=2, col=clr[2], lty=2)
## solid lines left of the overlap region (itsn < 2)
lines(yvar ~ hours, contcatdat, subset=prog == 'jog' & itsn < 2, lwd=2, col=clr[1])
lines(yvar ~ hours, contcatdat, subset=prog == 'read' & itsn < 2, lwd=2, col=clr[2])
## CIs
with(subset(contcatdat, prog == 'jog'),
polygon(c(hours, rev(hours)), c(UCL, rev(LCL)), border=NA, col=clr[3]))
with(subset(contcatdat, prog == 'read'),
polygon(c(hours, rev(hours)), c(UCL, rev(LCL)), border=NA, col=clr[3]))
## legend
legend('topleft', legend=unique(contcatdat$prog), title='Group', col=clr[1:2], lty=1, lwd=2)
dev.off() ## close .png device
You could also try plotting the polygons first, opaque and with a border, if that looks better.
Data:
contcatdat <- structure(list(prog = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), levels = c("jog",
"read"), class = "factor"), hours = c(0, 0, 0.5, 0.5, 1, 1, 1.5,
1.5, 2, 2, 2.5, 2.5, 3, 3, 3.5, 3.5, 4, 4), yvar = c(-6.78065983345649,
2.21637209230689, -3.05428518360714, 0.738291278604121, 0.672089466242214,
-0.739789535098646, 4.39846411609157, -2.21787034880141, 8.12483876594092,
-3.69595116250418, 11.8512134157903, -5.17403197620695, 15.5775880656396,
-6.65211278990971, 19.303962715489, -8.13019360361248, 23.0303373653383,
-9.60827441731525), SE = c(1.64384530410457, 1.48612021916972,
1.25520349531108, 1.14711211184156, 0.87926401607137, 0.820840725755632,
0.543079708493216, 0.531312719216624, 0.375535476484592, 0.376041650300328,
0.558013604603198, 0.501120592808483, 0.89777081499028, 0.781944232621328,
1.27470257475094, 1.1056003463909, 1.66373129934114, 1.44356083265185
), df = c(894, 894, 894, 894, 894, 894, 894, 894, 894, 894, 894,
894, 894, 894, 894, 894, 894, 894), LCL = c(-10.0069052579393,
-0.700318757711651, -5.51777400669205, -1.51305511813823, -1.05357261502514,
-2.35078883599747, 3.33260443922245, -3.26063588462286, 7.38780492844162,
-4.43397842739773, 10.7560441598055, -6.15754180868669, 13.815604150934,
-8.18677301395645, 16.8022045883112, -10.3000681349591, 19.7650632676689,
-12.4414373187615), UCL = c(-3.55441440897366, 5.13306294232543,
-0.590796360522233, 2.98963767534648, 2.39775154750957, 0.871209765800175,
5.46432379296068, -1.17510481297997, 8.86187260344022, -2.95792389761063,
12.946382671775, -4.19052214372721, 17.3395719803452, -5.11745256586298,
21.8057208426668, -5.96031907226584, 26.2956114630078, -6.77511151586902
), tvar = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), levels = c("jog", "read"), class = "factor"),
xvar = c(0, 0, 0.5, 0.5, 1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5,
3, 3, 3.5, 3.5, 4, 4)), estName = "yvar", clNames = c("lower.CL",
"upper.CL"), pri.vars = c("prog", "hours"), adjust = "none", side = 0, delta = 0, type = "link", mesg = "Confidence level used: 0.95", row.names = c(NA,
18L), class = c("summary_emm", "data.frame"), labs = list(xlab = "hours",
ylab = "Linear prediction", tlab = "prog"), vars = list(byvars = character(0),
tvars = "prog"))

Row wise parallel Processing in R?

I am working on large data sets, for which I have written code that performs a row-by-row operation on a data frame sequentially. The process is slow.
I am trying to perform the operation using parallel processing to make it faster.
Here is the code:
library(geometry)
# Data set - a
data_a = structure(c(10.4515034409741, 15.6780890052356, 12.5581992918563,
9.19067944250871, 14.4459166666667, 11.414, 17.65325, 12.468,
11.273, 15.5945), .Dim = c(5L, 2L), .Dimnames = list(c("1", "2",
"3", "4", "5"), c("a", "b")))
# Data set - b
data_b = structure(c(10.4515034409741, 15.6780890052356, 12.5581992918563,
9.19067944250871, 14.4459166666667, 11.3318076923077, 13.132273830156,
6.16003995082975, 11.59114820435, 10.9573192090395, 11.414, 17.65325,
12.468, 11.273, 15.5945, 11.5245, 12.0249, 6.3186, 13.744, 11.0921), .Dim = c(10L,
2L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), c("a",
"b")))
conv_hull_1 <- convhulln( data_a, options = "FA") # Draw Convex Hull
test = c()
for (i in 1:nrow(data_b)){
df = c()
con_hull_all <- inhulln(conv_hull_1, matrix(data_b[i,], ncol = 2))
df$flag <- ifelse(con_hull_all[1] == TRUE , 0 , ifelse(con_hull_all[1] == FALSE , 1, 2))
test <- as.data.frame(rbind(test, df))
print(i)
}
test
Is there any way to parallelize the row-wise computation?
As you can observe, for small datasets the computation time is really low, but as soon as I increase the data size, it increases drastically.
Can you provide a solution with code?
Thanks in advance.
You could take advantage of the p parameter of the inhulln function. It accepts a matrix with more than one row, so all the points to be tested can be passed in at once.
I've tried the code below on a 320,000 row matrix that I made from the original data and it's quick.
library(geometry)
library(dplyr)
# Data set - a
data_a = structure(
c(
10.4515034409741,
15.6780890052356,
12.5581992918563,
9.19067944250871,
14.4459166666667,
11.414,
17.65325,
12.468,
11.273,
15.5945
),
.Dim = c(5L, 2L),
.Dimnames = list(c("1", "2",
"3", "4", "5"), c("a", "b"))
)
# Data set - b
data_b = structure(
c(
10.4515034409741,
15.6780890052356,
12.5581992918563,
9.19067944250871,
14.4459166666667,
11.3318076923077,
13.132273830156,
6.16003995082975,
11.59114820435,
10.9573192090395,
11.414,
17.65325,
12.468,
11.273,
15.5945,
11.5245,
12.0249,
6.3186,
13.744,
11.0921
),
.Dim = c(10L,
2L),
.Dimnames = list(c(
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
), c("a",
"b"))
)
conv_hull_1 <- convhulln( data_a, options = "FA") # Draw Convex Hull
#Make a big data_b
for (i in 1:15) {
data_b = rbind(data_b, data_b)
}
In_Or_Out <- inhulln(conv_hull_1, data_b)
result <- data.frame(data_b) %>% bind_cols(InOrOut=In_Or_Out)
I use dplyr::bind_cols to bind the in or out result to a data frame version of the original data so you might need some changes for your specific environment.
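If a single vectorised call is still too slow or too large for memory, one option is to split the rows into chunks and hand each chunk to a worker. The following is only a sketch under assumptions: in_unit_square() is a hypothetical stand-in for a vectorised test such as inhulln(), used so the example runs without the geometry package, and the chunk count of 4 is arbitrary.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in for a vectorised point test: is each row inside the unit square?
in_unit_square <- function(pts) {
  pts[, 1] >= 0 & pts[, 1] <= 1 & pts[, 2] >= 0 & pts[, 2] <= 1
}

set.seed(1)
pts <- matrix(runif(2000, min = -0.5, max = 1.5), ncol = 2)

# Split the row indices into 4 contiguous chunks
n_chunks <- 4
chunk_id <- cut(seq_len(nrow(pts)), breaks = n_chunks, labels = FALSE)
chunks <- split(seq_len(nrow(pts)), chunk_id)

# mc.cores = 1 runs serially and is portable; raise it on Unix-alikes to fork workers
res <- mclapply(chunks,
                function(idx) in_unit_square(pts[idx, , drop = FALSE]),
                mc.cores = 1)

# Chunks are contiguous and in order, so concatenating restores row order
flags <- unlist(res, use.names = FALSE)
```

Chunking keeps the per-worker call vectorised, which usually matters more than the number of workers; processing one row per task tends to be dominated by overhead.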

How can I compute the median absolute deviation (MAD) for generalized linear mixed-effects models

I know my question is linked to stats but I'm looking for a solution in R, so I believe it's suited for SO.
I built a generalized linear mixed-effects model (GLMM) using the glmer function from the lme4 package in R to model species richness around aquaculture sites based on significant explanatory variables using Zuur et al. (2009) Mixed Effects Models and Extensions in Ecology with R. The model is:
Mod1 <- glmer(Richness ~ Distance + Depth + Substrate + Beggiatoa +
Distance*Beggiatoa + (1|Site/transect), family = poisson, data = mydata)
Now I have a full data set collected at different sites and I want to assess how this model performs on the new data set.
Following a question on CV, someone suggested to look for the median absolute deviation (mad) on the new data set. I tried the mad function from the stats package in R but I get the following error message:
Error in x[!is.na(x)] : object of type 'S4' is not subsettable
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
Does anybody know what's going wrong here? Is it that mad in stats can't be calculated for GLMMs? If so, is there another R package to calculate mad from GLMMs?
Edit:
To give you an idea of my data, here's the output from dput(head(mydata)), also note that there's no "Substrate" category in the new data set and "S" refers to "Richness":
structure(list(S = c(0, 1, 2, 3, 3, 2), Site = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("BC", "BH", "GC", "IS", "Ref"
), class = "factor"), Transect = structure(c(4L, 4L, 4L, 4L,
4L, 4L), .Label = c("10GC", "10IS", "10N", "10S", "11IS", "12IS",
"13E", "1GC", "1N", "1W", "2E", "2GC", "2IS", "2N", "2W", "2WA",
"3E", "3GC", "3IS", "3N", "3S", "4E", "4GC", "4IS", "4S", "4W",
"5GC", "5IS", "5S", "6GC", "6IS", "6N", "6S", "6W", "7E", "7GC",
"7IS", "8GC", "8IS", "8W", "9E", "9GC", "9IS", "9N", "RefBC1",
"RefBC10", "RefBC11", "RefBC12", "RefBC2", "RefBC3", "RefBC4",
"RefBC5", "RefBC6", "RefBC7", "RefBC8", "RefBC9", "X1", "X2"), class = "factor"),
Distance = c(2, 20, 40, 80, 120, 160), Depth = c(40L, 40L,
50L, 40L, 40L, 40L), Beggiatoa = c(2, 1, 1, 0, 0, 0)), .Names = c("S",
"Site", "Transect", "Distance", "Depth", "Beggiatoa"), row.names = c(NA,
6L), class = "data.frame")
For within-sample error, the median absolute deviation computation would just be
mad(residuals(fitted_model))
... you might want residuals(fitted_model,type="response"), since residuals will give you deviance residuals by default (see ?residuals.merMod)
If you want to look at out-of-sample error, you could do something like this:
pred <- predict(fitted_model,
newdata = newdf,
type = "response",
re.form=~0)
mad(pred, center=newdf$S)
(re.form=~0 specifies that you want to omit random effects from the prediction, which is your only choice unless you're predicting at sites/transects where you've also got training data)
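To make the center argument concrete, here is a small sketch with made-up numbers (obs and pred are hypothetical, not output from the model above). When center is a vector, mad() computes the median of the elementwise absolute differences and multiplies it by its default constant of 1.4826.

```r
# Hypothetical observed counts and model predictions
obs  <- c(0, 1, 2, 3, 3, 2)
pred <- c(0.5, 1.5, 1.5, 2.0, 3.5, 2.5)

# mad() with an explicit center: constant * median(|pred - obs|)
m1 <- mad(pred, center = obs)
m2 <- 1.4826 * median(abs(pred - obs))

# Pass constant = 1 for the raw median absolute error
m_raw <- mad(pred, center = obs, constant = 1)
```

The scaled and unscaled versions differ only by the constant, so either works for comparing models as long as the same one is used throughout.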

Passing current value of ddply split on to function

Here is some sample data for which I want to encode the gender of the names over time:
names_to_encode <- structure(list(names = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("jane", "john", "madison"), class = "factor"), year = c(1890, 1990, 1890, 1990, 1890, 2012)), .Names = c("names", "year"), row.names = c(NA, -6L), class = "data.frame")
Here is a minimal set of the Social Security data, limited to just those names from 1890 and 1990:
ssa_demo <- structure(list(name = c("jane", "jane", "john", "john", "madison", "madison"), year = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L), female = c(372, 771, 56, 81, 0, 1407), male = c(0, 8, 8502, 29066, 14, 145)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("name", "year", "female", "male"))
I've defined a function which subsets the Social Security data given a year or range of years. In other words, it calculates whether a name was male or female over a given time period by figuring out the proportion of male and female births with that name. Here is the function along with a helper function:
require(plyr)
require(dplyr)
select_ssa <- function(years) {
# If we get only one year (1890) convert it to a range of years (1890-1890)
if (length(years) == 1) years <- c(years, years)
# Calculate the male and female proportions for the given range of years
ssa_select <- ssa_demo %.%
filter(year >= years[1], year <= years[2]) %.%
group_by(name) %.%
summarise(female = sum(female),
male = sum(male)) %.%
mutate(proportion_male = round((male / (male + female)), digits = 4),
proportion_female = round((female / (male + female)), digits = 4)) %.%
mutate(gender = sapply(proportion_female, male_or_female))
return(ssa_select)
}
# Helper function to determine whether a name is male or female in a given year
male_or_female <- function(proportion_female) {
if (proportion_female > 0.5) {
return("female")
} else if(proportion_female == 0.5000) {
return("either")
} else {
return("male")
}
}
Now what I want to do is use plyr, specifically ddply, to subset the data to be encoded by year, and merge each of those pieces with the value returned by the select_ssa function. This is the code I have.
ddply(names_to_encode, .(year), merge, y = select_ssa(year), by.x = "names", by.y = "name", all.x = TRUE)
When calling select_ssa(year), this command works just fine if I hard code a value like 1890 as the argument to the function. But when I try to pass it the current value for year that ddply is working with, I get an error message:
Error in filter_impl(.data, dots(...), environment()) :
(list) object cannot be coerced to type 'integer'
How can I pass the current value of year on to ddply?
I think you're making things too complicated by trying to do a join inside ddply. If I were to use dplyr I would probably do something more like this:
names_to_encode <- structure(list(name = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("jane", "john", "madison"), class = "factor"), year = c(1890, 1990, 1890, 1990, 1890, 2012)), .Names = c("name", "year"), row.names = c(NA, -6L), class = "data.frame")
ssa_demo <- structure(list(name = c("jane", "jane", "john", "john", "madison", "madison"), year = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L), female = c(372, 771, 56, 81, 0, 1407), male = c(0, 8, 8502, 29066, 14, 145)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("name", "year", "female", "male"))
names_to_encode$name <- as.character(names_to_encode$name)
names_to_encode$year <- as.integer(names_to_encode$year)
tmp <- left_join(ssa_demo,names_to_encode) %.%
group_by(year,name) %.%
summarise(female = sum(female),
male = sum(male)) %.%
mutate(proportion_male = round((male / (male + female)), digits = 4),
proportion_female = round((female / (male + female)), digits = 4)) %.%
mutate(gender = ifelse(proportion_female == 0.5,"either",
ifelse(proportion_female > 0.5,"female","male")))
Note that dplyr 0.1.1 is still a little finicky about the types of join columns, so I had to convert them. I think I saw some activity on GitHub that suggested that was either fixed in the dev version, or at least something they're working on.
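If you would rather keep the split-and-merge approach, the usual way to get at the current group's value is to wrap the call in an anonymous function and read the value from the piece itself. Below is a base-R sketch of that pattern using split() and lapply(); the toy data and lookup_for_year() (a stand-in for select_ssa()) are hypothetical.

```r
# Toy data (hypothetical values)
to_encode <- data.frame(name = c("john", "jane", "john", "jane"),
                        year = c(1890, 1890, 1990, 1990))

# Hypothetical stand-in for select_ssa(): returns a lookup table for one year
lookup_for_year <- function(yr) {
  data.frame(name = c("jane", "john"),
             gender = c("female", "male"),
             lookup_year = yr)
}

# Split by year, then merge each piece with the lookup built from that piece's own year
pieces <- lapply(split(to_encode, to_encode$year), function(piece) {
  merge(piece, lookup_for_year(piece$year[1]), by = "name", all.x = TRUE)
})
encoded <- do.call(rbind, pieces)
```

The same anonymous function should also work as the function argument to ddply, since each piece it receives carries its own year column.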

Function defining answer by a vector

Looking to learn function writing. I have data laid out in the following (e.g.):
Genus Species Wing Tail
A X 10.5 20.3
A Y 10.7 20.7
B XX 15.2 22.5
B XY 15.5 24
I calculate variance for a given trait using the equation:
sqrt(max(Wing) - min (Wing))
which I sum for all traits.
So I can write the following function to sum the variance for the total data set:
variance<- function(data){
t <- sqrt(max(Tail)-min(Tail))
w <- sqrt(max(Wing)-min(Wing))
x <- sum(t,w)
x
}
But I can't work out how to generate a response that gives me an output where this result is dependent on the Genus. So I'm looking to generate an output like:
Genus A Genus B
2.345 3.456
I am going to give a new name to your function because it's just wrong to call it "variance". I hope you can overlook that. We can work on a dataframe object
dput(dfrm)
structure(list(Genus = structure(c(1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), Species = structure(c(1L, 4L, 2L, 3L
), .Label = c("X", "XX", "XY", "Y"), class = "factor"), Wing = c(10.5,
10.7, 15.2, 15.5), Tail = c(20.3, 20.7, 22.5, 24)), .Names = c("Genus",
"Species", "Wing", "Tail"), class = "data.frame", row.names = c(NA,
-4L))
dev2<- function(df){
t <- sqrt(max(df[["Tail"]])-min(df[["Tail"]]))
w <- sqrt(max(df[["Wing"]])-min(df[["Wing"]]))
x <- sum(t,w)
x
}
Now use it to work on the full dataframe, using the split-lapply strategy, which passes sections of the original dataframe determined by the Genus values to the dev2 function
lapply( split(dfrm, list(dfrm$Genus)), FUN = dev2)
$A
[1] 1.079669
$B
[1] 1.772467
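If a named vector is closer to the layout asked for than a list, sapply() can replace lapply(). A self-contained sketch using the same data and function (the name by_genus is just illustrative):

```r
dfrm <- data.frame(Genus   = factor(c("A", "A", "B", "B")),
                   Species = factor(c("X", "Y", "XX", "XY")),
                   Wing    = c(10.5, 10.7, 15.2, 15.5),
                   Tail    = c(20.3, 20.7, 22.5, 24))

# Sum of sqrt(range) over the two traits, as defined above
dev2 <- function(df) {
  t <- sqrt(max(df[["Tail"]]) - min(df[["Tail"]]))
  w <- sqrt(max(df[["Wing"]]) - min(df[["Wing"]]))
  sum(t, w)
}

# sapply() simplifies the per-genus results to a named numeric vector
by_genus <- sapply(split(dfrm, dfrm$Genus), dev2)
```

The names of by_genus are the Genus levels, so by_genus["A"] picks out the value for genus A.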
