How do I plot more than one series using qplot? - r

I'm trying to understand how to have more than one series on a plot, using the following data.
Year <- c('1950', '1960', '1970', '1980')
Bus <- c(10,20,30,40)
Bus.sd <- c(1.1, 2.2, 3.3, 4.4)
Car <- c(20, 20, 40, 40)
Car.sd <- c(1.1, 2.2, 3.3, 4.4)
sample_data = data.frame(Year, Bus, Bus.sd, Car, Car.sd)
qplot(Year, Bus, data=sample_data, geom="pointrange",
ymin = Bus - Bus.sd/2, ymax = Bus + Bus.sd/2)
For example, using the above data, how do I show both sample_data$Bus and sample_data$Car on the same plot in different colors?
What I tried doing was:
p <- qplot(...)
then
p <- p + qplot(...)
where I replicated the previous line, but this gave me an error.
I don't fully understand how AES works. I have studied the ggplot2 examples, but have difficulty understanding the relevant examples here. Or, if it is possible to make a stacked bar (geom_bar) using this data, I think that would also represent it appropriately.

I hope this helps.
ggplot2 works best with data in long format, like so:
Year score sd variable
1 1950 10 1.1 bus
2 1960 20 2.2 bus
3 1970 30 3.3 bus
4 1980 40 4.4 bus
5 1950 20 1.1 car
6 1960 20 2.2 car
7 1970 40 3.3 car
8 1980 40 4.4 car
This will get the data into R:
data <- structure(list(Year = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L), class = "factor", .Label = c("1950", "1960", "1970", "1980"
)), score = c(10, 20, 30, 40, 20, 20, 40, 40), sd = c(1.1, 2.2,
3.3, 4.4, 1.1, 2.2, 3.3, 4.4), variable = c("bus", "bus", "bus",
"bus", "car", "car", "car", "car")), .Names = c("Year", "score",
"sd", "variable"), row.names = c(NA, -8L), class = "data.frame")
And this will make the plot, dodge and all. You probably need the dodge because your data points overlap. You can control the amount of dodging with the width argument of position_dodge().
# colour by series; dodge both layers by the same width so they line up
ggplot(data, aes(x=Year, y=score, col=variable))+
geom_point(position=position_dodge(width=0.2))+
geom_pointrange(aes(ymin=score-sd, ymax=score+sd), position=position_dodge(width=0.2))
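The long-format table does not have to be typed by hand. As a minimal sketch in base R, the wide sample_data from the question can be stacked into the same Year/score/sd/variable layout; the name long_data is only illustrative.

```r
# Wide data, as given in the question
Year   <- c('1950', '1960', '1970', '1980')
Bus    <- c(10, 20, 30, 40)
Bus.sd <- c(1.1, 2.2, 3.3, 4.4)
Car    <- c(20, 20, 40, 40)
Car.sd <- c(1.1, 2.2, 3.3, 4.4)
sample_data <- data.frame(Year, Bus, Bus.sd, Car, Car.sd)

# Stack one block of rows per series; "variable" records which series a row came from
long_data <- rbind(
  data.frame(Year = sample_data$Year, score = sample_data$Bus,
             sd = sample_data$Bus.sd, variable = "bus"),
  data.frame(Year = sample_data$Year, score = sample_data$Car,
             sd = sample_data$Car.sd, variable = "car")
)
print(long_data)
```

Packages such as tidyr or reshape2 can do the same reshaping in one call, but the stacking above needs nothing beyond base R.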

Related

R data visualization: Is there a way to plot based on emmeans using ggplot?

I am trying to visualize my data separately as a bar graph and as a dot plot connected by a line.
The experimental design includes 2 treatments, 3 levels for each treatment, and 2 diets as independent variables, with weight as the dependent variable. Each sample (e.g. treatment "a", level "1", diet "l") is duplicated. Below is a sample data frame (the response variable values are simplified):
df <- data.frame(treatment=c('a','a','a','b','b','b','a','a','a','b','b','b',
'a','a','a','b','b','b','a','a','a','b','b','b',
'a','a','a','b','b','b','a','a','a','b','b','b',
'a','a','a','b','b','b','a','a','a','b','b','b'),
level=c(1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3,
1,2,3,1,2,3,1,2,3,1,2,3),
diet=c('l','l','l','l','l','l','h','h','h','h','h','h',
'l','l','l','l','l','l','h','h','h','h','h','h',
'l','l','l','l','l','l','h','h','h','h','h','h',
'l','l','l','l','l','l','h','h','h','h','h','h'),
rep=c(1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,
1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2),
weight=c(100,75,50,50,25,12.5,100,75,50,50,25,12.5,
100,75,50,50,25,12.5,100,75,50,50,25,12.5,
200,150,100,100,50,25,200,150,100,100,50,25,
200,150,100,100,50,25,200,150,100,100,50,25))
Using a linear mixed model, I see that treatment and level effects are individually significant.
fit_df <- lmer(weight ~ treatment*level*diet + (1|rep), data=df)
I have also run emmeans to see pairwise contrasts between each combination of treatment and level.
(emm_wt <- emmeans(fit_df, specs=pairwise~treatment*level))
Then, I want to visualize the result shown below in a bar graph and a dot plot connected by a line. For the bar graph, the y-axis is emmean, x-axis is treatment*level, and error bars show emmean±SE.
$emmeans
treatment level emmean SE df lower.CL upper.CL
a 1 150.0 7.98 27.7 133.64 166.4
b 1 75.0 7.98 27.7 58.64 91.4
a 2 112.5 7.98 27.7 96.14 128.9
b 2 37.5 7.98 27.7 21.14 53.9
a 3 75.0 7.98 27.7 58.64 91.4
b 3 18.8 7.98 27.7 2.39 35.1
Results are averaged over the levels of: diet
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The code below produces something similar to what I am looking for, but I am not sure how to add a line connecting the dots by the treatment (a1 to a3 and b1 to b3)...
It would also be nice to assign colors by the treatment (e.g. red for a and blue for b).
plot(emm_wt[[1]],
CIs=TRUE,
PIs=TRUE,
comparisons=TRUE,
colors=c("black","dark grey","grey","red"),
alpha=0.05,
adjust="tukey") +
theme_bw() +
coord_flip()
If anybody has any insights as to how I could visualize this, please let me know. Thank you in advance!
You could do something like this, using ggplot2:
library(ggplot2)
ggplot(df,aes(reorder(trt,level),emmean, group=treatment, color=treatment)) +
geom_line(size=2) +
scale_color_manual(values=c("a" = "red", "b"="blue")) +
geom_linerange(aes(ymin=lower.CL, ymax=upper.CL), size=2,show.legend = F) +
geom_point(color="black", size=8) +
ylim(0,200) + labs(x="Treatment/Level", color="Treatment") +
theme(legend.position="bottom")
Input (the emmeans table as a data frame, with an added trt column; define it before running the plot code):
df = structure(list(treatment = c("a", "b", "a", "b", "a", "b"), level = c(1L,
1L, 2L, 2L, 3L, 3L), emmean = c(150, 75, 112.5, 37.5, 75, 18.8
), SE = c(7.98, 7.98, 7.98, 7.98, 7.98, 7.98), df = c(27.7, 27.7,
27.7, 27.7, 27.7, 27.7), lower.CL = c(133.64, 58.64, 96.14, 21.14,
58.64, 2.39), upper.CL = c(166.4, 91.4, 128.9, 53.9, 91.4, 35.1
), trt = structure(c(1L, 4L, 2L, 5L, 3L, 6L), .Label = c("a1",
"a2", "a3", "b1", "b2", "b3"), class = c("ordered", "factor"))), row.names = c(NA,
-6L), class = "data.frame")
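The trt column in that input is just the treatment and level pasted together. A minimal sketch of how it could be built, using the values copied from the $emmeans table in the question (the name emm_df is illustrative; with the fitted model at hand, as.data.frame(emm_wt$emmeans) should provide the same columns, though that is not run here):

```r
# Values copied from the $emmeans table shown in the question
emm_df <- data.frame(
  treatment = c("a", "b", "a", "b", "a", "b"),
  level     = c(1L, 1L, 2L, 2L, 3L, 3L),
  emmean    = c(150, 75, 112.5, 37.5, 75, 18.8),
  lower.CL  = c(133.64, 58.64, 96.14, 21.14, 58.64, 2.39),
  upper.CL  = c(166.4, 91.4, 128.9, 53.9, 91.4, 35.1)
)

# Combined treatment/level label for the x-axis; levels sort as a1, a2, a3, b1, b2, b3
emm_df$trt <- factor(paste0(emm_df$treatment, emm_df$level), ordered = TRUE)
print(emm_df)
```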

Excluding outliers when plotting a Stripchart with ggplot2

I'm trying to create a combination Boxplot/Scatterplot. I'm doing alright with it so far but there's one issue that's really bothering me that I've been unable to figure out. I'm in R and I've installed the ggplot2 package. Here's the code I'm using:
#(xx= stand in for my data set, which I imported from excel with the
# column labels as the X-axis values)
> boxplot(xx, lwd = 1.5, ylab = 'Minutes', xlab = "Epoch")
> stripchart(xx, vertical = TRUE,
+ method = "jitter", add = TRUE, pch = 20, col = 'blue')
This gives me a plot that is pretty close to what I want but the problem is that the outliers are placed on the chart twice. If possible, I'd like to have the stripchart exclude them (highest groups of blue dots) and only use the ones from the boxplot (black outlined circles) so they stand out as different and don't look so sloppy.
I've tried to alter the points in question by putting a lot of different outlier arguments into the stripchart command, unfortunately with no luck. I've tried setting y-limits below their values, tried using outline=false (which completely removes the stripchart), tried changing outlier color, outpch, etc. The command has not worked for any of these attempts. Here's an example of ylim:
> stripchart(xx, vertical = TRUE,
+ method = "jitter", add = TRUE, pch = 20, col = 'blue', ylim = true,
ylim (0,20))
Error in ylim(0, 20) : could not find function "ylim"
And here's an example with outlier color:
> stripchart(xx, vertical = TRUE,
+ method = "jitter", add = TRUE, pch = 20, col = 'blue', outcol = "black")
Warning messages:
1: In plot.xy(xy.coords(x, y), type = type, ...) : "outcol" is not a
graphical parameter
.......# warning messages continue as such.
Are stripcharts capable of outlier exclusion? Or do I simply not know enough about them yet (and R as a whole, for that matter) to effectively write the code?
If this can be done, how should I proceed? I'm totally fine with solutions that don't directly address the outlier issue in terms of the data as long as the visual effect on the plot is the same.
Thank you for your time and any help you can give!
Edit: Here's some of the data to play around with. The top row is the column labels and the data is beneath. Sorry if this formatting is bad. The 29s and 30s and such in the 9th row of data (10th overall) are examples of the points plotted as outliers in my graphs that I would like to keep in the boxplot but not in the scatterplot/stripchart.
1 5 10 15 30 60
7.233333333 8.166666667 9.666666667 7.75 9 7
7.133333333 9.25 9.333333333 9.75 10 11
0.733333333 0.5 0.833333333 1 1 0
1.766666667 1.166666667 1 0.75 1 0
1.75 2.25 2.333333333 2.25 1 1
6.75 7 7.166666667 7.75 6.5 7
1.516666667 1.75 1.333333333 2 2 2
1.533333333 1.5 2 1.25 1.5 2
27.3 28.33333333 29.33333333 30.25 28.5 29
6.35 6 6.333333333 7 6 6
7.083333333 8.333333333 8.833333333 8.75 8 8
8.533333333 10.08333333 10.5 12 10.5 11
7.65 8.416666667 9 10.75 9 12
6.85 7.333333333 8 7.25 6 8
4.433333333 5 5.5 5 6.5 6
8.616666667 10 11.66666667 12.25 13 12
3.633333333 3.75 3.5 3.25 3 2
0.8 0.75 0.833333333 1 1 0
7.283333333 8.583333333 9.666666667 9.75 12 8
7.483333333 8.75 8.333333333 7.75 6.5 7
3.466666667 2.916666667 3.166666667 2.5 2 0
5.483333333 6.416666667 6.833333333 6.75 7 8
There are a few things going on here. If you want to stick with the base plotting functions (boxplot() and stripchart()), you can simply tell stripchart to plot only the points that meet some criterion. One common rule of thumb treats any point 3 or more standard deviations away from the mean as an outlier. Instead of passing your unmodified data set to stripchart, we filter each column separately (mean() and sd() do not work on a whole data frame) and pass the resulting list. The code below drops only the high-side points, which is where your outliers are.
boxplot(xx)
# keep, per column, only the values at or below mean + 3 SD
xx.trimmed <- lapply(xx, function(v) v[v <= mean(v) + sd(v) * 3])
stripchart(xx.trimmed, vertical = TRUE, method = 'jitter', add = TRUE, pch = 20, col = 'blue')
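The 3-standard-deviation cutoff is only one possible criterion, and it is not the rule boxplot() uses to draw its outlier circles. As a sketch of an alternative, boxplot.stats() reports exactly the points a boxplot would flag, so they can be removed from the values handed to stripchart. The vector below is the first column ("1") of the posted data; the names x1, flagged and kept are illustrative.

```r
# First column ("1") of the data posted in the question
x1 <- c(7.233333333, 7.133333333, 0.733333333, 1.766666667, 1.75, 6.75,
        1.516666667, 1.533333333, 27.3, 6.35, 7.083333333, 8.533333333,
        7.65, 6.85, 4.433333333, 8.616666667, 3.633333333, 0.8,
        7.283333333, 7.483333333, 3.466666667, 5.483333333)

# Points beyond 1.5 times the hinge spread, i.e. what boxplot() draws as circles
flagged <- boxplot.stats(x1)$out

# Everything else is what the stripchart should show
kept <- x1[!(x1 %in% flagged)]

print(flagged)
print(length(kept))
```

Applied to every column with lapply(), this gives a list that can be passed to stripchart() in the same way as the 3-SD version.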
Of course, if you really did want to use ggplot2 (and I recommend installing not only that package, but the entire tidyverse with install.packages('tidyverse')), you could produce an arguably nicer plot:
The data formatting and commands needed to produce the ggplot version are quite different from the base graphics version, and beyond the scope of this answer. Reproducible code follows.
library(tidyverse)
df <- structure(list(X1 = c(7.233333333, 7.133333333, 0.733333333, 1.766666667, 1.75, 6.75, 1.516666667, 1.533333333, 27.3, 6.35, 7.083333333, 8.533333333, 7.65, 6.85, 4.433333333, 8.616666667, 3.633333333, 0.8, 7.283333333, 7.483333333, 3.466666667, 5.483333333 ), X5 = c(8.166666667, 9.25, 0.5, 1.166666667, 2.25, 7, 1.75, 1.5, 28.33333333, 6, 8.333333333, 10.08333333, 8.416666667, 7.333333333, 5, 10, 3.75, 0.75, 8.583333333, 8.75, 2.916666667, 6.416666667 ), X10 = c(9.666666667, 9.333333333, 0.833333333, 1, 2.333333333, 7.166666667, 1.333333333, 2, 29.33333333, 6.333333333, 8.833333333, 10.5, 9, 8, 5.5, 11.66666667, 3.5, 0.833333333, 9.666666667, 8.333333333, 3.166666667, 6.833333333), X15 = c(7.75, 9.75, 1, 0.75, 2.25, 7.75, 2, 1.25, 30.25, 7, 8.75, 12, 10.75, 7.25, 5, 12.25, 3.25, 1, 9.75, 7.75, 2.5, 6.75), X30 = c(9, 10, 1, 1, 1, 6.5, 2, 1.5, 28.5, 6, 8, 10.5, 9, 6, 6.5, 13, 3, 1, 12, 6.5, 2, 7), X60 = c(7L, 11L, 0L, 0L, 1L, 7L, 2L, 2L, 29L, 6L, 8L, 11L, 12L, 8L, 6L, 12L, 2L, 0L, 8L, 7L, 0L, 8L)), .Names = c("X1", "X5", "X10", "X15", "X30", "X60"), class = "data.frame", row.names = c(NA, -22L))
df.long <- gather(df, x, value) %>%
mutate(x = as.factor(as.numeric(gsub('X', '', x)))) %>%
group_by(x) %>%
mutate(is.outlier = value > mean(value) + sd(value) * 3)
plot.df <- ggplot(data = df.long, aes(x = x, y = value, group = x)) +
geom_boxplot() +
geom_point(data = filter(df.long, !is.outlier), color = '#0000ff88', position = position_jitter(width = 0.1))
print(plot.df)

Calculating the median of a time series every 8 hours

I am new to R and have to calculate the mean of a time series covering 5 years of hourly data (ozone etc.).
My df looks like:
structure(list(date = structure(c(1L, 1L, 1L, 1L), .Label = "01.01.2010", class = "factor"),
day.of = c(1L, 1L, 1L, 1L), time = structure(1:4, .Label = c("00:00",
"01:00", "02:00", "03:00"), class = "factor"), SVF_Ray = c(1L,
1L, 1L, 1L), Gmax = c(0, 0, 0, 0), Ta = c(-1.3, -1.2, -1.2,
-1.2), Tmrt = c(-19.3, -12.1, -12, -12.1), PET = c(-10.4,
-8.7, -8.7, -8.7), PT = c(-11.3, -9.3, -9.3, -9.3), Ozon = c(61.35,
62.65, 63.4, 63.85), rDatum = structure(c(14610, 14610, 14610,
14610), class = "Date"), year = c(2010, 2010, 2010, 2010),
month = c(1, 1, 1, 1), day = c(1, 1, 1, 1), hour = c(0, 1,
2, 3)), .Names = c("date", "day.of", "time", "SVF_Ray", "Gmax",
"Ta", "Tmrt", "PET", "PT", "Ozon", "rDatum", "year", "month",
"day", "hour"), row.names = c(NA, 4L), class = "data.frame")
I would like to calculate the mean of Ozon every 8 hours, so a series of 3 calculated means for every day. I have prepared my data like this:
Datum_Ozon$rDatum <- as.Date(data$date, format="%d.%m.%Y")
Datum_Ozon$hour<-as.numeric(unlist(strsplit(as.character(df$time), ":"))[seq(1, 2 * length(df$time), 2)])
Format is numeric
But I don't know how to proceed from here. Thanks in advance!
If it's the case that your data is regular and complete (i.e., every hour has a record), the following base R code should do the trick:
# Get the number of 8 hour intervals
intervalCnt <- nrow(df) / 8L
# add a grouping vector to your data
df$group <- rep(1:intervalCnt, each=8)
# get the median for each interval, keep year var around for later
intervalMedian <- aggregate(var~group + day + month + year, data=df, FUN=median)
Note that this solution relies on the assumption that the data has a regular structure, i.e., every hour has a record. If the measure of interest is missing, i.e. NA, then simply adding na.rm to the aggregate function will return the statistics of interest:
# get the median for each interval
intervalMedian <- aggregate(var~group + day + month + year, data=df, FUN=median, na.rm=T)
If you have a variable for hour of the day, here is a simple way to check for data regularity:
table(df$hourOfDay)
The result of this function is a frequency count of each hour. The counts should be equal. Another thing to check is that the first observation starts in the hour following the final observation, i.e. if the hour of observation 1 == "00:00", then the hour of the final observation should be 23:00.
To provide a plot of the mean of the 8 hour periods by year, you can again use aggregate:
intervalMeans.year <- aggregate(var~group, data=intervalMedian,
FUN=mean, na.rm=T)
The inclusion of the group, day, month, and year variables in the intervalMedian data.frame allow for a lot of different aggregations. For example, with a minor adjustment, it is possible to get the average value of a variable over the 5 year period for each time period-day-month:
intervalMedian$periodDay <- rep(1:3, length.out=nrow(intervalMedian))
intervalMeans.dayMonthPeriod <- aggregate(var~periodDay+day+month,
data=intervalMedian, FUN=mean, na.rm=T)
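The rep(..., each=8) grouping depends on every hour being present and in order. If an hour column is available, as in the question's data, the 8-hour block can instead be derived from the hour itself. A minimal sketch on made-up data (the toy data frame and its values are hypothetical, used only to show the mechanics):

```r
# Hypothetical hourly data for two days (values are invented for illustration)
toy <- data.frame(
  day  = rep(1:2, each = 24),
  hour = rep(0:23, times = 2),
  Ozon = seq_len(48)
)

# Integer division maps hours 0-7 to block 0, 8-15 to block 1, 16-23 to block 2
toy$block <- toy$hour %/% 8

# One median per 8-hour block per day
block_median <- aggregate(Ozon ~ block + day, data = toy, FUN = median)
print(block_median)
```

Because the block comes from the hour value rather than the row position, a missing record shortens one block instead of shifting all later ones.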
Here is a basic example using a dplyr pipe rather than a plyr approach as well as ifelse(). Everything is self contained here:
library(dplyr)
## OP data
df <-
structure(list(date = structure(c(1L, 1L, 1L, 1L), .Label = "01.01.2010", class = "factor"),
day.of = c(1L, 1L, 1L, 1L), time = structure(1:4, .Label = c("00:00",
"01:00", "02:00", "03:00"), class = "factor"), SVF_Ray = c(1L,
1L, 1L, 1L), Gmax = c(0, 0, 0, 0), Ta = c(-1.3, -1.2, -1.2,
-1.2), Tmrt = c(-19.3, -12.1, -12, -12.1), PET = c(-10.4,
-8.7, -8.7, -8.7), PT = c(-11.3, -9.3, -9.3, -9.3), Ozon = c(61.35,
62.65, 63.4, 63.85), rDatum = structure(c(14610, 14610, 14610,
14610), class = "Date"), year = c(2010, 2010, 2010, 2010),
month = c(1, 1, 1, 1), day = c(1, 1, 1, 1), hour = c(0, 1,
2, 3)), .Names = c("date", "day.of", "time", "SVF_Ray", "Gmax",
"Ta", "Tmrt", "PET", "PT", "Ozon", "rDatum", "year", "month",
"day", "hour"), row.names = c(NA, 4L), class = "data.frame")
df %>%
mutate(DayChunk=ifelse(hour %in% c(0:7),"FirstThird",
ifelse(hour %in% c(8:15), "SecondThird"
,"ThirdThird")
)) %>%
group_by(date, DayChunk) %>%
summarise(MedOzon=median(Ozon))
Look up the function seq.POSIXt. There are options to specify the start and stop intervals. This function is designed to create sequences of time. For your problem:
myseq<-seq(ISOdate(2010,01,01, 00, 00, 00, tz="GMT"), to=ISOdate(2016,01,05), by = "8 hour")
Use the ISOdate functions to set the start and stop times. If you are going to be working much with times, I suggest researching the function strptime and the POSIXlt/ct time classes.
Now, with the breaks defined, build a "datetime" column in your data frame (Datum_Ozon) and then use cut() to group your data:
Datum_Ozon$datetime<-as.POSIXct(paste(as.character(Datum_Ozon$date),
as.character(Datum_Ozon$time)), format="%d.%m.%Y %H:%M", tz="GMT")
library(dplyr)
summarize(group_by(Datum_Ozon, cut(Datum_Ozon$datetime, myseq)), mean(Ozon))
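To see what cut() does with such a sequence of breaks, here is a small self-contained sketch in base R covering a single day. The names stamps, breaks and bins are illustrative, and the timestamps are generated rather than taken from the real data.

```r
# 24 hourly timestamps for 1 January 2010 (generated for illustration)
stamps <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz = "GMT"),
              by = "1 hour", length.out = 24)

# Break points every 8 hours: 00:00, 08:00, 16:00 and 00:00 on the next day
breaks <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz = "GMT"),
              by = "8 hours", length.out = 4)

# Each timestamp is assigned to the interval starting at the break at or before it
bins <- cut(stamps, breaks)
print(table(bins))
```

Each of the three intervals should receive eight hourly timestamps; a timestamp outside the range of the breaks would come back as NA, which is worth checking for on real data.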

ggplot graph in R based on many variables

I have two data sets, d1 and d2, in csv files. Each data set has 6 columns. I managed to combine them with the melt command and plot them together in ggplot. After adding one extra column, another variable the graph should depend on, I can no longer get the graph I need. A sample of the data set and my code are below.
The dataset after using melt and reshape package:
initi A B C D E L1
0.005 1 23.7 1.0 1.0 24.7 d2
0.005 2 31.2 2.0 2.1 31.2 d2
0.005 3 35.8 3.1 3.2 35.6 d2
1 1 6.2 1.0 1.0 6.2 d1
1 2 10.1 2.0 2.1 7.0 d1
1 3 11.2 3.0 3.5 7.0 d1
2 1 14.2 8.0 14.3 5.2 d1
2 2 15.9 7.0 13.0 5.5 d1
2 3 16.0 6.2 12.4 5.8 d1
I need to graph A in the X-axis and B in the Y-axis. The initi value will represent each graph. In other words, it will be in the legend. For d1, I need to plot the results between A and B. For d2 I want to plot two graphs where the first graph when the initi = 1 and the second for initi = 2. All of the graphs for d1 and d2 are between A and B and combined in graph. The total in this case 3 lines combined in one graph.
I managed to graph d1 and d2 before I added the initi column. Now I am struggling. Below is my code:
dlist <- list(d1 =data1 ,d2 = data2)
reshaped_data <- melt(dlist, id.vars = c('initi','A','B','C','D','E'))
graph_AB<-ggplot(reshaped_data,aes(x = A, y = B, colour = initi)) +
geom_point(size = 5)+
geom_line() +
ggtitle("DATA1 vs DATA2")
The above code is close to what I want, except that the lines are connected in a strange way. I should get one line for each value of "initi". Also, the legend does not show each value of 'initi'.
You say you want two graphs, one for when initi is 1 and another for when it is 2, but it also takes the value 0.005. You need to subset your data first if you want to omit the 0.005 level.
In ggplot2, multiple graphs are called "facets", and since your facets depend on one variable, the command you want is facet_wrap(). (You'd use facet_grid if your facets depended on 2 variables.) Facets should be on factors, so we'll make sure initi is a factor
reshaped_data$initi <- factor(reshaped_data$initi)
then just add + facet_wrap(~ initi) to your ggplot.
EDIT:
If you want just one graph, try
graph_AB<-ggplot(reshaped_data,aes(x = A, y = B, colour = initi, group = initi)) +
geom_point(size = 5)+
geom_line() +
ggtitle("DATA1 vs DATA2")
graph_AB
In the future, post data using dput. For example, if you do dput(reshaped_data) and paste it into your question, the output will be this:
structure(list(initi = c(0.005, 0.005, 0.005, 1, 1, 1, 2, 2,
2), A = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), B = c(23.7, 31.2,
35.8, 6.2, 10.1, 11.2, 14.2, 15.9, 16), C = c(1, 2, 3.1, 1, 2,
3, 8, 7, 6.2), D = c(1, 2.1, 3.2, 1, 2.1, 3.5, 14.3, 13, 12.4
), E = c(24.7, 31.2, 35.6, 6.2, 7, 7, 5.2, 5.5, 5.8), L1 = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("d1", "d2"), class = "factor")), .Names = c("initi",
"A", "B", "C", "D", "E", "L1"), class = "data.frame", row.names = c(NA,
-9L))
which anyone can paste into R and use easily.
dlist <- list(d1 =data1 ,d2 = data2)
reshaped_data <- melt(dlist, id.vars = c('initi','A','B','C','D','E'))
graph_AB<-ggplot(reshaped_data,aes(x = A, y = B, colour = initi)) +
geom_point(size = 5)+
ggtitle("DATA1 vs DATA2")
print(graph_AB)
I just removed geom_line from the code in the original question.

Function defining answer by a vector

I'm looking to learn function writing. I have data laid out as follows (e.g.):
Genus Species Wing Tail
A X 10.5 20.3
A Y 10.7 20.7
B XX 15.2 22.5
B XY 15.5 24
I calculate variance for a given trait using the equation:
sqrt(max(Wing) - min (Wing))
which I sum for all traits.
So I can write the following function to sum the variance for the total data set:
variance<- function(data){
t <- sqrt(max(Tail)-min(Tail))
w <- sqrt(max(Wing)-min(Wing))
x <- sum(t,w)
x
}
But I can't work out how to get an output where this result depends on the Genus. So I'm looking to generate an output like:
Genus A Genus B
2.345 3.456
I am going to give a new name to your function because it's just wrong to call it "variance". I hope you can overlook that. We can work on a dataframe object
dput(dfrm)
structure(list(Genus = structure(c(1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), Species = structure(c(1L, 4L, 2L, 3L
), .Label = c("X", "XX", "XY", "Y"), class = "factor"), Wing = c(10.5,
10.7, 15.2, 15.5), Tail = c(20.3, 20.7, 22.5, 24)), .Names = c("Genus",
"Species", "Wing", "Tail"), class = "data.frame", row.names = c(NA,
-4L))
dev2<- function(df){
t <- sqrt(max(df[["Tail"]])-min(df[["Tail"]]))
w <- sqrt(max(df[["Wing"]])-min(df[["Wing"]]))
x <- sum(t,w)
x
}
Now use it to work on the full dataframe, using the split-lapply strategy, which passes sections of the original dataframe determined by the Genus values to the dev2 function
lapply( split(dfrm, list(dfrm$Genus)), FUN = dev2)
$A
[1] 1.079669
$B
[1] 1.772467
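lapply() returns a list. If a single named vector is preferred, closer to the layout asked for in the question, sapply() over the same split should do it. A minimal self-contained sketch, rebuilding dfrm from the question's table (the name by_genus is illustrative):

```r
# The example data from the question
dfrm <- data.frame(
  Genus   = c("A", "A", "B", "B"),
  Species = c("X", "Y", "XX", "XY"),
  Wing    = c(10.5, 10.7, 15.2, 15.5),
  Tail    = c(20.3, 20.7, 22.5, 24)
)

# Same function as above: sum of sqrt(range) over the two traits
dev2 <- function(df) {
  t <- sqrt(max(df[["Tail"]]) - min(df[["Tail"]]))
  w <- sqrt(max(df[["Wing"]]) - min(df[["Wing"]]))
  sum(t, w)
}

# sapply simplifies the per-genus results into a named numeric vector
by_genus <- sapply(split(dfrm, dfrm$Genus), dev2)
print(by_genus)
```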
