How to find LC50 using R? [closed]

I have run a cadmium exposure test (46 h) and now I want to find the LC50 value (the lethal concentration for 50% of test organisms) and its 95% confidence limits (upper and lower) using R.
Here are my data:
Conc. (mg/L) Dead Live   (C1, C2, C3 are untreated control replicates)
C1 0 10
C2 0 10
C3 0 10
2 0 10
2 0 10
2 0 10
4 0 10
4 0 10
4 0 10
8 0 10
8 0 10
8 0 10
16 1 9
16 1 9
16 8 8
32 1 9
32 2 8
32 4 6
64 8 2
64 2 8
64 5 5
128 10 0
128 8 2
128 10 0
256 10 0
256 10 0
256 10 0

By one simple definition, the LC50 is the minimum tested concentration at which 50% or more of the organisms die. You could aggregate your data to compute the proportion of organisms that died at each concentration level:
# Numeric concentration
dat$Conc.mg.L <- as.character(dat$Conc.mg.L)
dat$Conc.mg.L[dat$Conc.mg.L %in% c("C1", "C2", "C3")] <- 0
dat$Conc.mg.L <- as.numeric(dat$Conc.mg.L)
# Determine LC50
(agg <- tapply(dat$Dead / (dat$Dead+dat$Live), dat$Conc.mg.L, mean))
# 0 2 4 8 16 32 64 128 256
# 0.0000000 0.0000000 0.0000000 0.0000000 0.2333333 0.2333333 0.5000000 0.9333333 1.0000000
as.numeric(names(agg)[min(which(agg >= 0.5))])
# [1] 64
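The table above only identifies the lowest tested concentration whose average mortality crosses 50%. To get a proper LC50 estimate with the requested 95% confidence limits, the usual approach is a dose-response model: fit a binomial GLM with a probit (or logit) link on log concentration, then invert it with MASS::dose.p(). A sketch, assuming the controls are coded as 0 mg/L and dropped before taking logs:

```r
library(MASS)  # for dose.p()

# Replicate-level data from the question (controls coded as 0 mg/L)
dat <- data.frame(
  conc = rep(c(0, 2, 4, 8, 16, 32, 64, 128, 256), each = 3),
  dead = c(0,0,0, 0,0,0, 0,0,0, 0,0,0, 1,1,8, 1,2,4, 8,2,5, 10,8,10, 10,10,10),
  live = c(10,10,10, 10,10,10, 10,10,10, 10,10,10, 9,9,8, 9,8,6, 2,8,5, 0,2,0, 0,0,0)
)

# Probit regression on log10(concentration); the controls are dropped
# because log10(0) is -Inf
nz  <- dat[dat$conc > 0, ]
fit <- glm(cbind(dead, live) ~ log10(conc),
           family = binomial(link = "probit"), data = nz)

# dose.p() returns the log10 dose for 50% mortality and its standard error
lc   <- dose.p(fit, p = 0.5)
lc50 <- 10^lc[1]
ci   <- 10^(lc[1] + c(-1.96, 1.96) * attr(lc, "SE")[1])
c(LC50 = lc50, lower = ci[1], upper = ci[2])
```

The limits are computed on the log scale and back-transformed, which keeps them positive and asymmetric, as is conventional for LC50 reporting.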

Related

How to add two specific columns from a colSums table in r?

I made a frequency table with two variables in a data frame using this:
table(df$Variable1, df$Variable2)
The output was this:
1 2 3 4 5 D R
1 5000 21 39 2 10 0 112
2 1028 11 18 4 8 1 54
3 1501 6 12 2 3 0 68
4 355 2 4 0 0 0 23
5 421 4 4 0 0 0 49
Then I wanted to find the sum of the first two columns so I did this:
colSums(table(df$Variable1, df$Variable2))
The output was this:
1 2 3 4 5 D R
8305 44 77 8 21 1 306
Is there a way to find the sum of columns 1 and 2 from the colSums output above? What would the code be? Thanks in advance.
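Since colSums() returns a named numeric vector, you can subset it by name or by position before summing. A sketch using the printed values as a stand-in (with the real data this would be sum(colSums(table(df$Variable1, df$Variable2))[1:2])):

```r
# Stand-in for colSums(table(df$Variable1, df$Variable2))
cs <- c(`1` = 8305, `2` = 44, `3` = 77, `4` = 8, `5` = 21, D = 1, R = 306)

sum(cs[c("1", "2")])  # subset by name
sum(cs[1:2])          # subset by position; both give 8349
```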

Choosing the correct fixed and random variables in a generalized linear mixed model (GLMM) in a longitudinal study (repeated measures)

I want to explore the relationship between the abundance of an organism and several possible explanatory factors. I have doubts regarding what variables should be called as fixed or random in the GLMM.
I have a dataset with the number of snails in different sites within a national park (all sites are under the same climatic conditions). But there are local parameters whose effects over the snail abundance haven't been studied yet.
This is a longitudinal study with repeated measures over time (every month, for almost two years). The snails were counted in the field, always in the same 21 sites (each site has a 6 x 6 m plot, delimited with wooden stakes).
In case it could influence the analysis, note that some parameters may vary over time, such as the vegetation cover in each plot or the presence of the snail's natural predator (measured as yes/no values). Others, however, are always the same because they are specific to each site, such as the distance to the nearest riverbed or the type of soil.
Here is a subset of my data:
> snail.data
site time snails vegetation_cover predator type_soil distant_riverbed
1 1 1 9 NA n 1 13
2 1 2 7 0.8 n 1 13
3 1 3 13 1.4 n 1 13
4 1 4 14 0.6 n 1 13
5 1 5 12 1.6 n 1 13
10 2 1 0 NA n 1 136
11 2 2 0 0.0 n 1 136
12 2 3 0 0.0 n 1 136
13 2 4 0 0.0 n 1 136
14 2 5 0 0.0 n 1 136
19 3 1 1 NA n 2 201
20 3 2 0 0.0 n 2 201
21 3 3 0 0.0 y 2 201
22 3 4 3 0.0 n 2 201
23 3 5 2 0.0 n 2 201
28 4 1 0 NA n 2 104
29 4 2 0 0.0 n 2 104
30 4 3 0 0.0 y 2 104
31 4 4 0 0.0 n 2 104
32 4 5 0 0.0 n 2 104
37 5 1 1 NA n 3 65
38 5 2 0 2.4 n 3 65
39 5 3 3 2.2 n 3 65
40 5 4 2 2.2 n 3 65
41 5 5 4 2.0 y 3 65
46 6 1 1 NA n 3 78
47 6 2 2 3.0 n 3 78
48 6 3 7 2.8 n 3 78
49 6 4 3 1.8 n 3 78
50 6 5 6 1.2 y 3 78
55 7 1 14 NA n 3 91
56 7 2 21 2.8 n 3 91
57 7 3 16 2.6 n 3 91
58 7 4 15 1.6 n 3 91
59 7 5 8 2.0 n 3 91
So I'm interested in investigating if the number of snails is significantly different in each site and if those differences are related to some specific parameters.
So far the best statistical approach I have found is a generalized linear mixed model, but I'm struggling to choose the correct fixed and random variables. My reasoning is that although I'm checking for differences among sites (by comparing the number of snails), the focus of the study is on the other parameters mentioned above, so site would be a random factor.
Then my question is: should 'site' and 'time' be considered random factors, with the local parameters as fixed variables? Should I include interactions between time and the other factors?
I have set up my command as follows:
library(lme4)
mixed_model <- glmer(snails ~ vegetation_cover + predator + type_soil + distant_riverbed + (1|site) + (1|time), data = snail.data, family = poisson)
Would it be the correct syntax for what I have described?
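For comparison, here is a minimal runnable sketch on simulated data (the effect sizes and simulation setup are invented) with 'site' as a random intercept and time as a fixed covariate; whether time belongs in the fixed or random part depends on whether you expect a systematic trend or just month-to-month noise:

```r
library(lme4)

set.seed(1)
# Simulated stand-in for snail.data: 21 sites, 20 monthly visits each
sim <- data.frame(
  site = factor(rep(1:21, each = 20)),
  time = rep(1:20, times = 21),
  vegetation_cover = runif(21 * 20, 0, 3)
)
sim$snails <- rpois(nrow(sim), lambda = exp(0.3 + 0.3 * sim$vegetation_cover))

# Random intercept per site; time as a fixed linear trend
m <- glmer(snails ~ vegetation_cover + time + (1 | site),
           data = sim, family = poisson)
fixef(m)
```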

transform values in data frame, generate new values as 100 minus current value

I'm currently working on a script which will eventually plot the accumulation of losses from cell divisions. Firstly I generate a matrix of values and then I add the number of times 0 occurs in each column - a 0 represents a loss.
However, I am now thinking that a nice plot would be a degradation curve. So, given the following example:
> losses_plot_data <- melt(full_losses_data, id = c("Divisions", "Accuracy"), value.name = "Losses", variable.name = "Size")
> full_losses_data
Divisions Accuracy 20 15 10 5 2
1 0 0 0 0 3 25
2 0 0 0 1 10 39
3 0 0 1 3 17 48
4 0 0 1 5 23 55
5 0 1 3 8 29 60
6 0 1 4 11 34 64
7 0 2 5 13 38 67
8 0 3 7 16 42 70
9 0 4 9 19 45 72
10 0 5 11 22 48 74
Is there a way I can easily turn this table into being 100 minus the numbers shown in the table? If I can plot that data instead of my current data, I would have a lovely curve of degradation from 100% down to however many cells have been lost.
Assuming you do not want to do that for the first column:
fld <- full_losses_data
fld[, 2:ncol(fld)] <- 100 - fld[, -1]
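To see what those two lines do, here is a small made-up slice of full_losses_data (column names and values invented): the first column is left alone and every other column is replaced by 100 minus its value.

```r
# Toy stand-in for full_losses_data
fld <- data.frame(Divisions = 1:3, Accuracy = 0,
                  `20` = c(0, 0, 1), `5` = c(3, 10, 17),
                  check.names = FALSE)

# Replace everything except column 1 with 100 minus the current value
fld[, 2:ncol(fld)] <- 100 - fld[, -1]
fld
```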

summing a range of columns in data frame

I am having trouble summing select columns within a data frame, a basic problem that I've seen numerous similar, but not identical questions/answers for on StackOverflow.
With this perhaps overly complex data frame:
site<-c(223,257,223,223,257,298,223,298,298,211)
moisture<-c(7,7,7,7,7,8,7,8,8,5)
shade<-c(83,18,83,83,18,76,83,76,76,51)
sampleID<-c(158,163,222,107,106,166,188,186,262,114)
bluestm<-c(3,4,6,3,0,0,1,1,1,0)
foxtail<-c(0,2,0,4,0,1,1,0,3,0)
crabgr<-c(0,0,2,0,33,0,2,1,2,0)
johnson<-c(0,0,0,7,0,8,1,0,1,0)
sedge1<-c(2,0,3,0,0,9,1,0,4,0)
sedge2<-c(0,0,1,0,1,0,0,1,1,1)
redoak<-c(9,1,0,5,0,4,0,0,5,0)
blkoak<-c(0,22,0,23,0,23,22,17,0,0)
my.data<-data.frame(site,moisture,shade,sampleID,bluestm,foxtail,crabgr,johnson,sedge1,sedge2,redoak,blkoak)
I want to sum the counts of each plant species (bluestem, foxtail, etc. - columns 5-12 in this example) within each site, by summing rows that have the same site number. I also want to keep the information about moisture and shade (these are consistent within a site, but may be the same between sites), and I want a new column that counts the number of rows summed.
the result would look like this
site,moisture,shade,NumSamples,bluestm,foxtail,crabgr,johnson,sedge1,sedge2,redoak,blkoak
211,5,51,1,0,0,0,0,0,1,0,0
223,7,83,4,13,5,4,8,6,1,14,45
257,7,18,2,4,2,33,0,0,1,1,22
298,8,76,3,2,4,3,9,13,2,9,40
The problem I am having is that my real data sets (and I have several of them) have from 50 to 300 plant species, and I want to refer to a range of columns (in this case, [5:12]) instead of my.data$foxtail, my.data$sedge1, etc., which is going to be very difficult with 300 species.
I know I can start off by deleting the column I don't need (SampleID)
my.data$SampleID <- NULL
but then how do I get the sums? I've messed with the aggregate command and with ddply, and have seen lots of examples which call particular column names, but just haven't gotten anything to work. I recognize this is a variant of a commonly asked and simple type of question, but I've spent hours without resolving it on my own. So, apologies for my stupidity!
This works ok:
x <- aggregate(my.data[,5:12], by=list(site=my.data$site, moisture=my.data$moisture, shade=my.data$shade), FUN=sum, na.rm=T)
library(dplyr)
my.data %>%
group_by(site) %>%
tally %>%
left_join(x)
site n moisture shade bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1 211 1 5 51 0 0 0 0 0 1 0 0
2 223 4 7 83 13 5 4 8 6 1 14 45
3 257 2 7 18 4 2 33 0 0 1 1 22
4 298 3 8 76 2 4 3 9 13 2 9 40
Or to do it all in dplyr
my.data %>%
group_by(site) %>%
tally %>%
left_join(my.data) %>%
group_by(site,moisture,shade,n) %>%
summarise_each(funs(sum=sum)) %>%
select(-sampleID)
site moisture shade n bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1 211 5 51 1 0 0 0 0 0 1 0 0
2 223 7 83 4 13 5 4 8 6 1 14 45
3 257 7 18 2 4 2 33 0 0 1 1 22
4 298 8 76 3 2 4 3 9 13 2 9 40
Try following using base R:
outdf <- data.frame(site = numeric(), moisture = numeric(), shade = numeric(),
                    bluestm = numeric(), foxtail = numeric(), crabgr = numeric(),
                    johnson = numeric(), sedge1 = numeric(), sedge2 = numeric(),
                    redoak = numeric(), blkoak = numeric())
my.data$basic <- with(my.data, paste(site, moisture, shade))
for (b in unique(my.data$basic)) {
  outdf[nrow(outdf) + 1, 1:3] <- as.numeric(unlist(strsplit(b, ' ')))
  for (i in 4:11)  # outdf species columns 4:11 map to my.data columns 5:12 (skipping sampleID)
    outdf[nrow(outdf), i] <- sum(my.data[my.data$basic == b, i + 1])
}
outdf
site moisture shade bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1 223 7 83 13 5 4 8 6 1 14 45
2 257 7 18 4 2 33 0 0 1 1 22
3 298 8 76 2 4 3 9 13 2 9 40
4 211 5 51 0 0 0 0 0 1 0 0
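The NumSamples column can also be added without dplyr by indexing a table() of site counts. A sketch on a small made-up frame with the same column layout as my.data (site, moisture, shade, sampleID, then species columns; the real call would use my.data[, 5:12]):

```r
# Toy data: two sites, two species
d <- data.frame(site = c(1, 1, 2), moisture = c(7, 7, 8), shade = c(83, 83, 76),
                sampleID = 1:3, sp1 = c(3, 6, 1), sp2 = c(0, 4, 1))

# Sum the species columns within each site/moisture/shade combination
x <- aggregate(d[, 5:6],
               by = list(site = d$site, moisture = d$moisture, shade = d$shade),
               FUN = sum)

# Count how many rows were summed per site
x$NumSamples <- as.vector(table(d$site)[as.character(x$site)])
x
```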

R tvm financial package

I'm trying to estimate the present value of a stream of payments using the tvm function in the financial package.
y <- tvm(pv=NA,i=2.5,n=1:10,pmt=-c(5,5,5,5,5,8,8,8,8,8))
The result that I obtain is:
y
Time Value of Money model
I% #N PV FV PMT Days #Adv P/YR C/YR
1 2.5 1 4.99 0 -5 30 0 12 12
2 2.5 2 9.97 0 -5 30 0 12 12
3 2.5 3 14.94 0 -5 30 0 12 12
4 2.5 4 19.90 0 -5 30 0 12 12
5 2.5 5 24.84 0 -5 30 0 12 12
6 2.5 6 47.65 0 -8 30 0 12 12
7 2.5 7 55.54 0 -8 30 0 12 12
8 2.5 8 63.40 0 -8 30 0 12 12
9 2.5 9 71.26 0 -8 30 0 12 12
10 2.5 10 79.09 0 -8 30 0 12 12
There is a jump in the PV from period 5 to 6 (when the payment changes to 8) that appears to be incorrect. This affects the result in y[10,3], which is the value I'm interested in obtaining.
Excel's NPV formula produces similar results when the payments are the same throughout the whole stream; however, when the vector of payments varies, the results from tvm and from NPV differ. I need to obtain the same result that the NPV formula provides in Excel.
What should I do to make this work?
The cf function helps, but it is not always consistent with Excel.
I solved my problem using the following function:
npv <- function(a, b, c) sum(a / (1 + b)^c)  # a = payment vector, b = rate per period, c = period indices
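As a check, applying that one-liner (repeated here so the snippet runs on its own) to the payment stream from the question at 2.5% per period reproduces Excel's NPV result, which discounts the first payment by one period:

```r
npv <- function(a, b, c) sum(a / (1 + b)^c)

pmts <- c(5, 5, 5, 5, 5, 8, 8, 8, 8, 8)
npv(pmts, 0.025, 1:10)  # ~56.08, matching Excel's NPV(2.5%, ...)
```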
