I have a table of soil moisture deficit (SMD) with 170 columns (each column is a month) and 103937 rows, and I want to calculate p and q as in the equations below. I wrote some code, but it fails from the fourth line with:
Error in vector(length = 21, na.rm = TRUE) : unused argument (na.rm = TRUE)
There are many NAs in the data which I don't want to include. The equations are:
p = 1 - m/(m + b)    (1)
q = C/(m + b)        (2)
where m and b are the slope and intercept of the linear regression between the cumulative SMD in the driest and wettest conditions and durations from one month to eighteen months (Figure 9). For each grid cell, to estimate m and b in dry conditions, the driest month in the record (lowest SMD) was first selected and plotted for the one-month duration. Then the running sums of SMD for every two neighboring months were calculated and the lowest cumulative SMD was selected for the two-month duration. The same process was repeated up to the eighteen-month duration, and for wet conditions the highest cumulative SMD was chosen. A linear regression was then used to fit these points and identify the slope, m, and the intercept, b. C comes from the best-fit line of a drought monograph, which ranges from -100 to 100 and is then scaled to fit the range of PDSI categories (-4 to 4). The code is as follows:
SM = read.table('SMD.csv', header = TRUE, sep = ',')
df = data.frame(SM[3:21]) # subset columns 3 to 21; I have 2000 columns and 103937 rows.
mat = data.matrix(df) # 'matrix' shadows the base function, so use another name
x = 1:21 # the durations, used as the predictor in the regressions below
dry = numeric(21) # vector() has no na.rm argument, which caused the error;
wet = numeric(21) # NAs are handled with na.rm in min()/max() inside the loop instead
slope_dry = numeric(ncol(mat))
slope_wet = numeric(ncol(mat))
inter_dry = numeric(ncol(mat))
inter_wet = numeric(ncol(mat))
for (a in 1:ncol(mat)) { # loop over grid cells (columns of mat)
  for (i in 1:21) { # loop over durations
    sum_SMD = numeric(nrow(mat) - i + 1)
    for (j in 1:(nrow(mat) - i + 1)) {
      for (b in j:(j + i - 1))
        sum_SMD[j] <- sum_SMD[j] + mat[b, a] # windows containing NA become NA
    }
    dry[i] <- min(sum_SMD, na.rm = TRUE) # NA windows are excluded here
    wet[i] <- max(sum_SMD, na.rm = TRUE)
  }
  model_dry <- lm(dry ~ x)
  slope_dry[a] <- coefficients(model_dry)[2]
  inter_dry[a] <- coefficients(model_dry)[1]
  model_wet <- lm(wet ~ x)
  slope_wet[a] <- coefficients(model_wet)[2]
  inter_wet[a] <- coefficients(model_wet)[1]
}
c_dry = slope_dry/25 # C scaled to the PDSI range (alternatively, a fixed c_dry = -4)
p_dry = 1 - slope_dry/(slope_dry + inter_dry) # equation (1)
q_dry = c_dry/(slope_dry + inter_dry)         # equation (2)
c_wet = slope_wet/25 # alternatively, a fixed c_wet = 4
p_wet = 1 - slope_wet/(slope_wet + inter_wet)
q_wet = c_wet/(slope_wet + inter_wet)
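For reference, the running-sum step can also be done without the inner loops. This is only an illustrative sketch (running_extremes() is a helper name of my own): stats::filter() with a vector of ones returns the d-month running sums, and NA windows are then dropped by na.rm in min()/max().
running_extremes <- function(smd, max_dur = 21) {
  dry <- wet <- numeric(max_dur)
  for (d in seq_len(max_dur)) {
    sums <- stats::filter(smd, rep(1, d), sides = 1)  # d-month running sums
    dry[d] <- min(sums, na.rm = TRUE)
    wet[d] <- max(sums, na.rm = TRUE)
  }
  data.frame(duration = seq_len(max_dur), dry = dry, wet = wet)
}
# e.g. for the first grid cell: lm(dry ~ duration, data = running_extremes(mat[, 1]))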
I am struggling to compare 'mortality' rates between two percentages over a time interval. My goal is to get the annual rates per group.
My values are already percentages (start and end values), representing how much forest has been lost (disturbed, burned, cut, etc.) over several years out of the total forest cover. E.g. in the first year it was 1%, and the 20% in the last year is a cumulative value of total forest lost.
I followed the calculation of the Compound Annual Growth Rate (CAGR), taking into account the values in the first year, the last year, and the total number of years.
Here are my dummy data for two groups, e.g. mortality depending on tree species:
df <- data.frame(group = c('pine', 'beech'),
                 start = c(1, 2),
                 end = c(19, 30),
                 yrs = 18)
To calculate the CAGR, I have used this function:
CAGR_formula <- function(end, start, yrs) {
values <- ((end/start)^(1/yrs)-1)
return(values)
}
giving:
library(dplyr)
df %>%
  mutate(CARG = CAGR_formula(end, start, yrs) * 100)
group start end yrs CARG
1 pine 1 19 18 17.8
2 beech 2 30 18 16.2
However, CARG rates of 16-17% seem awfully high! I was expecting about 1-3% per year. Please, what is wrong in my formula? Is it because the original values (start, end) are already percentages? Or is it because end is a cumulative value of the start?
Thank you for your ideas!
If I understand correctly, maybe this is what is desired:
df %>%
mutate(CARG = CAGR_formula(1 - end/100, 1, yrs)*100)
#> group start end yrs CARG
#> 1 pine 1 19 18 -1.163847
#> 2 beech 2 30 18 -1.962024
where the start parameter to CAGR_formula() is always 1 (the value for year 1 can be ignored in this calculation), meaning the forest starts at 100%, and the end parameter is 1 - end/100; e.g. in the first row, 81% of the forest remains after 18 years.
The resulting yearly mortality rates are about 1.16% and 1.96%.
We can verify that 1 * (1 - 0.0116)^18 is roughly 81%, and 1 * (1 - 0.0196)^18 is roughly 70%.
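These checks can be run directly at the R console:
(1 - 0.0116)^18  # ~0.81, i.e. about 81% of the forest remains
(1 - 0.0196)^18  # ~0.70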
Why does it seem high? From 1% to 19% is a big jump. Also:
1 * 1.178^18 = 19.086
Seems right to me
I'm trying to compute a moving-window transition matrix for credit rating data. My data look like this, but with 4000+ rows (there are 8 different ratings in the complete dataset). So we have firms (id), their prior and current credit ratings, and the date each transition took place.
df <- data.frame(id = c(100, 100, 98, 99, 98, 98, 56, 54),
                 date = c(19750102, 19750205, 19750402, 19750609,
                          19831212, 19840202, 19840704, 19861104),
                 priorrating = c("A", "BBB", "AAA", "AAA", "AA", "A", "BB", "D"),
                 currentrating = c("BBB", "BB", "AA", "AAA", "A", "A", "BB", "D"))
I want to compute the transition probability matrices of these ratings as a moving window, shifting 6-month intervals by 1 month. So you would get matrices P for the intervals [19750102, 19750602], [19750202, 19750702], and so forth.
For each element (each date) in the different intervals I want to compute the following. There are no ties (so no dates with multiple different transitions).
require(Matrix)
#Table with transitions between ratings at time t in the interval
N <- table(df$priorrating,df$currentrating)
#Getting the number of total exposed firms Y at time t in the interval
firms <- apply(N,1,sum)
Y <- sum(firms)
#Computing the off-diagonal elements of matrix dA; the kjth off-diagonal element counts the
#fraction of transitions from the kth category to the jth category among the exposed
#firms at time t.
dA <- N/Y
#Complete matrix dA by adding default row
dA[6,] <- 0
#Computing the diagonal elements of dA; the kth diagonal element counts the fraction of the
#exposed firms Y leaving the state at time t
D <- rep(0,6)
diag(dA) <- D #setting diagonal to zeros
diag(N) <- D #setting diagonal of transition count matrix to zero
outtrans <- apply(N,1,sum) #vector with number of firms leaving each state at time t
diag(dA) <- -outtrans/Y
#Finally compute probability matrix P for element i in the interval (time t)
attributes(dA)$class <- "matrix"
P <- (diag(6) + dA) #note that there are 8 different ratings in the complete dataset.
And finally I want to compute P.hat for the entire interval, which is the product of the P matrices computed above for each element in the interval.
So the output would look like a list of matrices P.hat, one per interval.
Now, my code above works on the entire sample. But I'm quite unsure how to implement it as a moving window.
PS: Hopefully my question is clear, if not please let me know!
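As a sketch of the windowing itself (untested; compute_P() is a hypothetical wrapper around the per-date calculation above, and priorrating/currentrating should be factors sharing all 8 levels so every P is 8x8 and the matrix products conform):
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")
starts <- seq(min(df$date), max(df$date), by = "1 month")
P.hat <- lapply(starts, function(s) {
  e <- seq(s, by = "6 months", length.out = 2)[2]  # end of the 6-month window
  win <- df[df$date >= s & df$date < e, ]
  if (nrow(win) == 0) return(NULL)
  Ps <- lapply(split(win, win$date), compute_P)    # one P per transition date
  Reduce(`%*%`, Ps)                                # multiply them over the window
})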
I am new to R and trying to do coursework about factor analysis with it.
I have two data sets, FundReturn (120 rows, 14 columns) and Factors (120 rows, 30 columns). I want to run a one-factor regression for every possible pair of factor and fund, starting with the first 60 observations. With the parameters estimated, I calculate the predicted value for the 61st fund return from the 61st value of the factor. Then the estimation window is expanded by one observation, new parameters are estimated from the updated sample, the predicted value for the 62nd fund return is calculated, and so on. In total, 60 predictions are made and stored in Predictions = array(1, dim = c(60, 30, 14)), so I can compare them with the realized values.
The following is the code I used, which produced this error:
Error in Predictions[p, fa, fu] <- coeff[1, p, fa, fu] + coeff[2, p, fa, :
replacement has length zero
Can anyone spot the problem? Your help is much appreciated.
Predictions=array(1,dim=c(60,30,14))
coeff=array(1,dim=c(3,60,30,14))
v1<- 1:30
v2<- 1:60
v3<- 1:14
for (fu in v3) {
  for (fa in v1) {
    for (p in v2) {
      y1 <- FundReturn[1:(59+p), fu]
      x1 <- Factors[1:(59+p), fa]
      y1_lag <- c(NA, y1[-length(y1)])  # explicit one-step lag; stats::lag() does not shift a plain vector
      Model <- lm(y1 ~ x1 + y1_lag)
      coeff[1:3, p, fa, fu] <- coef(Model)
      Predictions[p, fa, fu] <- coeff[1, p, fa, fu] +
        coeff[2, p, fa, fu] * Factors[60+p, fa] +
        coeff[3, p, fa, fu] * FundReturn[59+p, fu]
    }
  }
}
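As an aside, this small illustration shows why the original lag(y1) term was suspect:
y <- c(1, 2, 3)
stats::lag(y)       # same values; only a time-series attribute is shifted
c(NA, head(y, -1))  # an actual one-step lag: NA 1 2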
I want to calculate the error rate by interval, where 0 is good and 1 is bad. Suppose I have a sample of observations with levels between 1 and 100, divided into intervals as follows:
X <- 10; q<-sample(c(0,1), replace=TRUE, size=X)
l <- sample(c(1:100),replace=T,size=10)
bornes<-seq(min(l),max(l),5)
v <- cut(l,breaks=bornes,include.lowest=T)
table(v)
How can I get a table or function that calculates the default rate for each interval, i.e. the number of bad observations divided by the total number of observations?
tx_erreur<-function(x){
t<-table(x,q)
return(sum(t[,2])/sum(t))
}
I already tried the code above, as well as tapply.
Thank you!
I think you want this:
tapply(q,           # the variable to be summarized
       v,           # the variable that defines the bins
       function(x)  # the summary statistic computed within each bin
         sum(x)/length(x))
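For example, with a seed so the draw is reproducible (mean() is equivalent to sum(x)/length(x) here):
set.seed(42)
q <- sample(c(0, 1), replace = TRUE, size = 10)
l <- sample(1:100, replace = TRUE, size = 10)
v <- cut(l, breaks = seq(min(l), max(l), 5), include.lowest = TRUE)
tapply(q, v, mean)  # default rate per interval (NA for empty bins)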
I have survival data from an experiment in flies which examines rates of aging in various genotypes. The data are available to me in several layouts, so the choice is up to you; use whichever suits the answer best.
One dataframe (wide.df) looks like this, where each genotype (Exp, of which there are ~640) has a row, and the days run in sequence horizontally from day 4 to day 98, with counts of new deaths every two days.
Exp Day4 Day6 Day8 Day10 Day12 Day14 ...
A 0 0 0 2 3 1 ...
I make the example using this:
wide.df2 <- data.frame("A", 0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2)
colnames(wide.df2) <- c("Exp", paste0("Day", seq(4, 36, by = 2)))
Another version is like this, where each day has a row for each 'Exp' and the number of deaths on that day is recorded.
Exp Deaths Day
A 0 4
A 0 6
A 0 8
A 2 10
A 3 12
.. .. ..
To make this example:
df2 <- data.frame(Exp = rep("A", 17),
                  Deaths = c(0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2),
                  Day = seq(4, 36, by = 2))
What I would like to do is perform a Gompertz Analysis (See second paragraph of "the life table" here). The equation is:
μ(x) = α * e^(β*x)
where μ(x) is the probability of death at a given time, α is the initial mortality rate, and β is the rate of aging.
I would like to be able to get a dataframe which has α and β estimates for each of my ~640 genotypes for further analysis later.
I need help going from the above dataframes to an output of these values for each of my genotypes in R.
I have looked through the package flexsurv which may house the answer but I have failed in attempts to find and implement it.
This should get you started...
Firstly, for the flexsurvreg function to work, you need to specify your input data as a Surv object (from package:survival). This means one row per observation.
The first thing is to re-create the 'raw' data from the summary tables you provide.
(I know rbind is not efficient, but you can always switch to data.table for large sets).
### get rows with >1 death
df3 <- df2[df2$Deaths>1, 2:3]
### expand to give one row per death per time point
### (both columns become Day repeated Deaths times; the Deaths column is fixed next)
df3 <- sapply(df3, FUN = function(x) rep(df3[, 2], df3[, 1]))
### each death is one event (occurs once)
df3[, 1] <- 1
### add this to the rows with <=1 death
df3 <- rbind(df3, df2[!df2$Deaths>1, 2:3])
### convert to Surv object
library(survival)
s1 <- with(df3, Surv(Day, Deaths))
### get parameters for Gompertz distribution
library(flexsurv)
f1 <- flexsurvreg(s1 ~ 1, dist="gompertz")
giving
> f1$res
est L95% U95%
shape 0.165351912 0.1281016481 0.202602176
rate 0.001767956 0.0006902161 0.004528537
Note that this is an intercept-only model as all your genotypes are A.
You can loop this over multiple survival objects once you have re-created the per-observation data as above; a sketch follows at the end.
From the flexsurv docs:
The Gompertz distribution with shape parameter a and rate parameter b has hazard function
h(x; a, b) = b * e^(a*x)
So it appears your α is b, the rate, and your β is a, the shape.
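To get α and β per genotype, one option is a split-and-fit loop (a sketch, assuming the per-observation data have been rebuilt as above into a data frame obs with columns Exp, Day and Deaths):
library(survival)
library(flexsurv)
params <- do.call(rbind, lapply(split(obs, obs$Exp), function(d) {
  f <- flexsurvreg(Surv(Day, Deaths) ~ 1, data = d, dist = "gompertz")
  data.frame(Exp   = d$Exp[1],
             alpha = f$res["rate", "est"],   # initial mortality rate
             beta  = f$res["shape", "est"])  # rate of aging
}))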