Calculating parameters in an equation - R

SMDI_t = p * SMDI_{t-1} + q * SMD_t    (eqn 1)
SMDI_t = SMD_t / 50    (eqn 2)
I want to apply the equations above to my dataset (SMD). I first need to divide the first column of my dataset by 50 (eqn 2) and call the result SMDI, then apply eqn 1, where I combine SMDI_{t-1} with the original SMD. I have two values each for p and q (p_dry and p_wet, q_dry and q_wet). In eqn 1 I want to use p_dry and q_dry if the cell value is positive, and p_wet and q_wet otherwise. I wrote the following code, but it gives me an error: NA/NaN argument. Please help.
3.343327144 0.076583722 -4.316073117 -6.064319011 -1.034313982 1.711678831 2.062381759 5.632386548 6.017760438
4.467709087 1.632745678 -2.045736377 -3.601413064 1.695347213 3.295933998 4.070685302 7.743864617 8.348716373
8.256385028 5.635534811 2.707796712 1.572985845 6.066710978 7.095101029 7.941167874 11.37490758 12.15712496
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
-47.4749727 -62.45954133 -69.42311677 -68.04854477 -69.86363461 -56.6566393 -44.02624374 -34.68257496 -5.528397863
-57.44464723 -74.11667952 -83.07777747 -81.88546602 -84.32488173 -72.37428075 -61.04778523 -51.84892678 -20.81696219
-12.6032741 -26.27089119 -36.55478576 -30.40468773 -36.15889518 -33.71339142 -16.63378788 -4.849972012 -1.667644897
-28.28948158 -38.05693676 -43.2879285 -35.34546364 -40.09848824 -34.40754496 -18.41988896 -9.867125675 -7.493617422
NA NA NA NA NA NA NA NA NA
-35.04117468 -38.74252722 -42.69080876 -43.06064215 -40.85844545 -36.79603495 -37.92408262 -34.51428202 -32.54118632
-29.35688054 -33.7004665 -37.88555224 -39.06340145 -37.19884049 -29.8488303 -32.48244008 -28.52426895 -28.39245064
-1.422800439 -6.972537109 -11.86824507 -13.14543917 -9.893061342 1.11258721 -0.415834635 2.424939039 2.65615071
Code:
data=read.table('SMD.csv', header=TRUE, sep=',')
SMD=data.matrix(data)
p_dry<-0.1542
q_dry<-0.0338
p_wet<-0.1660
q_wet<-0.0333
SMDI<- matrix(0,nrow=nrow(SMD),ncol=ncol(SMD))
for (i in 2:nrow(SMD)) {
for(j in 1:ncol){
if(is.na(SMD[i,j])){
SMD[i,j]<-NaN
SMDI[1,j] <-SMD[1,j]/50
if(SMD[i,j]<0)
SMDI[i,j]<- p_dry[j]*SMDI[i-1,j]+SMD[i,j]*q_dry[j] else
SMDI[i,j]<- p_wet[j]*SMDI[i-1,j]+SMD[i,j]*q_wet[j]
}
}
}
write.table(SMDI,(file='SMDI.csv')

You don't need loops. In R we work with vectors.
SMDIt <- SMD/50 # second equation
# define the p and q values that correspond to the sign of each SMDIt cell
p <- ifelse(SMDIt>0, p_dry, p_wet)
q <- ifelse(SMDIt>0, q_dry, q_wet)
SMDIt <- p*SMDIt - 1 + q*SMD # first equation, with "SMDIt-1" read as SMDIt minus 1
Edit: replaced SMD[, 1] with SMD to calculate values for the whole matrix.
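If the "t-1" in the first equation is a time index (the previous row) rather than a subtraction, each row depends on the row before it, so the rows cannot all be computed at once. A minimal sketch of that reading follows. It loops over rows but stays vectorised across columns. The input matrix and the function name smdi_recursive are made up for illustration, the dry/wet rule follows the question's text (positive cell means dry), and restarting from SMD/50 after a gap of NA rows is an assumption, not something the question specifies.

```r
# Hypothetical small input; the NA row mimics the gaps in the question's data.
SMD <- matrix(c(  3.34,   0.08,
                  4.47,   1.63,
                    NA,     NA,
                -47.47, -62.46,
                -57.44, -74.12),
              ncol = 2, byrow = TRUE)

p_dry <- 0.1542; q_dry <- 0.0338
p_wet <- 0.1660; q_wet <- 0.0333

# Hypothetical helper: recursive form of the first equation.
smdi_recursive <- function(SMD, p_dry, q_dry, p_wet, q_wet) {
  SMDI <- matrix(NA_real_, nrow = nrow(SMD), ncol = ncol(SMD))
  SMDI[1, ] <- SMD[1, ] / 50                 # second equation: starting value
  for (i in seq_len(nrow(SMD))[-1]) {
    dry  <- SMD[i, ] > 0                     # positive cell -> dry parameters
    p    <- ifelse(dry, p_dry, p_wet)
    q    <- ifelse(dry, q_dry, q_wet)
    prev <- SMDI[i - 1, ]
    # Assumption: if the previous value is missing, restart from SMD / 50.
    SMDI[i, ] <- ifelse(is.na(prev), SMD[i, ] / 50, p * prev + q * SMD[i, ])
  }
  SMDI
}

SMDI <- smdi_recursive(SMD, p_dry, q_dry, p_wet, q_wet)
```

Because p_dry and the other parameters are single numbers, they are used directly here; indexing them as p_dry[j] returns NA for any j greater than 1.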

Related

compute diff of rows with NAs values in data frame using R

I have a data frame (9000 x 304) that looks like this:
date        a         b
1997-01-01  8.720551  10.61597
1997-01-02  na        na
1997-01-03  8.774251  na
1997-01-04  8.808079  11.09641
I want to calculate values such as:
first <- data[i-1,] - data[i-2,]
second <- data[i,] - data[i-1,]
third <- data[i,] - data[i-2,]
I want to ignore the NA values: if a value is NA, I want to use the last non-NA value in that column instead.
For example, for the second diff with i = 4 in column b:
11.09641 - 10.61597 is the value of b_diff on 1997-01-04
This is what I did, but it keeps generating data with NA:
first <- NULL
for (i in 3:nrow(data)){
first <-rbind(first, data[i-1,] - data[i-2,])
}
second <- NULL
for (i in 3:nrow(data)){
second <- rbind(second, data[i,] - data[i-1,])
}
third <- NULL
for (i in 3:nrow(data)){
third <- rbind(third, data[i,] - data[i-2,])
}
There may be a way to solve this with the aggregate function, but I need a solution that can be applied to big data, and I can't specify each column name separately. Moreover, my column names are in a foreign language.
Thank you very much! I hope I gave you all the information you need to help me; otherwise, please let me know.

You can use fill to replace each NA with the most recent non-NA value above it (fill's default direction is "down"), and then use across and lag to compute the new variables. Your exact expected output is unclear, but you can also replace the default value that lag returns when no earlier row exists (e.g. for the first row), using lag(.x, default = ...).
library(dplyr)
library(tidyr)
data %>%
  fill(a, b) %>%
  mutate(across(a:b, ~ lag(.x) - lag(.x, n = 2), .names = "first_{.col}"),
         across(a:b, ~ .x - lag(.x), .names = "second_{.col}"),
         across(a:b, ~ .x - lag(.x, n = 2), .names = "third_{.col}"))
date a b first_a first_b second_a second_b third_a third_b
1 1997-01-01 8.720551 10.61597 NA NA NA NA NA NA
2 1997-01-02 8.720551 10.61597 NA NA 0.000000 0.00000 NA NA
3 1997-01-03 8.774251 10.61597 0.0000 0 0.053700 0.00000 0.053700 0.00000
4 1997-01-04 8.808079 11.09641 0.0537 0 0.033828 0.48044 0.087528 0.48044
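Since the question mentions several hundred columns whose names cannot be typed out, here is a base R sketch of the same idea that selects columns by type instead of by name. The helper names locf and shift are made up for this illustration, and the example assumes the "na" entries have already been read in as real NA values.

```r
# Small example mirroring the table in the question.
data <- data.frame(date = as.Date("1997-01-01") + 0:3,
                   a = c(8.720551, NA, 8.774251, 8.808079),
                   b = c(10.61597, NA, NA, 11.09641))

# Hypothetical helper: carry the last non-NA value forward.
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # index of the last non-NA so far
  idx[idx == 0] <- NA                      # leading NAs have nothing to carry
  x[idx]
}

# Hypothetical helper: shift a vector down by n positions (like a lag).
shift <- function(x, n) c(rep(NA, n), head(x, -n))

# Pick every numeric column without naming any of them.
num_cols <- names(data)[sapply(data, is.numeric)]
filled   <- lapply(data[num_cols], locf)

first  <- as.data.frame(lapply(filled, function(x) shift(x, 1) - shift(x, 2)))
second <- as.data.frame(lapply(filled, function(x) x - shift(x, 1)))
third  <- as.data.frame(lapply(filled, function(x) x - shift(x, 2)))
```

Each result has one row per original row, with NA in the first one or two rows where no earlier value exists.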

Creating Data Table of Regression Coefficients

I have a model with the following regression coefficient values:
(Intercept) radius perimeter compactness concavepoints
-2.3003926746 0.0743984303 -0.0111031732 -2.5826629017 5.3127565914
radius.stderr smoothness.stderr compactness.stderr concavity.stderr radius.worst
0.4256225882 16.9805981122 -3.8819567231 0.9488969352 0.1408605366
texture.worst area.worst concavity.worst symmetry.worst fractaldimension.worst
0.0105317616 -0.0009867991 0.3504860653 0.8536208289 4.7503948408
I want to make a data table with the variable names in one column, and the corresponding regression coefficient values in the other column.
This is what I have tried so far:
var_names = coef(summary(model_B))[, 0]
coef_vals = coef(summary(model_B))[, 1]
data.table(Variables=c(var_names), RegressionCoefficients = c(coef_values))
But I get the following output with the 'Variables' column all NA:
Variables RegressionCoefficients
<dbl> <dbl>
NA -2.3003926746
NA 0.0743984303
NA -0.0111031732
NA -2.5826629017
NA 5.3127565914
NA 0.4256225882
NA 16.9805981122
NA -3.8819567231
NA 0.9488969352
NA 0.1408605366

Use names to access the names of the coefficients.
var_names=names(coef(model_B))
coef_vals=coef(model_B)
data.table(Variables=var_names, RegressionCoefficients=coef_vals)
Variables RegressionCoefficients
1: (Intercept) 2.984208e-16
2: radius 1.000000e+00
3: perimeter 1.000000e+00
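To show where the names live, here is a small self-contained sketch. It fits an unrelated model on the built-in mtcars data, because model_B is not available here, and it builds a plain data.frame so that it runs without extra packages. R indexes from 1, so column 0 of the coefficient matrix selects nothing; the term names are the row names of that matrix.

```r
# Stand-in model on a built-in dataset (model_B from the question is not available).
fit <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- coef(summary(fit))   # matrix: one row per term, row names = term names

# Indexing with 0 selects no columns at all.
empty <- coefs[, 0, drop = FALSE]

coef_table <- data.frame(
  Variables = rownames(coefs),
  RegressionCoefficients = coefs[, "Estimate"],
  row.names = NULL,
  stringsAsFactors = FALSE
)
```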

apply function to subsets of dataframe r

I am trying to subset a dataframe by two variables ('site' and 'year') and apply a function (dismo::biovars) to each subset. Biovars requires monthly inputs (12 values) and outputs 19 variables per year. I'd like to store the outputs for each subset and combine them.
Example data:
data1<-data.frame(Meteostation=c(rep("OBERHOF",12),rep("SOELL",12)),
Year=c(rep(1:12),rep(1:12)),
tasmin=runif(24, min=-20, max=5),
tasmax=runif(24, min=-1, max=30),
pr=runif(24, min=0, max=300))
The full dataset contains 900 stations and 200 years.
I'm currently attempting a nested loop, which I realised isn't the most efficient, and which I'm struggling to make work - code below:
sitesList <- as.character(unique(data1$Meteostation))
#yearsList<- unique(data1$Year)
bvList<-list()
for (i in c(1:length(unique(sitesList)))) {
site<-filter(data1, Meteostation==sitesList[i])
yearsList[i]<-unique(site$Year)
for (j in c(1:length(yearsList))){
timestep<-filter(site,Year==yearsList[j])
tmin<-timestep$tasmin
tmax<-timestep$tasmax
pr<-timestep$pr
bv<-biovars(pr,tmin,tmax)
bvList[[j]]<- bv
}}
bv_all <- do.call(rbind, bvList)
I'm aware there are much better ways to go about this, and have been looking to variations of apply, and dplyr solutions, but am struggling to get my head around it. Any advice much appreciated.

You could use the dplyr package, perhaps as follows:
library(dplyr)
library(dismo) # provides biovars
data1 %>%
  group_by(Meteostation, Year) %>%
  do(data.frame(biovars(.$pr, .$tasmin, .$tasmax)))

Use by() and rbind the results.
library("dismo")
res <- do.call(rbind, by(data1, data1[c("Year", "Meteostation")], function(x) {
  cbind(x[c("Meteostation", "Year")], biovars(x$pr, x$tasmin, x$tasmax))
}))
Produces
head(res[, 1:10])
# Meteostation Year bio1 bio2 bio3 bio4 bio5 bio6 bio7 bio8
# 1 OBERHOF 1 12.932403 18.59525 100 NA 22.2300284 3.634777 18.59525 NA
# 2 OBERHOF 2 5.620587 7.66064 100 NA 9.4509069 1.790267 7.66064 NA
# 3 OBERHOF 3 0.245540 12.88662 100 NA 6.6888506 -6.197771 12.88662 NA
# 4 OBERHOF 4 5.680438 45.33159 100 NA 28.3462326 -16.985357 45.33159 NA
# 5 OBERHOF 5 -6.971906 16.83037 100 NA 1.4432801 -15.387092 16.83037 NA
# 6 OBERHOF 6 -7.915709 14.63323 100 NA -0.5990945 -15.232324 14.63323 NA
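The same split-apply-combine pattern can be sketched in base R with split and lapply. To keep the example runnable without dismo, it uses a made-up stand-in function, summarise_year, in place of biovars, and a made-up dataset with 12 monthly rows for every station and year; swap in biovars(g$pr, g$tasmin, g$tasmax) for the real calculation.

```r
set.seed(1)
# Hypothetical data: 2 stations x 2 years x 12 months.
data1 <- data.frame(
  Meteostation = rep(c("OBERHOF", "SOELL"), each = 24),
  Year   = rep(rep(1:2, each = 12), times = 2),
  tasmin = runif(48, min = -20, max = 5),
  tasmax = runif(48, min = -1, max = 30),
  pr     = runif(48, min = 0, max = 300)
)

# Hypothetical stand-in for dismo::biovars: a named vector of yearly summaries.
summarise_year <- function(pr, tmin, tmax) {
  c(mean_temp = mean((tmin + tmax) / 2), total_pr = sum(pr))
}

# One data frame per station-year combination.
groups <- split(data1, list(data1$Meteostation, data1$Year), drop = TRUE)

# Apply the function to each subset and stack the one-row results.
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(Meteostation = g$Meteostation[1],
             Year = g$Year[1],
             t(summarise_year(g$pr, g$tasmin, g$tasmax)))
}))
rownames(res) <- NULL
```

Collecting the results in a list and binding once at the end avoids the indexing problem in the nested loop, where bvList[[j]] is overwritten for every site.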

replace NA with data from another column in R

I know how to make the NA's blanks with the following code:
IMILEFT IMIRIGHT IMIAVG
NA NA NA
NA 71.15127 NA
72.18310 72.86607 72.52458
70.61460 68.00766 69.31113
69.39032 69.91261 69.65146
72.58609 72.75168 72.66888
70.85714 NA NA
NA 69.88203 NA
74.47109 73.07963 73.77536
70.44855 71.28647 70.86751
NA 72.33503 NA
69.82818 70.45144 70.13981
68.66929 69.79866 69.23397
72.46879 71.50685 71.98782
71.11888 71.98336 71.55112
NA 67.86667 NA
IMILEFT <- ((ASLCOMPTEST$LHML + ASLCOMPTEST$LRML)/(ASLCOMPTEST$LFML +
ASLCOMPTEST$LTML)*100)
IMILEFT <- sapply(IMILEFT, as.character)
IMILEFT[is.na(IMILEFT)] <- ""
But when I run that code, it won't let me average "IMILEFT" and "IMIRIGHT", or set "IMIAVG" to whichever column has a numerical value.
IMIAVG<-((IMILEFT + IMIRIGHT)/2)
Error in IMILEFT + IMIRIGHT : non-numeric argument to binary operator
I get the same error if I convert it with as.numeric.

Try the following. Leave the NAs as they are and divide each row sum by the number of non-NA values in that row:
rowSums(M, na.rm=TRUE) / (2 - (is.na(L) + is.na(R)))
## WHERE
M = cbind(IMILEFT, IMIRIGHT)
L = IMILEFT
R = IMIRIGHT
If you have rows where both columns are NA, the denominator above is 0; to avoid dividing by zero, use this denominator instead:
pmax(1, 2 - (is.na(L) + is.na(R)))
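As a worked example of the row-wise average, here is a sketch using four rows taken from the table in the question. The variable n_present is introduced only for this illustration, and setting rows with no values at all back to NA is an assumption about the desired result.

```r
# Four rows from the question's table, kept numeric with real NAs.
IMILEFT  <- c(NA, NA,       72.18310, 70.85714)
IMIRIGHT <- c(NA, 71.15127, 72.86607, NA)

M <- cbind(IMILEFT, IMIRIGHT)

# Number of non-missing values in each row (0, 1 or 2).
n_present <- rowSums(!is.na(M))

# Sum the available values and divide by how many there are;
# pmax() keeps the denominator from being zero.
IMIAVG <- rowSums(M, na.rm = TRUE) / pmax(1, n_present)

# Assumption: a row with no values at all should stay NA rather than 0.
IMIAVG[n_present == 0] <- NA
```

Row 2 takes the single available value (71.15127), and row 3 is the ordinary mean of its two values.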

Binning average of matrix

I have a matrix with n rows and n columns, and I would like to average it in bins of 10 rows at a time, so that in the end I am left with a matrix of size n/10-by-n. I added the matlab library and tried the following code:
nRemove = rem(size(a,1),10);
a = a(1:end-nRemove,:)
Avg = mean(reshape(a,10,[],n));
AvgF = squeeze(Avg);
but it didn't work. Which code should I use?
Thanks!

Here is another way to do it:
set.seed(5)
x = matrix(runif(1000), ncol = 10)
nr = nrow(x)
gr = rep(1:floor(nr/10), each = 10)
aggregate(x ~ gr, FUN=mean)[,-1]
which results in
NA NA.1 NA.2 NA.3 NA.4 NA.5 NA.6 NA.7
1 0.5295264 0.5957229 0.4502069 0.5168083 0.3398190 0.4075922 0.6059122 0.5127865
2 0.4778341 0.3967321 0.4069635 0.4514742 0.6172677 0.2486085 0.6340686 0.4052600
3 0.5168132 0.5117207 0.5202261 0.5068593 0.5218041 0.4925462 0.5169584 0.4919296
4 0.3299557 0.3314723 0.4503393 0.3965103 0.6166598 0.5525628 0.4943880 0.6048207
5 0.6145423 0.5853235 0.4822182 0.3377771 0.3540784 0.5974846 0.5202577 0.5769518
6 0.5009249 0.5203701 0.3940540 0.4237508 0.3199265 0.4817713 0.4655320 0.6124400
7 0.7335082 0.5856578 0.3929621 0.6403662 0.5347719 0.5658542 0.4226456 0.7196593
8 0.4976663 0.5205538 0.4529273 0.4757352 0.6980300 0.5694570 0.4384924 0.5481236
9 0.5275932 0.5014861 0.5363340 0.5664576 0.5006055 0.5611069 0.3803889 0.4680865
10 0.4560031 0.5527328 0.4419076 0.6893043 0.5161281 0.5895931 0.3965911 0.3842419
NA.8 NA.9
1 0.3711607 0.5541607
2 0.4379255 0.4159131
3 0.5048523 0.5884052
4 0.4642687 0.4572388
5 0.6054209 0.5174784
6 0.4659952 0.5332438
7 0.4568273 0.3943798
8 0.6978356 0.5087778
9 0.4897584 0.4710949
10 0.6310546 0.4775762

t( sapply(1:(NROW(A) %/% 10), function(x) colMeans(A[ ((x - 1) * 10 + 1):(x * 10), ] ) ) )
You need the transpose operation to re-orient the result. One often needs to do so after an 'apply' operation.
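Another base R option is rowsum, which adds up rows by group and returns a matrix directly, so no transpose is needed. The sketch below assumes, as the MATLAB attempt in the question does, that leftover rows which do not fill a complete bin are dropped; the names bin, n_keep and binned are made up for the example.

```r
set.seed(5)
x <- matrix(runif(1000), ncol = 10)   # example matrix: 100 rows, 10 columns

bin <- 10                             # rows per bin

# Drop trailing rows that do not fill a complete bin.
n_keep <- nrow(x) - nrow(x) %% bin
x <- x[seq_len(n_keep), , drop = FALSE]

# Group label for every row: 1,1,...,1,2,2,...
gr <- rep(seq_len(n_keep / bin), each = bin)

# Sum the rows within each group, then divide by the bin size to get means.
binned <- rowsum(x, gr) / bin
```

Here binned has one row per bin and the same number of columns as x.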
