how to apply a function on a data set? - r

I have the following data set. When I run the following function it only produces one line of results. I need to find the Gomp for each tree.
Tree a b k
4382 21,88 9,59 0,0538
4383 13,93 12,94 0,0811
4384 19,69 9,78 0,0597
4385 20,02 8,23 0,0489
4386 11,07 23,2 0,1276
4387 18,35 13,29 0,0772
4388 19,72 17,53 0,0961
4389 26,3 5,26 0,0278
DOY = c(1:365)
Gomp <- data.frame(DF$a * exp (-exp(DF$b-DF$k*DOY)))

I'm not quite sure whether I understood the question correctly. A clearer question might get better answers...
DF <- data.frame(Tree = c(4382, 4383, 4384, 4385, 4386), a = c(21.88, 13.93, 19.69, 20.02, 11.07), b = c(9.59, 12.94, 9.78, 8.23, 23.20), k = c(0.0538, 0.0811, 0.0597, 0.0489, 0.1276))
DOY <- c(1:365)
DF_new <- data.frame(sapply(1:length(DF$Tree), function(x)(DF$a[x]*exp(-exp(DF$b[x]-DF$k[x]*DOY)))))
colnames(DF_new) <- DF$Tree
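With sapply (and its relatives apply, vapply, etc.) you can loop over vectors, lists, data frames and so on. Iterating over 1:length(DF$Tree) passes the row index to the function; without it, iterating over DF$Tree directly, the tree IDs themselves would be passed instead of the index.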
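As a minimal sketch of the same idea (assuming the first three trees from the question, with decimal commas converted to points; the name gomp is only illustrative), iterating over row indices gives one column of 365 daily values per tree:

```r
# Example data: first three trees from the question
DF <- data.frame(Tree = c(4382, 4383, 4384),
                 a = c(21.88, 13.93, 19.69),
                 b = c(9.59, 12.94, 9.78),
                 k = c(0.0538, 0.0811, 0.0597))
DOY <- 1:365

# For each row index i, evaluate the curve over all days of the year.
# sapply() binds the resulting vectors into a matrix, one column per tree.
gomp <- sapply(seq_len(nrow(DF)), function(i) {
  DF$a[i] * exp(-exp(DF$b[i] - DF$k[i] * DOY))
})
colnames(gomp) <- DF$Tree

dim(gomp)  # rows = days, columns = trees
```

Each column of gomp then holds the curve for one tree, so gomp[, "4382"] returns the 365 values for tree 4382.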

Related

Calculating autocorrelation [acf()] for multiple columns in single run in R

I imported my data (64 rows and 15 columns) from Excel into R in order to calculate the autocorrelation using acf(). I've looked through similar threads, and the only workable code I could find was:
structure(list(V1 = c(24.1, 6.3, 39.1, 11.97, 23.47), V2 = c(5.63, 9.91, 19.98, 16.14, 15.76), V3 = c(21.08, 5.82, 23.5, 27.71, 3.54)), class = "data.frame", row.names = c(NA, -5L))
for(i in 1:3) {acf(data[ ,i], lag.max = 1, type = "correlation", plot = TRUE , na.action = na.exclude)}
Using this loop, I was able to obtain the autocorrelation plots but there were no mathematical values generated for the autocorrelation coefficient.
I tried assigning the loop to a variable, but after the run it only contained NULL.
I also used the following code in place of the for loop:
lapply(split(A,col(A)), function(ts) acf(ts, lag.max=1))
where A is a matrix with 64 rows and 15 columns. But it doesn't work either: when I convert my imported data into a matrix and run this code, it shows the error "A is not numeric". Also, I think ts stands for time series, but when I insert the data in place of ts, there is an error.
1. It's better to avoid variable names that shadow existing R functions, like "data". It's not always a problem but can lead to confusion.
inputdata <- structure(list(V1 = c(24.1, 6.3, 39.1, 11.97, 23.47),
V2 = c(5.63, 9.91, 19.98, 16.14, 15.76),
V3 = c(21.08, 5.82, 23.5, 27.71, 3.54)),
class = "data.frame", row.names = c(NA, -5L))
2. We prepare an empty list to collect the statistics from acf:
results <- list()
3. You can use a loop or something from the apply family. But I'd say a loop makes it easier to see what is going on. In the loop, on each iteration we put the result in the list.
for(i in 1:3) {
results[[i]] <- acf(inputdata[ ,i], lag.max = 1, type = "correlation", plot = TRUE , na.action = na.exclude)
}
The help for acf (accessed using ?acf) tells us where it stores its statistics.
Now we can access the results from the analysis the following way:
results[[1]]$acf
returns:
, , 1
[,1]
[1,] 1.0000000
[2,] -0.7761198
and
results[[2]]$acf
returns:
, , 1
[,1]
[1,] 1.000000
[2,] 0.218416
results[[3]]$acf
returns:
, , 1
[,1]
[1,] 1.0000000
[2,] -0.3962866
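If only the coefficients are needed, the loop can be collapsed into a single sapply call. The following is a sketch rather than the answerer's code: it reuses the inputdata values from above, suppresses the plots with plot = FALSE, and picks the second element of the acf component, which holds the lag-1 coefficient (the first element is lag 0). The name lag1 is made up for the example.

```r
inputdata <- data.frame(
  V1 = c(24.1, 6.3, 39.1, 11.97, 23.47),
  V2 = c(5.63, 9.91, 19.98, 16.14, 15.76),
  V3 = c(21.08, 5.82, 23.5, 27.71, 3.54)
)

# acf() returns an object whose $acf component is an array of
# autocorrelations; with lag.max = 1 it holds lag 0 and lag 1.
lag1 <- sapply(inputdata, function(column) {
  acf(column, lag.max = 1, plot = FALSE)$acf[2]
})

lag1  # named numeric vector, one lag-1 coefficient per column
```

Because sapply iterates over the columns of a data frame, this also scales to the 15-column case without hard-coding the column count.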

Summarise multiple columns using multiple functions using base R and Dplyr

The data looks something like this:
> head(r)
area peri shape perm
1 4990 2791.90 0.0903296 6.3
2 7002 3892.60 0.1486220 6.3
3 7558 3930.66 0.1833120 6.3
4 7352 3869.32 0.1170630 6.3
5 7943 3948.54 0.1224170 17.1
6 7979 4010.15 0.1670450 17.1
I want to apply multiple functions to each column. What I currently have is this function:
analysis = function(df){
measurements = data.frame(attributes = character(),
mean = double(),
median = double(),
variance = double(),
IQR = double())
for (i in 1:ncol(df)){
names = colnames(df)[i]
temp = data.frame(attribute = names,
mean = mean(df[,i]),
median = median(df[,i]),
variance = var(df[,i]),
IQR = IQR(df[,i]))
measurements = rbind(measurements, temp)
}
return (measurements)
}
It works well and achieves what I want, giving the following output:
attribute mean median variance IQR
1 area 7187.7291667 7487.000000 7.203045e+06 3564.2500000
2 peri 2682.2119375 2536.195000 2.049654e+06 2574.6150000
3 shape 0.2181104 0.198862 6.971657e-03 0.1004083
4 perm 415.4500000 130.500000 1.916848e+05 701.0500000
However, my supervisor said it is not efficient and does not follow the R way of thinking.
I also tried summarise_each() and summarise_all(r, funs(mean, median, var, IQR)), but neither achieves what I want and the output doesn't look nice.
What are some other ways to achieve that output using only base R or dplyr?
I suspect your supervisor's comment about 'R'-style thinking was about using that for loop. Almost any for loop you write can be replaced by the apply family of functions (e.g. apply, sapply, lapply, etc.).
They make it easier to run functions on vectors/data.frames/lists/etc.
Everything you could do using apply functions could be replicated in for loops (usually with similar performance) so using for loops isn't actually a cardinal sin. Why use apply functions? Well ... once you learn them you get more succinct code which returns the results of running your functions on your data. Before long, you'll find this sort of code very intuitive, and even more readable than for loops.
Base R
df <- data.frame(
area = c(4990, 7002, 7558, 7352, 7943),
peri = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54),
shape = c(.0903296, .148622, .183312, .117063, .122417),
perm = c(6.3, 6.3, 6.3, 6.3, 17.1)
)
sapply(df, function(x) c(mean=mean(x), median=median(x), var=var(x), IQR=IQR(x)))
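The sapply call above returns a matrix with one column per variable and one row per statistic. A small sketch of how that could be turned into the layout from the question (one row per attribute), assuming the six rows shown by head(r); the names stats and measurements are illustrative:

```r
df <- data.frame(
  area  = c(4990, 7002, 7558, 7352, 7943, 7979),
  peri  = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54, 4010.15),
  shape = c(0.0903296, 0.148622, 0.183312, 0.117063, 0.122417, 0.167045),
  perm  = c(6.3, 6.3, 6.3, 6.3, 17.1, 17.1)
)

# One column per variable, one row per statistic
stats <- sapply(df, function(x) {
  c(mean = mean(x), median = median(x), variance = var(x), IQR = IQR(x))
})

# Transpose so that each attribute becomes a row, as in the question
measurements <- data.frame(attribute = colnames(stats), t(stats))
rownames(measurements) <- NULL
measurements
```

Since only six rows are used here, the numbers differ from the full-data output shown in the question.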
Your results can be achieved using base::Map:
f <- function(x) {
desc = base::summary(x)
c(
Mean = unname(desc['Mean']),
Median = unname(desc['Median']),
Variance = base::sum((x-desc['Mean'])**2)/(length(x)-1),
IQR = unname(desc['3rd Qu.'] - desc['1st Qu.'])
)
}
t(as.data.frame(base::Map(f, df)))
# Mean Median Variance IQR
# area 7137.3333333 7455.0000000 1.241980e+06 757.25000000
# peri 3740.5283333 3911.6300000 2.183447e+05 68.93000000
# shape 0.1381314 0.1355195 1.192633e-03 0.04403775
# perm 9.9000000 6.3000000 3.110400e+01 8.10000000
Apologies
Data:
df <- data.frame(
area = c(4990, 7002, 7558, 7352, 7943, 7979),
peri = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54, 4010.15),
shape = c(.0903296, .148622, .183312, .117063, .122417, .167045),
perm = c(6.3, 6.3, 6.3, 6.3, 17.1, 17.1)
)
Hope that's useful.

convert math equations into code with ifelse and min/max in R Studio

For a medical study I would like to calculate the eGFR, a measure of renal function, with an equation that requires certain input values: Scr (serum creatinine), ScysC (serum cystatin C), age and sex-dependent values, which are all available in my dataset.
Please see the attached image for the equations (image: eGFR equation).
So I am mainly struggling with ifelse-statements and the min/max numbers. How do I create a code to retrieve the output with this equation?
My first thought is to create a loop function, but I don't know exactly how. So any help and time is very much appreciated :)
-EDIT-
NOTICE: it is important that the ratio between min/max is always <1.
e.g. a female with Scr=0.9 gives Scr/k= 0.9/0.7=1.29 and results in min=1 and max=1.29.
A female with Scr=0.6 gives Scr/k= 0.6/0.7=0.86 and results in min=0.86 and max=1.
Here is a sample of my data:
df <- data.frame(ID = c(1,2,3), AGE = c(36,36, 36),
CYSC = c(0.757, 1.34, 1.34), SCR = c(0.58, 0.68, 0.68), SEX = c(1,1,0))
#Male = 1, Female 0
#equation:
eGFR = 135*((min(Scr/k,1)**a))*((max(Scr/k,1)**-0.601))*(min(Scysc/0.8,1)**-0.375)* (max(Scysc/0.8,1)**-0.711) * (0.995**Age) (*0.969 if female)
(With k=0.7 if F and k=0.9 if M, a=-0.248 if F and a=-0.207 if M)
Ok, so I'm guessing the structure of your data.frame. I provided how I created mine for the test since there seem to be more numbers than row.names. I also assumed that 1 is male and 0 is female. Finally, I added a third female patient for the test, with the same clinical results as male #2.
df <- data.frame(ID = c(1,2,3), AGE = c(36,36, 36), CYSC = c(51.614, 47.669, 47.669), SCR = c(0.75776, 1.34, 1.34), SEX = c(1,1,0))
male.idx <- df$SEX == 1
k <- rep(0.7, nrow(df))
k[male.idx] <- 0.9
a <- rep(-0.248, nrow(df))
a[male.idx] <- -0.207
eGFR <- 135*pmin(df$SCR/k,1)**a*((pmax(df$SCR/k,1)**-0.601))*(pmin(df$CYSC/0.8,1)**-0.375)*
(pmax(df$CYSC/0.8,1)**-0.711) * 0.995**df$AGE * ifelse(male.idx, 1, 0.969)
[edited for more accurate answer]
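The reason for pmin and pmax rather than min and max is that min(x, 1) collapses a whole column to a single number, whereas pmin(x, 1) compares element by element and so yields one value per patient. A minimal sketch of the equation from the question wrapped in a function, assuming SEX is coded 1 for male and 0 for female as in the question's sample data (the function name egfr_cr_cys is hypothetical):

```r
egfr_cr_cys <- function(scr, cysc, age, sex) {
  male <- sex == 1
  k <- ifelse(male, 0.9, 0.7)        # sex-specific creatinine divisor
  a <- ifelse(male, -0.207, -0.248)  # sex-specific exponent
  135 *
    pmin(scr / k, 1)^a *
    pmax(scr / k, 1)^-0.601 *
    pmin(cysc / 0.8, 1)^-0.375 *
    pmax(cysc / 0.8, 1)^-0.711 *
    0.995^age *
    ifelse(male, 1, 0.969)           # extra factor applied to females only
}

# Sample data from the question
df <- data.frame(ID = c(1, 2, 3), AGE = c(36, 36, 36),
                 CYSC = c(0.757, 1.34, 1.34), SCR = c(0.58, 0.68, 0.68),
                 SEX = c(1, 1, 0))
df$eGFR <- egfr_cr_cys(df$SCR, df$CYSC, df$AGE, df$SEX)
df
```

No explicit loop is needed: every operation in the function is vectorised, so the whole column is computed in one call.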

Logit-Transformation backwards

I've transformed some values from my dataset with the logit transformation from the car package. The variable "var" represents these values and consists of percentage values.
However, if I transform them back via inv.logit from the boot package, the values don't match the original ones.
data$var
46.4, 69.5, 82.7, 61.7, 76.4, 84.8, 69.1
data["var_logit"] <- logit(data$var, percents=TRUE)
data$var_logit
-0.137013943, 0.778005062, 1.454239241, 0.452148763, 1.102883518, 1.589885549, 0.760443432
data$var_logback <- inv.logit(data$var_logit)
0.46580 0.68525 0.81065 0.61115 0.75080 0.83060 0.68145
It looks like I have to multiply the result with 100 to get the previous values (or at least some very similar values), but I feel like I'm missing something.
Thanks for the help!
The other thing that's going on here is that car::logit automatically adjusts the data if there are 0 or 1 values:
adjust: adjustment factor to avoid proportions of 0 or 1; defaults to ‘0’ if there are no such proportions in the data, and to ‘.025’ if there are.
library(car)
dat <- c(46.4, 69.5, 82.7, 61.7, 76.4, 84.8, 69.1)
(L1 <- logit(dat, percents=TRUE))
## [1] -0.1442496 0.8236001 1.5645131
## 0.4768340 1.1747360 1.7190001 0.8047985
(L2 <- logit(c(dat,0),percents=TRUE))
## [1] -0.1370139 0.7780051 1.4542392 0.4521488
## 1.1028835 1.5898855 0.7604434 -3.6635616
## Warning message:
## In logit(c(0, dat)) : proportions remapped to (0.025, 0.975)
This means you can't invert the results as easily.
Here's a function (using the guts of car::logit with a little help from Wolfram Alpha because I was too lazy to do the algebra) that inverts the result:
inv.logit <- function(f,a) {
a <- (1-2*a)
(a*(1+exp(f))+(exp(f)-1))/(2*a*(1+exp(f)))
}
zapsmall(inv.logit(L2,a=0.025)*100)
## [1] 46.4 69.5 82.7 61.7 76.4 84.8 69.1 0.0
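To see why the adjustment matters for the round trip, the following sketch writes both directions out in base R, so no packages are needed. It assumes that the remapping is the linear squeeze of proportions into (adjust, 1 - adjust) that the inverse function above undoes; this reproduces the values printed above but is not copied from the car source. The names adj_logit and inv_adj_logit are made up for the example.

```r
# Forward transform: percentages -> (optionally adjusted) logits
adj_logit <- function(pct, adjust = 0) {
  p <- pct / 100                   # what percents = TRUE does
  a <- 1 - 2 * adjust
  qlogis(0.5 + a * (p - 0.5))      # squeeze towards 0.5, then take the logit
}

# Inverse transform: logits -> percentages, undoing the same adjustment
inv_adj_logit <- function(f, adjust = 0) {
  a <- 1 - 2 * adjust
  100 * (0.5 + (plogis(f) - 0.5) / a)
}

dat <- c(46.4, 69.5, 82.7, 61.7, 76.4, 84.8, 69.1, 0)
logits <- adj_logit(dat, adjust = 0.025)
back <- inv_adj_logit(logits, adjust = 0.025)
all.equal(back, dat)
```

Inverting with the wrong adjust value, or forgetting the factor of 100, gives numbers that are close to the originals but not equal, which is the symptom described in the question.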
You set the percents=TRUE flag, which divides your values by 100, and the inverse command does not know about it.

Permuting strings and passing as function arguments in Julia

In my continuing odyssey to kick the tires on Julia more thoroughly, I'm going back and reimplementing my solutions to some Bayesian coursework exercises. Last time, I discovered the conjugate distributions facilities in Julia and decided to play with those this time. That part works rather well (as an aside, I haven't figured out if there's a good reason the NormalInverseGamma function won't take sufficient statistics rather than a vector of data, or if it's just not implemented yet).
Here, I'd like to make some comparisons between samples from several posterior distributions. I have three posterior samples that I'd like to compare all permutations of. I am able to permute what should be the arguments to my compare function:
using Distributions
# Data, the expurgated versions
d1 = [2.11, 9.75, 13.88, 11.3, 8.93, 15.66, 16.38, 4.54, 8.86, 11.94, 12.47]
d2 = [0.29, 1.13, 6.52, 11.72, 6.54, 5.63, 14.59, 11.74, 9.12, 9.43]
d3 = [4.33, 7.77, 4.15, 5.64, 7.69, 5.04, 10.01, 13.43, 13.63, 9.9]
# mu=5, sigsq=4, nu=2, k=1
# think I got those in there right... docs were a bit terse
pri = NormalInverseGamma(5, 4, 2, 1)
post1 = posterior(pri, Normal, d1)
post1_samp = [rand(post1)[1] for i in 1:5000]
post2 = posterior(pri, Normal, d2)
post2_samp = [rand(post2)[1] for i in 1:5000]
post3 = posterior(pri, Normal, d3)
post3_samp = [rand(post3)[1] for i in 1:5000];
# Where I want my permutations passed in as arguments
compare(a, b, c) = mean((a .> b) & (b .> c))
#perm = permutations([post1_samp, post2_samp, post3_samp]) # variables?
#perm = permutations([:post1_samp, :post2_samp, :post3_samp]) # symbols?
perm = permutations(["post1_samp", "post2_samp", "post3_samp"]) # strings?
[x for x in perm] # looks like what I want; now how to feed to compare()?
If I'm reading you correctly and you want six outputs, you could pass a tuple containing the arrays to permutations, and then use apply (on Julia 0.4 and later, where apply no longer exists, splat the tuple instead: compare(p...)):
julia> perm = permutations((post1_samp, post2_samp, post3_samp))
Permutations{(Array{Any,1},Array{Any,1},Array{Any,1})}(({10.562517942895859,10.572164090071183,10.736702907907505,10.210772173751444,14.366729334490795,10.592629893299842,9.89659091860089,8.116412691836256,10.349724070315517,11.268377549210639 … 12.064725902593915,10.303602433314985,10.002042635051714,9.055831122365928,10.110819233623218,11.562207296236382,10.64265460839246,13.450063260877014,12.017400480458447,10.4932272939257},{7.405568037651908,8.02078920939688,7.511497830660621,7.887748694407902,7.698862774251405,6.007663099515951,7.848174806167786,9.23309632138448,7.205139036914154,8.277223275210972 … 7.06835013863376,6.488809918983307,9.250388581506368,7.350669918529516,5.546251008276725,8.778324046008263,10.833297020230216,9.2006982752771,9.882075423462595,3.253723211533207},{9.531489314208752,7.395780786761686,6.224734811478234,5.200474665890965,8.044992565567913,7.764939771450804,6.646382928269485,5.501893299017636,6.993003549302548,7.243273003116189 … 10.249365688182436,7.499165465689278,6.056692905419897,7.411776062227991,9.829197784956492,7.014685931227273,6.156474145474993,10.258900762434248,-1.044259248117803,7.284861693401341}))
julia> [apply(compare, p) for p in perm]
6-element Array{Any,1}:
0.4198
0.4182
0.0636
0.0154
0.0672
0.0158
Remember though that it's usually a misstep to have a bunch of variables with numbers in their names: that usually suggests they should be together in a named collection of some kind ("post_samples", say.)

Resources