I'm trying to calculate the trapezoidal AUC (area under the curve) using the 'trapz' function from the 'caTools' package. It is very simple to calculate the AUC for one variable with trapz, like this:
tAUC <- trapz(df1$time, df1$CAT.19)
tAUC
Now I want to turn this into a function and eventually 'lapply' it to do a batch calculation, but I'm having trouble making it into a function.
I have tried:
t_func <- function(x){
trapz(df1$time, df1$x)
}
but I get an error that says "non-conformable arguments".
Can anyone help me with this? Thank you so much.
My df1 looks like this:
An image is not a helpful way to share data, so I have created a fake dataset that mimics yours.
set.seed(123)
df1 <- data.frame(time = seq(0, 120, 15), CAT.01 = rnorm(9), CAT.02 = rnorm(9))
tAUC <- sapply(df1[-1], function(x) caTools::trapz(df1$time, x))
tAUC
#   CAT.01   CAT.02
# 27.23374 39.27199
If you need a list, use lapply instead of sapply.
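Why the original t_func fails: df1$x looks for a column literally named "x", not the column whose name is stored in x. Since df1 has no such column, it returns NULL, and passing that as y into trapz is most likely what produces "non-conformable arguments". Using df1[[col]] instead lets the function take a column name. In the sketch below, trap_area() is a hypothetical base-R stand-in written here so the example runs without caTools; it applies the same trapezoidal rule, and caTools::trapz can be dropped in for it.

```r
# Hypothetical base-R stand-in for caTools::trapz (trapezoidal rule)
trap_area <- function(x, y) {
  idx <- 2:length(x)
  sum((x[idx] - x[idx - 1]) * (y[idx] + y[idx - 1])) / 2
}

set.seed(123)
df1 <- data.frame(time = seq(0, 120, 15), CAT.01 = rnorm(9), CAT.02 = rnorm(9))

# df1$col would look for a column literally named "col";
# df1[[col]] uses the *value* of col as the column name
t_func <- function(col) trap_area(df1$time, df1[[col]])

# sapply over the column names gives a named vector of AUCs
tAUC <- sapply(setdiff(names(df1), "time"), t_func)
tAUC
```

Swapping sapply for lapply in the last call returns the same AUCs as a named list.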
I want to calculate conditional probabilities in my data, so I wrote the following:
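library(dplyr)  # provides glimpse()
creditrisks <- read.table("kredit.asc", header=TRUE)
glimpse(creditrisks)
# refer to the columns through creditrisks$ here, because attach() is only called further down
creditrisks$moral1 <- as.integer(creditrisks$moral > 1)
creditrisks$konto1 <- as.integer(creditrisks$laufkont == 1)
creditrisks$konto2 <- as.integer(creditrisks$laufkont == 4)
creditrisks$zweck <- as.integer(0 < creditrisks$verw & creditrisks$verw < 9)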
attach(creditrisks)
prop.table(table(kredit,konto1),2)
prop.table(table(kredit,konto2),2)
prop.table(table(kredit,moral1),2)
prop.table(table(kredit,zweck),2)
The results look like this:
This works well for me; the only thing I want to change is to calculate all the conditional frequencies at once, so the table should look like this:
With cbind I lose all the variable names, so I'm looking for a more elegant way.
The dataset can be found here: dataset
Thanks for your help!
Try this.
lapply(creditrisks[, c("konto1", "konto2", "moral1", "zweck")],
function(x) prop.table(table(creditrisks$kredit, x), 2)
)
You can also cbind them together with:
do.call(cbind,
lapply(
creditrisks[, c("konto1", "konto2", "moral1", "zweck")],
function(x) prop.table(table(creditrisks$kredit, x), 2)
)
)
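A plain do.call(cbind, ...) keeps only each table's own column labels ("0" and "1"), so the combined table does not say which variable a column belongs to. One way to keep that information, sketched here, is to prefix each table's column labels with its variable name before binding. Because kredit.asc isn't available here, the sketch uses a made-up creditrisks with random 0/1 columns; only the column names match the question.

```r
# Hypothetical stand-in data with the question's column names (values are random)
set.seed(42)
n <- 50
creditrisks <- data.frame(
  kredit = sample(0:1, n, replace = TRUE),
  konto1 = sample(0:1, n, replace = TRUE),
  konto2 = sample(0:1, n, replace = TRUE),
  moral1 = sample(0:1, n, replace = TRUE),
  zweck  = sample(0:1, n, replace = TRUE)
)

vars <- c("konto1", "konto2", "moral1", "zweck")

# One column-wise proportion table per indicator variable
tabs <- lapply(creditrisks[vars],
               function(x) prop.table(table(creditrisks$kredit, x), 2))

# Prefix each table's column labels with its variable name,
# so "0"/"1" become e.g. "konto1.0"/"konto1.1"
tabs <- Map(function(tab, nm) {
  colnames(tab) <- paste(nm, colnames(tab), sep = ".")
  tab
}, tabs, names(tabs))

res <- do.call(cbind, tabs)
round(res, 3)
```

The rows stay the levels of kredit, and every column still sums to 1 because the proportions were computed per column before binding.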
I have run a loop, and the results are saved in the list com. Now I need to take the results of each iteration (2000 iterations in total) and compute the mean of the values, as below:
l <- rbindlist(list(com[[1]], com[[2]], com[[3]], ..., com[[2000]]))[, .(values = mean(values)), by = variables][order(variables)]
I am a beginner in R. What would be the easy way of doing this?
Until you provide a data example, this will only be guesswork. I assume the results in your list com are numeric vectors. If not, this solution may not work.
This is base R, not data.table.
Example data:
set.seed(1)
com <- list(rnorm(100), rnorm(100), rnorm(100), rnorm(100), rnorm(100))
We bind the results together using do.call:
l <- do.call("rbind", com)
Now we use the vectorized rowMeans:
rowMeans(l)
> rowMeans(l)
[1] 0.10888737 -0.03780808 0.02967354 0.05160186 -0.03913424
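rowMeans() on the stacked matrix gives one mean per iteration. If, as the question's rbindlist code suggests, each element of com is a table with columns variables and values, and the goal is one mean per variable across all iterations, the whole list can be stacked in a single call instead of writing out com[[1]] through com[[2000]]. A base-R sketch under that assumption; the three variable names a, b, c and their values are made up:

```r
# Hypothetical example: each iteration returns a data frame with
# columns `variables` and `values` (the structure implied by the question)
set.seed(1)
com <- lapply(1:2000, function(i) {
  data.frame(variables = c("a", "b", "c"), values = rnorm(3))
})

# Stack every iteration at once; no need to list com[[1]] ... com[[2000]]
stacked <- do.call(rbind, com)

# Mean of `values` for each variable; aggregate() returns the groups sorted
l <- aggregate(values ~ variables, data = stacked, FUN = mean)
l
```

With data.table loaded, rbindlist() also accepts the whole list directly, so the question's chain can be written as rbindlist(com)[, .(values = mean(values)), by = variables][order(variables)].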
I am fairly new to R, and I'd like to understand the concept of using the "apply"-family functions to avoid loops and custom functions. Unfortunately, I am failing at the very first exercise.
Here is my minimal reproducible example:
x <- data.frame(Hours = cbind(c(rep(5, 5), rep(6, 5), rep(7, 5), rep(8, 5), rep(9, 5))),
                Price = c(cbind(seq(48, 50.4, by = 0.1), seq(48, 52.8, by = 0.2),
                                seq(48, 55.2, by = 0.3), seq(48, 57.8, by = 0.4),
                                seq(48, 60.0, by = 0.5))),
                Volume = seq(10000:10024))
f1 <- approxfun(x$Volume,x$Price, rule=2)
plot(x$Volume, x$Price)
curve(f1, add=TRUE)
However, I would like to perform approxfun() for every unique value of x$Hours.
How would I have to approach this?
Thank you for your help.
This solution was provided by bunk.
The idiom is split/apply/combine: split the data, apply the function, combine the results. Base R, the *plyr packages, data.table, etc. all have functions for this:
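# split by hour, then build one interpolating function per group
fns <- lapply(split(x, x$Hours),
              function(dat) approxfun(dat$Volume, dat$Price, rule = 2))
plot(x$Volume, x$Price)
# add one curve per hour, each in a different colour (starting at colour 2)
col <- 2
for (fn in fns) {
  curve(fn, add = TRUE, col = col)
  col <- col + 1
}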
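The same split/apply/combine steps can be checked without a plot. This sketch uses a smaller, made-up data frame (three hours, five distinct Volume points each) rather than the question's x, and the combine step evaluates every per-hour interpolator at one Volume with vapply():

```r
# Hypothetical small data set: three hours, five (Volume, Price) points each
x <- data.frame(
  Hours  = rep(c(5, 6, 7), each = 5),
  Volume = rep(1:5, times = 3),
  Price  = c(48 + 0.1 * 0:4, 48 + 0.2 * 0:4, 48 + 0.3 * 0:4)
)

# split: one data frame per unique hour (list names are "5", "6", "7")
groups <- split(x, x$Hours)

# apply: one interpolating function per group; rule = 2 clamps to the end values
fns <- lapply(groups, function(dat) approxfun(dat$Volume, dat$Price, rule = 2))

# combine: evaluate every interpolator at the same Volume
p_mid <- vapply(fns, function(f) f(2.5), numeric(1))
p_mid
```

Because lapply() keeps the names from split(), each interpolator can also be called by hour, e.g. fns[["6"]](3.2).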
I'm working with a set of results from the INLA package in R. These results are stored in objects with meaningful names, so I can have, for instance, model_a, model_b... in the current environment. For each of these models I'd like to do several processing tasks, including extracting the data into a separate data frame, which can then be merged with spatial data to create a map, etc.
Turning to a simpler, reproducible example, let's assume two results:
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
model_a <- lm(weight ~ group)
model_b <- lm(weight ~ group - 1)
I can handle the steps for an individual model, for instance:
model_a_sum <- data.frame(var = character(1), model_a_value = numeric(1))
model_a_sum$var <- "Intercept"
model_a_sum$model_a_value <- model_a$coefficients[1]
png("model_a_plot.png")
plot(model_a, las = 1)
dev.off()
Now I'd like to reuse this code for each of the models, essentially constructing the correct names depending on the model I'm using. I'm more of a Stata person than an R person, and in Stata it would be trivial to take the stub of a name (model_a, or even just a) and construct a foreach loop that implements all the steps, adapting the names for each model.
In R, for loops have been bashed all over the internet, so I presume I shouldn't venture into the territory of:
models <- c("model_a", "model_b", "model_c")
for (model in models) {
...
}
What would be a better solution for such a scenario?
Update 1: Since the comments suggested that a for loop might indeed be an option, I'm trying to put all the tasks into a loop. So far I have managed to name the data frame correctly using assign, and to plot the correct model under the correct file name using get:
models <- c("model_a", "model_b")
for (i in 1:length(models)) {
# create df
name.df <- paste0(models[i], "_sum")
assign(name.df, data.frame(var = character(1), value = numeric(1)))
# replace variables of df with results from the model
# plot and save
name.plot <- paste0(models[i], "_plot.png")
png(name.plot)
plot(get(models[i]), which = 1, las = 1)
dev.off()
}
Is this a reasonable approach? Are there better solutions?
One thing I cannot solve is naming the second variable of the data frame after the model (i.e. model_a_value instead of the current value). Any ideas how to solve that?
Some general tips/advice:
As mentioned in the comments, don't believe much of the negativity about for loops in R. The issue is not that they are bad, but rather that they tend to appear alongside inefficient code patterns.
More important is to use the right data organization. Don't keep each model in a separate object! Put them in a list:
l <- vector("list",3)
l[[1]] <- lm(...)
l[[2]] <- lm(...)
l[[3]] <- lm(...)
Then name the list:
names(l) <- paste0("model_",letters[1:3])
Now you can loop over the list without resorting to awkward and unnecessary tools like assign and get, and more importantly when you're ready to step up from for loops to tools like lapply you're all good to go.
I would use similar strategies for your data frames as well.
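Applied to the two lm() models from the question, the list approach might look like the sketch below. It keeps the models in a named list, builds one summary data frame per model whose value column is named after the model (model_a_value, the naming the question asked about), and saves one diagnostic plot per model, all without assign or get. Assumptions: the var column holds the first coefficient's name rather than the literal "Intercept", and the plots are written with pdf() into tempdir() so the sketch runs without a bitmap device; png() and your own paths work the same way.

```r
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

# Keep the models in one named list instead of separate objects
models <- list(
  model_a = lm(weight ~ group),
  model_b = lm(weight ~ group - 1)
)

# One summary data frame per model; the value column is named after the model
sums <- lapply(names(models), function(nm) {
  cf <- coef(models[[nm]])
  out <- data.frame(var = names(cf)[1], value = unname(cf[1]))
  names(out)[2] <- paste0(nm, "_value")
  out
})
names(sums) <- names(models)

# One diagnostic plot per model, file name built from the list name
for (nm in names(models)) {
  pdf(file.path(tempdir(), paste0(nm, "_plot.pdf")))
  plot(models[[nm]], which = 1, las = 1)
  dev.off()
}

sums$model_a
```

Adding a model later only means adding one element to models; the summary and plotting code stay unchanged.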
See @joran's answer; this one only shows how to use assign and get, which should be avoided when possible.
I would write the for loop this way:
for (model in models) {
m <- get(model) # to get the real model object
# create the model_?_sum dataframe
assign(paste0(model,"_sum"), data.frame(var = "Intercept", value = m$coefficients[1]))
# rename the value column after the model, as asked in the comments (thanks to @Franck in chat for the guidance)
assign(paste0(model,"_sum"), setNames(get(paste0(model,"_sum")), c("var", paste0(model,"_value"))))
# paste0 to create the text
png(paste0(model,"_plot.png"))
plot(m, las = 1) # use the m object to graph
dev.off()
}
which gives the two images and this:
> model_a_sum
                  var model_a_value
(Intercept) Intercept         5.032
> model_b_sum
               var model_b_value
groupCtl Intercept         5.032
>
I'm unsure why you want this data frame, but I hope this gives some clues on how to build variable names and how to access them.
I am trying to write a function in R for a simple time series regression (the result of this function is the basis for more complicated ones). In the first part I define the variables and create some lags, which are named ar_i depending on the lag used.
However, in the second part I try to combine these lags in a matrix by calling cbind on the variables defined earlier. As you can see, the output is not the expected matrix but the names of the lags themselves. I tried to solve this with noquote() and cat(), but neither seems to work.
Do you have any suggestions? Thanks in advance!
P.S.: The code and the results are below.
trans <- dlpib
ar <- dlpib
linear <- 1:4
for (i in linear){
assign(paste("ar_",i,sep = ""), lag(ar,k=-i))
}
linear_dat <- cbind(paste("ar_",linear, collapse=',', sep = ""))
> linear_dat
[,1]
[1,] "ar_1,ar_2,ar_3,ar_4"
I think you could go about this more efficiently with lapply:
linear <- 1:4
linear_list <- lapply(linear, function(i) lag(ar, k=-i))
linear_dat <- do.call(cbind, linear_list)
colnames(linear_dat) <- paste0("ar_", linear)
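The reason the question's version returns text: paste() only builds character strings, so cbind() receives "ar_1,ar_2,ar_3,ar_4" rather than the lagged series. Keeping the lags in a named list avoids the round trip through names. dlpib isn't shown in the question, so this sketch uses a made-up quarterly ts in its place, and calls stats::lag explicitly in case dplyr (which masks lag) is loaded.

```r
# Hypothetical quarterly series standing in for dlpib
set.seed(1)
ar <- ts(rnorm(20), start = c(2000, 1), frequency = 4)

linear <- 1:4

# Build the lagged series as objects in a named list (no assign() needed);
# k = -i shifts the series i periods later, so ar_i[t] is ar[t - i]
linear_list <- lapply(linear, function(i) stats::lag(ar, k = -i))
names(linear_list) <- paste0("ar_", linear)

# cbind on ts objects aligns the series by time and pads with NA;
# the list names become the column names
linear_dat <- do.call(cbind, linear_list)
head(linear_dat)
```

If the ar_i objects already exist (for example, created by assign() in the question's loop), mget(paste0("ar_", linear)) returns them as a named list, so do.call(cbind, mget(paste0("ar_", linear))) turns the names back into the actual series.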