Below, assume this is part of the data:
library(tibble)
df <- tribble(
~temp1, ~temp2, ~temp3, ~temp4, ~temp5, ~temp6, ~temp7, ~temp8,
75, 88, 85, 71, 98, 76, 71, 57,
80, 51, 84, 72, 59, 81, 70, 64,
54, 65, 90, 66, 93, 88, 77, 59,
59, 87, 94, 75, 74, 53, 56, 87,
52, 55, 64, 77, 50, 64, 83, 87,
)
Now I want to write a loop to get the results. In this example, temp1 should be paired with temp2 only, temp3 with temp4 only, temp5 with temp6 only, and temp7 with temp8 only.
Suppose I want to run a correlation or a t-test between the intended pairs only (temp1 with temp2, temp3 with temp4, temp5 with temp6, temp7 with temp8).
I would also like to get only the statistics, for example only the value of r for a correlation. A table would be very helpful.
I have searched, and it seems we need to use a map function, but I struggled to do it. Can this be done in R?
We can use seq to select the odd- and even-numbered columns and pass both sets to map2, so that we get the correlation between temp1 and temp2, temp3 and temp4, etc.
library(purrr)
out <- map2_dbl(df[seq(1, ncol(df), 2)], df[seq(2, ncol(df), 2)], ~ cor(.x, .y))
names(out) <- paste0("Time", seq_along(out))
Or with Map from base R
out <- unlist(Map(function(x, y) cor(x, y), df[seq(1, ncol(df), 2)],
df[seq(2, ncol(df), 2)]))
names(out) <- paste0("Time", seq_along(out))
You could split your data frame in two: one with columns 1, 3, 5, 7 and the other with columns 2, 4, 6, 8.
Then you take one column from each at a time and perform cor or t.test with pmap.
library(purrr)
df %>%
split.default(rep_len(1:2, ncol(.))) %>%
pmap_dbl(~cor(.x,.y))
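Since the question also asks for a table and mentions t-tests, here is a minimal base R sketch that collects both statistics for each pair. The column names pair, r, t and p are illustrative choices, and t.test is called with its defaults (a Welch two-sample test); whether a paired test would be more appropriate depends on the study design, which the question does not state.

```r
# Same values as the tribble in the question, written column-wise
df <- data.frame(
  temp1 = c(75, 80, 54, 59, 52), temp2 = c(88, 51, 65, 87, 55),
  temp3 = c(85, 84, 90, 94, 64), temp4 = c(71, 72, 66, 75, 77),
  temp5 = c(98, 59, 93, 74, 50), temp6 = c(76, 81, 88, 53, 64),
  temp7 = c(71, 70, 77, 56, 83), temp8 = c(57, 64, 59, 87, 87)
)

odd  <- seq(1, ncol(df), by = 2)  # temp1, temp3, ...
even <- seq(2, ncol(df), by = 2)  # temp2, temp4, ...

# One row per pair: correlation plus t statistic and p-value
res <- do.call(rbind, Map(function(x, y, nx, ny) {
  tt <- t.test(x, y)  # assumption: default Welch two-sample test
  data.frame(
    pair = paste(nx, ny, sep = "_"),
    r = cor(x, y),
    t = unname(tt$statistic),
    p = tt$p.value
  )
}, df[odd], df[even], names(df)[odd], names(df)[even]))
rownames(res) <- NULL
res
```

Each row of res corresponds to one intended pair, so res$r gives only the r values.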
Related
I'm looking to find multiple max values using multiple ranges from a single table without using a loop.
It's difficult to explain, but here's an example:
list_of_value <- c(100, 110, 54, 64, 73, 23, 102)
beginning_of_max_range <- c(1, 2, 4)
end_of_max_range <- c(3, 5, 6)
output
110, 110, 73
max(100, 110, 54)
max(110, 54, 64)
max(64, 73, 23)
You may do this with mapply -
list_of_value <- c(100, 110, 54, 64, 73, 23, 102)
beginning_of_max_range <- c(1, 2, 4)
end_of_max_range <- c(3, 5, 6)
mapply(function(x, y) max(list_of_value[x:y]), beginning_of_max_range, end_of_max_range)
#[1] 110 110 73
We create a sequence from each element of beginning_of_max_range to the matching element of end_of_max_range, use it to subset list_of_value, and take the max of each subset.
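The same idea can also be sketched with vapply, which additionally checks that every range yields exactly one numeric value. This assumes both index vectors have the same length and that every range lies within list_of_value; range_max is a name chosen here for illustration.

```r
list_of_value <- c(100, 110, 54, 64, 73, 23, 102)
beginning_of_max_range <- c(1, 2, 4)
end_of_max_range <- c(3, 5, 6)

# Iterate over the range index i and take the max of each inclusive range
range_max <- vapply(
  seq_along(beginning_of_max_range),
  function(i) max(list_of_value[beginning_of_max_range[i]:end_of_max_range[i]]),
  numeric(1)
)
range_max
#[1] 110 110 73
```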
I am trying to calculate the concordance value (using the epiR package) between measured and predicted for each group with a dplyr pipe operation. My example code is below:
measured <- c(23, 20, 24, 26, 23, 46, 47, 45, 47, 46, 67, 68, 64, 63, 63)
predicted <- c(21, 19, 25, 23, 25, 48, 45, 46, 48, 46, 67, 68, 64, 63, 63)
gdata <- cbind(replicate, measured, predicted)
gdata <- as.data.frame(gdata)
head(gdata)
gdata$replicate <- as.factor(gdata$replicate)
test <- gdata %>%
group_by(replicate) %>%
mutate(tt <- epiR::epi.ccc(gdata$measured, gdata$predicted))
What I want is to extract the tt$rho.c$est value from the returned list for each group, in this case each replicate. However, I am getting an error. It works outside the pipe operation using the code below. Any help is appreciated.
tt <- epi.ccc(bootdata$CalCut_DRY, bootdata$Predicted)
tt$rho.c$est
You can try:
library(dplyr)
gdata %>%
group_by(replicate) %>%
mutate(tt = epiR::epi.ccc(measured, predicted)$rho.c$est)
If the rho.c$est value is the same for every row in a group, then you can probably use summarise instead of mutate to get one row per group.
How would I vectorize this loop? When I have the loop with the backward stepwise regression, it takes over 15 minutes to run through the regression. (My full dataset has over 4000 observations and 20+ independent variables.) Any idea how I would vectorize this? I'm new to the whole concept.
I've looked into making this a function, and then using an ifelse statement for the training and validation. But, I haven't been able to get this to work in the code. Any ideas?
Here is a small dataset:
name <- c("Joe I.", "Joe I.", "Joe I.", "Joe I.", "Jane P.", "Jane P.", "Jane P.", "Jane P.",
"John K.", "John K.", "John K.", "John K.")
name_id <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
grade <- c(80, 99, 70, 65, 88, 90, 76, 65, 67, 68, 89, 67)
score <- c(82, 93, 72, 61, 89, 93, 71, 63, 64, 65, 82, 62)
attendance <- c(80, 99, 82, 62, 70, 65, 88, 90, 76, 93, 71, 99)
participation <- c(71, 63, 64, 71, 99, 76, 65, 67, 93, 72, 68, 89)
df <- cbind(name, name_id, grade, score, attendance, participation)
df <- as.data.frame(df)
df$name_id <- as.numeric(df$name_id)
df$grade <- as.numeric(df$grade)
df$score <- as.numeric(df$score)
df$attendance <- as.numeric(df$attendance)
df$participation <- as.numeric(df$participation)
Here is the loop:
magic_for(print, silent = TRUE)
for(i in 1:3){
validation = df[df$name_id == (i),]
training = df[df$name_id != (i),]
m = lm(score ~ grade + attendance + participation, data = training)
stepm <- stepAIC(m, direction = "backward", trace = FALSE)
pred1 <- predict(stepm, validation)
print(pred1)
}
options(max.print=999999)
pred1 <- magic_result_as_dataframe()
I am not sure whether the following code will speed up your program, but please give it a try. Here df is first split by df$name_id, so that you have one chunk per name_id.
library(MASS)  # provides stepAIC

# One data frame per name_id; each chunk serves once as the validation set
dfs <- split(df, df$name_id)
lapply(seq_along(dfs), function(k) {
  validation <- dfs[[k]]
  # Train on all the other chunks combined
  training <- do.call(rbind, dfs[-k])
  m <- lm(score ~ grade + attendance + participation, data = training)
  stepm <- stepAIC(m, direction = "backward", trace = FALSE)
  predict(stepm, validation)
})
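To collect the predictions without magic_for, one option is to return a small data frame per fold and bind the results afterwards. This is only a sketch on the small dataset from the question: it assumes participation was meant to be a predictor, the column names observed and predicted are illustrative, and it is not guaranteed to be faster, since the stepwise search itself is likely to dominate the run time.

```r
library(MASS)  # provides stepAIC

df <- data.frame(
  name_id = rep(1:3, each = 4),
  grade = c(80, 99, 70, 65, 88, 90, 76, 65, 67, 68, 89, 67),
  score = c(82, 93, 72, 61, 89, 93, 71, 63, 64, 65, 82, 62),
  attendance = c(80, 99, 82, 62, 70, 65, 88, 90, 76, 93, 71, 99),
  participation = c(71, 63, 64, 71, 99, 76, 65, 67, 93, 72, 68, 89)
)

dfs <- split(df, df$name_id)

# Leave one name_id out per fold, fit on the rest, predict the held-out rows
folds <- lapply(names(dfs), function(k) {
  validation <- dfs[[k]]
  training <- do.call(rbind, dfs[names(dfs) != k])
  m <- lm(score ~ grade + attendance + participation, data = training)
  stepm <- stepAIC(m, direction = "backward", trace = FALSE)
  data.frame(
    name_id = validation$name_id,
    observed = validation$score,
    predicted = unname(predict(stepm, newdata = validation))
  )
})

result <- do.call(rbind, folds)
result
```

Because each fold is independent of the others, the lapply call could later be swapped for a parallel equivalent if the full dataset still takes too long.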
In R it is possible to create a list
k <- list()
k[[1]] <- airquality
k[[2]] <- rock
k[[3]] <- AirPassengers
k[[4]] <- airmiles
k[[5]] <- trees
k[[6]] <- treering
and select elements from it with
k[c(1:3,6)]
How is it possible to do the same with an S4 class?
For example, I create some data using the dismo package:
library(dismo)
example(voronoi)
which runs the following:
p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48,
28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2)
v <- voronoi(p)
v
I want to select the coordinates of a polygon, which can be done with
v@polygons[[1]]@Polygons[[1]]@coords
My question is: how do I select, for example, the 1st to 3rd and the 6th components?
My idea of using
v@polygons[c(1:3,6)]@Polygons[[1]]@coords
does not work. R says:
Error: trying to get slot "Polygons" from an object of a basic class ("list") with no slots
The problem isn't with v@polygons[c(1:3,6)] but with the attempt to apply @Polygons[[1]]@coords directly to the resulting list. The @ operator extracts a slot from a single S4 object, and a plain list has no slots. Instead, you could use lapply() on v@polygons[c(1:3,6)] like this:
result <- lapply(v@polygons[c(1:3,6)], function(x) x@Polygons[[1]]@coords)
which works as expected.
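The same pattern can be tried without dismo installed. The sketch below uses three hypothetical toy classes (Ring, Shape and Layer, invented here purely for illustration) that mimic the nesting of slots and lists: subsetting a list slot returns a list, so slot access has to be applied to each element.

```r
library(methods)

# Hypothetical toy classes mimicking the nested slot/list structure
setClass("Ring",  representation(coords = "matrix"))
setClass("Shape", representation(rings = "list"))
setClass("Layer", representation(shapes = "list"))

# Build six shapes; shape i holds a 2 x 2 matrix filled with i
make_shape <- function(i) {
  new("Shape", rings = list(new("Ring", coords = matrix(i, nrow = 2, ncol = 2))))
}
layer <- new("Layer", shapes = lapply(1:6, make_shape))

# layer@shapes[c(1:3, 6)] is a plain list, so use lapply for the inner slots
picked <- lapply(layer@shapes[c(1:3, 6)], function(s) s@rings[[1]]@coords)
length(picked)
```

Here picked is a list of four matrices, one for each selected shape.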
I have a set of data that I have collected which consists of a time series, where each y-value is found by taking the mean of 30 samples of grape cluster weight.
I want to simulate more data from this, with the same number of x and y values, so that I can carry out some Bayesian analysis to find the posterior distribution of the data.
I have the data, and I know that the growth follows a Gompertz curve with the formula
y = a*exp(-exp(-(x-x0)/b)), with a = 88.8, b = 11.7, and x0 = 15.1.
The data I have are:
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165)
Any help would be appreciated, thank you.
Will edit when more information is given.
I am a little confused by your question. I have compiled what you have written into R. Please elaborate for me so that I can help you:
gompertz <- function(x, x0, a, b){
a*exp(-exp(-(x-x0)/b))
}
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165) # means of 30 samples of grape cluster weights?
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112) # ?
#??
gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)
gompertz(y, x0 = 15.1, a = 88.8, b = 11.7)
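If the goal is to simulate new y values at the same x values, one possible reading is to add random noise around the Gompertz mean curve. The sketch below assumes independent normal errors with a standard deviation of 5; both the error model and the value of noise_sd are assumptions made for illustration and are not given in the question.

```r
gompertz <- function(x, x0, a, b) {
  a * exp(-exp(-(x - x0) / b))
}

x <- c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)

set.seed(1)    # for reproducibility
noise_sd <- 5  # hypothetical noise level, not taken from the question

# Mean curve at the observed x values, using the parameters from the question
mu <- gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)

# One simulated data set with the same number of x and y values
y_sim <- mu + rnorm(length(x), mean = 0, sd = noise_sd)
```

With a = 88.8 the mean curve levels off just below 88.8, while the observed y values go up to 177, so the stated parameters may need to be re-estimated before simulating.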