I am trying to build a list like the one below, with values taken from different data frames called kc2 to kc10. Can anyone give me some advice on how to write this for loop?
sum_square=append(sum_square,weighted.mean(x=kc2$withinss,w=kc2$size, na.rm=TRUE))
I tried something like this, but it didn't work:
for (i in 2:10){
nam1 = paste0("kc",i,"$withinss")
nam2 = paste0("kc",i,"$size")
sum_square = append(sum_square, lapply(c(as.numeric(nam1),as.numeric(nam2)), weighted.mean))
}
There are a lot of problems with the code you posted, so I'll just cut right to the point. In R, when you want to apply a function to multiple objects and collect the result, you should be thinking of using lapply. lapply loops through a list of objects (you can put your data frames into a list), applies the chosen function to each, and then returns the result of each as a list. The below code is in the form of what you want:
# Add data frames to list by name
list_of_data_frames <- list(kc2, kc3, kc4, kc5, kc6, kc7, kc8, kc9, kc10)
# OR add them programmatically
list_of_data_frames <- mget(paste0('kc', seq.int(from = 2, to = 10)))
result <- lapply(list_of_data_frames,
function(x) weighted.mean(x = x$withinss, w = x$size, na.rm=TRUE))
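If a plain numeric vector is more convenient than a list (the question appends to sum_square), vapply does the same job and checks that every result is a single number. This is only a sketch: the two small objects below are made-up stand-ins for kc2 and kc3, not data from the question.

```r
# Hypothetical stand-ins for kc2 and kc3 (values invented for illustration)
kc2 <- list(withinss = c(10, 20),  size = c(3, 1))
kc3 <- list(withinss = c(4, 6, 8), size = c(1, 1, 2))

fits <- list(kc2 = kc2, kc3 = kc3)

# One size-weighted mean of withinss per object, as a named numeric vector
sum_square <- vapply(
  fits,
  function(kc) weighted.mean(x = kc$withinss, w = kc$size, na.rm = TRUE),
  numeric(1)
)
sum_square
```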
This topic has been covered numerous times I see but I can't really get the answer I'm looking for. Thus, here I go.
I am trying to do a loop to create variables in 5 data sets that have similar names as such:
Ech_repondants_nom_1
Ech_repondants_nom_2
Ech_repondants_nom_3
Ech_repondants_nom_4
Ech_repondants_nom_5
Below is the code that I have tried:
list <- c(1:5)
for (i in list) {
Ech_repondants_nom_[[i]]$sec = as.numeric(Ech_repondants_nom_[[i]]$interviewtime)
Ech_repondants_nom_[[i]]$min = round(Ech_repondants_nom_[[i]]$sec/60,1)
Ech_repondants_nom_[[i]]$heure = round(Ech_repondants_nom_[[i]]$min/60,1)
}
Any clues why this does not work?
cheers!
These are object names and not list elements to subset as Ech_repondants_nom_[[i]]. We may need to get the object by paste i.e.
get(paste0("Ech_repondants_nom_", i))$sec
but, then if we need to update the original object, have to call assign. Instead of all this, it can be done more easily if we load the datasets into a list and loop over the list with lapply
lst1 <- lapply(mget(paste0("Ech_repondants_nom_", 1:5)), function(dat)
within(dat, {sec <- as.numeric(interviewtime);
min <- round(sec/60, 1);
heure <- round(min/60, 1)}))
It may be better to keep it as a list, but if we need to update the original object, use list2env
list2env(lst1, .GlobalEnv)
Ech_repondants_nom_[[i]]
isn't actually selecting that data frame, because you can't refer to objects that way. Try creating a function that takes a data frame as an argument, then iterate over the data frames:
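changing_time_stamp <- function(df){
df$sec = as.numeric(df$interviewtime)
df$min = round(df$sec/60,1)
df$heure = round(df$min/60,1)
df  # return the modified data frame
}
for (i in 1:5) {
# look up each data frame by name, transform it, and write it back
nm <- paste0("Ech_repondants_nom_", i)
assign(nm, changing_time_stamp(get(nm)))
}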
EDIT: I fixed some of the variable names in the function
Incredibly basic question. I'm brand new to R. I feel bad for asking, but also like someone will crush it:
I'm trying to generate a number of vectors with a for loop, each with a unique name, numbered by iteration. The code I'm attaching throws an error, but I think it explains fairly well what I'm trying to do in principle.
Thanks in advance.
vectorBuilder <- function(num){
for (x in num){
paste0("vec",x) <- rnorm(10000, mean = 0, sd = 1)}
}
numSeries <- 1:10
vectorBuilder(numSeries)
You can write the function to return a named list :
create_vector <- function(n) {
setNames(replicate(n, rnorm(10000), simplify = FALSE),
paste0('vec', seq_len(n)))
}
and call it as :
data <- create_vector(10)
data will be a list of length 10, with each element a vector of length 10000. It is better to keep the data in this list than to create a lot of vectors in the global environment. However, if you still want separate vectors, you can use list2env :
list2env(data, .GlobalEnv)
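For completeness, the loop from the question can be made to work with assign(), which creates a variable whose name is built at run time. The sketch below is an illustration rather than a recommendation: the envir argument and the environment e are additions that are not in the question, used here so that the global environment is not cluttered.

```r
vectorBuilder <- function(num, envir = parent.frame()) {
  for (x in num) {
    # assign() takes the variable name as a string, unlike `<-`
    assign(paste0("vec", x), rnorm(10000, mean = 0, sd = 1), envir = envir)
  }
  invisible(NULL)
}

e <- new.env()                  # hypothetical target environment
vectorBuilder(1:10, envir = e)
ls(e)                           # names of the vectors that were created
```

Calling vectorBuilder(1:10) without envir would create vec1 to vec10 in the calling environment, which is what the question literally asks for.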
I have a custom function that I want to apply to any dataset that shares a common name.
common_funct=function(rank_p=5){
df = ANY_DATAFRAME_HERE[ANY_DATAFRAME_HERE$rank <rank_p,]
return(df)
}
I know with common functions I could do something like below to get the value of each.
apply(mtcars,1,mean)
But what if I wanted to do :
apply(any_dataset, 1, common_funct(anyvalue))
How would I pass that along?
library(dplyr)
mtcars$rank = dense_rank(mtcars$mpg)
iris$rank = dense_rank(iris$Sepal.Length)
Now how would I go about applying my same function to both values?
If I understand your question, I would suggest putting your data frames into a list and applying the function over it. So
## Your example function
common_funct=function(df, rank_p=5){
df[df$rank <rank_p,]
}
## Sanity check
common_funct(mtcars)
common_funct(iris)
Next create a list of the data frames
l = list(mtcars, iris)
and use lapply
lapply(l, common_funct)
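To pass a value other than the default rank_p, give it to lapply as an extra argument; anything after the function is forwarded to it on every call. A small sketch with two made-up data frames (df_a and df_b are hypothetical, used here instead of mtcars and iris so the result is easy to check by eye):

```r
common_funct <- function(df, rank_p = 5) {
  df[df$rank < rank_p, ]
}

# Hypothetical toy data frames with a rank column
df_a <- data.frame(id = 1:6, rank = c(1, 2, 3, 4, 5, 6))
df_b <- data.frame(id = 1:4, rank = c(2, 2, 7, 1))

l <- list(a = df_a, b = df_b)

# rank_p = 3 is forwarded to common_funct for each element of l
res <- lapply(l, common_funct, rank_p = 3)
res
```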
I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:
Some initialisations and a function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)
myfunction <- function(frame,columns){
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
values[columns+1] = athing
return(values)}
The problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
values <- myfunction(as.matrix(myframe[i,]), columns)
values[columns+2] = i
values[columns+3] = myframe[i,3]
#more columns added with simple operations (i.e. sum)
solution <- rbind(solution,values)
#solution is a large matrix from outside the for-loop
}
The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to become too large after a while (more than 50 MB).
I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I have not really come further than this, although I tried applying this very good introduction to parallel processing.
How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...
Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?
Using rbind in a loop like this is bad practice because each iteration enlarges your solution object and copies it to a new object, which is very slow and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
# Call all necessary commands to create values
my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
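The same pattern can be written with lapply instead of a pre-allocated list. The sketch below is self-contained and uses a shortened copy of myframe; make_row is a hypothetical stand-in for the body of the loop in the question, not the original myfunction.

```r
# Shortened copy of the example data (assumption: three rows are enough to show the idea)
myframe <- data.frame(a = c(10, 20, 30), b = c("a", "b", "c"), c = 1:3)
columns <- 6

# Hypothetical stand-in for the loop body: builds one row of the result
make_row <- function(i) {
  values <- numeric(columns + 3)
  values[columns + 1] <- if (columns == 5) 100 else 1000
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
}

# Collect one vector per row, then bind them all in a single call
rows <- lapply(seq_len(nrow(myframe)), make_row)
solution <- do.call(rbind, rows)
dim(solution)
```

Because rbind runs only once, the result is allocated a single time instead of being copied on every iteration.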
A bit too long for a comment, so I put it here:
If columns is known in advance:
myfunction <- function(frame){
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
values[columns+1] = athing
return(values)}
apply(myframe, 2, myfunction)
If columns is not given via environment, you can use:
apply(myframe, 2, myfunction, columns) with your original myfunction definition.
I want to use an apply statement to do something to each row of a data frame in R.
The following works where I call the function "calc.Sphere.Metrics" with a bunch of parameters and an index i. I store the result in each row.
for(i in 1: dim(position.matrix)[1]){
results.obs[i,] <- calc.Sphere.Metrics(i, culled.mutation.data, position.matrix, protein.metrics, radius)
}
I've tried several apply, mapply statements but am having no luck. What would be the correct way to do this?
EDIT:
As requested, here's a skeleton of calc.Sphere.Metrics
calc.Sphere.Metrics <- function(index, culled.mutation.data, position.matrix, protein.metrics, radius){
results <- matrix(data = 0, nrow = 1, ncol = 8)
colnames(results) <- c("Line.Length","Center", "Start","End","Positions","MutsCount","P.Value", "Within.Range")
results <- as.data.frame(results)
....
look up a bunch of stuff and fill in each column of results. All the data required is in the parameters passed in and the index.
.....
return(results)
}
Results has the same number of columns as results.obs in the top function. Hope this helps!
Thanks!
Probably something like this:
results.obs <- do.call(rbind, lapply(seq_len(dim(position.matrix)[1]),
calc.Sphere.Metrics, culled.mutation.data, position.matrix, protein.metrics, radius))