Apply custom function to any dataset with common name

I have a custom function that I want to apply to any dataset that shares a common name.
common_funct=function(rank_p=5){
df = ANY_DATAFRAME_HERE[ANY_DATAFRAME_HERE$rank <rank_p,]
return(df)
}
I know with common functions I could do something like below to get the value of each.
apply(mtcars,1,mean)
But what if I wanted to do :
apply(any_dataset, 1, common_funct(anyvalue))
How would I pass that along?
library(dplyr)
mtcars$rank = dense_rank(mtcars$mpg)
iris$rank = dense_rank(iris$Sepal.Length)
Now how would I go about applying my same function to both values?

If I understand your question, I would suggest putting your data frames into a list and applying the function over it. So
## Your example function
common_funct=function(df, rank_p=5){
df[df$rank <rank_p,]
}
## Sanity check
common_funct(mtcars)
common_funct(iris)
Next create a list of the data frames
l = list(mtcars, iris)
and use lapply
lapply(l, common_funct)
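To pass a non-default threshold, extra arguments given to lapply are forwarded to the function. The following is a minimal sketch in base R; dense_rank_base is a hypothetical stand-in for dplyr::dense_rank, and the list names are chosen for illustration only.

```r
# Hypothetical base-R stand-in for dplyr::dense_rank(): ties share a rank, no gaps
dense_rank_base <- function(x) match(x, sort(unique(x)))

# Add the shared "rank" column to copies of both data sets
cars_ranked <- transform(mtcars, rank = dense_rank_base(mpg))
iris_ranked <- transform(iris, rank = dense_rank_base(Sepal.Length))

common_funct <- function(df, rank_p = 5) {
  df[df$rank < rank_p, ]
}

# A named list keeps track of which result belongs to which data set
datasets <- list(mtcars = cars_ranked, iris = iris_ranked)

# Arguments after the function name are passed on to common_funct
filtered <- lapply(datasets, common_funct, rank_p = 3)
```

Each element of filtered is the corresponding data frame restricted to rows with rank below 3, accessible as filtered$mtcars and filtered$iris.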

Related

How to replicate a function for a nested list

I have a nested list (datalist) which I'd like to repeat the following function for. Within datalist are multiple dataframes (e.g., A-F).
After doing the following for the nested dataframe "A", I'd like to run it for the other nested dataframes (B-F):
dat_A_dat<-datalist["A"]
dat_A <- dat_A_dat$"A"[,c(1:4,7)] #note: I have to use $ to access this
dat_A.v <-dat_A[,c(1,2)]
dat_A.b <-dat_A[,3]
dat_A.c <-dat_A[,4]
dat_A.r <-dat_A[,7]
Is there a simpler way of doing this?
Your help would be greatly appreciated. Thank you all.

It's not clear how your data is structured or what exactly you're trying to achieve, but if you're just asking how to write a function that can be applied to each element of the list, it would look something like the following.
Note: you might have to change this depending on the structure of your data and what you are trying to achieve. In future, try to include a reproducible example.
my_extraction_function <- function(d) {
  # d is one data frame from the list, e.g. datalist[["A"]]
  dat <- d[, c(1:4, 7)]
  dat.v <- dat[, c(1, 2)]
  dat.b <- dat[, 3]
  dat.c <- dat[, 4]
  dat.r <- dat[, 5]  # original column 7 is the 5th column after subsetting
  # Return them in whatever format you want
  list(v = dat.v,
       b = dat.b,
       c = dat.c,
       r = dat.r)
}
You can then do:
my_extraction_function(datalist[["A"]])
my_extraction_function(datalist[["B"]])
... etc.
or
lapply(datalist, my_extraction_function)
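As a rough illustration of the lapply approach, here is a self-contained sketch on toy data. The column names and the helper make_df are made up for the example, since the real structure of datalist is not shown in the question.

```r
# Toy data: every element has seven columns (names are hypothetical)
make_df <- function() {
  data.frame(x = 1:3, y = 4:6, b = 7:9, c = 10:12,
             e = 13:15, f = 16:18, r = 19:21)
}
datalist <- list(A = make_df(), B = make_df())

extract_parts <- function(d) {
  dat <- d[, c(1:4, 7)]      # keep columns 1-4 and 7
  list(v = dat[, 1:2],       # first two columns, still a data frame
       b = dat[, 3],
       c = dat[, 4],
       r = dat[, 5])         # original column 7
}

# One call handles every element; the result keeps the names A, B, ...
parts <- lapply(datalist, extract_parts)
parts$B$r
```

The result is a list of lists, so the pieces for any data frame are reached by name, for example parts$A$v.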

Loop for on dataset name in R

This topic has been covered numerous times I see but I can't really get the answer I'm looking for. Thus, here I go.
I am trying to do a loop to create variables in 5 data sets that have similar names as such:
Ech_repondants_nom_1
Ech_repondants_nom_2
Ech_repondants_nom_3
Ech_repondants_nom_4
Ech_repondants_nom_5
Below is the code that I have tried:
list <- c(1:5)
for (i in list) {
Ech_repondants_nom_[[i]]$sec = as.numeric(Ech_repondants_nom_[[i]]$interviewtime)
Ech_repondants_nom_[[i]]$min = round(Ech_repondants_nom_[[i]]$sec/60,1)
Ech_repondants_nom_[[i]]$heure = round(Ech_repondants_nom_[[i]]$min/60,1)
}
Any clues why this does not work?
cheers!

Ech_repondants_nom_1 to Ech_repondants_nom_5 are separate object names, not elements of a list, so they cannot be subset with Ech_repondants_nom_[[i]]. We would need to build the name with paste0 and fetch the object with get, i.e.
get(paste0("Ech_repondants_nom_", i))$sec
but then, to update the original object, we would also have to call assign. Instead of all this, it is easier to load the datasets into a list and loop over the list with lapply
lst1 <- lapply(mget(paste0("Ech_repondants_nom_", 1:5)), function(dat)
within(dat, {sec <- as.numeric(interviewtime);
min <- round(sec/60, 1);
heure <- round(min/60, 1)}))
It may be better to keep it as a list, but if we need to update the original object, use list2env
list2env(lst1, .GlobalEnv)
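For a quick check of the mget approach, the sketch below builds two small stand-in data frames. The interviewtime values are invented for illustration, and environment() is used so the lookup happens wherever the code is run.

```r
# Invented sample data: interviewtime stored as character seconds
Ech_repondants_nom_1 <- data.frame(interviewtime = c("120", "3600"),
                                   stringsAsFactors = FALSE)
Ech_repondants_nom_2 <- data.frame(interviewtime = c("90", "7200"),
                                   stringsAsFactors = FALSE)

nms <- paste0("Ech_repondants_nom_", 1:2)

# mget() returns a named list of the objects with these names
lst1 <- lapply(mget(nms, envir = environment()), function(dat) {
  within(dat, {
    sec   <- as.numeric(interviewtime)
    min   <- round(sec / 60, 1)
    heure <- round(min / 60, 1)
  })
})

# Optional: write the modified copies back over the original objects
list2env(lst1, envir = environment())
```

Keeping lst1 as a list is usually more convenient for later steps, since the same lapply pattern can be reused on it.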

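Ech_repondants_nom_[[i]]
This isn't actually selecting that data frame, because you can't refer to objects like that. Try creating a function that takes a data frame as an argument and returns the modified copy, then iterate through the data frames by name:
changing_time_stamp <- function(df) {
  df$sec <- as.numeric(df$interviewtime)
  df$min <- round(df$sec / 60, 1)
  df$heure <- round(df$min / 60, 1)
  df  # return the modified data frame
}
for (i in 1:5) {
  nm <- paste0("Ech_repondants_nom_", i)
  # fetch the object by name, transform it, and store it back
  assign(nm, changing_time_stamp(get(nm)))
}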
EDIT: I fixed some of the variable names in the function

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.

I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (a plain <- inside the function passed to lapply only changes a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print its return value to the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
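On the first part of the question: get() looks up an object by its name, and it does not parse or evaluate an expression such as "a[[1]]$DIM", which is why the lookup fails. There is usually no need to build such strings, because lapply can work on the list elements directly. A minimal sketch, with made-up data and mean() standing in for "some function on the DIMs":

```r
# Made-up list of data frames that each have a DIM column
a <- list(data.frame(DIM = c(1, 2, 3)),
          data.frame(DIM = c(10, 20)))

# Pull the DIM column out of every data frame
dims <- lapply(a, function(d) d$DIM)

# Or apply a function to each DIM column in one step
results <- lapply(a, function(d) mean(d$DIM))
```

If a plain vector is preferred over a list, vapply(a, function(d) mean(d$DIM), numeric(1)) returns one number per data frame.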

Append in a for loop

I am trying to build a list like the one below, with values from different data frames called kc2 to kc10. Can anyone give me some advice on how to write this as a for loop?
sum_square=append(sum_square,weighted.mean(x=kc2$withinss,w=kc2$size, na.rm=TRUE))
I tried something like this, but it didn't work:
for (i in 2:10){
nam1 = paste0("kc",i,"$withinss")
nam2 = paste0("kc",i,"$size")
sum_square = append(sum_square, lapply(c(as.numeric(nam1),as.numeric(nam2)), weighted.mean))
}

There are a lot of problems with the code you posted, so I'll just cut right to the point. In R, when you want to apply a function to multiple objects and collect the result, you should be thinking of using lapply. lapply loops through a list of objects (you can put your data frames into a list), applies the chosen function to each, and then returns the result of each as a list. The below code is in the form of what you want:
# Add data frames to list by name
list_of_data_frames <- list(kc2, kc3, kc4, kc5, kc6, kc7, kc8, kc9, kc10)
# OR add them programmatically
list_of_data_frames <- mget(paste0('kc', seq.int(from = 2, to = 10)))
result <- lapply(list_of_data_frames,
                 function(x) weighted.mean(x = x$withinss, w = x$size, na.rm = TRUE))
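A runnable sketch of the same idea follows. It assumes, as a guess from the withinss and size components, that kc2 to kc10 are kmeans() fits; the iris data and the range 2:4 are used only to keep the example small. vapply is used instead of lapply so that the result is a numeric vector rather than a list.

```r
set.seed(42)
x <- as.matrix(iris[, 1:4])

# Assumed setup: one kmeans fit per number of clusters, named kc2, kc3, kc4
fits <- lapply(2:4, function(k) kmeans(x, centers = k))
names(fits) <- paste0("kc", 2:4)

# Size-weighted mean of the within-cluster sums of squares, one value per fit
sum_square <- vapply(
  fits,
  function(fit) weighted.mean(fit$withinss, w = fit$size, na.rm = TRUE),
  numeric(1)
)
```

Because the fits are created inside a list from the start, there is no need for separate kc2, kc3, ... objects or for mget.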

How to index data frame column by a variable?

As an example, I want a function that will iterate over the columns in a dataframe and print out each column's data type (e.g., "numeric", "integer", "character", etc)
Without a variable I know I can do class(df$MyColumn) and get the data type. How can I change it so "MyColumn" is a variable?
What I'm trying is
f <- function(df) {
for(column in names(df)) {
columnClass = class(df[column])
print(columnClass)
}
}
But this just prints out [1] "data.frame" for each column.

Since a data frame is simply a list, you can loop over the columns using lapply and apply the class function to each column:
lapply(df, class)
To address the previously unspoken concerns in User's comment: if you build a function that does whatever you want to do to a column, then this will succeed:
func <- function(col) {print(class(col))}
lapply(df, func)
It's really mostly equivalent to:
for(col in names(df) ) { print(class(df[[col]]))}
And there would not be an unneeded 'colClass' variable cluttering up the .GlobalEnv.
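The reason the original loop printed "data.frame" is that single brackets return a one-column data frame, while double brackets return the column itself. A small sketch with a made-up data frame:

```r
# Made-up data frame with three column types
df <- data.frame(n = c(1.5, 2.5), i = 1:2, s = c("a", "b"),
                 stringsAsFactors = FALSE)

class(df["n"])    # single brackets: a one-column data.frame
class(df[["n"]])  # double brackets: the underlying numeric vector

# Index by a variable holding the column name
column_classes <- vapply(names(df),
                         function(column) class(df[[column]])[1],
                         character(1))
```

Taking the first element of class() keeps the result to one string per column, which matters for columns with several classes, such as date-times.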

Use a comma before column:
for(column in names(df)) {
columnClass = class(df[,column])
print(columnClass)
}

Much as DWin suggested
apply(df,2,class)
but you say you want to do more with each column?
What do you want to do? Try to avoid abstract examples.
In case it helps
apply(df,2,mean)
apply(df,2,sd)
or something more complicated
apply(df,2,function(x){s = c(summary(x)["Mean"], summary(x)["Median"], sd(x))})
Note that the summary function gives you most of this functionality anyway, but this is just an example. Any function can be placed inside apply and iterated over the columns of a matrix or data frame. That function can be as complex or as simple as you need it to be.
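One caveat when using apply on a data frame: apply first converts its input to a matrix, so a data frame with mixed column types becomes a character matrix and every column reports the same class. sapply or lapply work on the columns as they are. A short sketch with made-up data:

```r
# Made-up data frame mixing numeric and character columns
df <- data.frame(n = c(1.5, 2.5), s = c("a", "b"),
                 stringsAsFactors = FALSE)

# apply() coerces df to a character matrix before calling class()
via_apply <- apply(df, 2, class)

# sapply() visits each column with its original type
via_sapply <- sapply(df, class)
```

For all-numeric data frames, apply(df, 2, mean) and sapply(df, mean) give the same values, so the difference only shows up with mixed types.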

You can use the colwise function of the plyr package to transform any function into a column wise function. This is a wrapper for lapply.
library(plyr)
colwise.print.class<-colwise(.fun=function(col) {print(class(col))})
colwise.print.class(df)
You can view the function created with
print(colwise.print.class)
