I am trying to use lapply so that I can apply a custom function on all elements of a vector in R. I am trying to avoid using a for loop here.
This is what I have so far:
writeToHDFS <- function(fileName) {
  print(fileName)
  hdfs.init()
  modelfile <- hdfs.file(fileName, "w")
  hdfs.write(gaDataUser, modelfile)
  hdfs.close(modelfile)
}
fileNames <- c("gaDataUser", "gaDataSession", "gaDataSources","gaDataEventTracking", "gaDataEcommerce", "gaDataEcommerce2", "gaDataEcommerce3", "gaDataEcommerce4")
lapply(fileNames,writeToHDFS(x))
I have variables with the names mentioned in the character vector fileNames.
What I need to know:
1. How to pass each string from the vector fileNames to the function writeToHDFS, since I would like this function to be executed for every element in the vector.
2. How to use a string to access the variable of that name inside the function. For example, at this line I have to access the variable whose name is the string passed as fileName:
hdfs.write(variableWithData, modelfile)
3. Can I pass the variable fileName to
modelfile <- hdfs.file(fileName, "w")
instead of passing a literal string for the file name?
I am trying to use lapply so that I can apply a custom function on all elements of a vector in R
In this situation, you should use tapply:
tapply(fileNames, 1:length(fileNames), writeToHDFS)
lapply is short for "list-apply", and fileNames is a character vector, not a list. Note, however, that lapply also accepts atomic vectors and always returns a list.
Since you are already pretty well-aimed at using lapply, you can learn from ?lapply.
writeToHDFS(x) calls the function and gives you its return value.
But you want to pass the function itself, so:
lapply(fileNames,writeToHDFS)
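The remaining two points of the question (looking up a variable by its name, and passing fileName itself) can be sketched as follows. This is only an illustration under stated assumptions: writeDemo and outDir are hypothetical names, and saveRDS to a temporary directory stands in for the hdfs.* calls, which are not reproduced here. The relevant parts are get(), which returns the object whose name is the given string, and the fact that fileName is already a variable holding a string, so it can be passed on directly.

```r
# Small stand-ins for the real data objects named in fileNames.
gaDataUser <- data.frame(users = c(10, 12))
gaDataSession <- data.frame(sessions = c(3, 4))

# Hypothetical output location; a temporary directory replaces HDFS here.
outDir <- tempfile("hdfs_demo_")
dir.create(outDir)

# Hypothetical writer: saveRDS stands in for hdfs.file/hdfs.write/hdfs.close.
writeDemo <- function(fileName) {
  dataObj <- get(fileName)  # look up the object whose name is stored in fileName
  target <- file.path(outDir, paste0(fileName, ".rds"))
  saveRDS(dataObj, target)  # fileName is used directly to build the target name
  target
}

fileNames <- c("gaDataUser", "gaDataSession")

# Pass the function itself; lapply calls it once per element of fileNames.
written <- lapply(fileNames, writeDemo)
```

In the original function, the same idea would mean replacing the hard-coded gaDataUser with get(fileName).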
How do I create a custom function which takes an integer input, and uses that integer input as a parameter to another function inside it, and returns a variable which appends the integer to its name?
For example:
f <- function(i) {
var_i<-another.Function(parameter="ni")
}
This should return the variable var_1 if I pass i=1.
My goal is to implement parallel processing using this function, to which I would pass a range of integers, i.e., mclapply(1:13, f).
I initially used eval(parse(text = ...)); however, I have read that it is not the recommended method.
A quick way to do this for a range of integers is to combine assign and paste (both in base R): paste builds the variable name and assign creates the variable.
for (i in 1:10) {  # or whatever range you want to use
  # assign(name of variable, value of variable); the value here is 2 * i, but use whatever function you need
  assign(paste("var", i, sep = "_"), i * 2)
}
This does not need to be a function, because at the top level assign creates the variables directly in your environment. If you use it inside a function, the variables are created in that function's environment, so you will need to return them (or pass an envir argument to assign) to make them available in the global environment.
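For the parallel case mentioned in the question, one alternative is to return the values and attach the names afterwards, since assignments made inside forked mclapply workers do not reach the parent session. The following is a minimal sketch, assuming a stand-in function f that just doubles its input; lapply is used so that it runs anywhere, and parallel::mclapply can be substituted for it on Unix-alikes.

```r
# Stand-in for the real computation; doubling is only a placeholder.
f <- function(i) {
  i * 2
}

idx <- 1:3

# Compute all results first, then name them var_1, var_2, ...
results <- setNames(lapply(idx, f), paste("var", idx, sep = "_"))

# Individual results are then available by name.
second <- results$var_2
```

Keeping the results in one named list also avoids filling the environment with many separate variables.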
I am trying to create a function to clean data and return as a data.frame in R.
I'm using the map_df() function to return the cleaned data as a data.frame, and have a function written to clean the data.
The first thing I do is pull a list of files from a folder, then iterate through them and clean each file. I have a pre-defined set specifying which column names to pull (stored in selectCols) in case of variation between files:
files <- list.files(filepath,full.names=F)
colInd <- which(names(fread(files[i],nrows=0)) %in% gsub("_","",selectCols))
I also have a function to clean my data, which uses fread() to read in the .csv files. It takes colInd and i as arguments to clean files iteratively.
cleanData <- function(files,i,colInd) {
addData <- fread(files[i],select=c(colInd))
[...]
}
Overall it looks like this (as a recursive function):
i <- 1
files <- list.files(filepath,full.names=F)
iterateCleaning <- function(files,i) {
  # column positions in file i whose names match the wanted columns
  colInd <- which(names(fread(files[i],nrows=0)) %in% gsub("_","",selectCols))
  if (length(colInd)==length(selectCols)) {
    newData <- map_df(files,cleanData)
    saveToFolder(newData,i,files)
  }
  else {}
  i <- i+1
  # recurse while there are files left
  if (i<=length(files)){
    iterateCleaning(files,i)
  }
  else {}
}
When I try to run without specifying the arguments for my function I get this error:
Error in fread(files,select=c(colInd)):
argument "colInd" is missing, with no default.
When I insert it into my map_df() I do it like so:
newData <- map_df(files,i,colInd,cleanData)
Then I get this error:
Error in as_mapper(.f,...): object 'colInd' not found.
Any suggestions for resolving this error? As I understand it, map_df() applies the function to each element of its first argument, but I don't need it applied to the i and colInd inputs; I just need them for the function I am calling in map_df(). How can I call map_df() on a function that requires additional arguments?
I read the documentation but found it a bit confusing. It says to use "." for a single-argument function and .x and .y for two-argument functions, but I'm not sure what that means. My initial guess is something like these, but neither line works:
newData <- map_df(files,cleanData,.i,.colInd)
newData <- map_df(files,cleanData,.x=i,.y=colInd)
Any recommendations? Will I have the same output if I just call map_df() afterwards on the output of my function?
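One way to think about the extra arguments: in the apply-style functions, anything supplied by name after the function is forwarded unchanged to every call, while only the first argument is iterated over. The sketch below shows this with base R's lapply so that it runs without extra packages; purrr's map functions forward additional named arguments in the same way. The helper cleanOne, the argument keepCols, and the temporary CSV files are all hypothetical stand-ins, and the helper takes a single file path rather than files and i.

```r
# Hypothetical helper: reads one file and keeps only the wanted columns.
cleanOne <- function(file, keepCols) {
  dat <- read.csv(file, stringsAsFactors = FALSE)
  dat[, intersect(keepCols, names(dat)), drop = FALSE]
}

# Two small CSV files in a temporary directory, standing in for real data.
tmpDir <- tempfile("clean_demo_")
dir.create(tmpDir)
write.csv(data.frame(id = 1:2, score = c(10, 20), junk = "x"),
          file.path(tmpDir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40), junk = "y"),
          file.path(tmpDir, "b.csv"), row.names = FALSE)
demoFiles <- list.files(tmpDir, full.names = TRUE)

# keepCols is given once by name and passed along to every call of cleanOne.
pieces <- lapply(demoFiles, cleanOne, keepCols = c("id", "score"))
newData <- do.call(rbind, pieces)

# With purrr and dplyr installed, the same forwarding would read:
# newData <- purrr::map_df(demoFiles, cleanOne, keepCols = c("id", "score"))
```

Because each element of demoFiles is already a single path, the index i is no longer needed inside the helper.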
I have an R function with optional parameters like so:
myFunc <- function(
requiredParam,
optionalParam1 = optionalValue1,
optionalParam2 = optionalValue2,
...
optionalParamN = optionalValueN) {
# implementation
}
I have another function which calls this function and has the necessary parameters stored in a dataframe:
optionalParam1 optionalParam3 optionalParam10
1 "val1" "val2" "val3"
I only want to pass the optional parameters specified in the dataframe. For the others, I want it to use the default values. How can I accomplish this without typing up all permutations of optionalParameters existing/not existing?
Call the function using do.call (not knowing what your data.frame is called I will just assume you have a list or something of the parameters called myParams):
do.call(myFunc, as.list(myParams))
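A small self-contained sketch of this approach follows. The function body, the default values, and the value "req" are made up for illustration; the point is that as.list() on a one-row data frame yields a named list, and do.call matches those names to the function's parameters, leaving every parameter that is not named at its default.

```r
# Illustrative function with one required and three optional parameters.
myFunc <- function(requiredParam,
                   optionalParam1 = "d1",
                   optionalParam2 = "d2",
                   optionalParam3 = "d3") {
  paste(requiredParam, optionalParam1, optionalParam2, optionalParam3)
}

# One-row data frame holding only some of the optional parameters.
myParams <- data.frame(optionalParam1 = "val1",
                       optionalParam3 = "val2",
                       stringsAsFactors = FALSE)

# Combine the required argument with the named list built from the data frame.
args <- c(list(requiredParam = "req"), as.list(myParams))
result <- do.call(myFunc, args)
```

Here optionalParam2 is absent from the data frame, so the call falls back to its default value.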
You can also build your function call as a string by parsing your dataframe column names and using paste.
Then, use eval(parse(text="your string"))
A similar question is asked in "In R, how to get an object's name after it is sent to a function?"
However, that approach doesn't work inside a for loop. For example, the following function is meant to write multiple data frames to a PostgreSQL database:
write_multiple_to_postgres <- function(list_of_frames) {
for(i in 1:length(list_of_frames)) {
object_name <- deparse(substitute(list_of_frames[[i]]))
write_to_postgresql(object_name, list_of_frames[[i]])
}
}
Where the list_of_frames looks like this:
my_list <- list(data_frame_susan, data_frame_bobby, data_frame_melissa)
....and is called as:
write_multiple_to_postgres(my_list)
I want object_name to be passed as a string to the write_to_postgresql function. Instead, I get the following values for object_name:
my_list[[1L]],
my_list[[2L]],
my_list[[3L]]
Where what I want is:
data_frame_susan,
data_frame_bobby,
data_frame_melissa
How can I use "deparse(substitute) trick" or some other method to get the object name after being passed into a function and used in a for loop?
The way you define my_list, there is no way to get the data frame names back. You should use a named list, such as my_list=list(data_frame_susan=data_frame_susan...), and then loop over names(my_list):
for (df in names(my_list)){
write_to_postgresql(df, my_list[[df]])
}
The remaining question is how to build my_list with the corresponding names, but from what you say we don't know where it comes from or how those data frames are populated.
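If the data frames already exist as variables and their names are known, one possible way to build the named list is mget(), which returns a list named after the objects it looks up. The sketch below rests on assumptions: the tiny data frames are made up, and write_to_postgresql is replaced by a stand-in that only records the names it receives rather than touching a database.

```r
# Made-up data frames standing in for the real ones.
data_frame_susan <- data.frame(x = 1)
data_frame_bobby <- data.frame(x = 2)
data_frame_melissa <- data.frame(x = 3)

# mget() looks the objects up by name and returns a list carrying those names.
frame_names <- c("data_frame_susan", "data_frame_bobby", "data_frame_melissa")
my_list <- mget(frame_names)

# Stand-in for the real writer: it only records which names it was given.
written <- character(0)
write_to_postgresql <- function(object_name, frame) {
  written <<- c(written, object_name)
  invisible(NULL)
}

# Loop over the names, so each name is available as a plain string.
for (df in names(my_list)) {
  write_to_postgresql(df, my_list[[df]])
}
```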
I need to write a user-defined function that, when applied to a data frame, will return the column position, the column name, the mode, and the class for each variable. I am able to create one that returns mode and class, but I keep getting errors when I include the position/name. I have been doing this,
myFunction <- function(x) {
data.frame(mode(x), class(x))
}
data.frame(names(myData), myFunction(myData))
and it returns the correct info, but it doesn't combine it into a single function I need. Any advice?
You can combine it as follows:
myFunction <- function(x)
data.frame(mode(x), class(x), cname=names(x), cpos=1:ncol(x))
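Note that mode(x) and class(x) applied to a whole data frame describe the data frame itself ("list" and "data.frame"), not its columns. If one row per variable is wanted, a possible sketch applies both functions column by column with vapply. The name describeColumns and the sample myData are hypothetical, and only the first class is reported for columns that have several.

```r
# Hypothetical per-column variant: one row per variable of the data frame.
describeColumns <- function(x) {
  data.frame(
    cpos  = seq_along(x),                       # column position
    cname = names(x),                           # column name
    mode  = vapply(x, mode, character(1)),      # storage mode of each column
    class = vapply(x, function(col) class(col)[1], character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up sample data with three differently typed columns.
myData <- data.frame(id = 1:3,
                     label = c("a", "b", "c"),
                     flag = c(TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

info <- describeColumns(myData)
```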