How to use lapply in R

I am trying to use lapply to apply a custom function to every element of a vector in R. I am trying to avoid using a for loop here.
This is what I have so far:
writeToHDFS <- function(fileName){
print(fileName)
hdfs.init()
modelfile <- hdfs.file(fileName, "w")
hdfs.write(gaDataUser, modelfile)
hdfs.close(modelfile)
}
fileNames <- c("gaDataUser", "gaDataSession", "gaDataSources","gaDataEventTracking", "gaDataEcommerce", "gaDataEcommerce2", "gaDataEcommerce3", "gaDataEcommerce4")
lapply(fileNames,writeToHDFS(x))
I have variables with the names mentioned in the character vector fileNames.
What I need to know:
1. How to pass each string from the vector fileNames to the function writeToHDFS, since I would like this function to be executed for every element of the vector.
2. How to use a string to access the variable of that name inside the function.
For example, at this line I have to access the variable whose name matches the string passed as fileName:
hdfs.write(variableWithData, modelfile)
3. Can I pass the variable fileName to
modelfile <- hdfs.file(fileName, "w")
instead of passing a literal string for the file name?

I am trying to use lapply so that I can apply a custom function on all elements of a vector in R
In this situation, you can also use tapply:
tapply(fileNames, 1:length(fileNames), writeToHDFS)
lapply is short for "list-apply" and always returns a list, but its input does not have to be a list: an atomic vector such as fileNames is coerced with as.list, so lapply works here as well.
Since you are already set on using lapply, ?lapply is worth reading.

writeToHDFS(x) calls the function immediately and hands lapply its return value (and x is not defined at that point).
You want to pass the function itself instead:
lapply(fileNames, writeToHDFS)
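To illustrate both points (passing the function itself, and looking a variable up by its name), here is a minimal sketch. It assumes the data objects live in the environment where the function is defined; writeToStore is a hypothetical stand-in for the hdfs.* calls, which are not reproduced here.

```r
# Stand-in data; in the question these would be the real gaData* objects.
gaDataUser    <- data.frame(id = 1:3)
gaDataSession <- data.frame(id = 4:5)

# Hypothetical replacement for the hdfs.* calls: it only reports what would be written.
writeToStore <- function(fileName) {
  # get() looks up the object whose name is stored in the string fileName,
  # searching the function's frame first and then the enclosing environments.
  obj <- get(fileName)
  sprintf("%s: %d rows", fileName, nrow(obj))
}

fileNames <- c("gaDataUser", "gaDataSession")

# Pass the function itself, not the result of calling it.
results <- lapply(fileNames, writeToStore)
```

In the original function, the value returned by get(fileName) could be passed to hdfs.write in place of the hard-coded gaDataUser. Since fileName already holds a character string inside the function, passing it to hdfs.file is equivalent to passing a literal string.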

Related

R custom function which takes an integer i, then returns an integer-appended variable name

How do I create a custom function which takes an integer input, and uses that integer input as a parameter to another function inside it, and returns a variable which appends the integer to its name?
For example:
f <- function(i) {
var_i<-another.Function(parameter="ni")
}
This should return the variable var_1 if I pass i=1.
My goal is to implement parallel processing using this function, to which I would pass a range of integers, i.e. mclapply(1:13, f)
I initially used eval(parse(text = ...)), but I have read that it is not the recommended method.
A quick way to do this for a range of integers is to combine assign() and paste() (both in base R); assign() creates the variables:
for (i in 1:10) {  # or whatever range you want to use
  # assign(name of variable, value of variable); the value here is 2 * i, but use whatever function you need
  assign(paste("var", i, sep = "_"), i * 2)
}
This does not need to be a function, because assign() creates the variables in the environment it is called from. If you use it inside a function, the variables are local to that call, so you will need to return the values (or pass envir = globalenv() to assign) to make them available in the global environment.
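An alternative that avoids creating variables by name is to return the values and name the resulting list. This may suit mclapply better, because assign() calls made inside forked workers are not visible in the parent session. A minimal sketch, with another_function as a hypothetical stand-in for the real computation; lapply is used so the example runs anywhere, and parallel::mclapply takes the same first two arguments:

```r
# Hypothetical stand-in for the real computation.
another_function <- function(parameter) parameter * 2

ids <- 1:3

# Each call returns its value; nothing is assigned by name inside the function.
results <- lapply(ids, function(i) another_function(parameter = i))

# Attach the integer-appended names afterwards.
names(results) <- paste("var", ids, sep = "_")

second <- results$var_2
```

The values are then available as results$var_1, results$var_2 and so on, or as results[["var_1"]] when the name is built from a string.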

How do I pass a function argument into map_df()?

I am trying to create a function to clean data and return as a data.frame in R.
I'm using the map_df() function to return the cleaned data as a data.frame, and have a function written to clean the data.
The first thing I do is pull a list of files from a folder, then iterate through them and clean each file. I have a pre-defined set specifying which column names to pull (stored in selectCols) in case of variation between files:
files <- list.files(filepath,full.names=F)
colInd <- which(names(fread(files[i],nrows=0)) %in% gsub("_","",selectCols))
I also have a function to clean my data, which uses fread() to read in the .csv files. It takes colInd and i as arguments to clean files iteratively.
cleanData <- function(files,i,colInd) {
addData <- fread(files[i],select=c(colInd))
[...]
}
Overall it looks like this (as a recursive function):
i <- 1
files <- list.files(filepath,full.names=F)
iterateCleaning <- function(files,i) {
colInd <- which(names(fread(files[i],nrows=0)) %in% gsub("_","",selectCols))
if (length(colInd)==length(selectCols)) {
newData <- map_df(files,cleanData)
saveToFolder(newData,i,files)
}
else {}
i=i+1
if (i<=length(files)){
iterateCleaning(files,i)
}
else {}
}
When I try to run without specifying the arguments for my function I get this error:
Error in fread(files,select=c(colInd)):
argument "colInd" is missing, with no default.
When I insert it into my map_df() I do it like so:
newData <- map_df(files,i,colInd,cleanData)
Then I get this error:
Error in as_mapper(.f,...): object 'colInd' not found.
Any suggestions for resolving this error? As I understand it, map_df() applies to each element in the function, but I don't need it applied to the i and colInd inputs, I just need them for the function I am calling in map_df(). How can I call map_df() on a function that requires additional arguments?
I read the documentation but it seemed a bit confusing. It says to use "." for a single-argument function and .x and .y for two-argument functions, but I'm not sure what that means. My initial guess was something like the following, but neither line works:
newData <- map_df(files,cleanData,.i,.colInd)
newData <- map_df(files,cleanData,.x=i,.y=colInd)
Any recommendations? Will I have the same output if I just call map_df() afterwards on the output of my function?
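One general pattern for this, sketched in base R so it runs without extra packages: arguments placed after the function in lapply are forwarded to every call, and an anonymous function does the same thing explicitly. As far as I know, map_df() forwards extra named arguments through its ... in the same way. clean_one below is a hypothetical stand-in for cleanData, and the file names are made up; no files are actually read.

```r
# Hypothetical stand-in for cleanData(): one item plus an extra argument.
clean_one <- function(item, colInd) {
  data.frame(item = item, ncols = length(colInd), stringsAsFactors = FALSE)
}

items  <- c("a.csv", "b.csv")  # made-up file names
colInd <- c(1L, 3L)

# Extra named arguments after the function are passed on to each call.
pieces <- lapply(items, clean_one, colInd = colInd)

# The same thing with an explicit anonymous function.
pieces2 <- lapply(items, function(f) clean_one(f, colInd = colInd))

# Bind the per-item data frames into one, similar in spirit to map_df().
combined <- do.call(rbind, pieces)
```

Only the first argument varies per element; colInd is computed once and shared by all calls.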

R only specify optional parameters if specified

I have an R function with optional parameters like so:
myFunc <- function(
requiredParam,
optionalParam1 = optionalValue1,
optionalParam2 = optionalValue2,
...
optionalParamN = optionalValueN) {
# implementation
}
I have another function which calls this function and has the necessary parameters stored in a dataframe:
optionalParam1 optionalParam3 optionalParam10
1 "val1" "val2" "val3"
I only want to pass the optional parameters specified in the dataframe. For the others, I want it to use the default values. How can I accomplish this without typing up all permutations of optionalParameters existing/not existing?
Call the function using do.call. Not knowing what your data frame is called, I will assume the parameters are stored in a list-like object called myParams:
do.call(myFunc, as.list(myParams))
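A minimal sketch of how this plays out, assuming a one-row data frame of parameters; my_func, its defaults and the values are made up for illustration. The required argument is added to the list before the call:

```r
# Hypothetical function with one required and three optional parameters.
my_func <- function(required, opt1 = "d1", opt2 = "d2", opt3 = "d3") {
  paste(required, opt1, opt2, opt3, sep = "|")
}

# Only some of the optional parameters are specified.
params <- data.frame(opt1 = "val1", opt3 = "val3", stringsAsFactors = FALSE)

# as.list() turns the columns into named list elements; unspecified
# parameters are simply absent, so their defaults apply.
args <- c(list(required = "req"), as.list(params))

result <- do.call(my_func, args)
```

Here result is "req|val1|d2|val3": opt2 keeps its default because it was not in the data frame.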
You can also build the function call as a string from the data frame's column names using paste.
Then evaluate it with eval(parse(text="your string")), though do.call avoids having to construct code as text.

Get object name after passed to function, and used in for loop, in R

This question has been asked before: "In R, how to get an object's name after it is sent to a function?"
However, the approach given there doesn't work inside a for loop. For example, the following function is meant to write multiple data frames to a PostgreSQL database:
write_multiple_to_postgres <- function(list_of_frames) {
for(i in 1:length(list_of_frames)) {
object_name <- deparse(substitute(list_of_frames[[i]]))
write_to_postgresql(object_name, list_of_frames[[i]])
}
}
Where the list_of_frames looks like this:
my_list <- list(data_frame_susan, data_frame_bobby, data_frame_melissa)
....and is called as:
write_multiple_to_postgres(my_list)
I want the object_name to be passed as a string to the write_to_postgresql method. But instead I get the following outputs for object_name
my_list[[1L]],
my_list[[2L]],
my_list[[3L]]
Where what I want is:
data_frame_susan,
data_frame_bobby,
data_frame_melissa
How can I use "deparse(substitute) trick" or some other method to get the object name after being passed into a function and used in a for loop?
Given how you define my_list, there is no way to get the data frame names back, because list(data_frame_susan, ...) stores only the values. Use a named list instead, such as my_list <- list(data_frame_susan = data_frame_susan, ...), and then loop over names(my_list):
for (df in names(my_list)){
write_to_postgresql(df, my_list[[df]])
}
The remaining question is how to build my_list with the corresponding names, but from what you have said we don't know where it comes from or how those data frames are populated.
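If the data frames already exist as variables and their names are known as strings, one possible way to build the named list is mget(), which returns a list named after the objects it looks up. A minimal sketch with made-up data; the loop body only records what would be written, standing in for write_to_postgresql:

```r
# Made-up data frames standing in for the real ones.
data_frame_susan <- data.frame(x = 1)
data_frame_bobby <- data.frame(x = 2)

frame_names <- c("data_frame_susan", "data_frame_bobby")

# mget() looks up each name and returns a list carrying those names.
my_list <- mget(frame_names, envir = environment())

# Loop over the names; each name doubles as the string to pass on.
written <- character(0)
for (df_name in names(my_list)) {
  written <- c(written, sprintf("%s (%d rows)", df_name, nrow(my_list[[df_name]])))
}
```

Inside the loop, df_name is the string the question wanted for object_name, and my_list[[df_name]] is the matching data frame.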

Writing a user-function to return column position, column name, mode and class for every variable

I need to write a user-defined function that, when applied to a data frame, will return the column position, the column name, the mode, and the class for each variable. I am able to create one that returns mode and class, but I keep getting errors when I include the position/name. I have been doing this,
myFunction <- function(x) {
data.frame(mode(x), class(x))
}
data.frame(names(myData), myFunction(myData))
and it returns the correct info, but it doesn't combine it into a single function I need. Any advice?
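You can combine it as follows. Note that mode and class have to be applied to each column with sapply; called on the data frame itself they return "list" and "data.frame":
myFunction <- function(x)
  data.frame(cpos = seq_along(x), cname = names(x),
             mode = sapply(x, mode),
             class = sapply(x, function(col) class(col)[1]))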
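As a worked example of the per-column idea, here is a self-contained sketch on a small made-up data frame. describe_columns is a hypothetical name, and joining multiple classes with "/" is just one possible choice:

```r
# Returns one row per column: position, name, mode and class.
describe_columns <- function(df) {
  data.frame(
    position = seq_along(df),
    name     = names(df),
    mode     = vapply(df, mode, character(1)),
    # A column can have several classes; join them into one string.
    class    = vapply(df, function(col) paste(class(col), collapse = "/"),
                      character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up sample data with a numeric, a character and a factor column.
sample_df <- data.frame(n = c(1.5, 2), s = c("a", "b"),
                        f = factor(c("u", "v")),
                        stringsAsFactors = FALSE)

info <- describe_columns(sample_df)
```

For the factor column f, the mode is "numeric" while the class is "factor", which shows why the two are reported separately.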
