Progressr to display iteration in pmap in R

I have a rather complex pmap function in R and I would like to simply display what iteration number pmap is currently executing. I think the "progressr" package is likely what I want to use, but all of the documentation and examples I'm finding seem rather cumbersome. I have a dataframe of variables that I feed into my pmap function. The dataframe is called 'crosslist', and the pmap function is below:
library(purrr)
results <- pmap(crosslist, safely(function(variable1, variable2, ..., variable10) {
  # do a lot of calculations inside pmap that are a function of variable1, variable2, ..., variable10
}, otherwise = "NA"))
I think I should use the with_progress function (from library(progressr)) just before the {, but I'm having problems getting it to work. All I would like is for R to display somewhere (likely the console) which iteration the pmap function is currently processing.
To use with_progress, do I simply place it as follows?
results <- pmap(crosslist, safely(function(variable1, variable2, ..., variable10) with_progress({
  # do a lot of calculations inside pmap that are a function of variable1, variable2, ..., variable10
}), otherwise = "NA"))

You can change your code from pmap to map and loop over the index of each row, printing it to the console so you know which row is currently being processed. You can use it as:
library(purrr)
results <- map(seq_len(nrow(crosslist)), safely(function(i) {
  print(i)
  var <- crosslist[i, ]
  # You can access variable1, variable2 via var[['variable1']], var[['variable2']]
}, otherwise = "NA"))
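For the progressr route the question asks about, here is a minimal sketch, assuming the purrr and progressr packages are installed. with_progress() wraps the whole pmap() call rather than the function body, a progressor() with one step per row is created inside it, and the mapped function calls p() once per row. The toy crosslist and the extra .row column (added so each call knows its own iteration number) are hypothetical.

```r
library(purrr)
library(progressr)

# Hypothetical stand-in for the question's 'crosslist' data frame
crosslist <- data.frame(variable1 = 1:5, variable2 = c(2, 4, 6, 8, 10))

# Add a row-index column so each call knows which iteration it is on
crosslist_idx <- cbind(crosslist, .row = seq_len(nrow(crosslist)))

# with_progress() wraps the entire pmap() call; the progressor is created inside it
with_progress({
  p <- progressor(steps = nrow(crosslist_idx))
  results <- pmap(crosslist_idx, safely(function(variable1, variable2, .row) {
    # Signal one step of progress, labelled with the current row number
    p(sprintf("row %d of %d", .row, nrow(crosslist_idx)))
    variable1 * variable2  # placeholder for the real calculations
  }, otherwise = NA))
})
```

If upgrading is an option, purrr 1.0.0 and later also accept a .progress argument (for example pmap(crosslist, f, .progress = TRUE), where f is the mapped function), which shows a simple built-in progress bar without progressr.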

Related

R function to reverse a survey item produces NULL

I'm still new to writing my own functions. As an exercise, and because I use it a lot, I want to write a flexible function to easily reverse survey response scales. This is what I came up with:
rev_scale = function(var, new_var, scale){
  for (i in 1:length(abs(var))){
    new_var[i] = scale-abs(var[i])+1
  }
}
Info on code
var = variable I want to reverse.
new_var = new column with the reversed variable
scale = how many points in the scale (eg. 5 for a 5-point scale)
The reason I use 'abs' instead of just 'var' is that some data frames also return value labels, and I only want the values in this function.
Question
When I apply this new function to a variable, R returns "NULL". However, if I run the for loop separately, with the arguments filled in by hand ('imputed'), my new variable is properly reversed.
Any ideas on what is happening here?
Thanks in advance!
### Example of the (working) for-loop with arguments 'imputed' ###
df <- data.frame(matrix(ncol = 1, nrow = 4))
df$var = c(1,2,3,4)
for (i in 1:length(abs(df$var))){
  df$var_rev[i] = 4-abs(df$var[i])+1
}
df$var_rev
OUTPUT:
[1] 4 3 2 1
R does not use reference variables (think pointers)*. So your new_var outside the function does not get updated when it is referred to inside the function. Instead, R creates a new copy of new_var and updates that copy.
You should instead return the new value from your function. For example:
rev_scale = function(var, scale){
  res <- vector('numeric', length(var))
  for (i in 1:length(abs(var))){
    res[i] = scale-abs(var[i])+1
  }
  return(res)
}
Also note that I have removed new_var from the function's arguments. In other words, I have completely separated the function's input arguments from its output.
The reason you get NULL from the function is that in R, every function returns something. If there is no explicit return(), the function returns the value of the last evaluated expression. In your function that expression is a for loop, and a for loop always evaluates to NULL.
* There are a couple of exceptions and work-arounds, but I will not go into that here.
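A short sketch of both points, using the question's own function (renamed rev_scale_loop here) on toy data; the placeholder column var_rev passed in as new_var is an assumption of the example. The caller's column stays untouched, and the call evaluates to NULL.

```r
# The question's original function, unchanged apart from the name
rev_scale_loop <- function(var, new_var, scale) {
  for (i in 1:length(abs(var))) {
    new_var[i] = scale - abs(var[i]) + 1
  }
}

df <- data.frame(var = c(1, 2, 3, 4))
df$var_rev <- NA_real_  # assumed placeholder column to pass in as new_var

out <- rev_scale_loop(df$var, df$var_rev, 4)

is.null(out)            # TRUE: the last expression is a for loop, which yields NULL
all(is.na(df$var_rev))  # TRUE: only the function's local copy of new_var changed
```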
Edit:
As benimwolfspelz noted, you do not need to iterate explicitly over each element of var, because R's arithmetic operators are vectorised and work on the whole vector at once. Your entire function could be reduced to:
rev_scale = function(var, scale) {
  scale-abs(var)+1
}
Secondly, in your for loop, you can simplify length(abs(var)) to length(var), as abs(var) does not change the length of the vector.
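Applied to the question's example, the vectorised version returns the whole reversed vector in one call, and the result is assigned to the new column at the call site. The second call, on a 5-point scale, is an assumed illustration.

```r
rev_scale <- function(var, scale) {
  scale - abs(var) + 1
}

df <- data.frame(var = c(1, 2, 3, 4))
df$var_rev <- rev_scale(df$var, 4)  # assign the returned vector to the new column
df$var_rev                          # 4 3 2 1

rev_scale(c(1, 5, 3), 5)            # 5 1 3 on a 5-point scale
```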

How to apply global variables in this scenario in R

I have written a function that should make my script more streamlined, but I am having trouble assigning and referring to variables. In this case I want tax_count to behave as if Domain_count had been assigned outside the function. I have tried the solution from "R: How can a function assign a value to a variable that will persist outside the function?" but with no luck.
Here is my function and calling the function:
taxonomic_count <- function(tax_count, tax, col_num, data) {
  tax_count <- aggregate(.~tax, data, sum)
  tax_count$count <- rowSums(select(tax_count, col_num:7))
  tax_count <- tax_count %>%
    select(tax, count)
}
taxonomic_count(Domain_count, Domain, 2, Data1)
Help would be awesome!
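The same idea as in the rev_scale answer applies here: rather than assigning to a global from inside the function, return the result and assign it at the call site. A minimal sketch, assuming the taxonomic column is passed as a string and that the count columns run from col_num to the last column (the original hard-codes 7); it uses base R indexing in place of dplyr::select(), and the toy Data1 is hypothetical. Note also that aggregate(.~tax, ...) in the original looks for a column literally named tax, so here the formula is built from the string.

```r
# Hypothetical stand-in for 'Data1': one taxonomy column plus per-sample counts
Data1 <- data.frame(Domain = c("Bacteria", "Bacteria", "Archaea"),
                    s1 = c(1, 2, 3),
                    s2 = c(4, 5, 6))

taxonomic_count <- function(tax, col_num, data) {
  # Build the formula ". ~ Domain" from the column name passed as a string
  f <- as.formula(paste(". ~", tax))
  tax_count <- aggregate(f, data = data, FUN = sum)
  # Total across the count columns, from col_num to the last column
  tax_count$count <- rowSums(tax_count[, col_num:ncol(tax_count), drop = FALSE])
  # Return the result instead of trying to assign it outside the function
  tax_count[, c(tax, "count")]
}

# Assign at the call site, exactly as if the code were written outside a function
Domain_count <- taxonomic_count("Domain", 2, Data1)
Domain_count
```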

How to subset rows with strings

I want to use a function to repeatedly create subsets with different names.
For example, suppose I have 5 random vectors:
number1<-sample(1:10, 3)
number2<-sample(1:10, 3)
number3<-sample(1:10, 3)
number4<-sample(1:10, 3)
number5<-sample(1:10, 3)
Then I use these vectors to select rows from the raw data set (a data frame):
testset1<-raw[number1,]
testset2<-raw[number2,]
testset3<-raw[number3,]
testset4<-raw[number4,]
testset5<-raw[number5,]
Writing out each command takes a lot of space in my script, so I'm trying to shorten these commands with a function.
However, I find it hard to use a variable inside a function to build part of a name (a 'text argument'). Using a variable as an ordinary argument is easy, for example:
mean_function<-function(x){
  mean(x)
}
But I want to use a function like this:
testset "number with 1-5" <-raw[number"number 1-5",]
I would really appreciate your help.
You don't need to create a function for this task. Simply use lapply() to loop over the list of elements produced by mget(), set the names, and finally copy all the results into the global environment:
rowSelected <-lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
list2env(rowSelected, envir = .GlobalEnv)
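A self-contained run of the three lines above, with a hypothetical raw data frame and the question's five index vectors; afterwards testset1 to testset5 exist in the global environment. Keeping rowSelected as a named list and using, say, rowSelected$testset3 directly is an alternative that avoids creating five separate objects.

```r
set.seed(1)
# Hypothetical stand-in for the question's 'raw' data frame
raw <- data.frame(id = 1:10, letter = letters[1:10])

number1 <- sample(1:10, 3)
number2 <- sample(1:10, 3)
number3 <- sample(1:10, 3)
number4 <- sample(1:10, 3)
number5 <- sample(1:10, 3)

# mget() collects number1 ... number5 into a named list; lapply() subsets raw with each
rowSelected <- lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
# Copy each list element into the global environment as its own object
invisible(list2env(rowSelected, envir = .GlobalEnv))

testset3
```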

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column of each of those data frames, but I keep getting errors when I try methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious (1) what's going on with get(), and (2) whether there's a better solution. I'm restricting myself to the apply family of functions because I want to get out of the for-loop habit, and this name-as-list method seems to be preferred based on answers such as "R- how to dynamically name data frames?", but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place, you are better off using a for loop, since with lapply you would need the <<- superassignment operator (a plain <- inside the function passed to lapply only modifies a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
  aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
  lapply(seq_along(aList), function(x){
    aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
  })
)
You need invisible() because otherwise lapply would print its return value to the console. The <<- assigns the vector runif(...) to the newly created column of the aList that lives outside the function.
If you want to produce a new set of data frames with lapply instead, return the modified copy from the function and assign the result (here to newList):
newList <- lapply(seq_along(aList), function(x){
  aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
  return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# zero-length list
seq_along(list()) # returns integer(0), so a loop over it never runs
1:length(list()) # returns 1 0, so a loop over it runs twice
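On the first part of the question: get() looks up an object by its name, and the string "a[[1]]$DIM" is an R expression, not the name of any object, so the lookup fails. Building strings is not needed at all; lapply() can pull the column out of each data frame directly. A sketch with a hypothetical list a:

```r
# Hypothetical stand-in for the question's list 'a'
a <- list(data.frame(DIM = 1:3), data.frame(DIM = c(10, 20)))

# get() resolves object names only, so this string is reported as not found
msg <- tryCatch(get("a[[1]]$DIM"), error = function(e) conditionMessage(e))
msg

# Extract the DIM column from every data frame in the list
dims <- lapply(a, `[[`, "DIM")

# Then apply any function to each column, e.g. its sum
results <- lapply(dims, sum)
```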

plyr within loop: unexpected behavior

I found an odd issue with plyr when using it inside a loop.
What I want this script to do is run the plyr function repeatedly with different input values (provided by the for loop) and store the results as a list of data.frames.
library(plyr)
k=as.factor(c(rep("a",2), rep("b",2), rep("c",2), rep("d",2), rep("e",2)))
indata=data.frame(k)
outdata<-list()
for (i in 1:10){
  tempdata<-ddply(.data = indata, .variables = .(k), .fun = summarize, i=i)
  outdata[[i]]<-tempdata
  rm(tempdata)
}
outdata
I would expect it to produce a list of data.frames each produced within a single iteration of the loop, and therefore a single value of the loop variable.
What happens instead is that each of the data.frames looks identical, with each row having a sequential value of the loop variable.
Storing the loop variable into a separate one makes it work, but seems like an awkward workaround.
library(plyr)
k=as.factor(c(rep("a",2), rep("b",2), rep("c",2), rep("d",2), rep("e",2)))
indata=data.frame(k)
outdata<-list()
for (i in 1:10){
  z=i
  tempdata<-ddply(.data = indata, .variables = .(k), .fun = summarize, i=i, z=z)
  outdata[[i]]<-tempdata
  rm(tempdata)
}
outdata
Any ideas on what's causing this odd behavior?
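This is a scoping issue. summarize evaluates i inside each piece of the data first and then in the frame of the function that called it; that internal function (I believe within llply) has its own local variable i, which is found before your i. The easiest fix is to use j as the iterator: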
for (j in 1:10) outdata[[j]] <- ddply(.data = indata, .variables = .(k), .fun = summarize, i = j)
However, I have no idea why you use ddply in your example. It doesn't seem necessary, so I assume it's only a toy example.
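The lookup rule behind this can be reproduced without plyr. The sketch below uses two hypothetical helpers: toy_summarise() evaluates an expression in the data and then in its caller's frame (the same eval(..., data, parent.frame()) pattern), and toy_apply() is a stand-in looper that happens to use i internally. It only mimics the mechanism described above; it is not plyr's code.

```r
# toy_summarise(): evaluates an unevaluated expression in the data first,
# then in the frame of whichever function called it
toy_summarise <- function(.data, expr) {
  eval(substitute(expr), .data, parent.frame())
}

# toy_apply(): a hypothetical stand-in for an internal looper that
# happens to use i as its own loop variable
toy_apply <- function(groups, fun, ...) {
  out <- vector("list", length(groups))
  for (i in seq_along(groups)) {
    out[[i]] <- fun(groups[[i]], ...)
  }
  out
}

groups <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))

i <- 7
captured <- unlist(toy_apply(groups, toy_summarise, expr = i))
captured  # 1 2 3: the helper's own i shadows the global i

j <- 7
fixed <- unlist(toy_apply(groups, toy_summarise, expr = j))
fixed     # 7 7 7: the helper has no j, so lookup falls through to the global j
```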
