How to apply global variables in this scenario in R

I have written a function that should make my script more streamlined, but I am having trouble assigning and calling variables. In this case I want tax_count to behave as if Domain_count had been created outside the function. I have tried the solution from "R: How can a function assign a value to a variable that will persist outside the function?", but with no luck.
Here is my function and the call to it:
taxonomic_count <- function(tax_count, tax, col_num, data) {
tax_count <- aggregate(.~tax, data, sum)
tax_count$count <- rowSums(select(tax_count, col_num:7))
tax_count <- tax_count %>%
select(tax, count)
}
taxonomic_count(Domain_count, Domain, 2, Data1)
Help would be awesome!
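A common way to get the result out of the function is to return the data frame and assign it at the call site, passing the grouping column's name as a string so it can be used inside aggregate(). This is a minimal sketch under stated assumptions: the toy Data1 is hypothetical, it assumes one grouping column plus numeric count columns, ncol() replaces the hard-coded 7, and base R indexing replaces dplyr's select().

```r
# Hypothetical toy data: one grouping column plus numeric count columns
Data1 <- data.frame(
  Domain = c("Bacteria", "Archaea", "Bacteria"),
  s1 = c(1, 2, 3),
  s2 = c(4, 5, 6)
)

# 'tax' is the grouping column's name as a string, e.g. "Domain"
taxonomic_count <- function(tax, col_num, data) {
  # Build the formula ". ~ Domain" from the string
  fml <- stats::as.formula(paste(". ~", tax))
  counts <- aggregate(fml, data = data, FUN = sum)
  # Sum the count columns from col_num to the last column (assumed numeric)
  counts$count <- rowSums(counts[, col_num:ncol(counts), drop = FALSE])
  # The last expression is the function's return value
  counts[, c(tax, "count")]
}

# Assign the returned data frame at the call site
Domain_count <- taxonomic_count("Domain", 2, Data1)
Domain_count
```

Because the result is assigned outside the function, no global assignment (<<- or assign()) is needed, and the same function can produce Phylum_count, Class_count, and so on.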

Related

Use weighted.mean in summary_rows GT package

I've been searching for a way to use weighted.mean with summary_rows in the gt package.
The summary_rows function only accepts functions of the form foo(x), so functions with more arguments, such as weighted.mean(x, w), are not accepted.
When using summary_rows with groups, such as:
summary_rows(groups = T, columns = c, fns = list("average" = ~mean(.)),...)
It takes the vector of values for each group and runs it through mean() (or, more generally, through each function in the list).
My solution to this is quite cumbersome. I wrote my own custom function that takes the vector of values provided by summary_rows and compares it to the expected vectors using if statements. This only works for one column at a time, so it takes a lot of code, both in the custom functions and in the code for the gt table.
weighted_mean_age <- function (x) {
  if (all(x == some.data$age.column[some.data$group.column == "group name"])) {
    weighted.mean(x, some.data$no.occurences[some.data$group.column == "group name"])
  } else if (...) {  # another vector
    # and so on for every group
  }
}
Has anyone dealt with the same problem and come up with a less cumbersome solution? Did I miss something in the gt package?
Thank you for your time and ideas.
First I need to clarify the assumption that I used for this answer:
What you want is to pass something like weighted.mean(.,w) to this summary_rows() function.
However, this isn't possible because of the limitation of summary_rows() that you outlined in your question. If that is the case, then I believe I have a solution:
I've done similar 'hacks' when writing some very specific Python scripts. The idea was to store the functions I wanted to use in a container and look them up from there. I checked whether the same is possible in R, and it is: you can use factory functions and store the functions they produce in a container. Here is a step-by-step guide:
You first need to create a factory function for your weighted.mean as such:
my_mean <- function(w) { force(w); function(x) { weighted.mean(x, w) } }  # force(w) fixes each closure's weights at creation time
Then populate some kind of container with your new functions (I am using a named list):
func_list <- list()
func_list[["some_weight"]] <- my_mean(some_weight)
func_list[["different_w"]] <- my_mean(different_w)
#etc...
Once you've done that, you should be able to pass one of these functions to summary_rows, e.g.:
summary_rows(
groups = T,
columns = c,
fns = list("w_mean" = ~func_list[[w]](.)),
...)
Bear in mind that you have to supply the w values yourself, for example with a mapping function or a loop.
Hope it is what you are looking for and I hope it helps!
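The factory-and-list pattern can be checked on its own, outside gt. This is a minimal sketch: the group names and weights are hypothetical, and it does not show how summary_rows() would pick the right weights for each group, which still has to be handled separately.

```r
# Factory: returns a one-argument function with the weights captured in a closure
my_mean <- function(w) {
  force(w)  # evaluate w now so each closure keeps its own weights
  function(x) weighted.mean(x, w)
}

# Hypothetical weights for two groups
weights <- list(group_a = c(1, 1, 2), group_b = c(3, 1))

# One closure per group, stored in a named list (lapply keeps the names)
func_list <- lapply(weights, my_mean)

func_list[["group_a"]](c(10, 20, 30))  # (10 + 20 + 60) / 4 = 22.5
func_list[["group_b"]](c(4, 8))        # (12 + 8) / 4 = 5
```

Each element of func_list has the one-argument foo(x) shape that summary_rows expects; note that [[ ]] (not [ ]) is needed to extract the function itself rather than a sub-list.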

R function won't save data to global environment

I'm writing a function that uses the Pushshift API to get data from subreddits, and iterates over the function to get more than the maximum amount of posts. So far it works and prints to the screen, but won't save the data frame to the environment. What am I doing wrong here? Bit of an R newbie so any help and explanations would be great!
Here is my code:
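library(jsonlite)  # provides fromJSON()
get_data_loop <- function(after, subreddit, iterations = c(1:2)) {
loopeddata <- as.data.frame(NULL)
for(i in iterations) {
after <- after + 1  # reassign, otherwise the incremented value is discarded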
data <- as.data.frame(fromJSON(paste("https://api.pushshift.io/reddit/search/comment/?after=", after, "d&subreddit=", subreddit, "&size=100&fields=body,author", sep = "")))
loopeddata <- rbind(data, loopeddata)
#Sleep for API
Sys.sleep(5)
}
print(i)
view(loopeddata)
}
You have to return the variable you are interested in from the function. Use return(loopeddata) as the last line of your function, and then run the function as
loopeddata <- get_data_loop(..your..options..)
Alternatively, after the loop you can assign the variable from inside the function to a global variable. The last line of the function could be the following, though I would prefer the former solution, which is cleaner:
assign("loopeddata", loopeddata, .GlobalEnv)
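The "return it and assign it" approach can be sketched without calling the API. In this sketch fake_fetch() and collect_pages() are hypothetical stand-ins for the fromJSON() request and for get_data_loop(); it also collects each page in a list and binds them once at the end, which is one common alternative to growing a data frame with rbind() inside the loop.

```r
# Hypothetical stand-in for the fromJSON() API request, so the sketch runs offline
fake_fetch <- function(after) {
  data.frame(after = after, value = after * 10)
}

collect_pages <- function(after, iterations = 1:2) {
  # Preallocate a list and fill one element per request
  pages <- vector("list", length(iterations))
  for (i in seq_along(iterations)) {
    pages[[i]] <- fake_fetch(after)
    after <- after + 1  # reassign so the next request uses the new value
  }
  # Bind all pages once at the end; this is the function's return value
  do.call(rbind, pages)
}

# The returned data frame is stored in the global environment by the assignment
result <- collect_pages(after = 1, iterations = 1:3)
```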

R function to reverse a survey item produces NULL

I'm still new to writing my own functions. As an exercise, and because I use it a lot, I want to write a flexible function to easily reverse survey response scales. This is what I came up with:
rev_scale = function(var, new_var, scale){
for (i in 1:length(abs(var))){
new_var[i] = scale-abs(var[i])+1
}
}
Info on code
var = variable I want to reverse.
new_var = new column with the reversed variable
scale = how many points in the scale (eg. 5 for a 5-point scale)
The reason I use 'abs' instead of just 'var' is that some data frames also return value labels, and I only want the values in this function.
Question
When I apply this new function to a variable, R returns NULL. However, if I run the for loop on its own, with the argument values 'imputed' (filled in by hand), my new variable is properly reversed.
Any ideas on what is happening here?
Thanks in advance!
### Example of the (working) for-loop with arguments 'imputed' ###
df <- data.frame(matrix(ncol = 1, nrow = 4))
df$var = c(1,2,3,4)
for (i in 1:length(abs(df$var))){
df$var_rev[i] = 4-abs(df$var[i])+1
}
df$var_rev
OUTPUT:
[1] 4 3 2 1
R does not pass arguments by reference (think pointers)*. So the new_var outside your function is not updated when it is modified inside the function; instead, R modifies a local copy of new_var, which is discarded when the function finishes.
You should instead return the new value from your function, e.g.:
rev_scale = function(var, scale){
res <- vector('numeric', length(var))
for (i in 1:length(abs(var))){
res[i] = scale-abs(var[i])+1
}
return(res)
}
Also note that I have removed new_var from the function's arguments. In other words, I have completely separated the function's input arguments from its output.
The reason you get NULL from the function is that in R every function returns something. If there is no explicit return(), a function returns the value of its last evaluated expression. A for loop always evaluates to (invisible) NULL, so a function whose last statement is a loop, like your original rev_scale, returns NULL.
* There are a couple of exceptions and work-arounds, but I will not go into that here.
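Both points can be seen in a couple of lines. The helper functions set_first() and loop_last() below are hypothetical and exist only for this illustration.

```r
# Modifying an argument inside a function changes only the function's local copy
set_first <- function(v) {
  v[1] <- 99
  v  # return the modified copy
}
x <- c(1, 2, 3)
y <- set_first(x)
x  # still 1 2 3
y  # 99 2 3

# A function whose last expression is a for loop returns NULL
loop_last <- function(v) {
  for (i in seq_along(v)) v[i] <- v[i] * 2
}
out <- loop_last(x)
is.null(out)  # TRUE
```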
Edit:
As benimwolfspelz noted, you do not need to iterate explicitly over each element of var, because R's arithmetic operators are vectorized and work on the whole vector at once. Your entire function could be reduced to:
rev_scale = function(var, scale) {
scale-abs(var)+1
}
Secondly, in your for loop you can simplify length(abs(var)) to length(var), since abs(var) does not change the length of the vector.
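Putting it together, the returned vector is stored as a new column by assigning it at the call site. The loop variant is a sketch that uses seq_along(), which, unlike 1:length(var), does not run the body when var is empty; the name rev_scale_loop is hypothetical.

```r
# Vectorised version from the answer
rev_scale <- function(var, scale) {
  scale - abs(var) + 1
}

# Loop version using seq_along(), which also handles zero-length input
rev_scale_loop <- function(var, scale) {
  res <- numeric(length(var))
  for (i in seq_along(var)) {
    res[i] <- scale - abs(var[i]) + 1
  }
  res
}

# Store the returned vector as a new column
df <- data.frame(var = c(1, 2, 3, 4))
df$var_rev <- rev_scale(df$var, scale = 4)
df$var_rev  # 4 3 2 1
```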

Progressr to display iteration in pmap in R

I have a rather complex pmap function in R and I would like to simply display what iteration number pmap is currently executing. I think the "progressr" package is likely what I want to use, but all of the documentation and examples I'm finding seem rather cumbersome. I have a dataframe of variables that I feed into my pmap function. The dataframe is called 'crosslist', and the pmap function is below:
library(purrr)
results <- pmap(crosslist, safely(function(variable1, variable2, ...., variable 10) {
#do a lot of calculations inside pmap that is a function of variable1, variable 2, ....., variable10
}, otherwise = "NA"))
I think I should use the with_progress function (from library(progressr)) just before the {, but I'm having problems getting it to work. All I would like is for R to display somewhere (likely the console) which iteration the pmap function is currently processing.
To use with_progress, do I simply place it as follows:
results <- pmap(crosslist, safely(function(variable1, variable2, ...., variable 10) with_progress({
#do a lot of calculations inside pmap that is a function of variable1, variable 2, ....., variable10
}), otherwise = "NA"))
You can change your code from pmap to map and loop over the index of each row, which you can print to the console to see which row is currently being processed. For example:
library(purrr)
results <- map(seq_len(nrow(crosslist)), safely(function(i) {
print(i)
var <- crosslist[i, ]
#You can access variable1, variable2 by var[['variable1']], var[['variable2']]
}, otherwise = "NA"))
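If you would rather keep the pmap() style, one alternative (not from the answer above) is to add a row-index column and report it from inside the function. This sketch uses base R's Map(), which, like purrr::pmap(), calls a function with matching elements of several vectors; the crosslist data and the sum are hypothetical placeholders. For progressr specifically, my understanding is that with_progress() has to wrap the whole mapping call, with a progressor() created inside it, rather than sit inside the mapped function.

```r
# Hypothetical stand-in for 'crosslist': one row per set of arguments
crosslist <- data.frame(variable1 = c(1, 2, 3), variable2 = c(10, 20, 30))

# Add a row index so each call knows which iteration it is on
crosslist$row_id <- seq_len(nrow(crosslist))

run_row <- function(variable1, variable2, row_id) {
  message("Processing row ", row_id)
  variable1 + variable2  # placeholder for the real calculations
}

# Each column is passed as a named argument, similar to purrr::pmap(crosslist, run_row)
results <- do.call(Map, c(f = run_row, crosslist))
```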

User defined function export multiple data frames into global environment

I have created a function that computes statistics on various patients' data and, as well as outputting plots, generates data frames containing summary statistics for each patient.
If I copy the function's code and run it directly in R, the outputs are available to me. However, I am now calling the function from a separate R script, and the data frames are no longer available.
Is there any way to correct this?
For example,
test=function(a){
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
}
a=c(1,2,3,4)
test(a)
This does not return DF, yet if I were to type:
a=c(1,2,3,4)
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
Then clearly DF is returned. Is there a simple way to fix this so that DF becomes available from the test function?
Try:
test=function(a){
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
}
a=c(1,2,3,4)
df<-test(a)
print(df)
By assigning the function's return value to a new variable, you make it accessible in the global environment.
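The reason test(a) seems to return nothing is that its last expression is an assignment, and R returns the value of an assignment invisibly. A small check with base R's withVisible(), using a trimmed-down version of the test function:

```r
test <- function(a) {
  DF <- data.frame(A = a, B = 2 * a)
}

# The value of an assignment is returned invisibly, so nothing is printed
v <- withVisible(test(c(1, 2)))
v$visible  # FALSE
v$value    # the data frame is still returned

# Capturing the return value makes it available in the calling environment
df <- test(c(1, 2))
```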
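If you want to assign an object from within a function to the global environment for easy retrieval, the operators to use are <<- or ->> (for more info see ?assignOps). Strictly, <<- assigns to the first enclosing environment in which the name already exists, falling back to the global environment if it is not found. For example: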
test <- function(a) {
A=a
B=2*a
C=3*a
D=4*a
DF <<- data.frame(A,B,C,D)
}
# trial your dummy data
a=c(1,2,3,4)
test(a)
DF
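Hey presto ... it works! Note that writing return(DF) within the function will not, by itself, put your data frame in the global environment; you would still need to assign the result when calling the function, as in the first answer.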
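Since the question is about several data frames, one option is to return them together in a named list and, if separate objects are really wanted, copy them out with base R's list2env(). This is a sketch with a hypothetical patient_stats() function and made-up summary tables.

```r
# Hypothetical example: compute several summary tables and return them together
patient_stats <- function(a) {
  scaled <- data.frame(A = a, B = 2 * a, C = 3 * a, D = 4 * a)
  totals <- data.frame(column = names(scaled), total = colSums(scaled))
  # A named list lets the caller receive every data frame from one call
  list(scaled = scaled, totals = totals)
}

res <- patient_stats(c(1, 2, 3, 4))
res$totals

# Optional: copy each list element into the global environment as its own object
invisible(list2env(res, envir = .GlobalEnv))
```

Keeping the results in res is usually easier to manage from a separate script than creating many global objects, but list2env() gives the latter if needed.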
