I'm still new to writing my own functions. As an exercise, and because I use it a lot, I want to write a flexible function to easily reverse survey response scales. This is what I came up with:
rev_scale = function(var, new_var, scale){
for (i in 1:length(abs(var))){
new_var[i] = scale-abs(var[i])+1
}
}
Info on code
var = variable I want to reverse.
new_var = new column with the reversed variable
scale = how many points in the scale (eg. 5 for a 5-point scale)
The reason why I use 'abs' instead of just 'var' is that some dataframes also return value-labels, and I only want the values in this function.
Question
When applying this new function on a variable, R returns "NULL". However, if I run the for-loop separately, with the arguments 'imputed', my new variable is properly reversed.
Any ideas on what is happening here?
Thanks in advance!
### Example of the (working) for-loop with arguments 'imputed' ###
df <- data.frame(matrix(ncol = 1, nrow = 4))
df$var = c(1,2,3,4)
for (i in 1:length(abs(df$var))){
df$var_rev[i] = 4-abs(df$var[i])+1
}
df$var_rev
OUTPUT:
[1] 4 3 2 1
R does not use reference variables (think pointers)*. So the new_var outside your function is not updated when you refer to it inside the function. Instead, R creates a new copy of new_var and updates that.
You should instead return the new value from your function. I.e.
rev_scale = function(var, scale){
res <- vector('numeric', length(var))
for (i in 1:length(abs(var))){
res[i] = scale-abs(var[i])+1
}
return(res)
}
Also note that I have removed new_var from the function's arguments. In other words, I have completely separated the function's input arguments from its output.
The reason you get a NULL from the function is that in R, every function returns something. If you don't call return() explicitly, the function returns the value of the last expression it evaluated. A for loop evaluates to NULL (invisibly), so a function whose last statement is a for loop returns NULL.
* There are a couple of exceptions and work-arounds, but I will not go into that here.
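To make both points concrete, here is a minimal sketch. The function names loop_only and loop_return are made up for illustration. The first function ends in a for loop and therefore returns NULL, the second returns its modified copy, and in both cases the vector passed in is left untouched.

```r
# Ends in a for loop, so the function returns NULL
loop_only <- function(v) {
  for (i in seq_along(v)) {
    v[i] <- v[i] * 2   # modifies the local copy only
  }
}

# Same loop, but the modified copy is returned as the last expression
loop_return <- function(v) {
  for (i in seq_along(v)) {
    v[i] <- v[i] * 2
  }
  v
}

x <- c(1, 2, 3)
res_null <- loop_only(x)
res_vec <- loop_return(x)

print(res_null)  # NULL
print(x)         # 1 2 3, the original is unchanged
print(res_vec)   # 2 4 6
```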
Edit:
As benimwolfspelz noted, you do not need to explicitly iterate over each element in var, as R does this implicitly. Your entire function could be reduced to:
rev_scale = function(var, scale) {
scale-abs(var)+1
}
Secondly, in your for-loop, you can simplify length(abs(var)) to length(var), as abs(var) does not change the length of the vector.
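For completeness, this is roughly how the vectorized version would be used on the example data frame from the question (a sketch; the column names var and var_rev are taken from the question's example):

```r
# Vectorized version: no explicit loop needed
rev_scale <- function(var, scale) {
  scale - abs(var) + 1
}

df <- data.frame(var = c(1, 2, 3, 4))

# Assign the returned vector to a new column
df$var_rev <- rev_scale(df$var, scale = 4)
df$var_rev  # 4 3 2 1
```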
Related
I want to concatenate iris$SepalLength, so I can use that in a function to get the Sepal Length column from the iris data frame. But when I use the paste function, paste("iris$", colnames(iris[3])), the result is a character string (with quotes), such as "iris$SepalLength". I need the result not to be a character string. I have tried noquote(), as.data.frame() etc. but it doesn't work.
freq <- function(y) {
for (i in iris) {
count <-1
y <- paste0("iris$",colnames(iris[count]))
data.frame(as.list(y))
print(y)
span = seq(min(y),max(y), by = 1)
freq = cut(y, breaks = span, right = FALSE)
table(freq)
count = count +1
}
}
freq(1)
The crux of your problem isn't making that object not be a string, it's convincing R to do what you want with the string. You can do this with, e.g., eval(parse(text = foo)). Isolating out a small working example:
y <- "iris$Sepal.Length"
data.frame(as.list(y)) # does not display iris$Sepal.Length
data.frame(as.list(eval(parse(text = y)))) # DOES display iris$Sepal.Length
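If the goal is only to fetch a column whose name is held in a string, eval(parse()) can usually be avoided by indexing with [[ ]]. A small sketch, where col_name is just an illustrative variable:

```r
col_name <- "Sepal.Length"

# Index the data frame by the column name held in a string
via_name <- iris[[col_name]]

# The eval(parse()) route gives the same vector
via_parse <- eval(parse(text = paste0("iris$", col_name)))

identical(via_name, via_parse)
```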
That said, I wanted to point out some issues with your function:
The input variable appears to not do anything (because it is immediately overwritten), which may not have been intended.
The for loop seems broken, since it resets count to 1 on each pass, which I think you didn't mean. Relatedly, it iterates over all i in iris, but then it doesn't use i in any meaningful way other than to keep a count. Instead, you could do something like for (count in 1:length(iris)), which would establish the count variable and iterate it for you as well.
It's generally better to avoid for loops in R where you can; there is a whole family of functions for applying a function to (e.g.) every column of a data frame. As a very simple version of this, something like apply(iris, 2, table) will apply the table function along margin 2 (the columns) of iris and, in this case, place the results in a list. The idea would be to build your function to do what you want to a single vector, then pass each vector through the function with something from the apply() family. For instance:
cleantable <- function(x) {
myspan = seq(min(x), max(x)) # if unspecified, by = 1
myfreq = cut(x, breaks = myspan, right = FALSE)
table(myfreq)
}
apply(iris[1:4], 2, cleantable) # can only use first 4 columns since 5th isn't numeric
would do what I think you were trying to do on the first 4 columns of iris. This way of programming will be generally more readable and less prone to mistakes.
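As a variation on the same idea, the numeric columns can be picked programmatically instead of hard-coding 1:4. This is only a sketch; is_num and tables are illustrative names, and cleantable is repeated so the snippet runs on its own.

```r
cleantable <- function(x) {
  myspan <- seq(min(x), max(x))  # by = 1 is the default
  myfreq <- cut(x, breaks = myspan, right = FALSE)
  table(myfreq)
}

# Logical vector marking which columns of iris are numeric
is_num <- vapply(iris, is.numeric, logical(1))

# lapply() keeps each column as a vector and returns a named list of tables
tables <- lapply(iris[is_num], cleantable)
names(tables)
```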
In R one can use the <<- symbol within the lapply() function to assign a value to a variable outside lapply().
Let's consider a matrix full of 1:
m<-matrix(data=1, nrow=5, ncol=5)
Let's say I want to replace each row with the values 1, 2, 3, 4 and 5 using the assignment operator <<-. I can use the lapply function (it is not the function designed for that kind of operation, this is only an example):
lapply(X = seq(nrow(m)), FUN = function(r){
m[r,]<<-seq(5)
})
This will work.
But if I now use mclapply like this:
mclapply(X = seq(nrow(m)), FUN = function(r){
m[r,]<<-seq(5)
})
The matrix m will remain full of 1.
The idea is to apply changes to rows of a matrix without creating a new one, assigning them in the existing one instead. The only constraint is to use a function from the parallel package (e.g. mclapply(), but maybe another function would fit better).
Also using the <<- symbol is not mandatory.
How can I do that?
You can't assign in parallel, as you're just assigning to a local copy of the matrix.
Two solutions:
Use shared memory (e.g. matrices on disk using package {bigstatsr}; disclaimer: I'm the author)
Don't assign in the first place. Just run the lapply(), get all the results parts as a list and use do.call("rbind", list).
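A minimal sketch of the second option, using mclapply() since the question asks for the parallel package. The n_cores guard is my own addition, because forking is not available on Windows and mclapply() only accepts mc.cores = 1 there.

```r
library(parallel)

m <- matrix(data = 1, nrow = 5, ncol = 5)

# Fall back to a single core on Windows, where forking is unavailable
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker returns its row instead of assigning into m
rows <- mclapply(seq_len(nrow(m)), function(r) seq(5), mc.cores = n_cores)

# Stack the returned rows into a matrix in the parent process
m <- do.call("rbind", rows)
m
```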
How about this, using the future package
library(future)
plan(multisession) # plan(multiprocess) is deprecated in recent versions of future
m <- matrix(data = 1, nrow = 5, ncol = 5)
# we create a set of futures, so the values are calculated in parallel and
# not sent back to the main environment
fs <- lapply(seq(nrow(m)), function(x) future(seq(5) + x))
# we then pull the values one by one and assign them where they belong
for (i in seq(nrow(m))) {
m[i, ] <- value(fs[[i]])
}
# or the same way you did it:
lapply(X = seq(nrow(m)), FUN = function(r){
m[r,] <<- value(fs[[r]])
})
The drawback here is that the values are assigned sequentially, but at least they are calculated in parallel. I don't think you intend to use the matrix before all calculations are done anyway.
So I currently have several functions where I want to modify a matrix that was created outside the function, in order to use it as a counting variable for things that happen inside the function. The matrix is named cost_counter, and I want to add to it when certain events occur inside of multiple functions. However, I'd like the solution to still be able to be used in foreach() and be parallelizable.
I know that using <<- is not recommended, however I can't figure out how to use assign() to modify an existing matrix. Example code is below. I've defined the variable cost_counter at the beginning. The function below goes on for longer, but I'm just including the first part for an example of what is happening.
cost_counter <<- matrix(0,nrow = 2, ncol = 12*15)
I0 <- function(){
if (screen[i] == 1){
cost_counter[2,ages[i]] <<- 1 + cost_counter[2,ages[i]] + 1
if(HIV[i] == 1){
if(ages[i] > 35){
if(pv[(i-min_i+1),1] < (1-specP3)){
cost_counter[1,ages[i]] <<- cost_counter[1,ages[i]] + 1
if(contact[i] == 1){return(c(5,0))}
}
}
When I run, error message simply says
"Error in cost_counter[2, ages[i]] <<- cost_counter[2, ages[i]] + 1 :
object 'cost_counter' not found"
I would just like to be able to modify the matrix, and for it to be recognized.
Any help would be appreciated. Thanks!
Just at the end of the function, reassign the function's matrix to the global environment using:
cost_counter <<- cost_counter
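Since the question also asks for something that still works under foreach(), an alternative is to avoid shared state altogether: let each task return its own increments and add them up afterwards. The sketch below is only an illustration under that assumption; count_costs is a hypothetical stand-in for one step of the simulation, and the toy ages and screen vectors are made up. With foreach, the final summation would correspond to combining the results with `+`.

```r
# Hypothetical stand-in for one simulation step: returns its own increments
count_costs <- function(age, screened, n_ages = 12 * 15) {
  delta <- matrix(0, nrow = 2, ncol = n_ages)
  if (screened == 1) {
    delta[2, age] <- 1
  }
  delta
}

# Toy inputs, not from the original post
ages <- c(3, 3, 10)
screen <- c(1, 0, 1)

# Each task returns a matrix of increments (this lapply could run in parallel)
deltas <- lapply(seq_along(ages), function(i) count_costs(ages[i], screen[i]))

# Sum the increments once, in the parent process
cost_counter <- Reduce(`+`, deltas)
```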
I have a question here about the print() inside a for loop.
I have a dataset (gpa) with 2 columns. I am trying to get mean, variance, and standard deviation of values inside the two columns. When I code,
for(x in c(1:2)) {
mean(gpa[[x]])
var(gpa[[x]])
sd(gpa[[x]])
}
I don't get any output. But if I insert print before each of the lines, I do get the desired values:
for(x in c(1:2)) {
print(mean(gpa[[x]]))
print(var(gpa[[x]]))
print(sd(gpa[[x]]))
}
What is the difference here? Is print really necessary?
The reason is that the expressions inside the loop are evaluated, but their values are never returned anywhere. R only auto-prints the value of an expression entered at top level, and a for loop itself evaluates to NULL. print does show something, but I wouldn't advise relying on it, because the values go to the console rather than into an object in the global environment, so you can't use them later. Instead, you might want to assign them to something.
For example:
#example data
df <- data.frame(x = 1:10, y = rnorm(10))
#it is good to create the output in the desired length first
#it is much more efficient in terms of speed and memory used, but here it doesn't really matter
ret <- vector("list", NCOL(df))
for(x in seq_len(NCOL(df))){
ret[[x]] <- c(mean(df[[x]]),
var(df[[x]]),
sd(df[[x]]))
}
ret
Though I don't advise that either. For most (if not all) things you can do with a for loop, you can use the apply family of functions from base R or the map function from purrr. That would look like this:
library(purrr)
map(df, ~c(mean(.), var(.), sd(.)))
#or even save it with names
ret <- map(df, ~c(mean = mean(.), var = var(.), sd = sd(.)))
ret
The apply/map variants are shorter and often faster than a loop that grows its result, but more importantly for me they are easier to understand and leave less room for errors. There are a whole bunch of other arguments for why you might want to use apply/map.
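For the base R side of that, a sketch with vapply() might look like the following. The name stats and the fixed y values are only for illustration; vapply() additionally checks that every column yields exactly three numbers.

```r
# Example data with fixed values so the result is reproducible
df <- data.frame(x = 1:10, y = c(2, 4, 4, 4, 5, 5, 7, 9, 1, 3))

# One column of results per column of df, one row per statistic
stats <- vapply(
  df,
  function(col) c(mean = mean(col), var = var(col), sd = sd(col)),
  numeric(3)
)
stats
```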
Thank you for trying to help. I am happy to be corrected on all R misdemeanors.
I am not sure that I was entirely clear with my earlier post as below, so I will hope to clarify:
In the R console, my calls use source(...) to run a .R file
Code within the .R file uses variables (for e.g. 'extracted info' ) ex1, ex2, ex3. These may hold strings or (a string of) numbers pulled from text.
In line with your guidance I've renamed my function to 'reset' (and ?reset indicates no other occurrences are in scope). I'm passing both x and y, which come from outside the function:
#send variables ex1, ex2, ex3 together with location, loc and parse, prs to be reset with 0
reset(x<-c(loc,prs,ex1,ex2,ex3),y<-rep(c(0),length(x))) #repeats 0 in y variable as many times as there are entries for x
reset<-function(x,y){
print(c("resetting ",x," with ", y))
if (length(x) == length(y)) {x <- y
print(paste(x,"=",y),sep="") #both x and y should now be equal (to y)
} else {
paste("list lengths differ: x=",length(x)," y=",length(y),sep="")
}
}
Now both x and y are 0 but ex1, ex2 and ex3 still contain the previous values
I would like ex1, ex2 and ex3 all to be 0 before they are used in a subsequent section of code, so they don't contaminate extracted data with previous values such as:
loc<-str_locate(data[i],"=")
prs<-str_locate(data[i],",")
#extract data from the end of loc to before the occurrence of prs
ex1<-str_sub(data[i],loc[2]+1,prs[1]-1)
#cleanup
#below is simplified for example;
#in reality I wish to send ex1:ex(n) to be reset with values val1:val(n)
The desired outcome would be that back in the Rconsole >ex1 should now return 0.
Hope you can understand my dilemma and possibly help.
Say my code uses some variables to hold data extracted from a string using Stringr str_sub. The variables are temporary in that I use the values to construct other strings then they should be freed up to be used in an upcoming test: i.e. if (test==true){extract<-str_sub(string, start, end)}
For a later test, I would like extract==0; simple enough, but I have a few of these and would like to do it in one fell swoop.
I've used a for loop, but if there is a simpler way, please identify this.
My attempt is using a function:
#For variables loc, prs, ex1 and ex2, set all values to 0
x<-assign(x<-c(loc, prs, ex1, ex2),y<-rep(c(0),length(x)))
#Function
assign <- function(x, y) {
if(length(x)==length(y)){
for (i in 1:length(x)){x[i]<-y[i]}
print(c("Assigned",x[i]))
return (x)
} else { print (c("list lengths differ: x=",length(x)," y=",length(y)))
}
}
The problem being that this returns x as 0, but the list of variables retain their values.
I'm a bit of a noob to both r and SO, so although I've benefitted from SO's bountiful advice on numerous occasions, this is my first question, so please be gentle. I have searched this issue, but have not found what I need in a few hours now. Hope you can help.
Beware of naming a function assign. There is already one in base R and you will create confusion.
There are a couple of problems with your function besides its name. First, you do not need the for-loop to replace x by y, as this is a basic vectorized operation. Just use x <- y. Second, you should wrap your message in paste.
asgn <- function(x, y) {
if(length(x)==length(y)){
## This step is not needed, return(y) is better as @Rick proposed in their now deleted answer
## I am leaving it to show you how the for-loop is not needed
x<-y
return (x)
} else {
print (paste("list lengths differ: x=",length(x)," y=",length(y)))
return(x)
}
}
Then, there are a couple of problems with your function call. You use <- instead of = to specify the arguments. The two are roughly interchangeable for assigning variables, but a function argument is another matter: inside a call, = names the argument, whereas <- creates a variable in the calling environment and passes the value by position. Finally, you are trying to use x in the definition of y in the arguments (length(x)), but this does not refer to the argument x, so R looks for a variable x in the calling environment. Build the vector first and pass its length explicitly instead:
vars <- c(loc, prs, ex1, ex2)
x <- asgn(x = vars, y = rep(0, length(vars)))
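Note that even with these fixes the function only returns a new vector; ex1, ex2 and so on keep their old values, because c(ex1, ex2) passes copies of their contents. A small sketch of that, plus two ways to actually reset several variables (zero_out and the toy values are made up for illustration):

```r
ex1 <- "abc"
ex2 <- "42"

# Returns zeros, but cannot touch ex1 or ex2 in the calling environment
zero_out <- function(x) rep(0, length(x))
out <- zero_out(c(ex1, ex2))
ex1_after_call <- ex1  # still "abc"

# Option 1: chained assignment resets several variables in one statement
ex1 <- ex2 <- 0

# Option 2: keep the temporaries in one named list and reset the list
ex <- list(ex1 = "abc", ex2 = "42")
ex <- lapply(ex, function(v) 0)
```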