Using readline() within a for loop - r

I have a function, as follows:
f.factor <- function(x) {
print(length(unique(x)))
z <- 1
for (i in 1:length(unique(x))) {
z[i] <- readline(":")
}
x <- factor(x, labels=c(z))
return(x)
}
Essentially, it allows me to copy/paste/type or just simply write into my script the factors for a particular variable without having to type c("..","...") a million times.
I've run into a problem when I try to use this function inside a loop. Perhaps the loop structure does not allow lines to be read from within it?
for(i in 1:ncol(df)) {
df[,paste("q4.",i,sep="")] <- f.factor(df[,paste("q4.",i,sep="")])
Never Heard of
Heard of but Not at all Familiar
Somewhat Familiar
Familiar
Very Familiar
Extremely Familiar
}
In the end, I'm looking for a way to specify the factor label without having to rewrite it over and over.

That only worked before because, when you pasted all the code in at the top level, each line was executed immediately, so the readline() calls consumed the following N lines as input. Inside a function or any control structure, R tries to parse those lines as R code instead, which fails.
A multiline string can stand in for a passable heredoc:
lvls = strsplit('
Never Heard of
Heard of but Not at all Familiar
Somewhat Familiar
Familiar
Very Familiar
Extremely Familiar
', '\n')[[1]][-1]
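Once the labels live in a vector like that, they can be applied to every column in one pass instead of being re-read per column. A minimal sketch, assuming a hypothetical data frame df whose q4.* columns are coded 1 to 6 (the data and column names here are made up for illustration):

```r
# Labels defined once, using the multiline-string trick.
lvls <- strsplit('
Never Heard of
Heard of but Not at all Familiar
Somewhat Familiar
Familiar
Very Familiar
Extremely Familiar
', '\n')[[1]][-1]

# Hypothetical survey data: two items coded 1-6.
df <- data.frame(q4.1 = c(1, 3, 6, 2), q4.2 = c(5, 5, 4, 1))

cols <- paste0("q4.", 1:2)
# Passing levels = 1:6 explicitly keeps the labels aligned even when
# a particular code never occurs in a column.
df[cols] <- lapply(df[cols], factor, levels = 1:6, labels = lvls)
```

Giving levels explicitly is a deliberate choice: factor(x, labels = ...) on its own only works when every code actually appears in x.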

Instead of the for loop you can just use scan without a file name (with what='' and possibly sep='\n').
> tmp <- scan(what='', sep='\n')
1: hello there
2: some more
3:
Read 2 items
> tmp
[1] "hello there" "some more"
>

Related

return value of if statement in r

So, I'm brushing up on how to work with data frames in R and I came across this little bit of code from https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html:
input <- if (file.exists("flights14.csv")) {
"flights14.csv"
} else {
"https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
Apparently, this assigns the strings (character vectors?) in the if and else statements to input based on the conditional. How is this working? It seems like magic. I am hoping to find somewhere in the official R documentation that explains this.
From other languages I would have just done:
if (file.exists("flights14.csv")) {
input <- "flights14.csv"
} else {
input <- "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
or in R there is ifelse which also seems designed to do exactly this, but somehow that first example also works. I can memorize that this works but I'm wondering if I'm missing the opportunity to understand the bigger picture about how R works.
From the documentation on the ?Control help page under "Value"
if returns the value of the expression evaluated, or NULL invisibly if none was (which may happen if there is no else).
So the if statement is kind of like a function that returns a value. The value that's returned is the result of evaluating either the if or the else block. When you have a block in R (code between {}), the braces are also like a function that just returns the value of the last expression evaluated in the block. And a string literal is a valid expression that returns itself.
So these are the same
x <- "hello"
x <- {"hello"}
x <- {"dropped"; "hello"}
x <- if(TRUE) {"hello"}
x <- if(TRUE) {"dropped"; "hello"}
x <- if(TRUE) {"hello"} else {"dropped"}
And you only really need blocks {} with if/else statements when you have more than one expression to run or when spanning multiple lines. So you could also do
x <- if(TRUE) "hello" else "dropped"
x <- if(FALSE) "dropped" else "hello"
These all store "hello" in x
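Since the question also mentions ifelse, here is a small sketch of how the two differ. The variable names are just for illustration:

```r
# if is itself a function, so it can be called in prefix form.
a <- `if`(TRUE, "hello", "dropped")

# With no else branch and a FALSE condition, the value is NULL.
b <- if (FALSE) "never"

# ifelse() is the vectorised counterpart: one result per element.
v <- c(1, -2, 3)
signs <- ifelse(v > 0, "pos", "neg")
```

So if/else picks one of two expressions based on a single condition, while ifelse builds a whole vector from a vector of conditions.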
You are not really missing anything about the "big picture" in R. The R if function is atypical compared both to other languages and to R's typical behavior. Most R functions evaluate their body in a fresh environment, so you must assign their output to a "symbol", i.e. a proper R name, to keep it: only the final evaluation is returned, and anything else created inside the function body is discarded. if, by contrast, evaluates its consequent and alternative code blocks in the environment from which it is called (the global environment when you are at the top level), so assignments made inside those blocks persist after the if finishes.
The other common atypical function is for. R for-loops only retain these interior assignments and always return NULL. The R Language Definition calls these atypical R functions "control structures"; see section 3.3. On my machine (and I suspect most Linux boxes) that document is installed at: http://127.0.0.1:10731/help/doc/manual/R-lang.html#Control-structures. If you are on another OS then there is probably a pulldown Help menu in your IDE that will have a pointer to it. The help document calls them "control flow constructs" and the help page is at ?Control. Note that it is necessary to quote these terms when you want to access that help page using one of those names, since they are "reserved words". So you would need ?'if' rather than typing ?if. The other reserved words are described in the ?Reserved page.
?Control
?'if' ; ?'for'
?Reserved
# When you just type:
?if # and hit <return>
# you will see a "+" sign, which indicates an incomplete expression.
# You then need to hit <escape> to get back to a regular R interaction.
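To make the point about where assignments end up concrete, here is a minimal sketch (demo, inner, total and loop_value are made-up names). It suggests that an if block does not open a new scope, and that a for loop keeps its interior assignments while its own value is NULL:

```r
# Assignments inside an if block land in the environment where the if runs.
demo <- function() {
  if (TRUE) {
    inner <- "set inside if"
  }
  # inner is visible here: the if block did not create a new scope.
  exists("inner", envir = environment(), inherits = FALSE)
}
inner_visible <- demo()

# A for loop keeps its interior assignments too, but its own value is NULL.
total <- 0
loop_value <- for (i in 1:3) {
  total <- total + i
}
```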
In R, functions don't need an explicit return. If none is specified, the value of the last expression evaluated in the function is returned automatically. The same holds for the blocks of an if statement. Consider this example:
a <- 5
b <- 1
result <- if(a == 5) {
a <- a + 1
b <- b + 1
a
} else {b}
result
#[1] 6
The last line in if block was saved in result. Similarly, in your case the string values are "returned" implicitly.

Using strings from loops as parts of function commands and variable names in R

How does one use the string coming from a loop
- to generate new variables
- as a part of functions commands
- as functions' arguments
- as a part of if statements
in R?
Specifically, as an example (the code obviously doesn't work, but I'd like to have something not less intelligible than what is below),
list_dist <- c("unif","norm")
for (dist in list_dist){
paste("rv",dist,sep="") = paste("r",dist,sep="")(100,0,1)
paste("meanrv",dist,sep="") = mean(paste("rv",dist,sep=""))
if (round(paste("meanrv",dist,sep=""),3) != 0){
print("Not small enough")
}
}
Note: This is an example and I do need to use kind of loops to avoid writing huge scripts.
I managed to use strings as in the example above but only with eval/parse/text/paste and combining the whole statement (i.e. the whole "line") inside paste, instead of pasting only in the varname part or the function part, which makes code ugly and illegible and coding inefficient.
Other available replies to similar questions which I've seen are not specific as in how to deal with this sort of usage of strings from loops.
I'm sure there must be a more efficient and flexible way to deal with this, as there is in some other languages.
Thanks in advance!
Resist the temptation of creating variable names programmatically. Instead, structure your data properly into lists:
list_dist = list(unif = runif, norm = rnorm)
distributions = lapply(list_dist, function (f) f(100, 0, 1))
means = unlist(lapply(distributions, mean))
# … etc.
As you can see, this also gets rid of the loop, by using list functions instead.
Your last step can also be vectorised:
if (any(round(means, 3) != 0))
warning('not small enough')
try this:
list_dist <- list(unif = runif,norm = rnorm)
for (i in 1:length(list_dist)){
assign(paste("rv",names(list_dist)[i],sep=""), list_dist[[i]](100,0,1))
assign(paste("meanrv",names(list_dist)[i],sep=""),mean(get(paste("rv",names(list_dist)[i],sep=""))))
if (round(get(paste("meanrv",names(list_dist)[i],sep="")),3) != 0){
print("Not small enough")
}
}
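If the distribution names really do arrive as strings, one possible middle ground is to look the function up by name with match.fun() and still keep the results in a named list rather than in generated variables. A sketch under that assumption (the seed and the names rv and means are arbitrary):

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

list_dist <- c("unif", "norm")

# Turn each string into the matching random generator ("unif" -> runif)
# and store the draws in a list named after the distribution.
rv <- setNames(lapply(list_dist, function(d) {
  rfun <- match.fun(paste0("r", d))
  rfun(100, 0, 1)
}), list_dist)

# One mean per distribution, returned as a named numeric vector.
means <- vapply(rv, mean, numeric(1))

if (any(round(means, 3) != 0)) {
  message("Not small enough")
}
```

Individual results are then available as rv[["unif"]] or means["norm"], with no need for assign, get or eval(parse(...)).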

FOR loops giving no result or error in R

I am running the following code:
disc<-for (i in 1:33) {
m=n[i]
xbar<-sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
Sx
i=i+1}
Running it:
>disc
NULL
Why is it giving me NULL?
This is from the documentation for for, accessible via ?`for`:
‘for’, ‘while’ and ‘repeat’ return ‘NULL’ invisibly.
Perhaps you are looking for something along the following lines:
library(plyr)
disc <- llply(1:33, function(i) {
m=n[i]
xbar<-sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
Sx
})
Other variants exist -- the ll in llply stands for "list in, list out". Perhaps your intended final result is a data frame or an array -- appropriate functions exist.
The code above is a plain transformation of your example. We might be able to do better by splitting data right away and forgetting the otherwise useless count variable i (untested, as you have provided no data):
disc <- daply(cbind(data, n=n), .(), function(data.i) {
m=data.i$n
xbar<-sum(data.i,na.rm=TRUE)/m
sqrt(sum((data.i-xbar)^2,na.rm=TRUE)/(m-1))
})
See also the plyr website for more information.
Related (if not a duplicate): R - How to turn a loop to a function in R
krlmlr's answer shows you how to fix your code, but to explain your original problem in more abstract terms: a for loop allows you to run the same piece of code multiple times, but it doesn't store the results of running that code for you; you have to do that yourself.
Your current code only really assigns a single value, Sx, for each run of the for loop. On the next run, a new value is put into the Sx variable, so you lose all the previous values. At the end, you'll just end up with whatever the value of Sx was on the last run through the loop.
To save the results of a for loop, you generally need to add them to a vector as you go through, e.g.
# Create the empty results vector outside the loop
results = numeric(0)
for (i in 1:10) {
current_result = 3 + i
results = c(results, current_result)
}
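Growing a vector with c() works, but it copies the vector on every iteration. When the number of iterations is known up front, a common alternative is to preallocate the result, or to let vapply collect the values. A small sketch of both, using the same toy computation (n_iter and results2 are made-up names):

```r
n_iter <- 10

# Preallocate the full result vector, then fill it by index.
results <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  results[i] <- 3 + i
}

# Same computation without an explicit loop; vapply checks that each
# call returns exactly one numeric value.
results2 <- vapply(seq_len(n_iter), function(i) 3 + i, numeric(1))
```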
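In R, a for loop always returns NULL (invisibly), so assigning the loop itself to disc cannot capture a result. One option is to wrap your loop in a function and return the value explicitly. For example (note that this returns only the Sx from the last iteration):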
getSx <- function(){
Sx <- 0
disc <- for (i in 1:33) {
m=n[i]
xbar <- sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
}
Sx
}
Then you call it:
getSx()
Of course, you can avoid the side effects of a for loop by using lapply or by writing a vectorized solution. But that is another problem: you should provide a reproducible example and explain a little of what you are trying to compute.
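For what a vectorized version might look like: the question gives no data, so the sketch below makes up a 33-row numeric matrix and assumes n[i] is the number of non-missing values in row i. Under those assumptions, the loop body is a row-wise sample standard deviation:

```r
set.seed(42)  # arbitrary seed for the made-up data

# Hypothetical stand-in for the asker's data: 33 rows, 5 columns, one NA.
data <- matrix(rnorm(33 * 5), nrow = 33)
data[2, 3] <- NA

# Assumption: n holds the count of non-missing values per row.
n <- rowSums(!is.na(data))

# Row means, then row-wise standard deviations, with no explicit loop.
# data - xbar recycles xbar down the columns, so row i has xbar[i] subtracted.
xbar <- rowSums(data, na.rm = TRUE) / n
disc <- sqrt(rowSums((data - xbar)^2, na.rm = TRUE) / (n - 1))
```

With these assumptions the result should agree with apply(data, 1, sd, na.rm = TRUE), which is probably the simplest way to write it.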

Creating Automation in R

I have created a script that analyzes a set of raw data and converts it into many different formats based on different parameters and functions. I have 152 more raw data sheets to go, but all I will have to do is use my script on each one. However, there will be times that I might decide I need to change a variable or parameter and I would like to come up with a parameter list at the top of my spreadsheet that would affect the rest of the functions in my soon to be very large script.
Global variables aren't the answer to this problem. This is best illustrated through this example:
exceedes <- function (L=NULL, R=NULL)
{
if (is.null(L) | is.null(R))
{
print ("mycols: invalid L,R.")
return (NULL)
}
options (na.rm = TRUE)
test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
test1 <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
return (test1)
}
L=ROCC[,2]
R=.08
ROCC$newcolumn <- exceedes(L,R)
names(ROCC)[names(ROCC)=="newcolumn"]="Exceedes1"
L=ROCC[,2]
R=.16
ROCC$newcolumn <- exceedes(L,R)
names(ROCC)[names(ROCC)=="newcolumn"]="Exceedes2"
L=ROCC[,2]
R=.24
ROCC$newcolumn <- exceedes(L,R)
names(ROCC)[names(ROCC)=="newcolumn"]="Exceedes3"
So in the above example, I would like to have a way at the top of my script to change the range of R and have it affect the rest of the script because this function will be repeated 152 times. The only way I can think of doing it is to copy and paste the function over and over with a different variable each time, and set it globally. But I have to imagine there is a simpler way, my function possibly needs to be rearranged perhaps?
File names and output names. I am not sure whether this is possible, but say, for example, that all my input .csv files are named in sequence: one dataset is titled 123, another 124, another 125, etc. Can R know to take the very next dataset and then output it to a specific folder on my computer, without me having to actually type read.csv(file="123.csv") and then write.csv(example, file="123.csv") and so on?
General formatting of automation script
Before I dive into my automation: my procedure was going to be to copy and paste the script 152 times over and then change the filename and output name for each one. This sounds ridiculous, but with my lack of programming skills I am not sure of a better way to do it. Any ideas?
Thanks for all the help in advance.
You can rerun the function with different parameters by constructing a vector of parameters (say R)
R <- c(seq(0.1, 1, by = 0.01))
and then run your exceedes function length(R) times using sapply.
exceedes <- function(R, L) {} #notice the argument order
sapply(X = R, FUN = exceedes, L = ROCC[, 2])
You can pass other arguments to your function (e.g. file.name) and use it to create whatever file name you need.
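To tie the pieces together, here is a rough sketch of how the whole script could be organised: parameters once at the top, one function that processes a single file, and list.files() to find the inputs. The folder names, the process_file helper and the tiny demo files are all made up for illustration; exceedes is a simplified version of the function from the question.

```r
# ---- Parameters: change these once, at the top ----
R_values <- c(0.08, 0.16, 0.24)

# Simplified version of the exceedes function from the question.
exceedes <- function(L, R) {
  threshold <- mean(L, na.rm = TRUE) - R * sd(L, na.rm = TRUE)
  ifelse(is.na(L), NA, ifelse(L > threshold, 1, 0))
}

# Hypothetical helper: read one file, add one Exceedes column per
# value of R, and write the result under the same file name in out_dir.
process_file <- function(in_path, out_dir, R_values) {
  d <- read.csv(in_path)
  for (k in seq_along(R_values)) {
    d[[paste0("Exceedes", k)]] <- exceedes(d[[2]], R_values[k])
  }
  out_path <- file.path(out_dir, basename(in_path))
  write.csv(d, out_path, row.names = FALSE)
  out_path
}

# ---- Demo with throwaway folders and two made-up input files ----
in_dir  <- file.path(tempdir(), "raw")
out_dir <- file.path(tempdir(), "processed")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:4, value = c(1, 5, NA, 9)),
          file.path(in_dir, "123.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:4, value = c(2, 2, 8, 3)),
          file.path(in_dir, "124.csv"), row.names = FALSE)

# Process every .csv in the input folder; no file name is typed by hand.
files   <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
outputs <- vapply(files, process_file, character(1),
                  out_dir = out_dir, R_values = R_values)
```

With this layout, changing the range of R means editing one line, and adding the remaining raw files means dropping them into the input folder.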

Verbatim command arguments: deparse(substitute(foo)) in a wrapper

Here's a little puzzler for those fluent in the nitty-gritty of how the R evaluator handles function calls. Suppose I wanted to write a function that takes an R statement, same as what I'd write at the command line, and echoes both it, and the evaluated result. Example:
> p.eval(sum(1:3))
sum(1:3) --> 6
That's easy; here's the definition of p.eval():
p.eval <- function(v,prefix="--> ") {
cmd <- deparse(substitute(v)); cat(cmd,prefix,v,"\n")
}
But suppose I now want to write a wrapper around p.eval, to be invoked the same way; perhaps as a somewhat demented binary operator with a dummy second argument:
`%PE%` <- function(x,...) p.eval(x)
I'd like to invoke it like so: sum(1:3) %PE% 0 should be equivalent to the old p.eval(sum(1:3)). This doesn't work, of course, because the deparse(substitute()) of p.eval() now gives x.
Question to the enlightened: is there a way to make this work as I desire?.. For this particular usage, I'm quite fine with defining %PE% by copying/pasting the one-liner definition of p.eval, so this question is mostly academic in nature. Maybe I'll learn something about the nitty-gritty of the R evaluator :)
P.S.: Why might one find the above functions useful?.. Suppose I develop some analysis code and invoke it non-interactively through org-babel (which is most definitely worth playing with if you are an Org-mode and/or an Emacs user). By default, org-babel slurps up the output as things are evaluated in the interpreter. Thus, if I want to get anything but raw numbers, I have to explicitly construct strings to be printed through cat or paste, but who wants to do that when they are flying through the analysis?.. The hack above allows you to simply append %PE%0 after a line that you want printed, and this echoes the command to the org output.
Try this:
> "%PE%" <- function(x, ...) do.call(p.eval, list(substitute(x)))
> sum(1:3) %PE% 0
sum(1:3) --> 6
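The reason this works is that substitute(x) inside the wrapper recovers the expression the caller typed, and the call to p.eval is then rebuilt around that expression instead of around the symbol x. An equivalent way to write it, as a sketch only (p.eval is copied from the question, with invisible(v) added here so the value can also be reused):

```r
p.eval <- function(v, prefix = "--> ") {
  cmd <- deparse(substitute(v))
  cat(cmd, prefix, v, "\n")
  invisible(v)  # addition for this sketch: hand the value back quietly
}

# substitute(p.eval(x)) yields the call p.eval(sum(1:3)); evaluating it in
# the caller's frame lets p.eval see the original expression.
`%PE%` <- function(x, ...) eval(substitute(p.eval(x)), parent.frame())

# capture.output is used here only to inspect what gets printed.
out <- capture.output(val <- sum(1:3) %PE% 0)
```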
Also could just have p.eval return "v" and then:
p.eval <- function(v,prefix="--> ") {
cmd <- deparse(substitute(v)); cat(cmd,prefix,v,"\n") ; return(v) }
"%PE%" <- function(x, y=NULL) x
sum(1:3) %PE% Inf
#[1] 6
sum(1:3) %PE% # won't accept single argument
r # give it anything
#[1] 6

Resources