Retrieving Variable Declaration - r

How can I find how I first declared a certain variable when I am a few hundred
lines down from where I declared it? For example, I have declared the following:
a <- c(vectorA,vectorB,vectorC)
and now I want to see how I declared it. How can I do that?
Thanks.

You could try using the history command:
history(pattern = "a <-")
to try to find lines in your history where you assigned something to the variable a. I think this matches exactly, though, so you may have to watch out for spaces.
Indeed, if you type history at the command line, it doesn't appear to do anything fancier than saving the current history to a tempfile, loading it back in with readLines and then searching it with grep. It ought to be fairly simple to modify that function to include more functionality. For example, this modification will return the matching lines so you can store them in a variable:
myHistory <- function (max.show = 25, reverse = FALSE, pattern, ...)
{
    file1 <- tempfile("Rrawhist")
    savehistory(file1)
    rawhist <- readLines(file1)
    unlink(file1)
    if (!missing(pattern))
        rawhist <- unique(grep(pattern, rawhist, value = TRUE, ...))
    nlines <- length(rawhist)
    if (nlines) {
        inds <- max(1, nlines - max.show):nlines
        if (reverse)
            inds <- rev(inds)
    }
    else inds <- integer()
    #file2 <- tempfile("hist")
    #writeLines(rawhist[inds], file2)
    #file.show(file2, title = "R History", delete.file = TRUE)
    rawhist[inds]
}
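For example (assuming the assignment is still in the current session's history), this should return the matching line as a character vector:
myHistory(pattern = "a <-")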

I will assume you're using the default R console. If you're on Windows, you can use File -> Save History and open the file in your favorite text editor, or you can use the function savehistory() (see help(savehistory)).
What you need to do is get a (good) IDE, or at least a decent text editor. You will benefit from code folding, syntax coloring and much more. There's a plethora of choices: Tinn-R, VIM, ESS, Eclipse+StatET, RStudio or RevolutionR, among others.

You can run grep 'a<-' .Rhistory from the terminal (assuming that you've cd'd to your working directory). ESS has several very useful history-searching functions, like comint-history-isearch-backward-regexp, bound to M-r by default.
For further info, consult the ESS manual: http://ess.r-project.org/Manual/ess.html

When you define a function, R stores the source code of the function (preserving formatting and comments) in an attribute named "source". When you type the name of the function, you will get this content printed.
But it doesn't do this with variables. You can deparse a variable, which generates an expression that will reproduce the variable's value, but this need not be the original expression. For example, if you have b <- c(17, 5, 21), deparse(b) will produce the string "c(17, 5, 21)".
In your example, however, the result wouldn't be "c(vectorA,vectorB,vectorC)"; it would be an expression that produces the combined result of your three vectors.
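A quick illustration (with arbitrary values for the three vectors):
vectorA <- c(1.5, 2); vectorB <- 3; vectorC <- c(4, 5.5)
a <- c(vectorA, vectorB, vectorC)
deparse(a)   # "c(1.5, 2, 3, 4, 5.5)" -- the combined values, not the original call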

Related

How can I pass the names of a list of files from bash to an R program?

I have a long list of files with names like file-typeX-sectorY.tsv, where X and Y take values from 0 to 100. I process each of those files with an R program, but I read them one by one like this:
data <- read.table(file='my_info.tsv', sep = '\t', header = TRUE, fill = TRUE)
which is impractical. I want to build a bash program that does something like:
#!/bin/bash
for i in {0..100..1}
do
for j in {1..100..1)
do
Rscript program.R < file-type$i-sector$j.tsv
done
done
My problem is not with the bash script but with the R program. How can I receive the files one by one? I have googled and tried instructions like:
args <- commandArgs(TRUE)
or
data <- commandArgs(trailingOnly = TRUE)
but I can't find the way. Could you please help me?
At the simplest level, your problem may be the (possibly accidental?) redirect you have, so remove the <.
Then a minimal R 'program' that takes a command-line argument and does something with it would be:
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
stopifnot("require at least one arg" = length(args) > 0)
cat("We were called with '", args[1], "'\n", sep="")
We use a 'shebang' line and chmod 0755 basicScript.R to make it runnable. Then your shell double loop, reduced here (and with one typo corrected), becomes:
#!/bin/bash
for i in {0..2..1}; do
for j in {1..2..1}; do
./basicScript.R file-type${i}-sector${j}.tsv
done
done
and this works as we hope with the inner program reflecting the argument:
$ ./basicCaller.sh
We were called with 'file-type0-sector1.tsv'
We were called with 'file-type0-sector2.tsv'
We were called with 'file-type1-sector1.tsv'
We were called with 'file-type1-sector2.tsv'
We were called with 'file-type2-sector1.tsv'
We were called with 'file-type2-sector2.tsv'
$
Of course, this is horribly inefficient as you have N x M external processes. The two outer loops could be written in R, and instead of calling the script you would call your script-turned-function.
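If you want the loops inside R instead, a minimal sketch could look like the following (the file-name pattern is taken from the question; the process_file() helper is a placeholder for whatever program.R currently does):
#!/usr/bin/env Rscript
process_file <- function(path) {
  data <- read.table(path, sep = "\t", header = TRUE, fill = TRUE)
  # ... whatever program.R currently does with `data` ...
}
for (i in 0:100) {
  for (j in 1:100) {
    path <- sprintf("file-type%d-sector%d.tsv", i, j)
    if (file.exists(path)) process_file(path)   # skip missing combinations
  }
}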

How to specify input arguments to Rscript by name from command line?

I am new to command line usage and don't think this question has been asked elsewhere. I'm trying to adapt an R script to be run from the command line in a shell script. Basically, I'm using some tools in the immcantation framework to read and annotate some antibody NGS data, and then to group sequences into their clonal families. The creators recommend using a function in their shazam package to determine an appropriate similarity threshold.
I've made the simple script below to read and validate the arguments:
#!/usr/bin/env Rscript
params <- commandArgs(trailingOnly = TRUE)
### read and validate mode argument
mode <- params[1]
modeAllowed <- c("ham", "aa", "hh_s1f", "hh_s5f")
if (!(mode %in% modeAllowed)) {
  stop(paste("illegal mode argument supplied. acceptable values are",
             paste(paste(modeAllowed, collapse = ", "), ".", sep = ""),
             "\nmode should be supplied first",
             sep = " "))
}
### execute function
cat(threshold)
The script works; however, since each parameter has only a finite number of valid options, I was wondering whether there is a way of passing the arguments by name, like --mode aa (for example), from the terminal. All the information I've seen online uses code like my mode <- params[1] above, which I guess only works if the mode argument comes first.
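One way to accept arguments by name, sketched here without extra packages (packages such as optparse and argparse also provide this), is to scan commandArgs() for --flag value pairs; the get_arg() helper below is purely illustrative:
#!/usr/bin/env Rscript
params <- commandArgs(trailingOnly = TRUE)
# Illustrative helper: return the value following a "--flag", or a default.
get_arg <- function(flag, default = NULL) {
  hit <- which(params == flag)
  if (length(hit) == 1 && hit < length(params)) params[hit + 1] else default
}
mode <- get_arg("--mode", default = "ham")
modeAllowed <- c("ham", "aa", "hh_s1f", "hh_s5f")
if (!(mode %in% modeAllowed)) {
  stop("illegal --mode value; acceptable values are ",
       paste(modeAllowed, collapse = ", "))
}
cat("mode =", mode, "\n")
With this approach the script can be called as, for example, ./myscript.R --mode aa, regardless of argument order.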

Search list of keywords in elastic using R elastic package

I am making a shiny app where a user can input an Excel file of terms and search them on elastic holdings. The Excel file contains one column of keywords and is read into a list; then I am trying to have each item in the list searched using Search(). I easily did this in Python with a for loop over the terms and the search call inside the loop, and got accurate results. I understand that it isn't that easy in R, but I cannot get to the right solution. I am using the R elastic package and have been trying different versions of Search() for over a day. I have not used elastic before, so my apologies for not understanding the syntax much. I know that I need to do something with aggs for a list of terms.
Essentially I want to search on the source.body_ field and I want to use match_phrase for searching my terms.
Here is the code I have in Python that works, but I need everything in R for the shiny app and don't want to use reticulate.
queries = list()
for term in my_terms:
    search_result = es.search(index="cars", body={"query": {"match_phrase": {'body_': term}}}, size=5000)
    search_result.update([('term', term)])
    queries.append(search_result)
I established my elastic connection as con and made sure it can bring back accurate matches on just one keyword with:
match <- {"query": {"match_phrase" : {"body_" : "mustang"}}}
search_results <- Search(con, index="cars", body = match, asdf = TRUE)
That worked how I expected it to with just one keyword explicitly defined.
So after that, here is what I have tried for a list of my_terms:
aggs <- '{"aggs":{"stats":{"terms":{"field":"my_terms"}}}}'
queries <- data.frame()
for (term in my_terms) {
final <- Search(con, index="cars", body = aggs, asdf = TRUE, size = 5000)
rbind(queries, final)
}
When I run this, it just brings back everything in my elastic. I also tried running that with the for loop commented out and that didn't work.
I also have tried to embed my_terms inside my match list from the single term search at the beginning of this post like so:
match <- '{"query": {"match_phrase" : {"body_": "my_terms"}}}'
Search(con, index="cars", body = match, asdf = TRUE, size = 5000)
This returns nothing. I have tried so many combinations of the aggs list and match list but nothing returns what I'm looking for. Any advice would be much appreciated, I feel like I've read just about everything so far and now I'm just confused.
UPDATE:
I figured it out with
p <- data.frame()
for (t in the_terms$keyword) {
  result <- Search(con, index="cars", body = paste0('{"query": {"match_phrase" : {"body_":', '"', t, '"', '}}}'), asdf = TRUE, size = 5000)
  p <- rbind(p, result$hit$hit)
}
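A slightly more robust variant of the same idea, sketched here on the assumption that jsonlite is available, builds each query body with toJSON so the terms are quoted and escaped automatically:
library(jsonlite)
p <- data.frame()
for (t in the_terms$keyword) {
  # Build {"query": {"match_phrase": {"body_": "<term>"}}} without string pasting.
  body <- toJSON(list(query = list(match_phrase = list(body_ = t))),
                 auto_unbox = TRUE)
  result <- Search(con, index = "cars", body = body, asdf = TRUE, size = 5000)
  p <- rbind(p, result$hit$hit)
}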

R - Connect Scripts via Pipes

I have a number of R scripts that I would like to chain together using a UNIX-style pipeline. Each script would take as input a data frame and provide a data frame as output. For example, I am imagining something like this that would run in R's batch mode.
cat raw-input.Rds | step1.R | step2.R | step3.R | step4.R > result.Rds
Any thoughts on how this could be done?
Writing executable scripts is not the hard part; what is tricky is making the scripts read from files and/or pipes. I wrote a somewhat general function for that here: https://stackoverflow.com/a/15785789/1201032
Here is an example where the I/O takes the form of csv files:
Your step?.R files should look like this:
#!/usr/bin/Rscript
OpenRead <- function(arg) {
    if (arg %in% c("-", "/dev/stdin")) {
        file("stdin", open = "r")
    } else if (grepl("^/dev/fd/", arg)) {
        fifo(arg, open = "r")
    } else {
        file(arg, open = "r")
    }
}
args <- commandArgs(TRUE)
file <- args[1]
fh.in <- OpenRead(file)
df.in <- read.csv(fh.in)
close(fh.in)
# do something
df.out <- df.in
# print output
write.csv(df.out, file = stdout(), row.names = FALSE, quote = FALSE)
and your csv input file should look like:
col1,col2
a,1
b,2
Now this should work:
cat in.csv | ./step1.R - | ./step2.R -
The - are annoying but necessary. Also make sure to run something like chmod +x ./step?.R to make your scripts executable. Finally, you could store them (without the extension) inside a directory that you add to your PATH, so you will be able to run them like this:
cat in.csv | step1 - | step2 -
Why on earth you want to cram your workflow into pipes when you have the whole R environment available is beyond me.
Make a main.r containing the following:
source("step1.r")
source("step2.r")
source("step3.r")
source("step4.r")
That's it. You don't have to convert the output of each step into a serialised format; instead you can just leave all your R objects (datasets, fitted models, predicted values, lattice/ggplot graphics, etc) as they are, ready for the next step to process. If memory is a problem, you can rm any unneeded objects at the end of each step; alternatively, each step can work with an environment which it deletes when done, first exporting any required objects to the global environment.
If modular code is desired, you can recast your workflow as follows. Encapsulate the work done by each file into one or more functions. Then call these functions in your main.r with the appropriate arguments.
source("step1.r") # defines step1_read_input, step1_f2
source("step2.r") # defines step2_f2
source("step3.r") # defines step3_f1, step3_f2, step3_f3
source("step4.r") # defines step4_write_output
step1_read_input(...)
step1_f2(...)
....
step4_write_output(...)
You'll need to add a line at the top of each script to read in from stdin. Via this answer:
in_data <- readLines(file("stdin"),1)
You'll also need to write the output of each script to stdout().
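Putting those two requirements together, a minimal pipe-friendly step script might look like this (the column name value and the doubling step are placeholders for whatever the step actually does):
#!/usr/bin/env Rscript
# Read a CSV data frame from stdin, transform it, write CSV back to stdout.
df <- read.csv(file("stdin"))
df$value <- df$value * 2   # placeholder transformation
write.csv(df, stdout(), row.names = FALSE)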

R: While loop input

I am a bit new to R and have a question about a program I am trying to write. I am hoping to take in files (as many as a user pleases) with a while loop (eventually using read.table on each), but it keeps breaking on me.
Here is what I have so far:
cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
fil<-readLines(con="stdin", 1)
cat(fil, "\n")
while (!input=='X' | !input=='x'){
inputfile=input
input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
}
if(input=='X' | input=='x'){
exit -1
}
When I run it (from the commandline (UNIX)) I get these results:
> library("lattice")
>
> cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
Please enter the full path for your files, if you have no more files to add enter 'X': > fil<-readLines(con="stdin", 1)
x
> cat(fil, "\n")
x
> while (!input=='X' | !input=='x'){
+ inputfile=input
+ input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
+ }
Error: object 'input' not found
Execution halted
I am not quite sure how to fix the problem, but I am pretty sure that it is probably a simple problem.
Any suggestions?
Thanks!
When you first run the script, input doesn't exist. Assign
input <- c()
say, before your while statement, or put
inputfile = input
below the input <- readline(...) line.
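A minimal sketch of the corrected loop, initializing input before it is tested (note also that the two case checks need to be combined with && rather than |; with |, the condition is always TRUE and the loop never exits):
prompt <- "Please enter the full path for your files, or 'X' if you have no more files to add: "
files <- character()
cat(prompt)
input <- readLines(con = "stdin", n = 1)
while (input != "X" && input != "x") {
  files <- c(files, input)
  cat(prompt)
  input <- readLines(con = "stdin", n = 1)
}
# `files` now holds the paths, e.g. for lapply(files, read.table, header = TRUE)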
I'm not exactly sure what the underlying problem is for your issue. It may be that you're inputting the directory path incorrectly.
Here's a solution I've used a few times. It makes it much easier for the user. Basically, your code will not require user input, all it requires is that you have a certain naming convention for your files.
setwd("Your/Working/Directory") #This doesn't change
filecontents <- 1
i <- 1
while (filecontents != 0) {
mydata.csv <- try(read.csv(paste("CSV_file_",i,".csv", sep = ""), header = FALSE), silent = TRUE)
if (typeof(mydata.csv) != "list") { #checks to see if the imported data is a list
filecontents <- 0
}
else {
assign(paste('dataset',i, sep=''), mydata)
#Whatever operations you want to do on the files.
i <- i + 1
}
}
As you can see, the naming convention for the files is CSV_file_n, where n is any number of input files (I took this code out of one of my programs, in which I load CSVs). One of the problems I kept having was error messages popping up when my code looked for a file that wasn't there. With this loop, those messages won't arise: when read.csv is pointed at a non-existent file, try() captures the error, and the code merely checks whether mydata.csv is a list. If it is, it continues operating; if not, it stops. If you're worried about differentiating between your data from different files within the code, just insert any relevant information about the file in a constant location within the file itself. For example, in my CSVs, my 3rd column always contained the name of the image from which I gathered the information contained in the rest of the CSV.
Hope this helps you a bit, even though I see you've already got a solution :-). It's really just an option if you want your program to be more autonomous.
