How to use indentation with `rstudioapi::insertText` - r

I have a string that I want to paste with indentation into RStudio using the {rstudioapi}. Here is a simple test string:
test_str <- "for (i in seq_along(x)) {\nout[[i]] <- sum(x[[i]])\n}"
cat(test_str)
#> for (i in seq_along(x)) {
#> out[[i]] <- sum(x[[i]])
#> }
When copying the console output and pasting it manually into an R script in RStudio the output has the correct indentation of one tab equalling two spaces (my default setting):
# this is my desired output (directly in a script, not the console):
for (i in seq_along(x)) {
out[[i]] <- sum(x[[i]])
}
When using rstudioapi::insertText the string is inserted in the script without indentation:
rstudioapi::insertText(test_str)
for (i in seq_along(x)) {
out[[i]] <- sum(x[[i]]) # one tab (equalling two spaces) is missing
}
How can I add indentation when using rstudioapi::insertText or any other function from the {rstudioapi} package?
Reading the documentation I found how to read the system preference for indentation:
rstudioapi::readRStudioPreference("num_spaces_for_tab")
#> 2
However, I can't figure out how to make insertText use this information.
More context:
I'm looking for a way to add indentation programmatically to string outputs. That means, I don't want to add manually \t to lines which should have indentation. I'm in a package and have to deal with user input, which makes it probably pretty tough to calculate the correct amount of indentation which is needed. In the example above line 1 and 3 would need no indentation, while line 2 would need one tab or two spaces.
Ideally, I'd like to use no other package than the {rstudioapi} or base R. Looking at the documentation insertText also has a location argument which works with positions or ranges in scripts. I'm not sure whether this can be somehow used to include indentation.
I'm also looking at the {datapasta} package which also uses the {rstudioapi} and here the "num_spaces_for_tab"option is used in the output_context (in the script called oc$nspc), but I'm not sure how to apply it to my problem.

You could use rstudioapi::executeCommand to launch reindent or reformatCode commands:
If you run following commands together in editor (for example with ctrl+A Ctrl+Enter):
test_str <- "for (i in seq_along(x)) {\nout[[i]] <- sum(x[[i]])\n}"
rstudioapi::insertText(test_str)
# Should be adapted to the range you want to reformat (here : all lines)
ranges <- rstudioapi::document_range(c(1, 0), c(Inf, Inf))
rstudioapi::setSelectionRanges(ranges)
rstudioapi::executeCommand('reformatCode')
You get:
for (i in seq_along(x)) {
out[[i]] <- sum(x[[i]])
}
List of available command IDs is available here.

I didn't hear about an indentation feature in rstudioapi library.
But I know, that the styler has this possibility.
Maybe, it will be also helpful for you.
An example:
library(styler)
test_str <- "for (i in seq_along(x)) {\nout[[i]] <- sum(x[[i]])\n}" #your code
style_text(test_str, indent_by = 3)
An output:
for (i in seq_along(x)) {
out[[i]] <- sum(x[[i]])
}
Let's add this into insertText
> rstudioapi::insertText(style_text(test_str, indent_by = 3))
named list()
> for (i in seq_along(x)) {
+ out[[i]] <- sum(x[[i]])
+ }
It works?
An addition
Maybe this?
Add \t to our string.
test_str <- "for (i in seq_along(x)) {\n\tout[[i]] <- sum(x[[i]])\n}"
Because you want to see two spaces, let's do this:
> insertText(gsub('\\t',' ', test_str))
named list()
> for (i in seq_along(x)) {
+ out[[i]] <- sum(x[[i]])
+ }

Related

Loop in R with a function included

I'm very new at R and I would like to do a loop in order to return search volume (through an API call) for a list of keywords.
Here the code that I used :
install.packages("SEMrushR")
library(SEMrushR)
mes_keywords_to_check <- readLines("voyage.txt") # List of keywords to check
mes_keywords_to_check <- as.character(mes_keywords_to_check)
Loop
for (i in 1:length(mes_keywords_to_check)) {
test_keyword <- as.character(mes_keywords_to_check[i])
df_test_2 <- keyword_overview_all(test_keyword, "fr","API KEY NUMBER") ##keyword_overview_all is the function from the Semrush package
}
By doing this, I only get the Search Volume for the first keyword in the list. My purpose if of course to get the date required for the full list of keywords.
Here is the table that I get:
enter image description here
Do you have any idea how I could solve this issue?
Well, you need to add your results to some kind of container. for example to a list. As of now, you have just one object that gets filled with data from the most recent iteration of your loop.
results = list()
for (i in 1:length(mes_keywords_to_check)) {
test_keyword <- as.character(mes_keywords_to_check[i])
df_test_2 <- keyword_overview_all(test_keyword, "fr","API KEY NUMBER") ##keyword_overview_all is the function from the Semrush package
results[[i]] <- df_test_2
}
But, most R experts would suggest to refrain from using a loop
library("plyr")
result <- plyr::ldply(mes_keywords_to_check, function(x) keyword_overview_all(as.character(x), "fr","API KEY NUMBER"))
I did not test this, and it probably needs some tweaking, but it should point you in the right direction.
It looks like you're reading in the text file with readLines("voyage.txt") which will return a list of each line. These lines are then being passed to the for loop. The below will convert the lines to words. There are various approaches, but below uses a loop within a loop to keep using for() and in case you prefer to search line-by-line-word-by-word. It also uses a regex to split on non-alpha-numeric so that you omit words bounded by punctuation.
mes_lines <- readLines("voyage.txt") # List of keywords to check
mes_lines <- as.character(mes_lines)
search_results <- list()
for (i in 1:length(mes_lines)) {
mes_keywords_to_check <- unlist(strsplit(mes_lines,"[^[:alnum:]]"))
mes_keywords_to_check <- mes_keywords_to_check[nchar(mes_keywords_to_check)>0]
if (length(mes_keywords_to_check)==0) next
for (w in 1:length(mes_keywords_to_check))
{
test_keyword <- as.character(mes_keywords_to_check[w])
print(paste0("Checking word=",test_keyword))
df_test_2 <- keyword_overview_all(test_keyword, "fr","API KEY NUMBER") ##keyword_overview_all is the function from the Semrush package
search_results <- append(search_results,df_test_2)
}
}
search_results
Thanks for pointing to the right direction.
Here is what I did, and this is working:
final_result <- data.frame()
mes_keywords_to_check <- readLines("voyage.txt")
mes_keywords_to_check <- as.character(mes_keywords_to_check)
for (i in 1:length(mes_keywords_to_check)) {
test_keyword <- as.character(mes_keywords_to_check[i])
df_test_2 <- keyword_overview_all(test_keyword, "fr","API KEY")
final_result <- rbind(final_result,df_test_2)
}

How to get content or length of current console output?

I'm looking for the functions get_output_content or at least get_output_length below, that would tell me how many characters were printed in the console.
test <- function(){
cat("ab")
cat("\b")
cat("cd")
c <- get_output_content() # "acd" (I'd be happy with "ab\bcd" as well)
l <- get_output_length() # 3
return(list(c,l))
}
test()
In this example obviously I could easily count the characters in the input, but If I'm using other functions I may not. Can you help me build one or both of these functions ?
EDIT to clarify:
in my real situation, I cannot work upstream and count before, like in the proposed solutions, I need to count the displayed output at a given time without monitoring what's before.
here's a reproducible example looking more like what I want to achieve
library(pbapply)
my_files <- paste0(1000:1,".pdf")
work_on_pdf <- function(pdf_file){
Sys.sleep(0.001)
}
report <- pbsapply(my_files,work_on_pdf) # the simple version, but I want to add the pdf name next to the bar to have more info about my progress
# so I tried this but it's not satisfying because it "eats" some of the current output of pbapply
report <- pbsapply(my_files,function(x){
buffer_length <- 25
work_on_pdf(x)
catmsg <- paste0(c( # my additional message, which is in 3 parts:
rep("\b",buffer_length), # (1) eat 25 characters
x, # (2) print filename
rep(" ",buffer_length-nchar(x))), # (3) print spaces to cover up what may have been printed before
collapse="")
cat(catmsg)
})
if I was able to count what's in the console I could easily tweak my function to get something satisfying.
NEW EDIT : FYI solution to example but not to general question:
I could solve my precise issue with this, though it doesn't solve the general question, which is measuring the current output of the console when you don't have any other info.
library(pbapply)
my_files <- paste0(1000:1,".pdf")
work_on_pdf <- function(pdf_file){
Sys.sleep(0.01)
}
pbsapply2 <- function(X,FUN,FUN2){
# FUN2 will give the additional message
pbsapply(X,function(x){
msg <- FUN2(x)
cat(msg)
output <- FUN(x)
eraser <- paste0(c(
rep("\b",nchar(msg)), # go back to position before additional message
rep(" ",nchar(msg)), # cover with blank spaces
rep("\b",nchar(msg))), # go back again to initial position
collapse="")
cat(eraser)
return(output)
})
}
report <- pbsapply2(my_files,work_on_pdf,function(x) paste("filename:",x))
Something like this (?):
test <- function(){
c <- paste0(capture.output(cat("ab")),
capture.output(cat("\b")),
capture.output(cat("cd")))
n <- nchar(c)
l <- length(c)
return(list(c,n,l))
}
test()

Customizing R profile [duplicate]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I have always found startup profile files of other people both useful and instructive about the language. Moreover, while I have some customization for Bash and Vim, I have nothing for R.
For example, one thing I always wanted is different colors for input and output text in a window terminal, and maybe even syntax highlighting.
Here is mine. It won't help you with the coloring but I get that from ESS and Emacs...
options("width"=160) # wide display with multiple monitors
options("digits.secs"=3) # show sub-second time stamps
r <- getOption("repos") # hard code the US repo for CRAN
r["CRAN"] <- "http://cran.us.r-project.org"
options(repos = r)
rm(r)
## put something this is your .Rprofile to customize the defaults
setHook(packageEvent("grDevices", "onLoad"),
function(...) grDevices::X11.options(width=8, height=8,
xpos=0, pointsize=10,
#type="nbcairo")) # Cairo device
#type="cairo")) # other Cairo dev
type="xlib")) # old default
## from the AER book by Zeileis and Kleiber
options(prompt="R> ", digits=4, show.signif.stars=FALSE)
options("pdfviewer"="okular") # on Linux, use okular as the pdf viewer
I hate to type the full words 'head', 'summary', 'names' every time, so I use aliases.
You can put aliases into your .Rprofile file, but you have to use the full path to the function (e.g. utils::head) otherwise it won't work.
# aliases
s <- base::summary
h <- utils::head
n <- base::names
EDIT: to answer your question, you can use the colorout package to have different colors in the terminal. Cool! :-)
options(stringsAsFactors=FALSE)
Although I don't actually have that in my .Rprofile, because it might breaks my coauthors' code, I wish it was the default. Why?
1) Character vectors use less memory (but only barely);
2) More importantly, we would avoid problems such as:
> x <- factor(c("a","b","c"))
> x
[1] a b c
Levels: a b c
> x <- c(x, "d")
> x
[1] "1" "2" "3" "d"
and
> x <- factor(c("a","b","c"))
> x[1:2] <- c("c", "d")
Warning message:
In `[<-.factor`(`*tmp*`, 1:2, value = c("c", "d")) :
invalid factor level, NAs generated
Factors are great when you need them (e.g. implementing ordering in graphs) but a nuisance most of the time.
I like saving my R command history and having it available each time I run R:
In the shell or .bashrc:
export R_HISTFILE=~/.Rhistory
in .Rprofile:
.Last <- function() {
if (!any(commandArgs()=='--no-readline') && interactive()){
require(utils)
try(savehistory(Sys.getenv("R_HISTFILE")))
}
}
Here are two functions I find handy for working with windows.
The first converts the \s to /.
.repath <- function() {
cat('Paste windows file path and hit RETURN twice')
x <- scan(what = "")
xa <- gsub('\\\\', '/', x)
writeClipboard(paste(xa, collapse=" "))
cat('Here\'s your de-windowsified path. (It\'s also on the clipboard.)\n', xa, '\n')
}
The second opens the working directory in a new explorer window.
getw <- function() {
suppressWarnings(shell(paste("explorer", gsub('/', '\\\\', getwd()))))
}
Here's mine. I always use the main cran repository, and have code to make it easy to source in-development package code.
.First <- function() {
library(graphics)
options("repos" = c(CRAN = "http://cran.r-project.org/"))
options("device" = "quartz")
}
packages <- list(
"describedisplay" = "~/ggobi/describedisplay",
"linval" = "~/ggobi/linval",
"ggplot2" = "~/documents/ggplot/ggplot",
"qtpaint" = "~/documents/cranvas/qtpaint",
"tourr" = "~/documents/tour/tourr",
"tourrgui" = "~/documents/tour/tourr-gui",
"prodplot" = "~/documents/categorical-grammar"
)
l <- function(pkg) {
pkg <- tolower(deparse(substitute(pkg)))
if (is.null(packages[[pkg]])) {
path <- file.path("~/documents", pkg, pkg)
} else {
path <- packages[pkg]
}
source(file.path(path, "load.r"))
}
test <- function(path) {
path <- deparse(substitute(path))
source(file.path("~/documents", path, path, "test.r"))
}
I've got this, more dynamic trick to use full terminal width, which tries to read from the COLUMNS environment variable (on Linux):
tryCatch(
{options(
width = as.integer(Sys.getenv("COLUMNS")))},
error = function(err) {
write("Can't get your terminal width. Put ``export COLUMNS'' in your \
.bashrc. Or something. Setting width to 120 chars",
stderr());
options(width=120)}
)
This way R will use the full width even as you resize your terminal window.
Most of my personal functions and loaded libraries are in the Rfunctions.r script
source("c:\\data\\rprojects\\functions\\Rfunctions.r")
.First <- function(){
cat("\n Rrrr! The statistics program for Pirates !\n\n")
}
.Last <- function(){
cat("\n Rrrr! Avast Ye, YO HO!\n\n")
}
#===============================================================
# Tinn-R: necessary packages
#===============================================================
library(utils)
necessary = c('svIDE', 'svIO', 'svSocket', 'R2HTML')
if(!all(necessary %in% installed.packages()[, 'Package']))
install.packages(c('SciViews', 'R2HTML'), dep = T)
options(IDE = 'C:/Tinn-R/bin/Tinn-R.exe')
options(use.DDE = T)
library(svIDE)
library(svIO)
library(svSocket)
library(R2HTML)
guiDDEInstall()
shell(paste("mkdir C:\\data\\rplots\\plottemp", gsub('-','',Sys.Date()), sep=""))
pldir <- paste("C:\\data\\rplots\\plottemp", gsub('-','',Sys.Date()), sep="")
plot.str <-c('savePlot(paste(pldir,script,"\\BeachSurveyFreq.pdf",sep=""),type="pdf")')
Here's from my ~/.Rprofile, designed for Mac and Linux.
These make errors easier to see.
options(showWarnCalls=T, showErrorCalls=T)
I hate the CRAN menu choice, so set to a good one.
options(repos=c("http://cran.cnr.Berkeley.edu","http://cran.stat.ucla.edu"))
More history!
Sys.setenv(R_HISTSIZE='100000')
The following is for running on Mac OSX from the terminal (which I greatly prefer to R.app because it's more stable, and you can organize your work by directory; also make sure to get a good ~/.inputrc). By default, you get an X11 display, which doesn't look as nice; this instead gives a quartz display same as the GUI. The if statement is supposed to catch the case when you're running R from the terminal on Mac.
f = pipe("uname")
if (.Platform$GUI == "X11" && readLines(f)=="Darwin") {
# http://www.rforge.net/CarbonEL/
library("grDevices")
library("CarbonEL")
options(device='quartz')
Sys.unsetenv("DISPLAY")
}
close(f); rm(f)
And preload a few libraries,
library(plyr)
library(stringr)
library(RColorBrewer)
if (file.exists("~/util.r")) {
source("~/util.r")
}
where util.r is a random bag of stuff I use, under flux.
Also, since other people were mentioning console width, here's how I do it.
if ( (numcol <-Sys.getenv("COLUMNS")) != "") {
numcol = as.integer(numcol)
options(width= numcol - 1)
} else if (system("stty -a &>/dev/null") == 0) {
# mac specific? probably bad in the R GUI too.
numcol = as.integer(sub(".* ([0-9]+) column.*", "\\1", system("stty -a", intern=T)[1]))
if (numcol > 0)
options(width= numcol - 1 )
}
rm(numcol)
This actually isn't in .Rprofile because you have to re-run it every time you resize the terminal window. I have it in util.r then I just source it as necessary.
Here are mine:
.First <- function () {
options(device="quartz")
}
.Last <- function () {
if (!any(commandArgs() == '--no-readline') && interactive()) {
require(utils)
try(savehistory(Sys.getenv("R_HISTFILE")))
}
}
# Slightly more flexible than as.Date
# my.as.Date("2009-01-01") == my.as.Date(2009, 1, 1) == as.Date("2009-01-01")
my.as.Date <- function (a, b=NULL, c=NULL, ...) {
if (class(a) != "character")
return (as.Date(sprintf("%d-%02d-%02d", a, b, c)))
else
return (as.Date(a))
}
# Some useful aliases
cd <- setwd
pwd <- getwd
lss <- dir
asd <- my.as.Date # examples: asd("2009-01-01") == asd(2009, 1, 1) == as.Date("2009-01-01")
last <- function (x, n=1, ...) tail(x, n=n, ...)
# Set proxy for all web requests
Sys.setenv(http_proxy="http://192.168.0.200:80/")
# Search RPATH for file <fn>. If found, return full path to it
search.path <- function(fn,
paths = strsplit(chartr("\\", "/", Sys.getenv("RPATH")), split =
switch(.Platform$OS.type, windows = ";", ":"))[[1]]) {
for(d in paths)
if (file.exists(f <- file.path(d, fn)))
return(f)
return(NULL)
}
# If loading in an environment that doesn't respect my RPATH environment
# variable, set it here
if (Sys.getenv("RPATH") == "") {
Sys.setenv(RPATH=file.path(path.expand("~"), "Library", "R", "source"))
}
# Load commonly used functions
if (interactive())
source(search.path("afazio.r"))
# If no R_HISTFILE environment variable, set default
if (Sys.getenv("R_HISTFILE") == "") {
Sys.setenv(R_HISTFILE=file.path("~", ".Rhistory"))
}
# Override q() to not save by default.
# Same as saying q("no")
q <- function (save="no", ...) {
quit(save=save, ...)
}
# ---------- My Environments ----------
#
# Rather than starting R from within different directories, I prefer to
# switch my "environment" easily with these functions. An "environment" is
# simply a directory that contains analysis of a particular topic.
# Example usage:
# > load.env("markets") # Load US equity markets analysis environment
# > # ... edit some .r files in my environment
# > reload() # Re-source .r/.R files in my environment
#
# On next startup of R, I will automatically be placed into the last
# environment I entered
# My current environment
.curr.env = NULL
# File contains name of the last environment I entered
.last.env.file = file.path(path.expand("~"), ".Rlastenv")
# Parent directory where all of my "environment"s are contained
.parent.env.dir = file.path(path.expand("~"), "Analysis")
# Create parent directory if it doesn't already exist
if (!file.exists(.parent.env.dir))
dir.create(.parent.env.dir)
load.env <- function (string, save=TRUE) {
# Load all .r/.R files in <.parent.env.dir>/<string>/
cd(file.path(.parent.env.dir, string))
for (file in lss()) {
if (substr(file, nchar(file)-1, nchar(file)+1) %in% c(".r", ".R"))
source(file)
}
.curr.env <<- string
# Save current environment name to file
if (save == TRUE) writeLines(.curr.env, .last.env.file)
# Let user know environment switch was successful
print (paste(" -- in ", string, " environment -- "))
}
# "reload" current environment.
reload <- resource <- function () {
if (!is.null(.curr.env))
load.env(.curr.env, save=FALSE)
else
print (" -- not in environment -- ")
}
# On startup, go straight to the environment I was last working in
if (interactive() && file.exists(.last.env.file)) {
load.env(readLines(.last.env.file))
}
sink(file = 'R.log', split=T)
options(scipen=5)
.ls.objects <- function (pos = 1, pattern, order.by = "Size", decreasing=TRUE, head = TRUE, n = 10) {
# based on postings by Petr Pikal and David Hinds to the r-help list in 2004
# modified by: Dirk Eddelbuettel (http://stackoverflow.com/questions/1358003/tricks-to- manage-the-available-memory-in-an-r-session)
# I then gave it a few tweaks (show size as megabytes and use defaults that I like)
# a data frame of the objects and their associated storage needs.
napply <- function(names, fn) sapply(names, function(x)
fn(get(x, pos = pos)))
names <- ls(pos = pos, pattern = pattern)
obj.class <- napply(names, function(x) as.character(class(x))[1])
obj.mode <- napply(names, mode)
obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
obj.size <- napply(names, object.size) / 10^6 # megabytes
obj.dim <- t(napply(names, function(x)
as.numeric(dim(x))[1:2]))
vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
obj.dim[vec, 1] <- napply(names, length)[vec]
out <- data.frame(obj.type, obj.size, obj.dim)
names(out) <- c("Type", "Size", "Rows", "Columns")
out <- out[order(out[[order.by]], decreasing=decreasing), ]
if (head)
out <- head(out, n)
out
}
Make data.frames display somewhat like 'head', only without having to type 'head'
print.data.frame <- function(df) {
if (nrow(df) > 10) {
base::print.data.frame(head(df, 5))
cat("----\n")
base::print.data.frame(tail(df, 5))
} else {
base::print.data.frame(df)
}
}
(From How to make 'head' be applied automatically to output? )
I often have a chain of debug calls I need to call and uncommenting them can be very tedious. With the help of the SO community, I went for the following solution and inserted this into my .Rprofile.site. # BROWSER is there for my Eclipse Tasks so that I have an overview of browser calls in the Task View window.
# turn debugging on or off
# place "browser(expr = isTRUE(getOption("debug"))) # BROWSER" in your function
# and turn debugging on or off by bugon() or bugoff()
bugon <- function() options("debug" = TRUE)
bugoff <- function() options("debug" = FALSE) #pun intended
Mine is not too fancy:
# So the mac gui can find latex
Sys.setenv("PATH" = paste(Sys.getenv("PATH"),"/usr/texbin",sep=":"))
#Use last(x) instead of x[length(x)], works on matrices too
last <- function(x) { tail(x, n = 1) }
#For tikzDevice caching
options( tikzMetricsDictionary='/Users/cameron/.tikzMetricsDictionary' )
setwd("C://path//to//my//prefered//working//directory")
library("ggplot2")
library("RMySQL")
library("foreign")
answer <- readline("What database would you like to connect to? ")
con <- dbConnect(MySQL(),user="root",password="mypass", dbname=answer)
I do a lot of work from mysql databases, so connecting right away is a godsend. I only wish there was a way of listing the avaialble databases so I wouldn't have to remember all the different names.
Stephen Turner's post on .Rprofiles has several useful aliases and starter functions.
I find myself using his ht and hh often.
#ht==headtail, i.e., show the first and last 10 items of an object
ht <- function(d) rbind(head(d,10),tail(d,10))
# Show the first 5 rows and first 5 columns of a data frame or matrix
hh <- function(d) d[1:5,1:5]
Here's mine, including some of the mentioned ideas.
Two things you might want to look at:
.set.width() / w() update your print width to the one of the terminal. Unfortunately I did not find a way to do this automatically on terminal resize - R documentation mentions this is done by some R interpreters.
history is saved every time together with a timestamp and the working directory
.
.set.width <- function() {
cols <- as.integer(Sys.getenv("COLUMNS"))
if (is.na(cols) || cols > 10000 || cols < 10)
options(width=100)
options(width=cols)
}
.First <- function() {
options(digits.secs=3) # show sub-second time stamps
options(max.print=1000) # do not print more than 1000 lines
options("report" = c(CRAN="http://cran.at.r-project.org"))
options(prompt="R> ", digits=4, show.signif.stars=FALSE)
}
# aliases
w <- .set.width
.Last <- function() {
if (!any(commandArgs()=='--no-readline') && interactive()){
timestamp(,prefix=paste("##------ [",getwd(),"] ",sep=""))
try(savehistory("~/.Rhistory"))
}
}
I use the following to get cacheSweave (or pgfSweave) to work with the "Compile PDF" button in RStudio:
library(cacheSweave)
assignInNamespace("RweaveLatex", cacheSweave::cacheSweaveDriver, "utils")
Mine includes options(menu.graphics=FALSE) because I like to Disable/suppress tcltk popup for CRAN mirror selection in R.
Here's mine. Nothing too innovative. Thoughts on why particular choices:
I went with setting a default for stringsAsFactors because I find
it extremely draining to pass it as an argument each time I read a CSV in. That said, it has already caused me some minor vexation when using code written on my usual computer on a computer which did not have my .Rprofile. I'm keeping it, though, as the troubles it has caused pale in comparison to the troubles not having it set everyday used to cause.
If you don't load the utils package before options(error=recover), it cannot find recover when placed inside an interactive() block.
I used .db for my dropbox setting rather than options(dropbox=...) because I use it all the time inside file.path and it saves much typing. The leading . keeps it from appearing with ls().
Without further ado:
if(interactive()) {
options(stringsAsFactors=FALSE)
options(max.print=50)
options(repos="http://cran.mirrors.hoobly.com")
}
.db <- "~/Dropbox"
# `=` <- function(...) stop("Assignment by = disabled, use <- instead")
options(BingMapsKey="blahblahblah") # Used by taRifx.geo::geocode()
.First <- function() {
if(interactive()) {
require(functional)
require(taRifx)
require(taRifx.geo)
require(ggplot2)
require(foreign)
require(R.utils)
require(stringr)
require(reshape2)
require(devtools)
require(codetools)
require(testthat)
require(utils)
options(error=recover)
}
}
Here's a little snippet for use exporting tables to LaTeX. It changes all the column names to math mode for the many reports I write. The rest of my .Rprofile is pretty standard and mostly covered above.
# Puts $dollar signs in front and behind all column names col_{sub} -> $col_{sub}$
amscols<-function(x){
colnames(x) <- paste("$", colnames(x), "$", sep = "")
x
}
I set my lattice color theme in my profile. Here are two other tweaks I use:
# Display working directory in the titlebar
# Note: This causes demo(graphics) to fail
utils::setWindowTitle(base::getwd())
utils::assignInNamespace("setwd",function(dir) {.Internal(setwd(dir));setWindowTitle(base::getwd())},"base")
# Don't print more than 1000 lines
options(max.print=2000)
I have an environment variable R_USER_WORKSPACE which points to the top directory of my packages. In .Rprofile I define a function devlib which sets the working directory (so that data() works) and sources all .R files in the R subdirectory. It is quite similar to Hadley's l() function above.
devlib <- function(pkg) {
setwd(file.path(Sys.getenv("R_USER_WORKSPACE", "."), deparse(substitute(pkg)), "dev"))
sapply(list.files("R", pattern=".r$", ignore.case=TRUE, full.names=TRUE), source)
invisible(NULL)
}
.First <- function() {
setwd(Sys.getenv("R_USER_WORKSPACE", "."))
options("repos" = c(CRAN = "http://mirrors.softliste.de/cran/", CRANextra="http://www.stats.ox.ac.uk/pub/RWin"))
}
.Last <- function() update.packages(ask="graphics")
I found two functions really necessary: First when I have set debug() on several functions and I have resolved the bug, so I want to undebug() all functions - not one by one. The undebug_all() function added as the accepted answer here is the best.
Second, when I have defined many functions and I am looking for a specific variable name, it's hard to find it within all results of the the ls(), including the function names. The lsnofun() function posted here is really good.

How to put characters

I have a fasta format file where in i have to only keep those nodes whose length is less than 100. however, the problem i am currently facing is that i am able to separate the nodes but am not able to put the characters of each node in separate variable whose length i can then check and subsequently separate the requisite nodes from longer ones.
So what i mean is i am able to read the headings and separate nodes but how do i put the characters within each node in a variable.
This is a sample of my data
>NODE_1
GTTGGCCGAGCCCCAGGACGCGTGGTTGTTGAACCAGATCAGGTCCGGGCTCCACTGCAC
GTAGTCCTCGTTGGACAGCAGCGGGGCGTACGAGGCCAGCTTGACCACGTCGGCGTTGCG
CTCGAGCCGGTCATGAACGCGGCCTCGGCGAGGGCGTTCTTCCAGGCGTTGCCCTGGGAA
>NODE_2
CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTGGAGCGCACCGACCTGTCCA
CCGCGGACAAGGCCGGTTACCTGCACCGCTACATCGAGGCCAGCCGCATCGCGTTCGCGG
ACCGCGGGCGCTGGGTCGGCGACCCCGCCTTCGAGGACGTAC
>NODE_3
CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTGGAGCGCACCGACCTGTCCA
CCGCGGACAAGGCCGGTTACCTGCACCGCTACATCGAGGCCAGCCGCATCGCGTTCGCGG
ACCGCGGGCGCTGGGTCGGCGACCCCGCCTTCGAGGACGTACATCATTCCTTAATCTTCC
my code:
x <- readLines("1.fa", n = -1L, ok = TRUE, warn = TRUE)
for (i in 1:length(x)) {
if (substr(x[i],1,1)=='>') {
head <- c(head,x[i])
q <- x[i+1]
if (q=!0) {
contig <- c(contig,q)
print(contig)
contig.length <- c(contig.length, nchar(q))
} else {
break
}
} else {
z <- paste(z,x[i], sep=" ")
}
}
You should use BioConductor for that. You're actually trying to parse a FASTA-file to some kind of a list. Bioconductor has a simple function read.fasta() that does just that, and returns an object where you can get the lengths and so on. Learning bioconductor is definitely worth the hassle if you work with sequences.
To do it in base R, you'll need to work with lists, something like :
Split.Fasta <- function(x){
out <- list()
for(i in x){
if(substr(i,1,1)==">") {
name <- gsub(">","",i)
out[[name]] <- character(0)
} else if (grepl("\\w",i)){
out[[name]] <- paste(out[[name]],gsub("\\W","",i),sep="")
}
}
out
}
Which works like :
zz <- textConnection(">NODE_1
GTTGGCCGAGCCCCAGGACGCGTGGTTGTTGAACCAGATCAGGTCCGGGCTCCACTGCAC
GTAGTCCTCGTTGGACAGCAGCGGGGCGTACGAGGCCAGCTTGACCACGTCGGCGTTGCG
CTCGAGCCGGTCATGAACGCGGCCTCGGCGAGGGCGTTCTTCCAGGCGTTGCCCTGGGAA
>NODE_2
CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTGGAGCGCACCGACCTGTCCA
CCGCGGACAAGGCCGGTTACCTGCACCGCTACATCGAGGCCAGCCGCATCGCGTTCGCGG
ACCGCGGGCGCTGGGTCGGCGACCCCGCCTTCGAGGACGTAC
>NODE_3
CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTGGAGCGCACCGACCTGTCCA
CCGCGGACAAGGCCGGTTACCTGCACCGCTACATCGAGGCCAGCCGCATCGCGTTCGCGG
ACCGCGGGCGCTGGGTCGGCGACCCCGCCTTCGAGGACGTACATCATTCCTTAATCTTCC")
X <- readLines(zz,n=-1L,ok=TRUE,warn=TRUE)
close(zz)
Y <- Split.Fasta(X)
$`NODE_1 `
[1] "GTTGGCCGAGCCCCAGGACGCGTGGTTGTTGAACCAGATCA...
$`NODE_2 `
[1] "CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTGGAGC...
$`NODE_3 `
[1] "CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTGGAGCGCAC...
It returns a list which you can use later on to check lengths and so on :
sapply(Y,nchar)
NODE_1 NODE_2 NODE_3
180 162 180
Still, learn to use BioConductor, you'll thank yourself for that.
You could install the seqinr package, which has lots of methods for analysing sequence data.
install.packages("seqinr")
Next, read in your fasta file:
seqs <- read.fasta("myfile.fa")
And then, extract sequences from the list with length < 100:
seqs.small <- seqs[sapply(seqs, function(x) getLength(x) < 100)]
maybe assign would be helpful?
assign('NODE_1', 'GTTGG...')

Understanding how the device list is read

I ran into a rather annoying issue with the list of open devices, trying to construct a function that saves a number of graphs for a list. Say we have following data :
Alist <- list(
X1 = data.frame(X=rnorm(10),Y=1:10),
X2 = data.frame(X=rnorm(10),Y=1:10),
X3 = data.frame(X=rnorm(10),Y=1:10)
)
and following function :
myPlotFunc <- function(x,save=F){
fnames <- paste(names(x),"pdf",sep=".")
for(i in 1:length(x)){
if(save){
pdf(fnames[i])
on.exit(dev.off(),add=T)
}
plot(x[[i]])
}
fnames
}
If I run fnames <- myPlotFunc(Alist,save=T), everything behaves normally and I get 3 pdf files names X1.pdf to X3.pdf. That is, if there's no graphics window open. If there is, then one of the pdfs is not closed, and all subsequent plots are added to the pdf until I explicitly call dev.off() in the console. Like this :
plot(Alist[[1]])
fnames <- myPlotFunc(Alist,save=T)
myPlotFunc(Alist,save=F)
> dev.list()
pdf
4
If I add on.exit({print(dev.cur());dev.off()},add=T) , I get following output :
> fnames <- myPlotFunc(Alist,save=T)
pdf
5
windows
2
pdf
3
So apparently it takes the list from the bottom up again to close everything it meets. So if there is a graphics window open, that is the next "current" device. Meaning that the second-last opened pdf-connection will not be closed by dev.off() any more, as there will be one short in the on.exit call.
I got around it by changing my function to :
myPlotFunc <- function(x,save=F){
fnames <- paste(names(x),"pdf",sep=".")
devs <- NULL
on.exit(for(i in devs) dev.off(i), add=T)
for(i in 1:length(x)){
if(save){
pdf(fnames[i])
devs <- c(devs,dev.cur())
}
plot(x[[i]])
}
fnames
}
but this feels rather awkward. Is there something I'm missing here, or a better way of getting around this?
disclaimer :
In case you're not aware, run dev.off() after running the third code block. You can clean up easily by running unlink(fnames) when you're done.
How about making help function to do one plot:
myPlotFunc <- function(x, save=FALSE) {
fnames <- paste(names(x), "pdf", sep=".")
plot_one <- function(xx, fname, save=save) {
if (save) {
pdf(fname)
on.exit(dev.off())
}
plot(xx)
}
for (i in 1:length(x)) plot_one(x[[i]], fnames[i], save)
fnames
}
One drastic solution might be to use graphics.off() instead of trying to close devices your script opens. If this is just user code, then perhaps it doesn't matter if all graphics devics are closed upon exit?
Using this brutal approach seems to work:
myPlotFunc <- function(x,save = FALSE) {
fnames <- paste(names(x),"pdf",sep=".")
if(save)
on.exit(graphics.off(),add = TRUE)
for(i in 1:length(x)) {
if(save) {
pdf(fnames[i])
}
plot(x[[i]])
}
fnames
}
An alternative is just to list all devices when on.exit() gets called, select out the pdf ones and close them. This function implements this and seems to have the desired behaviour.
myPlotFunc2 <- function(x,save = FALSE) {
fnames <- paste(names(x), "pdf", sep=".")
if(save) {
on.exit(foo <- lapply(dev.list()[grepl("pdf", names(dev.list()))],
dev.off),
add = TRUE)
}
for(i in 1:length(x)) {
if(save) {
pdf(fnames[i])
}
plot(x[[i]])
}
fnames
}
It seems the lowest numbered device is the one that R activates after a call to dev.off(), and that will be the on-screen device in the setting you describe, and hence the behaviour your report.

Resources