Reading files in R inside a bash loop - r

I want to run an R script for 23 chromosomes. The files I need to read are "chr_1.txt", "chr_2.txt", ..., "chr_23.txt".
So I have a bash file
#!/bin/bash
for chr in {1..23}; do \
sbatch torunR.sh "chr_$"
done
and another bash file (torunR.sh)
R CMD BATCH script.R
The problem is that I don't know how to read a different "chr_X.txt" file on each run in R (script.R).
I have tried chr_$ and chr_*, for example:
geno = read.table(file="chr_*.txt") but it didn't work.
Any ideas?? Thanks!!

I'm not an expert in bash scripting so I don't know the necessity of batching, but I would modify your R script to use a command line argument created within the loop.
Your R script would look like this:
## script.R
targetFile <- commandArgs(trailingOnly = TRUE)
# optional status message
cat(sprintf("Processing file %s\n", targetFile))
geno <- read.table(file = targetFile)
Then modify your bash script to be something along the lines of
#!/bin/bash
for chr in {1..23}; do \
Rscript script.R "chr_$chr.txt"
done
Result when running bash script (with optional status message):
$ ./bashScript.sh
Processing file chr_1.txt
Processing file chr_2.txt
Processing file chr_3.txt
Processing file chr_4.txt
Processing file chr_5.txt
Processing file chr_6.txt
Processing file chr_7.txt
Processing file chr_8.txt
Processing file chr_9.txt
Processing file chr_10.txt
Processing file chr_11.txt
Processing file chr_12.txt
Processing file chr_13.txt
Processing file chr_14.txt
Processing file chr_15.txt
Processing file chr_16.txt
Processing file chr_17.txt
Processing file chr_18.txt
Processing file chr_19.txt
Processing file chr_20.txt
Processing file chr_21.txt
Processing file chr_22.txt
Processing file chr_23.txt
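Since the question submits each chromosome through sbatch, the same idea should carry over if torunR.sh forwards its first argument to Rscript. Below is a minimal dry-run sketch of that loop, assuming the file names from the question. The helper chr_file and the RUNNER variable are hypothetical names introduced here so the commands can be printed and checked before anything is submitted.

```shell
#!/bin/bash
# chr_file is a hypothetical helper: it builds the input file name for one chromosome.
chr_file() {
  printf 'chr_%s.txt' "$1"
}

# RUNNER is a hypothetical switch: it defaults to "echo", so this is a dry run
# that only prints each command. Run with RUNNER="" to really call sbatch.
RUNNER="${RUNNER-echo}"

# seq is used instead of {1..23} so the loop also works in plain sh.
for chr in $(seq 1 23); do
  # torunR.sh is assumed to contain the line: Rscript script.R "$1"
  $RUNNER sbatch torunR.sh "$(chr_file "$chr")"
done
```

Run as is, this prints 23 lines such as sbatch torunR.sh chr_1.txt. Inside script.R, commandArgs(trailingOnly = TRUE) then receives the file name exactly as in the answer above.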

Related

Default Authorization Required response (401) - taskscheduleR

I'm trying to run a daily taskscheduleR script that pulls data into R from an API. It works when I run it as a one time task but for some reason it won't work as a daily task. I keep getting the following error in the log file:
<HEAD><TITLE>Authorization Required</TITLE></HEAD>
<BODY BGCOLOR=white FGCOLOR=black>
<H1>Authorization Required</H1><HR>
<FONT FACE=Helvetica,Arial>
<B>Description: Authorization is required for access to this proxy</B>
</FONT>
<HR>
<!-- default Authorization Required response (401) -->
Here's the code:
library(httr)
library(jsonlite)
library(tidyverse)
library(taskscheduleR)
# Url to feed into GET function
url<-"https://urldefense.com/v3/__http://files.airnowtech.org/airnow/yesterday/daily_data_v2.dat__;!!J30X0ZrnC1oQtbA!Yh5wIss-mzbpMRXugALJoWEKLKcg1-7VmERQwcx2ESK0PZpM5NWNml5s9MVgwHr5LD1i5w$ "
# Sends request to AirNow API to get access to data
my_raw_result<-httr::GET(url)
# Retrieve contents of a request
my_content<-httr::content(my_raw_result,as="text")
# Parse content into a dataframe
my_content_from_delim <- my_content %>% textConnection %>% readLines %>% read.delim(text = ., sep = "|",header = FALSE)
head(my_content_from_delim)
I have been using the RStudio add-in to create the task.
If you are trying to access this on a work computer, you may need to allow downloads from the URL. Open a browser, paste that URL, click 'allow downloads', then run the script.
I am not sure whether this solution will work for you, but it won't hurt to try. If the problem is related to the task scheduler, the following might work. However, if it is an authorization issue, you may need to get some help from IT at your workplace.
For the task scheduler issue, you can hand your script to the Windows Task Scheduler directly with a batch file and create a schedule for it.
To make it easy, you can use the following code. First, create a new folder and copy your R script into it. For the code below to work, name your R script My Script.r.
Then, in the same folder, create a batch file: copy the following code into Notepad and save it as Run R Script.bat.
cd %~dp0
"C:\PROGRA~1\R\R-40~1.0\bin\R.exe" -e "setwd(%~dp0)" CMD BATCH --vanilla --slave "%~dp0My Script.r" Log.txt
Here, cd %~dp0 sets the working directory of the batch file to the folder it is run from. "C:\PROGRA~1\R\R-40~1.0\bin\R.exe" specifies your R.exe; you may need to change the path to match your installation.
-e "setwd(%~dp0)" sets R's working directory to the folder that contains the batch file and the script.
"%~dp0My Script.r" Log.txt gives the path of the R script and the log file for the batch run.
Second, to create a daily schedule, create another batch file: copy and paste the following code into Notepad and save it as Daily Schedule.bat.
When you run Daily Schedule.bat, it creates a daily task that runs for the first time one minute later and then repeats every day at that same time.
@echo off
for /F "tokens=1*" %%A in ('
powershell -NoP -C "(Get-Date).AddMinutes(1).ToString('MM/dd/yyyy HH:mm:ss')"
') do (
Set "MyDate=%%A"
set "MyTime=%%B"
)
::Execute path to bat path
cd %~dp0
::Create Task
SchTasks /Create /SC DAILY /TN "MY R TASK" /TR "%~dp0Run R Script.bat" /sd %MyDate% /st %MyTime%
This code creates a task called "MY R TASK". To check that it is scheduled, run taskschd.msc from the Windows command prompt. This opens the Task Scheduler, where you can find your task. If you want to modify or delete the task, you can do it there; the program has a GUI that is easy to navigate.
For more details about the Task scheduler syntax, see the following link
If you have any questions, let me know.

Spaces in paths in batch mode R

I'm trying to get an R script to run from a batch file so it can be nice and clean for other users. Currently, you drag and drop a CSV file onto the batch file and it passes the file name to the R script for input.
When there's a space in the file path/name it works fine in RStudio but causes problems when I call it from the batch file: R then tries to open only the part of the path before the space.
I've tried to reformat the file path from within R by using shortPathName(inputPath) and by replacing spaces with "\ " but it doesn't seem to work.
At the moment, the script is launched with
"%~dp0\R-3.6.0\bin\R.exe" CMD BATCH "--args %~1" "%~dp0\Script.R"
with the script containing
args <- commandArgs(TRUE)
inputPath <- args[1]
inputPath <- shortPathName(inputPath)
inputData <- read.csv(inputPath)
It runs fine from within RStudio but crashes when launched from the batch producing this error message in the output file:
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'file path up to the space': No such file or directory
Execution halted
By no means an R expert, but I'd try
"%~dp0\R-3.6.0\bin\R.exe" CMD BATCH "--args %~s1" "%~dp0\Script.R"
The %~s1 should supply the short filename as the argument.
After trying several formulations of the batch file and some debugging, I found that the batch file was passing only the part of the file path before the space as the first argument.
After finding that running R in CMD BATCH mode is no longer advisable, I switched to Rscript:
"%~dp0\R-3.6.0\bin\Rscript.exe" --vanilla "%~dp0\Script.R" "%~1"
This passes the argument to R inside quotes, so the space is preserved.
Since v3.5.1, R accepts file paths with spaces.
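Why the quotes around the argument matter can also be illustrated outside Windows. The following is a sketch in a POSIX shell, not the batch syntax used in this thread, and it only mirrors the effect described above. count_args is a hypothetical helper and the path is made up.

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() {
  echo "$#"
}

# Made-up path containing spaces, standing in for the dropped CSV file.
input_path="/tmp/my data/input file.csv"

# Unquoted: the shell splits the value at each space into separate arguments.
unquoted=$(count_args $input_path)

# Quoted: the whole path arrives as a single argument, spaces included.
quoted=$(count_args "$input_path")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the unquoted form, the receiving program sees the path in pieces and its first argument ends at the first space, which matches the symptom in the error message.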

pass character "&" by args to R

The following R code is to be executed from the command prompt (Windows):
# Collect arguments
args <- commandArgs(TRUE)
archivo <- as.character(args[1])
cat(archivo)
like this
C:\Users\Owner\Desktop>Rscript prueba.r "hola&chao"
The problem is that the command prompt responds with
hola
'chao' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\Owner\Desktop>
I need R to accept the "&" as a character and print it all together in the cat().
What should I do?
Thanks
This isn't an R problem, but a shell problem. You actually executed
Rscript prueba.r hola&chao
instead of
Rscript prueba.r "hola&chao"
In the Windows command shell,
prog1 & prog2
executes both programs in sequence, which is why you see the output of
Rscript prueba.r hola
followed by the output of
chao
The other possibility is that Rscript is a buggy batch file.

Executing console command in R

I would like to execute this DOS command under R:
iconv -f ISO-8859-1 -t UTF-8 FileName.md > FileNameNew.md
The above command creates new file after transforming from ISO to UTF.
I have tried to execute this command, without success, using:
system(paste("iconv -f ISO-8859-1 -t UTF-8 FileName.md > FileNameNew.md", sep=""))
This gives me two types of errors:
Invalid argument
No such file or directory
I don't think the second one is the real issue: when I run the command from R it does execute and reads FileName.md, which means it found the file. I think the issue is just with the > and hence with how the command is formulated inside system(paste("")).
When I run this command directly in the console it works.
The problem is (most likely) simply where the R session is located. Check this by running getwd() in R and seeing whether it is the directory that contains the file. The paste part shouldn't be needed, as it is not really pasting anything (paste combines two or more strings, while there is only one string here).
Solve this by giving explicit full paths for the input and output files.
If you insist on using paste, you could use it for instance like this:
system(paste("iconv -f ISO-8859-1 -t UTF-8 ", getwd(), "/FileName.md > ",
getwd(), "/FileNameNew.md", sep=""))
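One way to rule out the working directory is to give iconv full paths for both files. The sketch below does that in a POSIX shell with a throwaway sample file; work_dir and the sample text are assumptions made for this example. On Windows, R's system() runs the program directly rather than through a command interpreter, so a > redirection inside the string may also need shell() instead of system(). That is worth checking if full paths alone do not help.

```shell
#!/bin/sh
# Work in a throwaway directory so the sketch does not touch real files.
work_dir=$(mktemp -d)

# Create a small ISO-8859-1 sample: "caf" followed by byte 0xE9 (octal 351),
# which is e-acute in Latin-1.
printf 'caf\351\n' > "$work_dir/FileName.md"

# Full paths for input and output, so the result does not depend on the
# directory the command happens to be started from.
iconv -f ISO-8859-1 -t UTF-8 "$work_dir/FileName.md" > "$work_dir/FileNameNew.md"

# Show the converted bytes in hex; the single 0xE9 byte becomes the
# two-byte UTF-8 sequence c3 a9.
od -An -tx1 "$work_dir/FileNameNew.md"
```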

Write errors to file

Is there a way to write my R errors to a file? I run R on bash via:
R --vanilla < myprogram > myprogram.out &
When my program encounters an error (not a syntax error; something like an illegal replacement), it stops, but the error message isn't written to the output file, so I don't know what the problem was. A lot of the time I am logged out of the server while it runs.
Thanks,
Josh
Use the R CMD BATCH <infile> <outfile> syntax instead.
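R prints error messages on standard error, and > redirects only standard output, which would explain why the error never reaches myprogram.out. R CMD BATCH writes both streams to the output file. With the original command line, appending 2>&1 should have the same effect. Below is a small shell sketch of the difference, where noisy_job is a hypothetical stand-in for the R run.

```shell
#!/bin/sh
# noisy_job is a hypothetical stand-in for the R run: one line of normal
# output on stdout and one error message on stderr.
noisy_job() {
  echo "normal output"
  echo "Error: something went wrong" >&2
}

log_dir=$(mktemp -d)

# Only stdout goes to the file. The error line would normally appear on the
# terminal; it is discarded here to keep the demo quiet.
noisy_job > "$log_dir/stdout_only.out" 2>/dev/null

# stdout and stderr both go to the file, as they would with:
#   R --vanilla < myprogram > myprogram.out 2>&1 &
noisy_job > "$log_dir/both.out" 2>&1

echo "errors logged with '>' alone: $(grep -c 'Error' "$log_dir/stdout_only.out")"
echo "errors logged with '2>&1':    $(grep -c 'Error' "$log_dir/both.out")"
```

The order matters: 2>&1 has to come after the > redirection so that standard error follows standard output into the file.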
