I working in R (on a Windows OS) attempting to count the number of words in the text file without loading the file into memory. The idea is to get some stats on the file size, line count, word count, etc. A call to R's system() function that uses find for the line count is not hard to come by:
How do I do a "word count" command in Windows Command Prompt
lineCount <- system(paste0('find /c /v "" ', path), intern = T)
The command that I'm trying to work with for the word count is a PowerShell command: Measure-Object. I can get the following code to run without throwing an error but it returns an incorrect count.
print(system2("Measure-Object", args = c('count_words.txt', '-Word')))
[1] 127
The file, count_words.txt has on the order of millions of words. I also tested it on a .txt file with far fewer words.
"There are seven words in this file."
But the count again is returned as 127.
print(system2("Measure-Object", args = c('seven_words.txt', '-Word')))
[1] 127
Does system2() recognize PowerShell commands? What is the correct syntax for a call to the function when using Measure-Object? Why is it returning the same value regardless of actual word count?
The issues -- overview
So, you have two issues going on here:
You aren't telling system2() to use powershell
You aren't using the right powershell syntax
The solution
command <- "Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word"
system2("powershell", args = command)
where you replace C:/Users/User/Documents/test2.txt with whatever the path to your file is. I created two .txt files, one with the text "There are seven words in this file." and the other with the text "But there are eight words in this file." I then ran the following in R:
command <- "Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word"
system2("powershell", args = command)
Lines Words Characters Property
----- ----- ---------- --------
7
command <- "Get-Content C:/Users/User/Documents/test2.txt | Measure-Object -Word"
system2("powershell", args = command)
Lines Words Characters Property
----- ----- ---------- --------
8
More explanation
From help("system2"):
system2 invokes the OS command specified by command.
One main issue is that Measure-Object isn't a system command -- it's a PowerShell command. The system command for PowerShell is powershell, which is what you need to invoke.
Then, further, you didn't quite have the right PowerShell syntax. If you take a look at the docs, you'll see the PowerShell command you really want is
Get-Content C:/Users/User/Documents/count_words.txt | Measure-Object -Word
(check out example three on the linked documentation).
Related
I'm facing a problem while trying to pass a variable value to a grep command.
In essence, I want to grep out the lines which match my pattern and the pattern is stored in a variable. I take in the input from the user, and parse through myfile and see if the pattern exists(no problem here).
If it exists I want to display the lines which have the pattern i.e grep it out.
My code:
if {$a==1} {
puts "serial number exists"
exec grep $sn myfile } else {
puts "serial number does not exist"}
My input: SN02
My result when I run grep in Shell terminal( grep "SN02" myfile):
serial number exists
SN02 xyz rtw 345
SN02 gfs rew 786
My result when I try to execute grep in Tcl script:
serial number exists
The lines which match the pattern are not displayed.
Your (horrible IMO) indentation is not actually the problem. The problem is that exec does not automatically print the output of the exec'ed command*.
You want puts [exec grep $sn myfile]
This is because the exec command is designed to allow the output to be captured in a variable (like set output [exec some command])
* in an interactive tclsh session, as a convenience, the result of commands is printed. Not so in a non-interactive script.
To follow up on the "horrible" comment, your original code has no visual cues about where the "true" block ends and where the "else" block begins. Due to Tcl's word-oriented nature, it pretty well mandates the one true brace style indentation style.
I have a CSV file with millions of rows. I would like to open a connection to the file and filter unnecessary rows before opening it in R. To detail, I would like to import every 30th row starting at the second row.
I am operating on a Windows machine. I know the following command achieves the desired result on an Apple; however, it doesn't work on my Windows machine.
awk 'BEGIN{i=0}{i++;if (i%30==2) print $1}' < test.csv
In R, if I ran this code on an Apple, I would get the desired result:
write.csv(1:100000, file = "test.csv")
file.pipe <- pipe("awk 'BEGIN{i=0}{i++;if (i%30==2) print $1}' < test.csv")
res <- read.csv(file.pipe)
Clearly, I know nothing about Windows CLI, so could someone translate this awk command to Windows language for me and possibly explain how the translation achieves the desired result?
Thanks in advance!
UPDATE:
So I have downloaded Git and have successfully been able to complete this task using Git command line, but I need to implement it in R because I have to do this task on thousands of files. Anyone know how to make R run this command through Git?
write.csv(1:100000, file = "test.csv")
file.pipe <- pipe("awk \"BEGIN{i=0}{i++;if (i%30==2) print $1}\" test.csv")
res <- read.csv(file.pipe)
On windows the programline of awk need to be surrounded by double quotes. They are escaped because of the other double quotes on the same line.
Also the '<' before the input file is not needed (an I doubt that it is needed on an Apple.)
I'm developing a bash program that execute a R oneliner command to convert a RMarkdown template into a HTML document.
This R oneliner command looks like:
R -e 'library(rmarkdown) ; rmarkdown::render( "template.Rmd", "html_document", output_file = "report.html", output_dir = "'${OUTDIR}'", params = list( param1 = "'${PARAM1}'", param2 = "'${PARAM2}'", ... ) )
I have a long list of parameters, let's say 10 to explain the problem, and it seems that the R or bash has a command line length limit.
When I execute the R oneliner with 10 parameters I obtain a error message like this:
WARNING: '-e library(rmarkdown)~+~;~+~rmarkdown::render(~+~"template.Rmd",~+~"html_document",~+~output_file~+~=~+~"report.html",~+~output_dir~+~=~+~"output/",~+~params~+~=~+~list(~+~param1~+~=~+~"param2", ...
Fatal error: you must specify '--save', '--no-save' or '--vanilla'
When I execute the R oneliner with 9 parameters it's ok (I tried different combinations to verify that the problem was not the last parameter).
When I execute the R oneliner with 10 parameters but with removing all spaces in it, it's ok too so I guess that R or bash use a command line length limit.
R -e 'library(rmarkdown);rmarkdown::render("template.Rmd","html_document",output_file="report.html",output_dir="'${OUTDIR}'",params=list(param1="'${PARAM1}'",param2="'${PARAM2}'",...))
Is it possible to increase this limit?
This will break a number of ways – including if your arguments have spaces or quotes in them.
Instead, try passing the values as arguments. Something like this should give you an idea how it works:
# create a script file
tee arguments.r << 'EOF'
argv <- commandArgs(trailingOnly=TRUE)
arg1 <- argv[1]
print(paste("Argument 1 was", arg1))
EOF
# set some values
param1="foo bar"
param2="baz"
# run the script with arguments
Rscript arguments.r "$param1" "$param2"
Expected output:
[1] "Argument 1 was foo bar"
Always quote your variables and always use lowercase variable names to avoid conflicts with system or application variables.
I am using the shell() command to generate pdf documents from .tex files within a function. This function sometimes gets ran multiple times with adjusted data and so will overwrite the documents. Of course, if the pdf file is open when the .tex file is ran, it generates an error saying it can't run the .tex file. So I want to know whether there are any R or Windows cmd commands which will check whether a file is open or not?
I'm not claiming this as a great solution: it is hacky but maybe it will do. You can make a copy of the file and try to overwrite your original file with it. If it fails, no harm is made. If it succeeds, you'll have modified the file's info (not the contents) but since your end goal is to overwrite it anyway I doubt it will be a huge problem. In either case, you'll be fixed about whether or not the file can be rewritten.
is.writeable <- function(f) {
tmp <- tempfile()
file.copy(f, tmp)
success <- file.copy(tmp, f)
return(success)
}
openfiles /query /v|(findstr /i /c:"C:\Users\David Candy\Documents\Super.xls"&&echo File is open||echo File isn't opened)
Output
592 David Candy 1756 EXCEL.EXE C:\Users\David Candy\Documents\Super.xls
File is open
Findstr returns 0 if found and 1+ if not found or error.
& seperates commands on a line.
&& executes this command only if previous command's errorlevel is 0.
|| (not used above) executes this command only if previous command's errorlevel is NOT 0
> output to a file
>> append output to a file
< input from a file
| output of one command into the input of another command
^ escapes any of the above, including itself, if needed to be passed to a program
" parameters with spaces must be enclosed in quotes
+ used with copy to concatinate files. E.G. copy file1+file2 newfile
, used with copy to indicate missing parameters. This updates the files modified date. E.G. copy /b file1,,
%variablename% a inbuilt or user set environmental variable
!variablename! a user set environmental variable expanded at execution time, turned with SelLocal EnableDelayedExpansion command
%<number> (%1) the nth command line parameter passed to a batch file. %0 is the batchfile's name.
%* (%*) the entire command line.
%<a letter> or %%<a letter> (%A or %%A) the variable in a for loop. Single % sign at command prompt and double % sign in a batch file.
.
--
I'm trying to run R from the command line using command line arguments. This includes passing in some filepaths as arguments for use inside the script. It all works most of the time, but sometimes the paths have spaces in and R doesn't understand.
I'm running something of the form:
R CMD BATCH --slave "--args inputfile='C:/Work/FolderWith SpaceInName/myinputfile.csv' outputfile='C:/Work/myoutputfile.csv'" RScript.r ROut.txt
And R throws out a file saying
Fatal error: cannot open file 'C:\Work\FolderWith': No such file or directory
So evidently my single quotes aren't enough to tell R to take everything inside the quotes as the argument value. I'm thinking this means I should find a way to delimit my --args using a comma, but I can't find a way to do this. I'm sure it's simple but I've not found anything in the documentation.
The current script is very basic:
ca = commandArgs(trailingOnly=TRUE)
eval(parse(text=ca))
tempdata = read.csv(inputFile)
tempdata$total = apply(tempdata[,4:18], 1, sum)
write.csv(tempdata, outputFile, row.names = FALSE)
In case it's relevant I'm using windows for this, but it seems like it's not a cmd prompt problem.
Using eval(parse()) is probably not the best and most efficient way to parse command line arguments. I recommend to use a package like the optparse to do the parsing for you. Parsing command line args has already been solved, no need to reimplement this. I could imagine that this solves your problems. Although, spaces in path names are a bad idea to begin with.
Alternatively, you could take a very simple approach and pass the arguments like this:
R CMD BATCH --slave arg1 arg2
Where you can retrieve them like:
ca = commandArgs(TRUE)
arg1 = ca[2]
arg2 = ca[3]
This avoids the eval(parse which I think is causing the issues. Finally, you could try and escape the space like this:
R CMD BATCH --slave "C:/spam\ bla"
You could also give Rscript a try, R CMD BATCH seems to be less favored than Rscript.
As an enhancement of #PaulHimestra answer here how you can use Rscript :
you create a launcher.bat ,
echo off
C:
PATH R_PATH;%path%
cd DEMO_PATH
Rscript youscript.R arg1 arg2
exit
with R_PATH something like C:/Program Files/R/R-version
There are many similarities with this post:
R command line passing a filename to script in arguments (Windows)
Also this post is very OS related. My answer applies only to Windows.
Probably what you are looking for is RScript.exe instead of R.exe. The latter has no problem with spaces: path\to\RScript "My script.r".
One boring thing may be searching or setting the path for RScript and doing this every time one updates R.
Among the convenience scripts I have in my search path, I wrote a little facility to run RScript without bothering with paths. Just in case it may be of interest for someone:
#echo off
setlocal
::Get change to file dir par (-CD must be 1st par)
::================================================
Set CHANGEDIR="F"
If /I %1 EQU -cd (
Set CHANGEDIR="T"
SHIFT
)
::No args given
::=============
If [%1] EQU [] GoTo :USAGE
::Get R path from registry
::========================
:: may check http://code.google.com/p/batchfiles for updates on R reg keys
Call :CHECKSET hklm\software\R-core\R InstallPath
Call :CHECKSET hklm\software\wow6432Node\r-core\r InstallPath
if not defined RINSTALLPATH echo "Error: R not found" & goto:EOF
::Detect filepath when arg not starting with "-"
::==============================================
::Note the space after ARGS down here!!!
Set ARGS=
:LOOP
if [%1]==[] (GoTo :ELOOP)
Set ARGS=%ARGS% %1
::Echo [%ARGS%]
Set THIS=%~1
if [%THIS:~0,1%] NEQ [-] (Set FPATH=%~dp1)
SHIFT
GoTo :LOOP
:ELOOP
::echo %FPATH%
::Run Rscript script, changing to its path if asked
::=================================================
If %CHANGEDIR%=="T" (CD %FPATH%)
Echo "%RINSTALLPATH%\bin\Rscript.exe" %ARGS%
"%RINSTALLPATH%\bin\Rscript.exe" %ARGS%
endlocal
:: ==== Subroutines ====
GoTo :EOF
:USAGE
Echo USAGE:
Echo R [-cd] [RScriptOptions] Script [ScriptArgs]
Echo.
Echo -cd changes to script dir. Must be first par.
Echo To get RScript help on options etc.:
Echo R --help
GoTo :EOF
:CHECKSET
if not defined RINSTALLPATH for /f "tokens=2*" %%a in ('reg query %1 /v %2 2^>NUL') do set RINSTALLPATH=%%~b
GoTo :EOF
The script prints the actual RScript invoking line, before running it.
Note that there is an added argument, -cd, to change automatically to the script directory. In fact it is not easy to guess the script path from inside R (and set it with setwd()), in order to call other scripts or read/write data files placed in the same path (or in a relative one).
This (-cd) might possibly make superfluous your other commandargs, as you may find convenient calling them straight from inside the script.