I think this is an accurate title but feel free to change it if anyone thinks it can be worded better. I am running the following commands using data.table::fread.
fread("sed 's+0/0+0+g' R.test.txt > R.test.edit.txt")
fread("sed 's+0/1+1+g' R.test.edit.txt > R.test.edit2.txt")
fread("sed 's+1/1+2+g' R.test.edit2.txt > R.test.edit3.txt")
fread("sed 's+./.+0.01+g' R.test.edit3.txt > R.test.edit.final.txt")
After each line I get the following message
Warning messages:
1: In fread("sed 's+0/0+0+g' /R/R.test.small.txt > /R/R.test.edit.small.txt") :
File '/path/to/tmp/RtmpwqJu82/file7e7e250b96bf' has size 0. Returning a NULL data.table.
2: In fread("sed 's+0/1+1+g' /R/R.test.edit.small.txt > /R/R.test.edit2.small.txt") :
File '/path/to/tmp/RtmpwqJu82/file7e7e8456d82' has size 0. Returning a NULL data.table.
3: In fread("sed 's+1/1+2+g' /R/R.test.edit2.small.txt > /R/R.test.edit3.small.txt") :
File '/path/to/tmp/RtmpwqJu82/file7e7e3f96bc35' has size 0. Returning a NULL data.table.
4: In fread("sed 's+./.+0.01+g' /R/R.test.edit3.small.txt > /R/R.test.edit.final.small.txt") :
File '/path/to/tmp/RtmpwqJu82/file7e7e302a3cde' has size 0. Returning a NULL data.table.
So it is weird: on my laptop, fread creates all the files I need but gives that warning for each one. When I go to run the script on our cluster, the script crashes and gives the following message.
> fread("sed 's+0/0+0+g' /R/R.test.txt > /R/R.test.edit.txt")
Error in fread("sed 's+0/0+0+g' /R/R.test.txt > /R/R.test.edit.txt") :
File is empty: /dev/shm/file38d161d613c
Execution halted
I think it has to do with the message I get when I run the script on my laptop. I think it is a user issue, but maybe it is a bug. I was wondering if anyone had any ideas? I thought of a workaround using the following:
library(peakRAM)
start_time <- Sys.time()
print(start_time)
peakRAM(system(paste("sed 's+0/0+0+g' /R/R.test.txt > /R/R.test.edit.txt")),
system(paste("sed 's+0/1+1+g' /R/R.test.edit.txt > /R/R.test.edit2.txt")),
system(paste("sed 's+1/1+2+g' /R/R.test.edit2.txt > /R/R.test.edit3.txt")),
system(paste("sed 's+./.+0.01+g' /R/R.test.edit3.txt > /R/R.test.edit.final.txt")))
end_time <- Sys.time()
print(end_time)
And this works fine, so I don't think there's a problem with sed or anything like that. I am just wondering what I am doing wrong when I use fread.
Comments above are correct about what to do; I tried looking in the documentation for fread but didn't find anything helpful for you so I filed an issue to improve... thanks!
When you pass a terminal command to fread, it creates a tmp file for you automatically in the background. You can see the exact line here, stylized:
system(paste0('(', cmd, ') > ', tmpFile <- tempfile(tmpdir = tmpdir)))
Then fread is applied to that file. As mentioned, your command already redirects its output to R.test.edit.txt, so nothing is left to be written by the appended > tmpFile, and the resulting tmp file has size 0.
If you actually want to keep those intermediate files (e.g. R.test.edit.txt), you have two options: (1) first run the command yourself, e.g. system("sed 's+0/0+0+g' R.test.txt > R.test.edit.txt"), then run fread on the output file; or (2) [available on development version only for now; see Installation wiki] supply the tmpdir argument to fread and omit the > R.test.edit.txt part; fread will do the outputting itself for you.
If you don't actually care about the intermediate files, simply omit the > R.test.edit.txt part and fread should work as you were expecting, e.g.:
fread("sed 's+0/0+0+g' R.test.txt")
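Both options can be sketched end to end on a toy file. This is only an illustration, assuming a Unix-like system with sed on the PATH and a data.table version recent enough to have the cmd argument of fread; the temporary files and their contents are made up and stand in for R.test.txt and R.test.edit.txt.

```r
library(data.table)

# Hypothetical toy input standing in for R.test.txt
infile <- tempfile(fileext = ".txt")
writeLines(c("id\tgt", "1\t0/0", "2\t0/1"), infile)

# No intermediate file: leave out the redirect and let fread
# capture the command's output itself
dt <- fread(cmd = paste("sed 's+0/0+0+g'", shQuote(infile)))

# Keep the intermediate file: run sed with system(), then fread the result
outfile <- tempfile(fileext = ".txt")  # stands in for R.test.edit.txt
system(paste("sed 's+0/0+0+g'", shQuote(infile), ">", shQuote(outfile)))
dt2 <- fread(outfile)

print(dt)
print(dt2)
```

The point in both cases is that fread only ever sees output that actually reaches the file it reads; a redirect inside the command sends that output somewhere else.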
I have a use case where I need to execute bash commands inside an R program. I can send and verify that bash commands are being executed, but for reasons I do not understand, I can't set and use variables. To begin with, a simple command works fine:
$ R
...
> system("ls")
1a.csv 1b.csv 2.csv 3.csv
[1] 0
>
Now moving on to the problem. I've tried as many approaches as I could find, but none seem to work:
> system("TEST_VAR=\"test_val\"")
[1] 127
Warning message:
In system("TEST_VAR=\"test_val\"") : 'TEST_VAR="test_val"' not found
> system("bash -c 'REPORT_S3=report_s3_test_val'")
[1] 0
> system("echo ${REPORT_S3}")
$REPORT_S3
[1] 0
> system('TEST_VAR=test_var')
[1] 127
Warning message:
In system("TEST_VAR=test_var") : 'TEST_VAR=test_var' not found
> Sys.setenv(TEST_VAR = "test_val")
> system("echo $TEST_VAR")
$TEST_VAR
[1] 0
> system("bash -c 'export TEST_VAR=\"test_val\"'")
[1] 0
> system(" echo ${TEST_VAR}")
$TEST_VAR
[1] 0
None of these attempts succeed.
What I need to do is set variables and subsequently use them to create successively more complex commands. This works fine in bash, but I can't seem to get it to work in R, apparently for the reasons above.
REPORT_S3="s3://xxxxxxxx-reports/r/html/"$RMD_FILE"_"$EMAIL_ADDRESS"_"$CLOUDWATCH_UUID".html"
PRESIGNED_URL=$(aws s3 presign --expires-in 604800 $REPORT_S3)
JSON_STRING='xxxxx"$CLOUDWATCH_UUID"xxxxxxx"$PRESIGNED_URL".....'
echo $JSON_STRING > message.json
echo '{"ToAddresses":["'$EMAIL_ADDRESS'"],"CcAddresses":[],"BccAddresses":[]}' > destination.json
aws ses send-email --from xx#yyyyyyyy.co.nz --destination file://destination.json --message file://message.json --region ap-southeast-2
Are there any other options for issuing bash commands, other than system, that would permit easier reuse of the original bash source code?
Environment variables are per process, and each system(..) call starts a new process. If you define and reference the variable in the same system call, it works fine:
> system('
var="foo"
echo "The variable is: $var"
')
The variable is: foo
If you put your entire script into a single system(..) call, instead of trying to run line by line, it should therefore work.
An alternative method is using Sys.setenv to set the variables in your current R process, so that future system() calls inherit it:
> Sys.setenv(var = "bar")
> system('echo "The variable is: $var"')
The variable is: bar
Obviously, since Sys.setenv is an R function, you must use R code to define your variables, and cannot rely on shell syntax like $(..).
PS: system() invokes sh and not bash, so all the code you pass it should be sh compatible.
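One way to stand in for $(..) with the Sys.setenv approach is to capture a command's output in R using system(..., intern = TRUE) and then export it. A minimal sketch, assuming a Unix-like shell; echo is only a placeholder for a real command such as aws s3 presign, and the variable names and URL are illustrative.

```r
# Capture the stdout of a shell command as an R character vector.
# echo is a placeholder for the real command (an assumption for this sketch).
presigned_url <- system("echo https://example.invalid/report", intern = TRUE)

# Export the value so that later system() calls inherit it
Sys.setenv(PRESIGNED_URL = presigned_url)

# A separate, later system() call can now see the variable
out <- system('echo "url=$PRESIGNED_URL"', intern = TRUE)
print(out)
```

The value lives in the R process environment, so it survives across system() calls, unlike a variable assigned inside one of them.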
This is likely a stupid question but I have not found a work around (at least in anything I have searched for, though I might just not be using the right search parameters.)
I want to call an executable in Windows, and send a file to it (in this case a Blaise man file), the name of which is variable in my script.
So, for example, I have
x<-2
myfile<-c(paste("FileNumber",x,".man", sep=""))
system("myapp.exe" myfile)
But I simply get
Error: unexpected symbol in "system("myapp.exe" myfile"
as if the command is not recognizing the object as myfile, instead taking "myfile" as literal text.
I tried using a paste function to create a whole line command, but that also did not work.
The system command will not concatenate the string and the myfile object for you; you have to build the full command string yourself.
So, try this instead:
x<-2
myfile<-c(paste("FileNumber",x,".man", sep=""))
cmd <- paste("myapp.exe", myfile)
system(cmd)
Or just:
x<-2
system(paste("myapp.exe", c(paste("FileNumber",x,".man", sep=""))))
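A possible variation, sketched under the assumption that myapp.exe is the same hypothetical executable: paste0() builds the file name without the sep argument, and system2() takes the program and its arguments separately, with shQuote() protecting file names that contain spaces.

```r
x <- 2

# paste0() concatenates without a separator
myfile <- paste0("FileNumber", x, ".man")
print(myfile)

# Only attempt the call when the (hypothetical) executable is on the PATH
if (nzchar(Sys.which("myapp.exe"))) {
  status <- system2("myapp.exe", args = shQuote(myfile))
  print(status)
}
```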
I have a lot of very large txt-files that have a header I would like to remove.
Is there a way to do this without reading in the entire file?
Possibly using the system() command?
I found some ideas here but I haven't been able to make them work. I am using Windows 7 and R Version 3.2.2.
Here is what I have tried:
> systCommand <- paste0("echo '$(tail -n +2 ", myFilePath, ")' > ", myFilePath)
> system(systCommand, intern=T)
Error in system(systCommand, intern = T) : 'echo' not found
I am pretty sure that this is because I am using Windows?
After reading the file in:
count_table <- read.table("your path/filename.txt")
head(count_table)
if the first row is the header, drop it:
c_table <- count_table[-1,]
head(c_table)
The header line is then removed. Note that this still reads the entire file into memory.
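For very large files, a rough alternative that avoids holding the whole file in memory, and does not depend on shell tools such as tail or echo being available on Windows, is to stream the file through R connections in chunks. This is only a sketch: strip_header and chunk_lines are hypothetical names, and it writes to a new file rather than editing in place.

```r
# Copy infile to outfile, skipping the first line, chunk_lines lines at a time
strip_header <- function(infile, outfile, chunk_lines = 100000L) {
  con_in <- file(infile, open = "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)

  readLines(con_in, n = 1L)  # read and discard the header line
  repeat {
    chunk <- readLines(con_in, n = chunk_lines)
    if (length(chunk) == 0L) break  # end of file reached
    writeLines(chunk, con_out)
  }
  invisible(outfile)
}

# Toy demonstration on a temporary file
demo_in <- tempfile(fileext = ".txt")
demo_out <- tempfile(fileext = ".txt")
writeLines(c("col1 col2", "1 2", "3 4"), demo_in)
strip_header(demo_in, demo_out, chunk_lines = 2L)
print(readLines(demo_out))
```

Only chunk_lines lines are in memory at any time, so the chunk size can be tuned to the available RAM.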
In R, I'm wondering if it's possible to temporarily redirect the output of the console to a variable?
p.s. There are a few examples on the web on how to use sink() to redirect the output into a filename, but none that I could find showing how to redirect into a variable.
p.p.s. The reason this is useful, in practice, is that I need to print out a portion of the default console output from some of the built-in functions in R.
I believe results <- capture.output(...) is what you need (i.e. using the default file=NULL argument). sink(textConnection("results")); ...; sink() should work as well, but as ?capture.output says, capture.output() is:
Related to ‘sink’ in the same way that ‘with’ is related to ‘attach’.
... which suggests that capture.output() will generally be better since it is more contained (i.e. you don't have to remember to terminate the sink()).
If you want to send the output of multiple statements to a variable you can wrap them in curly brackets {}, but if the block is sufficiently complex it might be better to use sink() (or make your code more modular by wrapping it in functions).
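A small sketch of both forms described above, using only base R; the variable names are illustrative.

```r
# Single expression: the printed console output becomes a character vector
results <- capture.output(print(1:3))
print(results)

# Several statements wrapped in curly brackets
block <- capture.output({
  print(1:3)
  cat("done\n")
})
print(block)
```

Each element of the result is one line of what would have appeared on the console, so a portion of the output can be picked out by indexing, e.g. block[1].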
For the record, it's indeed possible to store stdout in a variable with the help of a temporary connection without calling capture.output -- e.g. when you want to save both the results and stdout. Example:
Prepare the variable for the diverted R output:
> stdout <- vector('character')
> con <- textConnection('stdout', 'wr', local = TRUE)
Divert the output:
> sink(con)
Do some stuff:
> 1:10
End the diversion:
> sink()
Close the temporary connection:
> close(con)
Check results:
> stdout
[1] " [1]  1  2  3  4  5  6  7  8  9 10"
I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:
setkey(my.dt,my.column)
I receive the following cryptic error message:
"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"
I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.
Even more frustrating, the following code completes correctly:
first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt)]
setkey(second.dt,my.column)
I have no idea what could be going on here. Any tips?
Edit 1: I have confirmed every value in the key fits a fairly standard format:
> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE
Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.
> Sys.info()
sysname release version nodename machine login
"Windows" "Server 2008 x64" "build 7600" "WIN-9RH28AH0CKG" "x86-64" "Administrator"
user effective_user
"Administrator" "Administrator"
Please run :
sapply(my.dt, length)
I suspect that one or more columns have a different length to the first column, and that's an invalid data.table. It won't be one of the first 5 because your .Internal(inspect(my.dt)) (thanks) shows those and they're ok.
If so, there is this bug fix in v1.8.1 :
o rbind() of DT with an irregular list() now recycles the list items
correctly, #2003. Test added.
Any chance there's an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running the sapply(my.dt,length) to see where the invalidly lengthed column is being created. Armed with that we can make a work around and also fix the potential bug. Thanks.
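The length check can be narrowed down to the offending columns. A minimal sketch on a plain list with made-up columns; since a data.table is itself a list of columns, the same lines should apply with my.dt in place of cols.

```r
# Hypothetical columns; b is shorter than the others
cols <- list(a = 6:1, b = 4:1, c = letters[1:6])

# Length of every column
lens <- vapply(cols, length, integer(1))
print(lens)

# Names of the columns whose length differs from the first column
bad <- names(lens)[lens != lens[[1]]]
print(bad)
```

Running this after each step that builds the table shows where the irregular column first appears.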
EDIT :
The original cryptic error message is now improved in v1.8.1, as follows :
DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)
Error in setkeyv(x, cols, verbose = verbose) :
Column 2 is length 4 which differs from length of column 1 (6). Invalid
data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
not already reported and fixed, please report to datatable-help.
NB: This method of creating a data.table is not recommended because it lets you create an invalid data.table. Use it only if you are really sure the list is regular and you really do need speed (i.e. you want to avoid the checks that as.data.table() and data.table() do), or if you need to demonstrate an invalid data.table, as I'm doing here.