Reading a string with special character in R

I have a little problem reading a file in R. In particular, I run a script that loads a file, say X, which stores one string per line. There are strings with special characters such as ' and therefore I get some errors.
I run the script from the command line as follows:
Rscript.exe MyScript.R "C:\X.txt"
The content of file X is, for instance:
I'll win a prize
I'll shutdown my pc
The MyScript.R script initially loads the file X.txt as follows
args <- commandArgs(TRUE)
args <- read.table(args[1], sep="\n")
and then uses it as follows:
print(nrow(args))
The previous line returns 0. However, if I remove the ' character from the two lines in file X.txt then everything works fine (i.e., the returned count is 2).
Any solution to handle this tricky input?

read.table is meant for reading structured data, i.e. data that is in the form of multiple fields per row. If you just want to read a bunch of strings, use readLines.
args <- readLines(args[1])
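If you would rather keep read.table, the underlying issue is that its default quote argument includes the single quote, so the apostrophe in I'll is treated as the start of a quoted field. A minimal sketch that disables quoting (and comment handling) might look like this:
args <- commandArgs(TRUE)
# quote = "" stops ' from being treated as a quoting character,
# comment.char = "" stops # from truncating lines
x <- read.table(args[1], sep = "\n", quote = "", comment.char = "",
                stringsAsFactors = FALSE)
print(nrow(x))  # should now report one row per line of X.txt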

Related

How to designate a variable within a string quote in R?

Hi Stack Overflow community, thank you so much in advance for any help on this issue.
I'm interested in writing a non-interactive Rscript that designates variables that are read in from the command line. For example:
Rscript ./script {1} {2}
The {1} is an argument I pass to the script and the {2} is an argument corresponding to the full path that points to a text file composed of one column, filled with numbers and names.
I wrote a small Rscript whose function is to read in the text file and delete the rows that contain a particular string. I want the string to correspond to a variable I designate at the beginning of the script and provide on the command line. So far, my script reads:
args = commandArgs(trailingOnly=TRUE)
POP = args[1]
FREQ = args[2]
# read the file in
POP.freq <- paste0("FREQ")
# write a function to remove the rows that hold the string "POP"
clean_names <- function(x, ...) {
  POP.freq[!POP.freq$POP == "POP", ]
}
The problem is that in the function where I designate the string to be removed, "POP", R reads this literally as "POP" and not as whatever argument I am trying to pass it (usually the name of a country, e.g. Germany). In bash, the $ designates the variable, but I am not sure what the equivalent is in R. How can I designate that POP is a variable within the string, but also let R know that it is a string I am trying to identify?
I would appreciate any advice on this matter!
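For what it is worth, the key distinction in R is that an unquoted POP refers to the variable's value, while "POP" in quotes is the literal text POP. A minimal sketch of that idea, assuming the input file is a single column of plain text and with the output file name cleaned.txt purely as a placeholder:
args <- commandArgs(trailingOnly = TRUE)
POP  <- args[1]   # e.g. "Germany"
FREQ <- args[2]   # path to the one-column input file
# Read the file named by FREQ, not the literal string "FREQ"
POP.freq <- readLines(FREQ)
# Keep the rows that do NOT contain the value held in POP;
# fixed = TRUE treats the country name as plain text, not a regex
cleaned <- POP.freq[!grepl(POP, POP.freq, fixed = TRUE)]
writeLines(cleaned, "cleaned.txt")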

R how to print different types of variables into one txt file

I have an lm model, a numeric vector from the vif function, and some character variables.
gvmodel<-gvlma(lmFit)
VIF<-sqrt(vif(lmFit))
Str1<-"Original R-square=".567"
Str2<-"Cross-validated R-Square=.123"
I would like to print the content of all of them into one single txt file. I tried cat and capture.output.
cat(gvmodel,VIF,Str1, file="E:/.../text.txt")
capture.output(paste(gvmodel,VIF,...,sep=""),file="E:/.../text.txt")
Obviously this did not work. Does anyone know how to print them into a single txt file? Thanks
sink() works. Open a sink with a txt file path and name, run any commands that produce text output, and their output will be captured in the txt file. Finally, call sink() again to close it.
sink("E:/.../Sink.txt")
gvmodel
sqrt(vif(lmFit))
sink()
In your code, gvmodel <- gvlma(lmFit) produces a gvlma object, which cannot be passed directly to cat for writing to a .txt file. You can first convert it to character using as.character():
gvmodel<-gvlma(lmFit)
gvmodel <- as.character(gvmodel)
The result can then be printed, for example with cat. So if you put all this together:
gvmodel<-gvlma(lmFit)
gvmodel <- as.character(gvmodel)
VIF<-sqrt(vif(lmFit))
Str1<-"Original R-square=.567"
Str2<-"Cross-validated R-Square=.123" # NB the corrected quotation marks
cat(gvmodel, VIF, Str1, Str2, file="yourfile.txt", sep="\t")
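Since the question also mentions capture.output, an alternative sketch (assuming gvmodel is still the gvlma object, i.e. without the as.character step) keeps the printed formatting of the gvlma and vif output; the file name is again just a placeholder:
# Capture the printed representation of each object as character lines
out <- c(capture.output(gvmodel),
         capture.output(VIF),
         Str1,
         Str2)
writeLines(out, "yourfile.txt")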

when passing argument to R script via command line string gets split at blank spaces

I am running an R script via the command line (which is initiated by a Jython script). I want to pass a file location to the R script; however, if the file location has spaces in it, the string gets split at those spaces into a list.
I can pass an argument and run the R script in Jython using:
cmd = '"C:\\Program Files\\R\\R-3.5.1\\bin\\Rscript"'+" "+'"C:\\filelocation"'+" "+"C:/filelocation"
Runtime.getRuntime().exec(cmd)
In R I then use the following to get the passed argument:
args <- commandArgs(trailingOnly = TRUE)
The problem I am having is that if there is a space in the name of the folder, the string gets split at the spaces. Therefore, instead of args being one string for the file location, it becomes a list split at every space. I can fix this problem in R using the following (to make the file location one string):
location <- paste(args, collapse = " ")
Is there a better solution to this problem?
Thanks for your time,
Mike
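As a stop-gap on the R side, it may also be worth failing fast when the re-joined path still does not resolve; a small sketch (the more robust fix is to quote the path so it arrives as a single argument on the caller's side):
args <- commandArgs(trailingOnly = TRUE)
# Re-join in case the caller did not quote the path and it was split on spaces
location <- paste(args, collapse = " ")
# Stop immediately with a clear error if the re-joined path does not exist
stopifnot(file.exists(location))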

Using com.opencsv.CSVReader on windows stops reading lines prematurely

I have two files that are identical except for the line ending codes. The one that uses the newline (Linux/Unix) character works (reads all 550 rows of data), and the one that uses carriage return and line feed (Windows) stops returning lines after reading 269 lines. In both cases the data is read correctly up to the point where they stop.
If I run dos2unix on the file that fails, the resulting file works.
I would like to be able to read CSV files regardless of their origin. If I could at least detect that the file is in the wrong format before reading part of the data, that would be helpful.
Even if I could tell at any time in the middle of reading the file that it was not going to work, I could output an error.
My current state of reading half the file and terminating with no error is dangerous.
The problem is that under the covers openCSV uses a BufferedReader, which reads a line from the stream until it gets to the system's line.separator.
If you know beforehand what the line separator of the file is, then in your application just do a System.setProperty("line.separator", newLine), where newLine is either "\n" or "\r\n" based on the file you are about to parse. Or you can pass that in as a parameter.
If you want to automatically detect the file's line ending, create a method that takes the file you want, creates a BufferedReader, and reads a single line. If the last character is a '\r', then your system uses "\n" but you want to set it to "\r\n". Else, if line.contains("\n") returns true, then you are on a system that uses "\r\n" and you want to set it to "\n". Otherwise the system and the file you are reading have compatible line feed characters.
Just note that if you do change the system line feed character, be sure to set it back after processing the file, in case your program is processing multiple files.

How can I insert a column in numeric comma separated input?

Hi, I have a text file as below
input
326783,326784,402
326783,0326784,402
503534,503535,403
503534,0503535,403
429759,429758,404
429759,0429758,404
409626,409627,405
409626,0409627,405
369917,369916,402
369917,0369916,403
I want to convert it like below.
Conditions:
1) In the input file, column 3 and column 1 should be the same for 326784 and 0326784, and so on
2) If it is different, as in the last case of the input file above, then it should be printed on the last line
Output should be:
326783,326784,0326784,402
503534,503535,0503535,403
429759,429758,0429758,404
409626,409627,0409627,405
369917,369916,402
369917,0369916,403
I am using the Solaris platform.
Please help me.
I don't understand the logic of your computation, but some general advice: the Unix tool awk can do such computations. It understands comma-separated files and you can get it to output other comma-separated files, manipulated by your logic (which you'll have to express in awk syntax).
This is, as I understand it, the Unix way to do it.
The way I'd do it (being a non-expert on awk and just mentioning it for completeness ;) would be to write a little python script.
you want to
open an input and an output file
get each line from the input file
parse the integers
perform your logic
write integers to your output file
unchecked python-like code:
f_in = open("input", "r")
f_out = open("output", "w")
for line in f_in.readlines():
    ints = [int(x) for x in line.split(",")]
    f_out.write("%d, %d, %d\n" % (ints[0], ints[1], ints[0] + ints[1]))
f_in.close()
f_out.close()
Here, the logic is in the f_out.write(...) line (this example would output the first, the second and the sum of both input integers)
You can check whether you have a Python interpreter at hand by simply typing python and seeing what happens. If you have one, save your code into something.py and start it with "python something.py".
