Is there a platform-independent newline constant in R? I'm used to C#, where Environment.NewLine returns \r\n on Windows and \n otherwise. Searching turned up nothing, but I assume there has to be something somewhere so that scripts can be platform independent.
Related question: Is there a way to detect the platform a script is running on? This could be useful to know for other reasons (which I haven't thought of yet).
EDIT: Here's why I'm asking. I'm downloading files from an FTP server and want to get the list of files so that I only download the ones that are on the server but don't exist locally. Here's how I'm getting the list of files:
library(RCurl)  # provides getURL()
filesonserver <- unlist(strsplit(getURL(basePath, ftp.use.epsv = FALSE, dirlistonly = TRUE), "\n"))
On Windows, the file names are separated by \r\n. On my Mac (where I'm currently working), they're separated by \n. I was looking for a way to make this platform independent. I haven't tried just splitting on \n on Windows, which might work. There might also be a way to get the list of files as a vector without having to split it, which would avoid the problem entirely...
The package tryCatchLog has a function determine.platform.NewLine():
https://cran.r-project.org/package=tryCatchLog
https://github.com/aryoda/tryCatchLog/blob/master/R/platform_newline.R
If you consistently use this string instead of a hard-coded "\n", your line breaks will work platform-independently.
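A minimal base-R sketch of the same idea, assuming that `.Platform$OS.type` is enough to decide which line ending to use. The helper name `platform_newline` is hypothetical and this is not tryCatchLog's actual implementation. `.Platform$OS.type` is "windows" on Windows and "unix" on Linux and macOS, which also answers the related question about detecting the platform; `Sys.info()[["sysname"]]` gives a finer-grained name such as "Linux" or "Darwin".

```r
# Hypothetical helper: choose a line ending based on the platform R runs on.
# .Platform$OS.type is "windows" on Windows and "unix" on Linux and macOS.
platform_newline <- function(os_type = .Platform$OS.type) {
  if (identical(os_type, "windows")) "\r\n" else "\n"
}

# Finer-grained platform name, e.g. "Windows", "Linux" or "Darwin" (macOS)
os_name <- Sys.info()[["sysname"]]

nl <- platform_newline()
# Build text with the platform's line ending instead of a hard-coded "\n"
text <- paste0("first line", nl, "second line", nl)
```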
The answer to the initial question appears to be that there isn't a newline constant like the one in C#. But it doesn't matter in my case, as the comments pointed out; it didn't occur to me until after I edited in the details that I probably didn't need to worry about it. Splitting by \n works fine on Windows, even though the file names in the string returned by getURL() are separated by \r\n.
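Splitting on "\n" alone leaves a trailing "\r" on each name when the listing uses "\r\n", which can make exact comparisons with local file names fail. A minimal sketch that accepts both line endings and then keeps only the files missing locally, assuming the listing has already been fetched as one string. The sample listing and local file names below are made up; in practice they would come from getURL() and list.files().

```r
# Split a directory listing into file names, accepting "\r\n" or "\n"
split_listing <- function(listing) {
  parts <- strsplit(listing, "\r?\n")[[1]]
  parts[nzchar(parts)]  # drop empty entries, e.g. from blank lines
}

# Made-up listing with Windows-style line endings
listing <- "a.csv\r\nb.csv\r\nc.csv\r\n"
files_on_server <- split_listing(listing)

# Made-up local contents; in practice e.g. list.files("local/dir")
local_files <- c("a.csv")

# Only the files that are on the server but not yet downloaded
to_download <- setdiff(files_on_server, local_files)
```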
I was teaching an online course and a student asked me why R uses only / and not \ in file paths with read.csv and other related functions. I tried looking at the documentation, but it didn't really mention anything about it. I'd never really thought about it because I use a Mac, and the default on Macs is \, but not so on Windows machines.
I'm not trained in computer science, so I'm afraid I was left a bit stumped by the question. Students always ask the darnedest things!
Interesting question.
First off, the "forward slash" / is actually more common, as it is used by Unix, Linux, and macOS.
Second, the "backward slash" \ is actually somewhat painful, as it is also an escape character. So whenever you want one, you need to type two in a string: "C:\\TEMP".
Third, R on Windows knows this and helps! You can use a forward slash wherever you would use a backward slash: "C:/TEMP" works the same!
Fourth, you can have R compute the path for you, and it will insert the separator: file.path("some", "dir").
So the short answer: R accepts both on Windows and lets you pick whichever you find easier. But remember to use two backslashes (unless you use raw strings, new in R 4.0.0, which I'll skip for now).
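A short illustration of the points above. The raw-string line assumes R 4.0.0 or later, and the path components are made up.

```r
# file.path() joins components with "/", which R accepts on every platform
p <- file.path("some", "dir", "data.csv")

# A backslash must be escaped in an ordinary string literal...
escaped <- "C:\\TEMP"
# ...or written inside a raw string (requires R >= 4.0.0)
raw_str <- r"(C:\TEMP)"

# Both literals contain a single backslash, so the string has 7 characters
n <- nchar(escaped)
```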
(Note: backslashes as directory folder separators on Macs are a recent innovation. See History of Mac folder separators.)
I think if you review the history (or look it up if you were not there when it occurred, as I was), you will find that Unix (which Linux copied completely) got there first. It preceded MS-DOS, the Mac, and, last of all, Windows. R is a work-alike clone of S, which, like Unix, was developed at Bell Labs.
The Mac originally used colons (:) as folder separators (and still won't accept them in file names) and switched to slashes sometime during its long transition to BSD Unix, which it licensed from AT&T.
Shouldn't the question be: why Microsoft chose to use a backslash?
I would like to write a line into a text file at a given position (i) without reading the file sequentially.
There is the base function writeLines, but I don't know how to insert the text at the position i given as a parameter.
Thanks
Dave
This is fundamentally impossible, and that has nothing to do with R: most (all common) filesystems do not support inserting or removing content in the middle of a file. The only supported operations are appending or truncating at the end, and R readily supports only appending (truncate() exists for file connections, but it is not implemented on all platforms).
The way virtually all software solves your problem is by reading the file, modifying it, and writing it back to disk. If you want to get fancy because the file is very large (at least on the order of hundreds of MiB), you can stream-edit the file: read a part, edit that part, and write it to a new file. Rinse and repeat.
Technical aside: there is one exception to the above with low-level file operations, since files are stored as non-contiguous "blocks". But even if R supported this, it wouldn't help you, since it doesn't permit byte-level or line-level granularity: blocks are typically at least 4 KiB in size.
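The read-modify-write approach described in this answer can be sketched in base R as follows, assuming the file fits comfortably in memory. `insert_line` is a hypothetical helper name; for very large files, the streaming variant described above would be needed instead.

```r
# Insert `text` so that it becomes line `i` of the file at `path`.
# Reads the whole file, modifies it in memory, and writes it back.
insert_line <- function(path, i, text) {
  lines <- readLines(path)
  if (i < 1 || i > length(lines) + 1) stop("position out of range")
  # append(x, values, after = k) places `values` after element k,
  # so after = i - 1 makes `text` line number i
  lines <- append(lines, text, after = i - 1)
  writeLines(lines, path)
}

# Usage on a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("one", "two", "three"), tmp)
insert_line(tmp, 2, "inserted")
result <- readLines(tmp)
```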
I have a flat file into which I need to insert a carriage return every 410 characters. I know this sounds weird, but for whatever reason my workplace was given several huge flat files from a clearinghouse, and I need to parse them.
Nothing separates what are supposed to be individual lines, but each one is exactly 410 characters long. So I can't even search for a delimiter and insert the line break there.
There are 21 files total, each about 12-13 MB.
I have asked for a CSV file, and they are unable to provide that.
I am trying to see whether Notepad++ can do a character count so that I can just hit Enter after every 410th character.
Also I am trying to see if I can do this in Java.
Any help you all can provide would be appreciated.
In Notepad++ you can search for the regular expression (.{410}) and replace it with \1\r (or \1\r\n for Windows-style line endings).
It has happened to me that Notepad++ swallowed some characters when doing regex-based search and replace operations in large files, so I would try this for one file, then remove all the carriage returns again and compare the result size to the original size, just to make sure that nothing got swallowed during the replace operation.
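If a scripted alternative to Notepad++ is preferred, here is a minimal sketch in R. It assumes a single-byte encoding such as ASCII (so 410 characters equal 410 bytes) and no existing line breaks in the files; `split_fixed`, `reflow_file` and the directory name in the usage comment are hypothetical. It also performs the size check suggested above: the records must add up to exactly the original length.

```r
# Cut one long string into consecutive records of `width` characters
split_fixed <- function(s, width = 410L) {
  n <- nchar(s)
  if (n == 0L) return(character(0))
  starts <- seq(1L, n, by = width)
  substring(s, starts, pmin(starts + width - 1L, n))
}

# Read a whole file as one string, split it, and write one record per line
reflow_file <- function(in_path, out_path, width = 410L) {
  s <- readChar(in_path, file.info(in_path)$size, useBytes = TRUE)
  records <- split_fixed(s, width)
  # Sanity check: nothing may be lost or added by the split
  stopifnot(sum(nchar(records)) == nchar(s))
  writeLines(records, out_path)
  invisible(length(records))
}

# Usage for all files in a (hypothetical) directory:
# for (f in list.files("flatfiles", full.names = TRUE)) reflow_file(f, paste0(f, ".out"))
```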
I connect to different types of computers every day. When I Telnet in, the first thing I do is run a command-line script that is about 1150 characters long. I have no problem with Linux-based systems, but if the system is Unix-based (i.e. IRIX), my command is truncated at ~256 characters.
The final result of the command is a data dump (the output of the commands) to the Telnet window. This data is then copied and pasted into a tool for analysis. Also, the command string being entered is a series of commands (mostly egreps) separated by semicolons, and together they make it very long.
I need to be able to enter all 1150 characters on the command line. The systems I access are not mine, so I need to be as benign as possible when interacting with them.
Your help is appreciated.
If it's a parameter list that's making the command that long, then xargs is your friend.
I'm not sure if this is the answer you're looking for, but as you stated in your comment, each of the commands is shorter than 256 characters. So you can break the commands up into 5-6 groups, being sure to split only at the semicolons (not at pipes). Then execute each group in sequence. It's more work if you're used to just copying and pasting, but not much if you already have the groups prepared in a text file.
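Preparing those groups can be automated. A minimal sketch in R, assuming that ";" only ever appears as a command separator (never inside quoted egrep patterns) and using 250 characters as a safety margin under the ~256-character limit; `group_commands` is a hypothetical helper. Each resulting element can then be pasted and run in sequence.

```r
# Split a long ";"-separated command line into groups of whole commands,
# each at most `limit` characters long. Assumes ";" is only ever used as a
# command separator (never inside quoted patterns).
group_commands <- function(cmd, limit = 250L) {
  parts <- trimws(strsplit(cmd, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  groups <- character(0)
  current <- ""
  for (p in parts) {
    if (nchar(p) > limit) stop("a single command exceeds the limit: ", p)
    candidate <- if (nzchar(current)) paste0(current, "; ", p) else p
    if (nchar(candidate) <= limit) {
      current <- candidate          # still fits: keep growing this group
    } else {
      groups <- c(groups, current)  # close the full group, start a new one
      current <- p
    }
  }
  if (nzchar(current)) groups <- c(groups, current)
  groups
}
```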
I'm using the diff command to compare two text files. They need to match exactly.
So I run diff:
diff binary.out binary.expected
(By the way, those files are NOT binary files. They are text files. I call them binary because that's the name of the project.)
and got
Binary files binary.out and binary.expected differ
When I use another diff tool, the smartest of all (a.k.a. a human), there's really no difference between the two files.
Does anyone happen to know what's going on here?
Thanks.
The GNU diffutils manual says the following about text vs. binary files:
diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.
Hence GNU diff has a rather permissive definition of what counts as text, and the --text option, which forces it to treat files as text, should seldom be needed.
Have you checked whether binary.out or binary.expected contains null characters? Which version of diff are you using?
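To check for null bytes the way the quoted heuristic describes, a minimal sketch in R. The 8192-byte window is an assumption, since the manual only says the number is system dependent, and `has_nul_bytes` and `nul_offsets` are hypothetical helper names. One common cause is a file saved as UTF-16, where each ASCII character is stored together with a NUL byte.

```r
# Read up to `n` bytes and report whether any of them is a NUL (0x00) byte,
# mirroring the text/binary heuristic quoted from the diffutils manual
has_nul_bytes <- function(path, n = 8192L) {
  bytes <- readBin(path, what = "raw", n = n)
  any(bytes == as.raw(0L))
}

# 0-based byte offsets of the NUL bytes within the first `n` bytes
nul_offsets <- function(path, n = 8192L) {
  bytes <- readBin(path, what = "raw", n = n)
  which(bytes == as.raw(0L)) - 1L
}

# Usage, e.g.: has_nul_bytes("binary.out"); nul_offsets("binary.expected")
```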
Make sure to ignore white space in the diff options (for GNU diff, -w or --ignore-all-space).
It may also be the encoding: a file saved as UTF-16 contains null bytes, which diff interprets as binary. See if your diff tool has an option to force text mode (GNU diff has -a/--text).