R programming, naming output file using variable - r

I would like to direct output to a file, using a write.csv statement. I am wanting to write 16 different output files, labeling each one with the extension 1 through 16.
Example as written now:
trackfilenums=1:16
for (i in trackfilenums){
calculations etc
write.csv(max.hsi, 'Severity_Index.csv', row.names=F)
}
I would like for the output csv files to be labeled 'Severity_Index_1.csv', 'Severity_Index_2.csv', etc. Not sure how to do this in R language.
Thanks!
Kimberly

You will want to use the paste command:
write.csv(max.hsi, paste0("Severity_Index_", i,".csv"), row.names=F)

Some people like to have file names like Name_01 Name_02 etc instead of Name_1 Name_2 etc. This may, for example, make the alphabetical order more reasonable: with some software, otherwise, 10 would come after 1, 20 after 2, etc.
This kind of numbering can be achieved with sprintf:
sprintf("Severity_Index_%02d.csv", 7)
The interesting part is %02d -- this says that i is an integer value (could actually use %02i as well) that will take at least 2 positions, and leading zero will be used if necessary.
# try also
sprintf("Severity_Index_%03d.csv", 7)
sprintf("Severity_Index_%2d.csv", 7)

To add to the other answers here, I find it's also a good idea to sanitise the pasted string to make sure it is ok for the file system. For that purpose I have the following function:
fsSafe <- function(string) {
safeString <- gsub("[^[:alnum:]]", "_", string)
safeString <- gsub("_+", "_", safeString)
safeString
}
This simply strips out all non-alphabetic and non-numeric characters and replacing them with an underscore.

Related

write_csv - Exporting trailing spaces (no elimination)

I am trying to export a table to CSV format, but one of my columns is special - it's like a number string except that the length of the string needs to be the same every time, so I add trailing spaces to shorter numbers to get it to a certain length (in this case I make it length 5).
library(dplyr)
library(readr)
df <- read.table(text="ID Something
22 Red
55555 Red
123 Blue
",header=T)
df <- mutate(df,ID=str_pad(ID,5,"right"," "))
df
ID Something
1 22 Red
2 55555 Red
3 123 Blue
Unfortunately, when I try to do write_csv somewhere, the trailing spaces disappear which is not good for what I want to use this for. I think it's because I am downloading the csv from the R server and then opening it in Excel, which messes around with the data. Any tips?
str_pad() appears to be a function from stringr package, which is not currently available for R 3.5.0 which I am using - this may be the cause of your issues as well. If it the function actually works for you, please ignore the next step and skip straight to my Excel comments below
Adding spaces. Here is how I have accomplished this task with base R
# a custom function to add arbitrary number of trailing spaces
SpaceAdd <- function(x, desiredLength = 5) {
additionalSpaces <- ifelse(nchar(x) < desiredLength,
paste(rep(" ", desiredLength - nchar(x)), collapse = ""), "")
paste(x, additionalSpaces, sep="")
}
# use the function on your df
df$ID <- mapply(df$ID, FUN = SpaceAdd)
# write csv normally
write.csv(df, "df.csv")
NOTE When you import to Excel, you should be using the 'import from text' wizard rather than just opening the .csv. This is because you need marking your 'ID' column as text in order to keep the spaces
NOTE 2 I have learned today, that having your first column named 'ID' might actually cause further problems with excel, since it may misinterpret the nature of the file, and treat it as SYLK file instead. So it may be best avoiding this column name if possible.
Here is a wiki tl;dr:
A commonly encountered (and spurious) 'occurrence' of the SYLK file happens when a comma-separated value (CSV) format is saved with an unquoted first field name of 'ID', that is the first two characters match the first two characters of the SYLK file format. Microsoft Excel (at least to Office 2016) will then emit misleading error messages relating to the format of the file, such as "The file you are trying to open, 'x.csv', is in a different format than specified by the file extension..."
details: https://en.wikipedia.org/wiki/SYmbolic_LinK_(SYLK)

How to modify i in an R loop?

I have several large R objects saved as .RData files: "this.RData", "that.RData", "andTheOther.RData" and so on. I don't have enough memory, so I want to load each in a loop, extract some rows, and unload it. However, once I load(i), I need to strip the ".RData" part of (i) before I can do anything with objects "this", "that", "andTheOther". I want to do the opposite of what is described in How to iterate over file names in a R script? How can I do that? Thx
Edit: I omitted to mention the files are not in the working directory and have a filepath as well. I came across Getting filename without extension in R and file_path_sans_ext takes out the extension but the rest of the path is still there.
Do you mean something like this?
i <- c("/path/to/this.RDat", "/another/path/to/that.RDat")
f <- gsub(".*/([^/]+)", "\\1", i)
f1 <- gsub("\\.RDat", "", f)
f1
[1] "this" "that"
On windows' paths you have to use "\\" instead of "/"
Edit: Explanation. Technically, these are called "regular
expressions" (regexps), not "patterns".
. any character
.* arbitrary number (including 0) of any kind of characters
.*/ arbitrary number of any kind of characters, followed by a
/
[^/] any character but not /
[^/]+ arbitrary number (1 or more) of any kind of characters,
but not /
( and ) enclose groups. You can use the groups when
replacing as \\1, \\2 etc.
So, look for any kind of character, followed by /, followed by
anything but not the path separator. Replace this with the "anything
but not separator".
There are many good tutorials for regexps, just look for it.
A simple way to do this using would be to extract the base name from the filepaths with base::basename() and then remove the file extension with tools::file_path_sans_ext().
paths_to_files <- c("./path/to/this.RData", "./another/path/to/that.RData")
tools::file_path_sans_ext(
basename(
paths_to_files
)
)
## Returns:
## [1] "this" "that"

grep-like function in R

I'm trying to write a program in R which would take in a .pdb file and give out a .xyz-file.
I'm having problems with erasing some rows that contain useless data. There are around 30-40 thousand rows, from which I would only need about 3000. The rows that contain the useful information start with the word "ATOM".
In unix terminal I would just use the command
grep ATOM < filename.pdb > newfile.xyz
but I have no idea how to achieve the same result with R.
Thank you for your help!
You should be able to use grep, and depending on your specific situation, perhaps substr.
For example
#Random string variable
stringVar <- c("abcdefg", "defg", "eff", "abc")
#find the location of variables starting with "abc"
abcLoc <- grep("abc", substr(stringVar, 1, 3))
#Extract "abc" instances
out <- stringVar[abcLoc]
out
Note that the substr part limits the search to only the first three characters of each element of stringVar (e.g., "abc", "def", etc.). This may not be strictly necessary but I've found it to be very useful at times. For example, if you had an element like "defabc" that you didn't want to include, using substr would ensure it wouldn't be "found" by grep.
Hope it's helpful.

R: Add paste() elements to file

I'm using base::paste in a for loop:
for (k in 1:length(summary$pro))
{
if (k == 1)
mp <- summary$pro[k]
else
mp <- paste(mp, summary$pro[k], sep = ",")
}
mp comes out as one big string, where the elements are separated by commas.
For example mp is "1,2,3,4,5,6"
Then, I want to put mp in a file, where each of its elements is added to a separate column in the same row. My code for this is:
write.table(mp, file = recompdatafile, sep = ",")
However, mp just appears in the CSV as one big string as opposed to being divided up. How can I achieve my desired format?
FYI
I've also tried converting mp to a list, and strsplit()-ing it, neither of which have worked.
Once I've added summary$pro to the file, how can I also add summary$me (which has the same format), in one row with multiple columns?
Thanks,
n.i.
If you want to write something to a file, write.table() isn't the only way. If you want to avoid headers and quotes and such, you can use the more direct cat. For example
cat(summary$pro, sep=",", file="filename.txt")
will write out the vector of values from summary$pro separated by commas more directly. You don't need to build a string first. (And building a string one element at a time as you did above is a bad practice anyway. Most functions in R can operate on an entire vector at a time, including paste).

Using lists in R

Sorry for possibly a complete noob question but I have just started programming with R today and I am stuck already.
I am reading some data from a file which is in the format.
3.482373 8.0093238198371388 47.393873
0.32 20.3131 31.313
What I want to do is split each line then deal with each of the individual numbers.
I have imported the stringr package and using
x = str_split(line, " ")
This produces a list which I would like to index but don't know how.
I have learnt that x[[1:2]] gets the second element but that is about it. Ideally I would like something like
x1 = x[1]
x2 = x[2]
x3 = x[3]
But can't find anyway of doing this.
Thanks in advance
By using unlist you will get a vector instead of a list of vectors, and you will then be able to index it directly :
R> unlist(str_split("foo bar baz", " "))
[1] "foo" "bar" "baz"
But maybe you should read your file directly from read.table or one of its variant ?
And if you are beginning with R, you really should read one of the introduction available if you want to understand subsetting, indexing, etc.
you can wrap your call to str_split with unlist to get the behavior you're looking for.
The usual way to get this in would be to import it into a dataframe (a special sort of list). If file name is "fil.dat"" and is in "C:/dir/"
dfrm <- read.table("C:/dir/fil.dat") # resist the temptation to use backslashes
dfrm[2,2] # would give you the second item on the second row.
By default the field separator in R is "white-space" and that seems to be what you have, so you do not need to supply a sep= argument and the read.table function will attempt to import as numeric. To be on the safe side, you might consider forcing that option with colClasses=rep("numeric", 3) because if it encounters a strange item (such as often produced by Excel dumps), you will get a factor variable and will probably not understand how to recover gracefully.

Resources