Conditional string generation in R (for loop + if/else)

Afraid I can't paste a code example as my dataset is sensitive.
After some issues with our source files, we realised that the allele coding in them is inconsistent and needs to be altered. The first step is dropping the redundant column value (sometimes it's REF, sometimes ALT1); the third value, A1, is always used. All three are characters, and POSITION is a string.
Given the number of rows involved I've tried to set up a loop as follows:
Go to next row
Concatenate new identifier using A1 and whichever of REF and ALT1 does not equal A1
Looks simple enough in theory, but it just won't behave; on inspection it appears to correctly catch the first instance of the first line but not the others.
Is there a glaring mistake I've made somewhere? Thanks.
# NOTE: reversed in order to match mapping file formatting (equiv. to REF_ALT)
for (i in 1:nrow(Chr1_results.dt)){
if(Chr1_results.dt[i,]$A1 != Chr1_results.dt[i,]$ALT1){
Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$ALT1, sep = "_")
} else{
Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$REF, sep = "_")
}
}
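One way to sidestep the row-by-row loop is a vectorised ifelse(). The following is only a minimal sketch on invented toy data: the data frame `results` and its values are assumptions standing in for the sensitive table, and it uses a plain data.frame rather than whatever class Chr1_results.dt actually has.

```r
# Toy data standing in for the real table; all values are invented.
results <- data.frame(
  ID   = c("rs1", "rs2", "rs3"),
  REF  = c("A", "G", "T"),
  ALT1 = c("C", "T", "G"),
  A1   = c("A", "T", "G"),
  stringsAsFactors = FALSE
)

# For every row at once, pick whichever of ALT1 / REF differs from A1.
other_allele <- ifelse(results$A1 != results$ALT1, results$ALT1, results$REF)

# Build the identifier as ID_A1_other for all rows in one call.
results$POSITION <- paste(results$ID, results$A1, other_allele, sep = "_")
print(results$POSITION)
```

Because paste() and ifelse() work on whole columns, no loop index is needed. If the table is a data.table, the same right-hand side could probably be assigned with := inside [ ], but that is untested here.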

Related

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked); however, my next step is to combine them into one big vector which holds each of these created variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can set up the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
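A possible approach, sketched under assumptions: instead of creating one variable per identifier with assign(), collect the results in a named list and flatten it. `get_result` below is a hypothetical placeholder for the elided function, and the identifier vector is shortened for illustration.

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical stand-in for the function elided in the question.
get_result <- function(id) paste0("result_for_", id)

# One list element per identifier, named the way the variables were named.
results <- lapply(url_num, get_result)
names(results) <- paste0("test_", url_num)

# Flatten the list into a single vector; it follows url_num automatically.
combined <- unlist(results)
print(combined)
```

Adding an identifier to url_num then changes `combined` without touching any other chunk. If the test_* variables already exist in the workspace, mget() on the pasted names might retrieve them in the same way, though the list approach avoids creating them at all.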

Adding new column and its value based on file name

I have 10 data files in my current directory such as data-01, data-02, data-03, data-04 till data-10.
Each of these data files has a few hundred rows with 4 fields. I would like to add a new column name "ID" and keep its ID like 01 (for data file data-01) for all the rows in that file.
A base R solution using a loop would go like this:
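df <- c()
# pattern is a regular expression, so match a literal ".csv" suffix
for (x in list.files(pattern = "\\.csv$")) {
  u <- read.table(x)    # adjust header/sep to match your files
  u$Label <- factor(x)  # tag every row with its source file name
  df <- rbind(df, u)
  cat(x, "\n")          # print each file name to help trace read problems
}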
This depends on your data files having the same number of columns (though you can get around that inside the loop by selecting which columns you need before rbind), and you can set the pattern to whichever file type you are looking at. The cat is useful because it lets you trace read problems (because there are always problems). I bet there is a better way to do this with apply as well.
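The apply-style variant hinted at could look roughly like this. It is a sketch under assumptions: the files are taken to be CSVs named like data-01.csv, the example files and their contents are invented, and `read_with_id` is a made-up helper name.

```r
# Create two small example files in a temporary directory (invented data).
demo_dir <- file.path(tempdir(), "id_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2, b = 3:4), file.path(demo_dir, "data-01.csv"), row.names = FALSE)
write.csv(data.frame(a = 5:6, b = 7:8), file.path(demo_dir, "data-02.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "^data-[0-9]+\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and tag its rows with the file's number.
read_with_id <- function(path) {
  u <- read.csv(path)
  # Pull the digits out of names like "data-01.csv", giving "01".
  u$ID <- sub("^data-([0-9]+)\\.csv$", "\\1", basename(path))
  u
}

# Read every file, then stack the pieces once at the end.
combined <- do.call(rbind, lapply(files, read_with_id))
print(combined)
```

Binding once at the end avoids growing the data frame inside a loop, and the ID stays a character string so the leading zero is kept.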

Naming columns in R string variables showing up blank

Apologies in advance if this question has been asked before, but I couldn't find anything on it.
Right now, I'm attempting to take certain columns from files and to name those columns after the files they came from. I've done it before, and I know it's not too difficult, but I am running into a lot of trouble. My code is as follows (allfiles is declared earlier in the code as all of the files in that directory):
makelist<-function(list_text){
if (list_text == "squared_median " || list_text == "squared_median_ranked"
|| list_text == "value_median " || list_text == "value_median_ranked")
metric = "median"
else
metric = "avg"
currfiles=allfiles[grepl(list_text,allfiles)]
currfile=currfiles[1]
currtable=read.table(currfile, header=T, sep='\t',stringsAsFactors = F)
a<-cbind(gene=currtable[,1],paste0(currfile)=currtable[,metric])
#col.name(a[,ncol(a)])<-currfile
#names(a)[ncol(a)]<-as.character(currfile)
for(currfile in currfiles[2:length(currfiles)])
{
currtable=read.table(currfile, header=T, sep='\t', stringsAsFactors=F)
if (length(currtable[,metric]) > length(a[,1]))
apply(a,2, function(x) length(x) = length(currtable[,metric]))
a=cbind(a, "gene"=currtable[,1],currfile=currtable[,metric])
#names(a)[ncol(a)]<-paste(currfile)
}
#names(a)=c("gene", currfiles[1], "gene", currfiles[2],"gene", currfiles[3],"gene", currfiles[4])
write.table(a, paste(output_folder, list_text,".txt"),sep='\t',quote=F,row.names=F)
}
Essentially, I'm passing in a string that is used to gather certain files from a directory. From there, the code grabs the median or average column from each file and names the column after the file from which it got that information. I've tried loads of different ways with no success. The commented-out lines are attempts that did not work: either they left the column name blank, or they named it the literal variable name "currfile" as opposed to the file name it contains. I've gone as far as individually renaming all of the columns with
names(a)=c("gene", currfiles[1], "gene", currfiles[2]...currfiles[n])
And that just names every other column currfiles.
Can you help me identify what's wrong? I've tried setting the name as get(currfile) too and that won't let me run the script.
These lines
#col.name(a[,ncol(a)])<-currfile
#names(a)[ncol(a)]<-as.character(currfile)
Have left me with blank column names.
** As an aside, the lines with the if statement concerning length are supposed to extend the length of each column to that of the longest column so far, but they don't seem to be working. That could be something else I'll read up about a bit more.
Thanks for your help,
Mike
To set column names of a table, you use colnames(table) (ref).
In your case, I'd expect colnames(a)[ncol(a)] <- currfile to do the trick, if I am understanding correctly that you want to name the last column of the table a with the string in variable currfile. (A negative index such as [-1] would instead select every column except the first.)
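To illustrate the naming behaviour, here is a small sketch with invented data (`genes`, `values`, and the file name are assumptions, not the asker's files). An argument name inside cbind() is taken literally, which would explain the column ending up called "currfile"; assigning through colnames() uses the value stored in the variable instead.

```r
# Invented stand-ins for one file's contents and its file name.
currfile <- "sample_A.txt"
genes <- c("g1", "g2")
values <- c(1.5, 2.5)

# The argument name is used as written, so this column is called "currfile".
a <- cbind(gene = genes, currfile = values)
literal_names <- colnames(a)
print(literal_names)

# Rename the last column using the string stored in the variable.
colnames(a)[ncol(a)] <- currfile
print(colnames(a))
```

Note that cbind() on plain vectors returns a matrix, so column names live in colnames() rather than names().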

R: Add paste() elements to file

I'm using base::paste in a for loop:
for (k in 1:length(summary$pro))
{
if (k == 1)
mp <- summary$pro[k]
else
mp <- paste(mp, summary$pro[k], sep = ",")
}
mp comes out as one big string, where the elements are separated by commas.
For example mp is "1,2,3,4,5,6"
Then, I want to put mp in a file, where each of its elements is added to a separate column in the same row. My code for this is:
write.table(mp, file = recompdatafile, sep = ",")
However, mp just appears in the CSV as one big string as opposed to being divided up. How can I achieve my desired format?
FYI
I've also tried converting mp to a list, and strsplit()-ing it, neither of which have worked.
Once I've added summary$pro to the file, how can I also add summary$me (which has the same format), in one row with multiple columns?
Thanks,
n.i.
If you want to write something to a file, write.table() isn't the only way. If you want to avoid headers and quotes and such, you can use the more direct cat. For example
cat(summary$pro, sep=",", file="filename.txt")
will write out the vector of values from summary$pro separated by commas more directly. You don't need to build a string first. (And building a string one element at a time as you did above is a bad practice anyway. Most functions in R can operate on an entire vector at a time, including paste).
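For the follow-up about adding summary$me as well, one option is to build one comma-separated string per vector and write them as lines. This is a sketch only: `summary_data` and its values are invented stand-ins for the asker's object, the output goes to a temporary file, and it assumes each vector should become its own row.

```r
# Invented stand-in for the asker's `summary` object.
summary_data <- list(pro = 1:6, me = c(10, 20, 30, 40, 50, 60))

out_file <- tempfile(fileext = ".csv")

# paste(collapse = ) joins a whole vector at once: one string per output row.
rows <- c(
  paste(summary_data$pro, collapse = ","),
  paste(summary_data$me, collapse = ",")
)

# Each element of `rows` becomes one line, so each value lands in its own column.
writeLines(rows, out_file)
print(readLines(out_file))
```

Unlike cat() with sep = ",", writeLines() ends each row with a newline, which keeps the two vectors on separate lines of the file.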

span across columns with hwrite

Is it possible to span a heading across multiple columns with hwrite (or any other HTML-creating package)? I can sort of fake it with dataframe pieces nested within a larger table, but it's not quite a real span (and it looks ugly).
I did not see a version of this in the examples, but maybe one exists elsewhere.
Thanks,
Tom
Edit: I should add that the print.xtable method does html, also (I shouldn't assume that is known). Use the type = "html" option.
No experience with html, but I do the following with LaTeX.
In the xtable package, the print.xtable method has an option add.to.row that allows you to do just that. For add.to.row you add a list-of-lists, where the first list is a list of row numbers and the second list is a list of commands to insert at that spot. From the ?print.xtable:
add.to.row -- a list of two components. The first component (which should be called 'pos') is a list contains the position of rows on which extra commands should be added at the end, The second component (which should be called 'command') is a character vector of the same length of the first component which contains the command that should be added at the end of the specified rows. Default value is NULL, i.e. do not add commands.
For LaTeX I use the following homemade function, which adds a "(1)" above the coefficient and t-stat columns.
my.add.to.row <- function(x) {
first <- "\\hline \\multicolumn{1}{c}{} & "
middle <- paste(paste("\\multicolumn{2}{c}{(", seq(x), ")}", sep = ""), collapse = " & ")
last <- paste("\\\\ \\cline {", 2, "-", 1 + 2 * x, "}", collapse = "")
string <- paste(first, middle, last, collapse = "")
list(pos = list(-1), command = string)
}
HTH.
I can't see an obvious way of generating a table with headers that cross multiple columns. Here's a really awful hack that might solve your problem though.
Generate your table as normal.
In the source code for that page, the first row of the table will look something like
<td someattribute="somevalue">First column name</td><td someattribute="somevalue">Second column name</td>
You can read the file into R, either with htmlTreeParse from the XML package, or plain old readLines.
Now replace the offending bit of html with the correct value. The stringr package may well help here.
<td someattribute="somevalue" colspan="2">Column name spanning two columns</td>
And write back out to file.
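The read, replace, and write steps above could be sketched as follows, using base R's sub() rather than stringr. The HTML snippet, the class attribute, and the temporary file are invented for illustration; real hwrite output will differ, so the `old` string would need to be copied from the actual page source.

```r
# Invented table standing in for the generated HTML page.
html <- c(
  "<table>",
  "<tr><td class=\"h\">First column name</td><td class=\"h\">Second column name</td></tr>",
  "<tr><td>1</td><td>2</td></tr>",
  "</table>"
)
in_file <- tempfile(fileext = ".html")
writeLines(html, in_file)

# Read the page back in as plain lines.
page <- readLines(in_file)

old <- "<td class=\"h\">First column name</td><td class=\"h\">Second column name</td>"
new <- "<td class=\"h\" colspan=\"2\">Column name spanning two columns</td>"

# fixed = TRUE treats `old` as a literal string, not a regular expression.
page <- sub(old, new, page, fixed = TRUE)

# Write the patched page back out to the same file.
writeLines(page, in_file)
print(readLines(in_file))
```

Literal matching is brittle if the generated markup changes, which fits the "awful hack" caveat; parsing the page with an HTML parser would be sturdier but heavier.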
