Naming columns in R string variables showing up blank - r

Apologies in advance if this question has been asked before, but I couldn't find anything on it.
Right now, I'm attempting to take certain columns from files and to name each column after the file it came from. I've done it before, and I know it's not too difficult, but I am running into a lot of trouble. My code is as follows (allfiles is declared earlier in the code as all of the files in that directory):
makelist<-function(list_text){
if (list_text == "squared_median " || list_text == "squared_median_ranked"
|| list_text == "value_median " || list_text == "value_median_ranked")
metric = "median"
else
metric = "avg"
currfiles=allfiles[grepl(list_text,allfiles)]
currfile=currfiles[1]
currtable=read.table(currfile, header=T, sep='\t',stringsAsFactors = F)
a<-cbind(gene=currtable[,1],paste0(currfile)=currtable[,metric])
#col.name(a[,ncol(a)])<-currfile
#names(a)[ncol(a)]<-as.character(currfile)
for(currfile in currfiles[2:length(currfiles)])
{
currtable=read.table(currfile, header=T, sep='\t', stringsAsFactors=F)
if (length(currtable[,metric]) > length(a[,1]))
apply(a,2, function(x) length(x) = length(currtable[,metric]))
a=cbind(a, "gene"=currtable[,1],currfile=currtable[,metric])
#names(a)[ncol(a)]<-paste(currfile)
}
#names(a)=c("gene", currfiles[1], "gene", currfiles[2],"gene", currfiles[3],"gene", currfiles[4])
write.table(a, paste(output_folder, list_text,".txt"),sep='\t',quote=F,row.names=F)
}
Essentially, I'm passing in a string that is used to gather certain files from a directory. From there, the code grabs the median or average column from each file and names the column after the file it came from. I've tried loads of different ways with no success. The commented-out lines are attempts that did not work: either they left the column name blank, or they named it the literal variable name "currfile" as opposed to the file name it contains. I've gone as far as individually renaming all of the columns with
names(a)=c("gene", currfiles[1], "gene", currfiles[2]...currfiles[n])
And that just names every other column currfiles.
Can you help me identify what's wrong? I've tried setting the name as get(currfile) too and that won't let me run the script.
These lines
#col.name(a[,ncol(a)])<-currfile
#names(a)[ncol(a)]<-as.character(currfile)
Have left me with blank column names.
** As an aside, the lines with the if statement concerning length are supposed to extend the length of each column to the latest longest column, but they don't seem to be working. That could be something else; I'll read up about it a bit more.
Thanks for your help,
Mike

To set the column names of a table, use colnames(table). Since a is built with cbind() from plain vectors, it is a matrix, which is why names(a) did not set the column names.
In your case, I'd expect colnames(a)[ncol(a)] <- currfile to do the trick, if I am understanding correctly that you want to name the last column of the table a with the string in the variable currfile.
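A minimal sketch of that renaming step, with made-up stand-ins for the tables read from disk. The file names and values here are hypothetical; only colnames(), ncol() and cbind() are taken from the discussion above.

```r
# Hypothetical file names standing in for entries of currfiles
file_a <- "sample_a.txt"
file_b <- "sample_b.txt"

# cbind() on plain vectors returns a matrix, so column names live in colnames()
a <- cbind(gene = c("g1", "g2"), c("1.5", "2.5"))
colnames(a)[ncol(a)] <- file_a   # name the last column after the first file

# Append the next file's columns, then rename the newly added last column
a <- cbind(a, gene = c("g1", "g2"), c("3.5", "4.5"))
colnames(a)[ncol(a)] <- file_b

print(colnames(a))
```

Writing currfile = ... inside cbind() uses the literal text currfile as the name, which is why renaming after the bind is the simpler route.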

Related

Conditional string generation in R (for loop + if/else)

Afraid I can't paste a code example as my dataset is sensitive.
After some issues with our source files, we realised that the source file is inconsistent in its allele coding and needs to be altered. The first step is dropping the redundant column value (sometimes it's REF, sometimes ALT1); the third value, A1, is always used. All three are characters, and POSITION is a string.
Given the number of rows involved I've tried to set up a loop as follows:
Go to next row
Concatenate new identifier using A1 and whichever of REF and ALT1 does not equal A1
It looks simple enough in theory, but it just won't behave; on inspection it appears to handle the first row correctly but not the others.
Is there a glaring mistake I've made somewhere? Thanks.
# NOTE: reversed in order to match mapping file formatting (equiv. to REF_ALT)
for (i in 1:nrow(Chr1_results.dt)){
if(Chr1_results.dt[i,]$A1 != Chr1_results.dt[i,]$ALT1){
Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$ALT1, sep = "_")
} else{
Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$REF, sep = "_")
}
}
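The two steps described above could possibly be done without a row-by-row loop. This is only a sketch under assumptions: it uses a plain data frame with invented values (the real data is not shown), keeps the column names ID, REF, ALT1, A1 and POSITION from the loop, and follows the loop's own rule for choosing between ALT1 and REF.

```r
# Invented example rows; the real Chr1_results.dt is not available
results <- data.frame(
  ID   = c("rs1", "rs2"),
  REF  = c("A", "G"),
  ALT1 = c("C", "T"),
  A1   = c("A", "T"),
  stringsAsFactors = FALSE
)

# Same rule as the loop: use ALT1 when it differs from A1, otherwise REF
other_allele <- ifelse(results$A1 != results$ALT1, results$ALT1, results$REF)

# Build the identifier for every row at once
results$POSITION <- paste(results$ID, results$A1, other_allele, sep = "_")

print(results$POSITION)
```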

How can I remove the first x characters of a column name across 200+ columns when the prefix is not the same length in every column

How can I remove a specific number of characters from the start of each of 200+ column names? For example, given "Q1: GOING OUT?" and "Q5: STATE, PROVINCE, COUNTY, ETC", I just want to remove the "Q1: " and the "Q5: ". I have looked around but haven't been able to find a way that doesn't involve renaming them manually. Are there any functions for this, or ways to do it through tidyverse? I have only been using R for 2 months.
I don't really have anything to show. I have considered using for loops and possibly using gsub or case_when, but don't really understand how to properly use them.
#probably not correctly written but tried to do it anyways
for ( x in x(0:length) and _:(length(CandyData)-1){
front -> substring(0:3)
back -> substring(4:length(CandyData))
print <- back
}
I don't really have any errors because I haven't been able to make it work properly.
Try this:
# Example names; if you already have a data frame, use col_all <- names(df) instead
col_all <- c("Q1:GOING OUT?", "Q2:STATE", "Q100:PROVINCE", "Q200:COUNTRY", "Q299:ID")
for (col in seq_along(col_all)) {  # iterate over the column names
  colname <- col_all[col]          # the name handled in this iteration
  # Position of the first ":" in the name (-1 if there is none)
  index1 <- regexpr(":", colname, fixed = TRUE)
  if (index1 > 0) {
    # Keep only the text after the colon
    col_all[col] <- substr(colname, index1 + 1, nchar(colname))
  }
}
names(df) <- col_all  # assign the cleaned names back to the data frame
The H 1 answer is still the best: sub() or gsub() functions will do the work. And do not fear the regex, it is a powerful tool in data management.
Here is the gsub version:
names(df) <- gsub("^.*:","",names(df))
It works this way: for each name, the pattern ^.*: matches everything from the start of the name up to and including the last ":", and gsub replaces that match with an empty string.
Remember to upvote H 1's solution in the comments.
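For names like the ones in the question, where a space follows the colon, a slightly different pattern may be closer to what was asked. This is a sketch, not the answerer's code: it uses sub() with a pattern that stops at the first colon and also removes any whitespace after it.

```r
# Column names taken from the question
old_names <- c("Q1: GOING OUT?", "Q5: STATE, PROVINCE, COUNTY, ETC")

# ^[^:]*:        everything up to and including the first colon
# [[:space:]]*   plus any whitespace that follows it
new_names <- sub("^[^:]*:[[:space:]]*", "", old_names)

print(new_names)
```

With a real data frame the same call would be applied as names(df) <- sub("^[^:]*:[[:space:]]*", "", names(df)).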

Adding new column and its value based on file name

I have 10 data files in my current directory, named data-01, data-02, data-03, data-04 and so on up to data-10.
Each of these data files has a few hundred rows with 4 fields. I would like to add a new column named "ID" to each file, holding that file's number (for example 01 for data file data-01) in every row.
A base R solution using a loop would go like this:
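df <- NULL
# pattern is a regular expression, not a glob; this matches names ending in .csv
for (x in list.files(pattern = "\\.csv$")) {
  u <- read.table(x)
  u$Label <- factor(x)  # label every row with the file it came from
  df <- rbind(df, u)
  cat(x, "\n")          # print each file name to help trace read problems
}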
This depends on your data files having the same number of columns (though you can get around that inside the loop by selecting the columns you need before rbind), and you can change the pattern to match whichever file type you are looking at. The cat is useful because it lets you trace read problems (because there are always problems). I bet there is a better way to do this with apply as well.
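One way the apply-style variant might look, as a sketch rather than a tested solution for the real files. It writes two tiny made-up files named like the ones in the question into a temporary directory, then uses lapply() and derives the ID from the file name with sub(). The file contents and the assumption of a header row are invented for the example.

```r
# Create two small hypothetical files named like those in the question
tmp <- tempfile("data_dir")
dir.create(tmp)
write.table(data.frame(a = 1:2, b = 3:4), file.path(tmp, "data-01"), row.names = FALSE)
write.table(data.frame(a = 5:6, b = 7:8), file.path(tmp, "data-02"), row.names = FALSE)

# Find the files and read each one, adding an ID column taken from its name
files <- list.files(tmp, pattern = "^data-[0-9]+$", full.names = TRUE)
tables <- lapply(files, function(f) {
  u <- read.table(f, header = TRUE)
  u$ID <- sub("^data-", "", basename(f))  # "data-01" becomes "01"
  u
})

# Stack the per-file tables into one data frame
combined <- do.call(rbind, tables)
print(combined)
```

Keeping the ID as a character string preserves the leading zero.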

How to search for a specific column name in data

So this is a bit strange, I am pretty new to R and facing this weird problem.
I have a data frame, and there is a column called SNDATE which is a combined value of two different columns.
I want to check if the data frame has a column named SN, if it doesn't I will split SNDATE to fill the SN column.
Here is the code
if(!('SN' %in% colnames(data))){
#do some splitting here
}
Funny thing is, it keeps saying it's there, and the stuff in it never gets triggered.
And when I do this:
print(data$SN)
It will print the value of data$SNDATE. So does R have some sort of lazy name filling or something? This is very strange to me.
Thank you very much for the help
When you do
print(data$SN)
it works because $ is using partial name matching. For another example, try
mtcars$m
There is no column named m, so $ partially matches mpg. Unfortunately, this is not used in %in%, so you will need to use the complete exact column name in
if(!('SNDATE' %in% colnames(data))){
#do some splitting here
}
You could instead use something along the lines of pmatch()
names(mtcars)[2] <- "SNDATE"
names(mtcars)[pmatch("SN", names(mtcars))]
# [1] "SNDATE"
So the if() statement might go something like this -
nm <- colnames(data)
if(!nm[pmatch("SN", nm)] %in% nm) {
...
}
Or even
if(is.na(pmatch("SN", names(data))))
might be better.
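The difference between partial and exact matching can be seen with the built-in mtcars data, as a small sketch. $ matches partially, while [[ ]] and %in% require the full name.

```r
# $ uses partial matching: "m" uniquely matches the column "mpg"
partial <- mtcars$m

# [[ ]] matches exactly by default, so there is no column called "m"
exact <- mtcars[["m"]]

# %in% also compares complete names
has_m <- "m" %in% names(mtcars)

print(is.null(exact))
print(has_m)
```

Using [[ ]] or %in% when testing for a column avoids being surprised by a longer name that happens to start with the same letters.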

how to prompt user to remove multiple columns using the readline() in R

I am trying to write a code that allows the user to decide how many columns to remove from a table in R. The steps I am trying to perform are as follows:
1) print the column headers of the table
2) ask the user if they want to remove any columns. If the answer is yes, proceed to remove columns. This is in a loop, in case the user wants to remove multiple columns.
3) once the user is done removing columns, I want the modified table (with unwanted columns removed) to be returned so that it can be used later in script.
4) if the user does not want to remove any columns at all, they can just proceed, and the table is returned with no columns missing.
I am having 2 major issues/questions with my code as I currently have it:
1) The loop only works once (only one column is removed). The loop itself does run (it keeps prompting me if I keep answering "Y"), but in the end the returned object only has 1 column removed (the first column I removed when the loop began). I tried to find out whether there is a way to have the user enter multiple inputs using readline, but the answers I found did not really help me.
2) If I don't want to remove any columns, and I enter "no" the first time I'm prompted for input, something very strange happens: what is returned is a table with the first column removed.
I am still a newbie at coding, and I realize this may not be the best way to do what I want to do. I appreciate any advice/feedback!
my_data<-read.table(file.choose(),header=TRUE)
print(names(my_data)
for (column in my_data) {
remove_columns<-readline("Would you like to remove any columns? \n")
if(remove_columns=="Y" || remove_columns=="y") {
my_data_new<-my_data[,-!names(my_data) %in% c(readline("Which columns would you like to remove? \n"))]
} else {
return(my_data_new)
}}
I think you're looking for a while loop
my_data <- read.table(file.choose(), header = TRUE)
print(names(my_data))
while (TRUE) {
  remove_columns <- readline("Would you like to remove any columns? \n")
  if (remove_columns == "Y" || remove_columns == "y") {
    to_remove <- readline("Which column would you like to remove? \n")
    # Keep every column whose name was not entered; a logical index needs no minus sign
    my_data <- my_data[, !names(my_data) %in% to_remove, drop = FALSE]
  } else {
    break
  }
}
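The column-removal step can also be tried without readline(), which only prompts in an interactive session. The sketch below uses a hypothetical helper, remove_columns_by_name(), that takes the typed answer as a string; accepting a comma-separated list is an assumption added here to cover removing several columns at once.

```r
# Hypothetical helper: drop the columns named in a comma-separated answer
remove_columns_by_name <- function(data, answer) {
  # Split "mpg, cyl" into c("mpg", "cyl") and strip surrounding spaces
  to_remove <- trimws(strsplit(answer, ",", fixed = TRUE)[[1]])
  # Logical index: TRUE for columns to keep
  data[, !names(data) %in% to_remove, drop = FALSE]
}

trimmed   <- remove_columns_by_name(mtcars, "mpg, cyl")
untouched <- remove_columns_by_name(mtcars, "")

print(names(trimmed))
```

Writing -!names(data) %in% ... instead turns the logical index into a vector of -1 and 0, which always drops the first column; that would explain the second symptom described in the question.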
