So I'm trying to create a loop which reads input files based on the value present in column 4 of my data frame.
for (idx in seq_len(nrow(config_path_test))) {
  if (config_path_test[idx, 2] == "ens84" && is.na(config_path_test[idx, 4])) {
    # process Raw data
  } else if (config_path_test[idx, 2] == "ens84" && !is.na(config_path_test[idx, 4])) {
    # process TPM data
  }
}
I tested my code with column 4 containing paths to my data in all fields, no paths in any of the fields, or a combination of both (some fields have paths, some do not).
The code above handles the all-paths and no-paths cases successfully.
However, for the combination of both, I'm a bit stuck.
When no paths are present, the entire column is read as NA, so I've been using:
is.na() and !is.na()
However, with a combination of both, the missing values do not show as NA. The fields are blank.
Any idea how to change my code to process the blank fields? Or any idea on how to process the blank values to NA?
Thanks in advance
If I am understanding the issue correctly, the files you have read in have some ("") fields that you would like to cast to NA? If that is the case, do the following outside your function:
df[df == ""] <- NA
Do check to ensure it is not something similar, such as (" "). This thread might also be of interest to you.
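As a minimal sketch of the whitespace case mentioned above: the data frame below and its column names (sample, path) are made up for illustration, and the idea is simply to trim each value before comparing it to "" so that fields like " " are also turned into NA.

```r
# Hypothetical example data: one real path, one empty field, one whitespace-only field
df <- data.frame(
  sample = c("s1", "s2", "s3"),
  path = c("data/s1.tsv", "", "   "),
  stringsAsFactors = FALSE
)

# A field counts as blank if it is empty once surrounding whitespace is trimmed
is_blank <- !is.na(df$path) & trimws(df$path) == ""
df$path[is_blank] <- NA

is.na(df$path)
```

After this step is.na() and !is.na() should behave the same way for the mixed case as they do for the all-NA case. If the file is read with read.table() or read.csv(), passing na.strings = c("", "NA") at read time is another option for character columns.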
I am new to R, moving over from Excel VBA. I would like to categorize a final value based on the text provided in multiple columns and 20k+ rows.
I've been semi-successful with "if" and "identical" but have struggled with partial matches through using "grep"
I'll share pseudo-code of what I'm trying to achieve:
If d$Removal_Reason_Code contains "SCH" AND
If d$Shop_Action_Code is an exact match to "Test" AND
If d$Repair_Summary contains "No Fault Found"
Then
set d$Category to "NFF"
Else
go back to row 1 and check against other keywords
I can post the working VBA code if that is helpful. I'm just getting my head round how R works, and was hoping it may be a quick and easy answer for one of you gurus!
Much appreciated :)
We can use grepl for partial matches
i1 <- with(d, grepl("SCH", Removal_Reason_Code) & Shop_Action_Code == "Test" &
  grepl("No Fault Found", Repair_Summary))
d$Category[i1] <- "NFF"
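To see what the logical index does, here is a small self-contained sketch. The three rows of d are invented for illustration, and fixed = TRUE is an assumption on my part that the keywords are literal text rather than regular expressions.

```r
# Hypothetical sample rows; only the column names come from the question
d <- data.frame(
  Removal_Reason_Code = c("SCH-01", "UNS-02", "SCH-03"),
  Shop_Action_Code = c("Test", "Test", "Repair"),
  Repair_Summary = c("No Fault Found on bench", "No Fault Found", "Replaced seal"),
  stringsAsFactors = FALSE
)

# Start with an empty category so unmatched rows stay NA
d$Category <- NA_character_

# grepl() gives partial ("contains") matches, == gives the exact match
i1 <- with(d, grepl("SCH", Removal_Reason_Code, fixed = TRUE) &
  Shop_Action_Code == "Test" &
  grepl("No Fault Found", Repair_Summary, fixed = TRUE))
d$Category[i1] <- "NFF"

d$Category
# [1] "NFF" NA    NA
```

Because the whole column is tested at once, there is no need to loop over the 20k+ rows. Further keyword sets can be handled by building another index (i2, i3, ...) and assigning its category the same way.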
I want to update a column value in such a way that only the part of a string after the last '.' is stored. I wrote code that does this, but it only works when it's given one input. How do I loop through all the rows of my data frame?
One value of a row looks, for example, like this. I only want to store the last part, ".gif":
GET /./enviro/gif/emcilogo.gif
I wrote the following code that successfully does this.
tail(c(do.call(rbind, strsplit(as.character(sapply(strsplit("GET /./enviro/gif/emcilogo.gif", "\\s+"), `[`, 2)),"\\."))), n=1)
Output:
"gif"
However, I am using the string "GET /./enviro/gif/emcilogo.gif" as input. As soon as I change this to the column of my dataframe "df$request" I receive the error.
Error in strsplit(epa.df$request) :
argument "split" is missing, with no default
I tried writing a function that loops through my column values one by one and updates them. However, I can't seem to get this working.
Any help would be highly appreciated!
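One possible approach, sketched under the assumption that the column is called request and that the path is always the second whitespace-separated token: strsplit() and sub() are both vectorized, so no explicit loop is needed. The second example row and the new column name ext are made up for illustration.

```r
# Hypothetical data frame; the first value is the one from the question
df <- data.frame(
  request = c("GET /./enviro/gif/emcilogo.gif", "GET /docs/index.html HTTP/1.0"),
  stringsAsFactors = FALSE
)

# Take the second whitespace-separated token (the path) from every row
paths <- vapply(strsplit(df$request, "\\s+"), `[`, character(1), 2)

# Greedy ".*\\." removes everything up to and including the last dot
df$ext <- sub(".*\\.", "", paths)

df$ext
# [1] "gif"  "html"
```

Note that sub() returns the path unchanged when it contains no dot, so rows without an extension would need separate handling. The original error message says the split argument was missing, which suggests that the pattern was dropped from the strsplit() call when the string was swapped for the column.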
So this is a bit strange, I am pretty new to R and facing this weird problem.
I have a data frame, and there is a column called SNDATE which is a combined value of two different columns.
I want to check if the data frame has a column named SN, if it doesn't I will split SNDATE to fill the SN column.
Here is the code
if(!('SN' %in% colnames(data))){
#do some splitting here
}
Funny thing is, it keeps saying it's there, and the stuff in it never gets triggered.
And when I do this:
print(data$SN)
It will print the value of data$SNDATE. So does R have some sort of lazy name filling or something? This is very strange to me.
Thank you very much for the help
When you do
print(data$SN)
it works because $ is using partial name matching. For another example, try
mtcars$m
There is no column named m, so $ partially matches mpg. Unfortunately, this is not used in %in%, so you will need to use the complete exact column name in
if(!('SNDATE' %in% colnames(data))){
#do some splitting here
}
You could instead use something along the lines of pmatch()
names(mtcars)[2] <- "SNDATE"
names(mtcars)[pmatch("SN", names(mtcars))]
# [1] "SNDATE"
So the if() statement might go something like this -
nm <- colnames(data)
if(!nm[pmatch("SN", nm)] %in% nm) {
...
}
Or even
if(is.na(pmatch("SN", names(data))))
might be better.
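To make the difference concrete, here is a small sketch with a made-up data frame that only has an SNDATE column. It compares $ (partial matching), [[ (exact matching by default) and %in% (exact comparison of names).

```r
# Hypothetical data frame with SNDATE but no SN column
data <- data.frame(SNDATE = c("A1_2020", "B2_2021"), stringsAsFactors = FALSE)

# $ falls back to partial matching, so this returns the SNDATE values
partial <- data$SN

# [[ ]] matches names exactly by default, so this is NULL
exact <- data[["SN"]]

# %in% compares whole names, so SN is reported as absent
has_sn <- "SN" %in% names(data)
```

Using data[["SN"]] or the %in% test avoids being surprised by the partial match. Setting options(warnPartialMatchDollar = TRUE) makes R warn whenever $ matches a name partially.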
Apologies in advance if this question has been asked before, but I couldn't find anything on it.
Right now, I'm attempting to take certain columns from files and to name those columns after the files they came from. I've done it before, and I know it's not too difficult, but I am running into a lot of trouble. My code is as follows (allfiles is declared earlier in the code as all of the files in that directory):
makelist<-function(list_text){
  if (list_text == "squared_median " || list_text == "squared_median_ranked"
      || list_text == "value_median " || list_text == "value_median_ranked")
    metric = "median"
  else
    metric = "avg"
  currfiles=allfiles[grepl(list_text,allfiles)]
  currfile=currfiles[1]
  currtable=read.table(currfile, header=T, sep='\t',stringsAsFactors = F)
  a<-cbind(gene=currtable[,1],paste0(currfile)=currtable[,metric])
  #col.name(a[,ncol(a)])<-currfile
  #names(a)[ncol(a)]<-as.character(currfile)
  for(currfile in currfiles[2:length(currfiles)])
  {
    currtable=read.table(currfile, header=T, sep='\t', stringsAsFactors=F)
    if (length(currtable[,metric]) > length(a[,1]))
      apply(a,2, function(x) length(x) = length(currtable[,metric]))
    a=cbind(a, "gene"=currtable[,1],currfile=currtable[,metric])
    #names(a)[ncol(a)]<-paste(currfile)
  }
  #names(a)=c("gene", currfiles[1], "gene", currfiles[2],"gene", currfiles[3],"gene", currfiles[4])
  write.table(a, paste(output_folder, list_text,".txt"),sep='\t',quote=F,row.names=F)
}
Essentially, I'm passing in a string that is used to gather certain files from a directory. From there, the code grabs the median or average column from each file and names the column after the file from which it got that information. I've tried loads of different ways with no success. The commented-out lines are attempts that did not work: either they left the column name blank, or they named it the literal variable name "currfile" as opposed to the file name which it contains. I've gone as far as individually renaming all of the columns with
names(a)=c("gene", currfiles[1], "gene", currfiles[2]...currfiles[n])
And that just names every other column currfiles.
Can you help me identify what's wrong? I've tried setting the name as get(currfile) too and that won't let me run the script.
These lines
#col.name(a[,ncol(a)])<-currfile
#names(a)[ncol(a)]<-as.character(currfile)
Have left me with blank column names.
** as an aside the lines with the if statement concerning length are supposed to extend the length of each column to the latest longest column, but doesn't seem to be working. That could be something else I'll read up about a bit more.
Thanks for your help,
Mike
To set the column names of a table, use colnames(table) (ref).
In your case, I'd expect colnames(a)[ncol(a)] <- currfile to do the trick, if I am understanding correctly that you want to name the last column of the table a with the string in the variable currfile.
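A minimal sketch of that renaming, using a made-up file name and a made-up two-row table in place of the real files. The likely reason names(a) left the names blank is that cbind() on plain vectors returns a matrix, where names() does not address the column names, and an argument written as currfile=... in cbind() is used literally rather than evaluated.

```r
# Hypothetical stand-ins for one input file and its contents
currfile <- "sample_A.txt"
currtable <- data.frame(
  gene = c("g1", "g2"),
  median = c(1.5, 2.5),
  stringsAsFactors = FALSE
)
metric <- "median"

# Build the table with a placeholder name, then rename the last column
a <- data.frame(
  gene = currtable[, 1],
  value = currtable[, metric],
  stringsAsFactors = FALSE
)
colnames(a)[ncol(a)] <- currfile

colnames(a)
# [1] "gene"         "sample_A.txt"
```

The same colnames(a)[ncol(a)] <- currfile line can be repeated after each cbind() inside the loop, since the newest column is always the last one at that point.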
I am trying to write a code that allows the user to decide how many columns to remove from a table in R. The steps I am trying to perform are as follows:
1) print the column headers of the table
2) ask the user if they want to remove any columns. If the answer is yes, proceed to remove columns. This is in a loop, in case the user wants to remove multiple columns.
3) once the user is done removing columns, I want the modified table (with unwanted columns removed) to be returned so that it can be used later in script.
4) if the user does not want to remove any columns at all, they can just proceed, and the table is returned with no columns missing.
I am having 2 major issues/questions with my code as I currently have it:
1) The loop only works once (only one column is removed). The loop does run (it keeps prompting me as long as I keep answering "Y"); however, in the end, the returned object only has one column removed (the first column I removed when the loop began). I tried to find a way to have the user enter multiple inputs using readline, but the answers I found did not really help me.
2) If I don't want to remove any columns, and I enter "no" the first time I'm prompted for input, something very strange happens: a table with the first column removed is returned.
I am still a newbie at coding, and I realize this may not be the best way to do what I want to do. I appreciate any advice/feedback!
my_data<-read.table(file.choose(),header=TRUE)
print(names(my_data))
for (column in my_data) {
remove_columns<-readline("Would you like to remove any columns? \n")
if(remove_columns=="Y" || remove_columns=="y") {
my_data_new<-my_data[,-!names(my_data) %in% c(readline("Which columns would you like to remove? \n"))]
} else {
return(my_data_new)
}}
I think you're looking for a while loop
my_data <- read.table(file.choose(), header = TRUE)
print(names(my_data))
while (TRUE) {
  remove_columns <- readline("Would you like to remove any columns? \n")
  if (remove_columns == "Y" || remove_columns == "y") {
    # Read one column name and keep every column whose name does not match it
    drop_col <- readline("Which column would you like to remove? \n")
    my_data <- my_data[, !names(my_data) %in% drop_col, drop = FALSE]
  } else {
    break
  }
}
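For removing several columns from a single answer, one option is to let the user type a comma-separated list and split it. The helper below is hypothetical (drop_columns is not an existing R function) and takes the answer as a plain string so it can be tried without readline(). It uses a plain logical index, because putting a minus sign in front of a logical vector, as in -!names(my_data) %in% ..., coerces it to 0 and -1 and ends up dropping the first column.

```r
# Hypothetical helper: drop the columns named in a comma-separated string
drop_columns <- function(tbl, answer) {
  # Split "a, c" into c("a", "c") and strip surrounding spaces
  cols <- trimws(strsplit(answer, ",", fixed = TRUE)[[1]])
  # Keep only the columns whose names were not listed
  tbl[, !names(tbl) %in% cols, drop = FALSE]
}

# Made-up table standing in for the file chosen with file.choose()
my_data <- data.frame(a = 1:2, b = 3:4, c = 5:6)

result <- drop_columns(my_data, "a, c")
names(result)
# [1] "b"
```

In the interactive script the second argument would be the value returned by readline(). An empty answer leaves the table unchanged, and drop = FALSE keeps the result a data frame even when only one column remains.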