How to search for a specific column name in data - r

So this is a bit strange, I am pretty new to R and facing this weird problem.
I have a data frame, and there is a column called SNDATE which is a combined value of two different columns.
I want to check if the data frame has a column named SN, if it doesn't I will split SNDATE to fill the SN column.
Here is the code
if(!('SN' %in% colnames(data))){
#do some spliting here
}
Funny thing is, it keeps saying it's there, and the stuff in it never gets triggered.
And when I do this:
print(data$SN)
It will print the value of data$SNDATE. So does R have some sort of lazy name filling or something? This is very strange to me.
Thank you very much for the help

When you do
print(data$SN)
it works because $ is using partial name matching. For another example, try
mtcars$m
There is no column named m, so $ partially matches mpg. Unfortunately, this is not used in %in%, so you will need to use the complete exact column name in
if(!('SNDATE' %in% colnames(data))){
#do some splitting here
}
You could instead use something along the lines of pmatch():
names(mtcars)[2] <- "SNDATE"
names(mtcars)[pmatch("SN", names(mtcars))]
# [1] "SNDATE"
So the if() statement might go something like this -
nm <- colnames(data)
if(!nm[pmatch("SN", nm)] %in% nm) {
...
}
Or even
if(is.na(pmatch("SN", names(data))))
might be better.
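To see the difference between the two kinds of matching, here is a minimal sketch. The data frame `df` and the underscore separator are made up for illustration and are not the asker's data. `$` matches partially, while `[[` and `%in%` compare exact names by default.

```r
# Hypothetical data frame with a combined SNDATE column and no SN column
df <- data.frame(SNDATE = c("A1_2020-01-01", "B2_2020-02-01"),
                 stringsAsFactors = FALSE)

partial <- df$SN                 # `$` partially matches SNDATE
exact   <- df[["SN"]]            # `[[` matches exactly by default, so this is NULL
has_sn  <- "SN" %in% names(df)   # exact comparison, FALSE at this point

if (!has_sn) {
  # Keep the part before the underscore (the separator is an assumption)
  df$SN <- sub("_.*$", "", df$SNDATE)
}
```

Once a real SN column exists, `df$SN` matches it exactly and no longer falls back to SNDATE.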

Related

How can i remove the first x number of characters of a column name from 200+ columns with each column being not the same number of characters

How can I remove a specific number of characters from the start of 200+ column names? For example, from "Q1: GOING OUT?" and "Q5: STATE, PROVINCE, COUNTY, ETC" I just want to remove the "Q1: " and the "Q5: ". I have looked around but haven't been able to find a solution where I don't have to rename them manually. Are there any functions or ways to do it through tidyverse? I have only been using R for 2 months.
I don't really have anything to show. I have considered using for loops and possibly using gsub or case_when, but don't really understand how to properly use them.
#probably not correctly written but tried to do it anyways
for ( x in x(0:length) and _:(length(CandyData)-1){
front -> substring(0:3)
back -> substring(4:length(CandyData))
print <- back
}
I don't really have any errors because I haven't been able to make it work properly.
Try this:
# Example names. If you already have a data frame, get them with col_all <- names(df)
col_all <- c("Q1:GOING OUT?", "Q2:STATE", "Q100:PROVINCE", "Q200:COUNTRY", "Q299:ID")
for (col in seq_along(col_all)) {   # iterate over the column names
  colname <- col_all[col]
  # position of the first ":" in this name (-1 if there is none)
  index1 <- regexpr(":", colname, fixed = TRUE)[1]
  if (index1 > 0) {
    # keep only the part after the ":"
    col_all[col] <- substr(colname, index1 + 1, nchar(colname))
  }
}
names(df) <- col_all   # assign the cleaned names back to the data frame
The H 1 answer is still the best: sub() or gsub() functions will do the work. And do not fear the regex, it is a powerful tool in data management.
Here is the gsub version:
names(df) <- gsub("^.*:","",names(df))
It works this way: for each name, fetch characters until reaching ":" and then, remove all the fetched characters (including ":").
Remember to upvote H 1's solution in the comments.
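One caveat with the pattern "^.*:" is that `.*` is greedy, so a name containing a second colon would be stripped up to the last one, and the space after the prefix is kept. A hedged variant that stops at the first colon and also drops the following spaces might look like this. The first two names come from the question; the third is a hypothetical name added to show the difference.

```r
# Two names from the question plus a hypothetical one with a second colon
old_names <- c("Q1: GOING OUT?",
               "Q5: STATE, PROVINCE, COUNTY, ETC",
               "Q7: TIME: AM OR PM")

# [^:]* matches everything up to the first colon only;
# \\s* also removes the spaces that follow it
new_names <- sub("^[^:]*:\\s*", "", old_names)
```

For a data frame this would be `names(df) <- sub("^[^:]*:\\s*", "", names(df))`. With dplyr loaded, the same `sub()` call can be passed to `rename_with()`.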

R - Why does frameex[ind, ] need a ", " to display rows

I am new to R and I have troubles understanding how displaying an index works.
# Find indices of NAs in Max.Gust.SpeedMPH
ind <- which(is.na(weather6$Max.Gust.SpeedMPH))
# Look at the full rows for records missing Max.Gust.SpeedMPH
weather6[ind, ]
My code here works, no problem but I don't understand why weather6[ind] won't display the same thing as weather6[ind, ] . I got very lucky and mistyped the first time.
I apologize in advance that the question might have been posted somewhere else, I searched and couldn't find a proper answer.
So [ is a function just like any other function in R, but we call it strangely. Another way to write it in this case would be:
'[.data.frame'(weather6,ind,)
or the other way:
'[.data.frame'(weather6,ind)
The first three arguments to the function are named x, i and j. If you look at the code, early on it branches with the line:
if (Narg < 3L)
Putting the extra comma tells R that you've called the function with 3 arguments, but that the j argument is "missing". Otherwise, without the comma, you have only 2 arguments, and the function treats the data frame as a list of columns, so ind selects columns instead of rows.
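The two call forms can be compared on a small data frame. This is only a sketch: `weather` is a hypothetical stand-in for weather6, with made-up values.

```r
# Hypothetical stand-in for weather6
weather <- data.frame(day  = 1:4,
                      gust = c(20, NA, 35, NA),
                      temp = c(50, 55, 60, 65))

ind <- which(is.na(weather$gust))   # positions of the missing gust values

rows <- weather[ind, ]   # with the comma: ind selects rows
cols <- weather[2:3]     # without the comma: the index selects columns

# weather[ind] would therefore look for columns 2 and 4 and fail,
# because this data frame has only three columns
res <- try(weather[ind], silent = TRUE)
```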

how to convert a character string to a name that accepts data (data frame name) in R

I have stored a list of names as characters and want to convert them to something that can be accepted as data frame name. something like this:
for (i in 1:18) {
str[i] <- paste("alert_month_amount_",i,sep="")
}
name_str = as.character(str)
then name_str will be:
name_str[1] would be "alert_month_amount_1"
now i want to assign certain data to a data frame that uses name_str[i] inside a loop like:
for (n in 1:18){
name_str[n] <- subset(by_Month_Acct_Num,month==month_index[n] & year==year_index[n])
}
but this does not work perhaps because the names are passed as characters inside double quotation mark ("). I would appreciate your help.
You can use assign for this:
assign(name_str[n], subset(by_Month_Acct_Num,month==month_index[n] & year==year_index[n]))
This is FAQ 7.21. The most important part of that answer is the end where it says (like @MrFlick) that it is better to use a list. You really should learn how to take advantage of R's vectorized functions.
The paste and paste0 functions are both vectorized, so your first bit of code can be replaced with:
name_str <- paste0("alert_month_amount_", 1:18)
without need for the loop.
You could create your list and fill it with code like:
alert_month_amount <- list()
for(i in 1:18) {
alert_month_amount[[i]] <- subset(by_Month_Acct_Num,month==month_index[i] & year==year_index[i])
}
Or possibly even easier using the split function. You could also use lapply or mapply.
If you want the elements named then just do:
names(alert_month_amount) <- name_str
Now with everything in a single list you can copy, save, delete, etc. one object rather than needing another loop to do each individual piece. If you want to do the same thing (calculate a summary, fit a regression, etc.) on each piece created then with everything in a list you can just use lapply or sapply on the list rather than having to create another loop and figuring out how to grab each piece in the loop and save it to an output object.
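As a rough sketch of the list approach without an explicit loop, the subsets can be built with Map() and then summarised with sapply(). The data frame `by_month` and its values are hypothetical stand-ins for by_Month_Acct_Num, and the `amount` column is assumed for the example.

```r
# Hypothetical stand-in for by_Month_Acct_Num
by_month <- data.frame(month  = c(1, 1, 2, 3),
                       year   = c(2020, 2020, 2020, 2021),
                       amount = c(10, 20, 30, 40))
month_index <- c(1, 2, 3)
year_index  <- c(2020, 2020, 2021)

# One subset per (month, year) pair, collected in a single list
alert_month_amount <- Map(
  function(m, y) by_month[by_month$month == m & by_month$year == y, ],
  month_index, year_index
)
names(alert_month_amount) <- paste0("alert_month_amount_", seq_along(month_index))

# The same summary applied to every piece, with no second loop
totals <- sapply(alert_month_amount, function(d) sum(d$amount))
```

Individual pieces are then available as `alert_month_amount[["alert_month_amount_2"]]` or `alert_month_amount[[2]]`.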

how to prompt user to remove multiple columns using the readline() in R

I am trying to write a code that allows the user to decide how many columns to remove from a table in R. The steps I am trying to perform are as follows:
1) print the column headers of the table
2) ask the user if they want to remove any columns. If the answer is yes, proceed to remove columns. This is in a loop, in case the user wants to remove multiple columns.
3) once the user is done removing columns, I want the modified table (with unwanted columns removed) to be returned so that it can be used later in script.
4) if the user does not want to remove any columns at all, they can just proceed, and the table is returned with no columns missing.
I am having 2 major issues/questions with my code as I currently have it:
1) the loop only works once (only one column is removed). the loop does work (it keeps prompting me if I keep answering "Y"), however in the end, the returned object only has 1 column removed (the first column I removed when the loop began). I tried to find if there is a way to have the user write in multiple inputs using readline, however the answers I found did not really help me.
2) If I don't want to remove any columns, and I enter "no" the first time I'm prompted for input, something very strange happens where what is returned is a table with the first column is removed.
I am still a newbie at coding, and I realize this may not be the best way to do what I want to do. I appreciate any advice/feedback!
my_data<-read.table(file.choose(),header=TRUE)
print(names(my_data)
for (column in my_data) {
remove_columns<-readline("Would you like to remove any columns? \n")
if(remove_columns=="Y" || remove_columns=="y") {
my_data_new<-my_data[,-!names(my_data) %in% c(readline("Which columns would you like to remove? \n"))]
} else {
return(my_data_new)
}}
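I think you're looking for a while loop. Note also that -!names(my_data) %in% ... turns the logical vector into 0s and -1s, which is why the first column disappears no matter what you type; index with the logical vector directly instead:
my_data <- read.table(file.choose(), header = TRUE)
print(names(my_data))
while (TRUE) {
  remove_columns <- readline("Would you like to remove any columns? \n")
  if (remove_columns == "Y" || remove_columns == "y") {
    to_remove <- readline("Which column would you like to remove? \n")
    my_data <- my_data[, !names(my_data) %in% to_remove, drop = FALSE]
  } else {
    break
  }
}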
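To let the user remove several columns with a single answer, one option is to split the typed text on commas. The helper below is a sketch only: `drop_columns` is a hypothetical name, and the comma-separated input format is an assumption, not something the original code used.

```r
# Hypothetical helper: drop the columns named in a comma-separated string
drop_columns <- function(df, input) {
  to_remove <- trimws(strsplit(input, ",", fixed = TRUE)[[1]])
  # Logical index: TRUE for the columns to keep
  df[, !names(df) %in% to_remove, drop = FALSE]
}

my_data <- data.frame(a = 1:2, b = 3:4, c = 5:6)
my_data <- drop_columns(my_data, "a, c")   # removes columns a and c
my_data <- drop_columns(my_data, "zzz")    # unknown name: nothing is removed
```

In an interactive session the second argument would come from readline(), for example `my_data <- drop_columns(my_data, readline("Which columns? "))`.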

How to remove selected R variables without having to type their names

While testing a simulation in R using randomly generated input data, I have found and fixed a few bugs and would now like to re-run the simulation with the same data, but with all intermediate variables removed to ensure it's a clean test.
Is there a way to remove several dozen manually selected variables from the workspace without having to:
a) clobber the entire workspace, e.g. rm(list=ls()), or b) type each variable name, e.g. remove(name1, name2, ...)?
Ideal solution would be to use ls() to inspect the definitions and then pick out the indices of the ones I want to remove, e.g.
ls() # inspect definitions
delme <- c(3,5,7:9,11,13) # names selected for removal
remove(ls()[delme]) # DESIRED SOLUTION -- doesn't quite work this way
(In hindsight, I should have used a fixed seed to generate the random input data, which allow clearing everything and then re-running the test...)
There is a much simpler and more direct solution:
temp <- ls()
vars.to.remove <- temp[c(1,2,14:15)]
rm(list = vars.to.remove)
Or, better yet, if you are good about variable naming schemes, you can use the following pattern matching strategy:
E.g. I name all temporary variables with the starting string "Temp."
... so, you can have Temp.Names, Temp.Values, Temp.Whatever
The following produces the list of variables that match this pattern
ls(pattern = "^Temp\\.")
So, you can remove all unneeded variables using ONE line of code, as follows:
rm(list = ls(pattern = "^Temp\\."))
Hope this helps.
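The pattern approach can be tried safely in a scratch environment before running it on the real workspace. In this sketch the environment `env` and the object names are made up for illustration.

```r
# Use a separate environment so the example does not touch the global workspace
env <- new.env()
assign("Temp.Names", c("a", "b"), envir = env)
assign("Temp.Values", 1:3, envir = env)
assign("input_data", 42, envir = env)

# Review what would be removed before actually removing it
doomed <- ls(envir = env, pattern = "^Temp\\.")

rm(list = doomed, envir = env)
remaining <- ls(envir = env)
```

Dropping the `envir` arguments applies the same two steps to the current workspace.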
Assad, while I think the actual answer to the question is in the comments, let me suggest this pattern as a broader solution:
rm(list=
Filter(
Negate(is.na), # filter entries corresponding to objects that don't meet function criteria
sapply(
ls(pattern="^a"), # only objects that start with "a"
function(x) if(is.matrix(get(x))) x else NA # return names of matrix objects
) ) )
In this case, I'm removing all matrix objects that start with "a". By modifying the pattern argument and the function used by sapply here, you can get pretty fine control over what you delete, without having to specify many names.
If you are concerned that this could delete something you don't want to delete, you can store the result of the Filter(... operation in a variable, review the contents, and then execute the rm(list=...) command.
Try
eval(parse(text=paste("rm(",paste(ls()[delme],collapse=","),")")))
I had a similar requirement. I pulled all the elements I needed to a list:
varsToPurge = as.list(ls())
I then reassign the few values I wish to keep with new variable names which will not be in the variable varsToPurge. After that I looped through the elements
for (j in 1:length(varsToPurge)){
rm(list = as.character(varsToPurge[j]))
}
Do a little garbage collecting, and you maintain a clean environment as you go through your code.
gc()
You can also use a vector of row numbers you wish to keep instead and run through the vector in the loop but it won't be as dynamic if you add rough work you wish to remove.

Resources