I am building a function that requires me to rename columns in a data frame, where the original column names are stored in a variable from user input.
#self$options$DV is a string identifying the column of interest, such as 'chosenvarname':
DVlabel <- self$options$DV
However, the plyr::rename function doesn't work when I pass it this variable or the expression it was read from:
df1 <-plyr::rename(df1,c(DVlabel='DV'))
df1 <-plyr::rename(df1,c(self$options$DV='DV'))
Even when DVlabel is set to a valid column name in the data frame, it still doesn't work.
It only works properly when I type the actual column name, which makes me think that this function doesn't accept a variable holding the name:
df1 <-plyr::rename(df1,c(OriginalColumnName='DV'))
Is there another way to use the column name identified in self$options$DV as the basis for renaming that same column to something else?
Put differently, is there any way to rename a column through a variable that holds its name, when that variable's own name is not a column in the data?
Alternatively, is there some way to construct a column reference, such as data$var1, where the "var1" component is taken from some other variable (e.g., DVlabel or self$options$DV)?
I found a way around this by using the following to rename the columns:
colnames(df1)[colnames(df1) == self$options$DV] <- 'DV'
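A minimal sketch of why the literal form fails and how a name held in a variable can be used, assuming a toy df1 and a plain string standing in for self$options$DV. In c(DVlabel='DV') the left-hand side is taken literally as the name "DVlabel", so plyr::rename looks for a column with that name. Building the named vector with setNames avoids this, and [[ ]] gives the data$var1-style access asked about above.

```r
# Toy data; "chosenvarname" stands in for the value of self$options$DV (assumption).
df1 <- data.frame(chosenvarname = 1:3, other = 4:6)
DVlabel <- "chosenvarname"

# Access a column whose name is stored in a variable (the data$var1 case).
dv_values <- df1[[DVlabel]]

# The named vector plyr::rename expects has old names as names, new names as values.
# setNames() evaluates DVlabel instead of using the word "DVlabel" literally.
replace_map <- setNames("DV", DVlabel)

# Base R rename driven by the same variable.
names(df1)[names(df1) == DVlabel] <- "DV"

# With plyr installed, the equivalent call on the original df1 would be:
# df1 <- plyr::rename(df1, setNames("DV", DVlabel))
```

The base R line behaves like the colnames() workaround above; the commented plyr call is shown only for comparison and is not run here.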
It could be a very easy question, given that I am very unfamiliar with R. I know normally one can use deparse(substitute(.)) to extract the name of a variable. However, if I have a long list of variables (let's say it's built without names), how can I extract the name of each variable efficiently? I was thinking about using loops, but the deparse(substitute(.)) method would obviously generate the 'general' variable name we used to denote every item.
Sample code:
countries <- list(austria,belgium,czech,denmark,france,germany,italy,luxemberg,netherlands,poland,swiss)
Suppose I want countryNames to equal list("austria","belgium",...,"swiss"); how should I code that? I tried generating the list using countries <- list(countryA = countryA, countryB = countryB, ...), but it was extremely tedious, and in some cases I might only have an unnamed input list from elsewhere.
countries would hold only the values of the individual objects (austria, belgium, etc.). To access the names you need to create a named list when you build countries, which can be done like this:
countries <- list(austria = austria,belgium = belgium....)
However, if this is very tedious you can use tibble::lst which creates the names automatically without explicitly mentioning them.
countries <- tibble::lst(austria,belgium....)
In both cases you can access the names using names(countries).
If the country objects are the only ones loaded in the global environment, we can do this easily with ls and mget to return a named list of values:
countries <- mget(ls())
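When other objects are also present in the environment, mget can be given an explicit character vector of names instead of ls(). A small sketch, assuming three placeholder objects with made-up values:

```r
# Placeholder objects standing in for the real country data (assumption).
austria <- 1
belgium <- 2
czech   <- 3

# Names of the objects to collect, written once as strings.
country_names <- c("austria", "belgium", "czech")

# mget() looks each name up and returns a list named after the objects.
countries <- mget(country_names, envir = environment())

countryNames <- names(countries)
```

Here countryNames is a character vector; as.list(countryNames) gives the list form mentioned in the question.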
I have a SpatialPolygonsDataFrame loaded in R. There is a specific column with entries I want to rename to correct typos.
Data$Material has values such as PIPES, PILINGS, TIRES, etc. I want to rename these to Pipes, Pilings, Tires, etc.
I have used relabel(), rename.vars(), and rename(), and all run without any error messages, but there is no change in the data. Below is an example of my code.
mat<- memisc::relabel(Data$Material,"PILINGS"=="Pilings","Pipe"=="Pipes","PIPE"=="Pipes","TIRES"=="Tires")
Data$Material_Clean <- NA
Data$Material_Clean <- mat
Data$Material_Clean has exactly the same values as Data$Material, with none of them renamed.
How do I rename the specified values?
I had the same issue and the best solution I could find was to use setNames from stats:
Data <- setNames(Data, c("Pilings", "Pipes", "Tires"))
Unfortunately, this means you need to include all columns, not just the ones you want to rename. So if you have many columns, get their names as a vector (e.g. using names(Data)), change the names of the ones you want to rename, then pass the updated vector as the second argument to setNames().
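Note that setNames changes column names, whereas the question is about values inside Data$Material. Also, in the relabel call each "old"=="new" is a comparison that evaluates to FALSE, so no old = new pair is actually passed. A minimal base R sketch of recoding values with a lookup vector, assuming a plain character vector in place of the real column (a factor column would need as.character() first):

```r
# Toy stand-in for Data$Material (assumption: character values).
material <- c("PILINGS", "Pipe", "PIPE", "TIRES", "Pipes")

# Lookup vector: names are the old values, elements are the corrected values.
lookup <- c(PILINGS = "Pilings", Pipe = "Pipes", PIPE = "Pipes", TIRES = "Tires")

# Replace only the entries that appear in the lookup; leave the rest untouched.
hit <- material %in% names(lookup)
material_clean <- material
material_clean[hit] <- unname(lookup[material[hit]])
```

The result could then be assigned back as Data$Material_Clean <- material_clean.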
This is probably a basic question, but why does R treat my vector, which has a bunch of words in it, as numbers when I try to use it as column names?
I imported a data set and it turns out the first row of data contains the column headers that I want. The column headers that came with the data set are the wrong ones, so I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks
You need to actually show us the data, and the read.csv()/read.table() command you used to import.
If R thinks your numeric column is string, it sounds like that's because it wrongly includes the column name, i.e. you omitted header=TRUE in your read.csv()/read.table() import.
But show us your actual data and commands used.
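One likely cause, offered as an assumption since the data is not shown: if the columns were imported as factors, data[1,] is a one-row data frame of factors, and using it as names typically yields the integer factor codes rather than the labels. A sketch with a made-up two-column data set:

```r
# Hypothetical import where the header row was read as data and the columns
# became factors (stringsAsFactors = TRUE mimics the default before R 4.0).
data <- data.frame(V1 = c("id", "1", "2"),
                   V2 = c("score", "10", "20"),
                   stringsAsFactors = TRUE)

# Convert the first row column by column so the labels, not the codes, are kept.
header <- unname(vapply(data[1, ], as.character, character(1)))

# Drop the header row and apply the recovered names.
data <- data[-1, ]
colnames(data) <- header

# Re-importing with header = TRUE avoids the problem at the source.
fixed <- read.table(text = "id score\n1 10\n2 20", header = TRUE)
```

Re-importing also lets R infer numeric types for the columns, which the first approach does not do.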
I have a two-part question; both parts concern working with data frame names:
I want to concatenate the names of two data frames with a separator, for example df1 and df2 should become "df1_&_df2".
I want R to read a data frame's name as a character string, so that a data frame called df1 can appear as "df1" in certain parts of my code.
For the first part I tried paste, but it pasted the entire contents of both data frames, and names returns the column names.
For the second part, being able to make R treat a data frame's name as a quoted string is very handy in code for more complex charts: I simply pass the data frames into the code and R builds the chart title from their names. I understand there is a very simple workaround: I can create a vector of names manually, list=c("df1", "df2"), and then use the function get wherever I need the content of a data frame instead of its name, but that seems a little inconvenient in the long run. Is there any function in R whose output is just the data frame's name? Something that looks like GiveMeName(df) with the output "df"? (I wrote this in normal font intentionally, so no one would think this is a real function.)
For #1, you'll have to give a use case for me to understand your goal.
For #2, you can use deparse(substitute(df1)). Here's an example:
plot_and_title <- function(df1) {
  data_name <- deparse(substitute(df1))
  plot(df1[[1]], df1[[2]], main = data_name)
}
plot_and_title(mtcars)
Adding on to the answer by @Nathan Werth, you can concatenate names using:
paste(deparse(substitute(df1)), deparse(substitute(df2)), sep="_&_")
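The same idea can be wrapped in a small helper so the names are captured from whatever objects are passed in. A sketch, where combine_names is a hypothetical helper name and the two data frames are toy examples:

```r
# Hypothetical helper: returns the names of its two arguments joined by sep.
combine_names <- function(a, b, sep = "_&_") {
  # substitute() captures the unevaluated argument; deparse() turns it into text.
  paste(deparse(substitute(a)), deparse(substitute(b)), sep = sep)
}

# Toy data frames (assumption).
df1 <- data.frame(x = 1)
df2 <- data.frame(y = 2)

label <- combine_names(df1, df2)
```

This only recovers the name when the data frame is passed by name directly; inside a loop or lapply the captured expression is the loop variable instead.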
I have a data frame with hundreds of columns whose names I want to change. I'm very new to R, so it's rather easy to think through the logic of this, but I simply can't find a relevant example online.
The closest I could sort of get was this:
projectFileAllCombinedNames <- for (i in 1:200){names(projectFileAllCombined)[i+1] <-variableNames[i]}
Basically, starting at the second column of projectFileAllCombined, I want to loop through the columns in the dataframe and assign them the data values in the second data frame. I was able to change one column name manually with this code:
colnames(projectFileAllCombined)[2]<-"newColumnName"
but I can't possibly do that for hundreds of columns. I've spent multiple hours on this and can't crack it with any number of Google searches on "change multiple columns in r" or "change column names in r". The best I can find online is examples where people change a few columns with a c() function and I get how that works, but that still seems to require typing out all the column names as parameters to the function, unless there is a way to just pass the "variableNames" file into that c() function, but I don't know of one.
Will
colnames(projectFileAllCombined)[-1] <- variableNames
not suffice?
This assumes the ordering of columns in projectFileAllCombined is the same as the ordering of the new variable names in variableNames, and that
length(variableNames) == (ncol(projectFileAllCombined) - 1)
The key point here is that the replacement function 'colnames<-'() is vectorised and can replace any number of column names in a single call if passed a vector of replacement values.
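A short sketch of the vectorised replacement, assuming a toy three-column data frame and a character vector of new names. If variableNames is actually a data frame, the relevant column would need to be extracted first, e.g. with variableNames[[1]].

```r
# Toy stand-ins for the real objects (assumption).
projectFileAllCombined <- data.frame(id = 1:2, a = 3:4, b = 5:6)
variableNames <- c("height", "weight")

# Guard against a length mismatch before assigning.
stopifnot(length(variableNames) == ncol(projectFileAllCombined) - 1)

# Replace every column name except the first in a single call.
colnames(projectFileAllCombined)[-1] <- variableNames
```

No loop is needed, and the first column keeps its original name.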