Why does R think my imported vector of characters are numbers? - r

This is probably a basic question, but why does R think my vector, which has a bunch of words in it, are numbers when I try to use these vectors as column names?
I imported a data set and it turns out the first row of data are the column headers that I want. The column headers that came with the data set are wrong ones. So I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks

You need to actually show us the data, and the read.csv()/read.table() command you used to import.
If R thinks your numeric column is string, it sounds like that's because it wrongly includes the column name, i.e. you omitted header=TRUE in your read.csv()/read.table() import.
But show us your actual data and commands used.

Related

Finding a character variable in a column

I have a huge data frame df, with many columns. One of the columns named id_nm happens to be a character with values such as: aksh123dn.Ins
class(df$id_nm)
returns character
I need to lookup all those values which have the id_nm say aksh123dn.Ins
I used:
new_df<-df[df$id_nm=='aksh123dn.Ins',]
this returns the entire df which isn't the case in reality
also tried:
new_df<-df%filter(id_nm=='aksh123dn.Ins']
still getting the same answer
I think its possibly because it is a character string. Please help me with this. TIA

Why does read.csv in R convert fields to factors for some files, and not in others?

I have tables of weather data from several stations. When I import them separately using read.csv, the fields are factors, integers, and numerics. However, when I try to import one csv file with all of data combined, the resulting fields in a dataframe are all factors. In the combined file the 1st field has several alphanumeric variables, whereas in the individual files there is only one variable (name of station).
This is a commom behaviour of data.frame() from base R. And most of the times, the result of read.csv() will be stored in a data.frame. As #Duck suggested in the comment section, you can avoid this behaviour, by setting the stringsAsFactors argument to FALSE.
read.csv('myfile.csv', stringsAsFactors = FALSE)
You can check this description below on the documentation page of the data.frame function. You can access this documentation with ?data.frame command.
Character variables passed to data.frame are converted to factor columns unless protected by I() or argument stringsAsFactors is false.
So in your case, this happens in your combined file, because R are interpreting all variables as caracters. Why? Probably because in one (or some) of your files, in the numeric and integers columns, some lines of data are out of format. For example, maybe in a row, you have an "x" to represent an missing value. And read.csv() uses the entire file to decide wich format of data is each column, so as soon as the function hits this "x" value, it interprets the entire column as character. When this data is passed to data.frame(), the function converts these characters to factors. You sad that, in the combined file, you have in the first field, some alphanumeric values. So these values, are probably your "x"'s that are generating your problem.

Renaming data frame columns using symbolic names from stored variables?

I am building a function that requires me to rename columns in a data frame, where the original column names are stored in a variable from user input.
#self$options$DV is a string identifying the column of interest, such as 'chosenvarname':
DVlabel <- self$options$DV
However, the dplyr::rename function doesn't work when using this variable or its symbolic link:
df1 <-plyr::rename(df1,c(DVlabel='DV'))
df1 <-plyr::rename(df1,c(self$options$DV='DV'))
Even when DVlabel is set to equal a valid column name in the data frame, it still doesn't work
It only works properly when using the actual column name, which makes me think that this function doesn't work with symbolic links:
df1 <-plyr::rename(df1,c(OriginalColumnName='DV'))
Is there another way to use the column name identified in self$options$DV as the basis for renaming that same column to something else?
Put differently, is there any way to rename a column using symbolic links to the column name that don't otherwise exist in the data?
Alternatively, is there some way to construct a column reference, such as data$var1, where the "var1" component is extracted from some other variable (e.g., DVlabel or self$options$DV?)
I found a way around this by using the following to rename the columns:
colnames(df1)[colnames(df1) == self$options$DV] <- 'DV'

R data frame issue - non-numeric headers

This is definitely a rookie question but I'm not finding an answer for this (maybe because of my wording) so here goes:
I'm reading a data frame into R studio (csv file) that has 24 columns with headers. There are only numbers in these columns (they're essentially concentrations of several chemicals). It's called all. I need to use them as numeric vectors. When I read them in and type
is.numeric(all[,1])
I get
TRUE
When I type
is.numeric(all[1])
I get
FALSE
I think this is because R interprets the header as a factor. I also tried reading in a table without headers and with headers=FALSE, but R renames it to V1, V2 etc so the result ends up being the same.
I need to work with functions where I invoke something like all[2:24]. How can I go about to make R either "not see" the header or remove it altogether?
Thanks for the answers!
PS: the dataframe I am using (without headers - if it had headers, it would just have names instead of V1, V2, etc) is something like this:
This is a subset from the first column, not the first row.
all[,1]) #subset first column
The following is subset of first row
all[1,]) #subset first row (headers of df not included)
To give columnames
colnames(all) <- c("col1","col2")
Your assumption is wrong. You have a data.frame and all[1] does list subsetting, which results in a data.frame, which is not a vector, and not a numeric vector in particular.
You should study help("[") and An Introduction to R.

Changing hundreds of column names simultaneously in R

I have a data frame with hundreds of columns whose names I want to change. I'm very new to R, so it's rather easy to think through the logic of this, but I simply can't find a relevant example online.
The closest I could sort of get was this:
projectFileAllCombinedNames <- for (i in 1:200){names(projectFileAllCombined)[i+1] <-variableNames[i]}
Basically, starting at the second column of projectFileAllCombined, I want to loop through the columns in the dataframe and assign them the data values in the second data frame. I was able to change one column name manually with this code:
colnames(projectFileAllCombined)[2]<-"newColumnName"
but I can't possibly do that for hundreds of columns. I've spent multiple hours on this and can't crack it with any number of Google searches on "change multiple columns in r" or "change column names in r". The best I can find online is examples where people change a few columns with a c() function and I get how that works, but that still seems to require typing out all the column names as parameters to the function, unless there is a way to just pass the "variableNames" file into that c() function, but I don't know of one.
Will
colnames(projectFileAllCombined)[-1] <- variableNames
not suffice?
This assumes the ordering of columns in projectFileAllCombined is the same as the ordering of the new variable names in variableNames, and that
length(variableNames) == (ncol(projectFileAllCombined) - 1)
The key point here is that the replacement function 'colnames<-'() is vectorised and can replace any number of column names in a single call if passed a vector of replacement values.

Resources