I imported a CSV file from Excel. All the Revenue columns are importing as strings, but I want them to be numeric.
I thought it would be as easy as a$Revenue <- as.numeric(a$Revenue), but this coerces every cell to NA, wiping out the data. So the column does convert to numeric, but I lose all the values.
Is there another technique?
When you load the data, try setting stringsAsFactors = FALSE in the read.csv() call, and then try the conversion again.
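For example, a minimal sketch (the file name revenue.csv is hypothetical):

a <- read.csv("revenue.csv", stringsAsFactors = FALSE)
a$Revenue <- as.numeric(a$Revenue)   # character -> numeric

If the column has already been read in as a factor, convert through character first, e.g. as.numeric(as.character(a$Revenue)); calling as.numeric() directly on a factor returns the internal level codes rather than the printed values.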
In Excel I set my data set to numeric format, and all data points are numbers. I then convert it to a .txt file and load it into R. Somehow, once it is in R, some data points are read as character and some as numeric. I am unsure how to make sure that all data points are in numeric format when I load them.
I need them all to be numeric because at a later stage I use cTree, which does not work if the data is character.
thanks
What R function are you using to load the data? If it is read_csv() from the readr package, you can use the col_types argument to control how each column is read.
For example:
library(readr)
dataframe <- read_csv("file.csv", col_types = cols(a = col_double(), b = col_character()))
I have tables of weather data from several stations. When I import them separately using read.csv, the fields are factors, integers, and numerics. However, when I try to import one csv file with all of the data combined, the resulting fields in the data frame are all factors. In the combined file the 1st field has several alphanumeric values, whereas in the individual files there is only one value (the name of the station).
This is a common behaviour of data.frame() from base R, and most of the time the result of read.csv() is stored in a data.frame. As @Duck suggested in the comment section, you can avoid this behaviour by setting the stringsAsFactors argument to FALSE.
read.csv('myfile.csv', stringsAsFactors = FALSE)
You can check the description below on the documentation page of the data.frame() function. You can access this documentation with the ?data.frame command.
Character variables passed to data.frame are converted to factor columns unless protected by I() or argument stringsAsFactors is false.
So in your case, this happens in your combined file because R is interpreting all variables as characters. Why? Probably because in one (or some) of your files, some rows in the numeric and integer columns are out of format. For example, maybe in one row you have an "x" to represent a missing value. read.csv() uses the entire file to decide which format each column has, so as soon as the function hits this "x" value, it interprets the entire column as character. When this data is passed to data.frame(), the function converts those characters to factors. You said that in the combined file the first field has some alphanumeric values; those values are probably the "x"s that are generating your problem.
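To track down the offending rows yourself, here is a minimal sketch (df$temp is a hypothetical column name; adjust to your data):

x <- as.character(df$temp)
bad <- which(!is.na(x) & is.na(suppressWarnings(as.numeric(x))))
df[bad, ]   # rows whose text does not parse as a number, e.g. the "x" placeholders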
I am using the read.xlsx function in R to read Excel sheets. All the values of a date column 'A' are of the form dd/mm/yyyy. However, when using the read.xlsx function, the parsed date values range from integers, i.e. 42283, to strings, i.e. 20/08/2015. This problem persists even when I use read.xlsx2.
I guess the inconsistency in format across rows makes it hard to convert the column to a single standard format. Also, it is hard to specify the column classes in read.xlsx since I have more than 100 variables.
Are there ways around this problem, and is it an Excel-specific problem?
Thank you!
This problem with date formats is pervasive, and it seems like every R package out there deals with it differently. My experience with read.xlsx has been that it sometimes saves the date as a character string of numbers, e.g. "42438", which I then have to convert to numeric and then to POSIXct. Other times it seems to save it as numeric, sometimes as character, and once in a while actually as POSIXct! If you're consistently getting character data in the form "20/08/2015", try the lubridate package:
library(lubridate)
dmy("20/08/2015")
I imported a set of data into RStudio containing 85 variables and 139 observations. All values are integers except for the last column, which is blank and for some reason was imported alongside everything else in the .csv file I created from a .xls file.
As such, this last column is all NA values. The problem is that when I try to run any kind of analysis, it seems to treat all values as NA. Despite this, in the data window in RStudio everything seems fine. Are there solutions to this problem that don't involve changing the data, or is it almost certainly the data that's the problem?
It seems strange, given that the file looks fine when opened anywhere else, and even when viewed in R.
The most likely issue is that the file is being imported as all text rather than as numeric data. If all of the data is numeric, you can pass colClasses = "numeric" as an argument to read.csv() and it should import correctly. You could also change the class once the data is in R, or give colClasses a vector of different classes if you have a variety of data types (logical, character, numeric, etc.) in your file.
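For example, a minimal sketch (file name and column layout are hypothetical):

df <- read.csv("mydata.csv", colClasses = "numeric")   # every column read as numeric

# or one class per column, for a file with three columns:
df <- read.csv("mydata.csv",
               colClasses = c("character", "numeric", "logical"))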
Edit
Seeing as colClasses is not working (it is hard to say why without looking at your data), you can try this:
MyDF <- data.frame(sapply(MyDF, FUN = as.numeric))
where MyDF is your data frame. That will change all of your columns to numeric. If you have character/factor/logical values in there, this may not work as expected. You might want to check your Excel/csv file to see why it is importing an NA column. It could be that a cell containing a space is being pulled in and throwing things off. You could always try deleting that empty column and retrying your import.
If you want to omit your last column while reading the data itself, you can try the following code. In this example, I am assuming that your file has 5 columns and the 5th column has NA values, so you want to skip reading the 5th column of your data set.
data <- read.csv(fileName, ....)[, 1:4]
or, if you want to use column names, you can use:
data <- read.csv(fileName, ....)[, c('col1', 'col2', 'col3', 'col4')]
This will read all the observations from the selected columns of your data set.
Hope this helps.
If you are trying to find the mean and standard deviation, you can use:
Data <- mean(dataframe$colname, na.rm = TRUE)
Data1 <- sd(dataframe$colname, na.rm = TRUE)
This will give you the answer after omitting the NA values from the column.
How do I export a data frame completely as character in R? I have digits that need to be treated as text in large data frames, and I'm using write.csv, but even though I imported the digits into R as characters, they are exporting as numbers (not surrounded by "" when viewed in Notepad) and are occasionally rewritten as, e.g., 1e-04 (for a small decimal value). This is for data munging, and I need things to stay as formatted (once formatted). Shouldn't that be possible with some form of as.character or similar?
Make it into a matrix. If there is at least one character column in your data frame, it will coerce the rest to character to match, since you can only have one type of data in a matrix.
new <- as.matrix(old_data_frame)
If there are no character columns in your old data frame, do:
new <- matrix(as.character(as.numeric(as.matrix(old_data_frame))),
              ncol = ncol(old_data_frame))
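A minimal usage sketch, with a hypothetical two-column data frame:

old_data_frame <- data.frame(id = c("001", "002"),   # character column
                             value = c(10, 20),      # numeric column
                             stringsAsFactors = FALSE)
new <- as.matrix(old_data_frame)   # everything is coerced to character
write.csv(new, "out.csv")          # values now export quoted, as text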
If you use the function
write.table(x, file = , quote = TRUE, ...)
anything that is a string will be quoted on output.
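A minimal sketch (the file name and data frame are hypothetical):

df <- data.frame(id = c("001", "002"), stringsAsFactors = FALSE)
write.table(df, file = "out.csv", sep = ",", quote = TRUE, row.names = FALSE)
# out.csv now contains the id values as "001" and "002", with quotes intact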