Prevent a numeric from being converted into a factor - r

I'm building a table from a CSV file. When the file is initially loaded I need to load it as characters.
datset <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
I have a function to convert a factor containing numbers (from another Stack Overflow question):
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
I clean up the file with
i<-17
datset[datset=="Not Available"]<-NA
datset<-datset[complete.cases(datset[,i]),]
x<- as.numeric.factor(datset[, i])
The datset table contains lots of columns I don't need, so I build a new table:
dat <- data.frame(cbind("HospitalName"= datset[,2], "State"= datset[,7],"Rating" = x))
My problem is that even though x is numeric, it gets turned into a factor when loaded into the data frame. I can verify this from debug mode with:
class(x)
[1] "numeric"
class(dat[,3])
[1] "factor"
In later code I'm trying to sort the Rating column, but it's failing, I guess due to it being a factor.
I've even tried appending stringsAsFactors = FALSE to read.csv but this has no effect.
How can I prevent x from being converted into a factor when loading it into a data frame?

As Henrik explained in his comment, this:
dat <- data.frame(cbind("HospitalName"= datset[,2], "State"= datset[,7],"Rating" = x))
is a poor way to construct a data frame. cbind converts everything to a matrix, which can only hold a single data type, so the numeric x is coerced to character and data.frame() then turns the character columns into factors. Hence the coercion.
It would be better to do:
dat <- data.frame(HospitalName = datset[,2], State = datset[,7], Rating = x)
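If you also want HospitalName and State kept as character rather than factor (a side note, not part of the original answer), data.frame() accepts stringsAsFactors = FALSE as well:
dat <- data.frame(HospitalName = datset[,2], State = datset[,7], Rating = x, stringsAsFactors = FALSE)  # keeps the character columns as character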
However, it is also true, as Roland mentioned, that you should be able to specify this one column as numeric when reading the data in via:
colclasses <- rep("character", 40)
colclasses[17] <- "numeric"
and then passing that vector as the colClasses argument to read.csv.
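For example, reusing the file name from the question (the 40-column count above is the answer's assumption):
datset <- read.csv("outcome-of-care-measures.csv", colClasses = colclasses)  # column 17 comes in numeric, the rest character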

Related

R - Error in colMeans(wind.speed, na.rm = T) : 'x' must be numeric

I am trying to import a single column from a set of text files, where each file is a single day of data. I want to take the mean of each day's wind speed. Here is the code I have written for that:
daily.wind.speed <- c()
file.names <- dir("C:\\Users\\User Name\\Desktop\\R project\\Sightings Data\\Weather Data", pattern =".txt")
for(i in 1:length(file.names))
{
  ## import data file into data frame
  weather.data <- read.delim(file.names[i])
  ## extract wind speed column
  wind.speed <- weather.data[3]
  ## attempt to fix numeric error
  ## wind.speed.num <- as.numeric(wind.speed)
  ## take the column mean of wind speed
  daily.avg <- colMeans(wind.speed, na.rm = T)
  ## add daily average to list
  daily.wind.speed <- c(daily.wind.speed, daily.avg)
  ## print for troubleshooting and progress
  print(daily.wind.speed)
}
This code seems to work on some files in my data set, but others give me this error during this section of the code:
> daily.avg <- colMeans(wind.speed,na.rm=T)
Error in colMeans(wind.speed, na.rm = T) : 'x' must be numeric
I am also having trouble converting these values to numeric and am looking for options either to convert my data to numeric, or to take the mean in a different way that doesn't run into this issue.
> as.numeric(wind.speed.df)
Error: (list) object cannot be coerced to type 'double'
weather.data example: (sample not shown)
Even though this is not a reproducible example, the problem is that you are applying a matrix function to a single column of data, so it won't work as expected. Just change colMeans to mean.
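A minimal sketch of that change inside the loop; the [[ ]] extraction and the explicit coercion are additions to the answer above, on the assumption that the third column may have been read in as factor or character:
wind.speed <- weather.data[[3]]   # [[ ]] returns a vector rather than a one-column data frame
daily.avg <- mean(as.numeric(as.character(wind.speed)), na.rm = TRUE)   # mean() instead of colMeans()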

Convert a dataframe to a character array?

I am trying to convert a dataframe to a character array in R.
This works, but the text file only contains about 83 records:
data <- readLines("https://www.r-bloggers.com/wp-content/uploads/2016/01/vent.txt")
df <- data.frame(data)
textdata <- df[df$data, ]
This does not work, maybe because it has 3k records?
trump_posts <- read.csv(file="C:\\Users\\TAFer\\Documents\\R\\TrumpFBStatus1.csv",
sep = ",", stringsAsFactors = TRUE)
trump_text <- trump_posts[trump_posts$Facebook.Status, ]
All I know is I have a dataframe called trump_posts. The frame has a single column called Facebook.Status. I just wanted to turn it into a character array so I can run an analysis on it.
Any help would be very much appreciated.
Thanks
If Facebook.Status is a character vector you can directly perform your analysis on it.
Or you can try:
trump_text <- as.character(trump_posts$Facebook.Status)
I think you are somehow confusing data.frame syntax with data.table syntax. For a data.frame you'd reference the vector as df$col, whereas for a data.table it is somewhat similar to what you wrote: dt[, col] or dt[, dt$col]. Also, if you want a character vector right away, set stringsAsFactors = FALSE in your read.csv; otherwise you'll need an extra conversion, for example dt[, as.character(col)] or as.character(df$col).
And on a side note, the size of a vector is almost never an issue, unless you hit the limits of your hardware.
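For example, a sketch that rereads the file from the question with stringsAsFactors = FALSE, so Facebook.Status comes back as a character vector directly:
trump_posts <- read.csv(file = "C:\\Users\\TAFer\\Documents\\R\\TrumpFBStatus1.csv",
                        sep = ",", stringsAsFactors = FALSE)
trump_text <- trump_posts$Facebook.Status   # already character, no conversion needed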

Splitting an ffdf object

I'm using the ff and ffbase libraries to manage a big csv file (~40 GB, 275e6 observations). I'd like to split/partition this file according to one of its columns (which is a factor column).
With a normal data frame, I would do something like this:
a <- data.frame(rnorm(10000,0,1),
sample(1:100,10000,replace=T),
sample(letters,10000,replace = T))
names(a) <- c('V1','V2','V3')
a_partition <- split(a,a$V3)
names(a_partition) <- paste("df",names(a_partition),sep = "_")
list2env(a_partition,globalenv())
but ff and ffbase don't have a split function. So, looking in the ffbase documentation, I found ffdfdply and tried to use it as follows:
ffa <- as.ffdf(a)
ffa_partition <- ffdfdply(x = ffa, split = ffa$V3)
Alas, I get the log message:
calculating split sizes
building up split locations
working on split 1/1, extracting data in RAM of 26 split elements, totalling, 0.00015 GB, while max specified data specified using BATCHBYTES is 0.01999 GB
... applying FUN to selected data
Error: argument "FUN" is missing, with no default
I tried FUN = as.data.frame (since the result of the function must be a data frame), with no luck: doing so makes ffa_partition a copy of ffa...
How can I partition my ffdf?
Two years late, but I believe this does what you needed:
result_list <- list()
for(letter in letters){
  result_list[[letter]] <- subset(ffa, V3 == letter)
}
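If, as in the data.frame example above, you then want each partition as its own object in the global environment, the same list2env() trick should work on this list of ffdf subsets (a sketch; the "ffdf_" prefix is just an illustrative name):
names(result_list) <- paste("ffdf", names(result_list), sep = "_")
list2env(result_list, globalenv())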

non-numeric argument error while running FFT on a matrix

I read each line of a csv file and save the first element of each line in a list; then I want to run an FFT on this list, but I get this error:
Error in fft(x) : non-numeric argument
In my example here I read 4 rows:
con <- file("C:\\bla\\test.csv", "r")
datalist <- list()
m <- list()
for(i in 1:4)
{
  line <- readLines(con, n = 1, warn = FALSE)
  m <- list(as.integer(unlist(strsplit(line, split = ","))))
  datalist <- c(datalist, sapply(m, "[[", 1))
}
datalist
close(con)
fftfun <- function(x) {fft(x)}
fft_amplitude <- function(x) {sqrt((Re(fft(x)))^2 + (Im(fft(x)))^2)}
apply(as.matrix(datalist), 2, FUN = fftfun)
What should I do to solve this problem?
EDIT
My rows in the csv file:
12,85,365,145,23
13,84,364,144,21
14,86,366,143,24
15,83,363,146,22
16,85,365,145,23
17,80,361,142,21
Your code seems overly complicated. Why don't you just do something like this:
df <- read.csv("test.csv", header=FALSE)
x <- df[,1]
fft(x)
Or, if you really want to read line by line:
con <- file("test.csv","r")
data <- NULL
for (i in 1:4) {
  line <- readLines(con, n = 1, warn = FALSE)
  data <- c(data, as.numeric(strsplit(line, split = ",")[[1]][1]))
}
close(con)
fft(data)
Let's assume your real question is: what happened to make apparently numeric data become non-numeric? Rather than slogging through the incredible number of type coercions in your code (csv to matrix to list to another list to as.matrix), I'm going to recommend you start by just plain reading one file into R and checking the typeof and class of each column. If anything turns out to be a factor rather than numeric, you may need to add the argument colClasses = 'character'.
If the data as read are numeric, then you're fouling them up in your subsequent conversions. Try simplifying the code as much as possible.
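A quick way to do that check, assuming the file has been read into df as in the first answer above:
df <- read.csv("test.csv", header = FALSE)
sapply(df, typeof)   # storage type of every column
sapply(df, class)    # class of every column (e.g. "factor" vs "integer")
str(df)              # compact overview of types and the first few values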

Imported a csv-dataset to R but the values become factors

I am very new to R and I am having trouble accessing a dataset I've imported. I'm using RStudio and used the Import Dataset function when importing my csv-file and pasted the line from the console-window to the source-window. The code looks as follows:
setwd("c:/kalle/R")
stuckey <- read.csv("C:/kalle/R/stuckey.csv")
point <- stuckey$PTS
time <- stuckey$MP
However, the data isn't integer or numeric as I am used to, but factors, so when I try to plot the variables I only get histograms, not the usual plot. When checking the data it seems to be in order; it's just that I'm unable to use it since it's in factor form.
Both the data import function (here read.csv()) and a global option let you specify stringsAsFactors = FALSE, which should fix this.
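For example, a sketch reusing the call from the question:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", stringsAsFactors = FALSE)   # columns are no longer converted to factors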
By default, read.csv inspects your data to decide whether each variable can be treated as numeric. If it finds non-numeric values, it treats the variable as character data, and character variables are converted to factors.
It looks like the PTS and MP variables in your dataset contain non-numeric values, which is why you're getting unexpected results. You can force these variables to numeric with
point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))
But any values that can't be converted will become missing. (The R FAQ gives a slightly different method for factor -> numeric conversion but I can never remember what it is.)
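For reference, that FAQ method is the same levels() trick used in the helper function at the top of this page:
point <- as.numeric(levels(point))[point]   # factor -> numeric via the factor's levels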
You can set this globally for all read.csv/read.* commands with
options(stringsAsFactors = FALSE)
Then read the file as follows:
my.tab <- read.table("filename.csv", header = TRUE, sep = ",", as.is = TRUE)
When importing csv data files, the import command should reflect both the column separator (here ";") and the decimal separator for your numeric values (for a numeric value written as 2,5 this would be ",").
The command for importing such a csv therefore has to be a bit more explicit:
stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")
This should import all variables as either integers or numeric.
None of these answers mention the colClasses argument, which is another way to specify the variable classes in read.csv.
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "numeric") # all variables to numeric
or you can specify which columns to convert:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = c("PTS" = "numeric", "MP" = "numeric") # specific columns to numeric
Note that if a variable can't be converted to numeric then by default it will be read in as a factor, which makes it more difficult to convert to a number later. Therefore, it can be advisable just to read all variables in as character (colClasses = "character") and then convert the specific columns to numeric once the csv is read in:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
point <- as.numeric(stuckey$PTS)
time <- as.numeric(stuckey$MP)
I'm new to R as well and faced the exact same problem. But then I looked at my data and noticed that it was caused by my csv file using a comma as the thousands separator in all numeric columns (e.g. 1,233,444.56 instead of 1233444.56).
I removed the thousands separators in my csv file and then reloaded it into R. My data frame now recognises all columns as numbers.
I'm sure there's a way to handle this within the read.csv function itself.
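Short of editing the file, one workaround (a sketch, assuming the affected columns are first read in as character) is to strip the thousands separators with gsub() and then convert:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
stuckey$PTS <- as.numeric(gsub(",", "", stuckey$PTS))   # "1,233,444.56" -> 1233444.56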
This only worked right for me when I also included strip.white = TRUE in the read.csv command.
For me the solution was to use the skip argument (the number of rows to skip at the top of the file; can be set > 0), for example:
mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", skip = 22)
