How to read data from Excel in R? - r

I am trying to prepare data for cluster analysis, so I have prepared a data table in Excel with the headers "id", "name", "crime_type", "crime_date", "gender", "age".
Then I converted the Excel file to .csv format and ran the following commands:
>crime <- read.csv("crime_data.csv",header=T)
>crime # print it, and it prints fine
# now I will do cluster with kmeans()
>kmeans.result <- kmeans(crime,3)
But it shows the following error:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In kmeans(crime, 3) : NAs introduced by coercion
What am I doing wrong here?

I can't speak to your specific problem without knowing what your data looks like, but it could be as simple as giving the xlsx package a try; I think it handles NaNs better.
install.packages("xlsx")
library(xlsx)
yourdata <- read.xlsx("YOURDATASHEET.xlsx", sheetName="THESHEETNAME")

It seems like you are asking two questions. For the first: you can also try reading directly from the clipboard (beware of large tables, though; so far I have had good results with 40k rows, 30 columns):
d1 <- read.table(file="clipboard", sep="\t", header=FALSE, stringsAsFactors=FALSE)
Set header to TRUE if you want to name your columns. You can also use what was suggested above to open Excel sheets directly, but this might not be practical if you have non-standard tables.
For the second part, you should probably convert the offending columns to numeric using sapply and/or suppressWarnings().
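A minimal sketch of that coercion step (the data frame and its values are invented here, since the original data wasn't posted; na.omit() then drops the rows where coercion failed):

```r
# Toy stand-in for the crime table: character columns that should be numeric
df <- data.frame(age   = c("23", "31", "45", "n/a", "52"),
                 score = c("1.2", "3.4", "2.2", "5.0", "4.1"),
                 stringsAsFactors = FALSE)

# Coerce every column to numeric; suppressWarnings() hides the
# "NAs introduced by coercion" warning for values like "n/a"
num <- as.data.frame(suppressWarnings(sapply(df, as.numeric)))

# Drop the rows where coercion produced NA, then cluster
num <- na.omit(num)
kmeans.result <- kmeans(num, 2)
```

kmeans() only accepts an all-numeric matrix or data frame, which is why columns like "name" or "crime_type" would have to be dropped or encoded first.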

Related

Error while using colSums with tab-delimited file

I'm new to R and I'm currently trying to get some statistics out of a file. It is a large set of data in a tab-delimited .txt file. Importing the file went without problems and all of the data shows correctly as a table in RStudio. However, when I try to make any sort of calculation using colSums, I receive an error:
> colSums("Wages and salaries")
Error in colSums("Wages and salaries") : 'x' must be an array of at
least two dimensions
"Wages and salaries" is the name of the column I'm trying to get the sum of.
Using V1 or any other column name that was created by R gives me another error:
> colSums(V2)
Error in is.data.frame(x) : object 'V2' not found
The way I'm importing the file is
rm(list=ls())
filename <- read.delim("~/filename.txt", header=FALSE)
> is.data.frame(filename)
[1] TRUE
This gives me a data frame with rows and columns, the same way Excel would show me the data.
The reason I'm trying to get the sum of all of the numbers in one column is to later get the sums of several different columns.
I'm very new to R and I could not find an answer to my question, as most of the examples use just a very small set of data created directly in R.
In R you can access a column in two ways:
filename["Wages and salaries"]
or
filename$`Wages and salaries`
So, please try:
colSums(filename["Wages and salaries"])
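A self-contained sketch with toy numbers (the column names are taken from the question; note also that the file was imported with header=FALSE, so for named columns to exist at all it would need header=TRUE):

```r
# Toy stand-in for the imported table; check.names = FALSE keeps the
# space in the column name instead of converting it to dots
filename <- data.frame("Wages and salaries" = c(100, 200, 300),
                       "Bonuses"            = c(10, 20, 30),
                       check.names = FALSE)

colSums(filename["Wages and salaries"])                # one column
colSums(filename[c("Wages and salaries", "Bonuses")])  # several columns at once
```

The single-bracket form filename[...] keeps the result as a data frame, which is why colSums() accepts it: it still has two dimensions.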

Error in eval(expr, envir, enclos) : object 'score' not found

We have always been an SPSS shop, but we're trying to learn R and have been doing some basics. I am trying to do a simple t-test, just to learn that code.
Here's my code, and what's happening:
Code screenshot
I don't get why it says "score" isn't found. It's in the table, and read.csv defaults to assuming the first row contains headers. I can't see why it's not finding the score column. I'm sure it's something simple, but I would appreciate any help.
You didn't store your imported csv file in a variable. It printed to the console because it had nowhere else to go - it just gets printed and then thrown away. You need to assign it for it to be saved in memory:
my_data_frame <- read.csv("ttest.csv")
Now your data exists in the variable my_data_frame and you can use it by supplying it as the data argument:
t.test(score ~ class, mu=0, alternative="two.sided", conf.level=0.95, var.equal=FALSE, paired=FALSE, data=my_data_frame)
Also, in general, I would recommend using read_csv from the package readr over the default read.csv - it's faster.
Finally, when you ask questions, please provide your code as text, not a picture. You should also provide your data or a toy dataset - you can use the function dput to print code that will create your data, or just provide the csv file or some code that creates toy data.
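For example, a toy dataset built directly in code (the score and class column names come from the question; the numbers are invented) makes the whole thing reproducible without the csv file:

```r
# Hypothetical stand-in for ttest.csv
my_data_frame <- data.frame(
  score = c(5.1, 4.8, 6.0, 7.2, 6.9, 7.5),
  class = rep(c("a", "b"), each = 3)
)

# Welch two-sample t-test of score between the two classes
res <- t.test(score ~ class, data = my_data_frame, var.equal = FALSE)
res$p.value
```

dput(my_data_frame) would print code that recreates this object exactly, which is the easiest thing to paste into a question.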

Error in fstRead R

I have been using the new 'fst' package in R for a few weeks to write and read tables in the .fst format. Sometimes I cannot read a table that I have just written, getting the following message:
> tab=read.fst("Tables R/tab.fst",as.data.table=TRUE)
Error in fstRead(fileName, columns, from, to) :
Unknown type found in column.
Do you know why this happens? Is there another way to retrieve the table?

Error: could not find function "includePackage"

I am trying to execute the Random Forest algorithm on SparkR, with Spark 1.5.1 installed. I do not understand why I am getting the error:
Error: could not find function "includePackage"
Further, even if I use the mapPartitions function in my code, I get the error:
Error: could not find function "mapPartitions"
Please find the below code:
rdd <- SparkR:::textFile(sc, "http://localhost:50070/explorer.html#/Datasets/Datasets/iris.csv", 5)
includePackage(sc, randomForest)
rf <- mapPartitions(rdd, function(input) {
  ## my function code for RF
})
This is more of a comment and a cross-question than an answer (I'm not allowed to comment because of reputation), but just to take this further: if we are using the collect method to convert the RDD back to an R data frame, isn't that counterproductive? If the data is too large, it would take too long to process in R.
Also, does this mean we could possibly use any R package, say markovChain or neuralnet, using the same methodology?
Kindly check the functions that can be used in SparkR: http://spark.apache.org/docs/latest/api/R/index.html
This does not include the functions mapPartitions() or includePackage().
# For reading a csv in sparkR
sparkRdf <- read.df(sqlContext, "./nycflights13.csv",
                    "com.databricks.spark.csv", header = "true")
# A possible way to use `randomForest` is to convert the `sparkR` data frame to an `R` data frame
Rdf <- collect(sparkRdf)
# compute as usual in `R` code
install.packages("randomForest")
library(randomForest)
......
# convert back to a sparkR data frame
sparkRdf <- createDataFrame(sqlContext, Rdf)

Error in eval(expr, envir, enclos) : object 'RM' not found

I'm trying to do a simple linear regression by calling a .CSV file in which I have 14 different variables, which were first in an Excel file.
In Excel I named the variables by just putting a name above the data, but I don't really believe that names the whole column after that variable.
(Of course I first read the .CSV file from R:
> datos<-read.table("datos3.csv",header=T,sep=";",dec=",")
So that must be why, when I call them by name in R like:
regresion<-lm(RM~MEDV)
R says: Error in eval(expr, envir, enclos) : object 'RM' not found
The strange thing is that when I change the order of the variables (say MEDV~RM) I get the same error, but this time saying that MEDV was not found. Is it because it is the first thing it reads and detects?
Is there a way, from Excel or R, to name the variables so that I can call them without errors, or is the problem somewhere else?
UPDATED CODE
rm(list=ls())
datos<-read.table("datos3.CSV",header=T,sep=";",dec=",")
View(datos)
regresion<-lm(MEDV,RM,data=datos)
After this i get
Error in stats::model.frame(formula = MEDV, data = datos, subset = RM, :
object 'MEDV' not found
Basically the problem was that, since I was first using an Excel file for my data, the variable names were included in the first row; so when converting it to .CSV format and reading it into R, those characters were turned into data and the columns became factors, meaning the regression did not have only numeric values to work with.
After that it was the "dec" argument in read.csv causing trouble and printing the same error. So after removing that and typing:
datos <- read.csv("datos3.csv")
I had no trouble running the regression and could analyze the data I wanted.
(Special thanks to #user20650 for helping me out.)
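To illustrate the working version (toy numbers; RM and MEDV are just the column names from the question), the formula interface with a data argument is what lm() expects:

```r
# Toy stand-in for datos3.csv
datos <- data.frame(RM   = c(6.5, 6.4, 7.1, 6.9, 6.0),
                    MEDV = c(24.0, 21.6, 34.7, 33.4, 18.9))

# response ~ predictor, with the data frame passed via data=
regresion <- lm(MEDV ~ RM, data = datos)
coef(regresion)  # intercept and slope
```

Passing data=datos is what lets R resolve the bare names MEDV and RM; without it, lm() looks for them as free-standing objects and fails with "object 'RM' not found".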
