I am trying to learn how to use R and I'd like to run simple/multiple/logistic regressions but I am stuck right at the beginning. I have succesfully loaded an spss database in R using this code:
> library(foreign)
> data<-read.spss("new long an.sav", use.value.labels=TRUE, to.data.frame=TRUE)
re-encoding from UTF-8
> data
Then, I was trying to specify the data file I want to undertake my regressions on by doing as following:
> newlongan<-read.delim("new long an.sav", header = TRUE)
However, the following error messages comes up and I am not sure how to solve them:
> Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'new long an.sav'
I have got car, boot and QuantPsyc installed. Do you have any idea? Thanks
Silvia
Related
I'm trying to download some data from the internet to work with Text Mining in R, but running the command fails.
The commands are:
url <- 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
arquivo <- read.csv(url)
And the error is:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string 1
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
I tried several arguments for read.csv() function, but without success.
This is a text (.txt) document from project Gutenberg. Use readLines
url <- 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
arquivo <- readLines(url)
This works for me
This:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string 1
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
is telling you there is non-text data in the stream. Upon inspection this appears to be a GZ-encoded stream, which a web browser would decode on the fly to present the plain text. R might not want to do that. You can get the plain text version from this URL
> txt = readLines("http://www.gutenberg.org/files/100/100-0.txt")
> txt[14532]
[1] "ADRIANA. To fetch my poor distracted husband hence."
> txt[143532]
[1] " He looks like sooth. He says he loves my daughter;"
The tidyverse package readr is an option:
arquivo <- readr::read_file(url)
I have a large data set of (~20000x1). Not all the fields are filled, in other words the data does have missing values. Each feature is a string.
I have done the following code runs:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output for the second code:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
I run into essentially 4 problems, that I see. Two of the problems are the error message stated above. The third one is if it doesn't spit out an error message, when I look at the global environment window, I see not all my rows are accounted for, like ~14000 samples are missing but the feature number is right. The other problem I see is, again, not all the samples are counted for and the feature number is not correct.
How can I solve this??
Try the argument comment.char = "" as well as quote. The hash (#) is being read by R as a comment and will cut the line short.
Can you open the CSV using Notepad++? This will allow you to see 'invisible' characters and any other non-printable characters. That file may not contain what you think it contains! When you get the sourcing issue resolved, you can choose the CSV file with a selector tool.
filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)
Or, hard-code the path, and read the data into R.
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")
When I run read.csv on a dataset
read.csv(file = msleep_ggplot2, header = TRUE, sep = ",")
I get an error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection
The csv file loaded in r studio and looks good. Any idea what the problem might be?
I was having difficulties importing an excel sheet into R (csv). However, after reading this post, I was able to successfully import it. However, I noticed that some of the numbers in a particular column have transformed into unwanted characters-"Ï52,386.43" "Ï6,887.61" "Ï32,923.45". Any ideas how I can change these to numbers?
Here's my code below:
df <- read.csv("data.csv", header = TRUE, strip.white = TRUE,
fileEncoding="latin1", stringsAsFactors=FALSE)
I've also tried fileEncoding = "UTF-8" but this doesn't work-I'm getting the following warning:
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'data.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote
I am using a mac with "R version 3.2.4 (2016-03-10)" (if that makes any difference). Here are the first ten entries from the affected column:
[1] "Ï52,386.43" "Ï6,887.61" "Ï32,923.45" "" "Ï82,108.44"
[6] "Ï6,378.10" "" "Ï22,467.43" "Ï3,850.14" "Ï5,547.83"
It turns out the issue was a pound sign that got changed into Ï in the process of saving an xls file into csv format (in windows-opened in a mac). Thanks for your replies.
I am trying to make R read my CSV file (which contains numerical and categorical data). I am able to open this file on a Windows computer(I tried different ones and it always worked) without any issues, but it is not working on my Mac at all. I am using the latest version of R. Originally, the data was in Excel and then I converted it to csv.
I have exhausted all my options, I tried recommendations from similar topics but nothing works. One time I sort of succeeded but the result looked like this: ;32,0;K;;B;50;;;; I tried the advice given in this topic Import data into R with an unknown number of columns? and the result was the same. I am a beginner in R and I really know nothing about coding or programming, so I would appreciate tremendously any kind of advice on this issue.Below are my feckless attempts to fix this problem:
> file=read.csv("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep = " ")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
> ?read.csv
> file=read.csv2("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv2("~/Desktop/file.csv", sep = ";", header=TRUE)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep=" ",row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
> file=read.csv("~/Desktop/file.csv", row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
> file=read.csv("~/Desktop/file.csv", sep=";",row.names=1)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
This is what the header of the data looks like. So using the advice below, I saved the document in the CSV format for Mac and once I executed the View(file) function, everything looked ok, except for some rows like the row#1 (Cord Number 1) below, it was completely misplaced :
Cord.Number Ply Attch Knots Length Term Thkns Color Value
1,S,U,,37.0,K,,MB,,,"5.5 - 6.5:4, 8.0 - 8.5:2",,UR1031,unknown,
1s1 S U 1S(5.5/Z) 1E(11.5/S) 46.5 K NA W 11
1s2 S U 1S(5.5/Z) 5L(11.0/Z) 21.0 B NA W 15
This is what the spreadsheet looks like in R Studio on Windows (I don't have enough reputation to post an image):
http://imgur.com/zQdJBT2
As a workaround, what you can do is open the csv file on a Windows machine, and then save it to a .rdata file. Rdata is R's internal storage format. You can then put the file on a USB stick, (or DropBox, Google Drive, or whatever), copy it to your Mac, and work on it there.
# on the Windows PC
dat <- read.csv("<file>", ...)
save(dat, file="<file location>/dat.rdata")
# copy the dat.rdata file over, and then on your Mac:
load("<Mac location>/dat.rdata")
fileEncoding="latin1" is a way to make R read the file, but in my case it came with loss of data and special characters. For example, the symbol € disappeared.
As a workaround that worked best for me for this issue (I'm on a mac too), I opened first the file on Sublime Text, and saved it "with encoding" UTF 8.
When trying to import it after again, it could get read by R with no problem, and my special character were still present.
I had a similar problem, but when including , fileEncoding="latin1" after file's name it works