I have a CSV file with non-ASCII characters in it. I simply want to remove those characters and read the file.
> tables <- lapply('/.././abc.csv', read.csv,header=F,stringsAsFactors=FALSE,fileEncoding="UTF-8")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
invalid input found on input connection '/.././abc.csv'
> df= suppressWarnings(do.call(rbind, tables))
It is not reading the complete file: it reads only the records before the non-ASCII characters and skips everything after them.
I cannot use iconv('/.././abc.csv', "latin1", "ASCII", sub="") because iconv expects a character vector, not a file path.
cat '/.././abc.csv'
88036,120,151036.656250,2017-07-17 22:27:49,17-07-17 22:27:49
88036,120,151036.671875,2017-07-17 22:27:53,17-07-17 22:27:53
88036,310,151036.687500,2017-07-17 22:27:58,17-07-17 22:27:58
88036,310,151036.703▒▒F▒▒B▒▒▒D▒%▒▒▒2▒T▒▒K222642,17-07-17 22:28:03,2017-07-17 22:28:03
88036,310,151036.484375,2017-07-17 22:26:54,17-07-17 22:26:54
88036,310,151036.500000,2017-07-17 22:26:59,17-07-17 22:26:59
It is skipping the last 2 records when reading the CSV file. Any help?
What if you read the file first and then do
td <- td[, lapply(.SD, function(x) iconv(x, "latin1", "ASCII", sub = ""))]
assuming that you read your CSV file as a data.table?
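A minimal sketch of that idea, assuming data.table's fread() can get the raw bytes in (it is often more forgiving than read.csv here); the path is the one from the question:
library(data.table)
# read first, then strip non-ASCII bytes column by column;
# iconv() is fine here because each column is a vector
td <- fread("/.././abc.csv", header = FALSE)
td <- td[, lapply(.SD, function(x) iconv(as.character(x), "latin1", "ASCII", sub = ""))]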
I have a simple CSV file which was created in R using
write.csv(exportChoice, file="exportChoice.csv")
When I read this file using
exportChoice <- read.csv(file = "exportChoice.csv", header=TRUE, row.names=1)
I get the error message:
Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1:
invalid 'sep' value: must be one byte
but I am certain that I only have a single comma separator.
When I open this CSV file in Microsoft Word, I see that my row and column names are surrounded by quotation marks, for some reason. Could this be the problem with reading the file? How do I solve it?
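For what it's worth, the quotation marks are expected: write.csv() quotes character fields and row/column names by default, and read.csv() strips them again on input, so they should not be the cause. A minimal sketch of both behaviours (exportChoice is the object from the question):
# quoted output is the default
write.csv(exportChoice, file = "exportChoice.csv")
# quote = FALSE writes names and strings unquoted instead
write.csv(exportChoice, file = "exportChoice.csv", quote = FALSE)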
I have a large data set (~20000 x 1). Not all the fields are filled; in other words, the data has missing values. Each feature is a string.
Here is what I have run:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output of the second call:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
As I see it, I run into essentially four problems. Two of them are the error messages shown above. The third is that, even when no error is raised, the global environment window shows that not all my rows are accounted for: roughly 14000 samples are missing, although the feature count is right. The fourth is that, again, not all the samples are accounted for and the feature count is wrong as well.
How can I solve this?
Try the argument comment.char = "" as well as quote = "". A hash (#) in the data is read by R as a comment character and cuts the line short.
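For example, with the file name from the question (a sketch, not tested on the actual data):
# disable quote and comment handling so stray " and # characters
# cannot cut a line short; fill = TRUE pads incomplete rows
datan <- read.table("data.csv", header = TRUE, sep = ",",
                    quote = "", comment.char = "", fill = TRUE)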
Can you open the CSV in Notepad++? It lets you see 'invisible' and other non-printable characters, and the file may not contain what you think it contains! Once the sourcing issue is resolved, you can pick the CSV file with a selector tool:
filename <- file.choose()             # pick the CSV interactively
data <- read.csv(filename, skip = 1)  # skip = 1 drops the first line
name <- basename(filename)            # file name without the path
Or, hard-code the path, and read the data into R.
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")
I'm trying to read a large CSV file into R. The file is available at https://github.com/AidData-WM/public_datasets/releases/download/v3.0/AidDataCore_ResearchRelease_Level1_v3.0.zip and the README states that the encoding is UTF-8 and there should be 1,561,039 rows and 68 columns. I have tried several different ways to read in the data, but cannot get the full dataset to be read in. I think some problems might arise because: (i) there are incomplete quotations inside character strings, (ii) there are commas inside character strings and sep="," (so I can't use quote="" to deal with the quotations issue), and (iii) there are unusual characters such as arrows.
Here are my various attempts to read the data and the resulting warnings:
aid <- read.csv("AidDataCoreFull_ResearchRelease_Level1_v3.0.csv", header=T, encoding="UTF-8")
> dim(aid)
[1] 9960 68
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
aid <- read.table("AidDataCoreFull_ResearchRelease_Level1_v3.0.csv", header=T, sep=",", encoding="UTF-8")
> dim(aid)
[1] 9960 68
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
aid <- read.csv("AidDataCoreFull_ResearchRelease_Level1_v3.0.csv", header=F, skip=1, quote="", encoding="UTF-8")
> dim(aid)
[1] 10956 72
No warning message this time, but nowhere near the full number of rows was read in, and now there are too many columns.
tx <- readLines("AidDataCoreFull_ResearchRelease_Level1_v3.0.csv", encoding="UTF-8", skipNul=T)
> length(tx)
[1] 9961
Warning message:
In readLines("AidDataCoreFull_ResearchRelease_Level1_v3.0.csv", :
incomplete final line found on 'AidDataCoreFull_ResearchRelease_Level1_v3.0.csv'
I can't find a combination of commands that reads in the full CSV, and I can't open it in Excel to view and try to tidy up the data. Any help would be greatly appreciated!
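One more approach worth sketching: data.table's fread() tends to be more tolerant of embedded quotes and reports the rows it cannot parse cleanly. File name as in the question:
library(data.table)
# fill = TRUE pads short rows instead of stopping at them
aid <- fread("AidDataCoreFull_ResearchRelease_Level1_v3.0.csv",
             encoding = "UTF-8", fill = TRUE)
dim(aid)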
I am trying to import into R a text file saved with TextWrangler as Unicode (UTF-8) with Unix (LF) line endings.
Here is the code I am using:
scan("Testi/PIRANDELLOsigira.txt", fileEncoding='UTF-8', what=character(), sep='\n')
I got the following warning:
Read 6 items
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
invalid input found on input connection 'Testi/PIRANDELLOsigira.txt'
and a vector that stops at the first accented character.
First, change your locale from Italian to English:
Sys.setlocale(category="LC_ALL", locale = "English_United States.1252")
Then you can read the data with the Latin-1 encoding, which covers Italian accented characters:
df_ch <- read.table("test.utf8",
                    sep = ",",
                    header = TRUE,
                    encoding = "latin1")
If you want to read the data with UTF-8 encoding only, you can simply use the following:
yourdf <- read.table("path/to/your/data.utf8",
                     sep = ",",
                     header = TRUE,
                     encoding = "UTF-8")
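One caveat: in read.table() the encoding argument only labels the strings after they are read, while fileEncoding re-encodes the file as it is read from disk. If the file really is UTF-8, a sketch like this may be all that's needed:
yourdf <- read.table("path/to/your/data.utf8",
                     sep = ",",
                     header = TRUE,
                     fileEncoding = "UTF-8")  # re-encode while reading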
I am trying to make R read my CSV file (which contains numerical and categorical data). I am able to open this file on a Windows computer (I tried different ones and it always worked) without any issues, but it is not working on my Mac at all. I am using the latest version of R. Originally, the data was in Excel and then I converted it to CSV.
I have exhausted all my options: I tried recommendations from similar topics but nothing works. One time I sort of succeeded, but the result looked like this: ;32,0;K;;B;50;;;; I tried the advice given in this topic Import data into R with an unknown number of columns? and the result was the same. I am a beginner in R and I really know nothing about coding or programming, so I would tremendously appreciate any kind of advice on this issue. Below are my feckless attempts to fix this problem:
> file=read.csv("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep = " ")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
> ?read.csv
> file=read.csv2("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv2("~/Desktop/file.csv", sep = ";", header=TRUE)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep=" ",row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
> file=read.csv("~/Desktop/file.csv", row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
> file=read.csv("~/Desktop/file.csv", sep=";",row.names=1)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
This is what the header of the data looks like. Using the advice below, I saved the document in the CSV format for Mac, and once I executed View(file) everything looked OK, except for some rows like row #1 ("Cord Number 1") below, which was completely misplaced:
Cord.Number Ply Attch Knots Length Term Thkns Color Value
1,S,U,,37.0,K,,MB,,,"5.5 - 6.5:4, 8.0 - 8.5:2",,UR1031,unknown,
1s1 S U 1S(5.5/Z) 1E(11.5/S) 46.5 K NA W 11
1s2 S U 1S(5.5/Z) 5L(11.0/Z) 21.0 B NA W 15
This is what the spreadsheet looks like in R Studio on Windows (I don't have enough reputation to post an image):
http://imgur.com/zQdJBT2
As a workaround, you can open the CSV file on a Windows machine and save it to a .rdata file, R's internal storage format. You can then put the file on a USB stick (or Dropbox, Google Drive, or whatever), copy it to your Mac, and work on it there.
# on the Windows PC
dat <- read.csv("<file>", ...)
save(dat, file="<file location>/dat.rdata")
# copy the dat.rdata file over, and then on your Mac:
load("<Mac location>/dat.rdata")
fileEncoding="latin1" is a way to make R read the file, but in my case it came with loss of data and special characters. For example, the symbol € disappeared.
As a workaround that worked best for me for this issue (I'm on a mac too), I opened first the file on Sublime Text, and saved it "with encoding" UTF 8.
When trying to import it after again, it could get read by R with no problem, and my special character were still present.
I had a similar problem, but adding fileEncoding="latin1" after the file name made it work.