Changing file encoding in R

I was having difficulty importing an Excel sheet (saved as CSV) into R. After reading this post I was able to import it successfully, but I noticed that some of the numbers in a particular column have been transformed into unwanted strings: "Ï52,386.43", "Ï6,887.61", "Ï32,923.45". Any ideas how I can change these back to numbers?
Here's my code below:
df <- read.csv("data.csv", header = TRUE, strip.white = TRUE,
               fileEncoding = "latin1", stringsAsFactors = FALSE)
I've also tried fileEncoding = "UTF-8", but that doesn't work either; I get the following warning:
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'data.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote
I am using a mac with "R version 3.2.4 (2016-03-10)" (if that makes any difference). Here are the first ten entries from the affected column:
[1] "Ï52,386.43" "Ï6,887.61" "Ï32,923.45" "" "Ï82,108.44"
[6] "Ï6,378.10" "" "Ï22,467.43" "Ï3,850.14" "Ï5,547.83"

It turns out the issue was a pound sign (£) that got changed into Ï when the xls file was saved as CSV on Windows and then opened on a Mac. Thanks for your replies.
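For anyone with the same symptom: once the file is imported, the stray character can simply be stripped out before coercing to numeric. A minimal sketch, using the sample values from the question:

```r
x <- c("Ï52,386.43", "Ï6,887.61", "", "Ï82,108.44")

# drop everything except digits and the decimal point (this also removes
# the thousands separators), then coerce; empty strings become NA,
# with a "NAs introduced by coercion" warning
as.numeric(gsub("[^0-9.]", "", x))
```

Because the regex keeps only digits and the dot, the same one-liner handles both the mangled currency symbol and the commas.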

Related

Text encoding in R with japanese characters

I am trying to read a CSV file containing text in several different character sets using the function read.csv.
This is a sample of the file content:
device,country_code,keyword,indexed_clicks,indexed_cost
Mobile,JP,お金 借りる,5.913037843442198,103.05985173478956
Desktop,US,email,82.450427682737157,81.871030974598241
Desktop,US,news,414.14755054432345,66.502397615344861
Mobile,JP,ヤフートラベル,450.9622861586314,55.733902871922957
If I use the following call to read the data:
texts <- read.csv("text.csv", sep = ",", header = TRUE)
The data frame is imported into R, but the Japanese characters are garbled:
device country_code keyword indexed_clicks indexed_cost
1 Mobile JP ã\u0081Šé‡‘ 借りる 5.913038 103.05985
2 Desktop US email 82.450428 81.87103
3 Desktop US news 414.147551 66.50240
4 Mobile JP ヤフートラベル 450.962286 55.73390
If I use the same call as before but with fileEncoding = "UTF-8":
texts <- read.csv("text.csv", sep = ",", header = TRUE, fileEncoding = "utf-8")
I get the following warning messages:
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'text.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'text.csv'
Does anyone know how to read this file properly?
I replicated your problem with both:
texts <- read.csv("text.csv", sep = ",", header = TRUE)
and
texts_ <- read.csv("text.csv", sep = ",", header = TRUE, encoding = "utf-8")
and both work perfectly fine (RStudio v1.4.1717, Ubuntu 20.04.3 LTS).
Some possibilities I can think of:
The CSV file wasn't saved properly as UTF-8, or it is corrupted. Have you checked the file again?
If you are using Windows, try encoding instead of fileEncoding. These problems happen with non-standard characters (Windows encoding hell).
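The distinction between the two arguments matters here: fileEncoding re-encodes the whole connection while the file is being read, whereas encoding only declares how the already-read strings are marked. A sketch of both calls, using the same text.csv as above:

```r
# re-encode the file as it is read; invalid bytes trigger the
# "invalid input found on input connection" warning seen above
texts <- read.csv("text.csv", fileEncoding = "UTF-8")

# leave the bytes untouched and just mark the input as UTF-8
texts <- read.csv("text.csv", encoding = "UTF-8")
```

If fileEncoding fails but encoding works, the file's actual encoding probably does not match the one being declared.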

Issues reading data as csv in R

I have a large data set (~20000 × 1). Not all the fields are filled; in other words, the data has missing values. Each feature is a string.
I have tried the following:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output for the second call:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
I run into essentially four problems. Two of them are the error messages shown above. The third is that, when no error is thrown, the global environment window shows that not all my rows are accounted for: roughly 14,000 samples are missing, although the number of features is right. The fourth is that, again, not all the samples are accounted for and the feature count is wrong as well.
How can I solve this?
Try the arguments comment.char = "" and quote = "". A hash (#) in the data is read by R as a comment and will cut the line short.
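Applied to the read.table call from the question, that suggestion looks like this (note that read.csv already defaults to comment.char = "", but read.table does not):

```r
# disable both comment and quote handling so '#' and stray quotes
# in the data cannot truncate lines or swallow field separators
datan <- read.table("data.csv", header = TRUE, sep = ",", fill = TRUE,
                    quote = "", comment.char = "")
```

With quoting disabled, an unmatched quote in a field can no longer produce the "EOF within quoted string" warning.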
Can you open the CSV in Notepad++? This will allow you to see 'invisible' characters and any other non-printable characters. That file may not contain what you think it contains! Once the sourcing issue is resolved, you can choose the CSV file with a selector tool.
filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)
Or hard-code the path and read the data into R.
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")

R - incomplete final line and embedded nulls

I am trying to learn how to use R and I'd like to run simple/multiple/logistic regressions, but I am stuck right at the beginning. I have successfully loaded an SPSS database into R using this code:
> library(foreign)
> data<-read.spss("new long an.sav", use.value.labels=TRUE, to.data.frame=TRUE)
re-encoding from UTF-8
> data
Then I tried to specify the data file I want to run my regressions on, as follows:
> newlongan<-read.delim("new long an.sav", header = TRUE)
However, the following warning messages come up and I am not sure how to solve them:
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'new long an.sav'
I have got car, boot and QuantPsyc installed. Do you have any ideas? Thanks,
Silvia
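For what it's worth, those warnings come from pointing read.delim (a text reader) at a binary SPSS .sav file; the read.spss call has already produced a usable data frame, so the regressions can run on it directly. A sketch, with hypothetical variable names:

```r
library(foreign)

# this already returns a regular data frame
data <- read.spss("new long an.sav", use.value.labels = TRUE,
                  to.data.frame = TRUE)

# no need to re-read the binary .sav with read.delim;
# inspect the columns and model them directly ('y' and 'x' are placeholders)
str(data)
fit <- lm(y ~ x, data = data)
```

The "embedded nulls" and "incomplete final line" messages are exactly what text-oriented readers report when fed a binary file.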

File error when running a simple read.csv command in R

When I run read.csv on a dataset
read.csv(file = msleep_ggplot2, header = TRUE, sep = ",")
I get an error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection
The CSV file loaded in RStudio and looks good. Any idea what the problem might be?
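The error means the file argument received an R object rather than a path: read.csv needs a character string (or a connection). Passing the file name as a quoted string fixes it (the exact file name below is an assumption):

```r
# quote the path; 'file' must be a character string or a connection,
# not a bare object name
df <- read.csv(file = "msleep_ggplot2.csv", header = TRUE, sep = ",")
```

Loading the data through RStudio's import dialog creates an in-memory data frame, which is why passing that object's name to read.csv fails.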

R Error in columns and type.convert(data[[i]], specifically on Mac

I am trying to make R read my CSV file (which contains numerical and categorical data). I can open this file on Windows computers (I tried different ones and it always worked) without any issues, but it is not working on my Mac at all. I am using the latest version of R. Originally the data was in Excel, and then I converted it to CSV.
I have exhausted all my options; I tried recommendations from similar topics but nothing works. One time I sort of succeeded, but the result looked like this: ;32,0;K;;B;50;;;; I tried the advice given in the topic Import data into R with an unknown number of columns? and the result was the same. I am a beginner in R and I really know nothing about coding or programming, so I would tremendously appreciate any kind of advice on this issue. Below are my failed attempts to fix this problem:
> file=read.csv("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep = " ")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
> ?read.csv
> file=read.csv2("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv2("~/Desktop/file.csv", sep = ";", header=TRUE)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep=" ",row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
> file=read.csv("~/Desktop/file.csv", row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
> file=read.csv("~/Desktop/file.csv", sep=";",row.names=1)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<ca>110'
This is what the header of the data looks like. Following the advice below, I saved the document in the CSV-for-Mac format, and once I ran View(file) everything looked OK, except for some rows like row #1 (cord number 1) below, which was completely misplaced:
Cord.Number Ply Attch Knots Length Term Thkns Color Value
1,S,U,,37.0,K,,MB,,,"5.5 - 6.5:4, 8.0 - 8.5:2",,UR1031,unknown,
1s1 S U 1S(5.5/Z) 1E(11.5/S) 46.5 K NA W 11
1s2 S U 1S(5.5/Z) 5L(11.0/Z) 21.0 B NA W 15
This is what the spreadsheet looks like in R Studio on Windows (I don't have enough reputation to post an image):
http://imgur.com/zQdJBT2
As a workaround, you can open the CSV file on a Windows machine and save it to a .rdata file (.rdata is R's internal storage format). You can then put the file on a USB stick (or Dropbox, Google Drive, or whatever), copy it to your Mac, and work on it there.
# on the Windows PC
dat <- read.csv("<file>", ...)
save(dat, file="<file location>/dat.rdata")
# copy the dat.rdata file over, and then on your Mac:
load("<Mac location>/dat.rdata")
fileEncoding = "latin1" is a way to make R read the file, but in my case it came with a loss of data and special characters. For example, the € symbol disappeared.
As the workaround that worked best for me for this issue (I'm on a Mac too), I first opened the file in Sublime Text and saved it "with encoding" UTF-8.
When I imported it again afterwards, R read it with no problem and my special characters were still present.
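The same re-encoding can be done without an external editor, e.g. with iconv on the Mac's command line (the file names below are assumptions):

```shell
# convert a latin1 CSV to UTF-8, writing the result to a new file
iconv -f latin1 -t UTF-8 file.csv > file_utf8.csv
```

The converted file can then be read with a plain read.csv call, no fileEncoding needed.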
I had a similar problem, but it worked when I added fileEncoding = "latin1" after the file name.