I am trying to read a file containing non-ASCII characters in R. Please see the screenshot below.
When I use readLines() to read the file, I get only a few lines of content.
Please help! Any answer is highly appreciated.
Related
I am trying to export parts of a data frame in R into a .txt file using rio::export.
The exported part is a text string that contains characters such as >, ≥ and ". Some of these show up in the text file as HTML entities such as &lt;, &quot; or &gt;, which is not what I need (I want them to appear in the text as the actual greater-than, less-than, etc. characters).
When I looked these up online, I found that they are "HTML special characters", but I could not find a solution. I believe it has something to do with the encoding. Is there a way to fix this by adding arguments to rio::export, or would you suggest using another package?
I am using the code:
rio::export(df, "df.txt")
I also tried the following, from this question, but it still gives the same problem:
How to save R output data to a text file
cat(df$text, file = "myfile.txt"), where df$text is the column that contains the items I want to export.
Thank you.
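A minimal sketch, assuming the entities (&lt;, &gt;, &quot;, ...) are already present in the strings themselves rather than being introduced by the export step. It swaps rio::export() for base R's writeLines(), decodes a small, deliberately incomplete set of named entities with gsub(), and writes the result as UTF-8 so that characters such as ≥ survive. The helper decode_html_entities(), the sample data and the file name are hypothetical.

```r
# Hypothetical helper: decode a few common named HTML entities.
# "&amp;" is replaced last so that "&amp;lt;" becomes "&lt;", not "<".
decode_html_entities <- function(x) {
  entities <- c("&lt;" = "<", "&gt;" = ">", "&quot;" = "\"",
                "&le;" = "\u2264", "&ge;" = "\u2265", "&#39;" = "'",
                "&amp;" = "&")
  for (e in names(entities)) {
    x <- gsub(e, entities[[e]], x, fixed = TRUE)
  }
  x
}

# Made-up stand-in for the real df$text column.
df <- data.frame(text = c("a &gt; b", "x &ge; 5", "say &quot;hi&quot;"),
                 stringsAsFactors = FALSE)

clean <- decode_html_entities(df$text)

# enc2utf8() plus useBytes = TRUE writes the UTF-8 bytes unchanged, so a
# character such as the greater-than-or-equal sign is not translated into
# the native locale's encoding on the way out.
writeLines(enc2utf8(clean), "myfile.txt", useBytes = TRUE)
```

If the strings in R already contain the real characters and only the exported file shows entities, the decoding step is unnecessary and the UTF-8 write alone is the relevant part.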
I have tried my best to read a CSV file in R, but failed. I have provided a sample of the file at the following Google Drive link:
Data
By opening it in a text editor, I found that it is a tab-delimited file. Excel reads the file without issues, but when I try to read it in R with the readr package or base R, it fails, and I am not sure why. I have tried different encodings, such as UTF-8, UTF-16 and UTF-16LE. Could you please help me write the correct script to read this file? Currently, I am using Excel to convert the file to comma-delimited format so that I can read it in R, but I am sure there must be something I am doing wrong. Any help would be appreciated.
Thanks
Amal
PS: What I don't understand is how Excel reads the file without being given any parameters. Can we build the same logic in R to read any file?
This is a Windows-related encoding problem.
When I open your file in Notepad++, it reports the encoding as UCS-2 LE BOM. There is a trick for reading files with unusual encodings into R. In your case, this seems to work:
read.delim(con <- file("temp.csv", encoding = "UCS-2LE"))
(adapted from R: can't read unicode text files even when specifying the encoding).
BTW "CSV" stands for "comma separated values". This file has tab-separated values, so you should give it either a .tsv or .txt suffix, not .csv, to avoid confusion.
As for your second question, whether we could build the same logic in R to guess the encoding and delimiter and read many types of file without being told them explicitly: yes, this would certainly be possible. Whether it is desirable, I'm not sure.
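A very small sketch of the "guess the encoding" idea, assuming the file starts with a byte order mark (as a UCS-2 LE BOM file does). It reads the first bytes with readBin() and maps known BOMs to an encoding name that read.delim() accepts through fileEncoding; R's file() documentation says a leading BOM is removed when reading with "UTF-16LE"/"UCS-2LE", and "UTF-8-BOM" drops a UTF-8 BOM. The function guess_bom_encoding() and the sample file are hypothetical, and files without a BOM simply fall back to a default rather than being detected.

```r
# Hypothetical helper: guess a file's encoding from its byte order mark (BOM).
# Only BOMs are checked; a file without one simply gets `default`.
guess_bom_encoding <- function(path, default = "UTF-8") {
  bytes <- readBin(path, what = "raw", n = 3)
  starts_with <- function(bom) {
    length(bytes) >= length(bom) && all(bytes[seq_along(bom)] == bom)
  }
  if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) return("UTF-8-BOM")
  if (starts_with(as.raw(c(0xFF, 0xFE)))) return("UTF-16LE")  # also UCS-2LE
  if (starts_with(as.raw(c(0xFE, 0xFF)))) return("UTF-16BE")
  default
}

# Build a small tab-separated UTF-16LE file with a BOM, similar in kind to the
# file described above (file name and contents are made up).
path <- tempfile(fileext = ".tsv")
txt <- "id\tname\n1\tcaf\u00e9\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), body), path)

enc <- guess_bom_encoding(path)
df <- read.delim(path, fileEncoding = enc)
```

For files without a BOM, readr::guess_encoding() offers a statistical guess instead; either way the result is a guess that is worth checking.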
I have a CSV file containing text, and after reading it into R I get some unwanted symbols and numbers.
How can I remove all of these unwanted characters?
Example:
My CSV file has two text columns: Question and Answer.
Original Question (before opening in R):
Where do I see my bank's account details?
Original Answer:
That's a frequently asked question. You can find details at this link.
After reading the file, I am getting like:
Question:
Where do I see my bank’s account details?
Answer:
That&#39;s a frequently asked question. You can find details at this link.
I tried saving the file as UTF-8 and then reading it:
df <- read.csv("data.csv", encoding = "UTF-8", stringsAsFactors = FALSE)
But the unwanted symbols and numbers shown above still appear. How do I remove them?
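Try read_excel() from the readxl package. Note that read_excel() only reads Excel workbooks (.xls/.xlsx), not CSV files, so point it at the original workbook rather than the CSV export:
library(readxl)
df <- read_excel("data.xlsx")  # hypothetical workbook name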
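For the Question/Answer example, a sketch of two separate repairs, assuming the stray 39; is what remains of the numeric character reference &#39; (an apostrophe) and that the odd characters in "bank’s" are UTF-8 text that was decoded as Windows-1252. If the file really is UTF-8, reading it with fileEncoding = "UTF-8" (which re-encodes while reading, whereas encoding = "UTF-8" only marks the strings) may avoid the second problem altogether. The helpers decode_numeric_refs() and fix_mojibake() and the sample strings are hypothetical.

```r
# Hypothetical helper: replace numeric character references such as "&#39;"
# with the character they encode (code point 39 is the apostrophe).
decode_numeric_refs <- function(x) {
  m <- gregexpr("&#[0-9]+;", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(refs) {
    codes <- as.integer(gsub("[^0-9]", "", refs))
    vapply(codes, intToUtf8, character(1))
  })
  x
}

# Hypothetical helper: undo UTF-8 text that was wrongly decoded as
# Windows-1252 (a curly apostrophe showing up as three odd characters).
# Converting back to CP1252 recovers the original UTF-8 bytes; strings that
# cannot be converted become NA.
fix_mojibake <- function(x) {
  raw_list <- iconv(x, from = "UTF-8", to = "CP1252", toRaw = TRUE)
  vapply(raw_list, function(b) {
    if (is.null(b)) return(NA_character_)
    s <- rawToChar(b)
    Encoding(s) <- "UTF-8"
    s
  }, character(1))
}

# Sample strings modelled on the example above (written with \u escapes).
question <- "Where do I see my bank\u00e2\u20ac\u2122s account details?"
answer   <- "That&#39;s a frequently asked question."

fixed_question <- fix_mojibake(question)
fixed_answer   <- decode_numeric_refs(answer)
```

Which repair applies depends on how the file was produced, so it is worth checking a few rows before applying either one to a whole column.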
Have a look at the attached picture, which comes from an RStudio notebook...
After I render this into an HTML file, I get the following:
Why do the "?" characters become Chinese characters?
Is there a way to fix this?
Thanks
I have a CSV file, encoded as UTF-8 with BOM, that contains a special character. The issue is that when R reads the file, it treats the special character as the end of the file, so the file is not read completely.
I have attached a screenshot of the line containing the special character.
Help is highly appreciated.
Thanks
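A diagnostic sketch, not a confirmed fix, since the actual character in the file is unknown: R accepts fileEncoding = "UTF-8-BOM" to drop a UTF-8 byte order mark when reading, and a file that stops early is often worth scanning for control bytes such as NUL (0x00) or Ctrl-Z (0x1A, treated as end-of-file by some Windows text-mode readers). The helper find_control_bytes() and the sample file are hypothetical.

```r
# Hypothetical helper: list the zero-based byte offsets of NUL (0x00) and
# Ctrl-Z / SUB (0x1A) bytes, which can make some readers stop early.
find_control_bytes <- function(path, targets = as.raw(c(0x00, 0x1A))) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  hits <- which(bytes %in% targets)
  data.frame(offset = hits - 1L,
             byte = as.character(bytes[hits]),
             stringsAsFactors = FALSE)
}

# Made-up sample: a UTF-8 CSV with a BOM and a Ctrl-Z byte inside one field.
path <- tempfile(fileext = ".csv")
lines <- c("id,text", "1,first row", "2,before\x1A-after", "3,last row")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))),
         path)

hits <- find_control_bytes(path)

# "UTF-8-BOM" tells R to drop the byte order mark, so the first column is
# named "id" rather than a mangled name that starts with the BOM.
df <- read.csv(path, fileEncoding = "UTF-8-BOM")
```

If neither byte shows up, an unbalanced double quote inside a field is another common reason for read.csv() to swallow the rest of the file (it usually warns about an EOF within a quoted string); quote = "" turns quote handling off, at the cost of breaking legitimately quoted fields.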