Why can't R read this table when Excel can? - r

I am trying to read a specific file that I have copied from an SFTP location. The file is pipe delimited. I can read the file in Excel, but R reads it as null values and the column names are duplicated. I don't understand whether this is an encoding issue. I am trying to create a bash script to automate this process. Any help? Below is the link to the data.
Here's the file!
I have tried changing the encoding, but without knowing which encoding the file uses I am struggling. I have tried read_delim, read_table, read.table, read_csv and read.csv, but no help.
This is the code I have used to read the file:
read_delim("./Engagement_Level.txt", delim = "|")
I would like to read it as a data frame.

The issue is that the file encoding is UTF-16LE, which read_delim cannot read at present.
You could use the base read.delim and file() to specify the encoding:
read.delim(file("Engagement_Level.txt", encoding = "UTF-16LE"), sep = "|")
That will convert all the quoted numbers to numeric. If you'd rather they were type character, to deal with later:
read.delim(file("Engagement_Level.txt", encoding = "UTF-16LE"), sep = "|",
           colClasses = "character")
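Depending on your readr version, that limitation may have been lifted: readr 2.0+ (built on vroom) accepts an encoding via locale(). A sketch, untested against this particular file:
library(readr)
# Assumes readr >= 2.0; the file is converted from UTF-16LE on read
df <- read_delim("Engagement_Level.txt", delim = "|",
                 locale = locale(encoding = "UTF-16LE"))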

I really recommend using Excel to build a CSV file via Data > Text to Columns. It's a manual step, so not appropriate in a context like this where you want automation, but it's practically infallible and quick.
Then use read.csv(file, sep = ",").

Related

How to write a file so that it does not have commas and remains in the same format in R

I am reading a file in R:
data <- read.delim("file.fas", header = TRUE, sep = "\t")
However, after I have done some manipulations to the data, the output format is not the same. It now contains commas (",") all over when I write it with:
write.table(x= data, file = "file_1.fas")
How can I avoid this? Maybe I should use a different function to write the file?
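A likely fix (a sketch, assuming the goal is to reproduce the tab-delimited input): write.table() defaults to space separation, quoted strings, and row names, so set those options explicitly:
write.table(x = data, file = "file_1.fas",
            sep = "\t",         # keep tabs rather than the default space
            quote = FALSE,      # don't wrap strings in quotes
            row.names = FALSE)  # don't prepend row numbers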

How to create a dataframe from a csv file with text separated by pipe |? [duplicate]

I have just received a data file whose extension is "*.psv". After doing a bit of research, I still don't know how to open it in R.
We could use read.table to read the *.psv file:
read.table("myfile.psv", sep = "|", header = FALSE, stringsAsFactors = FALSE)
There might be many different formats behind a .psv extension, but in a data context it usually means a "pipe separated values" file: the fields in the file are separated by "|".
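If you prefer the tidyverse, readr's read_delim should read it just as well (a sketch; col_names = FALSE mirrors header = FALSE above):
library(readr)
psv <- read_delim("myfile.psv", delim = "|", col_names = FALSE)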

Delimiters while writing csv files in R

How can I use | (pipe) as a delimiter while writing csv files in R?
When I try writing a data set to a file with write.csv with sep = "|", it ignores the separator and writes the file simply as a comma-separated file.
write.csv2 also doesn't seem to cover the other variety of characters that could be used as a separator.
Is there a way to use other characters such as ^, $, ~, ¬ or | as a delimiter while writing a csv file in R?
Thanks.
You have to understand that .csv means "comma-separated values": https://en.wikipedia.org/wiki/Comma-separated_values.
If you want to export with a different separator, you need another function.
For example, use write.table, and you'll be able to load the file with R, Excel, etc.:
write.table(data, "data.txt", sep = "|")
data_load <- read.table("data.txt", sep = "|")
Feel free to use any character as the separator.
Or you could force this plain text to be .csv
write.table(data, "data.csv", sep = "|")
data_load <- read.csv("data.csv", sep = "|")
This answer is just a variation of the one I gave for this question. The two are similar, and while I don't think this question is an exact duplicate, both are part of a bigger question (not yet asked).
In the help for write.table, it states:
write.csv and write.csv2 provide convenience wrappers for writing CSV files.
...
These wrappers are deliberately inflexible: they are designed to ensure that the correct conventions are used to write a valid file.
Attempts to change append, col.names, sep, dec or qmethod are ignored, with a warning.
To set sep or another of these parameters you need to use write.table instead of write.csv.
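A quick demonstration with toy data (hypothetical file names; the exact warning text may vary by R version):
df <- data.frame(a = 1:2, b = c("x", "y"))
write.csv(df, "demo.csv", sep = "|")   # warns that 'sep' is ignored and writes commas
write.table(df, "demo.psv", sep = "|", row.names = FALSE)  # honours the pipe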

Is there a sed type package in R for removing embedded NULs?

I am processing the US Weather service Storm Data, which has one large CSV data file for each year from 1950 onwards. The 1999 year file contains several rows with very large freeform text fields which contain embedded NUL characters, in an otherwise vanilla ascii database. (The offending file is at ftp://ftp.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/StormEvents_details-ftp_v1.0_d1999_c20140915.csv.gz).
R cannot handle corrupted string data without errors, and this includes R data.frame, data.table, stringr, and stringi package functions (all tried).
I can clean the files of NULs with sed, but I would prefer not to use external programs, as this is for an R Markdown report with embedded code.
Suggestions?
Maybe this could be of help:
in.file <- file(description = "StormEvents_details-ftp_v1.0_d1999_c20140915.csv",
                open = "r")
writeLines(iconv(readLines(in.file), to = "ASCII"),
           con = "StormEvents_ascii.csv")
I was able to read the csv file without errors with this call to read.table:
options(stringsAsFactors = FALSE)
StormEvents <- read.table("StormEvents_ascii.csv", header = TRUE,
                          sep = ",", fill = TRUE, quote = '"')
Obviously you'd need to change the class of several columns, since all are considered character as it is.
Just for posterity: you can use binary reads (readBin()) and replace the NULs with anything else; see
Removing "NUL" characters (within R)
An update for May 2020: the tidyverse and data.table both still choke on NUL characters within files; however, the base read.*() family and readLines() will gracefully skip them with the skipNul = TRUE option. You can read a file in, skipping over NUL characters, and then write it back out again.
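For example, a minimal round-trip along those lines, using the 1999 file mentioned above:
txt <- readLines("StormEvents_details-ftp_v1.0_d1999_c20140915.csv",
                 skipNul = TRUE)          # skip embedded NULs while reading
writeLines(txt, "StormEvents_noNul.csv")  # write a clean copy back out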

Extract bz2 file in R

I have a bunch of .csv.bz2 files, which I have to download, extract, and read in R.
I downloaded a file and want to extract it to the current working directory, then read it.
I tried unz(filename, "filename.csv"), but it does not seem to work. How can I do that?
I heard somewhere that bz2 files can be read directly without decompressing. How can I do that?
You can use either of these two commands:
read.csv() command: with this command you can supply your compressed filename directly:
read.csv("file.csv.bz2")
read.table() command: this command is the generic version of read.csv(). You can set the delimiter and other options that read.csv() sets automatically. You don't need to uncompress the file separately; the function does it for you:
read.table("file.csv.bz2", header = TRUE, sep = ",", quote = "\"", ...)
Like this:
readcsvbz2file <- read.csv(bzfile("file.csv.bz2"))
You can make use of the super-fast fread, which has built-in support for bz2-compressed files:
require(data.table)
fread("file.csv.bz2")
Basically, you need to type:
library(R.utils)
bunzip2("dataset.csv.bz2", "dataset.csv", remove = FALSE, skip = TRUE)
dataset <- read.csv("dataset.csv")
See documentation here: bunzip2 {R.utils}.
According to the read.table documentation, one can read a compressed file directly:
read.table("file.csv.bz2")
