read_csv does not separate on commas and does not capture separate rows - r

I am trying to parse a text log file like the one shown below. I can use the default read.csv to parse this file:
test <- read.csv("test.txt", header=FALSE)
It splits on every comma. The result is not a tidy data frame, but it can be improved with further manipulation.
However, I cannot seem to do the same using the readr package:
test <- read_csv("test.txt", header=FALSE)
All observations end up in 1 row, with no splitting on the commas.
I am learning this package so any help would be great.
{"dev_id":"f8:f0:05:xx:db:xx","data":[{"dist":[7270,7269,7269,7275,7270,7271,7265,7270,7274,7267,7271,7271,7266,7263,7268,7271,7266,7265,7270,7268,7264,7270,7261,7260]},{"temp":0},{"hum":0},{"vin":448}],"time":4485318,"transmit_time":4495658,"version":"1.0"}
{"dev_id":"f8:xx:05:xx:d9:xx","data":[{"dist":[6869,6868,6867,6871,6866,6867,6863,6865,6868,6869,6868,6860,6865,6866,6870,6861,6865,6868,6866,6864,6866,6866,6865,6872]},{"temp":0},{"hum":0},{"vin":449}],"time":4405316,"transmit_time":4413715,"version":"1.0"}
{"dev_id":"xx:f0:05:e8:da:xx","data":[{"dist":[5775,5775,5777,5772,5777,5770,5779,5773,5776,5777,5772,5768,5782,5772,5765,5770,5770,5767,5767,5777,5766,5763,5773,5776]},{"temp":0},{"hum":0},{"vin":447}],"time":4461316,"transmit_time":4473307,"version":"1.0"}
{"dev_id":"xx:f0:xx:e8:xx:0a","data":[{"dist":[4358,4361,4355,4358,4359,4359,4361,4358,4359,4360,4360,4361,4361,4359,4359,4356,4357,4361,4359,4360,4358,4358,4362,4359]},{"temp":0},{"hum":0},{"vin":424}],"time":5190320,"transmit_time":5198748,"version":"1.0"}

Thanks to @Dave2e for pointing out that this file is in JSON format (one JSON object per line), I found a way to parse it using ndjson::stream_in.
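A minimal sketch of that approach, assuming the ndjson package is installed and the log holds one JSON object per line. The two shortened records and the temporary file are made up for illustration; the real log would be passed by its own path instead.

```r
library(ndjson)

# Two shortened, made-up records in the same shape as the log lines above
records <- c(
  '{"dev_id":"aa","data":[{"dist":[7270,7269]},{"temp":0},{"hum":0},{"vin":448}],"time":4485318,"version":"1.0"}',
  '{"dev_id":"bb","data":[{"dist":[6869,6868]},{"temp":0},{"hum":0},{"vin":449}],"time":4405316,"version":"1.0"}'
)

# Write them to a temporary file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(records, path)

# stream_in() reads one JSON object per line and returns one row per object,
# with nested fields flattened into separate columns
parsed <- ndjson::stream_in(path)

nrow(parsed)   # number of records read
names(parsed)  # inspect the flattened column names
```

Inspecting names(parsed) first is probably the easiest way to see how the nested "data" array was flattened before selecting columns.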

Related

R misreading csv files after modifications on Excel

This is more of a curiosity.
Sometimes I modify csv files in Excel rather than in R (suppose I manage to find a missing piece of info and I type it into the csv file), of course keeping commas and quotes as they were.
Every time I do this, R becomes unable to read the csv file, i.e. it imports a single column, as the file appears in Excel, rather than separating the values (no options like sep= or quote= change this).
Does anyone know why this happens?
Thanks a lot
An example
This was readable:
state,"city","county"
AK,"Anchorage",""
AK,"Haines",""
AK,"Juneau","Juneau"
After adding the missing info under "county", R fails to import it as a data frame, reading it instead as a single vector.
state,"city","county"
AK,"Anchorage","Anchorage"
AK,"Haines","Haines"
AK,"Juneau","Juneau"
Edit:
I'm just running the basic read.csv
df <- read.csv("C:/directory/df.csv")

Exporting large number to csv from R

I came across a strange problem when trying to export an R dataframe to a csv file.
The dataframe contains some big numbers, but when they are written to the csv file they appear to lose the decimal separator.
The decimal part is not dropped as one might expect; instead the digits run together, like this:
Say 3224571816.5649 is the correct value in R. When written to csv, it becomes 32245718165649.
I am using the write.csv2 function to write the csv. The separators are correct, as it works normally for smaller values. Is the problem occurring because the number (with decimals) is bigger than 32 bits?
And more importantly, how can I solve this, as I have a whole dataframe with values as big as (or bigger than) this? Also, it has to be written to a csv.
write.csv2 is intended for a different csv convention (Western European styling, which, based on your use of a "." as the decimal mark, I am guessing you are not looking for). write.csv2 uses a comma as the decimal mark and a semicolon as the field delimiter, so if you read the result in as a comma-separated file, it will look strange indeed.
I suggest you use write.csv (or even better, write.table) to output your file. write.csv uses a comma as the separator and a period as the decimal mark.
Both write.csv and write.csv2 are just wrappers for write.table, which is the underlying function. In general, I recommend write.table because it does not assume your region and you can explicitly pass it sep = ",", dec = ".", etc. This not only makes it clear what you are using, but also makes your code a lot more readable.
For more, see the write.table page on rdocumentation.org: https://www.rdocumentation.org/packages/utils/versions/3.5.3/topics/write.table
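To make the difference concrete, here is a small sketch that writes the same value with both conventions. The data frame, its column names, and the temporary files are made up for illustration.

```r
# Made-up data frame containing the value from the question
df <- data.frame(id = 1:2, value = c(3224571816.5649, 12.5))

# write.csv2 convention: ";" between fields and "," as the decimal mark
semi_path <- tempfile(fileext = ".csv")
write.csv2(df, semi_path, row.names = FALSE)
semi_lines <- readLines(semi_path)

# Explicit convention: "," between fields and "." as the decimal mark
comma_path <- tempfile(fileext = ".csv")
write.table(df, comma_path, sep = ",", dec = ".",
            row.names = FALSE, col.names = TRUE)
comma_lines <- readLines(comma_path)

print(semi_lines)   # decimal comma, semicolon-delimited
print(comma_lines)  # decimal point, comma-delimited

# Reading the comma-delimited file back recovers the original numbers
back <- read.csv(comma_path)
```

If the semicolon-delimited file is opened by a tool that expects commas as delimiters, the decimal comma is easily misread, which would be consistent with the run-together digits described in the question.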

Using a Value in write.csv

Maybe a simple question, but I am trying to automate a bit of code, including the write.csv process.
I want to use a character string (Stationname, e.g. STN76) to name the csv. I have this:
write.csv(AbunData, file = Stationname)
Where the station name is the automated bit. This sort of works, but the file is written without the .csv extension. I want the file to be
STN76.csv
or, which would be even better,
STN76_Routput.csv
Cheers!
If you include more details on exactly what you are doing you will probably get a more helpful answer :), but have you explored using paste() or paste0() to create the file names? E.g.:
file_name <- paste0("STN", SOME_NUMBER, "_Routput.csv")
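Applied to the variables named in the question, the idea might look like the sketch below. The contents of AbunData are a placeholder, and writing to tempdir() is only there to keep the example self-contained.

```r
# Names taken from the question; the data itself is a made-up placeholder
Stationname <- "STN76"
AbunData <- data.frame(species = c("a", "b"), count = c(3, 5))

# Build the file name from the station name plus a fixed suffix and extension
file_name <- paste0(Stationname, "_Routput.csv")

# Write into a temporary directory; replace with the real output folder
out_path <- file.path(tempdir(), file_name)
write.csv(AbunData, file = out_path, row.names = FALSE)

basename(out_path)
```

paste0() concatenates without a separator, so there is no stray space between the station name and the suffix.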

Reading a CSV file and tokenizing it

I am a newbie in R. I have been trying to read a CSV file like this.
tweets <- read.csv("tweets.csv")
and I need to remove all punctuation, convert the text to lower case, and remove numbers, stop words and extra whitespace from the data frame 'tweets' without having to convert it into a corpus or anything like that. Nothing fancy, just removing them directly. Is there any library or function that could help with this?
The reading part is what you already have:
tweets <- read.csv("tweets.csv")
For punctuation and whitespace, the alternative to using a corpus is regular expressions, but that approach has limited application because it is not generic at all.
That is why a corpus is usually preferred: it is easier to apply to different sources.
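For completeness, a rough sketch of the regular-expression route in base R. The column name text, the sample tweets, and the tiny stop-word list are all assumptions made for illustration; a real stop-word list would be much longer.

```r
# Made-up stand-in for read.csv("tweets.csv"); the column name `text` is assumed
tweets <- data.frame(
  text = c("Hello, World! 123", "  The cat & the hat...  "),
  stringsAsFactors = FALSE
)

# Tiny illustrative stop-word list (hypothetical, not a standard list)
stop_words <- c("the", "a", "an", "and", "is", "of")

clean_text <- function(x, stop_words) {
  x <- tolower(x)                          # lower case
  x <- gsub("[[:punct:]]", " ", x)         # punctuation -> space
  x <- gsub("[[:digit:]]", " ", x)         # digits -> space
  # whole-word match on any stop word
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)
  x <- gsub("[[:space:]]+", " ", x)        # collapse runs of whitespace
  trimws(x)                                # drop leading/trailing whitespace
}

tweets$clean <- clean_text(tweets$text, stop_words)
tweets$clean
```

This works on a plain character column, but every rule is hard-coded, which is the lack of generality mentioned above.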

Deal with escaped commas in CSV file?

I'm reading in a file in R using fread as such
test.set = fread("file.csv", header=FALSE, fill=TRUE, blank.lines.skip=TRUE)
My csv consists of 6 columns. An example of a row in this file is
"2014-07-03 11:25:56","61073a09d113d3d3a2af6474c92e7d1e2f7e2855","Securenet Systems Radio Playlist Update","Your Love","Fred Hammond & Radical for Christ","50fcfb08424fe1e2c653a87a64ee92d7"
However, certain rows are formatted in a particular way when there is a comma inside one of the cells. For instance,
"2014-07-03 11:25:59","37780f2e40f3af8752e0d66d50c9363279c55be6","Spotify","\"Hello\", He Lied","Red Box","b226ff30a0b83006e5e06582fbb0afd3"
produces an error of the sort
Expecting 6 cols, but line 5395818 contains text after processing all
cols. Try again with fill=TRUE. Another reason could be that fread's
logic in distinguishing one or more fields having embedded sep=','
and/or (unescaped) '\n' characters within unbalanced unescaped quotes
has failed. If quote='' doesn't help, please file an issue to figure
out if the logic could be improved.
As you can see, the value that is causing the error is "\"Hello\", He Lied", which I want to be read by fread as "Hello, He Lied". I'm not sure how to account for this, though - I've tried using fill=TRUE and quote="" as suggested, but the error still keeps coming up. It's probably just a matter of finding the right parameter(s) for fread; anyone know what those might be?
In read.table() from base R this issue is solvable, using the approach from "Import data into R with an unknown number of columns?".
In fread from data.table this is not possible.
An issue has been logged for this: https://github.com/Rdatatable/data.table/issues/2669
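One possible workaround, which is not the method from the answer above and is only a sketch: convert the backslash-escaped quotes into doubled quotes before parsing, since base R's read.csv understands doubled quotes inside a quoted field. The two sample rows are shortened, made-up versions of the rows in the question.

```r
# Shortened, made-up rows; the second contains backslash-escaped quotes
raw <- c(
  '"2014-07-03 11:25:56","Spotify","Your Love","Fred Hammond"',
  '"2014-07-03 11:25:59","Spotify","\\"Hello\\", He Lied","Red Box"'
)

# Turn every backslash-quote pair into a doubled quote
fixed_lines <- gsub('\\"', '""', raw, fixed = TRUE)

# read.csv treats "" inside a quoted field as one literal quote character
parsed <- read.csv(text = fixed_lines, header = FALSE,
                   stringsAsFactors = FALSE)

# Optionally drop the remaining literal quotes from the title column
parsed$V3 <- gsub('"', "", parsed$V3, fixed = TRUE)
parsed$V3
```

For a real file, the lines would come from readLines("file.csv") instead of the raw vector. Note that this substitution would also rewrite a field that ends in a literal backslash right before its closing quote, so it is only safe if the data never contains that pattern.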
