Error with fread in R: embedded nul in string: '\0'

I am trying to read a CSV file larger than 4 GB. However, when I use the fread command it produces an error:
library(data.table)
csv1 <- fread("cleaned.csv",sep = ",",colClasses = "character",showProgress = TRUE)
Error: embedded nul in string: '\0'
After some searching I found that you can use sed for this, as in this Stack Overflow question, but I have no clue how to use it in my scenario. Please help!
UPDATE:
I tried the sed approach described in the comments below, but it throws an error:
sed: couldn't flush stdout: No space left on device
UPDATE2:
I have solved it with the help of some colleagues. However, I am still looking to automate this, since I had to repeat the process for each file. The automation could be either from within R or a Bash script. Any suggestions?

The CSV files were populated with ^# characters placed within the blank values. For some reason they could not be searched for or replaced with sed, so I used the following solution instead.
On Linux, change to the file's directory and open the file in vim:
vim filename.csv
:%s/CTRL+2//g
ESC #TO SWITCH FROM INSERT MODE
:wq # TO SAVE THE FILE
I had to do this manually for every file, so I am still looking for a way to automate it, either from within R or with a Bash script.
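One way to automate the clean-up from within R is to strip the NUL bytes (the '\0' that fread complains about) in binary mode before reading the file. The following is a minimal sketch, not the vim procedure described above: the function names strip_nul and strip_nul_dir, the chunk size, and the clean_ file prefix are assumptions made for illustration. It reads each file in chunks so that a file larger than memory can be processed, and it writes a new file, so the disk needs room for a second copy.

```r
# Hypothetical helper: copy `infile` to `outfile`, dropping every NUL (0x00) byte.
# Reads in fixed-size chunks so files larger than RAM can be processed.
strip_nul <- function(infile, outfile, chunk_size = 1e7) {
  con_in <- file(infile, "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break            # end of file reached
    writeBin(chunk[chunk != as.raw(0)], con_out)
  }
  invisible(outfile)
}

# Hypothetical batch wrapper: clean every .csv in `dir`, writing clean_<name>.csv
# next to the original. Files that already carry the prefix are skipped.
strip_nul_dir <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  files <- files[!startsWith(basename(files), "clean_")]
  for (f in files) {
    strip_nul(f, file.path(dirname(f), paste0("clean_", basename(f))))
  }
  invisible(files)
}
```

The cleaned copy can then be passed to fread as before. From a shell, tr -d '\000' < in.csv > out.csv performs the same byte removal for a single file.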

Related

Using a file path as an argument to system() to execute C code

I have some C code that transforms some data into a different format. My goal is that the R user inputs the file path, and then runs the executable (which came from the C code). I have been having some issues with this however. It seems to not be reading the file path properly. Translator accepts one argument: the file path as the form seen below.
My code: system("Translator C:\\Users\\user\\Documents\\data.csv")
Running this prints the error in my C code File not read. I ran the executable directly and it worked just fine, so it is not a problem with my C code, but how I am calling it in R.
I have tried several different variations of the above code, such as
system2("Translator", args = "C:\\Users\\user\\Documents\\data.csv")
system(paste("Translator C:\\Users\\user\\Documents\\data.csv", collapse = " "))
However, these have not yielded any success. I believe the issue stems from R not reading the path the way I want it to because of the \\. R reads directories with /, I believe, whereas fopen in C interprets the directory using \. Is there a way to use \ in R, or is this an issue that should be solved in C?
Thank you.
Give this format a shot:
Basically, capture.output() passes the cat() result of normalizePath(), which is the path in native Windows format, to the system2() command:
system2( command = "Translator", args = capture.output( cat(normalizePath(pathToFile)) ) )
In this case pathToFile can be kept in regular R path format, i.e. "C:/Users/user/Documents/data.csv".
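Since normalizePath() already returns a character string, the capture.output(cat(...)) wrapper may not be needed. Below is a minimal sketch, assuming the executable is on the search path. The helper name native_arg is hypothetical, and shQuote() is an addition here to guard against spaces in the path.

```r
# Hypothetical helper: turn an R-style path into a quoted, OS-native argument.
native_arg <- function(path) {
  # winslash = "\\" requests backslash separators on Windows (no effect elsewhere).
  # mustWork = FALSE avoids an error when the file does not exist on this machine.
  shQuote(normalizePath(path, winslash = "\\", mustWork = FALSE))
}

path_to_file <- "C:/Users/user/Documents/data.csv"
arg <- native_arg(path_to_file)
print(arg)

# Not run here, because "Translator" exists only on the asker's machine:
# status <- system2("Translator", args = arg)
```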

R: importing data impossible (windows)

I used to work with R on my mac and never had any problems.
Now I would like to use it on my work computer (windows). The problem is I can't import any files to start working with them. I tried several options:
mydata<-read.table("c:/temp/myfile.csv",header=TRUE)
mydata<-read.csv("myfile.csv",header=TRUE)
mydata<-read.table("c:/myfile.csv",header=TRUE)
mydata<-read.table("Desktop/myfile.csv",header=TRUE)
I also tried to change / into \ in all variants above.
Nothing seems to work. R displays the command in red, sometimes with a comment "connection can't be opened" or "no such file or directory" (my translation from German).
I tried copying the file I want to open to a different location (desktop, c:, temp), but alas, nothing helps.
Do you have any ideas why I have this problem and how I can solve it? Thanks in advance.
There is a safer way to work with paths: use file.path().
So, if you're trying to get a file in C:/temp/turtles.csv, then you'd use:
targetFile <- file.path('C:', 'temp', 'turtles.csv')
read.csv( targetFile, header=TRUE )
Minor point since it showed up on Twitter; DO NOT USE PATHS THAT EXIST ONLY IN YOUR ENVIRONMENT.
Try to keep the data in a path either in or directly under where the script is.
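As a sketch of this advice, the path can be built with file.path() relative to a project folder and checked with file.exists() before reading. The folder name data and the file contents are placeholders, and a temporary directory stands in for the folder that holds the script.

```r
# Stand-in for a project folder; in practice this would be the script's directory.
project_dir <- tempfile("project")
dir.create(file.path(project_dir, "data"), recursive = TRUE)

# Create a small example file so the sketch is self-contained (made-up data).
target_file <- file.path(project_dir, "data", "turtles.csv")
write.csv(data.frame(name = c("Leo", "Raph"), age = c(15, 16)),
          target_file, row.names = FALSE)

# Check the path first; the message also shows where R is currently looking.
if (!file.exists(target_file)) {
  stop("File not found: ", target_file, " (working directory: ", getwd(), ")")
}
turtles <- read.csv(target_file, header = TRUE)
print(turtles)
```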
There are three ways to do this with the read.csv() function.
To avoid typing a path at all, nest file.choose() inside the call:
read.csv(file.choose(), header=TRUE)
This opens a dialog; select the file from the directory where you saved it.
If you have to type a path, use the file's actual location, written with forward slashes (or doubled backslashes), because a single backslash starts an escape sequence in an R string:
read.csv("C:/path/to/your/file/filename.csv", header=TRUE)
For example:
read.csv("C:/Users/Amway/Desktop/resources.csv", header=TRUE)
The best way is to have your own workspace directory. Create a directory with a name of your choice and set it as the working directory of the R session:
setwd("C:/path/to/your/workspace directory")
Check your current directory with:
getwd()
Now, to read a file into the R session, copy it into the workspace and write:
read.csv("resources.csv", header=TRUE)
So, it should be like this:
setwd("c:/mydir") # note / instead of \ on Windows
Also:
MyData <- read.csv(file="c:/mydir/TheDataIWantToReadIn.csv", header=TRUE, sep=",")
Windows uses the backslash as its path separator, whereas R accepts forward slashes (or doubled backslashes) in path strings.
https://www.howtogeek.com/181774/why-windows-uses-backslashes-and-everything-else-uses-forward-slashes/
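To illustrate why a single backslash fails: inside an R string literal the backslash starts an escape sequence, so a Windows path needs either forward slashes or doubled backslashes. A short sketch with a made-up path:

```r
# Two spellings of the same (made-up) Windows path.
p_forward <- "C:/mydir/data.csv"
p_escaped <- "C:\\mydir\\data.csv"   # each \\ in the source is ONE backslash in the string

# nchar() counts characters in the string itself, not in the source code,
# so both spellings have the same length.
print(nchar(p_forward))
print(nchar(p_escaped))

# Converting backslashes to forward slashes yields the same path.
p_converted <- chartr("\\", "/", p_escaped)
print(identical(p_converted, p_forward))

# Writing the path with single backslashes inside the quotes is a parse error,
# because sequences such as backslash-m are not recognized escapes.
```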

Converting .Rd file to plain text

I'm trying to convert R documentation files (extension .Rd) into plain text. I am aware that RdUtils contains a tool called Rdconv, but as far as I know it can only be used from the command line. Is there a way to access Rdconv (or a similar conversion tool) from within an R session?
Try
tools::Rd2txt("path/to/file.Rd")
You may always invoke a system command e.g. with the system2 function:
input <- '~/Projekty/stringi/man/stri_length.Rd'
output <- '/tmp/out.txt'
system2('R', paste('CMD Rdconv -t txt', input, '-o', output))
readLines(output)
## [1] "Count the Number of Characters"
## ...
Make sure that R is in your system's search path. If not, replace the first argument of system2() above with full path, e.g. C:\Program Files\R\3.1\bin\R.exe.
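For the in-session route, tools::Rd2txt() can also write straight to a file through its out argument. The sketch below is self-contained: the tiny Rd file is made up for illustration, and setting underline_titles = FALSE (an Rd2txt_options() setting) is an assumption about what "plain text" should look like, since titles are otherwise emitted with backspace overstrikes.

```r
# Write a tiny Rd file so the sketch is self-contained (contents are made up).
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{Demo Title}",
  "\\description{A short description.}"
), rd_file)

# Convert to plain text; `out` names the output file ("" would print to the console).
txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd_file, out = txt_file,
              options = list(underline_titles = FALSE))

txt <- readLines(txt_file)
print(txt)
```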

Print plain text of help file to console [duplicate]

I'd like to be able to write the contents of a help file in R to a file from within R.
The following works from the command-line:
R --slave -e 'library(MASS); help(survey)' > survey.txt
This command writes the help file for the survey data set to survey.txt:
--slave hides both the initial prompt and the entered commands from the resulting output
-e '...' sends the command to R
> survey.txt writes the output of R to the file survey.txt
However, this does not seem to work:
library(MASS)
sink("survey.txt")
help(survey)
sink()
How can I save the contents of a help file to a file from within R?
Looks like the two functions you would need are tools:::Rd2txt and utils:::.getHelpFile. This prints the help file to the console, but you may need to fiddle with the arguments to get it to write to a file in the way you want.
For example:
hs <- help(survey)
tools:::Rd2txt(utils:::.getHelpFile(as.character(hs)))
Since these functions aren't currently exported, I would not recommend you rely on them for any production code. It would be better to use them as a guide to create your own stable implementation.
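As a sketch of how the two functions might be combined to write the help text to a file from within R: the topic "mean" from base R stands in for survey so that the example does not depend on MASS, and the output path is a temporary file. utils:::.getHelpFile is unexported, so this may break between R versions.

```r
# Locate the help file for a topic; "mean" (base R) stands in for survey from MASS.
hs <- utils::help("mean")

# Unexported helper: fetches the parsed Rd object for the located help file.
rd <- utils:::.getHelpFile(as.character(hs))

# Render the Rd object as text into a file instead of the console.
out_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd, out = out_file)

help_text <- readLines(out_file)
print(head(help_text))
```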
While Joshua's instructions work perfectly, I stumbled upon another strategy for saving an R help file, so I thought I'd share it. It works on my computer (Ubuntu), where less is the R pager. It essentially just involves saving the file from within less.
help(survey)
Then follow these instructions to save the less buffer to a file, i.e., type g|$tee survey.txt
g goes to the top of the less buffer if you aren't already there
| pipes the text in the range starting at the current position and ending at $, which indicates the end of the buffer, to the shell command tee, which allows standard output to be sent to a file

wget options to get output straight to R

I have a wget_string and commands made as follows:
wget_string <- paste("wget --user=" , u_name , " --password=", p_word, " ", my_urls,' -qO ', file_name, sep="")
system(wget_string)
readLines(file_name)
Which works, but then I have to use readLines() to read the file into R. I would like to run the command so that file is available directly in R, without being saved to the hard disk and then loaded from it.
I'm hoping to save resources by loading the file straight into R from the web. I can't use readLines() from the beginning because of the secure server. What are the options for this?
An intern=TRUE parameter the system function has. Capture the output from the command it does. Use the force, and the right options to wget to print to stdout:
> wget_string="wget -qO- http://www.google.com"
> s = system(wget_string,intern=TRUE)
If your returned data is a CSV file, textConnection you can use, feed it to read.csv you can.
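The intern = TRUE and textConnection() steps can be sketched as follows. The helper name read_csv_from_command is hypothetical, and a printf command stands in for the wget call so that the example runs without network access; it assumes a Unix-like shell.

```r
# Hypothetical helper: run a shell command and parse its stdout as CSV,
# without writing an intermediate file to disk.
read_csv_from_command <- function(cmd) {
  lines <- system(cmd, intern = TRUE)   # character vector, one element per output line
  con <- textConnection(lines)
  on.exit(close(con), add = TRUE)
  read.csv(con, header = TRUE)
}

# Stand-in command that prints a small CSV to stdout.
# With wget, the command string would use -qO- followed by the URL.
df <- read_csv_from_command("printf 'a,b\\n1,2\\n3,4\\n'")
print(df)
```

Note that the whole download is still held in memory as a character vector before parsing; only the round trip through the hard disk is avoided.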
If you use the -O - option of wget, the output is sent to standard output (written directly to the screen). This way you can read directly from the output of the wget command.
E.g.
wget -O - http://www.address.com
will download the web page and print it directly to standard output, so you can read the output of system(wget_string) directly.
From the wget man page:
-O file
--output-document=file
The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If - is used as file, documents will be printed to standard output, disabling link conversion. (Use ./- to print to a file literally named -.)
