Pass R object name as argument in shell - r

I'm having a little trouble using the shell command in R. I have a Java JAR file that takes as input a file containing a character vector (1 tweet per line). I'm calling it from the shell function:
shell("java -Xmx500m -jar C:/Users/User/Documents/R/java/ark-tweet-nlp-0.3.2/ark-tweet-nlp-0.3.2.jar --input-format text C:/Users/User/Documents/R/java/ark-tweet-nlp-0.3.2/examples/test.txt",intern=T)
Rather than pull the character vector from a text file external to the R environment, I want to be able to pass a vector that I have preprocessed within R. For example, if the file "test.txt" is imported into R as a character vector called test, I thought I could do this:
shell(paste("java -Xmx500m -jar C:/Users/User/Documents/R/java/ark-tweet-nlp-0.3.2/ark-tweet-nlp-0.3.2.jar --input-format text",test,sep=" "),intern=T)
But the jar file that is being called needs to actually read the file name, not the file contents. My workaround is to write the preprocessed file to my drive and then reimport using the shell script, but that is clunky and will mess up later processing I plan on doing.

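Use the system command set, in the same shell call that launches Java (or Sys.setenv() in R beforehand), to create an environment variable, then read it from Java with System.getenv(). The environment variable table then serves as the shared location between the two processes.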
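A minimal sketch of the R side of that idea. It assumes the Java program can be changed to read a variable instead of a file name; as the question notes, the jar as called expects a file name, so this is hypothetical. TWEET_TEXT is a made-up variable name.

```r
# Hypothetical preprocessed tweets; in the question this is the vector `test`.
test <- c("first tweet", "second tweet")

# Store the whole vector in one environment variable, one tweet per line.
# TWEET_TEXT is an illustrative name, not something the jar knows about.
Sys.setenv(TWEET_TEXT = paste(test, collapse = "\n"))

# Child processes started from this R session inherit the variable.
# On the Java side it would be read with System.getenv("TWEET_TEXT").
from_env <- strsplit(Sys.getenv("TWEET_TEXT"), "\n", fixed = TRUE)[[1]]
```

Environment variables have size limits, so this is likely to suit small inputs only.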

Related

Read a file in R without changing the working directory

How can others who run my R program read a file(eg: csv) used in my R code without having to change the working directory in setwd()?
I will suggest you use the here() function in the here package in your code like this:
library(here)
library(readr)  # provides read_csv()
Data1 <- read_csv(here("test_data.csv"))
read.csv has a file argument. Quoting the built-in R help on file:
If it does not contain an absolute path, the file name is relative to
the current working directory, getwd().
So, providing the absolute path of the file inside the file argument solves this problem.
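As a rough illustration of the difference, the sketch below writes a small CSV into a temporary directory and reads it back through its absolute path. The file name and columns are made up for the example.

```r
# Write a small CSV into a temporary directory (stand-in for a real data folder).
data_dir <- tempdir()
csv_path <- file.path(data_dir, "test_data.csv")   # absolute path
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), csv_path, row.names = FALSE)

# Reading via the absolute path works regardless of getwd().
df <- read.csv(csv_path)

# A bare file name is resolved against getwd(), so it is only found
# when the working directory happens to be data_dir.
found_by_name <- file.exists("test_data.csv")
```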
In Windows
Suppose your file name is test.csv and it's located in D:\files\test_folder (you can get the location of any file from its properties in Windows)
For reading this file you run:
df<-read.csv('D:\\files\\test_folder\\test.csv')
or
df<-read.csv('D:/files/test_folder/test.csv')
Suggested reading: Why \\ instead of \ and Paths in programming languages
I haven't used R on Linux, but "Getting a file path in Linux" might help.
Read from web
Just pass the web address of the dataset as the file argument. Try:
df<-read.csv('https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv')
Note: This link contains a list of 25 students with their study hours and marks. I used this dataset myself for one of my earlier tasks and it's perfectly safe.

How to extract object data from Rdata file via command line?

Is there a way to do this? I want to write a bash script that can extract data from an object in an RData file and write it to a text file.
Without being too certain about the specifics of your request (please see about creating a minimal reproducible example), does something like this work:
Assuming your .Rdata is called 'mtcars.Rdata' and contains a data.frame called mtcars and you want to write it to mtcars.csv.
You may have to change the path to where your Rscript.exe file lives.
"C:\Program Files\R\R-3.5.3\bin\Rscript" -e "load('mtcars.Rdata', env <- new.env());write.csv(env$mtcars, 'mtcars.csv')"
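If the one-liner grows, the same logic could live in a small script file. This is only a sketch: extract_object() and the script name extract.R are hypothetical, and it assumes the object is something write.csv can handle, such as a data frame.

```r
# extract_object(): load an .RData file into a private environment and
# write one named object from it to CSV. Hypothetical helper name.
extract_object <- function(rdata_file, object_name, out_file) {
  env <- new.env()
  load(rdata_file, envir = env)
  if (!exists(object_name, envir = env, inherits = FALSE)) {
    stop("object '", object_name, "' not found in ", rdata_file)
  }
  write.csv(get(object_name, envir = env), out_file)
  invisible(out_file)
}

# When run as: Rscript extract.R mtcars.Rdata mtcars mtcars.csv
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 3) {
  extract_object(args[1], args[2], args[3])
}
```

When using the -e form from bash rather than the Windows command prompt, wrap the R expression in single quotes, since bash would otherwise try to expand $mtcars inside double quotes.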

feed treetagger in R with text in string rather than text in file

I use TreeTagger from R, through the koRpus package.
Calling the treetag function requires me to indicate a filename,
which contains the text to be processed. However, I would like to provide a string
rather than a filename, because I have to do some preliminary text processing on this string.
I guess this has to go through a file because it is wrapping a script call.
As I am looping over 10000 texts, I would like to avoid wasting time writing the file to disk,
and just flow through memory.
Can I avoid this? Thanks.
No, or not really. As you suspect, the external script needs a file. From the docs:
Either a connection or a character vector, valid path to a file,
containing the text to be analyzed. If file is a connection, its
contents will be written to a temporary file, since TreeTagger can't
read from R connection objects.
So it's got to write it to a file for the external TreeTagger binary to read. If you don't do that, then the treetag function does it for you. Either way, the text ends up in a file.
If TreeTagger can read from a Unix named pipe, or fifo, then you might be able to stream text to it on the fly.
The only other option would be to see if the TreeTagger source can be linked with R in some way so that you can call one of its subroutines directly, passing an R object. I don't even know if this is written in Java or C++ or whatever, but it might be a big job anyway.
As indicated in the documentation:
format:
Either "file" or "obj", depending on whether you want to scan files or analyze the text in a given object, like a character vector. If the latter, it will be written to a temporary file (see file).
Using this knowledge, we can simply use the treetag()-function in combination with a character vector:
treetag(as.vector(yourinput), format = "obj")
Internally, the text is written to a temporary file, and TreeTagger reads and analyzes that file.
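In base R terms, the temporary-file step amounts to something like the sketch below. It does not call treetag() or TreeTagger; the sample text is made up, and it only illustrates the write-then-read pattern that both answers describe.

```r
# Hypothetical preprocessed text held in memory.
yourinput <- c("This is the first text.", "Here is a second one.")

# Write it to a temporary file; this mirrors what happens internally
# when text is passed as an object rather than a file name.
tmp <- tempfile(fileext = ".txt")
writeLines(yourinput, tmp)

# `tmp` is now a valid path that an external program could read.
roundtrip <- readLines(tmp)

# Clean up once the external tool is done with it.
unlink(tmp)
```

tempfile() also accepts a tmpdir argument, so if the system offers a RAM-backed directory the file could be placed there, which may reduce the disk cost over 10000 texts.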

run saxon xquery over batch of xml files and produce one output file for each input file

How do I run XQuery using Saxon HE 9.5 on a directory of files using the built-in command line? I want to take one file as input and produce one file as output.
This sounds very obvious, but I can't figure it out without using saxon extensions that are only available in PE and EE.
I can read in the files in a directory using fn:collection() or using input parameters. But then I can only produce one output file.
To keep things simple, let's say I have a directory "input" with my files 01.xml, 02.xml, ... 99.xml. Then I have an "output" directory where I want to produce the files with the same names -- 01.xml, 02.xml, ... 99.xml.
Any ideas?
My real data set is large enough (tens of thousands of files) that I don't want to fire off the JVM once per file, so writing a shell script to call the Saxon command line for each file is out of the question.
If there are no built-in command-line options, I may just write my own quick java class.
The capability to produce multiple output files from a single query is not present in the XQuery language (only in XSLT), and the capability to process a batch of input files is not present in Saxon's XQuery command line (only in the XSLT command line).
You could call a single-document query repeatedly from Ant, XProc, or xmlsh (or of course from Java), or you could write the code in XSLT instead.

How to supply file names with paths to R's read.table function?

What is the correct way to enter data, i.e. data(d=read.table("WHAT GOES HERE IF YOU HAVE A MACBOOK")), if you have a Mac computer?
Also, what does the error listed below mean:
d=read.table(“Firststatex.notepad”,header=T)
Error: unexpected input in "d=read.table(‚"
Two usage errors:
You don't use data() to read in to R datasets held in external files. data() is an R function to load datasets that are built in to R and R packages. read.table("foo.txt") will return a data frame object from the file "foo.txt", which you can assign to an object within R using the assignment operator <-, e.g.
DF <- read.table("foo.txt")
As for "what goes here...", you need to supply a file system path from the current directory to the directory holding the file you want to read in. If the file "foo.txt" is in the current working directory, you can just provide the file name with extension as I did above. If the file is in another directory you need to supply the path to the file name and the file name, for example if the file "foo.txt" is located in the directory above the current directory, you would supply "../foo.txt". If it were in a directory myData located in the directory above the current directory you could use "../myData/foo.txt". So paths can be relative to the current directory. You can also use the fully qualified path on your file system hierarchy.
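The relative-path cases can be tried out safely with a throwaway directory tree. In this sketch the directory names under the temporary folder are made up; only "../myData/foo.txt" comes from the explanation above.

```r
# Build a throwaway directory tree: base/myData/foo.txt and base/work/
base <- tempfile("paths_demo_")
dir.create(file.path(base, "myData"), recursive = TRUE)
dir.create(file.path(base, "work"))
writeLines(c("x y", "1 2", "3 4"), file.path(base, "myData", "foo.txt"))

# Pretend base/work is the current working directory.
old_wd <- setwd(file.path(base, "work"))

# "../myData/foo.txt" means: up one level, then into myData.
DF <- read.table("../myData/foo.txt", header = TRUE)

# The same file via its fully qualified path.
full_path <- normalizePath("../myData/foo.txt")

setwd(old_wd)   # restore the original working directory
```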
An alternative is to use the file.choose() function in place of the file name string. This will allow you to navigate to the file you wish to load interactively using a native file selection dialogue. This is what happens on Windows and I suspect also on Mac; not much different happens on Linux. For example:
DF <- read.table(file.choose())
You should probably look for specific help for your operating system if you are not familiar with how to specify file names and paths.
I get the same error when copying and pasting the code you provide. The problem is that you are using fancy, curly quotes “Firststatex.notepad” rather than one of the accepted quote marks for strings, " and '; either of these is acceptable: i) "Firststatex.notepad", ii) 'Firststatex.notepad'. (Backticks, as in `Firststatex.notepad`, are also recognised by R, but they quote object names rather than strings, so they would not work here.) The quotes you used may look like quotes to you or me, but they aren't quotes as far as most computer programs are concerned. MS Word often inserts these quotes when you type ", for example, as do many other applications.
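One way to see the difference without touching any file is to hand both spellings to parse(), which only checks syntax. A small sketch, with foo.txt as a placeholder file name:

```r
# The same call written with curly quotes (\u201c and \u201d) and with straight quotes.
curly    <- "d <- read.table(\u201cfoo.txt\u201d, header = TRUE)"
straight <- "d <- read.table(\"foo.txt\", header = TRUE)"

# parse() only checks syntax; nothing is read from disk here.
curly_result    <- tryCatch(parse(text = curly),    error = function(e) e)
straight_result <- tryCatch(parse(text = straight), error = function(e) e)

# TRUE if the parser rejected the code, FALSE if it was accepted.
is_error_curly    <- inherits(curly_result, "error")
is_error_straight <- inherits(straight_result, "error")
```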
