Error with tex2docx function from reports R package - r

I'm trying to reproduce the example for tex2docx function in reports R package and getting the following error.
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex",
package = "reports")
BIB <- system.file("extdata/docs/example.bib", package = "reports")
tex2docx(DOC, file.path(getwd(), "test.docx"), path = NULL, bib.loc = BIB)
Error Message
pandoc.exe: Error reading bibliography `C:/Users/Muhammad'
citeproc: the format of the bibliographic database could not be recognized
using the file extension.
docx file generated!
Warning message:
running command 'C:\Users\MUHAMM~1\AppData\Local\Pandoc\pandoc.exe -s C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/doc_library/apa6.qual_tex/doc.tex -o C:/Users/Muhammad Yaseen/Documents/test.docx --bibliography=C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/docs/example.bib' had status 23
I wonder how to get tex2docx function in reports R package working properly.

As described in the above comments, the error is caused by passing a filename/path including some spaces that are nor escaped, nor quoted. A workaround could be wrapping all file paths and names inside of shQuote before passing to the command line with system.
Code: https://github.com/trinker/reports/pull/31
Demo:
Loading package
library(reports)
Creating a dummy dir with a space in the name that would hold the bib file
dir.create('foo bar')
file.copy(system.file("extdata/docs/example.bib", package = "reports"), 'foo bar/example.bib')
Specifying the source and the copied bib file:
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex", package = "reports")
BIB <- 'foo bar/example.bib'
Running the test:
tex2docx(DOC, file.path(getwd(), "test2.docx"), path = NULL, bib.loc = BIB)
Disclaimer: I tried to test this pull request, but I could not setup an environment with all the needed tools to run R CMD check with vignettes and everything else after all in 5 mins (sorry but being on vacation right now and just enjoying the siesta after lunch), so please consider this pull request as "untested" -- although it should work.

Related

TreeTagger in R

I have downloaded TreeTaggerv3.2 for Windows and have configured it per the install.txt. I am trying to use it in R with koRpus package. I have set the kRp.env as -
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en",
preset="en", treetagger="manual", format="file",
TT.tknz=TRUE, encoding="UTF-8" )
.My data to be tagged is in a file and trying to use it as treetag("myfile.txt") but it is throwing the error-
Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, :
'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\windows\system32\cmd.exe /c C:\TreeTagger\bin\tag-english.bat
C:\Users\vivsingh\Desktop\NLP\tree_tag_ex.txt' had status 255
The standalone TreeTagger is working on by windows.Any idea on how it works?
I had the exact same error and warning while trying lemmatization on R word vector following Bernhard Learns blog using windows 7 and R 3.4.1 (x64). The issue was also appearing using textstem package but TreeTagger was running properly in cmd window.
I mixed several answers I found on this post and here is my steps and code running properly:
get into R win_library (~\Documents\R\win-library\3.4\rJava\jri\x64\jri.dll) and copy jri.dll (thanks kravi!) to replace it the parent folder.
close and restart R
library(koRpus)
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
lemma_tagged <- treetag(lemma_unique$word_clean, treetagger="manual", format="obj", TT.tknz=FALSE , lang="en", TT.options=list(path="c:/TreeTagger", preset="en"))
lemma_tagged_tbl <- tbl_df(lemma_tagged#TT.res)
Hope it helps.
I am posting this answer to keep a record. I also faced the same issue due to incorrect specification of the location of jri.dll on 64-Bit processor and windows 8.1. If we call
set.kRp.env(TT.cmd="manual", lang="en", TT.options=list(path="/path/to/tree-tagger-windows-x.x/TreeTagger", preset="en")) and we follow either of following two steps, we can resolve this error:
While installing R, if we install only 64 Bit version of R, and
specify the proper path for these variables
LD_LIBRARY_PATH = /path/to/rJava/jri
JAVA_HOME = /path/to/jdk1.x.x
java.library.path = /path/to/rJava/jri/jri.dll
CLASSPATH = /path/to/rJava/jri
If we already installed both versions viz. 32 bit and 64 bit of R on your computer then just copy jri.dll from /path/to/rJava/jri/x64/jri.dll and replace at path/to/rJava/jri/jri.dll. Further, we need to set the path of above mentioned four variables.
I've got this issue (very similar I guess) and posted query to GitHub.
https://github.com/unDocUMeantIt/koRpus/issues/7
The current working solution for me for this case was easier than I could expect, just downgrading the koRpus package. This can change with time but this version should remain appropriate.
library("devtools")
install_github("unDocUMeantIt/koRpus", ref="0.06-5")
This package is not Java related they said.
You can face the same error while setting up the korpus environment and getting the result from treetagger. For example, when you use:
tagged.text <- treetag(
"C:/temp/sample_text.txt",
treetagger = "manual",
lang = "en",
TT.options = list(
path = "c:/Treetagger",
preset = "en"
),
doc_id = "sample"
)
You would receive a similar error
Error: Awww, this should not happen: TreeTagger didn't return any useful data.
This can happen if the local TreeTagger setup is incomplete or different from what presets expected.
You should re-run your command with the option 'debug=TRUE'. That will print all relevant configuration.
Look for a line starting with 'sys.tt.call:' and try to execute the full command following it in a command line terminal. Do not close this R session in the meantime, as 'debug=TRUE' will keep temporary files that might be needed.
If running the command after 'sys.tt.call:' does fail, you'll need to fix the TreeTagger setup.
If it does not fail but produce a table with proper results, please contact the author!
Here you need to change the value of treetagger, from
treetagger = "manual"
to
treetagger = "kRp.env"
However, before that remember to set the kRp.env as #Xochitl C. suggested in their answer
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
Once you do this, you'll get the desired result.

Cannot read data from an xlsx file in RStudio

I have installed the required packages - gdata and ggplot2 and I have installed perl.
library(gdata)
library(ggplot2)
# Read the data from the excel spreadsheet
df = data.frame(read.xls ("AssignmentData.xlsx", sheet = "Data", header = TRUE, perl = "C:\\Strawberry\\perl\\bin\\perl.exe"))
However when I run this I get the following error:
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
Intermediate file 'C:\Users\CLAIRE~1\AppData\Local\Temp\RtmpE3UYWA\file8983d8e1efc.csv' missing!
In addition: Warning message:
running command '"C:\STRAWB~1\perl\bin\perl.exe" "C:/Users/Claire1992/Documents/R/win-library/3.1/gdata/perl/xls2csv.pl" "AssignmentData.xlsx" "C:\Users\CLAIRE~1\AppData\Local\Temp\RtmpE3UYWA\file8983d8e1efc.csv" "Data"' had status 2
Error in file.exists(tfn) : invalid 'file' argument
Thanks to #Stibu I realised I had to set my work directory. This is the command you use to run in Rstudio; setwd("C/Documents..."). The file path is where the excel file is located.
I had the issue but I solved it differently.
My problem was because my file was saved as Excel (extension .xls) but it was a txt file.
I corrected the file and I did not meet any other error with the R function.

Error in R. Error in gsub("(?<=\n)(?=.|\n)", continue, x, perl = TRUE) :

I am encountering an error in R that I cannot seem to figure out. I am creating an R markdown document where I read in an a csv table using this code.
iati <- read.csv(file="/filepath/IATI_NGOS.csv",head=TRUE,sep=",")
and then using ggplot2 I create a plot using the following code.
figure_one <- ggplot(iati, aes(iati$reporting.org))+
geom_bar(fill="blue")+
ylab("Total Activities")+
xlab("NGO Reporting Organizations in IATI")+
ggtitle("Total Number of Activities compared to each NGO Reporting Organization in IATI")+
coord_flip()
When I try to call figure_one in the R markdown I get the following error:
Quitting from lines 44-55 (NGO_IATI.Rmd)
Error in gsub("(?<=\n)(?=.|\n)", continue, x, perl = TRUE) :
input string 1 is invalid UTF-8
Calls: <Anonymous> ... paste -> comment_out -> line_prompt -> paste -> gsub
In addition: Warning message:
In grep("\n", message) : input string 1 is invalid in this locale
Execution halted
When I run this code in a regular R script I have absolutely no issues. I have search for some answers but can't figure it out.
Thanks!
I ended solving my issue by just doing a fresh install of R and Rstudio on my local machine. I think the recent update to Yosemite on my local environment created a lot of issues with the TeX plugin I had installed for R markdown.
I get the same question when I knit my rmarkdown document and find **encoding is the cause.
When you use functions like read.csv, fread or read_csv, you will read the column name.
If column names are in other languages, like Chinese, the problem will easily happen.
Or you rmarkdown works on Windows, but the encoding bug happens on Mac, a different environment.
The temporal solution is to rename the column name in English and resave the data files.
Here is the pseudocode in R to show my idea.
library(data.table)
library(tidyverse)
fread('yourfile.csv',encoding = 'UTF-8') %>%
purrr::set_names(c('x1','x2','x3')) %>%
write_excel_csv('yourfile_2.csv')
Here the new file yourfile_2.csv is fine to rmarkdown knit without encoding problems happening.

issue with get_rollit_source

I tried to use get_rollit_source from the RcppRoll package as follows:
library(RcppRoll)
get_rollit_source(roll_max,edit=TRUE,RStudio=TRUE)
I get an error:
Error in get("outFile", envir = environment(fun)) :
object 'outFile' not found
I tried
outFile="C:/myDir/Test.cpp"
get_rollit_source(roll_max,edit=TRUE,RStudio=FALSE,outFile=outFile)
I get an error:
Error in get_rollit_source(roll_max, edit = TRUE, RStudio = FALSE, outFile = outFile) :
File does not exist!
How can fix this issue?
I noticed that the RcppRoll folder in the R library doesn't contain any src directory. Should I download it?
get_rollit_source only works for 'custom' functions. For things baked into the package, you could just download + read the source code (you can download the source tarball here, or go to the GitHub repo).
Anyway, something like the following should work:
rolling_sqsum <- rollit(final_trans = "x * x")
get_rollit_source(rolling_sqsum)
(I wrote this package quite a while back when I was still learning R / Rcpp so there are definitely some rough edges...)

Using inst/extdata with vignette during package checking R 2.14.0

I have a package which contains a csv file which I put in inst/extdata per R-exts. This file is needed for the vignette. If I Sweave the vignette directly, all works well. When I run R --vanilla CMD check however, the check process can't find the file. I know it has been moved into an .Rcheck directory during checking and this is probably part of the problem. But I don't know how to set it up so both direct Sweave and vignette building/checking works.
The vignette contains a line like this:
EC1 <- dot2HPD(file = "../inst/extdata/E_coli/ecoli.dot",
node.inst = "../inst/extdata/E_coli/NodeInst.csv",
and the function dot2HPD accesses the file via:
ni <- read.csv(node.inst)
Here's the error message:
> tab <- read.csv("../inst/extdata/E_coli/NodeInst.csv")
Warning in file(file, "rt") :
cannot open file '../inst/extdata/E_coli/NodeInst.csv': No such file or directory
When sourcing ‘HiveR.R’:
Error: cannot open the connection
Execution halted
By the way, this is related to this question but that info seems outdated and doesn't quite cover this territory.
I'm on a Mac.
Have you tried using system.file instead of hardcoded relative paths?
EC1 <- dot2HPD(file = system.file("inst", "extdata", "E_coli", "ecoli.dot", package = "your_package+name"))
node.inst <- system.file("inst", "extdata", "E_coli", "NodeInst.csv", package = "your_package_name")

Resources