TreeTagger in R - r

I have downloaded TreeTaggerv3.2 for Windows and have configured it per the install.txt. I am trying to use it in R with koRpus package. I have set the kRp.env as -
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en",
preset="en", treetagger="manual", format="file",
TT.tknz=TRUE, encoding="UTF-8" )
.My data to be tagged is in a file and trying to use it as treetag("myfile.txt") but it is throwing the error-
Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, :
'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\windows\system32\cmd.exe /c C:\TreeTagger\bin\tag-english.bat
C:\Users\vivsingh\Desktop\NLP\tree_tag_ex.txt' had status 255
The standalone TreeTagger is working on by windows.Any idea on how it works?

I had the exact same error and warning while trying lemmatization on R word vector following Bernhard Learns blog using windows 7 and R 3.4.1 (x64). The issue was also appearing using textstem package but TreeTagger was running properly in cmd window.
I mixed several answers I found on this post and here is my steps and code running properly:
get into R win_library (~\Documents\R\win-library\3.4\rJava\jri\x64\jri.dll) and copy jri.dll (thanks kravi!) to replace it the parent folder.
close and restart R
library(koRpus)
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
lemma_tagged <- treetag(lemma_unique$word_clean, treetagger="manual", format="obj", TT.tknz=FALSE , lang="en", TT.options=list(path="c:/TreeTagger", preset="en"))
lemma_tagged_tbl <- tbl_df(lemma_tagged#TT.res)
Hope it helps.

I am posting this answer to keep a record. I also faced the same issue due to incorrect specification of the location of jri.dll on 64-Bit processor and windows 8.1. If we call
set.kRp.env(TT.cmd="manual", lang="en", TT.options=list(path="/path/to/tree-tagger-windows-x.x/TreeTagger", preset="en")) and we follow either of following two steps, we can resolve this error:
While installing R, if we install only 64 Bit version of R, and
specify the proper path for these variables
LD_LIBRARY_PATH = /path/to/rJava/jri
JAVA_HOME = /path/to/jdk1.x.x
java.library.path = /path/to/rJava/jri/jri.dll
CLASSPATH = /path/to/rJava/jri
If we already installed both versions viz. 32 bit and 64 bit of R on your computer then just copy jri.dll from /path/to/rJava/jri/x64/jri.dll and replace at path/to/rJava/jri/jri.dll. Further, we need to set the path of above mentioned four variables.

I've got this issue (very similar I guess) and posted query to GitHub.
https://github.com/unDocUMeantIt/koRpus/issues/7
The current working solution for me for this case was easier than I could expect, just downgrading the koRpus package. This can change with time but this version should remain appropriate.
library("devtools")
install_github("unDocUMeantIt/koRpus", ref="0.06-5")
This package is not Java related they said.

You can face the same error while setting up the korpus environment and getting the result from treetagger. For example, when you use:
tagged.text <- treetag(
"C:/temp/sample_text.txt",
treetagger = "manual",
lang = "en",
TT.options = list(
path = "c:/Treetagger",
preset = "en"
),
doc_id = "sample"
)
You would receive a similar error
Error: Awww, this should not happen: TreeTagger didn't return any useful data.
This can happen if the local TreeTagger setup is incomplete or different from what presets expected.
You should re-run your command with the option 'debug=TRUE'. That will print all relevant configuration.
Look for a line starting with 'sys.tt.call:' and try to execute the full command following it in a command line terminal. Do not close this R session in the meantime, as 'debug=TRUE' will keep temporary files that might be needed.
If running the command after 'sys.tt.call:' does fail, you'll need to fix the TreeTagger setup.
If it does not fail but produce a table with proper results, please contact the author!
Here you need to change the value of treetagger, from
treetagger = "manual"
to
treetagger = "kRp.env"
However, before that remember to set the kRp.env as #Xochitl C. suggested in their answer
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
Once you do this, you'll get the desired result.

Related

RStudio session aborts when I run gRbase::stepwise function (undirected graphs)

I'm doing an R project on RStudio (RStudio 2021.09.1+372 "Ghost Orchid" Release for Windows), (R version 4.1.2).
I'm working on Windows 10 x64.
I want to create an Undirected Graph from a dataset:
library(gRbase)
library(gRim)
library(gRain)
data(BodyFat)
BodyFat <- BodyFat[-c(31,42,48,76,86,96,159,169,175,182,206),]
BodyFat$Age <- sqrt(BodyFat$Age)
BodyFat$Weight <- sqrt(BodyFat$Weight)
names(BodyFat)[names(BodyFat) == 'BodyFat'] <- 'BodyFatPerc'
gRbodyfat <- BodyFat[,2:15]
graph.MyGraph <- cmod(~.^., data = gRbodyfat)
AIC.MyGraph <- gRbase::stepwise(graph.MyGraph)
I'm still exploring RStudio and graphical models, so I run these lines from the console, one by one.
When I run the last line of code, R session aborts and I get the following pop-up message:
I've also tried the following lines of code (I changed the last line):
library(gRbase)
library(gRim)
library(gRain)
data(BodyFat)
BodyFat <- BodyFat[-c(31,42,48,76,86,96,159,169,175,182,206),]
BodyFat$Age <- sqrt(BodyFat$Age)
BodyFat$Weight <- sqrt(BodyFat$Weight)
names(BodyFat)[names(BodyFat) == 'BodyFat'] <- 'BodyFatPerc'
gRbodyfat <- BodyFat[,2:15]
graph.MyGraph <- cmod(~.^., data = gRbodyfat)
AIC.MyGraph <- stepwise(graph.MyGraph)
but I get the same problem. I tried three or four times running those lines of code, R aborted everytime.
My gRbase, gRim, gRain, Rgraphviz and RBGL libraries are in the home folder:
C:/Users/MyUser/Documents/R/win-library/4.1
Any advice?
EDIT:
I've tried uninstalling/reinstalling R, Rtools, RStudio, reinstalling libraries, updating them, I've tried launching RStudio as Administrator, I've tried changing computer.
The last thing I tried is this:
uninstalling R, Rtools and RStudio
deleting .RData and .Renviron files in the Documents folder
deleting all libraries
downloading R (latest version) from CRAN webpage and installing it as Administrator
downloading Rtools (latest version from webpage) and installing it as Administrator
creating the .Renviron file as described here
downloading RStudio (latest version from webpage) and installing it as Administrator
installing BiocManager as described here (when asked to update Matrix library I said NO, I thought it could be the problem)
installing gRbase, gRim, gRain libraries as described here (when asked to update fansi library I said NO because it said "There are binary versions available but the source versions are later", I thought it could be the problem)
Tried again the lines of code
Still got the problem, R session aborted again.
A friend of mine doing a similar project used gRbase::stepwise function, so I inserted his code in the console and the function worked. So I think the problem might be the object of the function, graph.MyGraph. The cmod function works fine, so maybe there's a problem with gRbodyfat, which is used to build graph.MyGraph.
If I run these lines (I tried changing the dataset, I used BodyFat instead of gRbodyfat):
graph.MyGraph <- cmod(~.^., data = BodyFat)
AIC.MyGraph <- gRbase::stepwise(graph.MyGraph)
I get the following error:
Error in dim(res) <- c(NSEL, NSET) :
dims [product 210] do not match the length of object [14]
Don't know what to do.
EDIT:
I've tried debugging gRbase::stepwise function and I get this:
The problem appears to be the iModel function.
EDIT:
I've discovered that if I use graph.MyGraph <- cmod(~.^1, data = gRbodyfat) the gRbase::stepwise function works fine.

How to export sf object to GDB using RPyGeo in R (Windows)?

I have a bunch of sf objects I'd like to export to GDB from R. I'm running R 4.0.2 on Windows 10. In this case the sf objects are all vector point data. The main reasons to export to GDB are to keep longer field names (the shapefile truncation is very annoying), and because GDBs are more desirable storage locations for our workflows.
Yes, I know about the ArcGisBinding package. I've got it to work in a test script but it's pretty unstable - often crashing and requiring a restart of R. This is a problem, because the sf objects I'd like to export come after an already long Rmd that reads in, formats and cleans the data. So it's not a simple manner of re-running the script until arc.write doesn't break. I could break up the script, but then I'd still have to read in a bunch of shapefiles. One option I haven't yet explored is using reticulate to call a python script instead of trying to do everything in R, but we're trying to do our analysis all in one place, if possible.
I'm pretty sure I've managed to set up RPyGeo appropriately, first setting my python path using the reticulate package. I'm doing it this way because IT restrictions means I can't edit PATH variables on my machine.
#package calls
library(sf)
library(spData)
library(reticulate)
#set python version in reticulate
py_path <- "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
reticulate::use_python(python = py_path, required = TRUE)
#call RPyGeo
library(RPyGeo) # for potential point export
#output gdb
out.gdb <- "C:/LOCAL_PROJECTS/Output/Output.gdb"
#RPyGeo Parameters
# Note that, in order to use RPyGeo you need a working ArcMap or ArcGIS Pro installation on your computer.
# python path - note that this will change depending on which version of Arc one is using
# py_path <- "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
arcpy <- rpygeo_build_env(workspace = out.gdb,
overwrite = TRUE,
extensions = c("Spatial","DataInteroperability"),
path = py_path)
I've tried a bunch of different tools to export an sf object, here using dummy data also used in the RPyGeo vignette
data(nz, package = "spData")
arcpy$Copy_management(in_data = nz,out_data = "nz_test")
arcpy$Copy_management(in_data = nz,out_data = file.path(out.gdb,"nz"))
arcpy$FeatureClassToGeodatabase_conversion(Input_Features = nz,Output_Geodatabase = out.gdb)
arcpy$FeatureClassToFeatureClass_conversion(in_features = nz,out_path = out.gdb,out_name = "nz")
arcpy$QuickExport_interop(Input = nz,Output = file.path(out.gdb,"nz"))
arcpy$CopyFeatures_management(in_features = nz,out_feature_class = file.path(out.gdb,"nz"))
arcpy$CopyFeatures_management(in_features = nz,out_feature_class = "nz")
Each time I get an error, for example:
Error in py_call_impl(callable, dots$args, dots$keywords) :
RuntimeError: Object: Error in executing tool
Detailed traceback:
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py", line 3232, in CopyFeatures
raise e
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py", line 3229, in CopyFeatures
retval = convertArcObjectToPythonObject(gp.CopyFeatures_management(*gp_fixargs((in_features, out_feature_class, config_keyword, spatial_grid_1, spatial_grid_2, spatial_grid_3), True)))
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\geoprocessing\_base.py", line 511, in <lambda>
return lambda *args: val(*gp_fixargs(args, True))
I'm not an expert in ArcPy by any means. Nor am I an expert in tracing errors inside packages. Am I making a simple syntax mistake? Is there something else that I'm missing? Any help would be much appreciated!

How do I deal with an error message while installing a package?

I am brand new to this so please forgive my inexperience...I'm trying to learn.
I'm attempting to install an R package called "Doublet Finder" using the specified code given on the Github site.
When I do this, I get this error immediately:
Error in rbind(info, getNamespaceInfo(env, "S3methods")) :
number of columns of matrices must match (see arg 2)
Being new to R, I'm not sure what this error means and when I google this something similar comes up and the individual removed and re-installed ALL of their libraries...that seems crazy. Does anyone have advice on what this could be, how to fix it, or why the package won't install?
Your problem seems to be fairly similar to this one. It might be the case that the dependencies (packages that Doublet Finder relies on) are outdated. What you can try is to follow these steps to uninstall and reinstall all packages with the hope that by updating packages there isn't a version mismatch.
This code is copied from the website above:
ip <- as.data.frame(installed.packages(lib.loc = .libPaths()[1]),
stringsAsFactors = FALSE)
head(ip)
str(ip)
path.lib <- unique(ip$LibPath)
# create a vector with all the names of the packages you want to remove
pkgs.to.remove <- ip[,1]
head(pkgs.to.remove)
str(pkgs.to.remove)
sapply(pkgs.to.remove, remove.packages, lib = path.lib)
sapply(pkgs.to.remove, install.packages, lib = path.lib)

Installing pdftotext on Windows (for use with R, 'tm' package)

I am having trouble using R, 'tm' package, to read in .pdf files.
Specifically, I try to run the following code:
library(tm)
filename = "myfile.pdf"
tmp1 <- readPDF(PdftotextOptions="-layout")
doc <- tmp1(elem=list(uri=filename),language="en",id="id1")
doc[1:15]
...which gives me the error:
Error in readPDF(PdftotextOptions = "-layout") :
unused argument (PdftotextOptions = "-layout")
I assume this is due to the fact that the pdftotext program (part of xpdf, http://www.foolabs.com/xpdf/download.html) has not been installed correctly on my machine, so that R cannot access it.
What are the steps to install xpdf/pdftotext correctly such that the above R code can be executed? (I am aware of similar questions already posted, however they don't address the same issue)
PdftotextOptions is no parameter of readPDF. readPDF has a control parameter, which expects a list. So correct use would be:
if(all(file.exists(Sys.which(c("pdfinfo", "pdftotext"))))) {
tmp1 <- readPDF(control = list(text = "-layout"))
doc <- tmp1(elem=list(uri=filename),language="en",id="id1")
}
Set
setwd('C:/xpdf/bin64')
It works for me.

I am trying to make an interactive map using leafletR package according to a ZevRoss blog. But there is an error in the code

The ZevRoss blog is as follow:
http://zevross.com/blog/2014/04/11/using-r-to-quickly-create-an-interactive-online-map-using-the-leafletr-package/
The code with error is:
# ----- Write data to GeoJSON
leafdat<-paste(downloaddir, "/", filename, ".geojson", sep="")
writeOGR(subdat, leafdat, layer="", driver="GeoJSON")
And the error is:
Error in writeOGR(subdat, leafdat, layer = "", driver = "GeoJSON") :
GDAL Error 3: Cannot open file 'd:/Leaflet/County_2010Census_DP1.geojson'
Because I am a freshman in R, I searched for this problem a lot and didn't get any good answer.
I am using Rstudio R version 3.1.1(2014-07-10) on windows 7 32bit.
My rgdal version is 0.9-1.
The other code in the blog runs successfully, this sentence seems to be the only difficult point.
You could create GeoJSON using leafletR package:
library('leafletR')
Your_GeoJSON <- toGeoJSON(data=YourData, dest=getwd())
I've tried to find a solution for this mysterious error for some time.
Eventually I found this post on the Gdal package errors' tickets site that clarified the problem and gave a solution.
Basically the problem is in the interface between rgdal and Gdal (Gdal changed their way to work and the latest version of rgdal hasn't watched up yet):
writeOGR() calls ogrCheckExists("foo.geojson") to check first if the file exists before creating a new dataset.
In the 1.11 version the OGR GeoJSON driver will emit an error message that this file doesn't exists, whereas previous version didn't emit an error message.
rgdal catches this error as a fatal one and doesn't go to the writing step. This should be fixed in rgdal.
Meanwhile you have an easy workaround : add check_exists = FALSE as a parameter to writeOGR()
Therefore the following code will work:
writeOGR(spDf,'foo.geojson','spDf', driver='GeoJSON',check_exists = FALSE)
Of course if there is already a geojson file with the chosen name at the location writeOGR still fails.
Even though you already have a 'd:' drive on your computer and you have permission to write to that drive, try the following:
--------------------------------
leafdat<-paste(downloaddir, "/", ".geojson", sep="")
> leafdat
> "d:/Leaflet/.geojson"
writeOGR(subdat, leafdat, layer="", driver="GeoJSON")
--------------------------------
Then you may get ".geojson" file on "d:/Leaflet". Change the file name ".geojson" to "County_2010Census_DP1.geojson".

Resources