Importing non-English shapefiles - r

I'm trying to import a shape-file like this:
fn <- "Proj1"
my_shp <- readShapeSpatial(fn)
On a windows-computer (32-bit) it works ok, but when I do the same from a Ubuntu-machine (64-bit, English OS, R2.14.0), I get "Error in make.names(onames, unique = TRUE) : invalid multibyte string 9".
I suspect it is because the shapefile has Spanish origins, i.e. the names of polygons in it have accents like in "México" (not "Mexico").
As a quick fix, I did the import in windows, saved as .rda and loaded it in Ubuntu, but then I get for example "M\xfexico" as polygon name.
I'm not so experienced in Linux so I don't know if the fix is in R or in Ubuntu. Your help is highly appreciated.

The solution is to start R on the Ubuntu-computer by writing "LC_ALL=C R" in a terminal window. Thanks to Oscar Perpiñán for the solution.
Update: I use RStudio, where as far as I know it is not possible to start R with command-line parameters, but this works from inside RStudio:
Sys.setlocale(category = "LC_ALL", locale = "C")
/Chris

Related

R: Encoding of labelled data and knit to html problems

First of all, sorry for not providing a reproducible example and posting images, a word of explanation why I did it is at the end.
I'd really appreciate some help - comments or otherwise, I think I did my best to be as specific and concise as I can
Problem I'm trying to solve is how to set up (and where to do it) encoding in order to get polish letters after a .Rmd document is knitted to html.
I'm working with a labelled spss file imported to R via haven library and using sjPlot tools to make tables and graphs.
I already spent almost all day trying to sort this out, but I feel I'm stucked with no idea where to go.
My sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250
Whenever I run (via console / script)
sjt.frq(df$sex, encoding = "Windows-1250")
I get a nice table with proper encoding in the rstudio viewer pane:
Trying with no encoding sjt.frq(df$sex) gives this:
I could live with setting encoding each time a call to sjt.frq is made, but problem is, that no matter how I set up sjt.frq inside a markdown document, it always gets knited the wrong way.
Running chunk inside .Rmd is OK (for a completely unknown reason encoding = "UTF-8 worked as well here and it didn't previously):
Knitting same document, not OK:
(note, that html header has all the polish characters)
Also, it looks like that it could be either html or sjPlot specific because knitr can print polish letters when they are in a vector and are passed as if they where printed to console:
Is there anything I can set up / change in order to make this work?
While testing different options I discovered, that manually converting sex variable to factor and assigning labels again, works and Rstudio knits to html with proper encoding
df$sex <- factor(df$sex, label = c("kobieta", "mężczyzna"))
sjt.frq(df$sex, encoding = "Windows-1250")
Regarding no reproducible example:
I tried to simulate this example with fake data:
# Get libraries
library(sjPlot)
library(sjlabelled)
x <- rep(1:4, 4)
x<- set_labels(x, labels = c("ąę", "ćŁ", "óŚŚ", "abcd"))
# Run freq table similar to df$sex above
sjt.frq(x)
sjt.frq(x, encoding = "UTF-8")
sjt.frq(x, encoding = "Windows-1250")
Thing is, each sjt.frq call knits the way it should (although only encoding = "Windows-1250" renders properly in rstudio viewer pane.
If you run sjt.frq(), a complete HTML-page is returned, which is displayed in a viewer.
However, for use inside markdown/knitr-documents, there are only parts of the HTML-output required: You don't need the <head> part, for instance, as the knitr-document creates an own header for the HTML-page. Thus, there's an own print()-method for knitr-documents, which use another return-value to include into the knitr-file.
Compare:
dummy <- sjt.frq(df$sex, encoding = "Windows-1250")
dummy$output.complete # used for default display in viewer
dummy$knitr # used in knitr-documents
Since the encoding is located in the <meta>-tag, which is not included in the $knitr-value, the encoding-argument in sjt.frq() has no effect on knitr-documents.
I think that this might help you: rmarkdown::render_site(encoding = 'UTF-8'). Maybe there are also other options to encode text, or you need to modify the final HTML-file, changing the charset encoding there.

Encoding KNIME R-Snippet in pdf export (Special Characters)

I have a Encoding Problem in KNIME.
Following code works perfectly in RStudio, the symbol ° is printed out correctly.
library(grid)
library(gridBase)
library(gridExtra)
library(ggplot2)
fn <- "C:/Temp/textR.pdf"
pdf(file=fn)
df <- data.frame("crit °C", 1)
g1 <- tableGrob(format(df, core.just="left"))
grid.arrange(g1, ncol = 1)
dev.off()
I want to use this code in an R Snippet in KNIME, unfortunately it won´t work there, instead of "°" I get "°".
What i already tried:
setting Preferences to UTF-8 in KNIME ->Preferences->General->Workspace
https://tech.knime.org/forum/knime-textprocessing/problems-exporting-utf-8-csv-writer
use ggsave
use Pdf Cairo instead (solution from using Unicode 'dingbat-like' glyphs in R graphics, across devices & platforms, especially PDF) but i am not sure if I had the family package installed...
Can anyone help me? I am using KNIME 3.1.1 and R_3_2_1
It was working well for me (also Windows, in my case Windows 10, 64 bit, English locale, KNIME 3.1.2, R 3.0.3).
You might give a try with the following change:
df <- data.frame("crit \u00B0C", 1)
(\u00B0 stands for the unicode degree sign.)

TreeTagger in R

I have downloaded TreeTaggerv3.2 for Windows and have configured it per the install.txt. I am trying to use it in R with koRpus package. I have set the kRp.env as -
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en",
preset="en", treetagger="manual", format="file",
TT.tknz=TRUE, encoding="UTF-8" )
.My data to be tagged is in a file and trying to use it as treetag("myfile.txt") but it is throwing the error-
Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, :
'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\windows\system32\cmd.exe /c C:\TreeTagger\bin\tag-english.bat
C:\Users\vivsingh\Desktop\NLP\tree_tag_ex.txt' had status 255
The standalone TreeTagger is working on by windows.Any idea on how it works?
I had the exact same error and warning while trying lemmatization on R word vector following Bernhard Learns blog using windows 7 and R 3.4.1 (x64). The issue was also appearing using textstem package but TreeTagger was running properly in cmd window.
I mixed several answers I found on this post and here is my steps and code running properly:
get into R win_library (~\Documents\R\win-library\3.4\rJava\jri\x64\jri.dll) and copy jri.dll (thanks kravi!) to replace it the parent folder.
close and restart R
library(koRpus)
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
lemma_tagged <- treetag(lemma_unique$word_clean, treetagger="manual", format="obj", TT.tknz=FALSE , lang="en", TT.options=list(path="c:/TreeTagger", preset="en"))
lemma_tagged_tbl <- tbl_df(lemma_tagged#TT.res)
Hope it helps.
I am posting this answer to keep a record. I also faced the same issue due to incorrect specification of the location of jri.dll on 64-Bit processor and windows 8.1. If we call
set.kRp.env(TT.cmd="manual", lang="en", TT.options=list(path="/path/to/tree-tagger-windows-x.x/TreeTagger", preset="en")) and we follow either of following two steps, we can resolve this error:
While installing R, if we install only 64 Bit version of R, and
specify the proper path for these variables
LD_LIBRARY_PATH = /path/to/rJava/jri
JAVA_HOME = /path/to/jdk1.x.x
java.library.path = /path/to/rJava/jri/jri.dll
CLASSPATH = /path/to/rJava/jri
If we already installed both versions viz. 32 bit and 64 bit of R on your computer then just copy jri.dll from /path/to/rJava/jri/x64/jri.dll and replace at path/to/rJava/jri/jri.dll. Further, we need to set the path of above mentioned four variables.
I've got this issue (very similar I guess) and posted query to GitHub.
https://github.com/unDocUMeantIt/koRpus/issues/7
The current working solution for me for this case was easier than I could expect, just downgrading the koRpus package. This can change with time but this version should remain appropriate.
library("devtools")
install_github("unDocUMeantIt/koRpus", ref="0.06-5")
This package is not Java related they said.
You can face the same error while setting up the korpus environment and getting the result from treetagger. For example, when you use:
tagged.text <- treetag(
"C:/temp/sample_text.txt",
treetagger = "manual",
lang = "en",
TT.options = list(
path = "c:/Treetagger",
preset = "en"
),
doc_id = "sample"
)
You would receive a similar error
Error: Awww, this should not happen: TreeTagger didn't return any useful data.
This can happen if the local TreeTagger setup is incomplete or different from what presets expected.
You should re-run your command with the option 'debug=TRUE'. That will print all relevant configuration.
Look for a line starting with 'sys.tt.call:' and try to execute the full command following it in a command line terminal. Do not close this R session in the meantime, as 'debug=TRUE' will keep temporary files that might be needed.
If running the command after 'sys.tt.call:' does fail, you'll need to fix the TreeTagger setup.
If it does not fail but produce a table with proper results, please contact the author!
Here you need to change the value of treetagger, from
treetagger = "manual"
to
treetagger = "kRp.env"
However, before that remember to set the kRp.env as #Xochitl C. suggested in their answer
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
Once you do this, you'll get the desired result.

Error in R. Error in gsub("(?<=\n)(?=.|\n)", continue, x, perl = TRUE) :

I am encountering an error in R that I cannot seem to figure out. I am creating an R markdown document where I read in an a csv table using this code.
iati <- read.csv(file="/filepath/IATI_NGOS.csv",head=TRUE,sep=",")
and then using ggplot2 I create a plot using the following code.
figure_one <- ggplot(iati, aes(iati$reporting.org))+
geom_bar(fill="blue")+
ylab("Total Activities")+
xlab("NGO Reporting Organizations in IATI")+
ggtitle("Total Number of Activities compared to each NGO Reporting Organization in IATI")+
coord_flip()
When I try to call figure_one in the R markdown I get the following error:
Quitting from lines 44-55 (NGO_IATI.Rmd)
Error in gsub("(?<=\n)(?=.|\n)", continue, x, perl = TRUE) :
input string 1 is invalid UTF-8
Calls: <Anonymous> ... paste -> comment_out -> line_prompt -> paste -> gsub
In addition: Warning message:
In grep("\n", message) : input string 1 is invalid in this locale
Execution halted
When I run this code in a regular R script I have absolutely no issues. I have search for some answers but can't figure it out.
Thanks!
I ended solving my issue by just doing a fresh install of R and Rstudio on my local machine. I think the recent update to Yosemite on my local environment created a lot of issues with the TeX plugin I had installed for R markdown.
I get the same question when I knit my rmarkdown document and find **encoding is the cause.
When you use functions like read.csv, fread or read_csv, you will read the column name.
If column names are in other languages, like Chinese, the problem will easily happen.
Or you rmarkdown works on Windows, but the encoding bug happens on Mac, a different environment.
The temporal solution is to rename the column name in English and resave the data files.
Here is the pseudocode in R to show my idea.
library(data.table)
library(tidyverse)
fread('yourfile.csv',encoding = 'UTF-8') %>%
purrr::set_names(c('x1','x2','x3')) %>%
write_excel_csv('yourfile_2.csv')
Here the new file yourfile_2.csv is fine to rmarkdown knit without encoding problems happening.

Output from 'choice' in R's kml

I'm having trouble getting 'choice' to create output. When the graphical interface launches, I am selecting a partition with the space bar. This creates a black circle around the partition, indicating it has been selected. When I click 'return', nothing happens.
I checked my working directory to look for the output files, but they are not there. I used getwd() to ensure that I have the correct setwd(). No dice.
There was a similar question posted: Exporting result from kml package in R; however, the answer does not work for me.
Any suggestions? I am using R 3.1.0 GUI Mavericks build(6734) and XQuartz 2.7.6. Thanks for any help getting this working.
Here is my code:
setwd("/Users/eightfrench")
mydata <- read.csv("hcris_long3.csv")
cldHCRIS <- clusterLongData(traj=mydata)
kml(cldHCRIS,nbClusters=2:4,nbRedrawing=2,toPlot="both")
X11(type="Xlib")
choice(cldHCRIS, typeGraph= "bmp")
I had the same problem when using RStudio and i solved it just opening the x11 device with no specification. Like this:
setwd("/Users/eightfrench")
mydata <- read.csv("hcris_long3.csv")
cldHCRIS <- clusterLongData(traj=mydata)
kml(cldHCRIS,nbClusters=2:4,nbRedrawing=2,toPlot="both")
X11()
choice(cldHCRIS, typeGraph= "bmp")
Hope this also works for you.
The new version of kml (2.3) fix that, it provide a key to export data with linux as well (not "return" since getGraphicsEvent does not accept "return" using linux, so I have to map another key : 'm')
Christophe

Resources