Exporting Arabic Text from R

I'm trying to export a data frame with Arabic text in R.
When R imports Arabic text, it shows the characters as escaped Unicode code points, like this:
<U+0627><U+0644><U+0641><U+0631><U+0639> <U+0627><U+0644><U+062A><U+0634><U+0631><U+064A><U+0639><U+064A><U+060C> <U+0627><U+0644><U+0641><U+0631><U+0639> <U+0627><U+0644><U+062A><U+0646><U+0641><U+064A><U+0630><U+064A><U+060C><U+0627><U+0644><U+0641><U+0631><U+0639> <U+0627><U+0644><U+0642><U+0636><U+0627><U+0626><U+064A>. <U+0627><U+0644><U+062D><U+0643><U+0648><U+0645><U+0629> <U+0627><U+0644><U+0641><U+062F><U+0631><U+0627><U+0644><U+064A>
Unfortunately, I can't get it to turn back into readable Arabic when exporting. Below is the code I'm using:
write.csv(my.data, "data.csv", fileEncoding = "UTF-8")
Anybody have a solution?
Also, here is my session info.
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2
[6] labeling_0.2 MASS_7.3-27 munsell_0.4.2 plyr_1.8 proto_0.3-10
[11] RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.1

This code worked for me, so I am sharing it:
Sys.setlocale("LC_CTYPE", "arabic")
write.csv(group$message, file = 'posts.txt', fileEncoding = "UTF-8")
If you save the file as .csv it will not work; you have to save it as .txt.

You'll have to install and use the appropriate locale. It's fiddly and sometimes doesn't work.
There are some solutions and code offered here: Writing data isn't preserving encoding
Keep in mind that you actually HAVE to install language packs for your operating system, and for some Windows versions they aren't available separately at all.
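A hedged sketch combining the two suggestions above (the locale name is Windows-specific and assumes Arabic language support is installed; my.data is the data frame from the question):
# Switch the character-type locale so Windows can handle the Arabic text,
# then write the file as UTF-8, saving to .txt as suggested above.
Sys.setlocale("LC_CTYPE", "arabic")
write.csv(my.data, file = "data.txt", fileEncoding = "UTF-8", row.names = FALSE)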

Related

Error: 'glmnet_softmax' is not an exported object from 'namespace:glmnet'

I am trying to use this tutorial https://github.com/wvictor14/planet#infer-ethnicity and get the following error message, even when using the test data provided:
pl_infer_ethnicity(pl_betas)
[1] "1860 of 1860 predictors present."
Loading required package: Matrix
Error: 'glmnet_softmax' is not an exported object from 'namespace:glmnet'
I've tried re-installing individual packages and running in a new version of R, and I get the same error. I believe this is related to other errors posted about a recent update to glmnet. Any tips on how to resolve this?
sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-18 planet_0.1.0
loaded via a namespace (and not attached):
[1] lattice_0.20-38 codetools_0.2-16 glmnet_3.0-2 foreach_1.4.7
[5] crayon_1.3.4 grid_3.6.2 magrittr_1.5 pillar_1.4.2
[9] rlang_0.4.2 remotes_2.1.0 iterators_1.0.12 tools_3.6.2
[13] compiler_3.6.2 pkgconfig_2.0.3 shape_1.4.4 tibble_2.1.3
I don't know why it would have worked in the first place; the NEWS file for glmnet doesn't say anything one way or the other about glmnet_softmax (e.g., it does not say "glmnet_softmax is no longer exported" or anything like that ...)
In any case, this is a non-exported function from the glmnet package. It is referred to here in the tutorial code.
If you can change that line of code to refer to glmnet:::glmnet_softmax (i.e., three colons rather than two), that should suffice (::: allows you to access a non-exported function).
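For example (softmax_fn is just an illustrative name; the only piece taken from the answer is the glmnet:::glmnet_softmax reference):
# Check that the internal function still exists in the installed glmnet,
# then reach it with ':::' instead of '::'.
exists("glmnet_softmax", envir = asNamespace("glmnet"), inherits = FALSE)  # should be TRUE
softmax_fn <- glmnet:::glmnet_softmax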

Using addPosLimit and osMaxPos throws "error in PosLimit[, "MaxPos"] : incorrect number of dimensions"

Even if I copy-paste the example from blog.fosstrading.com/2011/08/tactical-asset-allocation-using.html, I get this error:
error in PosLimit[, "MaxPos"] : incorrect number of dimensions
Is this a bug or am I missing something?
Output from sessionInfo():
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 quantstrat_0.9.1687 foreach_1.4.2
[4] blotter_0.9.1695 PerformanceAnalytics_1.4.3662 FinancialInstrument_1.2.0
[7] quantmod_0.4-5 TTR_0.23-0 xts_0.9-7
[10] zoo_1.7-12
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 lattice_0.20-33 codetools_0.2-14 assertthat_0.1 grid_3.2.2
[6] R6_2.1.1 DBI_0.3.1 magrittr_1.5 iterators_1.0.7 tools_3.2.2
[11] parallel_3.2.2
You have dplyr loaded. It defines a lag function that masks the generic function, stats::lag. dplyr::lag does not do any method dispatch, so there's a lag method somewhere that isn't being called when it should be.
dplyr also masks the first and last generics defined in xts, which may also cause problems.
A short-term workaround is to call library(dplyr) first, so that first and last from xts will mask their counterparts in dplyr. The long-term solution is for all packages to explicitly import all functions they use, to avoid issues caused by the order in which packages are loaded/attached (note that a user's non-packaged code will still be affected by package load/attach order).
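A hedged sketch of that workaround (package names come from the question's sessionInfo; the commented alternative is just a way to sidestep masking in your own code):
# Attach dplyr first so that xts (attached later via quantstrat) masks
# dplyr's 'first' and 'last' rather than the other way around.
library(dplyr)
library(quantstrat)   # attaches xts/zoo/blotter after dplyr
# For lag(), you can also call the generic explicitly in your own code,
# so dplyr::lag is never consulted: stats::lag(my_xts_object)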

Run R script on Amazon EC2

I am working with large datasets in R and some of the computations are too heavy for my machine in terms of RAM (cannot allocate vector of size n Mb).
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.3-35 lubridate_1.3.3 nlstools_1.0-0 stringr_0.6.2 numbers_0.5-2 plyr_1.8.1
[7] simecol_0.8-4 deSolve_1.11 Quandl_2.4.0 xts_0.9-7 zoo_1.7-11 data.table_1.9.4
[13] RODBC_1.3-10
loaded via a namespace (and not attached):
[1] bitops_1.0-6 chron_2.3-45 digest_0.6.8 grid_3.1.2 lattice_0.20-29 memoise_0.2.1
[7] Rcpp_0.11.4 RCurl_1.95-4.5 reshape2_1.4.1 RJSONIO_1.3-0 tools_3.1.2
I have access to an external Amazon EC2 server with up to 30 GB of RAM, which should be enough. My question is: how can I run an R script on this external server from my local machine? Is there a function for this?
You would have to put the script on the external machine and then run it there, for example:
ssh user_name@123.321.123.123 'Rscript my_script.r'
A much nicer way of doing this would be to use RStudio Server.
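A hedged sketch of that workflow driven from the local R session (the user name, host and paths are placeholders; it assumes ssh/scp with key-based authentication are already set up on both machines):
# Copy the script to the EC2 instance, then execute it there with Rscript.
system2("scp", args = c("my_script.r", "user_name@123.321.123.123:~/my_script.r"))
system2("ssh", args = c("user_name@123.321.123.123", "Rscript ~/my_script.r"))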

Special characters and RODBC

In a database I have stored strings that contain special characters such as "§".
Using the command
sqlQuery()
from the RODBC package, "§" is translated to "?". This is also the case for characters such as "'", as found in French words.
Of course I cannot simply replace every "?" with one of the special characters after the query. Does anybody have an idea for this problem? I work under Windows 7.
As requested, the output of sessionInfo():
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] timeDate_2160.97 gridExtra_0.9.1 reshape2_1.2.2 RODBC_1.3-6 ggplot2_0.9.3.1 plyr_1.8
[7] Runiversal_1.0.2
loaded via a namespace (and not attached):
[1] colorspace_1.2-1 dichromat_2.0-0 digest_0.6.3 gtable_0.1.2 labeling_0.1 MASS_7.3-21
[7] munsell_0.4 proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.14.1
If you are seeing an issue where you use sqlSave() in R to send data to MySQL and not all of the data comes through, it is likely because of special characters. The key is to make sure the character collation is set to the same value on both sides; I found latin1 works best.
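A hedged sketch along those lines (the DSN, table and column are placeholders; DBMSencoding is an argument documented for RODBC's odbcDriverConnect and passed through odbcConnect):
library(RODBC)
# Declare the encoding the database uses so RODBC converts strings
# instead of dropping characters like "§" to "?".
ch  <- odbcConnect("my_dsn", DBMSencoding = "latin1")
res <- sqlQuery(ch, "SELECT some_text FROM my_table")
odbcClose(ch)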

Accessing global environment objects created in cached chunks in R Markdown documents

I want to run an R script that runs a simulation and cache the results for the R Markdown document. I am using RStudio and trying to create an HTML report with Knit HTML. Here's a simple example.
```{r test_global_env,cache=TRUE}
print(getwd())
source("./test_script.R")
```
```{r test_global_env_2}
print(a)
```
and test_script.R is as follows:
a <- 1
When I set the cache option for the chunk to FALSE, print(a) works. If I set it to TRUE, it works the first time, but the second time I get an "object 'a' not found" error.
A similar question is Can knit2pdf use the global environment?, but I could not figure out whether it applies to my situation. Here is the sessionInfo():
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] knitr_1.2 igraph_0.6.5-2 lubridate_1.3.0
loaded via a namespace (and not attached):
[1] digest_0.6.3 evaluate_0.4.3 formatR_0.8
[4] memoise_0.1 plyr_1.8 stringr_0.6.2
[7] tools_3.0.0
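A hedged workaround sketch, assuming that cached chunks only store objects knitr detects being assigned directly in the chunk code, so an object created only inside source() can be missed:
```{r test_global_env, cache=TRUE}
# Hypothetical tweak: create 'a' with an assignment knitr can see, so it is
# saved in the chunk's cache. source(local = TRUE) runs the script in a
# throwaway environment and we pull 'a' out of it explicitly.
a <- local({ source("./test_script.R", local = TRUE); a })
```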
