R script with Cyrillic symbols in Task Scheduler: encoding issue

I have an R script that contains Cyrillic symbols (as filtering terms). It is saved with UTF-8 encoding, and the locale is set with Sys.setlocale('LC_ALL', "Ukrainian"). It works perfectly when I run it manually. I want to run this script through the Windows Task Scheduler. Generally it works (and produces a resulting dataset with undistorted symbols), but it does not filter by the terms written in Cyrillic.
I was wondering how this issue can be resolved?
The script looks like this:
library(tidyRSS)
library(googlesheets4)
library(dplyr)
library(plyr)
library(adapr)
library(devtools)
library(xlsx)
#devtools::install_github("cran/adapr")
Sys.setlocale('LC_ALL', "Ukrainian")
my_feed1 <- tidyfeed("https://www.vugledar-rada.gov.ua/index.php?format=feed&type=rss")
my_feed2 <- tidyfeed("https://ugledar.info/feed")
to_filter <- rbind.fill(my_feed1, my_feed2)
term <- which(grepl("город", to_filter$item_title) |
                grepl("город", to_filter$item_description) |
                grepl("місто", to_filter$item_title) |
                grepl("місто", to_filter$item_description))
filtered <- to_filter[term, ]
d <- Sys.Date()
t <- Sys.time()
print("saving to the disk...")
setwd("C:\\Users\\user\\Desktop\\Hanna K\\Newsfeed")
write.xlsx(filtered, paste0("check_news__", d, ".xlsx"))

It is difficult to guess the exact source of your problem, but I strongly suspect that it has something to do with the encoding.
Try checking the encoding both in the console and in the Task Scheduler run, by printing
options("encoding")
If the encodings differ, you can set the encoding at the top of your script with:
options(encoding = "myencoding")
Confirm that the encoding did indeed change by running the first command again.
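For instance, a minimal sketch (assuming the script was saved as UTF-8, as the question states):
# Compare this value between a console run and a Task Scheduler run
print(options("encoding"))
# If the values differ, force the encoding the script was saved with, then re-check
options(encoding = "UTF-8")
print(options("encoding"))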

Related

Read .sql into R with Spanish characters (á, é, í, ó, ú, ñ, etc)

So, I've been struggling with this for a while now and can't seem to google my way out of it. I'm trying to read a .sql file into R; I always do that to avoid putting 100+ lines of SQL in my R scripts. I usually do this:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- read_file("path/to/script.sql")
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
However, this time my SQL script has some Spanish characters in it. Say something like this:
select tree_id, tree
from forest.trees
where species = 'árbol'
When I read this script into R and run the query, it just doesn't return anything, but if I copy and paste the SQL script into an R string it works! So it seems that the problem is in the line where I read the script into R.
I tried changing the string's encoding in a couple of ways:
# none of these work
query <- read_file("path/to/script.sql")
Encoding(query) <- "latin1"
query <- readLines("path/to/script.sql", encoding = "latin1")
query <- paste0(query, collapse = " ")
Unfortunately I don't have a public database to offer to anyone reading this. I'm connecting to a PostgreSQL 11 database.
--- UPDATE ----
I'm on a Windows 10 machine with a US locale.
When I use the read_file function the contents of query look OK, and the Spanish characters print out like they should, but when I pass it to dbGetQuery it just doesn't fetch anything.
I tried forcing the "latin1" encoding because I found online that this tends to fix Spanish characters in R. When doing this, the Spanish characters print out wrong, so I didn't expect it to work, and it didn't.
The character values in my database are UTF-8 encoded.
Just to be completely clear: none of my attempts to read the .sql script have worked; however, this does work:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
# df actually has results
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
The encoding argument to readLines() only declares how the input strings are marked; it is not used to re-encode the input. To have R convert the file's contents as it reads them, set the encoding on the connection instead:
filetext <- readLines(file("path/to/script.sql", encoding = "latin1"))
See this answer for more details: R: can't read unicode text files even when specifying the encoding
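Putting that together with the original pipeline (a sketch, assuming the file really is saved in latin1):
# Open a connection that converts from latin1 as it reads, then collapse to one string
query <- paste(readLines(file("path/to/script.sql", encoding = "latin1")),
               collapse = "\n")
df <- as_tibble(dbGetQuery(con, query))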
So after some time to think about it, I wondered why the solution proposed by MrFlick didn't work, and I checked the encoding of the file created by this chunk:
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
write_lines(query, "test.sql")
After checking what encoding test.sql had, it turned out to be ANSI, but it didn't look right. So I manually changed my original script.sql encoding to ANSI. After that it worked totally fine.
This solution, however, didn't work when I cloned my repo in an Ubuntu environment. On Ubuntu there was no problem with the original UTF-8 encoding.
Hope this helps anyone dealing with this on Windows.

How to knit directly to R object?

I'd like to store a knit()ted document directly in R as an R object, as a character vector.
I know I can do this by knit()ing to a tempfile() and then import the result, like so:
library(knitr)
library(readr)
ex_file <- tempfile(fileext = ".tex")
knitr::knit(text = "foo", output = ex_file)
knitted_obj <- readr::read_file(ex_file)
knitted_obj
returns
# [1] "foo\n"
as intended.
Is there a way to do this without using a tempfile() and by directly "piping" the result to a vector?
Why on earth would I want this, you ask?
The *.tex string will be programmatically saved to disk and rendered to PDF later. Reading the rendered *.tex from disk in downstream functions would make the code more complicated.
Caching is just a whole lot easier, as is moving this cache to a different machine.
I am just really scared of side effects in general, and of file system shenanigans across machines/OSes in particular. I want to isolate those to as few functions (print(), save(), plot()) as possible.
Does that make me a bad (or just OCD) R developer?
It should be as straightforward as a single line like this:
knitted_obj = knitr::knit(text = "foo")
You may want to read the help page ?knitr::knit again to know what it returns.
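Per ?knitr::knit, when the text argument is supplied and no output file is given, the compiled output is returned as a character vector, so the one-liner really is all you need:
library(knitr)
# With text input and no output file, knit() returns the result directly
knitted_obj <- knit(text = "foo")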
You can use con <- textConnection("varname", "w") to create a connection that writes its output to variable varname, and use output=con in the call to knit(). For example:
library(knitr)
con <- textConnection("knitted_obj", "w")
knit(text="foo", output = con)
close(con)
knitted_obj
returns the same as your tempfile approach, except for the newline. Multiple lines will show up as different elements of knitted_obj. I haven't timed it, but text connections have a reputation for being slow, so this isn't necessarily as fast as writing to the file system.

Is there a way to paste csv data into R instead of reading from file?

I am currently trying to publish a Shiny app to shinyapps.io, but I am having trouble configuring my rdrop2 token to upload my data file with it. The data is in CSV form and I am using app.R. Since I cannot upload the token here, for fear of having my Dropbox completely available online, I will do my best to describe the setup.
The function I am using with rdrop2 is the following:
token <- drop_auth()
saveRDS(token, "droptoken.rds")
token <- readRDS("droptoken.rds")
drop_acc(dtoken = token)
statadata <- drop_read_csv("/shinyapp/alldata.csv")
g <- na.omit(statadata)
data <- reactive({
g[1:input$scatterD3_nb,]
})
ui <- fluidPage(...
When I run the Shiny app in RStudio it is fully functional, but when I deploy the app it gives me one of two errors.
ERROR: oauth_listener() needs an interactive environment.
or
Error in func(fname, ...) : app.R did not return a shiny.appobj object.
Neither error occurs when I am just viewing it in the RStudio viewer.
While I fix this issue, is there a way to simply create the dataset by copying the CSV file's contents, as seen in a text editor, directly into R with something like
read.csv("country,nutsid,year,cyril_index_left,delta_cyril_left,manifesto,cyril_index_abs
,cyril_index,cyril_index_right,delta_cyril_right,Employment_15_64_,Employment_total,youth_employment,L_Employment_total,
L_youth_employment,growth, Austria,AT11,2002,-1017.925,-216.9429,-17.64,72.93657,1017.925,
0,-977.0339,1.1,0.9,0.5,-2.1,-8.9,4.7,Austria,AT11,2006,-923.9658,93.95892,
-4.308,104.4628,923.9658,0,0,0.8,0.4,-1.9,2.5,2.8,1.6", sep = ",")
I really do not see any other solution, because Shiny won't read my data from local files anyway.
You can use the text= argument to read.table (and therefore read.csv):
x <- read.csv(text="country,nutsid,year,cyril_index_left,delta_cyril_left,manifesto,cyril_index_abs,cyril_index,cyril_index_right,delta_cyril_right,Employment_15_64_,Employment_total,youth_employment,L_Employment_total,L_youth_employment,growth
Austria,AT11,2002,-1017.925,-216.9429,-17.64,72.93657,1017.925,0,-977.0339,1.1,0.9,0.5,-2.1,-8.9,4.7
Austria,AT11,2006,-923.9658,93.95892,-4.308,104.4628,923.9658,0,0,0.8,0.4,-1.9,2.5,2.8,1.6")
Sure. Use something like this.
Lines <- "
header1, header2
val1, 12
val2, 23
"
con <- textConnection(Lines)
data <- read.csv(con)
close(con)
You can simplify and have the multiline expression around read.csv(textConnection("...here...")) as well.
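For example, with the same data inline:
data <- read.csv(textConnection("header1, header2
val1, 12
val2, 23"))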
You can also paste from the clipboard, but that tends to get OS specific and less portable.
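As a sketch (hedged, since the details vary by OS): Windows has a built-in "clipboard" connection, while on macOS one would typically pipe from pbpaste:
x <- read.table("clipboard", header = TRUE, sep = ",")  # Windows
x <- read.csv(pipe("pbpaste"))                          # macOS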

read.table function and stdin

I have a tab-delimited text file that I am trying to load into R with the read.table function. The first few lines of the script look like this
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly=TRUE)
data <- read.table(args[1], header=TRUE, sep="\t", quote="")
# process the data
This works. I had originally tried to get R to read the data from standard input, but was unsuccessful. My first approach...
#!/usr/bin/env Rscript
data <- read.table(stdin(), header=TRUE, sep="\t", quote="")
# process the data
...didn't seem to work at all. My second approach...
#!/usr/bin/env Rscript
data <- read.table("/dev/stdin", header=TRUE, sep="\t", quote="")
# process the data
...read the data file but (for some reason I don't understand) the first 20 or so lines get mangled, which is a big problem (especially since those lines contain the header information). Is there any way to get read.table to read from standard input? Am I missing something completely obvious?
?stdin says:
stdin() refers to the ‘console’ and not to the C-level ‘stdin’
of the process. The distinction matters in GUI consoles (which
may not have an active ‘stdin’, and if they do it may not be
connected to console input), and also in embedded applications.
If you want access to the C-level file stream ‘stdin’, use
file("stdin").
And:
When R is reading a script from a file, the file is the
‘console’: this is traditional usage to allow in-line data …
That’s the probable reason for the observed behaviour. In principle you can read.table from standard input – but in most (almost all?) cases you’ll want to do this via file('stdin').
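A minimal sketch of the corrected script, following the help page's advice:
#!/usr/bin/env Rscript
# file("stdin") opens the C-level standard input stream, unlike stdin()
data <- read.table(file("stdin"), header=TRUE, sep="\t", quote="")
# process the data
The script can then be invoked as, e.g., ./script.R < input.tsv.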

Why do I get garbled characters?

Why do I get garbled characters when parsing a web page?
I have used encoding="big-5\\IGNORE" to get normal characters, but it doesn't work.
require(XML)
url="http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
options(encoding="big-5")
data=htmlParse(url,isURL=TRUE,encoding="big-5\\IGNORE")
tdata=xpathApply(data,"//table[@class='table_grey_border']")
stock <- readHTMLTable(tdata[[1]], header=TRUE, stringsAsFactors=FALSE)
How should I revise my code to turn the garbled characters into normal ones?
@MartinMorgan (below) suggested using
htmlParse(url,isURL=TRUE,encoding="big-5")
Here is an example of what is going on:
require(XML)
url="http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
options(encoding="big-5")
data=htmlParse(url,isURL=TRUE,encoding="big-5")
tdata=xpathApply(data,"//table[@class='table_grey_border']")
stock <- readHTMLTable(tdata[[1]], header=TRUE, stringsAsFactors=FALSE)
stock
The total number of records should be 1335. In the case above it is 309 - many records appear to have been lost.
This is a complicated problem. There are a number of issues:
A badly-formed HTML file
The page is not standard, well-formed HTML. Let me prove my point; please run:
url="http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
txt=download.file(url,destfile="stockbig-5",quiet = TRUE)
Then try opening the downloaded file stockbig-5 with Firefox.
A limitation of R's iconv function
If an HTML file is well formed, you can use
data=readLines(file)
datachange=iconv(data,from="source encoding",to="target encoding//IGNORE")
When an HTML file is not well formed, that approach fails. In this example, please run
data=readLines("stockbig-5")
A warning will occur:
1: In readLines("stockbig-5") :
invalid input found on input connection 'stockbig-5'
You can't use R's iconv function to change the encoding of a badly formed HTML file. You can, however, do this in a shell.
I solved it myself after one hard night.
System: Debian 6 (UTF-8 locale) + R 2.15 (UTF-8 locale) + GNOME Terminal (UTF-8 locale).
Here is the code:
require(XML)
url="http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
txt=download.file(url,destfile="stockbig-5",quiet = TRUE)
system('iconv -f big-5 -t UTF-8//IGNORE stockbig-5 > stockutf-8')
data=htmlParse("stockutf-8",isURL=FALSE,encoding="utf-8\\IGNORE")
tdata=xpathApply(data,"//table[@class='table_grey_border']")
stock <- readHTMLTable(tdata[[1]], header=TRUE, stringsAsFactors=FALSE)
stock
I would like my code to be more elegant; the shell command embedded in the R code is arguably ugly:
system('iconv -f big5 -t UTF-8//IGNORE stockgb2312 > stockutf-8')
I have tried to replace it with pure R code, but failed. How can it be replaced with pure R code?
You can duplicate the result on your computer with the code above.
Half done, half success; I'll continue to try.
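As a possible pure-R replacement for the system() call (a hedged sketch, not tested against this page; it assumes the downloaded file contains no embedded NUL bytes): read the raw bytes, convert with R's own iconv, and substitute non-convertible bytes instead of stopping:
# Pure-R stand-in for: iconv -f big-5 -t UTF-8//IGNORE stockbig-5 > stockutf-8
raw_bytes <- readBin("stockbig-5", what = "raw", n = file.size("stockbig-5"))
txt <- iconv(rawToChar(raw_bytes), from = "big5", to = "UTF-8", sub = "byte")
writeLines(txt, "stockutf-8", useBytes = TRUE)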
