r 3.4.1 source exprs - r

I have the following function as an example:
myFunc <- function(x){
  while(x < 100){
    x <- x+10
    cat( x )
    cat("\n")
  }
}
In the new R version 3.4.1 on Windows, I want to source this function from the file myFunc.R as shown below:
filepath <- "D:/"
l <- list.files(filepath, pattern = "my", full.names = TRUE)
source(l)
But I am getting the following error:
> source(l)
Error in source(l) : could not find symbol "exprs" in environment of the generic function
I hope someone can help. Thanks a lot.
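One thing worth ruling out, sketched below: list.files() returns every match for the pattern, while source() reads a single file per call. This is only a minimal sketch and may well not be the cause of the "exprs" error. The temporary folder stands in for "D:/", and the stricter pattern is an assumption about which files are wanted.

```r
# Write a small script to a temporary folder (a stand-in for "D:/").
script_dir <- file.path(tempdir(), "source_demo")   # hypothetical folder
dir.create(script_dir, showWarnings = FALSE)
writeLines(
  c("myFunc <- function(x) {",
    "  while (x < 100) {",
    "    x <- x + 10",
    "    cat(x, \"\\n\")",
    "  }",
    "  invisible(x)",
    "}"),
  file.path(script_dir, "myFunc.R")
)

# Match only .R files whose name starts with "my"; the bare pattern "my"
# would also match a file such as "summary.txt".
files <- list.files(script_dir, pattern = "^my.*\\.R$", full.names = TRUE)

# source() reads one file per call, so loop over whatever was found.
for (f in files) source(f)

myFunc(75)
```

If exactly one file is expected, checking length(files) == 1 before calling source() makes a wrong match fail early.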

Related

pkstwo error in R

I am running the following code in R:
pkstwo <- function(x, tol = 1e-06) {
  if (is.numeric(x))
    x <- as.double(x)
  else stop("argument 'x' must be numeric")
  p <- rep(0, length(x))
  p[is.na(x)] <- NA
  IND <- which(!is.na(x) & (x > 0))
  if (length(IND))
    p[IND] <- .C(stats:::C_pkstwo, length(x[IND]), p = x[IND],
                 as.double(tol), PACKAGE = "stats")$p
  p
}
But when I call pkstwo(0.1) I get the following error:
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'C_pkstwo' not found
Could anyone please help me fix this?
The C routine is now called C_pKS2 and is used by the private function pkstwo() inside ks.test().
Run ks.test with no parentheses to see its R code.
ks.test
Run stats:::C_pKS2 for some more info.
stats:::C_pKS2
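Because these internal names differ between R versions, it may help to check what the installed version actually provides before using stats:::. The following is a minimal sketch; the regular expression is only a guess at how such symbols are named, and has_symbol() is a helper made up for this example.

```r
stats_ns <- asNamespace("stats")

# Names of internal objects in 'stats' that look like Kolmogorov-Smirnov
# C entry points. The regular expression is only a guess at the naming.
ks_symbols <- grep("^C_p[Kk]", ls(stats_ns, all.names = TRUE), value = TRUE)
print(ks_symbols)

# Hypothetical helper: test for one specific symbol before using stats:::
has_symbol <- function(name) exists(name, envir = stats_ns, inherits = FALSE)
has_symbol("C_pkstwo")
has_symbol("C_pKS2")
```

Whatever the check reports, code that depends on private symbols like these can break again with the next R release.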

xml-tei in R: selecting attributes in nodes

I have an xml-tei file:
#in R
doc <- xmlTreeParse("FILE_NAME" , useInternalNodes=TRUE, encoding="UTF-8")
ns = c(ns = "http://www.tei-c.org/ns/1.0")
namespaces = ns
getNodeSet(doc,"//* and //@*", ns)
doc
I am looking at two elements inside my xml-tei, <l> and <w>, and at their attributes: (1) for <l>, @xml:id, and (2) for <w>, type="verb" and ana="#confrontation #action #ANT":
#example of element <l> and its child <w> in XML-TEI FILE
<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
<w>[...]</w>
<w type="verb" ana="#MḪṢ01 #confrontation #action #ANT" xml:id="ktu1-3_ii_l5b-6a_tmtḫṣ" lemmaRef="uga/verb.xml#mḫṣ">tmtḫṣ</w>
<g>.</g>
</l>
I use the function getNodeSet
#in R
l_cont <- getNodeSet(doc, "//ns:l[(@xml:id)]", ns)
l_cont
Of course it shows all elements and attributes inside <l>. But I would like to select only the relevant attributes and their values, to get something like this:
#in R
xml:id="ktu1.3_ii_5b-6a"
type="verb" ana="#confrontation #action #ANT"
Following the suggestion of another post Load XML to Dataframe in R with parent node attributes, I did:
#in R
attrTest <- function(x) {
  attrTest01 <- xmlGetAttr(x, "xml:id")
  w <- xpathApply(x, 'w', function(w) {
    ana <- xmlGetAttr(w, "ana")
    if(is.null(w))
      data.frame(attrTest01, ana)
  })
  do.call(rbind, w)
}
res <- xpathApply(doc, "//ns:l[(@xml:id)]", ns ,attrTest)
temp.df <- do.call(rbind, res)
But it doesn't work... I get the errors:
> res <- xpathApply(doc, "//ns:l[(@xml:id)]", ns ,attrTest)
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'http://www.tei-c.org/ns/1.0' of mode 'function' was not found
> temp.df <- do.call(rbind, res)
Error in do.call(rbind, res) : object 'res' not found
Do you have suggestions?
In advance, thank you
I would suggest the R package tei2r (https://rdrr.io/github/michaelgavin/tei2r/). It has helped me when working with TEI-encoded files.
From this package I would use the importTexts function to import the document and the parseTEI function to get the exact nodes you are looking for.
Another way to import and extract could be this:
library(tei2r)   # parseTEI()
library(purrr)   # map_dfr() and the %>% pipe
library(tibble)  # tibble()
read_tei <- function(folder) {
  list.files(folder, pattern = '\\.xml$', full.names = TRUE) %>%
    map_dfr(~.x %>% parseTEI(.,node = "INSERT_NODE_TO_FIND") %>%tibble())
}
text <- read_tei("/Path/to/file")
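As for the error in the question itself: in the XML package the third positional argument of xpathApply() is the function to apply, so passing ns in that position makes R look for a function named after the namespace URI. A minimal sketch of one way around this follows, assuming the XML package is installed. The inline document is a cut-down stand-in for the real file, and attrs_for_line() is a name made up for this example.

```r
library(XML)

# A tiny stand-in for the TEI file, with the non-ASCII content left out.
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
    <w>first</w>
    <w type="verb" ana="#confrontation #action #ANT">second</w>
    <g>.</g>
  </l>
</body></text></TEI>'

doc <- xmlParse(tei, asText = TRUE)
ns  <- c(ns = "http://www.tei-c.org/ns/1.0")

# For one <l> node, return one row per child <w> that has an "ana" attribute.
attrs_for_line <- function(l_node) {
  line_id <- xmlGetAttr(l_node, "xml:id")
  # The child elements live in the TEI namespace too, hence "ns:w".
  w_nodes <- getNodeSet(l_node, "ns:w[@ana]", namespaces = ns)
  rows <- lapply(w_nodes, function(w) {
    data.frame(
      line_id = line_id,
      type    = xmlGetAttr(w, "type", default = NA_character_),
      ana     = xmlGetAttr(w, "ana"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# The function comes third; the namespaces are passed by name.
res     <- xpathApply(doc, "//ns:l[@xml:id]", attrs_for_line, namespaces = ns)
temp.df <- do.call(rbind, res)
temp.df
```

Filtering with [@ana] in the XPath replaces the is.null() test inside the callback, so <w> elements without that attribute never reach data.frame().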

eval inside gsubfn inside sub function: object not found

Given the two functions
subfun <- function(txt)
gsubfn::gsubfn("§([^§]+)§", ~eval(parse(text=x)), txt)
topfun <- function(id = 1L)
subfun("Hello §id§ world!")
The following (1.) should yield "Hello 1 world!" but throws an error instead:
topfun()
# Error in eval(expr, envir, enclos) : object 'id' not found
These two (2.) & (3.) work as expected:
id <- 2L
topfun()
# [1] "Hello 2 world!"
topfun2 <- function(id = 1L)
gsubfn::gsubfn("§([^§]+)§", ~eval(parse(text=x)), "Hello §id§ world!")
topfun2()
# [1] "Hello 1 world!"
How can I make (1.) work?
I tried several environment() and parent.frame() variations with the envir parameter of eval and gsubfn, including passing topfun's environment to subfun via the ellipsis argument, all without success. (Not that I know much about what is going on under the hood, but I would have expected R to go up one parent environment after another to look for id...)
I'm using R version 3.3.0 and gsubfn package version 0.6.6.
Thanks in advance!
I am no expert at this, but the problem seems to be the use of a formula as the replacement in gsubfn. At least I am unable to pass an environment to eval if it is inside a formula.
subfun_2 <- function(txt){
ev <- parent.frame() # the environment in which subfun_2 was called
gsubfn::gsubfn("§([^§]+)§", ~eval(parse(text=x), envir = ev), txt)
}
topfun_2 <- function(id = 1L) subfun_2("Hello §id§ world!")
topfun_2()
# Error in eval(parse(text = x), envir = ev) :
# argument "ev" is missing, with no default
If you use a function instead it works as expected:
subfun_3 <- function(txt){
ev <- parent.frame()
gsubfn::gsubfn("§([^§]+)§", function(x)eval(parse(text=x), envir = ev), txt)
}
topfun_3 <- function(id = 1L) subfun_3("Hello §id§ world!")
topfun_3()
# [1] "Hello 1 world!"
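A likely explanation for the formula failure: gsubfn turns a formula without a left-hand side into a function whose arguments are the free variables on the right-hand side. In that case ev becomes a second argument that is never supplied. The same idea of evaluating in the caller's frame can also be sketched in base R alone. fill_template() below is a hypothetical helper, not part of gsubfn.

```r
# Replace every §expr§ in txt with the value of expr, evaluated in envir.
fill_template <- function(txt, envir = parent.frame()) {
  force(envir)  # resolve the caller's frame now, before any nested calls
  m <- gregexpr("§[^§]+§", txt)
  regmatches(txt, m) <- lapply(regmatches(txt, m), function(tokens) {
    vapply(tokens, function(tok) {
      expr <- gsub("§", "", tok, fixed = TRUE)       # strip the delimiters
      as.character(eval(parse(text = expr), envir = envir))
    }, character(1), USE.NAMES = FALSE)
  })
  txt
}

topfun_4 <- function(id = 1L) fill_template("Hello §id§ world!")
topfun_4()
```

Taking the environment as an argument with parent.frame() as its default lets a caller further up the stack pass its own environment explicitly.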

Rhadoop - wordcount using rmr

I am trying to run a simple rmr job using the RHadoop packages, but it is not working. Here is my R script:
print("Initializing variable.....")
Sys.setenv(HADOOP_HOME="/usr/hdp/2.2.4.2-2/hadoop")
Sys.setenv(HADOOP_CMD="/usr/hdp/2.2.4.2-2/hadoop/bin/hadoop")
print("Invoking functions.......")
# Reference taken from Revolution Analytics
wordcount = function( input, output = NULL, pattern = " ")
{
mapreduce(
input = input ,
output = output,
input.format = "text",
map = wc.map,
reduce = wc.reduce,
combine = T)
}
wc.map =
function(., lines) {
keyval(
unlist(
strsplit(
x = lines,
split = pattern)),
1)}
wc.reduce =
function(word, counts ) {
keyval(word, sum(counts))}
#Function Invoke
wordcount('/user/hduser/rmr/wcinput.txt')
I am running the above script as
Rscript wordcount.r
I am getting the error below.
[1] "Initializing variable....."
[1] "Invoking functions......."
Error in wordcount("/user/hduser/rmr/wcinput.txt") :
could not find function "mapreduce"
Execution halted
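Kindly let me know what the issue is.
First, you'll have to set the HADOOP_STREAMING environment variable in your code. The "could not find function" error also means that the package providing mapreduce() was never loaded, so call library(rmr2) before using it.
Try the code below, and note that it assumes you have copied your text file to the HDFS folder example/wordcount/data.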
R Code:
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
# load libraries
library(rmr2)
library(rhdfs)
# initiate rhdfs package
hdfs.init()
map <- function(k,lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return( keyval(words, 1) )
}
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
}
## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')
## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit job
out <- wordcount(hdfs.data, hdfs.out)
## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
Output:
word count
AS 16
As 5
B. 1
BE 13
BY 23
By 7
For your reference, here is another example of running R word count map reduce program.
Hope this helps.
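To see what the map and reduce steps compute without a Hadoop cluster, the same logic can be sketched locally in base R. This is only an illustration on made-up sample lines; it does not use rmr2 and says nothing about how the job behaves on HDFS.

```r
# Made-up sample input standing in for the lines of the text file.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map step: split each line on whitespace and emit one (word, 1) pair per word.
words <- unlist(strsplit(lines, "\\s+"))
words <- words[nzchar(words)]          # drop empty strings from leading spaces
pairs <- data.frame(word = words, count = 1L, stringsAsFactors = FALSE)

# Reduce step: sum the counts that share the same key.
totals <- aggregate(count ~ word, data = pairs, FUN = sum)
totals
```

Checking the result on a small sample like this can help separate logic errors in map and reduce from cluster configuration problems.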

Use of environment variables in R

I am trying to understand the reducer.R code taken from the following website.
http://www.thecloudavenue.com/2013/10/mapreduce-programming-in-r-using-hadoop.html
This code is used for Hadoop Streaming with R.
I have given the code below:
#! /usr/bin/env Rscript
# reducer.R - Wordcount program in R
# script for Reducer (R-Hadoop integration)
trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
splitLine <- function(line) {
val <- unlist(strsplit(line, "\t"))
list(word = val[1], count = as.integer(val[2]))
}
env <- new.env(hash = TRUE)
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
line <- trimWhiteSpace(line)
split <- splitLine(line)
word <- split$word
count <- split$count
if (exists(word, envir = env, inherits = FALSE)) {
oldcount <- get(word, envir = env)
assign(word, oldcount + count, envir = env)
}
else assign(word, count, envir = env)
}
close(con)
for (w in ls(env, all = TRUE))
cat(w, "\t", get(w, envir = env), "\n", sep = "")
Could someone explain the significance of the use of the following new.env command and the subsequent use of the env in the code:
env <- new.env(hash = TRUE)
Why is this required? What happens if this is not included in the code?
Update 06/05/2014
I tried writing another version of this code without having a new environment defined and have given the code as follows:
#! /usr/bin/env Rscript
current_word <- ""
current_count <- 0
word <- ""
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0)
{
line1 <- gsub("(^ +)|( +$)", "", line)
word <- unlist(strsplit(line1, "[[:space:]]+"))[[1]]
count <- as.numeric(unlist(strsplit(line1, "[[:space:]]+"))[[2]])
if (current_word == word) {
current_count = current_count + count
} else
{
if(current_word != "")
{
cat(current_word,'\t', current_count,'\n')
}
current_count = count
current_word = word
}
}
if (current_word == word)
{
cat(current_word,'\t', current_count,'\n')
}
close(con)
This code gives the same output as the one with a new environment defined.
Question: Does using a new environment provide any advantages from a Hadoop standpoint? Is there a reason for using it in this specific case?
Thank you.
Your question is about environments in R. Here is an example of creating a new environment in R:
> my.env <- new.env()
> my.env
<environment: 0x114a9d940>
> ls(my.env)
character(0)
> assign("a", 999, envir=my.env)
> my.env$foo = "This is the variable foo."
> ls(my.env)
[1] "a" "foo"
I think this article can help you: http://www.r-bloggers.com/environments-in-r/
or run
?environment
for more help.
In the code that you gave, the author creates a new environment:
env <- new.env(hash = TRUE)
and names that environment whenever a value is assigned:
assign(word, oldcount + count, envir = env)
As for the question "What happens if this is not included in the code?", I think you can find the answer in the link that I already provided.
The advantages of using a new environment in R are already covered in this link.
The reason given there is that in this case you are working with a large dataset. When you pass the dataset to a function that modifies it, R works on a copy, and the returned data then overwrites the old dataset. If you pass an environment instead, R operates on that environment directly, without copying the large dataset.
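Both points can be sketched in a few lines: the reducer uses the environment as a hash table keyed by word, and an environment passed to a function is modified in place. count_words(), bump_env() and bump_vec() are names made up for this illustration.

```r
# Accumulate counts in an environment used as a hash table keyed by word.
count_words <- function(words) {
  env <- new.env(hash = TRUE)
  for (w in words) {
    old <- if (exists(w, envir = env, inherits = FALSE)) get(w, envir = env) else 0L
    assign(w, old + 1L, envir = env)
  }
  keys <- sort(ls(env, all.names = TRUE))
  vapply(keys, function(k) get(k, envir = env), integer(1))
}

# The input does not need to arrive grouped by word.
count_words(c("b", "a", "b", "c", "a", "b"))

# Environments are modified in place; ordinary vectors are not.
bump_env <- function(e) e$n <- e$n + 1
bump_vec <- function(v) { v["n"] <- v["n"] + 1; invisible(v) }

e <- new.env(); e$n <- 0
v <- c(n = 0)
bump_env(e)   # the caller's e$n changes
bump_vec(v)   # the caller's v does not change
c(env = e$n, vec = unname(v["n"]))
```

The version without an environment in the question only compares each word with the previous one, so it relies on the reducer input arriving sorted by key, whereas the hash-table version does not depend on input order.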
