xml-tei in R: selecting attributes in nodes

I have an xml-tei file:
#in R
doc <- xmlTreeParse("FILE_NAME" , useInternalNodes=TRUE, encoding="UTF-8")
ns = c(ns = "http://www.tei-c.org/ns/1.0")
namespaces = ns
getNodeSet(doc, "//* and //@*", ns)
doc
I am looking at two elements in my XML-TEI file, <l> and <w>, and at these attributes: (1) for <l>, @xml:id; (2) for <w>, type="verb" and ana="#confrontation #action #ANT":
#example of element <l> and its child <w> in XML-TEI FILE
<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
<w>[...]</w>
<w type="verb" ana="#MḪṢ01 #confrontation #action #ANT" xml:id="ktu1-3_ii_l5b-6a_tmtḫṣ" lemmaRef="uga/verb.xml#mḫṣ">tmtḫṣ</w>
<g>.</g>
</l>
I use the function getNodeSet
#in R
l_cont <- getNodeSet(doc, "//ns:l[(@xml:id)]", ns)
l_cont
Of course, it shows all elements and attributes inside <l>. But I would like to select only the relevant attributes and their values, to get something like this:
#in R
xml:id="ktu1.3_ii_5b-6a"
type="verb" ana="#confrontation #action #ANT"
Following the suggestion in another post, "Load XML to Dataframe in R with parent node attributes", I did:
#in R
attrTest <- function(x) {
  attrTest01 <- xmlGetAttr(x, "xml:id")
  w <- xpathApply(x, 'w', function(w) {
    ana <- xmlGetAttr(w, "ana")
    if(is.null(w))
      data.frame(attrTest01, ana)
  })
  do.call(rbind, w)
}
res <- xpathApply(doc, "//ns:l[(@xml:id)]", ns ,attrTest)
temp.df <- do.call(rbind, res)
But it doesn't work. I get these errors (translated from the French R messages):
> res <- xpathApply(doc, "//ns:l[(@xml:id)]", ns ,attrTest)
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'http://www.tei-c.org/ns/1.0' of mode 'function' was not found
> temp.df <- do.call(rbind, res)
Error in do.call(rbind, res) : object 'res' not found
Do you have suggestions?
In advance, thank you

I would suggest using the R package tei2r (https://rdrr.io/github/michaelgavin/tei2r/). It has helped me when working with TEI-encoded files.
From this package I would use the importTexts function to import the document and the parseTEI function to extract the exact nodes you are looking for.
Another way to import and extract could be this:
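library(tei2r)   # parseTEI
library(purrr)   # map_dfr, %>%
library(tibble)  # tibble
# Parse every .xml file in a folder and bind the results row-wise
read_tei <- function(folder) {
  list.files(folder, pattern = '\\.xml$', full.names = TRUE) %>%
    map_dfr(~ .x %>% parseTEI(., node = "INSERT_NODE_TO_FIND") %>% tibble())
}
text <- read_tei("/Path/to/file")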
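A note on the error message in the question itself: in the XML package, xpathApply() takes the function as its third argument, and the namespaces argument comes after `...`, so it has to be passed by name. Passing ns positionally makes R look for a function named after the namespace URI, which matches the reported error. Two other things would probably bite next: the inner path 'w' has no namespace prefix, and the `if (is.null(w))` guard is never true, so no row is ever built. Below is a minimal sketch of one way to write it, assuming a tiny inline stand-in for the TEI file (ASCII only, simplified from the <l> example) and a hypothetical helper name attr_rows:

```r
library(XML)

# Tiny stand-in for the TEI file (assumption: simplified, ASCII-only content)
tei <- paste0(
  '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>',
  '<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">',
  '<w>[...]</w>',
  '<w type="verb" ana="#confrontation #action #ANT">tmt</w>',
  '<g>.</g>',
  '</l></body></text></TEI>'
)
doc <- xmlParse(tei, asText = TRUE)
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

# Hypothetical helper: one row per <w type="verb"> child of an <l> node,
# carrying the parent's xml:id along
attr_rows <- function(l_node) {
  l_id <- unname(xmlGetAttr(l_node, "xml:id", default = NA_character_))
  rows <- xpathApply(l_node, "ns:w[@type='verb']", function(w) {
    data.frame(l_id = l_id,
               type = unname(xmlGetAttr(w, "type", default = NA_character_)),
               ana  = unname(xmlGetAttr(w, "ana", default = NA_character_)),
               stringsAsFactors = FALSE)
  }, namespaces = ns)  # namespaces passed by name, after the function
  do.call(rbind, rows)
}

# The function is the third argument; namespaces is named
res <- xpathApply(doc, "//ns:l[@xml:id]", attr_rows, namespaces = ns)
temp.df <- do.call(rbind, res)
print(temp.df)
```

With a real file, only the parsing line would change (for example xmlParse("FILE_NAME")); <l> elements without verb children simply contribute no rows.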

Related

Problem with accessing elements from parLapply() output

I have a problem with accessing elements from the output of parLapply(). When I use the non-parallel lapply() function I can access the elements with the following code.
out_list <- lapply(list, function)
out_list[[2]][1:5, 1:5] # out_list[[2]] is a matrix in my specific case
But when I try the same with the output of the parLapply() function, I get an error.
The code:
out_list <- parLapply(cl = cluster, list, function)
out_list[[2]][1:5, 1:5]
The error message:
in extract_matrix(x, i, j, ...) :
out_list instance has been unmapped.
Here is the full code:
#!/usr/bin/Rscript
path_to_files = '***********'
file.names <- list.files(path = path_to_files, pattern = "*.bed", full.names = TRUE, recursive = FALSE) # making a list of the desired files
# sequentially -------------------------------------------------------------------------------------------------------------------------------------
library(BGData)
print("Executing lapply...")
example_BEDMatrix_list <- lapply(file.names, BEDMatrix)
print("lapply() done.")
example_BEDMatrix_list[[4]][1:5, 1:5]
#------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# parallel ------------------------------------------------------------------------------------------------------------------------------------------------------
library(BGData)
library(parallel)
print("Creating cluster...")
copies_of_r <- detectCores() - 5
cluster <- makeCluster(copies_of_r)
clusterExport(cl=cluster, c('file.names'))
print("Cluster created")
print("Executing parLapply()...")
BEDMatrix_list <- parLapply(cluster, file.names[2:4], BEDMatrix)
BEDMatrix_list[[2]][1:5, 1:5]
print("parLapply() executed")
print("stopping cluster...")
stopCluster(cluster)
print("Cluster stopped")
How can I fix this?

Internal Generic s3 function name starting with a "."

In my package, all internal functions start with a ".", for example .internalfunction() versus externalfunction(). This is used for quick namespace exporting.
Now I am trying to write an internal S3 method. There seem to be problems with it working when the function name starts with a dot.
Here are some examples I came up with to test it:
test <- function(x,...) UseMethod("test", x)
test.class <- function(x, ...) {
  print("works like a charm")
}
.dottest <- function(x,...) UseMethod(".dottest", x)
.dottest.class <- function(x, ...) {
  print("works like a charm even with a dot")
}
When I test it, I get this:
item <- 5
class(item) <- "class"
class(item)
#> [1] "class"
BrailleR:::test(item)
#> Error in UseMethod("test", x): no applicable method for 'test' applied to an object of class "class"
BrailleR:::.dottest(item)
#> Error in UseMethod(".dottest", x): no applicable method for '.dottest' applied to an object of class "class"
This happens when I load the functions locally, when I use the load_all method after placing the test code in the package, and even after installing it, as this particular version shows.
Edit: As pointed out in the comments, some of these tests were invalid anyway because the functions were not put in the NAMESPACE.
It feels like I am missing something about S3 generics.
Below is some context: the actual code.
.RewriteSVG = function(x, file, type) {
  UseMethod(".ReWriteSVG", type)
}
.RewriteSVG.GeomLine <- function(x, file, type) {
  # Adding extra 1 as this gets us into the inner line.
  lineID <- paste(.GetGeomLine(type), "1", sep = ".")
  svgDoc <- XML::xmlParseDoc(file)
  nodes <- XML::getNodeSet(svgDoc,
                           paste0('//*[@id="', lineID , '"]'))
  # Split the line into smaller polylines
  line <- nodes[[1]]
  lineAttr <- XML::xmlAttrs(line)
  lineAttr <- lineAttr[!(names(lineAttr) %in% c("id", "points"))]
  lineAttr <- split(lineAttr, names(lineAttr))
  ## Get the line points
  attr <- XML::xmlGetAttr(line, 'points')
  coordinates <- strsplit(attr, " ")[[1]]
  ## As there will always be 100 points in a graph we can just easily split them into 5 groups
  nBreaks <- 6
  breaks <- seq(1, 100, length.out = nBreaks) |> round()
  start <- breaks[1:(nBreaks-1)]
  end <- breaks[2:nBreaks]
  1:(nBreaks-1) |>
    lapply(function(i) {
      segmentCoords <- coordinates[start[i]:end[i]]
      args <- lineAttr
      args$id <- paste(lineID, i, sep = ".")
      args$points <- paste(segmentCoords, collapse = " ")
      print(args)
      newPolyline <- XML::newXMLNode('polyline', parent=line, attrs = args)
      XML::addChildren(line, newPolyline)
    })
  # Remove old line
  XML::removeNodes(line)
  # Save modified svg doc
  XML::saveXML(svgDoc, file=file)
}
The error message looks like this:
Error in UseMethod(".ReWriteSVG", type) :
no applicable method for '.ReWriteSVG' applied to an object of class "c('GeomLine', 'GeomPath', 'Geom', 'ggproto', 'gg')"
It comes from this call:
lapply(x$layers, function(x, graphObject, file) {
  .RewriteSVG(graphObject, file, x$geom)
}, graphObject = x, file = file)
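One thing that stands out in the actual code: the generic is defined as .RewriteSVG and the method as .RewriteSVG.GeomLine, but the string passed to UseMethod() is ".ReWriteSVG" with a capital W, which is also the name shown in the error message. UseMethod() builds the method name from that string, so a case mismatch alone is enough to produce "no applicable method". The leading dot by itself does not prevent dispatch. Here is a small base-R sketch of both situations; all function names in it are hypothetical and only for illustration:

```r
# Dot-prefixed S3 generic dispatching on its second argument: works
.describe <- function(x, type) UseMethod(".describe", type)
.describe.GeomLine <- function(x, type) "GeomLine method"
.describe.default <- function(x, type) "default method"

# Same shape, but the string given to UseMethod() differs in case
# from the name used for the method definition
.Describe2 <- function(x, type) UseMethod(".DEscribe2", type)
.Describe2.GeomLine <- function(x, type) "never reached"

# Stand-in object with a class vector similar to the one in the error
geom <- structure(list(), class = c("GeomLine", "GeomPath", "Geom"))

ok <- .describe(NULL, geom)
fallback <- .describe(NULL, 1)
err <- tryCatch(.Describe2(NULL, geom),
                error = function(e) conditionMessage(e))

print(ok)        # dispatches to .describe.GeomLine
print(fallback)  # no GeomLine class, so the default method runs
print(err)       # "no applicable method" error for '.DEscribe2'
```

If the names in the package are made consistent and the error persists, the NAMESPACE registration mentioned in the edit above would be the next thing to check.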

Can not inspect S4 object after modification

I am having problems with my S4 object res after I appended a list of values to it. The object was created with the DESeq2 package via:
dds <- DESeqDataSetFromMatrix(countData = count.matrix,
colData = coldata,
design = ~ Condition)
dds <- DESeq(dds, test = "Wald")
res <- results(dds)
I did the following:
x <- qvalue(res@listData[["pvalue"]]) # calc qvalues based on pvalues from S4 object 'res'
res@listData[["qval"]] <- x[["qvalues"]] # append qvalues from x to 'res' as new col named "qval"
Now when I try to inspect the object with head() I get the following error:
> head(res)
Error in `rownames<-`(`*tmp*`, value = names(x)) :
invalid rownames length
The funny thing is that with View() I can inspect the S4 object in RStudio, and I can see that adding the qvalues went fine. Does anyone know why this happens? Is there a way to avoid it?
To get the qvalues, you can do this first:
library(qvalue)
library(DESeq2)
dds = makeExampleDESeqDataSet()
dds = DESeq(dds)
res = results(dds)
res$qvalue = qvalue(res$pvalue)$qvalues
I will follow up on why there is an error; you need to look into how the object is constructed.

r 3.4.1 source exprs

I have the following function as an example:
myFunc <- function(x){
  while(x < 100){
    x <- x+10
    cat( x )
    cat("\n")
  }
}
In the new R version 3.4.1 on Windows, I want to source this function from the file myFunc.R as shown below:
filepath <- "D:/"
l <- list.files(filepath, pattern = "my", full.names = TRUE)
source(l)
But I am getting the following error:
> source(l)
Error in source(l) : could not find symbol "exprs" in environment of the generic function
I hope someone can help. Thanks a lot.

Error in<>: Object of type 'closure' is not subsettable still do not know how to fix it

I get an error message from my code and cannot figure it out.
I googled some similar questions but am still confused about the solution.
I would really appreciate it if you could check my code and help me solve this issue.
Thanks a lot.
My code is:
rm(list = ls())
library(XLConnect)
setwd('C:/Users/YL1/Desktop/Air Qulaity/Power Plant/
NOx_SO2_Emission') # replace it with your own directory
file <- 'C:/Users/YL1/Desktop/Air Qulaity/
Power Plant/NOx_SO2_Emission/
Total Emission_2003-2015.xlsx'
wb <- loadWorkbook(file)
dt <- lapply(2003:2015, function(x) readWorksheet(wb, sheet = as.character(x)))
dt <- do.call(rbind, dt)
colnames(dt) <- c('State', 'Facility.Name', 'Facility.ID.ORISPL', 'Year',
'SO2.tons', 'NOx.tons')
dt.select.fun <- function(station) {
dt.select <- dt[dt$Facility.Name == station, ]
dt.select <- dt.select[order(-dt.select$Year), ]
write.csv(dt.select, paste0(station, '.csv'))
return(dt.select)
}
# change station to other values to extract the emission in other stations
dt.select.fun(station = 'Coffeen')
Error in dt$Facility.Name : object of type 'closure' is not subsettable
That is probably because your dt <- do.call(rbind, dt) results in a matrix. Moreover, you are indexing a global object inside a function. Replace that line with
dt <- do.call(rbind, dt) %>% as.data.table # to make a data.table (needs the data.table and magrittr packages)
When you call your function, where are you passing the data table?
update:
dt.select.fun <- function(data, y) {
  # keep only the rows for station y, newest year first
  dt.select <- data[data$Facility.Name == y, ]
  dt.select <- dt.select[order(-dt.select$Year), ]
  write.csv(dt.select, paste0(y, '.csv'))
  return(dt.select)
}
# call the function on your data table
dt.select.fun(dt, "Coffeen")
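It may also help to read the message literally: a "closure" is a function, so at the moment of the call the name dt referred to a function and not to your data. Base R already has a function called dt (the t-distribution density in the stats package), so if the lines that build dt failed earlier in the script, the name falls through to that function. The sketch below reproduces the message and shows the version with the data passed in explicitly; the data frame and the helper name select_station are made up for illustration:

```r
# Subsetting the stats::dt function reproduces the error message
msg <- tryCatch(stats::dt$Facility.Name,
                error = function(e) conditionMessage(e))
print(msg)

# Made-up stand-in for the emission data (assumption: illustrative values only)
emissions <- data.frame(
  Facility.Name = c("Coffeen", "Coffeen", "Other"),
  Year = c(2003, 2015, 2010),
  SO2.tons = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the data is an argument, so the function never
# depends on whatever `dt` happens to mean in the global environment
select_station <- function(data, station) {
  out <- data[data$Facility.Name == station, ]
  out[order(-out$Year), ]
}

coffeen <- select_station(emissions, "Coffeen")
print(coffeen)
```

Running is.function(dt) right after the do.call(rbind, dt) line is a quick way to check whether the data was actually created.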
