How to process SVG with namespace in R?

I have code:
transformXML <- function(xml, fn) {
x <- xml2::read_xml(xml)
fn(x)
tmp <- tempfile(fileext = ".xml")
xml2::write_xml(x, tmp, options = "format")
paste(unlist(readLines(tmp)), collapse='\n')
}
fname <- "inst/shiny/test.svg"
svg <- readChar(fname, file.info(fname)$size)
## move text that is at the top of the plot to the left
svg <- transformXML(svg, function(xml) {
text.nodes <- xml2::xml_find_all(xml, ".//text")
for (text in text.nodes) {
if (as.double(xml2::xml_attr(text, 'y')) < 60) {
xml2::xml_set_attr(text, 'x', '0')
}
}
})
message(svg)
My SVG starts like this:
<?xml version='1.0' encoding='UTF-8' ?>
<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' viewBox='0 0 1319.34 500.00'>
With this file the code can't find anything in the SVG. xml_find_all always returns 0 nodes, even if I use the XPath //svg.
If I remove xmlns='http://www.w3.org/2000/svg' it works fine. How can I keep the default namespace in the XML and still process the file with R?

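I suppose the problem is that the SVG elements are in a namespace (the default one declared with xmlns), so an XPath with unprefixed names such as .//text does not match them. This article shows how to select such elements: https://www.inflectra.com/support/knowledgebase/kb503.aspx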
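Here is a minimal sketch of the namespace-aware lookup with xml2, the package used in the question. It assumes xml2's behaviour of registering the default namespace under the prefix d1, which xml_ns() reports. The two-element SVG string is made up for illustration and is not the asker's file.

```r
library(xml2)

# Made-up SVG with a default namespace, standing in for the real file
svg_text <- paste0(
  "<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'>",
  "<text x='10' y='20'>title</text>",
  "<text x='10' y='80'>label</text>",
  "</svg>"
)
doc <- read_xml(svg_text)

# xml_ns() lists the namespaces; the default one appears as prefix d1
ns <- xml_ns(doc)

# Unprefixed names only match elements in no namespace, so this is empty
no_prefix <- xml_find_all(doc, ".//text")

# Prefixing the element name with d1 selects the SVG text elements
text_nodes <- xml_find_all(doc, ".//d1:text", ns)

# Move the text near the top of the plot to the left, as in the question
y <- as.numeric(xml_attr(text_nodes, "y"))
xml_set_attr(text_nodes[!is.na(y) & y < 60], "x", "0")
x_values <- xml_attr(text_nodes, "x")
```

If the namespace is not needed afterwards, calling xml_ns_strip(doc) before the query is another option. The original .//text XPath then matches again, at the cost of dropping the xmlns declaration from the document.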

Related

readLines function not recognizing separating character "\t"

My input file contains many lines of tab-delimited information in a text file. Below is one line from the text file:
100026 TGACTGCATGACGTACAC NM_006342.1 TACC3
My code is as follows:
constant_source <- 'constants.R'
source(constant_source)
source(classes_file)
processFile = function(filepath) {
con = file(filepath, "r")
while ( TRUE ) {
line = readLines(con, sep="\t")
print(line)
if (length(line) == 0 ) {
break
}
}
close(con)
}
The output, however, is as follows:
100026\tTGACTGCATGACGTACAC\tNM_006342.1\tTACC3
Why is the readLines function not respecting the separation parameter? I have been toying with this for a while and am stuck. Sorry about this; I just started learning R today. If it makes a difference, I am using RStudio.
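A possible explanation, offered as a sketch rather than a definitive answer: readLines() has no sep argument and always returns whole lines. print() shows a tab inside a string as the escape \t, so the printed line is intact. Assuming the goal is one value per tab-separated column, strsplit() or read.delim() can do the splitting. The sample line is the one from the question, and the textConnection stands in for the real file.

```r
# The sample line from the question, with real tab characters
sample_line <- "100026\tTGACTGCATGACGTACAC\tNM_006342.1\tTACC3"

# readLines() returns whole lines; it does not split on tabs
con <- textConnection(sample_line)
lines <- readLines(con)
close(con)

# Split one line into its tab-separated fields
fields <- strsplit(lines, "\t", fixed = TRUE)[[1]]

# Or parse the tab-separated text into a data frame in one step
parsed <- read.delim(text = sample_line, header = FALSE,
                     stringsAsFactors = FALSE)
```

With a real file, read.delim(filepath, header = FALSE) would replace the text = argument.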

How to save all images in a separate folder?

So, I am running the following code:
dirtyFolder = "Myfolder/test"
filenames = list.files(dirtyFolder, pattern="*.png")
for (f in filenames)
{
print(f)
imgX = readPNG(file.path(dirtyFolder, f))
x = data.table(img2vec(imgX), kmeansThreshold(imgX))
setnames(x, c("raw", "thresholded"))
yHat = predict(gbm.mod, newdata=x, n.trees = best.iter)
img = matrix(yHat, nrow(imgX), ncol(imgX))
img.dt=data.table(melt(img))
names.dt<-names(img.dt)
setnames(img.dt,names.dt[1],"X1")
setnames(img.dt,names.dt[2],"X2")
Numfile = gsub(".png", "", f, fixed=TRUE)
img.dt[,id:=paste(Numfile,X1,X2,sep="_")]
write.table(img.dt[,c("id","value"),with=FALSE], file = "submission.csv", sep = ",", col.names = (f == filenames[1]),row.names = FALSE,quote = FALSE,append=(f != filenames[1]))
# show a sample
if (f == "4.png")
{
writePNG(imgX, "train_101.png")
writePNG(img, "train_cleaned_101.png")
}
}
Basically, it takes noisy images as input and removes the noise from them. This is only the latter part of the code, which applies the algorithm prepared from a training dataset (not shown here).
Now I can't figure out how to save the cleaned image for every image in the test folder, not just for 4.png. If the input image is named x.png, the output image should be named x_cleaned.png (for example 4_cleaned.png for 4.png) and be saved in a separate folder in the same directory. How can I do it?
Tldr; I just want to save the variable named img for each of the filename as number_cleaned.png where number corresponds to the original file name. These new files should be saved in a separate folder.
Alright, so construct the output filename using file.path and a function such as paste or sprintf:
folder_name = 'test'
output_filename_pattern = file.path(folder_name, '%s_cleaned.png')
remove_extension = function (filename)
  gsub('\\.[^.]*$', '', filename)  # drop the last dot and everything after it
for (f in filenames) {
  # … your code here …
  new_filename = sprintf(output_filename_pattern, remove_extension(f))
  # … save file here …
}
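To also create the separate folder, the same idea can be sketched with base R helpers. The helper cleaned_path and the folder name "cleaned" are hypothetical, and tempdir() is used only so that the sketch runs anywhere. The writePNG call from the question is left as a comment because it needs the png package and the img matrix from the loop.

```r
# Hypothetical helper: "4.png" -> "<out_dir>/4_cleaned.png"
cleaned_path <- function(f, out_dir) {
  stem <- tools::file_path_sans_ext(basename(f))
  file.path(out_dir, paste0(stem, "_cleaned.png"))
}

# Assumed output folder; replace tempdir() with the real parent directory
out_dir <- file.path(tempdir(), "cleaned")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Example input names standing in for list.files(dirtyFolder, ...)
filenames <- c("4.png", "101.png")
paths <- vapply(filenames, cleaned_path, character(1),
                out_dir = out_dir, USE.NAMES = FALSE)

# Inside the question's loop, the save step would then be:
# writePNG(img, cleaned_path(f, out_dir))
```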

How to read large (~20 GB) xml file in R?

I want to read data from a large XML file (20 GB) and manipulate it. I tried to use xmlParse(), but it ran out of memory before the file was loaded. Is there an efficient way to do this?
My data dump looks like this,
<tags>
<row Id="106929" TagName="moto-360" Count="1"/>
<row Id="106930" TagName="n1ql" Count="1"/>
<row Id="106931" TagName="fable" Count="1" ExcerptPostId="25824355" WikiPostId="25824354"/>
<row Id="106932" TagName="deeplearning4j" Count="1"/>
<row Id="106933" TagName="pystache" Count="1"/>
<row Id="106934" TagName="jitter" Count="1"/>
<row Id="106935" TagName="klein-mvc" Count="1"/>
</tags>
In the XML package, the xmlEventParse function implements SAX parsing: it reads the XML as a stream and calls your handler functions. If your XML is simple enough (repeating elements inside one root element), you can use the branches parameter to define a function for each element of interest.
Example:
MedlineCitation = function(x, ...) {
#This is a "branch" function
#x is a XML node - everything inside element <MedlineCitation>
# find element <ArticleTitle> inside and print it:
ns <- getNodeSet(x,path = "//ArticleTitle")
value <- xmlValue(ns[[1]])
print(value)
}
Call XML parsing:
xmlEventParse(
file = "http://www.nlm.nih.gov/databases/dtd/medsamp2015.xml",
handlers = NULL,
branches = list(MedlineCitation = MedlineCitation)
)
Solution with closure:
As in Martin Morgan's answer to "Storing specific XML node values with R's xmlEventParse":
branchFunction <- function() {
store <- new.env()
func <- function(x, ...) {
ns <- getNodeSet(x, path = "//ArticleTitle")
value <- xmlValue(ns[[1]])
print(value)
# if storing something ...
# store[[some_key]] <- some_value
}
getStore <- function() { as.list(store) }
list(MedlineCitation = func, getStore=getStore)
}
myfunctions <- branchFunction()
xmlEventParse(
file = "medsamp2015.xml",
handlers = NULL,
branches = myfunctions
)
#to see what is inside
myfunctions$getStore()
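Applied to the <row .../> data from the question, the same branch approach might look like the sketch below. It is an illustration under assumptions: rowCollector is a hypothetical helper, the three attributes kept are an arbitrary choice, and a small temporary file stands in for the 20 GB dump.

```r
library(XML)

# Small stand-in for the real dump, using rows from the question
sample_xml <- c(
  '<tags>',
  '<row Id="106929" TagName="moto-360" Count="1"/>',
  '<row Id="106931" TagName="fable" Count="1" ExcerptPostId="25824355" WikiPostId="25824354"/>',
  '</tags>'
)
tmp <- tempfile(fileext = ".xml")
writeLines(sample_xml, tmp)

# Hypothetical closure: keeps the chosen attributes of every <row>
rowCollector <- function() {
  rows <- list()
  row <- function(x, ...) {
    # x is the <row> node; xmlAttrs() returns its attributes by name
    rows[[length(rows) + 1L]] <<- xmlAttrs(x)[c("Id", "TagName", "Count")]
  }
  result <- function() do.call(rbind, rows)
  list(row = row, result = result)
}

collector <- rowCollector()
invisible(xmlEventParse(
  file = tmp,
  handlers = NULL,
  branches = list(row = collector$row)
))
tags <- as.data.frame(collector$result(), stringsAsFactors = FALSE)
```

For a file of the real size, it may be preferable to aggregate inside the branch function or write rows out in chunks rather than keep every row in memory.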

How to get user-input in batch mode in R?

I have the following code:
first.moves <- function()
{
go.first <- readline("Do you want to go first? (Y/N) ")
if (go.first == "Y" || go.first == "y")
{
game <- altern.moves()
}
else
{
game <- move(game,1,1)
}
return(game)
}
altern.moves <- function()
{
plyr.mv <- as.numeric(readline("Please make a move (1-9) "))
game <- move(game,plyr.mv,0)
cmp.mv <- valid.moves(game)[1]
game <- move(game,cmp.mv,1)
return(game)
}
#game
game <- matrix(rep(NA,9),nrow=3)
print("Let's play a game of tic-tac-toe. You have 0's, I have 1's.")
(game <- first.moves())
repeat
{
game <- altern.moves()
print(game)
}
When I run the part after #game in batch mode neither does R stop to wait for "Do you want to go first? (Y/N)" nor does it repeat the repeat block. Everything works fine on its own and when I click through it line-by-line.
What am I doing wrong and how can I remedy the situation to have a decent program flow but with user interaction? (or do I really have to click through this part of the code line-by-line? I hope not...)
Add this to the beginning of your code:
if (!interactive()) {
batch_moves <- list('Y', 5, 2) # Add more moves or import from a file
readline <- (function() {
counter <- 0
function(...) { counter <<- counter + 1; batch_moves[[counter]] }
})()
}
Now you get
> readline()
[1] "Y"
> readline()
[1] 5
> readline()
[1] 2
EDIT: Optionally, to clean up (if you are running more scripts), add rm(readline) to the end of your script.
EDIT2: For those who don't like <<-, replace counter <<- counter + 1 with assign('counter', counter + 1, envir = parent.env(environment())).
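The override above replays scripted answers. In non-interactive use readline() does not wait and returns an empty string. If the script should wait for real keyboard input when started with Rscript, one alternative is to read lines from a connection opened on stdin. The sketch below uses a hypothetical helper named ask, and feeds it from a textConnection so that it runs anywhere.

```r
# Hypothetical helper: print a prompt, then read one line from a connection
ask <- function(prompt, con) {
  cat(prompt)
  readLines(con, n = 1)
}

# Under Rscript the connection would be the process's standard input:
# input <- file("stdin", "r")
# Here scripted lines stand in for the keyboard.
demo_input <- textConnection(c("y", "5"))

go_first <- ask("Do you want to go first? (Y/N) ", demo_input)
player_move <- as.numeric(ask("Please make a move (1-9) ", demo_input))

close(demo_input)
```

Opening the connection once and reusing it keeps successive reads in order. The helper would replace the readline() calls in first.moves and altern.moves.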

How to increase buffer size in R-function

I would like to implement a function in R that increases the size of a buffer in a for loop.
The overall plan is to write a package that uses a test shapefile and a reference shapefile. It should create a buffer around the reference shapefile and increase its size as long as necessary to intersect the whole test shapefile.
So far I have written some code snippets that read the shapefiles and create the first buffer:
require("rgeos")
require("rgdal")
l1=readOGR(dsn="C:/Maps", layer="osm_ms1")
l2=readOGR(dsn="C:/Maps", layer="osm_ms2")
proj4string(l2) = CRS("+init=epsg:31467") ## DHDN / 3-degree Gauss-Kruger zone 3
l2buffer <- gBuffer(l2, width=0.001, capStyle="ROUND")
plot(l2buffer, col="black")
lines(l2, col="red")
lines(l1, col="blue")
Up to this point, everything works fine.
After that, I wanted to move this into a for loop that creates a buffer at every step:
i = 0.001
buffergrow = function(shape) {
for (k in 1:10) {
linebuffer[k] <- gBuffer(l2, width=i, capStyle="ROUND")
plot(linebuffer[k])
i = i+0.001
}
}
> buffergrow(l2)
Error in linebuffer[k] <- gBuffer(shape, width = i, capStyle = "ROUND") :
Object 'linebuffer' not found
As you can see, an error occurs when I call the function 'buffergrow' with 'l2' as the argument (shape). Does anybody have an idea why this happens? I have already tried some other ideas, but I need some help.
Optionally / additionally: do you have any hints for my overall plan?
Best regards,
Stefan
You have to initialize an object before accessing its subelements.
E.g.:
foo <- double(10)
for (i in 1:10) {
foo[i] <- i;
}
# or
linebuffer <- list()
for (i in 1:10) {
linebuffer[[i]] <- i;
}
But you don't need an object linebuffer in your use case.
Try the following instead:
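buffergrow = function(shape) {
  i <- 0.001  # starting buffer width, as in the question
  for (k in 1:10) {
    # buffer the shape passed in rather than the global l2
    plot(gBuffer(shape, width=i, capStyle="ROUND"))
    i = i+0.001
  }
}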
EDIT:
If you need to store the gBuffer results:
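buffergrow = function(shape) {
  i <- 0.001  # starting buffer width, as in the question
  linebuffer <- vector("list", 10)  # preallocate one slot per buffer
  for (k in 1:10) {
    # buffer the shape passed in rather than the global l2
    linebuffer[[k]] <- gBuffer(shape, width=i, capStyle="ROUND")
    plot(linebuffer[[k]])
    i = i+0.001
  }
  return(linebuffer)
}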
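For the overall plan (grow the buffer until it covers the whole test shapefile), a loop with a stopping test may fit better than a fixed ten iterations. The sketch below is generic and its names are hypothetical: make_buffer builds a buffer of a given width and covers tests whether that buffer is large enough. With rgeos these could be, for example, function(w) gBuffer(l2, width = w, capStyle = "ROUND") and function(b) gContains(b, l1), which is an assumption about the intended test. The demo uses plain numbers as stand-ins so it runs without spatial packages.

```r
# Hypothetical helper: widen the buffer step by step until covers() is TRUE
grow_until <- function(make_buffer, covers,
                       start = 0.001, step = 0.001, max_width = 1) {
  width <- start
  while (width <= max_width) {
    buffer <- make_buffer(width)
    if (covers(buffer)) {
      return(list(width = width, buffer = buffer))
    }
    width <- width + step
  }
  stop("no covering buffer found up to max_width")
}

# Toy stand-ins: the "buffer" is just its width, and it covers the
# target once it reaches 0.0045
result <- grow_until(
  make_buffer = function(w) w,
  covers = function(b) b >= 0.0045
)
```

The max_width limit guards against an endless loop when the two layers never overlap, for instance because their coordinate systems differ.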
