How to escape or sanitize a slash using regex in R?

I'm trying to read a tab-separated CSV file into R. When I try to access a column whose name contains a /, I get an error.
doSomething <- function(dataset) {
a <- dataset$data_transfer.Jingle/TCP.total_size_kb
...
}
The error says that the object cannot be found. I've tried escaping the slash with a backslash, but that did not work.
If anybody has got some idea, I'd really appreciate it!

Run
head(dataset)
and check the name the column has actually been given. By default, read.table converts characters that are not valid in R names into dots, so it is probably something like:
dataset$data_transfer.Jingle.TCP.total_size_kb

Two ways:
dataset[["data_transfer.Jingle/TCP.total_size_kb"]]
or
dataset$`data_transfer.Jingle/TCP.total_size_kb`
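To illustrate why the name changes and when the two forms above apply, here is a minimal sketch, assuming the file is read with read.delim (or read.table) and its default check.names=TRUE. The two-row table is made up for the example; only the column name comes from the question.

```r
# Hypothetical tab-separated input; only the first column name is from the question
txt <- "data_transfer.Jingle/TCP.total_size_kb\tother\n10\t1\n20\t2"

# Default check.names = TRUE: make.names() turns the "/" into "."
d1 <- read.delim(text = txt)
names(d1)

# check.names = FALSE keeps the slash in the column name
d2 <- read.delim(text = txt, check.names = FALSE)

# Both access forms then work on the unmodified name
d2[["data_transfer.Jingle/TCP.total_size_kb"]]
d2$`data_transfer.Jingle/TCP.total_size_kb`
```

With the default settings the dotted name is the one to use; the backtick and [[ ]] forms are only needed when the slash has been kept.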

Related

Extract the ground truth Language code from the file name

Could you please help me?
I need to extract the ground-truth language code from a file name.
For example, extract 'en' from the file name 'Dictionary_en.txt'.
I have tried many times without success.
Many thanks in advance.

Do you need something like this?
filename <- "Dictionary_en.txt"
gsub("(.*_)|(\\.txt$)", "", filename)
The output will be "en" in this example. The dot is escaped and the extension is anchored with $ so that only a literal ".txt" at the end is removed. You can list your files with list.files and then apply the gsub function to the result.
Best wishes
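As a variation on the same idea, here is a small sketch that captures the code with a group instead of deleting everything around it. The file names are hypothetical; in practice they could come from list.files(). It assumes the code is always the last underscore-separated part before ".txt".

```r
# Hypothetical file names; in practice e.g. list.files(pattern = "\\.txt$")
files <- c("Dictionary_en.txt", "Dictionary_de.txt", "Word_list_fr.txt")

# Keep only what sits between the last "_" and the ".txt" extension
codes <- sub("^.*_([^_.]+)\\.txt$", "\\1", files)
codes
```

sub is vectorised, so no explicit loop over the files is needed.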

Regular Expression to extract function arguments in R

I am having trouble extracting function arguments in R.
x="theme(legend.position='bottom',
legend.margin=(t=0,r=0,b=0,l=0,unit='mm'),
legend.background=element_rect(fill='red',size=rel(1.5)),
panel.background=element_rect(fill='red'),
legend.position='bottom')"
What I want is:
[1]legend.position='bottom'
[2]legend.margin=(t=0,r=0,b=0,l=0,unit='mm')
[3]legend.background=element_rect(fill='red',size=rel(1.5))
[4]panel.background=element_rect(fill='red')
[5]legend.position='bottom'
I tried several regular expressions without success, including the following:
strsplit(x,",(?![^()]*\\))",perl=TRUE)
Please help me!

I think the best answer here might be to not attempt to parse your function call with a regex. As the name implies, regular expressions describe regular languages. Your function call is not regular, because it has nested parentheses. I currently see a maximum nesting depth of two, but who knows whether that could get deeper at some point.
I would recommend writing a simple parser instead. You can use a stack (or just a depth counter) to keep track of parentheses, and only split off a parameter at a comma when every parenthesis other than the outermost theme( has been closed, which implies that you are not in the middle of a parameter.
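A minimal sketch of such a parser, using a depth counter in place of an explicit stack. The function name split_top_level is made up for this example, and the sketch assumes there are no commas or parentheses inside quoted strings.

```r
# Hypothetical helper: split a string on commas that are not inside parentheses
split_top_level <- function(s) {
  chars <- strsplit(s, "")[[1]]
  depth <- 0L        # number of currently open parentheses
  start <- 1L        # start position of the current argument
  parts <- character(0)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (ch == "(") {
      depth <- depth + 1L
    } else if (ch == ")") {
      depth <- depth - 1L
    } else if (ch == "," && depth == 0L) {
      # A comma at depth 0 ends the current argument
      parts <- c(parts, substr(s, start, i - 1L))
      start <- i + 1L
    }
  }
  trimws(c(parts, substr(s, start, nchar(s))))
}

x <- paste0(
  "theme(legend.margin=margin(t=0,r=0,b=0,l=0,unit='mm'),",
  "legend.background=element_rect(fill='red',size=rel(1.5)),",
  "panel.background=element_rect(fill='red'),",
  "legend.position='bottom')"
)

# Strip the outer "theme(" and the final ")" before splitting
open_pos <- as.integer(regexpr("(", x, fixed = TRUE))
inner <- substr(x, open_pos + 1L, nchar(x) - 1L)
args <- split_top_level(inner)
args
```

Because the counter only cares about depth zero, this keeps working if the nesting gets deeper than two levels.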

Sorry, I have to go to work and will continue later, but for now here is my partial solution: theme\(([a-z.]*=['a-z]*)|([a-z._]*=[a-z0-9=,'_.()]*)*\,\)?
It only misses the last part.
Here the regex101 page : https://regex101.com/r/BZpcW0/2
See you later.

Thank you for all your advice. I parsed the string and extracted the arguments as a character vector. Here is my solution.
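library(ggplot2)  # theme(), margin(), element_rect(), rel()
library(stringr)  # str_locate_all(), fixed()
x<-"theme(legend.margin=margin(t=0,r=0,b=0,l=0,unit='mm'),
legend.background=element_rect(fill='red',size=rel(1.5)),
panel.background=element_rect(fill='red'),
legend.position='bottom')"
extractArgs=function(x){
  # Drop whitespace (including newlines) after commas so the offsets below line up
  x<-gsub(",\\s+",",",x)
  # Evaluate the call; anything that fails to parse or evaluate yields "error"
  result<-tryCatch(eval(parse(text=x)),error=function(e) return("error"))
  if("character" %in% class(result)){
    args=character(0)
  } else {
    if(length(names(result))>0){
      # Start position of every argument name, plus a sentinel past the end
      pos=unlist(str_locate_all(x,fixed(names(result))))
      pos=c(sort(pos[seq(1,length(pos),by=2)]),nchar(x)+1)
      args=c()
      for(i in 1:(length(pos)-1)){
        # Stop two characters before the next name: skips the "," (or the final ")")
        args=c(args,substring(x,pos[i],pos[i+1]-2))
      }
    } else{
      args=character(0)
    }
  }
  args
}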
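As an alternative that avoids both evaluation and position arithmetic, here is a sketch that lets R's own parser split the call. This is a different technique from the solution above: parse() only builds an unevaluated call, so ggplot2 does not have to be loaded. Note that deparse() normalises the formatting (spaces after commas, double quotes), so the pieces are not character-for-character copies of the input.

```r
x <- "theme(legend.margin=margin(t=0,r=0,b=0,l=0,unit='mm'),
legend.background=element_rect(fill='red',size=rel(1.5)),
panel.background=element_rect(fill='red'),
legend.position='bottom')"

# parse() returns an expression; its first element is the unevaluated theme(...) call
call_obj <- parse(text = x)[[1]]

# A call converts to a list: function name first, then the (named) arguments
arg_list <- as.list(call_obj)[-1]

# Turn every argument back into "name=value" text
args <- mapply(
  function(nm, val) paste0(nm, "=", paste(deparse(val), collapse = "")),
  names(arg_list), arg_list
)
unname(args)
```

Since the parser understands quoting and nesting, commas inside strings or deeper nesting do not need special handling here.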

parsing xml file manually with r

For some reason, I cannot install the R XML package at work. I have an XML file with contents like this:
x<-read.table("info.xml")
x
</name></content></item><item id="id-123"><content><name>
</name></content></item><item id="id-456"><content><name>
</name></content></item><item id="id-5559"><content><name>
I need to pick out the values that start with id- followed by digits, such as
id-123, id-456, id-5559, etc.
I tried this:
str_extract_all(x, "id-[0-9]")
but it only returns id-1. I really need help quickly. Any ideas?

str_extract_all(x, "id-[0-9]+")

The regular expression "id-[0-9]" is missing a "+" at the end. [0-9] on its own matches exactly one digit, while [0-9]+ matches one or more.
There may be more issues, but that one jumps out.
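Since installing packages seems to be a problem at work, here is a sketch of the same extraction in base R only, using gregexpr and regmatches. The lines are typed in by hand for the example; with the real file they could be read with readLines("info.xml").

```r
# Sample lines copied from the question; with the real file: lines <- readLines("info.xml")
lines <- c(
  '</name></content></item><item id="id-123"><content><name>',
  '</name></content></item><item id="id-456"><content><name>',
  '</name></content></item><item id="id-5559"><content><name>'
)

# gregexpr finds every match per line, regmatches pulls out the matched text
ids <- unlist(regmatches(lines, gregexpr("id-[0-9]+", lines)))
ids
```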

Reading a file into R with partly unknown filename

Is there a way to read a file into R when I do not know the complete file name? Something like:
read.csv("abc_*")
In this case I do not know the part of the file name that comes after abc_.

If you have exactly one file matching your criteria, you can do it like this:
read.csv(dir(pattern='^abc_')[1])
If there is more than one file, this approach just uses the first hit. In a more elaborate version you could loop over all matches and append them to one data frame or something like that.
Note that the pattern is a regular expression and is thus a bit different from the wildcard you expected (and from what I wrongly assumed in my first attempt at answering the question). Details can be found using ?regex
If the file is in another directory, you have to modify the dir command accordingly:
read.csv(dir('path/to/your/file', full.names=TRUE, pattern="^abc"))
The directory in your case may be c:\\users\\user\\desktop, with the pattern as above. full.names=TRUE makes dir() return the whole path and not only the file name. Try running dir(...) without the read.csv to understand what is happening there.
If you want to give your path as one complete string, it again gets a bit more complicated:
filepath <- 'path/to/your/file/abc_'
read.csv(dir(dirname(filepath), full.names=TRUE, pattern=paste0("^", basename(filepath))))
That process will fail if your file name contains any regular expression metacharacters. You would have to replace them with their corresponding escape sequences beforehand. But that again is another topic.
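The "more elaborate version" mentioned above could look roughly like this. It is only a sketch: the demo directory and the two CSV files are created on the fly so the example is self-contained, and their names and contents are made up. glob2rx() from utils is used to turn the wildcard from the question into a regular expression.

```r
# Hypothetical demo directory with two files whose names start with "abc_"
demo_dir <- file.path(tempdir(), "abc_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(demo_dir, "abc_2020.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(demo_dir, "abc_2021.csv"), row.names = FALSE)

# Translate the wildcard "abc_*" into a regular expression for list.files()
pattern <- glob2rx("abc_*")
files <- list.files(demo_dir, pattern = pattern, full.names = TRUE)

# Read every match and stack the results into one data frame
combined <- do.call(rbind, lapply(files, read.csv))
combined
```

rbind requires that all files have the same columns; if they differ, the files would have to be harmonised first.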

Special characters in R language

I have a table, which looks like this:
1β 2β
1.0199e-01 2.2545e-01
2.5303e-01 6.5301e-01
1.2151e+00 1.1490e+00
and so on...
I want to make a boxplot of this data. The commands I am using are these:
pdf('rtest.pdf')
w1<-read.table("data_CMR",header=T)
w2<-read.table("data_C",header=T)
boxplot(w1[,], w2[,], w3[,],outline=FALSE,names=c(colnames(w1),colnames(w2),colnames(w3)))
dev.off()
The problem is that instead of the symbol beta (β), I get two dots (..) in the output.
Any suggestions on how to solve this problem?
Thank you in advance.

Setting check.names=FALSE, as suggested, prevents R from prepending "X" to "1β" and "2β", which would otherwise happen even once the encoding is sorted out (since column names are not supposed to start with numbers). (One could also have just used the "names" argument to boxplot.)
w1<-read.table(text="1β 2β
1.0199e-01 2.2545e-01
2.5303e-01 6.5301e-01
1.2151e+00 1.1490e+00",header=TRUE, check.names=FALSE, fileEncoding="UTF-8")
boxplot(w1)
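To make the effect of check.names visible without involving the encoding question, here is a small sketch that uses a plain ASCII "b" as a stand-in for β (an assumption made only to keep the example independent of the locale). It shows the name mangling that read.table applies by default through make.names.

```r
# make.names() is what read.table applies to headers when check.names = TRUE
make.names("1b")   # a leading digit gets an "X" prefix
make.names("a/b")  # characters that are invalid in names become "."

# ASCII stand-in for the table from the question
hdr <- "1b 2b\n0.1 0.2\n0.3 0.4"

with_check    <- names(read.table(text = hdr, header = TRUE))
without_check <- names(read.table(text = hdr, header = TRUE, check.names = FALSE))

with_check     # names altered by make.names()
without_check  # names kept exactly as in the header
```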

This also works
pdf('rtest.pdf')
w1<-read.table("data_CMR",header=T)
w2<-read.table("data_C",header=T)
one<-expression(paste("1", beta,sep=""))
two <- expression(paste("2", beta,sep=""))
boxplot(w1[,], w2[,], w3[,],outline=FALSE, names=c(one,two))
dev.off()

This could be an encoding problem. Try adding encoding='UTF-8' to your read.table statements.
w1<-read.table("data_CMR",header=T,encoding='UTF-8')
