Create a data.frame using a string expression in R - r

How can I create a data frame from the following string:
my_str <- "a=1, b=2"
In other words, how can I feed my_str into the data.frame or data.table functions so that it gives me the same thing as
data.frame(a=1, b=2)
Think about how you can easily pass a string of form my_str <- "y~x1+x2+x3" into a statistical model in R by simply using as.formula(my_str) and effectively remove the quotes. So I am looking for a similar solution.
Thank you!

I would strongly discourage you from storing code as a string in R. There are almost always better ways to write R code that don't require parsing strings.
But let's assume you have no other options. Then you can write your own parser, or use R's built-in parser. The expression "a=1, b=2" on its own doesn't make any sense in R (you can't have two "assignments" separated by a comma), so it only makes sense as parameters to a function.
If you want to wrap it in data.frame(), then you can use paste0() to build the string you want, then parse() it, and finally eval() it to run it:
my_str <- "a=1, b=2"
my_code <- paste0("data.frame(", my_str, ")")
my_expr <- parse(text=my_code)
eval(my_expr)
# a b
# 1 1 2
But, as I already mentioned, eval/parse should generally be avoided.
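The "write your own parser" route mentioned above could look roughly like the sketch below. The helper name str_to_df is hypothetical (it is not from the answer), and the sketch assumes the string only ever contains comma-separated name=value pairs with numeric values.

```r
# Hypothetical helper: build a one-row data frame from "name=value" pairs
# without evaluating the string as R code.
str_to_df <- function(s) {
  # split on commas, then split each piece on "="
  pairs <- strsplit(strsplit(s, ",")[[1]], "=")
  keys  <- trimws(vapply(pairs, `[`, character(1), 1))
  vals  <- trimws(vapply(pairs, `[`, character(1), 2))
  # assumption: every value is numeric (as.numeric() gives NA otherwise)
  as.data.frame(as.list(setNames(as.numeric(vals), keys)))
}

my_str <- "a=1, b=2"
df <- str_to_df(my_str)
df
```

Because nothing is evaluated, a malicious or malformed string cannot run arbitrary code; the trade-off is that anything beyond simple numeric pairs needs extra handling.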

Related

Unexpected outcome, not replacing, in R out of a gsub function

As the output of a certain operation, I have the following dataframe with 729 observations.
> head(con)
Connections
1 r_con[C3-C3,Intercept]
2 r_con[C3-C4,Intercept]
3 r_con[C3-CP1,Intercept]
4 r_con[C3-CP2,Intercept]
5 r_con[C3-CP5,Intercept]
6 r_con[C3-CP6,Intercept]
As can be seen, the pattern to be removed is everything but the pair of electrode names; for instance, in the first observation this would be C3-C3. Below is my attempt, which I expected to return the dataframe with everything else removed. If I'm not wrong (which I probably am), the regex syntax is OK, and from my understanding fixed=TRUE is also necessary. However, I do not understand the R output. I expected the pattern to be replaced by nothing (""), but instead it returns this output, which doesn't make sense to me.
> gsub("r_con\\[\\,Intercept\\]\\","",con,fixed=TRUE)
[1] "3:731"
I believe this will probably be a silly question for an expert programmer, which I am far from being, and any insight would be much appreciated.
[UPDATE WITH SOLUTION]
Thanks to Tim and Ben I realised I was using the wrong regex syntax and the wrong source. This did it for me:
con2 <- sub("^r_con\\[([^,]+),Intercept\\]", "\\1", con$Connections)
I think your problem is that you're passing the whole data frame con to your gsub call instead of the column con$Connections. Also, as the user above me pointed out, you probably don't want to use sub.
I'm assuming that your data is consistent, i.e., the strings in con$Connections all follow more or less the same pattern. Then this works, using the following example:
con <- data.frame(Connections = c("r_con[C3-C3,Intercept]", "r_con[C3-CP1,Intercept]"))
library(stringr)
f <- function(x){
  # keep the text before the first comma, e.g. "r_con[C3-C3"
  part <- str_split(x, ",")[[1]][1]
  # drop the 6-character prefix "r_con["
  str_sub(part, 7, -1)
}
f(con$Connections[1])
sapply(con$Connections, f)
The sub function doesn't work this way. One viable approach would be to capture the quantity you want, then use this capture group as the replacement:
x <- "r_con[C3-C3,Intercept]"
term <- sub("^r_con\\[([^,]+),Intercept\\]", "\\1", x)
term
[1] "C3-C3"
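To see why the original call returned a single odd string, here is a small sketch using a two-row stand-in for con (an assumption, not the asker's real data). gsub() coerces a non-character x with as.character(), which turns a data frame into one deparsed string per column, and fixed = TRUE makes the pattern a literal string rather than a regex. The "3:731" in the question was probably the deparsed integer codes of a factor column.

```r
# Stand-in data (assumption): two rows instead of the asker's 729
con <- data.frame(
  Connections = c("r_con[C3-C3,Intercept]", "r_con[C3-C4,Intercept]"),
  stringsAsFactors = FALSE
)

# Passing the whole data frame: one deparsed string per column, not per row
whole <- as.character(con)
length(whole)

# Passing the column: one result per row, using a regex capture group
per_row <- sub("^r_con\\[([^,]+),Intercept\\]$", "\\1", con$Connections)
per_row

# With fixed = TRUE the pattern is matched literally, so no escaping is needed
no_prefix <- sub("r_con[", "", con$Connections, fixed = TRUE)
no_prefix
```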

Can you create an R function that calls using a prefix and suffix (operating like brackets)?

I have read about prefix functions and infix functions on Hadley Wickham's Advanced R website. I would like to know if there is any way to define functions that are called by placing a prefix and suffix around a single argument, so that the prefix and suffix operate like brackets. Is there any way to create a function like this, and if so, how do you do it?
An example for formulation: In order to give a specific example for formulation, suppose you have an object char that is a character string. You want to create a function that is called on a character string using the prefix _# and suffix #_ and the function adds five dashes to the front of the character string. If programmed successfully, it would operate as shown below.
char
[1] "Hello"
_#char#_
[1] "-----Hello"
There is a way to do this as long as your special operator takes the particular form . %_% char %_% . with a dot on either side. This works because the parser will interpret each dot as a variable name. If we use non-standard evaluation, we don't need the dot to actually exist, and we only need to use it as a marker for opening and closing our special operator. So we can do something like this:
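`%_%` <- function(a, b)
{
  # exactly one of the two operands must be the bare marker `.`
  if((deparse(match.call()$a) != ".") +
     (deparse(match.call()$b) != ".") != 1)
    stop("Unrecognised SPECIAL")
  # opening marker: tag the value so the closing call can recognise it
  if(deparse(match.call()$a) == ".")
    return(`attr<-`(b, "prepped", TRUE))
  # closing marker: the left operand must have been tagged by the opening call
  if(isTRUE(attr(a, "prepped")))
    return(paste0("-----", a))
  stop("Unrecognised SPECIAL")
}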
.%_% "hello" %_%.
#> [1] "-----hello"
However, this is a weird thing to do in R. It's not idiomatic and uses more keystrokes than a simple function call would. It would also very likely cause unpredictable problems in places where non-standard evaluation is used. This is really just a demo to show that it can be done. Not that it should be done.
Writing a simple function seems like a more R-like solution. If terseness is a priority, then maybe something like
._ <- function(x) paste0("-----", x)
._("hello")
# [1] "-----hello"
Or if you wanted something more bracket-like
.. <- structure(list(NULL), class="dasher")
`[.dasher` <- function(a, x) paste0("-----", x)
..["hello"]
# [1] "-----hello"
Another way to use a custom class would be to redefine the - operator to paste that value in front of the string. For example
literal <- function(x) {class(x)<-"literal"; x}
`-.literal` <- function(e1, e2) {literal(paste0("-", unclass(e1)))}
print.literal <- function(x) print(unclass(x))
Then you can do
val <- literal("hello")
-----val
# [1] "-----hello"
---val
# [1] "---hello"
So here the number of - you type is the number you get in the output.
You can get creative/weird with syntax, but you need to make sure whatever symbols you come up with can be parsed by the parser, otherwise you are out of luck.
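One way to check up front whether a proposed syntax can be parsed at all is to hand it to parse() and catch the error. The helper name parses below is hypothetical and only meant as a quick sketch; it checks parsing only, not whether the code would evaluate.

```r
# Hypothetical helper: TRUE if R's parser accepts the text, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

bracket_style <- parses("_#char#_")             # the syntax asked for in the question
dot_style     <- parses('. %_% "hello" %_% .')  # the infix workaround
minus_style   <- parses("-----val")             # repeated unary minus
c(bracket_style, dot_style, minus_style)
```

The requested _#char#_ form fails at this stage because a name cannot start with an underscore and # begins a comment, which is why the answers fall back on forms the parser already accepts.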

How to match any character existing between a pattern and a semicolon

I am trying to get anything existing between sample_id= and ; in a vector like this:
sample_id=10221108;gender=male
tissue_id=23;sample_id=321108;gender=male
treatment=no;tissue_id=98;sample_id=22
My desired output would be:
10221108
321108
22
How can I get this?
I've been trying several things like this, but I don't find the way to do it correctly:
clinical_data$sample_id<-c(sapply(myvector, function(x) sub("subject_id=.;", "\\1", x)))
You could use sub with a capture group to isolate that which you are trying to match:
out <- sub("^.*\\bsample_id=(\\d+).*$", "\\1", x)
out
[1] "10221108" "321108" "22"
Data:
x <- c("sample_id=10221108;gender=male",
"tissue_id=23;sample_id=321108;gender=male",
"treatment=no;tissue_id=98;sample_id=22")
Note that the actual output above is character, not numeric. But, you may easily convert using as.numeric if you need to do that.
Edit:
If you are unsure that the sample IDs would always be just digits, here is another version you may use to capture any content following sample_id:
out <- sub("^.*\\bsample_id=([^;]+).*$", "\\1", x)
out
You could try the str_extract function from the stringr package.
If your data has one record per element, you can do:
library(stringr)
str_extract(x, "(?<=\\bsample_id=)[:digit:]+") # targets a series of digits preceded by sample_id=; the + captures all of the digits
This extracts the first match from each element. If all of your data is in a single string, it becomes a tad more difficult, because you have to tell the extraction to continue after it has found a match. The code would look something like this:
str_extract_all(x, "(?<=sample_id=)\\d+")
This code will extract all of the numbers you're looking for and the output will be a list. From there you can manipulate the list as you see fit.
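The same lookbehind idea can also be sketched in base R with regexpr()/gregexpr() and regmatches(), which avoids the package dependency. The vector x repeats the sample data from the question; pasting everything into one string is only there to illustrate the "all matches" case.

```r
x <- c("sample_id=10221108;gender=male",
       "tissue_id=23;sample_id=321108;gender=male",
       "treatment=no;tissue_id=98;sample_id=22")

# First match per element; perl = TRUE is needed for the lookbehind
pattern <- "(?<=\\bsample_id=)[^;]+"
ids <- regmatches(x, regexpr(pattern, x, perl = TRUE))
ids

# Convert to numeric if the IDs are known to be digits only
ids_num <- as.numeric(ids)

# All matches within a single long string
one <- paste(x, collapse = ";")
all_ids <- regmatches(one, gregexpr(pattern, one, perl = TRUE))[[1]]
all_ids
```

Note that regmatches() with regexpr() silently drops elements that have no match, so the result can be shorter than x if some records lack a sample_id.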

Language parser for R

Is there a parser in R to process natural language strings and convert them into R instructions? Something like LEX and BISON for C++. For example it would turn this string:
Dataset: Cars - Column: Speed: 15 - Range: [20-40]
into
filter_cars <- cars[cars$speed <= 15,][20:40,]
What I've seen works only for integrating R expressions in C++, that's why I'm asking.
This can be done using the functions eval() and parse(). parse() converts a string into an expression, and eval() evaluates that expression:
data(cars) # built-in example data with columns speed and dist (mtcars has no speed column)
expr <- "filter_cars <- cars[cars$speed <= 15,][20:40,]" # text string
eval(parse(text = expr)) # convert string to expression and evaluate
EDIT: If you want to create the expressions automatically, you will have to write your own function. paste() or sprintf() are useful for building text strings from multiple inputs (cat() only prints its arguments and returns NULL).
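As a rough sketch of such a hand-written translator, the code below pulls the pieces out of the example string with a regular expression and assembles the R statement with sprintf(). The exact input grammar is an assumption based on the single example in the question, and the built-in cars dataset stands in for the asker's data.

```r
spec <- "Dataset: Cars - Column: Speed: 15 - Range: [20-40]"

# Assumed grammar: "Dataset: <name> - Column: <name>: <limit> - Range: [<from>-<to>]"
pattern <- paste0("^Dataset: ([[:alnum:]]+) - Column: ([[:alnum:]]+): ([0-9]+)",
                  " - Range: \\[([0-9]+)-([0-9]+)\\]$")
parts <- regmatches(spec, regexec(pattern, spec))[[1]]

dataset <- tolower(parts[2])  # "cars"
column  <- tolower(parts[3])  # "speed"

# Assemble the R statement as text
code <- sprintf("filter_%s <- %s[%s$%s <= %s, ][%s:%s, ]",
                dataset, dataset, dataset, column,
                parts[4], parts[5], parts[6])
code

# Evaluate it against the built-in cars dataset
eval(parse(text = code))
```

Validating the input against a fixed pattern before building the code limits what can end up inside eval(), which is the main risk of this approach.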

Using column names with signs of a data frame in a qplot

I have a dataset and unfortunately some of the column labels in my dataframe contain signs (- or +). This doesn't seem to bother the dataframe, but when I try to plot this with qplot it throws me an error:
x <- 1:5
y <- x
names <- c("1+", "2-")
mydf <- data.frame(x, y)
colnames(mydf) <- names
mydf
qplot(1+, 2-, data = mydf)
and if I enclose the column names in quotes it will just give me a category (or something to that effect, it'll give me a plot of "1+" vs. "2-" with one point in the middle).
Is it possible to do this easily? I looked at aes_string but didn't quite understand it (at least not enough to get it to work).
Thanks in advance.
P.S. I have searched for a solution online but can't quite find anything that helps me with this (it could be due to some aspect I don't understand), so I suspect it might be because of the very unusual naming scheme I have :p.
Since you have non-standard column names, you need to use backticks (`) in your column references.
For example:
mydf$`1+`
[1] 1 2 3 4 5
So, your qplot() call should look like this:
qplot(`1+`, `2-`, data = mydf)
You can find more information in ?Quotes and ?names
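A short base R sketch of the different ways to reach such columns, using the data frame from the question (no plotting package is assumed here):

```r
mydf <- data.frame(x = 1:5, y = 1:5)
colnames(mydf) <- c("1+", "2-")

# Backticks make a non-syntactic name usable wherever a symbol is expected
a <- mydf$`1+`

# A quoted string works with [[ ]] because it is a lookup, not a symbol
b <- mydf[["2-"]]

# Backticks also work inside expressions evaluated within the data frame
total <- with(mydf, `1+` + `2-`)
total
```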
As said in the other answer, you have a problem because you don't have standard names. One solution that avoids backtick notation is to convert the column names to a standard form. Another motivation for converting names to regular ones is that you can't use backticks in a lattice plot, for example. Using gsub you can do this:
gsub('(^[0-9]+)[+|-]+|[+|-]+','a\\1',c("1+", "2-","a--"))
[1] "a1" "a2" "aa"
Hence, applying this to your example :
colnames(mydf) <- gsub('(^[0-9]+)[+|-]+|[+|-]+','a\\1',colnames(mydf))
qplot(a1,a2,data = mydf)
EDIT
You can also use make.names with the option unique = TRUE:
make.names(c("10+", "20-", "10-", "a30++"), unique = TRUE)
[1] "X10." "X20." "X10..1" "a30.."
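If the original labels are still wanted later (for axis titles, say), one option is to keep a lookup from the cleaned names back to the originals. This is only a sketch of that idea on the question's two column names; the variable name lookup is an assumption.

```r
mydf <- data.frame(x = 1:5, y = 1:5)
old_names <- c("1+", "2-")

# Syntactic replacements produced by make.names()
new_names <- make.names(old_names, unique = TRUE)
colnames(mydf) <- new_names

# Named vector: cleaned name -> original label
lookup <- setNames(old_names, new_names)
lookup

# The cleaned names need no backticks; the lookup recovers the label
first_col   <- mydf[[new_names[1]]]
first_label <- lookup[[new_names[1]]]
```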
If you don't like R's naming rules, here is a custom version using gsubfn:
library(gsubfn)
gsubfn("[+|-]|^[0-9]+",
function(x) switch(x,'+'= 'a','-' ='b',paste('x',x,sep='')),
c("10+", "20-", "10-", "a30++"))
"x10a" "x20b" "x10b" "a30aa" ## note x10b looks better than X10..1
