I'm looking for a way in R to search for a certain, delimited string.
In my example I need to get TRUE if a cell contains "HDT2", but not if it contains "HDT21", "HDT24" and so on, even though those strings contain HDT2 as well.
So right now I am using
grepl("HDT2",data.label[d,2])
in a for-loop to check each row of the second column of data.label for "HDT2". The problem is that this also returns TRUE if there is more than just "HDT2": for example, it also returns TRUE for "HDT21" or "HDT24", but this is not what I want.
Is there a way to only check for a certain, delimited string?
Thanks!
EDIT: The strings I have to check are longer than just "HDT2". For example, one string is "HDT2 (Arm 1: reference)".
You can use the following regular expression in grepl(). It returns TRUE only for an exact match of "HDT2", with nothing coming before or after it.
grepl("^HDT2$",data.label[d,2])
Usage:
> grepl("^HDT2$", "HDT2")
[1] TRUE
> grepl("^HDT2$", "AHDT2")
[1] FALSE
> grepl("^HDT2$", "HDT2 (Arm 1: reference)")
[1] FALSE
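Given the EDIT, where the label is followed by other text, a word-boundary pattern may be closer to what is wanted than ^...$. This is a minimal sketch, assuming "HDT2" should match only when it is not glued to another letter or digit; the `labels` vector is made up for illustration. Since grepl() is vectorised, it can also be applied to the whole column at once (e.g. data.label[, 2]) instead of row by row in a for-loop.

```r
# Hypothetical sample labels; in the question these come from data.label[, 2]
labels <- c("HDT2", "HDT21", "HDT24", "AHDT2", "HDT2 (Arm 1: reference)")

# \\b matches at a word edge (between a word and a non-word character, or at
# the start/end of the string). "HDT2" followed by a digit, as in "HDT21",
# has no edge after the 2 and is rejected; "HDT2 (" has one and is accepted.
is_hdt2 <- grepl("\\bHDT2\\b", labels)
is_hdt2
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

If "HDT2" must also be at the very start of the string, the pattern can be anchored as "^HDT2\\b".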
I am trying to create a function which will look at two vectors of character labels, and print the appropriate label based on an If statement. I am running into an issue when one of the vectors is populated by NA.
I'll truncate my function:
eventTypepriority=function(a,b) {
if(is.na(a)) {print(b)}
if(is.na(b)) {print(a)}
if(a=="BW"& b=="BW") {print("BW")}
if(a=="?BW"& b=="BW") {print("?BW")}
# ... and so on
}
Some data:
a=c("Pm", "BW", "?BW")
b=c("PmDP","?BW",NA)
c=mapply(eventTypepriority, a,b, USE.NAMES = TRUE)
The function works fine for the first two, selecting the label I've designated in my if statements. However, when it gets to the third pair I receive this error:
Error in if (a == "?BW" & b == "BW") { :
missing value where TRUE/FALSE needed
I'm guessing this is because at that point b=NA, and this is the first if statement after the 'is.na' checks where the comparison evaluates to NA instead of TRUE or FALSE.
Is there a way to handle this? I'd really rather not add conditional statements for every label and NA. I've also tried:
- is.null() (same error message)
- Regular expressions:
if(a==grepl([:print:]) & b==NA) {print(a)}
In various formats, including if(a==grepl(:print:)..., to no avail. I receive an 'Error: unexpected '['' (or whatever character R didn't like first) telling me this is wrong.
All comments and thoughts would be appreciated. ^_^
If all your if conditions are mutually exclusive, just call return() to avoid checking the other conditions once one is met:
eventTypepriority=function(a,b) {
if(is.na(a)) {print(b);return()}
if(is.na(b)) {print(a);return()}
if(a=="BW"& b=="BW") {print("BW");return()}
if(a=="?BW"& b=="BW") {print("?BW");return()}
# ... and so on
}
You need to use if ... else statements instead of plain if; otherwise, your function will evaluate the 3rd and 4th lines even when one of the values is NA.
Given your mapply statement, I also assume you want the function to return the corresponding label, not just print it?
In that case
eventTypepriority<-function(a,b) {
if(is.na(a)) b
else if(is.na(b)) a
else if(a=="BW"& b=="BW") "BW"
else if(a=="?BW"& b=="BW") "?BW"
else "..."
}
a=c("Pm", "BW", "?BW")
b=c("PmDP","?BW",NA)
c=mapply(eventTypepriority, a,b, USE.NAMES = TRUE)
c
returns
Pm BW ?BW
"..." "..." "?BW"
If you actually want to just print the label and have your function return something else, you should be able to figure it out from here.
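A vectorised alternative that avoids if() altogether, sketched under the assumption that a and b have equal length and that only the two label rules shown in the question matter (the "and so on" rules are not filled in). The key point is that %in% returns FALSE, never NA, when one side is missing, so none of the logical vectors below can trigger "missing value where TRUE/FALSE needed". The function name `event_type_priority` is hypothetical.

```r
# Hypothetical vectorised version of eventTypepriority(); only the two label
# rules shown in the question are implemented. Assumes length(a) == length(b).
event_type_priority <- function(a, b) {
  out <- rep("...", length(a))            # default for pairs no rule matches

  # %in% never returns NA (unlike ==), so these logical vectors are NA-free
  qbw_bw  <- a %in% "?BW" & b %in% "BW"
  both_bw <- a %in% "BW"  & b %in% "BW"

  # Assign from lowest to highest priority so later rules win,
  # mirroring the order of the if statements in the question
  out[qbw_bw]   <- "?BW"
  out[both_bw]  <- "BW"
  out[is.na(b)] <- a[is.na(b)]
  out[is.na(a)] <- b[is.na(a)]
  out
}

a <- c("Pm", "BW", "?BW")
b <- c("PmDP", "?BW", NA)
res <- event_type_priority(a, b)
res
# [1] "..." "..." "?BW"
```

Unlike the mapply() call, this returns an unnamed vector; names(res) <- a restores the labels if needed.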
I have FASTA sequences like the following:
fasta_sequences
seq1_1
"MTFJKASDKASWQHBFDDFAHJKLDPAL"
seq1_2
"GTRFKJDAIUETZUQOIHHASJKKJHPAL"
seq1_3
"MTFJHAZOQIIREUUBSDFHGTRF"
seq2_1
"JUZGFNBGTFCKAJDASEJIJAS"
seq2_1
"MTFHJHJASBBCMASDOEQSDPAL"
seq2_3
"RTZIIASDPLKLKLKLLJHGATRF"
seq3_1
"HMTFLKBNCYXBASHDGWPQWKOP"
seq3_2
"MTFJKASDJLKIOOIEOPWEIOKOP"
I would like to retain only those sequences that start with MTF and end with either KOP, TRF or PAL. The result should look like this:
seq1_1
"MTFJKASDKASWQHBFDDFAHJKLDPAL"
seq1_3
"MTFJHAZOQIIREUUBSDFHGTRF"
seq2_1
"MTFHJHJASBBCMASDOEQSDPAL"
seq3_2
"MTFJKASDJLKIOOIEOPWEIOKOP"
I tried the following code in R, but the result contains nothing:
new_fasta=grep("^MTF.*(PAL|TRF|KOP)$")
Could anyone help how to get the desired output. Thanks in advance.
This is one way to do it, I guess.
Loop over every element in fasta_sequences (assuming fasta_sequences is a vector containing the sequences):
newseq = list()
it=1
for (i in fasta_sequences){
# i is each sequence string, e.g. the value of seq1_1, seq1_2, etc.
a=substr(i,1,3)
if (a=="MTF"){
x=substr(i,(nchar(i)-2),nchar(i))
if (x=="PAL" || x=="KOP" || x=="TRF"){
newseq[[it]]=i
it=it+1
}
}
}
Hope it helps
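new_fasta=grep("^MTF.*(PAL|TRF|KOP)$",fasta_sequences,perl=TRUE,value=TRUE)
Pass fasta_sequences as the second argument; it was missing from the original call. The perl option is not strictly needed for this pattern, but if you use it, R spells the constant TRUE, not True. value=TRUE makes grep() return the matching sequences instead of their indices.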
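A self-contained sketch of both routes, assuming fasta_sequences is a named character vector built from the question's data (the duplicated name seq2_1 is copied as given). The second route uses base R's startsWith() and endsWith() (available since R 3.3.0) and needs no regular expression at all.

```r
# Hypothetical sample data: a named character vector mirroring the question
fasta_sequences <- c(
  seq1_1 = "MTFJKASDKASWQHBFDDFAHJKLDPAL",
  seq1_2 = "GTRFKJDAIUETZUQOIHHASJKKJHPAL",
  seq1_3 = "MTFJHAZOQIIREUUBSDFHGTRF",
  seq2_1 = "JUZGFNBGTFCKAJDASEJIJAS",
  seq2_1 = "MTFHJHJASBBCMASDOEQSDPAL",
  seq2_3 = "RTZIIASDPLKLKLKLLJHGATRF",
  seq3_1 = "HMTFLKBNCYXBASHDGWPQWKOP",
  seq3_2 = "MTFJKASDJLKIOOIEOPWEIOKOP"
)

# Regex route: value = TRUE returns the matching elements, names preserved
new_fasta <- grep("^MTF.*(PAL|TRF|KOP)$", fasta_sequences, value = TRUE)

# Regex-free route: logical index from prefix/suffix tests
keep <- startsWith(fasta_sequences, "MTF") &
  (endsWith(fasta_sequences, "PAL") |
     endsWith(fasta_sequences, "TRF") |
     endsWith(fasta_sequences, "KOP"))
new_fasta2 <- fasta_sequences[keep]

names(new_fasta)
```

Both keep seq1_1, seq1_3, the second seq2_1 and seq3_2; seq3_1 is dropped because it starts with "HMTF", not "MTF".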
I'm making a function and before it does any of the hard stuff I need it to check that all the column names listed in the 'samples' dataset are also present in the 'grids' dataset (the function maps one onto the other).
all(names(samples[expvar]) %in% names(grids))
This does that: the code within all() asks whether all the names in the list of columns ('expvar') in 'samples' are also names in 'grids'. For a valid expvar of length 3, the %in% part returns TRUE TRUE TRUE, and all() asks whether all of them are TRUE, so the output here is TRUE. I want to make an if statement along the lines of:
if(all(names(samples[expvar]) %in% names(grids)) == FALSE) {stop("Not all expvar column names found as column names in grids")}
No else is needed; it'll just carry on. The problem is that the '== FALSE' feels redundant, because all() already returns a logical value. Is there a "carry on" function, e.g.
if(all(etc)) CARRYON else {stop("warning")}
Or, can anyone think of a way I can restructure this to make it work?
You're looking for the function stopifnot.
However, you don't need to implement it as
if (okay) {
# do stuff
} else {
stop()
}
which is what you have. Instead you can do
if (!okay) {
stop()
}
# do stuff
since the lines will execute in sequential order. But, again, it might be more readable to use stopifnot, as in:
stopifnot(okay)
# do stuff
I would code it:
if(!all(...))
stop(...)
... rest of program ...
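A sketch of the "check first, then carry on" pattern that also reports which columns are missing. The data frames and the helper name `check_expvar` are hypothetical stand-ins for the question's objects. The last part uses the named-argument form of stopifnot(), which sets the error message and requires R 4.0.0 or later.

```r
# Hypothetical example data frames standing in for 'samples' and 'grids'
samples <- data.frame(depth = 1, temp = 2, salinity = 3)
grids   <- data.frame(depth = 1, temp = 2)

# Hypothetical helper: stop early, naming the columns that are missing
check_expvar <- function(samples, grids, expvar) {
  missing_cols <- setdiff(names(samples[expvar]), names(grids))
  if (length(missing_cols) > 0) {
    stop("Not all expvar column names found as column names in grids: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)   # nothing to report; the caller just carries on
}

check_expvar(samples, grids, c("depth", "temp"))         # passes silently
# check_expvar(samples, grids, c("depth", "salinity"))   # error naming salinity

# R >= 4.0.0: the argument name becomes the error message
expvar <- c("depth", "temp")
stopifnot("Not all expvar column names found as column names in grids" =
            all(names(samples[expvar]) %in% names(grids)))
```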
How can I extend the exists function to work with the following:
Any ideas how I would extend this to check whether a nested dictionary also exists? For example: if(exists("mylists[[index]]['TSI']")), where the mylists object is a dictionary lookup that should also contain a nested dictionary.
Now mylists will look like:
[[index]]["TSI"]=c(0="a",1="b")
How should I check whether this exists, so that I can append to it and end up with:
[[index]]["TSI"]=c(0="a",1="b",2="c")
Here is more code that illustrates things better:
index is an ID
if(!is.null(listsar[[index]]["TSI"])) {
print("extending existing")
listsar[[index]][["TSI"]] <- c(listsar[[index]][["TSI"]], risktype=myTSI)
}else
{
print("creating new")
listsar[[index]][["TSI"]] <- c(risktype=myTSI)
}
However, this does not seem to work: I always get "extending existing" and never "creating new". If I change the evaluation line to:
if(!is.null(listsar[[index]][["TSI"]]))
I get a different statement:
"creating new"
You can test for NULL in most cases. Sample data (which is something you should have given us along with working code - wtf is c(0="a",1="b",2="c") supposed to be?)
> mylists=list()
> mylists[["foo"]]=list()
> mylists[["foo"]][["TSI"]]=c(a=0,b=1)
Does a "foo" exist at the top level?
> !is.null(mylists[["foo"]])
[1] TRUE
Yes.
Does a "fnord" exist at the top level?
> !is.null(mylists[["fnord"]])
[1] FALSE
No.
Does a "TSI" exist within "foo"?
> !is.null(mylists[["foo"]][["TSI"]])
[1] TRUE
Yes.
Does a "FNORD" exist within "foo"?
> !is.null(mylists[["foo"]][["FNORD"]])
[1] FALSE
No.
Does a "FNORD" exist within a top-level (and nonexistent) "fnord":
> !is.null(mylists[["fnord"]][["FNORD"]])
[1] FALSE
No.
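A likely reason the question's first version always printed "extending existing": single brackets return a sub-list, which for a list is never NULL, even when the name is absent. The sketch below shows the difference and an append-or-create helper; the helper name `append_tsi` is hypothetical, and it assumes the top-level entries are lists.

```r
mylists <- list(foo = list(TSI = c(a = 0, b = 1)))

# Single brackets: a length-1 list holding NULL, so is.null() is FALSE
is.null(mylists[["foo"]]["FNORD"])     # FALSE
# Double brackets: the element itself, which is NULL when missing
is.null(mylists[["foo"]][["FNORD"]])   # TRUE

# Hypothetical helper: append to lst[[index]][["TSI"]], creating it if needed
append_tsi <- function(lst, index, value) {
  if (is.null(lst[[index]])) {
    lst[[index]] <- list(TSI = value)   # new top-level entry
  } else {
    # c(NULL, value) is just value, so this also covers a missing "TSI"
    lst[[index]][["TSI"]] <- c(lst[[index]][["TSI"]], value)
  }
  lst
}

mylists <- append_tsi(mylists, "foo", c(c = 2))   # TSI is now c(a = 0, b = 1, c = 2)
mylists <- append_tsi(mylists, "bar", c(x = 5))   # creates bar$TSI
```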
I would like to use the grepl function in R to find whether a string contains something, but only on the condition that it is not preceded by something else.
So for example, say I wanted to find a string that includes the pattern 'xx', as long as it is not preceded by 'yy'. So:
'123xx45' would return TRUE
'123yy4xx5' would also return TRUE as the 'yy' is not immediately preceding 'xx'
However '123yyxx45' would return FALSE.
Please let me know if anything is unclear or you would like a better example.
How about grepl('(?<!yy)xx', c('123yy4xx5','123xx45','123yyxx45'), perl=TRUE)?
your.data <- c('123yy4xx5','123xx45','123yyxx45')
grepl("xx",your.data) & !grepl("yyxx",your.data)
[1] TRUE TRUE FALSE
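The two answers agree on the question's three examples but can differ when a string contains both a preceded and an unpreceded 'xx'. The fourth string, "yyxx1xx", is a hypothetical example added to show this. Lookbehind assertions need the PCRE engine, hence perl = TRUE.

```r
x <- c("123xx45", "123yy4xx5", "123yyxx45", "yyxx1xx")

# Negative lookbehind: an "xx" that is not immediately preceded by "yy"
lookbehind <- grepl("(?<!yy)xx", x, perl = TRUE)
lookbehind
# [1]  TRUE  TRUE FALSE  TRUE

# Two-grepl version: rejects any string containing "yyxx" at all,
# even if another "xx" elsewhere is not preceded by "yy"
two_step <- grepl("xx", x) & !grepl("yyxx", x)
two_step
# [1]  TRUE  TRUE FALSE FALSE
```

Which behaviour is right depends on whether one unpreceded 'xx' anywhere in the string should be enough for TRUE.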