If I use
d <- function(x){deparse(substitute(x))}
for letters or numbers, everything works fine: d(a1) gives "a1", for example. But using special characters results in an error. I want to call d(+) and get "+" as the result.
From comments:
I want "+" == d(+) to return TRUE. In other words, I do not want to use d(`+`). Is this possible? The function is part of code that will wait for input from non-R users, which is why I want to avoid requiring backticks (``) around special characters (I do not want to explain to every user what a special character is).
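A minimal sketch of what happens, assuming standard R parsing: d(+) is rejected by R's parser as a syntax error before d() is ever called, so substitute() never gets to see it. Within ordinary R syntax the + therefore has to be quoted somehow. The helper d2() below is hypothetical: it accepts either a bare name or a quoted string, so users could type d2("+") instead of learning about backticks.

```r
d <- function(x) { deparse(substitute(x)) }

# d(+) never reaches d(): the parser rejects it as a syntax error first
parse_result <- try(parse(text = "d(+)"), silent = TRUE)
inherits(parse_result, "try-error")  # TRUE

# With backticks, `+` is parsed as a symbol and deparse() returns its name
d(`+`)  # "+"

# Hypothetical helper: pass quoted strings through unchanged and
# deparse anything else, so users can type d2("+") instead of d(`+`)
d2 <- function(x) {
  expr <- substitute(x)
  if (is.character(expr)) expr else deparse(expr)
}
d2("+")  # "+"
d2(a1)   # "a1"
```

The string route trades the backtick rule for a quoting rule, which is usually easier to explain to non-R users.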
I want to extract information from downloaded HTML code. The HTML code is given as a string, and the information I need is stored between specific HTML tags. For example, if I want every headline in the string, I have to search for "H1>" and "/H1>" and take the text between them.
So far I have used substr(), but I had to calculate the positions of "H1>" and "/H1>" first.
htmlcode = " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
startposition = c(21,55) # calculated with gregexpr
stopposition = c(28, 63) # calculated with gregexpr
substr(htmlcode, startposition[1], stopposition[1])
substr(htmlcode, startposition[2], stopposition[2])
The output is correct, but calculating every single start and stop position is a lot of work. Instead, I am looking for a function similar to substr() that takes start and stop words instead of positions. For example:
function(htmlcode, startword = "H1>", stopword = "/H1>")
I'd agree that using a package built for HTML processing is probably the best way to handle the example you give. However, one way to extract substrings based on delimiter strings rather than positions is the following.
Step 1: Define a simple function that returns the positions of a string within another string. In this example I only use fixed (non-regex) strings.
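strpos_fixed <- function(string, char){
  # positions of every fixed-string (non-regex) match of char in string
  a <- gregexpr(char, string, fixed = TRUE)
  # subsetting drops the "match.length" and other attributes
  b <- a[[1]][seq_along(a[[1]])]
  return(b)
}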
Step 2: Define the new substring function using strpos_fixed():
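char_substr <- function(string, start, stop){
  # first character after each start delimiter
  x <- strpos_fixed(string, start) + nchar(start)
  # last character before each stop delimiter
  y <- strpos_fixed(string, stop) - 1
  z <- cbind(x, y)
  # one substring per start/stop pair (assumes the delimiters come in pairs)
  apply(z, 1, function(pos){substr(string, pos[1], pos[2])})
}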
Step 3: Test
htmlcode = " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
htmlcode2 = " some html code <H1>baa dee ya</H1> some other code <H1>say do you remember?</H1>"
htmlcode3<- "<x>baa dee ya</x> skdjalhgfjafha <x>dancing in september</x>"
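char_substr(htmlcode,"<H1>","</H1>")   # "headline" "headline2"
char_substr(htmlcode2,"<H1>","</H1>")  # "baa dee ya" "say do you remember?"
char_substr(htmlcode3,"<x>","</x>")    # "baa dee ya" "dancing in september"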
You have two options here. First, use a package developed explicitly for parsing HTML, e.g., rvest. There are a number of tutorials online.
Second, for edge cases where you need to extract from strings that are not necessarily well-formed HTML, you can use regular expressions. One of the simpler ways to do this is stringr::str_match():
library(stringr)
# 1. the parentheses define regex groups
# 2. ".*?" means any character, non-greedy
# 3. together they match <H1>, then text of any length, then </H1>
str_match(htmlcode, "(<H1>)(.*?)(</H1>)")      # first match only
str_match_all(htmlcode, "(<H1>)(.*?)(</H1>)")  # every match
str_match() returns a matrix whose columns are (in order) the full match followed by each regex group you specified. If you only want the text between the <H1> tags, take the second group, i.e. the 3rd column. Note that str_match() only returns the first match in each string; str_match_all() returns a list with one such matrix per input string, containing every match.
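The same regex idea also works in base R without stringr. A minimal sketch, wrapped in the start/stop-word signature the question asked for; substr_between() is a hypothetical name, and the sketch assumes the delimiters never contain the sequence \E, which would end the \Q...\E literal quoting it relies on.

```r
# Hypothetical helper: every piece of text found between startword and stopword
substr_between <- function(text, startword, stopword) {
  # \Q...\E makes PCRE treat the delimiters literally, (?s) lets "." match
  # newlines, and ".*?" is non-greedy so each match ends at the nearest stopword
  pattern <- paste0("(?s)\\Q", startword, "\\E.*?\\Q", stopword, "\\E")
  full <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  if (length(full) == 0) return(character(0))  # no delimited text found
  # strip the delimiters from each full match
  substr(full, nchar(startword) + 1, nchar(full) - nchar(stopword))
}

htmlcode <- " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
substr_between(htmlcode, "<H1>", "</H1>")         # "headline" "headline2"
substr_between("no tags here", "<H1>", "</H1>")   # character(0)
```

Unlike position arithmetic, this returns an empty vector instead of a garbage substring when a delimiter is missing.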
I have a column in a data frame containing identifiers made up of a letter and 8 digits, e.g. B15006788.
Is there a way to turn all instances of B15... into empty cells (there are thousands of different numbers within each category) but keep B16... etc.?
I know that if there was just one thing I wanted to remove, like the B15, I could do:
sub("B15", "", df$col)
But I'm not sure how to remove a set number of characters/digits (or even all characters after B15).
Thanks in advance :)
Welcome to SO! This is a job for regular expressions (regex). You can use base R, as I show here, or look into the stringr package for handy tools that are easier to understand. You can also look up regex rules to help define exactly what you want to match. For what you ask, the following code example should help:
testStrings <- c("KEEPB15", "KEEPB15A", "KEEPB15ABCDE")
gsub("B15.{2}", "", testStrings)
gsub() is the base R function that replaces every match of a pattern with something else, in one string or a whole vector of strings. To test the regex, I created the testStrings vector with a few different examples.
Breaking down the regex: "B15" is the literal pattern you're looking for. The "." means any single character, and "{2}" says how many of those characters to grab after "B15" (exactly two here), so "KEEPB15ABCDE" becomes "KEEPCDE", while "KEEPB15" and "KEEPB15A" are left unchanged because fewer than two characters follow "B15". You can change it as you need. If you want to remove "B15" and everything after it, use the pattern "B15.*" instead: ".*" means everything until the end of the string.
edit: If you want to require that "B15" is at the start of the string, add "^" to the start of the pattern, like so: "^B15.{2}"
https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf has information on the different regexes you can build to be more specific.
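Applied to the question's data frame column, a minimal sketch: the ids other than B15006788 are made up for illustration, and it assumes the column is stored as character strings. Anchoring with "^" and matching ".*" to the end replaces each whole B15 id with an empty string while leaving B16 ids alone. If NA suits later processing better than empty strings, startsWith() does the same job without a regex.

```r
# Hypothetical example data; only "B15006788" comes from the question
df <- data.frame(col = c("B15006788", "B16001234", "B15999999", "A15000001"),
                 stringsAsFactors = FALSE)

# "^B15.*": "B15" at the very start, then everything to the end of the string;
# replacing the whole match with "" leaves an empty cell for each B15 id
df$col <- sub("^B15.*", "", df$col)
df$col  # "" "B16001234" "" "A15000001"

# Alternative without a regex: mark B15 ids as missing instead of ""
df2 <- data.frame(col = c("B15006788", "B16001234"), stringsAsFactors = FALSE)
df2$col[startsWith(df2$col, "B15")] <- NA
df2$col  # NA "B16001234"
```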
I'm trying to attach files to an automated email, but when I build the file list like this, it doesn't work:
attach.files = c(paste("/users/joesmith/nosection_", currentDate,".csv",sep=""),
paste("/users/joesmith/withsection_", currentDate,".csv",sep=""))
If I typed the names in manually, like
c("nosection_051418.csv", "withsection_051418.csv")
it would work fine, but since I'm automating this to run every day I can't do that. How can I rebuild this so that the character vector is accepted?
I thought your example implied the need for "parallel" inputs: the path stem plus the first portion of the file name, and the date portion of those full paths. Consider this illustration, which uses a 2-item vector and a 1-item vector (produced by Sys.Date(), replacing your currentDate) to fill the %s positions in a sprintf() format string (as suggested by @Gregor):
sprintf("/users/joesmith/%s_%s.csv", c("nosection", "withsection"), Sys.Date() )
[1] "/users/joesmith/nosection_2018-05-14.csv" "/users/joesmith/withsection_2018-05-14.csv"
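If the files really are named with an mmddyy stamp like the question's "051418", format() can produce that from Sys.Date(). A minimal sketch, assuming that naming scheme and the /users/joesmith/ directory from the question:

```r
# Assumption: the date stamp should look like "051418" (mmddyy), as in the
# question, rather than Sys.Date()'s default "2018-05-14" format
format(as.Date("2018-05-14"), "%m%d%y")  # "051418"

currentDate <- format(Sys.Date(), "%m%d%y")

# sprintf() recycles the one-element currentDate across both file-name stems,
# producing one full path per stem
attach.files <- sprintf("/users/joesmith/%s_%s.csv",
                        c("nosection", "withsection"), currentDate)

# paste0() builds the same two paths
attach.files2 <- paste0("/users/joesmith/", c("nosection", "withsection"),
                        "_", currentDate, ".csv")
```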
I'd like someone to give me a simple rundown of how this bit of Python code works. Much appreciated.
# variables
kw1 = ['keyword1', 'keyword2']
problem = input("Detect keywords from list\n")

# main
if set(kw1).intersection(problem.split()):
    print(" Kw found. ")
else:
    print(" Keywords not found. ")
A lot of things there.
First, when you call input() you're asking the user to type in a string.
When you call split() on it, you turn it into a list of strings by splitting the input on whitespace, so "bla bli blo".split() gives you ["bla", "bli", "blo"].
Then, set(my_list) turns my_list into a set, a collection without duplicates that supports operations like union, intersection and so on.
Finally, set(kw1).intersection(...) compares the set of keywords with the list of words from the user input. If there are no matches (none of the keywords appeared as a whole word in the input), the result is an empty set, which the if treats as false. So if set(["bla","bli","blo"]).intersection(["blu"]) will not run its body, but if set(["bla","bli","blo"]).intersection(["blu","blo"]) will, because the result, {"blo"}, is not empty.
Note that this method will NOT recognize keywords inside words. For instance, if you're looking for the keywords kw1=['car','truck','bike'] and the user inputs cars trucks bikes, none of the keywords will be recognized, because split() splits on whitespace, giving you ['cars','trucks','bikes'], and 'cars' != 'car'.
I need some help with R programming.
I need to get input from the user and use it as a variable in my R script.
When getting the user input, the following checks need to be made:
- Check whether the input is missing; if so, prompt the user to re-enter it.
- Check that only alphanumeric characters are entered; otherwise, prompt the user to re-enter.
- Allow some special characters: $, #, &, etc.
- Allow white space, as in first name, " ", last name.
It is unclear what you are trying to do with the else if part of your code. readline() always returns the user's input as a single character string. Are there any specific characters you don't want included in the input? You could use grepl() to detect them and ask the user to enter the input again.
If you are trying to make sure the user enters something, use a while loop, as suggested in the comments. If you are going to use the variable in R after the function runs, you need to return() the value of v1, the user input. If you are trying to replace the space between the first and last name with %20, you may want to use gsub(). See the code below.
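fun1 <- function(){
  # readline() only waits for input in an interactive session; outside one it
  # returns "" at once, so the loop below would never end
  v1 <- readline(prompt = 'Enter your First & Last Name: ')
  # keep asking until the user enters something
  while (v1 == ""){
    v1 <- readline("You forgot to enter your Name. Please try again: ")
  }
  # replace every space with %20 and return the result
  return(gsub(" ", "%20", v1))
}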
For example, entering David Smith at the prompt returns:
[1] "David%20Smith"
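The question's checks (not empty, alphanumeric plus a few special characters, spaces allowed) can be folded into the same loop with grepl(). A minimal sketch, assuming the allowed specials are exactly the three named in the question ($, #, &); the "etc." would need adding to the bracket expression. is_valid_input() and get_name() are hypothetical names.

```r
# Hypothetical validator: TRUE when x is not blank and contains only
# letters, digits, spaces, $, # or &
is_valid_input <- function(x) {
  nzchar(trimws(x)) && grepl("^[[:alnum:] $#&]+$", x)
}

# Hypothetical prompt loop built on the validator
get_name <- function(prompt = "Enter your First & Last Name: ") {
  # readline() returns "" immediately outside an interactive session,
  # which would make the loop below spin forever
  if (!interactive()) stop("get_name() needs an interactive R session")
  v1 <- readline(prompt)
  while (!is_valid_input(v1)) {
    v1 <- readline("Please use only letters, numbers, spaces, $, # or &: ")
  }
  gsub(" ", "%20", v1)
}

is_valid_input("David Smith")  # TRUE
is_valid_input("")             # FALSE
is_valid_input("David!")       # FALSE
```

Keeping the check in its own function makes it testable without typing at a prompt.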