I have extracted the last letter of each person's name into its own column so that I can count how many of those letters are lowercase. The letters could be anything from A to Z, and I only want to count the lowercase ones. Any assistance would be appreciated. Thank you.
This would be very easy to do with a custom VBA function, but as VBA hasn't been mentioned in your tags, you could combine SUMPRODUCT and EXACT to do what is essentially a case-sensitive COUNTIF:
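=SUMPRODUCT(EXACT({"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"},D1)*1)
Because EXACT is case-sensitive, this returns 1 when D1 holds a lowercase letter and 0 otherwise (an uppercase "A" does not match "a"). Fill the formula down beside your column and sum the results to get the total count.
I hope this helps.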
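If the same column lived in R instead of Excel (a swap of tools, since the threads below are about R), a minimal sketch could look like this. The names are hypothetical sample data. The POSIX class [[:lower:]] is used rather than [a-z] because character ranges in R regular expressions can be locale-dependent.

```r
# Hypothetical names; in practice this would be your own column
full_names <- c("Anna", "ROB", "Eve", "JIM", "Ian")

# Extract the last character of each name (the separate column from the question)
last_letter <- substr(full_names, nchar(full_names), nchar(full_names))

# [[:lower:]] matches a lowercase letter only; ^...$ requires exactly one character
is_lower <- grepl("^[[:lower:]]$", last_letter)

# Summing a logical vector counts the TRUE values
n_lower <- sum(is_lower)
n_lower
```

Digits, punctuation and empty strings fail the pattern, so they are not counted.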
Related
I need to split a string, and this time the delimiter is "(", but I need to specify that the next character is a number, because I am specifically trying to separate the title from the year in a single column. Even better would be a way to indicate that "(" is followed by exactly 4 digits. More generally, where can I find a list of all the symbols that denote different classes or groups of characters, to make it easier to split text into two columns? Thanks in advance.
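A minimal sketch in R, assuming each entry ends with the year in parentheses, e.g. "Title (1999)". The sample titles are hypothetical. \\d{4} expresses "exactly four digits", and the full list of character classes (\\d, \\s, [:digit:], [:alpha:] and so on) is documented on R's help page "Regular Expressions as used in R", opened with ?regex.

```r
# Hypothetical sample column: a title followed by a four-digit year in parentheses
titles <- c("The Matrix (1999)", "Heat (1995)", "Alien (Director's Cut) (1979)")

# \\( and \\) are literal parentheses; (\\d{4}) captures exactly four digits;
# $ anchors the year to the end; the lazy (.*?) captures everything before it
pattern <- "^(.*?)\\s*\\((\\d{4})\\)$"

title <- sub(pattern, "\\1", titles, perl = TRUE)
year  <- as.integer(sub(pattern, "\\2", titles, perl = TRUE))

split_df <- data.frame(title = title, year = year, stringsAsFactors = FALSE)
split_df
```

Entries that do not end in "(dddd)" are returned unchanged by sub(), so their year becomes NA (with a coercion warning), which makes them easy to spot.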
The text column can hold up to 100 characters per entry. How can I write a script that recognizes the word "Approved" or "Rejected"? Sometimes the word will be "-Approved", "Approved", "Approved" or "Approve", and I want to account for each variant with a LIKE-type function.
There are two words I am looking for, so an "OR" may be more applicable here than a range.
R has a pair of approximate-matching functions, agrep and agrepl, which work like grep and grepl in returning a vector when given a vector. agrepl returns a logical vector of the same length as the input, so it works better in cases like this:
agrepl("Approved", df$text_col) | agrepl("Rejected", df$text_col)
That could be used to logically index the matching rows of a data frame, or you could sum the logical vector to get a count. Suggestion: edit your question to include an example we can use for demonstration.
There are additional parameters, such as max.distance (default 0.1, a fraction of the pattern length), that adjust how tight the approximate matching is.
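A short sketch of both uses (indexing rows and counting), assuming a data frame df with a character column text_col as in the answer. The sample rows are hypothetical. For comparison it also shows a plain regular expression with grepl, which is a different technique from agrepl: "approved?" makes the final "d" optional and ignore.case = TRUE handles capitalisation, so the match is exact rather than fuzzy.

```r
# Hypothetical sample data using the df / text_col names from the answer
df <- data.frame(
  text_col = c("-Approved", "Approved", "Approve", "Rejected by manager", "Pending"),
  stringsAsFactors = FALSE
)

# Fuzzy match: TRUE where either word appears within the default max.distance
fuzzy_hit <- agrepl("Approved", df$text_col) | agrepl("Rejected", df$text_col)

# Exact alternative: "|" acts as OR between the two words
regex_hit <- grepl("approved?|rejected", df$text_col, ignore.case = TRUE)

df[regex_hit, , drop = FALSE]  # logically index the matching rows
sum(regex_hit)                 # count of matching rows
```

The regex version is predictable but only catches the variants you spell out; agrepl also tolerates typos at the cost of occasional false positives.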
This is my first post on Stack Overflow, so please bear with me if its quality falls short.
I want to learn some web scraping with R and started with a simple example: extracting a table from a Wikipedia page.
I managed to download the specific page and identified the HTML sections I am interested in:
<td style="text-align:right">511.000.000\n</td>
Now I want to extract the number from the table data using a regex, so I wrote one that, as far as I can tell, should match the structure of the number:
pattern<-"\\d*\\.\\d*\\.\\d*\\.\\d*\\."
I also tried other variations, but none of them found the number within the HTML code. I wanted to keep the pattern open, as the numbers might be in the hundreds, thousands, millions or billions.
My questions:
1. The number is embedded in HTML code. Might it be necessary to account for the surrounding non-number text (which should not be extracted) in the pattern?
2. What would be the correct version of the pattern to identify the number?
Thank you very much for your support!!
So many stars (*) imply a lot of backtracking.
Furthermore, \\d* would match more than 3 digits in a group and would also match a group with no digits at all.
Assuming your numbers are always integers formatted with . as the thousands separator, you could use the following: \\d{1,3}(?:\\.\\d{3})* (note the non-capturing group construct (?:...), which implies passing perl = TRUE, as mentioned in Regular Expressions as used in R).
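A sketch of how that pattern could be applied end to end, assuming the same integer-with-"."-separator format. The second cell is a hypothetical extra row added to show a number below one thousand. regexpr() finds the first match in each string and regmatches() extracts it, so the surrounding HTML does not need to be described in the pattern.

```r
# Table cells as strings; the first is the one from the question, the second is hypothetical
cells <- c('<td style="text-align:right">511.000.000\n</td>',
           '<td style="text-align:right">950\n</td>')

# 1-3 leading digits, then any number of ".ddd" groups
pattern <- "\\d{1,3}(?:\\.\\d{3})*"

# Pull out the first match in each cell (the HTML around it is simply ignored)
matched <- regmatches(cells, regexpr(pattern, cells, perl = TRUE))

# Remove the thousands separators and convert to numbers
values <- as.numeric(gsub(".", "", matched, fixed = TRUE))
values
```

fixed = TRUE makes gsub() treat "." literally instead of as "any character".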
Look closely at your regex. You are assuming that the number will have 4 periods (\\.) in it, but your own example has only two. It won't match because, while the asterisk makes each \\d optional (zero or more), the periods are not optional. If you add a ? modifier after the 3rd and 4th periods, you may find that your pattern starts matching.
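A quick check of that diagnosis, using the cell from the question with R's default regex engine. The relaxed pattern is simply the original with ? added after the 3rd and 4th periods. As noted earlier in the thread, it is still loose (for example it would also accept "1."), so it is a demonstration of why the original fails rather than a recommended final pattern.

```r
html <- '<td style="text-align:right">511.000.000\n</td>'

original <- "\\d*\\.\\d*\\.\\d*\\.\\d*\\."    # requires four literal periods
relaxed  <- "\\d*\\.\\d*\\.\\d*\\.?\\d*\\.?"  # 3rd and 4th periods made optional

found_original <- grepl(original, html)   # the cell contains only two periods
found_relaxed  <- grepl(relaxed, html)
relaxed_match  <- regmatches(html, regexpr(relaxed, html))
relaxed_match
```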
I am attempting to remove all one- or two-letter words in R with this regular expression:
\\b\\w{1,2}\\b
But I also want to exclude certain two-letter words from the removal, e.g. IT.
Is there any way to do this?
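One possible approach, sketched under the assumption that the exceptions are plain words without regex metacharacters: put a negative lookahead in front of \\w{1,2} so that words listed in an exception vector are skipped. Lookahead requires perl = TRUE. The helper name remove_short_words and the keep argument are hypothetical, and the match is case-sensitive, so "it" in lowercase would still be removed.

```r
# Hypothetical helper: remove 1-2 character words except those listed in `keep`
remove_short_words <- function(x, keep = character(0)) {
  if (length(keep) > 0) {
    # (?!(?:IT|...)\b) is a negative lookahead: do not match a word that is exactly in `keep`
    pattern <- sprintf("\\b(?!(?:%s)\\b)\\w{1,2}\\b", paste(keep, collapse = "|"))
  } else {
    pattern <- "\\b\\w{1,2}\\b"
  }
  out <- gsub(pattern, "", x, perl = TRUE)  # lookahead needs the PCRE engine
  trimws(gsub("\\s+", " ", out))            # collapse the spaces left behind
}

result <- remove_short_words("I work in IT at a big firm", keep = c("IT"))
result
```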
Assume there's a list in R with a variable that contains character fields. I want to output all values that begin with certain letters, like "ab". How can I do this? Thanks for the help.
As the OP didn't provide a reproducible example, we go by the description: it seems to be a data.frame, so we can use grep to subset the elements of the column that begin with 'ab'.
grep('^ab',yourdata$yourcol, value=TRUE)
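A small runnable version, with hypothetical data in the yourdata/yourcol names used above. It also shows base R's startsWith() (available since R 3.3.0), which tests a fixed prefix without a regex and returns a logical vector that can be used to keep whole rows. If the data really is a list, the same calls work on the list element, e.g. yourlist$yourvar.

```r
# Hypothetical data in the names used by the answer
yourdata <- data.frame(yourcol = c("abacus", "cab", "Abbey", "about"),
                       stringsAsFactors = FALSE)

# Values beginning with "ab" (case-sensitive; ^ anchors the match at the start)
hits <- grep("^ab", yourdata$yourcol, value = TRUE)

# Case-insensitive variant also picks up "Abbey"
hits_any_case <- grep("^ab", yourdata$yourcol, value = TRUE, ignore.case = TRUE)

# Regex-free alternative: a logical vector, handy for keeping whole rows
starts_ab <- startsWith(yourdata$yourcol, "ab")
yourdata[starts_ab, , drop = FALSE]
```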