plot function type - r

For digits I have done so:
digits <- c("0","1","2","3","4","5","6","7","8","9")

You can use the [:punct:] character class to detect punctuation. It matches
[!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
Either with gregexpr:
x = c("we are friends!, Good Friends!!")
gregexpr("[[:punct:]]", x)
[[1]]
[1] 15 16 30 31
attr(,"match.length")
[1] 1 1 1 1
attr(,"useBytes")
[1] TRUE
or via stringi
# Gives 4
stringi::stri_count_regex(x, "[:punct:]")
Note that the comma is counted as punctuation.
The question seems to be about getting individual counts of particular punctuation marks. @Joba provides a neat answer in the comments:
## Create a vector of punctuation marks you are interested in
punct = strsplit('[]?!"\'#$%&(){}+*/:;,._`|~[<=>@^-]\\', '')[[1]]
Then count how often they appear
counts = stringi::stri_count_fixed(x, punct)
Decorate the vector
setNames(counts, punct)
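A base-R variation on the same idea, as a rough sketch: extract every punctuation match with regmatches() and tabulate the result, so only the marks that actually occur are listed. The variable names (marks, punct_counts) are illustrative and not from the answers above.

```r
# Example input taken from the answer above
x <- "we are friends!, Good Friends!!"

# Pull out every character matched by the POSIX punctuation class
marks <- regmatches(x, gregexpr("[[:punct:]]", x))[[1]]

# Tabulate how often each distinct mark occurs
punct_counts <- table(marks)
print(punct_counts)
```

Unlike the fixed vector of marks, this only reports characters that are present in the string, which may or may not be what you want.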

You can use regular expressions.
stringi::stri_count_regex("amdfa, ad,a, ad,. ", "[:punct:]")
https://en.wikipedia.org/wiki/Regular_expression
might help too.

Related

Finding the position of decimal point in an integer or a string

If I want to determine the position of the decimal point in a number, for example 524.79, where the decimal point is at position 4, and get that as output in R, which function or command should I use? I have tried gregexpr as well as regexpr, but each time the output is 1.
This is what I did:
x <- 524.79
gregexpr(pattern = ".", "x")
The output looks like this:
[[1]]
[1] 1
attr(,"match.length")
[1] 1
attr(,"useBytes")
[1] TRUE
The . is a metacharacter that matches any character. To match a literal dot, either escape it (\\.), place it inside square brackets ([.]), or use fixed = TRUE:
as.integer(gregexpr(pattern = ".", x, fixed = TRUE))
#[1] 4
Or a compact option is str_locate
library(stringr)
unname(str_locate(x, "[.]")[,1])
#[1] 4
The second issue in the OP's code is that the object x is quoted. gregexpr therefore searches the literal string "x", and the unescaped . matches its only character, at position 1.
data
x <- 524.79
We could actually use a regex here:
x <- "524.79"
nchar(sub("(?<=\\.)\\d+", "", x, perl=TRUE))
#[1] 4
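To compare the three ways of matching a literal dot side by side, here is a small sketch using regexpr, which returns only the first match. The variable names are made up for illustration.

```r
x <- 524.79
s <- as.character(x)  # convert explicitly so the search text is "524.79"

# Unescaped dot: matches any character, so the first match is position 1
pos_any <- as.integer(regexpr(".", s))

# Three equivalent ways to match a literal dot
pos_escaped <- as.integer(regexpr("\\.", s))
pos_bracket <- as.integer(regexpr("[.]", s))
pos_fixed   <- as.integer(regexpr(".", s, fixed = TRUE))

print(c(any = pos_any, escaped = pos_escaped,
        bracket = pos_bracket, fixed = pos_fixed))
```

Note that regexpr returns -1 when there is no match, so a number without a decimal point needs separate handling.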

Using str_view with a list of words in R

I want to use str_view from stringr in R to find all the words that start with "y" and all the words that end with "x." I have a list of words generated by Corpora, but whenever I launch the code, it returns a blank view.
Common_words<-corpora("words/common")
#start with y
start_with_y <- str_view(Common_words, "^[y]", match = TRUE)
start_with_y
#finish with x
str_view(Common_words, "$[x]", match = TRUE)
Also, I would like to find the words that are only 3 letters long, but I have no ideas so far.
I'd say this is not about programming with stringr but learning some regex. Here are some sites I have found useful for learning:
http://www.regular-expressions.info/tutorial.html
http://www.rexegg.com/
https://www.debuggex.com/
Here \\w, the shorthand class for word characters (i.e., [A-Za-z0-9_]), is useful with quantifiers (+ and {3} in these two cases). PS: I use stringi here because stringr uses it in the backend anyway, so this just skips the middleman.
x <- c("I like yax because the rock to the max!",
"I yonx & yix to pick up stix.")
library(stringi)
stri_extract_all_regex(x, 'y\\w+x')
stri_extract_all_regex(x, '\\b\\w{3}\\b')
## > stri_extract_all_regex(x, 'y\\w+x')
## [[1]]
## [1] "yax"
##
## [[2]]
## [1] "yonx" "yix"
## > stri_extract_all_regex(x, '\\b\\w{3}\\b')
## [[1]]
## [1] "yax" "the" "the" "max"
##
## [[2]]
## [1] "yix"
EDIT: these may be of use too:
## Just y starting words
stri_extract_all_regex(x, '\\by\\w*\\b')
## Just x ending words
stri_extract_all_regex(x, '\\b\\w*x\\b')
## Words with 4 or more characters
stri_extract_all_regex(x, '\\b\\w{4,}\\b')
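When the input is already a vector of single words, as in the question, anchors are enough. In the question's pattern "$[x]" the end anchor comes before the character, so it cannot match; the anchor has to follow the character ("x$"). The sketch below uses base R and a made-up word vector standing in for the corpora list, which is not reproduced here.

```r
# Hypothetical stand-in for the word list used in the question
words <- c("yes", "box", "yellow", "fix", "onyx", "cat", "relax", "you")

# ^ anchors the match at the start of each word
starts_with_y <- grep("^y", words, value = TRUE)

# $ anchors the match at the end, so it goes after the character
ends_with_x <- grep("x$", words, value = TRUE)

# Exactly three letters between the start and end anchors
three_letters <- grep("^[[:alpha:]]{3}$", words, value = TRUE)

print(starts_with_y)
print(ends_with_x)
print(three_letters)
```

The same patterns should carry over to str_view or str_subset from stringr, since only the regular expressions change.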

How to recognise and extract alpha numeric characters

I want to extract alphanumeric characters from a particular sentence in R.
I have tried the following:
aa=grep("[:alnum:]","abc")
This should return integer(0), but it returns 1, which should not be the case as "abc" is not alphanumeric.
What am I missing here?
Essentially I am looking for a function that only matches strings that combine both letters and numbers, for example "ABC-0112", "PCS12SCH".
Thanks in advance for your help.
Written without the outer brackets, [:alnum:] is an ordinary bracket expression listing the characters :, a, l, n, u and m, so it matches the "a" in "abc". The class has to be written [[:alnum:]], and even then it matches letters or digits. To match strings that contain both, use:
x <- c("ABC", "ABc12", "--A-1", "abc--", "89=A")
grep("(.*[[:alpha:]].*[[:digit:]]|.*[[:digit:]].*[[:alpha:]])", x)
# [1] 2 3 5
or
which(grepl("[[:alpha:]]", x) & grepl("[[:digit:]]", x))
# [1] 2 3 5
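To pull such tokens out of a sentence rather than a prepared vector, one option is to split on whitespace first and then apply the same two-condition test. This is only a sketch; the example sentence and variable names are invented for illustration.

```r
# Made-up sentence containing the two example codes from the question
sentence <- "Order ABC-0112 shipped with PCS12SCH and notes"

# Split into whitespace-separated tokens
tokens <- strsplit(sentence, "[[:space:]]+")[[1]]

# Keep tokens that contain at least one letter AND at least one digit
has_both <- grepl("[[:alpha:]]", tokens) & grepl("[[:digit:]]", tokens)
mixed <- tokens[has_both]

print(mixed)
```

Splitting on whitespace keeps hyphenated codes such as "ABC-0112" in one piece; a different delimiter would be needed if punctuation is attached to the tokens.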

Capitalizing letters. R equivalent of excel "PROPER" function [duplicate]

Colleagues,
I'm looking at a data frame resembling the extract below:
Month Provider Items
January CofCom 25
july CofCom 331
march vobix 12
May vobix 0
I would like to capitalise first letter of each word and lower the remaining letters for each word. This would result in the data frame resembling the one below:
Month Provider Items
January Cofcom 25
July Cofcom 331
March Vobix 12
May Vobix 0
In a word, I'm looking for R's equivalent of the PROPER function available in MS Excel.
With regular expressions:
x <- c('woRd Word', 'Word', 'word words')
gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(x), perl=TRUE)
# [1] "Word Word" "Word" "Word Words"
(?<=\\b)([a-z]) says look for a lowercase letter preceded by a word boundary (e.g., a space or the beginning of a line). (?<=...) is called a "look-behind" assertion. \\U\\1 says replace that character with its uppercase version. \\1 is a back reference to the first group surrounded by () in the pattern. See ?regex for more details.
If you only want to capitalize the first letter of the first word, use the pattern "^([a-z])" instead.
The question is about an equivalent of Excel PROPER and the (former) accepted answer is based on:
proper=function(x) paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
It might be worth noting that:
proper("hello world")
## [1] "Hello world"
Excel PROPER would give, instead, "Hello World". For a 1:1 mapping with Excel, see @Matthew Plourde's answer.
If what you actually need is to set only the first character of a string to upper-case, you might also consider the shorter and slightly faster version:
proper=function(s) sub("(.)", "\\U\\1", tolower(s), perl=TRUE)
Another method uses the stringi package. The stri_trans_general function appears to lower case all letters other than the initial letter.
require(stringi)
x <- c('woRd Word', 'Word', 'word words')
stri_trans_general(x, id = "Title")
# [1] "Word Word" "Word" "Word Words"
I don't think there is one, but you can easily write it yourself:
(dat <- data.frame(x = c('hello', 'frIENds'),
                   y = c('rawr','rulZ'),
                   z = c(16, 18)))
#         x    y  z
# 1   hello rawr 16
# 2 frIENds rulZ 18
proper <- function(x)
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
(dat <- data.frame(lapply(dat, function(x)
  if (is.numeric(x)) x else proper(x)),
  stringsAsFactors = FALSE))
#         x    y  z
# 1   Hello Rawr 16
# 2 Friends Rulz 18
str(dat)
# 'data.frame': 2 obs. of 3 variables:
# $ x: chr "Hello" "Friends"
# $ y: chr "Rawr" "Rulz"
# $ z: num 16 18
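For the data frame in the question, the word-by-word regular expression from the earlier answer can be combined with the column loop shown here. The sketch below assumes the columns are character vectors; proper_words and is_chr are names introduced only for this example.

```r
# Word-by-word capitalization, reusing the look-behind pattern shown earlier
proper_words <- function(x) {
  gsub("(?<=\\b)([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

# Data frame resembling the extract in the question
dat <- data.frame(Month = c("January", "july", "march", "May"),
                  Provider = c("CofCom", "CofCom", "vobix", "vobix"),
                  Items = c(25, 331, 12, 0),
                  stringsAsFactors = FALSE)

# Apply the function to character columns only, leaving numbers untouched
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], proper_words)

print(dat)
```

Assigning into dat[is_chr] keeps the column order and the numeric column as they were, instead of rebuilding the data frame.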

Get indices of all character elements matches in string in R

I want to get the indices of all occurrences of certain characters in a word. Assume the characters I am looking for are: l, e, a, z.
I tried the following regex in the grep function and tens of modifications of it, but I keep getting something other than what I want.
grep("/([leazoscnz]{1})/", "ylaf", value = F)
gives me
numeric(0)
where I would like:
[1] 2 3
To make grep work with the individual characters of a string, you first need to split the string into a vector of single characters. You can use strsplit for this:
strsplit("ylaf", split="")[[1]]
[1] "y" "l" "a" "f"
Next, simplify your regular expression (R patterns take no / delimiters, so the slashes were matched literally) and try grep again:
grep("[leazoscnz]", strsplit("ylaf", split="")[[1]])
[1] 2 3
But it is easier to use gregexpr:
gregexpr("[leazoscnz]", "ylaf")
[[1]]
[1] 2 3
attr(,"match.length")
[1] 1 1
attr(,"useBytes")
[1] TRUE
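If a plain integer vector is wanted instead of the list with attributes, either approach can be wrapped as below. This is a sketch; the variable names are illustrative, and the character set is reduced to the four letters named in the question.

```r
word <- "ylaf"
wanted <- c("l", "e", "a", "z")

# Approach 1: split into single characters and test membership
chars <- strsplit(word, "")[[1]]
idx_split <- which(chars %in% wanted)

# Approach 2: take the first list element of gregexpr and drop attributes
idx_regex <- as.integer(gregexpr("[leaz]", word)[[1]])

print(idx_split)
print(idx_regex)
```

One difference to keep in mind: when nothing matches, which() returns integer(0) while gregexpr returns -1.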

Resources