Suppose I have a list, for example:
Userid Total
Apple1 12
Apple2 8
Apple3 15
Apple4 3
Apple5 4
Apple6 6
Apple7 20
Apple8 22
Apple9 5
Apple10 11
Orange1 15
Orange2 8
But I want to do calculations on all Apple items together, so how do I strip the numbers from the end? I have code that works when the number is a single digit, but I do not know what to do when it becomes double digit.
I am currently using:
substr(userid, 1, nchar(userid)-1)
which works for Apple1 to Apple9, but Apple10 would become Apple1. Any suggestions on what to do?
Try gsub to remove all digits:
x <- c("Apple10", "Apple3", "Orange123")
gsub("[0-9]", "", x)
#[1] "Apple" "Apple" "Orange"
This checks each element of x and replaces every digit with nothing (an empty string).
Or, if your data was in a data.frame called df:
df$Userid <- gsub("[0-9]", "", df$Userid)
Now you can proceed with ordering as you wish.
Using the stringr package, and a different approach:
library(stringr)
x <- c("Apple10", "Apple3", "Orange123")
str_replace_all(string = x, pattern = "\\d{1,3}$", replacement = "")
#[1] "Apple" "Apple" "Orange"
The pattern to be replaced by "" is 1 to 3 digits at the end of a string (use "\\d+$" to allow any number of trailing digits).
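To get back to the original goal of doing calculations on all Apple items together, here is a minimal base R sketch. It assumes the data sits in a data.frame called df with columns Userid and Total (rebuilt here from the table in the question) and adds a hypothetical helper column Group. It strips only a trailing run of digits with sub() and "[0-9]+$", so Apple10 and Apple1 both become Apple, and then sums Total per group with tapply().

```r
# Rebuild the example table from the question (column names as in the question)
df <- data.frame(
  Userid = c(paste0("Apple", 1:10), "Orange1", "Orange2"),
  Total  = c(12, 8, 15, 3, 4, 6, 20, 22, 5, 11, 15, 8)
)

# Remove one or more digits at the END of each id, however many there are;
# digits elsewhere in the id are left alone (unlike gsub("[0-9]", "", ...))
df$Group <- sub("[0-9]+$", "", df$Userid)

# Sum Total within each group; the result is named by group
totals <- tapply(df$Total, df$Group, sum)
print(totals)
```

With the data above this gives 106 for Apple and 23 for Orange. Anchoring the pattern with $ is the design choice that makes single- and double-digit suffixes behave the same.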
My goal is to order the elements of a character vector in R:
"+5x^{5}" "-2x^{3}" "5x^{7}" "0" "1"
I want to get this order:
"5x^{7}" "+5x^{5}" "-2x^{3}" "1" "0"
That is, the terms with exponents first, from highest to lowest exponent, then the plain numbers in decreasing numerical order.
How can I achieve this? Decreasing order for the numbers alone is clear. For the exponents, it would be necessary to detect whether there is an x in the string, then extract the exponent and order by it. But I don't know how to do this.
A more verbose option is to extract the exponents and the multipliers and then use arrange. This has the advantage of having these numbers ready if you need to use them.
library(stringr)
library(dplyr)
dat <- data.frame(x = c("+5x^{5}", "-2x^{3}", "5x^{7}", "0", "1"))
# Group 1 captures the multiplier, group 2 the (possibly multi-digit) exponent
dat |> mutate(m = as.numeric(str_match(x, "([+-]?\\d+)x\\^\\{(\\d+)\\}")[, 2]),
              exp = as.numeric(str_match(x, "([+-]?\\d+)x\\^\\{(\\d+)\\}")[, 3]),
              n = suppressWarnings(as.numeric(x))) |>
  arrange(desc(exp), desc(n))
Output
#> x m exp n
#> 1 5x^{7} 5 7 NA
#> 2 +5x^{5} 5 5 NA
#> 3 -2x^{3} -2 3 NA
#> 4 1 NA NA 1
#> 5 0 NA NA 0
Created on 2022-06-13 by the reprex package (v2.0.1)
Base R solution:
x[
order(
gsub(
".*\\{(\\d+)\\}.*",
"\\1",
x
),
decreasing = TRUE
)
]
Input data:
x <- c(
"+5x^{5}",
"-2x^{3}",
"5x^{7}",
"0",
"1"
)
Well... it works
> x <- c("+5x^{5}", "-2x^{3}", "5x^{7}", "0", "1")
> x[order(gsub("(.*\\^\\{)(.+)(\\}.*)", "\\2", x), decreasing = TRUE)]
[1] "5x^{7}" "+5x^{5}" "-2x^{3}" "1" "0"
The regex (.*\\^\\{)(.+)(\\}.*) has three capture groups:
(.*\\^\\{) matches everything up to and including ^{ (first group),
(.+) matches the contents of the curly brackets (second group),
(\\}.*) matches the } and everything after it (third group).
The replacement "\\2" keeps only the second group, i.e. the exponent, which is what we use to order the elements of the vector. Strings without ^{...}, such as "0" and "1", do not match and are returned unchanged. Note that the comparison is between strings, not numbers, so an exponent like 10 would sort below 7.
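Both base R answers compare strings, which breaks for multi-digit exponents or for a plain number such as "9" that would sort above "x^{7}". A sketch that orders numerically instead, assuming every element is either a term containing x^{n} or a plain number (a bare "x" without an exponent is not handled); the names has_exp, expo and num are illustrative only:

```r
x <- c("+5x^{5}", "-2x^{3}", "5x^{7}", "0", "1")

# TRUE for terms that carry an exponent of the form x^{n}
has_exp <- grepl("x\\^\\{\\d+\\}", x)

# Exponent as a number; NA for the plain numbers
expo <- rep(NA_real_, length(x))
expo[has_exp] <- as.numeric(sub(".*x\\^\\{(\\d+)\\}.*", "\\1", x[has_exp]))

# Plain numbers as numbers; NA for the exponent terms
num <- rep(NA_real_, length(x))
num[!has_exp] <- as.numeric(x[!has_exp])

# Sort by exponent, then by number, both decreasing;
# order() keeps NAs last (na.last = TRUE) even when decreasing = TRUE
result <- x[order(expo, num, decreasing = TRUE)]
print(result)
```

Because the plain numbers all have NA as exponent, they end up after every exponent term and are then ordered among themselves by num.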
Suppose I have a long character vector, more or less like this:
vec <- c("32, 25", "5", "15, 24")
I want to apply a function that gives me the number of comma-separated strings in each element and returns a vector of these counts. Using lapply and my toy vector, this is my approach:
lapply(vec, function(x) {
a <- strsplit(x, ",")
y <- length(a[[1:length(a)]])
unlist(y[1:length(y)])
})
[[1]]
[1] 2
[[2]]
[1] 1
[[3]]
[1] 2
This almost gives me what I want, since the first element has 2 strings, the second 1 string and the third 2 strings. The problem is that I can't get my function to return a vector of the form c(2,1,2). I'm using this function to create a new variable in a data.frame I'm working with.
Any idea will be much appreciated.
You could do:
stringr::str_count(vec, ",") + 1
#> [1] 2 1 2
Or, in base R:
nchar(gsub("[^,]", "", vec)) + 1
#> [1] 2 1 2
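Another base R route stays closer to the strsplit() idea in the question: strsplit() already works on the whole vector and returns one list element per string, and lengths() turns that list into an integer vector of element lengths. Using sapply() instead of lapply() also fixes the original attempt, because sapply() simplifies the result to a vector.

```r
vec <- c("32, 25", "5", "15, 24")

# One list component per element of vec, split at the commas
parts <- strsplit(vec, ",")

# Length of each component, returned as an integer vector
counts <- lengths(parts)
print(counts)

# Same result via sapply(), which simplifies instead of returning a list
counts2 <- sapply(parts, length)
print(counts2)
```

Either vector can be assigned directly as a new data.frame column.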
I'm trying to create a calculator that multiplies permutation groups written in cyclic form (the process of which is described in this post, for anyone unfamiliar: https://math.stackexchange.com/questions/31763/multiplication-in-permutation-groups-written-in-cyclic-notation). Although I know this would be easier to do with Python or something else, I wanted to practice writing code in R since it is relatively new to me.
My plan is to take an input such as "(1 2 3)(2 4 1)" and split it into two separate lists or vectors. However, I am having trouble starting, because from my understanding of character functions (which I researched here: https://www.statmethods.net/management/functions.html) I will ultimately have to use grep() to find where ")(" occurs in my string and split from there. However, grep only takes vectors as its argument, so I am trying to coerce my string into a vector. While researching this, I mostly saw people suggest as.integer(unlist(str_split())). This doesn't work for me: when I split, not everything is an integer, so those values become NA, as seen in this example.
library(tidyverse)
x <- "(1 2 3)(2 4 1)"
x <- as.integer(unlist(str_split(x, " "))) # warning: NAs introduced by coercion
x
#[1] NA  2 NA  4 NA
Is there an alternative way to turn a string into a vector when it does not contain only integers? I also realize that the way I am trying to split up the two permutations is very roundabout, but from the character functions I researched, it seems like the only way. If there are other functions that would make this easier, please let me know.
Thank you!
Comments in the code.
x <- "(1 2 3)(2 4 1)"
out1 <- strsplit(x, split = ")(", fixed = TRUE)[[1]] # split on close and open bracket
out2 <- gsub("[()]", replacement = "", out1) # remove brackets
out3 <- strsplit(out2, " ") # tease out numbers between spaces
lapply(out3, as.integer)
[[1]]
[1] 1 2 3
[[2]]
[1] 2 4 1
There aren't really any scalars in R: single values like 1, TRUE and "a" are all 1-element vectors, so grep(pattern, x) will work fine on your original string. As a starting point towards your goal, I would suggest splitting the groups using:
> str_extract_all(x, "\\([0-9 ]+\\)")
[[1]]
[1] "(1 2 3)" "(2 4 1)"
If we need to split the string while keeping the brackets, we can split at the zero-width position between ")" and "(" using lookarounds:
strsplit(x, "(?<=\\))(?=\\()", perl = TRUE)[[1]]
#[1] "(1 2 3)" "(2 4 1)"
Or we can use the convenient wrapper ex_round() from qdapRegex:
library(qdapRegex)
ex_round(x, include.marker = TRUE)[[1]]
#[1] "(1 2 3)" "(2 4 1)"
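Alternative, using magrittr:
library(magrittr)
x <- "(1 2 3)(2 4 1)"
x %>%
  gsub("^\\(", "c(", .) %>%                     # leading "(" becomes "c("
  gsub("\\)\\(", "),c(", .) %>%                 # ")(" becomes "),c("
  gsub("(?=\\s\\d)", ", ", ., perl = TRUE) %>%  # insert a comma before each " digit"
  paste0("list(", ., ")") %>% {eval(parse(text = .))}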
result:
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 2 4 1
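You could use chartr() with read.table(): chartr() turns each "(" into a space and each ")" into a newline, so read.table() sees one row per cycle: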
read.table(text= chartr("()"," \n",x))
# V1 V2 V3
# 1 1 2 3
# 2 2 4 1
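The answers above stop at parsing, but the question's end goal is multiplying the permutations. A hedged sketch of that next step follows. It assumes the elements are positive integers and that the product is applied right to left (the rightmost cycle acts first), which is a common convention; check it against the math.stackexchange post linked in the question. The helpers parse_cycles(), cycle_to_perm(), perm_to_cycles() and multiply_cycles() are hypothetical names, not from any package.

```r
# Parse "(1 2 3)(2 4 1)" into a list of integer cycles
parse_cycles <- function(s) {
  cycles <- regmatches(s, gregexpr("\\([0-9 ]+\\)", s))[[1]]
  lapply(strsplit(gsub("[()]", "", cycles), " +"), as.integer)
}

# Permutation vector p on 1..n for one cycle: p[i] is the image of i
cycle_to_perm <- function(cyc, n) {
  p <- seq_len(n)
  p[cyc] <- cyc[c(seq_along(cyc)[-1], 1)]  # each element maps to the next, last to first
  p
}

# Convert a permutation vector back to cycle notation, dropping fixed points
perm_to_cycles <- function(p) {
  seen <- logical(length(p))
  out <- character(0)
  for (i in seq_along(p)) {
    if (seen[i] || p[i] == i) next
    cyc <- integer(0)
    j <- i
    while (!seen[j]) {
      seen[j] <- TRUE
      cyc <- c(cyc, j)
      j <- p[j]
    }
    out <- c(out, paste0("(", paste(cyc, collapse = " "), ")"))
  }
  if (length(out) == 0) "()" else paste(out, collapse = "")
}

# Compose the cycles right to left: the result maps i to c1(c2(...ck(i)))
multiply_cycles <- function(s) {
  cycles <- parse_cycles(s)
  n <- max(unlist(cycles))
  perm <- seq_len(n)
  for (cyc in cycles) perm <- perm[cycle_to_perm(cyc, n)]
  perm_to_cycles(perm)
}

print(multiply_cycles("(1 2 3)(2 4 1)"))
```

Under the right-to-left assumption, "(1 2 3)(2 4 1)" gives "(1 3)(2 4)". For the left-to-right convention, replace perm[cycle_to_perm(cyc, n)] with cycle_to_perm(cyc, n)[perm].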
I have a string and need to count the number of times a given value appears consecutively. I tried the stringr package, but it counts every occurrence of the value/pattern. For example, say we want to count the appearances of "213" in the string "2132132132137889213": the output I need is 4, but I get 5 with the str_count function. Please help.
I'm not sure of my "regex" skills but, hopefully, you could make something out of this:
max_rep_pat = function(pat, text)
{
res = gregexpr(paste0("(", pat, ")+"), text)
sapply(res, function(x) max(attr(x, "match.length")) / nchar(pat))
}
max_rep_pat("213", c("2132132132137889213",
"21321321321378892132132132132132213213"))
#[1] 4 5
gregexpr returns the positions where a pattern occurred and the number of characters of each match. Wrapping the pattern as "(pattern)+" means "match one or more consecutive repetitions of the pattern". Compare the following two:
gregexpr("213", "2132132132137889213")
[[1]]
[1] 1 4 7 10 17
attr(,"match.length")
[1] 3 3 3 3 3
#attr(,"useBytes")
#[1] TRUE
gregexpr("(213)+", "2132132132137889213")
[[1]]
[1] 1 17
attr(,"match.length")
[1] 12 3
#attr(,"useBytes")
#[1] TRUE
In the first case, it found the position of each "213", and the length of each match is just nchar(pattern). In the second case, it found each run of consecutive "213"s: there are two runs, the first with 12 / 3 = 4 repetitions and the second with 3 / 3 = 1 repetition. Using max(attr(x, "match.length")) / nchar(pattern) we get 4.
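If you need every run rather than only the longest, regmatches() can pull the matched runs out of the gregexpr() result, and dividing their lengths by nchar(pat) gives the repeat count of each run. A small sketch; run_lengths() is a hypothetical helper name, and like max_rep_pat() it assumes pat contains no regex metacharacters (otherwise they would need escaping).

```r
run_lengths <- function(pat, text) {
  # Every maximal run of consecutive copies of pat, one character vector per text
  runs <- regmatches(text, gregexpr(paste0("(", pat, ")+"), text))
  # Repeat count of each run = run length / pattern length
  lapply(runs, function(r) nchar(r) / nchar(pat))
}

res <- run_lengths("213", "2132132132137889213")
print(res[[1]])
```

For the example string this returns the counts 4 and 1, matching the match.length values 12 and 3 shown above.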
Another way would be:
fun1 <- function(pat, text) {
max_rep_pat1 <- function(pat, text) {
text1 <- gsub(pat, paste(" ", pat, " "), text)
rl <- rle(scan(text = text1, what = "", quiet = TRUE) == pat)
max(rl$lengths[rl$values])
}
setNames(mapply(max_rep_pat1, pat, text), NULL)
}
str1 <- c("2132132132137889213", "21321321321378892132132132132132213213")
str2 <- "213421342134213477"
fun1("2134", str2)
#[1] 4
fun1("213", str1)
#[1] 4 5
I have a list of strings which contain random characters such as:
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
I'd like to know which numbers are present at least once (unique()) in this list. The solution of my example is:
solution: c(7,667,11,5,2)
If someone has a method that does not consider 11 as "eleven" but as "one and one", it would also be useful. The solution in this condition would be:
solution: c(7,6,1,5,2)
(I found this post on a related subject: Extracting numbers from vectors of strings)
For the second answer, you can use gsub to remove every character that is not a digit, then split the result into single characters:
unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2
For the first answer, similarly using strsplit,
unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1] 7 667 11 5 2
PS: don't name your variable list, as there's a built-in function list(). I've named your data ll.
Here is yet another answer, this one using gregexpr to find the numbers, and regmatches to extract them:
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
temp1 <- gregexpr("[0-9]", l) # Individual digits
temp2 <- gregexpr("[0-9]+", l) # Numbers with any number of digits
as.numeric(unique(unlist(regmatches(l, temp1))))
# [1] 7 6 1 5 2
as.numeric(unique(unlist(regmatches(l, temp2))))
# [1] 7 667 11 5 2
A solution using stringi:
library(stringi)
# extract the numbers:
nums <- stri_extract_all_regex(unlist(list), "[0-9]+")
# Make vector and get unique numbers:
nums <- unlist(nums)
nums <- unique(nums)
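And that's your first solution (as character strings; wrap it in as.numeric() to get numbers).
For the second solution I would use substr:
nums_first <- sapply(nums, function(x) unique(substr(x,1,1)))
Note that this keeps only the first digit of each number. It happens to give 7, 6, 1, 5, 2 here, but to get every digit use unique(unlist(strsplit(nums, ""))).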
You could use ?strsplit (as suggested in @Arun's answer to Extracting numbers from vectors (of strings)):
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
## split string at non-digits
s <- strsplit(l, "[^[:digit:]]")
## convert strings to numeric ("" become NA)
solution <- as.numeric(unlist(s))
## remove NA and duplicates
solution <- unique(solution[!is.na(solution)])
# [1] 7 667 11 5 2
A stringr solution with str_match_all and piped operators. For the first solution:
library(stringr)
str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric
Second solution:
str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric
(Note: I've also called the list ll)
Use strsplit with a pattern that matches anything other than the digits 0-9:
For the example you have provided, do this:
tmp <- sapply(list, function (k) strsplit(k, "[^0-9]"))
Then simply take the union of all the 'sets' in the list:
tmp <- Reduce(union, tmp)
Then you only have to remove the empty string, e.g. with tmp <- tmp[tmp != ""].
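Put together as one runnable sketch, the strsplit/union steps look like this. It assumes the strings are held in a plain character vector l (as in the other answers) rather than in a variable named list, and relies on strsplit() already being vectorised, so the sapply() wrapper is not needed.

```r
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Split every string at each single non-digit character
tmp <- strsplit(l, "[^0-9]")

# Union of all pieces across the strings (keeps first-seen order)
tmp <- Reduce(union, tmp)

# Drop the empty strings produced by consecutive non-digits, then convert
tmp <- tmp[tmp != ""]
solution <- as.numeric(tmp)
print(solution)
```

This yields 7, 667, 11, 5, 2, the first solution asked for in the question.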
Check out the str_extract_numbers() function from the strex package.
pacman::p_load(strex)
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
charvec <- unlist(list)
print(charvec)
#> [1] "djud7+dg[a]hs667" "7fd*hac11(5)" "2tu,g7gka5"
str_extract_numbers(charvec)
#> [[1]]
#> [1] 7 667
#>
#> [[2]]
#> [1] 7 11 5
#>
#> [[3]]
#> [1] 2 7 5
unique(unlist(str_extract_numbers(charvec)))
#> [1] 7 667 11 5 2
Created on 2018-09-03 by the reprex package (v0.2.0).