How to keep the negative sign when converting character to numeric - r

I want to keep the negative sign of a character value when converting it to numeric, but I end up with the warning "NAs introduced by coercion".
test <- c("001","00-2","0003")
test <- as.numeric(as.character(test))
The expected output is:
1 -2 3
But the current output is:
1 NA 3
Can anyone please help me on this? Thanks!

1) sub Remove leading 0's and then convert:
as.numeric(sub("^0+", "", test))
## [1] 1 -2 3
2) trimws In R 3.6 or later this would also work:
as.numeric(trimws(test, "left", 0))
## [1] 1 -2 3
Edge case
3) paste0 If it is possible to have all zeros then the sub will reduce that string to an empty string so assuming that we have no decimals we can do this:
test2 <- c(test, "00")
as.numeric(sub("^0+", "", paste0(test2, ".0")))
## [1] 1 -2 3 0
3a) or this:
as.numeric(sub("^0*(.)", "\\1", test2))
## [1] 1 -2 3 0
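The zero-stripping ideas above can also be folded into one helper. This is only a minimal sketch, not part of the original answers: parse_padded is a hypothetical name, and it assumes the inputs are integers padded with leading zeros, optionally with a minus sign right after the padding. The lookahead only removes zeros when a digit still follows, so an all-zero string keeps its last zero.

```r
# Hypothetical helper (not from the original answers): strip leading zeros,
# but only when a digit, or a minus sign plus a digit, still follows them.
parse_padded <- function(x) {
  as.numeric(sub("^0+(?=-?\\d)", "", x, perl = TRUE))
}

test2 <- c("001", "00-2", "0003", "00")
parse_padded(test2)
## [1]  1 -2  3  0
```

perl = TRUE is needed here because the lookahead (?=...) is a Perl-style regex feature.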

Related

Remove characters from list which contains numeric and characters without NA coercion message

I've got a list which contains numbers and text items. I want to remove the text items and keep the numeric items. I don't mind using a few lines of code, so I can do the below with x, then remove the NAs using another line. But I want to avoid the NA coercion warning message.
x <- c(1,"_2",3 , 6 , "1_" , "a")
as.numeric(x)
[1] 1 NA 3 6 NA NA
Warning message:
NAs introduced by coercion
Any help is greatly appreciated. Thanks in advance.
You could use grep from base R either to drop every value that contains a non-digit (match \\D and invert the result) or to keep only the values that consist entirely of digits, i.e. ^\\d+$:
as.numeric(grep("\\D", x, value = TRUE, invert = TRUE))
[1] 1 3 6
as.numeric(grep("^\\d+$", x, value = TRUE))
[1] 1 3 6
We can use suppressWarnings to muffle the warning (the NAs remain and still have to be removed afterwards):
suppressWarnings(as.numeric(x))
[1] 1 NA 3 6 NA NA
Or with str_subset
library(stringr)
as.numeric(str_subset(x, "^\\d+$"))
[1] 1 3 6
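The digit-only patterns above would also drop negative or decimal values. If those should count as numeric, a slightly wider pattern can be used. This is a sketch under an assumed definition of "numeric" (optional minus sign, digits, optional decimal part); the extra "-4.5" element and the is_num name are added here for illustration only.

```r
x <- c(1, "_2", 3, 6, "1_", "a", "-4.5")

# Assumed pattern: optional minus, digits, optional decimal part.
is_num <- grepl("^-?\\d+(\\.\\d+)?$", x)

# Only strings that already look numeric are converted, so no warning is raised.
as.numeric(x[is_num])
## [1]  1.0  3.0  6.0 -4.5
```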

Count number of non-numeric characters in a string

How do I count the number of non-numeric characters in a string with both numeric and non-numeric characters in R
for example:
c("ab34","f354","q64423","abf3")
I would like the output to be:
c(2,1,1,3)
1) gsub Replace digits with the empty string and count what is left. No packages are used.
x <- c("ab34","f354","q64423","abf3")
nchar(gsub("\\d", "", x))
## [1] 2 1 1 3
2) gregexpr Another possibility is to use gregexpr and lengths to count the non-numerics. We append a non-numeric to each component so that there is always at least one match (with no match gregexpr returns a single -1, so lengths would report 1 rather than 0) and then subtract 1 at the end.
lengths(gregexpr("\\D", paste0(x, "X"))) - 1
## [1] 2 1 1 3
3) strsplit / %in% Split the strings into individual characters and count those that are not in 0:9.
sapply(strsplit(x, ""), function(x) sum(!x %in% 0:9))
## [1] 2 1 1 3
4) trimws In the example in the question the digits are always at the end. In that case (or if they were always at the beginning) we can trim them off and then count what is left. This needs R 3.6 or later.
nchar(trimws(x, whitespace = "\\d"))
## [1] 2 1 1 3
5) regexpr If the digits are always at the end then we can use regexpr to find the position of the first one. We append a 0 to ensure that there is at least one digit and subtract 1 at the end.
c(regexpr("\\d", paste0(x, 0)) - 1)
## [1] 2 1 1 3
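To see why approach 2 appends a character before counting, it can help to try inputs with no non-digits at all. The x2 vector and the n_gsub / n_greg names below are made up for this check; they are not from the original answers.

```r
x2 <- c("ab34", "12345", "")

# With no non-digit present, gregexpr still returns one element (-1),
# so lengths() alone would report 1 instead of 0.
lengths(gregexpr("\\D", x2[2]))
## [1] 1

# Approaches 1 and 2 agree on these edge cases.
n_gsub <- nchar(gsub("\\d", "", x2))
n_greg <- lengths(gregexpr("\\D", paste0(x2, "X"))) - 1
n_gsub
## [1] 2 0 0
```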

Coercing String to Vector

I'm trying to create a calculator that multiplies permutation groups written in cyclic form (the process of which is described in this post, for anyone unfamiliar: https://math.stackexchange.com/questions/31763/multiplication-in-permutation-groups-written-in-cyclic-notation). Although I know this would be easier to do with Python or something else, I wanted to practice writing code in R since it is relatively new to me.
My gameplan for this is take an input, such as "(1 2 3)(2 4 1)" and split it into two separate lists or vectors. However, I am having trouble starting this because from my understanding of character functions (which I researched here: https://www.statmethods.net/management/functions.html) I will ultimately have to use the function grep() to find the points where ")(" occur in my string to split from there. However, grep only takes vectors for its argument, so I am trying to coerce my string into a vector. In researching this problem, I have mostly seen people suggest to use as.integer(unlist(str_split())), however, this doesn't work for me as when I split, not everything is an integer and the values become NA, as seen in this example.
library(tidyverse)
x <- "(1 2 3)(2 4 1)"
x <- as.integer(unlist(str_split(x," ")))
x
Is there an alternative way to turn a string into a vector when there are not just integers involved? I also realize that the means by which I am trying to split up the two permutations is very roundabout, but that is because of the character functions that I researched this seems like the only way. If there are other functions that would make this easier, please let me know.
Thank you!
Comments in the code.
x <- "(1 2 3)(2 4 1)"
out1 <- strsplit(x, split = ")(", fixed = TRUE)[[1]] # split on close and open bracket
out2 <- gsub("[\\(|\\)]", replacement = "", out1) # remove brackets
out3 <- strsplit(out2, " ") # tease out numbers between spaces
lapply(out3, as.integer)
[[1]]
[1] 1 2 3
[[2]]
[1] 2 4 1
There aren't really any scalars in R. Single values like 1, TRUE, and "a" are all 1-element vectors, so grep(pattern, x) will work fine on your original string. As a starting point for getting towards your desired goal, I would suggest splitting the groups using:
> str_extract_all(x, "\\([0-9 ]+\\)")
[[1]]
[1] "(1 2 3)" "(2 4 1)"
If we need to split the strings with the brackets
strsplit(x, "(?<=\\))(?=\\()", perl = TRUE)[[1]]
#[1] "(1 2 3)" "(2 4 1)"
Or we can use a convenient wrapper from qdapRegex
library(qdapRegex)
ex_round(x, include.marker = TRUE)[[1]]
#[1] "(1 2 3)" "(2 4 1)"
An alternative, using library(magrittr):
x <- "(1 2 3)(2 4 1)"
x %>%
gsub("^\\(","c(",.) %>% gsub("\\)\\(","),c(",.) %>% gsub("(?=\\s\\d)",", ",.,perl=T) %>%
paste0("list(",.,")") %>% {eval(parse(text=.))}
result:
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 2 4 1
You could use chartr with read.table:
read.table(text= chartr("()"," \n",x))
# V1 V2 V3
# 1 1 2 3
# 2 2 4 1
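Putting the pieces from the answers together, the whole parse can be wrapped in one base R function. This is a minimal sketch: parse_cycles is a hypothetical name, and it assumes every cycle is written as digits separated by spaces inside round brackets.

```r
# Hypothetical helper: turn "(1 2 3)(2 4 1)" into a list of integer vectors.
parse_cycles <- function(s) {
  # pull out each "(...)" group
  groups <- regmatches(s, gregexpr("\\([0-9 ]+\\)", s))[[1]]
  # drop the brackets and any padding spaces
  inner <- trimws(gsub("[()]", "", groups))
  # split on runs of spaces and convert each piece to integer
  lapply(strsplit(inner, " +"), as.integer)
}

parse_cycles("(1 2 3)(2 4 1)")
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] 2 4 1
```

Because every piece is already a run of digits by the time as.integer sees it, no NA coercion warning is raised.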

order() function in R

I used order() function to do the following
x<-c(12,5,13,8)
order(x)
It gives the following result, indicating it is in descending order
[1] 2 4 1 3
However, when I typed the following
x<-c(12,11,13,14)
order(x)
It gives a result that is in ascending order
[1] 2 1 3 4
I am not quite sure if I missed anything. Thanks for your help!
order() returns the positions (indices) of the elements of x, arranged so that the values they point to are in ascending order (by default). So your output is as expected.
In case you were expecting the vector x itself to be sorted:
> x<-c(12,5,13,8)
# returns positions (indices)
> order(x)
[1] 2 4 1 3
# returns the ordered vector
> x[order(x)]
[1] 5 8 12 13
To order in descending order, use:
> x[order(x, decreasing = TRUE)]
[1] 13 12 8 5
You were just mistaken in the way you were reading the function.
The numbers that are being returned are the position in your vector.
Your first example:
x <- c(12,5,13,8)
order(x)
[1] 2 4 1 3
This tells you that, in ascending order, the smallest value is in the second position (5), the next smallest is in the fourth position (8), and so on.
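A common source of this confusion is that order() is easy to mistake for rank(). The short comparison below uses only base R functions on the vector from the question; the name idx is introduced just for this example.

```r
x <- c(12, 5, 13, 8)

idx <- order(x)   # positions of the values, smallest value first
idx
## [1] 2 4 1 3

x[idx]            # indexing with those positions sorts x, same as sort(x)
## [1]  5  8 12 13

rank(x)           # rank of each element in place: 12 is the 3rd smallest, etc.
## [1] 3 1 4 2
```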

Count number of occurrences at end of string

I want to count how many commas are at the end of a string with a regex:
x <- c("w,x,,", "w,x,", "w,x", "w,x,,,")
I'd like to get:
[1] 2 1 0 3
My attempt gives:
library(stringi)
stringi::stri_count_regex(x, ",+$")
## [1] 1 1 0 1
That is because the quantifier turns the whole run of commas into a single match, and I don't know how to count how many times the single character is repeated at the end.
The "match.length" attribute returned by regexpr seems to get the job done (-1 marks no match, which distinguishes it from zero-width matches such as lookaheads, whose length is 0)
attr(regexpr(",+$", x), "match.length")
## [1] 2 1 -1 3
Another option (with contribution from @JasonAizkalns) would be
nchar(x) - nchar(gsub(",+$", "", x))
## [1] 2 1 0 3
Or use the stringi package combined with nchar, specifying keepNA = TRUE (this way non-matches are reported as NA)
library(stringi)
nchar(stri_extract_all_regex(x, ",+$"), keepNA = TRUE)
## [1] 2 1 NA 3
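If the -1 from the match.length approach should be reported as 0, as in the desired output, one option is to clamp it with pmax. This is a small sketch building on the regexpr answer above; the names len and n_trailing are made up here.

```r
x <- c("w,x,,", "w,x,", "w,x", "w,x,,,")

# match.length is -1 where the pattern does not match
len <- attr(regexpr(",+$", x), "match.length")

# clamp -1 to 0 so "no trailing commas" counts as zero
n_trailing <- pmax(len, 0)
n_trailing
## [1] 2 1 0 3
```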
