Remove first "." from values in R


I have a dataset with different values in R. Some values are like 11.474 and others like 1.034.496 in the same column. I would like to change the values with two dots from 1.034.496 to 1034.496. Is there anyone who could help me please?
Thanks for the help!

Use gsub with Perl regexes:
df <- data.frame(a = c('11.474', '1.034.496', '1.234.034.496'))
df$a = gsub('[.](?=.*[.])', '', df$a, perl = TRUE)
print(df)
## a
## 1 11.474
## 2 1034.496
## 3 1234034.496
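Here, [.](?=.*[.]) matches a literal dot (which has to be escaped like so \. or put into a character class like so: [.]) only when another literal dot occurs somewhere later in the string; that condition is the positive lookahead (?=PATTERN). As a result, every dot except the last one is removed.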
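If the goal is to end up with numbers rather than strings, the same pattern can be wrapped in a small helper and followed by as.numeric. This is only a sketch: strip_thousands_dots is a made-up name, and it assumes the last dot is always the decimal separator.

```r
# Hypothetical helper (not from the original answer): drop every dot that
# still has another dot somewhere after it, so only the last dot survives.
strip_thousands_dots <- function(x) {
  gsub("[.](?=.*[.])", "", x, perl = TRUE)
}

vals <- c("11.474", "1.034.496", "1.234.034.496")
cleaned <- strip_thousands_dots(vals)

# Assumes the remaining dot is the decimal point.
nums <- as.numeric(cleaned)
print(cleaned)
print(nums)
```

Note that a value such as 1.034 with a single dot stays ambiguous (thousands separator or decimal point); the regex cannot decide that for you.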

I guess there must be smarter regex approaches than the one below, but here is my attempt
> ifelse(lengths(gregexpr("\\.",v))>1,sub("\\.","",v),v)
[1] "11.474" "1034.496"
where
v <- c("11.474","1.034.496")
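One thing to keep in mind with this approach: sub only replaces the first match, so a value with three dots keeps its second dot. A quick check, using a made-up vector v3 that is not part of the original answer:

```r
# v3 is an illustrative vector with an extra three-dot value.
v3 <- c("11.474", "1.034.496", "1.234.034.496")

# Same expression as in the answer: remove only the FIRST dot
# from values that contain more than one dot.
first_only <- ifelse(lengths(gregexpr("\\.", v3)) > 1, sub("\\.", "", v3), v3)
print(first_only)
```

That is fine for the data in the question (at most two dots), but larger values would need the lookahead version or a loop.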

Related

Replacing multiple punctuation marks in a column of data

Column in a df:
chr10:123453:A:C
chr10:2345543:TTTG:CG
chr10:3454757:G:C
chr10:4567875765:C:G
Desired output:
chr10:123453_A/C
chr10:2345543_TTTG/CG
chr10:3454757_G/C
chr10:4567875765_C/G
I think I could use strsplit but I wanted to try and do it all in an R one-liner. Any ideas would be greatly welcome!

Try this:
gsub(":([A-Z]+):([A-Z]+)$", "_\\1/\\2", x, perl = TRUE)
[1] "chr10:123453_A/C" "chr10:2345543_TTTG/CG"
Here we use backreferences twice: \\1 recalls what was captured between the penultimate and the last :, whereas \\2 recalls what was captured after the last :.
Data:
x <- c("chr10:123453:A:C","chr10:2345543:TTTG:CG")
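If you would rather have the pattern describe the whole string, a stricter variant can capture all fields explicitly. This is just a sketch and assumes every value has exactly four colon-separated fields; strings that do not fit are returned unchanged.

```r
x <- c("chr10:123453:A:C", "chr10:2345543:TTTG:CG")

# Group 1: the first two fields (kept as they are, colon included).
# Group 2: third field; group 3: fourth field.
y <- sub("^([^:]+:[^:]+):([^:]+):([^:]+)$", "\\1_\\2/\\3", x)
print(y)
```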

removing numeric character from data

I want to remove the number 0 from all the names in my data.frame.
I have tried to do it myself, however, working with strings is a first time for me.
I have tried:
gsub('\0', '', df )
reproducible code:
df <- c("y2016.09", "y2010.05", "y2010.06", "y2010.07", "y2010.08",
"y2010.09")
expected output
y2016.9
y2010.5
y2010.6
y2010.7
y2010.8
y2010.9

We can anchor the match on the literal dot (. is a regex metacharacter that matches any character, so it is escaped as \\. to match it literally) followed by zero or more 0s (0*). The replacement puts back the . that was consumed by the match, so only the zeros right after the dot are dropped.
sub("\\.0*", ".", df)
#[1] "y2016.9" "y2010.5" "y2010.6" "y2010.7" "y2010.8" "y2010.9"
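To see why the pattern is tied to the dot instead of simply deleting every 0, here is a small comparison. The vector d is made up for illustration and includes a month "10" that should keep its zero.

```r
# d is an illustrative vector, not the OP's data.
d <- c("y2016.09", "y2010.10")

# Only zeros directly after the dot are removed.
after_dot <- sub("\\.0*", ".", d)

# Naive version: removes every 0, including the ones in the year.
all_zeros <- gsub("0", "", d)

print(after_dot)
print(all_zeros)
```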

Here is another regex solution using lookarounds, but not as simple as the one by @akrun
> gsub("(?<=\\.)0+","",df,perl = TRUE)
[1] "y2016.9" "y2010.5" "y2010.6" "y2010.7" "y2010.8" "y2010.9"

Regexpr not working as expected

For the following string <10.16;13.05) I want to match only the first number (sometimes the first number does not exist, i.e. <;13.05)). I used the following regular expression:
grep("[0-9]+\\.*[0-9]*(?=;)","<10.16;13.05)",value=T,perl=T)
However, the result is not "10.16" but "<10.16;13.05)". Could anyone please help me with this one? Thanks.

You could also use strsplit here with minimum regex, i.e.
x <- '<10.16;13.05)'
as.numeric(gsub('<(.*)', '\\1', unlist(strsplit(x, ';', fixed = TRUE))[1]))
#[1] 10.16
x <- '<;13.05)'
as.numeric(gsub('<(.*)', '\\1', unlist(strsplit(x, ';', fixed = TRUE))[1]))
#[1] NA

I believe you are using the wrong regex function. grep only reports which elements contain a match (their indices, or with value=TRUE the whole matching elements); it does not extract the matched substring.
Try instead
regmatches("<10.16;13.05)", regexpr("\\d*\\.\\d*", "<10.16;13.05)"))
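Note that the pattern above would pick up 13.05 when the first number is missing. If you want to keep the OP's lookahead so that only a number directly followed by ; counts, something along these lines should work. It is a sketch; first_num is just an illustrative name.

```r
x <- c("<10.16;13.05)", "<;13.05)")

# Digits, optional decimal part, and a ';' right after (lookahead needs perl = TRUE).
m <- regexpr("[0-9]+\\.?[0-9]*(?=;)", x, perl = TRUE)

# regmatches() silently drops non-matching elements, so fill in NA by hand.
first_num <- rep(NA_character_, length(x))
first_num[m > 0] <- regmatches(x, m)

result <- as.numeric(first_num)
print(result)
```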

Remove part of a string until a character is found R

I have a regex problem or somewhat regex related problem...
I have strings that look like this:
"..........))))..)))))))"
"....))))))))...)).))))..))"
"......))))...)))...)))))"
I want to remove the initial dot sequence, so that I only get the string starting at the first occurrence of the ")" symbol. Say, the output would be something like:
"))))..)))))))"
"))))))))...)).))))..))"
"))))...)))...)))))"
I assume it would be somewhat similar to a lookahead regex but cannot figure out the correct one...
Any help?
Thanks

We match for 0 or more dots (\\.*) from the start (^) of the string and replace it with blank
sub("^\\.*", "", v1)
#[1] "))))..)))))))" "))))))))...)).))))..))" "))))...)))...)))))"
If it needs to start from ), then as above match 0 or more dots till the first ) and replace with the )
sub("^\\.*\\)", ")", v1)
#[1] "))))..)))))))" "))))))))...)).))))..))" "))))...)))...)))))"
data
v1 <- c("..........))))..)))))))", "....))))))))...)).))))..))", "......))))...)))...)))))")

You can simply remove dots from the beginning of the line (marked in the regex by ^) until you reach a non-dot character:
a <- "..........))))..)))))))"
b <- "....))))))))...)).))))..))"
c <- "......))))...)))...)))))"
sub("^\\.*", "", a) # "))))..)))))))"
sub("^\\.*", "", b) # "))))))))...)).))))..))"
sub("^\\.*", "", c) # "))))...)))...)))))"

The way your question is worded, the goal isn't to remove just . from the beginning, but any symbol until the first ) is encountered. So this answer is a more general solution.
stringr::str_extract("..........))))..)))))))","\\).*$")
Alternatively, if you want to stick with base R, you could use sub/gsub like this:
gsub("[^\\)]*(\\).*$)","\\1","..........))))..)))))))")
sub("[^\\)]*","","..........))))..)))))))")
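The difference between the two readings only shows up when something other than dots precedes the first ")". A small illustration with a made-up string s:

```r
# s is an illustrative string that has letters before the first ')'.
s <- "..ab..))"

# Strips leading dots only; stops at the first non-dot character.
dots_only <- sub("^\\.*", "", s)

# Strips everything up to (not including) the first ')'.
until_paren <- sub("^[^)]*", "", s)

print(dots_only)
print(until_paren)
```

For the strings in the question both give the same result, since only dots come before the first ")".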

Finding number of r's in the vector (Both R and r) before the first u

rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
In the above code we need to find the number of r's (both R and r) in rquote before the first u.

You could use substrings.
## find position of first 'u'
u1 <- regexpr("u", rquote, fixed = TRUE)
## get count of all 'r' or 'R' before 'u1'
lengths(gregexpr("r", substr(rquote, 1, u1), ignore.case = TRUE))
# [1] 5
This follows what you ask for in the title of the post. If you want the count of all the "r", case insensitive, then simplify the above to
lengths(gregexpr("r", rquote, ignore.case = TRUE))
# [1] 6
Then there's always stringi
library(stringi)
## count before first 'u'
stri_count_regex(stri_sub(rquote, 1, stri_locate_first_regex(rquote, "u")[,1]), "r|R")
# [1] 5
## count all R or r
stri_count_regex(rquote, "r|R")
# [1] 6

To get the number of R's before the first u, you need to make an intermediate step. (You probably don't need to. I'm sure akrun knows some incredibly cool regular expression to get the job done, but it won't be as easy to understand as this).
rquote <- "R's internals are irrefutably intriguing"
before_u <- gsub("u[[:print:]]+$", "", rquote)
length(stringr::str_extract_all(before_u, "(R|r)")[[1]])

You may try this,
> length(str_extract_all(rquote, '[Rr]')[[1]])
[1] 6
To get the count of all r's before the first u
> length(str_extract_all(rquote, perl('u.*(*SKIP)(*F)|[Rr]'))[[1]])
[1] 5
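As far as I know, the perl() helper is no longer available in current stringr versions, and the ICU engine that stringr uses does not understand (*SKIP)(*F). If that is the case on your setup, the same pattern can be run in base R with perl = TRUE. A sketch, assuming R's PCRE build supports these verbs:

```r
rquote <- "R's internals are irrefutably intriguing"

# 'u.*(*SKIP)(*F)' discards everything from the first 'u' onwards,
# so the alternative '[Rr]' can only match before that 'u'.
hits <- regmatches(rquote,
                   gregexpr("u.*(*SKIP)(*F)|[Rr]", rquote, perl = TRUE))
n_before_u <- lengths(hits)
print(n_before_u)
```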

EDIT: Just saw "before the first u". In that case, we can get the position of the first 'u' with either which or match.
Then, using the strsplit output from the OP's code, apply grepl with ignore.case=TRUE to 'chars' up to that position (ind) to get a logical index of the r's, and sum it to get the count.
ind <- which(chars=='u')[1]
Or
ind <- match('u', chars)
sum(grepl('r', chars[seq(ind)], ignore.case=TRUE))
#[1] 5
Or we can use sub and gsub on the original string ('rquote'). The inner sub removes the characters starting with the first u until the end of the string (u.*$), and the outer gsub matches all characters except R, r ([^Rr]) and replaces them with ''. We can use nchar to get the count of the characters remaining.
nchar(gsub('[^Rr]', '', sub('u.*$', '', rquote)))
#[1] 5
Or if we want to count the 'r' in the entire string, gregexpr to get the position of matching characters from the original string ('rquote') and get the length
length(gregexpr('[rR]', rquote)[[1]])
#[1] 6
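One caveat with counting via length(gregexpr(...)): when there is no match at all, gregexpr returns -1, which still has length 1. Going through regmatches avoids that. A small sketch with a made-up string no_r:

```r
# no_r is an illustrative string that contains no 'r' or 'R'.
no_r <- "abc"

# gregexpr() returns -1 for "no match", so the length is 1, not 0.
naive_count <- length(gregexpr("[rR]", no_r)[[1]])

# regmatches() yields character(0) for "no match", so the count is 0.
safe_count <- lengths(regmatches(no_r, gregexpr("[rR]", no_r)))

print(naive_count)
print(safe_count)
```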
