Convert currency with commas into numeric - r

I have a column in a dataframe as follows:
COL1
$54,345
$65,231
$76,234
How do I convert it into this:
COL1
54345
65231
76234
The way I tried it at first was:
df$COL1<-as.numeric(as.character(df$COL1))
That didn't work because it said NA's were introduced.
Then I tried it like this:
df$COL1<-as.numeric(gsub("\\$","",as.character(df$COL1)))
And the same thing happened.
Any ideas?

We could use parse_number from the readr package, which drops any non-numeric characters before and after the number and, with the default locale, treats , as a grouping mark.
library(readr)
parse_number(df$COL1)
#[1] 54345 65231 76234

The reason the gsub didn't work is that the column still contains ",", which is also non-numeric. When converting with as.numeric, any element that cannot be parsed as a number becomes NA. So we need to remove both , and $ to make it work.
df$COL1 <- as.numeric(gsub('[$,]', '', df$COL1))
We put $ and , inside square brackets ([$,]) so that each is matched as a literal character (outside a character class, $ is a metacharacter that signifies the end of the string), and replace them with ''.
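To see why the square brackets matter, here is a small base-R check on one value from the question (the variable name x is only for illustration). An unescaped $ matches the empty string at the end of the input, so gsub removes nothing, while [$] matches the literal dollar sign.

```r
x <- "$54,345"

# Unescaped $ is an end-of-string anchor: it matches the empty string at the
# end, so replacing it with "" leaves x unchanged
gsub("$", "", x)
#[1] "$54,345"

# Inside a character class, $ is literal, so the dollar sign is removed
gsub("[$]", "", x)
#[1] "54,345"

# Removing the comma as well leaves a string that as.numeric can parse
as.numeric(gsub("[$,]", "", x))
#[1] 54345
```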
Or we can escape the $ (\\$) to match it literally and use alternation (|) to match the comma as well, replacing both with ''.
df$COL1 <- as.numeric(gsub('\\$|,', '', df$COL1))
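If some rows may contain other stray characters, it can help to report which values still fail to parse instead of silently getting NA. A minimal sketch, assuming a hypothetical helper named currency_to_numeric (not part of any package) and the three sample values from the question:

```r
# Hypothetical helper: strip "$" and "," and convert, warning about any values
# that still fail to parse instead of silently returning NA
currency_to_numeric <- function(x) {
  cleaned <- gsub("[$,]", "", as.character(x))
  out <- suppressWarnings(as.numeric(cleaned))
  # Elements that were present in the input but became NA after conversion
  bad <- is.na(out) & !is.na(x)
  if (any(bad)) {
    warning("could not parse: ", paste(unique(x[bad]), collapse = ", "))
  }
  out
}

df <- data.frame(COL1 = c("$54,345", "$65,231", "$76,234"))
df$COL1 <- currency_to_numeric(df$COL1)
df$COL1
#[1] 54345 65231 76234
```

Passing a value such as "n/a" would return NA for that element and emit a warning naming it, which makes it easier to see what else needs cleaning.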

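Another option uses stringr to remove '$' and ',' and then convert (mutate and %>% come from dplyr). Note that the pattern must be a character class; "\\$," would only match the two-character sequence "$,".
library(dplyr); library(stringr)
df %>% mutate(COL1 = COL1 %>% str_remove_all("[$,]") %>% as.numeric())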

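Nested gsub calls can also handle accounting-style negatives such as ($1,200): the inner call turns a leading ( into -, and the outer call removes $, ) and ,. Wrapping this in transform keeps it functional (it returns a modified copy of df) and takes advantage of NSE (non-standard evaluation), so COL1 can be referenced without df$.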
transform(df, COL1 = as.numeric(gsub("[$),]", "", gsub("^\\(", "-", COL1))))
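A quick check of the nested-gsub approach on a small data frame. The value "($1,200)" is an assumed accounting-style negative added for illustration; it is not from the question. Because transform returns a copy, the result is assigned back to df.

```r
df <- data.frame(COL1 = c("$54,345", "($1,200)", "$76,234"))

# Inner gsub: a leading "(" becomes a minus sign
# Outer gsub: remove "$", ")" and "," so as.numeric can parse the rest
df <- transform(df, COL1 = as.numeric(gsub("[$),]", "", gsub("^\\(", "-", COL1))))
df$COL1
#[1] 54345 -1200 76234
```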

Related

Erase comma and apostrophe in character R

I want to remove the comma and the apostrophe, but not the decimal point, from the following character string, and then convert it to numeric.
I have this:
characterExample <- "234'564,900.99"
I want 234564900.99
I tried the following, but it doesn't work:
result <- gsub("[:punct:].","", characterExample)
Another option is to explicitly remove the characters you want to remove:
gsub("[',]", "", characterExample)
#[1] "234564900.99"
An option is to match everything except digits and . by negating the character class with ^ inside the square brackets:
gsub("[^0-9.]+","", characterExample)
#[1] "234564900.99"
Another option is to use the PCRE verbs (*SKIP)(*F) to skip the . while matching the rest of the punctuation:
gsub("(\\.)(*SKIP)(*F)|[[:punct:]]+", "", characterExample, perl = TRUE)
#[1] "234564900.99"
NOTE: Both solutions remove every punctuation character other than the . and replace it with blank (""); the [^0-9.] version additionally removes any other non-digit characters.
You can also use the alternation (pipe) symbol:
gsub(",|'", "", characterExample)
#[1] "234564900.99"
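The question ultimately asks for a number, so the cleaned string still has to go through as.numeric. A minimal sketch reusing characterExample; note that R prints only 7 significant digits by default, so the decimals are stored but not shown unless you format the value:

```r
characterExample <- "234'564,900.99"

# Remove apostrophes and commas, then convert to numeric
result <- as.numeric(gsub("[',]", "", characterExample))

# The default print rounds to 7 significant digits ...
print(result)
#[1] 234564901

# ... but the decimals are still there
format(result, nsmall = 2)
#[1] "234564900.99"
```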

removing numeric character from data

I want to remove the number 0 from all the names in my data.frame.
I have tried to do it myself, but this is my first time working with strings.
I have tried:
gsub('\0', '', df )
reproducible code:
df <- c("y2016.09", "y2010.05", "y2010.06", "y2010.07", "y2010.08",
"y2010.09")
expected output
y2016.9
y2010.5
y2010.6
y2010.7
y2010.8
y2010.9
We can match the . followed by zero or more 0s (0*) and replace the match with ., i.e. put back the dot that was consumed by the match. Because . is a regex metacharacter that matches any character, it is escaped (\\.) so that it is matched literally.
sub("\\.0*", ".", df)
#[1] "y2016.9" "y2010.5" "y2010.6" "y2010.7" "y2010.8" "y2010.9"
Here is another regex solution using a lookbehind, though not as simple as the one by @akrun:
gsub("(?<=\\.)0+", "", df, perl = TRUE)
#[1] "y2016.9" "y2010.5" "y2010.6" "y2010.7" "y2010.8" "y2010.9"
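The question mentions column names of a data.frame rather than a plain vector. A sketch assuming a hypothetical data frame d whose names follow the same pattern; the same sub call is simply applied to names(d):

```r
# Hypothetical data frame whose column names follow the question's pattern
d <- data.frame(y2016.09 = 1, y2010.05 = 2, y2010.10 = 3)

# sub() replaces only the first match in each name; 0* can also match zero
# zeros, so "y2010.10" is left as it is
names(d) <- sub("\\.0*", ".", names(d))
names(d)
#[1] "y2016.9"  "y2010.5"  "y2010.10"
```

One edge case to watch: a name ending in .00 would become "y2010." with either pattern shown above.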

Extract only number between commas

I have a returned string like this from my code: (<C1>, 4.297, %)
And I am trying to extract only the value 4.297 from this string using the gsub command:
Fat<-gsub("\\D", "", stringV)
However, this extracts not only 4.297 but also the number '1' in C1.
Is there a way to extract only 4.297 from this string?
Thanks
How about this?
# Your sample character string
ss <- "(<C1>, 4.297, %)"
gsub(".+,\\s*(\\d+\\.\\d+),.+", "\\1", ss)
#[1] "4.297"
or
gsub(".+,\\s*([0-9.]+),.+", "\\1", ss)
#[1] "4.297"
Convert to numeric with as.numeric if necessary.
Another option is str_extract, matching one or more digits or . characters ([0-9.]+) that are preceded and followed by a word boundary (\\b):
library(stringr)
as.numeric(str_extract(stringV, "\\b[0-9.]+\\b"))
#[1] 4.297
If there are multiple numbers, use str_extract_all
data
stringV <- "(<C1>, 4.297, %)"
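If stringr isn't available, base R can do the same extraction with regexpr/gregexpr and regmatches. A sketch assuming a hypothetical string multi that contains two numbers; perl = TRUE selects the PCRE engine for the \\b word boundaries:

```r
multi <- "(<C1>, 4.297, %, 12.5)"  # hypothetical string with two numbers
pattern <- "\\b[0-9.]+\\b"

# regexpr() locates the first match, gregexpr() locates all matches;
# regmatches() pulls the matched substrings out of the string
first <- as.numeric(regmatches(multi, regexpr(pattern, multi, perl = TRUE)))
all_nums <- as.numeric(regmatches(multi, gregexpr(pattern, multi, perl = TRUE))[[1]])

first
#[1] 4.297
all_nums
#[1]  4.297 12.500
```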
An alternative is to treat your string as comma-separated values and use read.csv:
df <- read.csv(text = stringV, colClasses = c("character", "numeric", "character"), header = FALSE)
V1 V2 V3
1 (<C1> 4.297 %)
This method relies on the number being in the second position of the string.
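You can also rely on as.numeric turning non-numeric strings into NA: split the string on commas, convert each piece, and keep the non-NA values (suppressWarnings silences the expected "NAs introduced by coercion" warning).
ss <- suppressWarnings(as.numeric(unlist(strsplit(stringV, ','))))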
ss[!is.na(ss)]
#[1] 4.297

formatting the date in R

I have a date value as follows
"'2015-10-24'"
Its class is character.
I am trying to format this value such that it looks like this '10/24/2015'
I know how to use the noquote function to strip the quotes and the gsub function to replace - with /, but I am not sure how to reorder the year, month and day so that it looks like '10/24/2015'.
Any help is much appreciated.
We can convert to Date class after removing the ' with gsub, and then use format to get the expected output
format(as.Date(gsub("'", '', v1)), "'%m/%d/%Y'")
#[1] "'10/24/2015'" "'10/25/2015'"
Or, instead of removing the ' with gsub, we can include the ' in the format string passed to as.Date:
format(as.Date(v1, "'%Y-%m-%d'"), "'%m/%d/%Y'")
#[1] "'10/24/2015'" "'10/25/2015'"
This can be made more compact with the lubridate package:
library(lubridate)
format(ymd(v1), "'%m/%d/%Y'")
#[1] "'10/24/2015'" "'10/25/2015'"
If we don't need the ' in the output, we simply leave it out of the format:
format(ymd(v1), "%m/%d/%Y")
#[1] "10/24/2015" "10/25/2015"
Or we can do this using only gsub by capturing the characters as groups. In the code below, we capture the first 4 characters (.{4}) as a group by wrapping them in parentheses, then match the -, capture the next two characters, match the second -, and capture the last two characters. In the replacement, we can rearrange the capture groups as required: here the second group (\\2) comes first, followed by /, then the third (\\3), then / and the first (\\1).
gsub('(.{4})-(.{2})-(.{2})', '\\2/\\3/\\1', v1)
#[1] "'10/24/2015'" "'10/25/2015'"
To avoid the quotes,
gsub('.(.{4})-(.{2})-(.{2}).', '\\2/\\3/\\1', v1)
#[1] "10/24/2015" "10/25/2015"
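One design consideration: the regex-only approaches just rearrange characters and never check that the result is a real date, whereas going through as.Date with an explicit format turns an impossible date into NA. A small comparison, assuming a hypothetical vector v2 whose second element is deliberately invalid:

```r
v2 <- c("'2015-10-24'", "'2015-13-45'")  # second value is deliberately invalid

# Pure regex: rearranges the characters even for an impossible date
gsub('.(.{4})-(.{2})-(.{2}).', '\\2/\\3/\\1', v2)
#[1] "10/24/2015" "13/45/2015"

# as.Date with an explicit format returns NA for the invalid date
format(as.Date(v2, "'%Y-%m-%d'"), "%m/%d/%Y")
#[1] "10/24/2015" NA
```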
In addition, there are other ways such as splitting the string
vapply(strsplit(v1, "['-]"), function(x) paste(x[c(3,4,2)], collapse='/'), character(1))
#[1] "10/24/2015" "10/25/2015"
or extracting the numeric parts with str_extract_all and pasting them as before.
library(stringr)
vapply(str_extract_all(v1, '\\d+'), function(x)
paste(x[c(2,3,1)], collapse='/'), character(1))
#[1] "10/24/2015" "10/25/2015"
data
v1 <- c("'2015-10-24'", "'2015-10-25'")
You can also use the function strftime to get the result
d <- "'2015-10-24'"
strftime(as.Date(gsub("'", "", d)), "%m/%d/%Y")
# [1] "10/24/2015"
