Convert mathematical notation to string - r

The solution might be very simple, but I can't seem to figure it out. I have the following number:
a = 1000000
#> a
#[1] 1e+06
I would like to convert "a" to a string, but when I try using toString, it gives the following:
#> toString(a)
#[1] "1e+06"
I would like to get: 1,000,000 instead, with the comma separator. Is that easily feasible?
Thanks!

format(1e6, big.mark=",", scientific=FALSE) or prettyNum(1000000, big.mark=",", scientific=FALSE) should give you the desired result.
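As a minimal sketch of the two calls above (the variable names are only illustrative), the following also shows formatC, another base R function, and the trim argument, which matters for vectors because format pads all results to a common width by default:

```r
a <- 1000000

# format() and prettyNum() both accept big.mark; scientific = FALSE
# suppresses the 1e+06 notation.
s1 <- format(a, big.mark = ",", scientific = FALSE)
s2 <- prettyNum(a, big.mark = ",", scientific = FALSE)

# formatC() with format = "d" formats the value as an integer.
s3 <- formatC(a, format = "d", big.mark = ",")

# For vectors, format() pads to a common width unless trim = TRUE.
v <- format(c(1, 1000000), big.mark = ",", scientific = FALSE, trim = TRUE)

print(c(s1, s2, s3))  # "1,000,000" three times
print(v)              # "1" "1,000,000"
```

Note that the result is a character string, so it is meant for display only and should not be used in further arithmetic.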

Related

How to split filename strings and convert to a datetime in R

In R I'd like to split file names in the format "a_b_c_d.jpg"
For example:
20190104_080314_2048_1700.jpg
The Date: 2019.01.04 and time 08:03:14 is important to me. The other numbers (2048= pixel, 1700= filter) are not.
So I need the a and b value.
If I use strsplit I get: [1]"a" "b" "c" "d.jpg", but I want only [1] a [2] b.
And in the end I want to combine the [1] date and [2] time into one value: 2019-01-04T08:03:14
Does anyone have an idea how to do this?
Thanks for helping me with programming for my astrological research about the sun activity :)
You can use a regular expression to get the pieces of the string you need.
library(stringr)
x <- '20190104_080314_2048_1700.jpg'
str_replace(x, '(^.{4})(.{2})(.{2})_(.{2})(.{2})(.{2}).*', '\\1-\\2-\\3T\\4:\\5:\\6')
#[1] "2019-01-04T08:03:14"
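The expression is anchored to the start of the string, then captures the first four characters, then the next two characters, and so on. Each pair of parentheses is a capture group; the first pair is capture group 1 (i.e. \1), and the replacement string puts the groups back together with the separators you want.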
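The same idea can be sketched in base R with sub(), in case you would rather not load stringr. This variant assumes the file names always start with eight digits, an underscore, and six digits, and it uses \\d instead of . so that a name with a different layout is left unchanged rather than silently mangled:

```r
x <- "20190104_080314_2048_1700.jpg"

# Six capture groups: year, month, day, hour, minute, second.
pattern <- "^(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2}).*$"

iso_string <- sub(pattern, "\\1-\\2-\\3T\\4:\\5:\\6", x)
print(iso_string)  # "2019-01-04T08:03:14"
```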
There are two steps here. First is to split the string, as you suggest, and second to convert those outputs to a datetime object.
Step 1:
strsplit produces a list object. To access individual parts of that list, you need to unlist() it and then call the specific elements you're after.
t <- "20190104_080314_2048_1700.jpg"
t.split <- unlist(strsplit(t, "_"))[c(1,2)]
# [1] "20190104" "080314"
Step 2:
Now you can convert these two strings to a datetime object of your choice. Using lubridate makes it pretty easy:
library(lubridate)
ymd_hms(paste(t.split[1], t.split[2]))
# [1] "2019-01-04 08:03:14 UTC"
or you can use the base R function strptime:
strptime(paste(t.split[1], t.split[2]), format="%Y%m%d %H%M%S")
# [1] "2019-01-04 08:03:14 PST"
Note the difference in the default timezones, and be sure to specify the right one (both functions take a tz= argument).
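To get from the two steps above to the requested 2019-01-04T08:03:14 form for several files at once, something like the following sketch might work. The second file name is made up for illustration, and UTC is only an assumed time zone; use whichever zone your camera clock was set to:

```r
files <- c("20190104_080314_2048_1700.jpg",
           "20190105_121500_2048_1700.jpg")  # second name is hypothetical

# Keep only the first two underscore-separated fields of each name.
stamps <- vapply(strsplit(files, "_", fixed = TRUE),
                 function(p) paste(p[1], p[2]),
                 character(1))

# Parse with an explicit time zone, then format in ISO 8601 style.
dt  <- as.POSIXct(stamps, format = "%Y%m%d %H%M%S", tz = "UTC")
iso <- format(dt, "%Y-%m-%dT%H:%M:%S")
print(iso)  # "2019-01-04T08:03:14" "2019-01-05T12:15:00"
```

Keeping dt as a POSIXct vector is handy if you later want to sort or compute time differences; the formatted string is only needed for display or export.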

Partial string extraction with stringr - getting NA

I'm trying to extract part of a string using stringr.
I'm aiming for the output to be E5_1_C33 and E5_1_C23, but instead I'm getting NA.
Any help would be appreciated!
library(stringr)
mystring <- c("can_ComplianceWHOInfrastructurePol_E5_1_C33","can_ComplianceWHOInfrastructurePol_E5_1_C23")
str_extract(mystring, "A\\d_\\d_B\\d\\d$")
I slightly modified your pattern, since you need to match any letter, not only the literal A and B:
str_extract(mystring, "[A-Za-z]\\d_\\d_[A-Za-z]\\d\\d$")
Here's an R base approach using gsub
> gsub(".*(\\w{2}_\\w{1}_\\w{3})$", "\\1", mystring)
[1] "E5_1_C33" "E5_1_C23"
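For reference, the original pattern returned NA because A and B in a regular expression match only those literal letters, while the strings contain E and C. A base R sketch of the same extraction, assuming the codes always use uppercase letters, could use regexpr() and regmatches():

```r
mystring <- c("can_ComplianceWHOInfrastructurePol_E5_1_C33",
              "can_ComplianceWHOInfrastructurePol_E5_1_C23")

# Letter, digit, "_", digit, "_", letter, two digits, at the end of the string.
m <- regexpr("[A-Z]\\d_\\d_[A-Z]\\d{2}$", mystring)

# regmatches() returns only the matched substrings.
codes <- regmatches(mystring, m)
print(codes)  # "E5_1_C33" "E5_1_C23"
```

Unlike str_extract, regmatches() drops elements that do not match instead of returning NA, so the result can be shorter than the input.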

Replace dashes with colon in part of string

I have a dataframe with date and time values as characters, as extracted from a database. Currently the date/time looks like this: 2017-03-17 11-56-20
I want it to look like this: 2017-03-17 11:56:20
It doesn't seem to be as simple as replacing all the dashes using gsub as in R: How to replace . in a string?
I'm thinking it has something to do with the positioning, like telling R to only look after the space. Ideas?
Since the string represents a date-time, you can parse it with strptime:
x <- "2017-03-17 11-56-20"
as.character(strptime(x, "%Y-%m-%d %H-%M-%S", tz = ""))
# [1] "2017-03-17 11:56:20"
Try matching the following pattern:
(\\d\\d)-(\\d\\d)-(\\d\\d)$
and then replace that with:
\\1:\\2:\\3
This will match your timestamp exclusively, because of the terminal anchor $ at the end of the pattern. Then, we rebuild the timestamp the way you want using colons and the three capture groups.
gsub("(\\d\\d)-(\\d\\d)-(\\d\\d)$", "\\1:\\2:\\3", x)
[1] "2017-03-17 11:56:20"
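The "only look after the space" idea from the question can also be sketched directly: split each value at the space, replace dashes only in the time part, and paste the halves back together. This assumes every value contains exactly one space between date and time; the second element is made up for illustration:

```r
x <- c("2017-03-17 11-56-20",
       "2017-03-18 09-05-01")  # second value is hypothetical

# Everything before the space is the date, everything after it is the time.
date_part <- sub(" .*$", "", x)
time_part <- sub("^.* ", "", x)

# Replace dashes in the time part only.
fixed_x <- paste(date_part, gsub("-", ":", time_part, fixed = TRUE))
print(fixed_x)  # "2017-03-17 11:56:20" "2017-03-18 09:05:01"
```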
You can also use the anytime package to take care of the formatting for you (it also coerces the value to POSIXct):
library(anytime)
anytime(x)
# [1] "2017-03-17 11:56:20 AEDT"
as.character(anytime(x))
# [1] "2017-03-17 11:56:20"

extract text from string in R

I have a lot of strings that all look similar, e.g.:
x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"
I would like to extract the: Whatiwant, Whatiwanttoo, and the Whatiwa in R.
I started with substring(x1,15,23), but I don't know how to generalize it. How can I always extract the part between the last _ and the .txt ?
Thank you!
You can use regexp capture groups:
gsub(".*_([^_]*)\\.txt","\\1",x1)
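Because sub() and gsub() are vectorized, the same capture-group pattern can be applied to all of the strings at once. The sketch below also shows an alternative that strips everything up to the last underscore and then removes the extension with tools::file_path_sans_ext, which may be convenient if the extension is not always .txt:

```r
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# Capture whatever sits between the last "_" and the ".txt" suffix.
last_field <- sub(".*_([^_]*)\\.txt$", "\\1", x)

# Alternative: drop everything up to the last "_", then drop the extension.
last_field_alt <- tools::file_path_sans_ext(sub(".*_", "", x))

print(last_field)  # "Whatiwant" "Whatiwanttoo" "Whatiwa"
```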
You can also use the stringr library, with functions like str_extract (among many other possibilities), if you would rather not go deep into regular expressions. It is extremely easy to use:
x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"
library(stringr)
patron <- "(What)[a-z]+"
str_extract(x1, patron)
## [1] "Whatiwant"
str_extract(x2, patron)
## [1] "Whatiwanttoo"
str_extract(x3, patron)
## [1] "Whatiwa"

How to remove last n characters from every element in the R vector

I am very new to R, and I could not find a simple example online of how to remove the last n characters from every element of a vector (array?)
I come from a Java background, so what I would like to do is to iterate over every element of a$data and remove the last 3 characters from every element.
How would you go about it?
Here is an example of what I would do. I hope it's what you're looking for.
char_array = c("foo_bar","bar_foo","apple","beer")
a = data.frame("data"=char_array,"data2"=1:4)
a$data = substr(a$data,1,nchar(a$data)-3)
a should now contain:
  data data2
1 foo_     1
2 bar_     2
3   ap     3
4    b     4
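Since R functions like substr and nchar are vectorized, no explicit loop is needed. To generalize from 3 to any n, the approach above could be wrapped in a small helper; drop_last is a hypothetical name used only for this sketch:

```r
# Hypothetical helper: remove the last n characters from every element of x.
drop_last <- function(x, n) {
  x <- as.character(x)  # also handles factor columns
  # pmax() keeps the stop position from going below 0 for short strings.
  substr(x, 1, pmax(nchar(x) - n, 0))
}

trimmed <- drop_last(c("foo_bar", "bar_foo", "apple", "beer", "so"), 3)
print(trimmed)  # "foo_" "bar_" "ap" "b" ""
```

Strings shorter than n come back as empty strings rather than raising an error.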
Here's a way with gsub:
cs <- c("foo_bar","bar_foo","apple","beer")
gsub('.{3}$', '', cs)
# [1] "foo_" "bar_" "ap" "b"
Although this is mostly the same as the answer by @nfmcclure, I prefer the stringr package, as it provides a set of functions whose names are more consistent and descriptive than those in base R (in fact, I always have to google "how to get the number of characters in R" because I can't remember the name nchar()).
library(stringr)
str_sub(iris$Species, end=-4)
#or
str_sub(iris$Species, 1, str_length(iris$Species)-3)
This removes the last 3 characters from each value at Species column.
The same may be achieved with the stringi package:
library('stringi')
char_array <- c("foo_bar","bar_foo","apple","beer")
a <- data.frame("data"=char_array, "data2"=1:4)
(a$data <- stri_sub(a$data, 1, -4)) # from the first to the fourth-to-last character
## [1] "foo_" "bar_" "ap" "b"
Similar to @Matthew_Plourde's answer using gsub, but with a pattern that trims to zero characters, i.e. returns "" if the original string is shorter than the number of characters to cut:
cs <- c("foo_bar","bar_foo","apple","beer","so","a")
gsub('.{0,3}$', '', cs)
# [1] "foo_" "bar_" "ap" "b" "" ""
The difference is that the {0,3} quantifier matches 0 to 3 characters, whereas {3} requires exactly 3; if the string is shorter than that, no match is found and gsub returns the original, unmodified string.
N.B. using {,3} would be equivalent to {0,3}, I simply prefer the latter notation.
See here for more information on regex quantifiers:
https://www.regular-expressions.info/refrepeat.html
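A small side-by-side sketch of the two quantifiers on short strings, plus one way to build the pattern for an arbitrary n with sprintf (the variable names are only illustrative):

```r
cs <- c("apple", "so", "a")

# {3} needs exactly three characters before the end of the string,
# so strings shorter than that are left untouched.
exact <- sub(".{3}$", "", cs)

# {0,3} also matches shorter tails, so short strings become "".
upto <- sub(".{0,3}$", "", cs)

# Building the pattern for a general n.
n <- 2
upto_n <- sub(sprintf(".{0,%d}$", n), "", cs)

print(exact)   # "ap" "so" "a"
print(upto)    # "ap" ""   ""
print(upto_n)  # "app" "" ""
```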
A friendly hint when cutting off or replacing the last n characters of a string:
--> be aware of whitespace in your strings!
Use base::gsub(' ', '', x, fixed = TRUE) to get rid of unwanted spaces in your strings (note that this removes every space, not only trailing ones). I spent quite some time finding out why the great solutions provided above did not work for me, so I thought it might be useful for others as well ;)
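If only leading and trailing whitespace is the problem, base R's trimws() may be a gentler option, since it leaves spaces inside the string alone. A short sketch with made-up values:

```r
x <- c("foo_bar  ", "new york ")  # illustrative values with trailing spaces

trimmed   <- trimws(x)                         # strips only the ends
no_spaces <- gsub(" ", "", x, fixed = TRUE)    # strips every space

# Removing the last 3 characters now behaves as expected.
result <- substr(trimmed, 1, nchar(trimmed) - 3)

print(trimmed)    # "foo_bar" "new york"
print(no_spaces)  # "foo_bar" "newyork"
print(result)     # "foo_" "new y"
```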

Resources