Format a string to a specific format including dots and dashes in R

I have a string with numbers only and it always has the same width. I need this string in a specific format.
original = "00000000000000"
outcome = "00.000.000/0000-00"
Is there a simple way to do this? Could it be applied to a vector of strings?

Assuming the width is constant, we can use sub:
original = "00000000000000"
sub("(.{2})(.{3})(.{3})(.{4})(.{2})", "\\1.\\2.\\3/\\4-\\5", original)
# [1] "00.000.000/0000-00"

Related

How can I insert a decimal separator "." after two digits in a specific column in R?

I have a data.frame with two columns in which there are numbers in the following format:
Latitude Longitude
-300663 -512344
I wish to insert a "." after two digits to convert the values to decimal numbers, like:
Latitude Longitude
-30.0663 -51.2344
How can I do this?
As someone above already noted, you can simply scale the variable by dividing it by whatever number you choose. Just wanted to add that if you are only looking to explicitly turn something from an integer into a decimal format, you can also use format, like so:
x <- 40000
formatted.x <- format(x, nsmall = 4) # how many digits after the decimal
formatted.x
Which gives you what you want:
[1] "40000.0000"
However, if you check the class of the variable saved:
class(formatted.x)
You will find it is formatted to be a character variable, which can be annoying if you try to treat it like a decimal. Another option if you want all numbers in R to have more/less decimals is to manually change the options in R.
options(scipen=1000) # changes scientific notation
options(digits=5) # changes number of digits
Generally speaking, I wouldn't advise doing either of these for your specific purpose, but figured I would note these as other ways this can be achieved.
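For the Latitude/Longitude example in the question, the scaling approach mentioned above is a minimal sketch, assuming every value should be divided by 10000:
df <- data.frame(Latitude = -300663, Longitude = -512344)  # values from the question
df$Latitude <- df$Latitude / 10000
df$Longitude <- df$Longitude / 10000
df
#   Latitude Longitude
# 1 -30.0663  -51.2344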

In R, how to remove a specific character in a column (in this case the ",") when the column contains other instances of the same character that I don't want to remove?

I have a dataset with some columns that contain monetary values, but considering the names and descriptions of those columns, I believe there is an error in the representation of the numbers, e.g. (5,52,32,974): in this example I believe there is one comma too many, or a comma in the wrong position. I would like to know if it is possible to remove a specific comma and arrive at a representation of the number such as 55.232.974 (of $, for example). The dataset is in .csv. Thanks in advance.
If I understand it correctly, your data is given as a string.
Then you could use the following code:
a <- c("5,52,32,974", "5,52,32,974", "5,52,32,974")
b <- gsub(",", "", a)
as.numeric(b)
#[1] 55232974 55232974 55232974
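If the dotted grouping mentioned in the question (55.232.974) is also wanted, format() with big.mark can produce it, though the result is a character vector rather than a number:
format(as.numeric(b), big.mark = ".", scientific = FALSE)
# [1] "55.232.974" "55.232.974" "55.232.974"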

Specify the number of columns read_csv is applied to

Is it possible to pass column indices to read_csv?
I am passing many CSV files to read_csv with different header names so rather than specifying names I wish to use column indices.
Is this possible?
df.list <- lapply(myExcelCSV, read_csv, skip = headers2skip[i]-1)
Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or ‘_’/‘-’ to skip the column.
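For instance, with a four-column file, a spec like "?_?_" would read columns 1 and 3 and skip columns 2 and 4 (the file name here is hypothetical):
read_csv("my_file.csv", col_types = "?_?_")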
If you know the total number of columns in the file, you could do it like this:
my_read <- function(..., tot_cols, skip_cols = numeric(0)) {
  # build a compact col_types string: "?" = guess the type, "_" = skip the column
  csr <- rep("?", tot_cols)
  csr[skip_cols] <- "_"
  csr <- paste(csr, collapse = "")
  read_csv(..., col_types = csr)
}
If you don't know the total number of columns in advance you could add code to this function to read just the first line of the file and count the number of columns returned ...
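A sketch of that idea, assuming readr is loaded and the file (here called my_file.csv) has a header row:
n_cols <- ncol(read_csv("my_file.csv", n_max = 0))  # read only the header to count the columns
df <- my_read("my_file.csv", tot_cols = n_cols, skip_cols = c(2, 5))  # skip_cols values are just an example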
FWIW the skip argument might not do what you think it does: it skips rows rather than selecting/deselecting columns. As I read ?readr::read_csv(), there doesn't seem to be any convenient way to skip and/or include particular columns (by name or by index) except by some ad hoc mechanism such as the one suggested above; this might be worth a feature request/discussion on the readr issues list (e.g. add cols_include and/or cols_exclude arguments that could be specified by name or position).

How to reorder a string in R to follow a consistent pattern

I have vector of strings of the type: 2004/083.BHP, 2007.MAIS.0048 and 2006/0066. There are lots of these strings of varying characters.
I would like to have consistency in the representation of these strings, so that they look something like 083/2004.BHP, 0048/2007.MAIS and 0066/2006.
How do I get all strings in this vector to appear in this way without manually coding it? I understand this is a difficult task, and any help is appreciated.
Thank you in advance.
Here are a few suggestions for ordering or sorting your strings in a consistent pattern (such as by alphabetical order or by number of characters). The last case sorts by whether the string starts with a 4-digit prefix (i.e. a date) and then by the string itself.
strings <- c("2004/083.BHP","2007.MAIS.0048","2006/0066","432.ABC","2008/42002","2094/31.AC")
strings <- sort(strings)
strings
strings <- strings[order(nchar(strings))]
strings
strings <- strings[order(strings,decreasing =T) ]
strings
strings <- strings[order(grepl("^\\d{4}",strings),strings,decreasing =F) ]
strings
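If the goal is instead to rearrange the parts within each string (2004/083.BHP becoming 083/2004.BHP), a capture-group substitution like the sub() call in the first question can be adapted. This is only a sketch for the year/number(.NAME) form; the dot-separated form like 2007.MAIS.0048 would need its own rule:
x <- c("2004/083.BHP", "2006/0066")
sub("^(\\d{4})/(\\d+)(\\..*)?$", "\\2/\\1\\3", x)
# [1] "083/2004.BHP" "0066/2006"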

Looking for a regular expression to capture occurrences of a pattern and replace each instance with a different value in R [duplicate]

I would like to find every occurrence of a string in a large body of text and replace the nth occurrence of that string with the nth element in an array of replacement strings.
I have a large text file of XML with the url/path of a particular image. This url occurs 1000 times in this file. I have an array of 1000 unique image paths that I would like to substitute into this text file.
The basic idea is:
needle: IM_5sWQ4n0fUWh0jVH
haystack: random XML..src=IM_5sWQ4n0fUWh0jVH...random XML... src=IM_5sWQ4n0fUWh0jVH... random XML... src=IM_5sWQ4n0fUWh0jVH...
Array of image url paths: replaceArray = {IM_5sWQ4n0fUWh0jVH, IM_31DS439u38, IM_8939cSd9321,...}
Goal: Replace first occurrence of IM_5sWQ4n0fUWh0jVH with the first element of replaceArray, replace the second occurrence of IM_5sWQ4n0fUWh0jVH with the second element of replaceArray, etc.
Desired output:
random XML..src=IM_5sWQ4n0fUWh0jVH...random XML... src=IM_31DS439u38... random XML... src=IM_8939cSd9321...
Does anyone have any idea how to go about doing this preferably in R? I've looked around the web a bit but haven't found the answer so far. Thanks in advance!
You could use sub in a loop.
With sub you can search and replace the first instance of a pattern.
(In general gsub is more useful, since it replaces all instances.)
Replacing Regex Matches in String Vectors
The sub function has three required parameters: a string with the regular expression, a string with the replacement text, and the input vector. sub returns a new vector with the same length as the input vector. If a regex match could be found in a string element, it is replaced with the replacement text. Only the first match in each string element is replaced. If no matches could be found in some strings, those are copied into the result vector unchanged.
df <- c("16_24cat 16_24cat", "25_34cat34343", "35_44cats 16_24cat33 16_24cat", "45_54Cat 16_24cat", "55_104fat")
ar <- c("mouse", "bear", "duck")
x <- 1
while(x < 5) {
df = sub(pattern = "cat", replacement = ar[x], df, ignore.case = TRUE, perl=TRUE);
x <- x+1;
}
df
Output:
"16_24mouse 16_24bear"
"25_34mouse34343"
"35_44mouses 16_24bear33 16_24duck"
"45_54mouse 16_24bear"
"55_104fat"
