Remove 91 from mobile_number_column in R - r

Eg: Mobile_Number column contains
read.table(header=T,text=' Mobile_Number_Column
919177289917
917728991746
917728991748
919126380348
')
Now I want to remove 91 from Mobile_Number_Column
Expected Result:
Mobile_Number_Column
9177289917
7728991746
7728991748
9126380348

This can be accomplished with a regular expression. Since you're reading in the numbers as part of a data.frame, you can leverage the ^ start of string matcher plus the literal numbers of 91 in a sub call. No point in gsub since you only want to match once.
df = read.table(header=T,text=' Mobile_Number_Column
919177289917
917728991746
917728991748
919126380348
')
df$Mobile_Number_Column = sub("^91","",as.character(df$Mobile_Number_Column))
df
#> Mobile_Number_Column
#> 1 9177289917
#> 2 7728991746
#> 3 7728991748
#> 4 9126380348
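To see why the ^ anchor matters, here is a minimal sketch that compares the anchored sub call with an unanchored gsub. The third number is made up for illustration (it has no 91 prefix), and the values are stored as character strings to sidestep any numeric formatting questions.

```r
# First two values come from the question; the third is a hypothetical
# number without the country code.
x <- c("919177289917", "917728991746", "7728991912")

# Anchored: strips "91" only when it is the leading prefix
anchored <- sub("^91", "", x)

# Unanchored gsub: strips every "91", including those inside the number
unanchored <- gsub("91", "", x)

print(anchored)
print(unanchored)
```

The anchored version leaves the third value untouched, while the unanchored version also eats digits from the middle of each number.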

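This uses the stringr and dplyr packages (both loaded by tidyverse). It drops the first two characters by position, so it assumes every number starts with 91: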
library(tidyverse)
data <- tibble(numbers = c(
919177289917,
917728991746,
917728991748,
919126380348)
)
data_2 <- data %>%
mutate(numbers = str_sub(numbers, start = 3L, end = -1L))
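The positional approach and the pattern approach only differ when a value lacks the prefix. The sketch below illustrates this in base R, using substring as a stand-in for str_sub(x, start = 3L); the second value is hypothetical and not from the question.

```r
# Second value is a made-up number that does not start with 91
x <- c("919177289917", "7728991746")

# Positional: always drops the first two characters
by_position <- substring(x, 3)

# Pattern-based: drops the first two characters only if they are "91"
by_pattern <- sub("^91", "", x)

print(by_position)
print(by_pattern)
```

If the column is known to be clean, both give the same result; otherwise the pattern-based version is the safer choice.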

Related

Delimit a column in R based on 2 characters

I have a column in a dataframe in R that contains values such as
C22/00556,
C21/00445,
B22/00111,
C22-00679, etc.
I would like to split this into 2 columns named initial and number. The delimiter being "-" or "/".
As a result I would expect a column containing C22, C21, B22, etc and another column containing 00556, 00445 etc.
I am trying to use the separate function but I am struggling with the sep= part.
I have tried using sep= c("/","-") but this is not working and throws an error.
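The sep argument takes a single regular expression rather than a vector of delimiters. You could use separate from tidyr and split on / or - (| means "or" in a regex), like this: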
df <- data.frame(V1 = c("C22/00556", "C21/00445", "B22/00111", "C22-00679"))
library(tidyr)
df %>%
separate(V1, c("initial", "number"), sep = "/|-")
#> initial number
#> 1 C22 00556
#> 2 C21 00445
#> 3 B22 00111
#> 4 C22 00679
Created on 2023-01-05 with reprex v2.0.2
Using base R
read.table(text = chartr("/", "-", df$V1), sep = "-", header = FALSE,
col.names = c("initial", "number"), colClasses = "character")
initial number
1 C22 00556
2 C21 00445
3 B22 00111
4 C22 00679
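Another base R option, sketched here under the assumption that every value contains exactly one delimiter, is strsplit with a character class that lists both delimiters. The names codes, parts and result are illustrative.

```r
codes <- c("C22/00556", "C21/00445", "B22/00111", "C22-00679")

# "[/-]" matches either a slash or a hyphen
parts <- strsplit(codes, "[/-]")

# Pick the first and second piece of each split
result <- data.frame(
  initial = vapply(parts, `[`, character(1), 1),
  number  = vapply(parts, `[`, character(1), 2)
)
print(result)
```

Because the pieces stay character strings, the leading zeros in number are preserved.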

Remove Last Character in R inplace

I came from a Python background and I am working in R with this data df.
name age
1 Anon1 52a
2 Anon2 62
3 Anon3 44a
4 Anon4 30
5 Anon5 110a
Using R language, how can I remove the a in the last part of the age column and do data modification in place??
(just like Python using inplace=True)
Can I attain it using
df$Age[which(df$Age == `a pattern`)] <- ""
This is a perfect use case for parse_number from the readr package (it is part of the tidyverse):
library(dplyr)
library(readr)
df %>%
mutate(age = parse_number(age))
name age
1 Anon1 52
2 Anon2 62
3 Anon3 44
4 Anon4 30
5 Anon5 110
data:
df <- structure(list(name = c("Anon1", "Anon2", "Anon3", "Anon4", "Anon5"
), age = c("52a", "62", "44a", "30", "110a")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
You could use sub here:
df$age <- sub("a$", "", df$age)
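One detail worth checking with sub is the fixed argument. With fixed = TRUE the pattern is taken literally, so "a$" would look for the two characters a and $ instead of an a at the end of the string. A minimal sketch on the ages from the question:

```r
age <- c("52a", "62", "44a", "30", "110a")

# Regex mode (the default): "$" anchors the match to the end of the string
regex_result <- sub("a$", "", age)

# Literal mode: searches for the two-character text "a$", which never occurs
fixed_result <- sub("a$", "", age, fixed = TRUE)

print(regex_result)
print(fixed_result)
```

Only the regex version removes the trailing a; the literal version returns the input unchanged.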
#A tidy solution
library(dplyr)
library(stringr)
df <- data.frame(name=c("anon1","anon2"),age=c("52","37a"))
df <- df %>%
mutate(age = str_extract(age,"^\\d+"))
df
name age
1 anon1 52
2 anon2 37
Here are two approaches. No packages are used.
1) We remove all non-digit characters; in a regular expression \D means non-digit. If we knew that only a could appear as a non-digit we could instead use "a" as the first argument to gsub, and if we knew it appears only once we could use sub instead of gsub.
Also it is easier to debug code if you don't overwrite variables since then you always know that a particular variable is in its original state. Instead assign the result to a new variable.
transform(DF, age = as.numeric(gsub("\\D", "", age)))
This could also be written using pipes:
transform(DF, age = age |> gsub(pattern = "\\D", replacement = "") |> as.numeric())
2) We can use scan specifying that a is a comment character.
transform(DF, age = scan(text = age, comment.char = "a", quiet = TRUE))
Note
Lines <- "
name age
1 Anon1 52a
2 Anon2 62
3 Anon3 44a
4 Anon4 30
5 Anon5 110a"
DF <- read.table(text = Lines)
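One caveat with approach 1: stripping every non-digit also removes decimal points and minus signs. The sketch below shows this on two hypothetical values (tricky is not part of the question's data) and compares it with removing only a trailing a.

```r
age <- c("52a", "62", "44a", "30", "110a")

# Works as intended on the question's data
age_num <- as.numeric(gsub("\\D", "", age))

# Hypothetical values with a decimal point and a sign
tricky <- c("5.5a", "-3a")
stripped      <- gsub("\\D", "", tricky)  # drops "." and "-" as well
trailing_only <- sub("a$", "", tricky)    # drops only the final "a"

print(age_num)
print(stripped)
print(trailing_only)
```

For plain non-negative integers like the ages here, both patterns behave the same.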
The inplace modifier in Python refers to making a change without creating a copy. The data.table package in R allows for this (called replace by reference).
df <- read.table(text="
name age
1 Anon1 52a
2 Anon2 62
3 Anon3 44a
4 Anon4 30
5 Anon5 110a")
library(data.table)
library(stringi)
setDT(df)[, age:=stri_extract(age, regex='^\\d+')]
df
The first clause (setDT(df)) converts df to a data.table by reference (i.e., without making a copy), and the second clause ([, age:=...]) replaces the values in column age with ... also by reference.

How to Remove characters that doesn't match the string pattern from a column of a data frame

I have a column in my data frame as shown below.
I want to keep the data in the pattern "\\d+Zimmer" and remove all the digits from the column such as "9586" and "927" in the picture.
I tried following gsub function.
gsub("[^\\d+Zimmer]", "", flat_cl_one$rooms)
But it removes all the digits, as below.
What Regex can I use to get the correct result? Thank You in Advance
We can use as.numeric, which turns any value containing non-digit characters into NA (with a coercion warning), and then replace the rows that are not NA, i.e. the purely numeric ones, with blanks.
library(dplyr)
flat_cl_one %>%
mutate(rooms = ifelse(!is.na(as.numeric(rooms)), "", rooms))
Or we can use str_detect from stringr:
library(stringr)
flat_cl_one %>%
mutate(rooms = ifelse(str_detect(rooms, "Zimmer", negate = TRUE), "", rooms))
Output
rooms
1 647Zimmer
2 394Zimmer
3
4
5 38210Zimmer
We could do the same thing with filter if you wanted to actually remove those rows.
flat_cl_one %>%
filter(is.na(as.numeric(rooms)))
# rooms
#1 647Zimmer
#2 394Zimmer
#3 38210Zimmer
Data
flat_cl_one <- structure(list(rooms = c("647Zimmer", "394Zimmer", "8796", "9389",
"38210Zimmer")), class = "data.frame", row.names = c(NA, -5L))
Just replace strings that don't contain the word "Zimmer"
flat_cl_one$room[!grepl("Zimmer", flat_cl_one$room)] <- ""
flat_cl_one
#> room
#> 1 3Zimmer
#> 2 2Zimmer
#> 3 2Zimmer
#> 4 3Zimmer
#> 5
#> 6
#> 7 3Zimmer
#> 8 6Zimmer
#> 9 2Zimmer
#> 10 4Zimmer
Data
flat_cl_one <- data.frame(room = c("3Zimmer", "2Zimmer", "2Zimmer", "3Zimmer",
"9586", "927", "3Zimmer", "6Zimmer",
"2Zimmer", "4Zimmer"))
Another possible solution, using stringr::str_extract (I am using @AndrewGillreath-Brown's data, whom I thank):
library(tidyverse)
df <- structure(
list(rooms = c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer")),
class = "data.frame",
row.names = c(NA, -5L))
df %>%
mutate(rooms = str_extract(rooms, "\\d+Zimmer"))
#> rooms
#> 1 647Zimmer
#> 2 394Zimmer
#> 3 <NA>
#> 4 <NA>
#> 5 38210Zimmer
This pattern [^\\d+Zimmer] matches any character except a digit or the following characters + Z i m etc...
Using gsub, you can assert that the string does not start with the pattern ^\\d+Zimmer by means of a negative lookahead (?!...), which requires perl = TRUE, and then match 1 or more digits if the assertion is true.
gsub("^(?!^\\d+Zimmer\\b)\\d+\\b", "", flat_cl_one$rooms, perl = TRUE)
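As a quick check, the lookahead pattern can be run on a plain character vector. The second call is a simpler alternative of my own, assuming the unwanted values consist of digits only; it is not part of the answer above.

```r
rooms <- c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer")

# Negative lookahead: blank out leading digits unless "Zimmer" follows them
cleaned <- gsub("^(?!^\\d+Zimmer\\b)\\d+\\b", "", rooms, perl = TRUE)

# Assumed simpler variant: blank out strings made up entirely of digits
digits_only <- sub("^\\d+$", "", rooms)

print(cleaned)
print(digits_only)
```

On this data both calls keep the Zimmer entries and turn the pure numbers into empty strings.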

Change character string by key at R

here is a dataframe for example:
test_df <- structure(list(plant_id = c("AB1234", "CC0000", "ZX9998", "AA1110", "LO8880"),
NewName = c("ZY8765", "XX9999", "AC0001", "ZZ8889", "OL1119")),
row.names = c(NA, -5L), class = "data.frame",
.Names = c("plant_sp", "NewName"))
As you can see, there is a column called "plant_sp" with a 6-character code.
I'd like to transform this code into a new code (like in column "NewName") using this format:
For letters:
A-Z
B-Y
C-X
D-W
E-V
F-U
G-T
.
.
.
For numbers:
0-9
1-8
2-7
3-6
4-5
5-4
.
.
.
plant_sp NewName
1 AB1234 ZY8765
2 CC0000 XX9999
3 ZX9998 AC0001
4 AA1110 ZZ8889
5 LO8880 OL1119
So that each character will get the opposite one by its value (0=9, 1=8... A=Z, B=Y...)
How can I do it? a pipe solution would be great.
Thanks a lot!
One option to achieve your desired result would be via a lookup table and stringr::str_replace_all:
library(dplyr)
library(stringr)
lt_letters <- setNames(rev(LETTERS), LETTERS)
lt_numbers <- setNames(rev(0:9),0:9)
test_df %>%
mutate(NewName1 = str_replace_all(plant_sp, "[A-Z0-9]", function(x) c(lt_letters, lt_numbers)[x]))
#> plant_sp NewName NewName1
#> 1 AB1234 ZY8765 ZY8765
#> 2 CC0000 XX9999 XX9999
#> 3 ZX9998 AC0001 AC0001
#> 4 AA1110 ZZ8889 ZZ8889
#> 5 LO8880 OL1119 OL1119
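Since this is a one-to-one character mapping, base R's chartr can also do it without a lookup function. This is a sketch rather than part of the answer above; old_chars and new_chars are illustrative names.

```r
# Build the two alphabets: A-Z plus 0-9, and their mirrored counterparts
old_chars <- paste(c(LETTERS, 0:9), collapse = "")
new_chars <- paste(c(rev(LETTERS), 9:0), collapse = "")

plant_sp <- c("AB1234", "CC0000", "ZX9998", "AA1110", "LO8880")

# chartr replaces each character in old_chars with the one at the
# same position in new_chars
new_name <- chartr(old_chars, new_chars, plant_sp)
print(new_name)
```

In a pipe this would read test_df %>% mutate(NewName1 = chartr(old_chars, new_chars, plant_sp)).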

Can dplyr remove all dots from as.character string?

Lets say I have a set of dates
> p
birth
1 22.12.1946
2 01.08.1948
3 02.11.2028
4 18.11.1953
5 28.03.1948
Is there a dplyr solution to remove all dots?
I tried
p %>% mutate(birth = str_replace(birth, ".", ""))
Data
p <- structure(list(birth = c("22.12.1946", "01.08.1948", "02.11.2028",
"18.11.1953", "28.03.1948")), row.names = c(NA, 5L), class = "data.frame")
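We need to either wrap the dot in fixed() or escape it (\\.), because . in a regex matches any character rather than the literal dot. Note also that str_replace changes only the first match, so an _all variant is needed to remove every dot.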
library(dplyr)
library(stringr)
p %>%
mutate(birth = str_remove_all(birth, fixed(".")))
-output
# birth
#1 22121946
#2 01081948
#3 02112028
#4 18111953
#5 28031948
NOTE: str_replace_all would work as well; str_remove_all is simply a more compact option.
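The same behaviour can be reproduced in base R, where sub plays the role of str_replace and gsub that of str_replace_all. A minimal sketch on two of the dates:

```r
birth <- c("22.12.1946", "01.08.1948")

# Unescaped "." is a regex wildcard, and sub replaces only the first match,
# so this removes the first character of each string
first_any_char <- sub(".", "", birth)

# Literal dot, all occurrences: either fixed = TRUE or an escaped dot
fixed_dot   <- gsub(".", "", birth, fixed = TRUE)
escaped_dot <- gsub("\\.", "", birth)

print(first_any_char)
print(fixed_dot)
print(escaped_dot)
```

This is why the attempt in the question appears to do the wrong thing: it removes the leading digit instead of the dots.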
Alternatively, convert to the Date class first and then format the result (here %y gives a two-digit year):
format(as.Date(p$birth, "%d.%m.%Y"), "%d%m%y")
#[1] "221246" "010848" "021128" "181153" "280348"
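If the four-digit year should be kept, so that the result matches simply removing the dots, %Y can be used in the output format instead. A short sketch on two of the dates:

```r
birth <- c("22.12.1946", "02.11.2028")

# Parse day.month.year into Date objects
dates <- as.Date(birth, format = "%d.%m.%Y")

two_digit_year  <- format(dates, "%d%m%y")  # year without century
four_digit_year <- format(dates, "%d%m%Y")  # year with century

print(two_digit_year)
print(four_digit_year)
```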
