Trim the last character of a string only if it is a blank or "." - r

I have a large vector of words read from an Excel file. Some of those records end with a space or a period ("."). Only in those cases do I need to trim those characters.
Example:
"depresion" "tristeza."
"nostalgia" "preocupacion."
"enojo." "soledad "
"frustracion" "desesperacion "
"angustia." "desconocidos."
Notice that some words end normally, without "." or " ".
Is there a way to do that?
I have this
substr(conceptos, 1, nchar(conceptos)-1)
which drops the last character of every element (conceptos is this long vector).
Thanks for any advice,

We can use sub to match zero or more "." or space characters at the end of the string and replace them with an empty string ("")
sub("(\\.| )*$", "", v1)
#[1] "depresion" "tristeza" "nostalgia" "preocupacion" "enojo"
#[6] "soledad" "frustracion" "desesperacion"
#[9] "angustia" "desconocidos"
data
v1 <- c("depresion","tristeza.","nostalgia","preocupacion.",
"enojo.","soledad ","frustracion","desesperacion ",
"angustia.","desconocidos.")

Regular expressions are good for this:
library(stringr)
x = c("depresion", "tristeza.", "nostalgia", "preocupacion.",
"enojo.", "soledad ", "frustracion", "desesperacion ",
"angustia.", "desconocidos.")
x_replaced = str_replace(x, "(\\.|\\s)$", "")
The pattern (\\.|\\s)$ will match a . or any whitespace that occurs right at the end of the string.
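The practical difference between a pattern with and without a quantifier can be seen in a small sketch. It uses base R only and a made-up vector `words` (not from the question) that includes an interior dot and an element ending in a dot followed by a space; treat it as an illustration under those assumptions rather than a drop-in solution.

```r
# Hypothetical test vector: clean words, a trailing dot, a trailing space,
# an interior dot that must survive, and a dot followed by a space.
words <- c("depresion", "tristeza.", "soledad ", "a.b", "enojo. ")

# Without a quantifier, only the final character is removed.
one_char <- sub("[.[:space:]]$", "", words)

# With "+", the whole run of trailing dots and whitespace is removed.
whole_run <- sub("[.[:space:]]+$", "", words)

print(one_char)
print(whole_run)
```

If every record ends in at most one unwanted character, both forms give the same result; the quantified form is probably the safer default when the data may contain combinations such as ". ".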

Try this:
trimmed <- trimws(conceptos)
ifelse(endsWith(trimmed, "."), substr(trimmed, 1, nchar(trimmed) - 1), trimmed)

Related

Remove first place comma and space between two texts and the last comma or space

I have joined multiple columns from a data frame into a single column, and the resulting formatting causes some issues. I want to remove a comma if it is in the first or the last position. I also want to delete the spaces between the texts.
E.g., if the combined string is:
, this is test, dd,pqr, then it should be converted to this is test,dd,pqr
df <- as.data.frame(rbind(c('11061002','11862192','11083069'),
c(" ",'1234567','452589'),
c("fs"," ","dd"," ")))
df$f1 <-paste0(df$V1,
',',
" ",
df$V2,
',',
" ",
df$V3,',',df$V4)
df_1 <- as.data.frame(df[,c(5)])
names(df_1)[1] <-"f1"
The expected output is:
11061002,11862192,11083069,11061002 (No spaces)
1234567,452589
fs,dd
Regards,
R
Using a nested gsub:
gsub(',{2,}', ',', gsub('^,|,$| ', '', trimws(df_1$f1)))
#[1] "11061002,11862192,11083069,11061002" "1234567,452589" "fs,dd"
,{2,} - replaces 2 or more consecutive commas with a single comma.
^, - removes a comma at the start.
,$ - removes a comma at the end.
The space alternative removes every space from the string.
It seems that you have a double space in the third row. One way to approach this is to use apply with margin 1 to do a row-wise operation; in your case, paste, i.e.
apply(df, 1, function(i)paste(i[!i %in% c(' ', ' ')], collapse = ','))
#[1] "11061002,11862192,11083069" "1234567,452589" "fs,dd"
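A different way to look at the same problem is to split each joined string on commas, trim the pieces, drop the empty ones, and join again. The sketch below is only one possible approach; the helper name `clean_joined` and the sample strings are made up for illustration, and the function does not handle NA values.

```r
# Hypothetical helper: split on commas, trim each piece,
# drop empty pieces, and re-join with a single comma.
clean_joined <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  vapply(pieces, function(p) {
    p <- trimws(p)
    paste(p[nzchar(p)], collapse = ",")
  }, character(1))
}

# Sample strings shaped like the joined column in the question.
joined <- c(" , 1234567, 452589, ", "fs, , dd, ", ", this is test, dd,pqr,")
result <- clean_joined(joined)
print(result)
```

Unlike removing every space with gsub, this keeps the spaces inside a piece such as "this is test" and only strips the ones next to the commas.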

Remove specific string

I would like to remove this string:
c("
I use this
df <- gsub("c/(/"", " ", df$text)
But I receive this error:
Error: unexpected string constant in "inliwc <- gsub("c/(/"", ""
What can I do?
You need to escape the round bracket as well as the quote, which can be done as follows:
temp <- 'this is ac(" string'
gsub("c\\(\"", " ", temp)
#OR use single quotes in gsub
#gsub('c\\("', " ", temp)
#[1] "this is a  string"
A faster way would be to use fixed = TRUE, which treats the pattern as a literal string instead of a regular expression
gsub('c("', " ", temp, fixed = TRUE)
You can also use sub if there is a single occurrence of the pattern in the string.
The opening round bracket is a regex metacharacter; in R, its literal use needs to be escaped using \\:
text <- "c("
text <- gsub("c\\(", "", text)
We can also use sub
sub('c[()]"', '', temp)
#[1] "this is a string"
data
temp <- 'this is ac(" string'
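To make the escaping point concrete, here is a minimal sketch that compares the escaped regex with the fixed = TRUE form on the same sample string, and checks that the unescaped pattern is rejected. The variable names `escaped`, `literal`, and `unescaped_fails` are made up for this illustration.

```r
temp <- 'this is ac(" string'

# Regex version: the "(" is escaped so it is matched literally.
escaped <- gsub('c\\("', "", temp)

# Literal version: fixed = TRUE switches off regex interpretation entirely.
literal <- gsub('c("', "", temp, fixed = TRUE)

# Unescaped regex: "(" opens a group that is never closed, so this fails.
unescaped_fails <- inherits(
  suppressWarnings(try(gsub('c("', "", temp), silent = TRUE)),
  "try-error"
)

print(escaped)
print(literal)
print(unescaped_fails)
```

Single quotes around the pattern avoid having to escape the double quote, which is the second thing that went wrong in the original attempt.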

How to throw out spaces and underscores only from the beginning of the string?

I want to ignore the spaces and underscores in the beginning of a string in R.
I can write something like
txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)
But I think there could be an elegant solution
txt <- " 9PM 8-Oct-2014_0.335kwh "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt
The output should be "9PM 8-Oct-2014_0.335kwh ". But my code gives " 9PM 8-Oct-2014_0.335kwh ".
How can I fix it?
You could put \s and the underscore together in a character class and add a quantifier to repeat it 1 or more times:
^[\s_]+
For example:
txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)
Or, as @Tim Biegeleisen points out in the comments, since only the first occurrence is being replaced you could use sub instead (keep the ^ anchor so that only a leading run is removed):
txt <- sub("^[\\s_]+", "", txt, perl=TRUE)
Or using a POSIX character class:
txt <- sub("^[[:space:]_]+", "", txt)
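A quick way to check an anchored pattern like this is to run it on one string that starts with the unwanted characters and one that does not. The two-element vector below is made up for this sketch; only the leading run should disappear, and interior underscores and the trailing space should stay.

```r
# Hypothetical inputs: the first starts with spaces and underscores,
# the second is already clean at the start.
txt <- c("  __9PM 8-Oct-2014_0.335kwh ", "9PM 8-Oct-2014_0.335kwh ")

# PCRE version: \s is understood inside the character class with perl = TRUE.
perl_version <- sub("^[\\s_]+", "", txt, perl = TRUE)

# POSIX class version: works with R's default regex engine.
posix_version <- sub("^[[:space:]_]+", "", txt)

print(perl_version)
print(posix_version)
```

Both calls should leave the second element untouched, because the ^ anchor restricts the match to the start of the string.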
The stringr package offers some task-specific functions with helpful names. In your original question you say you would like to remove whitespace and underscores from the start of your string, but in a comment you imply that you also wish to remove the same characters from the end of the same string. To that end, I'll include a few different options.
Given string s <- " \t_blah_ ", which contains whitespace (spaces and tabs) and underscores:
library(stringr)
# Remove the first run of whitespace and underscores (here, the one at the start).
str_remove(s, "[\\s_]+")
# [1] "blah_ "
# Remove every run of whitespace and underscores (here, at the start and end).
str_remove_all(s, "[\\s_]+")
# [1] "blah"
In case you're looking to remove whitespace only – there are, after all, no underscores at the start or end of your example string – there are a couple of stringr functions that will help you keep things simple:
# `str_trim` trims whitespace (spaces, tabs, newlines) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "
str_trim(s, side = "right")
# [1] " \t_blah_"
str_trim(s, side = "both") # This is the default.
# [1] "_blah_"
# `str_squish` reduces repeated whitespace anywhere in string.
s <- " \t_blah blah_ "
str_squish(s)
# "_blah blah_"
The same pattern [\\s_]+ will also work in base R's sub or gsub, with some minor modifications, if that's your jam (see The fourth bird's answer).
You can use stringr (note that str_trim removes the trailing whitespace as well):
txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
Or trimws in base R:
trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
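Since the question asks for the start of the string only, it may help to see trimws restricted to the left side. The sketch below assumes R 3.6.0 or later, where trimws accepts a `whitespace` argument; the sample string and the name `left_only` are made up for illustration.

```r
# Hypothetical input with leading spaces and an underscore, plus a trailing space.
txt <- " _ 9PM 8-Oct-2014_0.335kwh "

# which = "left" trims only the start; the custom class adds "_" to the
# characters that count as trimmable.
left_only <- trimws(txt, which = "left", whitespace = "[[:space:]_]")

print(left_only)
```

The interior underscore and the trailing space are kept, which matches the output the question asks for.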

How to take only that part of a string which occurs before a pattern of 2 dots?

I used a regular expression that keeps only the text before the 2nd occurrence of a dot. The following is the code:
colnames(final1)[i] <- gsub("^([^.]*.[^.]*)..*$", "\\1", colnames(final)[i])
But now I realize that I want to keep the text before the first occurrence of a pattern of 2 dots.
I tried
gsub(",.*$", "", colnames(final)[i]) (changed the , to ..)
gsub("...*$", "", colnames(final)[i])
But neither attempt worked.
The example to try on
KC1.Comdty...PX_LAST...USD......Comdty........
converted to
KC1.Comdty.
or
"LIT.US.Equity...PX_LAST...USD......Comdty........"
to
"LIT.US.Equity."
Can anyone suggest anything?
Thanks
We could use sub to match 2 or more dots followed by other characters and replace it with blank
sub("\\.{2,}.*", "", str1)
#[1] "KC1.Comdty" "LIT.US.Equity"
The . is a metacharacter that matches any character, so we need to escape it (\\.) to match a literal dot.
data
str1 <- c("KC1.Comdty...PX_LAST...USD......Comdty.......", "LIT.US.Equity...PX_LAST...USD......Comdty........")
Another solution with strsplit:
str1 <- c("KC1.Comdty...PX_LAST...USD......Comdty.......", "LIT.US.Equity...PX_LAST...USD......Comdty........")
sapply(strsplit(str1, "\\.{2}\\w"), "[", 1)
# [1] "KC1.Comdty." "LIT.US.Equity."
To also include the dot at the end with @akrun's answer, one can do:
sub("\\.{2}\\w.*", "", str1)
# [1] "KC1.Comdty." "LIT.US.Equity."
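Because sub is vectorised, the loop over column indices in the question is probably not needed. As a rough sketch, assuming a small made-up data frame `final` with two columns named like the examples:

```r
# Hypothetical data frame standing in for the one in the question.
final <- data.frame(a = 1:2, b = 3:4)
colnames(final) <- c("KC1.Comdty...PX_LAST...USD......Comdty........",
                     "LIT.US.Equity...PX_LAST...USD......Comdty........")

# Rename every column in one call: drop everything from the first
# run of two or more dots onwards.
colnames(final) <- sub("\\.{2,}.*", "", colnames(final))

print(colnames(final))
```

Swapping in the "\\.{2}\\w.*" pattern keeps the trailing dot, if that is the form required.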

Replace a string of characters to " " in a data frame column using R function

Screenshot of the dataframe
I want to replace everything after the first _ in data77298$SAMPLE.CODE to " ", such that I get levels to be GSM2048265, GSM2048266 etc.,
Is it possible using a single command to change all strings after the underscore to null?
You can do it with gsub:
my_string<-c("GSM2048265_Somet_323_h4554ing_here","GSM2048266_sometwewe_sdsd_hing_here")
gsub("\\_.*","",my_string)
[1] "GSM2048265" "GSM2048266"
How about:
library(stringr)
my_string<-c("GSM2048265_1_2_£_$_F_CA","GSM2048266_aasv_vaerv_vasd", "GSM2048266_arvqb_oyor_1234")
word(my_string, 1, sep = "_")
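The question mentions levels, which suggests the column is a factor. In that case one option is to rewrite the levels rather than the values. The sketch below uses a made-up factor `sample_code` in place of data77298$SAMPLE.CODE, so the names and values are assumptions.

```r
# Hypothetical factor standing in for data77298$SAMPLE.CODE.
sample_code <- factor(c("GSM2048265_a_b", "GSM2048266_c_d", "GSM2048265_e"))

# Strip everything from the first underscore onwards in the levels;
# levels that become identical are merged.
levels(sample_code) <- sub("_.*", "", levels(sample_code))

print(levels(sample_code))
print(as.character(sample_code))
```

For a plain character column, the same sub call can be applied to the column directly.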
