Add a space every three characters from the end


I need to insert a space after every third character in each string, counting from the end. Elements that contain a percent sign (%) should be left unchanged.
string <- c('186527500', '3875055', '23043', '10.8%', '9.8%')
The desired result is:
186 527 500, 3 875 055, 23 043, 10.8%, 9.8%

You could do:
ifelse(grepl('%', string), string, scales::comma(as.numeric(string), big.mark = ' '))
#> [1] "186 527 500" "3 875 055" "23 043" "10.8%" "9.8%"

Using format and ifelse:
ifelse(!grepl("\\D", string), format(as.numeric(string), big.mark = " ", trim = TRUE), string)
#[1] "186 527 500" "3 875 055" "23 043" "10.8%" "9.8%"

Here is a base R solution with prettyNum. The trick is to set big.mark to one space character.
I use a variant of my answer to another post, but instead of returning the index of the elements that cannot be converted to numeric, the function below returns the index of the elements that can. This avoids trying to put spaces in the % numbers.
check_num <- function(x){
  # Non-numeric strings such as "10.8%" become NA
  y <- suppressWarnings(as.numeric(x))
  # Always return the positions that converted cleanly
  which(!is.na(y))
}
string <- c('186527500', '3875055', '23043', '10.8%', '9.8%')
i <- check_num(string)
prettyNum(string[i], big.mark = " ", preserve.width = "none")
#> [1] "186 527 500" "3 875 055" "23 043"
Created on 2022-05-16 by the reprex package (v2.0.1)
You can then assign the result back to the original string.
string[i] <- prettyNum(string[i], big.mark = " ", preserve.width = "none")

An idea is to reverse the strings, add the space and reverse back, i.e.
new_str <- string[!grepl('%', string)]
stringi::stri_reverse(sub("\\s+$", "", gsub('(.{3})', '\\1 ',stringi::stri_reverse(new_str))))
#[1] "186 527 500" "3 875 055" "23 043"
Another way is via formatC, i.e.
sapply(new_str, function(i)formatC(as.numeric(i), big.mark = " ", big.interval = 3, format = "d", flag = "0", width = nchar(i)))
# 186527500 3875055 23043
# "186 527 500" "3 875 055" "23 043"
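For completeness, the same grouping can also be done with a single regular expression in base R. This is a minimal sketch and not taken from the answers above; it assumes the numeric strings contain digits only, and add_spaces is a hypothetical helper name.

```r
# Hypothetical helper: insert a space before every group of three digits,
# counting from the end of the string.
add_spaces <- function(x) {
  # (?<=\\d)       : a digit sits immediately before this position
  # (?=(\\d{3})+$) : one or more groups of exactly three digits follow, then the end
  gsub("(?<=\\d)(?=(\\d{3})+$)", " ", x, perl = TRUE)
}

string <- c('186527500', '3875055', '23043', '10.8%', '9.8%')
result <- add_spaces(string)
print(result)
```

Strings that end in % never satisfy the lookahead, so they pass through unchanged without a separate filter. Strings with a decimal part would need extra handling.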

Related

Pattern Matching & Replacement / Cleaning of Data in R

I'm looking to plot geospatial data, thus I require coordinates. The information I've been provided is very messy and I need a good system to convert a vector of coordinates in multiple formats into one useful format as per below:
Input:
- lat <- c("41º12'23.33''", "40º39'15.6'", "41 10 589", "38 31 10.6",
"38.720647")
- lon <- c("8º19'40.66''", "7º52'31.95'", "8 37 832", "8 54 17.0",
"-9.22522")
Output:
- lat <- c(41.122333, 40.39156, 41.10589, 38.31106, 38.720647)
- lon <- c(8.194066, 7.523195, 8.37832, 8.54170, -9.22522)
Does anyone have a creative solution? Any response is much appreciated!

lat <- c("41º12'23.33''", "40º39'15.6'", "41 10 589", "38 31 10.6", "38.720647")
lon <- c("8º19'40.66''", "7º52'31.95'", "8 37 832", "8 54 17.0", "-9.22522")
gsub(" ", "", sub("\\s", ".", gsub("º|\\'|\\.", " ", lat)))
[1] "41.122333" "40.39156" "41.10589" "38.31106" "38.720647"
gsub(" ", "", sub("\\s", ".", gsub("º|\\'|\\.", " ", lon)))
[1] "8.194066" "7.523195" "8.37832" "8.54170" "-9.22522"
1. Replace all º, ' and . with a white space.
2. Replace the first white space with a decimal point.
3. Replace all remaining spaces with "" to paste the strings back together.
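The three steps can be wrapped in a small function that also returns numeric values, which is what plotting usually needs. This is only a sketch: clean_coord is a hypothetical name, and the degree-like sign is written as the escape \u00ba so the code does not depend on the file encoding. The result mirrors the concatenated format requested in the question.

```r
# Hypothetical helper wrapping the three replacement steps
clean_coord <- function(x) {
  x <- gsub("\u00ba|'|\\.", " ", x)  # 1. the º sign, ' and . become spaces
  x <- sub("\\s", ".", x)            # 2. the first space becomes the decimal point
  x <- gsub(" ", "", x)              # 3. the remaining spaces are dropped
  as.numeric(x)
}

lat <- c("41\u00ba12'23.33''", "40\u00ba39'15.6'", "41 10 589", "38 31 10.6", "38.720647")
lat_num <- clean_coord(lat)
print(lat_num)
```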

With base R, you could try the following:
lat <- c("41º12'23.33''", "40º39'15.6'", "41 10 589", "38 31 10.6", "38.720647")
for (i in lat) {
  i <- gsub("º| ", "#", i)
  i <- gsub("'|\\.", "", i)
  i <- gsub("#", ".", i)
  print(i)
}
Output will be as follows.
[1] "41.122333"
[1] "40.39156"
[1] "41 10 589"
[1] "38 31 106"
[1] "38720647"

This function will also work:
# DATA
lat <- c("41º12'23.33''", "40º39'15.6'", "41 10 589", "38 31 10.6", "38.720647")
lon <- c("8º19'40.66''", "7º52'31.95'", "8 37 832", "8 54 17.0", "-9.22522")
# FUNCTION
convert_coordinates <- function(x) {
  # Split on the unwanted punctuation. You can add more characters here, just separate them with a |
  splits <- strsplit(x, "º| |[.]|'")
  splits <- lapply(splits, function(s) s[s != ""]) # Remove any empty strings
  output <- character(length(splits))
  for (i in seq_along(splits)) {
    # First piece, then a decimal point, then all remaining pieces pasted together
    output[i] <- paste0(splits[[i]][1], ".", paste0(splits[[i]][-1], collapse = ""))
  }
  output
}
# RESULTS
convert_coordinates(lat)
# [1] "41.122333" "40.39156" "41.10589" "38.31106" "38.720647"
convert_coordinates(lon)
# [1] "8.194066" "7.523195" "8.37832" "8.54170" "-9.22522"

Strip out numbers from text: R

Hello, I have a data set that consists of text, whole numbers, and decimal numbers. Each entry is a paragraph containing a mix of all three, and I am trying to extract only the whole and decimal numbers from the text. There are about 30k rows.
Input format of the data:
This. Is a good 13 part. of 135.67 code
how to strip 66.8 in the content 6879
get the numbers 3475.5 from. The data. 879 in this 369426
Output:
13 135.67
66.8 6879
3475.5 879 369426
I tried replacing the letters one by one, but 26 + 26 replace-all calls make the code lengthy, and replacing "." also removes the "." from the numbers.
Thanks,
Praveen

Don't forget that R already has built-in regex functions:
input <- c('This. Is a good 13 part. of 135.67 code', 'how to strip 66.8 in the content 6879',
'get the numbers 3475.5 from. The data. 879 in this 369426')
m <- gregexpr('\\b\\d+(?:\\.\\d+)?\\b', input)
(output <- lapply(regmatches(input, m), as.numeric))
This yields
[[1]]
[1] 13.00 135.67
[[2]]
[1] 66.8 6879.0
[[3]]
[1] 3475.5 879.0 369426.0
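If the space-separated strings shown in the question are wanted instead of numeric vectors, the matches can be pasted back together. A minimal sketch along the same lines; output_chr is an illustrative name, and perl = TRUE is added here only to make the choice of regex engine explicit.

```r
input <- c('This. Is a good 13 part. of 135.67 code',
           'how to strip 66.8 in the content 6879',
           'get the numbers 3475.5 from. The data. 879 in this 369426')

# Locate whole and decimal numbers
m <- gregexpr('\\b\\d+(?:\\.\\d+)?\\b', input, perl = TRUE)

# Join the matches of each element with a single space
output_chr <- vapply(regmatches(input, m), paste, character(1), collapse = " ")
print(output_chr)
```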

An option is to use strsplit to split the text into separate lines and then gsub to remove every run of [[:alpha:]] characters, either followed by a . or followed by optional whitespace.
text <- "1. This. Is a good 13 part. of 135.67 code
2. how to strip 66.8 in the content 6879
3. get the numbers 3475.5 from. The data. 879 in this 369426"
lines <- strsplit(text, split = "\n")[[1]]
gsub("[[:alpha:]]+\\.|[[:alpha:]]+\\s*","",lines)
#[1] "1. 13 135.67 "
#[2] "2. 66.8 6879"
#[3] "3. 3475.5 879 369426"

you can try
library(stringr)
lapply(str_extract_all(a, "[0-9.]+"), function(x) as.numeric(x)[!is.na(as.numeric(x))])
[[1]]
[1] 13.00 135.67
[[2]]
[1] 66.8 6879.0
[[3]]
[1] 3475.5 879.0 369426.0
The basic idea is from here, but the character class also includes the dot (.). The lapply call converts the matches to numeric and drops the NAs.
The data:
a <- c("This. Is a good 13 part. of 135.67 code",
"how to strip 66.8 in the content 6879",
"get the numbers 3475.5 from. The data. 879 in this 369426")

Another method with gsub:
string = c('This. Is a good 13 part. of 135.67 code',
'how to strip 66.8 in the content 6879',
'get the numbers 3475.5 from. The data. 879 in this 369426')
trimws(gsub('[\\p{L}\\.\\s](?!\\d)+', '', string, perl = TRUE))
# [1] "13 135.67" "66.8 6879" "3475.5 879 369426"

A solution free of regex and external packages:
sapply(
  strsplit(input, " "),
  function(x) {
    x <- suppressWarnings(as.numeric(x))
    paste(x[!is.na(x)], collapse = " ")
  }
)
[1] "13 135.67" "66.8 6879" "3475.5 879 369426"

R Error when using beside=TRUE parameter

I am plotting a graph with barplot(), and any attempt to use the beside=TRUE parameter returns the error: Error in -0.01 * height : non-numeric argument to binary operator
The following is the code for the graph:
combi <- as.matrix(combine)
barplot(combi, main="Top 5 hospitals in California",
ylab="Mortality/Admission Rates", col = heat.colors(5), las=1)
In the resulting graph, the bars are stacked on top of each other instead of being beside each other.

The issue is not reproducible when combine is a data.frame:
combine <- data.frame(
HeartAttack = c(13.4,12.3,16,13,15.2),
HeartFailure = c(11.1,7.3,10.7,8.9,10.8),
Pneumonia = c(11.8,6.8,10,9.9,9.5),
HeartAttack2 = c(18.3,19.3,21.8,21.6,17.3),
HeartFailure2 = c(24,23.3,24.2,23.8,24.6),
Pneumonia2 = c(17.4,19,17,18.4,18.2)
)
combi <- as.matrix(combine)
barplot(combi, main="Top 5 hospitals in California",
ylab="Mortality/Admission Rates", col = heat.colors(5), las=1, beside = TRUE)

Had the same issue earlier (different dataset, though) and resolved it by using as.numeric() on my data frame after converting it to a matrix with as.matrix(). Leaving as.numeric() out leads to "Error in -0.01 * height : non-numeric argument to binary operator"
¯\(ツ)/¯
My df called tmp:
> tmp
125 1245 1252 1254 1525 1545 12125 12425 12525 12545 125245 125425
Freq.x.2d "14" " 1" " 1" " 1" " 3" " 2" " 1" " 1" " 9" " 4" " 1" " 5"
Freq.x.3d "13" " 0" " 1" " 0" " 4" " 0" " 0" " 0" "14" " 4" " 1" " 2"
> dim(tmp)
[1] 2 28
> is(tmp)
[1] "matrix" "array" "structure" "vector"
> tmp <- as.matrix(tmp)
> dim(tmp)
[1] 2 28
> is(tmp)
[1] "matrix" "array" "structure" "vector"
> tmp <- as.numeric(tmp)
> dim(tmp)
NULL
> is(tmp)
[1] "numeric" "vector"
barplot(tmp, las=2, beside=TRUE, col=c("grey40","grey80"))
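Note that as.numeric() drops the dim attribute (dim(tmp) is NULL above), so barplot() then receives a plain vector with no groups to place beside each other. A possible alternative, sketched here with only three of the columns shown above, is to change the mode of the matrix in place so that its shape is kept; tmp_num is an illustrative name.

```r
# Small character matrix using three of the columns from tmp above
tmp <- matrix(c("14", "13", " 1", " 0", " 3", " 4"), nrow = 2,
              dimnames = list(c("Freq.x.2d", "Freq.x.3d"), c("125", "1245", "1525")))

# Convert character to numeric; dim and dimnames are preserved
tmp_num <- tmp
mode(tmp_num) <- "numeric"
str(tmp_num)

# With a numeric matrix, beside = TRUE draws one group of bars per column
if (interactive()) {
  barplot(tmp_num, las = 2, beside = TRUE, col = c("grey40", "grey80"))
}
```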

Display both levels of binary outcome in getDescriptionStatsBy

ex1 = sample(50, x=c("A","B"), replace=TRUE)
ex2 = sample(50, x=c("A","B"), replace=TRUE)
getDescriptionStatsBy(factor(ex1),ex2,html=TRUE,useNA="no",statistics=TRUE,add_total_col="last")
with useNA="no" or useNA="ifany", I get
A B Total P-value
A "13 (59.1%)" "16 (57.1%)" "29 (58.0%)" "1.0"
but with useNA="always", I get
A B Total P-value
A "13 (59.1%)" "16 (57.1%)" "29 (58.0%)" "1.0"
B "9 (40.9%)" "12 (42.9%)" "21 (42.0%)" ""
Missing "0 (0.0%)" "0 (0.0%)" "0 (0.0%)" ""
Is there a way to force the display of both levels of the binary outcome (A and B) with useNA="ifany"? Although it is obvious to me that if there is no missing data, one must only show the row of "A" (and infer that B = 1-A), some of my colleagues seem to prefer that "A" and "B" are displayed always.

I was able to answer my own question by using a wrapper function that removes the "Missing" row with useNA="always"
k2 = getDescriptionStatsBy(factor(ex1),ex2,html=TRUE,useNA="always",statistics=TRUE,add_total_col=FALSE)
r = table(ex2)
n0 = apply(k2,1,function(x) sum(x=="0 (0%)" | x=="0 (0.0%)"))
rmv = which(rownames(k2)=="Missing" & n0==length(r))
k2[-as.numeric(rmv),]
Note in the above, I set add_total_col=FALSE and consequently looked for n0==length(r); if add_total_col="last", I would look for n0==(length(r)+1)
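The same idea can be written as a reusable function. The sketch below runs on a mock character matrix shaped like the output above rather than on a real getDescriptionStatsBy result, and drop_empty_missing and n_groups are hypothetical names.

```r
# Mock character matrix shaped like the output above (not produced by getDescriptionStatsBy)
k2 <- matrix(c("13 (59.1%)", "9 (40.9%)", "0 (0.0%)",
               "16 (57.1%)", "12 (42.9%)", "0 (0.0%)",
               "1.0", "", ""),
             nrow = 3,
             dimnames = list(c("A", "B", "Missing"), c("A", "B", "P-value")))

# Hypothetical wrapper: drop the "Missing" row only when all n_groups count cells are zero
drop_empty_missing <- function(k, n_groups) {
  n0 <- rowSums(k == "0 (0%)" | k == "0 (0.0%)")
  empty <- rownames(k) == "Missing" & n0 == n_groups
  k[!empty, , drop = FALSE]
}

k_clean <- drop_empty_missing(k2, n_groups = 2)
print(k_clean)
```

Indexing with a logical vector instead of a negated which() result keeps every row when no "Missing" row qualifies; a negative index built from an empty vector would select zero rows.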

read.zoo with date and time as index in R

I have the following file
"Index" "time" "open" "high" "low" "close" "numEvents" "volume"
2013-01-09 14:30:00 "2013-01-09T14:30:00.000" "110.8500" "110.8500" "110.8000" "110.8000" " 57" "32059"
2013-01-09 14:31:00 "2013-01-09T14:31:00.000" "110.7950" "110.8140" "110.7950" "110.8140" " 2" " 1088"
2013-01-09 14:32:00 "2013-01-09T14:32:00.000" "110.8290" "110.8300" "110.8290" "110.8299" " 5" " 967"
2013-01-09 14:33:00 "2013-01-09T14:33:00.000" "110.8268" "110.8400" "110.8268" "110.8360" " 8" " 2834"
2013-01-09 14:34:00 "2013-01-09T14:34:00.000" "110.8400" "110.8400" "110.8200" "110.8200" " 33" " 6400"
I want to read this file into a zoo (or xts) object in R. This file was created as an xts object and saved using write.zoo(as.zoo(xts_object), path, sep = "\t") and now I am trying to read it in using zoo_object <- read.zoo(path, sep = "\t", header=TRUE, format="%Y-%m-%d %H:%M:%S"). However, I get the following warning
Warning message:
In zoo(rval3, ix) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
And when I type zoo_object into the console to show its contents I get:
time open high low close numEvents volume
2013-01-09 2013-01-09T14:30:00.000 110.8500 110.850 110.8000 110.8000 57 32059
2013-01-09 2013-01-09T14:31:00.000 110.7950 110.814 110.7950 110.8140 2 1088
2013-01-09 2013-01-09T14:32:00.000 110.8290 110.830 110.8290 110.8299 5 967
2013-01-09 2013-01-09T14:33:00.000 110.8268 110.840 110.8268 110.8360 8 2834
2013-01-09 2013-01-09T14:34:00.000 110.8400 110.840 110.8200 110.8200 33 6400
where you can see that the time is not included in the row index. I assume I can convert the time field into the index and fix my problems, but I also assume I am doing something wrong in reading this file (or maybe in writing it). After searching all day I have no idea what. Can anyone offer any insight?
dput(zoo_object) after read
dput(zoo_object)
structure(c("2013-01-09T14:30:00.000", "2013-01-09T14:31:00.000",
"2013-01-09T14:32:00.000", "2013-01-09T14:33:00.000", "2013-01-09T14:34:00.000",
"110.8500", "110.7950", "110.8290", "110.8268", "110.8400", "110.850",
"110.814", "110.830", "110.840", "110.840", "110.8000", "110.7950",
"110.8290", "110.8268", "110.8200", "110.8000", "110.8140", "110.8299",
"110.8360", "110.8200", "57", " 2", " 5", " 8", "33", "32059",
" 1088", " 967", " 2834", " 6400"), .Dim = c(5L, 7L), .Dimnames = list(
NULL, c("time", "open", "high", "low", "close", "numEvents",
"volume")), index = structure(c(15714, 15714, 15714, 15714,
15714), class = "Date"), class = "zoo")

(Please note that the object that was desired for testing was the one passed to write.zoo, not the final object.)
By default (it appears) the date-time function used by read.zoo is as.Date while I would have guessed it would be as.POSIXct. You can force the desired behavior with:
zoo_object <- read.zoo("~/test", index.column=2, sep = "\t",
header=TRUE, format="%Y-%m-%dT%H:%M:%S", FUN=as.POSIXct)
Note that I changed your format slightly because, looking at the text output in an editor, it appeared that there was a single column with "T" as a separator between the date and time text.
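The difference between the two conversion functions can be seen in base R without zoo. A small sketch on one timestamp from the file; tz = "UTC" is an assumption made only for this illustration.

```r
stamp <- "2013-01-09T14:30:00.000"

# as.Date keeps only the calendar date, so all rows from one day share the same index
d <- as.Date(stamp, format = "%Y-%m-%dT%H:%M:%S")

# as.POSIXct keeps the time of day as well
p <- as.POSIXct(stamp, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

print(d)
print(format(p, "%Y-%m-%d %H:%M:%S"))
```

This is consistent with the warning about non-unique index entries: five rows converted with as.Date all map to 2013-01-09.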
