Adding a dash to a string - r

i think I have a simple question, but I did not get it. I have something like this:
df <- data.frame(identifier = c("9562231945200505501901190109-5405303
", "190109-8731478", "1901098260031", "
.9..43675190109-3690341", "-1103214010200000190109-8841419", "-190109-5232506-.08001234-111",
"190109-2018362-","51770217835901218103304190109-9339765
"), true_values = c("190109-5405303","190109-8731478","190109-8260031","190109-3690341","190109-8841419",
"190109-5232506","190109-2018362","190109-9339765"))
I used the following function and it almost worked, but I do not know how too avoid the last dash.
I tried str_replace and sth else, but it did not work.

You can try substr with paste after removing unwanted parts with gsub.
tt <- gsub("-\\..*", "", df$identifier)
tt <- gsub("[^0-9]", "", tt)
tt <- substring(tt, nchar(tt)-12)
paste0(substr(tt, 1, 6), "-", substring(tt, 7))
#[1] "190109-5405303" "190109-8731478" "190109-8260031" "190109-3690341"
#[5] "190109-8841419" "190109-5232506" "190109-2018362" "190109-9339765"

Related

Solve shorten notation by regular expression

I want to solve two shorten notation in R.
For Ade/i, I should get Ade, Adi
For Do(i)lfal, I should get Dolfal, Doilfal
I have this solution
b='Do(i)lferl'
gsub(pattern = '(\\w+)\\((\\w+)+\\)', replacement='\\1\\i,\\1\\2', x=b)
Can anyone help me to code this
If these values are part of a dataframe, you can do this:
df <- data.frame(
Nickname = c("Ade/i", "Do(i)lfal")
)
df$Nickname_new[1] <- paste0(sub("(?=.*/)(.*)/.*", "\\1", df$Nickname[1], perl = T), ",", paste0(unlist(str_split(df$Nickname[1], "\\w/")), collapse = ""))
df$Nickname_new[2] <- paste0(sub("(.*)(\\(.*\\))(.*)", "\\1\\3", df$Nickname[2]),",", sub("(.*)\\((\\w)\\)(.*)", "\\1\\2\\3\\4", df$Nickname[2]))
which gives you:
df
Nickname Nickname_new
1 Ade/i Ade,Adi
2 Do(i)lfal Dolfal,Doilfal
EDIT:
Just in case the whole thing is not part of a dataframe but an atomic vector, you can do this:
x <- c("Ade/i", "Do(i)lfal")
c(paste0(sub("/.*", "", x[grepl("/", x)]), ", ", sub("./", "", x[grepl("/", x)])),
paste0(sub("(.*)\\((\\w)\\)(.*)", "\\1\\2\\3\\4", x[grepl("\\(", x)]), ", ", sub("\\(\\w\\)", "", x[grepl("\\(", x)])))
which gives you:
[1] "Ade, Adi" "Doilfal, Dolfal"
If there are values that you don't want to change, then this regex by #Wiktor will work (it works even with any scenario!):
x <- c("Ade/i", "Do(i)lfal", "Peter", "Mary")
gsub('(\\w*)\\((\\w+)\\)(\\w*)', '\\1\\2\\3, \\1\\3', gsub("(\\w*)(\\w)/(\\w)\\b", "\\1\\2, \\1\\3", x))
which gives you:
[1] "Ade, Adi" "Doilfal, Dolfal" "Peter" "Mary"

Extract a pattern before // and after || symbol

I am not very familiar with regex in R.
in a column I am trying to extract words before // and after || symbol. I.e. this is what I have in my column:
qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
This is what I want:
qtaro_269; qtaro_353; qtaro_375; qtaro_11
I found this: Extract character before and after "/" and this: Extract string before "|". However I don't know how to adjust it to my input. Any hint is much appreciated.
EDIT:
a qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
b
c qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
What about the following?
# Split by "||"
x2 <- unlist(strsplit(x, "\\|\\|"))
[1] "qtaro_269//qtaro_269" "qtaro_353//qtaro_353" "qtaro_375//qtaro_375" "qtaro_11//qtaro_11"
# Remove everything before and including "//"
gsub(".+//", "", x2)
[1] "qtaro_269" "qtaro_353" "qtaro_375" "qtaro_11"
And if you want it as one string with ; for separation:
paste(gsub(".+//", "", x2), collapse = "; ")
[1] "qtaro_269; qtaro_353; qtaro_375; qtaro_11"
This is how I solved it. For sure not the most intelligent and elegant way, so suggestions to improve it are welcome.
df <-unlist(lapply(strsplit(df[[2]],split="\\|\\|"), FUN = paste, collapse = "; "))
df <-unlist(lapply(strsplit(df[[2]],split="\\/\\/"), FUN = paste, collapse = "; "))
df <- sapply(strsplit(df$V2, "; ", fixed = TRUE), function(x) paste(unique(x), collapse = "; "))

Replace special character in data frame

I have a dataframe which contains in different cells a special character which I know which is. An example of the structure:
df = data.frame(col_1 = c("21 myspec^ch2 12",NA),
col_2 = c("1 myspec^ch2 4","4 myspec^ch2 212"))
The character is this myspec^ch2 and I would like to replace with -. An example of expected output:
df = data.frame(col_1 = c("21-12",NA),
col_2 = c("1-4","4-212"))
I tried this but it is not working:
df [ df == " myspec^ch2 " ] <- "-"
To get gsub work on whole dataframe use apply:
apply(df, 2, function(x) gsub(" myspec\\^ch2 ", "-", x))
You really want to do a regex-style substitution here. However, in regex, ^ is seen as the beginning of the line (rather than a literal caret). So you can do something like this (using the stringr package):
library(dplyr)
library(stringr)
fixed_df <- df %>%
mutate_all(funs(str_replace_all( . , " myspec\\^ch2 ", "-"))
Note the double backslashes in front of the caret--that escapes the caret and tells R to interpret it literally, rather than as the beginning of the line.

Search-and-replace on a set of columns - getting an error trying to gsub

this is a follow-up to this question: Search-and-replace on a list of strings - gsub eapply?
I have the following code:
library(quantmod)
library(stringr)
stockData <- new.env()
stocksLst <- c("AAB.TO", "BBD-B.TO", "BB.TO", "ZZZ.TO")
nrstocks = length(stocksLst)
startDate = as.Date("2016-09-01")
for (i in 1:nrstocks) {
getSymbols(stocksLst[i], env = stockData, src = "yahoo", from = startDate)
}
stockData = as.list(stockData)
names(stockData) = gsub("[.].*$", "", names(stockData))
names(stockData) = gsub("-", "", names(stockData))
symbolsLstCl <- ls(stockData)
The last post got me this far and I greatly appreciate the help. Now, I am trying to do a similar replace for the column names as quantmod includes the symbol name in the columns:
colnames(stockData$ZZZ)
# [1] "ZZZ.TO.Open" "ZZZ.TO.High" "ZZZ.TO.Low" "ZZZ.TO.Close" "ZZZ.TO.Volume" "ZZZ.TO.Adjusted"
I can easily update one of the xts objects using colnames, but I want to include this in a loop so I can do it to all. This is what I had tried, but it fails:
eval(parse(text = paste0("colnames(stockData$", symbolsLstCl[i], ")"))) <- eval(parse(text = (paste0("str_replace(colnames(stockData$", symbolsLstCl[i], "), ", "\".TO\", ", "\"\")"))))
Which I find strange, as if I use this (where the left side is hard-coded), it works:
colnames(stockData$ZZZ) <- eval(parse(text = (paste0("str_replace(colnames(stockData$", symbolsLstCl[i], "), ", "\".TO\", ", "\"\")"))))
I have the sneaking suspicion that there is a much better way to update all of the columns for each element in these lists.. any suggestions are appreciated. Thanks, Adam
allnames <- lapply(stockData,
function(x) names(x) = gsub(".TO", "", names(x)))
# replace column names
for (i in 1:length(stockData)) {
names(stockData[[i]]) <- allnames[[i]]
}
# print all column names
for (i in 1:length(stockData)) {
print(names(stockData[[i]]))
}
[1] "AAB.Open" "AAB.High" "AAB.Low" "AAB.Close" "AAB.Volume" "AAB.Adjusted"
[1] "BBD-B.Open" "BBD-B.High" "BBD-B.Low" "BBD-B.Close" "BBD-B.Volume" "BBD-B.Adjusted"
[1] "ZZZ.Open" "ZZZ.High" "ZZZ.Low" "ZZZ.Close" "ZZZ.Volume" "ZZZ.Adjusted"
[1] "BB.Open" "BB.High" "BB.Low" "BB.Close" "BB.Volume" "BB.Adjusted"
Edited: the output were not correct just now.
I suppose this is what you hope to get.

Identify missing values in a sequence / perform asymmetric difference between two lists

Using R, I want to efficiently identify which values in a sequence are missing. I've written the below example of how I do it. There must be a better way. Can someone help?
data.list=c(1,2,4,5,7,8,9)
full.list=seq(from = 1, to = 10, by =1)
output <- c()
for(i in 1:length(full.list)){
holder1 <- as.numeric(any(data.list == i))
output[i] <- holder1
}
which(output == 0)
Another possible solution
setdiff(full.list,data.list)
full.list[!full.list %in% data.list]
Another option using match (similar to %in%)
full.list[!match(full.list,data.list,nomatch=FALSE)]
[1] 3 6 10
Using grep():
grep(paste("^", data.list, "$", sep = "", collapse = "|"), full.list, invert = TRUE)
You could be "lazy" and use collapse = ^|$ but use the above for precise accuracy.
Using grepl():
full.list[!grepl(paste("^", data.list, "$", sep = "", collapse = "|"), full.list)]

Resources