Search-and-replace on a list of strings - gsub eapply? - r

Here is a simplified excerpt of my code for reproduction purposes:
library("quantmod")
stockData <- new.env()
stocksLst <- c("AAB.TO", "BBD-B.TO", "BB.TO", "ZZZ.TO")
nrstocks = length(stocksLst)
startDate = as.Date("2016-09-01")
for (i in 1:nrstocks) {
getSymbols(stocksLst[i], env = stockData, src = "yahoo", from = startDate)
}
My data is then stored in this environment stockData which I use to do some analysis. I'd like to clean up the names of the xts objects, which are currently:
ls(stockData)
[1] "AAB.TO" "BB.TO" "BBD-B.TO" "ZZZ.TO"
I want to remove the - and the .TO from all of the names. I have tried gsub and eapply without success, and I can't figure out the appropriate syntax. Any help would be appreciated. Thanks.

Using as.list and gsub:
library("quantmod")
stockData <- new.env()
stocksLst <- c("AAB.TO", "BBD-B.TO", "BB.TO", "ZZZ.TO")
nrstocks = length(stocksLst)
startDate = as.Date("2016-09-01")
for (i in 1:nrstocks) {
getSymbols(stocksLst[i], env = stockData, src = "yahoo", from = startDate)
}
ls(stockData)
# [1] "AAB.TO" "BB.TO" "BBD-B.TO" "ZZZ.TO"
# convert to a list for ease of manipulation
stockData <- as.list(stockData)
# remove every "-" and the trailing ".TO" from the names
names(stockData) <- gsub("-|[.]TO$", "", names(stockData))
# alternatively, drop the "." and everything after it, then the "-"
# names(stockData) <- gsub("-", "", gsub("[.].*$", "", names(stockData)))
sort(names(stockData))
# [1] "AAB" "BB" "BBDB" "ZZZ"
# convert back to an environment; list2env() returns it, so assign the result
stockData <- list2env(stockData)
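The question also mentions eapply. eapply applies a function to the values stored in an environment, so it cannot rename them by itself. A minimal sketch of a rename that goes through mget and list2env is below. It uses small data frames as stand-ins for the downloaded xts objects so that it runs without network access, and clean_names and cleanData are illustrative names, not part of quantmod.

```r
# Stand-in for the environment filled by getSymbols(); the values are dummies.
stockData <- new.env()
for (sym in c("AAB.TO", "BBD-B.TO", "BB.TO", "ZZZ.TO")) {
  assign(sym, data.frame(close = 1:3), envir = stockData)
}

# Hypothetical helper: drop every "-" and a trailing ".TO".
clean_names <- function(x) gsub("-|\\.TO$", "", x)

# Pull all objects out as a named list, rename them, and build a fresh environment.
old_names <- ls(stockData)
objs <- mget(old_names, envir = stockData)
names(objs) <- clean_names(old_names)
cleanData <- list2env(objs, envir = new.env())

# The new environment holds AAB, BB, BBDB and ZZZ.
print(ls(cleanData))
```

Building a new environment rather than renaming in place keeps the original objects available under their old names until you are sure the cleaned names do not collide.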

Rather than base R functions such as gsub (see ?regex), you may find the string functions in the stringr package easier to work with while learning R. You can use str_replace:
library(stringr)
e.stocks <- list2env(setNames(lapply(stocksLst, function(x) getSymbols(x, env = NULL)),
str_replace(str_replace(stocksLst, "-", ""), "\\.TO$", "")))
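As a small sketch of the string cleaning alone (no download involved): str_replace changes only the first match in each string, while str_replace_all changes every match, so a single str_replace_all call with an alternation can handle both the "-" and the ".TO" suffix. The variable names cleaned and first_only are illustrative.

```r
library(stringr)

stocksLst <- c("AAB.TO", "BBD-B.TO", "BB.TO", "ZZZ.TO")

# One pass: remove every "-" and a ".TO" at the end of the string.
cleaned <- str_replace_all(stocksLst, "-|\\.TO$", "")
print(cleaned)

# str_replace() stops after the first match in each string,
# so the second "-" survives here.
first_only <- str_replace("A-B-C.TO", "-", "")
print(first_only)
```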

Related

R - call xts via character

Assume we have a list of characters as basis for a function, which has a xts object with the same name as result:
library(zoo)
library(xts)
library(quantmod)
l<-list("AAPL","NKE")
for(i in 1:length(l)){
getSymbols(l[[i]], src = "yahoo")
write.zoo(l[[i]], paste(l[[i]],".csv", sep=''), sep = ",")
}
My code does not work because getSymbols creates an xts object (named AAPL / NKE), and I cannot refer to those objects properly in the write.zoo call. Can you please help me?
Call getSymbols with auto.assign = FALSE so that the data is returned directly instead of being assigned to a variable.
library(quantmod)
syms <- c("AAPL", "NKE")
for(s in syms) {
dat <- getSymbols(s, auto.assign = FALSE)
write.zoo(dat, paste0(s, ".csv"), sep = ",")
}
Here, we need get() to retrieve, by its name, the object that getSymbols created:
for(i in 1:length(l)){
getSymbols(l[[i]], src = "yahoo")
write.zoo(get(l[[i]]), paste(l[[i]],".csv", sep=''), sep = ",")
}
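The lookup-by-name step can be tried without downloading anything. The sketch below uses two small data frames as stand-ins for the xts objects that getSymbols would create, and writes to a temporary directory; out_dir and objs are illustrative names. mget is the vectorised counterpart of get and returns a named list.

```r
# Stand-ins for the objects getSymbols() would create in the workspace.
AAPL <- data.frame(date = as.Date("2016-09-01") + 0:1, close = c(1, 2))
NKE  <- data.frame(date = as.Date("2016-09-01") + 0:1, close = c(3, 4))

syms <- c("AAPL", "NKE")
out_dir <- tempdir()   # assumption: write to a temporary directory

# mget() fetches several objects by name at once and returns a named list.
objs <- mget(syms)
for (s in names(objs)) {
  write.csv(objs[[s]], file.path(out_dir, paste0(s, ".csv")), row.names = FALSE)
}
```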

removing spaces from all column names at once in R using gsub [duplicate]

I am reading in a bunch of CSVs that have stuff like "sales - thousands" in the title and come into R as "sales...thousands". I'd like to use a regular expression (or other simple method) to clean these up.
I can't figure out why this doesn't work:
#mock data
a <- data.frame(this.is.fine = letters[1:5],
this...one...isnt = LETTERS[1:5])
#column names
colnames(a)
# [1] "this.is.fine" "this...one...isnt"
#function to remove multiple spaces
colClean <- function(x){
colnames(x) <- gsub("\\.\\.+", ".", colnames(x))
}
#run function
colClean(a)
#names go unaffected
colnames(a)
# [1] "this.is.fine" "this...one...isnt"
but this code does:
#direct change to names
colnames(a) <- gsub("\\.\\.+", ".", colnames(a))
#new names
colnames(a)
# [1] "this.is.fine" "this.one.isnt"
Note that I'm fine leaving one period between words when that occurs.
Thank you.
You can use gsub to replace every . with another character such as #:
names(a) <- gsub(x = names(a), pattern = "\\.", replacement = "#")
Rich Scriven had the answer. The function modifies only its local copy of x, so a is left unchanged. Make the function return x:
colClean <- function(x){ colnames(x) <- gsub("\\.\\.+", ".", colnames(x)); x }
and then assign the result back to update a:
a <- colClean(a)
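A short sketch of the copy semantics at work, using the mock data from the question. The pattern "\\.{2,}" is an equivalent way of writing "two or more periods", and cleaned is an illustrative name.

```r
a <- data.frame(this.is.fine = letters[1:5],
                this...one...isnt = LETTERS[1:5])

# Collapse runs of two or more periods into one and return the modified copy.
colClean <- function(x) {
  colnames(x) <- gsub("\\.{2,}", ".", colnames(x))
  x
}

cleaned <- colClean(a)     # a itself is untouched at this point
print(colnames(a))         # still has "this...one...isnt"
print(colnames(cleaned))   # has "this.one.isnt"
```

Only the reassignment a <- colClean(a) changes a, because R passes a copy of the data frame to the function.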

Remove path from variable name in a dataframe

I've put together a function that looks like this, with the first comment lines being an example. Most importantly here is the set.path variable that I use to set the path initially for the function.
# igor.multifile.import(set.path = "~/Desktop/Experiment1 Folder/SCNavigator/Traces",
# set.pattern = "StepsCrop.ibw",
# remove.na = TRUE)
igor.multifile.import <- function(set.path, set.pattern, remove.na){
{
require("IgorR")
require("reshape2")
raw_list <- list.files(path= set.path,
pattern= set.pattern,
recursive= TRUE,
full.names=TRUE)
multi.read <- function(f) { # Note that "temp.data" is just a placeholder in the function
temp_data <- as.vector(read.ibw(f)) # Change extension to match your data type
}
my_list <- sapply(X = raw_list, FUN = multi.read) # Takes all files gathered in raw_list and applies multi.read()
my_list_combined <- as.data.frame(do.call(rbind, my_list))
my_list_rotated <- t(my_list_combined[nrow(my_list_combined):1,]) # Matrix form
data_out <- melt(my_list_rotated) # "Long form", readable by ggplot2
data_out$frame <- gsub("V", "", data_out$Var1)
data_out$name <- gsub(set.path, "", data_out$Var2) # FIX THIS
}
if (remove.na == TRUE){
set_name <- na.omit(data_out)
} else if (remove.na == FALSE) {
set_name <- data_out
} else (set_name <- data_out)
}
When I run this function I get a large data frame in which each file that matched the pattern shows up with a name like
/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/Par994/StepsCrop.ibw
that includes the entire file path, which is unwieldy to read and work with.
I've tried to remove the path part with the line
data_out$name <- gsub(set.path, "", data_out$Var2)
similar to the command above it that strips the "V" from the automatically named V1, V2, V3... (which works). However, I can't remove the part of the string that matches set.path = "my/path/".
Regardless of what your set.path is, you can strip everything up to and including the last slash:
mypath <- "/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/Par994/StepsCrop.ibw"
gsub(".*/", "", mypath)
[1] "StepsCrop.ibw"
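Base R also has basename and dirname for this, which avoid regular expressions altogether. The sketch below is one possible approach, not the original poster's code. It also shows a literal prefix removal with fixed = TRUE, where prefix is a hypothetical, already-expanded version of set.path. If set.path is given with a leading ~ while the listed paths are fully expanded, as the example output suggests, the unexpanded string cannot match; path.expand is the base R function for expanding it.

```r
mypath <- "/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/Par994/StepsCrop.ibw"

# Last path component, and the name of the folder that contains it.
file_name <- basename(mypath)            # "StepsCrop.ibw"
folder    <- basename(dirname(mypath))   # "Par994"

# Removing a known prefix literally: fixed = TRUE turns off regex matching.
# prefix is a hypothetical, already-expanded version of set.path.
prefix   <- "/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/"
relative <- sub(prefix, "", mypath, fixed = TRUE)   # "Par994/StepsCrop.ibw"

print(c(file_name, folder, relative))
```

Since every file here is called StepsCrop.ibw, the folder name (Par994) is probably the more useful label for the name column.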

Reading multiple csv of same format in a data frame

I need to run the same set of code on multiple CSV files, and I want to do it all with the same macro. Below is the code I am executing, but the results are not coming out properly: the data is read in a 2-D format, while I need it in a 3-D format.
lf = list.files(path = "D:/THD/data", pattern = ".csv",
full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
ds<-lapply(lf,read.table)
I don't know if this is going to be useful, but one of the ways I do it is:
## Step 1: read files
mycsv = dir(pattern = "\\.csv$")
n <- length(mycsv)
mylist <- vector("list", n)
for(i in seq_len(n)) mylist[[i]] <- read.csv(mycsv[i], header = TRUE)
Then I usually just use lapply to change things, for example:
## Change column names
mylist <- lapply(mylist, function(x) {names(x) <- c("type","date","v1","v2","v3","v4","v5","v6","v7","v8","v9","v10","v11","v12","v13","v14","v15","v16","v17","v18","v19","v20","v21","v22","v23","v24","total") ; return(x)})
## Change the type column to weekday/weekend
mylist <- lapply(mylist, function(x) {
f = c("we", "we", "wd", "wd", "wd", "wd", "wd")
x$type = rep(f,52, length.out = 365)
return(x)
})
and so on.
After all the changes, I save the files again with the following code. (It is also sometimes useful to split the original file name and save each file with part of that name, so that I can track the individual files later.)
## For example, some of my files had names following a pattern such as "201_E424220_N563500.csv", so I split the name like this:
mylist <-lapply(1:length(mylist), function(i) {
mylist.i <- mylist[[i]]
s = strsplit(mycsv[i], "_" , fixed = TRUE)[[1]]
d = cbind(mylist.i[, c("type", "date")], ID = s[1], Easting = s[2], Northing = s[3], mylist.i[, 3:ncol(mylist.i)])
return(d)
})
for(i in seq_len(n))
write.csv(mylist[[i]], file = paste("file", i, ".csv", sep = ""), row.names = FALSE)
I hope this helps. When you get some time, please read about the plyr package, which I am sure you will find very useful; it offers many data analysis options. plyr has apply-style functions such as:
## l_ply split list, apply function and discard result
## ldply split list, apply function and return result in data frame
## laply split list, apply function and return result in an array
For example, you can use ldply to read all your CSV files and return a single data frame, something like:
data = ldply(list.files(pattern = ".csv"), function(fname) {
j = read.csv(fname, header = T)
return(j)
})
So here data will be a single data frame containing the data from all your CSV files (j is just the data frame read from one file).
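If you would rather not depend on plyr, roughly the same result can be sketched in base R with lapply and do.call(rbind, ...). The example below creates two small CSV files in a temporary folder so that it is self-contained; csv_dir, frames and combined are illustrative names, and it assumes all files share the same columns.

```r
# Create two small CSV files in a temporary folder so the example is self-contained.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, v1 = c(10, 20)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, v1 = c(30, 40)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# Read every CSV into a list of data frames, then stack them row-wise.
files  <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(files, read.csv)
names(frames) <- basename(files)   # keep track of which file each frame came from
combined <- do.call(rbind, frames)

print(combined)
```

Keeping the intermediate list (frames) around is useful when each file needs its own clean-up step before the data is combined.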
Thanks,Ayan
