How to save results into various data.frames with various names - r

I have the following vector:
USTickers=c("BAC","C","JPM","HBS","WFC","GS","MS","USB","BK","PNC")
My actual vector is much longer, but I shortened it here. It contains stock ticker names.
I use quantmod to download the stocks' data from Yahoo.
Since I do not want to write code for every individual ticker, I want to use a loop.
First I call getSymbols, which is not a problem: an object for the given stock is downloaded.
However, I want to make some adjustments to it and save it, and that is where I have a problem (the second line inside the for loop). The name of the object it is saved to has to change with each ticker, but I am unable to do that.
for (i in 1:(length(USTickers))) {
getSymbols.yahoo(paste(USTickers[i]),.GlobalEnv,from=StrtDt,to=EndDt)
as.symbol(USTickers[i]=data.frame(time(get(USTickers[1])),get(USTickers[1])[,4],row.names=NULL)
}
In addition:
in every stock object that I download, a column name has the form "AAL.Open", and I want to change it to "AAL". How do I change the column name?
I know it can be done with the colnames function, but I don't know how to automate the operation.
Because the first part ("AAL") keeps changing, I just want to get rid of the ".Open" part.
Basically I could just overwrite it with the ticker name, but I do not know how to do that when the column name keeps changing and I want to use my USTickers vector as the reference.

It is a better idea to turn off auto-assignment in getSymbols and store the results in a list. The elements can then be accessed easily later. See below for some ideas.
library(quantmod)
# Not going to loop through all
USTickers = c("BAC","C")#,"JPM","HBS","WFC","GS","MS","USB","BK","PNC")
# Initialise empty list
mysymbols <- vector("list", length(USTickers))
# Loop through symbols
for (i in seq_along(USTickers)) {
# Store in list (auto.assign = FALSE returns the object instead of assigning it)
mysymbols[[i]] <- getSymbols.yahoo(USTickers[i], auto.assign = FALSE)
# Keep the date and the 4th (Close) column
mysymbols[[i]] <- data.frame(time(mysymbols[[i]]),
mysymbols[[i]][,4],
row.names = NULL)
# Name the list element after the symbol
names(mysymbols)[i] <- USTickers[i]
}
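The same list can be built without a manual index by combining lapply and setNames. The sketch below runs offline: fetch_close() is a hypothetical stand-in for the getSymbols call that returns a Date column plus a "<TICKER>.Close" column. It is not quantmod's API. The value column is then renamed to the bare ticker.

```r
USTickers <- c("BAC", "C")

# Hypothetical offline stand-in for getSymbols(ticker, auto.assign = FALSE):
# returns a Date column plus a "<TICKER>.Close" column
fetch_close <- function(ticker) {
  out <- data.frame(Date = as.Date("2024-01-01") + 0:2, Close = c(10, 11, 12))
  names(out)[2] <- paste0(ticker, ".Close")
  out
}

# One list element per ticker, named after the ticker
mysymbols <- setNames(lapply(USTickers, fetch_close), USTickers)

# Rename each value column to the bare ticker
for (tk in names(mysymbols)) {
  colnames(mysymbols[[tk]])[2] <- tk
}

str(mysymbols$BAC)
```

To use real data, replace fetch_close() with a function that calls getSymbols and keeps the columns you need.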
Regarding substituting names, this can be done easily with gsub, which can be applied to the colnames. Use fixed = TRUE so that "." is matched literally instead of as a regex wildcard. For example:
gsub(".Open", "", "AAL.Open", fixed = TRUE)
However, if you just want that column name to be the ticker, you can set it directly in the loop as well: colnames(mysymbols[[i]])[2] <- USTickers[i]
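To strip the suffix with a regular expression instead, escape the dot and anchor the pattern at the end of the name. An unescaped "." matches any character. The column names below are hypothetical examples of the "<TICKER>.<field>" form.

```r
# Hypothetical column names of the "<TICKER>.<field>" form
cols <- c("AAL.Open", "AAL.High", "AAL.Close")

# Remove only a trailing ".Open"
only_open <- sub("\\.Open$", "", cols)

# Remove everything from the first "." onward, leaving just the ticker
tickers_only <- sub("\\..*$", "", cols)

only_open
tickers_only
```

On a real object the same call applies to the names: colnames(x) <- sub("\\.Open$", "", colnames(x)).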

Related

Extract items in a list using variable names in R

I'm parsing JSON using the RJSONIO package.
The parsed item contains nested lists.
Each item in the list can be extracted using something like this:
dat_raw$`12`[[31]]
which correctly returns the string stored at this location (in this example, the '12' refers to the month and [[31]] to day).
"31-12-2021"
I now want to run a for loop to sequentially extract the date for every month. Something like this:
for (m in 1:12) {
print(dat_raw$m[[31]])
}
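This, naturally, returns NULL, because $m looks for an element literally named "m" rather than using the value of the loop variable, and there is no such element in the list.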
Instead, I'd like to extract the objects stored at $`1`[[31]], $`2`[[31]], ... $`12`[[31]].
There must be a relatively easy solution here but I haven't managed to crack it. I'd value some help. Thanks.
EDIT: I've added a screenshot of the list structure I'm trying to extract. The actual JSON object is too large for dput() output. Hope this helps.
So, to get the date in this list, I'd use something like dat_raw$data$`1`[[1]]$date$gregorian$date.
What I'm trying to do is run a loop to extract multiple items of the list by cycling through $data$`1`[[1]]$..., $data$`2`[[1]]$... ... $data$`12`[[1]]$... using $data$m[[1]]$... in a for loop where m is the month.
Instead of dat_raw$`12`[[31]], you can write dat_raw[[12]][[31]], provided that "12" is the 12th element of the parsed JSON. So your for loop would be:
for (m in 1:12) {
print(dat_raw[[m]][[31]])
}
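If the JSON keys are the strings "1" to "12", you can also look elements up by name. A name-based lookup does not depend on the keys being stored in numeric order. The sketch below uses a hypothetical stand-in for dat_raw, because the real structure is only shown in the screenshot. The loop counter is converted to the character key with as.character().

```r
# Hypothetical nested list shaped like the parsed JSON: top-level keys "1".."12"
# (months), each holding one date string per day in "dd-mm-2021" format
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
dat_raw <- setNames(
  lapply(1:12, function(m) {
    as.list(sprintf("%02d-%02d-2021", seq_len(days_in_month[m]), m))
  }),
  as.character(1:12)
)

# Look up by key: convert the loop counter to the character name
for (m in 1:12) {
  print(dat_raw[[as.character(m)]][[1]])
}

# Or collect the last day of every month into a named character vector
last_days <- vapply(dat_raw, function(month) month[[length(month)]], character(1))
last_days[["12"]]
```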

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked). However, my next step is to combine them into one big vector that holds each of these variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can set up the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for, but because it uses cat(), it's essentially a print statement and its output can't be stored anywhere. Not to mention I feel like the method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
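Here is a minimal sketch of two ways to avoid maintaining the c() call by hand. It assumes a hypothetical my_fun() in place of the elided FUNCTION GOES HERE. The first approach keeps the outputs in a named list. The second fetches existing test_ variables by name with mget(). In both cases, new ids in url_num are picked up automatically. unlist() concatenates atomic outputs the same way c() does.

```r
url_num <- c('85054655', '85023543', '85001177')

# Hypothetical stand-in for the elided "FUNCTION GOES HERE"
my_fun <- function(id) paste0("result_", id)

# Option 1: skip the test_ variables and keep the outputs in a named list
results <- lapply(url_num, my_fun)
names(results) <- paste0("test_", url_num)
combined <- unlist(results, use.names = FALSE)

# Option 2: if the test_ variables already exist (e.g. created with assign()),
# mget() fetches them by name
for (id in url_num) assign(paste0("test_", id), my_fun(id))
combined2 <- unlist(mget(paste0("test_", url_num)), use.names = FALSE)

combined
```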

In R, I am trying to make a for loop that will cycle through variable names and perform functions on them

I have variables that are named team.1, team.2, team.3, and so forth.
First of all, I would like to know how to go through each of these and assign a data frame to each one. So team.1 would have data from one team, then team.2 would have data from a second team. I am trying to do this for about 30 teams, so instead of typing the code out 30 times, is there a way to cycle through each with a counter or something similar?
I have tried things like
vars = list(sprintf("team.x%s", 1:33))
to create my variables, but then I have no luck assigning anything to them.
Along those same lines, I would like to be able to run a function I made for cleaning and sorting the individual data sets on all of them at once.
For this, I have tried a for loop
for (j in 1:33) {
assign(paste("team.",j, sep = ""), cleaning1(paste("team.",j, sep =""), j))
}
where cleaning1 is my function, which takes two arguments, e.g.:
cleaning1(team.1, 1)
This produces the error message
Error in who[, -1] : incorrect number of dimensions
So I am hoping the loop would step through my data sets, pass each one to my function, and overwrite each data set with the newly cleaned data.
Is something like this possible? I am a complete newbie, so the more basic, the better.
Edit:
cleaning1:
library(dplyr)   # tbl_df()
library(stringr) # str_sub()
cleaning1 = function (who, year) {
who[,-1]
who$SeasonEnd = rep(year, nrow(who))
who = (who[-nrow(who),])
who = tbl_df(who)
for (i in 1:nrow(who)) {
if ((str_sub(who$Team[i], -1)) == "*") {
who$Playoffs[i] = 1
} else {
who$Playoffs[i] = 0
}
}
who$Team = gsub("[[:punct:]]",'',who$Team)
who = who[c(27:28,2:26)]
return(who)
}
This works just fine when I run it on the data sets I have compiled myself.
To run it though, I have to go through and reassign each data set, like this:
team.1 = cleaning1(team.1, 1)
team.2 = cleaning1(team.2, 2)
So, I'm trying to find a way to automate that part of it.
I think your problem would be better solved by using a list of data frames instead of many variables containing one data frame each.
You do not say where you get your data from, so I am not sure how you would create the list. But assuming you have your data frames already stored in the variables team.1 etc., you could generate the list with
team.list <- list(team.1, team.2, ...,team.33)
where the dots stand for the variables that I did not write explicitly (you will have to do that). This is tedious, of course, and could be simplified as follows
team.list <- mget(paste0("team.", 1:33))
The paste0 command creates the variable names as strings, and mget looks up the objects with those names and returns them as a named list.
Now that you have all your data in a list, it is much easier to apply a function to all of them. I am not quite sure how the year argument should be used, but from your example, I assume that it just runs from 1 to 33 (let me know if this is not true and I'll change the code). So the following should work:
team.list.cleaned <- mapply(cleaning1, team.list, 1:33, SIMPLIFY = FALSE)
It will go through all elements of team.list and 1:33 and apply the function cleaning1 with the elements as its arguments. SIMPLIFY = FALSE keeps mapply from trying to collapse the equally sized data frames into a matrix, so the result is again a list containing the output of each call, i.e.,
list( cleaning1(team.list[[1]],1), cleaning1(team.list[[2]],2), ...)
Since you are new to R, I strongly recommend that you read the help on the apply family (apply, lapply, tapply, mapply). They are very useful, and once you get used to them, you will use them all the time...
There is probably also a simple way to directly generate the list of data frames using lapply. As an example: if the data frames are read in from files and you have the file names stored in a character vector file.names, then something along the lines of
team.list <- lapply(file.names,read.table)
might work.
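Below is a runnable sketch of the list workflow on three toy data frames. It uses a simplified, hypothetical clean_team() in place of cleaning1, which needs the real data. Map() is mapply(..., SIMPLIFY = FALSE), so the cleaned data frames come back as a list named team.1, team.2, ... instead of being simplified into a matrix.

```r
# Toy stand-ins for team.1 .. team.3 (hypothetical data)
team.1 <- data.frame(Team = c("Bulls*", "Knicks"), W = c(50, 30), stringsAsFactors = FALSE)
team.2 <- data.frame(Team = c("Lakers", "Celtics*"), W = c(40, 55), stringsAsFactors = FALSE)
team.3 <- data.frame(Team = c("Heat*", "Nets"), W = c(60, 20), stringsAsFactors = FALSE)

# Hypothetical, simplified cleaning function (not the question's cleaning1)
clean_team <- function(who, year) {
  who$SeasonEnd <- year
  who$Playoffs <- as.integer(endsWith(who$Team, "*"))  # 1 if the name ends in "*"
  who$Team <- gsub("[[:punct:]]", "", who$Team)
  who
}

# Gather the variables into a named list, then clean every element
team.list <- mget(paste0("team.", 1:3))
team.list.cleaned <- Map(clean_team, team.list, seq_along(team.list))

team.list.cleaned$team.2
```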

Creating (and saving to) an object with a random name

I have a function which I use repeatedly. One of the things it returns is a plot visualising effects of a model. I want the function to save the plot to an object, but I want the name of the object to have a random component to it. I use the function multiple times and don't want the plots to overwrite. But I could use the unique identifier in its name to reference it later for the writeup.
So I tried a few things to save a simple object under a partially random name. All of them fail because I put a function call on the left of the "<-" sign. I'm not going to give examples, because they are just very, very wrong.
So I'd like to have something like:
NAME(randomNumber) <- "some plot"
Which, after running multiple times in a function (with the actual input on the right of course) would result in objects named randomly like
NAME104, NAME314, NAME235, etc.
Is this at all doable?
Yes, it's doable.
Don't do it.
Make a LIST of objects. You can use the name as the key in the list. Example:
plots = list()
plots[["NAME104"]] = "some plot"
plots[["NAMEXXX"]] = "some other plot"
Why? Because now it's easy to loop over the plots stored in the list. It's also easy to create the list in a loop in the first place, something like:
for(i in 1:100){
data = read.csv(paste0("data", i, ".csv"))
name = data$name[1] # get name from column in file
plots[[name]] = plotthing(data)
}
If you really really want to create a thing with a random name, use assign:
> assign(paste0("NAME",round(runif(1,1,1000))), "hello")
> ls(pattern="^NAME")
[1] "NAME11" "NAME333" "NAME717" "NAME719"
But really, DON'T do that.
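Here is a sketch of the list-based alternative with generated names. It uses a hypothetical store_plot() helper. A counter suffix, unlike round(runif(...)), gives each new entry a distinct name as long as entries are only appended, so nothing is overwritten. Plain strings stand in for plot objects here.

```r
plots <- list()

# Hypothetical helper: append obj under the next numbered "<prefix><n>" name
store_plot <- function(store, obj, prefix = "NAME") {
  key <- paste0(prefix, length(store) + 1)
  store[[key]] <- obj
  store
}

plots <- store_plot(plots, "some plot")
plots <- store_plot(plots, "some other plot")

# Loop over everything stored so far
for (key in names(plots)) {
  cat(key, ":", plots[[key]], "\n")
}
```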

How to remove selected R variables without having to type their names

While testing a simulation in R using randomly generated input data, I have found and fixed a few bugs and would now like to re-run the simulation with the same data, but with all intermediate variables removed to ensure it's a clean test.
Is there a way to remove several dozen manually selected variables from the workspace without having to:
a) clobber the entire workspace, e.g. rm(list=ls()), or b) type each variable name, e.g. remove(name1, name2, ...)?
The ideal solution would be to use ls() to inspect the definitions and then pick out the indices of the ones I want to remove, e.g.
ls() # inspect definitions
delme <- c(3,5,7:9,11,13) # names selected for removal
remove(ls()[delme]) # DESIRED SOLUTION -- doesn't quite work this way
(In hindsight, I should have used a fixed seed to generate the random input data, which would allow clearing everything and then re-running the test...)
There is a much simpler and more direct solution:
vars.to.remove <- ls()
vars.to.remove <- vars.to.remove[c(1,2,14:15)]
rm(list = vars.to.remove)
Or, better yet, if you are good about variable naming schemes, you can use the following pattern matching strategy:
E.g., I start the names of all temporary variables with "Temp.",
so you can have Temp.Names, Temp.Values, Temp.Whatever, ...
The following produces the list of variables that match this pattern
ls(pattern = "^Temp\\.")
So, you can remove all unneeded variables using ONE line of code, as follows:
rm(list = ls(pattern = "^Temp\\."))
Hope this helps.
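Both approaches can be tried safely in a scratch environment created with new.env(). Because rm() and ls() both take an envir argument, nothing in the global workspace is affected. The variable names and the index below are hypothetical.

```r
# Scratch environment so the demo never touches the real workspace
e <- new.env()
for (nm in c("Keep.me", "Sim.out", "Temp.Names", "Temp.Values")) {
  assign(nm, 1, envir = e)
}

ls(e)  # inspect, then pick indices as in the question
delme <- 2  # hypothetical choice: "Sim.out"
rm(list = ls(e)[delme], envir = e)

# Pattern-based removal of everything starting with "Temp."
rm(list = ls(e, pattern = "^Temp\\."), envir = e)
ls(e)
```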
Assad, while I think the actual answer to the question is in the comments, let me suggest this pattern as a broader solution:
rm(list=
Filter(
Negate(is.na), # filter entries corresponding to objects that don't meet function criteria
sapply(
ls(pattern="^a"), # only objects that start with "a"
function(x) if(is.matrix(get(x))) x else NA # return names of matrix objects
) ) )
In this case, I'm removing all matrix objects whose names start with "a". By modifying the pattern argument and the function used by sapply here, you can get pretty fine control over what you delete, without having to specify many names.
If you are concerned that this could delete something you don't want to delete, you can store the result of the Filter(... operation in a variable, review the contents, and then execute the rm(list=...) command.
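Filter() keeps the elements for which the predicate returns TRUE. This means the predicate can test each object directly, with no NA placeholders. The sketch below applies the same matrix selection to hypothetical objects in a scratch environment and includes the review step described above.

```r
e <- new.env()
assign("a1", matrix(1:4, nrow = 2), envir = e)
assign("a2", 1:3, envir = e)
assign("b1", matrix(0, nrow = 1, ncol = 1), envir = e)

# Candidates by name, then keep only those whose value is a matrix
candidates <- ls(e, pattern = "^a")
to_remove <- Filter(function(x) is.matrix(get(x, envir = e)), candidates)
to_remove  # review before deleting
rm(list = to_remove, envir = e)
```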
Try
eval(parse(text=paste("rm(",paste(ls()[delme],collapse=","),")")))
(rm(list = ls()[delme]) does the same without building code as text.)
I had a similar requirement. I pulled all the elements I needed to a list:
varsToPurge = as.list(ls())
I then reassign the few values I wish to keep to new variable names, which will not be in varsToPurge. After that I loop through the elements:
for (j in seq_along(varsToPurge)){
rm(list = as.character(varsToPurge[j]))
}
Do a little garbage collecting, and you maintain a clean environment as you go through your code.
gc()
You can also keep a vector of the indices you wish to keep and loop through that instead, but it won't be as dynamic if you add scratch work you wish to remove.
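Selecting what to keep by name instead of by position stays correct when new scratch variables appear. Here is a sketch with setdiff(), assuming a hypothetical keep vector and a scratch environment:

```r
e <- new.env()
for (nm in c("input_data", "result", "tmp1", "tmp2")) assign(nm, 0, envir = e)

# Hypothetical names to keep; everything else in e is removed
keep <- c("input_data", "result")
rm(list = setdiff(ls(e), keep), envir = e)
invisible(gc())  # optional garbage collection
```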
