Creating different sets of variables, and assigning them using if else - r

I have a set of variables. And for different scenarios, I would like to use a different set. So I was wondering if there is a way of doing so without using the if-else statement? Below is what I am trying to achieve using the if-else approach:
if (broker == "A") {
  brokeragefutures <- aaaa
  brokerageoptions <- bbbb
  STT <- cccc
} else if (broker == "B") {
  brokeragefutures <- xxxx
  brokerageoptions <- yyyy
  STT <- zzzz
}
# ... and so on for each broker
So in the end, I would just use the broker name and the variables will automatically be used for my computations. This would make trying different scenarios very easy.
My question is: is there a more efficient way of doing this than a chain of if-else statements?
Thanks.
Shivam

This looks like what lists are for. Assuming you have all your data available, create a list like mylist <- list('A' = c(aaaa, bbbb, cccc), 'B' = c(xxxx, yyyy, zzzz)) and so on, with one list element for each broker. Then, to retrieve the data of interest, just use mylist$B for broker "B", and so on.
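A minimal sketch of this idea, with placeholder values standing in for the real brokerage figures; using a named inner list keeps each field addressable by name:

```r
# One list entry per broker; each entry is itself a named list, so the
# fields keep their names (the numbers here are placeholders).
brokers <- list(
  A = list(brokeragefutures = 0.01, brokerageoptions = 0.05, STT = 0.001),
  B = list(brokeragefutures = 0.02, brokerageoptions = 0.04, STT = 0.002)
)

broker <- "B"
params <- brokers[[broker]]   # the whole set for that broker
params$brokeragefutures       # individual values by name
```

Switching scenarios is then just a matter of changing the `broker` string.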

Solution for a single broker and assignment to a single set of values
Depending on volumes, I would suggest a mapping table that allows you to retrieve the corresponding variables needed in each instance. You could then use get to retrieve the variable required.
Here is a short example in line with your code:
broker <- "B"

# Set up the initial variables
aa <- 1
bb <- 2

# Build the mapping table
mapping <- data.frame(broker = LETTERS[1:2],
                      brokeragefutures = c("aa", "bb"),
                      stringsAsFactors = FALSE)

# Use get() on the variable name mapped to the broker
brokeragefutures <- get(mapping[mapping$broker == broker, 2])
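If each broker has several associated variables, the same idea extends to a mapping table with one column per variable, fetching them all at once with mget. A sketch, where the column names and values are assumptions for illustration:

```r
aa <- 1; bb <- 2; cc <- 3; dd <- 4   # placeholders for the real values

mapping <- data.frame(broker           = c("A", "B"),
                      brokeragefutures = c("aa", "bb"),
                      STT              = c("cc", "dd"),
                      stringsAsFactors = FALSE)

broker <- "B"
# Look up the broker's row, then fetch every mapped variable in one call
vars <- unlist(mapping[mapping$broker == broker, -1], use.names = FALSE)
vals <- mget(vars)   # named list: vals$bb, vals$dd
```

mget returns a named list, so the retrieved values stay labelled by the variable names they came from.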

Related

whisper-merge appears to replace rather than aggregate duplicate values

I'd like to use whisper-merge to merge two whisper databases using an aggregation method, but it appears that one piece of data replaces the other if there's any overlap of timestamps.
Obviously graphite itself can aggregate data, but does anyone know of a command-line utility to perform aggregation?
You can use whisper-fill.py instead of whisper-merge to add the missing data without overwriting existing points.
After that, run whisper-set-aggregation-method.py to set the aggregation method you want.
Ref: https://github.com/graphite-project/whisper
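A sketch of the sequence, assuming the whisper scripts are on your PATH; the file names are placeholders:

```shell
# Copy data points from src.wsp into dst.wsp only where dst.wsp has
# gaps, so existing points in dst.wsp are not overwritten.
whisper-fill.py src.wsp dst.wsp

# Then set the aggregation method on the result
# (one of: average, sum, last, max, min).
whisper-set-aggregation-method.py dst.wsp sum
```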

index by name or by position in list / vector, which is faster?

I am currently trying to optimise the speed of a physical model computation. The specificity of this model is that it uses hundreds of input parameters, all stored in a big named vector:
initialize = c("temperature"=100, "airpressure"=150, "friction"=0.46)
The model, while iterating hundreds of times, needs to access the parameters, possibly updates them, etc.:
compute(initialize['temperature'], initialize['airpressure'])
initialize['friction'] <- updateP(initialize['friction'])
This is the logic. However, I wonder whether working like this is really efficient. What actually happens behind indexing by name, and is it fast? Some ideas for changing this logic:
- define each parameter as an independent variable in the environment (but then how would I pass a large number of them as arguments to a function?);
- keep the parameters in a list instead of a named vector;
- access each parameter by its index in the vector, like this:
compute(initialize[1], initialize[2])
If I go with this last solution, of course I will lose some readability (which parameter is actually initialize[1]?). So one way to go could be to define the positions first:
temperature.pos <- 1
airpressure.pos <- 2
compute(initialize[temperature.pos], initialize[airpressure.pos])
Of course, why didn't I simply try this and test the speed? Well, it would take me hours to change every parameter access in the script, which is why I am asking before doing it.
And maybe there is an even more clever solution?
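One way to answer this without rewriting the whole script is to time the two access styles in isolation first. A quick base-R sketch (timings will vary by machine, so the numbers here are not predictions):

```r
# Time name-based vs position-based access on a named vector.
params <- c(temperature = 100, airpressure = 150, friction = 0.46)

n <- 1e5
t_name <- system.time(for (i in seq_len(n)) x <- params[["friction"]])["elapsed"]
t_pos  <- system.time(for (i in seq_len(n)) x <- params[[3]])["elapsed"]

# Note: [[ ]] extracts a single unnamed element, while [ ] builds a
# named one-element vector, so [[ ]] is generally the cheaper choice
# inside a hot loop regardless of name vs position.
c(by_name = t_name, by_position = t_pos)
```

The same pattern works for comparing a list against a named vector: swap the object, keep the loop.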
Thanks

Using For-Loop With Strings

I'm learning R and trying to use it for a statistical analysis at the same time.
Here, I am in the first part of the work: I am writing matrices and doing some simple things with them, in order to work later with these.
punti<-c(0,1,2,4)
t1<-matrix(c(-8,36,-8,-20,51,-17,-17,-17,57,-19,-19,-19,35,-8,-19,-8,0,0,0,0,-20,-20,-20,60,
-8,-8,-28,44,-8,-8,39,-23,-8,-19,35,-8,57,-8,-41,-8,-8,55,-8,-39,-8,-8,41,-25,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),ncol=4,byrow=T)
colnames(t1) <- c("20","1","28","19")
r1<-matrix(c(12,1,19,9,20,20,11,20,20,11,20,28,0,0,0,12,19,19,20,19,28,15,28,19,11,28,1,
33,20,28,31,1,19,17,28,19,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),ncol=3,byrow=T)
pt1<-rbind(sort(colSums(t1)),sort(punti))
colnames(r1)<-c("Valore","Vincitore","Perdente")
r1<-as.data.frame(r1)
But I have more matrices t_ and r_ so I would like to run a for-loop like:
for (i in 1:150)
{
  pt[i] <- rbind(sort(colSums(t[i])), sort(punti))
  colnames(r[i]) <- c("Valore","Vincitore","Perdente")
  r[i] <- as.data.frame(r[i])
}
This one just won't work, because t1, r1, pt1 and so on are separate variables whose names I can only build as strings, but you get the idea: I would rather not copy-paste these three lines and manually edit the index 150 times. Is there a way to do it?
Personally, I don't advise dynamically and automatically creating lots of variables in the global environment, and I would encourage you to think about how to accomplish your goal without such an approach. That said, if you feel you really need to create all these variables dynamically, you may benefit from the assign function, combined with get to look up the existing t1, t2, ... objects by name.
It could work like so:
for (i in 1:150)
{
  assign(paste0('pt', i),
         rbind(sort(colSums(get(paste0('t', i)))), sort(punti)))
}
The first argument of assign is the name of the variable to create; the second argument is the value you wish to assign to it.
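The approach advised above, keeping everything in a list instead of 150 numbered variables, could look like this sketch (with two small placeholder matrices standing in for t1 ... t150):

```r
punti <- c(0, 1, 2, 4)

# The matrices live in one named list rather than as t1, t2, ...
t_list <- list(t1 = matrix(1:8,  ncol = 4),
               t2 = matrix(9:16, ncol = 4))

# One lapply call computes every pt at once: pt_list$t1, pt_list$t2, ...
pt_list <- lapply(t_list, function(m) rbind(sort(colSums(m)), sort(punti)))
```

The r matrices can be handled the same way with a second list and a second lapply, and no names ever need to be built as strings.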

Declaration of mass variables in column headings in R

I cannot figure out how to use the column headers from my imported xlsx sheet as variables. I have several column headers, for example DAY_CHG and INPUT_CHG. So far, I can only run gls(DAY_CHG~INPUT_CHG) by first assigning the values to variables, e.g. X <- mydata$DAY_CHG. Is there some command that makes these variables available automatically when I import?
I had horrible problems getting the program up and running, by the way, due to firewalls at the firm for which I'm working, wondering if that's causing some of the issue.
Any help is much appreciated. Thanks!
attach(mydata) will allow you to directly use the variable names. However, attach may cause problems, especially with more complex data/analyses (see Do you use attach() or call variables by name or slicing? for a discussion)
An alternative would be to use with, as in with(mydata, gls(DAY_CHG ~ INPUT_CHG)).
I would suggest using $ to refer to the headers as variables while still being able to work with other data sets. All you need to do is assign the data to an object such as your mydata; by putting a $ immediately after it, you can refer to your headers as variables.
As an example for your case, instead of creating a new object x, simply take what you assigned x to and put it directly into your command.
gls(mydata$DAY_CHG ~ mydata$INPUT_CHG)
When things get more complicated, with several data sets involved, this keeps all of them accessible instead of limiting you to the one you attach().
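A sketch of the with approach on a built-in data set, using lm in place of gls (gls requires the nlme package, which is not assumed here):

```r
# with() evaluates the expression inside the data frame, so column
# names can be used directly: no mydata$ prefixes and no attach().
fit <- with(mtcars, lm(mpg ~ wt))
coef(fit)   # intercept and slope
```

The same call with your own data would be with(mydata, gls(DAY_CHG ~ INPUT_CHG)).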

How to get R to use a certain dataset for multiple commands without using attach() or adding data= to every command

So I'm trying to manipulate a simple Qualtrics CSV, and I want to use colSums on certain columns of data, given a certain filter.
For example: within the .csv file called data, I want to get the sum of a few columns, and print them with certain labels (say choice1, choice2 etc). That is easy enough by itself:
firstqn<-data.frame(choice1=data$Q7_2,choice2=data$Q7_3,choice3=data$Q7_4);
secondqn<-data.frame(choice1=data$Q8_6,choice2=data$Q8_7,choice3=data$Q8_8)
print(colSums(firstqn)); print(colSums(secondqn))
The problem comes when I want to repeat the above steps with different filters, say only the rows where gender == 2.
The only way I know how is to create a new dataset data2 and replace data$ with data2$ in every line of the above code, such as:
data2<-(data[data$Q2==2,])
firstqn<-data.frame(choice1=data2$Q7_2,choice2=data2$Q7_3,choice3=data2$Q7_4);
However, I have 6 choices for each of 5 questions and am planning to apply about 5-10 different filters, and I don't relish the thought of copy-pasting data2, data3, etc. hundreds of times.
So my question is: Is there any way of getting R to reference data by default without using data$ in front of every variable name?
I can probably use attach() to achieve this, but I really don't want to:
data2<-(data[data$Q2==2,])
attach(data2)
firstqn<-data.frame(choice1=Q7_2,choice2=Q7_3,choice3=Q7_4);
detach(data2)
Is there a command like attach() that would let me avoid writing data$ in front of every variable for a specified stretch of code? Then whenever I wanted a new filter, I could copy-paste the same code and change only the first command (defining the new dataset).
I guess I'm looking for some command like with(data2, *insert multiple commands here*).
Alternatively, if anyone has a better way to do the above in an entirely different way, please enlighten me; I'm not very proficient at R (yet).
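with() does exactly this: it evaluates an expression (or a whole { } block) against a data frame, so the columns can be used without the data$ prefix. A sketch on toy stand-in data, reusing the column names from the question (Q2 = gender, Q7_* = first question's choices):

```r
# Toy stand-in for the Qualtrics data
data <- data.frame(Q2   = c(1, 2, 2),
                   Q7_2 = 1:3, Q7_3 = 4:6, Q7_4 = 7:9)

# Changing the filter only means changing the subset() call; the rest
# of the code can be reused unmodified.
firstqn <- with(subset(data, Q2 == 2),
                data.frame(choice1 = Q7_2, choice2 = Q7_3, choice3 = Q7_4))
colSums(firstqn)
```

Unlike attach(), nothing is added to the search path, so there is no detach() to forget and no risk of stale copies of the columns.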
