R Functions/Macros and Generating Calls From Data - r

So I'm a proficient SAS user but very new to R. I'm frustrated because I can't figure out how to do something in R that's pretty simple in SAS, and therefore, I assume, simple in R. I think I'm missing something very fundamental about the way R works.
In R, I am using the airquality dataset and trying to make a scatter plot of every variable against every other variable.
In SAS I would do something like the following:
proc contents data=airquality noprint out=contents;
run;
proc sql noprint;
create table all_combs as select A.name,B.name as name2 from contents A, contents B where a.name ne B.name;
select cats('%scatter(',name,',',name2,')') into :scatter separated by ' ' from all_combs;
quit;
%macro scatter(x,y);
proc sgplot data=airquality;
scatter x=&x. y=&y.;
run;
%mend;
&scatter.;
The basic process is to generate a list of variables from the data and take their Cartesian product. The result is turned into a series of macro calls stored in a macro variable, which is resolved after the macro is defined.
I assume the way to do this in R is to write a function, but I failed there. I expected the code below to work; it didn't, and I don't understand why.
plotfun=function(v1,v2){plotfun=plot(airquality$v1,airquality$v2)}
plotfun(Wind,Temp)
Even once that part works, I don't know how to generate the function calls automatically.
Any suggestions?

This might do what you were hoping:
plotfun=function(df, v1,v2){ plot(df[[v1]],df[[v2]]) }
plotfun(airquality, 'Wind','Temp')
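Note that $ does not evaluate its second argument, so airquality$v1 looks for a column literally named v1; there is none, so it returns NULL and the plot fails. Your call also passes Wind unquoted, so if v1 were ever evaluated R would look for an object named Wind rather than using the column name "Wind". Passing the names as strings and indexing with [[ ]], which does evaluate its argument, avoids both problems.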
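The answer covers a single pair; the question also asks how to generate the calls automatically. A minimal sketch of the R counterpart to the SAS Cartesian-product step, assuming the plotfun() from the answer (with axis labels added) and using expand.grid() to build every ordered pair of distinct column names. The names vars and pairs_df are illustrative.

```r
# plotfun() from the answer, with axis labels so each plot is identifiable
plotfun <- function(df, v1, v2) {
  plot(df[[v1]], df[[v2]], xlab = v1, ylab = v2)
}

# Cartesian product of the column names (like the PROC SQL self-join)
vars <- names(airquality)
pairs_df <- expand.grid(x = vars, y = vars, stringsAsFactors = FALSE)

# Drop self-pairs, mirroring "where a.name ne B.name"
pairs_df <- pairs_df[pairs_df$x != pairs_df$y, ]

# One call per row: the equivalent of resolving &scatter.
for (i in seq_len(nrow(pairs_df))) {
  plotfun(airquality, pairs_df$x[i], pairs_df$y[i])
}
```

Base R's pairs(airquality) draws all of these scatter plots at once as a single scatterplot matrix. Building the calls as strings and running them with eval(parse(text = ...)) would mimic the SAS macro-variable approach more literally, but looping over the names is the more common R idiom.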

Related

How can I create a procedure from a long command in R?

I have a six-line command that I want to use several times. Therefore, I want to give this command a name and call it like a procedure instead of writing out all the lines over and over.
In this case it is an rbind() assignment (modelcoeff <- rbind(...)), but the issue is more general.
modelcoeff<-rbind(modelcoeff,c(as.character((summary(mymodel)$terms[[2]])[[3]]),
as.character((((((summary(mymodel)$terms[[2]])[[2]])[[3]])[[3]])[[2]])[[3]]),
summary(mymodel)$coefficients[2,1],
summary(mymodel)$coefficients[2,4],
summary(mymodel)$coefficients[2,2],
summary(mymodel)$r.squared*100))
I would like to call something like rbindmodelcoeff and execute these command lines. How can I achieve this?
I tried to write a function, but it didn't seem to be the right approach.
A literal wrapping of your code into a function:
rbindmodelcoeff <- function(modelcoeff, mymodel) {
rbind(modelcoeff,
c(as.character((summary(mymodel)$terms[[2]])[[3]]),
as.character((((((summary(mymodel)$terms[[2]])[[2]])[[3]])[[3]])[[2]])[[3]]),
summary(mymodel)$coefficients[2,1],
summary(mymodel)$coefficients[2,4],
summary(mymodel)$coefficients[2,2],
summary(mymodel)$r.squared*100))
}
However, there are a couple of changes I recommend:
call summary(mymodel) once, then reuse the result
you use as.character on some, but not all, of the elements inside the enclosing c(), and because c() coerces everything to a common type, all of them end up as character; to see what I mean, try c(as.character(1), 2), which returns c("1", "2"). Using a list instead preserves string vs. number
rbindmodelcoeff <- function(modelcoeff, mymodel) {
summ <- summary(mymodel)
rbind(modelcoeff,
list(as.character((summ$terms[[2]])[[3]]),
as.character((((((summ$terms[[2]])[[2]])[[3]])[[3]])[[2]])[[3]]),
summ$coefficients[2,1],
summ$coefficients[2,4],
summ$coefficients[2,2],
summ$r.squared*100))
}
But there are still some problems with this. I can't test it at the moment since I don't know the model you're fitting, so as.character((summ$terms[[2]])[[3]]) fails for me. Beyond that, I'm always hesitant to hard-code so many brackets without a firm understanding of what they extract. It's out of scope for this question (which is about converting your code into a function), but you might want to find a way to generalize that portion a bit.
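One way to generalize the hard-coded brackets is to look values up by name instead of by position. The sketch below is an illustration under assumptions, not a reproduction of the asker's model: it fits lm() on airquality, takes the first non-intercept coefficient by row name, and reads the column names of the coefficient matrix returned by summary() for an lm fit ("Estimate", "Std. Error", "Pr(>|t|)"). The helper coef_row() and its output column names are hypothetical.

```r
# Hypothetical helper: one row of model statistics, looked up by name
coef_row <- function(model) {
  summ <- summary(model)               # call summary() once
  cf <- summ$coefficients              # matrix: one row per term
  term <- rownames(cf)[2]              # first term after the intercept
  data.frame(
    response      = all.vars(formula(model))[1],
    term          = term,
    estimate      = cf[term, "Estimate"],
    p_value       = cf[term, "Pr(>|t|)"],
    std_error     = cf[term, "Std. Error"],
    r_squared_pct = summ$r.squared * 100,
    stringsAsFactors = FALSE           # keep strings as strings
  )
}

# Accumulate rows for several models; rbind() drops the initial NULL
modelcoeff <- NULL
for (f in list(Ozone ~ Wind, Ozone ~ Temp)) {
  modelcoeff <- rbind(modelcoeff, coef_row(lm(f, data = airquality)))
}
print(modelcoeff)
```

Returning a one-row data frame keeps the numeric columns numeric, which addresses the same coercion issue as switching from c() to list().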

Running a loop and executing a function after each iteration of loop: r

I'm running a Google BigQuery script from RStudio.
I have one important parameterised variable, which needs to be replaced with values from a data frame.
health_tags<-read.csv('marker_tags.csv')
health_tags<-tail(health_tags, 7)
I have built a function which executes my query whilst adding the parameters to my variables.
query_details (MD2_date_start="2018-06-06",
MD2_date_end="2018-07-07",
Sterile_tag="7894")
So query_details is a function (an API call) that fills in the details for BigQuery to run. How do I write a loop that replaces the value of Sterile_tag with each code in the health_tags CSV and runs query_details once per code until all iterations have completed?
You can use sapply, replacing column with the real name of your column:
sapply(health_tags$column, function(x) query_details (MD2_date_start="2018-06-06",
MD2_date_end="2018-07-07",
Sterile_tag=as.character(x)))
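If query_details() returns a data frame for each tag, sapply() may try to simplify the results into a matrix of lists. A sketch using lapply() instead keeps one result per tag and stacks them afterwards. Here query_details() is a hypothetical stub standing in for the real BigQuery call, and tag is an assumed column name in health_tags.

```r
# Hypothetical stub; the real query_details() would run the BigQuery job
query_details <- function(MD2_date_start, MD2_date_end, Sterile_tag) {
  data.frame(start = MD2_date_start, end = MD2_date_end,
             tag = Sterile_tag, stringsAsFactors = FALSE)
}

# Hypothetical stand-in for read.csv('marker_tags.csv'); `tag` is assumed
health_tags <- data.frame(tag = c(7894, 7895, 7896))

# One call per tag, collected into a list
results <- lapply(as.character(health_tags$tag), function(x) {
  query_details(MD2_date_start = "2018-06-06",
                MD2_date_end   = "2018-07-07",
                Sterile_tag    = x)
})

# Stack the per-tag results into one data frame
all_results <- do.call(rbind, results)
```

A plain for loop over health_tags$tag works just as well if you only need the side effects of each query rather than its return value.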

How to explore the structure and dimensions of an object in Octave?

In MATLAB, the properties function seems like a plausible equivalent of the R commands that acquaint you with a particular object in the working environment: its structure (data.frame, matrix, list, vector) and variable types (character, numeric), for example with str(); its dimensions, perhaps with dim(); and the names of its variables, with names().
However, this function is not operational in Octave:
>> properties(data)
warning: the 'properties' function is not yet implemented in Octave
I installed the dataframe package, as suggested in a comment on the post linked above, with
pkg install -forge dataframe
and loaded it with
pkg load dataframe
But I can't find a way of getting a summary of the structure and dimensions of a dataset, data.mat, in the workspace.
I believe it's a structure consisting of a 4 x 372,550 numerical matrix; two 4 x 46,568 numerical matrices, and a 256 x 1 character matrix. To get this info I had to scroll through many pages of the printout of data.
This info is not available in the Octave IDE, where I get:
Name Class Dimensions
data struc 1 x 1
a far cry from the complexity of the object data.
What is the smart way of getting some of this information about an object in the workspace in Octave?
Following up on the first answer provided, here is what I get with whos:
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
data 1x1 7452040 struct
Total is 1 element using 7452040 bytes
This is not particularly informative about what data really contains. In fact, I just found out a way to extract the names inside data:
>> fieldnames(data)
ans =
{
[1,1] = testData
[2,1] = trainData
[3,1] = validData
[4,1] = vocab
}
Now if I call
>> size(data)
ans =
1 1
the output is not very useful. On the other hand, knowing the names of the matrices within data I can do
>> size(data.trainData)
ans =
4 372550
which is indeed informative.
If you type the name of the variable, you'll see information about it. In your case it's a struct, so it'll tell you the field names. Relevant functions are: size, ndims, class, fieldnames, etc.
size(var)
class(var)
etc.
You refer to .mat. Maybe you have a MAT-file, which you can load with load filename. Once loaded you can examine and use the variables in the file.
whos
prints simple information on the variables in memory, most useful to see what variables exist.
Following up on your edited question, this works in Octave:
for s=fieldnames(data)'
s=s{1};
tmp=data.(s);
disp([s,' - ',class(tmp),' - ',mat2str(size(tmp))])
end
It prints basic information of each of the members of the struct. It does assume that data is a 1x1 struct array. Note that a struct can be an array:
data(2).testData = [];
This makes your data struct a 1x2 array, which is why size(data) is relevant. class is also important (it's shown in the output of whos). Variables can be of type double (normal arrays) or other numeric types, logical, struct, cell (an array of arrays), or a custom class that you can write yourself.
I highly recommend reading an introductory text on MATLAB/Octave, as it works very differently from R. It's not just a different flavor of language, it's a whole different world.

Open code statement recursion detected during exporting a file

I'm trying to export a file in SAS, but I get an "Open code statement recursion detected." error. Because I export several files depending on the date, I define a macro variable based on the prompt date and want to use it to name the exported file, but it does not work. I would really appreciate any help.
rep_date = 30APR2015:00:00:00
Outfile = work.A042015.sas7
%let var = CATS("A",MONTH(DATEPART(&rep_date)),YEAR(DATEPART(&rep_date)));
data WORK.&var(compress=yes); 
set WORK.have;
run; 
Macro variables are just strings. So if you want to execute functions in macro code you need to wrap the function inside of the %SYSFUNC() macro function.
%let rep_date='01JAN2015:01:23'dt ;
%let dsname = A%sysfunc(datepart(&rep_date),monyy6);
data &dsname(compress=yes);
set have;
run;
As a broader issue, OPEN CODE STATEMENT RECURSION DETECTED means that, in open code, one macro statement ended up being read as part of another, as if you had written something like:
%let mvar = %put &mvar;
Of course, this wouldn't normally happen on purpose (one would think). When it does happen, usually it's a sign of one of two classes of mistake.
1. You're missing a semicolon, closing parenthesis, closing quote, or something else that keeps SAS from "seeing" the semicolon at the end of the %let statement. Your next statement then uses the macro variable in a macro context, SAS reads it as part of the %let statement, and you get this error message.
2. Something went wrong somewhere else, and that something else propagates errors further down that don't make any sense. An unmatched quotation mark is a classic example, as is a macro that isn't properly closed with %mend.
Mistake 1 can be as simple as this:
%let mvar=mydataset
%put &mvar;
Oops. If it's that simple, just add the semicolon and you're good. It could, however, be caused by something more significant, such as an unmatched parenthesis or quotation mark, which might require restarting your SAS session. (Sometimes submitting a "magic string", some variation on %*;*;*';*";%*%mend;*);, will fix things, and sometimes it won't. Restarting SAS reliably fixes it.)
The same goes for mistake 2: if a magic string doesn't fix it, you may simply need to restart your SAS session. You still need to find that unmatched quote, parenthesis, or similar, but you first need to start SAS over so you can track it down.

Function doesn't do operations on dataset in R

I'm fairly new to R programming, so my question may seem naive.
I want to define all my R functions in a single file, named functions.R, and call them when I need them. I thought I would use source().
Here's my code:
main.R:
library(gstat)
library(lattice)
library(rgdal)
source("functions.R")
source("script_import.R")
script_import.R:
source("functions.R")
#Here I import the dataset named "dati"
dati<-read.csv2("/home/eugen/Documenti/file_da_importare.csv", header = TRUE, skip=4, dec = ",")
colnames(dati)<-c("provider", "ente", "nome_stazione", "long", "lat", "quota", "periodo_dati", "anni_dati", "tm_01", "tm_02", "tm_03", "tm_04", "tm_05", "tm_06", "tm_07", "tm_08", "tm_09", "tm_10", "tm_11", "tm_12", "remove", "tn_01", "tn_02", "tn_03", "tn_04", "tn_05", "tn_06", "tn_07", "tn_08", "tn_09", "tn_10", "tn_11", "tn_12", "remove1", "tx_01", "tx_02", "tx_03", "tx_04", "tx_05", "tx_06", "tx_07", "tx_08", "tx_09", "tx_10", "tx_11", "tx_12", "stato", "note", "nazione")
#That's the function call with which I have problems
clean_main_ds()
#If I use this commands instead of the function all works well
#dati$remove<-NULL
#dati$remove1<-NULL
functions.R:
clean_main_ds<-function(){
#I want to delete two columns
dati$remove<-NULL
dati$remove1<-NULL
cat("I'm exiting the function")
return(dati)
}
When running the scripts I don't receive any errors: the function appears as declared in RStudio, it is called by script_import.R, and the cat() works (so I imagine there's no problem with the call), but the function doesn't delete the two columns. If I use the same commands (dati$remove<-NULL) in script_import.R instead of the function, everything works.
Where's the error? How can I do to let my function operate on a dataset defined in another file?
Thank you very much for the help,
Eugen
P.S. Sorry for any language errors; English is not my first language. I hope the text is clear enough.
When you use the assignment operator <- within a function, it only does an assignment within the function's own environment. That is, the function creates a copy of the object dati, and then assigns NULL to elements remove and remove1 of dati within the function's environment.
Now when you use return, the function will return this modified copy of the original object dati. It will not modify the object dati in the global environment. If you do str(clean_main_ds()), you'll notice that that object is actually your data frame with the columns removed.
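A small self-contained check of this copy behavior, using a toy dati with made-up columns rather than the real CSV:

```r
# Toy stand-in for the imported data (columns are illustrative)
dati <- data.frame(a = 1:2, remove = 3:4, remove1 = 5:6)

clean_main_ds <- function() {
  dati$remove <- NULL    # creates a local dati inside the function
  dati$remove1 <- NULL
  dati                   # returns the modified local copy
}

cleaned <- clean_main_ds()
names(dati)      # the global dati still has all three columns
names(cleaned)   # the returned copy has only "a"
```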
There are a couple of things you could do to get around this. First, you could use the superassignment operator <<- inside your function. It searches the enclosing environments for dati and assigns there; since your function is defined at the top level, that means the global environment rather than the function's own environment:
clean_main_ds<-function(){
#I want to delete two columns
dati$remove<<-NULL
dati$remove1<<-NULL
cat("I'm exiting the function")
return(dati)
}
(In fact, doing this, you don't even need the last line return(dati) in the function. By the time you get there your function has already done the modifications to your object in the global environment.)
Another option would be to assign the value returned by your original function back to the original data frame:
dati <- clean_main_ds()
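A variation on this option, which doesn't rely on the function finding dati in the global environment, is to pass the data frame in as an argument and assign the result back. A minimal sketch with a toy data frame (the column names are illustrative):

```r
# Takes the data frame as an argument instead of looking up a global `dati`
clean_main_ds <- function(df) {
  df$remove <- NULL    # changes only the function's local copy
  df$remove1 <- NULL
  df                   # the caller stores the result where it wants
}

# Toy data standing in for the imported CSV
dati <- data.frame(a = 1:2, remove = 3:4, remove1 = 5:6, b = 7:8)
dati <- clean_main_ds(dati)   # reassign to keep the change
names(dati)
```

In script_import.R the call would become dati <- clean_main_ds(dati), and functions.R no longer depends on a variable named dati already existing.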
Finally, you could remove the columns from your data frame directly, without writing a function for it, by indexing:
dati <- dati[ , !(colnames(dati) %in% c("remove", "remove1"))]
(You could also just specify the numbers of the columns to remove. The %in% test is TRUE for the columns named remove or remove1, and ! keeps every other column. This is safer than negating which(): if neither name is present, which() returns integer(0), and dati[ , -integer(0)] drops every column.)
