I'm using GNU Octave, version 3.2.4.
I have two functions. First function, f1.m, has a global variable. I'm accessing the global variable in another function, f2.m.
It works fine in the following situation:
# File f1.m
function testGlobal1()
clear all;
global var1 = [];
testGlobal2();
endfunction
#File f2.m
function testGlobal2()
global var1;
isglobal("var1")
var1
var1 = [var1, 10]
endfunction
The result after running f1 is as follows:
ans = 1
var1 =
var1 = 10
Now, if I run f1 again, the following is the result:
ans = 1
var1 = 10
var1 =
10 10
Upon subsequent running of f1, the vector var1 continues to grow.
My question is, why isn't the variable, var1, getting reinitialized in f1 (line 3) even though the code session (is that what it's called?) is over and a new one begins when I run f1 again?
The funny thing is, if I assign a value to var1 in f2 instead of appending to the vector, the new value gets assigned to var1. The code for f2 with modifications is given below:
function f2()
global var1;
isglobal("var1")
var1
var1 = [var1, 10]
var1 = 20
endfunction
The output after running f1 is below:
ans = 1
var1 =
10 10
var1 =
10 10 10
var1 = 20
On running f1 again, the following output is seen:
ans = 1
var1 = 20
var1 =
20 10
var1 = 20
I get the code to behave as I expect it to if I reinitialize var1 to an empty vector in f2. In that case, subsequent running of f1 does not make var1 grow as shown above.
Could someone explain what's happening? Why doesn't the reinitialization of var1 to an empty vector in f1 work?
I've been trying unsuccessfully to replicate in R the following Stata loop:
forvalues i=1/10 {
replace var`i'= a if other_var`i'==b
}
So far I've got this as the closest attempt:
for(i in 1:10) {
df <- df %>%
mutate(get(paste("var",i,sep="")) =
ifelse(get(paste("other_var",i,sep=""))==b
,a
,get(paste("var",i,sep=""))))
}
But I get the following error:
Error: unexpected '=' in:
"survey_data <- survey_data %>%
mutate(paste("offer",i,"_accepted",sep="") ="
If I change the variable to be mutated to a simple variable name, it works, so I'm guessing my code is OK for the "right-hand side of the mutation", but for some reason it's not OK for the "left-hand side".
This solution is very inelegant, but I think does exactly what you want.
var1 <- "x"
var2 <- "y"
var3 <- "z"
other_var1 <- 1
other_var2 <- 0
other_var3 <- 1
df <- data.frame(var1, other_var1, var2, other_var2, var3, other_var3)
for(i in 1:3){
var_name <- paste("df$var", i, sep = "")
other_var_name <- paste("df$other_var", i, sep = "")
if (eval(parse(text = other_var_name)) == 1){
assign(var_name, "a")
}
}
There are three key ingredients here. First, the paste() function creates the names of the variables in the current iteration of the loop. Second, the eval(parse(text = foo)) combination references the actual variable whose name is stored as a string in foo. Third, assign() assigns a value to a variable whose name is given as a string (as opposed to using <-).
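One caveat worth checking before relying on this: assign() treats its first argument as a literal object name, so assign("df$var1", "a") creates a new object called `df$var1` rather than changing the column var1 inside df. A minimal sketch of a column-wise alternative follows, assuming made-up example data (not the asker's) and using [[ ]] to index columns by a constructed name:

```r
# Made-up example data (an assumption, not from the question);
# strings are kept as characters rather than factors.
df <- data.frame(var1 = c("x", "y"), other_var1 = c(1, 0),
                 var2 = c("p", "q"), other_var2 = c(0, 1),
                 stringsAsFactors = FALSE)
a <- "a"  # replacement value
b <- 1    # value of other_var<i> that triggers the replacement

for (i in 1:2) {
  var_name   <- paste0("var", i)
  other_name <- paste0("other_var", i)
  # df[[name]] reads or writes the column whose name is stored in `name`
  df[[var_name]] <- ifelse(df[[other_name]] == b, a, df[[var_name]])
}
df
```

Because the replacement is vectorised with ifelse(), this also works when the data frame has more than one row.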
This looks like FAQ 7.21.
The most important part of that answer is at the end, where it says to use a list instead.
Trying to work on a group of global variables in R leads to complicated code that is hard to read and even harder to debug.
If you instead put those variables into a single list, then you can access them by name or position and use tools like lapply or the purrr package (part of tidyverse) to process everything in the list (or some of the things in the list using map_at or map_if from purrr).
If you tell us more about what you are trying to accomplish, we may be able to give a much simpler example of how to do it.
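As a rough illustration of the list approach described above, here is a minimal base R sketch. The element names and values are hypothetical and only stand in for whatever the separate variables would hold:

```r
# Hypothetical example: related vectors kept in one named list
# instead of separate global variables var1, var2, var3.
vars <- list(var1 = c(1, 2, 3), var2 = c(4, 5, 6), var3 = c(7, 8, 9))

# Access an element by name or by position
vars[["var2"]]
vars[[3]]

# Apply the same operation to every element
doubled <- lapply(vars, function(v) v * 2)

# Or only to selected elements, chosen by name
sel <- c("var1", "var3")
vars[sel] <- lapply(vars[sel], function(v) v + 100)
```

No names have to be pasted together or parsed here; the names are just data that can be looped over.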
You can do something like the following:
df <- structure(list(var1 = c(1, 2, 3, 4),
var2 = c(1, 2, 3, 4),
var3 = c(1,2, 3, 4),
var4 = c(1, 2, 3, 4),
other_var1 = c(1, 0, 1, 0),
other_var2 = c(0,1, 0, 1),
other_var3 = c(1, 1, 0, 0),
other_var4 = c(0, 0, 1,1)),
class = "data.frame",
row.names = c(NA, -4L))
# var1 var2 var3 var4 other_var1 other_var2 other_var3 other_var4
# 1 1 1 1 1 1 0 1 0
# 2 2 2 2 2 0 1 1 0
# 3 3 3 3 3 1 0 0 1
# 4 4 4 4 4 0 1 0 1
## Values to replace based on OP original question
a <- 777
b <- 1
## Iterate over all four variables available in df
for (i in 1:4) {
df <- within(df, {
assign(paste0("var",i), ifelse(get(paste0("other_var",i)) %in% c(b), ## Condition
a, ## Value if Positive
get(paste0("var",i)) )) ## Value if Negative
})
}
which results in the following output:
# var1 var2 var3 var4 other_var1 other_var2 other_var3 other_var4
# 1 777 1 777 1 1 0 1 0
# 2 2 777 777 2 0 1 1 0
# 3 777 3 3 777 1 0 0 1
# 4 4 777 4 777 0 1 0 1
The solution doesn't look like a one-liner, but it actually is one, albeit a dense one, so let's break it down into its component parts.
within(): I don't want to repeat what other people have excellently explained, so for the within() usage, I gently refer you here.
The assign(paste0("var",i), X) part:
Here I am following what @han-tyumi did in his answer: build the name of the variable with paste0() and give it the value X (explained below) using the assign() function.
Let's talk about X.
Above I referred to assign(paste0("var",i), X), where X is ifelse(get(paste0("other_var",i)) %in% c(b), a, get(paste0("var",i)) ).
Inside the ifelse():
The condition:
First, I recover the values of the variable other_var(i) (with i = 1,2,3,4) by combining get() with paste0() inside the loop. Then I use the %in% operator to check, element by element, whether each value of other_var(i) matches the value assigned to b (in my example, the number 1); this yields TRUE or FALSE for each row depending on whether the condition is met.
The TRUE part of the ifelse() function:
This is the simplest part: if the condition is met, assign a (which in my example is equal to 777).
The FALSE part of the ifelse() function:
get(paste0("var",i)), which is the value of the variable itself (meaning that if the condition is not met, the variable is kept unaltered).
I have a bunch of SAS datasets named "haveyear" ranging from 2000-2018, i.e. "have2000"-"have2018". These are stored on a local directory at 'path_to_have_data'. Each dataset contains several variables, i.e. var1, var2 and so on. I want to load these datasets and then subset them contingent on var1 ne '0' and also keep var1 and var2 from the original datasets. Furthermore, I want to add a new variable year to each of the subsets, so I can tell which year the data is from. Finally, I want to append (stack) all the new subsets into a single dataset named appended. For example:
Dataset Have2017 looks like this:
var1 var2 var3
0 2 5
3 7 9
Dataset Have2018 looks like this:
var1 var2 var3
0 2 5
3 7 9
Subset Want2017 looks like this:
var1 var2 year
3 7 2017
Subset Want2018 looks like this:
var1 var2 year
3 7 2018
The final dataset appended looks like this:
var1 var2 year
3 7 2017
3 7 2018
The following SAS script does the trick:
libname raw 'path_to_have_data';
%macro a;
%do &year.=2000 %to 2018;
data want&year. (keep= var1 var2);
set raw.have&year.;
where var1 ne '0';
year=&year.;
run;
%end;
%mend;
%a;
data appended;
set want:;
run;
Does anyone know how to achieve the same result with R Studio?
EDIT: the MCVE version of the question
Here is a working version of the SAS code needed to produce the required result from the original post.
First, one needs a DATA step to create a few SAS data sets. We'll store them in the default WORK library instead of referencing another library on disk.
/* generate sample data */
data have2000 have2001 have2002;
input var1 var2 var3;
cards;
0 1 2
1 3 5
2 7 4
0 9 9
8 7 3
;
run;
Next, we'll need a few edits to the SAS macro to make it run.
/* run macro from OP */
options mprint; /* shows SAS code generated by macro processor */
/*
* corrections / adjustments made to macro
* 1. remove & in %do loop
* 2. add year to keep list
* 3. fix syntax error in where statement because var1 is numeric
* 4. use work library, and only process 3 years of data
*/
%macro a;
%do year = 2000 %to 2002;
data want&year. (keep= var1 var2 year);
set have&year.;
where var1 ne 0;
year=&year.;
run;
%end;
%mend;
/* run the macro */
%a;
The SAS option mprint causes SAS to write the code generated by the macro to the log. When we run the macro, a subset of the generated code for a single data set looks like this.
MPRINT(A): data want2000 (keep= var1 var2 year);
MPRINT(A): set have2000;
MPRINT(A): where var1 ne 0;
MPRINT(A): year=2000;
MPRINT(A): run;
MPRINT(A): data want2001 (keep= var1 var2 year);
MPRINT(A): set have2001;
MPRINT(A): where var1 ne 0;
MPRINT(A): year=2001;
MPRINT(A): run;
MPRINT(A): data want2002 (keep= var1 var2 year);
MPRINT(A): set have2002;
MPRINT(A): where var1 ne 0;
MPRINT(A): year=2002;
MPRINT(A): run;
The macro generates three SAS data steps, one for each year, including the following changes.
Drop var3
Delete rows where var1 = 0
Write output to a SAS data set named want<year>
Finally, the code combines the data sets just created into a single SAS data set called appended. We'll also print the resulting data set.
data appended;
set want:; /* references all SAS datasets that start with "want" */
run;
proc print data = appended;
run;
...and the output:
Here is a Base R solution to the problem. The OP wants to replicate the process of a SAS macro that subsets a list of SAS data sets, raw.have2000 - raw.have2018, keeps two columns, sets a variable year equal to the year listed in the data set name, and joins these into a single data set.
# create some data
var1 <- 0:5
var2 <- 6:11
var3 <- 12:17
raw.have2000 <- data.frame(var1,var2,var3)
raw.have2001 <- data.frame(var1,var2,var3)
raw.have2002 <- data.frame(var1,var2,var3)
years <- 2000:2002
dataList <- lapply(years,function(x){
# create name of data set as a character object
dsname <- paste0("raw.have",x)
# use dsname with get() to extract data and subset first 2 variables
ds <- subset(get(dsname),var1 !=0,select=c(var1,var2))
ds$year <- x
# print to have data frame returned in
# output list
ds
})
# combine data frames
appended <- do.call(rbind,dataList)
...and the output, noting that the rows where var1 = 0 have been eliminated, var3 has been dropped, and the year variable has been added:
> appended
var1 var2 year
2 1 7 2000
3 2 8 2000
4 3 9 2000
5 4 10 2000
6 5 11 2000
21 1 7 2001
31 2 8 2001
41 3 9 2001
51 4 10 2001
61 5 11 2001
22 1 7 2002
32 2 8 2002
42 3 9 2002
52 4 10 2002
62 5 11 2002
>
Explanation
One of the major differences between SAS and R is that experienced SAS programmers use the SAS macro language to automate repetitive tasks. The macro language generates SAS code that is processed by the SAS system.
R does not have a macro language / code generator. However, one can use the get() function to access R objects whose names can be generated by combining various pieces of information into character objects.
Consider retrieving all needed have data frames in global environment using mget into a list of data frames. Then, iteratively run your data frame operations on each item, then row bind all items at end.
Below uses mapply to iterate elementwise between the have data frames and years 2000-2018:
haves_dfs <- mget(ls(pattern="have"))
# SUBSET ROWS AND COLUMNS
want_dfs <- mapply(function(df, yr) transform(subset(df, var1 != '0')[c("var1", "var2")],
year = yr),
have_dfs, c(2000:2018), SIMPLIFY = FALSE)
final_df <- do.call(rbind, want_dfs)
Or with lapply using get iteratively:
want_dfs <- lapply(c(2000:2018), function(yr)
# SUBSET ROWS AND COLUMNS
transform(subset(get(paste0("have", yr)), var1 != '0')[c("var1", "var2")],
year = yr)
)
final_df <- do.call(rbind, want_dfs)
The above might appear dense, but this nested one-liner
transform(subset(df, var1 != '0')[c("var1", "var2")], year = yr)
is equivalent to the following multi-step function:
df_process <- function(df, yr) {
# SUBSET ROWS
df <- df[df$var1 != '0',]
# SUBSET COLUMNS
df <- df[c("var1", "var2")]
# ADD NEW COLUMN
df$year <- yr
# RETURN FINAL
return(df)
}
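For completeness, a helper like df_process can be plugged into the same iterate-then-bind pattern. The sketch below is only illustrative: it assumes two small stand-in data frames, a numeric var1, and builds the data frame names explicitly with paste0() so that each data frame is paired with its own year.

```r
# Small stand-in data frames (assumed for illustration only)
have2000 <- data.frame(var1 = 0:2, var2 = 3:5, var3 = 6:8)
have2001 <- data.frame(var1 = 0:2, var2 = 3:5, var3 = 6:8)

df_process <- function(df, yr) {
  df <- df[df$var1 != 0, ]      # subset rows
  df <- df[c("var1", "var2")]   # subset columns
  df$year <- yr                 # add new column
  df
}

years <- 2000:2001
# mget() with explicit names returns the data frames in the same order as years
have_dfs <- mget(paste0("have", years))
want_dfs <- Map(df_process, have_dfs, years)
final_df <- do.call(rbind, want_dfs)
```

Building the names from the years, rather than matching a pattern with ls(), avoids accidentally picking up other objects whose names happen to contain "have".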
Thank you @Parfait for also writing a great answer that does the trick! However, in your first line of code you wrote:
haves_dfs <- mget(ls(pattern="have"))
and subsequently you referred to:
have_dfs
Hence, your first line of code should have been:
have_dfs <- mget(ls(pattern="have"))
I have adjusted your answer and combined it with the dataset-generating part of the answer @Len gave. Here is a fully working example of the solution:
var1 <- 0:5
var2 <- 6:11
var3 <- 12:17
have2000 <- data.frame(var1,var2,var3)
have2001 <- data.frame(var1,var2,var3)
have2002 <- data.frame(var1,var2,var3)
have_dfs <- mget(ls(pattern="have"))
want_dfs <- mapply(function(df, yr) transform(subset(df, var1 != '0')[c("var1", "var2")],
year = yr),
have_dfs, c(2000:2002), SIMPLIFY = FALSE)
final_df <- do.call(rbind, want_dfs)
or alternatively with lapply using get():
want_dfs <- lapply(c(2000:2002), function(yr)
transform(subset(get(paste0("have", yr)), var1 != '0')[c("var1", "var2")],
year = yr) )
final_df <- do.call(rbind, want_dfs)
I have a data frame called "Region_Data", which I created by running some functions.
I want to use this data frame as an input and subset it with the following function that I created. The function should return the subset data frame:
Region_Analysis_Function <- function(Input_Region){
Subset_Region_Data = subset(Region_Data, Region == "Input_Region" )
Subset_Region_Data
}
However, when I create this function and then execute it using:
Region_Analysis_Function("North West")
I get 0 observations when I execute this code (though I know that there are xx number of observations in the data frame.)
I read that there is something called global / local environment, but I'm not really clear on that.
How do I solve this issue? Thank you so much in advance!!
When you try to subset your data using subset(Region_Data, Region == "Input_Region" ), "Input_Region" is being interpreted as a string literal, rather than being evaluated to the value it represents. This means that unless the column Region in your object Region_Data contains some rows with the literal value "Input_Region", your function will return a zero-row subset. Removing the quotes will solve this, and changing == to %in% will make your function more general. Consider the following data set,
mydf <- data.frame(
x = 1:5,
y = rnorm(5),
z = letters[1:5])
##
R> mydf
x y z
1 1 -0.4015449 a
2 2 0.4875468 b
3 3 0.9375762 c
4 4 -0.7464501 d
5 5 0.8802209 e
and the following 3 functions,
qfoo <- function(Z) {
subset(mydf, z == "Z")
}
foo <- function(Z) {
subset(mydf, z == Z)
}
##
bar <- function(Z) {
subset(mydf, z %in% Z)
}
where qfoo represents the approach used in your question, foo implements the first change I noted, and bar implements both changes.
The second two functions will work when the input value is a scalar,
R> qfoo("c")
[1] x y z
<0 rows> (or 0-length row.names)
##
R> foo("c")
x y z
3 3 0.9375762 c
##
R> bar("c")
x y z
3 3 0.9375762 c
but only the third will work if it is a vector:
R> foo(c("a","c"))
x y z
1 1 -0.4015449 a
Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(z, Z) :
longer object length is not a multiple of shorter object length
##
R> bar(c("a","c"))
x y z
1 1 -0.4015449 a
3 3 0.9375762 c
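Applied to the function from the question, the two changes might look like the sketch below. The contents of Region_Data are made up for illustration, and passing the data frame in as an argument (instead of reading it from the global environment) is an added assumption rather than something the question requires.

```r
# Made-up stand-in for Region_Data (an assumption for illustration)
Region_Data <- data.frame(
  Region = c("North West", "South East", "North West", "Midlands"),
  Sales  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Input_Region is used unquoted, so its value is compared with the column;
# %in% lets the caller pass one region or several.
Region_Analysis_Function <- function(data, Input_Region) {
  data[data$Region %in% Input_Region, , drop = FALSE]
}

nw   <- Region_Analysis_Function(Region_Data, "North West")
both <- Region_Analysis_Function(Region_Data, c("North West", "Midlands"))
```

Here nw holds the two "North West" rows and both holds three rows.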
I'm new to R and want to translate some code from MATLAB to R, but I have a problem with function output. I want to create a function that returns its output into two specified variables, like this:
list[a,b]<-function(var1,var2){
a<-var1 + var2
b<-var1 - var2
return list(a,b)
}
But my code is not working. Please help me solve this problem.
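You seem to have some fundamental misunderstanding about functions in R. Read "An Introduction to R". A function is created with function(...) and assigned to a single name; it cannot be assigned to list[a,b]. Also, return is a function in R, so it must be called with parentheses.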
myfun <- function(var1, var2){
a <- var1 + var2
b <- var1 - var2
return(list(a, b))
}
myfun(1:5, 10:6)
#[[1]]
#[1] 11 11 11 11 11
#
#[[2]]
#[1] -9 -7 -5 -3 -1
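To get closer to MATLAB's [a, b] = f(...) idiom, one common pattern is to name the list elements and pull them out after the call. This is a minimal sketch; the names res, a and b are chosen here only for illustration:

```r
myfun <- function(var1, var2) {
  # Name the list elements so callers can refer to them by name
  list(a = var1 + var2, b = var1 - var2)
}

res <- myfun(1:5, 10:6)
a <- res$a   # 11 11 11 11 11
b <- res$b   # -9 -7 -5 -3 -1
```

The last expression of a function is its return value, so an explicit return() is optional here.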
This may be a silly question, but I can't work out how to do it.
What I want:
My function goes like this.
plot_exp <-
function(i){
dat <- subset(dat.frame,En == i )
ggplot(dat,aes(x=hours, y=variable, fill = Mn)) +
geom_point(aes(x=hours, y=variable, fill = Mn),size = 3,color = Mi) + geom_smooth(stat= "smooth" , alpha = I(0.01))
}
ll <- lapply(seq_len(EXP), plot_exp)
do.call(grid.arrange, ll)
and I have two variables
Var1, Var2 (which will be passed through the command line, so I can't group them using subset)
I want to run the above function for var1 and var2. My function produces two plots for each complete execution, so it should now produce two plots for var1 and two plots for var2.
I just want to know how I can apply this logic here to get what I want. Thank you
This is what data.frame looks like
En Mn Hours var1 var2
1 1 1 0.1023488 0.6534707
1 1 2 0.1254325 0.5423215
1 1 3 0.1523245 0.2542354
1 2 1 0.1225425 0.2154533
1 2 2 0.1452354 0.4521255
1 2 3 0.1853324 0.2545545
2 1 1 0.1452369 0.2321542
2 1 2 0.1241241 0.2525212
2 1 3 0.0542232 0.2626214
2 2 1 0.8542154 0.2154522
2 2 2 0.0215420 0.5245125
2 2 3 0.2541254 0.2542512
I will take the above data.frame as input. I want to run my function once for var1 and produce two plots, then run the same function again for var2 and produce two more plots, and then combine all of them using grid.arrange.
The variable values I have to read from the command line and then I have to do the following to get the required data out of main data frame.
subset((aggregate(cbind(variable1,variable2)~En+Mn+Hours,a, FUN=mean)))
After I read the values from the command line and store them in "variable1" and "variable2", calling them directly in the above command does not work. What should I do to use those two variable values inside the command?
I made a few changes and ran it on your sample data. Basically I just needed to use aes_string rather than aes to allow for a variable holding a column name.
myvars<-c("var1", "var2")
plot_exp <- function(i, plotvar) {
dat <- subset(dat.frame,En == i )
ggplot(dat,aes_string(x="Hours", y=plotvar, fill = "Mn")) +
geom_point(aes(color=Mn), size = 3) +
geom_smooth(stat= "smooth" , alpha = I(0.01), method="loess")
}
ll <- do.call(Map, c(plot_exp, expand.grid(i=1:2, plotvar=myvars, stringsAsFactors=F)))
do.call(grid.arrange, ll)
(I'm not sure why the colors of the legends are messed up in the image, they look fine on screen)
For the aggregation step, use
myvars <- c("var1", "var2")
aggregate(a[,myvars], a[,c("En","Mn","Hours")], FUN=mean)
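To see how column names held in a character vector can drive the aggregation, here is a small self-contained sketch. The data values are abridged and partly invented (one En/Mn/Hours combination is duplicated so that the mean has something to average), and the column names are hard-coded where a real script would presumably read them from commandArgs(trailingOnly = TRUE).

```r
# A few rows in the shape of the question's data frame; values are
# invented for illustration, with one duplicated En/Mn/Hours group.
a <- data.frame(
  En    = c(1, 1, 1, 2),
  Mn    = c(1, 1, 2, 1),
  Hours = c(1, 1, 1, 1),
  var1  = c(0.10, 0.30, 0.12, 0.14),
  var2  = c(0.60, 0.40, 0.21, 0.23)
)

# Column names as they might arrive from the command line;
# hard-coded here so the sketch runs as-is.
myvars <- c("var1", "var2")

# Index the value columns by name and group by En, Mn and Hours
agg <- aggregate(a[, myvars], by = a[, c("En", "Mn", "Hours")], FUN = mean)
agg
```

Indexing with a[, myvars] avoids the formula interface, which expects literal column names rather than strings.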