Does the R language override variable values?

I'm creating a variable called dataset and assigning a value to it like this:
dataset = iris
Now I assign a different value to the same variable like this:
dataset = read.csv(filename, header = FALSE)
Does R override the previous value of dataset? Can anyone explain how this works, and whether we can assign more than one value to the same variable?

Yes, R will overwrite the value previously assigned to the variable.
On a side note: unlike many other languages, R conventionally uses <- as its assignment operator, so to make your code more readable to other R users, consider using that instead of =.
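A minimal sketch of the overwriting behaviour, using only built-in datasets. The name datasets is hypothetical and is introduced here only to show one way of keeping several values under a single name, by storing them in a list:

```r
# First assignment: dataset refers to the iris data frame
dataset <- iris
dim_before <- dim(dataset)

# Second assignment: the name now refers to a new object;
# the old value is no longer reachable through this name
dataset <- head(mtcars, 3)
dim_after <- dim(dataset)

# To keep several values under one name, store them in a list
# ("datasets" is a hypothetical name used for illustration)
datasets <- list(iris = iris, cars = mtcars)
names(datasets)
```

After the second assignment, dataset holds only the new value. Each element of the list stays available through datasets[["iris"]] or datasets[["cars"]].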


Is there a generalizable way to pass variable names into functions in R? If not, why? [duplicate]

It seems like one of the primary things I get stuck on when R programming is passing through variable names. I come from a Stata background, where we can easily call globals with "$" in any code or function. However, that doesn't seem to work in R. It seems like sometimes I have to use some special package or use something like df[[x]] or something like that. Instead of doing all of this ad-hoc, I was wondering if someone can walk me through the R architecture so I understand how to address this problem every time I run into it.
As a simple example, I am currently working on a code that stores a row count:
rowcount <- function(x){
all_n <- length(which(!is.na(df$x) & df$model=="Honda"))
print(all_n)
}
The function simply stores the count of rows where x is not missing and model is "Honda". I want to be able to pass the variable name into the function, then have it return this count. For instance, for the variable gender, I want to be able to write rowcount(gender), and for gender to be passed into the function as df$gender. However, this doesn't happen.
Can someone explain how to fix this code, and in the process, how I can generally fix these types of problems? I know there may be more elegant ways to achieve my goal, but my intention is both to (1) get a code that fulfills a specific goal for my project, and (2) more generally understand how R treats variable names as arguments in functions.
Thanks
We can pass the column name as a string and then use [[ to extract the column. It is better to also take the data as an argument of the function, so that it can be reused with different datasets
rowcount <- function(data, x){
all_n <- length(which(!is.na(data[[x]]) & data$model == "Honda"))
all_n
}
Note that print() is meant for displaying a value. To return the object, make it the last expression in the function; in R, we don't have to call return() explicitly
In addition to the OP's method, it can also be done with sum
rowcount <- function(data, x){
sum(!is.na(data[[x]]) & data$model == "Honda")
}
Note that we don't have to create an object and then return if it is a single expression
As an aside, the tidyverse option would be
library(dplyr)
rowcount <- function(data, x) {
x <- enquo(x)
data %>%
summarise(out = sum(!is.na(!!x) & model == "Honda")) %>%
pull(out)
}
where we can pass the column name unquoted
rowcount(df1, columnname)
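To see why df$x fails while [[ works, here is a small base R sketch. The data frame df1 and its values are made up for illustration; only the column names model and gender come from the question:

```r
# Hypothetical toy data; values are invented for this example
df1 <- data.frame(
  model  = c("Honda", "Honda", "Toyota", "Honda"),
  gender = c("F", NA, "M", "M"),
  stringsAsFactors = FALSE
)

# $ takes its right-hand side literally: this looks for a column
# named "x", not for the column whose name is stored in x
x <- "gender"
literal_lookup <- df1$x      # NULL, because no column is called "x"

# [[ evaluates its argument, so a string held in a variable works
rowcount <- function(data, x) {
  sum(!is.na(data[[x]]) & data$model == "Honda")
}

n_honda <- rowcount(df1, "gender")
n_honda
```

With this toy data, two Honda rows have a non-missing gender, so the call returns 2.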

assignment in R with left side a formula

I have a dataframe created from a loop. The loop examines ordinal regression for a couple dozen outcomes for a given exposure.
At the beginning of the loop, a variable called exposure is defined. Example: exposure <- "MyExposure"
At the end of the routine, I want to actually save the resulting data set I've compiled and to have the name of the saved data object be related to the exposure.
I've had issues with making the left hand side of the assignment based on the variable names.
The name of the new dataframe should be
paste0(exposure,"_imputed_ds")
[1] "MyExposure_imputed_ds"
However, when I try to put this on left hand of an assignment, it fails.
paste0(exposure,"_imputed_ds") <- existing.data.frame
Error in paste0(exposure,"_imputed_ds") <- existing.data.frame
could not find function "paste0<-"
What I wanted was a new dataframe named MyExposure_imputed_ds that contained contents of existing.data.frame
You can use assign() to set a value for a name you construct with paste
assign(paste0('MyExposure', '_imputed_ds'), 5)
Now you have MyExposure_imputed_ds in the environment with value 5
I find the use of assign to be generally a warning flag, though! Maybe you want something like this instead...
imputed_ds <- list()
imputed_ds[['MyExposure']] <- 5
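Both approaches can be sketched with the exposure variable from the question. The contents of existing.data.frame below are placeholder values, assumed only so the example runs:

```r
exposure <- "MyExposure"

# Placeholder data standing in for the result compiled in the loop
existing.data.frame <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))

# Option 1: assign() creates a variable whose name is built at run time;
# get() retrieves it again by name
ds_name <- paste0(exposure, "_imputed_ds")
assign(ds_name, existing.data.frame)
retrieved <- get(ds_name)

# Option 2: keep all results in one named list, keyed by exposure
imputed_ds <- list()
imputed_ds[[exposure]] <- existing.data.frame
names(imputed_ds)
```

The list version tends to be easier to loop over later, for example with lapply(imputed_ds, nrow), because the results do not have to be looked up by reconstructed variable names.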

R calling string for lookup in a function

I'm trying to call a column name for the e1071 svm function.
The working code looks like:
model = svm(Air_Flow~., data = trainset)
But in an effort to make it more automated I changed it to:
coi=44
model = svm(colnames(data)[coi]~., data = trainset)
This didn't work due (I think) to the quote marks, so I tried:
get(colnames(data)[coi])
cat(...)
print(...,quote = F)
as.name(...)
parse(...)
Only get() sort of worked, but then when I tried to predict other values using model it didn't. Any suggestions on what may get this working?
Thanks
Formulas are not strings that you can just "paste" variables into. Nor are variable names the same as strings. You need to be careful about how you build expressions to make sure you are using the correct type. Formulas are really un-evaluated calls that hold names/symbols as parameters.
You might consider using bquote() to build your formula expression, and be sure to convert the character version of the variable name to a proper variable name with as.name()
coi=44
model = svm(as.formula(bquote(.(as.name(colnames(data)[coi])) ~ .)), data = trainset)
Yes, this is a bit ugly. That's why functions that accept formulas often also have an alternative interface that's easier to program against. svm() also lets you pass the predictors and the response separately as the x and y arguments. You might do
model = svm(trainset[,-coi], trainset[,coi])
which is nicer because you can subset columns from your dataset with both string and numeric indexes
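As a rough sketch of building a formula from a column name held in a string, the following uses lm() and the built-in mtcars data as stand-ins for svm() and trainset. Both substitutions are assumptions, made so the example runs without e1071. It shows the bquote() approach next to reformulate() from the stats package:

```r
# Column name chosen by position, as in the question (coi = 1 gives "mpg")
coi <- 1
response <- colnames(mtcars)[coi]

# bquote() splices the symbol into an unevaluated call;
# as.formula() turns that call into a real formula object
f_bquote <- as.formula(bquote(.(as.name(response)) ~ .))

# reformulate() builds the same formula from character strings
f_reform <- reformulate(".", response = response)

# lm() stands in for svm() here (assumption made for this sketch)
fit_built   <- lm(f_bquote, data = mtcars)
fit_literal <- lm(mpg ~ ., data = mtcars)

all.equal(coef(fit_built), coef(fit_literal))
```

Either formula object can then be passed wherever a literal formula such as Air_Flow ~ . would be written.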

convert period in stata to NA in r

I have a dataset in stata and I want to take it to R, but there are some missing values in state and they are represented using a period. I want to get the data into R which I do by loading the foreign package and then I use read.table() function. How do I convert the periods in state which are genuinely missing to NA in R?
If I understand you correctly, you first load the foreign package in order to read a .dta file, correct?
library("foreign")
Then you would read in your Data by using:
myRFile <- read.dta(file="someStataFile.dta")
You are asking how the missing value from Stata, usually shown as a dot (.), can be converted to the missing value in R, NA, also correct?
One thing to know here is that Stata handles missing values "behind the scenes" in multiple ways. There are actually 27 different missing-value codes in Stata (the plain . plus .a through .z), which users rarely need to tell apart. You do not need to know them for your problem, though, because read.dta() handles them itself.
To learn how to tackle a problem like this yourself in the future, check the help file for the function first:
help(read.dta)
There you can see that the function handles Stata's various missing-value types automatically and correctly.
If you want information about which type of missing value was recognized, you can set the argument missing.type=TRUE:
myRFile <- read.dta(file="someStataFile.dta", missing.type=TRUE)
Then, according to the help file, the following will happen:
If missing.type is TRUE a separate list is created with the same
variable names as the loaded data. For string variables the list value
is NULL. For other variables the value is NA where the observation is
not missing and 0–26 when the observation is missing. This is attached
as the "missing" attribute of the returned value.
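If the data were instead exported from Stata as plain text and read with read.table(), as the question mentions, the na.strings argument is one way to map periods to NA. This is a minimal sketch under that assumption; the inline text and its column names are invented for illustration:

```r
# Invented sample of a text export in which "." marks a missing value
raw <- "id income state
1 52000 NY
2 . CA
3 48000 .
"

# na.strings tells read.table() which strings to treat as NA
dat <- read.table(text = raw, header = TRUE,
                  na.strings = ".", stringsAsFactors = FALSE)

is.na(dat$income)   # which incomes were periods
is.na(dat$state)    # which states were periods
```

Because the periods are converted while reading, the income column is parsed as numeric rather than as text.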

R: partimat function doesn't recognize my classes

I am a relatively novice R user and am attempting to use the partimat() function in the klaR package to plot decision boundaries for a linear discriminant analysis, but I keep encountering the same error. I have tried inputting the arguments in several different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column headings.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
The source code of partimat.default (which you can view with getAnywhere(partimat.default)) contains
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or, in method 2, try removing the "sources1$" prefix from each variable name in the formula, since the data argument already specifies the data frame in which to look for these names. I think you are effectively specifying the data frame twice, so it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
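The nlevels() check quoted above can be reproduced in isolation. In this small sketch the class labels are made up for illustration; it shows that a character vector has no levels until it is converted to a factor:

```r
# Made-up class labels stored as plain character strings
grouping_chr <- c("basalt", "andesite", "basalt", "andesite")

# A character vector has no levels attribute, so nlevels() is 0,
# which would trigger the "at least two classes required" check
n_before <- nlevels(grouping_chr)

# After conversion to a factor, the distinct values become levels
grouping_fct <- as.factor(grouping_chr)
n_after <- nlevels(grouping_fct)

c(before = n_before, after = n_after)
```

Running is.factor() on the grouping column is a quick way to confirm which case applies to your data.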
HTH
