I'm trying to write a function that I can use across multiple dataframes which accepts column names as input. The objective is to identify whether an event happened (if it was the earliest) and then code the results into a binary 0 and 1. This is what I've come up with so far:
event <- function(x){
analysis$event <- 0
analysis$event[analysis$earliest == analysis$x] <- 1
}
However, when I try it with say test <- event(death_date) it returns just a value of 1. What went wrong and how can I fix it? Thanks!
The dollar operator does not work with variables: analysis$x looks for a column literally named "x", not for the column whose name is stored in x. You can use double square brackets instead:
col_name <- "mpg"
mtcars[[col_name]]
# compare:
mtcars$mpg
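Applied to the function in the question, a minimal sketch could look like the one below. It assumes the column name is passed as a string and the data frame is passed in as an argument; the parameter names df and col and the toy data are invented for illustration. The function returns the modified data frame because assignments inside a function only change a local copy. The original function returned 1 because an R function returns the value of its last expression, which here was the assignment of 1.

```r
# Hypothetical rewrite of event(): `df` is a data frame, `col` a column name (string).
event <- function(df, col) {
  df$event <- 0
  # which() drops NA comparisons, so rows with a missing date stay 0
  df$event[which(df$earliest == df[[col]])] <- 1
  df  # return the modified copy
}

# Invented example data
analysis <- data.frame(
  earliest   = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  death_date = as.Date(c("2020-01-01", "2020-05-01", NA))
)
analysis <- event(analysis, "death_date")
analysis$event
```

Because the data frame is an argument, the same function can be reused across several data frames by assigning the result back each time.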
I am learning R, and I am trying to understand the indexing properties. I cannot seem to understand why the following code to change a column name does not work:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# Why does the following line not work?
names(state.all["States"]) <- "Test"
colnames(state.all)
While this works:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# This works
names(state.all)[which(colnames(state.all)=="States")] <- "Test"
colnames(state.all)
Shouldn't the function be able to overwrite the name of the column also in the first example? Is it something to do with the local vs. global environment?
Thanks in advance!
What you are trying to do is replace the name of column number 9.
The expression which(colnames(state.all)=="States") returns the index of the column named "States" (if there is one); that index is then used to replace the corresponding value in the names vector.
The expression state.all["States"] just returns a one-column data frame extracted from state.all, so renaming it does not change the names of state.all itself.
I suggest something like colnames(state.all)[which(colnames(state.all)=="States")] <- "Test".
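One way to see the difference is to rename an extracted copy explicitly and compare it with the original. This is only a sketch; the variable tmp is introduced here for illustration and is not part of the question.

```r
state.all <- as.data.frame(state.x77)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL

# Renaming an extracted one-column data frame only affects that copy
tmp <- state.all["States"]
names(tmp) <- "Test"
names(tmp)                      # the copy is renamed
"States" %in% names(state.all)  # the original still has "States"

# Assigning into the names vector of state.all itself performs the rename
names(state.all)[names(state.all) == "States"] <- "Test"
names(state.all)
```

A logical comparison can index the names vector directly, so the which() call is optional here.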
Let's say that I have these vectors:
time <- c(306,455,1010,210,883,1022,310,361,218,166)
status <- c(1,1,0,1,1,0,1,1,1,1)
gender <- c(1,1,1,1,1,1,2,2,1,1)
And I turn them into this data frame:
dataset <- data.frame(time, status, gender)
I want to list the factors in the third column using this function (p/s: pardon the immaturity. I'm still learning):
getFactor<-function(dataset){
result <- list()
result["Factors"] <- unique(dataset[[3]])
return(result)
}
And all I get is this:
getFactor(dataset)
$Factors
[1] 1
Warning message:
In result["Factors"] <- unique(dataset[[3]]) :
number of items to replace is not a multiple of replacement length
I tried using levels, but all I get is an empty list. My question is (1) why does this happen? and (2) is there any other way that I can get the list of the factor in a function?
Solution is simple, you just need double brackets around "Factors" :)
In the function
result[["Factors"]] <- unique(dataset[[3]])
That should be the line.
The double brackets return an element; single brackets return that selection as a list.
Sounds silly, but try this:
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be of type 'list'. The second will be of type 'NULL'. This is because the single brackets returns a subset as a list, and the double brackets return the element itself. It's useful depending on the scenario. The element in this case is "NULL" because nothing has been assigned to it.
The warning "number of items to replace is not a multiple of replacement length" appears because you've asked it to put two values (1 and 2) into a single list slot, so only the first one is kept. When you use double brackets, the whole vector is stored as one element of the list, so it can work!
Hope that makes sense!
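Putting that together with the data from the question, a corrected version of the function might look like this. It is a sketch rather than the only fix; the last lines show an equivalent single-bracket form in which the vector is wrapped in a list first, and the name Factors2 is made up for the example.

```r
time   <- c(306, 455, 1010, 210, 883, 1022, 310, 361, 218, 166)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
gender <- c(1, 1, 1, 1, 1, 1, 2, 2, 1, 1)
dataset <- data.frame(time, status, gender)

getFactor <- function(dataset) {
  result <- list()
  # Double brackets store the whole vector as one list element
  result[["Factors"]] <- unique(dataset[[3]])
  result
}

out <- getFactor(dataset)
out$Factors

# Single brackets also work if the right-hand side is itself a list
out["Factors2"] <- list(unique(dataset[[3]]))
```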
Currently, when you create your data frame, dataset$gender is a double vector (which R will do automatically if everything in it is a number). If you want it to be a factor, you can declare it that way at the beginning:
dataset <- data.frame(time, status, gender = as.factor(gender))
Or coerce it to be a factor later:
dataset$gender <- as.factor(gender)
Then getting a vector of the levels is simple, without writing a function:
level_vector <- levels(dataset$gender)
level_vector
As for subsetting lists and data frames: dataset[[3]] in your function already returns the third column as a vector, and dataset[,3] does the same for an ordinary data frame. The first element of a list is extracted with list[[1]].
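The short comparison below, using the gender vector from the question, illustrates why levels() came back empty before the conversion: a plain numeric vector carries no levels attribute.

```r
gender <- c(1, 1, 1, 1, 1, 1, 2, 2, 1, 1)

levels(gender)             # NULL: a numeric vector has no levels
levels(as.factor(gender))  # character levels of the factor
unique(gender)             # distinct numeric values, in order of first appearance
```

Note that levels() returns the labels as character strings, whereas unique() on the numeric vector keeps them numeric.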
I'm new at R (and stackoverflow , too), so please forgive any possible stupidity!
I wrote a function
getvariance <-function(data, column)
that returns the variance of values in column in data
In the function I wrote
mydata = read.csv(data)
for (i in 1:datasize) {
x=(mydata$column)[i]
# compute variance of x
}
When I call
getvariance("randomnumbers.csv", X1)
x is returned as a column of null values.
However, when I simply write
x=(mydata$X1[i])
it prints the full column with numerical values.
Can anyone help me understand why mydata$column[i] doesn't work when column's a parameter of a function?
Thanks in advance.
You are trying to access data$column, a column literally named "column", instead of data$X1:
data <- data.frame(X1=1:3)
data$X1
## [1] 1 2 3
data$column
## NULL
Instead try to actually access the column with the name X1 as follows:
fct <- function(data, column){
data[,column]
}
fct(data, "X1")
## [1] 1 2 3
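For the function in the question, a minimal sketch along the same lines is shown below. It assumes the column name is passed as a quoted string and uses the built-in var() instead of a loop; the temporary CSV file and its values are invented so that the example is self-contained.

```r
# Hypothetical version of getvariance(): `column` is a column name given as a string
getvariance <- function(data, column) {
  mydata <- read.csv(data)
  var(mydata[[column]], na.rm = TRUE)
}

# Invented data written to a temporary file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(X1 = c(2, 4, 4, 6)), path, row.names = FALSE)

getvariance(path, "X1")
```

Calling getvariance(path, X1) without quotes would fail, because R would then look for an object named X1 in the calling environment.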
In data frame df, column a, I would like to replace all NA values and all values equal to 2 with 0, across all rows.
This has been achieved simply with the following: df[is.na(df$a) | df$a==2, "a"] <- 0
I will be doing this over and over again over different columns (b, c, d, etc) so I just wanted to see if I could build a function. I will simply use the function over and over again. That way, if I need to change which values or which output, it will be a simple task.
Here is a small example. First the dataframe:
df<-data.frame(
a=sample(c(0,1,2), 10, replace=TRUE)
)
Now some missing values:
df[sample(nrow(df), 3, FALSE), "a"] <- NA
Finally, the action to replace [which I already mentioned]:
df[is.na(df$a) | df$a==2, "a"] <- 0
I have tried the following function:
f.na<-function(df,col) df[is.na(df[,col]) | df[,col]==2, col]<-0
f.na(df, "a")
I feel like it should work but I cannot figure out why it doesn't. I get:
Error in `[.data.frame`(df, , col) : undefined columns selected
I know that I cannot use the $ sign so I tried using this [] format after reading some things online. I had used an apply type of function but then later I could not use the results in a dataframe. So I resorted to this way. I guess I can just iterate over and over for each column I need to modify but I thought the function solution would be nice.
Can you suggest anything I should try?
I can't reproduce your error. However, your function call will not have the desired effect because modifications to df inside your f.na function do not have global scope. Instead, one solution is to have your function return the modified object, like this:
set.seed(37337)
df<-data.frame(
a=sample(c(0,1,2), 10, replace=TRUE)
)
df[sample(nrow(df), 3, FALSE), "a"] <- NA
f.na<-function(df,col) {
df[is.na(df[,col]) | df[,col]==2, col] <- 0
return(df)
}
(df.new <- f.na(df, "a"))
df[is.na(df$a) | df$a==2, "a"] <- 0
print(df)
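Since the goal is to repeat this over columns b, c, d and so on, the returning-function approach can be extended to take several column names at once. The version below is a sketch under that assumption; the cols parameter and the small data frame are made up for the example.

```r
# Hypothetical variant of f.na(): `cols` is a character vector of column names
f.na <- function(df, cols) {
  for (col in cols) {
    x <- df[[col]]
    x[is.na(x) | x == 2] <- 0  # replace NA and 2 with 0
    df[[col]] <- x
  }
  df  # return the modified copy
}

# Invented example data
df <- data.frame(a = c(0, 2, NA, 1), b = c(NA, 1, 2, 2))
df <- f.na(df, c("a", "b"))
df
```

The result has to be assigned back (df <- f.na(...)), because the function only modifies its own copy of the data frame.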
To access a column in a data frame (or element of a list) by a name stored in another variable you want to use doubled brackets [[]] instead of the single ones.
However, for what you are doing you may want to look into macros instead, see the article on macros in this issue of Rnews and also the defmacro function in the gtools package.