I'm trying to write a function that I can use across multiple dataframes which accepts column names as input. The objective is to identify whether an event happened (if it was the earliest) and then code the results into a binary 0 and 1. This is what I've come up with so far:
event <- function(x){
analysis$event <- 0
analysis$event[analysis$earliest == analysis$x] <- 1
}
However, when I try it with say test <- event(death_date) it returns just a value of 1. What went wrong and how can I fix it? Thanks!
The dollar operator does not work with variables: analysis$x looks for a column literally named x, not the column whose name is stored in x. You can use double square brackets instead:
col_name <- "mpg"
mtcars[[col_name]]
# compare:
mtcars$mpg
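Applied to your function, that could look something like the sketch below. The name flag_event, the extra df argument, and the sample dates are made up for illustration. Passing the data frame in and returning it also sidesteps the second problem: assigning to analysis inside the function only changes a local copy, and the 1 you got back was just the value of the last assignment.

```r
# Hypothetical rewrite: the data frame and the column name are both arguments
flag_event <- function(df, col) {
  df$event <- 0
  # df[[col]] looks up the column whose name is stored in `col`
  df$event[df$earliest == df[[col]]] <- 1
  df  # return the modified copy so the caller can keep it
}

# Made-up sample data, for illustration only
analysis <- data.frame(
  earliest   = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  death_date = as.Date(c("2020-01-01", "2020-05-01", "2020-03-01"))
)

# The column name is passed as a string
analysis <- flag_event(analysis, "death_date")
analysis$event
```

Note that the column name is quoted in the call; an unquoted death_date would be looked up as a variable and fail.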
Related
I am learning R, and I am trying to understand the indexing properties. I cannot seem to understand why the following code to change a column name does not work:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# Why does the following line not work?
names(state.all["States"]) <- "Test"
colnames(state.all)
While this works:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# This works
names(state.all)[which(colnames(state.all)=="States")] <- "Test"
colnames(state.all)
Shouldn't the function be able to overwrite the name of the column also in the first example? Is it something to do with the local vs. global environment?
Thanks in advance!
What you're trying to do is replace the name of column number 9.
The expression which(colnames(state.all)=="States") returns the index of the column named "States" (if there is one), and that index is then used to replace the corresponding value in the names vector.
The expression state.all["States"] returns a one-column data frame, a copy of that column, so renaming it does not change the names of state.all itself.
I suggest something like colnames(state.all)[which(colnames(state.all)=="States")] <- "Test".
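Here is a small sketch of the difference, using a made-up two-column data frame (df and its values are hypothetical) so it runs quickly:

```r
# Made-up data frame standing in for state.all
df <- data.frame(Population = c(3615, 365),
                 States = c("Alabama", "Alaska"))

# Renames only the extracted one-column copy; df keeps its names
names(df["States"]) <- "Test"
names_after_first_try <- names(df)

# Assigns into the names vector of df itself, so the rename sticks
names(df)[names(df) == "States"] <- "Test"
names_after_second_try <- names(df)

names_after_first_try
names_after_second_try
```

The logical index names(df) == "States" does the same job as which(...) here; which is only needed if the comparison could produce NA.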
Let's say that I have these vectors:
time <- c(306,455,1010,210,883,1022,310,361,218,166)
status <- c(1,1,0,1,1,0,1,1,1,1)
gender <- c(1,1,1,1,1,1,2,2,1,1)
And I turn them into this data frame:
dataset <- data.frame(time, status, gender)
I want to list the factors in the third column using this function (p/s: pardon the immaturity. I'm still learning):
getFactor<-function(dataset){
result <- list()
result["Factors"] <- unique(dataset[[3]])
return(result)
}
And all I get is this:
getFactor(dataset)
$Factors
[1] 1
Warning message:
In result["Factors"] <- unique(dataset[[3]]) :
number of items to replace is not a multiple of replacement length
I tried using levels, but all I get is an empty list. My question is (1) why does this happen? and (2) is there any other way that I can get the list of the factor in a function?
Solution is simple, you just need double brackets around "Factors" :)
In the function
result[["Factors"]] <- unique(dataset[[3]])
That should be the line.
The double brackets return an element, single brackets return that selection as a list.
Sounds silly, but try this
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be of type 'list'. The second will be of type 'NULL'. This is because the single brackets returns a subset as a list, and the double brackets return the element itself. It's useful depending on the scenario. The element in this case is "NULL" because nothing has been assigned to it.
The warning "number of items to replace is not a multiple of replacement length" appears because you've asked R to put 2 things (the two unique values) into a single list element, so only the first one is kept. With double brackets the whole vector is stored as that one element, so it works!
Hope that makes sense!
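Putting it together, a version of your function with that one line changed might look like this (same data as in the question; only the brackets differ):

```r
time   <- c(306, 455, 1010, 210, 883, 1022, 310, 361, 218, 166)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
gender <- c(1, 1, 1, 1, 1, 1, 2, 2, 1, 1)
dataset <- data.frame(time, status, gender)

getFactor <- function(dataset) {
  result <- list()
  # [[ ]] stores the whole vector of unique values as a single list element
  result[["Factors"]] <- unique(dataset[[3]])
  result
}

res <- getFactor(dataset)
res$Factors
```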
Currently, when you create your data frame, dataset$gender is a double vector (which is what R does automatically when everything in it is a number). If you want it to be a factor, you can declare it that way at the beginning:
dataset <- data.frame(time, status, gender = as.factor(gender))
Or coerce it to be a factor later:
dataset$gender <- as.factor(gender)
Then getting a vector of the levels is simple, without writing a function:
level_vector <- levels(dataset$gender)
level_vector
Also check how you subset lists and data frames in your function. Both dataset[[3]] and dataset[, 3] return the third column of dataset as a vector; to assign to a single named element of a list, use double brackets, as in result[["Factors"]].
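For reference, a quick sketch comparing the bracket forms on a data frame (the two-row dataset here is made up to keep it short):

```r
# Made-up miniature version of the data frame
dataset <- data.frame(time = c(306, 455), status = c(1, 1), gender = c(1, 2))

single_class <- class(dataset[3])     # single brackets keep the data frame wrapper
double_class <- class(dataset[[3]])   # double brackets return the column itself
matrix_class <- class(dataset[, 3])   # matrix-style indexing also returns the column

# The last two forms give the same vector
same_vector <- identical(dataset[[3]], dataset[, 3])

c(single_class, double_class, matrix_class)
same_vector
```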
I'm new at R (and stackoverflow , too), so please forgive any possible stupidity!
I wrote a function
getvariance <-function(data, column)
that returns the variance of values in column in data
In the function I wrote
mydata = read.csv(data)
for i=1:datasize {
x=(mydata$column)[i]
//compute variance of x
}
When I call
getvariance("randomnumbers.csv", X1)
x is returned as a column of null values.
However, when I simply write
x=(mydata$X1[i])
it prints the full column with numerical values.
Can anyone help me understand why mydata$column[i] doesn't work when column's a parameter of a function?
Thanks in advance.
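You are trying to access data$column instead of data$X1: the $ operator treats column as a literal name and does not look up the value of the function argument.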
data <- data.frame(X1=1:3)
data$X1
## [1] 1 2 3
data$column
## NULL
Instead try to actually access the column with the name X1 as follows:
fct <- function(data, column){
data[,column]
}
fct(data, "X1")
## [1] 1 2 3
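For your variance case, a minimal sketch could look like the following. To keep it self-contained it takes a data frame rather than a file name, and the sample values are made up; in your function you would call read.csv first and pass the result on.

```r
# Sketch: `data` is a data frame, `column` is the column name as a string
getvariance <- function(data, column) {
  # [[ ]] looks up the column named by the value of `column`
  var(data[[column]])
}

# Made-up numbers standing in for the contents of the CSV file
mydata <- data.frame(X1 = c(2, 4, 4, 6))
result <- getvariance(mydata, "X1")
result
```

var works on the whole column at once, so no explicit loop over rows is needed.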
I have the following example data frame x
id val
a 1
a 2
a 3
b 2
b 4
b 9
I have a simple function that I apply while doing my SAC. Something like,
f.x <- function(x){
cat(x$id,"\n")
}
Note, all that I'm trying to do is print out the grouping term. This function is called using ddply as follows
ddply(x,.(id),f.x)
However, when I run this I get integer outputs which I suspect are indices of a list. How do I get the actual grouping term within f.x, in this case 'a' & 'b'?
Thanks much in advance.
Never be fooled into thinking that just because a variable looks like a bunch of characters, that it really is just a bunch of characters.
Factors are not characters. They are integer codes and will often behave like integer codes.
Coerce the values to character using as.character().
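A short sketch of what is going on, using a made-up factor rather than your data frame:

```r
# Made-up grouping variable stored as a factor
id <- factor(c("a", "a", "b"))

codes  <- as.integer(id)     # the underlying integer codes
labels <- as.character(id)   # the labels you actually want to see

# cat() on the factor itself would print the codes,
# so convert to character before printing
cat(as.character(unique(id)), "\n")
```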
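If all you want to do is print, you should use d_ply.
In ddply (and d_ply), the argument actually passed to the function (f.x here) is a data.frame (a subset of the x data.frame), so you should modify your f.x function accordingly. Converting with as.character() makes sure the label is printed even when id is a factor.
f.x <- function(x) cat(as.character(x[1,"id"]),"\n")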
d_ply(x,.(id),f.x)