This is an assignment question. Everybody in my class solved it with split and apply; I wanted a different approach, used ddply from plyr, and got stuck.
I have to write a function best("State", "Outcome") whose output is the name of the hospital with the lowest mortality rate in the given state.
E.g. best("TX", "heart failure") should output "CYPRESS".
My code:
In the earlier steps I have read the file and subsetted the desired columns into data1.
library(plyr)
data2 <- ddply(data1,.(State, Hospital.Name),
summarise, Heart.Attack=min(as.numeric(HA,na.rm=TRUE)))
data3 <- data2[complete.cases(data2),]
best <- function(State,outcome)
{
if(! State %in% data3$State) {
stop("invalid state")
} else if(State %in% data3$State && outcome == "Heart Attack") {
data4 <- subset(data3, State %in% data3$State, select=c(Hospital.Name))
return(nrow(data4))
}
}
Here, when I try to return only the hospital names for the state passed to the function, I get all the hospital names. If I assign the value manually, I get the correct number of rows. I can't understand why it's not taking the value directly from the function argument in State %in% data3$State.
Well, the error is resolved.
I introduced an empty character vector in the loop, assigned the State value to it, and then compared against that.
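A likely explanation for why the workaround helps is name masking: inside subset(), the bare name State is looked up among the data frame's columns first, so State %in% data3$State compares the column with itself and is TRUE for every row. The following is a minimal sketch of that behaviour, assuming a small made-up data frame in place of data3; the names count_buggy, count_fixed and state_arg are hypothetical and only for illustration.

```r
# Toy stand-in for data3 (values are made up for illustration).
data3 <- data.frame(
  State = c("TX", "TX", "AL"),
  Hospital.Name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Buggy: inside subset(), `State` resolves to the column, not the argument,
# so the condition is TRUE for every row.
count_buggy <- function(State) {
  nrow(subset(data3, State %in% data3$State, select = c(Hospital.Name)))
}

# Fixed: the argument name does not clash with a column name,
# so it is found in the function's own environment.
count_fixed <- function(state_arg) {
  nrow(subset(data3, State == state_arg, select = c(Hospital.Name)))
}

count_buggy("TX")  # 3: every row is kept
count_fixed("TX")  # 2: only the TX rows
```

Copying the argument into a differently named variable, as described above, has the same effect as renaming the argument.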
Related
I have a dataframe in R as follows
df <-
as.data.frame(cbind(c(1,2,3,4,5), c(0,1,2,3,4,5),c(1,2,4,5,6)))
and I have a function in which I want the procedure to stop and display a message if the input df contains at least one 0 value. I tried the following but can't make it work properly. What is the correct if() statement I should use?
my_function <- function(df){
if (all(df == 0) == 'TRUE')
stop(paste("invalid input df"))
}
We could use %in%
my_function <- function(df) {
if(0 %in% unlist(df)) {
stop("invalid input df")
}
}
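For comparison, here is a small sketch of why the original condition never fired and of an any()-based alternative. The data frame and the function name check_no_zero are made up for illustration; all() asks whether every value is 0, while the question asks whether at least one is.

```r
# Toy input with exactly one zero (made up for illustration).
df <- data.frame(a = c(1, 2, 3), b = c(0, 1, 2))

all(df == 0)  # FALSE: not every value is 0
any(df == 0)  # TRUE: at least one value is 0

# Hypothetical helper: stop if the input contains at least one 0.
check_no_zero <- function(df) {
  if (any(df == 0, na.rm = TRUE)) {
    stop("invalid input df")
  }
  invisible(df)
}
```

Comparing the logical result with the string 'TRUE' is also unnecessary; if() accepts the logical value directly.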
I have this code that takes three arguments. It narrows the data down by state, then by condition, and finally by the hospital's rank on that condition (heart attack, heart failure or pneumonia). I am currently working on the heart failure branch, so the other two can be ignored. The order function sorts the heart failure rate nicely. However, I am having difficulty selecting the hospital at the requested rank after that.
best("AK","heart failure", 3)
best <- function(state, outcome, num) {
#Reads the csv file
dataTable <- read.csv("outcome.csv", header = TRUE, stringsAsFactors = FALSE)
#Passes the state argument to the choice variable
choice <- state
stateOfChoice <- dataTable[dataTable$State == choice,]
#Makes sure that only three of outcomes found in the csv file are selected
if(outcome != "heart failure" && outcome != "heart attack" && outcome != "pneumonia"){
print("wrong condition, try again")
main()
}
#using the selected rows from above, return the minimum value of rate from heart attack and then use this selected row to find the hospital name
else if (outcome == "heart attack"){
heart_attack <- stateOfChoice[which.min(stateOfChoice$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack), ]
hospital <- heart_attack$Hospital.Name
return(hospital)
}
#Similar as above, but instead with heart failure
else if (outcome == "heart failure"){
orderState <- stateOfChoice[order(as.integer(stateOfChoice$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure), decreasing = FALSE),]
orderStateNum <- orderState$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure[[num]]
##heart_failure <- stateOfChoice[which.min(stateOfChoice$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure),]
hospital <- orderStateNum$Hospital.Name
return(hospital)
}
#Similar as above, but instead with pneumonia
else if (outcome == "pneumonia"){
pneumonia <- stateOfChoice[which.min(stateOfChoice$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia),]
hospital <- pneumonia$Hospital.Name
return(hospital)
}
}
For instance, you can see that the rows are ordered nicely on this variable by orderState <- stateOfChoice[order(as.integer(stateOfChoice$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure),decreasing = FALSE),] for the heart failure condition. The third selection should be #100, which corresponds to the hospital name Mat-su regional medical center. I am not getting that hospital name; I am getting #101, which corresponds to Bartlett Regional Hospital.
Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
115 10.8
104 11.2
100 11.4
114 11.4
101 11.6
The data is here:
Pls click for dataset
I'm not sure I understand what you are trying to do. Furthermore, your code does not run as posted, and I get the following error:
Error in orderStateNum$Hospital.Name (from so.R!hf5Gh5#27) :
$ operator is invalid for atomic vectors
I'll assume that your goal is to retrieve the name of the hospital at position num in your sorted dataset. You can replace the corresponding branch of your current code with this one:
#Similar as above, but instead with heart failure
else if (outcome == "heart failure"){
orderState <- stateOfChoice[order(stateOfChoice$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure, decreasing = FALSE), ]
return(orderState[num, "Hospital.Name"])
}
There were various problems in your code. The most important is that the as.integer call does not make sense, since you are trying to sort decimal numbers: as.integer truncates 11.2, 11.4 and 11.6 all to 11, so those rows tie. You should definitely remove it. Also, you should perhaps not keep your data like this, with "Not Available" entries causing all numeric variables to be read as character vectors. You may have a good reason to do so, but it sounds quite risky.
You should also modify/adapt the other portions of code accordingly.
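To see the effect of as.integer on the five rates listed in the question, here is a small sketch. It assumes the rows sit in the original data in row-number order (100, 101, 104, 114, 115) and that the rates are stored as character strings. Using as.numeric is my substitution for illustration; it is not part of the replacement code above.

```r
# Rates for the five rows shown in the question, keyed by row number.
# Assumption: this is their order in the unsorted data.
rates <- c("100" = "11.4", "101" = "11.6", "104" = "11.2",
           "114" = "11.4", "115" = "10.8")

# as.integer() truncates 11.2, 11.4 and 11.6 all to 11, so four rows tie
# and order() leaves the ties in their original order.
by_integer <- names(rates)[order(as.integer(rates))]
by_integer[3]  # "101", the row the question complains about

# as.numeric() keeps the decimals, so the ranking matches the listing.
by_numeric <- names(rates)[order(as.numeric(rates))]
by_numeric[3]  # "100", the row the question expects
```

With the real column, as.numeric would turn "Not Available" into NA with a coercion warning, and order() places NA values last by default.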
Problem Objective: Finding the Best Hospital in a State
Data File if Needed: outcome-of-care-measures.csv
Explanation
I am working with hospital data for different states in the USA. The csv file contains information about 30-day mortality and readmission rates for heart attacks, heart failure, and pneumonia for over 4,000 hospitals.
I want to write a function called 'best' that takes two arguments: the 2-character abbreviated name of a state (e.g. 'NY' for New York) and an outcome name. The function reads the 'outcome-of-care-measures.csv' file and returns a character vector with the name of the hospital that has the best (i.e. lowest) 30-day mortality for the specified outcome in that state. The hospital name is the name provided in the Hospital.Name variable in the csv file. The outcomes can be one of “heart attack”, “heart failure”, or “pneumonia”. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
The function should check the validity of its arguments. If an invalid state value is passed to best, the function should throw an error via the stop function with the exact message “invalid state”. If an invalid outcome value is passed to best, the function should throw an error via the stop function with the exact message “invalid outcome”.
Code I Wrote:
best <- function(state,outcome) {
df <- read.csv("outcome-of-care-measures.csv")
df1 <- df[ ,c(2,7,11,17,23)] # column numbers correspond to the columns of interest from the entire csv file
table <- split(df1,df1$State)
if (outcome == "heart attack") {
n = 3
} else if (outcome == "heart failure") {
n = 4
} else if (outcome == "pneumonia") {
n = 5
} else {
stop("Invalid Outcome")
}
min.val <- min(table$state[,n],na.rm = TRUE)
row.no <- which(table$state[,n] == min.val)
print(table$state[1,row.no])
}
Error
best("TX", "heart failure")
NULL
Warning message:
In min(table$state[, n], na.rm = TRUE) :
no non-missing arguments to min; returning Inf
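One likely cause of the NULL and the warning above is that table$state looks for a list element literally named "state" rather than using the value of the state argument, so it returns NULL and min() has nothing to work with. Below is a hedged sketch of the same idea with [[state]] instead. It takes a data frame rather than reading the csv file, and the data, the column names heart.attack, heart.failure and pneumonia, and the name best_sketch are all hypothetical.

```r
# Hypothetical toy data; the real file has many more columns.
hospitals <- data.frame(
  Hospital.Name = c("H1", "H2", "H3", "H4"),
  State         = c("TX", "TX", "TX", "NY"),
  heart.attack  = c("15.0", "14.2", "Not Available", "13.9"),
  heart.failure = c("9.1", "8.7", "Not Available", "10.2"),
  pneumonia     = c("11.0", "12.5", "10.1", "Not Available"),
  stringsAsFactors = FALSE
)

best_sketch <- function(df, state, outcome) {
  # Map the outcome names from the assignment to (hypothetical) columns.
  outcome_cols <- c("heart attack"  = "heart.attack",
                    "heart failure" = "heart.failure",
                    "pneumonia"     = "pneumonia")
  by_state <- split(df, df$State)
  if (!state %in% names(by_state)) stop("invalid state")
  if (!outcome %in% names(outcome_cols)) stop("invalid outcome")

  # `[[state]]` uses the value of `state`; `by_state$state` would look for
  # an element literally named "state" and return NULL.
  one_state <- by_state[[state]]

  # "Not Available" becomes NA; which.min() ignores NA values.
  rate <- suppressWarnings(as.numeric(one_state[[outcome_cols[[outcome]]]]))
  one_state$Hospital.Name[which.min(rate)]
}

best_sketch(hospitals, "TX", "heart failure")  # "H2"
```

The sketch also uses the lowercase messages the assignment asks for, and it returns the name instead of printing it.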
I've got an R assignment in which I have to add a column to my matrix. It's about dates (time zones), and I use the dplyr and lubridate libraries.
For the table below, I want to add each row's OlsonName based on its State column (e.g. NSW -> Australia/NSW).
Event.ID Database Date.Time Nearest.town State *OlsonName*
1 20812 Wind 23/11/1975 07:00 SYDNEY NSW *Australia/NSW*
2 20813 Tornado 02/12/1975 14:00 BARHAM NSW *Australia/NSW*
I implement that with a function and a loop:
#function
addOlsonNames <- function(aussieState,aussieTown){
if(aussieState=="NSW"){
if(aussieTown=="BROKEN HILL"){
value <- "Australia/Broken_Hill";
}else{
value <- "Australia/NSW"
}
}else if(aussieState=="QLD"){
value <- "Australia/Queensland"
}else if(aussieState=="NT"){
value <- "Australia/North"
}else if(aussieState=="SA"){
value <- "Australia/South"
}else if(aussieState=="TAS"){
value <- "Australia/Tasmania"
}else if(aussieState=="VIC"){
value <- "Australia/Victoria"
}else if(aussieState=="WA"){
value <- "Australia/West"
}else if(aussieState=="ACT"){
value <- "Australia/ACT"
}
else{
value <- "NAN"
}
return(value)
}
#loop
for(i in 1:nrow(aussieStorms)){
aussieStorms$OlsonName[i] <- addOlsonNames(State[i],Nearest.town[i])
}
Most of the instances are classified correctly, as in my table above, but some are misclassified (e.g. State~TAS -> OlsonName~Australia/West, although I also have some State~TAS -> OlsonName~Australia/Tasmania).
Seems strange to me. What might be the issue ?
Update:
I also tried mutate() and that's what I got:
aus1 <- mutate(aussieStorms,OlsonXYZ = addOlsonNames(State,Nearest.town))
Warning messages:
1: In if (aussieState == "NSW") { :
the condition has length > 1 and only the first element will be used
2: In if (aussieTown == "BROKEN HILL") { :
the condition has length > 1 and only the first element will be used
If Ben Bolker's comment is right then the problem is in here:
for(i in 1:nrow(aussieStorms)){
aussieStorms$OlsonName[i] <- addOlsonNames(State[i],Nearest.town[i])
}
in that the values passed to addOlsonNames are not coming from rows of the aussieStorms data frame. If R isn't giving an error, then it must be getting State[i] from another object called State in your R workspace. Similarly for Nearest.town. If those objects aren't the same as the ones in your aussieStorms data frame, that would explain the apparent misclassification.
[It's also possible that you've used attach on a data frame at some point, and State is being taken from that. But attaching data frames is a bad idea, as you can see here...]
Ben's solution, i.e. making them aussieStorms$State and aussieStorms$Nearest.town, looks good to me.
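The mutate() warnings in the update arise because if() only looks at a single condition, while mutate() passes whole columns. One way around both problems is a vectorized lookup, sketched below under stated assumptions: the data frame storms is a made-up stand-in for aussieStorms, and the names olson_by_state and add_olson_names are hypothetical. The zone strings are copied from the function in the question.

```r
# Toy stand-in for aussieStorms (rows are made up for illustration).
storms <- data.frame(
  Nearest.town = c("SYDNEY", "BROKEN HILL", "HOBART", "DARWIN", "NOWHERE"),
  State        = c("NSW", "NSW", "TAS", "NT", "XX"),
  stringsAsFactors = FALSE
)

# Named vector used as a lookup table: state code -> Olson name.
olson_by_state <- c(
  NSW = "Australia/NSW",      QLD = "Australia/Queensland",
  NT  = "Australia/North",    SA  = "Australia/South",
  TAS = "Australia/Tasmania", VIC = "Australia/Victoria",
  WA  = "Australia/West",     ACT = "Australia/ACT"
)

add_olson_names <- function(state, town) {
  # as.character() guards against factor columns being used as integer codes.
  value <- unname(olson_by_state[as.character(state)])
  value[is.na(value)] <- "NAN"  # same fallback as the original function
  value[state == "NSW" & town == "BROKEN HILL"] <- "Australia/Broken_Hill"
  value
}

# Pass the columns of the data frame explicitly, not free-standing objects.
storms$OlsonName <- add_olson_names(storms$State, storms$Nearest.town)
```

Because the function works on whole vectors, it can also be called inside mutate() without the "condition has length > 1" warning.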
best <- function(state, outcome) {
data = read.csv("outcome-of-care-measures.csv", colClasses="character")
data[, 11] <- as.numeric(data[, 11])
data[, 17] <- as.numeric(data[, 17])
data[, 23] <- as.numeric(data[, 23])
if (outcome == "heart attack") {
dataset <- data[,c(2,7,11)]
} else if (outcome == "heart failure") {
dataset <- data[,c(2,7,17)]
} else if (outcome == "pneumonia") {
dataset <- data[,c(2,7,23)]
}
dataset<- na.omit(dataset)
names(dataset)<- c("a","State","c")
datastates <- split(dataset, dataset$State)
datastate <- datastates$state
order.h <- order(datastate$c)
answer <- datastate[order.h,]
answer [1,1]
}
The error I am getting in my code is:
Error in order(datastate$c) : argument 1 is not a vector
I believe it is because I did not write the code before it correctly. The code should take the name of the state that I pass to the function and create a data set of 3 columns ordered by the third column.
Error in order(datastate$c) : argument 1 is not a vector means that order() doesn't know what to do with datastate$c because it is not a vector. I can't say for sure as you haven't provided data, but my guess is that datastate$c is returning NULL.
Your problem likely lies in the following code:
names(dataset)<- c("a","State","c")
datastates<- split(dataset, dataset$State)
datastate <- datastates$state
order.h <- order(datastate$c)
According to ?split, "the value returned from split is a list of vectors containing the values for the groups. The components of the list are named by the levels of f". In other words, your object datastates no longer has the structure of a data.frame, and your attempt to access datastate$c isn't working. Note also that datastates$state looks for a component literally named "state", not for the value of the state argument; datastates[[state]] uses the value. I would run your function up to datastates <- split(dataset, dataset$State) and then call str() on datastates to determine its structure.
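A quick way to check the NULL guess above on toy data is sketched here. The three-row dataset is made up, with the same column names a, State and c that the function assigns.

```r
# Made-up data with the column names used in the function.
dataset <- data.frame(
  a     = c("H1", "H2", "H3"),
  State = c("TX", "TX", "NY"),
  c     = c(9.1, 8.7, 10.2),
  stringsAsFactors = FALSE
)

datastates <- split(dataset, dataset$State)
str(datastates, max.level = 1)  # a list with one data frame per state
names(datastates)               # "NY" "TX"

state <- "TX"
is.null(datastates$state)       # TRUE: no component is named "state"

# Indexing with [[ ]] uses the value stored in `state`.
datastate <- datastates[[state]]
answer <- datastate[order(datastate$c), ]
answer[1, 1]                    # "H2"
```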