Application of 'apply' functions in R - r

I have got the following list which was generated by using split function with state as index.
$AK
Hospital_Name State Mortality_Rate
99 PROVIDENCE ALASKA MEDICAL CENTER AK 13.4
100 MAT-SU REGIONAL MEDICAL CENTER AK 17.7
102 FAIRBANKS MEMORIAL HOSPITAL AK 15.5
$AL
Hospital_Name State Mortality_Rate
1 SOUTHEAST ALABAMA MEDICAL CENTER AL 14.3
2 MARSHALL MEDICAL CENTER SOUTH AL 18.5
3 ELIZA COFFEE MEMORIAL HOSPITAL AL 18.1
$AR
Hospital_Name State Mortality_Rate
193 SILOAM SPRINGS MEMORIAL HOSPITAL AR 15.6
194 JOHNSON REGIONAL MEDICAL CENTER AR 16.9
195 WASHINGTON REGIONAL MED CTR AT NORTH HILLS AR 15.2
I want to select the Hospital Name in the 2nd row for each of these states. Can somebody help with an apply function here? I tried to use sapply in the following way (not working):
x <- sapply(test.case3,function(i) test.case3[[i]][2,1])
so that, by varying 'i', I can get result as follows -
> test.case3[[1]][2,1]
[1] "MAT-SU REGIONAL MEDICAL CENTER"
> test.case3[[2]][2,1]
[1] "MARSHALL MEDICAL CENTER SOUTH"
> test.case3[[3]][2,1]
[1] "JOHNSON REGIONAL MEDICAL CENTER"
Kindly advise.
I need to produce the final report in the following format (ignore the data in the example below).
Hospital_Name State
D W MCMILLAN MEMORIAL HOSPITAL AL
ARKANSAS METHODIST MEDICAL CENTER AR
JOHN C LINCOLN DEER VALLEY HOSPITAL AZ
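A likely reason the sapply attempt fails is that sapply passes each list element (one state's data frame) to the function as i, not a position, so test.case3[[i]] tries to index the list with a data frame. Below is a minimal sketch, assuming test.case3 is a list of data frames as shown above; the small hospitals data frame and the names second_names and report are made up for illustration.

```r
# Hypothetical stand-in for the list produced by split() in the question
hospitals <- data.frame(
  Hospital_Name = c("PROVIDENCE ALASKA MEDICAL CENTER",
                    "MAT-SU REGIONAL MEDICAL CENTER",
                    "SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "SILOAM SPRINGS MEMORIAL HOSPITAL",
                    "JOHNSON REGIONAL MEDICAL CENTER"),
  State = c("AK", "AK", "AL", "AL", "AR", "AR"),
  Mortality_Rate = c(13.4, 17.7, 14.3, 18.5, 15.6, 16.9),
  stringsAsFactors = FALSE
)
test.case3 <- split(hospitals, hospitals$State)

# Each element handed to the function is already one state's data frame,
# so index it directly instead of going back through test.case3[[i]]
second_names <- sapply(test.case3, function(d) d[2, "Hospital_Name"])

# Assemble the two-column report; the state codes are the vector's names
report <- data.frame(
  Hospital_Name = unname(second_names),
  State = names(second_names),
  stringsAsFactors = FALSE
)
print(report)
```

If a state has fewer than two rows, d[2, "Hospital_Name"] yields NA for that state, which may or may not be what the report should show.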

Related

How can I filter (dplyr) on the same dataset twice in a 'for' loop? R

I have a dataset that looks like this:
Hospital.Name State heart attack
1 SOUTHEAST ALABAMA MEDICAL CENTER AL 14.3
2 MARSHALL MEDICAL CENTER SOUTH AL 18.5
3 ELIZA COFFEE MEMORIAL HOSPITAL AL 18.1
4 MIZELL MEMORIAL HOSPITAL AL Not Available
5 CRENSHAW COMMUNITY HOSPITAL AL Not Available
6 MARSHALL MEDICAL CENTER NORTH AL Not Available
7 ST VINCENT'S EAST AL 17.7
8 DEKALB REGIONAL MEDICAL CENTER AL 18.0
9 SHELBY BAPTIST MEDICAL CENTER AL 15.9
10 CALLAHAN EYE FOUNDATION HOSPITAL AL Not Available
11 HELEN KELLER MEMORIAL HOSPITAL AL 19.6
12 DALE MEDICAL CENTER AL 17.3
13 CHEROKEE MEDICAL CENTER AL Not Available
14 BAPTIST MEDICAL CENTER SOUTH AL 17.8
15 JACKSON HOSPITAL & CLINIC INC AL 17.5
16 GEORGE H. LANIER MEMORIAL HOSPITAL AL 15.4
17 ELBA GENERAL HOSPITAL AL Not Available
18 EAST ALABAMA MEDICAL CENTER AND SNF AL 16.3
19 WEDOWEE HOSPITAL AL Not Available
20 UNIVERSITY OF ALABAMA HOSPITAL AL 15.0
The goal is to retrieve the hospital name, for a given rank of hospital on 'heart attack' for every state. For example, here I am trying to retrieve the hospital name for the lowest score (rank=1) in the heart attack column, for every state in a data frame.
This is my attempt:
stateVec <- unique(df$State)
outcome <- 'heart attack'
name <- c()
st <- c()
stateVec <- c()
rank <- 1
for (i in 1:length(stateVec)) {
k <- stateVec[i]
df1 <- dplyr::filter(df, State==k)
rankVec <- unique(df[[outcome]])
rankVec <- sort(rankVec[rankVec != 'Not Available'])
key <- rankVec[rank]
df1 <- dplyr::filter(df1, get(outcome, envir = as.environment(df))==key)
df1 <- df1[order(df$Hospital.Name), , drop = F]
d <- df1[1,]
name <- d$Hospital.Name
st <- k
return(data.frame(st, name))
}
I receive the following error:
Error in filter_impl(.data, quo) : Result must have length 98, not 4706
I've tried recreating the problem with the mtcars dataset, and don't get the same error. Any help would be appreciated :)
I think this is what you are looking for.
library(dplyr)

desired_rank <- 1
df %>%
  filter(!is.na(heart.attack)) %>%
  group_by(State) %>%
  arrange(heart.attack) %>%
  slice(desired_rank) %>%
  ungroup()
It removes NA values for heart.attack;
Then groups by State;
Then sorts ascending on heart.attack;
Then returns the hospital at desired_rank within each state (with desired_rank = 1, the hospital with the lowest heart.attack value).
The output is a data.frame.
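The sample data in the question stores missing scores as the text 'Not Available', so is.na() alone would not drop them. Here is a hedged sketch of the same pipeline with a conversion step added; the four-row df and the hospital names are invented for illustration, and the column is assumed to be called heart.attack as in the answer.

```r
library(dplyr)

# Hypothetical miniature of the asker's data; the outcome column is text
df <- data.frame(
  Hospital.Name = c("A HOSPITAL", "B HOSPITAL", "C HOSPITAL", "D HOSPITAL"),
  State = c("AL", "AL", "AK", "AK"),
  heart.attack = c("14.3", "Not Available", "17.7", "13.4"),
  stringsAsFactors = FALSE
)

desired_rank <- 1

result <- df %>%
  # "Not Available" cannot be parsed and becomes NA; the warning is expected
  mutate(heart.attack = suppressWarnings(as.numeric(heart.attack))) %>%
  filter(!is.na(heart.attack)) %>%
  group_by(State) %>%
  # sort within each state, breaking ties alphabetically by hospital name
  arrange(heart.attack, Hospital.Name, .by_group = TRUE) %>%
  slice(desired_rank) %>%
  ungroup()

print(result)
```

Sorting the numeric values matters here: sorted as text, "10.1" would come before "9.6".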

Convert string to symbol accepted by dplyr in function

My data frame looks like:
> str(b)
'data.frame': 2720 obs. of 3 variables:
$ Hospital.Name: chr "SOUTHEAST ALABAMA MEDICAL CENTER" "MARSHALL MEDICAL CENTER SOUTH" "ELIZA COFFEE MEMORIAL HOSPITAL" "ST VINCENT'S EAST" ...
$ State : chr "AL" "AL" "AL" "AL" ...
$ heart attack : num 14.3 18.5 18.1 17.7 18 15.9 19.6 17.3 17.8 17.5 ...
I want to group it by State, sort it by State and heart attack, and then add a column that returns the row number within each group. The ideal result would look like:
# A tibble: 2,720 x 4
# Groups: State [54]
Hospital.Name State `heart attack` rank
<chr> <chr> <dbl> <int>
1 PROVIDENCE ALASKA MEDICAL CENTER AK 13.4 1
2 ALASKA REGIONAL HOSPITAL AK 14.5 2
3 FAIRBANKS MEMORIAL HOSPITAL AK 15.5 3
4 ALASKA NATIVE MEDICAL CENTER AK 15.7 4
5 MAT-SU REGIONAL MEDICAL CENTER AK 17.7 5
6 CRESTWOOD MEDICAL CENTER AL 13.3 1
7 BAPTIST MEDICAL CENTER EAST AL 14.2 2
8 SOUTHEAST ALABAMA MEDICAL CENTER AL 14.3 3
9 GEORGIANA HOSPITAL AL 14.5 4
10 PRATTVILLE BAPTIST HOSPITAL AL 14.6 5
# ... with 2,710 more rows
so my code is:
outcome<-"heart attack"
c<-arrange(b,State,sym(outcome))%>%
group_by(State)%>%
mutate(rank=row_number(sym(outcome)))
but I got this error:
Error in arrange_impl(.data, dots) : object 'heart attack' not found
When I ran sym(outcome) independently and copied the results into my code, it works:
sym(outcome)
`heart attack`
c<-arrange(b,State,`heart attack`)%>%
+ group_by(State)%>%
+ mutate(rank=rank(`heart attack`))
> c
# A tibble: 2,720 x 4
# Groups: State [54]
Hospital.Name State `heart attack` rank
<chr> <chr> <chr> <dbl>
1 PROVIDENCE ALASKA MEDICAL CENTER AK 13.4 1
2 ALASKA REGIONAL HOSPITAL AK 14.5 2
3 FAIRBANKS MEMORIAL HOSPITAL AK 15.5 3
4 ALASKA NATIVE MEDICAL CENTER AK 15.7 4
5 MAT-SU REGIONAL MEDICAL CENTER AK 17.7 5
6 CRESTWOOD MEDICAL CENTER AL 13.3 1
7 BAPTIST MEDICAL CENTER EAST AL 14.2 2
8 SOUTHEAST ALABAMA MEDICAL CENTER AL 14.3 3
9 GEORGIANA HOSPITAL AL 14.5 4
10 PRATTVILLE BAPTIST HOSPITAL AL 14.6 5
# ... with 2,710 more rows
This is a part of a function, so the 'outcome' needs to be a string. Therefore I tried to convert a string to a symbol so that I can reference the column in dplyr.
Can anyone tell me what's happening here?
Are there any good ways to achieve my goal?
You need to unquote the symbol with !!:
arrange(b, State, !!sym(outcome))
Or UQ:
arrange(b, State, UQ(sym(outcome)))
Similarly for mutate:
mutate(rank=row_number(!!sym(outcome))) # or mutate(rank=row_number(UQ(sym(outcome))))
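Since the asker says this lives inside a function that receives the column name as a string, an alternative to !!sym() is dplyr's .data pronoun. The following is only a sketch under that assumption; the function name rank_by_outcome and the four-row b are invented for illustration.

```r
library(dplyr)

# Hypothetical helper: `outcome` is a column name supplied as a string
rank_by_outcome <- function(data, outcome) {
  data %>%
    arrange(State, .data[[outcome]]) %>%
    group_by(State) %>%
    mutate(rank = row_number(.data[[outcome]])) %>%
    ungroup()
}

# Made-up data; check.names = FALSE keeps the space in the column name
b <- data.frame(
  Hospital.Name = c("H1", "H2", "H3", "H4"),
  State = c("AL", "AL", "AK", "AK"),
  `heart attack` = c(18.5, 14.3, 17.7, 13.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

ranked <- rank_by_outcome(b, "heart attack")
print(ranked)
```

With .data[[outcome]] the string is looked up as a column of the data frame, so no symbol conversion or unquoting is needed.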
If you are only trying to name the column then you will want to use the backtick (`). (It is typically paired with the ~ on the top left of your keyboard just below the ESC key.) Please note that is not the same as the single quotation mark (').
The reason you will often see a variable written like this is that header names containing spaces were imported into a tibble. Any header name that has a space in it gets wrapped in `. You need to refer to those columns by also wrapping them in backticks, or else R does not recognize that you are referring to an object in memory that it can work with. It will just think you are referring to the string and not the object. It will, however, happily store the object with a space in its name if you use " or ' on the left side of an assignment.
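See the demonstration of the issue below:
`tidy time` <- 4    # backticks: assigns to the object named tidy time
'tidy time' <- 5    # a quoted name on the left of <- assigns to the same object
"tidy time" <- 6    # same again; the object now holds 6
print('tidy time')  # prints the string "tidy time"
print("tidy time")  # prints the string "tidy time"
print(`tidy time`)  # prints 6, the value of the object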
This is the cause for R's error message.
Hopefully understanding all that will spare you from having to call on the sym function. In any case, if you remove the space in the name the problem will also go away and you can save the backticks for another day.
To learn more about !! and unquoting variables (which psidom was referring to in his answer), and also learn about the related issues that occur in writing functions that rely on referencing objects with non-standard evaluation in dplyr please see here: https://rpubs.com/hadley/dplyr-programming

Sorting a dataframe based on multiple columns - Sorting issue

I have a data frame as below:
Provider.Number Hospital.Name State Mortality
210001 MERITUS MEDICAL CENTER MD 12.5
210002 UNIVERSITY OF MARYLAND MEDICAL CENTER MD 12.7
210003 PRINCE GEORGES HOSPITAL CENTER MD 13
210004 HOLY CROSS HOSPITAL MD 9.6
210005 FREDERICK MEMORIAL HOSPITAL MD 9.8
210006 HARFORD MEMORIAL HOSPITAL MD 11.5
210007 SAINT JOSEPH MEDICAL CENTER MD 9.5
210008 MERCY MEDICAL CENTER INC MD 11.2
210009 JOHNS HOPKINS HOSPITAL, THE MD 10.2
210011 SAINT AGNES HOSPITAL MD 11.1
210012 SINAI HOSPITAL OF BALTIMORE MD 9.7
210013 BON SECOURS HOSPITAL MD 9.6
210015 MEDSTAR FRANKLIN SQUARE MEDICAL CENTER MD 9.3
210016 WASHINGTON ADVENTIST HOSPITAL MD 11
210017 GARRETT COUNTY MEMORIAL HOSPITAL MD 13.5
210018 MEDSTAR MONTGOMERY MEDICAL CENTER MD 9.3
210019 PENINSULA REGIONAL MEDICAL CENTER MD 10.6
210022 SUBURBAN HOSPITAL MD 9.9
210023 ANNE ARUNDEL MEDICAL CENTER MD 12
210024 MEDSTAR UNION MEMORIAL HOSPITAL MD 11.3
210027 WESTERN MARYLAND REGIONAL MEDICAL CENTER MD 12.6
210028 MEDSTAR SAINT MARY'S HOSPITAL MD 13.1
210029 JOHNS HOPKINS BAYVIEW MEDICAL CENTER MD 10.7
210030 CHESTER RIVER HOSPITAL CENTER MD 11.2
210032 UNION HOSPITAL OF CECIL COUNTY MD 9.9
210033 CARROLL HOSPITAL CENTER MD 9.7
210034 MEDSTAR HARBOR HOSPITAL MD 9.2
210035 CIVISTA MEDICAL CENTER MD 14.2
210037 MEMORIAL HOSPITAL AT EASTON MD 10.6
210038 MARYLAND GENERAL HOSPITAL MD 10.8
210039 CALVERT MEMORIAL HOSPITAL MD 10.1
210040 NORTHWEST HOSPITAL CENTER MD 12.6
210043 BALTIMORE WASHINGTON MEDICAL CENTER MD 12.7
210044 GREATER BALTIMORE MEDICAL CENTER MD 7.4
210045 EDWARD MCCREADY MEMORIAL HOSPITAL MD 12.9
210048 HOWARD COUNTY GENERAL HOSPITAL MD 10.1
210049 UPPER CHESAPEAKE MEDICAL CENTER MD 12.9
210051 DOCTORS' COMMUNITY HOSPITAL MD 11
210054 SOUTHERN MARYLAND HOSPITAL CENTER MD 11.7
210055 LAUREL REGIONAL MEDICAL CENTER MD 10.6
210056 MEDSTAR GOOD SAMARITAN HOSPITAL MD 8.4
210057 SHADY GROVE ADVENTIST HOSPITAL MD 11.7
210060 FORT WASHINGTON HOSPITAL MD 11
210061 ATLANTIC GENERAL HOSPITAL MD 10.8
21020F VA MARYLAND HEALTHCARE SYSTEM - BALTIMORE MD 12.6
I want to sort the DF by the Mortality column and then by alphabetical order of the Hospital column to handle ties.
I tried using different sort functions like
sorted <- work[order(work$Mortality,work$Hosptial),]
or
with(work, order(Mortality, Hospital))
but the final sorted outcome is wrong.
The output I got is
Hosptial State Mortality
CALVERT MEMORIAL HOSPITAL MD 10.1
HOWARD COUNTY GENERAL HOSPITAL MD 10.1
JOHNS HOPKINS HOSPITAL, THE MD 10.2
LAUREL REGIONAL MEDICAL CENTER MD 10.6
MEMORIAL HOSPITAL AT EASTON MD 10.6
PENINSULA REGIONAL MEDICAL CENTER MD 10.6
JOHNS HOPKINS BAYVIEW MEDICAL CENTER MD 10.7
ATLANTIC GENERAL HOSPITAL MD 10.8
MARYLAND GENERAL HOSPITAL MD 10.8
DOCTORS' COMMUNITY HOSPITAL MD 11
FORT WASHINGTON HOSPITAL MD 11
WASHINGTON ADVENTIST HOSPITAL MD 11
SAINT AGNES HOSPITAL MD 11.1
CHESTER RIVER HOSPITAL CENTER MD 11.2
MERCY MEDICAL CENTER INC MD 11.2
MEDSTAR UNION MEMORIAL HOSPITAL MD 11.3
HARFORD MEMORIAL HOSPITAL MD 11.5
SHADY GROVE ADVENTIST HOSPITAL MD 11.7
SOUTHERN MARYLAND HOSPITAL CENTER MD 11.7
ANNE ARUNDEL MEDICAL CENTER MD 12
MERITUS MEDICAL CENTER MD 12.5
NORTHWEST HOSPITAL CENTER MD 12.6
VA MARYLAND HEALTHCARE SYSTEM - BALTIMORE MD 12.6
WESTERN MARYLAND REGIONAL MEDICAL CENTER MD 12.6
BALTIMORE WASHINGTON MEDICAL CENTER MD 12.7
UNIVERSITY OF MARYLAND MEDICAL CENTER MD 12.7
EDWARD MCCREADY MEMORIAL HOSPITAL MD 12.9
UPPER CHESAPEAKE MEDICAL CENTER MD 12.9
PRINCE GEORGES HOSPITAL CENTER MD 13
MEDSTAR SAINT MARY'S HOSPITAL MD 13.1
GARRETT COUNTY MEMORIAL HOSPITAL MD 13.5
CIVISTA MEDICAL CENTER MD 14.2
GREATER BALTIMORE MEDICAL CENTER MD 7.4
MEDSTAR GOOD SAMARITAN HOSPITAL MD 8.4
MEDSTAR HARBOR HOSPITAL MD 9.2
MEDSTAR FRANKLIN SQUARE MEDICAL CENTER MD 9.3
MEDSTAR MONTGOMERY MEDICAL CENTER MD 9.3
SAINT JOSEPH MEDICAL CENTER MD 9.5
BON SECOURS HOSPITAL MD 9.6
HOLY CROSS HOSPITAL MD 9.6
CARROLL HOSPITAL CENTER MD 9.7
SINAI HOSPITAL OF BALTIMORE MD 9.7
FREDERICK MEMORIAL HOSPITAL MD 9.8
SUBURBAN HOSPITAL MD 9.9
UNION HOSPITAL OF CECIL COUNTY MD 9.9
The function gets 2 arguments:
state (check the original CSV data)
heart attack / heart failure / pneumonia
My complete code is
best <- function(states, outcomes)
{
  # patterns is obtained to use them in the regex function
  patterns <- paste("^Hospital.*", outcomes, sep = "")
  Readcsv <- read.csv("outcome-of-care-measures.csv", check.names = F)
  columnname <- colnames(Readcsv)
  # regex operation going on
  regex1 <- grep(patterns, columnname, ignore.case = TRUE, value = T)
  # another regex operation
  Extracted <- grep("Mortality", regex1, ignore.case = TRUE, value = T)
  # extract dataframe based on the state and final extracted column name using the regex function
  dfe <- subset(Readcsv, Readcsv$State == states & Readcsv[[Extracted]] != "Not Available")
  # create a vector
  b <- c("Hospital Name", "State", Extracted)
  # extract only those columns seen in the vector
  work <- dfe[, b]
  # change column name
  colnames(work) <- c("Hosptial", "State", "Mortality")
  # stuck after this point
  Ascorder <- work[with(work, order(Mortality, Hosptial)), ]
}
I am relatively new to Stack Overflow, so please excuse any formatting issues. I would like to understand where I'm going wrong.
You can use dplyr for this:
library(dplyr)
work <- work %>%
  arrange(Mortality, Hosptial)  # the column is spelled "Hosptial" in the question's code
I couldn't test it because you didn't give a reproducible example of your data, but it should do the trick.
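The output shown in the question (10.1 ... 14.2 followed by 7.4 ... 9.9) is consistent with Mortality being sorted as text instead of as numbers, which can happen when read.csv meets values such as "Not Available" in the column. The sketch below assumes that is the cause; the four-row work data frame is a cut-down illustration, not the asker's file.

```r
# Cut-down illustration; Mortality is deliberately stored as text
work <- data.frame(
  Hosptial = c("CALVERT MEMORIAL HOSPITAL",
               "GREATER BALTIMORE MEDICAL CENTER",
               "HOLY CROSS HOSPITAL",
               "BON SECOURS HOSPITAL"),
  State = "MD",
  Mortality = c("10.1", "7.4", "9.6", "9.6"),
  stringsAsFactors = FALSE
)

# as.character() first, so the conversion is also safe for a factor column
work$Mortality <- as.numeric(as.character(work$Mortality))

# Numeric sort on Mortality, with ties broken alphabetically by hospital name
sorted <- work[order(work$Mortality, work$Hosptial), ]
print(sorted)
```

The same conversion would be needed before dplyr's arrange(), since arrange() also sorts a character column alphabetically.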

subsetting by a variable name of a column r

row.names Hospital State Heart Attack Heart Failure
1 2275 PROVIDENCE MEMORIAL HOSPITAL TX 16.1 9.1
2 2276 MEMORIAL HERMANN BAPTIST ORANGE HOSPITAL TX 16.3 14.3
4 2278 UNITED REGIONAL HEALTH CARE SYSTEM TX 17.4 15.1
5 2279 ST JOSEPH REGIONAL HEALTH CENTER TX 15.7 15.6
6 2280 PARKLAND HEALTH AND HOSPITAL SYSTEM TX 12.9 11.2
7 2281 UNIVERSITY OF TEXAS MEDICAL BRANCH GAL TX 17.4 11.8
Hello R peeps, I need to get the row name where the input column, given as a variable column name (Heart Attack or Heart Failure), is at its minimum. In the example above, if I input "Heart Failure" it needs to return
[1] 2275
which is the row name in the first row. So far I got this:
inds<-subset(wfperstate, wfperstate[[outname]]==min)
where wfperstate is my data frame and outname is my input. Please help!
To transform my last comment into a function :
get_min_rowname <-
function(dat,col)
dat[which.min(dat[[col]]),"row.names"]
Then you apply it :
get_min_rowname(wfperstate, "Heart Attack")
get_min_rowname(wfperstate, "Heart Failure")
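The function above expects a regular column called "row.names". If the 2275, 2276, ... values are instead the data frame's actual row names (an assumption, since the question does not say), a variant using rownames() may be closer to what is needed. The name get_min_rowname_rn and the three-row wfperstate below are hypothetical.

```r
# Hypothetical variant: look the answer up in the row names, not in a column
get_min_rowname_rn <- function(dat, col) {
  rownames(dat)[which.min(dat[[col]])]
}

# Three rows taken from the question, with 2275 etc. as true row names
wfperstate <- data.frame(
  Hospital = c("PROVIDENCE MEMORIAL HOSPITAL",
               "MEMORIAL HERMANN BAPTIST ORANGE HOSPITAL",
               "PARKLAND HEALTH AND HOSPITAL SYSTEM"),
  State = "TX",
  `Heart Attack` = c(16.1, 16.3, 12.9),
  `Heart Failure` = c(9.1, 14.3, 11.2),
  row.names = c("2275", "2276", "2280"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

print(get_min_rowname_rn(wfperstate, "Heart Attack"))
print(get_min_rowname_rn(wfperstate, "Heart Failure"))
```

Note that which.min() returns only the first minimum, and row names come back as character strings.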

How to break ties with order function in R

I have a data frame with 2 columns. I have ordered them using order() function
data<-data[order(data$Mortality),]
head(data)
Hospital.Name Mortality
FORT DUNCAN MEDICAL CENTER 8.1
TOMBALL REGIONAL MEDICAL CENTER 8.5
DETAR HOSPITAL NAVARRO 8.7
CYPRESS FAIRBANKS MEDICAL CENTER 8.7
MISSION REGIONAL MEDICAL CENTER 8.8
METHODIST HOSPITAL,THE 8.8
3rd and 4th positions are ties (Mortality = 8.7 for both). I want to break the tie with alphabetical order in data$Hospital.Name so that "CYPRESS FAIRBANKS" is 3rd and "DETAR HOSPITAL" as 4th.
Use data$Hospital.Name as second argument in order:
R> data <- data[order(data$Mortality, data$Hospital.Name), ]
R> data
Hospital.Name Mortality
1 FORT DUNCAN MEDICAL CENTER 8.1
2 TOMBALL REGIONAL MEDICAL CENTER 8.5
4 CYPRESS FAIRBANKS MEDICAL CENTER 8.7
3 DETAR HOSPITAL NAVARRO 8.7
6 METHODIST HOSPITAL,THE 8.8
5 MISSION REGIONAL MEDICAL CENTER 8.8
