Order a data frame using character and numeric columns - r

I have a dataframe:
df <- data.frame(name = c("FORT DUNCAN", "DETAR HOSPITAL", "CYPRESS FAIRBANKS", "MISSION REGIONAL", "Test"), rate = c(8.0, 8.7, 8.7, 8.1, 8.9))
ordered_df <- df[order(df[,2]),]
name rate
1 FORT DUNCAN 8.0
4 MISSION REGIONAL 8.1
2 DETAR HOSPITAL 8.7
3 CYPRESS FAIRBANKS 8.7
5 Test 8.9
I can order the dataframe by the rate variable. However, if two rates are equal, I want to break the tie by name. For example, Detar Hospital and Cypress Fairbanks have the same rate of 8.7, so I want Cypress Fairbanks to move up and Detar Hospital to move down, while Test should stay where it is (in last place according to the rate).
Any ideas???
Cheers

I think I fixed it by:
ordered_df <- df[order(df$rate, df$name),]
Cheers

Since order accepts many variables via ... you can do the following:
> df[order(df[,2],df[,1] ),]
name rate
1 FORT DUNCAN 8.0
4 MISSION REGIONAL 8.1
3 CYPRESS FAIRBANKS 8.7
2 DETAR HOSPITAL 8.7
5 Test 8.9
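As a self-contained check of the tie-breaking described above, here is a minimal sketch that rebuilds the sample data and orders by `rate` first and `name` second. The `stringsAsFactors = FALSE` argument and the use of `with()` are my own choices for the illustration, not part of the original answers.

```r
# Rebuild the sample data from the question (names kept as plain strings).
df <- data.frame(
  name = c("FORT DUNCAN", "DETAR HOSPITAL", "CYPRESS FAIRBANKS",
           "MISSION REGIONAL", "Test"),
  rate = c(8.0, 8.7, 8.7, 8.1, 8.9),
  stringsAsFactors = FALSE
)

# order() sorts by its first argument and uses later arguments only to
# break ties, so equal rates fall back to alphabetical order of name.
ordered_df <- df[with(df, order(rate, name)), ]
print(ordered_df)
```

Only the two rows with a rate of 8.7 swap places compared with ordering by `rate` alone; every other row keeps its position.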

Related

Sort column values alphabetically [duplicate]

HospitalName | Rating
-----------------------------------| ------
FORT DUNCAN MEDICAL CENTER | 8.1
TOMBALL REGIONAL MEDICAL CENTER | 8.5
DETAR HOSPITAL NAVARRO | 8.7
CYPRESS FAIRBANKS MEDICAL CENTER | 8.7
Here is my sample table. You can see that "DETAR HOSPITAL" and "CYPRESS FAIRBANKS" have the same Rating. I have sorted the table from lowest to highest rating, but I also need the hospital names that share a Rating to be sorted alphabetically: "CYPRESS..." should come first and then "DETAR...", because they have the same Rating.
Can anyone help me with this?
We can use order
df1[order(df1$Rating, df1$HospitalName),]
# HospitalName Rating
#1 FORT DUNCAN MEDICAL CENTER 8.1
#2 TOMBALL REGIONAL MEDICAL CENTER 8.5
#4 CYPRESS FAIRBANKS MEDICAL CENTER 8.7
#3 DETAR HOSPITAL NAVARRO 8.7
If we are using dplyr, arrange is the way to go
library(dplyr)
df1 %>%
arrange(Rating, HospitalName)
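Note that in the base R output above the original row names (`#4` before `#3`) travel with the rows. If sequential row names are wanted after sorting, one possible follow-up step is sketched below; the `df1` construction is reconstructed from the question's table, and `sorted` is a name I introduce for illustration.

```r
# Sample data reconstructed from the question's table.
df1 <- data.frame(
  HospitalName = c("FORT DUNCAN MEDICAL CENTER",
                   "TOMBALL REGIONAL MEDICAL CENTER",
                   "DETAR HOSPITAL NAVARRO",
                   "CYPRESS FAIRBANKS MEDICAL CENTER"),
  Rating = c(8.1, 8.5, 8.7, 8.7),
  stringsAsFactors = FALSE
)

# Sort by Rating, breaking ties alphabetically by HospitalName.
sorted <- df1[order(df1$Rating, df1$HospitalName), ]

# Drop the old row names so the rows are numbered 1..n in the new order.
rownames(sorted) <- NULL
print(sorted)
```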

Adding a row based upon conditionals -- trying to do it the most R way

I have a data set that records the averages of air pollution coming from different kinds of monitors by county and year. If the monitor is known to only be Monitor 1 it is coded as such, otherwise the average is coded as "all". If there isn't anything other than Monitor 1 though, so far there isn't an All. I want to take the values of Monitor 1 and create a new row with the exact same information labeled as All, but only if All doesn't already exist. Example:
Year County Type Average
2001 Adams Monitor 1 8.9
2001 Benton Monitor 1 6.5
2001 Benton All 7.1
In this case, I would want it to become:
Year County Type Average
2001 Adams Monitor 1 8.9
2001 Adams All 8.9 ***identical to the above
2001 Benton Monitor 1 6.5
2001 Benton All 7.1
I can think of a few kludgy, convoluted starts to doing this, or I could try to mess with conditionals. But I am trying to improve my R ability and keep my coding consistent with how R works best (there's a phrase for this I'm forgetting...!) Does anyone have any suggestions?
As a first step, I would use the ave function to determine if each row is of Type "Monitor 1" and is the only row for a particular county:
(to.duplicate <- ave(as.character(dat$Type), dat$County, FUN=function(x) if(identical(x, "Monitor 1")) { TRUE } else {rep(FALSE, length(x))}) == "TRUE")
# [1] TRUE FALSE FALSE
Then I would generate all the new rows in one shot and use rbind to add it to the data frame:
new.dat <- dat[to.duplicate,]
new.dat$Type <- "All"
rbind(dat, new.dat)
# Year County Type Average
# 1 2001 Adams Monitor 1 8.9
# 2 2001 Benton Monitor 1 6.5
# 3 2001 Benton All 7.1
# 4 2001 Adams All 8.9
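A variation on the same idea, in case the real data covers several years: the sketch below groups by both Year and County (an assumption on my part, since the answer above groups by County only) and then re-sorts so that each new "All" row sits directly under its "Monitor 1" row. The names `has_all`, `new_rows` and `result` are hypothetical.

```r
# Sample data from the question.
dat <- data.frame(
  Year    = c(2001, 2001, 2001),
  County  = c("Adams", "Benton", "Benton"),
  Type    = c("Monitor 1", "Monitor 1", "All"),
  Average = c(8.9, 6.5, 7.1),
  stringsAsFactors = FALSE
)

# For every row, flag whether its Year/County group already has an "All" row.
has_all <- ave(dat$Type == "All", dat$Year, dat$County, FUN = any)

# Copy the "Monitor 1" rows of groups without an "All" row and relabel them.
new_rows <- dat[dat$Type == "Monitor 1" & !has_all, ]
new_rows$Type <- "All"

# Append the copies, then sort so "Monitor 1" precedes "All" within a group.
result <- rbind(dat, new_rows)
result <- result[order(result$Year, result$County, result$Type == "All"), ]
print(result)
```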

Issue with sorting one column after rank is assigned

*****This is to deal with the question asked in Coursera and hence I may not be able to reveal the complete code*****
hi,
below is my data frame (outcome_H)
Hospital_Name H_A H_F PN
ABC 4.5 5 6
CDE 4.5 1 3
EFG 5 2 1
1) I need to rank the column provided in the function call (it could be one of H_A, H_F, PN).
2) A rank will also be provided in the call. I need to match that rank against the rank calculated above and return the corresponding Hospital_Name.
I used ties.method="first" to solve the tie problem. However, when I look at the final output, the hospital names are not sorted.
Example: if I give rank = 2, I expect CDE to be printed, but due to some problem (which I am not aware of) ABC gets printed for rank = 2 and CDE is printed for rank = 1.
Below are some parts of the code for better understanding:
H_A <- as.numeric(outcome_H$H_A)
HA <- H_A[order(H_A)] # newly added piece to order the values
df <- data.frame(HA,round(rank(HA,ties.method="first")),outcome_H$Hospital_Name)
rowss <- df[order(df$round.rank.HA..),]
Before ordering Output:
HA round.rank.HA.. outcome_H.Hospital.Name
42 8.1 1 FORT DUNCAN MEDICAL CENTER
192 8.5 2 TOMBALL REGIONAL MEDICAL CENTER
61 8.7 4 DETAR HOSPITAL NAVARRO
210 8.7 4 CYPRESS FAIRBANKS MEDICAL CENTER
69 8.8 6 MISSION REGIONAL MEDICAL CENTER
117 8.8 6 METHODIST HOSPITAL,THE
After Ordering output:
HA round.rank.HA..ties.method....first... outcome_H.Hospital.Name
1 8.1 1 PROVIDENCE MEMORIAL HOSPITAL
2 8.5 2 MEMORIAL HERMANN BAPTIST ORANGE HOSPITAL
3 8.7 3 PETERSON REGIONAL MEDICAL CENTER
4 8.7 4 CHILDREN'S HOSPITAL -SCOTT & WHITE HEALTHCARE
5 8.8 5 UNITED REGIONAL HEALTH CARE SYSTEM
6 8.8 6 ST JOSEPH REGIONAL HEALTH CENTER
As you can see, the data with hospital names are completely incorrect.
Any help is very much appreciated.
Thanks,
Pravellika J
You could try H_A <- as.numeric(as.character(outcome_H$H_A))
Output
HA round.rank.HA..ties.method....first... outcome_H.Hospital_Name
1 4.5 1 ABC
2 4.5 2 CDE
3 5.0 3 EFG
I figured it out myself. I had initially assigned HA from only one of the three columns (H_A, H_F, PN). Now I combine it with Hospital_Name and order on both attributes.
Thanks,
Pravellika J
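The fix described above (ordering the chosen column together with the hospital name, instead of sorting the values on their own) might look roughly like the sketch below. This is not the asker's actual code: `hospital_at_rank` is a hypothetical helper, and the data frame is rebuilt from the small example in the question with the columns stored as strings to mimic unconverted input.

```r
# Small example from the question; numeric columns stored as text on purpose.
outcome_H <- data.frame(
  Hospital_Name = c("ABC", "CDE", "EFG"),
  H_A = c("4.5", "4.5", "5"),
  H_F = c("5", "1", "2"),
  PN  = c("6", "3", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the hospital at a given rank for one column.
hospital_at_rank <- function(data, column, rank) {
  # as.character() first, so factor input converts by label, not by code.
  values <- as.numeric(as.character(data[[column]]))
  # One permutation for both keys keeps names attached to their values;
  # ties in the values are broken alphabetically by hospital name.
  ord <- order(values, data$Hospital_Name)
  data$Hospital_Name[ord][rank]
}

print(hospital_at_rank(outcome_H, "H_A", 2))
```

Because a single `order()` call produces the permutation, the names can no longer drift away from the values the way they did when `H_A` was sorted separately.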

Rank a sorted dataset using apply function

My dataframe looks like this:
head(temp$HName)
[1] "UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER"
[2] "METHODIST HOSPITAL,THE"
[3] "TOMBALL REGIONAL MEDICAL CENTER"
[4] "METHODIST SUGAR LAND HOSPITAL"
[5] "GULF COAST MEDICAL CENTER"
[6] "VHS HARLINGEN HOSPITAL COMPANY LLC"
head(temp$Rate)
[1] 7.3 8.3 8.7 8.7 8.8 8.9
76 Levels: 7.3 8.3 8.7 8.8 8.9 9 9.1 9.2 9.3 9.4 9.5 9.6 ... 17.1
> head(temp$Rank)
[1] NA NA NA NA NA NA
temp$Rate is sorted. I am trying to write a function assignRank that fills a new column temp$Rank with the values 1, 2, 3, 3, 4, 5 for the rows shown above.
My code is as below:
tapply(temp$Rank,temp$Rate, assignRank)
where :
assignRank<- function(r=1){
temp$Rank <- r
r <- r + 1
return(r)
}
I get following error when running tapply
tapply(temp$Rank,temp$Rate, assignRank)
Error in `$<-.data.frame`(`*tmp*`, "Rank", value = c(NA, NA)) :
replacement has 2 rows, data has 301
Please advise where I am going wrong?
I use data.table for stuff like this, because both sorting and ranking are efficient and the syntax is simple
library(data.table)
setkey(setDT(temp), Rate) # This will sort your data set by Rate in case it's not yet sorted
temp[, Rank := .GRP, by = Rate]
temp
# HName Rate Rank
# 1: UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER 7.3 1
# 2: METHODIST HOSPITAL,THE 8.3 2
# 3: TOMBALL REGIONAL MEDICAL CENTER 8.7 3
# 4: METHODIST SUGAR LAND HOSPITAL 8.7 3
# 5: GULF COAST MEDICAL CENTER 8.8 4
# 6: VHS HARLINGEN HOSPITAL COMPANY LLC 8.9 5
Or you could easily do the same using base R (assuming your data is sorted by Rate):
as.numeric(factor(temp$Rate))
## [1] 1 2 3 3 4 5
Or you could use the dense_rank function from the dplyr package (which does not require the data set to be sorted)
library(dplyr)
temp %>%
mutate(Rank = dense_rank(Rate))
# HName Rate Rank
# 1 UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER 7.3 1
# 2 METHODIST HOSPITAL,THE 8.3 2
# 3 TOMBALL REGIONAL MEDICAL CENTER 8.7 3
# 4 METHODIST SUGAR LAND HOSPITAL 8.7 3
# 5 GULF COAST MEDICAL CENTER 8.8 4
# 6 VHS HARLINGEN HOSPITAL COMPANY LLC 8.9 5
Other options (if the data is ordered)
with(temp, cumsum(ave(Rate, Rate, FUN=function(x) c(1,x[-1]!=x[-length(x)]))))
#[1] 1 2 3 3 4 5
with(temp, match(Rate, unique(Rate)) )
#[1] 1 2 3 3 4 5
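Since the question's output shows `Rate` stored as a factor (`76 Levels: ...`), a base R variant that first converts it to numbers may be safer when the level order is not guaranteed to be numeric. The sketch below is one way to do that under that assumption; `rate_num` is a name I introduce, and the six rows are taken from the `head()` output in the question.

```r
# First six rows from the question, with Rate as a factor as in the output.
temp <- data.frame(
  HName = c("UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER",
            "METHODIST HOSPITAL,THE",
            "TOMBALL REGIONAL MEDICAL CENTER",
            "METHODIST SUGAR LAND HOSPITAL",
            "GULF COAST MEDICAL CENTER",
            "VHS HARLINGEN HOSPITAL COMPANY LLC"),
  Rate = factor(c("7.3", "8.3", "8.7", "8.7", "8.8", "8.9")),
  stringsAsFactors = FALSE
)

# Convert factor labels (not the internal codes) to numbers.
rate_num <- as.numeric(as.character(temp$Rate))

# Dense rank: position of each rate among the sorted distinct rates.
# Tied rates share a rank and no rank is skipped.
temp$Rank <- match(rate_num, sort(unique(rate_num)))
print(temp[, c("Rate", "Rank")])
```

Because the distinct rates are sorted explicitly, this version does not depend on the rows already being in rate order.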

How to break ties with order function in R

I have a data frame with 2 columns. I have ordered them using order() function
data<-data[order(data$Mortality),]
head(data)
Hospital.Name Mortality
FORT DUNCAN MEDICAL CENTER 8.1
TOMBALL REGIONAL MEDICAL CENTER 8.5
DETAR HOSPITAL NAVARRO 8.7
CYPRESS FAIRBANKS MEDICAL CENTER 8.7
MISSION REGIONAL MEDICAL CENTER 8.8
METHODIST HOSPITAL,THE 8.8
The 3rd and 4th positions are tied (Mortality = 8.7 for both). I want to break the tie using alphabetical order in data$Hospital.Name, so that "CYPRESS FAIRBANKS" is 3rd and "DETAR HOSPITAL" is 4th.
Use data$Hospital.Name as second argument in order:
R> data <- data[order(data$Mortality, data$Hospital.Name), ]
R> data
Hospital.Name Mortality
1 FORT DUNCAN MEDICAL CENTER 8.1
2 TOMBALL REGIONAL MEDICAL CENTER 8.5
4 CYPRESS FAIRBANKS MEDICAL CENTER 8.7
3 DETAR HOSPITAL NAVARRO 8.7
6 METHODIST HOSPITAL,THE 8.8
5 MISSION REGIONAL MEDICAL CENTER 8.8
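The same pattern extends to mixed directions. As a hedged illustration (the question itself only asks for ascending order), the sketch below sorts from highest to lowest Mortality while still breaking ties alphabetically, by negating the numeric key. `worst_first` is a hypothetical name, and the data frame is rebuilt from the question's output.

```r
# Sample data rebuilt from the question's output.
data <- data.frame(
  Hospital.Name = c("FORT DUNCAN MEDICAL CENTER",
                    "TOMBALL REGIONAL MEDICAL CENTER",
                    "DETAR HOSPITAL NAVARRO",
                    "CYPRESS FAIRBANKS MEDICAL CENTER",
                    "MISSION REGIONAL MEDICAL CENTER",
                    "METHODIST HOSPITAL,THE"),
  Mortality = c(8.1, 8.5, 8.7, 8.7, 8.8, 8.8),
  stringsAsFactors = FALSE
)

# Negating a numeric key reverses its direction only; the second key
# still breaks ties in ascending (alphabetical) order.
worst_first <- data[order(-data$Mortality, data$Hospital.Name), ]
print(worst_first)
```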
