Sort column values alphabetically [duplicate] - r

This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 6 years ago.
HospitalName | Rating
-----------------------------------| ------
FORT DUNCAN MEDICAL CENTER | 8.1
TOMBALL REGIONAL MEDICAL CENTER | 8.5
DETAR HOSPITAL NAVARRO | 8.7
CYPRESS FAIRBANKS MEDICAL CENTER | 8.7
Here is my sample table. "DETAR HOSPITAL NAVARRO" and "CYPRESS FAIRBANKS MEDICAL CENTER" have the same Rating. I have sorted the table from lowest to highest rating, but I also need hospitals that share a rating to be sorted alphabetically by name, so "CYPRESS..." should come before "DETAR...".
Can anyone help me with this?

We can use order, with Rating as the primary sort key and HospitalName as the tie-breaker:
df1[order(df1$Rating, df1$HospitalName),]
# HospitalName Rating
#1 FORT DUNCAN MEDICAL CENTER 8.1
#2 TOMBALL REGIONAL MEDICAL CENTER 8.5
#4 CYPRESS FAIRBANKS MEDICAL CENTER 8.7
#3 DETAR HOSPITAL NAVARRO 8.7
If we are using dplyr, arrange is the way to go
library(dplyr)
df1 %>%
arrange(Rating, HospitalName)
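For anyone who wants to run the base R version end to end, here is a minimal sketch that rebuilds the sample table from the question and sorts it. The object name sorted is just an illustrative choice.

```r
# Sample data from the question, in the order originally shown.
df1 <- data.frame(
  HospitalName = c("FORT DUNCAN MEDICAL CENTER",
                   "TOMBALL REGIONAL MEDICAL CENTER",
                   "DETAR HOSPITAL NAVARRO",
                   "CYPRESS FAIRBANKS MEDICAL CENTER"),
  Rating = c(8.1, 8.5, 8.7, 8.7),
  stringsAsFactors = FALSE
)

# Primary key: Rating (ascending); tie-breaker: HospitalName (alphabetical).
sorted <- df1[order(df1$Rating, df1$HospitalName), ]
print(sorted)
```

The original row names (1, 2, 4, 3) are kept, which makes it easy to see that only the two tied rows swapped places.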

Related

Add a column to a dataframe with values based on another column [duplicate]

This question already has answers here:
several substitutions in one line R
(3 answers)
Closed 7 years ago.
I have a dataframe with a column called Province and I need to add a new column called Region. The value is based on the Province column. Here is the dataframe:
Province
1 Alberta
2 Manitoba
3 Ontario
4 British Columbia
5 Nova Scotia
6 New Brunswick
7 Quebec
Output:
Province Region
1 Alberta Prairies
2 Manitoba Prairies
3 Ontario Central
4 British Columbia Pacific
5 Nova Scotia East
6 New Brunswick East
7 Quebec East
I tried this code in R and it is not working.
Region <- as.character(Province)
if (length(grep("British Comlumbia", Province)) > 0) {
return("Pacific")
}
You can create vectors and do a step-wise replacement. This may not be the most elegant way, but it works.
Prairies <- c("Alberta","Manitoba")
Central <- c("Ontario")
Pacific <- c("British Columbia")
East <- c("Nova Scotia","New Brunswick","Quebec")
# make a copy of the Province column
df$Region <- as.vector(df[,1])
# one by one, replace the items based on your vectors
df$Region <- replace(df$Region, df$Region %in% Prairies, "Prairies")
df$Region <- replace(df$Region, df$Region %in% Central, "Central")
df$Region <- replace(df$Region, df$Region %in% Pacific, "Pacific")
df$Region <- replace(df$Region, df$Region %in% East, "East")
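A possible alternative to the step-wise replacement is a single named lookup vector. This is only a sketch: region_lookup is a hypothetical name, and the province-to-region pairs are taken from the expected output in the question.

```r
df <- data.frame(
  Province = c("Alberta", "Manitoba", "Ontario", "British Columbia",
               "Nova Scotia", "New Brunswick", "Quebec"),
  stringsAsFactors = FALSE
)

# Named character vector: names are provinces, values are regions.
region_lookup <- c(
  "Alberta" = "Prairies", "Manitoba" = "Prairies",
  "Ontario" = "Central",
  "British Columbia" = "Pacific",
  "Nova Scotia" = "East", "New Brunswick" = "East", "Quebec" = "East"
)

# Index the lookup by province name; unname() drops the carried-over names.
df$Region <- unname(region_lookup[df$Province])
print(df)
```

A province that is missing from the lookup ends up as NA in Region, which makes misspellings easy to spot.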

How to sort alphabetically rows of a data frame? [duplicate]

This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 8 years ago.
I am trying to sort c alphabetically whenever x[i] == x[i+1]. I used the order() function, but it changes the x column as well. I want to reorder entire rows:
best <- function(state){
  HospitalName<-vector()
  StateName<-vector()
  HeartAttack<-vector()
  k<-1
  outcome<-read.csv("outcome-of-care-measures.csv",colClasses= "character")
  temp<-(outcome[,c(2,7,11,17,23)])
  for (i in 1:nrow(temp)){
    if(identical(state,temp[i,2])==TRUE){
      HospitalName[k]<-temp[i,1]
      StateName[k]<-temp[i,2]
      HeartAttack[k]<-as.numeric(temp[i,4])
      k<-k+1
    }}
  frame<-data.frame(cbind(HospitalName,StateName,HeartAttack))
  library(dplyr)
  frame %>%
    group_by(as.numeric(as.character(frame[,3]))) %>%
    arrange(frame[,1])
}
Output:
HospitalName StateName HeartAttack
1 FORT DUNCAN MEDICAL CENTER TX 8.1
2 TOMBALL REGIONAL MEDICAL CENTER TX 8.5
3 CYPRESS FAIRBANKS MEDICAL CENTER TX 8.7
4 DETAR HOSPITAL NAVARRO TX 8.7
5 METHODIST HOSPITAL,THE TX 8.8
6 MISSION REGIONAL MEDICAL CENTER TX 8.8
7 BAYLOR ALL SAINTS MEDICAL CENTER AT FW TX 8.9
8 SCOTT & WHITE HOSPITAL-ROUND ROCK TX 8.9
9 THE HEART HOSPITAL BAYLOR PLANO TX 9
10 UT SOUTHWESTERN UNIVERSITY HOSPITAL TX 9
.. ... ... ...
Variables not shown: as.numeric(as.character(frame[, 3])) (dbl)
The output does not contain the HeartAttack column, and I do not understand why.
One solution with dplyr:
library(dplyr)
df %>%
group_by(x) %>%
arrange(c)
Or, as @Akrun mentions in the comments below, just
df %>%
arrange(x,c)
if you are not interested in grouping. Depends on what you want.
Output:
Source: local data frame [5 x 2]
Groups: x
x c
1 2 A
2 2 D
3 3 B
4 3 C
5 5 E
There is another solution in base R but it will only work if your x column is ordered as is, or if you don't mind changing the order it has:
> df[order(df$x, df$c), , drop = FALSE]
x c
2 2 A
1 2 D
4 3 B
3 3 C
5 5 E
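Applied to the function in the question, the row-by-row loop could be replaced by subsetting plus a single order call on whole rows. This is a sketch under assumptions: the small outcome data frame stands in for outcome-of-care-measures.csv, "SOME OTHER HOSPITAL" is a made-up row, and best_sorted is a hypothetical name.

```r
# Made-up stand-in for the CSV; the last row is invented for illustration.
outcome <- data.frame(
  HospitalName = c("DETAR HOSPITAL NAVARRO", "FORT DUNCAN MEDICAL CENTER",
                   "CYPRESS FAIRBANKS MEDICAL CENTER", "SOME OTHER HOSPITAL"),
  StateName = c("TX", "TX", "TX", "AL"),
  HeartAttack = c("8.7", "8.1", "8.7", "9.0"),
  stringsAsFactors = FALSE
)

best_sorted <- function(outcome, state) {
  # Keep only the rows for the requested state (no loop needed).
  frame <- outcome[outcome$StateName == state, ]
  # The values were read as character, so convert before sorting numerically.
  frame$HeartAttack <- as.numeric(frame$HeartAttack)
  # Sort whole rows: by rate first, then by hospital name within ties.
  frame[order(frame$HeartAttack, frame$HospitalName), ]
}

result <- best_sorted(outcome, "TX")
print(result)
```

Because order returns row positions and the whole data frame is indexed with them, every column moves together and no column is dropped.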

Issue with sorting one column after rank is assigned

This deals with a question asked in a Coursera course, so I may not be able to reveal the complete code.
Hi,
Below is my data frame (outcome_H):
Hospital_Name H_A H_F PN
ABC 4.5 5 6
CDE 4.5 1 3
EFG 5 2 1
1) I need to rank the column named in the function call (it could be one of H_A, H_F, PN).
2) A rank is also provided in the call. I need to match that rank against the ranks calculated above and return the corresponding Hospital_Name.
I used ties.method="first" to handle ties. However, in the final output the hospital names are not sorted.
Example: if I give rank = 2, I expect CDE to be printed, but for some reason I am not aware of, ABC gets printed for rank = 2 and CDE is printed for rank = 1.
Below are some parts of the code for better understanding:
H_A <- as.numeric(outcome_H$H_A)
HA <- H_A[order(H_A)] # newly added piece to order the values
df <- data.frame(HA,round(rank(HA,ties.method="first")),outcome_H$Hospital_Name)
rowss <- df[order(df$round.rank.HA..),]
Before ordering Output:
HA round.rank.HA.. outcome_H.Hospital.Name
42 8.1 1 FORT DUNCAN MEDICAL CENTER
192 8.5 2 TOMBALL REGIONAL MEDICAL CENTER
61 8.7 4 DETAR HOSPITAL NAVARRO
210 8.7 4 CYPRESS FAIRBANKS MEDICAL CENTER
69 8.8 6 MISSION REGIONAL MEDICAL CENTER
117 8.8 6 METHODIST HOSPITAL,THE
After Ordering output:
HA round.rank.HA..ties.method....first... outcome_H.Hospital.Name
1 8.1 1 PROVIDENCE MEMORIAL HOSPITAL
2 8.5 2 MEMORIAL HERMANN BAPTIST ORANGE HOSPITAL
3 8.7 3 PETERSON REGIONAL MEDICAL CENTER
4 8.7 4 CHILDREN'S HOSPITAL -SCOTT & WHITE HEALTHCARE
5 8.8 5 UNITED REGIONAL HEALTH CARE SYSTEM
6 8.8 6 ST JOSEPH REGIONAL HEALTH CENTER
As you can see, the data with hospital names are completely incorrect.
Any help is very much appreciated.
Thanks,
Pravellika J
You could try H_A <- as.numeric(as.character(outcome_H$H_A)), which converts the values correctly if the column was read in as a factor.
Output
HA round.rank.HA..ties.method....first... outcome_H.Hospital_Name
1 4.5 1 ABC
2 4.5 2 CDE
3 5.0 3 EFG
I figured it out myself. I had initially built HA from only one of the three columns (H_A, H_F, PN). Now I combine it with Hospital_Name and order on both attributes.
Thanks,
Pravellika J
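The fix described above can be sketched as follows: sort whole rows rather than the value column alone, so each hospital name stays attached to its value, then pick the row at the requested position. The helper hospital_at_rank and its arguments are hypothetical names; the data is the small example from the question.

```r
outcome_H <- data.frame(
  Hospital_Name = c("ABC", "CDE", "EFG"),
  H_A = c(4.5, 4.5, 5),
  H_F = c(5, 1, 2),
  PN  = c(6, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the hospital at position `num` for column `col`.
hospital_at_rank <- function(data, col, num) {
  # Order whole rows so names stay attached to their values;
  # ties are broken alphabetically by hospital name.
  sorted <- data[order(data[[col]], data$Hospital_Name), ]
  sorted$Hospital_Name[num]
}

print(hospital_at_rank(outcome_H, "H_A", 2))
```

Sorting H_A on its own and then binding it to the unsorted name column is what pairs values with the wrong hospitals.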

Order a data frame using character and numeric columns

I have a dataframe:
df <- data.frame(name = c("FORT DUNCAN", "DETAR HOSPITAL", "CYPRESS FAIRBANKS", "MISSION REGIONAL", "Test"), rate = c(8.0, 8.7, 8.7, 8.1, 8.9))
ordered_df <- df[order(df[,2]),]
name rate
1 FORT DUNCAN 8.0
4 MISSION REGIONAL 8.1
2 DETAR HOSPITAL 8.7
3 CYPRESS FAIRBANKS 8.7
5 Test 8.9
I can order the dataframe by the rate variable. However, if two rates are equal, I want to order those rows by name. For example, Detar Hospital and Cypress Fairbanks both have a rate of 8.7, so I want Cypress Fairbanks to move up and Detar Hospital to move down, while Test stays in its place (last, according to its rate).
Any ideas???
Cheers
I think I fixed it by:
ordered_df <- df[order(df$rate, df$name),]
Cheers
Since order accepts many variables via ... you can do the following:
> df[order(df[,2],df[,1] ),]
name rate
1 FORT DUNCAN 8.0
4 MISSION REGIONAL 8.1
3 CYPRESS FAIRBANKS 8.7
2 DETAR HOSPITAL 8.7
5 Test 8.9
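When the sort columns are held in a vector of names rather than typed out, the same call can be built with do.call. This is a sketch, not the only way to do it; sort_cols and idx are hypothetical names.

```r
df <- data.frame(
  name = c("FORT DUNCAN", "DETAR HOSPITAL", "CYPRESS FAIRBANKS",
           "MISSION REGIONAL", "Test"),
  rate = c(8.0, 8.7, 8.7, 8.1, 8.9),
  stringsAsFactors = FALSE
)

# Sort keys in priority order.
sort_cols <- c("rate", "name")

# Pass each key column to order() as a separate positional argument.
# unname() keeps column names from being treated as named arguments.
idx <- do.call(order, unname(as.list(df[sort_cols])))
ordered_df <- df[idx, ]
print(ordered_df)
```

This is equivalent to order(df$rate, df$name) for this data; the do.call form only pays off when the list of keys is decided at run time.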

How to break ties with order function in R

I have a data frame with 2 columns. I have ordered it using the order() function:
data<-data[order(data$Mortality),]
head(data)
Hospital.Name Mortality
FORT DUNCAN MEDICAL CENTER 8.1
TOMBALL REGIONAL MEDICAL CENTER 8.5
DETAR HOSPITAL NAVARRO 8.7
CYPRESS FAIRBANKS MEDICAL CENTER 8.7
MISSION REGIONAL MEDICAL CENTER 8.8
METHODIST HOSPITAL,THE 8.8
The 3rd and 4th positions are tied (Mortality = 8.7 for both). I want to break the tie using alphabetical order of data$Hospital.Name, so that "CYPRESS FAIRBANKS" is 3rd and "DETAR HOSPITAL" is 4th.
Use data$Hospital.Name as the second argument to order:
R> data <- data[order(data$Mortality, data$Hospital.Name), ]
R> data
Hospital.Name Mortality
1 FORT DUNCAN MEDICAL CENTER 8.1
2 TOMBALL REGIONAL MEDICAL CENTER 8.5
4 CYPRESS FAIRBANKS MEDICAL CENTER 8.7
3 DETAR HOSPITAL NAVARRO 8.7
6 METHODIST HOSPITAL,THE 8.8
5 MISSION REGIONAL MEDICAL CENTER 8.8
