R - functions throws bad results

Explanation
I'm trying to write a function that finds the lowest value in one of these columns and returns the hospital name:
"Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
"Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
"Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
I cannot understand why my results differ from the samples in the PDF. Please bear in mind that I'm new to R.
Example
For best("TX", "heart attack") the function should return "CYPRESS FAIRBANKS MEDICAL CENTER". Instead, my function returns the following (note that the correct result isn't even in this vector):
[1] "HEREFORD REGIONAL MEDICAL CENTER"
[2] "EAST TEXAS MEDICAL CENTER MOUNT VERNON"
[3] "ALLEGIANCE SPECIALTY HOSPITAL OF KILGORE"
[4] "KNOX COUNTY HOSPITAL"
[5] "EAST TEXAS MEDICAL CENTER TRINITY"
[6] "COMMUNITY GENERAL HOSPITAL"
[7] "KELL WEST REGIONAL HOSPITAL"
[8] "GOOD SHEPHARD MEDICAL CENTER-LINDEN"
[9] "BURLESON ST JOSEPH HEALTH CENTER"
[10] "MCCAMEY HOSPITAL"
[11] "FISHER COUNTY HOSPITAL DISTRICT"
[12] "HANSFORD COUNTY HOSPITAL"
[13] "ST LUKES LAKESIDE HOSPITAL"
Code
best <- function(state, outcome) {
  file <- read.csv("outcome-of-care-measures.csv")
  data <- file[file$State == state, ]
  if (outcome == "heart attack") {
    number <- 15 #column number
  } else if (outcome == "heart failure") {
    number <- 21
  } else if (outcome == "pneumonia") {
    number <- 27
  }
  col <- as.numeric(data[, number]) #column data
  lowest <- min(col, na.rm = TRUE) #lowest number
  data$Hospital.Name[data[, number] == lowest] #result
}
Sources
Data I work with
PDF with instructions Check point 2.

I'm posting the solution: after an hour of searching I found it! Early on I accidentally copied the wrong column numbers from the documentation.
Column numbers are incorrect.
Solution
Simply change the wrong column numbers (15, 21, 27) to (11, 17, 23).
Thanks
Thank you for your answers, it increased my knowledge. Have a nice weekend.
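One way to avoid miscounting column positions is to select the column by name instead. The sketch below is only an illustration: best_in() is a hypothetical helper that takes the data frame as an argument, the toy rows are made up, and it assumes the column names quoted in the question match the CSV header as read by read.csv(). It also converts the column through as.character() first, in case the rates were read in as factors or contain text such as "Not Available".

```r
# Hypothetical helper: same idea as best(), but columns are chosen by name.
best_in <- function(df, state, outcome) {
  cols <- c(
    "heart attack"  = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack",
    "heart failure" = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure",
    "pneumonia"     = "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
  )
  if (!outcome %in% names(cols)) stop("invalid outcome")
  rows <- df[df$State == state, ]
  # Non-numeric entries become NA; the coercion warning is expected here.
  rate <- suppressWarnings(as.numeric(as.character(rows[[cols[[outcome]]]])))
  rows$Hospital.Name[which(rate == min(rate, na.rm = TRUE))]
}

# Made-up toy data, for illustration only.
toy <- data.frame(
  Hospital.Name = c("A", "B", "C", "D"),
  State = c("TX", "TX", "TX", "AL"),
  Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack =
    c("14.2", "Not Available", "12.0", "9.9"),
  stringsAsFactors = FALSE
)

result <- best_in(toy, "TX", "heart attack")
print(result)
```

With the real file you would pass the result of read.csv("outcome-of-care-measures.csv") as df.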

Related

Using grepl to subset dataframe containing the same mentioning of some text in two columns

I'm working on a dataframe (account) with two columns containing "posting" IP location (in the column city) and the locations at the time when those accounts were first registered (in the column register). I'm using grepl() to subset rows whose posting location and register location are both from the state of New York (NY). Below are part of the data and my code for subsetting the desired output:
account <- data.frame(city = c("Beijing, China", "New York, NY", "Hoboken, NJ", "Los Angeles, CA", "New York, NY", "Bloomington, IN"),
register = c("New York, NY", "New York, NY", "Wilwaukee, WI", "Rochester, NY", "New York, NY", "Tokyo, Japan"))
sub_data <- subset(account, grepl("NY", city) == "NY" & grepl("NY", register) == "NY")
sub_data
[1] city register
<0 rows> (or 0-length row.names)
My code didn't work and returned 0 row (while at least two rows should have met my selection criterion). What went wrong in my code?
I have referenced this previous thread before lodging this question.

The function grepl already returns a logical vector, so just use the following:
sub_data <- subset(account,
grepl("NY", city) & grepl("NY", register)
)
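By writing grepl("NY", city) == "NY" you are asking R whether each value in FALSE TRUE FALSE FALSE TRUE FALSE is equal to the string "NY". The logical values are coerced to the strings "FALSE" and "TRUE" for the comparison, so every element is FALSE and no row is selected.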
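To see both behaviours side by side, here is a small self-contained check using the data from the question. It uses plain logical indexing instead of subset(), which is just a stylistic choice.

```r
account <- data.frame(
  city = c("Beijing, China", "New York, NY", "Hoboken, NJ",
           "Los Angeles, CA", "New York, NY", "Bloomington, IN"),
  register = c("New York, NY", "New York, NY", "Wilwaukee, WI",
               "Rochester, NY", "New York, NY", "Tokyo, Japan"),
  stringsAsFactors = FALSE
)

# The original condition: a logical vector compared with the string "NY".
wrong <- grepl("NY", account$city) == "NY"

# The corrected condition: combine the two logical vectors directly.
in_ny <- grepl("NY", account$city) & grepl("NY", account$register)
sub_data <- account[in_ny, ]

print(wrong)
print(sub_data)
```

Note that the pattern "NY" matches anywhere in the string; if that is too loose for your real data, a stricter pattern such as ", NY$" may be worth considering.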

Find out POI (within 2km) using latitude and longitude

I have a dataset consisting of zip codes along with their latitude and longitude. I want to find the list of hospitals/banks within 2 km of each latitude/longitude pair.
How can I do it?
The Long/Lat data looks like
store_zip lon lat
410710 73.8248981 18.5154681
410209 73.0907 19.0218215
400034 72.8148177 18.9724162
400001 72.836334 18.9385352
400102 72.834424 19.1418961
400066 72.8635299 19.2313448
400078 72.9327444 19.1570343
400078 72.9327444 19.1570343
400007 72.8133825 18.9618411
400050 72.8299518 19.0551695
400062 72.8426858 19.1593396
400083 72.9374227 19.1166191
400603 72.9781047 19.1834148
401107 72.8929 19.2762702
401105 72.8663173 19.3053477
400703 72.9992013 19.0793547
401209 NA NA
401203 72.7983705 19.4166761
400612 73.0287209 19.1799265
400612 73.0287209 19.1799265
400612 73.0287209 19.1799265

If your Points of Interest are unknown and you need to find them, you can use Google's API through my googleway package (as you've suggested in the comments). You will need a valid API key for this to work.
As the API can only accept one request at a time, you'll need to iterate over your data one row at a time. For that you can use whatever looping method you're most comfortable with:
library(googleway) ## using v2.4.0 on CRAN
set_key("your_api_key")
lst <- lapply(1:nrow(df), function(x){
  google_places(search_string = "Hospital",
                location = c(df[x, 'lat'], df[x, 'lon']),
                radius = 2000)
})
lst is now a list that contains the results of the queries. For example, the names of the hospitals it has returned for the first row of your data is
place_name(lst[[1]])
# [1] "Jadhav Hospital"
# [2] "Poona Hospital Medical Store"
# [3] "Sanjeevan Hospital"
# [4] "Suyash Hospital"
# [5] "Mehta Hospital"
# [6] "Deenanath Mangeshkar Hospital"
# [7] "Sushrut Hospital"
# [8] "Deenanath Mangeshkar Hospital and Research Centre"
# [9] "MMF Ratna Memorial Hospital"
# [10] "Maharashtra Medical Foundation's Joshi Multispeciality Hospital"
# [11] "Sahyadri Hospitals"
# [12] "Deendayal Memorial Hospital"
# [13] "Jehangir Specialty Hospital"
# [14] "Global Hospital And Research Institute"
# [15] "Prayag Hospital"
# [16] "Apex Superspeciality Hospital"
# [17] "Deoyani Multi Speciality Hospital"
# [18] "Shashwat Hospital"
# [19] "Deccan Multispeciality Hardikar Hospital"
# [20] "City Hospital"
You can also view them on a map
set_key("map_api_key", api = "map")
## the lat/lon of the returned results are found through `place_location()`
# place_location(lst[[1]])
df_hospitals <- place_location(lst[[1]])
df_hospitals$name <- place_name(lst[[1]])
google_map() %>%
  add_circles(data = df[1, ], radius = 2000) %>%
  add_markers(data = df_hospitals, info_window = "name")
Note:
Google's API is limited to 2,500 queries per key per day, unless you pay for a premium account.
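If, on the other hand, you already have the coordinates of your Points of Interest, no API call is needed: you can compute the distance yourself and keep the ones within 2 km. The sketch below uses the haversine formula in base R. The function name haversine_m and the POI coordinates are made up for illustration. It also drops rows with NA coordinates first, since the sample data contains one (zip 401209).

```r
# Great-circle distance in metres between lat/lon points (haversine formula).
haversine_m <- function(lat1, lon1, lat2, lon2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Two rows from the question's data; the second has no coordinates.
stores <- data.frame(
  store_zip = c(410710, 401209),
  lon = c(73.8248981, NA),
  lat = c(18.5154681, NA)
)
stores <- stores[complete.cases(stores[, c("lat", "lon")]), ]

# Hypothetical POIs (made-up coordinates, for illustration only).
pois <- data.frame(
  name = c("poi_near", "poi_far"),
  lat = c(18.5200, 18.5600),
  lon = c(73.8300, 73.8248981),
  stringsAsFactors = FALSE
)

# Distance from the first store to every POI, then keep those within 2 km.
dist_m <- haversine_m(stores$lat[1], stores$lon[1], pois$lat, pois$lon)
nearby <- pois$name[dist_m <= 2000]
print(nearby)
```

The same complete.cases() filter can be applied to df before the lapply() loop above, so that no request is built from missing coordinates.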

Extract address components from coordinates

I'm trying to reverse geocode with R. I first used ggmap but couldn't get it to work with my API key. Now I'm trying it with googleway.
newframe[,c("Front.lat","Front.long")]
Front.lat Front.long
1 -37.82681 144.9592
2 -37.82681 145.9592
newframe$address <- apply(newframe, 1, function(x){
  google_reverse_geocode(location = as.numeric(c(x["Front.lat"],
                                                 x["Front.long"])),
                         key = "xxxx")
})
This extracts the variables as a list but I can't figure out the structure.
I'm struggling to figure out how to extract the address components listed below as variables in newframe
postal_code, administrative_area_level_1, administrative_area_level_2, locality, route, street_number
I would prefer each address component as a separate variable.

Google's API returns the response in JSON, which, when translated into R, naturally forms nested lists. Internally in googleway this is done through jsonlite::fromJSON().
In googleway I've given you the choice of returning the raw JSON or a list, through the simplify argument.
I've deliberately returned ALL the data from Google's response and left it up to the user to extract the elements they're interested in through the usual list-subsetting operations.
Having said all that, in the development version of googleway I've written a few functions to help access elements of various API calls. Here are three of them that may be useful to you:
## Install the development version
# devtools::install_github("SymbolixAU/googleway")
res <- google_reverse_geocode(
  location = c(df[1, 'Front.lat'], df[1, 'Front.long']),
  key = apiKey
)
geocode_address(res)
# [1] "45 Clarke St, Southbank VIC 3006, Australia"
# [2] "Bank Apartments, 275-283 City Rd, Southbank VIC 3006, Australia"
# [3] "Southbank VIC 3006, Australia"
# [4] "Melbourne VIC, Australia"
# [5] "South Wharf VIC 3006, Australia"
# [6] "Melbourne, VIC, Australia"
# [7] "CBD & South Melbourne, VIC, Australia"
# [8] "Melbourne Metropolitan Area, VIC, Australia"
# [9] "Victoria, Australia"
# [10] "Australia"
geocode_address_components(res)
# long_name short_name types
# 1 45 45 street_number
# 2 Clarke Street Clarke St route
# 3 Southbank Southbank locality, political
# 4 Melbourne City Melbourne administrative_area_level_2, political
# 5 Victoria VIC administrative_area_level_1, political
# 6 Australia AU country, political
# 7 3006 3006 postal_code
geocode_type(res)
# [[1]]
# [1] "street_address"
#
# [[2]]
# [1] "establishment" "general_contractor" "point_of_interest"
#
# [[3]]
# [1] "locality" "political"
#
# [[4]]
# [1] "colloquial_area" "locality" "political"
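To turn those components into one column per type, something along these lines may work. This is a sketch under assumptions: comps is rebuilt by hand to mirror the printed output of geocode_address_components(res) above (with types as a list column), and pick_components() is a hypothetical helper, not part of googleway.

```r
# Hand-built stand-in for geocode_address_components(res), mirroring the
# printed output above. The real object may differ in detail.
comps <- data.frame(
  long_name = c("45", "Clarke Street", "Southbank", "Melbourne City",
                "Victoria", "Australia", "3006"),
  short_name = c("45", "Clarke St", "Southbank", "Melbourne",
                 "VIC", "AU", "3006"),
  stringsAsFactors = FALSE
)
comps$types <- list(
  "street_number", "route", c("locality", "political"),
  c("administrative_area_level_2", "political"),
  c("administrative_area_level_1", "political"),
  c("country", "political"), "postal_code"
)

wanted <- c("postal_code", "administrative_area_level_1",
            "administrative_area_level_2", "locality", "route",
            "street_number")

# Hypothetical helper: one-row data frame with a column per wanted type,
# NA where the response has no component of that type.
pick_components <- function(comps, wanted) {
  values <- vapply(wanted, function(w) {
    hit <- vapply(comps$types, function(t) w %in% t, logical(1))
    if (any(hit)) comps$long_name[which(hit)[1]] else NA_character_
  }, character(1))
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
}

row1 <- pick_components(comps, wanted)
print(row1)
```

Calling this once per row and binding the results with rbind() would give the separate variables asked for in the question.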

After reverse geocoding into newframe$address the address components could be extracted further as follows:
# Make a boolean array of the valid ("OK" status) responses (other statuses may be "NO_RESULTS", "REQUEST_DENIED" etc).
sel <- sapply(c(1: nrow(newframe)), function(x){
  newframe$address[[x]]$status == 'OK'
})
# Get the address_components of the first result (i.e. best match) returned per geocoded coordinate.
# Iterate over the positions of the valid rows, so invalid responses are skipped rather than misaligned.
address.components <- sapply(which(sel), function(x){
  newframe$address[[x]]$results[1,]$address_components
})
# Get all possible component types.
all.types <- unique(unlist(sapply(c(1: length(address.components)), function(x){
  unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
})))
# Get "long_name" values of the address_components for each type present (the other option is "short_name").
all.values <- lapply(c(1: length(address.components)), function(x){
  types <- unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
  matches <- match(all.types, types)
  values <- address.components[[x]]$long_name[matches]
})
# Bind results into a dataframe.
all.values <- do.call("rbind", all.values)
all.values <- as.data.frame(all.values)
names(all.values) <- all.types
# Add columns and update original data frame.
newframe[, all.types] <- NA
newframe[sel,][, all.types] <- all.values
Note that I've only kept the first type given per component, effectively skipping the "political" type as it appears in multiple components and is likely superfluous e.g. "administrative_area_level_1, political".

You can use ggmap::revgeocode easily; see below:
library(ggmap)
df <- cbind(df, do.call(rbind,
  lapply(1:nrow(df), function(i) {
    revgeocode(as.numeric(df[i, 2:1]), output = "more")[
      c("administrative_area_level_1", "locality", "postal_code", "address")]
  })))
#output:
df
# Front.lat Front.long administrative_area_level_1 locality
# 1 -37.82681 144.9592 Victoria Southbank
# 2 -37.82681 145.9592 Victoria Noojee
# postal_code address
# 1 3006 45 Clarke St, Southbank VIC 3006, Australia
# 2 3833 Cec Dunns Track, Noojee VIC 3833, Australia
You can add "route" and "street_number" to the variables that you want to extract, but as you can see the second address does not have a street number, and that will cause an error.
Note: You may also use sub() to extract the information from the address.
Data:
df <- structure(list(Front.lat = c(-37.82681, -37.82681), Front.long =
c(144.9592, 145.9592)), .Names = c("Front.lat", "Front.long"), class = "data.frame",
row.names = c(NA, -2L))

function to retrieve data from one column based on data from another column in R

I have a farmers market data set and one of the columns is "MarketName" (column2) and one is "WIC" (column21). I wrote a function to retrieve the market name if the WIC column = Y. My output should be a list of 2,207 names however I am getting an output of 8,144 rows because for the rows where the WIC column = N, my output is showing NA. There are 45 columns and 8,144 rows but here is a fake data set with only two columns
MarketName <- c("Union Springs Famers Market","Union Square Farmers Market", "Union Square Greenmarket", "Union Street Farmers Market", "Unity Market Day Farmers", "University Farmers Market")
WIC <- c("Y","N","N","N","Y","Y")
data3 <- data.frame(MarketName, WIC)
data3$MarketName <- as.character(data3$MarketName)
data3$WIC <- as.character(data3$WIC)
This is my function (which could be the problem?)
marketacceptWIC <- function(mydf)
{
  market <- 0
  for(i in 1:length(data3$WIC))
  {
    if(data3[i,2] == "Y")
      market[i] <- data3[i,1]
  }
  return(market)
}
This is a sample of the output that I am getting
[1] "Union Springs Famers Market" NA NA NA
[5] "Unity Market Day Farmers" "University Farmers Market"
What I want is just a list of the farmers markets that accept WIC
[1] "Union Springs Farmers' Market"
[2] "Unity Market Day Farmers"
[3] "University Farmers Market"

I don't think you need a for loop here. Subset on the WIC column instead. If you only need the MarketName column, then:
data3[data3$WIC == "Y", ]$MarketName
[1] "Union Springs Famers Market" "Unity Market Day Farmers" "University Farmers Market"
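The NA values in the loop version come from assigning to market[i]: every index that is skipped is filled with NA when a later index is assigned. Here is a minimal sketch of the same function written as a subset, using its argument rather than the global data3 (the name market_accepts_wic is just for illustration):

```r
MarketName <- c("Union Springs Famers Market", "Union Square Farmers Market",
                "Union Square Greenmarket", "Union Street Farmers Market",
                "Unity Market Day Farmers", "University Farmers Market")
WIC <- c("Y", "N", "N", "N", "Y", "Y")
data3 <- data.frame(MarketName, WIC, stringsAsFactors = FALSE)

# Return the names of the markets whose WIC column is "Y".
# which() drops any NA comparisons instead of propagating them.
market_accepts_wic <- function(mydf) {
  mydf$MarketName[which(mydf$WIC == "Y")]
}

wic_markets <- market_accepts_wic(data3)
print(wic_markets)
```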

Why is the second column of my data.frame not printing

I need help with my second column. It is not printing. My program loops through and gives me the correct hospitals, but it does not give me my second column, statelist (the state abbreviation where the hospital is located).
This is what I get:
[1] "ALASKA REGIONAL HOSPITAL"
[1] "BAPTIST MEDICAL CENTER EAST"
[1] "BAPTIST HEALTH MEDICAL CENTER NORTH LITTLE ROCK"
[1] "CARONDELET HEART AND VASCULAR INSTITUTE"
[1] "OLYMPIA MEDICAL CENTER"
but this is what I need
[1] "ALASKA REGIONAL HOSPITAL" "AK"
[1] "BAPTIST MEDICAL CENTER EAST" "AL"
[1] "BAPTIST HEALTH MEDICAL CENTER NORTH LITTLE ROCK" "AR"
Here's the code
states <- unique(dat$State)
state <- sort(states)
for (i in 1:length(state))
  statelist <- state[i]
HospitalName <- data.frame()
for (i in 1:length(state)) {
  if (outcome == "heart attack" && num == num) {
    sub.heart.attack <- subset(dat, State == state[i], select = c(Hospital.Name, Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack))
    order.heart.attack <- order(as.numeric(sub.heart.attack[,2], na.rm = TRUE))
    number1.heart.attack <- order.heart.attack[num]
    HospitalName <- print(sub.heart.attack$Hospital.Name[number1.heart.attack])
    statelist
  }
}
HospitalName <- data.frame(HospitalName, statelist)
}
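A possible approach, sketched under assumptions rather than taken from the assignment: in the code above, the first loop overwrites statelist on every pass, so it ends up holding only the last state, and print() shows only the hospital name. Building one hospital per state and then combining both vectors into a data frame gives the two columns. The helper name rank_by_state, the column name rate and the toy rows are all made up for illustration. Ties are broken by hospital name here, which is a guess at the intended rule.

```r
# Hypothetical helper: hospital of rank `num` in each state, with the state.
rank_by_state <- function(dat, rate_col, num = 1) {
  states <- sort(unique(dat$State))
  hospitals <- vapply(states, function(s) {
    sub <- dat[dat$State == s, ]
    # Non-numeric entries become NA; the coercion warning is expected here.
    rate <- suppressWarnings(as.numeric(sub[[rate_col]]))
    # na.last = NA drops hospitals with a missing rate from the ranking.
    ord <- order(rate, sub$Hospital.Name, na.last = NA)
    if (num > length(ord)) NA_character_ else sub$Hospital.Name[ord[num]]
  }, character(1))
  data.frame(hospital = unname(hospitals), state = states,
             stringsAsFactors = FALSE)
}

# Made-up toy data, for illustration only.
toy <- data.frame(
  Hospital.Name = c("H1", "H2", "H3", "H4"),
  State = c("AL", "AK", "AL", "AK"),
  rate = c("12.1", "10.0", "Not Available", "9.5"),
  stringsAsFactors = FALSE
)

ranked <- rank_by_state(toy, "rate", num = 1)
print(ranked)
```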
