Trying to determine the distances between centroids of countries in R

I currently have a dataframe of pairs of country codes (like US, RU, CA etc.) Is there a function that determines the centroid of a country given the country code so that I can find the distance between the pairs of countries? Or is there a function that can give me the coordinates of the centroid of each country (such as the longitude and latitude for example)?
This is the first couple of lines of my dataset, which I had filtered from a previous one, for reference.

You can scrape this Google public dataset.
My previous suggestion to use the countryref dataset in the CoordinateCleaner package doesn't work, because I found out it contains duplicates with different positions.
library(rvest)
library(dplyr)

url <- 'https://developers.google.com/public-data/docs/canonical/countries_csv'
webpage <- read_html(url)

# The page contains a single table of country codes with centroid coordinates
centroids <- webpage %>% html_nodes('table') %>% html_table() %>% as.data.frame()
data <- data.frame(V1 = c("US", "US"), V2 = c("VN", "ZA"))

data %>%
  inner_join(centroids, by = c("V1" = "country")) %>%
  inner_join(centroids, by = c("V2" = "country"))
V1 V2 latitude.x longitude.x name.x latitude.y longitude.y name.y
1 US VN 37.09024 -95.71289 United States 14.05832 108.27720 Vietnam
2 US ZA 37.09024 -95.71289 United States -30.55948 22.93751 South Africa
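With the centroids joined in, the pairwise distances are one step away. A minimal sketch using geosphere::distGeo, assuming the join produced the latitude.x/longitude.x and latitude.y/longitude.y columns shown above:

library(geosphere)

joined <- data %>%
  inner_join(centroids, by = c("V1" = "country")) %>%
  inner_join(centroids, by = c("V2" = "country"))

# distGeo() expects (longitude, latitude) columns and returns metres
joined$dist_km <- distGeo(cbind(joined$longitude.x, joined$latitude.x),
                          cbind(joined$longitude.y, joined$latitude.y)) / 1000

Note these are straight-line geodesic distances between centroids, not travel distances.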

Related

Making a graph from a dataset

I am trying to make a graph showing the average temp in Australia from 1950 to 2000. My dataset contains a "Country" column which includes Australia as well as other countries. The dataset also includes years and average temp for every country. How would I go about excluding all the other data to make a graph just for Australia?
Example of the dataset
You just need to subset your data so that it only contains observations about Australia. I can't see the details of your dataset from your picture, but let's assume that your dataset is called d and the column of d detailing which country each observation is about is called country. Then you could do the following using base R:
d_aus <- d[d$country == "Australia", ]
Or using dplyr you could do:
library(dplyr)
d_aus <- d %>%
filter(country == "Australia")
Then d_aus would be the dataset containing only the observations about Australia (in which `d$country == "Australia"`), which you could use to make your graph.
This should do the job. Adjust the column names to match those in your dataset.
library("ggplot2")
library("dplyr")
data %>% filter(Country == "Australia" & Year %in% (1950:2000)) %>% ggplot(.,aes(x=Year,y=Temp)) + geom_point()
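Since the goal is a trend over time, a line may read better than points; a small variation on the same sketch, assuming the same Country, Year, and Temp column names:

data %>%
  filter(Country == "Australia", Year %in% 1950:2000) %>%
  ggplot(aes(x = Year, y = Temp)) +
  geom_line()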

R - ratio calculation via data set

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName <- as.character(df$countryName)
I processed the dataset.
df$countryName[df$countryName == "United States"] <- "United States of America"
I changed the name here to "United States of America" so it matches the population data I merged in.
df8$death_pop <- df8$death / df8$PopTotal
I calculated death/pop for each country.
How can I find the 10 countries with the highest death/pop ratio?
Using base R:
df8[order(df8$death_pop, decreasing = TRUE)[1:10],]
This orders your data.frame by death_pop and extracts the first 10 rows.
Using the package dplyr, there is the function top_n, which gives you the desired result. I added arrange(desc(death_pop)) to give you a sorted output. Remove this part if you don't need it.
df8 %>% top_n(10, death_pop) %>% arrange(desc(death_pop))
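In dplyr 1.0.0 and later, top_n() has been superseded by slice_max(), which sorts and truncates in one call:

library(dplyr)

# Top 10 rows by death_pop, already sorted in descending order
df8 %>% slice_max(death_pop, n = 10)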

Adding values from a dataframe to a different dataframe

I'm a noob in R programming.
I have 2010 census data in this link: census data.
This is my dataframe: dataframe.
What I'd like to do is add the population column 'P001001' from the census data for each state into the dataframe. I'm not able to figure out how to map the state abbreviations in the dataframe to the full names in the census data, and add the respective population to each row for that state in the data frame. The data is for all of the states. What should be the simplest way to do this?
Thanks in advance.
Use the built-in datasets for US states, state.abb and state.name; see State name to abbreviation in R.
Here's a simple bit of code which will give you a tidyverse approach to the problem.
1) add the state abbreviation to the census table
2) left join the census with the df by state abbreviation
library(tibble)
library(dplyr)

census <- tibble(name = c("Colorado", "Alaska"),
                 P001001 = c(100000, 200000))

# 1) add the state abbreviation to the census table
census <- census %>%
  mutate(state_abb = state.abb[match(name, state.name)])

df <- tibble(date = c("2011-01-01", "2011-02-01"),
             state = rep("CO", 2),
             avg = c(123, 1234))

# 2) left join by state abbreviation
df <- df %>%
  left_join(census, by = c("state" = "state_abb"))
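If you prefer base R, the same two steps can be done with match() and merge(); a sketch assuming the unjoined toy tables above:

# 1) add the abbreviation, 2) left join (all.x = TRUE keeps unmatched rows)
census$state_abb <- state.abb[match(census$name, state.name)]
merged <- merge(df, census, by.x = "state", by.y = "state_abb", all.x = TRUE)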

Distance Matrix in R using geosphere

I have a dataset with info on international investments in Europe and coordinates for NUTS3 regions. For each investment I have the city and its coordinates (lat1, long1). I want to compute the distance from each city to each of the NUTS3 regions I have, e.g. Paris to Paris, Paris to Lyon, Paris to Orly, Paris to Maidenhead, and so on. I want to loop this mechanism over all the cities I have, so at the end I have a matrix for each city with its distance to each NUTS3 region. I tried to use geosphere but it gives me just the distance between rows.
summary(coordinate$NUTS_BN_ID)
summary(fdimkt$NUTS_BN_ID)

## merge the datasets
df <- merge(fdimkt, coordinate, by = "nutscode", all = FALSE)
# View(df)  # interactive inspection

# install.packages("dplyr")
library(dplyr)
df <- df %>% dplyr::rename(lat1 = `_destination_latitude`, long1 = `_destination_longitude`)

library(geosphere)
library(data.table)
# dt <- expand.grid.df(df, df)
# distGeo() expects (longitude, latitude) order and returns metres
setDT(df)[, dist_km := distGeo(matrix(c(long1, lat1), ncol = 2),
                               matrix(c(long2, lat2), ncol = 2)) / 1000]
summary(df$dist_km)
This didn't work because it returns the distance by row, but I actually want the distance from each city to all the NUTS3 coordinates I have.
Can someone help me with this?
I'm not sure how to post my dt; I guess that might help to get more suggestions.
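One way to get the full city-by-NUTS3 grid is geosphere::distm(), which takes two sets of points and returns the complete cross-distance matrix, so no loop is needed. A minimal sketch with made-up object and column names (cities, nuts, and their coordinate columns are assumptions; substitute your own):

library(geosphere)

# Toy stand-ins for the real data (assumed names and values)
cities <- data.frame(city = c("Paris", "Lyon"),
                     long1 = c(2.3522, 4.8357),
                     lat1 = c(48.8566, 45.7640))
nuts <- data.frame(nuts_id = c("FR101", "FRK26"),
                   nuts_long = c(2.34, 4.84),
                   nuts_lat = c(48.86, 45.76))

# distm() returns an n_cities x n_nuts matrix of distances in metres;
# coordinates go in as (longitude, latitude)
dist_km <- distm(cbind(cities$long1, cities$lat1),
                 cbind(nuts$nuts_long, nuts$nuts_lat),
                 fun = distGeo) / 1000
rownames(dist_km) <- cities$city
colnames(dist_km) <- nuts$nuts_id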

How to scrape key statistics from Yahoo! Finance with R? [duplicate]

Unfortunately, I am not an experienced scraper yet. However, I need to scrape key statistics of multiple stocks from Yahoo Finance with R.
I am somewhat familiar with scraping data directly from HTML using read_html, html_nodes(), and html_text() from the rvest package. However, this web page (MSFT key stats) is a bit complicated; I am not sure whether all the stats are kept in XHR, JS, or the Doc. I am guessing the data is stored in JSON.
If anyone knows a good way to extract and parse data for this web page with R, kindly answer my question, great thanks in advance!
Or if there is a more convenient way to extract these metrics via quantmod or Quandl, kindly let me know; that would be an extremely good solution!
The goal is to have tickers/symbols as row names/labels and the statistics as columns. An illustration of my needs can be found at this Finviz link:
https://finviz.com/screener.ashx
The reason I would like to scrape Yahoo Finance data is that Yahoo also treats Enterprise Value and EBITDA as key stats.
EDIT:
I meant to refer to the key statistics page, for example: https://finance.yahoo.com/quote/MSFT/key-statistics/. The code should produce one data frame with rows of stock symbols and columns of key stats.
Code
library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>%
  read_html() %>%
  html_table() %>%
  map_df(bind_cols) %>%
  # Transpose
  t() %>%
  as_tibble()

# Set first row as column names
colnames(df) <- df[1, ]

# Remove first row
df <- df[-1, ]

# Add stock name column
df$Stock_Name <- stock
Result
Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
<chr> <chr> <chr> <chr>
1 6/30/2… 110,360,000 38,353,000 72,007,000
2 6/30/2… 96,571,000 33,850,000 62,721,000
3 6/30/2… 91,154,000 32,780,000 58,374,000
4 6/30/2… 93,580,000 33,038,000 60,542,000
# ... with 25 more variables: ...
Edit:
Or, for convenience, as a function:
get_yahoo <- function(stock) {
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>%
    read_html() %>%
    html_table() %>%
    map_df(bind_cols) %>%
    # Transpose
    t() %>%
    as_tibble()

  # Set first row as column names
  colnames(x) <- x[1, ]

  # Remove first row
  x <- x[-1, ]

  # Add stock name column
  x$Stock_Name <- stock

  return(x)
}
Usage: get_yahoo(stock)
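To get the single data frame of symbols-by-stats that the question asks for, you could map this function over a vector of tickers; a sketch assuming the scraped tables have compatible columns across stocks:

library(purrr)

stocks <- c("MSFT", "AAPL")  # example tickers
all_stats <- map_dfr(stocks, get_yahoo)  # one row set per stock, bound together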
I hope this is what you are looking for:
library(quantmod)
library(plyr)

what_metrics <- yahooQF(c("Price/Sales",
                          "P/E Ratio",
                          "Price/EPS Estimate Next Year",
                          "PEG Ratio",
                          "Dividend Yield",
                          "Market Capitalization"))

Symbols <- c("XOM", "MSFT", "JNJ", "GE", "CVX", "WFC", "PG", "JPM", "VZ", "PFE",
             "T", "IBM", "MRK", "BAC", "DIS", "ORCL", "PM", "INTC", "SLB")

metrics <- getQuote(paste(Symbols, sep = "", collapse = ";"), what = what_metrics)
To get the full list of available metrics, call yahooQF() with no arguments:
yahooQF()
You can use lapply to get prices for more than one symbol:
library(quantmod)

Symbols <- c("XOM", "MSFT", "JNJ", "GE", "CVX", "WFC", "PG", "JPM", "VZ", "PFE",
             "T", "IBM", "MRK", "BAC", "DIS", "ORCL", "PM", "INTC", "SLB")
StartDate <- as.Date('2015-01-01')

Stocks <- lapply(Symbols, function(sym) {
  Cl(na.omit(getSymbols(sym, from = StartDate, auto.assign = FALSE)))
})
Stocks <- do.call(merge, Stocks)
In this case I get the closing prices; see the Cl() function.
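A quick way to sanity-check the merged series (a sketch; recent versions of xts draw all columns of an xts object in one plot):

head(Stocks)  # one closing-price column per symbol
plot(Stocks[, 1:4], main = "Closing prices since 2015")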
