How to calculate the normalised data of a column using population? - r

London city number data
The population of London is 100,000. How can I calculate the normalized data using the population? I have searched and spent a lot of time trying to find a clue, but failed. I found ways to normalize data without using a population number, but no way to normalize the data using the population number. Can anyone help me, please?
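One common approach (a minimal sketch; the data frame and column names below are hypothetical, since the original data is not shown) is to divide the raw count by the population, for example to express it as a rate per 100,000 people:

# hypothetical data: a raw count per city plus that city's population
df <- data.frame(
  city       = c("London", "Leeds"),
  count      = c(250, 80),
  population = c(100000, 50000)
)
# normalise the count by population, here expressed per 100,000 people
df$rate_per_100k <- df$count / df$population * 100000
df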

Related

How to create a column by setting a condition

I am currently working on a dataset of different firms. I have each firm's longitude and latitude, and I want to find each firm's city using R.
For example, I found that Shanghai's longitude and latitude ranges are 120.852326~122.118227 and 30.691701~31.874634 respectively.
I first want to create a column named "city" and then check whether each firm's longitude and latitude fall within Shanghai's ranges. If they do, R should write "Shanghai" in the "city" column; if not, it should remain NA.
In my dataframe the longitude and latitude variables are named "longitude" and "latitude".
I am not sure how to write the code. I am really struggling at the beginning, and your help would be highly appreciated!
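A rough sketch of one way to do this with ifelse (the data frame name "firms" and the example rows are made up; the bounding-box values are the ones quoted above):

# hypothetical firm data with the longitude/latitude columns described above
firms <- data.frame(
  longitude = c(121.47, 116.40),
  latitude  = c(31.23, 39.90)
)
# tag rows whose coordinates fall inside the Shanghai bounding box, otherwise leave NA
firms$city <- ifelse(
  firms$longitude >= 120.852326 & firms$longitude <= 122.118227 &
    firms$latitude >= 30.691701 & firms$latitude <= 31.874634,
  "Shanghai",
  NA_character_
)
firms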

In R, how can I join data by similar, but not identical, centroid (or numeric argument)?

I'm building a transition matrix of land use change (state) over the years.
I'm therefore comparing shapefiles year after year and building a dataframe with:
Landuse year1 - Landuse year2 - ... - ID - centroid
using the following function:
full_join(landuse1, landuse2, by="centroid")
where centroid is the actual centroid of the polygons. A centroid is basically a vector of two numeric values.
However, the centroid can shift slightly from year to year (because the polygon actually changes a little bit), leading to incomplete data gathering through the full_join function, because the centroids must match exactly.
I'd like to include a "more or less" argument, so that any centroid close enough to the one from the year before can be joined to the dataframe for that particular polygon.
But I'm not sure how?
Thank you in advance.
So the general term for what you are trying to do is fuzzy matching. I'm not sure exactly how it would work for the coordinates of a centroid. My idea would be to calculate the distance between the coordinates and then set a margin of error, say 0.5%; if two centroids deviate from each other by less than that, you could declare them a match. Basically, loop through your list of locations and give the matches some unique ID, which you can then use for the join.
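A sketch of that distance-plus-tolerance idea, using geosphere for the geographic distance (all data frame and column names below are hypothetical, and the 500 m tolerance is just an example):

require(geosphere)
# hypothetical year-1 and year-2 tables, each with centroid coordinates (lon, lat)
landuse1 <- data.frame(id1 = 1:2, lon = c(10.001, 11.500), lat = c(45.002, 46.100),
                       use1 = c("forest", "crop"))
landuse2 <- data.frame(id2 = 1:2, lon = c(10.002, 11.510), lat = c(45.001, 46.110),
                       use2 = c("forest", "urban"))
# distance in metres between every year-1 centroid and every year-2 centroid
d <- distm(landuse1[, c("lon", "lat")], landuse2[, c("lon", "lat")], fun = distGeo)
# for each year-1 polygon, take the closest year-2 centroid,
# but only accept it if it lies within the tolerance
nearest   <- apply(d, 1, which.min)
tolerance <- 500
match_ok  <- d[cbind(seq_len(nrow(d)), nearest)] <= tolerance
landuse1$id2 <- ifelse(match_ok, landuse2$id2[nearest], NA)
# an ordinary join on the shared ID then completes the "fuzzy" match
merge(landuse1, landuse2, by = "id2", all.x = TRUE)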

How do I count instances of a categorical variable for each instance of another categorical variable?

Disclaimer: I can't include data because it's confidential student data.
I have an R dataframe "data" with a column "StateResidence" for the state the student is from, and a column "Enrolled" containing a 0 or a 1 that tells whether or not they enrolled in the school I go to.
I'm trying to make a dataframe with three columns: column 1 should list each of the 69 unique states in "data" (I've already done this one), column 2 should show how many students from that state enrolled, and column 3 should show what percentage of the total students from that state enrolled.
The reason for this is so I can do some exploratory data analysis by plotting barplots with the number on the Y axis and the state on the X axis to analyze enrollment trends geographically.
I really don't have much else to include - I'm completely lost here, and I'm not very familiar with R. Any help is greatly appreciated, even just some helpful functions or something. Thank you.
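A minimal dplyr sketch of the counting step (the toy data frame below is made up, since the real data is confidential; the column names match the ones described above):

library(dplyr)
# made-up stand-in for the confidential student data
data <- data.frame(
  StateResidence = c("Ohio", "Ohio", "Texas", "Texas", "Texas"),
  Enrolled       = c(1, 0, 1, 1, 0)
)
summary_by_state <- data %>%
  group_by(StateResidence) %>%
  summarise(
    total_students   = n(),                             # students from that state
    enrolled         = sum(Enrolled),                   # how many of them enrolled
    percent_enrolled = 100 * enrolled / total_students  # share that enrolled
  )
summary_by_state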

Calculating Sharpe Ratio with monthly Returns (multiple securities in one Dataset)

The dataset contains over 1 million rows with columns for the monthly returns, the Date and the Securities ID. So it is the monthly data for about 20,000 funds. Here is a screenshot of how the data is structured.
The problem is that I cannot get the calculation of the Sharpe ratio to work using the PerformanceAnalytics library, at least not with the data as it is given. How would you approach this calculation? I would really appreciate your input, as I am new to the world of R and happy to learn. Thanks in advance!
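One possible approach, without PerformanceAnalytics, is to compute the Sharpe ratio per fund directly from the long-format data with dplyr (the column names SecId, Date and Return, the risk-free rate, and the toy data are all assumptions, since only a screenshot was described):

library(dplyr)
rf_monthly <- 0    # assumed monthly risk-free rate
# hypothetical long-format data: one row per fund per month
returns <- data.frame(
  SecId  = rep(c("F1", "F2"), each = 3),
  Date   = rep(as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")), 2),
  Return = c(0.01, -0.02, 0.03, 0.005, 0.012, -0.004)
)
sharpe_by_fund <- returns %>%
  group_by(SecId) %>%
  summarise(
    sharpe_monthly    = mean(Return - rf_monthly) / sd(Return - rf_monthly),
    sharpe_annualised = sharpe_monthly * sqrt(12)   # annualise the monthly ratio
  )
sharpe_by_fund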

Nearest weather station to each zip code in large dataset?

I'm looking for an efficient way to link each record in a large dataset to its nearest NOAA weather station. The dataset contains 9-digit zip codes, and NOAA weather stations have lat long info. Anyone have tips on the best way to do this? Thanks!
EDIT: updating with the code that worked, in case anyone else is looking to find the nearest NOAA weather station for a set of zip codes, and in case there are suggestions for better ways to do this.
code based on that provided in this question: Finding nearest neighbour (log, lat), then the next closest neighbor, and so on for all points between two datasets in R
temp_stations is downloaded from https://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/station-inventories/temp-inventory.txt (weather stations used in development of temperature dataset)
zipcode is a package that contains a dataset with lat/long for each zip code in the US.
install.packages("zipcode")
require(zipcode)
data(zipcode)
#prime.zips is a subset of "zipcode" created by selecting just the zip codes contained in my original dataset. running the code below on the whole zipcode dataset crashed R on my computer.
install.packages("geosphere")
require(geosphere)
mat <- distm(prime.zips[ ,c('longitude','latitude')], temp_stations[ ,c(3,2)], fun=distGeo)
# assign the weather station id to each record in prime.zips based on shortest distance in the matrix
prime.zips$nearest.station <- temp_stations$station.id[apply(mat, 1, which.min)]
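If the full distance matrix is too large for memory (as with the crash mentioned in the comment above), a possible alternative is a k-nearest-neighbour search, for example with the RANN package; note that nn2 uses Euclidean distance on raw longitude/latitude, so the result is only approximate:

install.packages("RANN")
require(RANN)
# index of the single nearest station (k = 1) for every zip code
nn <- nn2(temp_stations[ ,c(3,2)], prime.zips[ ,c('longitude','latitude')], k = 1)
prime.zips$nearest.station <- temp_stations$station.id[nn$nn.idx[, 1]]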
