I have two files, covid_provincial.csv and Co_adm1.zip. The first contains the number of confirmed COVID cases in a country. The second is a shapefile of the country's boundaries. I imported them into R and stored them in the objects a and b with the following code:
library(rgdal)  # provides readOGR()
a<-read.csv("the directory of my system which includes covid_provincial.csv")
b<-readOGR(dsn = "the directory (the folder) of my system which includes shape files")
I want to add a variable named Province to the b object, and then merge a into b.
EDIT
Taking into account @Robert's suggestion, it is better to omit the explicit call to @data.
You can add a column to your shp file by:
b$Province<-c("provinces", "names", "etc")
or like this, if you know that the order of Province in a corresponds to the order of the polygons in b:
b$Province<-a$Province
Then merge a into b with merge():
b <- merge(b, a, by="Province")
This should work. However, there are other ways to do the same thing.
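The difference between assigning Province by position and joining by key can be sketched with plain data frames. The tables, column names and values below are hypothetical stand-ins for a and for the attribute table of b; match() performs a name-based lookup, which is the same idea that merge(..., by = "Province") relies on.

```r
# Hypothetical stand-in for the attribute table of b (one row per polygon)
polys <- data.frame(id = 1:3, Province = c("Alpha", "Beta", "Gamma"))
# Hypothetical stand-in for a, deliberately in a different row order
covid <- data.frame(Province = c("Gamma", "Alpha", "Beta"),
                    confirmed = c(30, 10, 20))

# Positional assignment would pair Alpha with 30 here; match() looks each
# polygon's province up by name instead
idx <- match(polys$Province, covid$Province)
# An NA in idx would mean a name has no partner (spelling, accents, case)
stopifnot(!anyNA(idx))
polys$confirmed <- covid$confirmed[idx]
polys
```

Checking for unmatched names before merging is cheap and catches the most common reason a join silently produces NA values.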
EDIT
Here is an example with some data. It is not reproducible, but it should point you in the right direction:
library(raster)
shp <- shapefile("dati/Limiti_2016_WGS84_Italia/regioni/basilicata_provincie.shp")
plot(shp)
data.frame(shp)
SHAPE_Leng SHAPE_Area X_sum
0 593511.8 6593536923 4751953
1 367848.1 3479205085 1371840
Create a toy data.frame:
df <- data.frame(provincie=c("A","B"),pop=c(10000,20000),covid=c(2000,3500))
shp$provincie <- df$provincie
Now shp looks like this:
SHAPE_Leng SHAPE_Area X_sum provincie
0 593511.8 6593536923 4751953 A
1 367848.1 3479205085 1371840 B
Finally, we can merge the two:
shp <- merge(shp,df,by="provincie")
data.frame(shp)
The output is:
provincie SHAPE_Leng SHAPE_Area X_sum pop covid
1 A 593511.8 6593536923 4751953 10000 2000
2 B 367848.1 3479205085 1371840 20000 3500
and the class of shp is preserved:
> class(shp)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
I am trying to learn R and to use the corrplot library to draw a graph with City on the Y axis and Population on the X axis. I wrote the code below:
As you can see in the picture above, there are two columns, City and Population. When I run the code, I get this error message:
Error in cor(Illere_Gore_Nufus) : 'x' must be numeric.
My Excel data:
In general, a correlation plot (scatter plot) can only be made when you have two continuous variables. Correlation is a value that tells you how strongly two continuous variables are linearly related. It always falls between -1 and 1: a value close to 1 indicates a strong positive linear relationship, and a value close to -1 indicates a strong negative linear relationship (one variable decreases as the other increases). A value of 0 means that there is no linear relationship between the two variables; however, there could still be a curvilinear relationship between them.
For example: area of the land vs. price of the land.
Here is the data:
The correlation for this data is about 0.897, which means that there is a strong positive linear relationship between the area and the price of the land (obviously!).
The scatter plot in R would look like this:
Scatter plot
The R code would be:
area<-c(650,785,880,990,1100,1250,1350,1800,2200,2800)
price<-c(250,275,280,290,350,340,400,335,420,460)
cor(area,price)
plot(area,price)
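The value cor() reports here is Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A short check on the same area/price vectors, which also shows that the sign only encodes the direction of the relationship, not its strength:

```r
area  <- c(650, 785, 880, 990, 1100, 1250, 1350, 1800, 2200, 2800)
price <- c(250, 275, 280, 290, 350, 340, 400, 335, 420, 460)

# Pearson correlation: covariance divided by the product of standard deviations
r_manual <- cov(area, price) / (sd(area) * sd(price))
all.equal(r_manual, cor(area, price))             # TRUE

# Reversing one variable flips the sign but keeps the strength
all.equal(cor(area, -price), -cor(area, price))   # TRUE

# A perfectly decreasing straight line gives -1, a strong (not weak) relationship
all.equal(cor(1:10, 20 - 2 * (1:10)), -1)         # TRUE
```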
In Excel, for the same example, you can select the two columns and go to Insert > Scatter plot (in the Charts section).
Scatter plot
In your case, the data would be better shown as a bar chart, with city on the y axis and population on the x axis, or vice versa.
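A horizontal bar chart of this kind needs only base R. The city names and populations below are hypothetical placeholders, since the original Excel values are only visible in the screenshot.

```r
# Hypothetical city names and populations; replace them with your own columns
city <- c("City A", "City B", "City C")
population <- c(5400000, 1500000, 3100000)

# Sort ascending so the largest city ends up at the top of a horizontal chart
ord <- order(population)
old <- par(mar = c(4, 7, 1, 1))  # wider left margin for the city labels
mids <- barplot(population[ord], names.arg = city[ord], horiz = TRUE,
                las = 1, xlab = "Population")
par(old)  # restore the previous margins
```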
I hope I have answered your question!
Some assumptions
You are asking how to do this in Excel, but your question is tagged R and Power BI (it was also tagged RStudio, but that has been edited away), so I'm going to show you how to do this with R and Power BI. I'm also going to show you why you got that error message, and why you would get an error message either way: your dataset is simply not sufficient to make a correlation plot.
My answer
I'm assuming you would like to make a correlation plot of the population between the cities in your table. For that, your table would need more than one year of data for each city. I would check your data sources and see whether you can find population numbers for, let's say, the last 10 years. Lacking the exact numbers for the cities in your table, I'm going to use some semi-made-up numbers for the population of the 10 most populous countries (following your data structure):
Country 2017 2016 2015 2014 2013
China 1415045928 1412626453 1414944844 1411445597 1409517397
India 1354051854 1340371473 1339431384 1343418009 1339180127
United States 326766748 324472802 325279622 324521777 324459463
Indonesia 266794980 266244787 266591965 265394107 263991379
Brazil 210867954 210335253 209297939 209860881 209288278
Pakistan 200813818 199761249 200253292 197655630 197015955
Nigeria 195875237 192568158 195757661 191728478 190886311
Bangladesh 166368149 165630262 165936711 166124290 164669751
Russia 143964709 143658415 143146914 143341653 142989754
Mexico 137590740 137486490 136768870 137177870 136590740
Writing and debugging R code in Power BI is a real pain, so I would recommend installing RStudio, writing your little R snippets there, and then pasting them into Power BI.
The reason for your error message is that the function cor() only accepts numeric data. In your code sample, the city names are passed as arguments. And there are more potential traps in your code sample: you have to make sure that your dataset is numeric, and that it has a shape that cor() will accept.
Below is an R script that will do just that. Copy the data above, and store it in a file called data.xlsx on your C drive.
The Code
library(corrplot)
library(readxl)
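# Read data
setwd("C:/")
# read_excel() returns a tibble, and tibbles discourage row names; a plain data.frame keeps them
data <- as.data.frame(read_excel("data.xlsx"))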
# Set Country names as row index
rownames(data) <- data$Country
# Remove Country from dataframe
data$Country <- NULL
# Transpose data into a readable format for cor()
data <- data.frame(t(data))
# Plot data
corrplot(cor(data))
The plot
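To see both the error and the reshaping without an Excel file, here is a self-contained sketch that builds a three-country, three-year subset of the table above directly in R (the subset is an arbitrary choice to keep it short) and runs the same steps as the script, without readxl and corrplot.

```r
# Small subset of the table above, built inline instead of read from Excel
data <- data.frame(
  Country = c("China", "India", "Russia"),
  `2017` = c(1415045928, 1354051854, 143964709),
  `2016` = c(1412626453, 1340371473, 143658415),
  `2015` = c(1414944844, 1339431384, 143146914),
  check.names = FALSE
)

# cor() on the raw table fails because Country is a character column
err <- tryCatch(cor(data), error = conditionMessage)
print(err)

# Same steps as the script: Country -> row names, drop it, transpose
rownames(data) <- data$Country
data$Country <- NULL
data <- data.frame(t(data))   # one column per country, one row per year
m <- cor(data)
round(m, 2)
```

With only three years per country the correlations are not meaningful; the point is that cor() needs a purely numeric table whose columns are the things being compared.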
Power BI
In Power BI, you need to import the data before you use it in an R visual:
Copy this:
Country,2017,2016,2015,2014,2013
China,1415045928,1412626453,1414944844,1411445597,1409517397
India,1354051854,1340371473,1339431384,1343418009,1339180127
United States,326766748,324472802,325279622,324521777,324459463
Indonesia,266794980,266244787,266591965,265394107,263991379
Brazil,210867954,210335253,209297939,209860881,209288278
Pakistan,200813818,199761249,200253292,197655630,197015955
Nigeria,195875237,192568158,195757661,191728478,190886311
Bangladesh,166368149,165630262,165936711,166124290,164669751
Russia,143964709,143658415,143146914,143341653,142989754
Mexico,137590740,137486490,136768870,137177870,136590740
Save it as countries.csv in a folder of your choosing, and pick it up in Power BI using
Get Data | Text/CSV, click Edit in the dialog box, and in the Power Query Editor, click Use First Row as headers so that you have this table in your Power Query Editor:
Click Close & Apply and make sure that you've got the data available under VISUALIZATIONS | FIELDS:
Click R under VISUALIZATIONS:
Select all columns under FIELDS | countries so that you get this setup:
Take the relevant parts of the R snippet we prepared above:
library(corrplot)
# Set Country names as row index
data <- dataset
rownames(data) <- data$Country
# Remove Country from dataframe
data$Country <- NULL
# Transpose data into a readable format for cor()
data <- data.frame(t(data))
# Plot data
corrplot(cor(data))
And paste it into the Power BI R script Editor:
Click Run R Script:
And you're gonna get this:
That's it!
If you change the procedure to import the data from an Excel file instead of a text file (using Get Data | Excel), you've successfully combined the powers of Excel, Power BI and R to produce a correlation plot!
I hope this is what you were looking for!
I am looking for a way to shade counties on a US map in R. I have a list of numeric/character county FIPS codes that I can pass as a parameter. I just need to highlight these counties, so I only need to shade them; there are no values or variations corresponding to the counties. I tried looking at
library(choroplethr)
library(maps)
and
county_choropleth(df_pop_county)
head(df_pop_county)
region value
1 1001 54590
2 1003 183226
3 1005 27469
4 1007 22769
5 1009 57466
6 1011 10779
But these need a region, value pair, e.g. the FIPS code and population in the example above. Is there a way to call the county_choropleth function without values, just with a data frame of FIPS codes? That way, I could shade my FIPS codes with one color. What would be an efficient way to accomplish this in R using choroplethr?
Here's an example using the maps library:
library(maps)
library(dplyr)
data(county.fips)
## Set up fake df_pop_county data frame
df_pop_county <- data.frame(region=county.fips$fips)
df_pop_county$value <- county.fips$fips
y <- df_pop_county$value
df_pop_county$color <- gray(y / max(y))
## merge population data with county.fips to make sure color column is
## ordered correctly.
counties <- county.fips %>% left_join(df_pop_county, by=c('fips'='region'))
map("county", fill=TRUE, col=counties$color)
Here's the resulting map:
Notice that counties with lower FIPS are darker, while counties with higher FIPS are lighter.
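If all you need is to highlight a set of counties, the colour vector can be built directly from your FIPS list instead of from a value column. The sketch below is plain base R; highlight_colors() is a hypothetical helper, and converting both sides to integers is an assumption meant to make character codes with a leading zero (such as "01001") match numeric ones (1001).

```r
# Hypothetical helper: one fill colour for the counties to highlight,
# another for every other county. all_fips must be in the same order as
# the polygons that will be drawn.
highlight_colors <- function(all_fips, target_fips, hi = "red", lo = "white") {
  # Integer conversion lets "01003" and 1003 compare as equal
  ifelse(as.integer(all_fips) %in% as.integer(target_fips), hi, lo)
}

cols <- highlight_colors(c(1001, 1003, 1005), c("01003"))
cols  # "white" "red" "white"
```

With the maps package, following the example above, which passes colours in county.fips order, this would be map("county", fill = TRUE, col = highlight_colors(county.fips$fips, my_fips)), where my_fips is your list. To guard against ordering differences, the FIPS codes can first be looked up by polygon name: county.fips$fips[match(map("county", plot = FALSE)$names, county.fips$polyname)].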
I used to go to the site
http://gadm.org
to find the contours of the country I wanted to plot with R.
But today this website is unavailable, and I don't know how else to get a really accurate map...
When I try :
library(maps)
map("world", xlim=c(15,23), ylim=c(45.5,49))
The contours of the country are not very accurate...
I tried to find another website that provides the contours, but I didn't find one.
Can somebody help me?
Ok, let's say you want to plot France contours using data from Natural Earth:
# First download the data into your working directory (mode = "wb" keeps the zip intact on Windows)
download.file("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip", "countries.zip", mode = "wb")
# Then unzip
unzip("countries.zip")
# Load maptools
library(maptools)
# Read in the shapefile
world <- readShapeSpatial("ne_10m_admin_0_countries.shp")
# Plot France
plot(world[world$ADMIN=="France",1])
# Column 1 is the id column, column "ADMIN" contains the country names
I am not sure how to start on this, as my GIS work in R has been limited to plotting things with ggplot2 and other packages using lat/long coordinates. What I need to do now is use a visualization component in MicroStrategy that takes a shapefile in the form of an HTML file containing x-y coordinates for the plot (i.e. the top left is 0,0). An example of a state-level file is:
<HTML><HEAD><TITLE>untitled</TITLE></HEAD><BODY>
<IMG SRC="" USEMAP="#myMap" WIDTH="812" HEIGHT="713" BORDER="0" />
<MAP NAME="myMap">
<AREA SHAPE="POLY" HREF="#" ALT="Texas" COORDS="299,363,299,360,....." />
</MAP></BODY></HTML>
The points listed in COORDS are the X and Y points with respect to an 812 by 713 'image' that is plotted and colored on the fly.
I have shp, shx and dbf files for Zip3 and Zip5 from http://www.vdstech.com/usa-data.aspx but am unsure where to even start the conversion! I don't mind doing the grunt work of formatting the HTML file by hand; it is the X-Y conversion that I am stuck on (I'm rusty and haven't touched GIS for quite a while):
The following code imports the shapefile into R
library(rgdal)
zip3 <- readOGR(dsn = '/Users/adempsey/Downloads/zip3', layer = 'zip3')
After that I am stuck, and I am currently hunting for a tutorial on how to extract zip3 + x-y coordinates into a data frame that I can then use to create my final file.
update 2
Using the following, I can convert to a data frame, but I am unable to pull across the associated zip3 code, which appears to be stored in the associated dbf file:
Row long lat order hole piece group id
1 -151.0604 70.41873 1 FALSE 1 0.1 0
2 -150.7620 70.49722 2 FALSE 1 0.1 0
Yes, this is beyond my current rusty R
update 3
This code dumps the zip codes into a data frame
zip3.codes <- as.data.frame(zip3)
Which should be combinable with something like
zip3.df <- fortify(zip3@polygons[[1000]])
where 1000 would be replaced by each row of zip3.codes associated with a particular zip3.
You can use the fastshp package to load the data:
install.packages("fastshp", repos = "http://rforge.net")
library(fastshp)
s <- read.shp("zip5.shp", format="polygon")
s is now a list of all ZIP shapes. You're interested in the x and y components; for example, to plot the first ZIP, simply use something like:
plot(s[[1]]$x, s[[1]]$y, asp=1.25)
polygon(s[[1]]$x, s[[1]]$y, col="#eeeeee")
To match the names, use read.dbf from foreign:
library(foreign)
d <- read.dbf("zip5.dbf", as.is=TRUE)
names(s) <- d$ZIP5
See ?read.shp for more details on the available formats. The "polygon" one uses NA to separate individual polygons, "list" uses indexing to give you the parts.
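Because the "polygon" format marks the breaks between parts with NA, a small base-R helper can turn one shape's x and y vectors into a list of separate rings. split_rings() is a hypothetical name, and the sketch assumes that x and y carry their NAs in the same positions.

```r
# Hypothetical helper: split NA-separated coordinates into a list of rings
split_rings <- function(x, y) {
  # Each NA marks a break; cumsum gives every vertex the index of its ring
  ring_id <- cumsum(is.na(x))
  keep <- !is.na(x)
  xs <- split(x[keep], ring_id[keep])
  ys <- split(y[keep], ring_id[keep])
  unname(Map(function(a, b) list(x = a, y = b), xs, ys))
}

# Two triangles separated by NA, mimicking format = "polygon" output
rings <- split_rings(c(0, 1, 0, NA, 5, 6, 5), c(0, 0, 1, NA, 5, 5, 6))
length(rings)   # 2
rings[[2]]$x    # 5 6 5
```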
BTW, the dataset is somewhat dubious; you may want to look into the TIGER/Line census ZCTA5 data (the most recent is 2010).
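For the X-Y conversion asked about in the question, the core step is a linear rescaling from longitude/latitude to pixels, with the y axis flipped because the image origin is the top-left corner. The sketch below assumes unprojected lon/lat vertices (like the x and y vectors from read.shp), one bounding box for the whole map so every shape is scaled consistently, and the 812 by 713 image size from the question; to_pixels() and area_tag() are hypothetical helpers, not part of fastshp.

```r
# Hypothetical helper: map lon/lat vertices onto pixel coordinates of a
# width x height image whose origin (0,0) is the top-left corner
to_pixels <- function(x, y, bbox, width = 812, height = 713) {
  # bbox = c(xmin, ymin, xmax, ymax) of the whole map
  xmin <- bbox[1]; ymin <- bbox[2]; xmax <- bbox[3]; ymax <- bbox[4]
  # One scale factor for both axes so shapes are not stretched
  s <- min((width - 1) / (xmax - xmin), (height - 1) / (ymax - ymin))
  px <- round((x - xmin) * s)
  # Screen y grows downwards while latitude grows upwards, so flip it
  py <- round((ymax - y) * s)
  list(x = px, y = py)
}

# Hypothetical helper: format one ring as an HTML <AREA> element
area_tag <- function(name, px, py) {
  # Interleave as x1,y1,x2,y2,... as the COORDS attribute expects
  coords <- paste(as.vector(rbind(px, py)), collapse = ",")
  sprintf('<AREA SHAPE="POLY" HREF="#" ALT="%s" COORDS="%s" />', name, coords)
}

# Usage: a 10 x 10 degree square drawn into a 101 x 101 image
sq <- to_pixels(c(-100, -90, -90, -100), c(30, 30, 40, 40),
                bbox = c(-100, 30, -90, 40), width = 101, height = 101)
area_tag("square", sq$x, sq$y)
```

The bounding box for the full dataset could come from range(unlist(lapply(s, function(p) p$x)), na.rm = TRUE) and the same for y. Coordinates read with format = "polygon" contain NA separators, so each ring should be split out before it is passed to area_tag(); a ZIP with several rings would then become several AREA elements with the same ALT text. Drawing raw lon/lat this way gives an equirectangular view, so a projection step may be wanted first for large areas.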