Make repeating character vector values - r

Hey there everyone, I'm just getting started with R, so I decided to make up some data with the eventual goal of superimposing it on top of a map.
Before I can get there, I'm trying to add a province column to my data so I can sort by province.
Drugs <- c("Azin", "Prolof")
Provinces <- c("Ontario", "British Columbia", "Quebec")
Gender <- c("Female", "Male")
raw <- c(10,16,8,20,7,12,13,11,9,7,14,7)
yomom <- matrix(raw, nrow = 6, ncol = 2)
colnames(yomom) <- Drugs
bro <- data.frame(Gender, yomom)
idunno <- data.frame(Provinces, bro)
The first problem I've encountered is that the Provinces vector gets recycled (Ontario, British Columbia, Quebec, then the same three again), and I'm not sure how to make each province cover two consecutive rows instead. I'm basically trying to get it to skip a row.

Something like this?
idunno <- data.frame(Provinces=rep(Provinces,each=2), bro)
idunno
# Provinces Gender Azin Prolof
# 1 Ontario Female 10 13
# 2 Ontario Male 16 11
# 3 British Columbia Female 8 9
# 4 British Columbia Male 20 7
# 5 Quebec Female 7 14
# 6 Quebec Male 12 7
Read the documentation on rep(...) (?rep), in particular the each and times arguments.
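For reference, a minimal sketch of how the `each` and `times` arguments of `rep()` differ, using the vector from the question (the names `by_each` and `by_times` are only for illustration):

```r
Provinces <- c("Ontario", "British Columbia", "Quebec")

# each = 2 repeats every element twice before moving on to the next one
by_each <- rep(Provinces, each = 2)

# times = 2 repeats the whole vector twice, which is the same pattern
# data.frame() produces when it recycles a shorter vector
by_times <- rep(Provinces, times = 2)

print(by_each)
print(by_times)
```

The `each` form is the one that lines up with the alternating Female/Male rows.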

Related

How to make neural edges more dynamic in neural plot?

I have data which represents transit between UK cities.
Transit: 1 if there is a transit between these two cities, otherwise 0.
ave.pas: average number of passengers.
library(plotly)
library(ggraph)
library(tidyverse)
library(tidygraph)
library(igraph)
library(edgebundleR)
df2 <- data.frame (City1 = c("London", "London", "London", "London" ,"Liverpool","Liverpool","Liverpool" , "Manchester", "Manchester", "Bristol"),
City2 = c("Liverpool", "Manchester", "Bristol","Derby", "Manchester", "Bristol","Derby","Bristol","Derby","Derby"),
Transit = c(1,0,1,1,1,1,1,1,0,1),
ave.pas = c(10,0,11,24,40,45,12,34,0,29))
df2:
City1 City2 Transit ave.pas
1 London Liverpool 1 10
2 London Manchester 0 0
3 London Bristol 1 11
4 London Derby 1 24
5 Liverpool Manchester 1 40
6 Liverpool Bristol 1 45
7 Liverpool Derby 1 12
8 Manchester Bristol 1 34
9 Manchester Derby 0 0
10 Bristol Derby 1 29
Now I plot circular network:
df <- subset(df2, Transit== 1, select = c("City1","City2"))
edgebundle(graph.data.frame(df),directed=F,tension=0.1,fontsize = 10)
My goal is to set the size or colour intensity of the edges based on the corresponding value of the 'ave.pas' variable in the dataset.
linked links: link1 link2 link3 link4
(Plot must be made using edgebundle() function)
The intensity of the edges in the linked plots appears to be a function of the number of edges joining the vertices. We can make the number of edges equal to the number of passengers, but the problem here is that after a few lines are plotted on top of each other, the intensity stops increasing. It is therefore good for showing the difference between, say, 1 and 3 edges, but the difference between 10 and 30 is much less obvious. As a compromise, we can make the number of edges approximately proportional to the number of passengers. One way to do this is to create the graph from an adjacency matrix:
cities <- unique(c(df2$City1, df2$City2))
m <- matrix(0, nrow = length(cities), ncol = length(cities),
dimnames = list(cities, cities))
for(i in seq(nrow(df2))) m[df2[i, 1], df2[i, 2]] <- df2[i, 4]
m <- m/min(m[m > 0])
edgebundle(graph_from_adjacency_matrix(m))
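The scaling step can be checked with base R alone. The sketch below rebuilds the matrix and converts it to whole edge counts; the `round()` call and the name `n_edges` are assumptions added here for illustration (the answer above passes the unrounded matrix), and the plotting call is left out.

```r
df2 <- data.frame(
  City1 = c("London", "London", "London", "London", "Liverpool",
            "Liverpool", "Liverpool", "Manchester", "Manchester", "Bristol"),
  City2 = c("Liverpool", "Manchester", "Bristol", "Derby", "Manchester",
            "Bristol", "Derby", "Bristol", "Derby", "Derby"),
  ave.pas = c(10, 0, 11, 24, 40, 45, 12, 34, 0, 29),
  stringsAsFactors = FALSE
)

# Square matrix with one row and one column per city
cities <- unique(c(df2$City1, df2$City2))
m <- matrix(0, nrow = length(cities), ncol = length(cities),
            dimnames = list(cities, cities))

# Fill each City1 -> City2 cell with the average passenger count
for (i in seq_len(nrow(df2))) {
  m[df2$City1[i], df2$City2[i]] <- df2$ave.pas[i]
}

# Scale so the least-used route maps to one edge, then round to whole
# edge counts (the rounding is an assumption, not part of the answer)
n_edges <- round(m / min(m[m > 0]))
print(n_edges)
```

Routes with no transit stay at zero, so they produce no edge at all.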

R: save the shapefile IDs in a particular order as a vector

Sample data
library(raster)
dat <- getData('GADM', country='FRA', level=1)
plot(dat)
text(dat, labels=as.character(dat$ID_1), col="darkred", font=2, offset=0.5, adj=c(0,2))
To save the IDs of provinces, I can do this
province.id <- dat$ID_1
However, I want to arrange these IDs according to some direction (e.g. south to north).
For example, my province.id should start from 10 (since it is the southernmost province) and run all the way to 17, since it is the northernmost province.
One way I thought of was to generate the centroid of each province and, based on the centroids,
determine the order from the southernmost to the northernmost location.
library(rgeos)
trueCentroids = gCentroid(dat,byid=TRUE)
plot(dat)
points(coordinates(dat),pch=1)
But I still cannot export the output or arrange the centroids in the south-to-north direction
to save as a vector
An easy approach would be to take the minimum latitude of each polygon and sort your IDs based on that:
# data
library(raster)
dat <- getData('GADM', country='FRA', level=1)
# create south to north index
sn_index <- unlist(lapply(dat@polygons, function(x) min(x@Polygons[[1]]@coords[,2])))
#sort IDs
dat$ID_1[order(sn_index)]
# [1] 10 13 21 16 22 3 2 14 20 6 18 8 11 7 1 9 15 4 5 19 12 17
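The same `order()` idea also works with the centroids the question started from. A minimal sketch, assuming a two-column longitude/latitude matrix of the kind `coordinates(dat)` is expected to return; the coordinates and IDs below are made up for illustration:

```r
# Hypothetical centroids: column 1 = longitude, column 2 = latitude
centroids <- matrix(c( 2.3, 48.9,
                       5.4, 43.3,
                      -1.6, 47.2),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("lon", "lat")))

# Hypothetical IDs, in the same row order as the centroid matrix
ids <- c(17, 10, 5)

# order() on latitude gives the row positions from south to north
south_to_north <- ids[order(centroids[, "lat"])]
print(south_to_north)
```

Sorting by centroid and sorting by minimum latitude can give different orders for tall or irregular polygons, so which one to use depends on what "southernmost" should mean.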

Convert one column into multiple columns

I am a novice. I have a data set with one column and many rows. I want to convert this column into 5 columns. For example my data set looks like this:
Column
----
City
Nation
Area
Metro Area
Urban Area
Shanghai
China
24,000,000
1230040
4244234
New york
America
343423
23423434
343434
Etc
The output should look like this
City     | Nation  | Area       | Metro Area | Urban Area
-------- | ------- | ---------- | ---------- | ----------
Shanghai | China   | 24,000,000 | 1230040    | 4244234
New york | America | 343423     | 23423434   | 343434
The first 5 rows of the data set (City, Nation, Area, etc.) need to be the names of the 5 columns, and I want the rest of the data to be populated under these 5 columns. Please help.
Here is a one-liner (assuming that your column is character, i.e. df$column <- as.character(df$column)):
setNames(data.frame(matrix(unlist(df[-c(1:5),]), ncol = 5, byrow = TRUE)), c(unlist(df[1:5,])))
# City Nation Area Metro_Area Urban_Area
#1 Shanghai China 24,000,000 1230040 4244234
#2 New_york America 343423 23423434 343434
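The same reshaping, unpacked into steps on the sample data from the question. This is a sketch that assumes the column is named `column` and is stored as character; the numeric conversion at the end is an extra step not in the one-liner.

```r
df <- data.frame(
  column = c("City", "Nation", "Area", "Metro Area", "Urban Area",
             "Shanghai", "China", "24,000,000", "1230040", "4244234",
             "New york", "America", "343423", "23423434", "343434"),
  stringsAsFactors = FALSE
)

# The first five entries are the column names, the rest are the values
header <- df$column[1:5]
values <- df$column[-(1:5)]

# byrow = TRUE fills one record (five values) per row
out <- as.data.frame(matrix(values, ncol = 5, byrow = TRUE),
                     stringsAsFactors = FALSE)
names(out) <- header

# Optional: strip thousands separators and convert the counts to numbers
num_cols <- c("Area", "Metro Area", "Urban Area")
out[num_cols] <- lapply(out[num_cols], function(x) as.numeric(gsub(",", "", x)))

print(out)
```

This only works if the number of values after the header is a multiple of five; otherwise `matrix()` recycles values and warns.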
I'm going to go out on a limb and guess that the data you're after is from the URL: https://en.wikipedia.org/wiki/List_of_largest_cities.
If this is the case, I would suggest you actually try re-reading the data (not sure how you got the data into R in the first place) since that would probably make your life easier.
Here's one way to read the data in:
library(rvest)
URL <- "https://en.wikipedia.org/wiki/List_of_largest_cities"
XPATH <- '//*[@id="mw-content-text"]/table[2]'
cities <- URL %>%
read_html() %>%
html_nodes(xpath=XPATH) %>%
html_table(fill = TRUE)
Here's what the data currently looks like. It still needs to be cleaned up (notice that some of the column names came from merged "rowspan" cells and the like, so part of the header ends up in the first row):
head(cities[[1]])
## City Nation Image Population Population Population
## 1 Image City proper Metropolitan area Urban area[7]
## 2 Shanghai China 24,256,800[8] 34,750,000[9] 23,416,000[a]
## 3 Karachi Pakistan 23,500,000[10] 25,400,000[11] 25,400,000
## 4 Beijing China 21,516,000[12] 24,900,000[13] 21,009,000
## 5 Dhaka Bangladesh 16,970,105[14] 15,669,000 18,305,671[15][not in citation given]
## 6 Delhi India 16,787,941[16] 24,998,000 21,753,486[17]
From there, the cleanup might be like:
cities <- cities[[1]][-1, ]
names(cities) <- c("City", "Nation", "Image", "Pop_City", "Pop_Metro", "Pop_Urban")
cities["Image"] <- NULL
head(cities)
cities[] <- lapply(cities, function(x) type.convert(gsub("\\[.*|,", "", x)))
head(cities)
# City Nation Pop_City Pop_Metro Pop_Urban
# 2 Shanghai China 24256800 34750000 23416000
# 3 Karachi Pakistan 23500000 25400000 25400000
# 4 Beijing China 21516000 24900000 21009000
# 5 Dhaka Bangladesh 16970105 15669000 18305671
# 6 Delhi India 16787941 24998000 21753486
# 7 Lagos Nigeria 16060303 13123000 21000000
str(cities)
# 'data.frame': 163 obs. of 5 variables:
# $ City : Factor w/ 162 levels "Abidjan","Addis Ababa",..: 133 74 12 41 40 84 66 148 53 102 ...
# $ Nation : Factor w/ 59 levels "Afghanistan",..: 13 41 13 7 25 40 54 31 13 25 ...
# $ Pop_City : num 24256800 23500000 21516000 16970105 16787941 ...
# $ Pop_Metro: int 34750000 25400000 24900000 15669000 24998000 13123000 13520000 37843000 44259000 17712000 ...
# $ Pop_Urban: num 23416000 25400000 21009000 18305671 21753486 ...
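To see what the `gsub()` pattern in the cleanup does, here is a small sketch on three values copied from the table output above. It uses `as.numeric()` instead of `type.convert()` to keep the example minimal.

```r
raw <- c("24,256,800[8]",
         "15,669,000",
         "18,305,671[15][not in citation given]")

# "\\[.*" removes everything from the first "[" to the end of the string
# (the footnote markers); "," removes the thousands separators
cleaned <- gsub("\\[.*|,", "", raw)
pop <- as.numeric(cleaned)

print(cleaned)
print(pop)
```

Because `.*` is greedy, a value with several bracketed notes is cut at the first bracket, which is what is wanted here.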

Add a column to a dataframe with values based on another column [duplicate]

I have a dataframe with a column called Province and I need to add a new column called Region. The value is based on the Province column. Here is the dataframe:
Province
1 Alberta
2 Manitoba
3 Ontario
4 British Columbia
5 Nova Scotia
6 New Brunswick
7 Quebec
Output:
Province Region
1 Alberta Prairies
2 Manitoba Prairies
3 Ontario Central
4 British Columbia Pacific
5 Nova Scotia East
6 New Brunswick East
7 Quebec East
I tried this code in R and it is not working.
Region <- as.character(Province)
if (length(grep("British Comlumbia", Province)) > 0) {
return("Pacific")
}
You can create vectors and do a step-wise replacement. This may not be an apt way but this will work.
Prairies <- c("Alberta","Manitoba")
Central <- c("Ontario")
Pacific <- c("British Columbia")
East <- c("Nova Scotia","New Brunswick","Quebec")
#make a copy of the column province
df$Region <- as.vector(df[,1])
#one by one replace the items based on your vectors
df$Region <- replace(df$Region, df$Region%in%Prairies, "Prairies")
df$Region <- replace(df$Region, df$Region%in%Central, "Central")
df$Region <- replace(df$Region, df$Region%in%Pacific, "Pacific")
df$Region <- replace(df$Region, df$Region%in%East, "East")
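An alternative sketch that does the mapping in one step with a named lookup vector. The name `region_of` is hypothetical; the provinces and regions are the ones from the question.

```r
df <- data.frame(
  Province = c("Alberta", "Manitoba", "Ontario", "British Columbia",
               "Nova Scotia", "New Brunswick", "Quebec"),
  stringsAsFactors = FALSE
)

# Names are provinces, values are the regions they belong to
region_of <- c(
  "Alberta"          = "Prairies",
  "Manitoba"         = "Prairies",
  "Ontario"          = "Central",
  "British Columbia" = "Pacific",
  "Nova Scotia"      = "East",
  "New Brunswick"    = "East",
  "Quebec"           = "East"
)

# Indexing by name looks up every province at once;
# a province missing from the lookup would come back as NA
df$Region <- unname(region_of[df$Province])
print(df)
```

A misspelled province shows up as `NA` in the new column rather than being silently left unchanged, which makes typos easier to spot.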

How to select specific rows from a split list in R based on column condition

I am new to R and to programming in general and am looking for feedback on how to approach what is probably a fairly simple problem in R.
I have the following dataset:
df <- data.frame(county = rep(c("QU","AN","GY"), 3),
park = (c("Downtown","Queens", "Oakville","Squirreltown",
"Pinhurst", "GarbagePile","LottaTrees","BigHill",
"Jaynestown")),
hectares = c(12,42,6,18,92,6,4,52,12))
df<-transform(df, parkrank = ave(hectares, county,
FUN = function(x) rank(x, ties.method = "first")))
Which returns a dataframe looking like this:
county park hectares parkrank
1 QU Downtown 12 2
2 AN Queens 42 1
3 GY Oakville 6 1
4 QU Squirreltown 18 3
5 AN Pinhurst 92 3
6 GY GarbagePile 6 2
7 QU LottaTrees 4 1
8 AN BigHill 52 2
9 GY Jaynestown 12 3
I want to use this to create a two-column data frame that lists each county and the name of the park with a specific rank (e.g. if I pass "2" as an argument when I call my function, it shows the second biggest park in each county).
I am very new to R and programming and have spent hours looking over the built in R help files and similar questions here on stack overflow but I am clearly missing something. Can anyone give a simple example of where to begin? It seems like I should be using split then lapply or maybe tapply, but everything I try leaves me very confused :(
Thanks.
Try,
df2 <- function(A,x) {
# A is the name of the data.frame() and x is the rank No
df <- A[A[,4]==x,]
return(df)
}
> df2(df,2)
county park hectares parkrank
1 QU Downtown 12 2
6 GY GarbagePile 6 2
8 AN BigHill 52 2
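If only the two columns asked for in the question are needed, a variation along the same lines could look like this. It is a sketch: the function name `park_at_rank` is hypothetical, and it selects columns by name rather than by position.

```r
df <- data.frame(
  county = rep(c("QU", "AN", "GY"), 3),
  park = c("Downtown", "Queens", "Oakville", "Squirreltown", "Pinhurst",
           "GarbagePile", "LottaTrees", "BigHill", "Jaynestown"),
  hectares = c(12, 42, 6, 18, 92, 6, 4, 52, 12),
  stringsAsFactors = FALSE
)

# Rank parks by size within each county (rank 1 = smallest)
df <- transform(df, parkrank = ave(hectares, county,
                FUN = function(x) rank(x, ties.method = "first")))

# Return county and park for the rows with the requested rank
park_at_rank <- function(data, rank_no) {
  hit <- data[data$parkrank == rank_no, c("county", "park")]
  rownames(hit) <- NULL  # renumber rows from 1
  hit
}

res <- park_at_rank(df, 2)
print(res)
```

Note that `rank()` counts upward from the smallest value, so rank 2 is the second smallest park; ranking on `-hectares` would make rank 2 the second biggest.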

Resources