Add a column to a dataframe with values based on another column [duplicate] - r

This question already has answers here:
several substitutions in one line R
(3 answers)
Closed 7 years ago.
I have a dataframe with a column called Province and I need to add a new column called Region. The value is based on the Province column. Here is the dataframe:
Province
1 Alberta
2 Manitoba
3 Ontario
4 British Columbia
5 Nova Scotia
6 New Brunswick
7 Quebec
Output:
Province Region
1 Alberta Prairies
2 Manitoba Prairies
3 Ontario Central
4 British Columbia Pacific
5 Nova Scotia East
6 New Brunswick East
7 Quebec East
I tried this code in R and it is not working.
Region <- as.character(Province)
if (length(grep("British Comlumbia", Province)) > 0) {
return("Pacific")
}

You can define a vector of provinces for each region and then replace the values step by step. It is not the most elegant approach, but it works.
Prairies <- c("Alberta","Manitoba")
Central <- c("Ontario")
Pacific <- c("British Columbia")
East <- c("Nova Scotia","New Brunswick","Quebec")
#make a copy of the Province column as a character vector
df$Region <- as.character(df$Province)
#one by one, replace the items based on your vectors
df$Region <- replace(df$Region, df$Region%in%Prairies, "Prairies")
df$Region <- replace(df$Region, df$Region%in%Central, "Central")
df$Region <- replace(df$Region, df$Region%in%Pacific, "Pacific")
df$Region <- replace(df$Region, df$Region%in%East, "East")
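As an alternative to replacing values one region at a time, the mapping can be sketched as a single named lookup vector. This is only a minimal sketch: the name `region_lookup` is made up for illustration, and the data frame is rebuilt from the question's sample data.

```r
# Sample data from the question
df <- data.frame(
  Province = c("Alberta", "Manitoba", "Ontario", "British Columbia",
               "Nova Scotia", "New Brunswick", "Quebec"),
  stringsAsFactors = FALSE
)

# Named vector: names are provinces, values are regions
# (region_lookup is an illustrative name, not from the original answer)
region_lookup <- c(
  "Alberta"          = "Prairies",
  "Manitoba"         = "Prairies",
  "Ontario"          = "Central",
  "British Columbia" = "Pacific",
  "Nova Scotia"      = "East",
  "New Brunswick"    = "East",
  "Quebec"           = "East"
)

# Index the lookup by province name; provinces missing from the lookup become NA
df$Region <- unname(region_lookup[df$Province])
df
```

A misspelled province name in the lookup shows up as an NA in Region instead of silently keeping the province name, which makes typos easier to spot.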

Related

How can I add the country name to a dataset based on city name and population? [duplicate]

This question already has answers here:
extracting country name from city name in R
(3 answers)
Closed 7 months ago.
I have a dataset containing information on a range of cities, but there is no column which says what country the city is located in. In order to perform the analysis, I need to add an extra column which has the name of the country.
population city
500,000 Oslo
750,000 Bristol
500,000 Liverpool
1,000,000 Dublin
I expect the output to look like this:
population city country
500,000 Oslo Norway
750,000 Bristol England
500,000 Liverpool England
1,000,000 Dublin Ireland
How can I add a column of country names based on the city and population to a large dataset in R?
I am adapting Tom Hoel's answer, as suggested by Ian Campbell. If this is selected I am happy to mark it as community wiki.
library(maps)
library(dplyr)
data("world.cities")
df <- readr::read_table("population city
500,000 Oslo
750,000 Bristol
500,000 Liverpool
1,000,000 Dublin")
df |>
  inner_join(
    select(world.cities, name, country.etc, pop),
    by = c("city" = "name")
  ) |>
  group_by(city) |>
  filter(
    abs(pop - population) == min(abs(pop - population))
  )
# A tibble: 4 x 4
# Groups: city [4]
# population city country.etc pop
# <dbl> <chr> <chr> <int>
# 1 500000 Oslo Norway 821445
# 2 750000 Bristol UK 432967
# 3 500000 Liverpool UK 468584
# 4 1000000 Dublin Ireland 1030431
As others have stated, cities with these names exist in other countries as well.
library(tidyverse)
library(maps)
data("world.cities")
df <- read_table("population city
500,000 Oslo
750,000 Bristol
500,000 Liverpool
1,000,000 Dublin")
df %>%
  merge(., world.cities %>%
          select(name, country.etc),
        by.x = "city",
        by.y = "name")
# A tibble: 7 × 3
city population country.etc
<chr> <dbl> <chr>
1 Bristol 750000 UK
2 Bristol 750000 USA
3 Dublin 1000000 USA
4 Dublin 1000000 Ireland
5 Liverpool 500000 UK
6 Liverpool 500000 Canada
7 Oslo 500000 Norway
I think your best bet would be to add a new column called country to your dataset and fill it in yourself. This falls under the data preparation phase of the CRISP-DM process, so it is not uncommon. If that does not answer your question, please let me know and I will do my best to help.
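Filling in the country by hand could be sketched as a small lookup table that is matched against the city column. This is only an illustration under stated assumptions: `cities` and `country_lookup` are made-up names, and the city-to-country pairs are the ones given in the question's expected output.

```r
# Sample data from the question
cities <- data.frame(
  population = c(500000, 750000, 500000, 1000000),
  city       = c("Oslo", "Bristol", "Liverpool", "Dublin"),
  stringsAsFactors = FALSE
)

# Hand-maintained lookup table (hypothetical name), one row per city
country_lookup <- data.frame(
  city    = c("Oslo", "Bristol", "Liverpool", "Dublin"),
  country = c("Norway", "England", "England", "Ireland"),
  stringsAsFactors = FALSE
)

# match() returns the row of the lookup for each city, or NA if the city is absent
cities$country <- country_lookup$country[match(cities$city, country_lookup$city)]
cities
```

Because the lookup is maintained manually, ambiguous names such as Bristol or Dublin resolve to whichever country you entered, at the cost of having to list every city yourself.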

Recode by comparing a value to numbers in a vector

I want to code the values in a column into fewer values in another column.
For example,
if the value in zipcode column is one of the following c(90272,90049,90077,90210,90046,90069,90024,90025,90048,90036,90038,90028),
code it as "west" in district column.
How can I do it in R?
You can use the ifelse() function.
Set up the data in a dataframe:
df <- data.frame(zipcode = c(90272,90049,90077,90210,90046,90069,90024,90025,90048,90036,90038,90028))
Then use ifelse() to code a new value based on the values of zipcode.
df$district <- ifelse(df$zipcode %in% c(90272,90049,90077,90210,90046,90069,90024,90025,90048,90036,90038,90028),
"west",
NA)
> df
zipcode district
1 90272 west
2 90049 west
3 90077 west
4 90210 west
5 90046 west
6 90069 west
7 90024 west
8 90025 west
9 90048 west
10 90036 west
11 90038 west
12 90028 west
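If more than one district is needed, nesting ifelse() calls gets unwieldy. A possible sketch uses a lookup table with match() instead. The second group of zip codes below (`other_zips`) and the label "other" are invented purely for illustration and are not from the question.

```r
# Zip codes from the question, coded as "west"
west_zips <- c(90272, 90049, 90077, 90210, 90046, 90069,
               90024, 90025, 90048, 90036, 90038, 90028)
# Hypothetical second group, made up for this example only
other_zips <- c(10001, 10002)

# One row per zip code with its district label
lookup <- data.frame(
  zipcode  = c(west_zips, other_zips),
  district = c(rep("west", length(west_zips)),
               rep("other", length(other_zips))),
  stringsAsFactors = FALSE
)

# Small test data frame: one zip per group, plus one that is in neither
df <- data.frame(zipcode = c(90210, 10001, 12345))

# Zip codes that are not in the lookup get NA
df$district <- lookup$district[match(df$zipcode, lookup$zipcode)]
df
```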

Removing rows with NA value in column [duplicate]

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Closed 5 years ago.
I have a dataset.csv in R. I want to remove all NA values from the Rank columns.
The column is like this
Rank State
NA District of Columbia
1 Connecticut
2 New Jersey
3 Massachusetts
4 Maryland
5 New Hampshire
6 Virginia
7 New York
8 North Dakota
9 Alaska
10 Minnesota
11 Colorado
12 Washington
13 Rhode Island
14 Delaware
15 California
16 Illinois
17 Hawaii
18 Wyoming
19 Pennsylvania
20 Vermont
NA United States
21 Iowa
The dataframe of this CSV is called RacePerState
The code I have tried
subset(RacePerState, State!="United States" && State!="District of Columbia" && !="Puerto Rico")
RacePerState <- RacePerState[!(RacePerState$Rank=="NA"),]
But when I write the dataframe to a CSV, the data is still there.
Any help?
RacePerState <- subset(RacePerState, !is.na(Rank))
or
RacePerState <- RacePerState[!is.na(RacePerState$Rank), ]
or
RacePerState <- RacePerState[complete.cases(RacePerState), ]
or
library(dplyr)
# na.omit() drops rows with an NA in any column; assign the result to keep it
RacePerState <- RacePerState %>% na.omit()
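A likely reason the attempt in the question leaves the rows in place is that missing values cannot be found with `== "NA"`, and that the result of subset() was never assigned. The sketch below illustrates this on a shortened copy of the question's data; `cleaned` and `out_file` are illustrative names, and the assumption is that Rank was read in as a numeric column.

```r
# Shortened version of the data from the question
RacePerState <- data.frame(
  Rank  = c(NA, 1, 2, NA, 3),
  State = c("District of Columbia", "Connecticut", "New Jersey",
            "United States", "Iowa"),
  stringsAsFactors = FALSE
)

# Comparing with the string "NA" gives NA (not TRUE) for the missing ranks,
# so it cannot be used to pick those rows out
cmp <- RacePerState$Rank == "NA"

# is.na() is the reliable test; the result must be assigned to take effect
cleaned <- RacePerState[!is.na(RacePerState$Rank), ]

# Write the cleaned data frame, not the original one
out_file <- tempfile(fileext = ".csv")
write.csv(cleaned, out_file, row.names = FALSE)
```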

Adding data based on the values of two or more other columns in R [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 7 years ago.
I have the following data:
State Name Population
1 NY New York 1
2 NJ New Jersey 2
3 CA California 1
4 RI Rhode Island 1
5 NY New York 1
I want to use R to sum up the population column for all unique combination of the state and name columns. So the end result will be:
State Name Population
1 NJ New Jersey 2
2 NY New York 2
3 CA California 1
4 RI Rhode Island 1
Any help is greatly appreciated!
You can use the dplyr package to do something like this:
library(dplyr)
df %>% group_by(State, Name) %>% summarise(Population = sum(Population))
We can just use aggregate from base R
aggregate(Population~., df1, sum)
Or with data.table
library(data.table)
setDT(df1)[, list(Population = sum(Population)), .(State, Name)]
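For a self-contained check, the base R variant can be run on the question's sample data as sketched below. The data frame is rebuilt by hand here, and `totals` is just an illustrative name.

```r
# Sample data from the question
df <- data.frame(
  State      = c("NY", "NJ", "CA", "RI", "NY"),
  Name       = c("New York", "New Jersey", "California", "Rhode Island", "New York"),
  Population = c(1, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Sum Population within each unique State/Name combination
totals <- aggregate(Population ~ State + Name, data = df, FUN = sum)
totals
```

Naming the grouping columns explicitly (`State + Name`) rather than using `.` keeps the call safe if the data frame later gains additional columns.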

Make repeating character vector values

Hey there everyone just getting started with R, so I decided to make some data up with the eventual goal of superimposing it on top of a map.
Before I can get there I'm trying to add a name to my data to sort by Province.
Drugs <- c("Azin", "Prolof")
Provinces <- c("Ontario", "British Columbia", "Quebec")
Gender <- c("Female", "Male")
raw <- c(10,16,8,20,7,12,13,11,9,7,14,7)
yomom <- matrix(raw, nrow = 6, ncol = 2)
colnames(yomom) <- Drugs
bro <- data.frame(Gender, yomom)
idunno <- data.frame(Provinces, bro)
The first problem I've encountered is that the Provinces vector gets recycled across the rows, and I'm not sure how to make each province fill two consecutive rows in R. I'm basically trying to get it to skip a row.
Something like this?
idunno <- data.frame(Provinces=rep(Provinces,each=2), bro)
idunno
# Provinces Gender Azin Prolof
# 1 Ontario Female 10 13
# 2 Ontario Male 16 11
# 3 British Columbia Female 8 9
# 4 British Columbia Male 20 7
# 5 Quebec Female 7 14
# 6 Quebec Male 12 7
Read the documentation on rep(...)
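As a quick illustration of what that documentation covers, the `each` and `times` arguments of rep() repeat a vector in different orders. A minimal sketch (`by_each` and `by_times` are illustrative names):

```r
Provinces <- c("Ontario", "British Columbia", "Quebec")

# each = 2 repeats every element before moving on to the next one
by_each <- rep(Provinces, each = 2)

# times = 2 repeats the whole vector, which is the same order that
# recycling produced in the original data.frame() call
by_times <- rep(Provinces, times = 2)

by_each
by_times
```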
