Add new column to the existing data set - r

Date City Temp
1/1/2012 Liverpool 10
1/2/2012 Madrid 20
1/3/2012 Milan 40
1/4/2012 Istanbul 35
1/5/2012 Munich 10
I need to add another column to this data set, named Country. If df$City is Madrid, Country needs to be Spain. I know this is a very small data set, but I need to be able to do this programmatically in R.
I would like my new data frame to look like this:
Date City Temp Country
--------------------------------------
1/1/2012 Liverpool 10 England
1/2/2012 Madrid 20 Spain
1/3/2012 Milan 40 Italy
1/4/2012 Istanbul 35 Turkey
1/5/2012 Munich 10 Germany
Any pointers how I would do this in R?

One way, using the exact data you provided, is:
df <- read.table(text= " Date City Temp
1/1/2012 Liverpool 10
1/2/2012 Madrid 20
1/3/2012 Milan 40
1/4/2012 Istanbul 35
1/5/2012 Munich 10",header=TRUE)
df$Country <- ifelse(df$City == "Liverpool", "England",
ifelse(df$City == "Madrid", "Spain",
ifelse(df$City == "Milan", "Italy",
ifelse(df$City == "Istanbul", "Turkey", "Germany") )))
However, I am assuming you may have more cities and countries, in which case a lookup table joined with merge() is easier to maintain, something like:
countrydf <- read.table(text= " City Country
Liverpool England
Madrid Spain
Milan Italy
Istanbul Turkey
Munich Germany",header=TRUE,stringsAsFactors=FALSE)
merge(df,countrydf, by="City")
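Note that merge() sorts the result by the key column by default, so the rows come back ordered by City rather than by Date. If the original row order matters, a named-vector lookup is one alternative. The following is a minimal sketch; the name country_of is chosen here for illustration and is not part of the original answer:

```r
# Rebuild the example data with City as plain character strings
df <- data.frame(
  Date = c("1/1/2012", "1/2/2012", "1/3/2012", "1/4/2012", "1/5/2012"),
  City = c("Liverpool", "Madrid", "Milan", "Istanbul", "Munich"),
  Temp = c(10, 20, 40, 35, 10),
  stringsAsFactors = FALSE
)

# Named vector: names are cities, values are countries (illustrative name)
country_of <- c(Liverpool = "England", Madrid = "Spain", Milan = "Italy",
                Istanbul = "Turkey", Munich = "Germany")

# as.character() guards against City being a factor, where indexing
# would silently use the integer codes instead of the city names
df$Country <- unname(country_of[as.character(df$City)])

# Cities missing from the lookup end up as NA and can be listed like this
unmapped <- df$City[is.na(df$Country)]
```

Rows keep their original order, and any city absent from country_of shows up in unmapped instead of being dropped.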
Note: I had a look at the maps package, whose world.cities data set could be useful to you:
library(maps)
data(world.cities)
head(world.cities)
world.cities[world.cities$name == "Istanbul" ,]

Without knowing how the cities are mapped to countries in your situation (i.e., are they mapped in a list, vector, data.frame, or something else altogether?), it's hard to guess what the right answer is for you. Here is one way, where the city-country mapping is in a list:
df <- read.table(text="Date City Temp
1/1/2012 Liverpool 10
1/2/2012 Madrid 20
1/3/2012 Milan 40
1/4/2012 Istanbul 35
1/5/2012 Munich 10", header=TRUE)
city.countries <- list(England=c('Liverpool', 'London'),
Spain='Madrid',
Italy='Milan',
Turkey='Istanbul',
Germany='Munich')
df <- transform(df, Country = with(stack(city.countries), ind[match(City, values)]))
# Date City Temp Country
# 1 1/1/2012 Liverpool 10 England
# 2 1/2/2012 Madrid 20 Spain
# 3 1/3/2012 Milan 40 Italy
# 4 1/4/2012 Istanbul 35 Turkey
# 5 1/5/2012 Munich 10 Germany

Related

Selecting a column with a dot in R (nested object)

I'm new to R and I'm not sure how to rephrase the question, but basically, I have this dataset coming from the following code:
library(dplyr)
data_url <- 'https://prod-scores-api.ausopen.com/year/2021/stats'
dat <- jsonlite::fromJSON(data_url)
men_aces <- bind_rows(dat$statistics$rankings[[1]]$players[1])
men_aces_table <- dat$players %>%
  inner_join(men_aces, by = c('uuid' = 'player_id')) %>%
  select(full_name, nationality)
Which resulted in this data frame:
full_name nationality.uuid nationality.name nationality.code
1 Novak Djokovic 99da9b29-eade-4ac3-a7b0-b0b8c2192df7 Serbia SRB
2 Alexander Zverev 99d83e85-3173-4ccc-9d91-8368720f4a47 Germany GER
3 Milos Raonic 07779acb-6740-4b26-a664-f01c0b54b390 Canada CAN
4 Daniil Medvedev fa925d2d-337f-4074-a0bd-afddb38d66e1 Russia RUS
5 Nick Kyrgios 9b11f78c-47c1-43c4-97d0-ba3381eb9f07 Australia AUS
nationality is a nested object inside the player object (see the JSON at the URL) and contains the properties shown above (uuid, name, code). If I select the full_name property, I get its value (of type character) straight back.
I'm not sure how to select just name from that nested data frame (nationality) and rename it to country.
My expected outcome is:
full_name country
1 Novak Djokovic Serbia
2 Alexander Zverev Germany
3 Milos Raonic Canada
4 Daniil Medvedev Russia
5 Nick Kyrgios Australia
I would appreciate some help. Sorry I was unclear.
Use purrr::pmap_chr
library(tidyverse)
dat$players %>%
inner_join(men_aces, by = c('uuid' = 'player_id')) %>%
select(full_name, nationality) %>%
mutate(nationality = pmap_chr(nationality, ~ ..2))
full_name nationality
1 Novak Djokovic Serbia
2 Alexander Zverev Germany
3 Milos Raonic Canada
4 Daniil Medvedev Russia
5 Nick Kyrgios Australia
6 Alexander Bublik Kazakhstan
7 Reilly Opelka United States of America
8 Jiri Vesely Czech Republic
9 Andrey Rublev Russia
10 Lloyd Harris South Africa
11 Aslan Karatsev Russia
12 Taylor Fritz United States of America
13 Matteo Berrettini Italy
14 Grigor Dimitrov Bulgaria
15 Feliciano Lopez Spain
16 Stefanos Tsitsipas Greece
17 Felix Auger-Aliassime Canada
18 Thanasi Kokkinakis Australia
19 Ugo Humbert France
20 Borna Coric Croatia
You could do:
bind_cols(full_name = dat$players$full_name, country = dat$players$nationality$name)
# A tibble: 169 x 2
full_name country
<chr> <chr>
1 Novak Djokovic Serbia
2 Alexander Zverev Germany
3 Milos Raonic Canada
4 Daniil Medvedev Russia
5 Nick Kyrgios Australia
6 Alexander Bublik Kazakhstan
7 Reilly Opelka United States of America
8 Jiri Vesely Czech Republic
9 Andrey Rublev Russia
10 Lloyd Harris South Africa
Just add this line at the end:
newdf <- data.frame(full_name = men_aces_table$full_name, country = men_aces_table$nationality$name)
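The answers above work because nationality is itself a data frame stored inside a single column, so its fields are reached with a second $. A minimal sketch on toy data follows; the object name players and the placeholder ids are assumptions made for illustration, not values from the API:

```r
# Outer data frame with one ordinary column
players <- data.frame(
  full_name = c("Novak Djokovic", "Alexander Zverev"),
  stringsAsFactors = FALSE
)

# Store a whole data frame in one column, mimicking what
# jsonlite::fromJSON() produces for a nested JSON object
players$nationality <- data.frame(
  uuid = c("id-1", "id-2"),        # placeholder ids, not real API values
  name = c("Serbia", "Germany"),
  code = c("SRB", "GER"),
  stringsAsFactors = FALSE
)

# Reach into the nested data frame with a second `$` and rename on the way out
result <- data.frame(
  full_name = players$full_name,
  country   = players$nationality$name,
  stringsAsFactors = FALSE
)
```

When many nested columns need unpacking at once, jsonlite::flatten() may be worth a look as well, since it turns them into ordinary columns such as nationality.name.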

How to use If function in R to create a column using multiple conditions

I am not familiar with R and need your help with this issue.
I have a data frame named df with 25 variables (25 columns). A simplified version:
name experience Club age Position
luc 2 FCB 18 Goalkeeper
jean 9 Real 26 midfielder
ronaldo 14 FCB 32 Goalkeeper
jean 9 Real 26 midfielder
messi 11 Liverpool 35 midfielder
tevez 6 Chelsea 27 Attack
inzaghi 9 Juve 34 Defender
kwfni 17 Bayern 40 Attack
Blabla 9 Real 25 midfielder
wdfood 11 Liverpool 33 midfielder
player2 7 Chelsea 28 Attack
player3 10 Juve 34 Defender
fgh 17 Bayern 40 Attack
I would like to add a column named "country" to this data frame. The new column depends on the club, according to this mapping:
Juve Italy
FCB Spain
Real Spain
Chelsea England
Liverpool England
Bayern Germany
So, for example, if the club is FCB or Real, the value in country is Spain.
The output of df$Country should be as follows:
Country
Spain
Spain
Spain
Spain
England
England
Italy
Germany
Spain
England
England
Italy
Germany
The code I started to write is the following:
df$country=ifelse(df$Club=="FCB","spain", df$Club=="Real","Spain" ......)
But it does not work.
My real data set has more than 250 different values in the "Club" column and more than 30 in "Country", so doing this manually would take too long.
Could you help me with this, please?
Do you know how to use if-else statements inside for loops? This would be the simplest way out.
Something like this:
df <- data.frame(name = c("a", "b", "c"),
                 Club = c("FCB", "Real", "Liverpool"),
                 stringsAsFactors = FALSE)
# Create the column first; clubs that match no condition stay NA
df$country <- NA_character_
for (i in seq_len(nrow(df))) {
  if (df$Club[i] %in% c("FCB", "Real")) {
    df$country[i] <- "Spain"
  } else if (df$Club[i] == "Liverpool") {
    df$country[i] <- "England"
  }
}
df
# name Club country
# 1 a FCB Spain
# 2 b Real Spain
# 3 c Liverpool England
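With more than 250 clubs, writing one branch per club becomes hard to maintain. A lookup table combined with match() is one way to avoid that. The sketch below uses the club-to-country mapping from the question; the table name club_country is an assumption made for illustration:

```r
# A few rows from the question's data
df <- data.frame(
  name = c("luc", "jean", "messi", "tevez", "inzaghi", "kwfni"),
  Club = c("FCB", "Real", "Liverpool", "Chelsea", "Juve", "Bayern"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per club (in practice this could be read from a file)
club_country <- data.frame(
  Club    = c("Juve", "FCB", "Real", "Chelsea", "Liverpool", "Bayern"),
  Country = c("Italy", "Spain", "Spain", "England", "England", "Germany"),
  stringsAsFactors = FALSE
)

# match() returns, for each club in df, its row position in the lookup table;
# clubs missing from the table yield NA instead of an error
df$Country <- club_country$Country[match(df$Club, club_country$Club)]
```

Adding a club then means adding one row to club_country, with no change to the code.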

Subtract two strings row-wise in dplyr for an R data frame

I have two columns and need a third that subtracts one from the other, using dplyr.
Here is a very simple example for the sake of clarity. A split/separate approach is not valid in my case.
x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
cities <- data.frame(x,y)
cities
x y
1 FRANCE Paris FRANCE
2 GERMANY Berlin GERMANY
3 RUSSIA Moscow RUSSIA
Expected results:
x y new
1 FRANCE Paris FRANCE Paris
2 GERMANY Berlin GERMANY Berlin
3 RUSSIA Moscow RUSSIA Moscow
What I've tried so far (to no avail):
This returns the same df with the country in the new column rather than the city (the opposite of what I want):
cities %>% mutate(new = setdiff(x,y))
x y new
1 FRANCE Paris FRANCE FRANCE
2 GERMANY Berlin GERMANY GERMANY
3 RUSSIA Moscow RUSSIA RUSSIA
Conversely, setdiff with the arguments reversed just returns the original y values:
cities %>% mutate(new = setdiff(y,x))
x y new
1 FRANCE Paris FRANCE Paris FRANCE
2 GERMANY Berlin GERMANY Berlin GERMANY
3 RUSSIA Moscow RUSSIA Moscow RUSSIA
Using gsub to remove the country worked only for the first row and issued a warning:
cities %>% mutate(new = gsub(x,"",y))
Warning message:
In gsub(x, "", y) :
argument 'pattern' has length > 1 and only the first element will be used
x y new
1 FRANCE Paris FRANCE Paris
2 GERMANY Berlin GERMANY Berlin GERMANY
3 RUSSIA Moscow RUSSIA Moscow RUSSIA
We can use stringr::str_replace:
library(tidyverse)
cities %>%
mutate_if(is.factor, as.character) %>%
mutate(new = trimws(str_replace(y, x, "")))
# x y new
#1 FRANCE Paris FRANCE Paris
#2 GERMANY Berlin GERMANY Berlin
#3 RUSSIA Moscow RUSSIA Moscow
Here is a solution with base R:
x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
cities <- data.frame(x, y, stringsAsFactors = FALSE)
cities$new <- mapply(function(a, b) {
  setdiff(strsplit(a, ' ')[[1]], strsplit(b, ' ')[[1]])
}, cities$y, cities$x)
Output:
x y new
1 FRANCE Paris FRANCE Paris
2 GERMANY Berlin GERMANY Berlin
3 RUSSIA Moscow RUSSIA Moscow
Hope this helps!
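The gsub attempt in the question fails because the pattern argument of gsub is not vectorised: only its first element is used. One way around this in base R is to pair each pattern with its own string via mapply. The following is a minimal sketch; the helper name remove_literal is hypothetical:

```r
x <- c("FRANCE", "GERMANY", "RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
cities <- data.frame(x, y, stringsAsFactors = FALSE)

# Hypothetical helper: remove the first literal occurrence of `pattern` from `text`.
# fixed = TRUE treats the country name as plain text, not as a regular expression.
remove_literal <- function(pattern, text) sub(pattern, "", text, fixed = TRUE)

# mapply() calls the helper once per row, pairing x[i] with y[i];
# trimws() drops the space left behind after the removal
cities$new <- trimws(mapply(remove_literal, cities$x, cities$y, USE.NAMES = FALSE))
```

Unlike the strsplit approach, this also copes with multi-word values, since it removes the exact substring rather than comparing individual words.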

Convert one column into multiple columns

I am a novice. I have a data set with one column and many rows. I want to convert this column into 5 columns. For example my data set looks like this:
Column
----
City
Nation
Area
Metro Area
Urban Area
Shanghai
China
24,000,000
1230040
4244234
New york
America
343423
23423434
343434
Etc
The output should look like this:
City | Nation | Area | Metro Area | Urban Area
----- ------- ------ ------------ -----------
Shanghai China 24,000,000 1230040 4244234
New york America 343423 23423434 343434
The first 5 rows of the data set (City, Nation, Area, etc.) need to be the names of the 5 columns, and I want the rest of the data to be populated under these 5 columns. Please help.
Here is a one-liner (assuming your column is character, i.e. after running df$column <- as.character(df$column)):
setNames(data.frame(matrix(unlist(df[-c(1:5),]), ncol = 5, byrow = TRUE)), c(unlist(df[1:5,])))
# City Nation Area Metro_Area Urban_Area
#1 Shanghai China 24,000,000 1230040 4244234
#2 New_york America 343423 23423434 343434
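The one-liner can be unpacked into separate steps, which makes it easier to check that the data really comes in blocks of five. This is a sketch under the assumption that the single column is called Column, as in the question, and that spaces in the header names should become underscores:

```r
# The single-column input from the question
df <- data.frame(
  Column = c("City", "Nation", "Area", "Metro Area", "Urban Area",
             "Shanghai", "China", "24,000,000", "1230040", "4244234",
             "New york", "America", "343423", "23423434", "343434"),
  stringsAsFactors = FALSE
)

n_cols <- 5
header <- df$Column[seq_len(n_cols)]      # first five values become column names
values <- df$Column[-seq_len(n_cols)]     # everything else is data

# Fail early if the data does not divide evenly into rows of five
stopifnot(length(values) %% n_cols == 0)

# byrow = TRUE fills the matrix row by row, so each block of five is one record
wide <- as.data.frame(matrix(values, ncol = n_cols, byrow = TRUE),
                      stringsAsFactors = FALSE)
names(wide) <- gsub(" ", "_", header)

# Strip thousands separators and convert the three numeric columns
wide[3:5] <- lapply(wide[3:5], function(col) as.numeric(gsub(",", "", col)))
```

The explicit length check is a design choice: without it, matrix() would recycle values with only a warning and produce a misaligned table.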
I'm going to go out on a limb and guess that the data you're after is from the URL: https://en.wikipedia.org/wiki/List_of_largest_cities.
If this is the case, I would suggest you actually try re-reading the data (not sure how you got the data into R in the first place) since that would probably make your life easier.
Here's one way to read the data in:
library(rvest)
URL <- "https://en.wikipedia.org/wiki/List_of_largest_cities"
XPATH <- '//*[@id="mw-content-text"]/table[2]'
cities <- URL %>%
read_html() %>%
html_nodes(xpath=XPATH) %>%
html_table(fill = TRUE)
Here's what the data currently looks like. It still needs to be cleaned up (notice that some of the column names came from merged "rowspan" cells and the like):
head(cities[[1]])
## City Nation Image Population Population Population
## 1 Image City proper Metropolitan area Urban area[7]
## 2 Shanghai China 24,256,800[8] 34,750,000[9] 23,416,000[a]
## 3 Karachi Pakistan 23,500,000[10] 25,400,000[11] 25,400,000
## 4 Beijing China 21,516,000[12] 24,900,000[13] 21,009,000
## 5 Dhaka Bangladesh 16,970,105[14] 15,669,000 18,305,671[15][not in citation given]
## 6 Delhi India 16,787,941[16] 24,998,000 21,753,486[17]
From there, the cleanup might be like:
cities <- cities[[1]][-1, ]
names(cities) <- c("City", "Nation", "Image", "Pop_City", "Pop_Metro", "Pop_Urban")
cities["Image"] <- NULL
head(cities)
cities[] <- lapply(cities, function(x) type.convert(gsub("\\[.*|,", "", x), as.is = FALSE))
head(cities)
# City Nation Pop_City Pop_Metro Pop_Urban
# 2 Shanghai China 24256800 34750000 23416000
# 3 Karachi Pakistan 23500000 25400000 25400000
# 4 Beijing China 21516000 24900000 21009000
# 5 Dhaka Bangladesh 16970105 15669000 18305671
# 6 Delhi India 16787941 24998000 21753486
# 7 Lagos Nigeria 16060303 13123000 21000000
str(cities)
# 'data.frame': 163 obs. of 5 variables:
# $ City : Factor w/ 162 levels "Abidjan","Addis Ababa",..: 133 74 12 41 40 84 66 148 53 102 ...
# $ Nation : Factor w/ 59 levels "Afghanistan",..: 13 41 13 7 25 40 54 31 13 25 ...
# $ Pop_City : num 24256800 23500000 21516000 16970105 16787941 ...
# $ Pop_Metro: int 34750000 25400000 24900000 15669000 24998000 13123000 13520000 37843000 44259000 17712000 ...
# $ Pop_Urban: num 23416000 25400000 21009000 18305671 21753486 ...

Summarize data using doBy package at region level

I have a dataset Data as below,
Region Country Market Price
EUROPE France France 30.4502
EUROPE Israel Israel 5.14110965
EUROPE France France 8.99665
APAC CHINA CHINA 2.6877232
APAC INDIA INDIA 60.9004
AFME SL SL 54.1729685
LA BRAZIL BRAZIL 56.8606917
EUROPE RUSSIA RUSSIA 11.6843732
APAC BURMA BURMA 63.5881232
AFME SA SA 115.0733685
I would like to summarize the data at the Region level and get the sum of Price for every Region.
I want the output to look like the one below.
Data Output
Region Country Price
EUROPE France 30.4502
EUROPE Israel 5.14110965
EUROPE France 8.99665
EUROPE RUSSIA 11.6843732
Europe 56.27233285
APAC BURMA 63.5881232
APAC CHINA 2.6877232
APAC INDIA 60.9004
Apac 127.1762464
AFME BAHARAIN 54.1729685
AFME SA 115.0733685
AFME 169.246337
LA BRAZIL 56.8606917
LA 56.8606917
I have used the summaryBy function of the doBy package. I have tried the code below:
myfun1 <- function(x) c(s = sum(x))
DB <- summaryBy(Price ~ Region + Country, data = Data, FUN = myfun1)
Any help in this regard is very much appreciated.
You can do this by using dplyr to generate a summary table (the data frame is called data here):
library(dplyr)
totals <- data %>% group_by(Region) %>% summarise(Country="",Price=sum(Price))
And then merging the summary with the rest of the data:
summary <- rbind(data[-3], totals)
Then you can sort by Region to put the summary with the region:
summary <- summary %>% arrange(Region)
Output:
Region Country Price
1 AFME SL 54.1730
2 AFME SA 115.0734
3 AFME 169.2463
4 APAC CHINA 2.6877
5 APAC INDIA 60.9004
6 APAC BURMA 63.5881
7 APAC 127.1762
8 EUROPE France 30.4502
9 EUROPE Israel 5.1411
10 EUROPE France 8.9967
11 EUROPE RUSSIA 11.6844
12 EUROPE 56.2723
13 LA BRAZIL 56.8607
14 LA 56.8607
You have to split the data by the Region factor and sum Price for each level:
lapply(split(data, data$Region), function(x) sum(x$Price))
Or, if you need to present the result as you have shown:
totals <- lapply(split(data, data$Region), function(x) {
  rbind(x, data.frame(Region = unique(x$Region), Country = "", Market = "", Price = sum(x$Price)))
})
do.call(rbind, totals)
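If only the per-region totals are needed, without interleaving them with the detail rows, base R's aggregate() with a formula is a compact option. The sketch below rebuilds the question's data (the Market column is omitted for brevity) under the name Data:

```r
Data <- data.frame(
  Region  = c("EUROPE", "EUROPE", "EUROPE", "APAC", "APAC",
              "AFME", "LA", "EUROPE", "APAC", "AFME"),
  Country = c("France", "Israel", "France", "CHINA", "INDIA",
              "SL", "BRAZIL", "RUSSIA", "BURMA", "SA"),
  Price   = c(30.4502, 5.14110965, 8.99665, 2.6877232, 60.9004,
              54.1729685, 56.8606917, 11.6843732, 63.5881232, 115.0733685),
  stringsAsFactors = FALSE
)

# One row per Region, with Price summed within each group
totals <- aggregate(Price ~ Region, data = Data, FUN = sum)
```

The resulting totals data frame has one row per region and can be appended to the detail rows with rbind() in the same way as in the answers above.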

Resources