How to use If function in R to create a column using multiple conditions - r

I am not familiar with R , I need your help for this issue ,
I have a data frame composed with 25 variables (25 columns) named df simplified
name experience Club age Position
luc 2 FCB 18 Goalkeeper
jean 9 Real 26 midfielder
ronaldo 14 FCB 32 Goalkeeper
jean 9 Real 26 midfielder
messi 11 Liverpool 35 midfielder
tevez 6 Chelsea 27 Attack
inzaghi 9 Juve 34 Defender
kwfni 17 Bayern 40 Attack
Blabla 9 Real 25 midfielder
wdfood 11 Liverpool 33 midfielder
player2 7 Chelsea 28 Attack
player3 10 Juve 34 Defender
fgh 17 Bayern 40 Attack
I would like to add a column to this data frame named "country".This new column takes into account different conditions .
Juve Italy
FCB Spain
Real Spain
Chelsea England
Liverpool England
Bayern Germany
So let say if the club is FCB or Real the value in country is Spain
the output of df$Country should be as follows
Country
Spain
Spain
Spain
Spain
England
England
Italy
Germany
Spain
England
England
Italy
Germany
The code I started to do is the following
df$country=ifelse(df$Club=="FCB","spain", df$Club=="Real","Spain" ......)
But it seems false .
knowing that my real data set has more than 250 different values in "club" column
and more than 30 in "Country"
doing that manually seems too long .
Could you help me in that point please .

Do you know how to use if-else statements inside for loops? This would be the simplest way out.
Something like this:
df <- data.frame(name = c("a", "b", "c"),
Club = c("FCB", "Real", "Liverpool"),
stringsAsFactors = FALSE)
for(i in 1:nrow(df)){
if(df$Club[i] == "FCB" | df$Club[i] == "Real"){
df$country[i] <- "Spain"
} else if(df$Club[i] == "Liverpool"){
df$country[i] <- "England"
} else{
df$country[i] <- NA
}
}
df
# name Club country
# 1 a FCB Spain
# 2 b Real Spain
# 3 c Liverpool England

Related

Selecting a column with a dot in R (nested object)

I'm new to R and I'm not sure how to rephrase the question, but basically, I have this dataset coming from the following code:
data_url <- 'https://prod-scores-api.ausopen.com/year/2021/stats'
dat <- jsonlite::fromJSON(data_url)
men_aces <- bind_rows(dat$statistics$rankings[[1]]$players[1])
men_aces_table <- dat$players %>%
inner_join(men_aces, by = c('uuid' = 'player_id')) %>% select(full_name, nationality)
Which resulted in this data frame:
full_name nationality.uuid nationality.name nationality.code
1 Novak Djokovic 99da9b29-eade-4ac3-a7b0-b0b8c2192df7 Serbia SRB
2 Alexander Zverev 99d83e85-3173-4ccc-9d91-8368720f4a47 Germany GER
3 Milos Raonic 07779acb-6740-4b26-a664-f01c0b54b390 Canada CAN
4 Daniil Medvedev fa925d2d-337f-4074-a0bd-afddb38d66e1 Russia RUS
5 Nick Kyrgios 9b11f78c-47c1-43c4-97d0-ba3381eb9f07 Australia AUS
nationality is the nested object inside the player object if you check the JSON url, it contains the above properties (uuid, name, code), if I select the full_name property I would get the value (which is of type character) right back.
I'm not sure how to select the name and from that data frame (nationality) and rename it to country.
My expected outcome is:
full_name country
1 Novak Djokovic Serbia
2 Alexander Zverev Germany
3 Milos Raonic Canada
4 Daniil Medvedev Russia
5 Nick Kyrgios Australia
I would appreciate some help. Sorry I was unclear.
Use purrr::pmap_chr
library(tidyverse)
dat$players %>%
inner_join(men_aces, by = c('uuid' = 'player_id')) %>%
select(full_name, nationality) %>%
mutate(nationality = pmap_chr(nationality, ~ ..2))
full_name nationality
1 Novak Djokovic Serbia
2 Alexander Zverev Germany
3 Milos Raonic Canada
4 Daniil Medvedev Russia
5 Nick Kyrgios Australia
6 Alexander Bublik Kazakhstan
7 Reilly Opelka United States of America
8 Jiri Vesely Czech Republic
9 Andrey Rublev Russia
10 Lloyd Harris South Africa
11 Aslan Karatsev Russia
12 Taylor Fritz United States of America
13 Matteo Berrettini Italy
14 Grigor Dimitrov Bulgaria
15 Feliciano Lopez Spain
16 Stefanos Tsitsipas Greece
17 Felix Auger-Aliassime Canada
18 Thanasi Kokkinakis Australia
19 Ugo Humbert France
20 Borna Coric Croatia
You could do:
bind_cols(full_name = dat$players$full_name, country = dat$players$nationality$name)
# A tibble: 169 x 2
full_name country
<chr> <chr>
1 Novak Djokovic Serbia
2 Alexander Zverev Germany
3 Milos Raonic Canada
4 Daniil Medvedev Russia
5 Nick Kyrgios Australia
6 Alexander Bublik Kazakhstan
7 Reilly Opelka United States of America
8 Jiri Vesely Czech Republic
9 Andrey Rublev Russia
10 Lloyd Harris South Africa
just add this line at the end
newdf <- data.frame(full_name = men_aces_table$full_name, country = men_aces_table$nationality$name)

Convert one column into multiple columns

I am a novice. I have a data set with one column and many rows. I want to convert this column into 5 columns. For example my data set looks like this:
Column
----
City
Nation
Area
Metro Area
Urban Area
Shanghai
China
24,000,000
1230040
4244234
New york
America
343423
23423434
343434
Etc
The output should look like this
City | Nation | Area | Metro City | Urban Area
----- ------- ------ ------------ -----------
Shangai China 2400000 1230040 4244234
New york America 343423 23423434 343434
The first 5 rows of the data set (City, Nation,Area, etc) need to be the names of the 5 columns and i want the rest of the data to get populated under these 5 columns. Please help.
Here is a one liner (considering that your column is character, i.e. df$column <- as.character(df$column))
setNames(data.frame(matrix(unlist(df[-c(1:5),]), ncol = 5, byrow = TRUE)), c(unlist(df[1:5,])))
# City Nation Area Metro_Area Urban_Area
#1 Shanghai China 24,000,000 1230040 4244234
#2 New_york America 343423 23423434 343434
I'm going to go out on a limb and guess that the data you're after is from the URL: https://en.wikipedia.org/wiki/List_of_largest_cities.
If this is the case, I would suggest you actually try re-reading the data (not sure how you got the data into R in the first place) since that would probably make your life easier.
Here's one way to read the data in:
library(rvest)
URL <- "https://en.wikipedia.org/wiki/List_of_largest_cities"
XPATH <- '//*[#id="mw-content-text"]/table[2]'
cities <- URL %>%
read_html() %>%
html_nodes(xpath=XPATH) %>%
html_table(fill = TRUE)
Here's what the data currently looks like. Still needs to be cleaned up (notice that some of the columns which had names in merged cells from "rowspan" and the sorts):
head(cities[[1]])
## City Nation Image Population Population Population
## 1 Image City proper Metropolitan area Urban area[7]
## 2 Shanghai China 24,256,800[8] 34,750,000[9] 23,416,000[a]
## 3 Karachi Pakistan 23,500,000[10] 25,400,000[11] 25,400,000
## 4 Beijing China 21,516,000[12] 24,900,000[13] 21,009,000
## 5 Dhaka Bangladesh 16,970,105[14] 15,669,000 18,305,671[15][not in citation given]
## 6 Delhi India 16,787,941[16] 24,998,000 21,753,486[17]
From there, the cleanup might be like:
cities <- cities[[1]][-1, ]
names(cities) <- c("City", "Nation", "Image", "Pop_City", "Pop_Metro", "Pop_Urban")
cities["Image"] <- NULL
head(cities)
cities[] <- lapply(cities, function(x) type.convert(gsub("\\[.*|,", "", x)))
head(cities)
# City Nation Pop_City Pop_Metro Pop_Urban
# 2 Shanghai China 24256800 34750000 23416000
# 3 Karachi Pakistan 23500000 25400000 25400000
# 4 Beijing China 21516000 24900000 21009000
# 5 Dhaka Bangladesh 16970105 15669000 18305671
# 6 Delhi India 16787941 24998000 21753486
# 7 Lagos Nigeria 16060303 13123000 21000000
str(cities)
# 'data.frame': 163 obs. of 5 variables:
# $ City : Factor w/ 162 levels "Abidjan","Addis Ababa",..: 133 74 12 41 40 84 66 148 53 102 ...
# $ Nation : Factor w/ 59 levels "Afghanistan",..: 13 41 13 7 25 40 54 31 13 25 ...
# $ Pop_City : num 24256800 23500000 21516000 16970105 16787941 ...
# $ Pop_Metro: int 34750000 25400000 24900000 15669000 24998000 13123000 13520000 37843000 44259000 17712000 ...
# $ Pop_Urban: num 23416000 25400000 21009000 18305671 21753486 ...

R - Converting one table to multiple table

I have a csv file called data.csv that contains 3 tables all merged in just one. I would like to separate them in 3 different data.frame when I import it to R.
So far this is what I have got after running this code:
df <- read.csv("data.csv")
View(df)
Student
Name Score
Maria 18
Bob 25
Paul 27
Region
Country Score
Italy 65
India 99
United 88
Sub region
City Score
Paris 77
New 55
Rio 78
How can I split them in such a way that I get this result:
First:
View(StudentDataFrame)
Name Score
Maria 18
Bob 25
Paul 27
Second:
View(regionDataFrame)
Country Score
Italy 65
India 99
United 88
Third:
View(SubRegionDataFrame)
City Score
Paris 77
New 55
Rio 78
One option would be to read the dataset with readLines, create a grouping variable ('grp') based on the location of 'Student', 'Region', 'Sub region' in the 'lines', split it and read it with read.table
i1 <- trimws(lines) %in% c("Student", "Region", "Sub region")
grp <- cumsum(i1)
lst <- lapply(split(lines[!i1], grp[!i1]), function(x)
read.table(text=x, stringsAsFactors=FALSE, header=TRUE))
lst
#$`1`
# Name Score
#1 Maria 18
#2 Bob 25
#3 Paul 27
#$`2`
# Country Score
#1 Italy 65
#2 India 99
#3 United 88
#$`3`
# City Score
#1 Paris 77
#2 New 55
#3 Rio 78
data
lines <- readLines("data.csv")

Add new column to the existing data set

Date City Temp
1/1/2012 Liverpool 10
1/2/2012 Madrid 20
1/3/2012 Milan 40
1/4/2012 Istanbul 35
1/5/2012 Munich 10
I need to add another column in this data set with County column name. If the df$City is Madrid, Country will need to be Spain. I now this is a very small data set, I need to be able to do this programatically thin R?
I would like my new data frame to look like this:
Date City Temp Country
--------------------------------------
1/1/2012 Liverpool 10 England
1/2/2012 Madrid 20 Matrid
1/3/2012 Milan 40 Italy
1/4/2012 Istanbul 35 Turkey
1/5/2012 Munich 10 Germany
Any pointers how I would do this in R?
On way with your exact data provided is:
df <- read.table(text= " Date City Temp
1/1/2012 Liverpool 10
1/2/2012 Madrid 20
1/3/2012 Milan 40
1/4/2012 Istanbul 35
1/5/2012 Munich 10",header=TRUE)
df$Country <- ifelse(df$City == "Liverpool", "England",
ifelse(df$City == "Madrid", "Spain",
ifelse(df$City == "Milan", "Italy",
ifelse(df$City == "Istanbul", "Turkey", "Germany") )))
However I am assuming you may have more cities and countries, in which case something like:
countrydf <- read.table(text= " City Country
Liverpool England
Madrid Spain
Milan Italy
Istanbul Turkey
Munich Germany",header=TRUE,stringsAsFactors=FALSE)
merge(df,countrydf, by="City")
note:
had a look in package maps, which could be useful to you
library(maps)
data(world.cities)
head(world.cities)
world.cities[world.cities$name == "Istanbul" ,]
Without knowing how the cities are mapped to countries in your situation (i.e., are they mapped in a list, vector, data.frame, or something else altogether?), it's hard to guess what the right answer is for you. Here is one way, where the city-country mapping is in a list:
df <- read.table(text="Date City Temp
1/1/2012 Liverpool 10
1/2/2012 Madrid 20
1/3/2012 Milan 40
1/4/2012 Istanbul 35
1/5/2012 Munich 10", header=TRUE)
city.countries <- list(England=c('Liverpool', 'London'),
Spain='Madrid',
Italy='Milan',
Turkey='Istanbul',
Germany='Munich')
df <- transform(df, Country = with(stack(city.countries), ind[match(City, values)]))
# Date City Temp Country
# 1 1/1/2012 Liverpool 10 England
# 2 1/2/2012 Madrid 20 Spain
# 3 1/3/2012 Milan 40 Italy
# 4 1/4/2012 Istanbul 35 Turkey
# 5 1/5/2012 Munich 10 Germany

Calculate rows with same title

Since my other question got closed, here is the required data.
What I'm trying to do is have R calculate the last column 'count' towards the column city so I can map the data. Therefore I would need some kind of code to match this. Since I want to show how many participants (in count) are in the state of e.g Hawaii (HI)
zip city state latitude longitude count
96860 Pearl Harbor HI 24.859832 -168.021815 36
96863 Kaneohe Bay HI 21.439867 -157.74772 39
99501 Anchorage AK 61.216799 -149.87828 12
99502 Anchorage AK 61.153693 -149.95932 17
99506 Elmendorf AFB AK 61.224384 -149.77461 2
what I've tried is
match<- c(match(datazip$state, datazip$number))>$
but I'm really helpless trying to find a solution since I don't even know how to describe this in short. My plan afterwards is to make choropleth map with the data and believe me by now I've seen almost all the pages that try to give advice. so your help is pretty much appreciated. Thanks
# I read your sample data to a data frame
> df
zip city state latitude longitude count
1 96860 Pearl_Harbor HI 24.85983 -168.0218 36
2 96863 Kaneohe_Bay HI 21.43987 -157.7477 39
3 99501 Anchorage AK 61.21680 -149.8783 12
4 99502 Anchorage AK 61.15369 -149.9593 17
5 99506 Elmendorf_AFB AK 61.22438 -149.7746 2
# If you want to sum the number of counts by state
library(plyr)
> ddply(df, .(state), transform, count2 = sum(count))
zip city state latitude longitude count count2
1 99501 Anchorage AK 61.21680 -149.8783 12 31
2 99502 Anchorage AK 61.15369 -149.9593 17 31
3 99506 Elmendorf_AFB AK 61.22438 -149.7746 2 31
4 96860 Pearl_Harbor HI 24.85983 -168.0218 36 75
5 96863 Kaneohe_Bay HI 21.43987 -157.7477 39 75
Maybe aggregate would be a nice and simple solution for you:
df
zip city state latitude longitude count
1 96860 Pearl Harbor HI 24.85983 -168.0218 36
2 96863 Kaneohe Bay HI 21.43987 -157.7477 39
3 99501 Anchorage AK 61.21680 -149.8783 12
4 99502 Anchorage AK 61.15369 -149.9593 17
5 99506 Elmendorf AFB AK 61.22438 -149.7746 2
aggregate(df$count,by=list(df$state),sum)
Group.1 x
1 AK 31
2 HI 75
aggregate(df$count,by=list(df$city),sum)
Group.1 x
1 Anchorage 29
2 Elmendorf AFB 2
3 Kaneohe Bay 39
4 Pearl Harbor 36

Resources