I want to code the values in a column into fewer values in another column.
For example,
if the value in zipcode column is one of the following c(90272,90049,90077,90210,90046,90069,90024,90025,90048,90036,90038,90028),
code it as "west" in district column.
How can I do it in R?
You can use the ifelse() function.
Set up the data in a dataframe:
df <- data.frame(zipcode = c(90272,90049,90077,90210,90046,90069,90024,90025,90048,90036,90038,90028))
Then use ifelse() to code a new value based on the values of zipcode.
df$district <- ifelse(df$zipcode %in% c(90272,90049,90077,90210,90046,90069,90024,90025,90048,90036,90038,90028),
"west",
NA)
> df
zipcode district
1 90272 west
2 90049 west
3 90077 west
4 90210 west
5 90046 west
6 90069 west
7 90024 west
8 90025 west
9 90048 west
10 90036 west
11 90038 west
12 90028 west
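If more districts are added later, a single lookup table may be easier to maintain than a chain of ifelse() calls. The following is a minimal sketch under stated assumptions: only the "west" codes come from the question, while the "east" codes, the unmatched code 12345, and the names districts and lookup are placeholders made up for illustration.

```r
# "west" codes are from the question; the "east" codes are hypothetical placeholders.
districts <- list(
  west = c(90272, 90049, 90077, 90210, 90046, 90069,
           90024, 90025, 90048, 90036, 90038, 90028),
  east = c(90001, 90002)
)

# stack() turns the named list into a two-column table:
# `values` holds the zip code, `ind` holds the district name.
lookup <- stack(districts)

# Small example data frame; 12345 is deliberately not in any group.
df <- data.frame(zipcode = c(90272, 90001, 90210, 12345))

# match() finds each zipcode's row in the lookup table; unmatched codes become NA.
df$district <- as.character(lookup$ind[match(df$zipcode, lookup$values)])
```

Zip codes that appear in no group end up as NA, the same as in the ifelse() version.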
How do I convert DataFrame column to Factors?
I want 0 to be "North", 1 to be "South", 2 to be "East" and 3 to be "West".
directions <- data.frame(
state=c("New York","New Jersey","Deleware","Texax","Alaska"),
travel=c(0,0,3,2,1)
)
head(directions)
Outputs
state travel
1 New York 0
2 New Jersey 0
3 Deleware 3
4 Texax 2
5 Alaska 1
I tried the following, but the entire travel column is NA
directions$travel <- factor(directions$travel,levels=c("North","South","East","West"))
head(directions)
Outputs
state travel
1 New York <NA>
2 New Jersey <NA>
3 Deleware <NA>
4 Texax <NA>
5 Alaska <NA>
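We need to specify the names in labels, not levels. The levels argument must match the values actually stored in the column (0 to 3 here), so passing the names as levels turns every value into NA.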
factor(directions$travel,labels=c("North","South","East","West"))
#[1] North North West East South
#Levels: North South East West
If the codes need to map to the labels in a specific order, specify the levels as well:
factor(directions$travel,levels = c(0, 1, 2, 3),
labels=c("North","South","East","West"))
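To see the difference between levels and labels side by side, here is a small self-contained sketch using the travel codes from the question. The variable names wrong and right are only for illustration.

```r
travel <- c(0, 0, 3, 2, 1)

# levels lists the values expected in the data. None of these names
# occur in `travel`, so every element becomes NA.
wrong <- factor(travel, levels = c("North", "South", "East", "West"))

# levels names the stored codes; labels gives the text shown for each code.
right <- factor(travel,
                levels = 0:3,
                labels = c("North", "South", "East", "West"))
```

Spelling out levels also keeps the mapping stable if one of the codes happens to be absent from the data.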
Newbie here with a dataframe/mutate question. I want to update a dataframe (df1) based on data in another dataframe (df2). For one-offs I've used mutate(), so I figure this is the way to go. Additionally, I would like a check column (TRUE/FALSE) to indicate whether the field in df1 was updated.
For example:
df1-
State
<chr>
1 N.Y.
2 FL
3 AL
4 MS
5 IL
6 WS
7 WA
8 N.J.
9 N.D.
10 S.D.
11 CALL
df2
State New_State
<chr> <chr>
1 N.Y. New York
2 FL Florida
3 AL Alabama
4 MS Mississippi
5 IL Illinois
6 WS Wisconsin
7 WA Washington
8 N.J. New Jersey
9 N.D. North Dakota
10 S.D. South Dakota
11 CAL California
I want the output to look like this
df3
New_State Test
<chr>
1 New York TRUE
2 Florida TRUE
3 Alabama TRUE
4 Mississippi TRUE
5 Illinois TRUE
6 Wisconsin TRUE
7 Washington TRUE
8 New Jersey TRUE
9 North Dakota TRUE
10 South Dakota TRUE
11 CALL FALSE
In essence, I want R to read the data in df1 and replace each value with the full state name from the matching row in df2. Lastly, if the value in df1 was updated, mark it as TRUE (N.Y. to New York), and as FALSE if it was not updated (CALL vs CAL).
Thanks in advance for any and all help.
This should give you the result you're looking for:
match_vec <- match(df1$State, table = df2$State)
For each abbreviated state name in df1, this vector holds the position of the matching row in df2. Where there's no match, you end up with a missing value (NA).
Then the following code using dplyr should produce the df3 you requested:
library(dplyr)
df3 <- df1 %>%
mutate(New_State = df2$New_State[match_vec]) %>%
mutate(Test = !is.na(match_vec)) %>%
mutate(New_State = ifelse(is.na(New_State),
State, New_State)) %>%
select(New_State, Test)
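The same logic can be sketched in base R, which may help when checking what each step does. This is an illustration rather than the answer's own code: the data frames are cut down to three rows from the question, and the data.frame() calls are my reconstruction of the printed tables.

```r
# Three rows from the question, including the one that should not match.
df1 <- data.frame(State = c("N.Y.", "FL", "CALL"), stringsAsFactors = FALSE)
df2 <- data.frame(State     = c("N.Y.", "FL", "CAL"),
                  New_State = c("New York", "Florida", "California"),
                  stringsAsFactors = FALSE)

# Position of each df1 abbreviation in df2; NA where there is no match.
match_vec <- match(df1$State, df2$State)

df3 <- data.frame(
  # Keep the original value when no full name was found.
  New_State = ifelse(is.na(match_vec), df1$State, df2$New_State[match_vec]),
  # TRUE where the value was replaced, FALSE where it was left alone.
  Test = !is.na(match_vec),
  stringsAsFactors = FALSE
)
```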
I have two data sets. One with a numeric value assigned to individual categorical variables (country name) and a second with survey responses including a person's nationality. How do I assign the numeric value to a new column in the survey dataset with matching nationality/country name?
Here is the head of data set 1 (my.data1):
EN HCI
1 South Korea 0.845
2 UK 0.781
3 USA 0.762
Here is the head of data set 2 (my.data2):
Nationality OIS IR
1 South Korea 2 2
2 South Korea 3 3
3 USA 3 4
4 UK 3 3
I would like to make it look like this:
Nationality OIS IR HCI
1 South Korea 2 2 0.845
2 South Korea 3 3 0.845
3 USA 3 4 0.762
4 UK 3 3 0.781
I have tried this but unsuccessfully:
my.data2$HCI <- NA
for (i in i:nrow(my.data2)) {
my.data2$HCI[i] <- my.data1$HCI[my.data1$EN == my.data2$Nationality[i]]
}
We can use a left_join
library(dplyr)
left_join(my.data2, my.data1, by = c("Nationality" = "EN"))
Or with merge from base R
merge(my.data2, my.data1, by.x = "Nationality", by.y = "EN", all.x = TRUE)
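A third option is a direct lookup with match(), which adds the column in place and keeps the survey rows in their original order (merge() sorts the result by the key column by default). A minimal sketch, rebuilding the two small data sets from the question; the data.frame() calls are my reconstruction of the printed heads.

```r
my.data1 <- data.frame(EN  = c("South Korea", "UK", "USA"),
                       HCI = c(0.845, 0.781, 0.762),
                       stringsAsFactors = FALSE)
my.data2 <- data.frame(Nationality = c("South Korea", "South Korea", "USA", "UK"),
                       OIS = c(2, 3, 3, 3),
                       IR  = c(2, 3, 4, 3),
                       stringsAsFactors = FALSE)

# For each survey row, find the row of my.data1 with the same country name
# and copy its HCI value; countries with no match would get NA.
my.data2$HCI <- my.data1$HCI[match(my.data2$Nationality, my.data1$EN)]
```

As a side note, the loop in the question starts at i:nrow(my.data2) with i not yet defined; seq_len(nrow(my.data2)) is the usual way to iterate over row numbers.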
I have a dataframe with a column called Province and I need to add a new column called Region. The value is based on the Province column. Here is the dataframe:
Province
1 Alberta
2 Manitoba
3 Ontario
4 British Columbia
5 Nova Scotia
6 New Brunswick
7 Quebec
Output:
Province Region
1 Alberta Prairies
2 Manitoba Prairies
3 Ontario Central
4 British Columbia Pacific
5 Nova Scotia East
6 New Brunswick East
7 Quebec East
I tried this code in R and it is not working.
Region <- as.character(Province)
if (length(grep("British Comlumbia", Province)) > 0) {
return("Pacific")
}
You can create one vector per region and do a step-wise replacement. This may not be the most elegant way, but it will work.
Prairies <- c("Alberta","Manitoba")
Central <- c("Ontario")
Pacific <- c("British Columbia")
East <- c("Nova Scotia","New Brunswick","Quebec")
#make a copy of the column province
df$Region <- as.vector(df[,1])
#one by one replace the items based on your vectors
df$Region <- replace(df$Region, df$Region%in%Prairies, "Prairies")
df$Region <- replace(df$Region, df$Region%in%Central, "Central")
df$Region <- replace(df$Region, df$Region%in%Pacific, "Pacific")
df$Region <- replace(df$Region, df$Region%in%East, "East")
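The step-wise replacement can also be collapsed into a single lookup with a named vector. A minimal sketch, assuming the data frame is called df and its column Province, as in the question; the name region_lookup is mine.

```r
df <- data.frame(
  Province = c("Alberta", "Manitoba", "Ontario", "British Columbia",
               "Nova Scotia", "New Brunswick", "Quebec"),
  stringsAsFactors = FALSE
)

# Names are provinces, values are the regions they belong to.
region_lookup <- c(
  "Alberta"          = "Prairies",
  "Manitoba"         = "Prairies",
  "Ontario"          = "Central",
  "British Columbia" = "Pacific",
  "Nova Scotia"      = "East",
  "New Brunswick"    = "East",
  "Quebec"           = "East"
)

# Indexing by name looks up each province; unname() drops the province names
# from the result. A province missing from the lookup would become NA.
df$Region <- unname(region_lookup[df$Province])
```

Unlike the replace() version, a misspelled province shows up as NA instead of silently keeping the province name in the Region column.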
I'm trying to calculate the best goal differentials in the group stage of the 2014 world cup.
football <- read.csv(
file="http://pastebin.com/raw.php?i=iTXdPvGf",
header = TRUE,
strip.white = TRUE
)
football <- head(football,n=48L)
football[which(max(abs(football$home_score - football$away_score)) == abs(football$home_score - football$away_score)),]
Results in
home home_continent home_score away away_continent away_score result
4 Cameroon Africa 0 Croatia Europe 4 l
7 Spain Europe 1 Netherlands Europe 5 l
37 Germany
So those are the games with the highest goal differential, but now I need to make a new data frame that has a team name and abs(football$home_score-football$away_score).
football$score_diff <- abs(football$home_score - football$away_score)
football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home),
ifelse(football$result == "d", NA, as.character(football$away)))
You could save some typing this way. First compute the score difference and the winner. When result is "w", the home team is the winner, so you do not have to compare the scores at all. Once both columns are added, keep only the rows where the score difference equals max(scorediff).
mydf <- read.csv(file="http://pastebin.com/raw.php?i=iTXdPvGf",
header = TRUE, strip.white = TRUE)
mydf <- head(mydf,n = 48L)
library(dplyr)
mutate(mydf, scorediff = abs(home_score - away_score),
winner = ifelse(result == "w", as.character(home),
ifelse(result == "l", as.character(away), "draw"))) %>%
filter(scorediff == max(scorediff))
# home home_continent home_score away away_continent away_score result scorediff winner
#1 Cameroon Africa 0 Croatia Europe 4 l 4 Croatia
#2 Spain Europe 1 Netherlands Europe 5 l 4 Netherlands
#3 Germany Europe 4 Portugal Europe 0 w 4 Germany
Here is another option that creates the "winner" column without ifelse, using matrix indexing by row and column. The numeric column index is created by matching the result column against its unique elements (match(football$result, ...)), and the row index is just 1:nrow(football). Take the 'home' and 'away' columns of "football" and cbind them with an additional column 'draw' filled with NA, so that rows where "result" is 'd' get NA as the winner.
football$score_diff <- abs(football$home_score - football$away_score)
football$winner <- cbind(football[c('home', 'away')],draw=NA)[
cbind(1:nrow(football), match(football$result, c('w', 'l', 'd')))]
football[with(football, score_diff==max(score_diff)),]
# home home_continent home_score away away_continent away_score result
#60 Brazil South America 1 Germany Europe 7 l
# score_diff winner
#60 6 Germany
If the dataset is very big, you could speed up the match by using chmatch from library(data.table)
library(data.table)
chmatch(as.character(football$result), c('w', 'l', 'd'))
NOTE: I used the full dataset in the link
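The row/column indexing used for the "winner" column can be easier to follow on a toy example. The sketch below uses three made-up games with placeholder team names (A to F); it is not taken from the World Cup data.

```r
# Toy data: one home win, one away win, one draw.
games <- data.frame(
  home   = c("A", "C", "E"),
  away   = c("B", "D", "F"),
  result = c("w", "l", "d"),
  stringsAsFactors = FALSE
)

# Candidate winners per row: column 1 = home, column 2 = away, column 3 = NA.
candidates <- as.matrix(cbind(games[c("home", "away")], draw = NA))

# "w" -> column 1, "l" -> column 2, "d" -> column 3.
col_idx <- match(games$result, c("w", "l", "d"))

# A two-column index matrix picks one cell per row: (row i, column col_idx[i]).
winner <- candidates[cbind(seq_len(nrow(games)), col_idx)]
```

Here winner contains "A" for the home win, "D" for the away win, and NA for the draw.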