How to highlight the common rows from 2 separate dataframes in R?

I have 2 dataframes and I would like to highlight the common rows
library(openxlsx)
df = data.frame(Year = c(2018,2019,2020,2018,2019,2020,2018,2019,2020),
Country = c("Germany","Germany","Germany", "Japan", "Japan", "Japan", "Thailand", "Thailand", "Thailand"),
Count = c(17, 15, 60, 23, 25, 60, 50, 18, 31))
df2 = data.frame(Year = c(2018,2019,2020,2018,2019,2020,2018,2019,2020),
Country = c("Germany","Germany","Germany", "Japan", "Japan", "Japan", "Japan", "Thailand", "Thailand"),
Count = c(17, 100, 101, 102, 103, 60, 104, 18, 31))
wb = createWorkbook()
addWorksheet(wb, "Master")
writeDataTable(wb, "Master", df2, tableStyle = "TableStyleLight9")
yellow_style = createStyle(fgFill = "#FFFF00")
x = which(abs(df2$Count) == df$Count)
y = 1:which(colnames(df2) == "Count")
addStyle(wb, sheet = "Master", style = yellow_style, rows = x+1, col = y, gridExpand = TRUE)
saveWorkbook(wb, "Master.xlsx", overwrite = TRUE)
Right now this code works, but it only compares "Count" rather than the entire row.
Finding the common "Count" values works perfectly. But if I want to verify that the entire row is the same, how do I do it?

Here is a base R approach. Use paste to create a composite of your columns, and identify which rows of df2 have the same composite in the other data frame. These two lines replace the definitions of x and y in your code:
x = which(do.call(paste, df2) %in% do.call(paste, df))
y = 1:ncol(df2)
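As a minimal sketch, the two lines above can be checked on the question's data without writing a workbook. The explicit separator is an added precaution that is not part of the original answer: with the default single space, two different rows could in principle paste to the same string.

```r
df  <- data.frame(Year = rep(c(2018, 2019, 2020), 3),
                  Country = rep(c("Germany", "Japan", "Thailand"), each = 3),
                  Count = c(17, 15, 60, 23, 25, 60, 50, 18, 31))
df2 <- data.frame(Year = rep(c(2018, 2019, 2020), 3),
                  Country = c("Germany", "Germany", "Germany", "Japan", "Japan",
                              "Japan", "Japan", "Thailand", "Thailand"),
                  Count = c(17, 100, 101, 102, 103, 60, 104, 18, 31))

# One key string per row; "\r" is an assumed separator unlikely to occur in the data
key_df  <- do.call(paste, c(df,  sep = "\r"))
key_df2 <- do.call(paste, c(df2, sep = "\r"))

x <- which(key_df2 %in% key_df)  # rows of df2 that also occur somewhere in df
y <- 1:ncol(df2)                 # all columns, so the whole row gets highlighted
x
#> [1] 1 6 8 9
```

Note that %in% matches a row of df2 against every row of df, not only the row in the same position.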

Assuming the dataframes have the same dimensions (same number of rows and columns) you can simply compare them via df == df2 and then check row-wise whether the result has only the value TRUE in each row:
library(tidyverse)
df = data.frame(Year = c(2018,2019,2020,2018,2019,2020,2018,2019,2020),
Country = c("Germany","Germany","Germany", "Japan", "Japan", "Japan", "Thailand", "Thailand", "Thailand"),
Count = c(17, 15, 60, 23, 25, 60, 50, 18, 31))
df2 = data.frame(Year = c(2018,2019,2020,2018,2019,2020,2018,2019,2020),
Country = c("Germany","Germany","Germany", "Japan", "Japan", "Japan", "Japan", "Thailand", "Thailand"),
Count = c(17, 100, 101, 102, 103, 60, 104, 18, 31))
eq_df <- as_tibble(df == df2)
eq_df <- eq_df %>% rowwise() %>% mutate(rows_equal = (Year & Country & Count))
The resulting eq_df indicates in its last column whether the respective rows in df and df2 are the same.
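The same position-by-position check can be sketched in base R, which also yields the row indices that addStyle expects. The data frames df_a and df_b are hypothetical and only serve as an illustration; like the answer above, the sketch assumes both data frames have identical dimensions and no missing values.

```r
# Hypothetical three-row data frames, used only for illustration
df_a <- data.frame(Year = c(2018, 2019, 2020),
                   Country = c("Germany", "Japan", "Thailand"),
                   Count = c(17, 25, 31))
df_b <- data.frame(Year = c(2018, 2019, 2020),
                   Country = c("Germany", "Japan", "Thailand"),
                   Count = c(17, 103, 31))

# df_a == df_b is a logical matrix; a row matches when every cell in it is TRUE
rows_equal <- rowSums(df_a == df_b) == ncol(df_a)
x <- which(rows_equal)  # positions of identical rows
```

Here rows 1 and 3 are identical, so x can be passed on as rows = x + 1 in the same way as in the question.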

Related

Basic Plot - How to use text labels on X axis?

Can you use text as X axis labels on a plot? I've searched and cannot find any examples. Am I trying to do something that is not possible in R? It fails even when I try to plot a single variable. Countries is text/character, but I do not know how to set it as such.
plot(Finally$Countries,Finally$RobberyPerCent, pch = 16, col = 2)
I get the error
Error in plot.window(...) : need finite 'xlim' values
In addition: There were 24 warnings (use warnings() to see them)
Thank you. My goal is to combine two variables and see if there is a basic pattern. I've been able to figure out simple linear regression (no correlation), but I'm failing at basic plotting.
#Subset for Percentages
Q5DataFinal <- subset(Q5Data, select = c(RobberyPerCent, UnlawfulPerCent))
View(Q5DataFinal)
library(data.table)
Nearlythere <- setDT(Q5DataFinal, keep.rownames = TRUE)[] # turn rownames into column data
names(Nearlythere)[names(Nearlythere) == 'rn'] <- 'Countries' #renaming rn to countries
Nearlythere$Countries[] <- lapply(Nearlythere$Countries, as.character) #Changing Countries to Character
Finally <- Nearlythere
summary(Finally) #Countries saved as characters
# Attempt to create two Y axis Graph with Countries as X ticks
par(mar = c(5, 4, 4, 4) + 0.3) # Additional space for second y-axis
plot(Finally$Countries,Finally$RobberyPerCent, pch = 16, col = 2) # Create first plot
par(new = TRUE) # Add new plot
plot(Finally$Countries, Finally$UnlawfulPerCent, pch = 17, col = 3, # Create second plot without axes
axes = FALSE, xlab = "", ylab = "")
axis(side = 4, at = pretty(range(Finally$UnlawfulPerCent))) # Add second axis
mtext("UnlawfulPerCent", side = 4, line = 3) # Add second axis label
The dput output is:
structure(list(Countries = list("Albania", "Austria", "Bulgaria",
"Croatia", "Cyprus", "Czechia", "Finland", "Germany (until 1990 former territory of the FRG)",
"Greece", "Ireland", "Italy", "Kosovo (under United Nations Security Council Resolution 1244/99)",
"Latvia", "Lithuania", "Luxembourg", "Malta", "Montenegro",
"Romania", "Serbia", "Slovenia", "Spain", "Switzerland"),
RobberyPerCent = c(5, 6, 18, 7, 5, 23, 5, 9, 24, 9, 40, 12,
17, 18, 10, 52, 24, 33, 10, 17, 80, 2), UnlawfulPerCent = c(95,
94, 82, 93, 95, 77, 95, 91, 76, 91, 60, 88, 83, 82, 90, 48,
76, 67, 90, 83, 20, 98)), row.names = c(NA, -22L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000020282d01ef0>)
Do you want something like this?
par(mar = c(5, 5, 4, 2))
x <- seq(0, 5, length.out = 500)
plot(x, sin(x^2), xaxt = "n", xlab = expression("Here is X"), ylab = expression(sin(x^2)),
main = expression("My coolest plot" - sin(x^2)))
axis(1, at=0:5, labels=c("Albania", "Kosovo", "Kongo", "Germany", "Bulgaria", "Spain"))
An addition
#your dataset
countries <- list("Albania", "Austria", "Bulgaria",
"Croatia", "Cyprus", "Czechia", "Finland", "Germany (until 1990 former territory of the FRG)",
"Greece", "Ireland", "Italy", "Kosovo (under United Nations Security Council Resolution 1244/99)",
"Latvia", "Lithuania", "Luxembourg", "Malta", "Montenegro",
"Romania", "Serbia", "Slovenia", "Spain", "Switzerland")
#modify to
axis(1, at=0:21, labels=countries, cex.axis=0.5) #select cex.axis for better displaying
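Applied to the data from the question, the idea might look like the sketch below. It uses only the first five countries, and it assumes the list column is flattened with unlist() first; the points are plotted against numeric positions and the country names are attached afterwards with axis().

```r
# A subset of the asker's data; Countries is stored as a list in the dput output
countries_list <- list("Albania", "Austria", "Bulgaria", "Croatia", "Cyprus")
robbery <- c(5, 6, 18, 7, 5)

countries <- unlist(countries_list)  # flatten the list to a plain character vector
pos <- seq_along(countries)          # numeric x positions 1..n

# Suppress the default x axis, then draw one tick per country with its name
plot(pos, robbery, xaxt = "n", xlab = "", ylab = "RobberyPerCent", pch = 16, col = 2)
axis(1, at = pos, labels = countries, las = 2, cex.axis = 0.7)
```

las = 2 turns the labels perpendicular to the axis, which helps with long country names.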

ggplot formula for a bar graph

I am looking to get a bar graph of medals in R. I have 3 distinct columns (gold, silver, bronze). The gold column has a total of 8, the silver 10, and the bronze 13.
For the code, I started writing: ggplot(data, aes(x=?)) + geom_bar()
I am not sure how to pass all 3 medal columns to the function where it shows x=?
Thanks
For plotting purposes, it is "easier" to work with long data rather than wide. Below I converted the data you mentioned in your comment to long format and plotted it as a grouped bar chart.
library(tidyverse)
# load data
raw_data <- structure(list(Rank = c(1, 2, 3, 4, 5, 6),
`Team/Noc` = c("United States of America", "People's Republic of China", "Japan", "Great Britain", "ROC", "Australia"),
Gold = c(39, 38, 27, 22, 20, 17),
Silver = c(41,32, 14, 21, 28, 7),
Bronze = c(33, 18, 17, 22, 23, 22),
Total = c(113, 88, 58, 65, 71, 46),
`Rank by Total` = c(1, 2, 5, 4, 3, 6)),
row.names = c(NA,-6L),
class = c("tbl_df", "tbl", "data.frame"))
# convert wide data to long
long_data <- raw_data %>%
pivot_longer(cols = -`Team/Noc`, names_to = 'Medal') %>% # convert wide data to long format
filter(Medal %in% c("Gold", "Silver", "Bronze")) # only select medal columns
# plot
ggplot(long_data) +
geom_col(aes(x = `Team/Noc`,
y = value,
fill = Medal),
position = "dodge" # grouped bars
)
Hope this gets you started!
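If only the three totals from the question are needed, a smaller sketch may be enough. The medals data frame below is hypothetical: it assumes the column totals quoted in the question (8, 10 and 13) have already been computed, and it stores them in long form with one row per medal type.

```r
library(ggplot2)

# Hypothetical summary table built from the totals quoted in the question
medals <- data.frame(
  medal = factor(c("Gold", "Silver", "Bronze"), levels = c("Gold", "Silver", "Bronze")),
  count = c(8, 10, 13)
)

# geom_col() takes bar heights from the y column; geom_bar() would count rows instead
p <- ggplot(medals, aes(x = medal, y = count)) +
  geom_col()
p
```

Setting the factor levels explicitly keeps the bars in Gold, Silver, Bronze order rather than alphabetical order.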

What's the syntax for this kind of graph using ggplot in R?

How can I make a graph like that? Thank you.
I have two dataframes like these, and their topics are the same.
You could do something like this:
library(ggplot2)
df <- data.frame(Frequency = c(300, 0, 600, 900, 0, 1000, 700,
0, 300, 400, 0, 1400, 1600, 6500,
0, -250, -500, -400, -600, -1300,
0, -1150, -100, 0, -200, 0, -2500,
-2500),
Stream = rep(c("Ne", "Pri"), each = 14),
country = c("Argentina", "Brazil", "Canada",
"France", "Germany", "India",
"Indonesia", "Italy", "Mexico",
"Philippines", "Spain", "Thailand",
"United Kingdom", "United States"))
ggplot(df, aes(Frequency, country, fill = Stream)) +
geom_col(width = 0.6) +
labs(y = "") +
scale_x_continuous(breaks = c(-2500, 0, 2500, 3750, 5000, 6250),
labels =c(250, 0, 2500, 3750, 5000, 6250),
limits = c(-2600, 7000)) +
theme_bw() +
theme(panel.border = element_blank())
Edit
If I have two data frames like this:
df1 <- data.frame(topic = c("design", "game", "hardware", "price"),
n = c(80, 1695, 29, 53))
df1
#> topic n
#> 1 design 80
#> 2 game 1695
#> 3 hardware 29
#> 4 price 53
df2 <- data.frame(topic = c("design", "game", "hardware", "price"),
n = c(400, 1235, 290, 107))
df2
#> topic n
#> 1 design 400
#> 2 game 1235
#> 3 hardware 290
#> 4 price 107
Then I can simply rbind them together, negating the n column on df2 first and adding a column to show which data frame each value came from:
df3 <- rbind(df1, within(df2, n <- -n))
df3$origin <- rep(c("df1", "df2"), each = nrow(df1))
And when I plot, I add abs as a labeller in scale_x_continuous to remove the negative symbols on the left half of the plot.
ggplot(df3, aes(n, topic, fill = origin)) +
geom_col() +
scale_x_continuous(labels = abs)

Conditionally replace values across multiple columns based on string match in a separate column

I'm trying to conditionally replace values in multiple columns based on a string match in a different column. I'd like to do so in a single line of code using the across() function, but I keep getting errors that don't quite make sense to me. I feel like there is probably a simple solution, so if anyone could point me in the right direction, that would be fantastic!
df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
"total" = c(34, 56, 75, 89, 21, 56),
"group_a" = c(30, 26, 45, 60, 3, 46),
"group_b" = c(4, 30, 30, 29, 18, 10))
# working but not concise
df %>%
mutate(total = ifelse(str_detect(type, "Park"), NA, total),
group_a = ifelse(str_detect(type, "Park"), NA, group_a),
group_b = ifelse(str_detect(type, "Park"), NA, group_b))
# concise but not working
df %>% mutate(across(total, group_a, group_b), ifelse(str_detect(type, "Park"), NA, .))
Update
We got a solution that works with my dummy dataset but is not working with my real data, so I am going to share a small snippet of my real data frame with the numbers changed and organization names hidden. When I run this line of code (df %>% mutate(across(c(Attempts, Canvasses, Completes)), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .))) on these data, I get the following error message:
Error: Problem with mutate() input ..2. x Input ..2 must be a
vector, not a formula object. i Input ..2 is
~ifelse(str_detect(long_name, "park-cemetery"), NA, .).
This is a small sample of the data that produces this error:
df <- structure(list(Org = c("OrgName", "OrgName", "OrgName", "OrgName",
"OrgName", "OrgName", "OrgName", "OrgName", "OrgName", "OrgName"
), nCode = c("M34", "R36", "R46", "X29", "M31", "K39", "Q12",
"Q39", "X41", "K27"), Attempts = c(100, 100, 100, 100, 100, 100,
100, 100, 100, 100), Canvasses = c(80, 80, 80, 80, 80, 80, 80,
80, 80, 80), Completes = c(50, 50, 50, 50, 50, 50, 50, 50, 50,
50), van_nocc_id = c(999, 999, 999, 999, 999, 999, 999, 999,
999, 999), van_name = c("M-Upper West Side", "SI-Rosebank", "SI-Tottenville",
"BX-park-cemetery-etc-Bronx", "M-Stuyvesant Town-Cooper Village",
"BK-Kensington", "Q-Broad Channel", "Q-Lindenwood", "BX-Wakefield",
"BK-East New York"), boro_short = c("M", "SI", "SI", "BX", "M",
"BK", "Q", "Q", "BX", "BK"), long_name = c("Upper West Side",
"Rosebank", "Tottenville", "park-cemetery-etc-Bronx", "Stuyvesant Town-Cooper Village",
"Kensington", "Broad Channel", "Lindenwood", "Wakefield", "East New York"
)), row.names = c(NA, -10L), class = "data.frame")
Final update
The curse of the misplaced closing bracket! Thanks to everyone for your help... the correct solution was df %>% mutate(across(c(Attempts, Canvasses, Completes), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))
If you use the newly introduced function across (which is the correct way to approach this task), you have to specify the function you want to apply inside across itself. In this case the ifelse(...) call is written as a purrr-style lambda (so starting with ~). Check out the across documentation and look for the arguments .cols and .fns.
library(dplyr)   # mutate(), across(), %>%
library(stringr) # str_detect()
df %>%
mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))
Output
# type total group_a group_b
# 1 Park NA NA NA
# 2 Neighborhood 56 26 30
# 3 Airport 75 45 30
# 4 Park NA NA NA
# 5 Neighborhood 21 3 18
# 6 Neighborhood 56 46 10
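Naming the two arguments makes it easier to see that the lambda belongs inside the parentheses of across(). The sketch below uses a shortened, hypothetical version of the question's data and swaps str_detect() for base R's grepl(), purely to avoid loading a second package.

```r
library(dplyr)

# Shortened, hypothetical version of the data from the question
df <- data.frame(type = c("Park", "Neighborhood", "Airport", "Park"),
                 total = c(34, 56, 75, 89),
                 group_a = c(30, 26, 45, 60))

# Both arguments sit inside across(): .cols picks the columns, .fns is applied to each
res <- df %>%
  mutate(across(.cols = c(total, group_a),
                .fns  = ~ ifelse(grepl("Park", type), NA, .x)))
res
```

If the closing bracket of across() comes right after the column vector, the lambda becomes a separate argument to mutate(), which is what the "must be a vector, not a formula object" error reports.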
Here is a data.table solution.
library(data.table)
df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
"total" = c(34, 56, 75, 89, 21, 56),
"group_a" = c(30, 26, 45, 60, 3, 46),
"group_b" = c(4, 30, 30, 29, 18, 10))
setDT(df)
df[type == "Park", c("total", "group_a", "group_b") := NA]
Update: that didn't take long to figure out! The columns need to be placed in a vector, and the function goes inside across() as a lambda:
# concise AND working!
df %>% mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))
I had tried this initially but placed the columns in quotes... don't do that :)

ggmap with value showing on the countries

I am looking for some help with the sample data below, which has countries in one column and counts in another. I am trying to build a geo map using ggplot that shows the count and name of each country when I hover over it. I tried ggmap with the lat and long positions to identify each country, but I am not able to show the count and name of the country on hover.
structure(list(Countries = c("USA", "India", "Europe", "LATAM",
"Singapore", "Phillipines", "Australia", "EMEA", "Malaysia",
"Hongkong", "Philippines", "Thailand", "New Zealand"
), count = c(143002, 80316, 33513, 3736, 2180, 1905, 1816, 921,
707, 631, 207, 72, 49)), .Names = c("Countries", "count"), row.names = c(NA,
13L), class = "data.frame")
I tried the below code.
countries = geocode(Countryprofile$Countries)
Countryprofile = cbind(Countryprofile,countries)
mapWorld <- borders("world", colour="grey", fill="lightblue")
q<-ggplot(data = Countryprofile) + mapWorld + geom_point(aes(x=lon, y=lat) ,color="red", size=3)+
geom_text(data = Countryprofile,aes(x=lon,y=lat,label=Countries))
ggplotly(q)
You can change any attribute in the result from ggplotly. In this case you can set the text attribute of the 2nd trace (where your markers are defined).
plotly_map <- ggplotly(q)
plotly_map$x$data[[2]]$text <- paste(Countryprofile$Countries,
Countryprofile$count,
sep='<br />')
plotly_map
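To see what ends up in the hover box, the paste() call can be tried on its own. This is a small sketch using two rows from the question's data; it does not need plotly or a geocoding call.

```r
# Two rows taken from the question's data
countries <- c("USA", "India")
counts    <- c(143002, 80316)

# "<br />" is rendered by plotly as a line break inside the hover box
hover_text <- paste(countries, counts, sep = "<br />")
hover_text
#> [1] "USA<br />143002" "India<br />80316"
```

The vector must have one element per marker, in the same order as the rows passed to geom_point.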
Complete code:
library(plotly)
library(ggmap)
Countryprofile <- structure(list(Countries = c("USA", "India", "Europe", "LATAM",
"Singapore", "Phillipines", "Australia", "EMEA", "Malaysia",
"Hongkong", "Philippines", "Thailand", "New Zealand"
), count = c(143002, 80316, 33513, 3736, 2180, 1905, 1816, 921,
707, 631, 207, 72, 49)), .Names = c("Countries", "count"), row.names = c(NA,
13L), class = "data.frame")
countries = geocode(Countryprofile$Countries)
Countryprofile = cbind(Countryprofile,countries)
mapWorld <- borders("world", colour="grey", fill="lightblue")
q<-ggplot(data = Countryprofile) + mapWorld + geom_point(aes(x=lon, y=lat) ,color="red", size=3)+
geom_text(data = Countryprofile,aes(x=lon,y=lat,label=Countries))
plotly_map <- ggplotly(q)
plotly_map$x$data[[2]]$text <- paste(Countryprofile$Countries, Countryprofile$count, sep='<br />')
plotly_map
