I am looking ultimately to render a map in Leaflet that shows the roll call voting results for a specific Senate vote. This involves coloring a state polygon based on the unique combination of each senator's party affiliation and how they voted (2 senators per state). The problem I have is developing a workflow to color-code a state in this manner (here I am using a simple sf dataframe of the US states). The idea would be to "stripe" the state in two different colors based on each senator's party affiliation and vote type.
Below is a workflow that has already been created for viewing roll call voting results by congressional district (not what I want; I want to do this for Senate voting), but I figured it would be a starting point for creating a similar map for a Senate roll call vote. This code can be found at https://www.r-bloggers.com/2020/09/mapping-congressional-roll-calls/. The only difference is that I added a function, found on another website, to read a congressional district shapefile directly from the site where they are housed, courtesy of the UCLA Political Science Department:
# Workflow for mapping congressional district roll call voting results
library(Rvoteview)
library(tidyverse)
devtools::install_github("jaytimm/wnomadds")
library(wnomadds)
library(sf)
library(tigris)
# Function to download a shapefile for any congressional district of your choice.
get_congress_map <- function(cong = 113) {
  tmp_file <- tempfile()
  tmp_dir <- tempdir()
  zp <- sprintf("http://cdmaps.polisci.ucla.edu/shp/districts%03i.zip", cong)
  download.file(zp, tmp_file)
  unzip(zipfile = tmp_file, exdir = tmp_dir)
  fpath <- paste(tmp_dir, sprintf("districtShapes/districts%03i.shp", cong), sep = "/")
  st_read(fpath)
}
# Get the shapefile for the 89th congress
cd89 <- get_congress_map(cong = 89)
options(tigris_use_cache = TRUE, tigris_class = "sf")
# List the FIPS for US territories (and Alaska and Hawaii) that we won't include in maps.
nonx <- c('78', '69', '66', '72', '60', '15', '02')
# Create a simple states dataframe
states <- tigris::states(cb = TRUE) %>%
data.frame() %>%
select(STATEFP, STUSPS) %>%
rename(state_abbrev = STUSPS)
# Join the congressional districts shapefile with the simple states dataframe we
# created above.
cd_sf <- cd89 %>%
  mutate(STATEFP = substr(ID, 2, 3),
         district_code = as.numeric(substr(ID, 11, 12))) %>%
  left_join(states, by = "STATEFP") %>%
  filter(!STATEFP %in% nonx) %>%
  select(STATEFP, state_abbrev, district_code)
# Download rollcall data from the Voteview database. Here for the Voting
# Rights Act of 1965
vra <- Rvoteview::voteview_search('("VOTING RIGHTS ACT OF 1965") AND (congress:89)
AND (chamber:house)') %>%
filter( date == '1965-07-09') %>%
janitor::clean_names()
votes <- Rvoteview::voteview_download(vra$id)
names(votes) <- gsub('\\.', '_', names(votes))
# Restructure the roll call voting data stored in votes
big_votes <- votes$legis_long_dynamic %>%
  left_join(votes$votes_long, by = c("id", "icpsr")) %>%
  filter(!grepl('POTUS', cqlabel)) %>%
  group_by(state_abbrev) %>%
  mutate(n = length(district_code)) %>%
  ungroup() %>%
  mutate(avote = case_when(vote %in% c(1:3) ~ 'Yea',
                           vote %in% c(4:6) ~ 'Nay',
                           vote %in% c(7:9) ~ 'Not Voting'),
         party_code = case_when(party_code == 100 ~ 'Dem',
                                party_code == 200 ~ 'Rep'),
         Party_Member_Vote = paste0(party_code, ': ', avote),
         ## fix at-large --
         district_code = ifelse(district_code %in% c(98, 99), 0, district_code),
         district_code = ifelse(n == 1 & district_code == 1, 0, district_code),
         district_code = as.integer(district_code)) %>%
  select(-n)
# Members who represent historical "at-large" districts are
# assigned 99, 98, or 1 in various circumstances, per Voteview.
# Make the Party_Member_Vote variable a factor and change the order of its levels.
big_votes$Party_Member_Vote <- factor(big_votes$Party_Member_Vote)
big_votes$Party_Member_Vote <-
factor(big_votes$Party_Member_Vote,
levels(big_votes$Party_Member_Vote)[c(3,6,1,4,2,5)])
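The positional releveling above depends on the alphabetical order the levels happen to take; an equivalent, more explicit version (assuming only Dem and Rep rows appear, as in this vote) would be:
big_votes$Party_Member_Vote <-
  factor(big_votes$Party_Member_Vote,
         levels = c('Dem: Yea', 'Rep: Yea', 'Dem: Nay',
                    'Rep: Nay', 'Dem: Not Voting', 'Rep: Not Voting'))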
# Join the roll call voting data with the shapefile and plot.
cd_sf_w_rolls <- cd_sf %>%
left_join(big_votes, by = c("state_abbrev", "district_code"))
main1 <- cd_sf_w_rolls %>%
  ggplot() +
  geom_sf(aes(fill = Party_Member_Vote),
          color = 'white',
          size = .25) +
  wnomadds::scale_fill_rollcall() +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        legend.position = 'none')
main1 + ggtitle(vra$short_description)
This works fine for mapping House roll call votes by congressional district. Now I am trying to figure out how to reproduce a similar map for Senate roll call votes. I started with the same workflow, but I am not sure how to proceed, or whether it is even possible:
# Now I want to make a similar map for the senators of each state, not
# the representatives.
# I want to include Hawaii and Alaska in my Senate maps, so their FIPS codes
# ('02', '15') are dropped from the exclusion vector; DC ('11') is added,
# since it has no senators.
non_states <- c('78', '69', '66', '72', '60', '11')
# No congressional district shapefile is therefore needed. So, here I just make
# a simple sf dataframe for the US States. Set the coordinate reference system to
# 4326 (World Geodetic System 1984) because I want to render the map in Leaflet
# and that's the reference system Leaflet uses.
states_Senate <- tigris::states(cb = TRUE) %>%
  st_transform(crs = 4326) %>% # tigris returns sf in NAD83; st_transform() actually reprojects
  select(STATEFP, STUSPS, geometry) %>%
  filter(!STATEFP %in% non_states) %>%
  rename(state_abbrev = STUSPS)
# Query a roll call vote in the Voteview database. Any vote will work, here
# a vote related to marketing of non-prescription drugs in the 116th congress in the
# Senate now, not the House.
vra2 <- Rvoteview::voteview_search('("A bill to amend the Federal Food, Drug, and Cosmetic Act")
AND (congress:116) AND (chamber:senate)') %>%
janitor::clean_names()
votes2 <- Rvoteview::voteview_download(vra2$id)
names(votes2) <- gsub('\\.', '_', names(votes2))
# Restructure the roll call voting data stored in votes2
big_votes2 <- votes2$legis_long_dynamic %>%
  left_join(votes2$votes_long, by = c("id", "icpsr")) %>%
  filter(!grepl('POTUS', cqlabel)) %>%
  mutate(avote = case_when(vote %in% c(1:3) ~ 'Yea',
                           vote %in% c(4:6) ~ 'Nay',
                           vote %in% c(7:9) ~ 'Not Voting'),
         party_code = case_when(party_code == 100 ~ 'Dem',
                                party_code == 200 ~ 'Rep'),
         Party_Member_Vote = paste0(party_code, ': ', avote))
# Now I have a dataframe, big_votes2 that has 2 rows for each state. I need to figure
# out how to color the polygons for each state based on the unique combination of
# party affiliation and vote cast.
# Make Party_Member_Vote a factor like the congressional district workflow above,
# join big_votes2 with states_Senate sf dataframe, and plot........finishing this
# workflow and making a Senate map is essentially my question.
My hope is to make a final map that looks similar to the following (found at https://voteview.com/rollcall/RS1160389), which shows the result of the roll call vote from the example query
I provide in my Senate-map script directly above (the roll call vote about non-prescription drugs). That map is probably done in JavaScript, maybe D3, but I am working on an R Shiny app looking at roll call voting, so I am strictly looking to do this in R.
Here the state polygons are "striped" by each senator's party and how they voted. If a state's senators are both from one party and vote in unison, the state is a solid color reflecting this. The color palette is based on the voteview_pal provided in the wnomadds package. The colors in this palette don't include senators who consider themselves independents, but I can update the palette if there is a solution for creating the striping pattern within the state polygons. In my use of R I can't think of a way to accomplish this, since color fills are based on unique levels of a factor variable and here we have two rows per state as the dataframe is built in this workflow. Additionally, I've never seen a pattern or stripe fill in ggplot that could accomplish this even if the dataframe were arranged so there was only one row/observation per state. If this is even possible, I would want to render it in Leaflet, but if the basic concept can be accomplished by plotting the sf object in ggplot I would gladly start there. Any help would be appreciated.
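One possible way to get the striping with plain sf tools (so the result works in both ggplot2 and Leaflet) is to clip each state against a set of horizontal bands and give the odd bands to one senator and the even bands to the other. Below is a minimal sketch under those assumptions: it reuses states_Senate and big_votes2 from above, assumes exactly two senator rows per state, and stripe_state is a hypothetical helper written here, not a function from any package.
library(sf)
library(dplyr)
# Hypothetical helper: split one state's geometry into two interleaved
# sets of horizontal bands (one set per senator).
stripe_state <- function(geom, n_bands = 30) {
  bb <- st_bbox(geom)
  ys <- seq(bb[["ymin"]], bb[["ymax"]], length.out = n_bands + 1)
  bands <- st_sfc(lapply(seq_len(n_bands), function(i) {
    st_polygon(list(matrix(c(bb[["xmin"]], ys[i],
                             bb[["xmax"]], ys[i],
                             bb[["xmax"]], ys[i + 1],
                             bb[["xmin"]], ys[i + 1],
                             bb[["xmin"]], ys[i]),
                           ncol = 2, byrow = TRUE)))
  }), crs = st_crs(geom))
  odd  <- st_union(bands[seq(1, n_bands, by = 2)])
  even <- st_union(bands[seq(2, n_bands, by = 2)])
  c(st_intersection(st_geometry(geom), odd),
    st_intersection(st_geometry(geom), even))
}
# Rank the two senators within each state (order is arbitrary here)
two_per_state <- big_votes2 %>%
  group_by(state_abbrev) %>%
  mutate(senator_rank = row_number()) %>%
  ungroup() %>%
  select(state_abbrev, senator_rank, Party_Member_Vote)
# Build a striped sf object: two geometries per state, one per senator
striped_sf <- do.call(rbind, lapply(seq_len(nrow(states_Senate)), function(i) {
  st_sf(state_abbrev = states_Senate$state_abbrev[i],
        senator_rank = 1:2,
        geometry = stripe_state(states_Senate$geometry[i]))
})) %>%
  left_join(two_per_state, by = c("state_abbrev", "senator_rank"))
# Sanity-check in ggplot; the same striped_sf object can then go to Leaflet
ggplot(striped_sf) +
  geom_sf(aes(fill = Party_Member_Vote), color = NA)
From there, leaflet() %>% addPolygons() on striped_sf, with a colorFactor palette keyed to Party_Member_Vote, should reproduce the Voteview look; states whose two senators share a party and vote become solid automatically, because both sets of bands get the same fill.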
I am working on a music streaming project, and I am trying to get the top 15 globally streamed songs in 2022 and make an interactive graph.
It successfully showed the top 15 song names as a dataframe, but it failed to show as a bar graph; I wonder where I went wrong. It worked after I flipped the bar graph to horizontal, but the data look a bit off.
It looks like this as a vertical bar graph:
The horizontal bar graph looks like this, but the data seem incorrect:
Here is the code I have:
library("dplyr")
library("ggplot2")
# load the .csv into R studio, you can do this 1 of 2 ways
#read.csv("the name of the .csv you downloaded from kaggle")
spotiify_origional <- read.csv("charts.csv")
spotiify_origional <- read.csv("https://raw.githubusercontent.com/info201a-au2022/project-group-1-section-aa/main/data/charts.csv")
View(spotiify_origional)
# filters down the data
# removes the track id, explicit, and duration columns
spotify_modify <- spotiify_origional %>%
select(name, country, date, position, streams, artists, genres = artist_genres)
#returns all the data just from 2022
#this is the data set you should use on the project
spotify_2022 <- spotify_modify %>%
filter(date >= "2022-01-01") %>%
arrange(date) %>%
group_by(date)
# use write.csv() to turn the new dataset into a .csv file
# e.g. write.csv(YourDataFrame, "Path to export the DataFrame\\File Name.csv", row.names = FALSE)
write.csv(spotify_2022, "/Users/oliviasapp/Documents/info201/project-group-1-section-aa/data/spotify_2022.csv" , row.names = FALSE)
# then I pushed the spotify_2022.csv to the GitHub repo
View(spotiify_origional)
spotify_2022_global <- spotify_modify %>%
filter(date >= "2022-01-01") %>%
filter(country == "global") %>%
arrange(date) %>%
group_by(streams)
View(spotify_2022_global)
top_15 <- spotify_2022_global[order(spotify_2022_global$streams, decreasing = TRUE), ]
top_15 <- top_15[1:15,]
top_15$streams <- as.numeric(top_15$streams)
View(top_15)
col_chart <- ggplot(data = top_15) +
geom_col(mapping = aes(x = name, y = streams)) +
ggtitle("Top 15 Songs Daily Streamed Globally") +
theme(plot.title = element_text(hjust = 0.5))
col_chart <- col_chart + coord_cartesian(ylim = c(999000,1000000)) + coord_flip()
col_chart
Thank you so much! Any suggestions will hugely help!
top_15 <- spotify_2022_global[order(spotify_2022_global$streams, decreasing = TRUE), ]
This code sorts in decreasing order, but the streams data here is still of character type, so numbers like 999975 will be "higher" than 1M, which is why your data looks weird. One song had two weeks just under 1M which is why it shows up with ~2M.
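You can see the problem directly:
# character comparison is lexicographic, so "9..." sorts above "1..."
"999975" > "1000000"                             # TRUE
sort(c("999975", "1000000"), decreasing = TRUE)  # "999975" comes first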
If you use this instead you'll get closer to what you intended:
top_15 <- spotify_2022_global[order(as.numeric(spotify_2022_global$streams), decreasing = TRUE), ]
However, this is finding the highest song-weeks, not the highest songs, so in this case all 15 highest song-weeks were one song.
I'd suggest you group_by(name) and then summarize to get total streams by song, filter to the top 15, and then make name an ordered factor, e.g. with forcats::fct_reorder.
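A sketch of that suggestion, reusing the objects and column names from the question:
library(dplyr)
library(ggplot2)
library(forcats)
top_15 <- spotify_2022_global %>%
  ungroup() %>%                                 # drop the group_by(streams) from earlier
  mutate(streams = as.numeric(streams)) %>%     # convert before any sorting
  group_by(name) %>%
  summarize(total_streams = sum(streams)) %>%   # total per song, not per song-week
  slice_max(total_streams, n = 15) %>%
  mutate(name = fct_reorder(name, total_streams))
ggplot(top_15, aes(x = total_streams, y = name)) +
  geom_col() +
  ggtitle("Top 15 Songs Streamed Globally in 2022")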
I am at the final stage of a project where I have been comparing the appraisal price vs. the sold price of different properties. The complete code for data collection and tidying is below.
At this stage I am looking at different ways to visualize my data. However, I am quite new to this, so my question is whether anyone has any "new" or special ways of visualizing data that they find useful and intuitive. I have given a couple of examples of what I am able to visualize now using ggplot.
Additionally: my visualizations currently plot all 1275 observations every time. I would also like to visualize the data using the mean and median of the Percentage, Sold and Tax variables, which I am most interested in; for example, the mean value of the Percentage column for different years.
Appreciate any help!
Complete code:
#Step 1: Load needed library
library(tidyverse)
library(rvest)
library(jsonlite)
library(stringi)
library(dplyr)
library(data.table)
library(ggplot2)
#Step 2: Access the URL of where the data is located
url <- "https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/10/"
#Step 3: Direct JSON as format of data in URL
data <- jsonlite::fromJSON(url, flatten = TRUE)
#Step 4: Access all items in API
totalItems <- data$TotalNumberOfItems
#Step 5: Summarize all data from API
allData <- paste0('https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/', totalItems,'/') %>%
jsonlite::fromJSON(., flatten = TRUE) %>%
.[1] %>%
as.data.frame() %>%
rename_with(~str_replace(., "ListItems.", ""), everything())
#Step 6: removing columns not needed
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
#Step 7: remove whitespace and change to numeric in columns SoldAmount and Tax
#https://stackoverflow.com/questions/71440696/r-warning-argument-is-not-an-atomic-vector-when-attempting-to-remove-whites/71440806#71440806
allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], function(z) as.numeric(gsub(" ", "", z)))
#Step 8: Remove rows where value is NA
#https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame
alldata <- allData %>%
filter(across(where(is.numeric),
~ !is.na(.)))
#Step 9: Remove values below 10000 NOK in SoldAmount and Tax.
alldata <- alldata %>%
filter_all(any_vars(is.numeric(.) & . > 10000))
#Step 10: Calculate percentage change between tax and sold amount and create new column with percent change
#df %>% mutate(Percentage = number/sum(number))
alldata_Percent <- alldata %>% mutate(Percentage = (SoldAmount-Tax)/Tax)
Visualization
# Plot Percentage difference based on County
ggplot(data=alldata_Percent,mapping = aes(x = Percentage, y = County)) +
geom_point(size = 1.5)
#Plot County with both Date and Percentage difference
theme_set(new = ggthemes::theme_economist())
p <- ggplot(data = alldata_Percent,
mapping = aes(x = Date, y = Percentage, colour = County)) +
geom_line(na.rm = TRUE) +
geom_point(na.rm = TRUE)
p
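For the mean/median-by-year part of the question, a minimal sketch (assuming the Date column parses with as.Date; adjust the format if it is not ISO yyyy-mm-dd):
yearly <- alldata_Percent %>%
  mutate(year = format(as.Date(Date), "%Y")) %>%
  group_by(year) %>%
  summarize(across(c(Percentage, SoldAmount, Tax),
                   list(mean = mean, median = median)))
# e.g. mean Percentage by year
ggplot(yearly, aes(x = year, y = Percentage_mean)) +
  geom_col()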
I'm working through eBird code from this webpage:
https://github.com/CornellLabofOrnithology/ebird-best-practices/blob/master/03_covariates.Rmd
with the exception of using my own data. I have a .gpkg from gadm.org of Australia, and my own eBird data selected for Australia. I have followed the code exactly, except for not using "bcr" (my dataset has no BCR codes) and removing st_buffer(dist = 10000) from the rgdal code, because that prevented me from actually downloading the MODIS data for some reason.
EDIT: I have also used the data provided on the site and still received the same error.
I got stuck at this code:
lc_extract <- ebird_buff %>%
mutate(pland = map2(year_lc, data, calculate_pland, lc = landcover)) %>%
select(pland) %>%
unnest(cols = pland)
It returns this error:
Error: Problem with `mutate()` input `pland`.
x error in evaluating the argument 'x' in selecting a method for function 'exact_extract': invalid layer names
i Input `pland` is `map2(year_lc, data, calculate_pland, lc = landcover)`.
I cannot seem to figure out how to correct it; I'm rather new to dense geospatial code like this.
There is a free dataset in the link, but I haven't tried it out yet, so it may be that my data is incompatible with the code? However, I have had a look at the Gis-data.gpkg provided, and my data from gadm seems fine.
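Since the error complains about invalid layer names, one quick diagnostic (using objects from the code below) is to check that every year_lc value is an exact layer name of the landcover stack, because calculate_pland indexes it as lc[[yr]]:
names(landcover)
unique(ebird_buff$year_lc)
setdiff(unique(ebird_buff$year_lc), names(landcover))  # should be character(0)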
The two code blocks preceding the one above were:
neighborhood_radius <- 5 * ceiling(max(res(landcover))) / 2
ebird_buff <- red_knot %>%
distinct(year = format(observation_date, "%Y"),
locality_id, latitude, longitude) %>%
# for 2019 use 2018 landcover data
mutate(year_lc = if_else(as.integer(year) > max_lc_year,
as.character(max_lc_year), year),
year_lc = paste0("y", year_lc)) %>%
# convert to spatial features
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
# transform to modis projection
st_transform(crs = projection(landcover)) %>%
# buffer to create neighborhood around each point
st_buffer(dist = neighborhood_radius) %>%
# nest by year
nest(data = c(year, locality_id, geometry))
calculate_pland <- function(yr, regions, lc) {
  locs <- st_set_geometry(regions, NULL)
  exact_extract(lc[[yr]], regions, progress = FALSE) %>%
    map(~ count(., landcover = value)) %>%
    tibble(locs, data = .) %>%
    unnest(data)
}
This has been answered by the author of the webpage.
The solution was this code:
lc_extract <- NULL
for (yr in names(landcover)) {
  # get the buffered checklists for a given year
  regions <- ebird_buff$data[[which(yr == ebird_buff$year_lc)]]
  # get landcover values within each buffered checklist area
  ee <- exact_extract(landcover[[yr]], regions, progress = FALSE)
  # count the number of each landcover class for each checklist buffer
  ee_count <- map(ee, ~ count(., landcover = value))
  # attach the year and locality id back to the checklists
  ee_summ <- tibble(st_drop_geometry(regions), data = ee_count) %>%
    unnest(data)
  # bind to results
  lc_extract <- bind_rows(lc_extract, ee_summ)
}
credits go to:
Matt Strimas-Mackey
Can you help figure out the best way to resolve the length mismatch error thrown by dotsInPolys? I think it is because there are NA's or NULLs or some funk in the polygon data that makes it too long. Here's code that reproduces the error. Ultimately, I want to plot multiple races using Leaflet, but I can't produce the lat/lon needed for the random dots at this point.
require(maptools)
require(tidycensus)
person.number.divider <- 1000
census_api_key("ENTER KEY HERE", install = TRUE)
racevars <- c(White = "B02001_002", #"P005003"
Black = "B02001_003", #Black or African American alone
Latinx = "B03001_003"
)
nj.county <- get_acs(geography = "county", #tract
year = 2015,
variables = racevars,
state = "NJ", #county = "Harris County",
geometry = TRUE,
summary_var = "B02001_001")
library(sf)
st_write(nj.county, "nj.county.shp", delete_layer = TRUE)
nj <- rgdal::readOGR(dsn = "nj.county.shp") %>%
  spTransform(CRS("+proj=longlat +datum=WGS84"))
nj@data <- nj@data %>%
  tidyr::separate(NAME,
                  sep = ",",
                  into = c("county", "state")) %>%
  dplyr::select(estimat, variabl, GEOID, county) %>%
  spread(key = variabl, value = estimat) %>%
  mutate(county = trimws(county))
black.dots <- dplyr::select(nj@data, Black) / person.number.divider #%>%
black.dots <- dotsInPolys(nj, as.integer(black.dots$Black), f = "random")
# Error in dotsInPolys(nj, as.integer(black.dots$Black), f = "random") :
# different lengths
length(nj) # 63 This seems too many, because I believe NJ has 21 counties.
length(black.dots$Black) # 21
This post (Advice on troubleshooting dotsInPolys error (maptools)) came close to helping me, but I couldn't see how to apply it to my case.
I can change the length of the nj spatialpolygonsdataframe by removing NA's and counties with a black pop greater than 0, but then the map doesn't plot multiple counties (maybe there is something wrong with the census download?).
It looks like you might have gotten this figured out, but I wanted to share another approach that uses sf::st_sample() instead of maptools::dotsInPolys(). One advantage of this is that you don't need to convert the sf object you get from tidycensus to a sp object.
In the following example I split the census data by race into a list of three sf objects, then perform st_sample() on each element of the list (each race). Next, I recombine the sampled points into one sf object with a new race variable for each point. Finally, I use tmap to make a map, though you could map with ggplot2 or leaflet as well.
library(tidyverse)
library(tidycensus)
library(sf)
library(tmap)
person.number.divider <- 1000
racevars <- c(White = "B02001_002", #"P005003"
Black = "B02001_003", #Black or African American alone
Latinx = "B03001_003"
)
# get acs data with geography in "tidy" form
nj.county <- get_acs(geography = "county", #tract
year = 2015,
variables = racevars,
state = "NJ", #county = "Harris County",
geometry = TRUE,
summary_var = "B02001_001"
)
# split by race
county.split <- nj.county %>%
  split(.$variable)
# randomly sample points in polygons based on population
points.list <- map(county.split, ~ st_sample(., .$estimate / person.number.divider))
# combine points into sf collections and add race variable
points <- imap(points.list, ~ st_sf(tibble(race = rep(.y, length(.x))), geometry = .x)) %>%
  reduce(rbind)
# map!
tm_shape(nj.county) +
  tm_borders(col = "darkgray", lwd = 0.5) +
  tm_shape(points) +
  tm_dots(col = "race", size = 0.01, pal = "Set2")
I don't have enough rep to post the map image directly, but here it is.
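If you do want the Leaflet version, here is a minimal sketch building on the points object above (transformed to EPSG:4326, since Leaflet expects lon/lat):
library(leaflet)
points_ll <- st_transform(points, 4326)
pal <- colorFactor("Set2", domain = points_ll$race)
leaflet(points_ll) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(radius = 1, stroke = FALSE,
                   fillColor = ~pal(race), fillOpacity = 0.7) %>%
  addLegend(pal = pal, values = ~race, title = "Race")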
I am trying to use the 'Synth' package in R to explore the effect that certain coups had on economic growth in the countries where they occurred, but I'm hung up on an error I can't understand. When I attempt to run dataprep(), I get the following:
Error in dataprep(foo = World, predictors = c("rgdpe.pc", "population.ln", :
unit.variable not found as numeric variable in foo.
That's puzzling because my data frame, World, does include a numeric id called "idno" as specified in the call to dataprep().
Here is the script I'm using. It ingests a .csv with the requisite data from GitHub. The final step --- the call to dataprep() --- is where the error arises. I would appreciate help in figuring out why this error arises and how to avoid it so I can get on to the synth() part to follow.
library(dplyr)
library(Synth)
# DATA INGESTION AND TRANSFORMATION
World <- read.csv("https://raw.githubusercontent.com/ulfelder/coups-and-growth/master/data.raw.csv", stringsAsFactors=FALSE)
World$rgdpe.pc = World$rgdpe/World$pop # create per capita version of GDP (PPP)
World$idno = as.numeric(as.factor(World$country)) # create numeric country id
World$population.ln = log(World$population/1000) # population size in 1000s, logged
World$trade.ln = log(World$trade) # trade as % of GDP, logged
World$civtot.ln = log1p(World$civtot) # civil conflict scale, +1 and logged
World$durable.ln = log1p(World$durable) # political stability, +1 and logged
World$polscore = with(World, ifelse(polity >= -10, polity, NA)) # create version of Polity score that's missing for -66, -77, and -88
World <- World %>% # create clocks counting years since last coup (attempt) or 1950, whichever is most recent
  arrange(countrycode, year) %>%
  mutate(cpt.succ.d = ifelse(cpt.succ.n > 0, 1, 0),
         cpt.any.d = ifelse(cpt.succ.n > 0 | cpt.fail.n > 0, 1, 0)) %>%
  group_by(countrycode, idx = cumsum(cpt.succ.d == 1L)) %>%
  mutate(cpt.succ.clock = row_number()) %>%
  ungroup() %>%
  select(-idx) %>%
  group_by(countrycode, idx = cumsum(cpt.any.d == 1L)) %>%
  mutate(cpt.any.clock = row_number()) %>%
  ungroup() %>%
  select(-idx) %>%
  mutate(cpt.succ.clock.ln = log1p(cpt.succ.clock), # include +1 log versions
         cpt.any.clock.ln = log1p(cpt.any.clock))
# THAILAND 2006
THI.coup.year = 2006
THI.years = seq(THI.coup.year - 5, THI.coup.year + 5)
# Get names of countries that had no coup attempts during window analysis will cover. If you wanted to restrict the comparison to a
# specific region or in any other categorical way, this would be the place to do that as well.
THI.controls <- World %>%
filter(year >= min(THI.years) & year <= max(THI.years)) %>% # filter to desired years
group_by(idno) %>% # organize by country
summarise(coup.ever = sum(cpt.any.d)) %>% # get counts by country of years with coup attempts during that period
filter(coup.ever==0) %>% # keep only the ones with 0 counts
select(idno) # cut down to country names
THI.controls = unlist(THI.controls) # convert that data frame to a vector
names(THI.controls) = NULL # strip the vector of names
THI.synth.dat <- dataprep(
  foo = World,
  predictors = c("rgdpe.pc", "population.ln", "trade.ln", "fcf", "govfce", "energy.gni", "polscore", "durable.ln", "cpt.any.clock.ln", "civtot.ln"),
  predictors.op = "mean",
  time.predictors.prior = seq(from = min(THI.years), to = THI.coup.year - 1),
  dependent = "rgdpe.pc",
  unit.variable = "idno",
  unit.names.variable = "country",
  time.variable = "year",
  treatment.identifier = unique(World$idno[World$country == "Thailand"]),
  controls.identifier = THI.controls,
  time.optimize.ssr = seq(from = THI.coup.year, to = max(THI.years)),
  time.plot = THI.years
)
Too long for a comment.
Your dplyr statement:
World <- World %>% ...
converts World from a data.frame to a tbl_df object (read the docs on dplyr). Unfortunately, this causes mode(World[,"idno"]) to return list, not numeric and the test for numeric unit.variable fails.
You can fix this by using
World <- as.data.frame(World)
just before the call to dataprep(...).
Unfortunately (again) you now get a different error which may be due to the logic of your dplyr statement.
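To see the difference the answer is describing:
df  <- data.frame(idno = 1:3)
tbl <- dplyr::as_tibble(df)
mode(df[, "idno"])   # "numeric" -- [ , "col"] drops to a vector for data.frames
mode(tbl[, "idno"])  # "list"    -- a tibble stays a tibble, so Synth's numeric check fails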