I have a dataframe where I want to replace all values in a column that contain the value '2018' with NULL.
I have a dataset where every value in a column is a list. There are NULLs included as well. One of the values is not a list and I want to replace it with a NULL. If I replace it with NA then the datatypes in that column are mixed.
If I have a column like below, how do I replace the value containing 2018 with NULL instead of NA?
spend actions
176.2 2018-02-24
166.66 list(action_type = c("landing_page_view", "link_click", "offsit...
153.89 list(action_type = c("landing_page_view", "like", "link_click",...
156.54 list(action_type = c("landing_page_view", "like", "link_click",...
254.95 list(action_type = c("landing_page_view", "like", "link_click",...
374 list(action_type = c("landing_page_view", "like", "link_click",...
353.29 list(action_type = c("landing_page_view", "like", "link_click",...
0.41 NULL
Reproducible Example:
structure(list(spend = c("176.2", "166.66", "153.89", "156.54",
"254.95", "374", "353.29", "0.41"), actions = list("2018-02-24",
structure(list(action_type = c("landing_page_view", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("179", "275", "212", "18",
"269", "1434", "1", "17", "293", "293", "1933")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("136", "3", "248", "101", "6", "237", "730",
"11", "262", "259", "1074")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("95", "1", "156", "91",
"5", "83", "532", "1", "13", "171", "170", "711")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("178", "4", "243", "56", "4", "138", "437",
"19", "266", "262", "635")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("203", "2", "306", "105",
"7", "186", "954", "23", "331", "329", "1252")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("241", "4", "320", "106", "3", "240", "789",
"1", "17", "342", "338", "1138")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
NULL)), .Names = c("spend", "actions"), row.names = c(NA,
-8L), class = "data.frame")
My ultimate goal is to use this function with this dataset to make the action_types their own column. This function works when either a list or NULL is in the actions column:
fb_insights_all <- df %>%
  as.tibble() %>%
  filter(!map_lgl(actions, is.null)) %>%
  unnest() %>%
  right_join(select(df, -actions)) %>%
  spread(action_type, value)
Error: Each column must either be a list of vectors or a list of data frames [actions]
Without data to test this on, I'd try:
df$COL1<-ifelse(grepl("2018", df$COL1),"NULL",df$COL1)
As stated here, NA behaves more like what you seem to be trying to do, while NULL serves a different purpose. If you only want the cell to read "NULL" rather than behave like NULL, treat it as a character value.
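For the list column in the question, one possible approach is to assign list(NULL) to the offending positions, which keeps each element in place instead of dropping it. This is a minimal sketch on a cut-down stand-in for the data; the three-row df and the name is_bad are made up for illustration:

```r
# Cut-down stand-in for the question's data: one stray string,
# one data frame, and one NULL in the list column "actions"
df <- data.frame(spend = c("176.2", "166.66", "0.41"), stringsAsFactors = FALSE)
df$actions <- list(
  "2018-02-24",
  data.frame(action_type = c("like", "link_click"),
             value = c("3", "248"),
             stringsAsFactors = FALSE),
  NULL
)

# Flag every element that is neither NULL nor a data frame
is_bad <- !vapply(df$actions,
                  function(x) is.null(x) || is.data.frame(x),
                  logical(1))

# Assigning list(NULL) stores NULL at those positions;
# assigning a bare NULL would remove the elements instead
df$actions[is_bad] <- list(NULL)
```

After this, the column still has one element per row, and each element is either a data frame or NULL.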
I want to create a slope graph in R like this using ggplot
https://rud.is/b/2013/01/11/slopegraphs-in-r/
After cleaning the data and melting the data frame, I ran into this error:
Error: Aesthetics must be either length 1 or the same as the data (182): x, y, group, colour, label
There are no NAs in my data. Any ideas? Much appreciated!
Here's the code
#Read file as numeric data
betterlife<-read.csv("betterlife.csv",skip=4,stringsAsFactors = F)
num_data <- data.frame(data.matrix(betterlife))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns],
betterlife[,!numeric_columns])
## rescale selected columns data frame
final_data <- data.frame(lapply(final_data[,c(3,4,5,6,7,10,11)], function(x) scale(x, center = FALSE, scale = max(x, na.rm = TRUE)/100)))
## Add country names as indicator
final_data["INDICATOR"] <- NA
final_data$INDICATOR <- betterlife$INDICATOR
employment.data <- final_data[5:30,]
indicator <- employment.data$INDICATOR
## Melt data to draw graph
employment.melt <- melt(employment.data)
#plot
sg = ggplot(employment.melt, aes(factor(variable), value,
                                 group = indicator,
                                 colour = indicator,
                                 label = indicator)) +
  theme(legend.position = "none",
        axis.text.x = element_text(size=5),
        axis.text.y=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.ticks=element_blank(),
        axis.line=element_blank(),
        panel.grid.major.x = element_line("black", size = 0.1),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank())
sg1
This is the data I'm working with
dput(betterlife)
structure(list(X = c("", "ISO3", "AUS", "AUT", "BEL", "CAN",
"CHL", "CZE", "DNK", "EST", "FIN", "FRA", "DEU", "GRC", "HUN",
"ISL", "IRL", "ISR", "ITA", "JPN", "KOR", "LUX", "MEX", "NLD",
"NZL", "NOR", "POL", "PRT", "SVK", "SVN", "ESP", "SWE", "CHE",
"TUR", "GBR", "USA", "OECD", "", ""),
INDICATOR = c("UNIT", "COUNTRY",
"Australia", "Austria", "Belgium", "Canada", "Chile", "Czech Republic",
"Denmark", "Estonia", "Finland", "France", "Germany", "Greece",
"Hungary", "Iceland", "Ireland", "Israel", "Italy", "Japan",
"Korea", "Luxembourg", "Mexico", "Netherlands", "New Zealand",
"Norway", "Poland", "Portugal", "Slovak Republic", "Slovenia",
"Spain", "Sweden", "Switzerland", "Turkey", "United Kingdom",
"United States", "OECD average", "", "n.a. : not available"),
Rooms.per.person = c("Average number of rooms shared per person in a dwelling",
"", "2.4", "1.7", "2.3", "2.5", "1.3", "1.3", "1.9", "1.2",
"1.9", "1.8", "1.7", "1.2", "1", "1.6", "2.1", "1.1", "1.4",
"1.8", "1.3", "1.9", "1.566666667", "2", "2.3", "1.9", "1",
"1.5", "1.1", "1.1", "1.9", "1.8", "1.7", "0.7", "1.8", "1.605208333",
"1.6", "", ""),
Dwelling.without.basic.facilities = c("% of people without indoor flushing toilets in their home",
"", "3.425714286", "1.3", "0.6", "2.722", "9.36", "0.7",
"0", "12.2", "0.8", "0.8", "1.2", "1.8", "7.1", "0.3", "0.3",
"2.52", "0.2", "6.4", "7.46", "0.8", "6.6", "0", "2.984285714",
"0.1", "4.8", "2.4", "1.1", "0.6", "0", "0", "0.1", "17.1",
"0.5", "0", "2.82", "", ""),
Household.disposable.income = c("USD (PPPs adjusted)",
"", "27,039", "27,670", "26,008", "27,015", "8,712", "16,690",
"22,929", "13,486", "24,246", "27,508", "27,665", "21,499",
"13,858", "19,621", "24,313", "22,539", "24,383", "23,210",
"16,254", "19,621", "12,182", "25,977", "18,819", "29,366",
"13,811", "18,540", "15,490", "19,890", "22,972", "26,543",
"27,542", "21,030", "27,208", "37,685", "22,284", "", ""),
Employment.rate = c("% of the working age population (15-64)",
"", "72.3", "71.73", "62.01", "71.68", "59.32", "65", "73.44",
"61.02", "68.15", "63.99", "71.1", "59.55", "55.4", "78.17",
"59.96", "59.21", "56.89", "70.11", "63.31", "65.21", "60.39",
"74.67", "72.34", "75.31", "59.26", "65.55", "58.76", "66.2",
"58.55", "72.73", "78.59", "46.29", "69.51", "66.71", "64.52",
"", ""),
Long.term.unemployment.rate = c("% of people, aged 15-64, who are not working but have been actively seeking a job for over a year",
"", "1", "1.13", "4.07", "0.97", "2.98375", "3.19", "1.44",
"7.84", "2.01", "3.75", "3.4", "5.73", "5.68", "1.35", "6.74",
"1.85", "4.13", "1.99", "0.01", "1.29", "0.13", "1.24", "0.6",
"0.34", "2.49", "5.97", "8.56", "3.21", "9.1", "1.42", "1.49",
"3.11", "2.59", "2.85", "2.74", "", ""),
Quality.of.support.network = c("% of people who have friends or relatives to rely on in case of need",
"", "95.4", "94.6", "92.6", "95.3", "85.2", "88.9", "96.8",
"84.6", "93.4", "93.9", "93.5", "86.1", "88.6", "97.6", "97.3",
"93", "86", "89.7", "79.8", "95", "87.1", "94.8", "97.1",
"93.1", "92.2", "83.3", "89.6", "90.7", "94.1", "96.2", "93.2",
"78.8", "94.9", "92.3", "91.1", "", ""),
Educational.attainment = c("% of people, aged 15-64, having at least an upper-secondary (high-school) degree",
"", "69.72", "81.04", "69.58", "87.07", "67.97", "90.9",
"74.56", "88.48", "81.07", "69.96", "85.33", "61.07", "79.7",
"64.13", "69.45", "81.23", "53.31", "87", "79.14", "67.94",
"33.55", "73.29", "72.05", "80.7", "87.15", "28.25", "89.93",
"82.04", "51.23", "85.04", "86.81", "30.31", "69.63", "88.7",
"72.95", "", ""),
Students.reading.skills = c("Average reading performance of students aged 15, according to PISA",
"", "515", "470", "506", "524", "449", "478", "495", "501",
"536", "496", "497", "483", "494", "500", "496", "474", "486",
"520", "539", "472", "425", "508", "521", "503", "500", "489",
"477", "483", "481", "497", "501", "464", "494", "500", "493",
"", ""),
Air.pollution = c("Average concentration of particulate matter (PM10) in cities with population larger than 100 000, measured in micrograms per cubic meter",
"", "14.28", "29.03", "21.27", "15", "61.55", "18.5", "16.26",
"12.62", "14.87", "12.94", "16.21", "32", "15.6", "14.47",
"12.54", "27.57", "23.33", "27.14", "30.76", "12.63", "32.69",
"30.76", "11.93", "15.85", "35.07", "21", "13.14", "29.03",
"27.56", "10.52", "22.36", "37.06", "12.67", "19.4", "21.99",
"", ""),
Consultation.on.rule.making = c("Composite index, increasing with the number of key elements of formal consultation processes",
"", "10.5", "7.13", "4.5", "10.5", "2", "6.75", "7", "3.25",
"9", "3.5", "4.5", "6.5", "7.88", "5.13", "9", "2.5", "5",
"7.25", "10.38", "6", "9", "6.13", "10.25", "8.13", "10.75",
"6.5", "6.63", "10.25", "7.25", "10.88", "8.38", "5.5", "11.5",
"8.25", "7.28", "", ""),
Voter.turnout = c("Number of people voting as % of the registered population ",
"", "95", "82", "91", "60", "88", "64", "87", "62", "74",
"84", "78", "74", "64", "84", "67", "65", "81", "67", "63",
"57", "59", "80", "79", "77", "54", "64", "55", "63", "75",
"82", "48", "84", "61", "90", "72", "", ""),
Life.expectancy = c("Average number of years a person can expect to live",
"", "81.5", "80.5", "79.8", "80.7", "77.8", "77.3", "78.8",
"73.9", "79.9", "81", "80.2", "80", "73.8", "81.3", "79.9",
"81.1", "81.5", "82.7", "79.9", "80.6", "75.1", "80.2", "80.4",
"80.6", "75.6", "79.3", "74.8", "78.8", "81.2", "81.2", "82.2",
"73.6", "79.7", "77.9", "79.2", "", ""),
Self.reported.health = c("% of people reporting their health to be \"good or very good\"",
"", "84.9", "69.6", "76.7", "88.1", "56.2", "68.2", "74.3",
"56.3", "67.7", "72.4", "64.7", "76.4", "55.2", "80.6", "84.4",
"79.7", "63.4", "32.7", "43.7", "74", "65.5", "80.6", "89.7",
"80", "57.7", "48.6", "31.1", "58.8", "69.8", "79.1", "80.95",
"66.8", "76", "88", "69", "", ""),
Life.Satisfaction = c("Average self-evaluation of life satisfaction, on a scale from 0 to 10",
"", "7.5", "7.3", "6.9", "7.7", "6.6", "6.2", "7.8", "5.1",
"7.4", "6.8", "6.7", "5.8", "4.7", "6.9", "7.3", "7.4", "6.4",
"6.1", "6.1", "7.1", "6.8", "7.5", "7.2", "7.6", "5.8", "4.9",
"6.1", "6.1", "6.2", "7.5", "7.5", "5.5", "7", "7.2", "6.7",
"", ""),
Homicide.rate = c("Average number of reported homicides per 100 000 people",
"", "1.2", "0.5", "1.8", "1.7", "8.1", "2", "1.4", "6.3",
"2.5", "1.4", "0.8", "1.1", "1.5", "0", "2", "2.4", "1.2",
"0.5", "2.3", "1.5", "11.6", "1", "1.3", "0.6", "1.2", "1.2",
"1.7", "0.5", "0.9", "0.9", "0.7", "2.9", "2.6", "5.2", "2.1",
"", ""),
Assault.rate = c("% of people who report having been assaulted in the previous year",
"", "2.1", "3", "7.3", "1.4", "9.5", "3.5", "3.9", "6.2",
"2.4", "4.9", "3.6", "3.8", "3.8", "2.7", "2.7", "3.1", "4.7",
"1.6", "2.1", "4.3", "14.8", "5", "2.3", "3.3", "2.2", "6.2",
"3.5", "3.9", "4.2", "5.2", "4.2", "6", "1.9", "1.6", "4.1",
"", "")),
.Names = c("X", "INDICATOR", "Rooms.per.person", "Dwelling.without.basic.facilities",
"Household.disposable.income", "Employment.rate",
"Long.term.unemployment.rate", "Quality.of.support.network",
"Educational.attainment", "Students.reading.skills", "Air.pollution",
"Consultation.on.rule.making", "Voter.turnout", "Life.expectancy",
"Self.reported.health", "Life.Satisfaction", "Homicide.rate",
"Assault.rate"), class = "data.frame", row.names = c(NA, -39L))
Did I melt the data frame incorrectly? The row indices are not in the correct order.
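One likely reading of the error: the aesthetics refer to indicator, a separate vector with one entry per country, while the melted data has one row per country and variable (26 countries times 7 columns would give the 182 in the message). A minimal sketch of the alternative, on a made-up three-country stand-in for employment.data, names the id column when melting and refers to that column inside aes():

```r
library(reshape2)
library(ggplot2)

# Made-up stand-in for employment.data: 3 countries, 2 rescaled columns
employment.data <- data.frame(
  Employment.rate = c(92.0, 91.3, 78.9),
  Voter.turnout = c(100.0, 86.3, 95.8),
  INDICATOR = c("Australia", "Austria", "Belgium"),
  stringsAsFactors = FALSE
)

# Name the id column explicitly so it is repeated for every measured variable
employment.melt <- melt(employment.data, id.vars = "INDICATOR")

# Use the INDICATOR column of the melted data, not an outside vector,
# so every aesthetic has the same length as the data
sg <- ggplot(employment.melt,
             aes(variable, value,
                 group = INDICATOR,
                 colour = INDICATOR,
                 label = INDICATOR)) +
  geom_line() +
  geom_text(size = 3) +
  theme(legend.position = "none")
```

Printing sg draws one line per country; the theme settings from the question can be added to it unchanged.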
I'm trying to grab some election results from Politico's website using rvest.
http://www.politico.com/2016-election/results/map/president/wisconsin/
I couldn't pull all the data on the page at once, so I went for a county-level approach. Each county has a unique css selector (e.g Adams County's is: '#countyAdams .results-table'). So I grabbed all the county names from elsewhere and set up a quick loop (yes I know loops are bad practice in R but I anticipated this method taking me about 3 minutes).
Grab the URL
wiscoSixteen <- read_html("http://www.politico.com/2016-election/results/map/president/wisconsin")
Create an empty data.frame (and no I didn't pre-define the columns)
stateDf <- NULL
Get the list of counties (this isn't complete, but we don't need all 70 counties to reach the point where the routine breaks)
wiscoCounties <- c("Adams", "Ashland", "Barron", "Bayfield", "Brown", "Buffalo", "Burnett", "Calumet", "Chippewa", "Clark", "Columbia", "Crawford", "Dane", "Dodge", "Door", "Douglas", "Dunn", "Eau Claire", "Florence", "Fond du Lac", "Forest", "Grant", "Green", "Green Lake", "Iowa", "Iron", "Jackson", "Jefferson", "Juneau")
My 'for' loop:
for (i in seq_along(wiscoCounties)) {
  # Pull out the i'th county name and paste it into the selector string
  wiscoResult <- wiscoSixteen %>%
    html_node(paste0("#county", wiscoCounties[i], " .results-table")) %>%
    html_table()
  # add a column for the county name so I can ID later
  wiscoResult[, 4] <- wiscoCounties[i]
  # then rbind
  stateDf <- rbind(stateDf, wiscoResult)
}
When it gets through the 10th county it stops and returns 'Error: No matches'.
Can't find anything unique about 'Columbia', the 11th county. At a loss for what's happening. I'm sure it's something stupid as that's usually the case. Any help is appreciated.
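To see which counties are actually present in the downloaded HTML before looping, one option is to test every selector up front rather than stopping at the first failure. This is only a sketch: the inline page below is a made-up fragment, not Politico's markup, and the check does not explain why a given county is missing.

```r
library(rvest)

# Made-up HTML fragment standing in for the downloaded page
page <- read_html('<div id="countyAdams"><table class="results-table">
  <tr><th>Candidate</th><th>Votes</th></tr>
  <tr><td>A</td><td>10</td></tr></table></div>')

counties <- c("Adams", "Columbia")

# Same selector pattern as in the loop above
selectors <- paste0("#county", counties, " .results-table")

# TRUE where the selector matches at least one node in the document
found <- vapply(selectors,
                function(s) length(html_nodes(page, s)) > 0,
                logical(1))

# Counties whose table is not in the HTML that was fetched
missing_counties <- counties[!found]
```

Run against the real page object, missing_counties would list every county whose table is absent from the fetched document.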
So, why not just use the XHR requests that end up populating those tables (I'm kinda surprised you're getting any data at all from them since they get generated from a separate data request):
library(httr)
library(stringi)
library(purrr)
library(dplyr)
res <- GET("http://s3.amazonaws.com/origin-east-elections.politico.com/mapdata/2016/WI_20161108.xml")
dat <- readLines(textConnection(content(res, as="text")))
stri_split_fixed(dat[2], "|")[[1]] %>%
  stri_replace_last_fixed(";", "") %>%
  stri_split_fixed(";", 3) %>%
  map_df(~setNames(as.list(.), c("rep_id", "first", "last"))) -> candidates

dat[stri_detect_regex(dat, "^WI;P;G")] %>%
  stri_replace_first_regex("^WI;P;G;", "") %>%
  map_df(function(x) {
    county_results <- stri_split_fixed(x, "||", 2)[[1]]
    stri_replace_last_fixed(county_results[1], ";;", "") %>%
      stri_split_fixed(";") %>%
      map_df(~setNames(as.list(.), c("fips", "name", "x1", "reporting", "x2", "x3", "x4"))) -> county_prefix
    stri_split_fixed(county_results[2], "|")[[1]] %>%
      stri_split_fixed(";") %>%
      map_df(~setNames(as.list(.), c("rep_id", "party", "count", "pct", "x5", "x6", "x7", "x8", "candidate_idx"))) %>%
      left_join(candidates, by="rep_id") -> df
    df$fips <- county_prefix$fips
    df$name <- county_prefix$name
    df$reporting <- county_prefix$reporting
    select(df, -starts_with("x"))
  }) -> results
It seems to be complete data:
glimpse(results)
## Observations: 511
## Variables: 10
## $ rep_id <chr> "WI270631108", "WI270621108", "WI270691108", "WI270711108", "WI270701108", "WI270731108", "WI270721108",...
## $ party <chr> "Dem", "GOP", "Lib", "CST", "ADP", "WW", "Grn", "Dem", "GOP", "Lib", "CST", "ADP", "WW", "Grn", "Dem", "...
## $ count <chr> "1382210", "1409467", "106442", "12179", "1561", "1781", "30980", "3780", "5983", "207", "44", "4", "9",...
## $ pct <chr> "46.9", "47.9", "3.6", "0.4", "0.1", "0.1", "1.1", "37.4", "59.2", "2.0", "0.4", "0.0", "0.1", "0.8", "5...
## $ candidate_idx <chr> "1", "2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5", "6", "7",...
## $ first <chr> "Clinton", "Trump", "Johnson", "Castle", "De La Fuente", "Moorehead", "Stein", "Clinton", "Trump", "John...
## $ last <chr> "Hillary", "Donald", "Gary", "Darrell", "Rocky", "Monica", "Jill", "Hillary", "Donald", "Gary", "Darrell...
## $ fips <chr> "0", "0", "0", "0", "0", "0", "0", "55001", "55001", "55001", "55001", "55001", "55001", "55001", "55003...
## $ name <chr> "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Adams", "Ada...
## $ reporting <chr> "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100....
Despite the ".xml" extension on the URL, it's not XML data. I also don't know what some of the columns actually are, but you can dig into that. Also, there's a whole other section of data:
WI;S;G;0;Wisconsin;X;100.0;X;;50885;;||WI269201108;Dem;1380496;46.8;;X;;;1|WI267231108;GOP;1479262;50.2;X;X;X;;2|WI270541108;Lib;87291;3.0;;X;;;3
WI;S;G;55001;Adams;X;100.0;X;;50885;;||WI269201108;Dem;4093;41.2;;X;;;1|WI267231108;GOP;5346;53.9;X;X;X;;2|WI270541108;Lib;486;4.9;;X;;;3
WI;S;G;55003;Ashland;X;100.0;X;;50885;;||WI269201108;Dem;4349;55.1;;X;;;1|WI267231108;GOP;3337;42.2;X;X;X;;2|WI270541108;Lib;214;2.7;;X;;;3
WI;S;G;55005;Barron;X;100.0;X;;50885;;||WI269201108;Dem;8691;38.8;;X;;;1|WI267231108;GOP;12863;57.4;X;X;X;;2|WI270541108;Lib;853;3.8;;X;;;3
WI;S;G;55007;Bayfield;X;100.0;X;;50885;;||WI269201108;Dem;5161;54.6;;X;;;1|WI267231108;GOP;4022;42.6;X;X;X;;2|WI270541108;Lib;263;2.8;;X;;;3
WI;S;G;55009;Brown;X;100.0;X;;50885;;||WI269201108;Dem;51004;40.0;;X;;;1|WI267231108;GOP;71750;56.3;X;X;X;;2|WI270541108;Lib;4615;3.6;;X;;;3
WI;S;G;55011;Buffalo;X;100.0;X;;50885;;||WI269201108;Dem;2746;39.9;;X;;;1|WI267231108;GOP;3850;56.0;X;X;X;;2|WI270541108;Lib;285;4.1;;X;;;3
WI;S;G;55013;Burnett;X;100.0;X;;50885;;||WI269201108;Dem;3143;37.4;;X;;;1|WI267231108;GOP;4998;59.5;X;X;X;;2|WI270541108;Lib;258;3.1;;X;;;3
which obviously means something for that page (it's kinda obvious, but I'm so weary from the election that I'm kinda done with the data), and you can process it in a similar fashion to the code above.
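As a rough base-R sketch of what "similar fashion" could look like for one of those lines: the function name parse_race_line is made up here, and the field meanings are assumed to match the column names used for the "WI;P;G" rows above, which is a guess rather than documented format.

```r
# One "WI;S;G" line, copied from the sample above (Adams County)
line <- "WI;S;G;55001;Adams;X;100.0;X;;50885;;||WI269201108;Dem;4093;41.2;;X;;;1|WI267231108;GOP;5346;53.9;X;X;X;;2|WI270541108;Lib;486;4.9;;X;;;3"

# Hypothetical helper: splits one line into a data frame of candidate rows
parse_race_line <- function(line) {
  # The county prefix and the candidate records are separated by "||"
  parts <- strsplit(line, "||", fixed = TRUE)[[1]]
  prefix <- strsplit(parts[1], ";", fixed = TRUE)[[1]]

  # Candidate records are separated by "|", their fields by ";"
  records <- strsplit(strsplit(parts[2], "|", fixed = TRUE)[[1]], ";", fixed = TRUE)
  fields <- do.call(rbind, records)

  # Assumed positions: prefix 4 = FIPS, 5 = county name, 7 = % reporting;
  # record 1 = id, 2 = party, 3 = votes, 4 = percent, 9 = candidate index
  data.frame(
    fips = prefix[4],
    name = prefix[5],
    reporting = prefix[7],
    rep_id = fields[, 1],
    party = fields[, 2],
    count = as.integer(fields[, 3]),
    pct = as.numeric(fields[, 4]),
    candidate_idx = fields[, 9],
    stringsAsFactors = FALSE
  )
}

adams <- parse_race_line(line)
```

Applying the helper to every line that starts with "WI;S;G" and row-binding the results would give a table shaped like results above, minus the candidate names.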