Given this:
kc$sqft_living_group <- cut(kc$sqft_living, breaks = c(0, 1000, 2000, 3000, 5000, 7000, 10000, 15000), dig.lab=5)
How do I set the x-axis limits of my ggplot2 graph?
Nothing I can find shows the syntax for setting limits when the axis values are intervals.
kc %>%
filter(zipcode %in% top_10_zipcodes) %>%
group_by(sqft_living_group) %>%
summarize(Mean_Price = mean(price)) %>%
ggplot(aes(y = Mean_Price, x = sqft_living_group)) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = comma) +
scale_x_discrete(limits = "(0, 1000], (1000, 12000]") <---------- HERE
structure of data:
'data.frame': 21613 obs. of 22 variables:
$ id : num 7.13e+09 6.41e+09 5.63e+09 2.49e+09 1.95e+09 ...
$ date : POSIXct, format: "2014-10-13" "2014-12-09" "2015-02-25" "2014-12-09" ...
$ price : num 221900 538000 180000 604000 510000 ...
$ bedrooms : int 3 3 2 4 3 4 3 3 3 3 ...
$ bathrooms : num 1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
$ sqft_living : int 1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
$ sqft_lot : int 5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
$ floors : Factor w/ 6 levels "1","1.5","2",..: 1 3 1 1 1 1 3 1 1 3 ...
$ waterfront : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ view : int 0 0 0 0 0 0 0 0 0 0 ...
$ condition : int 3 3 3 5 3 3 3 3 3 3 ...
$ grade : int 7 7 6 7 8 11 7 7 7 7 ...
$ sqft_above : int 1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
$ sqft_basement : int 0 400 0 910 0 1530 0 0 730 0 ...
$ yr_built : int 1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
$ yr_renovated : Factor w/ 70 levels "0","1934","1940",..: 1 46 1 1 1 1 1 1 1 1 ...
$ zipcode : Factor w/ 70 levels "98001","98002",..: 67 56 17 59 38 30 3 69 61 24 ...
$ lat : num 47.5 47.7 47.7 47.5 47.6 ...
$ long : num -122 -122 -122 -122 -122 ...
$ sqft_living15 : int 1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
$ sqft_lot15 : int 5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...
$ sqft_living_group: Factor w/ 7 levels "(0,1000]","(1000,2000]",..: 2 3 1 2 2 5 2 2 2 2 ...
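One likely fix (a sketch, not from the original post): scale_x_discrete(limits = ...) takes a character vector of level names, not a single string, and the levels that cut() produced have no space after the comma (see the sqft_living_group levels in the str output above), so the strings must match exactly:
# Show only the chosen intervals on the x axis
scale_x_discrete(limits = c("(0,1000]", "(1000,2000]"))
# or pick the levels programmatically:
scale_x_discrete(limits = levels(kc$sqft_living_group)[1:2])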
'data.frame': 33510 obs. of 10 variables:
$ model : Factor w/ 92 levels " 1 Series"," 2 Series",..: 3 54 25 72 19 16 37 41 29 29 ...
$ year : int 2009 2019 2014 2016 2016 2017 2019 2017 2019 2015 ...
$ price : int 4675 40950 11472 17998 14399 9980 37990 14000 12299 8484 ...
$ transmission: Factor w/ 3 levels "Automatic","Manual",..: 2 3 3 2 2 2 1 2 2 2 ...
$ mileage : int 70000 19322 83417 30010 45693 70860 1499 20122 4132 25000 ...
$ fuelType : Factor w/ 4 levels "Diesel","Electric",..: 4 4 1 1 1 4 1 1 4 4 ...
$ tax : int 165 150 145 235 20 30 145 30 145 0 ...
$ mpg : num 47.9 34 54.3 44.1 65.7 55.4 40.9 64.2 48.7 65.7 ...
$ engineSize : num 2 3 2.1 2 2.1 1 2 1.5 1.1 1 ...
$ automaker : Factor w/ 4 levels "BMW","Ford","Mercedes",..: 1 1 3 2 3 2 3 2 2 2 ...
mycars_formula = price ~ year + transmission + mileage + fuelType + tax + mpg + engineSize + automaker
dt_mycars <- tree(mycars_formula, data = training_mycars)
cv_mycars <- cv.tree(dt_mycars, FUN=prune.misclass)
pruned_tree_size <- rev(cv_mycars$size)[which.min(rev(cv_mycars$dev))]
p_dt_mycars <- prune.misclass(dt_mycars, best = pruned_tree_size)
Error in prune.tree(tree = dt, best = pruned_tree_size, method = "misclass") :
misclass only for classification trees
Can someone explain to me why I cannot use the misclass method?
I know that my model factor has too many levels, so I excluded it from my formula. If you also have a suggestion about how I can include it, that would be very helpful.
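A sketch of what is likely happening (my reading of the error, not stated in the post): price is numeric, so tree() fit a regression tree, and prune.misclass / cv.tree(FUN = prune.misclass) only apply to classification trees, i.e. a factor response. For a regression tree you would prune on deviance, which is the default:
library(tree)

# Cost-complexity pruning for a regression tree uses deviance,
# the default FUN for both cv.tree() and prune.tree().
cv_mycars <- cv.tree(dt_mycars)
pruned_tree_size <- rev(cv_mycars$size)[which.min(rev(cv_mycars$dev))]
p_dt_mycars <- prune.tree(dt_mycars, best = pruned_tree_size)
If a classification tree was actually intended, the response has to be a factor (for example a price band created with cut()). As for the 92-level model factor, tree() cannot split unordered factor predictors with more than 32 levels, so collapsing rare models into an "other" level (e.g. with forcats::fct_lump()) is one way to bring it back into the formula.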
I'm a newbie, working on a classification to find the causes of coral diseases. The dataset contains 45 variables.
The output variable is a factor with 21 levels (21 diseases). The inputs are numeric and factor variables; some of the factors have as many as 94 levels, for example the species of coral, and I can't collapse or split those factors because I want to be as precise as possible: maybe one species is less resistant than another. The numeric variables are things like population in the area, fishing trips, etc.
First problem: I tried genetic algorithms to select the most important variables, random forests, etc., but the session gets aborted, so the variables I eliminated were chosen only from correlograms. I want something stronger to decide which variables to select.
Second problem: I've tried everything I know and searched Google extensively for something that runs and produces a classification, but nothing works. I tried SVM, random forests, CART, GBM, bagging, and boosting, but none of them can handle this dataset.
This is the structure of the dataset:
'data.frame': 136510 obs. of 45 variables:
$ SITE : Factor w/ 144 levels "TUT-1511","TUT-1513",..: 56 15 55 21 12 12 17 53 48 82 ...
$ Zone_Fine : Factor w/ 17 levels "Aunuu_E","Aunuu_W",..: 11 9 10 9 9 9 9 8 10 10 ...
$ TRANSECT : num 1 1 1 1 1 1 1 1 1 1 ...
$ SEGMENT : num 5 1 1 1 7 5 7 5 3 7 ...
$ Seg_WIDTH : num 1 1 1 1 1 1 1 1 1 1 ...
$ Seg_LENGTH : num 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
$ SPECIES : Factor w/ 156 levels "AAAA","AABR",..: 94 126 94 102 9 126 135 94 93 94 ...
$ COLONYLENGTH : num 11 45 10 5 12 10 8 30 20 14 ...
$ OLDDEAD : num 5 2 5 0 0 5 10 0 5 10 ...
$ RECENTDEAD : num 0 10 0 0 0 0 0 0 0 0 ...
$ DZCLASS : Factor w/ 21 levels "Acute Tissue Loss - White Syndrome",..: 14 14 14 14 14 14 14 14 14 14 ...
$ EXTENT : num 52.9 52.9 52.9 52.9 52.9 ...
$ SEVERITY : num 3.11 3.11 3.11 3.11 3.11 ...
$ TAXONNAME.x : Factor w/ 155 levels "Acanthastrea hemprichii",..: 95 132 95 107 7 132 133 95 89 95 ...
$ PHYLUM : Factor w/ 2 levels "Cnidaria","Rhodophyta": 1 1 1 1 1 1 1 1 1 1 ...
$ CLASS : Factor w/ 3 levels "Anthozoa","Florideophyceae",..: 1 1 1 1 1 1 1 1 1 1 ...
$ FAMILY : Factor w/ 20 levels "Acroporidae",..: 1 18 1 2 1 18 18 1 8 1 ...
$ GENUS : Factor w/ 55 levels "Acanthastrea",..: 35 44 35 39 2 44 44 35 34 35 ...
$ RANK : Factor w/ 2 levels "Genus","Species": 1 1 1 1 2 1 2 1 1 1 ...
$ DATE_ : Date, format: "0015-03-27" ...
$ OBS_YEAR : num 2015 2015 2015 2015 2015 ...
$ REEF_ZONE : Factor w/ 2 levels "Backreef","Forereef": 2 2 2 2 2 2 2 2 2 2 ...
$ DEPTH_BIN : Factor w/ 4 levels "Bank","Deep",..: 2 2 4 3 2 2 3 4 3 3 ...
$ LBSP : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ Zone_Fine_ReefZone_Depth: Factor w/ 41 levels "Aunuu_E_Deep",..: 30 24 29 25 24 24 25 23 28 28 ...
$ Area_km2.x : num 50.9 49.1 101.8 49.1 49.1 ...
$ Fishing.trips.per.km2 : num 719 1148 1431 1148 1148 ...
$ Area_km2.y : num 50.9 49.1 50.9 49.1 49.1 ...
$ Pop.km2 : num 167.5 49.1 561.9 49.1 49.1 ...
$ SHED_NAME : Factor w/ 35 levels "Aasu","Afao - Asili",..: 2 9 15 17 17 1 1 35 28 26 ...
$ Shed_Cond : Factor w/ 4 levels "Extensive","Intermediate",..: 3 4 2 4 4 3 3 3 1 2 ...
$ Shed_Area_Calc : num 30202 29422 458542 126361 32595 ...
$ Perc_Area : num 0.00128 0.00107 0.00993 0.00458 0.00118 ...
$ Cond_Scale : num 3 4 2 4 4 3 3 3 1 2 ...
$ Shoreline_m : num 23146 33046 45821 33046 33046 ...
$ Rank : num 5 9 3 9 9 9 9 6 3 3 ...
$ Comp.8 : num 0.826 0.814 0.838 0.814 0.814 ...
$ Ble : num 0.958 0.969 0.959 0.969 0.969 ...
$ DZ : num 0.647 0.837 0.732 0.837 0.837 ...
$ Herb : num 0.682 0.564 0.704 0.564 0.564 ...
$ Rec : num 0.375 0.477 0.467 0.477 0.477 ...
$ MA : num 0.965 0.975 0.907 0.975 0.975 ...
$ Dam : num 0.998 1 0.992 1 1 ...
$ TAXONNAME.y : Factor w/ 94 levels "Abudefduf sordidus",..: 94 94 94 94 94 94 94 94 94 94 ...
$ Dummy : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
I expected a classification of "DZCLASS".
Thanks, every recommendation is welcome!
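One direction that may help with both problems (my suggestion, not from the post; the data frame name coral and the dropped columns are assumptions): the ranger package fits random forests on large data with high-cardinality factors far more comfortably than randomForest, and its impurity importance gives a stronger basis for variable selection than correlograms.
library(ranger)

# Drop identifier-like columns that should not act as predictors (an assumption),
# and remove rows with missing values, which ranger does not accept.
coral_model <- na.omit(subset(coral, select = -c(SITE, DATE_)))

rf_fit <- ranger(
  DZCLASS ~ .,                          # classify the disease class from everything else
  data = coral_model,
  num.trees = 500,
  importance = "impurity",              # impurity-based variable importance
  respect.unordered.factors = "order"   # order levels instead of trying every split of a 94-level factor
)

rf_fit$prediction.error                               # out-of-bag misclassification rate
head(sort(importance(rf_fit), decreasing = TRUE), 10) # ten most influential variables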
I have tried everything I can think of to fix this error but have not been able to figure it out. I'm on a 32-bit machine, trying to build a choropleth. The data file is pretty basic: some municipal IDs with population figures associated with them. The shape file is taken from here: www.ontario.ca/data/municipal-boundaries
library('tmap')
library('leaflet')
library('magrittr')
library('rio')
library('plyr')
library('scales')
library('htmlwidgets')
library('tmaptools')
setwd("C:/Users/rdhasa/desktop")
datafile <- "shapefiles2/Population - 2014.csv"
Pop2014 <- rio::import(datafile)
Pop2014$Population <- as.factor(Pop2014$Population)
str(Pop2014)
'data.frame': 454 obs. of 9 variables:
$ MUNID : int 20002 18000 18013 18001 18005 18017 18009 18039 18020 18029 ...
$ YEAR : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
$ MAH CODE : int 1106 10000 10101 10102 10401 10402 10404 10601 10602 10603 ...
$ V4 : int 1999 1800 1813 1801 1805 1817 1809 1839 1820 1829 ...
$ Municipality: chr "Toronto C" "Durham R" "Oshawa C" "Pickering C" ...
$ Tier : chr "ST" "UT" "LT" "LT" ...
$ A : int 11 11 11 11 11 11 11 11 11 11 ...
$ B : chr "a" "a" "a" "a" ...
$ Population : Factor w/ 438 levels "-","1,006","1,026",..: 160 359 117 432 86 419 97 73 179 171 ...
mnshape <- "shapefiles2/MUNICIPAL_BOUNDARY_LOWER_AND_SINGLE_TIER.shp"
mngeo2 <- read_shape(file=mnshape)
str(mngeo2@data)
'data.frame': 683 obs. of 13 variables:
$ MUNID : int 1002 1002 1002 1009 1009 1009 1016 1016 1016 1026 ...
$ MAH_CODE : int 71616 71616 71616 71618 71618 71618 71614 71614 71614 71613 ...
$ SGC_CODE : int 1005 1005 1005 1011 1011 1011 1020 1020 1020 1030 ...
$ ASSESSMENT: int 101 101 101 406 406 406 506 506 506 511 ...
$ LEGAL_NAME: Factor w/ 414 levels "CITY OF BARRIE",..: 369 369 369 370 370 370 96 96 96 334 ...
$ STATUS : Factor w/ 2 levels "LOWER TIER","SINGLE TIER": 1 1 1 1 1 1 1 1 1 1 ...
$ EXTENT : Factor w/ 3 levels "ISLANDS","LAND",..: 1 2 3 1 2 3 1 2 3 2 ...
$ MSO : Factor w/ 4 levels "CENTRAL","EASTERN",..: 2 2 2 2 2 2 2 2 2 2 ...
$ NAME_PREFI: Factor w/ 8 levels "-","CITY OF",..: 6 6 6 6 6 6 4 4 4 6 ...
$ UPPER_TIER: Factor w/ 30 levels "BRUCE","DUFFERIN",..: 27 27 27 27 27 27 27 27 27 27 ...
$ NAME : Factor w/ 413 levels "ADDINGTON HIGHLANDS",..: 339 339 339 342 342 342 337 337 337 259 ...
$ Shape_Leng: num 0.115 1.622 1.563 0.551 1.499 ...
$ Shape_Area: num 2.32e-05 6.95e-02 7.51e-03 5.63e-04 5.09e-02 ...
mnmap <- append_data(mngeo2, Pop2014, key.shp = "MUNID", key.data="MUNID")
minPct <- min(c(mnmap@data$Population))
maxPct <- max(c(mnmap@data$Population))
paletteLayers <- colorBin(palette = "RdBu", domain = c(minPct, maxPct), bins = c(0, 50000,200000 ,500000, 1000000, 2000000) , pretty=FALSE)
rm(mngeo2)
rm(Pop2014)
rm(mnshape)
rm(datafile)
rm(maxPct)
rm(minPct)
gc()
leaflet(mnmap) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(stroke=TRUE,
smoothFactor = 0.2,
weight = 1,
fillOpacity = .6)
Error: cannot allocate vector of size 177.2 Mb
Is there a way I can save space by simplifying the shape file? If so, how would I go about doing this efficiently?
Thanks
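One way to save space (a sketch, under the assumption that the boundaries are far more detailed than the map needs): simplify the polygons before joining and plotting, so leaflet has far fewer vertices to keep in memory. rmapshaper::ms_simplify() does topology-aware simplification of sp objects and would slot in right after read_shape():
library(rmapshaper)

# Keep roughly 5% of the original vertices; tune `keep` to taste.
mngeo2_small <- ms_simplify(mngeo2, keep = 0.05)

format(object.size(mngeo2), units = "Mb")         # size before
format(object.size(mngeo2_small), units = "Mb")   # size after

# Then join and map the simplified shapes instead of mngeo2.
mnmap <- append_data(mngeo2_small, Pop2014, key.shp = "MUNID", key.data = "MUNID")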
I'm able to scrape the first table of this page with the rvest package, using the following code:
library(rvest)
library(magrittr)
urlbbref <- read_html("http://www.baseball-reference.com/bio/Venezuela_born.shtml")
Bat <- urlbbref %>%
html_node(xpath = '//*[(@id = "bio_batting")]') %>%
html_table()
But I'm not able to scrape the second table of this page. I used SelectorGadget to find the XPath of both tables and used that info in the code, but it doesn't seem to work for the second one.
Pit <- urlbbref %>%
html_node(xpath = '//*[(@id = "div_bio_pitching")]') %>%
html_table()
I come up with 3 tables in total.
library(magrittr)
library(rvest)
library(xml2)
library(stringi)
urlbbref <- read_html("http://www.baseball-reference.com/bio/Venezuela_born.shtml")
# First table is in the markup
table_one <- xml_find_all(urlbbref, "//table") %>% html_table
# Additional tables are within the comment tags, ie <!-- tables -->
# Which is why your xpath is missing them.
# First get the commented nodes
alt_tables <- xml2::xml_find_all(urlbbref,"//comment()") %>% {
#Find only commented nodes that contain the regex for html table markup
raw_parts <- as.character(.[grep("\\</?table", as.character(.))])
# Remove the comment begin and end tags
strip_html <- stringi::stri_replace_all_regex(raw_parts, c("<\\!--","-->"),c("",""),
vectorize_all = FALSE)
# Loop through the pieces that have tables within markup and
# apply the same functions
lapply(grep("<table", strip_html, value = TRUE), function(i){
rvest::html_table(xml_find_all(read_html(i), "//table")) %>%
.[[1]]
})
}
# Put all the data frames into a list.
all_tables <- c(
table_one, alt_tables
)
Results:
> Map(str, all_tables)
'data.frame': 361 obs. of 27 variables:
$ Rk : int 1 2 3 4 5 6 7 8 9 10 ...
$ Name : chr "Bobby Abreu" "Ehire Adrianza" "Jesus Aguilar" "Edgardo Alfonzo" ...
$ Yrs : int 18 4 4 12 6 7 1 5 5 2 ...
$ From : int 1996 2013 2014 1995 2006 2011 2000 2011 2013 2002 ...
$ To : int 2014 2016 2017 2006 2011 2017 2000 2015 2017 2004 ...
$ ASG : int 2 0 0 1 0 4 0 1 0 0 ...
$ G : int 2425 154 47 1506 193 842 2 92 150 38 ...
$ PA : int 10081 331 89 6108 624 3708 5 109 3 75 ...
$ AB : int 8480 291 81 5385 591 3411 5 94 2 64 ...
$ R : int 1453 27 4 777 44 456 1 5 0 11 ...
$ H : int 2470 64 18 1532 142 1062 1 22 0 16 ...
$ 2B : int 574 16 3 282 24 208 0 4 0 4 ...
$ 3B : int 59 1 0 18 3 19 0 0 0 0 ...
$ HR : int 288 3 0 146 17 60 0 1 0 2 ...
$ RBI : int 1363 26 8 744 67 326 0 9 0 10 ...
$ SB : int 400 4 0 53 1 204 0 0 0 1 ...
$ CS : int 128 4 0 17 2 59 0 0 0 0 ...
$ BB : int 1476 23 6 596 17 214 0 1 1 7 ...
$ SO : int 1840 60 28 617 158 389 1 34 0 12 ...
$ BA : num 0.291 0.22 0.222 0.284 0.24 0.311 0.2 0.234 0 0.25 ...
$ OBP : num 0.395 0.292 0.281 0.357 0.271 0.354 0.2 0.237 0.333 0.324 ...
$ SLG : num 0.475 0.313 0.259 0.425 0.377 0.436 0.2 0.309 0 0.406 ...
$ OPS : num 0.87 0.605 0.54 0.782 0.648 0.791 0.4 0.546 0.333 0.731 ...
$ Birthdate : chr "Mar 11, 1974" "Aug 21, 1989" "Jun 30, 1990" "Nov 8, 1973" ...
$ Debut : chr "Sep 1, 1996" "Sep 8, 2013" "May 15, 2014" "Apr 26, 1995" ...
$ Birthplace: chr "Maracay, Aragua" "Guarenas, Miranda" "Maracay, Aragua" "Santa Teresa del Tuy, Miranda" ...
$ Pos : chr "POS" "POS" "POS" "POS" ...
'data.frame': 157 obs. of 31 variables:
$ Rk : int 1 2 3 4 5 6 7 8 9 10 ...
$ Name : chr "Henderson Alvarez" "Jose Alvarez" "Wilson Alvarez" "Alexi Amarista" ...
$ Yrs : int 5 5 14 7 5 2 10 4 6 4 ...
$ From : int 2011 2013 1989 2011 1980 2015 1999 2007 2012 2005 ...
$ To : int 2015 2017 2005 2017 1984 2016 2008 2011 2017 2009 ...
$ ASG : int 1 0 1 0 0 0 0 0 0 0 ...
$ W : int 27 6 102 0 9 4 53 1 15 3 ...
$ L : int 34 12 92 0 6 2 65 3 6 4 ...
$ W-L% : num 0.443 0.333 0.526 NA 0.6 0.667 0.449 0.25 0.714 0.429 ...
$ ERA : num 3.8 3.97 3.96 0 3.27 4.35 4.65 5.28 2.91 6.86 ...
$ G : int 92 150 355 2 110 72 185 43 275 25 ...
$ GS : int 92 6 263 0 0 0 167 0 0 8 ...
$ GF : int 0 32 18 2 66 14 7 16 36 12 ...
$ CG : int 5 0 12 0 0 0 0 0 0 0 ...
$ SHO : int 5 0 5 0 0 0 0 0 0 0 ...
$ SV : int 0 0 4 0 7 0 0 0 0 0 ...
$ IP : num 563 167.2 1747.2 0.2 220 ...
$ H : int 596 174 1624 0 222 64 891 57 177 68 ...
$ R : int 261 85 857 0 86 39 519 29 75 51 ...
$ ER : int 238 74 769 0 80 30 478 27 72 46 ...
$ HR : int 54 17 190 0 17 5 122 7 10 4 ...
$ BB : int 129 55 805 0 68 36 431 21 80 34 ...
$ IBB : int 7 10 29 0 7 3 41 5 17 1 ...
$ SO : int 296 148 1330 0 113 63 680 41 180 37 ...
$ HBP : int 22 8 50 0 3 2 51 4 11 4 ...
$ BK : int 3 1 4 0 3 1 6 0 3 1 ...
$ WP : int 16 3 28 0 5 2 43 1 14 2 ...
$ BF : int 2358 729 7518 2 928 285 4055 221 913 282 ...
$ Birthdate : chr "Apr 18, 1990" "May 6, 1989" "Mar 24, 1970" "Apr 6, 1989" ...
$ Debut : chr "Aug 10, 2011" "Jun 9, 2013" "Jul 24, 1989" "Apr 26, 2011" ...
$ Birthplace: chr "Valencia, Carabobo" "Barcelona, Anzoategui" "Maracaibo, Zulia" "Barcelona, Anzoategui" ...
'data.frame': 3 obs. of 17 variables:
$ Rk : int 1 2 NA
$ Mgr : chr "Ozzie Guillen" "Al Pedrique" "Totals"
$ Yrs : int 9 1 10
$ From : int 2004 2004 2004
$ To : int 2012 2004 2012
$ W : int 747 22 769
$ L : int 710 61 771
$ W-L% : num 0.513 0.265 0.499
$ Ties : int 0 0 0
$ G>.500 : int 37 -39 -2
$ G : int 1457 83 1540
$ BestFin : int 1 5 1
$ WrstFin : int 5 5 5
$ AvRk : num 2.7 5 2.8
$ Birthdate : chr "Jan 20, 1964" "Aug 11, 1960" ""
$ Debut : chr "Apr 9, 1985" "Apr 14, 1987" ""
$ Birthplace: chr "Ocumare del Tuy, Miranda" "Valencia, Carabobo" ""
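As a usage note (assuming the scrape order matches the page: batting, pitching, managers, which the three str() outputs above suggest), the pitching table the question asked about is simply the second element of the list:
Pit <- all_tables[[2]]   # the 157-row pitching table shown above
head(Pit)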
I am trying to update a value in a data frame but am getting what seems to me a weird error about an operation that I don't think I am using.
Here's a summary of the data:
> str(us.cty2015@data)
'data.frame': 3108 obs. of 15 variables:
$ STATEFP : Factor w/ 52 levels "01","02","04",..: 17 25 33 46 4 14 16 24 36 42 ...
$ COUNTYFP : Factor w/ 325 levels "001","003","005",..: 112 91 67 9 43 81 7 103 72 49 ...
$ COUNTYNS : Factor w/ 3220 levels "00023901","00025441",..: 867 1253 1600 2465 38 577 690 1179 1821 2104 ...
$ AFFGEOID : Factor w/ 3220 levels "0500000US01001",..: 976 1472 1879 2813 144 657 795 1395 2098 2398 ...
$ GEOID : Factor w/ 3220 levels "01001","01003",..: 976 1472 1879 2813 144 657 795 1395 2098 2398 ...
$ NAME : Factor w/ 1910 levels "Abbeville","Acadia",..: 1558 1703 1621 688 856 1075 148 1807 1132 868 ...
$ LSAD : Factor w/ 9 levels "00","03","04",..: 5 5 5 5 5 5 5 5 5 5 ...
$ ALAND : num 1.66e+09 1.10e+09 3.60e+09 2.12e+08 1.50e+09 ...
$ AWATER : num 2.78e+06 5.24e+07 3.50e+07 2.92e+08 8.91e+06 ...
$ t_pop : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_wht : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_free_blk: num 0 0 0 0 0 0 0 0 0 0 ...
$ n_slv : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_blk : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_free : num 0 0 0 0 0 0 0 0 0 0 ...
> str(us.cty1860@data)
'data.frame': 2126 obs. of 29 variables:
$ DECADE : Factor w/ 1 level "1860": 1 1 1 1 1 1 1 1 1 1 ...
$ NHGISNAM : Factor w/ 1236 levels "Abbeville","Accomack",..: 1142 1218 1130 441 812 548 1144 56 50 887 ...
$ NHGISST : Factor w/ 41 levels "010","050","060",..: 32 13 9 36 16 36 16 30 23 39 ...
$ NHGISCTY : Factor w/ 320 levels "0000","0010",..: 142 206 251 187 85 231 131 12 6 161 ...
$ ICPSRST : Factor w/ 37 levels "1","11","12",..: 5 13 21 26 22 26 22 10 15 17 ...
$ ICPSRCTY : Factor w/ 273 levels "10","1010","1015",..: 25 93 146 72 247 122 12 10 228 45 ...
$ ICPSRNAM : Factor w/ 1200 levels "ABBEVILLE","ACCOMACK",..: 1108 1184 1097 432 791 535 1110 55 49 860 ...
$ STATENAM : Factor w/ 41 levels "Alabama","Arkansas",..: 32 13 9 36 16 36 16 30 23 39 ...
$ ICPSRSTI : int 14 31 44 49 45 49 45 24 34 40 ...
$ ICPSRCTYI : int 1210 1970 2910 1810 710 2450 1130 110 50 1450 ...
$ ICPSRFIP : num 0 0 0 0 0 0 0 0 0 0 ...
$ STATE : Factor w/ 41 levels "010","050","060",..: 32 13 9 36 16 36 16 30 23 39 ...
$ COUNTY : Factor w/ 320 levels "0000","0010",..: 142 206 251 187 85 231 131 12 6 161 ...
$ PID : num 1538 735 306 1698 335 ...
$ X_CENTROID : num 1348469 184343 1086494 -62424 585888 ...
$ Y_CENTROID : num 556680 588278 -229809 -433290 -816852 ...
$ GISJOIN : Factor w/ 2126 levels "G0100010","G0100030",..: 1585 627 319 1769 805 1788 823 1425 1079 2006 ...
$ GISJOIN2 : Factor w/ 2126 levels "0100010","0100030",..: 1585 627 319 1769 805 1788 823 1425 1079 2006 ...
$ SHAPE_AREA : num 2.35e+09 1.51e+09 8.52e+08 2.54e+09 6.26e+08 ...
$ SHAPE_LEN : num 235777 155261 166065 242608 260615 ...
$ t_pop : int 25043 653 4413 8184 174491 1995 4324 17187 4649 8392 ...
$ n_wht : int 24974 653 4295 6892 149063 1684 3001 17123 4578 2580 ...
$ n_free_blk : int 69 0 2 0 10939 2 7 64 12 409 ...
$ n_slv : int 0 0 116 1292 14484 309 1316 0 59 5403 ...
$ n_blk : int 69 0 118 1292 25423 311 1323 64 71 5812 ...
$ n_free : num 25043 653 4297 6892 160007 ...
$ frac_free : num 1 1 0.974 0.842 0.917 ...
$ frac_free_blk: num 1 NA 0.0169 0 0.4303 ...
$ frac_slv : num 0 0 0.0263 0.1579 0.083 ...
> str(overlap)
'data.frame': 15266 obs. of 7 variables:
$ cty2015 : Factor w/ 3108 levels "0","1","10","100",..: 1 1 2 2 2 2 2 1082 1082 1082 ...
$ cty1860 : Factor w/ 2126 levels "0","1","10","100",..: 1047 1012 1296 1963 2033 2058 2065 736 1413 1569 ...
$ area_inter : num 1.66e+09 2.32e+05 9.81e+04 1.07e+09 7.67e+07 ...
$ area1860 : num 1.64e+11 1.81e+11 1.54e+09 2.91e+09 2.32e+09 ...
$ frac_1860 : num 1.01e-02 1.28e-06 6.35e-05 3.67e-01 3.30e-02 ...
$ sum_frac_1860 : num 1 1 1 1 1 ...
$ scaled_frac_1860: num 1.01e-02 1.28e-06 6.35e-05 3.67e-01 3.30e-02 ...
I am trying to multiply a vector of variables vars <- c("t_pop", "n_wht", "n_free_blk", "n_slv", "n_blk", "n_free") in the us.cty1860@data data frame by a scalar overlap$scaled_frac_1860[i], then add the result to the same vector of variables in the us.cty2015@data data frame, and finally overwrite those variables in us.cty2015@data.
When I make the following call, I get an error that seems to say I am trying to perform invalid operations on factors, which is not the case (you can confirm this from the str output above).
> us.cty2015@data[overlap$cty2015[1], vars] <- us.cty2015@data[overlap$cty2015[1], vars] + (overlap$scaled_frac_1860[1] * us.cty1860@data[overlap$cty1860[1], vars])
Error in Summary.factor(1L, na.rm = FALSE) :
‘max’ not meaningful for factors
In addition: Warning message:
In Ops.factor(i, 0L) : ‘>=’ not meaningful for factors
However, when I don't attempt to overwrite the old value, the operation works fine.
> us.cty2015@data[overlap$cty2015[1], vars] + (overlap$scaled_frac_1860[1] * us.cty1860@data[overlap$cty1860[1], vars])
t_pop n_wht n_free_blk n_slv n_blk n_free
0 118.3889 113.6468 0.1317233 4.610316 4.742039 113.7785
I'm sure there are better ways of accomplishing what I am trying to do, but does anyone have any idea what is going on?
Edit:
I am using the following libraries: rgdal, rgeos, and maptools
All the data/objects come from NHGIS shapefiles: 1860 and 2015 United States counties.
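A sketch of what appears to be going on (my diagnosis from the error text, not confirmed in the post): the row index overlap$cty2015[1] is a factor, and the assignment method `[<-.data.frame` runs sanity checks such as max(i) and i >= 0 on it, which are not defined for factors; plain extraction with `[` tolerates the factor, which is why the right-hand side evaluates fine on its own. Converting the index to character first, so rows are matched by name in both the read and the write, avoids the error:
# Convert the factor indices to character so rows are matched by row name
# on both sides of the assignment (shown for the first row of overlap,
# as in the example in the question).
i2015 <- as.character(overlap$cty2015[1])
i1860 <- as.character(overlap$cty1860[1])

us.cty2015@data[i2015, vars] <- us.cty2015@data[i2015, vars] +
  overlap$scaled_frac_1860[1] * us.cty1860@data[i1860, vars]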