I'm working on a data.table which contains, among other data, the demand for certain products in certain stores of a business franchise. The goal is to predict the demand for every single product in every single store.
Here is a "head" of my dataset:
head(train_dataset)
Week  Store_ID  Product_ID  Sales  Returns  Demand
   3     15766        1212      3        0       3
   3     15766        1216      4        0       4
   3     15766        1238      4        0       4
   3     15766        1240      4        0       4
   3     15766        1242      3        0       3
   3     15766        1250      5        0       5
My initial approach was to subset the original dataset so that I end up with one dataset per product per store. For example, if there are 3 products (1, 2 and 3) and 2 stores (A and B), I want one dataset containing all the data for product 1 in store A, another containing all the data for product 1 in store B, and so on.
Since there are more than 2500 products, my first attempt was to automate, with a for loop or something from the apply family, code like this:
library(dplyr)
product.n <- filter(train_dataset, product_id == n)
where "n" is a product id which can be obtained from another, dedicated dataset. In this case, the product ids are integers. Assuming I loaded this dedicated dataset as "prods", I tried something like:
for (i in prods){
a = prods$product_id[i]
product.a <- paste("product", a)
product.a <- filter(train_dataset, product_id == a)
}
but it didn't work. Then I tried:
products <- split(train_dataset, f = train_dataset$product_id)
which worked. It returned a list of lists, each one comprising all the data for a certain product id. Then, to subset these lists by store id, I saw that I could not use code structured in the same way, because "train_dataset$store_id" cannot be passed to the "f" parameter of the split function at that point. To get around this I tried using lapply:
products.per.store <- lapply(products, '[[', "store_id")
which didn't work.
It occurred to me to convert all the sublists to data frames and then apply the same split process again, all automatically. It worked for a single sublist that I did manually, but I wasn't able to automate it, and I don't think it would be an efficient way to go about this. I also thought about combining "filter" and "group_by" from dplyr but, since I wasn't able to automate the first code example, I didn't try any further.
Here is a "head" from one dataset in the pattern that I'm aiming at (comprising, only, all the data from a certain product id in a certain store id):
head(prod41_store684023)
Week  Store_ID  Product_ID  Sales  Returns  Demand
   3    684023          41     30        0      30
   4    684023          41     95        0      95
   5    684023          41     82        0      82
   6    684023          41     30        0      30
   7    684023          41     60        0      60
   8    684023          41     70        0      70
I've seen quite a few other questions here on SO about operations on lists within lists and about filtering/splitting/subsetting datasets but, unfortunately, could not extrapolate anything to this question, so I apologize if this has already been answered before.
Any help will be greatly appreciated.
Thanks!
P.S. I'll add here a sample dput file with data from 2 product ids, id 41 and 151:
structure(list(Week = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L), Store_ID = c(684023L, 681747L, 685079L,
1623763L, 1035265L, 2482890L, 1546790L, 4586525L, 684023L, 1938075L,
681747L, 685079L, 1623763L, 2482890L, 1451516L, 4586525L, 2470338L,
684023L, 1938075L, 681747L, 1623763L, 2482890L, 2470338L, 146030L,
684023L, 1938075L, 465617L, 681747L, 1623763L, 2482890L, 1546790L,
4586525L, 2470338L, 1105804L, 2284385L, 146030L, 684023L, 681747L,
1623763L, 2482890L, 1546790L, 4586525L, 2470338L, 2284385L, 146030L,
684023L, 465617L, 681747L, 1623763L, 2482890L, 1546790L, 4586525L,
2470338L, 2284385L, 146030L, 684023L, 1938075L, 681747L, 1623763L,
2482890L, 1546790L, 64209L, 1451306L, 1451307L, 2290541L, 153680L,
817983L, 1163986L, 1873535L, 4286560L, 4498110L, 153547L, 153688L,
153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L,
1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 1524199L,
1133031L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L,
1915817L, 1915818L, 168631L, 168784L, 434240L, 984120L, 2176784L,
64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L,
1873535L, 4286560L, 4498110L, 153547L, 153688L, 153817L, 713342L,
1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L,
1951821L, 9716137L, 1963850L, 153840L, 1524199L, 1133031L, 168596L,
52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L,
168631L, 168784L, 434240L, 984120L, 2176784L, 2176785L, 64209L,
1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 4286560L,
4498110L, 153547L, 153688L, 153817L, 713342L, 1549943L, 161141L,
1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 9716137L, 1963850L,
153840L, 1524199L, 168596L, 52677L, 167312L, 168521L, 168527L,
168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L,
984120L, 2176784L, 2176785L, 64209L, 1451306L, 1451307L, 2290541L,
153680L, 817983L, 1163986L, 4286560L, 153688L, 153817L, 713342L,
1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L,
9716137L, 1963850L, 153840L, 168596L, 52677L, 167312L, 168521L,
168527L, 168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L,
434240L, 984120L, 2176784L, 64209L, 1451306L, 1451307L, 2290541L,
153680L, 817983L, 1163986L, 1873535L, 4286560L, 153688L, 153817L,
713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L,
1469082L, 1951821L, 9716137L, 1963850L, 153840L, 168596L, 52677L,
167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168540L,
168631L, 168784L, 434240L, 984120L, 2176784L, 64209L, 1451306L,
1451307L, 2290541L, 153680L, 817983L, 1163986L, 1873535L, 4286560L,
153547L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L,
1072646L, 1856859L, 1137252L, 1469082L, 1951821L, 9716137L, 1963850L,
153840L, 1524199L, 168596L, 52677L, 167312L, 168521L, 168527L,
168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L,
984120L, 2176784L, 2176785L, 64209L, 1451306L, 1451307L, 2290541L,
153680L, 817983L, 1163986L, 1873535L, 4286560L, 153547L, 153688L,
153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L,
1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 1524199L,
4722056L, 1133031L, 168596L, 52677L, 167312L, 168521L, 168527L,
168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L,
984120L, 2176784L, 2176785L), Product_ID = c(41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 151L), Sales = c(30L, 2064L, 0L, 1022L, 0L,
330L, 200L, 20L, 95L, 105L, 1430L, 0L, 740L, 430L, 5L, 7L, 45L,
82L, 20L, 1686L, 820L, 400L, 25L, 70L, 30L, 40L, 0L, 1250L, 986L,
500L, 80L, 1L, 25L, 138L, 200L, 60L, 60L, 1570L, 1030L, 300L,
50L, 10L, 20L, 100L, 40L, 70L, 30L, 1305L, 1159L, 295L, 60L,
20L, 10L, 110L, 65L, 45L, 70L, 1378L, 1269L, 410L, 40L, 12L,
14L, 7L, 15L, 10L, 15L, 23L, 9L, 18L, 3L, 10L, 13L, 21L, 12L,
17L, 72L, 20L, 9L, 16L, 25L, 12L, 1L, 10L, 25L, 11L, 9L, 12L,
10L, 14L, 20L, 10L, 18L, 11L, 10L, 10L, 3L, 16L, 3L, 5L, 6L,
14L, 8L, 5L, 13L, 5L, 13L, 7L, 6L, 11L, 1L, 3L, 19L, 15L, 13L,
13L, 38L, 27L, 11L, 14L, 13L, 6L, 3L, 14L, 10L, 8L, 3L, 14L,
11L, 12L, 18L, 14L, 24L, 12L, 5L, 10L, 3L, 22L, 24L, 10L, 4L,
8L, 19L, 23L, 4L, 10L, 7L, 17L, 27L, 9L, 4L, 4L, 12L, 17L, 16L,
18L, 32L, 9L, 1L, 16L, 29L, 5L, 22L, 10L, 11L, 6L, 5L, 8L, 28L,
11L, 22L, 10L, 10L, 25L, 18L, 8L, 20L, 18L, 25L, 8L, 16L, 16L,
8L, 5L, 6L, 7L, 17L, 19L, 22L, 18L, 20L, 21L, 20L, 55L, 14L,
4L, 16L, 7L, 3L, 16L, 17L, 15L, 15L, 16L, 24L, 16L, 20L, 17L,
14L, 15L, 6L, 6L, 14L, 19L, 31L, 10L, 15L, 15L, 6L, 7L, 2L, 11L,
18L, 4L, 9L, 13L, 7L, 2L, 8L, 9L, 17L, 2L, 20L, 6L, 10L, 6L,
8L, 20L, 3L, 6L, 16L, 18L, 20L, 28L, 5L, 11L, 10L, 5L, 3L, 17L,
11L, 10L, 2L, 16L, 9L, 8L, 7L, 21L, 43L, 44L, 13L, 20L, 21L,
21L, 26L, 29L, 60L, 38L, 12L, 5L, 16L, 9L, 10L, 3L, 10L, 9L,
8L, 7L, 18L, 15L, 15L, 20L, 40L, 16L, 20L, 15L, 21L, 6L, 10L,
26L, 14L, 8L, 9L, 25L, 14L, 15L, 20L, 6L, 10L, 15L, 14L, 19L,
3L, 22L, 21L, 14L, 8L, 122L, 43L, 8L, 9L, 39L, 18L, 2L, 16L,
23L, 18L, 18L, 1L, 29L, 17L, 30L, 42L, 18L, 55L, 12L, 20L, 15L,
16L, 11L, 12L, 21L, 20L, 13L, 16L), Returns = c(0L, 0L, 9L, 0L,
90L, 0L, 0L, 5L, 0L, 0L, 0L, 20L, 0L, 0L, 0L, 3L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 30L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 70L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Demand = c(30L,
2064L, 0L, 1022L, 0L, 330L, 200L, 15L, 95L, 105L, 1430L, 0L,
740L, 430L, 5L, 4L, 45L, 82L, 20L, 1686L, 820L, 400L, 25L, 70L,
30L, 40L, 0L, 1250L, 986L, 500L, 80L, 1L, 25L, 138L, 200L, 60L,
60L, 1570L, 1030L, 300L, 50L, 10L, 20L, 100L, 40L, 70L, 0L, 1305L,
1159L, 295L, 60L, 20L, 10L, 110L, 65L, 45L, 70L, 1378L, 1269L,
410L, 40L, 12L, 14L, 7L, 15L, 10L, 15L, 23L, 9L, 18L, 3L, 10L,
13L, 21L, 12L, 17L, 72L, 20L, 9L, 16L, 25L, 12L, 1L, 10L, 25L,
11L, 9L, 12L, 10L, 14L, 20L, 10L, 18L, 11L, 10L, 10L, 3L, 16L,
3L, 5L, 6L, 14L, 8L, 5L, 13L, 5L, 13L, 7L, 6L, 11L, 1L, 3L, 19L,
15L, 13L, 13L, 38L, 27L, 11L, 14L, 13L, 6L, 3L, 14L, 10L, 8L,
3L, 14L, 11L, 12L, 18L, 14L, 24L, 12L, 5L, 10L, 3L, 22L, 24L,
10L, 4L, 8L, 19L, 23L, 4L, 10L, 7L, 17L, 27L, 9L, 4L, 4L, 12L,
17L, 16L, 18L, 32L, 9L, 1L, 16L, 29L, 5L, 22L, 10L, 11L, 6L,
5L, 8L, 28L, 11L, 22L, 10L, 10L, 25L, 18L, 8L, 20L, 18L, 25L,
8L, 16L, 16L, 8L, 5L, 6L, 7L, 17L, 19L, 22L, 18L, 20L, 21L, 20L,
55L, 14L, 4L, 16L, 7L, 3L, 16L, 17L, 15L, 15L, 16L, 24L, 16L,
20L, 17L, 14L, 15L, 6L, 6L, 14L, 19L, 31L, 10L, 15L, 15L, 6L,
7L, 2L, 11L, 18L, 4L, 9L, 13L, 7L, 2L, 8L, 9L, 17L, 2L, 20L,
6L, 10L, 6L, 8L, 20L, 3L, 6L, 16L, 18L, 20L, 28L, 5L, 11L, 10L,
5L, 3L, 17L, 11L, 10L, 2L, 16L, 9L, 8L, 7L, 21L, 43L, 44L, 13L,
20L, 21L, 21L, 26L, 29L, 60L, 38L, 12L, 5L, 16L, 9L, 10L, 3L,
10L, 9L, 8L, 7L, 18L, 15L, 15L, 20L, 40L, 16L, 20L, 15L, 21L,
6L, 10L, 26L, 14L, 8L, 9L, 25L, 14L, 15L, 20L, 6L, 10L, 15L,
14L, 19L, 3L, 22L, 21L, 14L, 8L, 122L, 43L, 8L, 9L, 39L, 18L,
2L, 16L, 23L, 18L, 18L, 1L, 29L, 17L, 30L, 42L, 18L, 55L, 12L,
20L, 15L, 16L, 11L, 12L, 21L, 20L, 13L, 16L)), row.names = c(NA,
-335L), class = c("data.table", "data.frame"))
Following up on the approach using split, I managed to solve this.
As I said in the question, one of my attempts started with:
products <- split(train_dataset, f = train_dataset$product_id)
which created a list with one element per product, each one comprising all the data for that product.
To subset these sublists further, it occurred to me to use lapply with an anonymous function:
products_per_stores <- lapply(products, function(x){split(x, f = x$Store_ID)})
This created a list of lists which, in turn, contain lists themselves. The "first level" of sublists comprises one list per product id and the "second level" one list per combination of product id and store id, as was the goal.
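As a minimal sketch of the nested split described above, here is a small made-up data frame (the values in `toy` are illustrative only, and it uses the `Product_ID`/`Store_ID` column names from the sample data). It also shows a flat alternative that splits on both keys at once, which may be easier to loop over.

```r
# Toy data with the same column names as the sample in the question
toy <- data.frame(
  Week       = c(3L, 4L, 3L, 4L, 3L),
  Store_ID   = c(684023L, 684023L, 681747L, 681747L, 684023L),
  Product_ID = c(41L, 41L, 41L, 41L, 151L),
  Demand     = c(30L, 95L, 2064L, 1430L, 12L)
)

# Nested split: first by product, then each product by store
products <- split(toy, f = toy$Product_ID)
products_per_store <- lapply(products, function(x) split(x, f = x$Store_ID))

# One product/store combination is reached by name
prod41_store684023 <- products_per_store[["41"]][["684023"]]

# Flat alternative: a single split on both keys;
# drop = TRUE skips product/store combinations that never occur
flat <- split(toy, f = list(toy$Product_ID, toy$Store_ID), drop = TRUE)
names(flat)
```

The flat version names each element "product.store", so a single lapply over `flat` visits every product/store pair.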
Your first attempt with the for loop couldn't work, for two reasons:
First:
You try to iterate over 'prods', which is, as you say, a dataset. A for loop over a data frame steps through its columns, not through the product ids.
So if you want to iterate over each product id in your dataset (as your example suggests), you can use
for (a in prods$product_id){
Second:
You overwrote your subset in every iteration.
I assume you tried to name the subset with
product.a <- paste("product", a)
but it doesn't work that way.
To assign a name containing your 'a', you can use the assign() function like so:
assign(paste0("product.", a), filter(train_dataset, product_id == a))
If you only want a separate data frame for each product (that is what your loop attempts, as far as I can see), you can also subset by the id directly and iterate only over the unique ids:
for (i in unique(prods$product_id)) {
  assign(paste0("product.", i), train_dataset[which(train_dataset$product_id == i), ])
}
This is, of course, not a complete solution to your problem, but it might help you to revisit your initial approach.
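To make the assign() pattern concrete, here is a small self-contained sketch. The `train_dataset` below is a made-up stand-in with a lowercase `product_id` column, as used in the loop; it is not the asker's real data.

```r
# Hypothetical stand-in for the real training data
train_dataset <- data.frame(
  product_id = c(41L, 41L, 151L, 151L, 151L),
  demand     = c(30L, 95L, 12L, 14L, 7L)
)

# Create one object per product id: product.41, product.151, ...
for (i in unique(train_dataset$product_id)) {
  assign(paste0("product.", i),
         train_dataset[train_dataset$product_id == i, ])
}

# The generated objects can be used directly or looked up by name
nrow(product.151)
nrow(get("product.41"))
```

With thousands of products this fills the workspace with thousands of objects, which is one reason the split-based list in the other answer tends to be easier to work with.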
When I try to run log(x) on a variable in my dataset I get the error:
Error in oldat$gdp16 + 1 : non-numeric argument to binary operator
At first I thought that the reason is that these particular variables have NA's so I decided to deal with this problem like so:
oldat$gdp16[oldat$gdp16 == "#N/A"] <- "NA"
oldat$gdp16LOG <- log(oldat$gdp16 + 1, na.rm=TRUE)
This did not solve the problem.
Please find the excerpt of the data as an example below. The variable that fails the log transform is gdp16:
structure(list(gdp16 = c("19469", "159049", "554861", "10546",
"1208039", "390800", "37868", "11839", "32153", "47723", "467546",
"15649", "1793989", "53241", "32218", "1535768", "250036", "11190993",
"280091", "51339", "NA", "20154", "195305", "306900", "72343",
"98614", "332928", "5010", "23338", "73001", "4671", "238678",
"2465134", "14014", "14378", "3477796", "192691", "1056", "68664",
"320881", "125817", "20304", "2274230", "932256", "418977", "304819",
"317748", "1859384", "36375", "14057", "4949273", "137278", "70875",
"6715", "110912", "6813", "27572", "42773", "296536", "12232",
"1076912", "6796", "11183", "4374", "103606", "777228", "189286",
"404653", "7607", "1414804", "371075", "57821", "304889", "27424",
"471400", "205184", "105035", "152452", "187806", "1284728",
"644936", "38300", "309764", "89769", "44709", "295763", "1411042",
"1237255", "95584", "514460", "668745", "NA", "525608", "6952",
"411755", "4389", "22320", "42063", "863722", "24079", "93270",
"357045", "2650850", "18624475", "67068", "236", "205276", "16620"
), pop16 = c(34656L, 40606L, 43847L, 2925L, 24211L, 8737L, 9758L,
391L, 1425L, 9502L, 11331L, 2250L, 207653L, 7128L, 23439L, 36265L,
17910L, 1378665L, 48653L, 4174L, 11476L, 1170L, 10566L, 5728L,
10649L, 16385L, 95689L, 4955L, 1316L, 102403L, 899L, 5495L, 66860L,
1980L, 3719L, 82349L, 10776L, 107L, 16582L, 7337L, 9814L, 335L,
1324171L, 261115L, 80277L, 4755L, 8546L, 60627L, 23696L, 2881L,
126995L, 17794L, 48462L, 1816L, 4053L, 6080L, 1960L, 2868L, 31187L,
1263L, 127540L, 3552L, 3027L, 622L, 35277L, 17030L, 4693L, 185990L,
20673L, 51246L, 5235L, 4034L, 103320L, 6725L, 37970L, 10325L,
3407L, 2570L, 19702L, 144342L, 32276L, 7058L, 5607L, 5431L, 2065L,
56015L, 25369L, 46484L, 39579L, 9923L, 8373L, 18430L, 22465L,
8735L, 68864L, 7606L, 1365L, 11403L, 79512L, 41488L, 45005L,
9270L, 65596L, 323406L, 31848L, 31568L, 94569L, 16150L), gold16 = c(0L,
0L, 3L, 1L, 8L, 0L, 1L, 1L, 1L, 1L, 2L, 0L, 7L, 0L, 0L, 4L, 0L,
26L, 3L, 5L, 5L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
10L, 0L, 2L, 17L, 3L, 0L, 0L, 0L, 8L, 0L, 0L, 1L, 3L, 0L, 0L,
8L, 1L, 6L, 12L, 3L, 6L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 8L, 4L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 0L,
1L, 19L, 0L, 2L, 1L, 2L, 1L, 2L, 9L, 7L, 0L, 2L, 3L, 0L, 1L,
1L, 2L, 0L, 0L, 0L, 1L, 0L, 2L, 0L, 27L, 46L, 4L, 0L, 1L, 0L),
tot16 = c(0L, 2L, 4L, 4L, 29L, 1L, 18L, 2L, 2L, 9L, 6L, 0L,
19L, 3L, 0L, 22L, 0L, 70L, 8L, 10L, 11L, 0L, 10L, 15L, 1L,
0L, 3L, 0L, 1L, 8L, 1L, 1L, 42L, 0L, 7L, 42L, 6L, 1L, 0L,
0L, 15L, 0L, 2L, 3L, 8L, 2L, 2L, 28L, 2L, 11L, 41L, 18L,
13L, 1L, 0L, 0L, 0L, 4L, 5L, 0L, 5L, 0L, 2L, 0L, 1L, 19L,
18L, 1L, 1L, 7L, 4L, 0L, 1L, 0L, 11L, 1L, 1L, 1L, 4L, 55L,
0L, 8L, 1L, 4L, 4L, 10L, 21L, 17L, 0L, 11L, 7L, 0L, 3L, 1L,
6L, 0L, 1L, 3L, 8L, 0L, 11L, 1L, 67L, 121L, 13L, 3L, 2L,
0L), altitude = c(1790, 1, 10.5, 989, 605, 170, -28, 2, 6,
198, 76, 983, 1079, 580, 726, 74, 521, 44, 2625, 130, 4,
170, 244, 0, 0, 2850, 22, 2325, 37, 2355, 0, 25, 34, 0, 451,
34, 153, 25, 1529, 100, 102, 15, 210, 3, 1189, 8, 754, 14,
217, 53, 17, 338, 1795, 0, 5, 771, 8, 124, 60, 134, 2240,
80, 1350, 61, 53, -2, 10, 777, 207, 6, 12, 0, 7, 54, 93,
15, 3, 13, 70, 124, 624, 116, 0, 131, 281, 1271, 33, 667,
377, 15, 542, 691, 5, 789, 1, 63, 0, 0, 938, 1190, 168, 13,
14, 2, 459, 909, 25, 1483), athletes16 = c(3L, 64L, 215L,
31L, 420L, 72L, 56L, 29L, 33L, 120L, 104L, 12L, 462L, 50L,
24L, 310L, 42L, 392L, 143L, 85L, 117L, 15L, 104L, 119L, 26L,
37L, 121L, 12L, 46L, 37L, 53L, 54L, 393L, 6L, 40L, 418L,
92L, 7L, 21L, 37L, 154L, 8L, 112L, 28L, 63L, 76L, 47L, 309L,
12L, 56L, 336L, 101L, 79L, 8L, 0L, 19L, 32L, 67L, 32L, 2L,
123L, 23L, 43L, 35L, 48L, 237L, 195L, 71L, 6L, 31L, 62L,
10L, 13L, 11L, 234L, 90L, 40L, 37L, 95L, 285L, 10L, 103L,
25L, 52L, 63L, 135L, 135L, 307L, 6L, 151L, 103L, 7L, 57L,
7L, 54L, 5L, 28L, 61L, 100L, 21L, 204L, 12L, 360L, 555L,
70L, 86L, 22L, 30L)), class = "data.frame", row.names = c(NA,
-108L))
Your gdp16 variable is of class character. You cannot add a number to a character string. You need to convert the variable to numeric (and perhaps substitute the NAs):
oldat$gdp16 <- as.numeric(oldat$gdp16)
You need to replace the "NA" with an actual NA, and do as.numeric:
oldat$gdp16LOG = log(as.numeric(replace(oldat$gdp16,oldat$gdp16=="NA",NA)))
You can just do as.numeric(oldat$gdp16). It will issue a warning (not an error) if any other non-numeric string is present, because such strings are converted to NA.
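A minimal sketch of the conversion on a made-up vector (`gdp` is illustrative, not the asker's data). Note that log() has no na.rm argument; missing inputs simply produce missing outputs.

```r
# Toy vector mimicking gdp16: numbers stored as text, with an "NA" string
gdp <- c("19469", "159049", "NA", "10546")

# The "NA" string becomes a real missing value on conversion
gdp_num <- as.numeric(gdp)

# Arithmetic now works; NA inputs give NA outputs
gdp_log <- log(gdp_num + 1)

is.na(gdp_log)
```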
Consider the following data set:
SimulatedDated <- structure(list(CustumerId = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L,
23L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L,
25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 26L, 26L, 26L,
26L, 26L, 26L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 28L,
28L, 28L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 29L, 29L, 29L,
29L, 29L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 31L, 31L,
31L, 31L, 31L, 31L, 31L, 31L, 31L, 32L, 32L, 32L, 32L, 32L, 32L,
32L, 32L, 32L, 32L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L,
34L, 34L, 34L, 34L, 34L), ProductId = c(6L, 3L, 4L, 9L, 8L, 10L,
1L, 5L, 7L, 1L, 5L, 3L, 4L, 2L, 7L, 6L, 10L, 8L, 7L, 4L, 10L,
5L, 1L, 3L, 8L, 6L, 2L, 9L, 6L, 1L, 2L, 4L, 7L, 8L, 5L, 9L, 10L,
3L, 2L, 5L, 9L, 4L, 10L, 3L, 6L, 1L, 8L, 8L, 10L, 2L, 4L, 3L,
9L, 5L, 6L, 5L, 6L, 4L, 9L, 10L, 8L, 2L, 7L, 1L, 3L, 10L, 3L,
2L, 8L, 9L, 7L, 5L, 4L, 1L, 7L, 1L, 3L, 2L, 4L, 8L, 9L, 6L, 5L,
10L, 1L, 9L, 2L, 4L, 7L, 3L, 8L, 7L, 9L, 8L, 4L, 10L, 3L, 5L,
1L, 6L, 2L, 6L, 4L, 9L, 3L, 10L, 1L, 8L, 7L, 5L, 2L, 9L, 5L,
7L, 4L, 10L, 1L, 3L, 2L, 6L, 5L, 9L, 2L, 4L, 3L, 8L, 1L, 10L,
6L, 7L, 10L, 9L, 2L, 1L, 5L, 8L, 6L, 4L, 7L, 3L, 9L, 8L, 3L,
5L, 6L, 10L, 1L, 7L, 4L, 1L, 6L, 9L, 10L, 3L, 4L, 2L, 8L, 7L,
10L, 8L, 1L, 6L, 4L, 5L, 9L, 3L, 7L, 2L, 4L, 8L, 3L, 7L, 10L,
1L, 6L, 5L, 5L, 6L, 4L, 7L, 1L, 10L, 3L, 10L, 8L, 3L, 1L, 4L,
5L, 6L, 2L, 9L, 5L, 6L, 4L, 8L, 2L, 10L, 3L, 1L, 8L, 4L, 10L,
6L, 9L, 7L, 2L, 3L, 8L, 3L, 6L, 7L, 9L, 4L, 5L, 2L, 10L, 1L,
5L, 9L, 3L, 7L, 6L, 10L, 8L, 2L, 4L, 8L, 7L, 1L, 4L, 2L, 10L,
10L, 3L, 8L, 1L, 7L, 5L, 4L, 6L, 2L, 10L, 6L, 1L, 2L, 5L, 4L,
8L, 1L, 10L, 8L, 3L, 2L, 9L, 5L, 6L, 4L, 9L, 10L, 6L, 2L, 1L,
7L, 4L, 8L, 5L, 1L, 5L, 9L, 10L, 3L, 8L, 7L, 2L, 4L, 10L, 1L,
5L, 7L, 6L, 2L, 3L, 4L, 9L, 8L, 1L, 5L, 2L, 7L, 3L, 6L, 10L,
4L, 9L, 9L, 5L, 10L, 8L, 2L), DaysSinceEpoch = c(7L, 20L, 31L,
40L, 105L, 146L, 162L, 169L, 212L, 10L, 18L, 31L, 65L, 84L, 122L,
156L, 202L, 206L, 1L, 4L, 7L, 11L, 14L, 24L, 25L, 100L, 148L,
149L, 3L, 10L, 12L, 14L, 18L, 26L, 35L, 41L, 96L, 147L, 9L, 22L,
66L, 80L, 102L, 104L, 170L, 199L, 234L, 10L, 24L, 36L, 38L, 75L,
122L, 163L, 169L, 9L, 16L, 35L, 39L, 54L, 58L, 79L, 116L, 133L,
224L, 27L, 35L, 37L, 49L, 73L, 91L, 105L, 141L, 252L, 16L, 28L,
51L, 73L, 76L, 83L, 126L, 202L, 97L, 105L, 150L, 172L, 203L,
207L, 223L, 256L, 259L, 25L, 28L, 38L, 40L, 63L, 100L, 120L,
176L, 186L, 191L, 7L, 22L, 36L, 37L, 40L, 41L, 53L, 67L, 114L,
233L, 1L, 16L, 17L, 23L, 40L, 52L, 125L, 184L, 186L, 12L, 42L,
53L, 65L, 67L, 69L, 83L, 149L, 154L, 265L, 10L, 14L, 33L, 47L,
67L, 106L, 133L, 181L, 247L, 258L, 6L, 21L, 26L, 41L, 49L, 68L,
89L, 112L, 119L, 9L, 34L, 88L, 91L, 102L, 110L, 132L, 171L, 200L,
6L, 14L, 21L, 36L, 40L, 60L, 64L, 88L, 109L, 208L, 8L, 17L, 21L,
55L, 77L, 85L, 97L, 168L, 18L, 28L, 42L, 44L, 70L, 77L, 101L,
14L, 23L, 33L, 84L, 107L, 123L, 124L, 125L, 25L, 29L, 33L, 57L,
79L, 83L, 98L, 112L, 119L, 5L, 31L, 64L, 91L, 102L, 131L, 222L,
234L, 27L, 46L, 48L, 60L, 61L, 64L, 72L, 103L, 161L, 8L, 24L,
27L, 50L, 60L, 62L, 92L, 99L, 147L, 159L, 16L, 19L, 20L, 84L,
175L, 202L, 17L, 21L, 25L, 46L, 69L, 121L, 161L, 175L, 267L,
10L, 14L, 20L, 39L, 58L, 90L, 229L, 32L, 35L, 39L, 40L, 60L,
66L, 98L, 153L, 173L, 2L, 3L, 25L, 46L, 51L, 80L, 96L, 166L,
202L, 43L, 70L, 76L, 77L, 115L, 160L, 183L, 202L, 223L, 25L,
33L, 61L, 72L, 74L, 77L, 85L, 91L, 152L, 265L, 16L, 62L, 63L,
64L, 66L, 82L, 104L, 126L, 181L, 47L, 49L, 55L, 58L, 67L), BoughtPAD = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("CustumerId",
"ProductId", "DaysSinceEpoch", "BoughtPAD"), row.names = c(NA,
300L), class = "data.frame")
Then, doing
library(TraMineR)
SimSeq <- seqecreate(id = SimulatedDated$CustumerId,
timestamp = SimulatedDated$DaysSinceEpoch,
event = SimulatedDated$ProductId)
Cohort <- factor(SimulatedDated$BoughtPAD, labels = c("PAD", "NPAD"))
Fsubseq <- seqefsub(seq = SimSeq, pMinSupport = .01)
DiscrCohort <- seqecmpgroup(subseq = Fsubseq, group = Cohort)
produces:
Error in model.frame.default(formula = ww ~ group + seqmatrix[, index]) :
variable lengths differ (found for 'group')
and I was wondering, what could be causing this problem?
The group variable should have length equal to the number of sequences, i.e., the number of customers in your case. It is also supposed to remain constant along each sequence (which is not the case in your example).
The Cohort variable that you pass as the group argument has length equal to the total number of events (300), while you have only 34 customers. So you need to aggregate it by CustumerId.
Here is how you can do that (here by taking the max of the group value for each customer):
bylist <- list(id = SimulatedDated$CustumerId)
agg.PAD <- aggregate(SimulatedDated[,c("CustumerId","BoughtPAD")], by=bylist, FUN="max")
Cohort <- agg.PAD$BoughtPAD
Now you can look for the subsequences that best discriminate the groups
DiscrCohort <- seqecmpgroup(subseq = Fsubseq, group = Cohort)
print(DiscrCohort[1:10])
Hope this helps.
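The aggregation step can be checked without TraMineR. The sketch below uses a made-up `events` table and hypothetical factor labels ("no"/"yes"); it only illustrates collapsing an event-level flag to one value per customer.

```r
# Toy event data: several rows per customer, flag set on some events
events <- data.frame(
  CustumerId = c(1L, 1L, 1L, 2L, 2L, 3L),
  BoughtPAD  = c(0L, 0L, 0L, 1L, 0L, 0L)
)

# One row per customer: 1 if the customer ever bought, 0 otherwise
agg <- aggregate(BoughtPAD ~ CustumerId, data = events, FUN = max)

# Group factor with one entry per customer (labels are illustrative)
cohort <- factor(agg$BoughtPAD, levels = c(0, 1), labels = c("no", "yes"))

# The group vector now matches the number of sequences
length(cohort) == length(unique(events$CustumerId))
```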
What is the code to merge the two legends into one, a legend of colored circles?
I think a single legend, with circles whose size and color both reflect the total number of crimes, is the best way to present this.
Desired output:
There should be one legend: the circles, instead of being black, should be colored from yellow (0) to red (800).
My code:
library(maps)
library(ggmap)
# Get map from Google Maps
lima <- get_map(location = "lima", zoom = 11, maptype = c("terrain"))
# Plot
ggmap(lima) + geom_point(data = limanov2, aes(x = LONGITUD , y = LATITUD, color = TOTALES,
size = TOTALES)) +
scale_size_continuous(name = "Cantidad\ndelitos",range = c(2,12)) +
scale_color_gradient(name = "Cantidad\ndelitos", low = "yellow", high = "red") +
theme(legend.text= element_text(size=14)) +
ggtitle("TOTAL DELITOS - LIMA NOV 2012") +
theme(plot.title = element_text(size = 12, vjust=2, family="Verdana", face="italic"),
legend.position = 'left')
My data:
structure(list(DISTRITO = c("SAN JUAN DE LURIGANCHO", "CALLAO",
"LOS OLIVOS", "ATE", "LIMA", "SAN MARTIN DE PORRES", "SANTIAGO DE SURCO",
"CHORILLOS", "COMAS", "INDEPENDENCIA", "EL AGUSTINO", "LA VICTORIA",
"SAN JUAN DE MIRAFLORES", "VILLA EL SALVADOR", "SAN MIGUEL",
"CARABAYLLO", "MIRAFLORES", "SAN BORJA", "VENTANILLA", "SURQUILLO",
"BREÑA", "ANCON", "PTE. PIEDRA", "RIMAC", "BARRANCO", "LA MOLINA",
"SAN LUIS", "SANTA ANITA", "LURIGANCHO", "P. LIBRE", "MAGDALENA DEL MAR",
"LA PERLA", "CHACLACAYO", "PUENTE PIEDRA", "SAN ISIDRO", "JESUS MARIA",
"BELLAVISTA", "LINCE", "CARMEN DE LA LEGUA REYNOSO", "CIENEGUILLA",
"SANTA ROSA", "LURIN", "PUNTA NEGRA", "PUCUSANA", "LA PUNTA",
"PUNTA HERMOSA", "PACHACAMAC", "SAN BARTOLO", "SANTA MARIA"),
TOTALES = c(861L, 696L, 696L, 642L, 516L, 479L, 442L, 378L,
371L, 368L, 361L, 333L, 325L, 291L, 282L, 251L, 239L, 196L,
193L, 188L, 185L, 174L, 165L, 161L, 138L, 134L, 128L, 119L,
115L, 105L, 67L, 65L, 63L, 58L, 58L, 56L, 45L, 38L, 23L,
23L, 11L, 8L, 6L, 5L, 3L, 3L, 2L, 0L, 0L), HOMICIDIOS = c(1L,
7L, 0L, 1L, 2L, 0L, 0L, 1L, 7L, 4L, 4L, 4L, 0L, 0L, 0L, 2L,
0L, 0L, 7L, 0L, 0L, 0L, 0L, 4L, 0L, 0L, 2L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), LESIONES = c(100L, 72L, 61L, 43L, 44L, 8L, 10L,
15L, 44L, 40L, 50L, 15L, 52L, 28L, 7L, 33L, 15L, 3L, 21L,
7L, 36L, 33L, 15L, 19L, 14L, 1L, 8L, 6L, 16L, 4L, 4L, 9L,
1L, 12L, 2L, 9L, 5L, 2L, 5L, 7L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), VIO..DE.LA.LIBERTAD.PERSONAL = c(0L, 7L, 6L,
5L, 6L, 1L, 1L, 0L, 3L, 1L, 2L, 0L, 2L, 0L, 1L, 0L, 1L, 0L,
1L, 1L, 0L, 3L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 1L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), VIO..DE.LA.LIBERTAD.SEXUAL = c(56L, 14L, 12L, 15L, 7L,
10L, 2L, 9L, 11L, 13L, 8L, 9L, 7L, 14L, 4L, 15L, 4L, 2L,
17L, 7L, 3L, 4L, 6L, 12L, 2L, 1L, 5L, 3L, 11L, 4L, 1L, 2L,
0L, 6L, 2L, 0L, 3L, 0L, 2L, 2L, 0L, 4L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), HURTO.SIMPLE.Y.AGRAVADO = c(217L, 203L, 296L, 230L,
260L, 167L, 226L, 217L, 130L, 117L, 154L, 133L, 121L, 46L,
163L, 72L, 161L, 119L, 69L, 120L, 64L, 19L, 64L, 21L, 57L,
44L, 39L, 2L, 48L, 60L, 30L, 19L, 48L, 20L, 41L, 25L, 19L,
27L, 7L, 11L, 9L, 0L, 6L, 0L, 2L, 3L, 1L, 0L, 0L), ROBO.SIMPLE.Y.AGRAVADO = c(460L,
289L, 308L, 344L, 186L, 277L, 198L, 130L, 165L, 184L, 137L,
149L, 134L, 188L, 104L, 126L, 58L, 72L, 64L, 51L, 77L, 115L,
79L, 76L, 64L, 88L, 73L, 108L, 40L, 36L, 30L, 32L, 14L, 17L,
12L, 22L, 12L, 8L, 6L, 3L, 1L, 3L, 0L, 2L, 1L, 0L, 1L, 0L,
0L), MICRO.COM.DE.DROGAS = c(26L, 100L, 13L, 3L, 10L, 15L,
5L, 5L, 11L, 8L, 3L, 23L, 9L, 15L, 3L, 3L, 0L, 0L, 8L, 2L,
5L, 0L, 0L, 28L, 0L, 0L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 2L,
0L, 0L, 6L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L
), TENENCIA.ILEGAL.DE.ARMAS = c(1L, 4L, 0L, 1L, 1L, 1L, 0L,
1L, 0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 6L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), LONGITUD = c(-77,
-77.12, -77.08, -76.89, -77.04, -77.09, -76.99, -77.01, -77.05,
-77.05, -77, -77.02, -76.97, -76.94, -77.09, -76.99, -77.03,
-77, -77.13, -77.01, -77.05, -77.11, -77.08, -76.7, -77.02,
-76.92, -77, -76.96, -76.86, -77.06, -77.07, -77.12, -76.76,
-77.08, -77.03, -77.05, -77.11, -77.04, -77.09, -76.78, -77.16,
-76.81, -76.73, -76.77, -77.16, -76.76, -76.83, -76.73, -76.77
), LATITUD = c(-11.99, -12.04, -11.95, -12.04, -12.06, -12,
-12.16, -12.2, -11.93, -11.99, -12.04, -12.08, -12.16, -12.23,
-12.08, -11.79, -12.12, -12.1, -11.89, -12.11, -12.06, -11.69,
-11.88, -11.94, -12.15, -12.09, -12.08, -12.04, -11.98, -12.08,
-12.09, -12.07, -11.99, -11.88, -12.1, -12.08, -12.06, -12.09,
-12.04, -12.07, -11.81, -12.24, -12.32, -12.47, -12.07, -12.28,
-12.18, -12.38, -12.42)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -49L), .Names = c("DISTRITO", "TOTALES",
"HOMICIDIOS", "LESIONES", "VIO..DE.LA.LIBERTAD.PERSONAL", "VIO..DE.LA.LIBERTAD.SEXUAL",
"HURTO.SIMPLE.Y.AGRAVADO", "ROBO.SIMPLE.Y.AGRAVADO", "MICRO.COM.DE.DROGAS",
"TENENCIA.ILEGAL.DE.ARMAS", "LONGITUD", "LATITUD"))
I found a solution by reading the documentation for ggplot2 0.9.
The new function guide_legend() should be used inside guides().
It gives you more control over the legend, including its labels.
This is the final code that produces the desired output (see the last line):
ggmap(lima) +
  geom_point(data = limanov2,
             aes(x = LONGITUD, y = LATITUD, color = TOTALES, size = TOTALES)) +
  scale_size_continuous(name = "Cantidad\ndelitos", range = c(2, 12)) +
  scale_color_gradient(name = "Cantidad\ndelitos", low = "yellow", high = "red") +
  theme(legend.text = element_text(size = 14)) +
  ggtitle("TOTAL DELITOS - LIMA NOV 2012") +
  theme(plot.title = element_text(size = 12, vjust = 2, family = "Verdana", face = "italic"),
        legend.position = 'left') +
  # Draw the colour scale as a legend rather than a colourbar
  guides(colour = guide_legend())
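A minimal sketch of what the last line does, assuming ggplot2 is installed and using the built-in mtcars data in place of limanov2 (a stand-in chosen only for illustration, so no map is needed). A continuous colour scale is drawn as a colourbar by default. Switching it to guide_legend(), while the colour and size scales map the same variable and share the same name, should let ggplot2 combine both into a single legend.

```r
library(ggplot2)

# mtcars stands in for limanov2; hp plays the role of TOTALES
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp, size = hp)) +
  geom_point() +
  # Same name on both scales so the two legends can be merged
  scale_size_continuous(name = "hp", range = c(2, 12)) +
  scale_colour_gradient(name = "hp", low = "yellow", high = "red") +
  # Use a legend guide for colour instead of the default colourbar
  guides(colour = guide_legend())

print(p)
```

Removing the guides() line should bring back two separate guides, a colourbar for colour and a legend for size.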
I'm plotting some points over a map with the ggmap package.
The problem is that I get the message: "Removed 12 rows containing missing values (geom_point)".
But I don't have any NAs. I've looked at the data and used:
sum(is.na(limanov2)) # Gives 0
to prove it.
This is my code:
library(maps)
library(ggmap)
lima <- get_map(location = "lima", zoom = 11)
ggmap(lima) +
  geom_point(data = limanov2,
             aes(x = LONGITUD, y = LATITUD, color = TOTALES, size = TOTALES)) +
  scale_color_gradient(low = "yellow", high = "red")
My data:
structure(list(DISTRITO = c("SAN JUAN DE LURIGANCHO", "CALLAO",
"LOS OLIVOS", "ATE VITARTE", "LIMA CERCADO", "SAN MARTÍN", "SANTIAGO DE SURCO",
"CHORILLOS", "COMAS", "INDEPENDENCIA", "EL AGUSTINO", "LA VICTORIA",
"SAN JUAN DE MIRAFLORES", "VILLA EL SALVADOR", "S. MIGUEL", "CARABAYLLO",
"MIRAFLORES", "PTE. PIEDRA", "SAN BORJA", "VENTANILLA", "SURQUILLO",
"BREÑA", "ANCÓN", "EL RIMAC", "BARRANCO", "LA MOLINA", "SAN LUIS",
"STA. ANITA", "LURIGANCHO", "P. LIBRE", "MAGDALENA", "LA PERLA",
"CHACLACAYO", "SAN ISIDRO", "J. MARÍA", "BELLAVISTA", "LINCE",
"C. DE LA LEGUA", "CIENEGUILLA", "STA.ROSA", "LURÍN", "PTA.NEGRA",
"PUCUSANA", "LA PUNTA", "PTA. HERMOSA", "PACHACAMAC", "SAN BARTOLO",
"SANTA MARÍA"), TOTALES = c(861L, 696L, 696L, 642L, 516L, 479L,
442L, 378L, 371L, 368L, 361L, 333L, 325L, 291L, 282L, 251L, 239L,
223L, 196L, 193L, 188L, 185L, 174L, 161L, 138L, 134L, 128L, 119L,
115L, 105L, 67L, 65L, 63L, 58L, 56L, 45L, 38L, 23L, 23L, 11L,
8L, 6L, 5L, 3L, 3L, 2L, 0L, 0L), HOMICIDIOS = c(1L, 7L, 0L, 1L,
2L, 0L, 0L, 1L, 7L, 4L, 4L, 4L, 0L, 0L, 0L, 2L, 0L, 1L, 0L, 7L,
0L, 0L, 0L, 4L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), LESIONES = c(100L,
72L, 61L, 43L, 44L, 8L, 10L, 15L, 44L, 40L, 50L, 15L, 52L, 28L,
7L, 33L, 15L, 27L, 3L, 21L, 7L, 36L, 33L, 19L, 14L, 1L, 8L, 6L,
16L, 4L, 4L, 9L, 1L, 2L, 9L, 5L, 2L, 5L, 7L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), VIO..DE.LA.LIBERTAD.PERSONAL = c(0L, 7L,
6L, 5L, 6L, 1L, 1L, 0L, 3L, 1L, 2L, 0L, 2L, 0L, 1L, 0L, 1L, 1L,
0L, 1L, 1L, 0L, 3L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), VIO..DE.LA.LIBERTAD.SEXUAL = c(56L,
14L, 12L, 15L, 7L, 10L, 2L, 9L, 11L, 13L, 8L, 9L, 7L, 14L, 4L,
15L, 4L, 12L, 2L, 17L, 7L, 3L, 4L, 12L, 2L, 1L, 5L, 3L, 11L,
4L, 1L, 2L, 0L, 2L, 0L, 3L, 0L, 2L, 2L, 0L, 4L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), HURTO.SIMPLE.Y.AGRAVADO = c(217L, 203L, 296L, 230L,
260L, 167L, 226L, 217L, 130L, 117L, 154L, 133L, 121L, 46L, 163L,
72L, 161L, 84L, 119L, 69L, 120L, 64L, 19L, 21L, 57L, 44L, 39L,
2L, 48L, 60L, 30L, 19L, 48L, 41L, 25L, 19L, 27L, 7L, 11L, 9L,
0L, 6L, 0L, 2L, 3L, 1L, 0L, 0L), ROBO.SIMPLE.Y.AGRAVADO = c(460L,
289L, 308L, 344L, 186L, 277L, 198L, 130L, 165L, 184L, 137L, 149L,
134L, 188L, 104L, 126L, 58L, 96L, 72L, 64L, 51L, 77L, 115L, 76L,
64L, 88L, 73L, 108L, 40L, 36L, 30L, 32L, 14L, 12L, 22L, 12L,
8L, 6L, 3L, 1L, 3L, 0L, 2L, 1L, 0L, 1L, 0L, 0L), MICRO.COM.DE.DROGAS = c(26L,
100L, 13L, 3L, 10L, 15L, 5L, 5L, 11L, 8L, 3L, 23L, 9L, 15L, 3L,
3L, 0L, 2L, 0L, 8L, 2L, 5L, 0L, 28L, 0L, 0L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 6L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L,
0L, 0L), TENENCIA.ILEGAL.DE.ARMAS = c(1L, 4L, 0L, 1L, 1L, 1L,
0L, 1L, 0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 6L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), LONGITUD = c(-77, -77.12,
-77.08, -76.89, -77.04, -77.09, -76.99, -77.01, -77.05, -77.05,
-77, -77.02, -76.97, -76.94, -77.09, -76.99, -77.03, -77.08,
-77, -77.13, -77.01, -77.05, -77.11, -76.7, -77.02, -76.92, -77,
-76.96, -76.86, -77.06, -77.07, -77.12, -76.76, -77.03, -77.05,
-77.11, -77.04, -77.09, -76.78, -77.16, -76.81, -76.73, -76.77,
-77.16, -76.76, -76.83, -76.73, -76.77), LATITUD = c(-11.99,
-12.04, -11.97, -12.04, -12.06, -12, -12.16, -12.2, -11.93, -11.99,
-12.04, -12.08, -12.16, -12.23, -12.08, -11.79, -12.12, -11.88,
-12.1, -11.89, -12.11, -12.06, -11.69, -11.94, -12.15, -12.09,
-12.08, -12.04, -11.98, -12.08, -12.09, -12.07, -11.99, -12.1,
-12.08, -12.06, -12.09, -12.04, -12.07, -11.81, -12.24, -12.32,
-12.47, -12.07, -12.28, -12.18, -12.38, -12.42)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -48L), .Names = c("DISTRITO",
"TOTALES", "HOMICIDIOS", "LESIONES", "VIO..DE.LA.LIBERTAD.PERSONAL",
"VIO..DE.LA.LIBERTAD.SEXUAL", "HURTO.SIMPLE.Y.AGRAVADO", "ROBO.SIMPLE.Y.AGRAVADO",
"MICRO.COM.DE.DROGAS", "TENENCIA.ILEGAL.DE.ARMAS", "LONGITUD",
"LATITUD"))
Some of your points lie outside the area covered by the base map at that zoom level, so they are dropped and reported as missing values. Try changing your zoom parameter:
library(maps)
library(ggmap)
# A smaller zoom value covers a wider area
lima <- get_map(location = "lima", zoom = 10)
ggmap(lima) +
  geom_point(data = limanov2,
             aes(x = LONGITUD, y = LATITUD,
                 color = TOTALES, size = TOTALES)) +
  scale_color_gradient(low = "yellow", high = "red")
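To see which rows would be dropped, the coordinates can be compared against the map's bounding box. The sketch below uses base R only. The bounding box values are hypothetical, chosen to resemble a zoom 11 map of Lima. With a real ggmap object the box can be read from attr(lima, "bb"), which uses the same column names. The five points are rows 1, 2, 23, 24 and 48 of the question's data.

```r
# Hypothetical bounding box (assumed values, not taken from a real map);
# with ggmap this would be: bb <- attr(lima, "bb")
bb <- data.frame(ll.lat = -12.26, ll.lon = -77.26,
                 ur.lat = -11.83, ur.lon = -76.82)

# A few coordinates copied from the question's data
pts <- data.frame(
  LONGITUD = c(-77.00, -77.12, -77.11, -76.70, -76.77),
  LATITUD  = c(-11.99, -12.04, -11.69, -11.94, -12.42)
)

# TRUE for points that fall inside the box on both axes
inside <- pts$LONGITUD >= bb$ll.lon & pts$LONGITUD <= bb$ur.lon &
  pts$LATITUD >= bb$ll.lat & pts$LATITUD <= bb$ur.lat

# Rows that the map would not show
outside <- pts[!inside, ]
n_outside <- sum(!inside)
print(outside)
```

Running the same check on the full limanov2 with the real bounding box should give a count that matches the number in the warning message.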