removing columns with similar variance - r

I have a dataframe of 3500 X 4000. I am trying to write a professional command in R to remove any columns in a matrix that show the same variance. I am able to do this a with a long, complicated command such as
datavar <- apply(data, 2, var)
datavar <- datavar[!duplicated(datavar)]
then assemble my data by matching the remaining column names, but this is SAD! I was hoping to do this in a single go. I was thinking of something like
data <- data[, which(apply(data, 2, function(col) !any(var(data) = any(var(data)) )))]
I know the last part of the above command is nonsense, but I also know there is someway it can be done in some... smart command!
Here's some data that applies to the question
data <- structure(list(V1 = c(3, 213, 1, 135, 5, 2323, 1231, 351, 1,
33, 2, 213, 153, 132, 1321, 53, 1, 231, 351, 3135, 13), V2 = c(1,
1, 1, 2, 3, 5, 13, 33, 53, 132, 135, 153, 213, 213, 231, 351,
351, 1231, 1321, 2323, 3135), V3 = c(65, 41, 1, 53132, 1, 6451,
3241, 561, 321, 534, 31, 135, 1, 1351, 31, 351, 31, 31, 3212,
3132, 1), V4 = c(2, 2, 5, 4654, 5641, 21, 21, 1, 1, 465, 31,
4, 651, 35153, 13, 132, 123, 1231, 321, 321, 5), V5 = c(23, 13,
213, 135, 15341, 564, 564, 8, 464, 8, 484, 6546, 132, 165, 123,
135, 132, 132, 123, 123, 2), V6 = c(2, 1, 84, 86468, 464, 18,
45, 55, 2, 5, 12, 4512, 5, 123, 132465, 12, 456, 15, 45, 123213,
12), V7 = c(1, 2, 2, 5, 5, 12, 12, 12, 15, 18, 45, 45, 55, 84,
123, 456, 464, 4512, 86468, 123213, 132465)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7"), row.names = c(NA, 21L), class = "data.frame")
Would I be able to keep one of the "similar variance" columns too?
Thanks,

I might go a more cautious route, like
data[, !duplicated(round(sapply(data,var),your_precision_here))]

This is pretty similar to what you've come up with:
vars <- lapply(data,var)
data[,which(sapply(1:length(vars), function(x) !vars[x] %in% vars[-x]))]
One thing to think about though is whether you want to match variances exactly (as in this example) or just variances that are close. The latter would be a significantly more challenging problem.

... or as alternative:
data[ , !c(duplicated(apply(data, 2, var)) | duplicated(apply(data, 2, var), fromLast=TRUE))]
...but also not shorter :)

Related

How to generate a map for property cluster

Could you help me make a graph in R similar to the one I inserted in the image below, which shows the properties on a map, differentiating by cluster. See in my database that I have 4 properties, properties 1 and 3 are of cluster 1 and properties 2 and 4 are of cluster 2. In addition, the database has the coordinates of the properties, so I believe that with this information I can generate a graph similar to what I inserted. Surely, there must be some package in R that does something similar. Any help is welcome!
This link can help: https://rstudio-pubs-static.s3.amazonaws.com/176768_ec7fb4801e3a4772886d61e65885fbdd.html
#database
df<-structure(list(Properties = c(1,2,3,4),
Latitude = c(-24.930473, -24.95575,-24.924161,-24.95579),
Longitude = c(-49.994889, -49.990162,-50.004343, -50.007371),
cluster = c(1,2,1,2)), class = "data.frame", row.names = c(NA, -4L))
Properties Latitude Longitude cluster
1 1 -24.93047 -49.99489 1
2 2 -24.95575 -49.99016 2
3 3 -24.92416 -50.00434 1
4 4 -24.95579 -50.00737 2
Example of figure:
Your code
#database
df<-structure(list(Propertie = c(1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
30, 31, 32, 33, 34, 35, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124,
125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137,
138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150,
151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163,
164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176,
177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189,
190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202,
203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215,
216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228,
229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241,
242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267,
268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280,
281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293,
294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306,
307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319,
320, 321, 322, 323, 324, 325), Latitude = c(-24.927417, -24.927417,
-24.927417, -24.927417, -24.930195, -24.930473, -24.946306, -24.949361,
-24.949361, -24.950195, -24.950195, -24.951584, -24.95575, -24.954084,
-24.96075, -24.957139, -24.95825, -24.96825, -24.961334, -24.968806,
-24.976861, -24.982139, -24.986584, -24.985487, -24.994362, -24.994362,
-24.999084, -24.771583, -24.77186, -24.772138, -24.772138, -24.78686,
-24.78436, -24.872139, -24.822222, -24.83549, -24.874916, -24.874916,
-24.874639, -24.865472, -24.873838, -24.87325, -24.858611, -24.874361,
-24.874361, -24.86, -24.860472, -24.874916, -24.814638, -24.814666,
-24.818527, -24.818527, -24.822694, -24.822694, -24.845472, -24.844638,
-24.878528, -24.879639, -24.879639, -24.906028, -24.897972, -24.900278,
-24.900278, -24.90075, -24.902972, -24.899361, -24.898611, -24.899083,
-24.913889, -24.908333, -24.914361, -24.914361, -24.924361, -24.915472,
-24.91075, -24.913805, -24.913528, -24.912139, -24.919917, -24.914083,
-24.914361, -24.914361, -24.925194, -24.92575, -24.928528, -24.929361,
-24.934361, -24.935278, -24.922694, -24.927139, -24.927972, -24.931861,
-24.936861, -24.878537, -24.887972, -24.882972, -24.901583, -24.901667,
-24.902139, -24.902139, -24.90325, -24.902972, -24.90299, -24.90575,
-24.905791, -24.899639, -24.899083, -24.875472, -24.878805, -24.883805,
-24.884916, -24.8905, -24.884083, -24.884087, -24.905194, -24.904125,
-24.894722, -24.895222, -24.895194, -24.911028, -24.907972, -24.908805,
-24.919916, -24.919361, -24.919639, -24.919639, -24.920194, -24.920472,
-24.917972, -24.908805, -24.911305, -24.91325, -24.917416, -24.928528,
-24.929083, -24.92325, -24.923805, -24.93188, -24.932139, -24.936028,
-24.935472, -24.937139, -24.923805, -24.922139, -24.922139, -24.926861,
-24.908805, -24.908333, -24.908805, -24.913805, -24.913805, -24.929638,
-24.939917, -24.943806, -24.942695, -24.94325, -24.944639, -24.946028,
-24.94825, -24.954084, -24.956111, -24.958611, -24.958806, -24.959084,
-24.958528, -24.958528, -24.956584, -24.955833, -24.95825, -24.960833,
-24.967417, -24.962695, -24.958611, -24.959083, -24.96075, -24.96075,
-24.964361, -24.961306, -24.961028, -24.962417, -24.965833, -24.964639,
-24.963806, -24.964917, -24.965472, -24.966861, -24.968528, -24.942972,
-24.948611, -24.950556, -24.951028, -24.951028, -24.93825, -24.941889,
-24.943528, -24.944639, -24.945194, -24.945472, -24.949083, -24.946861,
-24.94825, -24.949361, -24.951306, -24.948805, -24.948, -24.95075,
-24.952694, -24.959722, -24.961583, -24.96325, -24.96325, -24.96325,
-24.964639, -24.96575, -24.959361, -24.954639, -24.960472, -24.960472,
-24.966583, -24.970195, -24.972417, -24.976306, -24.974084, -24.974167,
-24.974639, -24.979362, -24.979639, -24.980278, -24.982973, -24.982973,
-24.977417, -24.979639, -24.981028, -24.981028, -24.98325, -24.969361,
-24.988056, -24.987139, -24.987139, -24.986584, -24.984639, -24.984639,
-24.984917, -24.984917, -24.994917, -24.987139, -24.989917, -24.992139,
-24.991861, -24.991861, -24.989639, -24.989917, -24.989917, -24.991861,
-24.989639, -24.992417, -24.975195, -24.97325, -24.979361, -24.972694,
-24.972972, -24.942417, -24.941861, -24.93825, -24.938273, -24.949639,
-24.948333, -24.948805, -24.949639, -24.949639, -24.951615, -24.951583,
-24.951615, -24.953611, -24.954639, -24.954639, -24.954639, -24.956861,
-24.956861, -24.966028, -24.956861, -24.955556, -24.957176, -24.96075,
-24.960194, -24.960231, -24.980194, -24.969106, -24.986306, -24.986306,
-24.993806, -24.877972, -24.878889, -24.87686, -24.886305, -24.875749,
-24.876305, -24.876319, -24.878805, -24.891027, -24.898527, -24.898527,
-24.904083, -24.904083, -24.905, -24.901328, -24.902138, -24.898268,
-24.900782, -24.901305, -24.88493, -24.887138, -24.929638, -25.001862,
-25.004084, -25.011028, -25.000194, -25.000472), Longitude = c(-49.98793,
-49.98793, -49.98793, -49.988778, -49.98962, -49.994889, -49.999912,
-49.991273, -49.991273, -49.996551, -49.996551, -49.995704, -49.990162,
-49.992945, -49.990718, -49.999056, -49.998222, -49.981259, -49.997389,
-49.979357, -49.999908, -49.995713, -49.980449, -49.995736, -49.980444,
-49.980444, -49.986852, -50.200149, -50.200172, -50.199602, -50.199603,
-50.199339, -50.209899, -50.038787, -50.243338, -50.235446, -50.139343,
-50.139348, -50.154871, -50.164607, -50.179621, -50.179895, -50.226412,
-50.196297, -50.196297, -50.233639, -50.234066, -50.242649, -50.251816,
-50.252098, -50.258233, -50.258233, -50.288502, -50.288525, -50.251001,
-50.261575, -50.039037, -50.044333, -50.044333, -50.015148, -50.115163,
-50.094472, -50.094472, -50.094899, -50.108204, -50.111829, -50.113653,
-50.114079, -50.010278, -50.017523, -50.010704, -50.010704, -50.004343,
-50.087667, -50.106547, -50.103487, -50.116283, -50.117968, -50.101301,
-50.119913, -50.120191, -50.120191, -50.079593, -50.080167, -50.082112,
-50.093519, -50.070172, -50.074194, -50.095459, -50.117959, -50.121024,
-50.094079, -50.102677, -50.129635, -50.140468, -50.143492, -50.166288,
-50.166426, -50.166816, -50.166844, -50.166024, -50.169635, -50.169635,
-50.165154, -50.165154, -50.175427, -50.182686, -50.188496, -50.203515,
-50.208765, -50.208487, -50.220728, -50.24933, -50.24933, -50.190159,
-50.204603, -50.241421, -50.241576, -50.241849, -50.135746, -50.144894,
-50.142117, -50.14408, -50.146839, -50.148223, -50.148223, -50.143802,
-50.144066, -50.151269, -50.163802, -50.159357, -50.160168, -50.159066,
-50.138232, -50.137107, -50.151288, -50.151001, -50.137376, -50.139061,
-50.132691, -50.132968, -50.152399, -50.170709, -50.176566, -50.176566,
-50.173237, -50.195182, -50.196949, -50.197376, -50.209608, -50.209608,
-50.239872, -50.007371, -50.006579, -50.007931, -50.008523, -50.01044,
-50.013787, -50.014607, -50.014037, -50.013056, -50.004181, -50.006569,
-50.004607, -50.008482, -50.008482, -50.026278, -50.030861, -50.018523,
-50.019444, -50.014903, -50.020181, -50.045875, -50.046301, -50.057121,
-50.057121, -50.036278, -50.040176, -50.043227, -50.044894, -50.036125,
-50.050158, -50.055186, -50.04876, -50.053213, -50.062385, -50.061561,
-50.085727, -50.093361, -50.083352, -50.083227, -50.083228, -50.10488,
-50.10351, -50.108783, -50.121816, -50.121279, -50.098487, -50.093788,
-50.104315, -50.10238, -50.107121, -50.108482, -50.111024, -50.124043,
-50.115723, -50.124343, -50.083375, -50.074315, -50.073515, -50.073514,
-50.073769, -50.070459, -50.072959, -50.106561, -50.116857, -50.113797,
-50.113797, -50.103802, -50.007107, -50.001815, -50.005185, -50.022371,
-50.021685, -50.022111, -50.004597, -50.006269, -50.007778, -50.001843,
-50.001843, -50.01906, -50.020185, -50.020185, -50.020426, -50.021843,
-50.06044, -50.00362, -50.00519, -50.00519, -50.007102, -50.024079,
-50.024079, -50.023778, -50.023778, -50.010732, -50.037686, -50.032936,
-50.03657, -50.038204, -50.038223, -50.041283, -50.042375, -50.044885,
-50.043227, -50.05851, -50.03988, -50.062653, -50.087385, -50.077112,
-50.110996, -50.119061, -50.126279, -50.132691, -50.149052, -50.149052,
-50.137371, -50.141431, -50.141858, -50.170992, -50.170992, -50.176288,
-50.176844, -50.176844, -50.14225, -50.142404, -50.142404, -50.142408,
-50.155432, -50.155432, -50.14852, -50.159344, -50.160579, -50.157409,
-50.158209, -50.170436, -50.170436, -50.132121, -50.165154, -50.144052,
-50.144052, -50.13408, -50.263247, -50.264755, -50.26821, -50.257386,
-50.28265, -50.2924, -50.2924, -50.303516, -50.264891, -50.251543,
-50.251543, -50.261302, -50.261539, -50.264755, -50.270455, -50.270747,
-50.294067, -50.290159, -50.290432, -50.315715, -50.320456, -50.251849,
-49.989338, -49.986551, -49.976296, -50.127404, -50.127654),
cluster = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1,
4, 4, 5, 5, 5, 5, 5, 5, 4, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1,
1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 4, 4, 5, 5, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 5,
5, 5, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 5, 5)), row.names = c(NA,
-318L), class = c("tbl_df", "tbl", "data.frame"))
w1<-convexhull.xy(df$Longitude[df$cluster==1], df$Latitude[df$cluster==1])
w2<-convexhull.xy(df$Longitude[df$cluster==2], df$Latitude[df$cluster==2])
w3<-convexhull.xy(df$Longitude[df$cluster==3], df$Latitude[df$cluster==3])
w4<-convexhull.xy(df$Longitude[df$cluster==4], df$Latitude[df$cluster==4])
w5<-convexhull.xy(df$Longitude[df$cluster==5], df$Latitude[df$cluster==5])
p1<-st_as_sf(w1, crs=4269)
p2<-st_as_sf(w2, crs=4269)
p3<-st_as_sf(w3, crs=4269)
p4<-st_as_sf(w4, crs=4269)
p5<-st_as_sf(w5, crs=4269)
poly<-rbind(p1,p2,p3,p4,p5)
poly[,"cluster"]<-c(1,2,3,4,5)
pts<-st_as_sf(df, coords=c("Longitude", "Latitude"), crs=4269)
tmap_mode("plot")
tm_shape(poly)+
tm_polygons(col="cluster", palette=c("darkolivegreen","skyblue","skyblue","yellow","pink"), style="cat", title="cluster")+
tm_shape(pts)+
tm_dots(size=2)+
tm_layout(legend.outside = TRUE)
One approach would be use a voronoi partition. ggvoronoi will do this for you with ggplot2, and you could easily overlay it on a ggmap map.
There is also a st_voronoi function in the sf package which will create a voronoi partition shapefile from a MULTIPOINT shape (see update below).
Here is a simple example using your data. I have removed duplicated points (i.e. the same point in different clusters) which break the voronoi algorithm!
library(tidyverse) #specifically ggplot2 and dplyr for the pipe
library(ggvoronoi)
df %>% distinct(Longitude, Latitude, .keep_all = TRUE) %>%
ggplot(aes(x = Longitude, y = Latitude, fill = factor(cluster), label = cluster)) +
geom_voronoi() +
geom_text()
Update:
To do this with sf_voronoi you can do the following (unlike ggvoronoi, sf_voronoi works without having to weed out the duplicates)...
pts_vor <- pts %>% st_union() %>% #merge points into a MULTIPOINT
st_voronoi() %>% #calculate voronoi polygons
st_cast() %>% #next three lines return it to a useable list of polygons
data.frame(geometry = .) %>%
st_sf(.) %>%
st_join(., pts) #merge with clusters and other variables
pts_vor %>% ggplot(aes(fill = factor(cluster))) +
geom_sf(colour = NA) + #draw voronoi tiles (no borders)
geom_sf_text(data = pts, aes(label = cluster)) + #plot points
coord_sf(xlim = c(-50.35, -49.95), ylim = c(-25.05, -24.75))
#Antonio, I think this might be the solution you are after, but it requires at least three points per cluster to work, which from your figure I am assuming you have in your full dataset. The trick is to create convex hulls and convert them into polygons. This can be accomplished using the convexhull.xy() function in the spatstat:: package. Then these can be converted into simple features in the sf:: package, and then drawn with your mapping package of choice. I personally am a fan of the tmap:: package. Here is a reproducible example. Note, I had to add two more points to your example data to make this work (you cannot compute a polygon from only two points).
##Loading Necessary Packages##
library(spatstat)#For convexhull.xy() function
library(tmap)# For drawing the map
library(sf) #To create simple features for mapping
##Loading Example Data##
df<-structure(list(Properties = c(1,2,3,4,5,6),
Latitude = c(-24.930473, -24.95575,-24.924161,-24.95579, -24.94557, -24.93267),
Longitude = c(-49.994889, -49.990162,-50.004343, -50.007371, -50.01542, -50.00702),
cluster = c(1,2,1,2,1,2)), class = "data.frame", row.names = c(NA, -6L))
##Calculating convexhulls for each cluster##
w1<-convexhull.xy(df$Longitude[df$cluster==1], df$Latitude[df$cluster==1])
w2<-convexhull.xy(df$Longitude[df$cluster==2], df$Latitude[df$cluster==2])
##Converting hulls to simple features. Note, I assumed that you are using the EPSG 4269 projection (WGS84)
p1<-st_as_sf(w1, crs=4269)
p2<-st_as_sf(w2, crs=4269)
#Combining the two simple features together
poly<-rbind(p1,p2)
#Labelling the clusters
poly[,"cluster"]<-c(1,2)
#Creating a point simple feature from your property data in the dataframe
pts<-st_as_sf(df, coords=c("Longitude", "Latitude"), crs=4269)
#Setting the mapping mode to plot. Change this to "view" if you want an interactive map
tmap_mode("plot")
#Drawing the map
tm_shape(poly)+
tm_polygons(col="cluster", palette=c("darkolivegreen", "skyblue"), style="cat", title="cluster")+
tm_shape(pts)+
tm_dots(size=2)+
tm_layout(legend.outside = TRUE)

R : List as part of data frame

This is driving me nuts! I'm sure the answer is simple but I can't work it out!
I have long data with grouped variables. I have 'dcast' it into wide format, the idea being that I want to perform some calculations on groups and then melt again before plotting them.
I now have the wide dataframe which looks 99% correct. All of the results for 'StartTime' are stored as a list (a vector, I guess?) in a single cell of the dataframe.
To get here:
dw <- dcast(data,Volunteer~grpa, fun =list, value.var=StartTime)
I would have thought it should be a case of simply indexing the vector and calculating a mean of that. E.g. df$mean1 <- mean(df$group1) but this returns NA
Alternatively, if I do df$mean1 <- mean(unlist(df$group1)) I get the mean of the whole column of lists which is obviously not what I want.
In addition to the mean, I'd also like to calculate the SD of the data. When I work out the syntax this should be trivial. I'm sure there are several ways of doing this:
Work out what I'm doing wrong and just generate a mean from the list
Using a different function in the dcast command to generate a table with all the values, means & SD in separate columns
Converting the lists to columns (N.B the lists are all length 3, with one exception where there is missing data)
Perhaps there is a way of calculating the (groupwise) means from the long data
Any help will be gratefully received! Thanks.
EDIT: Here's the data via dput:
structure(list(Well = c("A02", "A03", "A05", "A06", "A07", "A09",
"A10", "A11", "B01", "B02", "B03", "B05", "B06", "B07", "B09",
"B10", "B11", "C01", "C02", "C03", "C05", "C06", "C07", "C09",
"C10", "C11", "D01", "D02", "D03", "D05", "D06", "D07", "D09",
"D10", "D11", "E01", "E02", "E03", "E05", "E06", "E07", "E09",
"E10", "E11", "F01", "F02", "F03", "F05", "F06", "F07", "F09",
"F10", "F11", "G01", "G02", "G05", "G06", "G07", "G09", "G10",
"G11"), Volunteer = c(2, 2, 2, 2, 2, 2, 2, 2, 40, 40, 40, 40,
40, 40, 40, 40, 40, 70, 70, 70, 70, 70, 70, 70, 70, 70, 72, 72,
72, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 73, 73, 73,
74, 74, 74, 74, 74, 74, 74, 74, 74, 999, 999, 999, 999, 999,
999, 999, 999), group = c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2,
2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3,
3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1,
1, 2, 2, 2, 3, 3, 3), StartTime = c(340, 330, 310, 315, 305,
330, 335, 325, 440, 445, 450, 460, 465, 455, 405, 410, 415, 350,
355, 360, 355, 360, 350, 375, 380, 385, 290, 295, 285, 280, 285,
290, 315, 320, 310, 290, 295, 300, 385, 405, 380, 405, 395, 385,
305, 295, 300, 295, 300, 290, 300, 320, 310, 395, 400, 385, 390,
395, 375, 380, 370)), row.names = c(NA, -61L), class = c("tbl_df",
"tbl", "data.frame"))

No data-frame returned with get_mon_arrivals() for Simmer

A simulation model I created with Simmer returns a meaningful data-frame with get_mon_attributes(), however only a data-frame with 2 rows with get_mon_arrivals(). (My original code only returns a data-frame with no rows at this point). I use both data-frames to create a new data-frame to display queue and activity durations (of which there are three each) as well as throughput time for each arrival, which is then further analysed for output with Shiny (not shown in reproducible example below).
A previous version of this code worked well without issues, in spite of a more complex calculation of activity times.
I kept the reproducible example as small as possible - much of it is data. I realize it is quite large, not sure how to reduce it further, without changing it too much with regards to the original code.
I didn't find a similar problem on google or stackoverflow.com.
library(simmer)
library(dplyr)
arrivalsAtHatchV <- c(33.05, 59.65, 133.15, 187.683333333333,
190.916666666667, 191.316666666667,
191.733333333333, 192.2, 225.283333333333, 226.15,
232.483333333333,
250.983333333333, 294.616666666667, 295.05,
342.083333333333,
370.283333333333, 376.35, 381.816666666667,
392.716666666667,
393.866666666667, 398.666666666667,
399.116666666667, 400.8,
415.85, 429.65, 433.35, 436.466666666667,
437.883333333333, 439,
440.2, 440.633333333333, 441.216666666667,
443.066666666667,
457.25, 461.933333333333, 479.783333333333,
523.083333333333,
524.45)
arrivalsFromWardV <- c(18.3666666666667, 18.3666666666667, 46.15,
72.4333333333333,
72.45, 72.45, 72.4666666666667, 76.75, 80.6,
88.3833333333333,
99, 100.383333333333, 103.366666666667, 117.6,
117.683333333333,
125.466666666667, 136.633333333333,
136.633333333333, 150.033333333333,
156.6, 156.6, 158.833333333333, 158.833333333333,
158.833333333333,
158.85, 161.283333333333, 171.366666666667,
171.366666666667,
175.283333333333, 177.733333333333, 180.85,
193.366666666667,
193.383333333333, 208.266666666667,
208.683333333333, 209.166666666667,
209.266666666667, 209.366666666667, 218.55,
220.3, 232.733333333333,
235.683333333333, 237.95, 237.95,
240.383333333333, 254.083333333333,
254.75, 262.066666666667, 263.933333333333,
263.95, 275.05, 282.25,
291.45, 293.8, 309.25, 324.633333333333,
335.816666666667, 341,
342.316666666667, 343.15, 360.7, 364.5,
388.383333333333, 388.383333333333,
396.183333333333, 402.016666666667,
414.833333333333, 434.716666666667,
434.716666666667, 449.116666666667,
453.266666666667, 461.8,
469.75, 471.9, 476.116666666667, 476.2,
478.683333333333, 480.033333333333,
494.266666666667, 495.983333333333, 507.25)
defaultShiftMatrix <- structure(c(0, 0, 2, 0, 1, 2, 4, 1, 1, 2, 4, 1, 1,
2, 4, 1, 1,
2, 4, 1, 1, 0, 2, 1), .Dim = c(4L,
6L), .Dimnames = list(c("disp. pharmacist",
"ward pharmacist", "pharm. tech.", "checking tech."), c("8-9 a.m.",
"9-11 a.m.", "11 a.m.-1 p.m.", "1-3 p.m.", "3-5 p.m.", "5-6:30 p.m."
)))
outPatDurForChosenDate <- structure(list(RxID = c(108323, 108326,
108340, 108356, 108357,
108358, 108359, 108360, 108370, 108371, 108372, 108381, 108391,
108392, 108399, 108404, 108405, 108407, 108410, 108411, 108414,
108415, 108416, 108420, 108421, 108422, 108425, 108426, 108427,
108428, 108429, 108431, 108432, 108436, 108438, 108447, 108455,
108456), verifActivity = c(65, 1046, 1884, 82, 3, 6, 3, 4, 6,
663, 103, 4, 6, 5, 125, 9, 3, 13, 5, 6, 3, 6, 3, 3, 29, 5, 202,
7, 3, 3, 5, 7, 5, 4, 5, 2, 132, 5), dispActivity = c(602, 8,
702, 1032, 399, 172, 250, 301, 745, 303, 59, 4, 1278, 173, 728,
102, 356, 112, 4, 561, 1165, 383, 560, 433, 568, 604, 630, 378,
486, 3, 305, 378, 822, 257, 674, 1656, 413, 2), finCheckActivity = c(284,
162, 305, 3, 290, 163, 386, 282, 90, 56, 28, 72, 202, 67, 45,
163, 67, 59, 48, 3, 54, 2, 1, 3, 4, 263, 92, 7, 56, 2, 4, 2332,
718, 77, 7, 2, 3, 2)), row.names = c(NA, -38L), class = "data.frame")
inPatDurForChosenDate <- structure(list(RxID = c(108318, 108319, 108324, 108327, 108328,
108329, 108330, 108331, 108332, 108333, 108334, 108335, 108336,
108337, 108338, 108339, 108341, 108342, 108343, 108344, 108345,
108346, 108347, 108348, 108349, 108350, 108351, 108352, 108353,
108354, 108355, 108361, 108362, 108363, 108364, 108365, 108366,
108367, 108368, 108369, 108373, 108374, 108375, 108376, 108377,
108382, 108383, 108384, 108385, 108386, 108387, 108388, 108389,
108390, 108394, 108396, 108397, 108398, 108400, 108401, 108402,
108403, 108408, 108409, 108413, 108417, 108419, 108423, 108424,
108434, 108435, 108437, 108440, 108441, 108443, 108444, 108446,
108448, 108450, 108451, 108454), verifActivity = c(514, 224,
205, 1370, 9, 4751, 390, 5, 1057, 3699, 240, 30, 46147, 796,
753, 1020, 39, 713, 703, 401, 13517, 128, 507, 6391, 160, 6,
136, 293, 596, 196, 287, 863, 1770, 4, 548, 4, 462, 99, 118,
217, 7031, 10, 4504, 599, 44, 143, 127, 1239, 164, 94, 926, 77,
172, 4, 982, 760, 456, 44, 164, 3, 466, 2672, 710, 635, 445,
820, 2575, 8, 7, 92, 1283, 36, 4, 13, 7, 51, 131, 3, 15, 2, 4
), dispActivity = c(3, 1202, 4482, 100, 2611, 9600, 667, 1169,
596, 1124, 3, 8, 1673, 673, 977, 145, 592, 892, 300, 4004, 435,
728, 969, 1695, 1308, 8, 382, 470, 880, 366, 589, 1113, 1456,
606, 3256, 2135, 964, 145, 499, 3690, 4473, 622, 399, 878, 1687,
547, 1610, 3698, 966, 745, 127, 72, 658, 404, 15, 4103, 5827,
1175, 508, 127, 792, 2723, 33411, 617, 5037, 855, 607, 1093,
169, 3608, 925, 78, 1151, 53, 733, 1755, 579, 2014, 7953, 273,
999), finCheckActivity = c(422, 4, 3, 8, 273, 2257, 149, 579,
247, 316, 783, 2, 1, 175, 1978, 67, 545, 209, 4, 635, 4, 178,
424, 4, 2, 3, 2, 328, 163, 71, 116, 598, 1, 2, 1430, 150, 343,
22, 304, 758, 36, 201, 3, 1, 324, 157, 108, 874, 108, 94, 4,
3, 4, 31, 4, 3, 863, 6, 1, 118, 3, 64, 806, 4, 4, 215, 3, 131,
504, 1, 63, 3, 4, 278, 116, 5, 76, 1, 382, 1, 2)), row.names = c(NA,
-81L), class = "data.frame")
#The simulation model below is a function that requires a shift pattern for
#four roles, a vector outlining arrivals at the hatch and a vector outlining arrivals
#from the ward, both over 1 working day, i.e. the simulator
#runs over 1 working day. It returns a data-frame of the arrivals and one for
#their attributes:
simulationResults <- function(shiftMatrix, arrivalsAtHatchVect,
arrivalsFromWardVect, outPatDurs, inPatDurs, repeatNumber){
#outPatDurs is a data-frame containing the activity durations for outpatients
#for a chosen date
#inPatDurs is a the equivalent data-frame for inpatient wards
arrivalsAtHatchV <- arrivalsAtHatchVect
arrivalsFromWardV <- arrivalsFromWardVect
outpatientDurations <- outPatDurs
inpatientDurations <- inPatDurs
#Data input of average durations of main activities per arrival
#and duration of run.I will assume that time-units are minutes.
runDuration <- 630 #630 min. would be 10.5 hours, e.g. from 8:00 a.m. to 6:30 p.m.
arrivalFromHatch <- "hatch" #The name used for arrivals at the hatch in the model.
arrivalFromWard <- "wards" #The name used for arrivals from the wards in the model.
#Schedules (i.e. shifts for resources):
shiftTimes <- c(0, 60, 180, 300, 420, 540) # this corresponds to 8 a.m.,
#9 a.m., 11 a.m., 1 p.m., 3 p.m., and 5 p.m. - this is when number of resources
#change
disp.pharmacist.sched <- schedule(shiftTimes,
shiftMatrix["disp. pharmacist",], period = 630)
ward.pharmacist.sched <- schedule(shiftTimes,
shiftMatrix["ward pharmacist",], period = 630)
pharm.tech.sched <- schedule(shiftTimes,
shiftMatrix["pharm. tech.",], period = 630)
fin.check.sched <- schedule(shiftTimes,
shiftMatrix["checking tech.",], period = 630)
arrivalDataFrame <- NULL
for (counter in (1:repeatNumber)){
#since I want to keep the activity durations of a particular prescription
#together, I will just create a vector of randomly selected RxIDs from
#the outpatient or inpatient prescription data for the chosen day -
#these prescription numbers will be used to access the activity data
#later prescription by prescription
outpatientRxIDs <- sample(x = outpatientDurations %>% pull(RxID),
size = outpatientDurations %>% nrow(),
replace = F)
inpatientRxIDs <- sample(x = inpatientDurations %>% pull(RxID),
size = inpatientDurations %>% nrow(),
replace = F)
##############################################################
#Defining Simmer environment:
pharmacy <- simmer("Dispensing Process")
#Defining trajectory with 2 activities, the distribution of their durations
#and their required resources:
dispProcess <- trajectory("dispensing & final checking") %>%
set_attribute(keys = "progress", values = function(){5}) %>% # 5 ... waiting for dispensing
seize("dispenser", 1) %>%
set_attribute(keys = "progress", values = function(){6}) %>% # 6 ... start of dispensing
timeout(function() {durationCalculator(get_name(pharmacy),
get_attribute(pharmacy,"progress"),
outpatientRxIDs,
inpatientRxIDs,
outpatientDurations,
inpatientDurations)}) %>% #********
release("dispenser", 1) %>%
set_attribute(keys = "progress", values = function(){7}) %>% # 7 ... waiting for final checking
simmer::select(resources = c("final checker","disp_pharmacist"), policy = 'shortest-queue') %>%
seize_selected(amount = 1) %>%
set_attribute(keys = "progress", values = function(){8}) %>% # 8 ... start of final checking
timeout(function() {durationCalculator(get_name(pharmacy),
get_attribute(pharmacy,"progress"),
outpatientRxIDs,
inpatientRxIDs,
outpatientDurations,
inpatientDurations)}) %>% #********
release_selected(amount = 1) %>%
set_attribute(keys = "progress", values = function(){9}) # 9 ... finish of final checking and process
#Part of the trajectory that covers the verifying of prescriptions from the ward:
verifyingOnWards <- trajectory("verifying on wards") %>%
#Attribute keeping track of progress of Rx in process:
set_attribute(keys = "progress", values = function(){1}) %>% # 1 ... waiting for verifying
seize("ward pharmacist", 1) %>%
set_attribute(keys = "progress", values = function(){2}) %>% # 2 ... start of verifying
timeout(function() {durationCalculator(get_name(pharmacy),
get_attribute(pharmacy,"progress"),
outpatientRxIDs,
inpatientRxIDs,
outpatientDurations,
inpatientDurations)}) %>%
release("ward pharmacist", 1)
#Part of the trajectory that covers the verifying of prescriptions from the hatch (mainly
#outpatient Rxs):
verifyingOutpatients <- trajectory("verifying in dispensary") %>%
set_attribute(keys = "progress", values = function(){1}) %>% # 1 ... waiting for verifying
seize("disp_pharmacist", 1) %>%
set_attribute(keys = "progress", values = function(){2}) %>% # 2 ... start of verifying
timeout(function() {durationCalculator(get_name(pharmacy),
get_attribute(pharmacy,"progress"),
outpatientRxIDs,
inpatientRxIDs,
outpatientDurations,
inpatientDurations)}) %>%
release("disp_pharmacist", 1)
prescriptFromWard <- join(verifyingOnWards, dispProcess)
prescriptFromHatch <- join(verifyingOutpatients, dispProcess)
#Defining number of resources (i.e. staff) available:
pharmacy %>%
add_resource("disp_pharmacist", disp.pharmacist.sched) %>%
add_resource("ward pharmacist", ward.pharmacist.sched) %>%
add_resource("dispenser", pharm.tech.sched) %>%
add_resource("final checker", fin.check.sched) %>%
add_generator(arrivalFromHatch, prescriptFromHatch, at(arrivalsAtHatchV), mon = 2) %>%
add_generator(arrivalFromWard, prescriptFromWard, at(arrivalsFromWardV), mon = 2)
#Defining length of simulation run:
pharmacy %>% run(until = runDuration)
#Output of data to data-frame:
arrivals.df <- pharmacy %>% get_mon_arrivals() %>% .[order(.$start_time),]
attributes.df <- pharmacy %>% get_mon_attributes() %>% .[order(.$time),]
arrivalDataFrame <- arrivalDataFrame %>%
rbind(arrivals.df %>% cbind(trial = counter))
}
return(arrivalDataFrame)
}
#This function returns a duration dependend on attributes in the trajectory below;
durationCalculator <- function(arrivName, activity, outpatRxIDs,
inpatRxIDs, outpatDurs, inpatDurs){
#arrivName is the name of the arrival, e.g. "wards11", or "hatch5" -
#the last digits are to count the arrivals
#activity is an integer from 1 to 9
#depending on these two parameters an activity duration is picked from a
#data-frame, i.e. outpatientDurations or inpatientDurations
kounter <- substr(arrivName, start = 6, stop = nchar(arrivName)) %>% #both arrival names have 5 letters
as.integer() %>% "+" (1) #this is to extract the number of the arrival
currActivity <- switch(activity %>% as.character(), "2" ="verifActivity",
"6" = "dispActivity", "8" = "finCheckActivity")
if (grepl("hatch",arrivName)){ #this expression would be true for an inpatient Rx
r <- outpatDurs[
outpatDurs$RxID == outpatRxIDs[kounter],
currActivity]
}else{
r <- inpatDurs[
inpatDurs$RxID == inpatRxIDs[kounter],
currActivity]
}
return(r)
}
simulationResults(defaultShiftMatrix, arrivalsAtHatchV, arrivalsFromWardV,
outPatDurForChosenDate, inPatDurForChosenDate, 1) %>% #the last digit is the number of
print()
I would have expected the Simmer simulation to take account of all arrivals as per arrivalsAtHatchV and arrivalsFromWardV. This does not happen, however.
Any help would be greatly appreciated.
See comment above. Need to watch what I feed my functions in the future.

Show a specific value of x-axis on ggplot

Im creating a ggplot with geom_vline at a specific location on the x axis. i would like the x axis to show that specific value
Following is my data + code:
dput(agg_data)
structure(list(latency = structure(c(0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 26, 28,
29, 32, 36, 37, 40, 43, 46, 47, 48, 49, 54, 64, 71, 72, 75, 87,
88, 89, 93, 134, 151), class = "difftime", units = "days"), count = c(362,
11, 8, 5, 4, 2, 8, 6, 4, 2, 2, 1, 5, 1, 2, 2, 2, 1, 1, 1, 2,
1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
1, 1, 1), cum_sum = c(362, 373, 381, 386, 390, 392, 400, 406,
410, 412, 414, 415, 420, 421, 423, 425, 427, 428, 429, 430, 432,
433, 435, 436, 437, 438, 439, 440, 441, 442, 444, 446, 447, 448,
449, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460), cum_dist = c(0.78695652173913,
0.810869565217391, 0.828260869565217, 0.839130434782609, 0.847826086956522,
0.852173913043478, 0.869565217391304, 0.882608695652174, 0.891304347826087,
0.895652173913044, 0.9, 0.902173913043478, 0.91304347826087,
0.915217391304348, 0.919565217391304, 0.923913043478261, 0.928260869565217,
0.930434782608696, 0.932608695652174, 0.934782608695652, 0.939130434782609,
0.941304347826087, 0.945652173913043, 0.947826086956522, 0.95,
0.952173913043478, 0.954347826086957, 0.956521739130435, 0.958695652173913,
0.960869565217391, 0.965217391304348, 0.969565217391304, 0.971739130434783,
0.973913043478261, 0.976086956521739, 0.980434782608696, 0.982608695652174,
0.984782608695652, 0.98695652173913, 0.989130434782609, 0.991304347826087,
0.993478260869565, 0.995652173913044, 0.997826086956522, 1)), .Names = c("latency",
"count", "cum_sum", "cum_dist"), row.names = c(NA, -45L), class = "data.frame")
and code:
library(ggplot2)
library(ggthemes)
russ<-ggplot(data=agg_data,aes(x=as.numeric(latency),y=cum_dist))+geom_line(size=2)
russ<-russ+ggtitle("Latency from first click to Qualified Demo:") + xlab("in Days")+ ylab("Proportion of maturity")+theme_economist()
russ<-russ+geom_vline(aes(xintercept=10), color="black", linetype="dashed")
russ
which creates the following plot:
I want the plot show the value '10' (same location as the vline) on the x-axis
I looked for some other similar answers, like in Customize axis labels
but this one re creates the x axis labels (with scale_x_discrete), and does not add a new number to the current scale, which is more of what im looking for.
thanks in advance!
In your case x scale is continuous, so you can use function scale_x_continuous() and provide breaks at positions you need.
russ + scale_x_continuous(breaks=c(0,10,50,100,150))

Getting the error "level sets of factors are different" when running a for loop

I have the following 3 tables:
AggData <- structure(list(Path = c("NonBrand", "Brand", "NonBrand,NonBrand",
"Brand,Brand", "NonBrand,NonBrand,NonBrand", "Brand,Brand,Brand",
"Brand,NonBrand", "NonBrand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand", "Brand,Brand,NonBrand", "NonBrand,Brand,Brand",
"Brand,NonBrand,NonBrand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"NonBrand,NonBrand,Brand", "Brand,NonBrand,Brand", "NonBrand,Brand,NonBrand",
"NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"NonBrand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,Brand",
"Brand,Brand,Brand,NonBrand", "Brand,Brand,Brand,Brand,Brand,Brand,Brand",
"Brand,NonBrand,NonBrand,NonBrand", "NonBrand,NonBrand,Brand,Brand",
"Brand,Brand,NonBrand,NonBrand", "Brand,NonBrand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,NonBrand,Brand", "NonBrand,Brand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"NonBrand,NonBrand,Brand,NonBrand", "Brand,NonBrand,NonBrand,Brand",
"NonBrand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,Brand",
"Brand,NonBrand,Brand,NonBrand", "NonBrand,Brand,Brand,NonBrand",
"Brand,Brand,Brand,Brand,NonBrand", "Brand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,NonBrand,Brand,Brand,Brand", "NonBrand,Brand,NonBrand,Brand",
"Brand,Brand,Brand,NonBrand,Brand", "NonBrand,NonBrand,Brand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,Brand,Brand", "Brand,Brand,NonBrand,Brand,Brand",
"Brand,Brand,Brand,NonBrand,NonBrand", "Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,Brand,NonBrand", "Brand,Brand,NonBrand,NonBrand,NonBrand",
"NonBrand,Brand,Brand,Brand,Brand,Brand", "NonBrand,Brand,NonBrand,NonBrand,NonBrand",
"NonBrand,NonBrand,Brand,NonBrand,NonBrand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,Brand",
"Brand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand", "Brand,Brand,Brand,Brand,Brand,NonBrand",
"NonBrand,Brand,Brand,NonBrand,NonBrand", "Brand,NonBrand,NonBrand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,NonBrand,Brand,Brand", "NonBrand,NonBrand,Brand,Brand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,NonBrand,Brand,NonBrand", "NonBrand,NonBrand,Brand,NonBrand,Brand",
"Brand,NonBrand,NonBrand,Brand,NonBrand", "NonBrand,NonBrand,NonBrand,Brand,Brand,Brand",
"NonBrand,Brand,Brand,NonBrand,Brand", "Brand,NonBrand,NonBrand,NonBrand,NonBrand,Brand",
"Brand,Brand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand", "Brand,Brand,Brand,Brand,NonBrand,NonBrand,NonBrand"
), click_count = c(1799265, 874478, 198657, 128159, 45728, 30172,
20520, 17815, 16718, 9479, 6554, 3722, 3561, 3408, 3391, 3366,
3256, 2526, 1846, 1708, 1682, 1013, 951, 899, 881, 782, 780,
703, 642, 625, 615, 601, 453, 442, 414, 407, 362, 343, 313, 284,
281, 281, 271, 269, 268, 229, 223, 218, 215, 212, 204, 162, 161,
158, 155, 145, 132, 130, 115, 103, 102, 86, 77, 77, 72, 68, 68,
67, 58, 52, 32, 18, 18), conv_count = c(30938, 19652, 7401, 3803,
2014, 1072, 1084, 981, 652, 379, 230, 166, 205, 246, 254, 93,
239, 104, 112, 51, 76, 23, 66, 81, 55, 29, 62, 57, 50, 37, 17,
33, 38, 17, 8, 41, 33, 30, 24, 16, 26, 18, 16, 17, 7, 21, 10,
8, 27, 23, 11, 13, 6, 15, 14, 16, 8, 10, 6, 6, 11, 11, 8, 9,
8, 8, 9, 7, 7, 6, 6, 6, 7), CR = c(0.0171947989873643, 0.0224728352228415,
0.0372551684561833, 0.0296740767328085, 0.0440430370888733, 0.0355296301206417,
0.0528265107212476, 0.0550659556553466, 0.0389998803684651, 0.0399831205823399,
0.0350930729325603, 0.0445996775926921, 0.057568098848638, 0.0721830985915493,
0.0749041580654674, 0.0276292335115865, 0.0734029484029484, 0.0411718131433096,
0.0606717226435536, 0.0298594847775176, 0.0451843043995244, 0.0227048371174729,
0.0694006309148265, 0.0901001112347052, 0.0624290578887628, 0.0370843989769821,
0.0794871794871795, 0.0810810810810811, 0.0778816199376947, 0.0592,
0.0276422764227642, 0.0549084858569052, 0.0838852097130243, 0.0384615384615385,
0.0193236714975845, 0.100737100737101, 0.0911602209944751, 0.0874635568513119,
0.0766773162939297, 0.0563380281690141, 0.0925266903914591, 0.0640569395017794,
0.0590405904059041, 0.0631970260223048, 0.0261194029850746, 0.091703056768559,
0.0448430493273543, 0.036697247706422, 0.125581395348837, 0.108490566037736,
0.053921568627451, 0.0802469135802469, 0.0372670807453416, 0.0949367088607595,
0.0903225806451613, 0.110344827586207, 0.0606060606060606, 0.0769230769230769,
0.0521739130434783, 0.058252427184466, 0.107843137254902, 0.127906976744186,
0.103896103896104, 0.116883116883117, 0.111111111111111, 0.117647058823529,
0.132352941176471, 0.104477611940299, 0.120689655172414, 0.115384615384615,
0.1875, 0.333333333333333, 0.388888888888889)), .Names = c("Path",
"click_count", "conv_count", "CR"), row.names = c(NA, -73L), class = "data.frame")
another one here:
breakVector <- structure(list(breakVector = structure(c(1L, 1L), .Label = "NonBrand", class = "factor"),
CR = c(0.461541302855402, 0.538458697144598)), .Names = c("breakVector",
"CR"), row.names = c(NA, -2L), class = "data.frame")
and:
FinalTable <- structure(list(autribution_category = structure(c(2L, 1L), .Label = c("Brand",
"NonBrand"), class = "factor"), attributed_result = c(0, 0)), .Names = c("autribution_category",
"attributed_result"), row.names = 1:2, class = "data.frame")
when I run the following command:
if (FinalTable [2,1] == breakVector[1,1]) {
FinalTable$attributed_result[2] <- FinalTable$attributed_result[2] +
breakVector[1,2] * AggData$conv_count[3];
break}
I get the following error:
Error in Ops.factor(FinalTable[2, 1], breakVector[1, 1]) :
level sets of factors are different
This is pretty weird, since both values that im comparing are factors, I don't see any reason why R cant compare the two levels?
FinalTable[2,1] and breakVector[1,1] do not have the same levels:
> FinalTable[2,1]
[1] Brand
Levels: Brand NonBrand
> breakVector[1,1]
[1] NonBrand
Levels: NonBrand
This is easily fixed by using
breakVector[,1] <- factor(breakVector[,1], levels=c("Brand", "NonBrand"))
or, more generally
breakVector[,1] <- factor(breakVector[,1], levels=levels(FinalTable[,1]))
Perhaps, it will better compare both variables like a string:
if (as.character(FinalTable [2,1]) == as.character(breakVector[1,1])) {
FinalTable$attributed_result[2] <- FinalTable$attributed_result[2] +
breakVector[1,2] * AggData$conv_count[3];
break}

Resources