I know that with Flux.jl I can do julia> Flux.params(model) to get the parameters, but the output does not tell me how many total parameters the model actually has. Is there a function to check this, or a programmatic way to calculate it?
As @mcabbott mentions in the comments, you can pass the whole model to the params function and sum the lengths to get the total count (sum(length, params(model))), or loop through each layer as follows:
julia> model = Chain(
resnet[1:end-2],
Dense(2048, 1000),
Dense(1000, 256),
Dense(256, 2), # we get 2048 features out, and we have 2 classes
)
Chain(Chain(Conv((7, 7), 3=>64), MaxPool((3, 3), pad=1, stride=2), Metalhead.ResidualBlock((Conv((1, 1), 64=>64), Conv((3, 3), 64=>64), Conv((1, 1), 64=>256)), (BatchNorm(64), BatchNorm(64), BatchNorm(256)), Chain(Conv((1, 1), 64=>256), BatchNorm(256))), Metalhead.ResidualBlock((Conv((1, 1), 256=>64), Conv((3, 3), 64=>64), Conv((1, 1), 64=>256)), (BatchNorm(64), BatchNorm(64), BatchNorm(256)), identity), Metalhead.ResidualBlock((Conv((1, 1), 256=>64), Conv((3, 3), 64=>64), Conv((1, 1), 64=>256)), (BatchNorm(64), BatchNorm(64), BatchNorm(256)), identity), Metalhead.ResidualBlock((Conv((1, 1), 256=>128), Conv((3, 3), 128=>128), Conv((1, 1), 128=>512)), (BatchNorm(128), BatchNorm(128), BatchNorm(512)), Chain(Conv((1, 1), 256=>512), BatchNorm(512))), Metalhead.ResidualBlock((Conv((1, 1), 512=>128), Conv((3, 3), 128=>128), Conv((1, 1), 128=>512)), (BatchNorm(128), BatchNorm(128), BatchNorm(512)), identity), Metalhead.ResidualBlock((Conv((1, 1), 512=>128), Conv((3, 3), 128=>128), Conv((1, 1), 128=>512)), (BatchNorm(128), BatchNorm(128), BatchNorm(512)), identity), Metalhead.ResidualBlock((Conv((1, 1), 512=>128), Conv((3, 3), 128=>128), Conv((1, 1), 128=>512)), (BatchNorm(128), BatchNorm(128), BatchNorm(512)), identity), Metalhead.ResidualBlock((Conv((1, 1), 512=>256), Conv((3, 3), 256=>256), Conv((1, 1), 256=>1024)), (BatchNorm(256), BatchNorm(256), BatchNorm(1024)), Chain(Conv((1, 1), 512=>1024), BatchNorm(1024))), Metalhead.ResidualBlock((Conv((1, 1), 1024=>256), Conv((3, 3), 256=>256), Conv((1, 1), 256=>1024)), (BatchNorm(256), BatchNorm(256), BatchNorm(1024)), identity), Metalhead.ResidualBlock((Conv((1, 1), 1024=>256), Conv((3, 3), 256=>256), Conv((1, 1), 256=>1024)), (BatchNorm(256), BatchNorm(256), BatchNorm(1024)), identity), Metalhead.ResidualBlock((Conv((1, 1), 1024=>256), Conv((3, 3), 256=>256), Conv((1, 1), 256=>1024)), (BatchNorm(256), BatchNorm(256), BatchNorm(1024)), identity), Metalhead.ResidualBlock((Conv((1, 1), 1024=>256), Conv((3, 3), 256=>256), Conv((1, 1), 256=>1024)), (BatchNorm(256), BatchNorm(256), BatchNorm(1024)), identity), Metalhead.ResidualBlock((Conv((1, 1), 1024=>256), Conv((3, 3), 256=>256), Conv((1, 1), 256=>1024)), (BatchNorm(256), BatchNorm(256), BatchNorm(1024)), identity), Metalhead.ResidualBlock((Conv((1, 1), 1024=>512), Conv((3, 3), 512=>512), Conv((1, 1), 512=>2048)), (BatchNorm(512), BatchNorm(512), BatchNorm(2048)), Chain(Conv((1, 1), 1024=>2048), BatchNorm(2048))), Metalhead.ResidualBlock((Conv((1, 1), 2048=>512), Conv((3, 3), 512=>512), Conv((1, 1), 512=>2048)), (BatchNorm(512), BatchNorm(512), BatchNorm(2048)), identity), Metalhead.ResidualBlock((Conv((1, 1), 2048=>512), Conv((3, 3), 512=>512), Conv((1, 1), 512=>2048)), (BatchNorm(512), BatchNorm(512), BatchNorm(2048)), identity), MeanPool((7, 7)), #103), Dense(2048, 1000), Dense(1000, 256), Dense(256, 2))
julia> paramCount = 0
0
julia> for layer in model
           paramCount += sum(length, params(layer))
       end
julia> paramCount
25840234
In this example I am just incrementing a running total, but you could instead push each layer's count into an array to keep track of the counts individually, as sketched below.
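For instance, here is a minimal sketch of collecting per-layer counts, using a small hypothetical two-layer Chain in place of the ResNet-based model above:
using Flux
# Hypothetical small model; substitute your own Chain.
model = Chain(Dense(10, 5, relu), Dense(5, 2))
# One entry per layer, then the total.
layerCounts = [sum(length, params(layer)) for layer in model]
totalCount = sum(layerCounts)
@show layerCounts   # [55, 12]
@show totalCount    # 67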
I would like to subset my data frame based on the index column, keeping the cases whose index is saved in myvars (e.g. 110, 111). I don't understand why I get 0 observations when running this code:
newdata <- df[ which(df$index=="myvars"), ]
Sample data:
df<-structure(list(index = c(111, 110, 101, 111), et = c(1, 1, 1,
1), d1_t2 = c(0, 1, 1, 1), d1_t3 = c(0, 0, 1, 1), d1_t4 = c(0,
1, 0, 1), d2_t1 = c(0, 0, 1, 1), d2_t2 = c(0, 1, 1, 1), d2_t3 = c(0,
0, 0, 1), d2_t4 = c(1, 0, 1, 1), d3_t1 = c(1, 0, 1, 1), d3_t2 = c(1,
1, 0, 1), d3_t3 = c(1, 0, 1, 1), d3_t4 = c(1, 1, 0, 1), d4_t1 = c(0,
0, 1, 1), d4_t2 = c(1, 1, 0, 1), d4_t3 = c(0, 0, 1, 1), d4_t4 = c(1,
0, 1, 1), d5_t1 = c(1, 0, 0, 1), d5_t2 = c(0, 1, 1, 1), d5_t3 = c(1,
0, 1, 1), d5_t4 = c(0, 0, 1, 1), d6_t1 = c(1, 0, 0, 1), d6_t2 = c(0,
0, 1, 1), d6_t3 = c(1, 0, 1, 1), d6_t4 = c(1, 0, 1, 1), d7_t1 = c(1,
1, 1, 1), d7_t2 = c(1, 1, 1, 1), d7_t3 = c(1, 0, 1, 1), d7_t4 = c(1,
0, 1, 1)), row.names = c(NA, 4L), class = "data.frame")
Code:
myvars<-c("110", "111")
Try:
myvars <- c(110, 111)                 # <-- no quotes
df[which(df$index %in% myvars), ]     # also, no quotes around myvars
There are several basic problems with what you are trying to do.
You are not using the variable myvars -- you are using the literal string "myvars", and none of your rows has an index equal to "myvars".
You are using ==, which works for a single value (e.g. values == 4), but myvars contains several values. Use df$index %in% myvars instead.
Matching numeric indices against strings does work because of coercion, but it is unnecessary and can cause problems elsewhere; since index is numeric, keep myvars numeric too.
You may also be confusing yourself with a very large and complex example; a single column is enough to test the subsetting.
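For illustration, a minimal sketch of the corrected subsetting applied to just the index column of the sample data above:
df <- data.frame(index = c(111, 110, 101, 111))   # just the index column

myvars <- c(110, 111)                    # numeric, no quotes
newdata <- df[df$index %in% myvars, , drop = FALSE]
newdata
# returns the rows with index 111, 110 and 111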
I am trying to benchmark the performance of the Flux code mentioned below:
#model
using Flux
vgg19() = Chain(
Conv((3, 3), 3 => 64, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 64 => 64, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 64 => 128, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 128 => 128, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 128 => 256, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 256 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
BatchNorm(512),
MaxPool((2,2)),
flatten,
Dense(512, 4096, relu),
Dropout(0.5),
Dense(4096, 4096, relu),
Dropout(0.5),
Dense(4096, 10),
softmax
)
#data
using MLDatasets: CIFAR10
using Flux: onehotbatch
# Data comes pre-normalized in Julia
trainX, trainY = CIFAR10.traindata(Float32)
testX, testY = CIFAR10.testdata(Float32)
# One hot encode labels
trainY = onehotbatch(trainY, 0:9)
testY = onehotbatch(testY, 0:9)
#training
using Flux: crossentropy, @epochs
using Flux.Data: DataLoader
model = vgg19()
opt = Momentum(.001, .9)
loss(x, y) = crossentropy(model(x), y)
data = DataLoader(trainX, trainY, batchsize=64)
@epochs 100 Flux.train!(loss, params(model), data, opt)
I have tried using the tick() and tock() functions to measure the time, but they only give a coarse overall timing and are not sufficient for a detailed comparison.
Several developers in the community have recommended the BenchmarkTools.jl package for benchmarking. But when I tried to benchmark a ScikitLearn model in the REPL, it produced a warning:
WARNING: redefinition of constant LogisticRegression. This may fail, cause incorrect answers, or produce other errors.
Similarly, I tried to benchmark the above code in the REPL using @btime, but it throws this error:
julia> using BenchmarkTools
julia> @btime include("C:/Users/user/code.jl")
[ Info: Epoch 1
WARNING: both Flux and BenchmarkTools export "params"; uses of it in module Main must be qualified
ERROR: LoadError: UndefVarError: params not defined
What is the best way to perform a detailed benchmark of this code?
Thanks in advance.
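For what it is worth, here is a minimal sketch of one way around the name clash: qualify params with its module and time a single training pass with @btime. This assumes the model, loss, data and opt objects defined in the script above, and interpolates globals with $ as BenchmarkTools suggests:
using Flux, BenchmarkTools

ps = Flux.params(model)    # qualify to avoid the clash with BenchmarkTools' params

# Time one pass over the DataLoader rather than the full @epochs 100 loop;
# note that @btime runs the expression several times, so the model keeps training.
@btime Flux.train!($loss, $ps, $data, $opt)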
The code below is taken from the Flux model-zoo. I am trying to run the VGG19 tutorial in Julia using the Flux library.
Code:
#model
using Flux
vgg19() = Chain(
Conv((3, 3), 3 => 64, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 64 => 64, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 64 => 128, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 128 => 128, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 128 => 256, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 256 => 256, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 256 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
MaxPool((2,2)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
Conv((3, 3), 512 => 512, relu, pad=(1, 1), stride=(1, 1)),
BatchNorm(512),
MaxPool((2,2)),
flatten,
Dense(512, 4096, relu),
Dropout(0.5),
Dense(4096, 4096, relu),
Dropout(0.5),
Dense(4096, 10),
softmax
)
#data
using MLDatasets: CIFAR10
using Flux: onehotbatch
# Data comes pre-normalized in Julia
trainX, trainY = CIFAR10.traindata(Float64)
testX, testY = CIFAR10.testdata(Float64)
# One hot encode labels
trainY = onehotbatch(trainY, 0:9)
testY = onehotbatch(testY, 0:9)
#training
using Flux: crossentropy, @epochs
using Flux.Data: DataLoader
model = vgg19()
opt = Momentum(.001, .9)
loss(x, y) = crossentropy(model(x), y)
data = DataLoader(trainX, trainY, batchsize=64)
@epochs 100 Flux.train!(loss, params(model), data, opt)
When I execute this file in IJulia, the following error is thrown:
MethodError: no method matching ∇maxpool(::Array{Float32,4}, ::Array{Float64,4}, ::Array{Float64,4}, ::PoolDims{2,(2, 2),(2, 2),(0, 0, 0, 0),(1, 1)})
Closest candidates are:
∇maxpool(::AbstractArray{T,N}, !Matched::AbstractArray{T,N}, !Matched::AbstractArray{T,N}, ::PoolDims; kwargs...) where {T, N}
Please suggest a solution for this error and, if possible, provide a brief explanation or reference.
Thanks in advance!
As mentioned by @mcabbott, the issue was the element type of the input data: a freshly constructed Flux model has Float32 weights, while the data here was loaded as Float64. It can be fixed by changing the type from Float64 to Float32 in the following lines of the #data section.
trainX, trainY = CIFAR10.traindata(Float32)
testX, testY = CIFAR10.testdata(Float32)
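A quick way to confirm which element type the model expects (a minimal sketch; vgg19 is the constructor from the question):
using Flux

model = vgg19()

# Flux initializes weights as Float32 by default, so the inputs should be Float32 too.
eltype(first(Flux.params(model)))   # Float32

# (If you would rather convert the model than the data, recent Flux versions
#  provide f32/f64 helpers for the model's parameters.)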
Using the following script:
df <- read.csv("/covpl.csv")
m <- melt(df)
Time <- m$variable
coverage_plot <- ggplot(data=m, aes(x=Time, y=value, group=config, color=config)) +
geom_line(size=1) +
geom_point(aes(shape=config, colour = config), show.legend = T, size=3) +
scale_x_discrete(labels = seq(1, 60.0, by=1)) +
theme(legend.position="bottom", axis.text.x = element_text(angle = 90),text = element_text(size=13),legend.title=element_blank())+
labs(x = "Time (minutes)", y = "Coverage") +
guides(shape=guide_legend(override.aes=list(size=3, linetype=0)))
I get the following plot:
On the x-axis, I would like the labels to run from 1 to 30 (so that the point currently labelled 60 is shown as 30), because a value is stored every half minute (which is why there are 60 data points) but I want to display the time in minutes.
To do that, I changed the scale to scale_x_discrete(labels = seq(1, 30.0, by=1)), but this gives the following:
Do you have any idea how to fix this?
Reproducible data:
structure(list(config = structure(1:5, .Label = c("f1", "f2",
"f3", "f4", "f5"), class = "factor"), class = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "com.google.common.base.Joiner", class = "factor"),
CoverageTimeline_T1 = c(0.85390153, 0.841557035, 0.8381143561,
0.8404624807, 0.8448297462), CoverageTimeline_T2 = c(0.9431633586,
0.9192875446, 0.9010343959, 0.9126220049, 0.938583703), CoverageTimeline_T3 = c(0.9881426292,
0.9793648538, 0.9406397492, 0.9507933561, 0.9762333662),
CoverageTimeline_T4 = c(0.9937107313, 0.9933404876, 0.9632557533,
0.9706779854, 0.9946485039), CoverageTimeline_T5 = c(0.9966666667,
1, 0.9799043011, 0.9830096664, 0.9966666667), CoverageTimeline_T6 = c(0.9966666667,
1, 0.9930106526, 0.9866666667, 0.9966666667), CoverageTimeline_T7 = c(0.9966666667,
1, 1, 0.991560876, 0.9966666667), CoverageTimeline_T8 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T9 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T10 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T11 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T12 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T13 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T14 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T15 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T16 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T17 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T18 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T19 = c(0.9966666667,
1, 1, 0.9966666667, 0.9966666667), CoverageTimeline_T20 = c(0.9966666667,
1, 1, 0.9966666667, 0.9989709749), CoverageTimeline_T21 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T22 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T23 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T24 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T25 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T26 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T27 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T28 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T29 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T30 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T31 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T32 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T33 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T34 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T35 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T36 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T37 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T38 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T39 = c(0.9966666667,
1, 1, 0.9966666667, 1), CoverageTimeline_T40 = c(0.9966666667,
1, 1, 1, 1), CoverageTimeline_T41 = c(0.9966666667, 1, 1,
1, 1), CoverageTimeline_T42 = c(0.9966666667, 1, 1, 1, 1),
CoverageTimeline_T43 = c(0.9966666667, 1, 1, 1, 1), CoverageTimeline_T44 = c(0.9966666667,
1, 1, 1, 1), CoverageTimeline_T45 = c(0.9966666667, 1, 1,
1, 1), CoverageTimeline_T46 = c(0.9966666667, 1, 1, 1, 1),
CoverageTimeline_T47 = c(0.9966666667, 1, 1, 1, 1), CoverageTimeline_T48 = c(0.9966666667,
1, 1, 1, 1), CoverageTimeline_T49 = c(0.9966666667, 1, 1,
1, 1), CoverageTimeline_T50 = c(0.9966666667, 1, 1, 1, 1),
CoverageTimeline_T51 = c(0.9966666667, 1, 1, 1, 1), CoverageTimeline_T52 = c(0.9966666667,
1, 1, 1, 1), CoverageTimeline_T53 = c(0.9966666667, 1, 1,
1, 1), CoverageTimeline_T54 = c(0.9966666667, 1, 1, 1, 1),
CoverageTimeline_T55 = c(0.9966666667, 1, 1, 1, 1), CoverageTimeline_T56 = c(0.9966666667,
1, 1, 1, 1), CoverageTimeline_T57 = c(0.9966666667, 1, 1,
1, 1), CoverageTimeline_T58 = c(0.9966666667, 1, 1, 1, 1),
CoverageTimeline_T59 = c(0.9966666667, 1, 1, 1, 1), CoverageTimeline_T60 = c(0.9966666667,
1, 1, 1, 1)), class = "data.frame", row.names = c(NA, -5L
))
Edit: It would be better to define Time as a numeric vector derived from the factor you already have (m$variable). Using a simple regular expression we can pull out the number after "_T" and divide by 2:
library(reshape2)   # provides melt(); data.table::melt would also work
library(ggplot2)
df <- read.csv("/covpl.csv")
m <- melt(df)
Time <- as.numeric(gsub('.*_T', '', m$variable)) / 2
coverage_plot <- ggplot(data=m, aes(x=Time, y=value, group=config, color=config)) +
geom_line(size=1) +
geom_point(aes(shape=config, colour = config), show.legend = T, size=3) +
theme(legend.position="bottom", axis.text.x = element_text(angle = 90),text = element_text(size=13),legend.title=element_blank())+
labs(x = "Time (minutes)", y = "Coverage") +
guides(shape=guide_legend(override.aes=list(size=3, linetype=0)))
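As a follow-up, since Time is now numeric, the tick positions can be controlled explicitly if you want fewer labels (a sketch building on the coverage_plot object above):
library(ggplot2)

# Show a tick every 5 minutes instead of a label for every half-minute sample.
coverage_plot + scale_x_continuous(breaks = seq(0, 30, by = 5))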
I am trying to run k-means clustering on a data set that was preprocessed (categorical variables converted to dummies, NA cleaning, etc.).
Here is an extract (head) of the data:
dput(head(clustering.set.in))
structure(list(activity_type = c(1, 1, 1, 1, 1, 1), app_id.PXkw7OJ1se = c(0,
1, 1, 1, 1, 0), app_id.PXszbKVa5M = c(0, 0, 0, 0, 0, 0), app_id.PXw3GFQKBm = c(1,
0, 0, 0, 0, 0), browser_version = c(48, 42, 9, 9, 48, 44), continent.AS = c(0,
1, 1, 0, 0, 0), continent.EU = c(0, 0, 0, 0, 1, 0), continent.SA = c(0,
0, 0, 0, 0, 0), f_activex = c(1, 1, 1, 1, 1, 1), f_atob = c(2,
2, 2, 2, 2, 2), f_audio = c(2, 2, 2, 2, 2, 2), f_battery = c(2,
2, 1, 1, 2, 2), f_bind = c(2, 2, 2, 2, 2, 2), f_flash = c(1,
2, 2, 2, 2, 2), f_getComputedStyle = c(2, 2, 2, 2, 2, 2), f_matchSelector = c(2,
2, 2, 2, 2, 2), f_mimeTypes = c(2, 2, 2, 2, 2, 2), f_mimeTypesLength = c(0,
8, 11, 55, 7, 8), f_navigationTiming = c(2, 2, 1, 2, 2, 2), f_orientationEvents = c(2,
1, 1, 1, 1, 1), f_plugins = c(2, 2, 2, 2, 2, 2), f_pluginsLength = c(0,
6, 6, 15, 5, 6), f_raf = c(2, 2, 2, 2, 2, 2), f_resourceTiming = c(2,
2, 1, 1, 2, 2), f_sse = c(2, 2, 2, 2, 2, 2), f_webgl = c(1, 2,
2, 2, 2, 1), f_websql = c(1, 2, 2, 2, 2, 2), f_xdr = c(1, 1,
1, 1, 1, 1), n_appCodeName = c(2, 2, 2, 2, 2, 2), n_doNotTrack = c(2,
2, 1, 2, 2, 2), n_geolocation = c(2, 2, 2, 2, 2, 2), n_mimeTypes = c(2,
2, 2, 2, 2, 2), n_platform.iPhone = c(0, 0, 0, 0, 0, 0), n_platform.Linux.armv7l = c(1,
0, 0, 0, 0, 0), n_platform.MacIntel = c(0, 0, 1, 1, 0, 0), n_platform.Win32 = c(0,
1, 0, 0, 1, 0), n_plugins = c(2, 2, 2, 2, 2, 2), n_product.Sub20030107 = c(1,
1, 1, 1, 1, 1), n_product.Sub20100101 = c(0, 0, 0, 0, 0, 0),
n_product.Submissing = c(0, 0, 0, 0, 0, 0), os_family.Android = c(1,
0, 0, 0, 0, 0), os_family.iOS = c(0, 0, 0, 0, 0, 0), os_family.Mac.OS.X = c(0,
0, 1, 1, 0, 0), os_family.Windows = c(0, 1, 0, 0, 1, 0),
os_version = c(6, 8.1, 10, 10, 7, 0), site_history_length = c(31,
1, 1, 1, 1, 1), w_chrome...loadTimes....csi....app....webstore....runtime.. = c(0,
1, 0, 0, 1, 0), w_chrome...loadTimes....csi.. = c(0, 0, 0,
0, 0, 0), w_chrome... = c(1, 0, 1, 1, 0, 0), window_dimensions = c(2,
1, 2, 2, 2, 2), window_history = c(50, 1, 1, 1, 1, 3)), .Names = c("activity_type",
"app_id.PXkw7OJ1se", "app_id.PXszbKVa5M", "app_id.PXw3GFQKBm",
"browser_version", "continent.AS", "continent.EU", "continent.SA",
"f_activex", "f_atob", "f_audio", "f_battery", "f_bind", "f_flash",
"f_getComputedStyle", "f_matchSelector", "f_mimeTypes", "f_mimeTypesLength",
"f_navigationTiming", "f_orientationEvents", "f_plugins", "f_pluginsLength",
"f_raf", "f_resourceTiming", "f_sse", "f_webgl", "f_websql",
"f_xdr", "n_appCodeName", "n_doNotTrack", "n_geolocation", "n_mimeTypes",
"n_platform.iPhone", "n_platform.Linux.armv7l", "n_platform.MacIntel",
"n_platform.Win32", "n_plugins", "n_product.Sub20030107", "n_product.Sub20100101",
"n_product.Submissing", "os_family.Android", "os_family.iOS",
"os_family.Mac.OS.X", "os_family.Windows", "os_version", "site_history_length",
"w_chrome...loadTimes....csi....app....webstore....runtime..",
"w_chrome...loadTimes....csi..", "w_chrome...", "window_dimensions",
"window_history"), row.names = c(NA, 6L), class = "data.frame")
I am trying to cluster this data set (k = 2) and I get the following error message:
Error in pam(clustering.set.in, k) :
negative length vectors are not allowed
My line of code:
pam(clustering.set.in, 2)
Any suggestions?
It turns out that one column had NA values in it.
I replaced them with
new.data[is.na(new.data)] <- 1
and it now seems to work fine.
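As a side note, here is a minimal sketch of how to locate the NA values first, so you can decide whether replacing them with 1 is really the right imputation (complete.cases is base R; clustering.set.in is the data frame from the question):
# Count NA values per column and show only the affected ones.
na_per_column <- colSums(is.na(clustering.set.in))
na_per_column[na_per_column > 0]

# Alternative: drop incomplete rows instead of overwriting the NAs.
complete.set <- clustering.set.in[complete.cases(clustering.set.in), ]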