Empty nodes when creating a SOM in R

I am trying to create a SOM (self-organizing map) from records with different discrete classifications (tags), like the example below:
Record Tag1 Tag2 Tag3 Tag4
3555 1 0 0 0
6447 1 0 0 0
5523 1 0 1 0
7550 1 0 1 0
6330 1 0 1 0
2451 1 0 0 0
4308 1 0 1 0
8917 0 0 0 0
4780 1 0 1 0
6802 1 0 1 0
2021 1 0 0 0
5792 1 0 1 0
5475 1 0 1 0
4198 1 0 0 0
223 1 0 1 0
4811 1 0 1 0
678 1 0 1 0
The problem I am facing is that many of the nodes in the SOM end up empty. From what I have read, each node should hold roughly 5-10 records, but that is not happening here.
Could it be that all observations are very different from one another?
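For what it's worth, the opposite seems to be the case: these 17 records take only three distinct tag patterns (1 0 0 0, 1 0 1 0, and 0 0 0 0), so at most three nodes can ever win a record, no matter how large the grid is. A sketch assuming the kohonen package, with the grid sized down toward the 5-10 records-per-node guideline (the matrix below reproduces the tag columns from the question):

```r
## Sketch assuming the 'kohonen' package; tag columns copied from the question.
library(kohonen)

tag_matrix <- cbind(
  Tag1 = c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  Tag2 = 0,
  Tag3 = c(0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1),
  Tag4 = 0
)

## With 17 records and a 5-10 records-per-node target, the grid should
## have only 2-3 nodes; a large grid is bound to leave many of them empty.
som_grid  <- somgrid(xdim = 2, ydim = 1, topo = "rectangular")
som_model <- som(tag_matrix, grid = som_grid, rlen = 100)

## How many records landed on each node:
table(som_model$unit.classif)
```

With so few distinct patterns, shrinking the grid (or adding more informative features) is the only way to avoid empty nodes.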

Related

comparison.cloud (wordcloud) error: strwidth(words[i], cex = size[i], ...) : invalid 'cex' value (SOLVED)

I have a term-document matrix tdm1 and I pass it to this call:
comparison.cloud(tdm1, random.order = FALSE,
                 colors = c("#00B2FF", "red", "#FF0099", "#6600CC", "green", "orange", "blue", "brown"),
                 title.size = 1, max.words = 50, scale = c(4, 0.5), rot.per = 0.4)
However, I get this error: "Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value"
I am not sure which cex value is invalid.
The tdm1 is as follows:
Docs
Terms anger anticipation disgust fear joy sadness surprise trust
bag 1 0 0 0 0 1 0 0
choices 1 1 0 0 1 2 1 1
limited 1 0 0 0 0 1 0 0
plastic 1 0 0 0 0 1 0 0
provided 1 0 0 0 0 1 0 0
abit 0 1 0 0 1 1 1 1
ai 0 2 0 0 2 1 2 2
always 0 1 0 1 0 0 0 1
amazed 0 1 0 0 1 1 1 1
amount 0 1 0 0 1 0 1 1
app 0 2 0 1 2 1 2 2
area 0 1 0 0 1 0 1 1
areas 0 1 0 0 0 0 0 0
around 0 1 0 0 1 0 1 1
atmosphere 0 1 0 0 1 0 1 1
attended 0 1 0 0 1 1 1 1
back 0 1 0 0 1 1 1 0
basah 0 1 0 1 1 0 1 1
bought 0 1 0 0 0 0 0 0
brands 0 1 0 0 1 0 1 1
bras 0 1 0 1 1 0 1 1
breeze 0 1 0 0 1 1 1 1
buy 0 1 0 1 1 1 1 1
can 0 2 0 0 1 1 1 0
cant 0 1 0 0 0 0 0 0
cashiers 0 1 0 0 1 0 1 1
cbd 0 1 0 0 0 0 0 0
charged 0 1 0 0 1 1 1 1
choose 0 2 0 0 1 0 0 0
chopstick 0 1 0 0 0 0 0 0
classes 0 1 0 0 1 0 0 0
come 0 2 0 0 2 2 2 1
concept 0 4 0 0 3 0 3 3
confused 0 1 0 0 1 1 1 1
contains 0 1 0 0 1 1 1 1
convenient 0 8 0 0 5 1 4 4
cool 0 4 0 0 4 1 4 4
correct 0 1 0 0 1 1 1 1
cream 0 1 0 0 1 0 1 1
cup 0 1 0 0 0 0 0 0
curious 0 1 0 0 0 0 0 0
current 0 1 0 0 1 0 1 1
customer 0 1 0 0 0 0 0 0
cutlery 0 1 0 0 0 0 0 0
doesnt 0 1 0 0 1 0 1 1
dont 0 1 0 0 1 0 1 1
don’t 0 1 0 1 1 1 1 1
download 0 1 0 0 1 1 1 1
drinks 0 1 0 0 1 0 0 0
easy 0 3 0 0 2 0 2 2
eat 0 1 0 0 1 1 1 1
electronic 0 1 0 0 1 1 1 1
eleven 0 1 0 0 1 0 1 1
entering 0 1 0 0 1 1 1 1
ereciept 0 1 0 0 1 1 1 1
especially 0 1 0 0 1 0 1 1
even 0 2 0 2 1 1 1 1
exit 0 1 0 0 1 1 1 1
experience 0 3 0 1 2 1 2 2
explained 0 1 0 0 1 1 1 1
feel 0 1 0 0 1 0 1 1
first 0 1 0 0 1 1 1 1
found 0 1 0 0 1 0 1 1
free 0 1 0 0 1 0 1 1
friends 0 1 0 0 1 1 1 1
fussfree 0 1 0 0 1 0 1 1
gantry 0 1 0 0 1 0 1 1
get 0 2 0 1 2 1 1 1
go 0 3 0 1 3 1 3 3
good 0 2 0 0 2 0 2 2
goods 0 1 0 1 1 1 1 1
goto 0 1 0 0 0 0 0 0
great 0 4 0 0 3 0 3 3
greatly 0 1 0 0 1 0 1 1
hasslefree 0 1 0 0 0 0 0 1
history 0 1 0 0 1 1 1 1
hope 0 1 0 0 1 0 1 1
hour 0 1 0 0 1 0 1 1
hours 0 1 0 0 1 0 1 1
ice 0 1 0 0 1 0 1 1
im 0 1 0 0 1 1 1 0
inside 0 1 0 0 1 1 1 1
items 0 3 0 0 3 2 3 2
jiffy 0 1 0 0 1 1 1 1
just 0 4 0 0 3 2 3 2
large 0 1 0 0 1 0 1 1
leave 0 1 0 0 1 1 1 0
less 0 1 0 0 0 0 0 1
link 0 2 0 0 2 1 2 2
linked 0 1 0 0 1 0 1 1
lots 0 1 0 0 0 0 0 1
love 0 3 0 0 3 1 3 2
lovely 0 1 0 0 1 1 1 1
makes 0 1 0 0 1 0 1 1
making 0 1 0 0 0 0 0 1
many 0 1 0 0 0 0 0 0
method 0 1 0 0 1 0 1 1
methods 0 1 0 0 1 1 1 1
minute 0 2 0 0 1 1 1 2
mrt 0 1 0 1 1 0 1 1
muchneeded 0 1 0 0 1 0 1 1
near 0 1 0 0 1 0 1 1
nearby 0 1 0 0 1 0 1 1
new 0 1 0 0 1 0 1 1
newly 0 1 0 0 0 0 0 0
nice 0 1 0 0 0 0 0 0
noodles 0 1 0 0 0 0 0 0
number 0 1 0 0 1 1 1 1
offers 0 1 0 0 1 0 1 1
often 0 2 0 0 2 2 2 1
opened 0 1 0 0 0 0 0 0
operate 0 1 0 0 1 0 1 1
order 0 1 0 0 1 0 1 1
outlets 0 1 0 0 1 1 1 0
patiently 0 1 0 0 1 1 1 1
pay 0 1 0 0 1 0 1 1
payment 0 3 0 0 3 1 3 3
people 0 1 0 0 0 0 0 0
perfect 0 1 0 0 1 1 1 1
pick 0 4 0 0 3 0 3 3
picking 0 1 0 0 1 1 1 1
prepare 0 1 0 0 0 0 0 0
prices 0 1 0 0 1 0 1 1
product 0 2 0 0 2 2 2 2
products 0 7 0 0 5 1 5 4
promotion 0 1 0 1 1 0 1 1
promotions 0 1 0 0 0 0 0 1
quench 0 1 0 0 1 1 1 1
queue 0 2 0 0 2 1 2 2
quite 0 1 0 0 1 1 1 1
range 0 2 0 0 1 1 1 0
ready 0 1 0 0 1 1 1 1
reasonable 0 1 0 0 1 0 1 1
recieved 0 1 0 0 1 1 1 1
recommend 0 1 0 0 0 0 0 1
reduced 0 1 0 0 1 0 1 1
rush 0 1 0 0 1 1 1 0
salut 0 1 0 0 0 0 0 1
sandwiches 0 1 0 0 1 1 1 1
scan 0 1 0 0 1 0 1 1
see 0 3 0 0 2 0 2 2
seems 0 1 0 0 0 0 0 0
setup 0 1 0 0 0 0 0 1
shop 0 3 0 0 2 1 2 2
shopping 0 4 0 1 4 2 4 4
show 0 1 0 0 1 1 1 1
small 0 1 0 0 0 0 0 0
smu 0 2 0 0 2 0 2 2
snacks 0 3 0 0 3 1 1 1
spent 0 1 0 0 1 0 1 1
staff 0 3 0 1 3 3 3 3
stared 0 1 0 1 1 1 1 1
stop 0 1 0 0 1 1 1 1
store 0 14 0 0 10 5 9 9
stores 0 1 0 0 1 0 1 1
students 0 1 0 0 1 0 0 0
stuff 0 1 0 0 0 0 0 0
super 0 2 0 0 2 0 2 2
sure 0 1 0 0 1 1 1 1
sweets 0 1 0 0 1 0 0 0
take 0 1 0 0 1 1 1 0
technology 0 4 0 0 4 2 4 4
thankfully 0 1 0 0 1 1 1 1
theres 0 1 0 0 0 0 0 0
thirst 0 1 0 0 1 1 1 1
thought 0 1 0 0 0 0 0 0
time 0 2 0 0 2 1 2 2
took 0 1 0 0 0 0 0 1
truly 0 1 0 0 0 0 0 1
unmanned 0 1 0 0 1 1 1 1
use 0 1 0 0 1 0 1 1
used 0 1 0 0 1 0 1 1
useful 0 1 0 0 1 0 0 0
users 0 1 0 0 1 0 1 1
variety 0 4 0 0 4 0 3 3
wait 0 2 0 0 1 0 1 1
waited 0 1 0 0 1 1 1 1
walk 0 2 0 0 1 0 1 1
want 0 1 0 0 0 0 0 0
wanted 0 1 0 0 1 0 1 1
wasnt 0 1 0 0 1 1 1 1
whatever 0 1 0 0 1 0 1 1
wide 0 4 0 0 3 1 2 1
won’t 0 1 0 1 1 1 1 1
worry 0 1 0 1 1 1 1 1
avoid 0 0 0 1 0 0 0 0
away 0 0 0 1 0 0 0 0
better 0 0 0 1 0 0 0 0
choice 0 0 0 1 0 0 0 0
customers 0 0 0 1 0 0 0 0
deceptive 0 0 0 1 0 0 0 0
expired 0 0 0 1 0 0 0 0
listed 0 0 0 1 0 0 0 0
make 0 0 0 1 0 0 0 0
marketing 0 0 0 1 0 0 0 0
minutes 0 0 0 1 0 0 0 0
much 0 0 0 1 0 0 0 0
purchases 0 0 0 1 0 0 0 0
qr 0 0 0 1 0 0 0 0
resulting 0 0 0 1 0 0 0 0
scanner 0 0 0 1 0 0 0 0
screen 0 0 0 1 0 0 0 0
showing 0 0 0 1 0 0 0 0
still 0 0 0 1 0 0 0 0
takes 0 0 0 1 0 0 0 0
tries 0 0 0 1 0 0 0 0
trusted 0 0 0 1 0 0 0 0
trusting 0 0 0 1 0 0 0 0
works 0 0 0 1 0 0 0 0
Hence, I am not sure what the issue is, since there are no NAs!
Hope you can help. Thank you!
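The invalid 'cex' error means strwidth() received a non-finite size for some word. As far as I understand, comparison.cloud derives each word's size from the frequency matrix, so beyond literal NAs it is worth checking for non-finite entries and for all-zero rows or columns, which can make the size scaling non-finite. A diagnostic sketch (not a confirmed fix), with a toy matrix m standing in for as.matrix(tdm1):

```r
## 'm' is a toy stand-in for as.matrix(tdm1); run the same checks on the real one.
m <- matrix(c(1, 0, 2, 0, 0, 0), nrow = 3,
            dimnames = list(c("bag", "choices", "limited"),
                            c("anger", "joy")))

any(!is.finite(m))       # TRUE would mean NA/NaN/Inf entries
which(colSums(m) == 0)   # a document (emotion column) with no terms at all
which(rowSums(m) == 0)   # a term that never occurs
```

If a column sums to zero, dropping that document before calling comparison.cloud would be the first thing I would try.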

How to use a binary matrix in CSV for the apriori algorithm in R

I am trying to use a binary matrix containing transactions with the apriori algorithm, but I don't know how to implement it.
data_purchase
Txn Bag Blush Nail.Polish Brushes Concealer Eyebrowpencil Bronzer
1 1 0 1 1 1 1 0 1
2 2 0 0 1 0 1 0 1
3 3 0 1 0 0 1 1 1
4 4 0 0 1 1 1 0 1
5 5 0 1 0 0 1 0 1
6 6 0 0 0 0 1 0 0
7 7 0 1 1 1 1 0 1
8 8 0 0 1 1 0 0 1
9 9 0 0 0 0 1 0 0
10 10 1 1 1 1 0 0 0
11 11 0 0 1 0 0 0 1
12 12 0 0 1 1 1 0 1
The above is the data frame containing the binary matrix.
Have a look at the R package arules at https://cran.r-project.org/package=arules
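Expanding on that pointer: the 0/1 data frame can be coerced to arules' transactions class after dropping the Txn id and converting it to a logical matrix. A sketch, assuming the arules package; only the first four transactions are rebuilt here, and the support/confidence thresholds are illustrative, not recommendations:

```r
## Sketch assuming the 'arules' package; rows copied from the table above.
library(arules)

data_purchase <- data.frame(
  Txn = 1:4,
  Bag = c(0, 0, 0, 0),
  Blush = c(1, 0, 1, 0),
  Nail.Polish = c(1, 1, 0, 1),
  Brushes = c(1, 0, 0, 1),
  Concealer = c(1, 1, 1, 1),
  Eyebrowpencil = c(0, 0, 1, 0),
  Bronzer = c(1, 1, 1, 1)
)

## Drop the Txn id, coerce 0/1 to logical, then to 'transactions'
item_matrix <- as.matrix(data_purchase[, -1]) == 1
trans <- as(item_matrix, "transactions")

## Thresholds here are made up for the example
rules <- apriori(trans, parameter = list(supp = 0.5, conf = 0.8))
inspect(rules)
```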

Finding structural holes (constraint, efficiency, ego density and effective size) in R

I am working with an adjacency matrix and the functions of the egonet package. But when I run the command index.egonet, it gives me an error.
My adjacency matrix "p2":
p2
1 2 3 4 5 7 8 9 6
1 0 1 1 1 1 0 0 0 0
2 1 0 0 0 1 1 1 1 0
3 1 0 0 0 0 1 0 1 1
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 0 0 0 0
7 0 1 1 0 0 0 0 0 0
8 0 1 0 0 0 0 0 0 0
9 0 1 1 0 0 0 0 0 0
6 0 0 1 0 0 0 0 0 0
I apply this command to the adjacency matrix, but it gives me an error:
index.egonet(p2)
Error in dati[ego.name, y] : subscript out of bounds
Any alternative or solution to this error would be highly appreciated.
The ego name must be "EGO" in capital letters, as far as I could understand from working with that function.
colnames(p2) <- rownames(p2) <- c("EGO", 2:ncol(p2))
index.egonet(p2)
This should work.

Retrieve values in each cluster in R

I have successfully run the DBSCAN algorithm (here is the stripped down command):
results <- dbscan(data,MinPts=15, eps=0.01)
and plotted my clusters:
plot(results, data)
results$cluster returns a list with numeric values. The value at each index reflects the cluster to which the original data in that index belongs:
[1] 0 1 2 1 0 0 2 1 0 0 0 1 2 0 2 0 2 0 0 1 2 0 2 2 0 1 2 0 1 0 1 0 2 0 0 0 1 1 0 1 2 0 0 0 1 0 0 1 1 0 1
[52] 0 2 2 0 0 1 2 2 0 2 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 2 2 2 2 2 0 0 0 0 0 2 1 2 1 0 2 0 0 1 1 1 0 0 1
[103] 2 1 1 0 1 0 1 1 0 0 0 0 1 2 0 0 1 1 1 1 0 0 0 1 0 0 2 2 1 1 0 1 2 1 0 0 1 0 1 2 0 0 2 0 0 2 2 2 2 0 1
However, how can I retrieve the rows of the original data that are in each cluster? For example, how can I get all the rows from the original data that are in cluster #2?
Okay, this should do the trick for, e.g., cluster #2:
data[results$cluster==2,]
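As a follow-up to that indexing trick, base R's split() returns every cluster at once as a named list. A toy sketch that mocks results$cluster instead of rerunning dbscan (0 is dbscan's noise label):

```r
## Toy data standing in for the question's 'data' and 'results'
data <- matrix(rnorm(20), ncol = 2)
results <- list(cluster = c(0, 1, 2, 1, 0, 2, 1, 0, 2, 1))

## Rows of the original data in cluster #2:
cluster2 <- data[results$cluster == 2, ]

## All clusters at once, keyed by cluster label:
by_cluster <- split(as.data.frame(data), results$cluster)
by_cluster[["2"]]   # same rows as 'cluster2', as a data frame
```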

How to sum leading diagonal of table in R

I have a table created using the table() command in R:
y
x 0 1 2 3 4 5 6 7 8 9
0 23 0 0 0 0 1 0 0 0 0
1 0 23 1 0 1 0 1 2 0 2
2 1 1 28 0 0 0 1 0 2 2
3 0 1 0 24 0 1 0 0 0 1
4 1 1 0 0 34 0 3 0 0 0
5 0 0 0 0 0 33 0 0 0 0
6 0 0 0 0 0 2 32 0 0 0
7 0 1 0 1 0 0 0 36 0 1
8 1 1 1 1 0 0 0 1 20 1
9 1 3 0 1 0 1 0 1 0 24
This table shows the results of a classification, and I want to sum its leading diagonal (the one with the large numbers: 23, 23, 28, etc.). Is there a sensible/easy way to do this in R?
How about sum(diag(tbl)), where tbl is your table?
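One caveat worth noting: diag() pairs the i-th row with the i-th column, so this counts correct classifications only when the row and column labels line up in the same order, as they do in the table above. A toy example, including the overall accuracy:

```r
## Toy confusion table analogous to the one above
x <- c(0, 0, 1, 1, 1, 2)   # predicted labels
y <- c(0, 0, 1, 2, 1, 2)   # true labels
tbl <- table(x, y)

correct  <- sum(diag(tbl))     # entries on the leading diagonal: 5
accuracy <- correct / sum(tbl) # 5 of 6 observations, i.e. ~0.833
```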
