I have successfully run the DBSCAN algorithm (here is the stripped down command):
results <- dbscan(data, MinPts = 15, eps = 0.01)
and plotted my clusters:
plot(results, data)
results$cluster returns a vector of integers. The value at each index is the cluster to which the corresponding row of the original data was assigned:
[1] 0 1 2 1 0 0 2 1 0 0 0 1 2 0 2 0 2 0 0 1 2 0 2 2 0 1 2 0 1 0 1 0 2 0 0 0 1 1 0 1 2 0 0 0 1 0 0 1 1 0 1
[52] 0 2 2 0 0 1 2 2 0 2 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 2 2 2 2 2 0 0 0 0 0 2 1 2 1 0 2 0 0 1 1 1 0 0 1
[103] 2 1 1 0 1 0 1 1 0 0 0 0 1 2 0 0 1 1 1 1 0 0 0 1 0 0 2 2 1 1 0 1 2 1 0 0 1 0 1 2 0 0 2 0 0 2 2 2 2 0 1
However, how can I retrieve the values of the original data that are in each cluster? For example, how can I get all the rows of the original data that are in cluster #2?
Okay, this should do the trick for, e.g., cluster #2:
data[results$cluster==2,]
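If you want every cluster at once, here is a minimal sketch, assuming data is the matrix or data frame you passed to dbscan (note that in fpc::dbscan, cluster 0 marks noise points rather than a real cluster):
# Split the original rows by cluster assignment into a named list
clusters <- split(as.data.frame(data), results$cluster)
clusters[["2"]]   # all original rows assigned to cluster #2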
I have a TermDocumentMatrix tdm1 and I pass it through this function call:
comparison.cloud(tdm1, random.order = FALSE,
                 colors = c("#00B2FF", "red", "#FF0099", "#6600CC",
                            "green", "orange", "blue", "brown"),
                 title.size = 1, max.words = 50, scale = c(4, 0.5),
                 rot.per = 0.4)
However, I got the error "Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value".
I am not sure which cex value is invalid.
The tdm1 is as follows:
Docs
Terms anger anticipation disgust fear joy sadness surprise trust
bag 1 0 0 0 0 1 0 0
choices 1 1 0 0 1 2 1 1
limited 1 0 0 0 0 1 0 0
plastic 1 0 0 0 0 1 0 0
provided 1 0 0 0 0 1 0 0
abit 0 1 0 0 1 1 1 1
ai 0 2 0 0 2 1 2 2
always 0 1 0 1 0 0 0 1
amazed 0 1 0 0 1 1 1 1
amount 0 1 0 0 1 0 1 1
app 0 2 0 1 2 1 2 2
area 0 1 0 0 1 0 1 1
areas 0 1 0 0 0 0 0 0
around 0 1 0 0 1 0 1 1
atmosphere 0 1 0 0 1 0 1 1
attended 0 1 0 0 1 1 1 1
back 0 1 0 0 1 1 1 0
basah 0 1 0 1 1 0 1 1
bought 0 1 0 0 0 0 0 0
brands 0 1 0 0 1 0 1 1
bras 0 1 0 1 1 0 1 1
breeze 0 1 0 0 1 1 1 1
buy 0 1 0 1 1 1 1 1
can 0 2 0 0 1 1 1 0
cant 0 1 0 0 0 0 0 0
cashiers 0 1 0 0 1 0 1 1
cbd 0 1 0 0 0 0 0 0
charged 0 1 0 0 1 1 1 1
choose 0 2 0 0 1 0 0 0
chopstick 0 1 0 0 0 0 0 0
classes 0 1 0 0 1 0 0 0
come 0 2 0 0 2 2 2 1
concept 0 4 0 0 3 0 3 3
confused 0 1 0 0 1 1 1 1
contains 0 1 0 0 1 1 1 1
convenient 0 8 0 0 5 1 4 4
cool 0 4 0 0 4 1 4 4
correct 0 1 0 0 1 1 1 1
cream 0 1 0 0 1 0 1 1
cup 0 1 0 0 0 0 0 0
curious 0 1 0 0 0 0 0 0
current 0 1 0 0 1 0 1 1
customer 0 1 0 0 0 0 0 0
cutlery 0 1 0 0 0 0 0 0
doesnt 0 1 0 0 1 0 1 1
dont 0 1 0 0 1 0 1 1
don’t 0 1 0 1 1 1 1 1
download 0 1 0 0 1 1 1 1
drinks 0 1 0 0 1 0 0 0
easy 0 3 0 0 2 0 2 2
eat 0 1 0 0 1 1 1 1
electronic 0 1 0 0 1 1 1 1
eleven 0 1 0 0 1 0 1 1
entering 0 1 0 0 1 1 1 1
ereciept 0 1 0 0 1 1 1 1
especially 0 1 0 0 1 0 1 1
even 0 2 0 2 1 1 1 1
exit 0 1 0 0 1 1 1 1
experience 0 3 0 1 2 1 2 2
explained 0 1 0 0 1 1 1 1
feel 0 1 0 0 1 0 1 1
first 0 1 0 0 1 1 1 1
found 0 1 0 0 1 0 1 1
free 0 1 0 0 1 0 1 1
friends 0 1 0 0 1 1 1 1
fussfree 0 1 0 0 1 0 1 1
gantry 0 1 0 0 1 0 1 1
get 0 2 0 1 2 1 1 1
go 0 3 0 1 3 1 3 3
good 0 2 0 0 2 0 2 2
goods 0 1 0 1 1 1 1 1
goto 0 1 0 0 0 0 0 0
great 0 4 0 0 3 0 3 3
greatly 0 1 0 0 1 0 1 1
hasslefree 0 1 0 0 0 0 0 1
history 0 1 0 0 1 1 1 1
hope 0 1 0 0 1 0 1 1
hour 0 1 0 0 1 0 1 1
hours 0 1 0 0 1 0 1 1
ice 0 1 0 0 1 0 1 1
im 0 1 0 0 1 1 1 0
inside 0 1 0 0 1 1 1 1
items 0 3 0 0 3 2 3 2
jiffy 0 1 0 0 1 1 1 1
just 0 4 0 0 3 2 3 2
large 0 1 0 0 1 0 1 1
leave 0 1 0 0 1 1 1 0
less 0 1 0 0 0 0 0 1
link 0 2 0 0 2 1 2 2
linked 0 1 0 0 1 0 1 1
lots 0 1 0 0 0 0 0 1
love 0 3 0 0 3 1 3 2
lovely 0 1 0 0 1 1 1 1
makes 0 1 0 0 1 0 1 1
making 0 1 0 0 0 0 0 1
many 0 1 0 0 0 0 0 0
method 0 1 0 0 1 0 1 1
methods 0 1 0 0 1 1 1 1
minute 0 2 0 0 1 1 1 2
mrt 0 1 0 1 1 0 1 1
muchneeded 0 1 0 0 1 0 1 1
near 0 1 0 0 1 0 1 1
nearby 0 1 0 0 1 0 1 1
new 0 1 0 0 1 0 1 1
newly 0 1 0 0 0 0 0 0
nice 0 1 0 0 0 0 0 0
noodles 0 1 0 0 0 0 0 0
number 0 1 0 0 1 1 1 1
offers 0 1 0 0 1 0 1 1
often 0 2 0 0 2 2 2 1
opened 0 1 0 0 0 0 0 0
operate 0 1 0 0 1 0 1 1
order 0 1 0 0 1 0 1 1
outlets 0 1 0 0 1 1 1 0
patiently 0 1 0 0 1 1 1 1
pay 0 1 0 0 1 0 1 1
payment 0 3 0 0 3 1 3 3
people 0 1 0 0 0 0 0 0
perfect 0 1 0 0 1 1 1 1
pick 0 4 0 0 3 0 3 3
picking 0 1 0 0 1 1 1 1
prepare 0 1 0 0 0 0 0 0
prices 0 1 0 0 1 0 1 1
product 0 2 0 0 2 2 2 2
products 0 7 0 0 5 1 5 4
promotion 0 1 0 1 1 0 1 1
promotions 0 1 0 0 0 0 0 1
quench 0 1 0 0 1 1 1 1
queue 0 2 0 0 2 1 2 2
quite 0 1 0 0 1 1 1 1
range 0 2 0 0 1 1 1 0
ready 0 1 0 0 1 1 1 1
reasonable 0 1 0 0 1 0 1 1
recieved 0 1 0 0 1 1 1 1
recommend 0 1 0 0 0 0 0 1
reduced 0 1 0 0 1 0 1 1
rush 0 1 0 0 1 1 1 0
salut 0 1 0 0 0 0 0 1
sandwiches 0 1 0 0 1 1 1 1
scan 0 1 0 0 1 0 1 1
see 0 3 0 0 2 0 2 2
seems 0 1 0 0 0 0 0 0
setup 0 1 0 0 0 0 0 1
shop 0 3 0 0 2 1 2 2
shopping 0 4 0 1 4 2 4 4
show 0 1 0 0 1 1 1 1
small 0 1 0 0 0 0 0 0
smu 0 2 0 0 2 0 2 2
snacks 0 3 0 0 3 1 1 1
spent 0 1 0 0 1 0 1 1
staff 0 3 0 1 3 3 3 3
stared 0 1 0 1 1 1 1 1
stop 0 1 0 0 1 1 1 1
store 0 14 0 0 10 5 9 9
stores 0 1 0 0 1 0 1 1
students 0 1 0 0 1 0 0 0
stuff 0 1 0 0 0 0 0 0
super 0 2 0 0 2 0 2 2
sure 0 1 0 0 1 1 1 1
sweets 0 1 0 0 1 0 0 0
take 0 1 0 0 1 1 1 0
technology 0 4 0 0 4 2 4 4
thankfully 0 1 0 0 1 1 1 1
theres 0 1 0 0 0 0 0 0
thirst 0 1 0 0 1 1 1 1
thought 0 1 0 0 0 0 0 0
time 0 2 0 0 2 1 2 2
took 0 1 0 0 0 0 0 1
truly 0 1 0 0 0 0 0 1
unmanned 0 1 0 0 1 1 1 1
use 0 1 0 0 1 0 1 1
used 0 1 0 0 1 0 1 1
useful 0 1 0 0 1 0 0 0
users 0 1 0 0 1 0 1 1
variety 0 4 0 0 4 0 3 3
wait 0 2 0 0 1 0 1 1
waited 0 1 0 0 1 1 1 1
walk 0 2 0 0 1 0 1 1
want 0 1 0 0 0 0 0 0
wanted 0 1 0 0 1 0 1 1
wasnt 0 1 0 0 1 1 1 1
whatever 0 1 0 0 1 0 1 1
wide 0 4 0 0 3 1 2 1
won’t 0 1 0 1 1 1 1 1
worry 0 1 0 1 1 1 1 1
avoid 0 0 0 1 0 0 0 0
away 0 0 0 1 0 0 0 0
better 0 0 0 1 0 0 0 0
choice 0 0 0 1 0 0 0 0
customers 0 0 0 1 0 0 0 0
deceptive 0 0 0 1 0 0 0 0
expired 0 0 0 1 0 0 0 0
listed 0 0 0 1 0 0 0 0
make 0 0 0 1 0 0 0 0
marketing 0 0 0 1 0 0 0 0
minutes 0 0 0 1 0 0 0 0
much 0 0 0 1 0 0 0 0
purchases 0 0 0 1 0 0 0 0
qr 0 0 0 1 0 0 0 0
resulting 0 0 0 1 0 0 0 0
scanner 0 0 0 1 0 0 0 0
screen 0 0 0 1 0 0 0 0
showing 0 0 0 1 0 0 0 0
still 0 0 0 1 0 0 0 0
takes 0 0 0 1 0 0 0 0
tries 0 0 0 1 0 0 0 0
trusted 0 0 0 1 0 0 0 0
trusting 0 0 0 1 0 0 0 0
works 0 0 0 1 0 0 0 0
Hence, I am not sure what the issue is, since there are no NAs!
Hope you can help. Thank you!
I would like to transform Gender and Country using one-hot encoding.
With the code below I cannot create the new dataset so that it includes the ID:
library(caret)
ID<-1:10
Gender<-c("F","F","F","M","M","F","M","M","F","M")
Country<-c("Mali","France","France","Guinea","Senegal",
"Mali","France","Mali","Senegal","France")
data<-data.frame(ID,Gender,Country)
#One hot encoding
dmy <- dummyVars(" ~ Gender + Country", data = data, fullRank = TRUE)
dat_transformed <- data.frame(predict(dmy, newdata = data))
dat_transformed
Gender.M Country.Guinea Country.Mali Country.Senegal
1 0 0 1 0
2 0 0 0 0
3 0 0 0 0
4 1 1 0 0
5 1 0 0 1
6 0 0 1 0
7 1 0 0 0
8 1 0 1 0
9 0 0 0 1
10 1 0 0 0
I want to get a dataset that includes the ID without encoding it:
ID Gender.M Country.Guinea Country.Mali Country.Senegal
1 1 0 0 1 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 1 1 0 0
5 5 1 0 0 1
6 6 0 0 1 0
7 7 1 0 0 0
8 8 1 0 1 0
9 9 0 0 0 1
10 10 1 0 0 0
dat_transformed <- cbind(ID,dat_transformed)
dat_transformed
   ID Gender.M Country.Guinea Country.Mali Country.Senegal
1   1        0              0            1               0
2   2        0              0            0               0
3   3        0              0            0               0
4   4        1              1            0               0
5   5        1              0            0               1
6   6        0              0            1               0
7   7        1              0            0               0
8   8        1              0            1               0
9   9        0              0            0               1
10 10        1              0            0               0
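As an alternative sketch, you could include ID in the dummyVars formula itself; dummyVars only expands factor columns, so a numeric ID should pass through unchanged (this assumes ID stays numeric, as in the example above):
# ID is numeric, so it is kept as-is while Gender and Country are expanded
dmy2 <- dummyVars(" ~ .", data = data, fullRank = TRUE)
data.frame(predict(dmy2, newdata = data))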
I am trying to use a binary matrix containing transactions with the apriori algorithm, but I don't know how to implement it.
data_purchase
Txn Bag Blush Nail.Polish Brushes Concealer Eyebrowpencil Bronzer
1 1 0 1 1 1 1 0 1
2 2 0 0 1 0 1 0 1
3 3 0 1 0 0 1 1 1
4 4 0 0 1 1 1 0 1
5 5 0 1 0 0 1 0 1
6 6 0 0 0 0 1 0 0
7 7 0 1 1 1 1 0 1
8 8 0 0 1 1 0 0 1
9 9 0 0 0 0 1 0 0
10 10 1 1 1 1 0 0 0
11 11 0 0 1 0 0 0 1
12 12 0 0 1 1 1 0 1
The above is the data frame containing the binary matrix.
Have a look at the R package arules at https://cran.r-project.org/package=arules
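A minimal sketch of getting from the data frame above into arules, assuming data_purchase is exactly as printed (Txn is an id column and the remaining columns are 0/1 indicators); the support and confidence thresholds below are purely illustrative:
library(arules)
# Drop the Txn id column, coerce the 0/1 values to logical, and convert
# to the "transactions" class that apriori() operates on
mat   <- as.matrix(data_purchase[, -1]) == 1
trans <- as(mat, "transactions")
# Mine association rules from the transactions
rules <- apriori(trans, parameter = list(supp = 0.2, conf = 0.6))
inspect(rules)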
I have a data set that looks like this:
Person Team
114 1
115 1
116 1
117 1
121 1
122 1
123 1
214 2
215 2
216 2
217 2
221 2
222 2
223 2
"Team" ranges from 1 to 33, and teams vary in terms of size (i.e., there can be 5, 6, or 7 members, depending on the team). I need to create a data set into something that looks like this:
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
The sizes of the individual blocks are given by the number of people in a team. How can I do this in R?
You could use bdiag from the package Matrix. For example:
library(Matrix)
bdiag(matrix(1, ncol = 7, nrow = 7), matrix(1, ncol = 7, nrow = 7))
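To generalize this to all 33 teams without typing one matrix per team, here is a sketch (assuming Matrix is loaded as above and your data frame is called DF, with the Person and Team columns shown earlier):
sizes  <- table(DF$Team)                          # members per team, in team order
blocks <- lapply(as.integer(sizes), function(n) matrix(1, n, n))
M <- as.matrix(bdiag(blocks))                     # the 0/1 block-diagonal matrix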
Another idea, although I guess this is less efficient/elegant than RStudent's:
# Example data: 21 people in 5 teams of varying sizes
DF = data.frame(Person = sample(100, 21), Team = rep(1:5, c(3, 6, 4, 5, 3)))
DF
# Team sizes, in team order
lengths = tapply(DF$Person, DF$Team, length)
# Start from an all-zero square matrix
mat = matrix(0, sum(lengths), sum(lengths))
# For each team, compute the (row, col) indices of its diagonal block,
# offset by the cumulative sizes of the preceding teams, and set them to 1
mat[do.call(rbind,
            mapply(function(a, b) arrayInd(seq_len(a ^ 2), c(a, a)) + b,
                   lengths, cumsum(c(0, lengths[-length(lengths)])),
                   SIMPLIFY = FALSE))] = 1
mat
I have a table created using the table() command in R:
y
x 0 1 2 3 4 5 6 7 8 9
0 23 0 0 0 0 1 0 0 0 0
1 0 23 1 0 1 0 1 2 0 2
2 1 1 28 0 0 0 1 0 2 2
3 0 1 0 24 0 1 0 0 0 1
4 1 1 0 0 34 0 3 0 0 0
5 0 0 0 0 0 33 0 0 0 0
6 0 0 0 0 0 2 32 0 0 0
7 0 1 0 1 0 0 0 36 0 1
8 1 1 1 1 0 0 0 1 20 1
9 1 3 0 1 0 1 0 1 0 24
This table shows the results of a classification, and I want to sum its leading diagonal (the one with the large numbers, like 23, 23, 28, etc.). Is there a sensible/easy way to do this in R?
How about sum(diag(tbl)), where tbl is your table?
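For instance, with a confusion matrix built from vectors of actual and predicted labels (assuming yours were called x and y, as the table header suggests):
tbl <- table(x, y)               # rows = actual, columns = predicted
correct  <- sum(diag(tbl))       # total on the leading diagonal
accuracy <- correct / sum(tbl)   # fraction classified correctly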