How to show the cluster assignment in each cluster

Is there a way to show the members of each cluster after the cutree step in R?
For example:
tree <- hclust(dist, method='single')
plot(tree, hang=-1, cex=0.8)
cutree(tree, h=18)
I obtain something like:
X10100 X3755 X13068 X264 X13216
1 1 2 2 3
X8379 X13727 X9925 X13849 X467
3 4 4 5 5
X14265 X388 X14426 X8246 X14961
6 6 7 7 8
X17037 X1200 X844 X13024 X155
8 9 9 10 11
I want to print it in a more straightforward way,
such as:
cluster 1: 10100,03755
cluster 2: ..........
How can I do it? Thanks!

You can group the results using split or by:
hh <- cutree(tree, h=18)
split(names(hh),hh)
Or
by(names(hh),hh,paste,collapse=',')
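As a minimal sketch of the split approach that anyone can run, the example below clusters the first ten rows of the built-in USArrests dataset and prints one line per cluster. The dataset, the choice of k = 3, and the removal of a leading "X" from labels are assumptions made for illustration; the original distance object is not shown in the question.

```r
# Illustrative data: first ten rows of the built-in USArrests dataset
# (an assumption; substitute your own distance object).
d <- dist(USArrests[1:10, ])
tree <- hclust(d, method = "single")

# Cut the tree into 3 clusters; cutree returns a named integer vector.
hh <- cutree(tree, k = 3)

# Group the labels by cluster number, dropping a leading "X" if present
# (only relevant for labels such as X10100).
members <- split(sub("^X", "", names(hh)), hh)

# Print one line per cluster in the form "cluster 1: a,b,c".
for (k in names(members)) {
  cat("cluster ", k, ": ", paste(members[[k]], collapse = ","), "\n", sep = "")
}
```

With cutree(tree, h=18) in place of k = 3, the same split and loop apply unchanged.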

Related

Frequency distribution using binCounts

I have a dataset of customer ages, and I want to build a frequency distribution with bins 9 years wide.
Ages=c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)
My desired outcome is similar to the table below; the variable names can differ.
Could I use binCounts for this? If so, could you show me how, since I am not sure what bx and idxs mean in its signature?
binCounts(x, idxs = NULL, bx, right = FALSE) ??
Age Count
38-46 3
47-55 7
56-64 7
65-73 14
74-82 10
83-91 6
92-100 3
Much Appreciated!

I don't know binCounts or the package it is in, but here is a base R solution:
data.frame(table(cut(Ages,0:7*9+37)))
Var1 Freq
1 (37,46] 3
2 (46,55] 7
3 (55,64] 7
4 (64,73] 14
5 (73,82] 10
6 (82,91] 6
7 (91,100] 3
To exactly duplicate your results:
lowerlimit=c(37,46,55,64,73,82,91,100)
Labels=paste(head(lowerlimit,-1)+1,lowerlimit[-1],sep="-") # add one to each lower break to get 38, 47, etc.
group=cut(Ages,lowerlimit,Labels) # determine which group each age belongs to
tab=table(group) # form a frequency table
as.data.frame(tab) # transform the table into a data frame
group Freq
1 38-46 3
2 47-55 7
3 56-64 7
4 65-73 14
5 74-82 10
6 83-91 6
7 92-100 3
All this can be combined as:
data.frame(table(cut(Ages,s<-0:7*9+37,paste(head(s+1,-1),s[-1],sep="-"))))
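The same idea can be written so that the breaks are derived from the data and the bin width instead of being typed by hand. This is only a sketch: the names width, n_bins, breaks, and freq are chosen here for illustration, and it assumes integer ages with the first bin starting at the minimum age.

```r
Ages <- c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
          70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
          84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)

width <- 9  # bin width in years

# Number of bins needed to cover min(Ages)..max(Ages).
n_bins <- ceiling((max(Ages) - min(Ages) + 1) / width)

# Breaks start one below the minimum, so the first bin (37,46] reads as 38-46.
breaks <- min(Ages) - 1 + width * 0:n_bins

# Labels of the form "38-46", "47-55", ...
labels <- paste(head(breaks, -1) + 1, breaks[-1], sep = "-")

# Frequency table as a data frame with columns Age and Count.
freq <- as.data.frame(table(Age = cut(Ages, breaks, labels)),
                      responseName = "Count")
print(freq)
```

Changing width regenerates both the breaks and the labels, so the two cannot drift apart.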

R - set bucket from a mapper data frame

Probably a similar situation has already been solved but I could not find it.
I have a mapper data frame like the following
mapper
bucket_label bucket_no
1 (-Inf; 9.99) 1
2 (25.01; 29.99) 1
3 (29.99; 30.01) 1
4 (30.01; Inf) 1
5 (19.99; 20.01) 2
6 (20.01; 24.99) 2
7 (24.99; 25.01) 2
8 (9.99; 10.11) 3
9 (10.11; 14.99) 3
10 (14.99; 15.01) 3
11 (15.01; 19.99) 3
and a vector x with random data
x <- rnorm(100)*100
I need a quick way to assign the corresponding bucket to each entry of x, and findInterval and cut do not seem to help here.
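One possible approach, sketched under assumptions, is to sort the intervals by their lower bound, let findInterval return the position of each value among the sorted cut points, and then look up the bucket number for that position. The sketch assumes that the labels are strings of the form "(lower; upper)", that the intervals tile the real line without gaps, and that treating each interval as closed on the left is acceptable; the helper name assign_bucket is invented for illustration.

```r
# The mapper from the question, with labels stored as character strings.
mapper <- data.frame(
  bucket_label = c("(-Inf; 9.99)", "(25.01; 29.99)", "(29.99; 30.01)",
                   "(30.01; Inf)", "(19.99; 20.01)", "(20.01; 24.99)",
                   "(24.99; 25.01)", "(9.99; 10.11)", "(10.11; 14.99)",
                   "(14.99; 15.01)", "(15.01; 19.99)"),
  bucket_no = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Extract the lower bound of every interval from its label.
lower <- as.numeric(sub("^\\(([^;]+);.*$", "\\1", mapper$bucket_label))

# Sort the intervals by lower bound; the finite lower bounds are the cut points.
ord <- order(lower)
cut_points <- lower[ord][-1]
bucket_by_interval <- mapper$bucket_no[ord]

# Hypothetical helper: findInterval gives 0 below the first cut point,
# so adding 1 yields the index of the sorted interval containing x.
assign_bucket <- function(x) {
  bucket_by_interval[findInterval(x, cut_points) + 1]
}

assign_bucket(c(-50, 10, 22, 27, 100))
```

Because findInterval is vectorised, the lookup stays fast for long vectors such as x <- rnorm(100)*100.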

Plot empty groups in boxplot

I want to plot a lot of boxplots in one particular style to compare them.
But when a group is empty, the group isn't plotted.
Let's say I have a data frame:
a b
1 1 5
2 1 4
3 1 6
4 1 4
5 2 9
6 2 8
7 2 9
8 3 NaN
9 3 NaN
10 3 NaN
11 4 2
12 4 8
and I use boxplot to plot it:
boxplot(b ~ a , df)
then I get the plot without group 3
(which I can't show because I don't have "10 reputation").
I found some solutions for removing empty groups via Google, but my problem is the other way around.
I also found a solution using at=c(1,2,4), but since I generate the R script with Python and different groups may be empty, I would prefer that the groups aren't dropped at all.
I don't think I have the time to grapple with additional packages, so I would be thankful for solutions without them.

You can get the group on the x-axis by
boxplot(b ~ a , df, na.action=na.pass)
Or
boxplot(b~factor(a), df)
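The effect of the factor version can be checked without drawing anything by passing plot = FALSE and inspecting the group names that boxplot would use. This is a small sketch with the data frame from the question rebuilt by hand; the names res_default and res_factor are made up for illustration.

```r
# Data frame from the question; group 3 contains only NaN.
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  b = c(5, 4, 6, 4, 9, 8, 9, NaN, NaN, NaN, 2, 8)
)

# Default call: rows with a missing b are dropped before grouping,
# so group 3 disappears entirely.
res_default <- boxplot(b ~ a, df, plot = FALSE)

# Converting a to a factor inside the formula keeps level 3 even though
# it has no usable observations, so an empty slot is reserved for it.
res_factor <- boxplot(b ~ factor(a), df, plot = FALSE)

res_default$names
res_factor$names
```

Dropping plot = FALSE draws the plot with an empty position for group 3 on the x-axis.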

How to plot using multiple criteria in R?

Following are first 15 rows of my data:
> head(df,15)
frame.group class lane veh.count mean.speed
1 [22,319] 2 5 9 23.40345
2 [22,319] 2 4 9 24.10870
3 [22,319] 2 1 11 14.70857
4 [22,319] 2 3 8 20.88783
5 [22,319] 2 2 6 16.75327
6 (319,616] 2 5 15 22.21671
7 (319,616] 2 2 16 23.55468
8 (319,616] 2 3 12 22.84703
9 (319,616] 2 4 14 17.55428
10 (319,616] 2 1 13 16.45327
11 (319,616] 1 1 1 42.80160
12 (319,616] 1 2 1 42.34750
13 (616,913] 2 5 18 30.86468
14 (319,616] 3 3 2 26.78177
15 (616,913] 2 4 14 32.34548
'frame.group' contains time intervals, 'class' is the vehicle class (1=motorcycles, 2=cars, 3=trucks), and 'lane' contains lane numbers. I want to create 3 scatter plots, one per class, with frame.group on the x-axis and mean.speed on the y-axis. Within the plot for one vehicle class, e.g. cars, I want 5 panels, one for each lane. I tried the following:
cars <- subset(df, class==2)
by(cars, lane, FUN = plot(frame.group, mean.speed))
There are two problems:
1) R does not plot as expected, i.e. 5 plots for the 5 different lanes.
2) Only one plot is drawn, and it is a box plot, probably because the x-axis holds intervals instead of numbers.
How can I fix the above issues? Please help.

Each time a new plot command is issued, R replaces the existing plot with the new one. You can create a grid of plots with par(mfrow=c(1,5)), which gives 1 row of 5 plots; other values give other numbers of rows and columns. If you want a scatterplot instead of a boxplot, you can use plot.default.
It is easier to do all this with the ggplot2 library instead of the base graphics, and the resulting plot will look much nicer:
library(ggplot2)
ggplot(cars,aes(x=frame.group,y=mean.speed))+geom_point()+facet_wrap(~lane)
See the ggplot2 documentation for more details: http://docs.ggplot2.org/current/
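For the base-graphics route, a rough sketch of the par(mfrow=...) and plot.default combination could look like the following. The cars data frame is rebuilt by hand from the class 2 rows shown in the question, and converting the interval factor to integer positions with a relabelled axis is one assumed way of forcing points instead of boxes.

```r
# Class 2 (cars) rows taken from the sample shown in the question.
cars <- data.frame(
  frame.group = factor(
    c(rep("[22,319]", 5), rep("(319,616]", 5), rep("(616,913]", 2)),
    levels = c("[22,319]", "(319,616]", "(616,913]")
  ),
  lane = c(5, 4, 1, 3, 2, 5, 2, 3, 4, 1, 5, 4),
  mean.speed = c(23.40345, 24.10870, 14.70857, 20.88783, 16.75327,
                 22.21671, 23.55468, 22.84703, 17.55428, 16.45327,
                 30.86468, 32.34548)
)

lanes <- sort(unique(cars$lane))
n_groups <- nlevels(cars$frame.group)

# One row of panels, one panel per lane; keep the old settings to restore them.
old_par <- par(mfrow = c(1, length(lanes)))

for (ln in lanes) {
  d <- cars[cars$lane == ln, ]
  # Plot the intervals at integer positions so points are drawn, not boxes.
  plot.default(as.integer(d$frame.group), d$mean.speed,
               xlim = c(1, n_groups), ylim = range(cars$mean.speed),
               xaxt = "n", xlab = "frame.group", ylab = "mean.speed",
               main = paste("lane", ln))
  # Label the integer positions with the original interval names.
  axis(1, at = seq_len(n_groups), labels = levels(cars$frame.group))
}

par(old_par)
```

Sharing xlim and ylim across panels keeps the lanes directly comparable, which facet_wrap does by default in the ggplot2 version.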

How to put information obtained by cast function of reshape package back in my original data frame in R

I have a data.frame in panel format (country-year) and I need to calculate the mean of a variable by country for each five-year period. So I used the 'cast' function from the 'reshape' package and it worked. Now I need to put this information (the mean by quinquennium) back in the original data.frame, so I can run some regressions. How can I do that? Below I provide an example to illustrate what I want:
set.seed(2)
fake= data.frame(y=rnorm(20), x=rnorm(20), country=rep(letters[1:2], each=10), year=rep(1:10,2), quinquenio= rep(rep(1:2, each=5),2))
fake.m = melt.data.frame(fake, id.vars=c("country", "year", "quinquenio"))
cast(fake.m, country ~ quinquenio, mean, subset=variable=="x", na.rm=T)
Now, everything is fine and I get what I wanted: the mean by country and by quinquennium. Now, I would like to put it back in the data.frame fake, like this:
y x country year quinquenio mean.x
1 -0.89691455 2.090819205 a 1 1 0.8880242
2 0.18484918 -1.199925820 a 2 1 0.8880242
3 1.58784533 1.589638200 a 3 1 0.8880242
4 -1.13037567 1.954651642 a 4 1 0.8880242
5 -0.08025176 0.004937777 a 5 1 0.8880242
6 0.13242028 -2.451706388 a 6 2 -0.2978375
7 0.70795473 0.477237303 a 7 2 -0.2978375
8 -0.23969802 -0.596558169 a 8 2 -0.2978375
9 1.98447394 0.792203270 a 9 2 -0.2978375
10 -0.13878701 0.289636710 a 10 2 -0.2978375
11 0.41765075 0.738938604 b 1 1 0.2146461
12 0.98175278 0.318960401 b 2 1 0.2146461
13 -0.39269536 1.076164354 b 3 1 0.2146461
14 -1.03966898 -0.284157720 b 4 1 0.2146461
15 1.78222896 -0.776675274 b 5 1 0.2146461
16 -2.31106908 -0.595660499 b 6 2 -0.8059598
17 0.87860458 -1.725979779 b 7 2 -0.8059598
18 0.03580672 -0.902584480 b 8 2 -0.8059598
19 1.01282869 -0.559061915 b 9 2 -0.8059598
20 0.43226515 -0.246512567 b 10 2 -0.8059598
I appreciate any tip in the right direction. Thanks in advance.
ps.: the reason I need this is that I'll run a regression with quinquennial data, and for some variables (like per capita income) I have information for all years, so I decided to average them by 5 years.

I'm sure there's an easy way to do this with reshape, but my brain defaults to plyr first:
require(plyr)
ddply(fake, c("country", "quinquenio"), transform, mean.x = mean(x))
This is quite hackish, but one way to use reshape building off your earlier work:
zz <- cast(fake.m, country ~ quinquenio, mean, subset=variable=="x", na.rm=T)
merge(fake, melt(zz), by = c("country", "quinquenio"))
though I'm positive there has to be a better solution.

Here's a more old-school approach using ave and with; ave returns each row's group mean in the original row order, so it can be assigned directly:
fake$mean.x <- with(fake, ave(x, country, quinquenio))
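For a version that needs no extra packages, the sketch below rebuilds the example data and attaches the group mean with ave, then cross-checks it against an aggregate plus merge. The column name mean.x.check and the object names means and check are assumptions introduced only for this illustration.

```r
# Example data from the question.
set.seed(2)
fake <- data.frame(y = rnorm(20), x = rnorm(20),
                   country = rep(letters[1:2], each = 10),
                   year = rep(1:10, 2),
                   quinquenio = rep(rep(1:2, each = 5), 2))

# ave() returns a vector as long as x in which every element is replaced
# by the mean of its (country, quinquenio) group, in the original row order.
fake$mean.x <- ave(fake$x, fake$country, fake$quinquenio)

# Cross-check: compute one mean per group, then merge it back by the keys.
means <- aggregate(x ~ country + quinquenio, data = fake, FUN = mean)
names(means)[3] <- "mean.x.check"
check <- merge(fake, means, by = c("country", "quinquenio"))

head(fake)
```

The ave route keeps the row order of fake untouched, whereas merge may reorder the rows by the key columns.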
