I have the following data set in test.csv (6500 rows)...
Field 1 Field 2 Field 3 Field 4 Field 5 Field 6 Field 7 Field 8 Field 9 Field 10 Field 11 Field 12 Field 13 Field 14 Field 15 Field 16 Field 17 Field 18
1 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
2 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
3 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
4 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
5 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
6 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
7 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
8 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
9 1 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1
I need to recreate the layout and format of this heatmap.
The legend and y axis label (Rows / observations) look like they were added outside of R, but so far, here is what I have done...
library(RColorBrewer)
# Read in dataset
df <- read.csv("test2.csv")
# Set as integer matrix
m <- as.matrix(df[, -1])
# Create colors
blackgrey <- c("black", "grey")
pal <- colorRampPalette(blackgrey)(100)
# Create heatmap
heatmap(m, Rowv = NA, Colv = NA, scale="column", col = pal)
Which yielded...
How do I change the y-axis tick labels so that, instead of labelling every row starting from 1, only every 2000th row is labelled? Also, can I move the axis to the left, as in the heatmap I am trying to replicate?
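One possible way to get sparser y-axis labels on the left is to draw the matrix with base R's image() and add the axes by hand with axis(). This is only a sketch under assumptions: the matrix m below is simulated stand-in data rather than the contents of the CSV file, and the column scaling done by heatmap(scale = "column") is not reproduced.

```r
# Simulated stand-in for the 6500 x 18 matrix of 0s and 1s (assumption)
set.seed(1)
m <- matrix(rbinom(6500 * 18, 1, 0.5), nrow = 6500)

pal <- colorRampPalette(c("black", "grey"))(100)

# Label only every 2000th row
ticks <- seq(2000, nrow(m), by = 2000)

# image() expects z with dimensions length(x) by length(y), hence t(m)
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(m),
      col = pal, axes = FALSE, xlab = "", ylab = "Rows / observations")

# side = 2 puts the axis on the left; side = 1 is the bottom
axis(2, at = ticks, labels = ticks, las = 1)
axis(1, at = seq_len(ncol(m)), labels = paste("Field", seq_len(ncol(m))), las = 2)
```

Drawing the axes manually gives full control over tick positions, which heatmap() does not expose directly.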
I have this puzzle to solve.
This is the given data:
# A tibble: 351 x 3
# Groups: expcode [?]
expcode rank distributpermm.3
<chr> <int> <dbl>
1 ER02 1 892.325
2 ER02 2 694.030
3 ER02 3 917.110
4 ER02 4 991.475
5 ER02 5 1487.210
6 ER02 6 892.325
7 ER02 7 694.030
8 ER02 8 1710.290
9 ER02 9 1090.620
10 ER02 10 1288.915
# ... with 341 more rows
When I call table on this data like this:
table(ranktab$expcode, ranktab$rank)
I get an ordinary table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
ER02 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER03 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER04 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER05 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER07 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ER11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ER16 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
ER18 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ER19 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER22 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER23 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER26 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Now I would like to get a matrix that looks like the table above, but instead of the count of cases I would like each cell to hold the value of the third variable in the data frame. If a cell has two observations, it should hold their mean.
Let's consider that your initial data is in df dataframe
df1 <- with(df, aggregate(distributpermm.3, by = list(expcode, rank), mean))
colnames(df1) <- colnames(df)
#this will give you final output in the desired format
xtabs(distributpermm.3 ~ expcode + rank, df1)
Hope this helps!
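As an alternative sketch, tapply() can build the same kind of matrix in one step. The small data frame below is hypothetical and only borrows the column names from the question. Unlike xtabs(), which shows 0 for combinations that do not occur, tapply() leaves them as NA.

```r
# Hypothetical sample with one duplicated (expcode, rank) pair
df <- data.frame(
  expcode = c("ER02", "ER02", "ER02", "ER03", "ER03"),
  rank = c(1, 1, 2, 1, 3),
  distributpermm.3 = c(892.325, 694.030, 917.110, 991.475, 1487.210)
)

# Rows are expcode, columns are rank, cells hold the mean of the third variable
mat <- with(df, tapply(distributpermm.3, list(expcode, rank), mean))
mat
```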
If you just want to obtain the mean of distributpermm.3 for each expcode, you can use the aggregate function.
Try this:
expcode = c (rep ("ER02", 3), rep ("ER03", 4), "ER04", rep ("ER05", 2))
rank = c (1, 2, 3, 1, 2, 3, 4, 1, 1, 2)
distributpermm.3 = c (892.325, 694.030, 917.110, 991.475, 1487.210, 892.325, 694.030, 1710.290, 1090.620, 1288.915)
data = data.frame (expcode, rank, distributpermm.3)
res = aggregate (data [, 3], list (data$expcode), mean)
colnames (res) = c ("expcode", "mean (distributpermm.3)")
res
# > res
# expcode mean (distributpermm.3)
# 1 ER02 834.4883
# 2 ER03 1016.2600
# 3 ER04 1710.2900
# 4 ER05 1189.7675
If you want to keep the rank variable in some way, please clarify what you want to obtain.
I have a directory matrix that contains a series of text file matrices of 0s and 1s of varying sizes which look like:
txt.1
0 1 0
1 1 1
0 0 1
txt.2
1 1 0
0 1 1
txt.3
1 1 1 1
0 1 0 1
0 0 0 0
I am trying to create a larger block-diagonal matrix from these smaller matrices that replaces all the values in the smaller matrices with 0 and fills the remaining spaces outside the blocks with 1s, so that the final result looks like:
print(bigmatrix)
0 0 0 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1 1
1 1 1 0 0 0 1 1 1 1
1 1 1 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0
1 1 1 1 1 1 0 0 0 0
1 1 1 1 1 1 0 0 0 0
1 1 1 1 1 1 0 0 0 0
Is there some way to use bdiag or some other function here? I have only been able to get bdiag to fill in everything with 0s.
You don't need to know the elements of each small matrix. Just create N square matrices filled with 1s, each with dimension max(dim(mx)), where mx is the corresponding small matrix:
library(Matrix)  # provides bdiag()
m1 = matrix(1,3,3)
m2 = matrix(1,3,3)
m3 = matrix(1,4,4)
lst = list(m1,m2,m3)
print(lst)
m0 = as.matrix(bdiag(lst))
m0 = ifelse(m0 == 0, 1, 0)
View(m0)
Result:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 0 0 1 1 1 1 1 1 1
2 0 0 0 1 1 1 1 1 1 1
3 0 0 0 1 1 1 1 1 1 1
4 1 1 1 0 0 0 1 1 1 1
5 1 1 1 0 0 0 1 1 1 1
6 1 1 1 0 0 0 1 1 1 1
7 1 1 1 1 1 1 0 0 0 0
8 1 1 1 1 1 1 0 0 0 0
9 1 1 1 1 1 1 0 0 0 0
10 1 1 1 1 1 1 0 0 0 0
This method works:
library(Matrix)
library(MASS)
structural0<-lapply(dir(), function(x){as.matrix(read.table(x))})
structural0<-lapply(structural0,function(x){ifelse(x==0,1,1)})
structural0<-bdiag(structural0)
write.matrix(structural0, file="structural0.txt")
structural0a<-as.matrix(read.table("structural0.txt"))
structural0a<-ifelse(structural0a==0,1,0)
write.matrix(structural0a, file="structural0a.txt")
However, I wonder if there is a more efficient way of doing it. Thank you.
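A possibly simpler route, avoiding the round trip through text files, is to compare the block membership of rows and columns directly with outer(). The sketch below assumes the small matrices are already in a list (here three hypothetical zero-filled stand-ins named mats, sized like the blocks in the printed result) and uses only base R.

```r
# Hypothetical stand-ins for the matrices read from the text files
mats <- list(matrix(0, 3, 3), matrix(0, 3, 3), matrix(0, 4, 4))

# Block index of every row and every column of the big matrix
row_block <- rep(seq_along(mats), sapply(mats, nrow))
col_block <- rep(seq_along(mats), sapply(mats, ncol))

# 0 where row and column belong to the same block, 1 everywhere else
bigmatrix <- outer(row_block, col_block, "!=") * 1L
bigmatrix
```

Because only the dimensions of each small matrix are used, their contents never need to be overwritten or written back to disk.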
I have a data set that looks like this:
Person Team
114 1
115 1
116 1
117 1
121 1
122 1
123 1
214 2
215 2
216 2
217 2
221 2
222 2
223 2
"Team" ranges from 1 to 33, and teams vary in size (a team can have 5, 6, or 7 members). I need to turn this data set into something that looks like this:
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
The sizes of the individual blocks are given by the number of people in a team. How can I do this in R?
You could use bdiag from the Matrix package. For example:
library(Matrix)
bdiag(matrix(1,ncol=7,nrow=7),matrix(1,ncol=7,nrow=7))
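The call above hard-codes two 7 x 7 blocks. A minimal sketch of sizing the blocks from the data instead might look like the following; the data frame DF is hypothetical, and its rows are assumed to be sorted by Team.

```r
library(Matrix)  # provides bdiag()

# Hypothetical data: three teams of sizes 3, 2 and 4, sorted by Team
DF <- data.frame(Person = 101:109, Team = rep(1:3, c(3, 2, 4)))

# One all-ones square block per team, sized by the team's head count
sizes <- as.vector(table(DF$Team))
blocks <- lapply(sizes, function(n) matrix(1, n, n))

# bdiag() accepts a list of matrices; as.matrix() converts the sparse result
team_mat <- as.matrix(bdiag(blocks))
team_mat
```

Under the same sorting assumption, outer(DF$Team, DF$Team, "==") * 1 gives the same pattern without the Matrix package.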
Another idea, although, I guess this is less efficient/elegant than RStudent's:
DF = data.frame(Person = sample(100, 21), Team = rep(1:5, c(3,6,4,5,3)))
DF
lengths = tapply(DF$Person, DF$Team, length)
mat = matrix(0, sum(lengths), sum(lengths))
mat[do.call(rbind,
            mapply(function(a, b) arrayInd(seq_len(a ^ 2), c(a, a)) + b,
                   lengths, cumsum(c(0, lengths[-length(lengths)])),
                   SIMPLIFY = FALSE))] = 1
mat
I want to merge each row of the data.frame my.samples to another data.frame my.template to obtain the desired.result.
The template my.template could be created with expand.grid. So, even though this is a minimal example the output data set desired.result is still large.
I have posted below several attempts that did not work and one attempt that does work. However, the code that works seems overly complex.
Thank you for any advice. I prefer base R. There are numerous other posts about merging data frames. I looked at quite a few, but did not see this scenario addressed. Sorry if I overlooked it.
my.samples <- read.table(text = '
obs X1 X2 X3 z
1 2 1 0 1
2 0 0 0 1
3 0 1 2 1
', header = TRUE)
my.template <- read.table(text = '
X1 X2 X3
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
', header = TRUE)
desired.result <- read.table(text = '
obs X1 X2 X3 z
1 0 0 0 0
1 0 0 1 0
1 0 0 2 0
1 0 1 0 0
1 0 1 1 0
1 0 1 2 0
1 0 2 0 0
1 0 2 1 0
1 0 2 2 0
1 1 0 0 0
1 1 0 1 0
1 1 0 2 0
1 1 1 0 0
1 1 1 1 0
1 1 1 2 0
1 1 2 0 0
1 1 2 1 0
1 1 2 2 0
1 2 0 0 0
1 2 0 1 0
1 2 0 2 0
1 2 1 0 1
1 2 1 1 0
1 2 1 2 0
1 2 2 0 0
1 2 2 1 0
1 2 2 2 0
2 0 0 0 1
2 0 0 1 0
2 0 0 2 0
2 0 1 0 0
2 0 1 1 0
2 0 1 2 0
2 0 2 0 0
2 0 2 1 0
2 0 2 2 0
2 1 0 0 0
2 1 0 1 0
2 1 0 2 0
2 1 1 0 0
2 1 1 1 0
2 1 1 2 0
2 1 2 0 0
2 1 2 1 0
2 1 2 2 0
2 2 0 0 0
2 2 0 1 0
2 2 0 2 0
2 2 1 0 0
2 2 1 1 0
2 2 1 2 0
2 2 2 0 0
2 2 2 1 0
2 2 2 2 0
3 0 0 0 0
3 0 0 1 0
3 0 0 2 0
3 0 1 0 0
3 0 1 1 0
3 0 1 2 1
3 0 2 0 0
3 0 2 1 0
3 0 2 2 0
3 1 0 0 0
3 1 0 1 0
3 1 0 2 0
3 1 1 0 0
3 1 1 1 0
3 1 1 2 0
3 1 2 0 0
3 1 2 1 0
3 1 2 2 0
3 2 0 0 0
3 2 0 1 0
3 2 0 2 0
3 2 1 0 0
3 2 1 1 0
3 2 1 2 0
3 2 2 0 0
3 2 2 1 0
3 2 2 2 0
', header = TRUE)
# this works for one obs at a time
merge(my.samples[1,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
# this does not work
apply(my.samples, 1, function(x) merge(x, my.template, by=c('X1', 'X2', 'X3'), all=TRUE))
# this does not work
my.output <- matrix(0, nrow=(3^3 * max(my.samples$obs)), ncol=5)
for(i in 1:max(desired.result$obs)) {
  x <- merge(my.samples[i,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
  my.output[((i-1) * 3^3 +1) : ((i-1) * 3^3 + 3^3), 1:5] <- x
}
# this works
for(i in 1:max(desired.result$obs)) {
  x <- merge(my.samples[i,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
  x$obs <- i
  x$z[is.na(x$z)] <- 0
  if(i == 1) {my.output = x}
  if(i > 1) {my.output = rbind(my.output, x)}
}
my.output
all.equal(my.output[1:3], desired.result[,2:4])
I believe this should work
#expand template
full <- do.call(rbind, lapply(unique(my.samples$obs),
                              function(x) cbind(obs=x, my.template)))
#merge
result <- merge(full, my.samples, all.x=TRUE)
#change NA's to 0
result$z[is.na(result$z)]<-0
#> all(result==desired.result)
#[1] TRUE
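The same idea can be sketched on a smaller case, with the template built by expand.grid as the question suggests. Everything below is a hypothetical toy example (two variables with values 0 and 1); it relies on merge() returning the Cartesian product when the two data frames share no column names.

```r
# Toy template: expand.grid varies its first argument fastest,
# so list X2 first and reorder to get X1 as the slowest column
template <- expand.grid(X2 = 0:1, X1 = 0:1)[, c("X1", "X2")]

# Toy samples: one observed combination per obs
samples <- data.frame(obs = 1:2, X1 = c(1L, 0L), X2 = c(0L, 0L), z = 1)

# Cross join: every obs paired with every template row
full <- merge(data.frame(obs = samples$obs), template)

# Left join on the shared columns (obs, X1, X2), then fill unmatched rows with 0
result <- merge(full, samples, all.x = TRUE)
result$z[is.na(result$z)] <- 0
result
```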
I like the answer posted by @MrFlick, but when I added another column to my.samples I discovered that I had to modify the code. Below is what I came up with.
my.samples <- read.table(text = '
obs X1 X2 X3 z aa
1 2 1 0 1 20
2 0 0 0 1 -10
3 0 1 2 1 10
', header = TRUE)
my.template <- read.table(text = '
X1 X2 X3
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
', header = TRUE)
obs.aa <- my.samples[, c(1, ncol(my.samples))]
my.template2 <- merge(my.template, obs.aa)
my.template3 <- merge(my.template2, my.samples, by=c('obs', 'aa', paste0('X', 1:(ncol(my.samples)-3))), all = TRUE)
my.template3$z[is.na(my.template3$z)] <- 0
my.template3
I have successfully run the DBSCAN algorithm (here is the stripped down command):
results <- dbscan(data,MinPts=15, eps=0.01)
and plotted my clusters:
plot(results, data)
results$cluster returns a numeric vector. The value at each index is the cluster to which the corresponding row of the original data belongs (0 marks noise points):
[1] 0 1 2 1 0 0 2 1 0 0 0 1 2 0 2 0 2 0 0 1 2 0 2 2 0 1 2 0 1 0 1 0 2 0 0 0 1 1 0 1 2 0 0 0 1 0 0 1 1 0 1
[52] 0 2 2 0 0 1 2 2 0 2 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 2 2 2 2 2 0 0 0 0 0 2 1 2 1 0 2 0 0 1 1 1 0 0 1
[103] 2 1 1 0 1 0 1 1 0 0 0 0 1 2 0 0 1 1 1 1 0 0 0 1 0 0 2 2 1 1 0 1 2 1 0 0 1 0 1 2 0 0 2 0 0 2 2 2 2 0 1
However, how can I retrieve the values of the original data that is in each cluster? For example, how can I get all the values from the original data that are in cluster #2?
Okay, this should do the trick for, e.g., cluster #2:
data[results$cluster==2,]
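To illustrate the indexing without running DBSCAN, here is a small sketch in which a hand-written vector named cluster stands in for results$cluster and data is a hypothetical five-row data frame. It also shows split(), which returns the rows of every cluster at once.

```r
# Hypothetical data and cluster labels (0 = noise), standing in for results$cluster
data <- data.frame(x = c(0.1, 0.2, 5.0, 5.1, 9.9),
                   y = c(0.1, 0.3, 5.2, 5.0, 0.0))
cluster <- c(1, 1, 2, 2, 0)

# Rows of the original data assigned to cluster 2
cluster2 <- data[cluster == 2, ]

# All clusters at once: a list of data frames named by cluster label
by_cluster <- split(data, cluster)
by_cluster[["2"]]
```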