Sunburst diagram with sunburstR package in R - r

I am using the sunburstR package to create a sunburst diagram but it is not working and I am not sure what I am doing wrong.
Raw data:
> sequences
V1
1 A-aa-aaa-end
2 A-aa-aaa-end
3 A-aa-vvv-end
4 A-aa-vvv-end
5 A-cc-vvv-end
6 A-cc-vvv-end
7 B-aa-vvv-end
8 B-aa-vvv-end
9 B-bb-rr-end
10 B-bb-rr-end
11 C-aa-rr-end
12 C-aa-rr-end
13 C-bb-rr-end
14 C-bb-rr-end
15 C-cc-rr-end
Code:
sequences <- read.csv(filepath, header=F ,stringsAsFactors = FALSE)
sunburst(sequences)

You need some values in the second column of your data frame...
sequences <- read.table(text = '
A-aa-aaa-end
A-aa-aaa-end
A-aa-vvv-end
A-aa-vvv-end
A-cc-vvv-end
A-cc-vvv-end
B-aa-vvv-end
B-aa-vvv-end
B-bb-rr-end
B-bb-rr-end
C-aa-rr-end
C-aa-rr-end
C-bb-rr-end
C-bb-rr-end
C-cc-rr-end
')
sequences$V2 <- seq_along(sequences$V1)
sequences
library(sunburstR)
sunburst(sequences)

You are missing the count part. Try sunburst(data.frame(table(sequences$V1))) and it should work as expected.
PS : not tested without the sequences dataframe.
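As a minimal sketch of the counting step suggested above: the sequences are tabulated so that each unique path gets a count in a second column. The column names `sequence` and `count` are labels chosen here for illustration, and the `sunburst()` call is guarded so the snippet still runs when sunburstR is not installed.

```r
# Toy copy of the sequences from the question
sequences <- data.frame(
  V1 = c("A-aa-aaa-end", "A-aa-aaa-end", "A-aa-vvv-end", "A-aa-vvv-end",
         "A-cc-vvv-end", "A-cc-vvv-end", "B-aa-vvv-end", "B-aa-vvv-end",
         "B-bb-rr-end", "B-bb-rr-end", "C-aa-rr-end", "C-aa-rr-end",
         "C-bb-rr-end", "C-bb-rr-end", "C-cc-rr-end"),
  stringsAsFactors = FALSE
)

# One row per unique sequence, with its frequency in the second column
counts <- as.data.frame(table(sequences$V1), stringsAsFactors = FALSE)
names(counts) <- c("sequence", "count")  # illustrative names only

# Build the widget only if sunburstR is available; print(p) displays it
if (requireNamespace("sunburstR", quietly = TRUE)) {
  p <- sunburstR::sunburst(counts)
}
```

The point is only that the first column holds the dash-separated path and the second column holds a number for each path.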

Related

Can I subset an aggregated CellDataSet object using Monocle in R?

I have a CellDataSet object (cds):
> class(cds)
[1] "CellDataSet"
attr(,"package")
[1] "monocle"
composed of 6 different aggregated samples that can be distinguished by the suffixes of their barcodes. Here is a sample of what these look like:
cds$barcode
1 ACCAACGACTTGCC-1
2 CGCACTACTCGATG-4
3 CGTACAGAGTATCG-5
4 CGTCAAGATCACCC-5
5 ACTGAGACCCGTAA-2
6 TTAGACCTCGGGAA-6
7 TTCAAGCTGGTATC-3
8 TTTGACTGTCCTTA-4
9 TTTGCATGCTCTTA-4
10 AAACATTGAAGCCT-5
Is it possible to split this CellDataSet object into 6 smaller CellDataSet objects that each comprise barcodes with the same "-n" suffix, so I can analyse each sample separately? For example, the barcodes of CellDataSet1 would look like:
cds$barcode
1 AAACCGTGCCCTCA-1
2 AAACGCACACGCAT-1
3 AAACGGCTTCCGAA-1
4 AAAGACGAACCCAA-1
5 AAAGACGACTGTTT-1
6 AAAGAGACAAAGCA-1
7 AAAGATCTGGTAAA-1
8 AAAGCAGAGCAAGG-1
9 AAAGCAGATTATCC-1
10 AAAGCCTGATGACC-1
etc, and would contain the corresponding attributes as in the original object.
Many thanks!
Abigail
You can use tidyverse to solve the problem:
library(tidyverse)
dataseti <- data.frame(barcode = c("ACCAACGACTTGCC-1",
"CGCACTACTCGATG-4",
"CGTACAGAGTATCG-5",
"CGTCAAGATCACCC-5",
"ACTGAGACCCGTAA-2",
"TTAGACCTCGGGAA-6",
"TTCAAGCTGGTATC-3",
"TTTGACTGTCCTTA-4",
"TTTGCATGCTCTTA-4",
"AAACATTGAAGCCT-5"),
stringsAsFactors = FALSE)
Let's say you want group 4
dataseti %>% separate(barcode, c("chain","group"),"-") %>% filter(group == 4)
Good luck!
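The same grouping can be sketched in base R by extracting the suffix after the last dash and splitting on it. This works on a plain character vector of barcodes. The commented line at the end is an assumption, not something tested here: CellDataSet extends Biobase's ExpressionSet, so column subsetting with a logical vector is expected to return a smaller CellDataSet.

```r
# Barcodes from the question (plain character vector for illustration)
barcodes <- c("ACCAACGACTTGCC-1", "CGCACTACTCGATG-4", "CGTACAGAGTATCG-5",
              "CGTCAAGATCACCC-5", "ACTGAGACCCGTAA-2", "TTAGACCTCGGGAA-6",
              "TTCAAGCTGGTATC-3", "TTTGACTGTCCTTA-4", "TTTGCATGCTCTTA-4",
              "AAACATTGAAGCCT-5")

# Everything after the last "-" is the sample suffix
suffix <- sub(".*-", "", barcodes)

# Named list with one vector of barcodes per sample
by_sample <- split(barcodes, suffix)

# Assumption (untested here): with monocle loaded, the same logical index
# could subset the cells (columns) of the object, e.g.
# cds_1 <- cds[, sub(".*-", "", as.character(cds$barcode)) == "1"]
```

Looping over `unique(suffix)` with that kind of column subset would give one object per sample.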

Capture the output of arules::inspect as data.frame

In "Zero frequent items" when using eclat to mine frequent itemsets, the OP is interested in the groupings/clusterings of items based on how frequently they are ordered together. These groupings can be inspected with the arules::inspect function.
library(arules)
dataset <- read.transactions("8GbjnHK2.txt", sep = ";", rm.duplicates = TRUE)
f <- eclat(dataset,
parameter = list(
supp = 0.001,
maxlen = 17,
tidLists = TRUE))
inspect(head(sort(f, by = "support"), 10))
The data set can be downloaded from https://pastebin.com/8GbjnHK2.
However, the output cannot be easily saved to another object as a data frame.
out <- inspect(f)
So how do we capture the output of inspect(f) for use as data frame?
We can use the methods labels to extract the associations/groupings and quality to extract the quality measures (support and count). We can then use cbind to store these into a data frame.
out <- cbind(labels = labels(f), quality(f))
head(out)
# labels support count
# 1 {3031093,3059242} 0.001010 16
# 2 {3031096,3059242} 0.001073 17
# 3 {3060614,3060615} 0.001010 16
# 4 {3022540,3072091} 0.001010 16
# 5 {3061698,3061700} 0.001073 17
# 6 {3031087,3059242} 0.002778 44
Coercing the itemsets to a data.frame also creates the required output.
> head(as(f, "data.frame"))
items support count
1 {3031093,3059242} 0.001010101 16
2 {3031096,3059242} 0.001073232 17
3 {3060614,3060615} 0.001010101 16
4 {3022540,3072091} 0.001010101 16
5 {3061698,3061700} 0.001073232 17
6 {3031087,3059242} 0.002777778 44

R plot_ly plots missing values due to wrong rows extraction from a list

I have the following data :
> data
Type value
1 aromatics.aromatics 0.974489796
2 aromatics.charged 0.005102041
3 aromatics.polar 0.005102041
4 aromatics.unpolar 0.015306122
5 charged.aromatics 0.008620690
6 charged.charged 0.982758621
7 charged.polar 0.006465517
8 charged.unpolar 0.002155172
9 polar.aromatics 0.000000000
10 polar.charged 0.008403361
11 polar.polar 0.983193277
12 polar.unpolar 0.008403361
13 unpolar.aromatics 0.005532503
14 unpolar.charged 0.000000000
15 unpolar.polar 0.011065007
16 unpolar.unpolar 0.983402490
> typeof(data)
[1] "list"
# I keep only some rows of the data :
rows <- c(2,3,4,7,8,12)
data.2 <- data[rows,]
# result
> data.2
Type value
2 aromatics.charged 0.005102041
3 aromatics.polar 0.005102041
4 aromatics.unpolar 0.015306122
7 charged.polar 0.006465517
8 charged.unpolar 0.002155172
12 polar.unpolar 0.008403361
I want to use plot_ly to make a barplot with data.2
The problem is that this code :
plot_ly() %>%
add_bars(x = data.2[,1], y = data.2[,2])
sets the x-axis to all the rows of the original data (see picture).
And indeed :
# data.2[,1] is :
[1] aromatics.charged aromatics.polar aromatics.unpolar charged.polar charged.unpolar polar.unpolar
16 Levels: aromatics.aromatics aromatics.charged aromatics.polar aromatics.unpolar charged.aromatics ... unpolar.unpolar
# while data.2[,2] is :
[1] 0.005102041 0.005102041 0.015306122 0.006465517 0.002155172 0.008403361
So I guess my method for extracting lines is wrong, since all the levels are taken ... How can I correct this ? Note that the problem does not happen when using ggplot2.
Remark : I also use the add_markers() function to plot supplemental data (different values, but same x levels), and it has the same problem.
Perhaps dropping unused levels in your Type variable might solve your problems.
So try:
data.2$Type <- droplevels(data.2$Type)
Found it :
droplevels(data.2)
Silly me.
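A small sketch of why this happens, using a made-up four-row data frame rather than the data from the question: subsetting rows keeps every level of a factor column, and droplevels() removes the ones that no longer occur.

```r
# Toy data; Type is made a factor explicitly, as in the question's output
data <- data.frame(
  Type  = factor(c("aromatics.aromatics", "aromatics.charged",
                   "aromatics.polar", "charged.polar")),
  value = c(0.974, 0.005, 0.005, 0.006)
)

# Row subsetting keeps all four levels, even though only two rows remain
data.2 <- data[c(2, 3), ]
levels_before <- nlevels(data.2$Type)

# droplevels() on the data frame drops unused levels in every factor column
data.2 <- droplevels(data.2)
levels_after <- nlevels(data.2$Type)
```

Here `levels_before` is 4 and `levels_after` is 2, which matches the extra categories showing up on the x-axis.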

R and appending to data frames

I have some cross correlation function crosscor, and I would like to loop through the function for each of the columns I have in my data matrix. The function outputs some cross correlation that looks something like this each time it is run:
Lags Cross.Correlation P.value
1 0 -0.0006844958 0.993233547
2 1 0.1021006478 0.204691627
3 2 0.0976746274 0.226628526
4 3 0.1150337867 0.155426784
5 4 0.1943150900 0.016092041
6 5 0.2360415470 0.003416147
7 6 0.1855274375 0.022566685
8 7 0.0800646242 0.330081900
9 8 0.1111071269 0.177338885
10 9 0.0689602574 0.404948252
11 10 -0.0097332533 0.906856279
12 11 0.0146241719 0.860926388
13 12 0.0862549791 0.302268025
14 13 0.1283308019 0.125302070
15 14 0.0909537922 0.279988895
16 15 0.0628012627 0.457795228
17 16 0.1669241304 0.047886605
18 17 0.2019811994 0.016703619
19 18 0.1440124960 0.090764520
20 19 0.1104842808 0.197035340
21 20 0.1247428178 0.146396407
I would like to put all of the results together in a data frame, and ultimately export it to a csv file so the columns are as follows: lags.3, cross-correlation.3, p-value.3, lags.4, cross-correlation.4, ... etc. until p.value.50.
I have tried to use do.call as follows, but have not been successful:
for(i in 3:50)
{
l1<-crosscor(data[,2], data[,i], lagmax=20)
ccdata<-do.call(rbind, l1)
cat("Data row", i)
}
I've also tried just creating the data frame straight out, but am just getting the lag column names:
ccdata <- data.frame()
for(i in 3:50)
{
ccdata[i-2:i+1]<-crosscor(data[,2], data[,i], lagmax=20)
cat("Data row", i)
}
What am I doing wrong? Or is there an online source on data sets I could access to figure out how to do this? Best,
There is a transpose method for data.frames. If "crosscor" is the name of the object just try this:
tcrosscor <- t(crosscor)
write.csv(tcrosscor, file="my_crosscor_1.csv")
The first row would be the lags; the second row, the cross-correlations; the third row, the p-values. I suppose you could "flatten" it further so it would be entirely "horizontal" or "wide". Seems painful but this might go something like:
# as.vector() reads the transposed matrix column by column: lag, correlation, p-value for each lag in turn
single_line <- as.data.frame(t(as.vector(tcrosscor)))
names(single_line) <- paste(c("Lags", "Cross.Correlation", "P.value"), rep(crosscor$Lags, each = 3), sep=".")
write.csv(single_line, file="my_single_1.csv")
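For the layout asked about in the question (one block of three columns per data column), a possible sketch is to collect each result in a list, suffix its column names with the column index, and bind the pieces side by side. The `crosscor` function below is a hypothetical stand-in built on `cor.test`, since the asker's function is not shown, and the data are random numbers with only 6 columns to keep the example small.

```r
# Hypothetical stand-in for the asker's crosscor(): lagged Pearson
# correlation and its p-value for lags 0..lagmax
crosscor <- function(x, y, lagmax = 20) {
  n <- length(x)
  rows <- lapply(0:lagmax, function(k) {
    ct <- cor.test(x[1:(n - k)], y[(1 + k):n])
    data.frame(Lags = k,
               Cross.Correlation = unname(ct$estimate),
               P.value = ct$p.value)
  })
  do.call(rbind, rows)
}

# Made-up data: 100 rows, 6 columns (the question has 50 columns)
set.seed(1)
data <- as.data.frame(matrix(rnorm(100 * 6), ncol = 6))

# One result per column, with the column index appended to each name
pieces <- lapply(3:ncol(data), function(i) {
  res <- crosscor(data[, 2], data[, i], lagmax = 20)
  names(res) <- paste(names(res), i, sep = ".")
  res
})

# Bind side by side: Lags.3, Cross.Correlation.3, P.value.3, Lags.4, ...
ccdata <- do.call(cbind, pieces)

out_file <- file.path(tempdir(), "crosscor_wide.csv")
write.csv(ccdata, file = out_file, row.names = FALSE)
```

The loops in the question overwrite `ccdata` on every iteration; keeping the results in a list and combining them once at the end avoids that.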

Multiple plots in R with different settings for each axis with less lines of code

In the graph below, is it possible to create the same graph with fewer lines of code? Since each of Figs. A-D has different label settings, I have to write the settings for each figure separately, which makes the code longer.
The graph below is produced with the data in a pdf device.
Any help with these issues is highly appreciated (newbie to R!). Since all the code is too long to post here, I have posted only the part relevant to the problem, for Fig. C:
#FigC
label1=c(0,100,200,300)
plot(data$TimeVariable2C,data$Variable2C,axes=FALSE,ylab="",xlab="",xlim=c(0,24),
ylim=c(0,2.4),xaxs="i",yaxs="i",pch=19)
lines(data$TimeVariable3C,data$Variable3C)
axis(2,tick=T,at=seq(0.0,2.4,by=0.6),label= seq(0.0,2.4,by=0.6))
axis(1,tick=T,at=seq(0,24,by=6),label=seq(0,24,by=6))
mtext("(C)",side=1,outer=F,line=-10,adj=0.8)
minor.tick(nx=5,ny=5)
par(new=TRUE)
plot(data$TimeVariable1C,data$Variable1C,axes=FALSE,xlab="",ylab="",type="l",
ylim=c(800,0),xaxs="i",yaxs="i")
axis(3,xlim=c(0,24),tick=TRUE,at= seq(0,24,by=6),label=seq(0,24,by=6),col.axis="violetred4",col="violetred4")
axis(4,tick=TRUE,at= label1,label=label1,col.axis="violetred4",col="violetred4")
polygon(data$TimeVariable1C,data$Variable1C,col='violetred4',border=NA)
You ask many questions in the same post. I will try to answer just one: how to simplify your code, or rather how to call it once for each letter. I think it is better to put your data in the long format. For example, this will create a list of 4 elements:
ll <- lapply(LETTERS[1:4],function(let){
dat.let <- dat[,grepl(let,colnames(dat))]
dd <- reshape(dat.let,direction ='long',
v.names=c('TimeVariable','Variable'),
varying=1:6)
dd$time <- factor(dd$time)
dd$Type <- let
dd
}
)
ll is a list of 4 data.frames, each of which looks like this:
head(ll[[1]])
time TimeVariable Variable id Type
1.1 1 0 0 1 A
2.1 1 0 5 2 A
3.1 1 8 110 3 A
4.1 1 16 0 4 A
5.1 1 NA NA 5 A
6.1 1 NA NA 6 A
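To make the reshape step concrete, here is a sketch on a made-up two-row data frame for one letter. The column names follow the pattern in the question (TimeVariable1A, Variable1A, and so on), but the values are invented. With `v.names` given, `reshape` expects the `varying` columns in the order x.1, y.1, x.2, y.2, which is why the time and value columns must alternate.

```r
# Invented data for letter A: three (time, value) column pairs
dat.let <- data.frame(
  TimeVariable1A = c(0, 8),  Variable1A = c(0, 110),
  TimeVariable2A = c(1, 9),  Variable2A = c(0.5, 1.2),
  TimeVariable3A = c(2, 10), Variable3A = c(0.7, 1.9)
)

# Stack the three pairs on top of each other; `time` records which pair
dd <- reshape(dat.let, direction = "long",
              v.names = c("TimeVariable", "Variable"),
              varying = 1:6)
dd$Type <- "A"
```

Each original row appears three times in `dd`, once per pair, so `subset(dd, time == 2)` recovers the TimeVariable2A and Variable2A columns.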
Then you can use it like this for example :
library(Hmisc)
layout(matrix(1:4, 2, 2, byrow = TRUE))
lapply(ll,function(data){
label1=c(0,100,200,300)
Type <- unique(data$Type)
# series 2 is drawn as points
dat <- subset(data,time==2)
x.mm <- max(dat$TimeVariable,na.rm=TRUE)
plot(dat$TimeVariable,dat$Variable,axes=FALSE,ylab="",xlab="",xlim=c(0,x.mm),
ylim=c(0,2.4),xaxs="i",yaxs="i",pch=19)
# series 3 is drawn as a line
dat <- subset(data,time==3)
lines(dat$TimeVariable,dat$Variable)
axis(2,tick=T,at=seq(0.0,2.4,by=0.6),label= seq(0.0,2.4,by=0.6))
axis(1,tick=T,at=seq(0,x.mm,by=6),label=seq(0,x.mm,by=6))
mtext(Type,side=1,outer=F,line=-10,adj=0.8)
minor.tick(nx=5,ny=5)
par(new=TRUE)
dat <- subset(data,time==1)
plot(dat$TimeVariable,dat$Variable,axes=FALSE,xlab="",ylab="",type="l",
ylim=c(800,0),xaxs="i",yaxs="i")
axis(3,xlim=c(0,24),tick=TRUE,at= seq(0,24,by=6),label=seq(0,24,by=6),col.axis="violetred4",col="violetred4")
axis(4,tick=TRUE,at= label1,label=label1,col.axis="violetred4",col="violetred4")
polygon(dat$TimeVariable,dat$Variable,col='violetred4',border=NA)
})
Another advantage of the long data format is that you can use ggplot2 and facet_wrap, for example.
## transform your data to a data.frame
dat.l <- do.call(rbind,ll)
library(ggplot2)
ggplot(subset(dat.l,time !=1)) +
geom_line(aes(x=TimeVariable,y=Variable,group=time,color=time))+
geom_polygon(data=subset(dat.l,time ==1),
aes(x=TimeVariable,y=60-Variable/10,fill=Type))+
geom_line(data=subset(dat.l,time ==1),
aes(x=TimeVariable,y=Variable,fill=Type))+
facet_wrap(~Type,scales='free')
