Save k-means cluster groups to files outside of R - r

I wrote the following code to cluster my data:
clusrer.data <- function(data,n) {
miRNA.exp.cluster <- scale(t(miRNA.exp))
k.means.fit <- kmeans(miRNA.exp.cluster,n)
# I tried to save the results of the k-means clustering with this code:
k.means.fit <- as.data.frame(k.means.fit)
write.csv(k.means.fit, file="k-meanReslut.csv")
#x<-k.means.fit$clusters
#write.csv(x, file="k-meanReslut.csv")
}
but I cannot save the clusters (8, 6, 7, 20, 18) outside of R. I want to save each cluster separately (with its columns and rows) in a txt or CSV file.

Here is one approach: split the original dataset by cluster and save each chunk to its own file. I added the cluster assignment to the original dataset for an easier visual check. The example is partly taken from ?kmeans. Feel free to adapt how the files are written and how the file names are created.
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
x <- cbind(x, cluster = cl$cluster)
by(x, INDICES = cl$cluster, FUN = function(sp) {
  write.table(sp, file = paste0("file", unique(sp$cluster), ".txt"),
              row.names = TRUE, col.names = TRUE)
})
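A variant of the same idea, as a minimal sketch: split() the data by cluster label and write each piece with write.csv(). The simulated data, the output directory (tempdir()) and the file-name pattern cluster_<k>.csv are assumptions for illustration, not part of the original question.

```r
set.seed(1)  # reproducible toy data (assumption: stands in for the real matrix)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, centers = 2)

# attach the cluster label, then split into one data frame per cluster
dat <- data.frame(x, cluster = cl$cluster)
pieces <- split(dat, dat$cluster)

# hypothetical output location and naming scheme
out_dir <- tempdir()
out_files <- file.path(out_dir, paste0("cluster_", names(pieces), ".csv"))

# write each cluster to its own CSV, keeping the row names
for (k in seq_along(pieces)) {
  write.csv(pieces[[k]], file = out_files[k], row.names = TRUE)
}
```

Each file can be read back with read.csv(out_files[1], row.names = 1) to check that rows and columns survived the round trip.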

Related

ggarrange generates an empty pdf file

I am working on a function that takes a big data frame (36 rows and 194 columns), performs a Principal Component Analysis, and then generates a list of plots, one for each pairwise combination of the 26 principal components (325 in total), using 'expand.grid'.
My problem is that when I use ggarrange(), from ggpubr, to merge all the plots into a single pdf file, the file is empty.
My code:
a = 26
row.pairs = 325
PC.Graph <- function(df, col1, col2, tag, id){
df1 <- df[,-c(col1:col2)]
pca <- prcomp(df1, scale. = T)
pc.summ <- summary(pca)
a <- sum(pc.summ$importance[3,] < 0.975)
b <- c(1:a)
pc.grid <- expand.grid(b, b)
pc.pairs <- pc.grid[pc.grid$Var1 < pc.grid$Var2,]
row.pairs <- nrow(pc.pairs)
components <- c(1:row.pairs)
S.apply.FUN <- function(x){
c <- sapply(pc.pairs, "[", x, simplify = F)
pcx <- c$Var1
pcy <- c$Var2
df2 <- df
row.names(df2) <- df[, tag]
name = paste("PCA_", pcx, "_vs_", pcy)
autoplot(pca, data = df2, colour = id, label = T, label.repel = T, main = name,
x = pcx, y = pcy)
}
all.plots <- Map(S.apply.FUN, components)
pdf(file = "All_PC.pdf", width = 50, height = 70)
print(ggarrange(all.plots))
dev.off()
}
PC.Graph(Final_DF, col1 = 1, col2 = 5, tag = "Sample", id = "Maturation")
You would have to pass a plotlist to ggarrange, but I am not sure you would get any useful plot out of that plot area in the PDF file, so I would advise you to split the plotlist into chunks (e.g. of 20) and plot these to multiple pages.
Specifically, I would export all.plots from your PC.Graph function (and remove the code to write to PDF there).
I would also change the expand.grid(b, b) to t(combn(b, 2)), since you don't need to plot the PC combinations twice.
Then I would do something like this:
# export the full list of plots
plots <- PC.Graph(Final_DF, col1 = 1, col2 = 5, tag = "Sample", id = "Maturation")
# split the plotlist
splitPlots <- split(plots, ceiling(seq_along(plots)/20))
plotPlots <- function(x){
  out <- cowplot::plot_grid(plotlist = x, ncol = 5, nrow = 4)
  plot(out)
}
pdf(file = "All_PC.pdf", width = 50, height = 45)
lapply(splitPlots, plotPlots)
dev.off()
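The counting behind this advice can be checked with a small sketch in base R. It assumes 26 components, as in the question, and a page size of 20 plots; the names page_id and pages are made up for the illustration.

```r
b <- 1:26                      # 26 principal components, as in the question
pc.pairs <- t(combn(b, 2))     # one row per unordered pair, no duplicates
nrow(pc.pairs)                 # 325, i.e. choose(26, 2)

# group the pair indices into pages of at most 20 plots each
page_id <- ceiling(seq_len(nrow(pc.pairs)) / 20)
pages <- split(seq_len(nrow(pc.pairs)), page_id)
length(pages)                  # 17 pages: 16 full ones plus one with 5 plots
```

Because combn() already yields each pair once, the filter pc.grid$Var1 < pc.grid$Var2 from the question is no longer needed.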

Applying a distance matrix to multiple data frames

I have 20 data frames of different lengths, but all the same number of columns. I would like to run some analyses, in this case a distance matrix using vegan, for each of these data frames. I have searched around and just figure I am missing a step somewhere.
The dummy data below uses 5 data frames, and I have been trying to use lapply.
df1<- matrix(data = c(1:100), nrow = 10, ncol = 10)
df2<- matrix(data = c(1:150), nrow = 15, ncol = 10)
df3<- matrix(data = c(1:50), nrow = 5, ncol = 10)
df4<- matrix(data = c(1:200), nrow = 20, ncol = 10)
df5<- matrix(data = c(1:100), nrow = 10, ncol = 10)
Y<- list(df1, df2, df3, df4, df5)
Y.dc <- lapply(Y, dist.ldc(Y, "chord"))
I have also tried just running it on the list directly, and I keep getting errors there too.
Y.dc<- dist.ldc(Y, "chord")
Ideally, I would like to not run 20 lines/chunks of code for each frame.
Eventually, I would also like to be able to generate nMDS plots, and run PERMANOVAs on each of the data frames all at once as well. Would I need to write/run a function in order to accomplish that?
A valid syntax:
Y.dc <- lapply(Y, dist.ldc, method = "chord")
(I assumed the function dist.ldc comes from the package adespatial, which I don't know.)
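The pattern is that lapply takes the function itself, and any further arguments are forwarded to each call. A minimal sketch of that pattern, using base R's dist() as a stand-in (an assumption, chosen only so the example runs without adespatial):

```r
Y <- list(matrix(1:100, nrow = 10, ncol = 10),
          matrix(1:150, nrow = 15, ncol = 10),
          matrix(1:50,  nrow = 5,  ncol = 10))

# arguments after the function name are forwarded to every call
Y.d <- lapply(Y, dist, method = "euclidean")

# each element is a "dist" object sized to its own matrix
sapply(Y.d, function(d) attr(d, "Size"))
```

For multi-step analyses per data frame, the same pattern applies with a small wrapper function passed to lapply in place of dist.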

Loop set a list of command for different variables in R?

I've just started learning R. I'm trying to automate a calculation on my data, but as a newcomer I've found it much more difficult than I thought. I hope someone can help me figure this out.
My data consist of datasets for 11 species, stored in different CSV files in the same folder. I need to run a set of commands to get the graphs. For each species, I have to type the commands again (from glm... to the end).
setwd("C:/Users/OneDrive/Work/Journal Articles/I9-2019/Reports/LM50")
library(FSA)
temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.table(temp[i],header=T,sep=";"))
lr<-function(cf,p)(log(p/(1-p))-cf[1])/cf[2]
glmFencr<-glm(Maturity~FL,data=`Encrasicholina heteroloba F 2019 TNB I9.csv`,family=binomial)
(L50Fencr<-lr(coef(glmFencr),0.5))
# first result
fitPlot(glmFencr,xlab="FL (mm)",ylab="Percentage (%)",main="",xlim=c(0,180),plot.p=FALSE)
lines(c(0,L50Fencr),c(0.5,0.5),lty=3,lwd=2,col="black")
lines(c(L50Fencr,L50Fencr),c(-0.2,0.5),lty=3,lwd=2,col="black")
# second result (graph)
Is it possible to create a loop for this?
Thank you in advance!
You could simply store your data frames in a list and loop through them. Notice it helps a lot to have your code indented and properly spaced.
library(FSA)
# forward slashes: a single backslash in an R string starts an escape sequence
setwd("C:/Users/OneDrive/Work/Journal Articles/I9-2019/Reports/LM50")
lr <- function(cf, p)
{
  return((log(p/(1-p))-cf[1])/cf[2])
}
# pattern is a regular expression, so match names ending in ".csv"
all_files <- list.files(pattern = "\\.csv$")
# Use lapply to get all your data frames and models into lists
all_data <- lapply(all_files, function(x) read.table(x, header = TRUE, sep = ";"))
all_models <- lapply(all_data, function(x) glm(Maturity ~ FL, data = x, family = binomial))
# Now you can get your L50s as a vector
all_L50s <- unlist(lapply(all_models, function(x) lr(coef(x), 0.5)))
# Now you can loop quite easily
for(i in seq_along(all_models))
{
  fitPlot(all_models[[i]], xlim=c(0, 180), plot.p = FALSE,
          xlab = "FL (mm)", ylab = "Percentage(%)", main = "")
  lines(c(0, all_L50s[i]), c(0.5, 0.5), lty = 3, lwd = 2, col = "black")
  lines(c(all_L50s[i], all_L50s[i]), c(-0.2, 0.5), lty = 3, lwd = 2, col = "black")
}
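To keep track of which L50 belongs to which species, the list can carry the file names. The sketch below shows this with simulated data instead of the real CSV files; the helper make_species, the file names species_A.csv and species_B.csv, and the simulated parameters are all hypothetical. It uses only base R, so it runs without FSA.

```r
set.seed(42)
lr <- function(cf, p) (log(p / (1 - p)) - cf[1]) / cf[2]

# simulated stand-in for one species file (hypothetical helper and values)
make_species <- function(l50) {
  FL <- runif(200, 20, 180)
  Maturity <- rbinom(200, 1, plogis(0.1 * (FL - l50)))
  data.frame(FL = FL, Maturity = Maturity)
}
all_data <- list(species_A.csv = make_species(80),
                 species_B.csv = make_species(120))

all_models <- lapply(all_data, function(d) glm(Maturity ~ FL, data = d, family = binomial))

# vapply keeps the list names, so each L50 is labelled by its file
all_L50s <- vapply(all_models, function(m) unname(lr(coef(m), 0.5)), numeric(1))
all_L50s
```

With real files, names(all_data) <- all_files after reading achieves the same labelling.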
Put the code that you want to apply to each file in a function:
library(FSA)
lr <- function(cf,p)(log(p/(1-p))-cf[1])/cf[2]
apply_fun <- function(data) {
  glmFencr <- glm(Maturity~FL,data=data,family=binomial)
  L50Fencr <- lr(coef(glmFencr),0.5)
  fitPlot(glmFencr,xlab="FL (mm)", ylab="Percentage (%)",
          main="", xlim=c(0,180), plot.p=FALSE)
  lines(c(0,L50Fencr),c(0.5,0.5),lty=3,lwd=2,col="black")
  lines(c(L50Fencr,L50Fencr),c(-0.2,0.5),lty=3,lwd=2,col="black")
}
Get all the files and apply the function to each file using lapply. Store the output in out_plot.
# pattern is a regular expression, so match names ending in ".csv"
temp = list.files(pattern="\\.csv$")
out_plot <- lapply(temp, function(x) {
  df <- read.table(x, header=TRUE, sep=";")
  apply_fun(df)
})

Save 2-plot figures in pdf within for loop

I have multiple plots to save as .pdf files. They are created in R using par(mfrow=c(1,2)), i.e. each figure (to be saved) contains 2 plots arranged in 1 row and 2 columns.
Since my total number of plots is quite high I am creating the plots with a for loop.
How can I save the figures (with 2 plots each one) as pdf files in the for loop?
Here's some funky code:
## create data.frames
df_1 = data.frame(x = c(1:100), y = rnorm(100))
df_2 = data.frame(x = c(1:100), y = rnorm(100))
df_3 = data.frame(x = c(1:100), y = rnorm(100))
df_4 = data.frame(x = c(1:100), y = rnorm(100))
## create list of data.frames
df_lst = list(df_1, df_2, df_3, df_4)
## plot in for loop by 1 row and 2 cols
par(mar=c(3,3,1,0), mfrow=c(1,2))
for (i in 1:length(df_lst)) {
barplot(df_lst[[i]]$y)
}
Let's say I want to save the plots with the pdf function. Here's what I tried:
for (i in 1:length(df_lst)) {
pdf(paste('my/directory/file_Name_', i, '.pdf', sep = ''), height = 6, width = 12)
barplot(df_lst[[i]]$y)
dev.off()
}
My solution is clearly wrong because the pdf function saves a figure at each loop (i.e. 4 instead of 2).
Any suggestion?
Thanks
Sounds like you could use a nested loop here: an outer loop for each file you create, and an inner loop for each multi-panel figure you create. Since all the data frames are stored in a 1-d list, you'll then need to keep track of the index of the list that you are plotting.
Here's one way to do that:
nrow <- 1
ncol <- 2
n_panels <- nrow * ncol
n_files <- length(df_lst) / n_panels
for (i in seq_len(n_files)) {
  file <- paste0("file_", i, ".pdf")
  pdf(file, height = 6, width = 12)
  # plot params need to be set for each device
  par(mar = c(3, 3, 1, 0), mfrow = c(nrow, ncol))
  for (j in seq_len(n_panels)) {
    idx <- (i - 1) * n_panels + j
    barplot(df_lst[[idx]]$y)
  }
  # updated to also add a legend
  legend("bottom", legend = "Bar", fill = "grey")
  dev.off()
}
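The loop above assumes the number of data frames is an exact multiple of the panels per figure. A minimal sketch of one way to handle a leftover data frame follows. The toy list of 5 data frames, the output directory (tempdir()) and the name pdf_files are assumptions for illustration.

```r
set.seed(1)
# toy data: 5 data frames, deliberately not a multiple of 2
df_lst <- lapply(1:5, function(i) data.frame(x = 1:100, y = rnorm(100)))

n_panels <- 2
n_files <- ceiling(length(df_lst) / n_panels)   # round up so the last one is kept
out_dir <- tempdir()                            # hypothetical output directory
pdf_files <- file.path(out_dir, paste0("file_", seq_len(n_files), ".pdf"))

for (i in seq_len(n_files)) {
  pdf(pdf_files[i], height = 6, width = 12)
  par(mar = c(3, 3, 1, 0), mfrow = c(1, n_panels))
  # indices of the data frames for this file, clipped at the end of the list
  idx <- ((i - 1) * n_panels + 1):min(i * n_panels, length(df_lst))
  for (j in idx) barplot(df_lst[[j]]$y)
  dev.off()
}
```

The last file then contains a single panel on the left and an empty slot on the right.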
If you just want one file with multiple pages, all you need to do is move the pdf() call outside your original loop, and move the parameter setting after the pdf():
pdf('my/directory/file_Name.pdf', height = 6, width = 12)
par(mar=c(3,3,1,0), mfrow=c(1,2))
for (i in 1:length(df_lst)) {
barplot(df_lst[[i]]$y)
}
dev.off()

Methods for iteratively changing matrices in a set in R

Currently I am working on an R project where I iteratively have to make a lot of small changes to the final output, which is stored in a self-made class. The calculation time becomes very large as the number of iterations increases. Unfortunately, a more vectorized version of the code is not possible, because the future values to be changed depend on the current changes.
A small example of the problem that I encounter is given below. The calculations using Example_Class take significantly longer than when just a matrix is used. Are there methods available to speed up the calculations in R? Or should I look at an extension to, for example, C++?
Example_Class <- setClass(
"Example",
slots = c(
slot_1 = "matrix",
slot_2 = "matrix"
),
prototype=list(
slot_1 = matrix(1, ncol = 1, nrow = 7),
slot_2 = matrix(1, ncol = 4, nrow = 7)
)
)
Example <- Example_Class()
example_matrix_1 <- matrix(1, ncol = 1, nrow = 7)
example_matrix_2 <- matrix(1, ncol = 4, nrow = 7)
example_list <- list(example_matrix_1, example_matrix_2)
profile <- microbenchmark::microbenchmark(
example_matrix_2[3,3] <- (example_matrix_2[3,3] + 1)/2,
example_list[[2]][3,3] <- (example_list[[2]][3,3] + 1)/2,
Example@slot_2[3,3] <- (Example@slot_2[3,3] + 1)/2,
times = 1000
)
profile

Resources