Python equivalent of R code gives different output

I am trying to write the Python equivalent of some R code, but I am not getting the same result. The R code is as follows:
# Set parameters
max.people = 50
max.trials = 500
plot.step = 1
# load libraries
library(tidyverse)
#Set up an initial data frame
df<-data.frame("trial"=NA,"people"=NA, "val"=NA)
# Set up a common theme for plots
ztheme<-function(){
  theme_classic()+
    theme(panel.background=element_rect(fill="#F0F0F0", color="#F0F0F0"))+
    theme(plot.background=element_rect(fill="#F0F0F0", color="#F0F0F0"))}
#Run main loop
for(trial in 1:max.trials){
  # set up a buffer. Makes the program run a lot faster.
  buff<-data.frame("trial"=NA,"people"=NA, "val"=NA)
  for(people in 1:max.people){
    buff<-rbind(buff,data.frame("trial"=trial,"people"=people, "val"=NA))
    samp<-sample(1:365, people, replace=T)
    if(length(unique(samp))==length(samp)){
      buff$val[nrow(buff)]<-0
    }else{
      buff$val[nrow(buff)]<-1
    }; rm(samp)}
  df<-rbind(df, buff); rm(buff)
  print(paste(round(trial/(max.trials)*100, 2), "% Complete", sep=""))
}
df<-subset(df, !is.na(df$trial))
rm(max.people); rm(people); rm(trial)
# Generate multiple plots of result
for(n in seq(plot.step,max.trials,plot.step)){
  print(
    ggplot(summarise(group_by(subset(df, trial<=n), people), prob=mean(val)), aes(people, prob))+
      geom_bar(stat="identity", fill="steelblue1")+
      geom_smooth(se=F, color="black", method="loess")+
      scale_y_continuous(labels=scales::percent, limits=c(0,1))+
      labs(title="Birthday Paradox",
           subtitle=paste("Based on",n,"simulations."),
           x="Number of People in Room",
           y="One or More Matching Birthdays (True/False Ratio)",
           caption="created by /u/zonination")+
      ztheme())
  ggsave(paste("bday_", formatC(n,width=5,flag = "0"), ".png", sep=""), height=4.5, width=7, dpi=120, type="cairo-png")
}; rm(n)
I have written the equivalent code in Python as follows:
import random

import matplotlib.pyplot as plt
import pandas as pd

plt.style.use('ggplot')
maxTrials = 500
maxPeople = 50
plotStep = 1
frames = []  # one small DataFrame per trial, playing the role of R's `buff`
for trial in range(1, maxTrials + 1):
    rows = []
    for people in range(1, maxPeople + 1):
        # random.randint includes both endpoints, so (1, 365) matches R's sample(1:365, ...)
        samp = [random.randint(1, 365) for _ in range(people)]
        # val is 1 if at least two people share a birthday, 0 otherwise
        val = 0 if len(set(samp)) == len(samp) else 1
        rows.append({'trial': trial, 'people': people, 'val': val})
    frames.append(pd.DataFrame(rows))
    print(str(round(trial / maxTrials * 100, 2)) + "% Complete")
# DataFrame.append was removed in pandas 2.0; concatenate all trials once instead
df = pd.concat(frames, ignore_index=True)
for n in range(plotStep, 5):
    dfCopy = df.loc[df.trial <= n]
    dfCopy = dfCopy.groupby(['people'])['val'].mean().to_frame(name='prob').reset_index()
    print(dfCopy)
    plt.bar(dfCopy['people'],
            dfCopy['prob'],
            color='blue',
            edgecolor='none',
            width=0.5,
            align='center')
    plt.suptitle("Birthday Paradox\n")
    plt.title("Based on " + str(n) + " simulations.")
    plt.yticks([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
    plt.xlabel("Number of people in room")
    plt.ylabel("Probability of one or more matching birthdays")
    plt.savefig("bday_" + str(n) + ".png", dpi=110, bbox_inches='tight')
The first few plots saved from R look quite different from the Python output (the screenshots are not reproduced here). Is this caused by a rounding error of some sort?

The code is fine, but you never clear your axes, so each iteration draws its bars on top of everything drawn in the previous iterations.
Adding plt.cla() after plt.savefig(...) will make the output look much like the R output.
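To check that both versions simulate the same thing, compare the bars with the closed-form probability. Assume 365 equally likely birthdays and ignore leap days. Then the chance that at least two of n people share a birthday is 1 - (365/365)(364/365)...((365 - n + 1)/365), and the simulated bars should approach this value as more trials accumulate. The sketch below compares that formula with a seeded simulation. The function names are illustrative and do not come from the original code. Note that Python's random.randint includes both endpoints, so randint(1, 366) would draw from 366 possible days rather than 365.

```python
import random


def exact_match_probability(people: int, days: int = 365) -> float:
    """P(at least two of `people` share a birthday), uniform birthdays, no leap day."""
    p_all_distinct = 1.0
    for k in range(people):
        # k-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def simulated_match_probability(people: int, trials: int, rng: random.Random,
                                days: int = 365) -> float:
    """Monte Carlo estimate, mirroring R's sample(1:365, people, replace = TRUE)."""
    hits = 0
    for _ in range(trials):
        # randint includes both endpoints, so this draws from 1..365
        samp = [rng.randint(1, days) for _ in range(people)]
        if len(set(samp)) < len(samp):
            hits += 1
    return hits / trials


rng = random.Random(2024)  # fixed seed so the run is reproducible
for n in (10, 23, 50):
    print(n, round(exact_match_probability(n), 4),
          round(simulated_match_probability(n, 5000, rng), 4))
```

With a few thousand trials per group size, the simulated column should sit within about a percentage point of the exact column. After only a handful of trials, as in the first frames of the animation, the bars will still be very noisy.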

Related

I can't get my plots into a single grid; please help correct my code

I have 11 plots and used a loop to plot them (see my code below). However, I can't get them to fit on one page or less; the plots are too big. I am using R and writing my work in R Markdown. I have spent almost an entire week trying to resolve this.
group_by(Firm_category) %>%
  doo(
    ~ggboxplot(
      data =., x = "Means.type", y = "means",
      fill ="grey", palette = "npg", legend = "none",
      ggtheme = theme_pubr()
    ),
    result = "plots"
  )
graph3
# Add statistical tests to each corresponding plot
Firm_category <- graph3$Firm_category
xx <- for(i in 1:length(Firm_category)){
  graph3.i <- graph3$plots[[i]] +
    labs(title = Firm_category[i]) +
    stat_pvalue_manual(stat.test[i, ], label = "p.adj.signif")
  print(graph3.i)
}
#output3.long data sample below as comments
#Firm_category billmonth Means.type means
#Agric 1 Before 38.4444
#Agric 1 After 51.9
The complete data is on my GitHub: https://github.com/Fridahnyakundi/Descriptives-in-R/blob/master/Output3.csv
This code prints all the graphs, but spread across about 4 pages. I want to group them into a grid. I have tried adding each of the snippets below just before my last curly bracket, and none of them works. Please help me out.
library(cowplot)
print(plot_grid(plotlist = graph3.i[1:11], nrow = 4, ncol = 3))
library(ggpubr)
print(ggarrange(graph3.i[1:11], nrow = 4, ncol = 3))
I tried gridExtra as well (these packages all seem to do the same thing). I suspect the mistake is mine and has to do with my list. I read a lot of similar questions here, and some suggested
dev.new()
dev.off()
I still don't understand what they do, and adding either of them made my code stop.
I tried assigning my 'for' loop to a name, say 'xx', and then calling it later to get a list of graphs, but it returned NULL.
I have also tried defining an empty list (as suggested in some answers here) and filling it to get a list that can be printed, but I got many errors.
I have done this for almost 3 days and will appreciate your help in resolving this.
Thanks!
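I tried to complete your code, and this works (although I don't have your 'stat.test' object, so that line is commented out). Basically, I added graph3.i <- list() before the loop and assigned each plot to graph3.i[[i]] inside the loop, instead of overwriting graph3.i on every pass.
Is this what you wanted to do?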
library(magrittr)
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)
data <- read.csv(url('http://raw.githubusercontent.com/Fridahnyakundi/Descriptives-in-R/master/Output3.csv'))
graph3 <- data %>% group_by(Firm_category) %>%
  doo(
    ~ggboxplot(
      data =., x = "Means.type", y = "means",
      fill ="grey", palette = "npg", legend = "none",
      ggtheme = theme_pubr()
    ),
    result = "plots"
  )
graph3
# Add statistical tests to each corresponding plot
graph3.i <- list()  # one slot per plot, filled inside the loop
Firm_category <- graph3$Firm_category
for(i in 1:length(Firm_category)){
  graph3.i[[i]] <- graph3$plots[[i]] +
    labs(title = Firm_category[i]) # +
  # stat_pvalue_manual(stat.test[i, ], label = "p.adj.signif")
}
library(cowplot)
print(plot_grid(plotlist = graph3.i[1:11], nrow = 4, ncol = 3))
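The essential change in this answer is a general pattern: collect results in a list rather than reassigning one name inside a loop. In the question, graph3.i is overwritten on every pass, so only the last plot survives. Also, `xx <- for (...)` stores NULL, because an R for loop itself returns NULL.

Below is a language-neutral illustration of the pattern, written in Python. The names make_boxplot and category_1..category_11 are hypothetical placeholders, not ggpubr objects. The sketch also counts the cells of the 4 x 3 grid, assuming the grid is filled row by row.

```python
def make_boxplot(category: str) -> str:
    # Hypothetical stand-in for one ggboxplot() object; a string keeps the sketch self-contained.
    return f"boxplot<{category}>"


categories = [f"category_{i}" for i in range(1, 12)]  # 11 hypothetical Firm_category values

# Pattern in the question: the same name is reassigned on every pass,
# so after the loop only the last plot is left.
graph = None
for category in categories:
    graph = make_boxplot(category)

# Pattern in the answer: start from an empty list and fill one slot per pass.
graphs = []
for category in categories:
    graphs.append(make_boxplot(category))

# A 4 x 3 grid has 12 cells for 11 plots; filled row by row (0-based row, col),
# the last plot lands in row 3, column 1 and one cell stays empty.
nrow, ncol = 4, 3
positions = [divmod(index, ncol) for index in range(len(graphs))]
print(graph)
print(len(graphs), positions[-1], nrow * ncol - len(graphs))
```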

Weird characters appearing in the plot legend when using DoHeatmap

I was using Seurat to analyse single-cell RNA-seq data. After clustering and marker selection, I managed to draw a heatmap with DoHeatmap(), but a bunch of random characters appear in the legend. They really are random: they change every time I run the code. I worried that the problem was related to my own dataset, so I tried the Seurat test object 'ifnb', but got the same issue (see the red oval in the example plot).
example plot
I also tried loading the Seurat object in R from the terminal (via readRDS) and running the plotting function, but got the same issue there, so it is not an RStudio problem.
Here is the code I ran:
'''
library(Seurat)
library(SeuratData)
library(patchwork)
library(dplyr)  # provides %>%, group_by() and top_n() used below
InstallData("ifnb")
LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
DefaultAssay(immune.combined) <- 'RNA'
immune_markers <- FindAllMarkers(immune.combined, latent.vars = "stim", test.use = "MAST", assay = 'RNA')
immune_markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) -> top10_immune
DoHeatmap(immune.combined, slot = 'data',features = top10_immune$gene, group.by = 'stim', assay = 'RNA')
'''
Does anyone have any idea how to solve this issue other than reinstalling everything?
I have been having the same issue. I solved it by dropping the legend, since I didn't need it, but I think a similar approach could work for you:
# my_colors must be defined first; the two colours here are only an illustrative choice
my_colors <- c(CTRL = "steelblue", STIM = "firebrick")
DoHeatmap(immune.combined, slot = 'data',features = top10_immune$gene, group.by = 'stim', assay = 'RNA') +
  scale_color_manual(
    values = my_colors,
    limits = c('CTRL', 'STIM'))
Let me know if this works! It doesn't solve the source of the odd text values but it does the job! If you haven't already, I would recommend creating a forum question on the Seurat forums to see where these characters are coming from!
I had the same problem with Seurat 4.0.
After I upgraded to 4.1, it disappeared.

Stop furrr::future_map from printing iteration AFTER finishing

I am using future_map to create several plots where I iterate through a list of variables and output/save a png file per variable to a folder. So there is no output that needs to be shown in the console or the "plot" pane.
The plotting part of the function:
ggplot(aes(sample = value,
           color = key)) +
  stat_qq(alpha = 0.8, size = 0.5) +
  theme_light() +
  theme(legend.position = "none") +
  stat_qq_line() +
  facet_wrap(~key,
             ncol = 4) +
  ggtitle(.var) +
  ggsave(filename = here::here(paste0(.path,
                                      .var,
                                      ".png")),
         units = "cm",
         width = 25,
         height = 10)}
How I map the function:
plan(multiprocess(workers = 10))
future_map(names_list,
~check_dists(df_lips_imputed, .x, "doc/distributions/testing2/"),
verbose = FALSE)
However, after all the files have been created (I can see them in the folder), the following is printed, slowly (it takes a while, as there are ~1k iterations):
[[1]]
[[2]]
[[3]]
...
Does anyone know how to suppress this output?
Many thanks!
If you install the development version of furrr with
devtools::install_github("DavisVaughan/furrr")
You can then use future_walk(), which relates to future_map() the way purrr::walk() relates to map(). walk() calls the function for its side effects and returns its input invisibly, so nothing is printed.
I was having the same issue. I'm not sure whether this will change the time it takes to print the list elements at the end, but if you assign your future_map call to a throwaway variable, the output is stored in that variable instead of being printed and clogging up your console or log file:
x <- future_map(names_list,
~check_dists(df_lips_imputed, .x, "doc/distributions/testing2/"),
verbose = FALSE)
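The [[1]], [[2]], ... lines are the list that future_map() returns. R auto-prints that list at the console once the call finishes. Assigning the result, as above, or wrapping the call in invisible() suppresses the printing.

The walk-versus-map distinction can be sketched in Python. Here walk() and save_plot() are hypothetical helpers, not furrr or purrr functions: a map-style call keeps one return value per element, even when every value is None, while a walk-style call only runs the function and hands back its input.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def walk(func: Callable[[T], object], items: List[T]) -> List[T]:
    """Call func on each item purely for its side effects and return the input,
    mirroring how purrr::walk() returns its .x argument."""
    for item in items:
        func(item)
    return items


saved: List[str] = []


def save_plot(name: str) -> None:
    # Hypothetical stand-in for check_dists(): records a "file" and returns nothing.
    saved.append(f"{name}.png")


names = ["var_a", "var_b", "var_c"]
mapped = list(map(save_plot, names))  # map-style: one None per element is kept
walked = walk(save_plot, names)       # walk-style: the input list comes back
print(mapped)
print(walked)
```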

R for loop executes last line before running previous lines

I have several text files that I read into R as data frames. I put those data frames in a list and analyse them in a for loop, but the code does not seem to run line by line. If I add some code at the end of the loop that may cause an error, the code before it in the loop does not run. Can anyone explain this?
Below is my code:
rm(list=ls())
graphics.off()
#cost.seq = 10^seq(2, -3, length = 20)
k=7
error = data.frame(cv = c(1:k))
train.error = data.frame(cv = c(1:k))
report.rows = 3
report.name = "rda"
source("scriptd_stats01_read_feature.R")
source("scriptd_stats01_read_feature_fc.R")
source("scriptd_stats02_cv_functions.R")
multimodal.feature = scale(cbind(spm.vbm, alff, reho, label.fa, tract.fa, tract.md))
feature.list = list(spm.vbm, alff, reho, label.fa, tract.fa, tract.md, fc, multimodal.feature)
library(reshape2)
for (i.feature in 1:length(feature.list)) {
  print("loop:")
  print(i.feature)
  brain.feature = scale(feature.list[[i.feature]])
  df.all = cbind(subject.info[,-1], brain.feature)
  report = data.frame(group = c("hc vs trauma", "ptsd vs trauma", "hc vs ptsd"), acc=rep(NA, report.rows),sensi=rep(NA,report.rows),speci=rep(NA,report.rows))
  report.sd = data.frame(group = c("hc vs trauma", "ptsd vs trauma", "hc vs ptsd"), acc=rep(NA, report.rows),sensi=rep(NA,report.rows),speci=rep(NA,report.rows))
  # ---------------------select data for ptsd 0 and 1 :---------------------
  df.subset = df.all[df.all$ptsd==0|df.all$ptsd==1,]
  print("dimension for subset dataset")
  print(dim(df.subset))
  print(table(df.subset$ptsd))
  x = model.matrix(df.subset$ptsd~., df.subset)
  y = df.subset$ptsd
  print(dim(x))
  set.seed(333)
  cv.result = rda.cv.fun(x[,-1], y, k)
  result = cv.result[[1]]
  train.result = cv.result[[2]]
  print(result)
  print(train.result)
  print("####")
  print(colMeans(result, na.rm=T))
  report[1,-1] = colMeans(result,na.rm=T)
  report.sd[1,-1] = apply(result, 2, function(x)sd(na.omit(x)))
  total.report = melt(report, id = ("acc", "sensi", "speci"))
  print(total.report)
}
When I comment out the last two lines of the for loop, it runs without error and prints loop 1, 2, 3, up to 8. But when I uncomment them, none of the earlier code in the loop runs (no values are printed), and I get the error below:
Error: unexpected ',' in:
"
total.report = melt(report, id = ("acc","
Execution halted
This is very weird: it seems that the last two lines were executed before the previous lines in the for loop.
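R does not run a for loop line by line as it reads it. The whole `for (...) { ... }` block is a single expression, and R must parse all of it successfully before it evaluates any part of it. `("acc", "sensi", "speci")` is not valid R syntax; a character vector needs `c("acc", "sensi", "speci")`. So parsing stops at that comma, and nothing inside the loop ever runs. The last lines were not executed first: the loop was never executed at all.

Python's compile() makes the same parse-then-run order visible. In the sketch below, the doubled comma is a stand-in chosen only because it is invalid Python, since a bare parenthesised list is a valid tuple in Python.

```python
import contextlib
import io

# A loop whose last line has a syntax error (a doubled comma),
# analogous to `id = ("acc", "sensi", "speci")` in the R code.
BROKEN_LOOP = '''
for i in range(1, 4):
    print("loop:", i)
    total = ("acc",, "sensi")
'''

FIXED_LOOP = '''
for i in range(1, 4):
    print("loop:", i)
    total = ("acc", "sensi")
'''


def run(source: str):
    """Parse `source` first, then execute it; return (captured stdout, error or None)."""
    out = io.StringIO()
    try:
        code = compile(source, "<loop>", "exec")  # parsing happens here, before anything runs
    except SyntaxError as err:
        return out.getvalue(), err
    with contextlib.redirect_stdout(out):
        exec(code, {})
    return out.getvalue(), None


printed, error = run(BROKEN_LOOP)
print(repr(printed), type(error).__name__)  # nothing was printed by the loop
printed, error = run(FIXED_LOOP)
print(repr(printed), error)
```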

Making simple R GUI with tcltk package

I'm trying to make a very simple GUI for my script. In a nutshell, the problem looks like this:
dataset is a data frame. I would like to plot one column as a time series and use a simple GUI to choose the next/previous column.
dataset <-data.frame(rnorm(10), rnorm(10), rnorm(10))
columnPlot <- function(dataset, i){
plot(dataset[, i])
}
How can I use tcltk to call columnPlot with different i's?
Not what you asked for (it is not tcltk-related), but I would advise you to have a look at the new shiny package from RStudio.
Are you particularly attached to the idea of using tcltk? I've been working on something similar using the gWidgets package and have had some success. According to its CRAN page, "gWidgets provides a toolkit-independent API for building interactive GUIs". The package can use tcltk or GTK2 as a backend, and I've been using the GTK2 one. Here's a quick example of a GUI with a spin button for changing i. I also added a little fanciness to your function: since you mentioned you would be plotting time series, I made the x axis Time.
data<-data.frame(rnorm(11),rnorm(11),rnorm(11))
i = 1
library(ggplot2)
# data is passed explicitly; a default of `data = data` would refer to itself
fplot <- function(i, data){
  TimeStart <- as.Date('1/1/2012', format = '%m/%d/%Y')
  plotdat <- data.frame(Value = data[ ,i], Time = seq(TimeStart,TimeStart + nrow(data) - 1, by = 1))
  myplot <- ggplot(plotdat, aes(x = Time, y = Value))+
    geom_line()
  print(myplot)
}
library(gWidgets)
options(guiToolkit = 'RGtk2')
window <- gwindow ("Time Series Plots", visible = T)
notebook <- gnotebook (cont = window)
group1 <- ggroup(cont = notebook, label = "Choose i", horizontal=F)
ichooser <- gspinbutton(cont = group1, from = 1, to = ncol(data), by = 1, value = i, handler = function(h,...){
  i <<- svalue(h$obj)})
plotbutton <- gbutton('Plot', cont = group1, handler=function(h,...){
  fplot(i, data)})
graphicspane1 <- ggraphics(cont = group1)
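The next/previous logic the question asks for does not depend on which toolkit draws the widgets. The program keeps the current column index between 1 and the number of columns, as gspinbutton(from = 1, to = ncol(data)) does above, and redraws the plot after each change.

Below is a minimal Python sketch of just that state logic. ColumnStepper is a hypothetical class, not part of tcltk or gWidgets. A button handler in any toolkit would call next_column() or previous_column() and then redraw with the returned index.

```python
class ColumnStepper:
    """Toolkit-independent state for a 'previous / next column' control.

    Hypothetical helper: keeps a 1-based column index inside 1..n_columns,
    the same bounds a spin button with from = 1, to = ncol(data) enforces.
    """

    def __init__(self, n_columns: int, start: int = 1):
        if n_columns < 1:
            raise ValueError("need at least one column")
        self.n_columns = n_columns
        self.index = min(max(start, 1), n_columns)  # clamp the starting column

    def next_column(self) -> int:
        self.index = min(self.index + 1, self.n_columns)  # stop at the last column
        return self.index

    def previous_column(self) -> int:
        self.index = max(self.index - 1, 1)  # stop at the first column
        return self.index


stepper = ColumnStepper(3)
print(stepper.next_column(), stepper.next_column(), stepper.next_column(),
      stepper.previous_column())
```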
