R Interactive Sankey Diagram + Hierarchized Nodes

I am trying to visualize sequences of events using Sankey diagrams.
I have a set of events (Event1 to Event16) in sequences of different lengths.
The steps of the sequences are denoted T0, T0 - 1, T0 - 2, ...
The width of each flow corresponds to the frequency of the sequence.
I would like all the nodes corresponding to a given step to be aligned vertically.
Using the googleVis package I managed to obtain the following:
Sankey with GoogleVis
As you can see, some events T0 - 1, T0 - 2 and T0 - 3 are on the far right, instead of with the others of their time step.
It seems to be due to the fact that it is not possible to have nodes without children...
Do you know a way to get hierarchized nodes and/or nodes without children in googleVis?
If not, do you know another R package that supports these features for interactive plots?
My R code is below. The main variable containing the sequences is a list of lists, see picture.
Data containing sequences
My code:
# Packages
library(googleVis)
library(dplyr)
library(reshape2)
library(tidyverse)

# Load data (assumed to also provide the Support frequencies used below)
load("SeqCh")

# Loop -------------------------------------------------------------
# Inits
From = c()
To = c()
Freq = c()
Target = SeqCh

# Get maximum length of sequence
maxls = 0
for (kk in 1:length(Target)) {
  temp = length(Target[[kk]])
  if (temp > maxls) {
    maxls = temp
  }
}

# Loop on length of sequences
for (zz in 2:maxls) {
  # Suffix to add to distinguish the same event repeated at different steps
  if (zz == 2) {
    SuffixFrom = "(T0)"
    SuffixTo = "(T0 - 1)"
  } else {
    SuffixFrom = paste("(T0 - ", as.character(zz - 2), ")", sep = "")
    SuffixTo = paste("(T0 - ", as.character(zz - 1), ")", sep = "")
  }
  # Message
  cat("\n")
  print(paste(" Processing events from ", SuffixFrom, " to ", SuffixTo))
  # Keep only the sequences of length zz
  ind = lapply(Target, function(x) length(x) == zz)
  TargetSub = Target[unlist(ind)]
  FreqSub = Support[unlist(ind)]
  for (jj in 1:length(TargetSub)) {
    temp = TargetSub[[jj]]
    TempFrom = paste(temp[zz - 1], SuffixFrom, sep = " ")
    TempTo = paste(temp[zz], SuffixTo, sep = " ")
    From = c(From, TempFrom)
    To = c(To, TempTo)
    Freq = c(Freq, FreqSub[jj])
  }
} # end for loop on length of sequences

# All in same variable
Flows = data.frame("From" = From, "To" = To, "Occurence_Frequency" = Freq, stringsAsFactors = FALSE)

# Plot --------------------------------------------------------------------
plot(gvisSankey(Flows, from = 'From', to = 'To', weight = "Occurence_Frequency",
                options = list(height = 900, width = 1800,
                               sankey = "{link:{color:{fill:'lightblue'}}}")))
Thanks, Romain.
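If googleVis cannot be coaxed into this, one alternative worth a look is networkD3: its sankeyNetwork() has a sinksRight argument, and sinksRight = FALSE stops nodes without children from being pulled into the last column. A minimal, untested sketch, assuming the Flows data frame built in the code above:
library(networkD3)

# Build the 0-indexed node/link tables sankeyNetwork() expects
nodes <- data.frame(name = unique(c(Flows$From, Flows$To)))
links <- data.frame(source = match(Flows$From, nodes$name) - 1,
                    target = match(Flows$To, nodes$name) - 1,
                    value  = Flows$Occurence_Frequency)

# sinksRight = FALSE keeps childless nodes in their own time-step column
sankeyNetwork(Links = links, Nodes = nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "name",
              sinksRight = FALSE)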


%dopar% safe way of write to csv inside foreach loop

[EDITED]
It is a general question: I have seen some posts saying that it is not a good idea to use write.csv inside a foreach loop, because different cores may try to write to the file at the same time, resulting in missing results. Still, I need to write to an external file inside the parallel loop to get my output (500,000+ rows and 10+ columns); otherwise, it crashes due to memory issues. So I would like to know whether there is a safer way to write a results file within a foreach loop.
I appreciate any help on this.
I am adding some more info and much simpler code and data than what I actually have.
Description: I have two different polygon layers (sf, polygon), each with 500,000+ features. I need to calculate the area of different raster classes (1 raster layer with 3 classes) within each one of the polygons. This is the most time-consuming part of the script, specifically because I need to call sf::st_intersection multiple times. Then, I use many different combinations of if-else rules to populate a data frame with values.
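As an aside: for the specific task of tabulating raster-class coverage inside polygons, the exactextractr package (not used here, so this is an untested sketch) can avoid the st_intersection calls entirely, assuming the sample_raster and sample_poly objects defined below:
# Untested sketch: per-polygon coverage of each raster class via
# exactextractr, with no st_intersection calls
library(exactextractr)
class_cover <- exact_extract(sample_raster, sample_poly,
                             function(vals, frac) tapply(frac, vals, sum))
# class_cover[[k]] gives, for polygon k, the summed coverage fraction of
# cells per class; multiply by the cell area to turn this into an area.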
This is the original code, which runs into memory issues with the original data:
require(sf)
require(raster)
require(rgdal)
require(rgeos)
require(dplyr)
require(stars)

## Sample data
set.seed(131)
sample_raster = raster(nrows = 1, ncols = 1, res = 0.5, xmn = 0, xmx = 11, ymn = 0, ymx = 11)
values(sample_raster) = rep(1:3, length.out = ncell(sample_raster))
crs(sample_raster) = CRS('+init=EPSG:4326')
plot(sample_raster, axes = TRUE)
sample_raster
##
m = rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
p = st_polygon(list(m))
n = 100
l = vector("list", n)
for (i in 1:n)
  l[[i]] = p + 10 * runif(2)
sample_poly = st_sfc(l)
data = data.frame(PR_ID = seq(1:100),
                  COND1 = rep(1:10, length.out = 100))
sample_poly = st_sf(cbind(data, sample_poly))
plot(sample_poly, col = sf.colors(categorical = TRUE, alpha = .5), add = TRUE)
sample_poly = sample_poly %>% st_set_crs(4326)
sample_poly
##
## Code
require(parallel)
require(foreach)
require(doParallel)

idall = as.character(sample_poly$PR_ID)
area = as.numeric(st_area(sample_poly)) / 10000

# i = 1
# listID = idall
# mainpoly = sample_poly
# mainras = sample_raster
# mainpolyarea = area

per.imovel.paralallel = function(listID, mainpoly, mainras, mainpolyarea) { # Starting the function
  ## Set the parallel workers up on your computer
  UseCores = detectCores() - 1
  cl = parallel::makeCluster(UseCores, output = "")
  doParallel::registerDoParallel(cl)
  writeLines(c(""), "log.txt") # Creates a LOG FILE in the folder to follow processing
  FOREACH.RESULT = foreach(i = 1:length(listID),
                           .packages = c('raster', 'rgdal', 'rgeos', 'dplyr', 'parallel',
                                         'doParallel', 'sf', 'stars'),
                           .inorder = TRUE, .combine = 'rbind') %dopar%
  { # Start of the parallel loop
    sink("log.txt", append = TRUE) # LOG FILE in the home folder
    cat(paste(i, "of", length(listID), as.character(Sys.time()), "\n")) # Write to LOG FILE
    sink() # end diversion of output
    ########################
    ### Pick one poly
    px = sf::st_buffer(mainpoly[mainpoly$PR_ID == listID[i], ], # Select the geometry with PR_ID in position i
                       dist = 0.1)
    ########################
    ### Intersect with raster and get area
    px2 = sf::st_buffer(px, dist = 0.1) # Buffer because raster::mask() masks out partially covered cells, since it calls rasterize() first
    desm_prop = raster::crop(mainras, as_Spatial(px2))
    desm_prop_shp = if (all(is.na(values(desm_prop)))) { NULL
    } else { sf::st_intersection(st_cast(sf::st_as_sf(stars::st_as_stars(desm_prop)), "POLYGON"), px) }
    names(desm_prop_shp)[1] = if (any(names(desm_prop_shp) == "layer")) { "values"
    } else { NULL }
    desm_prop_bet0108 = if (is.null(desm_prop_shp)) { NULL
    } else { desm_prop_shp[desm_prop_shp$values == 1, ] }
    desm_prop_bet0108 = if (is.null(desm_prop_bet0108) | length(desm_prop_bet0108) == 0) { NULL
    } else if (length(desm_prop_bet0108$values) == 0) { NULL
    } else { desm_prop_bet0108 }
    desm_prop_after08 = if (is.null(desm_prop_shp)) { NULL
    } else { desm_prop_shp[desm_prop_shp$values == 2, ] }
    desm_prop_after08 = if (is.null(desm_prop_after08) | length(desm_prop_after08) == 0) { NULL
    } else if (length(desm_prop_after08$values) == 0) { NULL
    } else { desm_prop_after08 }
    desm_prop_upto00 = if (is.null(desm_prop_shp)) { NULL
    } else { desm_prop_shp[desm_prop_shp$values == 3, ] }
    desm_prop_upto00 = if (is.null(desm_prop_upto00) | length(desm_prop_upto00) == 0) { NULL
    } else if (length(desm_prop_upto00$values) == 0) { NULL
    } else { desm_prop_upto00 }
    area_desm_prop_bet0108 <- if (is.null(desm_prop_bet0108)) { 0
    } else { sum(as.numeric(sf::st_area(desm_prop_bet0108) / 10000)) } # Deforestation area in px, 2001 - 2008
    area_desm_prop_after08 <- if (is.null(desm_prop_after08)) { 0
    } else { sum(as.numeric(sf::st_area(desm_prop_after08) / 10000)) } # Deforestation area in px, after 2008
    area_desm_prop_upto00 <- if (is.null(desm_prop_upto00)) { 0
    } else { sum(as.numeric(sf::st_area(desm_prop_upto00) / 10000)) } # Deforestation area in px, up to 2000
    ########################
    # RESULTS
    TEMP.RESULTS = data.frame(PR_ID = as.character(listID[i]),
                              PR_AREA_HA = mainpolyarea[i],
                              PR_D09 = area_desm_prop_after08,
                              PR_D0108 = area_desm_prop_bet0108,
                              PR_D00 = area_desm_prop_upto00)
    return(TEMP.RESULTS)
  } # End of the parallel loop
  parallel::stopCluster(cl) # stop cluster (must run before the return, or it never runs)
  stopImplicitCluster()
  gc()
  return(FOREACH.RESULT)
} # Ending the function
#####################################################################################################
results_feach = per.imovel.paralallel (listID = idall, mainpoly = sample_poly, mainras = sample_raster, mainpolyarea = area)
warnings()
I have also tried @mischva11's (modified) suggestion by adding this:
length_of_chunk = round(length(idall) / (length(idall) / 10)) # number of chunks (here 10 chunks of 10 IDs)
lchunks = split(idall, sort(rep_len(1:length_of_chunk, length(idall))))
for (z in 1:length_of_chunk) {
  # split up the data in chunks
  idall_chunk = as.vector(unlist(lchunks[z]))
  results_chunk = per.imovel.paralallel(listID = idall_chunk, mainpoly = sample_poly,
                                        mainras = sample_raster, mainpolyarea = area)
  # save your foreach results for each chunk, appending after the first one
  if (z == 1) { write.table(results_chunk, file = "TESTDATAresults1.csv")
  } else { write.table(results_chunk, file = "TESTDATAresults1.csv", append = TRUE, col.names = FALSE) }
  print(NULL) # print(results_chunk)
}
It works like a charm for this example.
BUT, I have a setback when running it with the real script/data: it takes ages for the foreach to close. I am watching my machine's performance and the log file: after processing all lines of my sf object, my CPU usage goes down as expected, but it still takes more than 30 minutes (I did not wait for it to finish completely) to close the foreach function.
Because of this, I thought about writing the output on the fly inside the foreach loop. But clearly that is not a good idea, as explained here. I have seen some posts about the package 'flock', which locks the output file for writing. I have not tested it, but it sounds promising.
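For what it's worth, a minimal sketch of that idea (untested, as said above; it assumes flock's lock()/unlock() API, and the file names are illustrative):
library(foreach)
library(doParallel)
library(flock)

registerDoParallel(2)
lockfile <- tempfile() # lock file shared by all workers

invisible(foreach(i = 1:100, .packages = "flock") %dopar% {
  res <- data.frame(id = i, value = sqrt(i)) # stand-in for the real per-polygon work
  lck <- flock::lock(lockfile)               # blocks until no other worker holds the lock
  write.table(res, "results.csv", append = TRUE,
              col.names = !file.exists("results.csv"))
  flock::unlock(lck)
  NULL # return nothing, so no results accumulate in memory
})
stopImplicitCluster()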
The problem here is that you need communication between the cores: one core has to wait for the next one until it has finished writing to the csv. That's not easily done, and not possible within foreach as far as I know (foreach does offer ordered combining via the .inorder argument, TRUE by default). You say you have memory issues, so one solution is to chunk up your output if possible. I do not have a good dataset for this example, so I use mtcars.
library(foreach)
library(parallel)
library(doParallel)

registerDoParallel(4)

# split your output here, I use 5 chunks; my data is mtcars
n_chunks <- 5
chunk_size <- ceiling(nrow(mtcars) / n_chunks)
for (z in 1:n_chunks) {
  # here the data gets split up: the rows belonging to chunk z
  first <- (z - 1) * chunk_size + 1
  last <- min(z * chunk_size, nrow(mtcars))
  data <- mtcars[first:last, ]
  # foreach over the rows of this chunk
  results <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
    # ***your code***
    y = data[i, ]
    return(y)
  }
  print(results)
  # save your foreach results and then begin again
  if (z == 1) { write.table(results, file = "test.csv")
  } else { write.table(results, file = "test.csv", append = TRUE, col.names = FALSE) }
}

How to read a .MAP file extension in R?

Is there a simple way to read a file with a .MAP extension in R? I have tried a few options below but had no success. Here is a .MAP file for a reproducible example.
Context: for some odd reason, the spatial regionalization used in health planning policies in Brazil is only available in this format. I would like to convert it to a geopackage so we can add it to the geobr package.
# none of these options work
mp <- sf::st_read("./se_mapas_2013/se_regsaud.MAP")
mp <- rgdal::readGDAL("./se_mapas_2013/se_regsaud.MAP")
mp <- rgdal::readOGR("./se_mapas_2013/se_regsaud.MAP")
mp <- raster::raster("./se_mapas_2013/se_regsaud.MAP")
mp <- stars::read_stars("./se_mapas_2013/se_regsaud.MAP")
PS: there is a similar question on SO focused on Python, unfortunately unanswered.
UPDATE
We have found a publication that uses a custom function that reads the .MAP file. See example below. However, it returns a "polylist" object. Is there a simple way to convert it to a simple feature?
original custom function
read.map = function(filename) {
  zz = file(filename, "rb")
  #
  # header of .map
  #
  versao = readBin(zz, "integer", 1, size = 2) # 100 = version 1.00
  # Bounding box
  Leste = readBin(zz, "numeric", 1, size = 4)  # east
  Norte = readBin(zz, "numeric", 1, size = 4)  # north
  Oeste = readBin(zz, "numeric", 1, size = 4)  # west
  Sul = readBin(zz, "numeric", 1, size = 4)    # south
  geocodigo = ""
  nome = ""
  xleg = 0
  yleg = 0
  sede = FALSE
  poli = list()
  i = 0
  #
  # repeat for each object in the file
  #
  repeat {
    tipoobj = readBin(zz, "integer", 1, size = 1) # 0 = polygon, 1 = polygon with seat, 2 = line, 3 = point
    if (length(tipoobj) == 0) break
    i = i + 1
    Len = readBin(zz, "integer", 1, size = 1) # length byte of the Pascal string
    geocodigo[i] = readChar(zz, 10)
    Len = readBin(zz, "integer", 1, size = 1) # length byte of the Pascal string
    nome[i] = substr(readChar(zz, 25), 1, Len)
    xleg[i] = readBin(zz, "numeric", 1, size = 4)
    yleg[i] = readBin(zz, "numeric", 1, size = 4)
    numpontos = readBin(zz, "integer", 1, size = 2)
    sede = sede || (tipoobj == 1) # comparison; the original had '=', an assignment
    x = 0
    y = 0
    for (j in 1:numpontos) {
      x[j] = readBin(zz, "numeric", 1, size = 4)
      y[j] = readBin(zz, "numeric", 1, size = 4)
    }
    # separate polygons
    xInic = x[1]
    yInic = y[1]
    for (j in 2:numpontos) {
      if (x[j] == xInic & y[j] == yInic) { x[j] = NA; y[j] = NA }
    }
    poli[[i]] = c(x, y)
    dim(poli[[i]]) = c(numpontos, 2)
  }
  class(poli) = "polylist"
  attr(poli, "region.id") = geocodigo
  attr(poli, "region.name") = nome
  attr(poli, "centroid") = list(x = xleg, y = yleg)
  attr(poli, "sede") = sede
  attr(poli, "maplim") = list(x = c(Oeste, Leste), y = c(Sul, Norte))
  close(zz)
  return(poli)
}
using original custom function
mp <- read.map("./se_mapas_2013/se_regsaud.MAP")
class(mp)
>[1] "polylist"
# plot
plot(attributes(mp)$maplim, type='n', asp=1, xlab=NA, ylab=NA)
title('Map')
lapply(mp, polygon, asp=T, col=3)
The problems were: the use of readChar() with trailing nul bytes (changed to readBin()); 8-bit characters that rawToChar() would not accept (on my UTF-8 system); multiple slivers in some files that needed dropping; and some others. I added the edited read.map() function above to maptools, but with a different name and not exported. So now (with maptools rev 370 from https://r-forge.r-project.org/R/?group_id=943 once the build completes):
library(maptools)
o <- maptools:::readMAP2polylist("se_regsaud.MAP")
oo <- maptools:::.makePolylistValid(o)
ooo <- maptools:::.polylist2SpP(oo, tol=.Machine$double.eps^(1/4))
rn <- row.names(ooo)
df <- data.frame(ID=rn, row.names=rn, stringsAsFactors=FALSE)
res <- SpatialPolygonsDataFrame(ooo, data=df)
library(sf)
res_sf <- st_as_sf(res)
res_sf
plot(st_geometry(res_sf))
This approach re-uses the maptools code dating back almost twenty years, with minor edits to handle subsequent changes in reading binary files, and fixing slivers.
EDIT: looks like this doesn't work generally across all files so proper conversion to sf would need a deeper look.
Here's a quick stab at resurrecting it. It might be incorrect to cumulatively sum to get the multi-linestrings; I tested with se_municip.MAP and it only had NAs as the closing row of each ring. If a file potentially has non-connected multi-rings (multipolygons), this approach won't work completely.
x <- read.map("se_municip.MAP")
df <- setNames(as.data.frame(do.call(rbind, x)), c("x", "y"))
df$region.name <- rep(attr(x, "region.name"), unlist(lapply(x, nrow)))
## in case there are multi-rings
df$linestring_id <- cumsum(c(0, diff(is.na(df$x))))
df$polygon_id <- as.integer(factor(df$region.name))
df <- df[!is.na(df$x), ]
sfx <- sfheaders::sf_polygon(df, x = "x", y = "y", linestring_id = "linestring_id", polygon_id = "polygon_id", keep = TRUE)
#sf::st_crs(sfx) <- sf::st_crs(<whatever it is probably 4326>)
plot(sf::st_geometry(sfx), reset = FALSE)
maps::map(add = TRUE)
Interesting that you came across an official version of a forgotten legacy!
(BTW can I publish the data sets in a package?)

Properly calling variable name when creating multiple Benford plots

I am creating Benford plots for all the numeric variables in my dataset. https://en.wikipedia.org/wiki/Benford%27s_law
Running a single variable
#install.packages("benford.analysis")
library(benford.analysis)
plot(benford(iris$Sepal.Length))
Looks great. And the legend says "Dataset: iris$Sepal.Length", perfect!
Using apply to run 4 variables,
apply(iris[1:4], 2, function(x) plot(benford(x)))
creates four plots; however, each plot's legend says "Dataset: x".
I attempted to use a for loop,
for (i in colnames(iris[1:4])) {
  plot(benford(iris[[i]]))
}
This creates four plots, but now the legend says "Dataset: iris[[i]]". I would like the name of the variable on each chart.
I tried a different loop, hoping to get titles with an evaluated parsed string like "iris$Sepal.Length":
for (i in colnames(iris[1:4])) {
  plot(benford(eval(parse(text = paste0("iris$", i)))))
}
But now the legend says "Dataset: eval(parse(text=paste0("iris$", i)))".
And now I've run into the infamous eval(parse(text = paste0(...))) pattern (e.g. How to "eval" results returned by "paste0"? and R: eval(parse(...)) is often suboptimal).
I would like labels such as "Dataset: iris$Sepal.Length" or "Dataset: Sepal.Length". How can I create multiple plots with meaningful variable names in the legend?
This is happening because of the first line within the benford function:
benford <- function(data, number.of.digits = 2, sign = "positive", discrete = TRUE, round = 3){
  data.name <- as.character(deparse(substitute(data)))
Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R
data.name is then used to name your graph. Whatever variable name or expression you pass to the function will unfortunately be captured by the deparse(substitute()) call and used as the name for your graph.
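A quick illustration of that capture (a standalone sketch, not from the package):
# deparse(substitute(x)) records the literal expression the caller wrote
show_name <- function(data) deparse(substitute(data))
show_name(iris$Sepal.Length) # "iris$Sepal.Length"
x <- iris$Sepal.Length
show_name(x)                 # "x" -- inside apply() or a loop, this is all you get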
One short-term solution is to copy and rewrite the function:
#install.packages("benford.analysis")
library(benford.analysis)
#install.packages("data.table")
library(data.table) # needed for function
# load hidden functions into the namespace - needed for the function
r <- unclass(lsf.str(envir = asNamespace("benford.analysis"), all = TRUE))
for (name in r) eval(parse(text = paste0(name, '<-benford.analysis:::', name)))

benford_rev <- function(...){} # placeholder; see the full definition below

for (i in colnames(iris[1:4])) {
  plot(benford_rev(iris[[i]], data.name = i))
}
This has negative side effects of:
Not being maintainable across package revisions
Filling your GlobalEnv with normally hidden functions from the package
So hopefully someone can propose a better way!
benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete = TRUE, round = 3,
                        data.name = as.character(deparse(substitute(data)))){ # changed
  # removed line
  benford.digits <- generate.benford.digits(number.of.digits)
  benford.dist <- generate.benford.distribution(benford.digits)
  empirical.distribution <- generate.empirical.distribution(data, number.of.digits, sign,
                                                            second.order = FALSE, benford.digits)
  n <- length(empirical.distribution$data)
  second.order <- generate.empirical.distribution(data, number.of.digits, sign,
                                                  second.order = TRUE, benford.digits,
                                                  discrete = discrete, round = round)
  n.second.order <- length(second.order$data)
  benford.dist.freq <- benford.dist * n
  ## calculating useful summaries and differences
  difference <- empirical.distribution$dist.freq - benford.dist.freq
  squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2) / benford.dist.freq
  absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)
  ### chi-squared test
  chisq.bfd <- chisq.test.bfd(squared.diff, data.name)
  ### MAD
  mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist) / (length(benford.dist)))
  if (number.of.digits > 3) {
    MAD.conformity <- NA
  } else {
    digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]
    MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
  }
  ### Summation
  summation <- generate.summation(benford.digits, empirical.distribution$data,
                                  empirical.distribution$data.digits)
  abs.excess.summation <- abs(summation - mean(summation))
  ### Mantissa
  mantissa <- extract.mantissa(empirical.distribution$data)
  mean.mantissa <- mean(mantissa)
  var.mantissa <- var(mantissa)
  ek.mantissa <- excess.kurtosis(mantissa)
  sk.mantissa <- skewness(mantissa)
  ### Mantissa Arc Test
  mat.bfd <- mantissa.arc.test(mantissa, data.name)
  ### Distortion Factor
  distortion.factor <- DF(empirical.distribution$data)
  ## recovering the lines of the numbers
  if (sign == "positive") lines <- which(data > 0 & !is.na(data))
  if (sign == "negative") lines <- which(data < 0 & !is.na(data))
  if (sign == "both") lines <- which(data != 0 & !is.na(data))
  # lines <- which(data %in% empirical.distribution$data)
  ## output
  output <- list(info = list(data.name = data.name,
                             n = n,
                             n.second.order = n.second.order,
                             number.of.digits = number.of.digits),
                 data = data.table(lines.used = lines,
                                   data.used = empirical.distribution$data,
                                   data.mantissa = mantissa,
                                   data.digits = empirical.distribution$data.digits),
                 s.o.data = data.table(second.order = second.order$data,
                                       data.second.order.digits = second.order$data.digits),
                 bfd = data.table(digits = benford.digits,
                                  data.dist = empirical.distribution$dist,
                                  data.second.order.dist = second.order$dist,
                                  benford.dist = benford.dist,
                                  data.second.order.dist.freq = second.order$dist.freq,
                                  data.dist.freq = empirical.distribution$dist.freq,
                                  benford.dist.freq = benford.dist.freq,
                                  benford.so.dist.freq = benford.dist * n.second.order,
                                  data.summation = summation,
                                  abs.excess.summation = abs.excess.summation,
                                  difference = difference,
                                  squared.diff = squared.diff,
                                  absolute.diff = absolute.diff),
                 mantissa = data.table(statistic = c("Mean Mantissa",
                                                     "Var Mantissa",
                                                     "Ex. Kurtosis Mantissa",
                                                     "Skewness Mantissa"),
                                       values = c(mean.mantissa = mean.mantissa,
                                                  var.mantissa = var.mantissa,
                                                  ek.mantissa = ek.mantissa,
                                                  sk.mantissa = sk.mantissa)),
                 MAD = mean.abs.dev,
                 MAD.conformity = MAD.conformity,
                 distortion.factor = distortion.factor,
                 stats = list(chisq = chisq.bfd,
                              mantissa.arc.test = mat.bfd))
  class(output) <- "Benford"
  return(output)
}
I have just updated the package (GitHub version) to allow for a user-supplied name.
The function now has a new parameter called data.name in which you can provide a character vector with the name of the data, overriding the default. Thus, for your example you can simply run the following code.
First install the GitHub version (I will submit this version to CRAN soon).
devtools::install_github("carloscinelli/benford.analysis") # install new version
Now you can provide the name of the data inside the for loop:
library(benford.analysis)
for (i in colnames(iris[1:4])) {
  plot(benford(iris[[i]], data.name = i))
}
And all the plots will have the correct names, as you wanted (below).
Created on 2019-08-10 by the reprex package (v0.2.1)

R baseline package saving plots in a loop

I'm trying to optimize the parameters for baseline correction in the R baseline package by changing each parameter in a loop and comparing plots to determine which parameters give me the best baseline.
I currently have the code written so that the loop produces each plot, but I'm having trouble saving the plots, since the class of each object I'm creating is specific to the baseline package (which I suspect is the problem here).
library(baseline)
data(milk)
foo <- data.frame(Date = seq.Date(as.Date("1957-01-01"), by = "day",
                                  length.out = ncol(milk$spectra)),
                  Visits = milk$spectra[1, ],
                  Old_baseline_visits = milk$spectra[1, ], row.names = NULL)
foo.t <- t(foo$Visits)
# the lines above were copied from https://stackoverflow.com/questions/37346967/r-packagebaseline-application-to-sample-dataset to make a reproducible dataset

df <- expand.grid(lambda = seq(1, 10, 1), p = seq(0.01, 0.1, 0.01))
baselinediff <- list()
for (i in 1:nrow(df)) {
  thislambda <- df[i, ]$lambda
  thisp <- df[i, ]$p
  thisplot <- baseline(foo.t, lambda = thislambda, p = thisp, maxit = 20, method = 'als')
  print(paste0("lambda = ", thislambda))
  print(paste0("p = ", thisp))
  print(paste0("index = ", i))
  baselinediff[[i]] <- plot(thisplot)
  jpeg(file = paste(baselinediff[[i]], '.jpeg', sep = ''))
  dev.off()
}
I know that I would be able to extract the corrected spectra using baseline.als, but I just want to save the plot images with the red baseline so that I can see how well the baselines are being drawn. Any baseline users out there who can help?
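For reference, a short sketch of that extraction (assuming the accessor functions the baseline package provides for its result objects):
# Sketch: pulling the fitted baseline and corrected spectra out of a
# baseline object (accessors from the baseline package)
bl <- baseline(foo.t, lambda = 1, p = 0.01, maxit = 20, method = 'als')
corrected <- getCorrected(bl) # baseline-corrected spectra
fitted_bl <- getBaseline(bl)  # the fitted baseline itself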
I suggest you change your loop in the following way:
for (i in 1:nrow(df)) {
  thislambda <- df[i, ]$lambda
  thisp <- df[i, ]$p
  thisplot <- baseline(foo.t, lambda = thislambda, p = thisp, maxit = 20, method = 'als')
  print(paste0("lambda = ", thislambda))
  print(paste0("p = ", thisp))
  print(paste0("index = ", i))
  baselinediff[[i]] <- thisplot
  jpeg(file = paste('baseline', i, '.jpeg', sep = ''))
  plot(baselinediff[[i]])
  dev.off()
}
Note that this does not try to capture the already-plotted element (thisplot) inside the list. Instead, the plotting is done after you call the jpeg command, which solves your export issue. Another problem was the naming of the file: if you call baselinediff[[i]] inside paste, you end up with an error, so I switched to a simpler name. To plot your resulting list, call:
lapply(baselinediff, plot)
If you are determined to store the already-plotted element, the capture.plot function from the imager package might be a good start.
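A sketch of that route (untested; it assumes imager's capture.plot() captures the current device, and that imager's save.image(), which masks base R's function of the same name, writes it out):
# Sketch: snapshot the current device as a cimg and write it to disk
library(imager)
plot(baselinediff[[1]])
img <- capture.plot()              # capture what was just drawn
save.image(img, "baseline1.jpeg")  # imager::save.image(), not base::save.image()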

R WGCNA Cytoscape hub genes

I have the following problem with
WGCNA - http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/
working on Section 1.6, Export of networks to external software (Cytoscape).
I'm currently trying to perform WGCNA on a set of genes, and I'm having trouble getting the top x hub genes for each module. I am trying to export a network to Cytoscape and used the same method for getting the top x hub genes as outlined for exporting to VisANT.
# Select modules (only interested in one for now)
modules = c("greenyellow")
# Select module probes
probes = names(datExpr)
inModule = is.finite(match(bwModuleColors, modules))
modProbes = probes[inModule]
modGenes = annot$gene_symbols[match(modProbes, annot$geneID)]
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
# Restrict the network to the top 30 genes
nTop = 30
IMConn = softConnectivity(datExpr[, modProbes])
top = (order(-IMConn) <= nTop)
# Export the network into a format that Cytoscape can read
cyt = exportNetworkToCytoscape(modTOM[top, top],
                               edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse = "-"), ".txt", sep = ""),
                               nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse = "-"), ".txt", sep = ""),
                               weighted = TRUE,
                               threshold = 0.02,
                               nodeNames = modProbes,
                               altNodeNames = modGenes,
                               nodeAttr = bwModuleColors[inModule])
I've written a short loop to count the number of connections to each gene, and it works as expected, but the xth gene consistently has zero connections. Let's say that x is 30. If I increase the cutoff to 31 hub genes, the 30th gene now shows connections to the others in the network, but the 31st gene shows nothing. In addition, this change both increases and decreases the number of connections to other genes in the network. This really bothers me, because connections should only be added (the network is getting bigger by one gene), and the changes should be accounted for by the 30th gene, but this is not the case in the output.
# Split the cytoscape file into two parts: edge and node
node <- cyt$nodeData
edge <- cyt$edgeData
# The limit covers all of the connections in the edge file, determined by the length of the column 'fromNode'
limit <- length(edge$fromNode)
# Create an empty list to store the counts for each gene
counts = list()
# Loop over the genes, from 1 to the number of genes specified for the network, 'nTop'
for (i in 1:nTop) {
  # Reset the count for each new gene and get the name of the gene of interest
  name = node$nodeName[[i]]
  count = 0
  # Nested loop that searches for matches to the gene in question in both the 'fromNode'
  # and 'toNode' columns, and adds one to the count for each match
  for (j in 1:limit) {
    matchName1 = edge$fromNode[[j]]
    matchName2 = edge$toNode[[j]]
    if (name == matchName1 || name == matchName2) {
      count = count + 1
    }
  }
  # Create a string for the attribute in the correct format
  attribute <- paste(name, "=", count)
  # Add the count to the list
  counts <- c(counts, attribute)
}
# End of loop
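As an aside (not from the original post), the nested counting loop can be replaced with a single vectorized call, sketched against the edge/node tables above:
# Vectorized alternative: tabulate how often each node appears in either
# endpoint column of the edge list
counts_vec <- table(c(as.character(edge$fromNode), as.character(edge$toNode)))
counts_vec[node$nodeName] # per-gene connection counts; genes absent from the edge list show NA (zero connections)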
The loop seems to be working as expected, so I'm thinking that the problem is with the network construction. I'm currently referring back to what I know about linear algebra, matrices and topology to try to see if the problem is the way they're being sorted or something like that, but it might just be the way that the exportNetworkToCytoscape() function works.
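One plausible explanation, consistent with the answer code below (which uses rank() where the question's code uses order()): order(-IMConn) <= nTop does not select the nTop most-connected genes, because order() returns sorted positions rather than ranks. A small illustration:
# rank() gives a mask aligned with the original vector; order() does not
IMConn <- c(5, 20, 1, 10)
order(-IMConn) <= 2 # TRUE FALSE TRUE FALSE -- picks the genes with values 5 and 1
rank(-IMConn) <= 2  # FALSE TRUE FALSE TRUE -- picks 20 and 10, the two most connected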
modules = "brown";
probes = rownames(datExpr_human) ======> data genes in row and samples in column.
inModule = is.finite(match(modules_human,modules))
modTOM = dissTOM_Human[inModule, inModule];
modProbes = probes[inModule];
dimnames(modTOM) = list(modProbes, modProbes)
nTop = 30;
datExpr = t(datExpr_human)
IMConn = softConnectivity(datExpr[, modProbes]);
top = (rank(-IMConn) <= nTop)
cyt = exportNetworkToCytoscape(modTOM[top, top],
edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse="-"), ".txt", sep=""),
nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse="-"), ".txt", sep=""),
weighted = TRUE)
