XPlot: trying to visualize data - plot not displayed

I am having a really hard time trying to visualize some data using F#. I am trying to achieve this on a Linux environment using Jupyter notebooks that I am running on localhost. I am following this article.
Everything seems to be fine: I managed to load all the needed script files, such as MathNet.Numerics and XPlot. I don't get any errors, my terminal is fine as well, and the kernel is in place. So why am I not getting any graph representation after I run my code?
It only says that I get back XPlot.Plotly.PlotlyChart, but what about the actual graph? I am not sure if this is enough information to help me out; if not, let me know and I will fill in the rest. I tried different browsers as well, which didn't help.
Actual code:
#load #"<project-root>/.paket/load/net45/MathNet.Numerics.fsx"
#load #"<project-root>/.paket/load/net45/MathNet.Numerics.FSharp.fsx"
#load #"<project-root>/.paket/load/net45/XPlot.Plotly.fsx"
open System
open System.Linq
open MathNet.Numerics.Distributions
open MathNet.Numerics.LinearAlgebra
open XPlot.Plotly
let n = 40
let nbsim = 1000
let lambda = 0.2
let randomSeed = 1111
let exponential = Exponential.Samples(new Random(randomSeed), lambda) |> Seq.take (n* nbsim) |> List.ofSeq
let m = Matrix<float>.Build.DenseOfRowMajor(nbsim, n, exponential)
let means = m.RowSums() / (float n)
means.Average()
let historyTrace =
    Histogram(
        x = means,
        xbins =
            Xbins(
                start = 2.8,
                ``end`` = 7.75,
                size = 0.08
            ),
        marker =
            Marker(
                color = "yellow",
                line =
                    Line(
                        color = "grey",
                        width = 1
                    )
            ),
        opacity = 0.75,
        name = "Exponential distribution"
    ) :> Trace

let meanTrace =
    Scatter(
        x = [5; 5],
        y = [0; 60],
        name = "Theoretical mean"
    ) :> Trace

// Or plain historyTrace below
[historyTrace; meanTrace]
|> Chart.Plot
|> Chart.WithXTitle("Means")
|> Chart.WithYTitle("Frequency")
|> Chart.WithTitle("Distribution of 1000 means of exponential distribution")
Please note that the #load statements include a <project-root> placeholder. I am using Paket to generate the scripts for #load.

This worked for me in an F# Azure Notebook.
Make sure to include this in a cell before you invoke the chart:
#load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
This is a quote from FSharp for Azure Notebooks:
Note that we had to #load two helper scripts in order to load the
assemblies we need and to enable Display to show our charts. The first
downloads and installs the required Paket packages, and the second
sets up Display support.
The key line for you is: #load "XPlot.Plotly.fsx"
That is the one that lets you display the chart in the notebook.
This is my code in the Azure notebook:
// cell 1
#load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
// cell 2
Paket.Package [ "MathNet.Numerics"
                "MathNet.Numerics.FSharp" ]
#load "Paket.Generated.Refs.fsx"
// cell 3
open System
open System.Linq
open MathNet.Numerics.Distributions
open MathNet.Numerics.LinearAlgebra
open XPlot.Plotly
let n = 40
let nbsim = 1000
let lambda = 0.2
let randomSeed = 1111
let exponential = Exponential.Samples(new Random(randomSeed), lambda) |> Seq.take (n* nbsim) |> List.ofSeq
let m = Matrix<float>.Build.DenseOfRowMajor(nbsim, n, exponential)
...
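For completeness, this is roughly how the chart from the question can be rendered once the two helper scripts have been loaded. This is only a sketch, not part of my original notebook; it assumes the Display helper mentioned in the quoted docs is in scope after #load "XPlot.Plotly.fsx":
// cell 4 (sketch): bind the chart and hand it to Display so the notebook renders it
let chart =
    [historyTrace; meanTrace]
    |> Chart.Plot
    |> Chart.WithXTitle("Means")
    |> Chart.WithYTitle("Frequency")
    |> Chart.WithTitle("Distribution of 1000 means of exponential distribution")

Display chart
Depending on the kernel, ending the cell with chart as its last value may render it as well.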

Related

DiagrammeR: Force horizontal arrows to be straight rather than diagonal

I am attempting to recreate this example as a test for a flow diagram for a project I am working on: https://dannyjnwong.github.io/STROBE-CONSORT-Diagrams-in-R/
That page shows that the code should result in a diagram that looks like this: https://dannyjnwong.github.io/figures/2018-02-12-STROBE-CONSORT-Diagrams-in-R/STROBE.png
However, when I run the exact same code in RStudio, the horizontal arrows do not render as horizontal; they curve downwards instead:
Is there any way to force these arrows to be straight and horizontal as they are in the GitHub example? Could it perhaps be related to the version of DiagrammeR? That post uses DiagrammeR_0.9.2 while mine uses DiagrammeR_1.0.6.1. I would like to avoid having to roll back my version of the package if possible. Thanks!
I use ortho splines with DiagrammeR to get horizontal lines in my flowcharts. I tried using add_global_graph_attrs with create_graph on your example, which produced horizontal lines but did not keep the architecture intact.
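For reference, that attempt looked roughly like this (a sketch only; it sets the global splines attribute on a graph object and is not the approach I ended up using below):
library(DiagrammeR)

# Sketch: force straight (ortho) edges globally on a DiagrammeR graph object
graph <- add_global_graph_attrs(
  create_graph(),
  attr = "splines", value = "ortho", attr_type = "graph"
)
# ... add nodes and edges, then render_graph(graph)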
Here is how I have made similar graphs. I use glue for convenience to insert specific values and text in the flowchart. Perhaps this may be helpful for you.
library(DiagrammeR)
library(glue)

n <- 1000
exclude1 <- 100
exclude2 <- 50
include1 <- n - exclude1 - exclude2

grViz(
  glue("digraph my_flowchart {{
    graph[splines = ortho]
    node [fontname = Helvetica, shape = box, width = 4, height = 1]

    node1[label = <Total available patients<br/>(n = {n})>]
    blank1[label = '', width = 0.01, height = 0.01]
    excluded1[label = <Excluded because of<br/>exclusion criteria (n={exclude1})>]
    node1 -> blank1[dir = none];
    blank1 -> excluded1[minlen = 2];
    {{ rank = same; blank1 excluded1 }}

    blank2[label = '', width = 0.01, height = 0.01]
    excluded2[label = <Excluded because of missing values (n={exclude2})>]
    blank1 -> blank2[dir = none];
    blank2 -> excluded2[minlen = 2];
    {{ rank = same; blank2 excluded2 }}

    node2[label = <Included for analysis<br/>(n={include1})>]
    blank2 -> node2;

    node3[label = <Data linked with<br/>external dataset>]
    node2 -> node3;
  }}")
)
Note: a couple of efforts have been made to construct CONSORT diagrams:
https://github.com/higgi13425/ggconsort
https://github.com/tgerke/ggconsort

Parallel computing on two R servers using batchtools/BatchJobs

I'm trying to use batchtools/BatchJobs for parallel computing on two Unix-based R servers. I'm completely new to this, so I followed a few articles and the package documentation. I have added some links below:
batchtools,
BatchJobs
So far I have not really understood how to use batchtools across multiple machines. On the other hand, with BatchJobs I have made better progress.
I made an SSH connection from the terminal first and then executed the following lines:
reg = makeRegistry("TestExp")
reg$cluster.functions = makeClusterFunctionsSSH(worker = makeSSHWorker(nodename="sla19438")) #By BatchJobs
#Test Function
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
set.seed(42)
piApprox(1000)
BatchJobs::batchMap(reg = reg, fun = piApprox, n = rep(1e7, 10))
getJobTable()
BatchJobs::submitJobs(reg = reg, resources = list(walltime = 3600, memory = 1024))
getStatus(reg = reg)
loadResult(reg = reg, id = 5)
mean(sapply(1:10, loadResult, reg = reg))
It works and gives me the results but I can't see any indication of the jobs being run on the other machine (sla19438) when I run "top" in the terminal.
Please help me understand what I'm doing wrong. Maybe there is some configuration needed, but I don't see any material online that breaks the steps down for a newbie like me.
Thanks

How to prepare image data for use in Torch

I want to prepare my own image data for training in Torch.
I tried to find a good source for this but could not find one.
The references I found point to data that has already been prepared in .lua or .t7 formats.
Can you please explain the procedure for preparing raw image data for Torch (training, validation and test sets)?
Thanks
You may try to write your own data loader class: store your image paths in a table and read each image using
require 'image'
YOUR_RGB_FILE_PATH = "/home/username/image.png"
img = image.load(YOUR_RGB_FILE_PATH, 3)
Write your Lua code in an iTorch notebook; it helps you debug quickly.
If you do not know how to start, you can refer to the project here, written with Lua Torch.
require 'io'
require 'torch'
require 'image'
------------------------------ Parameters ---------------------------------
file_name = '.../train.txt'
save_name = '.../train.t7'
num_images = 10000*3
num_channels = 3
width = 51
height = 51
---------------------------------------------------------------------------
file = io.open(file_name, 'rb')
data = torch.Tensor(num_images, num_channels, width, height):byte()
label = torch.Tensor(num_images):byte()
counter = 1
for line in file:lines() do
  print(counter)
  image_name, image_label = line:split(' ')[1], line:split(' ')[2]
  data[counter] = image.load(image_name, num_channels, 'byte')
  label[counter] = image_label
  counter = counter + 1
end
torch.save(save_name, {data = data, label = label})
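To read the prepared file back in later (for example from a training script), torch.load returns the same table that was saved; a minimal sketch:
-- Sketch: reload the prepared dataset
local dataset = torch.load(save_name)   -- same .t7 path as above
print(dataset.data:size())              -- num_images x num_channels x width x height
print(dataset.label:size())             -- num_images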

TensorFlow: 6-layer CNN: OOM (uses 10 GB of GPU memory)

I am using the following code to run a 6-layer CNN with 2 FC layers on top (on a Tesla K80 GPU).
Somehow it consumes the entire 10 GB of memory and dies with an out-of-memory error. I know that I can reduce the batch_size and then run, but I also want to run with 15 or 20 CNN layers. What's wrong with the following code, and why does it take all the memory? How should I run the code for a 15-layer CNN?
Code:
import os

import tensorflow as tf

import model

with tf.Graph().as_default() as g_train:
    filenames = tf.train.match_filenames_once(FLAGS.train_dir + '*.tfrecords')
    filename_queue = tf.train.string_input_producer(filenames, shuffle=True, num_epochs=FLAGS.num_epochs)
    feats, labels = get_batch_input(filename_queue, batch_size=FLAGS.batch_size)
    ### feats size=(batch_size, 100, 50)
    logits = model.inference(feats, FLAGS.batch_size)
    loss = model.loss(logits, labels, feats)
    tvars = tf.trainable_variables()
    global_step = tf.Variable(0, name='global_step', trainable=False)

    # Add to the Graph operations that train the model.
    train_op = model.training(loss, tvars, global_step, FLAGS.learning_rate, FLAGS.clip_gradients)

    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = model.evaluation(logits, labels, feats)

    summary_op = tf.merge_all_summaries()
    saver = tf.train.Saver(tf.all_variables(), max_to_keep=15)

    # The op for initializing the variables.
    init_op = tf.initialize_all_variables()

    sess = tf.Session()
    sess.run(init_op)
    summary_writer = tf.train.SummaryWriter(FLAGS.model_dir, graph=sess.graph)

    # Start input enqueue threads.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        step = 0
        while not coord.should_stop():
            _, loss_value = sess.run([train_op, loss])
            if step % 100 == 0:
                print('Step %d: loss = %.2f' % (step, loss_value))
                # Update the events file.
                summary_str = sess.run(summary_op)
                summary_writer.add_summary(summary_str, step)
            if (step == 0) or (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
                ckpt_model = os.path.join(FLAGS.model_dir, 'model.ckpt')
                saver.save(sess, ckpt_model, global_step=step)
                #saver.save(sess, FLAGS.model_dir, global_step=step)
            step += 1
    except tf.errors.OutOfRangeError:
        print('Done training for %d epochs, %d steps.' % (FLAGS.num_epochs, step))
    finally:
        coord.join(threads)
    sess.close()
###################### File model.py ####################
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2, s=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, s, s, 1], padding='SAME')

def inference(feats, batch_size):
    # feats size (batch_size, 100, 50, 1)  # batch_size = 256
    conv1_w = tf.get_variable("conv1_w", [filter_size, filter_size, 1, 256], initializer=tf.uniform_unit_scaling_initializer())
    conv1_b = tf.get_variable("conv1_b", [256])
    conv1 = conv2d(feats, conv1_w, conv1_b, 2)
    conv1 = maxpool2d(conv1, k=2, s=2)
    ### This was replicated for 6 layers and the 2 FC connected layers are added
    return logits

def training(loss, train_vars, global_step, learning_rate, clip_gradients):
    # Add a scalar summary for the snapshot loss.
    tf.scalar_summary(loss.op.name, loss)
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, train_vars, aggregation_method=1), clip_gradients)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.apply_gradients(zip(grads, train_vars), global_step=global_step)
    return train_op
I am not too sure what the model Python module is. If it is something you wrote and you can change the settings in the optimizer, I would suggest the following, which I use in my own code:
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cost, aggregation_method = tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
By default the aggregation_method is ADD_N, but if you change it to EXPERIMENTAL_ACCUMULATE_N or EXPERIMENTAL_TREE it can save a lot of memory. The main memory hog in these programs is that TensorFlow must save the output values at every neuron so that it can compute the gradients. Changing the aggregation_method helps a lot in my experience.
Also, BTW, I don't think there is anything wrong with your code. I can run out of memory on small conv-nets as well.
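If you keep the manual gradient clipping from the question's training function rather than switching to minimize, the same option can be passed to tf.gradients directly; a minimal sketch against the TF 1.x API used above:
# Sketch: same clipping pattern as in model.training, but with the symbolic
# AggregationMethod constant instead of a bare integer.
grads = tf.gradients(loss, train_vars,
                     aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
grads, _ = tf.clip_by_global_norm(grads, clip_gradients)
train_op = tf.train.AdamOptimizer(learning_rate).apply_gradients(
    zip(grads, train_vars), global_step=global_step)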

How would you index a table that is being initialized?

An example of what I desire:
local X = {["Alpha"] = 5, ["Beta"] = this.Alpha+3}
print(X.Beta) --> error: [string "stdin"]:1: attempt to index global 'this' (a nil value)
Is there a way to get this working, or a substitute I can use without too much code bloat? (I want it to look presentable, so fenv hacks are out of the picture.)
If anyone wants to take a crack at Lua, repl.it is a good testing site for quick scripts.
No, there is no way to do this, because the table does not yet exist and there is no notion of "self" in Lua (except via syntactic sugar for table methods). You have to do it in two steps:
local X = {["Alpha"] = 5}
X["Beta"] = X.Alpha+3
Note that you only need the square brackets if your key is not a string or if it is a string with characters other than any of [a-z][A-Z][0-9]_.
local X = {Alpha = 5}
X.Beta = X.Alpha+3
Update:
Based on what I saw on your pastebin, you probably should do this slightly differently:
local Alpha = 5
local X = {
  Alpha = Alpha,
  Beta = Alpha+3,
  Gamma = someFunction(Alpha),
  Eta = Alpha:method()
}
(Obviously Alpha has no methods here, because in this example it is a number, but you get the idea; I just wanted to show what it would look like if Alpha were an object.)
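If you want to keep the definition in a single expression, one lightweight substitute (not from the original answer, just a sketch) is a small constructor function that computes the derived fields:
-- Sketch: compute derived fields in a helper so the call site stays one expression
local function makeX(alpha)
  return { Alpha = alpha, Beta = alpha + 3 }
end

local X = makeX(5)
print(X.Beta) --> 8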
