I want to convert my artificial neural network implementations to the new TensorFlow 2 platform, where Keras is now an integral part (tf.keras). Are there any recommended sources that explain how to implement ANNs using TensorFlow 2/tf.keras within R?
Furthermore, why is there a separate keras package from F. Chollet available, when Keras is, as mentioned, now an integral part of TensorFlow?
Sorry for such basic questions, but my own searches were unfortunately unsuccessful.
From original tensorflow documentation I extract the following Python code:
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
My own R conversions are
library(tensorflow)
k <- tf$keras
l <- k$layers
input1 <- k$layers$Input(shape = c(16,?))
x1 <- k$layers$Dense(units = 8, activation = "relu") (input1)
input2 <- k$layers$Input(shape = c(32,?))
x2 <- k$layers$Dense(units = 8, activation = "relu") (input2)
added <- k$layers$add(inputs = c(x1,x2))
I hope my question doesn't seem too basic, but I'm having trouble translating a Python tuple (or scalar) into its R equivalent. So my question: how must the shape argument of the input layers be written in R?
I think the following page should provide the answer to your question: https://blogs.rstudio.com/ai/posts/2019-10-08-tf2-whatchanges/.
In essence, your code should stay the same if you are using the keras R package version 2.2.4.1 or above. For more details, refer to the linked site above.
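To answer the shape question directly: with reticulate (which the tensorflow R package uses under the hood), an R list such as list(16L) is converted to a Python sequence, which is what shape=(16,) expects, whereas c(16) would arrive as a plain scalar. A minimal sketch of the whole model under that assumption (the keras R package also offers a shape() helper for the same purpose):
library(tensorflow)
k <- tf$keras
# list(16L) converts to a Python sequence, i.e. the equivalent of shape=(16,)
input1 <- k$layers$Input(shape = list(16L))
x1 <- k$layers$Dense(units = 8L, activation = "relu")(input1)
input2 <- k$layers$Input(shape = list(32L))
x2 <- k$layers$Dense(units = 8L, activation = "relu")(input2)
added <- k$layers$add(list(x1, x2))   # pass the layers as an R list, not c(...)
out <- k$layers$Dense(units = 4L)(added)
model <- k$models$Model(inputs = list(input1, input2), outputs = out)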
I am trying to use the NOMAD technique for blackbox optimisation from the crs package (a C++ implementation), which is called via the snomadr function. The method works for straight numerical optimisation, but errors when categorical features are included. The documentation for categorical optimisation is sparse, so I am struggling to see where I am going wrong. Reproducible code below:
library(crs)
library(randomForest)
Illustrating this on randomForest & the iris dataset.
Creating the randomForest model (leaving the last row out, to be used as the starting point for the optimizer)
rfIris <- randomForest(x=iris[-150,-c(1)], y=unlist(iris[-150,1]))
The objective function (the function we want to optimize)
objFn <- function(x0,model){
preds <- predict(object = model, newdata = x0)
as.numeric(preds)
}
Test to see if the objective function works (should return ~6.37)
objOut <- objFn(x0=unlist(iris[150,-c(1)]),model = rfIris)
Creating initial conditions, options list, and upper/lower bounds for Nomad
x0 <- iris[150,-c(1)]
x0 <- unlist(x0)
options <- list("MAX_BB_EVAL"=10000,
"MIN_MESH_SIZE"=0.001,
"INITIAL_MESH_SIZE"=1,
"MIN_POLL_SIZE"=0.001,
"NEIGHBORS_EXE" = c(1,2,3),
"EXTENDED_POLL_ENABLED" = 'yes',
"EXTENDED_POLL_TRIGGER" = 'r0.01',
"VNS_SEARCH" = '1')
up <- c(10,10,10,10)
low <- c(0,0,0,0)
Calling the optimizer
opt <- snomadr(eval.f = objFn, n = 4, bbin = c(0,0,0,2), bbout = 0, x0= x0 ,model = rfIris, opts=options,
ub = up, lb = low)
and I get an error about the NEIGHBORS_EXE parameter in the options list. It seems that I need to supply NEIGHBORS_EXE with a file corresponding to a set of 'extended poll' coordinates, but it is not clear what exactly these are.
The method works if I set "EXTENDED_POLL_ENABLED" = 'no' in the options list, as it then ignores the categorical variables and defaults to numerical optimisation, but this is not what I want.
I also managed to pull up some additional information for NEIGHBORS_EXE using
snomadr(information=list("help"="-h NEIGHBORS_EXE"))
and again, I do not understand what the 'neighbours.exe' is meant to be.
Any help would be much appreciated!
This is the response from Zhenghua, who coded the R interface:
The issue is that he did not configure the parameter "NEIGHBORS_EXE" properly. He needs to prepare an executable file that defines the neighbours, put that executable in the folder from which R is called, and then set the parameter "NEIGHBORS_EXE" to the executable's file name.
You can contact us at nomad@gerad.ca if you wish to continue the discussion.
For the NEIGHBORS_EXE parameter, you can refer to section 7.1 of the NOMAD user guide:
https://www.gerad.ca/nomad/Downloads/user_guide.pdf
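To make this concrete, below is a hypothetical neighbours executable written as an Rscript. The I/O convention here is an assumption to verify against section 7.1 of the user guide: NOMAD is assumed to invoke the executable with a file containing the current point as its argument, and to read the neighbouring points, one per line, from standard output. For the iris example above, variable 4 (Species) is the categorical one, with three levels coded 1 to 3:
#!/usr/bin/env Rscript
# Hypothetical neighbours script (save as e.g. neighbours.R, make it
# executable, and set "NEIGHBORS_EXE" = "neighbours.R" in the options).
args <- commandArgs(trailingOnly = TRUE)
x <- scan(args[1], quiet = TRUE)      # current point: 4 coordinates
for (lev in setdiff(1:3, x[4])) {     # the two other Species levels
  nb <- x
  nb[4] <- lev                        # swap in a neighbouring category
  cat(nb, "\n")                       # one neighbour per line on stdout
}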
I am currently working on a project where I have to do some feature selection for building a predictive model. I was led to a package in R called mRMRe. I am just trying to work through the example but cannot get it working. The example can be found here - http://www.inside-r.org/packages/cran/mRMRe/docs/mRMR.ensemble.
Here is my code -
data(cgps)
data <- data.frame(target=cgps.ic50, cgps.ge)
mRMR.ensemble(data, 1, rep.int(1, 30))
When I run this code I get the error -
Error in .local(.Object, ...) : data must be of type mRMRe.Data.
I dug a little further and found that you actually have to convert the data with mRMR.data. So I did this update -
# Update
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
but I still get the same error. When I look at the class I have -
> class(data)
[1] "mRMRe.Data"
attr(,"package")
[1] "mRMRe"
So the data is the requested type but the code is still not functional.
My question is whether anyone has experience using this package; any help or comments would be appreciated!
Also I want to note that in the example from the link, after loading the data I have to rename the objects, since the names aren't the same as in the example:
cgps_ic50 -> cgps.ic50
cgps_ge -> cgps.ge
With the code you wrote:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
The function mRMR.ensemble is getting the data as its first positional argument, but the first parameter of this function is solution_count.
I understand that your intention in running that example is to find 30 relevant and non-redundant features using the classic mRMR feature selection algorithm, so try this:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data = data, target_indices = 1,
feature_count = 30, solution_count = 1)
The target_indices are the positions in the original data.frame of the features whose relevance is to be maximized (by correlation or another quality measure), so the features selected in the end will be good at explaining the features indicated by target_indices.
For example, in a classification problem, we would choose the position of the class variable as the value for the target_indices parameter.
The feature_count parameter indicates the number of variables to be chosen.
The solution_count is not a parameter of the classic mRMR. It indicates the number of mRMR algorithms to be ensembled to get a final feature selection, so if set to 1 it performs only one classic mRMR.
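As a short follow-up sketch for pulling the results out (solutions and featureNames are accessors from the mRMRe API; double-check them against the package documentation for your version):
fs <- mRMR.ensemble(data = data, target_indices = 1,
                    feature_count = 30, solution_count = 1)
idx <- solutions(fs)[[1]]       # indices of the selected features
featureNames(data)[idx]         # map the indices back to variable names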
I am new to the Julia programming language; however, I am fitting a linear mixed-effects model and I find it difficult to save the fixed- and random-effects estimates in .csv files.
Example code:
using MixedModels
@time modelOutput = fit(lmm(Y ~ A + B + (0 + A | group), data))
There is documentation on how to obtain the fixed (fixef(modelOutput)) and random (ranef(modelOutput)) effects, but when trying to put them into a DataFrame I am facing errors.
Any advice is appreciated.
Okay, I actually took the time to do this for you. A CoefTable is a type defined in StatsBase (in its statmodels.jl source file). Given this information, we can extract the relevant information from the CoefTable instance as follows:
using DataFrames
ct = coeftable(modelOutput)   # the coefficient table of the fitted model
df = DataFrame(variable = ct.rownms,
               Estimate = ct.mat[:,1],
               StdError = ct.mat[:,2],
               z_val = ct.mat[:,3])
This will give an nvar-by-4 DataFrame, which you can then write to CSV as described earlier, using writetable("output.csv", df).
I had a number of problems getting the accepted answer to work; Julia has evolved a lot since then. I rewrote it based primarily on code from the jglmm R package, with some adaptation/cobbling-together from other sources:
"""
outfun(m, outfn="output.csv")
output the coefficient table of a fitted model to a file
"""
outfun = function(m, outfn="output.csv")
ct = coeftable(m)
coef_df = DataFrame(ct.cols);
rename!(coef_df, ct.colnms, makeunique = true)
coef_df[!, :term] = ct.rownms;
CSV.write(outfn, coef_df);
end
Is there a Python (perhaps pandas) equivalent to R's
install.packages("caTools")
library(caTools)
set.seed(88)
split = sample.split(df$col, SplitRatio = 0.75)
that will generate exactly the same value split?
My current context for this is, as an example, getting pandas DataFrames that correspond exactly to the R data frames (qualityTrain, qualityTest) created by:
# https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv
quality = read.csv("quality.csv")
set.seed(88)
split = sample.split(quality$PoorCare, SplitRatio = 0.75)
qualityTrain = subset(quality, split == TRUE)
qualityTest = subset(quality, split == FALSE)
I think scikit-learn's train_test_split function might work for you.
import pandas as pd
from sklearn.cross_validation import train_test_split
url = 'https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv'
quality = pd.read_csv(url)
train, test = train_test_split(quality, train_size=0.75, random_state=88)
qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)
Unfortunately I don't get the same rows as the R function. I'm guessing it's the seeding, but could be wrong.
Splitting with sample.split from the caTools library means the class distribution is preserved. scikit-learn's train_test_split does not guarantee that (it splits the dataset into random train and test subsets).
You can get a result equivalent to the R caTools library (regarding class distribution) by using sklearn.cross_validation.StratifiedShuffleSplit instead:
from sklearn.cross_validation import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(quality['PoorCare'], n_iter=1, test_size=0.25, random_state=0)
for train_index, test_index in sss:
qualityTrain = quality.iloc[train_index,:]
qualityTest = quality.iloc[test_index,:]
I know this is an old thread, but I just found it while looking for a potential solution. Many online stats and machine learning classes are taught in R: they tell you to call set.seed() and then something like caTools' sample.split, and you must reproduce that exact split or your later results won't match and you can't get the right answer for some quiz or exercise question. If you want to use Python instead, you run into this issue.
One of the main problems is that although both Python and R use, by default, the Mersenne Twister algorithm for pseudo-random number generation, I discovered, by looking at the states of their respective PRNGs, that they won't produce the same result given the same seed. And one (I forget which) uses signed numbers and the other unsigned, so it seems there's little hope of finding a seed for Python that would produce the same series of numbers as R.
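One pragmatic workaround (a sketch reusing the quality.csv example from the question) is to materialise the R split once and share it with Python, rather than trying to match the two PRNGs:
library(caTools)
quality <- read.csv("quality.csv")
set.seed(88)
split <- sample.split(quality$PoorCare, SplitRatio = 0.75)
# Write the boolean mask out; Python can read split.csv and subset its
# DataFrame with the identical mask, guaranteeing the same rows.
write.csv(data.frame(split = split), "split.csv", row.names = FALSE)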
A small correction to the above: StratifiedShuffleSplit is now part of sklearn.model_selection.
I have some data with X and Y in different numpy arrays. The distribution of 1s against 0s in my Y array is about 4.1%. If I use StratifiedShuffleSplit it maintains this distribution in the test and train sets made afterwards. See below.
full_data_Y_np.sum() / len(full_data_Y_np)
0.041006701187937859
from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)  # example split parameters
for train_index, test_index in sss.split(full_data_X_np, full_data_Y_np):
X_train = full_data_X_np[train_index]
Y_train = full_data_Y_np[train_index]
X_test = full_data_X_np[test_index]
Y_test = full_data_Y_np[test_index]
Y_train.sum() / len(Y_train)
0.041013925152306355
Y_test.sum() / len(Y_test)
0.040989847715736043
I've imported a ClustalW2 tree into R using the read.tree function of the ape package. I estimate molecular ages using the chronopl function, resulting in an ultrametric, binary tree, from which I want to create a built-in R dendrogram object.
The tree plots fine and is a real phylo object. However, I'm running into problems when trying to convert it:
Minimal Working Example:
require(ape)
test.tree <- read.tree(file = "testree.phylip", text = NULL, tree.names = NULL, skip = 0,
comment.char = "#", keep.multi = FALSE)
test.tree.nu <- chronopl(test.tree, 0, age.min = 1, age.max = NULL,
node = "root", S = 1, tol = 1e-8,
CV = FALSE, eval.max = 500, iter.max = 500)
is.ultrametric(test.tree.nu)
is.binary.tree(test.tree.nu)
treeclust <- as.hclust.phylo(test.tree.nu)
The resulting tree "looks" fine.
I test to make sure the tree is ultrametric and binary, and then want to convert it into an hclust object, to eventually make a dendrogram object of it.
> is.binary.tree(test.tree.nu)
[1] TRUE
> is.ultrametric(test.tree.nu)
[1] TRUE
After trying to make a hclust object out of the tree, I get an error:
> tree.phylo <- as.hclust.phylo(test.tree.nu)
Error in if (tmp <= n) -tmp else nm[tmp] :
missing value where TRUE/FALSE needed
In addition: Warning message:
In nm[inode] <- 1:N :
number of items to replace is not a multiple of replacement length
I realize this is a very detailed question, and perhaps such package-specific questions are better asked elsewhere, but I hope someone is able to help me.
All help is much appreciated,
Regards,
File download
The Phylip file can be downloaded here
http://www.box.net/shared/rnbdk973ja
I can reproduce this with version 2.6-2 of ape under R 2.12.1 beta (2010-12-07 r53808) on Linux, but your code works in version 2.5-3 of ape.
This would suggest a bug has crept into the package, and you should inform the developers of the problem to ask for expert advice. The email address of the maintainer, Emmanuel Paradis, is on the CRAN page for ape.
Looks like the problem is that chronopl returns a tree which is either unrooted or has a multifurcating root (depending on how it's interpreted). Also, as.hclust.phylo has/had unhelpful error messages.
This:
modded.tree <- drop.tip(test.tree.nu,c(
'An16g06590','An02g12505','An11g00390','An14g01130'))
removes all tips from one of the three clades descending from the root, thus
is.ultrametric(modded.tree)
is.binary.tree(modded.tree)
is.rooted(modded.tree)
all return TRUE, and you can do
treeclust <- as.hclust.phylo(modded.tree)
Though I think you really want an hclust object representing the multifurcating tree, and though hclust objects can handle those, as.hclust.phylo (from the ape package) doesn't work on multifurcations for some reason. If you know a way to import Newick files into hclust objects, that might be a way forward - ape has write.tree() to generate Newick files.
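To close the loop on the original goal, a sketch assuming the drop.tip workaround above: once the tree is strictly binary and rooted, the hclust object converts directly to a base-R dendrogram.
treeclust <- as.hclust.phylo(modded.tree)  # phylo -> hclust
dend <- as.dendrogram(treeclust)           # hclust -> dendrogram
plot(dend)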