Torch7: Slice Tensor using ByteTensor mask - torch

I have two tensors:
labels is a 1D Tensor (5000)
dataset is 4D Tensor (5000,1,32,32)
I would like to efficiently slice the labels and the dataset down to the entries whose label is 1. I succeeded in slicing the labels but not the dataset.
Slicing the labels:
positive_mask = labels:eq(1)
sliced_labels = labels[positive_mask]
I tried doing the following to slice the dataset and failed:
sliced_dataset = dataset[positive_mask]
sliced_dataset = dataset[{positive_mask, {}, {}, {}}]
sliced_dataset = dataset:narrow(1,positive_mask)
sliced_dataset = dataset:select(1,positive_mask)
Is there an elegant approach to perform this in Torch7?

sliced_dataset = dataset:index(1, positive_mask:nonzero():squeeze())
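
The one-liner above works in two steps: nonzero() turns the 0/1 mask into the list of positions that are set, and index() gathers those positions along the first dimension. A minimal sketch of the same idea in plain Python, not Torch: the helper names nonzero and index_rows are made up for illustration, the nested lists only stand in for tensors, and positions are 0-based here whereas Torch7 is 1-based.

```python
def nonzero(mask):
    """Return the positions where the mask is set (0-based in this sketch)."""
    return [i for i, flag in enumerate(mask) if flag]


def index_rows(data, positions):
    """Gather the entries of `data` at `positions` along the first axis."""
    return [data[i] for i in positions]


labels = [1, 0, 2, 1, 0]
# Stand-in for the 4D tensor: one small "image" per label.
dataset = [[10, 11], [20, 21], [30, 31], [40, 41], [50, 51]]

positive_mask = [label == 1 for label in labels]  # like labels:eq(1)
positions = nonzero(positive_mask)                # like positive_mask:nonzero():squeeze()
sliced_labels = index_rows(labels, positions)
sliced_dataset = index_rows(dataset, positions)   # like dataset:index(1, positions)
```

The mask itself is never used to index the multi-dimensional data; it is first converted to positions, and one set of positions then slices both labels and dataset consistently.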

Related

Using an R function to hash values produces a repeating value across rows

I'm using the following query:
let
Source = {1..5},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Numbers"}, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Letters", each Character.FromNumber([Numbers] + 64)),
#"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)library(""digest"")#(lf)#(lf)dataset$SuffixedLetters <- paste(dataset$Letters, ""_suffix"")#(lf)dataset$HashedLetters <- digest(dataset$Letters, ""md5"", serialize = TRUE)#(lf)output<-dataset",[dataset=#"Added Custom"]),
output = #"Run R script"{[Name="output"]}[Value]
in
output
which leads to the resulting table:
And here is the R script with better formatting:
# 'dataset' holds the input data for this script
library("digest")
dataset$SuffixedLetters <- paste(dataset$Letters, "_suffix")
dataset$HashedLetters <- digest(dataset$Letters, "md5", serialize = TRUE)
output<-dataset
The 'paste' function appears to work row by row, producing a new result for each row's input. The 'digest' function, however, appears to return a single value (seemingly for the first row) repeated across all rows.
I don't know why the two functions behave differently. Can anyone advise how to get the 'HashedLetters' column to be computed from each row's value instead of just one value?
Use:
dataset$HashedLetters <- sapply(dataset$Letters, digest, algo = "md5", serialize = TRUE)
digest works on a whole object at a time, not individual elements of a vector.
vec <- letters[1:3]
digest::digest(vec, algo="md5", serialize=TRUE)
# [1] "38ce1fe9e19a222505e693e8bdd8aeec"
sapply(vec, digest::digest, algo="md5", serialize=TRUE)
#                                  a                                  b                                  c
# "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1"
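
The whole-object versus per-element distinction can be sketched outside R as well. The following is a rough Python analogy using hashlib, not a reimplementation of digest: the md5_hex helper and the sample letters are assumptions for illustration, and the hex values will not match digest's output, because digest with serialize = TRUE hashes R's serialized representation of the object.

```python
import hashlib


def md5_hex(text):
    """MD5 of a string's UTF-8 bytes, as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


letters = ["A", "B", "C", "D", "E"]

# Hashing the column as one object gives a single value. Assigning that
# single value to a column repeats it on every row.
whole = md5_hex("".join(letters))
repeated = [whole] * len(letters)

# Hashing element by element (what sapply does in the answer above)
# gives one distinct hash per row.
per_row = [md5_hex(letter) for letter in letters]
```

paste behaves like the second case because it is vectorized over its inputs, while digest treats its argument as a single object, so the per-element loop has to be written explicitly.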

Loop in R through variable names with values as endings and create new variables from the result

I have 24 variables called empl_1 -empl_24 (e.g. empl_2; empl_3..)
I would like to write a loop in R that substitutes the values 1-24 into the respective places, so that the corresponding variables are either called or created with i = 1-24. The sample below shows what I would like to have within the loop (e.g. ye1 - ye24, ipw_atet_1 - ipw_atet_14, and so on).
ye1_ipw <- empl$empl_1[insample==1]
ipw_atet_1 <- treatweight(y=ye1_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_1
ipw_atet_1$se
ye2_ipw <- empl$empl_2[insample==1]
ipw_atet_2 <- treatweight(y=ye2_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_2
ipw_atet_2$se
ye3_ipw <- empl$empl_3[insample==1]
ipw_atet_3 <- treatweight(y=ye3_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_3
ipw_atet_3$se
Coming from a Stata environment, I tried:
for (i in seq_anlong(empl_list)){
ye[i]_ipw <- empl$empl_[i][insample==1]
ipw_atet_[i]<-treatweight(y=ye[i]_ipw, d=treat_ipw, x=x1_ipw, ATET=TRUE, trim=0.05, boot =2
}
However this does not work at all. Do you have any idea how to approach this problem by writing a nice loop? Thank you so much for your help =)
You can try lapply:
result <- lapply(empl[paste0('empl_', 1:24)], function(x)
  treatweight(y = x[insample==1], d = treat_ipw,
              x = x1_ipw, ATET = TRUE, trim = 0.05, boot = 2))
result is a list holding the output for all 24 variables in a single object, which is easier to manage and process than 24 separate objects.
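
The underlying pattern is to keep the columns in one container, build their names from the index, and collect the results in one keyed structure instead of creating numbered variables. A minimal Python sketch of that pattern, under stated assumptions: estimate is a hypothetical stand-in for treatweight (it only computes a mean), and the data in empl and insample are made up.

```python
def estimate(outcome):
    """Hypothetical stand-in for treatweight(): summarises one outcome vector."""
    return {"effect": sum(outcome) / len(outcome), "n": len(outcome)}


# Made-up data: 24 outcome columns named empl_1 ... empl_24, four rows each.
empl = {f"empl_{i}": [i, i + 1, i + 2, i + 3] for i in range(1, 25)}
insample = [1, 0, 1, 1]

# One entry per column, keyed by the column name, instead of 24 variables.
results = {
    name: estimate([v for v, keep in zip(values, insample) if keep == 1])
    for name, values in empl.items()
}

# Individual results are then looked up by name.
second = results["empl_2"]
```

In the R answer the same lookup would be result$empl_2 or result[["empl_2"]], and a component such as the standard error can be pulled from every element with sapply(result, function(r) r$se).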

Pathview R: Mapping known transcripts to a KEGG pathway diagram representing FoldChange

I'm struggling with: library(pathview)
I have a data frame ("T3") with the following column names and possible identifiers to map Fold changes to a significantly enriched KEGG pathway:
KEGGid SYMBOL Human_ENSEMBL Human_ENTREZID Mouse_ensembl_gene_id Mouse_ENTREZID
It has taken a long time to learn how to get all of these possible IDs but unfortunately, when I try to map them to relevant KEGG nodes, by assigning identifiers as rownames, I do not seem to yield a result (Error message:
Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
Error in select(db.obj, keys = in.ids, keytype = in.type, columns = c(in.type, :
unused arguments (keys = in.ids, keytype = in.type, columns = c(in.type, out.type))
Error in $<-.data.frame(*tmp*, "labels", value = c("", "", "", "", :
replacement has 82 rows, data has 89
)
This is frustrating because T3 contains all of the transcripts that are annotated to PI3K signaling, so they should map, yet none of the identifiers I have tried seems to work. I know that these transcripts map: for example, "AKT3", which is in the list, can be highlighted online [https://www.genome.jp/kegg-bin/show_pathway?hsa04151+10000], where the +10000 at the end of the address specifies the AKT node to be highlighted in red.
Command lines for example
SYMBOL <- c("AKT3", "AKT3")
Human_ENSEMBL<- c("ENSG00000117020","ENSG00000275199")
Human_ENTREZID <-c("10000", "10000")
Mouse_ensembl_gene_id <- c("ENSMUSG00000019699", "ENSMUSG00000019699")
Mouse_entrezgene <- c(23797, 23797)
log2FoldChange <-c(-0.676668324, -0.676668324)
T3 <- c(SYMBOL, Human_ENSEMBL, Human_ENTREZID, Mouse_ensembl_gene_id,
Mouse_entrezgene, log2FoldChange)
row.names(T3) <- T3$SYMBOL ## For example here using SYMBOL, but I have tried a lot of the other identifiers
pv.out <- pathview(gene.data = T3,
pathway.id = "hsa04151",
out.suffix = "Control vs Treatment" )
Thanks for taking the time to help
Mark

r - taking difference of two xyplots?

I have several xyplot objects that I have saved as .RDATA files. I am now interested in being able to look at their differences. I have tried things like
plot1-plot2
but this does not work (I get the "non-numeric argument to binary operator" error).
I would also be able to do this if I knew how to extract the timeseries data stored within the lattice xyplot object, but I have looked everywhere and can't figure out how to do this either.
Any suggestions?
EDIT:
just to make it perfectly clear what I mean for MrFlick, by "taking the difference of two plots" I mean plotting the elementwise difference of the timeseries from each plot, assuming it exists (i.e. assuming that the plots have the same domain). Graphically,
I might want to take the following two plots, stored as xyplot objects:
and end up with something that looks like this:
-Paul
Here is a little function I wrote to plot the difference of two xyplots:
library(lattice)
library(data.table)

# Plot the element-wise difference (plot1 - plot2) of two lattice xyplot objects.
getDifferencePlot = function(plot1, plot2){
  # panel.args holds one list(x = ..., y = ...) per panel
  data1 = plot1$panel.args
  data2 = plot2$panel.args
  len1 = length(data1)
  len2 = length(data2)
  if (len1 != len2)
    stop("plots do not have the same number of panels -- cannot take difference")
  panels = vector("list", len1)
  for (i in seq_len(len1)){
    thing1 = data.table(x = data1[[i]]$x, y1 = data1[[i]]$y)
    thing2 = data.table(x = data2[[i]]$x, y2 = data2[[i]]$y)
    # keep only the x values that appear in both plots
    finalThing = merge(thing1, thing2, by = "x")
    finalThing[, segment := i]
    panels[[i]] = finalThing
  }
  plotData = rbindlist(panels)
  plotData[, difference := y1 - y2]
  if (len1 == 1)
    diffPlot = xyplot(difference ~ x, plotData, type = "l", auto.key = TRUE)
  if (len1 > 1)
    diffPlot = xyplot(difference ~ x | segment, plotData, type = "l", auto.key = TRUE)
  return(diffPlot)
}

Issues with formatting header in R prior to using plot() function

I have a data set that I've successfully read into R. It's a simple data.frame with ONE ROW of data (I'm not sure how many columns, but it's in the hundreds). It was read with column headers, but no row labels. So the data set looks something like this:
df=structure(list(X500000 = 0.0958904109589041, X1500000 = 0.10958904109589, X2500000 = 0.10958904109589, X3500000 = 0.164383561643836, X4500000 = 0.136986301369863, X5500000 = 0.205479452054795, X6500000 = 0.136986301369863, X7500000 = 0.0273972602739726, X8500000 = 0.0821917808219178, X9500000 = 0.178082191780822), .Names = c("X500000", "X1500000", "X2500000", "X3500000", "X4500000", "X5500000", "X6500000", "X7500000", "X8500000", "X9500000"), class = "data.frame", row.names = 79L)
Except that it is MUCH LARGER (I don't know if it matters, but it has around 300 columns going across). I'm trying to plot it so that the X##### labels are on the x axis, and the value of each data point is plotted on the y axis (say like a scatter plot on excel or even a line graph). Doing just plot(df) gives me an extremely bizarre graph that makes no sense to me (a bunch of boxes each with a dot right in the centre and no labels?).
I have a feeling it might work if I were to transform the data frame into a vector by removing the headings and then adding x-axis labels individually afterwards and doing a plot() on the vector, but if there is a way of avoiding that it would be great....
As explained in '?plot', 'x' and 'y' must be two numeric vectors of the same length:
df=structure(list(X500000 = 0.0958904109589041, X1500000 = 0.10958904109589, X2500000 = 0.10958904109589, X3500000 = 0.164383561643836, X4500000 = 0.136986301369863, X5500000 = 0.205479452054795, X6500000 = 0.136986301369863, X7500000 = 0.0273972602739726, X8500000 = 0.0821917808219178, X9500000 = 0.178082191780822), .Names = c("X500000", "X1500000", "X2500000", "X3500000", "X4500000", "X5500000", "X6500000", "X7500000", "X8500000", "X9500000"), class = "data.frame", row.names = 79L)
plot(x=as.numeric(substr(names(df),2,nchar(names(df)))), as.numeric(df), xlab="This is xlab", ylab="This is y")
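
The key step in that call is turning each header such as X500000 back into the number 500000 (R prepends the X when it reads column names that start with a digit) and pairing it with the value in that column. A small Python sketch of just that step, with a dict standing in for the one-row data frame and only three of the columns shown; the variable names are made up for illustration.

```python
# Stand-in for the one-row data frame: column name -> value.
row = {
    "X500000": 0.0958904109589041,
    "X1500000": 0.10958904109589,
    "X2500000": 0.10958904109589,
}

# Drop the leading "X" and convert the rest of each name to a number,
# which is what substr(names(df), 2, nchar(names(df))) plus as.numeric does.
x = [int(name[1:]) for name in row]
y = list(row.values())

# x and y now have the same length and can be plotted against each other.
points = list(zip(x, y))
```

Once both vectors are numeric and equally long, any plotting function that takes x and y can draw them as a scatter plot or a line.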
