I'm really confused about how to properly concatenate Julia arrays. I have an array (sim1.value) that is 4875x3x4. I would like to collapse it over the last dimension so that it is 19500x3.
vcat(sim1.value) and cat(3,sim1.value) don't give the result I want.
The vcat(args...) command is essentially shorthand for cat(1, args...) (written cat(args...; dims=1) in Julia 1.0 and later): it concatenates the given arguments along the vertical axis (the first dimension of your array).
You can get more information on that topic following this link: http://docs.julialang.org/en/latest/manual/arrays/#concatenation
Therefore, you can find a solution without using the reshape function:
# Get the size of your data
x, y, z = size(data)
# Create a "result matrix" with the same number of columns, but no rows
result = similar(data, 0, y)
# For each layer, concatenate the layer vertically with the "result matrix"
for i in 1:z
    result = vcat(result, data[:,:,i])
end
I have two lists and I want to subtract one from the other element-wise, in order to replicate the Matlab call bsxfun(@minus, lt, lt2). The two lists look something like the below (edit: now works without the pracma package):
# Code
# First list
lt = c(list())
# I use these lines to pre-dim the list...
lt[[1]] = c(rep(list(1)))
lt[[2]] = c(rep(list(1)))
# ... such that I can add matrices in this way:
lt[[1]][[1]] = matrix(c(3),nrow=1, ncol=1,byrow=TRUE)
lt[[2]][[1]] = matrix(c(1),nrow=1, ncol=1, byrow=TRUE)
# Same with the second list:
lt2 = c(list())
lt2[[1]] = c(rep(list(1)))
lt2[[2]] = c(rep(list(1)))
lt2[[1]][[1]] = matrix(c(2,2,2),nrow=3, ncol=1,byrow=TRUE)
lt2[[2]][[1]] = matrix(c(1,1,1),nrow=3, ncol=1,byrow=TRUE)
Element-wise subtraction would mean that each row of an element of lt2 has the respective element of lt subtracted from it, i.e., each row of lt2[[1]][[1]] is reduced by 3, resulting in t(c(-1,-1,-1)), and each row of lt2[[2]][[1]] is reduced by 1, resulting in t(c(0,0,0)). It is important to me that the list structure is maintained in the results.
Now I tried using lapply(lt2,"-",lt) but it does not work. Any suggestions?
I suspect you are looking for something like this skeleton code which subtracts 2 lists element-wise...
x <- list(1,2,3)
y <- list(4,5,6)
mapply('-', y, x, SIMPLIFY = FALSE)
but as noted, the 2 lists need to have the same length (or at least R's recycling rules must make sense), as for example...
z <- list(4,5,6,7,8,9)
mapply('-',z,x,SIMPLIFY = FALSE)
You might be looking for something like this where you subtract a constant from each member of the list...
mapply('-',y,2, SIMPLIFY= FALSE)
I figured it out - I had another mistake in the question :/
Converting the second operand with as.numeric worked:
lt3 = lapply(lt2[[1]],"-",as.numeric(lt[[1]]))
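To handle both list elements at once while keeping the nested list structure, one option is a pair of nested Map() calls. This is only a minimal sketch: it rebuilds the lists from the question in a more compact form, and the name res is introduced here purely for illustration.

```r
# Nested lists as in the question (one matrix per inner list).
lt  <- list(list(matrix(3, nrow = 1, ncol = 1)),
            list(matrix(1, nrow = 1, ncol = 1)))
lt2 <- list(list(matrix(c(2, 2, 2), nrow = 3, ncol = 1)),
            list(matrix(c(1, 1, 1), nrow = 3, ncol = 1)))

# The outer Map pairs lt2[[k]] with lt[[k]]; the inner Map pairs the matrices.
# as.numeric() turns the 1x1 matrix into a plain scalar so that "-" recycles it
# over every row instead of complaining about non-conformable arrays.
res <- Map(function(a, b) Map(function(m, s) m - as.numeric(s), a, b),
           lt2, lt)

res[[1]][[1]]  # 3x1 matrix of -1
res[[2]][[1]]  # 3x1 matrix of 0
```

The result has the same two-level list layout as lt2, so res[[k]][[1]] lines up with lt2[[k]][[1]].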
Matlab's [n,mapx] = histc(x, bin_edges) returns the counts of x in each bin as n and also returns a map, which has the same length as x and holds the index of the bin that each element of x was placed into.
I can do the same thing in Julia as follows:
using StatsBase
x = rand(1000)
bin_e = 0:0.1:1
h = fit(Histogram, x, bin_e)
yx = map((z) -> findnext(z.<=h.edges[1],1),x) .- 1
Is this the "right way" to do this? It seems a bit kludgy.
Inspired by this python question you should be able to define a small function that delivers the desired mapping (modulo conventions):
binindices(edges, data) = searchsortedlast.(Ref(edges), data)
Note that the bin edges are sorted, so we can use searchsortedlast to get the last bin edge that is smaller than or equal to a data point. Broadcasting this over all of the data gives the mapping. Note that Ref(edges) indicates that edges is treated as a scalar under broadcasting (that means that the full array is considered in each call).
Although conceptually identical to your solution, this approach is about 13x faster on my machine.
I filed an issue over at StatsBase.jl's github page suggesting to add this as a feature.
After looking through the code for Histogram.jl I found that they already included a function binindex. So this solution is probably the best:
x = 0:0.001:10
h1 = fit(Histogram, x, 0:10, closed=:left)
xmap1 = StatsBase.binindex.(Ref(h1), x)
h2 = fit(Histogram, x, 0:10, closed=:right)
xmap2 = StatsBase.binindex.(Ref(h2), x)
I stumbled across this question when I was trying to figure out how many occurrences of each value I had in a list of values. If each value is in its own bin (as for categorical data, or integer data with a small number of unique values), this is what one would be plotting in a histogram.
If that is what you want, then countmap() in the StatsBase package is just what you need.
I have a data.frame with dim = (200,500).
I want to run shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
x <- shapiro.test(I.df.nocov[1:200,i])
colstoremove[[i]] <- x[2]
}
However this is failing. Some pointers? (background is mainly python, not much of an R user)
Consider lapply(): when a data frame is passed to it, the function is applied to each column, and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
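If the goal is to decide which columns to drop, a named vector of p-values may be easier to work with than a list. The following is a sketch under assumptions: the data frame df, its column names, and the 0.05 cutoff are all made up for illustration and are not from the question.

```r
set.seed(42)
# Hypothetical stand-in for the real data frame: 200 rows, 3 columns.
df <- data.frame(a = rnorm(200), b = runif(200), c = rexp(200))

# sapply() simplifies the per-column results into a named numeric vector.
pvals <- sapply(df, function(col) shapiro.test(col)$p.value)

# Names of the columns whose p-value falls below the (assumed) cutoff.
cols_to_remove <- names(pvals)[pvals < 0.05]
```

Using $p.value instead of [2] returns the bare number rather than a one-element list, which is what makes the comparison against the cutoff straightforward.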
Here is what happens in
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) is a 2-element vector which contains the minimal and maximal values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. Not surprising that it fails!
The problem is that R's function range and Python's function with the same name do completely different things. The equivalent of Python's range is called seq. For example, seq(5)=c(1,2,3,4,5), while seq(3,5)=c(3,4,5), and seq(1,10,2)=c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n is the same as seq(m,n) (but the priority of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
Generally, if something does not work in R, you should print the subexpressions from the innermost to the outermost. If some subexpression is too big to be printed, use str(x) (str means "structure"). And never assume that functions in Python and R are the same! If there is a function with the same name, it usually does a different thing.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
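The difference between range, seq and the ':' operator can be checked directly at the console. This is a small sketch; the variable names are arbitrary, and seq_len() is mentioned here as an addition to the answer because it stays empty when n is 0, whereas 1:0 counts down.

```r
n <- 5L

r  <- range(n)     # smallest and largest value of a single number: 5 5
s  <- seq_len(n)   # 1 2 3 4 5
e  <- seq_len(0)   # integer(0), so a loop over it runs zero times
d  <- 1:0          # 1 0, so a loop over it runs twice

x  <- 3
p1 <- 1:2 * x      # parsed as (1:2) * x  -> 3 6
p2 <- 1:(2 * x)    # 1 2 3 4 5 6
```

Writing for (i in seq_len(ncol(I.df.nocov))) is therefore a safe way to spell the loop header from the question.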
I have a function that computes some things and then assigns that to a matrix. This matrix receives its name from a paste statement (based on some other current values). I then want to assign the dimnames to the matrix, but don't know how to make the pasted name be understood.
Here is what is going on:
someComputations <- function(labs) {
### bunch of computations, leading to X, Y, and Z:
matName <- paste("rhoMat_", X, sep = "") # this yields rhoMat_15 if X equals 15
assign(matName, Y %*% Z)
assign(dimnames(matName), labs) # labs is a list of row labels and column labels
return(matName)
}
This works well, including the first assign statement, and then it breaks down.
I have tried all kinds of approaches, such as eval(parse(text = matNum)), as.name(matNum), substitute(matNum), but to no avail.
Since I don't know the actual name of the matrix (because matNum is not given), I can't hardcode the name into the function--so I am stuck with its character name matName. How can I make R understand I want to set the dimnames of the matrix rhoMat_15, rather than of matName?
Thanks, Peter
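A replacement call cannot be written as dimnames(get(matName)) <- labs, because R needs a variable name to assign to on the left-hand side. Attach the dimnames to the matrix itself and assign the result under the pasted name:
assign(matName, structure(get(matName), dimnames = labs))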
I'm writing a script that reads two .txt files into two vectors. After that I want to compute a Spearman's rank correlation and plot the result.
The values in the first vector are 12-13 characters long (e.g. 7.3445555667 or 10.3445555667) and the values in the second vector are one character long (e.g. 1 or 2).
The code:
vector1 <- read.table ("D:...path.../mytext1.txt", header=FALSE)
vector2 <- read.table ("D:...path.../mytext2.txt", header=FALSE)
cor.coeff = cor(vector1 , vector2 , method = "spearman")
cor.test(vector1 , vector2 , method = "spearman")
plot(vector1.var, vector2.var)
The .txt files contain only numeric values.
I'm getting two errors. The first, in line 4, is something like "'x' has to be a numeric vector",
and the second, in line 5, is something like "object vector1.var couldn't be found".
I also tried
plot(vector1, vector2)
instead of
plot(vector1.var, vector2.var)
But then there's an error like "Error in stripchart.default (x1,...) : invalid plot-method"
The implementation is based on http://www.gardenersown.co.uk/Education/Lectures/R/correl.htm#correlation
I doubt vector1 and vector2 are vectors. Reading ?read.table we note in the Value section:
Value:
A data frame (‘data.frame’) containing a representation of the
data in the file.
....
So even if your two text files contain just a single variable, the two objects read in will be data frames with a single component each.
Secondly, your data files don't contain headers, so R will make up a variable name. I haven't tested this, but IIRC the variables in vector1 and vector2 will both be called V1. Do head(vector1) and the same on vector2 (or names(vector1)) to see how your objects look in R.
I can see why you might think vector1.var might work, but you should realise that as far as R was concerned it was looking for an object named vector1.var. The . is just like any other character in R object names. If you meant to use . as a subsetting or selection operator, then you need to read up on subsetting operators in R. These are $ and [ and [[. See for example the R Language Definition manual or the R manual.
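The points above can be checked without the original files by feeding read.table a small inline text. This is a sketch only: the three values are made up, and the text argument stands in for the file path from the question.

```r
# Hypothetical stand-in for one of the text files: one number per line.
df <- read.table(text = "7.3445555667\n10.3445555667\n8.1", header = FALSE)

class(df)   # "data.frame", not a vector
names(df)   # "V1", the name R made up for the single column

# Three equivalent ways to pull the column out as a plain numeric vector.
v1 <- df[, 1]
v2 <- df$V1
v3 <- df[[1]]
```

Any of v1, v2 or v3 can then be passed to cor(), cor.test() and plot().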
I suspect you could just change your code to:
vector1 <- read.table ("D:...path.../mytext1.txt", header=FALSE)[, 1]
vector2 <- read.table ("D:...path.../mytext2.txt", header=FALSE)[, 1]
cor.coeff <- cor(vector1 , vector2 , method = "spearman")
cor.test(vector1 , vector2 , method = "spearman")
plot(vector1, vector2)
But I am supposing quite a bit about what is in your two text files...
str is a very useful function (see ?str for more) that one should use often, especially to verify R object types. A quick str(vector1) and str(vector2) will tell you if those columns were read as characters instead of numeric. If so, then use as.numeric(vector1) to typecast the data in each vector.
Also, names(vector1) and names(vector2) will tell you what the column names are and likely resolve your plotting issue.
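A short sketch of the type check and conversion mentioned above, using made-up values. One caveat, added here as an aside: if a column arrives as a factor rather than as character (the default for text columns in R versions before 4.0), as.numeric() returns the internal level codes, so it needs to go through as.character() first.

```r
# Hypothetical values that were read in as character strings.
x <- c("7.34", "10.34", "8.10")
str(x)                    # reports a chr vector

x_num <- as.numeric(x)    # converts the strings to numbers

# The same values stored as a factor.
f <- factor(x)
codes  <- as.numeric(f)                # level codes, not the values
values <- as.numeric(as.character(f))  # the actual numbers
```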