I'm having some problems with an application of the rgl 3D graphing package.
I'm trying to draw some line segments. My data is arranged in a data frame called 'markers' with six columns: one for each of the starting x, y, and z values, and one for each of the ending x, y, and z values.
 startX startY startZ endX   endY    endZ
 69.345 45.732     20  115 39.072 1.92413
 80.270 38.480     30  175 44.548 0.36777
 99.590 33.596     20  175 35.224 0.06929
 32.120 41.218     20  115 39.294 2.81424
 11.775 37.000     30  175 35.890 1.38047
 76.820 44.104     22  115 44.992 4.14674
 85.790 23.384     18  115 36.112 0.40508
 80.040 17.464     20  175 31.080 2.59038
103.615 38.850     22  115 39.220 3.18201
 41.200 31.006     30  175 36.260 3.48049
 88.665 43.956     30  115 39.738 0.50635
109.365 23.976     20  175 33.374 3.99750
This should be a piece of cake: just feed those values to the segments3d() function and I should get the plot I want. Only I can't figure out how to correctly pass the respective starting and ending pairs into segments3d().
I've tried just about everything possible ($ notation, indexing, concatenating, using a loop, apply and sapply, etc.), including reading the documentation. It's great: for the arguments x, y, and z it says, "Any reasonable way of defining the coordinates is acceptable." Ugh... it does refer you to the xyz.coords utility.
So I went over that documentation. And I think I understand what it does; I can even use it to standardize my data e.g.
starts <- xyz.coords(markers$startX, markers$startY, markers$startZ)
ends <- xyz.coords(markers$endX, markers$endY, markers$endZ)
But then I'm still not sure what to do with those two lists.
segments3d(starts, ends)
segments3d(starts + ends)
segments3d((starts, ends), (starts, ends), (starts, ends))
segments3d(c(starts, ends), c(starts, ends), c(starts, ends))
segments3d(c(starts$x, ends$x), c(starts$y, ends$y), c(starts$z, ends$z))
I mean I know why the above don't work. I'm basically just trying things at this point as this is making me feel incredibly stupid, like there is something obvious—I mean facepalm level obvious—I'm missing.
I went through the rgl documentation itself looking for an example, and in the only place I found segments3d() used in any manner resembling what I'm trying to do, they used the '+' notation I tried above. Basically they built two matrices and added the second to the first.
Something like this should work.
library(rgl)
open3d(scale=c(1/5,1,1))
segments3d(x = as.vector(t(markers[, c(1, 4)])),
           y = as.vector(t(markers[, c(2, 5)])),
           z = as.vector(t(markers[, c(3, 6)])))
axes3d()
title3d(xlab="X",ylab="Y",zlab="Z")
The problem is that segments3d(...) takes the x (and y and z) values in pairs. So rows 1-2 are the first segment, rows 3-4 are the second segment, etc. You need to interleave, e.g. $startX and $endX, etc. The code above does that.
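To see what that interleaving produces, you can inspect the first few entries of the x vector (using the markers data frame defined just below; the expected values are shown as a comment):
# t() pairs each row's start and end; as.vector() then flattens them
# in the order start1, end1, start2, end2, ...
head(as.vector(t(markers[, c(1, 4)])), 6)
# [1]  69.345 115.000  80.270 175.000  99.590 175.000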
Code for creating the data set:
markers <- data.frame(
  startX = c(69.345, 80.270, 99.590, 32.120, 11.775, 76.820,
             85.790, 80.040, 103.615, 41.200, 88.665, 109.365),
  startY = c(45.732, 38.480, 33.596, 41.218, 37.000, 44.104,
             23.384, 17.464, 38.850, 31.006, 43.956, 23.976),
  startZ = c(20, 30, 20, 20, 30, 22, 18, 20, 22, 30, 30, 20),
  endX   = c(115, 175, 175, 115, 175, 115, 115, 175, 115, 175, 115, 175),
  endY   = c(39.072, 44.548, 35.224, 39.294, 35.890, 44.992,
             36.112, 31.080, 39.220, 36.260, 39.738, 33.374),
  endZ   = c(1.92413, 0.36777, 0.06929, 2.81424, 1.38047, 4.14674,
             0.40508, 2.59038, 3.18201, 3.48049, 0.50635, 3.99750)
)
Related
I was wondering if there is a way in R to find the specific seed that generates a specific set of numbers.
For example:
sample(1:300, 10)
I want to find the seed that gives me, as the output of the previous code:
58 235 243 42 281 137 2 219 284 184
As far as I know there is no elegant way to do this, but you could brute force it:
# sample() returns an integer vector, so keep the target as integers,
# otherwise identical() would never report a match
desired_output <- c(58L, 235L, 243L, 42L, 281L, 137L, 2L, 219L, 284L, 184L)
MAX_SEED <- .Machine$integer.max
MIN_SEED <- -MAX_SEED
i <- MIN_SEED
while (i < MAX_SEED - 1) {
  set.seed(i)
  actual_output <- sample(1:300, 10)
  if (identical(actual_output, desired_output)) {
    message("Seed found! Seed is: ", i)
    break
  }
  i <- i + 1
}
This takes 11.5 seconds to run with the first 1e6 seeds on my laptop - so if you're unlucky then it would take about 7 hours to run. Also, this is exactly the kind of task you could run in parallel in separate threads to cut the time down quite a bit.
EDIT: Updated to include negative seeds which I had not considered. So in fact it could take twice as long.
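As a rough illustration of the parallel idea (a sketch, not a tuned implementation; it uses the parallel package that ships with R and only searches the first 1e6 seeds as a demo):
library(parallel)

desired_output <- c(58L, 235L, 243L, 42L, 281L, 137L, 2L, 219L, 284L, 184L)

# Search one contiguous block of seeds; returns the matching seed or NULL
search_chunk <- function(from, to, target) {
  for (i in from:to) {
    set.seed(i)
    if (identical(sample(1:300, 10), target)) return(i)
  }
  NULL
}

n_cores <- max(1L, detectCores() - 1L)
cl <- makeCluster(n_cores)
clusterExport(cl, c("search_chunk", "desired_output"))

# Illustration only: split the first 1e6 seeds evenly across the workers.
# Widen the range (and your patience) to cover the full seed space.
seeds  <- 0:(1e6 - 1)
chunks <- split(seeds, cut(seq_along(seeds), n_cores, labels = FALSE))

hits <- parLapply(cl, chunks, function(ch) {
  search_chunk(min(ch), max(ch), desired_output)
})
stopCluster(cl)

Filter(Negate(is.null), hits)   # any seeds found in the searched range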
I have the following array which I need to figure out its variance:
julia> a = [5, 6, 7, 8, 10, 12, 12, 17, 67, 68, 69, 72, 74, 74, 92, 93, 100, 105, 110, 120, 124]
21-element Vector{Int64}:
5
6
7
8
10
12
12
17
67
68
⋮
74
74
92
93
100
105
110
120
124
How can I do this in Julia?
Julia has the var function built into the standard Statistics module, so you can just do:
using Statistics
var(a)
The StatsBase.jl package does not export the var function, so your code would not work when used in a fresh Julia session. You would have to write StatsBase.var(a) instead (or add using Statistics).
What StatsBase.jl adds to the var function is that it defines additional methods that allow computation of weighted variance. So e.g. the following works with StatsBase.jl (but would currently not work without it):
julia> using Statistics
julia> using StatsBase
julia> var([1,2,3], Weights([1,2,3]))
0.5555555555555555
If you want to calculate the variance from first principles, you can do this:
function my_variance(x)
    n = length(x)
    μ = sum(x) / n                  # sample mean
    sum((x .- μ) .^ 2) / (n - 1)    # unbiased estimator with n - 1 denominator
end
But please, just use StatsBase.var!
I'm more experienced with R than many of my peers, yet it sometimes takes me hours to turn a novel-to-me concept into code, and usually a few more to get a successful output. I don't know how to describe this in R terms, so I hope you can help me, either with sample code or by pointing me in the right direction.
I have c(X1, X2, X3, ..., Xn) as the starting variable, a non-random numeric value.
I have c(Y1, Y2, Y3, ..., Yn) as the change variable, a non-random numeric value denoting by how much to change X, give or take; it is a value between 0 and 10.
I have c(Z1, Z2, Z3, ..., Zn), which gives the min and max range of each X.
What I want to observe is a random sampling of all the X numbers, each of which has randomly had some multiple of its corresponding Y value added to or subtracted from it. What I'm trying to ask is: how many times will I draw X values which are exactly the X values I initially input, give or take only a low multiple of Y?
For instance,
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Zees<-c(c(115,155),c(450,474),c(510,648),c(200,244))
First iteration: X = c(135, 462, 579, 222); second iteration: X = c(130, 471, 585, 230). As you can see, X in the second iteration has changed by (-5*Y1), (+3*Y2), (+2*Y3), and (+4*Y4).
What I want to output is a list of randomized X values which have changed only by a multiple of their corresponding Y value, and always fall within the range of the given Z values. Further, I want to examine how many times at least one, and only one, X value will be significantly different from the corresponding starting input X.
I feel like I'm not wording the question succinctly, but I also feel that this is why I've posted. I'm not trying to ask for hand-holding, but rather seeking advice.
I am not sure that I understood the question. Do you want to repeat the process numerous times? Is it for the purpose of simulation? Here is the start of a solution.
library(dplyr)
x <- c(135, 462, 579, 222)
y <- c(1, 3, 3, 2)
z.lower <- c(115, 450, 510, 200)
z.upper <- c(155, 474, 648, 244)
temp.df <- data.frame(x, y, z.lower, z.upper)
temp.df %>%
  mutate(samp = sample(seq(-10, 10, 1), nrow(temp.df))) %>%  ### Sample multipliers between -10 and 10
  mutate(new.val = x + samp * y) %>%                         ### Create new X
  mutate(is.bound = new.val < z.upper & new.val > z.lower)   ### Check that it falls within bounds
x y z.lower z.upper samp new.val is.bound
1 135 1 115 155 -10 125 TRUE
2 462 3 450 474 10 492 FALSE
3 579 3 510 648 8 603 TRUE
4 222 2 200 244 6 234 TRUE
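If the idea is to repeat that draw many times and see how often the result stays close to the starting x, a rough sketch of the simulation (the iteration count n_iter and the +/- 1*y closeness threshold are my own illustrative choices):
set.seed(42)  # only so the illustration is reproducible

x <- c(135, 462, 579, 222)
y <- c(1, 3, 3, 2)
n_iter <- 10000

# One draw: every x moves by a random multiple between -10 and 10 of its y
one_draw <- function() x + sample(-10:10, length(x), replace = TRUE) * y

draws <- replicate(n_iter, one_draw())   # length(x) x n_iter matrix

# Proportion of draws where every value lands within +/- 1 * y of its start
close_to_start <- colSums(abs(draws - x) <= y) == length(x)
mean(close_to_start)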
For this dataset, this is a possibility:
Exes <- c(135, 462, 579, 222)
Whys <- c(1, 3, 3, 2)
Zees <- c(c(115, 155), c(450, 474), c(510, 648), c(200, 244))
n <- 10000
x_range_l <- split(Zees, rep(seq_len(length(Zees) / 2), each = 2))
mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y),
                                   size = n, replace = TRUE),
       Whys, x_range_l)
Note that this option depends more on the Zees than the Exes. A more complete way to do it would be:
Exes <- c(135, 462, 579, 222)
Whys <- c(1, 3, 3, 2)
Why_Range <- c(20, 4, 13, 11)
x_range_l <- Map(function(x, y, rng) c(x - y * rng, x + y * rng), Exes, Whys, Why_Range)
n <- 10000
mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y),
                                   size = n, replace = TRUE),
       Whys, x_range_l)
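If the point is to count how often a draw reproduces the starting values exactly, you can assign the mapply() result above to a variable (here called out, my own name) and tally the rows that match Exes:
out <- mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y),
                                          size = n, replace = TRUE),
              Whys, x_range_l)

# Proportion of the n draws in which all four sampled values equal the starting Exes
mean(apply(out, 1, function(row) all(row == Exes)))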
I need to plot a large amount of data, but most of the values are equal to 0. My idea was, in order to save space and computation time, not to store the values equal to 0.
Furthermore, I want to use the geom_line() function of the ggplot2 package in R, because with my data this representation is the best one and has the aesthetics that I want.
My problem is: how, between two values on my X axis, can I plot a line at 0? Do I have to generate the associated data frame, or is there a trick to plot this?
Example:
X Y
117 1
158 14
179 4
187 1
190 1
194 2
197 1
200 4
203 3
208 1
211 1
212 5
218 1
992 15
1001 1
1035 1
1037 28
1046 1
1048 1
1064 14
1078 1
# To generate the DF
library(ggplot2)
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218,
       992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1, 14, 4, 1, 1, 2, 1, 4, 3, 1, 1, 5, 1, 15, 1, 1, 28, 1, 1, 14, 1)
data <- data.frame(X, Y)
g <- ggplot(data = data, aes(x = X, y = Y))
g <- g + geom_line()
g
To give you an idea, what I am trying to do is to convert this image:
to something like this:
http://www.hostingpics.net/viewer.php?id=407269stack2.png
To generate the second figure, I had to define two positions around each peak in order to get this shape.
I tried changing the scale to a continuous or discrete scale, but I did not get good peaks. So, is there a trick to tell ggplot2 that if a position on the X axis is between two values of X, that position should be displayed at 0?
Thank you a lot, any kind of help will be highly appreciated.
Your problem is that R doesn't see any of the intermediate values of X. You can fix that by doing the following:
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218, 992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1,14,4,1,1,2,1,4,3,1,1,5,1,15,1,1,28,1,1,14,1)
Which is your original data frame.
Z <- data.frame(seq(min(X),max(X)))
Creates a data frame that has all of the X values.
colnames(Z)[1] <- "X"
Renames the first column as "X" to be able to merge it with your "data" dataframe.
data <- data.frame(X,Y)
data <- merge(Z[1], data, all.x = TRUE)
Creates a new data frame with all of the interval X values.
data[is.na(data)] <- 0
Sets all Y values that are NA to 0.
g <- ggplot(data = data, aes(x = X, y = Y))
g <- g + geom_line()
g
Now plots it.
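As an aside (not part of the original answer), the same gap-filling can be done in one step with tidyr::complete, assuming you are willing to add a tidyr dependency:
library(ggplot2)
library(tidyr)

data <- data.frame(X, Y)

# Expand X to every integer in its range and fill the missing Y values with 0
data_full <- complete(data, X = full_seq(X, 1), fill = list(Y = 0))

ggplot(data_full, aes(x = X, y = Y)) + geom_line()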
I have a problem when I use the function CA() in R.
My data is :
data
row.names Conscient NonConscient
MoinsSouvent 185 213
PlusieursfMois 98 56
PlusieursfSemaine 28 27
TLJ 5 8
but when I use CA(data), I have :
test <- CA(data)
Error in res.ca$col$coord[, axes] : subscript out of bounds
Can someone help please ?
The problem is due to the fact that in correspondence analysis with a contingency table of size I x J, the number of factorial axes is min(I - 1, J - 1).
You have a 4 x 2 table, so you cannot get a factorial plane, only a single axis (because dim = min(4 - 1, 2 - 1) = 1).
One way to solve this problem is to use CA with the parameter graph set to FALSE.
require(FactoMineR)
data <- matrix(c(185, 213, 98, 56, 28, 27, 5, 8),
ncol = 2, byrow = TRUE)
dimnames(data) <- list(c("ms", "plfm", "plfs", "tlj"),
c("cs", "ncs"))
data <- as.table(data)
res <- CA(data, graph = FALSE)
You can also check the coordinates to see that plotting a plane here is not possible.
res$row$coord
## ms plfm plfs tlj
## -0.0897234 0.2534199 -0.0011732 -0.2501709
res$col$coord
## [,1]
## cs 0.1469
## ncs -0.1527
There is no point in doing a correspondence analysis on a 4 x 2 table. CA is meant to reduce the dimensionality of large contingency tables.
If your variables have so few possible values, you are better off interpreting the contingency table directly, using a chi-squared or Fisher test if needed.
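For instance, a quick test of association on the same 4 x 2 matrix built above (a base-R sketch; chisq.test may warn about small expected counts, in which case Fisher's exact test is the usual fallback):
chisq.test(data)   # chi-squared test of independence on the contingency table
fisher.test(data)  # exact alternative when expected counts are small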