convert 4-dimensional array to 2-dimensional data set in R - r

I would like to convert a 4-dimensional array into a 2-dimensional data set. Below is code for two approaches that do this: a brute-force method using cbind and rbind, and a second method using nested for-loops. However, I suspect there is a better way. Thank you for any suggestions.
R <- 3 # regions
M <- 5 # sites
J <- 2 # samples
T <- 4 # years
# 4-dim example array
y <- array(NA, dim = c(M, J, T, R))
# region 1
y[,1,1,1] = 1; y[,2,1,1] = 2;
y[,1,2,1] = 3; y[,2,2,1] = 4;
y[,1,3,1] = 5; y[,2,3,1] = 6;
y[,1,4,1] = 7; y[,2,4,1] = 8;
# region 2
y[,1,1,2] = 9; y[,2,1,2] = 10;
y[,1,2,2] = 11; y[,2,2,2] = 12;
y[,1,3,2] = 13; y[,2,3,2] = 14;
y[,1,4,2] = 15; y[,2,4,2] = 16;
# region 3
y[,1,1,3] = 17; y[,2,1,3] = 18;
y[,1,2,3] = 19; y[,2,2,3] = 20;
y[,1,3,3] = 21; y[,2,3,3] = 22;
y[,1,4,3] = 23; y[,2,4,3] = 24;
# desired two-dimensional data set
z = read.table(text = "
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
9 10 11 12 13 14 15 16
9 10 11 12 13 14 15 16
9 10 11 12 13 14 15 16
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
17 18 19 20 21 22 23 24
17 18 19 20 21 22 23 24
17 18 19 20 21 22 23 24
17 18 19 20 21 22 23 24
", sep = "", header = FALSE)
# using cbind and rbind to convert 4-dimensional array to 2-dimensional data set
r1 <- cbind(y[,,1,1], y[,,2,1], y[,,3,1], y[,,4,1])
r2 <- cbind(y[,,1,2], y[,,2,2], y[,,3,2], y[,,4,2])
r3 <- cbind(y[,,1,3], y[,,2,3], y[,,3,3], y[,,4,3])
my.data <- rbind(r1,r2,r3)
my.data
# using nested for-loops to convert 4-dimensional array to 2-dimensional data set
m2 <- matrix(NA, nrow = M*R, ncol= J*T)
for(i in 1:R) {
for(j in 1:T) {
m2[(M*(i-1) + (1:M)), (J*(j-1) + (1:J))] = y[,,j,i]
}
}
m2
# basis for nested for-loops above
m3 <- matrix(NA, nrow = M*R, ncol= J*T)
m3[(M*0 + (1:M)), (J*0 + (1:J))] = y[,,1,1]
m3[(M*0 + (1:M)), (J*1 + (1:J))] = y[,,2,1]
m3[(M*0 + (1:M)), (J*2 + (1:J))] = y[,,3,1]
m3[(M*0 + (1:M)), (J*3 + (1:J))] = y[,,4,1]
m3[(M*1 + (1:M)), (J*0 + (1:J))] = y[,,1,2]
m3[(M*1 + (1:M)), (J*1 + (1:J))] = y[,,2,2]
m3[(M*1 + (1:M)), (J*2 + (1:J))] = y[,,3,2]
m3[(M*1 + (1:M)), (J*3 + (1:J))] = y[,,4,2]
m3[(M*2 + (1:M)), (J*0 + (1:J))] = y[,,1,3]
m3[(M*2 + (1:M)), (J*1 + (1:J))] = y[,,2,3]
m3[(M*2 + (1:M)), (J*2 + (1:J))] = y[,,3,3]
m3[(M*2 + (1:M)), (J*3 + (1:J))] = y[,,4,3]
m3

It took a couple of tries, but:
matrix(aperm(y,c(1,4,2,3)),15)
or more generally
matrix(aperm(y,c(1,4,2,3)),prod(dim(y)[c(1,4)]))
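A minimal sketch that wraps the aperm() answer in a helper and checks it against the nested-loop result from the question. It assumes the same dimension order as the question (sites, samples, years, regions). The helper name flatten_4d is illustrative, and Tn replaces T so the TRUE shorthand is not masked.

```r
# Rebuild the example array: y[site, sample, year, region]
M <- 5; J <- 2; Tn <- 4; R <- 3
y <- array(NA_real_, dim = c(M, J, Tn, R))
value <- 1
for (r in 1:R) for (yr in 1:Tn) for (j in 1:J) {
  y[, j, yr, r] <- value  # same value for every site, as in the question
  value <- value + 1
}

# Hypothetical helper: rows = (site, region), columns = (sample, year)
flatten_4d <- function(a) {
  d <- dim(a)
  # aperm() moves dimensions 1 and 4 to the front, so in column-major order
  # site varies fastest, then region (rows), then sample, then year (columns)
  matrix(aperm(a, c(1, 4, 2, 3)), nrow = d[1] * d[4], ncol = d[2] * d[3])
}

z <- flatten_4d(y)

# Reference result from the nested for-loops in the question
m2 <- matrix(NA_real_, nrow = M * R, ncol = J * Tn)
for (i in 1:R) for (j in 1:Tn) {
  m2[M * (i - 1) + (1:M), J * (j - 1) + (1:J)] <- y[, , j, i]
}

identical(z, m2)
```

The key design point is choosing the aperm() order so that the dimensions that should vary down the rows come first, in the order they should vary.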

If you came here looking for how to collapse an array into one that still has more than two dimensions, use array() instead of matrix() and give the dimensions you want in its dim argument. Code that also works for the problem above is:
array(aperm(y,c(1,4,2,3)), dim=c(15,8))
This is easy to modify if you want the output to be, say, a 3-d array: add another value to dim. The aperm() step may not be necessary in your particular case, but always check that the collapsed array is in the order you want and use aperm() accordingly.
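For example, here is a sketch that assumes the same y as in the question and keeps one site-by-(sample, year) matrix per region, stacked along a third dimension. The storage order of y already matches this layout because sample varies faster than year and region is the last dimension. So array() alone works here, and aperm() is not needed.

```r
M <- 5; J <- 2; Tn <- 4; R <- 3
y <- array(NA_real_, dim = c(M, J, Tn, R))
value <- 1
for (r in 1:R) for (yr in 1:Tn) for (j in 1:J) {
  y[, j, yr, r] <- value
  value <- value + 1
}

# 3-d result: [site, sample-year column, region]; no aperm() needed
y3 <- array(y, dim = c(M, J * Tn, R))

# The 2-d result from the aperm() answer, for comparison
z <- matrix(aperm(y, c(1, 4, 2, 3)), nrow = M * R)

# Each region's slice should equal that region's block of rows in z
all(sapply(1:R, function(r) identical(y3[, , r], z[M * (r - 1) + (1:M), ])))
```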


Issue with Density Plot using GGPLOT2 [closed]

I want to plot the densities of two groups. My code is below.
library(ggplot2)
#Sample data
dat <- data.frame(Score = c(myfiles2Best$V2, myfilesL2Best$V2)
, Group = rep(c("T", "L")))
ggplot(dat, aes(x = Score)) +
geom_density(aes(color = Group)) + xlim(0,16)
Below is the image of the output.
When I swap the order of the two vectors in the data frame, as shown below, the plot looks different:
dat <- data.frame(Score = c(myfilesL2Best$V2, myfiles2Best$V2)
, Group = rep(c("L", "T")))
Plotted individually, they look like this:
dat <- data.frame(Score = c(myfiles2Best$V2)
, Group = rep(c("T")))
ggplot(dat, aes(x = Score)) +
geom_density(aes(color = Group)) + xlim(0,16)
dat <- data.frame(Score = c(myfilesL2Best$V2)
, Group = rep(c("L")))
ggplot(dat, aes(x = Score)) +
geom_density(aes(color = Group)) + xlim(0,16)
This looks totally wrong. Is anything wrong with my setup? Here is the data:
rownumber score group
1 8 T
2 8 L
3 7 T
4 7 L
5 9 T
6 8 L
7 8 T
8 7 L
9 8 T
10 8 L
11 8 T
12 9 L
13 8 T
14 8 L
15 8 T
16 8 L
17 9 T
18 7 L
19 9 T
20 7 L
21 8 T
22 10 L
23 8 T
24 8 L
25 9 T
26 8 L
27 8 T
28 8 L
29 9 T
30 8 L
31 7 T
32 10 L
33 8 T
34 10 L
35 8 T
36 7 L
37 8 T
38 7 L
39 11 T
40 9 L
41 8 T
42 9 L
43 8 T
44 10 L
45 8 T
46 9 L
47 8 T
48 8 L
49 8 T
50 7 L
51 9 T
52 8 L
53 8 T
54 9 L
55 8 T
56 7 L
57 7 T
58 9 L
59 10 T
60 8 L
ggplot2::geom_density uses base R's density() function to compute the density (see ?geom_density). density() requires a smoothing parameter, the bandwidth. By default, the bandwidth is chosen by a rule named "nrd0", which was picked for "historical and compatibility reasons" (see ?density). You will get density plots with different appearances depending on this parameter.
From ?bandwidth:
bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one-fifth power (= Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31))) unless the quartiles coincide when a positive result will be guaranteed.
In your example, the two subgroups look like they have different standard deviations and IQRs, so it makes sense to me that they would look different depending on whether that smoothing parameter is calculated for them collectively (as in the case with the combined plot) or individually.
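To see how much the default rule depends on the data, here is a small sketch. It re-enters the scores from the table above, taking the alternating T/L labels as given. It implements the rule quoted from ?bandwidth by hand, omitting the special case where the quartiles coincide, and compares the result with bw.nrd0() for each group and for the pooled data. The helper nrd0_by_hand is illustrative and not part of any package.

```r
# Scores and groups transcribed from the table in the question (rows 1-60)
score <- c(8, 8, 7, 7, 9, 8, 8, 7, 8, 8,
           8, 9, 8, 8, 8, 8, 9, 7, 9, 7,
           8, 10, 8, 8, 9, 8, 8, 8, 9, 8,
           7, 10, 8, 10, 8, 7, 8, 7, 11, 9,
           8, 9, 8, 10, 8, 9, 8, 8, 8, 7,
           9, 8, 8, 9, 8, 7, 7, 9, 10, 8)
group <- rep(c("T", "L"), times = 30)  # the table alternates T, L, T, L, ...

# Silverman's rule of thumb as described in ?bandwidth
# (the fallback used when the quartiles coincide is omitted here)
nrd0_by_hand <- function(x) {
  0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1/5)
}

per_group <- tapply(score, group, bw.nrd0)  # one bandwidth per group
pooled <- bw.nrd0(score)                    # one bandwidth for all 60 values
per_group
pooled
```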
If you want your density plots to correspond between a grouped and individual basis, specify the bandwidth manually:
ggplot(df, aes(x = score)) +
geom_density(aes(color = group), bw = 0.3) +
xlim(0,16)
ggplot(subset(df, group == "L"), aes(x = score)) +
geom_density(aes(color = group), bw = 0.3) +
xlim(0,16)
ggplot(subset(df, group == "T"), aes(x = score)) +
geom_density(aes(color = group), bw = 0.3) +
xlim(0,16)
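It is also worth checking how the Group column is built in the question. rep(c("T", "L")) with no times or each argument just returns c("T", "L"). data.frame() then recycles that length-2 vector down all the rows, so the labels alternate T, L, T, L regardless of which vector each score came from. This matches the alternating group column in the table above. With labels assigned this way, swapping the order of the two vectors changes which scores are labelled T, which on its own would change the combined plot. A sketch with short made-up vectors, t_scores and l_scores, standing in for myfiles2Best$V2 and myfilesL2Best$V2:

```r
# Hypothetical stand-ins for myfiles2Best$V2 and myfilesL2Best$V2
t_scores <- c(8, 7, 9)
l_scores <- c(10, 10, 10)

# As in the question: the length-2 group vector is recycled, so labels alternate
wrong <- data.frame(Score = c(t_scores, l_scores),
                    Group = rep(c("T", "L")))

# Label each score by the vector it actually came from
right <- data.frame(Score = c(t_scores, l_scores),
                    Group = rep(c("T", "L"),
                                times = c(length(t_scores), length(l_scores))))

as.character(wrong$Group)
as.character(right$Group)
```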

Applying a function for multiple groups using dplyr

I have some data for multiple locations and years:
big.data <- data.frame(loc.id = rep(1:3, each = 10*3),
year = rep(rep(1981:1983, each = 10),times = 3),
day = rep(1:10, times = 3*3),
CN = rep(c(50,55,58), each = 10*3),
top.FC = rep(c(72,76,80),each = 10*3),
DC = rep(c(0.02,0.5,0.8), each = 10*3),
WAT0 = rep(c(20,22,26), each = 10*3),
Precp = sample(1:100,90, replace = TRUE),
ETo = sample(1:10,90, replace = TRUE))
I have a function, water.model, which internally calls a second function, water.update:
water.model <- function(dat){
top.FC <- unique(dat$top.FC)
dat$WAT <- -9.9
dat$RO <- -9.9
dat$DR <- -9.9
dat$WAT[1] <- top.FC/2 # WAT.i is a constant
dat$RO[1] <- NA
dat$DR[1] <- NA
for(d in 1:(nrow(dat)-1)){
dat[d + 1,10:12] <- water.update(WAT0 = dat$WAT[d],
RAIN.i = dat$Precp[d + 1],
ETo.i = dat$ETo[d + 1],
CN = unique(dat$CN),
DC = unique(dat$DC),
top.FC = unique(dat$top.FC))
}
return(dat)
}
water.update <- function(WAT0, RAIN.i, ETo.i, CN, DC, top.FC){
S = 25400/CN - 254; IA = 0.2*S
if (RAIN.i > IA) { RO = (RAIN.i - 0.2 * S)^2/(RAIN.i + 0.8 * S)
} else {
RO = 0
}
if (WAT0 + RAIN.i - RO > top.FC) {
DR = DC * (WAT0 + RAIN.i - RO - top.FC)
} else {
DR = 0
}
dWAT = RAIN.i - RO - DR - ETo.i
WAT1 = WAT0 + dWAT
WAT1 <- ifelse(WAT1 < 0, 0, WAT1)
return(list(WAT1,RO,DR))
}
If I run the above function for a single location and year:
big.data.sub <- big.data[big.data$loc.id == 1 & big.data$year == 1981,]
water.model(big.data.sub)
loc.id year day CN top.FC DC WAT0 Precp ETo WAT RO DR
1 1 1981 1 50 72 0.02 20 52 5 36.0000 NA NA
2 1 1981 2 50 72 0.02 20 12 9 39.0000 0.0000000 0.000000
3 1 1981 3 50 72 0.02 20 3 2 40.0000 0.0000000 0.000000
4 1 1981 4 50 72 0.02 20 81 9 107.8750 3.2091485 0.915817
5 1 1981 5 50 72 0.02 20 37 10 133.4175 0.0000000 1.457501
6 1 1981 6 50 72 0.02 20 61 7 184.5833 0.3937926 2.440475
7 1 1981 7 50 72 0.02 20 14 10 186.0516 0.0000000 2.531665
8 1 1981 8 50 72 0.02 20 9 6 186.5906 0.0000000 2.461032
9 1 1981 9 50 72 0.02 20 77 9 248.3579 2.4498216 3.782815
10 1 1981 10 50 72 0.02 20 18 6 256.4708 0.0000000 3.887159
How do I run this for all locations and years?
big.data %>% group_by(loc.id, year) %>% # apply my function here.
My final data should look like the output above, with the three new columns WAT, RO and DR that the function generates.
We can split the data by location and year and apply water.model to each piece by looping over the list with map_df:
library(tidyverse)
split(big.data, big.data[c('loc.id', 'year')], drop = TRUE) %>%
map_df(water.model)
Or apply the function within do() after group_by():
big.data %>%
group_by(loc.id, year) %>%
do(data.frame(water.model(.)))
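If you would rather avoid the tidyverse, the same split-apply-combine can be sketched in base R with lapply() and do.call(rbind, ...). The block below restates water.update and water.model compactly from the question, with the same logic, so that it runs on its own. It calls set.seed() only to make the sampled Precp/ETo columns reproducible.

```r
set.seed(1)  # only so the sampled Precp/ETo values are reproducible
big.data <- data.frame(loc.id = rep(1:3, each = 10 * 3),
                       year = rep(rep(1981:1983, each = 10), times = 3),
                       day = rep(1:10, times = 3 * 3),
                       CN = rep(c(50, 55, 58), each = 10 * 3),
                       top.FC = rep(c(72, 76, 80), each = 10 * 3),
                       DC = rep(c(0.02, 0.5, 0.8), each = 10 * 3),
                       WAT0 = rep(c(20, 22, 26), each = 10 * 3),
                       Precp = sample(1:100, 90, replace = TRUE),
                       ETo = sample(1:10, 90, replace = TRUE))

# water.update from the question, condensed (same logic)
water.update <- function(WAT0, RAIN.i, ETo.i, CN, DC, top.FC) {
  S <- 25400 / CN - 254; IA <- 0.2 * S
  RO <- if (RAIN.i > IA) (RAIN.i - 0.2 * S)^2 / (RAIN.i + 0.8 * S) else 0
  DR <- if (WAT0 + RAIN.i - RO > top.FC) DC * (WAT0 + RAIN.i - RO - top.FC) else 0
  WAT1 <- max(WAT0 + RAIN.i - RO - DR - ETo.i, 0)  # clamp at zero
  list(WAT1, RO, DR)
}

# water.model from the question, condensed (same logic)
water.model <- function(dat) {
  dat$WAT <- -9.9; dat$RO <- -9.9; dat$DR <- -9.9  # new columns 10:12
  dat$WAT[1] <- unique(dat$top.FC) / 2
  dat$RO[1] <- NA; dat$DR[1] <- NA
  for (d in 1:(nrow(dat) - 1)) {
    dat[d + 1, 10:12] <- water.update(WAT0 = dat$WAT[d],
                                      RAIN.i = dat$Precp[d + 1],
                                      ETo.i = dat$ETo[d + 1],
                                      CN = unique(dat$CN),
                                      DC = unique(dat$DC),
                                      top.FC = unique(dat$top.FC))
  }
  dat
}

# Split by location-year, run the model on each piece, stack the results
pieces <- split(big.data, big.data[c("loc.id", "year")], drop = TRUE)
result <- do.call(rbind, lapply(pieces, water.model))
rownames(result) <- NULL
head(result)
```

As with the map_df() version, the rows come back ordered by piece (location varying fastest within each year) rather than in the original row order.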

formula to map between two sequences

Given the three sequences below, I would like to map seqN to seq0. seq1 is obtained from seq0, and seq2 from seq1, by:
$$\mathrm{seq}_N(i) = (\mathrm{seq}_{N-1}(i) \cdot 7) \bmod 27$$
For example,
$$\mathrm{seq}_1(i) = (\mathrm{seq}_0(i) \cdot 7) \bmod 27$$
seq0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
seq1 0 7 14 21 1 8 15 22 2 9 16 23 3 10 17 24 4 11 18 25 5 12 19 26 6 13 20 27
seq2 0 22 17 12 7 2 24 19 14 9 4 26 21 16 11 6 1 23 18 13 8 3 25 20 15 10 5 27
One way to do this would be to apply the function recursively, but that is not an option for me because this needs to be implemented in hardware. Nor can I save the values of the previous sequence to calculate the current one: I don't have space for it.
Is there a way to map seqN to seq0 using a mathematical equation (not recursive functions)?
Apart from what @Nitpick pointed out (27 % 27 = 0, so you either have to stop at 26 or use 28 instead), you should simply use a power:
$$\mathrm{seq}_N(i) = (\mathrm{seq}_0(i) \cdot 7^N) \bmod 27$$
Or, unrolling the recursion:
$$\mathrm{seq}_N(i) = (\mathrm{seq}_{N-1}(i) \cdot 7) \bmod 27 = (\mathrm{seq}_{N-2}(i) \cdot 7^2) \bmod 27 = \dots = (\mathrm{seq}_{N-N}(i) \cdot 7^N) \bmod 27$$
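Here is a sketch of both directions in R; the function names are illustrative. To go forward, the multiplier 7^N mod 27 is computed by repeated multiplication modulo 27, so no intermediate value grows large. Since 7^9 ≡ 1 (mod 27), only N mod 9 matters, which means hardware could use a 9-entry constant table instead. To go back from seqN to seq0, the code uses the modular inverse of 7, which is 4 because 7 · 4 = 28 ≡ 1 (mod 27). So seq0(i) = (seqN(i) · 4^N) mod 27. Indices stop at 26, following the 27 % 27 = 0 point above.

```r
# base^e mod m by repeated multiplication, reducing mod m at every step
pow_mod <- function(base, e, m) {
  result <- 1
  for (k in seq_len(e)) result <- (result * base) %% m
  result
}

# seq_N from seq_0: 7 has multiplicative order 9 mod 27, so reduce N mod 9 first
seq_forward <- function(seq0, n) (seq0 * pow_mod(7, n %% 9, 27)) %% 27

# seq_0 from seq_N: 4 is the inverse of 7 mod 27 (7 * 4 = 28 = 1 mod 27)
seq_backward <- function(seqn, n) (seqn * pow_mod(4, n %% 9, 27)) %% 27

# Recursive definition from the question, used only for comparison
seq_recursive <- function(seq0, n) {
  s <- seq0
  for (k in seq_len(n)) s <- (s * 7) %% 27
  s
}

seq_forward(0:26, 2)
seq_backward(seq_forward(0:26, 2), 2)
```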
Not sure if this helps, but the transition from seq_N(i) to seq_{N+1}(i) or seq_{N-1}(i) is completely determined by the following set of cycles:
List(0)
List(1, 7, 22, 19, 25, 13, 10, 16, 4)
List(2, 14, 17, 11, 23, 26, 20, 5, 8)
List(3, 21, 12)
List(6, 15, 24)
List(9)
List(18)
such that 0 -> 0, 1 -> 7 -> 22 -> 19 -> 25 -> ..., etc.
Based on these cycles, you could take a number's position in its cycle, shift it back by N % length_of_cycle (wrapping around the end of the cycle), and get the original number.
I found these cycles with the following code (written in Scala):
val m = (0 to 26).map(x => (x, x * 7 % 27)).toMap
var cycles = (0 to 26).map(x => {
var x_i = x
val oot = scala.collection.mutable.ArrayBuffer(x)
while (m(x_i) != x)
{
x_i = m(x_i)
oot.append(x_i)
}
oot.toList
})
var filteredCycles = cycles.toArray
for (i <- (1 to 26))
{
val cycle = filteredCycles(i)
for (j <- (1 until cycle.size))
{
filteredCycles(cycle(j)) = List[Int]()
}
}
val uniqueCycles = filteredCycles.filter(_.size != 0)
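For comparison, here is a sketch of the same cycle search in R, plus the position-shift lookup described above; the names cycles and step_back are illustrative. It follows each cycle from its smallest unvisited element, so the cycles come out in the same order as the list above.

```r
# Image of each x in 0..26 under x -> 7x mod 27 (perm[x + 1] is the image of x)
perm <- (0:26 * 7) %% 27

# Follow each not-yet-visited element around its cycle
visited <- rep(FALSE, 27)
cycles <- list()
for (start in 0:26) {
  if (visited[start + 1]) next
  cyc <- start
  visited[start + 1] <- TRUE
  x <- perm[start + 1]
  while (x != start) {
    cyc <- c(cyc, x)
    visited[x + 1] <- TRUE
    x <- perm[x + 1]
  }
  cycles[[length(cycles) + 1]] <- cyc
}

# Undo n steps: move n positions backwards within v's cycle, wrapping around
step_back <- function(v, n) {
  for (cyc in cycles) {
    pos <- match(v, cyc)
    if (!is.na(pos)) return(cyc[((pos - 1 - n) %% length(cyc)) + 1])
  }
  NA
}

step_back(22, 2)  # seq2 value 22 came from seq0 value 1
```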

R ggplot geom_errorbar is not displaying the whiskers

Plotting my data in R with ggplot, the error bar whiskers are not displayed. Why are the whiskers not displayed and what is the fix so they will be displayed?
(Though it is not necessary to specify "data=..." in geom_errorbar, I am using a smaller set of data to plot a few points and their error bars on top of a larger set of data. I simplified here to just use the smaller data frame for everything but want to keep this example close to the code I intend to use.)
Thanks!
Shawna
shapes <- c(1, 19, 15, 1, 0)
names(shapes) <- levels(smallDF$Treatment)
p <- ggplot(data=smallDF, aes(x=pNew, y=diff, group=Treatment))
p <- p + geom_errorbar(data=smallDF, aes(ymin=diff-se,ymax=diff+se),
color="black", width=.3, position=position_dodge(.5))
p <- p + geom_line(size=.3)
p <- p + geom_point(data=smallDF, aes(shape=Treatment),fill="white",
size=2.5)
#p <- p + scale_shape_manual(values=c(1, 19, 15, 1, 0))
p <- p + scale_shape_manual(values=shapes)
p <- p + xlab("Pressure (mmHg)") + ylab("delD (mm)")
p <- p + theme_bw()
p <- p + theme(
legend.position="none"
# , axis.text.y=element_blank()
# , axis.title.y=element_blank()
, panel.border=element_blank()
, axis.line = element_line(colour = "black")
, axis.text.x = element_text(size=10)
, axis.text.y = element_text(size=10)
, axis.title.x= element_text(size=10)
, axis.title.y= element_text(size=10)
, strip.text.x = element_text(size = 10)
)
p
Treatment step N diff sd se ci predictD pNew
cntl 2 7 0.256537749 0.130605763 0.049364339 0.120790185 1.483185156 10
cntl 3 7 0.317586245 0.151444256 0.057240549 0.140062577 1.626590815 15
cntl 4 7 0.377309785 0.165262839 0.062463482 0.152842634 1.788401781 20
cntl 5 7 0.433531627 0.173735352 0.065665791 0.160678402 1.964393744 25
cntl 6 7 0.467529177 0.171603123 0.064859884 0.158706419 2.142879696 30
cntl 7 7 0.441401156 0.163740786 0.0618882 0.15143497 2.291729181 35
cntl 8 7 0.360578168 0.143967074 0.054414439 0.133147337 2.402152991 40
cntl 9 7 0.263484929 0.117425017 0.044382485 0.108600028 2.481824239 45
cntl 10 7 0.172079736 0.094209661 0.035607905 0.087129405 2.53504158 50
cntl 11 7 0.081780331 0.070316765 0.026577239 0.065032161 2.561500546 55
cntl 12 7 0.172079736 0.094209661 0.035607905 0.087129405 2.53504158 50
cntl 13 7 0.263484929 0.117425017 0.044382485 0.108600028 2.481824239 45
cntl 14 7 0.360578168 0.143967074 0.054414439 0.133147337 2.402152991 40
cntl 15 7 0.441401156 0.163740786 0.0618882 0.15143497 2.291729181 35
cntl 16 7 0.467529177 0.171603123 0.064859884 0.158706419 2.142879696 30
cntl 17 7 0.433531627 0.173735352 0.065665791 0.160678402 1.964393744 25
cntl 18 7 0.377309785 0.165262839 0.062463482 0.152842634 1.788401781 20
cntl 19 7 0.317586245 0.151444256 0.057240549 0.140062577 1.626590815 15
cntl 20 7 0.256537749 0.130605763 0.049364339 0.120790185 1.483185156 10
3hpx 2 6 0.124643574 0.068765439 0.028073373 0.072164903 1.511618688 10
3hpx 3 8 0.121806932 0.088542241 0.03130441 0.074023166 1.537544183 15
3hpx 4 8 0.138107729 0.097228081 0.034375318 0.08128471 1.602344034 20
3hpx 5 8 0.149529 0.10158369 0.035915258 0.08492609 1.665227481 25
3hpx 6 8 0.157687817 0.101898303 0.036026491 0.085189113 1.724788023 30
3hpx 7 8 0.154261671 0.099724849 0.035258058 0.08337206 1.776475381 35
3hpx 8 8 0.140631071 0.091483297 0.03234423 0.07648195 1.819131367 40
3hpx 9 8 0.123241311 0.083112381 0.029384664 0.069483689 1.854326249 45
3hpx 10 8 0.103092334 0.075467614 0.026681831 0.063092504 1.882440312 50
3hpx 11 8 0.080332775 0.070303352 0.024855989 0.058775073 1.903587298 55
3hpx 12 8 0.103092334 0.075467614 0.026681831 0.063092504 1.882440312 50
3hpx 13 8 0.123241311 0.083112381 0.029384664 0.069483689 1.854326249 45
3hpx 14 8 0.140631071 0.091483297 0.03234423 0.07648195 1.819131367 40
3hpx 15 8 0.154261671 0.099724849 0.035258058 0.08337206 1.776475381 35
3hpx 16 8 0.157687817 0.101898303 0.036026491 0.085189113 1.724788023 30
3hpx 17 8 0.149529 0.10158369 0.035915258 0.08492609 1.665227481 25
3hpx 18 8 0.138107729 0.097228081 0.034375318 0.08128471 1.602344034 20
3hpx 19 8 0.121806932 0.088542241 0.03130441 0.074023166 1.537544183 15
3hpx 20 6 0.124643574 0.068765439 0.028073373 0.072164903 1.511618688 10
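I changed your width = 0.3 to width = 5. The width of the error-bar crossbars is measured in x-axis data units, and because pNew runs from 10 to 55, a width of 0.3 is too narrow to see. I also deleted all the redundant data = smallDF arguments and replaced your individually set font sizes with the base_size argument of theme_bw().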
p <- ggplot(data = smallDF, aes(x = pNew, y = diff, group = Treatment)) +
geom_errorbar(aes(ymin = diff - se, ymax = diff + se),
color = "black",
width = 5, ## This is the width of the crossbars!
position = position_dodge(.5)) +
geom_line(size = .3) +
geom_point(aes(shape = Treatment),
fill = "white", size = 2.5) +
scale_shape_manual(values = shapes) +
xlab("Pressure (mmHg)") +
ylab("delD (mm)") +
theme_bw(base_size = 10) +
theme(
legend.position="none"
, panel.border=element_blank()
, axis.line = element_line(colour = "black")
)
p
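To confirm that width is in x data units, here is a small sketch using the first three cntl rows from the table above. layer_data() returns the data ggplot2 computed for a layer, so xmax - xmin gives the whisker width that is actually drawn. The helper name whisker_width is illustrative.

```r
library(ggplot2)

# First three "cntl" rows from the question's data
small <- data.frame(pNew = c(10, 15, 20),
                    diff = c(0.256537749, 0.317586245, 0.377309785),
                    se   = c(0.049364339, 0.057240549, 0.062463482))

# Width of the drawn whiskers, in x data units
whisker_width <- function(w) {
  p <- ggplot(small, aes(x = pNew, y = diff)) +
    geom_errorbar(aes(ymin = diff - se, ymax = diff + se), width = w)
  d <- layer_data(p, 1)
  unique(round(d$xmax - d$xmin, 8))
}

whisker_width(0.3)  # well under 1% of the 10-55 range used in the full plot
whisker_width(5)
```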

ggplot2 merge color and fill legends

I want to merge two legends in ggplot2. I use the following code:
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax,
group = group, fill = "test4 test5"), alpha = 0.2) +
geom_line(aes(y = y, color = "Test2"), data = dat_m) +
scale_colour_manual(values=c("Test2" = "white", "test"="black", "Test3"="red")) +
scale_fill_manual(values = c("test4 test5"= "dodgerblue4")) +
theme(legend.title=element_blank(),
legend.position = c(0.8, 0.85),
legend.background = element_rect(fill="transparent"),
legend.key = element_rect(colour = 'purple', size = 0.5))
The output is shown below. There are two problems:
1. When I use two or more words in the fill legend label, the alignment is off.
2. I want to merge the two legends into one, so that the fill entry is just one of four entries in a single legend.
Does anyone know how I can achieve this?
Edit: reproducible data:
dat_m <- read.table(text="x quantile y group
1 1 50 0.4967335 0
2 2 50 0.4978249 0
3 3 50 0.5113562 0
4 4 50 0.4977866 0
5 5 50 0.5013287 0
6 6 50 0.4997994 0
7 7 50 0.4961121 0
8 8 50 0.4991302 0
9 9 50 0.4976087 0
10 10 50 0.5011666 0")
dat_ribbon <- read.table(text="
x ymin group ymax
1 1 0.09779713 40 0.8992385
2 2 0.09979283 40 0.8996875
3 3 0.10309222 40 0.9004759
4 4 0.10058433 40 0.8985366
5 5 0.10259125 40 0.9043807
6 6 0.09643109 40 0.9031940
7 7 0.10199870 40 0.9022920
8 8 0.10018253 40 0.8965690
9 9 0.10292754 40 0.9010934
10 10 0.09399359 40 0.9053067
11 1 0.20164694 30 0.7974174
12 2 0.20082056 30 0.7980642
13 3 0.20837821 30 0.8056074
14 4 0.19903399 30 0.7973723
15 5 0.19903322 30 0.8050146
16 6 0.19965049 30 0.8051922
17 7 0.20592719 30 0.8042850
18 8 0.19810139 30 0.7956606
19 9 0.20537392 30 0.8007527
20 10 0.19325158 30 0.8023044
21 1 0.30016463 20 0.6953927
22 2 0.29803646 20 0.6976961
23 3 0.30803808 20 0.7048137
24 4 0.30045448 20 0.6991248
25 5 0.29562249 20 0.7031225
26 6 0.29647060 20 0.7043499
27 7 0.30159103 20 0.6991356
28 8 0.30369025 20 0.6949053
29 9 0.30196483 20 0.6998127
30 10 0.29578036 20 0.7015861
31 1 0.40045725 10 0.5981147
32 2 0.39796299 10 0.5974115
33 3 0.41056038 10 0.6057062
34 4 0.40046287 10 0.5943157
35 5 0.39708008 10 0.6014512
36 6 0.39594129 10 0.6011162
37 7 0.40052411 10 0.5996186
38 8 0.40128517 10 0.5959748
39 9 0.39917658 10 0.6004600
40 10 0.39791453 10 0.5999168")
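You are not using ggplot2 according to its philosophy, which makes things difficult. ggplot2 builds legends from aesthetic mappings, and it merges the colour and fill legends only when the two scales have the same name and the same labels. Below, a geom_blank() layer supplies all four levels to both scales, and both scales are given the same name: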
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group, fill = "test4 test5"),
alpha = 0.2) +
geom_line(aes(y = y, color = "Test2"), data = dat_m) +
geom_blank(data = data.frame(x = rep(5, 4), y = 0.5,
group = c("test4 test5", "Test2", "test", "Test3")),
aes(y = y, color = group, fill = group)) +
scale_color_manual(name = "combined legend",
values=c("test4 test5"= NA, "Test2" = "white",
"test"="black", "Test3"="red")) +
scale_fill_manual(name = "combined legend",
values = c("test4 test5"= "dodgerblue4",
"Test2" = NA, "test"=NA, "Test3"=NA))
