graphic visualization of correlation between samples - r

s1
name tis1 tis2 tis3 tis4 tis5 tis6 tis7 tis8 tis9 tis10 tis11 tis12
S1 0 0 0 12.1 29.2 1.9 0.45 0.2 17.0 0.4 0.7 0.1
s2
name tis1 tis2 tis3 tis4 tis5 tis6 tis7 tis8 tis9 tis10 tis11 tis12
S2 1 2 0.4 14.1 9.2 1.8 0.7 0.9 7.0 0.3 0.7 0.3
I would like to plot them to visualize their degree of correlation.
Is there a way to do it?
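One possible approach, sketched in base R under the assumption that the two samples are entered as plain numeric vectors (the names `s1`, `s2`, `tissues`, and `r` are chosen here for illustration): plot one sample against the other, add a least-squares line, and report the Pearson coefficient in the title.

```r
# Expression values for the 12 tissues, copied from the question
s1 <- c(0, 0, 0, 12.1, 29.2, 1.9, 0.45, 0.2, 17.0, 0.4, 0.7, 0.1)
s2 <- c(1, 2, 0.4, 14.1, 9.2, 1.8, 0.7, 0.9, 7.0, 0.3, 0.7, 0.3)
tissues <- paste0("tis", 1:12)

# Pearson correlation between the two samples
r <- cor(s1, s2)

# Scatter plot: each point is one tissue
plot(s1, s2, pch = 19, col = "dodgerblue4",
     xlab = "S1", ylab = "S2",
     main = sprintf("S1 vs S2 (Pearson r = %.2f)", r))
text(s1, s2, labels = tissues, pos = 3, cex = 0.7)

# Least-squares line to show the linear trend
abline(lm(s2 ~ s1), lty = 2)
```

Because a few tissues dominate the range, a rank-based coefficient such as `cor(s1, s2, method = "spearman")` may be worth reporting alongside the Pearson value.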

Related

Times series in R : how to change y-axis?

New R user here, working with meteorological data (the data frame is called "Stations"). I am trying to plot 3 time series with temperature on the y-axis and a regression line on each one, but I have run into a few problems, and there are no error messages.
The loop doesn't seem to be working and I can't figure out why.
I didn't manage to label the x-axis with years ("Année" in the data frame) instead of an index number.
The title is the same for the 3 plots; how do I change it so each plot has its own title?
The regression line is not shown on the graph.
Thanks in advance!
Here is my code :
for (i in c(6,8,10))
plot(ts(Stations[,i]), col="dodgerblue4", xlab="Temps", ylab="Température", main="Genève")
for (i in c(6,8,10))
abline(h=Stations[,i])
Nb.enr time Année Mois Jour T2m_GE pcp_GE T2m_PU pcp_PU T2m_NY
1 19810101 1981 1 1 1.3 0.3 2.8 0.0 2.3
2 19810102 1981 1 2 1.2 0.1 2.3 1.2 1.6
3 19810103 1981 1 3 4.1 21.8 4.9 5.2 3.8
4 19810104 1981 1 4 5.1 10.3 5.1 17.4 4.9
5 19810105 1981 1 5 0.9 0.0 1.0 0.1 0.8
6 19810106 1981 1 6 0.5 5.7 0.7 6.0 0.5
7 19810107 1981 1 7 -2.7 0.0 -2.1 0.1 -1.9
8 19810108 1981 1 8 -3.2 0.0 -4.1 0.0 -3.8
9 19810109 1981 1 9 -5.2 0.0 -3.5 0.0 -5.1
10 19810110 1981 1 10 -3.1 10.6 -0.9 6.0 -2.6
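A sketch of one way to address the four points above, assuming columns 6, 8 and 10 are the temperature series `T2m_GE`, `T2m_PU` and `T2m_NY` and that `time` holds dates as `YYYYMMDD`. Only the ten rows shown are rebuilt here, and the panel titles are simply the column names, since the question names only Genève. `abline(h = ...)` draws one horizontal line per value, so the trend line comes from `lm()` instead, and `par(mfrow = ...)` keeps the three plots from replacing each other.

```r
# Rebuild the ten rows shown in the question (temperature columns only)
Stations <- data.frame(
  time   = 19810101:19810110,
  T2m_GE = c(1.3, 1.2, 4.1, 5.1, 0.9, 0.5, -2.7, -3.2, -5.2, -3.1),
  T2m_PU = c(2.8, 2.3, 4.9, 5.1, 1.0, 0.7, -2.1, -4.1, -3.5, -0.9),
  T2m_NY = c(2.3, 1.6, 3.8, 4.9, 0.8, 0.5, -1.9, -3.8, -5.1, -2.6)
)

# Turn YYYYMMDD integers into Date objects so the x-axis shows dates
dates <- as.Date(as.character(Stations$time), format = "%Y%m%d")

series <- c("T2m_GE", "T2m_PU", "T2m_NY")
fits <- list()

# Three stacked panels; restore the previous layout afterwards
old_par <- par(mfrow = c(3, 1))
for (s in series) {
  temp <- Stations[[s]]
  plot(dates, temp, type = "l", col = "dodgerblue4",
       xlab = "Time", ylab = "Temperature", main = s)
  # Linear trend of temperature against time (in days)
  fits[[s]] <- lm(temp ~ as.numeric(dates))
  abline(fits[[s]], col = "red")
}
par(old_par)
```

Replacing `main = s` with a lookup in a named vector of station names would give each panel a readable title.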

Subset using i statement dynamically created from another data.table's variables

I have data similar to the following:
set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"),
t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1),
t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
t7=round(rnorm(10),1),t8=round(rnorm(10),1))
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6
4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9
5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3
7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0
9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
I need to apply constraints to dt similar to the following (which are read in from a csv using fread):
dt_constraints <- data.table(columns=c("t1","t3","t7","t8"), operator=c(rep(">=",2),rep("<=",2)),
values=c(-.6,-.5,2.4,.5))
columns operator values
1 t1 >= -0.6
2 t3 >= -0.5
3 t7 <= 2.4
4 t8 <= 0.5
I can easily subset dt by typing in the various constraints in the i statement:
dt_sub <- dt[t1>=-.6 & t3 >=-.5 & t7<=2.4 & t8<=.5,]
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0 -0.7
3 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
4 7 OUT 0.5 0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
5 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
6 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
But since the constraints are constantly changing (a new constraints csv is read in each time), I am looking for an efficient way to programmatically apply the constraints directly from dt_constraints to subset dt. The actual data is quite large, as is the number of constraints, so efficiency is key.
Thanks so much.
There is an alternative approach which uses non-equi joins for subsetting:
thresholds <- dt_constraints[, values]
cond <- dt_constraints[, paste0(columns, operator, "V", .I)]
dt[dt[as.list(thresholds), on = cond, which = TRUE]]
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
3: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
4: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
5: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
6: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
We can paste the constraints into a single string, then parse and evaluate it:
dt[eval(parse(text=do.call(paste, c(dt_constraints, collapse= ' & '))))]
# ID Status t1 t2 t3 t4 t5 t6 t7 t8
#1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
#2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
#3: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
#4: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
#5: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
#6: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
If we are using tidyverse, then
library(dplyr)
dt %>%
filter(!!rlang::parse_expr(do.call(paste, c(dt_constraints, collapse= ' & '))))
# ID Status t1 t2 t3 t4 t5 t6 t7 t8
#1 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
#2 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
#3 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
#4 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
#5 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
#6 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
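The last two answers build R code from the contents of the CSV and then evaluate it. A sketch of an alternative that avoids parsing strings: look each operator up with `match.fun()`, apply it to the column, and combine the resulting logical vectors with `&`. It is written in base R on the printed values of the four constrained columns; the names `cons`, `allowed`, `tests`, and `keep` are illustrative and not taken from the answers.

```r
# Constrained columns, copied from the printed table in the question
dt <- data.frame(
  ID = 1:10,
  t1 = c(-0.6, 0.2, -0.8, 1.6, 0.3, -0.8, 0.5, 0.7, 0.6, -0.3),
  t3 = c(0.9, 0.8, 0.1, -2.0, 0.6, -0.1, -0.2, -1.5, -0.5, 0.4),
  t7 = c(2.4, 0.0, 0.7, 0.0, -0.7, 0.2, -1.8, 1.5, 0.2, 2.2),
  t8 = c(0.5, -0.7, 0.6, -0.9, -1.3, 0.3, -0.4, 0.0, 0.1, -0.6)
)

# Constraint table in the same shape as dt_constraints
cons <- data.frame(
  columns  = c("t1", "t3", "t7", "t8"),
  operator = c(">=", ">=", "<=", "<="),
  values   = c(-0.6, -0.5, 2.4, 0.5),
  stringsAsFactors = FALSE
)

# Only accept comparison operators, since the table comes from a file
allowed <- c(">=", "<=", ">", "<", "==", "!=")
stopifnot(all(cons$operator %in% allowed))

# One logical vector per constraint, then AND them together
tests <- Map(function(col, op, val) match.fun(op)(dt[[col]], val),
             cons$columns, cons$operator, cons$values)
keep <- Reduce(`&`, tests)

dt_sub <- dt[keep, ]
dt_sub$ID
```

The same `keep` vector should also work as the `i` argument of a data.table, i.e. `dt[keep]`. Whether this is faster than the non-equi join on large data has not been measured here.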

Using R, data.table, conditionally sum columns

I have a data table similar to this (except it has 150 columns and about 5 million rows):
set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"),
t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1),
t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
t7=round(rnorm(10),1),t8=round(rnorm(10),1))
which outputs:
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6
4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9
5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3
7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0
9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
Using data.table, I would like to add a new column (using :=) called Total that would contain the following:
For each row,
if Status=OUT, sum columns t1:t4 and t8
if Status=IN, sum columns t5,t6,t8
if Status=ON, sum columns t1:t3 and t6:t8
The final output should look like this:
ID Status t1 t2 t3 t4 t5 t6 t7 t8 Total
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5 3.7
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7 0.6
3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6 1.6
4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9 -1.4
5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3 1.4
6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3 -1.0
7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4 -0.5
8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0 -0.2
9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1 0.6
10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6 2.2
I am fairly new to data.table (currently using version 1.9.6) and would like to try for a solution using efficient data.table syntax.
I think doing it one by one, as suggested in comments, is perfectly fine, but you can also create a lookup table:
cond = data.table(Status = c("OUT", "IN", "ON"),
cols = Map(paste0, 't', list(c(1:4, 8), c(5,6,8), c(1:3, 6:8))))
# Status cols
#1: OUT t1,t2,t3,t4,t8
#2: IN t5,t6,t8
#3: ON t1,t2,t3,t6,t7,t8
dt[cond, Total := Reduce(`+`, .SD[, cols[[1]], with = F]), on = 'Status', by = .EACHI]
# ID Status t1 t2 t3 t4 t5 t6 t7 t8 Total
# 1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5 3.7
# 2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7 0.6
# 3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6 1.6
# 4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9 -1.4
# 5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3 1.4
# 6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3 -1.0
# 7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4 -0.5
# 8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0 -0.2
# 9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1 0.6
#10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6 2.2
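For comparison, the one-by-one idea mentioned at the start of the answer can be sketched as a loop over the three statuses. This is a base R version on a plain data frame rebuilt from the printed table, with `dat` and `cols_by_status` as illustrative names; in data.table the same step would presumably be one `:=` assignment per status.

```r
# Data copied from the printed table in the question
dat <- data.frame(
  ID = 1:10,
  Status = c("OUT", "OUT", "IN", "IN", "ON", "OUT", "OUT", "IN", "IN", "ON"),
  t1 = c(-0.6, 0.2, -0.8, 1.6, 0.3, -0.8, 0.5, 0.7, 0.6, -0.3),
  t2 = c(1.5, 0.4, -0.6, -2.2, 1.1, 0.0, 0.0, 0.9, 0.8, 0.6),
  t3 = c(0.9, 0.8, 0.1, -2.0, 0.6, -0.1, -0.2, -1.5, -0.5, 0.4),
  t4 = c(1.4, -0.1, 0.4, -0.1, -1.4, -0.4, -0.4, -0.1, 1.1, 0.8),
  t5 = c(-0.2, -0.3, 0.7, 0.6, -0.7, -0.7, 0.4, 0.8, -0.1, 0.9),
  t6 = c(0.4, -0.6, 0.3, -1.1, 1.4, 2.0, -0.4, -1.0, 0.6, -0.1),
  t7 = c(2.4, 0.0, 0.7, 0.0, -0.7, 0.2, -1.8, 1.5, 0.2, 2.2),
  t8 = c(0.5, -0.7, 0.6, -0.9, -1.3, 0.3, -0.4, 0.0, 0.1, -0.6),
  stringsAsFactors = FALSE
)

# Which columns to add up for each status, as stated in the question
cols_by_status <- list(
  OUT = c("t1", "t2", "t3", "t4", "t8"),
  IN  = c("t5", "t6", "t8"),
  ON  = c("t1", "t2", "t3", "t6", "t7", "t8")
)

# Fill Total one status at a time
dat$Total <- NA_real_
for (s in names(cols_by_status)) {
  rows <- dat$Status == s
  dat$Total[rows] <- rowSums(dat[rows, cols_by_status[[s]], drop = FALSE])
}

dat[, c("ID", "Status", "Total")]
```

With only three statuses, the loop makes three vectorised passes over the data, which is why the one-by-one approach stays reasonable even for many rows.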

ggplot2 - combining shape and color legend with common title

I have a data frame that looks like this.
lambda lambdas mu p
1 0.5 0.25 3.6 1.931105
2 0.5 0.25 3.8 2.150458
3 0.5 0.25 4.0 2.264805
4 0.5 0.25 4.2 2.337036
5 0.5 0.25 4.4 2.385832
6 0.5 0.25 4.6 2.420036
7 0.5 0.25 4.8 2.444610
8 0.5 0.25 5.0 2.462598
9 0.5 0.25 5.2 2.475974
10 0.5 0.25 5.4 2.486068
11 0.5 0.25 5.6 2.493801
12 0.5 0.25 5.8 2.499824
13 0.5 0.25 6.0 2.504604
14 0.5 0.25 6.2 2.508482
15 0.5 0.25 6.4 2.511708
16 0.5 0.25 6.6 2.514465
17 0.5 0.25 6.8 2.516892
18 0.5 0.25 7.0 2.519091
19 0.5 0.25 7.2 2.521137
20 0.5 0.25 7.4 2.523088
21 0.5 0.25 7.6 2.524984
22 0.5 0.25 7.8 2.526858
23 0.5 0.25 8.0 2.528729
24 0.5 0.3 4.0 1.453073
25 0.5 0.3 4.2 1.676078
26 0.5 0.3 4.4 1.769432
27 0.5 0.3 4.6 1.829259
28 0.5 0.3 4.8 1.871153
29 0.5 0.3 5.0 1.901801
30 0.5 0.3 5.2 1.924841
31 0.5 0.3 5.4 1.942502
32 0.5 0.3 5.6 1.956246
33 0.5 0.3 5.8 1.967078
34 0.5 0.3 6.0 1.975710
35 0.5 0.3 6.2 1.982661
36 0.5 0.3 6.4 1.988317
37 0.5 0.3 6.6 1.992968
38 0.5 0.3 6.8 1.996834
39 0.5 0.3 7.0 2.000085
40 0.5 0.3 7.2 2.002856
41 0.5 0.3 7.4 2.005248
42 0.5 0.3 7.6 2.007343
43 0.5 0.3 7.8 2.009207
44 0.5 0.3 8.0 2.010890
45 0.5 0.35 4.8 1.330792
46 0.5 0.35 5.0 1.415920
47 0.5 0.35 5.2 1.466734
48 0.5 0.35 5.4 1.502578
49 0.5 0.35 5.6 1.529478
50 0.5 0.35 5.8 1.550365
51 0.5 0.35 6.0 1.566948
52 0.5 0.35 6.2 1.580327
53 0.5 0.35 6.4 1.591256
54 0.5 0.35 6.6 1.600275
55 0.5 0.35 6.8 1.607783
56 0.5 0.35 7.0 1.614081
57 0.5 0.35 7.2 1.619400
58 0.5 0.35 7.4 1.623922
59 0.5 0.35 7.6 1.627789
60 0.5 0.35 7.8 1.631118
61 0.5 0.35 8.0 1.634000
62 0.5 0.4 6.0 1.093701
63 0.5 0.4 6.2 1.177399
64 0.5 0.4 6.4 1.214441
65 0.5 0.4 6.6 1.240465
66 0.5 0.4 6.8 1.260447
67 0.5 0.4 7.0 1.276454
68 0.5 0.4 7.2 1.289608
69 0.5 0.4 7.4 1.300606
70 0.5 0.4 7.6 1.309920
71 0.5 0.4 7.8 1.317887
72 0.5 0.4 8.0 1.324755
Using "scale_fill_discrete", I can combine the shape and color legends into one, but the title of the legend is not working. In other words:
plot = ggplot(data, aes(x = mu, y = p, color = lambdas, shape = lambdas))
plot = plot + geom_line()
plot = plot + geom_point(size = 2.5)
plot = plot + scale_fill_discrete(name = expression(lambda^{s}))
plot = plot + ylim(0,3)
plot = plot + ggtitle(expression(paste(p, ' when ', lambda, '= 0.5', sep = '')))
plot = plot + xlab(expression(mu)) + ylab(expression(p))
plot
this code gives the picture below.
On the other hand, if I use "scale_shape_discrete" and "scale_color_discrete", then the title works out, but the legends are now separated. In other words:
plot = ggplot(data, aes(x = mu, y = p, color = lambdas, shape = lambdas))
plot = plot + geom_line()
plot = plot + geom_point(size = 2.5)
plot = plot + scale_shape_discrete(name = expression(lambda^{s}))
plot = plot + scale_color_discrete(name = expression(lambda^{s}))
plot = plot + ylim(0,3)
plot = plot + ggtitle(expression(paste(p, ' when ', lambda, '= 0.5', sep = '')))
plot = plot + xlab(expression(mu)) + ylab(expression(p))
plot
this code gives the picture below.
Is there any way to put the legends together AND have the title as in the second picture? Thank you.
By default, ggplot merges the legends when two or more aesthetics share the same name and values. That is the case here (you specify identical names), but presumably the expression interferes, so that ggplot treats the two legend titles as different.
We can work around this by defining l <- expression(lambda^{s}) and calling
plot + labs(color=l, shape=l)
omitting scale_shape and scale_color completely.

Gnuplot: "all contours drawn in a single color" does not work

I am trying to draw all contour lines in the same color, following the example from here: http://gnuplot.sourceforge.net/demo/contours.25.gnu
The example works, but my own code does not:
set xyplane 0;
set pm3d
set contour
set cntrparam levels 6
unset surface;
unset key;
set pm3d map
set title "t";
splot for [i=1:1] "-" using 1:2:3 notitle with lines lc rgb "dark-blue";
....data....
Can you help me find the problem?
Here to download the code file:
https://dl.dropboxusercontent.com/u/45318932/contourpm3d.plt
I am using gnuplot 4.6.5.
The relevant line is
unset clabel
I know that is very unintuitive; I don't know the reason behind it.
Here is the complete script with the respective changes, for reference:
set xyplane 0;
set pm3d
set contour
unset clabel
set cntrparam levels 6
unset surface;
unset key;
set pm3d map
splot for [i=1:1] "-" using 1:2:3 notitle with lines lw 2 lc rgb "dark-blue";
#a1 a2 t
0.0 0.0 25.0
0.0 0.1 28.0
0.0 0.2 37.0
0.0 0.3 23.0
0.0 0.4 23.0
0.0 0.5 15.0
0.0 0.6 16.0
0.0 0.7 33.0
0.0 0.8 16.0
0.0 0.9 20.0
0.0 1.0 14.0
0.1 0.0 25.0
0.1 0.1 47.0
0.1 0.2 26.0
0.1 0.3 14.0
0.1 0.4 16.0
0.1 0.5 15.0
0.1 0.6 27.0
0.1 0.7 13.0
0.1 0.8 14.0
0.1 0.9 20.0
0.1 1.0 0.0
0.2 0.0 25.0
0.2 0.1 28.0
0.2 0.2 26.0
0.2 0.3 14.0
0.2 0.4 16.0
0.2 0.5 16.0
0.2 0.6 32.0
0.2 0.7 14.0
0.2 0.8 19.0
0.2 0.9 0.0
0.2 1.0 0.0
0.3 0.0 57.0
0.3 0.1 36.0
0.3 0.2 26.0
0.3 0.3 14.0
0.3 0.4 15.0
0.3 0.5 16.0
0.3 0.6 31.0
0.3 0.7 18.0
0.3 0.8 0.0
0.3 0.9 0.0
0.3 1.0 0.0
0.4 0.0 42.0
0.4 0.1 23.0
0.4 0.2 26.0
0.4 0.3 19.0
0.4 0.4 15.0
0.4 0.5 16.0
0.4 0.6 34.0
0.4 0.7 0.0
0.4 0.8 0.0
0.4 0.9 0.0
0.4 1.0 0.0
0.5 0.0 54.0
0.5 0.1 23.0
0.5 0.2 26.0
0.5 0.3 17.0
0.5 0.4 15.0
0.5 0.5 16.0
0.5 0.6 0.0
0.5 0.7 0.0
0.5 0.8 0.0
0.5 0.9 0.0
0.5 1.0 0.0
0.6 0.0 21.0
0.6 0.1 23.0
0.6 0.2 23.0
0.6 0.3 16.0
0.6 0.4 16.0
0.6 0.5 0.0
0.6 0.6 0.0
0.6 0.7 0.0
0.6 0.8 0.0
0.6 0.9 0.0
0.6 1.0 0.0
0.7 0.0 21.0
0.7 0.1 16.0
0.7 0.2 27.0
0.7 0.3 12.0
0.7 0.4 0.0
0.7 0.5 0.0
0.7 0.6 0.0
0.7 0.7 0.0
0.7 0.8 0.0
0.7 0.9 0.0
0.7 1.0 0.0
0.8 0.0 61.0
0.8 0.1 27.0
0.8 0.2 33.0
0.8 0.3 0.0
0.8 0.4 0.0
0.8 0.5 0.0
0.8 0.6 0.0
0.8 0.7 0.0
0.8 0.8 0.0
0.8 0.9 0.0
0.8 1.0 0.0
0.9 0.0 27.0
0.9 0.1 21.0
0.9 0.2 0.0
0.9 0.3 0.0
0.9 0.4 0.0
0.9 0.5 0.0
0.9 0.6 0.0
0.9 0.7 0.0
0.9 0.8 0.0
0.9 0.9 0.0
0.9 1.0 0.0
1.0 0.0 35.0
1.0 0.1 0.0
1.0 0.2 0.0
1.0 0.3 0.0
1.0 0.4 0.0
1.0 0.5 0.0
1.0 0.6 0.0
1.0 0.7 0.0
1.0 0.8 0.0
1.0 0.9 0.0
1.0 1.0 0.0
e
with the output
