I have two datasets (A for the age dataset and TE for the concentration dataset), and I'm aiming to plot Concentration ~ Age, but I'm stuck on how to merge and expand the Age data to fit the much larger dataset containing concentrations. These are examples of my two datasets:
(A) Distance in this case is in multiples of 25 micrometers and is the distance along the slide. The total distance along each slide differs between slides depending on the size of the item on each slide. Age is cumulative age along each slide (so everything is nested within slide).
Slide  Age  Distance
1      7    25
1      14   50
1      22   75
1      28   100
2      8    25
2      15   50
(TE) Distance is continuous and is also the distance along the slide, but on a finer scale, and the spacing between consecutive data points is not consistent.
Slide  Concentration  Distance
1      7800           0.57
1      7895           0.61
1      6547           1.22
1      6589           1.73
1      6887           4.89
1      6342           5.50
2      8560           35.50
2      8657           36.11
2      8500           38.43
2      8352           39.17
2      8334           41.01
2      7456           42.84
2      8912           56.92
I need a way to merge the two so I can do:
ggplot(TE, aes(x = Age, y = Concentration, group = Slide)) +
  geom_line()
...by expanding the age data to the continuous distance scale of the TE dataset, i.e. interpolating an age for each distance in TE. Something like this:
Slide  Concentration  Distance  Age
1      7800           0.57      0.3
1      7895           0.61      0.4
1      6547           1.22      0.8
1      6589           1.73      1.2
1      6887           4.89      4.3
1      6342           5.50      5.5
2      8560           35.50     7.3
2      8657           36.11     7.4
2      8500           38.43     7.6
2      8352           39.17     7.7
2      8334           41.01     7.8
2      7456           42.84     7.9
2      8912           56.92     8.4
Any ideas?
P.S. Sorry if this isn't clear; I can update the question as necessary if it's not reproducible enough.
Based on the data for slide 1 in Table A, there appears to be a linear relationship between age and distance. Rather than joining the two tables while simultaneously interpolating ages from Table A at the TE distances, you could instead: A) split() Table A by slide; B) use lm() to obtain a linear model of age on distance for each slide; and C) use predict() with each linear model and the distance data from Table TE. This gives you ages estimated from the linear fit for each concentration in Table TE. The estimated ages and the concentration data can then be combined for plotting.
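A minimal sketch of steps A) to C), using only the example rows from the question. It assumes that Distance in Table A and Distance in Table TE are on the same scale; the question says A's distance is in multiples of 25 micrometers while TE's looks different, so one column may need converting first. Because lm() fits one straight line per slide, the resulting ages will not necessarily match the illustrative Age column in the question's desired output. The object names A_by_slide, TE_by_slide and TE_with_age are my own.

```r
# Data transcribed from the example tables in the question
A <- data.frame(
  Slide    = c(1, 1, 1, 1, 2, 2),
  Age      = c(7, 14, 22, 28, 8, 15),
  Distance = c(25, 50, 75, 100, 25, 50)
)
TE <- data.frame(
  Slide         = c(rep(1, 6), rep(2, 7)),
  Concentration = c(7800, 7895, 6547, 6589, 6887, 6342,
                    8560, 8657, 8500, 8352, 8334, 7456, 8912),
  Distance      = c(0.57, 0.61, 1.22, 1.73, 4.89, 5.50,
                    35.50, 36.11, 38.43, 39.17, 41.01, 42.84, 56.92)
)

# A) Split both tables by slide (list names are the slide IDs, e.g. "1", "2")
A_by_slide  <- split(A, A$Slide)
TE_by_slide <- split(TE, TE$Slide)

# B) + C) For each slide, fit Age ~ Distance on Table A and
# predict Age at that slide's TE distances
TE_with_age <- do.call(rbind, lapply(names(TE_by_slide), function(s) {
  fit <- lm(Age ~ Distance, data = A_by_slide[[s]])
  te  <- TE_by_slide[[s]]
  te$Age <- predict(fit, newdata = te)
  te
}))
rownames(TE_with_age) <- NULL

TE_with_age
```

TE_with_age can then go straight into the plotting call from the question, e.g. ggplot(TE_with_age, aes(x = Age, y = Concentration, group = Slide)) + geom_line(). If piecewise-linear interpolation between the measured points is preferred over one fitted line per slide, base R's approx(a$Distance, a$Age, xout = te$Distance)$y could replace the lm()/predict() pair; note that approx() returns NA outside the range of the Table A distances unless rule = 2 is set.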
I was plotting a bipartite graph using the igraph package in R. There are about 10,000 edges, and I want to widen the whole plot so that the state vertices don't overlap.
My data looks like this:
> test2
user_id state meanlat meanlon countUS countS degState
<chr> <chr> <dbl> <dbl> <int> <int> <int>
1 -_1ctLaz3jhPYc12hKXsEQ NC 35.19401 -80.83235 909 3 18487
2 -_1ctLaz3jhPYc12hKXsEQ NV 36.11559 -115.18042 29 3 37884
3 -_1ctLaz3jhPYc12hKXsEQ SC 35.05108 -80.96166 4 3 665
4 -0wUMy3vgInUD4S6KJInnw IL 40.11227 -88.22955 2 3 1478
5 -0wUMy3vgInUD4S6KJInnw NV 36.11559 -115.18042 23 3 37884
6 -0wUMy3vgInUD4S6KJInnw WI 43.08051 -89.39835 20 3 3963
Below is my code for creating and configuring the graph:
g2 <- graph_from_data_frame(test2,directed = F)
V(g2)$type <- ifelse(names(V(g2)) %in% UserStateR$user_id, 'user', 'state')
V(g2)$label <- ifelse(V(g2)$type == 'user', " ", paste(names(V(g2)),"\n",as.character(test2$degState),sep=""))
V(g2)$size <- ifelse(V(g2)$type == 'user', 3, 20)
V(g2)$color <- ifelse(V(g2)$type == 'user', 'wheat', 'salmon')
V(g2)$type <- ifelse(names(V(g2)) %in% UserStateR$user_id, T, F )
E(g2)$color <- heat.colors(8)[test2$countS]
plot(g2,layout=layout.bipartite(g2, types = names(V(g2)) %in% UserStateR$state, hgap = 50, vgap = 50))
As you can see, I have tried changing the hgap and vgap arguments, but apparently that doesn't work. I have also tried the asp argument, but that is not what I want.
I know this might be too late for @floatsd, but I was struggling with this today and had a really hard time finding an answer, so this might help others.
First, in general, there is an argument to plot.igraph() called asp that very simply controls how rectangular your plot is. For example:
l=layout.bipartite(CCM_net)
plot(CCM_net, layout=l, asp=0.65)
for a wide plot. In general, an asp smaller than 1 gives a wide plot and an asp larger than 1 a tall plot.
However, this might still not give you the layout you want. The bipartite layout function basically generates a matrix of coordinates for your vertices, and I don't yet understand how it comes up with the x-coordinates, so I ended up setting them myself.
Below is the example (I am assuming you know how to turn your data into data frames with the edge list and edge/vertex attributes for making graphs, so I am skipping that).
My data is CCM_data_sign:
from to value
2 EVI MAXT 0.67
4 EVI MINT 0.81
5 EVI P 0.70
7 EVI SM 0.79
8 EVI AMO 0.86
11 MAXT EVI 0.81
18 MAXT AMO 0.84
21 MEANT EVI 0.88
28 MEANT AMO 0.83
29 MEANT PDO 0.71
31 MINT EVI 0.96
39 MINT PDO 0.78
40 MINT MEI 0.66
41 P EVI 0.91
49 P PDO 0.77
50 P MEI 0.71
51 PET EVI 0.90
58 PET AMO 0.89
59 PET PDO 0.70
61 SM EVI 0.94
68 SM AMO 0.90
69 SM PDO 0.81
70 SM MEI 0.73
74 AMO MINT 0.93
76 AMO PET 0.66
79 AMO PDO 0.71
80 AMO MEI 0.83
90 PDO MEI 0.82
The graph object I generated for plotting is called CCM_net.
First, a bipartite plot without any layout adjustments:
V(CCM_net)$size<-30
l=layout.bipartite(CCM_net)
plot(CCM_net,
layout=l,
edge.arrow.size=1,
edge.arrow.width=2,
vertex.label.family="Helvetica",
vertex.label.color="black",
vertex.label.cex=2,
vertex.label.dist=c(3,3,3,3,3,3,3,3,3,3,3),
vertex.label.degree=c(pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,pi/2,pi/2,pi/2), #0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
edge.lty=1)
This gives you the following
If I use asp, I get the following:
plot(CCM_net,
layout=l,
edge.arrow.size=1,
vertex.label.family="Helvetica",
vertex.label.color="black",
vertex.label.cex=2,
vertex.label.dist=c(3,3,3,3,3,3,3,3,3,3,3),
vertex.label.degree=c(pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,pi/2,pi/2,pi/2), #0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
edge.arrow.width=2,
edge.lty=1,
asp=0.6) # controls how rectangular the plot is. <1 = wide, >1 = tall
# dev.off() # only needed if the plot was sent to a file device such as pdf()
This looks better, but it's still not really what I want: see how some vertices are closer to each other than others?
So eventually I took the following approach. The coordinates of the bipartite layout look like this:
coords <- layout_as_bipartite(CCM_net)
coords
[,1] [,2]
[1,] 3.0 0
[2,] 0.0 1
[3,] 2.0 1
[4,] 3.5 1
[5,] 6.0 1
[6,] 1.0 1
[7,] 5.0 1
[8,] 7.0 1
[9,] 1.0 0
[10,] 4.5 0
[11,] 5.5 0
This matrix holds the x coordinates of the vertices in the first column and the y coordinates in the second column, in the same order as the vertex list. My vertex list is
id name
1 EVI EVI
2 MAXT MAXT
3 MEANT MEANT
4 MINT MINT
5 P P
6 PET PET
7 SM SM
8 SR SR
9 AMO AMO
10 PDO PDO
11 MEI MEI
In my graph, EVI, AMO, PDO and MEI are on the bottom, but note their x coordinates: 3.0, 1.0, 4.5 and 5.5. I haven't figured out yet how the code comes up with those, but I didn't like them, so I simply changed them.
coords[,1]=c(2,0,4,8,12,16,20,24,9,16,24)
Now the plotting code (again with asp) becomes:
plot(CCM_net,
layout=coords,
edge.arrow.size=1,
vertex.label.family="Helvetica",
vertex.label.color="black",
vertex.label.cex=1,
vertex.label.dist=c(4,4,4,4,4,4,4,4,4,4,4),
vertex.label.degree=c(pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,pi/2,pi/2,pi/2), #0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
edge.arrow.width=2,
edge.lty=1,
asp=0.6) # controls how rectangular the plot is. <1 = wide, >1 = tall
Now the vertices are nicely spaced in a rectangular plot!
Note: I also decreased the size of the vertices and of the labels, and changed the label positioning, for better readability.
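Instead of typing the new x coordinates by hand, a small helper could space the vertices of each row evenly while keeping the left-to-right order that layout_as_bipartite() chose. This is a sketch of that idea, not what the answer above did; spread_rows() and its width argument are hypothetical names. It works on the plain coordinate matrix printed above, so igraph itself is not needed to run it.

```r
# Hypothetical helper: evenly space the vertices within each row (same y value)
# of a layout matrix, keeping their current left-to-right order.
spread_rows <- function(coords, width = 24) {
  for (y in unique(coords[, 2])) {
    idx <- which(coords[, 2] == y)
    idx <- idx[order(coords[idx, 1])]   # current left-to-right order
    new_x <- if (length(idx) == 1) width / 2 else
      seq(0, width, length.out = length(idx))
    coords[idx, 1] <- new_x
  }
  coords
}

# The coords matrix printed above (x in column 1, y in column 2)
coords <- matrix(c(3.0, 0.0, 2.0, 3.5, 6.0, 1.0, 5.0, 7.0, 1.0, 4.5, 5.5,
                   0,   1,   1,   1,   1,   1,   1,   1,   0,   0,   0),
                 ncol = 2)
new_coords <- spread_rows(coords)
new_coords
```

The result could then be passed as the layout, e.g. plot(CCM_net, layout = new_coords, asp = 0.6). Unlike the hand-picked values above, both rows span the same width, so the bottom row is spread out as widely as the top row.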
I think you can output to a PDF and then zoom in.
Or, use the rgexf package to write a GEXF file, then visualize it in Gephi.
I think Gephi is a good tool for network visualization.
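A minimal sketch of the PDF route, which may also explain why changing hgap/vgap seemed to do nothing: plot.igraph() rescales the layout to [-1, 1] in both directions by default (rescale = TRUE), which can mask changes made to the raw coordinates. The sketch uses a small hypothetical edge list in place of the question's test2, stretches the layout itself with norm_coords(), turns rescaling off, and writes to a large, wide PDF page that can be zoomed into. The page size (20 x 6 inches), the y-range of ±0.3 and the file name are arbitrary choices.

```r
library(igraph)

# Hypothetical toy edge list standing in for the question's test2 (user -> state)
edges <- data.frame(
  user_id = c("u1", "u1", "u2", "u2", "u3", "u4"),
  state   = c("NC", "NV", "NV", "WI", "IL", "NC")
)
g <- graph_from_data_frame(edges, directed = FALSE)
V(g)$type <- V(g)$name %in% edges$state   # TRUE for state vertices

# Two-row bipartite layout, then squeeze y relative to x ourselves
l <- layout_as_bipartite(g)
l <- norm_coords(l, xmin = -1, xmax = 1, ymin = -0.3, ymax = 0.3)

# Large, wide PDF page; rescale = FALSE keeps the coordinates computed above,
# and xlim/ylim leave a little padding around the outermost vertices
out_file <- file.path(tempdir(), "bipartite_wide.pdf")
pdf(out_file, width = 20, height = 6)
plot(g, layout = l, rescale = FALSE,
     xlim = c(-1.1, 1.1), ylim = c(-0.4, 0.4),
     vertex.size = 3, vertex.label = NA)
invisible(dev.off())
```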
I was hoping I could get some help. I am constructing a life table, not for insurance but for ecology (a cross-section of the population of any kind of wild fauna), so essentially leaving out variables like smoker/non-smoker, pregnant, gender, health status, etc.:
AgeClass=c(1,2,3,4,5,6)
Sample=c(100,99,87,46,32,19)
for(i in 1:6){
  PropSurv=c(Sample/100)
}
LifeTab1=data.frame(cbind(AgeClass,Sample,PropSurv))
Which gave me this:
ID AgeClass Sample PropSurv
1 1 100 1.00
2 2 99 0.99
3 3 87 0.87
4 4 46 0.46
5 5 32 0.32
6 6 19 0.19
I'm now trying to calculate the number that died in each interval (DeathInt) by taking the number surviving in each row and subtracting the number in the row below it (i.e. 100-99, then 99-87, then 87-46, and so on), so that it looks like this:
ID AgeClass Sample PropSurv DeathInt
1 1 100 1.00 1
2 2 99 0.99 12
3 3 87 0.87 41
4 4 46 0.46 14
5 5 32 0.32 13
6 6 19 0.19 NA
I found this and this, but I wasn't sure whether they answered my question, as they subtracted values within groups. I just want to subtract values row by row.
Also, as a side note: I used a for() loop to get the proportion surviving in each age group. Is there another way to do it, or is that the proper, easiest way?
Second note: If any R-users out there know of an easier way to do a life-table for ecology, do let me know!
Thanks!
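If you have a numeric vector x, you can calculate the differences between consecutive elements with the diff() function. Since you want each value minus the one after it, negate the result and pad the last row with NA.
In your case it would be
LifeTab1$DeathInt <- c(-diff(LifeTab1$Sample), NA)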
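On the side note about the for() loop: arithmetic in R is already vectorised, so the proportion surviving can be computed in one step without a loop. The sketch below builds the whole table that way. Dividing by the first age class (rather than a hard-coded 100) is my own choice, and the qx column is the standard life-table mortality rate q_x = d_x / n_x, added here as an illustration rather than something the question asked for.

```r
# Counts per age class, as in the question
LifeTab <- data.frame(
  AgeClass = 1:6,
  Sample   = c(100, 99, 87, 46, 32, 19)
)

# Proportion surviving, relative to the first age class (no loop needed)
LifeTab$PropSurv <- LifeTab$Sample / LifeTab$Sample[1]

# Number dying between consecutive age classes: n[x] - n[x + 1]
LifeTab$DeathInt <- c(-diff(LifeTab$Sample), NA)

# Age-specific mortality rate q[x] = d[x] / n[x]
LifeTab$qx <- LifeTab$DeathInt / LifeTab$Sample

LifeTab
```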
I have a large data frame, shown below, which is a subset of a larger data frame.
tree=data.frame(INVYR=tree$INVYR,
DIA=tree$DIA,PLOT=tree$PLOT,SPCD=tree$SPCD,
D.2=tree$D.2, BA.T=tree$BA.T)
What I am attempting to do is calculate the total BA.T per Plot per Year (plots are remeasured in subsequent years). I do this by ...
x<-aggregate(tree$BA.T,list(tree$INVYR,tree$PLOT),FUN=sum)
x$PLOT<-x$Group.2
x<- x[with(x, order(Group.1,Group.2)), ]
This gives me the data frame...
x=data.frame(Group.1,Group.2,x,PLOT)
Where Group.1 is INVYR, Group.2 is PLOT, and x is the total BA.T per plot per year. So far this works great. Here is where my problem begins: I then want to integrate this back into my original tree data.frame. If I merge the data by plot, it doesn't account for year and quadruples the data set because of the four remeasurements. I can't use an if statement because the two data sets are not the same length. The data.frame I wish to end up with is
tree=data.frame(INVYR, DIA, PLOT, SPCD, D.2, BA.T, x)
where x is the total BA.T for the given INVYR and PLOT of that record.
Any thoughts would be greatly appreciated. Thanks.
Edit
INVYR=c(1982,1982,1982,1982,1982,1995,1995,1995,1995,1995,2000,2000,2000,2000,2000)
PLOT=c(1,1,2,2,3,1,1,2,2,3,1,1,2,2,3)
BA.T=c(.1,.2,.3,.4,.2,.3,.5,.8,.3,.6,.7,.2,.1,1,1.02)
tree=data.frame(INVYR,PLOT,BA.T)
head(tree)
x<-aggregate(tree$BA.T,list(tree$INVYR,tree$PLOT),FUN=sum)
x$PLOT<-x$Group.2
x$INVYR<-x$Group.1
x<- x[with(x, order(Group.1,Group.2)), ]
head(x)
One solution is to use the reshape2 package.
library(reshape2)
tree.m <- melt(data = tree, id.vars = c('INVYR', 'PLOT'))  ## note the choice of id.vars: these are the keys!
dcast(tree.m, formula = ... ~ variable, fun.aggregate = sum)
INVYR PLOT BA.T
1 1982 1 0.30
2 1982 2 0.70
3 1982 3 0.20
4 1995 1 0.80
5 1995 2 1.10
6 1995 3 0.60
7 2000 1 0.90
8 2000 2 1.10
9 2000 3 1.02
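The dcast() result above is still the aggregated table; it does not attach the totals to each tree record. Two base-R sketches that do, using the example data from the question's Edit: ave() returns the group sum on every row in the original order, or aggregate() followed by merge() on both INVYR and PLOT (merging on PLOT alone is what multiplied the rows in the question). The names totals and tree2 are my own.

```r
# Example data from the question's Edit
tree <- data.frame(
  INVYR = c(1982, 1982, 1982, 1982, 1982, 1995, 1995, 1995, 1995, 1995,
            2000, 2000, 2000, 2000, 2000),
  PLOT  = c(1, 1, 2, 2, 3, 1, 1, 2, 2, 3, 1, 1, 2, 2, 3),
  BA.T  = c(.1, .2, .3, .4, .2, .3, .5, .8, .3, .6, .7, .2, .1, 1, 1.02)
)

# Option 1: per-row total of BA.T for that INVYR/PLOT combination
tree$x <- ave(tree$BA.T, tree$INVYR, tree$PLOT, FUN = sum)

# Option 2: aggregate, then merge back on BOTH keys so years are not mixed up
totals <- aggregate(BA.T ~ INVYR + PLOT, data = tree, FUN = sum)
names(totals)[names(totals) == "BA.T"] <- "x"
tree2 <- merge(tree[, c("INVYR", "PLOT", "BA.T")], totals,
               by = c("INVYR", "PLOT"))

tree
```

ave() keeps the original row order, which is convenient when tree has many other columns; merge() sorts the result by the key columns instead.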