Plotting Traits on a Phylogeny - r

I am following this guide on how to plot traits onto a phylogeny to determine trait conservatism. I have followed it step by step but can't seem to get either the community composition or trait plots on phylogeny to work at all for my datasets. I have formatted just as they said and it looks just like their example data sents to me.
I am not sure how to put tree files on here so here is one on a cloud link for all species and here is a tree that I used just for my native species used for trait plotting
VegComm <- df2vec(as.matrix(Veg2018), colID = 1:29) #community data
STraits <- read.csv()
rownames(STraits)<- STraits[,1]
STraits[1:1] <- list(NULL) #Trait Data
STraits <- df2vec(as.matrix.data.frame(STraits), colID=1:5)
STraits <- STraits[1:6,]
str(STraits)
prune.sample(VegComm,alltree)
par(mfrow=c(2,2))
for (i in colnames(STraits)) {
+ plot(nativetree, show.tip.label=TRUE, main=i)
+ tiplabels(pch=22, col=STraits[,i]+1, bg=STraits[,i]+1, cex=1.5)}
traits <- STraits[nativetree$tip.label,]
phylosignal(nativetree, STraits, nsim=1000, method="K")
Here is the community data:
Avena_fatua Bromus_diandrus Bromus_hordeaceus Festuca_myuros Festuca_perennis Carduus_pycnocephalus Cirsium_vulgare Erodium_cicutarium Geranium_dissectum Helminthotheca_echioides Lactuca_serriola Medicago_polymorpha Oxalis_pes-caprae Raphanus_sativus Senecio_vulgaris Sonchus_oleraceus Vicia_sativa Artemisia_californica Baccharis_pilularis Ericameria_ericoides Mimulus_aurantiacus Bromus_carinatus Elymus_triticoides Hordeum_brachyantherum Stipa_pulchra Achillea_millefolium Eschscholzia_californica Lupinus_variicolor Echium_candicans
PC1 0 1.25 0 20.83333333 7.416666667 0.5 0 0 21.25 0.333333333 0 6.916666667 0 4.916666667 0 0 0 4.583333333 18.33333333 1.25 0.833333333 0.5 0 0 0 7.5 1.25 0 0
PC2 0.5 0 0.333333333 14.16666667 2.25 0 0 0 25 0 1.916666667 30.41666667 0 3.666666667 0.833333333 0.833333333 0 0 17.91666667 0 0 2.083333333 0 0 0 3.333333333 0 0 0
PC3 0.333333333 4.083333333 0 27.5 3.333333333 6.083333333 0 0 15.83333333 1.75 2.416666667 3.833333333 0 6.666666667 0 5.916666667 0 1.25 2.083333333 0 2.5 5.416666667 0 0 1.25 5 0 0 0
PC4 0.333333333 1.25 3.333333333 10.41666667 15.83333333 5.833333333 0 0 25.83333333 0 1.583333333 10.75 0 5.833333333 0 1.25 0 0 2.083333333 0 0 0 0 0 0 3.416666667 2.916666667 0 0
PC5 1.916666667 0 8.833333333 10.91666667 6.666666667 0 0.333333333 0 15 1.25 1.75 0 0 3.333333333 0 10.83333333 0.5 0 3.333333333 0.5 0 4.666666667 0 0 0.5 9.166666667 0 0 0.666666667
PS1 0.333333333 3 0 6.25 2.25 16 0 0 11.41666667 0.333333333 0 3.833333333 0 0.833333333 0 1.166666667 0 0 12 0 0.166666667 3.333333333 0 0 0 49.16666667 0 0 0
PS2 2.25 4 0 6.5 1.25 13.75 0 4.166666667 10.5 0 0 6.666666667 0 4.5 0 0 0 1.583333333 3.833333333 0 4.166666667 4.166666667 0 0 1.25 22.91666667 1.25 0 0
PS3 2.5 0 0 5.083333333 1.25 0.833333333 0 5.916666667 20.83333333 0 0 16.66666667 0 7.583333333 0 1.333333333 0 0 4.5 0 0 0.333333333 0 0 1.75 25.41666667 0 0 0
PS4 2.25 0 1.5 2.5 1.75 0 2.5 0 22.91666667 0 0 19.16666667 2.916666667 18.33333333 0 0 0 2.916666667 6.666666667 0 1.25 5.5 0 0 4.583333333 8.75 0 2.5 0
PS5 4.75 0 1.75 7 2.083333333 4.666666667 0 0 18.08333333 0 0 4.25 0 13.75 0 0 0 0 0 0 0 0 0 0 0 34.33333333 0 0 0
PW1 4.75 1.75 0.666666667 11.83333333 4.916666667 0 0 0 15 2.833333333 1.25 39.16666667 0 0.666666667 0 3.833333333 0 0 4.166666667 0 0 0.833333333 0 0 0 14.16666667 0.666666667 0 1.25
PW2 2.5 0 4 21.66666667 4.666666667 0.5 0 0 25.41666667 0 1.25 7.083333333 0 14.58333333 0 0.833333333 0 1.25 1.25 0 0 3.333333333 0 0 1.25 4.166666667 1 0 0
PW3 1.583333333 1.25 0 10.66666667 4.25 5.75 0 0 12.5 0 1.5 30 0 0.333333333 0 0.333333333 0 3.833333333 0 0 0 2.083333333 0 0 4.583333333 10 0 0 0
PW4 0 1.25 6.666666667 9.916666667 8.25 0 0 0 33.33333333 0 0 5.833333333 0 5.833333333 0 2.083333333 0 0 1.25 0 0 2.5 0 0 0 3.75 1.583333333 0 0
PW5 2.25 2.083333333 0.333333333 10.41666667 4.416666667 1.25 0 0 23.33333333 0 0 4.583333333 0 5.083333333 0 0 13.33333333 12.66666667 8.333333333 0 0 0 0 0 0 12 0 0 0
Here is the trait data: (I tried omitting and not omitting NAs)
Growth_Rate Area AreaVar SLA SLAVar VLA VLAVar Thickness ThicknessVar logThickness logThicknessVar LV LVVar PD0 PD10 PD50 CPD
Achillea_millefolium 0.090888257 15.80656659 12.43783158 NA NA NA NA 0.249744167 0.187092582 -1.553441666 0.458076381 NA NA 12.61566 29.016 250 0.721921544
Artemisia_californica 0.035049437 14.56355219 11.78670881 180.1322546 99.50427931 9.364236482 1.414207935 0.268703703 0.074128238 -1.352780779 0.298806173 43.22157529 13.35296757 12.61566 29.016 250 0.721921544
Bromus_carinatus 0.022607407 2.384166667 2.316140235 NA NA NA NA NA NA NA NA NA NA 5.41269 11.7111 315.3334 0.681203858
Ericameria_ericoides 0.019809977 3.6875 1.703521078 NA NA NA NA NA NA NA NA NA NA 12.61566 29.016 250 0.721921544
Eschscholzia_californica 0.029380702 1.245833333 1.076820745 262.1630059 60.49033956 4.392284625 0.596306575 0.16357684 0.038660691 -1.835819399 0.223972815 39.80718218 11.25985865 294 294 294 0.577356321
Hosackia_gracilis 0.009183502 NA NA NA NA NA NA NA NA NA NA NA NA 41.81336 101.22 250 0.638988811
Lupinus_nanus 0.040867178 NA NA NA NA NA NA NA NA NA NA NA NA 33.60001 101.22 250 0.640373244
Lupinus_variicolor 0.028428463 NA NA NA NA NA NA NA NA NA NA NA NA 33.60001 101.22 250 0.640373244
Mimulus_aurantiacus 0.00652489 0.00652489 0.011364841 3.412857143 2.976064883 151.5001201 79.68333552 2.370279914 0.731201273 0.285257143 0.120154396 37.54090305 16.93270863 183.7778 209.3333 250 0.622318052
Sisyrinchium_bellum 0.01441308 5.477777778 5.117901992 181.6818246 42.91299583 2.954769874 0.448780843 0.176855556 0.018545802 -1.735344864 0.107673785 31.80493389 4.311588188 225.2889 225.2889 315.3334 0.594958509
Sidalcea_malviflora 0.020075948 4.974358974 4.901863202 142.4036892 39.11274955 1.651824981 0.295753475 0.148082051 0.045211395 -1.953346759 0.300665842 20.91557187 8.108682659 163.3333 193 250 0.625836637
Stipa_pulchra 0.01546666 5.28968254 6.055307558 122.3827137 32.67582669 7.352684101 3.027753522 0.149629537 0.031130015 -1.943799376 0.210327301 17.91978995 5.823172424 24 24 315.3334 0.611910294
Here are the dput outputs:
> dput(STraits)
structure(c(0.035049437, 0.029380702, 0.00652489, 0.01441308,
0.020075948, 0.01546666, 14.56355219, 1.245833333, 0.00652489,
5.477777778, 4.974358974, 5.28968254, 11.78670881, 1.076820745,
0.011364841, 5.117901992, 4.901863202, 6.055307558, 180.1322546,
262.1630059, 3.412857143, 181.6818246, 142.4036892, 122.3827137,
99.50427931, 60.49033956, 2.976064883, 42.91299583, 39.11274955,
32.67582669), .Dim = c(6L, 5L), .Dimnames = list(c("Artemisia_californica",
"Eschscholzia_californica", "Mimulus_aurantiacus", "Sisyrinchium_bellum",
"Sidalcea_malviflora", "Stipa_pulchra"), c("Growth_Rate", "Area",
"AreaVar", "SLA", "SLAVar")))
> dput(VegComm)
structure(list(Avena_fatua = c(0, 0.5, 0.333333333, 0.333333333,
1.916666667, 0.333333333, 2.25, 2.5, 2.25, 4.75, 4.75, 2.5, 1.583333333,
0, 2.25), Bromus_diandrus = c(1.25, 0, 4.083333333, 1.25, 0,
3, 4, 0, 0, 0, 1.75, 0, 1.25, 1.25, 2.083333333), Bromus_hordeaceus = c(0,
0.333333333, 0, 3.333333333, 8.833333333, 0, 0, 0, 1.5, 1.75,
0.666666667, 4, 0, 6.666666667, 0.333333333), Festuca_myuros = c(20.83333333,
14.16666667, 27.5, 10.41666667, 10.91666667, 6.25, 6.5, 5.083333333,
2.5, 7, 11.83333333, 21.66666667, 10.66666667, 9.916666667, 10.41666667
), Festuca_perennis = c(7.416666667, 2.25, 3.333333333, 15.83333333,
6.666666667, 2.25, 1.25, 1.25, 1.75, 2.083333333, 4.916666667,
4.666666667, 4.25, 8.25, 4.416666667), Carduus_pycnocephalus = c(0.5,
0, 6.083333333, 5.833333333, 0, 16, 13.75, 0.833333333, 0, 4.666666667,
0, 0.5, 5.75, 0, 1.25), Cirsium_vulgare = c(0, 0, 0, 0, 0.333333333,
0, 0, 0, 2.5, 0, 0, 0, 0, 0, 0), Erodium_cicutarium = c(0, 0,
0, 0, 0, 0, 4.166666667, 5.916666667, 0, 0, 0, 0, 0, 0, 0), Geranium_dissectum = c(21.25,
25, 15.83333333, 25.83333333, 15, 11.41666667, 10.5, 20.83333333,
22.91666667, 18.08333333, 15, 25.41666667, 12.5, 33.33333333,
23.33333333), Helminthotheca_echioides = c(0.333333333, 0, 1.75,
0, 1.25, 0.333333333, 0, 0, 0, 0, 2.833333333, 0, 0, 0, 0), Lactuca_serriola = c(0,
1.916666667, 2.416666667, 1.583333333, 1.75, 0, 0, 0, 0, 0, 1.25,
1.25, 1.5, 0, 0), Medicago_polymorpha = c(6.916666667, 30.41666667,
3.833333333, 10.75, 0, 3.833333333, 6.666666667, 16.66666667,
19.16666667, 4.25, 39.16666667, 7.083333333, 30, 5.833333333,
4.583333333), Oxalis_pes.caprae = c(0, 0, 0, 0, 0, 0, 0, 0, 2.916666667,
0, 0, 0, 0, 0, 0), Raphanus_sativus = c(4.916666667, 3.666666667,
6.666666667, 5.833333333, 3.333333333, 0.833333333, 4.5, 7.583333333,
18.33333333, 13.75, 0.666666667, 14.58333333, 0.333333333, 5.833333333,
5.083333333), Senecio_vulgaris = c(0, 0.833333333, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Sonchus_oleraceus = c(0, 0.833333333,
5.916666667, 1.25, 10.83333333, 1.166666667, 0, 1.333333333,
0, 0, 3.833333333, 0.833333333, 0.333333333, 2.083333333, 0),
Vicia_sativa = c(0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0,
0, 13.33333333), Artemisia_californica = c(4.583333333, 0,
1.25, 0, 0, 0, 1.583333333, 0, 2.916666667, 0, 0, 1.25, 3.833333333,
0, 12.66666667), Baccharis_pilularis = c(18.33333333, 17.91666667,
2.083333333, 2.083333333, 3.333333333, 12, 3.833333333, 4.5,
6.666666667, 0, 4.166666667, 1.25, 0, 1.25, 8.333333333),
Ericameria_ericoides = c(1.25, 0, 0, 0, 0.5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), Mimulus_aurantiacus = c(0.833333333, 0,
2.5, 0, 0, 0.166666667, 4.166666667, 0, 1.25, 0, 0, 0, 0,
0, 0), Bromus_carinatus = c(0.5, 2.083333333, 5.416666667,
0, 4.666666667, 3.333333333, 4.166666667, 0.333333333, 5.5,
0, 0.833333333, 3.333333333, 2.083333333, 2.5, 0), Elymus_triticoides = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
Hordeum_brachyantherum = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), Stipa_pulchra = c(0, 0, 1.25,
0, 0.5, 0, 1.25, 1.75, 4.583333333, 0, 0, 1.25, 4.583333333,
0, 0), Achillea_millefolium = c(7.5, 3.333333333, 5, 3.416666667,
9.166666667, 49.16666667, 22.91666667, 25.41666667, 8.75,
34.33333333, 14.16666667, 4.166666667, 10, 3.75, 12), Eschscholzia_californica = c(1.25,
0, 0, 2.916666667, 0, 0, 1.25, 0, 0, 0, 0.666666667, 1, 0,
1.583333333, 0), Lupinus_variicolor = c(0, 0, 0, 0, 0, 0,
0, 0, 2.5, 0, 0, 0, 0, 0, 0), Echium_candicans = c(0, 0,
0, 0, 0.666666667, 0, 0, 0, 0, 0, 1.25, 0, 0, 0, 0)), .Names = c("Avena_fatua",
"Bromus_diandrus", "Bromus_hordeaceus", "Festuca_myuros", "Festuca_perennis",
"Carduus_pycnocephalus", "Cirsium_vulgare", "Erodium_cicutarium",
"Geranium_dissectum", "Helminthotheca_echioides", "Lactuca_serriola",
"Medicago_polymorpha", "Oxalis_pes.caprae", "Raphanus_sativus",
"Senecio_vulgaris", "Sonchus_oleraceus", "Vicia_sativa", "Artemisia_californica",
"Baccharis_pilularis", "Ericameria_ericoides", "Mimulus_aurantiacus",
"Bromus_carinatus", "Elymus_triticoides", "Hordeum_brachyantherum",
"Stipa_pulchra", "Achillea_millefolium", "Eschscholzia_californica",
"Lupinus_variicolor", "Echium_candicans"), row.names = c("PC1",
"PC2", "PC3", "PC4", "PC5", "PS1", "PS2", "PS3", "PS4", "PS5",
"PW1", "PW2", "PW3", "PW4", "PW5"), class = "data.frame")
> dput(nativetree)
structure(list(edge = structure(c(12L, 13L, 14L, 15L, 16L, 16L,
15L, 14L, 17L, 18L, 18L, 19L, 19L, 17L, 13L, 12L, 20L, 21L, 21L,
20L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 17L, 18L, 4L, 19L, 5L, 6L,
7L, 8L, 20L, 21L, 9L, 10L, 11L), .Dim = c(20L, 2L)), edge.length = c(7.629639,
22, 20.333344, 93.62796, 11.038696, 11.038696, 104.666656, 28.5,
62.899994, 33.600006, 16.800003, 16.800003, 16.800003, 96.5,
147, 41.985199, 51.760712, 60.883728, 60.883728, 112.64444),
Nnode = 10L, node.label = c("", "eudicots", "", "euasterids",
"", "eurosids", "mesopapilionoideaeclade", "lupinus", "",
""), tip.label = c("achillea_millefolium", "ericameria_ericoides",
"mimulus_aurantiacus", "hosackia_gracilis", "lupinus_nanus",
"lupinus_variicolor", "sidalcea_malviflora", "eschscholzia_californica",
"bromus_carinatus", "nassella_pulchra", "sisyrinchium_bellum"
), root.edge = 291.370361), .Names = c("edge", "edge.length",
"Nnode", "node.label", "tip.label", "root.edge"), class = "phylo", order = "cladewise")

The problem is that names of species do not match between STraits and nativetree.
intersect(row.names(STraits), nativetree$tip.label)
# character(0)
R is case-sensitive, so lower case names in the tree will not be recognised as identical to capitalised names in the data matrix. Also, the names of the species differ.
Once the names properly match, the traits need to be ordered as above:
traits <- STraits[nativetree$tip.label,]
and the phylogenetic signal calculated from the new traits table per column:
library(picante)
res = data.frame()
for(i in 1:ncol(traits)){
res[i, ] = phylosignal(x = traits[, i], phy = nativetree, reps = 999)
}
Note that I use the data you provided with dput, not the modifications implied with the script. Additionally, check ?phylosignal for syntax.
Continuous characters may be plotted on a phylogeny with the phytools package as shown here.

Related

How to increment a new column based on value in another column in R

I would like to create two new columns based on a third one. These two columns should have incrementing values of two different kinds.
Let´s take the following dataset as an example:
events <- data.frame(Frame = seq(from = 1001, to = 1033, by = 1),
Value = c(2.05, 0, 2.26, 2.38, 0, 0, 2.88, 0.32, 0.85, 2.85, 2.09, 0, 0, 0, 1.11, 0, 0,
0, 2.46, 2.85, 0, 0, 0.38, 1.91, 0, 0, 0, 2.23, 0, 0.48, 1.83, 0.23, 1.49))
I would like to create:
a column called "Number" incrementing everytime there is a sequence starting with 0 in the column "Value", and
a column called "Duration" starting from 1 everytime a new sequence of 0s is present in the column "Value" and incrementing with 1 as long as the sequence of 0s continues.
Ideally, the final data frame would be this one:
events_final <- data.frame(Frame = seq(from = 1001, to = 1033, by = 1),
Value = c(2.05, 0, 2.26, 2.38, 0, 0, 2.88, 0.32, 0.85, 2.85, 2.09, 0, 0, 0, 1.11, 0, 0,
0, 2.46, 2.85, 0, 0, 0.38, 1.91, 0, 0, 0, 2.23, 0, 0.48, 1.83, 0.23, 1.49),
Number = c(0, 1, 0, 0, 2, 2, 0, 0, 0, 0, 0, 3, 3, 3, 0, 4, 4,
4, 0, 0, 5, 5, 0, 0, 6, 6, 6, 0, 7, 0, 0, 0, 0),
Duration = c(0, 1, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 3, 0, 1, 2,
3, 0, 0, 1, 2, 0, 0, 1, 2, 3, 0, 1, 0, 0, 0, 0))
I tried to use the tidyverse to do so, but I do not manage to get what I need [I am even very far from it]:
events %>%
mutate(Number = ifelse(Value > 0, NA, 1),
Duration = case_when(Value == 0 & lag(Value, n = 1) != 0 ~ 1,
Value == 0 & lag(Value, n = 1) == 0 ~ 2))
By looking for related questions, I found that this was feasible in SQL [https://stackoverflow.com/questions/42971752/increment-value-based-on-another-column]. I also know that this is quite easy to be done in Excel [the first Value is in the cell B2]:
Number column [Column C]: =IF(B2>0,0,IF(B1=0,C1,MAX(C$1:C1)+1))
Duration column [Column D]: =IF(B2>0,0,IF(B1=0,D1+1,1))
But I need to have it work in R ;-)
Any help is welcome :-)
You can leverage data.table::rleid() twice here to solve the problem
library(data.table)
setDT(events)
events[, Number:=rleid(fifelse(Value==0,1,0))] %>%
.[Value==0,Number:=rleid(Number)] %>%
.[Value!=0,Number:=0] %>%
.[, Duration:=fifelse(Value==0, 1:.N,0), Number] %>%
.[]
Output:
Frame Value Number Duration
1: 1001 2.05 0 0
2: 1002 0.00 1 1
3: 1003 2.26 0 0
4: 1004 2.38 0 0
5: 1005 0.00 2 1
6: 1006 0.00 2 2
7: 1007 2.88 0 0
8: 1008 0.32 0 0
9: 1009 0.85 0 0
10: 1010 2.85 0 0
11: 1011 2.09 0 0
12: 1012 0.00 3 1
13: 1013 0.00 3 2
14: 1014 0.00 3 3
15: 1015 1.11 0 0
16: 1016 0.00 4 1
17: 1017 0.00 4 2
18: 1018 0.00 4 3
19: 1019 2.46 0 0
20: 1020 2.85 0 0
21: 1021 0.00 5 1
22: 1022 0.00 5 2
23: 1023 0.38 0 0
24: 1024 1.91 0 0
25: 1025 0.00 6 1
26: 1026 0.00 6 2
27: 1027 0.00 6 3
28: 1028 2.23 0 0
29: 1029 0.00 7 1
30: 1030 0.48 0 0
31: 1031 1.83 0 0
32: 1032 0.23 0 0
33: 1033 1.49 0 0
Here is a tidyverse solution:
library(tidyverse)
events |>
mutate(Number = replace(cumsum(Value == 0 & lag(Value != 0)), which(Value != 0), 0)) |>
group_by(tmp = cumsum(Value == 0 & lag(Value != 0))) |>
mutate(Duration = replace(row_number(), which(Value != 0), 0)) |>
ungroup() |>
select(-tmp)
#> # A tibble: 33 x 4
#> Frame Value Number Duration
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1001 2.05 0 0
#> 2 1002 0 1 1
#> 3 1003 2.26 0 0
#> 4 1004 2.38 0 0
#> 5 1005 0 2 1
#> 6 1006 0 2 2
#> 7 1007 2.88 0 0
#> 8 1008 0.32 0 0
#> 9 1009 0.85 0 0
#> 10 1010 2.85 0 0
#> # ... with 23 more rows
Here's a dplyr-based solution with a bit of cleverness for the Number column, but still relying on data.table::rleid for the Duration column:
events_final %>%
mutate(
add = Value == 0 & lag(Value) != 0,
Number_result = cumsum(add) * (Value == 0),
rle_0 = data.table::rleid(Value == 0)
) %>%
group_by(rle_0) %>%
mutate(
Duration_result = ifelse(Value == 0, row_number(), 0)
) %>%
ungroup() %>%
select(-add, -rle_0)
# # A tibble: 33 × 6
# Frame Value Number Duration Number_result Duration_result
# <dbl> <dbl> <dbl> <dbl> <int> <dbl>
# 1 1001 2.05 0 0 0 0
# 2 1002 0 1 1 1 1
# 3 1003 2.26 0 0 0 0
# 4 1004 2.38 0 0 0 0
# 5 1005 0 2 1 2 1
# 6 1006 0 2 2 2 2
# 7 1007 2.88 0 0 0 0
# 8 1008 0.32 0 0 0 0
# 9 1009 0.85 0 0 0 0
# 10 1010 2.85 0 0 0 0
# # … with 23 more rows
# # ℹ Use `print(n = ...)` to see more rows
Here is another (ugly) way to do it. Nowhere near as elegant as #langtang's solution but it works...
events <- data.frame(Frame = seq(from = 1001, to = 1033, by = 1),
Value = c(2.05, 0, 2.26, 2.38, 0, 0, 2.88, 0.32, 0.85, 2.85, 2.09, 0, 0, 0, 1.11, 0, 0,
0, 2.46, 2.85, 0, 0, 0.38, 1.91, 0, 0, 0, 2.23, 0, 0.48, 1.83, 0.23, 1.49))
events_final <- data.frame(Frame = seq(from = 1001, to = 1033, by = 1),
Value = c(2.05, 0, 2.26, 2.38, 0, 0, 2.88, 0.32, 0.85, 2.85, 2.09, 0, 0, 0, 1.11, 0, 0,
0, 2.46, 2.85, 0, 0, 0.38, 1.91, 0, 0, 0, 2.23, 0, 0.48, 1.83, 0.23, 1.49),
Number = c(0, 1, 0, 0, 2, 2, 0, 0, 0, 0, 0, 3, 3, 3, 0, 4, 4,
4, 0, 0, 5, 5, 0, 0, 6, 6, 6, 0, 7, 0, 0, 0, 0),
Duration = c(0, 1, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 3, 0, 1, 2,
3, 0, 0, 1, 2, 0, 0, 1, 2, 3, 0, 1, 0, 0, 0, 0))
library(stringr)
events$Number <- events$Value == 0
events$tmp <- NA
i <- 0
lapply(2:nrow(events), function(x) {
if ((events[ x, 'Number' ] == TRUE) &
(events[ x - 1, 'Number' ] == FALSE)) {
i <<- i + 1
events[ x, 'tmp' ] <<- i
} else if ((events[ x, 'Number' ] == TRUE) &
(events[ x - 1, 'Number' ] == TRUE)) {
events[ x, 'tmp' ] <<- i
}
}) |>
invisible()
idx <- which(is.na(events$tmp))
events[ idx, 'tmp' ] <- 0
events <- split(events, events$tmp) |>
lapply(function(x) {
if (unique(x$tmp) > 0) {
x$duration <- 1:nrow(x)
} else {
x$duration <- 0
}
x
}) |>
data.table::rbindlist(fill = TRUE) |>
as.data.frame()
idx <- order(events$Frame)
events <- events[ idx, ]
events$Number <- NULL
colnames(events) <- c('Frame', 'Value', 'Number', 'Duration')
rownames(events) <- NULL
print(events)
identical(events, events_final)

Plotting multiple columns, grouping by date, and adjusting scale

So here I what I want, I want to plot 4 columns (Standing, Sitting, Stepping, Cycling) vs Time, and have 1 plot per date. I also want the Y scale to be scaled between 0.5 and 4.5, BUT the Y axis be invisible and a legend saying which color is which.
Here is a sample of my data:
> head(graph_pre,30)
Date Time Axis1 Axis2 Axis3 VM Standing Stepping Cycling New_Sitting Counter
1 2022-05-10 2022-05-10 09:01:00 21 40 2 45.22 0 0 2 0 0
2 2022-05-10 2022-05-10 09:01:01 0 36 1 36.01 0 0 0 1 1
3 2022-05-10 2022-05-10 09:01:02 24 1 0 24.02 0 0 0 1 0
4 2022-05-10 2022-05-10 09:01:03 48 31 4 57.28 0 0 2 0 1
5 2022-05-10 2022-05-10 09:01:04 0 6 0 6.00 0 0 0 1 1
6 2022-05-10 2022-05-10 09:01:05 0 0 0 0.00 0 0 0 1 0
7 2022-05-10 2022-05-10 09:01:06 0 0 0 0.00 0 0 0 1 0
8 2022-05-10 2022-05-10 09:01:07 0 0 0 0.00 0 0 0 1 0
9 2022-05-10 2022-05-10 09:01:08 0 5 2 5.39 0 0 0 1 0
10 2022-05-10 2022-05-10 09:01:09 20 33 3 38.70 0 0 0 1 0
11 2022-05-10 2022-05-10 09:01:10 14 26 29 41.39 0 0 2 0 1
12 2022-05-10 2022-05-10 09:01:11 11 0 4 11.70 0 0 0 1 1
13 2022-05-10 2022-05-10 09:01:12 0 0 0 0.00 0 0 0 1 0
14 2022-05-10 2022-05-10 09:01:13 0 0 0 0.00 0 0 0 1 0
15 2022-05-10 2022-05-10 09:01:14 82 126 113 188.07 0 3 0 0 1
16 2022-05-10 2022-05-10 09:01:15 60 64 47 99.52 0 0 2 0 1
17 2022-05-10 2022-05-10 09:01:16 98 140 236 291.38 0 0 2 0 0
18 2022-05-10 2022-05-10 09:01:17 151 118 221 292.52 0 0 2 0 0
19 2022-05-10 2022-05-10 09:01:18 44 13 99 109.11 0 0 2 0 0
20 2022-05-10 2022-05-10 09:01:19 6 6 53 53.67 0 0 2 0 0
21 2022-05-10 2022-05-10 09:01:20 39 8 65 76.22 0 0 2 0 0
22 2022-05-10 2022-05-10 09:01:21 17 20 57 62.75 0 0 2 0 0
23 2022-05-10 2022-05-10 09:01:22 51 46 269 277.63 0 0 2 0 0
24 2022-05-10 2022-05-10 09:01:23 15 45 82 94.73 0 3 0 0 1
25 2022-05-10 2022-05-10 09:01:24 22 34 4 40.69 0 0 2 0 1
26 2022-05-10 2022-05-10 09:01:25 114 93 41 152.73 0 0 2 0 0
27 2022-05-10 2022-05-10 09:01:26 74 67 92 135.75 0 0 2 0 0
28 2022-05-10 2022-05-10 09:01:27 117 9 40 123.98 0 0 2 0 0
29 2022-05-10 2022-05-10 09:01:28 33 15 0 36.25 0 0 0 1 1
30 2022-05-10 2022-05-10 09:01:29 0 0 0 0.00 0 0 0 1 0
I have the code to separate by date, and to "kinda" plot, but I need it for the 4 columns.
graph_pre <- mutate(graph_pre, day = lubridate::day(Date))
ggplot(graph_pre, aes(x = Time, y = Posture))+
geom_point()+
facet_wrap(~day, scales = "free_x")
dput(head(graph_pre,30))
structure(list(Date = structure(c(19122, 19122, 19122, 19122,
19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122,
19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122,
19122, 19122, 19122, 19122, 19122, 19122, 19122, 19122), class = "Date"),
Time = structure(c(1652187660, 1652187661, 1652187662, 1652187663,
1652187664, 1652187665, 1652187666, 1652187667, 1652187668,
1652187669, 1652187670, 1652187671, 1652187672, 1652187673,
1652187674, 1652187675, 1652187676, 1652187677, 1652187678,
1652187679, 1652187680, 1652187681, 1652187682, 1652187683,
1652187684, 1652187685, 1652187686, 1652187687, 1652187688,
1652187689), class = c("POSIXct", "POSIXt"), tzone = ""),
Axis1 = c(21, 0, 24, 48, 0, 0, 0, 0, 0, 20, 14, 11, 0, 0,
82, 60, 98, 151, 44, 6, 39, 17, 51, 15, 22, 114, 74, 117,
33, 0), Axis2 = c(40, 36, 1, 31, 6, 0, 0, 0, 5, 33, 26, 0,
0, 0, 126, 64, 140, 118, 13, 6, 8, 20, 46, 45, 34, 93, 67,
9, 15, 0), Axis3 = c(2, 1, 0, 4, 0, 0, 0, 0, 2, 3, 29, 4,
0, 0, 113, 47, 236, 221, 99, 53, 65, 57, 269, 82, 4, 41,
92, 40, 0, 0), VM = c(45.22, 36.01, 24.02, 57.28, 6, 0, 0,
0, 5.39, 38.7, 41.39, 11.7, 0, 0, 188.07, 99.52, 291.38,
292.52, 109.11, 53.67, 76.22, 62.75, 277.63, 94.73, 40.69,
152.73, 135.75, 123.98, 36.25, 0), Standing = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Stepping = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0,
0, 0, 0, 0), Cycling = c(2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2,
0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 0),
New_Sitting = c(0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1), Counter = c(0L,
1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)), row.names = c(NA,
30L), class = "data.frame")
First thing, we should pivot_longer to pull the four posture columns into name-value pairs. Here I've put the names into the "Posture" column. Then we can map that to color and use the values for the y axis.
I've specified the range in scale_y_continuous, but it could also be done with coord_cartesian(ylim = c(0.5,4.5)) -- the difference will be that the out of range points are filtered out in this way, but are in some sense "still there" if you use the coord_cartesian option. That can make a difference if you are doing a summary step, like geom_boxplot or geom_smooth.
Finally, I use theme to specify the y-axis related elements that should be hidden.
library(tidyverse)
graph %>%
mutate(day = lubridate::day(Date)) %>%
pivot_longer(Standing:New_Sitting, names_to = "Posture") %>%
ggplot(aes(x = Time, y = value, color = Posture))+
geom_point()+
scale_y_continuous(limits = c(0.5,4.5), expand = expansion(0)) +
facet_wrap(~day, scales = "free_x") +
labs(title = "Posture vs. Time") +
theme(axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank())
Here you go:
library(tidyverse)
graph_pre_long <- graph_pre %>% pivot_longer(c(Standing, New_Sitting , Stepping, Cycling), names_to = "Posture")
ggplot(graph_pre_long, aes(x = Time, y = value, color = Posture))+
geom_point()+
facet_wrap(~day, scales = "free_x") +
ylim(.5, 4.5) +
theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())

mle error: non-finite finite-difference value [1]

I'm trying to fit neutral model parameter using MLE in R.
I cannot get this error, and how can I solve it?
I've got the following error:
Error in optim(start, f, method = method, hessian = TRUE, ...) :
non-finite finite-difference value [1]
These are the codes that I used.
#Load packages
library(minpack.lm)
library(Hmisc)
library(stats4)
#Calculate the number of individuals per community
N <- mean(apply(otu_f_h, 1, sum))
#Calculate the average relative abundance of each taxa across communities
p.m <- apply(otu_f_h, 2, mean)
p.m <- p.m[p.m != 0]
p <- p.m/N
#Calculate the occurrence frequency of each taxa across communities
spp.bi <- 1*(otu_f_h>0)
freq <- apply(spp.bi, 2, mean)
freq <- freq[freq != 0]
#Combine
C <- merge(p, freq, by=0)
C <- C[order(C[,2]),]
C <- as.data.frame(C)
C.0 <- C[!(apply(C, 1, function(y) any(y == 0))),] #Removes rows with any zero (absent in either source pool or local communities)
p <- C.0[,2]
freq <- C.0[,3]
names(p) <- C.0[,1]
names(freq) <- C.0[,1]
#Calculate the limit of detection
d = 1/N
##Fit model parameter m (or Nm) using Non-linear least squares (NLS)
m.fit <- nlsLM(freq ~ pbeta(d, N*m*p, N*m*(1-p), lower.tail=FALSE), start=list(m=0.1))
m.ci <- confint(m.fit, 'm', level=0.95)
##Fit neutral model parameter m (or Nm) using Maximum likelihood estimation (MLE)
sncm.LL <- function(m, sigma){
R = freq - pbeta(d, N*m*p, N*m*(1-p), lower.tail=FALSE)
R = dnorm(R, 0, sigma)
-sum(log(R))
}
m.mle <- mle(sncm.LL, start=list(m=0.1, sigma=0.1), nobs=length(p))
This is the sample table that I used (otu_f_h)
SampleID
OTU01
OTU02
OTU03
OTU04
OTU05
OTU06
OTU07
OTU08
R01
8
0
8
0
0
0
0
257
R02
0
0
0
0
0
0
0
0
R05
0
0
0
0
51
0
0
0
R06
0
0
0
0
9
0
0
0
R09
0
0
0
0
0
0
0
0
R10
0
0
0
0
0
0
6
0
R39
0
0
0
0
0
0
0
28
R40
0
0
0
0
0
0
0
0
R51
0
0
0
0
0
0
0
0
S01
24
0
4
5
0
0
6
0
S02
0
0
37
0
0
0
0
0
S05
0
0
15
28
0
0
0
0
S06
0
2
0
0
7
0
0
0
S09
0
0
0
0
24
0
0
0
S10
0
0
0
0
0
0
0
0
S39
0
0
0
0
0
0
0
0
S40
0
0
0
0
0
0
0
0
S51
0
0
0
0
0
0
0
0
S52
0
0
46
79
0
0
35
0
structure(list(SampleID = c("R01", "R02", "R05", "R06", "R09",
"R10", "R39", "R40", "R51", "S01", "S02", "S05", "S06", "S09",
"S10", "S39", "S40", "S51", "S52"), OTU01 = c(8, 0, 0, 0, 0,
0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0), OTU02 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0), OTU03 = c(8,
0, 0, 0, 0, 0, 0, 0, 0, 4, 37, 15, 0, 0, 0, 0, 0, 0, 46), OTU04 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 28, 0, 0, 0, 0, 0, 0, 79), OTU05 = c(0,
0, 51, 9, 0, 0, 0, 0, 0, 0, 0, 0, 7, 24, 0, 0, 0, 0, 0), OTU06 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), OTU07 = c(0,
0, 0, 0, 0, 6, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 35), OTU08 = c(257,
0, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = 2:20, class = "data.frame")

Minus values from columns relative to year

I'm trying to minus values for each habitat covariate relative to year 2019 and 2010. So, something that can assign by ID those values belonging to each habitat for 2010 and 2019, minus them, otherwise, those that aren't grouped by ID are left as is in the dataframe.
Here's an example of the dataset and what I expect for the output:
#dataset example
# A tibble: 30 x 18
id year pland_00_water pland_01_evergr~ pland_02_evergr~ pland_03_decidu~ pland_04_decidu~ pland_05_mixed_~ pland_06_closed~
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 267 2019 0.0833 0 0 0 0 0 0
2 268 2019 0.2 0 0 0 0 0 0
3 362 2019 0.1 0 0 0 0 0 0
4 420 2019 0.0556 0 0 0 0 0 0
5 421 2019 0.0667 0 0 0 0 0 0
6 484 2019 0.125 0 0 0 0 0 0
7 492 2010 0.1 0 0 0 0 0 0
8 492 2019 0.1 0 0 0 0 0 0
9 719 2010 0.0769 0 0 0 0 0 0
10 719 2019 0.0769 0 0 0 0 0 0
#output example
# A tibble: 30 x 18
id year pland_00_water pland_01_evergr~ pland_02_evergr~ pland_03_decidu~ pland_04_decidu~ pland_05_mixed_~ pland_06_closed~
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 267 2019 0.0833 0 0 0 0 0 0
2 268 2019 0.2 0 0 0 0 0 0
3 362 2019 0.1 0 0 0 0 0 0
4 420 2019 0.0556 0 0 0 0 0 0
5 421 2019 0.0667 0 0 0 0 0 0
6 484 2019 0.125 0 0 0 0 0 0
7 492 changed 0 0 0 0 0 0 0
9 719 changed 0 0 0 0 0 0 0
I can imagine this working with a function and boolean operators such that, if year 2010 & 2019 match by id then minus the next row by the previous (assuming that they're ordered by id then this should work), otherwise, if they do not match by id then leave them as is.
I'm trying to wrap my head around which code to use for this, I can see this working within a function and using lapply to apply across the entire dataset.
Here's a reproducible code:
structure(list(id = c(267L, 268L, 362L, 420L, 421L, 484L, 492L,
492L, 719L, 719L, 986L, 986L, 1071L, 1071L, 1303L, 1303L, 1306L,
1399L, 1399L, 1400L, 1400L, 2007L, 2083L, 2083L, 2134L, 2135L,
2136L, 2213L, 2213L, 2214L), year = c(2019, 2019, 2019, 2019,
2019, 2019, 2010, 2019, 2010, 2019, 2010, 2019, 2010, 2019, 2010,
2019, 2010, 2010, 2019, 2010, 2019, 2019, 2010, 2019, 2019, 2019,
2019, 2010, 2019, 2010), pland_00_water = c(0.0833333333333333,
0.2, 0.1, 0.0555555555555556, 0.0666666666666667, 0.125, 0.1,
0.1, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647,
0.0588235294117647, 0.0714285714285714, 0.0714285714285714, 0.0769230769230769,
0.0769230769230769, 0.0588235294117647, 0.05, 0.05, 0.111111111111111,
0.111111111111111, 0.0526315789473684, 0.142857142857143, 0.142857142857143,
0.0666666666666667, 0.0588235294117647, 0.1, 0.142857142857143,
0.142857142857143, 0.25), pland_01_evergreen_needleleaf = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0588235294117647, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_02_evergreen_broadleaf = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), pland_03_deciduous_needleleaf = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0, 0,
0, 0, 0.05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_04_deciduous_broadleaf = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0.0714285714285714,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_05_mixed_forest = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), pland_06_closed_shrubland = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), pland_07_open_shrubland = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), pland_08_woody_savanna = c(0, 0, 0, 0, 0, 0,
0, 0, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647,
0.0588235294117647, 0.0714285714285714, 0.0714285714285714, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_09_savanna = c(0,
0, 0, 0, 0, 0, 0, 0, 0.0769230769230769, 0.0769230769230769,
0.0588235294117647, 0.0588235294117647, 0, 0, 0, 0.0769230769230769,
0.0588235294117647, 0.05, 0.05, 0.111111111111111, 0.111111111111111,
0, 0, 0, 0, 0, 0, 0, 0, 0), pland_10_grassland = c(0.0833333333333333,
0.2, 0.1, 0.0555555555555556, 0.0666666666666667, 0.125, 0.1,
0.1, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647,
0.0588235294117647, 0.0714285714285714, 0.0714285714285714, 0.0769230769230769,
0.0769230769230769, 0.0588235294117647, 0.05, 0.05, 0.111111111111111,
0.111111111111111, 0.0526315789473684, 0.142857142857143, 0.142857142857143,
0.0666666666666667, 0.0588235294117647, 0.1, 0.142857142857143,
0.142857142857143, 0.25), pland_11_wetland = c(0.0833333333333333,
0.2, 0.1, 0.0555555555555556, 0, 0, 0.1, 0.1, 0.0769230769230769,
0.0769230769230769, 0.0588235294117647, 0.0588235294117647, 0.0714285714285714,
0.0714285714285714, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647,
0.05, 0.05, 0.111111111111111, 0, 0.0526315789473684, 0.142857142857143,
0.142857142857143, 0.0666666666666667, 0.0588235294117647, 0.1,
0.142857142857143, 0.142857142857143, 0), pland_12_cropland = c(0.0833333333333333,
0.2, 0.1, 0.0555555555555556, 0.0666666666666667, 0.125, 0.1,
0.1, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647,
0, 0, 0, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647,
0.05, 0.05, 0.111111111111111, 0.111111111111111, 0.0526315789473684,
0.142857142857143, 0.142857142857143, 0.0666666666666667, 0,
0, 0.142857142857143, 0.142857142857143, 0.25), pland_13_urban = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), pland_14_mosiac = c(0, 0, 0, 0, 0, 0,
0, 0, 0.0769230769230769, 0.0769230769230769, 0, 0.0588235294117647,
0, 0, 0, 0, 0, 0.05, 0.05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
pland_15_barren = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
Here's a tidyverse version:
library(dplyr)
x %>%
arrange(year) %>%
# can add 'id' if desired, minimum 'year' required for below
group_by(id) %>%
filter(
all(c("2010", "2019") %in% year),
year %in% c("2010", "2019")
) %>%
summarize_at(vars(-year), diff) %>%
mutate(year = "changed") %>%
ungroup() %>%
bind_rows(x, .) %>%
arrange(id, year) # just to show id=492
# # A tibble: 39 x 18
# id year pland_00_water pland_01_evergr~ pland_02_evergr~ pland_03_decidu~ pland_04_decidu~ pland_05_mixed_~
# <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 267 2019 0.0833 0 0 0 0 0
# 2 268 2019 0.2 0 0 0 0 0
# 3 362 2019 0.1 0 0 0 0 0
# 4 420 2019 0.0556 0 0 0 0 0
# 5 421 2019 0.0667 0 0 0 0 0
# 6 484 2019 0.125 0 0 0 0 0
# 7 492 2010 0.1 0 0 0 0 0
# 8 492 2019 0.1 0 0 0 0 0
# 9 492 chan~ 0 0 0 0 0 0
# 10 719 2010 0.0769 0 0 0 0 0
# # ... with 29 more rows, and 10 more variables: pland_06_closed_shrubland <dbl>, pland_07_open_shrubland <dbl>,
# # pland_08_woody_savanna <dbl>, pland_09_savanna <dbl>, pland_10_grassland <dbl>, pland_11_wetland <dbl>,
# # pland_12_cropland <dbl>, pland_13_urban <dbl>, pland_14_mosiac <dbl>, pland_15_barren <dbl>
Explanation:
the first arrange(year) is so that the diff later will have values in an expected order (assuming all years are year-like that sort lexicographically the same as a numerical sort);
the filter first removes any ids that do not have both years, and then ensures we have only those two years; while your data only contains "2010" and "2019", I didn't want to assume that ... it's a harmless filter if that's all you have, remove year %in% c("2010","2019") if desired and safe;
I assume that columns other than id and year are numeric/integer, so summarize_at(vars(-year), diff) is safe (id is out of the picture since it is a grouping variable); if there are non-numerical values, you might be able to use summarize_if(is.numeric, diff) which also works here ... but will silently NA-ize non-numeric fields if present;
bind_rows(x, .) is needed because the filter removed many rows we want/need to retain; and
the last arrange(id,year) is solely demonstrative for this answer.

Sort character in vector of string in R

I have data like,
df <- structure(list(Sex = structure(c(1L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"),
Age = c(19L, 16L, 16L, 13L, 16L, 30L, 16L, 30L, 16L, 30L,
30L, 16L, 19L, 1L, 30L), I = c(1, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 1, 0, 1), E = c(0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
1, 0, 1, 0), S = c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
0, 1), N = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0),
F = c(1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1), T = c(0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0), C = c(1, 1, 1,
0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1), D = c(0, 0, 0, 1, 0,
1, 0, 1, 0, 1, 1, 1, 1, 0, 0), type = c("CIFS", "CITN", "CESF",
"DEFS", "CIFN", "DETS", "CITS", "DEFS", "CIFN", "DEFN", "DETS",
"DETS", "DINF", "CENT", "CIFS"), PO = runif(15, -3, 3), AO = runif(15, -3, 3)), .Names = c("Sex",
"Age", "I", "E", "S", "N", "F", "T", "C", "D", "type", "PO",
"AO"), class = c("tbl_dt", "tbl", "data.table", "data.frame"), row.names = c(NA,
-15L))
I want to sort the column type. Not the column but the characters in it. And get the same structure afterwards. For example, CIFS should then be CFIS. I tried to do it as,
df <- within(df, {
type <- apply(sapply(strsplit(df[, type], split=''), sort), 2,
function(x) paste0(x, collapse = ''))
})
Is there any simpler solution, that I have missed to find.
Since you are using data.table, I would suggest
df[, type := paste(sort(unlist(strsplit(type, ""))), collapse = ""), by = type]
like described in How to sort letters in a string?
This should work for both data.frame and data.table (base R only):
df$type <- vapply(strsplit(df$type, split=''),FUN=function(x)paste(sort(x),collapse=''),"")
Result:
> df
Sex Age I E S N F T C D type PO AO
1 F 19 1 0 1 0 1 0 1 0 CFIS 2.9750666 2.0308410
2 F 16 1 0 0 1 0 1 1 0 CINT 0.7902187 2.0891158
3 M 16 0 1 1 0 1 0 1 0 CEFS -1.7173785 2.4774140
4 F 13 0 1 1 0 1 0 0 1 DEFS 1.5352127 -1.9272470
5 M 16 1 0 0 1 1 0 1 0 CFIN -0.2160741 1.7359897
6 M 30 0 1 1 0 0 1 0 1 DEST 2.6314981 -0.6252466
7 F 16 1 0 1 0 0 1 1 0 CIST -1.6032894 -1.9938226
8 M 30 0 1 1 0 1 0 0 1 DEFS 0.7748583 -2.0935737
9 F 16 1 0 0 1 1 0 1 0 CFIN -2.9368356 0.3363364
10 F 30 0 1 0 1 1 0 0 1 DEFN -0.6506217 2.6681535
11 F 30 0 1 1 0 0 1 0 1 DEST -0.4432578 0.4627441
12 F 16 0 1 1 0 0 1 0 1 DEST 2.0236760 2.7684298
13 F 19 1 0 0 1 1 0 0 1 DFIN -1.1774931 2.6546726
14 F 1 0 1 0 1 0 1 1 0 CENT -2.2365388 2.7902646
15 F 30 1 0 1 0 1 0 1 0 CFIS -1.6139238 -2.4982620

Resources