R - convert nan to 0 results in all 0's - r

I have a data frame containing NaN's that I'd like to convert to 0's. I wrote a function that I think should work:
fix_nan <- function(x){
return(x[is.nan(x)] <- 0)
}
And then I apply it to the data frame:
train_e <- structure(list(pack_id = structure(1:10, .Label = c("1", "2",
"4", "5", "7", "8", "9", "10", "11", "14"), class = "factor"),
item_1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), item_2 = c(NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), item_3 = c(1.45225232891169,
0.613104472886409, NaN, 1.02450431651439, 0.735706794978741,
0.741937344729377, NaN, 0.83034830207343, 0.97650959186721,
0.750305594399894), item_4 = c(0.645137961373585, 0.615792803650477,
Inf, 0.752866415261568, 0.84901755126673, 0.646398200985872,
Inf, 0.786548355648346, 0.725113372622438, 0.709897990984761
), item_5 = c(NaN, NaN, NaN, 0, 0, 0, NaN, NaN, 0, 0), item_6 = c(0.510825623765991,
0.510825623765991, NaN, 0.510825623765991, 0.510825623765991,
0.510825623765991, NaN, 0.510825623765991, 0.847297860387204,
0.510825623765991)), .Names = c("pack_id", "item_1", "item_2",
"item_3", "item_4", "item_5", "item_6"), row.names = c(26155L,
6236L, 6281L, 6014L, 6035L, 26217L, 5576L, 6316L, 5594L, 26244L
), class = "data.frame")
vtf1 <- c('item_1','item_2','item_3','item_4','item_5','item_6')
train_e[,vtf1] <- as.data.frame(lapply(train_e[,vtf1], fix_nan))
head(train_e)
And I get all 0's:
> head(train_e)
pack_id item_1 item_2 item_3 item_4 item_5 item_6
26155 1 0 0 0 0 0 0
6236 2 0 0 0 0 0 0
6281 4 0 0 0 0 0 0
6014 5 0 0 0 0 0 0
6035 7 0 0 0 0 0 0
26217 8 0 0 0 0 0 0
Any suggestions ?

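The value of the assignment expression x[is.nan(x)] <- 0 is its right-hand side, 0, not the modified vector. So return() hands back a single 0 for every column, and that 0 is recycled over the whole data frame. To fix this, do the assignment first and then return x: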
fix_nan <- function(x){
x[is.nan(x)] <- 0
x
}
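As a quick check of the explanation above, here is a minimal sketch on a small made-up data frame (df, v1 and v2 are illustrative names, not the asker's data). It shows both the value of the assignment expression and the corrected function applied column by column:

```r
# Value of an assignment expression is its right-hand side
x <- c(1, NaN, 3)
assigned <- (x[is.nan(x)] <- 0)
print(assigned)   # a single 0, which is what the original function returned

# Corrected helper: modify the vector, then return the whole vector
fix_nan <- function(x) {
  x[is.nan(x)] <- 0
  x
}

# Made-up example data
df <- data.frame(
  id = factor(c("a", "b", "c")),
  v1 = c(1.5, NaN, 0.25),
  v2 = c(NaN, Inf, 2)
)

# Assigning the list from lapply() back into the selected columns
# keeps the shape of the data frame
num_cols <- c("v1", "v2")
df[num_cols] <- lapply(df[num_cols], fix_nan)
print(df)
```

Note that is.nan() does not match Inf, so the Inf values in item_4 of the question's data would stay as they are. If those should become 0 as well, testing with !is.finite(x) instead is one option, with the caveat that it also replaces NA.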

Related

Include all variables in tsibble formula

I want to fit a linear regression model using the tsibble package and I have a bunch of dummy variables that I want to include in my analysis. A sample dataset would be the following:
library(tsibble)
library(dplyr)
library(fable)
ex = structure(list(id = c("KEY1", "KEY1", "KEY1", "KEY1", "KEY1",
"KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1",
"KEY1", "KEY1"), sales = c(0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), date = structure(c(15003, 15004, 15005, 15006, 15007,
15008, 15009, 15010, 15011, 15012, 15013, 15014, 15015, 15016,
15017), class = "Date"), wday = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), dummy_1 = c(0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), dummy_2 = c(0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0), dummy_3 = c(0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -15L), key = structure(list(
id = "KEY1", .rows = list(1:15)), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), index = structure("date", ordered = TRUE), index2 = "date", interval = structure(list(
year = 0, quarter = 0, month = 0, week = 0, day = 1, hour = 0,
minute = 0, second = 0, millisecond = 0, microsecond = 0,
nanosecond = 0, unit = 0), class = "interval"), class = c("tbl_ts",
"tbl_df", "tbl", "data.frame"))
> ex
# A tsibble: 15 x 7 [1D]
# Key: id [1]
id sales date wday dummy_1 dummy_2 dummy_3
<chr> <dbl> <date> <int> <dbl> <dbl> <dbl>
1 KEY1 0 2011-01-29 1 0 0 0
2 KEY1 5 2011-01-30 2 0 0 0
3 KEY1 0 2011-01-31 3 0 0 1
4 KEY1 0 2011-02-01 4 1 0 0
5 KEY1 0 2011-02-02 5 0 0 0
6 KEY1 0 2011-02-03 6 0 0 0
7 KEY1 0 2011-02-04 7 0 1 0
8 KEY1 0 2011-02-05 1 0 0 0
9 KEY1 0 2011-02-06 2 0 0 0
10 KEY1 0 2011-02-07 3 0 0 0
11 KEY1 0 2011-02-08 4 0 0 0
12 KEY1 0 2011-02-09 5 0 0 0
13 KEY1 0 2011-02-10 6 0 0 0
14 KEY1 0 2011-02-11 7 0 0 0
15 KEY1 0 2011-02-12 1 0 0 0
There are too many dummies to specify manually so I was hoping for something faster. Normally I would use the . symbol in the formula in the following way:
fit = ex %>%
model(TSLM(sales ~ trend() + season() + .))
But this does not work:
Warning message:
1 error encountered for TSLM(sales ~ trend() + season() + .)
[1] '.' in formula and no 'data' argument
Is there a systematic tsibble way around this or do I have to create the formula on the fly using the names of the dataset?
We could create a formula with reformulate using the 'dummy' column names
nm1 <- names(ex)[startsWith(names(ex), 'dummy')]
ex %>%
model(lm = TSLM(reformulate(c(nm1, 'trend()', 'season()'), 'sales') ))
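To see what reformulate builds before handing it to TSLM, a small base R sketch may help. The vector cols is a stand-in for names(ex), so nothing here needs tsibble or fable to be loaded:

```r
# Stand-in for names(ex); assumed to match the columns shown in the question
cols <- c("id", "sales", "date", "wday", "dummy_1", "dummy_2", "dummy_3")

# Pick out the dummy columns by prefix
dummy_cols <- cols[startsWith(cols, "dummy")]

# Term labels are parsed as text, so specials like trend() can be passed as strings
f <- reformulate(c("trend()", "season()", dummy_cols), response = "sales")
print(f)
```

The result is an ordinary formula object, which is why it can be used where a hand-written formula would go.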

Creating one hot encoded columns while preserving other features

I've got the following data:
dataset <- structure(list(id = structure(c(2L, 3L, 1L, 3L, 1L, 9L), .Label = c("215101",
"215559", "216566", "217284", "219435", "220209", "220249", "220250",
"225678", "225679", "225687", "225869", "228420", "228435", "230621",
"230623", "233063", "233097", "233098", "235546", "235560", "235567",
"236379"), class = "factor"), cat1 = c("A", "B", "B", "A", "A",
"A"), cat2 = c("item 1", "item 1", "item 2", "item 5", "item 3",
"item 28"), cat3 = c("theme 2", "theme 2", "theme 1", "theme 4",
"theme 10", "theme 40")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))
I would like to create kind of model matrix with one hot encoded columns features created from columns cat2 and cat3. Therefore, my output would look like this:
structure(list(id = structure(c(1L, 1L, 2L, 3L, 3L, 9L), .Label = c("215101",
"215559", "216566", "217284", "219435", "220209", "220249", "220250",
"225678", "225679", "225687", "225869", "228420", "228435", "230621",
"230623", "233063", "233097", "233098", "235546", "235560", "235567",
"236379"), class = "factor"), cat1 = c("A", "B", "A", "A", "B",
"A"), `item 1` = c(0, 0, 1, 0, 1, 0), `item 2` = c(0, 1, 0, 0,
0, 0), `item 28` = c(0, 0, 0, 0, 0, 1), `item 3` = c(1, 0, 0,
0, 0, 0), `item 5` = c(0, 0, 0, 1, 0, 0), `theme 1` = c(0, 1,
0, 0, 0, 0), `theme 10` = c(1, 0, 0, 0, 0, 0), `theme 2` = c(0,
0, 1, 0, 1, 0), `theme 4` = c(0, 0, 0, 1, 0, 0), `theme 40` = c(0,
0, 0, 0, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
However, I don't have my independent variable in this dataset and I would like to preserve id and cat1 columns. How can I do that?
You could use merge and dcast twice.
library(reshape2)
merge(dcast(dataset, id + cat1 ~ cat2, fun.aggregate = length),
dcast(dataset, id + cat1 ~ cat3, fun.aggregate = length),
by = c("id", "cat1"))
# id cat1 item 1 item 2 item 28 item 3 item 5 theme 1 theme 10 theme 2 theme 4 theme 40
#1 215101 A 0 0 0 1 0 0 1 0 0 0
#2 215101 B 0 1 0 0 0 1 0 0 0 0
#3 215559 A 1 0 0 0 0 0 0 1 0 0
#4 216566 A 0 0 0 0 1 0 0 0 1 0
#5 216566 B 1 0 0 0 0 0 0 1 0 0
#6 225678 A 0 0 1 0 0 0 0 0 0 1
If you have more than two variables to spread, you might melt your data first. This will save you some typing.
dcast(melt(dataset, id.vars = c("id", "cat1")), id + cat1 ~ value, fun.aggregate = length)
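If installing reshape2 is not an option, a base R alternative is sketched below. This is a different technique from the dcast approach above: it encodes row by row and does not aggregate repeated id/cat1 combinations. The helper one_hot is a hypothetical name, and the toy data only mimics the column layout of the question.

```r
# Toy data with the same column layout as the question (values are made up)
dataset <- data.frame(
  id   = c("215559", "216566", "215101"),
  cat1 = c("A", "B", "B"),
  cat2 = c("item 1", "item 1", "item 2"),
  cat3 = c("theme 2", "theme 2", "theme 1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 column per distinct value of v
one_hot <- function(v) {
  lv <- sort(unique(v))
  m <- outer(v, lv, "==") + 0L   # logical matrix turned into integers
  colnames(m) <- lv
  m
}

# Keep id and cat1, then bind the encoded columns next to them
encoded <- cbind(dataset[c("id", "cat1")],
                 one_hot(dataset$cat2),
                 one_hot(dataset$cat3))
print(encoded)
```

Because cbind on a data frame does not mangle column names, headers such as "item 1" keep their spaces, as in the desired output.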

Plotting Traits on a Phylogeny

I am following this guide on how to plot traits onto a phylogeny to determine trait conservatism. I have followed it step by step but can't seem to get either the community composition or trait plots on phylogeny to work at all for my datasets. I have formatted them just as the guide describes, and they look just like the example data sets to me.
I am not sure how to put tree files on here so here is one on a cloud link for all species and here is a tree that I used just for my native species used for trait plotting
VegComm <- df2vec(as.matrix(Veg2018), colID = 1:29) #community data
STraits <- read.csv()
rownames(STraits)<- STraits[,1]
STraits[1:1] <- list(NULL) #Trait Data
STraits <- df2vec(as.matrix.data.frame(STraits), colID=1:5)
STraits <- STraits[1:6,]
str(STraits)
prune.sample(VegComm,alltree)
par(mfrow=c(2,2))
for (i in colnames(STraits)) {
  plot(nativetree, show.tip.label=TRUE, main=i)
  tiplabels(pch=22, col=STraits[,i]+1, bg=STraits[,i]+1, cex=1.5)
}
traits <- STraits[nativetree$tip.label,]
phylosignal(nativetree, STraits, nsim=1000, method="K")
Here is the community data:
Avena_fatua Bromus_diandrus Bromus_hordeaceus Festuca_myuros Festuca_perennis Carduus_pycnocephalus Cirsium_vulgare Erodium_cicutarium Geranium_dissectum Helminthotheca_echioides Lactuca_serriola Medicago_polymorpha Oxalis_pes-caprae Raphanus_sativus Senecio_vulgaris Sonchus_oleraceus Vicia_sativa Artemisia_californica Baccharis_pilularis Ericameria_ericoides Mimulus_aurantiacus Bromus_carinatus Elymus_triticoides Hordeum_brachyantherum Stipa_pulchra Achillea_millefolium Eschscholzia_californica Lupinus_variicolor Echium_candicans
PC1 0 1.25 0 20.83333333 7.416666667 0.5 0 0 21.25 0.333333333 0 6.916666667 0 4.916666667 0 0 0 4.583333333 18.33333333 1.25 0.833333333 0.5 0 0 0 7.5 1.25 0 0
PC2 0.5 0 0.333333333 14.16666667 2.25 0 0 0 25 0 1.916666667 30.41666667 0 3.666666667 0.833333333 0.833333333 0 0 17.91666667 0 0 2.083333333 0 0 0 3.333333333 0 0 0
PC3 0.333333333 4.083333333 0 27.5 3.333333333 6.083333333 0 0 15.83333333 1.75 2.416666667 3.833333333 0 6.666666667 0 5.916666667 0 1.25 2.083333333 0 2.5 5.416666667 0 0 1.25 5 0 0 0
PC4 0.333333333 1.25 3.333333333 10.41666667 15.83333333 5.833333333 0 0 25.83333333 0 1.583333333 10.75 0 5.833333333 0 1.25 0 0 2.083333333 0 0 0 0 0 0 3.416666667 2.916666667 0 0
PC5 1.916666667 0 8.833333333 10.91666667 6.666666667 0 0.333333333 0 15 1.25 1.75 0 0 3.333333333 0 10.83333333 0.5 0 3.333333333 0.5 0 4.666666667 0 0 0.5 9.166666667 0 0 0.666666667
PS1 0.333333333 3 0 6.25 2.25 16 0 0 11.41666667 0.333333333 0 3.833333333 0 0.833333333 0 1.166666667 0 0 12 0 0.166666667 3.333333333 0 0 0 49.16666667 0 0 0
PS2 2.25 4 0 6.5 1.25 13.75 0 4.166666667 10.5 0 0 6.666666667 0 4.5 0 0 0 1.583333333 3.833333333 0 4.166666667 4.166666667 0 0 1.25 22.91666667 1.25 0 0
PS3 2.5 0 0 5.083333333 1.25 0.833333333 0 5.916666667 20.83333333 0 0 16.66666667 0 7.583333333 0 1.333333333 0 0 4.5 0 0 0.333333333 0 0 1.75 25.41666667 0 0 0
PS4 2.25 0 1.5 2.5 1.75 0 2.5 0 22.91666667 0 0 19.16666667 2.916666667 18.33333333 0 0 0 2.916666667 6.666666667 0 1.25 5.5 0 0 4.583333333 8.75 0 2.5 0
PS5 4.75 0 1.75 7 2.083333333 4.666666667 0 0 18.08333333 0 0 4.25 0 13.75 0 0 0 0 0 0 0 0 0 0 0 34.33333333 0 0 0
PW1 4.75 1.75 0.666666667 11.83333333 4.916666667 0 0 0 15 2.833333333 1.25 39.16666667 0 0.666666667 0 3.833333333 0 0 4.166666667 0 0 0.833333333 0 0 0 14.16666667 0.666666667 0 1.25
PW2 2.5 0 4 21.66666667 4.666666667 0.5 0 0 25.41666667 0 1.25 7.083333333 0 14.58333333 0 0.833333333 0 1.25 1.25 0 0 3.333333333 0 0 1.25 4.166666667 1 0 0
PW3 1.583333333 1.25 0 10.66666667 4.25 5.75 0 0 12.5 0 1.5 30 0 0.333333333 0 0.333333333 0 3.833333333 0 0 0 2.083333333 0 0 4.583333333 10 0 0 0
PW4 0 1.25 6.666666667 9.916666667 8.25 0 0 0 33.33333333 0 0 5.833333333 0 5.833333333 0 2.083333333 0 0 1.25 0 0 2.5 0 0 0 3.75 1.583333333 0 0
PW5 2.25 2.083333333 0.333333333 10.41666667 4.416666667 1.25 0 0 23.33333333 0 0 4.583333333 0 5.083333333 0 0 13.33333333 12.66666667 8.333333333 0 0 0 0 0 0 12 0 0 0
Here is the trait data: (I tried omitting and not omitting NAs)
Growth_Rate Area AreaVar SLA SLAVar VLA VLAVar Thickness ThicknessVar logThickness logThicknessVar LV LVVar PD0 PD10 PD50 CPD
Achillea_millefolium 0.090888257 15.80656659 12.43783158 NA NA NA NA 0.249744167 0.187092582 -1.553441666 0.458076381 NA NA 12.61566 29.016 250 0.721921544
Artemisia_californica 0.035049437 14.56355219 11.78670881 180.1322546 99.50427931 9.364236482 1.414207935 0.268703703 0.074128238 -1.352780779 0.298806173 43.22157529 13.35296757 12.61566 29.016 250 0.721921544
Bromus_carinatus 0.022607407 2.384166667 2.316140235 NA NA NA NA NA NA NA NA NA NA 5.41269 11.7111 315.3334 0.681203858
Ericameria_ericoides 0.019809977 3.6875 1.703521078 NA NA NA NA NA NA NA NA NA NA 12.61566 29.016 250 0.721921544
Eschscholzia_californica 0.029380702 1.245833333 1.076820745 262.1630059 60.49033956 4.392284625 0.596306575 0.16357684 0.038660691 -1.835819399 0.223972815 39.80718218 11.25985865 294 294 294 0.577356321
Hosackia_gracilis 0.009183502 NA NA NA NA NA NA NA NA NA NA NA NA 41.81336 101.22 250 0.638988811
Lupinus_nanus 0.040867178 NA NA NA NA NA NA NA NA NA NA NA NA 33.60001 101.22 250 0.640373244
Lupinus_variicolor 0.028428463 NA NA NA NA NA NA NA NA NA NA NA NA 33.60001 101.22 250 0.640373244
Mimulus_aurantiacus 0.00652489 0.00652489 0.011364841 3.412857143 2.976064883 151.5001201 79.68333552 2.370279914 0.731201273 0.285257143 0.120154396 37.54090305 16.93270863 183.7778 209.3333 250 0.622318052
Sisyrinchium_bellum 0.01441308 5.477777778 5.117901992 181.6818246 42.91299583 2.954769874 0.448780843 0.176855556 0.018545802 -1.735344864 0.107673785 31.80493389 4.311588188 225.2889 225.2889 315.3334 0.594958509
Sidalcea_malviflora 0.020075948 4.974358974 4.901863202 142.4036892 39.11274955 1.651824981 0.295753475 0.148082051 0.045211395 -1.953346759 0.300665842 20.91557187 8.108682659 163.3333 193 250 0.625836637
Stipa_pulchra 0.01546666 5.28968254 6.055307558 122.3827137 32.67582669 7.352684101 3.027753522 0.149629537 0.031130015 -1.943799376 0.210327301 17.91978995 5.823172424 24 24 315.3334 0.611910294
Here are the dput outputs:
> dput(STraits)
structure(c(0.035049437, 0.029380702, 0.00652489, 0.01441308,
0.020075948, 0.01546666, 14.56355219, 1.245833333, 0.00652489,
5.477777778, 4.974358974, 5.28968254, 11.78670881, 1.076820745,
0.011364841, 5.117901992, 4.901863202, 6.055307558, 180.1322546,
262.1630059, 3.412857143, 181.6818246, 142.4036892, 122.3827137,
99.50427931, 60.49033956, 2.976064883, 42.91299583, 39.11274955,
32.67582669), .Dim = c(6L, 5L), .Dimnames = list(c("Artemisia_californica",
"Eschscholzia_californica", "Mimulus_aurantiacus", "Sisyrinchium_bellum",
"Sidalcea_malviflora", "Stipa_pulchra"), c("Growth_Rate", "Area",
"AreaVar", "SLA", "SLAVar")))
> dput(VegComm)
structure(list(Avena_fatua = c(0, 0.5, 0.333333333, 0.333333333,
1.916666667, 0.333333333, 2.25, 2.5, 2.25, 4.75, 4.75, 2.5, 1.583333333,
0, 2.25), Bromus_diandrus = c(1.25, 0, 4.083333333, 1.25, 0,
3, 4, 0, 0, 0, 1.75, 0, 1.25, 1.25, 2.083333333), Bromus_hordeaceus = c(0,
0.333333333, 0, 3.333333333, 8.833333333, 0, 0, 0, 1.5, 1.75,
0.666666667, 4, 0, 6.666666667, 0.333333333), Festuca_myuros = c(20.83333333,
14.16666667, 27.5, 10.41666667, 10.91666667, 6.25, 6.5, 5.083333333,
2.5, 7, 11.83333333, 21.66666667, 10.66666667, 9.916666667, 10.41666667
), Festuca_perennis = c(7.416666667, 2.25, 3.333333333, 15.83333333,
6.666666667, 2.25, 1.25, 1.25, 1.75, 2.083333333, 4.916666667,
4.666666667, 4.25, 8.25, 4.416666667), Carduus_pycnocephalus = c(0.5,
0, 6.083333333, 5.833333333, 0, 16, 13.75, 0.833333333, 0, 4.666666667,
0, 0.5, 5.75, 0, 1.25), Cirsium_vulgare = c(0, 0, 0, 0, 0.333333333,
0, 0, 0, 2.5, 0, 0, 0, 0, 0, 0), Erodium_cicutarium = c(0, 0,
0, 0, 0, 0, 4.166666667, 5.916666667, 0, 0, 0, 0, 0, 0, 0), Geranium_dissectum = c(21.25,
25, 15.83333333, 25.83333333, 15, 11.41666667, 10.5, 20.83333333,
22.91666667, 18.08333333, 15, 25.41666667, 12.5, 33.33333333,
23.33333333), Helminthotheca_echioides = c(0.333333333, 0, 1.75,
0, 1.25, 0.333333333, 0, 0, 0, 0, 2.833333333, 0, 0, 0, 0), Lactuca_serriola = c(0,
1.916666667, 2.416666667, 1.583333333, 1.75, 0, 0, 0, 0, 0, 1.25,
1.25, 1.5, 0, 0), Medicago_polymorpha = c(6.916666667, 30.41666667,
3.833333333, 10.75, 0, 3.833333333, 6.666666667, 16.66666667,
19.16666667, 4.25, 39.16666667, 7.083333333, 30, 5.833333333,
4.583333333), Oxalis_pes.caprae = c(0, 0, 0, 0, 0, 0, 0, 0, 2.916666667,
0, 0, 0, 0, 0, 0), Raphanus_sativus = c(4.916666667, 3.666666667,
6.666666667, 5.833333333, 3.333333333, 0.833333333, 4.5, 7.583333333,
18.33333333, 13.75, 0.666666667, 14.58333333, 0.333333333, 5.833333333,
5.083333333), Senecio_vulgaris = c(0, 0.833333333, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Sonchus_oleraceus = c(0, 0.833333333,
5.916666667, 1.25, 10.83333333, 1.166666667, 0, 1.333333333,
0, 0, 3.833333333, 0.833333333, 0.333333333, 2.083333333, 0),
Vicia_sativa = c(0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0,
0, 13.33333333), Artemisia_californica = c(4.583333333, 0,
1.25, 0, 0, 0, 1.583333333, 0, 2.916666667, 0, 0, 1.25, 3.833333333,
0, 12.66666667), Baccharis_pilularis = c(18.33333333, 17.91666667,
2.083333333, 2.083333333, 3.333333333, 12, 3.833333333, 4.5,
6.666666667, 0, 4.166666667, 1.25, 0, 1.25, 8.333333333),
Ericameria_ericoides = c(1.25, 0, 0, 0, 0.5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), Mimulus_aurantiacus = c(0.833333333, 0,
2.5, 0, 0, 0.166666667, 4.166666667, 0, 1.25, 0, 0, 0, 0,
0, 0), Bromus_carinatus = c(0.5, 2.083333333, 5.416666667,
0, 4.666666667, 3.333333333, 4.166666667, 0.333333333, 5.5,
0, 0.833333333, 3.333333333, 2.083333333, 2.5, 0), Elymus_triticoides = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
Hordeum_brachyantherum = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), Stipa_pulchra = c(0, 0, 1.25,
0, 0.5, 0, 1.25, 1.75, 4.583333333, 0, 0, 1.25, 4.583333333,
0, 0), Achillea_millefolium = c(7.5, 3.333333333, 5, 3.416666667,
9.166666667, 49.16666667, 22.91666667, 25.41666667, 8.75,
34.33333333, 14.16666667, 4.166666667, 10, 3.75, 12), Eschscholzia_californica = c(1.25,
0, 0, 2.916666667, 0, 0, 1.25, 0, 0, 0, 0.666666667, 1, 0,
1.583333333, 0), Lupinus_variicolor = c(0, 0, 0, 0, 0, 0,
0, 0, 2.5, 0, 0, 0, 0, 0, 0), Echium_candicans = c(0, 0,
0, 0, 0.666666667, 0, 0, 0, 0, 0, 1.25, 0, 0, 0, 0)), .Names = c("Avena_fatua",
"Bromus_diandrus", "Bromus_hordeaceus", "Festuca_myuros", "Festuca_perennis",
"Carduus_pycnocephalus", "Cirsium_vulgare", "Erodium_cicutarium",
"Geranium_dissectum", "Helminthotheca_echioides", "Lactuca_serriola",
"Medicago_polymorpha", "Oxalis_pes.caprae", "Raphanus_sativus",
"Senecio_vulgaris", "Sonchus_oleraceus", "Vicia_sativa", "Artemisia_californica",
"Baccharis_pilularis", "Ericameria_ericoides", "Mimulus_aurantiacus",
"Bromus_carinatus", "Elymus_triticoides", "Hordeum_brachyantherum",
"Stipa_pulchra", "Achillea_millefolium", "Eschscholzia_californica",
"Lupinus_variicolor", "Echium_candicans"), row.names = c("PC1",
"PC2", "PC3", "PC4", "PC5", "PS1", "PS2", "PS3", "PS4", "PS5",
"PW1", "PW2", "PW3", "PW4", "PW5"), class = "data.frame")
> dput(nativetree)
structure(list(edge = structure(c(12L, 13L, 14L, 15L, 16L, 16L,
15L, 14L, 17L, 18L, 18L, 19L, 19L, 17L, 13L, 12L, 20L, 21L, 21L,
20L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 17L, 18L, 4L, 19L, 5L, 6L,
7L, 8L, 20L, 21L, 9L, 10L, 11L), .Dim = c(20L, 2L)), edge.length = c(7.629639,
22, 20.333344, 93.62796, 11.038696, 11.038696, 104.666656, 28.5,
62.899994, 33.600006, 16.800003, 16.800003, 16.800003, 96.5,
147, 41.985199, 51.760712, 60.883728, 60.883728, 112.64444),
Nnode = 10L, node.label = c("", "eudicots", "", "euasterids",
"", "eurosids", "mesopapilionoideaeclade", "lupinus", "",
""), tip.label = c("achillea_millefolium", "ericameria_ericoides",
"mimulus_aurantiacus", "hosackia_gracilis", "lupinus_nanus",
"lupinus_variicolor", "sidalcea_malviflora", "eschscholzia_californica",
"bromus_carinatus", "nassella_pulchra", "sisyrinchium_bellum"
), root.edge = 291.370361), .Names = c("edge", "edge.length",
"Nnode", "node.label", "tip.label", "root.edge"), class = "phylo", order = "cladewise")
The problem is that names of species do not match between STraits and nativetree.
intersect(row.names(STraits), nativetree$tip.label)
# character(0)
R is case-sensitive, so lower case names in the tree will not be recognised as identical to capitalised names in the data matrix. Also, the names of the species differ.
Once the names properly match, the traits need to be ordered as above:
traits <- STraits[nativetree$tip.label,]
and the phylogenetic signal calculated from the new traits table per column:
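library(picante)
# phylosignal() returns a one-row data frame per trait; collect them with rbind
# (assigning rows into an empty data.frame() would not keep the columns)
res = do.call(rbind, lapply(seq_len(ncol(traits)), function(i) {
  phylosignal(x = traits[, i], phy = nativetree, reps = 999)
}))
rownames(res) = colnames(traits)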
Note that I use the data you provided with dput, not the modifications implied with the script. Additionally, check ?phylosignal for syntax.
Continuous characters may be plotted on a phylogeny with the phytools package as shown here.
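A first step towards making the names match can be sketched in base R. The two vectors below are copied from the dput output in the question. Lower-casing fixes the case mismatch, and setdiff lists the species that still have no partner in the tree:

```r
# Row names of STraits and tip labels of nativetree, copied from the dput() output
trait_names <- c("Artemisia_californica", "Eschscholzia_californica",
                 "Mimulus_aurantiacus", "Sisyrinchium_bellum",
                 "Sidalcea_malviflora", "Stipa_pulchra")
tip_labels <- c("achillea_millefolium", "ericameria_ericoides",
                "mimulus_aurantiacus", "hosackia_gracilis", "lupinus_nanus",
                "lupinus_variicolor", "sidalcea_malviflora",
                "eschscholzia_californica", "bromus_carinatus",
                "nassella_pulchra", "sisyrinchium_bellum")

# Nothing matches while the case differs
print(intersect(trait_names, tip_labels))

# Lower-case the trait names, then see what matches and what is left over
fixed     <- tolower(trait_names)
shared    <- intersect(fixed, tip_labels)
unmatched <- setdiff(fixed, tip_labels)
print(shared)
print(unmatched)
```

The leftovers need a manual decision. stipa_pulchra and nassella_pulchra look like two names for the same species, which is an assumption to verify before renaming, and artemisia_californica has no tip in this tree at all, so it would have to be dropped from the trait table or added to the tree. With the real objects, the renaming would be applied through rownames(STraits).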

Create new columns with mutate_if [duplicate]

This question already has an answer here:
Create new variables with mutate_at while keeping the original ones
(1 answer)
Closed 4 years ago.
Let's assume that I have data like below:
structure(list(A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8), B = c(0, 1, 1, 0, 0, 1, 4, 9.2, 9, 0, 0, 1), C = c(2, 9, 0, 0, 0, 9, 0, 0, 0, 0, 0, 8)), .Names = c("A", "B", "C"), row.names = c(NA, -12L), class = "data.frame")
Now I would like to create dummy variables for these columns for which proportion of 0's is greater than 0.5. These dummy variables would have value 0 if there is 0 in original column, and 1 if opposite. How can I accomplish that with dplyr? I was thinking of data %>% mutate_if(~mean(. == 0) > .5, ~ifelse(. == 0, 0, 1)), but this operates in place and I need to create new variables named e.g. A01, C01 and preserve the old ones A and C.
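Wrap the expression in funs() and give it a name; that name is appended to the column name as a suffix (A_01, C_01) and the original columns are kept. rename_all() then strips the underscore. Here df1 is the data frame from the question. Note that funs() is deprecated as of dplyr 0.8.0.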
library(dplyr)
library(stringr)
df1 %>%
mutate_if(~mean(. == 0) > .5, funs(`01` = ifelse(. == 0, 0, 1))) %>%
rename_all(str_remove, "_")
# A B C A01 C01
#1 0 0.0 2 0 1
#2 0 1.0 9 0 1
#3 0 1.0 0 0 0
#4 0 0.0 0 0 0
#5 0 0.0 0 0 0
#6 0 1.0 9 0 1
#7 0 4.0 0 0 0
#8 0 9.2 0 0 0
#9 0 9.0 0 0 0
#10 0 0.0 0 0 0
#11 1 0.0 0 1 0
#12 8 1.0 8 1 1
In dplyr 1.0.0 and later, we can use mutate with across
df1 %>%
mutate(across(where(~ mean(. == 0) > .5),
~ as.integer(. != 0), .names = '{.col}01'))
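For comparison, the same result can be sketched in base R without dplyr. This is an alternative to the two answers above, not a translation of their internals; df1 is built from the data in the question, and mostly_zero and dummies are illustrative names:

```r
# Data from the question
df1 <- data.frame(
  A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8),
  B = c(0, 1, 1, 0, 0, 1, 4, 9.2, 9, 0, 0, 1),
  C = c(2, 9, 0, 0, 0, 9, 0, 0, 0, 0, 0, 8)
)

# Columns in which more than half of the values are 0
mostly_zero <- vapply(df1, function(col) mean(col == 0) > 0.5, logical(1))

# 0 stays 0, everything else becomes 1; new columns get the "01" suffix
dummies <- lapply(df1[mostly_zero], function(col) as.integer(col != 0))
names(dummies) <- paste0(names(dummies), "01")

# Original columns first, dummy columns appended
out <- cbind(df1, as.data.frame(dummies))
print(out)
```

Only A and C pass the 0.5 threshold here, so B gets no dummy column.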

How to convert predicted values into binary variables and save them to a CSV

I have made a decision tree model on test data then used it to predict values in a test dataset.
dtpredict<-predict(ct1, testdat, type="class")
The output looks like:
1 2 3 4 5 6
Class_2 Class_2 Class_6 Class_2 Class_8 Class_2
I want to write a csv to look like:
id, Class_1, Class_2, Class_3, Class_4, Class_5, Class_6, Class_7, Class_8, Class_9
1, 0, 1, 0, 0, 0, 0, 0, 0, 0
2, 0, 1, 0, 0, 0, 0, 0, 0, 0
3, 0, 0, 0, 0, 0, 1, 0, 0, 0
4, 0, 1, 0, 0, 0, 0, 0, 0, 0
5, 0, 0, 0, 0, 0, 0, 0, 1, 0
6, 0, 1, 0, 0, 0, 0, 0, 0, 0
There's a package called dummies that does that well...
install.packages("dummies")
library(dummies)
x <- factor(c("Class_2", "Class_2", "Class_6", "Class_2", "Class_8", "Class_2"),
levels = paste("Class", 1:9, sep="_"))
dummy(x, drop = FALSE)
xClass_1 xClass_2 xClass_3 xClass_4 xClass_5 xClass_6 xClass_7 xClass_8 xClass_9
[1,] 0 1 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 1 0 0 0
[4,] 0 1 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 1 0
[6,] 0 1 0 0 0 0 0 0 0
All that remains is to get rid of the "x" but this should not be too hard with something like this:
d <- dummy(x,drop = FALSE)
colnames(d) <- sub("x", "", colnames(d))
and then to save to disk:
write.csv(d, "somefile.csv", row.names = FALSE)
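If the dummies package is not available, roughly the same table can be built in base R. The sketch below is an alternative technique, not what dummy() does internally. It uses the six predictions from the question, adds the id column the asker wanted, and writes to a temporary file (csv_path is a placeholder for a real path):

```r
# Predictions copied from the question, with all nine classes as factor levels
pred <- factor(c("Class_2", "Class_2", "Class_6", "Class_2", "Class_8", "Class_2"),
               levels = paste("Class", 1:9, sep = "_"))

# Compare every prediction with every level; TRUE/FALSE becomes 1/0
onehot <- outer(as.character(pred), levels(pred), "==") + 0L
colnames(onehot) <- levels(pred)

# Put the id column in front; unused classes stay as all-zero columns
out <- data.frame(id = seq_along(pred), onehot)
print(out)

# Placeholder output path; a temp file keeps the sketch free of side effects
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, row.names = FALSE)
```

With the real predictions, names(dtpredict) would be a natural choice for the id column instead of seq_along(pred).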
Uh, what are the 010101's - logicals? If so they don't make much sense in your example all are class 1 (doesn't correspond to your example dtpredict). If they are logicals....
# if dtpredict is a factor vector, where the values are the classes
# and the names are the boolean values:
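values = as.numeric(as.character(names(dtpredict)))
classes = as.character(dtpredict)
# take the ids from dtpredict itself: as.character() drops the names
x = data.frame(id = names(dtpredict))
for (class in sort(unique(classes))) {
  # 1 where the prediction equals this class, 0 otherwise
  x[, class] = as.numeric(classes == class)
}
write.csv(x, 'blah.csv', row.names = FALSE)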
