Cluster Segregation in R

Clusters have been formed. Now I am wondering whether we can select the elements that belong to a particular cluster id.
Here are the different clusters that were formed:
1 2 3 4 5 6 7 8 9
549 290 1206 103 97 102 2 208 123
10 11 12 13 14 15 16 17 18
17 75 293 981 23 586 25 15 365
For example, I need to choose the elements from cluster 12. How can I do that?
This is the code used to form the clusters:
db <- dbscan(cbind(Final$event_begin_longitude,Final$event_begin_latitude), .0025, minPts = 1, scale = FALSE, method = "raw")

There is no predefined method to access the elements of a cluster. However, you can easily do it yourself. The return value of dbscan has an element named cluster, which is in the same order as your input:
dta <- structure(list(V1 = c(0, 0.04, 0.09, 0.13, 0.17, 0.22, 0.26, 0.3, 0.35, 0.39, 0.43, 0.48, 0.52, 0.57, 0.61, 0.65, 0.7, 0.74, 0.78, 0.83, 0.87, 0.91, 0.96, 1),
V2 = c(0.01, 0.01, 0, 0, 0.08, 0.03, 0.01, 0.05, 0.45, 0.73, 0.91, 0.9, 0.67, 0.77, 0.98, 0.94, 0.86, 1, 0.38, 0.09, 0.01, 0.01, 0, 0)),
.Names = c("V1", "V2"),
row.names = c(NA, -24L),
class = "data.frame")
db <- dbscan::dbscan(dta, .25, minPts = 1)
# Combine values and their cluster
cbind(dta, db$cluster)
# Plot with colored clusters
plot(dta, col = db$cluster, pch = 16)
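Because the cluster vector lines up row by row with the input, picking out one cluster is plain logical indexing. The following is a minimal sketch, assuming a toy data frame `pts` and a hand-written label vector `cluster` that stands in for `db$cluster` (both are made up for illustration and are not real dbscan output):

```r
# Toy points and a stand-in for db$cluster (one label per input row, in input order).
pts <- data.frame(x = c(0.1, 0.2, 5.0, 5.1, 9.0),
                  y = c(0.1, 0.3, 5.2, 5.0, 9.1))
cluster <- c(1L, 1L, 12L, 12L, 3L)  # hypothetical labels, not real dbscan output

in_12  <- pts[cluster == 12, ]      # rows of the input that belong to cluster 12
idx_12 <- which(cluster == 12)      # their row positions in the input

# Alternatively, split the whole input into one data frame per cluster id.
by_cluster <- split(pts, cluster)
by_cluster[["12"]]
```

With the data from the question, the same idea would read `Final[db$cluster == 12, ]`.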

Related

How to use Corrplot with correlation matrix created by hand (of type list)

I used a for loop to create a correlation matrix, because I needed to use polychor to generate polychoric correlations and I was only able to get polychor to correlate two variables at a time. Anyway, I created my own correlation table with the following code:
for(i in 1:ncol(gd2)) {
for (j in 1:ncol(gd2)) {
corVal
The table looks like this:
head(dtnew)
Better Afraid Alive Bored Drop Empty Energy Happy Help Home Hope Memory Satis Spirit Worth TOT
1: 1.00 0.32 0.29 0.39 0.36 0.46 0.25 0.43 0.39 0.13 0.46 0.39 0.50 0.45 0.48 0.67
2: 0.32 1.00 0.25 0.20 0.24 0.30 0.23 0.30 0.43 0.15 0.44 0.28 0.31 0.29 0.34 0.62
3: 0.29 0.25 1.00 0.26 0.28 0.46 0.38 0.60 0.35 0.19 0.41 0.10 0.49 0.53 0.43 0.65
4: 0.39 0.20 0.26 1.00 0.36 0.56 0.31 0.36 0.39 0.16 0.32 0.23 0.39 0.35 0.44 0.67
5: 0.36 0.24 0.28 0.36 1.00 0.44 0.41 0.37 0.43 0.31 0.35 0.22 0.42 0.37 0.40 0.72
6: 0.46 0.30 0.46 0.56 0.44 1.00 0.32 0.55 0.51 0.18 0.45 0.17 0.62 0.52 0.64 0.75
But longer.
Here is the dput()
structure(list(Better = c(1, 0.32, 0.29, 0.39, 0.36, 0.46, 0.25,
0.43, 0.39, 0.13, 0.46, 0.39, 0.5, 0.45, 0.48, 0.67), Afraid = c(0.32,
1, 0.25, 0.2, 0.24, 0.3, 0.23, 0.3, 0.43, 0.15, 0.44, 0.28, 0.31,
0.29, 0.34, 0.62), Alive = c(0.29, 0.25, 1, 0.26, 0.28, 0.46,
0.38, 0.6, 0.35, 0.19, 0.41, 0.1, 0.49, 0.53, 0.43, 0.65), Bored = c(0.39,
0.2, 0.26, 1, 0.36, 0.56, 0.31, 0.36, 0.39, 0.16, 0.32, 0.23,
0.39, 0.35, 0.44, 0.67), Drop = c(0.36, 0.24, 0.28, 0.36, 1,
0.44, 0.41, 0.37, 0.43, 0.31, 0.35, 0.22, 0.42, 0.37, 0.4, 0.72
), Empty = c(0.46, 0.3, 0.46, 0.56, 0.44, 1, 0.32, 0.55, 0.51,
0.18, 0.45, 0.17, 0.62, 0.52, 0.64, 0.75), Energy = c(0.25, 0.23,
0.38, 0.31, 0.41, 0.32, 1, 0.48, 0.37, 0.36, 0.31, 0.14, 0.4,
0.43, 0.38, 0.74), Happy = c(0.43, 0.3, 0.6, 0.36, 0.37, 0.55,
0.48, 1, 0.45, 0.21, 0.49, 0.22, 0.69, 0.84, 0.49, 0.8), Help = c(0.39,
0.43, 0.35, 0.39, 0.43, 0.51, 0.37, 0.45, 1, 0.2, 0.51, 0.32,
0.5, 0.44, 0.6, 0.73), Home = c(0.13, 0.15, 0.19, 0.16, 0.31,
0.18, 0.36, 0.21, 0.2, 1, 0.23, 0.13, 0.13, 0.15, 0.26, 0.63),
Hope = c(0.46, 0.44, 0.41, 0.32, 0.35, 0.45, 0.31, 0.49,
0.51, 0.23, 1, 0.38, 0.48, 0.47, 0.59, 0.73), Memory = c(0.39,
0.28, 0.1, 0.23, 0.22, 0.17, 0.14, 0.22, 0.32, 0.13, 0.38,
1, 0.25, 0.24, 0.31, 0.66), Satis = c(0.5, 0.31, 0.49, 0.39,
0.42, 0.62, 0.4, 0.69, 0.5, 0.13, 0.48, 0.25, 1, 0.66, 0.6,
0.78), Spirit = c(0.45, 0.29, 0.53, 0.35, 0.37, 0.52, 0.43,
0.84, 0.44, 0.15, 0.47, 0.24, 0.66, 1, 0.51, 0.77), Worth = c(0.48,
0.34, 0.43, 0.44, 0.4, 0.64, 0.38, 0.49, 0.6, 0.26, 0.59,
0.31, 0.6, 0.51, 1, 0.77), TOT = c(0.67, 0.62, 0.65, 0.67,
0.72, 0.75, 0.74, 0.8, 0.73, 0.63, 0.73, 0.66, 0.78, 0.77,
0.77, 0.89)), row.names = c(NA, -16L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x000001d7adc21ef0>)
I would like to generate a visual using corrplot. However, when I try, I get an error:
Error in is.finite(tmp) : default method not implemented for type 'list'
My data is indeed of type list. I have tried using 'unlist'. I am not sure what else to try.
There is a problem with your dput() output, possibly because you have a data.table. I can read it by deleting ", .internal.selfref = <pointer: 0x000001d7adc21ef0>" from the last line so that it ends with class = c("data.table", "data.frame")). Printing that out shows a problem with the last row/column (TOT): the bottom value in that column should be 1.00, but it is 0.89. We can drop that row and column and use as.matrix (my mistake in the earlier comment) to convert the data frame to a matrix:
gd3 <- gd2[-16, -16]
corrplot(as.matrix(gd3))
library(corrplot)
M <- cor(df)
head(round(M,2))
corrplot(M, method="number")
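The error message comes from the fact that a data.frame (and a data.table) is stored as a list of columns, so functions that expect a numeric matrix fail on it. A minimal base-R sketch of the conversion, assuming a made-up 3 x 3 correlation table (the names `ct` and `m` and the values are illustrative only):

```r
# A hand-built correlation table stored as a data frame (made-up values).
ct <- data.frame(A = c(1.0, 0.3, 0.2),
                 B = c(0.3, 1.0, 0.5),
                 C = c(0.2, 0.5, 1.0))

is.list(ct)                  # TRUE: a data frame is a list of columns

m <- as.matrix(ct)           # convert to a plain numeric matrix
rownames(m) <- colnames(m)   # label the rows like the columns

# Sanity checks worth running before plotting a hand-built correlation matrix.
isSymmetric(m)
all(diag(m) == 1)

# corrplot::corrplot(m)      # uncomment if the corrplot package is installed
```

The diagonal check is what would flag the 0.89 in the TOT column of the original table.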

iterate over a nested dataframe and print kables in RMarkdown

I have a dataframe with nested dataframes for every year. I'm trying to print each nested dataframe in the result column as a kable using kableExtra in RMarkdown.
The dataframe looks like this:
MWU_Results
# A tibble: 10 x 4
YEAR data.oratios data.kfmaratios result
<dbl> <list<df[,16]>> <list<df[,16]>> <list>
1 2008 [8 × 16] [127 × 16] <tibble [15 × 3]>
2 2009 [8 × 16] [127 × 16] <tibble [15 × 3]>
3 2010 [8 × 16] [127 × 16] <tibble [15 × 3]>
4 2011 [8 × 16] [127 × 16] <tibble [15 × 3]>
5 2012 [8 × 16] [127 × 16] <tibble [15 × 3]>
6 2013 [8 × 16] [127 × 16] <tibble [15 × 3]>
7 2014 [8 × 16] [127 × 16] <tibble [15 × 3]>
8 2015 [8 × 16] [127 × 16] <tibble [15 × 3]>
9 2016 [8 × 16] [127 × 16] <tibble [15 × 3]>
10 2017 [8 × 16] [127 × 16] <tibble [15 × 3]>
What is the best way of doing this?
I've tried this but it doesn't work.
library(tidyverse)
library(devtools)
library(inspectdf)
library(readr)
library(broom)
library(knitr)
library(readxl)
library(skimr)
library(kableExtra)
kable(purrr::map2(MWU_Results$result, MWU_Results$YEAR)) %>%
kable_styling()
I've included sample data for the problem I'm trying to solve. The KFMARatios sample is rather large so I only have data for 2008.
Sample data:
ORATIOS:
structure(list(YEAR = c(2008, 2009, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
2016, 2017), FARM = c("D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I"),
`CURRENT RATIO` = c(0.568022785746452, 0.329854720020037,
0.832073159580644, 0.643108790851367, 25.1454874121908, 14.5975395062397,
5.12537888750377, 5.20160770260219, 7.64257374037806, 2.1580962424325,
1.31703632160198, 0.125166573684741, 0.0680923398879462,
0.100452384108057, 0.0998706900125819, 0.0907309088049343,
0.521537398114045, 0.773433351511582, 0.174099653043861,
0.0804425861373205), `WORKING CAPITAL TO GROSS FARMING INCOME` = c(-0.132573843177753,
-0.419436996986394, -0.031444400685141, -0.114022796397208,
1.22962822585944, 0.397841184148093, 0.239623650110705, 0.295681875030473,
0.502930206605254, 0.41862926754376, 0.0513905118422565,
-0.406448322702947, -0.343476652794216, -0.366684678854441,
-0.27321810774102, -0.306827980132377, -0.173010159020099,
-0.140768598200492, -0.367184395657858, -0.888263538055031
), `DEBT TO TOTAL ASSET RATIO` = c(0.0846892634197993, 0.102127561711337,
0.0750728145035032, 0.0797349374471145, 0.0122514875519798,
0.0162967044282012, 0.0165670856047258, 0.0188732833402721,
0.0150968780472965, 0.0275252089477482, 0.1123291162633,
0.151496340475165, 0.0960615511639704, 0.0985641068765839,
0.119816717131179, 0.121164074695269, 0.0970056997272376,
0.139114211255347, 0.0686657852466466, 0.17098484263781),
`DEBT TO FARM ASSET RATIO` = c(0.0935832744841849, 0.114259598684054,
0.0824723632268821, 0.08365143337564, 0.0129689938858425,
0.0191316764222117, 0.0216751963945452, 0.0225358439285237,
0.0167830935834987, 0.030821228954403, 0.140068283663094,
0.203393535891141, 0.133942894025292, 0.137887444914688,
0.17818477721901, 0.182143899668642, 0.141540075268137, 0.212926916788055,
0.0962721755129152, 0.172706971368876), `EQUITY TO ASSET RATIO` = c(0.915310736580201,
0.897872438288663, 0.924927185496497, 0.920265062552885,
0.98774851244802, 0.983703295571799, 0.983432914395274, 0.981126716659728,
0.984903121952704, 0.972474791052252, 0.8876708837367, 0.848503659524835,
0.90393844883603, 0.901435893123416, 0.880183282868821, 0.878835925304732,
0.902994300272762, 0.860885788744653, 0.931334214753353,
0.82901515736219), `DEBT TO EQUITY RATIO` = c(0.0925251502415636,
0.113743954437438, 0.0811661887343104, 0.0866434472975902,
0.0124034482437396, 0.0165666868267717, 0.0168461776723358,
0.0192363361631072, 0.0153282873318188, 0.0283042904566863,
0.126543652970169, 0.178545300040313, 0.106270013503315,
0.109341227289126, 0.13612700838927, 0.137868823072129, 0.107426702137473,
0.161594270778014, 0.0737284040024573, 0.206250562633691),
`RETURN ON FARM ASSETS` = c(0.0170145283510924, -0.00522377886147693,
0.0237250420249203, 0.00257743472229431, 0.0213365859181817,
0.0244609737360482, 0.0279373354305636, 0.0167869242322396,
0.0572363957452595, -0.00273821783417637, 0.0325678749005671,
-0.0532931806283685, 0.024215521265722, -0.0178636730481072,
0.0189254399688753, 0.00211416100547258, -0.00938005681041073,
0.0501921695586829, 0.0215269026374393, -0.0366154070757298
), `RETURN ON ASSETS` = c(0.0566608458884666, 0.0239054711694685,
0.0264084815850861, 0.00576204495548541, 0.179667366138176,
0.0246773695339781, 0.0246552659101915, 0.020526505137709,
0.0551370549195115, -5.05665725060606e-05, 0.0449112877923212,
-0.0284073208306705, 0.0249952584312144, -0.00283565027536605,
0.0360687362998932, 0.0080927754538142, -0.00331579015236834,
0.0457634829675583, 0.0229640648122328, -0.023016837706958
), `RETURN ON EQUITY` = c(0.0168221490501512, -0.00520020437367425,
0.023349291367177, 0.00266962346623839, 0.0204061503508897,
0.0211814836515069, 0.0217131742563291, 0.0143291246913213,
0.0522749822883451, -0.002514608130223, 0.0294232052511338,
-0.0467824450944562, 0.0192125442012039, -0.0141654371518756,
0.0144583817182496, 0.00160025611694793, -0.00711931632857772,
0.0380917883044123, 0.0164860113123938, -0.0437269454184399
), `FARM OPERATING PROFIT MARGIN RATIO` = c(0.113108456739495,
-0.0455472105804567, 0.199838203998892, 0.0234275923606582,
0.158472105656006, 0.183710042172317, 0.190582976791897,
0.124927655425634, 0.45847835351018, -0.0422031337055503,
0.122121670323183, -0.243017854350921, 0.11277681710057,
-0.0790679940692684, 0.076084143213901, 0.00890894198839937,
-0.0450368591167229, 0.204577659697265, 0.13619384495868,
-0.358538500350435), `ASSET TURNOVER RATIO` = c(0.0153974936379558,
-0.00466912018059027, 0.0215963943475807, 0.00245676120615052,
0.0201561446538819, 0.0208362952730876, 0.0213534502396742,
0.0140586870610039, 0.0514857932558134, -0.00244539301601691,
0.0261181226076402, -0.0396950758641658, 0.0173669574034299,
-0.0127692334904846, 0.0127260258857395, 0.00140636256526249,
-0.00642870206654449, 0.0327926792191383, 0.0153539864000432,
-0.0362503005370359), `OPERATING EXPENSE RATIO` = c(0.671535228245263,
0.773166498456329, 0.607985458258, 0.724432447012029, 0.67336000606662,
0.64796797949329, 0.589032574693052, 0.74988495257417, 0.461775664398759,
0.862141471389961, 0.672863504023624, 0.980455882037588,
0.669661413731221, 0.86690216270866, 0.670033358895902, 0.737005445439968,
0.783494244501376, 0.649760819934915, 0.706382908455109,
1.134948535946), `DEPRECIATION EXPENSE RATIO` = c(0.12660532789432,
0.132732814909818, 0.103826844188336, 0.144629676126728,
0.140059287930065, 0.157478624539652, 0.141620283491016,
0.0919194664659044, 0.0583370508964949, 0.133579109920113,
0.150646135557582, 0.183514628711121, 0.146236932328879,
0.16125312788589, 0.191531747619893, 0.197293862401247, 0.193527787561396,
0.0913809290148264, 0.0946887014018637, 0.145522583536315
), `INTEREST EXPENSE RATIO` = c(0.0887509871209225, 0.139647897214309,
0.0883494935547731, 0.107510284500585, 0.028108600347309,
0.0108433537947408, 0.0787641650240354, 0.0332679255342914,
0.0214089311945663, 0.0464825523954769, 0.0543686900956105,
0.0790473436022124, 0.0713248368393299, 0.0509127034747178,
0.0623507502703033, 0.0567917501703862, 0.068014827053951,
0.0542805913529945, 0.0627345451843474, 0.0780673808681226
), `NET FARM INCOME RATIO` = c(0.113108456739495, -0.0455472105804567,
0.199838203998892, 0.0234275923606582, 0.158472105656006,
0.183710042172317, 0.190582976791897, 0.124927655425634,
0.45847835351018, -0.0422031337055503, 0.122121670323183,
-0.243017854350921, 0.11277681710057, -0.0790679940692684,
0.076084143213901, 0.00890894198839937, -0.0450368591167229,
0.204577659697265, 0.13619384495868, -0.358538500350435)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
KFMARATIOS:
structure(list(YEAR = c(2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008
), FARM = c(11407100, 11484600, 11485100, 11495100, 11801800,
11806400, 11820000, 11885400, 11886000, 11897200, 11897300, 12004500,
12004501, 12303001, 12340101, 12398300, 13050001, 13700201, 13705601,
14089100, 14110900, 14130000, 14130002, 14184100, 14192300, 14330302,
14388200, 14783200, 14786200, 15094200, 15096200, 15584200, 15586100,
15682100, 15683100, 15689100, 16507002, 16580000, 16598200, 16601300
), `CURRENT RATIO` = c(-3, 0, 4.57, 15.94, 2.22, 0, 368.69, 1.86,
9.1, 3.45, 2, 0, 1.58, 6.26, 1.97, 1.54, 0, 3.39, 313.09, 5.59,
5.4, 0, 3.6, 5.78, 3.18, 207.1, 2.36, 28.31, 3.4, 3.68, 0.37,
3.5, 5.6, 13.64, 7.05, 0, 2.23, 0.89, 4.4, 1.11), `WORKING CAPITAL TO GROSS FARMING INCOME` = c(0.783990044655886,
0.939342207539837, 0.468883358203084, 0.53708199556795, 0.429230789973027,
0.856616290636639, 0.46085746623408, 0.019246546772549, 1.04338230212655,
0.318770448161572, 0.398058372857175, 0.506978780306214, 0.263816960947357,
0.4960655740923, 0.101962576323424, 0.220623464476751, 1.12676140487953,
0.533690322762107, 0.685276501922026, 0.703540899065169, 0.660869855557338,
0.71777803486123, 0.319578323479609, 0.722736340214157, 0.286630301648443,
0.818610240507597, 0.184477489966846, 0.78148168000963, 0.357891811040315,
0.289159422203956, -0.125641128630768, 0.392321597654173, 0.561996673317676,
0.353452531903466, 0.683345718597063, 0.804567295215173, 0.307398272114796,
-0.375449779668313, 0.186702574682293, -0.55737251721071), `DEBT TO TOTAL ASSET RATIO` = c(0.02,
0.07, 0.27, 0.37, 0.36, 0, 0.07, 0.37, 0.05, 0.33, 0.42, 0.08,
0.24, 0.34, 0.36, 0.51, 0.01, 0.11, 0.1, 0.07, 0.08, 0.01, 0.32,
0.14, 0.4, 0.52, 0.39, 0.06, 0.21, 0.32, 0.43, 0.52, 0.29, 0.12,
0.17, 0.1, 0.15, 0.87, 0.12, 0.69), `DEBT TO FARM ASSET RATIO` = c(0.0210960466847519,
0.0662443993261916, 0.270051570315789, 0.373240578143398, 0.359031265562519,
0, 0.0678176279710153, 0.369000587598404, 0.04831743727994, 0.33065743433488,
0.41680939549244, 0.0851067276205844, 0.245359588845858, 0.337912727823456,
0.356607488633417, 0.508663012923272, 0.0126098421632802, 0.10665178903834,
0.105106247793806, 0.0698908293989529, 0.0818483764283224, 0.00750932570017385,
0.319501072718455, 0.136757510256717, 0.400840648545665, 0.516753083750126,
0.389587948103612, 0.0577299469460252, 0.206521419569117, 0.315261383020663,
0.43256943562472, 0.520491208048298, 0.290288373137576, 0.120229338185664,
0.173192986515349, 0.104536048245734, 0.151997186500475, 0.868552025800098,
0.123958600776313, 0.692195974317741), `EQUITY TO ASSET RATIO` = c(0.98536882817945,
0.944215770167283, 0.736537746555766, 0.729860554651407, 0.642228778874089,
1, 0.94228148558872, 0.630999412401596, 0.95168256272006, 0.66934256566512,
0.592693562701164, 0.914893272379416, 0.813956784138156, 0.688995447780108,
0.725420084109645, 0.545241148972386, 0.988536562104007, 0.900124825958172,
0.90344241855196, 0.930936390469265, 0.92060316189968, 0.992490674299826,
0.758518009863028, 0.881474617998699, 0.600468426703118, 0.553595877267449,
0.667405715763261, 0.942270053053975, 0.842757601135073, 0.708413078986436,
0.56743056437528, 0.533041296742996, 0.743304732269968, 0.88511363093375,
0.831970255984885, 0.904591907651469, 0.876296809602567, 0.131447974199902,
0.890119750534961, 0.307804025682259), `DEBT TO EQUITY RATIO` = c(0.02,
0.07, 0.37, 0.6, 0.56, 0, 0.07, 0.58, 0.05, 0.49, 0.72, 0.09,
0.32, 0.51, 0.55, 1.04, 0.01, 0.12, 0.12, 0.08, 0.09, 0.01, 0.47,
0.16, 0.67, 1.07, 0.64, 0.06, 0.26, 0.46, 0.76, 1.08, 0.41, 0.14,
0.21, 0.12, 0.18, 6.61, 0.14, 2.25), `RETURN ON FARM ASSETS` = c(0.374484329540697,
0.0498819566035984, 0.181954755022922, 0.193161758267218, 0.0473627311001023,
0.327305563029612, 0.603037930741254, -0.0156737997438482, 0.10397858597475,
0.10789191406389, 0.180771277730155, 0.150007797084, 0.174196776278552,
0.120122100767257, 0.298096858936563, 0.0517125227815447, 0.111597414809764,
0.185024421154621, 0.239979711875599, 0.0808784377916965, 0.201436668181771,
0.135024051506645, 0.251851638310215, 0.103285147847268, 0.14207589091784,
0.247675592658745, 0.100067311604358, 0.308209326567443, 0.154555623216289,
0.174464204907127, 0.00457531564104158, 0.098141499884622, 0.251116584438097,
0.153198476415449, 0.183688952743912, 0.0838032420725189, 0.169288085631256,
0.0279120898963428, 0.147329195543669, 0.034801030826966), `RETURN ON ASSETS` = c(0.260063898261748,
0.0581159003954688, 0.186586004612603, 0.144217266907855, 0.0471965084015535,
0.203276288956977, 0.522691591931166, -0.0156737997438482, 0.104160943214225,
0.110451790466256, 0.178360409188664, 0.150089138729099, 0.134029707705111,
0.120565772385725, 0.229528019076799, 0.0697390623585822, 0.10198296142804,
0.192570247620748, 0.245119340816501, 0.115758491252085, 0.195889106965538,
0.138158444053898, 0.231674956423303, 0.0966027636728098, 0.141766843553559,
0.215113054221126, 0.135495862386357, 0.314351616201071, 0.133076845003381,
0.168262801476855, 0.00457531564104158, 0.0986664889666124, 0.242490501823923,
0.152124266735103, 0.201716489655936, 0.0786665142081486, 0.162659186669921,
0.0279454048764536, 0.134992616527726, 0.034801030826966), `RETURN ON EQUITY` = c(0.263580248064511,
0.0444871419402714, 0.241012793134955, 0.191549228659637, 0.0734886226747657,
0.186089113513671, 0.544673844576945, -0.0248396423765173, 0.109257634896201,
0.161190875342999, 0.298045789765326, 0.163962072531003, 0.162274234481587,
0.160460729376603, 0.31640703656353, 0.0847926292565323, 0.102628180483108,
0.192493344561337, 0.244023637469295, 0.0858503015508329, 0.212255623707772,
0.13604566269794, 0.250952374400512, 0.101551944180348, 0.235835707060263,
0.386487527831846, 0.128000474163853, 0.327092350614891, 0.139632557156543,
0.227780755169442, 0.0080632167674627, 0.165179790324242, 0.298742298993181,
0.165391606109475, 0.214205739228479, 0.084552656304169, 0.157224605882577,
0.212343248849882, 0.146717984157146, 0.113062299136044), `FARM OPERATING PROFIT MARGIN RATIO` = c(0.55,
0.18, 0.29, 0.33, 0.12, 0.46, 0.24, -0.1, 0.14, 0.23, 0.2, 0.22,
0.44, 0.25, 0.33, 0.13, 0.36, 0.44, 0.33, 0.05, 0.32, 0.16, 0.52,
0.3, 0.24, 0.35, 0.2, 0.32, 0.38, 0.29, 0.02, 0.24, 0.36, 0.25,
0.4, 0.18, 0.32, -0.01, 0.08, -0.01), `ASSET TURNOVER RATIO` = c(0.64,
0.2, 0.55, 0.58, 0.29, 0.64, 1.88, 0.39, 0.31, 0.34, 0.72, 0.58,
0.38, 0.41, 0.96, 0.38, 0.26, 0.4, 0.62, 0.41, 0.55, 0.67, 0.53,
0.29, 0.51, 0.86, 0.38, 0.94, 0.4, 0.54, 0.65, 0.49, 0.7, 0.49,
0.41, 0.3, 0.47, 0.62, 0.87, 0.79), `OPERATING EXPENSE RATIO` = c(0.29,
0.57, 0.61, 0.52, 0.69, 0.48, 0.57, 0.89, 0.64, 0.57, 0.72, 0.62,
0.45, 0.55, 0.52, 0.69, 0.49, 0.43, 0.5, 0.75, 0.53, 0.69, 0.38,
0.54, 0.6, 0.54, 0.55, 0.56, 0.5, 0.57, 0.87, 0.61, 0.54, 0.63,
0.44, 0.61, 0.56, 0.82, 0.77, 0.83), `DEPRECIATION EXPENSE RATIO` = c(0.08,
0.16, 0.01, 0.05, 0.07, 0.02, 0.03, 0.09, 0.02, 0.06, 0.03, 0.1,
0.04, 0.08, 0.06, 0.1, 0.06, 0.05, 0.03, 0.04, 0.08, 0.09, 0.04,
0.06, 0.05, 0.01, 0.11, 0.05, 0.04, 0.06, 0.05, 0.08, 0.04, 0.03,
0.06, 0.08, 0.01, 0.1, 0.05, 0.04), `INTEREST EXPENSE RATIO` = c(0.01,
0, 0.03, 0.07, 0.08, 0, 0, 0.06, 0, 0.02, 0.04, 0.01, 0.02, 0.06,
0.03, 0.06, 0, 0, 0.03, 0.01, 0.02, 0, 0.06, 0.01, 0.05, 0, 0.07,
0, 0.04, 0.01, 0.08, 0.1, 0.04, 0.02, 0.03, 0.02, 0.04, 0.04,
0.01, 0.09), `NET FARM INCOME RATIO` = c(0.62, 0.27, 0.35, 0.36,
0.16, 0.5, 0.39, -0.04, 0.34, 0.35, 0.22, 0.28, 0.49, 0.31, 0.39,
0.15, 0.45, 0.51, 0.44, 0.2, 0.37, 0.21, 0.52, 0.39, 0.29, 0.45,
0.27, 0.39, 0.43, 0.36, 0.01, 0.21, 0.37, 0.32, 0.47, 0.28, 0.38,
0.05, 0.17, 0.05)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-40L))

How to subset columns based on value in a different column?

EDITED:
I have a dataframe that stores information about when a particular assessment happened ('when'). This assessment happened at different times (t1 - t3), which vary by participant.
The dataframe also contains all the assessments ever completed by every participant (including the one referenced in the 'when' column). I only want the assessment information represented in the 'when' column. So if the number is 1, I want to keep all the data related to that assessment and remove all the data that was not collected at that assessment. Please note that I have many more variables in my actual data set than are represented in this shortened data set so any solution should not rely on repeating variable names.
Here's the best I can do. The problem with this solution is that it would have to be repeated for every variable name.
df2 <- mutate(.data = df,
a1G_when = if_else(when == 1, a1G_t1, NA_real_))
# here is what we start with
df <- structure(list(id = 1:10, when = c(1, 3, 2, 1, 2, 1, 3, 2, 3,
1), a1G_t1 = c(0.78, 0.21, 0.04, 0.87, 0.08, 0.25, 0.9, 0.77,
0.51, 0.5), Stqo_t1 = c(0.68, 0.77, 0.09, 0.66, 0.94, 0.05, 0.97,
0.92, 1, 0.04), Twcdz_t1 = c(0.95, 0.41, 0.29, 0.54, 0.06, 0.45,
0.6, 0.24, 0.17, 0.55), Kgh_t1 = c(0.25, 0.86, 0.37, 0.34, 0.97,
0.75, 0.73, 0.68, 0.37, 0.66), `2xWX_t1` = c(0.47, 0.52, 0.23,
0.5, 0.88, 0.71, 0.21, 0.98, 0.76, 0.21), `2IYnS_t1` = c(0.32,
0.75, 0.03, 0.46, 0.89, 0.71, 0.51, 0.83, 0.34, 0.32), a1G_t2 = c(0.97,
0.01, 0.58, 0.33, 0.58, 0.37, 0.76, 0.33, 0.39, 0.56), Stqo_t2 = c(0.78,
0.42, 0.5, 0.69, 0.09, 0.72, 0.84, 0.94, 0.46, 0.83), Twcdz_t2 = c(0.62,
0.34, 0.72, 0.62, 0.8, 0.26, 0.3, 0.88, 0.42, 0.53), Kgh_t2 = c(0.99,
0.66, 0.02, 0.17, 0.51, 0.03, 0.03, 0.74, 0.1, 0.26), `2xWX_t2` = c(0.68,
0.97, 0.56, 0.27, 0.66, 0.71, 0.96, 0.24, 0.37, 0.76), `2IYnS_t2` = c(0.24,
0.88, 0.58, 0.31, 0.8, 0.92, 0.91, 0.9, 0.55, 0.52), a1G_t3 = c(0.73,
0.6, 0.66, 0.06, 0.33, 0.34, 0.09, 0.44, 0.73, 0.56), Stqo_t3 = c(0.28,
0.88, 0.56, 0.75, 0.85, 0.33, 0.88, 0.4, 0.63, 0.61), Twcdz_t3 = c(0.79,
0.95, 0.41, 0.07, 0.99, 0.06, 0.74, 0.17, 0.89, 0.4), Kgh_t3 = c(0.06,
0.52, 0.35, 0.91, 0.43, 0.74, 0.72, 0.96, 0.39, 0.4), `2xWX_t3` = c(0.25,
0.09, 0.64, 0.32, 0.15, 0.14, 0.18, 0.33, 0.97, 0.6), `2IYnS_t3` = c(0.92,
0.49, 0.09, 0.95, 0.3, 0.83, 0.82, 0.56, 0.29, 0.36)), row.names = c(NA,
-10L), class = "data.frame")
# here is an example of what I want with the first column. I would also want all other repeating columns to look like this (Stqo_when, Twcdz_when, etc.)
id when a1G_when
1 1 1 0.78
2 2 3 0.60
3 3 2 0.58
4 4 1 0.87
5 5 2 0.58
6 6 1 0.25
7 7 3 0.09
8 8 2 0.33
9 9 3 0.73
10 10 1 0.50
Using data.table, you could do something like:
library(data.table)
cols <- unique(paste0(gsub("_.*", "", setdiff(names(df), c("id", "when"))), "_when"))
setDT(df)[
, (cols) := lapply(cols, function(x) paste0(gsub("_.*", "", x), "_t", when))][
, (cols) := lapply(cols, function(x) as.character(.SD[[get(x)]])), by = cols][
, (cols) := lapply(.SD, as.numeric), .SDcols = cols
]
Output (only first 10 rows and only relevant when columns):
a1G_when Stqo_when Twcdz_when Kgh_when 2xWX_when 2IYnS_when
1: 0.78 0.68 0.95 0.25 0.47 0.32
2: 0.60 0.88 0.95 0.52 0.09 0.49
3: 0.58 0.50 0.72 0.02 0.56 0.58
4: 0.87 0.66 0.54 0.34 0.50 0.46
5: 0.58 0.09 0.80 0.51 0.66 0.80
6: 0.25 0.05 0.45 0.75 0.71 0.71
7: 0.09 0.88 0.74 0.72 0.18 0.82
8: 0.33 0.94 0.88 0.74 0.24 0.90
9: 0.73 0.63 0.89 0.39 0.97 0.29
10: 0.50 0.04 0.55 0.66 0.21 0.32
Here is an opportunity to use the new tidyr::pivot_longer. We can use this to reshape the data so that var and t are in their own columns, filter to just the rows with the data we want (i.e. where t equals when) and then pivot the data back out to wide.
library(tidyverse)
df1 <- structure(list(ID = c(101, 102, 103, 104, 105), when = c(1, 2, 3, 1, 2), var1_t1 = c(5, 6, 4, 5, 6), var2_t1 = c(2, 3, 4, 2, 3), var1_t2 = c(7, 8, 9, 7, 8), var2_t2 = c(5, 4, 5, 4, 5), var1_t3 = c(3, 4, 3, 4, 3), var2_t3 = c(6, 7, 6, 7, 6)), row.names = c(NA, 5L), class = "data.frame")
df1 %>%
pivot_longer(
cols = starts_with("var"),
names_to = c("var", "t"),
names_sep = "_t",
values_to = "val",
names_transform = list(t = as.numeric)  # released tidyr (>= 1.1.0) converts with names_transform; the pre-release argument was col_ptypes
) %>%
filter(when == t) %>%
select(-t) %>%
pivot_wider(names_from = "var", values_from = "val")
#> # A tibble: 5 x 4
#> ID when var1 var2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 101 1 5 2
#> 2 102 2 8 4
#> 3 103 3 3 6
#> 4 104 1 5 2
#> 5 105 2 8 5
Created on 2019-07-16 by the reprex package (v0.3.0)
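For comparison, the same selection can be sketched in base R without reshaping. This is only an illustration under stated assumptions: the toy data frame `df`, the stems `a` and `b`, and the helper names `stems`, `picked` and `out` are all made up, and the code assumes every measurement column is named `<stem>_t<number>`.

```r
# Toy data: two repeated measures (a, b) at three time points.
df <- data.frame(
  id   = 1:4,
  when = c(1, 3, 2, 1),
  a_t1 = c(0.1, 0.2, 0.3, 0.4),
  a_t2 = c(1.1, 1.2, 1.3, 1.4),
  a_t3 = c(2.1, 2.2, 2.3, 2.4),
  b_t1 = c(10, 20, 30, 40),
  b_t2 = c(11, 21, 31, 41),
  b_t3 = c(12, 22, 32, 42)
)

# Derive the variable stems from the column names, so nothing is hard-coded.
stems <- unique(sub("_t[0-9]+$", "", setdiff(names(df), c("id", "when"))))

# For each stem, look up the column named by `when` row by row.
picked <- lapply(stems, function(s) {
  cols <- paste0(s, "_t", df$when)   # column wanted for each row
  vapply(seq_len(nrow(df)), function(i) df[[cols[i]]][i], numeric(1))
})
names(picked) <- paste0(stems, "_when")

out <- cbind(df[c("id", "when")], as.data.frame(picked))
out
```

The row-wise lookup is easy to read but slow on large data; the reshaping approaches above scale better.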

Reading and reconstructing symmetric matrix with R

I need to read the following matrix from a file. It's a symmetric correlation matrix, so half of it is omitted.
1.00
0.49 1.00
0.53 0.57 1.00
0.49 0.46 0.48 1.00
0.51 0.53 0.57 0.57 1.00
0.33 0.30 0.31 0.24 0.38 1.00
0.32 0.21 0.23 0.22 0.32 0.43 1.00
0.20 0.16 0.14 0.12 0.17 0.27 0.33 1.00
0.19 0.08 0.07 0.19 0.23 0.24 0.26 0.25 1.00
0.30 0.27 0.24 0.21 0.32 0.34 0.54 0.46 0.28 1.00
0.37 0.35 0.37 0.29 0.36 0.37 0.32 0.29 0.30 0.35 1.00
0.21 0.20 0.18 0.16 0.27 0.40 0.58 0.45 0.27 0.59 0.31 1.00
Currently, I'm using
data1 <- na.omit(as.vector(t(read.table('triangle-data.txt', fill = TRUE))))
pt <- 12
R <- matrix(0, nrow = pt , ncol = pt)
for(i in 1:pt){
R[i, 1:i] <- data1[(i*(i-1)/2 + 1): (i*(i+1)/2)]
}
R <- R + t(R) - diag(rep(1, pt))
R
The result is
> dput(R)
structure(c(1, 0.49, 0.53, 0.49, 0.51, 0.33, 0.32, 0.2, 0.19,
0.3, 0.37, 0.21, 0.49, 1, 0.57, 0.46, 0.53, 0.3, 0.21, 0.16,
0.08, 0.27, 0.35, 0.2, 0.53, 0.57, 1, 0.48, 0.57, 0.31, 0.23,
0.14, 0.07, 0.24, 0.37, 0.18, 0.49, 0.46, 0.48, 1, 0.57, 0.24,
0.22, 0.12, 0.19, 0.21, 0.29, 0.16, 0.51, 0.53, 0.57, 0.57, 1,
0.38, 0.32, 0.17, 0.23, 0.32, 0.36, 0.27, 0.33, 0.3, 0.31, 0.24,
0.38, 1, 0.43, 0.27, 0.24, 0.34, 0.37, 0.4, 0.32, 0.21, 0.23,
0.22, 0.32, 0.43, 1, 0.33, 0.26, 0.54, 0.32, 0.58, 0.2, 0.16,
0.14, 0.12, 0.17, 0.27, 0.33, 1, 0.25, 0.46, 0.29, 0.45, 0.19,
0.08, 0.07, 0.19, 0.23, 0.24, 0.26, 0.25, 1, 0.28, 0.3, 0.27,
0.3, 0.27, 0.24, 0.21, 0.32, 0.34, 0.54, 0.46, 0.28, 1, 0.35,
0.59, 0.37, 0.35, 0.37, 0.29, 0.36, 0.37, 0.32, 0.29, 0.3, 0.35,
1, 0.31, 0.21, 0.2, 0.18, 0.16, 0.27, 0.4, 0.58, 0.45, 0.27,
0.59, 0.31, 1), .Dim = c(12L, 12L))
This is too unwieldy, and I need to hard-code its size. Is there a more convenient way?
I used a combination of readLines and strsplit to read the file
a <- sapply(sapply(lapply(readLines("triangle.txt"),
function(x) strsplit(x, " ")), "[", 1),
function(x) na.omit(as.numeric(x)))
and rbind to cast it into a square matrix
A <- do.call("rbind", a)
Despite the warning, the lower part of the matrix is correctly read from the file, but the upper part is all messed up, which I fixed with a little dirty trick
A[upper.tri(A)] <- 0
A <- A + t(A) - diag(nrow(A))
EDIT
Another simpler solution based on the vector of the coefficients:
data1 <- na.omit(as.vector(t(read.table('triangle.txt', fill = TRUE))))
n <- Re(polyroot(c(-length(data1), 1/2, 1/2)))[1]
A <- matrix(0, n, n)
A[upper.tri(A, diag = TRUE)] <- data1
A <- A + t(A) - diag(n)
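A variant of this idea, sketched under a few assumptions: the size is recovered with the closed form n = (sqrt(8k + 1) - 1) / 2 for k values, which avoids depending on the order of the roots returned by polyroot, and the lower triangle is mirrored directly, so the diagonal does not have to be 1. The inline string `txt` stands in for the file and holds only the first three rows; with a real file, `scan("triangle.txt", quiet = TRUE)` would take its place.

```r
# First three rows of the triangle, inlined so the sketch is self-contained.
txt <- "1.00
0.49 1.00
0.53 0.57 1.00"

vals <- scan(text = txt, quiet = TRUE)   # all numbers, row by row

# k = n(n + 1)/2 values imply n = (sqrt(8k + 1) - 1) / 2 rows.
n <- (sqrt(8 * length(vals) + 1) - 1) / 2
stopifnot(n == round(n))                 # fail early if the count is not triangular

A <- matrix(0, n, n)
# Rows of the lower triangle are the columns of the upper triangle,
# so a column-wise fill of the upper triangle puts each value in place.
A[upper.tri(A, diag = TRUE)] <- vals
# Mirror the upper triangle into the lower one.
A[lower.tri(A)] <- t(A)[lower.tri(A)]
A
```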

normality test and different order of predictors in two-way aov

I have the data shown below:
Location=c("lcn","lcn","lcn","etb","lcs","bbs","lcn","lcs","bbs","lcn","lcs","bbs","lcs","lcs","lcn",
"bbs","etb","bbs","etb","etb","lcs","lcn","lcn","bbs","bbs","etb","bbs","etb","bbs","bbs",
"lcs","lcs","lcs","lcs","lcs","lcn","lcs","etb","lcn","lcn","etb","etb","etb","etb","lcn",
"bbs","bbs","lcs","etb","lcs","bbs","bbs","lcs","bbs","lcs","lcn","lcn","lcn","etb","lcn",
"lcs","bbs","etb","etb","etb","bbs","etb","bbs","etb","etb","bbs","lcs")
Treatment=c(rep("control",each=21),rep("foam",each=20),rep("hail",each=17),rep("teda",each=14))
Growth=c( 0.24, -0.05, 0.19, 1.02, 0.84, 0.11, 0.13, 0.08, -0.18, -0.06,
0.38, 1.04, 0.55, -1.71, 0.24, 0.05, 0.49, -0.41, 0.70, 0.30,
1.03, 0.14, 0.73, 0.56, 0.56, 0.98, 0.53, 0.27, 0.32, 0.95,
0.10, 0.55, 1.18, 0.49, 0.58, 0.36, 0.18, 0.30, 1.71, 0.65,
0.69, 0.68, 0.66, 1.24, 0.47 , 1.28, 0.60, 1.01, 0.76, 1.35,
1.02, 0.75, 0.40, 0.37, 0.46, 0.47, 0.25, 0.61, 0.63, 0.86,
0.92, 0.09, 1.66, 0.88, 0.68, 1.02, 1.17, 1.18, 1.71, 1.01,
0.42, 0.56)
Mang=data.frame(Location,Treatment,Growth)
I want to use a two-way ANOVA to see the influence of Location and Treatment on Growth. I have two questions:
(1) If some levels of the predictors do not pass the normality test (shown below), can I still do the ANOVA?
> shapiro.test(subset(Mang,Location=="lcn")[,3])$p.value
[1] 0.01317841
> shapiro.test(subset(Mang,Treatment=="control")[,3])$p.value
[1] 0.008312405
(2) Why are the results different when the order of the predictors is changed in the ANOVA?
> test1=aov(Growth~Location+Treatment,data=Mang)
> summary(test1)
Df Sum Sq Mean Sq F value Pr(>F)
Location 3 1.713 0.5710 2.708 0.05235 .
Treatment 3 3.495 1.1650 5.524 0.00193 **
Residuals 65 13.707 0.2109
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> test2=aov(Growth~Treatment+Location,data=Mang)
> summary(test2)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 3 4.402 1.4673 6.958 0.000393 ***
Location 3 0.806 0.2687 1.274 0.290658
Residuals 65 13.707 0.2109
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
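On question (2), one thing that can be checked directly: aov() reports sequential sums of squares, so each term is adjusted only for the terms listed before it. When the group sizes are unequal, as they are for Treatment here, the order decides how the shared variation is split between the two factors, while the residual line and the total stay the same (in the output above, 1.713 + 3.495 and 4.402 + 0.806 both give 5.208). A small sketch on made-up data, where the data frame `d` and its values are illustrative and not the Mang data:

```r
# Unbalanced toy design: factors A and B with unequal cell counts.
d <- data.frame(
  A = factor(c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "c")),
  B = factor(c("x", "x", "y", "z", "x", "y", "y", "x", "y", "z", "z", "z")),
  y = c(1.2, 0.8, 1.9, 2.4, 2.1, 3.0, 2.7, 3.3, 3.9, 4.6, 4.1, 4.4)
)

fit_ab <- aov(y ~ A + B, data = d)   # A first, then B given A
fit_ba <- aov(y ~ B + A, data = d)   # B first, then A given B

# Sums of squares in table order: first term, second term, residuals.
ss_ab <- summary(fit_ab)[[1]][["Sum Sq"]]
ss_ba <- summary(fit_ba)[[1]][["Sum Sq"]]
ss_ab
ss_ba

# drop1() tests each term adjusted for the other, so it does not depend on order.
drop1(fit_ab, test = "F")
```

The drop1() table is one order-independent alternative; whether it is the right analysis for the Mang data is a separate question.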
