Create matrices from directed dyadic data in R

I have a directed dyadic data frame that looks like this.
ccode.a ccode.b year mindist int.a2b
167603 220 570 1976 8.985324 0
624316 781 770 1976 7.914206 0
593896 740 620 1976 9.128673 0
669443 900 660 1976 9.235735 1
323404 434 20 1976 8.764419 0
353101 451 432 1976 5.390160 0
37109 53 700 1976 9.373724 0
624853 790 2 1976 8.949047 0
242472 355 53 1976 9.030233 0
35129 53 350 1976 9.006963 0
481129 600 140 1976 8.377329 0
621802 781 310 1976 8.885368 0
236655 350 600 1976 7.587584 0
192503 290 950 1976 9.636472 1
464722 580 94 1976 9.551535 0
550661 694 42 1976 9.377167 0
585022 712 625 1976 8.586424 0
637007 812 2 1976 8.902645 0
539818 678 572 1976 8.402542 0
690214 950 900 1976 7.859377 1
135804 160 135 1976 6.314121 0
554291 694 811 1976 8.603649 0
496453 620 680 1976 7.565657 0
68160 90 220 1976 9.003781 0
605932 770 482 1976 8.355514 0
509185 640 660 1976 4.873909 0
24928 42 830 1976 9.774045 0
454705 570 666 1976 8.740049 0
92800 100 51 1976 6.631003 1
492025 616 651 1976 7.140152 0
598335 750 663 1976 7.997375 0
485329 615 130 1976 8.951390 0
330093 435 520 1976 8.560956 0
74135 91 570 1976 9.453192 0
465351 580 235 1976 8.989404 1
227129 345 165 1976 9.296795 0
488046 615 696 1976 8.298939 0
381548 483 375 1976 8.325534 0
237750 350 840 1976 9.103709 0
402138 510 265 1976 8.688151 0
372394 475 950 1976 9.798338 0
216445 338 570 1976 8.890926 1
598308 750 660 1976 8.082419 0
613237 775 235 1976 9.100852 0
310790 420 42 1976 8.617836 0
611272 771 696 1976 8.065013 0
183450 235 551 1976 8.739090 0
659939 840 590 1976 8.879307 0
506308 640 70 1976 9.232063 0
188756 290 91 1976 9.120557 1
I want to create a matrix of all of the pairs of ccode.a and ccode.b,
d.sender <- matrix(0, nrow = 192, ncol = 192, dimnames = list(unique(dyad$ccode.a), unique(dyad$ccode.a)))
Then I want to assign the value of int.a2b (I'll have to do this for a bunch of variables) to the appropriate cell in the matrix, separately for each year in the data set. Since the data frame is directed dyadic, I'll also have a receiver matrix of the same dimensions that mirrors the sender matrix I'm trying to create now.

If I am correct in understanding what you want, here is one way:
x <- data.frame(a=(1:3),b=(4:6),val=c(8,3,7))
library(reshape2)
acast(x, a~b, value.var="val")
# 4 5 6
#1 8 NA NA
#2 NA 3 NA
#3 NA NA 7
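acast() works well when every row describes a distinct (ccode.a, ccode.b) cell. For the per-year sender matrices described in the question, a base R alternative is to start from a zero matrix and fill it by indexing with a two-column character matrix of country codes, once per year. This is a sketch on a small made-up `dyad` data frame that reuses the question's column names; the helper `make_sender()` is hypothetical, and treating the receiver matrix as the transpose of the sender matrix is an assumption about what "mirrors" means.

```r
# Small made-up example with the question's column names
dyad <- data.frame(
  ccode.a = c(2, 20, 2, 200, 20),
  ccode.b = c(20, 2, 200, 2, 200),
  year    = c(1976, 1976, 1976, 1977, 1977),
  int.a2b = c(1, 0, 1, 1, 0)
)

# Every country code that appears on either side of a dyad
codes <- as.character(sort(unique(c(dyad$ccode.a, dyad$ccode.b))))

# Hypothetical helper: one square matrix, rows = sender (ccode.a),
# columns = receiver (ccode.b), cells = the chosen variable
make_sender <- function(d, var, codes) {
  m <- matrix(0, nrow = length(codes), ncol = length(codes),
              dimnames = list(codes, codes))
  # A two-column character matrix selects cells by row and column name
  idx <- cbind(as.character(d$ccode.a), as.character(d$ccode.b))
  m[idx] <- d[[var]]
  m
}

# One sender matrix per year, named by year
sender <- lapply(split(dyad, dyad$year), make_sender,
                 var = "int.a2b", codes = codes)

# Assumption: the receiver matrix is the transpose of the sender matrix
receiver <- lapply(sender, t)

sender[["1976"]]
```

Each element of `sender` is named by year, so `sender[["1976"]]["2", "20"]` reads the 1976 value of int.a2b for the dyad from 2 to 20. Countries that have no dyad in a given year still get a row and column of zeros, and passing a different `var` builds the matrices for the other variables.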

Related

Transpose every n columns into new rows in R

I have a data frame that looks like this:
Frame RightEye_x RightEye_y RightEye_z LeftEye_x LeftEye_y LeftEye_z
0 773 490 0 778 322 0
1 780 490 0 789 334 0
2 781 490 0 792 334 0
3 783 337 0 797 334 1
And I would like to transform it into:
BodyPart Frame x y z
RightEye 0 773 490 0
RightEye 1 780 490 0
RightEye 2 781 490 0
RightEye 3 783 337 0
LeftEye 0 778 322 0
LeftEye 1 789 334 0
LeftEye 2 792 334 0
LeftEye 3 797 334 1
Using the melt(...) method in data.table:
library(data.table)
setDT(df)
result <- melt(df, measure.vars = patterns(c('_x', '_y', '_z')), value.name = c('x', 'y', 'z'))
result[, variable:=c('RightEye', 'LeftEye')[variable]]
result
## Frame variable x y z
## 1: 0 RightEye 773 490 0
## 2: 1 RightEye 780 490 0
## 3: 2 RightEye 781 490 0
## 4: 3 RightEye 783 337 0
## 5: 0 LeftEye 778 322 0
## 6: 1 LeftEye 789 334 0
## 7: 2 LeftEye 792 334 0
## 8: 3 LeftEye 797 334 1
We can also use base R's reshape(), as below:
reshape(
setNames(df, gsub("(.*)_(.*)", "\\2_\\1", names(df))),
direction = "long",
idvar = "Frame",
varying = -1,
timevar = "BodyPart",
sep = "_"
)
which gives
Frame BodyPart x y z
0.RightEye 0 RightEye 773 490 0
1.RightEye 1 RightEye 780 490 0
2.RightEye 2 RightEye 781 490 0
3.RightEye 3 RightEye 783 337 0
0.LeftEye 0 LeftEye 778 322 0
1.LeftEye 1 LeftEye 789 334 0
2.LeftEye 2 LeftEye 792 334 0
3.LeftEye 3 LeftEye 797 334 1
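Both answers rely on the column names following a `<BodyPart>_<axis>` pattern. As a further base R sketch under that same assumption, you can loop over the body-part prefixes and stack one x/y/z block per prefix with `do.call(rbind, ...)`, which also produces the `BodyPart` column the question asked for. The data frame below re-creates the question's example.

```r
df <- data.frame(
  Frame = 0:3,
  RightEye_x = c(773, 780, 781, 783), RightEye_y = c(490, 490, 490, 337),
  RightEye_z = c(0, 0, 0, 0),
  LeftEye_x = c(778, 789, 792, 797), LeftEye_y = c(322, 334, 334, 334),
  LeftEye_z = c(0, 0, 0, 1)
)

# Body-part prefixes: the part of each column name before the "_"
parts <- unique(sub("_.*$", "", names(df)[-1]))

# For each prefix, take its x/y/z columns, rename them, and stack the blocks
long <- do.call(rbind, lapply(parts, function(p) {
  block <- df[paste0(p, c("_x", "_y", "_z"))]
  names(block) <- c("x", "y", "z")
  data.frame(BodyPart = p, Frame = df$Frame, block)
}))

long
```

Because the axis suffixes are listed explicitly, the same loop works for any number of body parts as long as each one has exactly the three `_x`, `_y`, `_z` columns.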

How can I call for something in a data.frame when the distinction has to be made on two columns?

Sorry for the very specific question, but I have a file like this:
Adj Year man mt wm wmt by bytl gr grtl
3 careless 1802 0 126 0 54 0 13 0 51
4 careless 1803 0 166 0 72 0 1 0 18
5 careless 1804 0 167 0 58 0 2 0 25
6 careless 1805 0 117 0 5 0 5 0 7
7 careless 1806 0 408 0 88 0 15 0 27
8 careless 1807 0 214 0 71 0 9 0 32
...
560 mean 1939 21 5988 8 1961 0 1152 0 1512
561 mean 1940 20 5810 6 1965 1 914 0 1444
562 mean 1941 10 6062 4 2097 5 964 0 1550
563 mean 1942 8 5352 2 1660 2 947 2 1506
564 mean 1943 14 5145 5 1614 1 878 4 1196
565 mean 1944 42 5630 6 1939 1 902 0 1583
566 mean 1945 17 6140 7 2192 4 1004 0 1906
Now I need to look up specific values (e.g. [careless, 1804, man] or [mean, 1944, wmt]).
I have no clue how to do that. One possibility, if I'm correct, would be to split the data.frame and create an array, but I'd love to have a simpler solution.
Thank you in advance!
Subsetting on the specific values of the Adj and Year columns and selecting the man column gives you the required output:
df[df$Adj == "careless" & df$Year == 1804, "man"]
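If you want the `[careless, 1804, man]` style of lookup from the question, one option (a sketch, not the only way) is the array the question mentions: a 3-D array with dimensions adjective × year × variable, filled by indexing with a three-column character matrix. The toy `df` below keeps only four rows and two of the count columns from the question's data; adjective/year combinations that don't occur are left as NA.

```r
# Toy subset of the question's data (Adj, Year and two of the count columns)
df <- data.frame(
  Adj  = c("careless", "careless", "mean", "mean"),
  Year = c(1804, 1805, 1943, 1944),
  man  = c(0, 0, 14, 42),
  wmt  = c(58, 5, 1614, 1939),
  stringsAsFactors = FALSE
)

# Every column other than the two key columns becomes a slice of the array
vars  <- setdiff(names(df), c("Adj", "Year"))
adjs  <- unique(df$Adj)
years <- as.character(sort(unique(df$Year)))

# 3-D array indexed as [adjective, year, variable]; missing combinations stay NA
arr <- array(NA_real_,
             dim = c(length(adjs), length(years), length(vars)),
             dimnames = list(adjs, years, vars))

for (v in vars) {
  # A 3-column character matrix selects cells by their dimnames
  arr[cbind(df$Adj, as.character(df$Year), v)] <- df[[v]]
}

arr["careless", "1804", "man"]
arr["mean", "1944", "wmt"]
```

The year has to be given as a string (`"1804"`): a bare number such as `1804` would be treated as a position along that dimension rather than as a name.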

Spatial autocorrelation using Moran's I or other spatial overlap index

I have a file containing data from a bottom trawl survey. There are 102 haul points, each with associated coordinates (Lon, Lat). For every haul I calculated the density (DI N/1km2) of the predator (merlmerDI N/1km2) and the density of each of its preferred prey (the other columns) at the same coordinates...
I would like to compute a spatial overlap index that quantifies the affinity, in terms of presence/absence (using the DI N/1km2 values), between the predator and its prey, in order to justify the preferential choice of one prey over another (which basically amounts to a preference for being present in the same shared area). I found Moran's index (specifically the bivariate Moran's I), which returns a single number that is close to 1 when there is high spatial correlation; comparing the data then gives a bivariate Moran scatter plot.
I need to compare the hake (predator) separately with each of its prey species, so that I get one index per prey.
Can someone help me? I don't know whether this is the right index to use. Any ideas?
Year PrHN° Latitude Longitude Haul depth (m) Swept area (km2) merlmerDI N/1km2 tractraDI N/1km2 engrenDI N/1km2 sardpilDI N/1km2 papelonDI N/1km2
2004 1 37,5370 12,6067 51 0,044 137 69 0 891 0
2004 2 37,5433 12,8518 34 0,043 743 0 0 2067 0
2004 3 37,4757 12,9192 51 0,045 841 376 1350 5754 0
2004 4 37,3212 12,9258 310 0,076 4299 949 0 0 12223
2004 5 37,2868 12,8012 214 0,098 1729 366 0 0 4027
2004 6 37,1255 12,9703 331 0,103 77 29 0 0 2563
2004 7 37,0010 12,8058 391 0,099 192 0 0 0 6891
2004 8 37,0298 12,7738 388 0,103 156 0 0 0 5040
2004 9 37,2212 12,6082 158 0,049 2347 7000 0 0 3768
2004 10 37,3883 12,5287 151 0,045 2467 1102 0 0 5023
2004 11 37,2632 13,2430 130 0,049 2788 10298 0 0 66304
2004 12 37,1952 13,3478 136 0,048 952 16612 0 0 21412
2004 13 37,2642 13,4077 40 0,045 270 112 270 8764 0
2004 14 37,2472 13,4677 34 0,045 539 0 0 16854 0
2004 15 37,1348 13,6887 26 0,045 22 3461 0 12135 0
2004 16 36,9882 13,0683 337 0,101 99 50 0 0 3044
2004 17 37,0145 13,4638 619 0,102 10 10 0 0 79
2004 18 37,0800 13,5803 96 0,045 516 314 516 426 7063
2004 19 36,9162 13,6578 655 0,084 0 0 0 0 95
2004 20 36,8105 13,3932 413 0,102 108 0 0 0 2626
2004 22 36,5673 13,9302 586 0,103 29 0 0 0 652
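As a starting point, here is a minimal base R sketch of a bivariate Moran's I between the hake and each prey, using the 21 hauls shown above (decimal commas rewritten as points, and only two prey columns kept). Several choices are assumptions, not part of the question: inverse great-circle-distance weights, row-standardisation, and a one-sided permutation test. The weights matrix in particular strongly affects the result. Note also that the bivariate Moran's I relates the predator density at each haul to the prey density in the *neighbouring* hauls (the spatial lag), not at the same haul. For production work, packages such as spdep provide tested spatial-weights tools and univariate tests like `moran.test()`.

```r
# Hauls from the question (decimal commas rewritten as points); densities in N/km2
hauls <- data.frame(
  lat = c(37.5370, 37.5433, 37.4757, 37.3212, 37.2868, 37.1255, 37.0010,
          37.0298, 37.2212, 37.3883, 37.2632, 37.1952, 37.2642, 37.2472,
          37.1348, 36.9882, 37.0145, 37.0800, 36.9162, 36.8105, 36.5673),
  lon = c(12.6067, 12.8518, 12.9192, 12.9258, 12.8012, 12.9703, 12.8058,
          12.7738, 12.6082, 12.5287, 13.2430, 13.3478, 13.4077, 13.4677,
          13.6887, 13.0683, 13.4638, 13.5803, 13.6578, 13.3932, 13.9302),
  merlmer = c(137, 743, 841, 4299, 1729, 77, 192, 156, 2347, 2467, 2788,
              952, 270, 539, 22, 99, 10, 516, 0, 108, 29),
  tractra = c(69, 0, 376, 949, 366, 29, 0, 0, 7000, 1102, 10298, 16612,
              112, 0, 3461, 50, 10, 314, 0, 0, 0),
  papelon = c(0, 0, 0, 12223, 4027, 2563, 6891, 5040, 3768, 5023, 66304,
              21412, 0, 0, 0, 3044, 79, 7063, 95, 2626, 652)
)

# Great-circle distance in km (haversine formula)
haversine_km <- function(lat1, lon1, lat2, lon2) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * 6371 * asin(sqrt(pmin(a, 1)))
}

n <- nrow(hauls)
d <- outer(seq_len(n), seq_len(n), function(i, j)
  haversine_km(hauls$lat[i], hauls$lon[i], hauls$lat[j], hauls$lon[j]))

# Inverse-distance weights, no self-neighbours, each row summing to 1
w <- 1 / d
diag(w) <- 0
w <- w / rowSums(w)

# Bivariate Moran's I: standardised predator vs spatial lag of standardised prey
bv_moran <- function(x, y, w) {
  zx <- (x - mean(x)) / sd(x)
  zy <- (y - mean(y)) / sd(y)
  sum(zx * (w %*% zy)) / sum(zx^2)
}

# One index per prey species
prey  <- c("tractra", "papelon")
index <- sapply(prey, function(p) bv_moran(hauls$merlmer, hauls[[p]], w))
index

# Permutation p-value (one-sided, positive association) for one prey:
# shuffle the prey values across hauls and recompute the index
set.seed(42)
obs  <- index[["tractra"]]
perm <- replicate(999, bv_moran(hauls$merlmer, sample(hauls$tractra), w))
p_value <- (sum(perm >= obs) + 1) / (length(perm) + 1)
p_value
```

With the full 102 hauls and all prey columns, the same `sapply()` call returns one index per prey, which is the comparison the question describes; the permutation step can be repeated for each prey.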

Difficulties applying PCA

I am experimenting with PCA in R. I have the following data:
V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
2454 0 168 290 45 1715 61 551 245 30 91
222 188 94 105 60 3374 615 7 294 0 169
552 0 0 465 0 3040 0 0 771 0 0
2872 0 0 0 0 3380 0 289 0 0 0
2938 0 56 56 0 2039 538 311 113 0 254
2849 0 0 332 0 2548 0 332 0 0 221
3102 0 0 0 0 2690 0 0 0 807 807
3134 0 0 0 0 2897 289 144 144 144 0
558 0 0 0 0 3453 0 0 0 0 0
2893 0 262 175 0 2452 350 1138 262 87 175
552 0 0 351 0 3114 0 0 678 0 0
2874 0 109 54 0 2565 272 1037 109 0 0
1396 0 0 407 0 1730 0 0 305 0 0
2866 0 71 179 0 2403 358 753 35 107 143
449 0 0 0 0 2825 0 0 0 0 0
2888 0 0 523 0 2615 104 627 209 0 0
2537 0 57 0 0 1854 0 0 463 0 0
2873 0 0 342 0 3196 0 114 0 0 114
720 0 0 365 4 2704 0 4 643 4 0
218 125 31 94 219 2479 722 0 219 0 94
to which I apply the following code:
fit <- prcomp(data)
ev <- fit$rotation # pc loadings
As a test, I looked at the data matrix I get back when I keep all the components I can:
numberComponentsKept = 10
featureVector = ev[,1:numberComponentsKept]
newData <- as.matrix(data)%*%as.matrix(featureVector)
The newData matrix should be the same as the original one, but instead, I get a very different result:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
2454 1424.447 867.5986 514.0592 -155.4783720 -574.7425 85.38724 -86.71887 90.872507 4.305168 92.08284
222 3139.681 1020.4150 376.3165 471.8718398 -796.9549 142.14301 -119.86945 32.919950 -31.269467 32.55846
552 2851.544 539.6075 883.3969 -93.3579153 -908.6689 68.34030 -40.97052 -13.856931 23.133566 89.00851
2872 3111.317 1210.0187 433.0382 -144.4065362 -381.2305 -20.08927 -49.03447 9.569258 44.201571 70.13113
2938 1788.334 945.8162 189.6526 308.7703509 -593.5577 124.88484 -109.67276 -115.127348 14.170615 99.19492
2849 2291.839 978.1819 374.7567 -243.6739292 -496.8707 287.01065 -126.22501 -18.747873 54.080763 62.80605
3102 2530.989 814.7548 -510.5978 -410.6295894 -1015.3228 46.85727 -21.20662 14.696831 23.687923 72.37691
3134 2679.430 970.1323 311.8627 124.2884480 -536.4490 -26.23858 83.86768 -17.808390 -28.802387 92.09583
558 3268.599 988.2515 353.6538 -82.9155988 -342.5729 12.96219 -60.94886 18.537087 7.291126 96.14917
2893 1921.761 1664.0084 631.0800 -55.6321469 -864.9628 -28.11045 -104.78931 37.797727 -12.078535 104.88374
552 2927.108 607.6489 799.9602 -79.5494412 -827.6994 14.14625 -50.12209 -14.020936 29.996639 86.72887
2874 2084.285 1636.7999 621.6383 -49.2934502 -577.4815 -67.27198 -11.06071 -7.167577 47.395309 51.02962
1396 1618.171 337.4320 488.2717 -100.1663625 -469.8857 212.37199 -1.19409 13.531485 -23.332701 64.58806
2866 2007.261 1387.6890 395.1586 0.8640971 -636.1243 133.41074 12.34794 -26.969634 5.506828 74.13767
449 2674.136 808.5174 289.3345 -67.8356695 -280.2689 10.60475 -49.86404 15.165731 5.965083 78.66244
2888 2254.171 1162.4988 749.7230 -206.0215007 -652.2364 302.36320 40.76341 -1.079259 17.635956 57.86999
2537 1747.098 371.8884 429.1309 9.3761544 -480.7130 -196.25019 -81.31580 2.819608 24.089379 56.91885
2873 2973.872 974.3854 433.7282 -197.0601947 -478.3647 301.96576 -81.81105 14.516646 -1.191972 100.79057
720 2537.535 504.4124 744.5909 -78.1162036 -771.1396 38.17725 -36.61446 -9.079443 25.488688 78.21597
218 2292.718 800.5257 260.6641 603.3295960 -641.9296 187.38913 11.71382 70.011487 78.047216 96.10967
What did I do wrong?
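I think this is a PCA problem rather than an R problem. You multiply the original data by the rotation matrix and then wonder why newData != data. That would only be the case if the rotation matrix were the identity matrix. Multiplying by the rotation matrix expresses the data in the principal-component coordinate system (the scores, up to centering); to go back, you multiply the scores by the transpose of the rotation matrix, which is its inverse because the rotation matrix is orthogonal.
What you were probably planning to do is the following: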
# Run PCA:
fit <- prcomp(USArrests)
ev <- fit$rotation # pc loadings
# Reversed PCA:
head(fit$x%*% t(as.matrix(ev)))
# Centered Original data:
head(t(apply(USArrests,1,'-',colMeans(USArrests))))
In the last step you have to center the data, because prcomp() centers it by default (center = TRUE).
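A short sketch that checks both points on USArrests: the scores map back to the original data after multiplying by the transposed rotation matrix and adding back `fit$center`, and the question's `data %*% rotation` differs from `fit$x` only by a constant shift per component (`center %*% rotation`), because it skips the centering.

```r
fit <- prcomp(USArrests)   # center = TRUE, scale. = FALSE by default

# Scores -> original space: the rotation is orthogonal, so its inverse is its transpose
back <- fit$x %*% t(fit$rotation)

# Undo the centering that prcomp() applied
back <- sweep(back, 2, fit$center, "+")
all.equal(unname(back), unname(as.matrix(USArrests)))

# The question's approach: data %*% rotation gives the scores of the *uncentered*
# data, i.e. fit$x shifted by the constant row vector center %*% rotation
new_data <- as.matrix(USArrests) %*% fit$rotation
shift <- drop(fit$center %*% fit$rotation)
all.equal(unname(new_data), unname(sweep(fit$x, 2, shift, "+")))
```

If `prcomp()` is called with `scale. = TRUE`, the reconstruction also has to multiply each column by `fit$scale` before adding back `fit$center`.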

Convert data frame from wide to long with 2 variables

I have the following wide data frame (mydf.wide):
DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0
I want to produce the following "semi-long":
DAY variable_month value_month value_F
1 JAN 169 0
I tried:
library(reshape2)
mydf.long <- melt(mydf.wide, id.vars=c("YEAR","DAY"), measure.vars=c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
but this skips the F variables, and I don't know how to deal with two sets of variables...
This is one of those cases where reshape(...) in base R is a better option.
month_cols <- seq(2, 24, by = 2)  # column numbers of the months (JAN to DEC)
f_cols <- seq(3, 25, by = 2)      # column numbers of F1 to F12 (avoids masking F = FALSE)
mydf.long <- reshape(mydf.wide, idvar = "DAY",
                     times = colnames(mydf.wide)[month_cols],
                     timevar = "variable_month",
                     varying = list(month_cols, f_cols),
                     v.names = c("value_month", "value_F"),
                     direction = "long")
head(mydf.long)
# DAY variable_month value_month value_F
# 1.JAN 1 JAN 169 0
# 2.JAN 2 JAN 193 0
# 3.JAN 3 JAN 246 0
# 4.JAN 4 JAN 222 0
# 1.FEB 1 FEB 296 0
# 2.FEB 2 FEB 554 0
You can also do this with two calls to melt(...):
library(reshape2)
month_cols <- seq(2, 24, by = 2)  # column numbers of the months
f_cols <- seq(3, 25, by = 2)      # column numbers of F1 to F12
z.1 <- melt(mydf.wide, id.vars = 1, measure.vars = month_cols,
            variable.name = "variable_month", value.name = "value_month")
z.2 <- melt(mydf.wide, id.vars = 1, measure.vars = f_cols, value.name = "value_F")
# Both melts stack the columns in the same order, so the rows line up
mydf.long <- cbind(z.1, value_F = z.2$value_F)
head(mydf.long)
# DAY variable_month value_month value_F
# 1 1 JAN 169 0
# 2 2 JAN 193 0
# 3 3 JAN 246 0
# 4 4 JAN 222 0
# 5 1 FEB 296 0
# 6 2 FEB 554 0
melt() and dcast() are available from both the reshape2 and data.table packages. Recent versions of data.table can melt multiple sets of columns simultaneously: the patterns() helper in measure.vars specifies each set of columns by a regular expression:
library(data.table) # CRAN version 1.10.4 used
regex_month <- toupper(paste(month.abb, collapse = "|"))
mydf.long <- melt(setDT(mydf.wide), measure.vars = patterns(regex_month, "F\\d"),
value.name = c("MONTH", "F"))
# rename factor levels
mydf.long[, variable := forcats::lvls_revalue(variable, toupper(month.abb))][]
DAY variable MONTH F
1: 1 JAN 169 0
2: 2 JAN 193 0
3: 3 JAN 246 0
4: 4 JAN 222 0
5: 1 FEB 296 0
...
44: 4 NOV 241 0
45: 1 DEC 209 0
46: 2 DEC 250 0
47: 3 DEC 164 0
48: 4 DEC 230 0
DAY variable MONTH F
Note that "F\\d" is used as the regular expression in patterns(). A plain "F" would have caught FEB as well as F1, F2, etc., producing unexpected results.
Also note that mydf.wide needs to be coerced to a data.table object. Otherwise, melt() on a plain data.frame is dispatched to reshape2::melt(), which doesn't recognize patterns().
Data
library(data.table)
mydf.wide <- fread(
"DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0",
data.table = FALSE)
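The answers above hard-code the column positions (2, 4, ..., 24 and 3, 5, ..., 25). As a base R sketch that finds the two sets of columns by name instead, assuming the month columns are named like `toupper(month.abb)` and the F columns match `F<number>`, here is the question's data read with `read.table()` and reshaped with `reshape()`:

```r
mydf.wide <- read.table(header = TRUE, text = "
DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0
")

# Locate the two column sets by name instead of by position
month_cols <- intersect(names(mydf.wide), toupper(month.abb))
f_cols     <- grep("^F[0-9]+$", names(mydf.wide), value = TRUE)

mydf.long <- reshape(mydf.wide,
                     direction = "long",
                     idvar     = "DAY",
                     varying   = list(month_cols, f_cols),
                     v.names   = c("value_month", "value_F"),
                     timevar   = "variable_month",
                     times     = month_cols)
head(mydf.long)
```

Because `intersect()` and `grep()` both keep the original column order, the i-th month column is paired with the i-th F column, exactly as in the positional version.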
