Which apply function in R to use for my calculations


I have a dataframe where each row in the data represents a matchup in a soccer game. Here is a summary with some columns removed, and only for 50 games of a season:
dput(mydata)
structure(list(home_id = c(75L, 323L, 607L, 3627L, 3645L, 641L,
204L, 111L, 287L, 179L, 1062L, 292L, 413L, 275L, 182L, 3639L,
179L, 2649L, 111L, 478L, 383L, 3645L, 275L, 577L, 3639L, 75L,
413L, 287L, 607L, 3627L, 1062L, 75L, 583L, 323L, 3736L, 577L,
179L, 287L, 275L, 3645L, 3639L, 583L, 179L, 413L, 641L, 204L,
478L, 292L, 607L, 323L), away_id = c(3645L, 3736L, 583L, 2649L,
577L, 75L, 3736L, 182L, 323L, 607L, 3639L, 583L, 478L, 383L,
3645L, 607L, 413L, 204L, 641L, 583L, 3627L, 179L, 182L, 3736L,
292L, 204L, 323L, 1062L, 2649L, 3639L, 204L, 292L, 111L, 607L,
182L, 3645L, 478L, 413L, 641L, 287L, 577L, 182L, 2649L, 1062L,
383L, 111L, 3736L, 3627L, 75L, 275L), home_rating = c(1546.64167937943,
1534.94287021653, 1514.51852002403, 1558.91823781777, 1555.76784458784,
1518.37707748967, 1464.5264202735, 1642.57388443639, 1447.37725553409,
1420.69724095008, 1428.51535356064, 1512.81896541907, 1463.29314217469,
1492.70306452585, 1404.65235407107, 1418.03767059747, 1420.69724095008,
1532.76811278441, 1642.57388443639, 1515.31896572792, 1498.7997953168,
1555.76784458784, 1492.70306452585, 1519.94395373088, 1418.03767059747,
1546.64167937943, 1463.29314217469, 1447.37725553409, 1514.51852002403,
1558.91823781777, 1428.51535356064, 1546.64167937943, 1524.71735294388,
1534.94287021653, 1484.09023843799, 1519.94395373088, 1420.69724095008,
1447.37725553409, 1492.70306452585, 1555.76784458784, 1418.03767059747,
1524.71735294388, 1420.69724095008, 1463.29314217469, 1518.37707748967,
1464.5264202735, 1515.31896572792, 1512.81896541907, 1514.51852002403,
1534.94287021653), away_rating = c(1555.76784458784, 1484.09023843799,
1524.71735294388, 1532.76811278441, 1519.94395373088, 1546.64167937943,
1484.09023843799, 1404.65235407107, 1534.94287021653, 1514.51852002403,
1418.03767059747, 1524.71735294388, 1515.31896572792, 1498.7997953168,
1555.76784458784, 1514.51852002403, 1463.29314217469, 1464.5264202735,
1518.37707748967, 1524.71735294388, 1558.91823781777, 1420.69724095008,
1404.65235407107, 1484.09023843799, 1512.81896541907, 1464.5264202735,
1534.94287021653, 1428.51535356064, 1532.76811278441, 1418.03767059747,
1464.5264202735, 1512.81896541907, 1642.57388443639, 1514.51852002403,
1404.65235407107, 1555.76784458784, 1515.31896572792, 1463.29314217469,
1518.37707748967, 1447.37725553409, 1519.94395373088, 1404.65235407107,
1532.76811278441, 1428.51535356064, 1498.7997953168, 1642.57388443639,
1484.09023843799, 1558.91823781777, 1546.64167937943, 1492.70306452585
)), .Names = c("home_id", "away_id", "home_rating", "away_rating"
), row.names = c(NA, 50L), class = "data.frame")
Here's what it looks like:
> head(mydata)
home_id away_id home_rating away_rating
1 75 3645 1546.642 1555.768
2 323 3736 1534.943 1484.090
3 607 583 1514.519 1524.717
4 3627 2649 1558.918 1532.768
5 3645 577 1555.768 1519.944
6 641 75 1518.377 1546.642
The columns home_rating and away_rating are scores that reflect how good each team is, and I'd like to use these columns in an apply function. In particular, I have another function named use_ratings() that looks like this:
# Takes the home and away team ratings plus an is_cup flag;
# returns W_e, the home team's expected result (a value between 0 and 1)
use_ratings <- function(home_rating, away_rating, is_cup = FALSE) {
  if (is_cup) { # if is_cup, it's a neutral-site game: no home advantage
    rating_diff <- -(home_rating - away_rating) / 400
  } else {      # otherwise the home team gets an 85-point rating boost
    rating_diff <- -(home_rating + 85 - away_rating) / 400
  }
  W_e <- 1 / (10^(rating_diff) + 1)
  return(W_e)
}
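A quick sanity check of the formula, as a minimal sketch with made-up ratings of 1500 (reading W_e as an Elo-style expected result, which the question does not state explicitly): with equal ratings at a neutral (cup) venue the result is exactly 0.5; in a league game the 85-point home boost lifts it to 1 / (10^(-85/400) + 1), roughly 0.62; and at a neutral venue the two sides' expectations always add up to 1.

```r
# Function from the question, repeated so the sketch is self-contained
use_ratings <- function(home_rating, away_rating, is_cup = FALSE) {
  if (is_cup) {
    rating_diff <- -(home_rating - away_rating) / 400
  } else {
    rating_diff <- -(home_rating + 85 - away_rating) / 400
  }
  1 / (10^rating_diff + 1)
}

neutral <- use_ratings(1500, 1500, is_cup = TRUE)  # equal teams, no home advantage
league  <- use_ratings(1500, 1500)                 # equal teams, home boost of 85

# At a neutral venue, swapping home and away gives the complementary expectation
swap_sum <- use_ratings(1546.642, 1555.768, is_cup = TRUE) +
            use_ratings(1555.768, 1546.642, is_cup = TRUE)
```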
I'd like to apply this function to every row of mydata, using the values in the home_rating and away_rating columns as the arguments passed to use_ratings() each time. How can I do this? Thanks!

@SymbolixAU is absolutely right that the best way to do this (in terms of both speed and readability) is to take advantage of vectorization directly. But if you want to use an "apply" function, it would probably be mapply() or apply():
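Using mapply():
# is_cup needs one logical value per game; mydata has no such column,
# so all FALSE is used here as a placeholder
mapply(use_ratings, home_rating = mydata$home_rating,
       away_rating = mydata$away_rating, is_cup = rep(FALSE, nrow(mydata)))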
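Using apply():
# apply() turns the data frame into a matrix, so each row is a named vector:
# index it with row["..."], not row$...; is_cup is again a placeholder
apply(mydata, 1, function(row) use_ratings(row["home_rating"], row["away_rating"], is_cup = FALSE))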
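Multivariate apply (mapply) applies a function to several vectors in parallel: the i-th call receives the i-th element of each vector as its arguments. apply applies a function over the margins of a matrix-like object; setting MARGIN = 1 asks apply to operate on rows. Because apply first converts the data frame to a matrix, each row arrives as a named vector rather than a list, so we had to wrap use_ratings in an anonymous function that takes a row, extracts the relevant elements by name and passes them on as arguments. (If the data frame contained any non-numeric columns, the matrix would be character and the ratings would need converting back with as.numeric().)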
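A minimal sketch of the vectorized route, using the first three games from head(mydata) (ratings rounded as printed). The arithmetic inside use_ratings() already works element-wise on whole columns; only the if (is_cup) test needs a single value, and in R 4.2 and later passing a longer logical vector there is an error. use_ratings_vec() is a hypothetical variant, not from the question, that accepts one is_cup value per game by using ifelse(). The sketch also checks that mapply() and apply() give the same numbers.

```r
# Function from the question (scalar is_cup)
use_ratings <- function(home_rating, away_rating, is_cup = FALSE) {
  if (is_cup) {
    rating_diff <- -(home_rating - away_rating) / 400
  } else {
    rating_diff <- -(home_rating + 85 - away_rating) / 400
  }
  1 / (10^rating_diff + 1)
}

# Hypothetical vectorized variant: is_cup may be a logical vector, one value per game
use_ratings_vec <- function(home_rating, away_rating, is_cup = FALSE) {
  home_adv <- ifelse(is_cup, 0, 85)  # no home advantage at a neutral (cup) venue
  rating_diff <- -(home_rating + home_adv - away_rating) / 400
  1 / (10^rating_diff + 1)
}

# First three games from head(mydata)
games <- data.frame(home_rating = c(1546.642, 1534.943, 1514.519),
                    away_rating = c(1555.768, 1484.090, 1524.717))

# Whole columns at once, no apply function needed
w_direct <- use_ratings(games$home_rating, games$away_rating)

# The same values via mapply() and apply(), for comparison
w_mapply <- mapply(use_ratings, games$home_rating, games$away_rating)
w_apply  <- apply(games, 1, function(row) use_ratings(row["home_rating"], row["away_rating"]))

# Mixed league and cup games need the vectorized variant
w_mixed <- use_ratings_vec(games$home_rating, games$away_rating,
                           is_cup = c(FALSE, TRUE, FALSE))
```

With the full data set, use_ratings(mydata$home_rating, mydata$away_rating) returns one value per row in a single call, which is both faster and easier to read than either apply variant.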

Related

Remove middle inconsistent characters from a column header column name with r

`
set.seed(500)
index <- sample(1:nrow(Bands_reflectance_2017),100, replace = FALSE )
Bands_reflectance_2017 <- dput(head(Bands_reflectance_2017[1:100]))
Bands_reflectance_2017 <-
structure(
list(
t2017.01.05T08.25.12.000000000_blue = c(5064L,
5096L, 5072L, 5048L, 5048L, 5064L),
t2017.01.15T08.26.22.000000000_blue = c(418L,
487L, 480L, 449L, 449L, 480L),
t2017.01.25T08.21.38.000000000_blue = c(312L,
414L, 385L, 385L, 385L, 403L),
t2017.02.04T08.27.09.000000000_blue = c(5156L,
5096L, 5204L, 5240L, 5240L, 5112L),
t2017.02.14T08.27.29.000000000_blue = c(2554L,
2896L, 2842L, 2776L, 2776L, 2934L),
t2017.02.24T08.23.38.000000000_blue = c(2662L,
2428L, 2630L, 2644L, 2644L, 2276L),
t2017.03.06T08.24.47.000000000_blue = c(340L,
403L, 409L, 407L, 407L, 391L),
t2017.03.16T08.16.07.000000000_blue = c(188L,
245L, 257L, 239L, 239L, 245L),
t2017.03.26T08.22.43.000000000_blue = c(379L,
397L, 381L, 345L, 345L, 387L),
t2017.04.05T08.23.06.000000000_blue = c(604L,
647L, 639L, 647L, 647L, 631L),
t2017.04.15T08.23.45.000000000_blue = c(311L,
382L, 376L, 379L, 379L, 425L),
t2017.04.25T08.23.17.000000000_blue = c(219L,
318L, 237L, 322L, 322L, 302L),
t2017.05.05T08.23.45.000000000_blue = c(979L,
1030L, 1021L, 1030L, 1030L, 985L),
t2017.05.15T08.28.11.000000000_blue = c(138L,
219L, 196L, 201L, 201L, 247L),
t2017.05.25T08.23.46.000000000_blue = c(655L,
779L, 736L, 752L, 752L, 777L),
t2017.06.04T08.25.50.000000000_blue = c(318L,
419L, 384L, 343L, 343L, 400L),
t2017.06.14T08.28.06.000000000_blue = c(397L,
387L, 407L, 432L, 432L, 347L),
t2017.06.24T08.26.00.000000000_blue = c(336L,
450L, 402L, 395L, 395L, 388L),
t2017.07.04T08.23.42.000000000_blue = c(502L,
538L, 512L, 495L, 495L, 505L),
t2017.07.09T08.23.09.000000000_blue = c(568L,
597L, 639L, 611L, 611L, 577L),
t2017.07.19T08.23.43.000000000_blue = c(479L,
517L, 536L, 529L, 529L, 528L),
t2017.07.24T08.23.44.000000000_blue = c(409L,
499L, 499L, 473L, 473L, 482L),
t2017.07.29T08.26.12.000000000_blue = c(781L,
801L, 810L, 823L, 823L, 735L),
t2017.08.03T08.26.43.000000000_blue = c(517L,
579L, 560L, 583L, 583L, 564L),
t2017.08.08T08.23.41.000000000_blue = c(575L,
654L, 650L, 650L, 650L, 602L),
t2017.08.13T08.23.44.000000000_blue = c(623L,
679L, 708L, 698L, 698L, 677L),
t2017.08.18T08.25.16.000000000_blue = c(614L,
651L, 648L, 597L, 597L, 651L),
t2017.08.23T08.22.22.000000000_blue = c(554L,
613L, 559L, 524L, 524L, 596L),
t2017.08.28T08.28.01.000000000_blue = c(769L,
814L, 772L, 744L, 744L, 828L),
t2017.09.02T08.23.42.000000000_blue = c(756L,
761L, 763L, 783L, 783L, 742L),
t2017.09.07T08.23.30.000000000_blue = c(807L,
865L, 826L, 838L, 838L, 837L),
t2017.09.12T08.23.35.000000000_blue = c(861L,
869L, 876L, 904L, 904L, 869L),
t2017.09.22T08.23.38.000000000_blue = c(4640L,
3780L, 4340L, 4728L, 4728L, 3060L),
t2017.09.27T08.16.41.000000000_blue = c(778L,
777L, 811L, 839L, 839L, 752L),
t2017.10.02T08.17.41.000000000_blue = c(766L,
868L, 851L, 857L, 857L, 799L),
t2017.10.07T08.24.51.000000000_blue = c(767L,
816L, 839L, 830L, 830L, 753L),
t2017.10.12T08.24.39.000000000_blue = c(678L,
688L, 706L, 750L, 750L, 627L),
t2017.10.17T08.15.32.000000000_blue = c(678L,
769L, 804L, 797L, 797L, 711L),
t2017.10.22T08.21.34.000000000_blue = c(3146L,
3134L, 3128L, 3160L, 3160L, 3118L),
t2017.10.27T08.23.27.000000000_blue = c(612L,
697L, 721L, 697L, 697L, 708L),
t2017.11.01T08.24.41.000000000_blue = c(941L,
982L, 1001L, 1010L, 1010L, 999L),
t2017.11.06T08.20.50.000000000_blue = c(670L,
824L, 836L, 824L, 824L, 785L),
t2017.11.11T08.27.40.000000000_blue = c(720L,
817L, 839L, 807L, 807L, 801L),
t2017.11.16T08.16.16.000000000_blue = c(9824L,
9744L, 9792L, 9744L, 9744L, 9536L),
t2017.11.21T08.17.00.000000000_blue = c(749L,
841L, 838L, 738L, 738L, 830L),
t2017.11.26T08.25.13.000000000_blue = c(735L,
863L, 832L, 713L, 713L, 899L),
t2017.12.01T08.20.22.000000000_blue = c(674L,
836L, 816L, 800L, 800L, 771L),
t2017.12.06T08.19.42.000000000_blue = c(2742L,
2770L, 2742L, 2762L, 2762L, 2798L),
t2017.12.11T08.19.00.000000000_blue = c(582L,
745L, 734L, 654L, 654L, 743L),
t2017.12.16T08.23.19.000000000_blue = c(926L,
1054L, 1001L, 946L, 946L, 1054L),
t2017.12.21T08.20.53.000000000_blue = c(7432L,
7484L, 7456L, 7404L, 7404L, 7484L),
t2017.12.26T08.20.39.000000000_blue = c(629L,
724L, 762L, 738L, 738L, 731L),
t2017.12.31T08.20.04.000000000_blue = c(667L,
765L, 762L, 718L, 718L, 765L),
t2017.01.05T08.25.12.000000000_green = c(5224L,
5196L, 5208L, 5152L, 5152L, 5172L),
t2017.01.15T08.26.22.000000000_green = c(837L,
938L, 907L, 858L, 858L, 927L),
t2017.01.25T08.21.38.000000000_green = c(735L,
808L, 770L, 770L, 770L, 836L),
t2017.02.04T08.27.09.000000000_green = c(5424L,
5492L, 5488L, 5536L, 5536L, 5832L),
t2017.02.14T08.27.29.000000000_green = c(3050L,
3094L, 3108L, 3228L, 3228L, 2900L),
t2017.02.24T08.23.38.000000000_green = c(2664L,
2450L, 2598L, 2646L, 2646L, 2340L),
t2017.03.06T08.24.47.000000000_green = c(702L,
735L, 749L, 727L, 727L, 729L),
t2017.03.16T08.16.07.000000000_green = c(632L,
685L, 708L, 685L, 685L, 703L),
t2017.03.26T08.22.43.000000000_green = c(744L,
841L, 806L, 809L, 809L, 818L),
t2017.04.05T08.23.06.000000000_green = c(1030L,
1036L, 1044L, 1050L, 1050L, 1040L),
t2017.04.15T08.23.45.000000000_green = c(634L,
720L, 708L, 699L, 699L, 751L),
t2017.04.25T08.23.17.000000000_green = c(619L,
698L, 716L, 723L, 723L, 687L),
t2017.05.05T08.23.45.000000000_green = c(1340L,
1368L, 1374L, 1404L, 1404L, 1354L),
t2017.05.15T08.28.11.000000000_green = c(525L,
633L, 619L, 612L, 612L, 626L),
t2017.05.25T08.23.46.000000000_green = c(1042L,
1118L, 1078L, 1028L, 1028L, 1148L),
t2017.06.04T08.25.50.000000000_green = c(655L,
778L, 783L, 769L, 769L, 813L),
t2017.06.14T08.28.06.000000000_green = c(772L,
829L, 838L, 810L, 810L, 822L),
t2017.06.24T08.26.00.000000000_green = c(741L,
888L, 848L, 798L, 798L, 865L),
t2017.07.04T08.23.42.000000000_green = c(867L,
918L, 912L, 846L, 846L, 946L),
t2017.07.09T08.23.09.000000000_green = c(936L,
1001L, 1012L, 972L, 972L, 985L),
t2017.07.19T08.23.43.000000000_green = c(848L,
911L, 925L, 915L, 915L, 903L),
t2017.07.24T08.23.44.000000000_green = c(855L,
907L, 947L, 913L, 913L, 937L),
t2017.07.29T08.26.12.000000000_green = c(1096L,
1106L, 1134L, 1150L, 1150L, 1116L),
t2017.08.03T08.26.43.000000000_green = c(987L,
1072L, 1040L, 1030L, 1030L, 1021L),
t2017.08.08T08.23.41.000000000_green = c(996L,
1011L, 1001L, 1011L, 1011L, 1032L),
t2017.08.13T08.23.44.000000000_green = c(1006L,
1100L, 1082L, 1078L, 1078L, 1092L),
t2017.08.18T08.25.16.000000000_green = c(977L,
1034L, 1032L, 976L, 976L, 1020L),
t2017.08.23T08.22.22.000000000_green = c(976L,
1054L, 1044L, 985L, 985L, 1072L),
t2017.08.28T08.28.01.000000000_green = c(1162L,
1176L, 1188L, 1150L, 1150L, 1200L),
t2017.09.02T08.23.42.000000000_green = c(1136L,
1152L, 1158L, 1176L, 1176L, 1130L),
t2017.09.07T08.23.30.000000000_green = c(1122L,
1166L, 1174L, 1194L, 1194L, 1162L),
t2017.09.12T08.23.35.000000000_green = c(1158L,
1170L, 1168L, 1180L, 1180L, 1146L),
t2017.09.22T08.23.38.000000000_green = c(3304L,
3218L, 3072L, 3580L, 3580L, 4148L),
t2017.09.27T08.16.41.000000000_green = c(1172L,
1228L, 1242L, 1224L, 1224L, 1172L),
t2017.10.02T08.17.41.000000000_green = c(1148L,
1224L, 1220L, 1200L, 1200L, 1164L),
t2017.10.07T08.24.51.000000000_green = c(1120L,
1164L, 1160L, 1148L, 1148L, 1114L),
t2017.10.12T08.24.39.000000000_green = c(1124L,
1158L, 1166L, 1144L, 1144L, 1090L),
t2017.10.17T08.15.32.000000000_green = c(1092L,
1190L, 1180L, 1154L, 1154L, 1146L),
t2017.10.22T08.21.34.000000000_green = c(3140L,
3124L, 3142L, 3134L, 3134L, 3096L),
t2017.10.27T08.23.27.000000000_green = c(1064L,
1104L, 1116L, 1078L, 1078L, 1098L),
t2017.11.01T08.24.41.000000000_green = c(1298L,
1310L, 1344L, 1344L, 1344L, 1318L),
t2017.11.06T08.20.50.000000000_green = c(1114L,
1240L, 1220L, 1164L, 1164L, 1212L),
t2017.11.11T08.27.40.000000000_green = c(1182L,1278L, 1278L, 1192L, 1192L, 1284L),
t2017.11.16T08.16.16.000000000_green = c(8872L, 8728L, 8816L, 8904L, 8904L, 8600L),
t2017.11.21T08.17.00.000000000_green = c(1166L, 1268L, 1250L, 1158L, 1158L, 1260L),
t2017.11.26T08.25.13.000000000_green = c(1138L, 1272L, 1288L, 1240L, 1240L, 1278L)), row.names = c(NA, 6L), class = "data.frame")
`
I have a data frame of band values per date, with 534 column headers like the following:
"t2017-12-31T08:20:04.000000000_red_edge_3"
"t2017-02-04T08:27:09.000000000_nir_1"
"t2017-12-31T08:20:04.000000000_swir_2"
Now I want to remove everything except the date and the band name. For example, for the first and second column headers I want to keep only:
"2017-12-31_red_edge_3"
"2017-02-04_nir_1"
I have about 534 columns, and most characters are not consistent: each date-time is different, and there are more bands than the examples shown here. So far I have only been able to remove the repetitive characters that appear in every column, such as "T08", ":", "t" and "000000000". How do I remove the values between the date and the band name when they vary from column to column, so that I cannot simply use:
for ( col in 1:ncol(Bands_reflectance_2017[5:534])){
colnames(Bands_reflectance_2017)[5:534] <- sub(".000000000", "", colnames(Bands_reflectance_2017)[5:534]) #Remove .000000000
}
etc.
Ultimately, I also want to replace each band name with a band code (e.g. "nir_1" becomes "B8") and the month number with the month name (e.g. "12" becomes "December"), so that, for example, my first and second column headers read:
B7_December31
B8_February02
Names like "B7_December31" and "B8_February02" would work better in a random forest, because with the current naming convention I run into this error:
Error in eval(predvars, data, env) : object '"t2017-12-31T08:20:04.000000000_red_edge_3"' not found
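As background on the error: a header such as t2017-12-31T08:20:04.000000000_red_edge_3 is not a syntactic R name, so a model formula cannot refer to it as a plain variable (this is a reading of the error message; how the formula was built also matters). Base R's make.names() shows the syntactically valid form, and it produces the same dotted style seen in the dput() output above, which suggests those names were already converted this way when the data frame was created. The is_syntactic() helper below is a hypothetical convenience, not a base R function.

```r
# A header in the original form from the question
raw_name <- "t2017-12-31T08:20:04.000000000_red_edge_3"

# make.names() replaces characters that are not allowed in syntactic
# names ("-" and ":" here) with "."
safe_name <- make.names(raw_name)

# Hypothetical helper: a name is syntactic when make.names() leaves it unchanged
is_syntactic <- function(x) identical(make.names(x), x)

print(safe_name)
print(is_syntactic(raw_name))           # FALSE
print(is_syntactic("2017_01_25_blue"))  # FALSE: starts with a digit
print(is_syntactic("B7_December31"))    # TRUE
```

This is also why a target like "2017_01_25_blue" would still cause trouble in a formula, while "B7_December31" would not.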
I have the following column header names in my data frame (Bands_reflectance_2017) of 534 columns:
"t2017-01-25T08:21:38.000000000_blue"
"t2017-08-23T08:22:22.000000000_green"
I want to remove everything except the date and band name, e.g. "2017_01_25_blue"
I tried:
for ( col in 1:ncol(Bands_reflectance_2017[5:534])){
colnames(Bands_reflectance_2017)[5:534] <- sub("T08", "", colnames(Bands_reflectance_2017)[5:534]) #Remove T08
}
But as some of the characters I want to remove are unique to each of the 534 columns, I am not sure how to remove them.
I expect this at the end of the day:
2017_01_25_blue
2017_08_23_green
And later:
"B2_December31", "B3_August23"
I also tried this:
substr(colnames(Bands_reflectance_2017[2:335]), 2, 11)
What is the best way to do it? I am fairly new to programming and to R.

Thanks for sharing your code and data. Most people won't download random files. In the future you can share data with dput(data) or a smaller version with dput(head(data)).
library(stringr)
library(lubridate)
# Using the data frame that you provided with dput, which I call "df1" here.
# You'll probably have to adjust the numbers between the [] because your
# data frame is very different from the sample, and I'm not sure I have
# the right number, but since you said 534 columns, I'm using that.
# Work on a copy of the names so that df1 itself stays a data frame.
col_names <- names(df1)[1:534]
band_names <- rep(NA, length(col_names))
# This is messy. I'm sure someone who knows stringr or
# regex better has a neater way to do this.
# str_locate() finds a pattern in a string and returns its start and end positions
# str_sub() uses those positions to pull out substrings
# gsub() replaces patterns
# For each name: pull out the month number (the first ".NN"), the day
# (the ".NN" right before "T") and the band label (the first run of 3+
# lowercase letters, to the end), drop the "." from the month and the "T"
# from the day, and convert the month number to its name with month().
for (i in seq_along(col_names)) {
  month_num <- as.numeric(gsub("\\.", "",
    str_sub(col_names[i], str_locate(col_names[i], "\\.[0-9]{2}"))))
  day_part <- gsub("T", "",
    str_sub(col_names[i], str_locate(col_names[i], "\\.[0-9]{2}T")))
  band_part <- str_sub(col_names[i], str_locate(col_names[i], "[a-z]{3,}.+"))
  band_names[i] <- paste0(as.character(month(month_num, label = TRUE, abbr = FALSE)),
                          day_part, "_", band_part)
}
# You can look at the results
band_names
[1] "Dec-12_red_edge_3" "Feb-02_nir_1" "Dec-12_swir_2"
# Split up band_names to replace the band label with number
band_out <- str_sub(band_names, 7)
band_stay <- str_sub(band_names, 1, 6)
# Made data frame up for the few example lines. I'm not downloading the CSV and I'm not going to find out the actual band names, labels, and numbers.
fake_bands <- data.frame(label = c("red_edge_3", "nir_1", "swir_2"), number = c("b1","b3","b2"))
# Change out labels for the numbers
band_replace <- fake_bands[match(band_out, fake_bands$label), "number"]
new_names <- paste0(band_stay, band_replace)
new_names
[1] "Dec-12_b1" "Feb-02_b3" "Dec-12_b2"
# Again, you might have to adjust the numbers in []
names(df1)[1:534] <- new_names
You're going to have to expand/replace the fake_bands data frame I made here with a data frame that has two columns. One column should have the labels, like "red_edge_3", and the other should have the appropriate band number.
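A more compact alternative is a single regular expression with base R's sub(), sketched below on a few example headers. It assumes every header follows the pattern t + date + T + time + _ + band name, with either "-"/":" or "." as separators (the dput() names use dots). The band_codes lookup is a hypothetical mapping built only from the examples in the question (blue as B2, green as B3, red_edge_3 as B7, nir_1 as B8) and has to be completed with the correct code for every band; any band missing from it comes out as "NA_...".

```r
# Example headers in both forms seen in the question
nms <- c("t2017-12-31T08:20:04.000000000_red_edge_3",
         "t2017-02-04T08:27:09.000000000_nir_1",
         "t2017.01.25T08.21.38.000000000_blue",
         "t2017.08.23T08.22.22.000000000_green")

# Capture year, month, day and band. The time part ("T08:20:04.000000000")
# contains no underscore, so "[^_]*" skips it and stops at the "_" before the band
pattern <- "^t([0-9]{4})[-.]([0-9]{2})[-.]([0-9]{2})T[^_]*_(.*)$"

# Step 1: "2017_12_31_red_edge_3" style
short_names <- sub(pattern, "\\1_\\2_\\3_\\4", nms)

# Step 2: "B7_December31" style, using an assumed band lookup
band_codes <- c(blue = "B2", green = "B3", red_edge_3 = "B7", nir_1 = "B8")
band       <- sub(pattern, "\\4", nms)
month_name <- month.name[as.integer(sub(pattern, "\\2", nms))]
day        <- sub(pattern, "\\3", nms)
coded_names <- paste0(unname(band_codes[band]), "_", month_name, day)
```

To use it on the real data, set nms <- names(Bands_reflectance_2017)[5:534] (the band columns, as in the question) and assign the result back with names(Bands_reflectance_2017)[5:534] <- coded_names. Names like B7_December31 start with a letter and contain only letters, digits and underscores, so they can be used directly in a model formula.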

How to rotate a Ternary plot in ggtern

I have this data collected on a tripole scale, where respondents were clicking on a point inside a triangle to show how someone responded to a situation:
behavior <- structure(list(ID = c(24262L, 24263L, 24264L, 24266L, 24267L,
24268L, 24269L, 24270L, 24271L, 24272L, 24273L, 24275L, 24279L,
24282L, 24285L, 24286L, 24287L, 24288L, 24290L, 24292L, 24296L,
24298L, 24299L, 24300L, 24301L, 24302L, 24304L, 24305L, 24309L,
24310L, 24314L, 24328L, 24329L, 24330L, 24331L, 24332L, 24333L,
24339L, 24356L, 24363L, 24370L, 24378L, 24388L, 24390L, 24393L,
24404L, 24406L, 24408L, 24410L, 24420L, 24422L, 24431L, 24435L,
24449L, 24456L, 24457L, 24469L, 24503L, 24535L, 24538L, 24541L,
24543L, 24547L, 24549L, 24555L, 24560L, 24562L, 24564L, 24565L,
24574L, 24693L, 24694L, 24707L, 24711L, 24715L, 24717L, 24719L,
24721L, 24723L, 24725L, 24727L, 24733L, 24735L, 24737L, 24742L,
24750L, 24752L, 24758L, 24761L, 24762L, 24764L, 24770L, 24863L,
24865L, 24866L, 24867L, 24870L, 24885L, 24891L, 24984L, 24995L,
25005L, 25006L, 25010L, 25011L, 25012L, 25014L, 25015L, 25091L,
25092L, 25093L, 25094L, 25106L, 25109L, 25110L, 25111L, 25157L,
25159L, 25162L, 25174L, 25176L, 25180L, 25294L, 25295L, 25298L,
25302L, 25303L, 25304L, 25305L, 25308L, 25339L, 25341L, 25343L,
25345L, 25348L, 25349L, 25559L, 25566L, 25573L, 25575L, 25577L,
25579L, 25581L, 25586L, 25614L, 25622L, 25630L, 25631L, 25635L,
25641L, 25670L, 25671L, 25672L, 25673L, 25674L, 25677L, 25684L,
25688L, 25691L, 25693L, 25695L, 25700L, 24211L, 24212L, 24215L,
24217L, 24218L, 24219L, 24220L, 24222L, 24225L, 24226L, 24227L,
24230L, 24232L, 24234L, 24236L, 24237L, 24238L, 24239L, 24240L,
24243L, 24246L, 24247L, 24250L, 24251L, 24252L), Respectfully = c(0.5385952,
0.672799766, 0.515947104, 0.609299839, 0.600087047, 0.215293989,
0.112566531, 0.631413877, 0.171163484, 0.280788928, 0.895692229,
0.247195691, 0.181995317, 0.163163558, 0.900582135, 0.818431854,
0.795888841, 0.614360929, 0.945623696, 0.922643483, 0.628791392,
0.175074518, 0.619624436, 0.595834434, 0.352946192, 0.531283677,
0.211680189, 0.659169912, 0.526771784, 0.929830313, 0.898694217,
0.613898337, 0.617298901, 0.56617099, 0.554916739, 0.64306879,
0.189266831, 0.920095921, 0.712526262, 0.854605317, 0.913350403,
0.933309317, 1.006667733, 0.987369776, 1.017328858, 0.957674563,
0.90463531, 0.9272874, 0.891221881, 0.884747803, 0.933109701,
1.019063711, 0.916044593, 0.156491563, 0.654910684, 0.517636955,
0.247314185, 0.343438685, 0.337267578, 0.326364845, 0.114466496,
0.090442464, 0.243850961, 0.092173956, 0.235721201, 0.996143162,
0.635637045, 0.970861077, 0.948802829, 0.551817477, 0.912414432,
0.200542375, 0.826407254, 0.071805023, 0.892377079, 0.087980591,
0.918832958, 0.099396825, 1.023749948, 0.102644026, 0.107016437,
0.997948647, 0.110704333, 0.940060258, 0.091438882, 0.055989511,
0.081595875, 0.081419758, 0.770171881, 0.610801637, 0.511512518,
1.070922136, 0.593650937, 0.569419086, 0.873148918, 0.378054291,
0.582714975, 0.60744822, 0.14328903, 0.067492828, 0.315115869,
0.75541079, 0.061788347, 0.087719396, 1.049453616, 0.069038175,
1.044347167, 0.501647294, 0.476157516, 0.110015221, 0.269865036,
0.147203833, 0.961993456, 0.785571694, 0.641585886, 0.638352633,
0.609070599, 0.870874465, 0.864675701, 0.096855976, 0.610836565,
0.627459884, 0.874884486, 0.972632468, 0.164256439, 0.873557031,
0.57596755, 0.565361559, 0.586712956, 0.941195965, 0.446302474,
0.206582263, 0.610695481, 0.638060987, 0.530307591, 1.029941678,
0.607028246, 0.6176126, 0.543566525, 0.519073486, 0.609546781,
0.139241472, 0.901534081, 0.150142923, 0.317818969, 0.189081565,
0.626691282, 0.624533534, 0.612181485, 0.634860277, 0.646151781,
0.633498967, 0.624919891, 0.623312056, 0.631034791, 0.608126938,
0.236088231, 0.323942959, 0.919163823, 0.233712777, 0.276786536,
0.833319068, 0.095358528, 0.812533975, 0.209690139, 0.735989869,
0.596592605, 0.493421763, 0.818909705, 0.805246234, 0.613435805,
0.270724922, 0.366894066, 0.600306869, 0.869067788, 0.145871058,
0.604971766, 0.134385094, 0.588236988, 0.587666631, 1.032822847,
0.623843968, 0.605744064, 0.131348848, 0.588236988, 0.087467365,
0.600683391), Transparently = c(0.820800126, 0.615894616, 0.784985006,
0.606558323, 0.842676938, 0.844404042, 0.916779697, 0.615372658,
0.874791503, 0.814765275, 0.126808345, 0.855662525, 0.846717596,
0.862914324, 0.913444817, 0.251324534, 0.248540372, 0.614360929,
0.936769724, 0.095737927, 0.583792984, 0.858672082, 0.603269815,
0.617806852, 0.728860557, 0.763061166, 0.811132908, 0.599038482,
0.811664104, 0.077664897, 0.134824425, 0.606615484, 0.564655364,
0.618685603, 0.633455515, 0.545877218, 0.855959177, 0.095988706,
0.433236271, 0.697069466, 0.932611644, 0.942195773, 1.008322001,
0.992420793, 1.028732777, 0.969780326, 0.122604199, 0.099307142,
0.138839573, 0.150925994, 0.085792698, 1.020697951, 0.095590822,
0.849863172, 0.647231042, 0.773270965, 0.79933852, 0.781846166,
0.777013123, 0.73322922, 0.914041042, 0.923891008, 0.798273802,
0.938193262, 0.839317203, 0.990858972, 0.590011358, 0.042210646,
0.074093886, 0.548788846, 0.916915476, 0.836126328, 0.575304508,
0.935497701, 0.127815932, 0.920728266, 0.104502067, 0.921889246,
1.03024137, 0.907672346, 0.920933843, 1.002946377, 0.903099537,
0.083944403, 0.922207296, 0.956200302, 0.936974704, 0.937197804,
0.270489872, 0.625058591, 0.496246278, 1.073989391, 0.593650937,
0.592372119, 0.694542348, 0.625950456, 0.619678259, 0.570666313,
0.871415496, 0.946574152, 0.728291929, 0.722327173, 0.946510434,
0.926541567, 1.049453616, 0.943204463, 1.03007555, 0.50816232,
0.835366428, 0.918267071, 0.787079275, 0.868908703, 0.951541662,
0.811538815, 0.61506027, 0.663948357, 0.586418152, 0.898504972,
0.1523799, 0.914196193, 0.583227396, 0.606079459, 0.213126272,
0.986245692, 0.870046079, 0.869732857, 0.604211867, 0.736863017,
0.648767114, 0.939423501, 0.557043076, 0.804438114, 0.532972872,
0.598525584, 0.841363668, 1.029941678, 0.612435043, 0.615830719,
0.509812713, 0.497207224, 0.609743237, 0.897805572, 0.863769054,
0.864284277, 0.756386161, 0.861637115, 0.617861569, 0.612092674,
0.622858763, 0.583585918, 0.614777744, 0.603289545, 0.619621992,
0.586993933, 0.593338847, 0.614418983, 0.779004991, 0.70745641,
0.11726483, 0.775427818, 0.74606353, 0.851781547, 0.919092059,
0.924776435, 0.829707384, 0.580720782, 0.596592605, 0.519732594,
0.421046019, 0.215226546, 0.556450188, 0.759358466, 0.824817002,
0.577669203, 0.169151321, 0.881558836, 0.599436522, 0.90624404,
0.604998171, 0.622988939, 1.034414053, 0.626509905, 0.632660449,
0.89102143, 0.604998171, 0.918262541, 0.55049324), Impartially = c(0.465658277,
0.461714715, 0.497125953, 0.520229161, 0.401690006, 0.802266479,
0.894968808, 0.493858635, 0.84177649, 0.737350881, 0.889409304,
0.759607494, 0.847863555, 0.862213612, 0.109956756, 0.771663547,
0.793201268, 0.509038925, 0.069157727, 0.914556921, 0.524168909,
0.847581744, 0.51422137, 0.522430778, 0.704787254, 0.489853799,
0.838132501, 0.486388475, 0.479211062, 0.93793416, 0.875783682,
0.516177416, 0.552293181, 0.549585342, 0.547982574, 0.549040437,
0.829416931, 0.916274369, 0.618573725, 0.305697709, 0.092083447,
0.073197983, 0.008801615, 0.012722424, 0.028896471, 0.043793023,
0.888742805, 0.90670836, 0.875128031, 0.862690747, 0.92150861,
0.02294177, 0.92108041, 0.896055102, 0.44906044, 0.499450028,
0.791781664, 0.680421829, 0.690234244, 0.738181829, 0.893988132,
0.921715975, 0.798098445, 0.910879731, 0.779454529, 0.009174875,
0.512821853, 0.95919919, 0.92872417, 0.63617301, 0.100211762,
0.828822911, 0.428452343, 0.941970348, 0.890839458, 0.929689884,
0.904637277, 0.909784257, 0.031725951, 0.918267727, 0.899998665,
0.005025526, 0.908758759, 0.920179784, 0.921686828, 0.948506773,
0.924896836, 0.925002098, 0.785541952, 0.503044903, 0.803664029,
0.082803823, 0.545991898, 0.570557058, 0.305972755, 0.83199054,
0.532894731, 0.555229306, 0.887567461, 0.938506544, 0.761560678,
0.323256642, 0.947489619, 0.923677325, 0.0566625, 0.938717306,
0.045118894, 0.795827687, 0.525770962, 0.897508681, 0.769734502,
0.883341014, 0.051330354, 0.244468406, 0.485932499, 0.449341804,
0.538715363, 0.138629705, 0.879033506, 0.920966506, 0.540138841,
0.505164027, 0.791890979, 0.027370578, 0.854349852, 0.15208894,
0.553023458, 0.468633413, 0.50540942, 0.069690846, 0.827210844,
0.861133993, 0.591059625, 0.50319165, 0.470964789, 0.034407794,
0.517081559, 0.504993081, 0.700809896, 0.775536239, 0.517217338,
0.870460451, 0.143105969, 0.883978724, 0.728849649, 0.826070011,
0.495627373, 0.502366483, 0.503692865, 0.519353449, 0.482581139,
0.50265485, 0.495607883, 0.525813699, 0.513563395, 0.514401138,
0.839320302, 0.775061905, 0.887907207, 0.853994012, 0.807586253,
0.188630834, 0.918224454, 0.18752791, 0.820726156, 0.454573005,
0.540567636, 0.784800649, 0.584334373, 0.839739561, 0.563928843,
0.798892915, 0.639075577, 0.554966569, 0.848121166, 0.872286737,
0.530407548, 0.872425079, 0.540660083, 0.525467217, 0.038644876,
0.490734726, 0.501179636, 0.886452258, 0.540660083, 0.93474859,
0.581967294)), row.names = c(2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 16L, 17L, 19L, 20L, 21L, 22L, 24L, 25L, 27L,
28L, 29L, 30L, 31L, 32L, 34L, 35L, 37L, 38L, 39L, 40L, 41L, 42L,
43L, 44L, 45L, 51L, 57L, 61L, 65L, 68L, 73L, 74L, 76L, 81L, 82L,
83L, 84L, 88L, 89L, 92L, 93L, 98L, 100L, 101L, 110L, 116L, 121L,
123L, 125L, 126L, 127L, 129L, 130L, 132L, 133L, 134L, 135L, 143L,
146L, 147L, 154L, 157L, 159L, 160L, 161L, 162L, 163L, 164L, 165L,
168L, 169L, 170L, 173L, 177L, 178L, 180L, 181L, 182L, 183L, 186L,
188L, 190L, 191L, 192L, 195L, 196L, 201L, 209L, 213L, 218L, 219L,
222L, 223L, 224L, 225L, 226L, 233L, 234L, 235L, 236L, 239L, 241L,
242L, 243L, 252L, 253L, 256L, 265L, 267L, 270L, 277L, 278L, 281L,
282L, 283L, 284L, 285L, 288L, 294L, 295L, 296L, 297L, 299L, 300L,
303L, 308L, 313L, 314L, 315L, 316L, 317L, 320L, 333L, 337L, 339L,
340L, 343L, 347L, 351L, 352L, 353L, 354L, 355L, 357L, 358L, 360L,
361L, 362L, 363L, 364L, 367L, 368L, 371L, 373L, 374L, 375L, 376L,
378L, 380L, 381L, 382L, 385L, 387L, 389L, 391L, 392L, 393L, 394L,
395L, 396L, 398L, 399L, 401L, 402L, 403L), class = "data.frame")
and would like to create a ternary plot like this one:
So far, I have this code:
plot <- ggtern(data=behavior,aes(x=Respectfully,
z=Transparently,
y=Impartially)) +
geom_point(size=3,fill="yellow",color="red",shape=21)
plot
which gives me this:
How do I rotate the triangle canvas to fit the plot?
I have searched and could not find any help online.
I don't want to rotate the whole diagram, just the triangle. I want to retain the points as they are.

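You can rotate the diagram by an angle (in degrees or radians) using theme_rotate(). Note that the code below also swaps the x and y aesthetics (Impartially and Respectfully) compared with the code in the question.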
library(ggtern)
plot <- ggtern(data=behavior,aes(x=Impartially,
z=Transparently,
y=Respectfully)) +
geom_point(size=3, fill="yellow", color="red", shape=21) +
theme_rotate()
plot

How to apply functions to entire df inside dplyr group_by()?

I have these functions:
foo <- function(z){
bob <- which(z$signchg!=0)
z$crit1 <- "opening"
ifelse(length(bob)==0, z$crit1 <- "opening",
ifelse(length(bob)==1,
z$crit1[match(min(bob, na.rm=T), as.numeric(rownames(z)))] <- "opening",
z$crit1[match(min(bob, na.rm=T), as.numeric(rownames(z))):match(max(bob[bob!=max(bob, na.rm=T)], na.rm=T), as.numeric(rownames(z)))] <- "unconscious follow"))
z$crit1
}
foo2 <- function(y){
bob <- which(y$signchg!=0)
y$crit2 <- "opening"
ifelse(length(bob)!=0,
ifelse(length(y[y$crit1=="unconscious follow",]$sacc)==0,
y$crit2[match(max(bob, na.rm=T), as.numeric(rownames(y))):nrow(y)] <- "opening",
ifelse(length(head(which(y$sacc>max(y[y$crit1=="unconscious follow",]$sacc, na.rm=T)),1))==0, y$crit2 <- "opening",
y$crit2[match(max(bob, na.rm=T), as.numeric(rownames(y))):head(which(y$sacc>max(y[y$crit1=="unconscious follow",]$sacc, na.rm=T)),1)] <- "unconscious follow")),
y$crit2 <- "opening")
y$crit2
}
foo3 <- function(x){
bob <- which(x$signchg!=0)
x$closing <- "opening"
ifelse(length(bob)!=0,
x$closing[1:match(min(bob), as.numeric(rownames(x)))-1] <- "closing", x$closing <- "opening")
x$closing
}
Data
The following data set contains 3 unique Vehicle.IDs (8, 12 and 1179); I took a sample of 50 rows:
> dput(ntraj1oo)
structure(list(Vehicle.ID = c(1179L, 12L, 12L, 1179L, 1179L,
1179L, 8L, 1179L, 1179L, 1179L, 8L, 1179L, 12L, 1179L, 12L, 8L,
1179L, 12L, 1179L, 1179L, 12L, 8L, 8L, 1179L, 1179L, 8L, 8L,
12L, 1179L, 1179L, 12L, 1179L, 8L, 12L, 1179L, 1179L, 1179L,
12L, 1179L, 12L, 1179L, 1179L, 12L, 12L, 8L, 1179L, 12L, 1179L,
12L, 1179L), Frame.ID = c(3145L, 225L, 169L, 3549L, 3258L, 3262L,
289L, 3246L, 3155L, 3316L, 74L, 3124L, 135L, 3398L, 434L, 342L,
3288L, 93L, 3221L, 3384L, 293L, 347L, 452L, 3301L, 3165L, 448L,
230L, 400L, 3343L, 3302L, 305L, 3242L, 333L, 181L, 3362L, 3201L,
3356L, 150L, 3466L, 129L, 3123L, 3513L, 124L, 234L, 265L, 3440L,
407L, 3497L, 454L, 3208L), sacc = c(1.2024142815693, 0.167471842386292,
0.389526218261013, 1.0608535451082, 1.34658348989163, 1.30827746568167,
0.676275947080881, 1.56168338812933, 1.45322442414619, 0.236926713182157,
-0.331746789624733, -0.296457890957575, 0.578696068042145, -0.104188799716241,
1.64373161583451, 0.74974701439042, 1.024635813019, -0.212898242245164,
1.54066066716165, -0.439030115502196, -0.0908376863222584, 0.691762173865882,
0.0956005839166526, 0.681722722129702, 1.44251516088868, -0.0772419385643099,
0.430003386843667, 1.05958689269776, -0.402975701449174, 0.648704793894625,
-0.0106984134869645, 1.63176231974786, 0.884756294567357, 0.219219760305613,
-0.428935665947576, 1.54207226189423, -0.40185390261026, 0.441773747246007,
0.983291264446801, 0.596528992338635, -0.351283490561794, 1.11356697363866,
0.64253447660771, 0.0491453453593057, 0.715465534653409, 0.760489329987362,
1.17711496285387, 1.07374138870048, 1.45061613430159, 1.5589484008358
), relative.v = c(-7.20683108836496, 1.41754770518283, -0.298659684886637,
-6.37538134834612, -4.00321428084874, -3.82309181190075, -0.727408127343359,
-4.14013093963352, -6.7253476528766, 4.84058965232001, -2.51365849828336,
-4.82796782714515, -2.2317642496626, -1.54138020745749, -2.91023536393949,
-0.904299522098896, -0.549568281350204, -2.99526240263305, -6.18033016152812,
1.08350055196426, 2.52903114154146, -1.01292990996659, -2.54795991136474,
2.14686490991681, -7.03361953812604, -1.24128349787506, -0.149590211893916,
-4.29601660568767, 4.70617725169663, 2.47874406770293, -0.442134244952982,
-4.72366659693532, -1.10949949758366, 0.850218831661735, 2.42271763669292,
-8.2259447855115, 1.44195914620509, -1.88517424984066, -6.48099656406857,
-3.22006152601574, -4.53955604248154, -7.95149284172251, -3.95841822705948,
0.978824881565963, -0.832249768583615, -3.99216317969555, -4.56499371815966,
-5.89675705778252, -0.269620247442631, -7.75907851102451), nspacing = c(67.9564390167725,
64.4222965548587, 69.9984793222568, 203.630967606615, 142.825962756316,
144.4974871287, 69.5663930132816, 138.544960496636, 75.1355363890009,
145.313025161387, 62.76071823522, 52.3376957871262, 63.854711706948,
119.303164791766, 82.7183786313178, 78.0100285715123, 151.786017600382,
41.6146093571944, 124.898333310041, 118.810008693412, 57.9329927929634,
78.1975432716604, 97.9377561743831, 151.845647043811, 81.0478415333349,
97.4581470183944, 63.9970348761168, 67.6721711092462, 129.125820950528,
151.636781319948, 56.1796449012404, 136.907951327661, 77.12358891961,
68.5284958380145, 126.438422026932, 109.685235806325, 126.52282899785,
65.3271870401025, 148.692268232249, 62.3990368362372, 51.846063554017,
178.498350166457, 60.768801672643, 62.2994121863875, 69.1176002124943,
135.401524339836, 71.0466952274176, 167.365284062391, 85.027302124975,
115.693182668085)), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Vehicle.ID",
"Frame.ID", "sacc", "relative.v", "nspacing"), row.names = c(901L,
606L, 550L, 1305L, 1014L, 1018L, 261L, 1002L, 911L, 1072L, 46L,
880L, 516L, 1154L, 815L, 314L, 1044L, 474L, 977L, 1140L, 674L,
319L, 424L, 1057L, 921L, 420L, 202L, 781L, 1099L, 1058L, 686L,
998L, 305L, 562L, 1118L, 957L, 1112L, 531L, 1222L, 510L, 879L,
1269L, 505L, 615L, 237L, 1196L, 788L, 1253L, 835L, 964L))
Applying Functions Produces Error
Now, applying the functions to this data gives an error:
ovv <- ntraj1oo %>%
  group_by(Vehicle.ID) %>%
  mutate(signs = sign(relative.v),
         signchg = c(NA, diff(signs))) %>%
  do(data.frame(Frame.ID = .$Frame.ID, crit1 = foo(.), crit2 = foo2(.), closing = foo3(.))) %>%
  inner_join(x = ntraj1oo, y = ., by = c("Vehicle.ID", "Frame.ID")) %>%
  mutate(behavior = ifelse(crit1 == "unconscious follow" | crit2 == "unconscious follow", "Unconscious Following",
                    ifelse(closing == "closing" & relative.v > 0, "closing",
                    ifelse(closing == "closing" & relative.v < 0, "Unconscious Following", "opening")))) %>%
  ungroup()
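The grouped mutate() step on its own does not need dplyr. As a hedged base-R sketch, ave() runs a function within each group and returns a vector aligned with the original rows, which is exactly what a per-vehicle diff() needs. The small data frame `toy` is made up for illustration and only mimics the Vehicle.ID and relative.v columns.

```r
# Toy data (hypothetical): two vehicles, rows already in frame order
toy <- data.frame(
  Vehicle.ID = c(8, 8, 8, 12, 12, 12),
  relative.v = c(-1.5, 2.0, 0.7, 3.1, -0.4, -2.2)
)

# Sign of the relative velocity, row by row
toy$signs <- sign(toy$relative.v)

# Change in sign within each vehicle: ave() applies FUN to each Vehicle.ID
# group and writes the results back to the original row positions,
# so diff() never crosses from one vehicle into the next
toy$signchg <- ave(toy$signs, toy$Vehicle.ID,
                   FUN = function(s) c(NA, diff(s)))

print(toy)
```

Because ave() keeps row order, the result can be added as a column directly, without a join back to the original data.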
Error
Error in match(max(bob, na.rm = T), as.numeric(rownames(y))):nrow(y) :
NA/NaN argument
But using the data for only one vehicle does not produce the error. I tested vehicles 8, 12 and 1179 separately, and there was no error.
These were only 3 vehicles with 50 rows in total. If I apply the functions to the original data set, which has 944 Vehicle.IDs, I get the following error:
Error in match(min(bob, na.rm = T), as.numeric(rownames(z))):match(max(bob[bob != :
NA/NaN argument
Again, using the complete data for one vehicle does not produce any error. Why does dplyr fail to apply the functions when there is more than one Vehicle.ID?
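The error text itself says what fails: match() returns NA when the value it looks for is not found, and the colon operator refuses an NA endpoint. The sketch below reproduces that with made-up numbers; `bob` and `y` only mirror the names in the error message, since foo() and foo2() are not shown in the question. One plausible cause, which cannot be confirmed without their source, is that the row numbers in `bob` refer to a different set of row names than those of the per-vehicle data the function receives.

```r
# Made-up stand-ins for the objects named in the error message
y   <- data.frame(v = c(10, 20, 30), row.names = c("5", "6", "7"))
bob <- c(2, 9)   # hypothetical row numbers, none of which is a row name of y

idx <- match(max(bob, na.rm = TRUE), as.numeric(rownames(y)))
print(idx)       # NA, because 9 is not among 5, 6, 7

# Using that NA as the start of a sequence raises the reported error
msg <- tryCatch(idx:nrow(y), error = function(e) conditionMessage(e))
print(msg)

# In a data.frame, a subset keeps the original row numbers as row names,
# so an index computed on one table need not exist in another
df  <- data.frame(Vehicle.ID = c(8, 12, 8, 12))
sub <- df[df$Vehicle.ID == 12, , drop = FALSE]
print(rownames(sub))
```

Checking whether the value passed to match() can be absent from the row names (and handling the NA case) is a reasonable first step when debugging the real functions.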

A trusty old workaround: when group_by() fails, revert to good old-fashioned row indices and for-loops:
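# Start from the input data, since the grouped pipeline is what fails
ovv <- ntraj1oo
# Sign of the relative velocity, as in the original mutate() step
ovv$signs <- sign(ovv$relative.v)
# Create new cols, with pessimism
ovv$signchg <- NA
ovv$crit1 <- NA
ovv$crit2 <- NA
ovv$closing <- NA
ovv$behavior <- NA
for (vid in unique(ovv$Vehicle.ID)) {
  # Form a row-index for this vehicle
  I <- which(ovv$Vehicle.ID == vid)
  # Sign change within this vehicle only
  ovv$signchg[I] <- c(NA, diff(ovv$signs[I]))
  # Apply your fns, vid-wise...
  ovv$crit1[I] <- foo(ovv[I, ])
  ovv$crit2[I] <- foo2(ovv[I, ])
  ovv$closing[I] <- foo3(ovv[I, ])
}
ovv$behavior <- ifelse(...)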
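To see the row-index loop run end to end, here is a toy version. The real foo(), foo2() and foo3() are not shown in the question, so `foo_demo()` is a hypothetical stand-in that returns each row's position within its vehicle; the data frame `veh` is also made up. The point is only that the function sees one vehicle at a time and its result is written back by position.

```r
# Toy data (hypothetical), two vehicles
veh <- data.frame(
  Vehicle.ID = c(8, 8, 12, 12, 12),
  Frame.ID   = c(1, 2, 1, 2, 3),
  relative.v = c(-1.2, 0.5, 2.0, -0.3, 1.1)
)

# Hypothetical stand-in for foo(): takes one vehicle's rows and
# returns one value per row (here, the row's position within the vehicle)
foo_demo <- function(d) seq_len(nrow(d))

# Create the new column up front, filled with NA
veh$pos <- NA

for (vid in unique(veh$Vehicle.ID)) {
  # Row index of this vehicle's rows in the full table
  I <- which(veh$Vehicle.ID == vid)
  # Apply the function to this vehicle only and write back by position
  veh$pos[I] <- foo_demo(veh[I, ])
}

print(veh)
```

The numbering restarts for each vehicle, which confirms that each call only saw that vehicle's rows.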

monthly average of working days data

I have a daily time series (of working days) that I would like to transform into monthly averages.
The date format is %d/%m/%Y, and there are some missing observations (NA).
How can I do this?
# my data
timeseries <- structure(c(309L, 319L, 329L, 339L, 348L, 374L, 384L, 394L, 404L, 413L,
2317L, 2327L, 2337L, 2347L, 2356L, 2382L, 2392L, 2402L, 2412L, 2421L, 2447L, 2457L,
422L, 432L, 441L, 467L, 477L, 487L, 497L, 506L, 2467L, 2477L, 2487L, 2497L, 2506L,
2532L, 2542L, 2552L, 2562L, 2571L, 2597L, 2607L, 2617L, 2627L, 2636L,
[...]), .Label = c("01/01/1992", "01/01/1993", "01/01/1996", "01/01/1997", "01/01/1998", "01/01/1999", "01/01/2001", [...]), class = "factor")

You can do this in many, many ways. Using base R:
d <- data.frame(Date=Sys.Date()+1:60, Data=1:60)
tapply(d$Data, format(d$Date,"%Y%m"), mean)
aggregate(d$Data, by=list(Date=format(d$Date,"%Y%m")), mean)
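The example above uses generated Date values, while the question's dates are a factor in %d/%m/%Y format and the series has NAs. A sketch under those assumptions, with a small made-up `raw`/`value` pair in place of the real data: parse the labels with as.Date() and the matching format string, then pass na.rm = TRUE so missing days are skipped rather than turning the whole month into NA.

```r
# Made-up example in the question's shape: dates stored as a factor
# in %d/%m/%Y format, plus a numeric series with a missing value
raw   <- factor(c("02/01/1992", "03/01/1992", "06/01/1992",
                  "03/02/1992", "04/02/1992"))
value <- c(10, 20, NA, 30, 50)

# Convert the factor labels to Date using the day/month/year format
dates <- as.Date(as.character(raw), format = "%d/%m/%Y")

# Monthly mean over the available working days, ignoring NAs
monthly <- tapply(value, format(dates, "%Y-%m"), mean, na.rm = TRUE)
print(monthly)
```

Without na.rm = TRUE, January would come out as NA because of the single missing observation.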

Create barplot R for coverage

I want to create a barplot, and my data is in a CSV file in the following format:
0,22
40,50
80,62
120,70
160,62
200,49
240,52
280,64
320,57
360,50
400,47
440,52
480,73
520,70
560,68
600,71
640,69
680,61
720,59
760,59
800,62
840,62
880,62
920,72
960,81
1000,89
1040,86
1080,76
1120,80
1160,95
The element before the comma should be the position on the x axis, and the element after the comma the height of the bar at that position. I can do this in Excel, but the data is large.
The graph I want would look like this.
I have tried the following, but I think it sums the data in each row.
data <- as.matrix(read.csv(file="data.csv",sep=",",header=FALSE))
barplot(data)

x <- read.csv(file = "data.csv", header = FALSE)
barplot(x$V2, names.arg = seq_len(nrow(x)), cex.names = .6)

Two things. First, if you supply the whole matrix to the height parameter of barplot, it stacks each column into a single bar, so the bar heights are column sums. Instead, give it only your data column:
dput(dat)
structure(c(0L, 40L, 80L, 120L, 160L, 200L, 240L, 280L, 320L,
360L, 400L, 440L, 480L, 520L, 560L, 600L, 640L, 680L, 720L, 760L,
800L, 840L, 880L, 920L, 960L, 1000L, 1040L, 1080L, 1120L, 1160L,
22L, 50L, 62L, 70L, 62L, 49L, 52L, 64L, 57L, 50L, 47L, 52L, 73L,
70L, 68L, 71L, 69L, 61L, 59L, 59L, 62L, 62L, 62L, 72L, 81L, 89L,
86L, 76L, 80L, 95L), .Dim = c(30L, 2L), .Dimnames = list(NULL,
c("V1", "V2")))
barplot(height=dat[,2])
Second, you need to supply names.arg to barplot to get the labels:
barplot(height=dat[,2], names.arg=dat[,1])
A side note: it's best to avoid naming variables after built-in R functions. data (see ?data) is probably the most commonly overwritten one! I regularly use dat instead.

Using your method of getting the data into R:
myData <- read.csv(file = "data.csv", sep = ",", header = FALSE)
To make sure that the order of the bars follows the order of the values in the first column (although this is not strictly what you asked for in your question):
myData2 <- myData[order(myData[, 1]), ]
barplot(myData2[, 2], names.arg = myData2[, 1])
For tweaking the graph, I recommend spending some time reading ?barplot and ?par.
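barplot() spaces bars evenly and uses names.arg only as labels. That is fine here because the positions step by 40, but if the positions were ever unevenly spaced, a swapped-in alternative is plot(..., type = "h"), which draws a vertical line up from 0 at each actual x value. A sketch using the first five rows of the question's data, read from inline text so it runs without the file (the real data would come from read.csv("data.csv", header = FALSE) as in the answers); the axis labels are assumptions.

```r
# First five rows of the question's data, read from inline text instead of a file
myData <- read.csv(text = "0,22
40,50
80,62
120,70
160,62", header = FALSE)

# Sort by position so the x axis runs left to right
myData <- myData[order(myData$V1), ]

# type = "h" draws a vertical line from 0 at each real x position;
# lwd widens the lines into bars and lend = 1 gives them flat (butt) ends
plot(myData$V1, myData$V2, type = "h", lwd = 10, lend = 1,
     xlab = "Position", ylab = "Coverage", ylim = c(0, max(myData$V2)))
```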
