I have a named vector col:
col <- c(id = "CLUSTER", x = "LONGNUM", y = "LATNUM", n = "Severely.stunted.child.under.5.years..Total", pos = "Severely.stunted.child.under.5.years.Yes")
a list of data frames (Results) with the elements of col as columns, each named after a specific study area (see dput below). The list Results contains 19 different files (different years)
a list of shapefiles with 6 elements (the corresponding countries):
study_area <- c("Ethiopia", "Liberia", "Malawi", "Rwanda", "Uganda", "Zimbabwe")
Countries <- lapply(study_area, function(x) {gisco_get_countries(country = x, resolution = 60)})
Countries <- lapply(Countries, function(x) {as_Spatial(x, cast = TRUE, IDs = c("CNTR_NAME", "ISO3_CODE", "CNTR_ID", "NAME_ENGL", "FID"))})
names(Countries) <- study_area
I would like to perform this function from the prevR library:
as.prevR(Results[[1]], col, Countries[[1]])
But I actually want to do this for every element of both lists where the names match.
I tried something like: Map(function(x, y) {as.prevR(x, col, y)}, Results, Countries)
But that (obviously) does not match x and y by name.
dput(Results[[1]][1:5, 1:24])
structure(list(CLUSTER = c("", "1", "10", "100", "101"), Severely.stunted.child.under.5.years.No = c(3438,
8, 7, 9, 6), Severely.stunted.child.under.5.years.Yes = c(1047,
4, NA, 7, 1), Severely.stunted.child.under.5.years..Total = c(4485,
12, 7, 16, 7), Stunted.child.under.5.years.No = c(2531, 2, 7,
7, 5), Stunted.child.under.5.years.Yes = c(1954, 10, NA, 9, 2
), Stunted.child.under.5.years..Total = c(4485, 12, 7, 16, 7),
Severely.wasted.child.under.5.years.No = c(4295, 11, 7, 16,
7), Severely.wasted.child.under.5.years.Yes = c(190, 1, NA,
NA, NA), Severely.wasted.child.under.5.years..Total = c(4485,
12, 7, 16, 7), Wasted.child.under.5.years.No = c(3957, 10,
7, 16, 6), Wasted.child.under.5.years.Yes = c(528, 2, NA,
NA, 1), Wasted.child.under.5.years..Total = c(4485, 12, 7,
16, 7), Severely.underweight.child.under.5.years.No = c(4028,
10, 7, 12, 7), Severely.underweight.child.under.5.years.Yes = c(457,
2, NA, 4, NA), Severely.underweight.child.under.5.years..Total = c(4485,
12, 7, 16, 7), Underweight.child.under.5.years.No = c(3185,
7, 7, 12, 5), Underweight.child.under.5.years.Yes = c(1300,
5, NA, 4, 2), Underweight.child.under.5.years..Total = c(4485,
12, 7, 16, 7), LATNUM = c(NA, 10.889096, 5.323272, 8.830199,
10.806748), LONGNUM = c(NA, 37.269565, 39.556812, 40.72964,
39.7703), SurveyId = c("ET2005DHS", "ET2005DHS", "ET2005DHS",
"ET2005DHS", "ET2005DHS"), DHSC = c("ET", "ET", "ET", "ET",
"ET"), Country = c("Ethiopia", "Ethiopia", "Ethiopia", "Ethiopia",
"Ethiopia")), row.names = c(NA, 5L), class = "data.frame")
and Countries
dput(Countries[[1]])
new("SpatialPolygonsDataFrame", data = structure(list(CNTR_NAME = "Federal Democratic Republic of Ethiopia",
ISO3_CODE = "ETH", CNTR_ID = "ET", NAME_ENGL = "Ethiopia",
FID = "ET"), class = "data.frame", row.names = 1L), polygons = list(
new("Polygons", Polygons = list(new("Polygon", labpt = c(39.6420582930584,
8.63562315843106), area = 93.13026982, hole = FALSE, ringDir = 1L,
coords = structure(c(41.6307, 42.4043, 41.816, 41.8348,
42.9681, 42.7628, 42.9804, 43.9589, 45.6126, 46.9411,
47.8524, 45.6126, 45.4747, 45.2923, 44.9162, 43.4741,
42.8138, 41.9101, 41.2328, 40.708, 39.9305, 39.5667,
38.9731, 38.1026, 36.9621, 35.9477, 35.8294, 35.3235,
35.0325, 34.9588, 34.5428, 33.7557, 33.0448, 33.2485,
33.8204, 34.0937, 34.1132, 34.4181, 34.8021, 35.2153,
35.6227, 36.1342, 36.5603, 37.2972, 37.5268, 37.9201,
38.5391, 39.0217, 40.0851, 40.8941, 41.6307, 13.3913,
12.4686, 11.6292, 11.0448, 10.9974, 10.7159, 10.0644,
9.0545, 8.4674, 8.0224, 7.9151, 5.5657, 5.4241, 5.2367,
4.9368, 4.7993, 4.301, 3.9823, 3.9616, 4.2326, 3.8858,
3.5224, 3.5158, 3.6459, 4.3833, 4.62, 5.2367, 5.413,
5.8494, 6.4537, 6.7418, 7.6074, 7.899, 8.381, 8.4168,
8.6026, 9.4986, 10.6735, 10.8052, 11.9187, 12.5064, 12.8315,
14.2577, 14.3876, 14.2588, 14.8128, 14.4413, 14.5899,
14.5456, 14.0891, 13.3913), dim = c(51L, 2L)))), plotOrder = 1L,
labpt = c(39.6420582930584, 8.63562315843106), ID = "1",
area = 93.13026982)), plotOrder = 1L, bbox = structure(c(33.0448,
3.5158, 47.8524, 14.8128), dim = c(2L, 2L), dimnames = list(c("x",
"y"), c("min", "max"))), proj4string = new("CRS", projargs = "+proj=longlat +datum=WGS84 +no_defs"))
If the names of Countries all occur among the names of Results (and Results may contain duplicate names), we can make Countries the same length as Results by replicating its elements based on the names of Results:
Map(function(x, y) { as.prevR(x, col, y)}, Results, Countries[names(Results)])
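For context, this works because subsetting a named list with a character vector that contains duplicated names simply repeats the matching elements, so Countries[names(Results)] yields one (possibly repeated) country per element of Results. A minimal sketch with hypothetical objects (not the real Results/Countries):
# hypothetical mini example: indexing a named list by a character vector
# with duplicates repeats the matching elements in that order
Countries_demo <- list(Ethiopia = "ETH polygon", Liberia = "LBR polygon")
result_names  <- c("Ethiopia", "Ethiopia", "Liberia")  # e.g. several survey years per country
Countries_demo[result_names]  # a list of length 3: Ethiopia, Ethiopia, Liberia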
I have a dataframe with approximately 3 million rows. Each row is assigned a unique ID and has up to 4 dates. I wish to create a set of new columns for month and year (e.g. Jan-21, Feb-21, Mar-21, etc.) and assign a value of 0 for each month/year prior to the first date, then a value of 1 from the month/year containing the first date onward, with the value incrementing again at the month/year of the 2nd, 3rd and 4th dates (see the desired output below).
I understand that it's easier to help me with examples, so I have put together this dput output with an example of what my current data looks like:
structure(list(id = c(1, 2, 3, 4, 5), date1 = structure(c(1623801600,
1615420800, 1654560000, 1620259200, 1615248000), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), date2 = structure(c(1629158400, 1621987200,
1658448000, 1623974400, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
date3 = structure(c(NA, 1630454400, 1662076800, 1647907200,
NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), date4 = structure(c(NA,
1639008000, NA, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
And this is what I would like it to look like:
structure(list(id = c(1, 2, 3, 4, 5), `Mar-21` = c(0, 1, 0, 0,
1), `Apr-21` = c(0, 1, 0, 0, 1), `May-21` = c(0, 2, 0, 1, 1),
`Jun-21` = c(1, 2, 0, 2, 1), `Jul-21` = c(1, 2, 0, 2, 1),
`Aug-21` = c(2, 2, 0, 2, 1), `Sep-21` = c(2, 3, 0, 2, 1),
`Oct-21` = c(2, 3, 0, 2, 1), `Nov-21` = c(2, 3, 0, 2, 1),
`Dec-21` = c(2, 4, 0, 2, 1), `Jan-22` = c(2, 4, 0, 2, 1),
`Feb-22` = c(2, 4, 0, 2, 1), `Mar-22` = c(2, 4, 0, 3, 1),
`Apr-22` = c(2, 4, 0, 3, 1), `May-22` = c(2, 4, 0, 3, 1),
`Jun-22` = c(2, 4, 1, 3, 1), `Jul-22` = c(2, 4, 2, 3, 1),
`Aug-22` = c(2, 4, 2, 3, 1), `Sep-22` = c(2, 4, 3, 3, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
Just a note that I have this dataset in both wide and long format, in case using it in a long format makes more sense.
Thank you!
This was a fun exercise! I'm sure there are a billion ways to do this more efficiently, but I think this works and it was a fun puzzle for me. I first put the dates into long format to get a min and max, then made a monthly sequence between those dates. I used expand.grid() to make all combinations of the months with each ID and joined that to the original data frame. Then I simply counted how many of date1:date4 fell on or before each month in the sequence; I had to use floor_date() to move date1:date4 to the first of the month. Hopefully this helps!
library(dplyr)
library(lubridate)
library(tidyr)

# long format to find the overall date range
dat2 <- dat %>%
  tidyr::pivot_longer(cols = -id, values_drop_na = TRUE)

dat_min_max <- data.frame(Min = min(dat2$value), Max = max(dat2$value))

# monthly sequence spanning the whole range
month_seq <- seq(dat_min_max$Min, dat_min_max$Max + months(1), by = "month")

dat3 <- dat %>%
  # move each date to the first of its month so it can be compared to month_seq
  mutate(date1 = floor_date(date1, "month"),
         date2 = floor_date(date2, "month"),
         date3 = floor_date(date3, "month"),
         date4 = floor_date(date4, "month")) %>%
  # one row per id and month (expand.grid names the columns Var1 and Var2)
  left_join(expand.grid(dat$id, month_seq), by = c("id" = "Var1")) %>%
  rowwise() %>%
  # count how many of the four dates fall on or before this month
  mutate(c = sum(date1 <= Var2, date2 <= Var2, date3 <= Var2, date4 <= Var2, na.rm = TRUE)) %>%
  mutate(Var2 = format(Var2, "%b-%y")) %>%
  select(-date1, -date2, -date3, -date4) %>%
  tidyr::pivot_wider(names_from = Var2, values_from = c)
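As a side note, the rowwise() step can be avoided by counting the dates with rowSums() over the logical comparisons. This is just a sketch, assuming the same dat and month_seq objects created above (dat_alt and mon are names introduced only for this example):
library(dplyr)
library(lubridate)
library(tidyr)

dat_alt <- expand.grid(id = dat$id, mon = month_seq) %>%
  left_join(dat, by = "id") %>%
  # count how many of date1:date4 fall on or before each month
  mutate(count = rowSums(across(date1:date4, ~ floor_date(.x, "month") <= mon),
                         na.rm = TRUE)) %>%
  mutate(mon = format(mon, "%b-%y")) %>%
  select(id, mon, count) %>%
  pivot_wider(names_from = mon, values_from = count)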
I have a dataset consisting of leaves which I've weighed individually in order of emergence (first emerged through final emergence), and I'd like to combine these masses so that I have the entire mass of all the leaves for each individual plant.
How would I add these up using the R programming language, or what would I need to google to get started on figuring this out?
structure(list(Tray = c(1, 1, 1, 1, 1, 1), Plant = c(2, 2, 2,
2, 3, 3), Treatment = structure(c(4L, 4L, 4L, 4L, 4L, 4L), .Label = c("2TLH",
"E2TL", "EH", "WL"), class = "factor"), PreSwitch = c("Soil",
"Soil", "Soil", "Soil", "Soil", "Soil"), PostSwitch = c("Soil",
"Soil", "Soil", "Soil", "Soil", "Soil"), Pellet = c(1, 1, 1,
1, 1, 1), Rep = c(1, 1, 1, 1, 1, 1), Date = structure(c(1618963200,
1618963200, 1618963200, 1618963200, 1618963200, 1618963200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), DAP = c(60, 60, 60, 60, 60, 60), Position = c(2,
1, 3, 4, 4, 3), Whorl = structure(c(1L, 1L, 2L, 2L, 2L, 2L), .Label = c("1",
"2", "3", "4", "5"), class = "factor"), PetioleLength = c(1.229,
1.365, 1.713, 1.02, 0, 1.408), BladeLength = c(1.604, 1.755,
2.466, 2.672, 0.267, 2.662), BladeWidth = c(1.023, 1.185, 1.803,
1.805, 0.077, 1.771), BladeArea = c(1.289, 1.634, 3.492, 3.789,
0.016, 3.704), BladePerimeter = c(6.721, 7.812, 11.61, 12.958,
1.019, 14.863), BladeCircularity = c(0.359, 0.336, 0.326, 0.284,
0.196, 0.211), BPR = c(1.30512611879577, 1.28571428571429, 1.43957968476357,
2.61960784313725, NA, 1.890625), Leaf.Mass = c(9, 11, 31, 33,
32, 33), BladeAR = c(1.56793743890518, 1.48101265822785, 1.36772046589018,
1.4803324099723, 3.46753246753247, 1.50310559006211), Subirrigation = c(0,
0, 0, 0, 0, 0), Genotype = c(1, 1, 1, 1, 1, 1), Location = c(0,
0, 0, 0, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I may be missing something but isn't this a sum by Plant?
The first solution below sums it for each plant into a separate table with just the totals; the second summarizes and adds it back to the main data set in a single step.
library(tidyverse)
#summary data set
plant_total <- df %>% group_by(Plant) %>% summarize(plant_weight = sum(Leaf.Mass, na.rm = TRUE))
#add the plant_weight column to the df data set
df <- df %>% group_by(Plant) %>% mutate(plant_weight = sum(Leaf.Mass, na.rm = TRUE))
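If you prefer base R, aggregate() gives the same per-plant totals; this is just a sketch assuming the same df as above (add Tray to the formula if plant numbers restart within each tray):
# base R sketch: one total Leaf.Mass per Plant
plant_total_base <- aggregate(Leaf.Mass ~ Plant, data = df, FUN = sum)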
I am having issues with the ICC function from the psych package in R. We had three technicians (AA, AB, AC) who each measured 11 control solutions three times. We know the true values for these solutions (F_Exp). The three measurements were averaged, giving AA_avg, AB_avg and AC_avg.
I am trying to calculate the inter-rater reliability of these three technicians (it reflects the variation between two or more raters who measure the same group of subjects). I am planning to use ICC(2,1).
When I try to run
ICC(try[3:5]) # n*p matrix where n=subjects, p=raters.
I get the following results:
I am not sure what to do; I am feeding the data as instructed. When I do it with icc from the irr package, which is more particular about the format of the data, I get:
And an ICC of 0.999998 seems too good to be true. I would really appreciate any help. Thank you!
Here is the structure of my data:
try<-structure(list(Input = c(1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
F_Exp = c(3, 100, 1, 40, 4, 40, 4, 40, 1, 40, 100), AA_avg = c(3.11666666666667,
103.716666666667, 1, 40.8333333333333, 4.18333333333333,
40.8666666666667, 4.18333333333333, 40.9166666666667, 1.03333333333333,
40.9333333333333, 103.783333333333), AB_avg = c(3.25, 103.016666666667,
1.13333333333333, 40.8333333333333, 3.94666666666667, 40.45,
4.28333333333333, 41.1166666666667, 1.05, 40.9166666666667,
104), AC_avg = c(3.2, 103.55, 1.23333333333333, 40.9, 4.26666666666667,
40.4, 4.28333333333333, 40.9, 1.05, 40.95, 103.733333333333
), ALL_avg = c(3.18888888888889, 103.427777777778, 1.12222222222222,
40.8555555555556, 4.13222222222222, 40.5722222222222, 4.25,
40.9777777777778, 1.04444444444444, 40.9333333333333, 103.838888888889
), AA_error = c(-0.116666666666667, -3.71666666666667, 0,
-0.833333333333336, -0.183333333333334, -0.866666666666667,
-0.183333333333334, -0.916666666666664, -0.0333333333333334,
-0.93333333333333, -3.78333333333333), AB_error = c(-0.25,
-3.01666666666667, -0.133333333333333, -0.833333333333336,
0.0533333333333332, -0.450000000000003, -0.283333333333333,
-1.11666666666667, -0.05, -0.916666666666664, -4), AC_error = c(-0.2,
-3.55, -0.233333333333333, -0.899999999999999, -0.266666666666667,
-0.399999999999999, -0.283333333333333, -0.899999999999999,
-0.05, -0.950000000000003, -3.73333333333333)), row.names = c(NA,
-11L), groups = structure(list(Input = c(1, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Your raters' scores are negligibly different, both in their average rating and in their ratings for each id. The model can't estimate random-intercept variation if there isn't any. Why don't you believe that your ICC really is that high?
Between-rater means:
lapply(try[, 3:5], mean)
$AA_avg
[1] 34.96061
$AB_avg
[1] 34.90879
$AC_avg
[1] 34.95152
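If you want to see how little the raters disagree on each individual solution, one quick check (a sketch, assuming the try object from the dput above; ratings is a name introduced here) is the per-row spread across the three raters:
# per-solution standard deviation across the three raters
ratings <- as.data.frame(try)[, c("AA_avg", "AB_avg", "AC_avg")]
apply(ratings, 1, sd)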
I'm doing propensity score matching and want to subset the data for treatment and control using weights. There are 5 variables: ID, Treatment (yes/no), Outcome (yes/no), Age and Weight. I was trying to write a program in R, but have problems doing this according to the weights. The survey package is used.
dput(dat2):
structure(list(ID = c(1, 2, 3, 4, 6, 7),
Weight = c(2.4740626, 2.4740626, 2.4740626, 2.4740626, 1.9548149, 1.9548149),
Age = c("35-44", "<15-24", "25-34", "35-44", ">45", "25-34"),
Treatment = c(1, 0, 0, 1, 0, 0),
Outcome = c(1, 1, 1, 0, 1, 1)),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame")))
head(dat2):
data<-svydesign(ids = ~dat2$Id,
weights = ~dat2$Weight,
data = dat2)
treat<-subset(dat, dat2$treatment==1)
cont<-subset(dat, dat2$treatment==0)
I am sharing a sample of the data; the full dataset has 1587 rows. When I find the dimensions without weights, treat and cont are 877 × 5 and 710 × 5 respectively, but with weights they come out as 803 × 5 and 784 × 5.
Please help me.
Thanks in advance.
One way to do this is as below:
Sample Data
dat2 <- structure(list(ID = c(1, 2, 3, 4, 6, 7),
Weight = c(2.4740626, 2.4740626, 2.4740626, 2.4740626, 1.9548149, 1.9548149),
Age = c("35-44", "<15-24", "25-34", "35-44", ">45", "25-34"),
Treatment = c(1, 0, 0, 1, 0, 0),
Outcome = c(1, 1, 1, 0, 1, 1)),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame"))
Script
data<-svydesign(ids = ~dat2$ID,
weights = ~dat2$Weight,
data = dat2)
treat<-subset(data, Treatment==1)
cont<-subset(data, Treatment==0)
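To verify the split, you can compare the unweighted row counts of the two subsets with the weighted counts per arm. A short sketch, assuming the data, treat and cont objects created above (svytable() from the survey package tabulates using the design weights):
library(survey)

dim(treat)                            # unweighted rows/variables in the treated design
dim(cont)                             # unweighted rows/variables in the control design
svytable(~Treatment, design = data)   # weighted counts per treatment arm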