Randomize one column's values based on multiple other columns in R

I have the following df:
structure(list(Donorcode = c("406A001", "406A002", "406A003",
"406A004"), Doos = c(1, 1, 2, 2), `Leeftijd T0` = c(70, 73, 79,
75), Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis",
"Spaarne ziekenhuis", "Spaarne ziekenhuis"), Datum = structure(c(1567468800,
1567468800, 1567468800, 1567468800), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
I need to randomize the column 'Donorcode' based on the other four columns. No column should weigh more than another, so the order in which the columns are used to randomize Donorcode should not matter.
Is there a way to do this in R?
Many thanks!
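A minimal sketch, assuming "not one column weighs more than the other" means every donor code should be equally likely to land on any row: a plain random permutation with base R's sample() then treats all the other columns equally. The stratified variant (shuffling only within each Doos) is an assumption about what "based on" could mean, and its column name Donorcode_by_doos is hypothetical.

```r
# Example data rebuilt from the question's dput() (data.frame instead of tibble)
df <- data.frame(
  Donorcode = c("406A001", "406A002", "406A003", "406A004"),
  Doos = c(1, 1, 2, 2),
  `Leeftijd T0` = c(70, 73, 79, 75),
  Instituut = rep("Spaarne ziekenhuis", 4),
  Datum = as.POSIXct(rep(1567468800, 4), origin = "1970-01-01", tz = "UTC"),
  check.names = FALSE, stringsAsFactors = FALSE
)
codes <- df$Donorcode  # keep the original codes

set.seed(1)  # arbitrary seed, only so the shuffle is reproducible
# Uniform random permutation: each row is equally likely to get any code,
# so none of the other columns influences the result more than another
df$Donorcode <- sample(codes)

# Hypothetical variant: shuffle codes only within each box (Doos)
df$Donorcode_by_doos <- ave(codes, df$Doos, FUN = sample)
df
```

If the codes must stay within some other grouping instead, pass those columns to ave() in place of df$Doos.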


group_by and lag not working for date in long format

I have long historical data in the format below (unbalanced). Because the data is released with a lag (on the next business day), I would like to record each date as of the day it actually happened. I tried to use dplyr as follows:
dataframe <- dataframe %>% group_by(date) %>% mutate(cob = lag(date, n = 1))
However, it just produces the same result as:
lag(date, 1)
date      name  value
2023/1/2  a     X
2023/1/2  b     X
2023/1/2  c     X
2023/1/3  a     X
2023/1/3  b     X
2023/1/4  a     X
2023/1/4  b     X
2023/1/5  a     X
2023/1/5  b     X
2023/1/5  c     X
I also thought about:
dataframe <- dataframe %>% group_by(name) %>% mutate(cob = lag(date, n = 1))
but it produces NA when there is no observation for a given sample.
mutate(cob = date - 1)
does not account for business days.
I would just like to shift every date in dataframe$date by one business day.
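If "one business day" is meant in the calendar sense, a minimal base R sketch can step back over weekends. The helper name prev_weekday is hypothetical, and it does not know about Japanese public holidays (that would need a holiday calendar). Note that for 2002-08-09 (a Friday) it returns 2002-08-08, whereas the expected outcome below uses the previous date that actually appears in the data (2002-08-07); the two interpretations only agree when no weekday is missing from the data.

```r
# Previous weekday (Mon-Fri) for each date; weekends are skipped,
# public holidays are NOT
prev_weekday <- function(d) {
  d <- as.Date(d)
  out <- d - 1
  wd <- as.POSIXlt(out)$wday          # 0 = Sunday, 6 = Saturday
  out[wd == 0] <- out[wd == 0] - 2    # Sunday -> Friday
  out[wd == 6] <- out[wd == 6] - 1    # Saturday -> Friday
  out
}

prev_weekday(c("2002-08-07", "2002-08-09", "2002-08-12"))
```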
Here is part of the actual data (historical prices of Japanese treasury bills):
structure(list(date = c("2002-08-06", "2002-08-06", "2002-08-07",
"2002-08-07", "2002-08-09", "2002-08-09"), code = c(2870075L,
3000075L, 2870075L, 3000075L, 2870075L, 3000075L), due_date = c("2002-08-20",
"2002-09-10", "2002-08-20", "2002-09-10", "2002-08-20", "2002-09-10"
), ave_price = c(99.99, 99.99, 99.99, 99.99, 99.99, 99.99)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
date = c("2002-08-06", "2002-08-07", "2002-08-09"), .rows = structure(list(
1:2, 3:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
The expected outcome is as follows:
structure(list(date = c("2002-08-06", "2002-08-06", "2002-08-07",
"2002-08-07", "2002-08-09", "2002-08-09"), code = c(2870075L,
3000075L, 2870075L, 3000075L, 2870075L, 3000075L), due_date = c("2002-08-20",
"2002-09-10", "2002-08-20", "2002-09-10", "2002-08-20", "2002-09-10"
), ave_price = c(99.99, 99.99, 99.99, 99.99, 99.99, 99.99), cob = c(NA,
NA, "2002-08-06", "2002-08-06", "2002-08-07", "2002-08-07")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
date = c("2002-08-06", "2002-08-07", "2002-08-09"), .rows = structure(list(
1:2, 3:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
Thank you very much in advance.
If I understand correctly, you want the previous date recorded in your date column as cob. So, your Aug 9 rows would have the previously recorded date of Aug 7 in your cob column.
If so, you could try the following. First, your example data above is grouped so I started with ungroup. You can get a vector of unique or distinct dates, and get the lag or previous date for those dates. In this case, dates of Aug 6, 7, and 9 will have cob set as NA, Aug 6, and Aug 7.
Then, you can join back to the original data with right_join. The final select keeps the columns you want, in the desired order.
I left date as it is (it is currently a character vector, not in Date format).
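library(tidyverse)
df %>%
  ungroup() %>%
  distinct(date) %>%              # one row per recorded date
  mutate(cob = lag(date)) %>%     # previous recorded date (NA for the first)
  right_join(df, by = "date") %>% # attach cob to every original row
  select(date, code, due_date, ave_price, cob)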
Output
date code due_date ave_price cob
<chr> <int> <chr> <dbl> <chr>
1 2002-08-06 2870075 2002-08-20 100. NA
2 2002-08-06 3000075 2002-09-10 100. NA
3 2002-08-07 2870075 2002-08-20 100. 2002-08-06
4 2002-08-07 3000075 2002-09-10 100. 2002-08-06
5 2002-08-09 2870075 2002-08-20 100. 2002-08-07
6 2002-08-09 3000075 2002-09-10 100. 2002-08-07
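The same "previous recorded date" lookup can also be sketched in base R without a join, using match() against the sorted distinct dates. This is an alternative technique, not the answer's code; it relies on the dates being ISO strings (YYYY-MM-DD), which sort chronologically as text.

```r
# Sample of the question's data (dates kept as character, as in the original)
df <- data.frame(
  date = c("2002-08-06", "2002-08-06", "2002-08-07",
           "2002-08-07", "2002-08-09", "2002-08-09"),
  code = c(2870075L, 3000075L, 2870075L, 3000075L, 2870075L, 3000075L),
  ave_price = 99.99,
  stringsAsFactors = FALSE
)

# Distinct recorded dates in chronological order
dates <- sort(unique(df$date))
# Previous recorded date for each distinct date (NA for the first one)
prev <- c(NA, head(dates, -1))
# Find each row's position among the distinct dates and take its previous date
df$cob <- prev[match(df$date, dates)]
df
```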

How to make two header rows in R Shiny?

I have a dataset whose column names share a common pattern: the end of each column name (after the comma) gives the group the column belongs to. Is it possible to create a table with two header rows? For example, the first header row would show Up, and the second row beneath it would show Quantity, Price, Quality. Is it also possible to separate the three groups (Up, Down and Total) with an empty column or a bold border? I know the DT library makes this easier, but I am looking for a solution using the shiny library only.
I also found that tags$style() and CSS could help to solve it, but I am not familiar with CSS.
library(shiny)
df <- structure(list(Year = c(2022L, 2022L, 2022L, 2022L, 2022L),
Week = c(0, 1, 2, 3, 4),
`Quantity, Up` = c(335, 305, 740, 910, 515),
`Price, Up` = c(1, 2, 5, 5, 3),
`Quality, Up` = c(243, 243, 243, 12321.434052, 1706.327462),
`Quantity, Down` = c(-175, -900, -205, -35, 0),
`Price, Down` = c(243, 243, 1219.717851, 902.819827, 0),
`Quality, Down` = c(2834.205418, 243, -1219.717851, 902.819827, 0),
`Quantity, Total` = c(510, 1205, 945, 945, 515),
`Price, Total` = c(431, 32, 423, 342, 243),
`Quality, Total` = c(24, 4, -2, 42, 1706.327462)),
class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L),
groups = structure(list(Year = 2022L, .rows = structure(list(1:5), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", "list"))),
class = c("tbl_df", "tbl", "data.frame" ), row.names = c(NA, -1L), .drop = TRUE))
ui <- fluidPage(
# Application title
titlePanel("aFRR"),
# plot graphs
mainPanel(tabsetPanel(
tabPanel("Up",
h3(helpText("aFRR Price and Quantity")),
tableOutput("table_up"))
)
)
)
server <- function(input, output, session) {
output$table_up <- renderTable(df, na = "missing",align = 'c',striped = TRUE
)
}
shinyApp(ui, server)
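A minimal sketch of a shiny-only approach: instead of renderTable(), build the HTML table yourself with shiny's tags helpers and render it with renderUI()/uiOutput(). The helper two_header_table(), its id_cols argument and the CSS class names are hypothetical. It assumes value columns are named "Measure, Group" and that each group's columns are adjacent, as in your df. Group header cells use colspan, Year and Week use rowspan = 2, and a few lines of CSS in tags$style() draw a bold left border at the start of each group.

```r
library(shiny)

# Two rows taken from the question's df (enough to illustrate)
df <- data.frame(
  Year = c(2022L, 2022L), Week = c(0, 1),
  `Quantity, Up` = c(335, 305), `Price, Up` = c(1, 2), `Quality, Up` = c(243, 243),
  `Quantity, Down` = c(-175, -900), `Price, Down` = c(243, 243),
  `Quality, Down` = c(2834.205418, 243),
  `Quantity, Total` = c(510, 1205), `Price, Total` = c(431, 32),
  `Quality, Total` = c(24, 4),
  check.names = FALSE
)

# HTML table with a two-row header: group on top, measure underneath
two_header_table <- function(data, id_cols = c("Year", "Week")) {
  data <- as.data.frame(data)  # drops dplyr grouping if present
  value_cols <- setdiff(names(data), id_cols)
  data <- data[c(id_cols, value_cols)]
  parts <- strsplit(value_cols, ", ", fixed = TRUE)
  measure <- vapply(parts, `[`, character(1), 1)
  group <- vapply(parts, `[`, character(1), 2)
  runs <- rle(group)  # consecutive columns sharing a group

  # TRUE for the first column of each group; those cells get a left border
  starts <- c(TRUE, group[-1] != group[-length(group)])
  sep <- c(rep(FALSE, length(id_cols)), starts)
  cls <- function(j) if (sep[j]) "grp-start" else NULL

  header <- tags$thead(
    tags$tr(
      lapply(id_cols, function(x) tags$th(x, rowspan = 2)),
      lapply(seq_along(runs$values), function(k)
        tags$th(runs$values[k], colspan = runs$lengths[k], class = "grp-start"))
    ),
    tags$tr(lapply(seq_along(measure), function(k)
      tags$th(measure[k], class = cls(length(id_cols) + k))))
  )

  cells <- lapply(data, format)  # format each column once for consistent digits
  body <- tags$tbody(lapply(seq_len(nrow(data)), function(i)
    tags$tr(lapply(seq_along(cells), function(j)
      tags$td(cells[[j]][i], class = cls(j))))))

  tags$table(class = "table table-striped two-header", header, body)
}

ui <- fluidPage(
  # Plain CSS: centre cells and draw a bold border before each group
  tags$style(HTML("
    .two-header th, .two-header td { text-align: center; }
    .two-header .grp-start { border-left: 3px solid #333; }
  ")),
  titlePanel("aFRR"),
  mainPanel(tabsetPanel(
    tabPanel("Up",
             h3(helpText("aFRR Price and Quantity")),
             uiOutput("table_up"))
  ))
)

server <- function(input, output, session) {
  output$table_up <- renderUI(two_header_table(df))
}

if (interactive()) shinyApp(ui, server)
```

Calling as.character(two_header_table(df)) is a quick way to inspect the generated HTML outside the app.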

Spatial data: calculating the distance of points from the maximum point value and plotting

My question is similar to this post where the distance between each point was calculated.
In my case, I am looking to find the distance of each point to the point with the highest value. I would also like to plot this relationship with lm(), but I am struggling to achieve both tasks with spatial data objects.
My data does not need a CRS; the distances are Euclidean (because these points are in a room).
Below is a mock example of the data, where the column variable is the one of interest.
> dput(dat)
structure(list(date.hour = structure(c(1551057840, 1551057840,
1551057840, 1551057840, 1551057840, 1551057840, 1551057840), tzone = "UTC", class = c("POSIXct",
"POSIXt")), id = c(2, 5, 7, 8, 9, 10, 11), variable = c(456,
27, 130, 116, 92, 141, 145), xy_coord = c("6.2 14.8", "8.2 8.9",
"4.2 8.9", "2.2 8.9", "8.2 3.5", "6.2 3.5", "4.2 3.5")), row.names = c(NA,
-7L), groups = structure(list(id = c(2, 5, 7, 8, 9, 10, 11),
date.hour = structure(c(1551057840, 1551057840, 1551057840,
1551057840, 1551057840, 1551057840, 1551057840), tzone = "UTC", class = c("POSIXct",
"POSIXt")), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L,
7L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
> dat
# A tibble: 7 x 4
# Groups: id, date.hour [7]
date.hour id variable xy_coord
<dttm> <dbl> <dbl> <chr>
1 2019-02-25 01:24:00 2 456 6.2 14.8
2 2019-02-25 01:24:00 5 27 8.2 8.9
3 2019-02-25 01:24:00 7 130 4.2 8.9
4 2019-02-25 01:24:00 8 116 2.2 8.9
5 2019-02-25 01:24:00 9 92 8.2 3.5
6 2019-02-25 01:24:00 10 141 6.2 3.5
7 2019-02-25 01:24:00 11 145 4.2 3.5
>
Turning the data frame into a SpatialPointsDataFrame with the sp package:
library(sp)
# Split x and y into separate numeric columns
xy <- strsplit(as.character(dat$xy_coord), " ")
dat$x <- as.numeric(sapply(xy, "[", 1))
dat$y <- as.numeric(sapply(xy, "[", 2))
# Drop the tibble/grouping classes, then promote to a SpatialPointsDataFrame
dat <- as.data.frame(dat)
coordinates(dat) <- ~x+y
This is where I'm stuck. I want the distance from every point to the point with the highest value, which I can locate with:
which.max(dat@data$variable)
and then I want to plot this relationship with base plot().
If my question is unclear please let me know.
I'm still not sure I understand your question but I propose the following answer.
Load packages
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(tidyr)
Load data
dat = structure(
list(
date.hour = structure(
c(
1551057840, 1551057840, 1551057840, 1551057840, 1551057840,
1551057840, 1551057840
),
tzone = "UTC",
class = c(
"POSIXct",
"POSIXt"
)
),
id = c(2, 5, 7, 8, 9, 10, 11),
variable = c(
456, 27, 130, 116, 92, 141, 145
),
xy_coord = c(
"6.2 14.8", "8.2 8.9", "4.2 8.9", "2.2 8.9", "8.2 3.5", "6.2 3.5",
"4.2 3.5"
)
),
row.names = c(NA,-7L),
groups = structure(
list(
id = c(2, 5, 7, 8, 9, 10, 11),
date.hour = structure(
c(
1551057840, 1551057840, 1551057840, 1551057840, 1551057840,
1551057840, 1551057840
),
tzone = "UTC",
class = c(
"POSIXct",
"POSIXt"
)
),
.rows = structure(
list(1L, 2L, 3L, 4L, 5L, 6L, 7L),
ptype = integer(0),
class = c(
"vctrs_list_of", "vctrs_vctr", "list"
)
)
),
row.names = c(NA, -7L),
class = c("tbl_df", "tbl", "data.frame"),
.drop = TRUE
),
class = c("grouped_df", "tbl_df", "tbl", "data.frame")
)
Separate the xy_coord column, convert columns to numeric and create an sf object
dat_sf <- st_as_sf(
separate(dat, xy_coord, c("x", "y"), sep = " ", convert = TRUE),
coords = c("x", "y")
)
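Find the index of the maximum of variable
which.max(dat_sf[["variable"]])
#> [1] 1
Compute the distance from every point to that point (as.numeric() turns the one-column matrix returned by st_distance() into a plain vector)
dat_sf[["distances"]] <- as.numeric(st_distance(dat_sf, dat_sf[which.max(dat_sf[["variable"]]), ]))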
Plot
plot(variable ~ distances, data = dat_sf)
Created on 2021-11-22 by the reprex package (v2.0.1)
You can also remove the first point (the reference point itself, with distance 0) before plotting.
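Since the question also asks about lm(), here is a sketch that skips the spatial classes entirely and computes the Euclidean distances in base R, which is enough when no CRS is involved. It drops the reference point (distance 0) before fitting; whether to keep it is a modelling choice.

```r
# Values and coordinates copied from the question's dat
dat <- data.frame(
  id = c(2, 5, 7, 8, 9, 10, 11),
  variable = c(456, 27, 130, 116, 92, 141, 145),
  xy_coord = c("6.2 14.8", "8.2 8.9", "4.2 8.9", "2.2 8.9",
               "8.2 3.5", "6.2 3.5", "4.2 3.5"),
  stringsAsFactors = FALSE
)

# Split "x y" strings into numeric columns
xy <- strsplit(dat$xy_coord, " ", fixed = TRUE)
dat$x <- as.numeric(vapply(xy, `[`, character(1), 1))
dat$y <- as.numeric(vapply(xy, `[`, character(1), 2))

# Euclidean distance of every point to the point with the highest value
i_max <- which.max(dat$variable)
dat$distance <- sqrt((dat$x - dat$x[i_max])^2 + (dat$y - dat$y[i_max])^2)

# Fit and plot variable ~ distance, leaving out the reference point itself
others <- dat[-i_max, ]
fit <- lm(variable ~ distance, data = others)
plot(variable ~ distance, data = others)
abline(fit)
summary(fit)
```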

map and mutate over a list of tbl_df

I am trying to map over a list of data frames in R but not getting it right. What I am trying is:
lst %>%
map(~mutate(., NewColumn1 = .x$value*2,))
With error:
Error: Column NewColumn1 must be length 2 (the number of rows) or one, not 0
In addition: Warning message: Unknown or uninitialised column: 'value'.
The data looks like:
[[9]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-23 CAT1 790. 0
2 2016-12-27 CAT1 792. 1
[[10]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-28 CAT1 785. 0
2 2016-12-29 CAT1 783. 0
DATA:
Data <- list(structure(list(time = structure(c(17136, 17137), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(747.919983, 750.5), out = c(0,
1)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(time = structure(c(17140, 17141), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(762.52002, 759.109985),
out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17142,
17143), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(771.190002,
776.419983), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17144,
17147), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.289978,
789.27002), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17148,
17149), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(796.099976,
797.070007), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17150,
17151), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(797.849976,
790.799988), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17154,
17155), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.200012,
796.419983), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17156,
17157), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.559998,
791.26001), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17158,
17162), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.909973,
791.549988), out = c(0, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17163,
17164), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(785.049988,
782.789978), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")))
Take a look at the error message Unknown or uninitialised column: 'value'., then look at your code map(Data, ~mutate(., NewColumn1 = .x$value*2,)). The column name is Value and not value (case is important!).
Your syntax can also be cleaned up a bit. Try map(Data, ~mutate(., NewColumn1 = Value*2)). In a purrr formula lambda, . and .x refer to the same thing (the current list element), but it's better to be consistent. Inside mutate you also don't need to subset the data frame: mutate(df, new_col = old_col) is enough; you don't need mutate(df, new_col = .$old_col).
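A small runnable check of the fix, using two of the tibbles from Data. The .data pronoun (from dplyr/rlang) is an optional extra: it makes the column lookup explicit, and a misspelt name such as .data$value makes mutate() fail with an error naming the missing column, instead of the warning plus length error produced by .x$value.

```r
library(dplyr)
library(purrr)

# Two of the tibbles from the question's Data list
Data <- list(
  tibble(time = as.Date(c(17136, 17137), origin = "1970-01-01"), ID = "CAT1",
         Value = c(747.919983, 750.5), out = c(0, 1)),
  tibble(time = as.Date(c(17140, 17141), origin = "1970-01-01"), ID = "CAT1",
         Value = c(762.52002, 759.109985), out = c(1, 0))
)

# Bare column name: mutate() finds it in the current tibble (case-sensitive)
res <- map(Data, ~ mutate(.x, NewColumn1 = Value * 2))

# Same result with the explicit .data pronoun
res2 <- map(Data, ~ mutate(.x, NewColumn1 = .data$Value * 2))

# A misspelt name through .data raises an error we can inspect
typo <- tryCatch(
  map(Data, ~ mutate(.x, NewColumn1 = .data$value * 2)),
  error = function(e) e
)
res[[1]]
```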

Build a dataframe of nested tibbles in R?

I have a couple of tibbles:
1:
structure(list(contacts = c(151, 2243, 4122, 6833, 76, 123)), .Names = "contacts", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
2:
structure(list(image_names = c("/storage/emulated/0/Pictures/1.png",
"/storage/emulated/0/Pictures/10.png", "/storage/emulated/0/Pictures/2.png",
"/storage/emulated/0/Pictures/3.png", "/storage/emulated/0/Pictures/4.png",
"/storage/emulated/0/Pictures/5.png")), .Names = "image_names", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
3:
structure(list(phone_number = c(22881, 74049, 74049, 22881, 22881,
22881), isInContact = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
callDuration = c(1, 0, 0, 71, 13, 54), Date = structure(c(17689,
17689, 17689, 17690, 17690, 17690), class = "Date"), Time = structure(c(76180,
77415, 84620, 27900, 28132, 29396), class = c("hms", "difftime"
), units = "secs")), .Names = c("phone_number", "isInContact",
"callDuration", "Date", "Time"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
For each set of these data frames I also have an identifier, say a UUID.
I want to build one large data frame where the identifier column holds the user's UUID and all other columns hold nested tibbles:
UUID contacts images call_logs
123 <tibble> <tibble> <tibble>
456 <tibble> <tibble> <tibble>
Please advise how I can build such a thing; I have been trying map_dfr without luck.
We could place the tibbles (tbl1, tbl2 and tbl3 being the three tibbles above) in a list to create a single row:
library(dplyr)
tblN <- tibble(contacts = list(tbl1), images = list(tbl2),
call_logs = list(tbl3))
It is not clear whether the same data should be replicated for different UUIDs. If it should:
list(`123` = tblN, `456` = tblN) %>%
bind_rows(.id = 'UUID')
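If every UUID has its own set of tibbles rather than a replicated one, one way is to keep them in a named list (names = UUIDs) and build the list-columns directly with purrr::map(), which avoids map_dfr altogether. The users list and its contents below are hypothetical, made from small slices of the question's data.

```r
library(dplyr)
library(purrr)

# Hypothetical per-user inputs: names are UUIDs, each element holds
# that user's three tibbles (replace with however you load them)
users <- list(
  `123` = list(contacts = tibble(contacts = c(151, 2243)),
               images = tibble(image_names = "/storage/emulated/0/Pictures/1.png"),
               call_logs = tibble(phone_number = 22881, callDuration = 1)),
  `456` = list(contacts = tibble(contacts = c(6833, 76)),
               images = tibble(image_names = "/storage/emulated/0/Pictures/2.png"),
               call_logs = tibble(phone_number = 74049, callDuration = 0))
)

# One row per user; each tibble goes into a list-column so it stays nested
nested <- tibble(
  UUID = names(users),
  contacts = map(users, "contacts"),
  images = map(users, "images"),
  call_logs = map(users, "call_logs")
)
nested
```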
