How to convert dataframe rows to variables in R - r

If I have a 2 column, 4 row data frame such as:
structure(list(var = c("url.loc", "radius", "jt", "post.age"),
value = c("london", "25", "fulltime", "7")), row.names = c(NA,
-4L), spec = structure(list(cols = list(var = structure(list(), class = c("collector_character",
"collector")), value = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x60000323fee0>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
... how would I go about converting this to 4 variables so the result would be the same as:
url.loc <- "london"
radius <- "25"
jt <- "fulltime"
post.age <- "7"
I can think of using assign and a loop, but I'm thinking there's a more elegant way, perhaps using NSE.
thanks

We may create a named list and use list2env
library(tibble)
list2env(as.list(deframe(df1)), .GlobalEnv)
-checking
> url.loc
[1] "london"
> radius
[1] "25"
A compact option is with %=% from collapse
library(collapse)
df1$var %=% df1$value
-checking
> url.loc
[1] "london"
> radius
[1] "25"

Related

Count edges using the adjacency matrix

Based on the adjacency matrix, I would like to count the number of unique edges in a network. In the below example I coloured the unique edges between the different nodes. But I don't know how to proceed.
Desired output:
Sample data
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Assuming that this is an undirected graph such that 0 indicates no edge and a positive number indicates an edge, convert the input DF to a logical matrix and from that to an igraph object. Then get its edges and the names of those edges. (Another possible output is by using as_edgelist(g) to get a 2 column matrix such that each row defines an edge.)
If it were intended that the graph be directed then replace "undirected" with "directed" and in that case a character vector of 13 edge names will be produced instead of the 9 undirected edges shown below.
library(igraph)
m <- as.matrix(DF[-1])
rownames(m) <- colnames(m)
g <- graph_from_adjacency_matrix(m > 0, "undirected")
e <- E(g)
attr(e, "vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
Alternately as a pipeline
library(igraph)
library(tibble)
DF %>%
column_to_rownames("...1") %>%
as.matrix %>%
sign %>%
graph_from_adjacency_matrix("undirected") %>%
E %>%
attr("vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
The graph of g looks like this. (If "directed" had been chosen above then the edges would have arrowheads on them.)
set.seed(123)
plot(g)
Note
DF <-
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))

Loop in tidyverse

I am learning tidyverse() and I am using a time-series dataset, and I selected columns that start with sec. What I would like basically to identify those values from columns that equal 123, keep these and have the rest replace with 0. But I don't know how to loop from sec1:sec4. Also how can I sum() per columns?
df1<-df %>%
select(starts_with("sec")) %>%
select(ifelse("sec1:sec4"==123, 1, 0))
Sample data:
structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
I think you would have to use mutate and across to accomplish this. below you will mutate across each column starting with sec and then keep all values that are 123 and replace all others with 0.
df1<-df %>%
select(starts_with("sec")) %>%
mutate(across(starts_with("sec"),.fns = function(x){ifelse(x == 123,x,0)}))

Unexpected behavior of filter inside a function dplyr

I have a function that filters a data.frame based on the unique values of a group column that is passed to the function
la <- function(df, grp){
gr <- df %>% pull({{grp}}) %>% unique()
purrr::map(gr, function(x){
print(x)
filter(df, {{grp}} == x)
})
}
When I use it with this df,
x <- structure(list(mac = c("dc:a6:32:21:59:2b", "dc:a6:32:2d:8c:ca",
"dc:a6:32:2d:b8:62", "dc:a6:32:2d:ca:3f"), datetime = structure(c(1594644546,
1594645457, 1594645375, 1594645080), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Comment = c("FED2", "FED7", "FED1", "FED6")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
la(x, mac)
I get the proper prints and the subsets.
However, when I use it with this other df, which should be equivalent, it doesn't work as expected.
df <- structure(list(datetime = structure(c(1594644600, 1594644900,
1594645200, 1594645500, 1594645800, 1594646100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), movement = c(9940.50454596681, 10779.7747307276,
7148.52826988968, 7687.54314683339, 8797.06954533588, 7524.02474093548
), x = c(606, NA, 240, NA, 504, NA), y = c(386, NA, 274, NA,
56, NA), i_x = c(606, 228, 214, 407.5, 500, 292.947368421053),
i_y = c(386, 286, 258, 49.1666666666667, 56, 234), mac = c("dc:a6:32:21:59:2b",
"dc:a6:32:21:59:2b", "dc:a6:32:21:59:2b", "dc:a6:32:21:59:2b",
"dc:a6:32:21:59:2b", "dc:a6:32:21:59:2b")), spec = structure(list(
cols = list(filename = structure(list(), class = c("collector_character",
"collector")), datetime = structure(list(format = ""), class = c("collector_datetime",
"collector")), movement = structure(list(), class = c("collector_double",
"collector")), x = structure(list(), class = c("collector_double",
"collector")), y = structure(list(), class = c("collector_double",
"collector")), i_x = structure(list(), class = c("collector_double",
"collector")), i_y = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = "\t"), class = "col_spec"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
I get 0 rows on each type of group (my real example has the same groups as the ones for the x dataframe).
Interestingly, this works as expected.
la(select(head(df), mac, datetime), mac)
[1] "dc:a6:32:21:59:2b"
[[1]]
# A tibble: 6 x 2
mac datetime
<chr> <dttm>
1 dc:a6:32:21:59:2b 2020-07-13 12:50:00
2 dc:a6:32:21:59:2b 2020-07-13 12:55:00
3 dc:a6:32:21:59:2b 2020-07-13 13:00:00
4 dc:a6:32:21:59:2b 2020-07-13 13:05:00
5 dc:a6:32:21:59:2b 2020-07-13 13:10:00
6 dc:a6:32:21:59:2b 2020-07-13 13:15:00
What is going on?
As the comment suggests, the problem is that I have function(x) inside the map call and because df has an x column, things become weird. I chose another variable name for that, and now it's working.
la <- function(df, grp){
gr <- df %>% pull({{grp}}) %>% unique()
purrr::map(gr, function(tt){
print(tt)
filter(df, {{grp}} == tt)
})
}

Visualize bubbles on a map, using hc_add_series_map() instead of hcmap()

I am trying to visualize a bubble map, using highcharter.
I did it perfectly, using this code
library(highcharter)
library(tidyverse)
hcmap("custom/africa") %>%
hc_add_series(data = fake_data, type = "mapbubble", maxSize = '10%', color =
"Red", showInLegend = FALSE) %>%
hc_legend(enabled = FALSE)
My data
> dput(fake_data)
structure(list(country = c("DZ", "CD", "ZA", "TZ"), lat = c(28.033886,
-4.038333, -30.559482, -6.369028), lon = c(1.659626, 21.758664,
22.937506, 34.888822), name = c("Algeria", "Congo, Dem. Rep",
"South Africa", "Tanzania"), z = c(20, 5, 10, 1)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec =
structure(list(
cols = list(country = structure(list(), class = c("collector_character",
"collector")), lat = structure(list(), class = c("collector_double",
"collector")), lon = structure(list(), class = c("collector_double",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), z = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
External geo data for Africa originally comes from this source and used with hcmap().
But I transform it into RDS and use locally. Available here.
My problem that I cannot use my code and external data due to corporate IT security restrictions. I cannot deploy this code with Shiny/RMarkdown on Connect, it is blocked.
So my solution currently
Use the same data in RDS format
africa_map_data <- readRDS("africa_map_data.RDS")
And use the hc_add_series_map() with local data instead of hcmap().
highchart() %>%
hc_add_series_map(
map = africa_map_data,
df = fake_data,
value = "z",
joinBy = c("hc-a2", "country"),
type = "mapbubble",
maxSize = '10%',
color = "Red"
)
But it does not work well, I get a mess.
How to create a bubble map with hc_add_series_map() (or any other way) without 'hcmap' and pulling external data.
Thanks!

How to Transform Data to Find Index with Same Value

I intend to find customers who have bought exactly the same products,
The data I have is customers' behaviors--what they have bought.
The example that I provided is a simplified version of my data. Customers will usually buy 10 to 20 products. There are around 50 products that consumers could choose to buy.
I am really confused what is an easy way to transform my data into the output that I prefer.
Could you please give me any advice? Thanks
Input:
structure(list(Customer_ID = 1:6, Products = c("Apple, Beer, Diaper",
"Beer, Apple", "Beer, Apple, Diaper, Diaper", "Apple, Diaper",
"Diaper, Apple", "Apple, Diaper, Beer, Beer")), .Names = c("Customer_ID",
"Products"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L), spec = structure(list(cols = structure(list(Customer_ID = structure(list(), class = c("collector_integer",
"collector")), Products = structure(list(), class = c("collector_character",
"collector"))), .Names = c("Customer_ID", "Products")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Output:
structure(list(`Products Bought` = c("Apple, Beer, Diaper", "Apple, Diaper"
), Customer_ID = c("1, 3, 6", "4, 5")), .Names = c("Products Bought",
"Customer_ID"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L), spec = structure(list(cols = structure(list(`Products Bought` = structure(list(), class = c("collector_character",
"collector")), Customer_ID = structure(list(), class = c("collector_character",
"collector"))), .Names = c("Products Bought", "Customer_ID")),
default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
I am suspicious that you may want to look at structuring your data in a way that is more usable. In any case, the tidyverse can be a helpful way of thinking through your task.
As mentioned, posting code for others to start with can save them time and get you an answer faster.
library(dplyr)
library(stringr)
library(tidyr)
d <- data_frame(id=c(1,2,3,4,5,6)
, bought=c('Apple, Beer, Diaper','Apple, Beer', 'Apple, Beer, Diaper, Diaper'
, 'Apple, Diaper', 'Diaper, Apple', 'Apple, Diaper, Beer, Beer'))
d %>%
## Unnest the values & take care of white space
## - This is the better data structure to have, anyways
mutate(buy=str_split(bought,',')) %>%
unnest(buy) %>% mutate(buy=str_trim(buy)) %>% select(-bought) %>%
## Get distinct (and sort?)
distinct(id, buy) %>% arrange(id, buy) %>%
## Aggregate by id
group_by(id) %>% summarize(bought=paste(buy,collapse=', ')) %>% ungroup %>%
## Count
group_by(bought) %>% summarize(ids=paste(id,collapse=',')) %>% ungroup
EDIT: referencing this SO post for getting distinct combinations faster / cleaner in dplyr
Using the given input data and data.table, this can be written as (rather convoluted) "one-liner":
dcast(unique(setDT(input)[, strsplit(Products, ", "), Customer_ID])[
order(Customer_ID, V1)],
Customer_ID ~ ., paste, collapse = ", ")[
, .(Customers = paste(Customer_ID, collapse = ", ")), .(Products = .)]
# Products Customers
#1: Apple, Beer, Diaper 1, 3, 6
#2: Apple, Beer 2
#3: Apple, Diaper 4, 5
Note that the OP has dropped the second line with only one customer from
the expected output but hasn't mentioned any criteria for filtering the output in the question.
Input data
(As given by OP):
input <- structure(list(Customer_ID = 1:6, Products = c("Apple, Beer, Diaper",
"Beer, Apple", "Beer, Apple, Diaper, Diaper", "Apple, Diaper",
"Diaper, Apple", "Apple, Diaper, Beer, Beer")), .Names = c("Customer_ID",
"Products"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L), spec = structure(list(cols = structure(list(Customer_ID = structure(list(), class = c("collector_integer",
"collector")), Products = structure(list(), class = c("collector_character",
"collector"))), .Names = c("Customer_ID", "Products")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

Resources