Conditionally count rows based on month and value of a column - r

I have a dataset containing Airbnb listings. I want to count the number of listings for each host_id per month, separately for Entire home and Shared home listings. Therefore I assume I need two additional columns holding the count for each row (tot_EH and tot_SH).
I've posted an image below to show what the dataset looks like and the desired output (I deleted some columns that are not relevant). I used just one host_id here, but in reality there are many different ones.
I marked the new columns in red and entered the desired output. I can't figure out how to proceed and would really appreciate some help!

Got help from a colleague and this worked:
df <- df %>%
group_by(host_id, last_scraped) %>% # group data by host and month
mutate(count_listings_in_data = length(unique(id)), # for each host/month combination; count the number of unique listing IDs
count_shared_homes = length(unique(id[which(room_type_NV == "Shared home")])), # for each host/month combination; count the number of unique listing IDs for which the room type is "shared"
count_entire_homes = length(unique(id[which(room_type_NV == "Entire home")]))) # for each host/month combination; count the number of unique listing IDs for which the room type is "entire"

One data.table approach, assuming your data are in a data.frame named df
library(data.table)
setDT(df)
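# the labels must match the data exactly ("Entire home", not "Entire Home")
# rows of the other room type are left as NA in each new column
df[room_type_NV == "Entire home", tot_EH := .N, by = .(date, host_id)]
df[room_type_NV == "Shared home", tot_SH := .N, by = .(date, host_id)]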

Base R Solution:
df$grouping_var <- paste(df$host_id, as.Date(df$date, "%m-%Y"), sep = "_")
count_df <- data.frame(do.call("rbind", lapply(split(df, df$grouping_var),
function(x){
tmp <- data.frame(t(tapply(x$room_type_NV, x$room_type_NV, length)))
return(cbind(x, data.frame(tmp[rep(seq_len(nrow(tmp)), nrow(x)), ], row.names = NULL)))
}
)
),
row.names = NULL
)
Data:
structure(list(id = c(2, 1, 3, 1, 2, 3, 1, 2, 1, 2), date = structure(c(16983,
16983, 16983, 17014, 17014, 17014, 17045, 17045, 17106, 17106
), class = "Date"), host_id = c(27280608, 27280608, 27280608,
27280608, 27280608, 27280608, 27280608, 27280608, 27280608, 27280608
), room_type_NV = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 2L), .Label = c("Entire home", "Shared home"), class = "factor"),
grouping_var = c("27280608_2016-07-01", "27280608_2016-07-01",
"27280608_2016-07-01", "27280608_2016-08-01", "27280608_2016-08-01",
"27280608_2016-08-01", "27280608_2016-09-01", "27280608_2016-09-01",
"27280608_2016-11-01", "27280608_2016-11-01")), row.names = c(NA,
-10L), class = "data.frame")
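For comparison, the same per-row counts can be sketched in base R with ave(). This is only a sketch under assumptions: the small data frame below is a hypothetical cut-down version of the sample data (two months, one host), and it counts rows per group rather than unique listing ids.

```r
# Hypothetical cut-down version of the sample data (assumption, not the real dataset)
df <- data.frame(
  id = c(2, 1, 3, 1, 2, 3),
  date = as.Date(c("2016-07-01", "2016-07-01", "2016-07-01",
                   "2016-08-01", "2016-08-01", "2016-08-01")),
  host_id = 27280608,
  room_type_NV = c("Shared home", "Entire home", "Entire home",
                   "Entire home", "Shared home", "Entire home"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each host/date group and returns one value per row,
# so every row gets both counts regardless of its own room type
df$tot_EH <- ave(as.integer(df$room_type_NV == "Entire home"),
                 df$host_id, df$date, FUN = sum)
df$tot_SH <- ave(as.integer(df$room_type_NV == "Shared home"),
                 df$host_id, df$date, FUN = sum)

print(df)
```

Unlike the filtered data.table assignment, this fills both columns on every row, which seems closer to the desired output described in the question.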

Related

Regression over time specific year as weight

I am doing a regression with panel data of EU countries over time with observations from 2007-16. I want to use the observation for 2007 for each specific country as the weight. Is there a simple way to do this?
This is essentially the regression I run, but I don't think the weighting is working as I intend it to.
lm(log(POP25) ~ log(EMPLOY25), weights = POP25, data = data)
structure(list(...1 = 1:6, TIME = 2007:2012, NUTS_ID = c("AT",
"AT", "AT", "AT", "AT", "AT"), NUMBER = c(1L, 1L, 1L, 1L, 1L,
1L), POP15 = c(5529.1, 5549.3, 5558.5, 5572.1, 5601.1, 5620.8
), POP20 = c(5047.1, 5063.2, 5072.6, 5090, 5127.1, 5151.9), POP25 = c(4544,
4560.7, 4571.3, 4587.8, 4621.5, 4639), EMPLOY15 = c(3863.6, 3928.7,
3909.3, 3943.9, 3982.3, 4013.4), EMPLOY20 = c(3676.2, 3737, 3723.8,
3761.9, 3802.3, 3835), EMPLOY25 = c(3333.5, 3390.4, 3384.7, 3424.6,
3454.4, 3486.4)), row.names = c(NA, 6L), class = "data.frame")
You are right - this is not doing what you expect it to. The reason is that you are supplying POP25 as the weight but you haven't yet made it explicit that you only want the POP25 value from 2007.
A weights vector needs to be the same length as the dependent and independent variables. The easiest way to do this is by creating a weights column in the table, where the value is the POP25 value for each NUTS_ID in the year 2007:
library(dplyr)
data <- data |>
group_by(NUTS_ID) |>
mutate(weights = POP25[TIME==2007])
You can then supply this as the weights vector:
lm(log(POP25) ~ log(EMPLOY25), weights = weights, data = data)
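If you prefer not to depend on dplyr, the same base-year weight column can be sketched in base R with match(). The toy panel below is an assumption: the AT rows reuse values from the question, while the BE rows and the column name w2007 are made up for illustration.

```r
# Toy panel: AT values come from the question, BE values are hypothetical
panel <- data.frame(
  NUTS_ID  = rep(c("AT", "BE"), each = 3),
  TIME     = rep(2007:2009, times = 2),
  POP25    = c(4544, 4560.7, 4571.3, 6000, 6050, 6100),
  EMPLOY25 = c(3333.5, 3390.4, 3384.7, 4000, 4100, 4150),
  stringsAsFactors = FALSE
)

# One row per country holding its 2007 value
base_year <- panel[panel$TIME == 2007, c("NUTS_ID", "POP25")]

# Look up each row's country in the 2007 table, so the weight is
# constant within a country across all years
panel$w2007 <- base_year$POP25[match(panel$NUTS_ID, base_year$NUTS_ID)]

fit <- lm(log(POP25) ~ log(EMPLOY25), weights = w2007, data = panel)
print(panel)
```

Note that a country with no 2007 observation would get NA as its weight here, so it is worth checking for missing values before fitting.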

Fetch daily data for each variable using mapply

I have a function whose objective is to fetch daily data for each variable in a column of a data.frame. The range is a complete month, but it could be any other range.
My df has a column unit_id, so I need my function to take the first id in unit_id and fetch the data for every single date in March.
| unit | unit_id |
|:-----:|----------|
| AE | 123 |
| AD | 456 |
| AN | 789 |
But right now, my function cycles through the ids in the unit_id column. Since I have 3 ids, on the 4th day the function uses the 1st id again, on the 5th day the 2nd id, and so on. This repeats until the last day of the month.
I need it to use each id for every day of the month.
code:
my_dates <- seq(as.Date("2020-03-01"), as.Date("2020-03-31"), by = 1)
my_fetch <- function(unit, unit_id, d) {
df <- google_analytics(unit_id,
date_range = c(d, d),
metrics = c("totalEvents"),
dimensions = c("ga:date", "ga:eventCategory", "ga:eventAction", "ga:eventLabel"),
anti_sample = TRUE)
df$unidad_de_negocio <- unit
filename <- paste0(unit, "-", "total-events", "-", d, ".csv")
path <- "D:\\america\\costos_protv\\total_events"
write.csv(df, file.path(path, filename), row.names = FALSE)
print(filename)
rm(df)
gc()
}
monthly_fetches <- mapply(my_fetch, df$unit,
df$unit_id,
my_dates, SIMPLIFY = FALSE)
Variation 2: By monthly ranges
Thank you, Akrun. Your answer works.
I've been trying to edit it, or use it, in this other similar scenario:
1.- Monthly starts and ends: now each iteration isn't a single date, but has a start and an end. I've called this monthly_dates
| starts | ends |
|:-----------:|------------|
| 2020-02-01 | 2020-02-29 |
| 2020-03-01 | 2020-03-31 |
I've tried to adapt the solution, but it is not working. Could you take a look and tell me why? Thank you.
monthly_fetches <- Map(function(x, y)
lapply(monthly_dates, function(d1, d2) my_fetch(x, y, monthly_dates$starts, monthly_dates$ends)))
Main function adapted to use 2 dates (start "d1" and end "d2"):
my_fetch <- function(udn, udn_id, d1, d2) {
df <- google_analytics(udn_id,
date_range = c(d1, d2),
metrics = c("totalEvents"),
dimensions = c("ga:month"),
anti_sample = TRUE)
df$udn <- udn
df$udn_id <- udn_id
df
}
Code to make the monthly date ranges:
make_date_ranges <- function(start, end){
starts <- seq(from = start,
to = Sys.Date()-1 ,
by = "1 month")
ends <- c((seq(from = add_months(start, 1),
to = end,
by = "1 month" ))-1,
(Sys.Date()-1))
data.frame(starts,ends)
}
## usage
monthly_dates <- make_date_ranges(as.Date("2020-02-01"), Sys.Date())
Update 1:
dput(monthly_fetches[1])
list(AE = list(structure(list(month = "02", totalEvents = 19670334,
udn = "AE", udn_id = 74415341), row.names = 1L, totals = list(
list(totalEvents = "19670334")), minimums = list(list(totalEvents = "19670334")), maximums = list(
list(totalEvents = "19670334")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"),
structure(list(month = "03", totalEvents = 19765253, udn = "AE",
udn_id = 74415341), row.names = 1L, totals = list(list(
totalEvents = "19765253")), minimums = list(list(totalEvents = "19765253")), maximums = list(
list(totalEvents = "19765253")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"),
structure(list(month = "04", totalEvents = 1319087, udn = "AE",
udn_id = 74415341), row.names = 1L, totals = list(list(
totalEvents = "1319087")), minimums = list(list(totalEvents = "1319087")), maximums = list(
list(totalEvents = "1319087")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame")))
Update 2:
dput(monthly_fetches[[1]])
list(structure(list(month = "02", totalEvents = 19670334, udn = "AE",
udn_id = 74415341), row.names = 1L, totals = list(list(totalEvents = "19670334")), minimums = list(
list(totalEvents = "19670334")), maximums = list(list(totalEvents = "19670334")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"),
structure(list(month = "03", totalEvents = 19765253, udn = "AE",
udn_id = 74415341), row.names = 1L, totals = list(list(
totalEvents = "19765253")), minimums = list(list(totalEvents = "19765253")), maximums = list(
list(totalEvents = "19765253")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"),
structure(list(month = "04", totalEvents = 1319087, udn = "AE",
udn_id = 74415341), row.names = 1L, totals = list(list(
totalEvents = "1319087")), minimums = list(list(totalEvents = "1319087")), maximums = list(
list(totalEvents = "1319087")), isDataGolden = TRUE, rowCount = 1L, class = "data.frame"))
Map/mapply iterates over its arguments in parallel and recycles the shorter ones. Since 'df' has 3 rows and 'my_dates' has length 31, the ids cycle against the dates. One option is to loop over the 'df' columns with Map/mapply and run a further loop over the dates inside it
monthly_fetches <- Map(function(x, y)
lapply(my_dates, function(date) my_fetch(x, y, date)),
df$unit, df$unit_id)
Or we can have outer loop for 'my_dates'
lapply(my_dates, function(date) Map(my_fetch, df$unit, df$unit_id, date))
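To see what the nested loop produces without calling the Google Analytics API, here is a minimal sketch. The names units and stub_fetch are assumptions: stub_fetch stands in for my_fetch and only returns a label, and only four dates are used to keep the output short.

```r
# Hypothetical stand-ins for the question's df and my_fetch
units <- data.frame(unit = c("AE", "AD", "AN"),
                    unit_id = c(123, 456, 789),
                    stringsAsFactors = FALSE)
my_dates <- seq(as.Date("2020-03-01"), as.Date("2020-03-04"), by = 1)

# Stub: returns a label instead of fetching anything
stub_fetch <- function(unit, unit_id, d) {
  paste(unit, unit_id, format(d), sep = "-")
}

# Outer Map walks unit/unit_id pairs in parallel;
# inner lapply visits every date for that pair
nested <- Map(function(x, y) lapply(my_dates, function(d) stub_fetch(x, y, d)),
              units$unit, units$unit_id)

str(nested)
```

Each of the 3 units ends up with one result per date, so the total number of calls is 3 times the number of dates, rather than one call per date with recycled ids.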
Update
If we need to pass two columns, use Map
Map(function(start, end)
Map(my_fetch, df$unit, df$unit_id, start, end),
monthly_dates$starts, monthly_dates$ends)
Or
monthly_fetches <- Map(function(x, y) Map(function(start, end)
my_fetch(x, y, start, end),
monthly_dates$starts, monthly_dates$ends), df$unit, df$unit_id)
Then rbind
do.call(rbind,lapply(monthly_fetches, function(x) do.call(rbind, x)))
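The nested Map followed by the double rbind can be tried offline with a stub. In this sketch units and stub_fetch are hypothetical: stub_fetch mimics the two-date my_fetch by returning a one-row data frame instead of querying Google Analytics.

```r
# Hypothetical stand-ins for the question's df, monthly_dates and my_fetch
units <- data.frame(unit = c("AE", "AD"),
                    unit_id = c(123, 456),
                    stringsAsFactors = FALSE)
monthly_dates <- data.frame(
  starts = as.Date(c("2020-02-01", "2020-03-01")),
  ends   = as.Date(c("2020-02-29", "2020-03-31"))
)

# Stub: one row per (unit, month) instead of an API call
stub_fetch <- function(udn, udn_id, d1, d2) {
  data.frame(udn = udn, udn_id = udn_id,
             start = format(d1), end = format(d2),
             stringsAsFactors = FALSE)
}

# Outer Map: one element per unit; inner Map: one data frame per month
fetches <- Map(function(x, y)
                 Map(function(s, e) stub_fetch(x, y, s, e),
                     monthly_dates$starts, monthly_dates$ends),
               units$unit, units$unit_id)

# Bind months within each unit, then bind the units together
combined <- do.call(rbind, lapply(fetches, function(x) do.call(rbind, x)))
rownames(combined) <- NULL
print(combined)
```

The result has one row per unit and month, which is the shape the rbind step is meant to produce from the nested list shown in the updates.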
Or use map_dfr from purrr
library(purrr)
library(dplyr)
map_dfr(monthly_fetches, bind_rows, .id = 'grp')

Filter two tables with crosstalk

I am creating a Flexdashboard in R. I want the dashboard to contain both a table and a series of visualizations that are filtered through inputs.
As I need to deliver a dashboard locally (without a server running in the background), I am unable to use Shiny, hence I rely on crosstalk.
I know that the crosstalk package provides limited functionality in the front-end. For instance, the documentation says that you can't aggregate the SharedData object.
Nonetheless, I am not clear if I can use the same inputs to filter two different dataframes.
For example, lets say I have:
Dataframe One: Contains original data
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John",
"Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110),
car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz",
"bmw"), class = "factor"), id = structure(1:5, .Label = c("car1",
"car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner",
"hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
Dataframe Two: Contains aggregated data
df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
-4L), class = "data.frame")
These two dataframes share two columns with identical values, car and owner, and each has additional columns as well.
I could create two different objects:
library(crosstalk)
shared_df1 <- SharedData$new(df1)
shared_df2 <- SharedData$new(df2)
and then:
filter_select("owner", "Car owner:", shared_df1, ~ owner)
filter_select("owner", "Car owner:", shared_df2, ~ owner)
However, that would mean that the user will need to fill inputs that are essentially identical, twice. Also, if the table is large, this would double the size of the memory needed to use the dashboard.
Is it possible to work around this problem in crosstalk?
Ah, I recently ran into this too. There is another argument to SharedData$new(..., group = )! The group argument seems to do the trick. I found out by accident when I had two dataframes and used group =.
If you make a SharedData object, it will include
a dataframe
a key to select rows by - preferably unique, but not necessarily.
a group name
What I think happens is that crosstalk filters the SharedData by the key, for all SharedData objects in the same group! So as long as two dataframes use the same key, you should be able to filter them together in one group.
This should work for your example.
---
title: "blabla"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source_code: embed
    theme: cerulean
---
```{r}
library(plotly)
library(crosstalk)
library(tidyverse)
```
```{r Make dataset}
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John", "Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110), car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz", "bmw"), class = "factor"), id = structure(1:5, .Label = c("car1", "car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner", "hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
-4L), class = "data.frame")
```
#
##
### Filters
```{r}
library(crosstalk)
# Notice the 'group = ' argument - this does the trick!
shared_df1 <- SharedData$new(df1, ~owner, group = "Choose owner")
shared_df2 <- SharedData$new(df2, ~owner, group = "Choose owner")
filter_select("owner", "Car owner:", shared_df1, ~owner)
# You don't need this second filter now
# filter_select("owner", "Car owner:", shared_df2, ~ owner)
```
### Plot1 with plotly
```{r}
plot_ly(shared_df1, x = ~id, y = ~hp, color = ~owner) %>% add_markers() %>% highlight("plotly_click")
```
### Plots with plotly
```{r}
plot_ly(shared_df2, x = ~owner, y = ~freq, color = ~car) %>% group_by(owner) %>% add_bars()
```
##
### Dataframe 1
```{r}
DT::datatable(shared_df1)
```
### Dataframe 2
```{r}
DT::datatable(shared_df2)
```
I spent some time on this by trying to extract data from plot_ly() using plotly_data() without luck until I figured out the answer. That's why there's some very simple plots with plotly.
Recently, I've also wanted to use one filter to filter 2 visualizations.
Brief description of my situation
I've wanted to use one filter to filter a boxplot and a table.
Source data has been a data frame. I've wanted to use some of variables for the boxplot and also calculate some statistics (like mean, standard deviation, mode, number of records).
Functions I've needed to use to display results: plotly::plot_ly(), DT::datatable(), crosstalk::bscols().
I've found that there are 3 key pieces of information needed to solve this situation
Key 1) It's necessary to correctly create shared data.
In my case, I've had to use crosstalk::SharedData$new() twice.
Shared data can only serve as a correct source for the visualizations once keys 2 and 3 are fulfilled.
Key 2) When creating shared data, use the same group argument as "Lodewic Van Twillert" explained on 16 Mar 2018.
Key 3) Ensure that all SharedData instances refer conceptually to the same data points, and share the same keys.
Start by ensuring that the data frame has row names, even if the row names are just a character vector of numbers (like "1", "2", ...).
Reference used for key 3: https://rstudio.github.io/crosstalk/using.html. (I suggest mainly reading the section "Grouping".)
Summary of steps I've used to fulfill key information from above
Key 3) Fulfilling the conditions of key 3 above can be tricky.
The approach I've chosen creates one table containing all the data, and this table (data frame) is used to create both shared data instances.
I've applied data manipulations to the original data frame (risk_scores_df), so it now has a new column.
I've created a new data frame with statistics.
I've joined both data frames using
risk_scores_df <- dplyr::left_join... so now the original data frame contains all prepared data.
I've run print(rownames(risk_scores_df)) to ensure that my updated data frame has row names.
At this point I had one data frame containing all the data needed for both visualizations, which fulfills the conditions of key 3 above.
Key 2) I've simply added group = "sd1" in both crosstalk::SharedData$new()
Key 1) This one can also be tricky if the wrong approach is chosen.
Here, the key to creating correct shared data instances is to start from that one table with all the data and select only the rows and columns needed for each shared data instance.
Example - in my case, I've run the code in Option 1 to create two shared data instances, but Option 2 is also possible.
Option 1 (choosing of only needed rows and columns is in crosstalk::SharedData$new())
rs_df_sd1 <- crosstalk::SharedData$new(
risk_scores_df[, c(1, 2, 5)],
group = "sd1"
)
rs_df_sd1a <- crosstalk::SharedData$new(
risk_scores_df[risk_scores_df$NumRecords > 0 &
is.na(risk_scores_df$NumRecords) == F,
c(1, 6:11)],
group = "sd1"
)
Option 2 (choosing of only needed rows and columns is in additional variables)
sd1 <- risk_scores_df[, c(1, 2, 5)]
sd1a <- risk_scores_df[risk_scores_df$NumRecords > 0 &
is.na(risk_scores_df$NumRecords) == F,
c(1, 6:11)]
rs_df_sd1 <- crosstalk::SharedData$new(sd1, group = "sd1")
rs_df_sd1a <- crosstalk::SharedData$new(sd1a, group = "sd1")
Completing the solution
At this point I've created shared data instances rs_df_sd1 and rs_df_sd1a that can be used as main sources for visualizations that will be filtered using crosstalk::bscols().
Brief example:
box_n_jitter_chart1 <- plotly::plot_ly(rs_df_sd1) %>% add_trace(...
DT_table1 <- DT::datatable(rs_df_sd1a)
crosstalk::bscols(
widths = c(6, 12, NA),
crosstalk::filter_select(
id = "idAvgRisk",
label = "Account",
sharedData = rs_df_sd1,
group = ~Account,
multiple = F
),
box_n_jitter_chart1,
DT_table1
)
Note: DT::datatable() can also use rs_df_sd1a$data() and cells = list(values = base::rbind(... (see that cells = ... is used; see more about using cells e.g. at https://plotly.com/r/reference/table/) but because method data() is used (see more e.g. at https://rdrr.io/cran/crosstalk/man/SharedData.html#method-data) then it will not work with crosstalk::bscols.

prop.table doesn't work in a for-loop?

This may be a very simple question, but I don't see how to answer it.
I have the following reproducible code, where I have two small dataframes that I use to calculate a percentage value based on each column total:
#dataframe x
x <- structure(list(PROV = structure(c(1L, 1L), .Label = "AG", class = "factor"),
APT = structure(1:2, .Label = c("AAA", "BBB"), class = "factor"),
PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L), PAX.2015 = c(4L,0L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#dataframe y
y <- structure(list(PROV = structure(c(1L, 1L), .Label = "AQ", class = "factor"),
APT = structure(1:2, .Label = c("CCC", "AAA"), class = "factor"),
PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L), PAX.2015 = c(0L,3L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#list z (with x and y)
z <- list(x,y)
#percentage value of x and y based on columns total
round(prop.table(as.matrix(z[[1]][3:5]), margin = 2)*100,1)
round(prop.table(as.matrix(z[[2]][3:5]), margin = 2)*100,1)
As you can see, it works just fine.
Now I want to automate this for the whole list, but I can't figure out how to get the results. This is my simple code:
#for-loop that is not working
for (i in length(z))
{round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1)}
You have two problems.
First, you have not put a range into your for loop so you are just trying to iterate over a single number and second, you are not assigning your result anywhere on each iteration.
Use 1:length(z) to define a range. Then assign the results to a variable.
This would work:
my_list <- list()
for (i in 1:length(z)){
my_list[[i]] <- round(prop.table(as.matrix(z[[i]][3:5]),
margin = 2)*100,1)
}
my_list
But it would be more efficient and idiomatic to use lapply:
lapply(1:length(z),
function(x) round(prop.table(as.matrix(z[[x]][3:5]), margin = 2)*100,1))
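The difference between iterating over length(z) and over a range can be checked directly. The sketch below uses a hypothetical, trimmed version of the two data frames (fewer columns) and records which indices the original loop actually visits. It also passes the list itself to lapply, which avoids indexing altogether.

```r
# Trimmed, hypothetical version of the question's list z
z <- list(
  data.frame(APT = c("AAA", "BBB"), PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L)),
  data.frame(APT = c("CCC", "AAA"), PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L))
)

# length(z) is the single number 2, so this loop body runs once, with i = 2
visited <- integer(0)
for (i in length(z)) visited <- c(visited, i)

# seq_along(z) gives 1, 2 and is also safe for an empty list,
# where 1:length(z) would yield c(1, 0)
visited_all <- integer(0)
for (i in seq_along(z)) visited_all <- c(visited_all, i)

# lapply over the list elements directly; drop the non-numeric first column
pct <- lapply(z, function(d) {
  round(prop.table(as.matrix(d[-1]), margin = 2) * 100, 1)
})
pct
```

Inside a for loop a bare expression is not auto-printed, which is why the original loop showed nothing even for the one index it did visit.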
Leaving aside whether a for loop is the best approach, you had two issues. One, your for loop only iterates over 2 (which is length(z)) instead of 1:2. Two, you need to do something with the result of the round(....) call. In this solution, I added a print statement.
for (i in 1:length(z)){
print(round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1))
}

BioConductor IRanges coverage counts and identify segments

I have a dataset with interval information for a bunch of manufacturing circuits
df <- data.frame(structure(list(circuit = structure(c(2L, 1L, 2L, 1L, 2L, 3L,
1L, 1L, 2L), .Label = c("a", "b", "c"), class = "factor"), start = structure(c(1393621200,
1393627920, 1393628400, 1393631520, 1393650300, 1393646400, 1393656000,
1393668000, 1393666200), class = c("POSIXct", "POSIXt"), tzone = ""),
end = structure(c(1393626600, 1393631519, 1393639200, 1393632000,
1393660500, 1393673400, 1393667999, 1393671600, 1393677000
), class = c("POSIXct", "POSIXt"), tzone = ""), id = structure(1:9, .Label = c("1001",
"1002", "1003", "1004", "1005", "1006", "1007", "1008", "1009"
), class = "factor")), .Names = c("circuit", "start", "end",
"id"), class = "data.frame", row.names = c(NA, -9L)))
Circuit: Identifier for circuit
Start: Time the circuit started running
End: Time the circuit stopped running
Id: Unique identifier for the row
I'm able to create a new data set that counts the number of overlapping intervals:
ir <- IRanges(start = as.numeric(df$start), end = as.numeric(df$end), names = df$id)
cov <- coverage(ir)
start_time <- as.POSIXlt(start(cov), origin = "1970-01-01")
end_time <- as.POSIXlt(end(cov), origin = "1970-01-01")
seconds <- runLength(cov)
circuits_running <- runValue(cov)
res <- data.frame(start_time,end_time,seconds,circuits_running)[-1,]
But what I really need is something that looks more like this:
sqldf("select
res.start_time,
res.end_time,
res.seconds,
res.circuits_running,
df.circuit,
df.id
from res left join df on (res.start_time between df.start and df.end)")
The problem is that the sqldf way of using an inequality join is unbearably slow on my full dataset.
How can I get something similar using IRanges alone?
I suspect it has something to do with RangedData but I've not been able to see how to get what I want. Here's what I've tried...
rd <- RangedData(ir, circuit = df$circuit, id = df$id)
coverage(rd) # works but seems to lose the circuit/id info
The coverage can be represented as ranges, dropping the first (the range from 1970 to the first start point)
cov <- coverage(ir)
intervals <- ranges(cov)[-1]
Your query is to find the start of the interval of each circuit, so I narrow the interval to their start coordinate and find overlaps (the first argument is the 'query', the second the 'subject')
olaps <- findOverlaps(narrow(intervals, width(intervals)), ir)
The number of circuits running in a particular interval is
tabulate(queryHits(olaps), queryLength(olaps))
and the actual circuits are
df[subjectHits(olaps), c("circuit", "id")]
The pieces can be knit together as, perhaps
df1 <- cbind(uid=seq_along(intervals),
as.data.frame(intervals),
circuits_running=tabulate(queryHits(olaps), queryLength(olaps)))
df2 <- cbind(uid=queryHits(olaps),
df[subjectHits(olaps), c("circuit", "id")])
merge(df1, df2, by="uid", all=TRUE)
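To make the logic of the segment/overlap join concrete, here is a small base R sketch. It is not the IRanges API: it swaps in outer() comparisons on a hypothetical toy table iv with integer, closed intervals, and it builds a full segments-by-intervals matrix, so it is only meant for checking results on small inputs.

```r
# Hypothetical toy intervals (closed, integer coordinates)
iv <- data.frame(id = c("1001", "1002", "1003"),
                 circuit = c("b", "a", "b"),
                 start = c(1, 5, 8),
                 end = c(6, 10, 12),
                 stringsAsFactors = FALSE)

# Coverage changes at every start and right after every end
cuts <- sort(unique(c(iv$start, iv$end + 1)))
seg <- data.frame(seg_start = head(cuts, -1),
                  seg_end = tail(cuts, -1) - 1)

# covers[i, j] is TRUE when interval j contains the first position of segment i
covers <- outer(seg$seg_start, iv$start, ">=") &
  outer(seg$seg_start, iv$end, "<=")

# Number of circuits running in each segment
seg$circuits_running <- rowSums(covers)

# One row per (segment, overlapping interval), like the left join in the question
hits <- which(covers, arr.ind = TRUE)
long <- data.frame(seg[hits[, "row"], ],
                   iv[hits[, "col"], c("circuit", "id")],
                   row.names = NULL)
long[order(long$seg_start), ]
```

On real data the IRanges route remains the sensible choice, since findOverlaps avoids materialising every segment/interval pair.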
Ranges can have associated with them 'metadata' that is accessible and subset in a coordinated way, so the connection between data.frame and ranges does not have to be so loose and ad hoc. I might instead have
ir <- IRanges(start = as.numeric(df$start), end = as.numeric(df$end))
mcols(ir) <- DataFrame(df)
## ...
mcols(ir[subjectHits(olaps)])
perhaps with as.data.frame() when done with IRanges-land.
It's better to ask your questions about IRanges on the Bioconductor mailing list; no subscription required.
