I would like to create a table that grants a given user access to specific pages of a report.
Imagine I have a table like this:
I have another table in which I have the name of every report page:
I want to get a table listing the users and all the pages they have access to, depending on their group. Group 1 can see all pages, but group 2 can only see the Team page:
Ideally this would be done in DAX, but I think it could be easier to do in R. Thanks in advance!
The tidyverse package gives you easy tools to manipulate dataframes. You can create 3 variables (Orders, Sales, Team) recording the access rights for each page (with 1 or 0 for example) using case_when with a condition on Group, and then pivot_longer on these variables, and finally only keep rows where there is an access right with filter.
library(tidyverse)
Group <- c(1,2,1,2,2)
User <- c("alex","pablo","carlos","pepe","paula") %>% paste0("@gmail.com")
df <- data.frame(Group, User)
df2 <- df %>%
  mutate(Orders = case_when(Group==1 ~ 1,
                            Group==2 ~ 0),
         Sales = Orders,
         Team = 1) %>%
  pivot_longer(cols = c(Orders, Sales, Team), names_to = "Page") %>%
  filter(value == 1) %>%
  select(-value)
Output
> df
Group User
1 1 alex@gmail.com
2 2 pablo@gmail.com
3 1 carlos@gmail.com
4 2 pepe@gmail.com
5 2 paula@gmail.com
> df2
# A tibble: 9 x 3
Group User Page
<dbl> <chr> <chr>
1 1 alex@gmail.com Orders
2 1 alex@gmail.com Sales
3 1 alex@gmail.com Team
4 2 pablo@gmail.com Team
5 1 carlos@gmail.com Orders
6 1 carlos@gmail.com Sales
7 1 carlos@gmail.com Team
8 2 pepe@gmail.com Team
9 2 paula@gmail.com Team
Another idea:
library(dplyr)
library(tidyr)
df %>%
mutate(page = toString(df1$page_name)) %>%
separate_rows(page, sep = ', ') %>%
mutate(page = replace(page, Group == 2 & page != 'team', NA)) %>%
na.omit()
# A tibble: 9 x 3
Group User page
<dbl> <chr> <chr>
1 1 A orders
2 1 A sales
3 1 A team
4 2 B team
5 1 C orders
6 1 C sales
7 1 C team
8 2 D team
9 2 E team
DATA
dput(df)
structure(list(Group = c(1, 2, 1, 2, 2), User = c("A", "B", "C",
"D", "E")), class = "data.frame", row.names = c(NA, -5L))
dput(df1)
structure(list(page_name = c("orders", "sales", "team")), class = "data.frame", row.names = c(NA,
-3L))
Another idea using fuzzyjoin package:
Data
users <- data.frame(
Group = c("1","2","1","2","2"),
User = c("alex","pablo","carlos","pepe","paula")
)
Group User
1 1 alex
2 2 pablo
3 1 carlos
4 2 pepe
5 2 paula
You can then add a column to the Page dataframe that lists the groups allowed to access each page:
pagename <- data.frame(
Page = c("Order","Sales","Team"),
Allowed = c("1","1","1|2")
)
Page Allowed
1 Order 1
2 Sales 1
3 Team 1|2
And finally using fuzzyjoin::regex_left_join:
users |>
fuzzyjoin::regex_left_join(pagename,
by = c(Group = "Allowed")) |>
dplyr::select(-Allowed)
Output
Group User Page
1 1 alex Order
2 1 alex Sales
3 1 alex Team
4 2 pablo Team
5 1 carlos Order
6 1 carlos Sales
7 1 carlos Team
8 2 pepe Team
9 2 paula Team
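If the group-to-page permissions are kept in their own lookup table, a plain join may be enough and no regex matching is needed. The sketch below assumes a hypothetical `access` table (it is not part of the question) with one row per allowed Group/Page pair, and uses base R `merge`:

```r
# Users and their groups, as in the question
users <- data.frame(
  Group = c(1, 2, 1, 2, 2),
  User = c("alex", "pablo", "carlos", "pepe", "paula"),
  stringsAsFactors = FALSE
)

# Hypothetical permission table (an assumption): one row per allowed Group/Page pair
access <- data.frame(
  Group = c(1, 1, 1, 2),
  Page = c("Orders", "Sales", "Team", "Team"),
  stringsAsFactors = FALSE
)

# Inner join on Group: each user gets one row per page their group may see
result <- merge(users, access, by = "Group")
result <- result[order(result$User, result$Page), ]
print(result, row.names = FALSE)
```

Adding a new group or page then only means adding rows to `access`, not changing the code.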
I have survey data structured as follows:
df <- data.frame(userid = c(1, 2, 3),
pos1 = c("itemA_1", "itemB_1", "itemA_2"),
pos2 = c("itemB_1", "itemC_2", "itemC_1"),
pos3 = c("itemC_5", "itemA_4", "itemB_3")
)
> df
userid pos1 pos2 pos3
1 1 itemA_1 itemB_1 itemC_5
2 2 itemB_1 itemC_2 itemA_4
3 3 itemA_2 itemC_1 itemB_3
In the survey, several items (itemA, itemB, itemC ...) were rated on a five-point Likert scale ranging from 1 to 5. The order in which the items were answered was also saved.
For example, in the above data.frame user 1 rated itemA first and the rating was 1. Then he rated itemB and the rating was 1. Finally he rated itemC and the rating was 5.
User 2 started with itemB and the rating was 1, etc.
Obviously, that structure is not very useful for analysing the data. So I'd rather have it in a form like this:
userid itemA itemB itemC ...
1 1 1 5
2 4 1 2
3 2 3 1
But how can I get there? Thanks for help!
Get the data in long format, separate the rating value from 'item' and get the data in wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('pos'), names_to = NULL) %>%
separate(value, c('item', 'rating'), sep = '_', convert = TRUE) %>%
pivot_wider(names_from = item, values_from = rating)
# userid itemA itemB itemC
# <dbl> <int> <int> <int>
#1 1 1 1 5
#2 2 4 1 2
#3 3 2 3 1
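For anyone who prefers to avoid the tidyverse dependency, the same three steps (stack to long, split the rating off the item name, spread to wide) can be sketched in base R. This is only an illustration under the assumption that every cell has the form `item_rating`; the intermediate names `long` and `wide` are mine:

```r
df <- data.frame(userid = c(1, 2, 3),
                 pos1 = c("itemA_1", "itemB_1", "itemA_2"),
                 pos2 = c("itemB_1", "itemC_2", "itemC_1"),
                 pos3 = c("itemC_5", "itemA_4", "itemB_3"),
                 stringsAsFactors = FALSE)

# Step 1: stack the pos* columns into one long column (column by column)
pos_cols <- c("pos1", "pos2", "pos3")
long <- data.frame(
  userid = rep(df$userid, times = length(pos_cols)),
  value = as.character(unlist(df[pos_cols], use.names = FALSE)),
  stringsAsFactors = FALSE
)

# Step 2: split "itemA_1" into the item name and the integer rating
parts <- strsplit(long$value, "_", fixed = TRUE)
long$item <- sapply(parts, `[`, 1)
long$rating <- as.integer(sapply(parts, `[`, 2))

# Step 3: one column per item; columns come out as rating.itemA, rating.itemB, ...
wide <- reshape(long[c("userid", "item", "rating")],
                idvar = "userid", timevar = "item", direction = "wide")
print(wide, row.names = FALSE)
```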
I have a dataframe with several "people".
There are repeat instances for "people", however, the measured "value" is different in each instance.
Here is an example of dataframe.
df2 <- data.frame(
value = c(1, 2, 3, 4, 5),
people = c("d", "c", "b", "d", "b")
)
which looks like:
value people
1 d
2 c
3 b
4 d
5 b
I would like to group the data by "people", then sort the groups of rows by their smallest "value", and within the groups, I would like to sort ascending by the "value".
That is, I want to keep duplicates together while sorting by value.
Here is how I would like the data to look:
value people
1 d
4 d
2 c
3 b
5 b
I have tried multiple attempts with group_by and arrange using {dplyr} but seems I am missing something.
Thanks for the help.
I have made a change - for clarity, I do not want "people" sorted alphabetically - this is a schedule in reality - person D has the first appointment (1), and his second appointment is 4. I want them to appear first and together. Person C has a 2nd appointment. Person B has a 3rd appointment, his other appointment is 5. I hope this makes it more clear. Thanks again
You can use arrange in this form :
library(dplyr)
df2 %>%
arrange(value) %>%
arrange(match(people, unique(people)))
# value people
#1 1 d
#2 4 d
#3 2 c
#4 3 b
#5 5 b
The code is longer, but this will also work:
df2 %>% group_by(people) %>% arrange(value) %>%
mutate(d = first(value)) %>% arrange(d) %>% ungroup() %>% select(-d)
# A tibble: 5 x 2
value people
<dbl> <chr>
1 1 d
2 4 d
3 2 c
4 3 b
5 5 b
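I got your result with the following base-R one-liner. Note that it sorts "people" in reverse alphabetical order, so it matches the desired output only because d, c, b happen to be in that order in this example: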
df2[order(df2$people, decreasing = TRUE),]
# value people
# 1 1 d
# 4 4 d
# 2 2 c
# 3 3 b
# 5 5 b
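A base-R variant that does not depend on the alphabetical order of the names could look like the sketch below. It assumes, as in the clarification to the question, that each person should be placed by their earliest value; `first_value` is a helper name introduced here:

```r
df2 <- data.frame(
  value = c(1, 2, 3, 4, 5),
  people = c("d", "c", "b", "d", "b"),
  stringsAsFactors = FALSE
)

# For every row, the smallest value belonging to the same person
first_value <- ave(df2$value, df2$people, FUN = min)

# Order groups by their first value, then rows within a group by value
sorted <- df2[order(first_value, df2$value), ]
print(sorted)
```

Because the sort key is the earliest appointment rather than the name, renaming the people would not change the row order.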
I am working on a dataset in which I need to calculate how long it takes a retail store to replenish products after a shortage. Here is a quick view of the dataset in its simplest form:
Date <- c("2019-1-1","2019-1-2","2019-1-3","2019-1-4","2019-1-5","2019-1-6","2019-1-7","2019-1-8")
Product <- rep("Product A",8)
Net_Available_Qty <- c(-2,-2,10,8,-5,-6,-7,0)
sample_df <- data.frame(Date,Product,Net_Available_Qty)
When the Net_Available_Qty becomes negative, it means there is a shortage. When it turns back to 0 or positive qty, it means the supply has been recovered. What I need to calculate is the days between when we first see shortage and when it is recovered. In this case, for the 1st shortage, it took 2 days to recover and for the second shortage, it took 3 days to recover.
A tidyverse solution would be most welcome.
I hope someone else finds a cleaner solution, but this produces diffDate, which holds the number of days from the start of a shortage to the date the quantity turns positive or zero again.
library(dplyr)
sample_df %>%
mutate(sign = ifelse(Net_Available_Qty > 0, "pos", ifelse(Net_Available_Qty < 0, "neg", "zero")),
sign_lag = lag(sign, default = sign[1]), # get previous value (exception in the first place)
change = ifelse(sign != sign_lag, 1 , 0), # check if there's a change
sequence=sequence(rle(as.character(sign))$lengths)) %>%
group_by(sequence) %>%
mutate(diffDate = as.numeric(difftime(Date, lag(Date,1))),
diffDate=ifelse(Net_Available_Qty <0, NA, ifelse((sign=='pos'| sign=='zero') & sequence==1, diffDate, NA))) %>%
ungroup() %>%
select(Date, Product, Net_Available_Qty, diffDate)
@Schilker had a great idea using rle. I am building on his answer and offering a slightly shorter version that also uses cumsum.
Date <- c("2019-1-1","2019-1-2","2019-1-3","2019-1-4","2019-1-5","2019-1-6","2019-1-7","2019-1-8")
Product <- rep("Product A",8)
Net_Available_Qty <- c(-2,-2,10,8,-5,-6,-7,0)
sample_df <- data.frame(Date,Product,Net_Available_Qty)
library(tidyverse)
sample_df %>%
mutate(
diffDate = c(1, diff(as.Date(Date))),
sequence = sequence(rle(Net_Available_Qty >= 0)$lengths),
group = cumsum(c(TRUE, diff(sequence)) != 1L)
) %>%
group_by(group) %>%
mutate(n_days = max(cumsum(diffDate)))
#> # A tibble: 8 x 7
#> # Groups: group [4]
#> Date Product Net_Available_Qty diffDate sequence group n_days
#> <fct> <fct> <dbl> <dbl> <int> <int> <dbl>
#> 1 2019-1-1 Product A -2 1 1 0 2
#> 2 2019-1-2 Product A -2 1 2 0 2
#> 3 2019-1-3 Product A 10 1 1 1 2
#> 4 2019-1-4 Product A 8 1 2 1 2
#> 5 2019-1-5 Product A -5 1 1 2 3
#> 6 2019-1-6 Product A -6 1 2 2 3
#> 7 2019-1-7 Product A -7 1 3 2 3
#> 8 2019-1-8 Product A 0 1 1 3 1
Created on 2020-02-23 by the reprex package (v0.3.0)
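If only one row per shortage is wanted, the run boundaries from rle can be used directly. The following is a minimal base-R sketch for a single product; it assumes the rows are already sorted by date, and the names `runs`, `starts`, `ends` and `shortages` are introduced here for illustration:

```r
Date <- as.Date(c("2019-1-1", "2019-1-2", "2019-1-3", "2019-1-4",
                  "2019-1-5", "2019-1-6", "2019-1-7", "2019-1-8"))
Net_Available_Qty <- c(-2, -2, 10, 8, -5, -6, -7, 0)

# Runs of consecutive shortage (TRUE) and non-shortage (FALSE) days
runs <- rle(Net_Available_Qty < 0)
ends <- cumsum(runs$lengths)          # last row index of each run
starts <- ends - runs$lengths + 1     # first row index of each run

# Keep shortage runs that are followed by at least one recovered day
keep <- which(runs$values & ends < length(Date))

shortages <- data.frame(
  shortage_start = Date[starts[keep]],
  recovered_on = Date[ends[keep] + 1]
)
shortages$days_to_recover <- as.numeric(shortages$recovered_on - shortages$shortage_start)
print(shortages)
```

A shortage that is still open at the end of the data is dropped here, since it has no recovery date yet.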
I want to summarize relocations (between cities), based on a unique ID number. A sample dataframe, with two unique ID's:
year ID city adress
1 2013 1 B adress_1
2 2014 1 B adress_1
3 2015 1 A adress_2
4 2016 1 A adress_2
5 2013 2 B adress_3
6 2014 2 B adress_3
7 2015 2 C adress_4
8 2016 2 C adress_4
I have provided a sample code below. The summaries are correct, except for one thing. If, for example, a relocation is found between city B and city A, I want an output of relocation found from city B to city A (and number of times 1 = seen once in the dataframe). However, because of the properties of the summary function (and the tendency to store output in alphabetic order), I get the following output
tmp <- df %>% group_by(ID, city, adress) %>% summarize(numberofyears = n())
tmp <- tmp %>%
group_by(ID) %>%
#filter(n() >1) %>%
mutate(from = city[1], from_adres = adress[1], from_years = numberofyears[1], to = city[2],
to_adres = adress[2], to_years = numberofyears[2]) %>%
distinct(ID, .keep_all = TRUE) %>% select(-c(2:3))
# A tibble: 2 x 8
# Groups: ID [2]
ID numberofyears from from_adres from_years to to_adres to_years
<dbl> <int> <fct> <fct> <int> <fct> <fct> <int>
1 1 2 A adress_2 2 B adress_1 2
2 2 2 B adress_3 2 C adress_4 2
This is wrong, because we know that adress_1 precedes adress_2. When summarizing a relocation from City B to City C, I get the right results.
It is a very small detail, but an important one as I tried to demonstrate. Any suggestions would be very much appreciated!
Similar to @jyjek, but this will allow for the possibility of more than one move per ID.
library(tidyverse)
df <- data.frame(year = rep(2013:2016, 2),
ID = rep(1:2, each = 4),
city = c("B", "B", "A", "A", "B", "B", "C", "C"),
address = rep(1:4, each = 2),
stringsAsFactors = FALSE)
df %>%
group_by(ID, city, address) %>%
#note the first and last year at the address
summarise(startyear = min(year),
endyear = max(year)) %>%
#sort by ID and year
arrange(ID, startyear) %>%
group_by(ID) %>%
#grab the next address for each ID
mutate(to = lead(city),
to_address = lead(address),
to_years = lead(endyear) - lead(startyear) + 1,
from_years = endyear - startyear + 1) %>%
#exclude the last row of each ID, since there's no next address being moved to
filter(!is.na(to)) %>%
select(ID, from = city, from_address = address, from_years, to, to_address, to_years)
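The key point is that the rows must be in chronological order before "from" and "to" are paired up. As a rough base-R illustration of the same idea, the sketch below orders each ID by year and uses rle to find consecutive stays. It assumes a move is simply a change of city between consecutive years and ignores the address column; `moves` is a name introduced here:

```r
df <- data.frame(year = rep(2013:2016, 2),
                 ID = rep(1:2, each = 4),
                 city = c("B", "B", "A", "A", "B", "B", "C", "C"),
                 stringsAsFactors = FALSE)

moves <- do.call(rbind, lapply(split(df, df$ID), function(d) {
  d <- d[order(d$year), ]        # chronological order, not alphabetical
  stays <- rle(d$city)           # consecutive years spent in the same city
  n <- length(stays$values)
  if (n < 2) return(NULL)        # this ID never moved
  data.frame(ID = d$ID[1],
             from = stays$values[-n],
             from_years = stays$lengths[-n],
             to = stays$values[-1],
             to_years = stays$lengths[-1],
             stringsAsFactors = FALSE)
}))
print(moves, row.names = FALSE)
```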
Like this?
library(tidyverse)
df<-read.table(text=" year ID city adress
1 2013 1 B adress_1
2 2014 1 B adress_1
3 2015 1 A adress_2
4 2016 1 A adress_2
5 2013 2 B adress_3
6 2014 2 B adress_3
7 2015 2 C adress_4
8 2016 2 C adress_4",header=T)
df%>%
group_by(ID, city, adress)%>%
summarize(numberofyears = n())%>%
mutate(id=parse_number(adress))%>%
group_by(ID,id)%>%
arrange(id)%>%
ungroup()%>%
select(-id)%>%
group_by(ID)%>%
mutate(from=first(city), from_adres = first(adress),
from_years = first(numberofyears),to=last(city),
to_adres = last(adress),to_years=last(numberofyears))%>%
distinct(ID, .keep_all = TRUE)%>%
select(-c(2:3))
# A tibble: 2 x 8
# Groups: ID [2]
ID numberofyears from from_adres from_years to to_adres to_years
<int> <int> <fct> <fct> <int> <fct> <fct> <int>
1 1 2 B adress_1 2 A adress_2 2
2 2 2 B adress_3 2 C adress_4 2
I have data with a list of people's names and their ID numbers. Not all people with the same name will have the same ID number but everyone with different names should have a different ID number. Like this:
Name david david john john john john megan bill barbara chris chris
ID 1 1 2 2 2 3 4 5 6 7 8
I need to make sure that these IDs are correct. So, I want to write code that says "subset only if ID numbers are the same but their names are different" (so I will only be subsetting ID errors). I don't even know where to start with this because I tried
df1<-df(subset(duplicated(df$Name) & duplicated(df$ID)))
Error in subset.default(duplicated(df$officer) & duplicated(df$ID)) :
argument "subset" is missing, with no default
but it didn't work and I know it doesn't tell R to match and compare names and ID numbers.
Thank you so much in advance.
Updated with the information in the comments below
Here are some test data:
> DF <- data.frame(name = c("A", "A", "A", "B", "B", "C"), id=c(1,1,2,3,4,4))
> DF
name id
1 A 1
2 A 1
3 A 2
4 B 3
5 B 4
6 C 4
So ... if I understand your problem correctly you want to get the information that there are problems with id 4 since two different names (B and C) appear for that id.
library(dplyr)
DF %>% group_by(id) %>% distinct(name) %>% tally()
# A tibble: 4 x 2
id n
<dbl> <int>
1 1 1
2 2 1
3 3 1
4 4 2
Here we get a summary and see that there are two different names (n) for id 4. You can combine that with filter to only see the ids with more than one name
> DF %>% group_by(id) %>% distinct(name) %>% tally() %>% filter(n > 1)
# A tibble: 1 x 2
id n
<dbl> <int>
1 4 2
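If you want the offending rows themselves rather than a count per id, which seems to be what the "subset" in the question is after, something along these lines might do it in base R. It is only a sketch on the test data above; `names_per_id`, `bad_ids` and `errors` are names I made up:

```r
DF <- data.frame(name = c("A", "A", "A", "B", "B", "C"),
                 id = c(1, 1, 2, 3, 4, 4),
                 stringsAsFactors = FALSE)

# Number of distinct names seen for each id
names_per_id <- tapply(DF$name, DF$id, function(x) length(unique(x)))

# ids shared by more than one name (tapply stores the ids as names, hence as.numeric)
bad_ids <- as.numeric(names(names_per_id)[names_per_id > 1])

# Keep only the rows with a conflicting id
errors <- DF[DF$id %in% bad_ids, ]
print(errors)
```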
Did that help?