How to subtract rows where column factor matches using dplyr?

How to subtract rows where column factor matches using dplyr? - r

I'm moving to dplyr for much of my data wrangling, but I can't figure out how to do a row-diff based on a factor with it.
I can use ddply from plyr as follows:
ddply(.data = dat_frame, .variables = .(the_factor), .fun = summarise, diff = diff(the_number))
the_factor diff
1 169 0.000
2 169 0.000
3 372 22.557
4 372 0.000
5 372 -19.491
6 372 2.940
7 372 -2.767
8 372 -5.310
9 508 0.000
Source data:
structure(list(the_factor = structure(c(3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 8L, 7L, 10L, 5L, 9L, 2L, 2L, 2L, 1L, 11L, 6L, 6L), .Label = c("166",
"169", "276", "372", "409", "508", "523", "714", "846", "876",
"969"), class = "factor"), the_date = structure(c(4L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 1L, 8L, 7L, 2L, 11L, 2L, 3L, 5L, 10L,
8L, 6L, 9L), .Label = c("2012-05-19 21:27:00", "2012-08-02 03:49:00",
"2012-08-02 03:50:00", "2012-08-02 03:52:00", "2012-08-02 08:36:00",
"2013-03-15 03:38:00", "2013-03-15 03:40:00", "2013-03-15 03:41:00",
"2013-03-15 09:14:00", "2013-04-24 13:45:00", "2013-09-04 09:17:00",
"2014-03-12 14:21:00", "2014-03-12 19:45:00", "2014-03-13 04:51:00",
"2014-03-13 21:04:00", "2014-03-14 01:18:00", "2014-03-14 04:49:00",
"2014-03-14 12:09:00"), class = "factor"), the_number = c(0.02,
17.443, 40, 40, 20.509, 23.449, 20.682, 15.372, 0.02, 0.02, 0.02,
0.02, 1.74, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02)), .Names = c("the_factor",
"the_date", "the_number"), row.names = c(NA, -20L), class = "data.frame")

Related

R - Create dataframe from a list of data with different number of columns

I have a dataset showing date, name of the location and values in the same list. My goal is to generate a dataframe in a loop where columns show the value for each date and the name of the location are the rows.
However, the dates don't always match. Missing values can be NA.
Furthermore, I only need the second value (value2).
What's the easiest way to do this?
Sample data:
data <- structure(list(date = structure(c(7L, 4L, 6L, 15L, 3L, 1L, 13L,
2L, 16L, 11L, 7L, 14L, 8L, 4L, 6L, 15L, 3L, 9L, 10L, 1L, 12L,
5L, 7L, 14L, 8L, 4L, 6L, 15L, 9L, 10L, 1L, 13L, 2L, 16L, 11L), .Label = c("01.10.2013",
"01.10.2015", "08.10.2010", "13.09.2007", "16.09.2003", "17.09.2008",
"20.09.2004", "21.09.2006", "23.09.2011", "26.09.2012", "26.09.2017",
"27.08.2001", "29.09.2014", "30.08.2005", "30.09.2009", "30.09.2016"
), class = "factor"), name = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("P1",
"P2", "P3"), class = "factor"), value = c(14L, 453L, 345L, 765L,
87L, 45L, 4L, 12L, 2L, 6L, 84L, 11L, 87L, 45L, 4L, 12L, 12L,
14L, 453L, 345L, 51L, 50L, 123L, 34L, 75L, 74L, 42L, 2L, 123L,
42L, 12L, 6L, 9L, 4L, 1L), value2 = c(0.046, 0.003, 0.022, 0.016,
-0.035, 0.032, 0.013, -0.001, 0.018, -0.006, 0.017, -0.001, 0.02,
0, 0.009, 0.191, 0.169, 0.191, 0.286, 0.324, 0.426, 0.35, 0.212,
0.107, 0.081, 0.034, 0.084, 0.092, 0.054, 0.019, 0.022, 0.017,
0.018, 0.002, 0.017)), .Names = c("date", "name", "value", "value2"
), class = "data.frame", row.names = c(NA, -35L))
So far, I've tried different things I found on the internet but none has worked for me.
First, I generated a dataframe with all unique dates found in the list
data$date <- as.Date(data$date, format="%d.%m.%Y")
uniquedates <- as.data.frame(unique(sort(data$date)))
colnames(uniquedates) <- c("date")
Next, I split up the data by name.
split <- split(data, data$name)
Finally, I tried get the date and value2 from every split and merge them together in a loop.
for (i in seq_along(split)) {
point <- split[[i]][,c("date","value2")]
name <- as.character(unique(split[[i]]$name))
colnames(merge)[colnames(merge) == "value2"] <- name
merge <- merge(x=uniquedates, y = point, by='date', all.x = TRUE)
}
This is the result I'm looking for:
date P1 P2 P3
27.08.2001 NA 0.426 NA
16.09.2003 NA NA 0.35
20.09.2004 0.046 0.017 0.212
30.08.2005 NA -0.001 0.107
21.09.2006 NA 0.02 0.081
13.09.2007 0.003 0 0.034
17.09.2008 0.022 0.009 0.084
30.09.2009 0.016 0.191 0.092
08.10.2010 -0.035 0.169 NA
23.09.2011 NA 0.191 0.054
26.09.2012 NA 0.286 0.019
01.10.2013 0.032 0.324 0.022
29.09.2014 0.013 NA 0.017
01.10.2015 -0.001 NA 0.018
30.09.2016 0.018 NA 0.002
26.09.2017 -0.006 NA 0.017

Look into the reshape2 package and the dcast and melt methods.
library(dplyr)
library(reshape2)
data2 = data%>%
mutate(date = as.Date(date,format = '%d.%m.%Y'))%>% #Convert date to a date time object
select(-value)%>% #Remove value because we dont need it
dcast(date~name,value.var = "value2") # Pivot the dataframe

Count mean subgroup occurrence within subgroup

I have the following dataframe:
date hour_of_day distance weather_of_the_day
2017-06-13 6 10.32 1
2017-06-13 8 2.32 1
2017-06-14 10 4.21 2
2017-06-15 7 4.56 4
2017-06-15 7 8.92 4
2017-06-16 22 2.11 3
structure(list(startdat = structure(c(17272, 17272, 17272, 17272,17272, 17272, 17272, 17272, 17272, 17272, 17272, 17272, 17272,17272, 17272, 17272, 17273, 17273, 17273, 17273), class = "Date"), hOfDay = c(22L, 16L, 12L, 13L, 18L, 19L, 19L, 16L, 22L, 10L,
10L, 16L, 11L, 20L, 9L, 15L, 18L, 12L, 16L, 18L), tripDKM = c(0.2,
6.4, 3.4, 0.8, 2.4, 2.2, 2.2, 7.3, 2.6, 3.8, 7.5, 5.8, 3.7,
2.1, 2.6, 5.2, 2.9, 1.7, 3.2, 3.1), totDMIN = c(1.85, 27.4,
8.2, 4.21666666666667, 15.65, 8.91666666666667, 11.5666666666667,
29.5166666666667, 7.01666666666667, 12.2166666666667, 15.8833333333333,
19.5666666666667, 21.7166666666667, 8.66666666666667, 11.2333333333333,
13.4, 7.58333333333333, 10.6166666666667, 6.76666666666667,
17.7), weather_day = structure(c(3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor")), row.names = c(1L, 2L,3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 19L, 20L, 21L, 22L), class = "data.frame")
My final goal is to have a line ggplot, where the x-axis shows the hour_of_day, the y-axis stands for the mean number of occurrences. Eventually the lines should represent the 4 weather conditions. So one line ought to represent weather_of_the_day=1, and the y axis shows how often, on average weather_day=1 has an occurrence with hour_of_day=6 (as an example) and so on for 7, 8, etc.. What I want, are not only the number of occurrences, but the average number of occurrences.
I've been struggling for 2 days with this. I've tried different approaches, with for loops and subgrouping. But non of them brought a usable solution. Thank you very much for your help in advance!

Your posted data set is a little small but this is what I would suggest. It only makes sense with more data points though. df is the set you posted.
library(dplyr)
library(ggplot2)
df_plot <- df %>%
mutate(weather_of_the_day = factor(weather_of_the_day)) %>%
group_by(hour_of_day, weather_of_the_day) %>%
summarize(occurances = n())
ggplot(data = df_plot,
aes(x = hour_of_day,
y = occurances,
group = weather_of_the_day,
color = weather_of_the_day)) +
geom_line()+
geom_point()

I'm not completely sure if this mathes your desired output, but I gave it a try:
#Importing packages
library(dplyr)
library(ggplot2)
d <- structure(list(startdat = structure(c(17272, 17272, 17272, 17272,17272, 17272, 17272, 17272, 17272, 17272, 17272, 17272, 17272,17272, 17272, 17272, 17273, 17273, 17273, 17273),
class = "Date"),
hOfDay = c(22L, 16L, 12L, 13L, 18L, 19L, 19L, 16L, 22L, 10L, 10L, 16L, 11L, 20L, 9L, 15L, 18L, 12L, 16L, 18L),
tripDKM = c(0.2, 6.4, 3.4, 0.8, 2.4, 2.2, 2.2, 7.3, 2.6, 3.8, 7.5, 5.8, 3.7, 2.1, 2.6, 5.2, 2.9, 1.7, 3.2, 3.1),
totDMIN = c(1.85, 27.4, 8.2, 4.21666666666667, 15.65, 8.91666666666667, 11.5666666666667, 29.5166666666667, 7.01666666666667, 12.2166666666667, 15.8833333333333, 19.5666666666667, 21.7166666666667, 8.66666666666667, 11.2333333333333, 13.4, 7.58333333333333, 10.6166666666667, 6.76666666666667, 17.7),
weather_day = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
.Label = c("1", "2", "3", "4"),
class = "factor")),
row.names = c(1L, 2L,3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 19L, 20L, 21L, 22L),
class = "data.frame")
#Count how often every weather_day occurs during every hOfDay
plot_data <- d %>%
group_by(hOfDay, weather_day) %>%
summarize(n_occurences = n())
#Create plot
ggplot(plot_data, aes(x = hOfDay, y = n_occurences)) +
geom_line(aes(col = weather_day))

Trouble with GLMM with glmer in R: Error in pwrssUpdate...halvings failed to reduce deviance in pwrssUpdate

Here's a snipped of randomly selected data from my full dataframe:
canopy<-structure(list(Stage = structure(c(6L, 5L, 3L, 6L, 7L, 5L, 4L,
7L, 2L, 7L, 5L, 1L, 1L, 4L, 3L, 6L, 5L, 7L, 4L, 4L), .Label = c("milpa",
"robir", "jurup che", "pak che kor", "mehen che", "nu kux che",
"tam che"), class = c("ordered", "factor")), ID = c(44L, 34L,
18L, 64L, 54L, 59L, 28L, 51L, 11L, 56L, 33L, 1L, 7L, 25L, 58L,
48L, 36L, 51L, 27L, 66L), Sample = c(4L, 2L, 2L, 10L, 6L, 9L,
4L, 3L, 3L, 8L, 1L, 1L, 7L, 1L, 10L, 8L, 4L, 3L, 3L, 10L), Subsample = c(2L,
3L, 4L, 3L, 2L, 1L, 3L, 2L, 4L, 3L, 1L, 3L, 2L, 4L, 1L, 1L, 3L,
1L, 1L, 4L), Size..ha. = c(0.5, 0.5, 0.5, 0.5, 6, 0.5, 0.5, 0.25,
0.5, 6, 1, 1, 0.5, 2, 1, 0.5, 1, 0.25, 0.5, 2), Avg.Subsample.Canopy = c(94.8,
94.8, 97.92, 96.88, 97.14, 92.46, 93.24, 97.4, 25.64, 97.4, 94.8,
33.7, 13.42, 98.18, 85.44, 96.36, 97.4, 95.58, 85.7, 92.2), dec = c(0.948,
0.948, 0.9792, 0.9688, 0.9714, 0.9246, 0.9324, 0.974, 0.2564,
0.974, 0.948, 0.337, 0.1342, 0.9818, 0.8544, 0.9636, 0.974, 0.9558,
0.857, 0.922)), .Names = c("Stage", "ID", "Sample", "Subsample",
"Size..ha.", "Avg.Subsample.Canopy", "dec"), row.names = c(693L,
537L, 285L, 1017L, 853L, 929L, 441L, 805L, 173L, 889L, 513L,
9L, 101L, 397L, 913L, 753L, 569L, 801L, 417L, 1053L), class = "data.frame")
I am trying to code a GLMM of dec as a function of Stage and Size..ha.
The GLMM is necessary because each row represents a point Subsample measured within a larger Sample area. I am also using a binomial distribution given dec are proportional data.
I tried the model:
canopy.binomial.mod<-glmer(dec~Stage*Size..ha.+(1|Sample),family="binomial",data=canopy)
summary(canopy.binomial.mod)
but get the error:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev
= compDev, : (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
I've seen online that this can be a result of needing to scale a predictor variable, so I tried:
cs. <- function(x) scale(x,scale=TRUE,center=TRUE)
canopy.binomial.mod<-glmer(dec~Stage*cs.(Size..ha.)+(1|Sample),family="binomial",data=canopy.rmna)
summary(canopy.binomial.mod)
Which doesn't seem to help. I also thought that maybe I'm asking too much of the model and it's not converging due to too many predictor variables, so let's remove the Size variable, which is of less interest to me.
canopy.binomial.mod<-glmer(dec~Stage+(1|Sample),family="binomial",data=canopy.rmna)
summary(canopy.binomial.mod)
Still no luck. Any ideas how to address this?

R plot color legend by factor

Using R 3.3.1 in Windows 10. I'm making an x-y plot from 95 rows of data. The data are in 6 different groupings (a factor called "group"). The plot itself is easy enough, but I can't get the legend to properly account for the factor and color correctly.
Here's the data in a variable v1:
v1 <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("F9", "T26", "W37",
"W40", "W41", "W42"), class = "factor"), point = c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L), x = c(-7.064, -5.1681,
-6.4866, -2.7522, -4.6305, -4.2957, -3.7552, -4.9482, -5.6452,
-6.0302, -5.3244, -3.9819, -3.8123, -5.3085, -5.6096, -6.4557,
-5.2549, -3.4893, -3.5909, -2.5546, -3.7247, -5.1733, -3.3451,
-2.8993, -2.6835, -3.9495, -4.9649, -2.8438, -4.6926, -3.4768,
-3.1221, -4.8175, -4.5641, -3.549, -3.08, -2.4153, -2.9882, -3.4045,
-4.6394, -3.3404, -2.6728, -3.3517, -2.6098, -3.7733, -4.051,
-2.9385, -4.5024, -4.59, -4.5617, -4.0658, -2.4986, -3.7559,
-4.245, -4.8045, -4.6615, -4.0696, -4.6638, -4.6505, -3.7978,
-4.5649, -5.7669, -4.519, -3.8561, -3.779, -3.0549, -3.1241,
-2.1423, -3.2759, -4.224, -4.028, -3.3412, -2.8832, -3.3866,
-0.1852, -3.3763, -4.317, -5.3607, -3.3398, -1.9087, -4.431,
-3.7535, -3.2545, -0.806, -3.1419, -3.7269, -3.4853, -4.3129,
-2.8891, -3.0572, -5.3309, -2.5837, -4.1128, -4.6631, -3.4695,
-4.1045), y = c(7.76, 0.72, 4.1, 1.36, 0.13, -0.02, 0.13, 0.42,
1.49, 2.64, 1.01, 0.08, 0.22, 1.01, 1.53, 4.39, 0.99, 0.56, 0.43,
2.31, 0.31, 0.59, 0.62, 1.65, 2.12, 0.1, 0.24, 1.68, 0.09, 0.59,
1.23, 0.4, 0.36, 0.49, 1.41, 3.29, 1.22, 0.56, 0.1, 0.67, 2.38,
0.43, 1.56, 0.07, 0.08, 1.53, -0.01, 0.12, 0.1, 0.04, 3.42, 0.23,
0, 0.34, 0.15, 0.03, 0.19, 0.17, 0.2, 0.09, 2.3, 0.07, 0.15,
0.18, 1.07, 1.21, 3.4, 0.8, -0.04, 0.02, 0.74, 1.59, 0.71, 10.64,
0.64, -0.01, 1.06, 0.81, 4.58, 0.01, 0.14, 0.59, 7.35, 0.63,
0.17, 0.38, -0.08, 1.1, 0.89, 0.94, 1.52, 0.01, 0.1, 0.38, 0.02
)), .Names = c("group", "point", "x", "y"), class = "data.frame", row.names = c(NA,
-95L))
Here's the plot my attempts to overlay a legend:
> attach(v1)
> plot(x,y, pch=16, col=group) #simple plot, automatic colors
> #first legend
> legend("topleft", legend=group, pch=16, col=group)
> # colors matched, but it's breaking out every point
> legend("topright", legend=levels(group), pch=16, col=group)
> # Corrected the number of levels in legend, but no colors
>
You can see that the first legend appears correct color-wise, but it shows an entry for every point and runs out of space. The second legend shows group as factor levels, which is what I want, but it doesn't change the colors.
I realize that I could color as a vector (e.g. col(c("black","red", etc.), but since the original plot command automatically assigned colors, I'm looking to do it "automatically" in my legend and avoid the risk of putting the wrong colors in my vector.
Thanks!

base R solution:
attach(v1)
plot(x,y, pch=16, col=group)
legend("topleft", legend=levels(group), pch=16, col=unique(group))
ggplot2 solution
ggplot(v1)+
geom_point(aes(x=x,y=y,colour=group))+
theme_bw()
Again, I would strongly suggest the use of ggplot2 over base R unless you're only exploring the data. There are plenty of questions/answers on the matter on SO.

Try creating a new column in v1 that is a number based on the value of group (as a factor). Pass this column as the col when plotting the points. Then create a vector of numbers for legend in the same way and pass that as the col for legend.
v1$cols = as.numeric(as.factor(v1$group))
legend.cols = as.numeric(as.factor(levels(v1$group)))
plot(v1$x , v1$y, pch=16, col=v1$cols)
legend("topright", legend=levels(group), pch=16, col=legend.cols)

Converting object of class rules to data frame in R

I have an output of apriori function, which mines data and gives set of rules. I want to convert it to data frame for further processing.
The rules object looks like this:
> inspect(output)
lhs rhs support confidence lift
1 {curtosis=(846,1.27e+03]} => {skewness=(-0.254,419]} 0.2611233 0.8044944 2.418776
2 {variance=(892,1.34e+03]} => {notes.class=FALSE} 0.3231218 0.9888393 1.781470
3 {variance=(-0.336,446]} => {notes.class=TRUE} 0.2859227 0.8634361 1.940608
4 {skewness=(837,1.26e+03]} => {notes.class=FALSE} 0.2924872 0.8774617 1.580815
5 {entropy=(-0.155,386],
class=FALSE} => {skewness=(837,1.26e+03]} 0.1597374 0.9521739 2.856522
6 {variance=(-0.336,446],
curtosis=(846,1.27e+03]} => {skewness=(-0.254,419]} 0.1378556 0.8325991 2.503275
We can create rules object using data frame. Data frame looks like this:
> data
variance skewness curtosis entropy notes.class
1 (892,1.34e+03] (837,1.26e+03] (-0.268,424] (386,771] FALSE
2 (892,1.34e+03] (-0.254,419] (424,846] (771,1.16e+03] FALSE
3 (892,1.34e+03] (837,1.26e+03] (-0.268,424] (-0.155,386] FALSE
4 (446,892] (-0.254,419] (846,1.27e+03] (386,771] FALSE
Than we can get output variable using this:
> output <- apriori(data)
There was used arules package. dput(output) gives this:
new("rules"
, lhs = new("itemMatrix"
, data = new("ngCMatrix"
, i = c(8L, 2L, 0L, 5L, 9L, 12L, 0L, 8L, 0L, 3L, 0L, 8L, 8L, 13L, 8L,
10L, 3L, 10L, 8L, 11L, 8L, 13L, 3L, 12L, 2L, 5L, 2L, 6L, 2L,
5L, 2L, 6L, 2L, 10L, 2L, 7L, 2L, 11L, 0L, 3L, 0L, 10L, 0L, 7L,
11L, 13L, 5L, 6L, 6L, 12L, 5L, 10L, 1L, 5L, 4L, 6L, 6L, 13L,
0L, 3L, 8L, 0L, 8L, 13L, 3L, 8L, 13L, 0L, 3L, 13L, 2L, 5L, 6L,
2L, 5L, 12L, 2L, 6L, 12L)
, p = c(0L, 1L, 2L, 3L, 4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L, 22L,
24L, 26L, 28L, 30L, 32L, 34L, 36L, 38L, 40L, 42L, 44L, 46L, 48L,
50L, 52L, 54L, 56L, 58L, 61L, 64L, 67L, 70L, 73L, 76L, 79L)
, Dim = c(14L, 38L)
, Dimnames = list(NULL, NULL)
, factors = list()
)
, itemInfo = structure(list(labels = structure(c("variance=(-0.336,446]",
"variance=(446,892]", "variance=(892,1.34e+03]", "skewness=(-0.254,419]",
"skewness=(419,837]", "skewness=(837,1.26e+03]", "curtosis=(-0.268,424]",
"curtosis=(424,846]", "curtosis=(846,1.27e+03]", "entropy=(-0.155,386]",
"entropy=(386,771]", "entropy=(771,1.16e+03]", "notes.class=FALSE",
"notes.class=TRUE"), class = "AsIs"), variables = structure(c(5L,
5L, 5L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("curtosis",
"entropy", "notes.class", "skewness", "variance"), class = "factor"),
levels = structure(c(4L, 8L, 12L, 2L, 6L, 10L, 3L, 7L, 11L,
1L, 5L, 9L, 13L, 14L), .Label = c("(-0.155,386]", "(-0.254,419]",
"(-0.268,424]", "(-0.336,446]", "(386,771]", "(419,837]",
"(424,846]", "(446,892]", "(771,1.16e+03]", "(837,1.26e+03]",
"(846,1.27e+03]", "(892,1.34e+03]", "FALSE", "TRUE"), class = "factor")), .Names = c("labels",
"variables", "levels"), row.names = c(NA, -14L), class = "data.frame")
, itemsetInfo = structure(list(), .Names = character(0), row.names = integer(0), class = "data.frame")
)
, rhs = new("itemMatrix"
, data = new("ngCMatrix"
, i = c(3L, 12L, 13L, 12L, 5L, 3L, 8L, 13L, 0L, 3L, 8L, 3L, 3L, 8L,
6L, 5L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 3L, 12L, 5L,
12L, 12L, 13L, 4L, 13L, 3L, 0L, 8L, 12L, 6L, 5L)
, p = 0:38
, Dim = c(14L, 38L)
, Dimnames = list(NULL, NULL)
, factors = list()
)
, itemInfo = structure(list(labels = structure(c("variance=(-0.336,446]",
"variance=(446,892]", "variance=(892,1.34e+03]", "skewness=(-0.254,419]",
"skewness=(419,837]", "skewness=(837,1.26e+03]", "curtosis=(-0.268,424]",
"curtosis=(424,846]", "curtosis=(846,1.27e+03]", "entropy=(-0.155,386]",
"entropy=(386,771]", "entropy=(771,1.16e+03]", "notes.class=FALSE",
"notes.class=TRUE"), class = "AsIs"), variables = structure(c(5L,
5L, 5L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("curtosis",
"entropy", "notes.class", "skewness", "variance"), class = "factor"),
levels = structure(c(4L, 8L, 12L, 2L, 6L, 10L, 3L, 7L, 11L,
1L, 5L, 9L, 13L, 14L), .Label = c("(-0.155,386]", "(-0.254,419]",
"(-0.268,424]", "(-0.336,446]", "(386,771]", "(419,837]",
"(424,846]", "(446,892]", "(771,1.16e+03]", "(837,1.26e+03]",
"(846,1.27e+03]", "(892,1.34e+03]", "FALSE", "TRUE"), class = "factor")), .Names = c("labels",
"variables", "levels"), row.names = c(NA, -14L), class = "data.frame")
, itemsetInfo = structure(list(), .Names = character(0), row.names = integer(0), class = "data.frame")
)
, quality = structure(list(support = c(0.261123267687819, 0.323121808898614,
0.285922684172137, 0.292487235594457, 0.159737417943107, 0.137855579868709,
0.137855579868709, 0.142231947483589, 0.142231947483589, 0.110138584974471,
0.110138584974471, 0.12399708242159, 0.153902261123268, 0.107221006564551,
0.13056163384391, 0.13056163384391, 0.150984682713348, 0.139314369073669,
0.100656455142232, 0.107221006564551, 0.154631655725748, 0.165572574762947,
0.112326768781911, 0.105762217359592, 0.12180889861415, 0.181619256017505,
0.181619256017505, 0.102844638949672, 0.105762217359592, 0.12837345003647,
0.12837345003647, 0.137855579868709, 0.137855579868709, 0.137855579868709,
0.137855579868709, 0.13056163384391, 0.13056163384391, 0.13056163384391
), confidence = c(0.804494382022472, 0.988839285714286, 0.863436123348018,
0.87746170678337, 0.952173913043478, 0.832599118942731, 0.832599118942731,
0.859030837004405, 0.898617511520737, 0.853107344632768, 0.915151515151515,
0.80188679245283, 0.972350230414747, 0.885542168674699, 0.864734299516908,
0.913265306122449, 1, 0.974489795918367, 1, 1, 0.990654205607477,
1, 0.980891719745223, 0.873493975903614, 0.814634146341463, 0.943181818181818,
0.950381679389313, 1, 0.92948717948718, 0.931216931216931, 0.897959183673469,
1, 0.969230769230769, 0.895734597156398, 0.832599118942731, 1,
0.864734299516908, 0.93717277486911), lift = c(2.41877587226493,
1.78146998779801, 1.94060807395104, 1.580814717477, 2.85652173913043,
2.50327498261071, 2.56515369004603, 1.93070701234925, 2.71366653809456,
2.56493458221826, 2.81948927477017, 2.41093594836147, 2.92344773223381,
2.72826587247868, 2.58853870008227, 2.73979591836735, 1.80157687253614,
1.75561827884899, 1.80157687253614, 1.80157687253614, 1.78473970550309,
2.24754098360656, 2.20459434060771, 1.96321350977681, 2.44926187419769,
1.69921455023295, 2.85114503816794, 1.80157687253614, 1.67454260588295,
2.09294821753838, 2.68799572230639, 2.24754098360656, 2.91406882591093,
2.70496064471679, 2.56515369004603, 1.80157687253614, 2.58853870008227,
2.81151832460733)), row.names = c(NA, 38L), .Names = c("support",
"confidence", "lift"), class = "data.frame")
, info = structure(list(data = data, ntransactions = 1371L, support = 0.1,
confidence = 0.8), .Names = c("data", "ntransactions", "support",
"confidence"))
)

We can't duplicate your data from your question (oh, you just added your data as I was typing this! Sorry!), so I'll use the example from the arules package:
library('arules');
data("Adult")
## Mine association rules.
rules <- apriori(Adult,
parameter = list(supp = 0.5, conf = 0.9,
target = "rules"))
Then I can duplicate the stuff output from inspect(rules):
> ruledf = data.frame(
lhs = labels(lhs(rules))$elements,
rhs = labels(rhs(rules))$elements,
rules#quality)
> head(ruledf)
lhs rhs support confidence lift
1 {} {capital-gain=None} 0.9173867 0.9173867 1.0000000
2 {} {capital-loss=None} 0.9532779 0.9532779 1.0000000
3 {hours-per-week=Full-time} {capital-gain=None} 0.5435895 0.9290688 1.0127342
4 {hours-per-week=Full-time} {capital-loss=None} 0.5606650 0.9582531 1.0052191
5 {sex=Male} {capital-gain=None} 0.6050735 0.9051455 0.9866565
6 {sex=Male} {capital-loss=None} 0.6331027 0.9470750 0.9934931
and do stuff like order by decreasing lift:
head(ruledf[order(-ruledf$lift),])
The help for the rules class: http://www.rdocumentation.org/packages/arules/functions/rules-class.html will tell you what you can get from your rules object - I just used that information to build a data frame. If its not exactly what you want, then cook one up using your own recipe!

Run apriori in data Adult
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target =
"rules"))
Inspect LHS, RHS, support, confidence and lift
arules::inspect(rules)
Create a dataframe
df = data.frame(
lhs = labels(lhs(rules)),
rhs = labels(rhs(rules)),
rules#quality)
View top 6 lines in new dataframe
head(df)

This does the trick
rules_dataframe <- as(output, 'data.frame')

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to subtract rows where column factor matches using dplyr? - r

Related

R - Create dataframe from a list of data with different number of columns

Count mean subgroup occurrence within subgroup

Trouble with GLMM with glmer in R: Error in pwrssUpdate...halvings failed to reduce deviance in pwrssUpdate

R plot color legend by factor

Converting object of class rules to data frame in R

Categories

Resources