So I am trying to write a function with dplyr, without a loop, and here is something I do not know how to do.
Say we have TV stations (x, y, z) and months (2, 3). Grouping by these and summarising a numeric value
gives this output:
TV months value
x 2 52
y 2 87
z 2 65
x 3 180
y 3 36
z 3 99
This is for the evaluated brand.
I will then have many other brands, and I need to keep only those whose value is >= 0.8 * the evaluated brand's value and <= 1.2 * the evaluated brand's value.
For example, from the table below I would only want to keep the first two rows, and this should be done for every month and TV combination.
brand TV MONTH value
sdg x 2 60
sdfg x 2 55
shs x 2 120
sdg x 2 11
sdga x 2 5000
As @akrun said, you need to use a combination of merging and subsetting. Here's a base R solution.
m <- merge(df, data, by.x=c("TV", "MONTH"), by.y=c("TV", "months"))
m[m$value.x >= m$value.y*0.8 & m$value.x <= m$value.y*1.2,][,-5]
# TV MONTH brand value.x
#1 x 2 sdg 60
#2 x 2 sdfg 55
Data
data <- structure(list(TV = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("x",
"y", "z"), class = "factor"), months = c(2L, 2L, 2L, 3L, 3L,
3L), value = c(52L, 87L, 65L, 180L, 36L, 99L)), .Names = c("TV",
"months", "value"), class = "data.frame", row.names = c(NA, -6L
))
df <- structure(list(brand = structure(c(2L, 1L, 4L, 2L, 3L), .Label = c("sdfg",
"sdg", "sdga", "shs"), class = "factor"), TV = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "x", class = "factor"), MONTH = c(2L,
2L, 2L, 2L, 2L), value = c(60L, 55L, 120L, 11L, 5000L)), .Names = c("brand",
"TV", "MONTH", "value"), class = "data.frame", row.names = c(NA,
-5L))
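Since the question asks for dplyr, the same merge-then-subset idea can be sketched with a join followed by a filter. This is only an illustration: the object names `ref`, `brands` and `kept` are made up here, the data are retyped from the question as plain character columns, and the `_brand` / `_ref` suffixes are an arbitrary choice.

```r
library(dplyr)

# Reference values for the evaluated brand (retyped from the question).
ref <- data.frame(TV = c("x", "y", "z", "x", "y", "z"),
                  months = c(2L, 2L, 2L, 3L, 3L, 3L),
                  value = c(52, 87, 65, 180, 36, 99),
                  stringsAsFactors = FALSE)

# Candidate brands to be screened (retyped from the question).
brands <- data.frame(brand = c("sdg", "sdfg", "shs", "sdg", "sdga"),
                     TV = "x", MONTH = 2L,
                     value = c(60, 55, 120, 11, 5000),
                     stringsAsFactors = FALSE)

kept <- brands %>%
  # Attach the reference value for the matching TV/month combination.
  inner_join(ref, by = c("TV", "MONTH" = "months"),
             suffix = c("_brand", "_ref")) %>%
  # Keep brands within +/- 20% of the reference value.
  filter(value_brand >= 0.8 * value_ref,
         value_brand <= 1.2 * value_ref)

kept
```

Because the join is keyed on both TV and month, the same two lines should cover every TV and month combination at once, with no loop.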
I have the following dataset
structure(list(Var1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("0", "1"), class = "factor"), Var2 = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("congruent", "incongruent"
), class = "factor"), Var3 = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("spoken", "written"), class = "factor"),
Freq = c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
I would like to add another column reporting sum of coupled subsequent rows. Thus the final result would look like this:
I have proceeded like this
Table = as.data.frame(table(data_1$unimodal,data_1$cong_cond, data_1$presentation_mode)) %>%
mutate(Var1 = factor(Var1, levels = c('0', '1')))
row = Table %>% #is.factor(Table$Var1)
summarise(across(where(is.numeric),
~ .[Var1 == '0'] + .[Var1 == '1'],
.names = "{.col}_sum"))
column = c(rbind(row$Freq_sum,rep(NA, 4)))
Table$column = column
But I am looking for the quickest way possible, without splitting the code into separate steps. I have used the dplyr package here, but if you know other ways, with map(), a for loop, or whatever method you deem best, please let me know.
This should do:
df$column <-
rep(colSums(matrix(df$Freq, 2)), each=2) * c(1, NA)
If you are fine with having no NAs in the data frame, you can use:
df %>%
group_by(Var2, Var3) %>%
mutate(column = sum(Freq))
# A tibble: 8 × 5
# Groups: Var2, Var3 [4]
Var1 Var2 Var3 Freq column
<fct> <fct> <fct> <int> <int>
1 0 congruent spoken 8 10
2 1 congruent spoken 2 10
3 0 incongruent spoken 10 12
4 1 incongruent spoken 2 12
5 0 congruent written 10 12
6 1 congruent written 2 12
7 0 incongruent written 10 12
8 1 incongruent written 2 12
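As a possible base R alternative to both answers, `ave()` can compute the group sum without dplyr, and the NA pattern can then be keyed on `Var1` rather than on row position. This is a sketch only: the data frame is retyped from the question, and `pair_sum` is a name made up for the example.

```r
# Data retyped from the question's dput output.
df <- data.frame(
  Var1 = factor(rep(c("0", "1"), 4)),
  Var2 = factor(rep(rep(c("congruent", "incongruent"), each = 2), 2)),
  Var3 = factor(rep(c("spoken", "written"), each = 4)),
  Freq = c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)
)

# Sum of Freq within each Var2 x Var3 combination, repeated on every row.
pair_sum <- ave(df$Freq, df$Var2, df$Var3, FUN = sum)

# Keep the sum only on the Var1 == "0" row of each pair, NA on the other.
df$column <- ifelse(df$Var1 == "0", pair_sum, NA)
df
```

Unlike the `matrix(df$Freq, 2)` approach, this does not assume that the two rows of each pair are adjacent.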
I have a vector of numbers:
a <- c(54, 456, 23432, 4868, 34, 245634, 37, 46453, 1342354)
In my existing data frame (head included via dput below), I would like to create a new variable. Each row of the new variable will contain a single element from the vector, so there would be one value (e.g. 54) in each row of the new variable.
structure(list(Phone = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "a", class = "factor"), Frame = structure(c(1L,
3L, 2L, 4L, 6L, 5L), .Label = c("[-4.46225397 -4.14727267 -4.45203785 -4.67251549 -5.13750066 -4.92839463\n -5.03957588 -5.68530479]",
"[-6.14532579 -4.38918589 -4.12275354 -4.19263549 -4.30380823 -4.35621995\n -4.4079389 -4.47339504]",
"[-6.43104195 -4.75506178 -4.2324676 -4.21878988 -4.1635973 -4.11186806\n -4.05023489 -4.08204198]",
"[-7.1528423 -5.46190925 -5.94873845 -6.635839 -6.84179002 -6.85955335\n -6.83714326 -6.87621415]",
"[-7.23901353 -4.61522546 -3.25206619 -3.38407075 -3.63762837 -3.85352927\n -3.94250123 -4.04015791]",
"[-7.34451319 -5.58664694 -4.69929752 -4.621823 -4.51670576 -4.48494125\n -4.39512713 -4.26553646]"
), class = "factor"), Previous = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "ch", class = "factor"), Following = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "p", class = "factor"), Word = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "juk'ucha-pi", class = "factor"),
Note = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"),
"[-10.79197258 -7.97949955 -7.10253093 -7.07957825 -6.98695923\n -6.90015207 -6.79672506 -6.85010073",
"[-10.31251047 -7.36552088 -6.91841906 -7.0356884 -7.2222481\n -7.31020053 -7.39699043 -7.5068328 ",
"[-12.00323036 -9.16566481 -9.982616 -11.13564383 -11.48125155\n -11.51106031 -11.47345379 -11.5390189 ",
"[-12.32487451 -9.37498793 -7.8859212 -7.7559107 -7.5795128\n -7.52620857 -7.37549093 -7.15802398",
"[-12.14783486 -7.74483933 -5.45731306 -5.67883075 -6.10432742\n -6.46663209 -6.61593651 -6.77981481"
), Morph_status = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"),
row.names = c(NA, 6L), class = "data.frame")
In a data frame, every variable (column) has as many entries as there are rows. What you are describing is therefore not a data frame and, if I understand your question correctly, the best you can do is go back to a general list:
df <- data.frame(a = 1:3, b = 1:3)
c(as.list(df), c = list(a))
# $a
# [1] 1 2 3
#
# $b
# [1] 1 2 3
#
# $c
# [1] 54 456 23432 4868 34 245634 37 46453 1342354
Another option, if you still want a data frame, is to pad all the shorter columns with NAs:
library(rowr)
cbind.fill(df, a, fill = NA)
# a b object
# 1 1 1 54
# 2 2 2 456
# 3 3 3 23432
# 4 NA NA 4868
# 5 NA NA 34
# 6 NA NA 245634
# 7 NA NA 37
# 8 NA NA 46453
# 9 NA NA 1342354
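If the rowr package is not available in your setup, the same padding can be sketched in base R. The names `n` and `padded` are made up for this example, and the column name `object` is chosen only to mirror the output above. If the vector already has exactly as many elements as the data frame has rows, none of this is needed and a plain `df$new <- a` is enough.

```r
df <- data.frame(a = 1:3, b = 1:3)
a <- c(54, 456, 23432, 4868, 34, 245634, 37, 46453, 1342354)

# Target number of rows: the longer of the data frame and the vector.
n <- max(nrow(df), length(a))

# Indexing past the last row yields rows filled with NA.
padded <- df[seq_len(n), , drop = FALSE]
rownames(padded) <- NULL

# Indexing past the end of a vector also yields NA, so this is safe
# whichever of the two is shorter.
padded$object <- a[seq_len(n)]
padded
```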
I have the following data set:
Class Total AC Final_Coverage
A 1000 1 55
A 1000 2 66
B 1000 1 77
A 1000 3 88
B 1000 2 99
C 1000 1 11
B 1000 3 12
B 1000 4 13
B 1000 5 22
C 1000 2 33
C 1000 3 44
C 1000 4 55
C 1000 5 102
A 1000 4 105
A 1000 5 109
I would like to get the average of the AC and the Final_Coverage for the first three rows of each class. Then, I want to store the average values along with the class name in a new dataframe. To do that, I did the following:
dataset <- read_csv("/home/ad/Desktop/testt.csv")
classes <- unique(dataset$Class)
new_data <- data.frame(Class = character(0), AC = numeric(0), Coverage = numeric(0))
for(class in classes){
new_data$Class <- class
dataClass <- subset(dataset, Class == class)
tenRows <- dataClass[1:3,]
coverageMean <- mean(tenRows$Final_Coverage)
acMean <- mean(tenRows$AC)
new_data$Coverage <- coverageMean
new_data$AC <- acMean
}
Everything works fine except entering the average value into the new_data frame. I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "Class", value = "A") :
replacement has 1 row, data has 0
Do you know how to solve this?
This should get you the new dataframe by using dplyr.
dataset %>% group_by(Class) %>% slice(1:3) %>% summarise(AC= mean(AC),
Coverage= mean(Final_Coverage))
In your method, the error arises because you initialised the new data frame with 0 rows and then tried to assign a single value to one of its columns. That is what the message says: the replacement has 1 row, but the data has 0. This would work, though:
new_data <- data.frame(Class = classes, AC = NA, Coverage = NA)
for(class in classes){
  # Class is already filled in above; assigning it here again would
  # overwrite the whole column with the current class
  dataClass <- subset(dataset, Class == class)
  tenRows <- dataClass[1:3,]  # first three rows of this class
  coverageMean <- mean(tenRows$Final_Coverage)
  acMean <- mean(tenRows$AC)
  # write the means into the row that belongs to this class
  new_data$Coverage[classes == class] <- coverageMean
  new_data$AC[classes == class] <- acMean
}
You could look into aggregate().
> aggregate(df1[df1$AC <= 3, 3:4], by=list(Class=df1[df1$AC <= 3, 1]), FUN=mean)
Class AC Final_Coverage
1 A 2 69.66667
2 B 2 62.66667
3 C 2 29.33333
DATA
df1 <- structure(list(Class = structure(c(1L, 1L, 2L, 1L, 2L, 3L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"),
Total = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L),
AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L,
4L, 5L), Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L,
12L, 13L, 22L, 33L, 44L, 55L, 102L, 105L, 109L)), class = "data.frame", row.names = c(NA,
-15L))
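Note that the `AC <= 3` filter above selects rows by the value of AC, not by position. With this data the two coincide, because the first three rows of each class happen to have AC equal to 1, 2 and 3. A positional variant can be sketched with `split()` and `head()`. The names `first3` and `new_data` are illustrative, and the data are retyped from the question with the constant Total column left out.

```r
dataset <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "B", "B",
            "C", "C", "C", "C", "A", "A"),
  AC = c(1, 2, 1, 3, 2, 1, 3, 4, 5, 2, 3, 4, 5, 4, 5),
  Final_Coverage = c(55, 66, 77, 88, 99, 11, 12, 13, 22,
                     33, 44, 55, 102, 105, 109)
)

# Take the first three rows of each class, in their original order.
first3 <- do.call(rbind, lapply(split(dataset, dataset$Class), head, n = 3))

# Average both columns per class.
new_data <- aggregate(cbind(AC, Final_Coverage) ~ Class,
                      data = first3, FUN = mean)
new_data
```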
I am studying this webpage, and cannot figure out how to rename freq to something else, say number of times imbibed
Here is dput
structure(list(name = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("Bill", "Llib"), class = "factor"), drink = structure(c(2L,
3L, 1L, 4L, 2L, 3L, 1L, 4L), .Label = c("cocoa", "coffee", "tea",
"water"), class = "factor"), cost = 1:8), .Names = c("name",
"drink", "cost"), row.names = c(NA, -8L), class = "data.frame")
And this is working code with output. Again, I'd like to rename the freq column. Thanks!
library(plyr)
bevs$cost <- as.integer(bevs$cost)
count(bevs, "name")
Output
name freq
1 Bill 4
2 Llib 4
Are you trying to do this?
counts <- count(bevs, "name")
names(counts) <- c("name", "number of times imbibed")
counts
The count() function returns a data.frame. Just rename it like any other data.frame:
counts <- count(bevs, "name")
names(counts)[which(names(counts) == "freq")] <- "number of times imbibed"
print(counts)
# name number of times imbibed
# 1 Bill 4
# 2 Llib 4
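One thing to keep in mind with a name that contains spaces is that it is not a syntactic R name, so it has to be quoted with backticks or accessed with `[[ ]]` afterwards. The sketch below shows this using base R's `table()` in place of `plyr::count()`, which is a swapped-in technique rather than what the answers above use. The intermediate name `times_imbibed` is made up for the example.

```r
# Data retyped from the question's dput output.
bevs <- data.frame(
  name = rep(c("Bill", "Llib"), 4),
  drink = c("coffee", "tea", "cocoa", "water",
            "coffee", "tea", "cocoa", "water"),
  cost = 1:8
)

# Count rows per name; responseName sets the name of the count column.
counts <- as.data.frame(table(name = bevs$name),
                        responseName = "times_imbibed")

# Rename to a label with spaces, exactly as with any other data frame.
names(counts)[names(counts) == "times_imbibed"] <- "number of times imbibed"

# Non-syntactic names need backticks (or [[ ]]) when accessed.
counts$`number of times imbibed`
counts[["number of times imbibed"]]
```

A syntactic name such as `times_imbibed` is usually easier to work with in code, and the spaced label can be applied only when printing or plotting.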
I have the following data and code:
dd
grp categ condition value
1 A X P 2
2 B X P 5
3 A Y P 9
4 B Y P 6
5 A X Q 4
6 B X Q 5
7 A Y Q 8
8 B Y Q 2
dput(dd)
structure(list(grp = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("A", "B"), class = "factor"), categ = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("X", "Y"), class = "factor"),
condition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("P",
"Q"), class = "factor"), value = c(2, 5, 9, 6, 4, 5, 8, 2
)), .Names = c("grp", "categ", "condition", "value"), out.attrs = structure(list(
dim = structure(c(2L, 2L, 2L), .Names = c("grp", "categ",
"condition")), dimnames = structure(list(grp = c("grp=A",
"grp=B"), categ = c("categ=X", "categ=Y"), condition = c("condition=P",
"condition=Q")), .Names = c("grp", "categ", "condition"))), .Names = c("dim",
"dimnames")), row.names = c(NA, -8L), class = "data.frame")
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)
How can I convert this bar chart to a pie chart? I want 4 pies here, with their sizes corresponding to the heights of the respective bars. I tried the following, but neither worked:
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar()
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar('y')
I also tried to make a pie chart similar to the one in "Pie charts in ggplot2 with variable pie sizes", but I could not get it to work with my data. Thanks for your help.
Using the same idea as in the link you posted, you could add a column size to your data frame that holds the sum of the values for each group, and use that as the width argument:
library(dplyr)
dd<-dd %>% group_by(categ,grp) %>% mutate(size=sum(value))
ggplot(dd, aes(x=size/2,y=value,fill=condition,width=size))+geom_bar(position="fill",stat='identity')+facet_grid(grp~categ)+coord_polar("y")
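To see what the `size` column holds before plotting, the same per-pie totals can be computed in base R with `ave()`. This is a sketch on data retyped from the question. The `share` column is not part of the answer above and is added only to show the fraction each condition takes within its pie.

```r
# Data retyped from the question.
dd <- data.frame(
  grp = rep(c("A", "B"), 4),
  categ = rep(rep(c("X", "Y"), each = 2), 2),
  condition = rep(c("P", "Q"), each = 4),
  value = c(2, 5, 9, 6, 4, 5, 8, 2)
)

# Total value per (categ, grp) combination, i.e. per pie.
dd$size <- ave(dd$value, dd$categ, dd$grp, FUN = sum)

# Fraction of each pie taken by a condition (illustrative extra column).
dd$share <- dd$value / dd$size
dd
```

The four pies therefore have sizes 6 (A, X), 10 (B, X), 17 (A, Y) and 8 (B, Y), which match the bar heights in the original chart.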
Both grp and categ should be facetting variables for the grid, not aesthetics inside each plot. Here are two different layouts. The x aesthetic can be any single constant, such as a string or factor(1).
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(~grp+categ)+coord_polar("x")
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(grp~categ)+coord_polar("x")
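Something strange happened with the opening at the top of each pie here; maybe it's just my interface. It may also come from using coord_polar("x"), which maps the bar's width to the angle, whereas the more usual pie setup is coord_polar("y"). This should be enough to get you going, though!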