Mean of the sample obtained - r

I have this information:
Student: 1 2 3 4 5 6 7 8 9 10
Mark: 85 62 90 85 64 72 70 59 66 70
So I did this:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(85, 62, 90, 85, 64, 72, 70, 59, 66, 70)
I want to take 10 samples of 4 students each and then obtain the mean mark of each sample. I took the samples by running the following command 10 times (sample1, sample2, ..., sample10):
sample1 <- sample(x, 4, replace=FALSE, prob=NULL)
Is there an easier way to obtain the samples?
How could I obtain the means of those samples programmatically?
I have tried the following:
meansample1 <- mean(sample1)

You can calculate the mean mark of four randomly chosen students by sampling from y (the marks) rather than x (the student IDs):
mean(sample(y, 4, replace = FALSE))
Then, we can replicate this task 10 times with replicate():
replicate(n = 10, mean(sample(y, 4, replace = FALSE)))
# [1] 78.75 72.25 78.00 70.25 74.25 79.25 72.25 64.25 76.50 69.00
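If you also want to know which students ended up in each sample, one possible sketch is to sample the student IDs and use them as positions into y. The names samples, idx and sample_means, and the seed value, are illustrative assumptions and not part of the answer above; this works here only because the IDs in x coincide with the positions 1 to 10.

```r
x <- 1:10
y <- c(85, 62, 90, 85, 64, 72, 70, 59, 66, 70)

set.seed(42)  # arbitrary seed, only to make the draws reproducible

# Each column of `samples` holds the student IDs of one sample of size 4
samples <- replicate(10, sample(x, 4, replace = FALSE))

# Use the sampled IDs as positions into y and average the marks per column
sample_means <- apply(samples, 2, function(idx) mean(y[idx]))

dim(samples)          # 4 rows (students per sample), 10 columns (samples)
length(sample_means)  # one mean per sample
```

Keeping the IDs in a matrix makes it easy to check afterwards which students contributed to a given mean, e.g. samples[, 3] for the third sample.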

How to define the mapping parameter iteratively to contract vertices chains?

I have a simple graph g. I need to smooth the graph by deleting the vertices whose degree is 2 while preserving the layout of the original graph. The same task was solved in Mathematica.
library(igraph)
set.seed(1)
# preprocessing
g <- sample_gnp(40, 1/20)
V(g)$name <- seq(1:vcount(g))
components <- clusters(g, mode="weak")
biggest_cluster_id <- which.max(components$csize)
vert_ids <- V(g)[components$membership == biggest_cluster_id]
vert_ids
# input random graph
g <- induced_subgraph(g, vert_ids)
LO = layout.fruchterman.reingold(g)
plot(g, vertex.color = ifelse(degree(g)==2, "red", "green"), main ="g", layout = LO)
I have selected the chains of vertices with degree 2.
subg <- induced_subgraph(g, degree(g)==2)
subg_ids <- V(subg); subg_ids
I have read the Q&A and manually defined the mapping parameter of the contract() function.
# join nodes 3 -> 14, 15 -> 40, 13 -> 31, 29 -> 6
mapping = c(2, 3, 4, 5, 6, 7, 8, 10, 13, 3, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 6, 30, 13, 32, 33, 34, 35, 36, 38, 39, 15)
g2 <- simplify(contract(g, mapping=mapping, vertex.attr.comb=toString))
# L2 <- LO[-as.numeric(c(14, 40, 31, 6)),] # not working
plot(g2, vertex.color = ifelse(degree(g2)==2, "red", "green"), main ="g2")
Question. What is a possible way to define the mapping parameter iteratively?
Here is an option without mapping in contract (so you don't need to configure mapping manually)
g2 <- graph_from_data_frame(
  rbind(
    get.data.frame(delete.vertices(g, names(subg_ids))),
    do.call(
      rbind,
      lapply(
        decompose(subg),
        function(x) {
          nbs <- names(unlist(neighborhood(g, nodes = names(V(x))[degree(x) < 2])))
          setNames(data.frame(t(subset(nbs, !nbs %in% names(subg_ids)))), c("from", "to"))
        }
      )
    )
  ),
  directed = FALSE
)
You can then plot the result with the original layout by running
plot(g2, main = "g2", layout = LO[match(names(V(g2)), names(V(g))), ])
This is only a partial answer, since it does not give a way to compute the contraction automatically. However, I can give some insights on the manual mapping:
Your vertices have names, so those are used for reference instead of the internal vertex number from 1 to n.
In the mapping we need to give the new IDs of the vertices after the contraction.
The original IDs are
> V(g)
+ 33/33 vertices, named, from 0af52c3:
[1] 2 3 4 5 6 7 8 10 13 14 15 16 17 18 19 20 21 22 23 25 26 27 29 30 31 32 33 34 35 36 38 39 40
The new IDs can be given as (multiple possibilities exist):
mapping <- c(6, 14, 6, 5, 6, 7, 7, 10, 31, 14, 15, 16, 17, 14, 6, 7, 31, 22, 6, 25, 26, 27, 14, 30, 31, 6, 6, 34, 35, 36, 38, 39, 15)
For better overview:
old ID: 2 3 4 5 6 7 8 10 13 14 15 16 17 18 19 20 21 22 23 25 26 27 29 30 31 32 33 34 35 36 38 39 40
new ID: 6 14 6 5 6 7 7 10 31 14 15 16 17 14 6 7 31 22 6 25 26 27 14 30 31 6 6 34 35 36 38 39 15
This results in:
g2 <- simplify(contract(g, mapping=mapping, vertex.attr.comb=toString))
plot(g2, vertex.color = ifelse(degree(g2)==2, "red", "green"), main ="g2")
To get rid of the now existing degree-0-nodes you can do:
g3 <- delete.vertices(g2, which(degree(g2) == 0))
Alternatively, and maybe even cleaner, you could delete the nameless nodes:
g3 <- delete.vertices(g2, which(names(V(g2)) == ""))
To keep the original layout you can do:
L3 <- LO[-which(mapping != as.numeric(names(V(g)))),]
plot(g3, layout = L3)
But it does not look very good in this case...

Cleaning Data & Association Rules - R

I am trying to tidy the following dataset (linked below) in R and then run association rules on it.
https://www.kaggle.com/fanatiks/shopping-cart
install.packages("dplyr")
library(dplyr)
df <- read.csv("Groceries (2).csv", header = F, stringsAsFactors = F, na.strings=c(""," ","NA"))
install.packages("stringr")
library(stringr)
temp1<- (str_extract(df$V1, "[a-z]+"))
temp2<- (str_extract(df$V1, "[^a-z]+"))
df<- cbind(temp1,df)
df[2] <- NULL
df[35] <- NULL
View(df)
summary(df)
str(df)
trans <- as(df,"transactions")
I get the following warning when I run the above trans <- as(df,"transactions") code:
Warning message:
Column(s) 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 not logical or factor. Applying default discretization (see '? discretizeDF').
summary(trans)
When I run the above code, I get the following:
transactions as itemMatrix in sparse format with
1499 rows (elements/itemsets/transactions) and
1268 columns (items) and a density of 0.01529042
most frequent items:
V5= vegetables V6= vegetables temp1=vegetables V2= vegetables
140 113 109 108
V9= vegetables (Other)
103 28490
The results show the vegetables values as separate items (one per column) instead of a single combined vegetables item, which inflates the number of columns. I am not sure why this is happening.
fit<-apriori(trans,parameter=list(support=0.006,confidence=0.25,minlen=2))
fit<-sort(fit,by="support")
inspect(head(fit))
To coerce a data frame to the transactions class, its columns need to be factors (or logicals). Your data frame contains character columns, hence the warning message. The data requires some further cleaning before it can be coerced properly.
I'm not very familiar with the arules package but I believe the read.transactions function may be more useful as it would automatically discard duplicates. I found it easiest to make a binary matrix and use a for loop, but I am sure there is a neater solution.
Continuing on directly from your code:
items <- as.character(unique(unlist(df))) # get all unique items
items <- items[which(str_detect(items, "[a-z]"))] # remove numbers
trans <- matrix(0, nrow = nrow(df), ncol = length(items))
for (i in seq_len(nrow(df))) {
  trans[i, which(items %in% t(df[i, ]))] <- 1
}
colnames(trans) <- items
rownames(trans) <- temp2
trans <- as(trans, "transactions")
summary(trans)
Giving
transactions as itemMatrix in sparse format with
1637 rows (elements/itemsets/transactions) and
38 columns (items) and a density of 0.3359965
most frequent items:
vegetables poultry waffles ice cream lunch meat (Other)
1058 582 562 556 555 17588
element (itemset/transaction) length distribution:
sizes
0 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
102 36 8 57 51 51 71 69 63 80 79 58 84 91 72 105 97 87 114 91 82 46 30 7 4 2
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 8.00 14.00 12.77 18.00 26.00
includes extended item information - examples:
labels
1 pork
2 shampoo
3 juice
includes extended transaction information - examples:
transactionID
1 1/1/2000
2 1/1/2000
3 2/1/2000
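To see why the item count drops from 1268 to 38, here is a minimal base R sketch of the same incidence-matrix idea on a toy basket table. The data frame baskets and its contents are made up for illustration and are not the Kaggle data; a logical matrix is used instead of the 0/1 matrix in the answer above.

```r
# Toy stand-in for the cleaned data frame: one row per basket, NA for empty slots
baskets <- data.frame(
  V1 = c("milk", "bread", "milk"),
  V2 = c("bread", "milk", NA),
  V3 = c("milk", NA, NA),
  stringsAsFactors = FALSE
)

# All distinct item names, regardless of which column they appear in
items <- sort(unique(na.omit(unlist(baskets))))

# One row per basket, one column per item; TRUE if the item occurs in the basket
incidence <- t(apply(baskets, 1, function(row) items %in% row))
colnames(incidence) <- items

incidence
```

Because each item gets exactly one column, the column an item came from no longer matters, and an item repeated within a basket is only counted once.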

Interpolating but fixing a range in R

I get some extreme values in the beginning and in the end when interpolating.
In fact, the last b values should not be greater than max(b), and the first values should not be less than 0.
data example:
a<-c(1, 3, 4, 6, 8.7, 9, 10, 12, 19.3, 20)
b<-c(10, 30, 40, 60, 87, 90, 100, 120, 190, 200)
df<-data.frame(a=a, b=b)
> df
      a   b
1   1.0  10
2   3.0  30
3   4.0  40
4   6.0  60
5   8.7  87
6   9.0  90
7  10.0 100
8  12.0 120
9  19.3 190
10 20.0 200
This is the code I'm using right now:
Hmisc::approxExtrap(df$a, df$b, xout = c(0:25))
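Wrap the interpolated y values in pmin and pmax:
pmin(max(df$b), pmax(min(df$b), Hmisc::approxExtrap(df$a, df$b, xout = 0:25)$y))
This keeps the results within the lower and upper bounds of b. approxExtrap returns a list with x and y components, so the clamping is applied to $y. If you want 0 as the lower bound instead of min(df$b) (currently 10), replace min(df$b) with 0.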
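If holding the end values constant outside the data range is acceptable, a possible alternative sketch is base R's approx() with rule = 2, which avoids extrapolating in the first place. This is a different technique from extrapolating and then clamping; for this monotonically increasing data the two should agree, but that is not guaranteed in general.

```r
a <- c(1, 3, 4, 6, 8.7, 9, 10, 12, 19.3, 20)
b <- c(10, 30, 40, 60, 87, 90, 100, 120, 190, 200)

# rule = 2: outside range(a), return the value at the nearest data extreme
res <- approx(a, b, xout = 0:25, rule = 2)

range(res$y)  # stays within range(b)
```

Inside the data range the values are ordinary linear interpolations; only the points at x = 0 and x = 21 to 25 are affected by rule = 2.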

Create sequence of additions

I want to extract specific rows of my dataframe, following a sequence of rownumbers.
The sequence should be:
7, 14, 21, 31, 38, 45, 55, 62, 69.....until 8760.
So it always starts at row 7 and then goes +7, +7, +10, and this pattern should repeat until the end.
I know rep and seq, but I don't know how to deal with that +10 after the +7.
Any ideas?
Try
x <- rep(c(7, 10), c(2, 1))
out <- cumsum(c(7, rep(x, ceiling(8760 / sum(x)))))
Result
head(out, 10)
# [1] 7 14 21 31 38 45 55 62 69 79
tail(out)
# [1] 8726 8733 8743 8750 8757 8767
If you want out to end at 8760 you might do
c(out[out < 8760], 8760)
We can use rep
x1 <- rep(c(7, 10), c(2, 1))
out <- cumsum(c(7, rep(x1, 8760 %/% sum(x1))))
out1 <- out[out < 8760]
head(out1, 10)
#[1] 7 14 21 31 38 45 55 62 69 79
tail(out1, 10)
#[1] 8685 8695 8702 8709 8719 8726 8733 8743 8750 8757
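The same idea can be wrapped in a small helper for other step patterns. This is only a sketch; step_seq and its arguments are hypothetical names, not an existing R function.

```r
# Hypothetical helper: start at `start`, then add `steps` cyclically up to `limit`
step_seq <- function(start, steps, limit) {
  # Enough full cycles of `steps` to reach or pass `limit`
  n_cycles <- ceiling((limit - start) / sum(steps))
  out <- cumsum(c(start, rep(steps, n_cycles)))
  out[out <= limit]
}

rows <- step_seq(7, c(7, 7, 10), 8760)
head(rows, 9)
# [1]  7 14 21 31 38 45 55 62 69
```

The resulting vector can then be used directly for row extraction, e.g. df[rows, ] for a data frame df with 8760 rows.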

Algorithm to Ascribe Value RStudio

I am trying to create an algorithm that essentially is a function of this data frame.
This is the code I was using, but it doesn't seem to be working.
I need image_id to be the independent variable so that when I input 7 into the function, I get back 10 and 15. If I were to input 8, I would get back 11 and 13.
num = function(image_id, category_id, data = categories) {x->y}
This is the data frame that I am using.
category_id image_id cat_to_img_last_update
1 15 15 NULL
2 11 11 NULL
3 13 13 NULL
4 10 10 NULL
5 35 35 NULL
6 78 78 NULL
7 112 112 NULL
8 61 61 NULL
9 86 86 NULL
10 101 101 NULL
11 61 61 NULL
12 86 86 NULL
You probably don't need a function for this, but if you really want, here is what it would look like:
# Read in data
categories <-
  data.frame(category_id = c(15, 11, 13, 10, 35, 78, 112, 61, 86, 101, 61, 86),
             image_id = c(7, 8, 8, 7, 9, 9, 10, 10, 11, 11, 12, 12),
             stringsAsFactors = FALSE)
num <- function(image_id, data = categories) {
  data$category_id[data$image_id == image_id]
}
num(7) # 15 10
num(8) # 11 13
df <- data.frame(
  category_id = c(15, 11, 13, 10, 35, 78, 112, 61, 86, 101, 61, 86),
  image_id = c(7, 8, 8, 7, 9, 9, 10, 10, 11, 11, 12, 12)
)
# Return the category_ids for one image_id, sorted in ascending order
myfun <- function(num) { sort(df[df$image_id == num, "category_id"]) }
myfun(7)
myfun(8)
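If you need the categories for many image IDs, another possible sketch is to build the whole lookup once with split(). The name lookup is an assumption for illustration, and the data frame repeats the values used in the answers above.

```r
categories <- data.frame(
  category_id = c(15, 11, 13, 10, 35, 78, 112, 61, 86, 101, 61, 86),
  image_id    = c(7, 8, 8, 7, 9, 9, 10, 10, 11, 11, 12, 12)
)

# Named list: one element per image_id, holding its category_ids in row order
lookup <- split(categories$category_id, categories$image_id)

lookup[["7"]]  # 15 10
lookup[["8"]]  # 11 13
```

Note that the list names are character strings, so a numeric ID has to be converted first, e.g. lookup[[as.character(7)]].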
