Output of binaryRatingMatrix in R

I have created a matrix in Excel of customers and the items they have purchased. The column names are "Item 1", "Item 2", ..., "Item n" and the row names are "Customer 1", "Customer 2", ..., "Customer n".
The code is as follows:
library(recommenderlab)
setwd("C:\\Users\\amitai\\Desktop\\se")
USERBASE <- read.csv("USERBASE.csv")
USERBASE2 <- as(USERBASE,"binaryRatingMatrix")
rec <- Recommender(USERBASE2, method = "UBCF")
recommended.items.customer1 <- predict(rec, USERBASE2["customer 1",], n=5)
as(recommended.items.customer1, "list")
I expected to get a list of 5 items, which are most recommended to customer 1.
Instead, I got this output:
$`customer 1`
[1] "1"
After running the same code for customer 103, I got a similar output:
$`customer 103`
[1] "1"
My questions are as follows:
Is there a problem with the code I wrote?
Is there another action I should take after reading the CSV file?
What is the correct output supposed to look like?
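A minimal sketch of one likely fix, under stated assumptions: the CSV isn't shown, so the inline stand-in data below (three customers, three items) is made up, and the diagnosis is my reading of recommenderlab's coercion rules rather than something confirmed in the question. If a data.frame passed to as(..., "binaryRatingMatrix") is interpreted as user/item pairs, the customer-name column becomes the user and the 0/1 values of "Item 1" become the item names, which would explain why the only recommendation is "1". Reading the customer column as row names and coercing a logical matrix keeps one column per item. The recommenderlab part only runs if the package is installed.

```r
# Hypothetical stand-in for USERBASE.csv: first column holds customer names,
# the remaining columns are 0/1 purchase flags
csv_text <- "Customer,Item 1,Item 2,Item 3
Customer 1,1,0,1
Customer 2,0,1,1
Customer 3,1,1,0"

# row.names = 1 makes the name column the row names instead of a data column;
# check.names = FALSE keeps "Item 1" rather than converting it to "Item.1"
userbase <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)

# logical customer x item matrix (TRUE = purchased)
m <- as.matrix(userbase) == 1

if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)
  brm <- as(m, "binaryRatingMatrix")
  # nn is lowered only because this toy example has three customers
  rec <- Recommender(brm, method = "UBCF", parameter = list(nn = 2))
  pred <- predict(rec, brm["Customer 1", ], n = 5)
  print(as(pred, "list"))
}
```

With real data, the printed list should contain entries taken from the column names (the "Item ..." labels) rather than "1"; which items appear depends on the data.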

Related

How to filter a List of Character Vectors in R?

I'm starting to go round in circles. I feel I have searched online thoroughly but suspect I can't see the wood for the trees now after a few days of coming back to this problem.
I am looking to scrape multiple sets of data from thousands of excel files on a company SharePoint. I have been able to scrape successfully using readxl.
library(readxl)
library(data.table)
library(XLConnect)
root_URL <- '//companyname.office.abc.com/sites/thesite/thefolder'
folder.list <- list.dirs(root_URL)
# match the Excel extensions in any letter case
file.list <- list.files(folder.list, pattern = "\\.(xlsx|xls|xlsm|xlsb)$", ignore.case = TRUE, full.names = TRUE)
This results in a nice list of all the files I may need to scrape. I have successfully pulled the data I need from the specific tab ("Address") of the 3rd, 4th and 5th files in my list using the following code.
ex.list <- file.list[3:5]
ex.list <- setNames(ex.list, ex.list)
df.list <- lapply(ex.list, read_excel, sheet = 'Address' )
df.list <- Map(function(df, name) {
df$source_name <- name
df
}, df.list, names(df.list))
df <- rbindlist(df.list, idcol = "id")
write.csv(df,"testdata1.csv")
The problem I have run into is that the 1st, 2nd (and other) files do not have a tab called "Address", and I need to exclude these files from my file.list. Because the sheet names come back as a list of character vectors, I'm struggling to filter out the files that don't contain a tab called "Address".
I have used lapply with the following result, and have even tried sapply (also shared below), but am now struggling to write the conditional statement. I feel very close but so very far away.
> aa <- lapply(ex.list, excel_sheets)
> aa
[[1]]
[1] "NODE SIDE A" "NODE SIDE B" "LMA" "BASE" "TUBE" "Notes"
[[2]]
[1] "NODE SIDE A" "LMA" "BASE" "TUBE" "Notes"
[[3]]
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
[[4]]
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
[[5]]
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
> bb <- sapply(ex.list, excel_sheets)
> bb
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file1.xls`
[1] "NODE SIDE A" "NODE SIDE B" "LMA" "BASE" "TUBE" "Notes"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file2.xls`
[1] "NODE SIDE A" "LMA" "BASE" "TUBE" "Notes"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file3.xls`
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file4.xls`
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file5.xls`
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
I think this should work:
library(readxl)
df.list <- lapply(ex.list, function(x)
if ("Address" %in% excel_sheets(x)) read_excel(x,sheet = 'Address')
else NULL)
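One caveat, as I understand the question's follow-up code: this lapply() returns NULL for files without an "Address" sheet, and the question's Map() step assigns source_name to each element; assigning with $ to NULL silently creates a list, which can then break or pollute the rbindlist() step. Dropping the NULLs first avoids that. A minimal sketch with made-up stand-in data frames (the file names and the street column are hypothetical); base rbind is used so it runs without data.table, but rbindlist(df.list, idcol = "id") would slot in the same way.

```r
# Stand-in for the lapply(...) result: NULL where a file had no "Address" sheet
# (file names and the 'street' column are hypothetical)
df.list <- list(
  file1.xls = NULL,
  file3.xls = data.frame(street = "A St", stringsAsFactors = FALSE),
  file4.xls = data.frame(street = "B St", stringsAsFactors = FALSE)
)

# keep only the files that actually returned a data frame
df.list <- Filter(Negate(is.null), df.list)

# tag each data frame with the file it came from, as in the question
df.list <- Map(function(df, name) {
  df$source_name <- name
  df
}, df.list, names(df.list))

# base R stand-in for rbindlist(df.list) in this sketch
df <- do.call(rbind, df.list)
df
```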
Alternatively, if you read the sheet names of all files first, you could filter that list using:
aa <- list(c("A", "B", "C"),
c("A", "B", "Address"),
c("A", "B", "Address"),
c("A", "B", "C"))
aa[grep(pattern = "Address", aa)]
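grep() on a list works because each element is first deparsed into a string such as c("A", "B", "Address"), but it matches substrings too, so a sheet called "Address Book" would also count. A sketch of an exact-match filter, using the sheet names from the question's output with shortened, hypothetical file names; the commented last line shows how the same test might be applied to file.list with readxl::excel_sheets() (not run here).

```r
# Stand-in for lapply(ex.list, excel_sheets), named by (hypothetical) file
sheets <- list(
  "file1.xls" = c("NODE SIDE A", "NODE SIDE B", "LMA", "BASE", "TUBE", "Notes"),
  "file2.xls" = c("NODE SIDE A", "LMA", "BASE", "TUBE", "Notes"),
  "file3.xls" = c("Equipment-Details", "Address", "Drop Down Values",
                  "Validation Status", "EquipMaster")
)

# exact match on the sheet name: one TRUE/FALSE per file
has_address <- vapply(sheets, function(s) "Address" %in% s, logical(1))
keep <- names(sheets)[has_address]
keep

# With the real files, the same test could filter the paths directly:
# keep <- Filter(function(f) "Address" %in% readxl::excel_sheets(f), file.list)
```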

R: pasting (or combining) a variable amount of rows together as one

I have a text file I am trying to parse into a data frame. Each 'event' may or may not have notes, and the notes can span a varying number of rows. I need to concatenate the notes for each event into one string to store in a column of the data frame.
ID: 20470
Version: 1
notes:
ID: 01040
Version: 2
notes:
The customer was late.
Project took 20 min. longer than anticipated
Work was successfully completed
ID: 00000
Version: 1
notes:
Customer was not at home.
ID: 00000
Version: 7
notes:
Fax at 2:30 pm
Called but no answer
Visit home no answer
Left note on door with call back number
Made a final attempt on 12/5/2013
closed case on 12/10 with nothing resolved
So, for example, the notes for the second event (ID 01040) should become one long string: "The customer was late. Project took 20 min. longer than anticipated Work was successfully completed", which would then be stored in the notes column of the data frame.
For each event I know how many rows the notes span.
Something like this (actually, you would be happier and learn more figuring it out yourself, I was just procrastinating between two tasks):
x <- readLines("R/xample.txt") # you'll probably read it from a file
ids <- grep("^ID:", x) # detecting lines starting with ID:
versions <- grep("^Version:", x)
notes <- grep("^notes:", x)
nStart <- notes + 1 # lines where the notes start
nEnd <- c(ids[-1]-1, length(x)) # notes end one line before the next ID: line
ids <- sapply(strsplit(x[ids], ": "), "[[", 2)
versions <- sapply(strsplit(x[versions], ": "), "[[", 2)
notes <- mapply(function(i,j) trimws(paste(x[i:j], collapse=" ")), nStart, nEnd) # trimws() drops the blanks left by empty lines
df <- data.frame(ID=ids, ver=versions, note=notes, stringsAsFactors=FALSE)
dput of the data:
> dput(x)
c("ID: 20470", "Version: 1", "notes: ", " ", " ", "ID: 01040",
"Version: 2", "notes: ", " The customer was late.", "Project took 20 min. longer than anticipated",
"Work was successfully completed", "", "ID: 00000", "Version: 1",
"notes: ", " Customer was not at home.", "", "ID: 00000", "Version: 7",
"notes: ", " Fax at 2:30 pm", "Called but no answer", "Visit home no answer",
"Left note on door with call back number", "Made a final attempt on 12/5/2013",
"closed case on 12/10 with nothing resolved ")
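A different way to group the lines, sketched as an alternative rather than as the answer's method: since every event starts with an "ID:" line, cumsum(grepl("^ID:", x)) labels each line with its event number, and split() hands each event's lines to a small parser. parse_event is a hypothetical helper name; blank note lines are dropped and the result is trimmed, so the first event gets an empty note. The input is the dput data above.

```r
x <- c("ID: 20470", "Version: 1", "notes: ", " ", " ", "ID: 01040",
       "Version: 2", "notes: ", " The customer was late.", "Project took 20 min. longer than anticipated",
       "Work was successfully completed", "", "ID: 00000", "Version: 1",
       "notes: ", " Customer was not at home.", "", "ID: 00000", "Version: 7",
       "notes: ", " Fax at 2:30 pm", "Called but no answer", "Visit home no answer",
       "Left note on door with call back number", "Made a final attempt on 12/5/2013",
       "closed case on 12/10 with nothing resolved ")

# running count of "ID:" lines = event number of every line
event <- cumsum(grepl("^ID:", x))

# hypothetical helper: turn the lines of one event into a one-row data frame
parse_event <- function(lines) {
  id  <- sub("^ID: *", "", grep("^ID:", lines, value = TRUE))
  ver <- sub("^Version: *", "", grep("^Version:", lines, value = TRUE))
  start <- grep("^notes:", lines) + 1
  body <- if (start <= length(lines)) lines[start:length(lines)] else character(0)
  # drop blank lines, join the rest with single spaces, trim the ends
  note <- trimws(paste(body[nzchar(trimws(body))], collapse = " "))
  data.frame(ID = id, ver = ver, note = note, stringsAsFactors = FALSE)
}

df <- do.call(rbind, lapply(split(x, event), parse_event))
df
```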

Format function output as a customised multiple lines string

I'm trying to write a function that prints its output in a simple format.
If I have already calculated the estimated values of the betas, how can I produce output in the following format?
Coefficients
-------------
Constant: 5.2
Beta1: 4
Beta2: 9
Beta3: 2
.
.
.
I tried the cat() function, but with cat() I have to write every line manually, like:
cat("Coefficients","\n","-------------","\n","Constant: 5.2","\n","Beta1: 4",....)
Is there a simpler way to produce that format?
If you have a vector of 10 results and you want to label them Beta1 to Beta10 you could do:
result = 10:1
b_order = 1:10
paste0("beta", b_order, ": ", result)
This gives:
[1] "beta1: 10" "beta2: 9" "beta3: 8" "beta4: 7" "beta5: 6" "beta6: 5" "beta7: 4" "beta8: 3" "beta9: 2" "beta10: 1"
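Extending that idea to the full layout in the question: build every line as a string, join the lines with "\n", and print once with cat(). A sketch, where format_coefficients is a hypothetical name and the values are the ones from the question's example.

```r
# Hypothetical helper: builds the "Coefficients" block from a constant and a vector of betas
format_coefficients <- function(constant, betas) {
  lines <- c("Coefficients",
             "-------------",
             paste0("Constant: ", constant),
             # paste0 is vectorised, so one call labels every beta
             paste0("Beta", seq_along(betas), ": ", betas))
  paste(lines, collapse = "\n")
}

out <- format_coefficients(5.2, c(4, 9, 2))
cat(out, "\n", sep = "")
# Coefficients
# -------------
# Constant: 5.2
# Beta1: 4
# Beta2: 9
# Beta3: 2
```

Returning the string instead of printing inside the function lets the caller decide whether to cat() it, write it to a file, or compare it in a test.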

How to sort out a nested list in R

The original data was a simple list named "data", like this:
[1] "score: 10 / review 1 / ID 1"
[2] "score: 9 / review 2 / ID 2"
[3] "score: 8 / review 3 / ID 3"
----
[30] "score: 7 / review 30 / ID&DATE: 30"
To separate the scores, reviews and ID&DATEs, I first made it a matrix and then split each entry on "/" using str_split from the stringr package.
The whole process went like this:
library(stringr)
a1 <- readLines("data.txt")
a2 <- t(a1) # 1-row matrix
a3 <- t(a2) # transpose back to a 1-column matrix
b1 <- str_split(a3, "/")
Here is the problem:
b1 came out as a nested list like this:
[[1]]
[1] "score: 10"
[2] "review 1"
[3] "ID 1"
[[2]]
[1] "score: 9"
[2] "review 2"
[3] "ID 2"
[[3]]
[1] "score: 8"
[2] "review 3"
[3] "ID 3"
------
[[30]]
[1] "score: 7"
[2] "review 30"
[3] "ID 30"
I want to extract the values of [[1]][1], [[2]][1], [[3]][1], ..., [[30]][1] (and likewise [[n]][2] and [[n]][3]) SEPARATELY, and make each of them a data frame.
Any clues?
The following would work for a particular type of nested list that looks like your data. Without a reproducible example, I don't know for sure:
# create nested list
temp <- list(a=c("score: 10", "review 1", "ID 1"),
b=c("score: 9", "review 2", "ID 2"),
c=c("score: 8", "review 3", "ID 3"))
# create data frame from this list
df <- data.frame(score=unlist(sapply(temp, function(i) i[1])),
review=unlist(sapply(temp, function(i) i[2])),
ID=unlist(sapply(temp, function(i) i[3])))
I use sapply to pull out the elements from each list item. Then unlist is applied to the result so that it becomes a vector. All of this output is wrapped in a data.frame. Note that you can rearrange the output so that the variables are arranged differently.
An even cleaner method, mentioned by @parfait, uses do.call and rbind:
# construct data.frame, rbinding each list item
df <- data.frame(do.call(rbind, temp))
# add the desired names
names(df) <- c('score', 'review', 'ID')
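Applied directly to the question's own data, the same idea works without the matrix detour: base R's strsplit() returns the same kind of list as str_split(), and vapply() extracts one position across all elements. A sketch; the three input lines are copied from the question's sample, and splitting on " / " (with spaces) is an assumption about the separator.

```r
# A few lines in the format shown in the question (stand-in for readLines("data.txt"))
data <- c("score: 10 / review 1 / ID 1",
          "score: 9 / review 2 / ID 2",
          "score: 8 / review 3 / ID 3")

# splitting on " / " means the pieces need no trimming afterwards
b1 <- strsplit(data, " / ", fixed = TRUE)

# vapply pulls the k-th piece out of every element and guarantees a character vector
df <- data.frame(score  = vapply(b1, `[`, character(1), 1),
                 review = vapply(b1, `[`, character(1), 2),
                 ID     = vapply(b1, `[`, character(1), 3),
                 stringsAsFactors = FALSE)

# optional: keep just the number from the score column
df$score_value <- as.numeric(sub("^score: *", "", df$score))
df
```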

get consensus of multiple partitioning methods in R

My data:
data=cbind(c(1,1,2,1,1,3),c(1,1,2,1,1,1),c(2,2,1,2,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:6)
As output, I want the communities found by majority vote, with their elements; here there should be two, something like group1={item1, item2}, group2={item3}.
You can try this, base R:
res=apply(data,2,function(u) as.numeric(names(sort(table(u), decreasing=TRUE))[1]))
setNames(lapply(unique(res), function(u) names(res)[res==u]), unique(res))
#$`1`
#[1] "item 1" "item 2"
#$`2`
#[1] "item 3"
This function is passed a matrix where each column is an item and each row is a membership vector corresponding to a partition of the items according to a clustering method. The elements (numbers) composing each row have no meaning other than indicating membership and are recycled from row to row. The function returns the majority vote partition. When no consensus exists for an item, the partition given by the first row wins. This allows ordering of the partitions by decreasing values of modularity, for instance.
consensus.final <- function(data){
  output=list()
  for (i in 1:nrow(data)){
    row=as.numeric(data[i,])
    output.inner=list()
    for (j in 1:length(row)){
      group=character()
      group=c(group,colnames(data)[which(row==row[j])])
      output.inner[[j]]=group
    }
    output.inner=unique(output.inner)
    output[[i]]=output.inner
  }
  # gives the mode of the vector representing the number of groups found by each method
  consensus.n.comm=as.numeric(names(sort(table(unlist(lapply(output,length))),decreasing=TRUE))[1])
  # removes the elements of the list that do not correspond to this consensus solution
  output=output[lapply(output,length)==consensus.n.comm]
  # 1) find intersection
  # 2) use majority vote for elements of each vector that are not part of the intersection
  group=list()
  for (i in 1:consensus.n.comm){
    list.intersection=list()
    for (p in 1:length(output)){
      list.intersection[[p]]=unlist(output[[p]][i])
    }
    # candidate group i
    intersection=Reduce(intersect,list.intersection)
    group[[i]]=intersection
    # we need to reinforce that group
    for (p in 1:length(list.intersection)){
      # 'extra' (rather than 'vector') avoids shadowing base::vector()
      extra=setdiff(list.intersection[[p]],intersection)
      if (length(extra)>0){
        for (j in 1:length(extra)){
          counter=vector(length=length(list.intersection))
          for (k in 1:length(list.intersection)){
            counter[k]=extra[j]%in%list.intersection[[k]]
          }
          # add the item when a strict majority of the retained methods agree
          if(length(which(counter==TRUE))>=ceiling((length(counter)/2)+0.001)){
            group[[i]]=c(group[[i]],extra[j])
          }
        }
      }
    }
  }
  group=lapply(group,unique)
  # variables for which consensus has not been reached
  unclassified=setdiff(colnames(data),unlist(group))
  if (length(unclassified)>0){
    for (pp in 1:length(unclassified)){
      temp=matrix(nrow=length(output),ncol=consensus.n.comm)
      for (i in 1:nrow(temp)){
        for (j in 1:ncol(temp)){
          temp[i,j]=unclassified[pp]%in%unlist(output[[i]][j])
        }
      }
      # use the partition of the first method when no majority exists (this allows ordering of partitions by decreasing modularity values for instance)
      index.best=which(temp[1,]==TRUE)
      group[[index.best]]=c(group[[index.best]],unclassified[pp])
    }
  }
  # return the result explicitly rather than via a final assignment
  list(group=group,unclassified=unclassified)
}
Some examples:
data=cbind(c(1,1,2,1,1,3),c(1,1,2,1,1,1),c(2,2,1,2,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:6)
data
consensus.final(data)$group
[[1]]
[1] "item 1" "item 2"
[[2]]
[1] "item 3"
data=cbind(c(1,1,1,1,1,3),c(1,1,1,1,1,1),c(1,1,1,2,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:6)
data
consensus.final(data)$group
[[1]]
[1] "item 1" "item 2" "item 3"
data=cbind(c(1,3,2,1),c(2,2,3,3),c(3,1,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:4)
data
consensus.final(data)$group
[[1]]
[1] "item 1"
[[2]]
[1] "item 2"
[[3]]
[1] "item 3"
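A shorter, label-free alternative, named plainly as a different technique (a co-association matrix, not the algorithm above): count, for each pair of items, the fraction of methods that put them in the same group, link the pairs a strict majority agrees on, and report the connected groups. It never compares raw labels across rows, which matters because the numbers in each row only indicate membership. Unlike consensus.final it has no first-row tie-breaking rule, so results can differ on other data; consensus_coassoc is a hypothetical name. On the three examples above it returns the same groups.

```r
consensus_coassoc <- function(data) {
  items <- colnames(data)
  n <- length(items)
  # co[a, b]: share of methods (rows) placing items a and b in the same group
  co <- matrix(0, n, n, dimnames = list(items, items))
  for (i in seq_len(nrow(data))) {
    co <- co + outer(data[i, ], data[i, ], "==")
  }
  co <- co / nrow(data)
  # link two items when a strict majority of methods agree (diagonal is always TRUE)
  linked <- co > 0.5
  # connected components by min-label propagation: every item repeatedly takes
  # the smallest label among the items it is linked to, until nothing changes
  comp <- seq_len(n)
  repeat {
    new_comp <- vapply(seq_len(n), function(a) min(comp[linked[a, ]]), integer(1))
    if (identical(new_comp, comp)) break
    comp <- new_comp
  }
  unname(split(items, comp))
}

data <- cbind(c(1,1,2,1,1,3), c(1,1,2,1,1,1), c(2,2,1,2,1,2))
colnames(data) <- paste("item", 1:3)
consensus_coassoc(data)
# [[1]] "item 1" "item 2"   [[2]] "item 3"
```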
