User Based Recommendation in R

I am trying to do user-based recommendation in R with the recommenderlab package, but the model always returns zero (no) predictions.
My code is:
library("recommenderlab")
# Load the pre-computed affinity data
movie_data <- read.csv("D:/course/Colaborative filtering/data/UUCF Assignment Spreadsheet_user_row.csv")
movie_data[is.na(movie_data)] <- 0
rownames(movie_data) <- movie_data$X
movie_data$X <- NULL
# Convert it to a matrix
R <- as.matrix(movie_data)
# Convert R into a realRatingMatrix, recommenderlab's sparse-matrix-like data structure
r <- as(R, "realRatingMatrix")
r
rec <- Recommender(r, method = "UBCF",
                   param = list(normalize = "Z-score", method = "Cosine", nn = 5, minRating = 1))
recom <- predict(rec, r["1648"], n=5)
recom
as(recom, "list")
The output is always empty:
as(recom, "list")
$`1648`
character(0)
I am using the user-row data from this link:
https://drive.google.com/file/d/0BxANCLmMqAyIQ0ZWSy1KNUI4RWc/view
In that data, column A contains the user id and every other column contains the rating for one movie.
Thanks.

The line movie_data[is.na(movie_data)] <- 0 is the source of the error. For a realRatingMatrix (unlike a binaryRatingMatrix), movies that are not rated by a user are expected to be NA values, not zero values. For example, the following code gives the correct predictions:
library("recommenderlab")
movie_data <- read.csv("UUCF Assignment Spreadsheet_user_row.csv")
rownames(movie_data) <- movie_data$X
movie_data$X <- NULL
R <- as.matrix(movie_data)
r <- as(R, "realRatingMatrix")
rec <- Recommender(r, method = "UBCF",
                   param = list(normalize = "Z-score", method = "Cosine", nn = 5, minRating = 1))
recom <- predict(rec, r["1648"], n = 5)
as(recom, "list")
# [[1]]
# [1] "X13..Forrest.Gump..1994." "X550..Fight.Club..1999."
# [3] "X77..Memento..2000." "X122..The.Lord.of.the.Rings..The.Return.of.the.King..2003."
# [5] "X1572..Die.Hard..With.a.Vengeance..1995."
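The difference matters because a realRatingMatrix only stores cells that carry an actual rating: NA cells are simply absent, while zeros count as real (very low) ratings that distort the Z-score normalization and the cosine similarities. A minimal sketch with a toy 3x3 matrix (not the assignment data):

```r
library("recommenderlab")

# Toy matrix: NA marks "not rated"
m <- matrix(c(5, NA, 3,
              NA, 2, NA,
              4, 1, NA), nrow = 3, byrow = TRUE)
r_na <- as(m, "realRatingMatrix")
nratings(r_na)   # 5 -- only the non-NA cells are stored as ratings

# Replacing NA with 0 turns every cell into a rating
m0 <- m
m0[is.na(m0)] <- 0
r_zero <- as(m0, "realRatingMatrix")
nratings(r_zero) # 9 -- the zeros are now treated as genuine ratings
```

With the zeros in place, user 1648's neighbours look "similar" mainly through shared zeros, and nothing clears the prediction threshold.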

Related

R code error "Assigned data `TL$x[TL$products == products$product]` must be compatible with existing data."

I am trying to calculate an indicator called the "pesticide load index" using the R package of the same name. Everything goes well until I get to the
products$TL <- TL$x[TL$products == products$product]
products$FL <- FL$x[FL$products == products$product]
part. There I get the error message:
Error:
! Assigned data `TL$x[TL$products == products$product]` must be compatible with existing data.
x Existing data has 6 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
Here is the code used for that:
library("readxl")
# Read in data on products (fill out Table_R_substances.xlsx) ####
products <- read_excel("./Table_R_products.xlsx")
####### Health Load Computation #####
products$HL <- products$formula * (products$sum.risk.score / products$reference.sum.risk.scores)
###### Output: Load Indicator Values ######
substances$TL.products <- substances$concentration * substances$Environmental.Toxicity.Substance
substances$FL.products <- substances$concentration * substances$Fate.Load.substances
TL <- aggregate(substances$TL.products, by = list(products = substances$product), FUN = sum)
FL <- aggregate(substances$FL.products, by = list(products = substances$product), FUN = sum)
products$TL <- TL$x[TL$products == products$product]
products$FL <- FL$x[FL$products == products$product]
products$L <- products$HL + products$TL + products$FL
###### Output: Load Index Values #######
# calculate STI quotient
products$STI <- products$amount.applied / products$standard.doses
Any idea how to fix this? Thanks in advance.
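No answer is recorded here, but the `==` comparison is the likely culprit: it compares the two vectors element by element in order, so when the aggregated product names do not line up row-for-row with products$product it can return zero matches. A sketch of the usual fix with match(), using hypothetical stand-in data frames rather than the real spreadsheets:

```r
# Hypothetical stand-ins for the spreadsheet-derived frames
products <- data.frame(product = c("a", "b", "c"))
TL <- data.frame(products = c("b", "c", "a"), x = c(2, 3, 1))

# Align by product name instead of relying on row order
products$TL <- TL$x[match(products$product, TL$products)]
products$TL  # 1 2 3
```

merge(products, TL, by.x = "product", by.y = "products") is an equivalent alternative when several columns need to come across at once.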

How can I read fertility dataset from Countr package?

I am trying to use the "fertility" dataset from the "Countr" package, as mentioned in:
https://cran.r-project.org/web/packages/Countr/vignettes/exampleFertility.pdf
But I get an empty dataset. Here is my code:
library("Countr")
d <- data("fertility", package="Countr")
nrow(d)
and I get NULL:
> nrow(d)
NULL
When you load package "Countr" (or any other package), all datasets provided by it become available immediately.
library("Countr")
dim(fertility)
## [1] 1243 9
If you wish to load fertility without loading "Countr", then use data(). In a new R session do:
data("fertility", package = "Countr")
dim(fertility)
Note that data() makes available the dataset(s) as a side effect.
The return value is just the name of the dataset:
d <- data("fertility", package = "Countr")
d
## [1] "fertility"
Of course, d is a character vector, so its dim is NULL.
If only the argument "package" is specified, the result gives information about the datasets available in the package:
data(package = "Countr")$results[ , c("Package", "Item", "Title")]
##      Package  Item        Title
## [1,] "Countr" "fertility" "Fertility data"
## [2,] "Countr" "football"  "Football data"
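To capture the dataset under a new name, combine data() with get() on the returned name. A sketch using base R's mtcars, so it runs without Countr installed; the same pattern works with data("fertility", package = "Countr"):

```r
# data() loads the dataset as a side effect and returns only its name
d <- data("mtcars", package = "datasets")
d             # "mtcars" -- a character vector, not the data
df <- get(d)  # fetch the object that name refers to
nrow(df)      # 32
```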
library(Countr)
d <- fertility
nrow(d)
# [1] 1243

R: Package topicmodels: LDA: Error: invalid argument

I have a question regarding LDA in topicmodels in R.
Starting from a data frame, I created a matrix with documents as rows, terms as columns, and the number of occurrences of a term in a document as the values. When I tried to run LDA, I got the error message "Error in !all.equal(x$v, as.integer(x$v)) : invalid argument type". The data contains 1675 documents and 368 terms. What can I do to make the code work?
library("tm")
library("topicmodels")
library("dplyr")
library("tidyr")
data_matrix <- data %>%
  group_by(documents, terms) %>%
  tally %>%
  spread(terms, n, fill = 0)
doctermmatrix <- as.DocumentTermMatrix(data_matrix, weightTf("data_matrix"))
lda_head <- topicmodels::LDA(doctermmatrix, 10, method="Gibbs")
Help is much appreciated!
Edit:
# Toy Data
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toydata <- data.frame(documentstoy,meta1toy,meta2toy,termstoy)
So I looked inside the code, and apparently the LDA() function only accepts integers as input, so you have to convert your categorical variables as below:
library('tm')
library('topicmodels')
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
toydata <- data.frame(documentstoy,meta1toy,meta2toy)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toy_unique <- unique(termstoy)
for (i in 1:length(toy_unique)) {
  A <- as.integer(termstoy == toy_unique[i])
  toydata[toy_unique[i]] <- A
}
lda_head <- topicmodels::LDA(toydata, 10, method="Gibbs")
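The wording of the error is itself a clue: topicmodels checks that the matrix values are whole numbers via !all.equal(x$v, as.integer(x$v)), and all.equal() returns a character description (not FALSE) when the values differ, which `!` cannot negate. A base-R illustration of that mechanism (the variable names here are made up):

```r
v_ok  <- c(1, 2, 3)
v_bad <- c(1.5, 2, 3)

all.equal(v_ok, as.integer(v_ok))    # TRUE
all.equal(v_bad, as.integer(v_bad))  # a character string describing the mismatch

# topicmodels negates the result, and `!` applied to a character string
# throws "invalid argument type" -- hence the error whenever the
# document-term matrix holds non-integer (or non-numeric) values:
# !all.equal(v_bad, as.integer(v_bad))
```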

Looping difficulty with data

I can convert one column into a climdexInput object using the following code:
tmax.dates <- as.PCICt(do.call(paste, t[, c("year", "days")]),
                       format = "%Y %j", cal = "gregorian")
tmin.dates <- as.PCICt(do.call(paste, t[, c("year", "days")]),
                       format = "%Y %j", cal = "gregorian")
prec.dates <- as.PCICt(do.call(paste, t[, c("year", "days")]),
                       format = "%Y %j", cal = "gregorian")
## Load the data in.
ci <- climdexInput.raw(tmax = ntuobs[, 1], tmin = ntuobs[, 1], prec = ntuobs[, 1],
                       tmax.dates, tmin.dates, prec.dates,
                       base.range = c(2000, 2010))
## Create a timeseries of monthly maximum 5-day consecutive
However, ntuobs has 100 columns, and I want to apply the function to all 100 columns and store the results in 100 ci columns. I tried a for loop:
for (i in 1:100) {
  ci[, i] <- climdexInput.raw(tmax = ntuobs[, i], tmin = ntuobs[, i], prec = ntuobs[, i],
                              tmax.dates, tmin.dates, prec.dates,
                              base.range = c(2000, 2010))
}
but this gives an error:
Error in ci[, i] <- climdexInput.raw(tmax = changi[, i], tmin = ntuobs[, :
  object of type 'S4' is not subsettable
Kindly suggest ways to tackle this problem; any approach using the apply family is welcome.
Thanks.
Example dataset:
library(PCICt)
## Create a climdexInput object from some data already loaded in and
## ready to go.
## Parse the dates into PCICt.
tmax.dates <- as.PCICt(do.call(paste, ec.1018935.tmax[, c("year", "jday")]),
                       format = "%Y %j", cal = "gregorian")
tmin.dates <- as.PCICt(do.call(paste, ec.1018935.tmin[, c("year", "jday")]),
                       format = "%Y %j", cal = "gregorian")
prec.dates <- as.PCICt(do.call(paste, ec.1018935.prec[, c("year", "jday")]),
                       format = "%Y %j", cal = "gregorian")
## Load the data in.
ci <- climdexInput.raw(ec.1018935.tmax$MAX_TEMP, ec.1018935.tmin$MIN_TEMP,
                       ec.1018935.prec$ONE_DAY_PRECIPITATION,
                       tmax.dates, tmin.dates, prec.dates,
                       base.range = c(1971, 2000))
## Create a timeseries of annual SDII values.
sdii <- climdex.sdii(ci)
but this is for a single column, while my data has 100 columns (an ensemble).
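No answer is recorded here, but the error itself points at the fix: a climdexInput object is S4, not a matrix, so it cannot receive assignments of the form ci[, i] <- ...; collect the objects in a list instead. The sketch below is runnable because a dummy S4 class and random data stand in for climdexInput and ntuobs; the commented lines show the same pattern with the real PCICt/climdex.pcic calls from the question:

```r
library(methods)

# Dummy S4 class standing in for climdexInput
setClass("DummyCI", representation(col = "numeric"))
ntuobs <- matrix(rnorm(300), ncol = 100)  # stand-in for the real 100-column data

# Collect one S4 object per column in a list
ci_list <- lapply(seq_len(ncol(ntuobs)), function(i) {
  new("DummyCI", col = ntuobs[, i])       # climdexInput.raw(...) in real use
})
length(ci_list)  # 100

# Real use (assuming tmax.dates etc. from the question):
# ci_list <- lapply(seq_len(ncol(ntuobs)), function(i)
#   climdexInput.raw(tmax = ntuobs[, i], tmin = ntuobs[, i], prec = ntuobs[, i],
#                    tmax.dates, tmin.dates, prec.dates,
#                    base.range = c(2000, 2010)))
# sdii_all <- sapply(ci_list, climdex.sdii)
```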

Scrape number of articles on a topic per year from NYT and WSJ?

I would like to create a data frame that scrapes the NYT and WSJ and has the number of articles on a given topic per year. That is:
     NYT WSJ
2011   2   3
2012  10   7
I found this tutorial for the NYT, but it is not working for me. When I get to line 30 I get this error:
> cts <- as.data.frame(table(dat))
Error in provideDimnames(x) :
  length of 'dimnames' [1] not equal to array extent
Any help would be much appreciated.
Thanks!
PS: This is my code that is not working (an NYT API key is needed: http://developer.nytimes.com/apps/register)
# Need to install from source http://www.omegahat.org/RJSONIO/RJSONIO_0.2-3.tar.gz
# then load:
library(RJSONIO)
### set parameters ###
api <- "API key goes here" ###### <<<API key goes here!!
q <- "MOOCs" # Query string, use + instead of space
records <- 500 # total number of records to return, note limitations above
# calculate parameter for offset
os <- 0:(records/10-1)
# read first set of data in
uri <- paste ("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[1], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn="F") # get them
res <- fromJSON(raw.data) # tokenize
dat <- unlist(res$results) # convert the dates to a vector
# read in the rest via loop
for (i in 2:length(os)) {
  # concatenate the URL for each offset
  uri <- paste("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[i], "&fields=date&api-key=", api, sep="")
  raw.data <- readLines(uri, warn="F")
  res <- fromJSON(raw.data)
  dat <- append(dat, unlist(res$results))  # append
}
# aggregate counts for dates and coerce into a data frame
cts <- as.data.frame(table(dat))
# establish date range
dat.conv <- strptime(dat, format="%Y%m%d") # need to convert dat into POSIX format for this
daterange <- c(min(dat.conv), max(dat.conv))
dat.all <- seq(daterange[1], daterange[2], by="day") # all possible days
# compare dates from counts dataframe with the whole data range
# assign 0 where there is no count, otherwise take count
# (take out PSD at the end to make it comparable)
dat.all <- strptime(dat.all, format="%Y-%m-%d")
# can't seem to compare POSIX objects with %in%, so coerce them to character for this:
freqs <- ifelse(as.character(dat.all) %in% as.character(strptime(cts$dat, format="%Y%m%d")), cts$Freq, 0)
plot (freqs, type="l", xaxt="n", main=paste("Search term(s):",q), ylab="# of articles", xlab="date")
axis(1, 1:length(freqs), dat.all)
lines(lowess(freqs, f=.2), col = 2)
UPDATE: the repo is now at https://github.com/rOpenGov/rtimes
There is an RNYTimes package created by Duncan Temple-Lang (https://github.com/omegahat/RNYTimes), but it is outdated because the NYTimes API is on v2 now. I've been working on one for political endpoints only, which is not relevant for you.
I'm rewriting RNYTimes right now. Install it from GitHub; you need to install devtools first to get install_github:
install.packages("devtools")
library(devtools)
install_github("rOpenGov/RNYTimes")
Then try your search with that, e.g.:
library(RNYTimes); library(plyr)
moocs <- searchArticles("MOOCs", key = "<yourkey>")
This gives you the number of articles found:
moocs$response$meta$hits
[1] 121
You can get the word count for each article with:
as.numeric(sapply(moocs$response$docs, "[[", 'word_count'))
[1] 157 362 1316 312 2936 2973 355 1364 16 880
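The question's end goal, article counts per year, needs only base R once a vector of article dates is in hand. A sketch with made-up dates (the real dates would come from the API responses):

```r
# One date string per article, in the API's YYYYMMDD format (made-up values)
dat <- c("20110103", "20110518", "20120207", "20120613", "20121225")

# Parse out the year and tabulate
years <- format(strptime(dat, format = "%Y%m%d"), "%Y")
counts <- table(years)
counts
# years
# 2011 2012
#    2    3
```

Running the same tabulation on the WSJ results and binding the two tables by year gives the data frame described at the top of the question.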
