I am trying to loop over my columns and print the count of unique values for further processing, but I am not getting any output. This should be simple. Here is a simplified version of my code. Is there something glaringly obvious that I am missing, as I suspect?
for (i in 1:length(mydata)) {
(table(mydata[,i]))
}
Do you mean using apply?
> x <- data.frame("SN" = 1:4, "Age" = c(21,15,56,15), "Name" =
c("John","Dora","John","Dora"))
> apply(x,2,function(x) unique(x))
$SN
[1] "1" "2" "3" "4"
$Age
[1] "21" "15" "56"
$Name
[1] "John" "Dora"
You can also count the uniques like this:
> apply(x,2,function(x) length(unique(x)))
SN Age Name
4 3 2
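A likely reason the original loop shows nothing: R only auto-prints the value of an expression at the top level, so a bare table(...) inside a for body is evaluated and then discarded. The following is a minimal sketch under that assumption, reusing the example data frame x from above; the names counts and n_unique are made up for illustration.

```r
x <- data.frame(SN = 1:4, Age = c(21, 15, 56, 15),
                Name = c("John", "Dora", "John", "Dora"))

# Inside a loop nothing is auto-printed, so call print() explicitly
for (i in seq_along(x)) {
  print(table(x[[i]]))
}

# To keep the results for further processing, collect them instead
counts   <- lapply(x, table)                              # one table per column
n_unique <- sapply(x, function(col) length(unique(col)))  # unique count per column
n_unique
```

lapply and sapply work column by column, so they also avoid the conversion to a character matrix that apply performs on a data frame with mixed column types.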
I have the following column from a dataframe
df <- data.frame(
crime = as.character(c(115400, 171200, 91124, 263899, 67601, 51322)),
stringsAsFactors=FALSE
)
I am using a function to extract the first two digits based on some condition as seen on the function below
for (i in df$crime){
if (nchar(i)==6){
print(substring(i,1,2))}
else {print(substring(i,1,1))
}
}
When I run this loop I get the following output, which is what I want
[1] "11"
[1] "17"
[1] "9"
[1] "26"
[1] "6"
[1] "5"
However, I want this to be saved as a standalone vector. How do I do that?
Here is a base R solution with ifelse + substring
res <- with(df, substring(crime,1,ifelse(nchar(crime) == 6, 2, 1)))
such that
> res
[1] "11" "17" "9" "26" "6" "5"
substr/substring are vectorized, so we can use ifelse
v1 <- with(df, ifelse(nchar(crime) == 6, substr(crime, 1, 2), substr(crime, 1, 1)))
v1
#[1] "11" "17" "9" "26" "6" "5"
In the OP's for loop, a vector can be initialized to store the output in each of the iterations
# pre-allocate the result vector, one slot per row
v1 <- character(nrow(df))
for (i in seq_along(df$crime)){
if (nchar(df$crime[i])==6){
v1[i] <- substring(df$crime[i],1,2)
} else {
v1[i] <- substring(df$crime[i],1,1)
}
}
Using regex :
output <- with(df, ifelse(nchar(crime) == 6, sub("(..).*", "\\1", crime),
sub("(.).*", "\\1", crime)))
output
#[1] "11" "17" "9" "26" "6" "5"
It becomes a little simpler with str_extract from stringr
with(df, ifelse(nchar(crime) == 6, stringr::str_extract(crime, ".."),
stringr::str_extract(crime, ".")))
I can imagine some situations where keeping the extracted codes within the original data frame is useful.
I'll use the data.table package as it's fast, which may be handy if your data is big.
library(data.table)
# convert your data.frame to data.table
setDT(df)
# filter the rows where crime length is 6,
# and assign the first two characters of
# it into a new variable "extracted".
# some rows now have NAs in the new
# field. The last [] prints it to screen.
df[nchar(crime) == 6, extracted := substring(crime, 1, 2)][]
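If the shorter codes should get a value too rather than NA, the same idea can be sketched in base R. This is only an illustration, not part of the data.table answer: it rebuilds df as a plain data frame with the codes written as strings, and reuses the column name extracted from above.

```r
df <- data.frame(
  crime = c("115400", "171200", "91124", "263899", "67601", "51322"),
  stringsAsFactors = FALSE
)

# Take two characters when the code has six characters, otherwise one,
# and store the result next to the original codes
df$extracted <- substring(df$crime, 1, ifelse(nchar(df$crime) == 6, 2, 1))
df
```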
I am using Recommenderlab in R to build a recommendation system to provide craft-beer suggestions to new users.
However, upon running the model, I am receiving the same predictions per user for a majority of the training dataset, or receiving 'character(0)' as the output. How can I receive the predictions that are associated with each user and not duplicated?
The dataset I'm using can be found here: https://www.kaggle.com/rdoume/beerreviews/version/1
I have tried converting the data frame directly into a matrix, then into a realRatingMatrix.
In order to receive any recommendations, I need to use the 'dcast' function from the data.table library before converting the data frame into a matrix.
I have also tried removing the first column from the matrix to drop the user ids.
One thing to note is that when the data is sampled, there can be a few rows where the 'reviewer' is blank, but the rating and beer id are there.
library(dplyr)
library(tidyverse)
library(recommenderlab)
library(reshape2)
library(data.table)
beer <- read.csv('beer.csv', stringsAsFactors = FALSE)
#Take sample of data(1000)
beer_sample <- sample_n(beer, 1000)
#Select relevant columns & rename
beer_ratings <- select(beer_sample, reviewer = review_profilename, beerId = beer_beerid, rating = review_overall)
#Add unique id for reviewers
beer_ratings$userId <- group_indices_(beer_ratings, .dots = 'reviewer')
#Create ratings matrix
rating_matrix <- dcast(beer_ratings, userId ~ beerId, value.var = 'rating')
rating_matrix <- as.matrix(rating_matrix)
rating_matrix <- as(rating_matrix, 'realRatingMatrix')
#UBCF Model
recommender_model <- Recommender(rating_matrix, method = 'UBCF', param=list(method='Cosine',nn=10))
#Predict top 5 beers for first 10 users
recom <- predict(recommender_model, rating_matrix[1:10], n=5)
#Return top recommendations as a list
recom_list<- as(recom,'list')
recom_list
The above code will result in:
[[1]]
[1] "48542" "2042" "6" "10" "19"
[[2]]
[1] "10277" "2042" "6" "10" "19"
[[3]]
[1] "10277" "48542" "6" "10" "19"
[[4]]
[1] "10277" "48542" "2042" "6" "10"
[[5]]
[1] "10277" "48542" "2042" "6" "10"
[[6]]
[1] "10277" "48542" "2042" "6" "10"
Converting the data frame to a matrix then realRatingMatrix without casting first into a table results in the user's recommendation as:
`886093`
`character(0)`
Using the 'dcast' function first then converting the data frame into a matrix and removing the first column, then into a realRatingMatrix returns the same predictions for almost every user:
[[1]]
[1] "6" "7" "10" "12" "19"
[[2]]
[1] "6" "7" "10" "12" "19"
[[3]]
[1] "6" "7" "10" "12" "19"
Any help is greatly appreciated.
When I apply the seqdef function from the TraMineR package to a list of vectors and then look at the resulting levels, I get two unwanted levels. I can't figure out how to remove them. Here is my code:
> require(TraMineR)
> seqW <- lapply(X = myListOfVectors, FUN = function(s){
seqdef(s, alphabet = 1:9)
})
I have verified that there are only numbers from 1 to 9 in my sequences, but then I get
> levels(s$T1)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "*" "%"
Where do these "*" and "%" come from? How can I avoid their creation?
> foo <- as.character(c(0, 2))
> foo
[1] "0" "2"
> foo[1]
[1] "0"
> foo[2]
[1] "2"
> as.character("0-2")
[1] "0-2" #this is the output I want from the command below:
> as.character("foo[1]-foo[2]")
[1] "foo[1]-foo[2]" # ... was hoping to get "0-2"
I tried some variations of eval(parse()), but same problem. I also tried these simple examples:
> as.character("as.name(foo[1])")
[1] "as.name(foo[1])"
> as.character(as.name("foo[1]"))
[1] "foo[1]"
Any chance of getting something simple like as.character("foo[1]-foo[2]") to display "0-2"?
UPDATE
Similar example (with a much longer string):
> lol <- as.character(seq(0, 20, 2))
> lol
[1] "0" "2" "4" "6" "8" "10" "12" "14" "16" "18" "20"
> c(as.character("0-2"), as.character("2-4"), as.character("4-6"), as.character("6-8"), as.character("8-10"), as.character("10-12"), as.character("12-14"),as.character("14-16"),as.character("16-18"),as.character("18-20"))
[1] "0-2" "2-4" "4-6" "6-8" "8-10" "10-12" "12-14" "14-16" "16-18" "18-20"
I would like to be able to actually call the object lol from within my character string.
We can use paste with the collapse argument
paste(foo, collapse='-')
#[1] "0-2"
If we need to paste adjacent elements together, take 'lol' without its last element and 'lol' without its first element, then paste the two together with the sep argument.
paste(lol[-length(lol)], lol[-1], sep='-')
#[1] "0-2" "2-4" "4-6" "6-8" "8-10" "10-12" "12-14" "14-16" "16-18"
#[10] "18-20"
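Another way to get the values of an object into a string is sprintf, which substitutes its arguments into a format template. A small sketch, assuming foo and lol as defined in the question; the names single and ranges are made up for illustration.

```r
foo <- as.character(c(0, 2))
lol <- as.character(seq(0, 20, 2))

# "%s" is replaced by the corresponding argument
single <- sprintf("%s-%s", foo[1], foo[2])

# sprintf is vectorised, so adjacent pairs can be built in one call:
# head(lol, -1) drops the last element, tail(lol, -1) drops the first
ranges <- sprintf("%s-%s", head(lol, -1), tail(lol, -1))

single
ranges
```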
I wrote a function in R to attach zeros such that any number between 1 and 100 comes out as 001 (1), 010 (10), and 100 (100) but I can't figure out why the if statements aren't qualifying like I would like them to.
id <- 1:11
Attach_zero <- function(id){
i<-1
for(i in id){
if(id[i] < 10){
id[i] <- paste("00",id[i], sep = "")
}
if((id[i] < 100)&&(id[i]>=10)){
id[i] <- paste("0",id[i], sep = "")
}
print(id[i])
}
}
The output is "001", "2", "3",... "010", "11"
I have no idea why the for loop is skipping middle integers.
The problem here is that you're assigning a character string (e.g. "001") to a numeric vector. When you do this, the entire id vector is converted to character (elements of a vector must be of one type).
So, after comparing 1 to 10 and assigning "001" to id[1], the next element of id is "2" (i.e. character 2). When an inequality includes a character element (e.g. "2" < 10), the numeric part is coerced to character, and alphabetic sorting rules apply. Under these rules both "100" and "10" come before "2", so neither of your if conditions is met. This is the case for all numbers except 10: alphabetically, "10" is less than "100" (and not less than "10"), so your second if condition is met. When you get to 11, neither condition is met once again, since the "word" "11" comes after the word "100".
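The coercion described above can be checked directly at the console. This is a small sketch; the variable names are made up for illustration.

```r
id <- 1:3
id[1] <- "001"            # assigning a string converts the whole vector
id_class <- class(id)     # now "character", no longer "integer"

# In a mixed comparison the number is converted to a string first,
# so these are really "2" < "10", "10" < "100" and "11" < "100"
cmp_2  <- "2"  < 10
cmp_10 <- "10" < 100
cmp_11 <- "11" < 100

id_class
c(cmp_2, cmp_10, cmp_11)
```

Only the comparison for "10" comes out TRUE, which matches the output in the question where 10 is the only later element that gets padded.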
While there are a couple of ways to fix your function, this functionality exists in R (as mentioned in the comments), both with sprintf and formatC.
sprintf('%03d', 1:11)
formatC(1:11, flag = "0", width = 3)
# [1] "001" "002" "003" "004" "005" "006" "007" "008" "009" "010" "011"
For another vectorised approach, you could use nested ifelse statements:
ifelse(id < 10, paste0('00', id), ifelse(id < 100, paste0('0', id), id))
Try this:
id <- 1:11
Attach_zero <- function(id){
id1 <- id
i <- 1
for (i in seq_along(id)) {
if(id[i] < 10){
id1[i] <- paste("00", id[i], sep = "")
}
if(id[i] < 100 & id[i] >= 10){
id1[i] <- paste("0", id[i], sep = "")
}
}
print(id1)
}
If you try your function with id = c(1:3, 6:11):
Attach_zero(id)
##[1] "001"
##[1] "2"
##[1] "3"
##[1] "8"
##[1] "9"
##[1] "010"
##[1] "11"
##Error in if (id[i] < 10) { : missing value where TRUE/FALSE needed
What happens here is that some elements are skipped because of the values i takes. The line i <- 1 has no effect, because i is immediately overwritten by for (i in id), which sets i to each value of id in turn rather than to an index. So with id <- c(1:3, 6:11), i jumps from 3 to 6, the elements at positions 4 and 5 are never visited, and id[10] does not exist (it is NA), which gives the unexpected results and the error shown.
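The difference between looping over values and looping over positions can be seen in a short sketch. The helper names visited and out_of_range are made up for illustration.

```r
id <- c(1:3, 6:11)

# for (i in id) hands the loop the values of id, not its positions
visited <- integer(0)
for (i in id) visited <- c(visited, i)
identical(visited, id)       # i took the values 1, 2, 3, 6, ..., 11

# id has only 9 elements, so using the value 10 as an index gives NA
out_of_range <- id[10]

# seq_along(id) yields the positions 1..9, which is what indexing needs
positions <- seq_along(id)
positions
```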
Just correcting your function to include all the elements of the id:
Attach_zero <- function(id){
for(i in 1:length(id)){
if(id[i] < 10){
id[i] <- paste("00",id[i], sep = "")
}
if((id[i] < 100)&&(id[i]>=10)){
id[i] <- paste("0",id[i], sep = "")
}
print(id[i])
}
}
Attach_zero(id)
##[1] "001"
##[1] "2"
##[1] "3"
##[1] "6"
##[1] "7"
##[1] "8"
##[1] "9"
##[1] "010"
##[1] "11"
Note the number 7 in this output.
And using sprintf as jbaums says, including it in a function:
Attach_zero <- function(id){
return(sprintf('%03d', id)) #You can change return for print if you want
}
Attach_zero(id)
## [1] "001" "002" "003" "006" "007" "008" "009" "010" "011"