I have a question. I'm working on building a recommendation system in R, and I'm fairly new to the language. I can't seem to figure the following out.
I have a matrix like:
eventID g_26 g_27 g_28 g_29 g_30 g_31 g_32 g_33 g_34 g_35 g_36 g_37 g_38 g_39 g_40 g_41 g_42 g_43
1: 1010 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
2: 1016 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
3: 1019 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
4: 1053 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
5: 1168 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0
6: 1188 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
What I'd like to do is replace every 1 with 1/sqrt(total number of 1s in that particular row).
I'm also using the data.table package, if that makes it easier.
Thanks in advance!
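We can divide the data frame by the square root of the number of 1s in each row.
All the values that are 0 remain 0, and the 1s change to the desired output. The first column (eventID) is excluded from both the row count and the assignment, so it is kept unchanged.
df[-1] <- df[-1] / sqrt(rowSums(df[-1] == 1))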
With data.table, we can specify the columns of interest in .SDcols (-1 selects all columns except the first), get the sum of each row of the Subset of Data (.SD) with Reduce and `+`, take the square root (sqrt), divide .SD by it, and assign (:=) the result to the columns of interest. If those columns are stored as integers, convert them to numeric first, because := keeps the existing column type.
dt[, (2:ncol(dt)) := .SD*1/sqrt(Reduce(`+`, .SD)), .SDcols = -1]
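A row that contains no 1s at all would make both versions above divide 0 by 0 and return NaN. The following is a minimal base R sketch of one way to guard against that, assuming such rows should simply stay at 0. The small `df` (including the all-zero row with eventID 9999) and the names `value_cols`, `n_ones` and `scale` are made up for illustration.

```r
# Toy data: the last row (eventID 9999) is a hypothetical all-zero row.
df <- data.frame(
  eventID = c(1010, 1016, 1019, 9999),
  g_26 = c(0, 1, 0, 0),
  g_27 = c(1, 0, 0, 0),
  g_28 = c(1, 1, 1, 0)
)

value_cols <- setdiff(names(df), "eventID")

# Number of 1s per row, counted over the indicator columns only.
n_ones <- rowSums(df[value_cols] == 1)

# 1/sqrt(count) where the row has at least one 1, otherwise 0 (assumption).
scale <- ifelse(n_ones > 0, 1 / sqrt(n_ones), 0)

# A vector of length nrow(df) is applied down each column, i.e. per row.
df[value_cols] <- df[value_cols] * scale
df
```

The eventID column is left untouched because only `value_cols` is reassigned.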
As an example
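m <- matrix(c(1, 1, 0, 1, 0, 0, 1, 0, 0), ncol = 3, byrow = TRUE)
rs <- sqrt(rowSums(m))  # square root of the number of 1s in each row
m <- m / rs             # rs is recycled down the columns, so row i is divided by rs[i]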
Hope that's helpful
I would like to write a function which would allow to filter the input data.
My input data is a list object containing named numeric vectors (minimal reproducible example below - dummy list).
vec1 <- c(rep(0, 10), rep(1, 4), rep(0,5), rep(-1,5))
vec2 <- c(rep(-1, 7), rep(0,99), rep(1, 6))
vec3 <- c(rep(1,2), rep(-1,2), rep(0,10), rep(-1,4), rep(0,8))
vec4 <- rep(0, 100)
dummy_list <- list(vec1, vec2, vec3, vec4)
names(dummy_list) <- c("first", "second", "third", "fourth")
I want my function to test whether, in each vector, any non-zero value occurs at least 5 times in a row.
My desired output is a list containing only the first two vectors of the initial dummy_list.
Below is one of my many attempts. I would like the solution to be as similar to it as possible (except that it should work).
dummy_list <- Filter(function(x) which(rle(x$values !=0) x$lengths>5, dummy_list)
Note that we check whether any of the rle lengths for non-zero values is greater than or equal to 5.
Filter(function(x)any(with(rle(x), lengths[values!=0]>=5)), dummy_list)
$first
[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 -1 -1 -1 -1 -1
$second
[1] -1 -1 -1 -1 -1 -1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[32] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[63] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[94] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
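The same check can be pulled out into a named predicate so the run length is a parameter. This is only a sketch: the function name `has_long_run` and its `min_len` argument are illustrative choices, not part of the original answer.

```r
# TRUE if any run of identical non-zero values is at least min_len long.
has_long_run <- function(x, min_len = 5) {
  runs <- rle(x)
  any(runs$lengths[runs$values != 0] >= min_len)
}

dummy_list <- list(
  first  = c(rep(0, 10), rep(1, 4), rep(0, 5), rep(-1, 5)),
  second = c(rep(-1, 7), rep(0, 99), rep(1, 6)),
  third  = c(rep(1, 2), rep(-1, 2), rep(0, 10), rep(-1, 4), rep(0, 8)),
  fourth = rep(0, 100)
)

kept <- Filter(has_long_run, dummy_list)
names(kept)
# [1] "first"  "second"

# A shorter threshold also keeps "third", which has a run of four -1s.
names(Filter(function(x) has_long_run(x, min_len = 4), dummy_list))
# [1] "first"  "second" "third"
```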
I have successfully imported a csv file into R. It is a 6 by 6 matrix.
0 0 0 0 0 0
0 1 0 0 0 0
0 1 1 0 0 0
0 1 0 0 0 1
0 1 0 1 0 0
0 0 0 0 0 0
'1' exists in the second row and also exists in the second last row. So the distance between them vertically is 4.
Would I use the dist function to calculate this? And if so how would I implement it to give me the value of 4?
diff(range(which(rowSums(mat) > 0)))
# [1] 3
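Explanation: since the data is binary, we can look at the distance between the first and last rows where the row sum is > 0. Here those are rows 2 and 5, so the result is 3; add 1 if you want the inclusive count of 4 from the question.
The example uses Sathish's nicely shared data, adapted as follows: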
mat <- matrix(as.integer(unlist(strsplit('0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0', " "))),
nrow = 6, ncol = 6, byrow = TRUE)
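As for the dist question: dist computes distances between the values in the rows of a matrix, not between row positions, so it is not the natural tool here. The sketch below works on row indices instead and shows both the difference (3) and the inclusive count (4). The per-column variant at the end is an extra assumption about what might be wanted, and the names `gap`, `span` and `col_gap` are illustrative.

```r
mat <- matrix(c(0, 0, 0, 0, 0, 0,
                0, 1, 0, 0, 0, 0,
                0, 1, 1, 0, 0, 0,
                0, 1, 0, 0, 0, 1,
                0, 1, 0, 1, 0, 0,
                0, 0, 0, 0, 0, 0),
              nrow = 6, ncol = 6, byrow = TRUE)

# Rows that contain at least one 1.
rows_with_one <- which(rowSums(mat) > 0)

gap  <- diff(range(rows_with_one))  # last row index minus first row index
span <- gap + 1                     # number of rows covered, counting both ends
gap
# [1] 3
span
# [1] 4

# Optional: the same gap computed separately for each column
# (NA for columns without any 1).
col_gap <- apply(mat, 2, function(col) {
  idx <- which(col == 1)
  if (length(idx) > 0) diff(range(idx)) else NA_integer_
})
```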
I've got the following table (which is called train) (in reality much bigger)
UNSPSC adaptor alert bact blood collection packet patient ultrasoft whit
514415 0 0 0 0 0 0 0 1 0
514415 0 0 0 1 0 0 0 1 0
514415 0 0 1 0 0 0 0 1 0
514415 0 0 0 0 0 0 0 1 0
514415 0 0 0 0 0 0 0 1 0
514415 0 0 0 0 0 0 0 1 0
422018 0 0 0 0 0 0 0 1 0
422018 0 0 0 0 0 0 0 1 0
422018 0 0 0 1 0 0 0 1 0
411011 0 0 0 0 0 0 0 1 0
I want to calculate the number of unique UNSPSC per column where the value is equal to 1. So for column blood it will be 2 and for column ultrasoft will be 3.
I'm doing this but don't know how to continue:
apply(train[,-1], 2, ......)
I'm trying not to use loops.
To continue from where you left, we can use apply with margin=2 and calculate the length of unique values of "UNSPSC" for each column.
apply(train[-1], 2, function(x) length(unique(train$UNSPSC[x==1])))
#adaptor alert bact blood collection packet
# 0 0 1 2 0 0
#patient ultrasoft whit
# 0 3 0
A better option is sapply/lapply, which gives the same result but, unlike apply, does not convert the data frame into a matrix.
sapply(train[-1], function(x) length(unique(train$UNSPSC[x==1])))
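If you have columns of only 0 and 1, like in the example, colSums gives the number of rows with a 1 in each column. Note that this counts rows, not unique UNSPSC values, so it returns 10 rather than 3 for ultrasoft:
colSums(train[,-1]) # remove the non-indicator columns, like UNSPSC, before use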
# adaptor alert bact blood collection packet patient
# 0 0 1 2 0 0 0
# ultrasoft whit
# 10 0
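To get the number of distinct UNSPSC values per column without an explicit loop, one option is to first collapse the rows by UNSPSC with rowsum and then count the groups that have at least one 1. This is a sketch on a reduced version of the table (only three of the indicator columns are reproduced), and the names `per_code` and `distinct_counts` are made up for illustration.

```r
# Reduced copy of `train`: same 10 rows, three of the indicator columns.
train <- data.frame(
  UNSPSC    = c(rep(514415, 6), rep(422018, 3), 411011),
  bact      = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  blood     = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  ultrasoft = rep(1, 10)
)

# Sum the indicator columns within each UNSPSC code (one row per code).
per_code <- rowsum(train[-1], group = train$UNSPSC)

# A code counts for a column if it has at least one 1 there.
distinct_counts <- colSums(per_code > 0)
distinct_counts
#      bact     blood ultrasoft
#         1         2         3
```

For comparison, `colSums(train[-1])` on the same data returns 10 for ultrasoft, because it counts rows rather than codes.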
i'm having trouble manipulating vectors in R. i have a vector that looks like this:
stack <- append(append(rep(0,8),c(1,0,0,0,0,1)),rep(0,6))
[1] 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
my overall goal is to manipulate the vector as such:
*when there is a 1, make the next three values in the vector 1.
*change the original 1 to 0.
so ultimately the vector would look like:
[1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0
the second part I can do by:
replace(stack,which(stack == 1),0)
but I can't figure out how to do the first one efficiently. any help would be greatly appreciated.
You can use filter here (written as stats::filter in case dplyr is loaded and masks it):
c(stats::filter(stack,c(0,0,0,0,1,1,1),circular=TRUE))
## [1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0
Here's a possible base R option
temp <- which(stack == 1)
stack[as.vector(mapply(`:`, temp, temp + 3))] <- c(0, rep(1, 3))
stack
# [1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0
I would go with regular expressions:
stack <- paste0(stack, collapse="")
stack <- gsub("1.{3}", "0111", stack)
stack <- as.numeric(strsplit(stack, "")[[1]])
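The answers above behave differently when a 1 sits within the last three positions: the circular filter wraps around to the start of the vector, and the regular expression needs three characters after the 1 to match. Below is a small index-based sketch that truncates at the end of the vector instead. The function name `shift_ones` and its `n` argument are illustrative, and it assumes that a position following any 1 should end up as 1 even if it originally held a 1 itself.

```r
# Set the n positions after each 1 to 1 and everything else to 0,
# dropping positions that would fall past the end of the vector.
shift_ones <- function(x, n = 3) {
  starts <- which(x == 1)
  idx <- unlist(lapply(starts, function(s) s + seq_len(n)))
  idx <- idx[idx <= length(x)]
  out <- numeric(length(x))
  out[idx] <- 1
  out
}

stack <- c(rep(0, 8), 1, 0, 0, 0, 0, 1, rep(0, 6))
shift_ones(stack)
#  [1] 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0

# A 1 near the end is truncated rather than wrapped around.
shift_ones(c(0, 0, 1, 0))
# [1] 0 0 0 1
```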
I have two data frames:
DT1: (This data frame's column values I need to edit based on another datatable DT2)
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 1
990088248A 0 0 0 0 0 0 1
990088882C1 0 0 0 0 0 0 1
990088882C2 0 0 0 1 1 0 0
990088882C3 0 0 0 1 1 0 0
990088882C4 0 0 0 1 1 0 0
990088882C5 0 0 0 1 1 0 0
DT2:
BCC HIER1 HIER2 HIER3 HIER4 HIER5
BCC8 BCC9 BCC10 BCC11 BCC12 0
BCC9 BCC10 BCC11 BCC12 0 0
BCC10 BCC11 BCC12 0 0 0
BCC11 BCC12 0 0 0 0
BCC17 BCC18 BCC19 0 0 0
BCC18 BCC19 0 0 0 0
BCC27 BCC28 BCC29 BCC80 0 0
BCC28 BCC29 0 0 0 0
BCC46 BCC48 0 0 0 0
BCC54 BCC55 0 0 0 0
BCC57 BCC58 0 0 0 0
BCC70 BCC71 BCC72 BCC103 BCC104 BCC169
I want to look up the column names of DT1 in the first column of DT2 (DT2$BCC) and apply the hierarchy logic as follows:
I want to loop through the DT1 column names (except the first column) and, in a nested loop, through the values of DT2$BCC to check whether they are equal. If they are equal, take that DT2$BCC value and check whether the DT1 column with that name equals 1. If so, set to 0 the DT1 columns that are listed in that row of DT2 (HIER1, HIER2, HIER3, ...).
Result should be:
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 0
990088248A 0 0 0 0 0 0 0
990088882C1 0 0 0 0 0 0 0
990088882C2 0 0 0 1 0 0 0
990088882C3 0 0 0 1 0 0 0
990088882C4 0 0 0 1 0 0 0
990088882C5 0 0 0 1 0 0 0
I am doing this now:
cols<-setdiff(names(DT1), "HIC")
subs<-as.character(DT2$BCC)
colsHier<-setdiff(names(DT2), "BCC")
paste0("DT1$", eval(cols[i]))<-
for( i in 1:length(cols)){
for (k in 1:length(subs)){
ifelse(cols[i] == subs[k],
ifelse(do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1,
for (j in 1:length(colsHeir)){
if(colsHeir[j]!= 0)
x<-paste0('DT2$',eval(colsHier[j]))
paste0('DT1$',eval(x[k])):= 0}
,DT1$cols[i]), DT1$cols[i])}}
I am trying to match the value of do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1, but when I am running this expression in R I am getting following:
> do.call(paste0, list('DT1$', eval(cols[2]),'[1]'))
[1] "DT1$BCC2[1]"
and NOT the value of the cell. How can I access the value of that cell to compare it with 1?
I have not been able to find the correct way of doing this. I am sorry for the long question. Any help is appreciated.
library(reshape2)
Melt both data frames into long format (dt1 and dt2 here are the DT1 and DT2 tables from the question):
dt1.m <- melt(dt1, id.vars = "BIC")
dt2.m <- melt(dt2, id.vars = "BCC")
If a column name in dt1.m$variable appears among the hierarchy values in dt2.m$value, set its value to 0:
dt1.m$value <- ifelse(dt1.m$variable %in% dt2.m$value, 0, dt1.m$value)
Cast the data back into wide form:
dt1.c <- dcast(dt1.m, BIC ~ variable, value.var = "value")
Note that dcast automatically reorders the rows.
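On the question of reading a cell by column name: pasting together a string such as "DT1$BCC2[1]" only produces text, whereas `DT1[[cols[2]]][1]` returns the value itself. The base R sketch below uses that `[[` access to apply the rule as it is worded in the question (zero the HIER columns only in rows where the parent BCC column is 1). It runs on small stand-ins for DT1 and DT2 with a subset of the rows and columns, and the names `orig`, `parent`, `children` and `hit` are illustrative. Because it follows the wording literally, a row where only BCC11 is 1 keeps its 1, unlike the melt approach, which zeroes every column named anywhere in HIER1 to HIER5.

```r
# Small stand-ins for DT1 and DT2 (subset of the rows/columns in the question).
DT1 <- data.frame(
  BIC   = c("990081899A", "990066595A", "990088882C2"),
  BCC8  = c(0, 0, 1),
  BCC9  = c(0, 0, 1),
  BCC10 = c(0, 0, 0),
  BCC11 = c(0, 1, 0),
  stringsAsFactors = FALSE
)
DT2 <- data.frame(
  BCC   = c("BCC8", "BCC9", "BCC10"),
  HIER1 = c("BCC9", "BCC10", "BCC11"),
  HIER2 = c("BCC10", "BCC11", "BCC12"),
  HIER3 = c("BCC11", "BCC12", "0"),
  stringsAsFactors = FALSE
)

orig <- DT1  # snapshot, so every check uses the original flags

for (k in seq_len(nrow(DT2))) {
  parent <- DT2$BCC[k]
  if (!parent %in% names(DT1)) next

  # Hierarchy entries of this row that actually exist as DT1 columns.
  children <- intersect(unlist(DT2[k, -1]), names(DT1))

  # `[[` returns the column values, so they can be compared with 1.
  hit <- orig[[parent]] == 1
  for (child in children) {
    DT1[[child]][hit] <- 0
  }
}
DT1
```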