I have a standard 2x2 table
      Yes  No
Yes    a    b
No     c    d
I want to create a condition whereby IF(a or b or c or d = 0) then 0.5 is added on to each of the cells a,b,c,d.
I have tried this:
if(a && b && c && d == 0){
a=a+0.5, b=b+0.5, c=c+0.5, d=d+0.5
}
But I am getting an error saying
Error: unexpected ',' in:
"if(a && b && c && d== 0){
a=a+0.5,"
i.e. I don't think it is letting me put multiple things to execute.
Also I don't think that the && is right between each of the letters as I believe that means IF(a and b and ...)
UPDATE TO QUESTION:
I have another related question.
Suppose I have a set of n tables, all in the format:
      Yes  No
Yes    a    b
No     c    d
and if one of the a,b,c or d in any of the n tables is equal to zero then 0.5 is added on to each of the a,b,c,d for all of the n tables. How would I do that?
My list looks like the following:
n11 n12 n21 n22
1 188 1157 173 1168
2 2 201 1 101
3 369 2280 354 2289
4 1 61 0 61
5 1306 16870 1333 16773
6 4 81 3 79
7 6 117 5 118
8 19 334 15 318
9 1 49 0 48
10 0 36 1 33
11 2 114 3 113
12 13 433 37 696
13 1 64 0 65
14 4 157 1 160
15 1 42 0 43
16 1 150 5 146
17 7 1124 10 1117
18 2 78 2 77
and what I am trying to say is that if any of the aspects of the cells of the table are 0, then I want 0.5 to be added on to every cell.
In R you can't use , to separate statements, but you can use ; (or put each statement on its own line).
Also, the way you wrote the condition treats a, b and c as booleans (TRUE/FALSE), which they are not: they are numbers, and only d is compared with 0. Your condition should be:
if (a == 0 || b == 0 || c == 0 || d == 0)
Note that your code will nevertheless run, even though a, b and c are numbers rather than booleans: R coerces 0 to FALSE and any non-zero number to TRUE. It just tests a different condition from the one you intended. The same coercion means you could also write your condition as:
if (!a || !b || !c || !d)
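To see the difference between the two conditions, here is a small sketch with made-up cell counts (the values of a, b, c and d are illustrative only, not taken from the question):

```r
a <- 3; b <- 0; c <- 5; d <- 2   # hypothetical counts; b is the zero cell

# Original condition: a, b, c are coerced to logical, only d is compared to 0
wrong <- a && b && c && d == 0

# Intended condition: is at least one cell equal to 0?
right <- a == 0 || b == 0 || c == 0 || d == 0

if (right) {
  a <- a + 0.5; b <- b + 0.5; c <- c + 0.5; d <- d + 0.5
}
```

With these values the original condition is FALSE (because b is 0, which counts as FALSE), while the intended one is TRUE, so 0.5 is added to all four cells.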
For the UPDATE, I assume matList is the list of matrices:
for (ii in seq_along(matList)) {
if (any(matList[[ii]] == 0)) {
matList = lapply(matList, function(X){X+0.5})
break # Exit the for loop
}
}
lapply applies X + 0.5 (which, thanks to R's vectorised arithmetic, adds 0.5 to each element of the matrix) to every element of the list matList (here, the matrices) and returns the resulting list.
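The data shown in the update looks like a single data frame with one row per table rather than a list of matrices. If that is the case, the same idea can be written without a loop. A minimal sketch, assuming the counts sit in a data frame called tabs (a hypothetical name) with columns n11, n12, n21, n22; only the first four rows of the question's data are used here:

```r
tabs <- data.frame(n11 = c(188, 2, 369, 1),
                   n12 = c(1157, 201, 2280, 61),
                   n21 = c(173, 1, 354, 0),
                   n22 = c(1168, 101, 2289, 61))

# If any cell in any table is zero, add 0.5 to every cell of every table
if (any(tabs == 0)) {
  tabs <- tabs + 0.5
}
```

Here tabs == 0 gives a logical matrix, any() collapses it to a single TRUE/FALSE, and tabs + 0.5 is applied to every cell at once.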
The problem is with the commas that separate your assignments. R syntax does not allow this. Write it this way:
if (a && b && c && d == 0){
a=a+0.5
b=b+0.5
c=c+0.5
d=d+0.5
}
Another problem is that the behaviour you described does not match your code. && means and, not or. If you want to check whether any of the elements is equal to 0, you can write the following (modified based on Rodrigo's comment):
if (0 %in% c(a,b,c,d)){
a=a+0.5
b=b+0.5
c=c+0.5
d=d+0.5
}
So I understand that in R, a hash() is similar to a dictionary. I would like to extract specific values from my dataframe and put them into a hash.
The componentindex column is where I have my keys, and the cluster.index + UniqueFileSourcesCount columns contain my values. So for the same key I have multiple values, e.g. hash {91: [1,15],[22,99] etc..
So I would like to create a hash that contains each key, with multiple values. But I'm not sure how to do that.
mini_df <- head(df,10) #using a small df
compID <- unique(mini_df$componentindex) #list with unique keys
h1 <- hash()
for (i in 1:length(mini_df)){
if(compID == mini_df[i,"componentindex"]){
h1 <- hash(mini_df[i,"componentindex"] ,c(mini_df[i,"cluster.index"],mini_df[i,"UniqueFileSourcesCount"]))
}
#h2 <- append(h2,h1)
}
If I print h1, I end up having only the last value:
<hash> containing 1 key-value pair(s).
91 : 42 5
Which I understand, since I don't append to this hash but overwrite it. I'm not sure how to append to or expand hashes in R, and I have not been able to find a solution yet.
mini_df:
UniqueFileSourcesCount cluster.index componentindex
1 15 1 91
2 15 10 -1
3 99 22 91
4 63 23 1675
5 12 25 91
6 6 27 91
7 50 37 91
8 5 42 91
9 2 43 -1
10 2 69 -1
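One way to get several values per key is sketched below. Note that it swaps the hash package for base R's split(), which returns a named list that can be looked up by key much like a dictionary; the name by_component is hypothetical. Separately, length(mini_df) is the number of columns of a data frame, so a loop over rows would need seq_len(nrow(mini_df)) instead of 1:length(mini_df).

```r
mini_df <- data.frame(
  UniqueFileSourcesCount = c(15, 15, 99, 63, 12, 6, 50, 5, 2, 2),
  cluster.index          = c(1, 10, 22, 23, 25, 27, 37, 42, 43, 69),
  componentindex         = c(91, -1, 91, 1675, 91, 91, 91, 91, -1, -1)
)

# One list element per key; each element keeps all (cluster.index, count) pairs
by_component <- split(mini_df[, c("cluster.index", "UniqueFileSourcesCount")],
                      mini_df$componentindex)

by_component[["91"]]   # every pair recorded for key 91
```

The keys become the names of the list ("-1", "91", "1675"), and each value is a small data frame holding all the rows for that key.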
when I do
table(df$strategy.x)
0 1 2 3
70 514 223 209
table(df$strategy.y)
0 1 2 3
729 24 7 4
I want to create a variable with both of these combined. I tried this
df <- df %>%
mutate(nstrategy1 = ifelse(strategy.x==1| strategy.y==1 , 1, 0))
table(df$nstrategy1)
0 1
399 519
I am supposed to get 514 + 24 = 538 but I got 519 instead
df <- df %>% mutate(nstrategy2 = ifelse(strategy.x==2| strategy.y==2 , 1, 0))
table(df$nstrategy2)
0 1
578 228
Similarly, I am supposed to get 223 + 7 = 230, but I got 228 instead
Is there a good way to merge both strategy.x and strategy.y and end up with a table like the following with 4 categories?
0 1 2 3
799 538 230 213
table(mtcars$am) # 13 1's
table(mtcars$vs) # 14 1's
mtcars$ones = ifelse(mtcars$am == 1 | mtcars$vs == 1, 1, 0)
table(mtcars$ones) # 20 1's < 13 + 14 = 27
Why is it showing only 20 1's instead of 27? Because there are 7 + 6 + 7 = 20 cars with a 1 in am, in vs, or in both. There are 13 with am==1 (6+7) and 14 with vs==1 (7+7). Seven cars are in the bottom-right cell because they have 1's in both dimensions, and these are the ones you are expecting to count twice.
table(mtcars$am, mtcars$vs)
# 0 1
# 0 12 7
# 1 6 7
The simplest way to get the sum of the two results would be by adding the two table objects:
table(mtcars$am) + table(mtcars$vs)
# 0 1
# 37 27
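The counting argument above can be checked directly with a short sketch on mtcars (the n_* variable names are just illustrative):

```r
n_am     <- sum(mtcars$am == 1)                    # cars with am == 1
n_vs     <- sum(mtcars$vs == 1)                    # cars with vs == 1
n_both   <- sum(mtcars$am == 1 & mtcars$vs == 1)   # cars with 1 in both
n_either <- sum(mtcars$am == 1 | mtcars$vs == 1)   # what ifelse(... | ...) counts

# Rows matching both conditions are counted only once by `|`
n_either == n_am + n_vs - n_both
```

By the same arithmetic, and assuming there are no missing values, 514 + 24 - 519 = 19 rows in the question's data would have strategy 1 in both columns.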
I want to fill NA rows based on checking the differences between the closest non-NA labeled rows.
For instance
data <- data.frame(sd_value=c(34,33,34,37,36,45),
value=c(383,428,437,455,508,509),
label=c(c("bad",rep(NA,4),"unable")))
> data
sd_value value label
1 34 383 bad
2 33 428 <NA>
3 34 437 <NA>
4 37 455 <NA>
5 36 508 <NA>
6 45 509 unable
I want to decide how to fill the NA rows by checking the differences in sd_value and value relative to the neighbouring rows labelled bad and unable.
To get the differences between consecutive rows we can do:
library(dplyr)
data%>%
mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))
sd_value value label diff_val diff_sd_val
1 34 383 bad 0 0
2 33 428 <NA> 45 -1
3 34 437 <NA> 9 1
4 37 455 <NA> 18 3
5 36 508 <NA> 53 -1
6 45 509 unable 1 9
The condition for labelling the NA rows is:
if diff_val<50 and diff_sd_val<9, label the row with the last non-NA label; otherwise use the first non-NA label after the last NA row.
So that the expected output would be
sd_value value label diff_val diff_sd_val
1 34 383 bad 0 0
2 33 428 bad 45 -1
3 34 437 bad 9 1
4 37 455 bad 18 3
5 36 508 unable 53 -1
6 45 509 unable 1 9
The possible solution I cooked up so far:
custom_labelling <- function(x,y,label){
diff_sd_val<-c(NA,diff(x))
diff_val<-c(NA,diff(y))
label <- NA
for (i in 1:length(label)){
if(is.na(label[i])&diff_sd_val<9&diff_val<50){
label[i] <- label
}
else {
label <- label[i]
}
}
return(label)
}
which gives
data%>%
mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))%>%
mutate(custom_label=custom_labelling(sd_value,value,label))
Error in mutate_impl(.data, dots) :
Evaluation error: missing value where TRUE/FALSE needed.
In addition: Warning message:
In if (is.na(label[i]) & diff_sd_val < 9 & diff_val < 50) { :
the condition has length > 1 and only the first element will be used
One option is to find the NA and non-NA indices and, based on the condition, select the closest label before or after each NA row.
library(dplyr)
#Create a new dataframe with diff_val and diff_sd_val
data1 <- data%>% mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))
#Get the NA indices
NA_inds <- which(is.na(data1$label))
#Get the non-NA indices
non_NA_inds <- setdiff(1:nrow(data1), NA_inds)
#For every NA index
for (i in NA_inds) {
#Check the condition (&& because both sides are single values)
if(data1$diff_sd_val[i] < 9 && data1$diff_val[i] < 50)
#Get the last non-NA label before row i
data1$label[i] <- data1$label[max(non_NA_inds[non_NA_inds < i])]
else
#Get the first non-NA label after row i
data1$label[i] <- data1$label[min(non_NA_inds[non_NA_inds > i])]
}
data1
# sd_value value label diff_val diff_sd_val
#1 34 383 bad 0 0
#2 33 428 bad 45 -1
#3 34 437 bad 9 1
#4 37 455 bad 18 3
#5 36 508 unable 53 -1
#6 45 509 unable 1 9
You can remove diff_val and diff_sd_val columns later if not needed.
We can also create a function
custom_label <- function(label, diff_val, diff_sd_val) {
NA_inds <- which(is.na(label))
non_NA_inds <- setdiff(1:length(label), NA_inds)
new_label = label
for (i in NA_inds) {
if(diff_sd_val[i] < 9 && diff_val[i] < 50)
# last non-NA label before position i
new_label[i] <- label[max(non_NA_inds[non_NA_inds < i])]
else
# first non-NA label after position i
new_label[i] <- label[min(non_NA_inds[non_NA_inds > i])]
}
return(new_label)
}
and then apply it
data%>%
mutate(diff_val = c(0, diff(value)),
diff_sd_val = c(0, diff(sd_value)),
new_label = custom_label(label, diff_val, diff_sd_val))
# sd_value value label diff_val diff_sd_val new_label
#1 34 383 bad 0 0 bad
#2 33 428 <NA> 45 -1 bad
#3 34 437 <NA> 9 1 bad
#4 37 455 <NA> 18 3 bad
#5 36 508 <NA> 53 -1 unable
#6 45 509 unable 1 9 unable
If we want to apply it by group we can add a group_by statement and it should work.
data%>%
group_by(group) %>%
mutate(diff_val = c(0, diff(value)),
diff_sd_val = c(0, diff(sd_value)),
new_label = custom_label(label, diff_val, diff_sd_val))
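As an alternative to the loop, the same rule can be sketched in vectorised base R. This is only an illustration under some assumptions: label is a character vector (not a factor), the first and last rows are labelled, and fill_forward and fill_backward are hypothetical helper names, not functions from dplyr.

```r
label       <- c("bad", NA, NA, NA, NA, "unable")
diff_val    <- c(0, 45, 9, 18, 53, 1)
diff_sd_val <- c(0, -1, 1, 3, -1, 9)

# Carry the most recent non-NA value forward
fill_forward <- function(x) {
  ok <- !is.na(x)
  c(NA, x[ok])[cumsum(ok) + 1]
}
# Carry the next non-NA value backward
fill_backward <- function(x) rev(fill_forward(rev(x)))

new_label <- ifelse(is.na(label),
                    ifelse(diff_sd_val < 9 & diff_val < 50,
                           fill_forward(label),    # last non-NA label before
                           fill_backward(label)),  # first non-NA label after
                    label)
new_label
```

On the example data this gives bad for rows 1 to 4 and unable for rows 5 and 6, matching the expected output in the question.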
I have a list containing data tables. A sample list can be created using following code.
library(data.table) # needed for setDT() below
mydata=read.table(textConnection("
MSA_id code variable Caucasian African.American Asian Hispanic Other
412 111011 1 64 2 0 0 0
412 111011 2 464 17 4 11 0
412 111021 1 2006 43 32 22 61
412 111021 2 559 18 6 10 0
412 111031 1 56 1 0 0 0
412 111031 2 1 0 0 0 0"),header=TRUE)
setDT(mydata)
z = split(mydata,mydata$code)
> z[1:2]
$`111011`
MSA_id code variable Caucasian African.American Asian Hispanic Other
1: 412 111011 1 64 2 0 0 0
2: 412 111011 2 464 17 4 11 0
$`111021`
MSA_id code variable Caucasian African.American Asian Hispanic Other
1: 412 111021 1 2006 43 32 22 61
2: 412 111021 2 559 18 6 10 0
I want to reformat elements of this list (data.tables) based on their values.
From my code, the elements of the reformatted list should look like this:
First Element:
[,1] [,2]
[1,] 64 2
[2,] 464 32
Second Element
Caucasian African.American Asian Hispanic
1: 2006 43 32 22
2: 559 18 6 10
Algorithm for this is:
Remove first 3 columns and the last column.
If minimum value of Caucasian is 0, or sum of minimum values of rest
3 (that is:African.American,Asian,Hispanic) categories is 0, then
set the element as NA.
Else if minimum of African.American is 0 or sum of minimum values of
Asian and Hispanic is 0, then sum up African.American, Asian, and
Hispanic as single category.
Else if minimum value of Asian is 0 or minimum value of Hispanic is
0, sum up Asian and Hispanic as single category.
Else keep the format as it is.
I created a function to do it. When I use this function on one element at a time, it works fine, but when I use lapply, it breaks.
formatTable <- function(z){
a = z[[1]]
b = a[,list(Caucasian,African.American,Asian,Hispanic),] # Deleting columns 1,2,3 and 8
if ( min(b$Caucasian) == 0) {
formatTable=NA
} else if ( (min(b$African.American) + min(b$Asian) + min(b$Hispanic)) == 0) {
formatTable=NA
} else if ( (min(b$African.American) == 0) | (min(b$Asian) + min(b$Hispanic)==0)) {
formatTable = cbind(b$Caucasian, b$African.American+b$Asian+b$Hispanic)
} else if ( min(b$Asian)==0 | min(b$Hispanic)==0) {
formatTable = cbind(b$Caucasian, b$African.American, b$Asian+b$Hispanic)
} else
formatTable = b
}
Using this function, t1=formatTable(z[1]) and t2=formatTable(z[2]) gives correct result, however if I use tbls = lapply(z[1:2],formatTable) it says Error in FUN(X[[1L]], ...) : object 'Caucasian' not found.
Please help on why lapply throws this error.
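A likely cause, judging from the code shown: lapply passes each element of the list itself, i.e. z[[i]], to the function, whereas formatTable(z[1]) passes a one-element list. Inside the function, z[[1]] then extracts the first column (MSA_id) rather than the table, so the column lookup fails. Below is a minimal sketch of a version that takes the table directly. It uses base data frames instead of data.table to stay self-contained, and format_table is a hypothetical name.

```r
mydata <- read.table(header = TRUE, text = "
MSA_id code variable Caucasian African.American Asian Hispanic Other
412 111011 1 64 2 0 0 0
412 111011 2 464 17 4 11 0
412 111021 1 2006 43 32 22 61
412 111021 2 559 18 6 10 0
412 111031 1 56 1 0 0 0
412 111031 2 1 0 0 0 0")
z <- split(mydata, mydata$code)

format_table <- function(tbl) {           # tbl is one table, not a list
  b <- tbl[, c("Caucasian", "African.American", "Asian", "Hispanic")]
  mins <- sapply(b, min)                  # column minima, named by column
  if (mins[["Caucasian"]] == 0 || sum(mins[-1]) == 0) {
    return(NA)
  }
  if (mins[["African.American"]] == 0 ||
      mins[["Asian"]] + mins[["Hispanic"]] == 0) {
    return(cbind(b$Caucasian, b$African.American + b$Asian + b$Hispanic))
  }
  if (mins[["Asian"]] == 0 || mins[["Hispanic"]] == 0) {
    return(cbind(b$Caucasian, b$African.American, b$Asian + b$Hispanic))
  }
  b
}

tbls <- lapply(z, format_table)
```

With the original function, an equivalent change would be to drop the a = z[[1]] line and work on z directly, then call it through lapply(z, formatTable) only.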
I am new to R. I have a data frame like following
>df=data.frame(Id=c("Entry_1","Entry_1","Entry_1","Entry_2","Entry_2","Entry_2","Entry_3","Entry_4","Entry_4","Entry_4","Entry_4"),Start=c(20,20,20,37,37,37,68,10,10,10,10),End=c(50,50,50,78,78,78,200,94,94,94,94),Pos=c(14,34,21,50,18,70,101,35,2,56,67),Hits=c(12,34,17,89,45,87,1,5,6,3,26))
Id Start End Pos Hits
Entry_1 20 50 14 12
Entry_1 20 50 34 34
Entry_1 20 50 21 17
Entry_2 37 78 50 89
Entry_2 37 78 18 45
Entry_2 37 78 70 87
Entry_3 68 200 101 1
Entry_4 10 94 35 5
Entry_4 10 94 2 6
Entry_4 10 94 56 3
Entry_4 10 94 67 26
For each entry I would like to iterate over the data.frame in 3 different modes. For example, for Entry_1, mode_1 = seq(20,50,3), mode_2 = seq(21,50,3) and mode_3 = seq(22,50,3). I would like to sum all the values in column "Hits" whose corresponding values in column "Pos" fall in mode_1, mode_2 or mode_3, and generate a data.frame like the following:
Id Mode_1 Mode_2 Mode_3
Entry_1 0 17 34
Entry_2 87 89 0
Entry_3 1 0 0
Entry_4 26 8 0
I tried the following code:
mode_1=0
mode_2=0
mode_3=0
mode_1_sum=0
mode_2_sum=0
mode_3_sum=0
for(i in dim(df)[1])
{
if(df$Pos[i] %in% seq(df$Start[i],df$End[i],3))
{
mode_1_sum=mode_1_sum+df$Hits[i]
print(mode_1_sum)
}
mode_1=mode_1_sum+counts
print(mode_1)
ifelse(df$Pos[i] %in% seq(df$Start[i]+1,df$End[i],3))
{
mode_2_sum=mode_2_sum+df$Hits[i]
print(mode_2_sum)
}
mode_2_sum=mode_2_sum+counts
print(mode_2)
ifelse(df$Pos[i] %in% seq(df$Start[i]+2,df$End[i],3))
{
mode_3_sum=mode_3_sum+df$Hits[i]
print(mode_3_sum)
}
mode_3_sum=mode_3_sum+counts
print(mode_3_sum)
}
But the above code only prints 26. Can anyone guide me on how to generate my desired output, please? I can provide more details if needed. Thanks in advance.
It's not an elegant solution, but it works.
m <- 3 # Number of modes you want
foo <- ((df$Pos - df$Start)%%m + 1) * (df$Start < df$Pos) * (df$End > df$Pos)
tab <- matrix(0,nrow(df),m)
for(i in 1:m) tab[foo==i,i] <- df$Hits[foo==i]
aggregate(tab,list(df$Id),FUN=sum)
# Group.1 V1 V2 V3
# 1 Entry_1 0 17 34
# 2 Entry_2 87 89 0
# 3 Entry_3 1 0 0
# 4 Entry_4 26 8 0
-- EXPLANATION --
First, we find the rows where df$Pos is both bigger than df$Start and smaller than df$End; in the multiplication these comparisons count as 1 if TRUE and 0 if FALSE. Next, we take the difference between df$Pos and df$Start, take it mod 3 (which gives a vector of 0s, 1s and 2s), and add 1 to get the right mode. We multiply these together, so that the values that fall within the interval retain the right mode, and the values that fall outside the interval become 0.
Next, we create an empty matrix that will contain the values. Then, we use a for-loop to fill in the matrix. Finally, we aggregate the matrix.
I tried looking for a quicker solution, but the main problem I cannot work around is the varying intervals for each row.
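A slightly more compact variant of the same idea is sketched below, using rowsum() to do the per-Id summation. Two things here are assumptions rather than part of the answer above: the bounds are inclusive (seq(Start, End, 3) contains Start itself, so a Pos equal to Start arguably belongs to mode 1), and the column names Mode_1 to Mode_3 are chosen to match the desired output. As an aside, the loop in the question runs only once because for(i in dim(df)[1]) iterates over the single value 11 rather than over 1:11.

```r
df <- data.frame(
  Id    = rep(c("Entry_1", "Entry_2", "Entry_3", "Entry_4"), c(3, 3, 1, 4)),
  Start = rep(c(20, 37, 68, 10), c(3, 3, 1, 4)),
  End   = rep(c(50, 78, 200, 94), c(3, 3, 1, 4)),
  Pos   = c(14, 34, 21, 50, 18, 70, 101, 35, 2, 56, 67),
  Hits  = c(12, 34, 17, 89, 45, 87, 1, 5, 6, 3, 26)
)

m <- 3                                               # number of modes
in_range <- df$Pos >= df$Start & df$Pos <= df$End    # inclusive bounds (assumption)
mode     <- (df$Pos - df$Start) %% m + 1             # 1, 2 or 3

# One column per mode: Hits where the row is in range and in that mode, else 0
hits_by_mode <- sapply(seq_len(m), function(k) df$Hits * (in_range & mode == k))
colnames(hits_by_mode) <- paste0("Mode_", seq_len(m))

# Sum the rows of the matrix within each Id
res <- rowsum(hits_by_mode, df$Id)
res
```

On the example data this reproduces the table from the question; the inclusive bounds only matter when a Pos value coincides with Start or End, which does not happen here.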