I'm looking to change the value of a certain entry in a matrix based on the value of another entry. It's easiest to explain with an example:
Matrix
ABC-DEF 1 0 0 0
HIJ-KLM 0 0 0 0
NOP-QRS 1 0 0 0
KLM-HIJ 0 0 0 0
DEF-ABC 0 0 0 0
QRS-NOP 0 0 0 0
As you can see, each of the rows in the matrix above has a counterpart (e.g. ABC-DEF's counterpart is DEF-ABC).
Is there some way to find which rows have a 1 in the first column and then place a 2 in the fourth column of their counterparts? In the above example, the result would be:
ABC-DEF 1 0 0 0
HIJ-KLM 0 0 0 0
NOP-QRS 1 0 0 0
KLM-HIJ 0 0 0 0
DEF-ABC 0 0 0 2
QRS-NOP 0 0 0 2
I'm quite stuck and would really appreciate any help!
Thanks!
Assuming your column names are V1, ..., V5 (with the labels in V1), you can do something like this:
values <- d$V1[d$V2==1]
d$V5[d$V1 %in% gsub("(...)-(...)","\\2-\\1", values)] <- 2
This gives:
V1 V2 V3 V4 V5
1 ABC-DEF 1 0 0 0
2 HIJ-KLM 0 0 0 0
3 NOP-QRS 1 0 0 0
4 KLM-HIJ 0 0 0 0
5 DEF-ABC 0 0 0 2
6 QRS-NOP 0 0 0 2
If, instead of a data frame, your data is a numeric matrix m with row names, you can do:
values <- rownames(m)[m[,1]==1]
m[rownames(m) %in% gsub("(...)-(...)","\\2-\\1", values),4] <- 2
EDIT: To understand what the code is doing, note that
gsub("(...)-(...)","\\2-\\1", values)
will replace any character string in the values vector of the form XXX-YYY with YYY-XXX via regex matching (each (...) group matches exactly three characters). The result is a character vector of the "counterparts" of values. Then we use %in% to select every row whose row name appears among these counterparts, and assign 2 in the fourth column of those rows.
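A minimal self-contained sketch of the matrix version, rebuilding the example above. As an assumption beyond the original answer, it uses the pattern ^([^-]+)-([^-]+)$, which swaps the two halves whatever their length, instead of (...)-(...), which only works when both halves have exactly three characters.

```r
# Rebuild the example matrix with the labels as row names
labels <- c("ABC-DEF", "HIJ-KLM", "NOP-QRS", "KLM-HIJ", "DEF-ABC", "QRS-NOP")
m <- matrix(0, nrow = 6, ncol = 4, dimnames = list(labels, NULL))
m[c("ABC-DEF", "NOP-QRS"), 1] <- 1

# Row names of the rows with a 1 in the first column
flagged <- rownames(m)[m[, 1] == 1]

# Swap "XXX-YYY" to "YYY-XXX"; [^-]+ allows halves of any length
counterparts <- sub("^([^-]+)-([^-]+)$", "\\2-\\1", flagged)

# Put a 2 in the fourth column of every counterpart row
m[rownames(m) %in% counterparts, 4] <- 2
m
```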
Related
I have successfully imported my csv file into R. It is a 6 by 6 matrix.
0 0 0 0 0 0
0 1 0 0 0 0
0 1 1 0 0 0
0 1 0 0 0 1
0 1 0 1 0 0
1 1 1 1 1 1
I am looking for a function that will allow me to calculate which rows have the value '1' exactly twice.
I know 3 of the rows contain '1' exactly twice, so I would like to print '3'.
Is there any function that will allow me to achieve this?
We can use rowSums to get the sum of each row (which counts the 1s because the matrix only holds 0s and 1s), turn it into a logical vector with a comparison operator, and get the positions by wrapping it in which:
which(rowSums(m1) == 2)
If you want the count instead, use sum:
sum(rowSums(m1) == 2)
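A slightly more robust variant compares with 1 before summing, so values other than 0 and 1 would not distort the count. The sketch below types in the matrix from the question by hand; if read.csv() gave you a data frame instead, rowSums() works on it directly as long as all columns are numeric.

```r
m1 <- matrix(c(0, 0, 0, 0, 0, 0,
               0, 1, 0, 0, 0, 0,
               0, 1, 1, 0, 0, 0,
               0, 1, 0, 0, 0, 1,
               0, 1, 0, 1, 0, 0,
               1, 1, 1, 1, 1, 1), nrow = 6, byrow = TRUE)

# m1 == 1 is a logical matrix; rowSums() counts the TRUEs in each row
ones_per_row <- rowSums(m1 == 1)

which(ones_per_row == 2)  # rows with exactly two 1s: 3 4 5
sum(ones_per_row == 2)    # number of such rows: 3
```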
I have a vector of tagged words like c("#142#856#856.2#745", NA, "#856#855", NA, "#685", "#663", "#965.23", "#855#658#744#122").
The codes are separated by '#'. I would like to create a data frame with one column for each distinct code, and then write 1 or 0 (or NA) depending on whether that code is in that row or not.
The idea is that each element becomes a row and each code becomes a column; the cell is 1 if the code is in that element and 0 if it is not.
ID | 142 | 856 |856.2 | ... | 122 |
1 | 1 | 1 | 1 | ... | 0 |
2 | 0 | 0 | 0 | ... | 0 |
...
I know how to do this with a complex algorithm and plenty of loops, but is there an easier way?
You can accomplish this fairly easily using stringr:
# First we load the package
library(stringr)
# Then we create your example data vector
tagged_vector <- c('#142#856#856.2#745', NA, '#856#855', NA, '#685', '#663',
'#965.23', '#855#658#744#122')
# Next we need to get all the unique codes
# stringr's str_extract_all() can do this:
all_codes <- str_extract_all(string=tagged_vector, pattern='(?<=#)[0-9\\.]+')
# We just looked for one or more numbers and/or dots following a '#' character
# Now we just want the unique ones:
unique_codes <- unique(na.omit(unlist(all_codes)))
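# Then, for each code, check whether it appears as a whole code in each element
# (exact matching with %in%, so e.g. "856" does not also match "#856.2" the way grepl() would)
# I've also used as.numeric() since you want 0/1 instead of TRUE/FALSE
result <- data.frame(sapply(unique_codes, function(code){
  as.numeric(sapply(all_codes, function(codes) code %in% codes))
}))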
# Then we add in your ID column and move it to the front:
result$ID <- 1:nrow(result)
result <- result[ , c(ncol(result), 1:(ncol(result)-1))]
The result is
ID X142 X856 X856.2 X745 X855 X685 X663 X965.23 X658 X744 X122
1 1 1 1 1 1 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0
3 3 0 1 0 0 1 0 0 0 0 0 0
4 4 0 0 0 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 1 0 0 0 0 0
6 6 0 0 0 0 0 0 1 0 0 0 0
7 7 0 0 0 0 0 0 0 1 0 0 0
8 8 0 0 0 0 1 0 0 0 1 1 1
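You may notice an "X" in front of each code in the column names. That's because data.frame() runs the names through make.names() by default, and a syntactically valid R name cannot begin with a digit; pass check.names = FALSE to data.frame() if you want the bare codes.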
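If you would rather avoid the stringr dependency, here is a base-R sketch of the same idea. It assumes the codes never contain '#' themselves: it splits each element on '#', drops the empty strings and NAs, matches codes exactly with %in%, and uses check.names = FALSE so the column names stay as the bare codes. NA elements end up as rows of 0s, as in the stringr version.

```r
tagged_vector <- c("#142#856#856.2#745", NA, "#856#855", NA, "#685", "#663",
                   "#965.23", "#855#658#744#122")

# Split each element on '#'; strsplit() returns NA for NA elements
codes_per_row <- strsplit(tagged_vector, "#", fixed = TRUE)

# Drop the empty string before the leading '#' and any NA
codes_per_row <- lapply(codes_per_row, function(x) x[!is.na(x) & nzchar(x)])

# Distinct codes in order of first appearance
unique_codes <- unique(unlist(codes_per_row))

# One row per element, one 0/1 column per code
hits <- t(vapply(codes_per_row,
                 function(x) as.integer(unique_codes %in% x),
                 integer(length(unique_codes))))
colnames(hits) <- unique_codes

result <- data.frame(ID = seq_along(tagged_vector), hits, check.names = FALSE)
result
```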
I have this data frame, called out:
Dates Consumer Staples Energy Financials Health Care
1 12/31/99 0 0 0 0 0
2 03/31/00 0 0 0 0 0
3 06/30/00 0 0 0 0 0
4 09/30/00 0 0 0 0 0
5 12/31/00 0 0 0 0 0
6 03/31/01 1000 0 0 50 0
7 06/30/01 0 0 0 0 0
I would like to compute the weights of each category on each row,
but I need to avoid summing the first column, which is a date:
Weights <- round(out[2:6]/rowSums(out[2:6])*100, 2)
1. Is there a way to keep the dates in the first column and compute the weights of the next 5 columns in the same data set?
2. When a date has only zeros, how can I avoid the NAs?
Thank you for your help
outN <- out[,-1]
rownames(outN) <- out[,1]
Cap_Weights <- round(outN/rowSums(outN)*100, 2)
Cap_Weights[is.na(Cap_Weights)] <- 0
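The answer above moves the dates into the row names. To keep them as the first column of the same data frame (question 1), one option is to compute the weights on every column except the first and bind the dates back on. This sketch uses two rows from the question and hypothetical column names C1 to C5, since the header shown above is incomplete. Dividing by a zero row sum gives NaN, which is.na() also catches, so the all-zero date gets weights of 0.

```r
# Two rows from the question; C1..C5 are hypothetical column names
out <- data.frame(Dates = c("12/31/99", "03/31/01"),
                  C1 = c(0, 1000), C2 = c(0, 0), C3 = c(0, 0),
                  C4 = c(0, 50), C5 = c(0, 0))

# Weights in percent of each row total, skipping the Dates column
w <- round(out[-1] / rowSums(out[-1]) * 100, 2)

# 0/0 gives NaN for all-zero rows; is.na() is TRUE for NaN as well
w[is.na(w)] <- 0

# Put the dates back in front
weights <- cbind(out["Dates"], w)
weights
```

Alternatively, out[-1] <- w overwrites the values in place and leaves Dates where it is.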
I have a 39-column data frame (upwards of 100,000 rows) whose last eleven columns look like this (the rest of the columns do not concern my question):
H3K27me3_gross_bin H3K4me3_gross_bin H3K4me1_gross_bin UtoP UtoM UPU UPP UPM UMU UMP UMM
cg00000029 3 3 6 1 1 0 0 0 0 0 0
cg00000321 6 1 5 1 0 0 1 0 0 0 0
cg00000363 6 1 1 1 0 1 0 0 0 0 0
cg00000622 1 2 1 0 0 0 0 0 0 0 0
cg00000714 2 5 6 1 0 0 0 0 0 0 0
cg00000734 2 6 2 0 0 0 0 0 0 0 0
I want to create a matrix that will:
a) count, for each value of each of the first three columns (H3K27me3_gross_bin, H3K4me3_gross_bin, H3K4me1_gross_bin), the number of rows in which UPU, UPP or UPM is 1
b) sum UPU, UPP and UPM over the rows, again for each value of each of the first three columns
I came up with this incredibly cumbersome way of doing this:
UtoPFrac<-seq(6)
UtoPTotEvents<-seq(6)
for (j in 1:3){
y<-df[,28+j]
for (i in 1:3){
UtoPFrac<-cbind(UtoPFrac,tapply(df[which(is.na(y)==FALSE),33+i],y[which(is.na(y)==FALSE)], function(x) length(which(x==1))))
}
}
UtoPFrac<-UtoPFrac[,2:10]
UtoPEvents<-cbind(rowSums(UtoPFrac[,1:3]),rowSums(UtoPFrac[,4:6]),rowSums(UtoPFrac[,7:9]))
I am certain there is a more elegant way of doing this, probably using aggregate() or ddply(), but I was unable to get it working.
I would appreciate any help doing this more efficiently.
Thanks in advance
Not tested:
library(plyr)
ddply(df, .(H3K27me3_gross_bin, H3K4me3_gross_bin, H3K4me1_gross_bin), summarize,
      UPUl = sum(UPU == 1, na.rm = TRUE), UPPl = sum(UPP == 1, na.rm = TRUE),
      UPMl = sum(UPM == 1, na.rm = TRUE), mysum = sum(UPU + UPP + UPM))
P.S. If you dput() the data and provide the expected output, I will test the above code.
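Note that ddply() as written groups by the combination of all three bin columns, whereas the loop in the question tabulates by each bin column separately. The sketch below does the latter with base R's aggregate(), once per bin column. It only uses the six sample rows from the question, typed in by hand, so treat it as untested on the full data. Bin values that are NA are dropped by aggregate(), which matches the is.na(y) filtering in the loop.

```r
# The relevant columns of the six sample rows from the question
df <- data.frame(
  H3K27me3_gross_bin = c(3, 6, 6, 1, 2, 2),
  H3K4me3_gross_bin  = c(3, 1, 1, 2, 5, 6),
  H3K4me1_gross_bin  = c(6, 5, 1, 1, 6, 2),
  UPU = c(0, 0, 1, 0, 0, 0),
  UPP = c(0, 1, 0, 0, 0, 0),
  UPM = c(0, 0, 0, 0, 0, 0)
)

bins   <- c("H3K27me3_gross_bin", "H3K4me3_gross_bin", "H3K4me1_gross_bin")
events <- c("UPU", "UPP", "UPM")

counts <- lapply(bins, function(b) {
  # (a) per level of bin column b, count the rows where each event column is 1
  tab <- aggregate(df[events] == 1, by = df[b], FUN = sum, na.rm = TRUE)
  # (b) total number of events per level
  tab$total <- rowSums(tab[events])
  tab
})
names(counts) <- bins
counts
```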
I am trying to produce a simple cross table in R and export it to LaTeX using knitr in RStudio.
I want the table to look publishable, with a row header, a column header, and subheaders for each category of the column variable. Since my table has identical categories for rows and columns, I want to replace the column-level headers with numbers. See the example below:
Profession Mother
Profession Father 1. 2. 3.
1. Bla frequency frequency frequency
2. blahabblab
3. blahblahblah
I am getting close with 'xtable' (I can't get the row and column headers to print, nor a multicolumn header) and with the 'tables' package (I can't replace the column categories with numbers).
Minimal example:
library(xtable)
library(tables)
work1 <- paste("LongString", 1:10, sep="")
work2 <- paste("LongString", 1:10, sep="")
t <- table(work1, work2) # making table
t # table with repeated row/column names
colnames(t) <- paste(1:10, ".", sep="") # replacing column names with numeric values
xtable(t) # headers are omitted for both rows and columns
work <- data.frame(work1, work2, stringsAsFactors = TRUE) # prepare for use of tabular (factors, as in R < 4.0)
tabular((FathersProfession=work1) ~ (MothersProfession=work2), data=work) # has headers, but no way to change the column categories from "LongString"x to numbers
You need to assign the output of the tabular function to a named object:
tb <- tabular((FathersProfession=work1) ~ (MothersProfession=work2), data=work)
str(tb)
The str() output shows that the data are stored in a list and that the column names are in the attribute that begins:
- attr(*, "colLabels")= chr [1:2, 1:10] "MothersProfession" "LongString1" NA "LongString10" ...
So you can strip the prefix with:
attr(tb, "colLabels") <-
gsub("LongString", "" , attr(tb, "colLabels") )
This gives the following output to the screen; the output to a LaTeX device would look different.
> tb
MothersProfession
FathersProfession 1 10 2 3 4 5 6 7 8 9
LongString1 1 0 0 0 0 0 0 0 0 0
LongString10 0 1 0 0 0 0 0 0 0 0
LongString2 0 0 1 0 0 0 0 0 0 0
LongString3 0 0 0 1 0 0 0 0 0 0
LongString4 0 0 0 0 1 0 0 0 0 0
LongString5 0 0 0 0 0 1 0 0 0 0
LongString6 0 0 0 0 0 0 1 0 0 0
LongString7 0 0 0 0 0 0 0 1 0 0
LongString8 0 0 0 0 0 0 0 0 1 0
LongString9 0 0 0 0 0 0 0 0 0 1
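The columns above come out in the order 1, 10, 2, ... because the factor levels are sorted alphabetically. Here is a base-R sketch of one way around this, without the tables or xtable packages: give the factors explicit levels, name the table() arguments so they become the row and column headers, and then replace the column categories with numbers. I would expect the same explicit-levels trick to fix the column order in tabular() too, but I have not checked that.

```r
# Explicit levels keep LongString2 before LongString10 (default sorting is alphabetical)
lev <- paste0("LongString", 1:10)
work1 <- factor(paste0("LongString", 1:10), levels = lev)
work2 <- factor(paste0("LongString", 1:10), levels = lev)

# Named arguments become the row and column header names of the table
tab <- table(FathersProfession = work1, MothersProfession = work2)

# Replace the column categories with "1.", "2.", ...; the header names are kept
colnames(tab) <- paste0(seq_len(ncol(tab)), ".")
tab
```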