This link answers a part of my question: How to randomize (or permute) a dataframe rowwise and columnwise?.
> df1
a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
Column-wise shuffle gives me below output df3, which is reordering the columns
> df3 <- df1[,sample(ncol(df1))]
> df3
c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
What I want is for the column names to change while the data stay in place. The row-wise and column-wise totals remain the same; only the column names get reassigned. Something like df4. How can I achieve this?
> df4
c a b
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
PS: How do I keep the data frame laid out in rows and columns? When I post the question, the formatting collapses.
You might want to just sample the column-names. Something like:
names(df) <- names(df)[sample(ncol(df))]
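As a minimal sketch of that idea on the df1 from the question (the seed and the name df4 are only there for illustration), the values stay where they are and only the labels move:

```r
# Rebuild df1 from the question
df1 <- data.frame(a = c(1, 1, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(0, 0, 0, 0))

set.seed(42)                      # arbitrary seed, only for reproducibility
df4 <- df1
names(df4) <- sample(names(df1))  # shuffle the labels, not the columns

df4
```

Because only names() is touched, the row and column totals of df4 are identical to those of df1.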
I'm preprocessing some data from sensor-created files into the format required for external analysis (ultimately, it needs to be output as a CSV). The end goal is something like this:
1 C3 C4 Cz Pz AllSites 2 C3 C4 Cz Pz AllSites 3 C3 C4 Cz Pz AllSites
50:23.9 0 0 0 0 0 53:15.0 0 0 0 0 0 09:15.0 0 0 0 0 0
50:24.9 1 0 0 1 0 53:16.0 1 0 0 1 0 09:16.1 0 0 1 0 0
50:26.0 1 0 0 0 0 53:17.1 1 0 0 1 0 09:17.1 0 0 1 0 0
50:27.0 1 0 0 1 0 53:18.1 1 1 1 0 0 09:18.1 0 0 1 1 0
50:28.0 0 1 0 0 0 53:19.2 1 0 0 0 0 09:19.2 0 0 1 0 0
50:29.1 1 1 1 1 1 53:20.2 1 0 0 1 0 09:20.2 0 0 1 0 0
50:30.2 0 1 1 0 0 53:21.2 1 0 0 0 0 09:21.2 0 0 0 1 0
50:31.2 0 0 0 0 0 53:22.3 0 0 0 0 0 09:22.3 0 0 0 1 0
Each set of columns is data from one session. The only catch is that sessions are of unequal length (and thus each group has a different number of observations), so at the moment, it's all in a list instead of a data frame. I have found a few different ways of exporting to CSV (e.g., this question), but they all involve converting to a data frame first. How do I export a list to CSV without converting it to a data frame first?
N.B.: I also found a bunch of questions about exporting a list of data frames to a series of CSV files, but for this application, all the data frames need to be in a single CSV.
Let's make some simple samples:
b1 = data.frame(C3=sample(c(0,1),8,TRUE),C4=sample(c(0,1),8,TRUE),Cz=sample(c(0,1),8,TRUE))
b2 = data.frame(C3=sample(c(0,1),3,TRUE),C4=sample(c(0,1),3,TRUE),Cz=sample(c(0,1),3,TRUE))
b3 = data.frame(C3=sample(c(0,1),8,TRUE),C4=sample(c(0,1),8,TRUE),Cz=sample(c(0,1),8,TRUE))
You can't just column-bind them and hope R pads out the shorter columns:
> cbind(b1,b2,b3)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 8, 3
So we need to paste them into a big enough data frame. Let's make one full of NAs to start:
b = data.frame(matrix(NA, ncol=ncol(b1)+ncol(b2)+ncol(b3), nrow=max(nrow(b1),nrow(b2),nrow(b3))))
dim(b)
[1] 8 9
Then this code puts each b data frame in the right place. Each one is a bit further along:
> b[1:nrow(b1),1:ncol(b1)]=b1
> b[1:nrow(b2),(1:ncol(b2))+ncol(b1)]=b2
> b[1:nrow(b3),(1:ncol(b3))+ncol(b1)+ncol(b2)]=b3
> b
X1 X2 X3 X4 X5 X6 X7 X8 X9
1 1 1 1 1 0 0 0 0 1
2 1 1 0 0 0 0 0 1 0
3 0 0 1 0 1 1 0 1 1
4 1 1 1 NA NA NA 1 1 1
5 0 0 0 NA NA NA 0 0 0
6 0 1 0 NA NA NA 1 0 1
7 0 0 0 NA NA NA 1 1 1
8 0 1 0 NA NA NA 1 1 1
Easy enough to generalise in a loop over a list. Now:
> write.csv(b,na="")
"","X1","X2","X3","X4","X5","X6","X7","X8","X9"
"1",1,1,1,1,0,0,0,0,1
"2",1,1,0,0,0,0,0,1,0
"3",0,0,1,0,1,1,0,1,1
"4",1,1,1,,,,1,1,1
"5",0,0,0,,,,0,0,0
"6",0,1,0,,,,1,0,1
"7",0,0,0,,,,1,1,1
"8",0,1,0,,,,1,1,1
That gives us empty cells where the shorter session ran out of rows. You will probably need to fiddle about to get the column headers back and repeated, but that's easy enough...
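The loop over a list mentioned above could look roughly like the sketch below. The helper name pad_cbind and the sample list sessions are made up for illustration, and the output goes to a temporary file rather than a real path. It also restores the repeated column headers.

```r
# Hypothetical helper: place data frames of unequal length side by side,
# padding the shorter ones with NA.
pad_cbind <- function(lst) {
  n_rows <- max(vapply(lst, nrow, 1L))
  n_cols <- vapply(lst, ncol, 1L)
  out <- data.frame(matrix(NA, nrow = n_rows, ncol = sum(n_cols)))
  offset <- 0L
  for (i in seq_along(lst)) {
    cols <- offset + seq_len(n_cols[i])            # this block's columns
    out[seq_len(nrow(lst[[i]])), cols] <- lst[[i]]
    offset <- offset + n_cols[i]
  }
  # Repeat the original headers (duplicate names are allowed here)
  names(out) <- unlist(lapply(lst, names), use.names = FALSE)
  out
}

# Toy sessions of unequal length
sessions <- list(
  data.frame(C3 = c(1, 0, 1), C4 = c(0, 0, 1)),
  data.frame(C3 = 1, C4 = 0)
)
wide <- pad_cbind(sessions)

csv_file <- tempfile(fileext = ".csv")  # stand-in for the real output file
write.csv(wide, csv_file, na = "", row.names = FALSE)
```

With na = "" the padding shows up as empty cells in the CSV, and row.names = FALSE drops the leading column of row numbers.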
Not sure if this is what you need... but it's a shot...
a <- data.frame(small=letters)
b <- data.frame(big=LETTERS)
l <- list(a=a, b=b)
sapply(names(l), function(x)write.csv(l[[x]], file=paste0(x, ".csv")))
# or maybe all in the same file...
sapply(names(l), function(x)write.table(l[[x]], file="c.csv", append=TRUE, sep=","))
A CSV file is most often used to export data in tabular form, so it maps naturally onto R's data.frame objects. list objects are far more general, and their flexibility is more than a simple CSV format can handle in many cases.
In your case you do have a list, but its components are data frames that (apparently) share the same structure (same number and names of columns). So it's pretty trivial to join them all into just one data frame. You only need an additional column that indicates the session. So, if mylist is your list, you can try:
mydf<-do.call(rbind,mylist)
elLength<-vapply(mylist,nrow,1L)  # rows per session (length() would count columns)
mydf$Session<-rep(seq_along(mylist),times=elLength)
In this way you end up with a single data frame and you can extract the session through the Session column. You can use write.csv to export it to a CSV file.
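A small worked example of this long layout, assuming a toy mylist with two sessions of different length (the data and the temporary output file are invented for illustration):

```r
# Two toy sessions with the same columns but different numbers of rows
mylist <- list(
  data.frame(C3 = c(0, 1),    C4 = c(1, 1)),
  data.frame(C3 = c(1, 0, 0), C4 = c(0, 0, 1))
)

mydf <- do.call(rbind, mylist)                  # stack the sessions
n_obs <- vapply(mylist, nrow, 1L)               # rows per session
mydf$Session <- rep(seq_along(mylist), times = n_obs)

out_file <- tempfile(fileext = ".csv")          # stand-in for the real path
write.csv(mydf, out_file, row.names = FALSE)

# Reading it back and splitting by Session recovers one piece per session
back <- split(read.csv(out_file), read.csv(out_file)$Session)
```

This gives one row per observation rather than the side-by-side layout in the question, so whether it fits depends on what the external analysis expects.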
I'm looking to change the value of a certain entry in a matrix based on the value of another entry. It's easiest to explain with an example:
Matrix
ABC-DEF 1 0 0 0
HIJ-KLM 0 0 0 0
NOP-QRS 1 0 0 0
KLM-HIJ 0 0 0 0
DEF-ABC 0 0 0 0
QRS-NOP 0 0 0 0
As you can see, each of the rows in the matrix above has a counterpart (e.g. ABC-DEF's counterpart is DEF-ABC).
Is there some way in which I can look to see which rows have a one in the first column and then place a 2 in the fourth column of its counterpart? In the above example then:
ABC-DEF 1 0 0 0
HIJ-KLM 0 0 0 0
NOP-QRS 1 0 0 0
KLM-HIJ 0 0 0 0
DEF-ABC 0 0 0 2
QRS-NOP 0 0 0 2
I'm quite stuck and would really appreciate any help!
Thanks!
Assuming your column names are V1,...,V5, you can do something like this :
values <- d$V1[d$V2==1]
d$V5[d$V1 %in% gsub("(...)-(...)","\\2-\\1", values)] <- 2
Which will give :
V1 V2 V3 V4 V5
1 ABC-DEF 1 0 0 0
2 HIJ-KLM 0 0 0 0
3 NOP-QRS 1 0 0 0
4 KLM-HIJ 0 0 0 0
5 DEF-ABC 0 0 0 2
6 QRS-NOP 0 0 0 2
If, instead of a data frame, your data is a numeric matrix m with row names, you can do :
values <- rownames(m)[m[,1]==1]
m[rownames(m) %in% gsub("(...)-(...)","\\2-\\1", values),4] <- 2
EDIT : To understand what the code is doing, you must see that :
gsub("(...)-(...)","\\2-\\1", values)
will replace any character string in the values vector of the form XXX-YYY with YYY-XXX via regexp matching. The result is a character vector of the "counterparts" of values. Then we use %in% to select all rows whose row names appear among these counterpart values, and assign 2 in the fourth column of those rows.
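To see the matrix version run end to end, here is a sketch that rebuilds the example matrix from the question. The helper swap_halves is a made-up name, and it uses a slightly more general pattern than "(...)-(...)" so that the two halves need not be exactly three characters long (it splits at the last hyphen).

```r
# Rebuild the example matrix with row names and four numeric columns
ids <- c("ABC-DEF", "HIJ-KLM", "NOP-QRS", "KLM-HIJ", "DEF-ABC", "QRS-NOP")
m <- matrix(0, nrow = 6, ncol = 4, dimnames = list(ids, NULL))
m[c("ABC-DEF", "NOP-QRS"), 1] <- 1

# Hypothetical helper: "XXX-YYY" -> "YYY-XXX"
swap_halves <- function(x) sub("^(.*)-(.*)$", "\\2-\\1", x)

values <- rownames(m)[m[, 1] == 1]               # rows with a 1 in column 1
m[rownames(m) %in% swap_halves(values), 4] <- 2  # mark their counterparts

m
```

Rows DEF-ABC and QRS-NOP end up with a 2 in the fourth column, and all other rows are unchanged.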
I have a data frame with 39 columns (and upward of 100000 rows) whose last eleven columns look like this (the rest of the columns do not concern my question):
H3K27me3_gross_bin H3K4me3_gross_bin H3K4me1_gross_bin UtoP UtoM UPU UPP UPM UMU UMP UMM
cg00000029 3 3 6 1 1 0 0 0 0 0 0
cg00000321 6 1 5 1 0 0 1 0 0 0 0
cg00000363 6 1 1 1 0 1 0 0 0 0 0
cg00000622 1 2 1 0 0 0 0 0 0 0 0
cg00000714 2 5 6 1 0 0 0 0 0 0 0
cg00000734 2 6 2 0 0 0 0 0 0 0 0
I want to create a matrix that will:
a) count the number of rows in which the value of column UPU, UPP or UPM is 1, grouped by each of the first three columns (H3K27me3_gross_bin, H3K4me3_gross_bin, H3K4me1_gross_bin)
b) sum the columns UPU, UPP and UPM across each row, again grouped by the first three columns
I came up with this incredibly cumbersome way of doing this:
UtoPFrac<-seq(6)
UtoPTotEvents<-seq(6)
for (j in 1:3){
  y<-df[,28+j]
  for (i in 1:3){
    UtoPFrac<-cbind(UtoPFrac,tapply(df[which(is.na(y)==FALSE),33+i],y[which(is.na(y)==FALSE)], function(x) length(which(x==1))))
  }
}
UtoPFrac<-UtoPFrac[,2:10]
UtoPEvents<-cbind(rowSums(UtoPFrac[,1:3]),rowSums(UtoPFrac[,4:6]),rowSums(UtoPFrac[,7:9]))
I am certain there is a more elegant way of doing this, probably by using aggregate() or ddply(), but I was unable to get this working.
I would appreciate any help doing this more efficiently.
Thanks in advance
Not tested:
library(plyr)
ddply(df,.(H3K27me3_gross_bin, H3K4me3_gross_bin, H3K4me1_gross_bin), summarize, UPUl=length(UPU[which(UPU==1)]),UPPl=length(UPP[which(UPP==1)]),UPMl=length(UPM[which(UPM==1)]), mysum=sum( UPU + UPP + UPM))
P.S. If you dput the data and provide the expected output, I will test the above code
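For comparison, here is a base-R sketch using aggregate() instead of plyr. It reads the question as wanting one summary per bin column (as the original loop does), not one per combination of the three bins, which is an assumption. The toy df only reproduces the six rows shown in the question.

```r
# The six example rows from the question (only the relevant columns)
df <- data.frame(
  H3K27me3_gross_bin = c(3, 6, 6, 1, 2, 2),
  H3K4me3_gross_bin  = c(3, 1, 1, 2, 5, 6),
  H3K4me1_gross_bin  = c(6, 5, 1, 1, 6, 2),
  UPU = c(0, 0, 1, 0, 0, 0),
  UPP = c(0, 1, 0, 0, 0, 0),
  UPM = c(0, 0, 0, 0, 0, 0)
)

bins   <- c("H3K27me3_gross_bin", "H3K4me3_gross_bin", "H3K4me1_gross_bin")
events <- c("UPU", "UPP", "UPM")

# One summary data frame per bin column
counts <- lapply(setNames(bins, bins), function(b) {
  # (a) rows per bin value where each event column equals 1
  res <- aggregate(df[events] == 1, by = df[b], FUN = sum, na.rm = TRUE)
  # (b) total events per bin value across the three event columns
  res$total <- rowSums(res[events])
  res
})

counts$H3K27me3_gross_bin
```

aggregate() drops rows whose grouping value is NA, which mirrors the is.na() filtering in the original loop.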
I am trying to produce a simple crosstable in R and have that exported to latex using knitr in Rstudio.
I want the table to look like a publishable table, with a row header, a column header, and subheaders for each category of the variable in the column. Since my table has identical categories for rows and columns, I wish to replace the column-level headers with numbers. See the example below:
Profession Mother
ProfessionFather 1. 2. 3.
1. Bla frequency frequency frequency
2. blahabblab
3. blahblahblah
I am getting close with 'xtable' (but I can't get the row and column headers to print, and there is no multicolumn header) and with the 'tables' package (but I can't replace the column categories with numbers).
Minimal example:
library(xtable) # for xtable()
library(tables) # for tabular()
work1 <- paste("LongString", 1:10, sep="")
work2 <- paste("LongString", 1:10, sep="")
t <- table(work1, work2) # making table
t # table with repeated row/column names
colnames(t) <- paste(1:10, ".", sep="") # replacing column names with numeric values
xtable(t) # headers are omitted for both rows and columns
work <- data.frame(work1, work2, stringsAsFactors=TRUE) # prepare for use of tabular; strings as factors (the pre-R-4.0 default)
tabular((FathersProfession=work1) ~ (MothersProfession=work2), data=work) # have headers, but no way to change column categories from "LongString"x to numeric.
You need to assign the output of the tabular function to a named object:
tb <- tabular((FathersProfession=work1) ~ (MothersProfession=work2), data=work)
str(tb)
The str() output shows that the data is in a list and that the column names are in the attribute that begins:
- attr(*, "colLabels")= chr [1:2, 1:10] "MothersProfession" "LongString1" NA "LongString10" ...
So
attr(tb, "colLabels") <-
gsub("LongString", "" , attr(tb, "colLabels") )
This is then the output to the screen; the output to a LaTeX device would look different.
> tb
MothersProfession
FathersProfession 1 10 2 3 4 5 6 7 8 9
LongString1 1 0 0 0 0 0 0 0 0 0
LongString10 0 1 0 0 0 0 0 0 0 0
LongString2 0 0 1 0 0 0 0 0 0 0
LongString3 0 0 0 1 0 0 0 0 0 0
LongString4 0 0 0 0 1 0 0 0 0 0
LongString5 0 0 0 0 0 1 0 0 0 0
LongString6 0 0 0 0 0 0 1 0 0 0
LongString7 0 0 0 0 0 0 0 1 0 0
LongString8 0 0 0 0 0 0 0 0 1 0
LongString9 0 0 0 0 0 0 0 0 0 1
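Note that the columns above come out in alphabetical order (1, 10, 2, ...), because that is how the factor levels sort. A base-R sketch of one way around this, assuming the intended order is LongString1 to LongString10, is to set the levels explicitly before tabulating. It uses only table(), not the tables package, and does not cover the LaTeX export; the names lv and tab are made up.

```r
# Intended category order (assumed: 1, 2, ..., 10)
lv <- paste0("LongString", 1:10)
work1 <- lv
work2 <- lv

# Explicit levels keep LongString10 last instead of second
tab <- table(FathersProfession = factor(work1, levels = lv),
             MothersProfession = factor(work2, levels = lv))

# Number the columns, and prefix the rows with the same numbers as a key
colnames(tab) <- paste0(seq_along(lv), ".")
rownames(tab) <- paste0(seq_along(lv), ". ", lv)

tab
```

With the levels fixed this way, column "2." really corresponds to LongString2, which is not the case if the numbers are pasted onto alphabetically sorted columns.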