I have this line in one of my functions: result[result>0.05] <- "". It replaces all values in my data frame greater than 0.05, including the row names in the first column. How can I avoid this?
A fast way is to leave the first column out of both the subset and the comparison, so the labels are never touched:
df <- as.data.frame(matrix(runif(100),nrow=10))
df[-1][df[-1]>0.05] <- ''
Output:
> df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0.60105471
2 0.63340567
3 0.11625581
4 0.96227379 0.0173133104108274
5 0.07333583
6 0.05474430 0.0228175506927073
7 0.62610309
8 0.76867090
9 0.76684615 0.0459537433926016
10 0.83312158
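If this runs inside a function, the same idea can be wrapped so the label column is never touched. This is a minimal sketch; the helper name blank_above and its cutoff argument are hypothetical, and it assumes the labels live in the first column while the remaining columns are numeric:

```r
# Hypothetical helper: replace values above `cutoff` with "" in every column
# except the first, which is assumed to hold the row labels.
blank_above <- function(result, cutoff = 0.05) {
  vals <- result[-1]            # drop the label column before comparing
  vals[vals > cutoff] <- ""     # affected columns are coerced to character
  result[-1] <- vals            # put the modified columns back
  result
}

df <- data.frame(id = c("a", "b"), x = c(0.01, 0.9), y = c(0.5, 0.02),
                 stringsAsFactors = FALSE)
out <- blank_above(df)
out
```

Because the comparison only ever sees result[-1], the first column keeps its original values and type.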
I have a list of data frames. Each data frame has 6 rows and 6 columns. The values are all numbers, but in every data frame the elements are stored as character.
Example:
$`A`
V1 V2 V3 V4 V5 V6
V1 0.1212 0.6231 0.4431 0.3213 0.6578 0.1259
V2 2.1234 0.6532 0.9845 0.8743 0.8732
V3 0.2314 0.7648 0.7634 0.8732
V4 0.1234 0.6544 0.3456
V5 0.7653 0.9812
V6 0.1265
$`B`
V1 V2 V3 V4 V5 V6
V1 0.2345 0.1234 0.5647 0.7891 0.6721 0.3259
V2 1.1334 0.4332 0.1245 0.2343 0.5332
V3 0.2914 0.1648 0.2334 0.1232
V4 0.1234 0.6744 0.5656
V5 0.3553 0.9812
V6 0.4665
I would like to change all data frames of the list to class matrix (numerical).
I tried:
lapply (list, data.matrix)
but the result is a list of data frames with integers. Example:
V1 V2 V3 V4 V5 V6
V1 2 2 2 2 2 4
V2 1 3 4 5 5 7
V3 1 1 3 4 6 3
V4 1 1 1 3 4 5
V5 1 1 1 1 1 1
V6 1 1 1 1 1 1
Also tried to run
lapply(list, as.matrix)
however, I got a list of quoted matrices, like this:
$`A`
V1 V2 V3 V4 V5 V6
V1 "0.1212" "0.6231" "0.4431" "0.3213" "0.6578" "0.1259"
V2 "2.1234" "0.6532" "0.9845" "0.8743" "0.8732"
V3 "0.2314" "0.7648" "0.7634" "0.8732"
V4 "0.1234" "0.6544" "0.3456"
V5 "0.7653" "0.9812"
V6 "0.1265"
How can I convert these data frames of my list from character class to matrix class?
We can loop over the list, then loop over the columns of each data.frame with lapply, convert them to numeric, assign the result back to the original data.frame with x[] (which keeps it a data.frame), and return the data.frame ('x'):
list <- lapply(list, function(x) {x[] <- lapply(x, as.numeric);x})
If those are factor columns, convert to character first and then to numeric (as.numeric() on a factor returns its internal integer codes, not the printed values):
lapply(list, function(x) {x[] <- lapply(x, function(y) as.numeric(as.character(y)))
x})
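Note that the code above returns data frames with numeric columns, while the question asks for numeric matrices. A minimal sketch that finishes the job with as.matrix(); the wrapper name to_numeric_matrices is hypothetical, and it assumes blank cells ("") may become NA:

```r
# Hypothetical wrapper: convert every data frame in a list to a numeric matrix.
to_numeric_matrices <- function(lst) {
  lapply(lst, function(x) {
    # as.character() first so factor columns yield their labels, not codes;
    # empty strings become NA
    x[] <- lapply(x, function(y) as.numeric(as.character(y)))
    as.matrix(x)   # all columns numeric -> numeric matrix
  })
}

lst <- list(A = data.frame(V1 = c("0.1212", "2.1234"), V2 = c("0.6231", ""),
                           stringsAsFactors = FALSE))
out <- to_numeric_matrices(lst)
str(out)
```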
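You can convert to numeric and then restore the matrix shape (going through as.matrix() lets this work on data frames as well as character matrices, and the dimnames are carried over):
lapply(dfs, function(x) matrix(as.numeric(as.matrix(x)), ncol = ncol(x), dimnames = dimnames(x)))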
Data
set.seed(1L)
n_cols <- 6
n_total <- 36
a <- matrix(rnorm(n_total), ncol = n_cols)
b <- matrix(rnorm(n_total), ncol = n_cols)
a[lower.tri(a)] <- ""
b[lower.tri(b)] <- ""
dfs <- list(a, b)
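Another option, sketched here rather than taken from the answer above, is to change the storage mode in place with storage.mode<-, which keeps the dim and dimnames attributes, so there is no need to rebuild the matrix. The helper name to_numeric is hypothetical; data frames are first turned into character matrices with as.matrix():

```r
# Hypothetical helper: character matrix or data frame -> numeric matrix.
to_numeric <- function(x) {
  m <- as.matrix(x)              # data frame -> matrix; unchanged for a matrix
  storage.mode(m) <- "double"    # coerce the elements; "" becomes NA
  m
}

a <- matrix(c("0.5", "", "1.25", "2"), nrow = 2,
            dimnames = list(c("V1", "V2"), c("V1", "V2")))
r <- to_numeric(a)
r
# For the whole list: lapply(dfs, to_numeric)
```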
I want to transpose a column in several smaller parts, grouped by the values of another column, e.g.
1 ID1 V1
2 ID1 V2
3 ID1 V3
4 ID2 V4
5 ID2 V5
6 ID3 V6
7 ID3 V7
8 ID3 V8
9 ID3 V9
I want all the V values for each ID to be in one row, e.g.
ID1 V1 V2 V3
ID2 V4 V5
ID3 V6 V7 V8 V9
Each ID has a different number of rows to transpose, as shown in the example. If it is easier to use the serial number column to do this, that is fine too.
Can anyone help?
Here is a simple awk one-liner to do the trick:
awk '1 {if (a[$2]) {a[$2] = a[$2]" "$3} else {a[$2] = $3}} END {for (i in a) { print i,a[i]}}' file.txt
Output:
ID1 V1 V2 V3
ID2 V4 V5
ID3 V6 V7 V8 V9
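Since the other questions here are in R, here is a sketch of the same grouping in base R, assuming the file has three whitespace-separated columns (the column names n, id and value are made up for illustration). Unlike awk's for (i in a), whose iteration order is unspecified, split() returns the groups sorted by ID:

```r
# Collapse the third column into one space-separated string per ID.
lines <- c("1 ID1 V1", "2 ID1 V2", "3 ID1 V3", "4 ID2 V4", "5 ID2 V5",
           "6 ID3 V6", "7 ID3 V7", "8 ID3 V8", "9 ID3 V9")
d <- read.table(text = lines, col.names = c("n", "id", "value"),
                stringsAsFactors = FALSE)   # read.table("file.txt", ...) for a file

groups <- split(d$value, d$id)              # one character vector per ID
out <- data.frame(id = names(groups),
                  values = vapply(groups, paste, character(1), collapse = " "),
                  row.names = NULL)
out
```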
If you like coding in JavaScript, this is how to do it on the command line using jline: https://github.com/bitdivine/jline/
mmurphy#violet:~$ cat ,,, | jline-foreach 'begin::global.all={}' line::'fields=record.split(/ +/);if(fields.length==3)tm.incrementPath(all,fields.slice(1))' end::'tm.find(all,{maxdepth:1},function(path,val){console.log(path[0],Object.keys(val).join(","));})'
ID1 V1,V2,V3
ID2 V4,V5
ID3 V6,V7,V8,V9
where the input is:
mmurphy#violet:~$ cat ,,,
1 ID1 V1
2 ID1 V2
3 ID1 V3
4 ID2 V4
5 ID2 V5
6 ID3 V6
7 ID3 V7
8 ID3 V8
9 ID3 V9
mmurphy#violet:~$
Explanation: This builds a tree where the first level of branches is the user ID and the second is the V (version?). You could do this for any number of levels. The leaves are just counters. First we create an empty tree:
'begin::global.all={}'
Then each incoming line is split into counter, ID and version number. The counter is sliced off, leaving just the array [userID,version]. tm.incrementPath creates those branches in the tree, a bit like mkdir -p, and increments the leaf counter, although you don't actually need to know how often each user/version combination has been seen:
line::'fields=record.split(/ +/);if(fields.length==3)tm.incrementPath(all,fields.slice(1))'
At the end we have tm.find, which behaves much like UNIX find and visits every path in the tree, except that we limit the depth of the search to the desired breakdown (1 here, but if you're like me you'll want a breakdown of 2, 3, 5 or 8 variables next). That way the breakdown is separated from the list of values, and you can print your answer:
end::'tm.find(all,{maxdepth:1},function(path,val){console.log(path[0],Object.keys(val).join(","));})'
If you are never going to need deeper breakdowns, you will probably want to stick with awk, which is usually preinstalled.
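The same tree of counters can be mimicked in R with table(), which counts every (ID, value) combination; keeping the value names with a non-zero count gives the per-ID breakdown. This is a swapped-in base R technique, not part of jline, and the column names n, id and value are made up for illustration:

```r
lines <- c("1 ID1 V1", "2 ID1 V2", "3 ID1 V3", "4 ID2 V4", "5 ID2 V5",
           "6 ID3 V6", "7 ID3 V7", "8 ID3 V8", "9 ID3 V9")
d <- read.table(text = lines, col.names = c("n", "id", "value"),
                stringsAsFactors = FALSE)

# Two-level breakdown: rows are IDs, columns are values, cells are counts
counts <- table(d$id, d$value)

# For each ID, list the values that were seen at least once
by_id <- apply(counts, 1, function(row) paste(names(row)[row > 0], collapse = ","))
by_id
```

Passing more columns to table() gives deeper breakdowns, at the cost of a higher-dimensional array.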
I have a file that is laid out in the following way:
# Query ID 1
# note
# note
tab delimited data across 12 columns
# Query ID 2
# note
# note
tab delimited data across 12 columns
I'd like to import this data into R so that each query is its own data frame, ideally as a list of data frames with the query ID as the name of each item in the list. I've been searching for a while, but I haven't found a good way to do this. Is this possible?
Thanks
We have used commas instead of tabs to make the example easier to read, and have put the body of the file in a string; apart from making the obvious changes (read the real file, use sep = "\t"), try this. First we use readLines to read in the file, then determine where the headers are and create a grp vector that has one element per line of the file, whose value is the header for that line. Finally, split the lines by grp and apply Read to each group:
# test data
Lines <- "# Query ID 1
# note
# note
1,2,3,4,5,6,7,8,9,10,11,12
1,2,3,4,5,6,7,8,9,10,11,12
# Query ID 2
# note
# note
1,2,3,4,5,6,7,8,9,10,11,12
1,2,3,4,5,6,7,8,9,10,11,12"
L <- readLines(textConnection(Lines)) # L <- readLines("myfile")
isHdr <- grepl("Query", L)
grp <- L[isHdr][cumsum(isHdr)]
# Read <- function(x) read.table(text = x, sep = "\t", fill = TRUE, comment.char = "#")
Read <- function(x) read.table(text = x, sep = ",", fill = TRUE, comment.char = "#")
Map(Read, split(L, grp))
giving:
$`# Query ID 1`
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 1 2 3 4 5 6 7 8 9 10 11 12
2 1 2 3 4 5 6 7 8 9 10 11 12
$`# Query ID 2`
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 1 2 3 4 5 6 7 8 9 10 11 12
2 1 2 3 4 5 6 7 8 9 10 11 12
No packages needed.
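A variation on the same approach, as a hedged sketch: it assumes the file starts with a "# Query ..." line (lines before the first header would break the cumsum() indexing), strips the leading "# " so the list names are just the query IDs, and fixes the factor levels so the list keeps file order (split() on plain strings sorts them, which would put "Query ID 10" before "Query ID 2"). The function name read_queries is hypothetical:

```r
read_queries <- function(lines, sep = "\t") {
  is_hdr <- grepl("^# Query", lines)
  # header text (without "# ") repeated for every line that follows it
  grp <- sub("^#\\s*", "", lines[is_hdr])[cumsum(is_hdr)]
  grp <- factor(grp, levels = unique(grp))   # keep the order of the file
  lapply(split(lines, grp), function(x)
    read.table(text = x, sep = sep, fill = TRUE, comment.char = "#"))
}

lines <- c("# Query ID 1", "# note", "1\t2\t3", "4\t5\t6",
           "# Query ID 2", "# note", "7\t8\t9")
res <- read_queries(lines)   # or read_queries(readLines("myfile"))
res
```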
I have a data.frame with 16 columns. Here's one example row.
> data[16,]
V1 V2 V3 V4
16 comp27182_c0_seq4 ENSP00000442096 ENSG00000011143 ENSFCAP00000011376
V5 V6 V7 V8
16 ENSFCAG00000012261 comp48601_c0_seq1 comp19130_c0_seq3 comp22796_c2_seq3
V9 V10 V11 V12
16 comp146901_c0_seq1 comp157916_c0_seq1 comp158124_c0_seq1
V13 V14 V15 V16
16 comp229797_c0_seq1 comp61875_c0_seq2
I'm only interested in columns 1 and 6-16. The first column contains the name I would like to use as a row name in the matrix; columns 6 to 16 may contain either a string or '' (nothing).
I would like to transform this data.frame into a matrix showing 1 or 0, reflecting the content in columns 6-16.
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
comp27182_c0_seq4 1 1 1 1 0 1 1 1 1 0 0
I've been trying to use a mask without success. I'm sure there's a very easy option out there.
Thanks for any help.
Try this:
do.call(cbind, lapply(c(1,6:16),
function(x) as.numeric(nchar(as.character(data[,x])) > 0)))
I slightly modified your code to my exact needs. Now the first column is naming the rows.
a<-do.call(cbind, lapply(c(6:16),
function(x) as.numeric(nchar(as.character(data[,x])) > 0)))
rownames(a)<-data[,1]
It works great, thanks!
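A vectorised alternative, offered as a sketch rather than a drop-in replacement: compare all the selected columns with "" in one go. It assumes empty cells are "" rather than NA (an NA would propagate into the result), and the function name presence_matrix and its cols argument are hypothetical:

```r
presence_matrix <- function(data, cols = 6:16) {
  # as.matrix() turns factor or character columns into a character matrix
  m <- (as.matrix(data[cols]) != "") * 1L   # TRUE/FALSE -> 1/0, dims kept
  rownames(m) <- as.character(data[[1]])    # first column names the rows
  m
}

data <- data.frame(V1 = c("compA", "compB"), V2 = c("x", "y"),
                   V3 = c("", "s"), V4 = c("t", ""), stringsAsFactors = FALSE)
m <- presence_matrix(data, cols = 3:4)
m
```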
I have a simulation dataset that explores a parameter space; each parameter set is run multiple times (iterations). It looks like this:
p1 p2 p3 iteration result
=================================
v3 v2 v1 1 23.8
v2 v1 v3 2 20.36
v3 v2 v1 2 28.8
v2 v1 v3 1 29.36
...
As can be seen from this example, both (v3, v2, v1) and (v2, v1, v3) are run twice. I am trying to extract only the row with the maximum result for each parameter setting; in this example, only rows 3 and 4 should be kept, as they hold the best results for their parameter sets. Is there an easy way to accomplish that in R? Thanks
df <- read.table(textConnection("p1 p2 p3 iteration result
v3 v2 v1 1 23.8
v2 v1 v3 2 20.36
v3 v2 v1 2 28.8
v2 v1 v3 1 29.36"), header = TRUE)
library(plyr)
ddply(df, .(p1,p2,p3), function(x) return(x[(which(x$result == max(x$result))), ]))
p1 p2 p3 iteration result
1 v2 v1 v3 1 29.36
2 v3 v2 v1 2 28.80
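If you would rather avoid plyr, base R's ave() can compute the group maximum aligned with every row, and the rows equal to that maximum are kept. A sketch using the same data (re-created so it runs on its own); like the ddply version it keeps ties, but the rows stay in their original order instead of being sorted by group:

```r
df <- read.table(text = "p1 p2 p3 iteration result
v3 v2 v1 1 23.8
v2 v1 v3 2 20.36
v3 v2 v1 2 28.8
v2 v1 v3 1 29.36", header = TRUE)

# Maximum result within each (p1, p2, p3) combination, one value per row
group_max <- ave(df$result, df$p1, df$p2, df$p3, FUN = max)
best <- df[df$result == group_max, ]
best
```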