I need to read in a CSV file with no headers and an unknown number of columns and rows. However, every other column belongs in one matrix, while the columns in between belong in a different matrix. Example
CSV input:
1,2,3,4
1,2,3,4
1,2,3,4
1,2,3,4
Desired result would be equivalent to:
matrix1 <- matrix(c(1, 3,
1, 3,
1, 3,
1, 3), NumberOfRows, NumberOfColumns, byrow=TRUE);
and
matrix2 <- matrix(c(2, 4,
2, 4,
2, 4,
2, 4), NumberOfRows, NumberOfColumns, byrow=TRUE);
I have tried something like this (but it seems overly complex and doesn't work anyway). Isn't there a simple way to do this in R?
mydata<- read.csv("~/Desktop/file.csv", header=FALSE, nrows=4000);
columnCount<-ncol(mydata);
rowCount<-nrow(mydata);
evenColumns <- matrix(); oddColumns <-matrix();
for (i in 1:columnCount) {
if (i %% 2) {
for (l in 1:rowCount){
col <- 1;
evenColumns[col, l] <-mydata[i,l];
col<-col+1;
}
}
else {
for (l in 1:rowCount){
col <-1;
oddColumns[col, l] <-mydata[i,l];
col<-col+1;
}
}
}
How should this be done properly in R?
You can get the column numbers with seq:
full = read.csv("mat.csv", header=FALSE)
odds = as.matrix(full[, seq(1, ncol(full), by=2)])
evens = as.matrix(full[, seq(2, ncol(full), by=2)])
Output:
> odds
V1 V3
[1,] 1 3
[2,] 1 3
[3,] 1 3
[4,] 1 3
> evens
V2 V4
[1,] 2 4
[2,] 2 4
[3,] 2 4
[4,] 2 4
Similar to the problem discussed here
mat.even <- mydata[,which(1:ncol(mydata) %% 2 == 0)]
mat.odd <- mydata[,which(1:ncol(mydata) %% 2 == 1)]
Every other starting with the first:
> cdat[ , c(TRUE,FALSE)]
V1 V3
1 1 3
2 1 3
3 1 3
4 1 3
Every other starting with the second:
> cdat[ , !c(TRUE,FALSE)]
V2 V4
1 2 4
2 2 4
3 2 4
4 2 4
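The answers above can be combined into one small, self-contained sketch. It assumes the CSV content is supplied as an in-memory string (csv_text is a stand-in for a real file path) and uses drop = FALSE, which is an addition not present in the answers, so that the result stays two-dimensional even when only one column is selected.

```r
# Stand-in for a header-less CSV file; replace text = csv_text with a file path.
csv_text <- "1,2,3,4\n1,2,3,4\n1,2,3,4\n1,2,3,4"
full <- read.csv(text = csv_text, header = FALSE)

# TRUE for columns 1, 3, 5, ...; FALSE for columns 2, 4, 6, ...
is_odd <- seq_len(ncol(full)) %% 2 == 1

# drop = FALSE keeps a data frame (and hence a matrix) even for a single column.
odds  <- as.matrix(full[, is_odd,  drop = FALSE])
evens <- as.matrix(full[, !is_odd, drop = FALSE])

print(odds)
print(evens)
```

Building the logical index from ncol(full) means the same code works for any number of columns, which is what the question asks for.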
I have read the question "R igraph - save layout?", but in my case I need to save the start and end positions of each edge to a file together with the edge list.
I have a tree igraph object and predefined mylayout layout on the plane.
tree <- make_tree(5, 2, mode = "undirected")
mylayout <- matrix(c(1, 2, 0, 3, 2,
1, 2, 0, 2, 3), ncol=2)
I have added a new vertex attribute, name:
tree <- make_tree(5, 2, mode = "undirected") %>%
set_vertex_attr("name", value = seq(1:vcount(tree)))
I get the edge list of the graph via the get.edgelist() function, and I intend to use the name attribute:
df1 <- data.frame(V1 = get.edgelist(tree)[,1],
V2 = get.edgelist(tree)[,2],
# V1_x = mylayout[as.integer(names(V(tree))), 1],
# V1_y = mylayout[as.integer(names(V(tree))), 2],
# V2_x = mylayout[, 1],
# V2_y = mylayout[, 2],
stringsAsFactors = FALSE)
Question. How can I match the node positions to the start and end positions of the edges?
You can try this
get.data.frame(tree) %>%
cbind(
split(
data.frame(mylayout)[match(unlist(.), 1:nrow(mylayout)), ],
c(col(.))
)
)
which gives
from to 1.X1 1.X2 2.X1 2.X2
1 1 2 1 1 2 2
2 1 3 1 1 0 0
3 2 4 2 2 3 2
4 2 5 2 2 2 3
I don't know if there's an existing way to do this, but it's not too much work to write a helper function to do this
join_layout <- function(g, layout) {
edges <- as_edgelist(g)
idx1 <- match(edges[,1], V(g)$name)
idx2 <- match(edges[,2], V(g)$name)
result <- cbind(data.frame(edges),
layout[idx1, ],
layout[idx2, ]
)
names(result) <- c("V1", "V2", "V1_x", "V1_y", "V2_x","V2_y")
result
}
Basically we use match() to match up the vertex names to rows in the layout matrix. You call it by passing in the igraph object and your layout
join_layout(tree, mylayout)
# V1 V2 V1_x V1_y V2_x V2_y
# 1 1 2 1 1 2 2
# 2 1 3 1 1 0 0
# 3 2 4 2 2 3 2
# 4 2 5 2 2 2 3
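To see the match() step in isolation, here is a minimal base R sketch that does not need igraph. The objects vertex_names and edges are hypothetical stand-ins for V(g)$name and as_edgelist(g); the layout matrix is the one from the question.

```r
# Hypothetical stand-ins for V(g)$name and as_edgelist(g)
vertex_names <- 1:5
edges <- matrix(c(1, 2,
                  1, 3,
                  2, 4,
                  2, 5), ncol = 2, byrow = TRUE)

# Layout from the question: row i holds the (x, y) position of vertex i
mylayout <- matrix(c(1, 2, 0, 3, 2,
                     1, 2, 0, 2, 3), ncol = 2)

# match() turns each vertex name into the row of the layout that holds it
idx_from <- match(edges[, 1], vertex_names)
idx_to   <- match(edges[, 2], vertex_names)

result <- data.frame(V1 = edges[, 1], V2 = edges[, 2],
                     V1_x = mylayout[idx_from, 1], V1_y = mylayout[idx_from, 2],
                     V2_x = mylayout[idx_to, 1],   V2_y = mylayout[idx_to, 2])
print(result)
```

Because the lookup goes through the names rather than assuming the rows are already in order, the same pattern should still work if the vertex names are not simply 1 to n, as long as the layout rows follow the order of vertex_names.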
I've got a data frame like this:
df:
A B C
1 1 2 3
2 2 2 4
3 2 2 3
I would like to subtract from each column the column before it (A-0, B-A, C-B). So my result should look like this:
df:
A B C
1 1 1 1
2 2 0 2
3 2 0 1
I tried the following loop, but it didn't work.
for (i in 1:3) {
j <- data[,i+1] - data[,i]
}
Try this
df - cbind(0, df[-ncol(df)])
# A B C
# 1 1 1 1
# 2 2 0 2
# 3 2 0 1
Data
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))
We can also remove the first and last column and do the subtraction
df[-1] <- df[-1]-df[-length(df)]
data
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))
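The loop in the question overwrites j on every pass and runs off the end of the data frame at i = 3, which is why nothing useful comes out of it. As a rough sketch of the same idea as the answers, the hypothetical helper col_diffs() below works on a matrix copy and shifts the columns one place to the right, padding with a zero column.

```r
# Hypothetical helper: each column minus the column to its left (first column minus 0)
col_diffs <- function(df) {
  m <- as.matrix(df)
  # Zero column in front, last column dropped: a copy shifted one place right
  shifted <- cbind(0, m[, -ncol(m), drop = FALSE])
  as.data.frame(m - shifted)
}

df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))
res <- col_diffs(df)
print(res)
```

The column names of the result come from the original data, since the left operand of the subtraction carries them.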
Assume I have a list called LS1 containing 20 matrices, each 100 by 5. Some columns may hold a single repeated value, for example a column in which every entry is 100. I want to set every such column to all zeros. I can write a for loop to do that, but I want to do it more efficiently with lapply and apply. For example, one of these matrices might be
1 2 3 4 5
1 3 4 5 6
1 5 6 8 9
I want the first column, which is all ones, changed to all zeros.
This is what I have done:
A = lapply(LS1, function(x) {apply(x, 2, function(x1) {if (max(x1) == min(x1))
{0}})})
but this makes all the values NULL. Can anyone suggest how to do this with lapply and apply?
This should work, especially for integer matrices.
lapply(lst,
function(mat) {
all_dupes = apply(mat, 2, function(x) length(unique(x)) ==1)
mat[, all_dupes] = 0L
return(mat)
}
)
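The NULL values in the question most likely come from the if without an else: in R, such an if evaluates to NULL when its condition is FALSE, so the inner function returns NULL for every non-constant column. A minimal sketch of the approach above follows, using a hypothetical helper zero_constant_cols() and a small made-up LS1; it assumes the matrices contain no NA values.

```r
# An if without else yields NULL when the condition is FALSE
print(is.null(if (FALSE) 0))

# Hypothetical helper: set every constant column of a matrix to zero
zero_constant_cols <- function(mat) {
  is_constant <- apply(mat, 2, function(col) max(col) == min(col))
  mat[, is_constant] <- 0
  mat  # return the whole matrix, not just the changed columns
}

# Made-up example data standing in for the list of 20 matrices
LS1 <- list(
  matrix(c(1, 1, 1,  2, 3, 5,  3, 4, 6), nrow = 3),
  matrix(c(7, 7, 7,  1, 2, 3), nrow = 3)
)

res <- lapply(LS1, zero_constant_cols)
print(res)
```

The key difference from the attempt in the question is that the function modifies the matrix in place and returns it, instead of returning a value per column.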
This is my solution:
df <- data.frame(a = c(1, 1, 1),
b = c(2, 3, 5),
c = c(4, 5, 8),
d = c(5, 6, 9),
e = c(5, 5, 5))
A = data.frame(lapply(df, function(x) x = (max(x)!=min(x))*x ))
A
> A
a b c d e
1 0 2 4 5 0
2 0 3 5 6 0
3 0 5 8 9 0
If you use sapply instead:
A = sapply(df, function(x) x = (max(x)!=min(x))*x)
A
a b c d e
[1,] 0 2 4 5 0
[2,] 0 3 5 6 0
[3,] 0 5 8 9 0
I have a list l, which has the following features:
It has 3 elements
Each element is a numeric vector of length 5
Each vector contains numbers from 1 to 5
l = list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
I want to do two things:
First
I want to know how many times each number occurs in the entire list and I want each result in a vector (or any form that can allow me to perform computations with the results later):
Code 1:
> a <- table(sapply(l, "["))
> x <- as.data.frame(a)
> x
Var1 Freq
1 1 3
2 2 3
3 3 4
4 4 2
5 5 3
Is there any way to do it without using the table() function? I would like to do it "manually". My attempt is right below.
Code 2: (I know this is not very efficient!)
x <- data.frame(
"1" <- sum(sapply(l, "[")) == 1
"2" <- sum(sapply(l, "[")) == 2
"3" <- sum(sapply(l, "[")) == 3
"4" <- sum(sapply(l, "[")) == 4
"5" <- sum(sapply(l, "[")) == 5)
I tried the following, but it did not work. I actually did not understand the result.
> sapply(l, "[") == 1:5
a b c
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE TRUE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
> sum(sapply(l, "[") == 1:5)
[1] 2
Second
Now, I would like to get the number of times each number appears, but this time within each element $a, $b and $c. I thought about using lapply() but I don't know exactly how. Following is what I tried, but it is inefficient just like Code 2:
lapply(l, function(x) sum(x == 1))
lapply(l, function(x) sum(x == 2))
lapply(l, function(x) sum(x == 3))
lapply(l, function(x) sum(x == 4))
lapply(l, function(x) sum(x == 5))
What I get with these 5 lines of code are 5 lists of 3 elements each containing a single numeric value. For example, the second line of code tells me how many times number 2 appears in each element of l.
Code 3:
> lapply(l, function(x) sum(x == 2))
$a
[1] 1
$b
[1] 1
$c
[1] 1
What I would like to obtain is a list with three elements containing all the information I am looking for.
Please, use the references "Code 1", "Code 2" and "Code 3" in your answers. Thank you very much.
Just use table(unlist(l)) for the first part and data.frame(lapply(l, tabulate)) for the second.
> table(unlist(l))
1 2 3 4 5
3 3 4 2 3
> data.frame(lapply(l, tabulate))
a b c
1 2 0 1
2 1 1 1
3 1 2 1
4 0 1 1
5 1 1 1
For code 1/2, you could use sapply to obtain the counts for whichever values you wanted:
l = list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
data.frame(number = 1:5,
freq = sapply(1:5, function(x) sum(unlist(l) == x)))
# number freq
# 1 1 3
# 2 2 3
# 3 3 4
# 4 4 2
# 5 5 3
For code 3, if you wanted to get the counts for lists a, b, and c, you could just apply your frequency function to each element of the list with the lapply function:
freqs = lapply(l, function(y) sapply(1:5, function(x) sum(unlist(y) == x)))
data.frame(number = 1:5, a=freqs$a, b=freqs$b, c=freqs$c)
# number a b c
# 1 1 2 0 1
# 2 2 1 1 1
# 3 3 1 2 1
# 4 4 0 1 1
# 5 5 1 1 1
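One detail worth knowing when using tabulate() for the per-element counts: its nbins argument defaults to the largest value in the input, so a vector that never reaches 5 would give a shorter result. The sketch below passes nbins explicitly; the variable names values, total and per_element are only illustrative.

```r
l <- list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
values <- 1:5

# Code 1/2: counts over the whole list, without table()
total <- vapply(values, function(v) sum(unlist(l) == v), integer(1))
names(total) <- values
print(total)

# Code 3: counts within each element; nbins fixes the length at 5
per_element <- sapply(l, tabulate, nbins = length(values))
rownames(per_element) <- values
print(per_element)

# Why nbins matters: without it the result stops at the largest value seen
print(length(tabulate(c(1, 2))))
print(length(tabulate(c(1, 2), nbins = 5)))
```

Here total is a named integer vector and per_element is a matrix with one column per list element, so both can be used directly in later computations.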
Here is another example with nested lapply().
Created data:
list = NULL
list[[1]] = c(1:5)
list[[2]] = c(1:5)+3
list[[2]] = c(1:5)+4
list[[3]] = c(1:5)-1
list[[4]] = c(1:5)*3
list2 = NULL
list2[[1]] = rep(1,5)
list2[[2]] = rep(2,5)
list2[[3]] = rep(0,5)
The call below subtracts each element of list from every element of list2:
lapply(list, function(d){ lapply(list2, function(a,b) {a-b}, b=d)})
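To make the shape of the nested result easier to see, here is a smaller sketch of the same pattern. The names xs and ys are made up for illustration (and avoid masking the built-in list function); the result is a list with one entry per element of xs, each holding one vector per element of ys.

```r
xs <- list(1:5, 1:5 + 4)
ys <- list(rep(1, 5), rep(2, 5))

# Outer lapply walks xs, inner lapply walks ys; each cell is (y - x)
res <- lapply(xs, function(d) lapply(ys, function(a) a - d))

print(length(res))       # one entry per element of xs
print(length(res[[1]]))  # each entry has one vector per element of ys
print(res[[1]][[2]])     # rep(2, 5) - 1:5
```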
I have a data frame containing a list vector with jagged entries:
df = data.frame(x = rep(c(1,2), 2), y = rep(c("a", "b"), each = 2))
L = list()
for (each in round(runif(4, 1,5))) L = c(L, list(1:each))
df$L = L
For example,
x y L
1 a 1
2 a 1, 2, 3, 4
1 b 1, 2, 3
2 b 1, 2, 3
How could I create a table which counts the values of L for each x, across the values of y? So, in this example it would output something like,
1 2 3 4
X
1 2 1 1 0
2 2 2 2 1
I had some luck using
tablist = function(L) table(unlist(L))
tapply(df$L, df$x, tablist)
which produces,
$`1`
1 2 3
2 1 1
$`2`
1 2 3 4
2 2 2 1
However, I'm not sure how to go from here to a single table. Also, I'm beginning to suspect that this approach might start taking an unruly amount of time for large data frames. Any thoughts / suggestions would be greatly appreciated!
Using plyr
library(plyr)
df = data.frame(x = rep(c(1,2), 2), y = rep(c("a", "b"), each = 2))
L = list()
set.seed(2)
for (each in round(runif(4, 1,5))) L = c(L, list(1:each))
df$L = L
> df
x y L
1 1 a 1, 2
2 2 a 1, 2, 3, 4
3 1 b 1, 2, 3
4 2 b 1, 2
table(ddply(df,.(x),summarize,unlist(L)))
> table(ddply(df,.(x),summarize,unlist(L)))
..1
x 1 2 3 4
1 2 2 1 0
2 2 2 1 1
If you're not into plyr...
vals <- unique(unlist(df$L))
names(vals) <- vals
do.call("rbind",
lapply(split(df,df$x),function(byx){
sapply(vals, function(i){
sum(unlist(sapply(byx$L,"==",i)))
})
})
)
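Another base R route, offered as a sketch rather than a tested recommendation for large data, is to repeat each x once per value in its list entry and cross-tabulate the two flat vectors. It assumes R 3.2.0 or later for lengths(), and the list column is written out by hand here to match the seeded example above.

```r
df <- data.frame(x = rep(c(1, 2), 2), y = rep(c("a", "b"), each = 2))
df$L <- list(1:2, 1:4, 1:3, 1:2)  # same values as the set.seed(2) example

# One x per list value, then a two-way table of x against the values
counts <- table(x = rep(df$x, lengths(df$L)), value = unlist(df$L))
print(counts)
```

This yields a single table directly, with a zero wherever a value never occurs for a given x.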