Looping a function for each row in csv - r

I have a .csv file containing 22,388 rows of comma-separated numbers. I want to find all possible combinations of pairs of the numbers for each row separately and list them pair by pair, so that I'll be able to make a visual representation of them as clusters.
An example of two rows from my file would be
"2, 13"
"2, 8, 6"
When I use the str() function, R says the file contains factors. I guess the values need to be integers, but I need the rows to stay separate, therefore I've wrapped each row in " ".
I want possible combinations of pairs for each row like this.
2, 13
2, 8
2, 6
8, 6
I've already gotten an answer from @flodel saying
Sample input - replace textConnection(...) with your csv filename.
csv <- textConnection("2,13
2,8,6")
This reads the input into a list of values:
input.lines <- readLines(csv)
input.values <- strsplit(input.lines, ',')
This creates a nested list of pairs:
pairs <- lapply(input.values, combn, 2, simplify = FALSE)
This puts everything in a nice matrix of integers:
pairs.mat <- matrix(as.integer(unlist(pairs)), ncol = 2, byrow = TRUE)
pairs.mat
But I need the function to run through each row in my .csv file separately, so I think I need to do a for loop with the function - I just can't get my head around it.
Thanks in advance.

Not sure exactly what you're after but maybe something like this:
dat <- readLines(n=2) #read in your data
2, 13
2, 8, 6
## split each string on "," and then remove white space
## and put into a list with lapply
dat2 <- lapply(dat, function(x) {
as.numeric(gsub("\\s+", "", unlist(strsplit(x, ","))))
})
## find all pairwise combinations using outer (faster
## than expand.grid, and we can take just the upper triangle)
dat3 <- lapply(dat2, function(x) {
y <- outer(x, x, paste)
c(y[upper.tri(y)])
})
## then split on the spaces and convert back to numeric
## stored as a list
lapply(strsplit(unlist(dat3), " "), as.numeric)
## > lapply(strsplit(unlist(dat3), " "), as.numeric)
## [[1]]
## [1] 2 13
##
## [[2]]
## [1] 2 8
##
## [[3]]
## [1] 2 6
##
## [[4]]
## [1] 8 6
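On the for-loop part of the question: lapply() already visits each row on its own, so an explicit loop is not strictly needed. Below is a minimal sketch, assuming the rows look like the quoted example in the question; the names rows, values, row.pairs and result are made up for illustration. It keeps the source row number next to every pair, which may help when grouping pairs into clusters later.

```r
# Sample rows as they might come back from readLines() on the csv file;
# the quotes and spaces mirror the example in the question (assumption).
rows <- c('"2, 13"', '"2, 8, 6"')

# Strip the quotes, split on commas, trim white space, convert to integer.
values <- lapply(strsplit(gsub('"', "", rows), ","),
                 function(v) as.integer(trimws(v)))

# lapply() is the loop: each row is handled separately, and the row
# number is stored alongside every pair that row produced.
row.pairs <- lapply(seq_along(values), function(i) {
  p <- t(combn(values[[i]], 2))   # one pair per matrix row
  data.frame(row = i, first = p[, 1], second = p[, 2])
})

# Stack the per-row results into one data frame.
result <- do.call(rbind, row.pairs)
result
```

Here result has one line per pair, with the row column telling you which line of the file the pair came from.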

Related

How to find patterns between sets of strings in R?

I'm trying to find patterns in a set of strings as the following example:
"2100780D001378FF01E1000000040000--------01A456000000------------"
"3100782D001378FF03E1008100040000--------01A445800000------------"
If I use the standard get_pattern from the bpa library, since it looks individually to every string I will get
"9999999A999999AA99A9999999999999--------99A999999999------------"
But my idea would be to find something like:
"X10078XD001378FF0XE100XX00040000--------01A4XXX00000------------"
The main objective is to find the set of strings with the most similar "pattern"
My first idea was to calculate the Hamming distance between them and then analyze the groups resulting from this distance, but it gets tedious. Is there any "automatic" approach?
Any idea of how I can accomplish this mission?
For your sample data, the code below works. I have no idea how it scales to production.
library( data.table )
#sample data
data <- data.table( name = c("2100780D001378FF01E1000000040000--------01A456000000------------",
"3100782D001378FF03E1008100040000--------01A445800000------------"))
# name
# 1: 2100780D001378FF01E1000000040000--------01A456000000------------
# 2: 3100782D001378FF03E1008100040000--------01A445800000------------
#use data.table::tstrsplit() to split the string to individual characters
l <- lapply( data.table::tstrsplit( data$name, ""), function(x) {
#if the same character appears in all strings on the same position,return the character, else return 'X'
if ( length( unique( x ) ) == 1 ) as.character(x[1]) else "X"
})
#paste it all together
paste0(l, collapse = "")
# [1] "X10078XD001378FF0XE100XX00040000--------01A4XXX00000------------"
small explanation
data.table::tstrsplit( data$name, "") returns the following list
[[1]]
[1] "2" "3"
[[2]]
[1] "1" "1"
[[3]]
[1] "0" "0"
etc...
Using lapply(), you can loop over this list and determine the length of the vector of unique elements. If this length == 1, then the same character exists in all strings at this position, so return the character.
If the length > 1, then multiple characters appear at this position in different strings, so return "X".
Update
If you are after the Hamming distances, use the stringdist package
library(stringdist)
m <- stringdist::stringdistmatrix(a = data$name, b = data$name, method = "hamming")
# [,1] [,2]
# [1,] 0 8
# [2,] 8 0
#to get to the minimum value for each row, exclude the diagonal first (by making it NA)
# and then find the position with the minimum value
diag(m) <- NA
apply( m, 1, which.min )
# [1] 2 1
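If the goal is to get groups of similar strings automatically, one possible route is to feed the Hamming distances into hierarchical clustering. The sketch below is only an illustration under some assumptions: it computes the distances in base R rather than with stringdist, it assumes all strings have the same length, it reuses the three sample strings that appear in this thread, and the choice of k = 2 groups is arbitrary.

```r
s <- c("2100780D001378FF01E1000000040000--------01A456000000------------",
       "3100782D001378FF03E1008100040000--------01A445800000------------",
       "4100781D109378FF03E1008100040000--------01A784580000------------")

# Hamming distance in base R: number of positions where two
# equal-length strings differ.
hamming <- function(a, b) sum(utf8ToInt(a) != utf8ToInt(b))

# Full pairwise distance matrix.
m <- outer(seq_along(s), seq_along(s),
           Vectorize(function(i, j) hamming(s[i], s[j])))

# Hierarchical clustering on the distances, then cut the tree into
# k groups (k = 2 is an arbitrary choice for this sketch).
hc <- hclust(as.dist(m))
groups <- cutree(hc, k = 2)
groups
```

Strings that end up with the same group number are the ones closest to each other; the common-pattern code above could then be applied within each group.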
Here is a base R solution, where a custom function findPat is defined and Reduce is applied to find common pattern among a set of strings, i.e.,
findPat <- function(s1,s2){
  r1 <- utf8ToInt(s1)  # code points of each string
  r2 <- utf8ToInt(s2)
  r1[bitwXor(r1,r2)!=0] <- utf8ToInt("X")  # mark positions where the strings differ
  intToUtf8(r1)  # return the resulting pattern
}
pat <- Reduce(findPat,list(s1,s2,s3))
such that
> pat
[1] "X10078XDX0X378FF0XE100XX00040000--------01AXXXXX0000------------"
DATA
s1 <- "2100780D001378FF01E1000000040000--------01A456000000------------"
s2 <- "3100782D001378FF03E1008100040000--------01A445800000------------"
s3 <- "4100781D109378FF03E1008100040000--------01A784580000------------"

How do you calculate the sum of a specific row in a list of matrices

I have data in matrices and the matrices are stored in a list, and I want the sum of a specific row in each matrix.
some example data
A1<-matrix(0:9, nrow=5, ncol=2)
A2<-matrix(10:19, nrow=5, ncol = 2)
A3<-matrix(20:29, nrow=5, ncol = 2)
Mylist<-list(A1, A2, A3)
I can get the sum of all rows in each matrix with
lapply(Mylist, function(x) apply(x, 1, sum) )
but I only want the sum of a specific row, could be row 1, could be row 4, depending on what I want to look at. I know I can read it off of the results I generate with the code above but I want a cleaner solution that only gives me the results. Thanks
You can use purrr::map().
If you know the output type (in this case, seems to be all integers), you can be more specific, like map_int(). With map() you'll get a list back, with a specific map version, like map_int(), you get a vector back instead.
library(tidyverse)
ix <- 3 # let's say we want the sum of the third row
map_int(Mylist, ~sum(.x[ix, ]))
[1] 9 29 49
If the row index you care about changes per matrix, you can use map2() instead, which takes two inputs of the same length:
ixs <- c(1, 2, 3)
map2_int(Mylist, ixs, ~sum(.x[.y, ]))
[1] 5 27 49
Alternatively, if you need to work in base R, you can just call sum() on the desired row (here, ix); you don't need apply() inside lapply():
lapply(Mylist, function(x) sum(x[ix, ]))
[[1]]
[1] 9
[[2]]
[1] 29
[[3]]
[1] 49
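If you would rather get a plain vector than a list in base R, vapply() and mapply() are possible counterparts to map_int() and map2_int(). This is a small sketch using the example data from the question; row.sums and varying.sums are names made up here.

```r
A1 <- matrix(0:9, nrow = 5, ncol = 2)
A2 <- matrix(10:19, nrow = 5, ncol = 2)
A3 <- matrix(20:29, nrow = 5, ncol = 2)
Mylist <- list(A1, A2, A3)

ix <- 3  # the row we want to sum in every matrix

# vapply() returns a vector and checks that each result is one number.
row.sums <- vapply(Mylist, function(m) sum(m[ix, ]), numeric(1))
row.sums
# [1]  9 29 49

# mapply() walks over the matrices and a vector of row indices in parallel.
ixs <- c(1, 2, 3)
varying.sums <- mapply(function(m, i) sum(m[i, ]), Mylist, ixs)
varying.sums
# [1]  5 27 49
```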
one.row.sum <- function(mat.list, row.num) lapply(mat.list, function(m) sum(m[row.num, ]))
one.row.sum(Mylist, 1)
[[1]]
[1] 5
[[2]]
[1] 25
[[3]]
[1] 45

r: how to partition a list or vector into pairs at an offset of 1

Sorry for the elementary question, but I need to partition a list of numbers at an offset of 1.
e.g.,
I have a list like:
c(194187, 193668, 192892, 192802 ..)
and need a list of lists like:
c(c(194187, 193668), c(193668, 192892), c(192892, 192802)...)
where the last element of list n is the first of list n+1. there must be a way to do this with
split()
but I can't figure it out
In Mathematica, the command I need is Partition[list,2,1]
You can try it like this, using the zoo library
library(zoo)
x <- 1:10 # Vector of 10 numbers
m <- rollapply(data = x, 2, by=1, c) # creates a matrix with n-1 rows, one pair per row
l <- split(m, row(m)) # split the matrix into a list with one element per row
Output:
> l
$`1`
[1] 1 2
$`2`
[1] 2 3
$`3`
[1] 3 4
Here is an option using base R to create a vector of elements
v1 <- rbind(x[-length(x)], x[-1])
c(v1)
#[1] 194187 193668 193668 192892 192892 192802
If we need a list
split(v1, col(v1))
data
x <- c(194187, 193668, 192892, 192802);
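Two more base R variants that may be worth a look, sketched with the same four numbers: Map() pairs each element with its successor and returns a list directly, while embed() builds the same pairs as matrix rows. The names pairs and pair.mat are made up for this sketch.

```r
x <- c(194187, 193668, 192892, 192802)

# Pair every element except the last with the element that follows it.
pairs <- Map(c, head(x, -1), tail(x, -1))
pairs[[1]]
# [1] 194187 193668

# embed() gives rows of (x[t], x[t-1]); reversing the columns
# puts each pair in its original order.
pair.mat <- embed(x, 2)[, 2:1]
pair.mat
```

Both produce length(x) - 1 pairs, and the second value of each pair is the first value of the next one.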

R list into a single data frame cell

I'd like to change a list into one cell of a data frame.
list <- list(1,2,3,4,5)
View(list)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
I'd like to transform this such that it looks like:
x
1 1,2,3,4,5
The reason is because I have a loop that is storing result in a list for each iteration, but I only want one cell per iteration.
There are other columns where for each iteration, there is only one result. So saving that in a data frame is easy. But then for the metric with multiple results, I don't want multiple columns or rows.
So I will have two data frames that I can use cbind on such that my final data frame will look like:
x y
1 1,2,3,4,5 a
2 5,4,3 b
You can easily achieve that with unlist and paste (here l1 is your list, renamed from 'list'), i.e.,
data.frame(x = paste(unlist(l1), collapse = ','))
# x
#1 1,2,3,4,5
or simply (thanks @David)
data.frame(x = toString(list))
# x
#1 1, 2, 3, 4, 5
On a side note, avoid naming your lists 'list' as there is a function called list in R
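For the loop use case described in the question, the same idea can be applied to every iteration's result at once. The following is a rough sketch, assuming the per-iteration results have been collected in a list; results and labels are hypothetical stand-ins for whatever your loop produces.

```r
# Hypothetical per-iteration results and the single-valued column.
results <- list(c(1, 2, 3, 4, 5), c(5, 4, 3))
labels  <- c("a", "b")

# Collapse each iteration's result into one comma-separated string.
x <- vapply(results, paste, character(1), collapse = ",")

# One row per iteration, one cell per result.
df <- data.frame(x = x, y = labels, stringsAsFactors = FALSE)
df

# The original numbers can be recovered from a cell when needed.
first.values <- as.numeric(strsplit(df$x[1], ",")[[1]])
```

Storing the values as text keeps the data frame flat; the trade-off is that they have to be split and converted again before doing arithmetic on them.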

Finding all pair combinations in strings with several comma separated instances in R

I'm really new to R and trying to solve a, for me, challenging problem.
I have a .csv file containing 22,388 rows of comma-separated integers.
I want to find all possible combinations of pairs of the integers for each row separately and list them pair for pair, so that I'll be able to make a visual representation of them as clusters.
I've tried installing the combinat package for R but I can't seem to solve the problem.
An example from my file would be
2 13
2 8 6
Which should be listed in possible combinations of pairs like this.
2, 13
2, 8
2, 6
8, 6
Sample input - replace textConnection(...) with your csv filename.
csv <- textConnection("2,13
2,8,6")
This reads the input into a list of values:
input.lines <- readLines(csv)
input.values <- strsplit(input.lines, ',')
This creates a nested list of pairs:
pairs <- lapply(input.values, combn, 2, simplify = FALSE)
This puts everything in a nice matrix of integers:
pairs.mat <- matrix(as.integer(unlist(pairs)), ncol = 2, byrow = TRUE)
pairs.mat
# [,1] [,2]
# [1,] 2 13
# [2,] 2 8
# [3,] 2 6
# [4,] 8 6
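One thing to watch for with a large file: combn() stops with an error when a row holds fewer than two values, since no pair can be formed from it. A minimal sketch of one way to guard against that, assuming such rows can simply be skipped (the row "7" below is made up for illustration, and ok is a hypothetical helper name):

```r
# Same structure as input.values above, with a made-up single-value row.
input.values <- strsplit(c("2,13", "7", "2,8,6"), ",")

# Keep only the rows that contain at least two values.
ok <- lengths(input.values) >= 2

pairs <- lapply(input.values[ok], combn, 2, simplify = FALSE)
pairs.mat <- matrix(as.integer(unlist(pairs)), ncol = 2, byrow = TRUE)
pairs.mat
```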
combn gives the combinations of the vector elements. paste the combinations together with apply:
x <- c(2, 13)
y <- c(2, 8, 6)
apply(combn(x, 2), 2, paste, collapse=' ')
[1] "2 13"
Loop over these:
unlist(sapply(list(x, y), function(x) apply(combn(x, 2), 2, paste, collapse=' ')))
## [1] "2 13" "2 8" "2 6" "8 6"
