I want to create a new CSV from an existing CSV that has multiple identically named columns with unsorted data - r

I have a CSV with these data:
List Rank.A List Rank.B List Rank.C
a 4 a 8 b 3
b 5 e 5 e 9
c 7 f 5 r 1
I want to create a new CSV with a single List column holding each unique value, plus three more columns "Rank.A", "Rank.B", and "Rank.C". If a List value has no Rank.A entry, that cell should be blank. I want the data in this format:
List Rank.A Rank.B Rank.C
a    4      8
b    5             3
c    7
e           5      9
f           5
r                  1
Can you please help me in that?

A base R option: use split.default to split your data.frame by columns into (List, Rank.x) pairs, then Reduce + merge to combine the pairs into a single data.frame:
Reduce(
function(x, y) merge(x, y, all = TRUE),
split.default(df, rep(1:(ncol(df) / 2), each = 2)))
# List Rank.A Rank.B Rank.C
# 1 a 4 8 NA
# 2 b 5 NA 3
# 3 c 7 NA NA
# 4 e NA 5 9
# 5 f NA 5 NA
# 6 r NA NA 1
Note that this assumes that you always have pairs of columns (List, Rank.x) in your original data.
Sample data
df <- read.table(text =
"List Rank.A List Rank.B List Rank.C
a 4 a 8 b 3
b 5 e 5 e 9
c 7 f 5 r 1", header = TRUE, check.names = FALSE)
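The question asks for a CSV with blank cells rather than NA. A minimal sketch, assuming the merged result is written with base R's write.csv(): its na argument controls how missing values are written, so na = "" leaves those cells empty. The output path below is a placeholder.

```r
# Sample data from the question (repeated List columns, hence check.names = FALSE)
df <- read.table(text =
"List Rank.A List Rank.B List Rank.C
a 4 a 8 b 3
b 5 e 5 e 9
c 7 f 5 r 1", header = TRUE, check.names = FALSE)

# Full outer join of each (List, Rank.x) pair on the List column
result <- Reduce(
  function(x, y) merge(x, y, all = TRUE),
  split.default(df, rep(seq_len(ncol(df) / 2), each = 2))
)

# Placeholder path; replace with e.g. "new.csv"
out_file <- file.path(tempdir(), "new.csv")

# na = "" writes missing ranks as empty cells instead of the string NA
write.csv(result, out_file, row.names = FALSE, na = "")
```

Reading the file back in a spreadsheet shows empty cells where the merged data.frame had NA.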


Bind data frames from two nested lists by list and df names

I would like to bind R data frames from two different nested lists by their names, as follows:
list1 = list(list_a = list(df1 = data.frame(letters = c('A','B','C'),
                                            numbers = seq(1,3)),
                           df2 = data.frame(letters = c('A','B','C','D','E'),
                                            numbers = seq(1,5))),
             list_b = list(df3 = data.frame(norm = rnorm(4))))
list2 = list(list_a = list(df1 = data.frame(letters = c('D','E','F'),
                                            numbers = seq(4,6)),
                           df2 = data.frame(letters = c('F','G','H','I','J'),
                                            numbers = seq(6,10))),
             list_b = list(df3 = data.frame(norm = rnorm(4))))
The result I expect after binding these two lists by name is:
> list3
$list_a
$list_a$df1
letters numbers
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
6 F 6
$list_a$df2
letters numbers
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
6 F 6
7 G 7
8 H 8
9 I 9
10 J 10
$list_b
$list_b$df3
norm
1 0.1400504
2 -0.5785170
3 -0.2905891
4 1.9175712
5 1.8736454
6 -0.4895259
7 0.5975914
8 0.3586774
In short, I want to bind the corresponding data frames from these two nested lists by name.
Any ideas?
Assuming list1 and list2 share the same structure (identical depth, list names, and order), you could use a nested Map(). The \(x, y) shorthand for function(x, y) requires R 4.1.0 or later:
Map(\(x, y) Map(rbind, x, y), list1, list2)
$list_a
$list_a$df1
letters numbers
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
6 F 6
$list_a$df2
letters numbers
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
6 F 6
7 G 7
8 H 8
9 I 9
10 J 10
$list_b
$list_b$df3
norm
1 0.6370310
2 0.3425583
3 -0.8333353
4 1.5825106
5 -0.2551183
6 1.1983533
7 -1.0771730
8 -1.1065747
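Map() pairs elements by position, not by name. If the two lists have the same names but possibly in a different order (an assumption beyond the question), one sketch is to index list2 by the names of list1 at each level before binding. bind_by_name() is a hypothetical helper name, and the lists below are cut-down versions of the question's data with fixed values instead of rnorm().

```r
# Cut-down versions of the question's lists; list2's elements are deliberately
# in a different order (assumption for illustration)
list1 <- list(
  list_a = list(df1 = data.frame(letters = c("A", "B", "C"), numbers = 1:3)),
  list_b = list(df3 = data.frame(norm = c(0.1, 0.2)))
)
list2 <- list(
  list_b = list(df3 = data.frame(norm = c(0.3, 0.4))),
  list_a = list(df1 = data.frame(letters = c("D", "E", "F"), numbers = 4:6))
)

# Hypothetical helper: reorder list2 (and each sub-list) to match list1's
# names, then rbind the matching data frames
bind_by_name <- function(l1, l2) {
  Map(function(x, y) Map(rbind, x, y[names(x)]), l1, l2[names(l1)])
}

list3 <- bind_by_name(list1, list2)
list3$list_a$df1
```

Elements whose names appear only in list2 are silently dropped by this sketch, since the names of list1 drive the matching.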

select columns after named columns

I have a data frame of the following form in R
First a b c Second a b c
      3 8 1        7 6 8
      5 9 4        2 8 5
I'm trying to write something that selects the three columns following "First" and "Second" and puts them into new data frames named "First" and "Second" respectively. I'm thinking of using the strategy below (where df is the data frame outlined above), but I'm unsure how to make R take the columns that follow the ones I specify:
names <- c("First", "Second")
for (i in names) {
  # i <- (something to specify the 3 columns following df[[i]])
}
One option is to use split.default to split the data.frame into a list of data.frames:
split.default(df, cumsum(names(df) %in% names))
#$`1`
# First a b c
#1 NA 3 8 1
#2 NA 5 9 4
#
#$`2`
# Second a b c
#1 NA 7 6 8
#2 NA 2 8 5
The expression cumsum(...) creates the indices according to which to group and split columns.
Sample data
df <- read.table(text = "First a b c Second a b c
'' 3 8 1 '' 7 6 8
'' 5 9 4 '' 2 8 5", header = TRUE, check.names = FALSE)
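In the split.default output each piece still carries its marker column (First, Second), which holds only NA. A small follow-up sketch, assuming you only want the data columns: drop the first column of each piece and name the list by the markers. The vector is called markers here (a rename of names, to avoid shadowing base::names), and it assumes the first column of df is a marker; any columns before the first marker would land in a group labelled 0.

```r
df <- read.table(text = "First a b c Second a b c
'' 3 8 1 '' 7 6 8
'' 5 9 4 '' 2 8 5", header = TRUE, check.names = FALSE)

markers <- c("First", "Second")

# Each marker column starts a new group of columns
pieces <- split.default(df, cumsum(names(df) %in% markers))

# Drop the marker column (first in each piece) and label the pieces
pieces <- lapply(pieces, function(p) p[-1])
names(pieces) <- markers
pieces$Second
```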
You can get the positions of the names vector within the column names of the data and subset the 3 columns that follow each of them:
names <- c("First", "Second")
inds <- which(names(df) %in% names)
result <- Map(function(x, y) df[x:y], inds + 1, inds + 3)
result
#[[1]]
# a b c
#1 3 8 1
#2 5 9 4
#[[2]]
# a b c
#1 7 6 8
#2 2 8 5
To create separate data frames, name the list and use list2env:
names(result) <- names
list2env(result, .GlobalEnv)
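The Map() answer assumes exactly three columns after each name. If the block widths vary (an assumption beyond the question), one sketch is to take every column up to the one before the next marker, or up to the last column for the final block. The data frame below is a hypothetical layout with blocks of different widths.

```r
# Hypothetical layout: 2 data columns after First, 3 after Second
df <- data.frame(First = NA, a = 1:2, b = 3:4,
                 Second = NA, x = 5:6, y = 7:8, z = 9:10)
markers <- c("First", "Second")

starts <- which(names(df) %in% markers)
# Each block ends just before the next marker; the last block ends at ncol(df)
# (assumes every marker is followed by at least one data column)
ends <- c(starts[-1] - 1, ncol(df))

result <- Map(function(s, e) df[(s + 1):e], starts, ends)
names(result) <- names(df)[starts]
```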

How can I insert blank rows every 3 existing rows in a data frame?

After a web scraping process I get a data frame with the information I need; however, the final Excel format requires that I add a blank row every 3 rows. I have searched the web for help but have not found a solution yet.
With hypothetical data, the structure of my data frame is as follows:
mi_df <- data.frame(
"ID" = rep(1:3,c(3,3,3)),
"X" = as.character(c("a", "a", "a", "b", "b", "b", "c", "c", "c")),
"Y" = seq(1,18, by=2)
)
mi_df
ID X Y
1 1 a 1
2 1 a 3
3 1 a 5
4 2 b 7
5 2 b 9
6 2 b 11
7 3 c 13
8 3 c 15
9 3 c 17
The result I hope for is something like this
ID X Y
1 1 a 1
2 1 a 3
3 1 a 5
4
5 2 b 7
6 2 b 9
7 2 b 11
8
9 3 c 13
10 3 c 15
11 3 c 17
If the row index used to subset a data frame contains NA, the output has an all-NA row at that position. So the goal is to build an index vector like 1 2 3 NA 4 5 6 NA ... and use it to subset the rows of mi_df.
cut <- rep(1:(nrow(mi_df)/3), each = 3)
mi_df[sapply(split(1:nrow(mi_df), cut), c, NA), ]
# ID X Y
# 1 1 a 1
# 2 1 a 3
# 3 1 a 5
# NA NA <NA> NA
# 4 2 b 7
# 5 2 b 9
# 6 2 b 11
# NA.1 NA <NA> NA
# 7 3 c 13
# 8 3 c 15
# 9 3 c 17
# NA.2 NA <NA> NA
If nrow(mi_df) is not a multiple of 3, then the following is a general solution:
# Version 1
cut <- rep(1:ceiling(nrow(mi_df)/3), each = 3, length.out = nrow(mi_df))
mi_df[Reduce(c, lapply(split(1:nrow(mi_df), cut), c, NA)), ]
# Version 2
cut <- rep(1:ceiling(nrow(mi_df)/3), each = 3, length.out = nrow(mi_df))
mi_df[Reduce(function(x, y) c(x, NA, y), split(1:nrow(mi_df), cut)), ]
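To make the index trick concrete for a row count that is not a multiple of 3, here is a minimal sketch with a hypothetical 7-row data frame d: build the index vector first, inspect it, then subset. Rows whose index is NA come back as all-NA rows.

```r
# Hypothetical 7-row data frame (not a multiple of 3)
d <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3), Y = seq(1, 13, by = 2))

n <- nrow(d)
grp <- rep(seq_len(ceiling(n / 3)), each = 3, length.out = n)

# Insert NA between groups only (same idea as Version 2)
idx <- Reduce(function(x, y) c(x, NA, y), split(seq_len(n), grp))
# idx is 1 2 3 NA 4 5 6 NA 7

padded <- d[idx, ]
```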
Don't worry about the NA rows in the output: some functions that write data to an Excel file have an optional argument that controls whether NA values are written as strings or left empty. E.g.
library(openxlsx)
write.xlsx(df, "test.xlsx", keepNA = FALSE) # df is the padded data frame; keepNA defaults to FALSE, so NA cells stay empty
tmp <- split(mi_df, rep(1:(nrow(mi_df) / 3), each = 3))
# or split(mi_df, ggplot2::cut_width(seq_len(nrow(mi_df)), 3, center = 2))
do.call(rbind, lapply(tmp, function(x) { x[4, ] <- NA; x }))
ID X Y
1.1 1 a 1
1.2 1 a 3
1.3 1 a 5
1.4 NA <NA> NA
2.4 2 b 7
2.5 2 b 9
2.6 2 b 11
2.4.1 NA <NA> NA
3.7 3 c 13
3.8 3 c 15
3.9 3 c 17
3.4 NA <NA> NA
You can make empty rows like the ones you show by assigning an empty string ("") instead of NA, but this converts your columns to character, so I wouldn't recommend it.
My recommendation is somewhat different from all the other answers: don't make a mess of your dataset inside R. Use the existing packages to write to designated rows in an Excel workbook. For example, with the package XLConnect, the method writeWorksheet (called from writeWorksheetToFile) includes these arguments:
object: The workbook to write to
data: Data to write
sheet: The name or index of the sheet to write to
startRow: Index of the first row to write to. The default is startRow = 1.
startCol: Index of the first column to write to. The default is startCol = 1.
So if you simply set up a loop that writes 3 rows of your data file at a time, then moves the row index down by 4 and writes the next 3 rows, etc., you're all set.
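The same "write in chunks, leave the data alone" idea can be sketched in base R. This swaps the Excel workbook for a plain CSV file so that no extra package is needed (an assumption; for Excel you would use the package's own start-row argument instead): write the header, then each 3-row chunk with write.table(append = TRUE), with an empty line between chunks. The output path is a placeholder.

```r
mi_df <- data.frame(
  ID = rep(1:3, each = 3),
  X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Y = seq(1, 18, by = 2)
)

out_file <- tempfile(fileext = ".csv")  # placeholder path

# Header line only (a zero-row data frame writes just the column names)
write.table(mi_df[0, ], out_file, sep = ",", row.names = FALSE)

# Row indices grouped in chunks of 3
chunks <- split(seq_len(nrow(mi_df)), ceiling(seq_len(nrow(mi_df)) / 3))

for (k in seq_along(chunks)) {
  if (k > 1) cat("\n", file = out_file, append = TRUE)  # blank separator line
  write.table(mi_df[chunks[[k]], ], out_file, sep = ",",
              row.names = FALSE, col.names = FALSE, append = TRUE)
}
```

The data frame itself is never modified, so its column types stay numeric.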
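Here's one method. It splits the data frame into a list by ID, adds an empty row to each piece, then binds the list back into a single data frame. Here rep("", 3) supplies one empty row per ID group, and, as noted above, it converts all columns to character.
mi_df2 <- do.call(rbind, Map(rbind, split(mi_df, mi_df$ID), rep("", 3)))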
rownames(mi_df2) <- NULL

R - split list every x items

I have data to analyse that is presented in the form of a list (just one row and MANY columns).
A B C D E F G H I
1 2 3 4 5 6 7 8 9
Is there a way to tell R to split this list every x items and get something like the output below (columns C to I hold the same kind of values as A and B)?
A B
1 2
3 4
5 6
7 8
9
If the number of columns is a multiple of 'x', then we unlist the dataset, and use matrix to create the expected output.
as.data.frame(matrix(unlist(df1), ncol=2, dimnames=list(NULL, c("A", "B")) , byrow=TRUE))
If the number of columns is not a multiple of 'x', then
x <- 2
gr <- as.numeric(gl(ncol(df1), x, ncol(df1)))
lst <- split(unlist(df1), gr)
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
# A B
# 1 1 2
# 2 3 4
# 3 5 6
# 4 7 8
# 5 9 NA
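The two cases above can be folded into one helper. A sketch, where split_every() is a hypothetical name and df1 is rebuilt from the question's example: pad the unlisted values with NA up to a multiple of x, then fill a matrix row by row. Column names are taken from the first x column names, which matches the expected A B output.

```r
# Hypothetical helper: reshape a one-row data frame into rows of x values
split_every <- function(df1, x) {
  v <- unlist(df1, use.names = FALSE)
  n_rows <- ceiling(length(v) / x)
  length(v) <- n_rows * x  # pads with NA when ncol(df1) is not a multiple of x
  m <- matrix(v, ncol = x, byrow = TRUE,
              dimnames = list(NULL, names(df1)[seq_len(x)]))
  as.data.frame(m)
}

# One row, columns A to I holding 1 to 9 (as in the question)
df1 <- as.data.frame(t(1:9))
names(df1) <- LETTERS[1:9]

res <- split_every(df1, 2)
```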

Convert a matrix with dimnames into a long format data.frame

Hoping there's a simple answer here but I can't find it anywhere.
I have a numeric matrix with row names and column names:
# 1 2 3 4
# a 6 7 8 9
# b 8 7 5 7
# c 8 5 4 1
# d 1 6 3 2
I want to melt the matrix to a long format, with the values in one column and matrix row and column names in one column each. The result could be a data.table or data.frame like this:
# col row value
# 1 a 6
# 1 b 8
# 1 c 8
# 1 d 1
# 2 a 7
# 2 b 7
# 2 c 5
# 2 d 6
...
Any tips appreciated.
Use melt from reshape2:
library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m
Var1 Var2 value
1 1 a 1
2 2 a 2
3 3 a 3
4 4 a 4
...
The as.table and as.data.frame functions together will do this:
> m <- matrix( sample(1:12), nrow=4 )
> dimnames(m) <- list( One=letters[1:4], Two=LETTERS[1:3] )
> as.data.frame( as.table(m) )
One Two Freq
1 a A 7
2 b A 2
3 c A 1
4 d A 5
5 a B 9
6 b B 6
7 c B 8
8 d B 10
9 a C 11
10 b C 12
11 c C 3
12 d C 4
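The as.table() answer names the value column Freq by default. A sketch, assuming you want the column names from the question: the responseName argument of as.data.frame() for tables sets that name, and the names of the dimnames become the other column names. The matrix below reuses the first two columns of the question's matrix, with fixed values instead of sample() so the result is reproducible.

```r
# First two columns of the question's matrix, with named dimnames
m <- matrix(c(6, 8, 8, 1, 7, 7, 5, 6), nrow = 4,
            dimnames = list(row = c("a", "b", "c", "d"), col = c("1", "2")))

# responseName renames Freq; stringsAsFactors = FALSE keeps labels as character
long <- as.data.frame(as.table(m), responseName = "value",
                      stringsAsFactors = FALSE)
long
```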
Assuming 'm' is your matrix...
data.frame(col = rep(colnames(m), each = nrow(m)),
row = rep(rownames(m), ncol(m)),
value = as.vector(m))
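This relies on R storing a matrix in column-major order: as.vector(m) runs down each column in turn, which is why the column names are repeated with each = nrow(m) while the row names are recycled ncol(m) times. It executes very fast even on a large matrix, and it also shows how a matrix is laid out, how to access things in it, and how to construct your own vectors.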
A modification that doesn't require you to know anything about the storage order, and that easily extends to higher-dimensional arrays if you use the dimnames and slice.index functions:
data.frame(row=rownames(m)[as.vector(row(m))],
col=colnames(m)[as.vector(col(m))],
value=as.vector(m))
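The answer above says the row()/col() idea extends to higher-dimensional arrays through dimnames() and slice.index(). A sketch of that generalization for a hypothetical 3-D array: slice.index(a, k) gives, for every cell, its index along dimension k, which is then used to look up the k-th set of dimnames. The array and its dimension names are made up for illustration.

```r
# Hypothetical 2 x 3 x 2 array with named dimnames
a <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(row = c("x", "y"),
                           col = c("p", "q", "s"),
                           layer = c("u", "v")))

dn <- dimnames(a)

# One label column per dimension, looked up via each cell's index along it
cols <- lapply(seq_along(dn), function(k) dn[[k]][as.vector(slice.index(a, k))])
names(cols) <- names(dn)

long <- data.frame(cols, value = as.vector(a), stringsAsFactors = FALSE)
```

For a 2-D matrix, slice.index(m, 1) and slice.index(m, 2) carry the same information as row(m) and col(m), so this reduces to the answer above.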
