Doing something similar to melt to an R dataframe [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
I've got a dataframe like this. The first column is numeric, and the second column is a comma-separated list (character):
id numbers
1 2,4,5
2 1,4,6
3 NA
4 NA
5 5,1,2
In essence, I want to "melt" the dataframe, similar to what the reshape package does, so that the output is a dataframe that looks like this:
id numbers
1 2
1 4
1 5
2 1
2 4
2 6
3 NA
4 NA
5 5
5 1
5 2
With the reshape2 package, however, each number would have to sit in its own column, which takes up too much storage space if there are many numbers. That is why I have opted to store the numbers as a comma-separated list, but melt no longer works with this setup.
Can you recommend the most efficient way to achieve the transformation from the input dataframe to output dataframe?

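The way I would do it is to create a data.frame for each row and store them in a list, where df is your initial data.frame. Note that strsplit (not split) is the function that breaks a string on a separator:
l = list()
for (j in seq_len(nrow(df))) {
  # strsplit returns a list, so [[1]] extracts the vector of pieces;
  # as.character guards against numbers being stored as a factor
  l[[j]] = data.frame(id = df$id[[j]],
                      numbers = strsplit(as.character(df$numbers[[j]]), ',')[[1]])
}
Afterwards, you can stack all list elements into a single data.frame using plyr::ldply with the 'data.frame' option, or with do.call(rbind, l) in base R.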
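Since the question asks for an efficient approach, here is a minimal loop-free sketch in base R. It assumes the input looks exactly like the example in the question (the data frame is rebuilt here so the block runs on its own) and that lengths() is available, which requires R 3.2.0 or later. The names parts and out are only illustrative.

```r
# Rebuild the example input from the question
df <- data.frame(id = 1:5,
                 numbers = c("2,4,5", "1,4,6", NA, NA, "5,1,2"),
                 stringsAsFactors = FALSE)

# Split every string at once; an NA string yields a single NA piece
parts <- strsplit(df$numbers, ",", fixed = TRUE)

# Repeat each id once per piece and flatten the pieces into one column
out <- data.frame(id = rep(df$id, lengths(parts)),
                  numbers = as.integer(unlist(parts)))
print(out)
```

This avoids building one small data.frame per row, which tends to be the slow part of the loop version on large inputs.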

Related

How to add row of a dataframe in r if I have created named list having same name as columns of dataframe? [duplicate]

This question already has answers here:
How to add a row to a data frame in R?
(16 answers)
Closed 2 years ago.
df <-data.frame(x=1:2,y=5:6)
row <- list(x=10,y=20)
add_row(df,row)
Error: New rows can't add columns.
x Can't find column row in .data.
Run rlang::last_error() to see where the error occurred.
but
add_row(df,x=10,y=20)
x y
1 1 5
2 2 6
3 10 20
works. How can I add the named list to df?
Using rbind, as suggested by @DanY, is an easy solution. To use add_row, you can change row to a tibble or data.frame:
library(tibble)
df <-data.frame(x=1:2,y=5:6)
row <- tibble(x=10,y=20)
add_row(df, row)
# x y
#1 1 5
#2 2 6
#3 10 20
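For reference, the rbind route mentioned above might look like the following minimal sketch. It uses only base R and keeps row as the named list from the question; the name out is illustrative.

```r
df <- data.frame(x = 1:2, y = 5:6)
row <- list(x = 10, y = 20)

# rbind on a data frame accepts a list whose names match the columns
out <- rbind(df, row)
print(out)
```

Note that the columns become double rather than integer here, because 10 and 20 are doubles.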

how to convert large column using factor [duplicate]

This question already has answers here:
Convert data.frame column format from character to factor
(8 answers)
Closed 3 years ago.
I'm writing machine learning code for my dataset, which has a hotel column. The hotel column contains 300 hotel names. For data preprocessing, I saw that we have to use factor. Is there any easy way to convert it, given that there are so many values for the levels?
It's simple: use the as.factor() function to convert the column from character to factor.
Here's a sample:
# Sample data
data
a b
1 A 1
2 B 2
3 C 3
4 A 4
5 B 5
class(data$a)
[1] "character"
# Converting to factor
data$a <- as.factor(data$a)
# Results
class(data$a)
[1] "factor"
summary(data$a)
A B C
2 2 1
If you load the CSV data with read.csv, note that columns holding string values were loaded as factors by default only before R 4.0.0; from R 4.0.0 onwards they stay character unless you pass stringsAsFactors = TRUE.
In either case, you can use the factor() function to convert a column to a factor:
df$a <- factor(df$a)
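To illustrate that the number of levels does not change the call, here is a small sketch with 300 made-up hotel names. The data frame hotels and the names Hotel_1 to Hotel_300 are invented for the example and are not from the question.

```r
# Hypothetical data: 300 distinct hotel names, each appearing twice
hotels <- data.frame(hotel = rep(paste0("Hotel_", 1:300), times = 2),
                     stringsAsFactors = FALSE)

# One call converts the whole column, however many levels there are
hotels$hotel <- factor(hotels$hotel)
nlevels(hotels$hotel)

# Integer codes behind the factor, if a model needs numeric input
codes <- as.integer(hotels$hotel)
head(codes)
```

The levels are detected automatically from the distinct values, so there is no need to list the 300 names by hand.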

R: Extracting non-duplicated values from vector (not keeping one value for duplicates) [duplicate]

This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
How can I remove all duplicates so that NONE are left in a data frame?
(3 answers)
Closed 5 years ago.
I would like to keep the non-duplicated values from a vector, but without retaining one element from duplicated values. unique() does not work for this. Neither would duplicated().
For example:
> test <- c(1,1,2,3,4,4,4,5,6,6,7,8,9,9)
> unique(test)
[1] 1 2 3 4 5 6 7 8 9
Whereas I would like the result to be: 2,3,5,7,8
Any ideas on how to approach this? Thank you!
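We can use duplicated in both directions, so that every member of a duplicated group is flagged, and then drop the flagged elements: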
test[!(duplicated(test)|duplicated(test, fromLast=TRUE))]
#[1] 2 3 5 7 8
You can use ave to count the length of sub-groups divided by unique values in test and retain only the ones whose length is 1 (the ones that have no duplicates)
test[ave(test, test, FUN = length) == 1]
#[1] 2 3 5 7 8
If test consists of character values, use seq_along as the first argument of ave:
test[ave(seq_along(test), test, FUN = length) == 1]
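To make the first approach easier to follow, the sketch below splits it into its two passes and adds a table()-based variant for comparison. The intermediate names (dup_forward, dup_backward, singles, counts, singles_tab) are illustrative only.

```r
test <- c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9)

# Forward pass flags every repeat except the first occurrence
dup_forward <- duplicated(test)
# Backward pass flags every repeat except the last occurrence
dup_backward <- duplicated(test, fromLast = TRUE)

# A value that is flagged in neither pass occurs exactly once
singles <- test[!(dup_forward | dup_backward)]
print(singles)

# Alternative: count occurrences and keep the values seen once
counts <- table(test)
singles_tab <- as.numeric(names(counts)[counts == 1])
print(singles_tab)
```

The table() variant returns the values in sorted order rather than in their original order, which makes no difference for this input but may matter for unsorted vectors.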

How can I merge the different elements of the list? [duplicate]

This question already has answers here:
Paste multiple columns together
(11 answers)
Concatenate row-wise across specific columns of dataframe
(3 answers)
Closed 7 years ago.
I have a list/dataframe such as
a b c d e f g VALUE
1 0 1 0 0 0 1 934
What I want to do is print 1010001 without using a for loop. So basically, take those integers as strings and merge them while printing.
I would define a function that drops the last value and pastes all the other elements together, and then use "apply" to run it on every row of the dataframe:
cc <- data.frame(a=1,b=0,c=1,d=0,e=0,f=0,g=1,VALUE=934)
# This function does all the work needed for one row.
myfuns <- function(x, collapse=""){
  x <- x[-length(x)] # drop the last element (VALUE)
  paste(x, collapse=collapse) # paste the remaining integers together
}
# the second argument "MARGIN=1" means apply this function to each row
apply(cc, MARGIN=1, myfuns) # output: "1010001"
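As a possible alternative to apply, the columns can be pasted together directly, which works row-wise on the whole data frame in one call. This is a sketch under the assumption that VALUE is the only column to exclude; the names bits and merged are illustrative.

```r
cc <- data.frame(a = 1, b = 0, c = 1, d = 0, e = 0, f = 0, g = 1, VALUE = 934)

# Keep every column except VALUE, staying in data frame form
bits <- cc[, setdiff(names(cc), "VALUE"), drop = FALSE]

# paste0 is vectorised, so passing the columns as arguments pastes
# them together row by row; unname() prevents column names from being
# matched against paste0's own arguments
merged <- do.call(paste0, unname(as.list(bits)))
print(merged)
```

With more than one row, merged would hold one string per row.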

filter R data frame with one column - keep data frame format [duplicate]

This question already has an answer here:
Filtering single-column data frames
(1 answer)
Closed 7 years ago.
I am looking for a simple way to display a subset of a one-column data frame.
Let's assume I have a data frame:
> df <- data.frame(a = 1:100)
Now, I only need the first 10 rows. If I subset it by index, I'll get a result vector instead of a data frame:
> df[1:10,]
[1] 1 2 3 4 5 6 7 8 9 10
I tried to use 'subset' but not using the 'subset'-parameter will result in an error (only for one-column-data-frames?):
subset(df[1:10,])
Error in subset.default(df[1:10, ]) :
argument "subset" is missing, with no default
There should be a very easy solution to achieve a subset (still a data frame) filtered by row index, no?
I am looking for a solution with basic R commands (it should not depend on any special library).
You can use drop=FALSE, which prevents R from dropping the dimensions of the result:
df[1:10, , drop=FALSE]
a
1 1
2 2
3 3
4 4
5 5
...
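For subset you need to add a condition as the second argument, and pass the data frame itself rather than df[1:10,], which is already a vector; for example, subset(df, a <= 10).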
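The options can be compared side by side in a short base R sketch. The result names (by_index, by_head, by_cond) are illustrative, and the condition a <= 10 only matches the first 10 rows because of how this particular example data is built.

```r
df <- data.frame(a = 1:100)

# Index with drop = FALSE so the single column is not simplified to a vector
by_index <- df[1:10, , drop = FALSE]

# head() keeps the data frame structure as well
by_head <- head(df, 10)

# subset() needs the data frame plus a logical condition
by_cond <- subset(df, a <= 10)

str(by_index)
```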
