String split into duplicate rows [duplicate] - r

This question already has an answer here:
Split parts of strings into a list column and then make a vector column
(1 answer)
Closed 9 years ago.
Given the following sample dataset:
col1 <- c("X1","X2","X3|X4|X5","X6|X7")
col2 <- c("5","8","1","4")
dat <- data.frame(col1,col2)
How can I split col1 on | and put each piece in its own row, duplicating the corresponding col2 value? Here's the data frame I'd like to end up with:
col1 col2
X1 5
X2 8
X3 1
X4 1
X5 1
X6 4
X7 4
I need a solution that can accommodate multiple columns like col2 that also need to be duplicated.

Split the character strings, then repeat the other columns according to how many pieces each row produced.
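# fixed = TRUE treats "|" literally instead of as the regex OR operator
y <- strsplit(as.character(dat[, 1]), "|", fixed = TRUE)
# repeat each col2 value once per piece of its col1 string
data.frame(col1 = unlist(y), col2 = rep(dat[, 2], sapply(y, length)))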
col1 col2
1 X1 5
2 X2 8
3 X3 1
4 X4 1
5 X5 1
6 X6 4
7 X7 4
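If you need to repeat all columns except the first, repeat the row indices instead (drop = FALSE keeps a data frame even when only one other column remains, and row.names = NULL resets the row names):
data.frame(col1 = unlist(y), dat[rep(seq_len(nrow(dat)), lengths(y)), -1, drop = FALSE],
           row.names = NULL)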
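The same idea can be wrapped in a reusable function. This is a minimal base R sketch, assuming a hypothetical helper named split_to_rows() (not part of any package) and an extra column col3 invented only to show that every other column is carried along, whatever their number.

```r
# Hypothetical helper (not from any package): split column `col` of `df`
# on the literal separator `sep` and duplicate all other columns.
split_to_rows <- function(df, col, sep) {
  # fixed = TRUE: treat sep as a literal string, so "|" is not a regex OR
  pieces <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  # repeat each row index once per piece that row produced
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  # overwrite the split column with the individual pieces, in order
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

col1 <- c("X1", "X2", "X3|X4|X5", "X6|X7")
col2 <- c("5", "8", "1", "4")
col3 <- c("a", "b", "c", "d")  # illustrative extra column (assumption)
dat <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)

res <- split_to_rows(dat, "col1", "|")
print(res)
```

Indexing rows with rep() keeps the original column order and types, so no column has to be named explicitly.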


How to partition to multiple .csv from df based on whitespace row?

I'm working with a database that has a timestamp, 3 numeric vectors, and a character vector.
Each "set" of data is delimited by a separator row in which every column contains "\t\r\n". I need to save each run of rows between separators as its own .csv file. There are about 370 such sets in my dataset.
For example,
library(dplyr)
data <- data.frame(x1 = 1:4,
x2 = 4:1,
x3 = 3,
x4 = c("text", "no text", "example", "hello"))
new_row <- c("\t\r\n", "\t\r\n", "\t\r\n", "\t\r\n")
data1 <- rbind(data, new_row)
data2 <- data.frame(x1 = 1:4,
x2 = 4:1,
x3 = 4,
x4 = c("text", "no text", "example", "hello"))
data2 <- rbind(data2, new_row)
data3 <- rbind(data1, data2)
View(data3)
This is what my data set looks like (without the timestamp). I need each block of consecutive rows delimited by the rows full of \t\r\n to be exported as an individual .csv.
I'm doing text analysis. Each group of rows, with highly variable group size, represents a thread on a different subject. I need to analyze these individual threads.
What is the best way to go about doing this? I haven't had this problem before.
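Flag the separator rows (checking x4 is enough here, since separator rows contain "\t\r\n" in every column), then number the groups between them with cumsum():
ind <- grepl("\t", data3$x4)          # TRUE for separator rows
ind <- replace(cumsum(ind), ind, -1)  # group number; separators get -1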
ind
# [1] 0 0 0 0 -1 1 1 1 1 -1
data4 <- split(data3, ind)
data4
# $`-1`
# x1 x2 x3 x4
# 5 \t\r\n \t\r\n \t\r\n \t\r\n
# 10 \t\r\n \t\r\n \t\r\n \t\r\n
# $`0`
# x1 x2 x3 x4
# 1 1 4 3 text
# 2 2 3 3 no text
# 3 3 2 3 example
# 4 4 1 3 hello
# $`1`
# x1 x2 x3 x4
# 6 1 4 4 text
# 7 2 3 4 no text
# 8 3 2 4 example
# 9 4 1 4 hello
Labeling the "\t\r\n" rows -1 keeps them out of their neighbouring groups; since cumsum(ind) never goes below 0, -1 cannot clash with a real group label. The resulting `-1` element holds only the separator rows, so you can drop it.
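From here, drop the separator-only `-1` group (it sorts first, because split() orders the numeric labels) and export the rest:
data4 <- data4[-1]
# write.csv(x, file) is called once per group, pairing each frame with a file name
ign <- Map(write.csv, data4, sprintf("file_%03d.csv", seq_along(data4)))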
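A self-contained base R sketch of the whole pipeline follows. Unlike the answer, it treats a row as a separator only when every column equals "\t\r\n" and simply drops those rows before splitting. The thread_NNN.csv file names and writing into tempdir() are assumptions for illustration; the sample data mirrors the question but is built directly as character columns.

```r
sep <- "\t\r\n"
# Rebuild sample data similar to the question's data3 (all character columns)
make_block <- function(x3) {
  data.frame(x1 = as.character(1:4), x2 = as.character(4:1),
             x3 = as.character(x3),
             x4 = c("text", "no text", "example", "hello"),
             stringsAsFactors = FALSE)
}
sep_row <- data.frame(x1 = sep, x2 = sep, x3 = sep, x4 = sep,
                      stringsAsFactors = FALSE)
data3 <- rbind(make_block(3), sep_row, make_block(4), sep_row)

# A row is a separator only if every column equals the separator string
is_sep <- apply(data3 == sep, 1, all)
# cumsum() increments after each separator, giving one label per thread
grp <- cumsum(is_sep)
blocks <- split(data3[!is_sep, ], grp[!is_sep])

# Write one CSV per thread (hypothetical names, temporary directory)
files <- file.path(tempdir(), sprintf("thread_%03d.csv", seq_along(blocks)))
invisible(Map(function(b, f) write.csv(b, f, row.names = FALSE),
              blocks, files))
```

Dropping the separator rows before split() avoids the extra `-1` group entirely, so nothing has to be removed afterwards.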

How to insert a vector into one row of a data frame?

I'm trying to insert a character vector into a single row of a data frame rather than creating a separate row for each value. What I have so far:
a<- as.character(c(1:10))
data_frame <- as.data.frame(a)
Instead of 10 observations of 1 variable, I want 1 observation of 1 variable, where that observation holds "1", "2", ..., "10", with the values separated by commas.
You can simply transpose the vector with t(), which gives one row with one column per value:
df = data.frame(t(a))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 2 3 4 5 6 7 8 9 10
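Alternatively, wrap the vector in a list and protect it with I(), which inhibits the interpretation/conversion of objects (type ?I in the console for details). The result has one observation of one variable whose value is the whole vector: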
data.frame(test = I(list(a)))
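A small sketch comparing two ways to get one observation of one variable. The I(list(a)) list column comes from the answer above; the paste(collapse = ",") variant is an alternative reading of "separated by a comma" (a single string rather than a vector stored in a cell) and is an assumption, not something the answers proposed.

```r
a <- as.character(1:10)

# List column protected by I(): one cell that holds the whole vector
df_list <- data.frame(test = I(list(a)))

# Alternative (assumption): one cell holding a comma-separated string
df_str <- data.frame(test = paste(a, collapse = ","), stringsAsFactors = FALSE)

str(df_list)
str(df_str)
```

The list column keeps the values separate and recoverable with df_list$test[[1]], while the string version is easier to write to a CSV.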

Creating subset based on string conditions

Given a data frame like this:
df_in <- data.frame(x = c('x1','x2','x3','x4'),
col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234','NA','https://bbcnews.com/search'),
col4 = c('NA', 'https://www.youtube/com','NA', 'www.youtube.com/searcht'))
Example of dataframe input as printed in the console:
x col1 col2 col3 col4
1 x1 http://youtube.com/something https://google.com http://www.bbcnews.com?id=321 NA
2 x2 NA http://www.bbcnews2.com?id=321 http://google.com?id=1234 https://www.youtube/com
3 x3 https://www.yahooexample.com NA NA NA
4 x4 https://www.yahooexample2.com https://google.com/text https://bbcnews.com/search www.youtube.com/searcht
I would like to create a data frame that keeps only the entries meeting a specific condition. For example, I would like to keep only the values that contain "google", "youtube" or "bbc" in their string, dropping rows with no match.
Example of expected output:
df_out <- data.frame(x = c('x1','x2','x4'),
col1new = c('http://youtube.com/something', 'http://www.bbcnews2.com?id=321', 'https://google.com/text'),
col2new = c('https://google.com', 'http://google.com?id=1234', 'https://bbcnews.com/search'),
col3new = c('http://www.bbcnews.com?id=321', 'https://www.youtube/com', 'www.youtube.com/searcht'))
Example of dataframe output as printed in the console:
x col1new col2new col3new
1 x1 http://youtube.com/something https://google.com http://www.bbcnews.com?id=321
2 x2 http://www.bbcnews2.com?id=321 http://google.com?id=1234 https://www.youtube/com
3 x4 https://google.com/text https://bbcnews.com/search www.youtube.com/searcht
We can create a logical condition with grepl() that keeps the rows where at least one entry has one of the patterns immediately after http:// or https://:
i1 <- Reduce('|', lapply(df_in[-1], grepl, pattern= "https?://(google|youtube|bbc)"))
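Then loop through the rows of the subset data and keep the links that match google/youtube/bbc. This relies on every kept row having the same number of matches (three here); otherwise apply() returns a list instead of a matrix and t() will not give the intended layout.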
tmp <- t(apply(df_in[i1,-1], 1, function(x) x[grepl("(google|youtube|bbc)", x)]))
colnames(tmp) <- paste0('col', seq_len(ncol(tmp)), "new")
Finally, cbind() the result with the matching rows of the first column:
cbind(df_in[i1, 1, drop = FALSE], tmp)
# x col1new col2new col3new
#1 x1 http://youtube.com/something https://google.com http://www.bbcnews.com?id=321
#2 x2 http://www.bbcnews2.com?id=321 http://google.com?id=1234 https://www.youtube/com
#4 x4 https://google.com/text https://bbcnews.com/search www.youtube.com/searcht
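If rows can have different numbers of matches, one option is to pad the shorter rows with NA before binding. This is a sketch under two assumptions: a plain substring match for "google|youtube|bbc" anywhere in the value (looser than the https?:// filter above, so it would also match e.g. "notgoogle.example"), and stringsAsFactors = FALSE for the input.

```r
df_in <- data.frame(x = c('x1','x2','x3','x4'),
  col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
  col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
  col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234','NA','https://bbcnews.com/search'),
  col4 = c('NA', 'https://www.youtube/com','NA', 'www.youtube.com/searcht'),
  stringsAsFactors = FALSE)

pattern <- "google|youtube|bbc"  # assumed: substring match anywhere
m <- as.matrix(df_in[-1])        # character matrix of the link columns

# For each row, keep the matching links in their original column order
matches <- lapply(seq_len(nrow(m)),
                  function(i) unname(m[i, grepl(pattern, m[i, ])]))
n <- lengths(matches)
width <- max(n)

# Pad shorter rows with NA (length<- extends with NA) and stack them
padded <- do.call(rbind, lapply(matches[n > 0],
                                function(v) { length(v) <- width; v }))
colnames(padded) <- paste0("col", seq_len(width), "new")

# Keep only rows with at least one match
df_out <- cbind(df_in[n > 0, 1, drop = FALSE], padded)
df_out
```

On the question's data every kept row has exactly three matches, so the padding is a no-op there and the output matches the expected df_out.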

Partial reordering of data frame columns do not move the column's names [duplicate]

This question already has answers here:
Is it possible to swap columns around in a data frame using R?
(8 answers)
Closed 6 years ago.
When I try to partially reorder columns using "[", the values are swapped but the column names do not move. See the example below:
x = data.frame(x1 = c(1,2,3), x2 = c(2,3,4), x3 = c("e","e","e"), x4 = c("f","f","f"))
x
#x1 x2 x3 x4
#1 2 e f
#2 3 e f
#3 4 e f
x[, c(3,4)] = x[, c(4,3)]
#x1 x2 x3 x4
#1 2 f e
#2 3 f e
#3 4 f e
Any idea why the column names are not moving, and how to solve this simply?
Reorder the whole data frame by column index instead; this moves the names along with the values:
x <- x[,c(1,2,4,3)]
Another option is cbind():
x1 <- cbind(x[1:2], x[4:3])
x1
# x1 x2 x4 x3
#1 1 2 f e
#2 2 3 f e
#3 3 4 f e
Or we can use numeric ordering directly, as in x[, c(1, 2, 4, 3)].
As for why: the assignment changes only the values, not the column names. The values are written into columns 3 and 4 by position, and each column name stays fixed to its position, so swapping the values does not swap the names.
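A short sketch of reordering whole columns by index or by name; in both cases the names travel with the values because the subset selects columns rather than assigning into fixed positions. The by_index/by_name variable names are illustrative.

```r
x <- data.frame(x1 = c(1, 2, 3), x2 = c(2, 3, 4),
                x3 = c("e", "e", "e"), x4 = c("f", "f", "f"),
                stringsAsFactors = FALSE)

# Select columns in the new order by position ...
by_index <- x[, c(1, 2, 4, 3)]
# ... or by name, which keeps working if column positions change later
by_name <- x[c("x1", "x2", "x4", "x3")]

names(by_index)
```

Selecting by name is usually the safer choice in scripts, since it does not depend on where each column currently sits.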

Removing characters from column value and adding a new letter

I have the following data frame df1. I want to remove "/" from all values in column x2 and add the letter v to the end of each value in x2.
df1
x1 x2
1 aa/bb/cc
2 ff/bb/cc
3 uu/bb/cc
Resulting df2
df2
x1 x2
1 aabbccv
2 ffbbccv
3 uubbccv
You can use gsub to remove the / and paste0 to add the v in each row:
df2 <- transform(df1, x2 = paste0(gsub("/", "", x2, fixed = TRUE), "v"))
df2
# x1 x2
#1 1 aabbccv
#2 2 ffbbccv
#3 3 uubbccv
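The question shows df1 only as printed output, so here is a reproducible sketch that builds it under the assumption that x1 is an integer column and x2 a character column, then applies the same gsub()/paste0() steps with a plain column assignment instead of transform().

```r
df1 <- data.frame(x1 = 1:3,
                  x2 = c("aa/bb/cc", "ff/bb/cc", "uu/bb/cc"),
                  stringsAsFactors = FALSE)

df2 <- df1
# fixed = TRUE: match "/" as a literal string, no regex needed
df2$x2 <- paste0(gsub("/", "", df2$x2, fixed = TRUE), "v")
df2
```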
