Data frame output as a single line

I have a data frame with multiple columns and rows. I want to export it as a .txt file with all values on a single line (i.e. one row), with individual values separated by "," and the rows of the data frame separated by ":".
w<- c(1,5)
x<- c(2,6)
y<- c(3,7)
z<- c(4,8)
df<-data.frame(w,x,y,z)
The output would look like this:
1,2,3,4:5,6,7,8:

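We can combine the data row-wise with apply and then paste the rows together with collapse = ":". Note that toString separates values with ", " (comma and space), so the result contains spaces:
paste0(apply(df, 1, toString), collapse = ":")
#[1] "1, 2, 3, 4:5, 6, 7, 8"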

If you want to write it to a file, use write.table with eol = ":", so that each row ends with ":" instead of a newline:
write.table(df, "df.csv", col.names = FALSE, row.names = FALSE, sep = ",", eol = ":")
If you want the output in R, you can use do.call() and paste():
do.call(paste, c(df, sep = ",", collapse = ":"))
#[1] "1,2,3,4:5,6,7,8"
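To check that the file really contains a single line in the requested shape, the write.table call can be pointed at a temporary file and read back. This is only a minimal sketch: the temporary path and the names out_file and written are illustrative and not part of the original answer.

```r
# Rebuild the example data frame from the question
df <- data.frame(w = c(1, 5), x = c(2, 6), y = c(3, 7), z = c(4, 8))

# Write to a temporary .txt file (illustrative path);
# eol = ":" ends every row with ":" instead of a newline
out_file <- tempfile(fileext = ".txt")
write.table(df, out_file, col.names = FALSE, row.names = FALSE,
            sep = ",", eol = ":")

# Read the file back; warn = FALSE because the file has no final newline
written <- readLines(out_file, warn = FALSE)
written
```

The file should then hold the single line 1,2,3,4:5,6,7,8: including the trailing ":" shown in the question.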

We can use str_c from stringr together with reduce from purrr:
library(stringr)
library(dplyr)
library(purrr)
df %>%
  reduce(str_c, sep = ",") %>%
  str_c(collapse = ":")
#[1] "1,2,3,4:5,6,7,8"

Related

How to open this text file properly in R?

So I have this line of code in a file:
{"id":53680,"title":"daytona1-usa"}
But when I try to open it in R using this:
df <- read.csv("file1.txt", strip.white = TRUE, sep = ":")
It produces columns like this:
Col1: X53680.title
Col2: daytona1.usa.url
What I want to do is open the file so that the columns are like this:
Col1: 53680
Col2: daytona1-usa
How can I do this in R?
Edit: The actual file I'm reading in is this:
{"id":53203,"title":"bbc-moment","url":"https:\/\/wow.bbc.com\/bbc-ids\/live\/enus\/211\/53203","type":"audio\/mpeg"},{"id":53204,"title":"shg-moment","url":"https:\/\/wow.shg.com\/shg-ids\/live\/enus\/212\/53204","type":"audio\/mpeg"},{"id":53205,"title":"was-zone","url":"https:\/\/wow.was.com\/was-ids\/live\/enus\/213\/53205","type":"audio\/mpeg"},{"id":53206,"title":"xx1-zone","url":"https:\/\/wow.xx1.com\/xx1-ids\/live\/enus\/214\/53206","type":"audio\/mpeg"},], WH.ge('zonemusicdiv-zonemusic'), {loop: true});
After reading it in, I remove the first column and then every 3rd and 4th column with this:
# Delete the first column
df <- df[-1]
# Delete every 3rd and 4th columns
i1 <- rep(seq(3, ncol(df), 4) , each = 2) + 0:1
df <- df[,-i1]
Thank you.
Edit 2:
Adding this fixed it:
df[] <- lapply(df, gsub, pattern = ".title", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ",url", replacement = "", fixed = TRUE)

If the file contains a single JSON object, then
jsonlite::read_json("file1.txt")
# $id
# [1] 53680
# $title
# [1] "daytona1-usa"
If it is instead NDJSON (newline-delimited JSON), then
jsonlite::stream_in(file("file1.txt"), verbose = FALSE)
#      id        title
# 1 53680 daytona1-usa

The answers above would be correct if the data were valid JSON, but they do not work for the data I have, so I ended up with this:
df <- read.csv("file1.txt", header = FALSE, sep = ":", dec = "-")
# Delete the first column
df <- df[-1]
# Delete every 3rd and 4th columns
i1 <- rep(seq(3, ncol(df), 4) , each = 2) + 0:1
df <- df[,-i1]
df[] <- lapply(df, gsub, pattern = ".title", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ",url", replacement = "", fixed = TRUE)
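For input like the line in the question's edit, which is not valid JSON as a whole, another possibility is to pull the id and title values out with regular expressions in base R. The following is a rough sketch under stated assumptions: the sample string is a shortened stand-in for the real file (the URLs are placeholders), and it assumes every record has exactly one "id" and one "title" key, in that order.

```r
# Shortened stand-in for the line in the file (placeholder URLs, trailing JavaScript kept)
txt <- paste0(
  '{"id":53203,"title":"bbc-moment","url":"https:\\/\\/example.invalid\\/a","type":"audio\\/mpeg"},',
  '{"id":53204,"title":"shg-moment","url":"https:\\/\\/example.invalid\\/b","type":"audio\\/mpeg"},',
  "], WH.ge('zonemusicdiv-zonemusic'), {loop: true});"
)

# Digits that directly follow "id":
ids <- regmatches(txt, gregexpr('(?<="id":)[0-9]+', txt, perl = TRUE))[[1]]

# Everything up to the next double quote after "title":"
titles <- regmatches(txt, gregexpr('(?<="title":")[^"]+', txt, perl = TRUE))[[1]]

df <- data.frame(id = as.integer(ids), title = titles, stringsAsFactors = FALSE)
df
```

This avoids deleting columns by position afterwards, but it would need adjusting if a title could contain an escaped double quote.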

How to find names of columns that have non English values? R

I have data, shown in the image, in which some columns contain non-English words. How can I find the names of those columns using R?
The data and the expected result are shown in the image.

First some reproducible data:
df <- data.frame(
Var1 = c("some", "data", "ß", "کابل"),
Var2 = c("کابل", "data", "کابل", "data"),
Var3 = c("some", "data", "more", "data"),
Var4 = c("some", "data", "more", "data")
)
df
The solution first collapses each column into a single string using paste0 and then deselects (-) those columns in which grepl finds non-ASCII characters (used here as a proxy for non-English characters):
df[, -which(grepl("[^ -~]", apply(df, 2, paste0, collapse = " ")))]
  Var3 Var4
1 some some
2 data data
3 more more
4 data data
EDIT:
To get only the names, simply insert the whole statement into names:
names(df[, -which(grepl("[^ -~]", apply(df, 2, paste0, collapse = " ")))])
[1] "Var3" "Var4"
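The question asks for the names of the columns that do contain non-English values, so a variant with a logical index may be closer to what is wanted. This is a hedged sketch rather than the answerer's method: the helper name has_non_ascii is made up for illustration, the character class is a byte-range variant of the one above, and the non-ASCII strings are written as \u escapes so the script does not depend on the file encoding. A logical index also sidesteps the case where -which(...) is empty and would select no columns at all.

```r
# Same data as above, with non-ASCII values written as Unicode escapes
df <- data.frame(
  Var1 = c("some", "data", "\u00df", "\u06a9\u0627\u0628\u0644"),
  Var2 = c("\u06a9\u0627\u0628\u0644", "data", "\u06a9\u0627\u0628\u0644", "data"),
  Var3 = c("some", "data", "more", "data"),
  Var4 = c("some", "data", "more", "data"),
  stringsAsFactors = FALSE
)

# TRUE for every column with at least one character outside the ASCII range
has_non_ascii <- vapply(
  df,
  function(col) any(grepl("[^\\x01-\\x7F]", col, perl = TRUE)),
  logical(1)
)

names(df)[has_non_ascii]   # columns containing non-ASCII values
names(df)[!has_non_ascii]  # columns that are pure ASCII
```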

Base R:
lapply(df, function(x) {
  ifelse(grepl("\\#", x), x, gsub(paste0(c(letters, LETTERS), collapse = "|"), "", x))
})
Return names:
names(df)[sapply(df, function(x) {
  ifelse(grepl("\\#", x), FALSE,
         any(gsub(paste0(c(letters, LETTERS), collapse = "|"), "", x) == ""))
})]

Converting character to number of months

I am working on a dataset that has a column, account_age, in which the age is stored as a character string in the format "1YRS 5MON".
How can I convert it to a number of months?

We can match 'YRS' and 'MON' with gsubfn, replace them with arithmetic operators, and evaluate the resulting expression:
library(gsubfn)
unname(sapply(gsubfn("[A-Z]+", list(YRS = "*12 +", MON = "*1"),
                     df1$col1), function(x) eval(parse(text = x))))
#[1] 17
Another option is to extract the digits and take the sum of products:
library(tidyverse)
map_dbl(str_extract_all(df1$col1, "\\d+"), ~ as.numeric(.x) %*% c(12, 1))
#[1] 17
Or we can remove the letters, read the result with read.table, and take the sum of products:
as.matrix(read.table(text = gsub("[A-Z]+", "", df1$col1),
                     header = FALSE)) %*% c(12, 1)
Data:
df1 <- data.frame(col1 = "1YRS 5MON", stringsAsFactors = FALSE)
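The approaches above assume that every value has both a YRS and a MON part. If some values might have only one of them, a base R sketch along the following lines could be used. The function name to_months and the inputs "2YRS" and "7MON" are assumptions made up for illustration; the question only shows the "1YRS 5MON" form.

```r
# Convert strings such as "1YRS 5MON" to a number of months.
# A missing YRS or MON part is treated as 0 (an assumption, not stated in the question).
to_months <- function(x) {
  grab <- function(unit) {
    # Capture the digits that directly precede the unit (optional spaces allowed)
    m <- regmatches(x, regexec(paste0("([0-9]+) *", unit), x))
    vapply(m, function(p) if (length(p) == 2) as.numeric(p[2]) else 0, numeric(1))
  }
  grab("YRS") * 12 + grab("MON")
}

months <- to_months(c("1YRS 5MON", "2YRS", "7MON"))
months
```

For these three inputs the result should be 17, 24 and 7.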

Paste data frame without changing into factor levels

I have vectors, say a, b, c and d, as below:
a <- c(1,2,3,4)
b <- c("L","L","F","L")
c <- c(11,22,33,44)
d <- c("Y", "N", "Y","Y")
I use paste to get this output (1):
paste(a,b,c,d, sep = "$", collapse = "%")
[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
Then I combine them into a data frame, say df:
df <- data.frame(a,b,c,d)
and get this output (2):
paste(df, sep = "$", collapse = "%")
[1] "c(1, 2, 3, 4)%c(2, 2, 1, 2)%c(11, 22, 33, 44)%c(2, 1, 2, 2)"
My questions are:
(1) Can somebody explain why the elements of df are changed into numbers?
(2) Is there another way to use df to get output (1)?

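paste runs as.character (or something similar internally) on its ... arguments, effectively deparsing each column of the list. The character columns b and d show up as numbers because data.frame converted them to factors (the default before R 4.0.0), and deparsing a factor this way exposes its integer codes. Have a look at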
as.character(df)
# [1] "c(1, 2, 3, 4)" "c(2, 2, 1, 2)" "c(11, 22, 33, 44)" "c(2, 1, 2, 2)"
deparse(df$a)
# [1] "c(1, 2, 3, 4)"
Your code is pasting these values together. To get around this, you can use do.call.
do.call(paste, c(df, sep = "$", collapse = "%"))
# [1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
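To make the do.call step less opaque: c(df, sep = "$", collapse = "%") is a list holding the four columns plus the two named arguments, and do.call passes each element of that list to paste as a separate argument. The sketch below compares it with the spelled-out call; stringsAsFactors = FALSE and the names via_do_call and explicit are additions for illustration only.

```r
a <- c(1, 2, 3, 4)
b <- c("L", "L", "F", "L")
c <- c(11, 22, 33, 44)
d <- c("Y", "N", "Y", "Y")
df <- data.frame(a, b, c, d, stringsAsFactors = FALSE)

# do.call hands every column of df to paste as its own argument
via_do_call <- do.call(paste, c(df, sep = "$", collapse = "%"))

# which is the same as writing the columns out by hand
explicit <- paste(df$a, df$b, df$c, df$d, sep = "$", collapse = "%")

identical(via_do_call, explicit)
```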

Here is an alternative to the approach you used:
df_call <- c(df, sep="$")
paste(do.call(paste, df_call), collapse="%")
[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"

You cannot apply paste directly to a data frame here. To get the desired output, you need to apply paste at two levels:
paste(apply(df, 1, function(x) paste(x, collapse = "$")), collapse = "%")
#[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
Here the apply command creates a vector with one element per row
apply(df, 1, function(x) paste(x, collapse = "$"))
#[1] "1$L$11$Y" "2$L$22$N" "3$F$33$Y" "4$L$44$Y"
and the outer paste command merges these together with the collapse argument set to "%".

Here's a dplyr approach (unite comes from tidyr):
library(dplyr)
library(tidyr)
pull(summarise(unite(df, tmp, 1:ncol(df), sep = "$"), paste(tmp, collapse = "%")))
Or:
df %>%
  unite(tmp, 1:ncol(df), sep = "$") %>%
  summarise(output = paste(tmp, collapse = "%")) %>%
  pull()

how to split fields after reading the file in R

I have a file with this format in each line:
f1,f2,f3,a1,a2,a3,...,an
Here, f1, f2, and f3 are fixed fields separated by commas, while the fourth field is the whole of a1,a2,...,an, where n can vary.
How can I read this into R and conveniently store those variable-length a1 to an?
Thank you.
My file looks like the following
3,a,-4,news,finance
2,b,1,politics
1,a,0
2,c,2,book,movie
...

It is not clear what you mean by "conveniently store". If you think a data frame will suit you, try this:
df <- read.table(text = "3,a,-4,news,finance
2,b,1,politics
1,a,0
2,c,2,book,movie",
sep = ",", na.strings = "", header = FALSE, fill = TRUE)
names(df) <- c(paste0("f", 1:3), paste0("a", 1:(ncol(df) - 3)))
Edit following @Ananda Mahto's comment.
From ?read.table:
"The number of data columns is determined by looking at the first five lines of input".
Thus, if the maximum number of columns with data occurs somewhere after the first five lines, the solution above will fail.
Example of failure
# create a file with max five columns in the first five lines,
# and six columns in the sixth row
cat("3, a, -4, news, finance",
"2, b, 1, politics",
"1, a, 0",
"2, c, 2, book,movie",
"1, a, 0",
"2, c, 2, book, movie, news",
file = "df",
sep = "\n")
# based on the first five rows, read.table determines that number of columns is five,
# and creates an incorrect data frame
df <- read.table(file = "df",
sep = ",", na.strings = "", header = FALSE, fill = TRUE)
df
Solution
# This can be solved by first counting the maximum number of columns in the text file
# (stored as n_col so that the ncol() function is not shadowed)
n_col <- max(count.fields("df", sep = ","))
# then this count is used in the col.names argument
# to handle the unknown maximum number of columns after row 5.
df <- read.table(file = "df",
                 sep = ",", na.strings = "", header = FALSE, fill = TRUE,
                 col.names = paste0("f", seq_len(n_col)))
df
# change column names as above
names(df) <- c(paste0("f", 1:3), paste0("a", 1:(ncol(df) - 3)))
df

# Read example data
txt <- "3,a,-4,news,finance\n2,b,1,politics\n1,a,0\n2,c,2,book,movie"
tc <- textConnection(txt)
lines <- readLines(tc)
close(tc)
# Solution: keep the three fixed fields as columns
# and store the remaining fields as a list column
lines_split <- strsplit(lines, split = ",", fixed = TRUE)
ind <- 1:3
df <- as.data.frame(do.call("rbind", lapply(lines_split, "[", ind)))
df$V4 <- lapply(lines_split, "[", -ind)
# Output
#   V1 V2 V3            V4
# 1  3  a -4 news, finance
# 2  2  b  1      politics
# 3  1  a  0
# 4  2  c  2   book, movie

A place to start:
dat <- readLines(file) ## file being your file
## split every line on commas (this step was missing above)
dat_split <- strsplit(dat, ",", fixed = TRUE)
df <- data.frame(
  f1 = sapply(dat_split, "[[", 1),
  f2 = sapply(dat_split, "[[", 2),
  f3 = sapply(dat_split, "[[", 3),
  ## everything after the third field, re-joined into one string (NA if absent)
  a = unlist(sapply(dat_split, function(x) {
    if (length(x) <= 3) {
      return(NA)
    } else {
      return(paste(x[4:length(x)], collapse = ","))
    }
  }))
)
When you need to pull things out of a, you can split it as necessary.
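Splitting the combined field again later might look like the following sketch. It uses the four sample lines from the question as an in-memory vector instead of a file, and the names tags and n_tags are made up for illustration.

```r
# Sample lines from the question, held in memory instead of read from a file
dat <- c("3,a,-4,news,finance", "2,b,1,politics", "1,a,0", "2,c,2,book,movie")
dat_split <- strsplit(dat, ",", fixed = TRUE)

# Re-join everything after the third field; NA when there is nothing left
a <- vapply(dat_split, function(x) {
  if (length(x) <= 3) NA_character_ else paste(x[-(1:3)], collapse = ",")
}, character(1))

# Split the combined field back into its parts when they are needed
tags <- strsplit(a, ",", fixed = TRUE)

# Number of variable-length values per line (an NA entry counts as zero)
n_tags <- vapply(tags, function(v) sum(!is.na(v)), integer(1))
n_tags
```

For the sample lines, n_tags should come out as 2, 1, 0 and 2.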
