I have a data frame with multiple columns and rows. I want to export it as a .txt file with all values on the same line (i.e. one row), with individual values separated by "," and the rows of the df separated by ":".
w<- c(1,5)
x<- c(2,6)
y<- c(3,7)
z<- c(4,8)
df<-data.frame(w,x,y,z)
The output would look like this:
1,2,3,4:5,6,7,8:
We can collapse each row with apply and toString, then paste the rows together with collapse = ":". Note that toString separates values with ", " (a comma and a space):
paste0(apply(df, 1, toString), collapse = ":")
#[1] "1, 2, 3, 4:5, 6, 7, 8"
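Because toString() puts a space after every comma, which the question's output does not have, here is a sketch that swaps toString() for paste(collapse = ",") inside the same apply() call. The data are rebuilt so the block runs on its own:

```r
w <- c(1, 5); x <- c(2, 6); y <- c(3, 7); z <- c(4, 8)
df <- data.frame(w, x, y, z)

# apply() hands each row to paste(); collapse = "," joins that row's values
# with a bare comma instead of the ", " that toString() uses
rows <- apply(df, 1, paste, collapse = ",")

# Join the row strings with ":"
result <- paste(rows, collapse = ":")
result
#[1] "1,2,3,4:5,6,7,8"
```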
If you want to write it to a file, use:
write.table(df, "df.txt", col.names = FALSE, row.names = FALSE, sep = ",", eol = ":")
If you want the output in R you can use do.call() and paste():
do.call(paste, c(df, sep = ",", collapse = ":"))
[1] "1,2,3,4:5,6,7,8"
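To produce exactly the line from the question, trailing ":" included, the do.call() result can be written with cat(), which does not append a newline. A sketch; the path from tempfile() is only a placeholder for your own file name:

```r
w <- c(1, 5); x <- c(2, 6); y <- c(3, 7); z <- c(4, 8)
df <- data.frame(w, x, y, z)

# do.call() passes the columns to paste() as separate vectors:
# sep joins the values within a row, collapse joins the rows
one_line <- do.call(paste, c(df, sep = ",", collapse = ":"))

# Append the trailing ":" shown in the question's example output
one_line <- paste0(one_line, ":")

# cat() writes the string as-is, without adding a newline at the end
out_file <- tempfile(fileext = ".txt")  # placeholder path; use your own file name
cat(one_line, file = out_file)

# warn = FALSE silences the "incomplete final line" warning
readLines(out_file, warn = FALSE)
#[1] "1,2,3,4:5,6,7,8:"
```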
We can use str_c from stringr, folding it across the columns with purrr's reduce:
library(stringr)
library(dplyr)
library(purrr)
df %>%
reduce(str_c, sep=",") %>%
str_c(collapse=":")
#[1] "1,2,3,4:5,6,7,8"
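The same fold works without extra packages: base R's Reduce() plays the role of purrr's reduce() and paste() the role of str_c(). A sketch of that substitution:

```r
w <- c(1, 5); x <- c(2, 6); y <- c(3, 7); z <- c(4, 8)
df <- data.frame(w, x, y, z)

# A data frame is a list of columns, so Reduce() folds paste() across them:
# paste(paste(paste(w, x, sep = ","), y, sep = ","), z, sep = ",")
rows <- Reduce(function(acc, col) paste(acc, col, sep = ","), df)

# Join the row strings with ":"
result <- paste(rows, collapse = ":")
result
#[1] "1,2,3,4:5,6,7,8"
```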
I have this line of text in a file:
{"id":53680,"title":"daytona1-usa"}
But when I try to open it in R using this:
df <- read.csv("file1.txt", strip.white = TRUE, sep = ":")
It produces columns like this:
Col1: X53680.title
Col2: daytona1.usa.url
What I want to do is open the file so that the columns are like this:
Col1: 53680
Col2: daytona1-usa
How can I do this in R?
Edit: The actual file I'm reading in is this:
{"id":53203,"title":"bbc-moment","url":"https:\/\/wow.bbc.com\/bbc-ids\/live\/enus\/211\/53203","type":"audio\/mpeg"},{"id":53204,"title":"shg-moment","url":"https:\/\/wow.shg.com\/shg-ids\/live\/enus\/212\/53204","type":"audio\/mpeg"},{"id":53205,"title":"was-zone","url":"https:\/\/wow.was.com\/was-ids\/live\/enus\/213\/53205","type":"audio\/mpeg"},{"id":53206,"title":"xx1-zone","url":"https:\/\/wow.xx1.com\/xx1-ids\/live\/enus\/214\/53206","type":"audio\/mpeg"},], WH.ge('zonemusicdiv-zonemusic'), {loop: true});
After reading it in, I remove the first column and then every 3rd and 4th column with this:
# Delete the first column
df <- df[-1]
# Delete every 3rd and 4th columns
i1 <- rep(seq(3, ncol(df), 4) , each = 2) + 0:1
df <- df[,-i1]
Thank you.
Edit 2:
Adding this fixed it:
df[] <- lapply(df, gsub, pattern = ".title", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ",url", replacement = "", fixed = TRUE)
If the file contains a single JSON object, then
jsonlite::read_json("file1.txt")
# $id
# [1] 53680
# $title
# [1] "daytona1-usa"
If it is instead NDJSON (newline-delimited JSON), then
jsonlite::stream_in(file("file1.txt"), verbose = FALSE)
# id title
# 1 53680 daytona1-usa
The answers above would have been correct if the data were properly formatted JSON, but they don't work for the data I have, so I ended up with this:
df <- read.csv("file1.txt", header = FALSE, sep = ":", dec = "-")
# Delete the first column
df <- df[-1]
# Delete every 3rd and 4th columns
i1 <- rep(seq(3, ncol(df), 4) , each = 2) + 0:1
df <- df[,-i1]
df[] <- lapply(df, gsub, pattern = ".title", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ",url", replacement = "", fixed = TRUE)
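Because the real file is a JavaScript fragment rather than valid JSON, another route is to pull the two wanted fields out with regular expressions instead of splitting on ":". A base-R sketch, assuming every object carries a numeric "id" followed by a "title" string, so the two sets of matches line up one-to-one. The embedded text reuses the first two objects from the question:

```r
# In practice: txt <- paste(readLines("file1.txt", warn = FALSE), collapse = "")
txt <- paste0(
  '{"id":53203,"title":"bbc-moment","url":"https:\\/\\/wow.bbc.com\\/bbc-ids\\/live\\/enus\\/211\\/53203","type":"audio\\/mpeg"},',
  '{"id":53204,"title":"shg-moment","url":"https:\\/\\/wow.shg.com\\/shg-ids\\/live\\/enus\\/212\\/53204","type":"audio\\/mpeg"},',
  "], WH.ge('zonemusicdiv-zonemusic'), {loop: true});"
)

# Every "id":<digits> and "title":"<text>" in order of appearance
id_hits    <- regmatches(txt, gregexpr('"id":[0-9]+', txt))[[1]]
title_hits <- regmatches(txt, gregexpr('"title":"[^"]*"', txt))[[1]]

# Strip the key names and quotes, keeping only the values
df <- data.frame(
  id    = as.integer(sub('"id":', "", id_hits, fixed = TRUE)),
  title = sub('^"title":"([^"]*)"$', "\\1", title_hits),
  stringsAsFactors = FALSE
)
df
```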
I have data in which some columns contain non-English words. How can I find the names of those columns in R?
The data and the expected result are shown in the image.
First some reproducible data:
df <- data.frame(
Var1 = c("some", "data", "ß", "کابل"),
Var2 = c("کابل", "data", "کابل", "data"),
Var3 = c("some", "data", "more", "data"),
Var4 = c("some", "data", "more", "data")
)
df
The solution first pastes each column into one string with apply(..., paste0, collapse = " ") and then drops (-) the columns whose string contains a non-ASCII character according to grepl (non-ASCII characters serve here as a proxy for non-English ones):
df[, -which(grepl("[^ -~]", apply(df, 2, paste0, collapse = " ")))]
Var3 Var4
1 some some
2 data data
3 more more
4 data data
EDIT:
To get only the names, wrap the whole statement in names():
names(df[, -which(grepl("[^ -~]", apply(df, 2, paste0, collapse = " ")))])
[1] "Var3" "Var4"
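To return the names of the columns that do contain non-English text, a sketch that tests each column directly with vapply() and indexes names(df) with the resulting logical vector. Unlike -which(...), this still behaves when no column matches: which() would return integer(0), and df[, -integer(0)] selects no columns at all. The \u escapes only keep the source ASCII-only; they encode the same ß and کابل as above:

```r
df <- data.frame(
  Var1 = c("some", "data", "\u00df", "\u06a9\u0627\u0628\u0644"),
  Var2 = c("\u06a9\u0627\u0628\u0644", "data", "\u06a9\u0627\u0628\u0644", "data"),
  Var3 = c("some", "data", "more", "data"),
  Var4 = c("some", "data", "more", "data"),
  stringsAsFactors = FALSE
)

# TRUE for each column with at least one character outside printable ASCII
has_non_ascii <- vapply(df, function(col) any(grepl("[^ -~]", col)), logical(1))

names(df)[has_non_ascii]   # columns with non-English text
names(df)[!has_non_ascii]  # columns without it
```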
Base R, removing every ASCII letter from values that do not contain "#":
lapply(df, function(x){
ifelse(grepl("\\#", x), x, gsub(paste0(c(letters, LETTERS), collapse = "|"), "", x))})
Return names:
names(df)[sapply(df, function(x) {
ifelse(grepl("\\#", x), FALSE,
any(gsub(paste0(
c(letters, LETTERS), collapse = "|"
), "", x) == ""))
})]
I am working on a dataset with a column account_age. The age is stored as a character string in the format "1YRS 5MON".
How can I convert it to months?
We can match 'YRS' and 'MON' with gsubfn, replace them with arithmetic ("*12 +" and "*1"), and evaluate the resulting expression:
library(gsubfn)
unname(sapply(gsubfn("[A-Z]+", list(YRS = "*12 +", MON = "*1"),
df1$col1), function(x) eval(parse(text = x))))
#[1] 17
Another option is to extract the numbers and take their sum of products with c(12, 1):
library(tidyverse)
map_dbl(str_extract_all(df1$col1, "\\d+"), ~ as.numeric(.x) %*% c(12, 1))
#[1] 17
Or we can remove the letters, read the numbers with read.table and take the sum of products:
as.matrix(read.table(text = gsub("[A-Z]+", "", df1$col1),
header = FALSE) )%*% c(12, 1)
data
df1 <- data.frame(col1 = "1YRS 5MON", stringsAsFactors = FALSE)
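The approaches above are written for values like "1YRS 5MON" that contain both parts. If some values might have only one of them (an assumption; the question shows only "1YRS 5MON"), a base-R sketch can look each unit up separately and treat a missing unit as 0. The helper extract_unit() and the extra sample values are hypothetical:

```r
# Hypothetical sample values: both parts, only months, only years
df1 <- data.frame(col1 = c("1YRS 5MON", "2YRS 0MON", "7MON", "3YRS"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: the number written directly before `unit`, or 0 if absent
extract_unit <- function(x, unit) {
  hit <- regexpr(paste0("[0-9]+", unit), x)  # -1 where there is no match
  out <- numeric(length(x))                  # start every value at 0
  found <- hit != -1
  # regmatches() returns only the matched pieces, in order, e.g. "5MON"
  out[found] <- as.numeric(sub(unit, "", regmatches(x, hit), fixed = TRUE))
  out
}

months <- 12 * extract_unit(df1$col1, "YRS") + extract_unit(df1$col1, "MON")
months
#[1] 17 24  7 36
```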
I have vectors a, b, c, d as below:
a <- c(1,2,3,4)
b <- c("L","L","F","L")
c <- c(11,22,33,44)
d <- c("Y", "N", "Y","Y")
I use paste to get this output (1):
paste(a,b,c,d, sep = "$", collapse = "%")
[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
Then I put the vectors into a data frame, df:
df <- data.frame(a,b,c,d)
and get this output (2) instead:
paste(df, sep = "$", collapse = "%")
[1] "c(1, 2, 3, 4)%c(2, 2, 1, 2)%c(11, 22, 33, 44)%c(2, 1, 2, 2)"
My question is:
(1) Can somebody explain why the elements turn into numbers when I use df?
(2) Is there another way to get output (1) from df?
paste runs as.character (or something similar internally) on its ... arguments, effectively deparsing the list. Have a look at
as.character(df)
# [1] "c(1, 2, 3, 4)" "c(2, 2, 1, 2)" "c(11, 22, 33, 44)" "c(2, 1, 2, 2)"
deparse(df$a)
# [1] "c(1, 2, 3, 4)"
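Your code is pasting these deparsed strings together. The character columns b and d show up as numbers because data.frame() converted them to factors (the default before R 4.0.0), and a factor is stored as integer codes. To get around this, you can use do.call.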
do.call(paste, c(df, sep = "$", collapse = "%"))
# [1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
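Why do.call() helps: a data frame is a list of columns, so paste(df) receives a single list argument, while do.call() spreads that list into one argument per column. A small check of that equivalence, rebuilding df with stringsAsFactors = FALSE so it behaves the same on any R version:

```r
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c("L", "L", "F", "L"),
                 c = c(11, 22, 33, 44),
                 d = c("Y", "N", "Y", "Y"),
                 stringsAsFactors = FALSE)

is.list(df)   # TRUE: a data frame is a list of columns
length(df)    # 4: one element per column

# do.call() turns the list into separate arguments, so these two calls match
via_do_call <- do.call(paste, c(df, sep = "$", collapse = "%"))
by_hand     <- paste(df$a, df$b, df$c, df$d, sep = "$", collapse = "%")

identical(via_do_call, by_hand)   # TRUE
via_do_call
#[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
```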
Here is an alternative to the approach you used:
df_call <- c(df, sep="$")
paste(do.call(paste, df_call), collapse="%")
[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
You cannot apply paste directly to a data frame in this case; to get the desired output you need to apply paste at two levels.
paste(apply(df, 1, function(x) paste(x, collapse = "$")), collapse = "%")
#[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
Here the apply command creates one "$"-separated string per row:
apply(df, 1, function(x) paste(x, collapse = "$"))
#[1] "1$L$11$Y" "2$L$22$N" "3$F$33$Y" "4$L$44$Y"
and the outer paste call then merges these strings with collapse = "%".
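One caveat with apply(): it first converts the data frame with as.matrix(), and when a character column is present, numeric columns are passed through format(), which pads them to a common width. The question's data are unaffected (a and c each have values of one width), but a small made-up data frame shows the effect, next to do.call(), which leaves the columns alone:

```r
# Made-up data: the numeric column mixes one- and two-digit values
df <- data.frame(a = c(1, 10), b = c("x", "y"), stringsAsFactors = FALSE)

# apply() calls as.matrix(df) first; because of the character column,
# the numbers are converted with format(), which pads them to equal width
via_apply <- apply(df, 1, paste, collapse = "$")
via_apply     # returns c(" 1$x", "10$y")

# do.call() passes the original columns to paste(), so there is no padding
via_do_call <- do.call(paste, c(df, sep = "$"))
via_do_call   # returns c("1$x", "10$y")
```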
Here's a tidyverse approach (unite() comes from tidyr; summarise() and pull() from dplyr):
library(dplyr)
library(tidyr)
pull(summarise(unite(df, tmp, 1:ncol(df), sep="$"), paste(tmp, collapse="%")))
Or:
df %>%
unite(tmp, 1:ncol(df),sep="$") %>%
summarise(output = paste(tmp, collapse="%")) %>%
pull()
I have a file with this format in each line:
f1,f2,f3,a1,a2,a3,...,an
Here, f1, f2 and f3 are fixed fields separated by commas, and f4 is the whole a1,a2,...,an, where n can vary.
How can I read this into R and conveniently store those variable-length a1 to an?
Thank you.
My file looks like the following:
3,a,-4,news,finance
2,b,1,politics
1,a,0
2,c,2,book,movie
...
It is not clear what you mean by "conveniently store". If you think a data frame will suit you, try this:
df <- read.table(text = "3,a,-4,news,finance
2,b,1,politics
1,a,0
2,c,2,book,movie",
sep = ",", na.strings = "", header = FALSE, fill = TRUE)
names(df) <- c(paste0("f", 1:3), paste0("a", 1:(ncol(df) - 3)))
Edit following @Ananda Mahto's comment.
From ?read.table:
"The number of data columns is determined by looking at the first five lines of input".
Thus, if the maximum number of columns with data occurs somewhere after the first five lines, the solution above will fail.
Example of failure
# create a file with max five columns in the first five lines,
# and six columns in the sixth row
cat("3, a, -4, news, finance",
"2, b, 1, politics",
"1, a, 0",
"2, c, 2, book,movie",
"1, a, 0",
"2, c, 2, book, movie, news",
file = "df",
sep = "\n")
# based on the first five rows, read.table determines that number of columns is five,
# and creates an incorrect data frame
df <- read.table(file = "df",
sep = ",", na.strings = "", header = FALSE, fill = TRUE)
df
Solution
# This can be solved by first counting the maximum number of columns in the text file
# (stored as max_cols so it is not confused with the ncol() function used below)
max_cols <- max(count.fields("df", sep = ","))
# then this count is used in the col.names argument
# to handle the unknown maximum number of columns after row 5.
df <- read.table(file = "df",
sep = ",", na.strings = "", header = FALSE, fill = TRUE,
col.names = paste0("f", seq_len(max_cols)))
df
# change column names as above
names(df) <- c(paste0("f", 1:3), paste0("a", 1:(ncol(df) - 3)))
df
#
# Read example data
#
txt <- "3,a,-4,news,finance\n2,b,1,politics\n1,a,0\n2,c,2,book,movie"
tc = textConnection(txt)
lines <- readLines(tc)
close(tc)
#
# Solution
#
lines_split <- strsplit(lines, split=",", fixed=TRUE)
ind <- 1:3
df <- as.data.frame(do.call("rbind", lapply(lines_split, "[", ind)))
df$V4 <- lapply(lines_split, "[", -ind)
#
# Output
#
V1 V2 V3 V4
1 3 a -4 news, finance
2 2 b 1 politics
3 1 a 0
4 2 c 2 book, movie
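Instead of a list column like V4, the variable-length part can also be kept in a second, long-format table with one row per value, keyed by line number. A base-R sketch; the id column and the table names fixed and extras are my own choices, not part of the question:

```r
# Same four sample lines as above
lines <- c("3,a,-4,news,finance", "2,b,1,politics", "1,a,0", "2,c,2,book,movie")
parts <- strsplit(lines, ",", fixed = TRUE)

# Table 1: one row per input line, holding the three fixed fields
fixed <- data.frame(
  id = seq_along(parts),
  f1 = as.integer(vapply(parts, `[`, character(1), 1)),
  f2 = vapply(parts, `[`, character(1), 2),
  f3 = as.integer(vapply(parts, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)

# Table 2: one row per variable-length value, keyed by the line's id;
# a line with no extra values (like "1,a,0") contributes no rows
extra <- lapply(parts, `[`, -(1:3))
extras <- data.frame(
  id    = rep(fixed$id, lengths(extra)),
  value = unlist(extra),
  stringsAsFactors = FALSE
)

# All extra values of line 4
extras$value[extras$id == 4]   # returns c("book", "movie")
```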
A place to start:
dat <- readLines(file) ## file being your file
dat_split <- strsplit(dat, ",", fixed = TRUE) ## split every line on commas
df <- data.frame(
f1=sapply(dat_split, "[[", 1),
f2=sapply(dat_split, "[[", 2),
f3=sapply(dat_split, "[[", 3),
a=unlist( sapply(dat_split, function(x) {
if (length(x) <= 3) {
return(NA)
} else {
return(paste(x[4:length(x)], collapse=","))
}
}) )
)
When you need the individual values from a, split it again as necessary, e.g. with strsplit(df$a, ",").