Multiline text in R dataframe - r

I'm trying to include multiline text in a dataframe cell, but R seems to treat each \n as the start of a new row, which results in row mismatches. If I change the 'Code' input to a simple string, the code works fine.
Defined dataframe:
df <- data.frame(
"Id" = character(),
"Name" = character(),
"Code" = character()
)
Adding new row:
NewRow <- data.frame(
"Id" = Id, # Simple string
"Name" = Name, # Simple string
"Code" = Code # Complex multiline string containing '#' and '\n' (10+ lines)
)
df <- rbind(df, NewRow)
Received error: Error in data.frame: arguments imply differing number of rows: 1, 0
Does anyone know how to get around this problem?
Many thanks in advance!

You could clean up the Code variable before adding it to the dataframe: remove the \n and # characters from Code, then add it to the dataframe. You can use stringr (with the pipe from dplyr) to update the Code variable:
library(stringr)
library(dplyr)
### Using the replace option:
Code <- Code %>%
  str_replace_all("\n", "") %>%
  str_replace_all("#", "")
### Using the remove option:
Code <- Code %>%
  str_remove_all("\n") %>%
  str_remove_all("#")
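It may also be worth checking the length of each input before cleaning anything. The sketch below uses made-up values for Id, Name and Code (they are assumptions, not the asker's data). It suggests that a string containing '\n' and '#' fits in a single cell, and that the "1, 0" in the error message is what a zero-length input produces.

```r
# Hypothetical values standing in for the Id, Name and Code variables in the question.
Id <- "a1"
Name <- "example"
Code <- "x <- 1  # first line\ny <- 2  # second line"

df <- data.frame(Id = character(), Name = character(), Code = character(),
                 stringsAsFactors = FALSE)
NewRow <- data.frame(Id = Id, Name = Name, Code = Code, stringsAsFactors = FALSE)
df <- rbind(df, NewRow)
nrow(df)  # the embedded newline stays inside the single cell

# A zero-length input reproduces the error from the question.
empty_code <- character(0)
err <- tryCatch(
  data.frame(Id = Id, Name = Name, Code = empty_code),
  error = function(e) conditionMessage(e)
)
err
```

If length(Code) turns out to be 0 (or greater than 1) in the real script, the place where Code is created is probably the thing to fix.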

Related

How can I make R import each line of a .txt file as a character string?

I have a complex .txt file (screenshot attached). I need each line as its own character string so that I can group the lines by the 5-letter code near the beginning of each line (for example, group together all GPGGA lines; see screenshot) and then process them. Here's what I've run so far:
df <- data.frame(Weather_data)
df %>%
mutate("Entry" = gsub(".*\\$([A-Z]+),.*", "\\1", text)) %>%
group_by(Entry) %>%
filter(Entry == "GPGGA")
This received the error:
"Error: Problem with mutate() column Entry. i Entry = gsub(".*\\$([A-Z]+),.*", "\\1", text). x cannot coerce type 'closure' to vector of type 'character'"
I was able to filter as needed when I copied and pasted the first few lines in and manually made them character strings to check that the code works, so the next step is to make each line a character string without doing it manually (there are over 3000 lines). Does anyone have a function to do this?
Here are some of the lines produced when I load the imported txt file:
HEADER
<chr>
13:30:00.587: <- $GPGGA,183000.30,4415.6243,N,08823.9769,W,1,7,1.7,225.5,M,-33.4,M,,*68
13:30:00.683: <- $GPGLL,4415.6243,N,08823.9769,W,183000.40,A,A*72
13:30:00.779: <- $GPVTG,159.6,T,163.2,M,0.1,N,0.1,K,A*2E
13:30:00.827: <- $HCHDG,74.8,0.0,E,3.6,W*6E
13:30:01.003: <- $WIMDA,29.9641,I,1.0147,B,26.5,C,,,48.2,,14.6,C,323.0,T,326.6,M,1.4,N,0.7,M*66
13:30:01.051: <- $WIMWV,248.4,R,1.1,N,A*29
13:30:01.114: <- $WIMWV,255.6,T,1.3,N,A*23
13:30:01.195: <- $YXXDR,A,-53.9,D,PTCH,A,-34.2,D,ROLL*57
13:30:01.307: <- $YXXDR,A,0.571,G,XACC,A,0.783,G,YACC,A,-0.181,G,ZACC*57
13:30:01.578: <- $GPGGA,183001.30,4415.6242,N,08823.9769,W,1,7,1.7,225.9,M,-33.4,M,,*64
You referenced the variable text, which does not exist in your data.frame, so R falls back to the graphics function text() (hence the 'closure' error). Your column is named HEADER.
df %>%
mutate("Entry" = gsub(".*\\$([A-Z]+),.*", "\\1", HEADER)) %>%
group_by(Entry) %>%
filter(Entry == "GPGGA")
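For the original question of getting each line in as its own string, here is a minimal base R sketch. It assumes the file would be read with readLines() (the path weather.txt in the comment is hypothetical); a few lines from the sample output stand in for the file here.

```r
# In practice: raw_lines <- readLines("weather.txt")  # hypothetical path
raw_lines <- c(
  "13:30:00.587: <- $GPGGA,183000.30,4415.6243,N,08823.9769,W,1,7,1.7,225.5,M,-33.4,M,,*68",
  "13:30:01.051: <- $WIMWV,248.4,R,1.1,N,A*29",
  "13:30:01.114: <- $WIMWV,255.6,T,1.3,N,A*23",
  "13:30:01.578: <- $GPGGA,183001.30,4415.6242,N,08823.9769,W,1,7,1.7,225.9,M,-33.4,M,,*64"
)

# One row per line, kept as character strings.
df <- data.frame(HEADER = raw_lines, stringsAsFactors = FALSE)

# Pull out the sentence code that follows the "$".
df$Entry <- sub(".*\\$([A-Z]+),.*", "\\1", df$HEADER)

# Keep only the GPGGA lines, or split all lines by their code.
gpgga <- df[df$Entry == "GPGGA", ]
by_entry <- split(df$HEADER, df$Entry)
names(by_entry)
```

readLines() returns one character string per line, so no manual quoting is needed.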

Passing special character in DT Table

I am trying to show the special string "--" in a column of a DT table, but formatPercentage prevents this. Manually passing formatPercentage(c(2,3,5)) works, but I want to make the column selection dynamic, so I am looking for a way to display columns containing "--" in the DT table.
I have tried ifelse, but it doesn't work. This code is just a part of my function.
df <- mtcars[1:6,1:5]
df$drat <- "--"
df$disp <- "--"
datatable(df, escape = FALSE) %>%
formatPercentage(2:5)
So the actual problem is that I am trying to mask one column in my DT table output, but formatPercentage does not produce the required output.
My function is big; that's why I am unable to create a reproducible example.
You can exclude the columns that have '--' in them.
library(DT)
all_cols <- 2:5
format_cols <- setdiff(all_cols, which(colSums(df == '--') > 0))
datatable(df, escape = FALSE) %>% formatPercentage(format_cols)
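To make the selection reusable inside a larger function, the same idea could be wrapped in a small helper. This is only a sketch: percent_cols and its placeholder argument are hypothetical names, not part of DT, and the helper uses base R only.

```r
# Hypothetical helper: return the positions in `candidate_cols` whose columns
# do not contain the placeholder string.
percent_cols <- function(data, candidate_cols, placeholder = "--") {
  has_placeholder <- vapply(
    data[candidate_cols],
    function(col) any(as.character(col) == placeholder, na.rm = TRUE),
    logical(1)
  )
  candidate_cols[!has_placeholder]
}

df <- mtcars[1:6, 1:5]
df$drat <- "--"
df$disp <- "--"
format_cols <- percent_cols(df, 2:5)
format_cols
# These positions could then be passed on, e.g.
# datatable(df, escape = FALSE) %>% formatPercentage(format_cols)
```

Using na.rm = TRUE keeps columns with missing values from being dropped by accident.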

removing and replacing observations with string package

I have two datasets that I'm trying to join. The column I am joining by does not match exactly between them. In the first file the column looks like this: 00:01:54:2145, with a leading 00: on every single observation. I want to change all the observations in this column to this format: 01/54/2145.
I have tried several things with the stringr package, but can't get it to work.
df1 <- df %>%
str_replace_all("00:")
I'm getting this error, but don't think that's the only problem:
argument is not an atomic vector; coercing
Thank you
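library(stringr)
library(dplyr)
# Drop the leading "00:" and then turn the remaining ":" separators into "/".
# str_replace() and str_replace_all() are already vectorized, so no Vectorize() wrapper is needed.
my_conversion <- function(str) {
  str %>%
    str_replace("^00:", "") %>%
    str_replace_all(":", "/")
}
df <- data.frame(
  a_column = 1:3,
  key_column = c("00:01:54:2145", "00:01:54:2145", "00:01:54:2145"),
  stringsAsFactors = FALSE
)
# Apply the conversion to the join column only, not to the whole data frame.
df %>% mutate(key_column = my_conversion(key_column))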
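The warning in the question most likely comes from passing the whole data frame to str_replace_all() instead of a single column, and the call is also missing its replacement argument. For comparison, here is a base R sketch of the same conversion on a plain character vector; the second key value is made up for illustration.

```r
# Example keys; the second value is illustrative only.
key <- c("00:01:54:2145", "00:12:03:0042")

# sub() removes the leading "00:" once, gsub() swaps every remaining ":" for "/".
converted <- gsub(":", "/", sub("^00:", "", key))
converted
```

The result can be assigned back to the join column, for example df$key_column <- converted, before joining.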

Filtering process not fetching full data? Using dplyr filter and grep

I have a log file with up to about 1200 characters per line. I want to read it and then extract certain portions of it into new columns. I want to extract the rows that contain the text “[DF_API: input string]”.
When I read the file and then filter for the rows I am interested in, it seems like I am losing data. I tried this with dplyr's filter and with standard grep, with the same result.
I'm not sure why this is the case and would appreciate your help. The code and the data are available at the link below.
Satish
Code is given below
library(dplyr)
setwd("C:/Users/satis/Documents/VF/df_issue_dec01")
sec1 <- read.delim(file="secondary1_aa_small.log")
head(sec1)
names(sec1) <- c("V1")
sec1_test <- filter(sec1,str_detect(V1,"DF_API: input string")==TRUE)
head(sec1_test)
sec1_test2 = sec1[grep("DF_API: input string",sec1$V1, perl = TRUE),]
head(sec1_test2)
write.csv(sec1_test, file = "test_out.txt", row.names = F, quote = F)
write.csv(sec1_test2, file = "test2_out.txt", row.names = F, quote = F)
Data (and code) is given at the link below. Sorry, I should have used dput.
https://spaces.hightail.com/space/arJlYkgIev
Try the code below, which gives you a dataframe of the lines from your file that match a condition. readLines() reads every line verbatim, whereas read.delim() treats the first line as a header and interprets tabs and quote characters, which may be why lines appear to go missing.
# read the file line by line
sec1 <- readLines("secondary1_aa_small.log")
# build a dataframe from the lines that match
new_sec1 <- data.frame(grep("DF_API: input string", sec1, value = TRUE))
names(new_sec1) <- c("V1")
Edit: Simple way to split the above column into multiple columns
#extracting substring in between < & >
new_sec1$V1 <- gsub(".*[<\t]([^>]+)[>].*", "\\1", new_sec1$V1)
#replacing comma(,) with a white space
new_sec1$V1 <- gsub("[,]+", " ", new_sec1$V1)
#splitting into separate columns
new_sec1 <- strsplit(new_sec1$V1, " ")
new_sec1 <- lapply(new_sec1, function(x) x[x != ""] )
new_sec1 <- do.call(rbind, new_sec1)
new_sec1 <- data.frame(new_sec1)
Rename the columns as needed for your analysis.
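The readLines() and grep() step can be tried without the original log. The sketch below writes a small temporary file whose contents are invented for illustration (including a line with an unbalanced quote, the kind of input that can confuse a delimited-file reader) and then filters it.

```r
# Invented sample log; the real file is not reproduced here.
log_path <- tempfile(fileext = ".log")
writeLines(c(
  "10:00:01\t[DF_API: input string] <a,1>",
  "10:00:02\tsome \"unbalanced quote",
  "10:00:03\t[DF_API: input string] <b,2>"
), log_path)

# Every physical line comes back as one string, whatever it contains.
raw_lines <- readLines(log_path)

# fixed = TRUE matches the text literally instead of as a regular expression.
matched <- grep("DF_API: input string", raw_lines, value = TRUE, fixed = TRUE)
result <- data.frame(V1 = matched, stringsAsFactors = FALSE)
nrow(result)
```

Comparing length(raw_lines) with the row count from read.delim() on the real file is a quick way to check whether lines were merged or dropped during import.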

how to convert text files into dataframe in R?

I am trying to export data points from MongoDB. Unfortunately, I was unable to connect it directly to RStudio, so I created a text file from the query output and attempted to read it into R.
"cityid", "count"
"102","2"
"55","31"
"119","7"
"206","1"
"18","2"
"15","1"
"32","3"
"14","1"
"54","2"
"23","85"
"158","3"
"266","1"
"9","1"
"34","1"
"159","1"
"31","1"
"22","2"
"209","2"
"121","4"
"73","12"
"350","2"
"311","2"
"377","2"
"230","7"
"290","1"
"49","2"
"379","2"
"75","1"
"59","6"
"165","3"
"19","8"
"13","40"
"126","13"
"243","12"
"325","1"
"17","1"
"null","235"
"144","2"
"334","1"
"40","12"
"7","34"
"181","40"
"349","4"
So basically the format is as above, and I would like to convert this into a data frame that I can use as a reference for calculations with other datasets.
This is what I tried in order to build the data frame:
L <- readLines(file.choose())
L.df <- as.data.frame(L)
list <- strsplit(L.df, ",")
library("plyr")
df <- ldply(list)
colnames(df) <- c("city_id", "count")
str(df)
df$city_id <- suppressWarnings(as.numeric(as.character(df$city_id)))
On the last line, I tried to convert the character values to numeric, but the conversion failed and they were coerced to NA.
Does anyone have a better suggestion for turning this into a numeric table?
Or is there actually a better way to bring the MongoDB data into R without copying and pasting it into text files? I was able to connect to MongoDB using Rmongo, but the syntax was too complicated for me to understand. The query I used is:
db.getCollection('logging_app_location_view_logs').aggregate([
{"$group": {"_id": "$city_id", "total": {"$sum":1}}}
]).forEach(function(l){
print('"' + l._id + '","' + l.total + '"');
});
Thanks in advance for your help!
You don't need to set the column names again when you have already passed header = TRUE to read.table. The colClasses argument controls the class of each column, and na.strings = 'null' turns the "null" entries into NA.
df <- read.table(file.choose(), header = TRUE, sep = ",", colClasses = c('character', 'character'), na.strings = 'null')
# convert character to numeric format
char_cols <- which(sapply(df, class) == 'character') # identify character columns
df[char_cols] <- lapply(df[char_cols], as.numeric) # convert character to numeric column
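The same approach can be checked without a file by passing a few of the sample rows through the text argument of read.table. In this sketch the header is written without the space after the comma, which is an assumption about how the file could be cleaned up.

```r
# A few rows from the question, supplied inline instead of via file.choose().
txt <- '"cityid","count"
"102","2"
"55","31"
"null","235"'

# Read everything as character first; "null" becomes NA via na.strings.
df <- read.table(text = txt, header = TRUE, sep = ",",
                 colClasses = "character", na.strings = "null")

# Convert every column to numeric while keeping the data frame structure.
df[] <- lapply(df, as.numeric)
str(df)
```

Assigning with df[] keeps the data frame and its column names while swapping in the converted columns.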
