R: Create dataframe from paste0 content

I am manually creating a "free text" table using cat and paste0 like so:
tab <- cat(paste0("Stage","\t","Number","\n",
"A","\t",nrow(df[df$stage == "A",]),"\n",
"B","\t",nrow(df[df$stage == "B",]),"\n"
))
i.e.
Stage Number
A 54
B 85
I want to be able to create a publication-ready table (i.e. one that looks good, probably generated by R Markdown).
The xtable() function can do this, but it only accepts a dataframe. So my question is: how do I get some free text, with columns delimited by "\t" and rows by "\n", into a dataframe?
I have tried:
data.frame(do.call(rbind,strsplit(as.character(tab),'\t')))
But I get "dataframe with zero columns and zero rows". I think this is because I am not declaring "\n" to be a new line.
By the way, if this way seems long-winded and there is an easier way, I am happy to take suggestions.

Related

How can i remove the first x number of characters of a column name from 200+ columns with each column being not the same number of characters

How can I remove a varying number of leading characters from 200+ column names? For example, from "Q1: GOING OUT?" and "Q5: STATE, PROVINCE, COUNTY, ETC" I just want to remove the "Q1: " and the "Q5: ". I have looked around but haven't been able to find a way that doesn't require renaming them manually. Are there any functions for this, or ways to do it with the tidyverse? I have only been using R for 2 months.
I don't really have anything to show. I have considered using for loops and possibly gsub or case_when, but I don't really understand how to use them properly.
#probably not correctly written but tried to do it anyways
for ( x in x(0:length) and _:(length(CandyData)-1){
front -> substring(0:3)
back -> substring(4:length(CandyData))
print <- back
}
I don't really have any errors because I haven't been able to make it work properly.

Try this:
# Example names. If you already have a data frame, use col_all <- names(df) instead.
col_all <- c("Q1:GOING OUT?", "Q2:STATE", "Q100:PROVINCE", "Q200:COUNTRY", "Q299:ID")
for (col in seq_along(col_all)) { # iterate over the column names
  colname <- col_all[col] # current column name
  index1 <- regexpr(":", colname, fixed = TRUE) # position of the first ":" (-1 if there is none)
  if (index1 > 0) {
    # keep only the text after the ":" and store it back in col_all
    col_all[col] <- substr(colname, index1 + 1, nchar(colname))
  }
}
names(df) <- col_all # assign the cleaned names back to the data frame

H 1's answer is still the best: the sub() or gsub() functions will do the work. And do not fear regex; it is a powerful tool in data management.
Here is the gsub version:
names(df) <- gsub("^.*:","",names(df))
It works this way: for each name, the pattern matches everything from the start of the string up to and including the last ":", and that match is replaced with an empty string.
Remember to upvote H 1's solution in the comments.

Why is df.to_string printing out weird labels?

If I run the code like so:
print(df['Col1'].to_string(index=False))
I get:
1
2
3
Now if I use the code like so (without print):
s = df['Col1'].to_string(index=False)
s
I get:
'1\n2\n3'
Where are the backslashes and 'n' characters coming from? What is the appropriate way of listing a single column, with the ultimate goal of assigning it to an array?

If you want to convert a column to an array or a list, use:
col_list = df['Col1'].values  # NumPy array
or
col_list = list(df['Col1'])  # Python list
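The \n sequence is an escape sequence, found in many languages, that represents a new line inside a string. Evaluating s on its own at the interactive prompt shows the string's representation, in which the newline is written as \n; print writes the string itself, so each newline becomes an actual line break.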
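The difference between the two displays can be reproduced without pandas. This is a minimal sketch: the literal string below is only a stand-in for what to_string(index=False) returned in the question, and the names shown_by_repl and values are made up for illustration.

```python
# Stand-in for the string returned by df['Col1'].to_string(index=False)
s = "1\n2\n3"

# A bare `s` at the interactive prompt echoes repr(s): quotes plus a visible \n
shown_by_repl = repr(s)

# print(s) writes the characters themselves, so each newline ends a line
print(s)

# The string holds 5 characters: three digits and two real newline characters
assert len(s) == 5

# Parsing the text back into numbers works, but it is a detour
values = [int(x) for x in s.splitlines()]
print(values)
```

Parsing the printed text is rarely needed: converting the column directly, as in the two lines of the answer above, avoids the round trip through a string.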

R regex match '\' and newline in dataframe to create columns with new values

I am trying to split values in a dataframe column that looks like this:
Apple\Banana
Drink
---
Drink\Cup Cake
Apple
--
Fudge\Grape\Ham
Cup Cake
---
I am trying to match both newline and '\' using regex in strsplit.
currently I am using this:
strsplit(as.character(df$Food), "[\\\\ \n]")
However, it is also matching the space and splitting "Cup Cake" into "Cup" and "Cake".
I am trying to figure out the proper regex for this matching.
My aim is to split the multiple values into multiple food columns in the dataframe called Food.1, Food.2, Food.3, etc. Is there a standard way to do the split and create new columns in a dataframe? I think strsplit may not be the best way forward.

You have a space inside the character class, so the pattern splits on spaces as well. Remove it (the order of the newline and the backslash inside the brackets does not matter):
strsplit(as.character(df$Food), "[\n\\\\]")

R, merge multiple rows of text data frame into one cell

I have a text data frame that looks like below.
> nrow(gettext.df)
[1] 3
> gettext.df
gettext
1 hello,
2 Good to hear back from you.
3 I've currently written an application and I'm happy about it
I wanted to merge this text data into one cell (to do sentiment analysis) as below
> gettext.df
gettext
1 hello, Good to hear back from you. I've currently written an application and I'm happy about it
so I collapsed the cell using below code
paste(gettext.df, collapse =" ")
but it seems to turn the text data into one chunk (as one word), so I cannot scan the sentence word by word.
Is there any way to merge those sentences into a collection of sentences without turning them into one big word chunk?

You have to transform the data frame column into a character vector before using paste.
paste(unlist(gettext.df), collapse =" ")
This returns:
[1] "hello, Good to hear back from you. I've currently written an application and I'm happy about it"

Perform sequence of edits on a large text file

I am hoping to perform a series of edits on a large text file composed almost entirely of single letters, separated by spaces. The file is about 300 rows by about 400,000 columns, and about 250 MB.
My goal is to transform this table in a series of steps, for eventual processing with another language (R, probably). I don't have much experience working with big data files, but Perl has been suggested to me as the best way to go about this. Please let me know if there is a better way :).
So, I am hoping to write a Perl script that does the following:
Open file, edit or write to a new file the following:
remove columns 2-6
merge/concatenate pairs of columns, starting with column 2 (so, merge column 2-3,4-5, etc)
replace each character pair according to a sequential conditional algorithm running across each row:
[example PSEUDOCODE: if character 1 of cell = character 2 of cell=a, cell=1
else if character 1 of cell = character 2 of cell=b, cell=2
etc.] such that except for the first column, the table is a numerical matrix
remove every nth column, or keep every nth column and remove all others
I am just starting to learn Perl, so I was wondering whether these operations are possible in Perl, whether Perl would be the best way to do them, and whether there are any suggestions for the syntax of these operations in the context of reading from and writing to a file.

I'll start:
use strict;
use warnings;
while (<>) {
    chomp;
    my @cols = split ' '; # split on whitespace
    splice(@cols, 1, 5); # remove columns 2-6
    my @transformed = ($cols[0]); # start a fresh row with the first column
    # merge pairs of columns; a trailing unpaired column is dropped
    for (my $i = 1; $i + 1 < @cols; $i += 2) {
        push @transformed, "$cols[$i]$cols[$i+1]";
    }
    # other transforms as required
    print join(' ', @transformed), "\n";
}
That should get you on your way.

You need to post some sample input and expected output or we're just guessing what you want but maybe this will be a start:
awk '{
printf "%s ", $1
for (i=7;i<=NF;i+=2) {
printf "%s%s ", $i, $(i+1)
}
print ""
}' file
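Since the question asks whether another tool might fit, here is a rough Python sketch of all four steps applied to one row at a time, so the 250 MB file never has to be held in memory. Everything named here is an assumption for illustration: the function transform_row, the PAIR_CODES table (the question only gives "aa" -> 1 and "bb" -> 2 as pseudocode), the "NA" placeholder for pairs without a code, and the sample row.

```python
def transform_row(line, pair_codes, keep_every=1):
    """Apply the four steps from the question to one whitespace-separated row."""
    cols = line.split()
    # Step 1: keep column 1 and drop columns 2-6
    label, rest = cols[0], cols[6:]
    # Step 2: merge neighbouring columns; a trailing unpaired column is dropped
    pairs = [a + b for a, b in zip(rest[0::2], rest[1::2])]
    # Step 3: recode each pair; pairs without a code become "NA" (an assumption)
    coded = [str(pair_codes.get(p, "NA")) for p in pairs]
    # Step 4: keep every n-th recoded column, starting with the first
    return " ".join([label] + coded[::keep_every])


# Hypothetical coding table based on the pseudocode in the question
PAIR_CODES = {"aa": 1, "bb": 2}

# Made-up sample row: an id, five columns to drop, then three letter pairs
example = "id1 x x x x x a a b b a b"
result = transform_row(example, PAIR_CODES)
print(result)
```

For the real file, the same function would be called once per line while iterating over the open input file and writing each returned row to the output file.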
