Exporting summarise results to HTML or Word - R

After working out how to summarise a data frame, I got the result I wanted.
I can see it in my console, and that output is shown below, after the first two lines of code:
byTue <- group_by(luckyloss.3, L_byUXR)       # group the data frame by the L_byUXR column
( sumMon <- summarize(byTue, count = n()) )   # count rows per group; the outer parentheses also print the result
Below is what I see on the console. It feels good, because it shows I got what I was looking for.
The results come from a column of 234 rows in which many values repeat, so the summarise collapses those 234 rows into one count per value: ANA appears 8 times, ARI 14 times, and so on.
# A tibble: 30 × 2
   L_byUXR count
   <chr>   <int>
 1 ANA         8
 2 ARI        14
 3 ATL        16
 4 BAL         4
 5 BOS         6
 6 CHA        12
 7 CHN         8
 8 CIN        10
 9 CLE         4
10 COL         8
# ... with 20 more rows
What I want is to take this output of 30 rows by two columns into a Word document, or even HTML.
I tried writing it out with something like write.csv(byTue), but what I received was the 234 rows of the original data frame; it's as if the summarise had disappeared. I have checked other routes such as R Markdown and creating new files, and tried to see whether the knitr package could help, but nothing worked.
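A likely cause, as a hedged aside: the export call was given the grouped original (byTue) rather than the summary. Writing the summarised object itself would look like this, assuming sumMon holds the 30-row result:

write.csv(sumMon, "sumMon.csv", row.names = FALSE)  # exports the 30 summary rows, not the 234 originals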

library(stringi) # ONLY NECESSARY FOR DATA SIMULATION
library(officer) # <<= install this
library(tidyverse)
Simulate some data:
set.seed(2017-11-18)
sumMon <- tibble(   # data_frame() from the original answer is deprecated; tibble() is the current equivalent
  L_byUXR = stri_rand_strings(30, 3, pattern = "[A-Z]"),
  count   = sample(20, 30, replace = TRUE)
)
Start a new Word document, add the table, and save it to a new file:
read_docx() %>%                                         # a new, empty document
  body_add_table(sumMon, style = "table_template") %>%  # add the summary as a Word table
  print(target = "new.docx")                            # write new.docx to disk

I kept looking for an answer and found the stargazer package for R, which let me get the data frame out as text that can be edited further.
When you write the R instruction, name the file you want in out = and stargazer will place it in your session's working directory for you.
The instruction I used was (with the summarised object passed in, since stargazer wants a plain data frame rather than a grouped tibble):
stargazer(as.data.frame(sumMon), type = "text", summary = FALSE, title = "Any Title", digits = 1, out = "table1.txt")
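stargazer can also emit HTML directly, which covers the other half of the question; a minimal variant of the same call (same assumptions as above):

stargazer(as.data.frame(sumMon), type = "html", summary = FALSE, title = "Any Title", digits = 1, out = "table1.html")  # writes an HTML table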
Even though I found the answer, I could not have done it without the help of hrbrmstr, who showed me there was a package to do it; I just needed to work on it a bit more.

Related

How do I create a data table in code in R

I have a data table as a CSV file that I use to create metrics for a dashboard. The data table includes metric IDs and associates them with field names. This table, the definition of the metrics, is largely static, and I'd like to include it within the R code rather than, for example, importing a CSV file containing these headings.
The table looks something like this:
Metric_ID  Metric_Name            Numerator            Denominator
AB0001     Number_of_Customers    No_of_Customers
AB0002     Percent_New_Customers  No_of_New_Customers  No_of_Customers
This has about 40 rows of data, and I'd like to set the table up in code so that it is created when the R script runs. I'll then use it to associate metric IDs with measures I retrieve through SQL queries. Sometimes this table may change; for example, new metrics might be added or existing metrics modified, which would need some modification of the code to incorporate them.
The closest way I could find was to create a data.table, along the lines described in the question below.
dt <- data.table(x = c(1, 2, 3), y = c(2, 3, 4), z = c(3, 4, 5))
dt
x y z
1: 1 2 3
2: 2 3 4
3: 3 4 5
(linked question: cbind with data table and data frame)
This works for a table with a few rows or columns, but it becomes unwieldy for tables with 40+ rows. For example, if I wanted to modify a metric 20 rows down, I'd have to go 20 rows down in each column and then test the table to ensure I switched the metric at the right place in every column, especially where some metrics have empty cells. For example, I might correct the metric ID in row 20 but accidentally put the definition (a separate column) in row 19.
Is there a more straightforward way of, in essence, creating a table in code?
(I appreciate that the most straightforward way would be to keep a CSV file accessible and use read_csv to import it into R. However, this doesn't work so well if colleagues run the query on their own machines with a different file path to the CSV; it also raises the risk of them running the query with an out-of-date metrics table, as they may not have the latest version in their files.)
Thanks in advance for any guidance you might have!
Tony
Here are two options (examples taken from respective help pages):
data.table::fread()
fread("A,B
1,2
3,4
")
#> A B
#> <int> <int>
#> 1: 1 2
#> 2: 3 4
https://rdatatable.gitlab.io/data.table/reference/fread.html
tibble::tribble()
tribble(
~colA, ~colB,
"a", 1,
"b", 2,
"c", 3
)
#> # A tibble: 3 × 2
#> colA colB
#> <chr> <dbl>
#> 1 a 1
#> 2 b 2
#> 3 c 3
https://tibble.tidyverse.org/reference/tribble.html
Other options:
If you already have the data.frame from somewhere, you can also use dput() to get structure() code you can paste into the files you are distributing (see the sketch after this list).
Use the reprex package: https://reprex.tidyverse.org/
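A minimal dput() sketch, using a hypothetical two-row version of the metrics table:

metrics <- data.frame(
  Metric_ID   = c("AB0001", "AB0002"),
  Metric_Name = c("Number_of_Customers", "Percent_New_Customers")
)
dput(metrics)
#> structure(list(Metric_ID = c("AB0001", "AB0002"), Metric_Name = c("Number_of_Customers",
#> "Percent_New_Customers")), class = "data.frame", row.names = c(NA, -2L))

The structure() output can be pasted into the distributed script and assigned to a name, so the table travels with the code.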

Is there a way I can use R code to calculate the average price for specific days? (AVERAGEIF function)

Firstly: I have seen other posts about AVERAGEIF translations from Excel into R, but I didn't find one that worked for my specific case and couldn't get one to work myself.
I have a dataset covering the daily prices of a bunch of listings.
It looks like this:
  listing_id     date price
1       1000 1/2/2015  $100
2       1200 2/4/2016  $150
Sample of the dataset (and desired outcome): https://send.firefox.com/download/228f31e39d18738d/#rlMmm6UeGxgbkzsSD5OsQw
The dataset I would like to end up with has only the date and the average price of all listings on that date. The goal is a (different) data frame looking something like this, so I can work with it:
Date Average Price
1 4/5/2015 204.5438
2 4/6/2015 182.6439
3 4/7/2015 176.553
4 4/8/2015 182.0448
5 4/9/2015 183.3617
6 4/10/2015 205.0997
7 4/11/2015 197.0118
8 4/12/2015 172.2943
I created this in Excel from the sample provided above, using the AVERAGEIF function (and copy-pasting by value).
I tried to format the data in Excel first so I could use AVERAGEIF to take the average where the date matched. The problem is that the dataset consists of 30 million rows and Excel only allows about 1 million, so that didn't work.
What I have done so far: I created a data frame in R (where I want the average prices to go) using:
Avg <- data.frame("Date" = 1:2, "Average Price" = 1:2)  # data.frame() renames the second column to Average.Price
Avg[nrow(Avg) + 2036, ] <- list("v1", "v2")             # pad the frame to 2038 rows, one per day in the range
Avg$Date <- seq(from = as.Date("2015-04-05"), to = as.Date("2020-11-01"), by = "day")
I tried to create an AVERAGEIF-like function following this article and another, but could not get it to work.
I hope this is enough information to go on; otherwise I'd be more than happy to provide more.
If your question is how to replicate the AVERAGEIF function, you can use logical indexing.
R code:
> df
Dates Prices
1 1 100
2 2 120
3 3 150
4 1 320
5 2 250
6 3 210
7 1 102
8 2 180
9 3 150
idx <- df$Dates == 1 # Positions where condition is true
mean(df$Prices[idx]) # Prints same output as Excel
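To produce the asker's full desired output (one average per date) rather than a single AVERAGEIF value, here is a base-R sketch; the column names date and price, and the "$100"-style strings, are assumptions taken from the sample:

df$price_num <- as.numeric(gsub("[$,]", "", df$price))             # strip "$" and "," and convert to numeric
avg_by_date <- aggregate(price_num ~ date, data = df, FUN = mean)  # one mean price per date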

Find frequency of terms from a function

I need to find the frequency of the terms returned by a function I created, which extracts terms that contain punctuation.
library("tm")
my.text.location <- "C:/Users/*/"
newpapers <- VCorpus(DirSource(my.text.location))
I read the corpus in and then define the function:
library("stringr")
punctterms <- function(x){str_extract_all(x, "[[:alnum:]]{1,}[[:punct:]]{1,}?[[:alnum:]]{1,}")}
terms <- lapply(newpapers, punctterms)
Now I'm lost as to how to find the frequency of each term in each file. Do I turn it into a DTM, or is there a better way without one?
Thank you!
This task is better suited to quanteda than tm. Your function creates a list and strips everything out of the corpus; with quanteda you can use its built-in commands to get everything you want.
Since you didn't provide any reproducible data, I will use a data set that ships with quanteda. Comments above the code explain what is going on. The most important function in this code is dfm_select(), where you can use a diverse set of selection patterns to find terms in the text.
library(quanteda)
# load corpus
my_corpus <- corpus(data_corpus_inaugural)
# create document-feature matrix (like a document-term matrix)
my_dfm <- dfm(my_corpus)
# dfm_select can use regex selections to select terms
my_dfm_punct <- dfm_select(my_dfm,
                           pattern = "[[:alnum:]]{1,}[[:punct:]]{1,}?[[:alnum:]]{1,}",
                           selection = "keep",
                           valuetype = "regex")
# show frequency of selected terms
head(textstat_frequency(my_dfm_punct))
feature frequency rank docfreq group
1 fellow-citizens 39 1 19 all
2 america's 35 2 11 all
3 self-government 30 3 16 all
4 world's 24 4 15 all
5 nation's 22 5 13 all
6 god's 15 6 14 all
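Note: in recent quanteda releases (v3 and later), dfm() expects a tokens object and textstat_frequency() has moved to the quanteda.textstats package. A minimal sketch of the updated pipeline, to verify against your installed version:

library(quanteda)
library(quanteda.textstats)
my_dfm <- dfm(tokens(corpus(data_corpus_inaugural)))  # tokenize first, then build the dfm
my_dfm_punct <- dfm_select(my_dfm,
                           pattern = "[[:alnum:]]{1,}[[:punct:]]{1,}?[[:alnum:]]{1,}",
                           selection = "keep",
                           valuetype = "regex")
head(textstat_frequency(my_dfm_punct))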
So I got it to work without using quanteda:
m <- as.data.frame(table(unlist(terms)))  # flatten the per-file lists of matches and tabulate them
names(m) <- c("Terms", "Frequency")

Using subset with a string in R

I have the following data frame in R (made-up data for learning the program):
country population civilised
1 Town 13 5
2 city 69 9
3 Home 24 2
4 Stuff 99 9
and I am trying to access specific rows with the subset function, like
test <- subset(t, country==Town)
but all I ever get is "object not found".
We need to quote the string.
test <- subset(t, country=='Town')
test
# country population civilised
#1 Town 13 5
NOTE: t is a function name (check ?t). It is better not to give objects the names of existing functions.
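For reference, the same selection with plain logical indexing, no subset() needed:

test <- t[t$country == "Town", ]  # rows where country equals the quoted string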

How can I select human miRNAs from an Affy chip while analyzing data in R?

I am new to R and want to analyze miRNA expression in a data set of 3 groups. Can anyone help me out?
In this case I got other (non-human) miRNAs on the Affy chips as the top expressed genes. Now I want to select only the human miRNAs. Please help me.
Thanks in advance.
Summary
I'm not entirely sure what your data frame looks like, given that I haven't worked with Affy chips before. Let me try to summarize what I think you have told us. You have a data frame with a list of all of the microRNAs on the Affy chip, along with their expression data. You want to select a subset of these microRNAs that are unique to humans.
Possible solution 1
You do not state whether or not your data frame contains a variable that identifies whether or not these microRNAs are indeed from humans. If it does have this information, all you would need to do is subset your data based on this identifier. Type help(subset) or help(Extract) for more information on how to do this.
Possible solution 2
If your data frame does not contain such an identifier, you will first need to make a list of all known human microRNAs. You could retrieve these manually from the online miRBase website (and then import them into R), or you could download them from Ensembl using the R package biomaRt. To do the latter, after loading biomaRt, you might type this command:
miRNA <- getBM(c("mirbase_id", "ensembl_gene_id", "start_position", "chromosome_name"),
               filters = c("with_mirbase"), values = list(TRUE), mart = ensembl)
The above code requests that R download the mirbase identifier, gene ID, start position, and chromosome name for all microRNAs in the miRBase catalog. (Note that you would have to specify the human Ensembl mart in an earlier command, which I have not shown).
Once you have downloaded this information, you could use a merge command or perhaps a which command to pull the appropriate microRNAs from your Affy chip data; a sketch follows.
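A hypothetical sketch of the merge() route; affy_df and its mirbase_id column are assumptions about your data, not details from the question:

## keep only the Affy rows whose miRBase ID appears in the human miRNA download
human_only <- merge(affy_df, miRNA, by = "mirbase_id")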
Recommendations
This all might sound a bit complicated. If you haven't already, I recommend that you spend some time working through exercises on biomaRt and Bioconductor. Information about these packages, and how to install them, is available at the links below:
Bioconductor, http://www.bioconductor.org/install/
Database mining with biomaRt, http://www.stat.berkeley.edu/~sandrine/Teaching/PH292.S10/Durinck.pdf
You might consider asking for this question to be migrated to Biostar. I think you would get better responses there. Also, consider editing your question to provide more information about your data. Good luck.
Edit to my original answer
In reference to your comment made at 2012-02-26 22:08:02, try the following:
## Load biomaRt package
library(biomaRt)
## Specify which "mart" (i.e., source of genetic data) that you want to use
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
## You can then ask the system what attributes are available for download
listAttributes(ensembl)
                       name             description
58        mirbase_accession    miRBase Accession(s)
59               mirbase_id           miRBase ID(s)
60        mirbase_gene_name       miRBase gene name
61  mirbase_transcript_name      miRBase transcript
Above I have pasted part of the output from the listAttributes() command, which shows the relevant miRBase options. Now you can try the following code:
## Download microRNA data
miRNA <- getBM(c("mirbase_id", "ensembl_gene_id", "start_position", "chromosome_name"),
               filters = c("with_mirbase"), values = list(TRUE), mart = ensembl)
## Check how much we downloaded
> dim(miRNA)
[1] 715 4
## Peek at the head of our data
> head(miRNA)
mirbase_id ensembl_gene_id start_position chromosome_name
1 hsa-mir-320c-1 ENSG00000221493 19263471 18
2 hsa-mir-133a-1 ENSG00000207786 19405659 18
3 hsa-mir-1-2 ENSG00000207694 19408965 18
4 hsa-mir-320c-2 ENSG00000212051 21901650 18
5 hsa-mir-187 ENSG00000207797 33484781 18
6 hsa-mir-1539 ENSG00000222690 47013743 18
## Check which chromosomes are contributing to our data
> table(miRNA$chromosome_name)
1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 8 9 X
50 27 26 25 15 59 26 15 35 7 85 23 32 5 16 31 23 30 17 33 27 28 80
Now your challenge will be to use this downloaded data to parse your original Affy data frame. Again, read the help files for the merge, Extract, and which functions and give it a try yourself first; one more sketch follows.
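As one more hypothetical sketch (again assuming an affy_df with a mirbase_id column), the %in% operator gives a which-style subset without merging:

keep <- affy_df$mirbase_id %in% miRNA$mirbase_id  # TRUE where the ID is a known human miRNA
human_affy <- affy_df[keep, ]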
