First of all, I am quite new to all of this, so please excuse any inaccuracies. I am struggling to find a way to add a line to a stargazer HTML table that shows the sum of the percentage shares of all categories. The line should sit below the bottom rule of the table, i.e. in the notes section. Unfortunately, since I set summary = FALSE, there seems to be no way to access or change the notes section: it appears to be available only in stargazer's regression tables, not in tables that simply display the contents of a data frame. I tried using notes.label to name the notes section "sum", so that I could manually enter "100" below the line in the "Relative share" column. That did not work.
If I add a row with the summed relative share to the data frame, stargazer includes it in the regular table rather than showing it separately below the table line (possibly I just have not figured out how to do so).
Distribution table with relative share:
library(dplyr)
library(stargazer)

# Add each category's percentage of the total count
H1_Industry_dist <- H1_Industry_dist %>%
  mutate(`Relative share` = Count / sum(Count) * 100)
stargazer(H1_Industry_dist, type = "html",
          summary = FALSE, rownames = FALSE,
          out = "Table 2 - H1 Industry Distribution.html")
I hope to find some help here. Many thanks!
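One possible workaround, sketched here under assumptions rather than as a stargazer feature, is to post-process the HTML. With type = "html", stargazer prints its output and also returns it invisibly as a character vector of lines, so an extra row can be inserted just before the closing </table> tag, which places it below the bottom rule. The helpers html_row and add_row_below_table, the three-column layout and the sample data are hypothetical, and the mock HTML lines only imitate stargazer's output.

```r
# Build one HTML table row from a vector of cell contents.
html_row <- function(cells) {
  paste0("<tr>", paste0("<td>", cells, "</td>", collapse = ""), "</tr>")
}

# Insert a row just before the last closing </table> tag,
# i.e. below the table's bottom rule.
add_row_below_table <- function(html_lines, cells) {
  close_idx <- tail(grep("</table>", html_lines, fixed = TRUE), 1)
  stopifnot(length(close_idx) == 1)  # expects a </table> line to exist
  append(html_lines, html_row(cells), after = close_idx - 1)
}

# Hypothetical data standing in for H1_Industry_dist
dist <- data.frame(Industry = c("Manufacturing", "Services", "Retail"),
                   Count = c(12, 20, 8))
dist$`Relative share` <- dist$Count / sum(dist$Count) * 100

# Sum line: label in the first column, blank under Count,
# total under "Relative share"
sum_cells <- c("Sum", "", format(round(sum(dist$`Relative share`), 1)))

# Mock lines imitating stargazer's returned HTML (assumption)
html <- c('<table style="text-align:center">',
          '<tr><td>Industry</td><td>Count</td><td>Relative share</td></tr>',
          '<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr>',
          '</table>')
html <- add_row_below_table(html, sum_cells)
```

With the real table this could look like html <- stargazer(H1_Industry_dist, type = "html", summary = FALSE, rownames = FALSE), followed by writeLines(add_row_below_table(html, sum_cells), "Table 2 - H1 Industry Distribution.html"). It is worth checking that the returned lines contain a single </table> line before relying on this.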
I work a lot with the new table and collect commands in Stata 17. Does anybody know how to get the confidence interval into one cell of the table, rather than one column for the lower bound and another for the upper bound?
Alternatively, a quick fix in Word would help (or in Excel, though my final document is in Word, and saving the output to Excel takes so long).
As I see it, there is no option to put it in one column, so maybe a layout workaround?
The quick start in the Stata documentation of the collect command mentions
table (colname) (result), command(_r_b _r_ci: regress y x1 x2 x3). You should be able to use collect with it, but without a minimal reproducible example of your specific case, it is hard to verify whether this works as intended for you. For the general idea of a minimal reproducible example please see here, and for specific advice on how to create one please see here.
Here is a general example that uses table, collect and putdocx to create a Word document with the confidence interval in one cell:
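* Load the example dataset used in the Stata 17 manuals
use https://www.stata-press.com/data/r17/nlsw88.dta
* Tabulate coefficients (_r_b) and confidence intervals (_r_ci) from one regression
table (colname) (result), command(_r_b _r_ci: regress wage union occupation married age)
* Rows: coefficient names; columns: results
collect layout (colname) (result)
* Export the current collection to a Word document
putdocx begin
putdocx collect
putdocx save Table, replace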
The text from a PDF I scraped is jumbled up across different elements, and on top of that, some data was lost when it was converted to a data frame. It's really hard to tell where the text should have been split, since it seems like I got it right in the code below. How do I split the text so that it looks like the original table?
library(pdftools)
library(dplyr)
library(tidyr)

mintz = "https://www.mintz.com/sites/default/files/media/documents/2019-02-08/State%20Legislation%20on%20Biosimilars.pdf"
# Keep pages 2-23 and extract one text string per page
mintzText = pdf_subset(mintz, pages = 2:23)
mintzText = pdf_text(mintzText)
q = data.frame(trimws(mintzText))
# One row per line of text
mintzdf <- q %>%
  rename(x = trimws.mintzText.) %>%
  mutate(x = strsplit(x, "\\n")) %>%
  unnest(x)
View(mintzdf)
# Drop the first two lines, then split each line into columns
mintzDF = mintzdf[-c(1:2), ]
mintzDF = mintzDF %>%
  separate(x, c("a", "State", "Substitution Requirements",
                "Pharmacy Notification Requirements (to prescriber, patient, or others)",
                "Recordkeeping Requirements")) %>%
  select(-a)
View(mintzDF)
(Images: what it looks like; what it should look like)
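A likely cause of the lost data is that tidyr's separate() splits on every run of non-alphanumeric characters by default, so multi-word cells are broken into pieces and the surplus pieces are dropped. A sketch of an alternative, assuming the extracted text keeps the page layout so that columns are separated by at least two spaces, is to split each line on runs of two or more spaces. The helper split_layout_line and the sample lines are made up for illustration, and cells that wrap over several lines in the PDF would still end up as separate rows.

```r
# Split one line of layout-preserved text wherever two or more spaces occur.
# Extra fields are merged into the last column; missing ones become NA.
split_layout_line <- function(line, n_fields) {
  fields <- strsplit(trimws(line), "\\s{2,}")[[1]]
  if (length(fields) > n_fields) {
    head_part <- fields[seq_len(n_fields - 1)]
    tail_part <- paste(fields[-seq_len(n_fields - 1)], collapse = " ")
    fields <- c(head_part, tail_part)
  }
  length(fields) <- n_fields  # pads with NA when the line has fewer fields
  fields
}

cols <- c("State", "Substitution Requirements",
          "Pharmacy Notification Requirements", "Recordkeeping Requirements")

# Made-up lines standing in for the pdf_text() output
lines <- c("State A    first cell    second cell    third cell",
           "State B    only one cell")

rows <- lapply(lines, split_layout_line, n_fields = length(cols))
table_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(table_df) <- cols
table_df
```

In the tidyr pipeline, a similar effect might be had by passing a regular expression such as "\\s{2,}" to separate()'s sep argument instead of relying on the default.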
The order in which a PDF stores a page's text may be random, or may run from the bottom rows upwards, as there are no key-press order rules for when lasers charge a drum (the design requirement when PDF was introduced).
We are lucky if the order can be extracted sensibly at all, but this is a very well-ordered PDF. So remember that the extractor need not observe the grid; it simply outputs rows, with spaces that, with luck, form columns.
In this case, using poppler's pdftotext with no options, the text order of a single page could look like this, with the first column headed State and the second starting with Substitution\nRequirements\n. You may well scratch your head over why State is not spaced away from Alaska, but then it is PDF after all, so expect no rules.
It looks like the text was written down one column, then across two, then perhaps down the last.
Depending on how much the pages vary, I would attempt to target vertical strips rather than horizontal ones: set a template of 4 vertical, page-high zones and then hope the horizontal breaks can be matched up across them. The alternative (probably better) is to extract with a tabular layout, for which xpdf's pdftotext may give a better result.
Or use a Python library such as pdfminer, which reports the position of each piece of text on the page.
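The vertical-strip idea can be sketched in R when word positions are available: pdftools::pdf_data() returns, for each page, a data frame of words with x, y and text columns. The sketch below assigns each word to a column using fixed x breakpoints (a hypothetical template you would read off the page) and groups words into rows by their y value. It assumes words on the same line share the same y; real pages may need rounding or clustering of y, and the sample words and breakpoints are made up.

```r
# Place positioned words into a grid: columns come from x breakpoints
# (vertical strips), rows from distinct y values.
words_to_grid <- function(words, col_breaks) {
  words <- words[order(words$y, words$x), ]       # top-to-bottom, left-to-right
  col_idx <- findInterval(words$x, col_breaks)    # strip number, 1-based
  stopifnot(all(col_idx >= 1))                    # x must be >= col_breaks[1]
  row_idx <- match(words$y, sort(unique(words$y)))
  grid <- matrix("", nrow = max(row_idx), ncol = length(col_breaks))
  for (i in seq_len(nrow(words))) {
    r <- row_idx[i]
    k <- col_idx[i]
    grid[r, k] <- trimws(paste(grid[r, k], words$text[i]))
  }
  grid
}

# Made-up words in the shape of pdf_data() output (x, y, text)
words <- data.frame(x = c(10, 120, 160, 300, 10, 120),
                    y = c(50, 50, 50, 50, 70, 70),
                    text = c("Row1", "cell", "two", "cell3", "Row2", "cell2b"),
                    stringsAsFactors = FALSE)

# Hypothetical template: strips start at x = 0, 100, 250 and 400
grid <- words_to_grid(words, col_breaks = c(0, 100, 250, 400))
grid
```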
I want to calculate diversity indices of different sampling sites in R. I have sites in the first row and the different species in the first column. However, R is reading the first column as normal data (not as a header so to speak).
Pics:
https://imgur.com/a/iBsFtbe
Code:
Macro <- read.csv("C:\\Users\\Carly\\OneDrive\\Desktop\\Ecology Projects\\Macroinvertebrates & Water Quality\\Macro_RData\\Macroinvert\\MacroR\\MacroCSV.csv", header = TRUE)
You need to add row.names = 1 to your command. This will indicate that row names are stored in column number 1.
Macro <- read.csv("<...>/MacroCSV.csv", header = TRUE, row.names = 1)
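To see what row.names = 1 does without the original file, here is a small sketch that reads a made-up CSV from a string via read.csv's text argument. It uses the layout described in the question: species in the first column and sites as the remaining columns. The site names and counts are invented.

```r
csv_text <- "Species,SiteA,SiteB,SiteC
Aeshnidae,0,2,1
Amnicolidae,3,0,0
Ancylidae,1,1,4"

# row.names = 1: use the first column as row names instead of as data
Macro <- read.csv(text = csv_text, header = TRUE, row.names = 1)

rownames(Macro)  # the species names
colnames(Macro)  # only the site names; "Species" is no longer a column
```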
I sense that you are frustrated. As r2evans said, it is easier for people to help you if you provide the data in text form rather than as screenshots, because we can't recreate the problem or try to solve it by loading a screenshot into R.
CSV files are just text, so you can open them in a text editor such as Notepad and copy and paste the contents here. You don't need the whole file; the columns and lines needed to reproduce the problem are enough. This is what we were looking for:
Site,Aeshnidae,Amnicolidae,Ancylidae,Asellidae
AN0119A,0,0,0,6,0
AN0143,0,0,0,0,0
Programming for many people is very frustrating when they start out, don't let this discourage you!
It looks like your data is in the wrong orientation for analysis in vegan - your species are the rows, and sites are columns. From your pics, it looks like you've spotted this issue and tried transposing, but are having issues with the placement of the headers.
Try reading your csv in, and specifying that the first column should be row names:
MacroDataDataFinal <- read.csv("Path/to/file.csv",
row.names=1)
Then transpose the data
MacroDataDataFinal_transposed <- t(MacroDataDataFinal)
Then try running the specaccum function:
library(vegan)
speccurve <- specaccum(comm = MacroDataDataFinal_transposed,
                       method = "random",
                       permutations = 1000)
Hopefully this will work. If you get any errors please let us know the code you typed, and the precise error message.
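Since the original goal was diversity indices, here is a base-R sketch of the Shannon index, H = -sum(p_i * ln(p_i)) over the species at each site, applied to a sites-by-species matrix such as the transposed data. In practice vegan::diversity(x, index = "shannon") is the usual function and should give the same values; the small matrix below is made up.

```r
# Shannon diversity per site (rows = sites, columns = species counts)
shannon <- function(comm) {
  apply(comm, 1, function(counts) {
    p <- counts[counts > 0] / sum(counts)  # drop zeros: 0 * log(0) counts as 0
    -sum(p * log(p))
  })
}

comm <- matrix(c(10, 10, 0,
                  5,  0, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("Site1", "Site2"), c("sp1", "sp2", "sp3")))
h <- shannon(comm)
h  # Site1: two equally common species, log(2); Site2: one species, 0
```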
I'd like to make a contingency table of sex and disease. As I use R Markdown for PDF reports, I use kableExtra to customize the tables. kableExtra doesn't format tables well when they are not data frames, so the tables it produces from tableby look ugly.
With this data.frame here is what I got.
library(kableExtra)
library(arsenal)
set.seed(0)
Disease<-sample(c(rep("Name of the first category of the disease",20),
rep("Name of the Second category of the disease",32),
rep("Name of the third category of the disease",48),
rep("The category of those who do not belong to the first three categories",13)))
ID<-c(1:length(Disease))
Gender<-rbinom(length(Disease),1,0.55)
Gender<-factor(Gender,levels = c(0,1),labels = c("F","M"))
data<-data.frame(ID,Gender,Disease)
When I render this analysis with R Markdown (PDF), I get this kind of table:
There are two problems. Firstly, kableExtra doesn't deal with the &nbsp; characters.
Secondly, I can't customize column widths when I use tableby with kableExtra. I would like to widen the column containing the variable names, since in my real data the names of the variable values are very long. If I use knitr's kable instead, the characters are removed, but the table is not scaled down and part of it is not displayed. I think knitr has many limitations.
How can I deal with this problem? Or is there another function that could be used in R Markdown (PDF output) to make nice contingency tables with p-values?
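Since kableExtra works best with plain data frames, one alternative to tableby, sketched here under assumptions, is to build the contingency table as an ordinary data.frame and attach a chi-squared p-value yourself. The helper contingency_df and the shortened category names are hypothetical; the data mimic the question's example.

```r
set.seed(0)
# Made-up data in the same shape as the question's example
Disease <- sample(rep(c("Category one", "Category two", "Category three", "Other"),
                      times = c(20, 32, 48, 13)))
Gender <- factor(rbinom(length(Disease), 1, 0.55),
                 levels = c(0, 1), labels = c("F", "M"))

# Cross-tabulate into a plain data.frame and attach a chi-squared p-value
contingency_df <- function(row_var, col_var) {
  tab <- table(row_var, col_var)
  p <- chisq.test(tab)$p.value
  out <- data.frame(Category = rownames(tab), as.data.frame.matrix(tab),
                    Total = as.vector(rowSums(tab)), check.names = FALSE)
  rownames(out) <- NULL
  # p-value shown once, in the first row
  out$`p-value` <- c(format.pval(p, digits = 3), rep("", nrow(out) - 1))
  out
}

result <- contingency_df(Disease, Gender)
result
```

Because the result is an ordinary data.frame, it could then be passed to knitr::kable(result, format = "latex", booktabs = TRUE, row.names = FALSE) and the first column widened with kableExtra::column_spec(1, width = "6cm"). With small expected counts, fisher.test() may be preferable to chisq.test().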
To avoid the &nbsp; characters being added to your output when using knitr, use results='asis' in the chunk options, like so:
{r results='asis'}
You can control the width of the column containing your variable names with the width option of the print function. Strictly speaking, you do not need to wrap the summary call in print, but doing so changes nothing else and lets you adjust the width setting.
So, for your example:
print(summary(tableby(Gender ~ Disease, data = data)), width = 20)
should make it more readable when it renders to pdf. You can change the width and it will wrap at the limit you set.
Using the code from your example and the above function call, the table looks like this when knit to pdf:
I need code that automatically extracts tables from PDF files in R.
So I searched online and found the tabulizer package,
and I use:
library(tabulizer)
# f2 is the PDF file name
extract_tables(f2, pages = 25, guess = TRUE, encoding = 'UTF-8', method = "stream")
I tried every method type, but the output is not tidy.
Some columns are mixed together and there are a lot of blank cells, as you can see in the image.
I could modify the data by hand, but the purpose is to automate this, so I need a general method. Also, the PDF files are not all organized the same way: some tables are very tidy, with every related line matched perfectly, but others are not.
As you can see in my output image, in column 4 several numbers are mixed into the same column, while in the other columns each number is matched one to one. What I mean is that I want the columns to come out tidy, like the table in the PDF, automatically.
Is there any package or some method to make extracted table tidy?
(Images: my code's result; the table in the PDF)
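There is probably no fully general fix for badly structured PDFs, but part of the cleanup can be automated. As a hedged sketch, the hypothetical helper tidy_extracted (not part of tabulizer) trims whitespace in a character matrix, such as one element of the list returned by extract_tables(), and drops rows and columns that are entirely blank. It does not split cells where two numbers were merged; for that, tabulizer's area and columns arguments (used with guess = FALSE) let you fix the column boundaries by hand, which may be worth checking in ?extract_tables. The sample matrix is made up.

```r
# Trim cells and remove rows/columns that are completely empty.
tidy_extracted <- function(m) {
  m[] <- trimws(m)        # m[] keeps the matrix dimensions
  m[is.na(m)] <- ""
  keep_rows <- rowSums(m != "") > 0
  keep_cols <- colSums(m != "") > 0
  m[keep_rows, keep_cols, drop = FALSE]
}

# Made-up matrix imitating extract_tables() output with blank columns/rows
raw <- matrix(c("Year", "", " Sales ", "",
                "2019", "", "10",      "",
                "",     "", "",        ""),
              nrow = 3, byrow = TRUE)
res <- tidy_extracted(raw)
res
```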