First I have an input and I want to get an output like this
(I want to group the occurrences in the dataset between columns):
Second, I want to display this table in a good-looking way (something that looks like Word or Excel).
I can't use Word or Excel because I'm doing some calculations in R with this dataset (it contains numeric columns which aren't displayed here).
Calculating the output table
I don't really see how the output table is calculated. Shouldn't it be output_table["A", "C"] = "e"?
Displaying your data
There are a lot of ways to do that. You might consider using RMarkdown to create a report-style output.
The DT library is also a very handy tool for displaying tables. It works well with RMarkdown and can be embedded in HTML documents. If you are using RStudio, you can use the following code to display your data:
library(DT)
DT::datatable(iris)
Related
I am trying to get some epidemiological data stored in a pdf that is publicly available
link.
I am just looking at the data in page 9 (right table).
What I would like to achieve is to pass the data into a table, but since there are many headers, it's quite difficult. For example, the column SIDA is divided into two further columns (SEM and ACUM). Would it be possible to split the SIDA cell?
So far I have tried to extract the data using pdftools and tabulizer.
I'd like to make a contingency table between sex and disease. As I use R Markdown for PDF reports, I use kableExtra to customize the tables. kableExtra doesn't render tables well when they are not data.frames, so I get an ugly table with tableby.
With this data.frame here is what I got.
library(kableExtra)
library(arsenal)
set.seed(0)
Disease<-sample(c(rep("Name of the first category of the disease",20),
rep("Name of the Second category of the disease",32),
rep("Name of the third category of the disease",48),
rep("The category of those who do not belong to the first three categories",13)))
ID<-c(1:length(Disease))
Gender<-rbinom(length(Disease),1,0.55)
Gender<-factor(Gender,levels = c(0,1),labels = c("F","M"))
data<-data.frame(ID,Gender,Disease)
When I render the result of this analysis with R Markdown (PDF), I get this kind of table.
There are two problems. Firstly, kableExtra doesn't deal with the characters.
Secondly, I can't customize column widths when I use tableby with kableExtra. I would like to widen the column containing the variable names, because my real data has very long value labels. If I use kable from knitr instead, the characters are removed, but the table is not scaled down and part of it is not displayed. I think knitr has many limitations.
How can I deal with this problem? Or is there another function that could be used in R Markdown (PDF format) to make a nice contingency table with a p-value?
To avoid the characters being added to your output when using knitr, use results='asis' in the chunk options, like so:
{r results='asis'}
You can control the width of the column with your variable names in them by using the width option in the print function. You technically do not need to wrap the summary call in print, but it doesn't change anything if you do, and you are able to adjust the width setting.
So, for your example:
print(summary(tableby(Gender~Disease)), width = 20)
should make it more readable when it renders to pdf. You can change the width and it will wrap at the limit you set.
Using the code from your example and the above function call, the table looks like this when knit to pdf:
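Separately, since the question notes that kableExtra works best with a plain data.frame, here is a minimal sketch of building the same contingency table and p-value with base R only. It is not what tableby does internally; the names counts, ct and p_label are made up for illustration, and a chi-squared test is assumed as the test of choice.

```r
# Rebuild the example data from the question (base R only).
set.seed(0)
Disease <- sample(c(rep("Name of the first category of the disease", 20),
                    rep("Name of the Second category of the disease", 32),
                    rep("Name of the third category of the disease", 48),
                    rep("The category of those who do not belong to the first three categories", 13)))
Gender <- factor(rbinom(length(Disease), 1, 0.55),
                 levels = c(0, 1), labels = c("F", "M"))

# Cross-tabulate disease by gender and test for independence.
counts <- table(Disease, Gender)
test <- chisq.test(counts)

# Flatten the table into an ordinary data.frame, one row per disease category.
ct <- data.frame(
  Disease = rownames(counts),
  F = as.vector(counts[, "F"]),
  M = as.vector(counts[, "M"]),
  Total = as.vector(rowSums(counts)),
  stringsAsFactors = FALSE
)

# A caption-style label carrying the p-value (hypothetical formatting).
p_label <- sprintf("Chi-squared test p-value: %s",
                   format.pval(test$p.value, digits = 3))

print(ct)
print(p_label)
```

Because ct is an ordinary data.frame, it should be possible to pass it to knitr::kable() and then widen the first column with kableExtra::column_spec(), for example with width = "6cm"; the exact width is something you would have to tune for your own labels.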
I need code to automatically extract a PDF table in R.
So I searched the web and found the tabulizer package,
which I use like this:
extract_tables(f2,pages = 25,guess=TRUE,encoding = 'UTF-8',method="stream")#f2 is pdf file name
I tried every method type, but the result is not tidy.
Some columns are mixed together and there are a lot of blanks, as you can see in the image.
I could modify the data by hand, but the goal is to automate this, so a general method is needed. Not every PDF file is well organized: some tables are very tidy, with every related line matched perfectly, but others are not.
As you can see in my result image, in column 4 the numbers are mixed together in the same column. In the other columns the numbers line up one by one. What I mean is that I want the columns to be tidy, like the table in the PDF, automatically.
Is there any package or method to make the extracted table tidy?
my Code result
table in PDF
I have made a table in Rmarkdown comparing information about models I have created. Code below:
mlist <- list(fitdm3hyp,fitdm3.1,fitdm3.2,fitdm3full,fitdm3.5,fitdm3.5b,fitdm3bi,fitdm3bio)
tablea <- compareLavaan(mlist,fitmeas = c("chisq","df","rmsea.robust","cfi.robust","srmr"),digits=4,type="html",chidif=FALSE)
However, the 1st and 2nd columns (chisq and df) end up so close that you can't tell where each value ends and the other begins. It ends up looking like: 343.44160.00 rather than 343.44 | 160.00.
How can I format this to increase the space between columns, please?
You can use the View() function to get a better look at the dataset (View() comes with base R's utils package, not dplyr). Make sure you are using a wider window too. If you use the tidyverse and make it a data frame, the data should pad itself enough for better visuals.
I am trying to extract information from a portion of a table in R. Example table below...
This is just a simple example compared to what I am really dealing with. I am working with a very large table that has a very strange structure and changes with each page. When I read the whole table using "extract_tables" function, I get a very unstructured result back with multiple table elements being pushed into the same row/column. So I am attempting to read only a portion of the table. I am trying to locate the position of the table using the text in the first cell "Here", so I can plug this into the "area" parameter of the "extract_tables" function. I cannot use the "extract_areas" function because I do not want to extract the tables manually.
Can anyone help me with this?
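One possible direction, sketched under assumptions rather than tested against your PDF: pdftools::pdf_data() returns one data.frame per page with the position (x, y, width, height) and text of each word, so the anchor word can be looked up there and turned into an area vector of the form c(top, left, bottom, right). The helper area_from_anchor() below is hypothetical, the choice to extend the area from the anchor to the page's bottom-right corner is arbitrary, and it assumes both packages use the same coordinate system (points, origin at the top left), which you should check on a known table first. The words data.frame is made-up sample data standing in for real pdf_data() output.

```r
# Hypothetical helper: build an area vector c(top, left, bottom, right)
# starting at the first word whose text equals `anchor`.
# `words` is assumed to look like one page of pdftools::pdf_data() output.
area_from_anchor <- function(words, anchor, page_width, page_height) {
  hit <- words[words$text == anchor, , drop = FALSE]
  if (nrow(hit) == 0) stop("anchor text not found on this page")
  hit <- hit[1, ]  # keep the first match only
  c(top = hit$y, left = hit$x, bottom = page_height, right = page_width)
}

# Made-up word boxes standing in for pdf_data() output.
words <- data.frame(
  x = c(50, 120, 50),
  y = c(100, 100, 130),
  width = c(30, 35, 28),
  height = c(10, 10, 10),
  text = c("Here", "Value", "Next"),
  stringsAsFactors = FALSE
)

# 612 x 792 points is US Letter; use your own page size.
area <- area_from_anchor(words, "Here", page_width = 612, page_height = 792)
print(area)

# With a real file the pieces would be wired together roughly like this
# (not run here; f2 and the page number are placeholders from the question above):
# words <- pdftools::pdf_data(f2)[[25]]
# area  <- area_from_anchor(words, "Here", page_width = 612, page_height = 792)
# tabulizer::extract_tables(f2, pages = 25, area = list(area), guess = FALSE)
```

If the table does not run to the edge of the page, a second anchor word (for example the last cell) could be looked up the same way to set bottom and right.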