Overview
I'm not new to R but am very new to machine learning.
For work I collect data by writing on a datasheet printed on waterproof paper which I then have to transcribe to the database manually. This takes a long time at the end of a long day and is a process prone to mistakes.
The entire datasheet is shown below
What I would like to do is simply take a photo of the sheet and have keras read it and input the results into a database
And the section of the datasheet that I am interested in getting Keras to read is shown here
Each row of the datasheet represents which species of coral was found and each column represents which transect it was found on, i.e., 7 Acropora were found on T1.
Each of these cells is given a unique entry in the database, in a format similar to this, which shows how the Acropora row is recorded.
For each datasheet that we have entered in the past (probably somewhere between 1000 and 2500) there are corresponding database entries, which can be exported to csv and linked to each datasheet.
Ultimately, what I would like to do is simply take a photo of the sheet and have keras read the part I'm interested in (shown in the second image) and input the results into a CSV in a similar format to that shown in the third image.
The questions
What I've been thinking about is getting it to identify the borders of the parts of the datasheet I'm interested in (shown in the second image) and extract them. I could then define coordinates for each cell, e.g. Acropora T1 (as shown in the image below), identify the number counted in that cell, and export it to a database.
Does this process sound possible? If so, would anyone know of any examples I could look up, or even what this process is called so I can look it up?
Otherwise I was thinking about scanning each sheet as a whole (as shown in the first image) and simply training from that; however, I feel that would be more prone to errors.
I really hope this makes sense, and I would very much appreciate any help and/or suggestions, either specifically about the questions I asked or about my project in general.
This uses OpenCV and Python.
According to the chapter on the 'Hough Line Transform', you could detect lines; the first step is edge detection, like this.
import cv2
import numpy as np

# Read the scanned sheet in greyscale and detect edges with Canny.
img = cv2.imread('D:/Books/lines1.jpg', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150, apertureSize=3)
cv2.imwrite('D:/Books/edges.jpg', edges)
But based on my (admittedly simple) research, I think counting is possible using code like this.
More knowledge of OpenCV is required at this stage. I think this just dilates the edges so that the borders of the lines become more pronounced.
img = cv2.imread('D:/Books/lines1.jpg', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150, apertureSize=3)
cv2.imwrite('D:/Books/edges.jpg', edges)

# Dilate the edges so the grid lines become thicker, then count them with HoughLines.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (4, 4))
dilated_Edges = cv2.dilate(edges, kernel, iterations=1)
cv2.imwrite("D:/Books/dilated_Edges.jpg", dilated_Edges)
lines = cv2.HoughLines(image=dilated_Edges, rho=1, theta=np.pi/180, threshold=100)
print(len(lines))
This prints 8 for me which isn't correct.
I pursued this further, and the code below is based on help from the OpenCV forum (Suleyman TURKMEN).
The images I tested with are these; it prints the correct count.
import cv2

# Binarise the image with Otsu thresholding, then use directional morphology
# to isolate the horizontal and vertical grid lines and count their contours.
img = cv2.imread('D:/Books/lines1.jpg', cv2.IMREAD_GRAYSCALE)
ret, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow("bw", bw)

# A wide, flat kernel: erosion keeps only the horizontal lines.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 2))
eroded_Edges = cv2.erode(bw, kernel, iterations=3)
dilated_Edges = cv2.dilate(eroded_Edges, kernel, iterations=4)
# findContours returns (image, contours, hierarchy) in OpenCV 3.x but
# (contours, hierarchy) in 4.x; taking [-2] works for both versions.
contours = cv2.findContours(dilated_Edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
print(len(contours), "horizontal lines")
cv2.imshow("horizontal lines", eroded_Edges)

# A tall, narrow kernel: erosion keeps only the vertical lines.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 5))
eroded_Edges = cv2.erode(bw, kernel, iterations=3)
contours = cv2.findContours(eroded_Edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
print(len(contours), "vertical lines")
cv2.imshow("vertical lines", eroded_Edges)
cv2.waitKey()
I am new to processing RNA seq data and am now practicing reproducing a published figure from an RNA seq paper. This is the paper, and Fig 2A is what I'm trying to achieve.
In brief, I downloaded the data with recount3 and subset the samples into the groups that I want (control vs condition 1, control vs condition 2, etc.). Then I ran the following code:
dds_4uM_30min <- DESeqDataSetFromMatrix(countData = ha_4uM_30min_data,
colData = ha_4uM_30min_meta,
design = ~ type)
dds2_4uM_30min <- DESeq(dds_4uM_30min)
res_4uM_30min <- results(dds2_4uM_30min, tidy=F)
(type is the column that I made to contain the information of whether it's control or condition 1)
This is the figure I get, which confuses me since it is nowhere near the original figure.
I thought that they might have done additional processing of the data, but I have no idea what the common or reasonable ways to do this are.
Furthermore, there seem to be data points that form lines (as can be seen in the figure above), which is not seen in the original figure. I am wondering what causes this kind of distribution and how to adjust for it or get rid of it.
Thanks in advance for any opinion or suggestion.
I have been trying to use the function lfcShrink but the figure still has this weird line.
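For reference, this is roughly how I have been calling it (just a sketch; the coefficient name below is a placeholder, and the real one comes from resultsNames()):
library(DESeq2)

# Placeholder coefficient name -- check resultsNames(dds2_4uM_30min) for the actual one.
# type = "apeglm" requires the apeglm package to be installed.
resultsNames(dds2_4uM_30min)
res_shrunk_4uM_30min <- lfcShrink(dds2_4uM_30min,
                                  coef = "type_condition1_vs_control",
                                  type = "apeglm")
plotMA(res_shrunk_4uM_30min, ylim = c(-5, 5))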
Any suggestions on how to further process RNA seq data?
Is there a way to include open-ended/free-form questions that are ungraded or skipped by r-exams?
Use case: we want to have an exam with mostly multiple choice questions using the package and its grading capability, but also have 5-10 open ended questions that are printed in the same exam. Ideally, r-exams would provide the grade for the first MCQ section, and we could manually add the grade of the open-ended questions.
I forked the package and made some small changes that allow one to control how many questions are printed on the first page and to remove the string-question pages.
The new parameters are number_of_closed_questions and include_string_pages. It is far away from being ideal, but works for me.
As an example, let us have 6 multiple-choice/single-choice questions and one essay question (essayreg):
# install devtools if you do not have it!
# install the fork
devtools::install_github("johannes-titz/exams")
library("exams")
myexam <- list(
"tstat2.Rnw",
"ttest.Rnw",
"relfreq.Rnw",
"anova.Rnw",
c("boxplots.Rnw", "scatterplot.Rnw"),
"cholesky.Rnw",
"essayreg.Rnw"
)
set.seed(403)
ex1 <- exams2nops(myexam, n = 2,
dir = "nops_pdf", name = "demo", date = "2015-07-29",
number_of_closed_questions = 6, include_string_pages = FALSE)
This will produce only 6 questions on the front page (instead of 7) and will also exclude the string-question pages.
If you want normal behavior, just exclude the new parameters. Obviously, one will have to set the number of closed questions manually, so one should be really careful.
I guess one could automatically detect how many string questions are loaded and from this determine the number of open-ended/closed-ended questions, but I currently do not have the time to write this and the presented solution is usable for my case.
I am not 100% sure that the scans will work this way, but I assume there should not be any bigger problems as I did not really change much. Maybe Achim Zeileis could comment on that? See my commit: https://github.com/johannes-titz/exams/commit/def044e7e171ea032df3553acec0ea0590ae7f5e
There is built-in support for up to three open-ended "string" questions that are printed on a separate sheet that has to be marked by hand. The resulting sheet can then be scanned and evaluated along with the main sheet using nops_scan() and nops_eval(). It's on the wish list for the package to extend that number but it hasn't been implemented yet.
Another "trick" you could do is to use the pages= argument of exams2nops() to include a separate PDF sheet with the extra questions. But this would have to be handled completely separately "by hand" afterwards.
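A rough sketch of that second approach (the exercise files are the ones from the example above; the extra PDF file name is just a placeholder):
library("exams")

# Up to three open-ended "string" questions (like essayreg) are supported
# directly and marked by hand; pages= appends a separate PDF with additional
# questions that would then have to be graded completely by hand.
set.seed(403)
exams2nops(c("tstat2.Rnw", "ttest.Rnw", "relfreq.Rnw", "essayreg.Rnw"),
           n = 2, dir = "nops_pdf", name = "demo",
           pages = "extra_open_questions.pdf")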
I have a challenging file-reading task.
I have a .txt file from a typical old accounting department (with headers, titles, pages and the useful tabulated quantitative and qualitative information). It looks like this:
From this file I am trying to do two tasks (with read.table and scan):
1) extract the information which is tabulated between "|" characters, which is the accounting information (every attempt ended in unwieldy data frames or character vectors)
2) include as a variable each subtitle which begins with "Customers" in the text file: as you can see, the customer info is a title, then comes the accounting info (payables), then again another customer and the accounting info, and so on. So it is not a column, but a row (?)
I've been trying read.table (with several sep and quote parameters) and scan, and then tried to work with the resulting character vectors.
Thanks!!
I've been there before so I kind of know what you're going through.
I've got two pieces of news for you, one bad, one good. The bad news is that I have read in these types of files in SAS tons of times but never in R; however, the good news is I can give you some tips so you can work it out in R.
So the strategy is as follows (a rough R sketch is given after the steps):
1) You're going to read the file into a dataframe that contains only a single column. This column is character and will hold a whole line of your input file, i.e. its length is 80 if the longest line in your file is 80 characters.
2) Now you have a data frame where every record equals a line in your input file. At this point you may want to check that your dataframe has the same number of records as there are lines in your file.
3) Now you can use grep to get rid of, or keep only, those lines that meet your criteria (i.e. subtitles which begin with "Customers").
You may find regular expressions really useful here.
4) Your dataframe now only has records that match the 'Customer' pattern and the table patterns (i.e. lines beginning with 'Country', or /\d{3} \d{8}/, or ' Total').
5) What you need now is to create a group variable that increments by 1 every time it finds 'Customer'. So group=1 will repeat the same value until it finds 'Customer 010343', where group becomes group=2. Or, even better, your group can be the customer id until a new id is found. You need to somehow retain the id until a new id is found.
From the last step you're pretty much done, as you will be able to identify customers and tables pretty easily. You may want to create a function that outputs your table strings in a tabular format.
Whether you process them in a single table or split the data frame into n data frames to process them individually is up to you.
In SAS there is the concept of a pointer (#) and retention (the retain statement), where each line matching a criterion can be processed differently from the others, so the output data set already contains columns and customer info in a tabular format.
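A minimal R sketch of the strategy above; the file name and patterns are assumptions that you will need to adapt to your real layout:
# Step 1: one character column, one record per input line.
raw <- readLines("accounting.txt")          # file name is an assumption
df  <- data.frame(line = raw, stringsAsFactors = FALSE)

# Step 2: sanity check -- same number of records as lines in the file.
stopifnot(nrow(df) == length(raw))

# Steps 3/4: keep only the "Customers" subtitles and the "|"-delimited rows.
df <- df[grepl("^Customers", df$line) | grepl("\\|", df$line), , drop = FALSE]

# Step 5: group counter that increments at every "Customers" line, then
# carry that header down to the rows below it (assumes a header comes first).
is_header   <- grepl("^Customers", df$line)
df$group    <- cumsum(is_header)
df$customer <- df$line[is_header][df$group]

# Finally split the delimited rows into fields.
tables <- df[!is_header, ]
fields <- strsplit(tables$line, "\\|")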
Well hope this helps you.
Is there an easy way to automate the conversion of an R dataframe to a pretty Word table in APA format for publishing manuscripts? I'm currently doing this by saving the table as a csv, opening that in Excel, copying the Excel table to Word, and formatting it there, but I'm hoping there is a way to automate the formatting in R, so that when I convert it to Word, it is already in APA format, because Word sucks at automation.
Basically, I want to continue writing the manuscript itself in Word, while doing my analyses in R. Then I want to gather all the results in R into a table (with manually modifiable formatting) via a script and convert it to whatever format I can then simply copy-paste into Word (so that the formatting actually holds). When I need to modify the table, I would make the changes in R and just run the script again, without having to make any changes in Word.
I don't want to learn LaTeX, because everyone in my field uses Word with features like track changes, and I use Zotero add-in for citations, so it's simpler to just keep the writing separate from the analyses. Also, I am a psychologist, not a coder, so learning a lot of new technologies just for this is probably not worth the effort for me. Typically with new technologies come new technical problems, and I am aiming to make my workflow quicker, but not at the cost of unpredictability (which may make it slower exactly at the moment when I cannot afford it).
I found an R+knitr+rmarkdown+pander+pandoc solution "with as little overhead as possible", but it still seems quite heavy because I don't know any of those technologies apart from R. And I'm not eager to start learning all that, as it seems to be aimed at doing the writing and everything in R to the very end, while I want to separate my writing from my code - I never need code in my writing, only the result tables. In addition, based on the examples, it seems to fetch the values directly from R code (e.g., from summary() to create a descriptive table), while I need to be able to tinker with my table manually before converting it, for instance, writing the title and notes (like a specific note for one cell, explained at the bottom). I also found R2wd, but it seems to be an older attempt at the same "whole workflow in R" problem as the solution above. SWord does not seem to be working anymore.
Any suggestions?
(Just to let you know, I am the author of the packages I recommend you...)
You can use the ReporteRs package to output your table to Word. See a tutorial here (not mine):
http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-package
FlexTable objects let you format and arrange tables easily with some standard R code. For example, to set the 2nd column in bold, the code looks like:
myFlexTable[, 2] = textBold()
There are (old) examples here:
http://davidgohel.github.io/ReporteRs/flextable_examples.html
These objects can be added to a Word report using the function addFlexTable. The word report can be generated with function writeDoc.
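A minimal sketch of that workflow (the data frame and the output file name here are only examples):
library(ReporteRs)

myFlexTable <- FlexTable(data = head(iris))   # any data frame
myFlexTable[, 2] = textBold()                 # e.g. set the 2nd column in bold

doc <- docx()                                 # create an empty Word document
doc <- addFlexTable(doc, myFlexTable)         # add the formatted table
writeDoc(doc, file = "my_table.docx")         # write the .docx file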
If you are working in RStudio, you can print the object and it will be rendered in the html viewer so you can export it in Word when you are satisfied with its content.
You can even add real Word footnotes (see the link below)
http://davidgohel.github.io/ReporteRs/pot_objects.html#pot_footnotes
If you need more tabular output, I also recommend the rtable package, which handles xtable objects (and other things I have had to develop to satisfy my colleagues or customers) - a quick demo can be seen here:
http://davidgohel.github.io/tabular/
Hope it helps...
I have had the same need and have ended up using the package htmlTable, which is quite 'cost-efficient'. It creates an HTML table (in RStudio it appears in the "Viewer" window in the bottom right), which I simply select with the mouse and copy-paste into Word. (Start selecting from the bottom of the table and drag the mouse upwards; that way you are sure to include the start of the HTML code.) Word handles these tables quite nicely. The syntax is quite simple, involving just the function htmlTable(), but it can still make somewhat more complex tables, such as grouped rows and primary and secondary column headers (i.e. column headers spanning more than one column). Check out the examples in the vignette.
One note of caution: htmlTable will not work well with factor variables, i.e., they will come out as integer numbers (according to the factor levels). So read the data using stringsAsFactors = FALSE or convert them using as.character().
Including trailing zeroes can be done using the txtRound function. Example:
mini_table <- data.frame(Name="A", x=runif(20), stringsAsFactors = FALSE)
txt <- txtRound(mini_table, 2)
It is not completely straightforward to assign formatting such as bold or italics, but it can be done by wrapping the table contents in HTML code. If you, for instance, want to make an entire column bold, it can be done like this (please note the use of single and double quotation marks inside paste0):
library(plyr)
mini_table <- data.frame(Name="A", x=runif(20), stringsAsFactors = FALSE)
txt <- txtRound(mini_table, 2)
txt$x <- aaply(txt$x, 1, function(x)
paste0("<span style='font-weight:bold'>", x, "</span>")
)
htmlTable(txt)
Of course, that would be easier to do in Word. However, it is more interesting to add formatting to numbers according to some criterion. For instance, if we want to emphasize all values of x that are less than 0.2 by applying a bold font, we can modify the code above as follows:
library(plyr)
mini_table <- data.frame(Name="A", x=runif(20), stringsAsFactors = FALSE)
txt <- txtRound(mini_table, 2)
txt$x <- aaply(txt$x, 1, function(x)
if (as.numeric(x)<0.2) {
paste0("<span style='font-weight:bold'>", x, "</span>")
} else {
paste0("<span>", x, "</span>")
})
htmlTable(txt)
If you want even fancier emphasis, you can for instance replace the bold font by red background color by using span style='background-color: red' in the code above. All these changes carry over to Word, at least on my computer (Windows 7).
The short answer is "not really." I've never had much luck getting well-formatted tables into MS Word. The best approach I can offer requires using R Markdown to render your tables into an HTML file. You can copy and paste your results from the HTML file into MS Word, but I make no guarantees about how well the formatting will carry over.
To format your tables, you can try something like the xtable package, or the pixiedust package. But again, no guarantees that the formatting will transfer.
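As a rough sketch of the xtable route (the data frame and file name are only examples):
library(xtable)

# Render a data frame as an HTML table, then open the file in a browser
# and copy the table into Word -- formatting may not carry over cleanly.
tab <- xtable(head(mtcars), caption = "Table 1. Example descriptives")
print(tab, type = "html", file = "table1.html", caption.placement = "top")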
Is R code available for creating 3D plots of our galaxy or universe? I have searched a few times over the last six months and not found any.
This news article includes some very nice 3D plots that look like they may have been created with R:
http://www.dailymail.co.uk/sciencetech/article-2341750/The-beautiful-3D-map-space-plots-nearest-galaxies--reminds-tiny-Earth-is.html
A short video can be viewed at the above link, but I do not see a link to R code there. The video was created by people at the University of Lyon and the University of Hawaii. Here is a link to a longer video related to the same project:
http://irfu.cea.fr/cosmography
I just thought it would be neat to explore space from within a 3D plot in R but I cannot find any relevant code.
Locations for objects likely are found within the Redshift Catalog, and perhaps can be downloaded, but I have no idea whether I would need to adjust those location data in various ways if I tried to create my own 3D map. Here is one possible source of data if I were to try creating my own map:
https://www.cfa.harvard.edu/~dfabricant/huchra/zcat/
I have read something to the effect that asking for relevant packages does not make for an appropriate post. Sorry if this post is not appropriate.
The problem is not the modelling but the data. Here's a database that is available at http://www.stellar-database.com/isdb.mdb - but you'll probably need to dig around for what you want specifically.
Here's a simple SQL query to pull out some of the star data:
SELECT Positions.OwnerID, Positions.RA_hr, Positions.RA_min, Positions.RA_sec, Positions.Dec_deg, Positions.Dec_arcmin, Positions.Dec_arcsec, Positions.Distance, Spectra.SpectralClass, Spectra.LuminosityClass, qryProps.Name
FROM (Positions LEFT JOIN Spectra ON Positions.OwnerID = Spectra.OwnerID) LEFT JOIN qryProps ON Positions.OwnerID = qryProps.OwnerID
WHERE (((Positions.Distance)>=0));
Then save it as a csv and import it:
stars<-read.csv("qNamedStars.txt",header=T)
head(stars)
Write a function to translate the coords to X, Y, Z
# Convert right ascension (hours, minutes, seconds) and declination
# (degrees, arcminutes, arcseconds) plus distance into Cartesian X, Y, Z.
celCoords <- function(Rh, Rm, Rs, Da, Dm, Ds, Distance){
  R.angle <- ((Rh/24) + (Rm/(24*60)) + (Rs/(24*60*60))) * 2 * pi     # RA in radians
  D.angle <- ((Da/90) + (Dm/(90*60)) + (Ds/(90*60*60))) * 0.5 * pi   # declination in radians
  Z       <- sin(D.angle) * Distance    # component along the celestial pole
  hyp.XY  <- cos(D.angle) * Distance    # projection onto the equatorial plane
  X       <- sin(R.angle) * hyp.XY
  Y       <- cos(R.angle) * hyp.XY
  return(c(X, Y, Z))
}
starcoords <- cbind(stars,
                    matrix(celCoords(stars$RA_hr,
                                     stars$RA_min,
                                     stars$RA_sec,
                                     stars$Dec_deg,
                                     stars$Dec_arcmin,
                                     stars$Dec_arcsec,
                                     stars$Distance),
                           ncol = 3))  # c(X, Y, Z) stacks the vectors, so fill column-wise (not byrow)
colnames(starcoords)<-c(colnames(stars),"X","Y","Z")
Filter the data.frame
sf<-starcoords[abs(starcoords$Z)<2000 & abs(starcoords$X)<1000,] # apply a filter
Then plot using rgl
require(rgl)
plot3d(sf$X,sf$Y,sf$Z,col=rainbow(nrow(sf)),size=10)
You can obviously add more data for luminosity, size, type, etc. if it's available, and then use those parameters to set size, color, etc.
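For example, a rough sketch that colours the points by spectral class instead of by row order (this assumes the SpectralClass column from the query above is present):
# Coarse spectral class (O, B, A, F, G, K, M, ...) taken from the first letter;
# drop rows where the LEFT JOIN returned no spectral class.
sf2  <- sf[!is.na(sf$SpectralClass), ]
spec <- factor(substr(as.character(sf2$SpectralClass), 1, 1))
plot3d(sf2$X, sf2$Y, sf2$Z,
       col = rainbow(nlevels(spec))[as.integer(spec)],
       size = 5)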