How to search inside PDF files [closed] - asp.net

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I have to search inside PDF files for an upcoming ASP.NET MVC project in a shared hosting environment. What is the best solution? Is there any third-party product?

Lucene is a popular choice. See Lucene FAQ on searching pdfs.

Lucene is a good choice - for ASP.NET, using Lucene.NET is the best bet. Lucene is an indexing engine only, meaning you'll have to provide it with the text from the PDF. If you have access to the web server, you can install an IFilter for this (I recommend Foxit's PDF filter). Otherwise you'll have to get hold of some code to use on your website to parse and filter the PDF.

The Docotic.Pdf library can help with such a task.
The library can be used to extract text (with or without formatting). The extracted text can be used to create an index. You can even use the String.IndexOf method if you just want to know whether a PDF file contains a given text.
The library can also retrieve a collection of words with their bounding rectangles from PDFs. This might be useful if you need to know the exact position of a piece of text in a file.
Disclaimer: I work for the vendor of the library.
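As a rough illustration of the two approaches mentioned in the answers above (a plain substring check versus building an index), here is a minimal Python sketch. It assumes the text has already been extracted from each PDF by some other tool; the file names and sample strings are made up, and the toy inverted index only hints at what an engine such as Lucene does internally.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical extracted text, keyed by (made-up) PDF file name.
documents = {
    "invoice.pdf": "Invoice number 1042 for fire safety inspection",
    "manual.pdf": "User manual for the safety valve",
}
index = build_index(documents)

# Substring check, the Python counterpart of String.IndexOf >= 0.
contains_phrase = "fire safety" in documents["invoice.pdf"].lower()
```

The substring check is enough for a handful of files; the index pays off once the same set of documents is searched repeatedly.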

Related

Is there an R package which can modify an existing PDF file? [closed]

Closed 1 year ago.
I need a package that can extract all of the information contained in a PDF file so that I can later produce a new PDF file with some of that information changed.
pdftools can give me a lot of the information, but I was not able to generate a new PDF from the information I extracted and then modified.
My father has to fill in forms created by the local Fire Department as part of his job. The forms are often similar, but they are laborious to modify using just Adobe Acrobat or Microsoft Word. It would be really handy to have a package with which I could obtain the data of the PDF, change some of the strings and numbers, and then generate a new PDF with the modified information but with the same fonts, tables, titles, line spacing, and everything else related to the metadata of the PDF file.
I looked at some packages that can manipulate PDF files, like pdftools and staplr, but I wasn't able to generate a new PDF file with the altered information.

Any Open source Libraries to Convert STEP files to glTF file format? [closed]

Closed 3 years ago.
I'm trying to convert a STEP file to glTF format in .NET Core. Is there any solution for reading STEP files and converting them to glTF format?
Well, I have not heard of any open-source STEP reader written in C#, but if wrapping a C++ library is also an option, then it can be done with the help of Open CASCADE Technology (OCCT, under LGPL), which provides a STEP reader and a glTF writer (within the current development branch).
In this case, you will have to write a little bit of code in C++/CLI to:
1. Read the STEP file into an XCAF document using the STEPCAFControl_Reader tool.
2. Compute the triangulation using BRepMesh_IncrementalMesh for all shapes within the document with the necessary quality.
3. Convert the XCAF document (with computed triangulation) into a glTF 2.0 file using the RWGltf_CafWriter tool.
4. Expose some function or class so that it is accessible from the C# level (using C++/CLI or a P/Invoke approach) and manage the related DLL distribution tasks.
OCCT comes with a C# sample that uses the C++/CLI wrapping approach, but if conversion is all you need, then the sample might look too complex.

Dictionary text file [closed]

Closed 7 years ago.
This post was edited and submitted for review 6 months ago and failed to reopen the post:
Original close reason(s) were not resolved
I am writing a program that needs a list of English words as a source file in order to work. I realise that such source files are available for students writing games such as Hangman or crossword solvers, but I am having trouble locating one and wonder if anyone knows how I can obtain one without slowly scraping websites and building up a dictionary manually.
What about /usr/share/dict/words on any Unix system? How many words are we talking about? Like OED-Unabridged?
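A minimal Python sketch of loading such a word list, assuming a plain-text file with one word per line (the layout of /usr/share/dict/words on typical Unix systems). The function name and the filtering rules (ASCII letters only, lowercased) are illustrative choices, not part of any standard.

```python
def load_words(path="/usr/share/dict/words"):
    """Read a one-word-per-line file into a set of lowercase words.

    Entries containing apostrophes, digits, or non-ASCII letters are
    skipped, which suits games such as Hangman; adjust as needed.
    """
    words = set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            word = line.strip()
            if word.isalpha() and word.isascii():
                words.add(word.lower())
    return words
```

Calling `load_words()` with no argument reads the system list where it exists; pass another path to use any of the downloadable lists mentioned in the other answers.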
For an English dictionary .txt file, you can use Custom Dictionary.
You can also generate a list with aspell or wordlist, using your own settings.
Also you can take a look at http://wordlist.sourceforge.net/
Only english words: http://www.math.sjsu.edu/~foster/dictionary.txt
Also take a look at:
http://wordlist.sourceforge.net/
http://www.math.sjsu.edu/~foster/dictionary.txt
350,000 words
Very late, but might be useful for others.
There's also WordNet. Its data file formats are well documented.
I used it for building an embeddable dictionary library for iOS developers (www.lexicontext.com) and also in one of my apps.
#Future-searchers: you can use aspell to do the dictionary checks, it has bindings in ruby and python. It would make your job much simpler.

Wikipedia article names (no content) [closed]

Closed 7 years ago.
I am working on a project for which I need to know all the Wikipedia article names (I don't need the content). Is there a place where I can download this data?
Check out this page here on Wikipedia - there is an option to just download an archive with the names of the articles. Here's the actual path to the download page:
All Titles (gzipped) - 32+ Mb at the time of posting.
Edit:
You may notice non-English titles appearing in the list (and some profanity - be advised) contained in enwiki-latest-all-titles-in-ns0.gz. This is because by default most people create content on the main English wiki (language code en). If you were to investigate other language dumps you will observe there are different sets of articles.
Reading on the main download page, there are references to being able to use the Wikipedia API to perform some types of querying on Wikipedia, but I'm not sure this will resolve your problem (taxonomy of the pages doesn't seem to provide a simple way to differentiate "English" content vs "content on English wiki").
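For reading the titles archive named above, a minimal Python sketch follows. It assumes enwiki-latest-all-titles-in-ns0.gz has already been downloaded, that it holds one title per line with underscores in place of spaces, and that the first line may be a `page_title` header; check these assumptions against the file you actually get. The function name is made up for illustration.

```python
import gzip


def iter_titles(path):
    """Yield article titles from a gzipped one-title-per-line dump."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            title = line.rstrip("\n")
            # Assumption: the dump may start with a "page_title" header row.
            if line_number == 0 and title == "page_title":
                continue
            if not title:
                continue
            # Assumption: spaces are stored as underscores in the dump.
            yield title.replace("_", " ")
```

Because the function is a generator, the file is streamed line by line rather than loaded into memory at once.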
I'm not aware of any central list of articles, but if you just need a large number of them rather than a complete list (bearing in mind that any complete list will always be out of date anyway) then you could probably put something together with wget to recursively follow links within wikipedia from the main page and store the URLs you get.

Can anybody recommend a Bar Code web server control? [closed]

Closed 6 years ago.
Can anybody recommend a Bar Code web server control for the Code 39 and Code 128 formats?
UPDATE: I posted this after being given a choice of 3 controls we'd never heard of. I was hoping to get a recommendation from somebody who is using something that is popular, stable and a commercial product. It looks like we will just go with one of the choices our manager sent. If you are reading this after the fact, and have a good recommendation, please add it for others needing one in the future. thx
There is a series of articles on CodeProject that do just that:
Drawing Barcodes in Windows Part 1 - Code 39
Drawing Barcodes in Windows Part 5 - Code 128
Another one:
Barcode Image Generation Library
Another way, using barcode Fonts, but simple to use in ASP.Net
Barcodes in ASP.NET applications
Actually, implementing your own barcode drawing routines is not too hard if you stick with simple 1D barcodes.
The best book ever on the subject is The Bar Code Book. It's one of these absolute reference books that you just want to keep and read out of pure nerdy pleasure.
There is also an open source ASP.NET barcode generation framework on www.codeplex.com:
http://www.codeplex.com/BarcodeRender
I did have a problem with the Interleaved 2 of 5 symbology, but I believe it was added rather recently. Perhaps the other symbologies are more stable.
Guys... don't go too far. There are Windows fonts (TrueType fonts / TTF) that can easily be used to draw bar codes. The Graphics object is your friend.
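If you go the font route for Code 39, the text usually has to be prepared before it is rendered: Code 39 uses `*` as its start and stop character, and an optional modulo 43 check character can be appended. The Python sketch below shows that preparation only. The function names are made up, and whether a particular font expects the asterisks or handles spaces in a special way is an assumption to verify against that font's documentation.

```python
# The 43 data characters of Code 39, in check-value order (0 to 42).
CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def code39_check_char(data):
    """Return the optional modulo 43 check character for Code 39 data."""
    total = sum(CODE39_CHARS.index(ch) for ch in data)
    return CODE39_CHARS[total % 43]


def code39_font_text(data, with_check=False):
    """Return the string to render in a Code 39 barcode font.

    The data is upper-cased, validated against the Code 39 character
    set, optionally extended with the check character, and wrapped in
    the '*' start/stop characters.
    """
    data = data.upper()
    for ch in data:
        if ch not in CODE39_CHARS:
            raise ValueError(f"character {ch!r} cannot be encoded in Code 39")
    if with_check:
        data += code39_check_char(data)
    return "*" + data + "*"
```

The resulting string is what you would draw with the barcode font; Code 128 needs a more involved encoding and is not covered by this sketch.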
You definitely should check out these two options:
1) www.idautomation.com
2) barbeque open source library (Java) to generate barcodes.
Both are excellent!
