I've got a bunch of audio files (let's say ogg or mp3), with metadata.
I wish to read their metadata into R so to create a data.frame with:
file name
file location
file artist
file album
etc
Any way you know of for doing that ?
You take an existing mp3 or ogg client, look at what library it uses and then write a binding for said library to R, using the existing client as guide for that side -- and something like Rcpp as a guide on the other side to show you how to connect C/C++ libraries to R.
No magic bullet.
A cheaper and less reliable way is to use a cmdline tool that does what you want and write little helper functions that use system() to run that tool over the file, re-reading the output in R. Not pretty, not reliable, but possibly less challenging.
Possible, yes, easy, no.
You "could" use a combination of readChar and/or readBin on the file and parse out the contents. This would be highly dependent, though, on parsing the frame tags from the raw bytes of the ID3v2 tag (and mind you it would change if it was a version 1 tag). If would certainly be a lot of work to implement a straight R solution. Take this Python code for example, it's very clean straight python code but a lot of branching and parsing.
You can use exiftool with system command available in R. Optionally, you can create regexp to handle the fields you need... If I were you, I'd stick with Dirk's advice (as usual) =)!
Out here in 2021, I wanted to do this so I did the following...
Create a new playlist while in 'songs' view.
Select all songs and drag to the new playlist. Highlight that playlist
File> Library>Export Playlist. My default file was to save as .txt, if not, designate.
Open Excel to save as csv or read.delim() in r as the txt file is tab-separated
import to R
Related
I am developing some tool by make use of python 3.4.2 shell. But i stopped while dealing with file pointers in python. Basically what i am doing is merging the file with another file. I came across scenario, wherein i will search for a string in the file, the other file content i need to merge should start from that string. So please could anyone suggest me on this??
How to move reversely in a file using python file commands?
i tried using seek() and tell() but didn't workout.
can i use seek(-5,1) to go back 5 bytes back and write from that position?All answers are appreciated. Thanks
I'm processing data where the source is a manual update in Excel format which is provided monthly.
One of the quirk of the data is that some cancelled records are indicated either by the person keying in the data highlighting the cells in red, or changing the fonts to be strikeout. Unfortunately I have no control on the data entry source, so I have had to regularly manually search the file for red cells or strike out fonts, and manually clean them up (either delete, or add a column with status as Cancelled, depending on the usage).
Does anyone have suggestion on the best data cleaning practice for this? Is there an automated approach for this, or do I simply have to be resigned to documenting the steps, and executing them regularly?
For info my preferred tool is R, so if there is a way to clean it from within R, that would be best. I'm open to other approaches.
There are a few R packages for working with Excel, but filtering data based on formatting is going to involve using rcom and excel's COM interface. I'm not terribly familiar with either.
The route I would go is to write a VBA macro which would filter the data, wrap that macro in a VBS script, and call that script from the command line (or via R's system or shell functions)
The reason I would go that route is that both VBA and VBS are very easy to pick up if you have any familiarity at all with programming. COM on the other hand isn't something that people gain a level of comfort with very quickly.
VBA is what will give you access to the excel formatting. (Visual Basic for Applications). VBS is what you will need to automate the macro via the command line rather than from within Excel (Visual Basic Scripting Edition).
You can do this in Excel. Start by recording a macro, then put a colour based filter on the column you want (to select only the red cells), delete the rows and then stop recording of the macro.
This should give you a macro. You might need to make some small changes, you can google specific commands for more details.
I need to parse a large trace file (up to 200-300 MB) in a Flex application. I started using JSON instead of XML hoping to avoid these problems, but it did not help much. When the file is bigger than 50MB, JSON decoder can't handle it (I am using the as3corelib).
I have doing some research and I found some options:
Try to split the file: I would really like to avoid this; I don't want to change the current format of the trace files and, in addition, it would be very uncomfortable to handle.
Use a database: I was thinking of writing the trace into a SQLite database and then reading from there, but that would force me to modify the program that creates the trace file.
From your experience, what do you think of these options? Are there better options?
The program that writes the trace file is in C++.
Using AMF will give you much smaller data sizes for transfer because it is a binary, not text format. That is the best option. But, you'll need some middleware to translate the C++ program's output into AMF data.
Check out James Ward's census application for more information about benchmarks when sharing data:
http://www.jamesward.com/census/
http://www.jamesward.com/2009/06/17/blazing-fast-data-transfer-in-flex/
Maybe you could parse the file into chunks, without splitting the file itself. That supposes some work on the as3 core lib Json parser, but it should be doable, I think.
I found this library which is a lot faster than the official one: https://github.com/mherkender/actionjson
I am using it now and works perfectly. It also has asynchronous decoder and encoder
Is there a pre-existing library to extract plain text form Open XML file formats (e.g. docx, pptx, and xlsx) files?
I require this to populate a lucene.net index.
I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something already available for the other file formats?
Before spending cash, it may be worth looking at the IFilter interface - these were/are designed to do exactly what you want.
http://msdn.microsoft.com/en-us/library/ms691105
http://www.codeproject.com/KB/cs/IFilter.aspx
(Some links at the bottom of the codeprject link).
MS provide IFilters for office file types.
http://www.microsoft.com/downloads/details.aspx?familyid=60c92a37-719c-4077-b5c6-cac34f4227cc&displaylang=en
I know that we use this technology to allow us to index PDFs using Lucene but I did not write the actual code and cannot be of much use I am afraid.
If your Google-fu is strong I am sure you can dig up more examples of using IFilters to do exactly what you want.
watch aspose.com, they have a good library to handle both ppt and pptx.
You can try Toxy, an open source text/data extraction framework for .NET. For now, it supports xls, xlsx, doc, docx. It will support pptx in version 1.5 very soon.
For detail, you can check here
The last time I produced a catalog I used a software called EasyCatalog that worked with Adobe InDesign to merge data from a spreadsheet with graphics. I wouldn’t say it was completely successful. I know of one other catalog building software called Catalog Builder by Computer Pundits. I'm just looking for any suggestions from someone who might have gone through this process on what software I should use.
InDesign can create a really beautiful output from XML. Depending on the catalog content's complexity, you can either have a straightforward mapping of the elements in your catalog to the paragraph and character styles of the IDD file, or you may need to preprocess the XML with XSLT.
For example, if your data source can output the content as XML, but it doesn't map easily to InDesign tables, XSLT can be used to make the XML more "IDD-friendly" before you import it.
IDML is another way to handle XML content; instead of importing the XML content manually or with a script into your catalog template, you generate the IDML directly from your XML. (IDML files are a package of XML files that describe the page/spreads, fonts, swatches, text, images, etc. of the InDesign file.) You're probably going to need XSLT consulting help if this is not a skill you already have.
Take a look at the InDesign documentation for XML for the version you use. IDML is for CS4 or CS5.
Have a look at xCS.press by a company in Belgium. XML markup that is parsed to InDesign. Great for product catalogs.
I wouldn't use Catalog Builder by Computer Pundits again. I've used it in the past (mostly their website builder) and it is completely outdated in my opinion. Their templates are not easily customized and it was pretty slow for me. As for their website builder,(in case you're wondering) it's all tables and very little css IDs or Classes throughout the html.
I haven't used InDesign, but it seems there are lot of scripting features.
Easiest thing that comes to mind is creating an XML Schema with IDML to
get data from Excel into an InDesign document.
XML schema basically is a template for XML documents, and they're called XML Maps in Microsoft Office.
Am not familiar with catalog tools, try superuser.com as well for 3rd party tools & tips.
John. have you come across additional solutions since posting this? I wonder if you have considered CatBase or EM Software solutions.
InDesign works wonderfully with XML and XSLT. You can export the data from Windows Excel only to XML, but only when you create an XML-compatible worksheet. Don't save the file as an XML spreadsheet, that file is useless in InDesign.
What I do is either create a schema file (xsd) for the data that you want to use and import that into Excel on Windows (Mac version doesn't support XML) Once the schema is imported you can create an XML worksheet based on this schema and then copy and paste the data from the non-XML worksheet into the XML sheet. Once the data is in the spreadsheet you can export to an XML file and import it into InDesign.
As mentioned above you can map XML tags to Paragraph and character styles and create dynamic layout directly in InDesign or by using an XSLT to structure the data before you import it.
MS Access allows you to export directly to XML. If you move your data to InDesign you can save the time needed to build the XML spreadsheet. Image references have to be built properly before you export to XML or build an XSLT that will do it on the fly as you import the data in to your layout.
The entire process is described in detail in the book A Designer's Guide to Adobe InDesign and XML.
If the data is in MS Access Woodwing has a product that allows you to interface and import data for a catalog. I have not used it personally but I know people who have. Also, another product called In-Data also interfaces with InDesign, but I have no experience with that either. I usually just use XML and XSLT myself.
I've used EasyCatalog very successfully for a number of years now, even for really large catalogs (35,000+ articles). In the meantime, I offer EC consulting and hands-on user training as well.
I'd need many more details of what went wrong with your specific catalog in order to be able to point your attention to a different solution that may better fit your needs.
I personally would not recommend Jim Maivalds solution because a) Excel and Unicode are not friends b) working with Excel and XML really is a pain c) the process is relatively complicated d) you need a lot of specialized skills regarding XML, XSLT programming and so on e) it's not bi-directional f) when updating you'll do the whole process again.
With EasyCatalog, you just import your data into a panel and place them from there into your document, from manually up to fully automatically. It's really easy, and it's bi-directional - so you can update your document from the database at any time and - if you need to - your database from your document. By the way, you can import your data directly from Excel into an EasyCatalog panel as well.
However, EasyCatalog might not be the best solution if graphics are included in you spreadsheet as well - but who would ever include the real graphics in a spreadsheet instead of the name (and maybe path) to the actual graphic files?