I have a problem saving an XML file from R.
Here is my code:
library(XML)
doc = newXMLDoc()
document = newXMLNode("Document", doc = doc)
set = newXMLNode("Settings", parent = document)
elements = newXMLNode("Elements", parent = set)
newXMLNode("Canvas", parent = elements, attrs = c(Id = "1"))
newXMLNode("Canvas", parent = elements, attrs = c(Id = "2"))
objcol = newXMLNode("ObjectCollection", parent = document)
timeSeries1 = newXMLNode("Timeseries", parent = objcol)
timeSeries2 = newXMLNode("Timeseries", parent = objcol)
saveXML(doc, file = "test.dtv", indent = TRUE,
        prefix = '<?xml version="1.0" encoding="utf-8" standalone="no"?>\n')
If I save doc without a prefix, the output is fine, but the XML declaration is missing from the output file. When I pass the prefix argument to saveXML, the output is badly formatted: there is a single '\n' after the prefix (because I wrote it in the prefix string), but the whole document ends up on one line. I have no idea how to fix this.
Thank you for your attention.
I'm also quite surprised that this doesn't work, but I found a workaround. I hope this is helpful.
cat(saveXML(doc,
            indent = TRUE,
            prefix = "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n"),
    file = "test.dtv")
I'm trying to parse a directory containing a collection of XML files from RSS feeds.
I have similar code for another directory that works fine, so I can't figure out the problem. I want to return the items so I can write them to a CSV file. The error I'm getting is:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0
Here is the site I've collected RSS feeds from: https://www.ba.no/service/rss
It worked fine for: https://www.nrk.no/toppsaker.rss and https://www.vg.no/rss/feed/?limit=10&format=rss&categories=&keywords=
Here is the function for this RSS:
import os
import xml.etree.ElementTree as ET
import csv
def baitem():
    basepath = "../data_copy/bergens_avisen"
    table = []
    for fname in os.listdir(basepath):
        if fname != "last_feed.xml":
            files = ET.parse(os.path.join(basepath, fname))
            root = files.getroot()
            items = root.find("channel").findall("item")
            #print(items)
            for item in items:
                date = item.find("pubDate").text
                title = item.find("title").text
                description = item.find("description").text
                link = item.find("link").text
                table.append((date, title, description, link))
    return table
I tested with print(items) and it returns all the objects.
Can it be how the XML files are written?
I asked a friend, who suggested testing with a try/except statement. That revealed a .DS_Store file in the directory. macOS Finder creates this hidden metadata file; it is not XML, so the parser fails on it at line 1, column 0. I'm providing the solution for anyone who runs into the same problem in the future.
def baitem():
    basepath = "../data_copy/bergens_avisen"
    table = []
    for fname in os.listdir(basepath):
        try:
            if fname != "last_feed.xml" and fname != ".DS_Store":
                files = ET.parse(os.path.join(basepath, fname))
                root = files.getroot()
                items = root.find("channel").findall("item")
                for item in items:
                    date = item.find("pubDate").text
                    title = item.find("title").text
                    description = item.find("description").text
                    link = item.find("link").text
                    table.append((date, title, description, link))
        except Exception as e:
            print(fname, e)
    return table
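A possible variation on the fix above, as a minimal sketch: instead of naming .DS_Store explicitly, only open files whose names end in .xml, and catch ET.ParseError rather than every exception. The function name parse_feed_dir, the skip parameter, and the demo feed content are made up for illustration; the directory layout is assumed to match the one in the question.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

FIELDS = ("pubDate", "title", "description", "link")


def parse_feed_dir(basepath, skip=("last_feed.xml",)):
    """Return (date, title, description, link) tuples from the .xml files in basepath."""
    table = []
    for fname in sorted(os.listdir(basepath)):
        # Ignore anything that is not an .xml file (.DS_Store, Thumbs.db, ...).
        if fname in skip or not fname.lower().endswith(".xml"):
            continue
        try:
            root = ET.parse(os.path.join(basepath, fname)).getroot()
        except ET.ParseError as err:
            # Report the malformed file and carry on with the rest.
            print(fname, err)
            continue
        for item in root.findall("./channel/item"):
            # findtext returns None when a child element is missing.
            table.append(tuple(item.findtext(tag) for tag in FIELDS))
    return table


# Demo on a throwaway directory with made-up content.
with tempfile.TemporaryDirectory() as demo_dir:
    feed = ("<rss><channel><item><pubDate>Mon, 01 Jan 2018</pubDate>"
            "<title>Hello</title><description>Text</description>"
            "<link>http://example.com/1</link></item></channel></rss>")
    with open(os.path.join(demo_dir, "feed1.xml"), "w", encoding="utf-8") as fh:
        fh.write(feed)
    with open(os.path.join(demo_dir, ".DS_Store"), "wb") as fh:
        fh.write(b"\x00\x00\x00\x01Bud1")
    rows = parse_feed_dir(demo_dir)

print(rows)
```

Filtering on the extension also covers other stray files, and the narrower except clause keeps unrelated bugs from being silently swallowed.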
I've created a dictionary from an uploaded file in Django.
This dictionary contains a nested list of dictionaries:
file = {"name": "filename", "sections": [{"section_name": "string", "lines": [{"line_number": 0, "line": "data"}]}], "etc": "etc"}
The models mirror the dictionary's nesting:
from mongoengine import (Document, EmbeddedDocument, DateTimeField,
                         EmbeddedDocumentListField, IntField,
                         ReferenceField, StringField)

class Line(EmbeddedDocument):
    line_number = IntField()
    line = StringField()
    definition = ReferenceField(Definition)  # Definition is another document, defined elsewhere

class Section(EmbeddedDocument):
    section_name = StringField()
    lines = EmbeddedDocumentListField(Line)

class File(Document):
    name = StringField()
    sections = EmbeddedDocumentListField(Section)
    created_on = DateTimeField()
    created_by = StringField()
    modified_on = DateTimeField()
    modified_by = StringField()
In the POST I have the following to chop the file up into the above Dict (the file is a simple text file):
import os

# path and filename come from the uploaded file
file = {}
with open(os.path.join(path, filename + ".txt"), 'r') as temp_file:
    filelines = temp_file.readlines()
sections = []
section = {}
lines = []
for i, l in enumerate(filelines):
    if i == 0:
        section["section_name"] = "Top"
    elif '*' in l:
        # a line starting with '*' opens a new section
        if l.index('*') == 0 and '*' not in lines[len(lines) - 2]["line"]:
            section["lines"] = lines
            lines = []
            sections.append(section)
            section = dict()
            section["section_name"] = filelines[i + 1][1:-2]
    line = {"line_number": i + 1, "line": l}
    lines.append(line)
# close the last open section
section['lines'] = lines
sections.append(section)
file["name"] = filename
file["sections"] = sections
I will tidy this up eventually.
Once the dict has been made how do I serialise it using the serializer?
Is it possible to insert this into a serializer?
If not how can I get it all into the database with validation?
I've tried json.dumps() and JsonRequst(), then passing the result as data= to the serializer, but I get: Unable to get repr for <class '....'>
I'm pretty new to Django and MongoDB so if you need more info I can provide :)
Thanks!
Update
Changed the models' list fields to EmbeddedDocumentListField, as suggested in the answer.
Answered
Boris' suggestion below pointed me to an error I wasn't seeing initially: I had a typo. Passing the dict directly into FileSerializer(data=file) works like a charm! :)
James!
The easiest way to validate that your incoming JSONs adhere to the Mongoengine Documents schema that you've specified is to use DRF-Mongoengine's DocumentSerializer.
Basically, what you need to do is create a serializer
serializers.py
from rest_framework_mongoengine import serializers

class FileSerializer(serializers.DocumentSerializer):
    class Meta:
        model = File
        fields = '__all__'
Then you need a view or viewset that makes use of this Serializer to respond to GET/POST/PUT/DELETE requests.
views.py
from rest_framework_mongoengine import viewsets

class FileViewSet(viewsets.ModelViewSet):
    lookup_field = 'id'
    serializer_class = FileSerializer

    def get_queryset(self):
        return File.objects.all()
and register this viewset with a router
urls.py
from rest_framework import routers
# this is DRF router for REST API viewsets
router = routers.DefaultRouter()
# register REST API endpoints with DRF router
router.register(r'file', FileViewSet, r"file")
I'd also recommend using EmbeddedDocumentListField instead of ListField(EmbeddedDocumentField(Section)) - it has additional methods.
Path path = Paths.get(access.getFilePath());
Charset charset = StandardCharsets.UTF_8;
String content = new String(Files.readAllBytes(path), charset);
String originalText;
File input = new File(access.getFilePath());
Document doc = Jsoup.parse(input, "UTF-8");
Element htmlElement = doc.select(htmlValue.get(punchLineTextField.getId())).first();
originalText = htmlElement.toString();
htmlElement.text(punchLineTextField.getText());
content = content.replaceAll(originalText, htmlElement.toString());
htmlElement = doc.select(htmlValue.get(newsDatePicker.getId())).first();
originalText = htmlElement.toString();
htmlElement.text(newsDatePicker.getValue().toString());
content = content.replaceAll(originalText, htmlElement.toString());
htmlElement = doc.select(htmlValue.get(welcomeMessageTextArea.getId())).first();
originalText = htmlElement.toString();
htmlElement.text(welcomeMessageTextArea.getText());
content = content.replaceAll(originalText, htmlElement.toString());
Files.write(path, content.getBytes(charset));
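The TextField and DatePicker values are saved into the file (an HTML file) correctly.
However, the TextArea text is not being saved for some reason. The code grabs the original text correctly but does not write the new text to the file. Why?
SOLUTION: The replace call fails when the text contains newlines. Either remove the newlines from the text, or use a regex to strip line breaks while reading the file. Note also that String.replaceAll interprets its first argument as a regular expression, so String.replace (a literal match) is likely the safer choice when the search string is raw HTML.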
I'm using the PHPExcel library for many Excel manipulations, combined with PHP/MySQL.
It works well for me.
But I can't figure out how to split an Excel document sheet by sheet, so that each sheet is saved as a new Excel document.
At the same time, I also need to delete the empty rows of the original document from the new Excel documents (cleaning up the final docs).
What's the best way to do it?
All your experiences are greatly appreciated.
Best regards.
I found a way to do what I wanted.
Here is a solution (maybe not the best way, but it works well enough):
$file = $_POST['file'];
$filename = pathinfo($file, PATHINFO_FILENAME);
require_once 'phpexcel/Classes/PHPExcel.php';
$xls = new PHPExcel();
$xlsReader= new PHPExcel_Reader_Excel5();
$xlsTemplate = $xlsReader->load($file);
$sheet1 = $xlsTemplate->getSheetByName('Sheet1');
$xls->addExternalSheet($sheet1,0);
$xls->removeSheetByIndex(1);
$xlsWriter = new PHPExcel_Writer_Excel5($xls);
$xlsWriter->save($filename."_Sheet1.xls");
$sheet2 = $xlsTemplate->getSheetByName('Sheet2');
$xls->addExternalSheet($sheet2,0);
$xls->removeSheetByIndex(1);
$xlsWriter = new PHPExcel_Writer_Excel5($xls);
$xlsWriter->save($filename."_Sheet2.xls");
$sheet3 = $xlsTemplate->getSheetByName('Sheet3');
$xls->addExternalSheet($sheet3,0);
$xls->removeSheetByIndex(1);
$xlsWriter = new PHPExcel_Writer_Excel5($xls);
$xlsWriter->save($filename."_Sheet3.xls");
$sheet4 = $xlsTemplate->getSheetByName('Sheet4');
$xls->addExternalSheet($sheet4,0);
$xls->removeSheetByIndex(1);
$xlsWriter = new PHPExcel_Writer_Excel5($xls);
$xlsWriter->save($filename."_Sheet4.xls");
$sheet5 = $xlsTemplate->getSheetByName('Sheet5');
$xls->addExternalSheet($sheet5,0);
$xls->removeSheetByIndex(1);
$xlsWriter = new PHPExcel_Writer_Excel5($xls);
$xlsWriter->save($filename."_Sheet5.xls");
$sheet6 = $xlsTemplate->getSheetByName('Sheet6');
$xls->addExternalSheet($sheet6,0);
$xls->removeSheetByIndex(1);
$xlsWriter = new PHPExcel_Writer_Excel5($xls);
$xlsWriter->save($filename."_Sheet6.xls");
My original Excel file, which contained 6 sheets, is now split into 6 Excel files, as I wanted.
As you can see, it was not so hard to do, but the documentation is confusing...
I hope this helps.
I'm trying to parse an RSS feed using LINQ to XML.
This is the RSS feed:
http://www.surfersvillage.com/rss/rss.xml
My code to parse it is as follows:
List<RSS> results = null;
XNamespace ns = "http://purl.org/rss/1.0/";
XNamespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
XDocument xdoc = XDocument.Load("http://www.surfersvillage.com/rss/rss.xml");
results = (from feed in xdoc.Descendants(rdf + "item")
           orderby int.Parse(feed.Element("guid").Value) descending
           let desc = feed.Element("description").Value
           select new RSS
           {
               Title = feed.Element("title").Value,
               Description = desc,
               Link = feed.Element("link").Value
           }).Take(10).ToList();
To test the code, I put a breakpoint on the first line of the LINQ query and tried the following in the Immediate window:
xdoc.Element(ns + "channel");
This works and returns an object as expected.
Then I typed in:
xdoc.Element(ns + "item");
This also worked and returned a single object, but I'm looking for all the items,
so I typed in:
xdoc.Elements(ns + "item");
This returns nothing even though there are more than 10 items. The Descendants method doesn't work either; it also returned null.
Could anyone give me a few pointers as to where I'm going wrong? I've also tried substituting rdf for the namespace.
Thanks
You are referencing the wrong namespace. All the elements use the default namespace rather than the rdf one, and child lookups such as feed.Element("title") need the namespace as well, so your code should be as follows:
List<RSS> results = null;
XNamespace ns = "http://purl.org/rss/1.0/";
XDocument xdoc = XDocument.Load("http://www.surfersvillage.com/rss/rss.xml");
results = (from feed in xdoc.Descendants(ns + "item")
           orderby int.Parse(feed.Element(ns + "guid").Value) descending
           let desc = feed.Element(ns + "description").Value
           select new RSS
           {
               Title = feed.Element(ns + "title").Value,
               Description = desc,
               Link = feed.Element(ns + "link").Value
           }).Take(10).ToList();
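The same rule (elements in a default namespace must be looked up with that namespace) applies in other XML APIs too. As a rough illustration, here is a small sketch using Python's xml.etree.ElementTree. The FEED string is a made-up RSS 1.0 document, not the real surfersvillage feed, and the "rss" prefix is just a local alias chosen for the lookup.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 1.0 (RDF) document, for illustration only.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/">
    <title>Example feed</title>
  </channel>
  <item rdf:about="http://example.com/1">
    <title>First</title>
    <link>http://example.com/1</link>
  </item>
  <item rdf:about="http://example.com/2">
    <title>Second</title>
    <link>http://example.com/2</link>
  </item>
</rdf:RDF>"""

# Local prefix -> namespace URI mapping used only for the queries below.
NS = {"rss": "http://purl.org/rss/1.0/"}

root = ET.fromstring(FEED)

# Without the namespace nothing matches, because every <item> actually
# lives in the default namespace http://purl.org/rss/1.0/.
unqualified = root.findall(".//item")

# With the namespace, both the items and their children are found.
qualified = root.findall(".//rss:item", NS)
titles = [item.findtext("rss:title", namespaces=NS) for item in qualified]

print(len(unqualified), titles)
```

Note that the child lookup (rss:title) needs the namespace just like the item lookup does, which mirrors the ns + "title" change in the C# answer.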