I need to get the sheet name. I'm using moonland\phpexcel.
There is an import config option like this:
['setIndexSheetByName'=>true]
But this only works when the file has multiple sheets. If there is only one sheet, the first level of the returned array holds the rows directly, instead of holding a single item (the sheet) that in turn holds the rows, so I have no access to the sheet name.
How can I access it?
I'm trying to import and process various XML files using R. Each XML file can contain different variables from various individuals. I would like to identify the values linked with each individual. The output should be a dataframe/table where each row is an individual and each column a variable contained in the XML.
For example, I have the following XML file:
<DatosE xmlns:ns0="tmp" xmlns:ns1="aux">
  <ns0:DatosE>
    <ns0:Cap>
      <ns0:Code>1000</ns0:Code>
      <ns0:Year>2022</ns0:Year>
    </ns0:Cap>
    <ns1:DataBody>
      <ns1:RealData>
        <ns1:IndividualData identity="1" name="AAA">
          <ns1:DataA>
            <ns1:Label1>2300.32</ns1:Label1>
            <ns1:Label2>5600.90</ns1:Label2>
            <ns1:Label3>87</ns1:Label3>
          </ns1:DataA>
          <ns1:DataB>
            <ns1:DataB2>
              <ns1:Label4>4500.34</ns1:Label4>
              <ns1:Label5>23.20</ns1:Label5>
              <ns1:Label6>10000.50</ns1:Label6>
            </ns1:DataB2>
          </ns1:DataB>
        </ns1:IndividualData>
        <ns1:IndividualData identity="2" name="BBB">
          <ns1:DataA>
            <ns1:Label1>4560.24</ns1:Label1>
            <ns1:Label2>896.30</ns1:Label2>
            <ns1:Label3>790.3</ns1:Label3>
          </ns1:DataA>
          <ns1:DataB>
            <ns1:DataB2>
              <ns1:Label4>2004.78</ns1:Label4>
              <ns1:Label7>890</ns1:Label7>
              <ns1:Label8></ns1:Label8>
            </ns1:DataB2>
          </ns1:DataB>
        </ns1:IndividualData>
      </ns1:RealData>
    </ns1:DataBody>
  </ns0:DatosE>
</DatosE>
The output I would like to obtain is something like this:
Identify  Name  Label1   Label2   Label5  Label6    Label7  Label8
1         AAA   2300.32  5600.90  23.20   10000.50  NA      NA
2         BBB   4560.24  896.30   NA      NA        890     0
I want to read the values of the different elements in the XML nodes and link each value to the individual it belongs to. Each individual is identified by the attributes (identity and name) of the "ns1:IndividualData" node.
I've tried the 'xmlToDataFrame' function (XML package) with XPath syntax, but I don't know how to get the values of the identity and name attributes. I can read the values of the nodes I want, but not in a way that lets me link them to the right individual.
I've tried the following function:
xmlToDataFrame(nodes = getNodeSet(xmlParse("xmlGGG.xml"),
  "//ns1:DataA | //ns1:DataB2",
  namespaces = xml_ns(read_xml("xmlGGG.xml"))))
I have also looked into the "xml2" package, but without success.
Does anyone know how I can read the values of the different nodes/elements of my XML and link them all using the attributes that indicate which individual they belong to?
Thank you.
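The linking step asked about here can be sketched independently of R. The following is only an illustration in Python's standard library, not an answer from the original thread: it walks each ns1:IndividualData node, reads the identity and name attributes, and collects every leaf element underneath into one row. The function names individuals_to_rows and to_table are hypothetical, the sample is a trimmed copy of the XML above, and empty elements are returned as None rather than 0. The same approach should carry over to xml2 in R (xml_find_all, xml_attr, xml_name, xml_text), but that translation is an assumption and is not shown.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample file from the question (assumption: same structure).
XML_TEXT = """<DatosE xmlns:ns0="tmp" xmlns:ns1="aux">
  <ns0:DatosE>
    <ns1:DataBody>
      <ns1:RealData>
        <ns1:IndividualData identity="1" name="AAA">
          <ns1:DataA><ns1:Label1>2300.32</ns1:Label1></ns1:DataA>
          <ns1:DataB><ns1:DataB2><ns1:Label5>23.20</ns1:Label5></ns1:DataB2></ns1:DataB>
        </ns1:IndividualData>
        <ns1:IndividualData identity="2" name="BBB">
          <ns1:DataA><ns1:Label1>4560.24</ns1:Label1></ns1:DataA>
          <ns1:DataB><ns1:DataB2>
            <ns1:Label7>890</ns1:Label7>
            <ns1:Label8></ns1:Label8>
          </ns1:DataB2></ns1:DataB>
        </ns1:IndividualData>
      </ns1:RealData>
    </ns1:DataBody>
  </ns0:DatosE>
</DatosE>"""

# Prefix-to-URI mapping taken from the xmlns declarations in the sample.
NS = {"ns1": "aux"}


def individuals_to_rows(xml_text):
    """Return one dict per IndividualData node: attributes plus leaf values."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.iterfind(".//ns1:IndividualData", NS):
        row = {"identity": person.get("identity"), "name": person.get("name")}
        for node in person.iter():
            if len(node) > 0:
                continue  # container node (the individual, DataA, DataB, ...)
            label = node.tag.split("}", 1)[-1]  # drop the "{namespace}" part
            text = node.text.strip() if node.text else ""
            row[label] = text or None  # empty element becomes None
        rows.append(row)
    return rows


def to_table(rows):
    """Align rows on the union of all columns; missing values become None."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns, [[row.get(col) for col in columns] for row in rows]


if __name__ == "__main__":
    columns, table = to_table(individuals_to_rows(XML_TEXT))
    print(columns)
    for line in table:
        print(line)
```

Each row carries its own identity and name, so values from DataA and DataB2 stay attached to the right individual even when the two individuals have different labels.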
I use Neo4j Community Edition version 3.2.1.
Consider this CSV-file with edges:
node1,relation,node2,type
1,RELATED_TO,2,Married
2,RELATED_TO,1,Married
1,RELATED_TO,3,Child
2,RELATED_TO,3,Child
3,RELATED_TO,4,Sibling
3,RELATED_TO,5,Sibling
4,RELATED_TO,5,Sibling
I have already created the nodes for this. I then run the following LOAD CSV command:
load csv with headers from
"file:///test_dataset/edges.csv" as line
match (person1:Person {pid:line.node1}),
(person2:Person {pid:line.node2})
create (person1)-[:line.relation {type:line.type}]->(person2)
But this returns the following error:
Invalid input '.': expected an identifier character, whitespace, '|', a length specification, a property map or ']' (line 5, column 24 (offset: 167))
"create (person1)-[:line.relation {type:line.type}]->(person2)"
It seems that I cannot use "line.relation" like this. How can I use the relation from the CSV file (second column) as the relationship type with LOAD CSV?
I have seen this answer, but I would like to do this using native query language.
To verify that the rest of the query is correct I have managed to create the edges correctly by hardcoding the relation like this:
load csv with headers from
"file:///test_dataset/edges.csv" as line
match (person1:Person {pid:line.node1}),
(person2:Person {pid:line.node2})
create (person1)-[:RELATED_TO {type:line.type}]->(person2)
Natively, it's not possible to create a node with a dynamic label or a relationship with a dynamic type.
That's why there is a procedure for that.
If you want to do it natively and you know all the distinct values of your relation column, you can create one Cypher script per value, like this:
LOAD CSV WITH HEADERS FROM "file:///test_dataset/edges.csv" AS line
WITH line WHERE line.relation ='RELATED_TO'
MATCH (person1:Person {pid:line.node1})
MATCH (person2:Person {pid:line.node2})
CREATE (person1)-[:RELATED_TO {type:line.type}]->(person2)
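Writing one script per value by hand gets tedious when the relation column has many distinct values. As a rough sketch (not part of the original answer), the scripts could be generated from the CSV itself. The helper names distinct_relations and build_statements are hypothetical, the sample data is the CSV from the question, and the identifier check is an added assumption meant to keep unexpected values out of the generated Cypher.

```python
import csv
import io
import re

# The edge file from the question, inlined so the sketch is self-contained.
EDGES_CSV = """node1,relation,node2,type
1,RELATED_TO,2,Married
2,RELATED_TO,1,Married
1,RELATED_TO,3,Child
2,RELATED_TO,3,Child
3,RELATED_TO,4,Sibling
3,RELATED_TO,5,Sibling
4,RELATED_TO,5,Sibling
"""

# Same statement as in the answer above; doubled braces are literal braces.
TEMPLATE = '''LOAD CSV WITH HEADERS FROM "{url}" AS line
WITH line WHERE line.relation = '{rel}'
MATCH (person1:Person {{pid:line.node1}})
MATCH (person2:Person {{pid:line.node2}})
CREATE (person1)-[:{rel} {{type:line.type}}]->(person2)'''


def distinct_relations(csv_text):
    """Return the distinct values of the relation column, in file order."""
    seen = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["relation"] not in seen:
            seen.append(row["relation"])
    return seen


def build_statements(csv_text, url):
    """Build one Cypher statement per distinct relationship type."""
    statements = []
    for rel in distinct_relations(csv_text):
        # Only accept plain identifiers, since the value is pasted into Cypher.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel):
            raise ValueError(f"unsafe relationship type: {rel!r}")
        statements.append(TEMPLATE.format(url=url, rel=rel))
    return statements


if __name__ == "__main__":
    for statement in build_statements(EDGES_CSV, "file:///test_dataset/edges.csv"):
        print(statement)
        print()
```

For the sample file this produces a single statement, because RELATED_TO is the only value in the relation column. Each generated statement can then be run on its own.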
I chunked several novels into a data frame called documents. I want to export each chunk as a separate .txt file.
The data frame consists of two columns. The first column has the file name for each chunk, and the second column has the actual text that would go into the file.
documents[1,1]
[1] "Beloved.txt_1"
documents[1,2]
[1] "124 was spiteful full of a baby's venom the women......"
class(documents)
[1] "data.frame"
I'm trying to write a for loop that takes each row, writes the second column to a .txt file, and uses the first column as the name of that file, repeating this for every row. I've been working with something like this:
for (i in 1:ncol(documents)) {
write(tagged_text, paste("data/taggedCorpus/",
documents[i], ".txt", sep=""))
I've also been reading that maybe the cat function would work well here?
I'm not positive this will work for you (a little more of an example of your input and desired output would help), but one issue you've got is that your for loop runs over columns rather than rows. If you want to do this once for every row, then it needs to be for (i in 1:nrow(documents)) rather than for (i in 1:ncol(documents)).
Assuming that "documents" is the name of your data.frame and that the column containing the text you want to save is called "tagged_text" and the column with the file name is called "file", try this:
for (i in 1:nrow(documents)) {
write(documents$tagged_text[i], paste0("data/taggedCorpus/",
documents$file[i], ".txt"))
}
Note that you don't need to specify the path every time if you have already set the working directory (for example with setwd()) before you start the loop.
I have a pipe delimited .txt Flat File that I'm using to do bulk insert to SQL. Everything works well for straight one to one. However, the Flat File now contains 2 new fields that can repeat an unknown number of times.
Is there a way to create a single flat file schema where I can have an unbounded child within the main unbounded child? I think where I'm getting tripped up is how to make the ChildRoot listed below just a "group heading", the way Root is, so that ChildRoot itself doesn't correspond to a location in the flat file. How do I insert something like that?
Schema:
-Roots
--Root (unbounded)
---ChildID
---ChildName
Roots is linked directly to my SQL stored procedure, which does a bulk insert of however many "Root" rows come in.
Now I have:
Schema:
-Roots
--Root (unbounded)
---Child
---ChildName
---ChildRoot (unbounded)
----ChildRootID
----ChildRootName
EDIT: I should also add that ChildRootID and ChildRootName can repeat an indefinite number of times until the row delimiter (carriage return) is found.
I have a flat file with some repeating sections in it, and I'm confused how to create the schema via the BT flat file mapping wizard. The file looks like this:
001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
As you can see, the 006* records can repeat. I'm going to want to wind up with XML that looks like this:
<001Stuff>...</001Stuff>
<002Stuff>...</002Stuff>
<006Loop>
<006Stuff>...</006Stuff>
<006AStuff>...</006AStuff>
<006BStuff>...</006BStuff>
<006BStuff>...</006BStuff>
</006Loop>
<006Loop>
<006Stuff>...</006Stuff>
<006AStuff>...</006AStuff>
<006BStuff>...</006BStuff>
</006Loop>
Obviously I can't just set the first group of 006* records to "Repeating record" and Ignore the second set. I'm used to dealing with single repeating rows via the wizard (i.e. another 006 row right after the first one) and not nested things like this - any suggestions on how to proceed? Thanks!
Working with the Flat File Schema Wizard is quite hard and there is only so much it can help you with. I always seem to have to tweak its output a little bit.
To make things a little easier, I suggest you restrict your sample document to a single occurrence of the whole <006> structure. That way you will not have to set many lines to Ignored in the Flat File Schema Wizard:
001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
Next, each repeating structure should be wrapped inside a corresponding Repeating Record in the definition of your Xml Schema.
Note that you can always run the Flat File Schema Wizard recursively on nested structures for more fine-grained control. So I would suggest first running the wizard with a single all-encompassing repeating <006> structure.
Then you can right-click on that structure and provide a more detailed definition of the nested child structures by highlighting only a subset of the sample contents.
Then, the most important part: you need to tweak the Child Order property to Conditional Default for both repeating structures, because there is only one empty line at the end of your document file and the Wizard cannot help you out with this situation.
For reference, your resulting structure should look like this, with the following settings:
BunchOfStuff (Root): Delimited, 0x0D 0x0A, Suffix.
  _001Stuff: Delimited, ,, Prefix, Tag Identifier 001.
  _002Stuff: Delimited, ,, Prefix, Tag Identifier 002.
  _006Loop: Delimited, 0x0D 0x0A, Conditional Default.
    _006Stuff: Delimited, ,, Prefix, Tag Identifier 006.
    _006AStuff: Delimited, ,, Prefix, Tag Identifier 006A.
    _006BLoop: Delimited, 0x0D 0x0A, Conditional Default.
      _006BStuff: Delimited, ,, Prefix, Tag Identifier 006B.
Hope this helps.
Treat everything from the start of the first "006," record to the start of the second "006," record as one record. When you define the 006 record, set it up as a repeating record as well. This should create a node for each "006," group and nodes for each 006 record under it.
That is what I would try.
Here is my output after 2 minutes of work. Except for the node/element names, I think it is what you want. You would still have to create separate elements for each of the fields in your data.
<_x0030_01 xmlns="">001,bunch of data
<_x0030_02 xmlns="">002,bunch of data
<_x0030_06 xmlns="">
<_x0030_06_Child1>bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>A,bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>B,bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>B,bunch of data
<_x0030_06 xmlns="">
<_x0030_06_Child1>bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>A,bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>B,bunch of data
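The grouping rule both answers rely on (a new loop starts at every line tagged exactly 006, and the 006A and 006B lines that follow belong to it) can be checked outside BizTalk before touching the schema. The sketch below is only an illustration in Python, not something the wizard produces. The function name group_records is hypothetical and the sample is the file from the question.

```python
# The sample flat file from the question.
SAMPLE = """001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
"""


def group_records(text, loop_tag="006"):
    """Split lines into header records and repeating loops.

    A line whose tag equals loop_tag opens a new loop; later lines are
    appended to the most recent loop. Lines before the first loop are
    treated as header records.
    """
    header = []
    loops = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore the trailing empty line
        tag, _, data = line.partition(",")  # tag is everything before the first comma
        if tag == loop_tag:
            loops.append([(tag, data)])
        elif loops:
            loops[-1].append((tag, data))
        else:
            header.append((tag, data))
    return header, loops


if __name__ == "__main__":
    header, loops = group_records(SAMPLE)
    print("header:", [tag for tag, _ in header])
    for number, loop in enumerate(loops, start=1):
        print("loop", number, [tag for tag, _ in loop])
```

Matching on the whole tag before the first comma is what keeps 006 apart from 006A and 006B, which is the same job the Tag Identifier settings do in the schema.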