My Pig statement generates the following output:
({(10)},5201)
({(20),(20),(20)},3334)
({(30),(30),(30),(30)},4632)
({(40),(40)},3101)
({(50),(50)},3801)
({(60),(60),(60)},3959)
But I want to store the output in Pig as follows:
(10,5201)
(20,3334)
(30,4632)
(40,3101)
(50,3801)
(60,3959)
Is there any way to extract the very first element from a bag in Pig?
Use the DataFu UDF FirstTupleFromBag, which returns the first tuple of a bag, to achieve exactly this.
I have the following XML structure. I am trying to extract the attributes StartDate and EndDate of the relationship period, that is only if rr:PeriodType is RELATIONSHIP_PERIOD.
However, the nodes for "relationship" and "accounting" have exactly the same name and I am not sure how to proceed.
<rr:RelationshipPeriods>
  <rr:RelationshipPeriod>
    <rr:StartDate>2018-01-01T00:00:00.000Z</rr:StartDate>
    <rr:EndDate>2018-12-31T00:00:00.000Z</rr:EndDate>
    <rr:PeriodType>ACCOUNTING_PERIOD</rr:PeriodType>
  </rr:RelationshipPeriod>
  <rr:RelationshipPeriod>
    <rr:StartDate>2019-01-02T00:00:00.000Z</rr:StartDate>
    <rr:PeriodType>RELATIONSHIP_PERIOD</rr:PeriodType>
  </rr:RelationshipPeriod>
</rr:RelationshipPeriods>
I tried using this code:
ldply(xpathApply(xmlData, '//rr:RelationshipPeriod/rr:StartDate', getChildrenStrings), rbind)
But this doesn't work well, as it's unclear whether it is extracting the accounting period or the relationship period.
Any help would be greatly appreciated!
For rr:StartDate use XPath:
//rr:RelationshipPeriod[rr:PeriodType='RELATIONSHIP_PERIOD']/rr:StartDate
But probably better to first find the correct rr:RelationshipPeriod using XPath:
//rr:RelationshipPeriod[rr:PeriodType='RELATIONSHIP_PERIOD']
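See this answer on how to reuse the result of an XPath.
When you then query relative to that node, don't use // in front of rr:StartDate and rr:EndDate, because a leading // searches the whole document again rather than only the selected period.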
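As a language-neutral illustration of the two-step approach above, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (not the R XML package from the question). The namespace URI is hypothetical, since the question does not show the xmlns:rr declaration; the XPath predicate itself is the one given above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the question does not show the xmlns:rr declaration.
RR_NS = "http://example.com/rr"
NS = {"rr": RR_NS}

xml_text = f"""
<rr:RelationshipPeriods xmlns:rr="{RR_NS}">
  <rr:RelationshipPeriod>
    <rr:StartDate>2018-01-01T00:00:00.000Z</rr:StartDate>
    <rr:EndDate>2018-12-31T00:00:00.000Z</rr:EndDate>
    <rr:PeriodType>ACCOUNTING_PERIOD</rr:PeriodType>
  </rr:RelationshipPeriod>
  <rr:RelationshipPeriod>
    <rr:StartDate>2019-01-02T00:00:00.000Z</rr:StartDate>
    <rr:PeriodType>RELATIONSHIP_PERIOD</rr:PeriodType>
  </rr:RelationshipPeriod>
</rr:RelationshipPeriods>
"""

root = ET.fromstring(xml_text)

# Step 1: keep only the periods whose PeriodType child is RELATIONSHIP_PERIOD.
periods = root.findall(
    ".//rr:RelationshipPeriod[rr:PeriodType='RELATIONSHIP_PERIOD']", NS
)

# Step 2: query relative to each matched node (no leading //).
# A missing child such as EndDate comes back as None.
rows = [
    {
        "StartDate": p.findtext("rr:StartDate", default=None, namespaces=NS),
        "EndDate": p.findtext("rr:EndDate", default=None, namespaces=NS),
    }
    for p in periods
]
print(rows)
```

Selecting the period node first keeps StartDate and EndDate paired, even when one of them is absent, as EndDate is in the sample relationship period.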
I'm parsing JSON using the RJSONIO package.
The parsed item contains nested lists.
Each item in the list can be extracted using something like this:
dat_raw$`12`[[31]]
which correctly returns the string stored at this location (in this example, the '12' refers to the month and [[31]] to day).
"31-12-2021"
I now want to run a for loop to sequentially extract the date for every month. Something like this:
for (m in 1:12) {
  print(dat_raw$m[[31]])
}
This, naturally, returns a NULL because there is no $m[[31]] in the list.
Instead, I'd like to extract the objects stored at $`1`[[31]], $`2`[[31]], ... $`12`[[31]].
There must be a relatively easy solution here but I haven't managed to crack it. I'd value some help. Thanks.
EDIT: I've added a screenshot of the list structure I'm trying to extract. The actual JSON object is too large for a dput() output. Hope this helps.
So, to get the date in this list, I'd use something like dat_raw$data$`1`[[1]]$date$gregorian$date.
What I'm trying to do is run a loop to extract multiple items of the list by cycling through $data$`1`[[1]]$..., $data$`2`[[1]]$... ... $data$`12`[[1]]$... using $data$m[[1]]$... in a for loop where m is the month.
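Instead of dat_raw$`12`[[31]], you can use dat_raw[[12]][[31]], provided the element named 12 is also the 12th element of the parsed list. To index by name rather than by position, use dat_raw[[as.character(m)]]. So your for loop would be:
for (m in 1:12) {
  print(dat_raw[[m]][[31]])
}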
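The same pitfall exists in any language that parses JSON into name-keyed containers: the month keys are strings, so the loop counter has to be converted before it is used as a key. Below is a minimal sketch in Python rather than R. The data is a made-up, heavily trimmed stand-in that only mimics the data / month / day / date / gregorian / date nesting described in the question; the R counterpart of the key lookup would be dat_raw$data[[as.character(m)]].

```python
import json

# Made-up stand-in for the real JSON: one entry per month, keyed "1" to "12",
# each holding a list of days with the nesting described in the question.
raw = json.dumps({
    "data": {
        str(m): [{"date": {"gregorian": {"date": f"01-{m:02d}-2021"}}}]
        for m in range(1, 13)
    }
})
dat_raw = json.loads(raw)

first_days = []
for m in range(1, 13):
    # The keys are strings, so convert the integer loop variable first;
    # looking up dat_raw["data"][m] with an int would raise KeyError.
    month = dat_raw["data"][str(m)]
    first_days.append(month[0]["date"]["gregorian"]["date"])

print(first_days[0], first_days[-1])
```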
Imagine I have a nested list like this: vals = [['John', '20'], ['Derron', '5'], ['Mike', '43']]. How can I print out only the names, e.g. John, Derron, Mike?
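Which language do you want to do this in? In JavaScript, you can try the solution below.
const vals = [['John', '20'], ['Derron', '5'], ['Mike', '43']];
// Log the first element (the name) of each inner array.
function printName(val) {
  console.log(val[0]);
}
vals.forEach(printName);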
You can use a simple for loop. In Python, for example:
vals = [['John', '20'], ['Derron', '5'], ['Mike', '43']]
for sublist in vals:
    print(sublist[0])
The for line loops through each sublist. For this question, ['John', '20'] is a sublist. The print line prints the first element (the name) of each of these sublists.
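If the names should appear on one line, as in the "John, Derron, Mike" example from the question, one possible variation is to collect them first and join them. This is a small sketch; the names variable and the _age placeholder are introduced here only for illustration.

```python
vals = [['John', '20'], ['Derron', '5'], ['Mike', '43']]

# Unpack each two-element sublist and keep only the first element (the name).
names = [name for name, _age in vals]

# Join the names with a comma and a space.
print(", ".join(names))  # prints: John, Derron, Mike
```

The unpacking assumes every sublist has exactly two elements; for sublists of varying length, sublist[0] is the safer choice.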
I would like to automatically create a vector with the following elements:
elements <- c("elem[1]", "elem[2]", ...., "elem[100]")
without typing elem[1], elem[2] etc by hand. How can I do this automatically?
Thanks
You can use paste0():
# paste0() is vectorised over 1:100, so this builds all 100 strings at once
elements <- paste0('elem[', 1:100, ']')
How do you use indirect references in R? More specifically, in the following simple read statement, I would like to be able to use a variable name to read inputFile into data table myTable.
myTable <- read.csv(inputFile, sep=",", header=T)
Instead of the above, I want to define
refToMyTable <- "myTable"
Then, how can I use refToMyTable instead of myTable to read inputFile into myTable?
Thanks for the help.
R doesn't really have references like that, but you can use strings to retrieve/create variables of that name.
But first let me say this is generally not good practice. If you're looking to do this type of thing, it's generally a sign that you're not doing it "the R way."
Nevertheless,
assign(refToMyTable, read.csv(inputFile, sep=",", header=TRUE))
should do the trick. The complement to assign is get, which retrieves a variable's value using its name.
I think you mean something like the following:
reftomytable='~/Documents/myfile.csv'
myTable=read.csv(reftomytable)
Or perhaps use assign, as mentioned by MrFlick.
When you want the contents of the object named "myTable" you would use get:
get("myTable")
get(refToMyTable) # since get will evaluate its argument
(It would be better to assign the results of multiple such data frames to a list object or a Reference Class.)
If you wanted a language-name object you would use as.name:
as.name("myTable")
# myTable .... is printed at the console; note no quotes
str(as.name("myTable"))
#symbol myTable