Can I split part of an Adobe Analytics listvar into another variable?

I have a listvar with 19 parameters. This has produced a significant number of unique values, so the reports are run from back-end tables to avoid "low-traffic" issues.
I need to create an Adobe Workspace based on just the first 5 parameters. Is there a way to pass only the first 5 parameters into another variable, i.e. write all 19 parameters to one listvar and just the first 5 to another, using classifications or processing rules?
Example:
list1 1:this:thing1:important:worthit:de:blah1:de1:::::::::::,2:this:thing2:important2:worthmore:de:blah1:de2:::more::::::::,3:this:thing3:4aez3e:important:de5:blah1:de2:::1more::::andmore::::
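For reference, pulling the first 5 colon-delimited parameters out of one entry comes down to a regex like the one below. This is just a base R sketch for testing the kind of pattern a Classification Rule Builder rule might use; the variable name entry is made up:
# one hypothetical listvar entry
entry <- "1:this:thing1:important:worthit:de:blah1:de1:::::::::::"
# keep only the first 5 colon-delimited fields
sub("^((?:[^:]*:){4}[^:]*).*$", "\\1", entry, perl = TRUE)
# [1] "1:this:thing1:important:worthit"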

Related

Using predefined splits in the pcr function of the R pls package

To ensure good population representation, I have created custom validation sets from my training data. However, I am not sure how to interface this with pcr in R.
I have tried giving the segments argument the index of each sample's fold, similar to Python's predefined-splits CV iterator (scikit-learn's PredefinedSplit), which runs but takes forever, so I feel I must be making an error somewhere:
pcr(y ~ X, scale = FALSE, data = tdata, validation = "CV", segments = test_fold)
where test_fold is a vector giving, for each sample, the index of the validation set it belongs to.
For example, if the training data is composed of 9 samples and I want to use the first three as the first validation set, and so on:
test_fold<-c(1,1,1,2,2,2,3,3,3)
This runs, but it is very slow, whereas regular "CV" finishes in minutes. So far the results look okay, but I have over a thousand runs to do and it took an hour to get through one. If anybody knows how I can speed this up, I would be grateful.
The segments parameter needs to be a list of vectors, one per validation set, not a vector of fold labels. Going again with 9 samples: if I want the first three samples in the first validation set, the next three in the second validation set, and so on, it should be
test_vec <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
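A minimal runnable sketch with made-up data (only the segments argument matters here):
library(pls)

set.seed(1)
X <- matrix(rnorm(9 * 5), nrow = 9)          # 9 samples, 5 predictors
tdata <- data.frame(y = rnorm(9), X = I(X))

# each list element holds the row indices of one validation set
test_vec <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

fit <- pcr(y ~ X, scale = FALSE, data = tdata,
           validation = "CV", segments = test_vec)
summary(fit)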

Filter - Calculated fields relation in Tableau

I have 20 lists of servers. Suppose we have 50 servers and every day (for 20 days) we get a list of active servers.
Having this list, I want to calculate the number of times each server has appeared in the lists. Suppose that Server1 has appeared in 16 out of these 20 lists. Here's how I'm doing it:
Create a new calculated field: {FIXED [Server]:COUNT([Server])}
Move this calculated field to Columns.
Calculate CNTD (count distinct) and put it on Rows.
Now here comes the question:
What if I want to draw the very same chart, but only from the last 5 lists (the lists we've got in the last 5 days)? If I filter based on paths and take the last 5 lists, the numbers calculated in the calculated field won't update. They will still be 6, 8, ..., 16 while there are only 5 lists (the maximum number of appearances should be 5). Any ideas?
Instead of using a FIXED level-of-detail (LOD) calculation, use INCLUDE. In the order of operations for LOD calculations, FIXED calculations run before dimension filters are applied, while INCLUDE/EXCLUDE calculations are applied after filtering.
{INCLUDE [Server]:COUNT([Server])}
The online help illustrates the order of operations for LOD calculations and filtering; see https://onlinehelp.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod_overview.html for more details.

Is it possible to aggregate data with varying nesting depth in Grafana?

I have data in Grafana with different nesting depths. It looks like this (the nesting depth differs depending on the message type):
foo.<host>.type.<type-id>
foo.<host>.type.<type-id>.<subtype-id>
foo.<host>.type.<type-id>.<subtype-id>.<more-nesting>
...
The <host> field can be the IP of the server sending the data, and <type-id> is the type of message it handled. There are quite a lot of message types, but for the visualization I am only interested in the first level, <type-id>, aggregated over all hosts.
For example, if I have this data:
foo.ip1.type.type1 = 3
foo.ip1.type.type2.subtype1 = 5
foo.ip2.type.type1 = 4
foo.ip2.type.type2.subtype1 = 9
foo.ip2.type.type2.subtype2 = 13
I would rather see it like this:
foo.*.type.type1 = 7 (3+4)
foo.*.type.type2 = 27 (5+9+13)
Then it would be easier to produce a graph where you can see which types of messages are most frequent.
I have not found a way to express that in Grafana. The only option that I see is to create a graph by manually creating queries for each message type. If there were only a handful of types that would be OK, but in my example, the number of types is quite high and even worse, they can change over time. When new message types are added, I would like to see them without having to change the graph.
Does Grafana support aggregating the data in such a way? Can it visualize the data aggregated by one node, summing up everything that comes after that node (like the --max-depth option of the Unix du command)?
I am not very experienced with Grafana, but I am starting to believe this functionality is not supported. I am not sure whether Grafana allows preprocessing the data, but if the data could be transformed to
foo.ip1.type.type1 = 3
foo.ip1.type.type2_subtype1 = 5
foo.ip2.type.type1 = 4
foo.ip2.type.type2_subtype1 = 9
foo.ip2.type.type2_subtype2 = 13
it would also be a valid workaround, as the number of subtypes is very low in my data (often there is only one subtype).
I think the groupByNode function might be useful to you, by doing something like:
groupByNode(foo.ip1.type.*.*,3,"sumSeries")
You'll need to repeat this for each level of nesting. Hope that helps.
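For example, to aggregate over all hosts as asked in the question, you could add one target per nesting depth (plain Graphite wildcards match exactly one node, hence one query per depth); grouping on node 3 keys each summed series by its <type-id>:
groupByNode(foo.*.type.*, 3, "sumSeries")
groupByNode(foo.*.type.*.*, 3, "sumSeries")
groupByNode(foo.*.type.*.*.*, 3, "sumSeries")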
More information is available here:
http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.groupByNode
If you want the series named the way you alluded to in your example, you could use aliasSub.
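A sketch of that (aliasSub runs each series name through a regex search/replace; the pattern below rebuilds the foo.*.type.<type-id> form around the name produced by groupByNode):
aliasSub(groupByNode(foo.*.type.*.*, 3, "sumSeries"), "^(.+)$", "foo.*.type.\1")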

Comparison between two Data Sets using R Scripting / TERR in Spotfire

I want to compare the ID columns of two data tables using R script / TERR in Spotfire. Due to some limitations I am not able to install the packages "compare" and "sqldf". I can use the function "duplicated". Can someone help me create a sample script without using the above packages?
[Images showed the detailed requirements: the two data tables and the expected result table.]
Let's say you have two vectors, setA and setB. You can get the result by:
# in A but not in B
setdiff(setA,setB)
# in B but not in A
setdiff(setB,setA)
# both in A and B
intersect(setA,setB)
If you just want to know the count, use the length function. This may not be the exact answer you were looking for, but using the above functions you can create any set you want. If you need help with a specific logic, please update your question.
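For instance, with some made-up ID values:
# hypothetical ID columns from the two data tables
setA <- c("ID1", "ID2", "ID3", "ID4")
setB <- c("ID3", "ID4", "ID5")

setdiff(setA, setB)            # "ID1" "ID2" -> only in table A
setdiff(setB, setA)            # "ID5"       -> only in table B
intersect(setA, setB)          # "ID3" "ID4" -> in both tables
length(intersect(setA, setB))  # count of common IDs: 2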

Creating new variables in R that relate to existing variables

I have 7 different variables in an Excel spreadsheet that I have imported into R. Each is a column of length 3331. They are:
'Tribe' - there are 8 of them
'Month' - when the sampling was carried out
'Year' - the year when the sampling was carried out
'ID" - an identifier for each snail
'Weight' - weight of a snail in grams
'Length' - length of a snail shell in millimetres
'Width' - width of a snail shell in millimetres
This is a case where 8 different tribes have been asked to record data on a suspected endangered species of snail to see if they are getting rarer, or changing in size or weight.
This happened at different frequencies between 1993 and 1998.
I would like to know how to create new variables in the data so that if I entered names(Snails) it would list the 7 given variables plus any variables I have added.
The dataset is limited, which is why I would like to add new variables, such as the count of snails recorded in any given month.
This would rely on using Tribe, Month, Year and ID: if the IDs (snail identifiers) were listed by month, I could sum them to see whether the counts change over time. I have tried:
count=c(Tribe,Year,Month,ID)
count
But after doing things like that, R just produces one long vector that is 4x the length of the dataset. I would like to be able to create a new variable that is a column of length n = 3331.
Or maybe I would like to create a simpler variable so I can see whether a tribe collected in any given month. I don't know how to do this.
I have looked at other forums and searched, but there is nothing I can see that helps in my case. I appreciate any help. Thanks.
I'm guessing you need to organise your variables in a single structure, such as a data.frame.
See ?data.frame for the help file.
To get you started, you could do something like:
snails <- data.frame(Tribe, Year, Month, ID)
snails
# or for just the first few rows
head(snails)
Then your data would look similar to your Excel file:
  Tribe Year Month ID
1     1    1     1  a
2     2    2     2  b
3     3    3     3  c
<<etc>>
Then if you do names(snails) it will list out your column names.
You could possibly avoid some of this mucking about by importing your Excel file directly from Excel, or by saving it as a csv (comma-separated values) file first and then using read.csv("name_of_your_file.csv").
See http://www.statmethods.net/input/importingdata.html for some more specifics on this.
To tabulate your data, you can do things like...
table(snails$Tribe)
...to see the number of snail records collected by each tribe. Or...
table(snails$Tribe,snails$Year)
...to see the trends in each tribe by year. The $ character lets you access a named variable (column) inside a data.frame in the same way you are currently using the free-floating variables. This might seem like more work initially, but it will pay off greatly when you need to do more involved analysis.
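For the per-month counts mentioned in the question, you could similarly do:
table(snails$Month)               # total snails recorded per month
table(snails$Year, snails$Month)  # per month within each year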
For example, if you want to analyse only the weights from tribe 1, you could do:
snails$Weight[snails$Tribe==1]
# mean of these weights
mean(snails$Weight[snails$Tribe==1])
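If you want those counts attached to the data as a new column of length 3331, so that names(snails) lists it alongside the original 7, here is one sketch using base R's ave function (the column name MonthlyCount is made up for illustration):
# per-row count of records sharing the same Tribe/Year/Month
snails$MonthlyCount <- ave(rep(1, nrow(snails)),
                           snails$Tribe, snails$Year, snails$Month,
                           FUN = sum)
names(snails)  # now also lists "MonthlyCount"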
There are a lot more things I could explain, but you would probably be better served by reading an excellent website like Quick-R (http://www.statmethods.net/management/index.html) to get you doing more advanced analysis and plotting.
