I am comparing each element of two matrices and collecting the corresponding pairs of dimnames.
Below are example matrices:
mat1
| | D1| D2| D3| D4|
| D1 | NA| 1| 2| 2|
| D2 | NA| NA| 3| 8|
| D3 | NA| NA| NA| 8|
| D4 | NA| NA| NA| NA|
mat2
| | D1| D2| D3| D4|
| D1 | NA| 2| 4| 1|
| D2 | NA| NA| 13| 10|
| D3 | NA| NA| NA| 5|
| D4 | NA| NA| NA| NA|
I tried to compare them with a "for" loop and collect the pairs with the code below.
check = c()
for(i in 2:4){
  for(j in 1:i-1){
    if(mat1[j,i] < mat2[j,i]){
      check = append(check, 100*j + i)
    }
  }
}
But it fails with this error message:
Error in if (mat1[j, i] < mat2[j, i]) { :
argument is of length zero
I can understand that this error comes from here,
> mat1[j, i] < mat2[j, i]
logical(0)
But I don't know how to solve it.
Any other approach to this problem would also be super helpful. Thank you!
I am currently trying to create a column that reflects a sequence from a recursive hierarchy in PySpark. This is what the data looks like.
data = [(1,"A",None),(1,"B","A"),(1,"C","A"),(1,"D","C"),(1,"E","B"),(2,"A",None),(2,"B",None),(2,"C","A"),(2,"D","A"),(2,"E","D")]
df = spark.createDataFrame(data, "ID integer, Child string, Parent string")
+---+-----+------+
| ID|Child|Parent|
+---+-----+------+
| 1| A| null|
| 1| B| A|
| 1| C| A|
| 1| D| C|
| 1| E| B|
| 2| A| null|
| 2| B| null|
| 2| C| A|
| 2| D| A|
| 2| E| D|
+---+-----+------+
The expected result:
+---+-----+------+--------+
| ID|Child|Parent|Sequence|
+---+-----+------+--------+
| 1| A| null| 1|
| 1| B| A| 2|
| 1| C| A| 2|
| 1| D| C| 3|
| 1| E| B| 3|
| 2| A| null| 1|
| 2| B| null| 0|
| 2| C| A| 2|
| 2| D| A| 2|
| 2| E| D| 3|
+---+-----+------+--------+
What would be the best way to approach this?
I am aware that in SQL you can do this with a recursive CTE, but as far as I can tell there is no equivalent in PySpark.
Recursively joining DataFrames seems to be the way to accomplish this, but it seems expensive and overly complex.
Is there a more native/efficient way to accomplish this?
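To make the repeated-join idea concrete, here is a minimal plain-Python sketch of the level propagation it performs, run on the sample rows from the question. It is not PySpark and not a recommendation for large data. The function name hierarchy_levels is hypothetical. It assumes every row with a null parent is a root at level 1, so it does not reproduce the 0 shown for (2, B) in the expected output, since the rule behind that value is not stated.

```python
def hierarchy_levels(rows):
    """Return {(id, child): level} for rows of (id, child, parent).

    Assumption: a row with no parent is a root and gets level 1.
    """
    levels = {(i, c): 1 for i, c, p in rows if p is None}
    pending = [(i, c, p) for i, c, p in rows if p is not None]
    while pending:
        still_pending = []
        for i, c, p in pending:
            if (i, p) in levels:
                # Parent already resolved: child sits one level below it.
                levels[(i, c)] = levels[(i, p)] + 1
            else:
                still_pending.append((i, c, p))
        if len(still_pending) == len(pending):
            break  # no progress: missing parent or a cycle
        pending = still_pending
    return levels


data = [(1, "A", None), (1, "B", "A"), (1, "C", "A"), (1, "D", "C"), (1, "E", "B"),
        (2, "A", None), (2, "B", None), (2, "C", "A"), (2, "D", "A"), (2, "E", "D")]
levels = hierarchy_levels(data)
for key in sorted(levels):
    print(key, levels[key])
```

Each pass of the while loop plays the role of one self-join: it resolves the rows whose parent already has a level. The number of passes is bounded by the depth of the hierarchy.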
I'm looking for a way in pivottabler to build a table with 2 parallel fields, like this:
SEX | POPULATION GROUPS
______________________|_______________________________________
1 | 2 | 1 | 2 | 3 | 4
___________|_________|_________|_________|_________|___________
AgeGroups |AgeGroups|AgeGroups|AgeGroups|AgeGroups|AgeGroups
| | | | | | | | | | | | | | | | |
1 | 2| 3| 1 | 2| 3| 1 | 2| 3| 1 |2 | 3| 1 | 2| 3| 1 | 2| 3
______|__|__|___|__|__|___|__|__|___|__|__|___|__|__|___|__|___
| | | | | | | | | | | | | | | | |
How can I add 2 or more fields with addColumnDataGroups in parallel rather than hierarchically?
This can be done by specifying atLevel=1 when adding the population group columns to the pivot table.
I have made up some sample data to use in an example below.
n <- 100
sex <- sample(x=c("M","F"), size=n, replace=TRUE)
pg <- sample(x=c("pg1","pg2","pg3","pg4"), size=n, replace=TRUE)
ag <- sample(x=c("ag1","ag2","ag3"), size=n, replace=TRUE)
grp <- sample(x=c("g1","g2","g3","g4"), size=n, replace=TRUE)
df <- data.frame(sex, pg, ag, grp)
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df)
pt$addColumnDataGroups("sex", addTotal=FALSE)
pt$addColumnDataGroups("pg", atLevel=1, addTotal=FALSE)
pt$addColumnDataGroups("ag", addTotal=FALSE)
pt$addRowDataGroups("grp")
pt$defineCalculation(calculationName="Count", summariseExpression="n()")
pt$renderPivot()
Hope that helps
Chris
If I parse do.call(what=knitr::kable,args=args), the function kable inside do.call is parsed as a SYMBOL and not as a SYMBOL_FUNCTION_CALL.
Why isn't it the latter?
tf <- tempfile()
cat('do.call(knitr::kable,args=args)',file = tf)
parsed <- utils::getParseData(parse(tf))
knitr::kable(parsed)
| | line1| col1| line2| col2| id| parent|token |terminal |text |
|:--|-----:|----:|-----:|----:|--:|------:|:--------------------|:--------|:-------|
|18 | 1| 1| 1| 31| 18| 0|expr |FALSE | |
|1 | 1| 1| 1| 7| 1| 3|SYMBOL_FUNCTION_CALL |TRUE |do.call |
|3 | 1| 1| 1| 7| 3| 18|expr |FALSE | |
|2 | 1| 8| 1| 8| 2| 18|'(' |TRUE |( |
|7 | 1| 9| 1| 20| 7| 18|expr |FALSE | |
|4 | 1| 9| 1| 13| 4| 7|SYMBOL_PACKAGE |TRUE |knitr |
|5 | 1| 14| 1| 15| 5| 7|NS_GET |TRUE |:: |
|6 | 1| 16| 1| 20| 6| 7|SYMBOL |TRUE |kable |
|8 | 1| 21| 1| 21| 8| 18|',' |TRUE |, |
|11 | 1| 22| 1| 25| 11| 18|SYMBOL_SUB |TRUE |args |
|12 | 1| 26| 1| 26| 12| 18|EQ_SUB |TRUE |= |
|13 | 1| 27| 1| 30| 13| 15|SYMBOL |TRUE |args |
|15 | 1| 27| 1| 30| 15| 18|expr |FALSE | |
|14 | 1| 31| 1| 31| 14| 18|')' |TRUE |) |
If you just have kable, it's a symbol. That symbol could point to a function or a value; it's not clear which until you actually evaluate it.
However, if you have kable(), it's clear that you expect kable to be a function and that you are calling it.
do.call hides from the parser the fact that you are trying to call a function; that intention isn't realized until run time.
Things can get funny if you do something like
sum <- 5
sum
# [1] 5
sum(1:3)
# [1] 6
Here sum is behaving both like a regular variable and like a function. We've actually created a variable in our global environment that shadows the sum function from base. But because the parser treats sum and sum() differently, we can still get at both meanings.
I would like to report descriptive values in a table (I am sure they should be in a table and not in a figure). The data come from a 3-factorial experiment, so the table that I am able to produce with xtable (I'm working in R Markdown with knitr and have never used LaTeX) contains one line per data value in the format:
group | condition | type | value
When all the lines are printed below each other, this is not very readable; for example, the "group" entry stays the same for 10 lines. Is it possible to print it only the first time (in the first line) and then omit it until "group" changes to the next group (so it is next printed in line 11)?
My table should be in APA format, so I use either rapa::apa(mytable) or papaja::apa_table(mytable) for the final print.
Any help would be appreciated, thanks!
There are a few different ways to do this.
library(data.table)
dt = data.table("Group" = c(rep("A",4),rep("B",4)), "value" = rep(1:4, each = 2))
knitr::kable(dt)
> dt
Group value
1: A 1
2: A 1
3: A 2
4: A 2
5: B 3
6: B 3
7: B 4
8: B 4
We can remove duplicates across all rows
knitr::kable(dt[!duplicated(dt),])
|Group | value|
|:-----|-----:|
|A | 1|
|A | 2|
|B | 3|
|B | 4|
Or, we can remove duplicates according to specific columns
knitr::kable(unique(dt,by = c("Group")))
|Group | value|
|:-----|-----:|
|A | 1|
|B | 3|
Since each group can match multiple rows, we can use mult to specify which one to keep (on = "Group" is needed because dt has no key)
knitr::kable(dt[unique(dt, by = c("Group")), .(Group, value), on = "Group", mult = "first"])
|Group | value|
|:-----|-----:|
|A | 1|
|B | 3|
knitr::kable(dt[unique(dt, by = c("Group")), .(Group, value), on = "Group", mult = "last"])
|Group | value|
|:-----|-----:|
|A | 2|
|B | 4|
EDIT
To blank out repeated values in a specific column instead of dropping rows
dt$Group = ifelse(duplicated(dt$Group),"",dt$Group)
knitr::kable(dt)
|Group | value|
|:-----|-----:|
|A | 1|
| | 1|
| | 2|
| | 2|
|B | 3|
| | 3|
| | 4|
| | 4|
You can use the duplicated function with negation (!) to keep only the first row for each value of "group", but be careful that this does not lose information from other columns (if they are important). Note that this keeps first occurrences only, not every transition. In the demo dataset we keep only the first row for each value of the cyl variable.
mtcarsSubset = mtcars[,1:5]
knitr::kable(mtcarsSubset)
#| | mpg| cyl| disp| hp| drat|
#|:-------------------|----:|---:|-----:|---:|----:|
#|Mazda RX4 | 21.0| 6| 160.0| 110| 3.90|
#|Mazda RX4 Wag | 21.0| 6| 160.0| 110| 3.90|
#|Datsun 710 | 22.8| 4| 108.0| 93| 3.85|
#|Hornet 4 Drive | 21.4| 6| 258.0| 110| 3.08|
#|Hornet Sportabout | 18.7| 8| 360.0| 175| 3.15|
#|Valiant | 18.1| 6| 225.0| 105| 2.76|
#|Duster 360 | 14.3| 8| 360.0| 245| 3.21|
#|Merc 240D | 24.4| 4| 146.7| 62| 3.69|
#|Merc 230 | 22.8| 4| 140.8| 95| 3.92|
#|Merc 280 | 19.2| 6| 167.6| 123| 3.92|
#|Merc 280C | 17.8| 6| 167.6| 123| 3.92|
#|Merc 450SE | 16.4| 8| 275.8| 180| 3.07|
#|Merc 450SL | 17.3| 8| 275.8| 180| 3.07|
#|Merc 450SLC | 15.2| 8| 275.8| 180| 3.07|
#|Cadillac Fleetwood | 10.4| 8| 472.0| 205| 2.93|
#|Lincoln Continental | 10.4| 8| 460.0| 215| 3.00|
#|Chrysler Imperial | 14.7| 8| 440.0| 230| 3.23|
#|Fiat 128 | 32.4| 4| 78.7| 66| 4.08|
#|Honda Civic | 30.4| 4| 75.7| 52| 4.93|
#|Toyota Corolla | 33.9| 4| 71.1| 65| 4.22|
#|Toyota Corona | 21.5| 4| 120.1| 97| 3.70|
#|Dodge Challenger | 15.5| 8| 318.0| 150| 2.76|
#|AMC Javelin | 15.2| 8| 304.0| 150| 3.15|
#|Camaro Z28 | 13.3| 8| 350.0| 245| 3.73|
#|Pontiac Firebird | 19.2| 8| 400.0| 175| 3.08|
#|Fiat X1-9 | 27.3| 4| 79.0| 66| 4.08|
#|Porsche 914-2 | 26.0| 4| 120.3| 91| 4.43|
#|Lotus Europa | 30.4| 4| 95.1| 113| 3.77|
#|Ford Pantera L | 15.8| 8| 351.0| 264| 4.22|
#|Ferrari Dino | 19.7| 6| 145.0| 175| 3.62|
#|Maserati Bora | 15.0| 8| 301.0| 335| 3.54|
#|Volvo 142E | 21.4| 4| 121.0| 109| 4.11|
knitr::kable(mtcarsSubset[!duplicated(mtcarsSubset$cyl),])
#| | mpg| cyl| disp| hp| drat|
#|:-----------------|----:|---:|----:|---:|----:|
#|Mazda RX4 | 21.0| 6| 160| 110| 3.90|
#|Datsun 710 | 22.8| 4| 108| 93| 3.85|
#|Hornet Sportabout | 18.7| 8| 360| 175| 3.15|
In the end, I changed the data frame before it is converted into a table.
ReplicationTable %>% mutate(dependent_variable = ifelse(duplicated(dependent_variable), "", dependent_variable))
This replaces every repeated entry in dependent_variable with an empty string, keeping only the first occurrence of each value. This also works on grouped data frames.
I have a SQL table with these columns and want to import the data below into it, while removing the hyphens under the column headers and the trailing delimiter. Please help.
timestamp |cp |type |count |fail_count |succ |fail |
-------------------|---|--------|-----------|-----------|-----------|-----------|
2014.12.15 00:00:00| 1| 5| 5| 0| 143| 0|
2014.12.15 01:00:00| 1| 5| 30| 0| 945| 0|
2014.12.15 02:00:00| 1| 5| 30| 0| 1055| 0|
2014.12.15 03:00:00| 1| 5| 24| 0| 816| 0|
2014.12.15 04:00:00| 1| 5| 28| 0| 882| 0|
2014.12.15 05:00:00| 1| 5| 6| 0| 155| 0|
2014.12.15 06:00:00| 1| 5| 12| 0| 236| 0|
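One possible way to clean the file before loading it is sketched below in plain Python. It assumes the input looks exactly like the sample above: pipe-delimited, one separator line made only of hyphens and pipes, and a trailing pipe on every line. The function name parse_pipe_table is hypothetical, and the actual insert into the database is left out because the target system is not stated.

```python
import re


def parse_pipe_table(text):
    """Split pipe-delimited text into (header, records), dropping the
    hyphen separator line and the trailing delimiter on each line."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if re.fullmatch(r"[-|]+", line):
            continue  # skip the line of hyphens under the header
        if line.endswith("|"):
            line = line[:-1]  # drop the last delimiter
        rows.append([cell.strip() for cell in line.split("|")])
    return rows[0], rows[1:]


sample = """timestamp |cp |type |count |fail_count |succ |fail |
-------------------|---|--------|-----------|-----------|-----------|-----------|
2014.12.15 00:00:00| 1| 5| 5| 0| 143| 0|
2014.12.15 01:00:00| 1| 5| 30| 0| 945| 0|
"""
header, records = parse_pipe_table(sample)
print(header)
for record in records:
    print(record)
```

The cleaned records are lists of strings in column order, so they can be written to a CSV file or passed to whatever bulk-load mechanism the database provides; type conversion (for example of the timestamp) would still need to be handled there.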