reshape error - invalid 'row.names' length - r

I have the following data frame (in wide form), "st_all", which contains two variables I wish to reshape ("P" and "PLC"). The subject id is "g_id".
g_id study condition sample PLC1 PLC2 PLC3 PLC4 PLC5 PLC6 PLC7 PLC8 PLC9 PLC10 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
1 1 1 1 1 20 20 20 50 50 20 30 20 50 50 1 2 2 1 2 2 1 1 1 1
2 2 1 1 1 60 70 50 70 60 60 60 70 60 50 1 2 1 1 2 2 1 1 1 1
3 3 1 1 1 80 50 55 58 70 50 80 80 60 65 1 2 2 1 2 2 1 1 1 1
4 4 1 1 1 89 51 59 62 72 60 86 80 61 54 1 1 2 1 2 2 1 1 1 1
5 5 1 1 1 90 50 60 70 80 50 90 80 60 50 1 1 1 1 2 2 1 1 1 1
6 6 1 1 1 95 50 60 100 95 60 50 60 60 55 1 2 2 1 2 2 1 1 1 1
To do so I ran the following code:
reshape(st_all,
idvar="g_id",
direction="long",
varying=list(c(5:14),c(15:24)),
v.names=c("PLC","P")
)
and I get the following error:
Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L], :
invalid 'row.names' length
I have searched for an answer to this, but I have not found one.
Thanks in advance.

As noted in the comments, you'll have problems with the reshape function when your data is a tbl.
Use as.data.frame first:
reshape(as.data.frame(st_all),
idvar = "g_id",
direction = "long",
varying = list(c(5:14), c(15:24)),
v.names = c("PLC","P"))

Related

How to summarize one variable through group_by on another variable, so the summarized value is attached to each row of the grouped variable

I'm trying to summarize the counts of one variable by grouping on another, so that the total count is attached to each row of the grouped variable.
I want to sum the "emp" column within each fam_id, so that total_employed reflects the number of employed people in the family for every row with the same fam_id.
acs_5years
fam_id emp ins age
33 1 1 45
33 0 1 23
44 1 1 19
44 1 0 26
44 1 0 54
44 1 0 50
77 1 1 33
77 1 1 38
77 1 1 44
88 1 0 65
88 0 0 90
should look like:
fam_id emp ins age total_employed
33 1 1 45 1
33 0 1 23 1
44 1 1 19 4
44 1 0 26 4
44 1 0 54 4
44 1 0 50 4
77 1 1 33 3
77 1 1 38 3
77 1 1 44 3
88 1 0 65 1
88 0 0 90 1
I've tried the following code:
sample_grouping <- acs_5years %>% group_by(fam_id) %>%
summarize(total_count=n(),.groups = 'drop') %>%
as.data.frame()
sample_grouping
#######
sample_2 <- acs_5years %>% group_by(fam_id) %>%
summarize(total_count=(emp))
sample_2
I'm not sure I'm getting correct results.
Any help or suggestions would be greatly appreciated, thanks in advance!
The emp values for fam_id 44 differ between your sample data and your expected output, and your code does not match your data, but you may try:
df %>%
group_by(fam_id) %>%
mutate(total_employed = sum(emp))
fam_id emp ins age total_employed
<int> <int> <int> <int> <int>
1 33 1 1 45 1
2 33 0 1 23 1
3 44 1 1 19 3
4 44 1 0 26 3
5 44 1 0 54 3
6 44 0 0 50 3
7 77 1 1 33 3
8 77 1 1 38 3
9 77 1 1 44 3
10 88 1 0 65 1
11 88 0 0 90 1
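For what it's worth, a base-R sketch of the same idea (assuming the data frame is called acs_5years, as in the question); ave() applies a function within each group and returns a vector of the original length:
# sum emp within each fam_id and write the group total back onto every row
acs_5years$total_employed <- ave(acs_5years$emp, acs_5years$fam_id, FUN = sum)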

How to get p values for odds ratios from an ordinal regression in r

I am trying to get the p values for my odds ratios from an ordinal regression in R.
I previously constructed my p values on the log odds like this:
scm <- polr(finaloutcome ~ Size_no + Hegemony + Committee, data = data3, Hess = TRUE)
(ctable <- coef(summary(scm)))
## calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
## combined table
(ctable <- cbind(ctable, "p value" = p))
I created my odds ratios like this:
ci <- confint.default(scm)
exp(coef(scm))
## OR and CI
exp(cbind(OR = coef(scm), ci))
However, I am now unsure how to get the p values for the odds ratios. Using the previous method I tried:
(ctable1 <- exp(coef(scm)))
p1 <- pnorm(abs(ctable1[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p1))
However, I get the error: Error in ctable1[, "t value"] : incorrect number of dimensions
Odds ratio output sample:
        Size        Hegem    Committee
9.992240e-01 6.957805e-02 1.204437e-01
Data sample:
finaloutcome Size_no Committee Hegemony
1 3 54 2 0
2 2 127 3 0
3 2 127 3 0
4 2 22 1 1
5 2 193 4 1
6 2 54 2 0
7 NA 11 1 1
8 3 54 2 0
9 3 22 1 1
10 2 53 3 1
11 2 53 3 1
12 2 53 3 1
13 2 53 3 1
14 2 53 3 1
15 2 53 3 1
16 2 120 3 0
17 2 120 3 0
18 1 22 1 1
19 1 22 1 1
20 2 193 4 1
21 2 193 4 1
22 2 193 4 1
23 2 12 4 1
24 2 35 1 1
25 1 193 4 1
26 1 164 4 1
27 1 12 4 1
28 2 12 4 1
29 2 193 4 1
30 2 54 2 0
31 2 193 4 1
32 2 193 4 1
33 2 54 2 0
34 2 12 4 1
35 2 22 1 1
36 4 53 3 1
37 2 35 1 1
38 1 193 4 1
39 5 54 2 0
40 7 164 4 1
41 5 54 2 0
42 1 12 4 1
43 7 193 4 1
44 2 193 4 1
45 2 193 4 1
46 2 193 4 1
47 2 193 4 1
48 2 193 4 1
49 2 12 4 1
50 2 22 1 1
51 2 12 4 1
52 2 12 4 1
53 6 13 1 1
54 6 13 1 1
55 6 13 1 1
56 6 12 4 1
57 2 193 4 1
58 3 12 4 1
59 1 12 4 1
60 1 12 4 1
61 8 35 1 1
62 2 193 4 1
63 8 35 1 1
64 6 30 2 1
65 8 12 4 1
66 4 12 4 1
67 5 30 2 1
68 5 54 2 0
69 7 12 4 1
70 5 12 4 1
71 5 54 2 0
72 5 193 4 1
73 5 193 4 1
74 5 54 2 0
75 5 54 2 0
76 1 11 1 1
77 3 22 1 1
78 3 12 4 1
79 6 12 4 1
80 2 22 1 1
81 8 193 4 1
82 8 193 4 1
83 4 193 4 1
84 2 193 4 1
85 2 193 4 1
86 2 193 4 1
87 2 193 4 1
88 2 193 4 1
89 2 193 4 1
90 2 193 4 1
91 2 193 4 1
92 2 193 4 1
93 8 193 4 1
94 6 12 4 1
95 5 12 4 1
96 5 12 4 1
97 5 12 4 1
98 5 12 4 1
99 5 12 4 1
100 5 12 4 1
I usually use lm or glm to create my model (mdl <- lm(…) or mdl <- glm(…)) and then call summary on the object to see these values. Beyond that, you can use the yardstick and broom packages. I recommend the book R for Data Science; it has a great explanation of modelling with the tidymodels packages.
I went through the same difficulty.
I finally used the function tidy from the broom package: https://broom.tidymodels.org/reference/tidy.polr.html
library(broom)
tidy(scm, p.values = TRUE)
This does not yet work if you have categorical variables with more than two levels, or missing values.
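On the original question of p values for the odds ratios: exponentiation is monotone, so the p value for an odds ratio is the same as the p value already computed for the corresponding log-odds coefficient, and nothing new needs to be calculated. A minimal sketch, reusing the scm model from the question (column names as returned by MASS::polr):
ctable <- coef(summary(scm))                                  # columns: Value, Std. Error, t value
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2  # two-sided normal approximation
ci <- confint.default(scm)                                    # Wald CIs on the log-odds scale
# bind the exponentiated estimates and CIs to the existing p values
cbind(OR = exp(coef(scm)), exp(ci), "p value" = p[names(coef(scm))])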

How to swap/shuffle values within a column in R?

I want to anonymize data using cell swapping. Therefore I want to conditionally swap values within a column.
My data looks like:
Sex Age Houeshold_size
0 95 2
0 95 3
1 90 1
1 90 5
1 45 1
1 45 1
1 34 1
1 34 1
1 34 1
1 34 1
I want to swap values so that everyone above a certain age, in this case 90 or older, has a household size of 1. So my outcome should look like:
Sex Age Houeshold_size
0 95 1
0 95 1
1 90 1
1 90 1
1 45 1
1 45 1
1 34 2
1 34 3
1 34 5
1 34 1
It is more that I want to know how to conditionally swap data rather than just solve this example, since it's only a fraction of my data.
Thanks for helping me out, cheers.
You can use the following:
#Get the index where Age is 90 or higher
inds <- which(df$Age >= 90)
#replace `Houeshold_size` where age is less than 90 with that of inds
df$Houeshold_size[sample(which(df$Age < 90), length(inds))] <- df$Houeshold_size[inds]
#Change household size of inds to 1
df$Houeshold_size[inds] <- 1
# Sex Age Houeshold_size
#1 0 95 1
#2 0 95 1
#3 1 90 1
#4 1 90 1
#5 1 45 1
#6 1 45 3
#7 1 34 2
#8 1 34 1
#9 1 34 1
#10 1 34 5
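One small caveat worth adding: sample() picks the receiving rows at random, so the swapped result will differ between runs. Fixing the seed first makes the anonymisation reproducible; a minimal sketch, with an arbitrary seed:
set.seed(42)  # any fixed value; chosen here only for illustration
inds <- which(df$Age >= 90)
df$Houeshold_size[sample(which(df$Age < 90), length(inds))] <- df$Houeshold_size[inds]
df$Houeshold_size[inds] <- 1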

dplyr append group id sequence?

I have a dataset like the one below; it was created with dplyr and is currently grouped by 'Stage'. How do I generate a sequence based on the unique, incremental values of Stage, starting from 1 (e.g. row 4 should be 1, and rows 1 and 8 should be 4)?
X Y Stage Count
1 61 74 1 2
2 58 56 2 1
3 78 76 0 1
4 100 100 -2 1
5 89 88 -1 1
6 47 44 3 1
7 36 32 4 1
8 75 58 1 2
9 24 21 5 1
10 12 11 6 1
11 0 0 10 1
I tried the approach in the post below, but it didn't work.
how to mutate a column with ID in group
Thanks.
Here is another dplyr solution:
> df
# A tibble: 11 × 4
X Y Stage Count
<dbl> <dbl> <dbl> <dbl>
1 61 74 1 2
2 58 56 2 1
3 78 76 0 1
4 100 100 -2 1
5 89 88 -1 1
6 47 44 3 1
7 36 32 4 1
8 75 58 1 2
9 24 21 5 1
10 12 11 6 1
11 0 0 10 1
To create the group ids, use dplyr's group_indices():
i <- df %>% group_indices(Stage)
df %>% mutate(group = i)
# A tibble: 11 × 5
X Y Stage Count group
<dbl> <dbl> <dbl> <dbl> <int>
1 61 74 1 2 4
2 58 56 2 1 5
3 78 76 0 1 3
4 100 100 -2 1 1
5 89 88 -1 1 2
6 47 44 3 1 6
7 36 32 4 1 7
8 75 58 1 2 4
9 24 21 5 1 8
10 12 11 6 1 9
11 0 0 10 1 10
It would be great if you could pipe both commands together. But, as of this writing, it doesn't appear to be possible.
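As an aside, newer dplyr versions (1.0.0 and later, if I remember correctly) do allow a single pipe via cur_group_id(); a minimal sketch, assuming the same df:
library(dplyr)
df %>%
  group_by(Stage) %>%                 # groups are ordered by Stage, so ids run 1, 2, ... in Stage order
  mutate(group = cur_group_id()) %>%
  ungroup()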
After some experimenting, I did %>% ungroup() %>% mutate(test = rank(Stage)), which yields the following result.
X Y Stage Count test
1 100 100 -2 1 1.0
2 89 88 -1 1 2.0
3 78 76 0 1 3.0
4 61 74 1 2 4.5
5 75 58 1 2 4.5
6 58 56 2 1 6.0
7 47 44 3 1 7.0
8 36 32 4 1 8.0
9 24 21 5 1 9.0
10 12 11 6 1 10.0
11 0 0 10 1 11.0
I don't know whether this is the best approach, feel free to comment....
Update: Another approach, assuming the data is called Node:
lvs <- levels(as.factor(Node$Stage))
Node %>% mutate(Rank = match(Stage,lvs))
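A closely related one-liner (my own suggestion, not from the post above) is dplyr::dense_rank, which returns the 1-based index of each distinct Stage value without building a factor first:
library(dplyr)
Node %>% mutate(Rank = dense_rank(Stage))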

Calculate variable based on criteria in r

How can I add a new column to my data frame that takes some criteria into consideration? Starting from:
ID AGE PERNO
1 30 1
1 25 2
2 25 1
2 24 2
2 3 3
3 65 1
3 55 2
to end with a table like:
ID AGE PERNO AGE_HEAD
1 30 1 30
1 25 2 30
2 25 1 25
2 24 2 25
2 3 3 25
3 65 1 65
3 55 2 65
Essentially, I want the AGE of PERNO 1 repeated in all rows with the same ID.
Plyr solution:
library(plyr)
ddply(df,.(ID),transform,AGE_HEAD=head(AGE,1))
OR
ddply(df,.(ID),transform,AGE_HEAD=AGE[PERNO==1])
ID AGE PERNO AGE_HEAD
1 1 30 1 30
2 1 25 2 30
3 2 25 1 25
4 2 24 2 25
5 2 3 3 25
6 3 65 1 65
7 3 55 2 65
data.table solution:
library(data.table)
DT<-data.table(df)
DT[, AGE_HEAD := AGE[PERNO==1], by="ID"]
ID AGE PERNO AGE_HEAD
1: 1 30 1 30
2: 1 25 2 30
3: 2 25 1 25
4: 2 24 2 25
5: 2 3 3 25
6: 3 65 1 65
7: 3 55 2 65
As far as I understand, you want to pick the value of AGE for each level of ID where PERNO is 1, which in this example happens (by chance) to be the same as taking the maximum value of AGE. If I'm not wrong, this code is what you are after:
> transform(df, AGE_HEAD=rep(df$AGE[df$PERNO==1], rle(df$ID)$lengths))
ID AGE PERNO AGE_HEAD
1 1 30 1 30
2 1 25 2 30
3 2 25 1 25
4 2 24 2 25
5 2 3 3 25
6 3 65 1 65
7 3 55 2 65
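For completeness, a dplyr sketch of the same operation (assuming the same df, and that PERNO == 1 occurs exactly once per ID):
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(AGE_HEAD = AGE[PERNO == 1]) %>%  # take the head-of-household age within each ID
  ungroup()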
