R 2.15.0 lm() on Windows 6.1.7601 - Subsetting a data frame: error received when labeling columns

Purpose: Subset a data frame into 2 columns that are labeled with new names, for example:
Age Height
1 65 183
2 73 178
[data1[dataset1$Age>50 | dataset1$Height>140,], c("Age","Cm")]
# Error: unexpected ',' in "data1[data1$Age>50 | data1$Height>140,],"
What I've tried:
data1[dataset1$Age>50 | dataset1$Height>140,] #This doesn't organize results in columns
data1[dataset1$Age>50 | dataset1$Height>140,], c("Age","Cm") #Returns same error
I can't get the columns to be organized side-by-side with the labels in c("label1", "label2"). Thanks for your help! New to R and learning it alongside biostats.

If I understood correctly, the subset function may help:
dataset1 <- data.frame(
age=c(44,77,21,55,66,90,23,54,31),
height=c(144,177,121,155,166,190,123,154,131)
)
# subset() already returns a data frame and looks column names up inside dataset1
data1 <- subset(dataset1, age > 50 | height > 140)
colnames(data1) <- c("Age", "Height")

I may have missed what you were trying to do; a bit more reproducible data would help, I think.
Nevertheless, I had a go:
dataset1 = data.frame(cbind((35:75),(135:175)))
colnames(dataset1) = c("Age","Height")
Age Height
35 135
36 136
37 137
38 138
39 139
40 140
41 141
42 142
43 143
44 144
and subset
data1 = dataset1[dataset1$Age>50 | dataset1$Height>140,]
colnames(data1) = c("Age","Cm")
Age Cm
41 141
42 142
43 143
44 144
45 145
46 146
47 147
48 148
49 149
50 150
My apologies if I missed what you wanted but to me it wasn't very clear.
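The error in the question arises because the second c(...) sits outside the square brackets, and even inside them the column slot of [rows, cols] can only select existing columns, not rename them. A minimal sketch of one way to filter and relabel in a single step, assuming base R's setNames(); the third row (40, 120) is a hypothetical addition so that the filter visibly drops something:

```r
# The first two rows come from the question; the third is hypothetical.
dataset1 <- data.frame(Age = c(65, 73, 40), Height = c(183, 178, 120))

# Logical row filter: keep rows where either condition holds.
rows <- dataset1$Age > 50 | dataset1$Height > 140

# Select the rows and columns first, then attach the new labels.
data1 <- setNames(dataset1[rows, c("Age", "Height")], c("Age", "Cm"))
print(data1)
```

Selecting and renaming are kept as two separate operations here; that is the main design point, since indexing alone never changes column names.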

Related

creating a two-way table with totals in R

I was wondering if there is an easy way to create a table that has the columns as well as row totals?
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
colnames(smoke) <- c("High","Low","Middle")
rownames(smoke) <- c("current","former","never")
smoke <- as.table(smoke)
I thought this would be super easy, but the solutions I found until now seem to be pretty complicated, involving lapply and rbind. However, this seems like such a trivial task that there must be some easier way?
desired results:
> smoke
High Low Middle TOTAL
current 51 43 22 116
former 92 28 21 141
never 68 22 9 99
TOTAL 211 93 52 356
addmargins(smoke)
addmargins is in the stats package.
You can use adorn_totals from janitor :
library(janitor)
library(magrittr)
smoke %>%
as.data.frame.matrix() %>%
tibble::rownames_to_column() %>%
adorn_totals(name = 'TOTAL') %>%
adorn_totals(name = 'TOTAL', where = 'col')
# rowname High Low Middle TOTAL
# current 51 43 22 116
# former 92 28 21 141
# never 68 22 9 99
# TOTAL 211 93 52 356
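If the margin labels need to read TOTAL without an extra package, the totals can also be attached by hand. The following is only a sketch in base R, using the matrix from the question; the names with_totals and margins are made up for illustration, and addmargins() is called at the end purely as a cross-check:

```r
# Data from the question, with dimnames set at construction time.
smoke <- matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9), ncol = 3, byrow = TRUE,
                dimnames = list(c("current", "former", "never"),
                                c("High", "Low", "Middle")))

# Append a TOTAL column of row sums, then a TOTAL row of column sums.
# Doing the column first means the new row also gets the grand total.
with_totals <- cbind(smoke, TOTAL = rowSums(smoke))
with_totals <- rbind(with_totals, TOTAL = colSums(with_totals))
print(with_totals)

# Cross-check against addmargins(), which labels its margins "Sum".
margins <- addmargins(as.table(smoke))
print(margins)
```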

rowSums across a specific row in a matrix

final.marks
# raj sanga rohan rahul
#physics 45 43 44 49
#chemistry 47 45 48 47
#total 92 88 92 96
This is the matrix I have. Now I want to find the total for each subject separately, across the respective subject rows, and add the totals to the above matrix as a new 5th column. However, my code, i.e. class.marks.chemistry <- rowSums(final.marks[2,]), keeps producing this error:
Error in rowSums(final.marks[2, ]) :
'x' must be an array of at least two dimensions
Can you please help me solve it? I am very new to R and have no scripting or programming background.
Do you mean this?
# Sample data
df <- read.table(text =
" raj sanga rohan rahul
physics 45 43 44 49
chemistry 47 45 48 47
total 92 88 92 96", header = TRUE)
# Add column total with row sum
df$total <- rowSums(df)
df
# raj sanga rohan rahul total
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368
rowSums also works if df is a matrix instead of a data.frame, although the new column then has to be attached with cbind rather than with $.
If you look at ?rowSums you can see that the x argument needs to be
an array of two or more dimensions, containing numeric,
complex, integer or logical values, or a numeric data frame.
So in your case we must pass the entire data.frame (or matrix) as an argument, rather than a single row (as you did): final.marks[2, ] is dropped to a plain vector, which has no dimensions.
Another option would be to use addmargins on a matrix
addmargins(as.matrix(df), 2)
# raj sanga rohan rahul Sum
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368
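To make the cause of the original error concrete, here is a small sketch in base R using the matrix from the question. The variable names chemistry_total and chemistry_kept are made up for illustration; drop = FALSE is the standard indexing argument that keeps a one-row result two-dimensional:

```r
# The matrix from the question, including its existing "total" row.
final.marks <- matrix(c(45, 43, 44, 49,
                        47, 45, 48, 47,
                        92, 88, 92, 96),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("physics", "chemistry", "total"),
                                      c("raj", "sanga", "rohan", "rahul")))

# Indexing one row drops the result to a plain vector with no dimensions,
# which is why rowSums() rejects it.
is.null(dim(final.marks[2, ]))

# Option 1: a single row is just a vector, so sum() is enough.
chemistry_total <- sum(final.marks["chemistry", ])

# Option 2: keep the matrix shape with drop = FALSE, then rowSums() works.
chemistry_kept <- rowSums(final.marks[2, , drop = FALSE])

# Totals for every row at once, attached as a 5th column.
final.marks <- cbind(final.marks, total = rowSums(final.marks))
print(final.marks)
```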

R - apriori() not recognising lhs from numerical transaction

I am having real trouble getting my data to produce any rules using the arules package. I have managed to get 100000 rows of transaction data and in SAS the rules are shown. I cannot get it to work in R.
[5] {19,29,40,119,134}
[6] {24,40,45,67,141}
[7] {17,18,57,74,412}
[8] {16,79,90,150,498}
[9] {18,57,111,161,267}
[10] {11,75,131,427,429}
[11] {57,99,111,143,236}
The transactions data looks like this and originally came from a table where all the numbers were separate.
arules <- read.transactions('tid.csv', format = c("basket", "single"),
sep=",")
rules <- apriori(arules,parameter = list(supp = 0.1, conf = 0.1, target =
"rules"))
summary(rules)
For reference the supports and confidence settings make no difference. Sometimes I get this when I inspect the rules.
lhs rhs support confidence lift count
[1] {} => {8,11,96,112,432} 9.710623e-06 9.710623e-06 1 1
[2] {} => {62,134,222,254,412} 9.710623e-06 9.710623e-06 1 1
Any idea why apriori can't separate the items in the transaction? Does this need to be recast into long format and, if so, how would I do that from this data frame?
V2 V3 V4 V5 V6
8 11 96 112 432
10 35 39 76 119
18 38 68 141 267
29 36 57 61 63
19 29 40 119 134
24 40 45 67 141
17 18 57 74 412
If I understood you correctly then you should try this and let us know if it helped.
library(arules)
library(arulesViz)
#sample data
df <- read.table(text="V2 V3 V4 V5 V6
8 11 96 112 432
10 35 39 76 119
18 38 68 141 267
29 36 57 61 63
19 29 40 119 134
24 40 45 67 141
17 18 57 74 412", header=TRUE)
write.csv(df, "apriori_demo.csv", row.names = FALSE)
#convert sample data into transactions format for apriori algorithm
trx <- read.transactions("apriori_demo.csv", format="basket", sep=",", skip=1)
#apriori rules
apriori_rule <- apriori(trx, parameter = list(supp = 0.1, conf = 0.1))
#obviously you need to have better parameters compared to the one you have used in your post!
inspect(apriori_rule)
plot(apriori_rule, method="graph")
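On the long-format part of the question: the answer above sidesteps it by reading the file in basket format, but the recast itself can be sketched in base R. This is only an illustration on the first three rows of the sample; the column names tid and item are assumptions, chosen because a two-column layout of transaction id and item id is what read.transactions expects when format = "single" is used with its cols argument.

```r
# First three rows of the sample from the question.
df <- read.table(text = "V2 V3 V4 V5 V6
8 11 96 112 432
10 35 39 76 119
18 38 68 141 267", header = TRUE)

# unlist() walks the data frame column by column, so the row ids
# have to be repeated once per column to stay aligned with the items.
long <- data.frame(
  tid  = rep(seq_len(nrow(df)), times = ncol(df)),
  item = unlist(df, use.names = FALSE)
)

# Group the items of each transaction together.
long <- long[order(long$tid), ]
rownames(long) <- NULL
print(long)
```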

R sum multiple columns with multiple row

So I have this data:
10 21 22 23 23 43
20 12 26 43 23 65
21 54 64 73 25 75
My expected outcome is:
142
189
312
I tried to use:
df = data.matrix(df)
df = colSums(df)
df = as.data.frame(df)
However, the sum of values are wrong. I would like to know how to improve or correct this solution?
We can use rowSums
rowSums(df)
#[1] 142 189 312
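Your data is probably stored as factors. You must convert it to numeric using as.numeric(as.character()).
In your situation I suggest converting column by column (calling as.character() on a whole data frame row would return the internal factor codes rather than the printed values):
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
rowSums(df)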
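The factor explanation can be checked on the data from the question. The sketch below assumes the columns were read in as factors (forced here with colClasses = "factor", since the question does not show how the data was loaded) and contrasts the internal codes with the printed values:

```r
# Data from the question, deliberately read in as factors.
df <- read.table(text = "10 21 22 23 23 43
20 12 26 43 23 65
21 54 64 73 25 75", colClasses = "factor")

# as.numeric() on a factor returns its internal level codes,
# not the numbers that are printed.
codes <- as.numeric(df$V1)

# Going through as.character() first recovers the printed values.
# df[] <- keeps the data frame structure while replacing every column.
df[] <- lapply(df, function(x) as.numeric(as.character(x)))

row_totals <- rowSums(df)
print(row_totals)
```

If the columns turn out to be numeric already, the conversion is harmless and rowSums(df) alone gives the expected 142, 189 and 312.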

Loop Linear Regression

As a beginner in R I have a, probably, simple question.
I have a linear regression with this specification:
X1_t = X1_(t-h) + X2_(t-h)
where h is equal to 1, 2, 3, 4, 5.
For example, when h=1 I run this code:
Modelo11 <- dynlm(X1 ~ L(X1,1) + L(X2, 1)-1, data = GDP)
It's a simple regression.
I want to implement a function that gives me the five linear regressions (h=1,2,3,4 and 5) with and without HAC (heteroscedasticity and autocorrelation consistent) estimation:
I did this, and it didn't work:
for(h in 1:5){
Modelo1[h] <- dynlm(GDPTrimestralemT ~ L(SpreademT,h) + L(GDPTrimestralemT, h)-1, data = MatrizDadosUS)
coeftest(Modelo1[h], df = Inf, vcov = parzenHAC)
return(list(summary(Modelo1[h])))
}
One of the error messages is:
number of items to replace is not a multiple of replacement length
This is my data.frame:
GDP <- data.frame(data )
GDP
X1 X2
1 0.542952690 0.226341364
2 0.102328393 0.743360185
3 0.166345969 0.186533485
4 1.406733422 1.392420181
5 -0.469811005 -0.114609464
6 -0.509268267 0.687555461
7 1.470439930 0.298655018
8 1.046456428 -1.056387597
9 -0.492462197 -0.530284962
10 -0.516065519 0.645957530
11 0.624638996 1.044731264
12 0.213616470 -1.652979785
13 0.669747432 1.398602289
14 0.552089131 -0.821013792
15 0.452715216 1.420094663
16 -0.892063248 -1.436600779
17 1.429284965 0.559738610
18 0.853740565 -0.898976767
19 0.741864168 1.352012831
20 0.171494650 1.704764705
21 0.422326351 -0.267064235
22 -1.261643503 -2.090694608
23 -1.321086283 -0.273954212
24 0.365226000 1.965167113
25 -0.080888690 -0.594498893
26 -0.183293801 -0.483053404
27 -1.033792032 0.586491772
28 0.718322432 1.776210145
29 -2.822693790 -0.731509917
30 -1.251740437 -1.918124078
31 1.184256949 -0.016548037
32 2.255202675 0.303438286
33 -0.930446147 0.803126180
34 -1.691383225 -0.157839283
35 -1.081643279 -0.006652717
36 1.034162006 -1.970063305
37 -0.716827488 0.306792930
38 0.098471514 0.338333164
39 0.343536547 0.389775011
40 1.442117465 -0.668885360
41 0.095131066 -0.298356861
42 0.222524607 0.291485267
43 -0.499969717 1.308312472
44 0.588162304 0.026539575
45 0.581215173 0.167710855
46 0.629343124 -0.052835206
47 0.811618963 0.716913172
48 1.463610069 -0.356369304
49 -2.000576321 1.226446201
50 1.278233553 0.313606888
51 -0.700373666 0.770273988
52 -1.206455648 0.344628878
53 0.024602262 1.001621886
54 0.858933385 -0.865771777
55 -1.592291995 -0.384908852
56 -0.833758365 -1.184682199
57 -0.281305858 2.070391729
58 -0.122848757 -0.308397782
59 -0.661013984 1.590741535
60 1.887869805 -1.240283364
61 -0.313677463 -1.393252994
62 1.142864110 -1.150916732
63 -0.633380499 -0.223923970
64 -0.158729527 -1.245647224
65 0.928619010 -1.050636078
66 0.424317087 0.593892028
67 1.108704956 -1.792833100
68 -1.338231248 1.138684394
69 -0.647492569 0.181495183
70 0.295906675 -0.101823172
71 -0.079827607 0.825158278
72 0.050353111 -0.448453121
73 0.129068772 0.205619797
74 -0.221450137 0.051349511
75 -1.300967949 1.639063824
76 -0.861963677 1.273104220
77 -1.691001610 0.746514122
78 0.365888734 -0.055308006
79 1.297349754 1.146102001
80 -0.652382297 -1.095031447
81 0.165682952 -0.012926971
82 0.127996446 0.510673745
83 0.338743162 -3.141650682
84 -0.266916587 -2.483389321
85 0.148135154 -1.239997153
86 1.256591385 0.051984536
87 -0.646281986 0.468210275
88 0.180472423 0.393014848
89 0.231892902 -0.545305005
90 -0.709986273 0.104969765
91 1.231712844 -1.703489840
92 0.435378714 0.876505107
93 -1.880394798 -0.885893722
94 1.083580732 0.117560662
95 -0.499072654 -1.039222894
96 1.850756855 -1.308752222
97 1.653952857 0.440405804
98 -1.057618294 -1.611779530
99 -0.021821282 -0.807071503
100 0.682923562 -2.358596342
101 -1.132293845 -1.488806929
102 0.319237353 0.706203968
103 -2.393105781 -1.562111727
104 0.188653972 -0.637073832
105 0.667003685 0.047694037
106 -0.534018861 1.366826933
107 -2.240330371 -0.071797320
108 -0.220633546 1.612879694
109 -0.022442941 1.172582601
110 -1.542418139 0.635161458
111 -0.684128812 -0.334973482
112 0.688849615 0.056557966
113 0.848602803 0.785297518
114 -0.874157558 -0.434518305
115 -0.404999060 -0.078893114
116 0.735896917 1.637873669
117 -0.174398836 0.542952690
118 0.222418628 0.102328393
119 0.419461884 0.166345969
120 -0.042602368 1.406733422
121 2.135670836 -0.469811005
122 1.197644287 -0.509268267
123 0.395951293 1.470439930
124 0.141327444 1.046456428
125 0.691575897 -0.492462197
126 -0.490708151 -0.516065519
127 -0.358903359 0.624638996
128 -0.227550909 0.213616470
129 -0.766692832 0.669747432
130 -0.001690915 0.552089131
131 -1.786701123 0.452715216
132 -1.251495762 -0.892063248
133 1.123462446 1.429284965
134 0.237862653 0.853740565
Thanks.
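Your variable Modelo1 is a vector, which cannot store model objects: Modelo1[h] <- tries to put a whole fitted model (itself a list of many components) into a single slot, hence the "number of items to replace" message. When Modelo1 is a list and you assign with double brackets, it should work. Note also that return() can only be used inside a function, so it has to be removed from the loop.
library(dynlm)
df <- data.frame(a = rnorm(50), b = rnorm(50))
models <- list()  # one fitted model per list element
for (h in 1:5) {
  models[[h]] <- dynlm(a ~ L(a, h) + L(b, h) - 1, data = df)
}
To get the summary you have to access the single list elements. For example:
summary(models[[1]])
Edit in response to Richard Scriven's comment:
The most efficient way to get all summaries would be:
lapply(models, summary)
This applies the summary function to each element of the list and returns a list with the results.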
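For readers without dynlm installed, the same store-in-a-list pattern can be sketched with base R only. This is a stand-in, not the dynlm approach from the answer: the lags are built by hand, lm() replaces dynlm(), the data is simulated, and the names d, fits and coefs are made up for illustration.

```r
set.seed(1)
d <- data.frame(a = rnorm(50), b = rnorm(50))
n <- nrow(d)

# Pre-allocate a list: each element can hold a complete fitted model.
fits <- vector("list", 5)

for (h in 1:5) {
  # Hand-built lags: the response starts at row h + 1,
  # the regressors are the same series shifted back by h rows.
  lagged <- data.frame(
    y     = d$a[(h + 1):n],
    a_lag = d$a[1:(n - h)],
    b_lag = d$b[1:(n - h)]
  )
  # Double brackets store the whole model object in slot h.
  fits[[h]] <- lm(y ~ a_lag + b_lag - 1, data = lagged)
}

# One row of coefficients per lag h.
coefs <- t(sapply(fits, coef))
print(coefs)
```

Collecting the coefficients into a matrix is one convenient way to compare the five fits side by side; lapply(fits, summary) works on this list as well.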
