Purpose: I want to repeat in R an analysis I have already done in Python. The code is below; kindly help me write the equivalent code in R.
Question no 1:
For the table below:
           caught            bowled           run out               lbw           stumped
               62                21                 8                 4                 4
caught and bowled        hit wicket
                2                 1
But when I converted it back to a `dataframe` for use with `ggplot`, it comes out as
   A Freq
1  1    1
2  2    1
3  4    2
4  8    1
5 21    1
6 62    1
How do I avoid this? Kindly advise.
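The second output looks like what table() produces when it is applied to the counts themselves rather than to the original values (the count 4 appears twice, hence Freq 2). A minimal sketch, assuming the original column is a character vector here called dismissal_kind (a hypothetical name, with data made up to match the counts shown), tabulates once and converts that result directly:

```r
# Hypothetical data reproducing the counts shown in the question
dismissal_kind <- rep(
  c("caught", "bowled", "run out", "lbw", "stumped",
    "caught and bowled", "hit wicket"),
  times = c(62, 21, 8, 4, 4, 2, 1)
)

# Tabulate once, then convert the table itself to a data frame
counts <- table(dismissal_kind)
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)

# Tabulating the counts a second time gives a "frequency of frequencies"
freq_of_freq <- as.data.frame(table(as.vector(counts)))

print(counts_df)
print(freq_of_freq)
```

counts_df keeps one row per category with its count in Freq, which is the shape ggplot usually expects; freq_of_freq is shown only to illustrate how the unwanted output may have arisen.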
Question no 2:
The Python code is as below:
len(df_warner[df_warner['batsman_runs']==6])
# What is the equivalent R syntax?
df_six <- df_warner2[df_warner2$batsman_runs == 6, ]
nrow(df_six) # worked well
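A shorter route, sketched here on made-up data (the column over and all values are hypothetical), is to sum the logical comparison. One caveat worth knowing: if batsman_runs contains NA, row subsetting with a logical index keeps an all-NA row for each missing value, so nrow() can overcount unless the index is wrapped in which().

```r
# Hypothetical stand-in for df_warner2
df_warner2 <- data.frame(
  over         = 1:8,
  batsman_runs = c(0, 1, 4, 6, 6, NA, 2, 6)
)

# Count matches directly; na.rm = TRUE ignores missing values
n_six_sum <- sum(df_warner2$batsman_runs == 6, na.rm = TRUE)

# Subset-then-count, with which() dropping the NA comparisons
n_six_which <- nrow(df_warner2[which(df_warner2$batsman_runs == 6), ])

# Plain logical subsetting keeps an NA row for the missing value
n_six_plain <- nrow(df_warner2[df_warner2$batsman_runs == 6, ])
```

When the column has no missing values, all three counts agree.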
One day I tried to run my routine cspade sequence mining in R, and it suddenly failed with an error and some very strange output printed to the console. Here is the example code:
library(arulesSequences)
data(zaki)
cspade(zaki, parameter=list(support=0.5))
It prints very long output (even with the option control=list(verbose=F)), followed by an error:
CONF 4 9 2.7 2.5
MINSUPPORT 2 4
MINMAX 1 4
1 SUPP 4
2 SUPP 4
4 SUPP 2
6 SUPP 4
numfreq 4 : 0 SUMSUP SUMDIFF = 0 0
EXTRARYSZ 2465792
OPENED C:\Users\Dawid\AppData\Local\Temp\Rtmp279Wy5\cspade2cd4751e5905.idx
OFF 9 38
Wrote Offt 0.00099802
BOUNDS 1 5
WROTE INVERT 0.000998974
Total elapsed time 0.00299406
MINSUPPORT 2 out of 4 sequences
1 -- 4 4
2 -- 4 4
4 -- 2 2
6 -- 4 4
1 6 -- 3 3
2 6 -- 4 4
4 -> 6 -- 2 2
4 -> 2 6 -- 2 2
1 2 6 -- 3 3
1 2 -- 3 3
4 -> 2 -- 2 2
2 -> 1 -- 2 2
4 -> 1 -- 2 2
6 -> 1 -- 2 2
4 -> 6 -> 1 -- 2 2
2 6 -> 1 -- 2 2
4 -> 2 6 -> 1 -- 2 2
4 -> 2 -> 1 -- 2 2
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Dawid\AppData\Local\Temp\Rtmp279Wy5\cspade2cd4751e5905.out': No such file or directory
It looks like it is printing the mined rules to the console (which has never happened before). It also ends with an error, so I can't save the rules to a variable. Could it be a problem with writing temporary files?
My configuration:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Packages:
arulesSequences_0.2-19
arules_1.6-1
(arulesSequences has a newer version, arulesSequences_0.2-20, but it fails in the same way.)
Thank you!
One workaround is to use the R console, not RStudio.
Well, it should work fine then. I see that more people have the same problem. I have tried reinstalling RStudio together with the packages, and using an older RStudio version, but it didn't work.
I hope it helps, but I would be grateful for a full answer. Thanks!
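To check the temporary-file suspicion raised in the question, a minimal base R sketch could confirm that the session's temporary directory exists and that a file written there can be read back. This is only an assumption about where to look, not a fix for cspade, and the file name pattern "probe" is purely illustrative.

```r
# Locate the session's temporary directory
tmp_dir <- tempdir()
dir_exists <- dir.exists(tmp_dir)

# file.access() returns 0 when the requested access (mode 2 = write) is allowed
dir_writable <- file.access(tmp_dir, mode = 2) == 0

# Write a small file, read it back, then clean up
probe <- tempfile(pattern = "probe", fileext = ".out")
writeLines("test", probe)
round_trip_ok <- identical(readLines(probe), "test")
unlink(probe)

cat("temp dir:", tmp_dir, "\n")
cat("exists:", dir_exists, " writable:", dir_writable,
    " round trip ok:", round_trip_ok, "\n")
```

Running the same lines in both RStudio and the plain R console would show whether the two environments differ on this point.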
Help sought from anyone.
I have a household survey data set named h2004 and would like to create a variable that equals another variable wherever a certain condition is satisfied. Here is a sample of the observations.
cq15 expen
10 0.4616136
10 1.538712
11 2.308068
11 0.384678
12 2.576797822
12 5.5393632
13 5.4624276
14 2.6158104
14 20.157127
and I tried the following command:
h2004$crops[h2004$cq15>=12 & h2004$cq15<=14]=h2004$expen
This produces wrong results in R; I know the correct result from Stata. In the original data set, the above command takes values of 'expen' even where cq15<12 and uses them to replace the observations where cq15>=12 & cq15<=14.
I also tried dplyr's filter, which correctly subsets the data frame, but I don't know how to apply it to a specific variable.
fil<- filter(h2004, cq15>=12 & cq15<=14)
I think my subsetting (cq15>=12 & cq15<=14) is wrong. Please advise. Thanks
The problem is in the command. When the command is executed, the following warning message is issued:
Warning message:
In h2004$crops[h2004$cq15 >= 12 & h2004$cq15 <= 14] = h2004$expen :
number of items to replace is not a multiple of replacement length
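The reason for this is that the LHS of this command selects only the elements satisfying the condition h2004$cq15 >= 12 & h2004$cq15 <= 14, while the RHS gives the complete vector h2004$expen, causing a mismatch in length. R then fills the selected positions with the first elements of h2004$expen in order, which is why values from rows with cq15 < 12 end up in the result.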
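A minimal sketch of that mismatch, using the sample rows from the question (crops_wrong is a hypothetical name, and the warning is suppressed only to keep the example quiet):

```r
# Sample rows from the question
h2004 <- data.frame(
  cq15  = c(10, 10, 11, 11, 12, 12, 13, 14, 14),
  expen = c(0.4616136, 1.538712, 2.308068, 0.384678, 2.576797822,
            5.5393632, 5.4624276, 2.6158104, 20.157127)
)

cond <- h2004$cq15 >= 12 & h2004$cq15 <= 14

n_selected    <- sum(cond)            # positions selected on the LHS
n_replacement <- length(h2004$expen)  # values supplied on the RHS

# The original assignment: the selected rows receive the FIRST values of expen
crops_wrong <- rep(NA_real_, nrow(h2004))
suppressWarnings(crops_wrong[cond] <- h2004$expen)

print(data.frame(h2004, crops_wrong))
```

Here five positions are selected but nine values are supplied, so rows 5 to 9 receive the expen values of rows 1 to 5.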
Solution:
> h2004$crops[h2004$cq15>=12 & h2004$cq15<=14]=h2004$expen[h2004$cq15>=12 & h2004$cq15<=14]
> h2004
cq15 expen crops
1 10 0.4616136 NA
2 10 1.5387120 NA
3 11 2.3080680 NA
4 11 0.3846780 NA
5 12 2.5767978 2.576798
6 12 5.5393632 5.539363
7 13 5.4624276 5.462428
8 14 2.6158104 2.615810
9 14 20.1571270 20.157127
Alternatively:
> indices <- which(h2004$cq15>=12 & h2004$cq15<=14)
> h2004$crops[indices] = h2004$expen[indices]
> h2004
cq15 expen crops
1 10 0.4616136 NA
2 10 1.5387120 NA
3 11 2.3080680 NA
4 11 0.3846780 NA
5 12 2.5767978 2.576798
6 12 5.5393632 5.539363
7 13 5.4624276 5.462428
8 14 2.6158104 2.615810
9 14 20.1571270 20.157127
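As a further option, not part of the answer above, ifelse() builds the whole column in one step, so the condition and the values stay aligned row by row. A sketch on the sample rows from the question:

```r
# Sample rows from the question
h2004 <- data.frame(
  cq15  = c(10, 10, 11, 11, 12, 12, 13, 14, 14),
  expen = c(0.4616136, 1.538712, 2.308068, 0.384678, 2.576797822,
            5.5393632, 5.4624276, 2.6158104, 20.157127)
)

# Take expen where the condition holds, NA elsewhere
in_range <- h2004$cq15 >= 12 & h2004$cq15 <= 14
h2004$crops <- ifelse(in_range, h2004$expen, NA)

print(h2004)
```

Because both branches of ifelse() have the same length as the condition, no length-mismatch warning can arise.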
I am using the RDS package for respondent-driven sampling survey data. I want to convert a regular R data frame to an rds.data.frame. To do so, I have been trying to use the as.rds.data.frame function from RDS.
Here is an excerpt of my data frame, where the first case (id=1) is the 'seed' respondent (who has no recruiter). It contains the variables: id (respondent id number), recruit.id (id number of the respondent who recruited him/her), netsize (respondent's network size) and population (estimate of the whole population size).
df<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
recruit.id=c(-1,1,1,2,2,4,5,3,8,3),
# note: rep(22,000, 10) is parsed as rep(22, 0, 10) and yields ten values of 22, not 22000
netsize=c(6,6,6,5,5,4,4,3,4,6), population=rep(22,000, 10))
I then (try to) apply the relevant function:
new.df <-as.rds.data.frame(df,id=df$id,
recruiter.id=df$recruit.id,
network.size=df$netsize,
population.size=df$population,
max.coupons=2)
I get the error message:
Error in as.rds.data.frame(df, id = df$id, recruiter.id = df$recruit.id, : Invalid id
and the warning
In addition: Warning message:
In if (!(id %in% names(x))) stop("Invalid id") :
the condition has length > 1 and only the first element will be used
I have tried assigning various 'recruiter id' values for seed participants, including -1, 0, or their own id number, but I still get the same message. I have also tried eliminating function arguments (coupon.max, population) or deleting seed respondents, with the same result.
Package documentation says the function will fail if recruitment information is incomplete. As far as I can tell, this is not the case.
I am new to this, so if anyone can point me in the right direction I would be really grateful.
This seems to work:
colnames(df)[2:4] <- c("recruiter.id", "network.size.variable", "population.size")
as.rds.data.frame(df,max.coupons=2)
Passing the column names as strings, instead of the columns themselves, also gives a result, but with a warning:
as.rds.data.frame(df, id="id", recruiter.id="recruit.id",
network.size="netsize", population.size="population", max.coupons=2)
# An object of class "rds.data.frame"
#id: 1 2 3 4 5 6 7 8 9 10
#recruiter.id: -1 1 1 2 2 4 5 3 8 3
# id recruit.id netsize population
#1 1 -1 6 22
#2 2 1 6 22
#3 3 1 6 22
#4 4 2 5 22
#5 5 2 5 22
#6 6 4 4 22
#7 7 5 4 22
#8 8 3 3 22
#9 9 8 4 22
#10 10 3 6 22
# Warning message:
#In as.rds.data.frame(df, id = "id", recruiter.id = "recruit.id", :
#NAs introduced by coercion
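The check quoted in the question's warning, id %in% names(x), suggests why passing df$id fails: it tests whether the supplied value is one of the column names. The base R sketch below only mimics that single check on the example data; it is not the RDS source, and the variable names are hypothetical.

```r
# Example data from the question (only the columns needed here)
df <- data.frame(
  id         = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  recruit.id = c(-1, 1, 1, 2, 2, 4, 5, 3, 8, 3)
)

# Passing the column itself: ten comparisons, none of them a column name
passed_as_vector <- df$id %in% names(df)

# Passing the column name: a single TRUE
passed_as_name <- "id" %in% names(df)

print(passed_as_vector)
print(passed_as_name)
```

A length-ten result inside if() is what triggers the "condition has length > 1" warning, and its first element being FALSE leads to the "Invalid id" stop.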
I am a beginner in R, but I am aware I should look for answers before asking a question here. I did, and looked into the help files, but to no avail. The problem is as follows: when I ask for a summary of subset X, the output for the two columns is as below. I wanted to have only the output for answer, which I am able to do, but it is presented differently (see the output at the bottom). I want the results presented as a table, not as a list.
summary(X, max = 12)
results in:
student answer
Min. : 335 0 - Not at all likely : 35
1st Qu.: 855480 1 : 18
Median :1831962 10 - Extremely likely :9336
Mean :1519041 2 : 23
3rd Qu.:2183663 3 : 19
Max. :2607132 4 : 15
5 - Neutral : 939
6 : 235
7 : 921
8 :1844
9 :1194
option_i4x-DelftX-ET3034TUx-problem-b3d30df864ca41ffa0170e790f01a783_2_1_dummy_default: 71
Because I am only interested in the summary stats for answer, I used
summary(X$answer, max = 12)
And then I get the list below as answer.
0 - Not at all likely
35
1
18
10 - Extremely likely
9336
2
23
3
19
4
15
5 - Neutral
939
6
235
7
921
8
1844
9
1194
option_i4x-DelftX-ET3034TUx-problem-b3d30df864ca41ffa0170e790f01a783_2_1_dummy_default
71
You should try
summary(X["answer"], max = 12)
since X["answer"] is not a vector like X$answer but a one-column data frame.
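The difference can be checked on a small made-up data frame (the values below are hypothetical; only the column names follow the question):

```r
# Hypothetical miniature of X
X <- data.frame(
  student = c(335, 855480, 1831962, 2183663),
  answer  = factor(c("1", "5 - Neutral", "5 - Neutral", "9"))
)

# X$answer extracts the column itself, here a factor
is_factor_col <- is.factor(X$answer)

# X["answer"] keeps the data frame wrapper around that one column
is_one_col_df <- is.data.frame(X["answer"])

# summary() on a data frame returns a "table" object, printed in columns
tab <- summary(X["answer"], max = 12)
print(tab)
```

Here max = 12 is passed through as in the question; it partially matches the maxsum argument of the data frame method.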
EDIT: I just found out that if you want to save/export, my solution
below gives more useful output (as a table).
write.csv(data.frame(summary(X$answer)), "X.csv")
I played around a bit more, and with @JT85's suggestion, I found a nice solution.
data.frame(summary(X$answer))
and
data.frame(table(X$answer))
both work and give the output I want.
PS. It is a coincidence I found it so quickly after posting the question. This has been bugging me for 2 days already.
The output I get for data.frame(summary...) is as follows:
summary.A1.answer.
0 - Not at all likely 35
1 18
10 - Extremely likely 9336
2 23
3 19
4 15
5 - Neutral 939
6 235
7 921
8 1844
9 1194
option_i4x-DelftX-ET3034TUx-problem-b3d30df864ca41ffa0170e790f01a783_2_1_dummy_default 71
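The two conversions give slightly different shapes, which a small sketch on a made-up factor can show (the values and the column name count are hypothetical):

```r
# Hypothetical answers
answer <- factor(c("0 - Not at all likely", "5 - Neutral",
                   "5 - Neutral", "10 - Extremely likely"))

# table(): the levels become a column next to Freq
counts_from_table <- data.frame(table(answer))

# summary(): the levels become row names of a single count column
counts_from_summary <- data.frame(count = summary(answer))

print(counts_from_table)
print(counts_from_summary)
```

The table() version is usually easier to plot or merge, since the labels are an ordinary column; the summary() version matches the output shown above.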