I am trying to visualise some Likert data. I have been successful doing this across all respondents using likert()
df <- data[,11:14]
df[] <- lapply(df, factor,
levels=c(1,2,3,4,5),
labels = c("Strongly disagree", "Disagree", "Neutral","Agree", "Strongly agree"))
df <- data.frame(df)
LK <- likert(df)
LK <- likert(summary = LK$results)
plot(LK, include.center=FALSE)+ggtitle("Title")
To cut this by group i have used the following code (which works):
LK2 <- likert(df, grouping = data$group)
str(LK2$results)
'data.frame': 34 obs. of 7 variables:
$ Group : Factor w/ 9 levels "NSW","VIC", "QLD": 1 1 1 1 1 1 1 1 1 1 ...
$ Item : Factor w/ 4 levels "Question_1",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Strongly disagree: num 18.18 13.64 12.12 15.15 9.09 ...
$ Disagree : num 19.7 16.7 10.6 24.2 22.7 ...
$ Neutral : num 27.3 34.8 45.5 22.7 27.3 ...
$ Agree : num 24.2 22.7 18.2 24.2 24.2 ...
$ Strongly agree : num 10.6 12.1 13.6 13.6 16.7 ...
But then when i make a summary out of these results to plot:
LK3 <- likert(summary = LK2$results, grouping = LK2$results[,1])
str(LK3)
List of 5
$ results :'data.frame': 28 obs. of 8 variables:
..$ Group : Factor w/ 9 levels "NSW","VIC","QLD",..: 1 1 1 1 2 2 2 2 3 3 ...
..$ Group : Factor w/ 9 levels "NSW","VIC","QLD",..: 1 1 1 1 2 2 2 2 3 3 ...
..$ Item : Factor w/ 4 levels "Question_1",..: 1 2 3 4 1 2 3 4 1 2 ...
..$ Strongly disagree: num [1:28] 16.22 16.22 8.11 18.92 14.29 ...
..$ Disagree : num [1:28] 18.9 18.9 13.5 16.2 19 ...
..$ Neutral : num [1:28] 35.1 27 48.6 21.6 42.9 ...
..$ Agree : num [1:28] 16.2 24.3 13.5 29.7 19 ...
..$ Strongly agree : num [1:28] 13.51 13.51 16.22 13.51 4.76 ...
$ items : NULL
$ grouping: Factor w/ 9 levels "NSW","VIC","QLD",..: 1 1 1 1 2 2 2 2 3 3 ...
$ nlevels : num 6
$ levels : chr [1:6] "Item" "Strongly disagree" "Disagree" "Neutral" ...
- attr(*, "class")= chr "likert"
It seems to duplicate 'Group', and when i try to plot i get the following error message:
plot(LK3)
Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
No idea why this is happening - any help would be appreciated
Related
In my data, Hemoglobin_group is a categorical variable with 6 levels. When I run the below code, I can't see Hemoglobin_group levels in the output. How can I solve this problem?
fit <- coxph(Surv(Time,Status)~Hemoglobin_group, data)
fit
My output:
coef exp(coef) se(coef) z p
Hemoglobin_group -0.06585 0.93627 0.01874 -3.514 0.000441
str(data)
tibble [2,021 x 21] (S3: tbl_df/tbl/data.frame)
$ status : num [1:2021] 1 1 1 1 1 1 0 0 0 0 ...
$ id : num [1:2021] 1 1 1 1 1 1 2 2 2 2 ...
$ Time : num [1:2021] 20.3 20.3 20.3 20.3 20.3 ...
$ t1 : num [1:2021] 0 1 2 3 4 5 0 1 2 3 ...
$ t2 : num [1:2021] 1 2 3 4 5 ...
$ sex_string : chr [1:2021] "MALE" "MALE" "MALE" "MALE"...
$ sex : num [1:2021] 1 1 1 1 1 1 1 1 1 1 ...
$ age : num [1:2021] 89 89 89 89 89 89 77 77 77 77 ...
$ hemoglobin : num [1:2021] 9.71 10.22 11.3 11.8 11.2 ...
$ Diabet : num [1:2021] 1 1 1 1 1 1 2 2 2 2 ...
$ Hemoglobin_group : num [1:2021] 4 4 4 4 4 4 5 5 5 5 ...
$ Kreatinin : num [1:2021] 7.19 7.19 7.19 7.19 7.19 ...
$ fosfor : num [1:2021] 4.14 4.14 4.14 4.14 4.14 ...
$ Kalsiyum : num [1:2021] 8.5 8.5 8.5 8.5 8.5 ...
$ CRP : num [1:2021] 1.33 1.33 1.33 1.33 1.33 ...
$ Albumin : num [1:2021] 4.19 4.19 4.19 4.19 4.19 ...
$ Ferritin : num [1:2021] 428 428 428 428 428 ...
$ months : num [1:2021] 1 2 3 4 5 6 1 2 3 4 ...
It looks like when I write str (data). I thought I was transforming into a factor by doing the following codes in my data. I guess I couldn't transform. I did not understand?
The codes I wrote to convert to factor were as follows
sex<-as.factor(sex)
is.factor(sex)
Diabet<-as.factor(Diabet)
is.factor(Diabet)
Status<-as.factor(Status)
is.factor(Status)
months<-as.factor(months)
is.factor(months)
Hemoglobin_group<-as.factor(Hemoglobin_group)
is.factor(Hemoglobin_group)
When ı run this code, R console looks like:
> sex<-as.factor(sex)
> is.factor(sex)
[1] TRUE
>
> Diabet<-as.factor(Diabet)
> is.factor(Diabet)
[1] TRUE
>
>
> Status<-as.factor(Status)
> is.factor(Status)
[1] TRUE
>
> months<-as.factor(months)
> is.factor(months)
[1] TRUE
>
> Hemoglobin_group<-as.factor(Hemoglobin_group)
> is.factor(Hemoglobin_group)
[1] TRUE
In this case, don't the categorical variables in my data turn into factors?
Your variable Hemoglobin_group is probably considered as a numeric value. Try:
Hemoglobin_groupF <- factor(Hemoglobin_group)
fit <- coxph(Surv(Time,Status) ~ Hemoglobin_groupF, data)
fit
The reference group will the first factor. You can easily change your reference factor with the function relevel
I came across the following and I haven't figured out the purpose of the "#" operator. What's the meaning there? I didn't make heads/tails of the R manual language.
library(lattice)
library(sp)
data(meuse)
coordinates(meuse) <- ~x+y
proj4string(meuse) <- CRS("+init=epsg:28992")
p <- xyplot(copper ~ cadmium, data = meuse#data, col = "grey", pch = 20, cex = 2)
R manuals says
Usage
object#name
object#name <- value
Extract or replace the contents of a slot in a object with a formal (S4) class structure.
These operators support the formal classes of package methods, and are enabled only when package methods is loaded (as per default). See slot for further details, in particular for the differences between slot() and the # operator.
It is checked that object is an S4 object (see isS4), and it is an error to attempt to use # on any other object. (There is an exception for name .Data for internal use only.) The replacement operator checks that the slot already exists on the object (which it should if the object is really from the class it claims to be).
I checked the structure of "meuse" and found no references to a slot named "data".
meuse is an S4 object
isS4(meuse)
[1] TRUE
If you take the structure of of meuse (str_meuse) you'll see some fields are denoted with your # operator, including one called data. These slots can be accessed with # similar to how you might see other slots in other objects accessed using the $ operator. So meuse#data gives you the data portion of the meuse object.
str(meuse)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..# data :'data.frame': 155 obs. of 12 variables:
.. ..$ cadmium: num [1:155] 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
.. ..$ copper : num [1:155] 85 81 68 81 48 61 31 29 37 24 ...
.. ..$ lead : num [1:155] 299 277 199 116 117 137 132 150 133 80 ...
.. ..$ zinc : num [1:155] 1022 1141 640 257 269 ...
.. ..$ elev : num [1:155] 7.91 6.98 7.8 7.66 7.48 ...
.. ..$ dist : num [1:155] 0.00136 0.01222 0.10303 0.19009 0.27709 ...
.. ..$ om : num [1:155] 13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
.. ..$ ffreq : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ soil : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
.. ..$ lime : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
.. ..$ landuse: Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 4 2 2 15 ...
.. ..$ dist.m : num [1:155] 50 30 150 270 380 470 240 120 240 420 ...
..# coords.nrs : int [1:2] 1 2
..# coords : num [1:155, 1:2] 181072 181025 181165 181298 18130
See how that subsetting is working?
str(meuse#data)
'data.frame': 155 obs. of 12 variables:
$ cadmium: num 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
$ copper : num 85 81 68 81 48 61 31 29 37 24 ...
$ lead : num 299 277 199 116 117 137 132 150 133 80 ...
$ zinc : num 1022 1141 640 257 269 ...
$ elev : num 7.91 6.98 7.8 7.66 7.48 ...
$ dist : num 0.00136 0.01222 0.10303 0.19009 0.27709 ...
$ om : num 13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
$ ffreq : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ soil : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
$ lime : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
$ landuse: Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 4 2 2 15 ...
$ dist.m : num 50 30 150 270 380 470 240 120 240 420 ...
I've read my .CSV and then converted the file to a data frame using several methods including:
df<-read.csv('cdSH2015Fall.csv', dec = ".", na.strings = c("na"), header=TRUE,
row.names=NULL, stringsAsFactors=F)
df<-as.data.frame(lapply(df, unlist)) # converted .csv to a a data.frame
str(df) # provides the structure of df.
'data.frame': 72 obs. of 16 variables:
$ trtGroup : Factor w/ 68 levels "AANN","AAPN",..: 5 7 14 18 20 23
27 33 37 48 ...
$ cd : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ PreviousExp : Factor w/ 2 levels "Empty","Enriched": 2 1 2 2 2 2 1
1 1 1 ...
$ treatment : Factor w/ 2 levels "NN","PN": 1 1 1 1 1 1 1 1 1 1 ...
$ total.Area.DarkBlue.: num 827 1037 663 389 983 ...
$ numberOfGroups : int 1 1 1 1 1 1 1 1 1 1 ...
$ totalGroupArea : num 15.72 2.26 9.45 11.57 9.73 ...
$ averageGrpArea : num 15.72 2.26 9.45 11.57 9.73 ...
$ proximityToPlants : num 5.65 16.05 2.58 9.65 4.74 ...
$ latFeed : num 2 0.5 0 1 0 0 1 0.5 2 1 ...
$ latBalloon : num 6 2 2 NA 0 0.1 3 0.5 1 0.7 ...
$ countChases : int 5 8 16 4 16 21 18 11 14 28 ...
$ chases : int 95 87 67 923 636 96 1210 571 775 816 ...
$ grpDiameter : num 16.8 23.3 19.5 11.2 29.9 ...
$ grpActiv : num 4908 5164 4197 5263 5377 ...
$ NND : num 0 11.88 8.98 3.6 9.8 ...
I then run my model two ways:
First option.
fit = t.test(df$proximityToPlants[which (df$cd==1 &
df$treatment == 'PN')], df$proximityToPlants[which
(df$cd==0 & df$treatment == 'PN')]
)
Second option trying to ensure I have a proper data frame.
Subset the data and then create a matrix.
cdProximityToPlantsPN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==1 & cdSH2015Fall$treatment == 'PN')]
H2ProximityToPlantsPN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==0 & cdSH2015Fall$treatment == 'PN')]
cdProximityToPlantsNN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==1 & cdSH2015Fall$treatment == 'NN')]
H2ProximityToPlantsNN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==0 & cdSH2015Fall$treatment == 'NN')]
Creating a matrix
df<-
cbind(cdProximityToPlantsPN,H2ProximityToPlantsPN,cdProximityToPlantsNN,
H2ProximityToPlantsNN)
mat <- sapply(df,unlist)
fit=t.test(mat[,1],mat[,2], paired = F, var.equal = T)
Yet, I still get errors when assessing outliers using the following:
outlierTest(fit) # Bonferonni p-value for most extreme obs
Error in UseMethod("outlierTest") :
no applicable method for 'outlierTest' applied to an object of class
"htest"
qqPlot(fit, main="QQ Plot") #qq plot for studentized resid
Error in order(x[good]) : unimplemented type 'list' in 'orderVector1'
leveragePlots(fit) # leverage plots
Error in formula.default(model) : invalid formula
I know the issue must be with my data structure. Any ideas on how to fix it?
I want to see if there is correlation between my variables. This is the structure of the dataset
'data.frame': 189 obs. of 20 variables:
$ age : num 24 31 32 35 36 26 31 24 35 36 ...
$ diplM2 : Factor w/ 3 levels "0","1","2": 3 2 1 3 2 2 3 2 2 1 ...
$ TimeDelcat : Factor w/ 4 levels "0","1","2","3": 1 1 3 3 3 4 2 1 4 4 ...
$ SeasonDel : Factor w/ 4 levels "1","2","3","4": 1 2 4 3 4 3 4 3 2 3 ...
$ BMIM2 : num 23.4 25.7 17 26.6 24.6 21.6 21 22.3 20.8 20.7 ...
$ WgtB2 : int 3740 3615 3705 3485 3420 2775 3365 3770 3075 3000 ...
$ sex : Factor w/ 2 levels "1","2": 2 2 1 2 2 2 1 1 1 1 ...
$ smoke : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 1 1 3 ...
$ nRBC : num 0.1621 0.0604 0.1935 0.0527 0.1118 ...
$ CD4T : num 0.1427 0.2143 0.1432 0.0686 0.0979 ...
$ CD8T : num 0.1574 0.1549 0.1243 0.0804 0.0782 ...
$ NK : num 0.02817 0 0.04368 0.00641 0.02398 ...
$ Bcell : num 0.1033 0.1124 0.1468 0.0551 0.0696 ...
$ Mono : num 0.0633 0.0641 0.0773 0.0531 0.0656 ...
$ Gran : num 0.428 0.442 0.329 0.716 0.6 ...
$ chip : Factor w/ 92 levels "200251580021",..: 12 24 23 2 27 22 6 22 17 22 ...
$ pos : Factor w/ 12 levels "R01C01","R01C02",..: 11 12 1 6 9 2 12 1 7 11 ...
$ trim1PM25ifdmv4: num 9.45 13.81 15.59 7.13 15.43 ...
$ trim2PM25ifdmv4: num 13.27 15.53 10.69 13.56 9.27 ...
$ trim3PM25ifdmv4: num 16.72 16.21 12.17 6.47 10.66 ...
As you can see, there are both continious and categorical variables.
When I run chart.Correlation(variables, histrogram=T,method = c("pearson") )
I get this error:
Error in pairs.default(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, :
non-numeric argument to 'pairs'
How can I fix this?
Thank you.
I believe you want correlation only between numerical variables. The below code will do this and it will output only unique correlations between the input.
library(reshape2)
data <- data.frame(x1=rnorm(10),
x2=rnorm(10),
x3=rnorm(10),
x4=c("a","b","c","d","e","f","g","h","i","j"),
x5=c("ab","sp","sp","dd","hg","hj","qw","dh","ko","jk"))
data
x1 x2 x3 x4 x5
1 -1.2169793 0.5397598 0.4981513 a ab
2 -0.7032631 -2.1262837 -1.0377371 b sp
3 0.8766831 -0.2326975 -0.1219613 c sp
4 0.3405332 2.4766225 -1.1960618 d dd
5 0.1889945 0.3444534 1.9659062 e hg
6 0.8086956 0.4654644 -1.2526696 f hj
7 -0.6850181 -1.7657241 0.5156620 g qw
8 0.8518034 0.9484547 1.4784063 h dh
9 0.5191793 1.2246566 1.3867829 i ko
10 0.4568953 -0.6881464 0.3548839 j jk
#finding correlation for all numerical values
corr=cor(data[as.numeric(which(sapply(data,class)=="numeric"))])
#convert the correlation table to long format
res=melt(corr)
##keeping only one side of the correlations
res$type=apply(res,1,function(x)
paste(sort(c(as.character(x[1]),as.character(x[2]))),collapse="*"))
res=unique(res[,c("type","value")])
res
type value
x1*x1 1.00000000
x1*x2 0.44024939
x1*x3 0.04936654
x2*x2 1.00000000
x2*x3 0.08859169
x3*x3 1.00000000
I'm getting an error when I try to run an arima model with the zelig package. I'm using MI data with 20 imputations that were created with Amelia. Here is a short summary of my id and response variables:
$ imp20:'data.frame': 442 obs. of 50 variables:
..$ region : Factor w/ 4 levels "Central Africa",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ subregionid : Factor w/ 4 levels "FC","FE","FS",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ country : Factor w/ 34 levels "Angola","Benin",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ ISO2 : Factor w/ 34 levels "AO","BF","BJ",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ ISO3 : Factor w/ 34 levels "AGO","BEN","BFA",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ year : num [1:442] 2002 2003 2004 2005 2006 ...
..$ cap.lat : num [1:442] -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 ...
..$ cap.long : num [1:442] 13.2 13.2 13.2 13.2 13.2 ...
..$ NGDP_RPCH : num [1:442] 14.53 5.25 10.88 18.26 20.73 ...
..$ NGDPD : num [1:442] 3.18 3.31 3.38 3.44 3.48 ...
..$ NGDPDPC : num [1:442] 2.68 2.69 2.72 2.75 2.78 ...
..$ NGSD_NGDP : num [1:442] 10.62 7.77 12.63 26.98 40.94
...
..$ PIKE.regional : num [1:442] 0.225 0.295 0.287 0.358 0.357 ...
..$ Definite.Probable : num [1:442] 36 36 36 36 36.1 ...
..$ Elephant.range : num [1:442] 406006 433613 511662 456046 459418 ...
..$ Change.by.year : num [1:442] 0.000463 0.000463 0.000463 0.000463 0.000463 ...
..$ Diff.from.expected : num [1:442] -0.0415 -0.0415 -0.0415 -0.0415 -0.0415 ...
Diff.from.expected is my response variable. And here is the code that I've run along with the error I'm getting.
z1 <- zarima$new()
> z1$zelig(Diff.from.expected~GNI, order=c(1,0,1), model="arima",
+ data = a.coVarsTrans.more, ts="year", cs="country")
Error in data[, cs] : incorrect number of dimensions
So it appears to me that there is an issue with the cs='country' call, but I'm not sure what the issue is. I'm planning to add more independent variables, but want to make sure that a basic model works first, which clearly it doesn't.
Here is the link to my saved Amelia .Rdata file.