I am trying to create a simple for loop in R, but I am not sure how to go about this without creating a global variable.
I am trying to output a table of predictions neatly, instead of running code like the following for each house I wish to predict:
house1 = newdata[1,]
predict(fullmodel, house1)
predict(sqftmodel, house1)
predict(bestmodel, house1)
house2 = newdata[2,]
predict(fullmodel, house2)
predict(sqftmodel, house2)
predict(bestmodel, house2)
house3 = newdata[3,]
I want to use a for loop to run through 37 different houses and have the output in a table. Any ideas?
Edit: this is a portion of my code so far:
data = read.table("DewittData.txt")
newdata = na.omit(data)#28 points to refer to
colnames(newdata) = c("ListPrice", "Beds", "Bath","HouseSize","YearBuilt"
,"LotSize", "Fuel","ForcedAir", "Other","FM","ESM","JD",
"SchoolDistrict","HouseType","GarageStalls","Taxes")
attach(newdata)
fullmodel = lm((ListPrice) ~ HouseSize + Beds + Bath + YearBuilt + LotSize
+ Fuel + ForcedAir + Other + SchoolDistrict+
HouseType + Other + FM + ESM + JD + GarageStalls + Taxes)
bestmodel = lm(ListPrice~Beds)
sqftmodel = lm(ListPrice~HouseSize, data = newdata)
Update:
I see, so I've changed it to:
predict(fullmodel, newdata[,])
predict(sqftmodel, newdata[,])
predict(bestmodel, newdata[,])
Now how would I output this in a table format?
I'm not sure I understand your question, but this is what I would do to predict from different rows of a data frame:
Housefull <- predict(fullmodel, newdata[,])
Housebest <- predict(bestmodel, newdata[,])
Housesqft <- predict(sqftmodel, newdata[,])
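In general, working with whole vectors is much better than looping in R: predict() already returns one prediction per row of newdata in a single call. You can then combine the three vectors into one table with data.frame(Housefull, Housebest, Housesqft).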
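A sketch of the loop-free approach, with mtcars standing in for the asker's data and three hypothetical models in place of fullmodel, sqftmodel and bestmodel: keep the models in a named list and let sapply() call predict() on each one. The result is a matrix with one row per house and one column per model.

```r
# Stand-in data and models: mtcars replaces the asker's DewittData.txt, and the
# three formulas are hypothetical substitutes for fullmodel/sqftmodel/bestmodel.
fullmodel <- lm(mpg ~ wt + hp + disp, data = mtcars)
sqftmodel <- lm(mpg ~ wt, data = mtcars)
bestmodel <- lm(mpg ~ hp, data = mtcars)

newdata <- mtcars[1:5, ]  # one row per "house" to predict

# Put the models in a named list; sapply() calls predict() on each one.
# predict() already returns one value per row of newdata, so no loop over rows
# is needed, and sapply() binds the results into a rows-by-models matrix.
models <- list(full = fullmodel, sqft = sqftmodel, best = bestmodel)
pred_table <- sapply(models, predict, newdata = newdata)

round(pred_table, 2)
```

With the asker's own models and all rows of newdata, the same call gives one row per house; as.data.frame(pred_table) turns it into a data frame if needed.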
This data is from an Excel CSV file.
I want to see if a transformation is necessary, but my problem is that I keep getting this message:
Error in model.frame.default(formula = comment$Number.of.Comments ~ comment$Character.Count + : 'data' must be a data.frame, environment, or list
The following is my code:
comment <- read.csv('AdAnalysis3.csv', header = TRUE, fileEncoding = "UTF-8-BOM")
commentfit <- lm(comment$Number.of.Comments ~ comment$Character.Count + comment$Number.of.Shares + comment$Number.of.Likes + comment$Type.of.Ad + comment$Dealing.with.Life + comment$Christlike.Attributes + comment$Spiritual.Learning, data = comment)
library(car)
boxCox(commentfit)
I get the error message above immediately after running boxCox(commentfit).
Any suggestions?
You haven't given us a reproducible example, but my guess is that you have confused car::boxCox() by including comment$ in your formula. In general it's better (for a number of reasons including clarity) to specify a linear model with just the variable names, i.e.:
commentfit <- lm(Number.of.Comments ~ Character.Count + Number.of.Shares +
Number.of.Likes + Type.of.Ad + Dealing.with.Life +
Christlike.Attributes + Spiritual.Learning,
data = comment)
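A runnable sketch of the same pattern (plain variable names plus data =), with mtcars standing in for the CSV and MASS::boxcox() swapped in for car::boxCox() because MASS ships with R; car::boxCox() should accept the same kind of fitted lm object. One further likely cause worth checking, stated here as an assumption rather than a verified diagnosis: comment is also the name of a base R function (base::comment). Box-Cox functions refit the model with update(), which re-evaluates data = comment from inside the package namespace, where base::comment can be found before your data frame; that would give exactly the "'data' must be a data.frame, environment, or list" error. Naming the data frame something else (ads below, a hypothetical name) sidesteps the clash.

```r
library(MASS)  # recommended package shipped with R; provides boxcox()

# 'ads' is a stand-in name for the data frame (not 'comment', which is also
# a base R function); mtcars replaces the asker's CSV file.
ads <- mtcars

# Plain variable names plus data = keep the call re-evaluable by update().
adfit <- lm(mpg ~ wt + hp + factor(cyl), data = ads)

# plotit = FALSE returns the profile log-likelihood instead of drawing it.
bc <- boxcox(adfit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# Lambda with the highest profile log-likelihood on this grid
best_lambda <- bc$x[which.max(bc$y)]
best_lambda
```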
How can I fix this error in R?
My code:
Remit_data <- panel_data(dataremit, id = id, wave = t)
model<-asym(wel_loggdp_cap ~ logremit + remitsq + logcpi + corruption +
employilo + senrol_netprim + logfert + urbanization + tradegdp +
netoda_gini, data = dataremit)
I get this error
Error: Only strings can be converted to symbols
Backtrace:
panelr::asym(...)
panelr:::diff_data(...)
rlang::sym(id)
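In panelr you have to define your panel data classifiers (id and time) with panel_data() outside the model function (wbm() below, asym() in your case), and then pass that panel_data object as data. In your code you create Remit_data but then pass the raw dataremit, which has no panel structure. Compare this with plm, where the index can be given inside the plm() call itself.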
library(panelr)
library(plm)
data(Produc)
# fixed effects with plm
FE_plm <- plm(gsp ~ pcap + pc + pcap:pc,
data = Produc,
index = c("state","year"),
model = "within")
# fixed effects with panelr
Produc <- panel_data(Produc, id = state, wave = year)
FE_panelr <- wbm(gsp ~ pcap + pc + pcap:pc,
model = "within",
interaction.style = c("double-demean"),
data = Produc)
This should fix the issue. Always try to provide a minimal working example.
Hoping someone can offer some guidance here.
I'm creating a multivariate simulation using the SimDesign package, varying both the number of factors and the number of items that load on each factor. I would like to write a command that identifies the number of factors present in factornumbers and assigns the appropriate items to them (no cross-loadings). I will be testing all combinations of the conditions below and more, so I would like a model command that adapts to each iteration's model and saves me from writing multiple model statements.
factornumbers<-c(1,2,3,5)
itemsperfactor<-c(5,10,30)
The syntax that mirt and lavaan expect looks like this (two factors, 15 items each):
mirtmodel<-mirt.model('
F1=1-15
F2=16-30
MEAN=F1,F2
COV=F1*F2')
lavmodel <- ' F1=~ Item_1 + Item_2 + Item_3 + Item_4 + Item_5 + Item_6 + Item_7 + Item_8 + Item_9 + Item_10 + Item_11 + Item_12 + Item_13 + Item_14 + Item_15
F2=~ Item_16 + Item_17 + Item_18 + Item_19 + Item_20 + Item_21 + Item_22 + Item_23 + Item_24 + Item_25 + Item_26 + Item_27 + Item_28 + Item_29 + Item_30'
The SimDesign package offers this example; I would like to expand on it, but I'm not sure I have the know-how:
lavmodel<-paste0('F=~ ', paste0(colnames(dat)[1L], ' + '),
paste0(colnames(dat)[-1L], collapse = ' + '))
What I would like is a single mirt and lavaan command that finds the number of factors specified in the factornumbers command and assigns the correct items specified in the data as well as itemsperfactor.
EDIT:
I would like the model identification to pick up on which factor & item structure is in use for that condition and fill in the model identification with the correct information.
For Example:
mirtmodel<-mirt.model('
F1=1-5
F2=6-10
F3=11-15
F4=16-20
F5=21-25
MEAN=F1,F2,F3,F4,F5
COV=F1*F2*F3*F4*F5')
Or
mirtmodel<-mirt.model('
F1=1-30
F2=31-60
MEAN=F1,F2
COV=F1*F2')
And also the corresponding lavaan models.
The idea here is to paste different strings together so that the condition input (row of the respective Design object) is all that is required to construct a suitable model specification string. Generating syntax for simulations is arguably the most annoying part of simulations, but at least in R there are a good number of helpful string operations (plus, packages like stringr).
Here's my interpretation of what you are currently looking for using base R functions.
library(SimDesign)
library(mirt)
Design <- createDesign(factornumbers = c(1,2,3,5),
itemsperfactor = c(5,10,30))
gen_syntax_mirt <- function(condition){
fn <- with(condition, factornumbers)
ipf <- with(condition, itemsperfactor)
nitems <- fn * ipf
maxloads <- sort(seq(nitems, ipf, length.out = fn))
minloads <- c(1, maxloads[-length(maxloads)] + 1)
fnames <- paste0('F', 1:fn)
df <- cbind(fnames, ' = ', minloads, '-', maxloads)
s1 <- apply(df, 1, paste0, collapse = '')
s2 <- paste0('MEAN = ', paste0(fnames, collapse = ','))
s3 <- paste0('COV = ', paste0(fnames, collapse = '*'))
ret <- paste0(c(s1, s2, s3), collapse = '\n')
mirt.model(ret)
}
gen_syntax_mirt(Design[1,])
gen_syntax_mirt(Design[10,])
The input to this function is a single row of the Design object passed to runSimulation(), so the function can be called directly on each condition, as the two calls above show. Write a similar function for lavaan's syntax and you'll be set.
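A minimal sketch of the lavaan counterpart in the same base-R style. gen_syntax_lavaan() is a hypothetical helper, the Item_1, Item_2, ... naming is taken from the question's lavaan example, and expand.grid() stands in for createDesign() so the snippet runs without SimDesign installed. Factor covariances are not written out because lavaan's cfa() estimates covariances among latent factors by default.

```r
# Hypothetical helper (not part of SimDesign or lavaan): builds lavaan
# measurement-model syntax for one Design row, assuming items are named
# Item_1, Item_2, ... and load on factors in consecutive blocks (no cross-loadings).
gen_syntax_lavaan <- function(condition) {
  fn  <- condition$factornumbers
  ipf <- condition$itemsperfactor
  lines <- vapply(seq_len(fn), function(f) {
    items <- paste0("Item_", ((f - 1) * ipf + 1):(f * ipf))
    paste0("F", f, " =~ ", paste(items, collapse = " + "))
  }, character(1))
  paste(lines, collapse = "\n")
}

# expand.grid() stands in for createDesign() here
Design <- expand.grid(factornumbers = c(1, 2, 3, 5),
                      itemsperfactor = c(5, 10, 30))
cat(gen_syntax_lavaan(Design[2, ]), "\n")  # 2 factors, 5 items each
```

Because both generators read the same Design row, the mirt and lavaan specifications stay in sync across conditions.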
I want to generate a plot of interest over time using GTrendsR and ggplot2.
The plot I want is the one Google Trends itself generates (image not included).
Any help will be much appreciated.
Thanks!
This is the best I was able to get:
library(ggplot2)
library(devtools)
library(GTrendsR)
usr = "my.email"
psw = "my.password"
ch = gConnect(usr, psw)
location = "all"
query = "MOOCs"
MOOCs_trends = gTrends(ch, geo = location, query = query)
MOOCs<-MOOCs_trends[[1]]
MOOCs$moocs<-as.numeric(as.character(MOOCs$moocs))
MOOCs$Week <- as.character(MOOCs$Week)
MOOCs$start <- as.Date(MOOCs$Week)
ggplot(MOOCs[MOOCs$moocs!=0,], aes(start, moocs)) +
geom_line(colour = "blue") +
ylab("Trends") + xlab("") + theme_bw()
I think that to match the graph generated by Google I would need to aggregate the data by month instead of by week, but I'm not sure how to do that yet.
The object returned by gtrends() is a list; its trend element is the data.frame you want to plot.
library(gtrendsR)
library(ggplot2)
usr = "my.email"
psw = "my.password"
gconnect(usr, psw)
MOOCs_trends = gtrends('MOOCs')
MOOCsDF <- MOOCs_trends$trend
ggplot(data = MOOCsDF) + geom_line(aes(x=start, y=moocs))
This gives a line plot of the weekly series (image not included).
Now if you want to aggregate by month, I would suggest using the floor_date function from the lubridate package, in combination with dplyr (note that I am using the chain operator %>% which dplyr re-exports from the magrittr package).
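library(gtrendsR)
library(ggplot2)
library(lubridate)  # floor_date()
library(dplyr)      # group_by(), summarise(), %>%
usr = "my.email"
psw = "my.password"
gconnect(usr, psw)
MOOCs_trends = gtrends('MOOCs')
MOOCsDF <- MOOCs_trends$trend   # the trend data.frame, not the whole list
# round each week's start date down to the first day of its month
MOOCsDF$start <- floor_date(MOOCsDF$start, unit = 'month')
MOOCsDF %>%
group_by(start) %>%
summarise(moocs = sum(moocs)) %>%
ggplot() + geom_line(aes(x=start, y=moocs))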
This gives the monthly aggregated plot (image not included).
Note 1: gtrendsR changes the query MOOCs to moocs, which is why the y variable you're plotting is named moocs.
Note 2: the capitalization of some names has changed (e.g. gtrendsR, not GTrendsR); I am using current versions.
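If you would rather not add lubridate and dplyr, the weekly-to-monthly aggregation the question asks about can be sketched in base R. The data below are made-up toy values, and the "start - end" Week format is an assumption about what older gtrendsR versions returned; format(..., "%Y-%m-01") plays the role of floor_date(unit = 'month').

```r
# Toy weekly data (made up for illustration; not real Google Trends values).
# The "start - end" Week strings are an assumed format.
weekly <- data.frame(
  Week  = c("2013-01-06 - 2013-01-12", "2013-01-13 - 2013-01-19",
            "2013-02-03 - 2013-02-09", "2013-02-10 - 2013-02-16"),
  moocs = c("10", "20", "30", "50"),
  stringsAsFactors = FALSE
)

# Take the first date of each "start - end" range as the week start.
weekly$start <- as.Date(vapply(strsplit(weekly$Week, " - ", fixed = TRUE),
                               `[`, character(1), 1))

# Convert via character so a factor column would not turn into level codes.
weekly$moocs <- as.numeric(as.character(weekly$moocs))

# Floor each date to the first day of its month, then sum within months.
weekly$month <- as.Date(format(weekly$start, "%Y-%m-01"))
monthly <- aggregate(moocs ~ month, data = weekly, FUN = sum)
monthly
```

The resulting monthly data frame can be passed to ggplot() with aes(x = month, y = moocs) exactly as in the answers above.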
This will get you most of the way there. The plot doesn't look quite right, but that's more a consequence of the data being a bit different. Here are the necessary conversions to numeric and to dates.
library(ggplot2)
MOOCs <- MOOCs_trends[[1]]
## Convert to string
MOOCs$Week <- as.character(MOOCs$Week)
# convert via character: as.numeric() on a factor would return level codes
MOOCs$moocs <- as.numeric(as.character(MOOCs$moocs))
# split the "start - end" string and keep the second date (end of the week)
MOOCs$start <- vapply(strsplit(MOOCs$Week, " - "), function(x) x[2], character(1))
MOOCs$start <- as.Date(MOOCs$start)
ggplot(MOOCs, aes(x = start, y = moocs)) + geom_point() + geom_path()
Google might do some smoothing, but this will plot the data you have.
I have a large dataset (questionnaire results) of mostly categorical variables. I have tested for dependence between the variables using chi-square tests, and there is an unmanageable number of dependencies between variables. I used the chaid() function in the CHAID package to detect interactions and separate out (what I hope to be) the underlying structure of these dependencies for each variable. Typically, the chi-square tests reveal a large number of dependencies (say 10-20) for a variable, and chaid() reduces this to something much more comprehensible (say 3-5). What I want to do is extract the names of the variables that chaid() shows to be relevant.
The chaid() output is in the form of a constparty object. My question is how to extract the variable names associated with the nodes in such an object.
Here is a self-contained code example:
library(evtree) # for the ContraceptiveChoice dataset
library(CHAID)
library(vcd)
library(MASS)
data("ContraceptiveChoice")
longform = formula(contraceptive_method_used ~ wifes_education +
husbands_education + wifes_religion + wife_now_working +
husbands_occupation + standard_of_living_index + media_exposure)
z = chaid(longform, data = ContraceptiveChoice)
# plot(z)
z
# This is the part I want to do programmatically
shortform = formula(contraceptive_method_used ~ wifes_education + husbands_occupation)
# The thing I want is a programmatic way to extract 'shortform' from 'z'
# Examples of use of 'shortform'
loglm(shortform, data = ContraceptiveChoice)
One possible solution:
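nn <- nodeapply(z)                   # root node of the tree (node id 1)
n.names <- names(unlist(nn[[1]]))    # flatten the nested node list; element names encode the path
# keep only the names that describe a split variable
ext <- grep("split.varid.", n.names, value = TRUE, fixed = TRUE)
# drop whatever path prefix precedes 'split.varid.' (handles nodes at any depth)
ext <- unique(sub(".*split\\.varid\\.", "", ext))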
dep.var <- as.character(terms(z)[1][[2]]) # get the dependent variable
plus = paste(ext, collapse=" + ")
mul = paste(ext, collapse=" * ")
shortform <- as.formula(paste (dep.var, plus, sep = " ~ "))
satform <- as.formula(paste (dep.var, mul, sep = " ~ "))
mosaic(shortform, data = ContraceptiveChoice)
#stp <- step(glm(satform, data=ContraceptiveChoice, family=binomial), direction="both")
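An alternative sketch that avoids parsing flattened names. split_var_names() is a hypothetical helper that assumes partykit's accessors (nodeids(), nodeapply(), split_node()) apply to the constparty object chaid() returns, and that each split's varid indexes a column of tree$data; build_forms() is a hypothetical helper that uses base R's reformulate() instead of pasting strings into as.formula(). Only the formula part runs without CHAID installed.

```r
# Hypothetical helper (assumption: partykit accessors work on chaid() output and
# varid indexes columns of tree$data). Collects the split variable of every inner
# node; terminal nodes have no split, so they contribute NULL.
split_var_names <- function(tree) {
  ids <- partykit::nodeids(tree)  # all node ids, inner and terminal
  varids <- partykit::nodeapply(tree, ids = ids, FUN = function(node) {
    partykit::split_node(node)$varid
  })
  unique(names(tree$data)[unlist(varids)])
}

# Hypothetical helper: build additive and saturated formulas from a response
# name and a character vector of predictor names using base R's reformulate().
build_forms <- function(response, predictors) {
  list(
    additive  = reformulate(predictors, response = response),
    saturated = reformulate(paste(predictors, collapse = " * "),
                            response = response)
  )
}

forms <- build_forms("contraceptive_method_used",
                     c("wifes_education", "husbands_occupation"))
forms$additive
forms$saturated

# With CHAID (and hence partykit) loaded, the helpers would be combined as:
# forms <- build_forms(all.vars(terms(z))[1], split_var_names(z))
```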