Can anyone tell me how to constrain a neural network fitted with the function nnet in R so that the influence of a particular characteristic on the output is always positive? I have a real-estate database with numerical variables (surface, price) and categorical variables (parking Y/N, area code, et cetera). The output of the model is the price. The problem is that the model currently estimates that in a few area codes homes with a parking spot are worth less than homes without one. I would like to constrain the output (price) so that in each area code the influence of a parking spot on the price is positive. Of course, a really small house with a parking spot can still be cheaper than a big house without one.
Example data (of 80,000 observations):
Price   Surface  Parking Y  Areacode 1  Areacode 2  Areacode 3
100000  100      0          1           0           0
110000  99       1          0           1           0
200000  110      0          0           0           1
150000  130      0          0           1           0
190000  130      1          0           0           1
(thanks for putting the table in a decent format)
I had this modelled in R using nnet.
model = nnet(Price~ . , data=data6, MaxNWts=2500, size=12, skip=TRUE, linout=TRUE, decay=0.025, na.action=na.omit)
I used nnet because I hope to find different values for parking spots per area code. If there is a better way to do this, please let me know.
I'm using RStudio version 0.98.976 on Windows XP (yes, I know ;)
Thanks in advance for your replies
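As far as I know, nnet() has no option for sign constraints, so one possible workaround is to swap the network for a linear model with one parking effect per area code and bound those effects at zero. The sketch below does this with base R's optim() and box constraints. The data are simulated and every name in it (toy, Area, the coefficient values) is hypothetical, so treat it as an illustration of the idea rather than a fix for the nnet model itself.

```r
set.seed(42)
n <- 300
# Simulated stand-in for the real data; Price is in thousands
toy <- data.frame(
  Surface = round(runif(n, 60, 160)),
  Parking = rbinom(n, 1, 0.5),
  Area    = factor(sample(c("A1", "A2", "A3"), n, replace = TRUE))
)
toy$Price <- 50 + 1.0 * toy$Surface + 8 * toy$Parking + rnorm(n, sd = 5)

# One intercept per area, a common surface slope, one parking effect per area
X <- model.matrix(~ 0 + Area + Surface + Area:Parking, data = toy)
y <- toy$Price
is_parking <- grepl("Parking", colnames(X))

# Residual sum of squares and its gradient
rss      <- function(beta) sum((y - X %*% beta)^2)
rss_grad <- function(beta) as.vector(-2 * t(X) %*% (y - X %*% beta))

# Start from the unconstrained fit, moved inside the feasible region
start <- unname(lm.fit(X, y)$coefficients)
start[is_parking] <- pmax(start[is_parking], 0)

# Parking effects are bounded below by 0; everything else is free
lower <- ifelse(is_parking, 0, -Inf)
fit <- optim(start, rss, rss_grad, method = "L-BFGS-B", lower = lower)

parking_effects <- setNames(fit$par[is_parking], colnames(X)[is_parking])
print(parking_effects)
```

The bound only guarantees that no estimated parking effect is negative. It does not capture the nonlinear structure a neural network could, so whether the trade-off is acceptable depends on the data.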
Related
I have a data set with rankings as the column names and about 15,000 contestants. My data looks like:
contestant  1    2    3    4
101         13   0    5    12
14          0    1    34   6
...         ...  ...  ...  ...
500         0    2    23   3
I've been working on doing cluster analysis on this dataset. The dendrograms are obviously not very helpful with this dataset: they produce a thick block line because of the large number of entries.
I'm wondering if there is a better way to do cluster analysis with this type of data. I've tried fviz_cluster() and similar commands, and went through multiple tutorials. Many tutorials guided me through making dendrograms. The data in them all seem to be different from mine (comparing two variables, etc.) and much smaller. Essentially, I'm asking which types of cluster analysis may work well with this type of data.
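One option that scales better than a dendrogram is k-means on the standardised ranking counts. Here is a minimal sketch in base R, assuming the data can be turned into a numeric matrix with one row per contestant. The matrix below is simulated, and the choice of three clusters is arbitrary.

```r
set.seed(1)
# Hypothetical stand-in for the real data: 200 contestants, 4 ranking columns
counts <- matrix(rpois(200 * 4, lambda = 5), ncol = 4,
                 dimnames = list(NULL, c("1", "2", "3", "4")))

# Standardise so that no single ranking column dominates the distances
scaled <- scale(counts)

# Total within-cluster sum of squares for k = 2..6 (the "elbow" heuristic)
ks  <- 2:6
wss <- sapply(ks, function(k) kmeans(scaled, centers = k, nstart = 10)$tot.withinss)
print(data.frame(k = ks, wss = wss))

# Fit one solution and look at the cluster sizes instead of a dendrogram
km <- kmeans(scaled, centers = 3, nstart = 10)
print(table(km$cluster))
```

With 15,000 rows, cluster sizes and centres (km$centers) are usually easier to read than any tree plot.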
I am currently using k-means to cluster my data; however, I want each cluster to appear only once in each given year. I have searched for answers for a whole night with no result. Does anyone have ideas on how to solve this problem in R? Or is there a package I should look at? Thanks.
More background info:
I am trying to replicate the clusters of relationships using the reported gender, education level and birth year. I am doing this because this is survey data whose respondents are elderly people, and they sometimes report inaccurate age or education information. My main challenge now is that I want each cluster label to occur only once in each survey year. For example, I do not want to see two rows labelled cluster 3 in survey year 2000. My data look like this:
survey year  relationship         gender  education level  birth year  k-means cluster
2000         41( first daughter)  0       3                1997        1
2003         41( first daughter)  0       3                1997        1
2000         42( second daughter) 0       4                1999        2
2003         42( second daughter) 0       4                1999        2
2000         42( third daughter)  0       5                1999        2
2003         42( third daughter)  0       5                2001        3
Thanks in advance.
--Update--
A more detailed description of the task:
The data set is panel survey data asking elderly people about their health status and their relationships (incl. sons, daughters, neighbors). Since these older people are sometimes imprecise about their family's demographic information, such as birth year, education level, etc., we might need to delete a big part of the data if it does not match.
(E.g., A reported that his first son was 30 years old in 1997, but said his first son was 29 years old in 1999; this data could therefore be problematic.) My task is to save as much data as possible when the imprecision is not that high.
Therefore, I first mutated columns to check the precision of each family member (e.g., birth year error %in% c(-1,2)). Next, I run k-means only if the family members are detected to be imprecise. In this way, I save much of the data. Although I did not solve the problem above, it occurs so rarely that I can almost ignore it or drop these observations.
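To see how often the conflict actually occurs, the rows where a cluster label appears more than once within a survey year can be flagged with duplicated(). This is only a sketch: the column names survey_year and cluster are made up, and the values are the year and cluster columns of the six example rows above.

```r
# Year and cluster columns of the six example rows (column names are hypothetical)
d <- data.frame(
  survey_year = c(2000, 2003, 2000, 2003, 2000, 2003),
  cluster     = c(1, 1, 2, 2, 2, 3)
)

# A row is in conflict if another row has the same year and the same cluster
key <- d[c("survey_year", "cluster")]
conflict <- duplicated(key) | duplicated(key, fromLast = TRUE)

print(d[conflict, ])
```

In the example this flags the two rows labelled cluster 2 in survey year 2000, which are the observations one would then inspect, relabel or drop.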
I don't know if you are familiar with the Candy Power Ranking data.
The data looks like this:
Chocolade  Caramel  Fruit  Win%
1          1        0      0,8
0          1        0      0,54
0          0        0      0,23
1          1        1      0,49
Now I want to check which combination of properties leads to a higher Win%.
What I have already done is a multiple linear regression:
lmodel = lm(candy_data$`Win%` ~ candy_data$chocolade + candy_data$caramel + candy_data$fruit)
summary(lmodel)
That tells me which property leads to a higher Win%. But I want to check which combination is the best. Do you have any idea?
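One simple way to compare combinations directly is to average the win rate within each observed combination of properties and pick the largest. This sketch uses the four example rows above. The column is renamed to win (an assumption, to avoid the non-syntactic name Win%) and the decimal commas are written as points.

```r
# The four example rows, with Win% stored as a proportion in `win`
candy <- data.frame(
  chocolade = c(1, 0, 0, 1),
  caramel   = c(1, 1, 0, 1),
  fruit     = c(0, 0, 0, 1),
  win       = c(0.80, 0.54, 0.23, 0.49)
)

# Mean win rate for every combination of properties that occurs in the data
by_combo <- aggregate(win ~ chocolade + caramel + fruit, data = candy, FUN = mean)

# The combination with the highest mean win rate
best <- by_combo[which.max(by_combo$win), ]
print(best)
```

On the full data set, a regression with interaction terms, for example lm(win ~ chocolade * caramel * fruit, data = candy), is another way to ask whether properties work better together than separately.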
I have a dataset called dolls.csv that I imported using
dolls <- read.csv("dolls.csv")
This is a snippet of the data:
Name   Review  Year  Strong  Skinny  Weak  Fat  Normal
Bell   3.5     1990  1       1       0     0    0
Jan    7.2     1997  0       0       1     0    1
Tweet  7.6     1987  1       1       0     0    0
Sall   9.5     2005  0       0       0     1    0
I am trying to run some preliminary analysis of this data. Name is the name of the doll, Review is a rating from 1 to 10, Year is the year the doll was made, and all columns after that are binary: 1 if the doll has the characteristic, 0 if it doesn't.
I ran
summary(dolls)
and got the column names, means, minimums and maximums of the values.
I would like to see what the correlations are between the characteristics and the year or review rating (for example, to see if certain dolls have really high ratings yet have unfavorable traits), but I am not sure how to construct charts or which functions to use in this case. I was considering some ANOVA and tail testing for outliers and means of different values, but I am not sure how to compare values like this (in Python I'd use an if-then statement, but I don't know how to in R).
This is for a personal study I wanted to conduct and improve my R skills.
Thank you!
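As a starting point, cor() on the numeric columns gives all pairwise correlations, and tapply() gives the mean rating for dolls with and without a trait. Here is a small sketch using the four rows shown above, typed in by hand so it runs without dolls.csv.

```r
# The four example rows from the snippet
dolls <- data.frame(
  Name   = c("Bell", "Jan", "Tweet", "Sall"),
  Review = c(3.5, 7.2, 7.6, 9.5),
  Year   = c(1990, 1997, 1987, 2005),
  Strong = c(1, 0, 1, 0),
  Skinny = c(1, 0, 1, 0),
  Weak   = c(0, 1, 0, 0),
  Fat    = c(0, 0, 0, 1),
  Normal = c(0, 1, 0, 0)
)

# Correlation matrix of every numeric column (Name is dropped)
num <- dolls[sapply(dolls, is.numeric)]
cors <- cor(num)
print(round(cors[, c("Review", "Year")], 2))

# Mean review for dolls without (0) and with (1) the Strong trait
mean_by_strong <- tapply(dolls$Review, dolls$Strong, mean)
print(mean_by_strong)

# R's vectorised counterpart of an if-then: label each doll by its rating
dolls$HighRated <- ifelse(dolls$Review >= 7, "high", "low")
```

The cutoff of 7 for a "high" rating is arbitrary. With only four rows the correlations mean little, but the same calls work unchanged on the full data frame.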
I have been struggling with this problem for quite a while and any help would be much appreciated.
I am trying to write a function to calculate a transition matrix from observed data for a markov model.
The initial data I am using to build the function look something like this:
     Season  Team               State
1    1       Manchester United  1
2    1       Chelsea            1
3    1       Manchester City    1
.
.
.
99   5       Charlton Athletic  4
100  5       Watford            4
with 5 seasons and 4 states.
I know how I am going to calculate the transition matrix, but in order to do this I need to count the number of teams that move from state i to state j between each pair of consecutive seasons.
I need code that will do something like this:
a<-function(x,i,j){
if("team x is in state i in season 1 and state j in season 2") 1 else 0
}
sum(a)
and then I could do this for each team and pair of states, and repeat for all 5 seasons. However, I am having a hard time getting my head around how to tell R the thing in quotation marks. Sorry if there is a really obvious answer, but I am a rubbish programmer.
Thanks so much for reading!
This function tells you whether a team made the transition from state1 in season1 to state2 in season2:
# Assumes data has the columns Season, Team and State, as in the question.
a <- function(team, state1, state2, data, season1, season2) {
  team.rows <- data[data$Team == team, ]
  in.season1.in.state1 <- ifelse(team.rows$Season == season1 & team.rows$State == state1, 1, 0)
  in.season2.in.state2 <- ifelse(team.rows$Season == season2 & team.rows$State == state2, 1, 0)
  return(sum(in.season1.in.state1) * sum(in.season2.in.state2))
}
The first line of the function body selects all rows for a particular team.
The second line flags each of those rows in which the team is in state1 in season1.
The third line flags each of those rows in which the team is in state2 in season2.
The return statement returns 1 if the team made the transition, and 0 if the team was not in the respective state in either of the two seasons. This only works if there are no duplicate rows; with duplicates it might return a value greater than 1.
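If the goal is the full count matrix rather than one team at a time, another way is to join two seasons on the team name and cross-tabulate the states. The sketch below is an alternative to the function above, not a replacement for it. The helper count_transitions and the tiny standings table are made up for illustration, and it assumes each team appears at most once per season.

```r
# Hypothetical mini data set: 3 teams, 2 seasons, 2 states
standings <- data.frame(
  Season = c(1, 1, 1, 2, 2, 2),
  Team   = c("A", "B", "C", "A", "B", "C"),
  State  = c(1, 1, 2, 1, 2, 2)
)

# Count how many teams move from each state in season1 to each state in season2
count_transitions <- function(data, season1, season2, n_states) {
  s1 <- data[data$Season == season1, c("Team", "State")]
  s2 <- data[data$Season == season2, c("Team", "State")]
  # Keep only teams present in both seasons; State becomes State.from / State.to
  both <- merge(s1, s2, by = "Team", suffixes = c(".from", ".to"))
  # Factor levels force a full n_states x n_states table, including zero cells
  table(from = factor(both$State.from, levels = seq_len(n_states)),
        to   = factor(both$State.to,   levels = seq_len(n_states)))
}

counts <- count_transitions(standings, season1 = 1, season2 = 2, n_states = 2)
print(counts)

# Row-normalising turns the counts into estimated transition probabilities
probs <- prop.table(counts, margin = 1)
print(probs)
```

For the five seasons in the question, the count tables for seasons 1 to 2, 2 to 3, and so on could be added together before normalising.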