I have this table (all football games from the Greek league in which a team that was behind at half-time went on to win):
Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR
8 24/08/15 Panetolikos Panathinaikos 1 2 A 1 0 H
16 31/08/15 Platanias Atromitos 1 2 A 1 0 H
40 28/09/15 Veria AEK 1 2 A 1 0 H
42 03/10/15 Panthrakikos Levadeiakos 1 3 A 1 0 H
68 01/11/15 Asteras Tripolis PAOK 2 1 H 0 1 A
97 05/12/15 Asteras Tripolis Iraklis 1 2 A 1 0 H
120 21/12/15 AEK Levadeiakos 1 2 A 1 0 H
138 17/01/16 Asteras Tripolis Kallonis 3 1 H 0 1 A
196 06/03/16 Panthrakikos PAOK 2 1 H 0 1 A
203 13/03/16 Atromitos Asteras Tripolis 2 1 H 0 1 A
233 17/04/16 Asteras Tripolis Veria 2 1 H 0 1 A
and I want to create a new column, let's call it tempWinner, which holds the name of the winner. I am using the following formula, which follows Excel's logic and unfortunately fails to give me the correct result. I have searched for how to just "copy" a cell based on a condition, but I was not able to find anything relevant.
anatropes$tempWinner <- ifelse (anatropes$FTR == "H", anatropes$HomeTeam , anatropes$AwayTeam)
Any ideas? What I eventually want to do is count how many times each team has won from behind (either as the home or the away team).
edit:
str(anatropes) returns:
'data.frame': 11 obs. of 9 variables:
$ Date : Factor w/ 85 levels "","01/11/15",..: 67 84 75 7 2 12 57 41 14 32 ...
$ HomeTeam: Factor w/ 17 levels "","AEK","Asteras Tripolis",..: 11 15 16 13 3 3 2 3 13 4 ...
$ AwayTeam: Factor w/ 17 levels "","AEK","Asteras Tripolis",..: 10 4 2 8 14 6 8 7 14 3 ...
$ FTHG : int 1 1 1 1 2 1 1 3 2 2 ...
$ FTAG : int 2 2 2 3 1 2 2 1 1 1 ...
$ FTR : Factor w/ 4 levels "","A","D","H": 2 2 2 2 4 2 2 4 4 4 ...
$ HTHG : int 1 1 1 1 0 1 1 0 0 0 ...
$ HTAG : int 0 0 0 0 1 0 0 1 1 1 ...
$ HTR : Factor w/ 4 levels "","A","D","H": 4 4 4 4 2 4 4 2 2 2 ...
There is no error in my method, I just get the following data frame:
> anatropes
Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR
8 24/08/15 Panetolikos Panathinaikos 1 2 A 1 0 H
16 31/08/15 Platanias Atromitos 1 2 A 1 0 H
40 28/09/15 Veria AEK 1 2 A 1 0 H
42 03/10/15 Panthrakikos Levadeiakos 1 3 A 1 0 H
68 01/11/15 Asteras Tripolis PAOK 2 1 H 0 1 A
97 05/12/15 Asteras Tripolis Iraklis 1 2 A 1 0 H
120 21/12/15 AEK Levadeiakos 1 2 A 1 0 H
138 17/01/16 Asteras Tripolis Kallonis 3 1 H 0 1 A
196 06/03/16 Panthrakikos PAOK 2 1 H 0 1 A
203 13/03/16 Atromitos Asteras Tripolis 2 1 H 0 1 A
233 17/04/16 Asteras Tripolis Veria 2 1 H 0 1 A
tempWinner
8 10
16 4
40 2
42 8
68 3
97 6
120 8
138 3
196 13
203 4
233 3
As @Sotos suggested, use as.character(), since HomeTeam and AwayTeam are both factors. What you are getting is the integer code of each level instead of its string value.
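A minimal sketch of that suggestion, using a small made-up subset of the table (the three rows and the `wins` variable are assumptions for illustration, not the asker's full data). Converting both factor columns with as.character() inside ifelse() gives team names, and table() then counts the comeback wins per team:

```r
# Toy data mirroring the question's columns; factors are created explicitly
anatropes <- data.frame(
  HomeTeam = factor(c("Panetolikos", "Asteras Tripolis", "Atromitos")),
  AwayTeam = factor(c("Panathinaikos", "PAOK", "Asteras Tripolis")),
  FTR      = factor(c("A", "H", "H"))
)

# Convert the factors to character so ifelse() returns names, not level codes
anatropes$tempWinner <- ifelse(anatropes$FTR == "H",
                               as.character(anatropes$HomeTeam),
                               as.character(anatropes$AwayTeam))

# Count how many times each team appears as the winner
wins <- table(anatropes$tempWinner)
```

Sorting the result, for example with sort(wins, decreasing = TRUE), should put the teams with the most comebacks first.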
I'm trying to create a mixed effects model with lmer. SubPlot should be nested within Plot, and Treatment should be nested within SubPlot. There are 3 to 7 Treatments in each SubPlot and always 3 SubPlots in a Plot. I created the following model:
model <- lmer(Depth ~ Mass + (1|Plot:SubPlot:Treatment), data=mydata)
But this gives me an error:
Error: number of levels of each grouping factor must be < number of observations (problems: Plot:SubPlot:Treatment)
'data.frame': 147 obs. of 6 variables:
$ Plot : int 1 1 1 1 1 1 1 1 1 1 ...
$ SubPlot : int 1 1 1 1 1 1 1 2 2 2 ...
$ Treatment : int 1 2 3 4 5 6 7 1 2 3 ...
$ Depth : num 0 4 4.5 5.5 6 6 6 3 4.5 6.5 ...
$ Mass : int 21 50 78 103 128 147 172 21 49 77 ...
Here's some data:
Plot SubPlot Treatment Depth Mass
1 1 1 0 21
1 1 2 4 50
1 1 3 4.5 78
1 1 4 5.5 103
1 1 5 6 128
1 1 6 6 147
1 1 7 6 172
1 2 1 3 21
1 2 2 4.5 49
1 2 3 6.5 77
1 2 4 7 102
1 2 5 8 127
1 2 6 9 146
1 2 7 10.5 171
1 3 1 3 21
1 3 2 1.5 49
1 3 3 1.5 77
1 3 4 1.5 102
1 3 5 1.5 127
1 3 6 1.5 146
1 3 7 1.5 171
2 1 1 3 21
2 1 2 5 50
2 1 3 5 78
2 1 4 7 103
2 1 5 9 128
2 1 6 9.5 146
2 1 7 10 171
2 2 1 1.5 21
2 2 2 4 50
2 2 3 5 78
2 2 4 9 103
2 2 5 10 128
2 2 6 10.5 146
2 2 7 10.5 171
2 3 1 0 21
2 3 2 0 50
2 3 3 0 78
2 3 4 0 103
2 3 5 0 128
2 3 6 0 146
2 3 7 0 171
Any ideas how to proceed?
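One way to see where the error comes from, as a rough sketch: in the rows listed above, every Plot/SubPlot/Treatment combination occurs exactly once, so the grouping factor Plot:SubPlot:Treatment has as many levels as there are observations. The grid below is an assumption that only mimics the 42 rows shown (2 plots, 3 subplots, 7 treatments), not the full 147-row data set:

```r
# Hypothetical stand-in for the rows shown above: one row per combination
mydata <- expand.grid(Treatment = 1:7, SubPlot = 1:3, Plot = 1:2)

# Number of levels of the grouping factor used in the model
n_groups <- nlevels(interaction(mydata$Plot, mydata$SubPlot, mydata$Treatment,
                                drop = TRUE))
n_obs <- nrow(mydata)

# Number of levels when grouping only down to SubPlot within Plot
n_plot_subplot <- nlevels(interaction(mydata$Plot, mydata$SubPlot, drop = TRUE))

print(c(groups = n_groups, observations = n_obs, plot_subplot = n_plot_subplot))
```

If the full data behaves the same way, a random intercept per combination cannot be separated from the residual. Grouping at a coarser level, for example (1 | Plot/SubPlot), is one possible direction, but whether that fits the design is a modelling decision and not something the data excerpt settles.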
I have a data frame called cleantrain and need to change values in the column age.
I can find the values to be replaced with the filter and select functions from dplyr.
> str(cleantrain)
'data.frame': 891 obs. of 9 variables:
$ train$PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ survived : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
$ Title : Factor w/ 17 levels "Capt","Col","Don",..: 12 13 9 13 12 12 12 8 13 13 ...
$ fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ sbsp : int 1 1 0 1 0 0 0 3 0 1 ...
$ parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ alone : Factor w/ 2 levels "0","1": 1 1 2 1 2 2 2 1 1 1 ...
$ familysize : Factor w/ 9 levels "1","2","3","4",..: 2 2 1 2 1 1 1 5 3 2 ...
$ age : num 22 38 26 35 35 NA 54 2 27 14 ...
# Column title is equal to "Master" and Column age is NA
> cleantrain %>% filter(Title == "Master" & is.na(age))
train$PassengerId survived Title fare sbsp parch alone familysize age
1 66 1 Master 15.2458 1 1 0 3 NA
2 160 0 Master 69.5500 8 2 0 11 NA
3 177 0 Master 25.4667 3 1 0 5 NA
4 710 1 Master 15.2458 1 1 0 3 NA
I just need to replace these NAs with 8.
Using mutate as below will not update the original cleantrain data frame:
>cleantrain %>% filter(Title == "Master" & is.na(age)) %>% mutate(age = 8) #will put the right info on the right place.
train$PassengerId survived Title fare sbsp parch alone familysize age
1 66 1 Master 15.2458 1 1 0 3 8
2 160 0 Master 69.5500 8 2 0 11 8
3 177 0 Master 25.4667 3 1 0 5 8
4 710 1 Master 15.2458 1 1 0 3 8
# But not actually: when checking the data frame, the values are still NA
>cleantrain %>% filter(Title == "Master" & is.na(age))
train$PassengerId survived Title fare sbsp parch alone familysize age
1 66 1 Master 15.2458 1 1 0 3 NA
2 160 0 Master 69.5500 8 2 0 11 NA
3 177 0 Master 25.4667 3 1 0 5 NA
4 710 1 Master 15.2458 1 1 0 3 NA
Can I use mutate to do this? Is there any dplyr/quick function that does not require for/if loops?
#learningR
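The replace solution from @akrun will work if you want to update rows with a fixed value. More generally, I believe you have to use the ifelse function. Note that the result is assigned back to cleantrain; without the assignment the original data frame is left unchanged:
cleantrain <- cleantrain %>%
  mutate(age = ifelse(Title == 'Master' & is.na(age),
                      8,
                      age))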
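For comparison, the same update can be sketched in base R with logical indexing, which modifies the column directly. The four-row data frame and the `idx` variable are made up for illustration and only keep the two columns that matter here:

```r
# Toy stand-in for cleantrain with just the relevant columns
cleantrain <- data.frame(
  Title = c("Master", "Mr", "Master", "Miss"),
  age   = c(NA, NA, 4, 30),
  stringsAsFactors = TRUE
)

# Rows where Title is "Master" and age is missing
idx <- cleantrain$Title == "Master" & is.na(cleantrain$age)

# Assign 8 only to those rows; all other ages are left untouched
cleantrain$age[idx] <- 8
```

Only the first row changes here: the "Mr" row keeps its NA because the Title condition is not met, and the second "Master" row keeps its existing age.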
Good Day
I am trying to move the last column of a dataset to be the third column of a data frame in R and was wondering what the most efficient way to do this would be.
My DataFrame structure is as follows:
str(HR)
'data.frame': 2940 obs. of 36 variables:
$ EmployeeNumber : int 1 2 3 4 5 6 7 8 9 10 ...
$ Attrition : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
$ Age : int 41 49 37 33 27 32 59 30 38 36 ...
$ BusinessTravel : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3
$ DailyRate : int 1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
$ Department : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
$ DistanceFromHome : int 1 8 2 3 2 2 3 24 23 27 ...
$ Education : int 2 1 2 4 1 2 3 1 3 3 ...
$ EducationField : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
$ EmployeeCount : int 1 1 1 1 1 1 1 1 1 1 ...
$ EnvironmentSatisfaction : int 2 3 4 4 1 4 3 4 4 3 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
$ HourlyRate : int 94 61 92 56 40 79 81 67 44 94 ...
$ JobInvolvement : int 3 2 2 3 3 3 4 3 2 3 ...
$ JobLevel : int 2 2 1 1 1 1 1 1 3 2 ...
$ JobRole : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
$ JobSatisfaction : int 4 2 3 3 2 4 1 3 3 3 ...
$ MaritalStatus : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
$ MonthlyIncome : int 5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
$ MonthlyRate : int 19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
$ NumCompaniesWorked : int 8 1 6 1 9 0 4 1 0 6 ...
$ Over18 : Factor w/ 1 level "Y": 1 1 1 1 1 1 1 1 1 1 ...
$ OverTime : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
$ PercentSalaryHike : int 11 23 15 11 12 13 20 22 21 13 ...
$ PerformanceRating : int 3 4 3 3 3 3 4 4 4 3 ...
$ RelationshipSatisfaction: int 1 4 2 3 4 3 1 2 2 2 ...
$ StandardHours : int 80 80 80 80 80 80 80 80 80 80 ...
$ StockOptionLevel : int 0 1 0 0 1 0 3 1 0 2 ...
$ TotalWorkingYears : int 8 10 7 8 6 8 12 1 10 17 ...
$ TrainingTimesLastYear : int 0 3 3 3 3 2 3 2 2 3 ...
$ WorkLifeBalance : int 1 3 3 3 3 2 2 3 3 2 ...
$ YearsAtCompany : int 6 10 0 8 2 7 1 1 9 7 ...
$ YearsInCurrentRole : int 4 7 0 7 2 7 0 0 7 7 ...
$ YearsSinceLastPromotion : int 0 1 0 3 2 3 0 0 1 7 ...
$ YearsWithCurrManager : int 5 7 0 0 2 6 0 0 8 7 ...
$ AttritionB : num 1 0 1 0 0 0 0 0 0 0 ...
and I am trying to have AttritionB come after Attrition.
I have tried HRCorForm = HR[,c(1,2,36:35)]; however, this drops the rest of the columns.
Kind Regards
Rehaan
This will get all your columns:
HRCorForm = HR[,c(1,2,36,3:35)]
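The same idea can be sketched without hard-coding 36, so it keeps working if columns are added. The five-column HR below is a made-up miniature of the real data, and `n`, `first` and `HRByName` are names introduced only for this example:

```r
# Miniature stand-in for HR: AttritionB is the last column
HR <- data.frame(
  EmployeeNumber = 1:3,
  Attrition      = c("Yes", "No", "Yes"),
  Age            = c(41L, 49L, 37L),
  DailyRate      = c(1102L, 279L, 1373L),
  AttritionB     = c(1, 0, 1)
)

# By position: first two columns, then the last one, then everything in between
n <- ncol(HR)
HRCorForm <- HR[, c(1:2, n, 3:(n - 1))]

# By name: list the leading columns, then append whatever is left
first <- c("EmployeeNumber", "Attrition", "AttritionB")
HRByName <- HR[, c(first, setdiff(names(HR), first))]
```

The name-based version is a little longer but does not depend on where AttritionB currently sits.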
I have a dataset which was ordered using the order() function in R; it is shown below:
A B C
1 1 85
1 1 62
1 0 92
2 1 80
2 0 92
2 0 84
3 1 65
3 0 92
I have to add a rank within each group of column A; the expected output is shown below:
A B C Rank
1 1 85 1
1 1 62 2
1 0 92 3
2 1 80 1
2 0 92 2
2 0 84 3
3 1 65 1
3 0 92 2
Any R expertise would be appreciated.
A simple base R solution using ave and seq_along is
df$Rank <- ave(df$B, df$A, FUN=seq_along)
which returns
df
A B C Rank
1 1 1 85 1
2 1 1 62 2
3 1 0 92 3
4 2 1 80 1
5 2 0 92 2
6 2 0 84 3
7 3 1 65 1
8 3 0 92 2
seq_along returns the vector 1, 2, 3, ... up to the length of its argument. ave applies a function within groups, which are determined here by the variable A.
data
df <- read.table(header=TRUE, text="A B C
1 1 85
1 1 62
1 0 92
2 1 80
2 0 92
2 0 84
3 1 65
3 0 92")
I have the following data frame:
Obs obs zip age bed bath size lot exter garage fp price
1 1 1 3 21 3 3.0 951 64904 other 0 0 30000
2 2 2 3 21 3 2.0 1036 217800 frame 0 0 39900
3 3 3 4 7 1 1.0 676 54450 other 2 0 46500
4 4 4 3 6 3 2.0 1456 51836 other 0 1 48600
5 5 5 1 51 3 1.0 1186 10857 other 1 0 51500
6 6 6 2 19 3 2.0 1456 40075 frame 0 0 56990
7 7 7 3 8 3 2.0 1368 . frame 0 0 59900
8 8 8 4 27 3 1.0 994 11016 frame 1 0 62500
9 9 9 1 51 2 1.0 1176 6259 frame 1 1 65500
10 10 10 3 1 3 2.0 1216 11348 other 0 0 69000
11 11 11 4 32 3 2.0 1410 25450 brick 0 0 76900
12 12 12 3 2 3 2.0 1344 . other 0 1 79000
13 13 13 3 25 2 2.0 1064 218671 other 0 0 79900
14 14 14 1 31 3 1.5 1770 19602 brick 0 1 79950
15 15 15 4 29 3 2.0 1524 12720 brick 2 1 82900
16 16 16 3 16 3 2.0 1750 130680 frame 0 0 84900
17 17 17 3 20 3 2.0 1152 104544 other 2 0 85000
18 18 18 3 18 4 2.0 1770 10640 other 0 0 87900
19 19 19 4 28 3 2.0 1624 12700 brick 2 1 89900
20 20 20 2 27 3 2.0 1540 5679 brick 2 1 89900
with the following structure:
str(df)
'data.frame': 69 obs. of 12 variables:
$ Obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ zip : int 3 3 4 3 1 2 3 4 1 3 ...
$ age : int 21 21 7 6 51 19 8 27 51 1 ...
$ bed : int 3 3 1 3 3 3 3 3 2 3 ...
$ bath : num 3 2 1 2 1 2 2 1 1 2 ...
$ size : Factor w/ 66 levels ".","1036","1064",..: 65 2 64 14 6 14 10 66 5 7 ...
$ lot : Factor w/ 60 levels ".","10295","10400",..: 47 28 43 39 9 35 1 11 46 13 ...
$ exter : Factor w/ 3 levels "brick","frame",..: 3 2 3 3 3 2 2 2 2 3 ...
$ garage: int 0 0 2 0 1 0 0 1 1 0 ...
$ fp : int 0 0 0 1 0 0 0 0 1 0 ...
$ price : int 30000 39900 46500 48600 51500 56990 59900 62500 65500 69000 ...
As can be seen, the "lot" variable appears as a factor. I have the following questions about this data:
Why does R read this variable "lot" as a factor?
When I tried df$lot[df$lot == "."] <- NA, all dots (.) were replaced with <NA> and not with NA as I wanted.
I then tried df$lot <- as.numeric(df$lot) but the numerical values of this variable have changed completely, with the (.) being replaced by 1. What happened when I changed the variable's type?
How may I replace all dots (.) with NA?
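A sketch that touches all three questions, using a made-up three-row excerpt (the `raw` text and the variable names are assumptions, not the asker's file). A column containing "." cannot be parsed as a number, so it is read as text and, with stringsAsFactors = TRUE, stored as a factor. as.numeric() on a factor returns the internal level codes, which is why the values change. In a factor a missing value is printed as <NA>, but it is an ordinary NA:

```r
raw <- "obs lot price
1 64904 30000
2 . 59900
3 54450 46500"

# The '.' forces lot to be read as text, hence a factor here
df_factor <- read.table(text = raw, header = TRUE, stringsAsFactors = TRUE)

# as.numeric() on a factor gives the level codes, not the original numbers
codes <- as.numeric(df_factor$lot)

# Go through character first, mark '.' as missing, then convert
lot_chr <- as.character(df_factor$lot)
lot_chr[lot_chr == "."] <- NA
lot_num <- as.numeric(lot_chr)

# Or declare '.' as the missing-value marker when reading the data
df_clean <- read.table(text = raw, header = TRUE, na.strings = ".")
```

If the file can be read again, the na.strings route is probably the simplest, since lot (and size, which has the same problem) then arrives as a numeric column with NA in place of the dots.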