R - create conditional column in data frame by copying element/column [duplicate] - r

This question already has answers here:
How to change factor labels into string in a data frame
(3 answers)
Closed 5 years ago.
I have this table (all football games from the Greek league where one team was behind at half time and went on to win):
     Date      HomeTeam          AwayTeam          FTHG  FTAG  FTR  HTHG  HTAG  HTR
8    24/08/15  Panetolikos       Panathinaikos     1     2     A    1     0     H
16   31/08/15  Platanias         Atromitos         1     2     A    1     0     H
40   28/09/15  Veria             AEK               1     2     A    1     0     H
42   03/10/15  Panthrakikos      Levadeiakos       1     3     A    1     0     H
68   01/11/15  Asteras Tripolis  PAOK              2     1     H    0     1     A
97   05/12/15  Asteras Tripolis  Iraklis           1     2     A    1     0     H
120  21/12/15  AEK               Levadeiakos       1     2     A    1     0     H
138  17/01/16  Asteras Tripolis  Kallonis          3     1     H    0     1     A
196  06/03/16  Panthrakikos      PAOK              2     1     H    0     1     A
203  13/03/16  Atromitos         Asteras Tripolis  2     1     H    0     1     A
233  17/04/16  Asteras Tripolis  Veria             2     1     H    0     1     A
and I want to create a new column, let's call it tempWinner, which holds the name of the winner. I am using the following formula, which follows Excel's logic and unfortunately fails to give me the correct result. I have searched for how to just "copy" a cell based on a condition, but I was not able to find anything relevant.
anatropes$tempWinner <- ifelse (anatropes$FTR == "H", anatropes$HomeTeam , anatropes$AwayTeam)
Any idea? What I want to do eventually is count how many times each team has won from behind (either being home or away team).
edit:
str(anatropes) returns:
'data.frame': 11 obs. of 9 variables:
$ Date : Factor w/ 85 levels "","01/11/15",..: 67 84 75 7 2 12 57 41 14 32 ...
$ HomeTeam: Factor w/ 17 levels "","AEK","Asteras Tripolis",..: 11 15 16 13 3 3 2 3 13 4 ...
$ AwayTeam: Factor w/ 17 levels "","AEK","Asteras Tripolis",..: 10 4 2 8 14 6 8 7 14 3 ...
$ FTHG : int 1 1 1 1 2 1 1 3 2 2 ...
$ FTAG : int 2 2 2 3 1 2 2 1 1 1 ...
$ FTR : Factor w/ 4 levels "","A","D","H": 2 2 2 2 4 2 2 4 4 4 ...
$ HTHG : int 1 1 1 1 0 1 1 0 0 0 ...
$ HTAG : int 0 0 0 0 1 0 0 1 1 1 ...
$ HTR : Factor w/ 4 levels "","A","D","H": 4 4 4 4 2 4 4 2 2 2 ...
There is no error in my method, I just get the following data frame:
> anatropes
     Date      HomeTeam          AwayTeam          FTHG  FTAG  FTR  HTHG  HTAG  HTR  tempWinner
8    24/08/15  Panetolikos       Panathinaikos     1     2     A    1     0     H    10
16   31/08/15  Platanias         Atromitos         1     2     A    1     0     H    4
40   28/09/15  Veria             AEK               1     2     A    1     0     H    2
42   03/10/15  Panthrakikos      Levadeiakos       1     3     A    1     0     H    8
68   01/11/15  Asteras Tripolis  PAOK              2     1     H    0     1     A    3
97   05/12/15  Asteras Tripolis  Iraklis           1     2     A    1     0     H    6
120  21/12/15  AEK               Levadeiakos       1     2     A    1     0     H    8
138  17/01/16  Asteras Tripolis  Kallonis          3     1     H    0     1     A    3
196  06/03/16  Panthrakikos      PAOK              2     1     H    0     1     A    13
203  13/03/16  Atromitos         Asteras Tripolis  2     1     H    0     1     A    4
233  17/04/16  Asteras Tripolis  Veria             2     1     H    0     1     A    3

As @Sotos suggested, use as.character(), since HomeTeam and AwayTeam are both factors. ifelse() drops the factor attributes, so what you are getting is the integer code of each level instead of its string value.
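A minimal sketch of that fix, using four rows copied from the table in the question. The small data frame is rebuilt by hand here for illustration, and stringsAsFactors = TRUE is only there to reproduce the factor columns shown by str(). Wrapping both branches in as.character() makes ifelse() return team names, and table() then gives the per-team count the question asks for:

```r
# Four rows from the question, rebuilt by hand (illustrative subset only)
anatropes <- data.frame(
  HomeTeam = c("Panetolikos", "Asteras Tripolis", "Asteras Tripolis", "Panthrakikos"),
  AwayTeam = c("Panathinaikos", "PAOK", "Kallonis", "PAOK"),
  FTR      = c("A", "H", "H", "H"),
  stringsAsFactors = TRUE
)

# Convert the factors to character so ifelse() copies labels, not level codes
anatropes$tempWinner <- ifelse(anatropes$FTR == "H",
                               as.character(anatropes$HomeTeam),
                               as.character(anatropes$AwayTeam))

# Count how many times each team won from behind
wins <- table(anatropes$tempWinner)
print(wins)
```

Because tempWinner is now a plain character vector, table() counts by team name regardless of whether the team played at home or away.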

Related

Mixed effects model, lmer

I'm trying to create a mixed effects model with lmer. SubPlot should be nested within Plot, and Treatment should be nested within SubPlot. There are 3 to 7 Treatments in each SubPlot and always 3 SubPlots in a Plot. I created the following model:
model <- lmer(Depth ~ Mass + (1|Plot:SubPlot:Treatment), data=mydata)
But this gives me an error:
Error: number of levels of each grouping factor must be < number of observations (problems: Plot:SubPlot:Treatment)
'data.frame': 147 obs. of 6 variables:
$ Plot : int 1 1 1 1 1 1 1 1 1 1 ...
$ SubPlot : int 1 1 1 1 1 1 1 2 2 2 ...
$ Treatment : int 1 2 3 4 5 6 7 1 2 3 ...
$ Depth : num 0 4 4.5 5.5 6 6 6 3 4.5 6.5 ...
$ Mass : int 21 50 78 103 128 147 172 21 49 77 ...
Here's some data:
Plot SubPlot Treatment Depth Mass
1 1 1 0 21
1 1 2 4 50
1 1 3 4.5 78
1 1 4 5.5 103
1 1 5 6 128
1 1 6 6 147
1 1 7 6 172
1 2 1 3 21
1 2 2 4.5 49
1 2 3 6.5 77
1 2 4 7 102
1 2 5 8 127
1 2 6 9 146
1 2 7 10.5 171
1 3 1 3 21
1 3 2 1.5 49
1 3 3 1.5 77
1 3 4 1.5 102
1 3 5 1.5 127
1 3 6 1.5 146
1 3 7 1.5 171
2 1 1 3 21
2 1 2 5 50
2 1 3 5 78
2 1 4 7 103
2 1 5 9 128
2 1 6 9.5 146
2 1 7 10 171
2 2 1 1.5 21
2 2 2 4 50
2 2 3 5 78
2 2 4 9 103
2 2 5 10 128
2 2 6 10.5 146
2 2 7 10.5 171
2 3 1 0 21
2 3 2 0 50
2 3 3 0 78
2 3 4 0 103
2 3 5 0 128
2 3 6 0 146
2 3 7 0 171
Any ideas how to proceed?
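One thing that may be worth checking (a hedged sketch, not an answer from the original thread): in the rows shown, every Plot:SubPlot:Treatment combination occurs exactly once, so that grouping factor would have as many levels as there are observations, which is what the error message complains about. The base-R check below rebuilds only the design columns of the 42 rows listed above with expand.grid(); Depth and Mass are left out, and the name design is hypothetical:

```r
# Design columns for the rows shown: 2 plots x 3 subplots x 7 treatments
design <- expand.grid(Treatment = 1:7, SubPlot = 1:3, Plot = 1:2)

# Levels of the grouping factor used in (1 | Plot:SubPlot:Treatment)
grp <- interaction(design$Plot, design$SubPlot, design$Treatment, drop = TRUE)
n_groups <- nlevels(grp)
n_obs <- nrow(design)

# The two numbers are equal, so no level has more than one observation
cat("levels:", n_groups, " observations:", n_obs, "\n")
```

If the same holds for the full data set, a random effect at a coarser level, for example (1 | Plot/SubPlot), might be a starting point; whether that matches the intended design is a separate question.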

Overwrite a value on a data.frame filtered with Dplyr - R

I have a data frame called cleantrain and need to change some values in the age column.
I can find the values to be replaced with the filter and select functions from dplyr.
> str(cleantrain)
'data.frame': 891 obs. of 9 variables:
$ train$PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ survived : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
$ Title : Factor w/ 17 levels "Capt","Col","Don",..: 12 13 9 13 12 12 12 8 13 13 ...
$ fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ sbsp : int 1 1 0 1 0 0 0 3 0 1 ...
$ parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ alone : Factor w/ 2 levels "0","1": 1 1 2 1 2 2 2 1 1 1 ...
$ familysize : Factor w/ 9 levels "1","2","3","4",..: 2 2 1 2 1 1 1 5 3 2 ...
$ age : num 22 38 26 35 35 NA 54 2 27 14 ...
# Column title is equal to "Master" and Column age is NA
> cleantrain %>% filter(Title == "Master" & is.na(age))
train$PassengerId survived Title fare sbsp parch alone familysize age
1 66 1 Master 15.2458 1 1 0 3 NA
2 160 0 Master 69.5500 8 2 0 11 NA
3 177 0 Master 25.4667 3 1 0 5 NA
4 710 1 Master 15.2458 1 1 0 3 NA
I just need to replace these NAs with 8.
Using mutate as below does not update the original cleantrain data.frame:
>cleantrain %>% filter(Title == "Master" & is.na(age)) %>% mutate(age = 8) # puts the right value in the right place
train$PassengerId survived Title fare sbsp parch alone familysize age
1 66 1 Master 15.2458 1 1 0 3 8
2 160 0 Master 69.5500 8 2 0 11 8
3 177 0 Master 25.4667 3 1 0 5 8
4 710 1 Master 15.2458 1 1 0 3 8
# ...but not permanently: when checking the data frame again, the values are still NA
>cleantrain %>% filter(Title == "Master" & is.na(age))
train$PassengerId survived Title fare sbsp parch alone familysize age
1 66 1 Master 15.2458 1 1 0 3 NA
2 160 0 Master 69.5500 8 2 0 11 NA
3 177 0 Master 25.4667 3 1 0 5 NA
4 710 1 Master 15.2458 1 1 0 3 NA
Can I use mutate to do this? Is there any dplyr/quick function that does not require for/if loops?
#learningR
The replace solution of @akrun will work if you want to update rows with a fixed value. More generally, I believe you have to use the ifelse function and assign the result back to cleantrain:
cleantrain <- cleantrain %>%
mutate(age = ifelse(Title == 'Master' & is.na(age),
8,
age))
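The same idea can be sketched in base R without any package, which also shows why the original attempt had no lasting effect: the modified values have to be assigned back. The toy data frame below is made up for illustration and only borrows the column names Title and age from the question; needs_fill is a hypothetical helper name:

```r
# Made-up stand-in for cleantrain with the two relevant columns
cleantrain <- data.frame(
  Title = c("Mr", "Master", "Master", "Mrs"),
  age   = c(22, NA, 4, NA),
  stringsAsFactors = TRUE
)

# Rows where Title is "Master" and age is missing
needs_fill <- cleantrain$Title == "Master" & is.na(cleantrain$age)

# Two equivalent ways to build the corrected column
filled_ifelse  <- ifelse(needs_fill, 8, cleantrain$age)
filled_replace <- replace(cleantrain$age, needs_fill, 8)

# Assign the result back, otherwise the data frame stays unchanged
cleantrain$age <- filled_ifelse
print(cleantrain)
```

Only the missing age of the "Master" row is filled; the missing age in the "Mrs" row stays NA because the condition does not match it.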

Moving the last column to a nth place in R [duplicate]

This question already has answers here:
How does one reorder columns in a data frame?
(12 answers)
Closed 2 years ago.
Good Day
I am trying to move the last column of a dataset to be the third column in a dataframe in R and was wondering what would be the most efficient way to do this.
My DataFrame structure is as follows:
str(HR)
'data.frame': 2940 obs. of 36 variables:
$ EmployeeNumber : int 1 2 3 4 5 6 7 8 9 10 ...
$ Attrition : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
$ Age : int 41 49 37 33 27 32 59 30 38 36 ...
$ BusinessTravel : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3
$ DailyRate : int 1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
$ Department : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
$ DistanceFromHome : int 1 8 2 3 2 2 3 24 23 27 ...
$ Education : int 2 1 2 4 1 2 3 1 3 3 ...
$ EducationField : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
$ EmployeeCount : int 1 1 1 1 1 1 1 1 1 1 ...
$ EnvironmentSatisfaction : int 2 3 4 4 1 4 3 4 4 3 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
$ HourlyRate : int 94 61 92 56 40 79 81 67 44 94 ...
$ JobInvolvement : int 3 2 2 3 3 3 4 3 2 3 ...
$ JobLevel : int 2 2 1 1 1 1 1 1 3 2 ...
$ JobRole : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
$ JobSatisfaction : int 4 2 3 3 2 4 1 3 3 3 ...
$ MaritalStatus : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
$ MonthlyIncome : int 5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
$ MonthlyRate : int 19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
$ NumCompaniesWorked : int 8 1 6 1 9 0 4 1 0 6 ...
$ Over18 : Factor w/ 1 level "Y": 1 1 1 1 1 1 1 1 1 1 ...
$ OverTime : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
$ PercentSalaryHike : int 11 23 15 11 12 13 20 22 21 13 ...
$ PerformanceRating : int 3 4 3 3 3 3 4 4 4 3 ...
$ RelationshipSatisfaction: int 1 4 2 3 4 3 1 2 2 2 ...
$ StandardHours : int 80 80 80 80 80 80 80 80 80 80 ...
$ StockOptionLevel : int 0 1 0 0 1 0 3 1 0 2 ...
$ TotalWorkingYears : int 8 10 7 8 6 8 12 1 10 17 ...
$ TrainingTimesLastYear : int 0 3 3 3 3 2 3 2 2 3 ...
$ WorkLifeBalance : int 1 3 3 3 3 2 2 3 3 2 ...
$ YearsAtCompany : int 6 10 0 8 2 7 1 1 9 7 ...
$ YearsInCurrentRole : int 4 7 0 7 2 7 0 0 7 7 ...
$ YearsSinceLastPromotion : int 0 1 0 3 2 3 0 0 1 7 ...
$ YearsWithCurrManager : int 5 7 0 0 2 6 0 0 8 7 ...
$ AttritionB : num 1 0 1 0 0 0 0 0 0 0 ...
and I am trying to have AttritionB come right after Attrition.
I have tried the following code, but it drops the rest of the columns:
HRCorForm = HR[,c(1,2,36:35)]
Kind Regards
Rehaan
This will get all your columns:
HRCorForm = HR[,c(1,2,36,3:35)]
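The answer above hard-codes 36 columns. A hedged variant that computes the last position with ncol() is sketched below on a made-up five-column stand-in for HR (the values and the variable last are illustrative only). It assumes the data frame has at least four columns, since 3:(last - 1) would count backwards otherwise:

```r
# Made-up miniature of HR; only the column order matters here
HR <- data.frame(
  EmployeeNumber = 1:3,
  Attrition      = c("Yes", "No", "Yes"),
  Age            = c(41, 49, 37),
  DailyRate      = c(1102, 279, 1373),
  AttritionB     = c(1, 0, 1)
)

last <- ncol(HR)

# Columns 1-2, then the last column, then everything that was in between
HRCorForm <- HR[, c(1:2, last, 3:(last - 1))]
print(names(HRCorForm))
```

With the 36-column HR from the question, the index vector evaluates to c(1, 2, 36, 3:35), the same as in the answer.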

Need to rank a dataset based on 3 columns in R [duplicate]

This question already has an answer here:
generate sequence within group in R [duplicate]
(1 answer)
Closed 6 years ago.
I have a dataset that was sorted using the order() function in R; it is shown below:
A B C
1 1 85
1 1 62
1 0 92
2 1 80
2 0 92
2 0 84
3 1 65
3 0 92
I need to add a rank that restarts within each value of column A; the expected output is shown below:
A B C Rank
1 1 85 1
1 1 62 2
1 0 92 3
2 1 80 1
2 0 92 2
2 0 84 3
3 1 65 1
3 0 92 2
Request for expertise in R
A simple base R solution using ave and seq_along is
df$Rank <- ave(df$B, df$A, FUN=seq_along)
which returns
df
A B C Rank
1 1 1 85 1
2 1 1 62 2
3 1 0 92 3
4 2 1 80 1
5 2 0 92 2
6 2 0 84 3
7 3 1 65 1
8 3 0 92 2
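seq_along returns the vector 1, 2, 3, ... up to the length of its argument. ave applies a function within groups, which are determined here by the variable A; the values of its first argument (df$B) do not matter for this purpose, only how many of them fall into each group.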
data
df <- read.table(header=TRUE, text="A B C
1 1 85
1 1 62
1 0 92
2 1 80
2 0 92
2 0 84
3 1 65
3 0 92")
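To look at the two building blocks of that answer in isolation, here is a small sketch that uses only the group labels from column A. The vector A is typed in by hand and Rank is just an illustrative name:

```r
# seq_along() numbers the elements of whatever it is given
print(seq_along(c("x", "y", "z")))

# Group labels as in column A of the question
A <- c(1, 1, 1, 2, 2, 2, 3, 3)

# ave() splits its first argument by A, applies FUN to each piece,
# and puts the results back in the original row order
Rank <- ave(seq_along(A), A, FUN = seq_along)
print(Rank)
```

Passing seq_along(A) as the first argument makes it explicit that only the group sizes drive the result, not the values of any data column.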

Replace a dot "." with NA in a dataframe in R

I have the following data frame:
obs zip age bed bath size lot exter garage fp price
1 1 1 3 21 3 3.0 951 64904 other 0 0 30000
2 2 2 3 21 3 2.0 1036 217800 frame 0 0 39900
3 3 3 4 7 1 1.0 676 54450 other 2 0 46500
4 4 4 3 6 3 2.0 1456 51836 other 0 1 48600
5 5 5 1 51 3 1.0 1186 10857 other 1 0 51500
6 6 6 2 19 3 2.0 1456 40075 frame 0 0 56990
7 7 7 3 8 3 2.0 1368 . frame 0 0 59900
8 8 8 4 27 3 1.0 994 11016 frame 1 0 62500
9 9 9 1 51 2 1.0 1176 6259 frame 1 1 65500
10 10 10 3 1 3 2.0 1216 11348 other 0 0 69000
11 11 11 4 32 3 2.0 1410 25450 brick 0 0 76900
12 12 12 3 2 3 2.0 1344 . other 0 1 79000
13 13 13 3 25 2 2.0 1064 218671 other 0 0 79900
14 14 14 1 31 3 1.5 1770 19602 brick 0 1 79950
15 15 15 4 29 3 2.0 1524 12720 brick 2 1 82900
16 16 16 3 16 3 2.0 1750 130680 frame 0 0 84900
17 17 17 3 20 3 2.0 1152 104544 other 2 0 85000
18 18 18 3 18 4 2.0 1770 10640 other 0 0 87900
19 19 19 4 28 3 2.0 1624 12700 brick 2 1 89900
20 20 20 2 27 3 2.0 1540 5679 brick 2 1 89900
with the following structure:
str(df)
'data.frame': 69 obs. of 12 variables:
$ Obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ zip : int 3 3 4 3 1 2 3 4 1 3 ...
$ age : int 21 21 7 6 51 19 8 27 51 1 ...
$ bed : int 3 3 1 3 3 3 3 3 2 3 ...
$ bath : num 3 2 1 2 1 2 2 1 1 2 ...
$ size : Factor w/ 66 levels ".","1036","1064",..: 65 2 64 14 6 14 10 66 5 7 ...
$ lot : Factor w/ 60 levels ".","10295","10400",..: 47 28 43 39 9 35 1 11 46 13 ...
$ exter : Factor w/ 3 levels "brick","frame",..: 3 2 3 3 3 2 2 2 2 3 ...
$ garage: int 0 0 2 0 1 0 0 1 1 0 ...
$ fp : int 0 0 0 1 0 0 0 0 1 0 ...
$ price : int 30000 39900 46500 48600 51500 56990 59900 62500 65500 69000 ...
As can be seen, the "lot" variable appears as a factor. I have the following questions about this data:
1. Why does R read the variable "lot" as a factor?
2. When I tried df$lot[df$lot == "."] <- NA, all dots (.) were replaced with <NA> rather than with NA as I wanted.
3. I then tried df$lot <- as.numeric(df$lot), but the numerical values of this variable changed completely, with the (.) being replaced by 1. What happened when I changed the variable's type?
4. How may I replace all dots (.) with NA?
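A hedged sketch addressing these points, using a few lot values copied from the table above (the vectors lot, lot_chr and lot_num are illustrative names). The column is read as text because of the non-numeric "." entries, and text columns become factors when stringsAsFactors is TRUE, which was the default before R 4.0. <NA> is simply how a missing value is printed inside a factor, so it is already a real NA. Calling as.numeric() directly on a factor returns the internal level codes rather than the labels, which is why the values change:

```r
# A few 'lot' values from the table, stored as a factor as in str(df)
lot <- factor(c("64904", "217800", ".", "11016"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(lot)
print(codes)

# Go through character instead, turning "." into NA before converting
lot_chr <- as.character(lot)
lot_chr[lot_chr == "."] <- NA
lot_num <- as.numeric(lot_chr)
print(lot_num)

# Alternatively, declare "." as the missing-value marker when reading the data
df <- read.table(header = TRUE, na.strings = ".", text = "size lot
951 64904
1368 .
994 11016")
str(df)
```

With na.strings = "." the column is numeric from the start, so no conversion step is needed afterwards.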
