Convert column of class factor into NA [duplicate] - r

How do I convert a column of class factor to numeric without disturbing the NAs present in it?
I do not want them converted to 0.
> Conceded
[1] 665 515 NA NA NA 67 98 15 31 NA NA NA NA NA 2195 2525 1756 6366 3143
[20] 7857 5926 2254 3199 4297 4568 2246 1506 2291
21 Levels: 15 1506 1756 2195 2246 2254 2291 2525 31 3143 3199 4297 4568 515 5926 6366 665 67 ... NA
> class(Conceded)
[1] "factor"
> as.numeric(Conceded)
[1] 17 14 21 21 21 18 20 1 9 21 21 21 21 21 4 8 3 16 10 19 15 6 11 12 13 5 2 7
1) How can I retain the NA values while converting a factor vector into a numeric vector?
2) What are the values that appear as a result of this conversion?
3) Why do I need to convert to a character vector first, and then to a numeric vector?

Convert to character first, and then to numeric. Otherwise as.numeric() returns the factor's internal integer level codes instead of the original values encoded in the level labels.
Example:
x <- factor(c(23,4,7,16, 10, NA))
as.numeric(x) # wrong values
as.numeric(as.character(x)) # correct values
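To make the difference concrete, here is a small sketch built on the same vector x as above. The names codes, via_character and via_levels are only illustrative. The second conversion, as.numeric(levels(x))[x], is the alternative idiom mentioned in R's ?factor help page; both keep NA entries as NA.

```r
x <- factor(c(23, 4, 7, 16, 10, NA))

# The levels are stored as text; the factor itself holds integer codes
# that index into those levels.
levels(x)                  # "4" "7" "10" "16" "23"
codes <- as.integer(x)     # 5 1 2 4 3 NA  (positions in levels(x), not values)

# Two ways to recover the original numbers; NA stays NA in both.
via_character <- as.numeric(as.character(x))
via_levels <- as.numeric(levels(x))[x]
```

In the question's printout the NA entries receive code 21, which suggests that NA is itself one of the 21 levels there; converting through as.character() should still return NA for those entries.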

Related

Conditional interpolation of time series data in R

I have time series data containing N/A values. The data are to end up in an animated scatterplot.
Week X Y
1 1 105
2 3 110
3 5 N/A
4 7 130
8 15 160
12 23 180
16 30 N/A
20 37 200
For a smooth animation, the data will be supplemented by calculated, additional values/rows. For the X values this is simply arithmetical. No problem so far.
Week X Y
1 1 105
2
2 3 110
4
3 5 N/A
6
4 7 130
8
9
10
11
12
13
14
8 15 160
16
17
18
19
20
21
22
12 23 180
24
25
26
27
28
29
16 30 N/A
31
32
33
34
35
36
20 37 200
The Y values should be interpolated, with one additional requirement: interpolation should only happen between two consecutive observed values, not between values that have an N/A between them.
Week X Value
1 1 105
2 interpolated value
2 3 110
4
3 5 N/A
6
4 7 130
8 interpolated value
9 interpolated value
10 interpolated value
11 interpolated value
12 interpolated value
13 interpolated value
14 interpolated value
8 15 160
16 interpolated value
17 interpolated value
18 interpolated value
19 interpolated value
20 interpolated value
21 interpolated value
22 interpolated value
12 23 180
24
25
26
27
28
29
16 30 N/A
31
32
33
34
35
36
20 37 200
I have already experimented with approx, converted the "original" N/A to placeholder values, and tried the zoo package with na.approx etc., but I can't work out how to express the right condition for this kind of "conditional approximation" or "conditional gap filling". Any hint is welcome and very much appreciated.
Thanks in advance
Replace the NAs with Inf, interpolate and then revert infinite values to NA.
library(zoo)
DF2 <- DF
DF2$Y[is.na(DF2$Y)] <- Inf
w <- merge(DF2, data.frame(Week = min(DF2$Week):max(DF2$Week)), by = 1, all.y = TRUE)
w$Value <- na.approx(w$Y)
w$Value[!is.finite(w$Value)] <- NA
giving the following where Week has been expanded to all weeks, Y is such that the original NAs are shown as Inf and the inserted NAs as NA. Value is the interpolated Y.
> w
Week X Y Value
1 1 1 105 105.0
2 2 3 110 110.0
3 3 5 Inf NA
4 4 7 130 130.0
5 5 NA NA 137.5
6 6 NA NA 145.0
7 7 NA NA 152.5
8 8 15 160 160.0
9 9 NA NA 165.0
10 10 NA NA 170.0
11 11 NA NA 175.0
12 12 23 180 180.0
13 13 NA NA NA
14 14 NA NA NA
15 15 NA NA NA
16 16 30 Inf NA
17 17 NA NA NA
18 18 NA NA NA
19 19 NA NA NA
20 20 37 200 200.0
Note: Input DF in reproducible form:
Lines <- "
Week X Y
1 1 105
2 3 110
3 5 N/A
4 7 130
8 15 160
12 23 180
16 30 N/A
20 37 200"
DF <- read.table(text = Lines, header = TRUE, na.strings = "N/A")
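As a cross-check that does not rely on the Inf trick, the same rule can be sketched in base R by interpolating each pair of consecutive observations separately and skipping any pair in which one end is NA. This is an alternative sketch, not the method of the answer above; the names out, ends and inside are illustrative.

```r
DF <- read.table(text = "
Week X Y
1 1 105
2 3 110
3 5 N/A
4 7 130
8 15 160
12 23 180
16 30 N/A
20 37 200", header = TRUE, na.strings = "N/A")

# One row per week; start with the observed values in place.
out <- data.frame(Week = min(DF$Week):max(DF$Week))
out$Value <- NA_real_
out$Value[match(DF$Week, out$Week)] <- DF$Y

# Fill the weeks strictly between two consecutive observations,
# but only when neither of the two observations is NA.
for (i in seq_len(nrow(DF) - 1)) {
  ends <- c(i, i + 1)
  if (!anyNA(DF$Y[ends])) {
    inside <- out$Week > DF$Week[i] & out$Week < DF$Week[i + 1]
    if (any(inside)) {
      out$Value[inside] <- approx(DF$Week[ends], DF$Y[ends],
                                  xout = out$Week[inside])$y
    }
  }
}
```

Like the answer, this interpolates along Week; weeks 5 to 7 and 9 to 11 are filled, while weeks 13 to 19 stay NA because week 16 is missing.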

Adding data frame below another data frame [duplicate]

I want to do the following.
I have a data frame of actual sales:
Dates Actual
24/04/2017 58
25/04/2017 59
26/04/2017 58
27/04/2017 154
28/04/2017 117
29/04/2017 127
30/04/2017 178
and another data frame of predicted values:
Dates Predicted
01/05/2017 68.54159
02/05/2017 90.7313
03/05/2017 82.76875
04/05/2017 117.48913
05/05/2017 110.3809
06/05/2017 156.53363
07/05/2017 198.14819
I want to add the predicted sales data frame below the actual sales data frame in the following manner:
Dates Actual Predicted
24/04/2017 58
25/04/2017 59
26/04/2017 58
27/04/2017 154
28/04/2017 117
29/04/2017 127
30/04/2017 178
01/05/2017 68.54159
02/05/2017 90.7313
03/05/2017 82.76875
04/05/2017 117.48913
05/05/2017 110.3809
06/05/2017 156.53363
07/05/2017 198.14819
With:
library(dplyr)
bind_rows(d1, d2)
you get:
Dates Actual Predicted
1 24/04/2017 58 NA
2 25/04/2017 59 NA
3 26/04/2017 58 NA
4 27/04/2017 154 NA
5 28/04/2017 117 NA
6 29/04/2017 127 NA
7 30/04/2017 178 NA
8 01/05/2017 NA 68.54159
9 02/05/2017 NA 90.73130
10 03/05/2017 NA 82.76875
11 04/05/2017 NA 117.48913
12 05/05/2017 NA 110.38090
13 06/05/2017 NA 156.53363
14 07/05/2017 NA 198.14819
Or with:
library(data.table)
rbindlist(list(d1,d2), fill = TRUE)
Or with:
library(plyr)
rbind.fill(d1,d2)
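If none of these packages is available, roughly the same result can be sketched in base R by padding each data frame with the columns it lacks and then calling rbind. The frames d1 and d2 below are cut-down stand-ins built from the first rows of the question's data, and pad is a hypothetical helper, not a function from any package.

```r
d1 <- data.frame(Dates = c("24/04/2017", "25/04/2017"),
                 Actual = c(58, 59),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Dates = c("01/05/2017", "02/05/2017"),
                 Predicted = c(68.54159, 90.7313),
                 stringsAsFactors = FALSE)

all_cols <- union(names(d1), names(d2))

# Hypothetical helper: add every missing column as NA, then fix the column order.
pad <- function(d) {
  for (col in setdiff(all_cols, names(d))) {
    d[[col]] <- NA
  }
  d[all_cols]
}

combined <- rbind(pad(d1), pad(d2))
```

The padded columns are filled with NA, so combined has the same Dates / Actual / Predicted layout as the bind_rows output shown above.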

Sorting a matrix by a vector with correct numeration

I am trying to sort a matrix by a vector, and it is only partially working. I have a matrix g (2 columns, id and nobs) that I sort by the vector id.
My code is:
g[order(id),]
The sorting is OK, but I end up with this result:
id nobs
6 30 932
5 29 711
4 28 475
3 27 338
2 26 586
1 25 463
I am looking for output like this:
id nobs
1 30 932
2 29 711
3 28 475
4 27 338
5 26 586
6 25 463
What is the first column with the numbering 1 to 6, and how do I change it?
R 3.2.1, Windows 10
The first number of each line is just the name of the row. If you want/need to fix it, you can just use the following (after the ordering):
m <- g[order(id),]
rownames(m) <- 1:nrow(g)
and it should look the way you want it.
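A minimal sketch of the effect, assuming g is a data frame (the question calls it a matrix, but the printout looks like a data frame) and using a descending sort so the result matches the output shown in the question. For a data frame, assigning NULL to the row names resets them to 1, 2, ..., n, which is equivalent to the explicit renumbering above.

```r
g <- data.frame(id = 25:30, nobs = c(463, 586, 338, 475, 711, 932))

# Sorting keeps each row's original name, so the names come out reversed.
m <- g[order(g$id, decreasing = TRUE), ]
old_names <- rownames(m)    # "6" "5" "4" "3" "2" "1"

# Reset the row names to the default sequence.
rownames(m) <- NULL
```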

Changing factors to Integers without changing the order of the data

I have the following data and am trying to change CCG and pract to numbers so that I can use Stan or WinBUGS. When I try, the order of the data seems to change.
I want to change CCG and pract to numbers without changing the order of the data. I tried hard but could not do it.
I am struggling more with this basic issue than with writing the BUGS code. Please help.
Here is the data:
CCG pract Deno Numer Points Excep
1 01C N81049 49 46 4 4
2 01C N81022 28 26 4 23
3 01C N81632 66 64 4 4
4 01C N81069 15 14 4 3
5 01C N81062 98 89 4 9
6 01C N81033 31 28 4 9
I tried to convert to integer using as.integer(), and I am getting:
CCG pract Deno Numer Points Excep
1 20 6621 160 144 41 36
2 20 6594 130 117 41 18
3 20 6698 179 164 41 36
4 20 6640 57 46 41 25
5 20 6633 214 191 41 62
6 20 6605 137 119 41 62
Judging by Deno and Numer, the order of the data appears to have changed. Why does CCG not start from 1?
I want this:
CCG pract Deno Numer Points Excep
1 01C N81049 49 46 4 4
2 01C N81022 28 26 4 23
3 01C N81632 66 64 4 4
4 01C N81069 15 14 4 3
5 01C N81062 98 89 4 9
6 01C N81033 31 28 4 9
to change to something like this:
CCG pract Deno Numer Points Excep
1 1 1 49 46 4 4
2 1 1 28 26 4 23
3 1 1 66 64 4 4
4 1 1 15 14 4 3
5 1 1 98 89 4 9
6 1 1 31 28 4 9
Please help me..
In R, factors are stored internally as integer codes that index a table of the factor levels. As far as I know, these codes are assigned according to the lexicographic (character-string) order of the levels, so the level 57 gets a higher code than 238.
as.integer() extracts this internal integer coding. As you found out, this is not very useful. (I honestly don't understand why R does this when as.integer() is applied to factors whose levels are integers.)
Solution: first convert to character, then to integer:
as.integer(as.character(Deno))
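The character round trip only works for columns whose labels are numbers, such as Deno. For label columns such as CCG and pract, one possible sketch is to number the labels in order of first appearance, which leaves the rows where they are. The data frame df is a cut-down stand-in for the question's data, and first_seen_id is a hypothetical helper introduced here for illustration.

```r
df <- data.frame(
  CCG   = factor(c("01C", "01C", "01C")),
  pract = factor(c("N81049", "N81022", "N81632")),
  Deno  = factor(c("49", "28", "66"))
)

# Internal codes follow the sorted levels "28" "49" "66", not the values.
deno_codes <- as.integer(df$Deno)               # 2 1 3

# Numeric-looking factor: go through character to get the real numbers.
df$Deno_num <- as.integer(as.character(df$Deno))

# Hypothetical helper: integer IDs in order of first appearance.
first_seen_id <- function(f) {
  labels <- as.character(f)
  match(labels, unique(labels))
}
df$CCG_id   <- first_seen_id(df$CCG)
df$pract_id <- first_seen_id(df$pract)
```

None of these steps reorders rows; only the displayed values change, which is why the as.integer() output in the question looks shuffled even though the rows are in their original positions.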

Transforming long format data to short format by segmenting dates that include redundant observations

I have a data set that is long format and includes exact date/time measurements of 3 scores on a single test administered between 3 and 5 times per year.
ID Date Fl Er Cmp
1 9/24/2010 11:38 15 2 17
1 1/11/2011 11:53 39 11 25
1 1/15/2011 11:36 39 11 39
1 3/7/2011 11:28 95 58 2
2 10/4/2010 14:35 35 9 6
2 1/7/2011 13:11 32 7 8
2 3/7/2011 13:11 79 42 30
3 10/12/2011 13:22 17 3 18
3 1/19/2012 14:14 45 15 36
3 5/8/2012 11:55 29 6 11
3 6/8/2012 11:55 74 37 7
4 9/14/2012 9:15 62 28 18
4 1/24/2013 9:51 82 45 9
4 5/21/2013 14:04 135 87 17
5 9/12/2011 11:30 98 61 18
5 9/15/2011 13:23 55 22 9
5 11/15/2011 11:34 98 61 17
5 1/9/2012 11:32 55 22 17
5 4/20/2012 11:30 23 4 17
I need to transform this data to short format with time bands based on month (i.e. Fall=August-October; Winter=January-February; Spring=March-May). Some bands will include more than one observation per participant, and as such, will need a "spill over" band. An example transformation for the Fl scores below.
ID Fall1Fl Fall2Fl Winter1Fl Winter2Fl Spring1Fl Spring2Fl
1 15 NA 39 39 95 NA
2 35 NA 32 NA 79 NA
3 17 NA 45 NA 28 74
4 62 NA 82 NA 135 NA
5 98 55 55 NA 23 NA
Notice that dates which are "redundant" (i.e. more than one Aug-Oct observation) spill over into the Fall2Fl column. Dates that fall outside the desired bands (i.e. November, December, June, July) should be deleted. The final data set should have the same set of columns for each of Fl, Er and Cmp.
Any help would be appreciated!
(Link to .csv file with long data http://mentor.coe.uh.edu/Data_Example_Long.csv )
This seems to do what you are looking for, but doesn't exactly match your desired output. I haven't looked at your sample data to see whether the problem lies with your sample desired output or the transformations I've done, but you should be able to follow along with the code to see how the transformations were made.
## Load the packages used below (assumed here: lubridate for month()/year(),
## reshape2 for dcast())
library(lubridate)
library(reshape2)
## Convert dates to actual date-time values
## (POSIXct is stored more cleanly in a data frame column than POSIXlt)
mydf$Date <- as.POSIXct(strptime(gsub("/", "-", mydf$Date), format = "%m-%d-%Y %H:%M"))
## Factor the months so we can get the "seasons" that you want
Months <- factor(month(mydf$Date), levels = 1:12)
levels(Months) <- list(Fall = 8:10,
                       Winter = 1:2,
                       Spring = 3:5,
                       Other = c(6, 7, 11, 12))
mydf$Seasons <- Months
## Drop the "Other" seasons
mydf <- mydf[mydf$Seasons != "Other", ]
## Add a "Year" column
mydf$Year <- year(mydf$Date)
## Add a "Times" column: a running count within each ID and Year
mydf$Times <- as.numeric(ave(as.character(mydf$Seasons),
                             mydf$ID, mydf$Year, FUN = seq_along))
## Use `dcast` on just one variable.
## Repeat for other variables by changing the "value.var".
## "Fluency" must match the score column's name in your data
## (it is shown as Fl in the question's table).
dcast(mydf, ID ~ Seasons + Times, value.var = "Fluency")
#   ID Fall_1 Fall_2 Winter_1 Winter_2 Spring_2 Spring_3
# 1  1     15     NA       39       39       NA       95
# 2  2     35     NA       32       NA       79       NA
# 3  3     17     NA       45       NA       29       NA
# 4  4     62     NA       82       NA      135       NA
# 5  5     98     55       55       NA       23       NA
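One likely reason for columns such as Spring_2 and Spring_3 is that the Times counter above runs across all seasons within an ID and Year. A possible adjustment, sketched here on a hand-typed subset of IDs 1 and 5 (with the November row already dropped), is to restart the counter within each season as well. The data frame obs is illustrative only, and this is an assumption about the intent, not part of the original answer.

```r
obs <- data.frame(
  ID      = c(1, 1, 1, 1, 5, 5, 5, 5),
  Year    = c(2010, 2011, 2011, 2011, 2011, 2011, 2012, 2012),
  Seasons = c("Fall", "Winter", "Winter", "Spring",
              "Fall", "Fall", "Winter", "Spring"),
  stringsAsFactors = FALSE
)

# Count observations within each ID, Year and Season, so that every
# season starts again at 1 and a second observation becomes 2.
obs$Times <- ave(seq_along(obs$ID), obs$ID, obs$Year, obs$Seasons,
                 FUN = seq_along)
```

With this counter, ID 1's two January observations become Winter 1 and Winter 2 and its March observation becomes Spring 1, which is closer to the layout requested in the question.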

Resources