I am encountering a baffling error.
I am using the following function to delete the rows of a data frame that contain an NA in any column:
##### removes NA'd rows from a dataFrame
wipeNArows <- function(X){
  rowsToDelete <- unique(unlist(apply(apply(X, 2, is.na), 2, which)))
  if (length(rowsToDelete) > 0){
    return (X[-rowsToDelete, ])
  }
  else{
    return (X)
  }
}
This function normally works fine. For instance, here is a reproducible example:
testFrame<-data.frame(x=rpois(20,10),y=rpois(20,10),z=rpois(20,10))
rowsToDelete<-sample(1:nrow(testFrame),5,FALSE)
testFrame$x[rowsToDelete]<-NA
testFrame
wipeNArows(testFrame) ### removes the rows where NA is encountered
Now I have a data frame with about 2993 rows. When I pass it through the function, I get the following error:
Error in apply(apply(X, 2, is.na), 2, which) :
error in evaluating the argument 'X' in selecting a method for function 'apply': Error in as.matrix.data.frame(X) :
dims [product 14965] do not match the length of object [14974]
Thanks for any responses.
It works fine for me, but why not use complete.cases (see ?complete.cases)?
testFrame[complete.cases(testFrame),]
x y z
2 10 8 13
3 11 16 18
4 11 7 7
6 8 8 14
7 9 11 11
8 12 11 5
9 10 7 4
10 7 12 9
11 10 13 11
12 9 12 10
13 10 5 8
14 13 5 8
15 11 5 5
18 13 14 7
19 2 13 8
identical(testFrame[complete.cases(testFrame),], wipeNArows(testFrame))
[1] TRUE
Another way to solve your problem is na.omit:
na.omit(testFrame)
x y z
2 7 11 11
3 12 10 10
4 13 10 9
6 11 10 12
7 13 14 8
8 7 9 7
9 8 11 12
10 5 10 7
11 5 15 9
12 7 13 9
15 15 8 9
16 13 7 15
17 5 10 12
18 9 8 6
20 18 7 6
Hmm, thanks for the replies.
I wasn't aware of the complete.cases function, but it gives another error:
Error in complete.cases(dFrame) : not all arguments have the same length
The question "chisq.test Error Message" appears to address this issue, at least in part.
The problem with the data frame turned out to be a POSIXlt column holding dates. Evidently the internals of complete.cases and apply do not handle this column type well. The workaround is to convert the column to character with strftime before removing the rows, and then convert it back with strptime.
Thanks.
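A different workaround (swapped in here, not the one the poster used) is to store the date column as POSIXct rather than POSIXlt: POSIXlt is internally a list of date-time components, whereas POSIXct is a plain numeric vector of seconds. The sketch below assumes hypothetical sample timestamps and a hypothetical column name `when`; it converts with as.POSIXct before calling complete.cases.

```r
# Hypothetical example data: date-times parsed as POSIXlt, one of them missing
stamps <- as.POSIXlt(c("2013-01-01 10:00:00", NA, "2013-01-03 09:15:00"), tz = "UTC")

df <- data.frame(id = 1:3, value = c(1.5, 2.0, NA))
# Store the column as POSIXct (an atomic vector of seconds) instead of the
# list-based POSIXlt, so row-wise NA checks see one value per row
df$when <- as.POSIXct(stamps)

clean <- df[complete.cases(df), ]
print(clean)  # row 2 has an NA date and row 3 an NA value, so only row 1 is kept
```

If you need POSIXlt fields such as `$hour` later, you can convert back with as.POSIXlt after the NA rows have been removed.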
In general, if your data contain no NAs and you still get this error, then, as Aditya Sihag suggested, the problem may be that one of your data.frame columns holds list-like objects, such as a list or a POSIXlt object. You can either cast that column, or use lapply on the column alone. Either way, check whether the column's data type is a list or POSIXlt before applying lapply, and if it is, cast it first.
Without the problem data, I can only suggest a different function:
wipe_na_rows <- function(X){
X[!apply(X, 1, function(x) any(is.na(x))),]
}
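To check that the approaches in this thread agree, here is a small sketch that runs wipe_na_rows, complete.cases and na.omit on the test frame from the question. The set.seed() call is added only so the run is reproducible. na.omit also records the dropped rows in an "na.action" attribute, so it is compared by row names rather than with identical().

```r
set.seed(1)  # fixed seed so the random test data are reproducible
testFrame <- data.frame(x = rpois(20, 10), y = rpois(20, 10), z = rpois(20, 10))
testFrame$x[sample(1:nrow(testFrame), 5, FALSE)] <- NA

wipe_na_rows <- function(X) {
  # keep only the rows in which no column is NA
  X[!apply(X, 1, function(x) any(is.na(x))), ]
}

a <- wipe_na_rows(testFrame)
b <- testFrame[complete.cases(testFrame), ]
d <- na.omit(testFrame)  # same rows, plus attr(d, "na.action")

identical(a, b)                      # same rows and row names
identical(rownames(a), rownames(d))  # na.omit keeps the same rows
```

Note that apply() converts the data frame with as.matrix(), so wipe_na_rows is also likely to fail on a POSIXlt column.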
I am trying to create an index for a data frame. Each team has its own row, and I would like to add an index column so that the first two teams get 'Game 0', the next two get 'Game 1', and so on, up to half the number of teams. In Python the code would look like this:
gamenumber = []
for i in range(0, int(len(teams)/2)):
    gamenumber.append('Game '+str(i))
    gamenumber.append('Game '+str(i))
I am unfamiliar with R, so any help would be appreciated!
This will give you a list of paired index numbers:
> teams=1:100
> data.frame("Games"=sort(c(1:(length(teams)/2), 1:(length(teams)/2))))
Games
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4
9 5
10 5
11 6
12 6
13 7
14 7
15 8
16 8
17 9
18 9
19 10
20 10 #etc.
Assuming teams is a data.frame with an even number of rows:
rep(1:(nrow(teams)/2), each=2)
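To reproduce the Python output exactly, including the 'Game ' prefix and numbering from 0, you can combine the rep(..., each = 2) idea with paste(). This is a minimal sketch: the six-row `teams` data frame and its `team` column are hypothetical stand-ins, and it assumes an even number of rows.

```r
# Hypothetical stand-in for the teams data frame: 6 teams, one per row
teams <- data.frame(team = c("A", "B", "C", "D", "E", "F"))

# Each game number appears twice and starts at 0, like range() in Python;
# paste() inserts a space between "Game" and the number
teams$gamenumber <- paste("Game", rep(0:(nrow(teams) / 2 - 1), each = 2))
teams
```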
Raw data   na.approx   Desired result
 1          1           1
NA          3           4
 5          5           5
 6          6           6
 7          7           7
NA          8           4
NA          9           7
10         10          10
13         11          13
14         12          14
By default, I believe na.approx (from the zoo package) interpolates each NA linearly between two known values, one before and one after the NA (see the "na.approx" column above). Is there a way to change this so that each NA is computed from the next two known values instead? For example, the first NA should be computed from 5 and 6, not from 1 and 5.
I am not sure there is an exact equivalent of what you want to do, but you can get similar results as follows. This works backwards through the NAs and fills each one with the next value minus 1:
> data <- c(1, NA, 5,6,7,NA,NA,10,13,14)
> ind <- which(is.na(data))
> sapply(rev(ind), function(i) data[i] <<- data[i + 1] - 1)
> data
[1] 1 4 5 6 7 8 9 10 13 14
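The "desired result" column in the question matches extending the straight line through the next two known values back to the NA's position. For example, 10 and 13 give a slope of 3, which yields 7 and 4 for the two NAs before them. Here is a sketch of that rule as a hand-written loop. It is not an option of na.approx, and the helper name fill_from_next_two is hypothetical. Only original non-NA values are used, so filled values never feed into later fills.

```r
# Raw data from the question
data <- c(1, NA, 5, 6, 7, NA, NA, 10, 13, 14)

# Hypothetical helper: fill each NA from the line through the next two
# originally known values, extended back to the NA's position
fill_from_next_two <- function(x) {
  known <- which(!is.na(x))
  out <- x
  for (i in which(is.na(x))) {
    after <- known[known > i]
    if (length(after) < 2) next  # fewer than two known values follow: leave NA
    j <- after[1]
    k <- after[2]
    slope <- (x[k] - x[j]) / (k - j)
    out[i] <- x[j] + slope * (i - j)
  }
  out
}

fill_from_next_two(data)  # same values as the "Desired result" column
```

Trailing NAs with fewer than two known values after them are left as NA. You would need a separate rule for those.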
Simple question, I think. I want to use "less than or equal to a number" as the condition for selecting a row based on one column, and then get the value on the same row in another column. But what happens if the number in the condition isn't present in the first column?
Let's assume this is my data frame:
df<-as.data.frame((matrix(c(1:10,11:20), nrow = 10, ncol = 2)))
df
V1 V2
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
Let's assume I want to apply the condition <= 5 to df$V1 to find the row, and then read the value on that row in df$V2.
df[which(df$V1 <= 5),2]
15
But what happens if the number used in the condition isn't found? Let's assume this is my new data.frame:
V1 V2
1 1 11
2 2 12
3 3 13
4 4 14
5 6 15
6 7 16
7 8 17
8 9 18
9 10 19
10 11 20
Using the same command, df[which(df$V1 <= 5),2], I get a different answer: for some reason I get the entire column instead of one number.
11 12 13 14 15 16 17 18 19 20
Any suggestions?
Use logical indexing directly, without which():
df[df[,1] <= 5, 2]
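Note that which(df$V1 <= 5) returns every row that meets the condition, so indexing with it returns several V2 values, not one. If the goal is a single number, namely the V2 value on the last row whose V1 is still <= 5 (15 in the first example), a sketch could look like this. The findInterval() line additionally assumes that V1 is sorted in increasing order.

```r
# The second data frame from the question (5 is missing from V1)
df <- data.frame(V1 = c(1:4, 6:11), V2 = 11:20)

rows <- which(df$V1 <= 5)  # all matching rows: 1 2 3 4
df[rows, 2]                # therefore several values: 11 12 13 14

# V2 on the last row whose V1 is <= 5
# (if no row matches, max() of an empty vector gives -Inf with a warning)
df$V2[max(rows)]

# Alternative for sorted V1: findInterval() counts how many V1 values are <= 5
df$V2[findInterval(5, df$V1)]
```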
Is it possible to find the corresponding rows of one data frame in another data frame using R commands, and then store the result in a third data frame?
Example:
data1 = airquality[1:14,]
data2 = data.frame(index=data1$Ozone[6:14])
I want to get, in another data frame, the dates corresponding to the matching rows of these two data frames, treating the Ozone values of data1 as the index.
So what I finally want to get is something like this in data3:
index Month Day
28 5 6
23 5 7
19 5 8
8 5 9
NA 5 10
7 5 11
16 5 12
11 5 13
14 5 14
You could use the %in% operator:
data3 <- data1[data1$Ozone %in% data2$index, c("Ozone", "Month", "Day")]
data3
Ozone Month Day
5 NA 5 5
6 28 5 6
7 23 5 7
8 19 5 8
9 8 5 9
10 NA 5 10
11 7 5 11
12 16 5 12
13 11 5 13
14 14 5 14
You have NAs in your index example. %in% treats NA as matching NA, so every row of data1 whose Ozone is NA ends up in the resulting data.frame (here rows 5 and 10). Unless you want all of those rows, avoid NAs in the values you match on.
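If you would rather drop the NA matches altogether, exclude them explicitly, because NA %in% something-containing-NA is TRUE. This is a minimal sketch using the same data1/data2 as the question. The variable name `keep` is just for illustration.

```r
data1 <- airquality[1:14, ]
data2 <- data.frame(index = data1$Ozone[6:14])

NA %in% data2$index  # TRUE: %in% (via match()) lets NA match NA

# keep rows whose Ozone is a real (non-NA) value found in data2$index
keep  <- !is.na(data1$Ozone) & data1$Ozone %in% data2$index
data3 <- data1[keep, c("Ozone", "Month", "Day")]
data3  # rows 6-9 and 11-14; the NA rows 5 and 10 are gone
```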
If you wanted to use row names, you could do something like this:
data1[!rownames(data1) %in% 1:5, c("Ozone", "Month", "Day")]
Ozone Month Day
6 28 5 6
7 23 5 7
8 19 5 8
9 8 5 9
10 NA 5 10
11 7 5 11
12 16 5 12
13 11 5 13
14 14 5 14
Consider some vector x in R:
x<-1:10
I'd like to create a repeating sequence of x, with the first element of each sequence truncated with each repetition, yielding the same output as would be given by issuing the following command in R:
c(1:10,2:10,3:10,4:10,5:10,6:10,7:10,8:10,9:10,10:10)
Can this be done? In reality, I'm working with a much larger vector for x. I'm playing with numerous combinations of the rep() function, to no avail.
Here's an alternative using mapply:
unlist(mapply(":", 1:10, 10))
# [1] 1 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 3 4 5 6 7
# [25] 8 9 10 4 5 6 7 8 9 10 5 6 7 8 9 10 6 7 8 9 10 7 8 9
# [49] 10 8 9 10 9 10 10
It is a bit of a hack, but you can decompose what you are trying to do into two sequences:
rep(0:9, 10:1) + sequence(10:1)
You can run each part separately to see what it does. I don't know if there is a way to feed the parameters to rep() or seq() as you would with a Python expansion.
Or, using sapply:
unlist(sapply(1:10, function(x) { x:10 }))
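The answers above hard-code the values 1:10. For a general (and larger) vector x whose values are not simply 1:n, a sketch can build index positions with the rep() + sequence() decomposition shown earlier and then subset x. The helper name truncated_repeats is hypothetical.

```r
# Hypothetical helper: concatenate x[1:n], x[2:n], ..., x[n:n] for any vector x
truncated_repeats <- function(x) {
  n <- length(x)
  if (n == 0) return(x)  # n:1 would be 0:1 for an empty vector
  # start position of each block (1 repeated n times, 2 repeated n-1 times, ...)
  # plus the offset within the block (0, 1, 2, ...) from sequence()
  x[rep(seq_len(n), n:1) + sequence(n:1) - 1]
}

truncated_repeats(1:10)
truncated_repeats(c("a", "b", "c"))  # "a" "b" "c" "b" "c" "c"
```

Because it works on positions rather than values, the same function handles character, numeric or factor vectors. The index vector it builds has length n(n+1)/2, which can be large for very long x.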