Remove rows from a single-column data frame - r

When I try to remove the last row from a single column data frame, I get a vector back instead of a data frame:
> df = data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> df[-(length(df[,1])),]
[1] 1 2 3 4 5 6 7 8 9
The behavior I'm looking for is what happens when I use this command on a two-column data frame:
> df = data.frame(a=1:10,b=11:20)
> df
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
> df[-(length(df[,1])),]
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
My code is general, and I don't know a priori whether the data frame will contain one or many columns. Is there an easy workaround for this problem that will let me remove the last row no matter how many columns exist?

Add the drop = FALSE option. By default, `[` simplifies a single-column result to a vector; drop = FALSE keeps it as a data frame:
R> df[-(length(df[,1])), , drop = FALSE]
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
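As a minimal sketch of the same idea, nrow() can replace length(df[,1]), and head() with a negative n is another way to drop trailing rows. The names res1 and res2 are only illustrative:

```r
# One-column data frame, as in the question
df <- data.frame(a = 1:10)

# drop = FALSE keeps the data frame structure even with a single column
res1 <- df[-nrow(df), , drop = FALSE]

# head() with a negative n drops the last n rows and returns a data frame
res2 <- head(df, -1)

str(res1)
str(res2)
```

Both forms should behave the same way regardless of how many columns the data frame has.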

Related

How to replace NA values in one column of a data frame, with values from a column in a different data frame?

How do I replace the NA values in 'example' with the corresponding values in 'example2'? So 7 would take the place of the first NA, 8 would take the place of the second NA, and so on. My data is much larger, so I would not be able to replace the values individually for the multiple NAs. Thanks
example <- data.frame('count' = c(1,3,4,NA,8,NA,9,0,NA,NA,7,5,8,NA))
example2 <- data.frame('count' = c(7,8,4,6,7))
Another possible solution, based on replace:
example$count <- replace(example$count, is.na(example$count), example2$count)
example
#> count
#> 1 1
#> 2 3
#> 3 4
#> 4 7
#> 5 8
#> 6 8
#> 7 9
#> 8 0
#> 9 4
#> 10 6
#> 11 7
#> 12 5
#> 13 8
#> 14 7
You can try:
example[is.na(example),] <- example2
Which will give you:
count
1 1
2 3
3 4
4 7
5 8
6 8
7 9
8 0
9 4
10 6
11 7
12 5
13 8
14 7
EDIT: Since you probably have more than one column in your data frames, you should use:
example$count[is.na(example$count)] <- example2$count
Another option using which to check the index of NA values:
ind <- which(is.na(example$count))
example[ind, "count"] <- example2$count
Output:
count
1 1
2 3
3 4
4 7
5 8
6 8
7 9
8 0
9 4
10 6
11 7
12 5
13 8
14 7
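All of the approaches above rely on the number of NA values matching the length of the replacement vector; otherwise R may recycle values or raise an error. A hedged sketch that checks this first (na_idx is an illustrative name):

```r
example  <- data.frame(count = c(1, 3, 4, NA, 8, NA, 9, 0, NA, NA, 7, 5, 8, NA))
example2 <- data.frame(count = c(7, 8, 4, 6, 7))

# Positions of the missing values, in order
na_idx <- which(is.na(example$count))

# Fail early if the replacement vector does not line up with the gaps
stopifnot(length(na_idx) == length(example2$count))

# Fill the gaps in order of appearance
example$count[na_idx] <- example2$count
example$count
```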

R: row-wise checking for multiple values

I have a dataset that looks like this
With further rows below. I want to create a column to the right that will have 1 if it matches with a certain value I am checking for row-wise and otherwise it will be 0.
For a single value I have the following code -
set.seed(4991)
my_data <- data.frame(ceiling(matrix(runif(100,4,10),ncol = 5)))
comval <- c(5)
my_data$bleh <- as.integer(apply(my_data, 1, function(r) any(comval %in% r)))
The output looks like this -
Which is what I want. Now the issue I am having is that if I have two or more values under 'comval' , for instance,
comval<-c(5,10)
I am getting 1 in the 'bleh' column for all rows that have either 5 or 10. The output is like -
It is like an OR logical operator. I need it to work as an AND logical operator, that is, 'bleh' column will have the value 1 only if all the values in 'comval' are there in the rows.
Also, I am trying to write a function here so I need to take the length(comval) as an input and then check for all the values in 'comval' against each row.
You could check whether the length of the intersect equals length(comval), i.e. whether every value in comval appears in the row.
my_data$bleh <- as.integer(apply(my_data, 1, function(r) {
length(intersect(comval, unlist(r))) == length(comval)
}))
# X1 X2 X3 X4 X5 bleh
# 1 5 10 5 6 10 1
# 2 9 9 5 8 6 0
# 3 5 10 5 5 5 1
# 4 10 8 6 5 8 1
# 5 8 6 7 9 10 0
# 6 5 10 8 10 8 1
# 7 9 8 10 5 7 1
# 8 6 8 10 6 7 0
# 9 5 5 6 6 8 0
# 10 10 5 8 6 8 1
# 11 9 10 10 7 7 0
# 12 6 8 7 10 8 0
# 13 6 9 7 6 9 0
# 14 8 6 6 10 7 0
# 15 9 9 5 7 7 0
# 16 10 9 9 10 6 0
# 17 7 10 5 10 8 1
# 18 9 8 10 9 9 0
# 19 10 8 9 6 8 0
# 20 5 8 6 7 5 0
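Since the question asks for a function that accepts any number of values, here is a minimal sketch of one. The helper name has_all and the toy data frame are made up for illustration; all(comval %in% r) gives the AND behaviour, whereas any() would give OR:

```r
# Hypothetical helper: returns 1 when a row contains every value in comval
has_all <- function(data, comval) {
  as.integer(apply(data, 1, function(r) all(comval %in% r)))
}

# Small made-up data frame for illustration
toy <- data.frame(X1 = c(5, 9, 10), X2 = c(10, 5, 8), X3 = c(6, 8, 7))

# Only the first row contains both 5 and 10
toy$bleh <- has_all(toy[, c("X1", "X2", "X3")], c(5, 10))
toy$bleh
```

Passing only the value columns avoids matching against a previously created 'bleh' column.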

selecting common columns from different elements of a list

I have a data set in list format. The list is further divided into 20 elements. Each element contains 12 rows and some columns. Now I want to extract the common columns from each element of the list and make a new data set. I tried to make a reproducible example. Please see the code
library(plyr) # provides ldply()
a<-data.frame(x=(1:10),y=(1:10),z=(1:10))
b<-data.frame(x=(1:10),y=(1:10),n=(1:10))
c<-data.frame(x=(1:10),y=(1:10),q=(1:10))
data<-list(a,b,c)
data1<-ldply(data)
required_data<-data1[,-3:-5]
Find the common columns using Reduce, subset them from list and bind them together
cols <- Reduce(intersect, lapply(data, colnames))
do.call(rbind, lapply(data, `[`, cols))
# x y
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
#8 8 8
#9 9 9
#10 10 10
#11 1 1
#...
The last step can also be performed using
purrr::map_df(data, `[`, cols)
With base R, you can first find the names common to all elements
commonName <- names((r<-table(unlist(Map(names,data))))[r==length(data)])
then retrieve the columns from the list and combine them (similar to the second step in the solution by @Ronak Shah)
res <- Reduce(rbind,lapply(data, '[',commonName))
which gives:
> res
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 1 1
12 2 2
13 3 3
14 4 4
15 5 5
16 6 6
17 7 7
18 8 8
19 9 9
20 10 10
21 1 1
22 2 2
23 3 3
24 4 4
25 5 5
26 6 6
27 7 7
28 8 8
29 9 9
30 10 10
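One detail worth checking when counting names: a column shared by only some elements is not common to all of them. The sketch below uses made-up data frames (a, b, d) where "z" appears in two of three elements, and compares an intersection with two count-based filters:

```r
# Made-up example: "z" is shared by a and b but missing from d
a <- data.frame(x = 1:2, y = 1:2, z = 1:2)
b <- data.frame(x = 1:2, y = 1:2, z = 3:4)
d <- data.frame(x = 1:2, y = 1:2, q = 1:2)
data <- list(a, b, d)

# Names present in every element
cols <- Reduce(intersect, lapply(data, names))

# Count-based alternative: how many elements contain each name
tab <- table(unlist(lapply(data, names)))
in_some <- names(tab)[tab > 1]             # also keeps "z"
in_all  <- names(tab)[tab == length(data)] # only names found in every element

# Bind the common columns from every element
combined <- do.call(rbind, lapply(data, "[", cols))
combined
```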

R aggregation of columns by met condition in one column

I am trying to aggregate or associate 2 columns in a 4-column matrix. The matrix is filled with numeric values. I would like to show only column1 and column3 when column1 is > 0.25. I have tried numerous R commands but can't get the 2 columns to show when the criterion is met in column 1.
For example
1.094262, 14
0.5962845, 17
Below is the dataset. Example of desired output above.
0.1287953 3 12 1
1.094262 13 14 3
0.5962845 8 17 4
0.6511204 7 19 5
0.2533915 4 6 2
0.8222555 6 18 6
0.08695875 3 7 1
0.6096232 6 6 2
1.583204 24 7 1
0.08337463 4 7 1
0.06398186 1 11 2
0.2713974 4 11 2
0.6205648 13 4 1
1.276595 15 14 3
Is this what you are looking for?
df[df$V1>0.25,c(1,3)]
V1 V3
2 1.0942620 14
3 0.5962845 17
4 0.6511204 19
5 0.2533915 6
6 0.8222555 18
8 0.6096232 6
9 1.5832040 7
12 0.2713974 11
13 0.6205648 4
14 1.2765950 14
where df is:
df=read.table(text="0.1287953 3 12 1
1.094262 13 14 3
0.5962845 8 17 4
0.6511204 7 19 5
0.2533915 4 6 2
0.8222555 6 18 6
0.08695875 3 7 1
0.6096232 6 6 2
1.583204 24 7 1
0.08337463 4 7 1
0.06398186 1 11 2
0.2713974 4 11 2
0.6205648 13 4 1
1.276595 15 14 3", header = FALSE)
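A hedged sketch of two equivalent forms, using only the first few rows of the data above to keep it short: subset() for a data frame, and plain indexing with drop = FALSE if the data really is a matrix, as the question states. The names res_df and res_m are illustrative:

```r
# First four rows of the data set, read without a header (columns V1..V4)
df <- read.table(text = "0.1287953 3 12 1
1.094262 13 14 3
0.2533915 4 6 2
0.08695875 3 7 1", header = FALSE)

# Data frame version: filter on V1 and keep V1 and V3 by name
res_df <- subset(df, V1 > 0.25, select = c(V1, V3))

# Matrix version: drop = FALSE keeps a matrix even if one row matches
m <- as.matrix(df)
res_m <- m[m[, 1] > 0.25, c(1, 3), drop = FALSE]

res_df
res_m
```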

How to generate an uneven sequence of numbers in R

Here's an example data frame:
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
I want to generate a sequence of numbers according to the number of observations of y per x group (e.g. there are 2 observations of y for x=1). I want the sequence to be continuously increasing and jumps by 2 after each x group.
The desired output for this example would be:
1,2,5,6,7,10,11,14,17,20,21,22,25,26
How can I do this simply in R?
To expand on my comment, the groupings can be arbitrary; you simply need to recast them to the correct ordering. There are a few ways to do this. @akrun has shown that this can be accomplished using the match function, or you can use the as.numeric function if that is easier for you to understand.
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
# these are equivalent
df$newx <- as.numeric(factor(df$x, levels=unique(df$x)))
df$newx <- match(df$x, unique(df$x))
Since you now have a "new" releveling which is sequential, we can use the logic that was discussed in the comments.
df$newNumber <- 1:nrow(df) + (df$newx-1)*2
For this example, this will result in the following dataframe:
x y newx newNumber
1 1 1 1
1 2 1 2
2 3 2 5
2 4 2 6
2 6 2 7
3 3 3 10
3 7 3 11
4 8 4 14
5 6 5 17
6 4 6 20
6 3 6 21
6 7 6 22
9 3 7 25
9 2 7 26
where df$newNumber is the output you wanted.
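The two steps can be folded into one small function. This is only a sketch: spaced_seq and its gap argument (the number of values skipped between groups) are hypothetical names, and it assumes the rows of each group are contiguous, as in the example:

```r
# Hypothetical helper; `gap` is how many values are skipped between groups
spaced_seq <- function(x, gap = 2) {
  grp <- match(x, unique(x))      # consecutive group index: 1, 2, 3, ...
  seq_along(x) + (grp - 1) * gap  # running position plus accumulated gaps
}

x <- c(1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 6, 9, 9)
spaced_seq(x)
```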
To create the sequence 0,0,4,4,4,9,..., what you're doing is taking the minimum of each group and subtracting 1. The easiest way to do this is with the dplyr package.
library(dplyr)
df %>%
group_by(x) %>%
mutate(newNumber2 = min(newNumber) -1)
Which will have the output:
Source: local data frame [14 x 5]
Groups: x
x y newx newNumber newNumber2
1 1 1 1 1 0
2 1 2 1 2 0
3 2 3 2 5 4
4 2 4 2 6 4
5 2 6 2 7 4
6 3 3 3 10 9
7 3 7 3 11 9
8 4 8 4 14 13
9 5 6 5 17 16
10 6 4 6 20 19
11 6 3 6 21 19
12 6 7 6 22 19
13 9 3 7 25 24
14 9 2 7 26 24
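If adding a package is not an option, the same per-group minimum can probably be obtained in base R with ave(). A minimal sketch, rebuilding newNumber from x so that it runs on its own:

```r
x <- c(1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 6, 9, 9)

# Same construction as above: running position plus 2 per preceding group
newNumber <- seq_along(x) + (match(x, unique(x)) - 1) * 2

# ave() applies FUN within each group of x and returns a vector of the same length
newNumber2 <- ave(newNumber, x, FUN = min) - 1
newNumber2
```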
