Subset rows excluding special values - r

I want to subset rows which do not contain special values. For example:
df <- data.frame(a=c(1,2,2,3,4,4),b=c(-9999,2,3,4,5,6),c=c(2,3,4,-9999,2,4))
a b c
1 1 -9999 2
2 2 2 3
3 2 3 4
4 3 4 -9999
5 4 5 2
6 4 6 4
df has many rows and columns, and I want to keep only the rows that don't contain -9999. The expected result is what the following code produces:
df[which(df$a != -9999 & df$b != -9999 & df$c != -9999), ]
a b c
2 2 2 3
3 2 3 4
5 4 5 2
6 4 6 4
When there are too many columns to type out a condition for each one, how can I subset the rows?

You can try this one:
# row and column positions of every cell equal to -9999
temp <- which(df == -9999, arr.ind = TRUE)
df[-unique(temp[,1]),]
a b c
2 2 2 3
3 2 3 4
5 4 5 2
6 4 6 4
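One caveat with negative indexing: if no cell equals -9999, temp is empty and df[-unique(temp[,1]),] returns zero rows rather than the whole data frame. A minimal sketch of an alternative that avoids this, using a logical mask built with rowSums (the names keep and clean are only illustrative):

```r
df <- data.frame(a = c(1, 2, 2, 3, 4, 4),
                 b = c(-9999, 2, 3, 4, 5, 6),
                 c = c(2, 3, 4, -9999, 2, 4))

# df == -9999 is a logical matrix; rowSums counts the matches in each row.
# na.rm = TRUE treats NA cells as "not a match" instead of propagating NA.
keep <- rowSums(df == -9999, na.rm = TRUE) == 0

# Keep only the rows with no -9999 in any column.
clean <- df[keep, ]
print(clean)
```

Because keep is a logical vector, a data frame without any -9999 values is returned unchanged.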

Related

Change the order of numerically named columns in r

If I have a dataframe like the one below which has numerical column names
example <- data.frame(`1`=c(1,8,3,9), `2`=c(3,2,3,3), `3`=c(5,2,5,4), `4`=c(1,2,3,4), `5`=c(2,5,7,8), check.names = FALSE)
Which looks like this:
1 2 3 4 5
1 3 5 1 2
8 2 2 2 5
3 3 5 3 7
9 3 4 4 8
And I want to arrange it so that the column names start with three and proceed through five and back to one, like this:
3 4 5 1 2
5 1 2 1 3
2 2 5 8 2
5 3 7 3 3
4 4 8 9 3
I know how to rearrange the position of a single column in a dataset, but I'm not sure how to do this with more than one column in this particular order.
We can index by column position: build each run of positions with the sequence operator (:) and concatenate the runs with c
example[c(3:5, 1:2)]
# 3 4 5 1 2
#1 5 1 2 1 3
#2 2 2 5 8 2
#3 5 3 7 3 3
#4 4 4 8 9 3
As the column names are all numeric (and here coincide with the column positions), just convert them to numeric and use that for ordering
v1 <- as.numeric(names(example))
example[c(v1[3:5], v1[1:2])]
Or simply do
example[c(names(example)[3:5], names(example)[1:2])]
Or another way is with head and tail
example[c(tail(names(example), 3), head(names(example), 2))]
data
example <- data.frame(`1`=c(1,8,3,9), `2`=c(3,2,3,3),
`3`=c(5,2,5,4), `4`=c(1,2,3,4), `5`=c(2,5,7,8), check.names = FALSE)
R does not easily let you create columns with numbers as names. If you do manage to create such columns, you can use match to get the positions of the column names in the order you want.
example[match(c(3:5, 1:2), names(example))]
# 3 4 5 1 2
#1 5 1 2 1 3
#2 2 2 5 8 2
#3 5 3 7 3 3
#4 4 4 8 9 3

How to reverse a column in R

I have a dataframe as described below. I want to reverse the order of column B without disturbing the order of the rest of the dataframe. Column B currently holds 5,4,3,2,1 and I want to change it to 1,2,3,4,5. I don't want to sort, as that would change the ordering of the other columns.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).

How to only keep the columns with same names between two data frames?

I have two data frames like the following:
a<-c(1,3,4,5,6,8)
b<-c(2,3,4,2,6,7)
c<-c(2,5,6,3,5,6)
df1<-data.frame(a,b,c)
d<-c(3,4,5,6,7,8)
e<-c(1,2,3,2,1,1)
c<-c(1,3,4,5,6,2)
df2<-data.frame(d,e,c)
> df1
a b c
1 1 2 2
2 3 3 5
3 4 4 6
4 5 2 3
5 6 6 5
6 8 7 6
> df2
d e c
1 3 1 1
2 4 2 3
3 5 3 4
4 6 2 5
5 7 1 6
6 8 1 2
I want to combine the two data frames and keep only the columns with the same names. The final data frame should look like this:
> df3
c1 c2
1 2 1
2 5 3
3 6 4
4 3 5
5 5 6
6 6 2
My real data frames have hundreds of columns, so I need code to do this job. Can anyone help me?
Find out which names belong to both dataframes and then bind them:
eqnames <- names(df1)[names(df1) %in% names(df2)]
df3 <- cbind(df1[eqnames], df2[eqnames])
You can then rename the columns:
names(df3) <- paste0(names(df3), seq_len(ncol(df3)))
Resulting in:
> df3
c1 c2
1 2 1
2 5 3
3 6 4
4 3 5
5 5 6
6 6 2
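With more than one shared column, numbering the columns by position gives names such as b1, c2, b3, c4, which no longer say which data frame a column came from. A minimal sketch of one alternative, assuming made-up toy data with two shared columns and hypothetical suffixes _df1 and _df2:

```r
# Toy data (not from the question): columns b and c are shared.
df1 <- data.frame(a = c(1, 3, 4), b = c(2, 3, 4), c = c(2, 5, 6))
df2 <- data.frame(d = c(3, 4, 5), b = c(9, 8, 7), c = c(1, 3, 4))

# Names present in both data frames, in the order they appear in df1.
shared <- intersect(names(df1), names(df2))

# Tag each shared column with its source before binding side by side.
df3 <- cbind(
  setNames(df1[shared], paste0(shared, "_df1")),
  setNames(df2[shared], paste0(shared, "_df2"))
)
print(names(df3))
```

intersect does the same job as the %in% filter, and the suffixes keep the column names unique however many columns are shared.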

Rbind same data.frame with column switching

I am not new to R, but I cannot solve this problem: I have a data.frame and want to rbind the same data.frame with its columns switched. But R does not switch the columns.
Example:
set.seed(13)
df <- data.frame(var1 = sample(5), var2 = sample(5))
> df
var1 var2
1 4 1
2 1 3
3 2 4
4 5 2
5 3 5
> rbind(df, df[,c(2,1)])
var1 var2
1 4 1
2 1 3
3 2 4
4 5 2
5 3 5
6 4 1
7 1 3
8 2 4
9 5 2
10 3 5
As you can see, the columns are not switched (rows 6-10), whereas switching the columns alone works like a charm:
> df[,c(2,1)]
var2 var1
1 1 4
2 3 1
3 4 2
4 2 5
5 5 3
I guess this has something to do with the column names, but I cannot figure out what exactly.
Can anyone help?
Kind regards!
As pointed out by @Henrik, from ?rbind.data.frame: "The rbind data frame method [...] matches columns by name". So try this:
> rbind(df, setNames(df[,c(2,1)], c("var1", "var2")))
var1 var2
1 4 1
2 1 3
3 2 4
4 5 2
5 3 5
6 1 4
7 3 1
8 4 2
9 2 5
10 5 3
This also works, although the result is a matrix rather than a data frame:
> rbind(as.matrix(df), as.matrix(df[,c(2,1)]))
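The name matching can be checked directly. The sketch below types in the values printed in the question (rather than relying on set.seed, whose sample output depends on the R version) and reuses names(df) so the column names need not be spelled out; by_name, swapped and stacked are illustrative names:

```r
df <- data.frame(var1 = c(4, 1, 2, 5, 3), var2 = c(1, 3, 4, 2, 5))

# rbind matches data frame columns by name, so the reordered copy
# is put back into the original column order before stacking.
by_name <- rbind(df, df[, c(2, 1)])

# Renaming the swapped columns makes rbind stack them by position.
swapped <- setNames(df[, c(2, 1)], names(df))
stacked <- rbind(df, swapped)
print(stacked)
```

In by_name, rows 6 to 10 repeat rows 1 to 5; in stacked, they hold the switched values.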

Summing two dataframes based on common value

I have a dataframe that looks like
day.of.week count
1 0 3
2 3 1
3 4 1
4 5 1
5 6 3
and another like
day.of.week count
1 0 17
2 1 6
3 2 1
4 3 1
5 4 5
6 5 1
7 6 13
I want to add the values from df1 to df2 based on day.of.week. I was trying to use ddply
total=ddply(merge(total, subtotal, all.x=TRUE,all.y=TRUE),
.(day.of.week), summarize, count=sum(count))
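which almost works, but merge, called without a by argument, joins on every shared column (here both day.of.week and count), so rows that are identical in the two data frames collapse into one. For instance, in the example above day.of.week=5 has count one in both data frames. Rather than producing two records, each with count one, the merge produces one record with count one, so instead of a total count of two I get a total count of one.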
day.of.week count
1 0 3
2 0 17
3 1 6
4 2 1
5 3 1
6 4 1
7 4 5
8 5 1
9 6 3
10 6 13
There is no need to merge. With plyr loaded, you can simply do
ddply(rbind(d1, d2), .(day.of.week), summarize, sum_count = sum(count))
I have assumed that both data frames have identical column names, day.of.week and count.
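The same stack-then-sum idea can be sketched in base R with aggregate, without plyr. The data below is typed in from the two tables in the question, and the name total is only illustrative:

```r
d1 <- data.frame(day.of.week = c(0, 3, 4, 5, 6),
                 count = c(3, 1, 1, 1, 3))
d2 <- data.frame(day.of.week = 0:6,
                 count = c(17, 6, 1, 1, 5, 1, 13))

# Stack the rows, then sum count within each day.of.week group.
total <- aggregate(count ~ day.of.week, data = rbind(d1, d2), FUN = sum)
print(total)
```

Because the rows are stacked rather than merged, two records with the same day and the same count are both kept and both contribute to the sum.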
In addition to the suggestion Ben gave you about using merge, you could also do this simply using subsetting:
d1 <- read.table(textConnection(" day.of.week count
1 0 3
2 3 1
3 4 1
4 5 1
5 6 3"),sep="",header = TRUE)
d2 <- read.table(textConnection(" day.of.week count1
1 0 17
2 1 6
3 2 1
4 3 1
5 4 5
6 5 1
7 6 13"),sep = "",header = TRUE)
d2[match(d1[,1],d2[,1]),2] <- d2[match(d1[,1],d2[,1]),2] + d1[,2]
> d2
day.of.week count1
1 0 20
2 1 6
3 2 1
4 3 2
5 4 6
6 5 2
7 6 16
This assumes no repeated day.of.week rows, since match will return only the first match.
