This question already has answers here:
Remove columns with zero values from a dataframe
(10 answers)
Closed 4 years ago.
I'm trying to recreate a data frame (DC5_prod) that has hundreds of columns, many of which contain no values other than zero.
The first column of the data frame is text and the rest are numeric. Is there a way to ignore the first column while eliminating the remaining columns that are composed entirely of zeros?
DC5_Prod
a b c d e f
1 AK 0 0 0 0 1
2 JI 0 0 0 0 0
The above is a snippet of how it currently stands and would want an output of:
DC5_Prod
a f
1 AK 1
2 JI 0
When I attempt to use the solution given for a similar question on the site:
DC5_prod[, colSums(DC5_prod != 0) > 0]
it essentially just returns the first column without removing any of them.
Try this base R approach:
> ind <- sapply(DC5_Prod, function(x) sum(x==0)) != nrow(DC5_Prod)
> DC5_Prod[,ind]
a f
1 AK 1
2 JI 0
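If the aim is specifically to leave the first, text column untouched, a slightly more explicit variant is to run the zero test only on the numeric columns and force column 1 to be kept. A minimal sketch, recreating a small DC5_Prod like the snippet above:
DC5_Prod <- data.frame(a = c("AK", "JI"),
                       b = 0, c = 0, d = 0, e = 0,
                       f = c(1, 0))
keep <- c(TRUE, colSums(DC5_Prod[-1] != 0) > 0)  # TRUE forces the text column to stay
DC5_Prod[, keep, drop = FALSE]
#    a f
# 1 AK 1
# 2 JI 0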
This question already has answers here:
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 2 years ago.
I have a data frame:
L1 2020 NA
1 1 0 0
2 2 1 0
3 3 1 0
I want to delete the first and last columns, to get a data frame like this:
2020
1 0
2 1
3 1
I tried:
1)
df <- df[,-c(1,ncol(df))]
or 2)
df <- subset(df, select = -c(1,ncol(df)))
For both I get the result:
[1] 0 1 1
So I guess it changed the data frame into a vector. How can I delete these columns and keep the result as a data frame? It is important for me to keep it like this. I don't have this problem when there are more columns; it only changes when a single column is supposed to be left.
After specifying the columns in the square brackets, add , drop = FALSE right after them.
The drop argument is TRUE by default, and that default is what you are struggling with.
df <- data.frame(a = 1:10, b = 1:10)
df[, 1]               # R simplifies to a vector via the implicit drop = TRUE default
df[, 1, drop = FALSE] # the data.frame structure remains
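Applied to the question's own subsetting, a minimal sketch (the column names here are stand-ins for the L1 / 2020 / NA columns):
df <- data.frame(L1 = 1:3, x2020 = c(0, 1, 1), extra = c(0, 0, 0))
df[, -c(1, ncol(df)), drop = FALSE]  # stays a one-column data frame
#   x2020
# 1     0
# 2     1
# 3     1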
This question already has answers here:
Reshape three column data frame to matrix ("long" to "wide" format) [duplicate]
(6 answers)
Closed 5 years ago.
I have a data frame with the following columns:
df=data.frame( UserId=c(1,2,2,2,3,3), CatoId=c('C','A','B','C','D','E'), No=c(1,9,2,2,5,3))
UserId CatoId No
1 C 1
2 A 9
2 B 2
2 C 2
3 D 5
3 E 3
I would like to transform the structure into the following one:
UserId A B C D E
1 0 0 1 0 0
2 9 2 2 0 0
3 0 0 0 5 3
Where the columns represent all possible values of CatoId.
The first data frame has 2 million rows and CatoId has 21 different values, so I don't want to use any loops. Is there a way to do this in R? Otherwise, what is the best way to proceed?
My goal would be to apply a clustering algorithm on the last dataframe.
You can do this with dcast() from the reshape2 package:
library(reshape2)
df1 <- dcast(df, UserId ~ CatoId, value.var = "No", fill = 0)
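If you prefer to stay in base R, xtabs() builds the same wide table; a sketch using the df above (note that UserId ends up as row names rather than a column):
wide <- as.data.frame.matrix(xtabs(No ~ UserId + CatoId, data = df))
wide
#   A B C D E
# 1 0 0 1 0 0
# 2 9 2 2 0 0
# 3 0 0 0 5 3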
This question already has answers here:
Find all rows of matrix equal to vector
(3 answers)
Closed 6 years ago.
R beginner here, I need your help. Let's say we have a matrix like this one:
1 2 3
1 1 0 0
2 0 1 0
3 0 0 1
4 1 1 0
5 1 0 1
6 0 1 1
7 1 1 1
Next we have a certain vector, e.g. (1, 0, 1), which would match row 5.
What's the best way to get the index 5 from the matrix, given that vector?
I have already read the questions
R - fastest way to select the rows of a matrix that satisfy multiple conditions
and
In R, select rows of a matrix that meet a condition
but I think the situation differs in this case. Thanks for your input!
I can propose a combination of the which, apply, and all functions.
m <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,1,1,1,1,1), 7, byrow=TRUE)
which(apply(m, 1, function(x) return(all(x == c(1,0,1)))))
[1] 5
We can use rowSums:
which(rowSums(m == rep(c(1, 0, 1), each = nrow(m))) == 3)
# [1] 5
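Another vectorised option, sketched with the same m and target vector: transpose the matrix so the vector recycles along each row, then count complete matches.
target <- c(1, 0, 1)
which(colSums(t(m) == target) == length(target))
# [1] 5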
This question already exists:
R ffdfdply reset cumsum using data.table
Closed 9 years ago.
I am using the ff package to load an Excel file.
i=as.ffdf(data.frame(a=c(1,1,1,1,1,1), b=c(1,4,6,2,5,3), c=c(1,1,1,1,1,1), d=c(1,0,1,1,0,1)))
I am trying to get the cumulative sum of column d, resetting it whenever a 0 is found. I want to get the output below.
a b c d Result
1 1 1 1 1
1 4 1 0 0
1 6 1 1 1
1 2 1 1 2
1 5 1 0 0
1 3 1 1 1
I know I could easily achieve this with ddply, but I have a large data set, i.e. more than 5,000,000 rows.
Thanks
This will work, but it is a little slow with 24,385,601 rows. I created a unique combination of columns a and c and used Arun's solution. The key column (key_a_c) is used to split the data set, i.e. to reset the cumsum.
Create a unique key on columns a and c:
i$key_a_c <- ikey(i[c("a", "c")])
Generate the cumulative series by splitting on key_a_c:
p1 <- ffdfdply(i, split = as.character(i$key_a_c), FUN = function(x) {
  x$Result <- as.ff(x[, "d"] * sequence(rle(x[, "d"])$lengths))
  as.data.frame(x)
}, trace = TRUE)
Please share your views and code if you have a more optimized solution.
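For reference, a minimal sketch of the d * sequence(rle(d)$lengths) trick on a plain in-memory vector (no ff involved), which is what the FUN above applies within each split:
d <- c(1, 0, 1, 1, 0, 1)
d * sequence(rle(d)$lengths)
# [1] 1 0 1 2 0 1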
This question already has answers here:
how to apply a function to every row of a matrix (or a data frame) in R
R - how to call apply-like function on each row of dataframe with multiple arguments from each row of the df
Closed 10 years ago.
I want to apply a function to each row in a data frame; however, R applies it to each column by default. How do I force it to operate on rows instead?
> a = as.data.frame(list(c(1,2,3),c(10,0,6)),header=T)
> a
c.1..2..3. c.10..0..6.
1 1 10
2 2 0
3 3 6
> sapply(a,min)
c.1..2..3. c.10..0..6.
1 0
I wanted something like
1 2
2 0
3 3
You want apply() (see its documentation). apply(var, 1, fun) applies fun over rows; apply(var, 2, fun) applies it over columns.
> apply(a,1,min)
[1] 1 0 3
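A short follow-up sketch using the same a (the row_min name is just for illustration): the row-wise result can be attached back as a column. Note that apply() coerces the data frame to a matrix first, so this is best suited to all-numeric data.
a$row_min <- apply(a, 1, min)
a
#   c.1..2..3. c.10..0..6. row_min
# 1          1          10       1
# 2          2           0       0
# 3          3           6       3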