Count number of 0's on each column - r

My problem is quite simple but I not beeing able to solve him. I have a tibble dataframe and want to know how much 0's each column have. I tried to use the function sum(dataframe$column == 0) on each column but I think this is kinda inefficient since I want to apply this to a bunch of differents dataframes. Are there any other more automatic way to do it?

Related

How can I exclude partial duplications of a dataframe?

I am a beginner in R and I have a question about dataframes.
In my case, I have this data.frame as a example:
I just want one ENSEMBL_ID Per SYMBOL, so I'd like to get rid of one of these. Can someone tell me how could I do this? In this case unique(data.frame) does not work as there is a difference in the third column.

Extracting different vectors from a single column of data (in R)

I have a small problem, which I don't think is too hard, but I couldn't find any answer here (maybe I phrased my research wrong so please excuse me if the question has already been asked!)
I am importing data from an excel sheet which is split in two columns as in the following picture:
Now, I am trying to import all the data in the second column to my R script, but by splitting it into different vectors: one vector for category A, one for category B, etc... by keeping the data points in the order they are in the file (because as it happens, they are in chronological order).
Now, the categories each have a different number of elements, however, they are ordered alphabetically (ie you'll never find an A in the B's, for example). So I guess that makes it easier, but I'm still a novice with R and I don't really know how to proceed without getting really messy with the code and I know there's probably a simple way of doing it.
Does anyone have an idea on how to treat this nicely please? :)
We can use split in base R to return a list of vectors of 'Data' based on the unique values in 'Category'
lst1 <- split(df1$Data, df1$Category)

expand.grid - try to solve "cannot allocate vector of size" issue

I need to create huge data.frame of combinations, but I don't need them all. But as I saw here, expand.grid function is not able to add specific condition which combination throw out.
So I decided to go step by step. For example I have
variants<-9 # number of possible variants
aa<-c(0:variants) # vector of possible variants
ab<-c(0:variants)
ac<-c(0:variants)
ad<-c(0:variants)
ae<-c(0:variants)
af<-c(0:variants)
ag<-c(0:variants)
ah<-c(0:variants)
ai<-c(0:variants)
aj<-c(0:variants)
If I try to
expand.grid(aa,ab,ac,ad,ae,af,ag,ah,ai,aj)
the "cannot allocate vector of size" issue comes ..
So I tried to go step by step like
step<-2 # it is a condition for subsetting the grid
grid_2<-expand.grid(aa,ab)
sub_grid_2<-grid_2[abs(grid_2[,1]-grid_2[,2])<=step,]
which gives me combinations I need. To save memory I add then another column like
fun_grid_list_3<-function(x){
a<-sub_grid_2[x,1]
b<-sub_grid_2[x,2]
d<-rep(c(1:variants))
c<-data.frame(Var1=rep(a,variants),Var2=rep(b,variants),Var3=d)
return(c)
}
sublist_grid_3<-mclapply(c(1:nrow(sub_grid_2)),fun_grid_list_3,mc.cores=detectCores(),mc.preschedule=FALSE)
sub_grid_3=ldply(sublist_grid_3)
But the problem comes when I come to grid of 8 and more variables. It takes so much time, but it should be just adding a number into another frame. Maybe I am wrong and it trully need that time but I hope there is a more efficient way how to do this.
All I need is to create expand.grid of 2 variables, then add condition to subset it. Then add another column which respects the subsetted grid (add c(0:variants) to every row, it means create more rows of course ... and then subset it by condition and so ....
Can anybody help to make it faster? I hoped that use mclapply trought function should be the fastest, but maybe not ..
Thanks to anyone ...

Selecting rows in a long dataframe based on a short list

I'm sure this should be easier to do than the way I know how to do it.
I'd like to apply fields from a short dataframe back into a long one based on matching a common factor.
Example short dataframe, list of valid cases:
$ptid (factor) values 1,2,3,4,5...20
$valid 1/0 (to represent true/false; variable through ptid)
long dataframe has 15k rows, each level of $ptid will have several thousand rows
I want to apply $valid onto those rows when the it is 1/true from the list above
The way I know how to do it is to loop through each row of long dataframe, but this is horribly inelegant and also slow.
I have a niggling feeling there is a much better way with dply or similar and I'd really like to learn how.
Worked this out based on the comments, thank you Colonel.
combination_dataset <- Merge(short_dataframe, long_dataframe) worked (very quickly).
Thanks to those who commented.

Using ifelse statement to condense variables

New to R, taking a very accelerated class with very minimal instruction. So I apologize in advance if this is a rookie question.
The assignment I have is to take a specific column that has 21 levels from a dataframe, and condense them into 4 levels, using an if, or ifelse statement. I've tried what feels like hundreds of combinations, but this is the code that seemed most promising:
> b2$LANDFORM=ifelse(b2$LANDFORM=="af","af_type",
ifelse(b2$LANDFORM=="aflb","af_type",
ifelse(b2$LANDFORM=="afub","af_type",
ifelse(b2$LANDFORD=="afwb","af_type",
ifelse(b2$LANDFORM=="afws","af_type",
ifelse(b2$LANDFORM=="bfr","bf_type",
ifelse(b2$LANDFORM=="bfrlb","bf_type",
ifelse(b2$LANDFORM=="bfrwb","bf_type",
ifelse(b2$LANDFORM=="bfrwbws","bf_type",
ifelse(b2$LANDFORM=="bfrws","bf_type",
ifelse(b2$LANDFORM=="lb","lb_type",
ifelse(bs$LANDFORM=="lbaf","lb_type",
ifelse(b2$LANDFORM=="lbub","lb_type",
ifelse(b2$LANDFORM=="lbwb","lb_type","ws_type"))))))))))))))
LANDFORM is a factor, but I tried changing it to a character too, and the code still didn't work.
"ws_type" is the catch all for the remaining variables.
the code runs without errors, but when I check it, all I get is:
> unique(b2$LANDFORM)
[1] NA "af_type"
Am I even on the right path? Any suggestions? Should I bite the bullet and make a new column with substr()? Thanks in advance.
If your new levels are just the first two letters of the old ones followed by _type you can easily achieve what you want through:
#prototype of your column
mycol<-factor(sample(c("aflb","afub","afwb","afws","bfrlb","bfrwb","bfrws","lb","lbwb","lbws","wslb","wsub"), replace=TRUE, size=100))
as.factor(paste(sep="",substr(mycol,1,2),"_type"))
After a great deal of experimenting, I consulted a co-worker, and he was able to simplify a huge amount of this. Basically, I should have made a new column composed of the first two letters of the variables in LANDFORM, and then sample from that new column and replace values in LANDFORM, in order to make the ifelse() statement much shorter. The code is:
> b2$index=as.factor(substring(b2$LANDFORM,1,2))
b2$LANDFORM=ifelse(b2$index=="af","af_type",
ifelse(b2$index=="bf","bf_type",
ifelse(b2$index=="lb","lb_type",
ifelse(b2$index=="wb","wb_type",
ifelse(b2$index=="ws","ws_type","ub_type")))))
b2$LANDFORM=as.factor(b2$LANDFORM)
Thanks to everyone who gave me some guidance!

Resources