How to convert a cudf.core.dataframe.DataFrame into a pandas.DataFrame? - cudf

I have a cudf dataframe
type(pred)
> cudf.core.dataframe.DataFrame
print(pred)
> action
1778378 0
1778379 1
1778381 1
1778383 0
1778384 0
... ...
2390444 0
2390446 0
2390478 0
2390481 0
2390489 1
that I would like to convert to a pandas.DataFrame. However, calling the pandas constructor on it returns a frame that holds only the column name:
pd.DataFrame(pred)
> 0
0 action
I just found the answer: cudf DataFrames have a to_pandas() method:
pred.to_pandas()
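The one-cell result from pd.DataFrame(pred) is most likely because pandas does not recognise the cudf type and falls back to iterating over it. Iterating over a DataFrame yields its column labels, so the single label action ends up as data. The conversion has to go through to_pandas(), which copies the data from GPU to host memory. Below is a minimal sketch of a defensive wrapper. ensure_pandas is a hypothetical name; only the to_pandas() method (and cudf.from_pandas in the comment) comes from cudf itself, and the wrapper can be exercised without a GPU:

```python
import pandas as pd


def ensure_pandas(df):
    """Return a pandas.DataFrame, converting from cudf if needed.

    Hypothetical helper: it only relies on cudf objects exposing a
    .to_pandas() method, so it does not import cudf itself.
    """
    if isinstance(df, pd.DataFrame):
        return df  # already a host-side pandas frame, nothing to do
    to_pandas = getattr(df, "to_pandas", None)
    if to_pandas is None:
        raise TypeError(f"cannot convert {type(df).__name__} to pandas")
    return to_pandas()  # for cudf: copies device memory into a pandas frame


# Usage with a real GPU-backed frame (requires cudf and a CUDA device):
#   import cudf
#   pred = cudf.DataFrame({"action": [0, 1, 1]}, index=[1778378, 1778379, 1778381])
#   pdf = ensure_pandas(pred)     # same as pred.to_pandas()
#   back = cudf.from_pandas(pdf)  # reverse direction, host -> GPU
```

Checking for an existing pandas frame first means the helper can be called unconditionally in code paths that may or may not have run on the GPU.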

Related

Apply different array to each column and print results from each array programmatically?

I have a table with many columns (fields). In the first field, I need to retain only unique values. In the subsequent columns, I need to count the original number of values present in the first column, but only if the value in a given column is > 0.
I've managed to partially accomplish this with awk, but my current attempt would require me to manually create an array for every column in the table and manually type each array for the print command. This isn't really feasible.
Any help/suggestions (and explanation of how a potential solution works) would be greatly appreciated.
Here's a subset of the INPUT TABLE (it has already been sorted on column 1):
ATP6 93.883156 55.84006
COX1 230.708456 63.109
COX2 179.993226 74.224269
COX3 169.945901 72.036519
CYTB 228.799722 87.575892
LOC111099029 0.926958 6.124982
LOC111099030 10.124096 5.024844
LOC111099031 0 0
LOC111099031 0 0
LOC111099031 2.279801 2.289838
LOC111099032 17.674714 12.796428
LOC111099033 5.259716 7.326938
LOC111099034 3.514635 2.858349
LOC111099035 0 0
LOC111099035 1.929607 4.409107
LOC111099036 0 0
LOC111099036 1.45196 7.58513
LOC111099037 21.520663 26.353308
LOC111099038 6.019084 5.311657
LOC111099039 12.858404 13.689644
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0 0
LOC111099040 0.354202 0.265986
LOC111099040 0.587969 0
LOC111099040 2.620288 1.077892
LOC111099040 4.290659 3.487692
LOC111099040 6.42671 6.906503
LOC111099041 0 0
LOC111099041 3.892818 4.934959
LOC111099042 0 0
LOC111099042 13.86859 14.319505
LOC111099043 0 0
Here's an example of the DESIRED OUTPUT:
LOC111099030 1 1
CYTB 1 1
LOC111099042 1 1
LOC111099037 1 1
LOC111099033 1 1
COX3 1 1
ATP6 1 1
LOC111099039 1 1
LOC111099036 1 1
LOC111099040 5 4
LOC111099035 1 1
LOC111099032 1 1
COX2 1 1
LOC111099038 1 1
LOC111099031 1 1
COX1 1 1
LOC111099029 1 1
LOC111099041 1 1
LOC111099034 1 1
Here's the code I've run to obtain the output above:
awk '{if ($2 > 0) gene_name[$1]++}; {if ($3 > 0) col3_arr[$1]++}; END{ for (var in gene_name) print var, "\t", gene_name[var], col3_arr[var]}' input_file.txt
P.S. I'm also open to a solution in R, as this manipulation is part of a larger R Markdown notebook. I went the awk route because I'm not particularly well-versed with R.
In R, with dplyr:
library(dplyr)
# for each value of the first column, count the rows where each other column is > 0
desired_result = your_data %>%
  group_by(name_of_first_column) %>%
  summarize(across(everything(), ~sum(. > 0)))
In base R, we can use rowsum:
# +(...) turns the logical matrix into 0/1, so rowsum() counts the TRUEs per group
rowsum(+(df1[-1] > 0), df1[[1]])
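For comparison, the same per-group count can be sketched in Python with pandas (the first question on this page already uses pandas). It handles any number of value columns without one array per column. count_positive_by_key is a hypothetical name, and the sample rows are a subset of the input table from the question:

```python
import pandas as pd


def count_positive_by_key(df: pd.DataFrame) -> pd.DataFrame:
    """For each distinct value of the first column, count how many rows
    have a value > 0 in each of the remaining columns."""
    key = df.columns[0]
    values = df.drop(columns=key)
    # gt(0) gives booleans; summing booleans per group counts the True values
    return values.gt(0).groupby(df[key], sort=False).sum()


rows = [
    ("ATP6", 93.883156, 55.84006),
    ("LOC111099035", 0, 0),
    ("LOC111099035", 1.929607, 4.409107),
    ("LOC111099040", 0, 0),
    ("LOC111099040", 0.354202, 0.265986),
    ("LOC111099040", 0.587969, 0),
    ("LOC111099040", 2.620288, 1.077892),
    ("LOC111099040", 4.290659, 3.487692),
    ("LOC111099040", 6.42671, 6.906503),
    ("LOC111099043", 0, 0),
]
df = pd.DataFrame(rows, columns=["gene", "col2", "col3"])
print(count_positive_by_key(df))
```

Unlike the awk script, keys with no positive values (LOC111099043 here) are kept with a count of 0, and a zero count prints as 0 rather than an empty field. The real file could be loaded with pd.read_csv("input_file.txt", sep=r"\s+", header=None); the first column is then labelled 0, which the function handles the same way.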

How to fix the row order in pheatmap?

I have generated a heatmap with pheatmap and, for various reasons, I want the rows to appear in a predefined order.
I have seen in previous posts that the solution is to set the parameter cluster_rows to FALSE and to sort the matrix in the desired order, like this in my case:
Otu0085 Otu0086 Otu0087 Otu0088 Otu0091
AB200 0 0 0 0 0
2 91 0 2 1 0
20CF360 0 1 0 1 0
19CF359 0 0 0 2 0
11VP12 0 0 0 0 155
11VP04 4 1 0 0 345
However, when I do:
pheatmap(shared,cluster_rows = F)
My rows are sorted alphabetically, like this:
10CF278a
11
11AA07
11CF278b
11VP03
11VP04
11VP05
11VP06
11VP08
11VP09
Any suggestions would be welcome.
Thanks in advance.
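With cluster_rows = FALSE, pheatmap should draw rows in the order they already have in the matrix, so if they still come out alphabetically it is worth checking the row order of the object actually passed in (for example with rownames(shared)). In R, the reordering itself is plain subsetting: shared[desired_order, ]. The same reordering step is sketched below in pandas rather than R. It uses the row labels from the question, leaving out the "2 91" row because its label is ambiguous in the post, with an arbitrary target order; order_rows is a hypothetical helper:

```python
import pandas as pd


def order_rows(df: pd.DataFrame, row_order) -> pd.DataFrame:
    """Return df with rows in exactly the given label order.

    Raises KeyError for labels that are not in the matrix, instead of
    silently inserting an all-NaN row as plain reindex() would.
    """
    missing = [label for label in row_order if label not in df.index]
    if missing:
        raise KeyError(f"labels not in matrix: {missing}")
    return df.loc[list(row_order)]


shared = pd.DataFrame(
    [[0, 0, 0, 0, 0], [0, 1, 0, 1, 0], [0, 0, 0, 2, 0], [0, 0, 0, 0, 155], [4, 1, 0, 0, 345]],
    index=["AB200", "20CF360", "19CF359", "11VP12", "11VP04"],
    columns=["Otu0085", "Otu0086", "Otu0087", "Otu0088", "Otu0091"],
)
ordered = order_rows(shared, ["11VP04", "AB200", "19CF359", "20CF360", "11VP12"])
print(ordered.index.tolist())
```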

Error returned by R predict function or underlying Rcpp

I have apparently successfully fitted a model with a newer R package called milr (multiple-instance logistic regression). Admittedly, I make no claims about how good the model is. However, when I try to use the model for prediction, I get the error
Error in logit(cbind(1, newdata), .) : not compatible with requested type
when I call predict as follows:
miltp <- predict(milt, SQFM.te, SQFM.teb, type="bag") and
miltp <- predict(milt, SQFM.te, SQFM.teb)
However I get a NULL return when I call it as:
miltp <- predict(milt, SQFM.te, SQFM.teb, type="response") and
miltp <- predict(milt, SQFM.te, SQFM.teb, type="class")
I have tried using factors, integers and numerics, and I am perplexed. My online search only turned up
Rcpp: Error: not compatible with requested type
which does not help me, as R internals and C++ are over my head. All comments are appreciated. Some input information is given below; I have already tried some type conversions.
str(SQFM.te)
'data.frame': 100369 obs. of 5 variables:
$ arstmade: int 0 0 0 0 0 0 0 0 0 0 ...
$ perstop : int 0 0 0 0 0 0 0 0 0 0 ...
$ trhsloc : int 0 0 0 0 0 0 0 0 0 0 ...
$ acrept : int 0 0 0 0 0 0 0 0 0 0 ...
$ radio : int 1 1 1 1 1 1 1 1 1 1 ...
str(SQFM.teb)
int [1:100369] 3 3 3 3 3 3 3 3 3 3 ...
print(milt)
Coefficients:
intercept arstmade perstop trhsloc acrept radio
-1.69306 -0.09544 -7.95369 -0.53375 0.16506 -0.61778
Residual Deviance: Inf
BIC: Inf
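The error text logit(cbind(1, newdata), .) suggests that predict() builds a design matrix by prepending an intercept column of ones to newdata and passes it to compiled code. In R, cbind(1, <data.frame>) returns a data.frame rather than a numeric matrix, which may be what the Rcpp code rejects with "not compatible with requested type". Passing as.matrix(SQFM.te) is therefore worth trying, though that is an unverified guess. The sketch below spells out that computation in numpy with an explicit numeric matrix, using the coefficients printed by print(milt). It is not milr's code: the bag rule 1 - prod(1 - p) is the textbook multiple-instance assumption (a bag is positive if any instance is), assumed here rather than taken from the package, and the function names are hypothetical:

```python
import numpy as np

# Coefficients printed by print(milt) in the question (intercept first).
COEF = np.array([-1.69306, -0.09544, -7.95369, -0.53375, 0.16506, -0.61778])


def instance_probs(X, coef=COEF):
    """Logistic probability for each instance (row of X).

    The intercept column of ones is prepended here, mirroring
    cbind(1, newdata) in the R error message.
    """
    X = np.asarray(X, dtype=float)  # force a plain numeric matrix
    design = np.column_stack([np.ones(len(X)), X])
    eta = design @ coef  # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))


def bag_probs(X, bag_ids, coef=COEF):
    """Assumed MIL rule: P(bag positive) = 1 - prod(1 - p_instance)."""
    p = instance_probs(X, coef)
    bag_ids = np.asarray(bag_ids)
    return {b: 1.0 - np.prod(1.0 - p[bag_ids == b]) for b in np.unique(bag_ids)}


# Columns: arstmade, perstop, trhsloc, acrept, radio (as in str(SQFM.te)).
X = [[0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [1, 0, 0, 0, 0]]
print(instance_probs(X))
print(bag_probs(X, [3, 3, 4]))
```

Converting with dtype=float up front is the numpy analogue of as.matrix(): it fails early and clearly if a column is non-numeric, instead of deep inside compiled code.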

UFF58 file reader using an R program

I have an input UFF file with n channels. I want to read the UFF file, split the values by channel, and store the result for each channel in a separate file. Each channel always starts with '-1', '58', etc., and ends with '-1'.
Example channel_01 from the input UFF file:
-1
58
filename
22-Mar-2016 10:16:53
164
MnBrgFr-AC225R/N;50.9683995923 mV/m/s2
0 0 0 0 channel_01 0 0 NONE 0 0
2 1048576 1 0.00000E+00 8.19669930804e-06 0.00000E+00
17 0 0 0 Time s
1 0 0 0 MnBrgFr-AC225R/N m/s2
0 0 0 0 NONE NONE
0 0 0 0 NONE NONE
392.665124452 392.659048025 392.658404832 392.661676933 392.665882251 392.671989083
392.67634175 392.673743248 392.672398388 392.669360175 392.665533757 392.66088639
392.660390546 392.660975268 392.663400693 392.662668621 392.661209156 392.65498538
392.649463269 392.649580214 392.649259786 392.658580248 392.664715147 392.667051694
-1
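The splitting logic can be sketched as follows, in Python rather than R. It assumes the text (ASCII) form of UFF, in which every dataset is wrapped in lines containing only -1 and the first line inside is the dataset type. It does not parse the numbers; it only copies each type-58 block to its own file. The channel_NN.uff file names and both function names are hypothetical, a data line consisting solely of -1 would be mistaken for a delimiter, and binary 58b datasets are not handled:

```python
from pathlib import Path

DELIMITER = "-1"
DELIMITER_LINE = "    -1"  # assumed: -1 right-justified in six columns, as UFF files usually write it


def split_uff_datasets(lines):
    """Group text UFF lines into datasets.

    A dataset opens and closes with a line holding only -1; the first line
    inside is the dataset type (e.g. "58"). Returns (type, body_lines)
    pairs with the delimiter lines excluded.
    """
    datasets, body = [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == DELIMITER:
            if body is None:  # opening delimiter
                body = []
            else:  # closing delimiter
                if body:
                    datasets.append((body[0].strip(), body))
                body = None
        elif body is not None:
            body.append(line)  # header and data lines are kept verbatim
    return datasets


def write_uff58_channels(in_path, out_dir):
    """Copy each type-58 dataset to its own file, numbered in file order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(in_path) as fh:
        datasets = split_uff_datasets(fh)
    written = []
    for ds_type, body in datasets:
        if ds_type != "58":
            continue  # skip other dataset types (e.g. 151 headers)
        target = out / f"channel_{len(written) + 1:02d}.uff"
        target.write_text("\n".join([DELIMITER_LINE, *body, DELIMITER_LINE]) + "\n")
        written.append(target)
    return written
```

Usage would be something like write_uff58_channels("input.uff", "channels"). Numbering the output files in file order avoids relying on the position of the channel_01 field in the header, which may shift if an ID line is blank.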

Adding elements with the same name in a matrix

I have a character vector and a matrix:
Mdates<-c("8Q1","8Q2","8Q3","8Q4","9Q1","9Q2","9Q3","9Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","14Q1","14Q2")
Cr<-matrix(c("14Q2","13Q2","14Q2","14Q1","13Q4","13Q4","12Q4","13Q3","13Q4","12Q3","14Q2",12867.8,12710.7,10746.9,9634.4,8238.5,7835.2,7315.6,7263.1,7002.7,6104.8,5759.3),ncol=2,byrow=FALSE)
I want to add up all the values with the same name in Cr and put each sum under the matching column name from Mdates, so ideally it would look like this:
8Q1 8Q2 8Q3 8Q4 9Q1 9Q2 9Q3 9Q4 10Q1 10Q2 10Q3 10Q4 11Q1 11Q2 11Q3 11Q4 12Q1 12Q2 12Q3 12Q4 13Q1 13Q2 13Q3 13Q4 14Q1 14Q2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6104.8 7315.6 0 12710.7 7263.1 15241.3 9634.4 29373.9
You could try:
res <- tapply(as.numeric(Cr[,2]), factor(Cr[,1], levels=unique(Mdates)), FUN=sum)
res[is.na(res)] <- 0
res
# 8Q1 8Q2 8Q3 8Q4 9Q1 9Q2 9Q3 9Q4 10Q1 10Q2 10Q3 10Q4 11Q1
# 0 0 0 0 0 0 0 0 0 0 0 0 0
#11Q2 11Q3 11Q4 12Q1 12Q2 12Q3 12Q4 13Q1 13Q2 13Q3 14Q1 14Q2
# 0 0 0 0 0 6105 7316 0 12711 7263 9634 29374
I think the following does it.
First it selects the elements in Cr that are found in Mdates:
A<-Cr[ ,1]
B<-which(A %in% Mdates)
Crnew<-Cr[B, ]
The following step provides the summed values for each category:
fac <- as.factor(Crnew[ ,1])
num <- as.numeric(Crnew[ ,2])
x <-data.frame(fac, num)
tapply(x$num, x$fac, FUN=sum)
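The same sum-then-align idea as the tapply answer is sketched below in pandas for comparison: group the values by label, sum them, then reindex to the Mdates order with 0 for periods that have no entries. As with tapply and levels = unique(Mdates), labels absent from Mdates are dropped. 13Q4 is one of them, since the Mdates vector in the question has no 13Q4 entry even though the desired output does; add it to the list if that column should appear:

```python
import pandas as pd

# Data copied from the question: Mdates and the two columns of Cr.
Mdates = ["8Q1", "8Q2", "8Q3", "8Q4", "9Q1", "9Q2", "9Q3", "9Q4",
          "10Q1", "10Q2", "10Q3", "10Q4", "11Q1", "11Q2", "11Q3", "11Q4",
          "12Q1", "12Q2", "12Q3", "12Q4", "13Q1", "13Q2", "13Q3", "14Q1", "14Q2"]
labels = ["14Q2", "13Q2", "14Q2", "14Q1", "13Q4", "13Q4",
          "12Q4", "13Q3", "13Q4", "12Q3", "14Q2"]
values = [12867.8, 12710.7, 10746.9, 9634.4, 8238.5, 7835.2,
          7315.6, 7263.1, 7002.7, 6104.8, 5759.3]

totals = (
    pd.Series(values, index=labels)
    .groupby(level=0).sum()             # one total per label
    .reindex(Mdates, fill_value=0)      # Mdates order, 0 where no entries
)
print(totals)
```

For 14Q2 this gives 12867.8 + 10746.9 + 5759.3 = 29374.0.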
