Error in gvisLineChart while I have a data frame - r

I am trying to plot the numbers in a data frame using gvisLineChart. The object I am feeding it comes from a data.frame, but I still get the error Error: data has to be a data.frame.
My data has a single column; I trimmed the zeros and want to plot the remaining numbers.
My data frame looks like this:
mydf$figures
[1] 1250 760 2590 7990 2070 6770 4760 4270 2550 6070 4580 2350 1510 4140 2450 3010 1070 1230 850 490 170 1970 0 0
[25] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Then I trimmed the zeros:
mydf2<- subset(mydf,figures != 0)
mydf2$figures
[1] 1250 760 2590 7990 2070 6770 4760 4270 2550 6070 4580 2350 1510 4140 2450 3010 1070 1230 850 490 170 1970
Now I want to plot the numbers:
library(googleVis)
library(googleCharts)
gvisLineChart(mydf2$figures)
Error in gvisCoreChart(data, xvar, yvar, options, chartid, chart.type = "LineChart") :
Error: data has to be a data.frame.
But when I check the class, it is a data.frame:
class(mydf2)
[1] "data.frame"
Please help me understand this error and show me how I can plot the numbers using gvisLineChart. TIA
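A likely cause, sketched here as an assumption rather than an answer from the original thread: `mydf2$figures` extracts a plain numeric vector, not a data frame, which is exactly what gvisLineChart() is complaining about. A minimal illustration of the distinction, using a cut-down version of the data from the question:

```r
# Reconstructed, shortened version of the question's data
mydf <- data.frame(figures = c(1250, 760, 2590, 0, 0, 0))
mydf2 <- subset(mydf, figures != 0)

# $-extraction drops down to a bare vector; single-bracket indexing keeps the data.frame
is.data.frame(mydf2$figures)    # FALSE - this is what triggers the error
is.data.frame(mydf2["figures"]) # TRUE

# So a plausible fix is to pass the data frame itself and name the y column:
# library(googleVis)
# plot(gvisLineChart(mydf2, yvar = "figures"))
```

The gvisLineChart() call is left commented because it opens a browser; the key point is the vector/data.frame distinction above.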

How to extract columns which contain many 0 values with R?

I have a matrix with many columns (more than 817,000) and 40 rows. I would like to extract the columns which contain many zeros (for example more than 30 or 35; the exact threshold doesn't matter).
That should extract several columns, and I will choose one randomly to use as a reference for the rest of the matrix.
Any idea?
Edit :
OTU0001 OTU0004 OTU0014 OTU0016 OTU0017 OTU0027 OTU0029 OTU0030
Sample_10.rare 0 0 85 0 0 0 0 0
Sample_11.rare 0 42 169 0 42 127 0 85
Sample_12.rare 0 0 0 0 0 0 0 42
Sample_13.rare 762 550 2159 127 550 0 677 1397
Sample_14.rare 847 508 2751 169 1397 169 593 1990
Sample_15.rare 1143 593 3725 677 2116 466 212 2286
Sample_16.rare 5630 5291 5291 1270 3852 1185 296 2836
It should extract 4 columns, OTU0001 OTU0016 OTU0027 OTU0029, because they contain 3 zeros each. And if possible, I would also like to extract the positions of the extracted columns.
An option with base R, where 7 is the zero-count threshold:
Filter(function(x) sum(x == 0) > 7, df)
You could do something like this (where 7 is the required number of zeros):
library(dplyr)
df <- tibble(Col1 = c(rep(0, 10), rep(1, 10)),
             Col2 = c(rep(0, 5), rep(1, 15)),
             Col3 = c(rep(0, 15), rep(1, 5)))
y <- df %>%
  select_if(function(col) length(which(col == 0)) > 7)
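Since the question also asks for the positions of the extracted columns, here is a small base-R sketch using colSums(), with a made-up three-column subset of the OTU table from the edit and a threshold of 3 zeros:

```r
# Toy matrix built from three columns of the question's example data
m <- cbind(OTU0001 = c(0, 0, 0, 762, 847, 1143, 5630),
           OTU0004 = c(0, 42, 0, 550, 508, 593, 5291),
           OTU0016 = c(0, 0, 0, 127, 169, 677, 1270))

zero_counts <- colSums(m == 0)   # number of zeros in each column
idx <- which(zero_counts >= 3)   # positions of the zero-heavy columns
names(idx)                       # their column names
```

which() keeps the column names on the result, so `idx` gives both the positions and the names in one step.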

Match and replace character in column in R

I have been working on the following issue and cannot seem to solve it. Please give me your suggestions on how to solve it.
Let's say I have the following data frame.
NAICS 2017 NAICS 2012_1 NAICS 2012_2 NAICS 2012_3 NAICS 2012_4
2100 2111 0 0 0
9110 9119 5114 0 0
1113 5676 4875 2186 1153
6220 6225 1293 0 0
1115 3234 2163 0 0
7110 7873 0 0 0
1100 2679 8153 2114 1145
I essentially want to replace each value in the NAICS 2017 column with the matching four-digit code from the other four NAICS columns, if one with the same two-digit prefix exists.
So the code would find a 2-digit match (2100 matches 2111) and then replace the short code with the four-digit code (2100 becomes 2111).
Here is how the final data would look.
NAICS 2017 NAICS 2012_1 NAICS 2012_2 NAICS 2012_3 NAICS 2012_4
2111 2111 0 0 0
9119 9119 5114 0 0
1153 5676 4875 2186 1153
6225 6225 1293 0 0
1115 3234 2163 0 0
7110 7873 0 0 0
1145 2679 8153 2114 1145
Optional addition: only change the NAICS code if the NAICS 2017 value is a 2- or 3-digit code (i.e. 2100 or 2110).
Could this be done with a grepl or gsub code?
If you would like a full data set please don't hesitate to ask.
Try this; the update column is the result (the column names have underscores so that fread() can parse the header):
library(dplyr)
it <- data.table::fread(
  "NAICS_2017 NAICS_2012_1 NAICS_2012_2 NAICS_2012_3 NAICS_2012_4
  2100 2111 0 0 0
  9110 9119 5114 0 0
  1113 5676 4875 2186 1153
  6220 6225 1293 0 0
  1115 3234 2163 0 0
  7110 7873 0 0 0
  1100 2679 8153 2114 1145"
)
it <- mutate_all(it, as.character)
matchit <- function(x) {
  tmp <- x[-1]  # the four NAICS 2012 codes
  mypattern <- paste0("^", stringr::str_sub(x[[1]], 1, 2), ".*$")  # two-digit prefix of NAICS 2017
  hit <- tmp[which(grepl(mypattern, tmp))]
  return(ifelse(length(hit), hit[[1]], x[[1]]))
}
it$update <- apply(it, 1, matchit)
it
#> NAICS_2017 NAICS_2012_1 NAICS_2012_2 NAICS_2012_3 NAICS_2012_4 update
#> 1 2100 2111 0 0 0 2111
#> 2 9110 9119 5114 0 0 9119
#> 3 1113 5676 4875 2186 1153 1153
#> 4 6220 6225 1293 0 0 6225
#> 5 1115 3234 2163 0 0 1115
#> 6 7110 7873 0 0 0 7110
#> 7 1100 2679 8153 2114 1145 1145
Explanation:
For each row, the function returns the first element (other than the first) that matches the two-digit prefix pattern; if there is no match, it returns the first element unchanged.
We then build the update column by applying the function to each row with apply(it, 1, matchit).
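The prefix-matching step can be illustrated on a single row with base R alone, a simplified sketch of the same idea without the data.table/dplyr setup:

```r
# One row from the example: the 2017 code followed by the four 2012 codes
row <- c("1113", "5676", "4875", "2186", "1153")

prefix  <- substr(row[1], 1, 2)              # "11"
pattern <- paste0("^", prefix)               # anchor the prefix at the start
hits    <- row[-1][grepl(pattern, row[-1])]  # 2012 codes sharing the prefix

updated <- if (length(hits)) hits[1] else row[1]
updated                                      # "1153"
```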

How can I look up a value in a data.frame when the distinction has to be made in two columns?

Sorry for the very specific question, but I have a file like this:
Adj Year man mt wm wmt by bytl gr grtl
3 careless 1802 0 126 0 54 0 13 0 51
4 careless 1803 0 166 0 72 0 1 0 18
5 careless 1804 0 167 0 58 0 2 0 25
6 careless 1805 0 117 0 5 0 5 0 7
7 careless 1806 0 408 0 88 0 15 0 27
8 careless 1807 0 214 0 71 0 9 0 32
...
560 mean 1939 21 5988 8 1961 0 1152 0 1512
561 mean 1940 20 5810 6 1965 1 914 0 1444
562 mean 1941 10 6062 4 2097 5 964 0 1550
563 mean 1942 8 5352 2 1660 2 947 2 1506
564 mean 1943 14 5145 5 1614 1 878 4 1196
565 mean 1944 42 5630 6 1939 1 902 0 1583
566 mean 1945 17 6140 7 2192 4 1004 0 1906
Now I have to look up specific values (e.g. [careless, 1804, man] or [mean, 1944, wmt]).
I have no clue how to do that; one possibility would be to split the data.frame and create an array, if I'm correct. But I'd love a simpler solution.
Thank you in advance!
Subsetting on the specific values in the Adj and Year columns and then selecting the man column gives the required output:
df[df$Adj == "careless" & df$Year == 1804, "man"]
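A tiny reproducible sketch of that lookup, with a stand-in data frame built from a few of the rows shown in the question:

```r
# Minimal stand-in for the file in the question (values taken from its rows)
df <- data.frame(Adj  = c("careless", "careless", "mean"),
                 Year = c(1804, 1805, 1944),
                 man  = c(0, 0, 42),
                 wmt  = c(58, 5, 1939))

# Combine conditions on the two columns with &, then pick the wanted column
df[df$Adj == "careless" & df$Year == 1804, "man"]  # 0
df[df$Adj == "mean" & df$Year == 1944, "wmt"]      # 1939
```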

Convert data frame from wide to long with 2 variables

I have the following wide data frame (mydf.wide):
DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0
I want to produce the following "semi-long" format:
DAY variable_month value_month value_F
1 JAN 169 0
I tried:
library(reshape2)
mydf.long <- melt(mydf.wide, id.vars=c("YEAR","DAY"), measure.vars=c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
but this skips the F variables and I don't know how to deal with two sets of variables...
This is one of those cases where reshape(...) in base R is a better option.
months <- c(2,4,6,8,10,12,14,16,18,20,22,24) # column numbers of months
F <- c(3,5,7,9,11,13,15,17,19,21,23,25) # column numbers of Fn
mydf.long <- reshape(mydf.wide, idvar=1,
                     times=colnames(mydf.wide)[months],
                     varying=list(months, F),
                     v.names=c("value_month", "value_F"),
                     direction="long")
colnames(mydf.long)[2] <- "variable_month"
head(mydf.long)
# DAY variable_month value_month value_F
# 1.JAN 1 JAN 169 0
# 2.JAN 2 JAN 193 0
# 3.JAN 3 JAN 246 0
# 4.JAN 4 JAN 222 0
# 1.FEB 1 FEB 296 0
# 2.FEB 2 FEB 554 0
You can also do this with 2 calls to melt(...)
library(reshape2)
months <- c(2,4,6,8,10,12,14,16,18,20,22,24) # column numbers of months
F <- c(3,5,7,9,11,13,15,17,19,21,23,25) # column numbers of Fn
z.1 <- melt(mydf.wide,id=1,measure=months,
variable.name="variable_month",value.name="value_month")
z.2 <- melt(mydf.wide,id=1,measure=F,value.name="value_F")
mydf.long <- cbind(z.1,value_F=z.2$value_F)
head(mydf.long)
# DAY variable_month value_month value_F
# 1 1 JAN 169 0
# 2 2 JAN 193 0
# 3 3 JAN 246 0
# 4 4 JAN 222 0
# 5 1 FEB 296 0
# 6 2 FEB 554 0
melt() and dcast() are available from both the reshape2 and data.table packages. Recent versions of data.table can melt multiple sets of columns simultaneously; the patterns() parameter specifies the two sets of columns by regular expressions:
library(data.table) # CRAN version 1.10.4 used
regex_month <- toupper(paste(month.abb, collapse = "|"))
mydf.long <- melt(setDT(mydf.wide), measure.vars = patterns(regex_month, "F\\d"),
value.name = c("MONTH", "F"))
# rename factor levels
mydf.long[, variable := forcats::lvls_revalue(variable, toupper(month.abb))][]
DAY variable MONTH F
1: 1 JAN 169 0
2: 2 JAN 193 0
3: 3 JAN 246 0
4: 4 JAN 222 0
5: 1 FEB 296 0
...
44: 4 NOV 241 0
45: 1 DEC 209 0
46: 2 DEC 250 0
47: 3 DEC 164 0
48: 4 DEC 230 0
DAY variable MONTH F
Note that "F\\d" is used as the regular expression in patterns(). A plain "F" would have matched FEB as well as F1, F2, etc., producing unexpected results.
Also note that mydf.wide needs to be coerced to a data.table object. Otherwise, reshape2::melt() will be dispatched on a data.frame object which doesn't recognize patterns().
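The difference between the two patterns can be checked directly with base R's grepl(), independent of data.table:

```r
# A few of the wide column names from the question
cols <- c("DAY", "JAN", "F1", "FEB", "F2")

grepl("F", cols)     # matches FEB too:   FALSE FALSE TRUE TRUE TRUE
grepl("F\\d", cols)  # only F1, F2, ...:  FALSE FALSE TRUE FALSE TRUE
```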
Data
library(data.table)
mydf.wide <- fread(
"DAY JAN F1 FEB F2 MAR F3 APR F4 MAY F5 JUN F6 JUL F7 AUG F8 SEP F9 OCT F10 NOV F11 DEC F12
1 169 0 296 0 1095 0 599 0 1361 0 1746 0 2411 0 2516 0 1614 0 908 0 488 0 209 0
2 193 0 554 0 1085 0 1820 0 1723 0 2787 0 2548 0 1402 0 1633 0 897 0 411 0 250 0
3 246 0 533 0 1111 0 1817 0 2238 0 2747 0 1575 0 1912 0 705 0 813 0 156 0 164 0
4 222 0 547 0 1125 0 1789 0 2181 0 2309 0 1569 0 1798 0 1463 0 878 0 241 0 230 0",
data.table = FALSE)

How to remove rows with 0 values using R

Hi, I am using a matrix of gene-expression fragment counts to call differentially expressed genes. I would like to know how to remove the rows whose values are all 0. Then my data set will be more compact, and fewer spurious results will come out of the downstream analysis I do with this matrix.
Input
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000005 0 0 0 0 0 0
XLOC_000006 0 0 0 0 0 0
XLOC_000007 0 0 0 0 1 3
XLOC_000008 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
Desired output
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000007 0 0 0 0 1 3
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
As of now I only want to remove the rows where all the frag-count columns are 0; if in a row some values are 0 and others are non-zero, I would like to keep that row intact, as you can see in my example above.
Please let me know how to do this.
df[apply(df[,-1], 1, function(x) !all(x==0)),]
A lot of options for doing this within the tidyverse have been posted here: How to remove rows where all columns are zero using dplyr pipe.
My preferred option is using rowwise():
library(tidyverse)
df <- df %>%
  rowwise() %>%
  filter(sum(c(col1, col2, col3)) != 0)
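For non-negative count data like this, a vectorised base-R alternative (my addition, not one of the original answers) is to keep the rows whose counts don't sum to zero:

```r
# Toy version of the frag-count table: gene id plus two count columns
df <- data.frame(gene   = c("XLOC_000001", "XLOC_000005", "XLOC_000007"),
                 ZPT.1  = c(3516, 0, 0),
                 PDGT.1 = c(4309, 0, 1))

# Counts are >= 0, so a row of all zeros is exactly a row whose rowSums is 0
kept <- df[rowSums(df[, -1]) != 0, ]
kept$gene  # "XLOC_000001" "XLOC_000007"
```

This avoids the per-row apply() loop, but note it relies on the counts never being negative.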
