I am trying to keep only the emails from a particular domain and replace all other values with NULL (NA) if BB = TRUE; if BB = FALSE, all the values should stay as they are.
df6 <- data.frame(name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","bhr,gydbt","sgyu,hytb","vdti,kula","mftyu,huta","ibdy,vcge","cday,bhsue","ajtu,nudj"),
email=c("xab.try#ybcd.com","Lan.xab#ybcd.com","tth.vgu#ybcd.com","mmc.vgtu#ybcd.com","aaf.dgsy#partnt.com","nnhu.kull#ybcd.com","njam.hula#ybcd.com","jiha.mund#ybcd.com","ntha.htfy#ybcd.com","gydbt.bhr#ybcd.com","hytb.sgyu#ybcd.com","kula.vdti#ybcd.com","huta.mftyu#ybcd.com","ggat.khul#ybcd.com","bhsue.cday#ybcd.com","nudj.ajtu#ybcd.com"))
BB=TRUE
col_drop <- c("partnt.com")
df6 <- ifelse(BB==TRUE,
df6 <- df6[ , !(names(df6) %in% col_drop)],df6) %>% as.data.frame()
The output should look like this:
This works for me :)
library(dplyr, warn.conflicts = FALSE)
df6 <- data.frame(name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","bhr,gydbt","sgyu,hytb","vdti,kula","mftyu,huta","ibdy,vcge","cday,bhsue","ajtu,nudj"),
email=c("xab.try#ybcd.com","Lan.xab#ybcd.com","tth.vgu#ybcd.com","mmc.vgtu#ybcd.com","aaf.dgsy#partnt.com","nnhu.kull#ybcd.com","njam.hula#ybcd.com","jiha.mund#ybcd.com","ntha.htfy#ybcd.com","gydbt.bhr#ybcd.com","hytb.sgyu#ybcd.com","kula.vdti#ybcd.com","huta.mftyu#ybcd.com","ggat.khul#ybcd.com","bhsue.cday#ybcd.com","nudj.ajtu#ybcd.com"))
col_drop <- c("partnt.com")
# keep emails containing col_drop (matched literally), set all others to NA
mutate(df6, email = if_else(grepl(col_drop, email, fixed = TRUE), email, NA_character_))
#> name email
#> 1 try,xab <NA>
#> 2 xab,Lan <NA>
#> 3 mhy,mun <NA>
#> 4 vgtu,mmc <NA>
#> 5 dgsy,aaf aaf.dgsy#partnt.com
#> 6 kull,nnhu <NA>
#> 7 hula,njam <NA>
#> 8 mund,jiha <NA>
#> 9 htfy,ntha <NA>
#> 10 bhr,gydbt <NA>
#> 11 sgyu,hytb <NA>
#> 12 vdti,kula <NA>
#> 13 mftyu,huta <NA>
#> 14 ibdy,vcge <NA>
#> 15 cday,bhsue <NA>
#> 16 ajtu,nudj <NA>
Created on 2020-09-27 by the reprex package (v0.3.0)
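One detail worth knowing about grepl() here: by default its pattern is a regular expression, so the "." in "partnt.com" matches any character. A small base-R illustration, using addresses where the third one is made up, of how fixed = TRUE or endsWith() turn this into a literal match:

```r
emails <- c("aaf.dgsy#partnt.com", "xab.try#ybcd.com", "made.up#partntXcom")

as_regex  <- grepl("partnt.com", emails)                # "." matches any character
as_fixed  <- grepl("partnt.com", emails, fixed = TRUE)  # literal substring match
as_suffix <- endsWith(emails, "#partnt.com")            # literal match at the end only

as_regex   # TRUE FALSE  TRUE
as_fixed   # TRUE FALSE FALSE
as_suffix  # TRUE FALSE FALSE
```

With the sample data in the question all three give the same result, so this only matters when other domains happen to look similar.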
Does this work?
> df6[!grepl('partnt.com', df6$email), 'email'] <- NA
> df6
name email
1 try,xab <NA>
2 xab,Lan <NA>
3 mhy,mun <NA>
4 vgtu,mmc <NA>
5 dgsy,aaf aaf.dgsy#partnt.com
6 kull,nnhu <NA>
7 hula,njam <NA>
8 mund,jiha <NA>
9 htfy,ntha <NA>
10 bhr,gydbt <NA>
11 sgyu,hytb <NA>
12 vdti,kula <NA>
13 mftyu,huta <NA>
14 ibdy,vcge <NA>
15 cday,bhsue <NA>
16 ajtu,nudj <NA>
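The same base-R idea can be wrapped so that the BB flag from the question decides whether anything is replaced. A minimal sketch: keep_domain() is a hypothetical helper name, and it assumes email is a character column whose domain follows the "#" as in the sample data. endsWith() matches the literal suffix, so the "." in the domain is not treated as a regex wildcard.

```r
# hypothetical helper: when BB is TRUE, set every email outside `domain` to NA;
# when BB is FALSE, return the data unchanged
keep_domain <- function(df, domain, BB = TRUE) {
  if (isTRUE(BB)) {
    other <- !endsWith(df$email, paste0("#", domain))
    df$email[other] <- NA
  }
  df
}

df6 <- data.frame(name  = c("try,xab", "dgsy,aaf", "kull,nnhu"),
                  email = c("xab.try#ybcd.com", "aaf.dgsy#partnt.com", "nnhu.kull#ybcd.com"),
                  stringsAsFactors = FALSE)

keep_domain(df6, "partnt.com", BB = TRUE)   # only row 2 keeps its email
keep_domain(df6, "partnt.com", BB = FALSE)  # all emails kept
```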
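The following should work:
library(data.table)
setDT(df6)  # convert df6 to a data.table in place
BB <- TRUE
domain_to_keep <- "partnt.com"
# when BB is TRUE, set every email that does not end in "#partnt.com" to NA
df6[BB & !grepl(paste0("#", domain_to_keep, "$"), email), email := NA_character_]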
I am new to processing RNA-seq data. I have human RNA-seq data and am now trying to count reads per gene using summarizeOverlaps, but I get this warning for all my files:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
This is what I did:
I aligned my RNA-seq files to an Ensembl reference file (Homo_sapiens.GRCh38.cdna.all.fa.gz) and generated BAM files.
seqinfo(bamfiles)
Seqinfo object with 190508 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
ENST00000633009.1 20 <NA> <NA>
ENST00000634070.1 18 <NA> <NA>
ENST00000632963.1 20 <NA> <NA>
ENST00000633030.1 19 <NA> <NA>
ENST00000633765.1 31 <NA> <NA>
... ... ... ...
ENST00000638565.1 1331 <NA> <NA>
ENST00000673346.1 895 <NA> <NA>
ENST00000673247.1 369 <NA> <NA>
ENST00000672305.1 758 <NA> <NA>
ENST00000671911.1 943 <NA> <NA>
I also downloaded the GTF file from Ensembl:
Homo_sapiens.GRCh38.100.gtf.gz
seqinfo(txdb)
Seqinfo object with 47 sequences (1 circular) from an unspecified genome; no seqlengths:
seqnames seqlengths isCircular genome
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
3 <NA> <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> <NA> <NA>
... ... ... ...
KI270731.1 <NA> <NA> <NA>
KI270733.1 <NA> <NA> <NA>
KI270734.1 <NA> <NA> <NA>
KI270744.1 <NA> <NA> <NA>
KI270750.1 <NA> <NA> <NA>
I am guessing it has something to do with the seqnames, but I am not sure what I have to do. I tried converting them to Ensembl style:
mapSeqlevels(seqlevels(bamfiles), "Ensembl")
mapSeqlevels(seqlevels(txdb), "Ensembl")
but that did not do anything...
NB: featureCounts does not work either...
Thanks in advance!
Sandra
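A quick check based on the two seqinfo() printouts above: the BAM files use transcript IDs (ENST...) as sequence names, presumably because the reads were aligned to the cDNA FASTA, while the TxDb built from the GTF uses chromosome and contig names. The sketch below only uses a few names copied from those truncated printouts; in a real session the same check would be intersect(seqlevels(bamfiles), seqlevels(txdb)). mapSeqlevels() only translates between chromosome naming styles (such as "chr1" and "1"), so it cannot turn transcript IDs into chromosome names, which suggests the reads would need to be aligned against the genome sequence for gene-level overlap counting.

```r
# a few seqnames copied from the truncated printouts in the question
bam_levels  <- c("ENST00000633009.1", "ENST00000634070.1", "ENST00000671911.1")
txdb_levels <- c("1", "2", "3", "4", "5", "KI270731.1", "KI270750.1")

# names present in both objects; an empty result matches the warning
shared <- intersect(bam_levels, txdb_levels)
length(shared)  # 0
```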
I am trying to calculate readability, but it seems everything is written to expect either a file path or a Corpus. How do I handle a string?
Error (on the tokenization step):
Error: Unable to locate
I tried:
str <- c("Readability zero one. Ten, Eleven.", "The cat in a dilapidated tophat.")
library(koRpus)
ll.tagged <- tokenize(str, lang="en")
readability(ll.tagged,measure="Flesch.Kincaid")
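You need to install the English language package, and tell tokenize() that the input is a text object rather than a file path (format = "obj"):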
install.koRpus.lang(c("en"))
library(koRpus.lang.en)
ll.tagged <- tokenize(str, format = "obj", lang = "en")
ll.tagged
doc_id token tag lemma lttr wclass desc stop stem idx sntc
1 <NA> Readability word.kRp 11 word <NA> <NA> <NA> 1 1
2 <NA> zero word.kRp 4 word <NA> <NA> <NA> 2 1
3 <NA> one word.kRp 3 word <NA> <NA> <NA> 3 1
4 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 4 1
5 <NA> Ten word.kRp 3 word <NA> <NA> <NA> 5 2
6 <NA> , ,kRp 1 comma <NA> <NA> <NA> 6 2
[...]
10 <NA> cat word.kRp 3 word <NA> <NA> <NA> 10 3
11 <NA> in word.kRp 2 word <NA> <NA> <NA> 11 3
12 <NA> a word.kRp 1 word <NA> <NA> <NA> 12 3
13 <NA> dilapidated word.kRp 11 word <NA> <NA> <NA> 13 3
14 <NA> tophat word.kRp 6 word <NA> <NA> <NA> 14 3
15 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 15 3
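For sanity-checking the number that readability() reports, the standard Flesch-Kincaid grade level is computed from three counts: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. A minimal base-R sketch of that formula; fk_grade() is a hypothetical helper, and since koRpus counts syllables with its own hyphenation, hand-counted inputs may give a slightly different result.

```r
# Flesch-Kincaid grade level from raw counts (standard published coefficients)
fk_grade <- function(words, sentences, syllables) {
  0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
}

fk_grade(words = 100, sentences = 5, syllables = 150)  # 9.91
```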
Sorry, the title may not describe this well.
I have a data frame from my Google location history.
Original:
> head(testAC)
latitudeE7 longitudeE7 activity
1 247915291 1209946249 NULL
2 248033293 1209803613 NULL
3 248033293 1209803613 1505536182769, IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
Desired result:
> head(testAC)
latitudeE7|longitudeE7| activityTime|mainactivity| speed
1 247915291| 1209946249| | NULL |
2 248033293| 1209803613| | NULL |
3 248033293| 1209803613|1505536182769| IN_VEHICLE | 54
4 248033293| 1209803613|1505536182769| STILL | 31
5 248033293| 1209803613|1505536182769| UNKNOWN | 15
Row 3 of the original should become rows 3 to 5 of the result.
The only approach I know is do.call("rbind", testAC$activity), but that only splits out activity; latitudeE7 and longitudeE7 disappear:
> do.call ("rbind", testAC$activity)
timestampMs activity
1 1505536182769 IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
2 1505536077547 IN_VEHICLE, UNKNOWN, ON_BICYCLE, STILL, 64, 23, 8, 5
I have searched for two days but could not find anything, perhaps because I don't know the right keywords.
Can anyone explain how to do what I want?
Thank you
I have uploaded the .RData file to Google Drive in case you want to take a closer look at the data.
How about this:
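library(plyr)
# flatten each activity entry into one named vector (NA if missing), row-bind them by name with ldply() (missing columns become NA), then put the coordinates back
cbind(dataAC[, 1:2], ldply(lapply(dataAC$activity, function(x) if (!is.null(x)) unlist(lapply(x, unlist)) else NA), rbind))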
This gives you a flat data frame instead of nested lists, which you can then reshape however you want:
latitudeE7 longitudeE7 1 timestampMs activity.type1 activity.type2 activity.type3 activity.confidence1 activity.confidence2
1 247915291 1209946249 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 248033293 1209803613 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 248033293 1209803613 <NA> 1505536182769 IN_VEHICLE STILL UNKNOWN 54 31
4 248002555 1209895254 <NA> 1505536077547 IN_VEHICLE UNKNOWN ON_BICYCLE 64 23
5 247966714 1209957315 <NA> 1505535932508 IN_VEHICLE ON_BICYCLE <NA> 54 46
6 247966714 1209957315 <NA> 1505535825664 <NA> <NA> <NA> <NA> <NA>
activity.confidence3 activity.type4 activity.confidence4 activity.type activity.confidence
1 <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA>
3 15 <NA> <NA> <NA> <NA>
4 8 STILL 5 <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> TILTING 100
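If you want the long layout from the question directly (one row per activity) rather than the wide table above, here is a base-R sketch. As a simplification it assumes each element of testAC$activity is either NULL or a data frame with one row per activity and columns timestampMs, type and confidence; the real Google export is more deeply nested (see the do.call output in the question), so those column names are assumptions to adapt. The values the question labels speed appear as activity.confidence in the output above, so the sketch reads them from confidence.

```r
# toy data mimicking the first three rows of the question (simplified structure)
testAC <- data.frame(latitudeE7  = c(247915291, 248033293, 248033293),
                     longitudeE7 = c(1209946249, 1209803613, 1209803613))
testAC$activity <- list(
  NULL,
  NULL,
  data.frame(timestampMs = "1505536182769",
             type        = c("IN_VEHICLE", "STILL", "UNKNOWN"),
             confidence  = c(54, 31, 15),
             stringsAsFactors = FALSE)
)

# build one small data frame per original row, repeating the coordinates
# once per activity (or keeping a single NA row when there is no activity)
expand_row <- function(i) {
  coords <- testAC[i, c("latitudeE7", "longitudeE7")]
  act <- testAC$activity[[i]]
  if (is.null(act) || nrow(act) == 0) {
    return(data.frame(coords, activityTime = NA_character_,
                      mainactivity = NA_character_, speed = NA_real_,
                      stringsAsFactors = FALSE))
  }
  data.frame(coords[rep(1, nrow(act)), ],
             activityTime = act$timestampMs,
             mainactivity = act$type,
             speed        = act$confidence,
             stringsAsFactors = FALSE)
}

# stack the pieces and reset the row names
result <- do.call(rbind, lapply(seq_len(nrow(testAC)), expand_row))
rownames(result) <- NULL
result
```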