Expand all dropdown lists and collect the data with RCurl - R

From a page like this
https://stackoverflow.com/users/11786778/nathalie?tab=reputation
How can I expand every dropdown in the reputation table and collect the information that is loaded when each one opens?

Is this what you want?
library(rvest)
library(magrittr)
library(plyr)
# Scrape one URL at a time
url <- "https://stackoverflow.com/users/11786778/nathalie?tab=reputation"
# Parse the table(s) matched by the XPath //table[1] into a list of data frames
repdata <- read_html(url) %>% html_nodes(xpath = "//table[1]") %>% html_table(fill = TRUE)
# Stack the list of tables into one data frame
df <- ldply(repdata, data.frame)
Produces:
1 <NA>
2 Take the result of call for a list of ids
3 <NA>
4 <NA>
5 <NA>
6 Add Detailed history
7 <NA>
8 <NA>
9 <NA>
10 <NA>
11 <NA>
12 <NA>
13 <NA>
14 <NA>
15 <NA>
16 <NA>
17 <NA>
18 <NA>
19 <NA>
20 <NA>
21 <NA>
22 <NA>
23 <NA>
24 <NA>
25 <NA>
26 <NA>
27 <NA>
28 <NA>
29 <NA>
30 <NA>
31 <NA>
32 <NA>
33 <NA>
34 <NA>
35 <NA>
36 <NA>
37 <NA>
38 <NA>
39 <NA>
40 <NA>
41 <NA>
42 <NA>
43 <NA>
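If you would rather not load plyr, the ldply() step that stacks the list returned by html_table() can be sketched in base R with do.call(rbind, ...). This assumes every table in the list has the same column names; the two tables below are hypothetical stand-ins, not scraped data.

```r
# Hypothetical stand-ins for the list of data frames returned by html_table()
tables <- list(
  data.frame(event = c("upvote", "accept"), rep = c(10, 15), stringsAsFactors = FALSE),
  data.frame(event = "downvote", rep = -2, stringsAsFactors = FALSE)
)

# Base-R counterpart of plyr::ldply(tables, data.frame):
# bind the data frames row-wise (columns are matched by name)
df <- do.call(rbind, tables)
print(df)
```

If the tables do not share column names, rbind() stops with an error, so check names(tables[[1]]) against the others first.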

Related

Keeping only a particular domain in a column

I am trying to keep only a particular email domain and replace all other values with NULL if BB = TRUE; if BB = FALSE, all the values should be kept.
df6 <- data.frame(name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","bhr,gydbt","sgyu,hytb","vdti,kula","mftyu,huta","ibdy,vcge","cday,bhsue","ajtu,nudj"),
email=c("xab.try#ybcd.com","Lan.xab#ybcd.com","tth.vgu#ybcd.com","mmc.vgtu#ybcd.com","aaf.dgsy#partnt.com","nnhu.kull#ybcd.com","njam.hula#ybcd.com","jiha.mund#ybcd.com","ntha.htfy#ybcd.com","gydbt.bhr#ybcd.com","hytb.sgyu#ybcd.com","kula.vdti#ybcd.com","huta.mftyu#ybcd.com","ggat.khul#ybcd.com","bhsue.cday#ybcd.com","nudj.ajtu#ybcd.com"))
BB=TRUE
col_drop <- c("partnt.com")
df6 <- ifelse(BB==TRUE,
df6 <- df6[ , !(names(df6) %in% col_drop)],df6) %>% as.data.frame()
The output should look like this:
This works for me :)
library(dplyr, warn.conflicts = FALSE)
df6 <- data.frame(name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","bhr,gydbt","sgyu,hytb","vdti,kula","mftyu,huta","ibdy,vcge","cday,bhsue","ajtu,nudj"),
email=c("xab.try#ybcd.com","Lan.xab#ybcd.com","tth.vgu#ybcd.com","mmc.vgtu#ybcd.com","aaf.dgsy#partnt.com","nnhu.kull#ybcd.com","njam.hula#ybcd.com","jiha.mund#ybcd.com","ntha.htfy#ybcd.com","gydbt.bhr#ybcd.com","hytb.sgyu#ybcd.com","kula.vdti#ybcd.com","huta.mftyu#ybcd.com","ggat.khul#ybcd.com","bhsue.cday#ybcd.com","nudj.ajtu#ybcd.com"))
col_drop <- c("partnt.com")
# Keep emails that contain col_drop (matched literally); set the rest to NA
mutate(df6, email = if_else(grepl(col_drop, email, fixed = TRUE), email, NA_character_))
#> name email
#> 1 try,xab <NA>
#> 2 xab,Lan <NA>
#> 3 mhy,mun <NA>
#> 4 vgtu,mmc <NA>
#> 5 dgsy,aaf aaf.dgsy#partnt.com
#> 6 kull,nnhu <NA>
#> 7 hula,njam <NA>
#> 8 mund,jiha <NA>
#> 9 htfy,ntha <NA>
#> 10 bhr,gydbt <NA>
#> 11 sgyu,hytb <NA>
#> 12 vdti,kula <NA>
#> 13 mftyu,huta <NA>
#> 14 ibdy,vcge <NA>
#> 15 cday,bhsue <NA>
#> 16 ajtu,nudj <NA>
Created on 2020-09-27 by the reprex package (v0.3.0)
Does this work?
df6[!grepl('partnt.com', df6$email), 'email'] <- NA
df6
name email
1 try,xab <NA>
2 xab,Lan <NA>
3 mhy,mun <NA>
4 vgtu,mmc <NA>
5 dgsy,aaf aaf.dgsy#partnt.com
6 kull,nnhu <NA>
7 hula,njam <NA>
8 mund,jiha <NA>
9 htfy,ntha <NA>
10 bhr,gydbt <NA>
11 sgyu,hytb <NA>
12 vdti,kula <NA>
13 mftyu,huta <NA>
14 ibdy,vcge <NA>
15 cday,bhsue <NA>
16 ajtu,nudj <NA>
The following should work:
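library(data.table)
setDT(df6)  # convert df6 to a data.table by reference
BB <- TRUE
domain_to_keep <- "partnt.com"
# When BB is TRUE, blank out (set to "") every email that does not end in "#partnt.com"
df6[BB & !grepl(paste0("#", domain_to_keep, "$"), email) , email := "" ]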
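A base-R sketch of the full BB switch from the question, wrapped in a hypothetical helper keep_domain() (not from any package). It sets non-matching emails to NA, since an atomic character column cannot hold NULL entries, and it uses endsWith() so the "." in the domain is matched literally instead of as a regex wildcard. The three-row df6 is a shortened copy of the question's data.

```r
# Hypothetical helper: when only_domain is TRUE, keep emails ending in "#<domain>"
# and set every other email to NA; when FALSE, return df unchanged
keep_domain <- function(df, domain, only_domain = TRUE) {
  if (isTRUE(only_domain)) {
    # endsWith() is a literal suffix test, so "." is not treated as a wildcard
    keep <- endsWith(df$email, paste0("#", domain))
    df$email[!keep] <- NA
  }
  df
}

# Shortened copy of df6 from the question
df6 <- data.frame(
  name  = c("try,xab", "dgsy,aaf", "kull,nnhu"),
  email = c("xab.try#ybcd.com", "aaf.dgsy#partnt.com", "nnhu.kull#ybcd.com"),
  stringsAsFactors = FALSE
)

res_true  <- keep_domain(df6, "partnt.com", only_domain = TRUE)   # BB = TRUE
res_false <- keep_domain(df6, "partnt.com", only_domain = FALSE)  # BB = FALSE
res_true
```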

counting genes: error: the combined objects have no sequence levels in common

I am new to processing RNA-seq data. I have human RNA-seq data and am now trying to get gene counts with summarizeOverlaps, but I get this warning for all my files:
"In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)"
This is what I did:
I aligned my RNA-seq files to an Ensembl reference file (Homo_sapiens.GRCh38.cdna.all.fa.gz) and generated BAM files.
seqinfo(bamfiles)
Seqinfo object with 190508 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
ENST00000633009.1 20 <NA> <NA>
ENST00000634070.1 18 <NA> <NA>
ENST00000632963.1 20 <NA> <NA>
ENST00000633030.1 19 <NA> <NA>
ENST00000633765.1 31 <NA> <NA>
... ... ... ...
ENST00000638565.1 1331 <NA> <NA>
ENST00000673346.1 895 <NA> <NA>
ENST00000673247.1 369 <NA> <NA>
ENST00000672305.1 758 <NA> <NA>
ENST00000671911.1 943 <NA> <NA>
I also downloaded the GTF file from Ensembl:
Homo_sapiens.GRCh38.100.gtf.gz
seqinfo(txdb)
Seqinfo object with 47 sequences (1 circular) from an unspecified genome; no seqlengths:
seqnames seqlengths isCircular genome
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
3 <NA> <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> <NA> <NA>
... ... ... ...
KI270731.1 <NA> <NA> <NA>
KI270733.1 <NA> <NA> <NA>
KI270734.1 <NA> <NA> <NA>
KI270744.1 <NA> <NA> <NA>
KI270750.1 <NA> <NA> <NA>
I am guessing it has something to do with the seqnames, but I am not sure what I have to do. I tried converting it to Ensembl-style:
mapSeqlevels(seqlevels(bamfiles), "Ensembl")
mapSeqlevels(seqlevels(txdb), "Ensembl")
but that did not do anything.
NB: featureCounts does not work either.
Thanks in advance!
Sandra
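The printed seqnames already hint at the mismatch the warning describes: the BAM side lists Ensembl transcript IDs (ENST...), while the txdb side lists chromosome and contig names. A minimal base-R sketch of that check, using a few seqnames copied from the two seqinfo() listings above; with the real objects the vectors would come from seqlevels(bamfiles) and seqlevels(txdb), as in the mapSeqlevels() calls.

```r
# A few seqnames copied from the question's seqinfo() output
bam_levels  <- c("ENST00000633009.1", "ENST00000634070.1", "ENST00000632963.1")  # BAM: transcript IDs
txdb_levels <- c("1", "2", "3", "KI270731.1")                                     # txdb: chromosomes/contigs

# Sequence names present in both objects; the warning says this set is empty
common <- intersect(bam_levels, txdb_levels)
length(common)
```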

How can I tokenize a string in R?

I am trying to calculate readability, but it seems everything is written to expect either a file path or a Corpus. How do I handle a string?
Error (on the tokenization step):
Error: Unable to locate
I tried:
str <- c("Readability zero one. Ten, Eleven.", "The cat in a dilapidated tophat.")
library(koRpus)
ll.tagged <- tokenize(str, lang="en")
readability(ll.tagged,measure="Flesch.Kincaid")
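You need to install the English language package, and pass format = "obj" so that tokenize() reads str as text rather than as a file path: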
install.koRpus.lang(c("en"))
library(koRpus.lang.en)
ll.tagged <- tokenize(str, format = "obj", lang = "en")
ll.tagged
doc_id token tag lemma lttr wclass desc stop stem idx sntc
1 <NA> Readability word.kRp 11 word <NA> <NA> <NA> 1 1
2 <NA> zero word.kRp 4 word <NA> <NA> <NA> 2 1
3 <NA> one word.kRp 3 word <NA> <NA> <NA> 3 1
4 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 4 1
5 <NA> Ten word.kRp 3 word <NA> <NA> <NA> 5 2
6 <NA> , ,kRp 1 comma <NA> <NA> <NA> 6 2
[...]
10 <NA> cat word.kRp 3 word <NA> <NA> <NA> 10 3
11 <NA> in word.kRp 2 word <NA> <NA> <NA> 11 3
12 <NA> a word.kRp 1 word <NA> <NA> <NA> 12 3
13 <NA> dilapidated word.kRp 11 word <NA> <NA> <NA> 13 3
14 <NA> tophat word.kRp 6 word <NA> <NA> <NA> 14 3
15 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 15 3
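koRpus does its own tokenization and also adds tags, lemmas, and sentence numbers. Purely to illustrate what the token and lttr columns above contain, here is a rough base-R approximation that splits the text into runs of letters and single punctuation marks. It is not koRpus's algorithm and will differ on numbers, contractions, hyphenated words, and similar cases.

```r
str <- c("Readability zero one. Ten, Eleven.", "The cat in a dilapidated tophat.")

# Join the strings, then extract runs of letters or single punctuation characters
txt <- paste(str, collapse = " ")
tokens <- regmatches(txt, gregexpr("[[:alpha:]]+|[[:punct:]]", txt))[[1]]

# One row per token, with its character count (like koRpus's lttr column)
tok_df <- data.frame(token = tokens, lttr = nchar(tokens), stringsAsFactors = FALSE)
tok_df
```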

R - split list and merge into the same table

Sorry, the title may not describe this well.
I have a data frame from my Google history.
Original:
> head(testAC)
latitudeE7 longitudeE7 activity
1 247915291 1209946249 NULL
2 248033293 1209803613 NULL
3 248033293 1209803613 1505536182769, IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
Desired result:
> head(testAC)
latitudeE7|longitudeE7| activityTime|mainactivity| speed
1 247915291| 1209946249| | NULL |
2 248033293| 1209803613| | NULL |
3 248033293| 1209803613|1505536182769| IN_VEHICLE | 54
4 248033293| 1209803613|1505536182769| STILL | 31
5 248033293| 1209803613|1505536182769| UNKNOWN | 15
Row 3 of the original should become rows 3 to 5 of the result.
The only approach I know is do.call("rbind", testAC$activity),
but that only splits activity; latitudeE7 and longitudeE7 disappear:
> do.call ("rbind", testAC$activity)
timestampMs activity
1 1505536182769 IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
2 1505536077547 IN_VEHICLE, UNKNOWN, ON_BICYCLE, STILL, 64, 23, 8, 5
I have searched for two days, but I may not be using the right keywords, and I could not find anything.
Can anyone explain how to do what I want?
Thank you
I have uploaded the .RData file to Google Drive in case it helps.
How about this:
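library(plyr)
# Flatten each nested activity entry into one vector (NA when it is NULL),
# bind those vectors row-wise with ldply(), and put the two coordinate columns back in front
cbind(dataAC[, 1:2],
      ldply(lapply(dataAC$activity,
                   function(x) if (!is.null(x)) unlist(lapply(x, unlist)) else NA),
            rbind))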
This gives you a data frame instead of nested lists, which you can then reshape however you want:
latitudeE7 longitudeE7 1 timestampMs activity.type1 activity.type2 activity.type3 activity.confidence1 activity.confidence2
1 247915291 1209946249 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 248033293 1209803613 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 248033293 1209803613 <NA> 1505536182769 IN_VEHICLE STILL UNKNOWN 54 31
4 248002555 1209895254 <NA> 1505536077547 IN_VEHICLE UNKNOWN ON_BICYCLE 64 23
5 247966714 1209957315 <NA> 1505535932508 IN_VEHICLE ON_BICYCLE <NA> 54 46
6 247966714 1209957315 <NA> 1505535825664 <NA> <NA> <NA> <NA> <NA>
activity.confidence3 activity.type4 activity.confidence4 activity.type activity.confidence
1 <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA>
3 15 <NA> <NA> <NA> <NA>
4 8 STILL 5 <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> TILTING 100
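To go straight to the long layout the question asks for (one row per activity type, coordinates repeated), one option is to expand each row separately and bind the pieces. This is a sketch on a hypothetical, simplified stand-in for testAC: the real nested structure from the Google export may differ, the values are copied from the question's output, NA replaces the "NULL" placeholders, and the column names activityTime, mainactivity, and speed follow the question's desired result.

```r
# Hypothetical, simplified stand-in for testAC: rows 1-2 have no activity (NULL);
# row 3 holds a timestamp plus a data frame of activity types and confidence values
testAC <- data.frame(latitudeE7  = c(247915291, 248033293, 248033293),
                     longitudeE7 = c(1209946249, 1209803613, 1209803613))
testAC$activity <- list(
  NULL,
  NULL,
  list(timestampMs = "1505536182769",
       activity = data.frame(type = c("IN_VEHICLE", "STILL", "UNKNOWN"),
                             confidence = c(54, 31, 15),
                             stringsAsFactors = FALSE))
)

# Expand row i into one output row per activity type; rows without activity get NAs
expand_row <- function(i) {
  a <- testAC$activity[[i]]
  if (is.null(a)) {
    return(data.frame(latitudeE7 = testAC$latitudeE7[i],
                      longitudeE7 = testAC$longitudeE7[i],
                      activityTime = NA_character_,
                      mainactivity = NA_character_,
                      speed = NA_real_,
                      stringsAsFactors = FALSE))
  }
  data.frame(latitudeE7 = testAC$latitudeE7[i],
             longitudeE7 = testAC$longitudeE7[i],
             activityTime = a$timestampMs,
             mainactivity = a$activity$type,
             speed = a$activity$confidence,  # the question's "speed" column holds these numbers
             stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(seq_len(nrow(testAC)), expand_row))
rownames(result) <- NULL
result
```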

R: sorting a row at a time - strange things happening

I have some data that looks like this:
> head(trim)
origin X.1 X.2 X.3 X.4 X.5
1 017003001 257056001 <NA> <NA> <NA> <NA>
6 017035001 017017001 017039001 <NA> <NA> <NA>
8 017038003 087041002 <NA> <NA> <NA> <NA>
11 027001001 027006006 027054001/027054003 <NA> <NA> <NA>
12 027002002 027081001 117016001 <NA> <NA> <NA>
15 027006006 027001001 027006001 <NA> <NA> <NA>
I'm attempting to sort across each row like so:
for(lc in 1:nrow(trim)){
trim[lc,] <-sort(trim[lc,],na.last=TRUE)
}
The output looks like this:
> head(trim)
origin X.1 X.2 X.3 X.4 X.5
1 017003001 257056001 <NA> <NA> <NA> <NA>
6 017017001 6 017039001 <NA> <NA> <NA>
8 017038003 087041002 <NA> <NA> <NA> <NA>
11 027001001 027006006 027054001/027054003 <NA> <NA> <NA>
12 027002002 027081001 117016001 <NA> <NA> <NA>
15 027001001 027006001 15 <NA> <NA> <NA>
Eh... so what's going on here with this weirdness? Why are the row names being sorted and seemingly replacing some genuine entries?
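Note that trim[lc, ] is itself a one-row data frame, not a plain vector. A minimal sketch that sidesteps sorting a data frame row directly: convert the row to a character vector with unlist() first, sort that, and write it back. It assumes the columns are character; the lapply(as.character) line handles the case where they were read in as factors. The two rows below are a hypothetical copy of the values in rows 6 and 15 of the question.

```r
# Hypothetical copy of the values in rows 6 and 15 of the question's data
trim <- data.frame(origin = c("017035001", "027006006"),
                   X.1    = c("017017001", "027001001"),
                   X.2    = c("017039001", "027006001"),
                   X.3    = c(NA_character_, NA_character_),
                   stringsAsFactors = FALSE)

# If the columns were read in as factors, turn them into plain character first
trim[] <- lapply(trim, as.character)

for (lc in seq_len(nrow(trim))) {
  # unlist() turns the one-row data frame into a character vector,
  # so sort() orders the values themselves; NAs go to the end
  trim[lc, ] <- sort(unlist(trim[lc, ]), na.last = TRUE)
}
trim
```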
