I am trying to extract score information for each ID and for each item ID. Here is how my sample dataset looks:
df <- data.frame(Text_1 = c("Scoring", "1 = Incorrect","Text1","Text2","Text3","Text4", "Demo 1: Color Naming","Amarillo","Azul","Verde","Azul",
"Demo 1: Errors","Item 1: Color naming","Amarillo","Azul","Verde","Azul",
"Item 1: Time in seconds","Item 1: Errors",
"Item 2: Shape Naming","Cuadrado/Cuadro","Cuadrado/Cuadro","Círculo","Estrella","Círculo","Triángulo",
"Item 2: Time in seconds","Item 2: Errors"),
School.2 = c("Teacher:","DC Name:","Date (mm/dd/yyyy):","Child Grade:","Student Study ID:",NA, NA,NA,NA,NA,NA,
0,"1 = Incorrect responses",0,1,NA,NA,NA,0,"1 = Incorrect responses",0,NA,NA,1,1,0,NA,0),
X_Elementary_School..3 = c("Bill:","X District","10/7/21","K","123-2222-2:",NA, NA,NA,NA,NA,NA,
NA,"Child response",NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA,NA,NA),
School.4 = c("Teacher:","DC Name:","Date (mm/dd/yyyy):","Child Grade:","Student Study ID:",NA, 0,NA,1,NA,NA,0,"1 = Incorrect responses",0,1,NA,NA,120,0,"1 = Incorrect responses",NA,1,0,1,NA,1,110,0),
Y_Elementary_School..2 = c("John:","X District","11/7/21","K","112-1111-3:",NA, NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA, NA,NA))
> df
Text_1 School.2 X_Elementary_School..3 School.4 Y_Elementary_School..2
1 Scoring Teacher: Bill: Teacher: John:
2 1 = Incorrect DC Name: X District DC Name: X District
3 Text1 Date (mm/dd/yyyy): 10/7/21 Date (mm/dd/yyyy): 11/7/21
4 Text2 Child Grade: K Child Grade: K
5 Text3 Student Study ID: 123-2222-2: Student Study ID: 112-1111-3:
6 Text4 <NA> <NA> <NA> <NA>
7 Demo 1: Color Naming <NA> <NA> 0 <NA>
8 Amarillo <NA> <NA> <NA> <NA>
9 Azul <NA> <NA> 1 <NA>
10 Verde <NA> <NA> <NA> <NA>
11 Azul <NA> <NA> <NA> <NA>
12 Demo 1: Errors 0 <NA> 0 <NA>
13 Item 1: Color naming 1 = Incorrect responses Child response 1 = Incorrect responses Child response
14 Amarillo 0 <NA> 0 <NA>
15 Azul 1 <NA> 1 <NA>
16 Verde <NA> <NA> <NA> <NA>
17 Azul <NA> <NA> <NA> <NA>
18 Item 1: Time in seconds <NA> <NA> 120 <NA>
19 Item 1: Errors 0 <NA> 0 <NA>
20 Item 2: Shape Naming 1 = Incorrect responses Child response 1 = Incorrect responses Child response
21 Cuadrado/Cuadro 0 <NA> <NA> <NA>
22 Cuadrado/Cuadro <NA> <NA> 1 <NA>
23 Círculo <NA> <NA> 0 <NA>
24 Estrella 1 <NA> 1 <NA>
25 Círculo 1 <NA> <NA> <NA>
26 Triángulo 0 <NA> 1 <NA>
27 Item 2: Time in seconds <NA> <NA> 110 <NA>
28 Item 2: Errors 0 <NA> 0 <NA>
This sample dataset is limited to two schools, two teachers, and two students.
In this step, I need to extract student responses for each item.
Wherever the first column contains "Item", I need to start grabbing from there. I especially need to index the rows and columns programmatically rather than hard-coding the exact row and column numbers, since this will run over multiple data files and each file contains different information. There is no need to grab the "...: Errors" rows.
################################################################################
# ## 2-extract the score information here
# ## 1-grab item information from where "Item 1:.." starts
Here, rather than using hard-coded row numbers, how can I automate this part?
score <- df[c(7:11, 13:17, 20:26), seq(2, ncol(df), 2)]  # need to automate these row and column indices
score <- as.data.frame(t(score))
rownames(score) <- seq_len(nrow(score))
colnames(score) <- paste0("i", seq_len(ncol(score)))      # assign column names for items
score <- apply(score, 2, as.numeric)                      # coerce to numeric; non-numeric entries become NA
score <- as.data.frame(score)
score$total <- rowSums(score, na.rm = TRUE); score        # create a total score
> score
i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16 i17 total
1 NA NA NA NA NA NA 0 1 NA NA NA 0 NA NA 1 1 0 3
2 0 NA 1 NA NA NA 0 1 NA NA NA NA 1 0 1 NA 1 5
Additionally, I need to add an ID column, which I could not achieve here.
My desired output would be:
> score
ID i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16 i17 total
1 123-2222-2 NA NA NA NA NA NA 0 1 NA NA NA 0 NA NA 1 1 0 3
2 112-1111-3 0 NA 1 NA NA NA 0 1 NA NA NA NA 1 0 1 NA 1 5
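A minimal sketch of how the indexing could be automated, assuming every file keeps the layout of this sample: the score columns are the ones containing the label "Student Study ID:" (with the actual ID one column to their right), the item block starts at the first "Demo ...:" / "Item ...:" header in the first column, and the "Errors" / "Time in seconds" rows are dropped (matching the hard-coded 7:11, 13:17, 20:26 selection above). The grepl patterns are assumptions based on this sample.

# find the item rows without hard-coding row numbers
item_rows <- grep("^(Demo|Item) ", df[[1]])
start_row <- min(item_rows)
drop_rows <- grep("Errors|Time in seconds", df[[1]])
keep_rows <- setdiff(start_row:nrow(df), drop_rows)

# find the score columns and the student IDs without hard-coding column numbers
score_cols <- which(apply(df, 2, function(x) any(grepl("Student Study ID", x))))
id_cols    <- score_cols + 1                                  # IDs sit one column to the right
id_row     <- which(grepl("Student Study ID", df[[score_cols[1]]]))[1]

score <- as.data.frame(t(df[keep_rows, score_cols]))
colnames(score) <- paste0("i", seq_len(ncol(score)))
score[] <- lapply(score, function(x) suppressWarnings(as.numeric(x)))  # non-numeric -> NA
score$total <- rowSums(score, na.rm = TRUE)

ids   <- sub(":$", "", unlist(df[id_row, id_cols]))           # strip the trailing ":"
score <- cbind(ID = ids, score)
rownames(score) <- NULL
score

On this sample it reproduces the desired output above without any hard-coded row or column numbers.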
From a page like this
https://stackoverflow.com/users/11786778/nathalie?tab=reputation
how is it possible to expand the full list in the reputation table and retrieve the information that is only loaded dynamically as the page loads?
Is this what you want?
library(rvest)
library(magrittr)
library(plyr)
# Doing URLs one by one
url <- "https://stackoverflow.com/users/11786778/nathalie?tab=reputation"
## Get the reputation table
pricesdata <- read_html(url) %>% html_nodes(xpath = "//table[1]") %>% html_table(fill = TRUE)
df <- ldply(pricesdata, data.frame)
Produces:
1 <NA>
2 Take the result of call for a list of ids
3 <NA>
4 <NA>
5 <NA>
6 Add Detailed history
[... rows 7 through 43 are all <NA> ...]
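Note that read_html() only sees the initial page source; the rest of the reputation table is loaded on demand by JavaScript, which is why most rows come back as <NA>. One alternative (just a sketch; the endpoint and parameters are assumptions to verify against the Stack Exchange API documentation) is to request the reputation history directly from the API as JSON:

library(jsonlite)

# Assumed endpoint -- see https://api.stackexchange.com/docs for the exact
# parameters; responses are paged, so increase `page` until `has_more` is FALSE.
api_url <- paste0("https://api.stackexchange.com/2.3/users/11786778/",
                  "reputation-history?site=stackoverflow&pagesize=100&page=1")
rep_page <- fromJSON(api_url)
head(rep_page$items)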
I am trying to calculate readability, but it seems everything is written to expect either a file path or a Corpus. How do I handle a string?
Error (on the tokenization step):
Error: Unable to locate
I tried:
str<-"Readability zero one. Ten, Eleven.", "The cat in a dilapidated tophat."
library(koRpus)
ll.tagged <- tokenize(str, lang="en")
readability(ll.tagged,measure="Flesch.Kincaid")
You need to install the language support package first:
install.koRpus.lang(c("en"))
library(koRpus.lang.en)
ll.tagged <- tokenize(str, format = "obj", lang = "en")
ll.tagged
doc_id token tag lemma lttr wclass desc stop stem idx sntc
1 <NA> Readability word.kRp 11 word <NA> <NA> <NA> 1 1
2 <NA> zero word.kRp 4 word <NA> <NA> <NA> 2 1
3 <NA> one word.kRp 3 word <NA> <NA> <NA> 3 1
4 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 4 1
5 <NA> Ten word.kRp 3 word <NA> <NA> <NA> 5 2
6 <NA> , ,kRp 1 comma <NA> <NA> <NA> 6 2
[...]
10 <NA> cat word.kRp 3 word <NA> <NA> <NA> 10 3
11 <NA> in word.kRp 2 word <NA> <NA> <NA> 11 3
12 <NA> a word.kRp 1 word <NA> <NA> <NA> 12 3
13 <NA> dilapidated word.kRp 11 word <NA> <NA> <NA> 13 3
14 <NA> tophat word.kRp 6 word <NA> <NA> <NA> 14 3
15 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 15 3
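With the tokenized object, the readability call from the question should then run. (In recent koRpus versions the argument is index rather than measure, if I recall the API correctly, and results based on tokenize() alone, without POS tagging via treetag(), are only rough estimates.)

readability(ll.tagged, index = "Flesch.Kincaid")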
Sorry, the title may not describe this well.
I have a data frame from my Google location history.
Original:
> head(testAC)
latitudeE7 longitudeE7 activity
1 247915291 1209946249 NULL
2 248033293 1209803613 NULL
3 248033293 1209803613 1505536182769, IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
Desired result:
> head(testAC)
latitudeE7|longitudeE7| activityTime|mainactivity| speed
1 247915291| 1209946249| | NULL |
2 248033293| 1209803613| | NULL |
3 248033293| 1209803613|1505536182769| IN_VEHICLE | 54
4 248033293| 1209803613|1505536182769| STILL | 31
5 248033293| 1209803613|1505536182769| UNKNOWN | 15
Row 3 of the original becomes rows 3 to 5 in the result.
I only know do.call("rbind", testAC$activity), but that only splits the activity column; latitudeE7 and longitudeE7 disappear:
> do.call ("rbind", testAC$activity)
timestampMs activity
1 1505536182769 IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
2 1505536077547 IN_VEHICLE, UNKNOWN, ON_BICYCLE, STILL, 64, 23, 8, 5
I have been searching for two days, but I may not know the right keywords and cannot find an answer.
Can anyone explain how to do what I want?
Thank you.
I have an .Rdata file uploaded to Google Drive, in case it helps to know more about the data:
google drive
How about this:
library(plyr)
cbind(dataAC[, 1:2],
      ldply(lapply(dataAC$activity,
                   function(x) if (!is.null(x)) unlist(lapply(x, unlist)) else NA),
            rbind))
It will give you a data frame instead of nested lists, and then you can reshape it however you want (see the sketch after the output below).
latitudeE7 longitudeE7 1 timestampMs activity.type1 activity.type2 activity.type3 activity.confidence1 activity.confidence2
1 247915291 1209946249 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 248033293 1209803613 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 248033293 1209803613 <NA> 1505536182769 IN_VEHICLE STILL UNKNOWN 54 31
4 248002555 1209895254 <NA> 1505536077547 IN_VEHICLE UNKNOWN ON_BICYCLE 64 23
5 247966714 1209957315 <NA> 1505535932508 IN_VEHICLE ON_BICYCLE <NA> 54 46
6 247966714 1209957315 <NA> 1505535825664 <NA> <NA> <NA> <NA> <NA>
activity.confidence3 activity.type4 activity.confidence4 activity.type activity.confidence
1 <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA>
3 15 <NA> <NA> <NA> <NA>
4 8 STILL 5 <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> TILTING 100
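As a follow-up, here is a rough sketch of one way to reshape that flattened data frame into the long format from the question, using tidyr. The column names such as activity.type1 / activity.confidence1 are taken from the output above; treat this as an untested sketch.

library(dplyr)
library(tidyr)

# 'flat' is the data frame returned by the cbind()/ldply() call above
flat <- cbind(dataAC[, 1:2],
              plyr::ldply(lapply(dataAC$activity,
                                 function(x) if (!is.null(x)) unlist(lapply(x, unlist)) else NA),
                          rbind))

long <- flat %>%
  pivot_longer(matches("^activity\\.(type|confidence)"),
               names_to = c(".value", "slot"),
               names_pattern = "activity\\.(type|confidence)(\\d*)") %>%
  filter(!is.na(type)) %>%                     # note: this drops rows with no recorded activity
  transmute(latitudeE7, longitudeE7,
            activityTime = timestampMs,
            mainactivity = type,
            speed = confidence)
head(long)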
I want to do an extraction from raster data and add the information to the polygons, but I get this warning and the extraction contains only NULL values:
Warning message:
In .local(x, y, ...) :
cannot return a sp object because the data length varies between polygons
What could be the problem? I did the same extraction last week with the same information and it worked fine. The call is:
expop <- extract(rasterdata, floods1985, small=TRUE, fun=sum, na.rm=TRUE, df=FALSE, nl=1, sp=TRUE)
The data in floods1985 looks like this:
head(floods1985)
ID AREA CENTRIODX CENTRIODY DFONUMBER GLIDE__ LINKS OTHER NATIONS
0 1 92620 5.230 35.814 1 <NA> Algeria <NA> <NA>
1 2 678500 -45.349 -18.711 2 <NA> Brazil <NA> <NA>
2 3 12850 122.974 10.021 3 <NA> Philippines <NA> <NA>
3 4 16540 124.606 1.015 4 <NA> Indonesia <NA> <NA>
4 5 20080 32.349 -25.869 5 <NA> Mozambique <NA> <NA>
5 6 1040 43.360 -11.652 6 <NA> Comoros islas <NA> <NA>
X_AFFECTED
0 <NA>
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>
AND_RIVERS
0 Northeastern
1 States: Rio de Janeiro, Minas Gerais a Espirito Santo
2 Towns: Tanjay a Pamplona
3 Region: Northern Sulawesi; Towns: Gorontalo Regency
4 Provinces: Natal, Maputo; Rivers: Nkomati, Omati, Maputo, Umbeluzi, Incomati, Limpopo, Pungue, Buzi a Zambezi; Town: Ressano Garcia
5 Isla of Anjouan; Villages: Hassimpao, Marahare, Vouani
RIVERS BEGAN ENDED DAYS DEAD DISPLACED X_USD_ MAIN_CAUSE
0 <NA> 1985/01/01 1985/01/05 5 26 3000 <NA> Heavy rain
1 <NA> 1985/01/15 1985/02/02 19 229 80000 2000000000 Heavy rain
2 <NA> 1985/01/20 1985/01/21 2 43 444 <NA> Brief torrential rain
3 <NA> 1985/02/04 1985/02/18 15 21 300 <NA> Brief torrential rain
4 <NA> 1985/02/09 1985/02/11 3 19 <NA> 3000000 Heavy rain
5 <NA> 1985/02/16 1985/02/28 13 2 35000 5600000 Tropical cyclone
SEVERITY__ SQ_KM X_M___
0 1.0 92620 5.665675
1 1.5 678500 7.286395
2 1.0 12850 4.409933
3 1.0 16540 5.394627
4 1.5 20080 4.955976
5 1.0 1040 4.130977
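For reference, one way to narrow this down (a diagnostic sketch, not a fix) is to run the extraction without a summary function; extract() then returns a list with the raw cell values per polygon, so you can see which polygons yield zero or an unexpected number of values:

library(raster)

vals <- extract(rasterdata, floods1985, small = TRUE)
# number of cell values returned for each polygon
summary(sapply(vals, length))
# polygons that return no overlapping cells
which(sapply(vals, length) == 0)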