extraction function gives a warning message and no data in R

I want to do an extraction from rasterdata and add the information to the polygons, but I get this warning and the extraction contains only null values:
Warning message:
In .local(x, y, ...) :
cannot return a sp object because the data length varies between polygons
What could be the problem? I did the same extraction last week with the same data and it worked fine. The call is:
expop <- extract(rasterdata, floods1985, small=TRUE, fun=sum, na.rm=TRUE, df=FALSE, nl=1, sp=TRUE)
The data in floods1985 is:
head(floods1985)
ID AREA CENTRIODX CENTRIODY DFONUMBER GLIDE__ LINKS OTHER NATIONS
0 1 92620 5.230 35.814 1 <NA> Algeria <NA> <NA>
1 2 678500 -45.349 -18.711 2 <NA> Brazil <NA> <NA>
2 3 12850 122.974 10.021 3 <NA> Philippines <NA> <NA>
3 4 16540 124.606 1.015 4 <NA> Indonesia <NA> <NA>
4 5 20080 32.349 -25.869 5 <NA> Mozambique <NA> <NA>
5 6 1040 43.360 -11.652 6 <NA> Comoros islas <NA> <NA>
X_AFFECTED
0 <NA>
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>
AND_RIVERS
0 Northeastern
1 States: Rio de Janeiro, Minas Gerais a Espirito Santo
2 Towns: Tanjay a Pamplona
3 Region: Northern Sulawesi; Towns: Gorontalo Regency
4 Provinces: Natal, Maputo; Rivers: Nkomati, Omati, Maputo, Umbeluzi, Incomati, Limpopo, Pungue, Buzi a Zambezi; Town: Ressano Garcia
5 Isla of Anjouan; Villages: Hassimpao, Marahare, Vouani
RIVERS BEGAN ENDED DAYS DEAD DISPLACED X_USD_ MAIN_CAUSE
0 <NA> 1985/01/01 1985/01/05 5 26 3000 <NA> Heavy rain
1 <NA> 1985/01/15 1985/02/02 19 229 80000 2000000000 Heavy rain
2 <NA> 1985/01/20 1985/01/21 2 43 444 <NA> Brief torrential rain
3 <NA> 1985/02/04 1985/02/18 15 21 300 <NA> Brief torrential rain
4 <NA> 1985/02/09 1985/02/11 3 19 <NA> 3000000 Heavy rain
5 <NA> 1985/02/16 1985/02/28 13 2 35000 5600000 Tropical cyclone
SEVERITY__ SQ_KM X_M___
0 1.0 92620 5.665675
1 1.5 678500 7.286395
2 1.0 12850 4.409933
3 1.0 16540 5.394627
4 1.5 20080 4.955976
5 1.0 1040 4.130977
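One hedged workaround sketch (not from the original thread): run extract() without sp=TRUE, check that you get exactly one value per polygon, and attach the result to the polygons yourself.
library(raster)

vals <- extract(rasterdata, floods1985, small = TRUE, fun = sum, na.rm = TRUE)
NROW(vals) == length(floods1985)       # should be TRUE before attaching the values
floods1985$expop <- as.numeric(vals)   # add the per-polygon sums as a new column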

Related

Extracting numeric information aligned with ID from an unstructured dataset in R

I am trying to extract score information for each ID and for each itemID. Here is how my sample dataset looks:
df <- data.frame(Text_1 = c("Scoring", "1 = Incorrect","Text1","Text2","Text3","Text4", "Demo 1: Color Naming","Amarillo","Azul","Verde","Azul",
"Demo 1: Errors","Item 1: Color naming","Amarillo","Azul","Verde","Azul",
"Item 1: Time in seconds","Item 1: Errors",
"Item 2: Shape Naming","Cuadrado/Cuadro","Cuadrado/Cuadro","Círculo","Estrella","Círculo","Triángulo",
"Item 2: Time in seconds","Item 2: Errors"),
School.2 = c("Teacher:","DC Name:","Date (mm/dd/yyyy):","Child Grade:","Student Study ID:",NA, NA,NA,NA,NA,NA,
0,"1 = Incorrect responses",0,1,NA,NA,NA,0,"1 = Incorrect responses",0,NA,NA,1,1,0,NA,0),
X_Elementary_School..3 = c("Bill:","X District","10/7/21","K","123-2222-2:",NA, NA,NA,NA,NA,NA,
NA,"Child response",NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA,NA,NA),
School.4 = c("Teacher:","DC Name:","Date (mm/dd/yyyy):","Child Grade:","Student Study ID:",NA, 0,NA,1,NA,NA,0,"1 = Incorrect responses",0,1,NA,NA,120,0,"1 = Incorrect responses",NA,1,0,1,NA,1,110,0),
Y_Elementary_School..2 = c("John:","X District","11/7/21","K","112-1111-3:",NA, NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA, NA,NA))
> df
Text_1 School.2 X_Elementary_School..3 School.4 Y_Elementary_School..2
1 Scoring Teacher: Bill: Teacher: John:
2 1 = Incorrect DC Name: X District DC Name: X District
3 Text1 Date (mm/dd/yyyy): 10/7/21 Date (mm/dd/yyyy): 11/7/21
4 Text2 Child Grade: K Child Grade: K
5 Text3 Student Study ID: 123-2222-2: Student Study ID: 112-1111-3:
6 Text4 <NA> <NA> <NA> <NA>
7 Demo 1: Color Naming <NA> <NA> 0 <NA>
8 Amarillo <NA> <NA> <NA> <NA>
9 Azul <NA> <NA> 1 <NA>
10 Verde <NA> <NA> <NA> <NA>
11 Azul <NA> <NA> <NA> <NA>
12 Demo 1: Errors 0 <NA> 0 <NA>
13 Item 1: Color naming 1 = Incorrect responses Child response 1 = Incorrect responses Child response
14 Amarillo 0 <NA> 0 <NA>
15 Azul 1 <NA> 1 <NA>
16 Verde <NA> <NA> <NA> <NA>
17 Azul <NA> <NA> <NA> <NA>
18 Item 1: Time in seconds <NA> <NA> 120 <NA>
19 Item 1: Errors 0 <NA> 0 <NA>
20 Item 2: Shape Naming 1 = Incorrect responses Child response 1 = Incorrect responses Child response
21 Cuadrado/Cuadro 0 <NA> <NA> <NA>
22 Cuadrado/Cuadro <NA> <NA> 1 <NA>
23 Círculo <NA> <NA> 0 <NA>
24 Estrella 1 <NA> 1 <NA>
25 Círculo 1 <NA> <NA> <NA>
26 Triángulo 0 <NA> 1 <NA>
27 Item 2: Time in seconds <NA> <NA> 110 <NA>
28 Item 2: Errors 0 <NA> 0 <NA>
This sample dataset is limited to two schools, two teachers, and two students.
In this step, I need to extract student responses for each item.
Wherever the first column has "Item", I need to grab rows from there. I especially need to index the rows and columns by pattern rather than giving exact row and column numbers, since this will run over multiple data files and each file has different information. There is no need to grab the "Item ...: Errors" rows.
################################################################################
# ## 2-extract the score information here
# ## 1-grab item information from where "Item 1:.." starts
Here, rather than using row numbers, how can I automate this part?
score <- df[c(7:11, 13:17, 20:26), seq(2, ncol(df), 2)]   # need to automate the row and column indices here
score <- as.data.frame(t(score))
rownames(score) <- seq_len(nrow(score))
colnames(score) <- paste0("i", seq_len(ncol(score)))      # assign column names for the items
score <- apply(score, 2, as.numeric)                      # coerce entries to numeric (non-numeric become NA)
score <- as.data.frame(score)
score$total <- rowSums(score, na.rm = TRUE); score        # create a total score
> score
i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16 i17 total
1 NA NA NA NA NA NA 0 1 NA NA NA 0 NA NA 1 1 0 3
2 0 NA 1 NA NA NA 0 1 NA NA NA NA 1 0 1 NA 1 5
Additionally, I need to add the student ID, which I have not managed to do here.
My desired output would be:
> score
ID i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16 i17 total
1 123-2222-2 NA NA NA NA NA NA 0 1 NA NA NA 0 NA NA 1 1 0 3
2 112-1111-3 0 NA 1 NA NA NA 0 1 NA NA NA NA 1 0 1 NA 1 5
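A hedged sketch of the pattern-based indexing (the label patterns and the even/odd column layout are assumed from the sample above, so treat this as a starting point rather than a definitive solution):
first_item <- grep("^(Demo|Item) [0-9]+:", df$Text_1)[1]   # first "Demo 1: ..." header row
drop_rows  <- grep("Errors|Time in seconds", df$Text_1)    # skip the Errors/Time rows
keep_rows  <- setdiff(first_item:nrow(df), drop_rows)      # reproduces c(7:11, 13:17, 20:26)
score_cols <- seq(2, ncol(df), 2)                          # the response/score columns

score <- as.data.frame(t(df[keep_rows, score_cols]))
colnames(score) <- paste0("i", seq_len(ncol(score)))
score[] <- lapply(score, function(x) suppressWarnings(as.numeric(x)))
score$total <- rowSums(score, na.rm = TRUE)

# The student IDs sit one column to the right of each score column, on the
# "Student Study ID:" row (again assumed from the sample):
id_row   <- grep("Student Study ID", df[[2]])[1]
score$ID <- sub(":$", "", unlist(df[id_row, score_cols + 1]))
rownames(score) <- NULL
score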

How to read the first few records of a JSON file using R?

I have a HUGE json.gz file, and it has already been converted to a .json file. I would like to ask how we can read, say, the first 100 records from the .json file using R. Any help is greatly appreciated. The following is just sample code:
library(jsonlite)
library(R.utils)
r <- stream_in(file("yelp_academic_dataset_business.json"))
The file "yelp_academic_dataset_business.json" can be found from the link:
https://www.dropbox.com/s/gd1k41y9gbpfwq3/yelp_academic_dataset_business.json
Using the data from your original link, @Shree's suggestion is spot-on. First, use readLines to download only as many lines as you need:
dat <- readLines("https://uc385e5985dd32823a7dc6ba9b5e.dl.dropboxusercontent.com/cd/0/get/AhyCjVEm8yKnLz4w0-hZaW-titb8fOhQdMcwhTMF1_3i_iJ7DOqOU_KQRTtcvaFBaSTpAznh_6eq-vKAEiDkeVygMnRjThrnz0V5fyC4AURAcg/file?_download_id=9916801659220323334123287637995650900165723151388885263767035946&_notify_domain=www.dropbox.com&dl=1", n =4 )
# dat <- readLines("yelp_academic_dataset_business.json", n = 4)
Now create a "fake text connection" and pass this to the json parser:
jsonlite::stream_in(textConnection(dat))
# Imported 4 records. Simplifying...
# business_id full_address hours.Tuesday.close hours.Tuesday.open hours.Friday.close hours.Friday.open hours.Monday.close hours.Monday.open
# 1 vcNAWiLM4dR7D2nwwJ7nCA 4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018 17:00 08:00 17:00 08:00 17:00 08:00
# 2 UsFtqoBl7naz8AVUBZMjQQ 202 McClure St\nDravosburg, PA 15034 <NA> <NA> <NA> <NA> <NA> <NA>
# 3 cE27W9VPgO88Qxe4ol6y_g 1530 Hamilton Rd\nBethel Park, PA 15234 <NA> <NA> <NA> <NA> <NA> <NA>
# 4 HZdLhv6COCleJMo7nPl-RA 301 S Hills Vlg\nPittsburgh, PA 15241 21:00 10:00 21:00 10:00 21:00 10:00
# hours.Wednesday.close hours.Wednesday.open hours.Thursday.close hours.Thursday.open hours.Sunday.close hours.Sunday.open hours.Saturday.close hours.Saturday.open open
# 1 17:00 08:00 17:00 08:00 <NA> <NA> <NA> <NA> TRUE
# 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> TRUE
# 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> FALSE
# 4 21:00 10:00 21:00 10:00 18:00 11:00 21:00 10:00 TRUE
# categories city review_count name neighborhoods longitude state stars
# 1 Doctors, Health & Medical Phoenix 9 Eric Goldberg, MD NULL -111.98376 AZ 3.5
# 2 Nightlife Dravosburg 4 Clancy's Pub NULL -79.88693 PA 3.5
# 3 Active Life, Mini Golf, Golf Bethel Park 5 Cool Springs Golf Center NULL -80.01591 PA 2.5
# 4 Shopping, Home Services, Internet Service Providers, Mobile Phones, Professional Services, Electronics Pittsburgh 3 Verizon Wireless NULL -80.05998 PA 3.5
# latitude attributes.By Appointment Only attributes.Happy Hour attributes.Accepts Credit Cards attributes.Good For Groups attributes.Outdoor Seating attributes.Price Range
# 1 33.49931 TRUE NA NA NA NA NA
# 2 40.35052 NA TRUE TRUE TRUE FALSE 1
# 3 40.35690 NA NA NA NA NA NA
# 4 40.35762 NA NA NA NA NA NA
# attributes.Good for Kids type
# 1 NA business
# 2 NA business
# 3 TRUE business
# 4 NA business
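As a hedged aside (the .gz file name below is assumed), readLines() can also read the first n lines straight from the gzipped file, since the Yelp dump has one JSON record per line, so you don't have to decompress the whole json.gz first:
library(jsonlite)
dat100 <- readLines(gzfile("yelp_academic_dataset_business.json.gz"), n = 100)
first100 <- stream_in(textConnection(dat100))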

How can I tokenize a string in R?

I am trying to calculate readability, but it seems everything is written to expect either a file path or a Corpus. How do I handle a string?
Error (on the tokenization step):
Error: Unable to locate
I tried:
str<-"Readability zero one. Ten, Eleven.", "The cat in a dilapidated tophat."
library(koRpus)
ll.tagged <- tokenize(str, lang="en")
readability(ll.tagged,measure="Flesch.Kincaid")
You need to download and install the language package first:
install.koRpus.lang(c("en"))
library(koRpus.lang.en)
ll.tagged <- tokenize(str, format = "obj", lang = "en")
ll.tagged
doc_id token tag lemma lttr wclass desc stop stem idx sntc
1 <NA> Readability word.kRp 11 word <NA> <NA> <NA> 1 1
2 <NA> zero word.kRp 4 word <NA> <NA> <NA> 2 1
3 <NA> one word.kRp 3 word <NA> <NA> <NA> 3 1
4 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 4 1
5 <NA> Ten word.kRp 3 word <NA> <NA> <NA> 5 2
6 <NA> , ,kRp 1 comma <NA> <NA> <NA> 6 2
[...]
10 <NA> cat word.kRp 3 word <NA> <NA> <NA> 10 3
11 <NA> in word.kRp 2 word <NA> <NA> <NA> 11 3
12 <NA> a word.kRp 1 word <NA> <NA> <NA> 12 3
13 <NA> dilapidated word.kRp 11 word <NA> <NA> <NA> 13 3
14 <NA> tophat word.kRp 6 word <NA> <NA> <NA> 14 3
15 <NA> . .kRp 1 fullstop <NA> <NA> <NA> 15 3
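A hedged follow-up that is not part of the original answer: with the tokenized object, the readability call from the question should now run. The index= argument below is an assumption based on recent koRpus versions (the question used measure=), and syllable-based measures such as Flesch-Kincaid also rely on the hyphenation data installed with the language package.
readability(ll.tagged, index = "Flesch.Kincaid")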

R - split list and merge into the same table

Sorry, the title may not describe this well.
I have a data frame from Google history.
original
> head(testAC)
latitudeE7 longitudeE7 activity
1 247915291 1209946249 NULL
2 248033293 1209803613 NULL
3 248033293 1209803613 1505536182769, IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
result
> head(testAC)
latitudeE7|longitudeE7| activityTime|mainactivity| speed
1 247915291| 1209946249| | NULL |
2 248033293| 1209803613| | NULL |
3 248033293| 1209803613|1505536182769| IN_VEHICLE | 54
4 248033293| 1209803613|1505536182769| STILL | 31
5 248033293| 1209803613|1505536182769| UNKNOWN | 15
Original row 3 becomes result rows 3 to 5.
The only thing I know is do.call("rbind", testAC$activity),
but that only splits the activity column; latitudeE7 and longitudeE7 disappear:
> do.call ("rbind", testAC$activity)
timestampMs activity
1 1505536182769 IN_VEHICLE, STILL, UNKNOWN, 54, 31, 15
2 1505536077547 IN_VEHICLE, UNKNOWN, ON_BICYCLE, STILL, 64, 23, 8, 5
I have been looking for two days, but maybe I don't know the right keywords and cannot find it.
Can anyone explain how to do what I want?
Thank you
I have an .Rdata file uploaded to Google Drive, in case that helps:
google drive
How about this:
library(plyr)
flat <- lapply(dataAC$activity,
               function(x) if (!is.null(x)) unlist(lapply(x, unlist)) else NA)
cbind(dataAC[, 1:2], ldply(flat, rbind))
It will give you a data frame instead of nested lists, and then you can reshape it however you want:
latitudeE7 longitudeE7 1 timestampMs activity.type1 activity.type2 activity.type3 activity.confidence1 activity.confidence2
1 247915291 1209946249 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 248033293 1209803613 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 248033293 1209803613 <NA> 1505536182769 IN_VEHICLE STILL UNKNOWN 54 31
4 248002555 1209895254 <NA> 1505536077547 IN_VEHICLE UNKNOWN ON_BICYCLE 64 23
5 247966714 1209957315 <NA> 1505535932508 IN_VEHICLE ON_BICYCLE <NA> 54 46
6 247966714 1209957315 <NA> 1505535825664 <NA> <NA> <NA> <NA> <NA>
activity.confidence3 activity.type4 activity.confidence4 activity.type activity.confidence
1 <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA>
3 15 <NA> <NA> <NA> <NA>
4 8 STILL 5 <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA>
6 <NA> <NA> <NA> TILTING 100
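A hedged alternative, assuming testAC$activity is a list-column of data frames (which is what the do.call("rbind", ...) output suggests): tidyr::unnest() expands a list-column while keeping the other columns, so latitudeE7 and longitudeE7 are not lost.
library(tidyr)
out <- unnest(testAC, activity, keep_empty = TRUE)   # one row per activity record
# the inner 'activity' field is itself nested (type/confidence pairs), so
# unnest a second time to get one row per detected activity type:
out <- unnest(out, activity, keep_empty = TRUE)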

Reshape aggregated rows to new columns, categorical data

I am trying to use R to aggregate rows to columns. Here is a sample of my dataset.
age sex hash emotion color
22 1 b17f9762462b37e7510f0e6d2534530d Lonely #006666
22 1 b17f9762462b37e7510f0e6d2534530d Energetic #66CC00
22 1 b17f9762462b37e7510f0e6d2534530d Calm #FFFFFF
22 1 b17f9762462b37e7510f0e6d2534530d Angry #FF0000
24 1 7bb50ca97a9b517239b39440a966d2f6 Calm #006666
24 1 7bb50ca97a9b517239b39440a966d2f6 Excited #0033cc
24 1 7bb50ca97a9b517239b39440a966d2f6 Empty/void #999999
24 1 7bb50ca97a9b517239b39440a966d2f6 No emotion #FF6600
26 1 209f1ba8ef86e855deccc0aae120825c Comfortable #330066
21 1 b9e9309c0b1255a7efb2edf9ba66ae46 Energetic #330099
21 1 b9e9309c0b1255a7efb2edf9ba66ae46 Happy #330066
26 1 209f1ba8ef86e855deccc0aae120825c No emotion #FFCC00
26 1 209f1ba8ef86e855deccc0aae120825c Calm #006666
21 1 61debd3dea6d1aacce5c9fc7daec4fe5 Empty/void #FFFFFF
21 1 b9e9309c0b1255a7efb2edf9ba66ae46 Calm #006666
26 1 209f1ba8ef86e855deccc0aae120825c No emotion #339900
21 1 61debd3dea6d1aacce5c9fc7daec4fe5 Loved #FF6600
26 1 209f1ba8ef86e855deccc0aae120825c No emotion #66CC00
What I want to do is get this:
age sex hash #000000 #FF0000 ... #FFFFFF
22 1 8798tkojstwz9ei sad happy ... loved
...
One response is defined by the hash; the associated data are age and sex.
I want to have each response on one row instead of spread over several rows. Each color should have its own column, with the associated emotion as the value of that column.
The whole dataset has 13 colors, 20+ emotions and 1000+ responses. The dataset looks exactly like the sample and is stored in a MySQL database.
I have tried with reshape, but it doesn't play well with categorical data, or I did not use the appropriate functions. Any ideas? The solution can include some MySQL preparation if needed. Java was very slow here, and since I have 12k+ rows, R sounds like the right tool for this.
Thank you.
Using reshape2:
library(reshape2)
dcast(dat, ... ~ color, value.var = "emotion")
age sex hash #0033cc #006666 #330066 #330099 #339900 #66CC00 #999999 #FF0000 #FF6600
1 21 1 61debd3dea6d1aacce5c9fc7daec4fe5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> Loved
2 21 1 b9e9309c0b1255a7efb2edf9ba66ae46 <NA> Calm Happy Energetic <NA> <NA> <NA> <NA> <NA>
3 22 1 b17f9762462b37e7510f0e6d2534530d <NA> Lonely <NA> <NA> <NA> Energetic <NA> Angry <NA>
4 24 1 7bb50ca97a9b517239b39440a966d2f6 Excited Calm <NA> <NA> <NA> <NA> Empty <NA> Noemotion
5 26 1 209f1ba8ef86e855deccc0aae120825c <NA> Calm Comfortable <NA> Noemotion Noemotion <NA> <NA> <NA>
#FFCC00 #FFFFFF
1 <NA> Empty
2 <NA> <NA>
3 <NA> Calm
4 <NA> <NA>
5 Noemotion <NA>
If I understand your objective correctly, reshape() is indeed the function you're looking for. Assuming your dataset is called mydf, try this:
reshape(mydf, direction = "wide",
idvar = c("hash", "age", "sex"),
timevar = "color")
# age sex hash emotion.#006666 emotion.#66CC00
# 1 22 1 b17f9762462b37e7510f0e6d2534530d Lonely Energetic
# 5 24 1 7bb50ca97a9b517239b39440a966d2f6 Calm <NA>
# 9 26 1 209f1ba8ef86e855deccc0aae120825c Calm No emotion
# 10 21 1 b9e9309c0b1255a7efb2edf9ba66ae46 Calm <NA>
# 14 21 1 61debd3dea6d1aacce5c9fc7daec4fe5 <NA> <NA>
# emotion.#FFFFFF emotion.#FF0000 emotion.#0033cc emotion.#999999 emotion.#FF6600
# 1 Calm Angry <NA> <NA> <NA>
# 5 <NA> <NA> Excited Empty/void No emotion
# 9 <NA> <NA> <NA> <NA> <NA>
# 10 <NA> <NA> <NA> <NA> <NA>
# 14 Empty/void <NA> <NA> <NA> Loved
# emotion.#330066 emotion.#330099 emotion.#FFCC00 emotion.#339900
# 1 <NA> <NA> <NA> <NA>
# 5 <NA> <NA> <NA> <NA>
# 9 Comfortable <NA> No emotion No emotion
# 10 Happy Energetic <NA> <NA>
# 14 <NA> <NA> <NA> <NA>
You can rename the columns later if you need to.
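For example, a hedged one-liner for that renaming, assuming the reshape() result above is assigned to wide: strip the "emotion." prefix so the columns are just the hex color codes.
wide <- reshape(mydf, direction = "wide",
                idvar = c("hash", "age", "sex"), timevar = "color")
names(wide) <- sub("^emotion\\.", "", names(wide))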
