I want to split a polygon shapefile (deforestation in the Brazilian Amazon) by year of deforestation. The years are stored in a string field, with values like "d2010_1", "d2010_2", "d2011_1" and so on. I want to split it into 5-year periods. I tried the following:
d00a04 <- prodes[grepl("d2000",prodes@data$CLASS_NAME) ||
grepl("d2001",prodes@data$CLASS_NAME) ||
grepl("d2002",prodes@data$CLASS_NAME) ||
grepl("d2003",prodes@data$CLASS_NAME) ||
grepl("d2004",prodes@data$CLASS_NAME),]
but it gave the following error:
Error in if (is.numeric(i) && i < 0) { :
missing value where TRUE/FALSE needed
I also tried:
anos00a04 = c("d2000","d2001","d2002","d2003","d2004")
d00a04 <- subset(prodes,prodes@data$CLASS_NAME %in% anos00a04)
but it gave the same error message. I've seen some examples like here, here and here, but I need to test whether the beginning of the string matches, rather than use numerical operators such as <, > or ==. Any help, please?
EDIT: I figured out a way, but something strange is happening. I did the following:
anos <- sort(unique(prodes@data$CLASS_NAME))
anos00a04 <- anos[2:20]
The first command gives me all 49 levels from the original shapefile. The second returns only those between 2000 and 2004. So far so good. But when I print the second variable, it shows the 19 items (d2000_2 d2000_3 d2001_0 d2001_3 d2001_4...), and below them it says: "49 Levels: d1997_0 d2000_2 d2000_3...", including those that were supposed to be left out (and were left out of the listing). What's happening?
PS: "anos" is the Portuguese word for "years".
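A minimal sketch of one way to approach both points, assuming a plain data frame can stand in for prodes@data (the prodes_data object and its values are made up for illustration). `||` is meant for single values, so it does not give one TRUE/FALSE per polygon, and %in% only matches whole strings such as "d2000", never "d2000_2". A single grepl() call with an anchored pattern does the prefix test in one step, and droplevels() removes the unused factor levels that a subset otherwise keeps.

```r
# Hypothetical stand-in for prodes@data; the real object is a spatial data frame
prodes_data <- data.frame(
  CLASS_NAME = factor(c("d1997_0", "d2000_2", "d2001_3",
                        "d2004_1", "d2005_0", "d2010_1")),
  area = c(1.2, 0.4, 2.5, 0.9, 3.1, 0.7)
)

# "^" anchors the match at the start of the string;
# [0-4] covers the years 2000 through 2004 in one pattern
keep <- grepl("^d200[0-4]", prodes_data$CLASS_NAME)

# For the shapefile itself the equivalent would be prodes[keep, ]
d00a04 <- prodes_data[keep, ]

# A subset of a factor keeps every original level; drop the unused ones
d00a04$CLASS_NAME <- droplevels(d00a04$CLASS_NAME)
levels(d00a04$CLASS_NAME)
```

The same idea applies to the `anos00a04` vector from the edit: it is a factor, so printing it lists all 49 original levels until droplevels() is applied.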
Related
I am trying to count the number of rows in my dataset (called data) that contain a range of numbers (e.g. between 0 and 9) by using R. I have not created a dataframe and my dataset is directly imported from a csv file into R.
EXAMPLE OF DATASET (INPUT)
MESSAGE
I have to wait 3 days
Feel quite tired
No way is 7pm already
It is too late now
This is beautiful
So the output would be 2 rows (row 1 and 2)
I have tried the following code but it provides me the wrong output number of posts (3) - so I know I am definitely doing something wrong.
data = read.csv (xxxxxx)
#count number of rows that contain numbers between 0 and 9
numbers= filter(data, !grepl("[0-9]",MESSAGE))
length(numbers)
Thank you in advance.
You can try the code below if you want the rows that contain at least one digit:
> length(grep("\\d", MESSAGE, value = TRUE))
[1] 2
If you want only the rows that contain a single digit (one that starts a word and is not followed by another digit), you can try
> length(grep("\\b\\d(?![0-9])", MESSAGE, value = TRUE, perl = TRUE))
[1] 2
Data
MESSAGE <- c(
"I have to wait 3 days",
"Feel quite tired",
"No way is 7pm already",
"It is too late now",
"This is beautiful"
)
filter() returns a data frame, and length() on a data frame returns the number of columns, not rows. Also, the ! in front of grepl() selects the rows that do not contain a number.
You can use sum() with grepl():
result <- sum(grepl('[0-9]', data$MESSAGE))
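To make the difference concrete, here is a small sketch using the example messages from the question, assuming they sit in a data frame column called MESSAGE (as read.csv would produce). It compares the count of matching rows with what length() reports.

```r
# Example messages from the question, stored as a one-column data frame
data <- data.frame(
  MESSAGE = c("I have to wait 3 days", "Feel quite tired",
              "No way is 7pm already", "It is too late now",
              "This is beautiful"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose message contains at least one digit
has_digit <- grepl("[0-9]", data$MESSAGE)

n_rows_with_digit <- sum(has_digit)   # TRUE counts as 1, FALSE as 0
n_rows_with_digit

# length() of a data frame is its number of columns, nrow() its number of rows
length(data)
nrow(data[has_digit, , drop = FALSE])
```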
I am working on a simple project to help me get to know R, coming from javascript.
I have imported a list of numbers, and all I simply want to do, is to export a table that looks like the following:
"range","number"
"0.000-0.510",863
"0.510-1.020",21
"1.020-1.530",2
"1.530-2.040",2
"2.040-2.550",0
"2.550-3.059",2
"3.059-3.569",0
"3.569-4.079",3
"4.079->4.589",0
"4.589->5.099",1
where the ranges are in 10 steps, from the smallest to the largest value, the "range" and "number" are the top rows, and the columns going down are the different ranges and number of occurrences in this range.
This is my attempt so far:
list <- read.csv(file = "results/solarSystem.data")
table(list)
range <- (max(list) - min(list)) / 10
a1<-as.data.frame(table(cut(list,breaks=c(min(list),min(list)+1*range,min(list)+2*range,min(list)+3*range,min(list)+4*range,min(list)+5*range,min(list)+6*range,min(list)+7*range,min(list)+8*range,min(list)+9*range,max(list)))))
colnames(a1)<-c("range","freq")
a1
However, I get an error that
'Error in cut.default(list, breaks = c(min(list), min(list) + 1 * range...
'x' must be numeric'
This is the file I am importing, what looks like just a simple list of numbers, so I don't understand how it cannot be numeric?
https://gyazo.com/8fd00ce45c1c033f9dc9bf6c829195eb
Any advice on this would be appreciated!
Peter
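A possible explanation, offered as a sketch rather than a confirmed diagnosis: read.csv() returns a data frame even when the file has a single column, and cut() needs a numeric vector, so the column has to be extracted first (for example with list[[1]]). The values below are made up to stand in for the file contents, and the label format is an assumption based on the desired output shown above.

```r
# Hypothetical values standing in for the single column of the imported file,
# e.g. values <- read.csv("results/solarSystem.data", header = FALSE)[[1]]
values <- c(0, 0.2, 0.45, 0.6, 1.1, 2.6, 3.7, 5)

# 11 equally spaced break points give 10 bins from the minimum to the maximum
breaks <- seq(min(values), max(values), length.out = 11)

# Labels in the "lower-upper" style used in the desired output
labels <- sprintf("%.3f-%.3f", head(breaks, -1), tail(breaks, -1))

# include.lowest = TRUE keeps the minimum value inside the first bin
bins <- cut(values, breaks = breaks, labels = labels, include.lowest = TRUE)

a1 <- as.data.frame(table(bins))
colnames(a1) <- c("range", "number")

# With no file argument, write.csv() prints the table to the console
write.csv(a1, row.names = FALSE)
```

Whether header = FALSE is needed depends on whether the first line of the file is a column name or a data value.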
This sounds pretty basic, but every time I try to make a histogram, R says x needs to be numeric. I've been looking everywhere but can't find a question relating to my problem. I have data with 240 observations of 5 variables:
Nipper length
Number of Whiskers
Crab Carapace
Sex
Estuary location
There are 3 locations and I'm trying to make a histogram of nipper length.
I've tried making new factors and levels, with the 80 observations in each location, but it's not working.
Crabs.data <-read.table(pipe("pbpaste"),header = FALSE)##Mac
names(Crabs.data)<-c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
Crabs.data<-Crabs.data[,-1]
attach(Crabs.data)
hist(`Length of Nipper`~`Estuary Location`)
Error in hist.default(Length of Nipper ~ Estuary Location) :
'x' must be numeric
Instead of correct result
hist() has no formula interface, so it cannot take something like y ~ group; it needs a single numeric vector.
I think you'd have the best luck subsetting the data, that is, making a vector of nipper lengths for all crabs in a given estuary.
crabs.data <- read.table("whatever you're calling it")
# name the columns as you already do, then replace "Location" with the name of one estuary
Estuary1 <- as.vector(unlist(subset(crabs.data, `Estuary Location` == "Location", select = `Length of Nipper`)))
hist(Estuary1)
Repeat the last two lines for your other two estuaries. You may not need the unlist() command, depending on your table. I've tended to need it for Excel files, but I don't know what format your table is in (that would've been helpful).
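As a sketch of the same idea without repeating the subset by hand, split() can produce one numeric vector per estuary in a single call. The crabs data frame, its column names and its values below are hypothetical stand-ins for the pasted table.

```r
# Hypothetical data: 3 estuaries with 80 crabs each
set.seed(1)
crabs <- data.frame(
  estuary = rep(c("A", "B", "C"), each = 80),
  nipper  = rnorm(240, mean = 10, sd = 2)
)

# hist() needs a numeric vector, so check the column type first
stopifnot(is.numeric(crabs$nipper))

# One numeric vector of nipper lengths per estuary
by_estuary <- split(crabs$nipper, crabs$estuary)

# Draw the three histograms side by side, then restore the plot settings
op <- par(mfrow = c(1, 3))
for (name in names(by_estuary)) {
  hist(by_estuary[[name]], main = name, xlab = "Length of Nipper")
}
par(op)
```

If the is.numeric() check fails on the real data, one possible cause is a header row read as data because of header = FALSE, which turns the whole column into text.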
I am trying to create a conditional loop to create a new variable called BigSales which should be given a value of 'yes' if either the date occurred before 2012 or the total gross for the day exceeded $65 million. Otherwise, it should be given a value of 'no'.
I tried:
for(i in 1:45){
if(movies$Gross[i] > 65 | movies$Date[i] < 2012-01-01){
movies$BigSales[i] <- "yes"}
else (
movies$BigSales[i] <- "no"
)
}
But I got the error message:
Error in if (movies$Gross[i] > 65 | movies$Date[i] < 2012 - 1 - 1) { :
missing value where TRUE/FALSE needed
In addition to that, the data set contains 100 observations, but it is only reading 45. How can I solve this?
It's possible to add a conditional column in this manner, but there are tools that make this easier and more readable.
library(dplyr)
# `|` is the vectorized "or": "yes" if either condition holds
movies <- mutate(movies, BigSales = ifelse(Gross > 65 | Date < as.Date("2012-01-01"),"yes","no"))
You should also be careful working with dates: call str(movies$Date) to make sure it is of the "Date" type, and if it is not, pass it to as.Date().
To answer your question as you asked it: you didn't put quotes around the date, so R evaluated 2012-01-01 as the arithmetic 2012 - 1 - 1, which is 2010. If you'd prefer to solve this problem with the code you have, use as.Date("2012-01-01").
ifelse() is vectorised: it evaluates the condition for each element of the input vector and returns a vector of results.
Also, since the OP says that any date before 2012 counts as BigSales "yes", checking only the year of movies$Date is enough.
In base R, a solution could be:
movies$BigSales <- ifelse(movies$Gross > 65 | as.numeric(format(movies$Date,"%Y")) < 2012,
"yes", "no")
Note: movies$Date is expected to be of type Date or POSIXct.
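A small sketch of the vectorized version on made-up data, assuming Gross is in millions and Date is already of class Date (the four rows below are hypothetical). It also shows how a missing value behaves: ifelse() returns NA for that row, whereas if() inside a loop stops with "missing value where TRUE/FALSE needed".

```r
# Hypothetical sample of the movies data; the last row has a missing date
movies <- data.frame(
  Date  = as.Date(c("2011-12-30", "2012-05-04", "2013-07-01", NA)),
  Gross = c(20, 80, 30, 10)
)

# `|` works element by element, so every row is handled without a loop
# and without hard-coding the number of rows (no 1:45)
movies$BigSales <- ifelse(
  movies$Gross > 65 | movies$Date < as.Date("2012-01-01"),
  "yes", "no"
)

movies
```

Because the whole column is computed at once, all rows of the data set are covered regardless of how many there are.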
I have 252 rows of data in column 4, and I would like to find the difference between each pair of consecutive rows throughout the entire column.
My current code is:
appleClose<-NULL
for (i in 1:Apple[1]){
appleClose[i] <- AA[i,4]
}
appleClose[]
I tried, and failed, with:
appleClose<-NULL
for (i in 1:Apple[1]){
appleClose[i] <- AA[i,4] - AA[i+1,4]
}
appleClose[]
Edit:
I am trying to optimize a stock market portfolio in retrospect.
AA is the ticker symbol for Apple. I downloaded that information through some R code written earlier in the program.
I have not checked out the diff function yet. I will do that now.
The error I am receiving is
Error in `[.xts`(AA, i + 1, 4) : subscript out of bounds
Is this what you mean?
> Apple=runif(5,1,10)
#5 numbers
> Apple
[1] 3.362267 2.489085 3.899513 5.591127 9.315716
#4 differences
> diff(Apple)
[1] -0.8731816 1.4104271 1.6916143 3.7245894
or, depending on your data, either
> diff(AA$Apple)
or maybe
> diff(AA[,4])
Another option, for a plain matrix or data frame (if this is what you are referring to; your question is not very clear):
AA[-1,4] - AA[-dim(AA)[1],4]
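To show why the loop fails and that the two approaches agree, here is a sketch on a plain numeric vector (the closing prices are made up). With n values there are only n - 1 consecutive differences, so a loop that reaches i = n asks for element n + 1, which is what "subscript out of bounds" reports.

```r
# Hypothetical closing prices standing in for column 4 of AA
close <- c(10, 12, 11, 15)
n <- length(close)

# diff() returns each value minus the one before it: n - 1 results
d1 <- diff(close)
d1   # 2 -1 4

# The same result by dropping the first and the last element
d2 <- close[-1] - close[-n]

# A loop version has to stop at n - 1, not n
d3 <- numeric(n - 1)
for (i in seq_len(n - 1)) {
  d3[i] <- close[i + 1] - close[i]
}
```

If AA is an xts object, as the error message suggests, diff() is probably the safer choice, because arithmetic between two xts objects is aligned on their time index rather than on position.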