R perform grep on input - r

I have a file with the following contents
3594 -124.049541 44.429077
 -123.381222 44.530192
 -123.479913 44.625517
 -123.578917 44.720704
END
3595 -123.103772 45.009223
 -122.427717 45.101578
 -122.525757 45.198252
 -122.624122 45.294789
END
3676 -122.989567 44.147495
 -122.323040 44.238368
 -122.419523 44.335217
 -122.516322 44.431923
END
END
I'd like to read this file into R, but I'd like to retain only the indented lines.
This seems like a good job for grep, but I'm not sure how to make it work.
Any thoughts?

You could try, where file.txt contains your data:
a <- readLines("file.txt")
a <- a[grepl("^ ", a)]
do.call("rbind", strsplit(a, " "))[, -1]
[,1] [,2]
[1,] "-123.381222" "44.530192"
[2,] "-123.479913" "44.625517"
[3,] "-123.578917" "44.720704"
[4,] "-122.427717" "45.101578"
[5,] "-122.525757" "45.198252"
[6,] "-122.624122" "45.294789"
[7,] "-122.323040" "44.238368"
[8,] "-122.419523" "44.335217"
[9,] "-122.516322" "44.431923"
# Alternatively you can get the data by read.table() as suggested by @Josh
read.table(textConnection(a))
V1 V2
1 -123.3812 44.53019
2 -123.4799 44.62552
3 -123.5789 44.72070
4 -122.4277 45.10158
5 -122.5258 45.19825
6 -122.6241 45.29479
7 -122.3230 44.23837
8 -122.4195 44.33522
9 -122.5163 44.43192
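As a self-contained sketch of the same filtering idea, the snippet below uses an inline character vector in place of readLines("file.txt"). The sample values come from the question; the names raw_lines, indented and coords, and the column labels lon and lat, are assumptions made for illustration. It matches any leading whitespace rather than a single space, which should also cover tab-indented files.

```r
# Stand-in for readLines("file.txt"); values copied from the question
raw_lines <- c(
  "3594 -124.049541 44.429077",
  " -123.381222 44.530192",
  " -123.479913 44.625517",
  "END",
  "END"
)

# Keep only the lines that start with whitespace (the indented ones)
indented <- raw_lines[grepl("^[[:space:]]", raw_lines)]

# Parse the kept lines into numeric columns; lon/lat are assumed labels
coords <- read.table(text = indented, col.names = c("lon", "lat"))
str(coords)
```

Unlike the strsplit() route, read.table() returns numeric columns directly, so no later conversion from character is needed.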

Related

Getting pairs of coordinates in the same column? (R)

I'm playing around with the concaveman package.
I'm using this sample code to create a polygon of a concave hull around some test points:
library(concaveman)
data(points)
polygons <- concaveman(points)
plot(points)
plot(polygons, add = TRUE)
However, the polygon df has all the coordinates crammed into one row like so:
polygons
1
list(c(-122.0809, -122.0813, -122.0812, -122.082, -122.0819, -1...
I tried using unlist, but this just separates the x/y coordinate pairs to opposite ends of the df from each other:
fixpolygon <- data.frame(unlist(polygons))
outputs:
polygons1 -122.0809
polygons2 -122.0813
polygons3 -122.0812
...
polygons210 37.3736
polygons211 37.3764
polygons22 37.3767
How can I make it so that the output is like so:
c(-122.0809, 37.3736)
c(-122.0813, 37.3764)
...
etc. etc. ?
By inspecting
str(polygons)
we can see that what you want is already prepared in
polygons$polygons[[1]][[1]]
# V1 V2
# [1,] -122.0809 37.3736
# [2,] -122.0813 37.3764
# [3,] -122.0812 37.3767
# [4,] -122.0820 37.3772
# [5,] -122.0819 37.3792
# [6,] -122.0822 37.3792
# ...
Try using the sf package:
library(sf)
st_coordinates(st_as_sf(polygons))
X Y L1 L2
[1,] -122.0809 37.3736 1 1
[2,] -122.0813 37.3764 1 1
[3,] -122.0812 37.3767 1 1
[4,] -122.0820 37.3772 1 1
[5,] -122.0819 37.3792 1 1
[6,] -122.0822 37.3792 1 1
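To see why unlist() pushed the x and y values to opposite ends, here is a small base R sketch. It does not call concaveman; mock is a hypothetical stand-in that only imitates the nesting suggested by polygons$polygons[[1]][[1]], with the first three coordinate pairs from the output above.

```r
# Hypothetical stand-in for the nested structure (not a real concaveman object)
xy <- cbind(V1 = c(-122.0809, -122.0813, -122.0812),
            V2 = c(37.3736, 37.3764, 37.3767))
mock <- list(polygons = list(list(xy)))

# unlist() flattens the matrix column by column: all V1 values, then all V2
flat <- unlist(mock, use.names = FALSE)

# Indexing down to the matrix keeps each x/y pair together on one row
pairs <- mock$polygons[[1]][[1]]
pairs_df <- as.data.frame(pairs)
pairs_df
```

Because a matrix is stored column-major, flattening it can never interleave the pairs; extracting the matrix itself avoids the problem.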

Accounting with apply not working

I am trying to use accounting from the formattable package within apply, and it does not seem to be working:
library(formattable)
set.seed(4226)
temp = data.frame(a = sample(1000:50000, 10), b = sample(1000:50000, 10),
c = sample(1000:50000, 10), d = sample(1000:50000, 10))
temp
a b c d
1 45186 17792 43363 17080
2 26982 25410 2327 17982
3 45204 39757 29883 4283
4 27069 21334 10497 28776
5 47895 46241 22743 36257
6 30161 45254 21382 42275
7 18278 28936 27036 23620
8 31199 30182 10235 7355
9 10664 40312 28324 20864
10 45225 45545 44394 13364
apply(temp, 2, function(x){x = accounting(x, digits = 0)})
a b c d
[1,] 45186 17792 43363 17080
[2,] 26982 25410 2327 17982
[3,] 45204 39757 29883 4283
[4,] 27069 21334 10497 28776
[5,] 47895 46241 22743 36257
[6,] 30161 45254 21382 42275
[7,] 18278 28936 27036 23620
[8,] 31199 30182 10235 7355
[9,] 10664 40312 28324 20864
[10,] 45225 45545 44394 13364
What I want is -
a b c d
[1,] 45,186 17,792 43,363 17,080
[2,] 26,982 25,410 2,327 17,982
[3,] 45,204 39,757 29,883 4,283
[4,] 27,069 21,334 10,497 28,776
[5,] 47,895 46,241 22,743 36,257
[6,] 30,161 45,254 21,382 42,275
[7,] 18,278 28,936 27,036 23,620
[8,] 31,199 30,182 10,235 7,355
[9,] 10,664 40,312 28,324 20,864
[10,] 45,225 45,545 44,394 13,364
You probably want to keep things as a data frame, in which case apply is not the right tool. Here it coerces the data frame to a matrix and gives you a matrix back, which drops the formatting attributes added by accounting().
You might want one of the following options, where cols is a vector of the column names to format (e.g. cols <- names(temp)):
temp[cols] <- lapply(temp[cols], function(x){accounting(x, digits = 0)})
or
as.data.frame(lapply(temp[cols], function(x){accounting(x, digits = 0)}))
or using dplyr something like:
temp %>%
  mutate_at(.vars = cols, .funs = accounting, digits = 0)
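The difference between apply() and lapply() on a data frame can be sketched in base R alone. In the snippet below, format(x, big.mark = ",") is only a stand-in for accounting() so that the example runs without formattable; it produces character strings, not formattable numbers. The two-row data frame reuses values from the question, and cols is an assumed name.

```r
temp <- data.frame(a = c(45186, 26982), b = c(17792, 25410))
cols <- names(temp)  # assumed: format every column

# apply() converts the data frame to a matrix before calling the function
m <- apply(temp, 2, function(x) x)
is.matrix(m)

# lapply() works column by column, so assigning back keeps the data frame
temp[cols] <- lapply(temp[cols], function(x) format(x, big.mark = ","))
is.data.frame(temp)
temp
```

With formattable loaded, swapping the stand-in for accounting(x, digits = 0) should give the comma-separated display while keeping the columns numeric.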

How to replace ties with NA in R

I am working on a function to return the column name of the largest value for each row. Something like:
colnames(x)[apply(x,1,which.max)]
However, before applying a function like this, is there a straightforward and general way to replace ties with NA (or any other arbitrary letter etc.)?
I have the following matrix:
0 1
[1,] 5.000000e-01 0.5000000000
[2,] 9.901501e-01 0.0098498779
[3,] 9.981358e-01 0.0018641935
[4,] 9.996753e-01 0.0003246823
[5,] 9.998598e-01 0.0001402322
[6,] 1.303731e-02 0.9869626938
[7,] 1.157919e-03 0.9988420815
[8,] 6.274074e-07 0.9999993726
[9,] 1.659164e-07 0.9999998341
[10,] 6.517362e-08 0.9999999348
[11,] 8.951474e-06 0.9999910485
[12,] 5.070740e-06 0.9999949293
[13,] 1.278186e-07 0.9999998722
[14,] 9.914646e-08 0.9999999009
[15,] 7.058751e-08 0.9999999294
[16,] 2.847667e-09 0.9999999972
[17,] 1.675766e-08 0.9999999832
[18,] 2.172290e-06 0.9999978277
[19,] 4.964820e-06 0.9999950352
[20,] 1.333680e-07 0.9999998666
[21,] 2.087793e-07 0.9999997912
[22,] 2.358360e-06 0.9999976416
The first row has equal values for variables which I would like to replace with NA. While this is simple for this particular example, I want to be able to replace all ties with NA where they occur in any size matrix i.e. in this matrix:
1 2 3
[1,] 0.25 0.25 0.5
[2,] 0.3 0.3 0.3
all values would be replaced with NA except for [1,3]
I have looked at the function which.max.simple() which can deal with ties by replacing with NA but it doesn't appear to work any more, and all other methods of dealing with ties don't address my issue
I hope that makes sense
Thanks,
C
Here's a simple approach to replace any row-wise duplicated values with NA in a matrix m:
is.na(m) <- t(apply(m, 1, FUN = function(x) {
duplicated(x) | duplicated(x, fromLast = TRUE)}))
But consider the following notes:
1) be extra careful when comparing floating point numbers for equality (see Why are these numbers not equal?);
2) depending on your ultimate target, there may be simpler ways than replacing duplicated values in your data (since it seems that you are only interested in column names); and
3) if you are going to replace values in a numeric matrix, don't use arbitrary characters for replacement since that will convert your whole matrix to character class (replacement with NA is not a problem).
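Applied to the small 2 x 3 matrix from the question, the approach can be sketched as follows. The exact comparison done by duplicated() is fine for these literal values, but note 1 still applies to computed probabilities.

```r
# The second example matrix from the question
m <- matrix(c(0.25, 0.25, 0.5,
              0.3,  0.3,  0.3), nrow = 2, byrow = TRUE)

# Flag every value that occurs more than once within its row
tied <- t(apply(m, 1, function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}))

# Set the flagged cells to NA; the matrix stays numeric
is.na(m) <- tied
m
```

Only the value at [1,3] survives, which matches what the question asks for.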

How to get values on testdata in RSNNS

I have two files, "testi" containing a few numbers and "testo" containing their square roots. I have another file named "test" which contains some numbers for which I want the square roots. I used the commands
model <- mlp(testi,testo,size=50,learnFuncParams = c(0.001),maxit = 5000)
xyz <- predict(model,test)
The values which I get from "xyz" are
xyz
#[1,] 0.9971085
#[2,] 0.9992253
#[3,] 0.9992997
#[4,] 0.9993009
#[5,] 0.9993009
#[6,] 0.9993009
#[7,] 0.9993009
Whereas "test" contains
1 4
2 16
3 36
4 64
5 100
6 144
7 196
Why does this happen?
mlp uses a logistic output unit by default, so its predictions are confined to the interval (0, 1); you need to specify linOut=TRUE to get a linear output. In general, normalizing your data would also help.
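The saturation near 1 can be illustrated without RSNNS. The sketch below is plain base R, not the package's internals: logistic and rescale01 are hypothetical helper names, and the targets are the square roots of the values shown in "test".

```r
# A logistic unit squashes any input into the open interval (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))
outputs <- logistic(c(-10, 0, 10))
max(outputs) < 1

# The desired answers all lie above 1, so a logistic output cannot reach them
targets <- sqrt(c(4, 16, 36, 64, 100, 144, 196))
min(targets) > 1

# One common normalization: rescale the targets to [0, 1] before training
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- rescale01(targets)
range(scaled)
```

If the targets are rescaled like this, the predictions have to be mapped back to the original scale afterwards using the same minimum and maximum.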

Using rollapply() to find modal value

I've got panel data and have been playing around with k-means clustering. So now I've got a panel of factor values that are mostly stable but I'd like to smooth that out a bit more so that (for example) the data says "Wyoming was in group 1 in earlier years, moved into group 2, then moved into group 5" rather than "Wyoming was in group 1,1,1,2,3,2,2,5,5,5".
So the approach I'm taking is to use rollapply() to calculate the modal value. Below is code that works to calculate the mode ("Mode()"), and a wrapper for that ("ModeR()") that (perhaps clumsily) resolves the problem of multi-modal windows by randomly picking a mode. All that is fine, but when I put it into rollapply() I'm getting problems.
library(dplyr) # arrange(), desc(), filter()
library(zoo)   # rollapply()
Mode <- function(vect){ # take a vector as input
  temp <- as.data.frame(table(vect))
  temp <- arrange(temp,desc(Freq)) # from dplyr
  max.f <- temp[1,2]
  temp <- filter(temp,Freq==max.f) # cut out anything that isn't modal
  return(temp[,1])
}
ModeR <- function(vect){
  out <- Mode(vect)
  return(out[round(runif(1,min=0.5000001,max=length(out)+0.499999999))])
}
temp <- round(runif(20,min=1,max=10)) # A vector to test this out on.
cbind(temp,rollapply(data=temp,width=5,FUN=ModeR,fill=NA,align="right"))
which returned:
temp
[1,] 5 NA
[2,] 6 NA
[3,] 5 NA
[4,] 5 NA
[5,] 7 1
[6,] 6 1
[7,] 5 1
[8,] 5 1
[9,] 3 2
[10,] 1 3
[11,] 5 3
[12,] 7 3
[13,] 5 3
[14,] 4 3
[15,] 3 3
[16,] 4 2
[17,] 8 2
[18,] 5 2
[19,] 6 3
[20,] 6 3
Compare that with:
> ModeR(temp[1:5])
[1] 5
Levels: 5 6 7
> ModeR(temp[2:6])
[1] 6
Levels: 5 6 7
So it seems like the problem is in how ModeR is being applied in rollapply(). Any ideas?
Thanks!
Rick
Thanks to /u/murgs! His comment pointed me in the right direction (in addition to helping me streamline ModeR() using sample()).
ModeR() as written above returns a factor (as does Mode()). I need it to be a number. I can fix this by updating my code as follows:
Mode <- function(vect){ # take a vector as input
  temp <- as.data.frame(table(vect))
  temp <- arrange(temp,desc(Freq))
  max.f <- temp[1,2]
  temp <- filter(temp,Freq==max.f) # cut out anything that isn't modal
  return(as.numeric(as.character(temp[,1]))) #HERE'S THE BIG CHANGE
}
ModeR <- function(vect){
  out <- Mode(vect)
  return(out[sample(1:length(out),1)]) #HERE'S SOME IMPROVED CODE!
}
Now rollapply() does what I expected it to do! There's still that weird as.character() bit (otherwise it rounds down the number). I'm not sure what's going on there, but the code works so I won't worry about it...
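The as.character() step is needed because as.numeric() applied directly to a factor returns the internal integer level codes, not the labels. That would also explain the small values (1, 2, 3) in the rollapply() output above. A minimal sketch, using a made-up factor f:

```r
# A factor whose labels are numbers, like the one Mode() gets from table()
f <- factor(c(5, 6, 7, 5))

# Direct conversion yields the level codes (position of each label)
codes <- as.numeric(f)

# Going through character first recovers the original values
values <- as.numeric(as.character(f))

codes
values
```

An equivalent idiom is as.numeric(levels(f))[f], which converts the labels once and then indexes them by the codes.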
