Extracting x and y values from a character string in R - r

I'm completely new to R. I have a data frame which contains the character string below:
[{\"task\":\"T1\",\"task_label\":\"Draw around the infarct area\n\",\"value\":[{\"tool\":0,\"frame\":0,\"points\":[{\"x\":786,\"y\":139.8},{\"x\":712.3,\"y\":245.3},{\"x\":717.7,\"y\":291.7},{\"x\":804.9,\"y\":335.6},{\"x\":866.1,\"y\":352.7},{\"x\":877.5,\"y\":402.4},{\"x\":866,\"y\":492.9},{\"x\":823.2,\"y\":560.1},{\"x\":765.5,\"y\":603.6},{\"x\":791.8,\"y\":631.7},{\"x\":830.3,\"y\":617.8},{\"x\":846.9,\"y\":618.1},{\"x\":937.1,\"y\":538.5},{\"x\":941.1,\"y\":476.4},{\"x\":983.2,\"y\":443},{\"x\":1020.5,\"y\":338.4},{\"x\":997.1,\"y\":232.7},{\"x\":996.9,\"y\":232.7},{\"x\":921.5,\"y\":145},{\"x\":921.2,\"y\":145},{\"x\":850.6,\"y\":121},{\"x\":850.6,\"y\":120.7},{\"x\":786,\"y\":139.8}],\"details\":[],\"tool_label\":\"Tool name\"}]}]"
I am looking to extract the x and y coordinates and index them. For example:
x1 = 786, x2 = 712.3, x3 = 717.7 etc
y1 = 139.8, y2 = 245.3, y3 = 291.7 etc
I have tried using substring and gsub but have come unstuck.
Ideally, I would create a for loop which reads each number and stores it as a variable.
Any suggestions would be really appreciated! Thanks

Your data looks like a JSON structure. The only precondition is to remove the \n character from "Draw around the infarct area\n". After that, this worked on my system:
library(jsonlite)
dt <- fromJSON("[{\"task\":\"T1\",\"task_label\":\"Draw around the infarct area\",\"value\":[{\"tool\":0,\"frame\":0,\"points\":[{\"x\":786,\"y\":139.8},{\"x\":712.3,\"y\":245.3},{\"x\":717.7,\"y\":291.7},{\"x\":804.9,\"y\":335.6},{\"x\":866.1,\"y\":352.7},{\"x\":877.5,\"y\":402.4},{\"x\":866,\"y\":492.9},{\"x\":823.2,\"y\":560.1},{\"x\":765.5,\"y\":603.6},{\"x\":791.8,\"y\":631.7},{\"x\":830.3,\"y\":617.8},{\"x\":846.9,\"y\":618.1},{\"x\":937.1,\"y\":538.5},{\"x\":941.1,\"y\":476.4},{\"x\":983.2,\"y\":443},{\"x\":1020.5,\"y\":338.4},{\"x\":997.1,\"y\":232.7},{\"x\":996.9,\"y\":232.7},{\"x\":921.5,\"y\":145},{\"x\":921.2,\"y\":145},{\"x\":850.6,\"y\":121},{\"x\":850.6,\"y\":120.7},{\"x\":786,\"y\":139.8}],\"details\":[],\"tool_label\":\"Tool name\"}]}]")
(dt[[3]][[1]])[[3]][[1]]
If you want to remove the \n characters with code, you could use a function like str_replace from the stringr package.
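A minimal sketch of that cleanup step, using base R's gsub rather than str_replace and a shortened two-point version of the string (the variable names raw, clean, parsed and pts are made up for illustration). It assumes the \n in the data is a real newline character inside the JSON string, which is what makes the parser reject it:

```r
library(jsonlite)

# Shortened stand-in for the original string; "\n" here is a real newline
raw <- '[{"task":"T1","task_label":"Draw around the infarct area\n","value":[{"tool":0,"frame":0,"points":[{"x":786,"y":139.8},{"x":712.3,"y":245.3}],"details":[],"tool_label":"Tool name"}]}]'

# Strip the newline so the text is valid JSON, then parse it
clean <- gsub("\n", "", raw, fixed = TRUE)
parsed <- fromJSON(clean)

# The points end up as a data frame with columns x and y
pts <- parsed$value[[1]]$points[[1]]
pts$x[1]  # first x coordinate
pts$y[2]  # second y coordinate
```

Indexing pts$x[i] and pts$y[i] gives the x1, x2, ... and y1, y2, ... values asked for, without creating a separate variable per point.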

As @Jan points out, this is JSON data. But I think it's probably easier to get the data out with regular expressions.
library(stringr)
library(dplyr)
str_extract_all(data,'([xy])[\\\":]+([0-9\\.]+)') %>%
str_extract_all(c("[xy]","[0-9\\.]+")) %>%
bind_cols
# A tibble: 46 x 2
V1 V2
<chr> <chr>
1 x 786
2 y 139.8
3 x 712.3
4 y 245.3
5 x 717.7
6 y 291.7
7 x 804.9
8 y 335.6
9 x 866.1
10 y 352.7
# … with 36 more rows
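If you would rather have one row per point, here is a rough base R sketch of the same regular-expression idea. It is an illustration on a shortened string, not the code that produced the output above; the helper name grab and the tolerance for an optional backslash before the quote are assumptions about how the text is stored:

```r
# Shortened stand-in for the points part of the string
txt <- '{"x":786,"y":139.8},{"x":712.3,"y":245.3},{"x":717.7,"y":291.7}'

# Pull out every number that follows the given key (x or y)
grab <- function(key, s) {
  pattern <- paste0(key, "\\\\?\":[0-9.]+")   # key, optional backslash, quote, colon, number
  hits <- regmatches(s, gregexpr(pattern, s))[[1]]
  as.numeric(sub("^[^0-9]+", "", hits))        # drop everything before the first digit
}

coords <- data.frame(x = grab("x", txt), y = grab("y", txt))
coords
```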

Related

Creating a table that shows a function evaluated for sequences

How would I create a table that takes two variables composed of incremental sequences and evaluates a function for these two variables? An example of what I want to create is a multiplication table. So the function would be x*y and it would produce a table where [row, column] [1,1]=1, [1,2]=2, [5,5]=25, etc.
I think you can use for loops but I'm not sure.
Thanks in advance
JOE this is pretty basic ... try to follow a basic data manipulation tutorial.
For this type of operation you do not need loops. Read up on vector operations.
What you want to do can be easily done in R with a data frame/tibble.
base R
# create your test vectors
x <- c(1,1,5)
y <- c(1,2,5)
# store them in a data frame
df <- data.frame(x = x, y = y)
df
x y
1 1 1
2 1 2
3 5 5
# in base R you refer to the columns of an object with dollar notation
df$mult <- df$x * df$y
df
x y mult
1 1 1 1
2 1 2 2
3 5 5 25
tidyverse
The tidyverse might be a bit more intuitive for vectorised operations:
library(dplyr) # the main data crunching package of the tidyverse
df <- data.frame(x = x, y = y)
# with mutate you can create a new vector (or overwrite an existing one)
df <- df %>% mutate(MULT = x * y)
df
x y MULT
1 1 1 1
2 1 2 2
3 5 5 25
Good luck with your learning journey!
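If what you are after is the full grid, with every combination of x and y as in a multiplication table, base R's outer function may be closer to what you want. A small sketch, assuming both sequences run from 1 to 5 (the name tab is made up):

```r
x <- 1:5
y <- 1:5

# outer() evaluates the function for every pair (x[i], y[j])
tab <- outer(x, y, FUN = function(a, b) a * b)
dimnames(tab) <- list(as.character(x), as.character(y))

tab
tab[1, 2]  # row 1, column 2
tab[5, 5]  # row 5, column 5
```

Any vectorised function of two arguments can be passed as FUN, so the same pattern works for functions other than multiplication.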

Finding a Row with a Partial Match in another Data Frame and then writing its output into the original DF

I have a data frame "zadavatele" and data frame "Final_Town_Contacts". In the df zadavatele, there is a column "buyer_name" with the name and in df "Final_Town_Contacts" there is a column "Name" and "id". The problem is that the formats of the names are a bit different so I can't use the match function. The column "buyer_name" contains the name along with other words, hence it contains the full contents of the column "Name" and some other characters. The other problem is that not all towns included in the "zadavatele" df are included in "Final_Town_Contacts".
I want to find matching towns and then write the variable "id" from the data frame "Final_Town_Contacts" for the corresponding town in "zadavatele".
To demonstrate what I would like to do, here are the two data frames:
# A tibble: 3 x 1
buyer_name
<chr>
1 xx abc
2 y fdg
3 z sad
Name id
<chr> <dbl>
1 y 54
2 z 11
3 x 32
I would like to have this output in the "zadavatele" data frame:
buyer_name id
<chr> <dbl>
1 xx abc 32
2 y fdg 54
3 z sad 11
I was thinking of using a for loop:
for (i in 1:nrow(zadavatele)) {
for (n in 1:nrow(Final_Town_Contacts)){
if(str_detect(zadavatele$buyer_name[i], Final_Town_Contacts$Name[n] &&
!is.na(str_detect(zadavatele$buyer_name[i], Final_Town_Contacts$Name[n]))))
zadavatele$id[n] = Final_Town_Contacts$`LAU 2`
}
}
But that did not work. Do you have any idea how to make this work?
Thank you for your help!
Does this work? Only the matching names will be included because of the inner join. The first character of buyer_name is matched with the Name column. Please let me know if that won't always be the case.
library(dplyr)
df1 %>% mutate(Name = substr(buyer_name,1,1)) %>% inner_join(df2) %>% select(1,3)
Joining, by = "Name"
buyer_name id
1 xx abc 32
2 y fdg 54
3 z sad 11
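If the name is not always the first character, a partial match is an option. The sketch below is an assumption-laden illustration in base R: it rebuilds the two example data frames, treats each Name as a literal substring of buyer_name, takes the first match, and leaves NA where no town matches. The helper name find_id is hypothetical:

```r
zadavatele <- data.frame(buyer_name = c("xx abc", "y fdg", "z sad"),
                         stringsAsFactors = FALSE)
Final_Town_Contacts <- data.frame(Name = c("y", "z", "x"),
                                  id = c(54, 11, 32),
                                  stringsAsFactors = FALSE)

# Return the id of the first Name contained in buyer, or NA if none is
find_id <- function(buyer) {
  found <- vapply(Final_Town_Contacts$Name,
                  function(nm) grepl(nm, buyer, fixed = TRUE),
                  logical(1))
  hit <- which(found)
  if (length(hit) == 0) NA_real_ else Final_Town_Contacts$id[hit[1]]
}

zadavatele$id <- vapply(zadavatele$buyer_name, find_id, numeric(1),
                        USE.NAMES = FALSE)
zadavatele
```

With real town names, short names that are substrings of longer ones could match the wrong row, so it is worth checking the results by hand.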

How to avoid number rounding when using as.numeric() in R?

I am reading well structured, textual data in R and in the process of converting from character to numeric, numbers lose their decimal places.
I have tried using round(digits = 2) but it didn't work since I first had to apply as.numeric. At one point, I did set up options(digits = 2) before the conversion but it didn't work either.
Ultimately, I desired to get a data.frame with its numbers being exactly the same as the ones seen as characters.
I looked up for help here and did find answers like this, this, and this; however, none really helped me solve this issue.
How can I prevent number rounding when converting from character to numeric?
Here's a reproducible piece of code I wrote.
library(purrr)
my_char = c(" 246.00 222.22 197.98 135.10 101.50 86.45
72.17 62.11 64.94 76.62 109.33 177.80")
# Break characters between spaces
my_char = strsplit(my_char, "\\s+")
head(my_char, n = 2)
#> [[1]]
#> [1] "" "246.00" "222.22" "197.98" "135.10" "101.50" "86.45"
#> [8] "72.17" "62.11" "64.94" "76.62" "109.33" "177.80"
# Convert from characters to numeric.
my_char = map_dfc(my_char, as.numeric)
head(my_char, n = 2)
#> # A tibble: 2 x 1
#> V1
#> <dbl>
#> 1 NA
#> 2 246
# Delete first value because it's empty
my_char = my_char[-1,1]
head(my_char, n = 2)
#> # A tibble: 2 x 1
#> V1
#> <dbl>
#> 1 246
#> 2 222.
That's just how R displays data in a tibble.
The function map_dfc is not rounding your data; the tibble print method only shows a limited number of significant digits.
If you want to print the data in the usual format, use as.data.frame, like this:
head(as.data.frame(my_char), n = 4)
V1
#>1 246.00
#>2 222.22
#>3 197.98
#>4 135.10
This shows that your data has not been rounded.
Hope this helps.
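A quick way to convince yourself, sketched on three of the values from the question (the names chars, nums and formatted are made up): the converted numbers still carry their decimals, and format can print them with a fixed number of decimal places.

```r
chars <- c("246.00", "222.22", "197.98")
nums <- as.numeric(chars)

# The stored value still has its decimals
all.equal(nums[2], 222.22)

# Display with at least two decimal places
formatted <- format(nums, nsmall = 2)
print(formatted)
```

Note that format returns character strings, so it is for display only; keep the numeric column for calculations.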

R: Create ID based on two different variables

I am a beginner trying to work with R, but constantly hitting walls.
I have a giant dataset (thousands of entries) that looks like this: there is a column for Latitude, Longitude and PlotCode.
I have more than one plot per Longitude and Latitude. I would like to create a new column with some sort of ID to all plots with the same latitude and Longitude.
Something that will look like this in the end:
Any suggestions? Thank you.
Welcome to SO! It's better to add data, desired outputs, attempts and so on in your question. However maybe you can find a solution with the package dplyr.
After installing it, you could do this:
library(dplyr)
# some data like yours
data_latlon <- data.frame(Lat = c(1,1,1,2,2,2,3,3,3)
, Long = c(45,45,45,12,12,12,23,23,23)
, PlotCode = c('a','a','a','b','b','b','c','c','c'))
data_latlon %>% # the pipe operator to have dplyr chains
group_by(Lat,Long) %>% # group by unique Lat and Long
summarise(PlotCodeGrouped = paste(PlotCode,collapse='')) # add a new column that collapses all the plot codes;
# you can specify the separator
# with the collapse option, in
# this case nothing
# A tibble: 3 x 3
# Groups: Lat [?]
Lat Long PlotCodeGrouped
<dbl> <dbl> <chr>
1 1 45 aaa
2 2 12 bbb
3 3 23 ccc
EDIT
The code is simpler if this is the result you'd like:
data_latlon %>% # the pipe operator to have dplyr chains
group_by(Lat,Long, add=TRUE) # group by unique Lat and Long
# and add a "hierarchical father"
# Groups: Lat, Long [3]
Lat Long PlotCode
<dbl> <dbl> <fct>
1 1. 45. a
2 1. 45. a
3 1. 45. a
4 2. 12. b
5 2. 12. b
6 2. 12. b
7 3. 23. c
8 3. 23. c
9 3. 23. c
I think I found the solution, what I needed is something called cluster ID.
dataframe <- transform(dataframe, Cluster_ID = as.numeric(interaction(Lat, Long, drop=TRUE)))
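Applied to the small example data from the answer above, that looks roughly like this. The second variant, using match on a pasted key, is an alternative I am adding for illustration; it numbers the groups in order of first appearance. Column names Cluster_ID and Cluster_ID2 are arbitrary:

```r
data_latlon <- data.frame(Lat = c(1,1,1,2,2,2,3,3,3),
                          Long = c(45,45,45,12,12,12,23,23,23),
                          PlotCode = c('a','a','a','b','b','b','c','c','c'))

# One ID per unique Lat/Long combination; the numbers are arbitrary labels
data_latlon$Cluster_ID <- as.numeric(
  interaction(data_latlon$Lat, data_latlon$Long, drop = TRUE))

# Alternative: IDs numbered in order of first appearance
key <- paste(data_latlon$Lat, data_latlon$Long)
data_latlon$Cluster_ID2 <- match(key, unique(key))

data_latlon
```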
By grouping, do you mean sorting / arranging them by PlotCode?
If so, you can use the sort function, or the arrange function from the
tidyverse / dplyr package.

Sorting a column in descending order in R excluding the first row

I have a dataframe with 5 columns and a very large dataset. I want to sort by column 3. How do you sort everything after the first row? (When calling this function I want to end it with nrows)
Example output:
Original:
4
7
9
6
8
New:
4
9
8
7
6
Thanks!
If I'm correctly understanding what you want to do, this approach should work:
z <- data.frame(x1 = seq(10), x2 = rep(c(2,3), 5), x3 = seq(14, 23))
zsub <- z[2:nrow(z),]
zsub <- zsub[order(-zsub[,3]),]
znew <- rbind(z[1,], zsub)
Basically, snip off the rows you want to sort, sort them in descending order on column 3, then reattach the first row.
And here's a piped version using dplyr, so you don't clutter the workspace with extra objects:
library(dplyr)
z <- z %>%
slice(2:nrow(z)) %>%
arrange(-x3) %>%
rbind(slice(z, 1), .)
You might try this single line of code to modify the third column in your data frame df as described:
df[,3] <- c(df[1,3],sort(df[-1,3], decreasing=TRUE))
df$x[-1] <- df$x[-1][order(df$x[-1], decreasing=TRUE)]
# x
# 1 4
# 2 9
# 3 8
# 4 7
# 5 6
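Note that the one-liners above reorder only the one column. If the whole rows should move together, a small helper along these lines may do; the function name sort_after_first and the id column are invented for the example, and the x values are the ones from the question:

```r
# Keep row 1 in place, sort the remaining rows by `col` in descending order
sort_after_first <- function(d, col) {
  head_row <- d[1, , drop = FALSE]
  rest <- d[-1, , drop = FALSE]
  rest <- rest[order(rest[[col]], decreasing = TRUE), , drop = FALSE]
  rbind(head_row, rest)
}

df <- data.frame(id = 1:5, x = c(4, 7, 9, 6, 8))
out <- sort_after_first(df, "x")
out
```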
