I have an object (Seurat object) and I need to get certain data out of it:
> sc@misc[["colors"]][["seurat_clusters"]]
0 1 2 3 4 5 6 7
"#CC0C00FF" "#5C88DAFF" "#84BD00FF" "#FFCD00FF" "#7C878EFF" "#00B5E2FF" "#00AF66FF" "#CC0C00B2"
I need this data as a vector, but I don't know how to pull "#CC0C00FF" "#5C88DAFF" etc. out of it.
In order to hand this data to the next function, the result should look like this:
> vec
[1] "#CC0C00FF" "#5C88DAFF" "#84BD00FF"
Thanks in advance!
Solved it! I'm pretty disappointed in myself, because I didn't know this function existed:
> as.vector(sc@misc[["colors"]][["seurat_clusters"]])
[1] "#CC0C00FF" "#5C88DAFF" "#84BD00FF" "#FFCD00FF" "#7C878EFF" "#00B5E2FF" "#00AF66FF" "#CC0C00B2"
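A minimal sketch of the same idea, using a hypothetical named character vector cluster_colors as a stand-in for the Seurat slot (the real object is not reproduced here). as.vector() strips the names attribute from an atomic vector; unname() does the same and states the intent more explicitly, and head() keeps only the first three colors as in the desired output.

```r
# Hypothetical stand-in for sc@misc[["colors"]][["seurat_clusters"]]:
# a named character vector whose names are the cluster IDs
cluster_colors <- c("0" = "#CC0C00FF", "1" = "#5C88DAFF",
                    "2" = "#84BD00FF", "3" = "#FFCD00FF")

# as.vector() drops the names attribute of an atomic vector
vec <- as.vector(cluster_colors)

# unname() gives the same result and makes the intent explicit
vec2 <- unname(cluster_colors)

# Keep only the first three colors
head(vec, 3)
# [1] "#CC0C00FF" "#5C88DAFF" "#84BD00FF"
```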
Related
I want to load the data from a JSON file into R to make a new data frame. However, the JSON file consists of links to other data, so I can't seem to find the actual data in the JSON file. I got the JSON file from this website: https://ckan.dataplatform.nl/dataset/467dc230-20e0-4c3a-8240-dccbfc20807a/resource/531cc276-b88e-49bb-a97f-443707936a12/download/p-route-autoparkeren.json
This is the code I used:
library(rjson)
JSONList1 <- fromJSON(file = "utrecht2.json")
print(JSONList1)
JSONList1_df <- as.data.frame(JSONList1)
When I use this code I get only 1 observation with 411 variables.
Any idea how to do this? I'm a beginner and I've never worked with JSON files.
Maybe try fromJSON from the jsonlite package:
library(jsonlite)
JSONList1 <- fromJSON("https://ckan.dataplatform.nl/dataset/467dc230-20e0-4c3a-8240-dccbfc20807a/resource/531cc276-b88e-49bb-a97f-443707936a12/download/p-route-autoparkeren.json")
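rjson's fromJSON returns nested lists, and as.data.frame() on such a list tends to spread everything across a single row, which would be consistent with the 1 observation with 411 variables. jsonlite's fromJSON, by contrast, simplifies an array of JSON objects into a data.frame by default. A minimal sketch on a small inline JSON string; the key "facilities" and the two records are made up for illustration and are not the real file's layout.

```r
library(jsonlite)

# Hypothetical miniature JSON: an object holding an array of records.
# The key "facilities" is an assumption, not the real file's structure.
txt <- '{"facilities": [
  {"identifier": "a1", "name": "P06 - Sluisstraat", "limitedAccess": false},
  {"identifier": "b2", "name": "3 - Burcht", "limitedAccess": false}
]}'

# fromJSON() accepts a JSON string, a file path or a URL. With the default
# simplification, an array of objects becomes a data.frame.
parsed <- fromJSON(txt)
df <- parsed$facilities

str(df)
```

On the real file, inspecting str() of the result first shows which list element holds the table.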
There are several packages that can import JSON. With the one I am involved with, RcppSimdJson, the resulting data appears to contain a data.frame as the first list element.
d <- RcppSimdJson::fload("https://ckan.dataplatform.nl/dataset/467dc230-20e0-4c3a-8240-dccbfc20807a/resource/531cc276-b88e-49bb-a97f-443707936a12/download/p-route-autoparkeren.json")
> class(d)
[1] "list"
> class(d[[1]])
[1] "data.frame"
>
> head(d[[1]])
dynamicDataUrl
1 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/8d85bbdb-8bbd-4a24-b35f-85f21186ec04
2 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/21b0388a-56f7-4cba-8fd3-4a1c914f5fe2
3 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/45434989-3252-4c85-8731-c856b02c390c
4 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/9064b206-7e62-402d-ae62-f25a0e47571b
5 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/5829fb06-ee4a-4762-946c-ed6209edf7d5
6 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/e4da517a-ef32-426d-821c-96e29ac5ac80
staticDataUrl
1 http://opendata.technolution.nl/opendata/parkingdata/v1/static/8d85bbdb-8bbd-4a24-b35f-85f21186ec04
2 http://opendata.technolution.nl/opendata/parkingdata/v1/static/21b0388a-56f7-4cba-8fd3-4a1c914f5fe2
3 http://opendata.technolution.nl/opendata/parkingdata/v1/static/45434989-3252-4c85-8731-c856b02c390c
4 http://opendata.technolution.nl/opendata/parkingdata/v1/static/9064b206-7e62-402d-ae62-f25a0e47571b
5 http://opendata.technolution.nl/opendata/parkingdata/v1/static/5829fb06-ee4a-4762-946c-ed6209edf7d5
6 http://opendata.technolution.nl/opendata/parkingdata/v1/static/e4da517a-ef32-426d-821c-96e29ac5ac80
limitedAccess identifier name
1 FALSE 8d85bbdb-8bbd-4a24-b35f-85f21186ec04 P06 - Sluisstraat
2 FALSE 21b0388a-56f7-4cba-8fd3-4a1c914f5fe2 3 - Burcht
3 FALSE 45434989-3252-4c85-8731-c856b02c390c P01 - Stationsplein
4 FALSE 9064b206-7e62-402d-ae62-f25a0e47571b Jaarbeurs P3 - Jaarbeurs P3
5 FALSE 5829fb06-ee4a-4762-946c-ed6209edf7d5 P03 - Dek Stadspoort
6 FALSE e4da517a-ef32-426d-821c-96e29ac5ac80 PG-Pieter Vreedeplein
locationForDisplay
1 NA
2 WGS84, 52.4387428557465, 4.82805132865906
3 WGS84, 52.2573226613971, 6.16240739822388
4 WGS84, 52.0854991774024, 5.10619640350342
5 WGS84, 52.256324421386, 6.15569114685059
6 WGS84, 51.5582297848141, 5.08894979953766
>
I would expect this to be similar for the other ones.
xx = c("calculated_p3", "calculated_c1" ,"calculated_p2" ,"calculated_c2", "calculated_d2",
"calculated_d3", "calculated_c3", "calculated_p1" ,"calculated_d1")
order(xx)
The output is: 2 4 7 9 5 6 8 3 1
Why is "calculated_d1" ordered as the first element? And why is "calculated_c2" ordered as the 9th element? I don't understand this. Shouldn't "calculated_c1" be the first one?
Thank you for your help
order is written such that xx[order(xx)] is the same as sort(xx).
The numbers don't refer to the position that each entry should go to but rather the position the entries should come from if they were in order.
calculated_c1 should indeed be the first one. As it is in position 2, the first number is therefore a 2.
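To see the two readings side by side: order() gives, for each position of the sorted result, the index the entry comes from, whereas rank() gives the position each entry goes to. For a vector without ties, rank(xx) equals order(order(xx)).

```r
xx <- c("calculated_p3", "calculated_c1", "calculated_p2", "calculated_c2", "calculated_d2",
        "calculated_d3", "calculated_c3", "calculated_p1", "calculated_d1")

# Where each sorted element comes FROM: xx[2] ("calculated_c1") is first
order(xx)
# [1] 2 4 7 9 5 6 8 3 1

# Where each element goes TO: xx[1] ("calculated_p3") ends up 9th
rank(xx)
# [1] 9 1 8 2 5 6 3 7 4

# Indexing by order() reproduces sort()
identical(xx[order(xx)], sort(xx))
# [1] TRUE
```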
If you want to keep your order you can use factors:
factor(xx, xx)
[1] calculated_p3 calculated_c1 calculated_p2 calculated_c2 calculated_d2 calculated_d3 calculated_c3 calculated_p1
[9] calculated_d1
9 Levels: calculated_p3 calculated_c1 calculated_p2 calculated_c2 calculated_d2 calculated_d3 ... calculated_d1
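A short sketch of why the factor keeps the original order: factor(xx, xx) passes xx itself as the levels argument, and sort() and order() on a factor follow the level order rather than the alphabetical order of the labels.

```r
xx <- c("calculated_p3", "calculated_c1", "calculated_p2", "calculated_c2", "calculated_d2",
        "calculated_d3", "calculated_c3", "calculated_p1", "calculated_d1")

# Levels are set in the original order of xx
f <- factor(xx, levels = xx)

# Ordering follows the levels, so the original positions come back unchanged
order(f)
# [1] 1 2 3 4 5 6 7 8 9

# Sorting the factor therefore returns the labels in their original order
as.character(sort(f))
```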
I'm trying to split a string in one column...
> df.arpt
arpt
1 CMH 39402
2 IAH 97571
3 DAL 67191
4 HOU 07614
5 OKC 11127
...and break it out into two new columns with a result that looks like this...
> df.arpt
arpt arptCode arptID
1 CMH 39402 CMH 39402
2 IAH 97571 IAH 97571
3 DAL 67191 DAL 67191
4 HOU 07614 HOU 07614
5 OKC 11127 OKC 11127
I really want something like this to be possible...
> df.arpt$arptCode <- strsplit(df.arpt$arpt, " ")[[...]][1]
> df.arpt$arptID <- strsplit(df.arpt$arpt, " ")[[...]][2]
... where the ... in the code represents "for every record in the data frame".
Any suggestions on how to go about this? (I'd like to stick with base R / "out-of-the-box" R rather than higher-level packages.) Am I thinking about this the right way in R?
If the arptCode values are row names, you can convert them into a column.
library(tidyverse)
df.arpt %>%
rownames_to_column(var = "arptCode")
If they are not row names then you can use separate.
library(tidyverse)
df.arpt %>%
separate(arpt, into = c('arptCode', 'arptID'))
How about this:
df <- data.frame(arpt = c("CMH 39402", "IAH 97571", "DAL 67191", "HOU 07614", "OKC 11127"))
tidyr::separate(df, arpt, into = c("arptCode", "arptID"))
Because the strings are all fixed length, I was able to use the substr function instead to get past the problem. However, I still don't know what the solution would be if the result of the function were a list.
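For reference, a base R sketch of both routes. substr() works here only because every value is three letters, a space and five digits (an assumption about the data). When the pieces vary in length, strsplit() returns a list holding one character vector per row, and vapply() can pull the i-th piece out of each list element, which is what the [[...]] in the question was reaching for.

```r
df.arpt <- data.frame(arpt = c("CMH 39402", "IAH 97571", "DAL 67191", "HOU 07614", "OKC 11127"),
                      stringsAsFactors = FALSE)

# Fixed-width route: characters 1-3 and 5-9
df.arpt$arptCode <- substr(df.arpt$arpt, 1, 3)
df.arpt$arptID   <- substr(df.arpt$arpt, 5, 9)

# List route: strsplit() returns a list, one character vector per row.
# as.character() guards against arpt having been read in as a factor.
parts <- strsplit(as.character(df.arpt$arpt), " ", fixed = TRUE)
code2 <- vapply(parts, `[`, character(1), 1)  # first piece of each row
id2   <- vapply(parts, `[`, character(1), 2)  # second piece of each row

identical(code2, df.arpt$arptCode)
# [1] TRUE
df.arpt
```

vapply() is used instead of sapply() so that a row with an unexpected shape raises an error rather than silently changing the result type.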
I recently reverted to R version 3.1.3 for compatibility reasons and am now encountering an unexplained error with the subset function.
I want to extract all rows for the gene "Migut.A00003" from the data frame transcr_effects, using the gene name as listed in the data frame expr_mim_genes. (This will later become a loop.) This always returns all rows instead of the specific rows I am looking for, no matter how I format the subset lookup:
> class(expr_mim_genes)
[1] "data.frame"
> sapply(expr_mim_genes, class)
gene longest.tr pair.length
"character" "logical" "numeric"
> head(expr_mim_genes)
gene longest.tr pair.length
1 Migut.A00003 NA 0
2 Migut.A00006 NA 0
3 Migut.A00007 NA 0
4 Migut.A00012 NA 0
5 Migut.A00014 NA 0
6 Migut.A00015 NA 0
> class(transcr_effects)
[1] "data.frame"
> sapply(transcr_effects, class)
pair gene
"character" "character"
> head(transcr_effects)
pair gene
1 pair1 Migut.N01020
2 pair10 Migut.A00351
3 pair1000 Migut.F00857
4 pair10007 Migut.D01637
5 pair10008 Migut.A00401
6 pair10009 Migut.G00442
. . .
7168 pair3430 Migut.A00003
. . .
The gene I am interested in:
> expr_mim_genes[1,"gene"]
[1] "Migut.A00003"
R sees these two terms as equivalent:
> expr_mim_genes[1,"gene"] == "Migut.A00003"
[1] TRUE
If I type in the name of the gene manually, the correct number of rows are returned:
> nrow(subset(transcr_effects, transcr_effects$gene=="Migut.A00003"))
[1] 1
> subset(transcr_effects, transcr_effects$gene=="Migut.A00003")
pair gene
7168 pair3430 Migut.A00003
However, this should return one row from the data.frame but it returns all rows:
> nrow(subset(transcr_effects, transcr_effects$gene == (expr_mim_genes[1,"gene"])))
[1] 10122
I have a feeling this has something to do with text formatting, but I've tried everything and haven't been able to figure it out. I've seen this issue with quoted vs. unquoted entries, but that does not appear to be the issue here (see the equality above).
I didn't have this problem before switching to R 3.1.3, so maybe it is a version convention I am unaware of?
EDIT:
This is driving me crazy, but at least I think I have found a patch. There was quite a bit of data and file processing to get to this point in the code, involving loading at least 4 files. I've tried taking snippets of each file to post a reproducible example here, but sometimes when I analyze the snippets the error recurs and sometimes it does not (!!). After going through the process, though, I discovered that:
i = 1
gene = expr_mim_genes[i,"gene"]
> nrow(subset(transcr_effects, gene == gene))
[1] 10122
> nrow(subset(transcr_effects, gene == (expr_mim_genes[i,"gene"])))
[1] 1
I still can't explain this behavior of the code, but at least I know how to work around it.
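At least the gene == gene case has a standard explanation: subset() evaluates its condition inside the data frame first, so any name that matches a column refers to that column. In gene == gene both sides are the gene column, the comparison is TRUE for every row, and all rows come back. A minimal sketch with a made-up three-row transcr_effects (not the real data):

```r
transcr_effects <- data.frame(pair = c("pair1", "pair10", "pair3430"),
                              gene = c("Migut.N01020", "Migut.A00351", "Migut.A00003"),
                              stringsAsFactors = FALSE)

gene <- "Migut.A00003"
# Inside subset(), 'gene' resolves to the column, so this compares the
# column with itself and keeps every row
n_all <- nrow(subset(transcr_effects, gene == gene))

# A variable name that is not also a column name avoids the masking
target_gene <- "Migut.A00003"
n_one <- nrow(subset(transcr_effects, gene == target_gene))

# Plain [ indexing has no such ambiguity
transcr_effects[transcr_effects$gene == target_gene, ]

c(n_all, n_one)
# [1] 3 1
```

Inside a loop, the [ form is the safer choice, since it never looks names up in the data frame.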
Thanks all.
I have two seemingly identical zoo objects created by the same commands from CSV files for different time periods. I try to combine them into one long zoo, but it fails with an "indexes overlap" error. ('merge', 'c' and 'rbind' all produce variants of the same error text.) As far as I can see there are no duplicates and the time periods do not overlap. What am I doing wrong? I am using R version 3.0.1 on Windows 7 64-bit, if that makes a difference.
> colnames(z2)
[1] "Amb" "HWS" "Diff"
> colnames(t.tmp)
[1] "Amb" "HWS" "Diff"
> max(index(z2))
[1] "2012-12-06 02:17:45 GMT"
> min(index(t.tmp))
[1] "2012-12-06 03:43:45 GMT"
> anyDuplicated(c(index(z2),index(t.tmp)))
[1] 0
> c(z2,t.tmp)
Error in rbind.zoo(...) : indexes overlap
>
UPDATE: In trying to make a reproducible case I've concluded this is an implementation error due to the large number of rows I'm dealing with: it fails if the final result is more than 311434 rows long.
> nrow(c(z2,head(t.tmp,n=101958)))
Error in rbind.zoo(...) : indexes overlap
> nrow(c(z2,head(t.tmp,n=101957)))
[1] 311434
# but row 101958 inserts fine on its own, so it's not a data problem.
> nrow(c(z2,tail(head(t.tmp,n=101958),n=2)))
[1] 209479
I'm sorry, but I don't have the R scripting skills to produce a zoo of the critical length. Hopefully someone can help me out.
UPDATE 2, responding to Jason's suggestion: the problem is in the MATCH, but my R skills aren't sufficient to know how to interpret it. Does it mean MATCH finds a duplicate value in x.t whereas anyDuplicated does not?
> x.t <- c(index(z2),index(t.tmp));
> length(x.t)
[1] 520713
> ix <- ORDER (x.t)
> length(ix)
[1] 520713
> x.t <- x.t[ix]
> length(ix)
[1] 520713
> length(x.t)
[1] 520713
> tx <- table(MATCH(x.t,x.t))
> max(tx)
[1] 2
> tx[which(tx==2)]
311371 311373 311378 311383 311384 311386 311389 311392 311400 311401
2 2 2 2 2 2 2 2 2 2
> anyDuplicated(x.t)
[1] 0
After all the testing and head-scratching, it seems that the problem I'm having is time zone related. Setting the environment to the same time zone as the original data makes it work just fine:
Sys.setenv(TZ="GMT")
> z3<-rbind(z2,t.tmp)
> nrow(z3)
[1] 520713
Thanks to the question "How to guard against accidental time zone conversion" for the inspiration to look in that direction.
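One plausible way an index can look duplicate-free to anyDuplicated() and still collide elsewhere (an assumption; the thread does not confirm the exact mechanism inside rbind.zoo): POSIXct values are stored as seconds since the epoch, but when rendered in a zone that leaves daylight saving time, two distinct instants can format to the same wall-clock string. The zone Europe/London and the date 2012-10-28 below are illustrative choices, not taken from the data.

```r
# Two instants one hour apart, defined in GMT
t1 <- as.POSIXct("2012-10-28 00:30:00", tz = "GMT")
t2 <- t1 + 3600
x <- c(t1, t2)

# The underlying numbers differ, so no duplicates
anyDuplicated(x)
# [1] 0

# Rendered in a zone that leaves summer time that night, both instants
# show the same local clock time (01:30), so the strings collide
local_str <- format(x, tz = "Europe/London")
anyDuplicated(local_str)
# [1] 2

# Rendered in GMT, the strings stay distinct
anyDuplicated(format(x, tz = "GMT"))
# [1] 0
```

Keeping the session time zone equal to the data's, as with Sys.setenv(TZ="GMT") above, makes the numeric and the formatted views of the index agree.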