Cut a 3 dimensional data in r - multidimensional-array

I have a 3 dimensional data and I want to use a part of it (from a specific date to another specific date). I am trying to use split but it says (the same happends with subset)
Error in time > 1611755999 : comparison (6) is possible only for atomic and list types.
What am I doing wrong?
Thank you.

Related

How to access the entry value with unrecognised number of decimal in data frame in R?

I have a data frame in R that I want to analyse. I want to know how many specific numbers are in a data frame column. for example, I want to know the frequency of number 0.9998558 by using
sum(deviation_multiple_regression_3cell_types_all_spots_all_intersection_genes_exclude_50_10dec_rowSums_not_0_for_moran_scaled[,3]== 0.9998558)
However, it seems that the decimal shown is not the actual one (it must be 0.9998558xxxxx) since the result I got from using the above command is 0 (the correct one should be 3468). How can I access that number without knowing the exact decimal numbers so that I get the correct answer? Please see the screenshot below.
The code below gives the number of occurrences in the column.
x <- 0.9998558
length(which(df$a==x))
If you are looking for numbers stating with 0.9998558, I think you can do it in two different ways: working with data as numeric or as character.
Let x be your variable:
Data as character
This way counts exactly what you are looking for
sum(substr(as.character(x),1,9)=="0.9998558")
Data as numeric
This will include all the values with a difference with the reference value lower than 1e-7; this may include values not starting exactly with 0.9998558
sum(abs(x-0.9998558)<1e-7)
You can also "truncate" the numbers in your vector and compare them with the number you want. Here, we write 10^7 because 7 is the number of decimals you want to compare.
sum(trunc(x*10^7)/10^7)==0.9998558)

How can I access data in a nested R list?

I want to learn how to access data from a nested list in R. I am relatively new to the R programming language, so I am unsure how to proceed.
The data is a 'large list(947 elements, 654.9mb) and takes the form:
The numbers within the datalist refer to station numbers and when I click on one (in Rstudio) it looks like this:
I want to kow how I can access the data within 'doy' for example. I have tried:
data[[1]]
which returns all the data for the first element of the list (site, location, doy,ltm etc). So clearly the number used within the square brackets is interpreted as an index for the list, as opposed to an identifier for the elements/station in the list.
Then I tried:
data$1
but it returned the error:
Error: unexpected numeric constant in "data$1"
Then I tried:
data[data$1==doy]
But was returned this:
Error: unexpected numeric constant in "data[data$1"
So at this point, I realise that it is not construing the number of the station as a category/factor within the list. It's just reading it as a number. So I thought I'd put some quotes around it to see if that changed what happened:
data[data$"1"=="doy"]
This returned
named list()
But when I looked at it in the environment, it was a list of 0.
I looked at some of the similar question here on Stack (like: accessing nested lists in R) and tried:
data[data$"1"=="doy",][[1]]
But just got:
Error in data[data$"1" == "doy", ] : incorrect number of dimensions
How can I access this data? It reminds me of a structure in Matlab, but it doesn't seem to be indexed in a similar fashion in R.
Let's look at some ways to do what you want:
data[[1]]
This returns the first element of the list, which is itself a list. You can use the $ subsetting shorthand, but the name of the first element is nonstandard. R prefers names that start with letters and include only alphanumeric characters, periods and underscores. You can escape this behavior with backticks:
data$`1`
If you want to access one of the elements of list 1 in your list of lists, you need to further subset. To get to doy, which is the third element of 1. You can do that four ways.
data[[1]][[3]]
data$`1`[[3]]
data[[1]]$doy
data$`1`$doy
One way (in addition to what Ben Norris has shown):
our_list[[c("1", "doy")]]
Reproducible example data (please provide next time)
our_list <- list(`1` = list(site = "x", doy = 3))

R: creating factor using data from multiple columns

I want to create a column that codes for whether patients have had a comorbid diagnosis of depression or not. Problem is, the diagnosis can be recorded in one of 4 columns:
ComorbidDiagnosis;
OtherDiagnosis;
DischargeDiagnosis;
OtherDischargeDiagnosis.
I've been using
levels(dataframe$ynDepression)[levels(dataframe$ComorbidDiagnosis)=="Depression"]<-"Yes"
for all 4 columns but I don't know how to code those who don't have a diagnosis in any of the columns. I tried:
levels(dataframe$ynDepression)[levels(dataframe$DischOtherDiagnosis &
dataframe$OtherDiagnosis &
dataframe$ComorbidDiagnosis &
dataframe$DischComorbidDiagnosis)==""]<-"No"
I also tried using && instead but it didn't work. Am I missing something?
Thanks in advance!
Edit: I tried uploading an image of some example data but I don't have enough reputations to upload images yet. I'll try to put an example here but might not work:
Patient ID PrimaryDiagnosis OtherDiagnosis ComorbidDiagnosis
_________AN__________Depression
_________AN
_________AN__________Depression______PTSD
_________AN_________________________Depression
What's inside the [] must be (transformable to) a boolean for the subset to work. For example:
x<-1:5
x[x>3]
#4 5
x>3
# F F F T T
works because the condition is a boolean vector. Sometimes, the booleanship can be implicite, like in dataframe[,"var"] which means dataframe[,colnames(dataframe)=="var"] but R must be able to make it a boolean somehow.
EDIT : As pointed out by beginneR, you can also subset with something like df[,c(1,3)], which is numeric but works the same way as df[,"var"]. I like to see that kind of subset as implicit booleans as it enables a yes/no choice but you may very well not agree and only consider that they enable R to select columns and rows.
In your case, the conditions you use are invalid (dataframe$OtherDiagnosisfor example).
You would need something like rowSums(df[,c("var1","var2","var3")]=="")==3, which is a valid condition.

Select a column from a dynamic variable

How can I select the second column of a dynamically named variable?
I create variables of the form "population.USA", "population.Mexico", "population.Canada". Each variable has a column for the year, and another column for the population value. I would like to select the second column from each of these variables during a loop.
I use this syntax:
sprintf("population.%s", country)[, 2]
R returns the error: Error in sprintf("population.%s", country)[, 2] : incorrect number of dimensions
Based on your sequence of questions over the last few minutes, I have two general recommendations for you as you get familiar with R:
Don't use sprintf.
Don't use assign.
Now, obviously, those functions are both useful at times. But you've learned about them too early, before you've mastered some basic stuff about R's data structures. Try to write code without those crutches (for the time being!), as they're just causing you problems.
Rather than creating separate individual variables for each nation's population, place them in a list.
population <- vector("list",3)
names(population) <- c('USA','Mexico','Russia')
Then you can access each using the string representation of the name of each country:
population[['USA']] <- 10000
Or,
region <- 'USA'
population[[region]]
In this example, I've assigned a single value to a list element, lists will hold any other data type, including matrices or data frames. It will be a lot less typing than using sprintf and assign, and a lot safer and more efficient as well.
See ?get. Here is an example:
> country <- "FOO"
> assign(sprintf("population.%s", country), data.frame(runif(5), runif(5)))
>
> get(sprintf("population.%s", country))[,2]
[1] 0.2241105 0.5640709 0.5945869 0.1830719 0.1895938
It is critically important to look at the object returned by a function if you get an error. It is immediately clear why your example fails if you just look at what it returns:
> sprintf("population.%s", country)
[1] "population.FOO"
At that point it would be immediately clear, if you didn't already know or have thought to read ?sprintf, that sprintf() returns a string not the object of that name. Armed with that knowledge you would have narrowed down the problem to how to recall an object from the computed name?

R: partimat function doesn't recognize my classes

I am a relatively novice r user and am attempting to use the partimat() function within the klaR package to plot decision boundaries for a linear discriminant analysis but I keep encountering the same error. I have tried inputing the arguments multiple different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column heading.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
From the source code of the partimat.default function getAnywhere(partimat.default) it states
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or in method 2 try removing the "sources1$"on each of your variable names in the formula as you specify the data frame in which to look for these variable names in the data argument. I think you are effectively specifying the dataframe twice and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH

Resources