R: error in t test

https://filebin.net/3et86d1gh8cer9mu is an example subset of my data.
I am trying to apply code that already worked on similar data, and now I can't track down where it goes wrong. The code goes like this:
url <- 'https://filebin.net/3et86d1gh8cer9mu/TCA_subset_GnoG_melt.csv'
TCA_subset_GnoG_melt <- read.csv(url)
L <- data.frame()
IDs <- unique(TCA_subset_GnoG_melt$X1)
for (i in 1 : length(IDs)){
  temp<-TCA_subset_GnoG_melt[(TCA_subset_GnoG_melt$X1)==IDs[i],]
  temp<- na.omit(temp)
  t_test_CTROL_ABC.7<- t.test(temp$value[temp$X1.1=="CTROL"], temp$value[temp$X1.1=="ABC.7"])
  t_test_CTROL_ABC.8<- t.test(temp$value[temp$X1.1=="CTROL"], temp$value[temp$X1.1=="ABC.8"])
  t_test_CTROL_ABC.7.8<- t.test(temp$value[temp$X1.1=="CTROL"], temp$value[temp$X1.1=="ABC7.8"])
  t_test_ABC.7_ABC.8<- t.test(temp$value[temp$X1.1=="ABC.7"], temp$value[temp$X1.1=="ABC.8"])
  t_test_ABC.7_ABC.7.8<- t.test(temp$value[temp$X1.1=="ABC.7"], temp$value[temp$X1.1=="ABC7.8"])
  t_test_ABC.8_ABC.7.8<- t.test(temp$value[temp$X1.1=="ABC.8"], temp$value[temp$X1.1=="ABC7.8"])
  LLc <- cbind(as.character(unique(IDs[i])), t_test_CTROL_ABC.7,t_test_CTROL_ABC.8,t_test_CTROL_ABC.7.8, t_test_ABC.7_ABC.8,t_test_ABC.7_ABC.7.8, t_test_ABC.8_ABC.7.8)
  L<-rbind(L,LLc)
}
AA<-rownames(L)
L$names <- AA
p_value_TCA <-L[grep("p.value",L$names), ]
df <- apply(p_value_TCA ,2,as.character)
df = as.matrix(df)
The error I get is:
Error in t.test.default(temp$value[temp$X1.1 == "CTROL"], temp$value[temp$X1.1 == :
not enough 'y' observations
I don't understand it. When I run the code line by line, it gets as far as creating LLc, and then the data frame "L" is empty. It makes no sense to me. Help!
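For what it's worth, this is the message t.test() gives when the second sample has fewer than two values. The sketch below uses made-up data (the values and the helper name safe_p are hypothetical, not taken from the linked file) to show how a group label that matches no rows leads to the error, and one possible way to guard against it. Note that the posted code compares against "ABC7.8" while the variable names say ABC.7.8; whether that label actually occurs in X1.1 is something only the data can tell.

```r
# Toy stand-in for one ID's rows; the values are invented for illustration
temp <- data.frame(
  value = c(1.2, 1.5, 1.1, 2.3, 2.8, 2.5),
  X1.1  = c("CTROL", "CTROL", "CTROL", "ABC.7", "ABC.7", "ABC.7")
)

x <- temp$value[temp$X1.1 == "CTROL"]
y <- temp$value[temp$X1.1 == "ABC7.8"]   # this label matches no rows

n_y <- length(y)                         # size of the empty subset

# Capture the error message instead of stopping the script
msg <- tryCatch(t.test(x, y)$p.value,
                error = function(e) conditionMessage(e))

# Hypothetical helper: return NA when either group is too small to test
safe_p <- function(a, b) {
  if (length(a) < 2 || length(b) < 2) return(NA_real_)
  t.test(a, b)$p.value
}

p_missing <- safe_p(x, y)
p_ok      <- safe_p(x, temp$value[temp$X1.1 == "ABC.7"])
```

Running table(temp$X1.1) for the ID where the loop stops should show which labels are present and how many values each one has.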


Why does rasterToPoints generate an error on first call but not second?

I have some code that loops over a list of study IDs (ids) and turns them into separate polygons/spatial points. On the first execution of the loop it produces the following error:
Error in (function (x) : attempt to apply non-function
This is from the raster::rasterToPoints function. I've looked at the examples in the help section for this function and passing fun=NULL seems to be an acceptable method (filters out all NA values). All the values are equal to 1 anyways so I tried passing a simple function like it suggests such as function(x){x==1}. When this didn't work, I also tried to just suppress the error message but without any luck using try() or tryCatch().
Main questions:
1. Why does this produce an error at all?
2. Why does it only display the error on the first run through the loop?
Reproducible example:
library(ggplot2)
library(raster)
library(sf)
library(dplyr)
pacific <- map_data("world2")
pac_mod <- pacific
coordinates(pac_mod) <- ~long+lat
proj4string(pac_mod) <- CRS("+init=epsg:4326")
pac_mod2 <- spTransform(pac_mod, CRS("+init=epsg:4326"))
pac_rast <- raster(pac_mod2, resolution=0.5)
values(pac_rast) <- 1
all_diet_density_samples <- data.frame(
lat_min = c(35, 35),
lat_max = c(65, 65),
lon_min = c(140, 180),
lon_max = c(180, 235),
sample_replicates = c(38, 278),
id= c(1,2)
)
ids <- all_diet_density_samples$id
for (idnum in ids){
  poly1 = all_diet_density_samples[idnum,]
  pol = st_sfc(st_polygon(list(cbind(c(poly1$lon_min, poly1$lon_min, poly1$lon_max, poly1$lon_max, poly1$lon_min), c(poly1$lat_min, poly1$lat_max, poly1$lat_max, poly1$lat_min, poly1$lat_min)))))
  pol_sf = st_as_sf(pol)
  x <- rasterize(pol_sf, pac_rast)
  df1 <- raster::rasterToPoints(x, fun=NULL, spatial=FALSE) #ERROR HERE
  df2 <- as.data.frame(df1)
  density_poly <- all_diet_density_samples %>% filter(id == idnum) %>% pull(sample_replicates)
  df2$density <- density_poly
  write.csv(df2, paste0("pol_", idnum, ".csv"))
}
Any help would be greatly appreciated!
These are error messages, but not errors in the strict sense: the script continues to run, and the results are not affected. They are related to garbage collection (removal from memory of objects that are no longer in use). This makes it tricky to pinpoint what causes them (below you can see a slightly modified example that suggests another culprit), and why they do not always appear at the same spot.
Edit (Oct 2022)
These annoying messages
Error in x$.self$finalize() : attempt to apply non-function
Error in (function (x) : attempt to apply non-function
will disappear with the next release of Rcpp, which is planned for Jan 2023. You can also install the development version of Rcpp like this:
install.packages("Rcpp", repos="https://rcppcore.github.io/drat")

KNN: "no missing values are allowed" -> I do not have missing values

I am in a group project for a class and one of the people in my group ran the normalization, as well as creating the test/train sets so that we all have the same sets to work from (we're all utilizing different algorithms). I am assigned with running the KNN algorithm.
We had multiple columns with NAs, so those columns were dropped (<-NULL). When I try to run the KNN, I keep getting this error:
Error in knn(train = trainsetne, test = testsetne, cl = ne_train_target, :
no missing values are allowed
I ran which(is.na(dataset$col)) and found:
which(is.na(testsetne$median_days_on_market))
# [1] 8038 8097 8098 8100 8293 8304
When I look through the dataset those cells do not have missing data.
I would like help either finding and fixing whatever triggers the "no missing values are allowed" error, or finding a workaround (if there is one).
I am sorry if I am missing something simple. Any help is appreciated.
I have listed the code that we have below:
ne$pending_ratio_yy <- ne$total_listing_count_yy <- ne$average_listing_price_yy <- ne$median_square_feet_yy <- ne$median_listing_price_per_square_feet_yy <- ne$pending_listing_count_yy <- ne$price_reduced_count_yy <- ne$median_days_on_market_yy <- ne$new_listing_count_yy <- ne$price_increased_count_yy <- ne$active_listing_count_yy <- ne$median_listing_price_yy <- ne$flag <- NULL
ne$pending_ratio_mm <- ne$total_listing_count_mm <- ne$average_listing_price_mm <- ne$median_square_feet_mm <- ne$median_listing_price_per_square_feet_mm <- ne$pending_listing_count_mm <- ne$price_reduced_count_mm <- ne$price_increased_count_mm <- ne$new_listing_count_mm <- ne$median_days_on_market_mm <- ne$active_listing_count_mm <- ne$median_listing_price_mm <- NULL
ne$factor_month_date <- as.factor(ne$month_date_yyyymm)
ne$factor_median_days_on_market <- as.factor(ne$median_days_on_market)
train20ne= sample(1:20893, 4179)
trainsetne=ne[train20ne,1:10]
testsetne=ne[-train20ne,1:10]
#This is where I start to come in
ne_train_target <- ne[train20ne, 3]
ne_test_target <- ne[-train20ne, 3]
predict_1 <- knn(train = trainsetne, test = testsetne, cl=ne_train_target, k=145)
# Error in knn(train = trainsetne, test = testsetne, cl = ne_train_target, :
# no missing values are allowed
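One possible explanation for the mismatch is that which() returns positions within testsetne, while the row labels shown when browsing the data still come from the original ne. The sketch below uses a made-up data frame (ne_toy and its columns are hypothetical) to show that difference, how to count NAs per column, and how complete.cases() could drop the affected rows. It is only an illustration of the check, not a statement about what the real data contains.

```r
# Hypothetical toy data with a few missing values
ne_toy <- data.frame(
  price = c(10, 12, NA, 15, 11, NA, 14, 13),
  days  = c(30, NA, 25, 40, 35, 20, 22, 28)
)

train_idx <- c(1, 4, 5)
test_toy  <- ne_toy[-train_idx, ]          # keeps the original row labels

# How many NAs does each column of the test set contain?
na_counts <- colSums(is.na(test_toy))

# which() gives positions inside test_toy ...
pos  <- which(is.na(test_toy$price))
# ... while the row labels still refer to rows of ne_toy
orig <- rownames(test_toy)[pos]

# Keep only rows without any missing value
keep       <- complete.cases(test_toy)
test_clean <- test_toy[keep, ]
```

If rows are dropped this way, the class labels (ne_train_target, ne_test_target) would need to be subset with the same logical vector so that they stay aligned with the rows.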

Subsetting a data set and plotting means

I have a data set including Year, Site, and Species Count. I am trying to write code that accounts for the fact that in some years the counts were done twice. For those years I have to find the mean count at each site for each species (there are two different species), and plot those means. This is the code I have written:
DataSet1 <- subset(channel_islands,
channel_islands$SpeciesName=="Hypsypops ubicundus, adult" |
channel_islands$SpeciesName=="Paralabrax clathratus,adult")
years<-unique(DataSet1$Year)
Hypsypops_mean <- NULL
Paralabrax_mean <- NULL
Mean <- NULL
years <- unique(DataSet1$Year)
for(i in 1:length(years)){
  data_year <- DataSet1[which(DataSet1$Year == years[i]), ]
  Hypsypops<-data_year[which(data_year$SpeciesName=="Hypsypops rubicundus,adult"), ]
  Paralabrax<-data_year[which(data_year$SpeciesName=="Paralabrax clathratus,adult"), ]
  UNIQUESITE<-unique(unique(data_year$Site))
  for(m in 1:(length(UNIQUESITE))){
    zz<-Hypsypops[Hypsypops$Site==m,]
    if(length(zz$Site)>=2){
      Meanp <- mean(Hypsypops$Count[Hypsypops$Site==UNIQUESITE[m]])
      Hypsypops_mean <- rbind(Hypsypops_mean,
                              c(UNIQUESITE[m], years[i], round(Meanp,2),
                                'Hypsypops rubicundus,adult'))
    }
    kk <- Paralabrax[Paralabrax$Site==m, ]
    if(length(kk$Site)>=2){
      Meane <- mean(Paralabrax$Count[Paralabrax$Site==UNIQUESITE[m]])
      Paralabrax_mean <- rbind(Paralabrax_mean,
                               c(UNIQUESITE[m], years[i], round(Meane, 2),
                                 'Paralabrax clathratus,adult'))
    }
  }
  if(i==1){
    Mean<-rbind(Hypsypops_mean, Paralabrax_mean)
  }
  if(i>1){
    Mean<-rbind(DataMean, Hypsypops_mean, Paralabrax_mean)
  }
  Hypsypops_mean<-NULL
  Paralabrax_mean<-NULL
}
Mean <- as.data.frame(Mean,stringsAsFactors=F)
names(Mean) <- c('Site','Year','mean_count','SpeciesName')
Mean$Site <- as.integer(Mean$Site)
Mean$Year <- as.integer(Mean$Year)
Mean$mean_count <- as.numeric(Mean$mean_count)
par(mfrow=c(5,5), oma=c(4,2,4,2), mar=c(5.5,4,3,0))
for(i in 1:length(years)){
  if(any(Mean$Year==years[i])) {
    year1<-Mean[which(Mean$Year==years[i]),]
    Species<-unique(as.character(year1$SpeciesName))
    Colors<-c("pink","purple")[Species]
    Data_Hr<-year1[year1$SpeciesName=="Hypsypops rubicundus,adult",]
    Data_Pc<-year1[year1$SpeciesName=="Paralabrax clathratus,adult",]
    plot(Data_Hr$mean_count~Data_Pc$mean_count,
         xlab=c("Hypsypops rubicundus"),
         ylab=c("Paralabrax clathratus"),main=years[i],pch=16)
  }
}
It's a lot, I'm sorry; I'm not sure how to streamline the process. I keep getting this error:
Error in names(Mean) <- c("Site", "Year", "mean_count", "SpeciesName")
: 'names' attribute [4] must be the same length as the vector [0]
Not sure how I can debug this.
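As a starting point for debugging, the message itself can be reproduced in isolation: it appears when names are assigned to a data frame that has no columns, which is what as.data.frame() returns for NULL. The sketch below (Mean_toy and years_empty are hypothetical names) shows that, and also shows that 1:length(x) still produces two iterations when x is empty. Whether Mean is NULL in the original run because the species strings in the subset() call match no rows is an assumption that would need checking against the data.

```r
# A result that was never filled in stays NULL
Mean_toy <- NULL
Mean_toy <- as.data.frame(Mean_toy)
n_cols   <- ncol(Mean_toy)               # no columns to name

# Assigning four names to zero columns raises the error from the question
msg <- tryCatch({
  names(Mean_toy) <- c("Site", "Year", "mean_count", "SpeciesName")
  "ok"
}, error = function(e) conditionMessage(e))

# 1:length(x) counts down to 0 for an empty vector; seq_along(x) does not
years_empty <- numeric(0)
iter_bad    <- 1:length(years_empty)     # c(1, 0): the loop body runs twice
iter_ok     <- seq_along(years_empty)    # integer(0): the loop body is skipped
```

Checking nrow(DataSet1) and unique(channel_islands$SpeciesName) right after the subset() call would show whether any rows survive the filter.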
I'm not sure why you want to do this with such an elaborate loop. It sounds like you are trying to summarise your data.
This can be done in different ways. Here is a solution using dplyr:
library(dplyr)
DataSet1 %>%
  group_by(Year, SpeciesName, Site) %>%
  summarise(nrecords = n(),
            Count = mean(Count))
To get a better answer, it might be helpful to post a subset of the data and the intended result you are after.
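For readers without dplyr, roughly the same per-group mean can be sketched in base R with aggregate(). This swaps in a different function from the one used in the answer above, and the toy data frame (toy, with invented counts and placeholder species labels "A" and "B") is only an assumption about how the real data is laid out.

```r
# Invented example data: some site/year/species combinations were counted twice
toy <- data.frame(
  Year        = c(2000, 2000, 2000, 2001, 2001, 2001),
  Site        = c(1, 1, 2, 1, 1, 1),
  SpeciesName = c("A", "A", "A", "A", "B", "B"),
  Count       = c(4, 6, 3, 10, 2, 4)
)

# Mean count for every Year / SpeciesName / Site combination
means <- aggregate(Count ~ Year + SpeciesName + Site, data = toy, FUN = mean)

# Look up one combination that had two counts
mean_2000_A_1 <- means$Count[means$Year == 2000 &
                             means$SpeciesName == "A" &
                             means$Site == 1]
```

Combinations that were counted only once simply keep their single value as the mean, so no special case is needed for the years with repeated counts.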

R: Package topicmodels: LDA: Error: invalid argument

I have a question regarding LDA in topicmodels in R.
I created a matrix from a data frame, with documents as rows, terms as columns, and the number of times each term occurs in a document as the values. When I tried to run LDA, I got an error message stating "Error in !all.equal(x$v, as.integer(x$v)) : invalid argument type". The data contains 1675 documents and 368 terms. What can I do to make the code work?
library("tm")
library("topicmodels")
data_matrix <- data %>%
group_by(documents, terms) %>%
tally %>%
spread(terms, n, fill=0)
doctermmatrix <- as.DocumentTermMatrix(data_matrix, weightTf("data_matrix"))
lda_head <- topicmodels::LDA(doctermmatrix, 10, method="Gibbs")
Help is much appreciated!
edit
# Toy Data
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toydata <- data.frame(documentstoy,meta1toy,meta2toy,termstoy)
I looked inside the code, and apparently the LDA() function only accepts integers as input, so you have to convert your categorical variables as below:
library('tm')
library('topicmodels')
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
toydata <- data.frame(documentstoy,meta1toy,meta2toy)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toy_unique = unique(termstoy)
for (i in 1:length(toy_unique)){
  A = as.integer(termstoy == toy_unique[i])
  toydata[toy_unique[i]] = A
}
lda_head <- topicmodels::LDA(toydata, 10, method="Gibbs")
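The wording of the error can be reproduced in base R, which may help explain it: all.equal() returns a character description rather than FALSE when its arguments differ, and applying ! to a character value gives "invalid argument type". The sketch below shows that, then builds an integer document-by-term count matrix from the toy data with table(). It is only a sketch of the integer-count idea; it does not call LDA(), and the names v, cmp, counts and m are illustrative.

```r
# all.equal() returns a description, not FALSE, when values differ
v   <- c(1, 2.5)
cmp <- all.equal(v, as.integer(v))
msg <- tryCatch(!cmp, error = function(e) conditionMessage(e))

# Toy data from the question (documents and terms only)
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa",
              "un","arc","arc","yib","yib","yib","dar","dar")

# Cross-tabulate documents against terms to get the counts
counts <- table(documentstoy, termstoy)

# Store the counts as a plain integer matrix with the same row/column labels
m <- matrix(as.integer(counts), nrow = nrow(counts),
            dimnames = dimnames(counts))
```

Every row of m has at least one non-zero entry here. A matrix like this would still need to be converted to a DocumentTermMatrix before fitting, which is not shown.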

"object not found" when running a function in R

I have created the following function:
FilterIndi <- function(infile,name, date){
  sub_file <- infile[,c("NUMBER","CREATE_DTTM_NEW", name)]
  sub_file <- subset(sub_file, name==1)
  library(data.table)
  sub_file <- setDT(sub_file)[, .SD[which.max(CREATE_DTTM_NEW)], NUMBER]
  sub_file$date <- sub_file$CREATE_DTTM_NEW
  sub_file$CREATE_DTTM_NEW <- NULL
  library(dplyr) #to do left_join
  Unique <- left_join(Unique,sub_file, by =c("NUMBER"="NUMBER"))
  Unique$name[is.na(Unique$name)] <-0
  return(Unique)
}
FilterIndi(allfile, pde, pde_date )
pde is a column in the data frame allfile, but I get the following error:
Error in '[.data.frame'(infile, c("NUMBER", "CREATE_DTTM_NEW", :
object 'pde' not found
I can't figure out how to make it work.
Can someone please help me? Thanks a lot in advance.
EDIT: I have attached an image of allfile:
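One possible reading of the error: in FilterIndi(allfile, pde, pde_date), R looks for an object called pde in the calling environment, because the argument is not quoted, and finds none. A minimal sketch, assuming the column name is passed as a string instead; filter_flag and allfile_toy are hypothetical names, the data is invented, and the data.table and left_join steps are left out.

```r
# Hypothetical, simplified version of the first two steps of the function
filter_flag <- function(infile, name) {
  # name is a character string such as "pde"
  sub_file <- infile[, c("NUMBER", "CREATE_DTTM_NEW", name)]
  # [[name]] looks the column up by its string name;
  # subset(sub_file, name == 1) would compare the string itself with 1
  sub_file[sub_file[[name]] == 1, ]
}

# Invented stand-in for allfile
allfile_toy <- data.frame(
  NUMBER          = c(1, 1, 2),
  CREATE_DTTM_NEW = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  pde             = c(1, 0, 1)
)

out <- filter_flag(allfile_toy, "pde")
```

The same string-based lookup would apply to the later lines of the original function: Unique$name refers to a column literally called "name", whereas Unique[[name]] uses the value of the argument.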
