I have a binary time series with 359 observations.
like this:
0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 ...
I want to generate n data samples with the same intervals but in permuted order.
To do this, I first found the times at which the original data changed from zero to one, something like this:
147 65 10 251
and then randomized the order of the intervals into something like these:
251 10 65 147
10 251 147 65
...
So far my code is something like this:
mydata <- "C:/Users/me/Desktop/2.xlsx"
library("xlsx")
library("tseries")
my_data <- read.xlsx(mydata, sheetName = "Sheet1", header = F)
file <- "C:/Users/me/Desktop/pp.xlsx"
ts=my_data[6]
ts=unlist(ts)
for (i in 1:100){
  diff.ts <- diff(ts)
  x <- sample(diff(which(diff.ts == 1)))
  print(x)
  write.xlsx(x, file[i], sheetName = "Sheet1", col.names = TRUE, row.names = FALSE, append = FALSE, password = NULL)
}
However, I cannot store all of these in an .xlsx file, even though they look fine when printed.
My second problem is that I do not only want to know at which times the series goes from 0 to 1; I also want to write the samples out in the same form as the original data. For example, if in one of the randomized samples the intervals are 10 251 147 65, I want a stored column with a 1 in the 10th, 251st, 147th and 65th rows and a 0 in every other row, something like this:
0 0 0 0 0 0 0 0 0 1 0 0 ...
Sorry for English errors
The interval objective is unclear, but your permuted-interval question is perhaps answered with the sample() function below, which will randomly pick a distribution of 1s and 0s. You can also adjust the probabilities to change the chance that a 0 or a 1 is selected; here it is 50/50. Additionally, if you want a random sample but need your code to be repeatable, you can set a seed in your session so the same permutation is drawn each time, e.g. set.seed(123456), picking whatever seed you feel is appropriate.
sample(x=c(0,1),size=359,replace=T,prob=c(0.5,0.5))
Alternatively, your question might suggest wanting to set values equal to 1 at specific indices. Here, for your example of 147, 65, 10, 251, you can do:
intervals <- rep(0,359)
intervals[c(147,65,10,251)] <- 1
Or perhaps like this?
intervals <- rep(0,359)
intervals[sample(c(147,65,10,251))] <- 1
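If those numbers are meant as gaps between successive 1s (which is what diff(which(diff.ts == 1)) in your code produces) rather than absolute row positions, then cumsum() turns the permuted gaps back into positions. A minimal sketch under that assumption:
gaps <- c(147, 65, 10, 251)      # hypothetical gaps between successive 1s
pos  <- cumsum(sample(gaps))     # permute the gaps, then convert them to row positions
series <- rep(0, max(pos))       # in your data this length would be 359, if the gaps fit
series[pos] <- 1                 # 1 at each cumulative position, 0 everywhere else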
As this is a two-part question, here is an answer to your Excel-writing issue: you have write.xlsx called from within the for loop, meaning that you write a file on every iteration. This may or may not be the behaviour you want; I assume writing the entire dataset once is preferable. Moreover, because you specify file[i] as the output and your variable file is a single value (a length-one vector), you will get errors. You can either change the call to write.xlsx(x, paste0("my_file_num", i, ".xlsx"), ...) to write a separate file per iteration, or move the call outside the loop, as illustrated below.
file <- "C:/Users/me/Desktop/pp.xlsx"
ts=my_data[6]
ts=unlist(ts)
samples <- NULL
for (i in 1:100){
  diff.ts <- diff(ts)
  x <- sample(diff(which(diff.ts == 1)))
  samples <- append(samples, list(samples = x))
  print(x)
}
write.xlsx(samples,file, sheetName = "Sheet1",col.names=TRUE, row.names=FALSE, append=FALSE, password=NULL)
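If you would rather keep one workbook per permutation (the paste0() route mentioned above), that variant could look roughly like this; the output folder, file name pattern and column name are just examples:
out_dir <- "C:/Users/me/Desktop"   # example output folder
for (i in 1:100){
  x <- sample(diff(which(diff(ts) == 1)))
  write.xlsx(data.frame(intervals = x),
             file.path(out_dir, paste0("pp_", i, ".xlsx")),
             sheetName = "Sheet1", row.names = FALSE)
}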
Related
I have a data frame with 10 columns and 4 rows, and a vector of length 10.
My vector contains 7× 0 and 3× 1. I want to make a sample of my data frame that picks exactly those variables where a 1 appears in my vector.
E.g. if my vector looks like this: 0 0 1 0 1 1 0 0 0 0, then I want the 3rd, 5th and 6th rows to be picked by the sample.
How can I do that?
Can I add anything to this command to make it work? s <- sample_n(df$col1, 3)
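For reference, one common way to do this is plain logical indexing rather than sample_n(); a minimal sketch with made-up data, assuming the vector lines up with the rows of df:
df <- data.frame(col1 = 1:10, col2 = letters[1:10])  # made-up example data
v  <- c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0)                # the 0/1 selector vector
picked <- df[v == 1, ]                               # keeps exactly the rows where v is 1 (rows 3, 5 and 6)
picked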
I am about to run a multilevel analysis (and I am a total newbie).
In this analysis I want to test whether a high value of one predictor (here: senseofhumor, a numeric value transformed into "high", "low", "medium") predicts the (numeric) outcome better than the other (numeric) predictors (senseofhumor, seriousness, friendlyness).
I have a dataset with many people and groups and want to compare the outcome between the groups with regard to the influence of SenseofhumorHIGH.
The code for that might look like this:
RandomslopeEC <- lme(criteria(timepoint1) ~ senseofhumor + seriousness + friendlyness, data = DATA, random = ~ SenseofhumorHIGH | group)
For that reason I created the values "high", "low" and "medium" for my numeric predictor via:
library(tidyverse)
DATA <- DATA %>%
  mutate(predictorNew = case_when(
    senseofhumor < quantile(senseofhumor, 0.5)  ~ 'low',
    senseofhumor > quantile(senseofhumor, 0.75) ~ 'high',
    TRUE ~ 'med'
  ))
Now they look like this:
Person  Group  senseofhumor
1       56     low
7       1      high
87      7      low
764     45     high
Now I realized I might need to split this variable's values into separate variables if I want to test my idea.
Do any of you know how to generate variables so that they look like this?
Person  Group  senseofhumorHIGH  senseofhumorMED  senseofhumorLOW
1       56     0                 0                1
7       1      1                 0                0
87      7      0                 0                1
764     45     1                 0                0
51      3      1                 0                0
362     9      1                 0                0
87      27     0                 0                1
Does this make any sense to you regarding my approach? Or do you have a better idea?
Thanks a lot in advance
Welcome to learning R. You will want to convert these types of variables to "factors," and R will be able to process them accordingly. To do that, use as.factor(variable) - so for you it may be DATA$senseofhumor <- as.factor(DATA$senseofhumor). If you need to convert multiple columns, you can use:
factor_cols <- c("Var1","Var2","Var3") # list columns you want as factors
DATA[factor_cols] <- lapply(DATA[factor_cols], as.factor)
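If you also want explicit 0/1 columns like senseofhumorHIGH from your example table, one option is model.matrix(), which expands a factor into indicator columns; a minimal sketch, assuming predictorNew holds the "low"/"med"/"high" labels created above:
dummies <- model.matrix(~ predictorNew - 1, data = DATA)  # "- 1" drops the intercept, giving one 0/1 column per level
colnames(dummies) <- c("senseofhumorHIGH", "senseofhumorLOW", "senseofhumorMED")  # levels come out alphabetically: high, low, med
DATA <- cbind(DATA, dummies)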
Since you are new, note that this forum is typically for questions that can't easily be answered by searching online. This question is relatively routine and more details can be found with a quick Google search. While SO is a great place to learn R, you may be penalized by the SO community in the future for routine questions like this. Just trying to help ensure you keep learning!
I have been continuing to learn R to transition away from Excel, and I am wondering what the best way to approach the following problem is, or at least what tools are available to me:
I have a large data set (100K+ rows) and several columns that I could generate a signal from, and each value in those columns can range between 0 and 3.
sig1 sig2 sig3 sig4
1 1 1 1
1 1 1 1
1 0 1 1
1 0 1 1
0 0 1 1
0 1 2 2
0 1 2 2
0 1 1 2
0 1 1 2
I want to generate composite signals using the state of each cell in the four columns and then see what each of the composite signals tells me about the returns in a time series. For this question the scope is only generating the combinations.
So, for example, one composite signal would be when all four cells in the columns equal 0. I could generate a new column that reads TRUE when that case is true and FALSE in every other case, and then go on to figure out how that affects the returns in the rest of the data frame.
The thing is, I want to check all combinations of the four columns, so 0000, 0001, 0002, 0003 and so on, which is quite a few. With the extent of my knowledge of R, I only know how to do that by using mutate() for each combination and explicitly entering the condition to check. I assume there is a better way to do this, but I haven't found it yet.
Thanks for the help!
I think that you could paste the columns together to get unique combinations, then just turn this into dummy variables:
library(dplyr)
library(dummies)
# Create sample data
data <- data.frame(sig1 = c(1,1,1,1,0,0,0),
sig2 = c(1,1,0,0,0,1,1),
sig3 = c(2,2,0,1,1,2,1))
# Paste together
data <- data %>% mutate(sig_tot = paste0(sig1,sig2,sig3))
# Generate dummies
data <- cbind(data, dummy(data$sig_tot, sep = "_"))
# Turn to logical if needed
data <- data %>% mutate_at(vars(contains("data_")), as.logical)
data
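If the dummies package is not available, a base R sketch of the same idea, reusing the sig_tot column created above and building one logical column per observed combination (the combo_ names are just illustrative):
combos <- sort(unique(data$sig_tot))                     # every combination that occurs
flags  <- sapply(combos, function(v) data$sig_tot == v)  # logical matrix, one column per combination
colnames(flags) <- paste0("combo_", combos)
data <- cbind(data, flags)
data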
I am trying to convert a data.frame to a table without packages. Basically I used the cookbook as a reference and tried converting from a data frame, using both named and unnamed vectors. The data set is the Stack Overflow survey from Kaggle.
moreThan1000 is a data.frame that stores the countries with more than 1000 Stack Overflow users, sorted by the Number column, as shown below:
moreThan1000 <- subset(users, users$Number >1000)
moreThan1000 <- moreThan1000[order(moreThan1000$Number),]
when I try to convert it to a table like
tbl <- table(moreThan1000)
tbl <- table(moreThan1000$Country, moreThan1000$Number)
tbl <- table(moreThan1000$Country, moreThan1000$Number, dnn = c("Country","Number"))
After each attempt my conversion comes out as a large cross-table of every country against every Number value.
Why does the moreThan1000 data.frame not send just the related countries to the table, but all countries? The conversion looks like a matrix to me.
I believe this is because countries do not relate to each other: to each country corresponds a number, and to another country corresponds an unrelated number. So the best way to reflect this is the original data.frame, not a table that will have just a single 1 per row (unless two countries have the very same number of Stack Overflow users). I haven't downloaded the dataset you're using, but look at what happens with a fake dataset, ordered by number just like your moreThan1000:
dat <- data.frame(A = letters[1:5], X = 21:25)
table(dat$A, dat$X)
  21 22 23 24 25
a  1  0  0  0  0
b  0  1  0  0  0
c  0  0  1  0  0
d  0  0  0  1  0
e  0  0  0  0  1
Why would you expect anything different from your dataset?
The function "table" is used to tabulate your data.
So it will count how often every value occurs (in the "number"column!). In your case, every number only occurs once, so don't use this function here. It's working correctly, but it's not what you need.
Your data is already a tabulation, no need to count frequencies again.
You can check whether there is an object-conversion function; I guess you are looking for as.table rather than table.
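Since your data are already counts, another option is to build the table directly from them with xtabs() from base R's stats package, which accepts a count column on the left-hand side of the formula; a minimal sketch with made-up values:
moreThan1000 <- data.frame(Country = c("Germany", "France", "Spain"),  # hypothetical stand-in for the real data
                           Number  = c(1500, 1200, 1100))
tbl <- xtabs(Number ~ Country, data = moreThan1000)  # one cell per country, holding its count
tbl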
I have been using the wfm function in the "qdap" package for transposing text row values into columns and ran into a problem when the data contains numbers along with text. For example, if the row value is "abcdef" the transpose works fine, but if the value is "ab1000" the numbers get truncated. Can anyone help with suggestions on how to work around this?
Approach tried so far:
input <- read.table(header=F, text="101 ab0003
101 pp6500
102 sm2456")
colnames(input) <- c("id","channel")
library(qdap)
output <- t(with(input, wfm(channel, id)))
output <- as.data.frame(output)
expected_output<- read.table(header=F,text="1 1 0
0 0 1")
colnames(expected_output) <- c("ab0003","pp6500", "sm2456")
I think maybe wfm isn't the right tool for this job. It seems you don't really have sentences that you want to split into words, so you're using a function with a lot of unnecessary overhead. What you really want is to tabulate the values you have by another grouping variable.
Here are two approaches. One using qdapTools's mtabulate, another using base R's table:
library(qdapTools)
mtabulate(with(input, split(channel, id)))
## ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
t(with(input, table(channel, id)))
## channel
## id ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
It may be that your MWE does not reflect the complexity of the data; if that is the case, it brings us back to the original problem. wfm uses the tm package as a backend for some of the manipulations, so we need to supply something through the dots (...). I re-read the documentation and this is a bit confusing (I have added this info in the dev version), but we want to pass removeNumbers = FALSE through to TermDocumentMatrix, as seen here:
output <- t(with(input, wfm(channel, id, removeNumbers=FALSE)))
as.data.frame(output)
## ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1