I have two data frames. The first one (A) contains information about GOALs, and the second one (B) contains the specific information about the IDs that had each GOAL:
> A
GOAL
1 A116642173
2 A116642174
3 A116642175
4 A116642176
5 A116642178
6 A116642181
> B
ID GOAL
1 1873 A116433509
2 478 A116642178
3 2165 A116192937
4 165 A116192937
5 313 A116433701
6 475 A116367456
I would like to create new columns in one of them based on the other data frame. So, first I create the additional columns:
> idkids=c(313,475,165,478,1873,2165)
> ids<-c(idkids)
> A[ ,paste0(ids)]<-0
> A
GOAL 313 475 165 478 1873 2165
1 A116642173 0 0 0 0 0 0
2 A116642174 0 0 0 0 0 0
3 A116642175 0 0 0 0 0 0
4 A116642176 0 0 0 0 0 0
5 A116642178 0 0 0 0 0 0
6 A116642181 0 0 0 0 0 0
I tried to use ifelse to find the GOAL for a specific ID, but it didn't work. I have tried to do this in two ways:
for (i in 1:kids){
A[ ,i+1]<-ifelse(A[ ,i+1]%in%B$ID,"",
ifelse(A$GOAL%in%B$GOAL, 1, 0))
}
for (i in 1:kids){
A[ ,i+1]<-ifelse(A[,i+1]%in%B$ID & A$GOAL%in%B$GOAL,1,0)
}
But my code didn't recognize the specific ID, and it didn't give me 1 (TRUE) or 0 (FALSE) as expected. It gives me 0 for all the columns... Can anyone help me, please?
Here is one method: reshape the 'B' data into 'wide' format and then do a join.
library(dplyr)
library(tidyr)
pivot_wider(B, names_from = ID, values_from = ID, values_fn = length,
values_fill = 0) %>%
right_join(A)
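If you would rather stay in base R, the same indicator columns can be sketched without reshaping. This assumes the A, B and ids objects from the question (rebuilt here so the snippet runs on its own); flags and A_wide are names made up for illustration. As far as I can tell, with the join above any GOAL in A that never appears in B comes back as NA rather than 0 unless you fill it afterwards, whereas the version below gives 0 directly.

```r
# Data rebuilt from the question
A <- data.frame(GOAL = c("A116642173", "A116642174", "A116642175",
                         "A116642176", "A116642178", "A116642181"))
B <- data.frame(ID   = c(1873, 478, 2165, 165, 313, 475),
                GOAL = c("A116433509", "A116642178", "A116192937",
                         "A116192937", "A116433701", "A116367456"))
ids <- c(313, 475, 165, 478, 1873, 2165)

# One column per ID: 1 where that row's GOAL is recorded for the ID in B, else 0
flags <- sapply(ids, function(id) as.integer(A$GOAL %in% B$GOAL[B$ID == id]))
colnames(flags) <- as.character(ids)

# cbind on a data frame keeps non-syntactic column names such as "478"
A_wide <- cbind(A, flags)
```

With the sample data only ID 478 has a GOAL that also occurs in A, so a single cell ends up as 1.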
I have the following data in model$data:
model$data
[[1]]
Category1 Category2 Category3 Category4
3555 1 0 0 0
6447 1 0 0 0
5523 1 0 1 0
7550 1 0 1 0
6330 1 0 1 0
2451 1 0 0 0
4308 1 0 1 0
8917 0 0 0 0
4780 1 0 1 0
6802 1 0 1 0
2021 1 0 0 0
5792 1 0 1 0
5475 1 0 1 0
4198 1 0 0 0
223 1 0 1 0
4811 1 0 1 0
678 1 0 1 0
I am trying to use this call to pick one of the column names at random:
sample(colnames(model$data), 1)
But I receive the following error message:
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
Is there a way to avoid that error?
Notice this?
model$data
[[1]]
The [[1]] means that model$data is a list, whose first component is a data frame. To do anything with it, you need to pass model$data[[1]] to your code, not model$data.
sample(colnames(model$data[[1]]), 1)
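To see why the original call fails, here is a small sketch with a made-up model list (the real model object isn't shown in the question, so its structure is an assumption): colnames() of a plain list is NULL, which leaves sample() with nothing to draw from.

```r
# Hypothetical stand-in for the asker's object: a list holding one data frame
model <- list(data = list(data.frame(Category1 = c(1, 1, 0),
                                     Category2 = c(0, 0, 0),
                                     Category3 = c(0, 1, 0),
                                     Category4 = c(0, 0, 0))))

colnames(model$data)        # NULL: a plain list has no column names
colnames(model$data[[1]])   # the four Category names

set.seed(1)                 # only so the draw is repeatable
picked <- sample(colnames(model$data[[1]]), 1)
```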
This seems to be a near-duplicate of "Random rows in dataframes in R" and should probably be closed as a duplicate. But for completeness, adapting that answer to sampling column indices is trivial:
You don't need to generate a vector of column names, only their indices. Keep it simple.
Sample your column indices from 1:ncol(df) instead of 1:nrow(df).
Then put those column indices on the right-hand side of the comma in df[, ...]:
df[, sample(ncol(df), 1)]
The 1 is there because you apparently want a sample of size 1.
One minor complication is that your data frame is model$data[[1]], since model$data looks like a list with one element that is a data frame, rather than a plain data frame. So first, assign df <- model$data[[1]].
Finally, if you really want the sampled column name(s) as well as their indices:
samp_col_idxs <- sample(ncol(df), 1)
samp_col_names <- colnames(df)[samp_col_idxs]
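Putting those steps together, here is a sketch under the same assumption (model is rebuilt as a hypothetical list with a single data frame). The drop = FALSE is my addition: without it, selecting a single column returns a bare vector rather than a one-column data frame.

```r
# Hypothetical stand-in for the asker's model object
model <- list(data = list(data.frame(Category1 = c(1, 1, 0),
                                     Category3 = c(0, 1, 0))))
df <- model$data[[1]]                      # unwrap the list first

samp_col_idxs  <- sample(ncol(df), 1)      # one random column index
samp_col_names <- colnames(df)[samp_col_idxs]
samp_col       <- df[, samp_col_idxs, drop = FALSE]  # stays a data frame
```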
I have an object, currency. I would like to select one column, named by the variable Pair, and only the rows where it equals 1.
>currency
EURUSD EURUSDi USDJPY USDJPYi GBPUSD GBPUSDi AUDUSD AUDUSDi XAUUSD XAUUSDi zeroes
2000-07-16 0 0 0 0 0 1 0 0 0 0 0
2000-07-23 0 0 0 0 0 1 0 0 0 0 0
2000-07-30 0 0 0 0 0 1 0 0 0 0 0
2000-08-06 0 0 0 0 0 0 0 0 0 1 0
2000-08-13 0 1 0 0 0 0 0 0 0 0 0
From the console I can do it with subset like this :
> subset(currency$GBPUSDi, GBPUSDi == 1)
GBPUSDi
2000-07-16 1
2000-07-23 1
2000-07-30 1
2000-08-06 1
2000-08-13 1
2000-08-20 1
But as soon as it is passed in a script with variable Pair it fails. I've searched for hours in the documentation and I'm having a headache trying to figure out what is wrong.
Here are the different commands I've tried:
subset (currency$Pair, Pair == 1)
subset (currency, Pair = 1, select = Pair)
weights$Cur[currency$Pair = 1]
The one that works is currency[,c(Pair)], but it only selects the column. How can I add the row selection Pair == 1?
currency[,c(Pair)][Pair = 1] and subset (currency[,c(Pair)], Pair = 1), with either = or ==, don't work.
currency$Pair[currency$Pair == 1] should work ($Pair selects the column Pair, and [currency$Pair == 1] keeps the values equal to 1). It looks like it doesn't work in your case because currency doesn't contain a column named Pair.
If currency is not a data frame but a matrix, you can try:
currency[currency[, c("Pair")] == 1, c("Pair")]
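If, as the question suggests, Pair is a variable holding the column name as a string, then currency$Pair will not help, because $ looks for a column literally called Pair. A minimal sketch of indexing by the variable instead, assuming currency is a matrix-like object with row names (the toy values below are made up, and hits is a name of my own):

```r
# Toy stand-in for the asker's object: a numeric matrix with dates as row names
currency <- cbind(EURUSDi = c(0, 0, 0, 0, 1),
                  GBPUSDi = c(1, 1, 1, 0, 0))
rownames(currency) <- c("2000-07-16", "2000-07-23", "2000-07-30",
                        "2000-08-06", "2000-08-13")

Pair <- "GBPUSDi"   # the column name is stored in a variable

# Rows where the chosen column equals 1; drop = FALSE keeps the matrix shape
hits <- currency[currency[, Pair] == 1, Pair, drop = FALSE]
```

The same bracket indexing also works on a data frame, which is why it is usually the safer choice inside a script.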
I have a series of data in true/false format; e.g., it looks like it could be generated by rbinom(n, 1, .1). I want a column that gives the number of rows since the last true. So the resulting data will look like:
true/false gap
0 0
0 0
1 0
0 1
0 2
1 0
1 0
0 1
What is an efficient way to go from true/false to gap? (In practice this will be done on a large dataset with many different ids.)
DF <- read.table(text="true/false gap
0 0
0 0
1 0
0 1
0 2
1 0
1 0
0 1", header=TRUE)
DF$gap2 <- sequence(rle(DF$true.false)$lengths) * #create a sequence for each run length
(1 - DF$true.false) * #multiply with 0 for all 1s
(cumsum(DF$true.false) != 0L) #multiply with zero for the leading zeros
# true.false gap gap2
#1 0 0 0
#2 0 0 0
#3 1 0 0
#4 0 1 1
#5 0 2 2
#6 1 0 0
#7 1 0 0
#8 0 1 1
The cumsum part might not be the most efficient for large vectors. Something like
if (DF$true.false[1] == 0) DF$gap2[seq_len(rle(DF$true.false)$lengths[1])] <- 0
might be an alternative (and of course the rle result could be stored temporarily to avoid calculating it twice).
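Since the question mentions many different ids, here is a sketch of applying the same rle/sequence idea per group with ave(). The id column, the flag column name and the helper gap_since_true are assumptions of mine, not part of the question.

```r
# Rows since the last 1 within a single 0/1 vector (0 before the first 1)
gap_since_true <- function(x) {
  runs <- rle(x)                       # computed once and reused
  sequence(runs$lengths) * (1 - x) * (cumsum(x) != 0L)
}

# Toy data: the question's series split across two hypothetical ids
DF <- data.frame(id   = rep(c("a", "b"), times = c(5, 3)),
                 flag = c(0, 0, 1, 0, 0, 1, 1, 0))

# ave() applies the helper separately to each id and keeps the row order
DF$gap <- ave(DF$flag, DF$id, FUN = gap_since_true)
```

For very large data a grouped operation in data.table or dplyr would follow the same pattern, with gap_since_true as the per-group function.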
OK, let me put this in an answer.
1) No brainer method
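data['gap'] = 0
for (i in 2:nrow(data)) {
  # extend the previous gap on a 0; rows with a 1 keep gap = 0
  if (data[i, 'true/false'] == 0) {
    data[i, 'gap'] = data[i-1, 'gap'] + 1
  }
}
# rows before the first 1 have no "last true", so reset them to 0
data[cumsum(data[, 'true/false']) == 0, 'gap'] = 0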
2) No if check
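data['gap'] = 0
for (i in 2:nrow(data)) {
  # (1 - flag) is 0 on a 1 (resets the gap) and 1 on a 0 (keeps counting)
  data[i, 'gap'] = (data[i-1, 'gap'] + 1) * (1 - data[i, 'true/false'])
}
data[cumsum(data[, 'true/false']) == 0, 'gap'] = 0  # no gap before the first 1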
I really don't know which is faster: both do the same number of reads from data, but (1) has an if statement, and I don't know how fast that is compared to a single multiplication.
I am a new R user. Currently I am working on a dataset in which I have to transform multiple binary columns into a single factor column.
Here is an example.
The current dataset looks like:
$ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
$ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
$ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
$ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
Property.RealEstate Property.Insurance Property.CarOther Property.Unknown
1 0 0 0
0 1 0 0
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Recoded column should be:
Property
1 Real estate
2 Insurance
3 Real estate
4 Insurance
5 CarOther
6 Unknown
It is basically a reverse of melt.matrix function.
Thank you all for your valuable input. It does work.
One issue, though:
I have some rows that look like this:
Property.RealEstate Property.Insurance Property.CarOther Property.Unknown
0 0 0 0
I want these to be marked as NA or NULL.
It would help if you could suggest something for this as well.
Thank you.
> mat <- matrix(c(0,1,0,0,0,
+ 1,0,0,0,0,
+ 0,0,0,1,0,
+ 0,0,1,0,0,
+ 0,0,0,0,1), ncol = 5, byrow = TRUE)
> colnames(mat) <- c("Level1","Level2","Level3","Level4","Level5")
> mat
Level1 Level2 Level3 Level4 Level5
[1,] 0 1 0 0 0
[2,] 1 0 0 0 0
[3,] 0 0 0 1 0
[4,] 0 0 1 0 0
[5,] 0 0 0 0 1
Create a new factor based upon the index of each 1 in each row
Use the matrix column names as the labels for each level
NewFactor <- factor(apply(mat, 1, function(x) which(x == 1)),
labels = colnames(mat))
> NewFactor
[1] Level2 Level1 Level4 Level3 Level5
Levels: Level1 Level2 Level3 Level4 Level5
You can also try:
factor(mat%*%(1:ncol(mat)), labels = colnames(mat))
You can also use Tomas's solution, which I found somewhere on SO:
as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
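For the follow-up about rows that are all zeros: the approaches above assume exactly one 1 per row. Here is a sketch that returns NA for all-zero rows instead, using max.col() (mat0 and idx are illustrative names, and a row with more than one 1 would simply take the first):

```r
# Toy matrix with one all-zero row
mat0 <- matrix(c(0, 1, 0,
                 0, 0, 0,    # all-zero row, should become NA
                 1, 0, 0,
                 0, 0, 1), ncol = 3, byrow = TRUE)
colnames(mat0) <- c("Level1", "Level2", "Level3")

idx <- max.col(mat0, ties.method = "first")  # column of the first 1 in each row
idx[rowSums(mat0) == 0] <- NA                # no 1 at all: mark as missing
NewFactor <- factor(colnames(mat0)[idx], levels = colnames(mat0))
```

Passing levels explicitly keeps every column as a level even if no row happens to use it.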
Melt is certainly a solution. I'd suggest using the reshape2 melt as follows:
library(reshape2)
df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
Property.Insurance=c(0,1,0,1,0,0),
Property.CarOther=c(0,0,0,0,1,0),
Property.Unknown=c(0,0,0,0,0,1))
#add id column (presumably you have ids more meaningful than row numbers)
df$row=1:nrow(df)
#melt to "long" format
long=melt(df,id="row")
#only keep 1's
long=long[which(long$value==1),]
#merge in ids for NA entries
long=merge(df[,"row",drop=FALSE],long,all.x=TRUE)
#clean up to match example output
long=long[order(long$row),"variable",drop=FALSE]
names(long)="Property"
long$Property=gsub("Property.","",long$Property,fixed=TRUE)
#results
long
Alternately, you can just do it in the naïve way. I think it's more transparent than any of the other suggestions (including my other suggestion).
df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
Property.Insurance=c(0,1,0,1,0,0),
Property.CarOther=c(0,0,0,0,1,0),
Property.Unknown=c(0,0,0,0,0,1))
propcols=c("Property.RealEstate", "Property.Insurance", "Property.CarOther", "Property.Unknown")
df$Property=NA
for (colname in propcols) {
  coldata=df[,colname]
  # label every row that has a 1 in this column
  df$Property[which(coldata==1)]=colname
}
df$Property=gsub("Property.","",df$Property,fixed=TRUE)
Something different:
Get the data:
dat <- data.frame(Property.RealEstate=c(1,0,1,0,0,0),Property.Insurance=c(0,1,0,1,0,0),Property.CarOther=c(0,0,0,0,1,0),Property.Unknown=c(0,0,0,0,0,1))
Reshape it:
names(dat)[row(t(dat))[t(dat)==1]]
#[1] "Property.RealEstate" "Property.Insurance" "Property.RealEstate"
#[4] "Property.Insurance" "Property.CarOther" "Property.Unknown"
If you want it cleaned up, do:
gsub("Property\\.","",names(dat)[row(t(dat))[t(dat)==1]])
#[1] "RealEstate" "Insurance" "RealEstate" "Insurance" "CarOther" "Unknown"
If you prefer a factor output:
factor(row(t(dat))[t(dat)==1],labels=names(dat))
...and cleaned up:
factor(row(t(dat))[t(dat)==1],labels=gsub("Property\\.","",names(dat)) )