How to pause and restart a loop in R when specific conditions are met?

This is what my data looks like:
ID <- c(rep("1.1", 10), rep("1.2", 7))
behaviour <- c("stand", "still", "eat", "lie", "lick", "still", "rum",
"still", "stand", "walk", "stand", "rum", "walk", "still",
"lie", "still", "stand")
df_behav <- data.frame(ID, behaviour)
ID behaviour
1 1.1 stand
2 1.1 still
3 1.1 eat
4 1.1 lie
5 1.1 lick
6 1.1 still
7 1.1 rum
8 1.1 still
9 1.1 stand
10 1.1 walk
11 1.2 stand
12 1.2 rum
13 1.2 walk
14 1.2 still
15 1.2 lie
16 1.2 still
17 1.2 stand
I would like to build a loop which deletes "still"-rows based on the following conditions:
I would like to run the loop for all events with the same "ID".
Within each ID, I would like the loop to start at the first "stand", pause when "lie" is reached, and restart as soon as "stand" occurs again, and so on.
Of course there are more than 2 IDs in my actual data frame, which is why I am looking for an automated approach.
What my data should look like:
ID behaviour
1 1.1 stand
2 1.1 still
3 1.1 eat
4 1.1 lie
5 1.1 lick
6 1.1 rum
7 1.1 stand
8 1.1 walk
9 1.2 stand
10 1.2 rum
11 1.2 walk
12 1.2 still
13 1.2 lie
14 1.2 stand
I am quite new to R, so what I have done so far is the 'structure' around the actual code. I am not sure if a for-loop is the right approach, so I would appreciate any helpful comment or solution to my problem.
library(plyr)   # ldply()
library(dplyr)  # filter()
mylist_3 <- list()
ids <- unique(df_behav$ID)
for (i in seq_along(ids)) {
  animal1 <- filter(df_behav, ID == ids[i])
  # here I would like to put the actual code
  animal9 <- as.character(ids[i])
  mylist_3[[animal9]] <- animal1
}
df_behav <- ldply(mylist_3, data.frame, .id = NULL)
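A minimal base R sketch of one possible approach, not taken from the thread. It assumes the rule implied by the expected output above: within each ID, "still" rows are dropped only in the stretch that begins at "lie" and lasts until the next "stand", and kept everywhere else. The helper name after_lie is hypothetical.

```r
ID <- c(rep("1.1", 10), rep("1.2", 7))
behaviour <- c("stand", "still", "eat", "lie", "lick", "still", "rum",
               "still", "stand", "walk", "stand", "rum", "walk", "still",
               "lie", "still", "stand")
df_behav <- data.frame(ID, behaviour, stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for rows from a "lie" up to (not including)
# the next "stand"; FALSE otherwise (assumed rule, see lead-in).
after_lie <- function(b) {
  state <- FALSE
  out <- logical(length(b))
  for (i in seq_along(b)) {
    if (b[i] == "lie") state <- TRUE
    if (b[i] == "stand") state <- FALSE
    out[i] <- state
  }
  out
}

# Apply the helper per ID, then put the flags back in the original row order
flags <- unsplit(lapply(split(df_behav$behaviour, df_behav$ID), after_lie),
                 df_behav$ID)

# Drop "still" rows only where the flag is TRUE
result <- df_behav[!(df_behav$behaviour == "still" & flags), ]
rownames(result) <- NULL
result
```

Because the state is tracked separately for each ID, this should scale to any number of IDs without an explicit outer loop.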

Related

How to rank data from multiple rows and columns?

Example data:
df <- data.frame("A" = c(20,40,53), "B" = c(40,11,60))
What's the easiest way in R to get from this
A B
1 20 40
2 40 11
3 53 60
to this?
A B
1 2.0 3.5
2 3.5 1.0
3 5.0 6.0
I couldn't find a way to make rank() or frank() work on multiple rows/columns and googling things like "r rank dataframe" "r rank multiple rows" yielded only questions on how to rank multiple rows/columns individually, which is weird, as I suspect the question must have been answered before.
Try rank like below
df[] <- rank(df)
or
df <- list2DF(relist(rank(df),skeleton = unclass(df)))
and you will get
> df
A B
1 2.0 3.5
2 3.5 1.0
3 5.0 6.0
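A small variant, offered as a sketch rather than as part of the answer above: it flattens the data frame explicitly with unlist() before ranking, which makes it clearer that all cells are ranked together. Ties get average ranks, which is the default of rank().

```r
df <- data.frame(A = c(20, 40, 53), B = c(40, 11, 60))

# Rank all cells jointly; unlist() flattens column by column
ranks <- rank(unlist(df))

# Put the ranks back into the original shape (also filled column by column)
df[] <- matrix(ranks, nrow = nrow(df))
df
```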

R: subset data.frame by another vector

I have a dataframe with 241 rows. It is called master and it looks like this:
Patient Sample PDMax FileName
1 1.1 6 GSM1
1 1.2 6 GSM2
2 2.1 8 GSM3
3 3.1 5 GSM4
3 3.2 7 GSM5
Now I have a vector called Biopsy with the important samples. I would like to subset the master dataframe so that only the important information is left.
This is the vector Biopsy:
1.2 2.1 3.2
The result should be like this:
Patient Sample PDMax FileName
1 1.2 6 GSM2
2 2.1 8 GSM3
3 3.2 7 GSM5
How can I do that? I tried different things like merge() or subset(), but everything failed.
Thanks!
Have a look at the data wrangling verbs inside dplyr. Hadley Wickham's book is a great place to start (http://r4ds.had.co.nz/transform.html#filter-rows-with-filter)
library(dplyr)
master %>% filter(Sample %in% Biopsy)
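For comparison, a base R sketch of the same filter without dplyr. The data frame is rebuilt from the five example rows in the question, and it assumes Sample and Biopsy are both numeric.

```r
master <- data.frame(
  Patient  = c(1, 1, 2, 3, 3),
  Sample   = c(1.1, 1.2, 2.1, 3.1, 3.2),
  PDMax    = c(6, 6, 8, 5, 7),
  FileName = paste0("GSM", 1:5),
  stringsAsFactors = FALSE
)
Biopsy <- c(1.2, 2.1, 3.2)

# Keep only the rows whose Sample value appears in Biopsy
result <- master[master$Sample %in% Biopsy, ]
result
```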

Choose the right analysis

I have a dataset in R that contains two groups, good and bad. The good group contains users that have a long lifetime, and the bad group contains users that have a short lifetime.
So good contains game_id and game_played. For example, good$game_id==1 (game 1) has been played for good$game_played==12.5 hours.
I want to investigate whether there is a difference between good and bad, and to see which game_ids make the difference between them.
I have 20 game_ids, so I don't need Principal Component Analysis to reduce the number of game_ids. How should I set up an analysis to see whether some game_ids make the difference between good and bad?
In R, the output for good looks like this:
game_id game_played
6 18.3
14 2.1
4 0.6
1 1.0
2 1.4
3 0.1
5 0.4
7 1.2
8 1.2
9 3.1
10 1.7
11 11.6
12 0.2
13 5.4
15 4.3
16 12.4
17 8.2
18 7.0
19 3.4
20 4.6
where game_id is the name of the game and game_played is the number of hours the game has been played in data good. For bad we have a similar output with different values.

Stacking columns with similar names in R

I have a CSV file whose awful format I cannot change (simplified here):
Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three
1,1,1.5,"5 Things",2,2.5,"10 Things"
2,5,5.5,"10 Things",6,6.5,"20 Things"
Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three
3,9,9.5,"15 Things",10,10.5,"30 Things"
My desired output is a new CSV containing:
inc,label,one,two,three
1,"a",1,1.5,"5 Things"
2,"a",5,5.5,"10 Things"
3,"a",9,9.5,"15 Things"
1,"b",2,2.5,"10 Things"
2,"b",6,6.5,"20 Things"
3,"b",10,10.5,"30 Things"
Basically:
lowercase the headers
strip off header prefixes and preserve them by adding them to a new column
remove header repetitions in later rows
stack each column that shares the latter part of their names (e.g. a_One and b_One values should be merged into the same column).
During this process, preserve the Inc value from the original row (there may be more than one row like this in various places).
With caveats:
I don't know the column names ahead of time (many files, many different columns). These need to be parsed if they are to be used as logic for stripping the repetitious header rows.
There may or may not be more than one column with properties like Inc that need to be preserved when everything gets stacked. Generally, Inc represents any column that does not have a prefix like a_ or b_. I have a regex to strip out these prefixes already.
So far, I've accomplished this:
> wip_path <- 'C:/path/to/horrible.csv'
> rawwip <- read.csv(wip_path, header = FALSE, fill = FALSE)
> rawwip
V1 V2 V3 V4 V5 V6 V7
1 Inc a_One a_Two a_Three b_One b_Two b_Three
2 1 1 1.5 5 Things 2 2.5 10 Things
3 2 5 5.5 10 Things 6 6.5 20 Things
4 Inc a_One a_Two a_Three b_One b_Two b_Three
5 3 9 9.5 15 Things 10 10.5 30 Things
> skips <- which(rawwip$V1==rawwip[1,1])
> skips
[1] 1 4
> filwip <- rawwip[-skips,]
> filwip
V1 V2 V3 V4 V5 V6 V7
2 1 1 1.5 5 Things 2 2.5 10 Things
3 2 5 5.5 10 Things 6 6.5 20 Things
5 3 9 9.5 15 Things 10 10.5 30 Things
> rawwip[1,]
V1 V2 V3 V4 V5 V6 V7
1 Inc a_One a_Two a_Three b_One b_Two b_Three
But then when I try to apply a tolower() to these strings, I get:
> tolower(rawwip[1,])
[1] "4" "4" "4" "4" "4" "4" "4"
And this is quite unexpected.
So my questions are:
1) How can I gain access to the header strings in rawwip[1,] so that I can reformat them with tolower() and other string-manipulating functions?
2) Once I've done that, what's the most effective way to stack the columns with shared names while preserving the inc value for each row?
Bear in mind, there will be well over a thousand repetitious columns that can be filtered down to perhaps 20 shared column names. I will not know the position of each stackable column ahead of time. This needs to be determined within the script.
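A hedged sketch addressing both questions in base R. The "4" values most likely appear because the columns were read as factors, so converting the row yields the integer level codes instead of the labels; reading with stringsAsFactors = FALSE avoids that. The sketch assumes every stackable column has the form prefix_name and that columns without an underscore (like Inc) are the ones to preserve. The inline csv_text stands in for the real file.

```r
csv_text <- 'Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three
1,1,1.5,"5 Things",2,2.5,"10 Things"
2,5,5.5,"10 Things",6,6.5,"20 Things"
Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three
3,9,9.5,"15 Things",10,10.5,"30 Things"'

# Read everything as character so the header row stays readable text
rawwip <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# 1) Header strings, lowercased
hdr <- tolower(unlist(rawwip[1, ], use.names = FALSE))

# Drop every repeated header row and apply the cleaned names
body <- rawwip[rawwip$V1 != rawwip[1, 1], ]
names(body) <- hdr

# 2) Stack: one block of rows per prefix, discovered from the names
has_prefix <- grepl("_", hdr, fixed = TRUE)   # assumption: "_" marks a prefix
prefixes <- unique(sub("_.*", "", hdr[has_prefix]))
stacked <- do.call(rbind, lapply(prefixes, function(p) {
  cols <- hdr[startsWith(hdr, paste0(p, "_"))]
  out <- cbind(body[!has_prefix], label = p, body[cols])
  names(out) <- sub("^[^_]*_", "", names(out))  # strip the prefix
  out
}))
rownames(stacked) <- NULL
stacked
```

All columns come back as character here; type.convert() can be applied afterwards if numeric columns are needed.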
You can use the base reshape() function. For example with the input
dd<-read.csv(text='Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three
1,1,1.5,"5 Things",2,2.5,"10 Things"
2,5,5.5,"10 Things",6,6.5,"20 Things"
inc,a_one,a_two,a_three,b_one,b_two,b_three
3,9,9.5,"15 Things",10,10.5,"30 Things"')
you can do
dx <- reshape(subset(dd, Inc!="inc"),
varying=Map(function(x) paste(c("a","b"), x, sep="_"), c("One","Two","Three")),
v.names=c("One","Two","Three"),
idvar="Inc",
timevar="label",
times = c("a","b"),
direction="long")
dx
to get
Inc label One Two Three
1.a 1 a 1 1.5 5 Things
2.a 2 a 5 5.5 10 Things
3.a 3 a 9 9.5 15 Things
1.b 1 b 2 2.5 10 Things
2.b 2 b 6 6.5 20 Things
3.b 3 b 10 10.5 30 Things
Because your input data is messy (embedded headers), this creates everything as factors. You could try to convert to proper data types with
dx[] <- lapply(lapply(dx, as.character), type.convert, as.is = TRUE)
I would suggest a combination of read.mtable from my GitHub-only "SOfun" package and merged.stack from my "splitstackshape" package.
Here's the approach. I'm assuming your data is stored in a file called "somefile.txt" in your working directory.
The packages we need:
library(splitstackshape) # for merged.stack
library(SOfun) # for read.mtable
First, grab a vector of the names. While we are at it, change the name structure from "a_one" to "one_a" -- it's a much more convenient format for both merged.stack and reshape.
theNames <- gsub("(.*)_(.*)", "\\2_\\1",
tolower(scan(what = "", sep = ",",
text = readLines("somefile.txt", n = 1))))
Second, use read.mtable to read the data in. We create the data chunks by identifying all the lines that start with letters. You can use a more specific regular expression if that doesn't match your actual data.
This will create a list of data.frames, so we use do.call(rbind, ...) to put it together in a single data.frame:
theData <- read.mtable("somefile.txt", "^[A-Za-z]", header = FALSE, sep = ",")
theData <- setNames(do.call(rbind, theData), theNames)
This is what the data now look like:
theData
# inc one_a two_a three_a one_b two_b three_b
# Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three.1 1 1 1.5 5 Things 2 2.5 10 Things
# Inc,a_One,a_Two,a_Three,b_One,b_Two,b_Three.2 2 5 5.5 10 Things 6 6.5 20 Things
# inc,a_one,a_two,a_three,b_one,b_two,b_three 3 9 9.5 15 Things 10 10.5 30 Things
From here, you can use merged.stack from "splitstackshape"....
merged.stack(theData, var.stubs = c("one", "two", "three"), sep = "_")
# inc .time_1 one two three
# 1: 1 a 1 1.5 5 Things
# 2: 1 b 2 2.5 10 Things
# 3: 2 a 5 5.5 10 Things
# 4: 2 b 6 6.5 20 Things
# 5: 3 a 9 9.5 15 Things
# 6: 3 b 10 10.5 30 Things
... or reshape from base R:
reshape(theData, direction = "long", idvar = "inc",
varying = 2:ncol(theData), sep = "_")
# inc time one two three
# 1.a 1 a 1 1.5 5 Things
# 2.a 2 a 5 5.5 10 Things
# 3.a 3 a 9 9.5 15 Things
# 1.b 1 b 2 2.5 10 Things
# 2.b 2 b 6 6.5 20 Things
# 3.b 3 b 10 10.5 30 Things

Manipulating a data frame with contents from a different data frame similar to a SQL join

Say I have a data frame with the contents:
Trial Person Time
1 John 1.2
2 John 1.3
3 John 1.1
1 Bill 2.3
2 Bill 2.5
3 Bill 2.7
and another data frame with the contents:
Person Offset
John 0.5
Bill 1.0
and I want to modify the original frame based on the appropriate value from the second. I could do this easily in any other language or in SQL, and I'm sure I could manage using for loops and whatnot, but with everything else I see in R, I'm guessing it has special syntax to do this as a one-liner. If so, how? And if not, could you show how it could be done using loops? I haven't actually got around to learning looping in R yet, since it has such amazing ways to simply extract and manipulate values.
For reference, the output would be:
Trial Person Time
1 John 0.7
2 John 0.8
3 John 0.6
1 Bill 1.3
2 Bill 1.5
3 Bill 1.7
There are many possibilities. Here is a simple one using merge() and a simple column-wise subtraction in the enlarged data.frame:
R> DF1 <- data.frame(trial=rep(1:3,2),
                     Person=rep(c("John","Bill"), each=3),
                     Time=c(1.2,1.3,1.1,2.3,2.5,2.7))
R> DF2 <- data.frame(Person=c("John","Bill"), Offset=c(0.5,1.0))
R> DF <- merge(DF1, DF2)
R> DF
Person trial Time Offset
1 Bill 1 2.3 1.0
2 Bill 2 2.5 1.0
3 Bill 3 2.7 1.0
4 John 1 1.2 0.5
5 John 2 1.3 0.5
6 John 3 1.1 0.5
R> DF$NewTime <- DF$Time - DF$Offset
R> DF
Person trial Time Offset NewTime
1 Bill 1 2.3 1.0 1.3
2 Bill 2 2.5 1.0 1.5
3 Bill 3 2.7 1.0 1.7
4 John 1 1.2 0.5 0.7
5 John 2 1.3 0.5 0.8
6 John 3 1.1 0.5 0.6
R>
One-liner (here d1 and d2 are the two data frames):
transform(merge(d1,d2), Time=Time - Offset, Offset=NULL)
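Note that merge() sorts by the key, so the rows come back in a different order. A sketch of an alternative that keeps the original row order, using match() as a lookup; the names d1 and d2 follow the one-liner above, and the data is rebuilt from the question.

```r
d1 <- data.frame(
  Trial  = rep(1:3, 2),
  Person = rep(c("John", "Bill"), each = 3),
  Time   = c(1.2, 1.3, 1.1, 2.3, 2.5, 2.7),
  stringsAsFactors = FALSE
)
d2 <- data.frame(Person = c("John", "Bill"), Offset = c(0.5, 1.0),
                 stringsAsFactors = FALSE)

# match() gives, for each row of d1, the position of its Person in d2
d1$Time <- d1$Time - d2$Offset[match(d1$Person, d2$Person)]
d1
```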
