I am currently learning R in order to write my thesis at my university.
In my project I have a data frame with 2 columns and 6001 rows. The first column contains the numbers 10000, 9999, 9998, ... down to 4000, and the second column contains numeric values. What I want to do is very simple: create a second data frame, about half the size of the original, that contains the rows whose first-column value is even, together with the corresponding values from the second column.
I tried some scripts that didn't work as planned. My first script was:
ifelse(tkk[1] %% 2 == 0, tkal <- tkk, 0)
and then I tried this one:
tkal <- case_when((tkk[1] %% 2 == 0) ~ tkk)
Neither script ran or gave the result I wanted.
Does anyone have a solution or a better idea for this simple task?
Thank you in advance
If tkk is your data frame, you can do the following:
tkk[tkk[,1]%%2==0,]
This returns all columns of the rows where the first column has an even value.
Code I used:
tkk=data.frame(1:20,rep(1,20))
tkk[tkk[,1]%%2==0,]
#2 2 1
#4 4 1
#6 6 1
#8 8 1
#10 10 1
#12 12 1
#14 14 1
#16 16 1
#18 18 1
#20 20 1
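The same idea applied to data shaped like the question's (a first column running from 10000 down to 4000) can be sketched as follows. The column names x and y and the random second column are placeholders, not the asker's actual data.

```r
# Hypothetical data shaped like the question: 6001 rows, first column 10000 down to 4000
set.seed(1)
tkk <- data.frame(x = 10000:4000, y = runif(6001))

# Keep only the rows whose first column is even (all columns are kept)
tkal <- tkk[tkk[[1]] %% 2 == 0, ]

nrow(tkal)    # 3001: every even number from 10000 down to 4000
head(tkal$x)  # 10000 9998 9996 9994 9992 9990
```

Because 10000 and 4000 are both even, the result has 3001 rows, which is just over half of 6001.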
Try tkk2 <- dplyr::filter(tkk, first_column %% 2 == 0), where first_column stands for the name of your first column.
Note that you don't need any condition on the second column: every column of a data.frame holds values of a single type, so the second column is already numeric.
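What you need is subsetting. I suggest wrapping the condition in which() in case you have NA values: a logical index that contains NA returns a row filled with NA, whereas which() keeps only the positions that are TRUE.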
tkal <- tkk[which(tkk[[1]] %% 2 ==0),]
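A minimal illustration of the NA case, using a small hypothetical data frame (the columns a and b are made up for the example):

```r
# Hypothetical data frame whose first column contains an NA
tkk <- data.frame(a = c(2, NA, 3, 4), b = c(10, 20, 30, 40))

# Plain logical indexing: the NA in the condition yields a row of NAs
with_na <- tkk[tkk[[1]] %% 2 == 0, ]

# which() returns only the TRUE positions, so the NA row is dropped
without_na <- tkk[which(tkk[[1]] %% 2 == 0), ]

nrow(with_na)     # 3 (rows 1, NA, 4)
nrow(without_na)  # 2 (rows 1 and 4)
```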
Related
I have a data set with 40 columns and 2000 rows. The values of 2 of the columns are important: I want to select the rows that have the same value in these 2 columns.
A small sample of my data looks like this:
2 3 4 5 6 3 23 32
4 3 4 1 0 5 6 43
4 4 3 22 1 2 23
Suppose I want to select the rows that have the same value in the first and third columns. Then I want the second row to be stored in a new data set.
I take from your comments that the numbers in that data frame are stored as factors. A factor is stored internally as integer codes that point into its levels, so when the console shows a factor level of 4, the internal value is not necessarily 4. In general, two factors cannot be compared with each other unless they have the same level set. To see the internal representation of your first column, use as.numeric(df[[1]]).
Now to the solution of your problem. You first have to convert the factors in your columns 1 and 3 (or all columns) into numeric values using the factor levels. Instructions for that can be found here.
## converting factor levels to numeric values
df[[1]] <- as.numeric(levels(df[[1]]))[df[[1]]]
df[[3]] <- as.numeric(levels(df[[3]]))[df[[3]]]
## filter data
df[df[[1]] == df[[3]], ]
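A small sketch of why the conversion matters, using the first and third columns of the question's sample rows stored as factors. The column names V1 and V3 are assumptions for the example.

```r
# Hypothetical reconstruction of columns 1 and 3 of the sample rows, stored as factors
df <- data.frame(V1 = factor(c(2, 4, 4)), V3 = factor(c(4, 4, 3)))

# as.numeric() on a factor returns the internal codes, not the printed values
codes <- as.numeric(df$V1)  # 1 2 2

# Comparing the raw factors fails here, because their level sets differ
cmp <- try(df$V1 == df$V3, silent = TRUE)

# Convert via the levels to recover the actual numbers
df$V1 <- as.numeric(levels(df$V1))[df$V1]  # 2 4 4
df$V3 <- as.numeric(levels(df$V3))[df$V3]  # 4 4 3

# Keep rows where column 1 equals column 3 (only the second row here)
same <- df[df$V1 == df$V3, ]
same
```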
Let's say I have the following data.frame in R:
df <- data.frame(order=(1:10),value=c(1,7,3,5,9,2,9,10,2,3))
Other than looping through the data and testing whether each value exceeds the previous high, how can I get the successive highs so that I end up with a table like this?
order value
1 1
2 7
5 9
8 10
TIA
Here's one option, if I understood the question correctly:
df[df$value > cummax(c(-Inf, head(df$value, -1))),]
# order value
#1 1 1
#2 2 7
#5 5 9
#8 8 10
I use cummax to keep track of the running maximum of column "value" and compare it (the previous row's cummax) to each "value" entry. To make sure the first entry is also selected, I start with -Inf.
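Splitting the one-liner into its intermediate steps makes the comparison visible; the helper names prev_max and keep are only for illustration.

```r
df <- data.frame(order = 1:10, value = c(1, 7, 3, 5, 9, 2, 9, 10, 2, 3))

# Highest value seen *before* each row: drop the last value, prepend -Inf,
# then take the running maximum
prev_max <- cummax(c(-Inf, head(df$value, -1)))
prev_max  # -Inf 1 7 7 7 9 9 9 10 10

# A row is a new high when its value beats everything before it
keep <- df$value > prev_max
df[keep, ]  # orders 1, 2, 5, 8
```

Ties are not counted as new highs (row 7 has value 9, equal to the previous maximum), because the comparison is strictly greater than.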
"get successive high values (of value?)" is unclear.
It seems you want to filter only rows whose value is higher than previous max.
First, we reorder your df in increasing order of value... (not clear but I think that's what you wanted)
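Then we use logical indexing with diff() > 0 to keep only strictly increasing values. Because diff() returns one element fewer than there are rows, TRUE is appended for the last row; this keeps the last row of each run of equal values:
rdf <- df[order(df$value),]
rdf[c(diff(rdf$value) > 0, TRUE), ]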
order value
1 1 1
9 9 2
10 10 3
4 4 5
2 2 7
7 7 9
8 8 10
I have a data frame in R with 3 columns A, B and C:
A B C
2 3 4
5 2 7
I want to get the square of each number, like this:
A B C
4 9 16
25 4 49
Can anyone please help me out? I am able to do this in Excel but want to do it in R.
Just do this. In R, ^ works element-wise whether the operand is a number, vector, matrix or data frame:
dataframe^2
In current versions of R, arithmetic on a data frame already returns a data frame. If you end up with a matrix instead, convert it with
data.frame(dataframe^2)
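A minimal sketch with the question's sample values. The lapply() variant is an alternative not taken from the answer above; it squares each column in place, so the result is guaranteed to keep the data frame's class and column names.

```r
df <- data.frame(A = c(2, 5), B = c(3, 2), C = c(4, 7))

# ^ is applied element-wise to every column
sq <- df^2
sq

# Alternative: square each column; assigning into df_sq[] keeps it a data frame
df_sq <- df
df_sq[] <- lapply(df_sq, function(col) col^2)
df_sq
```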
I'm working with some data that looks like this:
AB 123 4 5 3 2 1
AB 234 4 2 7 4 3
...
The row id is actually the combination of the first two columns, so I would like to be able to reference row AB123 or AB234. However, since they are in two columns, I figured the easiest way to do this would be to merge columns 1 and 2 somehow and then convert it to a table with column 1 specified as the row names. Does anyone know how I can do this? Is there an easier way? Thanks.
row.names(df)<-paste(df[,1],df[,2],sep="")
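A sketch on the two sample rows, reconstructed into a hypothetical data frame (the column names id1, id2 and v1 to v5 are assumptions). paste0() is shorthand for paste(..., sep = "").

```r
# Hypothetical data frame reconstructed from the two sample rows
df <- data.frame(id1 = c("AB", "AB"), id2 = c(123, 234),
                 v1 = c(4, 4), v2 = c(5, 2), v3 = c(3, 7),
                 v4 = c(2, 4), v5 = c(1, 3))

# Combine the first two columns into the row names
row.names(df) <- paste0(df[, 1], df[, 2])

# Rows can now be referenced by the combined id
df["AB234", ]
```

Row names must be unique, so this only works if every combination of the first two columns occurs once; R refuses duplicated row names with an error.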
I have a data set consisting of 2000 individuals. For each individual i = 1, ..., 2000, the data set contains n repeated situations. Letting d denote this data set, each row of d is indexed by i and n. Among other variables, d has a variable pid that takes the same value for an individual across the different situations (rows).
Taking into consideration the panel nature of the data, I want to re-sample d (as in bootstrap):
with replacement,
store each resampled data set as a data frame.
I considered using the sample function but could not make it work. I am a new user of R and have no programming skills.
The data set consists of many variables, but all the variables have numeric values. The data set is as follows.
pid x y z
1 10 2 -5
1 12 3 -4.5
1 14 4 -4
1 16 5 -3.5
1 18 6 -3
1 20 7 -2.5
2 22 8 -2
2 24 9 -1.5
2 26 10 -1
2 28 11 -0.5
2 30 12 0
2 32 13 0.5
The first six rows are for the first person (pid = 1), and the next six rows (pid = 2) are different observations for the second person.
This should work for you:
z <- replicate(100,
d[d$pid %in% sample(unique(d$pid), 2000, replace=TRUE),],
simplify = FALSE)
The result z will be a list of data frames that you can do whatever you want with.
EDIT: this is a little wordy, but it deals with duplicated individuals. The code above uses %in%, which keeps each pid at most once no matter how many times it was sampled, so repeated draws are lost and the resampled data frames can differ in size. replicate has its obvious use of performing an operation a given number of times (in the example below, 4). In each iteration I sample the unique values of pid (here 3 of them, with replacement), use lapply to extract the rows of d corresponding to each sampled value, and then use do.call("rbind", ...) to stick these per-pid data frames back together, so a pid drawn twice contributes its rows twice.
z <- replicate(4, do.call("rbind", lapply(sample(unique(d$pid),3,replace=TRUE),
function(x) d[d$pid==x,])),
simplify=FALSE)
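The same cluster (panel) bootstrap can be wrapped in a small function, sketched below under a few assumptions: the number of individuals drawn equals the number of distinct pid values (2000 in the question, 2 in the sample data), the function name draw_once and the extra column boot_id are hypothetical, and boot_id is added so that a person drawn twice appears as two separate clusters, which matters if a later model groups rows by individual.

```r
# The 12-row example from the question
d <- data.frame(pid = rep(1:2, each = 6),
                x = seq(10, 32, by = 2),
                y = 2:13,
                z = seq(-5, 0.5, by = 0.5))

# One cluster-bootstrap draw: sample individuals with replacement and
# stack all rows of each sampled individual
draw_once <- function(d) {
  ids <- unique(d$pid)
  # sample.int() avoids sample()'s special case when there is only one id
  picked <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  parts <- lapply(seq_along(picked), function(k) {
    rows <- d[d$pid == picked[k], ]
    rows$boot_id <- k  # hypothetical label for the k-th drawn cluster
    rows
  })
  do.call(rbind, parts)
}

set.seed(42)
z <- replicate(4, draw_once(d), simplify = FALSE)
```

With the full data set, replace 4 with the number of bootstrap replications you need; each element of z is one resampled data frame.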