Show only even numbers from a data set in R

I am trying to extract only the even numbers from the "cars" data set.
I know I need to create a new function.
I have come this far:
Is.even = function(x) x %% 2 == 0
When I enter in:
Is.even(cars[1])
It gives me back a logical response. I want to only display the actual even numbers in integer form and hide the odd numbers.
What am I doing wrong?

Apart from @neilfws's suggestion, if you pass your values as a vector, you can also use Filter:
Filter(Is.even, cars[, 1])
#[1] 4 4 8 10 10 10 12 12 12 12 14 14 14 14 16 16 18 18 18 18 20 20 20 20 20 22 24 24 24 24
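The same result also falls out of plain logical subsetting, which is what the function was already half-doing; a minimal sketch (cars is a built-in data set):

```r
# Use the logical vector from Is.even() to pick out the even values directly.
Is.even <- function(x) x %% 2 == 0
speeds <- cars[[1]]                     # first column of cars as a plain vector
even_speeds <- speeds[Is.even(speeds)]  # keeps only elements where the test is TRUE
even_speeds                             # same values Filter() returns above
```

The key point is that `cars[1]` is a one-column data frame, while `cars[[1]]` (or `cars[, 1]`) is a vector you can index with a logical mask.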


how to convert and store text file to csv

13-JUL-17
Bank User Space Occupied(GB)
------------------------------ ------------------
CKYC_MNSB .004211426
CORE_AMARNATH_ASP 8.75262451
CORE_AMBUJA 6.80389404
CORE_AMBUJA_ASP 10.0085449
CORE_ANAND_MERC_ASP 18.9866333
CORE_BALOTRA 17.8280029
CORE_BASODA 4.55432129
CORE_CHHAPI_ASP 11.9767456
CORE_DHANGDHRA_ASP 13.1849976
CORE_IDAR_ASP 13.3209229
CORE_JANTA_HALOL_ASP 12.7955933
Bank User Space Occupied(GB)
------------------------------ ------------------
CORE_JHALOD_URBAN_ASP 9.19219971
CORE_MANINAGAR 5.36090088
CORE_MANINAGAR_ASP 6.31414795
CORE_SANKHEDA 20.4329834
CORE_SMCB_ANAND_ASP 11.3191528
CORE_TARAPUR_ASP 8.24627686
CORE_VUCB .000610352
TBA_TEMP 5.39910889
TEST_DUNIA 4.15698242
20 rows selected.
TABLESPACE NAME Free Space in GB
------------------------------ ----------------
TBAPROJ 33.2736816
I have the text file above.
How can I store it in a CSV file with the columns separated?
I have loaded the file, but it is very difficult to remove the blank space from it.
Each line you want matches the same pattern: a word made of capital letters and underscores, then spaces, then a number containing a decimal point. So this grep will filter those lines out:
file_raw <- readLines('file.txt')
read.table(
  text = paste(
    file_raw[grep("^[A-Z_].*\\s*\\.", file_raw)],
    collapse = "\n"),
  sep = "", header = FALSE)
V1 V2
1 CKYC_MNSB 0.004211426
2 CORE_AMARNATH_ASP 8.752624510
3 CORE_AMBUJA 6.803894040
4 CORE_AMBUJA_ASP 10.008544900
5 CORE_ANAND_MERC_ASP 18.986633300
6 CORE_BALOTRA 17.828002900
7 CORE_BASODA 4.554321290
8 CORE_CHHAPI_ASP 11.976745600
9 CORE_DHANGDHRA_ASP 13.184997600
10 CORE_IDAR_ASP 13.320922900
11 CORE_JANTA_HALOL_ASP 12.795593300
12 CORE_JHALOD_URBAN_ASP 9.192199710
13 CORE_MANINAGAR 5.360900880
14 CORE_MANINAGAR_ASP 6.314147950
15 CORE_SANKHEDA 20.432983400
16 CORE_SMCB_ANAND_ASP 11.319152800
17 CORE_TARAPUR_ASP 8.246276860
18 CORE_VUCB 0.000610352
19 TBA_TEMP 5.399108890
20 TEST_DUNIA 4.156982420
21 TBAPROJ 33.273681600
Note that if you expect any of the first tokens not to match the pattern, for example CORE_999 or lower_case, then you need to adjust the pattern. But without a formal spec we can only go on what you supplied.
There might be a more elegant way, but this does the trick:
# read raw file in lines
file_raw <- readLines('file.txt')
# remove whitespace
file_trim <- trimws(file_raw,which = 'both')
# remove empty lines
file_trim <- file_trim[file_trim != '']
# sub white space with separator ,
file_csv <- gsub('\\s{2,}',',',file_trim)
In the end there will still be some leftovers, like the ---- separator lines and 20 rows selected., but those can easily be filtered out if you want, either before writing or after reading:
file_clean <- file_csv[!grepl('(-){3,}|rows selected',file_csv)]
write.csv(file_clean,'file_cleaned.csv')
> read.csv('file_cleaned.csv')
X x
1 1 13-JUL-17
2 2 Bank User,Space Occupied(GB)
3 3 CKYC_MNSB,.004211426
4 4 CORE_AMARNATH_ASP,8.75262451
5 5 CORE_AMBUJA,6.80389404
6 6 CORE_AMBUJA_ASP,10.0085449
7 7 CORE_ANAND_MERC_ASP,18.9866333
8 8 CORE_BALOTRA,17.8280029
9 9 CORE_BASODA,4.55432129
10 10 CORE_CHHAPI_ASP,11.9767456
11 11 CORE_DHANGDHRA_ASP,13.1849976
12 12 CORE_IDAR_ASP,13.3209229
13 13 CORE_JANTA_HALOL_ASP,12.7955933
14 14 Bank User,Space Occupied(GB)
15 15 CORE_JHALOD_URBAN_ASP,9.19219971
16 16 CORE_MANINAGAR,5.36090088
17 17 CORE_MANINAGAR_ASP,6.31414795
18 18 CORE_SANKHEDA,20.4329834
19 19 CORE_SMCB_ANAND_ASP,11.3191528
20 20 CORE_TARAPUR_ASP,8.24627686
21 21 CORE_VUCB,.000610352
22 22 TBA_TEMP,5.39910889
23 23 TEST_DUNIA,4.15698242
24 24 TABLESPACE NAME,Free Space in GB
25 25 TBAPROJ,33.2736816
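A side note on the last step: write.csv() treats a character vector as a one-column data frame, which is why the read-back above shows an extra X row-number column and the commas stay inside one quoted field. Since the commas are already embedded in the strings, writeLines() emits each string as a raw CSV line instead. A sketch using two of the cleaned lines as sample data:

```r
# Sample of the cleaned lines (commas already embedded in each string)
file_clean <- c("CKYC_MNSB,.004211426", "CORE_AMARNATH_ASP,8.75262451")

out <- tempfile(fileext = ".csv")
writeLines(file_clean, out)    # one raw line per element, no quoting, no row numbers
read.csv(out, header = FALSE)  # parses back into two clean columns
```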

How to use an apply function instead of a for loop if you have multiple if conditions to be executed

1st DF:
t.d
V1 V2 V3 V4
1 1 6 11 16
2 2 7 12 17
3 3 8 13 18
4 4 9 14 19
5 5 10 15 20
names(t.d) <- c("ID","A","B","C")
t.d$FinalTime <- c("7/30/2009 08:18:35","9/30/2009 19:18:35","11/30/2009 21:18:35","13/30/2009 20:18:35","15/30/2009 04:18:35")
t.d$InitTime <- c("6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35")
>t.d
ID A B C FinalTime InitTime
1 1 6 11 16 7/30/2009 08:18:35 6/30/2009 9:18:35
2 2 7 12 17 9/30/2009 19:18:35 6/30/2009 9:18:35
3 3 8 13 18 11/30/2009 21:18:35 6/30/2009 9:18:35
4 4 9 14 19 13/30/2009 20:18:35 6/30/2009 9:18:35
5 5 10 15 20 15/30/2009 04:18:35 6/30/2009 9:18:35
2nd DF:
> s.d
F D E Time
1 10 19 28 6/30/2009 08:18:35
2 11 20 29 8/30/2009 19:18:35
3 12 21 30 9/30/2009 21:18:35
4 13 22 31 01/30/2009 20:18:35
5 14 23 32 10/30/2009 04:18:35
6 15 24 33 11/30/2009 04:18:35
7 16 25 34 12/30/2009 04:18:35
8 17 26 35 13/30/2009 04:18:35
9 18 27 36 15/30/2009 04:18:35
Output to be:
From DF "t.d" I have to calculate, for each row, the time interval between "FinalTime" and "InitTime" (InitTime will always be earlier than FinalTime).
Another DF "temp" has to be formed from "s.d", containing only the rows that fall within that time interval; then the most recent values of "F", "D" and "E" have to be taken and attached to the i-th row of "t.d" from which the time interval was calculated.
Also we have to check whether the newly formed DF "temp" satisfies the following condition (here 'j' indexes each row):
if ((temp$F[j] < 35.5) + (temp$D[j] >= 100) >= 1) {
  temp$Flag <- 1
} else {
  temp$Flag <- 0
}
Originally I have 3 million rows in the data frame and 20 columns in each DF.
I have solved the above problem using a for loop, but it takes 2 to 3 days to run as there are a lot of rows.
(Also, how would I add new columns to the resultant DF if multiple conditions are satisfied on each row?)
Can anybody suggest a different technique, like using apply functions?
My suggestion is:
use lapply over the row indices
handle your if branches inside the function call
return either your data frame or NULL
combine everything with rbind
By replacing lapply with mclapply from the 'parallel' package, your code gets executed in parallel.
resultList <- lapply(1:nrow(t.d), function(i) {
  # do stuff
  if (condition) {
    return(df)
  } else {
    return(NULL)
  }
})
resultDF <- do.call(rbind, resultList)
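A minimal runnable sketch of this recipe, using small synthetic t.d/s.d frames. The column names and the Flag condition come from the question; the sample values and the which.max() step are assumptions:

```r
# Synthetic stand-ins for the question's data frames (times already parsed)
t.d <- data.frame(ID = 1:3,
                  InitTime  = as.POSIXct("2009-06-30 09:18:35"),
                  FinalTime = as.POSIXct(c("2009-07-30 08:18:35",
                                           "2009-09-30 19:18:35",
                                           "2009-11-30 21:18:35")))
s.d <- data.frame(F = c(10, 40, 12),
                  D = c(19, 120, 21),
                  E = 28:30,
                  Time = as.POSIXct(c("2009-06-30 08:18:35",
                                      "2009-08-30 19:18:35",
                                      "2009-09-30 21:18:35")))

resultList <- lapply(seq_len(nrow(t.d)), function(i) {
  # rows of s.d falling inside this row's time interval
  temp <- s.d[s.d$Time >= t.d$InitTime[i] & s.d$Time <= t.d$FinalTime[i], ]
  if (nrow(temp) == 0) return(NULL)        # nothing in the interval: drop this row
  latest <- temp[which.max(temp$Time), ]   # most recent row in the interval
  latest$Flag <- as.integer((latest$F < 35.5) + (latest$D >= 100) >= 1)
  cbind(t.d[i, ], latest[, c("F", "D", "E", "Flag")])
})
resultDF <- do.call(rbind, resultList)
```

Swapping lapply for parallel::mclapply (on non-Windows platforms) leaves the body unchanged and runs the iterations across cores.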

R: Doing calculations on multiple factors/levels (Dummy variables)

I have two equally long matching vectors of time series data: Price (x) and hour (h). Hour goes from 0-23. My hour variable is my dummy variable (or factor/level variable I guess it is called in R).
Right now I've defined 24 different dummy variables, one per hour, and I type each one out by hand. So, for example, to generate 24 plots to look at or to calculate 24 means I would type:
plot.ts(hour1) # and so on for all 24.
How can I do this for all 24 variables as easily as possible, so I can run a lot of different calculations? For example, how could I compute the mean for all 24 dummy variables without writing 24 lines of code, changing the dummy variable each time?
EDIT: Sorry, thought it was clear with the two vectors. Example:
Price Hour
8     0
12    1
14    2
16    3
18    4
20    5
22    6
24    7
26    8
28    9
24    10
26    11
23    12
23    13
23    14
14    15
19    16
25    17
26    18
28    19
30    20
33    21
24    22
10    23
14    0
12    1
13    2
x     etc.
It is not clear how your data are stored, since you don't give a reproducible example. I assume you have a separate variable for each hour (hour1, hour2, ...).
Generally, it is better to put your hourxx variables in a list to perform calculations on them.
For example, this will compute mean for all hours:
lapply(lapply(ls(pattern='hour.*'),get),mean)
EDIT after OP clarification:
You should create a new variable to distinguish between the repeated cycles of hours. Something like:
dat <- data.frame(Price=rnorm(24*5),Hour=rep(0:23,5))
dat$id <- cumsum(c(0,diff(dat$Hour)==-23))
Then, using the plyr package for example, you can compute the mean by id:
library(plyr)
ddply(dat,.(id),summarise,mPrice=mean(Price))
id mPrice
1 0 0.2999602
2 1 -0.2201148
3 2 0.2400192
4 3 -0.2087594
5 4 0.1666915
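If instead the goal is one mean per hour of the day (rather than per cycle), base R's tapply() does it in one call with no dummy variables at all. A sketch with synthetic stand-ins for the question's two vectors:

```r
# Synthetic Price/Hour vectors: 5 days x 24 hours
set.seed(1)
Price <- rnorm(24 * 5, mean = 20)
Hour  <- rep(0:23, 5)

# tapply() groups Price by Hour and applies mean() to each group
hourly_means <- tapply(Price, Hour, mean)  # named vector with entries "0".."23"
length(hourly_means)                       # 24
```

The same pattern works for any per-group calculation: swap mean for sd, median, or a custom function.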

Creating a numerical variable order

I have a set of data with 3 columns: an unnamed index column, colour of seed, and germination time.
How do I create a numerical variable called 'order' with values 1 to 22 (the number of observations in the data set)?
I don't know if I understand you correctly, but the simplest way would be:
> order <- c(1:22)
> order
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Now, if you run:
class(order)
you will get:
[1] "integer"
and you can easily access every element of the order object (especially in a loop):
for(i in 1:length(order)){
print(order[i])
}
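Since the question mentions an existing data frame, the column can also be added directly with seq_len(). A sketch using a hypothetical frame (the column names here are guesses at the question's data):

```r
# Hypothetical 22-row frame mirroring the question's columns
seeds <- data.frame(seed_colour = rep(c("brown", "white"), 11),
                    germination_time = runif(22, 5, 15))

# seq_len(nrow(...)) gives 1, 2, ..., 22, one value per row
seeds$order <- seq_len(nrow(seeds))
```

Using seq_len(nrow(seeds)) rather than a hard-coded 1:22 keeps the column correct if rows are later added or removed.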

How to reorder a column in a data frame to be the last column

I have a data frame where columns are constantly being added to it. I also have a total column that I would like to stay at the end. I think I must have skipped over some really basic command somewhere but cannot seem to find the answer anywhere. Anyway, here is some sample data:
x=1:10
y=21:30
z=data.frame(x,y)
z$total=z$x+z$y
z$w=11:20
z$total=z$x+z$y+z$w
When I type z I get this:
x y total w
1 1 21 33 11
2 2 22 36 12
3 3 23 39 13
4 4 24 42 14
5 5 25 45 15
6 6 26 48 16
7 7 27 51 17
8 8 28 54 18
9 9 29 57 19
10 10 30 60 20
Note how the total column comes before the w, and obviously any subsequent columns. Is there a way I can force it to be the last column? I am guessing that I would have to use ncol(z) somehow. Or maybe not.
You can reorder your columns as follows:
z <- z[,c('x','y','w','total')]
To do this programmatically, after you're done adding your columns, you can retrieve their names like so:
nms <- colnames(z)
Then you can grab the ones that aren't 'total' like so:
nms[nms!='total']
Combined with the above:
z <- z[, c(nms[nms!='total'],'total')]
You have a logic issue here. Whenever you add to a data.frame, it grows to the right.
Easiest fix: keep total a vector until you are done, and only then append it. It will then be the rightmost column.
(For critical applications, you would of course determine your width k beforehand, allocate k+1 columns and just index the last one for totals.)
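A sketch of that fix applied to the question's own example: build all the other columns first and compute total only once, at the end, so it lands in the rightmost position automatically.

```r
# Build the data frame first, appending columns as they arrive
z <- data.frame(x = 1:10, y = 21:30)
z$w <- 11:20

# Compute total last; a newly assigned column is always appended on the right
z$total <- z$x + z$y + z$w
names(z)  # "x" "y" "w" "total"
```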
