Error in makebin(data, file) : 'sid' invalid - r

I am getting the same error "Error in makebin(data, file) : 'sid' invalid"
running cspade on the small dataset below. Both my transactionID and eventID are ordered blockwise (as somebody mentioned in another post that they should be), so I don't see any reason for that error. Please let me know what the problem could be.
items transactionID sequenceID eventID
1 {item=/} 1 1 1458565800
2 {item=/login} 2 2 1458565803
3 {item=/profile} 3 3 1458565811
4 {item=/shop_list} 4 4 1458565814
5 {item=/} 5 1 1458565912
6 {item=/login} 6 2 1458565915
7 {item=/shop_list} 7 3 1458565918
8 {item=/} 8 1 1458565802
9 {item=/login} 9 2 1458565808
10 {item=/profile} 10 3 1458565812
11 {item=/product} 11 4 1458565818
12 {item=/} 12 1 1458565911
13 {item=/login} 13 2 1458565916
14 {item=/shop_list} 14 3 1458565922
15 {item=/profile} 15 4 1458565927
16 {item=/contact} 16 5 1458565929
17 {item=/profile} 17 6 1458565933
traffic <- read.csv("C:\\buczaal1\\RProg\\web_traffic.csv")
traffic_data <- data.frame(item=traffic$Page)
traffic.tran <- as(traffic_data, "transactions")
transactionInfo(traffic.tran)$sequenceID <- traffic$Seq
transactionInfo(traffic.tran)$eventID <- traffic$Timestamp
frequent_pattern <- cspade(traffic.tran, parameter= list(support=0.3))
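As a sanity check before calling cspade, a small base-R helper (hypothetical, not part of arulesSequences) can verify the ordering this error message usually points at: sequenceID should form non-decreasing blocks, and eventID should be strictly increasing within each sequence.

```r
# Hypothetical sanity check: sequence ids must be blockwise non-decreasing,
# and event ids must be strictly increasing within each sequence.
check_ordering <- function(sid, eid) {
  blockwise <- !is.unsorted(sid)  # sequence ids never go backwards
  within_ok <- all(tapply(eid, sid,
                          function(e) !is.unsorted(e, strictly = TRUE)))
  blockwise && within_ok
}

# Two sequences, events increasing within each block -> passes the check
check_ordering(sid = c(1, 1, 2, 2), eid = c(10, 20, 5, 6))
```

If this returns FALSE for your `traffic$Seq` / `traffic$Timestamp` columns, re-sorting the data by sequenceID and then eventID before building the transactions would be the first thing to try.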

Related

How can this R code be sped up with the apply (lapply, mapply, etc.) functions?

I am not too proficient with the apply functions, or with R. But I know I overuse for loops, which makes my code slow. How can the following code be sped up with apply functions, or in any other way?
sum_store = NULL
for (col in 1:ncol(cazy_fams)) { # for each column in cazy_fams (so for each master family, e.g. GH, AA, etc.)
  for (row in 1:nrow(cazy_fams)) { # for each row in cazy_fams (so the specific family number, e.g. GH1, AA7, etc.)
    # Isolate the row that pertains to the current cazy family for every dataframe in the list
    filt_fam = lapply(family_summary, function(sample) {
      sample[as.character(sample$Family) %in% paste(colnames(cazy_fams[col]), cazy_fams[row, col], sep = ""), ]
    })
    row_cat = do.call(rbind, filt_fam) # concatenating the lapply list output into a dataframe
    if (nrow(row_cat) > 0) {
      fam_sum = aggregate(proteins ~ Family, data = row_cat, FUN = sum) # collapsing the dataframe into one row and summing the protein counts
      sum_store = rbind(sum_store, fam_sum) # storing the results for that family
    } else if (grepl("NA", paste(colnames(cazy_fams[col]), cazy_fams[row, col], sep = "")) == FALSE) {
      Family = paste(colnames(cazy_fams[col]), cazy_fams[row, col], sep = "")
      proteins = 0
      sum_store = rbind(sum_store, data.frame(Family, proteins))
    } else {
      next
    }
  }
}
family_summary is just a list of 18 two-column data frames that look like this:
Family proteins
CE0 2
CE1 9
CE4 15
CE7 1
CE9 1
CE14 10
GH0 5
GH1 1
GH3 4
GH4 1
GH8 1
GH9 2
GH13 2
GH15 5
GH17 1
with different cazy families.
cazy_fams is just a dataframe with each column being a cazy class (e.g. GH, AA, etc.) and each row being a family number, all taken from the linked website:
GH GT PL CE AA CBM
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7
8 8 8 8 8 8
9 9 9 9 9 9
10 10 10 10 10 10
11 11 11 11 11 11
12 12 12 12 12 12
13 13 13 13 13 13
14 14 14 14 14 14
15 15 15 15 15 15
The reason behind the else if (grepl("NA", paste(colnames(cazy_fams[col]), cazy_fams[row, col], sep = "")) == FALSE) statement is to deal with the fact that not all classes have the same number of families, so when looping over my dataframe I end up with some GHNA and AANA entries, with NA on the end.
The output sum_store is this:
Family proteins
GH1 54
GH2 51
GH3 125
GH4 29
GH5 40
GH6 25
GH7 0
GH8 16
GH9 25
GH10 19
GH11 5
GH12 5
GH13 164
GH14 3
GH15 61
A data frame with all listed cazy families and the total number of appearances across the family_summary list.
Please let me know if you need anything else to help answer my question.
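For what it's worth, the nested loops might collapse into a single rbind + aggregate + match, assuming family_summary and cazy_fams have the shapes described above. A sketch (the tiny data frames below are made up for illustration):

```r
# Made-up stand-ins for the structures described in the question
family_summary <- list(
  data.frame(Family = c("GH1", "GH3"), proteins = c(1, 4)),
  data.frame(Family = c("GH1", "CE1"), proteins = c(2, 9))
)
cazy_fams <- data.frame(GH = 1:3, CE = 1:3)

# Build every valid class+number name once, skipping NA family numbers
all_fams <- unlist(lapply(names(cazy_fams), function(cl)
  paste0(cl, cazy_fams[[cl]][!is.na(cazy_fams[[cl]])])))

# One rbind + one aggregate replaces the per-family filtering loop
combined <- do.call(rbind, family_summary)
sums <- aggregate(proteins ~ Family, data = combined, FUN = sum)

# Merge back so families with no hits get an explicit zero count
sum_store <- data.frame(Family = all_fams,
                        proteins = sums$proteins[match(all_fams, sums$Family)])
sum_store$proteins[is.na(sum_store$proteins)] <- 0
```

The key change is aggregating once over all families instead of filtering the whole list once per family, which removes both loops and the repeated rbind growth.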

Indexing multiple text files using R

I have to combine 5 files with the same structure and add a new variable to index the new data frame, but all 5 files use the same IDs.
I successfully combined them, but I cannot figure out how to index them. I have tried a few loops, but they were not giving me what I wanted.
# Combining files
path <- "D:/..."
filenames <- list.files(path)
t <- do.call("rbind", lapply(filenames, read.table, header = TRUE))
# Trying indexing with loops:
for (i in 1:length(t$ID)) {
  t$ID2 <- t$ID + last(t$ID2)
}
I have 5 files, all of them with the same structure, and all of them using the same variable for identification, i.e.
file 1 would have:
ID: 1 1 1 2 2 2 3 3 3
And files 2 to 5 would have exactly the same IDs. I would like to combine them into a single data frame, so I would have this:
ID: 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1....
and then name them differently. So I would have:
ID: 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7...
How's this? This code finds the largest ID of the first (i) data.frame and then adds that to the IDs of the next (i+1) data.frame. It records the largest ID of (i+1) and uses that in the (i+2) data.frame.
For this to work, you will have to forgo the first do.call(rbind, ...) in your code.
xy1 <- data.frame(id = rep(1:4, each = 4), matrix(runif(4*4 * 3), ncol = 3))
xy2 <- data.frame(id = rep(1:7, each = 3), matrix(runif(3*7 * 3), ncol = 3))
xy3 <- data.frame(id = rep(1:3, each = 5), matrix(runif(3*5 * 3), ncol = 3))
xy <- list(xy1, xy2, xy3)
# First find largest ID of the first data.frame.
maxid <- max(xy[[1]]$id)
# Add previous max to current ID.
for (i in 2:length(xy)) {
  xy[[i]]$id <- maxid + xy[[i]]$id
  maxid <- max(xy[[i]]$id) # largest id so far, used for the next data.frame
}
> do.call(rbind, xy)
id X1 X2 X3
1 1 0.881397055 0.113236016 0.58935016
2 1 0.205762300 0.216630633 0.04096480
3 1 0.307112552 0.005092413 0.97769030
4 1 0.457299727 0.329346925 0.09582600
5 2 0.007010529 0.089751397 0.69746047
6 2 0.014806573 0.432586138 0.44480438
7 2 0.534909561 0.108258153 0.82475185
8 2 0.313796157 0.749077837 0.38798818
9 3 0.643547518 0.237040912 0.18304776
10 3 0.725906336 0.186099719 0.61738806
11 3 0.506767958 0.646870554 0.27792817
12 3 0.303638439 0.082478410 0.52484137
13 4 0.360623223 0.182054933 0.48604454
14 4 0.804174231 0.427352128 0.70075198
15 4 0.211255624 0.673377745 0.77251727
16 4 0.474358562 0.430095921 0.03648586
17 5 0.731251361 0.635859860 0.90235962
18 5 0.689463703 0.931878683 0.12179179
19 5 0.256770523 0.413928661 0.89254294
20 6 0.358319709 0.393714347 0.53143877
21 6 0.241538687 0.811901018 0.91577045
22 6 0.445141806 0.015133252 0.70977512
23 7 0.179662683 0.574578297 0.09957555
24 7 0.279302309 0.351412534 0.40911867
25 7 0.826039704 0.852739191 0.58671811
26 8 0.822024888 0.061122387 0.12308001
27 8 0.676081285 0.005285565 0.32040908
28 8 0.302821623 0.511678250 0.14814015
29 9 0.966690845 0.221078055 0.72651928
30 9 0.070768391 0.726477379 0.70431920
31 9 0.178425952 0.223096153 0.41111805
32 10 0.952963096 0.209673890 0.73485060
33 10 0.905570765 0.290359419 0.69499805
34 10 0.976600565 0.448144677 0.36100322
35 11 0.458720466 0.636912805 0.04170255
36 11 0.953471285 0.533102906 0.63543974
37 11 0.574490192 0.975327747 0.94730912
38 12 0.878968237 0.956726315 0.04761167
39 12 0.379196322 0.720179957 0.98719308
40 12 0.217246809 0.066895905 0.44981063
41 12 0.309354927 0.048701078 0.24654953
42 12 0.011187546 0.833095978 0.94793368
43 13 0.590529610 0.240967648 0.42954908
44 13 0.525187039 0.739698883 0.72047067
45 13 0.223469798 0.338660741 0.21820068
46 13 0.359939747 0.831732199 0.27095365
47 13 0.672778236 0.327900275 0.04854854
48 14 0.202447020 0.911963711 0.18576047
49 14 0.858830035 0.003633945 0.25713498
50 14 0.784197766 0.527018979 0.30911792
51 14 0.942135786 0.256841256 0.76965498
52 14 0.488395595 0.716133306 0.89618736
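The same re-numbering can also be done without an explicit loop, by precomputing the offsets with cumsum. A sketch under the same assumptions (ids in each frame start at 1; the toy xy list below is made up):

```r
# Toy list of data frames, each with its own 1-based ids
xy1 <- data.frame(id = rep(1:2, each = 2), x = 1:4)
xy2 <- data.frame(id = rep(1:3, each = 1), x = 1:3)
xy <- list(xy1, xy2)

# Offset for frame i = sum of the max ids of frames 1..(i-1)
offsets <- c(0, cumsum(head(sapply(xy, function(d) max(d$id)), -1)))

# Shift each frame's ids by its offset, then bind everything once
combined <- do.call(rbind, Map(function(d, off) {
  d$id <- d$id + off
  d
}, xy, offsets))
```

This produces ids 1, 1, 2, 2 from the first frame followed by 3, 4, 5 from the second, matching what the loop above computes.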

Check and count conditions for following value

I have a dataframe with 18 rows and 25 variables, with values between 0 and 1. For each row, I want to count the number of times a high value (> 0.7) is followed by a low value (< 0.4) and store that count in a new column.
So far I have been using:
df$n_calls<-rowSums(df > 0.7)
I know it is possible to use different conditions, but in my case it is very important to check that the low value comes right after the high value.
Here is an example of my df
1 2 3 4 5 6 7 8 9 10 11
1 0.186158072 0.27738592 0.42165043 0.43501515 0.10918095 0.09976244 0.09571536 0.08674526 0.09239877 0.07523392 0.043679510
2 0.773469188 0.75381254 0.20389633 0.46444408 0.30433377 0.68334244 0.42105103 0.66224478 0.32412056 0.30951402 0.616658953
3 0.201245200 0.26873094 0.25892904 0.38605874 0.68438397 0.30236790 0.51493090 0.66314468 0.68910974 0.59134860 0.625550641
4 0.033746517 0.06388212 0.06978669 0.05517553 0.06032239 0.06736223 0.06514233 0.05133860 0.06034266 0.05702451 0.011144861
5 0.590297759 0.40352955 0.08106493 0.06063485 0.07780428 0.09633069 0.10882515 0.11468680 0.28375374 0.63941033 0.629284574
6 0.165001648 0.31174739 0.36955514 0.47581249 0.65349233 0.66471913 0.58004314 0.50790858 0.51298260 0.18651107 0.501195655
7 0.033164989 0.05678890 0.05941058 0.04139692 0.04660761 0.05452679 0.04939543 0.02780824 0.03680599 0.04645522 0.018496662
8 0.080893779 0.07228276 0.07473865 0.05536056 0.05732153 0.06403365 0.06139970 0.05142047 0.05698089 0.06998986 0.032598440
9 0.557273680 0.49226191 0.63900601 0.37497255 0.72114277 0.37557355 0.34360391 0.37502000 0.41622472 0.46852220 0.410656260
10 -0.004010143 0.03051558 0.04403711 0.02749514 0.04770637 0.05800898 0.05603494 0.04163723 0.04622024 0.04677767 0.007736933
11 0.280273472 0.59839662 0.74167893 0.75352655 0.75108785 0.72345468 0.65395063 0.32957749 0.08357061 0.33165070 0.731228429
12 0.107398713 0.10983041 0.13630594 0.19905651 0.47014034 0.72519345 0.69545405 0.62194265 0.49873996 0.16549282 0.087689371
13 0.164520925 0.22763832 0.50824238 0.59686660 0.68419908 0.66837348 0.62380175 0.20226234 0.11425066 0.09725765 0.078701134
14 0.076934267 0.09684586 0.10703672 0.08436558 0.10789735 0.24130640 0.36615645 0.42805115 0.42937392 0.51390288 0.584757257
15 0.055565174 0.06796064 0.07519020 0.05498454 0.05754891 0.06377643 0.06537049 0.05152625 0.05783594 0.05963775 0.022556411
16 0.126975964 0.19394191 0.53324900 0.60905758 0.67072084 0.61613836 0.55415573 0.18317823 0.13453799 0.09835233 0.067080267
17 0.730333357 0.65759923 0.59045925 0.63148539 0.36305458 0.40829673 0.48734552 0.58647457 0.66968986 0.48312152 0.453863785
18 0.196450179 0.33968393 0.51538678 0.44868341 0.22221050 0.18934329 0.19179838 0.18764290 0.22423578 0.27524872 0.608625015
12 13 14 15 16 17 18 19 20 21 22
1 0.038553121 0.040081485 0.05358118 0.07403555 0.05091901 0.042299806 0.04322122 0.05587749 0.06881493 0.09753878 0.10462942
2 0.618447812 0.048885425 0.06231155 0.08228801 0.05963307 0.022666894 0.09384802 0.07914030 0.08549148 0.08373159 0.07404309
3 0.179434300 0.679981042 0.69176338 0.74453573 0.70937271 0.289762839 0.17956945 0.68770664 0.73864122 0.73187173 0.34604987
4 0.005094105 0.007952117 0.02076629 0.04174891 0.02129751 0.010066515 0.01454399 0.04337116 0.05259742 0.05795045 0.04533231
5 0.554122074 0.322792638 0.21839661 0.18322419 0.05764354 0.041600287 0.04692187 0.04305403 0.05762126 0.06212474 0.05289008
6 0.719147265 0.481543275 0.20168371 0.19885731 0.27223662 0.587549079 0.66694312 0.76974309 0.45266122 0.23338301 0.09435850
7 0.019041585 0.005380972 0.01856521 0.03947278 0.01221314 0.004858193 0.01322566 0.02001854 0.02755861 0.03889634 0.03102918
8 0.031368415 0.024535386 0.04031225 0.06011198 0.03558484 0.027890723 0.04100022 0.04572906 0.05465957 0.06437218 0.06308497
9 0.290487995 0.109253389 0.09076971 0.11177720 0.08365271 0.074780381 0.07845467 0.08843678 0.12696256 0.15252180 0.16108674
10 0.004599971 0.004843833 0.02327683 0.05022203 0.02867540 0.013674600 0.02376855 0.03408261 0.04563785 0.04991278 0.04216682
11 0.702763718 0.204497547 0.05554607 0.07056242 0.04561622 0.027652748 0.05185238 0.03544719 0.04735368 0.05194280 0.05193089
12 0.087884047 0.068055513 0.07587232 0.09912338 0.09637278 0.085378227 0.09348430 0.09237792 0.10785289 0.22242136 0.28522539
13 0.050134608 0.060945434 0.07203437 0.09687331 0.07316602 0.067771770 0.07634787 0.08154630 0.09157153 0.08930093 0.09904561
14 0.255098748 0.323642069 0.34568802 0.42105224 0.41797424 0.434900416 0.39764147 0.30798058 0.31269146 0.42912436 0.52562571
15 0.015262751 0.027712972 0.03813722 0.07103989 0.05202094 0.040513502 0.04066496 0.23360454 0.34666910 0.62701471 0.61683636
16 0.052436966 0.080045644 0.11447572 0.10672800 0.07924541 0.064626998 0.07234429 0.06744468 0.07878329 0.08901864 0.07953835
17 0.422132751 0.127518376 0.13062324 0.15104667 0.12490013 0.110841862 0.10892834 0.07984952 0.09097741 0.15193027 0.18654107
18 0.662904286 0.247251060 0.20583902 0.32290931 0.47391488 0.574805088 0.64776018 0.73091902 0.27798841 0.35922799 0.36333131
23 24 n_calls
1 0.23100480 0.30027592 0
2 0.07209460 0.06670631 1
3 0.30800154 0.27452357 2
4 0.04148986 0.03842700 0
5 0.05362370 0.05018294 0
6 0.08703911 0.08242964 0
7 0.03186000 0.03233006 0
8 0.05789078 0.05637648 0
9 0.25593446 0.29909342 1
10 0.03615961 0.03356159 0
11 0.05754763 0.06368048 1
12 0.45794999 0.56138753 0
13 0.16676533 0.22718405 0
14 0.63646856 0.29169414 0
15 0.64039251 0.60901138 0
16 0.08805636 0.09688941 0
17 0.36883747 0.41561690 1
18 0.37085132 0.36292634
Any idea how to proceed?
We can use rowSums on two subsets of the dataset, one dropping the last column and one dropping the first, so that the dimensions are the same and each column is compared with its right-hand neighbour:
rowSums(df[-length(df)] > 0.7 & df[-1] < 0.4)
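A tiny worked example may make the shifted comparison concrete (toy 3×4 data frame, made up for illustration):

```r
# Toy data: 3 rows, 4 columns of values in [0, 1]
df <- data.frame(a = c(0.8, 0.2, 0.9),
                 b = c(0.3, 0.8, 0.8),
                 c = c(0.9, 0.1, 0.2),
                 d = c(0.1, 0.5, 0.9))

# df[-length(df)] drops the last column, df[-1] drops the first,
# so each cell is paired with the cell immediately to its right
df$n_calls <- rowSums(df[-length(df)] > 0.7 & df[-1] < 0.4)
# n_calls is 2, 1, 1 for these three rows
```

Row 1 counts the pairs (0.8, 0.3) and (0.9, 0.1); rows 2 and 3 each have exactly one high-then-low pair.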

Create a Table from a Data frame with some conditions

From the following data frame, I am trying to output two tables, one for PASS and another for FAIL. The condition is that the output for each table should contain only the ID and the Score. Can anyone help me with this? I am still getting to know the full capabilities of the table function. If anyone could suggest other alternatives, I would greatly appreciate it, as long as the conditions for the output are met.
> df <- data.frame(
ID <- as.factor(c(20260, 11893, 54216, 11716, 53368, 46196, 40007, 20970, 11802, 46166, 23615, 11865, 16138, 64789, 43211, 66539));
Score <- c(9,7,6,2,10,7,8,10,6,7,7,9,9,9,10,8)
Remark<- as.factor(c("PASS","PASS","FAIL","FAIL","PASS","PASS","PASS","PASS","FAIL","PASS","PASS","PASS","PASS","PASS","PASS","PASS"))
)
> df
ID Score Remark
1 20260 9 PASS
2 11893 7 PASS
3 54216 6 FAIL
4 11716 2 FAIL
5 53368 10 PASS
6 46196 7 PASS
7 40007 8 PASS
8 20970 10 PASS
9 11802 6 FAIL
10 46166 7 PASS
11 23615 7 PASS
12 11865 9 PASS
13 16138 9 PASS
14 64789 9 PASS
15 43211 10 PASS
16 66539 8 PASS
Something like this?
df <- data.frame(
ID = as.factor(c(20260, 11893, 54216, 11716, 53368, 46196, 40007, 20970, 11802, 46166, 23615, 11865, 16138, 64789, 43211, 66539)),
Score = c(9,7,6,2,10,7,8,10,6,7,7,9,9,9,10,8),
Remark = as.factor(c("PASS","PASS","FAIL","FAIL","PASS","PASS","PASS","PASS","FAIL","PASS","PASS","PASS","PASS","PASS","PASS","PASS"))
)
df[df$Remark == "PASS", 1:2]
ID Score
1 20260 9
2 11893 7
5 53368 10
6 46196 7
7 40007 8
8 20970 10
10 46166 7
11 23615 7
12 11865 9
13 16138 9
14 64789 9
15 43211 10
16 66539 8
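As an alternative worth considering, split() produces both tables in a single call rather than one subset per Remark level (smaller made-up data below, same structure):

```r
# split() partitions the ID/Score columns by the Remark factor in one step
df <- data.frame(
  ID = as.factor(c(20260, 11893, 54216)),
  Score = c(9, 7, 6),
  Remark = as.factor(c("PASS", "PASS", "FAIL"))
)
tables <- split(df[, c("ID", "Score")], df$Remark)
tables$PASS  # ID and Score of the passing rows
tables$FAIL  # ID and Score of the failing row
```

This scales to any number of Remark levels without writing one filter per level.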

Extracting data from dataframe using different dataframe without headers (R)

I have gridded data as a data frame that holds daily temperatures (in K) for 30 years. I need to extract the data for days that match another data frame, keeping the first and second columns (lon and lat).
Data example:
Gridded data, from which I need to remove the days that do not match the days in the second data frame (df2$Dates):
>head(Daily.df)
lon lat 1991-05-01 1991-05-02 1991-05-03 1991-05-04 1991-05-05 1991-05-06 1991-05-07 1991-05-08 1991-05-09
1 5.000 60 278.2488 280.1225 280.3909 279.4138 276.6809 276.2085 276.6250 277.7930 276.9693
2 5.125 60 278.2514 280.1049 280.3789 279.4395 276.7141 276.2467 276.6571 277.8264 277.0225
3 5.250 60 278.2529 280.0871 280.3648 279.4634 276.7437 276.2849 276.6918 277.8608 277.0740
4 5.375 60 278.2537 280.0687 280.3488 279.4858 276.7691 276.3238 276.7289 277.8960 277.1232
5 5.500 60 278.2537 280.0493 280.3319 279.5066 276.7909 276.3633 276.7688 277.9313 277.1701
6 5.625 60 278.2539 280.0294 280.3143 279.5264 276.8090 276.4042 276.8111 277.9666 277.2147
1991-05-10 1991-05-11 1991-05-12 1991-05-13 1991-05-14 1991-05-15 1991-05-16 1991-05-17 1991-05-18 1991-05-19
1 276.9616 277.3436 273.3149 274.4931 274.6967 275.6298 272.2511 271.5413 271.7289 271.7964
2 276.9689 277.2988 273.3689 274.5399 274.6801 275.6307 272.2214 271.4445 271.6410 271.7023
3 276.9720 277.2533 273.4225 274.5811 274.6646 275.6241 272.1858 271.3391 271.5424 271.5989
4 276.9716 277.2080 273.4726 274.6146 274.6507 275.6109 272.1456 271.2274 271.4340 271.4872
5 276.9689 277.1632 273.5163 274.6382 274.6380 275.5917 272.1022 271.1121 271.3168 271.3693
6 276.9645 277.1190 273.5507 274.6501 274.6263 275.5672 272.0571 270.9955 271.1919 271.2469
1991-05-20 1991-05-21 1991-05-22 1991-05-23 1991-05-24 1991-05-25 1991-05-26 1991-05-27 1991-05-28 1991-05-29
1 272.2633 268.0039 268.5981 269.4139 267.7836 265.8771 263.5669 266.1666 269.7285 272.5083
2 272.2543 268.0218 268.5847 269.4107 267.7886 265.8743 263.5125 266.1031 269.6471 272.4676
3 272.2434 268.0369 268.5716 269.4089 267.7910 265.8669 263.4592 266.0332 269.5697 272.4217
4 272.2308 268.0507 268.5597 269.4090 267.7925 265.8559 263.4066 265.9581 269.4987 272.3714
5 272.2164 268.0642 268.5505 269.4112 267.7936 265.8425 263.3546 265.8797 269.4355 272.3175
6 272.2005 268.0793 268.5451 269.4154 267.7962 265.8276 263.3039 265.7997 269.3818 272.2614
1991-05-30 1991-05-31 1991-06-01 1991-06-02 1991-06-03 1991-06-04 1991-06-05 1991-06-06 1991-06-07 1991-06-08
1 274.2950 273.4715 274.5197 274.7548 273.8259 272.4433 274.1811 274.4135 274.3999 276.0327
2 274.2205 273.4638 274.5292 274.8316 273.8658 272.4700 274.1992 274.4426 274.4650 276.0698
3 274.1421 273.4549 274.5373 274.9027 273.9028 272.4980 274.2160 274.4781 274.5309 276.1012
4 274.0609 273.4452 274.5438 274.9665 273.9365 272.5273 274.2322 274.5211 274.5969 276.1255
5 273.9784 273.4353 274.5482 275.0216 273.9660 272.5576 274.2481 274.5725 274.6617 276.1417
6 273.8960 273.4253 274.5508 275.0668 273.9912 272.5887 274.2649 274.6334 274.7239 276.1487
1991-06-09 1991-06-10 1991-06-11 1991-06-12 1991-06-13 1991-06-14 1991-06-15 1991-06-16 1991-06-17 1991-06-18
1 276.5216 277.1812 277.8093 278.3013 278.5323 278.5403 277.9563 278.3461 275.8296 273.8277
2 276.5531 277.1925 277.8261 278.3409 278.4956 278.5317 277.9148 278.3234 275.8167 273.8302
3 276.5861 277.2065 277.8457 278.3748 278.4503 278.5181 277.8654 278.2939 275.8057 273.8358
4 276.6204 277.2239 277.8684 278.4029 278.3988 278.4996 277.8080 278.2583 275.7966 273.8427
5 276.6564 277.2466 277.8945 278.4253 278.3423 278.4759 277.7429 278.2171 275.7888 273.8504
6 276.6938 277.2753 277.9242 278.4414 278.2834 278.4472 277.6715 278.1714 275.7819 273.8570
1991-06-19 1991-06-20 1991-06-21 1991-06-22 1991-06-23 1991-06-24 1991-06-25 1991-06-26 1991-06-27 1991-06-28
1 275.1738 274.6805 275.6100 274.8936 273.5818 273.2099 273.1788 271.2747 273.2458 276.9931
2 275.1808 274.7123 275.7043 274.9494 273.5861 273.1770 273.2280 271.2435 273.2662 276.9822
3 275.1859 274.7478 275.7993 275.0009 273.5956 273.1439 273.2730 271.2133 273.2803 276.9678
4 275.1891 274.7879 275.8941 275.0467 273.6107 273.1106 273.3130 271.1840 273.2886 276.9502
5 275.1902 274.8337 275.9870 275.0857 273.6318 273.0777 273.3472 271.1556 273.2918 276.9307
6 275.1891 274.8864 276.0776 275.1168 273.6589 273.0454 273.3752 271.1285 273.2905 276.9101
1991-06-29 1991-06-30
1 272.0784 273.5677
2 272.0577 273.5973
3 272.0339 273.6237
4 272.0075 273.6476
5 271.9794 273.6701
6 271.9500 273.6925
The second data frame, whose Dates variable I'm using for the extraction:
>head(df2)
Dates Temp Wind.S Wind.D
1 5/1/1991 18 4 238
2 5/2/1991 18 8 93
3 5/4/1991 22 8 229
4 5/6/1991 21 4 81
5 5/7/1991 21 8 192
6 5/9/1991 17 8 32
7 5/13/1991 22 8 229
8 5/18/1991 21 4 81
9 6/2/1991 21 8 192
10 6/7/1991 17 8 32
The header of the final data I'm looking for is something like this:
>head(df3)
lon lat 1991-05-01 1991-05-02 1991-05-04 1991-05-06 1991-05-09 1991-05-13
Example data following the format of yours
Daily.df <- data.frame(lon=1:5,lat=1:5,A=1:5,B=1:5,C=1:5,D=1:5)
colnames(Daily.df) <- c("lon","lat","1991-05-01","1991-05-02","1991-05-03","1991-05-04")
lon lat 1991-05-01 1991-05-02 1991-05-03 1991-05-04
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
df2 <- data.frame(Dates = c("5/1/1991","5/2/1991","5/4/1991"))
Dates
1 5/1/1991
2 5/2/1991
3 5/4/1991
Use lubridate to convert df2$Dates into the right format and make a vector of the column names you want to keep (thesedates), including lon and lat. Then use select_at to keep those columns.
library(lubridate)
library(dplyr)
thesedates <- c("lon","lat",as.character(mdy(df2$Dates)))
new.df <- Daily.df %>%
  select_at(vars(thesedates))
Output
lon lat 1991-05-01 1991-05-02 1991-05-04
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
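If lubridate and dplyr aren't available, the same column selection can be sketched in base R; the as.Date/format round-trip below assumes the column headers are Y-m-d strings, as in the example (toy data mirroring it):

```r
# Toy data in the same layout as the example above
Daily.df <- data.frame(lon = 1:3, lat = 1:3, a = 1:3, b = 4:6, c = 7:9)
colnames(Daily.df) <- c("lon", "lat",
                        "1991-05-01", "1991-05-02", "1991-05-03")
df2 <- data.frame(Dates = c("5/1/1991", "5/3/1991"))

# Convert the m/d/Y dates into the Y-m-d strings used as column headers
keep <- c("lon", "lat",
          format(as.Date(df2$Dates, format = "%m/%d/%Y"), "%Y-%m-%d"))
new.df <- Daily.df[, keep]
```

Subsetting by a character vector of column names keeps lon and lat plus exactly the matching date columns.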
If you want to have a long data set to match, I would think you need to first convert the dates in df2 into the proper format and then wrangle the data into wide format.
Step 1 - Convert dates into correct format
df2$Dates <- as.Date(df2$Dates, format = "%m/%d/%Y")
Step 2 - convert to wide format
library(tidyr)
spread(df2, Dates, Temp) # using one of the value columns, e.g. Temp
