I have a range of dates on which some products were bought. I create a sort of pivot table relating the products and the dates, but there are dates on which nothing was sold. I can find the missing dates and even add them to the main data frame; the problem is that instead of keeping the date format, they adopt an integer format (the integer being the number of days since the origin) and I can't order them. The code I'm using is this:
upper.bound<- paste("01", month[1], 2013, sep="-")
lower.bound <- paste("30", month[4], 2013, sep="-")
dates <- seq(as.Date(upper.bound, "%d-%m-%Y"), as.Date(lower.bound, "%d-%m-%Y"), "days")
diff <- setdiff(dates, as.Date(colnames(export_f_ub), "%Y-%m-%d"))
len <- dim(as.matrix(diff))[1]*11
aux <- data.frame()
aux <- seq(0,0,length.out=len)
dim(aux) <- c(11, dim(as.matrix(diff))[1])
col_dates <- as.Date(diff, origin="1970-01-01")
colnames(aux)<- c(col_dates)
This was an attempt to set up a matrix of zeros and then bind it to the main one. But it doesn't work: in the result, the column names come out as numeric. Here's a screenshot of the console:
[screenshot of console output]
I've never seen someone try to assign a Date vector as column names of a matrix. Dimension names must always be character strings, so in general this is not something you should be doing.
That said, the intuitive expectation would be that R's column-name-assignment machinery coerces the Date vector to character along the lines of as.character(), so that you'd get the text representation of the dates rather than a stringification of their underlying double values.
Calling `colnames<-`() on a matrix eventually calls `dimnames<-`() which drops into the C code by running .Primitive("dimnames<-"). I haven't really looked into the C implementation, but we can guess that at some point it pulls out the double values underlying the Date vector, coerces them to character, and that's why you end up with numbers as your column names.
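For illustration, here's a minimal reproduction of the behaviour described in the question, using a throwaway 2x3 matrix in place of the real data; the numeric names are the day counts underlying the Date values:
m <- matrix(0,2L,3L);
colnames(m) <- as.Date(c('2013-06-03','2013-06-04','2013-06-05'));
colnames(m);
## [1] "15859" "15860" "15861"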
The correct approach here is to call as.character() yourself when assigning the names:
col_dates <- as.Date(c('2013-06-03','2013-06-04','2013-06-05','2013-06-06','2013-06-08','2013-06-22','2013-07-07','2013-07-08','2013-07-11','2013-07-13','2013-07-23','2013-07-25','2013-07-26','2013-08-27','2013-09-03','2013-09-04','2013-09-05','2013-09-06','2013-09-07','2013-09-09','2013-09-10','2013-09-11','2013-09-13','2013-09-14','2013-09-15','2013-09-16','2013-09-18','2013-09-20','2013-09-21','2013-09-22','2013-09-24','2013-09-30'));
aux <- matrix(0,11L,length(col_dates));
colnames(aux) <- as.character(col_dates);
aux;
## 2013-06-03 2013-06-04 2013-06-05 2013-06-06 2013-06-08 2013-06-22 2013-07-07 2013-07-08 2013-07-11 2013-07-13 2013-07-23 2013-07-25 2013-07-26 2013-08-27 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-09 2013-09-10 2013-09-11 2013-09-13 2013-09-14 2013-09-15 2013-09-16 2013-09-18 2013-09-20 2013-09-21 2013-09-22 2013-09-24 2013-09-30
## [1,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [11,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
But you should be aware that the column names do not retain the Date class or internal representation (referring to the double values); they are pure character strings. If you want to recreate the Date vector from the column names, you'll have to run them through as.Date().
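For example, continuing from the aux matrix above:
as.Date(colnames(aux))[1:3];
## [1] "2013-06-03" "2013-06-04" "2013-06-05"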
And by the way, dim(as.matrix(x))[1] for a vector x is an unnecessarily roundabout way of getting length(x).
I have a geotiff image created using GDAL.
I would like to know if it is possible, using GDAL and Python (or only one of them), to extract the pixel percentage for a specific value.
In particular, if I do:
gdalinfo -hist input.tif
I get all the metadata info and, in particular,
Size is 4901, 2867
...
Band 1 Block=4901x1 Type=Byte, ColorInterp=Palette
Minimum=0.000, Maximum=5.000, Mean=2.263, StdDev=1.135
0...10...20...30...40...50...60...70...80...90...100 - done.
256 buckets from -0.5 to 255.5:
1740973 365790 6385650 3688110 1757506 113138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Is there a way to calculate the pixel percentage of the 6 values defined in the histogram?
The size of the .tif file is 4901x2867 so if I can extract each of those fields using GDAL and/or Python then I can calculate something like this:
pixel_value_0 = 1740973/(4901x2867)
and get the percentage of the pixel value 0
If you are using Python, you can read the raster into a NumPy array and then do the calculation there:
from collections import Counter
from osgeo import gdal_array
# Read raster data as numeric array from file
rasterArray = gdal_array.LoadFile('RGB.byte.tif')
# The 3rd band
band3 = rasterArray[2]
# Flatten the 2D array to 1D and count occurrences of each value;
# from there it is simple to get the count for any particular pixel value
print(Counter(band3.flatten()))
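From there, dividing each count in the Counter by band3.size (the total number of pixels in the band) gives the percentage for each pixel value.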
I have the following difficult problem; here is a short example of my data. Assume that I have two data sets (my real example has about 20). The data frames are produced as a list by a self-written function called with lapply, so I put the data frames in my example into a list, too. Then I rbind them to compute a frequency table.
df1 <- data.frame(rev(seq(12:0)), paste0("a=",sample(0:12, 13, replace=T)))
colnames(df1) <- c("k", "a")
df2 <- data.frame(rev(seq(12:0)), paste0("a=",sample(0:12, 13, replace=T)))
colnames(df2) <- c("k", "a")
list_df <- list(df1,df2)
df_combine<- plyr::ldply(list_df, rbind)
freq_foo <- table(df_combine$k,df_combine$a)
I get a frequency table of the following form.
a=0 a=11 a=12 a=2 a=5 a=6 a=7 a=8 a=3 a=9
1 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 0 0 0 0 1
3 1 0 0 0 0 1 0 0 0 0
4 0 0 0 1 0 1 0 0 0 0
5 0 0 0 1 1 0 0 0 0 0
6 0 0 0 0 0 0 1 0 0 1
7 0 1 1 0 0 0 0 0 0 0
8 1 0 0 0 0 1 0 0 0 0
9 0 0 0 0 0 0 2 0 0 0
10 0 0 1 0 1 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 1 0 1 0
13 1 0 1 0 0 0 0 0 0 0
I want to extend and manipulate my table in the following way:
First, the table should cover the range a=0 to a=15, so any missing column should be added. Second, I want to order the columns from 0 to 15.
For the first problem I tried
if(freq_foo$paste0("a=",0:15) == F){freq_foo$paste("a=",0:15) <- 0}
but this would only work for data frames, not for tables. Also, I have no idea how to order the columns in ascending order. The data type isn't important to me because I just want to use the output for further calculations, so it can also be a data frame instead of a table.
#convert freq_foo table to dataframe
df <- as.data.frame.matrix(freq_foo)
#add all zeros column for missing column name in 0:15 series
df[, paste0("a=", c(0:15)[!(c(0:15) %in% as.numeric(gsub(".*=(\\d+)", "\\1", names(df))))])] <- 0
#order columns from 0 to 15
df <- df[, order(as.numeric(gsub(".*=(\\d+)", "\\1", names(df))))]
Output is:
a=0 a=1 a=2 a=3 a=4 a=5 a=6 a=7 a=8 a=9 a=10 a=11 a=12 a=13 a=14 a=15
1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
3 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0
5 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
6 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
7 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
8 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
10 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
11 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
12 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
13 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
(Edit: Updated code after getting a requirement clarification from OP)
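An alternative, if you can rebuild the table from df_combine, is to turn the a column into a factor with the full set of levels first; table() then creates all sixteen columns, in order, by itself (a sketch assuming the a= labels from the question):
df_combine$a <- factor(df_combine$a, levels = paste0("a=", 0:15))
freq_foo <- table(df_combine$k, df_combine$a)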
I am working on a homework assignment using intervention analysis. The question is:
Generate a simulation of the difference equation y_t = a_0 + a_1*y_(t-1) + c_0*z_t + x_t, where x_t is the forcing process x_t = w_t, w_t is white noise, and |a_1| < 1. Define the intervention variable z_t as binary (0,1); you may choose the start time of the intervention, and assume the intervention lasts for 2 units of time.
So I wrote this code:
set.seed(50)
y <- w <- rnorm(200, sd=1)
alpha0 <- 1
alpha1 <- 0.9
cee0 <- 1
z <-rep(0, 200)
for (t in 1:200) {z[t] <- ifelse( t = 78:79,1,0)}
So the intervention would occur at the 78th and 79th instant.
But this does not work. I keep getting this error/warning message:
In z[t] <- ifelse(t = 77:78, 1, 0) :
number of items to replace is not a multiple of replacement length
I have tried the analysis using a continuous intervention at the 100th instant and it works fine:
z <-rep(0, 200)
for (t in 1:200) {z[t] <- ifelse( t > 100,1,0)}
So why does the t > 100 work but t = 77:78 not work? Is there something I am missing here?
You could change your command as follows. In your version, t = 78:79 is parsed as a named argument (partially matching ifelse's test parameter), so the test is the length-2 vector 78:79; ifelse therefore returns two values, which cannot be assigned to the single element z[t], hence the warning. With t > 100 the test is a single logical value, so the assignment works. %in% performs the membership test you intended:
for (t in 1:200) {z[t] <- ifelse( t %in% 78:79,1,0)}
> z
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[57] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[113] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[169] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
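For completeness, here is a minimal sketch of the full simulation with a vectorized construction of z (no loop needed); the starting value y[1] is an assumption, since the question does not specify one:
set.seed(50)
n <- 200
w <- rnorm(n, sd=1)
alpha0 <- 1
alpha1 <- 0.9
cee0 <- 1
z <- as.numeric(1:n %in% 78:79)  # intervention at t = 78 and 79
y <- numeric(n)
y[1] <- alpha0 + cee0*z[1] + w[1]  # assumed starting value
for (t in 2:n) {
  y[t] <- alpha0 + alpha1*y[t-1] + cee0*z[t] + w[t]
}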
I have the following data structure:
ID value
1 1 1
2 1 63
3 1 2
4 1 58
5 2 3
6 2 4
7 3 34
8 3 25
Now I want to turn it into a kind of dyadic data structure: every pair of IDs sharing the same value should have a relationship.
I tried several options, and:
df_wide <- dcast(df, ID ~ value)
... which got me a long way down the road:
ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 39 40
1 1001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0
4 1011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1018 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 1030 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
8 1036 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now my main problem is turning this into a proper matrix so I can get an igraph object out of it.
df_wide_matrix <- data.matrix(df_wide)
df_aus_wide_g <- graph.edgelist(df_wide_matrix ,directed = TRUE)
doesn't get me there...
I also tried to transform it into an adjacency matrix...
df_wide_matrix <- get.adjacency(graph.edgelist(as.matrix(df_wide), directed=FALSE))
... but it didn't work either
If you want to create an edge between all IDs with the same value, try something like this instead. First merge the data frame onto itself by the value. Then, remove the value column, and remove all (undirected) edges that are duplicate or just points. Finally, convert to a two-column matrix and create the edges.
res <- merge(df, df, by='value', all=FALSE)[,c('ID.x','ID.y')]
res <- res[res$ID.x<res$ID.y,]
resg <- graph.edgelist(as.matrix(res))
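A small usage sketch with made-up data in which some IDs actually share a value (the sample data in the question has no shared values, so it would produce no edges); converting the IDs to character before building the graph keeps them as vertex names rather than integer vertex ids:
library(igraph)
df <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                 value = c(1, 63, 1, 4, 63, 4))
res <- merge(df, df, by='value', all=FALSE)[,c('ID.x','ID.y')]
res <- res[res$ID.x<res$ID.y,]
# character matrix -> vertex names instead of integer indices
resg <- graph.edgelist(apply(as.matrix(res), 2, as.character), directed=FALSE)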
I am importing a file and trying to display only the numbers in each row, without any commas or labels. With the following code, my output is given below:
mydata <- read.table("/home/mukhera3/Desktop/Test/part-r-00000", sep=",")
mydata
Output
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V461 V462 V463 V464 V465 V466 V467 V468 V469 V470 V471 V472 V473 V474 V475 V476 V477 V478 V479 V480 V481 V482 V483
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V484 V485 V486 V487 V488 V489 V490 V491 V492 V493 V494 V495 V496 V497 V498 V499 V500 V501 V502 V503 V504 V505 V506
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V507 V508 V509 V510 V511 V512 V513 V514 V515 V516 V517 V518 V519 V520 V521 V522 V523 V524 V525 V526 V527 V528 V529
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V530 V531 V532 V533 V534 V535 V536 V537 V538 V539 V540 V541 V542 V543 V544 V545 V546 V547 V548 V549 V550 V551 V552
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V553 V554 V555 V556 V557 V558 V559 V560 V561 V562 V563 V564 V565 V566 V567 V568 V569 V570 V571 V572 V573 V574 V575
When I replace the "," for sep with whitespace (sep=""), keeping everything else the same, this is what I get:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
I want to display just the numbers 0, 1, ... without any commas, row numbers, or other labels. I am new to R programming and do not know how to do this. Any help would be appreciated.
If you want your file to be read directly as a vector and not as a dataframe, you can, for instance, use scan instead of read.table. Example with your example file saved as a.txt in my working directory:
> mydata <- scan(file="a.txt",sep=",")
Read 46 items
> mydata
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
You can also get that result from read.table with some additional steps:
> mydata <- read.table("a.txt",sep=",") # Reads your file as a data.frame
> mydata <- unlist(mydata) # Transforms into a named vector
> names(mydata) <- NULL # Gets rid of the names
> mydata
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
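Those two steps can also be collapsed with unname():
> mydata <- unname(unlist(read.table("a.txt",sep=",")))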
If you just want to "display" it like that but don't want to change the nature of your table, you can simply use cat (combined with unlist):
> mydata <- read.table("a.txt",sep=",")
> cat(unlist(mydata))
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0