I am new to Julia. I want to plot a simple scatterplot from a DataFrame where the colors are stored as hexadecimal codes in a String7 column. Here is a snapshot:
 Row │ x      y      ncv_color
     │ Int64  Int64  String7
─────┼─────────────────────────
   1 │   120   4180  #005529
   2 │   120   3890  #004903
   3 │   110   4670  #004E66
   4 │   120   8270  #004A99
   5 │   120   9620  #005C5A
When I use the following code to draw a scatterplot, it works:
scatter(df.x, df.y)
However, when I use
As suggested by @ginkul, I now use this:
scatter(df2, df2, color=df.ncv_color)
but I get FigureAxisPlot() as output and no plot is shown.
Any help would be appreciated.
Output of versioninfo():
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7542 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
The problem is that you're passing ncv_color as a Symbol. You can pass a Symbol as a color, for example :red, but there is no color named ncv_color, so nothing can be displayed. scatter doesn't know that you mean a column of df; you need to pass the column explicitly:
scatter(df.x, df.y, color = df.ncv_color)
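Once the column itself is passed, each entry has to be a color the plotting library can parse. As a quick sanity check on the data, here is a minimal sketch in Python (a language-neutral illustration, not Makie's or Colors.jl's API) that flags entries which are not well-formed #RRGGBB codes and decodes valid ones into RGB triples. The helper names hex_to_rgb and invalid_colors are hypothetical.

```python
import re

# Matches a 6-digit hex color such as "#005529" (either letter case).
HEX_COLOR = re.compile(r"#([0-9A-Fa-f]{6})")


def hex_to_rgb(code):
    """Decode '#RRGGBB' into an (r, g, b) tuple of ints in 0..255."""
    match = HEX_COLOR.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a #RRGGBB color: {code!r}")
    digits = match.group(1)
    # Each pair of hex digits is one 8-bit channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def invalid_colors(codes):
    """Return the entries that are not well-formed #RRGGBB strings."""
    return [c for c in codes if HEX_COLOR.fullmatch(c.strip()) is None]


# The five values shown in the question's snapshot.
ncv_color = ["#005529", "#004903", "#004E66", "#004A99", "#005C5A"]
print(invalid_colors(ncv_color))  # []
print(hex_to_rgb("#004E66"))      # (0, 78, 102)
```

If invalid_colors returns anything for the full column, those rows are the ones worth inspecting before plotting.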
I have an R script that is encoded in UTF-8. When I run it in RStudio, there is no problem with the Turkish characters. However, when I try to run it from cmd, it throws an error:
Columns ÜrünAçiklama and HataTanimi don't exist.
It gives this error because the columns in my data frame are actually named 'ÜrünAçıklama' and 'HataTanımı'.
As you can see, there is no problem with the characters Ü, ü and Ç, but there is a problem with the dotless i (ı). I run the script from cmd with this line:
Rscript --encoding="UTF-8" myscript.r
My OS is Windows 10.
What should I do? Thanks in advance.
EDIT:
An example should help.
Here is my dataset. When I try to remove duplicate rows, I cannot access the columns whose names contain the dotless i. You can try it in your own cmd with the following script:
library(readxl)
rm(list = ls())
shell("cls")
df <- read_excel("stackoverflow.xlsx")
df$ÜrünNo
df$ÜrünAçıklama
df$HataTanımı
df$HataZamanı
df_nd <- df[!duplicated(df[,c("ÜrünNo","ÜrünAçıklama","HataZamanı")]),]
Also here is my CMD output:
[1] 1 2 3 3 4
[1] "X" "Y" "Z" "Z" "Q"
[1] "A" "B" "C" "C" "D"
[1] 10 11 12 12 13
Error in `vectbl_as_col_location()`:
! Can't subset columns past the end.
x Columns `ÜrünAçiklama` and `HataZamani` don't exist.
Backtrace:
x
1. +-df[!duplicated(df[, c("ÜrünNo", "ÜrünAçiklama", "HataZamani")])]
2. +-tibble:::`[.tbl_df`(...)
3. +-base::duplicated(df[, c("ÜrünNo", "ÜrünAçiklama", "HataZamani")])
4. +-df[, c("ÜrünNo", "ÜrünAçiklama", "HataZamani")]
5. \-tibble:::`[.tbl_df`(df, , c("ÜrünNo", "ÜrünAçiklama", "HataZamani"))
6. \-tibble:::vectbl_as_col_location(...)
7. +-tibble:::subclass_col_index_errors(...)
8. | \-base::withCallingHandlers(...)
9. \-vctrs::vec_as_location(j, n, names)
10. \-vctrs `<fn>`()
11. \-vctrs:::stop_subscript_oob(...)
12. \-vctrs:::stop_subscript(...)
13. \-rlang::abort(...)
Execution halted
As you can see, I can access the columns one by one, but when I try to remove duplicate rows, R says the columns do not exist.
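For comparison, the deduplication the script attempts (keep the first row for each combination of ÜrünNo, ÜrünAçıklama and HataZamanı, which is what df[!duplicated(...), ] does in R) is straightforward once the column names stay intact as Unicode strings. A minimal Python sketch using the values from the printed output above; drop_duplicates is a hypothetical helper, not the R or pandas function.

```python
def drop_duplicates(rows, keys):
    """Keep the first row for each combination of values in the `keys` columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:  # later rows with the same key are duplicates
            seen.add(key)
            result.append(row)
    return result


# Column values copied from the OP's printed output.
columns = {
    "ÜrünNo": [1, 2, 3, 3, 4],
    "ÜrünAçıklama": ["X", "Y", "Z", "Z", "Q"],
    "HataTanımı": ["A", "B", "C", "C", "D"],
    "HataZamanı": [10, 11, 12, 12, 13],
}
# Turn the columns into a list of row dicts (dict order is insertion order).
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
unique = drop_duplicates(rows, ["ÜrünNo", "ÜrünAçıklama", "HataZamanı"])
print(len(rows), "->", len(unique))  # 5 -> 4
```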
While researching, I came across this statement:
R 4.2 for Windows will support UTF-8 as native encoding, which will be a major improvement in encoding support, allowing Windows R users to work with international text and data.
Then I realized I was using R 4.1. Updating R to 4.2 or later is the easiest and fastest solution. Sorry for the inconvenience.
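The symptom (Ü, ü and Ç survive while ı becomes i) fits the column names being converted to a legacy single-byte code page that has no dotless i (U+0131), which is what UTF-8 as the native encoding in R 4.2 avoids. A minimal Python sketch of that difference; cp1252 is only an example of such a code page (the OP's actual code page is not stated), and unlike Windows best-fit conversion, Python reports the failure instead of silently substituting i.

```python
def unrepresentable(text, codepage):
    """Return the characters of `text` that `codepage` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


for name in ["ÜrünAçıklama", "HataTanımı", "HataZamanı"]:
    # cp1252 (Western European) has Ü, ü and Ç but no dotless ı;
    # cp1254 (Turkish) and UTF-8 can encode every character here.
    print(name, unrepresentable(name, "cp1252"), unrepresentable(name, "cp1254"))
```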
I have two different text files with lists of numbers that I want to plot. One file contains the x values and the other the y values. I know how to plot them if they are in the same file, but not when they are in separate files. How can I do this? I am using gnuplot.
In case it is useful, here are short excerpts from both files:
x values
0
563
1563
2563
3563
4563
5563
corresponding y values
738500.0
683000.0
647000.0
623500.0
607500.0
I guess I have seen such a question already, but I can't find it right now.
Linux (in contrast to Windows) has built-in tools such as paste that join two files line by line.
If you want to do this in gnuplot only (and hence platform-independently), here is a suggestion.
The prerequisite is that your files are already loaded into datablocks. For how to do this, see: gnuplot: load datafile 1:1 into datablock.
Code:
### merge files by line
reset session
$Data1 <<EOD
0
563
1563
2563
3563
4563
5563
EOD
$Data2 <<EOD
738500.0
683000.0
647000.0
623500.0
607500.0
EOD
maxRow = |$Data1| <= |$Data2| ? |$Data1| : |$Data2| # find the shorter datablock
set print $Data
do for [i=1:maxRow] {
print $Data1[i][1:strlen($Data1[i])-1]." ".$Data2[i]
}
set print
plot $Data u 1:2 w lp pt 7
### end of code
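Outside gnuplot, the same line-by-line merge is a few lines of Python, which is also platform-independent. A minimal sketch, assuming each input file holds one number per line; merge_by_line and merge_files are hypothetical helpers. As with maxRow in the gnuplot code, pairing stops at the shorter input, so with the excerpts above the last two x values are dropped.

```python
def merge_by_line(x_lines, y_lines):
    """Pair the i-th x with the i-th y as 'x y'; zip() stops at the shorter input."""
    return [f"{x.strip()} {y.strip()}" for x, y in zip(x_lines, y_lines)]


def merge_files(x_path, y_path, out_path):
    """Write merged 'x y' lines from two one-column files to out_path."""
    with open(x_path) as fx, open(y_path) as fy, open(out_path, "w") as out:
        for line in merge_by_line(fx, fy):  # file objects iterate line by line
            out.write(line + "\n")


x_values = ["0", "563", "1563", "2563", "3563", "4563", "5563"]
y_values = ["738500.0", "683000.0", "647000.0", "623500.0", "607500.0"]
print("\n".join(merge_by_line(x_values, y_values)))
```

With hypothetical file names, merge_files("x.txt", "y.txt", "xy.txt") produces a two-column file that gnuplot can plot directly with plot 'xy.txt' u 1:2 w lp pt 7.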
I'm trying to prepare a dataset to use as training data for a deep neural network. It consists of 13 .txt files, each between 500 MB and 2 GB in size. However, when I run a script called data_prepare.py, I get the ValueError from this post's title.
Following answers to previous posts, I loaded my data into R and checked for both NaN and infinite values, but the checks suggest there is nothing wrong with my data. I did the following:
I load my data as one single data frame using the magrittr, data.table and purrr packages (about 300 million rows, each with 7 variables):
txt_fread <-
list.files(pattern="*.txt") %>%
map_df(~fread(.))
I have used sapply to check for finite and NaN values:
>any(sapply(txt_fread, is.finite))
[1] TRUE
> any(sapply(txt_fread, is.nan))
[1] FALSE
I have also tried loading each file into a Jupyter notebook and checking it individually for those values with the following commands:
file1= pd.read_csv("File_name_xyz_intensity_rgb.txt", sep=" ", header=None)
np.any(np.isnan(file1))
False
np.all(np.isfinite(file1))
True
And print(file1.info()) gives this:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22525176 entries, 0 to 22525175
Data columns (total 7 columns):
# Column Dtype
--- ------ -----
0 0 float64
1 1 float64
2 2 float64
3 3 int64
4 4 int64
5 5 int64
6 6 int64
dtypes: float64(3), int64(4)
memory usage: 1.2 GB
None
I know the script (data_prepare.py) works, because it runs properly on a similar dataset. So the problem must be in the new data, but I don't know what I have missed or done wrong while checking for NaNs and infinite values. I have also tried reading and checking the .txt files individually, but that hasn't helped much.
Any help is really appreciated!
By the way, the R code with map_df came from a post by leerssej in "How to import multiple .csv files at once?".
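One thing worth noting about the R check: any(sapply(txt_fread, is.finite)) is TRUE as soon as a single value is finite, so it cannot rule out NaN or Inf; the test that matters is whether all values are finite. A minimal Python sketch of a whole-file scan that streams each text file instead of loading it, assuming space-separated numeric columns. find_bad_values and scan_text_file are hypothetical helpers; the optional float32 limit is only relevant if data_prepare.py (not shown) casts to float32, which is an assumption.

```python
import math

FLOAT32_MAX = 3.4028234663852886e38  # largest finite float32 value


def find_bad_values(rows, limit=None):
    """Yield (row_index, column_index, value) for NaN, +/-inf, or |value| > limit."""
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            too_large = limit is not None and abs(value) > limit
            if not math.isfinite(value) or too_large:
                yield r, c, value


def scan_text_file(path, limit=None):
    """Stream a space-separated numeric file line by line and report bad values."""
    with open(path) as handle:
        rows = ([float(tok) for tok in line.split()] for line in handle)
        yield from find_bad_values(rows, limit)


sample = [[1.0, 2.0, 3.0], [float("nan"), 5.0, 6.0], [7.0, float("inf"), 1e39]]
print(any(math.isfinite(v) for row in sample for v in row))  # True, like the R any() check
print(all(math.isfinite(v) for row in sample for v in row))  # False: the data is not clean
print([(r, c) for r, c, _ in find_bad_values(sample)])        # [(1, 0), (2, 1)]
print([(r, c) for r, c, _ in find_bad_values(sample, FLOAT32_MAX)])  # adds (2, 2)
```

Running scan_text_file on each of the 13 files and printing the first few hits would point to the exact line and column of any offending value.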
I am trying to create a graph from a table I've made. I want to graph the values per month, split by the Scheduled column. Unfortunately, the chart shows months as values like .75, 2.25 and 4.75 instead of the actual month numbers, and I don't know why.
I have tried changing the type of graph, the SUMVAR, and the axes and their values, but none of this has helped. It worked at one point but then simply stopped, and I cannot figure out why.
month scheduled Arr_num
1     SKED      7573
1     UNSK      1882
2     SKED      6635
2     UNSK      1642
3     SKED       817
3     UNSK       208
4     SKED      9494
4     UNSK      2376
5     SKED      1900
5     UNSK       551
6     SKED      9864
6     UNSK      3319
7     SKED      9770
7     UNSK      4145
pattern1 value=solid color=CXc01933;
pattern2 value=solid color=CX003366;
axis1 label=(angle=90 'Amount of Wheelchair Requests');
axis2 label=('Month') order=(0 to 12 by 1);
proc gchart data=Overall_Arr;
vbar month / type=sum SUMVAR=Arr_num subgroup=scheduled raxis=axis1 maxis=axis2
autoref clipref ;
run;
This is the table, and this is the code to make the graph. I expect a graph with two differently colored bars, one for the scheduled number and one for the unscheduled number. Before I added the order to the second axis, it produced a graph, but with strange month values such as .75 or 4.25 instead of 1, 2, 3, and so on. Now it outputs no bars at all, I assume because it is still using those odd values while I have restricted the axis to whole-number months. Any help would be appreciated.
Alright, I think I figured it out: the problem was that month is also a command, so renaming my variable let it be treated as a variable instead of a command.
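Fractional midpoints like .75 and 2.25 are typical of a numeric chart variable being binned as continuous; the SAS/GRAPH documentation describes a DISCRETE option on the VBAR statement that treats a numeric chart variable as discrete, which may also be worth trying. Either way, the target chart is one stacked bar per whole-number month. For reference, a Python sketch (not SAS) of the per-month, per-subgroup sums that type=sum with SUMVAR=Arr_num and subgroup=scheduled should display, using the rows from the table above; sum_by_month is a hypothetical helper.

```python
from collections import defaultdict

# (month, scheduled, Arr_num) rows copied from the table in the question.
rows = [
    (1, "SKED", 7573), (1, "UNSK", 1882),
    (2, "SKED", 6635), (2, "UNSK", 1642),
    (3, "SKED", 817), (3, "UNSK", 208),
    (4, "SKED", 9494), (4, "UNSK", 2376),
    (5, "SKED", 1900), (5, "UNSK", 551),
    (6, "SKED", 9864), (6, "UNSK", 3319),
    (7, "SKED", 9770), (7, "UNSK", 4145),
]


def sum_by_month(rows):
    """Sum counts per whole-number month and subgroup (one stacked bar per month)."""
    totals = defaultdict(lambda: defaultdict(int))
    for month, group, count in rows:
        totals[month][group] += count
    return {m: dict(groups) for m, groups in sorted(totals.items())}


for month, groups in sum_by_month(rows).items():
    print(month, groups, "bar height:", sum(groups.values()))
```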
The following data shows my projects, their time frames, and their phases. I would like to visualize this data with the R ggplot() code shown below. However, as you can see below, I get an error when the months are inferred from the data. I would like to use month names as x-axis labels. I also need to print the project names beside the rectangular boxes. Please help me with this. Thank you.
> temp
projects starts ends order Phase
A 2013-02-15 2013-03-15 1 Research
A 2013-03-16 2013-04-15 1 Prototype
B 2013-04-07 2013-04-30 2 Research
B 2013-05-01 2013-08-30 2 Prototype
C 2013-05-01 2013-07-30 3 Research
D 2013-05-01 2013-07-30 4 Research
> a = ggplot(temp, aes(xmin = starts, xmax = ends, ymin = order, ymax = order+0.5)) + geom_rect(aes(fill=Phase), color="black") + theme_bw()
> b = a + geom_text(aes(x= starts + (ends-starts)/2 ,y=order+0.25, label=projects))
> b
Error in unit(x, default.units) : 'x' and 'units' must have length > 0
In addition: Warning messages:
1: In Ops.factor(ends, starts) : - not meaningful for factors
2: In Ops.factor(starts, (ends - starts)/2) : + not meaningful for factors
3: Removed 6 rows containing missing values (geom_text).
Here is my R version:
> version
_
platform i686-pc-linux-gnu
arch i686
os linux-gnu
system i686, linux-gnu
status
major 2
minor 15.2
year 2012
month 10
day 26
svn rev 61015
language R
version.string R version 2.15.2 (2012-10-26)
Try converting starts and ends to Date:
temp$starts <- as.Date(temp$starts)
temp$ends <- as.Date(temp$ends)
If that does not work, you may want to run dput(temp) and paste its output into your question.
This was checked by copying and pasting the OP's data, converting to Date, and then running the OP's code.
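Once starts and ends are real dates, both remaining requests reduce to date arithmetic: the label x position is the midpoint starts + (ends - starts)/2, and month-name tick labels are the first day of each month in the range formatted as abbreviated names (in ggplot2 the tick labels are typically set through scale_x_date). A language-neutral Python sketch using the question's data; label_position and month_starts are hypothetical helpers, not ggplot2 functions.

```python
from datetime import date


def label_position(start, end):
    """Midpoint of an interval, like starts + (ends - starts)/2 in the ggplot code."""
    # (end - start) / 2 can include half a day; adding a timedelta to a date
    # ignores the sub-day part, so odd-length intervals round down.
    return start + (end - start) / 2


def month_starts(first, last):
    """First day of every month from first's month through last's month."""
    year, month = first.year, first.month
    ticks = []
    while (year, month) <= (last.year, last.month):
        ticks.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ticks


projects = [
    ("A", date(2013, 2, 15), date(2013, 3, 15)),
    ("A", date(2013, 3, 16), date(2013, 4, 15)),
    ("B", date(2013, 4, 7), date(2013, 4, 30)),
    ("B", date(2013, 5, 1), date(2013, 8, 30)),
    ("C", date(2013, 5, 1), date(2013, 7, 30)),
    ("D", date(2013, 5, 1), date(2013, 7, 30)),
]
for name, start, end in projects:
    print(name, label_position(start, end))

ticks = month_starts(min(p[1] for p in projects), max(p[2] for p in projects))
print([t.strftime("%b") for t in ticks])  # abbreviated month names (locale-dependent)
```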