This question already has answers here: Reference for proper handling of PID file on Unix. Closed 10 years ago.
Is the .pid file on Unix created automatically by the operating system whenever a process runs, or should the process create the .pid file itself, e.g. with "echo $$ > myprogram.pid"?
The latter -- the process must create the .pid file itself (the OS won't do it). PID files are often used by daemon processes for various purposes. For example, they can be used to prevent a process from running more than once, and they let control processes (e.g. apache2ctl) know which process to send signals to.
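For illustration only, here is a minimal sketch of the idea in R (any language works the same way; a shell daemon would simply run echo $$ > myprogram.pid as in the question). The file name and the naive staleness handling are just placeholders:

# The process records its own PID; the operating system never writes this file.
pidfile <- "myprogram.pid"

# Very naive single-instance check: a stale file left by a crashed run
# would also block startup, so real daemons usually verify the old PID.
if (file.exists(pidfile)) {
  stop("PID file exists (PID ", readLines(pidfile, n = 1),
       ") -- another instance may already be running")
}

writeLines(as.character(Sys.getpid()), pidfile)

# ... do the real work ...

unlink(pidfile)   # remove the PID file on clean shutdown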
I would like to set scrapyd's maximum number of parallel processes automatically, without editing the config file.
I know there is a config file to set up max_proc and max_proc_per_cpu. I am wondering if it's possible to start scrapyd via shell like:
scrapyd --max_proc=32
I haven't found a suitable option when listing the available commands with scrapyd -h.
Does anyone know of a solution? Maybe editing the config file using Python?
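For context, the config-file route I want to avoid looks roughly like this in scrapyd.conf (option names as in the scrapyd docs; the values are only an example):

[scrapyd]
max_proc = 32
max_proc_per_cpu = 8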
This question already has answers here: sftp get files with R [closed] (1 answer) and Schedule weekly Excel file download to an unique name (1 answer). Closed 4 years ago.
My company's data department uploads a .csv file daily to a remote folder that I can only access through my company's network using FileZilla.
Every day, I take the newest .csv file and process the data in R. I want to automate this process by reading the daily .csv file from the remote folder with R's read.csv function.
How can I tell FileZilla to copy the file from the shared folder to a local folder on my PC every day at 6:00 a.m.? If that isn't possible, how can I access the remote folder from R and read the .csv file from there?
Thanks in advance!
EDIT:
As seen here, FileZilla does not allow any sort of automation. You can use the WinSCP client instead: write a script that downloads/uploads files from/to a remote SFTP server and schedule it to run every n days with the Windows Task Scheduler.
Now, in order to access an SFTP server from R, you can use the RCurl package. Unfortunately, this closed question (which was closed because it was not even a question to begin with) purges unwanted lines of code from an FTP server (even though the title says SFTP), and it doesn't specify the user, password or port settings. Moreover, it uses writeLines(), which, as I understand it, is used to create files, not to download them.
This question specifically refers to downloading a .csv file from a shared folder using the SFTP protocol. Given that FileZilla is no good for this, how can I manage to do it in R using RCurl?
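To frame the question, this is the kind of RCurl sketch I have in mind (untested; it assumes the libcurl behind RCurl was built with sftp:// support, and the host, credentials and path below are placeholders):

library(RCurl)

# Placeholders -- replace with the real host, credentials and path.
url <- "sftp://sftp.example.com:22/shared/daily_data.csv"

# getURL() returns the file contents as a single string, provided the
# underlying libcurl supports the SFTP protocol (libssh2).
raw <- getURL(url, userpwd = "myuser:mypassword")

daily <- read.csv(text = raw, stringsAsFactors = FALSE)
head(daily)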
Closed 4 years ago. This question is opinion-based and is not currently accepting answers.
I'm a little bit confused about process and open file tables.
I know that if 2 processes try to open the same file, there will be 2 entries in the open file table. I am trying to find out the reason for this.
Why are there 2 entries created in the open file table when 2 different processes open the same file? Why can't it be done with 1 entry?
I'm not quite clear what you mean by "file tables". There are no common structures in the Linux kernel referred to as "file tables".
There is /etc/fstab, which stands for "filesystem table", which lists filesystems which are automatically mounted when the system is booted.
The "filetable" Stack Overflow tag that you included in this question is for SQL Server and not directly connected with Linux.
What it sounds like you are referring to when you talk about open files is links. See Hard and soft link mechanism. When a file is open in Linux, the kernel maintains what is basically another hard link to the file. That is why you can actually delete a file that is open and the system will continue running normally. Only when the application closes the file will the space on the disk actually be marked as free.
So for each inode on a filesystem (an inode is generally what we think of as a file), there are often multiple links--one for each entry in a directory, and one for each time an application opens the file.
Update: Here is a quote from the web page that inspired this question:
Each file table entry contains information about the current file. Foremost, is the status of the file, such as the file read or write status and other status information. Additionally, the file table entry maintains an offset which describes how many bytes have been read from (or written to) the file indicating where to read/write from next.
So, to directly answer the question, "Why are there 2 entries created in the open file table when 2 different processes open the same file?": 2 entries are required because they may contain different information. One process may open the file read-only while the other opens it read-write, and the file offset (position within the file) for each process will almost certainly be different.
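You can see the same behaviour from user space. In R, for instance (any language that can open a file twice shows it), two independent connections to one file each keep their own read position, much like two open file table entries:

# Each connection has its own offset, analogous to a separate
# open file table entry in the kernel.
writeLines(c("line 1", "line 2", "line 3"), "demo.txt")

con1 <- file("demo.txt", open = "r")
con2 <- file("demo.txt", open = "r")

readLines(con1, n = 1)   # "line 1" -- advances con1's offset only
readLines(con2, n = 1)   # "line 1" again -- con2 still starts at the top
readLines(con1, n = 1)   # "line 2"

close(con1)
close(con2)
unlink("demo.txt")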
This question already has an answer here: multiple spark application submission on standalone mode (1 answer). Closed 5 years ago.
I want to run a Spark wordcount application on four different files at the same time.
I have a standalone cluster with 4 worker nodes, each node having one core and 1 GB of memory.
Spark runs in standalone mode with:
1. 4 worker nodes
2. 1 core for each worker node
3. 1 GB memory for each node
4. core_max set to 1
./conf/spark-env.sh:
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"
export SPARK_WORKER_OPTS="-Dspark.deploy.defaultCores=1"
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=4
I have submitted the applications from a .sh file:
./bin/spark-submit --master spark://-Aspire-E5-001:7077 ./wordcount.R txt1 &
./bin/spark-submit --master spark://-Aspire-E5-001:7077 ./wordcount.R txt2 &
./bin/spark-submit --master spark://-Aspire-E5-001:7077 ./wordcount.R txt3 &
./bin/spark-submit --master spark://-Aspire-E5-001:7077 ./wordcount.R txt4
Is this the correct way to submit applications in parallel?
When one application runs alone it takes about 2 seconds (using only one core).
When 4 applications are submitted simultaneously, each one takes more than 4 seconds...
How do I run Spark applications on different files in parallel?
When you submit multiple jobs to a Spark cluster, the cluster manager (the standalone master here; the resource manager / application master when Spark runs on top of YARN) schedules the jobs in parallel automatically.
You don't need to do any extra scheduling for that.
For the scenario you have shown, you could also have read all the different files in a single Spark job.
Thanks to Spark's lazy evaluation, DAG optimizations and RDD transformations (logical/physical plans), reading the different files and counting words will proceed in parallel.
You can read all the files in a single job with:
sc.wholeTextFiles("<folder-path>")
The folder-path is the parent directory where all files reside.
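If you are working from SparkR (as the wordcount.R scripts suggest), a rough sketch of the single-job approach could look like this -- DataFrame API rather than sc.wholeTextFiles, with the master URL and folder path as placeholders:

library(SparkR)

# Connect to the standalone master (placeholder host name).
sparkR.session(master = "spark://your-master-host:7077",
               appName = "wordcount-all-files")

# read.text() picks up every file in the folder as one SparkDataFrame.
lines <- read.text("/path/to/txt-folder")

# Split each line into words and count them across all files at once.
words <- selectExpr(lines, "explode(split(value, ' ')) AS word")
counts <- count(groupBy(words, "word"))

head(arrange(counts, desc(counts$count)))

sparkR.session.stop()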
Is it possible to close an application that was launched from within R?
Assume that I have opened a CSV file my_file.csv with its associated application via the shell.exec function. I then want to close this application.
Since R has no control over other programs, you cannot reliably close files that were opened outside of R. You do not even know which program to close: on one computer a CSV file may be opened with Notepad, on another it may be opened with Excel.
If you know the program, you can use system2() or similar commands to kill it. For example, to close Excel execute system2("taskkill", args = "/im excel.exe"). Note that this will close all open instances of the program (here, Excel), not just a specific one.
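A minimal sketch putting the two steps together (Windows only; it assumes the CSV is associated with Excel and, as noted above, it kills every running Excel instance):

# Open the file with its associated application (Excel assumed here).
shell.exec("my_file.csv")

Sys.sleep(30)   # ... whatever work needs the file open ...

# /im selects processes by image name, so this closes ALL Excel windows.
system2("taskkill", args = c("/im", "excel.exe"))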