Mapping between HDFS Daemon and Kerberos Principal and Unix Account

In my organization, to access the hadoop cluster we do the following on the Gateway:
sudo su -
cd /etc/username/
kinit some_string/instance -k -t some_string.keytab
hadoop fs -ls
This works perfectly fine, but I am trying to understand what exactly is going on.
When I do a 'whoami' obviously it shows 'root'. But any files created the above way on HDFS have the owner as 'some_string' and group as 'hdfs'. And I can neither kinit nor access HDFS as any other user. Why is this so?
Is this because Hadoop's HDFS daemon is mapped to the Kerberos principal (and that principal's ticket is only accessible to me as the root user), and that principal is also mapped to the OS account some_string, which is what I see as the owner of the files on HDFS? If so, where is the link defined (Hadoop daemon to principal to OS account)?
I tried googling around a lot but could not find a definitive answer to my confusion. Even when I log in to HUE with my own user, I do not have write access to these files, which is also something I want to understand how to resolve.
Thanks.
Edit:
$ klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: some_string/instance@CLOUDERA.xxxx.CORP
Valid starting     Expires            Service principal
03/02/16 21:06:19  03/03/16 21:06:19  krbtgt/CLOUDERA.xxxx.CORP@CLOUDERA.xxxx.CORP
	renew until 03/02/16 21:06:19

So when you are executing the command below
kinit some_string/instance -k -t some_string.keytab
you are requesting a ticket for the principal stored in your some_string.keytab file, which you can inspect using the command
klist -k some_string.keytab
It will show you the principal name and key version. The keytab file also contains the key (password), which is why kinit does not prompt for one.
The second question is answered by the klist output as well: it shows the principal, which has the form user/host@REALM. In your case the user is some_string, and once you hold a ticket for some_string you are some_string as far as Kerberos is concerned, so your commands are executed as some_string and the owner of any files you create will be some_string.
You can also list the keys in a keytab with klist -k (or your current tickets with plain klist); see the output below:
[root@myhostname ~]# klist -k some_Name.keytab
Keytab name: FILE:some_Name.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   1 myuser/myhostname@MYREAL.COM
Here the keytab belongs to the principal myuser on the host myhostname.
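As for where the principal-to-OS-account link is defined: Hadoop translates Kerberos principals into short local user names using the hadoop.security.auth_to_local rules in core-site.xml; the default rule simply strips the /instance and @REALM parts, which is why some_string/instance@CLOUDERA.xxxx.CORP shows up as the owner some_string. One way to check the mapping on your cluster (using the principal from this question; output format may vary by Hadoop version) is:
hadoop org.apache.hadoop.security.HadoopKerberosName some_string/instance@CLOUDERA.xxxx.CORP
With the default rules this should print something like:
Name: some_string/instance@CLOUDERA.xxxx.CORP to some_string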

Related

Set password for RStudio Server with AWS EC2 instance

I managed to follow all the steps to create an EC2 instance and install RStudio Server on it.
When I go to the RStudio Server page to connect (which looks something like "ec2-[Public IP]-.eu-west-3.compute.amazonaws.com:8787"), I am asked for a username and a password.
I figured out how to set a username ("user1") this way:
$ sudo useradd user1
But then when I try this command to write the password:
echo user1:password | chpasswd
I receive this message:
chpasswd: cannot lock /etc/passwd; try again later.
I looked at different solutions suggested here:
https://superuser.com/questions/296373/cannot-lock-etc-passwd-try-again-later
but I do not see a resolution to my problem.
Nor did I find any passwd.lock, shadow.lock, group.lock, or gshadow.lock files to remove.
Type 'sudo passwd your_username' and you will be prompted to enter a new password.
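For the record, the chpasswd attempt above most likely failed because it was run without root privileges: only root can lock /etc/passwd. With the user1 name from the question, either of these sketches should work:
$ sudo passwd user1
New password:
Retype new password:
passwd: password updated successfully
$ echo 'user1:password' | sudo chpasswd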

Google Cloud: Compute VM Instances

How do I get root access to my Google VM instance, and how can I log into my VM instance from my PC with an SSH client such as PuTTY?
I would also like to add that I have tried to use sudo for things that need root access, such as yum or wget, but it does not work: it asks me for the root password, which I do not know and do not know where to get.
You can become root via sudo su. No password is required.
How do I use sudo to execute commands as root?
(splitting this off from the other answer since there are multiple questions within this post)
Once you connect to your GCE VM using PuTTY or gcloud compute ssh, or even by clicking the "SSH" button on the Developers Console next to the instance, you should be able to use the sudo command. Note that you shouldn't be using the su command to become root; just run:
sudo [command]
and it should not prompt you for a password.
If you want to get a root shell to run several commands as root and you want to avoid prefixing all commands with sudo, run:
sudo su -
If you're still having issues, please post a new question with the exact command you're running and the output that you see.
sudo su root <enter key>
No password required :)
If you want to connect to your GCE (Google Cloud) server with PuTTY as root, here is the flow:
Use PuTTYgen to generate two ppk files:
one for your gce-default-user
one for root
Do the following in PuTTY (replace gce-default-user with your GCE username):
Putty->session->Connection->data->Auto-login username: gce-default-user
Putty->session->Connection->SSH->Auth->Private-key for authentication: gce-default-user.ppk
Then connect to the server as your gce-default-user
and make the following changes in sshd_config:
sudo su
nano /etc/ssh/sshd_config
PermitRootLogin yes
UsePAM no
Save+exit
service sshd restart
Putty->session->Connection->data->Auto-login username: root
Putty->session->Connection->SSH->Auth->Private-key for authentication: root-gce.ppk
Now you can log in as root via PuTTY.
If you need to use the Eclipse Remote System Explorer and log in as root:
Eclipse->windows->preferences->General->network Connection->SSH2->private-keys:
root-gce.ppk
Please try sudo su - on GCE.
By default on GCE, no password is required to sudo ("do as a substitute user"). The - argument to su (substitute user) further simulates a full login, using the target user's configured login shell and its profile scripts to set up a new environment (the default target user for both commands is root). In any case you'll at least notice the prompt change from ending in $ to #.
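For example (hypothetical prompt on a default GCE image):
user@instance:~$ sudo su -
root@instance:~# whoami
root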
Just go to the shell by clicking SSH, then set a password for the root user using sudo:
sudo passwd
and it will change the root password. Then, to become root, use the command
su
and type your password to become root.
How do I connect to my GCE instance using PuTTY?
(splitting this off from the other answer since there are multiple questions within this post)
Take a look at setting up ssh keys in the GCE documentation which shows how to do it; here's the summary but read the doc for additional notes:
Generate your keys using ssh-keygen or PuTTYgen for Windows, if you haven't already.
Copy the contents of your public key. If you just generated this key, it can probably be found in a file named id_rsa.pub.
Log in to the Developers Console.
In the navigation, Compute->Compute Engine->Metadata.
Click the SSH Keys tab.
Click the Edit button.
In the empty input box at the bottom of the list, enter the corresponding public key, in the following format:
<protocol> <public-key> username@example.com
This makes your public key automatically available to all of your instances in that project. To add multiple keys, list each key on a new line.
Click Done to save your changes.
It can take several minutes before the key is inserted into the instance. Try connecting with ssh to your instance. If it is successful, your key has been propagated to the instance.
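For example, a minimal run with ssh-keygen (the file name and comment here are illustrative):
ssh-keygen -t rsa -f ~/.ssh/gce-key -C username@example.com
cat ~/.ssh/gce-key.pub
# prints: ssh-rsa AAAA...key-text... username@example.com
# paste that whole line into the SSH Keys tab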

Syncing local and remote directories using rsync+ssh+public key as a user different to the ssh key owner

The goal is to sync local and remote folders over ssh.
My current user is user1, and I have password-less access set up over ssh to a server, server1.
I want to sync local folder with a folder on server1 by means of rsync utility.
Normally I would run:
rsync -rtvz /path/to/local/folder server1:/path/to/remote/folder
ssh access works as expected, and rsync is able to connect over ssh, but it returns a "Permission denied" error because on server1 the folder /path/to/remote/folder is owned by user2:user2, and its permissions do not allow it to be altered by anyone else.
user1 is a sudoer on server1, so sudo su - user2 works during an ssh session.
How do I force rsync to switch the user once it has ssh'ed into the server?
Adding user1 to the group user2 is not an option because all user/group management on the server is done automatically, replicated from a central repo every X minutes, to which I have no access.
Same for changing permissions/ownership of the destination folder: it is updated automatically on a regular basis with a reset of all permissions.
A possible solution that comes to mind is a script that syncs the local folder to a temporary intermediate remote folder owned by user1 on the server, and then syncs the two remote folders as user2.
Googling for a shorter and prettier solution did not yield any success.
I have not tried it by myself, but how about using rsync's '--rsync-path' option?
rsync -rtvz --rsync-path='sudo -u user2 rsync' /path/to/local/folder server1:/path/to/remote/folder
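Note that for the remote sudo -u user2 rsync to work non-interactively, user1's sudo access on server1 must not prompt for a password; if it does, you would need a sudoers rule along these lines (the rsync path is an assumption):
user1 ALL=(user2) NOPASSWD: /usr/bin/rsync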
To fix the permissions problem you need to run rsync over an SSH session that logs in remotely as user2:
rsync -avz -e 'ssh -i privatekeyfile' /path/to/local/folder/ user2@server1:/path/to/remote/folder
The following answer explains how to set up the SSH keys: Ant, download fileset from remote machine
Set up password-less access for user1 to access user2#server1, then do:
rsync -rtvz /path/to/local/folder user2@server1:/path/to/remote/folder
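One way to set up that password-less access, assuming you can still authenticate as user2 once (e.g. with a password):
ssh-keygen -t rsa -f ~/.ssh/id_rsa_user2    # generate a dedicated key pair
ssh-copy-id -i ~/.ssh/id_rsa_user2.pub user2@server1    # install the public key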

How to change the owner for a rsync

I understand how rsync preserves permissions.
However, in my case my local computer does not have the user the files need to be owned by for the webserver. So when I rsync, I need the owner and group to be apache on the webserver, but my username on my local computer. Any suggestions?
I wanted to clarify to explain exactly what I need done.
My personal computer: named 'home' with the user account 'michael'
My web server: named 'server' with the user account 'remote' and user account 'apache'
Current situation: My website is on 'home' with the owner 'michael' and on 'server' with the owner 'apache'. 'home' needs to be using the user 'michael' and 'server' needs to be using the user 'apache'
Task: rsync my website on 'home' to 'server' but have all the files owned by 'apache' and in the group 'apache'
Problem: rsync will preserve the permissions, owner, and group; however, I need all the files to be owned by apache. I know that not preserving the owner will assign the files to the connecting user on 'server', but since that user is 'remote' it uses that instead of 'apache'. I cannot rsync as the user 'apache' (which would be nice), since that is a security risk I'm not willing to open up.
My only idea for solving this: after each rsync, manually chown -R and chgrp -R, but it's a huge system and this takes a long time, especially since this is going to production.
Does anyone know how to do this?
Current command I use to rsync:
rsync --progress -rltpDzC --force --delete -e "ssh -p22" ./ remote#server.com:/website
If you have access to rsync v.3.1.0 or later, use the --chown option:
rsync -og --chown=apache:apache [src] [dst]
More info in an answer to a similar question here: ServerFault: Rsync command issues, owner and group permissions doesn't change
There are hacks you could put together on the receiving machine to get the ownership right -- running 'chown -R apache /website' out of cron would be an effective but pretty kludgey option -- but instead, I'd recommend securely allowing rsync-over-ssh-as-apache.
You'd create a dedicated ssh keypair for this:
ssh-keygen -f ~/.ssh/apache-rsync
and then take ~/.ssh/apache-rsync.pub over to the webserver, where you'd put it into ~apache/.ssh/authorized_keys and carefully specify the allowed command, something like so, all on one line:
command="rsync --server -vlogDtprCz --delete . /website",from="IP.ADDR.OF.SENDER",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAABKEYPUBTEXTsVX9NjIK59wJ+fjDgTQtGwhATsfidQbO6u77dbAjTUmWCZjKAQ/fEFWZGSlqcO2yXXXXXXXXXXVd9DSS1tjE6vAQaRdnMXBggtn4M9rnePD2qlR5QOAUUwhyFPhm6U4VFhRoa3wLvoqCVtCV0cuirB6I45On96OPijOwvAuz3KIE3+W9offomzHsljUMXXXXXXXXXXMoYLywMG/GPrZ8supIDYk57waTQWymUyRohoQqFGMzuDNbq+U0JSRlvLFoVUZ5Piz+gKJwwiFwwAW2iNag/c4Mrb/BVDQAyEQ== comment#email.address
and then your rsync command on your "home" machine would be something like
rsync -av --delete -e 'ssh -i ~/.ssh/apache-rsync' ./ apache@server:/website
There are other ways to skin this cat, but this is the clearest and involves the fewest workarounds, to my mind. It prevents getting a shell as apache, which is the biggest security concern, natch. If you're really deadset against allowing ssh as apache, there are other ways ... but this is how I've done it.
References here: http://ramblings.narrabilis.com/using-rsync-with-ssh, http://www.sakana.fr/blog/2008/05/07/securing-automated-rsync-over-ssh/
Recent versions of rsync (at least 3.1.1) allow you to specify the "remote ownership":
--usermap=tom:www-data
This changes tom's ownership to www-data (i.e. the PHP/Nginx user). If you are using a Mac as the client, use brew to upgrade to the latest version. On your server, download the source archive and "make" it!
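A hypothetical invocation (note that -a already implies -o/--owner, and the receiving side still needs enough privilege, such as root or a sudo'ed rsync, to actually set ownership):
rsync -avz --usermap=tom:www-data /path/to/local/folder/ user@server:/var/www/html/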
The solution using rsync --chown USER:GROUP [src] [dst] only works if the remote user has write access to the destination directory, which in most cases is not the case.
Here's another solution:
Overview
(srcmachine)    (rsync)    (destmachine)
srcuser ------- SSH ------> destuser
                               |
                               | sudo su jenkins
                               |
                               v
                            jenkins
Let's say that you want to rsync:
From:
Machine: srcmachine
User: srcuser
Directory: /var/lib/jenkins
To:
Machine: destmachine
User: destuser to establish the SSH connection.
Directory: /tmp
Final files owner: jenkins.
Solution
rsync --rsync-path 'sudo -u jenkins rsync' -avP --delete /var/lib/jenkins destuser#destmachine:/tmp
Read more here:
https://unix.stackexchange.com/a/546296/116861
rsync version 3.1.2
I mostly work on Windows locally, so this is the command line I use to sync files with the (Debian) server:
user@user-PC /cygdrive/c/wamp64/www/projects
$ rsync -rptgoDvhnP --chown=www-data:www-data --exclude=.env --exclude=vendor --exclude=node_modules --exclude=.git --exclude=tests --exclude=.phpintel --exclude=storage ./website/ username#hostname:/var/www/html/website
-n : perform a trial run with no changes made; to actually execute the command, remove the -n option.

Using ec2-init scripts with Ubuntu on EC2 - Automatically set hostname and register with Route53

I'd really like to be able to use the ec2-init scripts to do some housekeeping when I spin up an instance. Ideally I'd like to be able to pass user data to set the hostname and run a couple of initialization scripts (to configure puppet etc.).
I see a script called ec2-set-hostname but I'm not sure if you can use it to set an arbitrary hostname from user-data or what the format of the user-data would need to be.
Has anyone used these scripts, and do you know if I can set the hostname and run some scripts at the same time?
Thanks in advance.
In the end I decided to skip the Ubuntu ec2 scripts and do something similar myself. I looked into using Amazon's Route53 service as the name service and it was really easy to get it up and running.
Using Route53
Here is what I did. First, I used the IAM tools to create a user 'route53' with liberal policy permissions for interacting with the Route53 service.
Create the dns group & user
iam-groupcreate -g route53 -v
iam-usercreate -u route53 -g route53
Create keys for the user and note these for later
iam-useraddkey -u route53
Give access to the group to add zones and dns records
iam-grouplistpolicies -g route53
iam-groupaddpolicy -p hostedzone -e Allow -g route53 -a route53:* -r '*'
List the users and policies for a group
iam-grouplistusers -g route53
iam-grouplistpolicies -g route53
iam-grouplistpolicies -g route53 -p hostedzone
To add and remove DNS record entries I used the excellent Python wrapper library for Route53, cli53, which takes a lot of the pain out of using Route53. You can grab it from here:
https://github.com/barnybug/cli53
In my case the Python script is symlinked in /usr/bin as cli53. You'll need to set the following environment variables with the keys created earlier for the route53 user.
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXXX
You then need to create a zone entry for your domain, e.g. simple.org:
cli53.py create simple.org
This should return an Amazon nameserver address that you can associate with your domain name via your domain name registrar, so that hostname lookups for the domain will be redirected to the Route53 servers.
Once the zone is set up, adding and removing entries is simple, e.g.:
cli53 rrcreate simple.org hostname CNAME ec2-184-73-137-40.compute-1.amazonaws.com
cli53 rrdelete simple.org hostname
We use a CNAME entry with the Public DNS name of the ec2 instance as this hostname will resolve to the public IP externally and the private IP from within EC2. The following adds an entry for a host 'test2.simple.org'.
cli53 rrcreate simple.org test2 CNAME ec2-184-73-137-40.compute-1.amazonaws.com --ttl 60 --replace
Automatically set hostname and update Route53
Now what remains is to set up a script to do this automatically when the machine boots. This solution and the following script owe a huge debt to Marius Ducea's excellent tutorial found here:
http://www.ducea.com/2009/06/01/howto-update-dns-hostnames-automatically-for-your-amazon-ec2-instances/
It's basically doing the same as Marius' setup, but using Route53 instead of Bind.
The script uses the simple REST based services available to each EC2 Instance at
http://169.254.169.254
to retrieve the actual Public DNS name and grab the desired hostname from the instance. The hostname is passed to the instance using the customizable 'user-data' which we can specify when we start the instance. The script expects user-data in the format
hostname=test2
The script will
grab hostname info from the instance user-data
grab the public DNS name from the instance metadata
parse out the hostname
set the hostname to the fully qualified name, e.g. test2.simple.org
add a CNAME record for this FQDN in Route53 pointing to the Public DNS Name
write an entry into the Messages of the day so that users can see the domain to ec2 mapping when they log in
Copy and save the following as /usr/bin/autohostname.sh
#!/bin/bash
DOMAIN=simple.org
USER_DATA=`/usr/bin/curl -s http://169.254.169.254/latest/user-data`
EC2_PUBLIC=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-hostname`
HOSTNAME=`echo $USER_DATA| cut -d = -f 2`
#set also the hostname to the running instance
FQDN=$HOSTNAME.$DOMAIN
hostname $FQDN
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxxxxxx
# Update Route53 with a CNAME record pointing the hostname to the EC2 public DNS name
# in this way it will resolve correctly to the private ip internally to ec2 and
# the public ip externally
RESULT=`/root/dns/cli53/cli53.py rrcreate $DOMAIN $HOSTNAME CNAME $EC2_PUBLIC --ttl 60 --replace`
logger "Created Route53 record with the result $RESULT"
# write an MOTD file so that the hostname is displayed on login
MESSAGE="Instance has been registered with the Route53 nameservers as '$FQDN' pointing to ec2 domain name '$EC2_PUBLIC'"
logger "$MESSAGE"
cat<<EOF > /etc/update-motd.d/40-autohostname
#!/bin/bash
# auto generated on boot by /usr/bin/autohostname.sh via rc.local
echo "$MESSAGE"
EOF
chmod +x /etc/update-motd.d/40-autohostname
exit 0
To get the script to run at boot time, we add a line in /etc/rc.local e.g.
/usr/bin/autohostname.sh
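Make sure the script is executable as well, or rc.local will not be able to run it:
chmod +x /usr/bin/autohostname.sh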
Change the user-data for the test instance to 'hostname=test2' and reboot the instance. Once it reboots, you should be able to login to it via test2.simple.org. It may take a couple of minutes for this to resolve correctly, depending on the TTLs you specified. When you login, you should see a MOTD message telling you
Instance has been registered with the Route53 nameservers as 'test2.simple.org' pointing to ec2 domain name 'ec2-184-73-137-40.compute-1.amazonaws.com'
Once you have this working with the test instance it would make sense to back it up as an AMI that you can use to create other instances with the same autohostnaming abilities.
HTH
I installed the route53 gem, and wrote a little script:
gem install route53
#!/bin/bash
DATE=`date +%Y%m%d%H%M%S`
export HOME=/root
export DEBIAN_FRONTEND=noninteractive
export PATH=/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/local/aws/bin:/usr/local/node:$PATH
export JAVA_HOME=/usr/java/current
export EC2_HOME=/usr/local/aws
export EC2_PRIVATE_KEY=/root/.ec2/pk-XXXXXXXXXXXXXXXXXXXXXXX
export EC2_CERT=/root/.ec2/cert-XXXXXXXXXXXXXXXXXXXX
export EC2_INSTANCE_ID=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`
echo "$EC2_INSTANCE_ID"
mkdir /root/$EC2_INSTANCE_ID
ec2din $EC2_INSTANCE_ID > /root/$EC2_INSTANCE_ID/$EC2_INSTANCE_ID.txt
export FQDN=`cat /root/$EC2_INSTANCE_ID/$EC2_INSTANCE_ID.txt |grep Name |awk '{print $5}'`
export EC2_DNS_NAME=`cat /root/$EC2_INSTANCE_ID/$EC2_INSTANCE_ID.txt |grep INSTANCE |awk '{print $4}'`
/usr/bin/ruby1.8 /usr/bin/route53 -g -z /hostedzone/XXXXXXXX --name $FQDN. --type CNAME --ttl 60 --values $EC2_DNS_NAME > /tmp/route53.out 2>&1
-Josh
I've taken a similar approach to 'sgargan' in that I allow an instance to create its own DNS record in Route 53, but instead I've used the Python AWS library 'boto', and I have configured 'systemd' (the init/upstart replacement released in Fedora 15/16) to remove the DNS entry when the host is shut down.
Please see the following walk-through on how to do it:
http://www.practicalclouds.com/content/blog/1/dave-mccormick/2012-02-28/using-route53-bring-back-some-dns-lovin-your-cloud
Whilst it isn't ideal to expose your internal IPs in an external DNS zone file, until such time as Amazon creates an internal DNS service I think it is preferable to running your own BIND instances.
The link mentioned in the previous answer is no longer available, but it is still accessible via the Wayback Machine: http://web.archive.org/web/20140709022644/http://www.practicalclouds.com/content/blog/1/dave-mccormick/2012-02-28/route53-bring-back-some-dns-lovin-ec2
