Java 828242
http://www.ccs.neu.edu/home/matthias/HtDP2e/index.html
Size of a file: du -h /a.txt
hadoop123
install pdsh
cd pdsh
./configure
make
make install
pdsh -R exec -w 192.168.1.3[0-2] ssh -x -l %u %h yum -y install krb5-workstation.x86_64
pdsh -R exec -w 192.168.1.2[1-7] ssh -x -l %u %h date -s \"16 APR 2012 19:04:09\"
Replace words in vi (Linux, in the editor)
:%s/foobar/hadoop/g
:%s/\/dfsdata//g
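The same substitutions can be run non-interactively with sed; a quick sketch (the sample strings are made up, the patterns are the ones from the vi commands above):

```shell
# :%s/foobar/hadoop/g equivalent: replace every "foobar" with "hadoop"
echo "foobar cluster foobar node" | sed 's/foobar/hadoop/g'

# :%s/\/dfsdata//g equivalent: strip every "/dfsdata"; using | as the
# sed delimiter avoids escaping the slashes
echo "/dfsdata/storage/name" | sed 's|/dfsdata||g'
```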
2012-02-21 01:36:55,819 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
Datanode state: LV = -19 CTime = 1328715960120 is newer than the
namespace state: LV = -19 CTime = 0
Number of mappers depends on the input file (its splits)
Configuration
Important to know the current features and how the new APIs are evolving
Job
Inputformats
Mapperclass
Reducerclass
Outputformat
Key and value
Java reflection
Amount of reads you do vs. amount of writes you do
Network bandwidth
Number of songs per artist
Write your own object
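The songs-per-artist count can be prototyped on the command line before writing the MapReduce job; the artist names below are made-up sample data, and sort | uniq -c plays the role of the shuffle and reduce steps:

```shell
# Count occurrences per key: sort groups the keys (shuffle),
# uniq -c counts each group (reduce). Sample data is made up.
printf 'beatles\nstones\nbeatles\nbeatles\n' | sort | uniq -c | sort -rn
```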
LZO rpms http://pkgs.repoforge.org/lzo/
Networking (Linux)
http://www.linuxhomenetworking.com/wiki/index.php/Main_Page
crontab
*/10 * * * * netstat -plten >> /root/netstat.log 2>&1
*/10 * 3 * * netstat -plten 2>&1 | mail -s "cronjob output" [email protected]
# umount /media/disk/
umount: /media/disk: device is busy
umount: /media/disk: device is busy
The first thing you'll probably do is close down all your terminals and xterms, but here's a better way: use the fuser command to find out which process is keeping the device busy:
# fuser -m /dev/sdc1
/dev/sdc1: 538
# ps auxw|grep 538
donncha 538 0.4 2.7 219212 56792 ? SLl Feb11 11:25 rhythmbox
Mount problem
mount -o remount,rw /
Edit vi /etc/fstab
/dev/sda1
/storage/data1
reboot
clusteradmin ALL=(ALL) NOPASSWD: ALL
hwclock --set --date="5/1/10 15:48:07"
# date -s "2 OCT 2006 18:00:00"
date --set="2 OCT 2006 18:00:00"
date +%Y%m%d -s "20081128"
date +%T -s "10:13:13"
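GNU date can also parse these strings without touching the clock, which is handy for checking a format before running date -s:

```shell
# Parse a date string and reformat it without setting the system clock
# (same string the date -s example above uses)
date -d "2 OCT 2006 18:00:00" +%Y-%m-%d_%H:%M:%S
```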
Linux commands
http://support.nagios.com/knowledgebase/faqs/index.php?
option=com_content&view=article&id=52&catid=35&faq_id=305&expand=f
alse&showdesc=true
http://yahoo.github.com/hadoop-common/installing.html
export PATH=$PATH:/usr/bin/
export JAVA_HOME=/usr/java/jdk1.7.0/
export PATH=$PATH:$JAVA_HOME/bin
rpm -i --force jdk-1.6...
If java -version still shows 1.4:
rm /usr/bin/java
ln -s /usr/java/jdk1.6.../bin/java /usr/bin/java
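The symlink swap can be rehearsed in a scratch directory first; the /tmp paths below are hypothetical stand-ins for /usr/java/... and /usr/bin/java:

```shell
# Rehearse the rm + ln -s swap on throwaway paths
mkdir -p /tmp/jdkdemo/jdk1.6/bin
echo '#!/bin/sh' > /tmp/jdkdemo/jdk1.6/bin/java    # fake java binary
rm -f /tmp/jdkdemo/java                            # the old link (rm /usr/bin/java)
ln -s /tmp/jdkdemo/jdk1.6/bin/java /tmp/jdkdemo/java
readlink /tmp/jdkdemo/java                         # shows where the link now points
```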
fdisk -l | grep Disk
/etc/redhat-release
uname
uname -a
vi .bash_profile
export HADOOP_HOME=/home/hadoop/hadoop-0…./
export PATH=$PATH:$HADOOP_HOME/bin
for "command not found" errors
PXE
/tftpboot/pxelinux.cfg/default
Set replication 2
bin/hadoop fs -setrep -R -w 2 /tmp/hadoophadoop/mapred/staging/hadoop/.staging
default Centos
LABEL Centos
MENU LABEL Centos
KERNEL images/centos/x86_64/5.6/vmlinuz
append vga=normal initrd=images/centos/x86_64/5.6/initrd.img
ramdisk_size=32768
ksdevice=eth0 ks=ftp://192.168.1.45/install/ks/ks.cfg
Avinash.ldif
# avinash, localdomain.com
#dn: uid=root,ou=People,dc=localdomain,dc=com
#uid: root
#cn: admin
#objectClass: account
#objectClass: posixAccount
#objectClass: top
#objectClass: shadowAccount
#userPassword: {SSHA}PCHPZji+1m+sX0HwudP+UEqL9RZ4CXNR
#shadowLastChange: 15221
#shadowMin: 0
#shadowMax: 99999
#shadowWarning: 7
#loginShell: /bin/bash
#uidNumber: 0
#gidNumber: 0
#homeDirectory: /root
#gecos: root
dn: uid=arun,ou=People,dc=localdomain,dc=com
cn: arun kumar
sn: kumar
objectClass: top
objectClass: person
objectClass: posixAccount
objectClass: shadowAccount
userPassword: {SSHA}PCHPZji+1m+sX0HwudP+UEqL9RZ4CXNR
uid: arun
uidNumber: 502
gidNumber: 501
loginShell: /bin/bash
homeDirectory: /home/arun
shadowLastChange: 10877
shadowMin: 0
shadowMax: 999999
shadowInactive: -1
Ldap Authentication
<Directory "/var/www/html">
AuthType Basic
AuthName "enter your login id"
AuthBasicProvider ldap
AuthzLDAPAuthoritative off
AuthLDAPURL ldap://192.168.1.45:389/dc=localdomain,dc=com?uid?sub
require valid-user
Options None
</Directory>
Ganglia
Download EPEL(extra packages for enterprise linux)
[user@host ~]$ sudo rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
sudo yum install rrdtool ganglia ganglia-gmetad ganglia-gmond
ganglia-web
sudo /sbin/chkconfig --levels 235 gmond on
sudo /sbin/service gmond start
sudo vim /etc/gmetad.conf
sudo /sbin/chkconfig --levels 235 gmetad on
sudo /sbin/service gmetad start
yum install httpd
In gmond.conf, set host = <ip addr> in the udp_send_channel section, and set the cluster name, e.g.
cluster {
name = "green"
}
puppet
http://www.linuxforu.com/how-to/puppet-show-automating-unixadministration/
http://library.linode.com/application-stacks/ puppet/installation#sph_configuring-puppet
Download EPEL if your Linux doesn't already have it.
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
yum install puppet-server --enablerepo=epel
yum install ruby-rdoc
vi /etc/puppet/manifests/site.pp
# /etc/puppet/manifests/site.pp
import "classes/*"
node default {
include sudo
}
vi /etc/puppet/manifests/classes/sudo.pp
# /etc/puppet/manifests/classes/sudo.pp
class sudo {
file { "/etc/sudoers":
owner => "root",
group => "root",
mode => 440,
}
}
service puppet-server start
# chkconfig puppet-server on
client
yum install puppet --enablerepo=epel
yum install ruby-rdoc
vi /etc/sysconfig/puppet
# The puppetmaster server
PUPPET_SERVER=PuppetMaster
# If you wish to specify the port to connect to do so here
#PUPPET_PORT=8140
# Where to log to. Specify syslog to send log messages to the system log.
PUPPET_LOG=/var/log/puppet/puppet.log
# You may specify other parameters to the puppet client here
#PUPPET_EXTRA_OPTS=--waitforcert=500
# service puppet start
# chkconfig puppet on
ping -c 3 puppet
Change the hostname and domain name on the puppet server and clients to fully qualified domain names.
puppetd --server puppet.example.com --waitforcert 60 --test
A message will appear:
Info: creating .......
On the puppet-server machine:
puppetca --list
puppetca --sign puppetclient.example.com
Check whether your puppetmaster and puppet client are on or off.
Stop your iptables.
Check your hostname and domain name.
Make sure certificates transfer between puppet and puppetmaster.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multinode-cluster/#networking
file { "/avi":
source => "/etc/httpd/conf/httpd.conf",
recurse => "true",
}
Restart puppetmaster
puppetd --server puppet.example.com --waitforcert 60 --test
If there are any request errors, delete the contents of /var/lib/puppet/ssl/certs and certificate_requests on both client and server.
http://ankitasblogger.blogspot.com/2011/01/hadoop-cluster-setup.html
Hadoop cluster setup
http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos5.aspx
Install Hadoop from the Cloudera tarball or rpm; I recommend the tarball.
Install Java through rpm.
Copy hadoop to /usr/local:
cp -r hadoop-0.20.2-cdh3u2... /usr/local
Export the Java path:
export JAVA_HOME=/usr/java/jdk
change in core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.127:8020</value>
</property>
change in hdfs-site.xml
/storage/name (sda), /storage1/name (sdb)
Should do a soft mount, not a hard mount.
rsync -r /storage/name /storage2/
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/storage/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/storage/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.127:8021</value>
</property>
Set up a passwordless connection between the namenode and slaves:
ssh-keygen -t rsa
ssh b@ip mkdir -p .ssh
cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
ssh b@ip
or
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub
[email protected]
ssh centos1
ssh centos2
...
ssh b/w jobtracker and datanodes
namenode conf files
vi conf/masters
secondary namenode ip addr (e.g. centos1)
vi conf/slaves
datanodes ip addr
Change the ownership of the Hadoop tarball directory to hadoop:hadoop on all the servers (chown -R hadoop:hadoop).
ssh-keygen -t dsa
ssh-copy-id -i /home/hadoop/.ssh/id_dsa hadoop@localhost
for each server
bin/start-dfs.sh on the namenode
bin/start-mapred.sh on the jobtracker
[Cluster layout sketch: the master runs the Namenode, SN (secondary namenode), and JT (jobtracker); each slave runs a DN (datanode) and TT (tasktracker). The masters file holds the SN IP; the slaves file holds the 4 DN IPs.]
I started it but got errors like:
bash: /usr/local/hadoop/hadoop-0.20.2-cdh3u2/bin/hadoop-daemon.sh: No such file or directory
so I started one of the datanodes singly:
09/10/08 13:30:12 INFO ipc.Client: Retrying connect to server: /192.168.1.127:8021. Already tried 0 time(s).
I changed the namespace ID in /storage/name/current/VERSION to match the namenode's.
Then I ran the word count program and it worked.
Ganglia with puppet
/etc/puppet/manifests/site.pp
node hostname {
include ganglia
include ganglia::copy_conf
include ganglia::copy_services
}
#Ganglia configuration file
#Ganglia service
class ganglia{
package { 'rpm wget ftp://192.168.1.102/ganglia-gmond-3.0.71.el5.x86_64.rpm':
ensure => installed
}
}
worked with another
class ganglia{
exec{"gmond":
command => "/usr/bin/wget ftp://192.168.1.146/ganglia-gmond3.0.7-1.el5.x86_64.rpm",
cwd => "/root",
creates => "/root/ganglia-gmond-3.0.7-1.el5.x86_64.rpm",
}
}
http://www.unixmen.com/linux-tutorials/1591-install-puppet-master-andclient-in-ubuntu
class ganglia{
package { 'ganglia':
ensure => installed
}
service { 'ganglia':
ensure => true,
enable => true,
require => Package['ganglia']
}
package { 'yum':
ensure => installed,
}
}
http://tech.mangot.com/
class ganglia{
package { "ganglia":
ensure => installed
}
package { "ganglia-gmond":
ensure => installed
}
service { "gmond":
ensure => running,
subscribe => File["/etc/init.d/gmond"],
enable => true,
require => File["gmond"]
}
}
Another .pp
service { "pakiti":
enable => "true",
name => "pakiti",
start => "/etc/init.d/pakiti start",
status => "/etc/init.d/pakiti status",
stop => "/etc/init.d/pakiti stop",
ensure => "running",
hasstatus => "true",
require => Package["pakiti-client"],
}
worked with this code
class ganglia{
package { "ganglia":
ensure => installed
}
package { "ganglia-gmond":
ensure => installed
}
service { "gmond":
enable => "true",
#start => "/etc/init.d/gmond start",
ensure => "running",
#require => File['/etc/init.d/gmond']
}
}
Installation ganglia with puppet completed in centos
/etc/puppet/modules/ganglia/manifests/init.pp
class ganglia{
package { "ganglia":
ensure => installed
}
package { "ganglia-gmond":
ensure => installed
}
include ganglia::copy_conf
include ganglia::copy_services
}
service { "gmond":
enable => "true",
#start => "/etc/init.d/gmond start",
ensure => "running",
#require => File['/etc/init.d/gmond']
}
class ganglia::copy_services{
file { 'gmond':
path => '/etc/init.d/gmond',
content =>
template('/etc/puppet/modules/ganglia/templates/services/gmond.erb'),
ensure => file,
owner => "root",
group => "root",
mode => 777,
}
}
class ganglia::copy_conf{
file { 'gmond.conf':
path => '/etc/gmond.conf',
ensure => file,
content =>
template('/etc/puppet/modules/ganglia/templates/conf/gmond.conf.erb'),
owner => "root",
group => "root",
mode => 777,
}
}
Errors: when you get a certificate error, remove the requests and certs in
/var/lib/puppet/ssl/certs
/var/lib/puppet/ssl/certificate_requests
or else reinstall and rerun puppet.
Move the /etc/init.d/gmond to /…services/gmond.erb
Move /etc/conf/gmond.conf to /..conf/gmond.conf.erb
http://groups.google.com/group/puppetusers/browse_thread/thread/1b4f4edf1d328b4d?pli=1
Hadoop with puppet
http://itand.me/using-puppet-to-manage-users-passwords-and-ss
http://duxklr.blogspot.com/2011/05/using-puppet-to-manage-users-groupsand.html
define add_user($uid){
$username = $avinash
user {$avinash:
home => "/home/$avinash",
shell => "/bin/bash",
uid => 503,
ensure => present,
}
group{$avinash:
gid => 504,
require => User[$avinash],
ensure => present,
}
file{"/home/$avinash/":
ensure => directory,
owner => $avinash,
group => $avinash,
mode => 750,
require => [User[$avinash],Group[$avinash]],
}
}
Finished, but I can't create the user.
user { "avinash":
groups => 'avinash',
comment => 'This user was created by Puppet',
ensure => 'present',
managehome => 'true',
}
file { "/home/avinash/":
ensure => 'directory',
require => User['avinash'],
owner => 'avinash',
mode => '700',
}
http://itand.me/using-puppet-to-manage-users-passwords-and-ss
tried this code but it didn't work
class useradd {
$username = $avinash
user { $avinash:
home => "/home/$avinash",
shell => "/bin/bash",
uid => 503,
ensure => present,
}
group {$avinash:
gid => 504,
require => User[$avinash],
ensure => present,
}
file{"/home/$avinash/":
ensure => directory,
owner => $avinash,
group => $avinash,
mode => 750,
require => [User[$avinash],Group[$avinash]],
}
}
user { "avinash":
groups => 'avinash',
comment => 'This user was created by Puppet',
ensure => 'present',
managehome => 'true',
}
file { "/home/avinash/":
ensure => 'directory',
require => User['avinash'],
owner => 'avinash',
mode => '700',
}
Failed
# /etc/puppet/modules/users/virtual.pp
class useradd::virtual {
@user { "avinash":
home => "/home/avinash",
ensure => "present",
groups => ["root","avinash"],
uid => "504",
password => "centos",
comment => "User",
shell => "/bin/bash",
managehome => "true",
}
}
http://marksallee.wordpress.com/2010/08/25/create-a-puppet-test-networkwith-virtualbox/
puppet with hadoop
class hadoop{
exec{"hadoop-tar":
command => "/usr/bin/wget ftp://192.168.1.127/hadoop-0.20.2-cdh3u2.tar.gz",
cwd => "/home/hadoop",
creates => "/home/hadoop/hadoop-0.20.2-cdh3u2.tar.gz",
}
exec {"hadooptar":
command => "/bin/tar -xvvf hadoop-0.20.2-cdh3u2.tar.gz",
cwd => "/home/hadoop",
creates => "/home/hadoop/hadoop-0.20.2-cdh3u2/",
}
# a fuller example, including permissions and ownership
file { "/storage":
ensure => "directory",
owner => "hadoop",
group => "hadoop",
mode => 750,
}
}
future
http://bitfieldconsulting.com/puppet-and-mysql-create-databases-and-users
mysql
http://blog.gurski.org/index.php/2010/01/28/automatic-monitoring-withpuppet-and-nagios/
nagios
pig
tar
records = LOAD '/home/hadoop/sample.txt' AS (year:int,temp:int);
dump records;
dump filter_records;
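The filter behind filter_records isn't shown above; assuming a simple temperature condition, the same relation can be sketched in awk on tab-separated (year, temp) rows (the sample rows and the > 25 cutoff are made up for illustration):

```shell
# A Pig FILTER over (year:int, temp:int) records, sketched in awk
printf '1950\t22\n1950\t30\n1951\t28\n' | awk -F '\t' '$2 > 25 {print $1, $2}'
```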
Loadfunc
Reverse
Name
Avinash
Sharad
o/p
hsaniva
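The Reverse UDF's expected output can be sanity-checked on the command line with rev:

```shell
# String reversal, same behavior as the Reverse UDF above
echo "avinash" | rev
```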
how to read a xml file
<xml>
<ps>
</ps>
Xml to text
[email protected]
LAMP architecture
Kerberos: authentication
How Hadoop is better than traditional technology
Flume
Collects log files, aggregates, transforms.
35871: port of the Flume master
Sqoop
Install sqoop,hbase from tar
yum install mysql mysql-server php-mysql
chkconfig --levels 235 mysqld on
service mysqld start
Set the hadoop, hbase, and java paths
Copy the MySQL JDBC driver jar to sqoop/lib/
Create a user account for hadoop
Set a password for hadoop in mysql
Use database;
Create a table in mysql
Create table tname(id int,name char(20),primary key(id));
Insert into tname values()
GRANT ALL ON mysql.* TO 'hadoop'@'localhost';
sqoop import --connect jdbc:mysql://192.168.1.56/<mysql> --username root --password centos --table <sqooptable>
bin/sqoop import --connect jdbc:mysql://localhost/<mysql> --username hadoop --password centos --table foo
After running, it generates the transferred files.
See o/p in hdfs /user/hadoop/tname/part…
Create an empty table bar
Grant privileges
bin/sqoop export --connect jdbc:mysql://localhost/mysql --username
hadoop --password centos --table bar --export-dir yeluri.txt
mysql to hive
see in mysql whether the table is updated or not
that is sqoop
mysql to hive
sqoop import --connect jdbc:mysql://localhost/movielens --username root --password centos --table genre --hive-import --hive-table GENRE --hive-home /usr/lib/hive
sqoop --hive-import --connect jdbc:mysql://localhost/movielens --username
root --password centos --table genre --hive-table GENRE /usr/lib/hive/bin/
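To keep the host, database, and table swappable per environment, the import command can be assembled from variables; the values below are the sample ones from the notes, and the command is only echoed here, not run:

```shell
# Assemble the sqoop hive-import command from variables
host=localhost; db=movielens; table=genre; user=root
cmd="sqoop import --connect jdbc:mysql://$host/$db --username $user --table $table --hive-import --hive-table GENRE"
echo "$cmd"   # inspect before running on a real cluster
```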
PIG
Export the Pig path and home
Export the Java path and home
bin/pig -x local
REGISTER /home/hadoop/Pigex.jar
a = load '/home/hadoop/y.txt' as (number,age,year);
b = foreach a generate number,age,year(c);
drbd http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/
touch /home/hadoop/excludes
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/excludes</value>
<final>true</final>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>/home/hadoop/excludes</value>
<final>true</final>
</property>
Secondary namenode as namenode
Deleted the name folder, changed the IP addrs in hdfs-site and core-site, and changed /storage/name to /storage/namesecondary.
I got "failed to initialize" errors, which I overcame with:
hadoop-daemon.sh start namenode -importCheckpoint
next
hadoop-daemon.sh start namenode
HIVE
export HIVE_INSTALL=/home/hadoop/hive-0.7.1-cdh3u2
export PATH=$PATH:$HIVE_INSTALL/bin
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u2
sudo cp mysql-connector-java-5.1.15/mysql-connector-java-5.1.15-bin.jar /usr/lib/hive/lib/
Switch to hdfs or a user that has permissions on HDFS.
Error was caused by JAVA_HOME not being exported in hadoop-env.sh.
install ant
yum search ant
ant.x86_64
export ANT_LIB=/path/to/ant/lib
export ANT_LIB=/usr/share/ant/lib
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface
will listen on</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will
listen on</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive_hwi.war</value>
<description>This is the WAR file with the jsp content for Hive
Web Interface</description>
</property>
hive_hwi.war is in /usr/lib/hive/lib/ (hive_hwi.war.cdh3.war)
chmod 755 the …..war file
#start httpd
#stop iptables
In the worst case:
export ANT_LIB=/usr/share/ant/lib
bin/hive --service hwi
https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
http://localhost:9999/hwi
put some data into hdfs
hive.metastore.warehouse.dir
// for(int val =0;Iterable(values)!=0;values.iterator())
// {
// }
alias eth0 forcedeth
alias eth1 forcedeth
alias scsi_hostadapter sata_nv
alias scsi_hostadapter1 usb-storage
add user in hdfs
add avinash in linux
bin/hadoop fs -mkdir /user/avinash
bin/hadoop fs -chown hadoop:supergroup /user/avinash
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.231:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop-0.20/cache/${user.name}</value>
</property>
<!-- OOZIE proxy user setting -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<!-- specify this so that running 'hadoop namenode -format' formats the
right dir -->
<name>dfs.name.dir</name>
<value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
</property>
<!-- Enable Hue Plugins -->
<property>
<name>dfs.namenode.plugins</name>
<value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
<description>Comma-separated list of namenode plug-ins to be
activated.
</description>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
<description>Comma-separated list of datanode plug-ins to be activated.
</description>
</property>
<property>
<name>dfs.thrift.address</name>
<value>0.0.0.0:10090</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
<!-- Enable Hue plugins -->
<property>
<name>mapred.jobtracker.plugins</name>
<value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
<description>Comma-separated list of jobtracker plug-ins to be activated.
</description>
</property>
<property>
<name>jobtracker.thrift.address</name>
<value>0.0.0.0:9290</value>
</property>
</configuration>
Kerberos cdh3
https://ccp.cloudera.com/display/CDHDOC/Configuring+Hadoop+Security+in
+CDH3#ConfiguringHadoopSecurityinCDH3-TryRunningaMap%2FReduceJob
cacti http://www.cacti.net/downloads/docs/html/unix_configure_cacti
http://www.cyberciti.biz/faq/fedora-rhel-install-cacti-monitoring-rrd-software/
hadoop jar jarname classname inputhdfspath outputhdfspath
#!/usr/bin/env python
'''
This script used by hadoop to determine network/rack topology. It
should be specified in hadoop-site.xml via topology.script.file.name
Property.
<property>
<name>topology.script.file.name</name>
<value>/home/hadoop/topology.py</value>
</property>
'''
import sys
from string import join
DEFAULT_RACK = '/default/rack0';
RACK_MAP = { '208.94.2.10' : '/datacenter1/rack0',
'1.2.3.4' : '/datacenter1/rack0',
'1.2.3.5' : '/datacenter1/rack0',
'1.2.3.6' : '/datacenter1/rack0',
'10.2.3.4' : '/datacenter2/rack0'
}
if len(sys.argv)==1:
print DEFAULT_RACK
else:
print join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]," ")
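The same rack lookup can be written as a shell function, which is easy to test from the command line; the IPs and rack names are the ones from RACK_MAP above:

```shell
# Rack lookup matching the Python topology script above
rack_of() {
  case "$1" in
    208.94.2.10|1.2.3.4|1.2.3.5|1.2.3.6) echo /datacenter1/rack0 ;;
    10.2.3.4) echo /datacenter2/rack0 ;;
    *) echo /default/rack0 ;;   # DEFAULT_RACK
  esac
}
rack_of 1.2.3.5    # known host -> its rack
rack_of 9.9.9.9    # unknown host -> default rack
```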
SaaS
PaaS
IaaS
Namenode is in safemode
Nfs
Given doc
Server-backup
Client namenode
echo "hadoop install"
#wget http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm
#rpm -ivh cdh3-repository-1.0-1.noarch.rpm
#yum search hadoop
echo "hadoop installing"
#sudo yum -y install hadoop-0.20
#yum -y install hadoop-0.20-namenode
#yum -y install hadoop-0.20-secondarynamenode
#yum -y install hadoop-0.20-jobtracker
#yum -y install hadoop-0.20-datanode
#yum -y install hadoop-0.20-tasktracker
echo " install mysql"
#yum -y install mysql-server mysql httpd
#scp
[email protected]:/usr/lib/hadoop-0.20/conf/core-site.xml
/usr/lib/hadoop-0.20/conf/core-site.xml
#scp
[email protected]:/usr/lib/hadoop-0.20/conf/hdfs-site.xml
/usr/lib/hadoop-0.20/conf/hdfs-site.xml
#scp
[email protected]:/usr/lib/hadoop-0.20/conf/mapred-site.xml
/usr/lib/hadoop-0.20/conf/mapred-site.xml
#wget ftp://192.168.1.32/jdk-6u25-linux-x64-rpm.bin
#chmod 755 jdk-6u25-linux-x64-rpm.bin
#./jdk-6u25-linux-x64-rpm.bin
export JAVA_HOME=/usr/java/jdk1.6.0_25
export PATH=$PATH:$JAVA_HOME/bin
hadoop log retention
IIRC, "mapred.userlog.retain.hours" (24h default) controls this in my
environment and it seems to work fine on my cluster. Are you sure you
have tasklogs older than 24h lying around? It might even be a bug that
may have been fixed in the subsequent 0.20 releases that went out
recently.
Thanks for the reply. I realized that the property you mentioned
was missing in my mapred-site.xml.
I added the entry and it works just fine.
Was my assumption that "hadoop.tasklog.logsRetainHours" in
log4j.properties will do the same wrong? What is this property for in that
case?
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_NAMENODE_OPTS"
HADOOP_NAMENODE_OPTS="-Xmx500m" will set it to 500MB. The "OPTS" here
> refers to JVM options. -Xmx is a common JVM option to set the maximum
> heap.
Set dfs.replication=2;
Increase the heap size of tasktracker jvm
mapred.child.java.opts property.
The default setting is -Xmx200m, which gives each task 200 MB of memory.
Datanode summary
http://192.168.1.123:50075/blockScannerReport?listblocks
46122674
Solved the below problem by stopping the firewall and disabling SELinux.
cat temp.txt | awk -F "\t" '{$1=$1}1' OFS="\n"
print horizontal
cat temp.txt | awk -F "\t" '{$1=$1}1' OFS="\n" > avi.txt
paste ban.txt avi.txt
paste ban.txt avi.txt | awk -F " " '{print $1" -----"$3}'
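A worked run of the pipeline on throwaway files shows what each stage produces (the two-column sample contents are made up; note that with one field per avi.txt line, the joined value comes out in $2, not $3):

```shell
# Columns to rows and back, on scratch files in /tmp
printf 'a\tb\n' > /tmp/temp.txt
awk -F "\t" '{$1=$1}1' OFS="\n" /tmp/temp.txt > /tmp/avi.txt   # one field per line: a, b
printf '1\n2\n' > /tmp/ban.txt
paste /tmp/ban.txt /tmp/avi.txt | awk '{print $1" -----"$2}'
```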
ntp
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
server 3.us.pool.ntp.org
service ntpd start
chkconfig ntpd on
iptables -I INPUT -p udp --dport 123 -j ACCEPT
iptables -L
ntpq -p