Nutch installation guide

Installation guide

Revision History
    Revision No. | Description        | Created Date
                 | Installation guide |

Author: Shirlin Voon

NUTCH

Download Nutch from the link below and unzip the file.
    http://www.apache.org/dyn/closer.cgi/nutch/

Install Java on Ubuntu. Check whether a JDK is already present, and install one if it is not.
    javac -version
    sudo apt-get install openjdk-<version>-jdk

Set JAVA_HOME.
    export JAVA_HOME=/usr/lib/jvm/java-<version>-openjdk-i386
    echo $JAVA_HOME        (to check whether JAVA_HOME is set correctly)

Run "bin/nutch". If you see "Permission denied", run the following command.
    chmod +x bin/nutch

Create a folder named urls and save the starting URLs to be crawled in a file named seed.txt inside it.

Edit the file conf/regex-urlfilter.txt. This file stores the URL blacklist.

Download Solr from the link below and unzip the file.
    http://www.apache.org/dyn/closer.cgi/lucene/solr/

Start Solr using the commands below.
    cd Applications/apache-solr-<version>/example
    java -jar start.jar

SOLR ADMIN WEBSITE
    http://localhost:<port>/solr/admin/
    http://localhost:<port>/solr/admin/stats.jsp

Start Nutch using the commands below.
    cd Applications/apache-nutch-<version>
    bin/nutch inject crawl/crawldb urls
    bin/nutch generate crawl/crawldb crawl/segments
    s=`ls -d crawl/segments/* | tail -1`
    echo $s
    bin/nutch fetch $s
    bin/nutch parse $s
    bin/nutch updatedb crawl/crawldb $s
    bin/nutch invertlinks crawl/linkdb -dir crawl/segments
    bin/nutch solrindex http://<host>:<port>/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

The steps above can be combined into a single command.
    bin/nutch crawl urls -dir crawl -solr http://localhost:<port>/solr/ -depth <depth> -topN <N>

Apache, MySQL, PHP

Create a folder named www in your home directory.

Download phpMyAdmin, unzip it, and put it in the www folder.
    http://www.phpmyadmin.net/home_page/downloads.php

    sudo apt-get update
    sudo apt-get install phpmyadmin
    sudo apt-get install apache2
    sudo apt-get install libapache2-mod-php<version>

Install MySQL (optional).
    sudo apt-get install mysql-server libapache2-mod-auth-mysql php<version>-mysql
    sudo mysql_install_db
    sudo apt-get install mysql-client-core-<version>
    sudo apt-get install php<version>-cli

Change the default document root.
    sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/mysite
    gksudo gedit /etc/apache2/sites-available/mysite
Change DocumentRoot to the new location. PS: make sure there is no space in the new location.
Change ...

GIT

    sudo apt-get install git          install Git
    git init                          initialize Git
    git status                        check status
    git add *                         add all files
    git add Prism                     add a folder
    git add "test.java"               add a file
    git rm *                          remove all files
    git commit -m "Description"       commit before pushing to the server
    git push http://dev/Prism.git
    git pull http://dev/Prism.git

SINGLE NODE CLUSTER SETUP

You may refer to the link below.
    http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

Make sure Java is installed.

Create a user that is used on all machines.
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop prism
    su - prism

Install SSH.
    sudo apt-get install ssh
    ssh localhost

Generate an SSH key.
    ssh-keygen -t rsa -P ""

Enable SSH access to the local machine without keying in the password every time.
    cat /home/prism/.ssh/id_rsa.pub >> /home/prism/.ssh/authorized_keys

Disable IPv6.
    sudo nano /etc/sysctl.conf
Copy the following lines to the end of the file.
    # disable ipv6
    net.ipv6.conf.all.disable_ipv6
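
The Nutch steps that generate a segment and then fetch, parse, and update it can be wrapped in a small shell helper. This is only a sketch, not part of the original guide: the function names latest_segment and crawl_round and the NUTCH override variable are hypothetical, and it assumes the crawl/crawldb and crawl/segments layout used in the Nutch commands. It relies on Nutch naming each segment directory by timestamp, so the last entry in sorted order is the newest one.

```shell
#!/bin/sh
# Hypothetical helper: print the newest segment directory under "$1".
# Segment directories are named by timestamp, so the last entry in
# sorted order is the most recently generated segment.
latest_segment() {
    ls -d "$1"/* | tail -1
}

# Hypothetical helper: run one generate/fetch/parse/updatedb round.
# NUTCH defaults to bin/nutch; set NUTCH=echo for a dry run that only
# prints the arguments each step would receive.
crawl_round() {
    nutch=${NUTCH:-bin/nutch}
    $nutch generate crawl/crawldb crawl/segments || return 1
    s=$(latest_segment crawl/segments)
    # Stop if no segment directory was found.
    [ -n "$s" ] || return 1
    $nutch fetch "$s" &&
    $nutch parse "$s" &&
    $nutch updatedb crawl/crawldb "$s"
}
```

Run from the Nutch directory after the inject step, a call to crawl_round performs one round; calling it several times in a row would roughly correspond to a crawl of greater depth.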

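The SSH step in the single node cluster setup appends the public key to authorized_keys with cat, which adds a duplicate line each time it is repeated. A minimal sketch of a repeatable variant follows. The function name authorize_key is hypothetical, and the chmod 600 is an assumption based on common practice rather than something the guide states.

```shell
#!/bin/sh
# Hypothetical helper: append the public key in "$1" to the
# authorized_keys file "$2" only if that exact line is not there yet.
authorize_key() {
    pub=$1
    auth=$2
    key=$(cat "$pub") || return 1
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -q quiet, -x match the whole line, -F treat the key as a fixed string
    grep -qxF -e "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    # Assumption: keep the file readable and writable by its owner only.
    chmod 600 "$auth"
}
```

For the prism user from the guide, the call would be authorize_key /home/prism/.ssh/id_rsa.pub /home/prism/.ssh/authorized_keys, after which ssh localhost should no longer ask for a password.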