Extracting information in general is called mining. When the material being extracted is data, the process is data mining; when the information comes from a website or an online application, it is web mining, also commonly called web scraping.
Many websites are built on top of such mined data. For example, several job portals show listings aggregated from popular job sites such as Dice and Indeed. Whether this is legal depends on the site's terms of use; it is mentioned here only as a practical example.
In addition, companies often need mining scripts to extract information about their own websites, such as the sources from which the site is being accessed and the number of hits. This information comes from web server logs, and the details depend on the server in use. Most websites run the Apache web server, whose access logs are typically found under /var/log/httpd or /var/log/apache2.
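As a sketch of this kind of log mining, the snippet below parses lines in Apache's "combined" log format and counts the total hits and the traffic sources (referrers). The sample lines and the summarize helper are illustrative, not taken from any real log.

```python
import re
from collections import Counter

# Apache "combined" log format: IP, identity, user, timestamp,
# request line, status, bytes, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Sample lines standing in for /var/log/httpd/access_log
sample_log = '''\
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://www.google.com/" "Mozilla/5.0"
192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 1024 "https://www.google.com/" "Mozilla/5.0"
192.0.2.3 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 404 512 "-" "curl/7.68.0"
'''

def summarize(lines):
    """Return (total hits, Counter of referrers) for the given log text."""
    hits = 0
    referrers = Counter()
    for line in lines.splitlines():
        m = LOG_PATTERN.match(line)
        if m:
            hits += 1
            referrers[m.group('referrer')] += 1
    return hits, referrers

hits, referrers = summarize(sample_log)
print(hits)                      # total number of requests
print(referrers.most_common(1))  # most common traffic source
```

The same regex works line by line on a real log file opened with `open('/var/log/httpd/access_log')`, assuming the server is configured with the combined log format.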
Web scraping can be done with plenty of scripting languages. These same languages are also used to create dynamic web pages and to automate system and database administration tasks.
1) Python - The most popular language for this purpose. The standard library module urllib2 (urllib.request in Python 3) helps with fetching pages, and third-party packages such as BeautifulSoup can be installed for parsing the HTML.
2) PHP - The most popular web scripting language; its cURL extension helps with the fetching side of web mining.
3) Perl - Often expanded as Practical Extraction and Report Language, it is well suited to this task. Perl's powerful pattern matching helps in the process of web scraping, and modules such as WWW::Mechanize can be downloaded and installed for this kind of extraction.
4) Shell script - bash, csh, tcsh, and similar shells can be used to perform extraction, and are best suited to internal websites. Pattern-matching tools such as grep, sed, and awk help in this process.
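To illustrate the parsing side of scraping in Python, the sketch below uses the standard library's html.parser to pull links out of a small sample page. The page content and the LinkExtractor class are hypothetical; in a real script the HTML would be fetched first, e.g. with urllib.request.urlopen(url).read().decode().

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

# Sample page standing in for a fetched job-listing page
page = '''
<html><body>
  <a href="/jobs/python-developer">Python Developer</a>
  <a href="/jobs/perl-programmer">Perl Programmer</a>
</body></html>
'''

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/jobs/python-developer', '/jobs/perl-programmer']
```

BeautifulSoup offers a more forgiving parser for messy real-world HTML, but the standard library version above has no dependencies to install.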