Thursday, May 24, 2012
URL Harvesting Script
Thursday, 09 July 2009 01:27
I need a script or desktop application (for Windows Vista) to harvest website addresses (URLs) for me. My preference is a script that runs on PHP and MySQL on a Linux server.

I want to enter a list of keyword phrases such as "cheap hosting" and "custom furniture". Typically there will be a few hundred of these at a time, and I want to be able to add and delete phrases.

When I run the script (let's call that a scan), it must collect website addresses from the following sources (using "cheap hosting" as an example):

1) http://www.google.com/sponsoredlinks?q=cheap+hosting&btnG=Search+Sponsored+Links: all the results (not just the first page of results).
2) The first 100 results from http://www.google.com/search?q=allinurl%3Acheap+hosting, filtered to show only the URLs that actually have one or more of the keywords in the domain name itself, but not as part of a subdomain. (So cheapcar.com and fasthosting.com are OK, but not computers.com/cheaphosting.htm, and not cheap.hosting.com.)
3) The first 100 results from Google for http://www.google.com/search?q=cheap+hosting.

These results must go into a database in the form website.com (NOT website.com/djdjd/ururu.htm), separated into the 3 categories above, together with the date the script was run.

There must be a function where I can enter from time to time, for a specific keyword phrase, multiple URLs (anything from 10 to 1000 at a time) in the form website.com. If these URLs match existing URLs for that keyword phrase, they must be marked as "used". I also need to be able to mark/reset URLs to unused/default.

Then I need a function to make a report of all the unique results gathered for a specific keyword phrase between date X and date Y (I will enter these values) that are not marked as used. The report must be in CSV format with these fields: keyword phrase, url, source.

That is the basic functionality of the script.
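The URL clean-up, the domain filter for source 2, the "used" marking and the CSV report described above could be sketched roughly as follows. This is only an illustrative sketch under several assumptions: SQLite stands in for MySQL so the example is self-contained; the table layout, the function names and the source labels ("sponsored", "organic") are hypothetical; and the keyword filter uses a naive two-label rule (name.tld), which rejects cheap.hosting.com as required but does not handle multi-part suffixes such as .co.uk (a Public Suffix List would be needed for that). Fetching results from Google is not shown.

```python
import csv
import io
import sqlite3
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Reduce 'http://www.website.com/djdjd/ururu.htm' to 'website.com'."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host after a scheme
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def keyword_in_domain(domain: str, phrase: str) -> bool:
    """Filter for source 2: a keyword must appear in the domain name itself.

    Naive assumption: an acceptable host has exactly two labels (name.tld),
    so 'cheap.hosting.com' is rejected; suffixes like .co.uk are not handled.
    """
    labels = domain.split(".")
    if len(labels) != 2:
        return False
    return any(word in labels[0] for word in phrase.lower().split())


def open_db() -> sqlite3.Connection:
    # Hypothetical schema; an in-memory SQLite database stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (phrase TEXT, url TEXT, source TEXT,"
        " scan_date TEXT, used INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def store(conn, phrase, source, scan_date, urls):
    """Save one scan's URLs for a phrase, reduced to bare domains."""
    conn.executemany(
        "INSERT INTO results (phrase, url, source, scan_date) VALUES (?, ?, ?, ?)",
        [(phrase, normalize_domain(u), source, scan_date) for u in urls],
    )


def set_used(conn, phrase, urls, used=True):
    """Mark the given domains as used for one phrase, or reset them with used=False."""
    conn.executemany(
        "UPDATE results SET used = ? WHERE phrase = ? AND url = ?",
        [(int(used), phrase, normalize_domain(u)) for u in urls],
    )


def report_csv(conn, phrase, start, end) -> str:
    """CSV of unique, unused results for a phrase with start <= scan_date <= end.

    Dates are ISO strings (YYYY-MM-DD), so BETWEEN compares them correctly.
    A domain counts as used if any of its rows for the phrase is marked used.
    """
    rows = conn.execute(
        "SELECT DISTINCT phrase, url, source FROM results"
        " WHERE phrase = ? AND scan_date BETWEEN ? AND ?"
        " AND url NOT IN (SELECT url FROM results WHERE phrase = ? AND used = 1)"
        " ORDER BY url, source",
        (phrase, start, end, phrase),
    ).fetchall()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["keyword phrase", "url", "source"])
    writer.writerows(rows)
    return out.getvalue()
```

For example, after storing a scan and calling `set_used(conn, "cheap hosting", ["cheapcar.com"])`, a call to `report_csv(conn, "cheap hosting", "2009-07-01", "2009-07-31")` lists only the domains not yet marked as used. Treating a domain as used across all of its rows means that later scans which collect it again do not bring it back into the report until it is explicitly reset.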
Other features will include:

1) A general filter, applicable to all keyword phrases, where I can enter URLs that I do not want collected.
2) Bulk scans: I want to select which keyword phrases to scan, and at the end of the scan the script must report the number of NEW results per keyword phrase, i.e. results collected during that scan that have not been collected before.
3) A setting to specify a random delay in seconds between searches to avoid being blocked by Google.
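The three extra features could be combined in a bulk-scan loop along these lines. Again this is a hypothetical Python sketch, not a finished design: `fetch` is a placeholder for whatever code actually collects domains for a phrase (not implemented here), an in-memory dict of previously seen domains stands in for the database, and the delay bounds are example values. For simplicity it makes one `fetch` call per phrase; a real scan would run several searches per phrase (three sources plus result pages) and would need a delay before each of them.

```python
import random
import time
from typing import Callable, Dict, Iterable, List, Set


def apply_global_filter(domains: Iterable[str], excluded: Set[str]) -> List[str]:
    """Drop domains listed in the general filter that applies to every phrase."""
    return [d for d in domains if d not in excluded]


def count_new(seen: Dict[str, Set[str]], phrase: str, domains: Iterable[str]) -> int:
    """Record this scan's domains and return how many were never collected before."""
    known = seen.setdefault(phrase, set())
    new = set(domains) - known
    known |= new
    return len(new)


def run_bulk_scan(
    phrases: List[str],
    fetch: Callable[[str], Iterable[str]],
    excluded: Set[str],
    seen: Dict[str, Set[str]],
    min_delay: float,
    max_delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Scan the selected phrases and report the number of NEW results per phrase.

    `fetch` is a hypothetical placeholder for the code that collects domains
    for one phrase; a random pause separates consecutive searches.
    """
    report: Dict[str, int] = {}
    for i, phrase in enumerate(phrases):
        if i > 0:
            sleep(random.uniform(min_delay, max_delay))  # random delay in seconds
        domains = apply_global_filter(fetch(phrase), excluded)
        report[phrase] = count_new(seen, phrase, domains)
    return report
```

The `sleep` parameter defaults to `time.sleep` but can be swapped for a stub, which lets the random delay be checked without actually waiting.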


Copyright © 2012. Science2science. Designed by IndianNationalHost.com