[GLLUG] Looking for some open source projects (local) to aid in

vanek vanek at acd.net
Tue Jan 5 01:57:17 EST 2010


Ruby is awesome.
Can't remember where I found these scripts.
The hardest part was not writing the code, but setting up the Ruby 
environment.

#!/usr/bin/env ruby
require 'mechanize'
require 'hpricot'
require 'fileutils' # FileUtils.mkdir_p below needs this
# youtube most viewed

agent = WWW::Mechanize.new
url = "http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed" # all time
page = agent.get(url)
# parse again w/ Hpricot for some XML convenience
doc = Hpricot.parse(page.body)
# pp (doc/:entry) # like "search"; cool division overload
images = (doc/'media:thumbnail') # use strings instead of symbols for namespaces
FileUtils.mkdir_p 'youtube-images' # make the images dir
urls = images.map { |i| i[:url] }
urls.each_with_index do |file, index|
  puts "Saving image #{file}"
  agent.get(file).save_as("youtube-images/vid#{index}_#{File.basename file}")
end
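
The same thumbnail extraction works without Hpricot: the feed is Media-RSS XML, so Ruby's bundled REXML can pull the media:thumbnail url attributes out too. A stdlib-only sketch against a trimmed-down, made-up sample of the feed's structure (the URLs below are invented for illustration):

```ruby
require 'rexml/document'

# Minimal stand-in for the feed's structure; real feeds have much more.
sample = <<~XML
  <feed xmlns:media="http://search.yahoo.com/mrss/">
    <entry>
      <media:thumbnail url="http://i.ytimg.com/vi/abc123/0.jpg"/>
      <media:thumbnail url="http://i.ytimg.com/vi/abc123/1.jpg"/>
    </entry>
  </feed>
XML

doc = REXML::Document.new(sample)
# Bind the media: prefix to its namespace URI so the XPath is unambiguous.
urls = REXML::XPath.match(doc, '//m:thumbnail',
                          'm' => 'http://search.yahoo.com/mrss/')
           .map { |el| el.attributes['url'] }
puts urls
```

Nothing here talks to the network; swap `sample` for the fetched page body to use it for real.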



#!/usr/bin/env ruby
require 'mechanize'
require 'hpricot'
require 'fileutils' # FileUtils.mkdir_p below needs this
# some btjunkie torrents

agent = WWW::Mechanize.new
agent.get("http://btjunkie.org/")
links = agent.page.search('.tor_details tr a')
hrefs = links.map { |m| m['href'] }.select { |u| u =~ /\.torrent$/ } # just links ending in .torrent
FileUtils.mkdir_p('btjunkie-torrents') # keep it neat
hrefs.each { |torrent|
  filename = "btjunkie-torrents/#{torrent.split('/')[-1]}"
  puts "Saving #{torrent} as #{filename}"
  agent.get(torrent).save_as(filename)
}
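
A small aside on the filename line above: `torrent.split('/')[-1]` works, but it would keep any query string that happens to be on the URL. Taking File.basename of the parsed URI path does the same job a bit more defensively. A stdlib-only sketch (the example URL is made up, nothing here is site-specific):

```ruby
require 'uri'

# Derive a local filename from a download URL: take the last path
# segment, ignoring any query string.
def local_name(url)
  File.basename(URI.parse(url).path)
end

puts local_name('http://example.org/torrents/some.file.torrent?dl=1')
# prints "some.file.torrent"
```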



David Singer wrote:
> RoR has some great tools for this. Google "screen scraping".
>
> David
>
> On Mon, Jan 4, 2010 at 7:16 PM, Clay Dowling<clay at lazarusid.com>  wrote:
>    
>> Steven Sayers wrote:
>>      
>>> I'm sure it is, it could become more difficult depending on the format /
>>> structure of the website, however.
>>>
>>> On Mon, Jan 4, 2010 at 1:32 PM,<user at qtm.net<mailto:user at qtm.net>>
>>> wrote:
>>>
>>>     Hi Steve,
>>>     Is it possible to take some information from a web page then put
>>>     it in a database or spreadsheet?
>>>     Thanks,
>>>     Phil
>>>
>>>        
>> For a project like that, use curl and the libxml tools.  I think there are
>> tutorials on the curl project site and there are definitely some things at
>> the libxml site.
>>
>> Clay
>> _______________________________________________
>> linux-user mailing list
>> linux-user at egr.msu.edu
>> http://mailman.egr.msu.edu/mailman/listinfo/linux-user
>>
>>      


