[GLLUG] Looking for some open source projects (local) to aid in
vanek
vanek at acd.net
Tue Jan 5 01:57:17 EST 2010
Ruby is awesome.
Can't remember where I found these scripts.
The hardest part was not writing the code, but setting up the Ruby
environment.
#!/usr/bin/env ruby
require 'mechanize'
require 'hpricot'
require 'fileutils'

# youtube most viewed
agent = WWW::Mechanize.new
url = "http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed" # all time
page = agent.get(url)

# parse again w/ Hpricot for some XML convenience
doc = Hpricot.parse(page.body)
# pp (doc/:entry) # like "search"; cool division overload
images = (doc/'media:thumbnail') # use strings instead of symbols for namespaces

FileUtils.mkdir_p 'youtube-images' # make the images dir
urls = images.map { |i| i[:url] }
urls.each_with_index do |file, index|
  puts "Saving image #{file}"
  agent.get(file).save_as("youtube-images/vid#{index}_#{File.basename file}")
end
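Phil asked about getting scraped data into a database or spreadsheet. Here's a minimal sketch of the spreadsheet half using only Ruby's standard library (REXML and CSV, no gems to set up); the inline feed is a stand-in for the `page.body` you'd get back from Mechanize:

```ruby
#!/usr/bin/env ruby
require 'rexml/document'
require 'csv'

# Stand-in for a fetched Atom feed; in practice this would be page.body.
xml = <<-XML
<feed>
<entry><title>First video</title></entry>
<entry><title>Second video</title></entry>
</feed>
XML

# Pull out every entry title with an XPath query.
doc = REXML::Document.new(xml)
titles = doc.elements.to_a('//entry/title').map(&:text)

# Write one title per row; any spreadsheet app opens this directly.
CSV.open('titles.csv', 'w') do |csv|
  csv << ['title']
  titles.each { |t| csv << [t] }
end
```

REXML is slower than Hpricot on big documents, but it ships with Ruby, which sidesteps the environment-setup pain mentioned above.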
#!/usr/bin/env ruby
require 'mechanize'
require 'hpricot'
require 'fileutils'

# some btjunkie torrents
agent = WWW::Mechanize.new
agent.get("http://btjunkie.org/")
links = agent.page.search('.tor_details tr a')
hrefs = links.map { |m| m['href'] }.select { |u| u =~ /\.torrent$/ } # just links ending in .torrent

FileUtils.mkdir_p('btjunkie-torrents') # keep it neat
hrefs.each { |torrent|
  filename = "btjunkie-torrents/#{torrent.split('/')[-1]}"
  puts "Saving #{torrent} as #{filename}"
  agent.get(torrent).save_as(filename)
}
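For the database half of Phil's question, you don't even need a database server: Ruby's stdlib PStore gives you a tiny persistent key-value store. A sketch (the href list below is illustrative; in the script above it would come from `agent.page.search`):

```ruby
#!/usr/bin/env ruby
require 'pstore'

# Illustrative data; in the real script these come from the scrape.
hrefs = [
  'http://btjunkie.org/torrent/example/abc.torrent',
  'http://btjunkie.org/torrent/example/def.torrent'
]

# Writes go inside a transaction; PStore persists to a file on commit.
store = PStore.new('torrents.pstore')
store.transaction do
  store[:links] = hrefs
  store[:saved_at] = Time.now
end

# Read it back later in a read-only transaction.
store.transaction(true) do
  puts store[:links].length
end
```

For anything multi-user you'd graduate to the sqlite3 gem, but PStore covers the "scrape now, query later" case with zero setup.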
David Singer wrote:
> ROR has some great tools for this. google screen scraping.
>
> David
>
>> On Mon, Jan 4, 2010 at 7:16 PM, Clay Dowling <clay at lazarusid.com> wrote:
>
>> Steven Sayers wrote:
>>
>>> I'm sure it is, it could become more difficult depending on the format /
>>> structure of the website, however.
>>>
>>> On Mon, Jan 4, 2010 at 1:32 PM, <user at qtm.net> wrote:
>>>
>>> Hi Steve,
>>> Is it possible to take some information from a web page then put
>>> it in a database or spreadsheet?
>>> Thanks,
>>> Phil
>>>
>>>
>> For a project like that, use curl and the libxml tools. I think there are
>> tutorials on the curl project site and there are definitely some things at
>> the libxml site.
>>
>> Clay
>> _______________________________________________
>> linux-user mailing list
>> linux-user at egr.msu.edu
>> http://mailman.egr.msu.edu/mailman/listinfo/linux-user
>>
>>