[GLLUG] Looking for some open source projects (local) to aid in

David Singer david at ramaboo.com
Tue Jan 5 08:48:57 EST 2010


Might want to look at Nokogiri as well :)
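
For example, a minimal Nokogiri sketch (open-uri plus Nokogiri's CSS
selectors; example.com and the 'a' selector are just placeholders for
whatever page you're actually scraping):

#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
require 'open-uri'

# Fetch a page and print every link's text and href.
doc = Nokogiri::HTML(open("http://example.com/"))
doc.css('a').each do |link|
  puts "#{link.text.strip} -> #{link['href']}"
end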

David

On Mon, Jan 4, 2010 at 10:57 PM, vanek <vanek at acd.net> wrote:
> Ruby is awesome.
> Can't remember where I found these scripts.
> The hardest part was not writing the code, but setting up the Ruby
> environment.
>
> #!/usr/bin/env ruby
> require 'rubygems'
> require 'mechanize'
> require 'hpricot'
> require 'fileutils'
> # youtube most viewed
>
> agent = WWW::Mechanize.new
> url = "http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed" # all time
> page = agent.get(url)
> # parse again w/ Hpricot for some XML convenience
> doc = Hpricot.parse(page.body)
> # pp (doc/:entry) # like "search"; cool division overload
> images = (doc/'media:thumbnail') # use strings instead of symbols for namespaces
> FileUtils.mkdir_p 'youtube-images' # make the images dir
> urls = images.map { |i| i[:url] }
> urls.each_with_index do |file, index|
>   puts "Saving image #{file}"
>   agent.get(file).save_as("youtube-images/vid#{index}_#{File.basename file}")
> end
>
>
>
> #!/usr/bin/env ruby
> require 'rubygems'
> require 'mechanize'
> require 'hpricot'
> require 'fileutils'
> # some btjunkie torrents
>
> agent = WWW::Mechanize.new
> agent.get("http://btjunkie.org/")
> links = agent.page.search('.tor_details tr a')
> hrefs = links.map { |m| m['href'] }.select { |u| u =~ /\.torrent$/ } # just links ending in .torrent
> FileUtils.mkdir_p('btjunkie-torrents') # keep it neat
> hrefs.each { |torrent|
>   filename = "btjunkie-torrents/#{torrent.split('/')[-1]}"
>   puts "Saving #{torrent} as #{filename}"
>   agent.get(torrent).save_as(filename)
> }
>
>
>
> David Singer wrote:
>>
>> RoR has some great tools for this; Google "screen scraping".
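>>
>> To Phil's question about getting data into a spreadsheet: a minimal
>> sketch (the URL and the table selectors are placeholders; stdlib CSV,
>> so the output opens in any spreadsheet):
>>
>> #!/usr/bin/env ruby
>> require 'rubygems'
>> require 'mechanize'
>> require 'csv'
>>
>> agent = WWW::Mechanize.new
>> page = agent.get("http://example.com/some-table.html")
>> # Walk the table rows and write one CSV row per <tr>.
>> CSV.open("scraped.csv", "w") do |csv|
>>   page.search("table tr").each do |row|
>>     csv << row.search("td").map { |cell| cell.text.strip }
>>   end
>> end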
>>
>> David
>>
>> On Mon, Jan 4, 2010 at 7:16 PM, Clay Dowling<clay at lazarusid.com>  wrote:
>>
>>>
>>> Steven Sayers wrote:
>>>
>>>>
>>>> I'm sure it is, though it could become more difficult depending on the
>>>> format and structure of the website.
>>>>
>>>> On Mon, Jan 4, 2010 at 1:32 PM, <user at qtm.net> wrote:
>>>>
>>>>    Hi Steve,
>>>>    Is it possible to take some information from a web page then put
>>>>    it in a database or spreadsheet?
>>>>    Thanks,
>>>>    Phil
>>>>
>>>>
>>>
>>> For a project like that, use curl and the libxml tools. I think there are
>>> tutorials on the curl project site, and there are definitely some things
>>> at the libxml site.
>>>
>>> Clay
>>>
>>
>
>

