[GLLUG] TCP/IP protocol efficiency

Jeremy Bowers jerf at jerf.org
Wed Jan 24 12:54:18 EST 2007


Mike Szumlinski wrote:
> Access can be read only, that is perfectly fine.  The problem we are 
> running into is that a full scan can be several hundred gigabytes.  
> The problem seems to be that as soon as tcp/ip compression kicks in 
> enough on these scans to get some solid throughput, it is done 
> transferring that single file and starts all over again.  We are only 
> getting about 6-7MB/sec out of gig-e using sftp currently
As someone else suggested, you could be running into encryption speed 
problems with sftp. Check the processor load to see whether that is the 
bottleneck; if it is, then given how far short of gig-e line speed you 
are (6-7MB/sec against well over 100MB/sec in practice), you probably 
need to go with something that doesn't use (software) encryption. 
Hopefully that's acceptable; if not, let the list know :)
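
If you'd rather not just eyeball top while a transfer is running, a 
throwaway sampler along these lines works too. This is only a sketch; 
it leans on the third-party psutil module, and the sample count and 
interval are arbitrary:

    import psutil

    # Quick-and-dirty load sampler: run on the sending box while a
    # transfer is in progress.
    for _ in range(60):
        # One-second samples of per-core utilization; a core pinned
        # near 100% while the link is nowhere near gig-e speed points
        # at the ssh/sftp encryption as the bottleneck.
        per_cpu = psutil.cpu_percent(interval=1, percpu=True)
        print('  '.join('%5.1f%%' % p for p in per_cpu))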

Are these scans by any chance localized by directory? Some FTP servers 
have the ability to .tar or .tar.gz directories on the fly, which could 
fix your compression problem and encryption problem in one fell swoop 
with no programming.
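
If your server does support that, pulling a whole directory down as a 
single stream is trivial to script. Here's a minimal sketch with 
Python's ftplib, assuming a wu-ftpd/proftpd-style setup where 
retrieving "somedir.tar.gz" archives the directory "somedir" on the 
fly; the host, login, and directory name are all made up:

    from ftplib import FTP

    ftp = FTP('scanserver.example.com')
    ftp.login('scanuser', 'secret')

    with open('scans.tar.gz', 'wb') as out:
        # The server builds the archive as it streams it; no
        # scans.tar.gz file needs to exist on the remote side.
        ftp.retrbinary('RETR scans.tar.gz', out.write)

    ftp.quit()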

Depending on what you want to do, you may also be able to write a quick 
CGI script that dumps out a tar file over HTTP, with HTTPS and/or HTTP 
authentication layered on as needed. By "quick" here I mean "you'll 
probably recover the writing time in two or three transfers" given how 
you describe them :) . You'd want to make sure the CGI script actually 
sends the file out as tar produces it, rather than buffering the entire 
file and then sending it. PHP and other web systems can be made to do 
this, but this may be one case where you're better off with a plain CGI 
script.

This approach lets you pass in arbitrary query string parameters 
describing which scans you want; just be careful with your command-line 
escaping! Also be aware that such a script ties up an entire server 
process while it runs, so it's a bad idea to do this on a production 
server that is serving "real" web content as well, unless you can 
guarantee that only a fraction of the server's resources will be 
consumed by these transfers (no sudden glut of 100 people trying to 
grab huge directories). Anyway, this opens up a lot of options, from 
wget and curl to, potentially, a web interface for picking out the 
files you want.
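
For concreteness, here's a minimal sketch of what such a script might 
look like in Python. The /data/scans directory, the "scan" query 
parameter, and the whitelisting rule are all made-up details, and it 
sidesteps the escaping question by whitelisting the parameter and never 
invoking a shell at all:

    #!/usr/bin/env python3
    # Minimal sketch of a streaming-tar CGI script.  The /data/scans
    # directory and the "scan" query parameter are placeholders.
    import os
    import re
    import subprocess
    import sys
    import urllib.parse

    SCAN_ROOT = '/data/scans'

    params = urllib.parse.parse_qs(os.environ.get('QUERY_STRING', ''))
    scan = params.get('scan', [''])[0]

    # Whitelist the parameter instead of trying to escape it: accept
    # only a plain directory name, so nothing user-supplied ever
    # reaches a shell.
    if not re.fullmatch(r'[A-Za-z0-9_-]+', scan) \
            or not os.path.isdir(os.path.join(SCAN_ROOT, scan)):
        sys.stdout.write('Status: 404 Not Found\r\n')
        sys.stdout.write('Content-Type: text/plain\r\n\r\nno such scan\n')
        sys.exit(0)

    sys.stdout.write('Content-Type: application/x-tar\r\n')
    sys.stdout.write('Content-Disposition: attachment; '
                     'filename="%s.tar"\r\n' % scan)
    sys.stdout.write('\r\n')
    sys.stdout.flush()

    # tar writes the archive to stdout as it builds it, so the client
    # starts receiving data immediately; nothing is buffered on disk.
    subprocess.run(['tar', '-C', SCAN_ROOT, '-cf', '-', scan],
                   stdout=sys.stdout.buffer, check=False)

A client could then grab a scan by pointing wget or curl at something 
like http://yourserver/cgi-bin/getscan?scan=scan42 (names made up) and 
untarring the result as it arrives.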

