Until last week, I had never realized how difficult uploading large files to a web site can be. HTTP isn’t really that well suited to it, and PHP has a couple of glaring weaknesses that make it nearly impossible. It all started when I ran into a minor problem at school…

An extra service for my students

When a student can’t make it to my class, it isn’t uncommon for that student’s parents to call the school and ask the secretary to put a tape recorder in the classroom and record the class. That way, the kid can listen to the class and be sure not to miss anything. I have my doubts that anyone listens to an entire two hour class, but I can understand the desire on the parents’ part and I can also see how students might want to hear certain things. A tape is far from ideal, but I guess it is better than nothing.

A couple of weeks ago, several parents called to say their kids would be missing a certain class due to something going on at their public school. For the first time, there were so many absent that the secretary didn’t have enough tape recorders to tape the class for everybody. She explained that to me and asked about buying some more. Tape recorders aren’t that expensive, but the more I thought about it, the more I realized I didn’t like the idea. The quality of in-class tape recordings is terrible. Tapes are outdated. I still have my students use them for homework since it’s a pretty standard system and I haven’t come up with a fully workable system to collect digital recordings from each student every class. But still… the idea of having a fleet of tape recorders whirring away at the back of the classroom and then having to manually flip the tapes in them during class doesn’t thrill me.

An attempt to modernize the system

I decided the best way to do it would be to buy a nice digital recorder at the local hypermart. That way, the secretary could just email an MP3 to everyone! It would be easier for us, the sound quality would be better and it would save the parents a trip to pick up the tapes! Unfortunately, email couldn’t handle 100 megabyte attachments. Strike that idea.

My next thought was that I’d just put the class recordings on our school blog. That didn’t work either. The free service we’re using allows us to upload 5 gigabytes, but specifically blocks audio files. Fine. “I’ll host them from my Dreamhost account,” I thought. They’re giving me 6 terabytes of bandwidth a month, why not use it? Just put the MP3 files on my server, link to them from the school blog and it’s good to go, right? Right. That worked without a hitch.

Letting the secretaries upload files

I thought it would be easy to make a web-based way for the secretaries to upload big files. It wasn’t that hard for me to implement 5 megabyte uploads in 1999 with Perl, so 100 megabyte uploads in 2009 should be a piece of cake. They really should be, but they aren’t.

I made a quick file upload tool in PHP, put it on my page and tested it out. It worked great for anything up to 7 megabytes, and failed on anything bigger. That was just a limit set in a config file. I raised the limit to 200 megabytes and tried again. It still failed, and it wasn’t even clear why. The upload page wasn’t very user-friendly either. It gave no feedback until a file had either uploaded or failed. For tiny files that was fine, but for debugging a problem with big files, it was a bit annoying.

So, I set out searching online for a way to make my upload report its progress. After hours of reading through various forums and references I learned that PHP lacks the ability to monitor incoming file streams! Amazing. Not having name-spaces is one thing, but no way to monitor incoming file streams!!? Forget that.

Making an FTP-based tool

FTP is well suited to large file transfers: it has facilities for reporting progress, and it doesn’t time out as easily as HTTP. It only has a couple of problems: it’s confusing for computer novices, and I don’t want to share my server log-in information with the secretaries.

The solution was to make a simple Python desktop application that hard-codes and encrypts my log-in information and initiates transfers to a directory on my server over SFTP, a file transfer protocol that runs on top of SSH. From the user’s perspective, it just pops up a window, asks what file should be uploaded, and then does it. Problem solved.
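
For the curious, here is a rough sketch of what the core of such a tool might look like. It is not the actual application, just an illustration built on Paramiko's SFTP client and the standard Tk file dialog. The host name, user name, password and remote directory are placeholders I made up for the example, and the helper names (format_progress, print_progress, upload, choose_and_upload) are hypothetical. The credential encryption is left out entirely.

```python
import os
import sys


def format_progress(sent, total):
    """Return a one-line progress message, e.g. '50% (512 of 1024 bytes)'."""
    percent = 100 if total == 0 else sent * 100 // total
    return f"{percent}% ({sent} of {total} bytes)"


def print_progress(sent, total):
    # Paramiko calls this with (bytes transferred so far, total bytes).
    sys.stdout.write("\r" + format_progress(sent, total))
    sys.stdout.flush()


def upload(local_path, host, username, password, remote_dir):
    """Copy one local file into remote_dir on the server over SFTP."""
    # Imported here so the helpers above work without Paramiko installed.
    import paramiko

    client = paramiko.SSHClient()
    # Only talk to servers whose host key is already in known_hosts.
    client.load_system_host_keys()
    client.connect(host, username=username, password=password)
    try:
        sftp = client.open_sftp()
        remote_path = remote_dir.rstrip("/") + "/" + os.path.basename(local_path)
        # The callback is what gives us progress reporting during the transfer.
        sftp.put(local_path, remote_path, callback=print_progress)
        sftp.close()
    finally:
        client.close()


def choose_and_upload():
    """Pop up a file chooser and upload whatever the user picks."""
    import tkinter
    from tkinter import filedialog

    root = tkinter.Tk()
    root.withdraw()  # hide the empty main window, show only the dialog
    path = filedialog.askopenfilename(title="Choose a recording to upload")
    if path:
        # Placeholder connection details, assumed for illustration only.
        upload(path, "example.com", "uploader", "not-a-real-password", "recordings")
```

Calling choose_and_upload() opens the file chooser and starts the transfer. The progress line is redrawn in place as Paramiko reports each chunk, which is exactly the feedback the PHP upload page never gave me.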

(1) Thanks to the friendly community at Stack Overflow for suggesting the Paramiko library. It made adding SFTP support easy.
(2) It seems that for those that must have HTTP file transfers with progress bars, the popular tool sets are ASP or Java or Perl plus AJAX.