Updates from January, 2012

  • Urban 14:08 on 15 Jan. 2012 Permalink  

    Subtitlr retires 

    The agony has gone on long enough: from an idea in 2006, to a proof of concept in mid-2007, to a business plan and hopes of a start-up (under the name of Tucana, d.o.o.) in 2008, and directly into the dustbin of history.

    I’ve just pulled the plug and shut it down.

    Its ideas were good: a Wikipedia-inspired, revision-based subtitling and subtitle-translation service that would help spread knowledge in the form of Flash-based video clips. It has been obsoleted by other projects with more traction, such as DotSub and TED translations (incidentally, most of the clips that inspired me, and that I wanted to share with people whose first language was not English, came from TED itself). Now that YouTube’s speech recognition and Google’s machine translation have gotten much better, there’s less and less need for painstaking transcription and all the manual work.

    If I had to choose one thing to blame for its lack of success, it would be underestimating the difficulty of transcribing video. It literally takes hours to accurately transcribe a single clip no longer than a couple of minutes.

    I’ve tried rebranding and repurposing it into a funny-clip subtitler, and at least got some fun and local enthusiasm out of that. However, it’s all part of one big package which now needs closure.

    Some ideas I’ve had were never implemented, although I thought they had great potential: I wanted to bring together large databases of existing movie and TV-show subtitles with the publicly available video content in Flash Video. Since at the time almost all video on the web was FLV, there was no technological barrier. There are still plenty of popular TV shows, movies, etc., buried deep in the video CDNs (YouTube, Megavideo, Megaupload), and large databases of “pointers” are maintained and curated by different communities (Surfthechannel.com, Icefilms.info). Having the video and the subtitle available instantly, without the cost of hosting large files, was a textbook mash-up idea.

    I’m posting some screenshots below, for the future me, so I can remember what I was spending countless hours of my time on. Yes, the design’s ugly, but bear in mind it was all the work of one man, eager to add functionality, and pressed into kickstarting the content generation by also transcribing and translating a bunch of videos.

    Thanks to all who shared the enthusiasm and helped in any way.

    Main page

    Video page

    Subtitle translation

    Rebranded as a Hitler-parody subtitle editor
  • Urban 02:01 on 9 Jan. 2012 Permalink  

    Some notes on Nexenta/ZFS 

    So far I’ve been pretty satisfied with my Nexenta setup. Dedup is a great feature, and I’ve rsynced all the other computers to it without a single thought about which files are copied more than once (all the Windows and program files, multiple copies of pictures, multiple copies of Dropbox dirs, etc.). However, the following three things drove me nuts; here’s a word on how I’ve resolved them.

    Smbshare vs. Samba

    Yes, native ZFS smbshare is great; it even exposes snapshots as “Previous versions” to Windows boxes. And it can be simply managed from napp-it. However, smbshare won’t allow you to share the file system’s children 🙁

    Here’s how this works: let’s say you have 3 nested file systems:

    • mypool/data
    • mypool/data/pictures
    • mypool/data/pictures/vacation

    When you share mypool/data and navigate to it, you won’t see the pictures dir. When you navigate to pictures, you won’t see the vacation dir.

    It drove me crazy, and it seems it won’t be supported any time in the near future. That’s why I disabled smbshare completely and installed plain old Samba. Because Samba isn’t ZFS-aware (it’s just a regular app that accesses the file system), it shares everything as you’d expect. Problem solved.
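    A minimal sketch of that switch, assuming the dataset layout above and a stock Samba install; the share name, path, and SMF service name are illustrative and may differ on your box:

```shell
# Turn off the native ZFS/CIFS share on the parent dataset
zfs set sharesmb=off mypool/data

# Export the whole tree through plain Samba instead; because Samba just
# walks the mounted file system, nested datasets show up as ordinary dirs.
cat >> /etc/samba/smb.conf <<'EOF'
[data]
   path = /mypool/data
   read only = no
   browseable = yes
EOF

# Restart Samba (the exact service name depends on your Samba package)
svcadm restart network/samba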

     

    Backing up an entire zpool

    I wanted the following:

    • to back up the entire data pool to an external USB drive of the same capacity (2 TB)
    • a solution that would be smart enough to recognize matching snapshots and only copy the diffs
    • it should also delete the destination snaps that no longer exist in the source pool
    • it wouldn’t hurt if it supported remote replication in case I wanted that later

    Much has been written about the awesomeness of zfs send | zfs receive, but I was immediately disappointed by all the manual work that still needs to be done. Sure, it supports recursive send/receive and it can be used over ssh, but it only operates on snapshots. It can incrementally copy the latest delta (if you tell it exactly which two snapshots to diff), but if you prune your old snapshots to save space, it won’t know anything about that. So your backup’s size will grow indefinitely, constantly appending more data.
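    To illustrate the bookkeeping involved, here is a rough sketch of incremental zfs send/receive; the pool and snapshot names are made up:

```shell
# Initial full, recursive replication
zfs snapshot -r mypool/data@snap1
zfs send -R mypool/data@snap1 | zfs receive -F backup/data

# Incremental run: you must name both endpoints yourself.
# zfs send knows nothing about snapshots you have pruned since,
# so snaps destroyed on the source simply linger on the target.
zfs snapshot -r mypool/data@snap2
zfs send -R -i mypool/data@snap1 mypool/data@snap2 | zfs receive backup/data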

    What I wanted was a simple 1:1 copy of my pool to an external drive. I even considered adding the drive to the mirror to create a three-way mirrored pool; once resilvering completed, I could split the mirror and disconnect the USB drive. However, resilvering is not that smart and takes days: all the data needs to be copied every time, and taking a snapshot interrupts and restarts the process.
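    For reference, the attach-and-split idea looks roughly like this (device and pool names are illustrative, and zpool split requires a recent enough pool version):

```shell
# Attach the USB disk as a third side of the mirror; a full resilver starts
zpool attach mypool c1t0d0 c3t0d0

# Watch "zpool status" until resilvering completes, then split the new
# side off into its own single-disk pool and pull the drive:
zpool split mypool mypool-backup c3t0d0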

    Then I found the excellent zxfer. It does work on Nexenta and allows you to mirror an entire pool. The procedure is pretty straightforward: first, determine the path of your external USB drive using rmformat:

    # rmformat
    1. …
    2. Logical Node: /dev/rdsk/c3t0d0p0
       Physical Node: /[email protected],0/pci103c,[email protected],2/[email protected]/[email protected],0
       Connected Device: WD 20EADS External 1.75
       Device Type: Removable
       Bus: USB
       Size: 1907.7 GB
       Label: <Unknown>
       Access permissions: Medium is not write protected.

    Then create your USB drive backup pool:

    zpool create Mybook-USB-backupzpool /dev/rdsk/c3t0d0

    Finally, recursively back up your entire data zpool (here the target is set to be compressed with gzip-5 and deduped with sha256,verify):

    zxfer -dFkPv -o compression=gzip-5,dedup=verify -R sourcezpool Mybook-USB-backupzpool

    On subsequent runs it identifies the last common snapshot and copies only the diffs. The -d switch deletes pruned snapshots from the target pool. For more, read the zxfer man page.
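    For unattended runs, the whole thing can be wrapped in a small script; this is a sketch using the pool names from the example above, with an import/export pair only because the target lives on a removable drive:

```shell
#!/bin/sh
# Bring in the removable backup pool, mirror the data pool onto it,
# then cleanly export it so the drive can be unplugged.
zpool import Mybook-USB-backupzpool || exit 1
zxfer -dFkPv -o compression=gzip-5,dedup=verify -R sourcezpool Mybook-USB-backupzpool
zpool export Mybook-USB-backupzpool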

     

    Virtualbox

    This has been pretty disappointing: it’s a pain to set up and it performs badly (that is, compared to VMware on similar hardware). It burns around 15% of the CPU running an idle, freshly installed Ubuntu box. Command-line configuration is a pain, and virtualizing (P2V) certain Windows boxes spits out errors that (according to Google) no one has ever seen before. The same image Just Works™ under VMware.
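    For the record, the command-line setup boils down to something like the following; the VM name, memory size, NIC name, and disk path are all illustrative:

```shell
# Create and register a headless VM
VBoxManage createvm --name ubuntu --ostype Ubuntu_64 --register
VBoxManage modifyvm ubuntu --memory 1024 --nic1 bridged --bridgeadapter1 e1000g0

# Create a 10 GB disk and wire it to a SATA controller
VBoxManage createhd --filename /mypool/vms/ubuntu.vdi --size 10240
VBoxManage storagectl ubuntu --name SATA --add sata
VBoxManage storageattach ubuntu --storagectl SATA --port 0 --device 0 \
    --type hdd --medium /mypool/vms/ubuntu.vdi

# Boot it without a GUI
VBoxHeadless --startvm ubuntu &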

    Nonetheless, it’ll have to do for now. For more info on how to set it up, consult the following:

     
    • Gea 23:33 on 12 Jan. 2012 Permalink

      about SMB
      There are efforts to solve this problem.
      Currently, each ZFS dataset is completely independent of the others. You can set a mountpoint to mount one logically below another, but you cannot SMB-browse between them because they cannot inherit properties.

      Workaround: use one share.

      about zfs send
      The target does not increase; it’s always an exact copy of the source, including all volumes and snaps. Transfers are based on modified data blocks, so it’s more efficient than file-based methods.

      about VirtualBox
      Why virtualize on top of an OS when you need performance? Use a bare-metal virtualizer like ESXi instead and virtualize all OSes, including a ZFS NAS/SAN.
      Look at napp-it all-in-one.

    • Urban 15:58 on 15 Jan. 2012 Permalink

      Hey, thanks for your elaborate comment.
      Regarding SMB, I think Samba is a pretty decent workaround as well; as far as I can tell, all you lose is “previous versions”.

      About zfs send: my understanding is that only the diff between two snaps is sent, while older snaps are left intact (I did in fact check that).
      Let’s say I have a weekly snapshot schedule and only want to keep 5 snaps. After each snap, I also send the incremental diff to the USB drive. In week 5, I still have two identical file systems. However, in week 6 I make snap6 and destroy snap1; then I send the delta (-i) between snap5 and snap6 to the external drive. Now drive 1 has snaps 2–6 and drive 2 has snaps 1–6. This is not what I want, since drive 2 grows in size compared to drive 1.

      Regarding Vbox: thanks for the tip, I’ll definitely try the all-in-one solution.

    • jaymemaurice 06:01 on 22 Dec. 2013 Permalink

      Old post – but you also lose out on a great multithreaded CIFS implementation by using Samba.
