Siri screen capture

A screen cap from Siri

I’m interested to see if the new Siri voice command stuff on the iPhone 4S goes anywhere. I don’t think that it will, but it’s not because Android was there first or because I don’t think it works. It’s because I don’t think people will use it.

Yes, Android had voice commands first, but Siri is very different. It was created by a dedicated company based on military artificial intelligence research – not just a side project to take dictation. Siri was fully fleshed out before Apple bought it. Voice on Android works (if you speak slowly and clearly) but it isn’t “smart”. The breakthrough of Siri is that it works out what you want based on natural language and context, not keywords.

I don’t think people will use it for two reasons:

  1. It won’t work in every environment. Too much background noise, other people talking, television/radio on, etc. If you have to make a conscious effort to change your environment to use it, then you simply won’t. It’s not so convenient if you have to step out of a room or switch something off when you can accomplish the same thing with a few taps.
  2. People like their privacy. Artificial intelligence is compelling on television and in the movies because it is a trick to let the audience know what the characters are thinking. You are watching them problem solve.

In real life, people don’t want everyone else to know that they’re looking up restaurant reviews, creating an appointment to meet someone for dinner, or checking sports scores.

Voice interface and artificial intelligence are very powerful, but until you can subvocalize, I just don’t see it catching on.

This is the primary reason that I don’t think that computing in the living room on a TV will work. People have an intimate relationship with their data and having the display across a room just feels too invasive. Sure, it works great to share YouTube videos with friends and do other consumption activities. But not research or creation.

Would you honestly feel comfortable writing an email across your living room where anybody could walk in and read it (or look in through a window)?

Now how about on a train or in the office with everyone listening?

Edit: A counter argument from John Athayde on Google+

How about while you’re driving? How about if you’re in a private office?
I don’t think it will be ubiquitous, but I do think it will become more used, especially for certain circumstances.

My response

True. Baby steps. I just think for most people, if they don’t use a feature regularly, then they forget about it.
I’m very interested to see how it plays out and am envious that Apple bought it, when it was going to go multi-platform ;-)

Edit: The screen capture is from a series by Joshua Topolsky. The other queries he made are also excellent!

No, I’m not being cornered by friends and family to address any problems. At least if I were, I don’t think they would publish the schedule for me :)

Intervention is a creator-focused Internet Culture Convention near Washington DC held September 16 – 18.

I am scheduled to speak on three panels this year:

Open Source Software for Everyday Use
When: Friday 6:00pm – 7:00pm
Where: Panels Room 2

Do you feel that your creativity is held back because you can’t afford programs like Photoshop, Illustrator, Final Cut, or Microsoft Office? Our panel of experts have freed themselves from the bonds of expensive closed software ecosystems and you can too. Whether you’re just fed up, want to try new things, or can’t afford to pick up the software you want, there is an open source alternative available to you. Find out what packages our panelists use, how to find software that fits your needs, and how to join the amazing communities that spring up around open source software.

Chances are you probably use some OSS, and you don’t even know it. Our panel of open source evangelists discuss the facts and help you find the OSS packages that can free you from outrageously high costs, bizarre licensing practices, and poor interface design. Come and learn ways to save yourself money, improve your productivity, and secure your computer. You don’t have time or money to ignore OSS anymore. Set yourself free.

Keeping Your Stuff Safe
When: Saturday 6:00pm – 7:00pm
Where: Panels Room 2

Strategies for backing up and preserving your digital life

Audio Podcasting 101
When: Sunday 1:00pm – 2:00pm
Where: Panels Room 2

Everything you need to get up and running with a podcast with as little money as possible. Learn from these podcasters how to avoid those rookie mistakes that can turn an audience away.

Update: I just upgraded my primary workstation to the full KXStudio as a test. Details at the end of the post

I have long been wanting to replace the Windows operating system on our studio machine with Linux, but haven’t for a number of reasons. Recently the machine has been having fits and garbled a very important interview, so it came time to wipe it and start over.

Our home server took a dump, so our regular studio box took its place. The reconstituted server hardware is now in the studio, but it is less than ideal. It was my primary desktop machine about 7 years ago, sporting a 1.1 GHz AMD AthlonXP processor and 1 gig of memory (the system bus is only like 200 MHz), so it’s limited in what can be expected from it performance-wise. Someday the studio will get a proper hardware upgrade. It would help if we sold the friggin’ house though.

What works:
An Ubuntu 10.04 (Lucid) system with a very harmonious audio environment recording from my firewire mixer at 48kHz and 24bits. Currently using the KX “low latency” (but not realtime) kernel. More on why I went with Lucid below.

JACK is the full-time audio core and everything goes through it with bridges if apps aren’t JACK native. Ardour works, Linux VST plugins work, many Windows VST plugins work too. I didn’t play with any of that much since my primary goal is podcast production, preferably with Skype remote co-hosts, which works (yay!)

I loaded up Skype, then Ardour. In Ardour I mapped the first two mics on my mixer to tracks 1 + 2 (since my wife and I co-host most of our shows together). I created a 3rd track and mapped the PulseAudio sink to it. This feeds all audio output from Pulse applications (browser, media player, Skype) to that track. The outputs of my microphones were already mapped from JACK to Pulse, so I didn’t have to do anything there.

I called the sexy Skype Call Testing robot and voila: I could hear her, and she could hear me! Furthermore, Ardour recorded all of her audio onto track 3, which was completely discrete (neither of my mics were on the track), and my tracks 1 + 2 were completely discrete as well. That is exactly what I wanted, so mission accomplished!
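The routing described above can be written down as a small table and checked for overlap. This is only a sketch in Python: the port and track labels are hypothetical names made up for illustration, not the actual JACK port names from this setup.

```python
# Hypothetical labels; real JACK port names depend on the interface and apps.
ROUTES = {
    "track1": {"mixer_mic1"},
    "track2": {"mixer_mic2"},
    "track3": {"pulse_sink"},  # Skype, browser, media player via PulseAudio
}


def shared_sources(routes):
    """Return sources that feed more than one track.

    An empty result means every track is discrete.
    """
    seen = {}
    for track, sources in routes.items():
        for source in sources:
            seen.setdefault(source, set()).add(track)
    return {src: tracks for src, tracks in seen.items() if len(tracks) > 1}


if __name__ == "__main__":
    print(shared_sources(ROUTES))
```

If a microphone were accidentally patched into the Skype track as well, it would show up in the result with both track names listed.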

This was mostly “out of the box” with very little tweaking. The tweaks wouldn’t even have been necessary if I had a USB, PCI, or fully supported firewire interface. KXStudio really does “just work”.

On Twitter today Thomas, Chris and I commented about the new Google+ a little and I think that their “Hangout” feature will be a boon to podcast recording. It allows ten-person video conferencing for free. With this setup I could participate in a multi-person video conference and record its audio (or not), and still have clean tracks of my side of the conversation. If each person recorded their side of the conversation and we pulled the WAV files together, then we’d have pristine sound with the benefit of that facial and body language feedback to help the conversation go more smoothly.

Caveats:

  1. My latency is pretty abysmal (24 to 46 ms depending on how hard I want to push the CPU), but that’s not important for podcast recording. Nothing is noticeable on the Skype call. I will work on latency when I get back to doing some music composition, but I suspect I will need a new rig for that given the slow system bus and other limited resources of this machine. I was able to run at 2.5 ms without affecting my Ardour tracks (I only tried three), but every other program bogged down as the CPU spiked. I suspect the Ardour recording works so well since JACK is running in realtime mode with a high priority, so all other programs get very little CPU time.
  2. You cannot play audio from a pulse source and route that to Skype. For instance, the other people on the call wouldn’t hear a YouTube video if you played it. This is a minor inconvenience. I suspect that you could route a VLC or Audacity instance to feed them audio (thinking about podcast feedback here), but I didn’t have a chance to try it. In a way it’s good because it means that your buddies won’t hear any desktop alerts or other system audio chimes if you forgot to turn them off.
    Another benefit of Google+ Hangout is that you can do shared YouTube watching that syncs between all browsers. If anybody pauses, fast forwards or rewinds, it automatically does so on everybody’s YouTube stream. Pretty nifty! If only Netflix or HBOGo would hook into this!
  3. If you shut your mixer off, it will not come back up in JACK. You need to reboot your whole system before recording again. Also a minor inconvenience, but somewhat annoying since that was something that didn’t seem to bother Windows.
    Of course with the audio issues I’ve had lately with recording through Windows, I was prophylactically rebooting before every session so it’s a wash. It is likely possible to modprobe the firewire kernel module again after turning the mixer back on and force-restarting JACK, but I didn’t try that.

How I Got There:
When looking at all of the media-centric Linux distributions I decided to go with KXStudio on top of Ubuntu. I found it interesting that they recommend Lucid (10.04) rather than the newest version. There were actually forum comments from FalkTX (the main guy behind KXStudio) essentially saying that 10.04 is still the best platform for audio on Linux due to changes in the newer versions.

I thought briefly of going ahead with 11.04 (KX does support it), but figured I would use what they recommend. The KXStudio team backports all of the kernels, tools and the latest versions of pretty much all audio software to 10.04 so there isn’t much to lose. Also, it is a Long Term Support release for Ubuntu, so it will have security and bug fixes until about halfway through 2013. A recording studio is something that you don’t want to mess around with a lot once you have things dialed in.

I downloaded and installed Ubuntu 10.04, then followed the instructions to add the KX repositories and “upgrade” to KXStudio. It went very smoothly with a couple minor question prompts and some waiting for it to download a couple gigs of software.

One of the steps is picking your desktop environment (they support Gnome, KDE and Unity). I’m most familiar with Gnome so that’s what I went with. Years ago I was a KDE user and I briefly considered going back to it, but this project just isn’t the place to do that.

Another step is to pick a kernel. I was going to go with the realtime kernel (2.6.38-8), but that wouldn’t allow the proprietary drivers for the nvidia graphics (built into my motherboard) to work. For some reason, the system will not boot into X with the open source nvidia drivers (Nouveau), so I’m kind of stuck here. I went ahead with the “low latency” kernel which is a little older, but still 2.6 (2.6.33, I believe).

I also had to work through some monitor resolution issues. It was stuck at 640×480, then at 800×600. The highest I’ve been able to get it is 1024×768 which is annoying on the widescreen monitor, but acceptable. I’ll work it out later. X configuration has always been a bit of a black art to me so I need to do some more research.

First time bringing up the connection tool I didn’t see the Firewire mixer (an Alesis Firewire 8). My friend Thomas has the same mixer and went through an arduous journey getting his to work which I was hoping to avoid (though, thankful for his notes getting his to work!)

Working through FFADO’s troubleshooting FAQ I found that the issue was simply Ubuntu not loading the kernel module. I loaded the module and it showed right up. I added the module to modprobe.conf so it would auto-load on boot.

In my playing with kernels, somehow this stopped working after a reboot and I couldn’t figure out why. It kept saying that the ohci1394 kernel module was missing. It turned out that it was in a blacklist file. I removed it from that file and all was well.
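If you hit the same thing, a quick way to hunt for the culprit is to scan the modprobe configuration files for a `blacklist` line naming the module. Here is a minimal Python sketch of that idea. I am assuming the files live under `/etc/modprobe.d/` with a `.conf` extension (the usual location, but check your own system), and the function names are my own invention.

```python
import glob


def find_blacklisted(module, config_texts):
    """Return (filename, line_number) pairs where `module` is blacklisted.

    config_texts maps a filename to the text of that file.
    """
    hits = []
    for name, text in config_texts.items():
        for number, line in enumerate(text.splitlines(), start=1):
            # Drop trailing comments, then look for "blacklist <module>".
            words = line.split("#", 1)[0].split()
            if len(words) >= 2 and words[0] == "blacklist" and words[1] == module:
                hits.append((name, number))
    return hits


def read_configs(pattern="/etc/modprobe.d/*.conf"):
    """Read every file matching the pattern (assumed location), skipping unreadable ones."""
    configs = {}
    for path in glob.glob(pattern):
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                configs[path] = handle.read()
        except OSError:
            continue
    return configs


if __name__ == "__main__":
    for path, number in find_blacklisted("ohci1394", read_configs()):
        print(f"{path}:{number}")
```

Any file and line number it prints is a candidate to comment out or remove, which is what fixed it for me.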

That was it for install and configuration, and it met all of my first goals of recording microphone and Skype tracks. Next I played a little with reducing latency. The default load was about 24ms (1024 buffer with 2 periods for ALSA and 1024/3 periods for firewire). I dropped this down to 128/2 and was down to 2.5ms, but as mentioned above, the CPU spiked at 100% and Skype audio cracked up. The interesting thing is that no xruns were reported and my microphone tracks in Ardour didn’t have any drops at all. I’m curious to see if monitoring tracks while recording causes drops or xruns but didn’t have a chance to play with it.

I tried a few different settings: 512/2, 512/3, 1024/2 and 2048/2. The default of 1024/2 really was the sweet spot. I don’t know if KXStudio always sets that as default, or if it did it based on my hardware.
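For anyone wondering how those buffer/period pairs relate to milliseconds, the usual back-of-the-envelope estimate is frames per period divided by the sample rate, multiplied by the number of periods for the whole buffer. Here is a rough Python sketch of that arithmetic. It assumes the 48 kHz sample rate mentioned earlier, the function names are mine, and the latency a real system reports will depend on the hardware and driver.

```python
def period_ms(frames, rate_hz):
    """Duration of one period of `frames` samples, in milliseconds."""
    return 1000.0 * frames / rate_hz


def buffer_ms(frames, periods, rate_hz):
    """Duration of the whole buffer (periods x frames), in milliseconds."""
    return periods * period_ms(frames, rate_hz)


# The frames/periods pairs mentioned in this post.
SETTINGS = [(128, 2), (512, 2), (512, 3), (1024, 2), (1024, 3), (2048, 2)]

if __name__ == "__main__":
    rate = 48000  # assumed sample rate in Hz
    for frames, periods in SETTINGS:
        print(
            f"{frames}/{periods}: "
            f"{period_ms(frames, rate):.1f} ms per period, "
            f"{buffer_ms(frames, periods, rate):.1f} ms total buffer"
        )
```

By this estimate a 128-frame period at 48 kHz is about 2.7 ms, and halving the frame count halves the figure, which is why the small buffers hammer the CPU so much harder.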

I believe if I had a modern machine with dual, quad, or more cores and a faster system bus that everything would work just fine at 5ms, or maybe even 2.5ms. My primary desktop workstation is still no prize winner as it’s almost five years old, but it is at least dual core. I have the KX repositories on it, but have only used them to get the latest builds of Ardour, Audacity and JACK.

Now I’m going to do the full upgrade to KXStudio and see what kind of latency I get on it. Though, the audio interface is either the internal sound card on the motherboard or my Sennheiser usb headphones, so I don’t know how much they’ll impact things.

If I get some time I may haul the mixer upstairs and try it. The wife and I have been talking about making our normal computer room the studio, thus removing the need for a dedicated studio machine anyway. That is not likely going to happen until we move though, so I don’t know that I want to go to the trouble of connecting things and tearing it down again just for testing.

I hope this at least inspires your experimentation, if not helps you – feel free to ask for assistance if you go this route and get stuck!

Here is an excellent reference document that explains (in simple terms) why audio on Linux is so complex:
http://tuxradar.com/content/how-it-works-linux-audio-explained

 

Update:

I went ahead and did the full KXStudio upgrade on my primary workstation. It’s a 4.5 year old Dell with the following specs:

  • Intel Core2 1.86 GHz CPU
  • 2 GB RAM
  • 667 MHz bus
  • Sennheiser usb headset
  • Using the lowlatency kernel (same issue with nvidia driver, so no realtime kernel for me)

JACK is set at 48 kHz (native for the audio chip on the motherboard as well as the Sennheiser headset). I am able to play back and record 16 tracks with 6 effects in Ardour (a couple reverbs, compressor, 4 band parametric eq, fast look-ahead limiter) with a latency of 2.7 ms, CPU at 80% (spiking to 100%), DSP at 17%, and no xruns!
This makes me very, very happy. I am so blown away with KXStudio and whatever magic they are doing behind the scenes.

I can’t wait to get to Balticon this Memorial Day weekend (May 27-30, 2011) and am honored to once again participate as a panelist, performer, and moderator. What’s that? Why no, I didn’t just paste this over from last year’s post! And I’m insulted that you would suggest such a thing :)

Here is a list of the events I will be participating in. Come on by and say hello. I’d love to meet you! The schedule is pretty stable, but there is always a possibility that some of the times or rooms may change, so please check back before heading out to make sure you’ve got current info.

Sound Design & Extreme Audio Effects
When: Friday 10:00pm – 11:00pm
Where: Derby

Our expert panel of professional sound designers and audio engineers will explore various topics and techniques surrounding sound design and the art and science of extreme audio effects. They will share insider tips and tricks to help you squeeze strange and unworldly tones and textures out of the equipment you already own, inspire you to build and record uncommon noise makers, and to turn ordinary sounds into sonic landscapes designed to enhance your next multi-media project. They will cover both basic and advanced recording, tweaking and extreme manipulation techniques to provide you with the take-away know-how to shake awake audiences and transport them to strange new worlds.

Master’s Session: Audio Excellence in Podcasting
When: Saturday 9:00am – Noon
Where: Derby

Our panel of expert audio enthusiasts will discuss various topics surrounding audio engineering for spoken word, music, and everything in between. They will help you get the best sound out of the equipment you have, help you pick the next piece of important equipment within your budget, and help you avoid burnout by streamlining your workflow to shorten the time you spend editing audio. The audience is encouraged to bring questions and even samples of problems they are having with their own work.

Concert: Ditched by Kate
When: Saturday 7:00pm – 8:00pm
Where: Garden Room

New Media participants Phil Rossi and Chooch Schubert bring us alternative rock with their band “Ditched by Kate”.

Unlikely Disasters to Plan For
When: Saturday 10:00pm – 11:00pm
Where: Chesapeake

Because more things may uprise than just zombies and robots!
So much attention is paid to how one might survive the zombie apocalypse or robot uprising. But aren’t there a whole lot of other things we should be planning for? How about mole men? Insect sentience? Or grey goo? Join our panel of possible-apocalypse scholars as they enumerate the conceivable threats. We might even have time to figure out how to survive one or two! Audience participation encouraged!

Into The Blender: Live!
When: Sunday 9:00pm – 10:00pm
Where: Chesapeake

Geek Media: One size does not fit all
The IntoTheBlender.com podcast is back for another live show. This time we’re taking on a touchy subject: There are countless arenas of geek affection, but some seem near universal: movies, television, and books. Whether it be Star Wars or Firefly, Lord of the Rings or Buffy, Gaiman or Pratchett, there are things you are SUPPOSED to love as a geek. Well, we don’t love them all and I bet you don’t either! Come compare your likes and dislikes with an assorted panel of lovers and haters of every genre. Take My Geek Card (I Dare You!)

Open Source Software for Everyday Use
When: Sunday 2:00pm – 3:00pm
Where: Derby

What we use at home and at work to free us from software giants
Do you feel that your creativity is held back because you can’t afford programs like Photoshop, Final Cut, or Microsoft Office? Our panel of experts have freed themselves from the bonds of expensive closed software ecosystems and you can too. Whether you’re just fed up, want to try new things, or can’t afford to pick up the software you want, there is an open source alternative available to you. Find out what packages our panelists use, how to find software that fits your needs, and how to join the amazing communities that spring up around open source software.
Chances are you probably use some OSS, and you don’t even know it. Our panel of open source developers and evangelists discuss the facts and help you find the OSS packages that can free you from outrageously high costs, bizarre licensing practices, and poor interface design. Come and learn ways to save yourself money, improve your productivity, and secure your computer. You don’t have time or money to ignore OSS anymore. Set yourself free.

May 27-30, 2011