Voice and Platforms

Is voice a platform, a vertical, or simply an enabler for whatever is being provided or produced?


It’s all of these.

When I began work on Tincup Voice, I had no illusions about creating a general-purpose voice-enabled platform like Siri or Alexa. It wasn't intended as an input or interface for an on-demand task or query system. I was specifically not interested in solving that problem; I was narrowly focused on creating an easy-to-use mobile transcription tool that generated editable posts. I also knew that I'd need a huge amount of capital to realistically compete with existing voice platforms operated by two of the largest (read: deep-pocketed) companies on the planet.

Tincup Voice is a vertical, i.e., it addresses a specific market within the overall voice market. It creates high-fidelity, editable posts from natural-language speech, on your phone or desktop, without using privacy-invading device APIs.

The transcription vertical has a lot going on right now, and it's clearly extremely valuable as a whole. The giant platforms (Google, Amazon, Apple & Facebook) are using it to target ads more precisely, while smaller emerging platforms like Zoom are attacking it as an enabler for other services.

What does this mean?

For services like Zoom, now a publicly traded company ($ZM, market cap $18.3B), it means pressure from all sides to grow their share of the videoconferencing vertical. That requires both converting customers of existing competitors and pulling in completely new customers who don't currently use video communication tools. Historically, a big gap in enterprise video use has been its lack of text-based search & review. It's incredibly clumsy and time-consuming to watch a video while attempting to transcribe it for use in other, text-based internal systems. Zoom's recent announcement of automatic video transcription is a big win. Enterprises are going to love this; I guarantee it will result in new customers.

Innovations like that are emerging at the same time that Google is releasing built-in transcription in the consumer recorder app on the Pixel 4. Think about that: record a voice memo and, instead of only getting a recording of your voice, you get an automatic transcription. Assuming that the transcription is generated in an easy-to-use file format, a consumer will be able to produce a complete transcription (voice -> recording -> transcription -> usable text) on their phone without any additional services. The transcription feature is said to work offline, too.

Since Google effectively owns Android (not literally, but they may as well) and they do own the Pixel device, they control the huge platform those two represent. They're competing with Apple's iOS & iPhone platform, obviously. Google is not directly competing with vendors like Zoom, but Zoom is competing with both Google and Apple, as well as with transcription services like Trint (and indirectly with smaller service providers like Tincup Voice). That's a lot of pressure.

Privacy is a huge issue. The big platforms own the microphones that are now everywhere in your life: in your kitchen, living room & bedroom (smart TV, anyone?). Those microphones are, by default, always recording. They don't magically wake up when you say the wake phrase 'Hey Google' or 'Alexa…'; they are already listening. When you use the wake phrase, the device simply begins processing what you say as a request.

It should be assumed that the device records & stores everything, even if only in encoded form directly on the device. Without access to the source code, exactly how this happens remains subject to speculation, and justified paranoia.

Google has now placed this capability directly in an app on your phone. It isn't clear yet exactly how or where the recorder app will store its transcriptions, but since Google is in the data business, you should assume, until there's evidence to the contrary, that the transcriptions will be analyzed. Remember, you're an advertising profile to Google. The more they know about you, the more valuable you are as a marketing target to their global customer base. Google generated $32.6 billion in advertising revenue in Q2 2019. That's a lot of targeted ads; your marketing profile is valuable.

The same argument can be made about Amazon and Apple. Don't forget Facebook, too. I haven't used Facebook since 2008 because it's a privacy & ethical nightmare, but there are countless examples of users claiming that Facebook snoops on them by silently taking control of their phone's microphone to target ads in their feeds more precisely.

The conclusion here is that voice is incredibly valuable to the platform owners as a means to deliver ads. That's gross, and it's why Tincup Voice, in the editable transcription post vertical, doesn't use privacy-invading device APIs. When it needs your microphone to record a post, it requests access. You have control over this every single time. It doesn't analyze your transcriptions and it doesn't deliver ads. Everything is private by default.

A provider in the transcription vertical can use privacy as a key differentiator. Competing against the platform owners is futile unless you attack their Achilles’ heel: privacy & the resultant consumer trust.

- jbminn