Earlier this month the U.S. Patent and Trademark Office published a patent application from Apple that generally relates to audio analysis and, more specifically, to analyzing audio input for efficient speech and music recognition. Apple first introduced music identification technology created by Shazam into iOS during its WWDC 2014 event, as noted below. Was the deal with Shazam only a stopgap measure until Apple delivers its own homegrown solution? Apple’s latest patent application strongly suggests just that.
Apple’s Patent Background
Audio recognition, such as speech and music recognition, has become increasingly significant in electronic devices, and in particular, portable electronic devices (e.g., portable media players, cellular telephones, and tablet computers). For example, virtual assistant applications running on electronic devices can apply speech recognition to decipher spoken user requests and deliver relevant services based on the deciphered spoken user request.
In another example, music identification software applications running on electronic devices can apply music recognition to analyze unknown excerpts of music and provide music identification services. Currently, applications providing speech recognition related services typically operate independently from those providing music recognition related services. Users must therefore select different applications depending on the type of service desired and the type of audio input (e.g., speech or music) provided, which negatively impacts user experience.
Apple’s Invention: Analyzing Audio Input for Efficient Speech and Music Recognition
Apple’s invention generally relates to systems and processes for analyzing audio input for efficient speech and music recognition. In one example process, an audio input is received. A determination is made as to whether the audio input includes music. In addition, a determination is made as to whether the audio input includes speech. In response to determining that the audio input includes music, an acoustic fingerprint representing the portion of the audio input that includes music is generated. In response to determining that the audio input includes speech rather than music, an end-point of a speech utterance of the audio input is identified.
Specifically, a virtual assistant can be configured to provide both speech recognition and music recognition related services. In order for both types of services to be provided seamlessly, it can be desirable for the virtual assistant to determine whether a received audio input includes speech or music and thereby provide the appropriate services to the user automatically based on the determination.
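The branching described in the example process can be sketched in a few lines of Python. This is only an illustration of the control flow and not Apple’s implementation: the names `analyze_audio`, `AnalysisResult` and `toy_fingerprint` are hypothetical, the music and speech detectors are supplied by the caller because the summary above does not say how detection works, and the "fingerprint" is a simple hash standing in for a real acoustic fingerprint.

```python
from dataclasses import dataclass
from hashlib import sha1
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # one short window of audio samples in [-1.0, 1.0]


@dataclass
class AnalysisResult:
    kind: str                          # "music", "speech", or "unknown"
    fingerprint: Optional[str] = None  # set only when music is detected
    end_point: Optional[int] = None    # index of the last speech frame


def toy_fingerprint(frames: List[Frame]) -> str:
    """Placeholder fingerprint: a hash of coarsely quantized samples.

    A real acoustic fingerprint would be robust to noise and distortion;
    this stand-in is not (assumption made for illustration only).
    """
    digest = sha1()
    for frame in frames:
        # Clamp each sample to [-1, 1] and quantize it to a single byte.
        digest.update(bytes(int(max(-1.0, min(1.0, s)) * 127) & 0xFF for s in frame))
    return digest.hexdigest()


def analyze_audio(
    frames: List[Frame],
    is_music: Callable[[Frame], bool],
    is_speech: Callable[[Frame], bool],
) -> AnalysisResult:
    """Route an audio input to music or speech handling.

    Both detectors are hypothetical callables provided by the caller.
    """
    # Determine whether the input includes music, and whether it includes speech.
    music_frames = [f for f in frames if is_music(f)]
    speech_indices = [i for i, f in enumerate(frames) if is_speech(f)]

    # Music present: fingerprint only the portion that contains music.
    if music_frames:
        return AnalysisResult("music", fingerprint=toy_fingerprint(music_frames))

    # Speech rather than music: identify where the utterance ends.
    if speech_indices:
        return AnalysisResult("speech", end_point=speech_indices[-1])

    return AnalysisResult("unknown")
```

In this sketch music takes priority over speech, mirroring the "speech rather than music" wording of the process. A virtual assistant built along these lines could send the fingerprint to a music identification service in the first case and pass the utterance, trimmed at the end-point, to speech recognition in the second.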
Apple’s patent FIG. 1 noted below illustrates an exemplary process for analyzing audio input for efficient speech and music recognition.
Apple’s patent FIG. 2 noted below illustrates an exemplary system and environment for carrying out aspects of analyzing audio input for efficient speech and music recognition.
Apple’s patent FIG. 4 noted below illustrates a functional block diagram of an exemplary electronic device such as an iPhone.
While Apple’s patent application was filed in the US, it was discovered today in Europe. Apple originally filed for the patent in May 2014. Considering that this is a patent application, the timing of bringing a product based on Apple’s own patent-pending technology to market is unknown at this time. The reveal of ‘Hey Siri’ working with Shazam can be found at around the 1 hour, 21 minute mark of the WWDC event video found in our 2014 report.
Patently Apple presents a detailed summary of patent applications with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details.