Analyzing iMessage With SQL
You can use SQL to search your old iMessage data - and learn some surprising facts about the people you text along the way.
SQLite is an often overlooked flavor of SQL engine. It is frequently described as the most widely deployed SQL engine in existence, thanks to its flexibility and its ability to run on almost any platform with limited resources. Unlike other SQL engines like MySQL, PostgreSQL, MSSQL, or Oracle, SQLite runs without a server. SQLite does not rely on a data directory or a constantly running daemon: a database is encapsulated in a single file.
SQLite and iMessage
iMessage is one of the most popular messaging platforms today, largely because it is built into iOS and Mac devices. Since its release, it has evolved significantly. But, at its core, it is simply an instant messaging platform. iMessage uses SQLite in the background to store relational data about messages, conversations, and their participants.
As a long-time Apple user, I have backed up and transferred my iPhone data since my first time using an iPhone, which was November 10, 2009. Because I have been digitally hoarding my text data for so long, my iMessage database is nearly 1GB in size.
Until a few years ago, the built-in search feature for iMessage was very limited and buggy. Although it has improved significantly, it still offers, like nearly any end-user tool, only a narrow set of ways to query your data. Those of us who frequently work with data that is trapped behind a limited front-end often wish we could get direct access to the SQL database. Fortunately, the iMessage database is not inaccessible - in fact, it is very easy to access.
Finding the iMessage SQL Database
On Your Mac
If you have iMessage enabled on your Mac as well as your iPhone, you have 2 different databases from which to choose. The database on your Mac is very easy to find, as it is simply under ~/Library/Messages/chat.db. If you do not use your Mac for iMessage, or, as in my case, your Mac iMessages do not go as far back, you can extract your iPhone's database by performing a backup to your Mac.
On Your iPhone
Follow these instructions to extract your iPhone's iMessage database:
- Open Finder and select your iPhone under "Locations".
- Find the "Backups" section and select "Back up all of the data on your iPhone to this Mac", then press Back Up Now to immediately create a new backup. This process may take a while.
- Once it is complete, you will find the SQLite file under /Users/[username]/Library/Application Support/MobileSync/Backup/[backup name]/3d/3d0d7e5fb2ce288813306e4d4636395e047a3d28.
- If you plan to open this database with Arctype, you'll want to copy and rename the file with a .db extension to indicate that it is an SQLite file.
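The copy-and-rename step can also be scripted. Here is a minimal Python sketch of it; the helper name copy_messages_db and the destination file name are my own inventions, and it assumes the backup is not encrypted (an encrypted backup would not yield a readable SQLite file this way).

```python
import shutil
from pathlib import Path

# File name the article gives for the Messages database inside a backup.
SMS_DB_HASH = "3d0d7e5fb2ce288813306e4d4636395e047a3d28"


def copy_messages_db(backup_dir: Path, dest: Path) -> Path:
    """Copy the hashed backup file to `dest` (hypothetical helper).

    `backup_dir` is the [backup name] folder under MobileSync/Backup.
    The backup itself is left untouched.
    """
    # Backup files live in a subfolder named after the first two hex digits.
    source = backup_dir / SMS_DB_HASH[:2] / SMS_DB_HASH
    if not source.is_file():
        raise FileNotFoundError(f"no Messages database at {source}")
    shutil.copy2(source, dest)
    return dest
```

You would call it with the path to your own backup folder and a destination such as imessage.db in your home directory. Working on a copy also means nothing you do later can damage the backup.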
Getting Started With SQLite
Unlike most SQL servers, you do not need a connection string, host, or username to connect to an SQLite database. All you need to do is point your SQL client to the database file.
With Arctype
- Under the Connections dropdown, select "Add new data source"
- Select "SQLite"
- Find the SQLite database file. The file must have a .sqlite3 or .db extension for Arctype to open it.
More detailed instructions can be found in the Arctype Docs.
With Command Line
From a UNIX terminal, type sqlite3 [filename].
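If you would rather script your queries, Python's built-in sqlite3 module works the same way: you only point it at the file. The sketch below is one possible approach, not a requirement. It opens the file read-only through an SQLite URI so that an exploratory query cannot change your message history. The function name open_read_only is hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an SQLite file read-only so queries cannot modify it."""
    # as_uri() yields a file: URI; mode=ro asks SQLite for a read-only handle.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # lets you read columns by name
    return conn
```

For example, open_read_only(Path.home() / "Library/Messages/chat.db") would target the Mac database mentioned earlier, although querying a copy is the more cautious choice.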
iMessage Schema
One of my favorite parts about Arctype is how easy it is to analyze a database schema. I'm a long-time user of command-line tools and old-school editors, but sometimes having a more visually interactive tool is a lifesaver. Let's dig into the schema Apple has created for iMessage. Today we will focus on the chat, message, and handle tables, as well as a few join tables that connect related records.
Note that I have created a custom view called handle2, which adds a field id2 that obfuscates the phone numbers and email addresses of my personal contacts. You will see this view referenced in the examples in this article.
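The definition of handle2 is not shown here, so the following is only a guess at one way to build such a view. It runs against a toy handle table with made-up contacts, and it derives id2 from ROWID, which is my assumption rather than the method actually used. It uses a TEMP view, which lives only for the current connection and does not alter the database file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-in for the real handle table (only the columns used here).
    CREATE TABLE handle (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, id TEXT NOT NULL);
    INSERT INTO handle (id) VALUES ('+15555550100'), ('friend@example.com');

    -- id2 replaces the real identifier with a label derived from ROWID.
    CREATE TEMP VIEW handle2 AS
    SELECT ROWID, id, 'contact-' || ROWID AS id2
    FROM handle;
""")

aliases = [row[0] for row in conn.execute("SELECT id2 FROM handle2 ORDER BY id2")]
print(aliases)  # ['contact-1', 'contact-2']
```

Queries meant for sharing can then select id2 instead of id, so no real phone number or email address ends up in a screenshot.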
Digging Into iMessage
Let's write some queries and make some observations that would not be possible without direct SQL access.
Pique Your Nostalgia With Old Messages
To get started, let's begin with a simple query to view your first 50 messages. If you have chat threads that go back years and years, there is no easy way to access early messages from your iPhone or Mac.
The interface on both platforms requires you to scroll back by about 25 messages at a time. This is prohibitively time-consuming and can result in a crash or reset if the other person sends you a new message while you're scrolled back.
Fortunately, we have custom SQL to save us:
handle.id represents the readable identifier for the user. It will be either a phone number or an email address.
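As a rough illustration of this kind of query, here is a sketch that lists the oldest messages together with handle.id. It runs on a toy in-memory database. The table and column names (message.handle_id, message.date, message.is_from_me) are assumed to match the real schema, and the sample rows are invented. In a real database the date column is a raw numeric timestamp, so expect to convert it before it reads as a calendar date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy tables holding only the columns this query needs.
    CREATE TABLE handle (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, id TEXT NOT NULL);
    CREATE TABLE message (
        ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT, handle_id INTEGER, date INTEGER, is_from_me INTEGER
    );
    INSERT INTO handle (id) VALUES ('+15555550100');
    INSERT INTO message (text, handle_id, date, is_from_me) VALUES
        ('see you soon', 1, 300, 1),
        ('first message ever', 1, 100, 0),
        ('a reply', 1, 200, 1);
""")

# Oldest messages first, capped at 50 rows.
FIRST_MESSAGES = """
    SELECT message.date, handle.id, message.is_from_me, message.text
    FROM message
    JOIN handle ON handle.ROWID = message.handle_id
    ORDER BY message.date ASC
    LIMIT 50
"""

oldest = conn.execute(FIRST_MESSAGES).fetchall()
for row in oldest:
    print(row)
```

Adding a WHERE clause on handle.id would narrow the result to a single conversation.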
Rate Your Friendships With SQL
Let's use SQL to find out who our best friends are. Assuming you view the quality of friendships as a function of the quantity of sent text messages, this should be very accurate!
First, let's divide the number of messages that are from me (is_from_me) by the number that are not to produce a reply ratio. This query shows the top 10 people we have been messaging by total number of messages, as well as the reply ratio.
Multiplication by 1.0 casts to the REAL data type to avoid integer division, which would truncate the ratio to a whole number instead of returning a decimal. The SQLite documentation describes the rules for integer division.
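To make the integer-division point concrete, here is a small sketch that computes the ratio both ways on invented data: one contact with three messages sent by me and two received. The column names are assumed to match the real message table, and the query is my own approximation of the idea rather than the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (handle_id INTEGER, is_from_me INTEGER);
    -- Contact 1: three messages sent by me, two received.
    INSERT INTO message VALUES (1, 1), (1, 1), (1, 1), (1, 0), (1, 0);
""")

REPLY_RATIO = """
    SELECT handle_id,
           COUNT(*) AS total,
           -- integer / integer truncates
           SUM(is_from_me) / SUM(1 - is_from_me) AS truncated_ratio,
           -- multiplying by 1.0 first forces REAL arithmetic
           SUM(is_from_me) * 1.0 / SUM(1 - is_from_me) AS reply_ratio
    FROM message
    GROUP BY handle_id
    ORDER BY total DESC
    LIMIT 10
"""

ratios = conn.execute(REPLY_RATIO).fetchall()
print(ratios)  # [(1, 5, 1, 1.5)]
```

The truncated column reports 1 where the true ratio is 1.5, which is exactly the error the multiplication avoids. If a contact has never received a message from you, the divisor is 0 and SQLite returns NULL rather than raising an error.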
One issue with this analysis is that fewer sent messages do not necessarily imply fewer words sent. Let's add some more fields to get a better insight.
Here we can see the total number of characters sent and received, the average length of text messages sent and received, the overall ratio of characters sent to characters received, and the reply ratio. In my case, people from whom I tend to receive more messages also send longer messages than I do.
This query makes heavy use of aggregate filters. Aggregate filters allow you to use an aggregate function on only a part of the data by specifying a WHERE clause to filter out unwanted records.
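Here is a hedged sketch of aggregate filters on toy data. Each aggregate carries its own FILTER (WHERE ...) clause, so a single pass over the table yields separate sent and received totals. The rows are made up, the column names are assumed, and the FILTER clause on aggregates needs a reasonably recent SQLite (3.30 or later, as far as I know).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (handle_id INTEGER, is_from_me INTEGER, text TEXT);
    INSERT INTO message VALUES
        (1, 1, 'hi'),
        (1, 1, 'ok'),
        (1, 0, 'hello there');
""")

LENGTH_STATS = """
    SELECT handle_id,
           SUM(length(text)) FILTER (WHERE is_from_me = 1) AS chars_sent,
           SUM(length(text)) FILTER (WHERE is_from_me = 0) AS chars_received,
           AVG(length(text)) FILTER (WHERE is_from_me = 1) AS avg_len_sent,
           AVG(length(text)) FILTER (WHERE is_from_me = 0) AS avg_len_received
    FROM message
    GROUP BY handle_id
"""

stats = conn.execute(LENGTH_STATS).fetchall()
print(stats)  # [(1, 4, 11, 2.0, 11.0)]
```

Without FILTER you would need one subquery or CASE expression per column, which gets noisy as the number of metrics grows.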
Examining iMessage Reactions
There are 2 newer iMessage features whose implementations, in the context of their schema design, are interesting to look into. Recently an announcement was made that Android phones will be able to show iMessage "reactions" properly. Historically, if you send an iMessage reaction to a non-Apple device, it will show up as a textual addition instead of an icon.
With the announcement of the new compatibility with Android devices, I was curious to learn how the current implementation of the feature works.
I selected a few records with and without a reaction and compared the results. I discovered that the associated_message_type column was usually set to 0, but in messages with a reaction, it was an integer value between 2000 and 2005. I also noticed that associated_message_guid was present. Apple appears to be using 2000-2005 for its six reaction types, 3000-3005 for when a user removes a reaction, and 3 for an Apple Pay request.
From this investigation, it appears that reactions are sent as iMessages with the reaction's textual equivalent appended and a foreign key relation to the parent message. This allows the messages to seamlessly be sent and received by non-Apple devices.
If the message is sent over SMS, the metadata linking the reaction to the message it references is simply lost. If the device is iMessage capable, Apple devices will ignore the text part of the message, find the associated message, and add the proper reaction as a visual overlay.
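A quick way to repeat this investigation is to bucket messages by associated_message_type. The sketch below does that on invented rows. The value ranges are the ones observed above, not documented constants, so treat the labels as a working hypothesis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (text TEXT, associated_message_type INTEGER);
    INSERT INTO message VALUES
        ('Dinner at 7?', 0),
        ('Sure', 0),
        ('Liked "Dinner at 7?"', 2001),
        ('Loved "Sure"', 2000),
        ('Removed a like', 3001),
        ('Payment request', 3);
""")

# Bucket each message by the ranges observed in the article.
MESSAGE_KINDS = """
    SELECT CASE
             WHEN associated_message_type BETWEEN 2000 AND 2005 THEN 'reaction added'
             WHEN associated_message_type BETWEEN 3000 AND 3005 THEN 'reaction removed'
             WHEN associated_message_type = 0 THEN 'plain message'
             ELSE 'other'
           END AS kind,
           COUNT(*) AS n
    FROM message
    GROUP BY kind
    ORDER BY kind
"""

kinds = conn.execute(MESSAGE_KINDS).fetchall()
print(kinds)
```

Any value that lands in the 'other' bucket on your own data is a good candidate for a closer look.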
Note that the message table includes both a ROWID and a guid. ROWID is a typical auto-increment integer id field, which is useful for joining on within the local database. However, the auto-incremented primary key will not be the same for the same message across devices. The guid is globally unique, generated by the author of the message, and sent to all of its recipients. This allows foreign key references across different databases, devices, and users. For more information about the utility of GUIDs, check out this article.
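To show how the guid works as a cross-device key, this sketch joins each reaction to its parent message with a self-join on the message table. It makes a simplifying assumption: that associated_message_guid holds the parent's guid verbatim. In a real database the stored value may carry extra prefix characters, so inspect a few rows before relying on an exact match. All data here is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (
        ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
        guid TEXT UNIQUE NOT NULL,
        text TEXT,
        associated_message_type INTEGER DEFAULT 0,
        associated_message_guid TEXT
    );
    INSERT INTO message (guid, text) VALUES ('GUID-A', 'Dinner at 7?');
    INSERT INTO message (guid, text, associated_message_type, associated_message_guid)
        VALUES ('GUID-B', 'Liked "Dinner at 7?"', 2001, 'GUID-A');
""")

# Self-join: each reaction row points at its parent through the guid.
REACTIONS_WITH_PARENT = """
    SELECT parent.text AS original, reaction.text AS reaction_text
    FROM message AS reaction
    JOIN message AS parent ON parent.guid = reaction.associated_message_guid
    WHERE reaction.associated_message_type BETWEEN 2000 AND 2005
"""

pairs = conn.execute(REACTIONS_WITH_PARENT).fetchall()
print(pairs)
```

Joining on guid rather than ROWID means the same query would still resolve the relationship in a database copied from another device.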
Find Your Most Popular Group Chats
Group chats are stored in the chat table. The join tables chat_handle_join and chat_message_join are used to associate users and messages, respectively, with group chats. Here's a query to find the most used group chats (chats with more than one member) and the identities of their participants.
The group_concat function, which MySQL users know by the same name and PostgreSQL users know as string_agg, is an aggregate function that concatenates strings together. See more on how it can be used within SQLite here.
The HAVING clause is similar to a WHERE clause but operates on aggregate functions. If you've wanted to write a query conditional on an aggregate but are not able to inside of your WHERE clause, HAVING is there for you.
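Here is a sketch that puts group_concat and HAVING together on a toy version of the chat tables. The join-table column names (chat_id, handle_id, message_id) are assumed to match the real schema, and the data is invented. Counting messages in a correlated subquery is my own choice: it keeps the member join from multiplying the message count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chat (ROWID INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE handle (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, id TEXT NOT NULL);
    CREATE TABLE chat_handle_join (chat_id INTEGER, handle_id INTEGER);
    CREATE TABLE chat_message_join (chat_id INTEGER, message_id INTEGER);

    INSERT INTO chat (ROWID) VALUES (1), (2), (3);
    INSERT INTO handle (id) VALUES ('alice'), ('bob'), ('carol');
    -- Chat 1: alice + bob. Chat 2: carol only. Chat 3: all three.
    INSERT INTO chat_handle_join VALUES (1, 1), (1, 2), (2, 3), (3, 1), (3, 2), (3, 3);
    -- Chat 1 has 3 messages, chat 2 has 5, chat 3 has 1.
    INSERT INTO chat_message_join VALUES
        (1, 10), (1, 11), (1, 12),
        (2, 20), (2, 21), (2, 22), (2, 23), (2, 24),
        (3, 30);
""")

POPULAR_GROUP_CHATS = """
    SELECT chat.ROWID AS chat_id,
           (SELECT COUNT(*) FROM chat_message_join AS cmj
             WHERE cmj.chat_id = chat.ROWID) AS messages,
           COUNT(handle.ROWID) AS members,
           group_concat(handle.id, ', ') AS participants
    FROM chat
    JOIN chat_handle_join AS chj ON chj.chat_id = chat.ROWID
    JOIN handle ON handle.ROWID = chj.handle_id
    GROUP BY chat.ROWID
    HAVING COUNT(handle.ROWID) > 1   -- keep only chats with several members
    ORDER BY messages DESC
"""

group_chats = conn.execute(POPULAR_GROUP_CHATS).fetchall()
for row in group_chats:
    print(row)
```

Chat 2 has the most messages but only one member, so HAVING drops it. A WHERE clause could not do this, because the member count does not exist until after grouping.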
Conclusion
SQLite is a powerful tool whose prolific reach across devices and numerous use cases make it one of the most impressive software projects around. If you're curious about what's behind the scenes, SQLite's source code is well known to be well-organized and fun (well, to some of us) to peek into.
iMessage is just one of many pieces of software that rely on SQLite and are used by millions of end-users. You can try out a SQL client like Arctype for free and start exploring the databases that power the tools you use daily!
Published at DZone with permission of Daniel Lifflander. See the original article here.
Opinions expressed by DZone contributors are their own.