Big Data

Flattening a JSON Object So It’s Queryable Using Rockset

Many developers use NoSQL databases in order to ingest unstructured and schemaless data. When it comes to understanding the data by writing queries that join, aggregate, and search, it becomes more challenging. This is where Rockset becomes a great partner not only in understanding your unstructured data but in returning queries that join, aggregate, and search within milliseconds at scale. Rockset is a real-time indexing database built for the cloud that acts as an external indexing layer on top of your data lakes, data streams, transactional databases, and data warehouses.

flattening json objects

In this twitch stream, we created a MongoDB Atlas instance. After the instance is created, you have the option to use the MongoDB preseeded databases. Here I used the database called netflix and the collection called movies.

snapshot mongodb

After we configure the instance, we created an integration on Rockset with MongoDB, by using the built-in data connector for MongoDB. We provide limited credentials, so Rockset can read the data from MongoDB. The instructions to configure Atlas and create the Rockset integration can be found here — or you can watch the stream below!

Inspecting the data

Once the data is in Rockset, it will look something like this:

Embedded content:

If you noticed the field genres looks like this:

"genres": "[{'id': 80, 'name': 'Crime'}]"

… Strings, Strings, everywhere…


Basically, we have a string type as a value, when it should be an array of objects. Let’s say you wanted to see all the genre’s names without the id key; you wouldn’t be able to write a query that can do this, as it’s currently formatted.

Transforming Genres from a JSON String → to an ARRAY


Rockset has a function called UNNEST, that can be used to expand array of values or documents to be queried (aka flattening the JSON object). Assuming no errors in how genres is formatted as a string, we can accomplish this in 2 steps:

  1. Parse the given string as JSON:

Here, you can use JSON_PARSE, which parses a given JSON string as a JSON object:

SELECT JSON_PARSE("[{"id":3, "name":"thriller"}]");

When you run that in the Query Editor, you should get this back:

-- get an array of objects back
[{"id":3, "name":"thriller"}]

Keep in mind, our string is currently formatted like this:

“[{'id': 80,'name': 'Crime'}]"

  1. Expand the array and flatten the JSON object:


UNNEST(yourCollectionName.genres AS value) AS genres

When you run this query, you should get:

-- result of UNNEST where we return
[{"name": "Crime”}]

In the following recorded twitch stream, we actually got a curveball ball ?, where we couldn’t JSON_PARSE(genres). A parsing error was thrown because the string in the data is malformatted. In this case, we added an extra step to solve this. Check out the stream ? to see how we resolved the error– (and don’t forget to follow us!)

Embedded content:

TLDR: you can find all the resources you need to get started on Rockset in the developer corner.