TTL Data Structures

ttl 8 February 2025 11:25:22 PM

At the time of writing this post, TTL has:

16 users (SELECT COUNT(user_id) AS users FROM users;)
35 posts (SELECT COUNT(post_id) AS posts FROM blog_post;)
128 comments (SELECT COUNT(comment_id) AS comments FROM blog_comment;)

Despite these small numbers, TTL is designed to handle large quantities of data smoothly. This is directly reflected in the data structures it uses.

Note: this article focuses on how data is handled programmatically once it is queried from the database. For more information about the database itself check out TTL Database Schema (post not yet created).

Posts
Feed

The feed is created from an object composed of the day a post was published, the hour it was published, and the post data itself.

Obj {
    unixTime (day rounded down): [{
        unixTime (hour rounded up*): [
            { post }
        ]
    }]
}

The first thing to note is the Unix time distinction. Days are rounded down and hours are rounded up so that posts are grouped by the day and hour they were posted. This structure supports my design goal of having the feed resemble a day planner. When the feed is rendered, each rounded-up hour is placed so that all of its associated posts sit under it.

It is also worth noting that there is a special case for a post published within the current hour. By current hour I don't mean within the last 60 minutes, but rather sharing the same hour value. Take 6:37 as the current time: only times between 6:00 and 6:37 fall under this case, while a time such as 5:38 does not. If this condition is met, the data structure stores the current time rather than the hour rounded up.
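As a rough illustration of the grouping described above (not TTL's actual code), the sketch below buckets posts by day and hour. It assumes Unix times in milliseconds, UTC day and hour boundaries, and nested Maps in place of the object shown earlier; `Post`, `hourKey`, and `buildFeed` are hypothetical names.

```typescript
// Hypothetical post shape; only `createdAt` (Unix time in milliseconds) matters here.
interface Post {
  postId: number;
  title: string;
  createdAt: number;
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// day (rounded down) -> hour (rounded up, or "now") -> posts in that bucket
type Feed = Map<number, Map<number, Post[]>>;

function hourKey(createdAt: number, now: number): number {
  const postHour = Math.floor(createdAt / HOUR_MS);
  const currentHour = Math.floor(now / HOUR_MS);
  // Special case: a post sharing the current hour value is filed under "now".
  if (postHour === currentHour) return now;
  // Otherwise the hour is rounded up to the next full hour.
  return Math.ceil(createdAt / HOUR_MS) * HOUR_MS;
}

function buildFeed(posts: Post[], now: number): Feed {
  const feed: Feed = new Map<number, Map<number, Post[]>>();
  for (const post of posts) {
    const day = Math.floor(post.createdAt / DAY_MS) * DAY_MS; // rounded down
    const hour = hourKey(post.createdAt, now);
    let hours = feed.get(day);
    if (hours === undefined) {
      hours = new Map<number, Post[]>();
      feed.set(day, hours);
    }
    const bucket = hours.get(hour);
    if (bucket === undefined) {
      hours.set(hour, [post]);
    } else {
      bucket.push(post);
    }
  }
  return feed;
}
```

With 6:37 as the current time, a post from 6:10 would land under the key for 6:37 itself, while posts from 5:38 and 5:10 would share the 6:00 bucket.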

This is a strong data structure: it is scalable, flexible, and optimized for efficient time-based queries.

Concerning infinite scrolling, this data structure is perfect for my use case: it is efficient to create following a database query, can seamlessly handle the addition of more entries, and implicitly offers numerous ways of rendering posts so as to optimize memory efficiency.

At the time of writing this post, TTL has:
8 days' worth of posts (SELECT COUNT(DISTINCT(EXTRACT('Day' FROM creation_date))) from blog_post;)
an average of 4.375 posts per day across those 8 days (SELECT AVG(posts) FROM (SELECT date_trunc('day', creation_date), COUNT(post_id) AS posts FROM blog_post GROUP BY date_trunc('day', creation_date));)
a max of 13 posts in a single day (SELECT MAX(posts) FROM (SELECT date_trunc('day', creation_date), COUNT(post_id) AS posts FROM blog_post GROUP BY date_trunc('day', creation_date));)
and a min of 1 post in a single day (SELECT MIN(posts) FROM (SELECT date_trunc('day', creation_date), COUNT(post_id) AS posts FROM blog_post GROUP BY date_trunc('day', creation_date));)

With numbers such as these, I can populate the data structure with every single post from the database with minimal tradeoffs, as there is little concern for memory or initial loading times.

As the userbase and the number of posts grow, the flexibility of the data structure will allow me to weigh these tradeoffs. For example, depending on consistency of posting as well as total quantity of posts, I can experiment with different criteria for subpopulating my data structure such as:
posts by day (consistent posting, low total quantity of posts)
posts by hour (consistent posting, medium total quantity of posts)
static number of n posts per 'page' (general suitability)
or any combination of the above
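
Two of these criteria could be sketched roughly as follows. This is only an illustration under assumptions: posts arrive sorted newest first, times are Unix milliseconds with UTC day boundaries, and `Post`, `pageByCount`, and `pageByDay` are hypothetical names rather than anything in TTL.

```typescript
// Hypothetical post shape; `createdAt` is assumed to be Unix time in milliseconds.
interface Post {
  postId: number;
  createdAt: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Static number of n posts per page. Assumes `posts` is sorted newest first.
function pageByCount(posts: Post[], page: number, pageSize: number): Post[] {
  const start = page * pageSize;
  return posts.slice(start, start + pageSize);
}

// One page per day that has posts: page 0 is the most recent such day.
function pageByDay(posts: Post[], page: number): Post[] {
  const days: number[] = [];
  for (const post of posts) {
    const day = Math.floor(post.createdAt / DAY_MS);
    if (days.indexOf(day) === -1) days.push(day);
  }
  days.sort((a, b) => b - a); // newest day first
  const target = days[page];
  if (target === undefined) return []; // past the last day with posts
  return posts.filter((post) => Math.floor(post.createdAt / DAY_MS) === target);
}
```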

Furthermore, this data structure will prove useful should a search function be implemented with temporal filters such as fetching posts from a certain day or within a certain range of time.
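
A temporal filter of that kind might look like the sketch below. It assumes the feed is held as nested Maps keyed by day and hour (Unix milliseconds, UTC), which is a simplification of the object shown earlier; `postsBetween` is a hypothetical helper, not an existing TTL function.

```typescript
// Hypothetical post shape; `createdAt` is assumed to be Unix time in milliseconds.
interface Post {
  postId: number;
  createdAt: number;
}

// day (rounded down) -> hour bucket -> posts
type Feed = Map<number, Map<number, Post[]>>;

const DAY_MS = 24 * 60 * 60 * 1000;

// Collect posts with from <= createdAt <= to.
// Days that lie entirely outside the range are skipped without touching their posts.
function postsBetween(feed: Feed, from: number, to: number): Post[] {
  const result: Post[] = [];
  feed.forEach((hours, day) => {
    if (day > to || day + DAY_MS <= from) return;
    hours.forEach((bucket) => {
      for (const post of bucket) {
        if (post.createdAt >= from && post.createdAt <= to) result.push(post);
      }
    });
  });
  return result;
}
```

Because the day keys are already rounded, fetching a single day reduces to one key lookup, and a range only inspects the days it overlaps.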

Publishing

Not only are there interesting data structures behind published posts; a number of data structures are also vital to the publishing experience while you are creating a post.

Footnotes

If you have uploaded a post with an image, you will have noticed the footnote system in place. Footnotes are integral to the publishing experience as they simplify the content in the editor and provide an easy referencing system should you want to place the image elsewhere in the post or use it multiple times.

A markdown editor can be limiting, as it can be difficult to imagine how a formatted post will appear. This is especially evident with images: a large post containing many images can become a mess of contextless URLs. The footnote system I have implemented allows users to attach labels to their footnotes, so you can upload a photo and edit its label to serve as a descriptor. Now, when reading your post in the markdown editor, you will have a clear idea of which image is present without having to remember what the URL refers to.

When considering the data structure for footnotes, there were two main considerations:
Maintaining the order of the footnotes
Manipulating the data within a specific entry (editing label and alt text)

With these considerations in mind, I decided to use a singleton pattern and create a class that initializes a map in its constructor. I knew that I would need many functions to manipulate the data, and a map lets me both keep track of order and efficiently identify and alter specific entries.

I chose a map over an array for multiple reasons. As mentioned previously, manipulating the data within a specific entry was important. With custom labels, I knew I would often need to look up entries by an identifying value rather than by index, so a map keyed on that value, with O(1) lookups, would be more efficient than searching an array at O(n). Other manipulations, such as updating and deleting, benefit from the same difference in complexity.

Using a class allowed me to isolate all my functions regarding manipulation of the map, which made it easy to both add features and troubleshoot/debug.
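
A minimal sketch of that kind of class is shown below, assuming each footnote is keyed by some stable id and carries a URL, label, and alt text. The class name `FootnoteStore`, its methods, and the `Footnote` fields are all hypothetical. It relies on the fact that a JavaScript Map iterates in insertion order, which is what preserves footnote order here.

```typescript
// Hypothetical footnote entry; the real fields in TTL may differ.
interface Footnote {
  url: string;
  label: string;
  altText: string;
}

class FootnoteStore {
  private static instance: FootnoteStore | undefined;
  private readonly footnotes: Map<string, Footnote>;

  // Private constructor: the map is created once, with the single instance.
  private constructor() {
    this.footnotes = new Map<string, Footnote>();
  }

  static getInstance(): FootnoteStore {
    if (FootnoteStore.instance === undefined) {
      FootnoteStore.instance = new FootnoteStore();
    }
    return FootnoteStore.instance;
  }

  add(id: string, footnote: Footnote): void {
    this.footnotes.set(id, footnote);
  }

  // Edits one entry in place; its position in the order is unchanged.
  updateLabel(id: string, label: string): boolean {
    const entry = this.footnotes.get(id);
    if (entry === undefined) return false;
    entry.label = label;
    return true;
  }

  remove(id: string): boolean {
    return this.footnotes.delete(id);
  }

  // Footnotes in the order they were added.
  list(): Footnote[] {
    return Array.from(this.footnotes.values());
  }
}
```

Every caller that asks for `FootnoteStore.getInstance()` receives the same object, so the editor and any preview code would share one set of footnotes.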

Edit & Versions

When populating the text box with your post after previewing, it is not as straightforward as pulling from the database and rendering the post. Before the post can be rendered, the data has to be mapped to a new array in which every value is sanitized. An array is used because that is what my database query returns, due to the possibility of multiple images in a post.

The versions are also sorted so that, when switching versions through the dropdown menu, they appear in the order they were made.

For more on general publishing, check out TTL Workflows (not yet created).

Comments

TTL features nested comments on blog posts. This was a feature I knew would be vital, yet it also proved the most challenging. I considered the following before deciding on the data structure:
Data structure should be created in a single pass
Relatively flat data structure
Not every comment on a post will be viewed

With these considerations in mind, I decided to create both an array for root comments and a map to store every comment along with immediate children.

rootComments = [{
    username,    
    comment_id,  
    parent_id,   
    body,        
    creation_date
}]               

commentMap = {
    comment_id: {
        username,
        comment_id,
        parent_id,
        body,
        creation_date,
        children: [{
            child_username,
            child_comment_id,
            child_parent_id,
            child_body,
            child_creation_date
        }]
    }
}

The root comments array allows me to render just the root comments very quickly when a post loads. Since we are adhering to the assumption that not every comment on a post will be viewed, we can serve the child comments on a need-to-know basis.

Currently, every comment associated with a given post is queried on page load and stored in my data structure. This is ideal for my current userbase, as it is a middle ground between rendering every single comment on page load and querying the database every time child comments are requested. Rather than calling the backend when loading child comments, we can read from the populated commentMap and render the data from there. Because it is a map, lookups like this are efficient, with O(1) average complexity.
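
The single-pass construction could be sketched as follows. This is an illustration rather than TTL's code: it assumes the query returns rows ordered so that a parent always appears before its replies (for example, ordered by creation date), that root comments have a null `parent_id`, and that `creation_date` arrives as a string. `CommentRow`, `CommentNode`, and `buildComments` are hypothetical names.

```typescript
// Row shape based on the fields listed above; the types are assumptions.
interface CommentRow {
  username: string;
  comment_id: number;
  parent_id: number | null; // assumed null for root comments
  body: string;
  creation_date: string;
}

interface CommentNode extends CommentRow {
  children: CommentRow[]; // immediate children only, keeping the structure flat
}

function buildComments(rows: CommentRow[]): {
  rootComments: CommentRow[];
  commentMap: Map<number, CommentNode>;
} {
  const rootComments: CommentRow[] = [];
  const commentMap = new Map<number, CommentNode>();

  // One pass: each row is registered in the map, then filed either as a root
  // comment or under its parent. Assumes parents precede their replies.
  for (const row of rows) {
    commentMap.set(row.comment_id, { ...row, children: [] });
    if (row.parent_id === null) {
      rootComments.push(row);
    } else {
      const parent = commentMap.get(row.parent_id);
      if (parent !== undefined) parent.children.push(row);
    }
  }
  return { rootComments, commentMap };
}
```

Expanding a comment then becomes a single `commentMap.get(comment_id)` followed by rendering its `children` array, with no further backend call.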

Another example of a map being more efficient is the case where you need to expand to an arbitrary comment. With nested comments this task becomes increasingly complex as depth increases, but with a map you can generate an array of parent ids with time complexity O(d), where d is depth. With an array, each parent lookup would require a scan of up to n comments, giving a complexity of O(n * d).
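
The O(d) walk can be sketched as below, assuming the map is keyed by `comment_id`, root comments have a null `parent_id`, and the parent links contain no cycles. `CommentEntry` and `ancestorIds` are hypothetical names used only for this illustration.

```typescript
// Only the two fields needed for the walk; other comment fields are omitted.
interface CommentEntry {
  comment_id: number;
  parent_id: number | null; // assumed null for root comments
}

// Returns the ids of every ancestor of `targetId`, ordered from the root down.
// Each step is one map lookup, so the walk costs O(d) for a comment at depth d.
function ancestorIds(commentMap: Map<number, CommentEntry>, targetId: number): number[] {
  const path: number[] = [];
  let current = commentMap.get(targetId);
  while (current !== undefined && current.parent_id !== null) {
    path.push(current.parent_id);
    current = commentMap.get(current.parent_id);
  }
  return path.reverse();
}
```

Expanding each id in the returned array in order would open the thread from the root comment down to the target.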

Conclusion

When creating this website, I didn't just want to create something that worked; I wanted to create something that would allow me to see what it took to create a project of this scale. I knew specifically that data would be one of the most important facets of this. Despite anticipating a small userbase of really only two of my friends, I wanted to create something I felt confident could withstand whatever size userbase was thrown at it.

Taking this approach, I came to the conclusion that it is much easier to design flexible data structures if you think big to begin with. All of the data structures mentioned in this article were designed with the capability to be semi-populated based on specific criteria and updated efficiently.

Rather than having the data structures dictate the experience for the userbase, the userbase is able to dictate the criteria for the data structures according to their needs.