User-Generated Collections, Content Recommendation MVP, and Explaining our Analytics Situation
Scrolller Weekly Update July 28th, 2021
Hello and welcome to our weekly update. This week's update is a really long one because we wanted to share a lot of detail about our analytics and how we are working through growing pains given our unique situation. If you are new to the newsletter, subscribe to keep up with our journey of building a 100 million MAU web application.
Weekly updates are published every Wednesday; this gives us more weekday time to prepare the newsletter (higher quality) and makes it easier to include updates from our internal weekly meeting (more timely).
User Highlights
We recently introduced User Highlights as a way to thank users who go above and beyond to help improve Scrolller.
This week, we are featuring one of our regular users, Júlio Pontes. He goes out of his way to let us know when he sees an issue and comes up with brilliant ideas for what could be added in the future to make Scrolller better for all users.
We need help translating Scrolller into Russian. You'll be helping us expand our reach to a larger global community. So, if you are out there and would love to lend a hand, don't hesitate to reach out and help make Scrolller a better experience for Russian users. We can't wait to meet you. Sign up below, and thank you in advance!
Sign Up To Help → https://airtable.com/shrrRkFqMSZBqF24Q
We would also like to kindly ask our users for help with feature testing. Your feedback helps us know whether our product features fit their purpose in the first place, and how we can improve Scrolller features for everyone.
To help us with Scrolller feature testing, go here. We greatly appreciate you taking the time to offer us feedback. Many thanks!
User Experience
228776 → 232486
User Analytics Explainer (Long)
Our Start with Google Analytics
We began by using Google Analytics (GA) to record all page views and events. So far, all of our analytics has been handled on the client side only, meaning that when an event happens on the client, it is immediately submitted to the configured analytics service.
Soon enough, our website grew to exceed the GA quotas, so we had to reduce the number of events being submitted. We settled on having less important events only submit 10% of the time - so for every 10 times the event occurred, only one was submitted to GA.
The issue with this approach was that only some of the events in a session are submitted. We were left with sparse data which couldn't reliably provide insights into the user and their experience.
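To see why per-event sampling leaves sessions so sparse, here is a minimal sketch in Python. It assumes every event is kept independently with the same probability, which simplifies the setup described above (only the less important events were sampled). The function names are made up for illustration.

```python
def complete_session_probability(rate: float, n_events: int) -> float:
    """Chance that all n_events sampled events of one session are kept,
    assuming each event is kept independently with probability `rate`."""
    return rate ** n_events


def expected_events_kept(rate: float, n_events: int) -> float:
    """Average number of events that survive sampling in one session."""
    return rate * n_events


# A session with three sampled events at the 10% rate mentioned above.
p_complete = complete_session_probability(0.10, 3)
avg_kept = expected_events_kept(0.10, 20)
```

Under that independence assumption, a session with just three sampled events is captured in full about once in a thousand sessions, which is why the resulting data could not describe a user's experience end to end.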
Another issue with GA being a cloud analytics solution is that it's difficult to analyze a dataset that you can't download or access without a costly upgrade to Analytics 360. Having the data on our own server, where we can run SQL against it directly, would be much faster.
Matomo
To overcome these issues surrounding GA's quotas and impractical data processing, we decided to set up an open-source, self-hosted analytics solution. The solution that we opted for was Matomo, which markets itself as superior to GA.
We set up Matomo to run on the main API server, which, in hindsight, was a terrible decision. As mentioned previously, all the analytics were handled on the client side, so sending all this analytics data resulted in huge loads of web requests to the API server - slowing it down even more.
Within a fortnight of setting Matomo into action, we found our website spewing out strange HTTP errors that we hadn't seen before, and our API had gone down completely for a short time. Upon investigation, we found our API server's disk space was full to the brim, and its CPU would often reach full utilization. We'd accumulated 85GB of Matomo data in our MySQL database, and we were overloading it with web requests.
We needed a different solution.
Back to Google Analytics
We have recently migrated the client to use a plug-and-play analytics abstraction, making it super easy to switch between analytics providers if we choose to do so in the future. For the moment, we're back on GA, so we're not blind in the meantime, but our chosen solution may change soon. We still can't afford Google's Analytics 360 offering, and even if we could, that money could be better spent investing in the website.
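For readers wondering what a plug-and-play analytics abstraction can look like, here is a minimal sketch. It is written in Python purely for illustration (the real client is a web client), and every class, method, and event name in it is hypothetical rather than taken from our codebase.

```python
from typing import Any, Dict, List, Protocol, Tuple


class AnalyticsProvider(Protocol):
    """Anything that can receive a named event with properties."""

    def track(self, event: str, properties: Dict[str, Any]) -> None:
        ...


class InMemoryProvider:
    """Hypothetical stand-in for a real backend such as GA or Matomo."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def track(self, event: str, properties: Dict[str, Any]) -> None:
        self.events.append((event, properties))


class Analytics:
    """Single entry point that the rest of the client calls."""

    def __init__(self, provider: AnalyticsProvider) -> None:
        self._provider = provider

    def use(self, provider: AnalyticsProvider) -> None:
        # Swapping providers does not touch any call site.
        self._provider = provider

    def track(self, event: str, **properties: Any) -> None:
        self._provider.track(event, properties)


ga = InMemoryProvider("ga")
matomo = InMemoryProvider("matomo")

analytics = Analytics(ga)
analytics.track("favorite_added", collection="dogs")
analytics.use(matomo)  # switch backend without changing callers
analytics.track("favorite_added", collection="architecture")
```

The point of the design is that application code only ever calls `analytics.track(...)`, so changing the provider is a one-line change in one place.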
What are our options?
Self-hosted analytics is not off the cards; neither is using GA and only sending part of the data. It seems we have a few different solutions available to us, each with its unique points worth thinking about.
1. Self-hosted Analytics
This is still an option - and gives us direct access to all our analytics data compared with a cloud offering.
Disk Consumption
To reduce our disk consumption, we will need to be pickier with what data we're capturing. For example, we log every page viewed by a user - and you can imagine that we get quite a few page views every day. It seems that for a web application with an almost infinite number of URLs, recording these is pretty meaningless.
We have to process these URLs (e.g., extracting the Reddit post ID) to make sense of what's going on. I don't want to go too deep into this, but the gist is that we should go with events rather than page views (submitting URLs). We need to consider what data gets stored in analytics: too little, and we can't make data-driven decisions; too much, and our disks fill up.
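As a rough sketch of turning a raw URL into a compact event, the Python snippet below pulls a post ID out of a Reddit-style `/comments/<id>/` path segment. Scrolller's own URL scheme is not described here, so the pattern, the function name, and the event shape are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlparse

# Assumed pattern: Reddit permalinks carry the post ID after "/comments/".
_POST_ID = re.compile(r"/comments/([a-z0-9]+)(?:/|$)")


def url_to_event(url: str) -> Optional[Dict[str, str]]:
    """Return a small event dict for a post URL, or None if no ID is found."""
    path = urlparse(url).path
    match = _POST_ID.search(path)
    if match is None:
        return None
    # Store only the ID, not the full URL, to keep each record small.
    return {"event": "post_view", "post_id": match.group(1)}


event = url_to_event("https://www.reddit.com/r/pics/comments/abc123/some_title/")
no_event = url_to_event("https://example.com/about")
```

Storing a short event name plus an ID instead of the whole URL is one way to keep the dataset both smaller and easier to query.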
Maxed-out CPU
Putting Matomo on the API server caused all of our other services (including website and API) to become slow and sometimes even crash - definitely a bad idea. It seems a good time for this issue to arise, as we've recently been thinking more about CI/CD and our architecture in general.
If self-hosted analytics is the path we go down, it might be the case that we run it on a server that is solely responsible for analytics - probably a mid-range VPS. We also have to consider where the database should be kept: serverless (auto-scaling) or on a server that we manage.
2. Cloud Analytics
Cloud Analytics, such as GA, is Software-as-a-Service (SaaS) - so we pay for the ability to access this service from another company.
It has the advantage of offloading the operational responsibility, as we can assume that the service works properly. The downsides are the difficulty of exporting and manipulating the dataset and the cost that may be incurred.
Quotas
To overcome the quotas that we've seen with GA, we need to either limit our analytics submissions or pay for a higher class of service.
If we're limiting the analytics we're sending off, it's clear that a crude 10% of events isn't going to cut it. James recently asked the team for their opinions on how we should approach this, with the main takeaway being to record all important events for 100% of sessions and some percentage of all events (maybe around 5%).
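One possible reading of that takeaway, sketched in Python: important events are always submitted, and everything else is submitted only for a deterministic 5% sample of sessions, so that sampled sessions stay complete. Sampling by session rather than by event is our assumption here, and the event names and function names are hypothetical.

```python
import hashlib

IMPORTANT_EVENTS = {"signup", "premium_purchase"}  # hypothetical event names
SAMPLE_RATE = 0.05  # "maybe around 5%"


def session_in_sample(session_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically place a session in or out of the sample.

    Hashing the session ID means every event of a session gets the same
    decision, without storing any per-session state.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < rate * 2**64


def should_submit(event_name: str, session_id: str) -> bool:
    """Always submit important events; submit the rest for sampled sessions."""
    return event_name in IMPORTANT_EVENTS or session_in_sample(session_id)


sampled = sum(session_in_sample(f"session-{i}") for i in range(10_000))
fraction = sampled / 10_000
```

Because the decision depends only on the session ID, a sampled session contains every event the user triggered, which avoids the sparse sessions that per-event sampling produced.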
Other paid services can be researched later on.
3. First-class/Integrated Analytics
Here lies the most versatile and powerful form of analytics that we could adopt. It's an involved effort to set into motion but has a very high potential moving into the future and can grow - in terms of functionality and scale - as Scrolller grows.
Rather than having an analytics solution separate from our own systems, this option incorporates analytics into our own system. What would otherwise be analytics data points just become another field in our database.
This means we could easily perform SQL operations on it, as the relationships in our RDBMS (PostgreSQL) would already be in place. This makes operations faster and also reduces the cumulative amount of data that needs to be held.
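To make this concrete, here is a small Python sketch of the kind of query that becomes possible when analytics lives next to the application data. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs anywhere, and the tables, columns, and rows are invented for illustration, not our real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real PostgreSQL database
conn.executescript(
    """
    CREATE TABLE collections (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE favorites (
        user_id       INTEGER NOT NULL,
        collection_id INTEGER NOT NULL REFERENCES collections(id),
        created_at    TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO collections VALUES (?, ?)",
    [(1, "dogs"), (2, "architecture")],
)
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [(10, 1, "2021-07-26"), (11, 1, "2021-07-27"), (10, 2, "2021-07-27")],
)

# The "analytics" question is just a join over tables we already have.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS favorite_count
    FROM favorites AS f
    JOIN collections AS c ON c.id = f.collection_id
    GROUP BY c.name
    ORDER BY favorite_count DESC, c.name
    """
).fetchall()
```

No separate event log is needed for this question: the favorites table that the product already maintains doubles as the analytics data.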
If we chose this route, the data would reside in our database; then, we could gain insight into the data by generating dashboards for internal use - and maybe even create some public-facing ones for users and investors.
This would be an incremental move, and we would likely keep another basic analytics solution running while we iron out the kinks in our custom one.
Of the three options, what do you think will work best?
Pro-Metric Ads
After a couple of weeks, we are finally starting to display the new pro-metric ads in Scrolller. By next week we will see the impact of our new ads in our analytics and share the results.
Recommendation System MVP
After spending a couple of weeks gathering the data needed, the team has decided to use "Alternating Least Squares for Matrix Factorisation" as a suitable algorithm to build the recommendation system.
In this approach, a data matrix (in our case, the favorites and follows of the users) is split into two matrices: one representing the latent factors that describe the data best (what collections appear together) and one holding the user descriptions (what factors are present for each particular user). These are computed iteratively by minimizing the prediction error for the factors and the user descriptions in alternation. [1]
ALS uses different error functions depending on whether it is applied to explicit (user ratings) or implicit (views, likes) feedback. We deal with implicit feedback. The number of favorites for a collection should be seen not as a rating on a scale but as a measure of certainty that the user actually likes the collection. [5]
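As a rough illustration of the implicit-feedback variant, the sketch below uses NumPy and the widely used confidence-weighted formulation, in which an interaction count r becomes a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + alpha * r. Whether this matches the team's implementation is an assumption, as are the function name, the toy data, and the hyperparameter values.

```python
import numpy as np


def implicit_als(R, factors=2, alpha=40.0, reg=0.1, iterations=15, seed=0):
    """Alternating least squares for implicit feedback (illustrative sketch).

    R is a (users x collections) matrix of interaction counts,
    e.g. how many posts of a collection a user has favorited.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))  # user descriptions
    Y = rng.normal(scale=0.1, size=(n_items, factors))  # collection factors
    P = (R > 0).astype(float)   # preference: did the user interact at all?
    C = 1.0 + alpha * R         # confidence: grows with the number of favorites
    ridge = reg * np.eye(factors)

    for _ in range(iterations):
        # Fix Y, solve a small weighted ridge regression for each user.
        for u in range(n_users):
            Yc = Y.T * C[u]     # factors x items, columns scaled by confidence
            X[u] = np.linalg.solve(Yc @ Y + ridge, Yc @ P[u])
        # Fix X, solve the same kind of problem for each collection.
        for i in range(n_items):
            Xc = X.T * C[:, i]
            Y[i] = np.linalg.solve(Xc @ X + ridge, Xc @ P[:, i])
    return X, Y


# Toy data: users 0-1 favorite collections 0-1, users 2-3 favorite collections 2-3.
R = np.array(
    [
        [3, 1, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 2, 1],
        [0, 0, 1, 4],
    ],
    dtype=float,
)
X, Y = implicit_als(R)
scores = X @ Y.T  # predicted preference of each user for each collection
```

Each row of `scores` can be sorted to rank collections for that user, and the rows of `Y` are the learned latent dimensions that can be inspected directly.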
Advantages of ALS:
Does exactly what we want and doesn't fail in unexpected ways
Creates relationships between collections in the most efficient way
If we had written a neural network, it would do the same, just with more unknowns
Many additions for things we might want to consider [2]
Is Whitebox: We can easily analyze the learned latent dimension
Use it as a sanity check (e.g., should dogs and architecture be reduced to the same dimension)
Use it for a tagging system etc.
It can be run in parallel and scaled up easily.
It has been used successfully by many people, with examples spanning 10000 entities [3][4]
Hard to find such statements for other methods
Disadvantages of ALS:
We can only use this system for the N collections we have
In the long run, tracking views of content could increase the collections we have information on
Still, I probably don't want to scale it much higher than 10000 collections (we will see)
We need a similarity-based system for the other collections
NLP-based, image-identity-based, or Reddit-based
We still need to display random content to keep gaining information
Roadmap for Growth for User Accounts
Pro-Metric Advertising (In Progress)
Content Recommendation (In Progress)
Categories Feature Update + Unlock (Blocked)
Scrolller APP (Back Log)
Revenue
Improved Premium Feature Set
Premium is the largest and most stable source of revenue that we have at Scrolller. It, along with advertising, allows us to continue developing and hosting Scrolller. The four selling points of Premium are an advertisement-free experience, faster hosting, beta access to new features, and community access. When we first created this list, we treated it as a goalpost for our development of Scrolller. A few months on, we have not yet delivered all of the Premium features; faster hosting is the last feature before we can say we have reached our goals completely. Sadly, our team has lacked the skills to execute on this, but this will change as we hire someone to fill our open DevOps position in the next two weeks. In preparation for this new hire, we are starting to plan the feature set now.
The new project we have started is to build that remaining feature and improve the features that already exist for our Premium subscribers. Success means that Premium will feel like an amazing deal. How we get to that vision of success is yet to be decided, but we are excited to start!
Roadmap for Growth in Revenue
Ad Network Mediation System (Blocked)
Improved Premium Feature Set (Design Stage)
Social Traffic
Social Media Update
In our last social platforms update, we mentioned that our Instagram automation failed and got us temporarily banned several times. To avoid being banned permanently, we decided to stop the automation and manage social media manually.
We have since started doing that, but growth is still slow, especially for Facebook and Twitter. Instagram is growing steadily; for example, this week we got 13 new followers. Twitter is up to 75 followers from 68 (7 new followers), and Facebook got 5 follows and 1 like. We also noted that videos are getting more engagement than photos, which tells us we could leverage videos more.
So, compared to Instagram, Twitter and Facebook have much slower growth. We will keep working with different social media content and strategies to leverage social platforms to grow our website traffic.
If you have any social media growth strategies that have worked for you, we'd love to hear from you. Let us know in the comments section.
User Generated Collections
During our weekly meeting, we further specified what our MVP for user-generated collections would look like. In our first iteration, we are going to launch sharable favorites. An advantage of launching sharable favorites is that it will make it easy for many of our users to quickly launch a new collection populated by what they have already identified as great content. A disadvantage is that this might, in the short term, reduce the effectiveness of favorites as a signal in our new recommendation system.
Once the feature is launched, a button will appear on your favorites page that will allow you to create your first collection. This collection will be populated by content in your favorites based on which mode you are in (SFW or NSFW), allowing you to create up to two different collections in your account.
We will be actively tracking the follows and favorites for your user-generated collections to generate a score. The top 100 collections will earn their owners free Premium for as long as they remain on the leaderboard. We hope that this will incentivize people to create and share collections.
Our MVP will be pretty barebones at the start, but we have big plans to make user-generated collections an important part of the Scrolller experience: giving our users a way to organize the content on our platform, share that collection with their friends, tag their favorite content, and even upload their own content to our platform. We are excited to be launching this feature and look forward to feedback from our users.
Roadmap for Growth in Social Traffic
User Generated Collections: Create and share your favorites and follows (In Progress)
Facebook Automation (Back Log)
Twitter Automation (Back Log)
Team Highlights
Under Team Highlights this week, we asked a few members of the team for their thoughts on and experiences with Scrolller. Here's what they had to say:
What I love about Scrolller is how easy it is to find interesting, funny, motivational, or whatever category videos/photos, and follow/favorite them to be able to come back later and enjoy - Karol Lysik
Being a part of the Scrolller team is like having a second family that cares about you and helps you grow. Ever since joining Scrolller, I have explored more into my own creativity, and the Scrolller team motivates me to deliver the best results because that's what the rest also do. And the best part is when and if boredom hits me while working, all I have to do is open Scrolller, and it all goes away - Lavanya Sahu
Scrolller is a site I really enjoy! I recently finished my Computer Science master's degree in Germany where I was able to build a good foundation in the field of Artificial Intelligence. However, I had little opportunity to apply the knowledge in real-world projects. When the Scrolller team reached out to the users a few months ago, I took the opportunity and offered to apply my skills in the context of content recommendation. With my thesis over, I have joined Scrolller in the form of a very flexible internship. Working at Scrolller gives me the opportunity to work with realistic data in a company setting. Also, the Scrolller team supports me at every step which makes it very worthwhile. I'm looking forward to seeing Scrolller take shape. I can still imagine the many amazing features I would love for the future Scrolller - Tim
Why do you like Scrolller? What do you enjoy most on Scrolller? Would you mind letting us know in the comments section? We love hearing from our users!
Road Map
You can share feedback with us about any of the new features below using this Airtable form. We look forward to your comments.
This week
Improve quality and user experience of pro-metric advertisements (In Progress)
User Generated Collections: Create and share (In Progress)
Content Recommendation (In Progress)
Categories page for SFW collections + Unlock (Blocked)
New native advertisements (Back Log)
Monetization Update and Optimization (Back Log)
Enhancements to Auto Scroll Feature (Back Log)
P.S. As a reminder, subscribe to our newsletter and join us on our challenging journey.
Also: Sign up to help us translate Scrolller into Russian → https://airtable.com/shrrRkFqMSZBqF24Q
To help us with Scrolller feedback, go here.
Until next time, happy Scrollling!