
For the Data Geek 1 contest, I used data from the old SDN RSS feeds, which I had collected via an ABAP report (and blogged about here). This time out I thought I would again collect the data via SAP software; my intention was to learn more about a variety of SAP software while entering the Data Geek 2 contest with Lumira. I used SQL Anywhere with the Twitter and bitly APIs to read the @SCNblogs Twitter timeline, and SAPUI5 to format three months' worth of data into a table, picking up some OAuth knowledge along the way. This data was copied into Excel for formatting and then used in Lumira. The @SCNblogs tweets usually contain a bit.ly URL and appear as follows.
I have highlighted the bit.ly link in the above text. An example of the statistics bitly provides can be found here. I combined the two sources using SQL Anywhere and SAPUI5 via the Twitter and bitly APIs. I used the bitly link metrics API (http://dev.bitly.com/link_metrics.html#v3_link_referring_domains) to calculate the total number of clicks and the subset of those clicks where Twitter was the referrer. It looks like this in SAPUI5.
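For anyone who wants the idea without the SQL Anywhere/SAPUI5 setup, here is a minimal Python sketch of those two bitly queries. The v3 endpoints are the ones referenced above; the token, function name, and use of the requests library are my own illustration, not my original code.

```python
import requests

BITLY_TOKEN = "YOUR_GENERIC_ACCESS_TOKEN"  # placeholder, not a real credential
API = "https://api-ssl.bitly.com/v3"

def click_counts(bitly_url):
    """Return (total clicks, Twitter-referred clicks) for one bit.ly link."""
    # Total clicks on the link (v3 link metrics API referenced above)
    clicks = requests.get(f"{API}/link/clicks",
                          params={"access_token": BITLY_TOKEN, "link": bitly_url}).json()
    total = clicks["data"]["link_clicks"]

    # Referring domains, to isolate the clicks that came from Twitter
    domains = requests.get(f"{API}/link/referring_domains",
                           params={"access_token": BITLY_TOKEN, "link": bitly_url}).json()
    twitter = sum(d["clicks"] for d in domains["data"]["referring_domains"]
                  if "twitter" in d["domain"])
    return total, twitter
```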
Those are the basics of the data collection process, and I can share my CSV file if you would like to take a look.
Now on to how I used Lumira to answer my questions: "what is the best day/time to publish a blog on SCN?" and "what time should I tweet about that blog?" Using the bit.ly API, I also put some names to the blogs that @SCNblogs tweets about.
Although, what is a typical day for people on SCN? The site is worldwide and covers many timezones, so I needed a reference point: the Twitter API reports the account's "time_zone": "Berlin", so that is the timezone I used.
Therefore I have my time to tweet: Friday 14:00.
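For reference, a sketch of the day/hour bucketing behind an answer like that, assuming the v1.1 Twitter API's created_at string format (timestamps come back in UTC) and the Berlin timezone from the previous step; the function name and sample tweet are illustrative.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def tweet_bucket(created_at):
    """Map a v1.1 created_at string to a (weekday, hour) bucket in Berlin time."""
    utc = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    local = utc.astimezone(ZoneInfo("Europe/Berlin"))
    return local.strftime("%A"), local.hour

print(tweet_bucket("Fri Sep 13 12:00:00 +0000 2013"))  # ('Friday', 14) under CEST
```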
The view count of the SCN blogs (via @SCNblogs) is used for this next section. My initial collection of the @SCNblogs Twitter timeline left me with the SCN bit.ly URL and the tweet text, which was what I was after, but I thought more detail was required. So I used SQL Anywhere to expand each bitly link back to its original URL with the bitly API, and from there I was able to scrape out the user and view data of the source blogs. The working assumption is that the creation date is the published date, i.e. a draft blog is updated with the published date/time as its creation date. I will find out when my blog hits the SCN blog space.
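The expansion step itself is a single call against the bitly v3 expand endpoint. A hedged sketch (again Python rather than my SQL Anywhere code, with a placeholder token):

```python
import requests

def expand(short_url, token="YOUR_GENERIC_ACCESS_TOKEN"):
    """Resolve a bit.ly short link back to its original long URL (v3 expand)."""
    resp = requests.get("https://api-ssl.bitly.com/v3/expand",
                        params={"access_token": token, "shortUrl": short_url}).json()
    return resp["data"]["expand"][0]["long_url"]  # e.g. http://scn.sap.com/community/...
```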
Wednesday is the day on which created (published) blogs gathered the most views over the last 3 months. So again, the next step is to drill down into Wednesday to find the best time of day.
The time format from the SCN blogs Jive site is AM/PM. So although "9" has the highest overall views, that figure combines AM and PM; once split out, 1 PM is the publication time with the most views for SCN blogs over my dataset.
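That ambiguity is why the 12-hour timestamps need converting to 24-hour values before bucketing. A small standard-library sketch (the timestamp format is an assumption about the Jive output):

```python
from datetime import datetime

def to_24h(jive_time):
    """Convert a 12-hour Jive timestamp such as '1:05 PM' to its 24-hour hour."""
    return datetime.strptime(jive_time, "%I:%M %p").hour

print(to_24h("9:30 AM"), to_24h("9:30 PM"), to_24h("1:05 PM"))  # 9 21 13
```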
So I have my time to publish a blog: Wednesday 13:00
Again the data is for the time the blog was published and not the time it was viewed.
Also note that there is a filter on "unknown" data in the SCN blogs charts, as a lot of content is moved or removed from its original URL after being tweeted. I was a bit concerned that I had some bad connections, but whether I logged in or not, I was presented with this many times.
I have my answers to my original questions so now for some further analysis of the data.
Another aspect of the bitly API is that it allows you to query the location of the user clicking on the link. The chart below shows all the countries where users clicked on @SCNblogs tweet links over the full 3 month date range.
I'm impressed at the number of countries clicking on the @SCNblogs tweet links.
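The per-country data comes from the v3 link/countries endpoint; a sketch of what that query might look like (country codes come back as two-letter ISO codes, token is a placeholder):

```python
import requests

def countries(bitly_url, token="YOUR_GENERIC_ACCESS_TOKEN"):
    """Clicks per country for one bit.ly link (bitly v3 link/countries)."""
    resp = requests.get("https://api-ssl.bitly.com/v3/link/countries",
                        params={"access_token": token, "link": bitly_url}).json()
    return {c["country"]: c["clicks"] for c in resp["data"]["countries"]}
```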
From the process of combining the SCN blog information with the tweets, I could attach an SCN user name to the @SCNblogs tweets. This allows me to do the following.
For the next chart I added a calculated measure of the number of retweets and favourites on the @SCNblogs timeline from my 3 month dataset.
The above screenshot shows the creation of the new measure. After clicking the plus icon, I typed the first few characters of the existing measures and was given a dropdown to select from. Once I added the selection, I had the SCN bloggers behind the @SCNblogs retweets and favourites, with tammy.powlas the top blogger in this category at over 100 retweets or favourites for the @SCNblogs tweets.
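Outside Lumira, that calculated measure is just retweets plus favourites summed per blogger. A pandas sketch under assumed column names (the CSV file name and headers are my illustration, not the actual export):

```python
import pandas as pd

# Hypothetical export of the 3-month dataset; column names are assumptions.
df = pd.read_csv("scnblogs_tweets.csv")
df["rt_plus_fav"] = df["retweet_count"] + df["favorite_count"]
top = df.groupby("scn_blogger")["rt_plus_fav"].sum().sort_values(ascending=False)
print(top.head(10))  # tammy.powlas tops this list in my data
```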
I added a count of clicks to find out how many times these bloggers had blogged.
I added the individual @SCNblogs tweet text and the SCN blogger name to find out which tweet had the most favourites on Twitter.
The blue column shows the favourited tweets and the red column the number of retweets.
Tammy Powlas is again top of the charts, with over 45,000 page views (*counting only those blogs appearing in the @SCNblogs timeline).
Again, this is my sample of @SCNblogs data over the last 3 months (have I stated that before? :smile: )
I thought I would add additional attributes to the chart to see if I could confirm Wednesday as the day of the week to create/publish a blog.
So 6 of the 10 blogs appear on a Wednesday, meaning that for my dataset Wednesday is the day to publish. Proving the theory would most likely require a larger dataset over a longer period covering all published blogs; however, Wednesday 13:00 is the answer to my original question for this dataset.
While collecting the data I used the bitly API to get the full URL of the underlying SCN blog. When I discovered that the Lumira split command allows more than one character, I split http://scn.sap.com out of the URL and was left with the community/space of the original blog.
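The same idea in plain Python, for anyone following along without Lumira (the example URL is illustrative, but the pattern matches the SCN blog URLs in my dataset):

```python
# Strip the http://scn.sap.com prefix, then keep the community/space segment.
url = "http://scn.sap.com/community/business-trends/blog/2013/09/11/example-post"
path = url.split("http://scn.sap.com/", 1)[1]  # "community/business-trends/blog/..."
community = path.split("/")[1]                 # -> "business-trends"
print(community)
```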
Therefore the place for bloggers over my 3 month dataset is the business-trends community.
The business-trends community tops the number of views too.
A couple of mentions of blogs/information from SCN that I used.
* Nested JSON, what fun! Thank you to Dagfinn Parnas for asking this question: http://scn.sap.com/thread/3180215. The question also lets me thank Peter Muessig, who provided the answer. This was the method I used to get the data for the challenge into an SAPUI5 table format.
* Thank you to Eric Farrar for posting the blog on SCN that inspired me to try all of this in the first place. While I had to deviate slightly from the blog due to my lack of knowledge, I remain very impressed with SQL Anywhere. https://scn.sap.com/community/sybase-sql-anywhere/blog/2009/12/10/calculating-hash-based-message-aut...
I have analysed the data I collected with some random samples and double checks, and I remain satisfied with the general quality. The data is driven by the Twitter API on the @SCNblogs timeline and therefore misses some of the blogs on the main SCN site. The @SCNblogs account uses other URL shortening services such as tinyurl and spr.ly, although only in a few tweets. Unicode characters in tweet URLs threw out my data collection process, and 150 backend blogs (mainly blogs not in English) went missing out of a total of 2500. Where the URL could not be located with the bit.ly API, the click count was set to zero and the backend blogger information could not be collected. The data is a snapshot in time and may or may not contain some errors. I'm still waiting for an SCN API (http://scn.sap.com/api) that may help with any future SCN site data queries.
Put my analysis money where my mouth is...
I have taken the SCN blog option to publish at a certain date/time. Below is a screenshot of Wednesday 13:00.
However I am hoping that the timezone is correct and that my blog will be published and not moderated! I will find out soon if my plan comes together.
Oh, and one final thing: I need to fill out the DG2 entry form to get another free t-shirt.
How to Join the Data Geek Challenge
Get Your Data Geek Badges here
http://scn.sap.com/docs/DOC-44751