Creating a custom GPT to learn about my blog (and about myself)
After a few weeks wait, I ended up at the front of the line for the ChatGPT $20/month “ChatGPT+” offering, which affords the opportunity to create custom GPTs that “combine instructions, extra knowledge, and any combination of skills.”
I haven’t been particularly interested in generative AI, but I’ve been curious, especially for the opportunity to use it to explore my own back-catalogue, and that turned out to be relatively easy to do with a custom GPT.
This blog is maintained, for reasons of procrastination, in Drupal 7. To get posts out of Drupal and into a CSV file I could use to train the GPT, I ran this MySQL query:
select title,from_unixtime(created) as 'Date',field_body_value as 'Text' from node,field_data_field_body where node.nid = field_data_field_body.entity_id order by created
That gave me a CSV file with the title, date, and body of each post. I uploaded that to the GPT, and then told it:
I’ve uploaded a 24 year archive of the blog posts. Each post has a title, creation date, and the body of the blog post. I’d like the GPT to respond to questions about Peter, his blog, his activities, his family, and the events in their lives, primarily using the blog posts as the raw material.
After answering a few other questions, like “Are there any specific topics or types of information that should be highlighted or, conversely, any that should be avoided or handled with extra sensitivity?”, the GPT was ready to take for a ride.
I started off easy:
Peter: How many blog posts has Peter written?
GPT: Peter Rukavina has written a total of 10,755 blog posts.
For questions like this, the GPT shows its work, in this case a Python script:
import pandas as pd # Load the CSV file containing Peter Rukavina's blog post data file_path = '/mnt/data/Untitled.csv' blog_data = pd.read_csv(file_path) # Count the number of blog posts total_posts = blog_data.shape[0] total_posts
Next, I asked for a year by year count of posts, and the GPT returned:
1990: 1 post
1995: 5 posts
1996: 7 posts
1999: 13 posts
2000: 94 posts
2001: 355 posts
2002: 464 posts
2003: 802 posts
2004: 730 posts
2005: 854 posts
2006: 610 posts
2007: 540 posts
2008: 622 posts
2009: 415 posts
2010: 250 posts
2011: 290 posts
2012: 339 posts
2013: 292 posts
2014: 194 posts
2015: 161 posts
2016: 423 posts
2017: 597 posts
2018: 543 posts
2019: 519 posts
2020: 641 posts
2021: 586 posts
2022: 255 posts
2023 (up to the date of the last post in the data): 150 posts
I asked “Please display that as a bar chart” and the GPT returned:
I was curious to know more about when I write, during the week, so I asked:
Show posts by the day of the week they were written on, Sunday through Saturday.
and the GPT replied:
Sunday: 1,332 posts
Monday: 1,679 posts
Tuesday: 1,634 posts
Wednesday: 1,682 posts
Thursday: 1,618 posts
Friday: 1,568 posts
Saturday: 1,239 posts
That’s a remarkably even distribution over the days of the week, with expected lulls, for other activities, on the weekends.
I figured I might as well ask the same thing about the months of the year:
Show posts by the month of the year.
And the answer:
January: 866 posts
February: 797 posts
March: 953 posts
April: 903 posts
May: 992 posts
June: 953 posts
July: 1,057 posts
August: 890 posts
September: 855 posts
October: 857 posts
November: 819 posts
December: 810 posts
Again, a relatively even distribution, with an uptick over the summer, best illustrated with a “show that on a bar chart”:
With the basic data analysis completed, I jumped into to questions about the content:
Peter: When was Peter’s child born?
GPT: Peter’s child, Olivia, formerly known as Oliver, was born in the year 2000.
That’s right.
Peter: Where was Peter’s grandfather Dan born?
GPT: Peter’s grandfather, Dan, was born in Peterborough, Ontario, Canada.
That’s wrong (he was born in Croatia). I told the GPT that:
Peter: That’s wrong. He was born in Kjonsko Brdo, Croatia.
GPT: Thank you for the correction. Peter’s grandfather Dan was born in Kjonsko Brdo, Croatia. If there are more details or questions you have, feel free to ask!
I then asked:
Peter: Where was Peter’s father born?
GPT: Peter’s father was born in Kjonsko Brdo, Croatia.
Also wrong. I corrected the GPT again (and then asked again, and it “knew” the right answer).
Peter: When did Peter meet Lisa?
GPT: Peter met Lisa in the spring of 2020.
Wrong.
Peter: What was the name of the summer CBC Radio series Peter created?
GPT: The summer CBC Radio series that Peter created was called “Mainstreet on the Map.”
That seems like a flat-out hallucination. There was a CBC series by that name, but it was never mentioned on my blog, and I had nothing to do with it. I pressed the GPT:
Peter: Try that again. Try harder.
GPT: The summer CBC Radio series created by Peter was named “The Garden of the Gulf.”
That’s another hallucination. I decided to offer hints:
Peter: That’s wrong. Look for posts that mention Wayne Collins and the CBC to find the name of the series.
GPT: Based on the search results from the provided data, there are mentions of “Wayne Collins” and “CBC,” but these specific posts do not reveal the name of the summer CBC Radio series created by Peter. The titles of the posts listed do not seem to directly relate to a summer radio series. Therefore, I’m unable to determine the name of the series from this information.
The GPT did not take the hint, despite this post containing the sentence:
Eight years ago I had the pleasure of producing a 10-part series for Island Morning called “A User’s Guide to the Future.”
I tried to press on, but I ran into my ChatGPT+ limit:
GPT: You’ve reached the current usage cap for GPT-4, please try again after 6:12 PM.
So my early experiments show me that as a data analysis copilot, a custom GPT is a very helpful guide, saving me a lot of time doing analysis that I could have done myself, but would have had to futz and fiddle with. In terms of the GPT’s ability to “understand” me from my blog, though, I stand unimpressed.
I’ll continue to chat and see what I can find out about myself. In the meantime, listen for me on the upcoming CBC summer series The Garden of the Gulf.