I Built a Robin Sloan Machine

Peter Rukavina

The Venn of “links in Robin Sloan’s email newsletters” and “things that interest me” is incredibly high. I noticed this especially with the April Edition: I read the newsletter in Readwise Reader, and I was adding almost every link to the queue for reading later in Reader.

So I decided that I needed a Robin Sloan Machine: a bit of code that would take all the links in the public URL for the newsletter and add them to Reader for later reading.

My ask to ChatGPT:

I have a URL that I want to extract all the links from, and then feed each link to the Readwise Reader API, adding each URL, with title, and tags of my choosing, as Readwise items. Please provide me with a PHP script that will do that. Make the original URL an argument for the script, and make the tag(s) an argument to the script, so that I can pass these as parameters and feed other URLs in a similar way.

The first pass at this was fully functional. I tinkered a little in subsequent prompts.

Anticipating that I might want to call this as an API endpoint on my own server at some point:

Please modify the PHP script so that it will accept EITHER parameters passed on the command link, OR the URL query parameters “url” and “tags.”

Tweaking the Reader API to clean up the HTML, and realizing that my script doesn’t need to figure out the title, as Reader will do that itself:

Modify the script so that it doesn’t need to pass the “Title” parameter to the Readwise API — the API will calculate this on its own when it reads the URL. Also, add the API parameter “should_clean_html” when calling the Readwise Reader API, and set it to “true”.

And, finally:

Modify the script so that multiple links found that are different only because they have a hash tag at the end are not sent — just send ONE to Readwise. And, finally, add code to pay attention to the note in the API about rate limiting.

Here’s the final script:

<?php
$isCli = php_sapi_name() === 'cli';
if ($isCli) {
   if ($argc < 3) {
       echo "Usage: php index.php <source_url> <tag1,tag2,...>\n";
       exit(1);
   }
   $sourceUrl = $argv[1];
   $tagsInput = $argv[2];
} else {
   if (!isset($_GET['url']) || !isset($_GET['tags'])) {
       header("HTTP/1.1 400 Bad Request");
       echo "Missing 'url' or 'tags' query parameter.";
       exit;
   }
   $sourceUrl = $_GET['url'];
   $tagsInput = $_GET['tags'];
}
$tags = array_map('trim', explode(',', $tagsInput));
$readwiseToken = 'YOUR_READWISE_API_TOKEN'; // Replace with your actual token
function fetchHtml($url) {
   $context = stream_context_create([
       'http' => ['header' => "User-Agent: PHP script\r\n"]
   ]);
   return @file_get_contents($url, false, $context);
}
function normalizeUrl($url) {
   $parsed = parse_url($url);
   if (!$parsed) return $url;
   // Remove fragment
   unset($parsed['fragment']);
   // Rebuild URL without fragment
   $normalized = '';
   if (isset($parsed['scheme'])) $normalized .= $parsed['scheme'] . '://';
   if (isset($parsed['user'])) {
       $normalized .= $parsed['user'];
       if (isset($parsed['pass'])) $normalized .= ':' . $parsed['pass'];
       $normalized .= '@';
   }
   if (isset($parsed['host'])) $normalized .= $parsed['host'];
   if (isset($parsed['port'])) $normalized .= ':' . $parsed['port'];
   if (isset($parsed['path'])) $normalized .= $parsed['path'];
   if (isset($parsed['query'])) $normalized .= '?' . $parsed['query'];
   return $normalized;
}
function extractLinks($html, $baseUrl) {
   libxml_use_internal_errors(true);
   $dom = new DOMDocument;
   $dom->loadHTML($html);
   $xpath = new DOMXPath($dom);
   $links = [];
   foreach ($xpath->query('//a[@href]') as $node) {
       $href = trim($node->getAttribute('href'));
       if (!$href || strpos($href, 'javascript:') === 0) continue;
       // Convert to absolute URL if needed
       if (parse_url($href, PHP_URL_SCHEME) === null) {
           $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
       }
       $normalized = normalizeUrl($href);
       $links[$normalized] = true; // use keys for uniqueness
   }
   return array_keys($links);
}
function sendToReadwise($url, $tags, $token) {
   $payload = [
       'url' => $url,
       'tags' => $tags,
       'should_clean_html' => true
   ];
   $ch = curl_init('https://readwise.io/api/v3/save/');
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_HEADER, true);
   curl_setopt($ch, CURLOPT_HTTPHEADER, [
       'Authorization: Token ' . $token,
       'Content-Type: application/json'
   ]);
   curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
   $response = curl_exec($ch);
   $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
   $header = substr($response, 0, $headerSize);
   $body = substr($response, $headerSize);
   $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
   if (curl_errno($ch)) {
       echo "cURL error for $url: " . curl_error($ch) . "\n";
   } elseif ($httpCode == 429) {
       echo "Rate limit hit for $url. ";
       preg_match('/Retry-After: (\d+)/i', $header, $matches);
       $retryAfter = isset($matches[1]) ? (int)$matches[1] : 60;
       echo "Waiting for $retryAfter seconds...\n";
       sleep($retryAfter);
       sendToReadwise($url, $tags, $token); // Retry
   } elseif ($httpCode >= 200 && $httpCode < 300) {
       echo "Sent: $url\n";
   } else {
       echo "Failed to send $url. HTTP $httpCode\n";
       echo $body . "\n";
   }
   curl_close($ch);
}
// MAIN
$html = fetchHtml($sourceUrl);
if (!$html) {
   echo "Failed to fetch source URL: $sourceUrl\n";
   exit(1);
}
$links = extractLinks($html, $sourceUrl);
echo "Found " . count($links) . " unique normalized links on $sourceUrl\n";
foreach ($links as $link) {
   sendToReadwise($link, $tags, $readwiseToken);
   sleep(1); // basic pacing
}

I ran this on my Mac with:

php index.php https://www.robinsloan.com/newsletters/golden-sardine/ "robinsloan,newsletter,2025-04"

I set it to run, and ended up with 85 new links in my Reader for browsing. I created a View in Reader to segregate them into their own handy area (tag:robinsloan):

A screen shot of the Readwise Reader app on my computer screen, showing a new Robin Sloan view, with a list of articles.

From start to finish this took about 35 minutes to create and refine. It would have been possible without ChatGPT sitting on my shoulder whispering code in my ear, but it would have taken a lot longer.

Add new comment

Plain text

  • Allowed HTML tags: <b> <i> <em> <strong> <blockquote> <code> <ul> <ol> <li>
  • Lines and paragraphs break automatically.

About This Blog

Photo of Peter RukavinaI am . I am a writer, letterpress printer, and a curious person.

To learn more about me, read my /nowlook at my bio, listen to audio I’ve posted, read presentations and speeches I’ve written, or get in touch (peter@rukavina.net is the quickest way). 

I have been writing here since May 1999: you can explore the 25+ years of blog posts in the archive.

You can subscribe to an RSS feed of posts, an RSS feed of comments, or a podcast RSS feed that just contains audio posts. You can also receive a daily digests of posts by email.

Search