
ChatGPT, Datasette-Extract, and the US Ham Radio General Exam Question Pool

I started a project, ahem, yesterday to 'quickly' see if ChatGPT could read the entire United States General class amateur radio exam question pool into a Datasette instance using the datasette-extract plugin. As of this morning, I haven't been able to coax ChatGPT, using the gpt-4-turbo model, into doing it. My rather raw notes are below. The short version is that I was never able to get the AI to capture more than 19 questions at a time. I'm hopeful that the pool could be moved into a database table using iterative processes, but for now, I've run out of time for this quick project :)

Occasionally ChatGPT seemed to hallucinate part of its own process into the table.


Notes Follow

I'm going to track how easy it is to get the general exam question pool into a database using the Datasette Plugin. I started this endeavor at 20:37 UTC.


Get my already existent OpenAI API key ready to go

20:43: Done. As usual with OpenAI, the hardest part was finding login screens and then the API. Finally did a Google search to find the API.


Install the datasette-extract plugin

I've run into an issue here. I think my installed version of Datasette is too old, and Windows can't figure out how to uninstall it.

Using cached datasette_extract-0.1a6-py3-none-any.whl (815 kB)

Using cached datasette-1.0a13-py3-none-any.whl (302 kB)

Using cached datasette_secrets-0.1a4-py3-none-any.whl (12 kB)

Installing collected packages: datasette, datasette-secrets, datasette-extract

  Attempting uninstall: datasette

    Found existing installation: datasette 1.0a3

    Uninstalling datasette-1.0a3:

ERROR: Could not install packages due to an OSError: [WinError 32] The process cannot access the file because it is being used by another process: 'c:\\users\\m3n7es\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\scripts\\datasette.exe'

Check the permissions.

I'll clone a dev environment for the plugin and then run in venv. Time now 21:00.

Still Installing

21:05 OK! pytest passes!

Adding Table Column Names

This is easy since I've already got a table for the general exam pool. The headings are:

id question class subelement group_index group_number answer answer_a answer_b answer_c answer_d 

21:21 The column names have been defined with hints. 

id primary key

question follows a line starting with G ends with '?'

class Defaults to G for every question

subelement A number following G before a second letter

group_index The letter following subelement's number (G)(\d)(A-Z)(\d\d) Use \$3

group_number two digit number following group_index (G)(\d)(A-Z)(\d\d) use \$4

answer A single letter between parentheses that indicates the correct answer choice

answer_a next line starting with 'A.'

answer_b next line starting with 'B.'

answer_c next line starting with 'C.'

answer_d next line starting with 'D.'

I added the additional instructions

The questions and answers are in line sorted by headings that contain class (always G), then subelement (a single digit following G), then group_index (a single letter following the subelement), then group_number (a question number within the group_index), then the single letter correct answer enclosed in parentheses. The next line contains the entire question text for the question field. The next four lines in each question contain the four possible answers. The end of each question is denoted by '~~'.
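For comparison, the same layout can be parsed without a model at all. The following is a minimal sketch, not what the plugin does: it assumes each header looks like "G1A01 (C)" (the exact header text is an assumption based on the instructions above), and parse_block and HEADER are hypothetical names. It stores the subelement as the bare digit, per the column hint, even though the model returned "G1".

```python
import re

# Assumed header shape, e.g. "G1A01 (C)". match() only anchors at the start,
# so any extra text after the answer letter on the header line is ignored.
HEADER = re.compile(r"^(G)(\d)([A-Z])(\d\d)\s*\(([A-D])\)")


def parse_block(block, row_id):
    """Turn one question block (text between '~~' markers) into a row dict."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError(f"unrecognized header: {lines[0]!r}")
    cls, subelement, group_index, group_number, answer = match.groups()
    row = {
        "id": row_id,
        "question": lines[1],
        "class": cls,
        "subelement": subelement,
        "group_index": group_index,
        "group_number": group_number,
        "answer": answer,
    }
    # The four lines after the question are "A. ...", "B. ...", and so on.
    for line in lines[2:6]:
        letter, text = line.split(".", 1)
        row["answer_" + letter.lower()] = text.strip()
    return row


sample = """G1A01 (C)
On which HF and/or MF amateur bands are there portions where General class licensees cannot transmit?
A. 60 meters, 30 meters, 17 meters, and 12 meters
B. 160 meters, 60 meters, 15 meters, and 12 meters
C. 80 meters, 40 meters, 20 meters, and 15 meters
D. 80 meters, 20 meters, 15 meters, and 10 meters
~~"""

row = parse_block(sample.replace("~~", ""), 1)
print(row["group_index"], row["group_number"], row["answer"])
```

A parser like this is brittle in exactly the places the model is supposed to be flexible (extra header text, wrapped lines), so it is more of a cross-check on the extracted rows than a replacement.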

I've copied the entire question pool, from its first line to its last (the two screenshots marking the start and end are not preserved here), into the tool. Now, I'll press 'Extract'



Time is 21:26 UTC

Extracting to Table

Got back this error message:

Error: Error code: 404 - {'error': {'message': 'The model `gpt-4-turbo` does not exist or you do not have access to it.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}


OK. Looking at My OpenAI account I see:


No gpt-4-turbo. So, that's a bit of a challenge.

OK! The API is like using a Clipper Card on BART. You have to pay up front


I put some money in the account. 

I'll try to extract again. It's 21:43.

It's Working!!!

[
  {
    "id": 1,
    "question": "On which HF and/or MF amateur bands are there portions where General class licensees cannot transmit?",
    "class": "G",
    "subelement": "G1",
    "group_index": "A",
    "group_number": "01",
    "answer": "C",
    "answer_a": "60 meters, 30 meters, 17 meters, and 12 meters",
    "answer_b": "160 meters, 60 meters, 15 meters, and 12 meters",
    "answer_c": "80 meters, 40 meters, 20 meters, and 15 meters",
    "answer_d": "80 meters, 20 meters, 15 meters, and 10 meters"
  },
  {
    "id": 2,
    "question": "On which of the following bands is phone operation prohibited?",

The engine is still cranking along at 21:47.

And Then </exceeds>

  {
    "id": 19,
    "question": "When is it permissible to communicate with amateur stations in countries outside the areas administered by the Federal Communications Commission?",
    "class": "G",
    "subelement": "G1",
    "group_index": "B",
    "group_number": "08",
    "answer": "B",
    "answer_a": "Only when the foreign country has a formal third-party agreement filed with the FCC",
    "answer_b": "When the contact is with amateurs in any country except those whose administrations have notified the ITU that they object to such communications",
    "answer_c": "Only when the contact is with amateurs licensed by a country whic...  Click to expand ... <exceeds maximum number of characters> ,,groupId,,quizzes,,element,,data,,result,,direct,,[]}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]},"
  }
]

Did I hit the end of my billing envelope?

21:51 No, billing seems fine. I wonder if I need to add the file in as a pdf because of this message:

exceeds maximum number of characters

Trying again with a pdf file

21:59 Dropping in a pdf file resulted in a 'Processing...' message for the last 8 minutes. Trying this one subelement (subelement_group? since it didn't complete a subelement) at a time.

Full Subelement at a time

Back up and running at 22:01.
 
Well, shucks, that time it only pulled out two questions. Also, it didn't create the table even though it said it did:








I'll try a db that doesn't revolve around a memory table next.

No Memory Table DBs

What could have been really bothersome was a breeze. The table columns auto-populated for me!

'Additional instructions' was not auto-populated, so WooooHooooo!!! blogging. Meaning, I'm really happy I documented my instructions a few paragraphs back.

22:11 Pushed the 'Extract' button. Results started coming in a few seconds later.

Nuts! It got three questions out this time, but that's it! What's the difference in setups???

Adding Remaining SubElement Group by Hand 


Starting at 22:22

22:24 That worked. The entire G1A subelement group is in the table.

Can it do two subelement groups?

22:26 Input subelement groups B and C

22:27 Both subelement groups have been successfully added.
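Since one or two subelement groups per paste is what worked here, the pasting could in principle be prepared ahead of time. This is a rough sketch under two assumptions: every question block ends with '~~', and every block starts with its question ID (so the first three characters, such as "G1B", name the group). The function names are hypothetical and the question text in the demo is placeholder text, not real pool content.

```python
def split_questions(pool_text, terminator="~~"):
    """Split the pool into question blocks, dropping empty leftovers."""
    blocks = [b.strip() for b in pool_text.split(terminator)]
    return [b for b in blocks if b]


def chunk_by_group(pool_text, groups_per_chunk=2):
    """Bundle whole subelement groups into chunks small enough to paste."""
    chunks, current, seen = [], [], []
    for block in split_questions(pool_text):
        key = block[:3]  # e.g. "G1A" from a header that starts "G1A01"
        if key not in seen:
            if len(seen) == groups_per_chunk:
                # Chunk is full: emit it and start a new one.
                chunks.append("\n~~\n".join(current) + "\n~~")
                current, seen = [], []
            seen.append(key)
        current.append(block)
    if current:
        chunks.append("\n~~\n".join(current) + "\n~~")
    return chunks


# Placeholder pool: only the headers follow the assumed format.
demo_pool = "\n".join(
    f"{qid} (A)\nplaceholder question?\nA. a\nB. b\nC. c\nD. d\n~~"
    for qid in ["G1A01", "G1A02", "G1B01", "G1C01", "G1D01"]
)
chunks = chunk_by_group(demo_pool, groups_per_chunk=2)
print(len(chunks))
```

Each chunk keeps its '~~' terminators, so the same 'Additional instructions' still apply to every paste.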

The rest of the groups in the subelement?

Again that's two subelement groups, D and E, but it only pulled out one question: the last one in the C group that I accidentally copied back in. Nuts! 

Removed the row, removed the input, trying again at 22:32

Made it through the D subelement group and then stopped on 

"G1E – Control categories; repeater regulations; third-party rules; ITU regions; automatically controlled digital station"

I think I see the game. I'll take out the group descriptions and add all the text in to see if I can be done with this. 23:34

Descriptions Removed

23:42 back up and running with all the descriptions removed. We'll see how this goes.

It's taking about four seconds per exam question to figure out the correct extraction.

After "id": "G1E12", it decided it was done.

Remember how the ids started out as numbers? Weird.
Note: Updating the following morning. Not weird. I forgot to set the field type to integer.

More Instructions

22:49
Added these additional instructions:

"When the subelement changes, or the subelement group changeds, keep going please. The end of the question pool is deonted by '~~~end of question pool text~~~' You're doing a great job, but please get every additional question this time."

and trying again.

22:49 Three questions have come back. It seems to be thinking now?

22:50 (Yes, I know it's not actually thinking.)

22:51 Calling this. Still at three additional questions.

Don't give away the ending

I took away the instruction about how to find the end of the pool, as well as the line about 'every additional question'.

22:54 Successfully crossed from G2A12 to G2B01

22:54 And now from G2B11 to G2C01

22:55 Stopped at G2C08. Why???

Did ChatGPT read the question? 'What prosign is sent to indicate the end of a formal message when using CW?'

22:59 Made the hop to G3A01 and then promptly decided it was done again.

There were two blank lines above that question rather than one. Is that why?

23:02 started it back up.

23:02 Stopped again at G3A14.

Again, there are three blank lines after this question rather than one.

23:05 Added 'The number of blank lines between questions is NOT significant.' to the Additional instructions.
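If uneven blank lines really are what trips the extraction (that is only a suspicion at this point), an alternative to instructing the model is to strip them before pasting. A minimal sketch, assuming the '~~' markers are enough to separate questions once the blank lines are gone; collapse_blank_lines is a hypothetical helper:

```python
def collapse_blank_lines(text):
    """Drop every blank or whitespace-only line, and trailing spaces."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    return "\n".join(ln for ln in lines if ln)


# Placeholder text with one, two, and whitespace-only blank lines.
messy = "first block\n~~\n\n\nsecond block\n~~\n \nthird block\n~~\n"
cleaned = collapse_blank_lines(messy)
print(cleaned)
```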

Stopped two questions later at G3B02.

23:06 Starting again.

Two questions again. Taking away the last instruction.


23:38 So Tired
Got this error a few rows in 


After changing 'Additional instructions' to 

"IGNORE ALL BLANK LINES in content. Extract all data from content according to the following instructions. Rows will always begin with the pattern (G)(\d)([A-J])(\d\d)(\s*)([A-D]) and end with a line containing '~~' The questions and answers are in line sorted by headings that contain class (always G), then subelement (a single digit following G), then group_index (a single letter following the subelement), then group_number (a question number within the group_index), then the single letter correct answer enclosed in parentheses. The next line contains the entire question text for the question field. The next four lines in each question contain the four possible answers. The end of each question is denoted by '~~'"

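One thing worth checking in that pattern: the instructions say the answer letter is enclosed in parentheses, but the pattern has no literal parentheses around [A-D]. The sketch below assumes a header that looks like "G1A01 (C)" (an assumption based on the instructions, not a verified line from the pool) and compares the pattern as written with a variant that escapes the parentheses. Whether this matters to the model is unknown, since the pattern is only a hint inside a prompt and is never executed.

```python
import re

# The pattern exactly as written in the instructions.
as_written = re.compile(r"(G)(\d)([A-J])(\d\d)(\s*)([A-D])")
# A variant with literal parentheses around the answer letter.
with_parens = re.compile(r"(G)(\d)([A-J])(\d\d)(\s*)\(([A-D])\)")

header = "G1A01 (C)"  # assumed header shape

plain_result = as_written.match(header)
paren_result = with_parens.match(header)

print(plain_result)           # no match: "(" is not in [A-D]
print(paren_result.groups())  # class, subelement, group, number, gap, answer
```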
Let's flush the state and start over

Looking above, the plug-in did as well as it ever did before I tried all the above experiments. One thing I hadn't realized (although I'd documented it) was that I accidentally changed the key to be text on my second try. I'm moving back to the original material copied in and the original instructions with a numeric key.

First, I tried without a new key and wound up only getting two questions back. Just as bad as ever. 
Changing all the fields with numbers to integer resulted in one question.

I'm going to create a new OpenAI key and start on a clean database.

New database, new key, new table name wound up with 13 questions on the first try. I don't think I'
