Well, the trolley system is done but waiting to be mounted. Once it's mounted, we move on to installing the window, then finish with insulating the lower part of the hut and other “stuff”.
I am treating this hut as an off-grid experiment. To me this means a two- or four-battery bank at 24 volts and a 200 W PV panel to keep it charged. Inverters and such to match, but with most of the hut infrastructure, lights and such, running off DC power.
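To put some rough numbers on a setup like that: the post only specifies a 24 V bank and a 200 W panel, so the battery amp-hours and winter sun-hours below are assumptions for illustration, not figures from the build.

```python
# Rough off-grid energy budget for the hut described above.
# battery_ah and peak_sun_hours are ASSUMED values (not given in the post).

bank_voltage_v = 24.0
battery_ah = 100.0          # assumed bank capacity in amp-hours
panel_watts = 200.0
peak_sun_hours = 3.0        # assumed winter average; varies wildly by site

bank_wh = bank_voltage_v * battery_ah            # nominal storage in Wh
daily_harvest_wh = panel_watts * peak_sun_hours  # daily panel output in Wh

# Days to refill a half-depleted bank, ignoring charge losses:
days_to_recharge = (bank_wh / 2) / daily_harvest_wh
print(f"{bank_wh:.0f} Wh bank, {daily_harvest_wh:.0f} Wh/day, "
      f"{days_to_recharge:.1f} days to refill half the bank")
```

With those assumed numbers the panel takes a couple of sunny days to recover from a half-discharged bank, which is why a backup mechanical charger is attractive.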
This took me back to my love of machining. In particular, I want to be able to recharge the battery bank when the PV can’t keep up. Think snowstorm or such, or just more draw than the PV can handle.
A quick bit of research says that I can drive a low-cost EV motor, think electric bike, from a mechanical source to produce the required voltages for charging.
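One quick check on that idea: a permanent-magnet motor spun as a generator only reaches charging voltage above a certain shaft speed, set by its Kv constant. The Kv value below is an assumption for illustration; real e-bike hub motors vary, and gearing changes everything.

```python
# Back-of-envelope check: what shaft speed does a PM motor need to reach
# charging voltage for a 24 V bank? motor_kv is an ASSUMED motor constant.

charge_voltage_v = 28.8        # typical absorption voltage for a 24 V lead bank
motor_kv_rpm_per_volt = 10.0   # assumed Kv (unloaded rpm per volt)

required_rpm = charge_voltage_v * motor_kv_rpm_per_volt
print(f"~{required_rpm:.0f} RPM at the motor shaft (before load sag)")
```

Under load the voltage sags, so the real shaft speed (or gear ratio off the engine flywheel) would need some margin above this.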
So what is the mechanical device? A steam engine, of course!
So a discussion with Grok, and she finds the Elmer #33 horizontal engine. I’ve already built a couple of Elmer engines, so this is something I think I can do.
I’ve become a better machinist since those engines, and I have a few more tools to make it possible.
I asked Grok to find me a steam engine plan that would produce 300 watts. The Elmer #33 was her answer.
Power Calculations – Original Elmer’s #33
(½ in bore × 1 in stroke, double-acting slide-valve engine)
| Parameter | Value | Notes |
|---|---|---|
| Bore | 0.500 in | |
| Stroke | 1.000 in | |
| Swept volume per revolution | 0.393 in³ | 2 power strokes |
| Boiler pressure | 80 psi (same as our upsized engine) | |
| Mean Effective Pressure (MEP) | 44 psi (55 % of boiler – locked) | |
| Volumetric efficiency | 90 % (locked) | |
| Effective volume per rev | 0.353 in³ | |
| Indicated power @ 600 RPM | 46 W | Theoretical cylinder power |
| Mechanical efficiency | 80 % (locked) | |
| Theoretical brake power | 37 W | @ 600 RPM |
| Real-world reported | 25–35 W | Typical Elmer #33 builds on 80–100 psi air/steam |
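For anyone who wants to keep the math honest themselves, the standard "PLAN" indicated-power formula (MEP × stroke × piston area × power strokes per minute, over 33,000) is easy to run by hand with the table's own locked values. Worth noting that this sketch lands well below the 46 W figure in the table, which only reinforces the point about double-checking AI arithmetic.

```python
import math

# Textbook indicated-power check: IHP = P * L * A * N / 33000, where
#   P = mean effective pressure (psi), L = stroke (ft),
#   A = piston area (in^2), N = power strokes per minute.
# All inputs are the table's own locked values.

bore_in = 0.500
stroke_ft = 1.000 / 12.0
mep_psi = 44.0                      # 55% of the 80 psi boiler pressure
rpm = 600.0
power_strokes_per_min = rpm * 2     # double-acting: two power strokes per rev
vol_eff = 0.90                      # locked volumetric efficiency

area_in2 = math.pi / 4.0 * bore_in ** 2
ihp = (mep_psi * stroke_ft * area_in2 * power_strokes_per_min * vol_eff
       / 33000.0)
indicated_w = ihp * 745.7           # horsepower to watts
brake_w = indicated_w * 0.80        # locked mechanical efficiency

print(f"indicated: {indicated_w:.1f} W, brake: {brake_w:.1f} W")
```

Either way, the engine at this size is orders of magnitude short of 300 W, so the conclusion below stands.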
So, when choosing the engine, Grok told me it would easily produce 300 watts. When we get down to the math, she says 37 W, with reported real-world values of 25–35 W. This is not nearly enough.
Over the course of the last week, I’ve had X.com Grok, Android Grok, and grok.com all work the problem with me. They all gave different power answers, and they have all gotten equations wrong.
In one case, grok.com reported a design whose math claimed 3,400 W at the crankshaft, but she reported it as 340 W, which didn’t match the math she had shown me. She had auto-corrected to real-world numbers that were at odds with the theoretical values she calculated.
When called on it, she claimed it was just a typo, that she had “slipped a decimal.”
The Question(s)
- How are you currently using AI, if you do?
- Which AI(s) do you currently use?
- How are you keeping your AI honest?


I currently ain’t using “AI” for anything.
I have zero use for it.
The company I work for is balls deep in it like it’s the best thing ever.
When you give your life over to technogeewhizz devices, you become dumber…
My question: are you heating the shop with wood? Put a coil in the woodstove and run water through it to make steam to power the steam engine…
Or would one of those fans that runs on heat turn a pulley and belt on a small generator…
AI?
About the only time I have ever used it was to compile a table comparing various trucks over a period of years. I was looking into pickups, and my needs would be satisfied with a mid-sized one, but I was not going to exclude full-sized PUs. Specifically, I think I asked for a comparison between the Tacoma, Tundra, Frontier, F150, Silverado 1500, Colorado, and one or two other models. I wanted to know about mileage, towing capacity, HP and torque. I tossed in reliability/recalls as well.
.
I got a pretty decent spreadsheet that allowed me to eliminate one or two models and a handful of years.
.
And, I did nothing to keep it honest. Whatever it provided, I accepted without question. That was not because I automatically believed it, but because it was a way to narrow the potential field quickly and without a LOT of building and rebuilding a spreadsheet. The obvious non-starters were eliminated, and my hands on research started.
.
Beyond that, I do not trust the results of any AI engine.
The co I work for is an MS shop, so I use Copilot for work stuff. I don’t write a ton of code, but recently I’ve had to write some calls to a REST API. When I got to a part I hadn’t done before, like creating and executing a REST PUT using form data, I asked it for examples. I reviewed the code to ensure I understood it, and then added it to my script, with customization. It was more helpful than combing stackexchange for examples, and I was able to ask it questions to clarify certain elements.
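For readers who haven't done that particular call before, the shape of it is roughly like this. The endpoint URL and field names here are made up for illustration (the original poster's script and API aren't described), and "form data" here is the simple URL-encoded kind; multipart uploads need a bit more.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and fields -- substitute your API's real ones.
url = "https://api.example.com/v1/records/42"
form_fields = {"status": "active", "owner": "jdoe"}

# Encode the fields as application/x-www-form-urlencoded form data.
body = urllib.parse.urlencode(form_fields).encode("utf-8")

# Build a PUT request carrying the encoded body.
req = urllib.request.Request(url, data=body, method="PUT")
req.add_header("Content-Type", "application/x-www-form-urlencoded")

# Actually sending it is left commented out so the sketch stays offline:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The same pattern works with the third-party `requests` library in fewer lines, but the standard-library version above avoids an extra dependency.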
.
The co is bringing on Salesforce implementation as a service we offer. Several of us were asked to get platform admin certified. The exam is deliberately difficult, though passing it is no indication you have any idea what you’re doing as an admin. We had a week of training, plus online test prep exams. While both of those were helpful, the prep exam questions were drawn from a large but limited set. I inadvertently memorized the questions and answers. When it came to the actual exam, the questions were different enough that I couldn’t determine the correct answers, and I didn’t pass.
.
I scheduled a retake and spent the next two weeks having Copilot quiz me. That was immensely helpful. The questions and answers weren’t repetitive. I could ask for clarification or a deeper dive into why a particular answer was or wasn’t correct. I had it build study guides for me. As for keeping it honest, I reviewed every question and answer, even the ones it marked as correct. It regularly screwed up the number of correct answers available. For example, if the question was choose 3 correct answers, it might actually have 2 or 4 correct answers from the possible choices. A few times it was flat-out wrong. Fortunately, those were questions where I actually knew the answers, or had an idea it was wrong, and I pushed back on them.
.
After reviewing the results and having it recalculate the score, it often did a poor job of counting the correct answers. It seemed to want to just add a point for every question I had it review. One time, I caught it adding a point for a question it already marked as correct, and another for a question I was asking for clarification on.
.
Creating a good prompt is work, and takes time. Even with a good prompt, the actual response can be inconsistent. It did a fairly good job on consistency with the questions, but the scoring and explanation section was all over the map. Every time, I got a different format. It did work though. I passed the retake.