Managing nushell scripts
Nushell is a powerful programming language in part because it makes very good use of Unix piping mechanics.
Take this simple one-liner I made for printing the terminal screen to a PDF like it's the 80s:
export def makepdf [name:string] {
to text | enscript -B -fHelvetica10 -p - | ps2pdf - | save $"($name).pdf" -f; mupdf $"($name).pdf"
}
I save this to a file called utils.nu, and to use it I can invoke one of two commands:
use utils.nu;
echo "hello world!" | utils makepdf helloworld
or the source command:
source utils.nu;
echo "hello world!" | makepdf helloworld
Both do the same thing: pipe text into a utility called enscript, convert the resulting PostScript to PDF with Ghostscript's ps2pdf, and view it in my favorite lightweight viewer, mupdf.
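A note on the difference: use imports the definitions under the module's namespace, which is why the command becomes utils makepdf, while source runs the file in the current scope, so makepdf is available directly. If you prefer the module route without the prefix, a glob import does the trick:

use utils.nu *
echo "hello world!" | makepdf helloworld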
That's a lot of dependencies for one simple function, but once it works, it can fly through a database of records and build an entire file tree of PDFs (batch processing).
It's also simple to write a more elaborate program that takes advantage of this functional paradigm. I made this database browser that walks through my chat histories with LLMs using skim (sk, an fzf substitute):
export def sqlbrowse [] {
open $"($env.HOME)/chatbot_conversation.db"
| (do {|x| ($x.conversation_history|sk|update chatbot_response (gum write --value $"($in.chatbot_response)")) } $in) | get chatbot_response
}
From there, making a PDF of every conversation is one simple shell command:
open $"($env.HOME)/chatbot_conversation.db" | get conversation_history | each {|x| $x.chatbot_response | makepdf $"($x.id)" }
And you get separate PDF files named {1 to n}.pdf in your current working directory. I have thousands of scientific articles accumulated over the decades that I haven't had much time to digest. From an archival perspective, it doesn't make sense to keep them in PDF format if they are never read, used, or shared. Furthermore, PDFs are not very LLM-friendly, since models spend expensive attention on the positioning of text. For text-heavy documents, it's often better to convert and archive them in a cleaner text format that can be wrapped in metadata and piped to other interfaces, particularly LLM-based note-taking knowledge bases.
LLMs do a particularly good job of ingesting reading material for you. The hard part is data preparation, and the first 80% of the extraction is knowing the right tools. For PDF extraction, poppler-utils ships a pdftotext binary that converts PDFs to plain text.
By doing so, you save around 50% of the file size and give the content a second life: it becomes greppable by the popular utilities you know and love.
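Batch conversion follows the same pattern as makepdf above. Here is a sketch, assuming poppler-utils is installed (the pdf2txt-all name and the -layout flag choice are mine):

export def pdf2txt-all [dir: string = "."] {
  glob ($dir | path join "**" "*.pdf") | each {|pdf|
    # write a .txt sibling next to each PDF found
    pdftotext -layout $pdf ($pdf | path parse | update extension "txt" | path join)
  }
}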
The really neat thing about batch processing is that it lets you centralize your data for intelligence gathering and for sharing with your other knowledge-base interfaces. Enterprises are all about this kind of thing. Since LLMs hit the scene, corporations have been well underway migrating legacy business documents into consolidated, machine-readable data. But one of the hard things to port over is the day-to-day tooling: Excel spreadsheets with a few VB macros that parse CSV tables and generate reports.
That's three different languages, technically four if you count the SQL commands under the hood that fetch the CSV in the first place. And probably 95% of businesses still operate like this day-to-day.
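Much of that stack can collapse into a single nushell pipeline. A hypothetical sketch, assuming a sales.csv with region and amount columns:

open sales.csv
| group-by region
| transpose region rows
| insert total {|row| $row.rows.amount | math sum }
| select region total
| to md

The same structured data the macros shuffle around is just a table flowing through the pipeline, ready to query, report on, or export.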
By moving a lot of the scripting into nushell, you can take advantage of SQL, the standard and powerful query language, and mix it with nushell's structured types in a shell for amazing utility. Below is a script that edits a chatbot response with a SQL UPDATE command, using a fuzzy finder as the selector:
export def llmdbedit [] {
open $"($env.HOME)/chatbot_conversation.db"
| query db "UPDATE conversation_history SET chatbot_response = :chatbot_response WHERE id = :id" -p (do {|x| ($x.conversation_history|sk --format {get user_input} --preview {get chatbot_response}|update chatbot_response (gum write --value $"($in.chatbot_response)")) } $in | reject user_input timestamp)
}
This makes database queries simple and fun.
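Reads are even simpler. A hypothetical SELECT against the same schema, using a named parameter just like the UPDATE above:

open $"($env.HOME)/chatbot_conversation.db"
| query db "SELECT id, user_input FROM conversation_history WHERE user_input LIKE :q" -p {q: "%nushell%"}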
But how do we manage all of these new commands we build? Nushell scripts can be invoked as shown above, but I don't find they fit the mold of modules for public consumption. Rather, they are personal scripts that can generate my resume or cover letter without a single Python library; it's a system-in-a-shell concept.
Personally, I've found GitHub Gists work really well with nushell scripts. Simply write:
gh gist create utils.nu
And GitHub automatically creates the utils.nu gist for you and returns a link. You may have to authenticate the GitHub CLI first (via SSH or a GH_TOKEN), but after that you basically have a control center for querying a repository of nushell scripts. Here is a little helper function I made: it lists every gist with a .nu file extension, pipes the list into a fuzzy finder, and saves the selection locally as gist.nu, ready to activate with source gist.nu.
export def ghnu [] {
gh gist view (gh gist list | from tsv --noheaders | filter {|el| ($el.column1 | path parse | get extension) == nu} | sk --format {get column1} | get column0) | save gist.nu -f
}
This takes a lot of the guesswork out of Python library management for the mundane clerical work businesses use Python for every day. Python is a terrible language for this because of its virtual-environment management and versioning, which make every update a dance with the devil.
Nushell, meanwhile, is technically a forever-alpha project with breaking changes (notably around strings), yet it's surprisingly robust, and it outperforms Python for this kind of work simply by being a batteries-included type system.
What I mean by a batteries-included type system: when you source a nushell script, the function definitions are immediately promoted to first-class shell commands. That means you get help output and great error messages for free, on top of solving your actual problem:
◄ 0s ⋈┈◎ : help makepdf
Usage:
> makepdf <name>
Flags:
-h, --help: Display the help message for this command
Parameters:
name <string>
Input/output types:
╭───┬───────┬────────╮
│ # │ input │ output │
├───┼───────┼────────┤
│ 0 │ any │ any │
╰───┴───────┴────────╯
By comparison, in Python you would need to import the click library, or use Tiangolo's Typer library, to get the same affordances for your own utilities.
One recent discovery of mine is a clever CLI tool called pdfcpu. It does many things with PDFs, but the feature I found particularly ingenious is PDF creation from a JSON description.
I previously wrote a very verbose PostScript generator in nushell that didn't feel maintainable or scalable, since I was essentially hand-assembling PostScript in a nushell REPL to make documents.
With pdfcpu, I can get a full cover letter written and formatted in pdf in less than a second.
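The JSON shape is easiest to see in miniature. A minimal sketch (paper size, a pages map, positioned text boxes) in the same format the coverletter function below emits:

{paper: A4, pages: {1: {content: {text: [
{font: {name: "Helvetica", size: 12}, value: "hello world", pos: [100 700]}
]}}}}
| save hello.json -f; pdfcpu create hello.json hello.pdf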
export def coverletter [] {
# input | enscript -fCourier14 -B -p - | ps2pdf - coverletter.pdf; mupdf coverletter.pdf
# let cv = importjson resume.json # local development
let cv = http get https://gist.githubusercontent.com/shaoyanji/b7b844737e6469c9160bf41aa8970068/raw/resume.json
let font = "Roboto-Regular"
#let font = "Courier"
let boldf = "Helvetica-Bold"
#let boldf = "Courier-Bold"
let fsz = 12
let px = 96 / 2.54
let lm = 2.5 * $px
let tm = 4.5 * $px
let rm = 2 * $px
let prompt = "Write a job cover letter in German. Compose a brief and impactful cover letter based on the provided job description and resume. The letter should be no longer than three paragraphs and should be written in a professional, yet conversational tone. Avoid using any placeholders, and ensure that the letter flows naturally and is tailored to the job. Analyze the job description to identify key qualifications and requirements which are listed in yaml after the the job description. Introduce the candidate succinctly, aligning their career objectives with the role. Highlight relevant skills and experiences from the resume that directly match the job’s demands, using specific examples to illustrate these qualifications. Reference notable aspects of the company, such as its mission or values, that resonate with the candidate’s professional goals. Conclude with a strong statement of why the candidate is a good fit for the position, expressing a desire to discuss further. Please write the cover letter in a way that directly addresses the job role and the company’s characteristics, ensuring it remains concise and engaging without unnecessary embellishments. The letter should be formatted into paragraphs and should not include a greeting or signature."
{paper: A4, pages: {1: {content: {
text: [
{font: {name:$font, size: $fsz}, value: (
$cv.basics.name
| append [
$cv.basics.location.address
[[$cv.basics.location.countryCode $cv.basics.location.postalCode]]
$cv.basics.phone
$cv.basics.email
]
| to text), pos:[(595 - $rm - ($cv.basics.email | str length) * 12 * .5) (842 - 5 * $fsz)]}
{font: {name:$font, size: $fsz}, value: ("Freiburg, " + (date now | format date "%d.%m.%Y")), pos:[$lm (842 - 8 * $fsz)]}
{font: {name:$font, size: $fsz}, value: (input --reedline -d '\n\n\n' "Address> " ), pos:[$lm (842 - 12 * $fsz)]}
{font: {name:$boldf, size: $fsz}, value: (input -d 'Betreff: Bewerbung' "Subject> " ), pos:[$lm (842 - 14 * $fsz)]}
{font: {name:$font, size: $fsz}, value: ("Sehr geehrte Damen und Herren,\n\n" + (gum write --header="vG <Esc> gqq in vim" --value (groq ($prompt + ($cv | to text) + (input --reedline "Job Description> "))))), pos:[$lm (14 * $fsz)]}
{font: {name:$font, size: $fsz}, value: "Mit freundlichen Grüßen,\n\n\n\n\n\n\nMatt Ji", pos:[$lm (2 * $fsz) ]}
]
image: [
{src: "https://jisifu.vern.cc/signature.png", pos:[$lm (4 * $fsz)]}
# {src: "./signature.webp", pos:[50 (4 * $fsz)]} # local development
]
}
}}}
| save pdfcpu.json -f;
pdfcpu create pdfcpu.json Ji_Matt_Cover_Letter.pdf; mupdf Ji_Matt_Cover_Letter.pdf
}
And it's translated into German to boot, courtesy of the LLM. It does require some knowledge of vim shortcuts to reflow line widths, since the LLM's JSON blob doesn't arrive with the newlines the text blocks need.
This composability is the key factor in nushell. While Python can sort of do all of this too, it is thread-limited and performs far worse than the Go that powers pdfcpu for the micro-work of PDF generation, or the Rust behind the wonderful shell itself, which just works out of the box.