Compared to using Python “straight” with a IDE aor inside Jupyter Notebook, PIE (Python In Excel) is quite unappealing. However there are good reasons to get it to know better. It's not a bad product and I suspect that it will get a lot better over the years. In the time I've been playing with it, I've learned some things:
PIE uploads the code and its data to a secure Microsoft server, where it gets executed and the results are returned. If you;re used to running everything on your machine it sounds like an abomination, and in some ways, it is, So what is good about it?
Anyone can run PIE code without installing anything. Your complex model can be sent to the VP of sales who can enter some parameters and get those beautiful Seaborn graphs, and they only need Excel on their machine. Clearly this is the big selling point.
As far as "I don't want to send my data to Microsoft" remember that in most MS Office shops your data is either stored on OneDrive or in Sharepoint (Teams) so Microsoft already has your data.
That approach comes with some limitations you will have to live with, and I encourage playing around with PIE t get used to them:
- Limited packages. Of course Python comes "batteries included: so having "only" the standard library is not as limiting as it is for a C programmer. On top, PIE comes with Pandas, Numpy, Matplotlib, Seaborn, and StatsModels, so exactly a Spartan outfit either and will comfortably cover most business needs. But yes, pip install great-tables is not an option.
- Because of its execution model you can't load data on your machine if it's not already "inside Excel." External files can be referenced but you need to set up a query in PowerQuery to make it happen. Furthermore, "for security reasons" (but I suspect server load concerns) you can't access HTTP or data over API's either.
- Now the tricky part: PIE needs to know the data it needs "at compile time" when the code is sent to the server. As a consequence, the xl function used to retrieve Excel data must be s string literal and cannot be a variable or calculation. This will work:
qry = "MyBigQuery"
df = xl(qry)
- Speaking of which, when creating a query in PowerQuery, its default is to create an Excel sheet with output in the same name. By default, it seems PIE will prefer the Excel (Data) Table over the query and that matters because your data table reference needs to include [#All] to include the headers. Even worse, if you never intended the Excel table (it just takes up space) and delete the sheet, your line of code will now read xl("#REF!") (remember that intricate compile-time-literals-only thing!), but it will run after updating it back to xl("my_data").
- The server will cache data but only for a limited time. A cell will take initially a long time to run if you're loading a lot of data but then run quickly when executing it again, But walk away for coffee, and run it again and it will need to retrieve its data again
- The Editor can be found in the Formulas ribbon. Think IDLE, not Pycharm, but it beats working in the formula bar
- Cells (formulas) get executed in regular calculation order: column by column. So it would run your A1 cell first, then A2, A3 and then B1 (if you're organizing your code that way), and not A1-B1-A2-A3.
With that I listed what I think are most of the surprises you can run into. If you're able to install Python and VS Code (Pycharm) on your machine, I'd still do that, develop your code there and then copy it over to Excel. Debugging in PIE is not fun! But you can share your code once it's in PIE, and the importance of that cannot be understated.