Know what floats my boat? Charts and graphs.
Give me a cool chart to dig into and I'm unreasonably happy. I love watching the news on election nights, not for the vote count, but for all the great charts. I switch between channels all evening to see every possible way that each network finds to present numerical data.
Is that weird? I don't think that's weird.
As it turns out, ChatGPT does a great job making charts and tables. And given that this ubiquitous generative AI chatbot can synthesize a ton of information into something chart-worthy, what ChatGPT gives up in pretty presentation it more than makes up for in informational value.
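Exactly what sort of chart-making tools are available for ChatGPT? There are three ways you can proceed: ask the free version of ChatGPT for text-based tables, use Advanced Data Analysis in ChatGPT Plus to generate charts, or use one of the charting plugins available to ChatGPT Plus subscribers.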
In this article, I'll be discussing the first two options but skipping the third. While there are a variety of charting plugins for ChatGPT Plus, they all take you out of the ChatGPT interface and rely on external services, and they all try to upsell you on their SaaS-based charting services. Essentially, they're listed in the ChatGPT plugin store as ads, not as out-of-the-box functional tools. Plus, they tend to be very unreliable.
Note: What ChatGPT used to call "Code Interpreter" is now called "Advanced Data Analysis." So wherever we previously used the term Code Interpreter, we'll now call it Advanced Data Analysis.
Advanced Data Analysis produces relatively ugly charts. But it rocks. First, let's discuss where ChatGPT gets its data, then we'll make some tables.
Earlier, we talked about which charting tools are available in which versions of ChatGPT. But there's more to it than simply charting tools. If you want to use ChatGPT productively, you need to understand what the various editions can do.
ChatGPT free version: This version uses the GPT-3.5 large language model, which isn't quite as capable as GPT-4. From a data point of view, its view of worldly knowledge ends in September 2021. You can paste about 1,500 words of data into the chat dialog for it to process.
ChatGPT Plus with Advanced Data Analysis: This version uses the GPT-4 large language model. It will import data tables in a wide range of file formats. ChatGPT doesn't specify a size limit for imported data; it can handle fairly large files, but it will break if files exceed some undefined level of complexity. Its view of worldly knowledge ends in September 2021, but you can augment that knowledge by uploading data files.
ChatGPT with WebPilot plugin: This version also uses the GPT-4 LLM. In ChatGPT Plus, you can run either Advanced Data Analysis or plugins, but not both at once. If you choose plugins, the WebPilot plugin is fairly reliable for web searches (although it breaks on big requests). That means ChatGPT's worldly knowledge doesn't need to end in 2021 but can generally extend to the current date.
ChatGPT Enterprise: Advanced Data Analysis and plugins are also available in the newly announced Enterprise version. You can upload files to Enterprise, and they will remain confidential. Enterprise is also supposed to allow for bigger files and bigger responses. Pricing has not been announced.
Let's start with an example. For the following demonstration, we'll be working with the top five cities in terms of population.
List the top five cities in the world by population. Include country.
I asked this of ChatGPT's free version and here's what I got back:
Screenshot by David Gewirtz
Turning it into a table is simple. Just tell ChatGPT you want a table:
Screenshot by David Gewirtz
Make a table of the top five cities in the world by population. Include country.
You can manipulate and customize a table by giving ChatGPT more detailed instructions. Again, using the free version, we'll add a population count field. Of course, that data is out of date, but it's presented anyway:
Screenshot by David Gewirtz
Make a table of the top five cities in the world by population. Include country and a population field
You can also specify certain details about the table, like field order and units. Here, I'm moving the country ahead of the city and compressing the population numbers.
Make a table of the top five cities in the world by population. Include country and a population field. Display the fields in the order of rank, country, city, population. Display population in millions (with one decimal point), so 37,833,000 would display as 37.8M.
Note that I gave the AI an example of how I wanted the numbers to display.
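That display rule is simple arithmetic: divide by one million and round to one decimal place. If you want to check the AI's rounding yourself, here's a minimal Python sketch (the function name `format_millions` is made up for this example):

```python
def format_millions(population: int) -> str:
    """Format a raw population count in millions with one decimal place."""
    # 37,833,000 / 1,000,000 = 37.833, which rounds to 37.8
    return f"{population / 1_000_000:.1f}M"


print(format_millions(37_833_000))  # 37.8M
```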
Screenshot by David Gewirtz
That's about as far as the free version will take us. From now on, we're switching to the $20/month ChatGPT Plus version.
ChatGPT Plus with Advanced Data Analysis enabled can make line charts, bar charts, histograms, pie charts, scatter plots, heatmaps, box plots, area charts, bubble charts, Gantt charts, Pareto charts, network diagrams, Sankey diagrams, choropleth maps, radar charts, word clouds, treemaps, and 3D charts.
In this example, we're just going to make a simple bar chart.
Make a bar chart of the top five cities in the world by population
Chatty little tool, isn't it?
Screenshot by David Gewirtz
The eagle-eyed among you may have noticed that the populations in the earlier table don't match the results here. Notice that the table has a green icon and this chart has a purple icon. We've jumped from GPT-3.5 (the free version of ChatGPT) to GPT-4 (in ChatGPT Plus). It's interesting that the two LLMs have slightly different data. This is part of why it pays to be careful when using AIs and to double-check your work. In our case, we're just demonstrating charts, but this is a tangible example of how confidently presented data can be wrong or inconsistent.
One of Advanced Data Analysis's superpowers is the ability to upload a dataset. For our example, I downloaded the Popular Baby Names dataset from Data.gov. This is a comma-separated file of New York City baby names from 2011 to 2014. Even though it's a decade out of date, it's fun to play with.
The dataset I chose for this article is readily available from a government site, so you can replicate this experiment on your own. There are a ton of great datasets available on Data.gov, but I found that many are far too large for ChatGPT to use. Once I downloaded this one, I realized it also included information on ethnicity, so we can run a number of different charts from the same dataset.
Click the little upload button and then choose the data file you want to import.
Screenshot by David Gewirtz
I asked it to show me the first five lines of the file so I'd know more about the file's format.
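Behind the scenes, Advanced Data Analysis handles requests like this by writing and running Python, typically with pandas. If you'd rather peek at a CSV file yourself before uploading it, a standard-library sketch looks like this. The helper names `preview_rows` and `print_preview` are hypothetical, and the header row counts as the first of the five lines:

```python
import csv
from itertools import islice
from typing import Iterable, List


def preview_rows(lines: Iterable[str], n: int = 5) -> List[List[str]]:
    """Parse CSV text and return its first n rows, header row included."""
    return list(islice(csv.reader(lines), n))


def print_preview(path: str, n: int = 5) -> None:
    """Print the first n rows of a CSV file on disk."""
    # newline="" is how the csv module recommends opening files
    with open(path, newline="", encoding="utf-8") as f:
        for row in preview_rows(f, n):
            print(row)
```

For example, `print_preview("Popular_Baby_Names.csv")` (substitute whatever name your downloaded file has) prints each row as a list of column values.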
I was curious about how the names in the dataset were distributed by gender. Here's my first prompt:
Create a pie chart showing gender as a percentage of the overall dataset
And here's the result:
Screenshot by David Gewirtz
Unfortunately, the dark shade of green makes the numbers difficult to read. Fortunately, you can instruct Advanced Data Analysis to use different colors. I was careful to choose colors that did not reinforce gender stereotypes.
Screenshot by David Gewirtz
Create a pie chart showing gender as a percentage of the overall dataset. Use light green for male and medium yellow for female.
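Under the hood, a pie chart like this is a count-and-divide. Here's a minimal standard-library Python sketch of the calculation. The column names `Gender` and `Count` are assumptions about the NYC file's header, and the optional weighting is added here for illustration. It matters because "percentage of the overall dataset" can mean two things: the share of rows (name entries) or the share of babies, which requires summing the Count column.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Optional


def gender_shares(
    lines: Iterable[str],
    column: str = "Gender",
    weight_column: Optional[str] = None,
) -> Dict[str, float]:
    """Return each gender's share of the dataset as a percentage.

    With weight_column=None, every row counts once (share of name entries).
    With weight_column="Count", rows are weighted by that column (share of babies).
    Both column names are assumptions about the file's header row.
    """
    totals: Counter = Counter()
    for row in csv.DictReader(lines):
        weight = int(row[weight_column]) if weight_column else 1
        totals[row[column].strip().upper()] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {gender: 100 * n / grand_total for gender, n in totals.items()}
```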
As we saw earlier, the data collected includes ethnicity. Here's how to see the distribution of the various ethnicities New York recorded in the early 2010s:
Show the distribution of ethnicity in the dataset using a pie chart. Use only light colors.
And here's the result. Notice anything?
Screenshot by David Gewirtz
Apparently, New York didn't properly normalize its data. It used "WHITE NON HISPANIC" alongside "WHITE NON HISP", "BLACK NON HISPANIC" alongside "BLACK NON HISP", and "ASIAN AND PACIFIC ISLANDER" alongside "ASIAN AND PACI". Because each variant was counted as its own category, the chart splits those groups into separate slices and misrepresents the data.
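Fixing this kind of inconsistency amounts to a lookup table that maps each abbreviated label onto its longer form. Here's a minimal Python sketch. The mapping covers only the three variant pairs listed above, and the names `ETHNICITY_ALIASES`, `normalize_ethnicity`, and `ethnicity_counts` are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable

# Abbreviated labels mapped to the longer label for the same group
ETHNICITY_ALIASES: Dict[str, str] = {
    "WHITE NON HISP": "WHITE NON HISPANIC",
    "BLACK NON HISP": "BLACK NON HISPANIC",
    "ASIAN AND PACI": "ASIAN AND PACIFIC ISLANDER",
}


def normalize_ethnicity(label: str) -> str:
    """Uppercase, collapse stray whitespace, then swap in the longer name if one exists."""
    cleaned = " ".join(label.split()).upper()
    return ETHNICITY_ALIASES.get(cleaned, cleaned)


def ethnicity_counts(labels: Iterable[str]) -> Counter:
    """Count rows per ethnicity after normalization, ready to feed a pie chart."""
    return Counter(normalize_ethnicity(label) for label in labels)
```

Labels not in the mapping pass through unchanged, so any other variants in your copy of the file would need their own entries.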
One benefit of ChatGPT is that it remembers instructions throughout a session. So I was able to give it this instruction:
For all the following requests, group "WHITE NON HISPANIC" and "WHITE NON HISP" together. Group "BLACK NON HISPANIC" and "BLACK NON HISP" together. Group "ASIAN AND PACIFIC ISLANDER" and "ASIAN AND PACI". Use the longer of the two ethnicity names when displaying ethnicity.
And it replied:
Screenshot by David Gewirtz
Let's try the chart again, using the same prompt.
Show the distribution of ethnicity in the dataset using a pie chart. Use only light colors.
That's better:
Screenshot by David Gewirtz
You need to be diligent when looking at results. For example, when I asked for the top baby names, the AI treated "Madison" and "MADISON" as two different names:
Screenshot by David Gewirtz
For all the following requests, baby names should be case insensitive.
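Case-insensitive matching means normalizing each name to one canonical form before adding up the counts. Here's a minimal Python sketch, assuming the data has already been reduced to (name, count) pairs; `merge_name_counts` is a hypothetical helper:

```python
from collections import Counter
from typing import Iterable, Tuple


def merge_name_counts(rows: Iterable[Tuple[str, int]]) -> Counter:
    """Sum (name, count) pairs case-insensitively."""
    totals: Counter = Counter()
    for name, count in rows:
        # title() maps "MADISON", "madison", and "Madison" to the same key, "Madison"
        totals[name.strip().title()] += count
    return totals
```

Normalizing case only merges names that differ in capitalization; different spellings of the same name stay separate. Also note that `title()` turns a name like "McKenzie" into "Mckenzie", which is fine for matching but may not be how you want it displayed.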
Let's wrap up with a complex chart from one prompt. Here's our prompt:
For each ethnicity, present two pie charts, one for each gender. Each pie chart should list the top five baby names for that gender and that ethnicity. Use only light colors.
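Conceptually, this prompt is a group-by followed by a top-N. Here's a minimal standard-library Python sketch of the data side (no plotting). The column names `Ethnicity`, `Gender`, `Child's First Name`, and `Count` are assumptions about the NYC file's header, so check them against your copy, and `top_names_by_group` is a hypothetical helper:

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Group = Tuple[str, str]  # (ethnicity, gender)


def top_names_by_group(lines: Iterable[str], n: int = 5) -> Dict[Group, List[Tuple[str, int]]]:
    """Return the n most common names, with totals, for each (ethnicity, gender) pair."""
    totals: Dict[Group, Counter] = defaultdict(Counter)
    for row in csv.DictReader(lines):
        group = (row["Ethnicity"].strip().upper(), row["Gender"].strip().upper())
        # Normalize capitalization so "Madison" and "MADISON" are counted together
        name = row["Child's First Name"].strip().title()
        totals[group][name] += int(row["Count"])
    # most_common(n) returns (name, total) pairs sorted by total, largest first
    return {group: counts.most_common(n) for group, counts in totals.items()}
```

Each entry in the result corresponds to one pie chart. This sketch doesn't merge abbreviated ethnicity labels such as "WHITE NON HISP"; those would need to be mapped to their longer names first.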
As it turns out, the text in the generated chart was too small to read. To get a more useful chart, we can export it as an image file. I'm going to specify both file format and file width:
Screenshot by David Gewirtz
Export this chart as a 3000 pixel wide JPG file.
And here's the result:
Screenshot by David Gewirtz
Notice that Sofia and Sophia are both very popular but are shown as two different names. But that's what makes charts so fascinating.
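Why does asking for a pixel width work? Assuming Advanced Data Analysis draws its charts with matplotlib, which is common for Python-generated charts, an image's pixel width is the figure's width in inches multiplied by its DPI (dots per inch). So hitting a target width comes down to picking the DPI. A minimal sketch; `dpi_for_width` is a hypothetical helper:

```python
import math


def dpi_for_width(target_width_px: int, figure_width_in: float) -> int:
    """Return the smallest whole-number DPI that makes the figure at least target_width_px wide.

    Pixel width = figure width in inches x DPI, so DPI = pixels / inches, rounded up.
    """
    if figure_width_in <= 0:
        raise ValueError("figure width must be positive")
    return math.ceil(target_width_px / figure_width_in)


print(dpi_for_width(3000, 10))  # 300
```

With matplotlib, you'd pass the result to `savefig`, for example `fig.savefig("chart.jpg", dpi=dpi_for_width(3000, fig.get_size_inches()[0]))`. Options such as `bbox_inches="tight"` crop the figure, so the final width can come out somewhat smaller than the target.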
Advanced Data Analysis comes with ChatGPT Plus as a beta feature that you have to turn on in the Settings panel. ChatGPT Plus costs $20/month. Advanced Data Analysis is also included with the Enterprise edition, but pricing for that hasn't been released yet.
When it comes to uploading your data, assume that there's always a privacy risk.
I asked ChatGPT about data privacy, and this is what it told me:
Data privacy is a priority for ChatGPT. Uploaded data is used solely for the purpose of the user's current session and is not stored long-term or used for any other purposes. However, for highly sensitive data, users should always exercise caution and consider using the Enterprise version of ChatGPT, which offers enhanced data confidentiality.
My recommendation: Don't trust ChatGPT or any generative AI tool with your data. The Enterprise version is supposed to have more privacy controls, but I'd recommend uploading only data that you wouldn't mind seeing become public.
Using ChatGPT for ongoing or real-time analytics is possible, but there are some practical limitations. First, ChatGPT Plus throttles the number of requests you can make in a given period of time. Second, you have to upload each file individually. You could potentially use a licensed ChatGPT API to do real-time analytics, but with the chatbot itself, you're limited to analyzing data at rest.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter on Substack, and follow me on Twitter at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.