In this article, we’ll look at data manipulation and visualization techniques in Julia. I won’t go into the details of every parameter of every function, because the objective of this series is to use Julia as a tool for building and backtesting trading strategies. So, we’ll stay focused on that.

You can refer to the detailed documentation of a function if you need it to solve any particular challenge you face while programming.

This article is divided into the following sections:

Data manipulation
Importing and exporting data as CSV and Excel files
Data visualization

In my previous posts in this Julia programming series, I introduced the language and started with the basic syntax of Julia programming. You can check that out as well.

Data manipulation

Whenever you work with data in a programming language, you need a data structure that can hold large, heterogeneous data sets. In Julia, that structure is the dataframe.

Julia’s DataFrames.jl package provides a way to structure and manipulate data.

It can be installed using the “Pkg” module.

Creating new dataframes

Here’s an example of creating a new dataframe.
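
The original listing is not reproduced here, so the following is a minimal sketch of how the two dataframes shown in the output could be built. The variable name df_1 is an assumption (df_2 is the name used later in the article), and the random values will differ on every run.

```julia
# One-time install (uncomment if DataFrames.jl is not yet installed):
# import Pkg; Pkg.add("DataFrames")
using DataFrames

# A dataframe built column by column from keyword arguments
# (df_1 is an assumed name for illustration)
df_1 = DataFrame(
    Name = ["Vivek", "Viraj", "Rohan", "Ishan"],
    Team = ["EPAT", "Marketing", "Sales", "Quantra"],
    Work_experience = [15, 8, 7, 10],
)

# A dataframe with two columns of 10 uniform random numbers each
df_2 = DataFrame(a = rand(10), b = rand(10))
```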

Output:

Name	Team	Work_experience
String	String	Int64
Vivek	EPAT	15
Viraj	Marketing	8
Rohan	Sales	7
Ishan	Quantra	10

a	b
Float64	Float64
0.845011	0.720306
0.647665	0.0409036
0.427267	0.221369
0.413642	0.374832
0.477994	0.118461
0.0849006	0.157679
0.0477405	0.845332
0.518909	0.159305
0.93499	0.259579
0.60034	0.115911

Column names can be accessed using the names() function.
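
As a small sketch (df_1 is an assumed variable name), names() returns the column names as strings, while propertynames() returns them as symbols, which matches the two vectors in the output:

```julia
using DataFrames

# Assumed sample dataframe for illustration
df_1 = DataFrame(
    Name = ["Vivek", "Viraj"],
    Team = ["EPAT", "Marketing"],
    Work_experience = [15, 8],
)

col_strings = names(df_1)          # column names as a Vector{String}
col_symbols = propertynames(df_1)  # column names as a Vector{Symbol}
```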

Output:

3-element Vector{String}:
"Name"
"Team"
"Work_experience"

3-element Vector{Symbol}:
:Name
:Team
:Work_experience

Renaming columns can be done using the rename() function.
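
One possible way to get the column names shown in the table is sketched below. The old => new pairs and the variable names are assumptions. rename() returns a new dataframe, while rename!() changes the existing one in place.

```julia
using DataFrames

# Assumed sample dataframe for illustration
df_1 = DataFrame(
    Name = ["Vivek", "Viraj", "Rohan", "Ishan"],
    Team = ["EPAT", "Marketing", "Sales", "Quantra"],
    Work_experience = [15, 8, 7, 10],
)

# Rename with old => new pairs; the original df_1 is left untouched
renamed = rename(
    df_1,
    "Name" => "name",
    "Team" => "team",
    "Work_experience" => "work experience",
)

# A column whose name contains a space is accessed with a string index
experience = renamed[!, "work experience"]
```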

name	team	work experience
String	String	Int64
Vivek	EPAT	15
Viraj	Marketing	8
Rohan	Sales	7
Ishan	Quantra	10

Indexing and summarising data

Indexing dataframes to use particular rows or columns for manipulation is a fundamental operation, and summarising data helps us understand it better. In Julia, summary stats of any dataset can be printed using the describe() function.
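
A minimal sketch, assuming df_2 holds two columns of random numbers as in the earlier output:

```julia
using DataFrames

# Assumed sample data; values change on every run
df_2 = DataFrame(a = rand(10), b = rand(10))

# describe() returns a dataframe with one row of summary stats per column
summary_stats = describe(df_2)
```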

variable	mean	min	median	max	nmissing	eltype
Symbol	Float64	Float64	Float64	Float64	Int64	DataType
a	0.499846	0.0477405	0.498452	0.93499	0	Float64
b	0.301368	0.0409036	0.190337	0.845332	0	Float64

The number of columns and rows in a dataframe can be found using the ncol() and nrow() functions.
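
For example, on a dataframe with 2 columns and 10 rows (an assumed stand-in for df_2):

```julia
using DataFrames

# Assumed sample data
df_2 = DataFrame(a = rand(10), b = rand(10))

n_columns = ncol(df_2)   # number of columns
n_rows = nrow(df_2)      # number of rows
dims = size(df_2)        # (rows, columns) as a tuple
```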

Output:
2
10

Let’s look at multiple methods of accessing rows and columns of a dataframe.
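
The sketch below shows a few indexing forms that would produce results like the ones in the output. The dataframe and variable names are assumptions.

```julia
using DataFrames

# Assumed sample dataframe; string keys allow a column name with a space
df_1 = DataFrame(
    "name" => ["Vivek", "Viraj", "Rohan", "Ishan"],
    "team" => ["EPAT", "Marketing", "Sales", "Quantra"],
    "work experience" => [15, 8, 7, 10],
)

all_names = df_1.name                # a whole column, by property access
all_teams = df_1[!, :team]           # a whole column, without copying
first_teams = df_1[1:3, :team]       # first three rows of one column
first_row = df_1[1:1, :]             # first row, kept as a one-row dataframe
two_columns = df_1[:, [:name, :team]]  # all rows of selected columns
```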

Output:
4-element Vector{String}:
"Vivek"
"Viraj"
"Rohan"
"Ishan"

4-element Vector{String}:
"EPAT"
"Marketing"
"Sales"
"Quantra"

3-element Vector{String}:
"EPAT"
"Marketing"
"Sales"

name	team	work experience
String	String	Int64
Vivek	EPAT	15

name	team
String	String
Vivek	EPAT
Viraj	Marketing
Rohan	Sales
Ishan	Quantra

Basic mathematical operations

As discussed in my previous post, basic arithmetic operations can be performed on individual columns.
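
As a sketch, and assuming the output shows one column subtracted from another, addition and subtraction of two columns work directly on the column vectors:

```julia
using DataFrames

# Assumed sample data; values change on every run
df_2 = DataFrame(a = rand(10), b = rand(10))

difference = df_2.a - df_2.b   # element-wise subtraction of two columns
total = df_2.a + df_2.b        # element-wise addition of two columns
```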

10-element Vector{Float64}:

-0.5474996670806442
 0.5174063588946236
-0.564150142575268
 0.12873854328766576
 0.2741519215981265
 0.20241852864291987
 0.09324017568958975
-0.41716724316286524
 0.2693306887583933
-0.5967498723478988

You’ll have to use the “.” operator for element-wise division.
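
A small sketch with fixed numbers (assumed for illustration) so the result is easy to check. The same dot syntax applies to element-wise multiplication.

```julia
using DataFrames

# Assumed sample data with easy-to-verify values
df = DataFrame(a = [1.0, 4.0, 9.0], b = [2.0, 2.0, 3.0])

ratio = df.a ./ df.b     # element-wise division
product = df.a .* df.b   # element-wise multiplication
```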

10-element Vector{Float64}:

0.06754620232737023
3.013387340201863
0.4169119702423886
1.2293455286486041
1.4462537614868343
8.482279426917298
1.1103752688515762
0.21238611891693882
3.1244976300403002
0.38733760512833965

Basic operations

Rearranging columns

The prefix r followed by a quoted string denotes a regular expression. Here, any column whose name contains the string “work” is selected and moved to the first position. You can write the full column name as well.
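
The original call is not shown, so this is one way it might be written. Cols() combines the regex match with “all columns” and keeps each column only once. The variable names are assumptions.

```julia
using DataFrames

# Assumed sample dataframe for illustration
df_1 = DataFrame(
    "name" => ["Vivek", "Viraj", "Rohan", "Ishan"],
    "team" => ["EPAT", "Marketing", "Sales", "Quantra"],
    "work experience" => [15, 8, 7, 10],
)

# Columns matching the regex come first, followed by the remaining columns
reordered = select(df_1, Cols(r"work", :))
```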

work experience	name	team
Int64	String	String
15	Vivek	EPAT
8	Viraj	Marketing
7	Rohan	Sales
10	Ishan	Quantra

Adding a new column in a dataframe

Here we add another column, “c”, to the dataframe df_2.
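
A minimal sketch, assuming the new column holds random numbers like the existing ones:

```julia
using DataFrames

# Assumed sample data; values change on every run
df_2 = DataFrame(a = rand(10), b = rand(10))

# Assigning to a new property name appends a column
df_2.c = rand(10)
```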

a	b	c
Float64	Float64	Float64
0.845011	0.720306	0.962749
0.647665	0.0409036	0.10846
0.427267	0.221369	0.197592
0.413642	0.374832	0.967406
0.477994	0.118461	0.0233091
0.0849006	0.157679	0.936764
0.0477405	0.845332	0.296003
0.518909	0.159305	0.514714
0.93499	0.259579	0.620951
0.60034	0.115911	0.0224133

Dataframe-to-matrix conversion
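
A dataframe can be passed to the Matrix constructor. The sketch below assumes a dataframe with three Float64 columns, as in the output that follows.

```julia
using DataFrames

# Assumed sample data; values change on every run
df_2 = DataFrame(a = rand(10), b = rand(10), c = rand(10))

# Copies the columns into a 10×3 matrix of Float64
m = Matrix(df_2)
```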

10×3 Matrix{Float64}:

0.0396604  0.58716    0.741712
0.774389   0.256983   0.429361
0.403371   0.967521   0.989583
0.690069   0.56133    0.50599
0.888493   0.614341   0.152574
0.229472   0.0270531  0.932589
0.937996   0.844756   0.0745573
0.112492   0.52966    0.712178
0.396105   0.126774   0.397762
0.377277   0.974027   0.685073

Grouping data

Let’s look at ways to group data, which comes in handy while summarising data.

In-built datasets in Julia

The RDatasets.jl package gives you access to many of the datasets that ship with R and its packages. They are handy for testing purposes.

Here’s how you can find out the list of available datasets. It has 763 datasets.
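
A short sketch, assuming RDatasets.jl is installed. The exact number of datasets depends on the package version, so it is not hard-coded here.

```julia
using DataFrames, RDatasets

# A dataframe listing every available dataset and the R package it comes from
available = RDatasets.datasets()
n_available = nrow(available)

# Load one dataset by R package name and dataset name
iris = dataset("datasets", "iris")
```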

We’ll work with one of the built-in datasets (“iris”) in this section. It contains 150 observations covering 3 iris species, with 4 measurements (sepal length, sepal width, petal length and petal width) recorded for each flower. More details about this dataset can be found here.

The following snapshot shows the variables in the iris dataset.

SepalLength	SepalWidth	PetalLength	PetalWidth	Species
Float64	Float64	Float64	Float64	Cat…
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa
4.6	3.4	1.4	0.3	setosa
5.0	3.4	1.5	0.2	setosa
4.4	2.9	1.4	0.2	setosa
4.9	3.1	1.5	0.1	setosa
5.4	3.7	1.5	0.2	setosa
4.8	3.4	1.6	0.2	setosa
4.8	3.0	1.4	0.1	setosa
4.3	3.0	1.1	0.1	setosa
5.8	4.0	1.2	0.2	setosa
5.7	4.4	1.5	0.4	setosa
5.4	3.9	1.3	0.4	setosa
5.1	3.5	1.4	0.3	setosa
5.7	3.8	1.7	0.3	setosa
5.1	3.8	1.5	0.3	setosa
5.4	3.4	1.7	0.2	setosa
5.1	3.7	1.5	0.4	setosa
4.6	3.6	1.0	0.2	setosa
5.1	3.3	1.7	0.5	setosa
4.8	3.4	1.9	0.2	setosa
5.0	3.0	1.6	0.2	setosa
5.0	3.4	1.6	0.4	setosa
5.2	3.5	1.5	0.2	setosa
5.2	3.4	1.4	0.2	setosa
4.7	3.2	1.6	0.2	setosa
⋮	⋮	⋮	⋮	⋮

Here’s the summary of this dataset.

variable	mean	min	median	max	nmissing	eltype
Symbol	Union…	Any	Union…	Any	Int64	DataType
SepalLength	5.84333	4.3	5.8	7.9	0	Float64
SepalWidth	3.05733	2.0	3.0	4.4	0	Float64
PetalLength	3.758	1.0	4.35	6.9	0	Float64
PetalWidth	1.19933	0.1	1.3	2.5	0	Float64
Species		setosa		virginica	0	CategoricalValue{String, UInt8}

Let’s look at some of the questions you might want to answer using the iris dataset.

We can perform arithmetic operations by grouping data based on various columns. Here’s how we can get the answer to the following question -

What’s the mean value of the sepal length of each species?
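
One way to answer this, sketched with groupby() and combine(). The output column name mm is taken from the table below; the other names are assumptions.

```julia
using DataFrames, RDatasets, Statistics

iris = dataset("datasets", "iris")

# Split the rows by species, then reduce each group to the mean sepal length
grouped = groupby(iris, :Species)
mean_by_species = combine(grouped, :SepalLength => mean => :mm)
```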

Species	mm
Category	Float64
setosa	5.006
versicolor	5.936
virginica	6.588

Another package that helps make the operations more intuitive is Pipe.jl. It lets you write operations as they are performed instead of the backward approach.
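
A sketch of the same grouping written with the @pipe macro from Pipe.jl, where the underscore stands for the result of the previous step. The second pipeline counts the rows per species. The variable names are assumptions.

```julia
using DataFrames, Pipe, RDatasets, Statistics

iris = dataset("datasets", "iris")

# Read left to right: take iris, group it, then summarise each group
mean_by_species = @pipe iris |>
    groupby(_, :Species) |>
    combine(_, :SepalLength => mean => :mm)

# Number of rows in each group
count_by_species = @pipe iris |>
    groupby(_, :Species) |>
    combine(_, nrow)
```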

Species	mm
Category	Float64
setosa	5.006
versicolor	5.936
virginica	6.588

Species	nrow
Category	Int64
setosa	50
versicolor	50
virginica	50

Dealing with missing data

Julia has a “missing” object that represents unavailable data. You can use the skipmissing() function to perform operations while ignoring the missing values.
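
A minimal sketch using the same values as the output below (the name df_m is an assumption). Without skipmissing(), any aggregate over a column containing missing is itself missing.

```julia
using DataFrames, Statistics

# Assumed dataframe with one missing value in each column
df_m = DataFrame(
    a = [1, missing, 3, 7],
    b = ["Apple", "Orange", missing, "Grapes"],
)

naive_sum = sum(df_m.a)                # propagates missing
total = sum(skipmissing(df_m.a))       # sums only the values that are present
average = mean(skipmissing(df_m.a))    # mean of the values that are present
```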

Output:

a	b
Int64?	String?
1	Apple
missing	Orange
3	missing
7	Grapes

You can use the dropmissing() function to remove rows that contain missing values.
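
For instance, with the same assumed dataframe df_m as above:

```julia
using DataFrames

# Assumed dataframe with one missing value in each column
df_m = DataFrame(
    a = [1, missing, 3, 7],
    b = ["Apple", "Orange", missing, "Grapes"],
)

# Drop every row that has a missing value in any column
complete_rows = dropmissing(df_m)

# Drop only the rows where column a is missing
a_present = dropmissing(df_m, :a)
```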

a	b
Int64	String
1	Apple
7	Grapes

More details for dealing with missing values can be found here.

Importing and exporting data as CSV and Excel files

Reading data is the first step in analysing any kind of data. Most of the information we come across is in either CSV or Excel format, so we’ll focus on these two. We will work with CSV.jl and XLSX.jl for dealing with CSV and Excel files.

Reading and writing CSV files

We’ll read a CSV file (infy.csv) as a dataframe. It contains historical stock price data for Infosys, downloaded from Yahoo Finance for the period 21-Dec-2020 to 22-Dec-2021.
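
To keep the sketch self-contained, it parses a three-row sample (copied from the rows shown later in this article) from an in-memory buffer instead of the actual file. With infy.csv on disk, the file path would be passed in place of the buffer.

```julia
using CSV, DataFrames

# Three sample rows in the same layout as the Yahoo Finance download
sample = """
Date,Open,High,Low,Close,Adj Close,Volume
2020-12-22,16.39,16.74,16.36,16.58,16.2664,6714400
2020-12-23,16.90,16.93,16.57,16.59,16.2762,5913500
2020-12-24,16.68,16.69,16.52,16.60,16.2860,1320600
"""

# CSV.read parses the source and materialises it as a DataFrame
df = CSV.read(IOBuffer(sample), DataFrame)

# With the file on disk (assumed to be in the working directory):
# df = CSV.read("infy.csv", DataFrame)
```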

Here’s a summary for this data.

variable	mean	min	median	max	nmissing	eltype
Symbol	Union…	Any	Union…	Any	Int64	DataType
Date		2020-12-22		2021-12-21	0	Date
Open	20.5674	16.39	20.63	24.05	0	Float64
High	20.7164	16.69	20.775	24.5	0	Float64
Low	20.4097	16.36	20.51	23.94	0	Float64
Close	20.5685	16.58	20.725	24.22	0	Float64
Adj Close	20.3422	16.2664	20.5451	24.22	0	Float64
Volume	7.09982e6	1320600	6.43815e6	22911800	0	Int64

Here, we calculate the daily range (High minus Low) and add it as a new column -
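
A minimal sketch on three sample rows, assuming the range is the difference between the High and Low columns, which agrees with the values in the table below:

```julia
using DataFrames

# Assumed sample of the price data (High and Low columns only)
df = DataFrame(
    High = [16.74, 16.93, 16.69],
    Low = [16.36, 16.57, 16.52],
)

# Element-wise difference stored as a new column
df.range = df.High .- df.Low
```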

Date	Open	High	Low	Close	Adj Close	Volume	range
Date	Float64	Float64	Float64	Float64	Float64	Int64	Float64
2020-12-22	16.39	16.74	16.36	16.58	16.2664	6714400	0.379999
2020-12-23	16.9	16.93	16.57	16.59	16.2762	5913500	0.36
2020-12-24	16.68	16.69	16.52	16.6	16.286	1320600	0.170001
2020-12-28	16.73	16.84	16.72	16.77	16.4528	4239300	0.120001
2020-12-29	16.9	16.9	16.67	16.76	16.443	8473700	0.23
2020-12-30	16.87	17.0	16.83	16.93	16.6098	3877200	0.17
2020-12-31	17.01	17.03	16.89	16.95	16.6294	3693700	0.140002
2021-01-04	17.39	17.43	17.06	17.25	16.9237	12597600	0.370001
2021-01-05	17.32	17.67	17.32	17.65	17.3162	8109900	0.35
2021-01-06	17.4	17.79	17.34	17.73	17.3946	9136300	0.450001
2021-01-07	17.36	17.55	17.26	17.55	17.2181	10272000	0.289999
2021-01-08	18.07	18.61	18.02	18.59	18.2384	17802400	0.590001
2021-01-11	18.68	18.86	18.55	18.76	18.4052	12220600	0.310002
2021-01-12	18.92	18.94	18.54	18.6	18.2482	10629100	0.4
2021-01-13	19.03	19.07	18.4	18.43	18.0814	18409900	0.67
2021-01-14	18.57	18.65	18.14	18.22	17.8754	13286100	0.510001
2021-01-15	18.19	18.38	18.11	18.17	17.8263	7443000	0.269998
2021-01-19	18.08	18.18	17.95	18.12	17.7773	7179600	0.229999
2021-01-20	18.37	18.47	18.29	18.4	18.052	5408500	0.179998
2021-01-21	18.39	18.4	18.15	18.2	17.8558	7963400	0.25
2021-01-22	18.23	18.27	18.06	18.18	17.8361	5663500	0.210001
2021-01-25	18.15	18.22	17.84	17.92	17.5811	6012600	0.379999
2021-01-26	17.92	17.92	17.75	17.85	17.5124	5472600	0.17
2021-01-27	17.65	17.89	17.44	17.47	17.1396	11388300	0.449998
2021-01-28	17.46	17.75	17.41	17.64	17.3064	7877600	0.34
2021-01-29	17.16	17.23	16.88	16.88	16.5607	9671400	0.350001
2021-02-01	17.19	17.42	17.05	17.38	17.0513	5829200	0.370001
2021-02-02	17.45	17.51	17.34	17.44	17.1101	4119800	0.17
2021-02-03	17.6	17.75	17.49	17.65	17.3162	4677800	0.26
2021-02-04	17.54	17.64	17.36	17.59	17.2573	4439600	0.279998
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮

This updated dataframe can be saved using the CSV.write() function.
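
The sketch below writes a small dataframe to a temporary location and reads it back as a check. The file name and the sample values are assumptions.

```julia
using CSV, DataFrames

# Assumed sample of the updated dataframe
df = DataFrame(
    High = [16.74, 16.93, 16.69],
    Low = [16.36, 16.57, 16.52],
    range = [0.38, 0.36, 0.17],
)

# Hypothetical output path inside a temporary directory
path = joinpath(mktempdir(), "infy_with_range.csv")
CSV.write(path, df)

# Read the file back to confirm the round trip
df_check = CSV.read(path, DataFrame)
```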

Reading and writing Excel files

We’ll use the XLSX.jl package in Julia to read and write Excel files.

Here’s how it can be done -

Date	Open	High	Low	Close	Adj Close	Volume
Any	Any	Any	Any	Any	Any	Any
2020-12-22	16.39	16.74	16.36	16.58	16.2664	6714400
2020-12-23	16.9	16.93	16.57	16.59	16.2762	5913500
2020-12-24	16.68	16.69	16.52	16.6	16.286	1320600
2020-12-28	16.73	16.84	16.72	16.77	16.4528	4239300
2020-12-29	16.9	16.9	16.67	16.76	16.443	8473700
2020-12-30	16.87	17.0	16.83	16.93	16.6098	3877200
2020-12-31	17.01	17.03	16.89	16.95	16.6294	3693700
2021-01-04	17.39	17.43	17.06	17.25	16.9237	12597600
2021-01-05	17.32	17.67	17.32	17.65	17.3162	8109900
2021-01-06	17.4	17.79	17.34	17.73	17.3946	9136300
2021-01-07	17.36	17.55	17.26	17.55	17.2181	10272000
2021-01-08	18.07	18.61	18.02	18.59	18.2384	17802400
2021-01-11	18.68	18.86	18.55	18.76	18.4052	12220600
2021-01-12	18.92	18.94	18.54	18.6	18.2482	10629100
2021-01-13	19.03	19.07	18.4	18.43	18.0814	18409900
2021-01-14	18.57	18.65	18.14	18.22	17.8754	13286100
2021-01-15	18.19	18.38	18.11	18.17	17.8263	7443000
2021-01-19	18.08	18.18	17.95	18.12	17.7773	7179600
2021-01-20	18.37	18.47	18.29	18.4	18.052	5408500
2021-01-21	18.39	18.4	18.15	18.2	17.8558	7963400
2021-01-22	18.23	18.27	18.06	18.18	17.8361	5663500
2021-01-25	18.15	18.22	17.84	17.92	17.5811	6012600
2021-01-26	17.92	17.92	17.75	17.85	17.5124	5472600
2021-01-27	17.65	17.89	17.44	17.47	17.1396	11388300
2021-01-28	17.46	17.75	17.41	17.64	17.3064	7877600
2021-01-29	17.16	17.23	16.88	16.88	16.5607	9671400
2021-02-01	17.19	17.42	17.05	17.38	17.0513	5829200
2021-02-02	17.45	17.51	17.34	17.44	17.1101	4119800
2021-02-03	17.6	17.75	17.49	17.65	17.3162	4677800
2021-02-04	17.54	17.64	17.36	17.59	17.2573	4439600
⋮	⋮	⋮	⋮	⋮	⋮	⋮

We can write an Excel file using the writetable() function.
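
The XLSX.jl interface has changed between releases, so treat the following as a sketch rather than the exact calls used for the output above. It writes a small dataframe with writetable(), then reads the whole sheet back as a matrix and rebuilds a dataframe from it. The file name, sheet name and sample values are assumptions. Columns rebuilt this way have element type Any, as in the table above.

```julia
using DataFrames, XLSX

# Assumed sample data
df = DataFrame(Date = ["2020-12-22", "2020-12-23"], Close = [16.58, 16.59])

# Hypothetical output path inside a temporary directory
path = joinpath(mktempdir(), "infy.xlsx")

# Write: a vector of columns plus the matching column names
XLSX.writetable(path, collect(eachcol(df)), names(df); sheetname = "prices")

# Read: all cells of the sheet as a matrix, header row included
raw = XLSX.readxlsx(path)["prices"][:]

# First row holds the column names, the remaining rows hold the data
df_xlsx = DataFrame(raw[2:end, :], string.(raw[1, :]))
```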

Julia has built-in open(), read(), write() and close() functions for working with text files. More details can be found here.
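
A short sketch using only Base Julia. The file name and contents are made up for illustration. The do-block form of open() closes the file automatically, so close() does not have to be called by hand.

```julia
# Hypothetical file inside a temporary directory
path = joinpath(mktempdir(), "notes.txt")

# Open for writing; the file is closed when the block ends
open(path, "w") do io
    write(io, "INFY close: 16.58\n")
    write(io, "INFY close: 16.59\n")
end

contents = read(path, String)   # the whole file as one string
lines = readlines(path)         # one string per line, newlines stripped
```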

Data can be written in the .jld format as well. This is an HDF5-based format for saving Julia variables, provided by the JLD.jl package.

Details for the following packages can be found here -

Data visualization

Data visualization is crucial for understanding and analysing data. We’ll now look at some of the plots using Plots.jl. Plots.jl is one of the commonly used plotting libraries in Julia.

Line plot

Here’s a simple line plot.
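
A minimal sketch with made-up data, assuming Plots.jl is installed with its default backend:

```julia
using Plots

# Assumed sample data: the squares of 1 to 10
x = 1:10
y = x .^ 2

# plot() draws a line through the points by default
p = plot(x, y)
```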

Attributes of a plot

The following attributes can be added to the plot. These attributes can be used for all the plots discussed in this article.

xlabel - For x-axis label
ylabel - For y-axis label
title - Title of the plot
ylims - Range of y-axis
xlims - Range of the x-axis
label - Label names in the legend
linewidth/lw - For adjusting the width of the line
color - For adding specific colours to the lines
legend - Whether to show the legend and where to place it. It can take: :topleft, :topright, :bottomleft, :bottomright, :left, :right, :top, :bottom, true, false
layout - For adding multiple plots in the same image.
size - Size of the plot

This list is not exhaustive; many attributes can be used. However, as I have mentioned earlier, we’ll stay focused on the question: How do we use Julia to achieve our goal?

The attributes presented above are most commonly used and should suffice for creating plots.

Here’s an example that combines all the features mentioned above.
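
The original example is not shown, so the sketch below uses made-up data (a sine and a cosine curve) to exercise the attributes from the list. Per-series values such as label and color are passed as row vectors, one entry per series.

```julia
using Plots

# Assumed sample data: two curves on a shared x-axis
x = range(0, 2π; length = 100)
y = [sin.(x) cos.(x)]   # each column is one series

p = plot(
    x, y;
    xlabel = "x",
    ylabel = "value",
    title = "Sine and cosine",
    xlims = (0, 2π),
    ylims = (-1.5, 1.5),
    label = ["sin(x)" "cos(x)"],
    linewidth = 2,
    color = [:blue :red],
    legend = :topright,
    layout = (2, 1),      # one subplot per series, stacked vertically
    size = (700, 500),
)
```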

Scatter plot

Scatter plots can be generated using multiple methods. Here are a few examples -
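
Two equivalent forms are sketched below with random sample data: the scatter() function, and plot() with the seriestype attribute.

```julia
using Plots

# Assumed sample data: 50 random points
x = rand(50)
y = rand(50)

p1 = scatter(x, y; label = "points")
p2 = plot(x, y; seriestype = :scatter, label = "points")
```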

Heatmap

Histogram

Pie chart

Here’s a sample layout with different plots.
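
The sketch below builds a heatmap, a histogram, a pie chart and a line plot from made-up data, then arranges them in a 2×2 grid with the layout attribute.

```julia
using Plots

# Individual plots, each built from assumed sample data
p_heat = heatmap(rand(10, 10); title = "Heatmap")
p_hist = histogram(randn(1000); bins = 30, title = "Histogram", legend = false)
p_pie = pie(["EPAT", "Marketing", "Sales", "Quantra"], [15, 8, 7, 10]; title = "Pie chart")
p_line = plot(cumsum(randn(100)); title = "Line", legend = false)

# Passing existing plots to plot() combines them into one figure
combined = plot(p_heat, p_hist, p_pie, p_line; layout = (2, 2), size = (800, 600))
```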

Plotting mathematical functions

Here are some plots of mathematical functions.
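
As a sketch, plot() accepts a function together with the lower and upper limits of the x-axis. The functions chosen here are only examples.

```julia
using Plots

# Plot built-in functions over an interval; plot! adds to the current plot
p_trig = plot(sin, 0, 2π; label = "sin(x)")
plot!(p_trig, cos, 0, 2π; label = "cos(x)")

# Anonymous functions work the same way
p_poly = plot(x -> x^2 - 3x, -2, 5; label = "x^2 - 3x")
```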

Saving plots

The plot generated can be saved in various formats using the savefig() function.
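
In the sketch below the output format is picked from the file extension (for example .png, .svg or .pdf). The file name is an assumption.

```julia
using Plots

p = plot(sin, 0, 2π; label = "sin(x)")

# Hypothetical output path inside a temporary directory
out = joinpath(mktempdir(), "sine.png")
savefig(p, out)
```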

Animated plots

We can also convert a sequence of plots into an animation and save it as a GIF or a video.
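
A minimal sketch of an animation: the @animate macro records one frame per loop iteration, and gif() writes the frames to a file. The travelling sine wave and the file name are made up for illustration.

```julia
using Plots

x = range(0, 2π; length = 100)

# Each iteration draws one frame with the wave shifted a little further
anim = @animate for phase in range(0, 2π; length = 20)
    plot(x, sin.(x .+ phase); ylims = (-1.2, 1.2), legend = false)
end

# Hypothetical output path inside a temporary directory
out = joinpath(mktempdir(), "wave.gif")
gif(anim, out; fps = 10)
```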

Lorenz attractor

The following is the code of the Lorenz attractor as seen in the Julia documentation:
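
The listing itself is not reproduced here. The sketch below is adapted from the Lorenz attractor example in the Plots.jl documentation, with fewer steps to keep it quick. The parameter values, step count and file name should be read as assumptions rather than the original code.

```julia
using Plots

# State and parameters of the Lorenz system
Base.@kwdef mutable struct Lorenz
    dt::Float64 = 0.02
    σ::Float64 = 10.0
    ρ::Float64 = 28.0
    β::Float64 = 8 / 3
    x::Float64 = 1.0
    y::Float64 = 1.0
    z::Float64 = 1.0
end

# Advance the system by one explicit Euler step
function step!(l::Lorenz)
    dx = l.σ * (l.y - l.x)
    dy = l.x * (l.ρ - l.z) - l.y
    dz = l.x * l.y - l.β * l.z
    l.x += l.dt * dx
    l.y += l.dt * dy
    l.z += l.dt * dz
    return l
end

attractor = Lorenz()

# Start with an empty 3D series and fixed axis limits
plt = plot3d(
    1;
    xlim = (-30, 30), ylim = (-30, 30), zlim = (0, 60),
    title = "Lorenz Attractor", legend = false, marker = 2,
)

# Push one new point per step and record every 10th step as a frame
anim = @animate for i in 1:500
    step!(attractor)
    push!(plt, attractor.x, attractor.y, attractor.z)
end every 10

out = joinpath(mktempdir(), "lorenz.gif")   # hypothetical output path
gif(anim, out; fps = 15)
```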

More details about animated plots can be found here.

Various packages for plotting in Julia

Plots.jl is the basic plotting library in Julia. There are other packages for visualization such as -

Gadfly.jl
GoogleCharts.jl
Makie.jl
PyPlot.jl
PGFPlotsX.jl
UnicodePlots.jl and
VegaLite.jl

Conclusion

This article covers the foundations of data manipulation and visualization using Julia.

In the following article, we’ll look at methods to get timeseries data for stock prices and analyse it using the tools presented in this article. Until then, take this article as a building block and explore the aspects you found interesting in detail!

Author: Anshul Tayal

Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti® makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.