This week's R bulletin covers function calls, sorting a data frame, converting Google Finance dates into R date objects, and functions like is.na, na.omit, paste, help, rep, and seq. We hope you like this R weekly bulletin. Enjoy reading!
Shortcut Keys
- To show the Files pane in RStudio - Ctrl+5
- To show the Plots pane in RStudio - Ctrl+6
- To show the Packages pane in RStudio - Ctrl+7
Problem Solving Ideas
Calling a function in an R script
To call a custom-built function that is defined in another R script, you can combine the “exists” function with the “source” function. See the example below:
# Load the function from the script only if it is not already defined
if (!exists("daily_price_data", mode = "function")) source("Stock price data.R")
Here, “exists” checks whether a function named “daily_price_data” is already defined in the current R session. If it is not, “source” runs the “Stock price data.R” script, which defines the function in the current session. We can then call the function as many times as needed in our script by supplying the relevant arguments.
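A self-contained sketch of the same pattern is shown below. It assumes a throwaway helper script written to a temporary directory; the file name and the body of daily_price_data are hypothetical stand-ins for your own "Stock price data.R". Note that exists() searches the current environment and the search path, and source() evaluates the script in the global environment by default.

```r
# Write a hypothetical helper script to a temporary file so the example
# runs anywhere; in practice this would be your own "Stock price data.R"
helper_file = file.path(tempdir(), "stock_price_helpers.R")
writeLines(c(
  "daily_price_data = function(prices) {",
  "  # Hypothetical helper: daily percentage change of a price vector",
  "  100 * diff(prices) / head(prices, -1)",
  "}"
), helper_file)

# Source the script only if the function is not already defined,
# so re-running this script does not reload the file every time
if (!exists("daily_price_data", mode = "function")) {
  source(helper_file)
}

# The sourced function can now be called like any other function
daily_price_data(c(100, 102, 101))
```

Guarding source() with !exists() mainly saves time when the script being sourced is large or slow to run.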
Convert dates from Google Finance to a date object
When we download stock price data from Google Finance, the “DATE” column shows dates in the yyyymmdd format (for example, 20160523). R does not recognize this format as a date; it simply reads the values as numbers. To convert these dates into R date objects, one can use the ymd function from the lubridate package, which parses dates given in year, month, day order. For dates in other orders, lubridate provides the analogous functions ydm, mdy, myd, dmy, and dym.
library(lubridate)
dt = ymd(20160523)
print(dt)
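If you would rather avoid a package dependency, base R's as.Date() can parse the same yyyymmdd values once they are converted to character, using a format string. This is a base R alternative to ymd, not what lubridate does internally, and the vector of dates below is purely illustrative.

```r
# Dates as they might appear in a downloaded "DATE" column (illustrative values)
raw_dates = c(20160523, 20160524, 20160525)

# as.Date() needs character input plus a format:
# %Y = 4-digit year, %m = month, %d = day
dates = as.Date(as.character(raw_dates), format = "%Y%m%d")
print(dates)
# [1] "2016-05-23" "2016-05-24" "2016-05-25"

# Once converted, date arithmetic works as expected
diff(dates)
```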
Sorting a data frame in ascending or descending order
The arrange function from the dplyr package can be used to sort a data frame. Its first argument is the data frame, and the following arguments are the columns to sort by. Rows are sorted in ascending order by default; wrapping a column in desc() sorts it in descending order.
In the example below, we create a two-column data frame containing stock symbols and their respective percentage price changes. We then sort it by the Percent_Change column, first in ascending order and then in descending order.
library(dplyr)

# Create a data frame
Ticker = c("UNITECH", "RCOM", "VEDL", "CANBK")
Percent_Change = c(2.3, -0.25, 0.5, 1.24)
df = data.frame(Ticker, Percent_Change)
print(df)

   Ticker Percent_Change
1 UNITECH           2.30
2    RCOM          -0.25
3    VEDL           0.50
4   CANBK           1.24

# Sort in ascending order
df_ascending = arrange(df, Percent_Change)
print(df_ascending)

   Ticker Percent_Change
1    RCOM          -0.25
2    VEDL           0.50
3   CANBK           1.24
4 UNITECH           2.30

# Sort in descending order
df_descending = arrange(df, desc(Percent_Change))
print(df_descending)

   Ticker Percent_Change
1 UNITECH           2.30
2   CANBK           1.24
3    VEDL           0.50
4    RCOM          -0.25
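For comparison, base R's order() gives the same sort order without loading dplyr. This is an alternative to arrange, not how dplyr implements it. One visible difference: base R keeps the original row names (RCOM stays row 2, for example) instead of renumbering them.

```r
# Same data frame as in the arrange example
Ticker = c("UNITECH", "RCOM", "VEDL", "CANBK")
Percent_Change = c(2.3, -0.25, 0.5, 1.24)
df = data.frame(Ticker, Percent_Change)

# order() returns the row indices that would sort the column;
# indexing the rows with them sorts the whole data frame
df_ascending = df[order(df$Percent_Change), ]
df_descending = df[order(df$Percent_Change, decreasing = TRUE), ]

print(df_ascending)
print(df_descending)

# Optional: renumber the rows 1..n, as arrange does
rownames(df_ascending) = NULL
```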
paste function
The paste function is a very useful function in R; it concatenates (joins) the arguments supplied to it into a character string. The sep argument sets the separator placed between the arguments: the default is a single space, and sep = "" joins them with no space at all.
Example 1: Combining a string of words with the result of a function using paste
x = 20:45
paste("Mean of x is", mean(x), sep = " ")
[1] "Mean of x is 32.5"
Example 2: Creating a filename by passing the directory path, the symbol, and the file extension as arguments to the paste function.
dirPath = "C:/Users/MyFolder/"
symbol = "INFY"
filename = paste(dirPath, symbol, ".csv", sep = "")
print(filename)
[1] "C:/Users/MyFolder/INFY.csv"
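Two related features are worth knowing. paste0() is base R shorthand for paste(..., sep = ""), and paste is vectorized, so a vector of symbols yields one filename per symbol. The collapse argument instead joins the elements of a single vector into one string. The ticker list below is illustrative.

```r
dirPath = "C:/Users/MyFolder/"
symbols = c("INFY", "TCS", "WIPRO")   # illustrative ticker list

# paste0() is shorthand for paste(..., sep = "")
single_file = paste0(dirPath, "INFY", ".csv")
print(single_file)
# [1] "C:/Users/MyFolder/INFY.csv"

# paste0() is vectorized: one filename per symbol
filenames = paste0(dirPath, symbols, ".csv")
print(filenames)

# collapse joins the elements of a vector into a single string
paste(symbols, collapse = ", ")
# [1] "INFY, TCS, WIPRO"
```

Building all filenames at once like this pairs naturally with a loop or lapply() over files to read.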
is.na and na.omit functions
The is.na function tests each value in the given data set and returns TRUE wherever a value is NA (missing), whereas the na.omit function removes the rows that contain NA values from a data frame.
Example: Consider a data frame containing the open and close prices of a stock for each date.
date = c(20160501, 20160502, 20160503, 20160504)
open = c(234, NA, 236.85, 237.45)
close = c(236, 237, NA, 238)
df = data.frame(date, open, close)
print(df)

      date   open close
1 20160501 234.00   236
2 20160502     NA   237
3 20160503 236.85    NA
4 20160504 237.45   238
Let us check whether the data frame has any NA values using the is.na function.
is.na(df)

      date  open close
[1,] FALSE FALSE FALSE
[2,] FALSE  TRUE FALSE
[3,] FALSE FALSE  TRUE
[4,] FALSE FALSE FALSE
As the result shows, the data frame has two NA values. Let us now apply the na.omit function and view the result.
na.omit(df)

      date   open close
1 20160501 234.00   236
4 20160504 237.45   238

As the result shows, the rows containing NA values were dropped, and the resulting data frame contains only rows without any NA values.
These functions are useful for checking large data sets for NA values before running computations on them. NA values propagate through most calculations (for example, mean() returns NA if any input is NA, unless na.rm = TRUE is set), so they should either be removed or replaced with appropriate values.
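The sketch below shows a few common follow-ups on the same price data: counting NA values per column, ignoring them in a summary statistic, and replacing them. For the replacement it assumes a simple "carry the last known price forward" rule implemented by a hypothetical helper, fill_forward; this is only one possible choice, and the zoo package offers a packaged version as na.locf().

```r
# Same illustrative price data as above
date = c(20160501, 20160502, 20160503, 20160504)
open = c(234, NA, 236.85, 237.45)
close = c(236, 237, NA, 238)
df = data.frame(date, open, close)

# Count NA values overall and per column
total_na = sum(is.na(df))            # 2
na_per_column = colSums(is.na(df))   # date 0, open 1, close 1

# Most summary functions return NA if any input is NA...
mean(df$open)                 # NA
# ...unless told to drop the missing values
mean(df$open, na.rm = TRUE)   # 236.1

# Hypothetical helper: replace each NA with the previous value
# (a leading NA stays NA because there is nothing to carry forward)
fill_forward = function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] = x[i - 1]
  }
  x
}

df_filled = df
df_filled$open = fill_forward(df$open)
df_filled$close = fill_forward(df$close)
print(df_filled)
```

Unlike na.omit, filling keeps every date in the series, which matters when later calculations expect one row per trading day.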
rep and seq functions
The rep function repeats its argument a specified number of times, while the seq function generates a regular sequence of numbers. Note that seq takes its start and end values as comma-separated arguments, e.g. seq(1, 5), rather than the colon form 1:5; an optional third argument sets the step size.
rep("Strategy", times = 3)
[1] "Strategy" "Strategy" "Strategy"
rep(1:3, times = 2)
[1] 1 2 3 1 2 3
seq(1, 5)
[1] 1 2 3 4 5
seq(1, 5, 0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
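Both functions take a few more arguments that come up often. In rep, each repeats every element in turn while times repeats the whole vector; in seq, by sets the step (negative steps count down) and length.out fixes how many values to generate. seq also works on Date objects, which is handy for building a date index. The values below are illustrative.

```r
# rep(): 'each' repeats every element in turn, 'times' repeats the whole vector
rep(1:3, each = 2)                 # 1 1 2 2 3 3
rep(c("Buy", "Sell"), times = 2)   # "Buy" "Sell" "Buy" "Sell"

# seq(): 'by' sets the step (negative steps count down),
# 'length.out' fixes how many values to generate
seq(10, 1, by = -3)                # 10 7 4 1
seq(0, 1, length.out = 5)          # 0.00 0.25 0.50 0.75 1.00

# seq() also works on dates, e.g. four consecutive calendar days
calendar_days = seq(as.Date("2016-05-01"), by = "day", length.out = 4)
print(calendar_days)
```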
help and example functions
The help function opens the documentation for a given topic, while the example function runs the examples from that topic's help page. For instance, for the sort function:
help(sort)
example(sort)
To open the help file for a function from a particular package, pass the function name as the first argument to help and the package name as the package argument:
help(barplot, package = "graphics")
Alternatively, you can type a question mark followed by the function name (e.g. ?barplot) and run the command to learn more about the function.
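When you only remember part of a function's name, a couple of other base R helpers can be useful. help.search() (or its shortcut ??) searches the installed help pages for a phrase, while the two calls below work without opening a help window. The "read." pattern is just an illustration, and the exact list returned depends on which packages are attached.

```r
# apropos() lists objects on the search path whose names match a pattern,
# e.g. read.csv, read.table, ... (the result depends on attached packages)
apropos("^read\\.")

# args() shows a function's arguments without opening the full help page
args(paste)
```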
We hope you liked this bulletin. In the next weekly bulletin, we will share more useful tips, techniques, and R functions with our readers.
We have noticed that some users are facing problems when downloading market data from Yahoo Finance and Google Finance. If you are looking for an alternative source of market data, you can use Quandl.