Python

memorize.ai (lvl 286)

Last updated

1 year ago

Date created

Mar 1, 2020

Cards (240)

Section 1

(50 cards)

create a contour plot with 30 contours of Z array? create the same with a filled contour?

Front

plt.contour(Z, 30) plt.contourf(Z, 30)

Back

Create a list of neighbors in range 1 to 9 and two empty numpy arrays for training and test accuracies

Front

neighbors = np.arange(1, 9)
training_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

Back

What does np.meshgrid do?

Front

Replicates 1D coordinate arrays into 2D grid arrays, one per axis, so that a function can be evaluated at every (x, y) combination.

Back
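A minimal sketch of the meshgrid card above, using small made-up coordinate arrays (the values of `x` and `y` are illustrative, not from the deck):

```python
import numpy as np

x = np.array([0, 1, 2])   # three x-coordinates (illustrative)
y = np.array([10, 20])    # two y-coordinates (illustrative)

# X repeats x along each row, Y repeats y down each column.
# Both results have shape (len(y), len(x)).
X, Y = np.meshgrid(x, y)

# Evaluate a function at every grid point, e.g. as input to plt.contour
Z = X + Y
print(X)
print(Y)
print(Z)
```

The resulting `Z` has one value per (x, y) pair, which is the shape that contour-style plotting functions expect.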

What command will save a png from a plot? save as pic

Front

plt.savefig('pic.png')

Back

What kinds of plots are available to the kind argument for a joint plot?

Front

kind = scatter, reg, resid, kde, hex

Back

How do you create a numpy array from a pandas dataframe (df) index? how would you create a list?

Front

numpy array: df.index.values
list: df.index.values.tolist()
Chaining .values onto a pandas object returns a numpy array.

Back

Create a combined violin plot overlaid with a spread-out strip plot: data = tips, x=day, y=tip, with the violin plot in light gray and without the internals, so that it shows just the KDE section.

Front

sns.violinplot(x='day', y='tip', data=tips, inner=None, color='lightgray')
sns.stripplot(x='day', y='tip', data=tips, size=4, jitter=True)
plt.ylabel('tip ($)')
plt.show()

Back

Use knn to score X, y test sets

Front

knn.score(X_test, y_test)

Back

How do you add controls over axes extents - set limits to x-axis to 0 to 5 and y-axis from -10 to 10

Front

plt.axis([0, 5, -10, 10]) Or plt.xlim([0, 5]) plt.ylim([-10, 10])

Back

Read in image X.png and show it without axes

Front

img = plt.imread('X.png') plt.imshow(img) plt.axis('off') plt.show()

Back

What are the arguments to subplot() command?

Front

subplot(nrows, ncols, nsubplot)

Back

What does np.linspace do? What does np.linspace(-2, 2, 3) create? How about np.linspace(-1, 1, 5)?

Front

Creates a 1D array of uniformly spaced values, endpoints included.
np.linspace(-2, 2, 3) gives 3 numbers from -2 to 2: [-2, 0, 2]
np.linspace(-1, 1, 5) gives [-1, -0.5, 0, 0.5, 1]

Back
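The two linspace calls from the card above can be checked directly. This is a small sketch; the variable names `a`, `b`, and `step` are only for illustration:

```python
import numpy as np

a = np.linspace(-2, 2, 3)   # 3 evenly spaced values from -2 to 2
b = np.linspace(-1, 1, 5)   # 5 evenly spaced values from -1 to 1

# With both endpoints included, the spacing is (stop - start) / (num - 1)
step = (1 - (-1)) / (5 - 1)

print(a)
print(b)
print(step)
```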

Import module from sklearn to split data into training and test sets

Front

from sklearn.model_selection import train_test_split

Back

Create a variable intensity from an image saved as img from all color channels

Front

intensity = img.sum(axis=2)

Back

Create a subplot system with 2 rows and one column. Create two strip plots using the auto data set, investigating hp grouped horizontally by cyl. For the second plot, spread out the points and make the point size 3.

Front

# Constructing strip plots
# Make a strip plot of 'hp' grouped by 'cyl'
plt.subplot(2, 1, 1)
sns.stripplot(x='cyl', y='hp', data=auto)
# Make the strip plot again using jitter and a smaller point size
plt.subplot(2, 1, 2)
sns.stripplot(x='cyl', y='hp', data=auto, size=3, jitter=True)
# Display the plot
plt.show()

Back

Create a hexbin graph of mpg(y) and hp(x) with gridsize 15 across and 12 high with a range of mpg: 8-48, hp:40-235

Front

plt.hexbin(hp, mpg, gridsize = (15, 12), extent=(40,235,8,48))

Back

What is machine learning?

Front

Andreas Muller: 'The art and science of giving computers the ability to learn and make decisions from data without being explicitly programmed'

Back

For Univariate data what plot types are appropriate?

Front

Strip Plots, Swarm Plots, Box Plots, Violin Plots

Back

Create a 2D histogram with mpg(y), hp(x) with bins of 20 and 20. limit hp to a range to 40-235 and mpg to 8-48

Front

plt.hist2d(hp, mpg, bins=(20, 20), range=((40, 235), (8, 48)))

Back

Fit a linear regression model on X, y from sklearn?

Front

from sklearn import linear_model reg = linear_model.LinearRegression() reg.fit(X,y)

Back

How do you add a color bar to a plot?

Front

plt.colorbar()

Back

Using matplotlib plot temperature in red and dewpoint in blue against the target variable (t), label x-axis as 'Date' and give title 'Temperature and Dew Point' using subplot

Front

plt.subplot(2, 1, 1)
plt.plot(t, temperature, 'r')
plt.xlabel('Date')
plt.title('Temperature')
plt.subplot(2, 1, 2)
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Dew Point')
plt.tight_layout()
plt.show()

Back

Create a seaborn strip plot for tip amount per day, with day on x axis and tip amount on y-axis from dataframe tips, apply argument to spread out data

Front

sns.stripplot(x='day', y='tip', data=tips, jitter=True)

Back

Import knn classifier from sklearn?

Front

from sklearn.neighbors import KNeighborsClassifier

Back

Annotate a graph with the label 'setosa' placed at location (5.0, 3.5), with text at (4.25, 4.0) and an arrow in red

Front

plt.annotate('setosa', xy=(5.0, 3.5), xytext=(4.25, 4.0), arrowprops={'color': 'red'})

Back

Where is histogram pixel equalization used?

Front

In astronomical and medical images, to get more contrast so that features stand out. This works because equalization uses the full range of intensities.

Back

Using matplotlib plot temperature in red and dewpoint in blue against the target variable (t), label x-axis as 'Date' and give title 'Temperature and Dew Point'

Front

plt.plot(t, temperature, 'r')
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Temperature and Dew Point')
# at this point the plot is held in memory
plt.show()

Back

How do you add a legend to a plot? what are some locations?

Front

plt.legend(loc='upper right')
Locations: 'upper left/center/right', 'center left/right', 'lower left/center/right', 'center', 'right', 'best'

Back

How are violin plots made?

Front

A kernel density estimate (KDE) is wrapped around the boxplot to show where distribution is thicker.

Back

How do you calculate the covariance matrix and the correlation matrix from a dataframe, df?

Front

df.cov() df.corr()

Back

What is the goal of supervised learning?

Front

Automate time-consuming or expensive manual tasks or make predictions about future. Need labeled data

Back

plot residuals from auto data frame, hp on x-axis, mpg on y-axis from auto dataframe in green

Front

sns.residplot(x='hp', y='mpg', data=auto, color='green')

Back

With lots of data should you use strip, swarm, box or violin plots?

Front

boxplots or violin plots are suggested

Back

Create a seaborn swarm plot from the tips data set with day on the x-axis and tip on the y-axis, group/color by sex, and orient it horizontally

Front

sns.swarmplot(x='day', y='tip', data=tips, hue='sex', orient = 'h')

Back

From the dataframe iris, return the column for 'sepal_length' where the column 'species' is equal to setosa

Front

iris.loc[iris['species'] == 'setosa', 'sepal_length']

Back

For Bivariate or Multivariate data what plot types are appropriate?

Front

Joint Plots, Pair Plots, Heat Maps

Back

How do you remove the first 5 characters from a character string for every element in a pandas df column, df.col?

Front

df.col = df.col.str.slice(start=5)

Back

What is a histogram?

Front

Back

Create a seaborn joint plot and output kde, plotting bill and tip from dataframe tips.

Front

sns.jointplot(x='bill', y='tip', data=tips, kind='kde')

Back

You are loading stocks.csv and you want the first column 'Date' to be a time series index. how is it loaded?

Front

pd.read_csv('stocks.csv', index_col=0, parse_dates=True)

Back

With a timeseries index create a new dataframe from the dataframe temperature for march and april of 2010

Front

march_apr = temperature['2010-03':'2010-04']

Back

Basically describe k-nearest neighbors

Front

Predict label of a datapoint by looking at k closest labeled data points and then taking a majority vote

Back
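The majority-vote idea in the card above can be sketched in a few lines of NumPy. This is an illustrative toy, not scikit-learn's `KNeighborsClassifier`; the function name `knn_predict` and the sample data are assumptions made for the example:

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every labeled training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbors and return the most common one
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: three points near (1, 1), two near (5, 5)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(['a', 'a', 'a', 'b', 'b'])

prediction = knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3)
print(prediction)
```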

Use train_test_split to split data into a 70/30 stratified split

Front

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=13, stratify=y)

Back

import seaborn and plot weight by hp with a regression line from the auto dataframe

Front

import seaborn as sns sns.lmplot(x='weight', y='hp', data=auto)

Back

How do you set/change styles in matplotlib.pyplot? What are some styles?

Front

plt.style.use('ggplot')
Styles: classic, seaborn-ticks, seaborn-colorblind, dark_background, grayscale, seaborn-bright, seaborn-dark-palette, seaborn-notebook, seaborn-poster, seaborn-paper, seaborn, seaborn-muted, _classic_test, seaborn-deep, seaborn-whitegrid, seaborn-pastel, seaborn-dark, seaborn-white, ggplot, seaborn-darkgrid, bmh, fivethirtyeight, seaborn-talk

Back

What is a moving window?

Front

Extracts information on longer time scales and aggregates using a summary statistic: average, median, standard deviation

Back
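As a rough illustration of the moving-window card above, a 3-point moving average can be computed with plain NumPy. The series values and window length are made up for the example; with a pandas time series, a rolling window plus a summary statistic would typically play the same role:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative series
window = 3                                     # illustrative window length

# Average each run of `window` consecutive values;
# mode='valid' keeps only positions where the window fits entirely
moving_average = np.convolve(values, np.ones(window) / window, mode='valid')
print(moving_average)
```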

create a seaborn pairplot from the tips dataframe and group by sex

Front

sns.pairplot(tips, hue='sex')

Back

Find the corresponding value in one list to the index of the maximum value in another list. Return value in Y that is max value in X

Front

y[x.argmax()]

Back

Find the max value in a numpy array, x.

Front

x.max()

Back

What does plt.tight_layout() do?

Front

Automatically adjusts subplot parameters so that the subplots fit into the figure area. It improves spacing.

Back

Section 2

(50 cards)

What is the first thing you do with data before trying to build a model?

Front

Split off a test set that is never viewed or used for validation, and use it to report final model accuracy. You want to ensure the model's ability to generalize to unseen data.

Back

What is the motivation for cross validation?

Front

Model performance is dependent on the way the data is split and on the peculiarities of the test set. A single split may not be representative of the model's ability to generalize.

Back

Explain how a ROC curve works, what if you set p=0? p=1? what happens?

Front

Back

What metric is used for multi-class classification models?

Front

Accuracy score and confusion matrix

Back

What argument can be added to xgb.cv to stop training early?

Front

early_stopping_rounds = <number of rounds>
Early stopping tests the model after every boosting round against a hold-out dataset and stops creating additional boosting rounds if the hold-out metric does not improve for the given number of rounds.

Back

Explain Lasso Regression?

Front

Back

In dataframe df, convert all instances of '?' to np.nan

Front

df[df == '?'] = np.nan

Back

Explain Standardizing and Normalizing, how are they different?

Front

Standardization: subtract the mean and divide by the standard deviation (mean = 0, variance = 1).
Normalization: subtract the minimum and divide by the range (min = 0, max = 1).

Back
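A short numeric sketch of the two rescalings named in the card above, on a made-up feature column (the array `x` is illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # illustrative feature values

# Standardization (z-score): subtract the mean, divide by the standard deviation
standardized = (x - x.mean()) / x.std()

# Min-max normalization: subtract the minimum, divide by the range
normalized = (x - x.min()) / (x.max() - x.min())

print(standardized)
print(normalized)
```

After standardization the column has mean 0 and standard deviation 1; after min-max normalization it runs from 0 to 1.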

What is a better metric for a problem suffering from class imbalance?

Front

ROC Curve built from confusion matrix

Back

Why do we use objective functions or loss functions

Front

Quantifies how far off a prediction is from the actual result.
Measures the difference between estimated and true values for some collection of data.
Goal: find the model that yields the minimum value of the loss function.

Back

Create an xgboost DMatrix from X, y

Front

import xgboost as xgb xgb.DMatrix(data=X, label=y)

Back

When using lasso to select coefficients, fit a lasso model on X, y and plot the names of the variables (names) against the coefficients from lasso (lasso_coef).

Front

lasso_coef = lasso.fit(X, y).coef_
plt.plot(range(len(names)), lasso_coef)
plt.xticks(range(len(names)), names, rotation=60)
plt.ylabel('Coefficients')
plt.show()

Back

In RandomizedSearchCV, what is used as an argument instead of param_grid and what additional argument is necessary that is not included in GridSearchCV?

Front

param_distributions, n_iter = number of iterations to stop at

Back

What do you import from sklearn for scaling?

Front

from sklearn.preprocessing import scale or StandardScaler

Back

import Logistic regression from sklearn

Front

from sklearn.linear_model import LogisticRegression

Back

Import and put together the steps to create a GridSearchCV model using ElasticNet, with a mean-strategy imputer and StandardScaler in a pipeline. Train on the training data, hold out 40% for testing, and calculate R^2 on the test set.

Front

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Set up the pipeline steps (SimpleImputer replaces Imputer, which was removed in scikit-learn 0.22)
steps = [('imputation', SimpleImputer(missing_values=np.nan, strategy='mean')),
         ('scaler', StandardScaler()),
         ('elasticnet', ElasticNet())]
# Create the pipeline
pipeline = Pipeline(steps)
# Specify the hyperparameter space
parameters = {'elasticnet__l1_ratio': np.linspace(0, 1, 30)}
# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
# Create the GridSearchCV object
gm_cv = GridSearchCV(pipeline, parameters, cv=3)
# Fit to the training set
gm_cv.fit(X_train, y_train)
# Compute and print the metrics
r2 = gm_cv.score(X_test, y_test)
print("Tuned ElasticNet l1 ratio: {}".format(gm_cv.best_params_))
print("Tuned ElasticNet R squared: {}".format(r2))

Back

What can be imported from sklearn for imputation?

Front

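from sklearn.impute import SimpleImputer
(Older releases used from sklearn.preprocessing import Imputer, which was removed in scikit-learn 0.22.)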

Back

Import XGBoost and instantiate a classifier

Front

import xgboost as xgb xg_cl = xgb.XGBClassifier(objective='binary:logistic', n_estimators = 10, seed=123)

Back

What are some common regression metrics?

Front

Root mean squared error (RMSE), used most often
Mean absolute error (MAE)
Mean absolute percent error (MAPE)

Back

Just because a dataframe has no values as NaN in each column does that mean the dataframe has no missing values?

Front

No, missing values could be labeled 0, ?, missing, etc.

Back

Why do we use GridSearch?

Front

To choose the combination of values for several hyperparameters that leads to the lowest possible loss when those parameters interact in non-obvious and non-linear ways.

Back

Why do we use regularization?

Front

Large coefficients can lead to overfitting. Regularization applies a penalty to the parameters so that they only grow when they are adding value.

Back

What does a recall of 1 mean?

Front

a low threshold in which you have classified all events at the expense of misclassifying with lots of false positives

Back

What are common loss functions used in XGBoost

Front

Regression: reg:linear (renamed reg:squarederror in newer XGBoost releases)
Classification: reg:logistic (just the decision, not the probability)
Classification: binary:logistic (when you want the probability)

Back

import classification report and confusion matrix from sklearn and print output for y_pred, y_test?

Front

from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Back

What is XGBoost?

Front

Optimized gradient boosting machine learning library, originally written in C++. Popular because of its speed and performance: the core algorithm is parallelizable and it consistently outperforms single-algorithm methods.

Back

Explain Ridge Regression?

Front

Back

How is the regression line chosen

Front

By minimizing the loss function. Here the loss is based on the vertical distance between each data point and the line: the sum of squared errors. This is Ordinary Least Squares (OLS), which minimizes the sum of squared residuals.

Back
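A small sketch of the least-squares idea in the card above, using `np.polyfit` on made-up data. The data points and the helper name `sse` are assumptions for illustration; the check at the end only shows that nudging the fitted line makes the sum of squared residuals worse:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # illustrative feature values
y = np.array([1.0, 2.9, 5.2, 6.9])   # illustrative targets

# Degree-1 polynomial fit = ordinary least squares line (slope first, then intercept)
slope, intercept = np.polyfit(x, y, deg=1)


def sse(m, b):
    """Sum of squared vertical distances between the data and the line y = m*x + b."""
    return float(np.sum((y - (m * x + b)) ** 2))


best = sse(slope, intercept)
print(slope, intercept, best)
print(best <= sse(slope + 0.1, intercept))   # a perturbed line fits no better
```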

What is used to create dummy variables in scikit learn and pandas? using the pandas method how do you drop the first dummy variable so that redundant information is not passed to a model?

Front

Scikit-learn: OneHotEncoder()
pandas: pd.get_dummies(df)
To drop the first dummy variable: pd.get_dummies(df, drop_first=True)

Back

What is R^2?

Front

Quantifies the amount of variance in the target variable accounted for by the feature variables

Back
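To make the R^2 definition above concrete, here is a hand computation on made-up values (the arrays `y_true` and `y_pred` are illustrative): R^2 is one minus the ratio of residual variation to total variation in the target.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # illustrative targets
y_pred = np.array([2.5, 5.5, 7.5, 8.5])   # illustrative model predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # variation left unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation in the target

r2 = 1 - ss_res / ss_tot
print(r2)
```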

How do you import mean_squared_error from sklearn?

Front

from sklearn.metrics import mean_squared_error

Back

What are the axes of the ROC curve?

Front

y: True Positive Rate (sensitivity, recall)
x: False Positive Rate (1 - specificity)

Back

Create an imputer instance that replaces NaN values with the column mean

Front

imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit_transform(df)
(SimpleImputer, from sklearn.impute, replaces the Imputer class removed in scikit-learn 0.22; it imputes column-wise.)

Back

What is boosting?

Front

Meta-algorithm. An ensemble method used to convert many weak learners into a strong learner, where the strong learner can be tuned to achieve high accuracy. It works by iteratively learning a set of weak models on subsets of the data and weighting each weak prediction according to each weak learner's performance. The weighted predictions are then combined to obtain a single weighted prediction that is much better than any single prediction.

Back

How is RandomizedSearchCV different from GridSearchCV?

Front

When you are searching over a large space, it can be computationally expensive to use GridSearch. Randomized search creates a (possibly infinite) range of hyperparameter values for each hyperparameter you would like to search over. You set the number of iterations for the search. During each iteration, it randomly draws a value from the specified range for each hyperparameter and trains/evaluates a model. After reaching the maximum number of iterations, it selects the best configuration.

Back

Import a decision tree classifier and Randomized Search CV from sklearn

Front

from sklearn.tree import DecisionTreeClassifier from sklearn.model_selection import RandomizedSearchCV

Back

Supervised Learning?

Front

Labeled data with numeric or categorical features. Numeric features should be scaled (z-score); categorical features should be one-hot encoded.

Back

Create X, y numpy arrays from dataframe df where target variable is outcome.

Front

X = df.drop('outcome', axis=1).values
y = df['outcome'].values

Back

create a cross validation using xgboost using input as an already created dmatrix with the params dictionary as params, 6 folds with 5 trees

Front

xgb.cv(dtrain=dmatrix, params=params, nfold=6, num_boost_round=5, metrics='auc', as_pandas=True, seed=13)

Back

How do you import Ridge Regressor and what are the arguments?

Front

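from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.1, normalize=True)
normalize puts all variables on the same scale. (The normalize argument was removed in scikit-learn 1.2; in newer versions, scale with StandardScaler in a pipeline instead.)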

Back

What does predict_proba return?

Front

Probabilities for each class, in order. For binary classification you get [prob_0, prob_1].

Back

import GridSearchCV from sklearn. how would you get the best params and best score?

Front

from sklearn.model_selection import GridSearchCV model.best_params_ model.best_score_

Back

Import roc_auc_score from sklearn? how can you use it in cross_val_score?

Front

from sklearn.metrics import roc_auc_score
cross_val_score(model, X, y, cv=5, scoring='roc_auc')
(model is the estimator being evaluated)

Back

What can be calculated from a confusion matrix?

Front

Accuracy, precision, recall (sensitivity), specificity
F1 score: 2 * (precision * recall) / (precision + recall)
High precision: not many real emails predicted as spam
High recall: predicted most spam emails correctly

Back
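The metrics listed in the card above follow directly from the four cells of a binary confusion matrix. A sketch with made-up counts (the numbers are hypothetical, with spam as the positive class):

```python
# Hypothetical confusion-matrix counts for a spam classifier
tp = 40    # spam correctly flagged as spam
fp = 10    # real mail wrongly flagged as spam
fn = 20    # spam that slipped through as real mail
tn = 930   # real mail correctly left alone

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)       # of everything flagged as spam, how much was spam
recall = tp / (tp + fn)          # of all actual spam, how much was caught
specificity = tn / (tn + fp)     # of all real mail, how much was left alone
f1 = 2 * (precision * recall) / (precision + recall)

print(accuracy, precision, recall, specificity, f1)
```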

What's most common metric for binary classification?

Front

ROC_AUC

Back

Using pandas how could you drop all rows from dataframe df that contain NaN in any column?

Front

df.dropna()

Back

In a pipeline consisting of several steps, how would you create a parameters dictionary to set n_neighbors to a range from 1 to 50 for knn?

Front

parameters = {'knn__n_neighbors': np.arange(1, 50)}
A double underscore after the step name (knn) connects it to the knn parameter.

Back

Why is standardizing necessary?

Front

With algorithms that use distance to inform them, then features on larger scales can unduly influence the model. We normalize (center and standardize) to address this

Back

Import modules and run a 5-fold cross validation linear regression on X, y and store in cv_results

Front

from sklearn.model_selection import cross_val_score from sklearn.linear_model import LinearRegression reg = LinearRegression() cv_results = cross_val_score(reg, X, y, cv=5)

Back

What is class imbalance?

Front

When one class is much more frequent than another. Example: with email that is 99% real mail and 1% spam, you could call everything real mail and be 99% correct, but this is not helpful.

Back
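The spam example in the card above can be reproduced in a few lines of plain Python. The label lists are constructed to match the 99%/1% split described; the "classifier" here simply predicts the majority class every time:

```python
# 99 real emails and 1 spam email, as in the card's example
y_true = ['real'] * 99 + ['spam'] * 1

# A useless classifier that labels everything as real mail
y_pred = ['real'] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the spam class: fraction of actual spam that was caught
spam_caught = sum(t == 'spam' and p == 'spam' for t, p in zip(y_true, y_pred))
spam_recall = spam_caught / y_true.count('spam')

print(accuracy)      # high accuracy...
print(spam_recall)   # ...but no spam is ever detected
```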

Section 3

(50 cards)

For gradient descent, if the slope is positive how do the weights change?

Front

The weights decrease. Instead of just subtracting the slope from the current value, which might take too big a step, the solution is to use a learning rate and update each weight by subtracting learning_rate * slope.

Back

What is the neg_mean_squared_error metric in sklearn?

Front

It's sklearn's way of exposing mean_squared_error in an API-compatible way. Scorers follow a higher-is-better convention, so the loss is negated.

Back

How do you save a keras model and reload it? What file format is necessary so that you do not need to save architecture and weights separately?

Front

from keras.models import load_model
model.save('my_model.h5')
my_model = load_model('my_model.h5')
HDF5 is the file type

Back

How does forward propagation work in neural networks?

Front

The input array is used in a dot product with the weight array for each node; the result is fed into an activation function to provide the output of the node. The outputs of the hidden-layer nodes are the inputs to the next layer's weight arrays, which in turn provide the input to the output node.

Back
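A tiny worked example of the forward pass described above: two inputs, one hidden layer of two nodes with a ReLU activation, and a single output node. All weights and inputs are made-up numbers, and the dictionary keys are illustrative names:

```python
import numpy as np


def relu(value):
    """Rectified linear activation: 0 for negative inputs, identity otherwise."""
    return max(0.0, value)


input_data = np.array([2.0, 3.0])          # illustrative input row

# Hypothetical weights: one array per hidden node, one for the output node
weights = {
    'node_0': np.array([1.0, 1.0]),
    'node_1': np.array([-1.0, 1.0]),
    'output': np.array([2.0, -1.0]),
}

# Hidden layer: dot product of inputs and weights, then the activation
node_0 = relu(input_data.dot(weights['node_0']))   # 2*1 + 3*1 = 5
node_1 = relu(input_data.dot(weights['node_1']))   # 2*-1 + 3*1 = 1
hidden_layer = np.array([node_0, node_1])

# Output node: dot product of hidden-layer outputs and output weights
output = hidden_layer.dot(weights['output'])       # 5*2 + 1*-1 = 9
print(output)
```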

For the column df.lot, fill missing values with 0.

Front

df['lot'] = df['lot'].fillna(0)
(fillna returns a new Series, so assign the result back)

Back

What is hierarchical clustering?

Front

Clusters that contain one another: Animals --> Mammals + Reptiles; Mammals --> humans + apes + rodents. Hierarchical clustering is divided into agglomerative and divisive clustering.

Back

What activation function just returns the node value?

Front

Identity function

Back

Model capacity is the ability to learn features, how is model capacity increased? What sequence should be followed on a new project?

Front

By adding layers or nodes; both add model complexity. Start with one layer with a small number of nodes, then add nodes, followed by layers, until the error stops improving, and then step back.

Back

What is forward propagation and backpropagation?

Front

Forward propagation sends input data through the hidden layers to the output layer. Backpropagation takes the error and propagates it back through the hidden layers toward the input layer. It calculates the necessary slopes sequentially, from the weights closest to the prediction, through the hidden layers, and eventually back to the weights coming from the inputs. These slopes are then used to update the weights. You always do forward propagation first to get the error that is used in backpropagation.

Back

What is a typical learning rate? If the slope of the mean squared loss function is -24 and the weight was 2, what would the updated weight be?

Front

0.01
2 - (0.01 * -24) = 2.24

Back
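The arithmetic in the card above, written out as the update rule new_weight = weight - learning_rate * slope (variable names are illustrative):

```python
learning_rate = 0.01
weight = 2.0
slope = -24.0   # slope of the loss with respect to this weight

# A negative slope means increasing the weight lowers the loss,
# so subtracting learning_rate * slope moves the weight up
updated_weight = weight - learning_rate * slope
print(updated_weight)
```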

How do you create a dictionary from a dataframe (df) with columns as lists associated with the column name as key?

Front

df.to_dict('list')
(df.to_dict('records') instead returns a list with one dictionary per row)

Back

fit a KMeans model with an expected 3 clusters on Array and get the labels

Front

model = KMeans(n_clusters=3) model.fit(Array) labels = model.predict(Array)

Back

What is the dying neuron problem?

Front

When a neuron takes a value of zero for all rows in your data. Once a node starts always getting negative inputs, it may continue getting only negative inputs and thus contribute nothing to the model. Setting the learning rate appropriately can address this issue. Alternatively, use more advanced ReLU variants such as LeakyReLU, which have a gradient below zero: LeakyReLU = max(0.1x, x)

Back

What argument is required in the first dense layer of a Sequential model?

Front

input_shape = (number_of_columns, )

Back

What is used to verify a loaded model structure?

Front

model.summary()

Back

How do you import and use Early Stopping in keras? What is a typical patience?

Front

from keras.callbacks import EarlyStopping early_stopping = EarlyStopping(patience=2) patience of 2-3 is typical as this is for epochs and if improvement doesn't occur with the 3rd pass through the data then it probably will not.

Back

What does StandardScaler do?

Front

transforms each feature to have mean 0 and variance 1

Back

You have created a dataframe with cluster labels and the target value. Create a crosstabulation of label and target from the dataframe df.

Front

pd.crosstab(df['label'], df['target'])

Back

After running GridSearchCV or RandomizedSearchCV (some_model), how can you print out the best parameters and best score?

Front

some_model.best_score_ some_model.best_params_

Back

What linear algebra process is used during forward propagation?

Front

Dot product between weights and input

Back

How do you import kmeans from sklearn?

Front

from sklearn.cluster import KMeans

Back

What is the vanishing gradient problem?

Front

Vanishing gradients occur when many layers have very small slopes because they sit on the flat part of the tanh curve. In deep networks, the weight updates computed by backpropagation then end up close to zero. ReLU and its variations address the vanishing gradient problem.

Back

What is overfitting and underfitting?

Front

In overfitting your model will fit oddities in training data due to happenstance that are not generalizable. Underfitting, the model fails to find important features in the training data.

Back

where is early_stopping placed in a keras flow? How do you include a validation split?

Front

model.fit(predictors, target, epochs=20, validation_split=0.3, callbacks=[early_stopping])

Back

If you have gone through 4 iterations of calculating slopes (using backward propagation) and then updated weights, how many times must you have done forward propagation?

Front

4

Back

What is unsupervised learning?

Front

Finds patterns in data. It falls into clustering and dimension reduction.
Clustering example: cluster customers by their purchases.
Dimension reduction example: compressing data using purchase patterns.

Back

import and use standard scaler on the samples array to create samples_scaled

Front

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(samples)
samples_scaled = scaler.transform(samples)

Back

Why do we use activation functions? Give example using banking transactions with changes in the number of children you have.

Front

These functions allow us to capture non-linearities. Example with banking transactions. Going from 1 child to 2 children could impact your banking transactions differently than going from 3-4 children.

Back

What is gradient descent?

Front

Start at a random point (randomized weights). Find the slope (derivative) of the loss function at that point and take a step in the opposite direction to move downhill. Continue the process until you are somewhere flat.

Back
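A minimal sketch of the gradient descent loop described above, on a made-up one-parameter loss (w - 3)^2 whose minimum is at w = 3. The loss, starting point, learning rate, and step count are all illustrative choices:

```python
def slope(w):
    """Derivative of the illustrative loss (w - 3)**2 with respect to w."""
    return 2 * (w - 3)


w = 0.0               # starting point (would normally be random)
learning_rate = 0.1   # illustrative step size

for _ in range(100):
    # Step in the direction opposite to the slope, i.e. downhill
    w = w - learning_rate * slope(w)

print(w)   # ends very close to the minimum at w = 3
```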

What is used to measure cluster quality when no labels are present? How can you get this measurement?

Front

Inertia: measures how spread out the clusters are; lower is better. It measures the distance from each sample to the centroid of its cluster.
model.inertia_

Back
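To show what the inertia number in the card above measures, here is a hand computation on four made-up points with fixed cluster assignments. This sketch computes the sum of squared distances from each sample to its cluster centroid, which is the quantity scikit-learn's KMeans reports as `inertia_`; the data and labels are illustrative:

```python
import numpy as np

samples = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])   # illustrative cluster assignments

# Centroid of each cluster = mean of the samples assigned to it
centroids = np.array([samples[labels == k].mean(axis=0) for k in range(2)])

# Sum of squared distances from each sample to its own centroid
inertia = np.sum((samples - centroids[labels]) ** 2)

print(centroids)
print(inertia)
```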

What's the difference between StandardScaler and Normalizer?

Front

StandardScaler standardizes features by removing the mean and scaling to unit variance. Normalizer rescales each sample independently of the others.
StandardScaler acts column-wise.
Normalizer acts row-wise, which is useful for sparse datasets.

Back

How is unsupervised different from supervised?

Front

Supervised learning finds patterns for a prediction task using labels. Unsupervised learning finds patterns without labels or guidance.

Back

From a clustering KMeans clustering model (model), how can you output the centroids?

Front

model.cluster_centers_

Back

What is a loss function?

Front

Aggregates errors in predictions from many data points into a single number. This is a measure of the model's predictive performance. Goal is to minimize the loss function

Back

What are batches and epochs in stochastic gradient descent?

Front

In stochastic gradient descent, slopes are calculated on a subset of the data, a batch. Use a different batch of data to calculate the next update. Start over from the beginning once all data is used. Each time through the training data is called an epoch.

Back
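The batch/epoch bookkeeping described above can be sketched as two nested loops. The data, batch size, and epoch count are made up, and the actual slope calculation is replaced by a counter; real implementations typically also shuffle the data each epoch:

```python
data = list(range(10))   # ten illustrative training rows
batch_size = 4
n_epochs = 2

updates = 0
batches_seen = []

for epoch in range(n_epochs):                       # one epoch = one full pass over the data
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]      # subset used for this update
        batches_seen.append(batch)
        updates += 1                                # slopes would be computed on `batch` here

print(updates)
print(batches_seen[:3])
```

With 10 rows and a batch size of 4, each epoch consists of three batches (4, 4, and 2 rows), so two epochs give six weight updates.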

What does adding nodes to a hidden layer allow a neural network to account for?

Front

More Interactions

Back

How are feature variances important to kmeans clustering?

Front

in kmeans: feature variance = feature influence, so larger variance features will dominate

Back

How are NN weights updated after an epoch using previous weights, the learning rate and the calculated slope

Front

updated = weights - learning_rate * slope
For mean squared error: weights - learning_rate * (2 * input_data * error)

Back

What is log loss or crossentropy?

Front

Back

How can you use inertia to select the number of clusters?

Front

Plot inertia (y) by number of clusters(x), look for elbow in the plot

Back

What is the difference between sigmoid and softmax activation functions?

Front

Back

Explain ReLU activation.

Front

ReLU (Rectified Linear Activation) is composed of two linear pieces:
ReLU(x) = 0 if x < 0, x if x >= 0

Back
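The two-piece definition above maps directly onto `np.maximum`. A short sketch with illustrative inputs:

```python
import numpy as np


def relu(x):
    """Return 0 where x is negative and x itself elsewhere."""
    return np.maximum(0, x)


activations = relu(np.array([-2.0, 0.0, 3.0]))
print(activations)
```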

How does regression work, start with average?

Front

Regression starts with the average and then adds the effect of age, weight, other variables

Back

Why is kfold cross validation rarely used in deep learning?

Front

Deep learning is typically applied to large data sets in which repeated training is too time consuming. Also with large data the validation set can be large and representative so it is trusted.

Back

What is required in the compile layer (2 arguments)?

Front

model.compile(optimizer = opt , loss = loss_function)

Back

Using keras how do you convert a categorical target (result) to a Onehot encoded matrix?

Front

from keras.utils import to_categorical target = to_categorical(result)

Back

Which layers of a NN capture complex or 'higher level' interactions?

Front

The later layers. Earlier layers capture simple interactions and these are built upon

Back

If your predictions were all exactly right, and your errors were all exactly 0, the slope of the loss function with respect to your predictions would also be 0. In that circumstance, which of the following statements would be correct? The updates to all weights in the network would also be 0. The updates to all weights in the network would be dependent on the activation functions. The updates to all weights in the network would be proportional to values from the input data.

Front

The updates to all weights in the network would also be 0.

Back

What is the difference between LabelEncoder and OneHotEncoder? How can these be used in a pipeline?

Front

LabelEncoder converts a column of strings into integers OneHotEncoder: takes column of integers and encodes them as dummy variables These cannot currently be used in a pipeline

Back

What needs to be imported from scipy to perform hierarchical clustering? Then create a cluster object called mergings using samples as input.

Front

from scipy.cluster.hierarchy import linkage, dendrogram mergings = linkage(samples, method='complete')

Back

Section 4

(50 cards)

What is the advantage of dimension reduction?

Front

More efficient storage and computation. It can remove less-informative noise features, which cause issues with regression/prediction.

Back

What is the autocorrelation function?

Front

ACF - shows autocorrelation function for all lags. Any significant non-zero values implies that you can forecast from the past

Back

How can you get a list of acf values?

Front

from statsmodels.tsa.stattools import acf acf(x)

Back

What is NMF and how are its models different from PCA?

Front

Non-negative matrix factorization. Unlike PCA models, NMF models are interpretable, which makes them easy to explain to others.

Back

Import the necessary library and plot the acf for series x up to 20 lags with a confidence interval of 0.05?

Front

from statsmodels.graphics.tsaplots import plot_acf plot_acf(x, lags=20, alpha=0.05)

Back

What do the axes of a TSNE plot mean?

Front

The axes are not interpretable - t-SNE features are different every time - stochastic NOT deterministic

Back

What is PCA?

Front

Principal Component Analysis. Fundamental dimension reduction technique. First step is decorrelation, second step reduces dimension. The decorrelation step rotates samples to be aligned with the axes and shifts data samples so they have a mean of 0.

Back

import t-SNE and apply it to samples with a learning rate of 100

Front

from sklearn.manifold import TSNE model = TSNE(learning_rate=100) transformed = model.fit_transform(samples)

Back

How do you create an array with two rows and three columns from a 1d numpy array?

Front

array.reshape( (2,3) )

Back

What is necessary to import from scipy to extract cluster labels from hierarchical clustering using input mergings and a distance of 15?

Front

from scipy.cluster.hierarchy import fcluster labels = fcluster(mergings, 15, criterion='distance')

Back

What is autocorrelation?

Front

Correlation between a variable and a lagged version of itself, also called serial correlation. Usually means the lag-1 correlation.

Back
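
As a rough illustration of the definition above, here is a pure-Python sketch of the lag-k autocorrelation, computed as the Pearson correlation between a series and a shifted copy of itself. The helper name `autocorr` and the two sample series are made up for this example; in pandas the usual call would be `Series.autocorr()`.

```python
def autocorr(values, lag=1):
    """Pearson correlation between values[lag:] and values[:-lag]."""
    x = values[lag:]
    y = values[:-lag]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical series: one that keeps moving the same way, one that flips sign.
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(autocorr(trending))      # strongly positive lag-1 autocorrelation
print(autocorr(alternating))   # strongly negative lag-1 autocorrelation
```

The trending series has positive lag-1 autocorrelation, and the alternating series has negative lag-1 autocorrelation.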

What is the statistical test for Random Walk with drift?

Front

Random walk with drift: Pt = Mu + Pt-1 + Et. Regression test: Pt - Pt-1 = Alpha + Beta*Pt-1 + Et. H0: Beta = 0 (random walk). H1: Beta < 0 (not a random walk). This is the Dickey-Fuller test.

Back

What does MaxAbsScaler do to the data?

Front

Transforms the data so that all users have the same influence on the model regardless of how many different artists they have listened to. Scales each feature by its maximum absolute value. This estimator scales and translates each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.

Back
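
To make the scaling rule concrete, the following is a minimal pure-Python sketch of what "scale each feature by its maximum absolute value" means. It is not scikit-learn's implementation; the function name `max_abs_scale` and the sample rows are assumptions for illustration.

```python
def max_abs_scale(rows):
    """Divide every column by its largest absolute value (no centering)."""
    n_cols = len(rows[0])
    col_max = [max(abs(row[j]) for row in rows) for j in range(n_cols)]
    # An all-zero column is left as zeros instead of dividing by zero.
    return [
        [row[j] / col_max[j] if col_max[j] else 0.0 for j in range(n_cols)]
        for row in rows
    ]


# Hypothetical data: three samples, three features (the last one is all zeros).
data = [[1.0, -20.0, 0.0], [2.0, 10.0, 0.0], [-4.0, 5.0, 0.0]]
scaled = max_abs_scale(data)
for row in scaled:
    print(row)
```

Zeros stay zero after scaling, which is why this kind of scaling keeps sparse data sparse.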

What's the difference between single linkage and complete linkage?

Front

Complete: distance between clusters is the distance between their furthest points. Single: distance between clusters is the distance between their closest points.

Back
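
The two definitions can be compared directly on a toy example. This is a pure-Python sketch (not the scipy `linkage` routine); the function names and the two small clusters are invented for illustration.

```python
from itertools import product
from math import dist


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the furthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical clusters lying on the x-axis.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)      # closest pair: (1, 0) and (4, 0)
complete = complete_linkage(cluster_a, cluster_b)  # furthest pair: (0, 0) and (6, 0)
print(single, complete)
```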

In a dendrogram, what defines the distance between clusters?

Front

Distance is defined by the linkage method. Complete: distance between clusters is the maximum distance between their samples.

Back

What is the interpretation of NMF Components applied to documents? What is the interpretation applied to images?

Front

When applied to text, NMF is one method of topic modeling. When applied to images, NMF components are parts of images.

Back

When looking at correlation between two stocks should you evaluate their returns or levels?

Front

Returns, because correlating the levels of two trending series can be misleading. Example: the DJI and UFO sightings both trend up over time, but there is no relationship between the two series.

Back

Can you forecast a random walk?

Front

No, best guess for tomorrow's price is today's price.

Back

Import PCA from sklearn and fit on samples data set?

Front

from sklearn.decomposition import PCA model = PCA() model.fit(samples)

Back

Calculate the daily change in the df.Price column as a new column called df.Change and calculate its autocorrelation?

Front

df['Change'] = df['Price'].diff() autocorrelation = df['Change'].autocorr()

Back

What is White Noise?

Front

A Time Series with: - constant mean - constant variance - zero autocorrelations at all lags

Back

Calculate the autocorrelation of df.Col1?

Front

df.Col1.autocorr()

Back

How can you calculate the percent change or the difference between rows of a dataframe column?

Front

df['col'].pct_change() df['col'].diff()

Back

What must be true for the input and output of NMF?

Front

The input data must be non-negative, and the NMF features and components it produces are non-negative as well. This applies to both word counts and images.

Back

After pca.fit(samples), how can you get the number of PCs and the explained variance?

Front

pca.n_components_ pca.explained_variance_

Back

How do you create a 1D array from a numpy array with multiple rows?

Front

array.flatten()

Back

What is the preferred way to save a sparse matrix in Python instead of as a numpy array? Because sklearn's PCA doesn't support csr_matrix, what is used instead?

Front

scipy.sparse.csr_matrix. A csr_matrix remembers only the non-zero entries to save space. In sklearn, use TruncatedSVD: it performs the same transformation as PCA but will accept a csr_matrix as input.

Back

What does a positive or negative autocorrelation mean?

Front

With returns: positive autocorrelation = trend following (an increase yesterday means an increase today); negative autocorrelation = mean reverting (an increase yesterday means a decrease today).

Back

What does t-SNE stand for?

Front

t-distributed stochastic neighbor embedding. Maps samples from high-dimensional space to 2D or 3D. The map approximately preserves nearness of samples. Great for inspecting datasets.

Back

How do you create an array of unique values from a pandas column in the dataframe df?

Front

df.column.unique()

Back

How can you calculate similarity of a vector to a dataframe of vectors?

Front

df.dot(vector)

Back

Convert the column Price currently of type string to float in the df dataframe?

Front

df['Price'] = df['Price'].astype(float)

Back

How do you import NMF, what type of arrays will it take and what must be specified every time?

Front

from sklearn.decomposition import NMF. NMF works with NumPy arrays and csr_matrix. You must specify n_components every time.

Back

Calculate the correlation between two columns of dataframe df, col1, col2?

Front

df.col1.corr(df.col2)

Back

What is the learning rate argument for TSNE?

Front

Learning rate is a parameter to be tuned - the wrong choice will have points bunched together. Try values between 50 and 200

Back

What is the intrinsic dimension of a dataset?

Front

The number of features needed to approximate the dataset. What is the most compact representation - pca can tell us.

Back

Merge stocks and bonds dataframes that both have datetime indexes to generate a dataframe for days in which both markets are open?

Front

stocks.join(bonds, how='inner')

Back

What is the benefit of PCA?

Front

Creates non-correlated features: PCA features are not linearly correlated. PCA aligns principal components along directions of maximal variance.

Back

How is time series data different from cross-sectional data?

Front

Time series data is ordered in time, whereas cross-sectional data is taken at one point in time or where time is meaningless. Example of a time series: Google searches for "diet" are low during the holidays, then spike with the new year.

Back

How can PCA identify intrinsic dimension?

Front

By counting the PCA features with significant variance.

Back

Create a simple linear regression from Statsmodels for variables, x and y

Front

import statsmodels.api as sm x = sm.add_constant(x) result = sm.OLS(y, x).fit()

Back

What does the (numeric) y-axis on a dendrogram mean?

Front

height on dendrogram = distance between merging clusters

Back

In pandas how can you turn an index into a datetime index?

Front

df.index = pd.to_datetime(df.index)

Back

Use pandas to resample a dataframe from daily to weekly

Front

df = df.resample(rule='W').last()

Back

What does an alpha = 0.05 mean as a confidence interval for ACF? what should alpha be set to if you do not want to see confidence bands?

Front

There is a 5% chance that, if the true autocorrelation is zero, it will fall outside the blue band. alpha = 1.0 will remove the CI bands.

Back

Create a dendrogram from mergings, with the list companies for labels, rotate 90, font size=6

Front

dendrogram(mergings, labels=companies, leaf_rotation=90, leaf_font_size=6)

Back

What can be used to compare documents using NMF values for a recommender system?

Front

cosine similarity

Back
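
For reference, cosine similarity can be sketched in a few lines of plain Python. The vectors below stand in for NMF feature rows of three documents and are invented for this example; the function name `cosine_similarity` is likewise just an illustrative choice.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical NMF features for three documents.
doc_a = [0.5, 0.0, 0.5]
doc_b = [1.0, 0.0, 1.0]   # same topic mix as doc_a, just a longer document
doc_c = [0.0, 2.0, 0.0]   # a completely different topic

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Because only the direction of the vectors matters, a short and a long document on the same topics still score as highly similar.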

What needs to be imported to execute the augmented dickey-fuller test on X?

Front

from statsmodels.tsa.stattools import adfuller adfuller(X)

Back

What is the relationship between r-squared and correlation?

Front

correlation^2 = rsquared sign(corr) = sign(regression slope)

Back
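
The relationship on this card can be checked numerically for a simple regression with an intercept. The sketch below uses hand-written helpers (`pearson`, `ols_fit`) and made-up data; it is not the statsmodels API.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def ols_fit(x, y):
    """Least-squares line y = intercept + slope * x; returns (slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, 1 - ss_res / ss_tot


# Hypothetical data with a downward trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

corr = pearson(x, y)
slope, r_squared = ols_fit(x, y)
print(corr ** 2, r_squared)   # these two agree
print(corr < 0, slope < 0)    # and the signs match
```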

What is the equation for Random Walk?

Front

Today's price is equal to yesterday's price plus some noise. Pt = Pt-1 + Et

Back
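
A small simulation shows the equation in action. This sketch assumes Gaussian noise, a starting price of 100, and a fixed seed, none of which come from the card. It builds the walk step by step and then confirms that the day-to-day changes are just the noise terms.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Et: the noise added each day (assumed standard normal here).
noise = [random.gauss(0, 1) for _ in range(250)]

# Pt = Pt-1 + Et, starting from an arbitrary price of 100.
prices = [100.0]
for e in noise:
    prices.append(prices[-1] + e)

# First differences of the walk recover the noise series.
changes = [today - yesterday for yesterday, today in zip(prices, prices[1:])]
print(len(prices), len(changes))
```

Since the changes are pure noise, nothing in past prices helps predict the next change.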

Section 5

(40 cards)

Interpret positive and negative values of Phi in an AR model?

Front

negative Phi: Mean Reversion positive Phi: Momentum

Back

What is a good example of Cointegration?

Front

Dog on a leash: Pt = owner position, Qt = dog position. Both series look like random walks, but the difference (distance) between them looks mean reverting. If the dog falls behind, it gets pulled forward; if the dog gets too far ahead, it gets pulled back. Likewise, heating oil and natural gas are both random walks, but the spread (difference) is mean reverting.

Back

Import a non-linear support vector classifier and fit to X,y and score x_test, y_test

Front

from sklearn.svm import SVC svm = SVC() svm.fit(X,y) svm.score(x_test, y_test)

Back

What is different between the forecast plot for AR and MA models in Python?

Front

For an MA(1) model, the forecast is the same (the mean) for every step after the one-step-ahead prediction, whereas an AR forecast decays gradually toward the mean.

Back

You run the adfuller test and get a p-value of 0.583, how is this interpreted?

Front

The p-value is high enough that the null hypothesis is not rejected, so we cannot reject that the series is a random walk.

Back

What is the loss function for LinearRegression?

Front

Minimize the sum of squared errors: SUM (Xi - X^i)^2, where X^i is the predicted value.

Back

For a MA model what if Theta is zero?

Front

White noise

Back

You have created an ARMA model in result, plot and predict new data

Front

result.plot_predict(start=0, end=2022)

Back

If you have a model, result, how can you print the AIC and BIC?

Front

result.aic result.bic

Back

What is a decision boundary?

Front

the surface separating different predicted classes

Back

How can you calculate a cumulative sum in an array from a list of values where each element in the array adds the next value in the list?

Front

numpy.cumsum()

Back
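
If NumPy is not at hand, the same running total can be sketched with the standard library. Note that this swaps `numpy.cumsum` for `itertools.accumulate`; the list `steps` is a made-up input.

```python
from itertools import accumulate

steps = [1, 2, 3, 4]

# Each element is the sum of all values up to and including that position.
running_total = list(accumulate(steps))
print(running_total)  # [1, 3, 6, 10]
```

`numpy.cumsum(steps)` would return the same values as a NumPy array.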

What is the interpretation of negative and positive theta for an MA(1)?

Front

Negative Theta: One-period Mean reversion Positive Theta: One-period Momentum

Back

What is cointegration?

Front

Even if two series are random walks, their linear combination (Pt - cQt) may not be a random walk. If that's true Pt - cQt is forecastable and they are said to be co-integrated

Back
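
A toy simulation can illustrate the idea. This is a sketch under strong assumptions: the owner's position is a Gaussian random walk, the dog stays within a random "leash" distance of the owner, and c = 1. None of these choices come from the cards, and a real analysis would use a statistical test rather than comparing variances.

```python
import random
from statistics import pvariance

random.seed(1)

# Pt: the owner's position follows a random walk.
owner = [0.0]
for _ in range(1000):
    owner.append(owner[-1] + random.gauss(0, 1))

# Qt: the dog is always the owner's position plus a bounded-looking random gap.
leash = [random.gauss(0, 1) for _ in owner]
dog = [p + gap for p, gap in zip(owner, leash)]

# With c = 1, the spread Pt - Qt is just the (negated) leash gap.
spread = [p - q for p, q in zip(owner, dog)]

print(pvariance(owner))   # wanders widely
print(pvariance(spread))  # stays small: the spread keeps returning to zero
```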

What is the AR model if Phi is 1? 0?

Front

1 = Random Walk 0 = White Noise

Back

You have fit an ARMA model and saved it in result. Print out the parameters from result.

Front

result.params

Back

Describe the moving average model and the equation?

Front

In an MA model, today's value equals a mean plus noise plus a fraction of yesterday's noise: Rt = Mu + Et + Theta*Et-1

Back
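
The equation can be simulated directly. The sketch below assumes Theta = 0.5, Mu = 0, and standard normal noise (all illustrative choices). It compares the sample autocorrelations with the textbook MA(1) result: lag-1 autocorrelation equals Theta / (1 + Theta^2), and higher lags are zero.

```python
import random


def autocorr(values, lag):
    """Pearson correlation between values[lag:] and values[:-lag]."""
    x, y = values[lag:], values[:-lag]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


random.seed(42)
theta, mu, n = 0.5, 0.0, 20000

# Rt = Mu + Et + Theta * Et-1
noise = [random.gauss(0, 1) for _ in range(n + 1)]
returns = [mu + noise[t] + theta * noise[t - 1] for t in range(1, n + 1)]

lag1 = autocorr(returns, 1)
lag2 = autocorr(returns, 2)
theoretical_lag1 = theta / (1 + theta ** 2)

print(lag1, theoretical_lag1)  # close to each other
print(lag2)                    # close to zero
```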

If the AR1 parameter is 0.9, then the autocorrelation is 0.9 for the first lag, what is the autocorrelation for lag2? lag3?

Front

(0.9)^2 = 0.81 (0.9)^3 = 0.729

Back
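
The pattern on this card generalizes: for a stationary AR(1) model the lag-k autocorrelation is Phi raised to the power k. A short sketch, using the 0.9 value from the card:

```python
phi = 0.9

# Autocorrelation of an AR(1) process at lags 0 through 3: phi ** lag.
acf_values = [phi ** lag for lag in range(4)]

for lag, value in enumerate(acf_values):
    print(lag, round(value, 3))
```

The autocorrelation decays geometrically, which is why an AR(1) ACF plot tails off gradually rather than cutting off.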

What does linearly separable mean?

Front

a data set can be perfectly explained by a linear classifier

Back

What must Phi be for stationarity?

Front

-1 < Phi < 1

Back

import package and plot pacf with 20 lags and an alpha of 0.05

Front

from statsmodels.graphics.tsaplots import plot_pacf plot_pacf(x, lags=20, alpha=0.05)

Back

how do you make a random walk stationary?

Front

Differencing: df.Col.diff() will give the first difference.

Back

What is the equation and description of an Autoregressive model?

Front

Today's value equals a mean plus a fraction of yesterday's value plus noise: Rt = Mu + Phi*Rt-1 + Et

Back

We have data with quarterly seasonality, how can we take the correct difference - 4?

Front

Seasonal.Quarter.diff(4)

Back

If you only have a lag of 1 in an AR model what is this called?

Front

AR(1) Model or AR Model of order 1.

Back

What are some examples of Non-stationary series?

Front

Random Walk Seasonal Data - mean varies with time of year Trending Data - mean increases over time

Back

What is an ARMA model?

Front

A combination of the AR and MA models: Rt = Mu + Phi*Rt-1 + Et + Theta*Et-1

Back

What is stationarity?

Front

Like the independence assumption in regression. Strong stationarity in time series means that the joint distribution of the observations is not dependent on time. Weak stationarity: mean, variance and autocorrelation are time-invariant.

Back

How do you address a multiplicative increase and seasonal differencing at the same time for df.trend_season?

Front

np.log(df.trend_season).diff(4)

Back
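
To see why the log is taken before the seasonal difference, consider a toy quarterly series that repeats a seasonal pattern and grows by a fixed factor each year. The pattern, the 10% growth factor, and the variable names below are assumptions for illustration; the code uses plain lists rather than pandas.

```python
from math import log

seasonal_pattern = [1.0, 1.5, 0.8, 1.2]  # hypothetical quarterly shape
growth = 1.1                             # hypothetical 10% growth per year

# Quarter t has value pattern[t % 4] * growth ** (year index).
series = [seasonal_pattern[t % 4] * growth ** (t // 4) for t in range(16)]

# Log turns multiplicative growth into additive growth ...
log_series = [log(v) for v in series]

# ... and differencing at lag 4 removes both the season and the trend.
seasonal_diff = [log_series[t] - log_series[t - 4] for t in range(4, len(log_series))]

print(seasonal_diff[:4])  # every entry equals log(1.1)
```

With pandas, `.diff(4)` would also leave NaN in the first four rows, which this list version simply omits.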

You have a model, ARIMA(data, order=(1,1,1)), where order is (p,d,q). What does the order argument mean?

Front

p = number of AR parameters, d = number of differences, q = number of MA parameters

Back

Is lower or higher Information Criteria better?

Front

Lower is better

Back

What are economic substitutes

Front

Commodities that can act as substitutes, and as a result their prices are linked. Examples: heating oil and natural gas, platinum and palladium, corn and wheat, corn and sugar.

Back

What is the loss for Classification?

Front

Cross-entropy: SUM(-ln(y_predicted)), where y_predicted is the predicted probability of the true class.

Back
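
As a minimal sketch of the formula, the loss sums the negative log of the probability each prediction assigns to the correct class. The function name `cross_entropy` and the probability lists are invented for illustration.

```python
from math import log


def cross_entropy(true_class_probs):
    """Sum of -ln(p), where p is the predicted probability of the true class."""
    return sum(-log(p) for p in true_class_probs)


# Hypothetical predicted probabilities for the correct class of three samples.
confident = cross_entropy([0.9, 0.8, 0.95])
unsure = cross_entropy([0.4, 0.3, 0.5])

print(confident)  # small loss: the model puts high probability on the right answers
print(unsure)     # larger loss: low probability on the right answers is penalized
```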

Import logistic regression and fit on X,y, then score x_test, y_test

Front

from sklearn.linear_model import LogisticRegression lr = LogisticRegression() lr.fit(X,y) lr.score(x_test, y_test)

Back

What does the partial autocorrelation function do?

Front

estimates the benefit of adding additional lags

Back

Name two methods of identifying the Order of an AR Model?

Front

1. Partial Autocorrelation Function 2. Information Criteria

Back

Why do we care about Stationarity?

Front

If a time series is not stationary, it becomes difficult to model. If parameters vary with time, there are too many parameters to estimate. Stationarity is necessary to reduce the set of parameters for estimation.

Back

Import the necessary package from statsmodels and fit an AR model of order 1 with no MA component

Front

from statsmodels.tsa.arima_model import ARMA model = ARMA(data, order=(1,0)) result = model.fit()

Back

What is information criteria?

Front

adjusts goodness-of-fit for number of parameters by instituting a penalty for additional parameters two common measures: AIC and BIC

Back
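
The penalty idea can be made concrete with the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of observations, and L the maximized likelihood. The log-likelihood values below are hypothetical, and a library such as statsmodels may count parameters slightly differently.

```python
from math import log


def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * log(n) - 2 * log_likelihood


n_obs = 500

# Hypothetical fits: the bigger model improves the likelihood only slightly.
small_model = {"log_likelihood": -520.0, "k": 2}
big_model = {"log_likelihood": -519.5, "k": 5}

for name, m in (("small", small_model), ("big", big_model)):
    print(name, aic(m["log_likelihood"], m["k"]), bic(m["log_likelihood"], m["k"], n_obs))
```

Here the small model scores lower on both criteria, so the extra parameters of the big model are not worth their penalty.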

What are the two steps to test for Cointegration? What statsmodels function can be used to test in one step?

Front

1. Regress Pt on Qt and get the slope c. 2. Run the adfuller test on (Pt - cQt) to test for a random walk. Alternatively, use the coint function to combine both steps: coint(P, Q)

Back

If you run the Dickey-Fuller test and observe a p-value of 0.78 what can you conclude?

Front

Do not reject the null hypothesis that the series is a random walk.

Back