Analyzing Patient Data
|
Import a library into a program using import libraryname.
Use the numpy library to work with arrays in Python.
Use variable = value to assign a value to a variable in order to record it in memory.
Variables are created on demand whenever a value is assigned to them.
Use print(something) to display the value of something.
The expression array.shape gives the shape of an array.
Use array[x, y] to select a single element from a 2D array.
Array indices start at 0, not 1.
Use low:high to specify a slice that includes the indices from low to high-1.
All the indexing and slicing that works on arrays also works on strings.
Use # some kind of explanation to add comments to programs.
Use numpy.mean(array), numpy.max(array), and numpy.min(array) to calculate simple statistics.
Use numpy.mean(array, axis=0) or numpy.mean(array, axis=1) to calculate statistics along the specified axis: axis=0 gives one result per column, axis=1 gives one result per row.
Use the pyplot library from matplotlib for creating simple visualizations.
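The array points above can be tried on a small example. This is a minimal sketch: the array values are made up for illustration and stand in for real patient data.

```python
import numpy

# Made-up values: 3 rows (patients) by 4 columns (days).
data = numpy.array([[0.0, 1.0, 2.0, 3.0],
                    [1.0, 2.0, 3.0, 4.0],
                    [2.0, 3.0, 4.0, 5.0]])

print(data.shape)           # (number of rows, number of columns)

first_value = data[0, 0]    # indices start at 0, so this is the top-left element
block = data[0:2, 1:3]      # rows 0 and 1, columns 1 and 2 (the high index is excluded)

overall_mean = numpy.mean(data)               # a single number for the whole array
mean_per_day = numpy.mean(data, axis=0)       # one value per column
mean_per_patient = numpy.mean(data, axis=1)   # one value per row

# The same slicing syntax works on strings.
element = 'oxygen'
first_three = element[0:3]
print(first_three)
```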
|
Repeating Actions with Loops
|
Use for variable in sequence to process the elements of a sequence one at a time.
The body of a for loop must be indented.
Use len(thing) to determine the length of something that contains other values.
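As a short illustration of the loop points above, the sketch below counts the characters in a string by hand and compares the result with len. The variable names are chosen for illustration only.

```python
word = 'oxygen'

# Count the characters one at a time; the indented line is the loop body.
length = 0
for char in word:
    length = length + 1

print('counted by loop:', length)
print('reported by len:', len(word))
```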
|
Storing Multiple Values in Lists
|
[value1, value2, value3, ...] creates a list.
Lists are indexed and sliced in the same way as strings and arrays.
Lists are mutable (i.e., their values can be changed in place).
Strings are immutable (i.e., the characters in them cannot be changed).
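The difference between mutable lists and immutable strings can be seen in a few lines. This is a small sketch with made-up values.

```python
odds = [1, 3, 5, 7]

first = odds[0]       # indexing works as for strings and arrays
last = odds[-1]       # negative indices count from the end
middle = odds[1:3]    # slicing excludes the high index

# Lists are mutable: an element can be replaced in place.
odds[0] = 11

# Strings are immutable: the same kind of assignment raises a TypeError.
name = 'Darwin'
try:
    name[0] = 'd'
except TypeError as error:
    message = str(error)
    print('cannot change a string in place:', message)
```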
|
Analyzing Data from Multiple Files
|
Use glob.glob(pattern) to create a list of files whose names match a pattern.
Use * in a pattern to match zero or more characters, and ? to match any single character.
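The wildcard behavior can be checked without any real data files. The sketch below creates a few empty files in a temporary directory; the file names are hypothetical and chosen only to show the difference between * and ?.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create empty placeholder files (hypothetical names).
    for name in ['data-01.csv', 'data-02.csv', 'data-10.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # * matches zero or more characters: every file ending in .csv
    all_csv = glob.glob(os.path.join(folder, '*.csv'))

    # ? matches exactly one character: data-01.csv and data-02.csv only
    zero_prefixed = glob.glob(os.path.join(folder, 'data-0?.csv'))

    # glob does not promise any particular order, so sort before use.
    csv_names = sorted(os.path.basename(path) for path in all_csv)
    zero_names = sorted(os.path.basename(path) for path in zero_prefixed)

print(csv_names)
print(zero_names)
```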
|
Making Choices
|
Use if condition to start a conditional statement, elif condition to provide additional tests, and else to provide a default.
The bodies of the branches of conditional statements must be indented.
Use == to test for equality.
X and Y is only true if both X and Y are true.
X or Y is true if either X or Y, or both, are true.
Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.
Nest loops to operate on multi-dimensional data.
Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.
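The points above can be combined in one short sketch. The function name sign and the nested list grid are hypothetical, introduced only for illustration.

```python
def sign(number):
    """Describe a number as positive, zero, or negative."""
    if number > 0:
        return 'positive'
    elif number == 0:
        return 'zero'
    else:
        return 'negative'

# Nested loops visit every value in two-dimensional data.
grid = [[1, -2, 0], [4, 0, -6]]
labels = []
for row in grid:
    for value in row:
        labels.append(sign(value))

both = (1 > 0) and (-1 > 0)     # only one side is true
either = (1 > 0) or (-1 > 0)    # at least one side is true

# Zero, the empty string, and the empty list count as false.
falsy = [bool(0), bool(''), bool([])]
truthy = [bool(3), bool('a'), bool([0])]
```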
|
Creating Functions
|
Define a function using def name(...params...).
The body of a function must be indented.
Call a function using name(...values...).
Numbers are stored as integers or floating-point numbers.
Integer division (the // operator in Python 3) produces the whole part of the answer, not the fractional part; the / operator always returns a floating-point result.
Each time a function is called, a new stack frame is created on the call stack to hold its parameters and local variables.
Python looks for variables in the current stack frame before looking for them at the top level.
Use help(thing) to view help for something.
Put docstrings in functions to provide help for that function.
Specify default values for parameters when defining a function using name=value in the parameter list.
Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
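A small function shows docstrings, default values, and the different ways of passing parameters. The function rescale is a hypothetical example, and it assumes the input contains at least two distinct values so that the span is not zero.

```python
def rescale(values, low=0.0, high=1.0):
    """Return values rescaled linearly to lie between low and high."""
    smallest = min(values)
    largest = max(values)
    span = largest - smallest   # assumed non-zero for this sketch
    return [low + (high - low) * (v - smallest) / span for v in values]

by_position = rescale([0, 5, 10], 0.0, 100.0)   # all parameters matched by position
by_name = rescale([0, 5, 10], high=100.0)       # high matched by name, low uses its default
defaults = rescale([0, 5, 10])                  # both low and high use their defaults

help(rescale)            # displays the docstring

whole_part = 7 // 2      # integer division keeps only the whole part
true_quotient = 7 / 2    # ordinary division returns a floating-point number
```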
|
Reading Tabular Data into DataFrames
|
Use the Pandas library to do statistics on tabular data.
Use index_col to specify that a column’s values should be used as row headings.
Use DataFrame.info() to find out more about a dataframe.
The DataFrame.columns variable stores information about the dataframe’s columns.
Use DataFrame.T to transpose a dataframe.
Use DataFrame.describe() to get summary statistics about data.
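The sketch below reads a tiny table from an in-memory string rather than a file, so it runs on its own. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Made-up tabular data; in practice this would be a CSV file on disk.
csv_text = """name,year_1,year_2
a,1.0,10.0
b,2.0,20.0
c,3.0,30.0
"""

# index_col turns the 'name' column into the row headings.
data = pd.read_csv(io.StringIO(csv_text), index_col='name')

data.info()               # prints column types and non-null counts
print(data.columns)       # the column labels
print(data.T)             # rows and columns swapped
summary = data.describe() # count, mean, std, min, quartiles, max per column
print(summary)
```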
|
Pandas DataFrames
|
Use DataFrame.iloc[..., ...] to select values by integer location.
Use : on its own to mean all columns or all rows.
Select multiple columns or rows using DataFrame.loc and a named slice.
The result of slicing can be used in further operations.
Use comparisons to select data based on value.
Select values or NaN using a Boolean mask.
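The selection methods above can be compared side by side on a small dataframe. The labels and values here are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {'year_1': [1.0, 2.0, 3.0],
     'year_2': [10.0, 20.0, 30.0],
     'year_3': [100.0, 200.0, 300.0]},
    index=['a', 'b', 'c'],
)

by_position = data.iloc[0, 1]         # first row, second column
by_label = data.loc['a', 'year_2']    # the same cell, selected by name

all_columns = data.loc['b', :]        # : on its own means every column

# Named slices with loc include both end points.
subset = data.loc['a':'b', 'year_1':'year_2']

# The result of slicing is itself a dataframe and can be used further.
subset_max = subset.max()

# A comparison produces a Boolean mask of the same shape.
mask = subset > 1.5

# Indexing with the mask keeps values where it is True and gives NaN elsewhere.
masked = subset[mask]
print(masked)
```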
|
Plotting
|
matplotlib is the most widely used scientific plotting library in Python.
Plot data directly from a Pandas dataframe.
Select and transform data, then plot it.
Many styles of plot are available.
Many sets of data can be plotted together.
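The following sketch plots a small made-up dataframe in two ways: directly through Pandas, and with explicit matplotlib calls. It assumes a non-interactive backend (Agg) so that it runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: no display available
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: one row per series, one column per year.
data = pd.DataFrame(
    {'year_1': [1.0, 2.0, 3.0],
     'year_2': [10.0, 20.0, 30.0],
     'year_3': [100.0, 200.0, 300.0]},
    index=['a', 'b', 'c'],
)

# Transform: turn the column labels into integers so they work as x values.
years = data.columns.str.replace('year_', '').astype(int)
data.columns = years

# Plot directly from the dataframe: transposing gives one line per original row.
line_axes = data.T.plot()
line_axes.set_xlabel('Year')

# Plot several series together with explicit calls and different styles.
fig, ax = plt.subplots()
ax.plot(years, data.loc['a'], 'g--', label='a')
ax.plot(years, data.loc['b'], 'b-', label='b')
ax.legend(loc='upper left')
ax.set_xlabel('Year')
```

Other plot styles can be requested through the kind argument of DataFrame.plot, for example kind='bar'.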
|
Machine Learning Flash
|
Flatten the input dataset, as the SVM is unaware of the idea of an image.
Split your data 50/50 and train on the first half.
Predict the other half.
Produce a confusion matrix to check the quality of the learning.
Plot some images and their predicted values.
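The steps above can be sketched with scikit-learn. The source does not name a dataset or classifier settings, so the bundled handwritten digits dataset and the gamma value below are assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: no display available
import matplotlib.pyplot as plt
from sklearn import datasets, metrics, svm

# Assumed dataset: 8x8 grayscale images of handwritten digits.
digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a row of 64 values for the SVM.
flat = digits.images.reshape((n_samples, -1))

# Split 50/50 and train on the first half.
half = n_samples // 2
classifier = svm.SVC(gamma=0.001)   # gamma is an assumed setting
classifier.fit(flat[:half], digits.target[:half])

# Predict the other half.
expected = digits.target[half:]
predicted = classifier.predict(flat[half:])

# Rows of the confusion matrix are true digits, columns are predicted digits.
matrix = metrics.confusion_matrix(expected, predicted)
accuracy = metrics.accuracy_score(expected, predicted)
print(matrix)

# Plot a few test images with their predicted values.
fig, axes = plt.subplots(1, 4)
for ax, image, prediction in zip(axes, digits.images[half:], predicted):
    ax.imshow(image, cmap='gray_r')
    ax.set_title(f'Predicted: {prediction}')
    ax.set_axis_off()
```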
|