* Rename the internal transform function to mapColumns.
* Change the exception class to have a context that we can overwrite with the call-site.
* Update tutorials.
-As hinted in the previous example, we can create a dataframe with `fromList`. This function takes in a list of tuples. We don't broadcast values like pandas does, i.e., if you put a single value into a column, all other values will be null/nothing. But we'll detail how to get the same functionality.
+As hinted in the previous example, we can create a dataframe with `fromNamedColumns`. This function takes in a list of tuples. We don't broadcast values like pandas does, i.e., if you put a single value into a column, all other values will be null/nothing. But we'll detail how to get the same functionality.
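Since `fromNamedColumns` doesn't broadcast, a scalar has to be expanded by hand. As a minimal sketch in plain Haskell, assuming a column can be modelled as an ordinary list (the `NamedColumn` type and every helper name here are hypothetical, not part of the library), broadcasting amounts to `replicate`, while leaving a single value alone behaves like padding with `Nothing`:

```haskell
-- Hypothetical model: a named column is a name paired with a plain list.
-- This is NOT the library's column type; it only illustrates the idea.
type NamedColumn a = (String, [a])

-- No broadcasting: a short column is padded with Nothing up to n rows.
padColumn :: Int -> [a] -> [Maybe a]
padColumn n xs = take n (map Just xs ++ repeat Nothing)

-- Manual broadcasting: repeat the scalar once per row.
broadcast :: Int -> a -> [a]
broadcast n x = replicate n x

-- Made-up example data with three rows.
names :: NamedColumn String
names = ("name", ["Alice", "Bob", "Carol"])

-- The row count comes from a column that already has every value.
rowCount :: Int
rowCount = length (snd names)

-- A single value that should apply to every row, broadcast by hand.
country :: NamedColumn String
country = ("country", broadcast rowCount "UK")

-- The same single value without broadcasting.
countryPadded :: NamedColumn (Maybe String)
countryPadded = ("country", padColumn rowCount ["UK"])
```

Loaded into GHCi, `snd country` evaluates to `["UK","UK","UK"]` and `snd countryPadded` to `[Just "UK",Nothing,Nothing]`.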
```python
df2 = pd.DataFrame(
@@ -110,13 +110,13 @@ df2 = pd.DataFrame(
-- All our data types must be printable and orderable.
@@ -164,12 +136,15 @@ index | name | birth_year | bmi
3 | Daniel Donovan | 1981 | 27.13469387755102
```
-The dataframe implementation can be read top down. `apply` a function that gets the year to the `birthdate`;
-store the result in the `birth_year` column; combine `weight` and `height` into the bmi column using the
-formula `w / h ** 2`; then select the `name`, `birth_year` and `bmi` fields.
-Dataframe focuses on splitting transformations into transformations on the whole dataframe so it's easily usable
-in a repl-like environment.
+The Haskell implementation can be read top down:
+* Create a column called `birth_year` by getting the year from the `birthdate` column.
+* Create a column called `bmi`, computed as `weight / height ** 2`.
+* Select the `name`, `birth_year` and `bmi` fields.
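The three steps can be sketched without the library at all, which helps when checking the arithmetic. This minimal sketch assumes rows are plain Haskell records and encodes `birthdate` as a `(year, month, day)` tuple; `Person`, `Result`, `yearOf`, `bmiOf` and `transform` are hypothetical names, not library functions:

```haskell
-- Hypothetical row type standing in for the input dataframe.
data Person = Person
  { name      :: String
  , birthdate :: (Int, Int, Int) -- (year, month, day): a simplification
  , weight    :: Double          -- kilograms
  , height    :: Double          -- metres
  }

-- Only the selected fields: name, birth_year and bmi.
data Result = Result
  { resultName :: String
  , birthYear  :: Int
  , bmi        :: Double
  } deriving (Show, Eq)

-- Step 1: get the year from the birthdate.
yearOf :: (Int, Int, Int) -> Int
yearOf (y, _, _) = y

-- Step 2: weight / height ** 2, as in the tutorial formula.
bmiOf :: Double -> Double -> Double
bmiOf w h = w / h ** 2

-- Step 3: building a Result keeps only the selected fields for each row.
transform :: [Person] -> [Result]
transform = map toResult
  where
    toResult p = Result (name p) (yearOf (birthdate p)) (bmiOf (weight p) (height p))
```

Keeping each step as its own function mirrors how the pipeline reads top down.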
+
+`lift` takes a regular, unary (one-argument) Haskell function and applies it to a column. To apply a binary function to two columns, we use `lift2`.
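A rough mental model of `lift` and `lift2`, assuming a column behaves like a list of values, is that they correspond to `map` and `zipWith`. The library's real column and expression types are richer; `Column`, `liftCol` and `liftCol2` below are hypothetical stand-ins:

```haskell
-- Toy column type: a wrapper around a list. NOT the library's representation.
newtype Column a = Column [a] deriving (Show, Eq)

-- Like `lift`: apply a unary function to every value of a column.
liftCol :: (a -> b) -> Column a -> Column b
liftCol f (Column xs) = Column (map f xs)

-- Like `lift2`: apply a binary function to two columns, row by row.
-- zipWith stops at the end of the shorter column.
liftCol2 :: (a -> b -> c) -> Column a -> Column b -> Column c
liftCol2 f (Column xs) (Column ys) = Column (zipWith f xs ys)

-- The bmi step expressed with liftCol2.
bmiColumn :: Column Double -> Column Double -> Column Double
bmiColumn = liftCol2 (\w h -> w / h ** 2)

-- The birth_year step expressed with liftCol, using a (year, month, day) tuple.
birthYearColumn :: Column (Int, Int, Int) -> Column Int
birthYearColumn = liftCol (\(y, _, _) -> y)
```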
+
+The Polars column type can be a single column or a list of columns. This means that applying a single transformation to many columns can be written as follows:
In the Polars expression expansion example:
@@ -181,7 +156,7 @@ result = df.select(
print(result)
```
-We instead write this as two `applyWithAlias` calls:
+In Haskell, we don't provide a way of doing this out of the box, so you have to write something more explicit:
```haskell
df_csv
@@ -202,7 +177,7 @@ index | name | height-5% | weight-5%
3 | Daniel Donovan | 1.6624999999999999 | 78.945
```
-However, we can make our program shorter by using regular Haskell and folding over the dataframe.
+We can use standard Haskell machinery, such as folds, to make the program shorter without sacrificing readability.
```haskell
let reduce name = D.derive (name <> "-5%") ((col @Double name) * (lit 0.95))
@@ -211,16 +186,19 @@ df_csv
  |> D.select ["name", "weight-5%", "height-5%"]
```
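The `reduce` helper above derives one `-5%` column per name; applying it to each name in turn is a left fold. Here is a minimal sketch of the same pattern, assuming a dataframe can be modelled as an association list from column names to `Double` values. `Frame`, `deriveCol`, `column`, `reduceStep` and `reduceAll` are hypothetical helpers, not the library's API:

```haskell
-- Toy dataframe: column name paired with its values.
-- This stands in for the library's DataFrame; it is not its API.
type Frame = [(String, [Double])]

-- Add a column, replacing any existing column with the same name.
deriveCol :: String -> [Double] -> Frame -> Frame
deriveCol new vals df = filter ((/= new) . fst) df ++ [(new, vals)]

-- Look up a column's values, failing loudly if it is missing.
column :: String -> Frame -> [Double]
column c df = maybe (error ("missing column: " ++ c)) id (lookup c df)

-- One fold step: scale column c by 0.95 into a new "<c>-5%" column.
reduceStep :: Frame -> String -> Frame
reduceStep df c = deriveCol (c ++ "-5%") (map (* 0.95) (column c df)) df

-- Apply the step to every name in turn; the original columns are kept.
reduceAll :: [String] -> Frame -> Frame
reduceAll cs df = foldl reduceStep df cs
```

A left fold fits because each step consumes the frame produced by the previous one.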
-Or alternatively,
+Alternatively, if our transformation only involves the column we are modifying, we can write the same code as follows:
```haskell
addSuffix suffix name = D.rename name (name <> suffix)
df_csv
  |> D.applyMany ["weight", "height"] (*0.95)
+  -- We have to rename the fields so they match what we had before.
  |> D.fold (addSuffix "-5%")
  |> D.select ["name", "weight-5%", "height-5%"]
```
+This means we can still use the expressive power of Haskell itself instead of relying entirely on column expressions, which keeps our implementation flexible.
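The two pipelines end in the same selection for different reasons: the fold version adds new `-5%` columns next to the originals, while the `applyMany` version scales `weight` and `height` in place and then renames them. A minimal sketch over a toy frame, assuming columns are an association list of `Double` values; `applyManyCols`, `renameCol` and `selectCols` are hypothetical helpers that only mimic the roles of `D.applyMany`, `D.rename` and `D.select`:

```haskell
-- Toy dataframe: column name paired with its values (not the library's type).
type Frame = [(String, [Double])]

-- Apply one function to the values of every listed column, in place.
applyManyCols :: [String] -> (Double -> Double) -> Frame -> Frame
applyManyCols cs f = map step
  where
    step (c, vs)
      | c `elem` cs = (c, map f vs)
      | otherwise   = (c, vs)

-- Rename a single column, leaving its values untouched.
renameCol :: String -> String -> Frame -> Frame
renameCol old new = map (\(c, vs) -> (if c == old then new else c, vs))

-- Same shape as the tutorial's helper: rename name to name <> suffix.
addSuffix :: String -> String -> Frame -> Frame
addSuffix suffix c = renameCol c (c <> suffix)

-- Keep only the requested columns, in the requested order.
selectCols :: [String] -> Frame -> Frame
selectCols cs df = [(c, vs) | c <- cs, Just vs <- [lookup c df]]

-- Scale in place, rename every scaled column, then select.
pipeline :: Frame -> Frame
pipeline df =
  selectCols ["weight-5%", "height-5%"]
    (foldr (addSuffix "-5%") (applyManyCols names (* 0.95) df) names)
  where
    names = ["weight", "height"]
```

Because renaming only touches column names, it composes with any in-place transformation, which is the kind of flexibility the paragraph above describes.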