Showing posts with label regression. Show all posts
Showing posts with label regression. Show all posts

Wednesday, March 21, 2012

linear regression with nested explanation variable

We are trying to create a model of linear regression with nested table. We used the create mining model sintax as follow :

create mining model rate_plan3002_nested2

( CUST_cycle LONG KEY,

VOICE_CHARGES double CONTINUOUS predict,

DUR_PARTNER_GRP_1 double regressor CONTINUOUS ,

nested_taarif_time_3002 table

( CUST_cycle long CONTINUOUS,

TARIFF_TIME text key,

TARIFF_VOICE_DUR_ALL double regressor CONTINUOUS

)

) using microsoft_linear_regression

INSERT INTO MINING STRUCTURE [rate_plan3002_nested2_Structure]

(CUST_cycle ,

VOICE_CHARGES ,

DUR_PARTNER_GRP_1 ,

[nested_taarif_time_3002](SKIP,TARIFF_TIME ,TARIFF_VOICE_DUR_ALL)

)

SHAPE {

OPENQUERY([Cell],

'SELECT CUST_cycle ,

VOICE_CHARGES ,

DUR_PARTNER_GRP_1

FROM dbo.panel_anality_3002

order by CUST_cycle ')}

APPEND

({OPENQUERY([Cell],

'select CUST_cycle,

TARIFF_TIME,

CYCLE_DATE

from dbo.nested_taarif_time_3002

order by CUST_cycle,TARIFF_TIME')

}

relate CUST_cycle to CUST_cycle

) as nested_taarif_time_3002

The results we got are a model with intercept only. if we don't use the nested variable (the red line) we get a rigth model . (we had more variable ....)

Is there a way to do this regression correctly?

Thanks,

Dror

Hi Dror,

You could remove the "regressor" flag from the nested table column (in the create mining model statement) if this column is not indented to be part of the regression equation.

Thanks,

Dana Cristofor

|||

Thanks Dana,

the problem is that ido want it to be part of the regression

otherwise i don't need the nested table

Thanks

|||

I'm investigating an issue - let me work on this and get back to you.

Thanks

|||

I think I may have found the problem. I created a simple model with processed and had expected regressions. I then created the same model using a nested table and processed and got a constant result back - very confusing. I then tried creating an additional model in the nested structure that would be identical to the first non-nested model I created, and again got no regressions - startling!

What I found out was that for some reason (possibly a bug) when I added a nested table, the wizard did not add the "regressor" flag to any of my continuous inputs. Once I manually added the regressor flag and reprocessed, I got the expected regressions in my output.

Please check the regressor flag on the model columns and let me know if this helps for you. To set the regressor flag, go to the Mining Models tab of the Data Mining designer, click the cell representing the input column under the mining model (not the mining structure) and view the properties. The regressor flag is a possible option for the mining model column.

Thanks

-Jamie

|||

Thanks Jamie,

we added the regressor flag in the minig models tab, and the we got the same results:

when we add the regressor flag to the column in the nested table, we get an intercept only model.

if we put the regressor flag on the nested table only, and not on the column of the table, we get the same regression as if we didn't use the nested table.

is there a way to solve it?

we would also like to run the model from the DMXquery of the management studio, is there a way to get the script of the model from visual studio?

Thanks,

Dror

|||

I wonder if it's possible that there's too much "noise" with the nested table? I've tried with a degenerate case to prove that there's no obvious bug in the software preventing nested regressors from working. Can you also try the same model with the decision tree algorithm and see what happens?

You can get the DMX form of the model by following the instructions at the tip and trick here: http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/3652.aspx

Linear Regression for column values

This is a real challenge. I hope someone is smart enough to know how
to do this.

I have a table

TABLE1
[Column 1- 2001]
[Column 2- 2002]
[Column 3- 2003]
[Column 4 - 2004]
[Column 5 - 2005]
[Column 6 - 2006]
[Column 7 - Slope]

[2001][2002][2003][2004][2005][2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]

Slope is defined as "M" in the equation y=mx+b

I need a way a finding the linear equation that best fits the points so
I can have SQL calculate the slope.

Are there any smart people around that would know how to do this?

thankssql guy123 (stoppal@.hotmail.com) writes:

Quote:

Originally Posted by

This is a real challenge. I hope someone is smart enough to know how
to do this.
>
I have a table
>
TABLE1
[Column 1- 2001]
[Column 2- 2002]
[Column 3- 2003]
[Column 4 - 2004]
[Column 5 - 2005]
[Column 6 - 2006]
[Column 7 - Slope]
>
>
[2001][2002][2003][2004][2005][2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]
>
>
Slope is defined as "M" in the equation y=mx+b
>
I need a way a finding the linear equation that best fits the points so
I can have SQL calculate the slope.
>
Are there any smart people around that would know how to do this?


Smart? Either you did learn linear regression in school or you didn't.

I will have to admit that although I taught mathematical statistics as a
student, that I've forgotten the forumulas. But I looked them up in my
statisticcs book:

m = SUM((Xi - AVG(X))*(Yi - AVG(Y)) / SUM(SQR(Xi - AVG(X))

where Xi and Yi are the individual values for the X and Y variables.

For the computation in SQL you can of course not use SUM and AVG since
your table is turned the wrong way. The years should have been columns
instead.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Thank you for your information. I never learned the actual formula,
only how to click the linear regression button in excel.

Is there a way to transform the table into the correct format? Maybe a
tranverse function in SQL?

thanks

Erland Sommarskog wrote:

Quote:

Originally Posted by

sql guy123 (stoppal@.hotmail.com) writes:

Quote:

Originally Posted by

This is a real challenge. I hope someone is smart enough to know how
to do this.

I have a table

TABLE1
[Column 1- 2001]
[Column 2- 2002]
[Column 3- 2003]
[Column 4 - 2004]
[Column 5 - 2005]
[Column 6 - 2006]
[Column 7 - Slope]

[2001][2002][2003][2004][2005][2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]

Slope is defined as "M" in the equation y=mx+b

I need a way a finding the linear equation that best fits the points so
I can have SQL calculate the slope.

Are there any smart people around that would know how to do this?


>
Smart? Either you did learn linear regression in school or you didn't.
>
I will have to admit that although I taught mathematical statistics as a
student, that I've forgotten the forumulas. But I looked them up in my
statisticcs book:
>
m = SUM((Xi - AVG(X))*(Yi - AVG(Y)) / SUM(SQR(Xi - AVG(X))
>
where Xi and Yi are the individual values for the X and Y variables.
>
For the computation in SQL you can of course not use SUM and AVG since
your table is turned the wrong way. The years should have been columns
instead.
>
>
>
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
>
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||sql guy123 (stoppal@.hotmail.com) writes:

Quote:

Originally Posted by

Thank you for your information. I never learned the actual formula,
only how to click the linear regression button in excel.
>
Is there a way to transform the table into the correct format? Maybe a
tranverse function in SQL?


There is an UNPIVOT operator in SQL 2005. But it's mainly syntactic
sugar, and you can easily unpivot without it:

SELECT year, value = CASE year WHEN 2001 THEN [2001]
WHEN 2002 THEN [2002]
...
END
FROM tbl
CROSS JOIN (SELECT 2001 UNION ALL
SELECT 2002 UNION ALL
...) AS year

My thought, though, was that you would redesign the table on a
permanent basis.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Linear Regression and tolking the coefficient for each variabel?

When using linear regression in the SQL Server 2005 Business IntelIigence Studio I interpet the information below as follow: X has a standard deviation of +- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?

You can get the variance of the coefficients from node_distribution of linear regression root node. The following DMX query will return the information you need:

select flattened Node_Distribution from [Model Name].Content where Node_type = 25

The query result is a DataTable with the following column names:

Node_Distribution.ATTRIBUTE_NAMENode_Distribution.ATTRIBUTE_VALUE

Node_Distribution.SUPPORTNode_Distribution.PROBABILITYNode_Distribution.VARIANCE

Node_Distribution.VALUETYPE

And the data rows will look like this:

C5Missing00.007874015748031501

C5228.58812310000.99212598425196997913.5033

C431.9528749318584006.737398583630157

C432.1589801071577000 8

C45.29710479474068006.737398583630159

59.33039676663630090944.677980779511

The rows with a value type 7 (emphasized in above table) contain the information you need. You might need to write a small C# program to iterate through the resulting datable to get the information you want.

Thanks,

Linear Regression and tolking the coefficient for each variabel?

When using linear regression in the SQL Server 2005 Business IntelIigence Studio I interpet the information below as follow: X has a standard deviation of +- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?

You can get the variance of the coefficients from node_distribution of linear regression root node. The following DMX query will return the information you need:

select flattened Node_Distribution from [Model Name].Content where Node_type = 25

The query result is a DataTable with the following column names:

Node_Distribution.ATTRIBUTE_NAMENode_Distribution.ATTRIBUTE_VALUE

Node_Distribution.SUPPORTNode_Distribution.PROBABILITYNode_Distribution.VARIANCE

Node_Distribution.VALUETYPE

And the data rows will look like this:

C5Missing00.007874015748031501

C5228.58812310000.99212598425196997913.5033

C431.9528749318584006.737398583630157

C432.1589801071577000 8

C45.29710479474068006.737398583630159

59.33039676663630090944.677980779511

The rows with a value type 7 (emphasized in above table) contain the information you need. You might need to write a small C# program to iterate through the resulting datable to get the information you want.

Thanks,

sql